Forgive my ignorance of the technical term for foreign-language characters such as
ø, e.g. Helsingør
Ł, e.g. Łeczna
ı, e.g. Altınordu
ł, e.g. Głogow
How could I normalize those with JavaScript (with a regex), while also making the comparison case-insensitive?
const strArr = ['Helsingør', 'Łeczna', 'Altınordu', 'Głogow', "Népoão's"]
With the code below, I was able to replace Latin-based accented characters (é, ñ, ô, etc.) but not the ones above.
strArr.map(string => string.normalize('NFD')
.replaceAll(/[\u0300-\u036f,"\'"]/g, ''))
My final output should read as ['Helsingor', 'Leczna', 'Altinordu', 'Glogow', 'Nepoaos']
To replace non-ASCII characters with their closest ASCII equivalents, you can use the lodash library.
Your final code for replacing the special characters would be:
const _ = require('lodash');
const strArr = ['Helsingør', 'Łeczna', 'Altınordu', 'Głogow'];
const normalized = strArr.map(string => _.deburr(string));
console.log(normalized);
Output:
[ 'Helsingor', 'Leczna', 'Altinordu', 'Glogow' ]
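As you discovered, .normalize('NFD') only decomposes characters that have a canonical decomposition into a base letter plus combining marks (é, ñ, ô). Letters such as ø, Ł, ł and ı have none, so they pass through unchanged. You'd need a library or a mapping that transliterates those letters; https://github.com/walling/unorm implements the same Unicode normalization forms, so it may not help for these.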
You can also roll your own. Here is a solution that applies an iMap object, which maps international characters to English ones, before calling .normalize('NFD'). This iMap is short; you can expand it as needed, for example by taking the mapping from https://github.com/cvan/lunr-unicode-normalizer/blob/master/lunr.unicodeNormalizer.js. Because the map is applied first, it can also override what .normalize('NFD') would produce: for example, the German umlaut ü is often better mapped to ue than to u.
function normalizeString(str) {
  const iMap = {
    'ð': 'd',
    'ı': 'i',
    'Ł': 'L',
    'ł': 'l',
    'ø': 'o',
    'ß': 'ss',
    'ü': 'ue'
  };
  const iRegex = new RegExp(Object.keys(iMap).join('|'), 'g');
  return str
    .replace(iRegex, (m) => iMap[m])
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, '');
}
[
'Helsingør', 'Łeczna', 'Altınordu', 'Głogow',
'Áfram með smjörið', 'Crème Brulée', 'Bär Müller Straße'
].forEach(str => {
let result = normalizeString(str);
console.log(str, '=>', result);
});
Output:
Helsingør => Helsingor
Łeczna => Leczna
Altınordu => Altinordu
Głogow => Glogow
Áfram með smjörið => Afram med smjorid
Crème Brulée => Creme Brulee
Bär Müller Straße => Bar Mueller Strasse
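None of the snippets above covers the case-insensitive part of the question. Here is a minimal sketch, assuming that one lowercase ASCII key per string is good enough for comparison. The names foldForCompare, extraMap and extraRegex are hypothetical, and the map would need extending for letters that are not listed:

```javascript
// Hypothetical map for letters that NFD cannot decompose (extend as needed).
const extraMap = { 'ø': 'o', 'Ø': 'O', 'ł': 'l', 'Ł': 'L', 'ı': 'i', 'ð': 'd', 'ß': 'ss' };
const extraRegex = new RegExp('[' + Object.keys(extraMap).join('') + ']', 'g');

// Fold a string to a lowercase key: map special letters, decompose accents,
// strip combining marks and apostrophes, then lowercase.
function foldForCompare(str) {
  return str
    .replace(extraRegex, ch => extraMap[ch])
    .normalize('NFD')
    .replace(/[\u0300-\u036f']/g, '')
    .toLowerCase();
}

const names = ['Helsingør', 'Łeczna', 'Altınordu', 'Głogow', "Népoão's"];
const folded = names.map(foldForCompare);
console.log(folded);
// [ 'helsingor', 'leczna', 'altinordu', 'glogow', 'nepoaos' ]
```

Two strings can then be compared with foldForCompare(a) === foldForCompare(b).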
I am trying to make my code look more professional by removing duplicate code. I want to extract some data from a string; specifically, I need the NUMBER, X, Y, Z, A, B, etc. values. The regular expression is different for each variable, so I end up repeating myself and writing a lot of duplicate code.
let TextString = `DRILL(NUMBER:=20,NAME:='4',PN:=1,X:=10.1,Y:=73.344,Z:=0,A:=-1.435,B:=1.045,M1:=1,M2:=2,M3:=3,M4:=4,M5:=1,S1:=10.5,S2:=2.1,S3:=1.2,S4:=2,S5:=2.4,RS1:=1,RS2:=2);`;
const regNumber = /(?<=NUMBER:=)[0-9]+/gm;
let lineNumber = Number(TextString.match(regNumber));
const regX = /(?<=X:=)(-?[0-9]+)(.[0-9]+)?/gm;
let X = Number(TextString.match(regX)).toFixed(1);
const regY = /(?<=Y:=)(-?[0-9]+)(.[0-9]+)?/gm;
let Y = Number(TextString.match(regY)).toFixed(1);
const regZ = /(?<=Z:=)(-?[0-9]+)(.[0-9]+)?/gm;
let Z = Number(TextString.match(regZ)).toFixed(1);
const regA = /(?<=A:=)(-?[0-9]+)(.[0-9]+)?/gm;
let A = Number(TextString.match(regA)).toFixed(1);
const regB = /(?<=B:=)(-?[0-9]+)(.[0-9]+)?/gm;
let B = Number(TextString.match(regB)).toFixed(1);
// and many more duplicate code.
console.log(lineNumber, X, Y, Z, A, B);
I could only think of the approach above: match each variable individually and run .match() multiple times. But as you can see there are 17 variables in total, and in real situations there are hundreds of these TextString values. I am worried that this matching process will have a huge impact on performance.
Is there a way to fetch all variables in one match and store them in an array or object, or any other elegant way of doing this?
Every coordinate has a single-letter identifier, so you can use a more general positive lookbehind, (?<=,[A-Z]:=). This lookbehind matches a comma followed by a single uppercase letter and then the := assignment symbol.
You can then use .match() to get all matches and use .map() to run the conversion you were doing.
let TextString = `DRILL(NUMBER:=20,NAME:='4',PN:=1,X:=10.1,Y:=73.344,Z:=0,A:=-1.435,B:=1.045,M1:=1,M2:=2,M3:=3,M4:=4,M5:=1,S1:=10.5,S2:=2.1,S3:=1.2,S4:=2,S5:=2.4,RS1:=1,RS2:=2);`;
const regNumber = /(?<=NUMBER:=)[0-9]+/gm;
let lineNumber = Number(TextString.match(regNumber));
const regex = /(?<=,[A-Z]:=)(-?[0-9]+)(\.[0-9]+)?/gm;
let coord = TextString.match(regex).map(n => Number(n).toFixed(1));
console.log(lineNumber, coord);
You could write a single pattern:
(?<=\b(?:NUMBER|[XYZAB]):=)-?\d+(?:\.\d+)?\b
Explanation
(?<= Positive lookbehind, assert that to the left of the current position is
\b(?:NUMBER|[XYZAB]):= Match either NUMBER or one of X Y Z A B preceded by a word boundary and followed by :=
) Close the lookbehind
-? Match an optional -
\d+(?:\.\d+)? Match 1+ digits and an optional decimal part
\b A word boundary to prevent a partial word match
See a regex demo.
const TextString = `DRILL(NUMBER:=20,NAME:='4',PN:=1,X:=10.1,Y:=73.344,Z:=0,A:=-1.435,B:=1.045,M1:=1,M2:=2,M3:=3,M4:=4,M5:=1,S1:=10.5,S2:=2.1,S3:=1.2,S4:=2,S5:=2.4,RS1:=1,RS2:=2);`;
const regNumber = /(?<=\b(?:NUMBER|[XYZAB]):=)-?\d+(?:\.\d+)?\b/g;
const result = TextString
.match(regNumber)
.map(s =>
Number(s).toFixed(1)
);
console.log(result);
One possible approach is based on a regex pattern that uses capturing groups. A matching regex for the OP's sample text would look like this ...
/\b(NUMBER|[XYZAB])\:=([^,]+),/g
... and a description is provided on the regex's test site.
The pattern is both simple and generic. It is generic because it always captures both the matching key, like NUMBER, and its related value, like 20. Thus it doesn't matter where a key-value pair occurs within a drill-data string.
To assign all of the OP's variables at once later on with an object-based destructuring assignment, the post-processing step needs to reduce the result array of matchAll into an object that holds all the captured keys and values. Within this step one can also control how the values are computed and whether or how the keys get sanitized.
const regXDrillData = /\b(NUMBER|[XYZAB])\:=([^,]+),/g;
const textString =
`DRILL(NUMBER:=20,NAME:='4',PN:=1,X:=10.1,Y:=73.344,Z:=0,A:=-1.435,B:=1.045,M1:=1,M2:=2,M3:=3,M4:=4,M5:=1,S1:=10.5,S2:=2.1,S3:=1.2,S4:=2,S5:=2.4,RS1:=1,RS2:=2);`;
// - processed values via reducing the captured
// groups of a `matchAll` result array of a
// generic drill-data match-pattern.
const {
number: lineNumber,
x, y, z,
a, b,
} = [...textString.matchAll(regXDrillData)]
.reduce((result, [match, key, value]) => {
value = Number(value);
value = (key !== 'NUMBER') ? value.toFixed(1) : value;
return Object.assign(result, { [ key.toLowerCase() ]: value });
}, {})
console.log(
`processed values via reducing the captured
groups of a 'matchAll' result array of a
generic drill-data match-pattern ...`,
{ lineNumber, x, y, z, a, b },
);
Every value matches the pattern :=[value], (or :=[value]) for the last one). So here is my regex:
(?<=:=)-?[\d\w.']+(?=[,)])
Positive lookbehind (?<=:=): asserts that := comes right before the match
-?: matches an optional - (for negative numbers)
[\d\w.']+: matches one or more digits, word characters, . or '
Positive lookahead (?=[,)]): asserts that , or ) comes right after the match
Live regex101.com demo
Now change your code to
let TextString = `DRILL(NUMBER:=20,NAME:='4',PN:=1,X:=10.1,Y:=73.344,Z:=0,A:=-1.435,B:=1.045,M1:=1,M2:=2,M3:=3,M4:=4,M5:=1,S1:=10.5,S2:=2.1,S3:=1.2,S4:=2,S5:=2.4,RS1:=1,RS2:=2);`;
const regexPattern= /(?<=:=)-?[\d\w.']+(?=[,)])/g;
console.log(TextString.match(regexPattern))
// ['20', "'4'", '1', '10.1', '73.344', '0', '-1.435', '1.045', '1', '2', '3', '4', '1', '10.5', '2.1', '1.2', '2', '2.4', '1', '2']
Edit
I just realized that the positive lookahead is unnecessary, as @Peter Seliger mentioned:
(?<=:=)-?[\d\w.']+
Change your regex pattern to
const regexPattern= /(?<=:=)-?[\d\w.']+/g;
Here is a solution using a .reduce() on keys of interest and returns an object:
const TextString = `DRILL(NUMBER:=20,NAME:='4',PN:=1,X:=10.1,Y:=73.344,Z:=0,A:=-1.435,B:=1.045,M1:=1,M2:=2,M3:=3,M4:=4,M5:=1,S1:=10.5,S2:=2.1,S3:=1.2,S4:=2,S5:=2.4,RS1:=1,RS2:=2);`;
const keys = [ 'NUMBER', 'X', 'Y', 'Z', 'A', 'B' ];
let result = keys.reduce((obj, key) => {
const regex = new RegExp('(?<=\\b' + key + ':=)-?[0-9.]+');
obj[key] = Number(TextString.match(regex)).toFixed(1);
return obj;
}, {});
console.log(result);
Output:
{
"NUMBER": "20.0",
"X": "10.1",
"Y": "73.3",
"Z": "0.0",
"A": "-1.4",
"B": "1.0"
}
Notes:
The regex is built dynamically from the key
A \b word boundary is added to the regex to reduce the chance of unintended matches
If you need the line number as an integer you could take that out of the keys, and handle it separately.
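If every field is needed rather than a fixed list of keys, a single pass with matchAll can collect all key/value pairs at once. This is a sketch, assuming that values are either plain numbers or single-quoted strings, as in the sample line. The names parseDrill and pairRegex are hypothetical:

```javascript
const drillLine = `DRILL(NUMBER:=20,NAME:='4',PN:=1,X:=10.1,Y:=73.344,Z:=0,A:=-1.435,B:=1.045,M1:=1,M2:=2,M3:=3,M4:=4,M5:=1,S1:=10.5,S2:=2.1,S3:=1.2,S4:=2,S5:=2.4,RS1:=1,RS2:=2);`;

// Hypothetical helper: parse every KEY:=value pair of one line into an object.
function parseDrill(line) {
  // Group 1 is the key; group 2 is a quoted string or a (possibly negative) number.
  const pairRegex = /(\w+):=('[^']*'|-?\d+(?:\.\d+)?)/g;
  const result = {};
  for (const [, key, raw] of line.matchAll(pairRegex)) {
    // Quoted values stay strings (without the quotes); everything else becomes a number.
    result[key] = raw.startsWith("'") ? raw.slice(1, -1) : Number(raw);
  }
  return result;
}

const drill = parseDrill(drillLine);
console.log(drill.NUMBER, drill.NAME, drill.X.toFixed(1), drill.A.toFixed(1));
// 20 4 10.1 -1.4
```

Keeping the raw numbers in the object and calling .toFixed(1) only when formatting avoids turning every value into a string up front.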
I want to replace multiple parts of a string with different things. I have a series of URLs that contain the strings that need to change; they all follow the same pattern.
e.g.
'spanish-beginners-course'
'italian-beginners-course'
'spanish-italian-beginners-course'
I just want the result to be the languages, e.g. spanish, italian, spanish italian.
I have tried this as a test, but it returns 'spanish undefined undefined'.
const pageName = 'spanish-beginners-course'
const chars = { '-beginners': '', '-course': '', '-': ' ' }
const language = pageName.replace(/-|beginners|course/g, m => chars[m])
This happens because your regex matches beginners, but your chars object has no key called beginners; the key is -beginners. The same goes for course and -course.
const pageName = 'spanish-beginners-course'
const chars = { '-beginners': '', '-course': '', '-': ' ' }
const language = pageName.replace(/-|beginners|course/g, m => chars[m])
In any case, your object is unnecessary, and so is the regex (as @Alastair points out), since you're replacing a static, unchanging substring.
const language = pageName.replace('-beginners-course', '');
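If you do want to keep the lookup object, the alternation has to match exactly the keys the object contains. Here is a sketch of that variant, assuming the longer alternatives are listed before the bare hyphen so that they are tried first. The name toLanguage is hypothetical:

```javascript
const chars = { '-beginners': '', '-course': '', '-': ' ' };

// Each alternative of the regex is also a key of chars, so the lookup never
// yields undefined. '-beginners' and '-course' come before '-' on purpose.
const toLanguage = name => name.replace(/-beginners|-course|-/g, m => chars[m]);

console.log(toLanguage('spanish-beginners-course'));         // spanish
console.log(toLanguage('italian-beginners-course'));         // italian
console.log(toLanguage('spanish-italian-beginners-course')); // spanish italian
```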
Your case is very simple: you can split on the hyphen and take the first element.
let pageName = "spanish-beginners-course";
let language = pageName.split(/-/)[0];
console.log(language); // spanish
pageName = "italian-beginners-course";
language = pageName.split(/-/)[0];
console.log(language); //italian
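For a name with more than one language, such as 'spanish-italian-beginners-course', taking only the first element drops 'italian'. Here is a sketch of a split-based variant, assuming every name ends in exactly the two segments 'beginners' and 'course'. The name languagesOf is hypothetical:

```javascript
// Split on hyphens, drop the last two segments, and join the rest with spaces.
const languagesOf = name => name.split('-').slice(0, -2).join(' ');

console.log(languagesOf('spanish-beginners-course'));         // spanish
console.log(languagesOf('spanish-italian-beginners-course')); // spanish italian
```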
You get undefined because -|beginners|course is an alternation that matches either -, beginners or course.
You use the match to look up the value in the object, but the object's keys are -beginners and -course.
If you want to do the replacement, and there has to be at least one word before it, you could capture that leading part in a group, match what you want to remove after it, and use $1 in the replacement.
(\w+(?:-\w+)*)-beginners-course\b
Regex demo
const pageName = 'spanish-beginners-course';
const language = pageName.replace(/(\w+(?:-\w+)*)-beginners-course\b/g, "$1");
console.log(language)
[
'spanish-beginners-course',
'italian-beginners-course',
'spanish-italian-beginners-course',
'donotremove!-beginners-course'
]
.forEach(s => console.log(s.replace(/(\w+(?:-\w+)*)-beginners-course\b/g, "$1")));
I have a regular expression to search in a string.
new RegExp("\\b"+searchText+"\\b", "i")
My strings:
"You are likely to find [[children]] in [[a school]]"
"[[school]] is for [[learning]]"
How can I search only the words in double brackets?
The regular expression should be built from searchText, which is passed in as a function argument.
This RegEx will give you the basics of what you want:
\[\[[^\]]*\]\]
\[\[ matches the two starting brackets. A bracket is a special character in RegEx, hence it must be escaped with \
[^\]]* is a negated set that matches zero or more of any character except a closing bracket. This matches the content in-between the brackets.
\]\] matches the two closing brackets.
Here's a very basic example of what you could do with this:
let string = "You are likely to find [[children]] in [[a school]]<br>[[school]] is for [[learning]]";
string = string.replace(/\[\[[^\]]*\]\]/g, x => `<mark>${x}</mark>`);
document.body.innerHTML = string;
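The pattern above does not yet use searchText. Here is a sketch of how the two might be combined, assuming searchText should match as a whole word inside a [[...]] pair and that brackets are never nested. The names escapeRegExp and bracketSearch are hypothetical helpers:

```javascript
// Escape characters that are special in a regex, so searchText is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive regex that finds searchText as a whole word
// somewhere between "[[" and the next "]]".
function bracketSearch(searchText) {
  return new RegExp(
    '\\[\\[[^\\]]*\\b' + escapeRegExp(searchText) + '\\b[^\\]]*\\]\\]',
    'i'
  );
}

const first = 'You are likely to find [[children]] in [[a school]]';
const second = '[[school]] is for [[learning]]';

console.log(bracketSearch('school').test(first));  // true
console.log(bracketSearch('School').test(second)); // true
console.log(bracketSearch('find').test(first));    // false
console.log(bracketSearch('is').test(second));     // false
```

Because [^\]]* cannot cross a closing bracket, words outside the double brackets, such as find, are not matched.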
You can use this regex:
var str = `-- You are likely to find [[children]] in [[a school]]
-- [[school]] is for [[learning]]`;
var regex = /(?<=(\[\[))([\w\s]*)(?=(\]\]))/gm;
var match = str.match(regex);
console.log(match);
const re = /(?<=\[\[)[^\]]+(?=]])/gm
const string = `-- You are likely to find [[children]] in [[a school]]
-- [[school]] is for [[learning]]`
console.log(string.match(re))
const replacement = {
children: 'adults',
'a school': 'a home',
school: 'home',
learning: 'rest',
}
console.log(string.split(/(?<=\[\[)[^\]]+(?=]])/).map((part, index) => part + (replacement[string.match(re)[index]] || '')).join(''))
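The same replacement can also be written as a single replace call with a callback, which avoids re-running match for every part. This is a sketch, assuming that bracketed words missing from the lookup should be left unchanged (the split-based line above drops them instead). The names replaceBracketed and replacements are hypothetical:

```javascript
// Lookup table as a Map, so only the listed words are ever replaced.
const replacements = new Map([
  ['children', 'adults'],
  ['a school', 'a home'],
  ['school', 'home'],
  ['learning', 'rest'],
]);

// Replace the text between "[[" and "]]" and keep the brackets themselves.
function replaceBracketed(text, table) {
  return text.replace(/(?<=\[\[)[^\]]+(?=\]\])/g, word => table.get(word) ?? word);
}

console.log(replaceBracketed('You are likely to find [[children]] in [[a school]]', replacements));
// You are likely to find [[adults]] in [[a home]]
console.log(replaceBracketed('[[school]] is for [[learning]]', replacements));
// [[home]] is for [[rest]]
```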