What is lookbehind support in JS? How to replace it?

I have a string, and I want to replace every 'i' that is neither preceded nor followed by another 'i' with 'z'. I know that there are negative lookahead and lookbehind.
Results should be:
i => z
iki => zkz
iiki => iikz
ii => ii
iii => iii
I tried to use this:
/(?<!i)i(?!i)/gi
but it failed with an error: Invalid regex group.
Yet
/i(?!i)/gi
works, but it also matches the second "i" in "ii".
Is there some other way?
What is the support for lookbehind in JS, if any?

In your case you don't really need look-behind:
'iiki'.replace(/i+/g, (m0) => m0.length > 1 ? m0 : 'z')
You can just use a function as the replacement part and test the length of the matched string.
Here are all your test cases:
function test(input, expect) {
const result = input.replace(/i+/g, (m0) => m0.length > 1 ? m0 : 'z');
console.log(input + " => " + result + " // " + (result === expect ? "Good" : "ERROR"));
}
test('i', 'z');
test('iki', 'zkz');
test('iiki', 'iikz');
test('ii', 'ii');
test('iii', 'iii');

Lookbehind in JavaScript regular expressions is quite new. As of this writing, it's only supported in V8 (in Chrome, Chromium, Brave...), not by other engines. (Lookbehind was standardized in ES2018, and Firefox and Safari have since added support as well.)
There are many questions with answers here about how to work around not having lookbehind, such as this one.
This article by Steven Levithan also shows ways to work around the absence of the feature.
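A minimal sketch, assuming you want to use lookbehind where the engine supports it and fall back otherwise. The pattern is built with the RegExp constructor inside try/catch, so an engine without lookbehind throws a catchable SyntaxError instead of failing to parse the whole script. The function names `supportsLookbehind` and `replaceLoneI` are hypothetical; the fallback is the `i+` replacement-function technique shown earlier.

```javascript
// Returns true when the engine accepts lookbehind syntax.
function supportsLookbehind() {
  try {
    // Built at runtime: an unsupported pattern throws a catchable
    // SyntaxError here instead of breaking the whole script at parse time.
    new RegExp("(?<!i)i");
    return true;
  } catch (e) {
    return false;
  }
}

// Replace every "i" that has no "i" directly before or after it.
function replaceLoneI(input) {
  if (supportsLookbehind()) {
    return input.replace(new RegExp("(?<!i)i(?!i)", "g"), "z");
  }
  // Fallback without lookbehind: keep runs of two or more "i".
  return input.replace(/i+/g, run => (run.length > 1 ? run : "z"));
}

console.log(["i", "iki", "iiki", "ii", "iii"].map(replaceLoneI));
// ["z", "zkz", "iikz", "ii", "iii"]
```

In real code you would probably call `supportsLookbehind()` once and cache the result rather than rebuilding the test pattern on every call.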
I want to replace every 'i' that is NOT following/followed by any other i and replace it with 'z'
That's fairly easy to do without either lookahead or lookbehind, using a replacement function and a capture group. You can capture what follows the i:
const rex = /i(i+|.|$)/g;
...and then conditionally replace it if what was captured isn't an i or a series of i's:
const result = input.replace(rex, (m, c) => {
return c[0] === "i" ? m : "z" + c;
});
Live Example:
const rex = /i(i+|.|$)/g;
function test(input, expect) {
const result = input.replace(rex, (m, c) => {
return c[0] === "i" ? m : "z" + c;
});
console.log(input, result, result === expect ? "Good" : "ERROR");
}
test("i", "z");
test("iki", "zkz");
test("iiki", "iikz");
test("ii", "ii");
test("iii", "iii");

One hack you can use in this case is to check the character before each match, using the offset argument passed to the replacement function:
let arr = ['i','iki','iiki','ii','iii', 'ki']
arr.forEach(e=>{
let value = e.replace(/i(?!i)/g, function(match,offset,string){
return offset > 0 && string[offset-1] === 'i' ? 'i' : 'z'
})
console.log(value)
})

Related

Can this problem be solved using functional programming only?

The prompt was:
Create a function that takes in a string and returns a "URL version" of the string. This simply involves replacing the spaces with %20.
It asked for the problem to be solved using recursion, and using .replace is not allowed.
Here is my solution, but I understand that outputArray is being mutated. Is there any other way to solve this without mutation?
let inputString = "hello world I am fine";
let outputArray = [];
let stringToUrl = (inputString, n) => {
inputArray = [...inputString]
if(n < inputArray.length) {
if(inputArray[n] !== " ") {
outputArray.push(inputArray[n])
return stringToUrl(inputArray, n+1)
}
else {
outputArray.push("%20")
return stringToUrl(inputArray, n+1)
}
}
return outputArray.join('');
}
console.log(stringToUrl(inputString, 0))

Yes, you can do this with FP. In keeping with How do I ask and answer homework questions?, I won't reply with code, but with pointers.
If you weren't doing this with FP (but still had to write it yourself rather than using the string replace method, etc.), you'd probably write a loop that builds up a new string by going through the original string character by character, adding either the original character or %20 to the new string.
In FP, loops are often done via recursion, and your instructions are to use recursion, so we'll do that instead.
Your function should handle the first character in the string it's given (either keeping it or replacing it with %20), and if that character is the only character, just return that updated "character;" otherwise, it should return the updated character followed by the result of passing the rest of the string (all but that first character) through your function again. That will work through the entire string via recursion, building up the new string. (No need for arrays, string concatenation and substring should be fine.)

Here I have made some changes to your code; hopefully this solves your problem.
Instead of a second array, this version writes "%20" directly into the array of characters.
let inputString = "hello world I am fine";
let stringToUrl = (inputString, n) => {
const inputArray = [...inputString]
if(n < inputArray.length) {
if(inputArray[n] === " ") {
inputArray[n] = "%20"
return stringToUrl(inputArray, n+1)
}
else {
return stringToUrl(inputArray, n+1)
}
}
return inputArray.join('');
}
console.log(stringToUrl(inputString, 0))

const replace = (char: string) => char === ' ' ? '%20' : char;
const convert = (str: string, cache = ''): string => {
const [head, ...tail] = str;
return head
? convert(
tail.join(''),
cache.concat(replace(head))
)
: cache
}
const result = convert("hello world I am fine") // hello%20world%20I%20am%20fine
I hope this task is not language-agnostic, because JS is not the best choice for deep recursion: most engines do not implement tail-call optimization, so very long strings can overflow the call stack.

One option is to have stringToUrl call an inner recursive function that uses default parameters to carry its state (the remaining string, the result so far, and the current character) as function arguments.
For example, using an arrow function, plus a function parameter f that appends either %20 or the original character to the result string:
const stringToUrl = str => {
const func = (
s,
r = "",
c = s.charAt(0),
f = () => r += c === ' ' ? '%20' : c
) => s.length ? f() && func(s.slice(1), r) : r
return func(str)
}
console.log(stringToUrl("hello world I am fine"));
Output
hello%20world%20I%20am%20fine
[
"",
" ",
" ",
"hello world I am fine"
].forEach(s =>
console.log(`[${s}] --> ${stringToUrl(s)}`)
);

Match all instances of character except the first one, without lookbehind

I’m struggling with this simple regex that is not working correctly in Safari:
(?<=\?.*)\?
It should match each ?, except the first one.
I know that lookbehind is not working on Safari yet, but I need to find some workaround for it. Any suggestions?

You can use an alternation. The first branch, anchored with ^, captures everything up to and including the first question mark; use that group in the replacement to leave it unmodified.
The second branch of the alternation matches each remaining question mark, which gets replaced.
const regex = /^([^?]*\?)|\?/g;
const s = "test ? test ? test ?? test /";
console.log(s.replace(regex, (m, g1) => g1 ? g1 : "[REPLACE]"));

There are always alternatives to lookbehinds.
In this case, all you need to do is replace all instances of a character (sequence), except the first.
The .replace method accepts a function as the second argument.
That function receives the full match, each capture group match (if any), the offset of the match, and a few other things as parameters.
.indexOf can report the first offset of a match.
Alternatively, .search can also report the first offset of a match, but works with regexes.
The two offsets can be compared inside the function:
const yourString = "Hello? World? What? Who?",
yourReplacement = "!",
pattern = /\?/g,
patternString = "?",
firstMatchOffsetIndexOf = yourString.indexOf(patternString),
firstMatchOffsetSearch = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetIndexOf){
return yourReplacement;
}
return match;
}));
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetSearch){
return yourReplacement;
}
return match;
}));
This works for character sequences, too:
const yourString = "Hello. Hello. Hello. Hello.",
yourReplacement = "Hi",
pattern = /Hello/g,
firstOffset = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstOffset){
return yourReplacement;
}
return match;
}));

Split on "?" and join the remaining parts with the replacement:
var s = "one ? two ? three ? four"
var l = s.split("?") // Split with ?
var first = l.shift() // Get first item and remove from l
console.log(first + "?" + l.join("<REPLACED>")) // Build the results
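One caveat with this split/join approach: when the string contains no "?" at all, split returns a one-element array and the code above still appends a "?". A minimal sketch of a guarded helper (the name `replaceAllButFirst` is hypothetical):

```javascript
// Replace every occurrence of `target` except the first one.
// Returns the input unchanged when `target` does not occur.
function replaceAllButFirst(str, target, replacement) {
  const parts = str.split(target);
  if (parts.length < 2) return str; // no occurrence: nothing to keep or replace
  const [first, ...rest] = parts;
  return first + target + rest.join(replacement);
}

console.log(replaceAllButFirst("one ? two ? three ? four", "?", "<REPLACED>"));
// one ? two <REPLACED> three <REPLACED> four
console.log(replaceAllButFirst("no marks here", "?", "<REPLACED>"));
// no marks here
```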

Why non-space elements in string aren't changing to upperCase

I want to write a function to change the characters in a string at even indices to uppercase. I don't want my program to count the spaces as an even index, even if it falls on an even index.
For example: 'This is a test' => 'ThIs Is A TeSt'
I originally had this solution, but I could not get it to work to ignore the space characters when counting the even indices.
function toWeirdCase(string) {
return string.split("").map((x,i) => i%2=== 0 && x!== " " ? x.toUpperCase() : x).join("")
}
This is my second attempt and I don't know why the string elements aren't actually changing to uppercase. Any help on this would be appreciated. It is just returning the original string.
function toWeirdCase(string) {
let indexCount = 0;
let isSpace = false;
for (let i = 0; i < string.length; i++) {
if (string[i] === " ") {
isSpace = true;
}
if (indexCount % 2 === 0 && !isSpace) {
string[indexCount] = string[indexCount].toUpperCase();
}
indexCount++;
isSpace = false;
}
return string;
}

Answer:
You can use a modified reduce function that utilizes a closure as a character counter. This has the benefit of completing the transformation in one pass:
["", ...str].reduce((n =>
(r, c) => r += /\s/.test(c) ? c : c[`to${n++ & 1 ? "Lower" : "Upper"}Case`]())(0)
);
Example:
const biUpperCase = str => ["", ...str].reduce((n =>
(r, c) =>r += /\s/.test(c) ? c : c[`to${n++ & 1 ? "Lower" : "Upper"}Case`]())(0)
);
let test = biUpperCase("This is a Test");
console.log(test);
Explanation:
n is a character counter that keeps track of all non-space characters. You can think of this as an additional index that only worries about non-space characters.
We use this to determine whether or not a character is an even or odd non-space character by performing bitwise AND ( n & 1 ) or, alternatively, by performing a modulus operation ( n % 2 )
r is the accumulator in the Array.prototype.reduce method. This is what is returned by our reduce method.
Since no initial value is passed to Array.prototype.reduce, the first element of the array is used as the initial accumulator.
This is why we perform ["", ...str] instead of simply [...str].
Syntactically we could also have written [...str].reduce( fn , "" ) instead of ["", ...str].reduce( fn ), but this would alter our succinct code.
c is the current character that we are looking at within the string array. We use RegExp.prototype.test to determine whether it's a whitespace character.
if it is, we simply add it to r ( our accumulated string )
if it is not, we add a transformed character to r ( our accumulated string )
To determine which case transformation ( upper or lower ) should be applied, we check whether n ( our character counter ) is even or odd.
if n++ & 1 is truthy the case is lower
if n++ & 1 is falsy the case is upper
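For comparison, here is a sketch of the same logic as a plain loop (the name `biUpperCaseLoop` is hypothetical). `n` counts only non-whitespace characters, and that count, not the string index, decides between upper and lower case. Because the counter never resets at word boundaries, "This is a Test" becomes "ThIs Is A tEsT" rather than the question's "ThIs Is A TeSt".

```javascript
function biUpperCaseLoop(str) {
  let n = 0; // counts non-whitespace characters only
  let result = "";
  for (const c of str) {
    if (/\s/.test(c)) {
      result += c; // whitespace is copied and not counted
    } else {
      result += n % 2 === 0 ? c.toUpperCase() : c.toLowerCase();
      n++;
    }
  }
  return result;
}

console.log(biUpperCaseLoop("This is a Test")); // "ThIs Is A tEsT"
```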
Aside:
You'll notice in the snippet and code I provided that I changed your parameter name string to str. The reason is that String is a built-in constructor in JavaScript, and it's best never to purposefully "cross the streams" when naming variables.
In the current way that you're attempting to use this variable it makes no difference since it's properly scoped, and truthfully it is up to you if you want to take my advice. Just be aware that it could lead to an annoying, invisible problem.
Hope this Helps! Happy Coding!

You could reset the index counter at the start of each word (here, whenever a non-letter is encountered).
function toWeirdCase(string) {
return Array
.from(
string,
(i => x => (/[a-z]/i.test(x) ? i++ : (i = 0)) % 2 ? x : x.toUpperCase())
(0)
)
.join('');
}
console.log(toWeirdCase('This is a test')); // 'ThIs Is A TeSt'

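Strings in JavaScript are immutable, so assigning to string[indexCount] cannot change the string: the assignment is silently ignored in sloppy mode and throws a TypeError in strict mode. The function has to build and return a new string, and the caller has to use the result, something like: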
let test = 'This is a test';
test = toWeirdCase(test); //Here you assign the result
And here is an example solution that ignores spaces in the count:
function toWeirdCase(string) {
let count = 0;
return string.split("").map((x) => {
if(x !== " ")
count++;
if(count%2 === 0) return x.toUpperCase();
else return x;
}).join("")
}
let test = 'This is a test';
test = toWeirdCase(test);
console.log(test); //THiS iS a TeSt
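The question's second attempt can be repaired along the same lines: copy the characters into an array (arrays, unlike strings, are mutable), modify the array, and join it back. A minimal sketch; it assumes the counter should restart after each space, which is what reproduces the question's expected 'ThIs Is A TeSt':

```javascript
function toWeirdCase(str) {
  const chars = [...str]; // arrays can be modified in place, strings cannot
  let wordIndex = 0; // position within the current word
  for (let i = 0; i < chars.length; i++) {
    if (chars[i] === " ") {
      wordIndex = 0; // restart counting at the next word
      continue;
    }
    if (wordIndex % 2 === 0) {
      chars[i] = chars[i].toUpperCase();
    }
    wordIndex++;
  }
  return chars.join("");
}

console.log(toWeirdCase("This is a test")); // "ThIs Is A TeSt"
```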

Like the comments mention, strings in Javascript are immutable. That being said, you can break down the input string on whitespace, do the transformations, and join back into a string, something like this -
function toWeirdCase(sentence) {
return sentence
.split(' ')
.map(word => word
.split('')
.map((c, i) => i % 2 ? c : c.toUpperCase())
.join('')).join(' ');
}

You could store the number of spaces in a variable in the function's scope.
function toWeirdCase(string) {
let spaceCount = 0;
// Personal preference: I like the reduce fn for this, but a similar thing could be achieved with map
return string.split('').reduce((value, letter, index) => {
if (letter === ' ') spaceCount++;
return value += ((index - spaceCount) % 2)
? letter
: letter.toUpperCase();
},'')
}
This returns the letter unchanged if its index, not counting the spaces before it, is odd (has a remainder when divided by 2), and uppercases it otherwise.

You can achieve this like so:
const str = "this is a test";
function toWeirdCase(str) {
return str.split(" ").map(word => (
[...word].map((c, i) => i % 2 ?
c.toLowerCase() :
c.toUpperCase())).join("")).join(" ");
}
console.log(toWeirdCase(str));
Updated: odd indexes are now set with toLowerCase() to handle edge cases like acronyms (e.g. currency acronyms such as "CA" and "USD").
Hope this helps,

How to remove specific character surrounding a string?

I have this string:
var str = "? this is a ? test ?";
Now I want to get this:
var newstr = "this is a ? test";
As you can see, I want to remove only the ? characters surrounding the string (at the beginning and end), not the one in the middle. How can I do that using JavaScript?
Here is what I have tried:
var str = "? this is a ? test ?";
var result = str.trim("?");
document.write(result);
So, as you can see, it doesn't work. I'm actually a PHP developer, and trim() works well in PHP. Now I want to know if I can use trim() to do that in JS.
It should be noted that I can do this with a regex, but to be honest I hate regex for this kind of job. Is there a better solution?
Edit: As mentioned in the comments, I need to remove both the ? and the whitespace surrounding the string.

Strip every character that appears in the mask from both ends and return the rest.
This proposal uses the bitwise NOT operator ~ for the check.
~ pairs well with indexOf(), because indexOf returns the index 0 ... n if the value is found and -1 if not, and ~ turns only -1 into 0 (falsy):
value ~value boolean
-1 => 0 => false
0 => -1 => true
1 => -2 => true
2 => -3 => true
and so on
function trim(s, mask) {
while (~mask.indexOf(s[0])) {
s = s.slice(1);
}
while (~mask.indexOf(s[s.length - 1])) {
s = s.slice(0, -1);
}
return s;
}
console.log(trim('??? this is a ? test ?', '? '));
console.log(trim('abc this is a ? test abc', 'cba '));

Simply use:
let text = '?? something ? really ??'
text = text.replace(/^([?]*)/g, '')
text = text.replace(/([?]*)$/g, '')
console.log(text)

A possible solution would be to use recursive functions to remove the unwanted leading and trailing characters. This doesn't use regular expressions.
function ltrim(char, str) {
if (str.slice(0, char.length) === char) {
return ltrim(char, str.slice(char.length));
} else {
return str;
}
}
function rtrim(char, str) {
if (str.slice(str.length - char.length) === char) {
return rtrim(char, str.slice(0, 0 - char.length));
} else {
return str;
}
}
Of course this is only one of many possible solutions. The function trim would use both ltrim and rtrim.
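The reason that char is the first argument and the string to be cleaned is the second is to make it easier to turn this into a curried, functional-programming-style function, like so (ES2015):
// ltrim(char) returns a function that waits for the string;
// the char && guard prevents infinite recursion when char is ""
const ltrim = char => str =>
char && str.slice(0, char.length) === char ? ltrim(char)(str.slice(char.length)) : str;
// No need to specify str here
const ltrimSpaces = ltrim(' ');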
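The answer mentions that trim would combine ltrim and rtrim. A minimal sketch of that combination, repeating the two helpers so the snippet runs on its own. Since ltrim/rtrim strip one repeated character sequence, a final .trim() handles the whitespace the question also wants removed. The name `trimSurrounding` is hypothetical, and the empty-char guard is an addition to the original helpers.

```javascript
function ltrim(char, str) {
  // `char &&` avoids infinite recursion when char is ""
  if (char && str.slice(0, char.length) === char) {
    return ltrim(char, str.slice(char.length));
  }
  return str;
}

function rtrim(char, str) {
  if (char && str.slice(str.length - char.length) === char) {
    return rtrim(char, str.slice(0, -char.length));
  }
  return str;
}

// Strip `char` from both ends, then strip surrounding whitespace.
function trimSurrounding(char, str) {
  return ltrim(char, rtrim(char, str)).trim();
}

console.log(trimSurrounding("?", "? this is a ? test ?")); // "this is a ? test"
```

Note that a pattern like "? ? text" is only partly trimmed, because whitespace between the ? characters stops ltrim.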

Here is one way to do it which checks for index-out-of-bounds and makes only a single call to substring:
String.prototype.trimChars = function(chars) {
var l = 0;
var r = this.length-1;
while(chars.indexOf(this[l]) >= 0 && l < r) l++;
while(chars.indexOf(this[r]) >= 0 && r >= l) r--;
return this.substring(l, r+1);
};
Example:
var str = "? this is a ? test ?";
str.trimChars(" ?"); // "this is a ? test"

No regex:
const uberTrim = s => s.length >= 2 && (s[0] === s[s.length - 1])?
s.slice(1, -1).trim()
: s;
Step-by-step explanation:
Check if the string is at least 2 characters long and if it is surrounded by a specific character;
If it is, then first slice it to remove the surrounding characters then trim it to remove whitespaces;
If not just return it.
In case you're weirded out by that syntax, it's an Arrow Function and a ternary operator.
The parentheses in the ternary are superfluous, by the way.
Example use:
uberTrim(''); // ''
uberTrim(' Plop! '); //'Plop!'
uberTrim('! ...What is Plop?!'); //'...What is Plop?'

Simple approach using the String.prototype.indexOf, lastIndexOf and slice methods:
Update: (note: the author has requested to trim the surrounding chars)
function trimChars(str, char){
var str = str.trim();
var checkCharCount = function(side) {
var inner_str = (side == "left")? str : str.split("").reverse().join(""),
count = 0;
for (var i = 0, len = inner_str.length; i < len; i++) {
if (inner_str[i] !== char) {
break;
}
count++;
}
return (side == "left")? count : str.length - count;
};
if (typeof char === "string"
&& str.indexOf(char) === 0
&& str.lastIndexOf(char) === str.length - 1) {
str = str.slice(checkCharCount("left"), checkCharCount("right")).trim();
}
return str;
}
var str = "???? this is a ? test ??????";
console.log(trimChars(str, "?")); // "this is a ? test"

To keep this question up to date, here is an ES6 approach.
I like the bitwise method, but when readability is also a concern, here's another approach:
function trimByChar(string, character) {
const first = [...string].findIndex(char => char !== character);
if (first === -1) return ""; // the string consists only of `character`
const last = [...string].reverse().findIndex(char => char !== character);
return string.substring(first, string.length - last);
}
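The question asks to remove both ? and whitespace, so a single `character` is not quite enough. A minimal sketch generalizing the findIndex idea to a set of characters given as a string (the name `trimByChars` is hypothetical):

```javascript
// Trim any character contained in `chars` from both ends of `str`.
function trimByChars(str, chars) {
  const codePoints = [...str];
  const isTrimmed = c => chars.includes(c);
  const first = codePoints.findIndex(c => !isTrimmed(c));
  if (first === -1) return ""; // every character is trimmable
  let last = codePoints.length - 1;
  while (isTrimmed(codePoints[last])) last--; // stops at or after `first`
  return codePoints.slice(first, last + 1).join("");
}

console.log(trimByChars("? this is a ? test ?", "? ")); // "this is a ? test"
```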

Using regex
'? this is a ? test ?'.replace(/^[? ]*(.*?)[? ]*$/g, '$1')
You may hate regex but after finding a solution you will feel cool :)

JavaScript's trim method only removes whitespace, and it takes no parameters. For a custom trim, you will have to write your own function. A regex makes for a quick solution, and you can find an implementation of a custom trim on w3schools if you don't want to go through the trouble of creating the regex yourself (you'd just have to adjust it to filter ? instead of whitespace).

This one line of code returns your desired output:
"? this is a ? test ?".slice(1).slice(0,-1).trim();

Using JavaScript to perform text matches with/without accented characters

I am using an AJAX-based lookup for names that a user searches in a text box.
I am making the assumption that all names in the database will be transliterated to European alphabets (i.e. no Cyrillic, Japanese, Chinese). However, the names will still contain accented characters, such as ç, ê and even č and ć.
A simple search like "Micic" will not match "Mičić" though - and the user expectation is that it will.
The AJAX lookup uses regular expressions to determine a match. I have modified the regular expression comparison using this function in an attempt to match more accented characters. However, it's a little clumsy since it doesn't take into account all characters.
function makeComp (input)
{
input = input.toLowerCase ();
var output = '';
for (var i = 0; i < input.length; i ++)
{
if (input.charAt (i) == 'a')
output = output + '[aàáâãäåæ]'
else if (input.charAt (i) == 'c')
output = output + '[cç]';
else if (input.charAt (i) == 'e')
output = output + '[eèéêëæ]';
else if (input.charAt (i) == 'i')
output = output + '[iìíîï]';
else if (input.charAt (i) == 'n')
output = output + '[nñ]';
else if (input.charAt (i) == 'o')
output = output + '[oòóôõöø]';
else if (input.charAt (i) == 's')
output = output + '[sß]';
else if (input.charAt (i) == 'u')
output = output + '[uùúûü]';
else if (input.charAt (i) == 'y')
output = output + '[yÿ]'
else
output = output + input.charAt (i);
}
return output;
}
Apart from a substitution function like this, is there a better way? Perhaps to "deaccent" the string being compared?

There is a way to "deaccent" the string being compared without using a substitution function that lists all the accents you want to remove.
Here is the easiest solution I can think of to remove accents (and other diacritics) from a string.
See it in action:
var string = "Ça été Mičić. ÀÉÏÓÛ";
console.log(string);
var string_norm = string.normalize('NFD').replace(/\p{Diacritic}/gu, ""); // Old method: .replace(/[\u0300-\u036f]/g, "");
console.log(string_norm);
.normalize(…) decomposes the letters and diacritics.
.replace(…) removes all the diacritics.
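Applied to the lookup in the question, a minimal sketch folds both the stored name and the search text the same way before comparing (the names `fold` and `matchesIgnoringAccents` are hypothetical). Characters without a canonical decomposition, such as ß and æ, are not changed by NFD, so they would still need explicit mapping.

```javascript
// Decompose letters (NFD), drop the diacritics, then ignore case.
const fold = s => s.normalize("NFD").replace(/\p{Diacritic}/gu, "").toLowerCase();

// True when `query` occurs in `name`, ignoring accents and case.
function matchesIgnoringAccents(name, query) {
  return fold(name).includes(fold(query));
}

console.log(matchesIgnoringAccents("Mičić", "micic")); // true
console.log(matchesIgnoringAccents("Françoise", "FRANC")); // true
```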

I came upon this old thread and thought I'd try my hand at writing a fast function. I'm relying on the fact that, in the callback that replace() calls, only the capture group of the alternative that actually matched is set. My goal was to use the standard regex implementation behind JavaScript's replace() as much as possible, so that the heavy processing takes place in low-level, browser-optimized code instead of in expensive character-by-character comparisons in JavaScript.
It's not scientific at all, but my old Huawei IDEOS Android phone is sluggish when I plug the other functions in this thread into my autocomplete, while this function zips along:
function accentFold(inStr) {
return inStr.replace(
/([àáâãäå])|([çčć])|([èéêë])|([ìíîï])|([ñ])|([òóôõöø])|([ß])|([ùúûü])|([ÿ])|([æ])/g,
function (str, a, c, e, i, n, o, s, u, y, ae) {
if (a) return 'a';
if (c) return 'c';
if (e) return 'e';
if (i) return 'i';
if (n) return 'n';
if (o) return 'o';
if (s) return 's';
if (u) return 'u';
if (y) return 'y';
if (ae) return 'ae';
}
);
}
If you're a jQuery dev, here's a handy example of using this function; you could use :icontains the same way you'd use :contains in a selector:
jQuery.expr[':'].icontains = function (obj, index, meta, stack) {
return accentFold(
(obj.textContent || obj.innerText || jQuery(obj).text() || '').toLowerCase()
)
.indexOf(accentFold(meta[3].toLowerCase())
) >= 0;
};

I searched and upvoted herostwist's answer but kept searching, and here is a modern solution built into JavaScript (the String.prototype.localeCompare function):
var a = 'réservé'; // with accents, lowercase
var b = 'RESERVE'; // no accents, uppercase
console.log(a.localeCompare(b));
// expected output: 1
console.log(a.localeCompare(b, 'en', {sensitivity: 'base'}));
// expected output: 0
Note, however, that at the time of writing, full support was still missing in some mobile browsers.
Check support across all the platforms and environments you target.
Is that all?
No, we can go further right now and use the String.prototype.toLocaleLowerCase function.
var dotted = 'İstanbul';
console.log('EN-US: ' + dotted.toLocaleLowerCase('en-US'));
// expected output: "i̇stanbul" (an "i" followed by U+0307 COMBINING DOT ABOVE)
console.log('TR: ' + dotted.toLocaleLowerCase('tr'));
// expected output: "istanbul"
Thank you!

There is no easier way to "deaccent" that I can think of, but your substitution could be streamlined a little more:
var makeComp = (function(){
var accents = {
a: 'àáâãäåæ',
c: 'ç',
e: 'èéêëæ',
i: 'ìíîï',
n: 'ñ',
o: 'òóôõöø',
s: 'ß',
u: 'ùúûü',
y: 'ÿ'
},
chars = /[aceinosuy]/g;
return function makeComp(input) {
return input.toLowerCase().replace(chars, function(c){
return '[' + c + accents[c] + ']';
});
};
}());

I think this is the neatest solution
var nIC = new Intl.Collator(undefined , {sensitivity: 'base'})
var cmp = nIC.compare.bind(nIC)
cmp(a, b) returns 0 if the two strings are the same, ignoring accents (and, with base sensitivity, case).
Alternatively, you can use localeCompare:
'être'.localeCompare('etre',undefined,{sensitivity: 'base'})
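A collator compares whole strings, so for an autocomplete-style lookup one option is to compare only the beginning of each name with the query. A minimal sketch under two assumptions: the "en" locale (in some locales, e.g. Czech, letters such as "č" are distinct base letters and would not match "c"), and both strings being in NFC form so that slicing by the query's length lines up. The name `filterByPrefix` is hypothetical.

```javascript
// Base sensitivity: accents and case are ignored.
const collator = new Intl.Collator("en", { sensitivity: "base" });

// Keep names whose beginning equals the query, ignoring accents and case.
function filterByPrefix(names, query) {
  return names.filter(
    name => collator.compare(name.slice(0, query.length), query) === 0
  );
}

console.log(filterByPrefix(["Mičić", "Milić", "Marković"], "mici")); // ["Mičić"]
```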

I made a Prototype Version of this:
String.prototype.strip = function() {
var translate_re = /[öäüÖÄÜß ]/g;
var translate = {
"ä":"a", "ö":"o", "ü":"u",
"Ä":"A", "Ö":"O", "Ü":"U",
" ":"_", "ß":"ss" // probably more to come
};
return (this.replace(translate_re, function(match){
return translate[match];})
);
};
Use like:
var teststring = 'ä ö ü Ä Ö Ü ß';
teststring.strip();
This returns a_o_u_A_O_U_ss. Since strings are immutable, teststring itself is not changed, so assign the result if you need it.

You can also use http://fusejs.io, which describes itself as a "Lightweight fuzzy-search library. Zero dependencies", for fuzzy searching.

First, I'd recommend a switch statement instead of a long string of if-else if ...
Then, I am not sure why you don't like your current solution. It certainly is the cleanest one. What do you mean by not taking into account "all characters"?
There is no standard method in JavaScript to map accented letters to ASCII letters outside of using a third-party library, so the one you wrote is as good as any.
Also, "ß" I believe maps to "ss", not a single "s". And beware of "i" with and without dot in Turkish -- I believe they refer to different letters.
