Javascript regular expression to sanitize string with pipes - javascript

I need a little help in trying to sanitize a string. I have written a regular expression that is pretty close to giving me the results I want but I just can't quite get it right. The string I'm receiving is in this format.
||a|assa||asss||ssss
The pipe characters are basically placeholders for what would have been a separator for text. However, I'm trying to end up with something that looks like this.
|a|b|c|d
In other words, I'm just trying to remove consecutive pipes. I have put together a little example to illustrate what I have attempted; it keeps failing miserably.
const str1 = "||a||jump|fences||in the street";
const str2 = "im a wolf";
const hasPipe = /\|{1}\+/; // if the | is consecutively repeated more than once, then delete the extras
console.log(hasPipe.test(str1));
console.log(str1.replace(hasPipe, ""))
console.log(hasPipe.test(str2));
The expected result of the above code should simply be:
"|a|jump|fences|in the street"
Can someone please point me in the right direction or point out my silly mistake?

Given your test string const str1 = "||a||jump|fences||in the street"; you want to replace multiple occurrences of pipe | with a single pipe.
There are a couple of ways to match a non-empty sequence:
+ = match 1 or more of the previous expression
{n,m} = match at least n but not more than m occurrences.
{n,} = match at least n and unlimited times.
Simple:
str1.replace(/\|+/g, "|")
"|a|jump|fences|in the street"
Matches one or more pipes and replaces each run with a single pipe. Note that this also replaces a lone pipe with a pipe, which is harmless but redundant.
More exact:
str1.replace(/\|{2,}/g, "|")
"|a|jump|fences|in the street"
Matches two or more (because there is no max after the comma) pipes and replaces with a single pipe. This does not bother replacing a single pipe with another single pipe.
There are also a couple of ways to match exactly two pipes, if you'll never have a run of 3 or more. Note the g flag; without it only the first pair is replaced:
str1.replace(/\|\|/g, "|");
str1.replace(/\|{2}/g, "|");
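As a quick check of the patterns above, here is a minimal sketch that wraps the replacement in a helper and shows what the g flag changes. The name collapsePipes is made up for illustration and is not part of the question.

```javascript
// Hypothetical helper: collapse every run of two or more pipes into one pipe.
function collapsePipes(input) {
  return input.replace(/\|{2,}/g, "|");
}

const sample = "||a||jump|fences||in the street";

console.log(collapsePipes(sample));          // "|a|jump|fences|in the street"

// Without the g flag, replace() stops after the first run it finds.
console.log(sample.replace(/\|{2,}/, "|"));  // "|a||jump|fences||in the street"
```

A string with no pipes at all, such as "im a wolf", passes through unchanged.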

Not much to it:
\|\|+ replace with |
https://regex101.com/r/vvkrI0/1/

You can use the + to find all the locations that have 1 or more pipes in a row, and replace them all with a single pipe. Your regex would simply be:
/\|+/g
Here is an example, with a variable number of pipes:
const str1 = "||a|||jump|fences||||in the street";
var filtered_str1 = str1.replace(/\|+/g,"|")
console.log(filtered_str1);

You could substitute consecutive pipe characters like this:
const pat = /\|{2,}/gm;
const str = `||a|||jump|fences||in the street`;
const sub = `|`;
const res = str.replace(pat, sub);
console.log('result: ', res);
Result:
|a|jump|fences|in the street

Related

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.
You can split at positions where a parenthesised element follows, by using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
Instead of split, use match to create the array. The pattern 1) finds digits in parentheses, followed by a run of characters that may be digits, uppercase letters, or hyphens, and then 2) groups that whole match.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));
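If the identifier and the value are needed separately, the same idea can be sketched with matchAll (ES2020) and two capture groups. This is only an illustration: the name parseUdi and the { ai, value } shape are assumptions, and the pattern assumes a value never contains an opening parenthesis.

```javascript
// Hypothetical helper: split a "(nn)value(nn)value..." string into pairs.
// Assumption: a value never contains "(", so [^(]+ reads up to the next identifier.
function parseUdi(text) {
  const pairs = [];
  for (const m of text.matchAll(/\((\d+)\)([^(]+)/g)) {
    // m[1] is the digits inside the parentheses, m[2] is the value after them.
    pairs.push({ ai: m[1], value: m[2] });
  }
  return pairs;
}

const udi = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(parseUdi(udi));
```

Because each pair is matched on its own, the order of the identifiers in the input does not matter.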

How to take first character from 2 words

I have input = "Graha Cinere" and I want the output to be just "GCi": G is taken from the first character of the first word, and Ci from the first two characters of the second word.
val = "Graha Cinere"
val.match(/\b(\w)/g).join('')
Current output : GC
Expected Output : GCi
I hope someone can answer my question.
Here's a couple of ways to do this. Firstly you can use a regex to match the single character at the start of the first word and two characters at the start of the second word and then join those parts.
val = "Graha Cinere";
out = val.match(/^(\w)\w*\s+(\w{1,2})/).slice(1).join('');
console.log(out);
Secondly you could split the string on space and then take the first character of the first result and the first two characters of the second result and join them:
val = "Graha Cinere";
out = val.split(' ').map((v, i) => v.slice(0, i+1)).join('');
console.log(out);
First split the string, then take the characters you need from each word:
val = "Graha Cinere";
parts = val.split(" ");
neededStr = parts[0][0] +parts[1][0]+ parts[1][1];
console.log(neededStr);
You can also use String.prototype.slice:
let val = "Graha Cinere";
let parts = val.split(" ");
let neededPartOne = parts[0][0];
let neededPartTwo = parts[1].slice(0, 2);
let exactNeeded = neededPartOne + neededPartTwo;
console.log(exactNeeded);
Hello Bai!
Although regex is extremely useful and practical for this and for more complex scenarios, I would recommend first trying the native string methods that come with most languages. That will give you a better sense of what you can do with the language itself to find a quick solution. If that isn't enough, take the next step and get extra help from other tools such as RegEx, Underscore or its successor Lodash, just to name a few.
I put this small snippet of JS together to show a simpler way to handle this case. This is not the only way to do it, but it uses only the language itself.
Cheers, pal!
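let str = "Graha Cinere";
// Index where the second word begins (one past the first space).
const secondStart = str.indexOf(' ') + 1;
// First character of the string plus the first two characters of the second word.
console.log(str.charAt(0).concat(str.slice(secondStart, secondStart + 2)));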
This will give you what you're looking for:
val.split(" ")[0][0] + val.split(" ")[1].slice(0, 2)
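The answers above all assume exactly two words separated by one space. A slightly more defensive sketch is shown below; the helper name abbreviate and its counts parameter are made up for illustration, with counts[i] meaning how many leading characters to take from the i-th word.

```javascript
// Hypothetical helper: take counts[i] leading characters from the i-th word.
function abbreviate(text, counts) {
  // Split on any run of whitespace so repeated spaces do not create empty words.
  const words = text.trim().split(/\s+/);
  return counts
    .map((count, i) => (words[i] || "").slice(0, count)) // missing words contribute ""
    .join("");
}

console.log(abbreviate("Graha Cinere", [1, 2])); // "GCi"
console.log(abbreviate("Graha", [1, 2]));        // "G"
```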

RegEx for matching YouTube embed ID

I'm in non-modern JavaScript and I have a string defined as follows:
"//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0"
I want to pull out just the DmYK479EpQc but I don't know the length. I do know that I want what is after the / and before the ?
Are there a few simple lines of JavaScript that would solve this?
Use the URL object?
console.log(
(new URL("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0", location.href)).pathname
.split('/')
.pop());
Why? Because I can likely make up a URL that defeats the regex (though for YouTube that's probably unlikely).
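The snippet above relies on location.href, which only exists in a browser. A minimal sketch that passes an explicit base instead is shown below. The function name embedIdFromUrl and the base "https://www.youtube.com" are assumptions for illustration, and the URL constructor itself may not be available in an older ("non-modern") environment.

```javascript
// Hypothetical helper: return the last non-empty path segment of a URL.
// Assumption: the base is only needed to resolve protocol-relative input such as "//host/path".
function embedIdFromUrl(rawUrl) {
  const parsed = new URL(rawUrl, "https://www.youtube.com");
  // filter(Boolean) drops empty segments caused by leading or trailing slashes.
  const segments = parsed.pathname.split("/").filter(Boolean);
  return segments[segments.length - 1];
}

console.log(embedIdFromUrl("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0")); // "DmYK479EpQc"
```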
This expression might help you to do so, and it might be faster. The character class is written as [A-Za-z0-9_-] rather than [A-z0-9], because the range A-z also covers the ASCII punctuation between Z and a and misses the hyphen that video IDs can contain:
(d\/)([A-Za-z0-9_-]+)(\?)
const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;
const str = `//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0`;
const subst = `$3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Performance Test
This JavaScript snippet measures the performance of that expression by running it one million times in a simple for loop.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i > 0; i--) {
  const string = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
  const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;
  var match = string.replace(regex, "$3");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
How about non-regex way
console.log("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0".split('/').pop().split('?')[0]);
I'm not going to give a piece of code because this is a relatively simple algorithm, and easy to implement.
Please note that those links have this format (correct me if I'm wrong):
https:// or http://
www.youtube.com/
embed/
Video ID (DmYK479EpQc in this case)
?parameters (note that they start ALWAYS with the character ?)
You want the ID of the video, so you can split the string into those sections, and if you store those sections in an array you can be sure that the ID is at index 3 (the fourth element).
One example of what that array would look like:
['https://', 'www.youtube.com', 'embed', 'DmYK479EpQc', '?vq=hd720&rel=0']
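The answer deliberately gives no code, so the following is only one possible sketch of that sectioning idea. The function name embedIdFromSections is made up, and the scheme is stripped before splitting, so in this version the ID lands at index 2 rather than at the position shown in the array above.

```javascript
// Hypothetical helper following the "split into sections" idea.
function embedIdFromSections(url) {
  // Drop "?parameters": everything from the first "?" onward.
  const withoutQuery = url.split("?")[0];
  // Drop a leading "https://", "http://" or protocol-relative "//".
  const afterScheme = withoutQuery.replace(/^(https?:)?\/\//, "");
  // Remaining sections: ["www.youtube.com", "embed", "<video id>"].
  const sections = afterScheme.split("/");
  return sections[2];
}

console.log(embedIdFromSections("https://www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0")); // "DmYK479EpQc"
```

This relies on the fixed host/embed/ID layout described above; a URL with a different path depth would need a different index.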
One option uses a regex replacement:
var url = "//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0";
var path = url.replace(/.*\/([^?]+).*/, "$1");
console.log(path);
The above regex pattern says to:
.* match and consume everything up to and
/ including the last path separator
([^?]+) then match and capture any number of non ? characters
.* then consume the rest of the input
Then, we just replace with the first capture group, which corresponds to the text after the final path separator, but before the start of the query string, should the URL have one.
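One thing to keep in mind with the replace approach is that replace returns the input unchanged when the pattern does not match. A small sketch using exec with the same capture group makes the "no match" case explicit; the function name lastPathSegment is an assumption for illustration.

```javascript
// Hypothetical helper: capture the text after the last "/" and before any "?".
function lastPathSegment(url) {
  const m = /.*\/([^?]+)/.exec(url);
  return m ? m[1] : null; // null signals that nothing matched
}

console.log(lastPathSegment("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0")); // "DmYK479EpQc"
console.log(lastPathSegment("no separators here"));                                  // null

// For comparison, replace() hands back the whole input when there is no match.
console.log("no separators here".replace(/.*\/([^?]+).*/, "$1"));                   // "no separators here"
```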
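You can use this regex, which reads as follows:
(.*) match and consume everything up to
(d\/) the d/ that ends embed/
([A-Za-z0-9_-]+) then match and capture the ID: letters, digits, underscores and hyphens (the range A-z would also match punctuation such as [ and ^)
(\?) followed by the ? that starts the query string
(.*) then consume the rest of the input
const ytUrl = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;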
const position = '$3';
let result = ytUrl.replace(regex, position);
console.log('YouTube ID: ', result);
This regex just splits the string into different sections, and the YouTube ID is in the third capture group.
Another solution is to use split. This method splits a string into an array of substrings.
const ytUrl = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
let result = ytUrl.split('/').pop().split('?').shift()
console.log('YouTube ID: ', result);
In this sample, we split the URL using / as the separator. Then we take the last element of the array with the pop method, and finally we split again using ? as the separator and take the first element of the array with the shift method.

How to store only the nth substring into a variable in Javascript

var a="how are you?";
In the above example I want to store the second word "are" into another variable in a single step.
I don't want to use something like below
var bigArray = a.split(" ");
var secondText = bigArray[1];
as we may need to store the entire paragraph into a big array and consume a lot of memory without any use.
I would like to know if there is some function which works as below
var secondText=specialFunction(a," ",1);
so that we will get the second substring when the paragraph is split by " "
Well, I would spend my time worrying about more important things than the size of some arrays.
Anyway, you could try using a regexp:
var secondText = (a.match(/ (\w+)/) || []) [1];
This reads as "find a space, then capture the following word".
The || [] part is meant to deal with the situation where there is no match (for example, no second word). In that case, the result will be [][1] which is undefined.
This finds only the second word. What about the more general case? We can't split the string on spaces, because that would create an array, which the OP wants to avoid due to memory concerns. So we will build a dynamic regexp instead. To find the nth word, we want to skip over the first n-1 spaces. Or, to be more precise, we want to skip over the first word, some spaces, then the second word, then some more spaces, etc. So the regexp, where the number in braces is the count of words to skip, is
/(?:\w+ ){n}(\w+)/
  ^^                 NO CAPTURING GROUP
    ^^^^             WORD FOLLOWED BY SPACE
         ^^^         N TIMES
            ^^^^^    CAPTURE FOLLOWING WORD
The ?: is to avoid this being treated as a capturing group. We build the regexp using
function make_nth_word_regexp(n) {
n--;
return new RegExp("(?:\\w+ ){" + n + "}(\\w+)");
}
Now look for your nth word:
var fifth_word = str.match(make_nth_word_regexp(5)) [1];
> "Hey there you".match(make_nth_word_regexp(3))[1]
< "you"
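An alternative to regex is just to use substring(), taking the text between the first space and the next one. Something like
var a = "how are you";
var start = a.indexOf(" ") + 1;  // first character after the first space
var end = a.indexOf(" ", start); // next space, or -1 if there is none
alert(a.substring(start, end === -1 ? a.length : end));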
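For the general case asked about in the question, the indexOf idea can be extended so that no array is built at all. The following is a minimal sketch standing in for the hypothetical specialFunction(a, " ", 1) from the question. The name nthPart is made up, the index is zero-based like the question's example, and a non-empty separator is assumed.

```javascript
// Hypothetical stand-in for specialFunction(text, separator, index).
// index is zero-based; returns undefined when the text has too few parts.
function nthPart(text, separator, index) {
  let start = 0;
  // Walk past `index` separators without ever building an array.
  for (let i = 0; i < index; i++) {
    const next = text.indexOf(separator, start);
    if (next === -1) return undefined;
    start = next + separator.length;
  }
  const end = text.indexOf(separator, start);
  return end === -1 ? text.slice(start) : text.slice(start, end);
}

console.log(nthPart("how are you?", " ", 1)); // "are"
```

Only the requested substring is allocated, and the scan stops as soon as that part has been found.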

Regex .exec into array

I want to capture some values in a string, THEN return them to the page. Here is an example of the code. As I understand, the .exec should store the values it matches into the array correct? This should return Savage, Betsy. Can someone enlighten me on to what's wrong?
var regex = /\b(Betsy)(Savage)\b/i;
var string = "My friend is Betsy Ann Savage";
var arrayMatch = null;
while(arrayMatch = regex.exec(string)){
document.getElementById("text").innerHTML = arrayMatch[1] + ", " + arrayMatch[0];
}
You don't get any matches like this. You could add .* between (Betsy) and (Savage)...
It sounds like you think \b(Betsy)(Savage)\b will match EITHER Betsy OR Savage, but that isn't the case. It's looking for one string where both parts are combined; you might as well try to match \b(BetsySavage)\b. Yes, you do have two groups in separate parentheses, but they are directly next to each other, so the regex engine looks for both right next to each other. I think what you really want is |, which represents an OR, as in \b(Betsy|Savage)\b.
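Both suggestions can be sketched as follows. The function names formatName and findNames are made up for illustration. Note that a while loop around exec only terminates when the regex has the g flag, because otherwise every call starts again from the beginning of the string.

```javascript
// Suggestion 1: allow anything between the two names, then read both groups.
function formatName(text) {
  const match = /\b(Betsy)\b.*\b(Savage)\b/i.exec(text);
  // match[0] is the whole match; match[1] and match[2] are the captured groups.
  return match ? match[2] + ", " + match[1] : "";
}

// Suggestion 2: alternation, collecting every hit with a global regex.
function findNames(text) {
  const either = /\b(Betsy|Savage)\b/gi;
  const found = [];
  let m;
  while ((m = either.exec(text)) !== null) {
    found.push(m[1]);
  }
  return found;
}

const sentence = "My friend is Betsy Ann Savage";
console.log(formatName(sentence)); // "Savage, Betsy"
console.log(findNames(sentence));  // ["Betsy", "Savage"]
```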
