This question already has answers here:
Regular expression to get a string between two strings in Javascript
I'm using this post as a reference; it suggests this solution:
Current:
str = 'MyLongString:StringIWant;'
Desired output:
newStr = 'StringIWant'
Solution:
var mySubString = str.substring(
str.lastIndexOf(":") + 1,
str.lastIndexOf(";")
);
But what if I have multiple occurrences? In my case I have a long text, let's say:
str = 'MyLongString has :multiple; words that i :need; to :extract;'
New desired output:
strarray = ["multiple","need","extract"]
When I apply the suggested solution, it only gets the last word, "extract".
This is one way of doing it: run a loop from the start of the string to the end, using indexOf and substring to collect each word.
const str = 'MyLongString has :multiple; words that i :need; to :extract;';
function extractWords(str) {
const words = [];
for (let i = 0; i < str.length; i++) {
if (str.charAt(i) === ':') {
const stopIndex = str.indexOf(';', i);
if (stopIndex !== -1)
words.push(str.substring(i + 1, stopIndex));
}
}
return words;
}
console.log(extractWords(str));
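A variation on the loop above, sketched under the assumption that you only need non-overlapping :…; pairs: instead of testing every character, it jumps straight to the next : with indexOf and resumes searching after each closing ;. The helper name extractBetween and its open/close parameters are hypothetical.

```javascript
// Hypothetical helper: collect the text between each open/close delimiter pair.
function extractBetween(str, open, close) {
  const words = [];
  let start = str.indexOf(open);
  while (start !== -1) {
    const end = str.indexOf(close, start + open.length);
    if (end === -1) break; // no closing delimiter left, stop
    words.push(str.substring(start + open.length, end));
    // resume the search after the closing delimiter
    start = str.indexOf(open, end + close.length);
  }
  return words;
}

console.log(extractBetween('MyLongString has :multiple; words that i :need; to :extract;', ':', ';'));
// ["multiple", "need", "extract"]
```

Unlike the character-by-character loop, a : that appears before the previous ; has been consumed is not treated as a new start.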
The linked suggestion uses string functions rather than a regular expression; a simple pattern like /:\w+;/g finds all the desired words (still wrapped in : and ;), thanks to the /g flag.
We can get rid of the prefix and suffix using lookarounds. Lookbehind (?<=..) did not work in older JavaScript environments, but it was added in ES2018 and all major browsers support it in their current versions; lookahead (?=..) has been supported for much longer.
const str = 'MyLongString has :multiple; words that i :need; to :extract;';
let result = str.match(/:\w+;/g) || [];
console.log(result); // [":multiple;", ":need;", ":extract;"]
let result2 = str.match(/(?<=:)\w+(?=;)/g) || [];
console.log(result2); // ["multiple", "need", "extract"]
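If you'd rather avoid lookbehind entirely, a capture group combined with String.prototype.matchAll also yields the bare words. This is a sketch assuming an ES2020-capable engine, since matchAll was added then.

```javascript
const str = 'MyLongString has :multiple; words that i :need; to :extract;';
// matchAll requires the g flag and yields one match object per hit;
// m[1] is the text captured by (\w+), without the surrounding : and ;
const words = Array.from(str.matchAll(/:(\w+);/g), m => m[1]);
console.log(words); // ["multiple", "need", "extract"]
```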
This question already has answers here:
JS remove everything after the last occurrence of a character [duplicate]
I've been looking into split(), slice() and substring(), but I'm not sure they can do what I'm looking for.
Basically, I want to remove the last subdirectory from a path string, for example:
Computers/Hard-drive/5GB/Brand
I need to remove Brand to get Computers/Hard-drive/5GB.
My solution is:
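var directory = "Computers/Hard-drive/5GB/Brand";
var val = directory.split("/");
var result = val[0];
val.forEach(function (dir, i) {
// skip the first segment (already in result) and the last one
if (i > 0 && i < val.length - 1) {
result = result + "/" + dir;
}
});
console.log(result);
This returns what I want, but I think it is bad code; there must be a way of doing this in one line. What are other ways of doing this?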
You have lots of options:
1. The regular expression approach
2. The lastIndexOf approach
3. The split/join approach
Personally, I prefer the simplicity and lack of unnecessary intermediate objects that #2 provides.
1. The regular expression approach
...either as shown by CertainPerformance, or a version using replace to remove the last / and everything after it:
let str = "Computers/Hard-drive/5GB/Brand";
str = str.replace(/\/[^\/]+$/, "");
console.log(str);
This works just fine if there isn't any slash; it just returns the string unchanged.
2. The lastIndexOf version
let str = "Computers/Hard-drive/5GB/Brand";
const index = str.lastIndexOf("/");
str = str.substring(0, index);
console.log(str);
Note that the above assumes there will be a slash: without one, index is -1 and substring(0, -1) returns an empty string, because substring treats negative arguments as 0. To avoid that, add a guard in case index is -1:
let str = "Computers/Hard-drive/5GB/Brand";
let index = str.lastIndexOf("/");
if (index !== -1) {
str = str.substring(0, index);
}
console.log(str);
str = "Computers";
index = str.lastIndexOf("/");
if (index !== -1) {
str = str.substring(0, index);
}
console.log(str);
3. The split and join approach
...as shown by Nina. (I removed mine when I saw hers, which is much more concise without being harder to read.) Note that Nina's implementation also assumes there will be a slash; without one, it returns an empty string. To handle that case, check the result of split (e.g., whether it has more than one element).
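The check mentioned above could look like this. removeLastSegment is a hypothetical helper name, and the sketch assumes that a string without any / should be returned unchanged, matching the behaviour of the regular expression approach.

```javascript
// Hypothetical helper: drop the last path segment, leaving strings without "/" unchanged.
function removeLastSegment(path) {
  const parts = path.split('/');
  // split always returns at least one element; exactly one means there was no "/"
  if (parts.length < 2) return path;
  return parts.slice(0, -1).join('/');
}

console.log(removeLastSegment('Computers/Hard-drive/5GB/Brand')); // "Computers/Hard-drive/5GB"
console.log(removeLastSegment('Computers')); // "Computers"
```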
One option is to use a regular expression: match characters until lookahead matches /, followed by non-/ characters, followed by the end of the string:
const str = 'Computers/Hard-drive/5GB/Brand';
const result = str.match(/.*(?=\/[^/]+$)/)[0];
console.log(result);
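Note that match() returns null when the string has no /, so indexing it with [0] throws a TypeError. A guarded sketch, assuming the whole string should be returned in that case (dropLastSegment is a hypothetical name):

```javascript
// Hypothetical helper: same regex as above, but guarded against a null match.
function dropLastSegment(path) {
  const m = path.match(/.*(?=\/[^/]+$)/);
  // no "/" followed by a final segment: return the input unchanged
  return m ? m[0] : path;
}

console.log(dropLastSegment('Computers/Hard-drive/5GB/Brand')); // "Computers/Hard-drive/5GB"
console.log(dropLastSegment('Computers')); // "Computers"
```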
You can try String.prototype.lastIndexOf() and String.prototype.substring():
var directory = 'Computers/Hard-drive/5GB/Brand'
var pos = directory.lastIndexOf("/");
var result = directory.substring(0, pos);
console.log(result);
You could split the string, slice up to (but not including) the last item, and join the array back into a string.
var string = 'Computers/Hard-drive/5GB/Brand',
result = string.split('/').slice(0, -1).join('/');
console.log(result);
This question already has answers here:
Javascript and regex: split string and keep the separator
I have the following string:
str = "11122+3434"
I want to split it into ["11122", "+", "3434"]. The delimiters can be +, -, / or *.
I have tried the following:
strArr = str.split(/[+,-,*,/]/g)
But I get
strArr = ["11122", "3434"]
Delimiters are things that separate data, so the .split() method is designed to remove them: delimiters are not data, so they are not kept.
In your case, the thing between two values is also data. So it's not a delimiter, it's an operator (in fact, that's what it's called in mathematics).
For this you want to parse the data instead of splitting it, and a regular expression is a good tool for that:
var result = str.match(/(\d+)([-+*/])(\d+)/); // "-" goes first in the class so it is literal, not a range
returns an array:
["11122+3434", "11122", "+", "3434"]
So your values would be result[1], result[2] and result[3].
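As an aside, split() can keep delimiters after all: when the separator regex contains a capturing group, the captured text is spliced into the result array. A minimal sketch using the same four operators:

```javascript
// The parentheses make the operator a capturing group, so split() keeps it
// as its own array element instead of discarding it.
const parts = '11122+3434'.split(/([-+*/])/);
console.log(parts); // ["11122", "+", "3434"]

const longer = '11122+77-3434'.split(/([-+*/])/);
console.log(longer); // ["11122", "+", "77", "-", "3434"]
```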
This should help...
const str = '11122+3434+12323*56767';
const strArr = str.replace(/[-+*/]/g, ' $& ').split(/ /g); // pad each operator with spaces, then split on the spaces
console.log(strArr)
Hmm, one way is to add a space as delimiter first.
// a regex would be better here too: replace("+", ...) only handles the first "+"
str = str.replace("+", " + ");
Then split it:
strArr = str.split(" ");
and it will return your array
["11122", "+", "3434"]
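Inside a character class, - must be escaped (or placed first or last) so it isn't read as a range; escaping + and * is harmless but not required. In the original [+,-,*,/], the ,-, part is a range from , to , so - itself is never matched:
strArr = str.split(/[\+\-\*/]/g) // note: split still drops the operators themselves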
var str = "11122+77-3434";
function getExpression(str) {
var result = [];
var part = '';
for (var i = 0; i < str.length; i++) {
var ch = str[i];
// extend the current number while both the character and the part are digits
if (/\d/.test(ch) && /\d/.test(part)) {
part += ch;
} else {
// start a new token; skip the empty part we start with
if (part) result.push(part);
part = ch;
}
}
// flush the last token
if (part) result.push(part);
return result;
}
console.log(getExpression(str)); // ["11122", "+", "77", "-", "3434"]
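A shorter alternative is to match the tokens you want instead of splitting on what you don't: runs of digits, or a single operator. This is a sketch; tokenize is a hypothetical name, and any other character (such as a space) is silently skipped.

```javascript
// Hypothetical helper: return every run of digits or single operator, in order.
function tokenize(expr) {
  // match() with the g flag returns null when nothing matches, hence the fallback
  return expr.match(/\d+|[-+*/]/g) || [];
}

console.log(tokenize('11122+77-3434')); // ["11122", "+", "77", "-", "3434"]
```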
I have an input string:
12345,3244,654,ffgv,87676,988ff,87657
I'm having difficulty using regular expressions to replace every term in the string that is not a five-digit number with the constant 34567. So the output would look like this:
12345,34567,34567,34567,87676,34567,87657
For this, I looked at two options:
Negated character class: not useful, because ,[^\d{5}], doesn't do this directly (inside brackets, \d{5} is not a quantified digit; the class just matches one character other than a digit, { or }).
Lookahead and lookbehind: the issue here is that ,(?!\d{5}) or (?<!\d{5}) doesn't include the non-matched part in the match, so it can't be used for substitution/replacement.
Once the right expression is found, I'd like to replace the non-matching part using tagged regions (backreferences) like \1, \2.
Is there any mechanism in regular expression tools to achieve the output shown in the example above?
Edit: I really appreciate the non-regex answers, but I would be more thankful for a regex-based solution.
You don't need regex for this. You can use str.split to split the string at commas, then check for each item whether its length is at least 5 and it contains only digits (using str.isdigit). Finally, combine the items using str.join. In Python:
>>> s = '12345,3244,654,ffgv,87676,988ff,87657'
>>> ','.join(x if len(x) >= 5 and x.isdigit() else '34567' for x in s.split(','))
'12345,34567,34567,34567,87676,34567,87657'
JavaScript version:
function isdigit(s){
for(var i=0; i <s.length; i++){
if(!(s[i] >= '0' && s[i] <= '9')){
return false;
}
}
return true;
}
var arr = "12345,3244,654,ffgv,87676,988ff,87657".split(",");
for(var i=0; i < arr.length; i++){
if(arr[i].length < 5 || ! isdigit(arr[i])) arr[i] = '34567';
}
var output = arr.join(",");
console.log(output); // "12345,34567,34567,34567,87676,34567,87657"
Try the following: /\b(?!\d{5})[^,]+\b/g
It constrains the match between word boundaries (\b),
followed by a negative lookahead (?!\d{5}) that rejects terms starting with five digits,
followed by one or more non-comma characters ([^,]+).
const expression = /\b(?!\d{5})[^,]+\b/g;
const input = '12345,3244,654,ffgv,87676,988ff,87657';
const expectedOutput = '12345,34567,34567,34567,87676,34567,87657';
const output = input.replace(expression, '34567');
console.log(output === expectedOutput, expectedOutput, output);
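The answers above also keep numbers with six or more digits, such as 123456. If only exactly five digits should survive, a replace callback can test each comma-separated term on its own. In this sketch the six-digit term 123456 is appended to the input purely for illustration.

```javascript
const input = '12345,3244,654,ffgv,87676,988ff,87657,123456';
// Match each comma-separated term, then decide per term in the callback:
// keep it only if it is exactly five digits, otherwise substitute the constant.
const output = input.replace(/[^,]+/g, term => (/^\d{5}$/.test(term) ? term : '34567'));
console.log(output); // "12345,34567,34567,34567,87676,34567,87657,34567"
```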
This approach uses /\b(?:(\d{5})|(\w+))\b/g:
we match on boundaries (\b) on both sides; the non-capturing group (?:...) makes both \b apply to either alternative
our first capture group captures "good strings"
our second, looser capture group gets the leftovers (bad strings)
our replacer() function knows the difference
const str = '12345,3244,654,ffgv,87676,988ff,87657';
const STAND_IN = '34567';
const massageString = (str) => {
const pattern = /\b(?:(\d{5})|(\w+))\b/g;
const replacer = (match, goodstring, badstring) => {
if (goodstring) {
return goodstring;
} else {
return STAND_IN;
}
}
const r = str.replace(pattern,replacer);
return r;
};
console.log( massageString(str) );
I think the following would work for values no longer than five alphanumeric characters:
(,(?!\d{5})\w{1,5})
If the values can be longer than five alphanumeric characters, remove the 5 from the expression above:
(,(?!\d{5})\w{1,})
and replace each match with:
,34567
Because the pattern starts with a comma, it never touches the first term; that is fine for this input, whose first term is 12345. You can see a demo on regex101. Of course, there might be faster non-regex methods in specific languages as well (Python, Perl or JavaScript).
So I have the following:
var token = '[token]';
var tokenValue = 'elephant';
var string = 'i have a beautiful [token] and i sold my [token]';
string = string.replace(token, tokenValue);
The above will only replace the first [token] and leave the second one alone.
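If I were to use a regex, I could write:
string = string.replace(/\[token\]/g, tokenValue); // brackets escaped: /[token]/ would be a character class matching any of t, o, k, e, n
And this would replace all my [token]s.
However, I don't know how to do this without a regex literal (/.../).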
I have found split/join satisfactory enough for most of my cases.
A real-life example:
myText.split("\n").join('<br>');
Why not replace the token every time it appears with a do while loop?
var index = 0;
do {
string = string.replace(token, tokenValue);
} while((index = string.indexOf(token, index + 1)) > -1);
string = string.replace(new RegExp("\\[token\\]","g"), tokenValue);
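In engines that support ES2021, String.prototype.replaceAll does this directly with a plain string pattern, with no regex or escaping needed. A sketch using the question's variables:

```javascript
const token = '[token]';
const tokenValue = 'elephant';
let string = 'i have a beautiful [token] and i sold my [token]';
// With a string pattern, replaceAll matches it literally, so [ and ] need no escaping
string = string.replaceAll(token, tokenValue);
console.log(string); // "i have a beautiful elephant and i sold my elephant"
```

One caveat: the replacement string is still scanned for special patterns such as $& and $$, so if tokenValue may contain $, pass a function instead, e.g. string.replaceAll(token, () => tokenValue).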
Be careful with the accepted answer: if the replacement string contains the string being replaced, the loop never terminates.
Here is a better version:
function replaceSubstring(inSource, inToReplace, inReplaceWith)
{
var outString = [];
var repLen = inToReplace.length;
while (true)
{
var idx = inSource.indexOf(inToReplace);
if (idx == -1)
{
outString.push(inSource);
break;
}
outString.push(inSource.substring(0, idx))
outString.push(inReplaceWith);
inSource = inSource.substring(idx + repLen);
}
return outString.join("");
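}
"[.token.*] nonsense and [.token.*] more nonsense".replace("[.token.*]", "some", "g");
Older versions of Firefox accepted this non-standard third "flags" argument and produced:
"some nonsense and some more nonsense"
The flags argument was never part of the ECMAScript standard and has been removed from Firefox, so in current engines this call replaces only the first occurrence.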
I realized that the answer from #TheBestGuest won't work for the following example as you will end up in an endless loop:
var stringSample= 'CIC';
var index = 0;
do { stringSample = stringSample.replace('C', 'CC'); }
while((index = stringSample.indexOf('C', index + 1)) > -1);
So here is my proposal for a replaceAll method, written in TypeScript:
let matchString = 'CIC';
let searchValueString= 'C';
let replacementString ='CC';
matchString = matchString.split(searchValueString).join(replacementString);
console.log(matchString);
Unfortunately, since JavaScript's string replace() function doesn't let you start from a particular index, and strings can't be modified in place, it is really hard to do this as efficiently as you could in saner languages.
.split().join() isn't a good solution because it creates lots of intermediate strings (although I suspect V8 does some dark magic to optimise this).
Calling replace() in a loop is a terrible solution because replace starts its search from the beginning of the string every time. This is going to lead to O(N^2) behaviour! It also has issues with infinite loops as noted in the answers here.
A regex is probably the best solution if your search string is a compile-time constant, but if it isn't, you can't really use one. You should absolutely not try to convert an arbitrary string into a regex by escaping things.
One reasonable approach is to build up a new string with the appropriate replacements:
function replaceAll(input: string, from: string, to: string): string {
const fromLen = from.length;
if (fromLen === 0) return input; // an empty search string would never advance pos (infinite loop)
let output = "";
let pos = 0;
for (;;) {
let matchPos = input.indexOf(from, pos);
if (matchPos === -1) {
output += input.slice(pos);
break;
}
output += input.slice(pos, matchPos);
output += to;
pos = matchPos + fromLen;
}
return output;
}
I benchmarked this against all the other solutions (except calling replace() in a loop which is going to be terrible) and it came out slightly faster than a regex, and about twice as fast as split/join.
Edit: This is almost the same method as Stefan Steiger's answer, which I totally missed for some reason. However, his answer still uses .join(), which makes it 4 times slower than mine.
My text:
1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#
The result should be:
JASSER-PC
anas kayyat
anas kayyat
I am using:
(?<=###)(.+)(?=#-#)
But it gives me this:
JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat
At the time of writing, JavaScript's regular expressions didn't support lookbehind assertions (i.e. (?<=…) and (?<!…)); they were added in ES2018. If you need to support older engines, you can use this instead:
###(.+)(?=#-#)
Then just take the matched string of the first group. Additionally, to only match as little as possible, make the + quantifier non-greedy by using +?.
The group (.+) will match as much as it can (it's "greedy"). To make it find a minimal match you can use (.+?).
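In engines that support lookbehind (ES2018 and later), the question's own pattern works once the quantifier is made lazy. A sketch:

```javascript
const text = '1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#';
// Same lookarounds as in the question, but .+? stops at the first #-#
// instead of running on to the last one
const names = text.match(/(?<=###).+?(?=#-#)/g);
console.log(names); // ["JASSER-PC", "anas kayyat", "anas kayyat"]
```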
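Older JavaScript engines do not support lookbehind. Make the quantifier non-greedy and use:
var input = "1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#";
var regex = /###(.+?)#-#/g;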
var strings = [];
var result;
while ((result = regex.exec(input)) != null) {
strings.push(result[1]);
}
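With ES2020's matchAll, the exec loop can be replaced by a single expression. This sketch assumes an engine that supports matchAll:

```javascript
const input = '1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#';
// matchAll returns an iterator of match objects; m[1] is the lazy (.+?) group
const strings = Array.from(input.matchAll(/###(.+?)#-#/g), m => m[1]);
console.log(strings); // ["JASSER-PC", "anas kayyat", "anas kayyat"]
```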
I'll give you a non-regex answer, since regular expressions aren't always appropriate, whether for speed or for the readability of the regex itself:
function getText(text) {
var arr = text.split("###"); // arr now contains [1618148163,JASSER-PC#-#1125015374,anas kayyat#-#1543243035,anas kayyat#-#]
var newarr = [];
for(var i = 0; i < arr.length; i++) {
var index = arr[i].indexOf("#-#");
if(index != -1) { // if an array element doesn't contain "#-#", we ignore it
newarr.push(arr[i].substring(0, index));
}
}
return newarr;
}
Now, using
getText("1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#");
returns what you wanted.