I am trying to do some string replacement with RegEx in Javascript. The scenario is a single line string containing long comma-delimited list of numbers, in which duplicates are possible.
An example string is: 272,2725,2726,272,2727,297,272 (The end may or may not end in a comma)
In this example, I am trying to match each occurrence of the whole number 272. (3 matches expected)
The example regex I'm trying to use is: (?:^|,)272(?=$|,)
The problem I am having is that the second and third matches are including the leading comma, which I do not want. I am confused because I thought (?:^|,) would match, but not capture. Can someone shed light on this for me? An interesting bit is that the trailing comma is excluded from the result, which is what I want.
For what it is worth, if I were using C# there is syntax for prefix matching that does what I want: (?<=^|,)
However, it appears to be unsupported in JavaScript.
Lastly, I know I could workaround it using string splitting, array manipulation and rejoining, but I want to learn.
Use word boundaries instead:
\b272\b
ensures that only 272 matches, but not 2725.
(?:...) matches and doesn't capture - but whatever it matches will be part of the overall match.
A lookaround assertion like (?=...) is different: It only checks if it is possible (or impossible) to match the enclosed regex at the current point, but it doesn't add to the overall match.
Here is a way to create a JavaScript look behind that has worked in all cases I needed.
This is an example. One can do many more complex and flexible things.
The main point here is that in some cases,
it is possible to create a RegExp non-capturing prefix
(look behind) construct in JavaScript .
This example is designed to extract all fields that are surrounded by braces '{...}'.
The braces are not returned with the field.
This is just an example to show the idea at work not necessarily a prelude to an application.
function testGetSingleRepeatedCharacterInBraces()
{
var leadingHtmlSpaces = ' ' ;
// The '(?:\b|\B(?={))' acts as a prefix non-capturing group.
// That is, this works (?:\b|\B(?=WhateverYouLike))
var regex = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g ;
var string = '' ;
string = 'Message has no fields' ;
document.write( 'String => "' + string
+ '"<br>' + leadingHtmlSpaces + 'fields => '
+ getMatchingFields( string, regex )
+ '<br>' ) ;
string = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}' ;
document.write( 'String => "' + string
+ '"<br>' + leadingHtmlSpaces + 'fields => '
+ getMatchingFields( string, regex )
+ '<br>' ) ;
} ;
function getMatchingFields( stringToSearch, regex )
{
var matches = stringToSearch.match( regex ) ;
return matches ? matches : [] ;
} ;
Output:
String => "Message has no fields"
fields =>
String => "{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}"
fields => LLLLL,11111,22222,EEEEE,_____,55555
Related
Am trying to find a regex expression for this result:
string => should be matched (a single word or set of words at the beginning or the ending)
string => should be matched (a single word or set of words in the middle)
{{string}} -- should not be matched (a single word or set of words surrounded by two "{}" should not be matched)
am using this regex in this function :
text = text.replace(RegExp("([^{]{2})[^(\d:)]" + aTags[index].textContent + "\w*
([^}]{2})", 'i'), "{{" + index + ":" + aTags[index].textContent + "}}");
the function should find the textContent of an 'a' tag in a 'text' string and replace it by adding a digit and ':' to the beginning of the textContent so that the result should be something like this :
some text => will became => {{1:some text}}
We can apply the good old *SKIP what's to avoid approach and throw everything that does not need to be replaced in the full match and capture the desired output in group 1:
{{[^}]+}}|(string)
To make this work effectively in JavaScript we have to use a .replace callback function:
const regex = /{{[^}]+}}|(string)/gm;
const str = `string
string
{{string}}`;
var index = 1; //this is your index var and is somehow set from outside
const result = str.replace(regex, function(m, group1) {
if (group1) return `{{${index}:${group1}}}`;
else return m;
});
console.log('Substitution result: ', result);
I had pseudo-coded this a bit since I cannot know where index and aTags[index].textContent is coming from. Adjust as needed.
You cannot use PCRE verbs like (*SKIP)(*F) in a JavaScript regex, i.e. you cannot skip a matched portion of text with the regex means only. In JavaScript, you may match and capture a part of the string you want to later analyze in the replacement callback method (JS String#replace accepts a callback as the replacement argument).
So, in your case the solution will look like
text = text.replace(RegExp("{{.*?}}|(" + aTags[index].textContent + ")", "gi"),
function ($0, $1) {
return $1 ? "{{" + index + ":" + $1 + "}}" : $0;
}
);
I understand the aTags[index].textContent value is alphanumeric, else, consider escaping it for use in a regex pattern.
The pattern will match a {{...}} substring having no } inside (with {{.*?}}) or (|) it will match and capture the text content ((aTags[index].textContent)) into Group 1. When you get a match, you need to pass 2 arguments to the callback, the whole match and Group 1 value. If Group 1 is not empty, you perform string manipulations, else, just insert the match back.
I am looking for an easier (and less hacky) way to get the substring of what is inside matching square brackets in a string. For example, lets say this is the string:
[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ
I want the substring:
ABC[D][E[FG]]HIJK[LMN]
Right now, I am looping through the string and counting the open and closed brackets, and when those numbers are the same, I take substring of the first open bracket and last closed bracket.
Is there an easier way to do this (ie with regex), so that I do need to loop through every character?
Here's another approach, an ugly hack which turns the input into a JS array representation and then parses it using JSON.parse:
function parse(str) {
return JSON.parse('[' +
str.split('') . join(',') . // insert commas
replace(/\[,/g, '[') . // clean up leading commas
replace(/,]/g, ']') . // clean up trailing commas
replace(/\w/g, '"$&"') // quote strings
+ ']');
}
>> hack('A[B]C')
<< ["A", ["B"], "C"]
Now a stringifier to turn arrays back into the bracketed form:
function stringify(array) {
return Array.isArray(array) ? '[' + array.map(stringify).join('') + ']' : array;
}
Now your problem can be solved by:
stringify(parse("[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ")[0])
Not sure if I get the question right (sorry about that).
So you mean that if you were to have a string of characters X, you would like to check if the string combination Y is contained within X?
Where Y being ABC[D][E[FG]]HIJK[LMN]
If so then you could simply do:
var str = "[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ";
var res = str.match(/ABC\[D]\[E\[FG]]HIJK\[LMN]/);
The above would then return the string literal Y as it matches what is inside str.
It is important that you pay attention to the fact that the symbols [ are being escaped with a \. This is because in regex if you were to have the two square brackets with any letter in between (ie. [asd]) regex would then match the single characters included in the specified set.
You can test the regex here:
https://regex101.com/r/zK3vZ3/1
I think the problem is to get all characters from an opening square bracket up to the corresponding closing square bracket. Balancing groups are not implemented in JavaScript, but there is a workaround: we can use several optional groups between these square brackets.
The following regex will match up to 3 nested [...] groups and you can add the capturing groups to support more:
\[[^\]\[]*(?:
\[[^\]\[]*(?:
\[[^\]\[]*(?:\[[^\]\[]*\])*\]
)*[^\]\[]*
\][^\]\[]*
)*[^\]\[]*
\]
See example here. However, performance may be not that high with such heavy backtracking.
UPDATE
Use XRegExp:
var str = '[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ';
// First match:
var res = XRegExp.matchRecursive(str, '\\[', ']');
document.body.innerHTML = "Getting the first match:<br/><pre>" + JSON.stringify(res, 0, 4) + "</pre><br/>And now, multiple matches (add \"g\" modifier when defining the XRegExp)";
// Multiple matches:
res = XRegExp.matchRecursive(str, '\\[', ']', 'g');
document.body.innerHTML += "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>
var = invert ('HTMLBody ("Love"s ");');
staying equal to:
using invert to stay more or less the same, with \, only within the HTMLBody:
var = invert ('HTMLBody ("Love\"s ");');
I want to put \ var inside, and can not be in quotes function HTMLBody put \ only what you have inside the function HTMLBody in var
What is the function invert? Using regex?
Since you've stated that the snippet is at least partially generated from PHP, you'd honestly be better served fixing that code so the snippet is generated properly rather than trying to fix the result after-the-fact.
[Edit] For example, if you started with:
$functionName = 'HTMLBody';
$arguments = array('Love"s ');
You can json_encode() each $argument with array_map():
$encodedArgs = array_map('json_encode', $arguments);
Then, combine the $functionName and $encodedArgs, and once again encode as JSON to produce a String literal of the code:
echo json_encode( $functionName . '( ' . implode(', ', $encodedArgs) . ' )' );
This should echo the following (demo):
"HTMLBody( \"Love\\\"s \" );"
For the example in your comment, the starting values would be:
$functionName = 'functionjsalert';
$arguments = array('type"s" here', 'value"s here');
Which should result in (demo):
"functionjsalert( \"type\\\"s\\\" here\", \"value\\\"s here\" )"
To accomplish this with Regular Expressions, the string should be somewhat regular.
So, if it always follows the format of a function call with a single String argument, albeit with too many unescaped quotations:
functionName (" stringValue ");
Note: If the String of code includes multiple arguments, then this isn't possible. There's no reliable way to recognize when a " is part of a value or is supposed to delimit one.
But, the following should work:
function invert(snippet) {
return snippet.replace(/(\w+\s*\(\s*\")(.*?)(\"\s*\)\s*;)/, function (_, pre, sub, post) {
return pre + sub.replace(/(")/g, '\\$1') + post;
});
}
var result = invert('HTMLBody ("Love"s ");');
console.log(result); // HTMLBody ("Love\"s ")
The 1st RegExp matches 3 individual parts:
(\w+\s*\(\s*\") matches the function name, left parenthesis, and 1st double-quote
pre === 'HTMLBody ("'
(.*?) matches anything in between the previous group and the next
sub === 'Love"s '
(\"\s*\)\s*;) matches the last double-quote, the right parenthesis, and the semicolon
post === '");'
The 2nd RegExp then simply searches the middle group, sub, for any " and places a \ before them.
And, if invert already exists and serves another purpose, rename this function as desired.
I have the following string that I am attempting to match:
REQS_HOME->31
The following Javascript code is attempting to match this:
pathRegExPattern = '(' + docTypeHome + ')' + '(->)' + '(\d+)';
parsedResult = pathCookieValue.match(pathRegExPattern);
cookieToDelete = docType + '_ScrollPos_' + $3;
alert(parsedResult); // output - null
Assume the following:
docTypeHome = "REQS_HOME"
pathCookieValue = "REQS_HOME->31"
Firstly, I am not calling my match function properly. And secondly, how do I access the value where I am attempting to match the digit values using the backreference operator?
I need to extract the value 31.
Your digit-matching part needs to double-up on the backslashes:
pathRegExPattern = '(' + docTypeHome + ')' + '(->)' + '(\\d+)';
When you build up a regular expression from string parts, the string syntax itself will "eat" a backslash. Thus, the regex you were winding up with was just d+, without the backslash.
The "31" (or whatever the number ends up being) will be in parsedResult[3]. Note that it'll be a string, so if you need it to be a number you'll want to convert it first, via the Number constructor, or parseInt(), or whatever.
I have strings with extra whitespace characters. Each time there's more than one whitespace, I'd like it be only one. How can I do this using JavaScript?
Something like this:
var s = " a b c ";
console.log(
s.replace(/\s+/g, ' ')
)
You can augment String to implement these behaviors as methods, as in:
String.prototype.killWhiteSpace = function() {
return this.replace(/\s/g, '');
};
String.prototype.reduceWhiteSpace = function() {
return this.replace(/\s+/g, ' ');
};
This now enables you to use the following elegant forms to produce the strings you want:
"Get rid of my whitespaces.".killWhiteSpace();
"Get rid of my extra whitespaces".reduceWhiteSpace();
Here's a non-regex solution (just for fun):
var s = ' a b word word. word, wordword word ';
// with ES5:
s = s.split(' ').filter(function(n){ return n != '' }).join(' ');
console.log(s); // "a b word word. word, wordword word"
// or ES2015:
s = s.split(' ').filter(n => n).join(' ');
console.log(s); // "a b word word. word, wordword word"
Can even substitute filter(n => n) with .filter(String)
It splits the string by whitespaces, remove them all empty array items from the array (the ones which were more than a single space), and joins all the words again into a string, with a single whitespace in between them.
using a regular expression with the replace function does the trick:
string.replace(/\s/g, "")
I presume you're looking to strip spaces from the beginning and/or end of the string (rather than removing all spaces?
If that's the case, you'll need a regex like this:
mystring = mystring.replace(/(^\s+|\s+$)/g,' ');
This will remove all spaces from the beginning or end of the string. If you only want to trim spaces from the end, then the regex would look like this instead:
mystring = mystring.replace(/\s+$/g,' ');
Hope that helps.
jQuery.trim() works well.
http://api.jquery.com/jQuery.trim/
I know I should not necromancy on a subject, but given the details of the question, I usually expand it to mean:
I want to replace multiple occurences of whitespace inside the string with a single space
...and... I do not want whitespaces in the beginnin or end of the string (trim)
For this, I use code like this (the parenthesis on the first regexp are there just in order to make the code a bit more readable ... regexps can be a pain unless you are familiar with them):
s = s.replace(/^(\s*)|(\s*)$/g, '').replace(/\s+/g, ' ');
The reason this works is that the methods on String-object return a string object on which you can invoke another method (just like jQuery & some other libraries). Much more compact way to code if you want to execute multiple methods on a single object in succession.
var x = " Test Test Test ".split(" ").join("");
alert(x);
Try this.
var string = " string 1";
string = string.trim().replace(/\s+/g, ' ');
the result will be
string 1
What happened here is that it will trim the outside spaces first using trim() then trim the inside spaces using .replace(/\s+/g, ' ').
How about this one?
"my test string \t\t with crazy stuff is cool ".replace(/\s{2,9999}|\t/g, ' ')
outputs "my test string with crazy stuff is cool "
This one gets rid of any tabs as well
If you want to restrict user to give blank space in the name just create a if statement and give the condition. like I did:
$j('#fragment_key').bind({
keypress: function(e){
var key = e.keyCode;
var character = String.fromCharCode(key);
if(character.match( /[' ']/)) {
alert("Blank space is not allowed in the Name");
return false;
}
}
});
create a JQuery function .
this is key press event.
Initialize a variable.
Give condition to match the character
show a alert message for your matched condition.