Don't understand how this regular expression works - javascript

"She's the one!".match(/['"](.+?)['"]/g);
Why is the result null? I expected it to match "She'.

It matches text between two quote or apostrophe characters. The string contains only one such character, the apostrophe. (The " characters that delimit the string literal are not part of the data in the string.)

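Your pattern also requires an opening quote, but the outer quotes are not part of the string, so there is nothing for it to match.
You can get the result you want by dropping the opening quote from the pattern: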
"She's the one!".match(/(.+?)['"]/g);
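To make the difference concrete, here is a small sketch that runs the original pattern against the string as written, against a string whose data really does contain the outer quotes, and then runs the shortened pattern. The variable names are only illustrative.

```javascript
// The pattern from the question: quote, lazy capture, quote.
const pattern = /['"](.+?)['"]/g;

// The delimiting " are not part of the data, so only one quote-like
// character (the apostrophe) is present and nothing can match.
const plain = "She's the one!";
console.log(plain.match(pattern)); // null

// Here the double quotes ARE part of the data, so the pattern finds
// an opening " and the apostrophe as the closing character.
const quoted = '"She\'s the one!"';
console.log(quoted.match(pattern)); // one match: "She'

// The shortened pattern from the answer only needs a closing quote.
console.log(plain.match(/(.+?)['"]/g)); // one match: She'
```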


Regex remove quotes after and before certain characters

I need to clean unwanted quotes out of a string before parsing it as JSON. The string looks like ordinary JSON, except that sometimes there are unwanted quotes in the description or other fields, e.g.
'{"description":"this is a long **"** description with strange quotes **"** like **"**","number":"111111111","quantity":"10","price":"5.20","unit":"ST **"** " }'
needs to look like:
'{"description":"this is a long description with strange quotes like ","number":"111111111","quantity":"10","price":"5.20","unit":"ST" }'
I came up with following regex:
[\w\s\d](")[\w\s\d]
One issue is that the regex also matches the character before and the character after the unwanted quote, so a simple string replace removes those characters as well. This is not wanted. The result looks like:
"{"description":"this is a longescription with strange quotes " like "","number":"111111111","quantity":"10","price":"5.20","unit":"ST"" }"
Another issue is that only the first occurrence is matched, not every occurrence.
Can someone please help?
Edit:
Solved! Correct regex was
([\w\s])"(?=\w|(?!\s*[\]}])\s)
and with the replace statement:
string.replace(/([\w\s])"(?=\w|(?!\s*[\]}])\s)/g, "$1");
One issue is that the regex also matches the character before and the character after the unwanted quote, so a simple string replace removes those characters as well.
Use '$1$3' as the replacement string to include the first (character before) and third (character after) capturing groups.
Another issue is that only the first occurrence is matched, not every occurrence.
Use the /g regex flag.
Also, note \d is redundant as \w includes digits.
I'm unsure if *'s are actually part of your input or you're just using them to indicate the quotation marks you want removed. I'll assume the latter given your attempted regex and description. Otherwise, just replace " with \**"\** in the 2nd regex group.
let s = '{"description":"this is a long " description with strange quotes " like "","number":"111111111","quantity":"10","price":"5.20","unit":"ST " " }'
let out = s.replace(/([\w\s"])(")(["\w\s])/g, '$1$3');
console.log(out);
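One thing worth keeping in mind with the '$1$3' approach: the character after the quote is consumed by the match, so it cannot also serve as the "character before" of the next quote. A lookahead, as in the asker's own fix, checks the following character without consuming it. The sketch below shows this on a deliberately tiny made-up string (not the JSON from the question), with \w standing in for the fuller character classes.

```javascript
// Two unwanted quotes separated by a single character.
const sample = 'a"b"c';

// Consuming both neighbours: after matching a"b, the scan resumes
// at the second quote, which no longer has an unconsumed \w before it.
const consuming = sample.replace(/(\w)"(\w)/g, '$1$2');
console.log(consuming); // ab"c

// Lookahead: the following character is checked but not consumed,
// so it is still available as the "before" character of the next quote.
const lookahead = sample.replace(/(\w)"(?=\w)/g, '$1');
console.log(lookahead); // abc

// Lookbehind plus lookahead: only the quote itself is matched,
// so no capture groups are needed in the replacement.
const lookaround = sample.replace(/(?<=\w)"(?=\w)/g, '');
console.log(lookaround); // abc
```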

Does regex syntax provide a way of escaping a whole string, rather than escaping characters one by one?

If I want to find a reference to precisely the following string:
http ://www.mydomain.com/home
within a more complex regex expression.
Is it possible to escape the whole sequence instead of escaping each / and . character individually? To get something more readable than
/http:\/\/www\.mydomain\.com\/home/
In the regex parsing site https://regexr.com/ , if I type the url in and set a regex to
/(http ://www.mydomain.com/home)/
, it appears to recognize the string, yet declares an error:
Unescaped forward slash. This may cause issues if copying/pasting this expression into code.
So I'm confused about this issue.
It appears that regex does not offer such a syntax, at least in JavaScript. It is possible, however, to proceed as follows:
First, put the literal text in a string and automatically escape all the special characters in it, as indicated in: Escape string for use in Javascript regex.
Then, concatenate that string with strings representing the rest of the expression you want to create.
Finally, turn the resulting string into a regular expression, as indicated in: Javascript regular expression - string to RegEx object.
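The three steps could look roughly like the sketch below. The helper name escapeRegExp and the surrounding href="..." pattern are assumptions made for illustration; only the URL comes from the question. Because the pattern is built with the RegExp constructor rather than a /.../ literal, the forward slashes need no escaping at all.

```javascript
// Step 1: escape every character that has a special meaning in a regex.
// (escapeRegExp is an illustrative helper name, not a built-in.)
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const literal = escapeRegExp('http://www.mydomain.com/home');
console.log(literal); // http://www\.mydomain\.com/home

// Step 2: concatenate with the rest of the expression (hypothetical here:
// an href attribute with an optional query string).
const source = 'href="(' + literal + ')(\\?[^"]*)?"';

// Step 3: turn the string into a regular expression.
const urlPattern = new RegExp(source);

console.log(urlPattern.test('<a href="http://www.mydomain.com/home?x=1">')); // true
console.log(urlPattern.test('<a href="http://wwwXmydomain.com/home">'));     // false
```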

jQuery Attribute Equals Selector [name=”value”] issues passing a variable that ends with backslash

Humour me on this one.
In my code I am attempting to find all the divs that match a data-attribute value. Now this value is created by a user so the string could contain anything.
During my testing I ran into an error when the value contained a quote and ended with a backslash "\" (The javascript escape character).
Error: Syntax error, unrecognized expression:
className[data-attributename="Mac"Mac\"]
Here is an example (please note in this example the double backslash escapes itself and the first backslash escapes the quote):
var value= "Mac\"Mac\\";
$('.className[data-attributename="'+value+'"]');
This error only occurs if the string contains a quote (") and has a backslash (\) at the end of the string. If there is a space after the backslash, or if the backslash is at the beginning or in the middle of the string, there is no issue.
Is it possible to pass a variable that includes a quote or apostrophe ( " ' ) and ends with a backslash (\) into the jQuery Attribute Equals Selector?
One obvious solution would be just to prevent my users from using the backslash "\" character. If I do this, are there any other characters that could be harmful when used with this jQuery selector?
Another solution would be:
var value= "Mac\"Mac\\";
$('.className').each(function(){
if($(this).attr('data-attributename') === value){
//perform action
}
});
With this solution would it be less efficient because it would have to iterate through each element or does the Attribute Equals Selector essentially work the same way? If so, for safety should I always use this solution over the attribute equals selector?
Here is an example of the div I would be trying to select:
$('body').append("<div class='className' data-attributename='Mac\"Mac\\' ></div>")
You will have to use the second solution to get jQuery working.
Similar questions have been asked, and this is an answer to one of them: jQuery selector value escaping
$("#SomeDropdown >option[value='a\\'b]<p>']")
But this doesn't work in jQuery because its selector parser is not completely standards-compliant. It uses this regex to parse the value part of an [attr=value] condition:
(['"]*)(.*?)\3|)\s*\]
\3 being the group containing the opening quotes, which weirdly are allowed to be multiple opening quotes, or no opening quotes at all. The .*? then can parse any character, including quotes until it hits the first ‘]’ character, ending the match. There is no provision for backslash-escaping CSS special characters, so you can't match an arbitrary string value in jQuery.
You can use the escape() function to encode the special characters, provided the attribute value is stored in the same encoded form, like
value = escape('Max\\'); //it will return Max%5C
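For comparison, standard CSS string syntax allows a quote or backslash inside a quoted attribute value to be escaped with a preceding backslash. The sketch below builds a selector string that way. Two assumptions are worth stating loudly: the helper name escapeAttrValue is made up for illustration, and whether a given jQuery version honours these escapes depends on its selector parser (the answer quoted above describes one that does not), so this should be tried against the version in use. The value is also assumed to contain no newlines.

```javascript
// Illustrative helper (not a jQuery API): backslash-escape the two
// characters that would otherwise end or corrupt a double-quoted
// CSS string, namely " and \ itself.
function escapeAttrValue(value) {
  return value.replace(/["\\]/g, '\\$&');
}

// The problematic value from the question: Mac"Mac\
const value = 'Mac"Mac\\';

const selector =
  '.className[data-attributename="' + escapeAttrValue(value) + '"]';

console.log(selector);
// .className[data-attributename="Mac\"Mac\\"]
```

If the parser accepts it, the string would be passed as $(selector); otherwise the .each() comparison from the question remains the fallback.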

Trouble with word-boundary (\b)

I have an array of keywords, and I want to know whether at least one of the keywords is found within some string that has been submitted. I further want to be absolutely sure that it is the keyword that has been matched, and not something that is very similar to the word.
Say, for example, that our keywords are [English, Eng, En] because we are looking for some variation of English.
Now, say that the input from a user is i h8 eng class, or something equally provocative and illiterate - then the eng should be matched. It should also fail to match a word like england or some odd thing chen, even though it's got the en bit.
So, in my infinite lack of wisdom I believed I could do something along the lines of this in order to match one of my array items with the input:
.match(RegExp('\b('+array.join('|')+')\b','i'))
With the thinking that the regular expression would look for matches from the array, now presented like (English|Eng|En) and then look to see whether there were zero-width word bounds on either side.
You need to double the backslashes.
When you create a regex with the RegExp() constructor, you're passing in a string. JavaScript string constant syntax also treats the backslash as a meta-character, for quoting quotes etc. Thus, the backslashes will be effectively stripped out before the RegExp() code even runs!
By doubling them, the step of parsing the string will leave one backslash behind. Then the RegExp() parser will see the single backslash before the "b" and do the right thing.
You need to double the backslashes in a JavaScript string or you'll encode a Backspace character:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You need to double-escape \b, because it has a special meaning in strings:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
\b is an escape sequence inside string literals (see table 2.1 on this page). You should escape it by adding one extra backslash:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You do not need to escape \b when used inside a regular expression literal:
/\b(english|eng|en)\b/i
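A short sketch of the difference, using the keywords and sample inputs from the question (the variable names are illustrative). In a string literal '\b' is a single backspace character, while '\\b' is the two characters the regex engine needs to see.

```javascript
const keywords = ['English', 'Eng', 'En'];

// '\b' in a string literal is ONE character (backspace, U+0008);
// '\\b' is TWO characters: a backslash followed by the letter b.
console.log('\b'.length);  // 1
console.log('\\b'.length); // 2

// Broken: the pattern looks for literal backspace characters.
const wrong = RegExp('\b(' + keywords.join('|') + ')\b', 'i');

// Fixed: the regex engine receives \b and treats it as a word boundary.
const right = RegExp('\\b(' + keywords.join('|') + ')\\b', 'i');

console.log(wrong.test('i h8 eng class')); // false
console.log(right.test('i h8 eng class')); // true
console.log(right.test('england'));        // false
console.log(right.test('chen'));           // false
```

If the keywords can contain regex metacharacters, they would also need escaping before being joined.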

Result of javascript regular expression not understood

When I eval (in javascript) [I meant, used string.match()]:
<!--:es-->Text number 1<!--:--><!--:en-->text 2<!--:-->
using
/<!--:es-->(.|\n)*?<!--:-->/
I get as match:
Text number 1,1
I mean, it adds a comma and repeats the last character. Does anybody know why this happens?
PS. text could have carriage return, that is why i used (.|\n).
Thanks a lot.
The result of a regular expression match is an array.
The zero-th element of the array is the whole match: "Text number 1", together with the comment markers on either side of it.
The first element of the array is the contents of the first group, in this case "1" since the * is outside the parentheses.
When the array is converted to a string, you get the contents with commas in between.
When I eval (in javascript)
Don't. Use RegExp
eval() evaluates any ECMAScript; you don't want to do this if you don't have 100% control over the input.
Some research has shown me that . doesn't match newlines in JavaScript by default.
I'd rewrite your regex this way:
/<!--:es-->[\s\S]*?<!--:-->/
This will avoid the problem you saw, as it excludes the capture group.
And ghoppe is right: use RegExp.
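The behaviour described in these answers can be reproduced with the input from the question. The sketch below shows the array returned by match(), why converting it to a string produces the ",1" suffix, and two alternatives: the group-free [\s\S] pattern, and a variant (an addition for illustration, not from the answers) that captures the whole inner text instead.

```javascript
const text = '<!--:es-->Text number 1<!--:--><!--:en-->text 2<!--:-->';

// Original pattern: the group is repeated, so it only remembers the
// last character it matched.
const withGroup = text.match(/<!--:es-->(.|\n)*?<!--:-->/);
console.log(withGroup[0]);      // <!--:es-->Text number 1<!--:-->
console.log(withGroup[1]);      // 1
console.log(String(withGroup)); // <!--:es-->Text number 1<!--:-->,1

// [\s\S] matches any character including newlines and adds no group.
const noGroup = text.match(/<!--:es-->[\s\S]*?<!--:-->/);
console.log(String(noGroup));   // <!--:es-->Text number 1<!--:-->

// Putting the quantifier INSIDE the parentheses captures the inner text.
const inner = text.match(/<!--:es-->([\s\S]*?)<!--:-->/);
console.log(inner[1]);          // Text number 1
```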
