Regex remove quotes after and before certain characters - javascript

I need to clean unwanted quotes out of a string before parsing it as JSON. The string looks like ordinary JSON, except that sometimes there are unwanted quotes in the description or other fields, e.g.
'{"description":"this is a long **"** description with strange quotes **"** like **"**","number":"111111111","quantity":"10","price":"5.20","unit":"ST **"** " }'
needs to look like:
'{"description":"this is a long description with strange quotes like ","number":"111111111","quantity":"10","price":"5.20","unit":"ST" }'
I came up with following regex:
[\w\s\d](")[\w\s\d]
One issue is that the regex also matches the character before and after the unwanted quote, so with a simple string replace those characters are replaced as well. This is not wanted. The result looks like:
"{"description":"this is a longescription with strange quotes " like "","number":"111111111","quantity":"10","price":"5.20","unit":"ST"" }"
Another issue is that only the first occurrence is matched, not every occurrence.
Can someone please help?
Edit:
Solved! Correct regex was
([\w\s])"(?=\w|(?!\s*[\]}])\s)
and with the replace statement:
string.replace(/([\w\s])"(?=\w|(?!\s*[\]}])\s)/g, "$1");

An issue is that the regex matches also the character before and after the unwanted quote. So with a simple string replace the characters are also getting replaced.
Use '$1$3' as the replacement string to include the first (character before) and third (character after) capturing groups.
Another issue is that only the first occurrence is matched and not every occurrence.
Use the /g regex flag.
Also, note \d is redundant as \w includes digits.
I'm unsure if *'s are actually part of your input or you're just using them to indicate the quotation marks you want removed. I'll assume the latter given your attempted regex and description. Otherwise, just replace " with \**"\** in the 2nd regex group.
let s = '{"description":"this is a long " description with strange quotes " like "","number":"111111111","quantity":"10","price":"5.20","unit":"ST " " }'
let out = s.replace(/([\w\s"])(")(["\w\s])/g, '$1$3');
console.log(out);

Related

Backslash bug in JavaScript

I have a string that involves tricky \\ characters.
Below is the initial code, and what I am literally trying to achieve but it is not working. I have to replace the \" characters but I think that is where the bug is.
var current = csvArray[0][i].Replace("\"", "");
I have tried the variation below but it is still not working.
var current = csvArray[0][i].Replace('\"', '');
It is currently throwing an Uncaught TypeError: csvArray[0][i].Replace is not a function
Is there a way for Javascript to take my string ("\"") literally like in C#? Kindly help me investigate. Thanks!
If the sequence you want to match is a single backslash character followed by a quotation mark, then you need to escape the backslash itself because backslashes have special meaning in string literals. You then need to separately escape the quotation mark with its own backslash:
.replace("\\\"", "")
I believe that would also be true in C#.
Or you can simplify it by using single quotes around the string so that only the backslash needs to be escaped:
.replace('\\"', '')
If the first argument to .replace() is a string, however, it will only replace the first occurrence. To do a global replace you have to use a regular expression with the g flag, noting that backslashes need to be escaped in regular expressions too:
.replace(/\\"/g, '')
I'm not going to set up a demo array to exactly match your code, but here's a simple demo where you can see that a lone backslash or a lone quote in the input string is not replaced, but all backslash-quote combinations are:
var input = 'Some\\ test" \\" text \\" for demo \\"'
var output = input.replace(/\\"/g, '')
console.log(input)
console.log(output)
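Separately from the escaping, the TypeError in the question comes from the capitalised .Replace: JavaScript method names are case-sensitive, and the string method is replace. A minimal sketch contrasting the three ways to call it (the sample string is made up for illustration, and replaceAll assumes an ES2021-capable runtime):

```javascript
// Sample input, invented for this sketch. The characters are: a\"b\"c
const sample = 'a\\"b\\"c';

// String pattern: only the first backslash-quote pair is removed.
const firstOnly = sample.replace('\\"', '');

// Regex with the g flag: every backslash-quote pair is removed.
const allWithRegex = sample.replace(/\\"/g, '');

// replaceAll with a string pattern (assumes ES2021 support).
const allWithReplaceAll = sample.replaceAll('\\"', '');

console.log(firstOnly);         // ab\"c
console.log(allWithRegex);      // abc
console.log(allWithReplaceAll); // abc
```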

Regex to match a JSON String

I am building a JSON validator from scratch, but I am quite stuck with the string part. My hope was building a regex which would match the following sequence found on JSON.org:
My regex so far is:
/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4}))*\"$/
It does match a backslash followed by one of the escape characters, and it matches an empty string. But I'm not sure how to handle the UNICODE part.
Is there a regex to match any UNICODE character except " or \ or a control character? And will it match a newline or horizontal tab?
The last question is because the regex matches the string "\t", but not " " (four spaces, but the idea is to be a tab). Otherwise I will need to expand the regex with it, which is not a problem, but my guess is the horizontal tab is a UNICODE character.
Thanks to Jaeger Kor, I now have the following regex:
/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4})|[^\\"]*)*\"$/
It appears to be correct, but is there any way to check for control characters or is this unneeded as they appear on the non-printable characters on regular-expressions.info? The input to validate is always text from a textarea.
Update: the regex is as follows, in case anyone needs it:
/^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/
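One way to sanity-check the updated regex is to run it against a handful of candidates. This is only a sketch: the inputs below are made up for illustration, and short on purpose, since the nested quantifier could backtrack heavily on long invalid input.

```javascript
const jsonString = /^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/;

// Hypothetical inputs; each entry is [candidate text, expected result].
const cases = [
  ['"hello"', true],
  ['""', true],             // empty string is valid
  ['"tab\\there"', true],   // backslash followed by t (an escape sequence)
  ['"\\u00e9"', true],      // backslash, u, four hex digits
  ['"bad\\x"', false],      // \x is not a JSON escape
  ['"raw\ttab"', false],    // a literal tab is a control character
  ['"un"closed"', false],   // unescaped quote inside the string
  ['no quotes', false],
];

const results = cases.map(([text]) => jsonString.test(text));

cases.forEach(([text, expected], i) => {
  console.log(JSON.stringify(text), results[i] === expected ? 'as expected' : 'UNEXPECTED');
});
```

So a literal tab typed into the textarea is rejected, while the two-character sequence \t is accepted, which matches the JSON grammar.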
For your exact question create a character class
# Matches any character that isn't a \ or "
/[^\\"]/
And then you can just add * on the end to get 0 or unlimited number of them or alternatively 1 or an unlimited number with +
/[^\\"]*/
or
/[^\\"]+/
Also there is this below, found at https://regex101.com/ under the library tab when searching for json
/(?(DEFINE)
# Note that everything is atomic, JSON does not need backtracking if it's valid
# and this prevents catastrophic backtracking
(?<json>(?>\s*(?&object)\s*|\s*(?&array)\s*))
(?<object>(?>\{\s*(?>(?&pair)(?>\s*,\s*(?&pair))*)?\s*\}))
(?<pair>(?>(?&STRING)\s*:\s*(?&value)))
(?<array>(?>\[\s*(?>(?&value)(?>\s*,\s*(?&value))*)?\s*\]))
(?<value>(?>true|false|null|(?&STRING)|(?&NUMBER)|(?&object)|(?&array)))
(?<STRING>(?>"(?>\\(?>["\\\/bfnrt]|u[a-fA-F0-9]{4})|[^"\\\0-\x1F\x7F]+)*"))
(?<NUMBER>(?>-?(?>0|[1-9][0-9]*)(?>\.[0-9]+)?(?>[eE][+-]?[0-9]+)?))
)
\A(?&json)\z/x
This should match any valid json, you can also test it at the website above
EDIT:
Link to the regex
Use this, works also with array jsons [{...},{...}]:
((\[[^\}]{3,})?\{s*[^\}\{]{3,}?:.*\}([^\{]+\])?)
Demo:
https://regex101.com/r/aHAnJL/1

jQuery Attribute Equals Selector [name=”value”] issues passing a variable that ends with backslash

Humour me on this one.
In my code I am attempting to find all the divs that match a data-attribute value. Now this value is created by a user so the string could contain anything.
During my testing I ran into an error when the value contained a quote and ended with a backslash "\" (The javascript escape character).
Error: Syntax error, unrecognized expression:
className[data-attributename="Mac"Mac\"]
Here is an example (please note in this example the double backslash escapes itself and the first backslash escapes the quote):
var value= "Mac\"Mac\\";
$('.className[data-attributename="'+value+'"]');
This error only occurs if the string contains a quote (") and has a backslash (\) at the end of the string. If there is a space after the backslash, or if the backslash is at the beginning or in the middle of the string, there is no issue.
Is it possible to pass a variable that includes a quote or apostrophe ( " ' ) and ends with a backslash (\) into the jQuery Attribute Equals Selector?
One obvious solution would be just to prevent my users from using the backslash "\" character. If I do this is there any other characters that could be harmful using this jQuery selector?
Another solution would be:
var value= "Mac\"Mac\\";
$('.className').each(function(){
if($(this).attr('data-attributename') === value){
//perform action
}
});
With this solution would it be less efficient because it would have to iterate through each element or does the Attribute Equals Selector essentially work the same way? If so, for safety should I always use this solution over the attribute equals selector?
Here is an example of the div I would be trying to select:
$('body').append("<div class='className' data-attributename='Mac\"Mac\\' ></div>")
You will have to use the second solution to get jQuery working.
Similar questions have been asked, and this is an answer to one of them: jQuery selector value escaping
$("#SomeDropdown >option[value='a\\'b]<p>']")
But this doesn't work in jQuery because its selector parser is not completely standards-compliant. It uses this regex to parse the value part of an [attr=value] condition:
(['"]*)(.*?)\3|)\s*\]
\3 being the group containing the opening quotes, which weirdly are allowed to be multiple opening quotes, or no opening quotes at all. The .*? then can parse any character, including quotes until it hits the first ‘]’ character, ending the match. There is no provision for backslash-escaping CSS special characters, so you can't match an arbitrary string value in jQuery.
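For a selector engine that does honour CSS backslash escapes, the usual approach would be to escape the backslash and the double quote before concatenating the value into the selector. The sketch below only builds the selector string. escapeAttrValue is a hypothetical helper, not a jQuery API, it does not handle newlines, and whether a given jQuery version accepts the result is an assumption you would need to verify. Comparing attribute values in a loop, as in the question's second solution, remains the safe fallback.

```javascript
// Hypothetical helper: backslash-escape the two characters that would end or
// corrupt a double-quoted CSS attribute value (the quote and the backslash).
function escapeAttrValue(value) {
  return String(value).replace(/["\\]/g, '\\$&');
}

// The characters of this value are: Mac"Mac\
const value = 'Mac"Mac\\';

const selector = '.className[data-attributename="' + escapeAttrValue(value) + '"]';

console.log(selector); // .className[data-attributename="Mac\"Mac\\"]
```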
You can use the escape() function to escape special characters, like:
value = escape('Max\\'); //it will return Max%5C

What is this "/\,$/"?

I tried to search for /\,$/ online, but couldn't find anything.
I have:
coords = coords.replace(/\,$/, "");
I'm guessing it returns an index into the coords string. What should I search for online so I can learn more?
/\,$/ finds the comma character (,) at the end of a string (denoted by the $) and replaces it with empty (""). You sometimes see this in regex code aiming to clean up excerpts of text.
It's a regular expression to remove a trailing comma.
That thing is a Regular Expression, also known as regex or regexp. It is a way to "match" strings using some rules. If you want to learn how to use it in JavaScript, read the Mozilla Developer Network page about RegExp.
By the way, regular expressions are also available on most languages and in some tools. It is a very useful thing to learn.
That's a regular expression that finds a comma at the end of a string. That code removes the comma.
// defines a JavaScript regular expression, used to match a pattern within a string.
\,$ is the pattern
In this case \, translates to ,. A backslash is used to escape special characters, but in this case, it's not necessary. An example where it would be necessary would be to remove trailing periods. If you tried to do that with /.$/ the period here has a different meaning; it is used as a wildcard to match [almost] any character (aside from line terminators). So in this case to match on "." (period character) you would have to escape the wildcard (/\.$/).
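The difference between the unescaped and escaped period can be seen in a short sketch (the two sample strings are made up for illustration):

```javascript
const withPeriod = 'Ends with a period.';
const withoutPeriod = 'No period here';

// Unescaped dot: matches whatever the last character happens to be.
const wildcardResult = withoutPeriod.replace(/.$/, '');

// Escaped dot: only a literal trailing period is removed.
const escapedUnchanged = withoutPeriod.replace(/\.$/, '');
const escapedTrimmed = withPeriod.replace(/\.$/, '');

console.log(wildcardResult);   // No period her
console.log(escapedUnchanged); // No period here
console.log(escapedTrimmed);   // Ends with a period
```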
When $ is placed at the end of the pattern, it means only look at the end of the string. This means that you can't mistakenly find a comma anywhere in the middle of the string (e.g., not after help in help, me,), only at the end (trailing). It can also speed up the regular expression search. If you wanted to match on characters only at the beginning of the string, you would start off the pattern with a caret (^), for instance /^,/ would find a comma at the start of a string if one existed.
It's also important to note that you're only removing one comma, whereas if you use the plus (+) after the comma, you'd be replacing one or more: /,+$/.
Without the +; trailing commas,, becomes trailing commas,
With the +; no trailing comma,, becomes no trailing comma
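A small sketch of the three anchored patterns discussed above, using a made-up coordinate string:

```javascript
// Sample input, invented for illustration.
const rawCoords = '10,20,30,,';

// /,$/ removes exactly one trailing comma.
const oneRemoved = rawCoords.replace(/,$/, '');

// /,+$/ removes the whole run of trailing commas.
const allRemoved = rawCoords.replace(/,+$/, '');

// /^,/ removes a single leading comma instead.
const leadingRemoved = ',10,20'.replace(/^,/, '');

console.log(oneRemoved);     // 10,20,30,
console.log(allRemoved);     // 10,20,30
console.log(leadingRemoved); // 10,20
```

Commas in the middle of the string are untouched in all three cases because the patterns are anchored.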

Javascript String pattern Validation

I have a string and I want to validate that string so that it must not contain certain characters like '/' '\' '&' ';' etc... How can I validate all that at once?
You can solve this with regular expressions!
mystring = "hello"
yourstring = "bad & string"
validRegEx = /^[^\\\/&]*$/
alert(mystring.match(validRegEx))
alert(yourstring.match(validRegEx))
Matching against the regex returns a match array (which alert displays as the string) if it is OK, or null if it's invalid!
Explanation:
JavaScript RegEx Literals are delimited like strings, but with slashes (/'s) instead of quotes ("'s).
The first and last characters of the validRegEx cause it to match against the whole string, instead of just part: the caret anchors it to the beginning, and the dollar sign to the end.
The part between the brackets ([ and ]) is a character class, which matches any character so long as it's in the class. The first character inside that, a caret, means that the class is negated, to match the characters not mentioned in the character class. If it had been omitted, the class would match the characters it specifies.
The next two sequences, \\ and \/, are backslash-escaped because the backslash by itself would be an escape sequence for something else, and the forward slash would confuse the parser into thinking that it had reached the end of the regex (much like escaping quotes in strings).
The ampersand (&) has no special meaning and is unescaped.
The remaining character, the Kleene star (*), means that whatever preceded it should be matched zero or more times, so that the character class will eat as many characters that are not forward or backward slashes or ampersands as it can, including none if it can't find any. If you wanted to make sure the matched string was non-empty, you can replace it with a plus (+).
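The effect of swapping the star for a plus can be checked with test(), which returns a boolean rather than a match array. The variable names below are made up for this sketch:

```javascript
// Same pattern as validRegEx: no backslash, forward slash, or ampersand anywhere.
const noForbiddenChars = /^[^\\\/&]*$/;

// Variant with + instead of *: additionally requires at least one character.
const nonEmptyNoForbiddenChars = /^[^\\\/&]+$/;

console.log(noForbiddenChars.test('hello'));        // true
console.log(noForbiddenChars.test('bad & string')); // false
console.log(noForbiddenChars.test('a\\b'));         // false (contains a backslash)
console.log(noForbiddenChars.test(''));             // true: zero characters is allowed by *
console.log(nonEmptyNoForbiddenChars.test(''));     // false: + needs at least one character
```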
I would use regular expressions.
See this guide from Mozilla.org. This article also gives a good introduction to regular expressions in JavaScript.
Here is a good article on Javascript validation. Remember you will need to validate on the server side too. Javascript validation can easily be circumvented, so it should never be used for security reasons such as preventing SQL Injection or XSS attacks.
You could learn regular expressions, or (probably simpler if you only check for one character at a time) you could have a list of characters and then some kind of sanitize function to remove each one from the string.
var myString = "An /invalid &string;";
var charList = ['/', '\\', '&', ';']; // etc...
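function sanitize(input, list) {
  // for...of yields the characters themselves; for...in would yield the indices "0", "1", ...
  for (const ch of list) {
    // split/join removes every occurrence; replace() with a string pattern removes only the first
    input = input.split(ch).join('');
  }
  return input;
}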
So then:
sanitize(myString, charList) // returns "An invalid string"
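A list-driven approach can also be combined with a regular expression, so the string is cleaned in a single pass. This is a sketch only: stripChars and escapeForRegExp are hypothetical helper names, and the escaping covers the usual regex metacharacters rather than every possible edge case.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\-]/g, '\\$&');
}

// Build one character class from the list and remove every match globally.
function stripChars(input, chars) {
  const pattern = new RegExp('[' + chars.map(escapeForRegExp).join('') + ']', 'g');
  return input.replace(pattern, '');
}

const forbidden = ['/', '\\', '&', ';'];

console.log(stripChars('An /invalid &string;', forbidden)); // An invalid string
console.log(stripChars('a//b\\\\c&&', forbidden));          // abc
```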
You can use the test method, with regular expressions:
function validString(input){
return !(/[\\/&;]/.test(input));
}
validString('test;') //false
You can use regex. For example if your string matches:
[\\/&;]+
then it is not valid. Look at:
http://www.regular-expressions.info/javascriptexample.html
You could probably use a regular expression.
As the others have answered you can solve this with regexp but remember to also check the value server-side. There is no guarantee that the user has JavaScript activated. Never trust user input!
