JavaScript regular expression not working as expected

I have a string that looks like JSON but is not valid JSON. It looks like this (it is a single-line string, but I have added line breaks for clarity):
"{
clientId :\"abc\",
note:\"ATTN:Please take care of item x\"
}"
I am trying to fix it (reformat it into valid JSON) using a JavaScript regular expression. I am currently using the following regular expression, but it does not work for the second property, note, because its value contains a colon (:).
retObject.replace(/(['"])?([a-zA-Z0-9_]+)(['"])?:/g, '"$2": ');
What I am trying to do is use a regular expression to reformat the string above to
"{
"clientId" :"abc",
"note":"ATTN:Please take care of item x"
}"
I have tried many approaches but couldn't get it quite right, as I am still a beginner with regular expressions.

Try using .split() with RegExp /[^\w\s\:]/ , .test() with RegExp /\:$/ , .match() with RegExp /\w+/
var str = "{clientId :\"abc\",note:\"ATTN:Please take care of item x\"}";
var res = {};
var arr = str.split(/[^\w\s\:]/).filter(Boolean);
for (var i = 0; i < arr.length; i++) {
if ( /\:$/.test(arr[i]) ) {
res[ arr[i].match(/\w+/) ] = arr[i + 1]
}
}
console.log(res)
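If the input is known to use only simple identifier keys, one possible sketch is to quote a key only when it directly follows { or a comma, so that colons inside values are left alone. The helper name quoteBareKeys is hypothetical, and the approach still breaks if a value itself contains text such as ", word:", so treat it as a stopgap rather than a parser.

```javascript
// Hypothetical helper: quote bare keys, but only when they follow "{" or ",".
// Assumption: keys consist of letters, digits and underscores only.
function quoteBareKeys(text) {
  return text.replace(/([{,]\s*)([A-Za-z0-9_]+)\s*:/g, '$1"$2":');
}

const broken = '{clientId :"abc",note:"ATTN:Please take care of item x"}';
const fixed = quoteBareKeys(broken);
const parsed = JSON.parse(fixed);
console.log(parsed.note);
```

The colon after ATTN is preceded by a quote character rather than { or a comma, which is why it is not touched.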

Trying to fix broken JSON with a regexp is a fool's errand. Just when you think you have the regexp working, you will be presented with additional gobbledygook such as
"{ clientId :\"abc\", note:\"ATTN:Please take \"care\" of item x\" }"
where one of the strings has double quotes inside of it, and now your regexp will fail.
For your own sanity and that of your entire team, both present and future, have the upstream component that is producing this broken JSON fixed. All languages in the world have perfectly competent JSON serializers which will create conformant JSON. Tell the upstream folks to use them.
If you have absolutely no choice, use the much-reviled eval. Meet evil with evil:
eval('(' + json.replace(/\\"/g, '"') + ')')

Related

Find characters except when surrounded by specific characters

I have a string: "${styles.button} ${styles[color]} ${styles[size]} ${styles[_state]} ${iconOnly ? styles.iconOnly : ''}", and I'm trying to use regex to find all the spaces, except for spaces that are part of an interpolation string (${...}).
I'm willing to admit that regex might not be the right tool for this job, but I'm curious what I'm missing.
Essentially what I'm trying to do is replace the spaces with a newline character.
You can split the string into alternating non-interpolation and interpolation sequences and then modify only the sequences at even indices (because the split pattern has a capture group, the resulting array always starts with a non-interpolation string, possibly an empty one). This is necessary because regular expressions are limited in the state they can remember (see formal language theory for details). A solution would be:
var string = "${styles.button} ${styles[color]} ${styles[size]} ${styles[_state]} ${iconOnly ? styles.iconOnly : ''}";
var result = string
// split into non-interpolation and interpolation sequences
.split(/(\${[^}]*})/g)
// modify the sequences with even indices (non-interpolation): spaces become newlines
.map((part, i) => (i % 2 ? part : part.replace(/ +/g, '\n')))
// concatenate the strings
.join('');
console.log(result);
But also mind the comment by ggorlen on your question:
Looks like you're trying to use regex to parse arbitrary JS template strings. That isn't an easy task in the general case and regex is probably the wrong tool for the job--it's likely an xy problem. Can you provide more context (why do you need to parse JS template strings in the first place?) and show an attempt? Thanks.
Assuming you only have ${...} patterns separated by spaces, as in your example, you can apply this regex:
var str = "${styles.button} ${styles[color]} ${styles[size]} ${styles[_state]} ${iconOnly ? styles.iconOnly : ''}"
var re = /(\}) +(\$\{)/g;
var result = str.replace(re, "$1\n$2");
console.log('result: ' + result);
Result:
result: ${styles.button}
${styles[color]}
${styles[size]}
${styles[_state]}
${iconOnly ? styles.iconOnly : ''}
I tested with a simple find ' \$' (without quotes), replace with '\n$' (without quotes) - in sublime text regex search, works well
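Another way to express the same idea is a single replace with a callback: the pattern matches either a whole ${...} chunk or a run of spaces, and only the latter is swapped for a newline. This is a minimal sketch that assumes the interpolations contain no nested braces; the variable names are illustrative.

```javascript
const source = "${styles.button} ${styles[color]} ${styles[size]} ${styles[_state]} ${iconOnly ? styles.iconOnly : ''}";

// First alternative consumes an entire ${...} chunk (including its inner spaces);
// second alternative matches runs of spaces that are outside any chunk.
const multiline = source.replace(/\$\{[^}]*\}| +/g, (match) =>
  match.startsWith('${') ? match : '\n'
);

console.log(multiline);
```

Because the interpolation alternative comes first, spaces inside ${iconOnly ? styles.iconOnly : ''} are consumed as part of that chunk and never reach the second alternative.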

Parsing string as JSON with single quotes?

I have a string
str = "{'a':1}";
JSON.parse(str);
VM514:1 Uncaught SyntaxError: Unexpected token '(…)
How can I parse the above string (str) into a JSON object ?
This seems like a simple parsing. It's not working though.
The JSON standard requires double quotes and will not accept single quotes, nor will the parser.
If you have a simple case with no escaped single quotes in your strings (which would normally be impossible, but this isn't JSON), you can simply use str.replace(/'/g, '"') and you should end up with valid JSON.
I know it's an old post, but you can use JSON5 for this purpose.
<script src="json5.js"></script>
<script>JSON.stringify(JSON5.parse('{a:1}'))</script>
If you are sure your JSON is safely under your control (not user input) then you can simply evaluate the JSON. Eval accepts all quote types as well as unquoted property names.
var str = "{'a':1}";
var myObject = (0, eval)('(' + str + ')');
The extra parentheses are required due to how the eval parser works.
Eval is not evil when it is used on data you have control over.
For more on the difference between JSON.parse and eval() see JSON.parse vs. eval()
Using single quotes for keys is not allowed in JSON. You need to use double quotes.
For your use-case perhaps this would be the easiest solution:
str = '{"a":1}';
Source:
If a property requires quotes, double quotes must be used. All
property names must be surrounded by double quotes.
var str = "{'a':1}";
str = str.replace(/'/g, '"')
obj = JSON.parse(str);
console.log(obj);
This solved the problem for me.
Something like this:
var div = document.getElementById("result");
var str = "{'a':1}";
str = str.replace(/\'/g, '"');
var parsed = JSON.parse(str);
console.log(parsed);
div.innerText = parsed.a;
<div id="result"></div>
// regex uses look-forwards and look-behinds to select only single-quotes that should be selected
const regex = /('(?=(,\s*')))|('(?=:))|((?<=([:,]\s*))')|((?<={)')|('(?=}))/g;
str = str.replace(regex, '"');
str = JSON.parse(str);
The other answers simply do not work in enough cases. Such as the above cited case: "title": "Mama's Friend", it naively will convert the apostrophe unless you use regex. JSON5 will want the removal of single quotes, introducing a similar problem.
Warning: although I believe this is compatible with all situations that will reasonably come up, and works much more often than other answers, it can still break in theory.
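An alternative to a lookaround regex is a small character scanner that tracks whether it is inside a string and which quote opened it. The sketch below is not taken from any of the answers; the function name singleToDoubleQuoted is hypothetical, and it assumes the input is otherwise JSON-shaped (quoted keys, no comments, no trailing commas).

```javascript
// Hypothetical helper: rewrite single-quoted strings as double-quoted ones.
function singleToDoubleQuoted(text) {
  let out = '';
  let quote = null; // the character that opened the current string, or null
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (quote === null) {
      if (ch === "'" || ch === '"') {
        quote = ch;
        out += '"';
      } else {
        out += ch;
      }
    } else if (ch === '\\') {
      const next = text[++i] || '';
      // \' is not a valid JSON escape; inside double quotes a plain ' is enough
      out += next === "'" ? "'" : '\\' + next;
    } else if (ch === quote) {
      quote = null;
      out += '"';
    } else if (ch === '"') {
      out += '\\"'; // bare double quote inside a single-quoted string
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(JSON.parse(singleToDoubleQuoted("{'a':1}")));
console.log(JSON.parse(singleToDoubleQuoted('{"title": "Mama\'s Friend"}')));
```

Apostrophes inside double-quoted strings are left alone because the scanner only closes a string on the same quote character that opened it.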
Sometimes you just get Python data; it looks a little like JSON, but it is not. If you know that it is pure Python data, you can evaluate it with Python and convert it to JSON like this:
echo "{'a':1}" | /usr/bin/python3 -c "import json;print(json.dumps(eval(input())))"
Output:
{"a": 1}
This is valid JSON.
If you are in JavaScript, you could use JSON.stringify like this:
data = {'id': 74,'parentId': null};
console.log(JSON.stringify(data));
Output:
> '{"id":74,"parentId":null}'
If you assume that the single-quoted values are going to be displayed, then instead of this:
str = str.replace(/\'/g, '"');
you can keep your display of the single-quote by using this:
str = str.replace(/\'/g, '&apos;');
which is the HTML entity for the single quote.
json = ( new Function("return " + jsonString) )();

How to make JSON.stringify encode non-ascii characters in ascii-safe escaped form (\uXXXX) without "post-processing"?

I have to send characters like ü to the server as unicode character but as an ASCII-safe string. So it must be \u00fc (6 characters) not the character itself. But after JSON.stringify it always gets ü regardless of what I've done with it.
If I use 2 backslashes like \\u00fc then I get 2 in the JSON string as well and that's not good either.
Important constraint: I can't modify the string after JSON.stringify, it's part of the framework without workaround and we don't want to fork the whole package.
Can this be done? If so, how?
If, for some reason, you want your JSON to be ASCII-safe, replace non-ascii characters after json encoding:
var obj = {"key":"füßchen", "some": [1,2,3]}
var json = JSON.stringify(obj)
json = json.replace(/[\u007F-\uFFFF]/g, function(chr) {
return "\\u" + ("0000" + chr.charCodeAt(0).toString(16)).substr(-4)
})
document.write(json);
document.write("<br>");
document.write(JSON.parse(json));
This should get you to where you want. I heavily based this on this question: Javascript, convert unicode string to Javascript escape?
var obj = {"key":"ü"};
var str1 = JSON.stringify(obj);
var str2 = "";
var chr = "";
for(var i = 0; i < str1.length; i++){
if (str1[i].match(/[^\x00-\x7F]/)){
chr = "\\u" + ("000" + str1[i].charCodeAt(0).toString(16)).substr(-4);
}else{
chr = str1[i];
}
str2 = str2 + chr;
}
console.log(str2)
I would recommend, though, that you look into @t.niese's comment about parsing this server side.
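Where post-processing the output is an option, both loops above can be condensed into a single replace. This is a minimal sketch; the wrapper name asciiSafeStringify is hypothetical. Without the u flag the pattern works on UTF-16 code units, so characters outside the Basic Multilingual Plane come out as two \uXXXX escapes (a surrogate pair), which JSON parsers accept.

```javascript
// Hypothetical wrapper: stringify, then escape every non-ASCII code unit.
function asciiSafeStringify(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const encoded = asciiSafeStringify({ key: 'füßchen' });
console.log(encoded);
console.log(JSON.parse(encoded).key);
```

Parsing the escaped text gives back the original characters, so the server sees the same data either way.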
Depending on the exact scenario, you can affect the behavior of JSON.stringify by providing a toJSON method as detailed here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#tojson_behavior
If an object has a toJSON method that is a function, then calling JSON.stringify on that will use the result of that method rather than the normal serialization. You could combine this with the approaches mentioned in other answers to get the result you want, even if a library doesn't naturally provide any hooks for customization.
(Of course, it's possible that a third-party library is itself doing something that overrides this behavior.)

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^@\n\r]+)@)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^@\n\r]+)@)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\]\\.,;:\\s@\"]+(\\.[^<>(),[\]\\.,;:\\s@\"]+)*)',
'|(\\".+\\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
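One way to sidestep the double-backslash issue described in these notes is String.raw, which returns the template text without processing escape sequences. The sketch below uses an illustrative date pattern rather than the email pattern from the question; the names parts and isoDate are hypothetical.

```javascript
// Each fragment is written exactly as it would appear in a regex literal.
const parts = [
  String.raw`^(\d{4})`,  // year
  String.raw`-(\d{2})`,  // month
  String.raw`-(\d{2})$`, // day
];

// Flags go in the second argument, as noted above.
const isoDate = new RegExp(parts.join(''), 'g');

console.log(isoDate.source);
console.log(isoDate.flags);
```

The fragments cannot contain a backtick or the sequence ${ without extra care, since those still have meaning inside a template literal.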
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|
(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
Thanks to the wonderous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox doesn't support backreferences or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
The regex above is missing some backslashes, so it doesn't work properly. I edited it; please consider this version, which works for 99.99% of email validation cases.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\.,;:\\s@\"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax:
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s@"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s@"]+)*)|(".+"))@' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');
You can simply use string concatenation. Every backslash must be doubled inside the string literals.
var patternString = "^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|"+
"(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var pattern = new RegExp(patternString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+@\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")@(" + host1 + "|" + host2 + ")$");
@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i

Regex to pull out querystring variables and values

I am using Javascript and trying to break out query string variables from their values. I made a regex that works just fine IF there are no other ampersands except for denoting variables, otherwise the data cuts off at the ampersand.
example: ajax=10&a=test&b=cats & dogs returns a = "test", b = "cats "
I cannot encode the ampersands before the string is made due to the nature of this project and the inefficiency with encoding/replacing characters in hundreds of locations upon entry.
What this piece of code should ultimately do is turn the querystring ajax=10&a=cats & dogs into ajax=10&a=cats%20%26%20dogs
list = [ 'ajax','&obj','&a','&b','&c','&d','&e','&f','&g','&h','&m' ];
ajax_string = '';
for (var i=0, li=list.length; i<li; i++) {
variables = new RegExp(list[i] +"=([^&(.+)=]*)");
query_string = variables.exec(str);
if (query_string != null) {
alert(query_string);
}
}
The query string should be split on ampersands. Any ampersands in the values of actual arguments should be converted to %26.
This is what the query string you posted should look like:
ajax=10&a=test&b=cats+%26+dogs
The query string you posted should give you this:
'ajax': '10'
'a': 'test'
'b': 'cats '
' dogs': ''
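The breakdown above can be checked with the standard URLSearchParams class, which is available in browsers and in Node.js. This is only a quick verification sketch; the variable names are illustrative.

```javascript
// Parse the query string exactly as posted, with the unescaped ampersand.
const params = new URLSearchParams('ajax=10&a=test&b=cats & dogs');
const entries = [...params.entries()];

for (const [name, value] of entries) {
  console.log(JSON.stringify(name) + ': ' + JSON.stringify(value));
}
```

The last entry has the name " dogs" (with its leading space) and an empty value, which matches the list above.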
Edit
It looks like you actually want to sanitize a query string that other developers have built lazily. If we assume that: a) every argument name matches /[a-zA-Z0-9]+/; and b) it is always followed by an equals sign, then this code will work:
var queryString = 'ajax=10&a=test&b=cats & dogs';
var parts = queryString.split(/&(?=[a-zA-Z0-9]+\=)/);
for(var i = 0; i < parts.length; i++)
{
var index = parts[i].indexOf('=') + 1;
if(index > 0)
parts[i] = parts[i].substring(0, index) + encodeURIComponent(parts[i].substring(index));
//else: error?
}
queryString = parts.join("&");
alert("queryString: " + queryString);
> I cannot encode the ampersands before the string is made due to the nature of this project
Then you won't have a foolproof answer.
Ampersands ("&") separate query parameters in url query strings. You can't have it both ways where some of your query parameter values contain un-escaped "&" and expect a parser based on this simple rule to know the difference.
If you can't escape "&" as "%26" in each value component beforehand, then you can never know that the values you get are correct. The best you could do is: If the value to the right of an "&" and before the next "&" does not contain an equal sign "=", you append the value to the previous value read, or the empty string if this is the first value read.
This calls for a small hand-written parser rather than a single regular expression.
Note however that an algorithm like that completely bypasses the spec. Presuming for a moment that the query string:
a=test&b=cats & dogs&c=test
is valid, technically that string contains 4 parameters: "a" (with a value of "test"), "b" (with a value of "cats "), " dogs" (with no value), and "c" (with a value of "test").
If you don't change the query string at the source (and properly escape the value component), you're just hacking in the wrong solution.
Good luck.
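The best-effort rule described in this answer (a piece without an equals sign is appended to the previous value) could be sketched as follows. The function name repairQueryString is hypothetical, and the sketch assumes that values are not already percent-encoded and never contain text of the form &name=value, since that cannot be told apart from a real parameter.

```javascript
// Hypothetical helper implementing the "append to the previous value" heuristic.
function repairQueryString(query) {
  const pairs = []; // each entry is [keyIncludingEqualsSign, rawValue]
  for (const piece of query.split('&')) {
    const eq = piece.indexOf('=');
    if (eq !== -1) {
      pairs.push([piece.slice(0, eq + 1), piece.slice(eq + 1)]);
    } else if (pairs.length > 0) {
      // No "=": assume the ampersand belonged to the previous value.
      pairs[pairs.length - 1][1] += '&' + piece;
    } else {
      // First piece without "=": keep it as a key-less value.
      pairs.push(['', piece]);
    }
  }
  return pairs.map(([key, value]) => key + encodeURIComponent(value)).join('&');
}

console.log(repairQueryString('ajax=10&a=test&b=cats & dogs'));
console.log(repairQueryString('a=test&b=cats & dogs&c=test'));
```

As the answer points out, this is a heuristic layered on top of a malformed input; escaping the values at the source remains the reliable fix.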
