JS Regex: Double slash splitting along with other characters - javascript

I have a split statement in my JavaScript that will split spaces and semicolons, but I want to split double slashes as well. I cannot figure out how to include a double slash along with the space and semicolon.
line = lines[i].split(/[\s;]+/);
Any help is greatly appreciated.

Assuming that by "double slashes" you mean two forward slashes ("//"), you can do something like the following:
line = lines[i].split(/[\s;]+|\/{2}/);
Note that the double-slash alternative is placed outside the brackets: inside a character class, "{", "2", and "}" would be interpreted as literal characters rather than as a quantifier.

The other answers will not behave properly in the presence of a double slash or semi-colon surrounded by spaces. They will generate empty strings in the output. This regexp handles that case:
/(?:\s|;|\/\/)+/
In other words, split on any sequence composed of spaces, semi-colons, or double slashes.
var re = /(?:\s|;|\/\/)+/;
var input = "Some stuff; more stuff // last stuff";
console.log(input.split(re));
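To make the difference between the two answers concrete, here is a small side-by-side sketch. It reuses the input string and both patterns from above; the variable names separate and grouped are only illustrative.

```javascript
// Input taken from the example above.
const input = "Some stuff; more stuff // last stuff";

// Alternation without an outer repeated group: in " // " the space, the
// double slash and the following space are three separate separators.
const separate = input.split(/[\s;]+|\/{2}/);

// Any run of spaces, semicolons, or double slashes counts as one separator.
const grouped = input.split(/(?:\s|;|\/\/)+/);

console.log(separate);
console.log(grouped);
```

With this input, the first pattern leaves two empty strings between "stuff" and "last", while the second pattern returns only the six words.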

Related

How to deal with backslash concatenation in RegExp breaking expressions, e.g. those that contain capturing groups brackets?

Creating a RegExp by combining it with string variables that follow a certain pattern doesn't always work out well if the pattern isn't adapted to the string.
"\\" + ")" becomes \) in the RegExp, which is an escaped ) and no longer the closing bracket of a capturing group.
"\\\\" + ")" becomes \\) in the RegExp. Here the \ itself is escaped, and the capturing group does not break.
In a normal string, or in a RegExp literal written between / delimiters (so without concatenation), "\\" works fine as an escaped backslash \.
The problem is that in a RegExp the \ also has a meaning of its own.
What I am wondering is whether there is a flexible way to prevent a pattern like this from breaking when it is combined. For example, matching example\:
"(" + "example\\" + ")"
When combined, this one breaks: used as a RegExp, the \ escapes the closing bracket, and the capturing group breaks.
To fix it, it would have to look like this:
"(" + "example\\" + "\\)"
Here one backslash is escaped, then another one, so the combined RegExp source ends in \\), i.e. one escaped backslash followed by an intact closing bracket.
The problem is that this breaks again if the example string did not end with \\.
 
Here is some example code.
To explain, the function matches the path after example_folder against the strings in matchingArray, which end with a \. Because they are string literals, I have to write \\ so that it ends up as a single \ backslash.
class example_options {
    matchingArray = ["example\\", "example\\\\special\\", "something_else\\"]
}
function example(options) {
    if (typeof(options) != "object") {
        options = new example_options()
    }
    var test = `"example_folder\\example\\special\\test.txt"`
    for (let i=0; i < options.matchingArray.length; i++) {
        var regexPattern = new RegExp(`(?:example_folder\\\\)(${options.matchingArray[i]}\\)`)
        var match = test.match(regexPattern)
        if (match) {
            console.log(match, options.matchingArray[i])
        }
    }
}
example()
Calling example() runs the function, and it works.
The problem is its flexibility.
I wouldn't have to escape the backslash twice if the RegExp were written as a literal between / delimiters rather than built from a string:
var regexPattern = new RegExp(/(?:example_folder\\)(example\\special\\)/)
matchingArray = ["example\\", "example\\\\special\\", "something_else\\"] here I had to add \\\\, because in RegEx a \ is also used for its own thing.
At the end of the expression here:
var regexPattern = new RegExp(`(?:example_folder\\\\)(${options.matchingArray[i]}\\)`)
I had to add a \\ at the end.
On its own, this would end up as \) in the RegExp and the capturing group would be destroyed. However, since the strings in matchingArray end with a backslash, the combined RegExp near the bracket reads \\) and the capturing group is preserved.
Is there a way to deal with this issue?
What if I wanted the matchingArray to be like this? To be compatible with other things that aren't RegEx.
matchingArray = ["example\\", "example\\special\\", "something_else\\"]
If I did this, it would no longer work as a RegExp, because example\\special\\ would end up as example\special\, and the \s would then be read as a whitespace class.
The other issue is that I have to add \\) at the end of the RegExp. But I only need to do that if I know that the combined string is going to end with a \\.
And the other way around: if a string in matchingArray did not end with a backslash, the RegExp would break, because it would end in \) and cause an "Unterminated group" error.
 
One possible idea is to convert every \ into /, but what if you can't do that?
I don't see an issue with doing that, however, because I would be able to convert the / back into \ at any time. It would be an issue if mixed slashes needed to be preserved, but that would probably be an even more specific problem.
But are there other ways?
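One commonly used approach, which is not part of the original question, is to keep the strings plain and escape them just before they are inserted into the RegExp source. The sketch below assumes a hypothetical helper escapeRegExp and a hypothetical function findMatches; the array and path values are taken from the question.

```javascript
// Hypothetical helper: puts a backslash in front of every character
// that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Plain strings with single backslashes, usable outside of RegExp too.
const matchingArray = ["example\\", "example\\special\\", "something_else\\"];
const samplePath = "example_folder\\example\\special\\test.txt";

// Hypothetical function: returns the captured part for every candidate that matches.
function findMatches(path, candidates) {
  const results = [];
  for (const candidate of candidates) {
    const pattern = new RegExp(
      "(?:" + escapeRegExp("example_folder\\") + ")(" + escapeRegExp(candidate) + ")"
    );
    const match = path.match(pattern);
    if (match) {
      results.push(match[1]);
    }
  }
  return results;
}

console.log(findMatches(samplePath, matchingArray));
```

Because every backslash in a candidate is escaped by the helper, the closing bracket of the capturing group can be written as a plain ")" whether or not the candidate ends with a backslash.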

Regexp, wrap each CSV field in double quotes

Using a regular expression, I can't find a solution to wrap each field from a csv text into double quotes.
The issue is that there could be already double-quoted fields.
Example:
Country;Product Family;Product SKU;Commercial Status
Germany;Aprobil;"Apro&'bil_1_5 mL";Actively Marketed
Should be
"Country";"Product Family";"Product SKU";"Commercial Status"
"Germany";"Aprobil";"Apro&'bil_1_5 mL";"Actively Marketed"
Basically, I am having trouble combining the two logical parts in one regular expression...
Thanks in advance!
You will need to do 2 replacements, I think. The first regex looks like this:
/([\w ]+[^;\n]*|\"[^\"]*\")/g
The regex will either match:
Any word character or space, 1 or more times, followed by any character that is not a semicolon ';' or a newline, any number of times.
A double quote followed by any characters that are not a double quote, any number of times, ending with a double quote.
You then replace each match with "\1" (written as "$1" in JavaScript).
Finally, you replace 2 double quotes with a single one.
In JavaScript this is:
var test = 'Country;Product Family;Product SKU;Commercial Status\n'
+ 'Germany;Aprobil;"Apro&\'bil_1_5 mL";Actively Marketed\n';
var regex = /([\w ]+[^;\n]*|\"[^\"]*\")/g;
test = test.replace(regex, '"$1"'); // wrap in double quotes ($1 is the captured field)
test = test.replace(/\"\"/g, '"'); // replace 2 quotes with one
Now you should have what you want.

Backslash bug in JavaScript

I have a string that involves tricky \\ characters.
Below is the initial code, and what I am literally trying to achieve but it is not working. I have to replace the \" characters but I think that is where the bug is.
var current = csvArray[0][i].Replace("\"", "");
I have tried the variation below but it is still not working.
var current = csvArray[0][i].Replace('\"', '');
It is currently throwing an Uncaught TypeError: csvArray[0][i].Replace is not a function
Is there a way for Javascript to take my string ("\"") literally like in C#? Kindly help me investigate. Thanks!
The TypeError occurs because JavaScript method names are case-sensitive: the string method is replace, with a lowercase r, not Replace. If the sequence you want to match is a single backslash character followed by a quotation mark, then you need to escape the backslash itself because backslashes have special meaning in string literals. You then need to separately escape the quotation mark with its own backslash:
.replace("\\\"", "")
I believe that would also be true in C#.
Or you can simplify it by using single quotes around the string so that only the backslash needs to be escaped:
.replace('\\"', '')
If the first argument to .replace() is a string, however, it will only replace the first occurrence. To do a global replace you have to use a regular expression with the g flag, noting that backslashes need to be escaped in regular expressions too:
.replace(/\\"/g, '')
I'm not going to set up a demo array to exactly match your code, but here's a simple demo where you can see that a lone backslash or quote in the input string is not replaced, but all backslash-quote combinations are replaced:
var input = 'Some\\ test" \\" text \\" for demo \\"'
var output = input.replace(/\\"/g, '')
console.log(input)
console.log(output)
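The difference between a string pattern and a regular expression with the g flag can be seen in a short sketch like the following. The variable names sample, firstOnly and all are only illustrative.

```javascript
// The data contains two backslash-quote pairs: a \" b \" c
const sample = 'a \\" b \\" c';

// String pattern: only the first occurrence is replaced.
const firstOnly = sample.replace('\\"', '');

// Regular expression with the g flag: every occurrence is replaced.
const all = sample.replace(/\\"/g, '');

console.log(firstOnly);
console.log(all);
```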

Regex match with '\' slash and replace with '\\'?

I am converting a normal string into LaTeX format, so I wrote code to match each single backslash \ and replace it with a double backslash \\. For why I need this, refer to this link. I tried the code below:
function test(){
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex_form = tex.replace("/[\\\/\\\\\.\\\\]/g", "\\");
document.getElementById('demo').innerHTML=tex_form;//nothing get
}
test();
<p id="demo"></p>
I am not getting any output, although the pattern does match at this link.
I want to replace each \ with \\.
There are these issues:
The string literal has no backslashes;
The regular expression is not a regular expression;
The class in the intended regular expression cannot match sequences, only single characters;
The replacement would not add backslashes, only replace with them.
Here are the details on each point:
1. How to Encode Backslashes in String Literals
Your tex variable has no backslashes. This is because a backslash in a string literal is not taken as a literal backslash, but as an escape for interpreting the character that follows it.
When you have "$$\left...", then the \l means "literal l", and so the content of your variable will be:
$$left...
As an l does not need to be escaped, the backslash is completely unnecessary, and these two assignments result in the same string value:
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex="$$left[ x=left({{11}over{2}}+{{sqrt{3271}}over{2,3^{{{3}over{2} $$";
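To bring the point home, this will also represent the same value, because a backslash in front of a character with no special escape meaning is simply dropped (letters such as f, r, t, v and x and the digits are left unescaped here, since \f, \r, \t, \v, \x and \1 are genuine escape sequences):
var tex="\$\$\l\eft\[\ x\=\l\eft\(\{\{11\}\ov\er\{2\}\}\+\{\{\s\qrt\{3271\}\}\ov\er\{2\,3\^\{\{\{3\}\ov\er\{2\}\ \$\$";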
If you really want to have literal backslashes in your content (which I understand you do, as this is about LaTeX), then you need to escape each of those backslashes... with a backslash:
var tex="$$\\left[ x=\\left({{11}\\over{2}}+{{\\sqrt{3271}}\\over{2\\,3^{{{3}\\over{2} $$";
Now the content of your tex variable will be this string:
$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$
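The effect described in this point can be checked with a short sketch. It uses a shortened version of the string, and String.raw is shown only as an alternative way of writing the same value that the answer itself does not mention.

```javascript
// Backslashes in front of ordinary characters are dropped by the parser.
const swallowed = "$$\left[ x=\left( $$";

// Each literal backslash is written as an escaped backslash.
const escaped = "$$\\left[ x=\\left( $$";

// String.raw keeps the backslashes exactly as typed in the template literal.
const raw = String.raw`$$\left[ x=\left( $$`;

console.log(swallowed);
console.log(escaped);
console.log(raw === escaped);
```

The first string contains no backslash at all, while the second and third hold the same value with two backslashes.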
2. How to Code Regular Expression Literals
You are passing a string literal to the first argument of replace, while you really intend to pass a regular expression literal. You should leave out the quotes for that to happen. The / are the delimiters of a regular expression literal, not quotes:
/[\\\/\\\\\.\\\\]/g
This should not be wrapped in quotes. JavaScript understands the / delimiters as denoting a regular expression literal, including the optional modifiers at the end (like g here).
3. Classes are sets of single characters
This regular expression has unnecessary characters. The class [...] should list all individual characters you want to match. Currently you have these characters (after resolving the escapes):
\
/
\
\
.
\
\
It is overkill to have the backslash represented 5 times. Also, in JavaScript the forward slash and dot do not need to be escaped when occurring in a class. So the above regular expression is equivalent to this one:
/[\\/.]/g
Maybe this is, or is not, what you intended to match. To match several sequences of characters, you could use the | operator. This is just an example:
/\\\\|\\\/|\\\./g
... but I don't think you need this.
4. How to actually prefix with backslashes
It seems strange to me that you would want to replace a point or forward slash with a backslash. Probably you want to prefix those with a backslash. In that case make a capture group (with parentheses) and refer to it with $1 in this replace:
tex.replace(/([\\/.])/g, "\\$1");
Note again, that in the replacement string there is only one literal backslash, as the first one is an escape (see point 1 above).
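As a quick check of the replacement above, here is a minimal sketch on a short sample string. The sample value and variable names are assumptions made for illustration; the second replacement, which doubles only the backslashes, is what the question literally asks for.

```javascript
// Data: a\b/c.d (one backslash, one forward slash, one dot)
const source = "a\\b/c.d";

// Prefix every backslash, forward slash and dot with a backslash.
const prefixed = source.replace(/([\\/.])/g, "\\$1");

// Double only the backslashes.
const doubled = source.replace(/\\/g, "\\\\");

console.log(prefixed);
console.log(doubled);
```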
On why you need it: as the question you link to says, the \ character has special meaning inside a JavaScript string literal. It starts an escape sequence.
On getting no output: the escape sequence is processed when the string literal is parsed by the JavaScript compiler.
By the time you apply your regular expression to the string, the backslashes have been consumed. The backslash characters only exist in your source code, not in your data.
If you want to put a backslash character in your string, then you need to write the escape sequence for it (the \\) in the source code. You can't add them back in with JavaScript afterwards.
Not sure if I understood the problem, but try this code:
var tex_form = tex.replace(/(\\)/g, "\\\\");
This uses a group ( ) instead of a character class [ ], and the pattern is passed as a regular expression literal rather than as a string, so that it can actually match.

Backslash escape in indexOf

Why does n give 0 in the following instance :
var str = '\\nvga032.bmwgroup.net\QXE7868\Daten\IE\3_bookmarks.zzz'
var n = str.indexOf("\\");
alert(n) //0
Surely the escape character for a backslash is
'\\'
Am I missing something? I am looking for a single backslash at the last position. I tried lastIndexOf as well, and that also gives zero. Are the two '.'s messing things up?
indexOf matches on the string not the JavaScript source code used to create it.
A \ character starts an escape sequence.
\\ is the escape sequence for "A backslash".
The string assigned to str starts with \\ which puts a backslash in position 0 in the data.
The string passed to indexOf consists entirely of \\ which matches the first backslash in the data.
If you wanted to search for two consecutive backslashes, you would use \\\\ (i.e. the escape sequence for a backslash followed by another escape sequence for a backslash, resulting in data consisting of two backslashes).
"\\" will be parsed down to a single backslash. And then indexOf will look for that single backslash, which happens to be at the start of the string (n=0).
If you want to search for TWO backslashes, you'll have to indexOf("\\\\") (FOUR backslashes, which will be parsed down to two literal backslashes).
Your "str" variable most probably doesn't contain what you expected. Write instead:
var str = '\\\\nvga032.bmwgroup.net\\QXE7868\\Daten\\IE\\3_bookmarks.zzz'
var n = str.lastIndexOf("\\");
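The behaviour described in these answers can be reproduced with a sketch like the one below. It uses a shortened version of the path from the question (the \3 part is left out because octal escapes are rejected in strict mode), and the variable names are only illustrative.

```javascript
// Only the leading \\ is a real escape sequence; \Q, \D and \I lose their backslash.
const unescaped = '\\nvga032.bmwgroup.net\QXE7868\Daten\IE';

// Every backslash in the data is written as \\ in the source.
const escaped = '\\\\nvga032.bmwgroup.net\\QXE7868\\Daten\\IE';

console.log(unescaped.indexOf("\\"));     // the single backslash is at position 0
console.log(unescaped.lastIndexOf("\\")); // it is also the last one
console.log(escaped.indexOf("\\\\"));     // two consecutive backslashes at the start
console.log(escaped.lastIndexOf("\\"));   // the backslash just before "IE"
```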
