javascript replace() not replacing text containing literal \r\n strings

This bit of code removes hidden characters such as carriage returns and line feeds just fine:
value = value.replace(/[\r\n]*/g, "");
But when the text actually contains the literal characters \r\n, how do I remove them without affecting the r's and n's in my content? I've tried this code:
value = value.replace(/[\\r\\n]+/g, "");
on this bit of text:
{"client":{"werdfasreasfsd":"asdfRasdfas\r\nMCwwDQYJKoZIhvcNAQEBBQADGw......
I end up with this:
{"cliet":{"wedfaseasfsd":"asdfRasdfasMCwwDQYJKoZIhvcNAQEBBQADGw......
Side note: it leaves the uppercase R and N alone because I didn't include the /i flag, and that's OK in this case.
How do I remove just the literal \r\n text found in the string?

If you want to match literal \r and literal \n then you should use the following:
value = value.replace(/(?:\\[rn])+/g, "");
You might think that [\\r\\n] is the right way to match literal \r and \n. It is a bit confusing, but it won't work, and here is why:
In a character class, each character stands for a single letter or symbol. A class is a set of characters, not a sequence of characters.
So the character class [\\r\\n] matches the literal characters \, r and n individually, not the two-character sequences \r and \n.
Edit: If you want to remove all carriage returns (\r) and newlines (\n) as well as literal \r and \n, you could use:
value = value.replace(/(?:\\[rn]|[\r\n]+)+/g, "");
About (?:): it is a non-capturing group. By default, whatever a regular group () matches is captured into a numbered group that you can refer to elsewhere in the regular expression itself, or later in the matches array.
(?:) prevents capturing the value and causes less overhead than (); for more info see this article.
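As a quick check of the difference described above, the following sketch runs both patterns on a shortened version of the question's sample text. The variable names are illustrative, and String.raw is used only so that the backslashes stay literal in the source.

```javascript
// Sample text containing the two-character sequences backslash+r and
// backslash+n (not real CR/LF characters).
const sample = String.raw`{"client":"asdfRasdfas\r\nMCww"}`;

// Character class: removes every backslash, r and n individually.
const withClass = sample.replace(/[\\r\\n]+/g, "");

// Non-capturing group: removes only a backslash followed by r or n.
const withGroup = sample.replace(/(?:\\[rn])+/g, "");

console.log(withClass); // {"cliet":"asdfRasdfasMCww"}
console.log(withGroup); // {"client":"asdfRasdfasMCww"}
```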

To just remove them, this seems to work for me:
value = value.replace(/[\r\n]/g, "");
You don't need the * after the character set because the g flag solves that for you.
Note, this will remove all \r or \n chars whether they are in this exact sequence or not.
Working demo of this option: http://jsfiddle.net/jfriend00/57GtJ/
If you want to remove these characters only when they appear in this exact sequence (i.e. only when a \r is directly followed by a \n), you could use this:
value = value.replace(/\r\n/g, "");
Working demo of this option: http://jsfiddle.net/jfriend00/Ta3sn/
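To make the difference between the two options concrete, here is a small sketch on a made-up string that contains a real CR+LF pair, a lone LF and a lone CR. The input and variable names are assumptions for illustration only.

```javascript
// Real control characters here, not literal backslash sequences.
const text = "a\r\nb\nc\rd";

// Removes every CR and every LF, in any order.
const anyOrder = text.replace(/[\r\n]/g, "");

// Removes only CR immediately followed by LF; lone CR/LF stay.
const pairsOnly = text.replace(/\r\n/g, "");

console.log(JSON.stringify(anyOrder));  // "abcd"
console.log(JSON.stringify(pairsOnly)); // "ab\nc\rd"
```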

If you have text with a lot of \r\n and want to keep all of those line breaks (as <br> tags), try this one:
value.replace(/(?:\\[rn]|[\r\n])/g,"<br>")
http://jsfiddle.net/57GtJ/63/
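One thing to be aware of: the pattern above replaces each \r and each \n separately, so a \r\n pair produces two <br> tags. If one tag per line break is wanted, a possible variant (an assumption about the intent, not part of the original answer) tries the two-part sequences first:

```javascript
// Literal "\r\n" text followed by a real CR+LF, then a lone LF.
const raw = String.raw`a\r\nb` + "\r\nc\nd";

// Original pattern: one <br> per \r and per \n.
const perChar = raw.replace(/(?:\\[rn]|[\r\n])/g, "<br>");

// Variant: pairs are listed first, so each pair yields a single <br>.
const perBreak = raw.replace(/\\r\\n|\r\n|\\[rn]|[\r\n]/g, "<br>");

console.log(perChar);  // a<br><br>b<br><br>c<br>d
console.log(perBreak); // a<br>b<br>c<br>d
```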

Related

Odd RegEx request for Javascript

I'm having trouble with a certain RegEx replacement string for later use in Javascript.
We have quite a bit of text that was stored in a rather odd format that we aren't allowed to fix.
But we do need to find all the "network path" strings inside it, following these rules:
A. The matches always start with 2 backslashes.
B. The matching characters should stop as soon as it hits a first occurrence of any 1 of these:
A < character
A space
A line feed
A carriage return
A & character
A literal "\r" or "\n" string (but only if occurring at end of line)
We "almost" have it working with /\\\\[^ &<\s]*/gi as shown in this RegEx Tester page:
https://regex101.com/r/T4cDOL/5
Even if we get it working, the RegEx has to be even further "escape escaped" before we put it in our Javascript code, but that's also not working as expected.

From your example, it seems you literally have a backslash followed by an n and a backslash followed by an r (as opposed to a newline or carriage return), which means you can't only use a negated character class (since you need to handle a sequence of two characters). I'd use a positive lookahead to know where to stop, so I can use an alternation for that part.
You haven't said what parts of those strings should match, so I've had to guess a bit, but here's my best guess (with useful input from Niet the Dark Absol):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;
That says:
Match starting with \\
Take everything prior to the lookahead (non-greedy)
Lookahead: An alternation of:
A space, &, <, carriage return (\r, character 13), or a newline (\n, character 10); or
A backslash followed by r or n if that's either at the end of a line or followed by a space (so we get the \nancy but not the \n after it).
Updated regex101
You might want to have more characters than just a space after the \r/\n. If so, make it a character class (and/or use \s for "whitespace" if that applies):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$|[ others]))/gmi;
// −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−^^^^^^^^^
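Here is a small sketch of the first regex applied to made-up input (the paths and surrounding text are invented for illustration; the real data may differ). It shows the match stopping at a literal \n at the end of a line and at an & character, while a \n in the middle of a path such as \nancy is kept.

```javascript
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

// String.raw keeps the backslashes literal; "\n" is a real line feed
// joining the two hypothetical lines.
const input =
  String.raw`Path: \\server\nancy\docs\n` + "\n" +
  String.raw`next \\host\share&more`;

const paths = input.match(rex);
console.log(paths);
// First match:  \\server\nancy\docs   (stops before the literal \n at end of line)
// Second match: \\host\share          (stops before &)
```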

Can't figure out what this JS code means

I've been romping through a piece of JS I came across online and can't figure out what this piece of code means.
global$string$newLines = function(a) {
return a.replace(/(\r\n|\r|\n)/g, "\n");
},
I'm specifically wondering about the piece /(\r\n|\r|\n)/g
Also - Is this machine generated code? Is that why the variable name is full of $s?

They are regular expressions.
\r = Find a carriage return character
\n = Find a new line character
The /g (g only) means to find all matches.
http://www.w3schools.com/jsref/jsref_obj_regexp.asp
So the code means: find all \r\n, or just \r, or just \n, and replace each with \n.
They are whitespace characters, so they are not visible.

It's a regular expression for replacing newline characters.
There are different types of new line characters inserted by various browsers/editors/OSes etc.
\n is the default on all (true) Unix systems, where \r has no special meaning; C, C++, Java, etc. adopted this convention.
\r is from the days of Mac before it was a Unix system, while the two-character \r\n is the Windows way.
The /g flag is the global setting, telling the regular expression to search the entire string rather than stopping at the first match.
So what the code is doing is using a regular expression to globally find all possible equivalents of a newline and replace them with the de facto standard, '\n'.
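A minimal sketch of that normalization, using a hypothetical helper name and a made-up input with mixed line endings. It also shows why \r\n should be listed before \r and \n in the alternation: otherwise a Windows line ending turns into two newlines.

```javascript
// Same pattern as in the question, without the unneeded capturing group.
const normalizeNewlines = (s) => s.replace(/\r\n|\r|\n/g, "\n");

const mixed = "one\r\ntwo\rthree\nfour";
const normalized = normalizeNewlines(mixed);
console.log(JSON.stringify(normalized)); // "one\ntwo\nthree\nfour"

// Without the \r\n alternative, CR and LF are replaced one at a time.
const doubled = "a\r\nb".replace(/\r|\n/g, "\n");
console.log(JSON.stringify(doubled)); // "a\n\nb"
```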

This is just a regular expression used to replace Carriage Returns and New Line characters with new line characters.
Your Regex: /(\r\n|\r|\n)/g
Explanation:
1st Capturing group (\r\n|\r|\n)
1st Alternative: \r\n
\r matches a carriage return (ASCII 13)
\n matches a line-feed (newline) character (ASCII 10)
2nd Alternative: \r
\r matches a carriage return (ASCII 13)
3rd Alternative: \n
\n matches a line-feed (newline) character (ASCII 10)
g modifier: global. Give all matches (i.e. don't return on first match).
PS: Check out regex101.com for generating such a beautiful explanation for any regex.

The code replaces carriage-return/new-line combinations with a single newline.
The $'s in the variable name are produced by several javascript compilers out there. Developers will often break their code up into namespaces of the form global.string.newline, for example. But when we want to run that code on a client, it's safer and more efficient to turn this object-within-an-object-within-an-object into a single variable. Usually, the javascript compiler will go one step further and then turn this long variable name into some short unique sequence, but it will also preserve this intermediate form for easier debugging.

It is a regex that replaces a carriage return, a new line, or a carriage return + new line in a string with a new line.
/(\r\n|\r|\n)/g
The /g at the end means "globally", i.e. throughout the string and not just the first occurrence.
Working Fiddle
JS Code:
global$string$newLines = function (a) {
return a.replace(/(\r\n|\r|\n)/g, "\n")
}
function abc() {
var text = document.getElementById("test").value;
console.log(global$string$newLines(text));
}
HTML Code:
<textarea id="test"></textarea>
<button id="testClick" onclick="abc()">Click</button>

This is a regular expression replace that means:
Find any occurrence of either:
\r\n
\r
\n
And replace it with \n.
Comments:
The /g means it will match all occurrences, not only the first.
The third alternative, replacing \n with \n, has no effect on the result.
Doc of the replace and link to regex: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace

What is this "/\,$/"?

Tried to search for /\,$/ online, but couldn't find anything.
I have:
coords = coords.replace(/\,$/, "");
I'm guessing it returns an index number in the coords string. What should I search for online, so I can learn more?

/\,$/ finds the comma character (,) at the end of a string (denoted by the $) and replaces it with empty (""). You sometimes see this in regex code aiming to clean up excerpts of text.

It's a regular expression to remove a trailing comma.

That thing is a Regular Expression, also known as regex or regexp. It is a way to "match" strings using some rules. If you want to learn how to use it in JavaScript, read the Mozilla Developer Network page about RegExp.
By the way, regular expressions are also available in most languages and in some tools. They are a very useful thing to learn.

That's a regular expression that finds a comma at the end of a string. That code removes the comma.

// defines a JavaScript regular expression, used to match a pattern within a string.
\,$ is the pattern
In this case \, translates to ,. A backslash is used to escape special characters, but in this case, it's not necessary. An example where it would be necessary would be to remove trailing periods. If you tried to do that with /.$/ the period here has a different meaning; it is used as a wildcard to match [almost] any character (except line terminators). So in this case to match on "." (period character) you would have to escape the wildcard (/\.$/).
When $ is placed at the end of the pattern, it means only look at the end of the string. This means that you can't mistakenly find a comma anywhere in the middle of the string (e.g., not after help in help, me,), only at the end (trailing). If you wanted to match on characters only at the beginning of the string, you would start off the pattern with a caret (^); for instance, /^,/ would find a comma at the start of a string if one existed.
It's also important to note that you're only removing one comma, whereas if you use the plus (+) after the comma, you'd be replacing one or more: /,+$/.
Without the +: "trailing commas,," becomes "trailing commas,"
With the +: "no trailing comma,," becomes "no trailing comma"
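The behaviour described above can be checked with a short sketch. The variable names are illustrative, and the unnecessary backslash before the comma is dropped.

```javascript
// Removes exactly one comma, and only at the very end of the string.
const once = "trailing commas,,".replace(/,$/, "");

// Removes one or more commas at the end of the string.
const all = "no trailing comma,,".replace(/,+$/, "");

// A comma in the middle is never touched, because $ anchors the match.
const middle = "help, me".replace(/,$/, "");

console.log(once);   // trailing commas,
console.log(all);    // no trailing comma
console.log(middle); // help, me
```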

unable to parse - in Regular expression in Javascript

I am a bit new to the regular expressions in Javascript.
I am trying to write a function called parseRegExpression()
which parses the attributes passed and generates a key/value pairs
It works fine with the input:
"iconType:plus;iconPosition:bottom;"
But it is not able to parse the input:
"type:'date';locale:'en-US';"
Basically the - sign is being ignored. The code is at:
http://jsfiddle.net/visibleinvisibly/ZSS5G/
The Regular Expression key value pair is as below
/[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*;|[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*$/gi;

There are a few problems:
A | inside a character class means a literal | character, not an alternation.
A . inside a character class means a literal . character, so there's no need to escape it.
A - as the first or last character inside a character class means a literal - character, otherwise it means a character range.
There's no need to use [a-zA-Z] when you use the case-insensitive modifier (i); [a-z] is enough.
The only difference between your two alternatives is the last bit; this can be simplified significantly by limiting the alternation to the part that differs.
This should be equivalent to your original pattern:
/[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi
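The points above can be verified with a few one-line tests, followed by the simplified pattern applied to the input from the question. This is only a sketch; the variable names are made up.

```javascript
// Inside a character class, | is just a literal character.
const pipeIsLiteral = /[a|b]/.test("|");   // true

// |-| is parsed as a range from | to |, so a hyphen does not match.
const hyphenInRange = /[|-|]/.test("-");   // false

// A hyphen placed last in the class is a literal hyphen.
const hyphenAtEnd = /[a-z-]/.test("-");    // true

const pair = /[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi;
const found = "type:'date';locale:'en-US';".match(pair);
console.log(found); // [ "type:'date';", "locale:'en-US';" ]
```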

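You can avoid the regex:
var test1 = "iconType:plus;iconPosition:bottom;";
var test2 = "type:'date';locale:'en-US';";
function toto(str) {
var result = [];
var temp = str.split(';');
// The last element is the empty string after the trailing ';', so skip it.
for (var i = 0; i < temp.length - 1; i++) {
// Split on the first ':' only, so that a value may itself contain ':'.
var pos = temp[i].indexOf(':');
result[i] = [temp[i].slice(0, pos), temp[i].slice(pos + 1)];
}
return result;
}
console.log(toto(test1));
console.log(toto(test2));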

Inside a character set atom [...] the pipe char | is just a regular char and doesn't mean "or".
A character set atom lists characters or ranges you want to accept (or exclude if the character set starts with ^) and "or" is implicit.
You can use a backslash in a character set if you need to include/exclude a close bracket ], the ^ sign, the dash - that is used for ranges, the backslash \ itself, an unprintable character or if you want to use a non-ASCII unicode char specifying the code instead of literally.
Regular expression syntax, however, also lets you avoid backslash-escaping in a character set atom by placing the character in a position where it cannot have its special meaning, for example a dash - as the first or last character in the set (it cannot mean a range there).
Note also that if the values you need to match can be quoted strings, including backslash escaping, the regular expression is more complex, for example
'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"
matches a single-quoted or double-quoted string including backslash escaping, the meaning being:
A single quote '
Zero or more of either:
Any char except the single quote ' or the backslash \
A pair composed of a backslash \ followed by any char
A single quote '
or the same with double quotes " instead.
Note that the groups have been delimited with (?:...) instead of plain (...) to avoid capture
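As a rough illustration of that pattern, the sketch below extracts quoted strings from a made-up input that contains escaped quotes. The input and names are assumptions, and String.raw is used only to keep the backslashes literal.

```javascript
const quoted = /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"/g;

// Contains one single-quoted and one double-quoted string, each with
// backslash-escaped quotes inside.
const src = String.raw`a 'it\'s' b "say \"hi\"" c`;

const strings = src.match(quoted);
console.log(strings[0]); // 'it\'s'
console.log(strings[1]); // "say \"hi\""
```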

It doesn't match hyphens because it is interpreting |-| as a range that starts at | and ends at |. (I would have expected that to be treated as a syntax error, but there you have it. It works the same in every regex flavor I've tried, too.)
Have a look at this regex:
/(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/ig
As suggested by the other responders, I collapsed it to one alternative, removed the unneeded pipes, and escaped the hyphen by moving it to the end. I also anchored it at the beginning as well as the end. Or anchored it as well as I can, anyway. I used a lookahead to match the trailing semicolon so it will still be there when the next match starts. It's far from foolproof, but it should work okay as long as the input is well formed.
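One way this regex could be used to build the key/value pairs is sketched below. The function name parseAttributes is hypothetical (it is not the question's parseRegExpression), and String.prototype.matchAll requires an ES2020 environment. Note that the quotes around values are kept as part of the value.

```javascript
const pairRe = /(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/ig;

// Collects capture group 1 (key) and group 2 (value) of every match.
function parseAttributes(str) {
  const out = {};
  for (const m of str.matchAll(pairRe)) {
    out[m[1]] = m[2];
  }
  return out;
}

const a = parseAttributes("iconType:plus;iconPosition:bottom;");
const b = parseAttributes("type:'date';locale:'en-US';");
console.log(a); // { iconType: 'plus', iconPosition: 'bottom' }
console.log(b); // { type: "'date'", locale: "'en-US'" }
```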

Replace the regular expressions in your code as follows:
regExpKeyValuePair = /[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*;|[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*$/gi;
regExpKey = /[-a-z]*/gi;
regExpValue = /[-a-z:_\/.0-9]*/gi;
You don't need to escape . inside [].
There is no need to put | between the elements inside [].
Because you are using the /i flag, [A-Z] is not needed.
A literal - should be at the beginning or at the end of the class.

JS regex to split by line

How do you split a long piece of text into separate lines? Why does this return line1 twice?
/^(.*?)$/mg.exec('line1\r\nline2\r\n');
["line1", "line1"]
I turned on the multi-line modifier to make ^ and $ match beginning and end of lines. I also turned on the global modifier to capture all lines.
I wish to use a regex split and not String.split because I'll be dealing with both Linux \n and Windows \r\n line endings.

arrayOfLines = lineString.match(/[^\r\n]+/g);
As Tim said, it is both the entire match and the capture. regex.exec(string) returns only one match per call, even with the global modifier (which just makes the next call continue from where the last one stopped), whereas string.match(regex) with the global modifier returns all matches at once.

Use
result = subject.split(/\r?\n/);
Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.
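The following sketch reproduces the result from the question and compares it with the two suggestions made so far. Variable names are illustrative.

```javascript
const text = "line1\r\nline2\r\n";

// exec returns a single match: index 0 is the whole match,
// index 1 is the first capturing group.
const first = /^(.*?)$/mg.exec(text);
console.log(first[0], first[1]); // line1 line1

// Splitting on an optional CR followed by LF keeps empty lines,
// including the empty string after the final line ending.
const bySplit = text.split(/\r?\n/);
console.log(bySplit); // [ 'line1', 'line2', '' ]

// Matching runs of non-newline characters drops empty lines.
const byMatch = text.match(/[^\r\n]+/g);
console.log(byMatch); // [ 'line1', 'line2' ]
```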

I am assuming the following constitute newlines:
\r followed by \n
\n followed by \r
\n present alone
\r present alone
Use
var re=/\r\n|\n\r|\n|\r/g;
arrayofLines=lineString.replace(re,"\n").split("\n");
for an array of all lines, including the empty ones.
Or use
arrayOfLines = lineString.match(/[^\r\n]+/g);
for an array of non-empty lines.

Even simpler regex that handles all line ending combinations, even mixed in the same file, and removes empty lines as well:
var lines = text.split(/[\r\n]+/g);
With whitespace trimming:
var lines = text.trim().split(/\s*[\r\n]+\s*/g);

Unicode Compliant Line Splitting
Unicode® Technical Standard #18 defines what constitutes line boundaries. That same section also gives a regular expression to match all line boundaries. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)
I don't understand why the negative look-ahead part ((?!\r\n)) is necessary, but that is what is suggested in the Unicode document 🤷‍♂️.
The above document recommends defining a regular expression meta-character that matches all line ending characters and sequences. Perl has \R for that. Unfortunately, JavaScript does not include such a meta-character. Alas, I could not even find a TC39 proposal for that.
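A short usage sketch for splitLines, on a made-up string that mixes a Windows line ending, a Unicode line separator (U+2028) and two consecutive line feeds. It shows that \r\n counts as one boundary and that empty lines are preserved.

```javascript
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/);

const lines = splitLines("a\r\nb\u2028c\n\nd");
console.log(lines); // [ 'a', 'b', 'c', '', 'd' ]
```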

First replace all \r\n with \n, then String.split.

http://jsfiddle.net/uq55en5o/
var lines = text.match(/^.*((\r\n|\n|\r)|$)/gm);
I have done something like this. Above link is my fiddle.
