Regex not working in JavaScript with IE 7

last_tag="abcde x";
last_tag = last_tag.replace(/[\s]+x$/, '');
This is my problem: I have to remove an "x" at the end of my string. This piece of code is part of a plugin I've been using without problems until now. On IE 7, "last_tag" is selected incorrectly, so I get a trailing "x" that I have to remove. I think whoever wrote the plugin added this replace to do exactly that, but it's not working on IE 7.
Example:
before:last_tag="abcde x"
after: last_tag="abcde"
The actual problem is that last_tag remains exactly the same.
Is the regex correct? Is there an error or a compatibility issue with IE?
EDIT: The regex is probably not the issue.
I've tried this piece of code, but nothing changes:
var temp_tag="abc x";
alert(temp_tag);
temp_tag = temp_tag.replace(/[\s]+x$/, '');
alert(temp_tag)
The same piece of code works perfectly in Chrome.

The regex looks okay, but it's possible you're trying to match non-breaking spaces (U+00A0). \s doesn't match those in IE (as explained in this answer), but it does in Firefox and Chrome.

I'd go for this RegExp
/\s+x$/
There is no need to wrap \s in a character class [], because \s is already a character class:
shorthand for something like [ \t\r\n\v\f] (space, tab, carriage return, line feed, vertical tab, form feed).
Edit: Alan Moore is right. Try this instead:
/[\s\u00A0]+x$/
Edit: The hex digits in a \u escape are not case sensitive, so \u00a0 and \u00A0 are equivalent.
This pattern should match every whitespace character as well as non-breaking spaces.
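As a minimal sketch of the pattern above in use: the helper name stripTrailingX is hypothetical, and the sample strings are made up for illustration. Adding \u00A0 to the class explicitly means the result does not depend on whether a given engine's \s covers the non-breaking space.

```javascript
// Hypothetical helper: removes a trailing "x" that is preceded by
// ordinary whitespace or by non-breaking spaces (U+00A0).
function stripTrailingX(tag) {
  return tag.replace(/[\s\u00A0]+x$/, '');
}

var withSpace = stripTrailingX('abcde x');       // regular space before x
var withNbsp = stripTrailingX('abcde\u00A0x');   // non-breaking space before x
var untouched = stripTrailingX('abcdex');        // no separator, so left as-is

console.log(JSON.stringify([withSpace, withNbsp, untouched]));
```

If the string is still unchanged after this, logging tag.charCodeAt(i) for the last few characters shows which character actually sits in front of the "x".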

Try this
last_tag = last_tag.replace(/[\t\r\n]+x$/, '');

Related

JavaScript RegEx Fails In IE / Firefox

I've run into an issue of regex match not evaluating in Internet Explorer and in Firefox. It works fine in Chrome and Opera. I know Chrome is generally much more tolerant of mistakes so I suspect I've dropped the ball somewhere along the way - yet none of the online evaluation tools seem to find any errors in my expression. I'm sorry that it's such a convoluted expression but hopefully something will be easily obvious as the culprit. The expression is as follows:
keyData = data.match(/\w+\u0009\w+\u0009[\u0009]?\w+\u0009([-]?\w+|%%)[#]?\u0009([-]?\w+|%%)[#]?\u0009([-]?\w+|%%)[#]?(\u0009([-]?\w+|%%)[#]?)?(\u0009([-]?\w+|%%)[#]?)?(\u0009([-]?\w+|%%)[#]?)?\u0009\u0009\/\//g);
'data' is a text file which I am parsing with no errors. I won't post the whole file here, but what I am hoping to match is something like the following:
10 Q 1 0439 0419 -1 // CYRILLIC SMALL LETTER SHORT I, CYRILLIC CAPITAL LETTER SHORT I, <none>
I believe that when I post the string here it removes the 'u0009' characters so if you'd like to see one of the full files, I've linked one here. If there is anything more I can clarify, please let me know!
Edit:
My goal in this post is to understand not only why this is failing, but also whether this expression is well-formed. After further review, it seems that it's an issue with how Internet Explorer and Firefox parse the text file. They seem to strip out the tabs and replace them with spaces. I tried to update the expression, and it matches with no problems in an online validator, but it still fails in IE/FF.
Edit 2
I have since updated my expression to a clearer form taking into account feedback. The issue still is persisting in IE and Firefox. It seems to be an issue with the string itself. IE won't let me match more than a single character, no matter what my expression is. For example, if the character string of the file is KEYBOARD and I try to match with /\w+/, it will just return K.
/[0-9](\w)?(\t+|\s+)\w+(\t+|\s+)[0-9](\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?(\t+|\s+)\/\//g
After poking around with my regex for a while, I suspected something was wrong with the way IE was actually reading the text file as compared to Chrome. Specifically, if I had the string KEYBOARD within the text file and I tried to match it using /\w+/, it would simply return K in IE, but in Chrome it would match the whole string KEYBOARD. I suspected IE was inserting some dead space between characters, so I stepped through the first few characters of the file and printed their Unicode code units.
// Print the code unit and the character for the first 30 positions
for (var i = 0; i < 30; i++) {
    console.log(data.charCodeAt(i) + ' ' + data[i]);
}
This confirmed my suspicion: \u0000 showed up between each pair of characters. I'm not sure why there are NUL characters between the characters, but to resolve my issue I simply ran:
data = data.replace(/\u0000+/g, '');
This completely resolved my issue and I was able to parse my string like normal using the expression:
keyData = data.match(/[0-9](\w)?(\t+|\s+)\w+(\t+|\s+)[0-9](\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?(\t+|\s+)\/\//g);
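The symptom described above can be reproduced without the original file. The sketch below is an illustration only: the sample string and the helper name stripNulls are made up. Interleaved NUL characters are consistent with UTF-16 text being decoded as a single-byte encoding, but that is an assumption; the post does not confirm how the file was encoded or read.

```javascript
// Hypothetical helper mirroring the fix above: drop every NUL character.
function stripNulls(text) {
  return text.replace(/\u0000+/g, '');
}

// Simulated input: "KEYBOARD" with a NUL after every character.
var raw = 'K\u0000E\u0000Y\u0000B\u0000O\u0000A\u0000R\u0000D\u0000';

// \w does not match NUL, so the first match stops after one character.
var before = raw.match(/\w+/)[0];

// After stripping the NULs, the whole word matches.
var after = stripNulls(raw).match(/\w+/)[0];

console.log(before, after);
```

If the encoding assumption holds, reading the file with the correct encoding in the first place would avoid the cleanup step, and would also keep any characters outside the Latin range intact.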

javascript regex replace multiline strings [duplicate]

var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre.*?<\/pre>/gm );
alert(arr); // null
I want the PRE block to be picked up, even though it spans newline characters. I thought the 'm' flag would do it. It does not.
Found the answer here before posting. Since I thought I knew JavaScript (read three books, worked hours) and there wasn't an existing solution on SO, I'll dare to post anyway. Throw stones here.
So the solution is:
var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre[\s\S]*?<\/pre>/gm );
alert(arr); // <pre>...</pre> :)
Does anyone have a less cryptic way?
Edit: this is a duplicate, but since the original is harder to find than mine, I won't remove this one.
It proposes [^] as a "multiline dot". What I still don't understand is why [.\n] does not work. Guess this is one of the sad parts of JavaScript..
DON'T use (.|[\r\n]) instead of . for multiline matching.
DO use [\s\S] instead of . for multiline matching
Also, avoid greediness where not needed by using *? or +? quantifier instead of * or +. This can have a huge performance impact.
See the benchmark I have made: https://jsben.ch/R4Hxu
Using [^]: fastest
Using [\s\S]: 0.83% slower
Using (.|\r|\n): 96% slower
Using (.|[\r\n]): 96% slower
NB: [^] also works, but a comment below advises against it.
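The advice about lazy quantifiers also changes what gets matched, not only how fast. A small sketch with a made-up input containing two PRE blocks (the variable names are illustrative):

```javascript
// Two separate <pre> blocks with other markup in between.
var html = '<pre>one\ntwo</pre><p>gap</p><pre>three</pre>';

// Greedy: [\s\S]* runs to the LAST </pre>, so both blocks and the
// markup between them come back as a single match.
var greedy = html.match(/<pre[\s\S]*<\/pre>/g);

// Lazy: [\s\S]*? stops at the FIRST </pre> after each <pre>,
// so each block is returned separately.
var lazy = html.match(/<pre[\s\S]*?<\/pre>/g);

console.log(greedy.length, lazy.length);
```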
[.\n] does not work because . has no special meaning inside of [], it just means a literal .. (.|\n) would be a way to specify "any character, including a newline". If you want to match all newlines, you would need to add \r as well to include Windows and classic Mac OS style line endings: (.|[\r\n]).
That turns out to be somewhat cumbersome, as well as slow, (see KrisWebDev's answer for details), so a better approach would be to match all whitespace characters and all non-whitespace characters, with [\s\S], which will match everything, and is faster and simpler.
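To make the difference concrete, here is a short sketch that runs the three variants discussed above against the string from the question (variable names are illustrative):

```javascript
var sample = '<pre>aaaa\nbbb\nccc</pre>ddd';

// Plain dot: does not cross the newlines, so there is no match.
var dotOnly = sample.match(/<pre.*?<\/pre>/);

// Dot inside a class: matches only a literal "." or a newline,
// so it cannot get past the ">" that follows "<pre".
var dotInClass = sample.match(/<pre[.\n]*?<\/pre>/);

// Whitespace plus non-whitespace: matches any character at all.
var anyChar = sample.match(/<pre[\s\S]*?<\/pre>/);

console.log(dotOnly, dotInClass, anyChar && anyChar[0]);
```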
In general, you shouldn't try to use a regexp to match the actual HTML tags. See, for instance, these questions for more information on why.
Instead, try actually searching the DOM for the tag you need (using jQuery makes this easier, but you can always do document.getElementsByTagName("pre") with the standard DOM), and then search the text content of those results with a regexp if you need to match against the contents.
You do not specify your environment and version of JavaScript (ECMAScript), and I realise this post was from 2009, but just for completeness:
With the release of ECMAScript 2018 we can now use the s flag to cause . to match \n (see https://stackoverflow.com/a/36006948/141801).
Thus:
let s = 'I am a string\nover several\nlines.';
console.log('String: "' + s + '".');
let r = /string.*several.*lines/s; // Note 's' modifier
console.log('Match? ' + r.test(s)); // 'test' returns true
This is a recent addition and will not work in many current environments. For example, Node v8.7.0 does not seem to recognise it, but it works in Chromium, and I'm using it in a TypeScript test I'm writing. Presumably it will become more mainstream as time goes by.
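Where support for the s flag is uncertain, one option is to detect it at runtime and fall back to [\s\S]. This is a sketch, not an established recipe: the function names supportsDotAll and matchAcrossLines are hypothetical. It relies on the RegExp constructor throwing a SyntaxError for flags it does not recognise, which is why the flag is passed as a string rather than written in a literal (an unsupported flag in a literal would prevent the whole script from parsing).

```javascript
// Returns true when the engine accepts the dotAll ("s") flag.
function supportsDotAll() {
  try {
    new RegExp('.', 's'); // throws SyntaxError on engines without the flag
    return true;
  } catch (e) {
    return false;
  }
}

// Uses the s flag when available, otherwise the [\s\S] equivalent.
function matchAcrossLines(text) {
  var pattern = supportsDotAll()
    ? new RegExp('string.*several.*lines', 's')
    : /string[\s\S]*several[\s\S]*lines/;
  return pattern.test(text);
}

console.log(matchAcrossLines('I am a string\nover several\nlines.'));
```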
Now there's the s (single line) modifier, which lets the dot match newlines as well :)
\s will also match new lines :D
Just add the s behind the slash
/<pre>.*?<\/pre>/gms
[.\n] doesn't work, because dot in [] (by regex definition; not javascript only) means the dot-character. You can use (.|\n) (or (.|[\n\r])) instead.
I have tested it in Chrome, and it works for me when the dot (.) is replaced with either [^\0] or [^], because the dot doesn't match line breaks (see here: http://www.regular-expressions.info/dot.html).
var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre[^\0]*?<\/pre>/gm );
alert(arr); //Working
In addition to the examples above, here is an alternative:
^[\\w\\s]*$
where \w matches word characters and \s matches whitespace (including newlines):
[\\w\\s]*
This one was beyond helpful for me, especially for matching multiple things that include new lines, every single other answer ended up just grouping all of the matches together.

RegEx does match in Expresso but doesn't in JavaScript

I have written a regex with the help of Expresso. It matches all my samples, so I copied it into my JavaScript code. There it doesn't match one of my examples, but why?
RegEx:
^(\d{1,2}):?(\d\d)?\s*-\s*(\d{1,2}):?(\d\d)?$
Should match:
10-12
10:00-12:00
1000-1200
In JavaScript, 10:00-12:00 doesn't match for me in any of the browsers I tried (IE9, Chrome, Firefox).
Any ideas?
Update (JavaScript Code):
input.match(/^(\d{1,2}):?(\d\d)?\s*-\s*(\d{1,2}):?(\d\d)?$/);
Update (solved):
Due to some prefiltering, the code was never reached. Sorry for that!
Testing it in Chrome right now, and it appears to work:
var exp = /^(\d{1,2}):?(\d\d)?\s*-\s*(\d{1,2}):?(\d\d)?$/;
exp.test('10-12') // true
exp.test('10:00-12:00') // true
exp.test('1000-1200') // true
exp.test('1000-12005') // false
Did you escape the \'s when placing the expression in your JavaScript code?
When embedding it as a string you'll end up writing:
var expression = "^(\\d{1,2}):?(\\d\\d)?\\s* etc
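To illustrate the escaping point with the full pattern from the question: a regex literal and a RegExp built from a string describe the same expression only when every backslash in the string is doubled. The variable names below are illustrative.

```javascript
// Regex literal: backslashes are written once.
var literal = /^(\d{1,2}):?(\d\d)?\s*-\s*(\d{1,2}):?(\d\d)?$/;

// Same pattern as a string: every backslash must be doubled.
var fromString = new RegExp('^(\\d{1,2}):?(\\d\\d)?\\s*-\\s*(\\d{1,2}):?(\\d\\d)?$');

// Forgetting to double them turns "\d" into a plain "d" and "\s" into "s".
var broken = new RegExp('^(\d{1,2}):?(\d\d)?\s*-\s*(\d{1,2}):?(\d\d)?$');

console.log(fromString.test('10:00-12:00'), broken.test('10:00-12:00'));
```

Comparing literal.source with fromString.source is a quick way to confirm that the string form was escaped correctly.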

Javascript RegEx to find first line of each paragraph

This is probably a simple one but I'm very much a regex novice.
I'm looking to select the first line of every paragraph within a textarea on a page using a regular expression. After thinking I was there I have hit a problem.
Using http://gskinner.com/RegExr/ I came up with this:
/\r\r.*\r/g
but then I placed that into my JavaScript and ran it on the page:
var headingsArr = document.getElementById("text").value.match(/\r\r.*\r/g);
and the array returns null.
Have I got the regular expression right and if so where am I going wrong when using it in my javascript!?
Thank you
This depends on what your newline characters are. I think you may better go for
/(?:\r\n|[\r\n]){2}.*(?:\r\n|[\r\n])/g
I know that in RegExr only a \r counts as a newline. On Windows \r\n is normally used, while on *nix it's normally only \n.
So (?:\r\n|[\r\n]) is an alternation, it tries at first to match \r\n if this is not found it matches either \r or \n.
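An alternative to a single regex is to normalize the line endings first and then split on blank lines. This is a different technique from the pattern above, sketched under the assumption that paragraphs are separated by one or more blank lines; the function name firstLinesOfParagraphs is hypothetical. Unlike the regex, it also returns the first line of the very first paragraph, and the results carry no surrounding newline characters.

```javascript
// Returns the first line of every paragraph in the given text.
function firstLinesOfParagraphs(text) {
  // Convert \r\n and lone \r to \n so one separator covers all cases.
  var normalized = text.replace(/\r\n?/g, '\n');

  return normalized
    .split(/\n{2,}/)                       // paragraphs: split on blank lines
    .map(function (paragraph) {
      return paragraph.split('\n')[0];     // keep only the first line
    })
    .filter(function (line) {
      return line.length > 0;              // drop empty leading/trailing chunks
    });
}

var sampleText = 'Title one\nbody\n\nTitle two\r\nmore\r\rTitle three';
console.log(firstLinesOfParagraphs(sampleText));
```

In a page this would be called with document.getElementById("text").value as the argument.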
For the sake of future searchers, if you just need to style the first line of text, you can use the CSS pseudo-element ::first-line:
textarea::first-line {
    background-color: yellow;
}
http://www.w3schools.com/cssref/sel_firstline.asp

JavaScript/jQuery removing character 160 from a node's text() value - Regex

$('#customerAddress').text().replace(/\xA0/,"").replace(/\s+/," ");
I'm going after the value in a span (id=customerAddress), and I'd like to reduce every run of whitespace to a single space. The /\s+/ would work, except this app gets some character 160s between the street address and the state/zip.
What is a better way to write this? The code above does not currently work.
UPDATE:
I have figured out that
$('.customerAddress').text().replace(/\s+/g," ");
clears the 160s and the spaces.
But how would I write a regex to just go after the 160s?
$('.customerAddress').text().replace(String.fromCharCode(160)," ");
didn't even work.
Note: I'm testing in Firefox / Firebug
Regarding just replacing char 160, you forgot to make a global regex, so you are only replacing the first match. Try this:
$('.customerAddress').text()
.replace(new RegExp(String.fromCharCode(160),"g")," ");
Or even simpler, use your Hex example in your question with the global flag
$('.customerAddress').text().replace(/\xA0/g," ");
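The difference between a string pattern and a global regex can be checked without jQuery. The sketch below uses a made-up address string in place of the span's text:

```javascript
var NBSP = String.fromCharCode(160);

// Made-up sample standing in for $('.customerAddress').text()
var address = '12 Main St' + NBSP + NBSP + 'Springfield,  IL' + NBSP + '62704';

// A string pattern replaces only the first occurrence.
var firstOnly = address.replace(NBSP, ' ');

// A regex with the g flag replaces every occurrence.
var allNbsp = address.replace(/\xA0/g, ' ');

// Then collapse each run of whitespace into a single space.
var collapsed = allNbsp.replace(/\s+/g, ' ');

console.log(JSON.stringify(collapsed));
```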
\s does already contain the character U+00A0:
[\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]
But you should add the g modifier to replace globally:
$('#customerAddress').text().replace(/\s+/g, " ")
Otherwise only the first match will be replaced.
Sorry if I'm being obvious (or wrong), but doesn't .text(), when called without parameters, just return the text? I don't know if you included the full code or just an excerpt, but to actually replace the span's content you should do it like this:
var t = $('#customerAddress').text().replace(/\xA0/,"").replace(/\s+/," ");
$('#customerAddress').text(t);
Other than that, the regex for collapsing the spaces seems OK, I'm just not sure about the syntax of your non-printable char there.
