I've run into an issue where a regex match fails in Internet Explorer and Firefox. It works fine in Chrome and Opera. I know Chrome is generally much more tolerant of mistakes, so I suspect I've dropped the ball somewhere along the way, yet none of the online evaluation tools find any errors in my expression. I'm sorry that it's such a convoluted expression, but hopefully the culprit will be obvious. The expression is as follows:
keyData = data.match(/\w+\u0009\w+\u0009[\u0009]?\w+\u0009([-]?\w+|%%)[#]?\u0009([-]?\w+|%%)[#]?\u0009([-]?\w+|%%)[#]?(\u0009([-]?\w+|%%)[#]?)?(\u0009([-]?\w+|%%)[#]?)?(\u0009([-]?\w+|%%)[#]?)?\u0009\u0009\/\//g);
'data' is a text file which I am parsing with no errors. I won't post the whole file here, but what I am hoping to match is something like the following:
10 Q 1 0439 0419 -1 // CYRILLIC SMALL LETTER SHORT I, CYRILLIC CAPITAL LETTER SHORT I, <none>
I believe that posting the string here strips out the \u0009 (tab) characters, so if you'd like to see one of the full files, I've linked one here. If there is anything more I can clarify, please let me know!
Edit:
My goal in this post is to understand not only why this is failing, but also whether this expression is well-formed. After further review, it seems to be an issue with how Internet Explorer and Firefox parse the text file. They seem to strip out the tabs and replace them with spaces. I updated the expression, and it matches with no problems in an online validator, but it still fails in IE/FF.
Edit 2:
I have since updated my expression to a clearer form, taking feedback into account. The issue still persists in IE and Firefox. It seems to be an issue with the string itself. IE won't let me match more than a single character, no matter what my expression is. For example, if the file contains the string KEYBOARD and I try to match with /\w+/, it will just return K.
/[0-9](\w)?(\t+|\s+)\w+(\t+|\s+)[0-9](\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?(\t+|\s+)\/\//g
After poking around with my regex for a while, I suspected something was wrong with the way IE was actually reading the text file as compared to Chrome. Specifically, if I had the string KEYBOARD within the text file and I tried to match it using /\w+/, it would simply return K in IE, but in Chrome it would match the whole string KEYBOARD. I suspected IE was inserting some dead space between characters, so I stepped through the first few characters of the file and printed their character codes.
// Print the character code and the character for the first 30 positions.
for (var i = 0; i < 30; i++) {
    console.log(data.charCodeAt(i) + ' ' + data[i]);
}
This confirmed my suspicion: \u0000 popped up between each character. I'm not sure why there are NUL characters between each character, but to resolve my issue I simply performed:
data = data.replace(/\u0000+/g, '');
This completely resolved my issue and I was able to parse my string like normal using the expression:
keyData = data.match(/[0-9](\w)?(\t+|\s+)\w+(\t+|\s+)[0-9](\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)(\t+|\s+)(-1|\w+#?|%%)((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?((\t+|\s+)(-1|\w+#?|%%))?(\t+|\s+)\/\//g);
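To make the symptom easier to reproduce, here is a minimal sketch. The sample string is made up (it is not taken from the real file), and the idea that the file is UTF-16 text being decoded one byte per character is only a guess on my part. It would explain the interleaved NULs, but nothing in the post confirms it.

```javascript
// Hypothetical sample: "KEYBOARD" with a NUL after every character, which is
// roughly what UTF-16 text looks like when each byte is read as one character
// (an assumption; the real file is not shown).
const rawSample = "K\u0000E\u0000Y\u0000B\u0000O\u0000A\u0000R\u0000D\u0000";

// \w does not match NUL, so the match stops after the first letter.
const matchBefore = rawSample.match(/\w+/)[0];

// Strip every NUL, the same cleanup used above.
const cleanedSample = rawSample.replace(/\u0000+/g, "");
const matchAfter = cleanedSample.match(/\w+/)[0];

console.log(JSON.stringify(matchBefore)); // "K"
console.log(JSON.stringify(matchAfter));  // "KEYBOARD"
```

If the encoding guess is right, reading the file with the correct encoding in the first place would probably be cleaner than stripping NULs afterwards, but I have not verified that against the original file.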
Related
I wanted to understand what a carriage return is by writing a simple snippet that logs to the console. As I understand it, carriage return '\r' means
"return to the beginning of the current line without advancing downward"
But in my code the string that follows it is appended at the end of the line. Why does it behave like this? I have the string "this is my string", then a carriage return, followed by another string, "that". I thought "that" would be placed at the beginning of the string.
console.log("this is my string"+String.fromCharCode(13)+"that");
It prints "this is my stringthat".
Using \r in a string in JavaScript is probably going to give you different results depending on a combination of how the program is being run (in a browser or a standalone engine) and the target of the text (console, alert, a text node in an HTML element etc). It's not clear from your question whether you're running JavaScript in a browser, but (assuming you are) you're going to get different results for different browsers. Internet Explorer's console treats \r as a newline character (\n) while most other browsers will ignore it. I doubt any browser implementation of console is going to give you the behavior you've described.
Note that \r is not a string processing instruction, it's a character. Doing this:
var aString="one\r2";
is never going to result in
aString == "2ne"
or
aString == "2one"
or
aString == "one2"
or anything similar evaluating to true. aString's value will remain "one\r2" until you change it. It's up to the console or alert that is displaying the string to choose how to render \r.
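A quick way to convince yourself of this is to inspect the string rather than print it. This is only a sketch; the variable names are made up for illustration.

```javascript
// "\r" is stored as one ordinary character (code 13); nothing is overwritten.
const crString = "one\r2";

const crLength = crString.length;         // 3 letters + "\r" + 1 digit
const crCode = crString.charCodeAt(3);    // code of the character after "one"
const crParts = crString.split("\r");     // split on the carriage return

console.log(crLength, crCode, crParts);
```

Here crLength is 5, crCode is 13, and crParts is ["one", "2"], regardless of how a given console decides to render the string.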
There are string processing methods in JavaScript for splitting and recombining strings (see the w3schools Javascript String Reference or Mozilla's String reference) that would better suit your purposes. If you start using characters like \r or \b in other languages and/or environments you're going to encounter different behaviors based on a whole host of factors.
I am surprised not to find any post regarding this, so I must be missing something very trivial. I have a small JavaScript function to check if a string matches an object's properties. Simple stuff, right? It works easily with all strings except those which contain a forward slash.
"04/08/2015".indexOf('4') // returns 2 :good
"04/08/2015".indexOf('4/') // returns -1 :why?
The same issue appears to be with .search() function as well. I encountered this issue while working on date strings.
Please note that I don't want to use regex based solution for performance reasons. Thanks for your help in advance!
Your string has invisible Unicode characters in it. The "left-to-right mark" (U+200E) appears around the two slash characters as well as at the beginning and the end of the string.
If you type the code in on your browser console instead of cutting and pasting, you'll see that it works as expected.
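If you want to check a string for this kind of thing yourself, something along these lines may help. It is a rough sketch: pastedDate is my reconstruction of the string based on the description above, and findNonPrintable is a hypothetical helper, not a built-in. The cleanup uses split/join rather than a regex, since you mentioned wanting to avoid regexes.

```javascript
// Reconstruction (assumption): left-to-right marks at both ends and around
// the two slashes, as described above.
const pastedDate = "\u200E04\u200E/\u200E08\u200E/\u200E2015\u200E";

// Hypothetical helper: report every character outside printable ASCII.
function findNonPrintable(str) {
  const found = [];
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code < 32 || code > 126) {
      found.push({ index: i, hex: code.toString(16).toUpperCase() });
    }
  }
  return found;
}

const hiddenChars = findNonPrintable(pastedDate);
const indexBefore = pastedDate.indexOf("4/");   // -1: a mark sits between "4" and "/"

// Remove the marks without a regex.
const cleanedDate = pastedDate.split("\u200E").join("");
const indexAfter = cleanedDate.indexOf("4/");   // 1

console.log(hiddenChars.length, indexBefore, indexAfter);
```

With this reconstruction, indexOf('4') on the pasted string returns 2 (the mark occupies index 0), which matches what you saw.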
I am literally pulling my hair out on this one...
Here's the situation. I have two javascript strings as follows:
dsName = "Test 1"
replacementString = "Test "
I'm trying to see if dsName starts with replacementString, with the following code:
if(dsName.indexOf(replacementString) == 0)
{
// I never get here!
}
indexOf is returning -1!! How is this possible? I can put a breakpoint in Chrome script debugging right before that line and paste "dsName.indexOf(replacementString)" into the console and see that it is indeed returning -1.
Now just to prove I'm not crazy I can from that same breakpoint print out dsName and it does in fact equal "Test 1" and replacementString does equal "Test ". Here is an actual screenshot from the Chrome debugging console:
So as you can see, if I paste in the literal string, it works as expected, but if I use the variable, it doesn't work. I've even tried String(replacementString) and replacementString.toString() to see if maybe it was a type issue, but it does the same thing.
It's like it works if the parameter for indexOf is a literal string, but not if it's a string variable.
Am I going crazy, or is there something stupid I'm missing? Or is this possibly a bug in Chrome?
It looks like some of the characters that look like spaces are not actually simple spaces. Try this to see what the string really contains:
for (var i=0; i<replacementString.length; i++)
console.log(replacementString.charCodeAt(i));
You can replace non-breaking spaces by regular ones like this:
// Use a global regex: a plain string pattern would only replace the first match.
replacementString = replacementString.replace(/\u00A0/g, " ");
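Here is a small self-contained sketch of the whole situation. The values are made up to mimic the question (I am assuming the trailing "space" is a non-breaking space), and it also shows why the pattern passed to replace matters when there is more than one non-breaking space.

```javascript
// Assumed values: the variable ends in U+00A0, not a regular space.
const dsNameSample = "Test 1";
const nbspText = "Test\u00A0";

const indexWithNbsp = dsNameSample.indexOf(nbspText);          // no match

// Normalize all non-breaking spaces to regular spaces, then compare again.
const normalizedText = nbspText.replace(/\u00A0/g, " ");
const indexNormalized = dsNameSample.indexOf(normalizedText);  // match at 0

// With a string pattern, replace() only touches the first occurrence.
const twoNbsp = "a\u00A0b\u00A0c";
const firstOnly = twoNbsp.replace(String.fromCharCode(160), " ");
const allReplaced = twoNbsp.replace(/\u00A0/g, " ");

console.log(indexWithNbsp, indexNormalized);
console.log(firstOnly === "a b\u00A0c", allReplaced === "a b c");
```

Both strings look identical when printed, which is why the console output in the question was so misleading.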
Kudos to Wolfgang for getting me on the right path to figuring this out, but it turned out to be something completely unexpected and different...
I was pulling the value of replacementString from a <textarea> which had a style of white-space:nowrap. I guess when nowrap is turned on, it returns spaces as non-breaking spaces (character code 160, U+00A0) and not as regular spaces.
Here's a js-fiddle to see what's going on: http://jsfiddle.net/Jk9Cw/
What do you guys think? Is this a "duh, you should have known" or a "wow, that is something I've never run into before"?
I've been working on my Safari extension for saving content to Instapaper and have been working on enhancing my title parsing for bookmarks. For example, an article that I recently saved has a tag that looks like this:
Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch
I want to use the JavaScript in my Safari extension to remove all of the text after the pipe character so that I can make the final bookmark look neater once it is saved to Instapaper.
I've attempted the title parsing successfully in a couple of similar cases using blocks of code that look like this:
if(safari.application.activeBrowserWindow.activeTab.title.search(' - ') != -1) {
console.log(safari.application.activeBrowserWindow.activeTab.title);
console.log(safari.application.activeBrowserWindow.activeTab.title.search(' - '));
var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search(' - '));
console.log(parsedTitle);
};
However, I got thrown for a loop once I tried doing the same thing with the pipe character, since JavaScript uses it as a special character. I've tried several bits of code to solve this problem. The most recent looks like this (attempting to use regular expressions and escape the pipe character):
if(safari.application.activeBrowserWindow.activeTab.title.search('/\|') != -1) {
console.log(safari.application.activeBrowserWindow.activeTab.title);
console.log(safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
console.log(parsedTitle);
};
If anybody could give me a tip that works for this, your help would be greatly appreciated!
Your regex is malformed. It should be:
safari.application.activeBrowserWindow.activeTab.title.search(/\|/)
Note the lack of quotes; I'm using a regex literal here. Also, regex literals need to be delimited by / on both sides.
Instead of searching and then replacing, you can simply do a replace with the following regex:
str = str.replace(/\|.*$/, "");
This removes the | character and everything after it, if a | is present.
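For illustration, here is a sketch using the title from the question as a plain string, so it runs outside a Safari extension. The variable names are made up. It shows what the quoted pattern actually does, plus two ways to cut the title at the pipe that also drop the space left in front of it.

```javascript
const pageTitle =
  "Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch";

// The string '/\|' is just "/|". search() turns it into a regex meaning
// "a slash, or nothing", and "nothing" matches at position 0.
const brokenIndex = pageTitle.search('/\|');

// Option 1: no regex at all, since indexOf treats "|" as a plain character.
const pipeIndex = pageTitle.indexOf("|");
const titleViaIndexOf =
  pipeIndex === -1 ? pageTitle : pageTitle.substring(0, pipeIndex).trim();

// Option 2: a regex literal that also swallows whitespace before the pipe.
const titleViaRegex = pageTitle.replace(/\s*\|.*$/, "");

console.log(brokenIndex);      // 0
console.log(titleViaIndexOf);
console.log(titleViaRegex);
```

Both options give "Report: Bing Users Disproportionately Affected By Malware Redirects" with no trailing space, and both leave the title untouched when it contains no pipe.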
last_tag="abcde x";
last_tag = last_tag.replace(/[\s]+x$/, '');
This is my problem: I have to remove an "x" at the end of my string. This piece of code is used in a plugin I've been using without problems until now. On IE 7, "last_tag" is selected incorrectly, so I get a trailing "x" that I have to remove. I think whoever wrote the plugin added this replace to do exactly that, but it's not working on IE7.
Example:
before:last_tag="abcde x"
after: last_tag="abcde"
Actually, the problem is that last_tag remains exactly the same.
Is the regex correct? Is there any error or compatibility issue with IE?
EDIT: Probably the regex is not the issue.
I've tried this piece of code, but nothing happens:
var temp_tag="abc x";
alert(temp_tag);
temp_tag = temp_tag.replace(/[\s]+x$/, '');
alert(temp_tag)
The same piece of code works perfectly on Chrome.
The regex looks okay, but it's possible you're trying to match non-breaking spaces (U+00A0). \s doesn't match those in IE (as explained in this answer), but it does in Firefox and Chrome.
I'd go for this RegExp
/\s+x$/
Don't wrap \s in a character class []; it is already a character class
(shorthand for something like [ \t\r\n\v\f]: space, tab, carriage return, line feed, vertical tab, form feed).
edit
Alan Moore is right:
try this instead
/[\s\u00A0]+x$/
edit
Case should not matter here: the hex digits in a \u escape are case-insensitive, so \u00a0 and \u00A0 are equivalent.
This should match every whitespace character as well as non-breaking spaces.
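A quick sketch to check the pattern against both kinds of space. The sample strings are made up. I can't test on IE7 myself, so treat this as a sanity check of the regex rather than proof that it fixes the IE7 behaviour. For what it's worth, the hex digits in a \u escape are case-insensitive, so the spelling of the escape should not be a factor.

```javascript
const tagPattern = /[\s\u00A0]+x$/;

// Hypothetical inputs: one ends in NBSP + "x", the other in a regular space + "x".
const nbspTag = "abcde\u00A0x";
const spaceTag = "abcde x";

const trimmedNbspTag = nbspTag.replace(tagPattern, "");
const trimmedSpaceTag = spaceTag.replace(tagPattern, "");

// Both spellings of the escape produce the same character.
const sameEscape = "\u00a0" === "\u00A0";

console.log(trimmedNbspTag, trimmedSpaceTag, sameEscape);
```

Both results come out as "abcde", and sameEscape is true.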
Try this
last_tag = last_tag.replace(/[\t\r\n]+x$/, '');