My regex breaks when I "alert" it - because of escaped slashes? - javascript

I'm currently working on a CKEditor plugin that adds internal links to our CMS. One of the things their current link plugin does is parse a link when the link dialog loads to figure out what "type" it is.
Since I created the internal type, I need to add a regular expression to compare it to, and I'm having trouble doing so. I managed to match my expression using this tool, but once I use the same expression in the RegExp object definition it doesn't seem to work.
My links look like this:
/en/my_folder_5
or
/fr/my_folder_5
I tried the following (which worked in that tool):
/(en|fr)/[A-Za-z_^/]+_[0-9]+
but all the slashes get escaped when I "alert" the expression (which leads me to believe that might be what's breaking it, since I copy-pasted the alerted expression and it did not work).
Any help is appreciated :)

var regex = /\/(en|fr)\/[A-Za-z_^\/]+_[0-9]+/;
alert(regex.test('/fr/my_folder_5')); // prints true
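A minimal sketch of why the escaped slashes are harmless: in a regex literal the `/` must be written as `\/` because `/` delimits the literal, while a pattern passed as a string to the RegExp constructor needs no such escaping. Both forms describe the same pattern. The variable names and the `/de/` sample link are illustrative assumptions, not from the question.

```javascript
// The same pattern built two ways.
// Literal form: "/" is the delimiter, so each slash in the pattern is escaped.
const fromLiteral = /\/(en|fr)\/[A-Za-z_^\/]+_[0-9]+/;
// String form: no delimiter, so the slashes can be written as-is.
const fromString = new RegExp('/(en|fr)/[A-Za-z_^/]+_[0-9]+');

// Sample links; the "/de/" one is a made-up negative case.
const links = ['/en/my_folder_5', '/fr/my_folder_5', '/de/my_folder_5'];

const literalResults = links.map((link) => fromLiteral.test(link));
const stringResults = links.map((link) => fromString.test(link));

console.log(literalResults); // [ true, true, false ]
console.log(stringResults);  // [ true, true, false ]
```

When a RegExp is alerted or logged it is shown in literal form, which is why the slashes appear escaped even if the pattern was built from a string.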

Related

Delimiting documents with regular expressions

I'm working with several documents that are stored in a single file, and before processing them I need to determine where each document begins and ends. For this, I am using the following regex:
MINISTÉRIO\sDO\sTRABALHO\sE\sEMPREGO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)*)PÁG\s:\s\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)*)Z6:\s\d+
Example is here
It works 100% of the time on that input. The problem is that sometimes the text does not arrive in the form I showed; it comes with extra spaces and line breaks. As you can see here, the document is the same as the previous one, but the regular expression does not match. Why does it fail, and how can I fix it?
Also, I need to modify the regex, not the text, because the regex is the only part I actually have access to.
Note: I'm using Node.js, which is why I'm tagging this post with JS.

using unicode symbols in ngOption <select>

Hello, I've run into an issue that is stumping me.
I have an ngOptions expression that loops through and displays unicode symbols:
<select class="form-control symbolSelect" ng-model="input.loadSymbol" ng-options="d as d.TagShpUTF for d in loadSymbols" ng-change=""></select>
Here is an example jsFiddle showing it working: http://jsfiddle.net/tjm9a6o2/
I set up the data source to have a unicode character like so: loadSymbols[0].TagShpUTF = '\u2660'
This all works fine as static data, but when I try to pull the data from my DB it displays it as regular text and doesn't seem to know it's special unicode characters.
This is how I have it setup in the DB (Don't mind other columns, TagShpUTF is the important one):
...what I think it's doing is automatically adding a second backslash '\' so it can be a valid string, but I don't want that to happen. I want it to be recognized as unicode so it shows the symbols in my dropdown (like the jsFiddle), but instead it's showing the actual text (like '\u2660').
Any suggestions would be very helpful. Really need a way of storing these symbols and loading them into a drop down. I tried HTML unicode symbols, but they were giving me even more problems than this method. Thanks!
Eureka!!!
So after many painful attempts and exhausting the kind help from #OrGuz, I more or less gave up on using the \u version of unicode and started looking at the HTML-Code version again.
I stumbled upon this SO post buried in the garbage I've been digging through. It had a link to an MDN page about String.fromCharCode().
By storing the HTML-Code number in my DB and calling String.fromCharCode(),
I was able to load the symbol in the drop down.
club: HTML-Code= &#9827; (♣)
TagShpUTF= 9827
String.fromCharCode(TagShpUTF); <---- Works!
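A minimal sketch of that approach, assuming the rows come back from the database as plain objects with a numeric TagShpUTF column. The `rows` array and the added `symbol` property are hypothetical names used only for illustration.

```javascript
// Hypothetical rows as they might arrive from the database:
// TagShpUTF holds the decimal code number (9824 = U+2660, 9827 = U+2663).
const rows = [
  { id: 1, TagShpUTF: 9824 },
  { id: 2, TagShpUTF: 9827 },
];

// Convert each code number into the actual character before binding it
// to the dropdown, so the select shows the symbol rather than raw text.
const loadSymbols = rows.map((row) => ({
  ...row,
  symbol: String.fromCharCode(row.TagShpUTF),
}));

console.log(loadSymbols.map((s) => s.symbol)); // [ '♠', '♣' ]
```

With data shaped like this, the ng-options expression could display `d.symbol` instead of the raw column. Note that String.fromCharCode only covers code units up to 0xFFFF; String.fromCodePoint would be needed for symbols beyond that range.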

Intellij IDEA regex character class may not be used inside character range

IDEA flags this as an error, and I have not been able to find an option to turn off this kind of error. Does anyone know how to fix the error or turn off the warning? The JavaScript works fine; only IDEA sees this as a problem.
You are creating a range by using the hyphen (-) in the middle of your character class. You should move it to either end.
Also, note that you don't need to escape most regex meta-characters inside a character class. They lose their special meaning there (the exceptions are ], \, a leading ^, and a hyphen between two characters).
So, just use:
[-\w._+%]
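A small sketch of the difference, since the original pattern is not shown here. The two short classes below are illustrative assumptions chosen to make the range visible: a hyphen between two characters defines a range of code points, while a leading hyphen is a literal.

```javascript
// Hyphen in the middle: "+-." is the range from "+" (0x2B) to "." (0x2E),
// which silently includes "," (0x2C).
const midHyphen = /[+-.]/;
// Hyphen first: it is a literal "-", so only "-", "+" and "." match.
const leadingHyphen = /[-+.]/;

const commaMatchesRange = midHyphen.test(',');       // true
const commaMatchesLiteral = leadingHyphen.test(','); // false

// The suggested class, anchored to validate a whole string.
const allowed = /^[-\w._+%]+$/;
const okInput = allowed.test('first.last+tag-1%'); // true
const badInput = allowed.test('has space');        // false

console.log(commaMatchesRange, commaMatchesLiteral, okInput, badInput);
```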

Javascript replace() function adding strange characters

Consider the following Javascript:
var previewImg = 'http://example.com/preview_img/hey.jpg';
var fullImg = previewImg.replace('preview','full');
I would expect the value of fullImg to be:
http://example.com/full_img/hey.jpg
In fact, it is... sort of. Running alert(fullImg); shows the expected url string. But when I deliver that variable to jQuery Fancybox, like this:
jQuery.fancybox.open(fullImg);
Something adds characters into the string, like this:
http://example.com/%EF%BF%BCfull_img/hey.jpg
Where is this %EF%BF%BC coming from? What is it? And most importantly, how do I get rid of it?
Some other clues: This is a Drupal 7 site, running jQuery 1.5.1. I'm using that same Fancybox script elsewhere on the site with no issues.
%EF%BF%BC is a sequence of three URL-encoded bytes.
You clearly can't see any unexpected characters in the string. That's because the character those bytes encode is invisible.
They are the UTF-8 encoding of U+FFFC, the object replacement character, which rich-text editors use as a placeholder for an embedded object. (It is easy to confuse with the UTF-8 byte-order mark, but that is EF BB BF.) It probably got into your code when you did a copy+paste from another file or document.
The quickest way to get rid of them is to find the bit of code that was copied+pasted, delete the characters on either side of the problem, and retype them. Depending on your editor, you may find the delete behaves strangely as it deletes the hidden characters.
Some text editors and IDEs will have an option to show hidden characters. If your editor has this, it may help you see where the mystery characters are so you can delete them.
Hope that helps.
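If retyping is inconvenient, the stray character can also be inspected and stripped in code. This is a sketch under assumptions: the `suspect` string simulates the hidden character with an explicit escape, and the replacement removes both U+FFFC and the byte-order mark U+FEFF to be safe.

```javascript
// Simulated value containing an invisible character before "full".
const suspect = 'http://example.com/\uFFFCfull_img/hey.jpg';

// Encoding the URL makes the hidden character visible as three bytes.
const encoded = encodeURI(suspect);
// -> 'http://example.com/%EF%BF%BCfull_img/hey.jpg'

// Decoding those bytes shows which code point they represent.
const codePoint = decodeURIComponent('%EF%BF%BC').codePointAt(0).toString(16);
// -> 'fffc'

// Strip the object replacement character and any byte-order mark.
const cleaned = suspect.replace(/[\uFFFC\uFEFF]/g, '');

console.log(encoded);
console.log(codePoint);
console.log(cleaned); // http://example.com/full_img/hey.jpg
```

Stripping at runtime only hides the symptom; removing the character from the source file is still the proper fix.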

Regex will not match

This is my string:
<link href="/post?page=4&tags=example" rel="last" title="Last Page">
From there I am trying to obtain the 4 out of that page parameter, using this regular expression:
link href="/post?page=(.*?)&tags=(.*?)" rel="last"
I will then collect the 4 out of the first group, the tags parameter has a wildcard because the contents can change. However, I don't seem to be getting a match with this, can anyone help?
And I know I shouldn't be using regex to parse HTML, but this is just a small thing and it would be a waste to import a huge module for this.
Assuming you are using a /regex literal/, you will need to escape the / in that path as \/.
Alternatively, it depends on how you are getting this string. Is it really typed that way, or is it part of an innerHTML that you are then reading out again? If that's the case, then the innerHTML won't be what you expect it to be, because the browser will "normalise" it.
If it is an innerHTML, then it'd be far easier to get the tag, then get the tag's href attribute, then regex that.
link href="/post\?page=(.*?)&tags=(.*?)" rel="last"
You forgot the backslash before the ?. Unescaped, ? is a quantifier that makes the preceding character optional.
I think it might be better to change your capture groups to something a little different that still catches everything up to the terminating character:
link href="/post\?page=([^&]+)&tags=([^\"]+)" rel="last"
Putting the negation character (^) first in a character class tells the regex engine "match any character EXCEPT the ones listed here". This makes it very easy to capture everything up to a terminating character, such as the ampersand or the double quote. Assuming you're using PHP or Java, this should also slightly improve regex performance.
If the page parameter always comes first, try the PCRE /\?page=(\d+)/. Match group 1 will contain the page number.
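Putting the suggestions above together, here is a sketch in JavaScript (the question does not say which language is in use, so that is an assumption). It escapes both the `/` needed by the regex literal and the `?`, and uses negated character classes for the two groups.

```javascript
const html = '<link href="/post?page=4&tags=example" rel="last" title="Last Page">';

// Full pattern: "/" escaped for the literal, "?" escaped so it is not
// treated as a quantifier, negated classes stop at "&" and at the quote.
const full = /link href="\/post\?page=([^&]+)&tags=([^"]+)" rel="last"/;
const fullMatch = html.match(full);
const page = fullMatch ? Number(fullMatch[1]) : null;
const tags = fullMatch ? fullMatch[2] : null;

// Shorter pattern, usable when only the page number is needed.
const shortMatch = html.match(/\?page=(\d+)/);
const pageOnly = shortMatch ? Number(shortMatch[1]) : null;

// The original, unescaped "?" makes the preceding "t" optional instead of
// matching a literal question mark, so the pattern fails.
const broken = /link href="\/post?page=(.*?)&tags=(.*?)" rel="last"/.test(html);

console.log(page, tags, pageOnly, broken); // 4 example 4 false
```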
