Convert JS regex into Python regex - javascript

I'm working on part of a project that replaces http URLs with https URLs where possible.
The problem is that the regular expressions for this are written for the JavaScript regex engine, but I'm using them in Python. To stay compatible, I want to rewrite each one into a valid Python regex while parsing.
For example, I'm given this replacement pattern:
https://$1wikimediafoundation.org/
and I want a pattern like this:
https://\1wikimediafoundation.org/
My problem is that I don't know how to do that (convert $ into \).
This code doesn't work:
'https://$1wikimediafoundation.org/'.replace('$', '\')
It generates the following error:
SyntaxError: EOL while scanning string literal
This code works without an error:
'https://$1wikimediafoundation.org/'.replace('$', '\\')
but generates the wrong output:
'https://\\1wikimediafoundation.org/'

Actually it works:
>>> 'https://$1wikimediafoundation.org/'.replace('$', '\\')
'https://\\1wikimediafoundation.org/'
>>> print 'https://$1wikimediafoundation.org/'.replace('$', '\\')
https://\1wikimediafoundation.org/
When you evaluate 'https://$1wikimediafoundation.org/'.replace('$', '\\') at the interactive prompt, the interpreter shows the __repr__ (representation) of the result, in which special characters such as the backslash appear escaped. The string itself contains only one backslash.
Printing it uses __str__, the readable version. (See this answer on __str__ vs __repr__.)
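JavaScript makes the same distinction between a string's contents and its escaped representation. As a rough analogue (JSON.stringify is not Python's repr, but it escapes backslashes the same way), this sketch shows that the replaced string holds a single backslash:

```javascript
// In JavaScript source, "\\" is a single backslash character.
const s = "https://$1wikimediafoundation.org/".replace("$", "\\");

console.log(s);                        // https://\1wikimediafoundation.org/
console.log(JSON.stringify(s));        // "https://\\1wikimediafoundation.org/"
console.log(s.split("\\").length - 1); // 1 (one backslash in the string)
```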

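Try this:
'https://$1wikimediafoundation.org/'.replace('$1', r'\1')
The raw string r'\1' contains a literal backslash followed by 1, so you don't have to escape the backslash yourself. Note that a raw string cannot end in a single backslash, so r'\' would still be a syntax error.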
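JavaScript's closest counterpart to a Python raw string is the String.raw template tag, which keeps backslashes exactly as written. It shares the same limitation: a backslash right before the closing backtick escapes it, so String.raw`\` does not parse either. A sketch of building the target string this way:

```javascript
// String.raw keeps "\1" as a backslash followed by "1".
const target = String.raw`https://\1wikimediafoundation.org/`;

// Replace the "$1" placeholder with a raw "\1" (not special in a JS replacement string).
const converted = "https://$1wikimediafoundation.org/".replace("$1", String.raw`\1`);

console.log(target);               // https://\1wikimediafoundation.org/
console.log(converted === target); // true
```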

You can test your regex at https://regex101.com/ and then convert it to Python.
Additionally, to reference the matched group in the replacement, you can use re.sub along these lines:
re.sub(r"'([^']*)'", r'{\1}', col)
This replaces
'Protein_Expectation_Value_Log(e)', 'Protein_Intensity_Log(I)'
with
{Protein_Expectation_Value_Log(e)}, {Protein_Intensity_Log(I)}
You can find more details here.
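For comparison, here is a sketch of the same replacement in JavaScript, where the group is referenced as $1 instead of \1. String.prototype.replace needs the g flag to replace every match, whereas re.sub replaces all matches by default. The variable col is assumed to hold the text shown above.

```javascript
// JavaScript equivalent of re.sub(r"'([^']*)'", r'{\1}', col).
const col = "'Protein_Expectation_Value_Log(e)', 'Protein_Intensity_Log(I)'";

// /g replaces every quoted value; "{$1}" wraps the captured text in braces.
const result = col.replace(/'([^']*)'/g, "{$1}");

console.log(result); // {Protein_Expectation_Value_Log(e)}, {Protein_Intensity_Log(I)}
```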

Note that $& in replacement patterns should be converted to \g<0>, because \0 in a Python replacement string is interpreted as the NUL character (\x00), not as the whole match.
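Following that note, here is a minimal sketch in JavaScript of converting a JavaScript replacement string into a Python re.sub template. The function name jsReplacementToPython is hypothetical. It assumes only $$, $& and $1 to $99 appear: $` and $' (text before and after the match) and named groups ($<name>) are not handled. It also assumes a two-digit reference such as $12 really means group 12, whereas JavaScript falls back to group 1 followed by "2" when there are fewer groups. It emits \g<n> rather than \n so that a group reference followed by a digit stays unambiguous, and it doubles literal backslashes, which are ordinary characters in a JavaScript replacement but escapes in a Python template.

```javascript
// Hypothetical helper: JS String.prototype.replace replacement -> Python re.sub template.
function jsReplacementToPython(repl) {
  // Match either a literal backslash, or one of $$, $&, $1..$99.
  return repl.replace(/\\|\$(\$|&|\d{1,2})/g, (match, token) => {
    if (match === "\\") return "\\\\";  // Python treats "\" as an escape, so double it
    if (token === "$") return "$";       // "$$" is a literal dollar sign in JS
    if (token === "&") return "\\g<0>";  // whole match; "\0" would be NUL in Python
    return "\\g<" + Number(token) + ">"; // numbered group, e.g. "$1" -> "\g<1>"
  });
}

console.log(jsReplacementToPython("https://$1wikimediafoundation.org/"));
// https://\g<1>wikimediafoundation.org/
```

The resulting template can be passed as the replacement argument of re.sub; \g<1> refers to the same group as \1.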


RegExp for remove first and last char and turn ending double slashes into single

I have the following JavaScript code to obtain the inner string from a RegExp:
Function.prototype.method = function (name,func){
this.prototype[name] = func;
return this;
};
RegExp.method('toRawString', function(){
return this.toString().replace(/^.(.*).$/,"$1");
});
The purpose of this is to avoid doubling backslashes in string literals. For example, if you have a Windows file path "C:\My Documents\My Folder\MyFile.file", you can use it like this:
alert(/C:\My Documents\My Folder\MyFile.file/.toRawString());
However, it does not work for "C:\My Documents\My Folder\", because the trailing backslash escapes the closing slash and causes a syntax error. The only way to avoid this is to keep the doubled backslash at the end of the string, so it has to be written as:
alert(/C:\My Documents\My Folder\\/.toRawString());
In fact, any odd number of backslashes at the end of the string is an error, so every trailing backslash must be doubled. A small multi-line implementation would not be hard, but is there a single-RegExp solution?
NOTE
When I use toRawString, the RegExp object is usually NOT used for any other purpose. I just want to use the RegExp literal syntax to avoid double backslashes in source code. Unfortunately, the trailing double backslash cannot easily be avoided. Another workaround might be to force a space at the end, but that is a separate question.
UPDATE
I finally solved the "another question" and posted the code here.
OK, I get what you're trying to do! It's hacky : )
Try something like:
return this.toString().slice(1, -1).replace(/\\+$/, '\\')
Hope that helps.
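A variation on the answer above, as a sketch: the standalone function toRawString is hypothetical (it avoids patching RegExp.prototype). It reads RegExp.prototype.source, which holds the pattern text without the surrounding slashes or flags; toString().slice(1, -1) would return "x/" for /x/g. Instead of collapsing every trailing run to one backslash, it halves the run, so a doubled trailing backslash becomes one and four become two.

```javascript
// Hypothetical standalone version of the toRawString idea.
function toRawString(re) {
  // A valid pattern always ends in an even run of backslashes;
  // keep half of that run so "\\" at the end becomes a single "\".
  return re.source.replace(/\\+$/, (run) => run.slice(run.length / 2));
}

console.log(toRawString(/C:\My Documents\My Folder\\/));          // C:\My Documents\My Folder\
console.log(toRawString(/C:\My Documents\My Folder\MyFile.file/g)); // C:\My Documents\My Folder\MyFile.file
```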
If you want to include the double quotes in the string, just wrap it in single quotes.
var s = '"C:\\My Documents\\My Folder\\MyFile.file"'
console.log(s) // Output => "C:\My Documents\My Folder\MyFile.file"
This produces a syntax error:
/C:\My Documents\/
But that regular expression could be written correctly like this:
/C:\\My Documents\\/
Or like this:
new RegExp("C:\\\\My Documents\\\\")
I think your function is just fine and is returning a correct result. Regular expressions just can't end with an unpaired backslash. It's not that you're double escaping - you're just escaping the escape character.
This would produce an error too:
new RegExp("C:\\My Documents\\")
A regular expression like this, for instance, can't be written without a pair of backslashes:
/C:\\What/
Without the second backslash, \W would be interpreted as a special character escape sequence. So escaping the escape character isn't only necessary at the end. It's required anywhere it might be interpreted as the beginning of an escape sequence. For that reason, a good rule of thumb is to always use two backslashes to indicate a literal backslash in a regular expression.
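A quick way to see both points in a JavaScript console (the test strings are illustrative):

```javascript
// With a single backslash, \W is the "non-word character" class,
// so the pattern means: "C:", one non-word character, then "hat".
console.log(/C:\What/.test("C:\\What")); // false
console.log(/C:\What/.test("C: hat"));   // true

// Doubling the backslash makes it a literal backslash followed by "What".
console.log(/C:\\What/.test("C:\\What")); // true

// A pattern ending in a single backslash is rejected outright.
let error = null;
try {
  new RegExp("C:\\My Documents\\"); // pattern text: C:\My Documents\
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```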

Converting backslashes into forward slashes using javascript does not work properly?

I have a JavaScript variable coming from a legacy system that contains backslashes, and I want to convert them into forward slashes:
'/46\465531_Thumbnail.jpg'
I am trying to convert it into this:
'/46/465531_Thumbnail.jpg'.
There is no way to fix the problem on the legacy system.
Here is the command I am running in the IE8 browser:
javascript:alert("/46\465531_Thumbnail.jpg".replace(/\\/g,"/"));
As a response, I get a "Message from webpage" alert showing:
/46&5531_Thumbnail.jpg
I actually just want it to be translated to '/46/465531_Thumbnail.jpg'.
What is wrong?
You need to double the backslash in your string constant:
alert("/46\\465531_Thumbnail.jpg".replace(/\\/g,"/"));
If your legacy system is actually creating JavaScript string constants on your pages with embedded, un-quoted (that is, not doubled) backslashes like that, then it's broken and you'll have problems. However, if you're getting the strings via some sort of ajax call in XML or JSON or whatever, then your code looks OK.
It is actually interpreting \46 as an octal escape sequence for the character & (octal 46 is character code 38). If you are going to hard-code the string, you need to escape the \:
alert("/46\\465531_Thumbnail.jpg".replace(/\\/g,"/"));
^^ change \ to \\
Sample: http://jsfiddle.net/6QWE9/
The replacement part isn't the problem, it's the string itself. Your string:
"/46\465531_Thumbnail.jpg"
isn't /46\465531. Rather, the backslash is acting as an escape character. You need to change it to:
javascript:alert("/46\\465531_Thumbnail.jpg".replace(/\\/g,"/"));
i.e., escaping the backslash with a backslash.
Nothing wrong with the replace. The input is wrong.
javascript:alert("/46\\465531_Thumbnail.jpg".replace(/\\/g,"/"));
^
\---------------- need to escape this!
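A small sketch of what happens, runnable in a modern engine. It avoids writing the octal escape "\46" directly, because legacy octal escapes are a SyntaxError in strict-mode code (and therefore in ES modules); the numeric literal 0o46 is used instead to show which character that escape produces. The JSON.parse line stands in for data arriving via an Ajax call, as one of the answers suggests.

```javascript
// "\46" in a non-strict string literal is a legacy octal escape: octal 46 is code 38.
console.log(String.fromCharCode(0o46)); // &

// Doubling the backslash keeps it as a real character, so replace() can see it.
const fixed = "/46\\465531_Thumbnail.jpg".replace(/\\/g, "/");
console.log(fixed); // /46/465531_Thumbnail.jpg

// Text that arrives as data (simulated here with JSON.parse) already holds a real backslash.
const fromJson = JSON.parse('"/46\\\\465531_Thumbnail.jpg"');
console.log(fromJson.replace(/\\/g, "/")); // /46/465531_Thumbnail.jpg
```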

Regular expression and xml

Sometimes I have to work with XML that is not well-formed (it has no root element). So I take the first node name (in this case "error") and build a regex pattern: "</error>$". The problem is that it returns true for the following string (where </error> is at the end of a line):
<error>0</error>
<roles>
<role rid="12" title="User" description="Hello world"></role>
<role rid="11" title="Admin" description="Hello world2"></role></roles>
After looking at some references, I tried </error>\z and </error>\Z, but neither works.
Please help me find a solution.
P.S. If there is a better solution, I'd be really happy to hear it. The target environment is JavaScript.
If the only thing that makes your XML not well-formed is that it's missing root, then the fix is simple – just add some root element and then parse that and work with it as normal XML.
xml = '<root>' + xml + '</root>';
You really shouldn't try to parse XML with regular expressions.
You are correct. Unfortunately, JavaScript does not support the \A and \Z anchors. Consider either applying your regular expression to a substring containing only the last 10 characters (which would probably be more efficient), or
try this expression, which matches if there is any non-whitespace after the </error> end tag:
[\s\S]*</error>(?=\s*\S)
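Note that JavaScript's $ without the m flag already matches only at the very end of the input (unlike Python or Perl, it does not also match before a final newline). If the pattern was built with the m flag, $ also matches before each line break, which would explain the result seen in the question. A sketch comparing the variants on the sample input:

```javascript
const xml = [
  "<error>0</error>",
  "<roles>",
  '<role rid="12" title="User" description="Hello world"></role>',
  '<role rid="11" title="Admin" description="Hello world2"></role></roles>',
].join("\n");

// No m flag: true only if </error> (plus optional trailing whitespace) ends the input.
const endsWithError = /<\/error>\s*$/.test(xml);

// m flag: $ also matches before "\n", so the first line satisfies the pattern.
const endsLineWithError = /<\/error>$/m.test(xml);

// The lookahead from the answer: true when non-whitespace follows </error>.
const hasContentAfter = /[\s\S]*<\/error>(?=\s*\S)/.test(xml);

console.log(endsWithError, endsLineWithError, hasContentAfter); // false true true
```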

How do I escape backslashes in JSON?

I am using Firefox's native JSON.parse() to parse some JSON strings that include regular expressions as values, for example:
var test = JSON.parse('{"regex":"/\\d+/"}');
The '\d' in the above throws an exception with JSON.parse(), but works fine when I use eval (which is what I'm trying to avoid).
What I want is to preserve the '\' in the regex - is there some other JSON-friendly way to escape it?
You need to escape the escape backslashes already in there :) like this:
var test = JSON.parse('{"regex":"/\\\\d+/"}');
You can test it a bit here: http://jsfiddle.net/h3rzE/
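A sketch of the round trip: parsing the doubly escaped literal, letting JSON.stringify generate the escaping instead of writing it by hand, and (as a hypothetical follow-up, assuming the value always has the /.../ form without flags) turning the stored text back into a RegExp.

```javascript
// Inside the JS literal, "\\\\" becomes "\\" in the JSON text,
// which JSON.parse decodes to a single backslash.
const parsed = JSON.parse('{"regex":"/\\\\d+/"}');
console.log(parsed.regex); // /\d+/

// Let JSON.stringify produce the escaping.
const json = JSON.stringify({ regex: "/\\d+/" });
console.log(json); // {"regex":"/\\d+/"}

// Hypothetical follow-up: strip the slashes and build a RegExp from the text.
const re = new RegExp(parsed.regex.slice(1, -1));
console.log(re.test("abc123")); // true
```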

Regex to match part of a string

Regex fun again...
Take for example http://something.com/en/page
I want to test for an exact match on /en/ including the forward slashes, otherwise it could match 'en' from other parts of the string.
I'm sure this is easy, for someone other than me!
EDIT:
I'm using it for a string.match() in javascript
Well, it really depends on which programming language will be executing the regex, but the actual regex is simply
/en/
For .NET the following code works properly:
string url = "http://something.com/en/page";
bool MatchFound = Regex.Match(url, "/en/").Success;
Here is the JavaScript version:
var url = 'http://something.com/en/page';
if (url.match(/\/en\//)) {
alert('match found');
}
else {
alert('no match');
}
DUH
Thank you to Welbog and Chris Ballance for making what should have been the most obvious point: this does not require regular expressions to solve. It is simply a contains check. Regex should only be used where it is needed, and that should have been my first consideration, not the last.
If you're trying to match /en/ specifically, you don't need a regular expression at all. Just use your language's equivalent of contains to test for that substring.
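In JavaScript, the contains check is String.prototype.includes. Keeping the slashes in the search string is what prevents matching "en" inside another path segment:

```javascript
const url = "http://something.com/en/page";

// Plain substring test; no regular expression needed for a fixed string.
console.log(url.includes("/en/"));                             // true
console.log("http://something.com/men/page".includes("/en/")); // false
```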
If you're trying to match any two-letter part of the URL between two slashes, you need an expression like this:
/../
If you want to capture the two-letter code, enclose the periods in parentheses:
/(..)/
Depending on your language, you may need to escape the slashes:
\/..\/
\/(..)\/
And if you want to make sure you match letters instead of any character (including numbers and symbols), you might want to use an expression like this instead:
/[a-z]{2}/
This will be recognized by most regex flavors.
Again, you can escape the slashes and add a capturing group this way:
\/([a-z]{2})\/
And if you don't need to escape them:
/([a-z]{2})/
This expression will match any string in the form /xy/ where x and y are letters. So it will match /en/, /fr/, /de/, etc.
In JavaScript, you'll need the escaped version: \/([a-z]{2})\/.
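A sketch of extracting the two-letter code with String.prototype.match in JavaScript; languageCode is a hypothetical helper name. match returns null when nothing matches, and [a-z] does not match upper-case codes unless you add the i flag:

```javascript
// Hypothetical helper: capture the first two-letter path segment between slashes.
function languageCode(url) {
  const match = url.match(/\/([a-z]{2})\//);
  return match ? match[1] : null;
}

console.log(languageCode("http://something.com/en/page")); // en
console.log(languageCode("http://something.com/fr/page")); // fr
console.log(languageCode("http://something.com/page"));    // null
```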
You may need to escape the forward-slashes...
/\/en\//
Any reason /en/ would not work?
/\/en\// or perhaps /http\w*:\/\/[^\/]*\/en\//
You don't need a regex for this:
location.pathname.substr(0, 4) === "/en/"
Of course, if you insist on using a regex, use this:
/^\/en\//.test(location.pathname)
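The location object only exists in browsers. As a sketch for other environments such as Node.js, the standard URL class provides the same pathname property, and startsWith is a more readable alternative to the substr comparison:

```javascript
// Parse the URL and take its path, as location.pathname would in a browser.
const { pathname } = new URL("http://something.com/en/page");

console.log(pathname);                    // /en/page
console.log(pathname.startsWith("/en/")); // true
console.log(/^\/en\//.test(pathname));    // true
```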
