I'm writing a Python program that uses re.sub:
re.sub("( |\t)([a-zA-Z0-9_]+) = ", "$1\"$2\":", string)
The only issue is that, instead of substituting the groups for the $ references, this code writes out $1"$2": literally. It works in JS and even in Visual Studio Code's find and replace, so what would the Python equivalent be?
I've tried searching for an answer but only got results about $ as the regex anchor for "end of line".
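The $n syntax is not something Python's re module understands. In a replacement string, Python writes group references as \1 or \g<1> (use a raw string so the backslashes survive), so r'\1"\2":' does what you want. Alternatively, you can pass a function as the replacement; it receives the match object and returns the replacement text. For your example:
import re
string = ""
def replace_func(match):
    # Rebuild the text from the captured groups: leading whitespace, then the quoted key.
    return f'{match.group(1)}"{match.group(2)}":'
# Function form:
result = re.sub(r"( |\t)([a-zA-Z0-9_]+) = ", replace_func, string)
# Equivalent backreference form:
result = re.sub(r"( |\t)([a-zA-Z0-9_]+) = ", r'\1"\2":', string)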
I know RegEx should not be used for parsing HTML, but I'm unable to use any other solution, so I'm stuck with this
I got this for URI.js:
/\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e]))/ig
However it doesn't work very well, so I wanted to add a prefix that would search only for strings starting with href=
Ended up with something like this (which works in the RegEx tester):
href\=\"\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e]))
But when compiled, it throws an "illegal character" error. I'm not sure whether it's the " or the = that causes it.
JS code:
matches_temp = result_content.match(href\=\"\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e])));
result_content is taken from the DB.
You need the slashes that mark this as a regex, much like quotes mark a value as a string. So .match(regex) should be .match(/regex/). Take a look:
var result_content = 'blah';
var matches_temp = result_content.match(/href\=\"\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e]))/);
console.log(matches_temp ? matches_temp[1] : null); // match() returns null when nothing matches
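To make the literal-versus-string distinction concrete, here is a minimal sketch. The pattern is a deliberately simplified stand-in for the long URL regex above, and the sample HTML string is made up for illustration.

```javascript
// Hypothetical sample input; the real content comes from the database.
const html = '<a href="https://example.com/page">link</a>';

// A regex literal is delimited by slashes, not quotes.
const literal = /href="([^"]+)"/;

// The RegExp constructor builds the same pattern from a string.
const constructed = new RegExp('href="([^"]+)"');

const m1 = html.match(literal);
const m2 = html.match(constructed);

// match() returns null when nothing matches, so guard before indexing.
console.log(m1 ? m1[1] : null);
console.log(m2 ? m2[1] : null);
```

With the constructor form, any backslashes in the pattern have to be doubled inside the string, which is one reason the literal form is usually easier to read for long patterns.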
I am attempting to generate some code using escodegen's .generate() function which gives me a string.
Unfortunately, it does not remove the semicolons completely (only on blocks of code), so I need to get rid of them myself. I am using the .replace() function for that; however, the semicolons are not removed for some reason.
Here is what I currently have:
generatedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions)
const cleanGeneratedCode = generatedFile.replace(';', '')
console.log('cleanGeneratedCode ', cleanGeneratedCode) // string stays the exact same.
Am I doing something wrong or missing something perhaps?
As per MDN, if you provide a substring instead of a regex:
It is treated as a verbatim string and is not interpreted as a regular expression. Only the first occurrence will be replaced.
So, the output probably isn't exactly the same as the code generated, but rather the first semicolon has been removed. To remedy this, simply use a regex with the "global" flag (g). An example:
const cleanGeneratedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions).replace(/;/g, '');
console.log('Clean generated code: ', cleanGeneratedCode);
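The difference between a string pattern and a global regex can be seen in a small sketch. The generatedCode string below is a made-up stand-in for whatever escodegen.generate() returns.

```javascript
// Hypothetical stand-in for the string returned by escodegen.generate().
const generatedCode = "const a = 1;\nconst b = 2;\nfoo();";

// A string pattern replaces only the first occurrence.
const firstOnly = generatedCode.replace(";", "");

// A regex with the g flag replaces every occurrence.
const allRemoved = generatedCode.replace(/;/g, "");

console.log(firstOnly);
console.log(allRemoved);
```

Keep in mind that a blanket replacement also strips semicolons inside string literals and for-loop headers, so it is only safe when the generated code contains neither.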
I have a web application and an Android app in which I want to validate the input.
I created this regex in Java:
private static final String NAME_REGEX = "^[\\w ]+$";
if (!Pattern.matches(NAME_REGEX, name)) {
    mNameView.setError(getString(R.string.error_field_noname));
    focusView = mNameView;
    cancel = true;
}
In JavaScript I want to test the same so I used:
var re = /^[\w ]+$/;
if (!re.test(company)) {
...
}
Everything works fine except that the Java version accepts the characters ä, ö, ü, ó, á (...) and the JavaScript version doesn't.
I don't know where the difference between the two snippets above comes from.
In the end, the most important thing is that both (JavaScript and Java) work exactly the same.
Goal:
Get a regex for JavaScript that behaves exactly the same as the Java one (^[\\w ]+$)
Please use the following regular expression:
var re = /^[äöüß]*$/;
This regular expression accepts those characters.
If you want to allow special characters and alphanumerics as well, use this one:
var re = /^[A-Za-z0-9!##$%^&*äöüß()_]*$/;
Try this: /^[\wäöüß ]+$/i.
Please note the modifier i for "case insensitive"; without it, the regex will not match ÄÖÜ.
These languages use different regex engines. Java supports Unicode better than JavaScript does.
See : https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines#Part_2
So the state of the art is that one must use a library to get the same results in JavaScript as in Java.
As this isn't a real solution for me I simply use this in JavaScript:
var re = /^[A-Za-z0-9_öäüÖÄÜß ]+$/;
and this one in Java:
private static final String NAME_REGEX = "^[A-Za-z0-9_öäüÖÄÜß ]+$";
So this seems to be the exact same in both environments.
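The behaviour of the three variants can be compared side by side in JavaScript. This is only a sketch: the sample names are made up, and the last pattern relies on Unicode property escapes with the u flag, which older JavaScript engines do not support.

```javascript
// \w in JavaScript covers only [A-Za-z0-9_].
const asciiWord = /^[\w ]+$/;

// The explicit character list used above.
const explicitUmlauts = /^[A-Za-z0-9_öäüÖÄÜß ]+$/;

// Assumption: the engine supports \p{...} with the u flag (ES2018 and later).
const unicodeLetters = /^[\p{L}\p{N}_ ]+$/u;

// Hypothetical sample inputs.
const samples = ["Mueller GmbH", "Müller GmbH", "José"];

for (const name of samples) {
  console.log(
    name,
    asciiWord.test(name),
    explicitUmlauts.test(name),
    unicodeLetters.test(name)
  );
}
```

The explicit list only accepts the characters spelled out in it, so a name containing ó or á is still rejected; the Unicode property version accepts any letter or digit, which may or may not be what both platforms should allow.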
Thanks for the help!
I've been working on my Safari extension for saving content to Instapaper and have been enhancing my title parsing for bookmarks. For example, an article that I recently saved has a title tag that looks like this:
Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch
I want to use the JavaScript in my Safari extension to remove all of the text after the pipe character so that I can make the final bookmark look neater once it is saved to Instapaper.
I've attempted the title parsing successfully in a couple of similar cases using blocks of code that look like this:
if(safari.application.activeBrowserWindow.activeTab.title.search(' - ') != -1) {
    console.log(safari.application.activeBrowserWindow.activeTab.title);
    console.log(safari.application.activeBrowserWindow.activeTab.title.search(' - '));
    var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search(' - '));
    console.log(parsedTitle);
};
I started getting thrown for a loop, however, once I tried doing the same thing with the pipe character, since JavaScript uses it as a special character. I've tried several bits of code to solve this problem. The most recent looks like this (attempting to use a regular expression and escape the pipe character):
if(safari.application.activeBrowserWindow.activeTab.title.search('/\|') != -1) {
    console.log(safari.application.activeBrowserWindow.activeTab.title);
    console.log(safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
    var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
    console.log(parsedTitle);
};
If anybody could give me a tip that works for this, your help would be greatly appreciated!
Your regex is malformed. It should be:
safari.application.activeBrowserWindow.activeTab.title.search(/\|/)
Note the lack of quotes; I'm using a regex literal here. Regex literals must be delimited by /.
Instead of searching and then replacing, you can simply do a replace with the following regex:
str = str.replace(/\|.*$/, "");
This removes the | character and everything after it, if the character is present.
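As a sketch of how this could be wrapped up for the bookmark title, the helper below (the name stripAfterPipe is made up) also drops the whitespace in front of the pipe, so no trailing space is left behind.

```javascript
// Hypothetical helper: keeps only the text before the first pipe.
function stripAfterPipe(title) {
  // \s* removes whitespace just before the pipe; \|.*$ removes the pipe and the rest.
  return title.replace(/\s*\|.*$/, "");
}

const title =
  "Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch";

console.log(stripAfterPipe(title));
console.log(stripAfterPipe("No separator here"));
```

A title without a pipe passes through unchanged, so no separate search() check is needed before calling it.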
I am generating XML using Javascript. It works fine if there are no special characters in the XML. Otherwise, it will generate this message: "invalid xml".
I tried to replace some special characters, like:
xmlData=xmlData.replaceAll(">","&gt;");
xmlData=xmlData.replaceAll("&","&amp;");
//but it doesn't work.
For example:
<category label='ARR Builders & Developers'>
Thanks.
Consider generating the XML using DOM methods. For example:
var c = document.createElement("category");
c.setAttribute("label", "ARR Builders & Developers");
var s = new XMLSerializer().serializeToString(c);
s; // => "<category label=\"ARR Builders &amp; Developers\"></category>"
This strategy should avoid the XML entity escaping problems you mention but might have some cross-browser issues.
This will do the replacement in JavaScript:
xml = xml.replace(/</g, "&lt;");
xml = xml.replace(/>/g, "&gt;");
This uses regular expression literals to replace all less than and greater than symbols with their escaped equivalent.
JavaScript comes with a powerful replace() method for string objects.
In general - and basic - terms, it works this way:
var myString = yourString.replace([regular expression or simple string], [replacement string]);
The first argument to the .replace() method is the portion of the original string that you wish to replace. It can be either a plain string (a variable or a literal) or a regular expression.
The regular expression is obviously the most powerful way to select a substring.
The second argument is the string (again, a variable or a literal) that you want to use as the replacement.
In your case, the replacement operation should look as follows:
xmlData=xmlData.replace(/&/g,"&amp;");
xmlData=xmlData.replace(/>/g,"&gt;");
//this time it should work.
Notice that the ampersand replacement comes first. If you replaced it later, you would corrupt the entities you had just produced: "&gt;" would become "&amp;gt;".
In addition, pay attention to the regex 'g' flag: with it, the replacement takes place throughout your text, not only on the first match.
I used regular expressions here; a plain string pattern would also work, but only when a single occurrence needs replacing, since a string pattern replaces just the first match.
You can find a complete reference for String.replace() here.
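Putting the pieces together, a minimal sketch of an escaping helper might look like this. The function name escapeXml is made up, and it assumes the input is raw text that has not been escaped yet.

```javascript
// Hypothetical helper: escapes the five characters that are special in XML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first, or it would re-escape the entities below
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

const label = escapeXml("ARR Builders & Developers");
const xml = "<category label='" + label + "'/>";

console.log(xml);
```

Only the values are escaped, not the finished markup: running the helper over the whole XML string would escape the tags themselves, and running it twice over the same value would turn &amp; into &amp;amp;.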