I know RegEx should not be used for parsing HTML, but I'm unable to use any other solution, so I'm stuck with this.
I got this from URI.js:
/\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e]))/ig
However, it doesn't work very well, so I wanted to add a prefix (href=) so that it only matches strings starting with it.
I ended up with something like this (which works in the RegEx tester):
href\=\"\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e]))
But when compiled, it throws an "illegal character" error. I'm not sure whether the " or the = causes that.
JS code:
matches_temp = result_content.match(href\=\"\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e])));
result_content is taken from the DB.
You need the slashes that mark this as a regex, sort of like how quotes mark a value as a string. So .match(regex) should be .match(/regex/). Take a look:
var result_content = '<a href="http://example.com/">link</a>'; // sample input
var matches_temp = result_content.match(/href\=\"\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’"e]))/);
// match() returns null when nothing matches, so check before indexing
if (matches_temp) {
  console.log(matches_temp[1]);
}
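To make the slash/quote analogy concrete, here is a small sketch using a simplified, made-up href pattern (not the full URI.js regex). It shows that a regex literal and a RegExp built from a string are equivalent, as long as every backslash is doubled in the string form.

```javascript
// Simplified pattern for illustration only: captures the value of href="...".
const literal = /href="\b([^"]+)"/;

// The same pattern built from a string. Single quotes delimit the string, so
// the double quotes need no escaping, but "\\b" is needed to get \b.
const fromString = new RegExp('href="\\b([^"]+)"');

const html = '<a href="http://example.com/page">link</a>';
const m1 = html.match(literal);
const m2 = html.match(fromString);

// Both forms compile to the same pattern and find the same match.
if (m1 && m2) {
  console.log(m1[1]); // http://example.com/page
  console.log(literal.source === fromString.source); // true
}
```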
Related
I am attempting to generate some code using escodegen's .generate() function which gives me a string.
Unfortunately, it does not remove the semicolons completely (only on blocks of code), which is what I need, so I have to get rid of them myself. I am using the .replace() function for that; however, the semicolons are not removed for some reason.
Here is what I currently have:
const generatedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions)
const cleanGeneratedCode = generatedCode.replace(';', '')
console.log('cleanGeneratedCode ', cleanGeneratedCode) // string stays the exact same.
Am I doing something wrong or missing something perhaps?
As per MDN, if you pass a substring instead of a regex as the pattern:
It is treated as a verbatim string and is not interpreted as a regular expression. Only the first occurrence will be replaced.
So, the output probably isn't exactly the same as the code generated, but rather the first semicolon has been removed. To remedy this, simply use a regex with the "global" flag (g). An example:
const cleanGeneratedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions).replace(/;/g, '');
console.log('Clean generated code: ', cleanGeneratedCode);
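For comparison, here is a self-contained sketch on a made-up code string (escodegen is not involved) contrasting a string pattern, a global regex, and String.prototype.replaceAll. replaceAll is ES2021, so it assumes a reasonably recent Node or browser. Note that a blind replacement like this would also delete semicolons inside string literals (for example 'a;b'), so it is only safe if the generated code contains none.

```javascript
const code = 'const a = 1;\nconst b = 2;\nconsole.log(a + b);';

// A string pattern: only the first ';' is removed.
const firstOnly = code.replace(';', '');

// A regex with the global flag: every ';' is removed.
const allWithRegex = code.replace(/;/g, '');

// ES2021+: replaceAll with a plain string also removes every occurrence.
const allWithReplaceAll = code.replaceAll(';', '');

console.log(firstOnly);
console.log(allWithRegex === allWithReplaceAll); // true
```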
I have a string with a line-break in the source code of a javascript file, as in:
var str = 'new
line';
Now I want to delete that line break in the code. I couldn't find anything on this; I kept getting results about \n and \r.
Thanks in advance!
EDIT (2021)
This question was asked a long, long time ago, and it's still being viewed relatively often, so let me elaborate on what I was trying to do and why this question is inherently flawed.
What I was actually trying to do was use syntax like the above (i.e. multi-line strings), and I wanted to know how to make it work, since the code above raises a SyntaxError.
However, the code above is simply invalid JS. You cannot use code to fix a syntax error, because code that contains a syntax error never runs in the first place.
The above can now be accomplished if we use backticks instead of single quotes to turn the string into a template literal:
var str = `new
line`;
is totally valid and is identical to
var str = 'new\nline';
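A quick check of that equivalence: the literal line break inside the template literal produces exactly one "\n" character, the same as the escape sequence in an ordinary string.

```javascript
// A template literal may contain a literal line break...
const fromTemplate = `new
line`;

// ...which yields the same string as the \n escape in a normal string.
const fromEscape = 'new\nline';

console.log(fromTemplate === fromEscape); // true
```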
As far as removing the newlines goes, I think the answers below address that issue adequately.
If you do not know in advance whether the "new line" is \r or \n (in any combination), the easiest approach is to remove both of them:
str = str.replace(/[\n\r]/g, '');
It does what you ask: you end up with "newline". If you want each run of newline characters replaced with a single space, use
str = str.replace(/[\n\r]+/g, ' ');
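Here is a small sketch comparing the two patterns on a sample string that mixes Windows (\r\n) and Unix (\n) line breaks, including a blank line. The character class without + deletes each CR/LF individually, while the + version collapses each run into one space.

```javascript
const text = 'first line\r\nsecond line\n\nthird line';

// Remove every CR and LF character individually.
const joined = text.replace(/[\n\r]/g, '');

// Collapse each run of CR/LF characters into a single space.
const spaced = text.replace(/[\n\r]+/g, ' ');

console.log(joined); // first linesecond linethird line
console.log(spaced); // first line second line third line
```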
str = str.replace(/\n|\r/g,'');
Replaces all instances of \n or \r in a string with an empty string.
I am trying to modify some URLs through JavaScript, where part of each URL needs to be replaced. I have a textarea with some URLs, for example:
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=1
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=2
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=3
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=4
Now what I am trying to do is replace http://mywebsite.com/preview.aspx?mode=desktop&url= with an empty string, i.e. remove it.
I have tried using str.replace(), but it replaces only the first occurrence of that URL.
I have also tried the global flag g; the code I used is:
str_replace(\http://mywebsite.com/preview.aspx?mode=desktop&url=/g,'');
But it's not working. Can anyone tell me how I can do that?
I want the output of the textarea to look like this:
http://mywebsite.com/post.aspx?id=44&content=1
http://mywebsite.com/post.aspx?id=44&content=2
http://mywebsite.com/post.aspx?id=44&content=3
http://mywebsite.com/post.aspx?id=44&content=4
I believe that your biggest issue is that your regex syntax is incorrect. Try this:
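Imagine that var s is equal to the value of your textarea.
s = s.replace(/http\:\/\/mywebsite\.com\/preview\.aspx\?mode\=desktop\&url\=/g, '');
The issues you were having were improper delimiters (a regex literal starts and ends with a forward slash, not a backslash) and unescaped special characters such as /, . and ?. Also note that str_replace is a PHP function; in JavaScript you call replace() on the string and use its return value.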
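An alternative to escaping by hand is to escape the literal prefix programmatically and build the regex with RegExp. The escapeRegExp helper below is a commonly used idiom, not a built-in function (the name is just illustrative). On ES2021+ engines, replaceAll with a plain string does the same job without any regex.

```javascript
// Illustrative helper (not built in): backslash-escape every character that
// is special in a regular expression, so the string is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const prefix = 'http://mywebsite.com/preview.aspx?mode=desktop&url=';
const s = [
  'http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=1',
  'http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=2',
].join('\n');

// The g flag makes replace() remove every occurrence, not just the first.
const cleaned = s.replace(new RegExp(escapeRegExp(prefix), 'g'), '');

// ES2021+: same result with no regex at all.
const cleanedAll = s.replaceAll(prefix, '');

console.log(cleaned);
console.log(cleaned === cleanedAll); // true
```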
Though JavaScript has some regex idiosyncrasies of its own, the issues here were related to basic regex syntax, so you might find these resources useful:
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
http://regexpal.com
Try this:
var textarea = document.getElementById('textareaidhere');
textarea.value = textarea.value.replace(/http:\/\/mywebsite\.com\/preview\.aspx\?mode=desktop&url=/g, '');
I have a JS file with some XML in it, where the XML is supposed to get converted to a word by the server.
E.g.
var ip = "<lang:cond><lang:when test="$(VAR{'ip_addr'})">$(VAR{'ip_addr'})</lang:when></lang:cond>";
This gets converted to:
var ip = "192.168.0.0";
However, in case the server doesn't work as intended, I don't want there to be a syntax error, and this is VERY important. Currently there would be a syntax error because the language uses both types of quotes. I can't think of a way to get around this, but perhaps there's another way to do quotes in JavaScript? Or to create a string?
For example, in Python I'd use triple quotes:
ip = """<lang:cond><lang:when test="$(VAR{'ip_addr'})">$(VAR{'ip_addr'})</lang:when></lang:cond>"""
Anyone have a bright idea?
I have had to create strings without quotes for a project as well. We were delivering executable client javascript to the browser for an internal website. The receiving end strips double and single quotes when displayed. One way I have found to get around quotes is by declaring my string as a regular expression.
var x = String(/This contains no quotes/);
x = x.substring(1, x.length-1);
x;
Using String prototype:
String(/This contains no quotes/).substring(1).slice(0,-1)
Using String.fromCharCode:
String.fromCharCode(72,69,76,76,79)
Generate Char Codes for this:
var s = "This contains no quotes";
var result = [];
for (var i = 0; i < s.length; i++) {
  result.push(s.charCodeAt(i)); // UTF-16 code unit at index i
}
result;
In JavaScript, you can escape either type of quote with a backslash (\).
For example:
var str = "This is a string with \"embedded\" quotes.";
var str2 = 'This is a string with \'embedded\' quotes.';
In particular, your block of JavaScript code should be converted to:
var ip = "<lang:cond><lang:when test=\"$(VAR{'ip_addr'})\">$(VAR{'ip_addr'})</lang:when></lang:cond>";
In general, I always prefer to escape the quotes instead of having to constantly switch quote types, depending upon what type of quotes may be used within.
I was looking for a solution to the same problem. Someone suggested looking at https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/template_strings, which proved helpful. About halfway through, the article explains that you can create strings with the backtick character (`).
Try this :)
document.getElementById('test').innerHTML = `'|'|'|"|"`
<div id="test" style="font-size:3em;"></div>
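Applied to the ip string from the question, both escaping the inner double quotes and switching to a template literal produce the same value. One caveat with template literals: the sequence ${ starts an interpolation, but the $( used by this server-side syntax does not, so it is safe here.

```javascript
// Double-quoted string: the inner double quotes are escaped with a backslash.
const escaped = "<lang:cond><lang:when test=\"$(VAR{'ip_addr'})\">$(VAR{'ip_addr'})</lang:when></lang:cond>";

// Template literal (ES2015+): both quote types appear unescaped.
// "$(" is plain text; only "${" would begin an interpolation.
const templated = `<lang:cond><lang:when test="$(VAR{'ip_addr'})">$(VAR{'ip_addr'})</lang:when></lang:cond>`;

console.log(escaped === templated); // true
```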
You can't create a string without using a single or double quote, as even calling String() directly still requires you to pass it a string.
Inside XML you would use CDATA, but inside JS you'll have to just escape the '\"strings\"' "\'appropriately\'"
So I receive some XML in plain text (and no, I can't use DOM or JSON, because apparently I am not allowed to). I want to extract all elements enclosed in a certain element and put them into an array, where I can strip out the text of the individual segments.
Now I am used to using POSIX regex and I will never actually understand the point behind PCRE regex, nor do I get the syntax.
Now here is the code I am using:
var strResponse = objResponse.text;
var strRegex = new RegExp("<item>(.*?)<\/item>","i");
var arrMatches = "";
var match;
while (match = strRegex.exec(strResponse)) {
arrMatches[] = match[1];
}
I have no idea why it won't find any matches with this code. Can someone please help me with this, and perhaps elaborate on what exactly I keep doing wrong with the PCRE syntax?
If those tags are on different lines, the . will not match the newline characters, and therefore your expression will not match. This is just a guess; I don't know your source.
You can try
var strRegex = new RegExp("<item>([\\s\\S]*?)<\\/item>","i");
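[\\s\\S] is a character class containing all whitespace and all non-whitespace characters, so it matches any character; line breaks are covered by the whitespace characters. (The backslashes are doubled because the pattern is written as a string passed to RegExp.)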
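The loop in the question has two further problems besides the newline issue: without the g flag, exec() always restarts at index 0, so a successful match loops forever, and arrMatches[] = ... is PHP syntax that is a syntax error in JavaScript. A minimal corrected sketch, using a made-up sample string in place of objResponse.text:

```javascript
// Sample input for illustration; in the question it comes from objResponse.text.
const strResponse = '<list>\n<item>first</item>\n<item>second\nline</item>\n</list>';

// g flag: exec() advances lastIndex, so the loop walks through the input.
// [\s\S]*? matches any characters, including newlines, lazily.
const itemRegex = /<item>([\s\S]*?)<\/item>/gi;

const arrMatches = []; // an array, not a string
let match;
while ((match = itemRegex.exec(strResponse)) !== null) {
  arrMatches.push(match[1]); // JavaScript uses push(), not $arr[] = ...
}

console.log(arrMatches);
```

On engines that support ES2018, the s (dotAll) flag is another option: /<item>(.*?)<\/item>/gis lets . match newlines as well.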
The best way to complete this task is to parse the text as proper HTML and navigate it with a DOM parser; see:
Javascript function to parse HTML string into DOM?
Regex is very error-prone for this and in general is not well suited to parsing irregular text such as HTML structure.