Assume I have the following URL stored in a variable called content:
http://www.example.com/watch?v=4444444&feature=related
Problem:
I need to replace watch?v= with embed/
I need to erase whatever comes after &
The final output would look like:
http://www.example.com/embed/4444444
I tried these two steps, but they didn't work:
content = content.replace('/watch?v=/', 'embed/');
content = content.replace('&*/g','');
The URL in page source code appears as:
http://www.example.com/watch?v=4444444&feature=related
You have many errors:
You are using a regular expression when you only need a string.
You are writing your regular expressions as strings.
To write 'match any characters' you need to write '.*', not just '*'. The star modifies the previous token.
There is no need to use the g flag here.
Try this instead:
content = content.replace('watch?v=', 'embed/').replace(/&.*/, '');
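To see the difference side by side, here is a minimal sketch using the example URL from the question (the variable names are only for illustration):

```javascript
// Example URL from the question.
const original = 'http://www.example.com/watch?v=4444444&feature=related';

// Quoting a regex turns it into a plain string, so this searches for the
// literal text "/watch?v=/" (slashes included), which is not in the URL.
const unchanged = original.replace('/watch?v=/', 'embed/');

// A plain string for the fixed part, a regex literal for "& and everything after it".
const embedUrl = original.replace('watch?v=', 'embed/').replace(/&.*/, '');

console.log(unchanged === original); // true
console.log(embedUrl); // http://www.example.com/embed/4444444
```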
I am attempting to generate some code using escodegen's .generate() function which gives me a string.
Unfortunately, it does not completely remove the semicolons (only on blocks of code), so I need to get rid of them myself. I am using the .replace() function, but the semicolons are not removed for some reason.
Here is what I currently have:
generatedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions)
const cleanGeneratedCode = generatedCode.replace(';', '')
console.log('cleanGeneratedCode ', cleanGeneratedCode) // string stays the exact same.
Am I doing something wrong or missing something perhaps?
As per MDN, if you provide a substring instead of a regex:
It is treated as a verbatim string and is not interpreted as a regular expression. Only the first occurrence will be replaced.
So, the output probably isn't exactly the same as the code generated, but rather the first semicolon has been removed. To remedy this, simply use a regex with the "global" flag (g). An example:
const cleanGeneratedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions).replace(/;/g, '');
console.log('Clean generated code: ', cleanGeneratedCode);
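The first-occurrence behaviour is easy to check in isolation. A small sketch, where the generated string is a hypothetical stand-in for the escodegen output:

```javascript
// Hypothetical stand-in for the escodegen output.
const generated = 'let a = 1;\nlet b = 2;';

// String pattern: only the first semicolon is removed.
const firstOnly = generated.replace(';', '');

// Regex with the g flag: every semicolon is removed.
const allRemoved = generated.replace(/;/g, '');

console.log(JSON.stringify(firstOnly));  // "let a = 1\nlet b = 2;"
console.log(JSON.stringify(allRemoved)); // "let a = 1\nlet b = 2"
```

Keep in mind that a blanket replace like this also strips semicolons inside string literals and for-loop headers, so it is only safe if the generated code has none of those.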
I know it's going to be a VERY obvious answer, but I can't find anything on how to do this.
I'm trying to unescape < and > within an HTML string
My test output string is essentially:
```php
>h2<Heading2>/h2<
```
`>h2<Heading2>/h2<`
>h2<Heading2>/h2<
So in this example we have Github flavoured Markdown, a regular code markdown snippet, and then raw text all with the same HTML tag. I want to unescape the raw tag (the third one) to actually become a link. The ideal output would be something like this.
```php
>h2<Heading2>/h2<
```
`>h2<Heading2>/h2<`
<h2>Heading2</h2>
I'm getting stuck at getting multiple > in the same line.
Current regex:
/(?:.*?(>))/
This will get the first entry.
/(?:.*?(>))/g
This one gets the second entry. I want it to be able to get EVERY entry. Then, it's just a matter of throwing the tick pieces.
/(?:```|`)(?:.*?(>)).*?(?:```|`)/gs
If you're intending on using a regular expression for this task, you can consider the following:
var r = s.replace(/((`(?:``)?)[^`]*\2)|>/g, '$1<')
.replace(/((`(?:``)?)[^`]*\2)|</g, '$1>')
.replace(/`[<>]+/g, '`');
Given a regex like this:
http://rubular.com/r/ai1LFT5jvK
I want to use string.replace to replace "subdir" with a string of my choosing.
Doing myStr.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/,otherStr)
only returns the same string, as shown here: http://jsfiddle.net/nLmbV/
If you view the Rubular, it appears to capture what I want it to capture, but on the Fiddle, it doesn't replace it.
I'd like to know why this happens, and what I'm doing wrong. A correct regex or a correct implementation of the replace call would be nice, but most of all, I want to understand what I'm doing wrong so that I can avoid it in the future.
EDIT
I've updated the fiddle to change my regex from:
/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/
to
/^.*\/\/.*\.net\/.*\/(.*)\/.*$/
And according to the fiddle, it just returns hello instead of https://xxxxxxxxxxx.cloudfront.net/dir/hello/Slide1_v2.PNG
It's that little \z in your regex.
You probably forgot to replace it with a $ sign. JavaScript uses ^ and $ as anchors, while Ruby uses \A and \z.
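A quick way to see what \z does in JavaScript is the sketch below. Without the u flag, JavaScript reads \z as an escaped literal "z", so the pattern only matches when a "z" actually follows. The file name is taken from the question; the patterns are shortened for illustration:

```javascript
// Without the u flag, JavaScript treats \z as an escaped literal "z".
const rubyStyle = /\.PNG\z/;
const jsStyle = /\.PNG$/;

console.log(rubyStyle.test('Slide1_v2.PNG'));  // false
console.log(rubyStyle.test('Slide1_v2.PNGz')); // true
console.log(jsStyle.test('Slide1_v2.PNG'));    // true
```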
To answer your edit:
The match is always replaced as a whole. You'll want to group both the left side and the right side of the to-be-replaced part and reinsert it in the replacement:
url.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/,"$1hello$2")
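Here is a small sketch of the two behaviours. The URL is a hypothetical one shaped like the one in the question, with "subdir" as the part to replace:

```javascript
// Hypothetical input URL; "subdir" is the part we want to swap out.
const slideUrl = 'https://example.cloudfront.net/dir/subdir/Slide1_v2.PNG';

// The whole match is replaced, and the pattern matches the entire string,
// so only the replacement text is left.
const wholeMatch = slideUrl.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*$/, 'hello');

// Capture the parts to keep and reinsert them with $1 and $2.
const regrouped = slideUrl.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/, '$1hello$2');

console.log(wholeMatch); // hello
console.log(regrouped);  // https://example.cloudfront.net/dir/hello/Slide1_v2.PNG
```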
Before I get marked down, I know the question asks about regexp. The reason for this answer is that URLs are nearly impossible to process reliably with a regexp without writing fiendishly complex regexps. It can be done, but it makes your head hurt!
If you are doing this in a browser, you can use an A tag in your script to make things much simpler. The A tag knows how to parse them into pieces, and it lets you modify the pieces independently, so you only need to deal with the pathname:
//make a temporary a tag
var a = document.createElement('a');
//set the href property to the url you want to process
a.href = "scheme://host.domain/path/to/the/file?querystring";
//grab the path part of the url, and chop up into an array of directories
var dirs = a.pathname.split('/');
//set 2nd dir name - array is ['','path','to','the','file']
dirs[2]='hello';
//put the path back together
a.pathname = dirs.join('/');
a.href now contains the URL you want.
More lines, but also more hair left when you come back to change the code later.
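Outside a page (or in any modern browser), the same idea can be sketched with the standard URL constructor instead of a temporary A tag. This swaps in a different API from the one used above, and the input URL is a hypothetical example:

```javascript
// Hypothetical URL; the URL constructor is available in modern browsers and Node.js.
const parsed = new URL('https://example.cloudfront.net/dir/subdir/Slide1_v2.PNG?x=1');

// Split the path into segments: ['', 'dir', 'subdir', 'Slide1_v2.PNG']
const segments = parsed.pathname.split('/');

// Replace the second directory name and put the path back together.
segments[2] = 'hello';
parsed.pathname = segments.join('/');

console.log(parsed.href); // https://example.cloudfront.net/dir/hello/Slide1_v2.PNG?x=1
```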
ok i do have this following data in my div
<div id="mydiv">
<!--
what is your present
<code>alert("this is my present");</code>
where?
<code>alert("here at my left hand");</code>
oh thank you! i love you!! hehe
<code>alert("welcome my honey ^^");</code>
-->
</div>
well what i need to do there is to get all the scripts inside the <code> blocks, and the html text nodes, without removing the html comments inside. it's homework given by my professor and i can't modify that div block..
I need to use regular expressions for this and this is what i did
var block = $.trim($("div#mydiv").html()).replace("<!--","").replace("-->","");
var htmlRegex = new RegExp(""); //I don't know what to do here
var codeRegex = new RegExp("^<code(*n)</code>$","igm");
var code = codeRegex.exec(block);
var html = "";
it really doesn't work... please don't give the exact answer.. please teach me.. thank you
I need to have the following blocks for the variable code
alert("this is my present");
alert("here at my left hand");
alert("welcome my honey ^^");
and this is the blocks i need for variable html
what is your present
where?
oh thank you! i love you!! hehe
my question is what is the regex pattern to get the results above?
Parsing HTML with a regular expression is not something you should do.
I'm sure your professor thinks he/she was really clever and that there's no way to access the DOM API and can wave a banner around and justify some minor corner-case for using regex to parse the DOM and that sometimes it's okay.
Well, no, it isn't. If you have complex code in there, what happens? Your regex breaks, and perhaps becomes a security exploit if this is ever in production.
So, here:
http://jsfiddle.net/zfp6D/
Walk the dom, get the nodeType 8 (comment) text value out of the node.
Invoke the HTML parser (that thing that browsers use to parse HTML, rather than regex, why you wouldn't use the HTML parser to parse HTML is totally beyond me, it's like saying "Yeah, I could nail in this nail with a hammer, but I think I'm going to just stomp on the nail with my foot until it goes in").
Find all the CODE elements in the newly parsed HTML.
Log them to console, or whatever you want to do with them.
First of all, you should be aware that because HTML is not a regular language, you cannot do generic parsing using regular expressions that will work for all valid inputs (generic nesting in particular cannot be expressed with regular expressions). Many parsers do use regular expressions to match individual tokens, but other algorithms need to be built around them.
However, for a fixed input such as this, it's just a case of working through the structure you have (though it's still often easier to use different parsing methods than just regular expressions).
First, let's get all the code:
var code = '', match = [];
var regex = new RegExp("<code>(.*?)</code>", "g");
while (match = regex.exec(content)) {
code += match[1] + "\n";
}
I assume content contains the content of the div that you've already extracted. Here the "g" flag says this is for "global" matching, so we can reuse the regex to find every match. The brackets indicate a capturing group, . means any character, * means repeated 0 or more times, and ? means "non-greedy" (see what happens without it to see what it does).
Now we can do a similar thing to get all the other bits, but this time the regex is slightly more complicated:
new RegExp("(<!--|</code>)([\\s\\S]*?)(-->|<code>)", "g")
Here | means "or". So this matches all the bits that start with either "start comment" or "end code" and end with "end comment" or "start code". [\s\S] is used instead of . because . does not match newlines, and the text here spans several lines. Note also that we now have 3 sets of brackets, so the part we want to extract is match[2] (the second set).
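A runnable sketch of both loops, assuming the comment text from the question is already in a string. The variable names are illustrative, the outer groups are made non-capturing so the text lands in m[1], and [\s\S] is used so that matches can span lines:

```javascript
// Stand-in for the comment contents from the question's div.
const commentHtml = [
  '<!--',
  'what is your present',
  '<code>alert("this is my present");</code>',
  'where?',
  '<code>alert("here at my left hand");</code>',
  'oh thank you! i love you!! hehe',
  '<code>alert("welcome my honey ^^");</code>',
  '-->',
].join('\n');

// Everything between <code> and </code>.
const codeRe = /<code>([\s\S]*?)<\/code>/g;
// Everything between (comment start or </code>) and (comment end or <code>).
const textRe = /(?:<!--|<\/code>)([\s\S]*?)(?:-->|<code>)/g;

const codeParts = [];
const textParts = [];
let m;
while ((m = codeRe.exec(commentHtml)) !== null) codeParts.push(m[1]);
while ((m = textRe.exec(commentHtml)) !== null) {
  const text = m[1].trim();
  if (text) textParts.push(text); // skip whitespace-only gaps
}

console.log(codeParts);
console.log(textParts);
```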
You're doing a lot of unnecessary stuff. .html() gives you the inner contents as a string. You should be able to use regEx to grab exactly what you need from there. Also, try to stick with regEx literals (e.g. /^regexstring$/). You have to escape escape characters using new RegExp which gets really messy. You generally only want to use new RegExp when you need to put a string var into a regEx.
The match function of strings accepts regEx and returns a collection of every match when you add the global flag (e.g. /^regexstring$/g <-- note the 'g'). I would do something like this:
var block = $('#mydiv').html(), //you can set multiple vars in one statement w/commas
matches = block.match(/<code>[^<]*<\/code>/g);
//[^<]* <-- 0 or more characters that aren't '<' - google 'negative character class'
matches = matches.join('_') //lazy way of avoiding a loop - join into a string with a safe character
.replace(/<\/*code>/g,'') //\/* 0 or more forward slashes
.split('_');//return the matches string back to array
//Now do what you want with matches. Eval (ew) or append in a script tag (ew).
//You have no control over the 'ew'. I just prefer data to scripts in strings
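If the code snippets might themselves contain the separator character, one option is to strip the tags from each match separately rather than joining and splitting. A rough sketch, where blockHtml is a hypothetical stand-in for the string that .html() returns:

```javascript
// Hypothetical stand-in for the string returned by $('#mydiv').html().
const blockHtml =
  '<!--\nwhat is your present\n<code>alert("this is my present");</code>\n' +
  'where?\n<code>alert("here at my left hand");</code>\n-->';

// match() with the g flag returns every full match, or null when there is none.
const tagged = blockHtml.match(/<code>[^<]*<\/code>/g) || [];

// Strip the opening and closing tags from each match individually.
const snippets = tagged.map((s) => s.replace(/<\/?code>/g, ''));

console.log(snippets);
```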
I'm trying to parse and amend some html (as a string) using javascript and in this html, there are references (like img src or css backgrounds) to filenames which contain full stops/periods/dots/.
e.g.
<img src="../images/filename.01.png"> <img src="../images/filename.02.png">
<div style="background:url(../images/file.name.with.more.dots.gif)">
I've tried, struggled and failed to come up with a neat regex to allow me to parse this string and spit it back out without the dots in those filenames, e.g.
<img src="../images/filename01.png"/> <img src="../images/filename02.png"/>
<div style="background:url(../images/filenamewithmoredots.gif)">
I only want to affect the image filenames, and obviously I want to leave the filetype alone.
A regex like:
/(.*)(?=(.gif|.png|.jpg|.jpeg))
allows me to match the main part of the filename and the extension separately, but it also matches across the whole of the string, not just within the one filename I want.
I have no control over the incoming html, I'm just consuming it.
Help me please overflowers, you're my only hope!
I agree that this is not a problem suitable for regular expressions, much less one neat expression.
But I trust that you are not here to hear that. So, in case you want to keep the input as string...
var src, result = '<img src="../images/filename.01.png"> <img src="../images/filename.02.png"><div style="background:url(../images/file.name.with.more.dots.gif)">';
do {
src = result;
result = src.replace( /((?:url(\()|href=|src=)['"]?(?:[^'"\/]*\/)*[^'"\/]*)\.(?=[^\.'")]*\.(?:gif|png|jpe?g)['")>}\s])/g, '$1' );
} while (result != src)
Basically, it keeps removing the second-to-last dot of each image URL's filename until there are none left. Here is a breakdown of the expression in case you need to modify it. Tread lightly:
( start main capturing group since js regx has no lookbehind.
(?:url(\()|href=|src=)['"]? Start of an url. it would be safer to force url() to be properly quoted so that we can use back reference, but unfortunately your given example is not.
(?:[^'"\/]*\/)* Folder part of the url.
[^'"\/]* Part of the file name that comes before second last dot.
) close main group.
\. This is the second last dot we want to get rid of.
(?= Lookahead.
[^\.'")]* Part of the file name that goes between second last dot and last dot.
\.(?:gif|png|jpe?g) Make sure the url ends in image extension.
['")>}\s] Closing the url, which can be a quote, ')', '>', '}', or spaces. Should use a back reference here if possible. (Was ['"]?\b when first answered)
) End of lookahead.
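If the repeated passes feel awkward, an alternative is a single pass with a replacer function that strips the dots from each matched file name. This is a different technique from the loop above, offered only as a sketch: the pattern is an assumption that works for the sample input, and it will also touch image file names that appear outside src or url() values.

```javascript
// Sample input based on the question.
const htmlIn =
  '<img src="../images/filename.01.png"> ' +
  '<div style="background:url(../images/file.name.with.more.dots.gif)">';

// Group 1: the file name (no slashes, quotes, parentheses, or whitespace).
// Group 2: the image extension, which must be followed by a closing character.
const htmlOut = htmlIn.replace(
  /([^\/'"()\s]+)(\.(?:gif|png|jpe?g))(?=['")\s>])/g,
  (whole, name, ext) => name.replace(/\./g, '') + ext // drop dots in the name only
);

console.log(htmlOut);
```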
Consider using the DOM instead of regular expressions. One way is to create fake elements.
var fake = document.createElement('div');
fake.innerHTML = incomingHTML; // Not really part of JS standard but all the 'main' browsers support it
var background = fake.childNodes[0].style.background;
// Now use a regex if need be: /url\(\"?(.*)\"?\)/
// If img is at childNodes[1]
var url = fake.childNodes[1].src;
With jQuery this is far easier:
$(incomingHTML).find('img').each(function() { $(this).attr('src'); });
Your problem is the greedy match in .*. Maybe try something like this instead:
([^\/]*)(?=(.gif|.png|.jpg|.jpeg))
[^\/] is a character class that matches every character but slashes.
Another point: you need to escape the . to match it literally:
([^\/]*)(?=\.(gif|png|jpg|jpeg))
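A short sketch comparing the two patterns on one of the tags from the question (the variable names are illustrative):

```javascript
const tag = '<img src="../images/filename.01.png">';

// Greedy .* starts matching at the very beginning of the string.
const greedy = tag.match(/(.*)(?=\.(gif|png|jpg|jpeg))/)[1];

// [^\/]* cannot cross a slash, so the match starts after the last "/".
const nameOnly = tag.match(/([^\/]*)(?=\.(gif|png|jpg|jpeg))/)[1];

console.log(greedy);   // <img src="../images/filename.01
console.log(nameOnly); // filename.01
```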
The problem is that . means "any character".
Escape it:
/(.*)(?=(\.gif|\.png|\.jpg|\.jpeg))