Can't get the correct regex - JavaScript

Getting this regex right is driving me crazy. Can anyone help? Much appreciated.
Source string:
<checklist><checklist class="ng-scope">it can be any content but no more "checklist tag" pair inside</checklist></checklist>
<checklist><checklist class="ng-scope">it can be any content but no more "checklist tag" pair inside</checklist></checklist>
Result string needed:
<checklist></checklist>
<checklist></checklist>
Basically, I need to remove everything between the outer pair of tags (the one without a class attribute).
I tried a regex something like this:
"/[^(.?)[^]*/g" (I'm editing on a phone; if it doesn't display correctly here, please see the regex I included in the comment)
It didn't work. I am fairly new to regex.
The following code snippet can repeat multiple times in the source string:
<checklist><checklist class="ng-scope">it can be any content but no more "checklist tag" pair inside</checklist></checklist>

If you insist on a solution with regular expressions, you could do something like:
var string = '<checklist><checklist class="ng-scope">it can be any content but no more "checklist tag" pair inside</checklist></checklist>';
var regex = /<checklist\s+[^>]+>.*?<\/checklist>/gi;
// that is, look for a checklist tag with additional attributes
// match everything up to a new closing tag (non-greedy)
// followed by a closing tag
var strippedString = string.replace(regex, '');
alert(strippedString);
See a JS fiddle here and a regex101 demo here.
EDIT: Added /g as @Atri pointed out.
Otherwise, consider using document.getElementById or some other DOM function instead.
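To see why the /g flag matters when the snippet repeats in the source string, here is a small sketch that runs the same pattern with and without it. The `source` string is built from the question's snippet purely for illustration, and the pattern assumes each inner tag and its content sit on one line, since . does not match line breaks.

```javascript
// The snippet from the question; the source string can contain it many times.
const snippet = '<checklist><checklist class="ng-scope">it can be any content but no more "checklist tag" pair inside</checklist></checklist>';
const source = snippet + "\n" + snippet;

// An opening checklist tag WITH attributes, lazily followed by the first closing tag.
const withG = /<checklist\s+[^>]+>.*?<\/checklist>/gi;
const withoutG = /<checklist\s+[^>]+>.*?<\/checklist>/i;

const allStripped = source.replace(withG, "");      // every occurrence removed
const firstStripped = source.replace(withoutG, ""); // only the first occurrence removed

console.log(allStripped);
console.log(firstStripped);
```

Without the flag, only the first inner pair is removed and the second copy of the snippet is left untouched.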

Related

Replace everything after last character in URL

I have the following code which replaces the current URL using JavaScript:
window.location.replace(window.location.href.replace(/\/?$/, '#/view-0'));
However, if I have a URL like
domain.com/#/test or domain.com/#/
it will append the #/view-0 to the current hash. What I want is to replace EVERYTHING after the last part of the URL, including any query strings or hashes.
So I presume my regex doesn't handle that. How can I amend it to be more aggressive?
The following syntax may help:
location.href.replace(/[?#].*$/, '#/view')
It replaces the first ? or # in the string, together with everything after it, with #/view. (If the URL contains neither character, the string is returned unchanged.)
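A minimal sketch of the same idea as a reusable helper. `withHash` is a hypothetical name, not part of the answer: it strips everything from the first ? or # and then appends the new hash separately, so a URL that has no query string or hash yet also gets the hash.

```javascript
// Drop everything from the first "?" or "#" onward, then append the new hash.
// Appending separately also covers URLs without a query string or hash,
// which a single replace(/[?#].*$/, '#/view') would leave unchanged.
function withHash(url, hash) {
  return url.replace(/[?#].*$/, "") + hash;
}

console.log(withHash("domain.com/#/test", "#/view-0"));
console.log(withHash("domain.com/#/", "#/view-0"));
console.log(withHash("domain.com/page?x=1#/a", "#/view-0"));
console.log(withHash("domain.com/", "#/view-0"));
```

In the browser this would be called as window.location.replace(withHash(window.location.href, '#/view-0')).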
(^[^\/]*?\/)(?:.*)
Use this, and replace with \1 followed by your string (in JavaScript's replace, write the backreference as $1).
See demo.
http://regex101.com/r/sA7pZ0/28

Selecting and wrapping all occurrences of a certain string with jQuery

In a list of footnote references for an article, I want to select all occurrences of "(en)" and wrap them in some element so that I can apply bold style to them as well as a right margin.
How can I do that with jQuery?
Let's say you can get all your data into a string:
var myString = $('#footnotes').html();
Since you don't need a regex, you can split into an array and rejoin, e.g.:
var newString = myString.split("(en)").join("<span class='en-element'>(en)</span>");
$('#footnotes').html(newString);
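The split/join step itself is plain string handling, so it can be tried outside jQuery. In this sketch `wrapEn` and the sample footnote text are hypothetical; the span markup is the one from the answer.

```javascript
// Wrap every literal "(en)" in a span by splitting on it and joining the pieces back.
function wrapEn(html) {
  return html.split("(en)").join("<span class='en-element'>(en)</span>");
}

const footnotes = "1. Some source (en) 2. Another source (en)";
console.log(wrapEn(footnotes));
```

On engines that support ES2021, html.replaceAll("(en)", "<span class='en-element'>(en)</span>") does the same thing in one call.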
Use a regex to find the "(en)" occurrences, then use string.replace to replace them with the content you want.
I know when to use regex but am not too familiar with its syntax, so sorry for not providing the exact code. Take a look at this post, in which they tried to do the same thing, finding and replacing with \n:
jQuery javascript regex Replace <br> with \n
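For the regex route, a minimal sketch (the helper name is hypothetical): the parentheses have to be escaped because they are special in a regex, and $& in the replacement string inserts the whole matched text.

```javascript
// "(" and ")" are special in a regex, so they must be escaped to match literally.
// In the replacement string, $& stands for the whole matched text.
function wrapEnRegex(html) {
  return html.replace(/\(en\)/g, "<span class='en-element'>$&</span>");
}

console.log(wrapEnRegex("1. Some source (en) 2. Another source (en)"));
```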

Confused with Regex JS pattern

OK, I have the following data in my div:
<div id="mydiv">
<!--
what is your present
<code>alert("this is my present");</code>
where?
<code>alert("here at my left hand");</code>
oh thank you! i love you!! hehe
<code>alert("welcome my honey ^^");</code>
-->
</div>
What I need to do is get all the scripts inside the <code> blocks, plus the text nodes outside them, without removing the HTML comment. It's homework given by my professor, and I can't modify that div block.
I need to use regular expressions for this, and this is what I did:
var block = $.trim($("div#mydiv").html()).replace("<!--","").replace("-->","");
var htmlRegex = new RegExp(""); //I don't know what to do here
var codeRegex = new RegExp("^<code(*n)</code>$","igm");
var code = codeRegex.exec(block);
var html = "";
It really doesn't work... Please don't give me the exact answer; please teach me. Thank you.
I need the variable code to contain the following blocks:
alert("this is my present");
alert("here at my left hand");
alert("welcome my honey ^^");
And these are the blocks I need for the variable html:
what is your present
where?
oh thank you! i love you!! hehe
My question is: what regex pattern gets the results above?
Parsing HTML with a regular expression is not something you should do.
I'm sure your professor thinks they're really clever, that there's no way to access the DOM API, and that they can wave a banner around justifying some minor corner case where using regex to parse the DOM is okay.
Well, no, it isn't. If you have complex code in there, what happens? Your regex breaks, and perhaps becomes a security exploit if this is ever in production.
So, here:
http://jsfiddle.net/zfp6D/
Walk the DOM and get the text value out of the nodeType 8 (comment) node.
Invoke the HTML parser (the thing browsers use to parse HTML, rather than regex). Why you wouldn't use the HTML parser to parse HTML is totally beyond me; it's like saying "Yeah, I could drive this nail in with a hammer, but I think I'm just going to stomp on it with my foot until it goes in".
Find all the CODE elements in the newly parsed HTML.
Log them to console, or whatever you want to do with them.
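The steps above can be sketched roughly as follows. This is not the code from the linked fiddle; both function names are hypothetical, and extractCodeFromComments relies on the browser's DOMParser, so it only runs in a browser. collectCommentText only reads childNodes, nodeType and nodeValue, so it can also be exercised with plain objects.

```javascript
// Node.COMMENT_NODE is 8 in the DOM; the literal keeps this usable outside a browser.
const COMMENT_NODE = 8;

// Step 1: recursively collect the text of every comment node under `node`.
function collectCommentText(node, out = []) {
  for (const child of node.childNodes) {
    if (child.nodeType === COMMENT_NODE) {
      out.push(child.nodeValue);
    } else if (child.childNodes && child.childNodes.length) {
      collectCommentText(child, out);
    }
  }
  return out;
}

// Steps 2 and 3 (browser only): parse each comment's text with the real HTML
// parser and return the text of every <code> element found in it.
function extractCodeFromComments(root) {
  const parser = new DOMParser();
  const snippets = [];
  for (const text of collectCommentText(root)) {
    const doc = parser.parseFromString(text, "text/html");
    for (const el of doc.querySelectorAll("code")) {
      snippets.push(el.textContent);
    }
  }
  return snippets;
}
```

In the page from the question, step 4 would then be something like console.log(extractCodeFromComments(document.getElementById("mydiv"))).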
First of all, you should be aware that because HTML is not a regular language, you cannot do generic parsing using regular expressions that will work for all valid inputs (generic nesting in particular cannot be expressed with regular expressions). Many parsers do use regular expressions to match individual tokens, but other algorithms need to be built around them.
However, for a fixed input such as this, it's just a case of working through the structure you have (though it's still often easier to use different parsing methods than just regular expressions).
First, let's get all the code:
var code = '', match = [];
var regex = new RegExp("<code>(.*?)</code>", "g");
while (match = regex.exec(content)) {
code += match[1] + "\n";
}
I assume content contains the content of the div that you've already extracted. Here the "g" flag says this is for "global" matching, so we can reuse the regex to find every match. The brackets indicate a capturing group, . means any character, * means repeated 0 or more times, and ? means "non-greedy" (see what happens without it to see what it does).
Now we can do a similar thing to get all the other bits, but this time the regex is slightly more complicated:
new RegExp("(<!--|</code>)(.*?)(-->|<code>)", "g")
Here | means "or". So this matches all the bits that start with either "start comment" or "end code" and end with "end comment" or "start code". Note also that we now have 3 sets of brackets, so the part we want to extract is match[2] (the second set).
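Putting both patterns together on the div from the question gives something like the sketch below. It is only a sketch: content is hard-coded here in place of the extracted div contents, collectGroup is a hypothetical helper, and the second pattern uses [\s\S]*? instead of .*? because . does not match line breaks and the text between the tags spans several lines.

```javascript
// Hypothetical sample standing in for the div contents already extracted into `content`.
const content = [
  "<!--",
  "what is your present",
  '<code>alert("this is my present");</code>',
  "where?",
  '<code>alert("here at my left hand");</code>',
  "oh thank you! i love you!! hehe",
  '<code>alert("welcome my honey ^^");</code>',
  "-->",
].join("\n");

// Run a global regex with exec() until it stops matching and collect one capture group.
function collectGroup(regex, text, group) {
  const results = [];
  let match;
  regex.lastIndex = 0; // start from the beginning even if the regex was used before
  while ((match = regex.exec(text)) !== null) {
    results.push(match[group]);
  }
  return results;
}

// Group 1: the text between <code> and </code> (each snippet sits on one line).
const code = collectGroup(/<code>(.*?)<\/code>/g, content, 1);

// Group 2: the text between a boundary (<!-- or </code>) and the next one (--> or <code>).
const html = collectGroup(/(<!--|<\/code>)([\s\S]*?)(-->|<code>)/g, content, 2)
  .map((s) => s.trim())
  .filter((s) => s.length > 0); // drops the empty piece between the last </code> and -->

console.log(code);
console.log(html);
```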
You're doing a lot of unnecessary stuff. .html() gives you the inner contents as a string, and you should be able to use a regex to grab exactly what you need from there. Also, try to stick with regex literals (e.g. /^regexstring$/). With new RegExp you have to escape the escape characters themselves (e.g. "\\d" in the string for \d in the pattern), which gets really messy. You generally only want to use new RegExp when you need to put a string variable into a regex.
The string match method accepts a regex and, when you add the global flag (e.g. /^regexstring$/g, note the 'g'), returns an array of every match. I would do something like this:
var block = $('#mydiv').html(), //you can set multiple vars in one statement w/commas
matches = block.match(/<code>[^<]*<\/code>/g);
//[^<]* <-- 0 or more characters that aren't '<' - google 'negative character class'
matches = matches.join('_') //lazy way of avoiding a loop: join into one string, assuming '_' never occurs in the code
.replace(/<\/*code>/g,'') //\/* 0 or more forward slashes
.split('_'); //turn the string back into an array and assign it back to matches
//Now do what you want with matches. Eval (ew) or append in a script tag (ew).
//You have no control over the 'ew'. I just prefer data to scripts in strings
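A variant that strips the tags from each match with map() instead of the join/split round trip might look like this. The sample block is hypothetical; it deliberately contains an underscore, which the join('_')/split('_') trick would split into two pieces.

```javascript
// Hypothetical sample standing in for $('#mydiv').html().
const block = '<!--\nwhere?\n<code>alert("a_b");</code>\n<code>alert(1);</code>\n-->';

// With the g flag, match() returns an array of every full match (or null if none).
const matches = block.match(/<code>[^<]*<\/code>/g) || [];

// Strip the tags from each match individually; no separator character is needed.
const snippets = matches.map((m) => m.replace(/<\/?code>/g, ""));

console.log(snippets);
```

Like the original, [^<]* assumes the code itself never contains a '<' character.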

match text between two html custom tags but not other custom tags

I have something like the following:
<--customMarker>Test1<--/customMarker>
<--customMarker key='myKEY'>Test2<--/customMarker>
<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>
I need to be able to replace the text between the customMarker tags. I tried the following:
str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, 'item Replaced')
This works OK. I would also like to ignore the custom inner tags and not match or replace them with text.
Also, I need a separate expression to extract the value of the attribute key='myKEY' from the tag with Test2.
Many thanks
EDIT
Actually, I am trying to find things between comment tags, but the comment tags were not displaying correctly, so I had to remove the '!'. There's a unique situation that required comment tags. In any case, if anyone knows enough regex to help, it would be great. Thank you.
In the end, I did something like the following (in case anyone else needs this; enjoy! But note: word about town is that using regex with HTML tags is not ideal, so do your own research and make up your mind. For me, it had to be done this way, mostly because I wanted to, but also because it simplified the job in this instance):
var retVal = str.replace(/<--customMarker[^>]*>(.*?)<--\/customMarker>/g, function(token, match){
//question 2: Also I need a separate expression to extract the value of the attribute key='myKEY' from the tag with Test2.
//answer (this must run before the return below, otherwise it is never reached):
var attrPattern = /\w+\s*=\s*(['"]).*?\1/g; //name='value' or name="value"
var attrMatches = token.match(attrPattern);//returns a list of attributes as name/value pairs in an array, or null
//question 1: I would like to also ignore custom inner tags and not match or replace them with text.
//answer:
var replacePattern = /<--customInnerMarker*?(.*?)<--\/customInnerMarker-->/g;
//remove inner tags from match
match = $.trim(match.replace(replacePattern, ''));
//replace what is left with a required value (objParams is defined elsewhere) and return it
return token.replace(match, objParams[match]);
});
Can't you use <customMarker> instead? Then you can just use getElementsByTagName('customMarker') and get the inner text and child elements from it.
A regex merely matches an item. Once you have said match, it is up to you what you do with it. This is part of the problem most people have with using regular expressions: they try to combine the three different steps. The regex match is just the first step.
What you are asking for will not be possible with a single regex. You're going to need a mini state machine if you want to use regular expressions. That is, a logic wrapper around the matches such that it moves through each logical portion.
I would advise you to look in the standard API for a prebuilt engine to parse HTML rather than rolling your own. If you do need to roll your own, read the flex manual to get a basic understanding of how regular expressions work and the state machines you build with them. The best example would be the section on matching multiline C comments.
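One possible "logic wrapper around the matches", sketched with the question's <--tag> notation: the regex finds each outer marker, and a small callback splits its body so that only the text outside inner markers is replaced. The function names and sample input are hypothetical, and the patterns assume a marker is never nested inside another marker of the same name.

```javascript
const str = [
  "<--customMarker>Test1<--/customMarker>",
  "<--customMarker key='myKEY'>Test2<--/customMarker>",
  "<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>",
].join("\n");

// Outer marker: opening tag (with or without attributes), body, closing tag.
const outerPattern = /(<--customMarker\b[^>]*>)([\s\S]*?)(<--\/customMarker>)/g;
// A whole inner marker block; the single capturing group makes split() keep it.
const innerPattern = /(<--customInnerMarker\b[^>]*>[\s\S]*?<--\/customInnerMarker>)/;

// Replace the text of every outer marker, leaving inner marker blocks untouched.
function replaceMarkerText(input, replacement) {
  return input.replace(outerPattern, (whole, open, body, close) => {
    const parts = body.split(innerPattern); // text, inner, text, inner, ...
    const replaced = parts.map((part, i) => {
      if (i % 2 === 1) return part; // odd indexes are inner marker blocks
      const text = part.trim();
      // Keep surrounding whitespace; a function avoids "$" surprises in the replacement.
      return text ? part.replace(text, () => replacement) : part;
    });
    return open + replaced.join("") + close;
  });
}

// Value of key='...' or key="..." on a customMarker opening tag, or null.
function getMarkerKey(input) {
  const m = input.match(/<--customMarker\b[^>]*?\bkey\s*=\s*(['"])(.*?)\1/);
  return m ? m[2] : null;
}

console.log(replaceMarkerText(str, "item Replaced"));
console.log(getMarkerKey(str));
```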

Referencing nested groups in JavaScript using string replace using regex

Because of the way that jQuery deals with script tags, I've found it necessary to do some HTML manipulation using regular expressions (yes, I know... not the ideal tool for the job). Unfortunately, it seems like my understanding of how captured groups work in JavaScript is flawed, because when I try this:
var scriptTagFormat = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;" title="$2">$3</span>');
The script tags get replaced with the spans, but the resulting title attribute is blank. Shouldn't $2 match the content of the src attribute of a script tag?
Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex. In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.
Try this:
/<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
As stema wrote, the .*? matches too much. With the negative lookahead (?:(?!src).)* you will match only until a src attribute.
But actually in this case you could also just move the .*? into the optional part:
/<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
The .*? matches too much because the following group is optional, so your src attribute gets consumed by one of the surrounding .*?. If you remove the ? after your first group, it works.
Update: As @morja pointed out, your solution is to move the first .*? into the optional src part.
Just for completeness: /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig
You can see it here on rubular (corrected my link also)
If you don't want to use the content of the first capturing group, then make it a non-capturing group using (?:)
/<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig
Then your wanted result is in $1 and $2.
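The numbering rule can be checked directly. In this sketch the sample tag is hypothetical: with the nested version, groups are numbered by their opening parentheses, and once the outer src group becomes non-capturing, the value and the body move to $1 and $2. A tag without a src attribute leaves $1 unmatched, and an unmatched group is substituted as an empty string.

```javascript
const tag = '<script type="text/javascript" src="app.js">init();</script>';

// Groups are numbered by the position of their opening "(":
// 1 = src="...", 2 = the src value, 3 = the script body.
const nested = /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/i;
const m = tag.match(nested);
console.log(m[1], m[2], m[3]);

// Making the outer group non-capturing shifts the numbers down.
const flat = /<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/i;
const replaced = tag.replace(flat, '<span title="$1">$2</span>');
const noSrc = '<script type="a">x();</script>'.replace(flat, '<span title="$1">$2</span>');

console.log(replaced);
console.log(noSrc);
```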
Could you post the html you are retrieving? Your code works fine in a simple example: jsfiddle (warning: alert box)
My first guess is that one of your script tags does not have a src meaning you are left with a single capture group (the script contents).
I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem:
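// one group for the whole attribute list, one for the body; [\s\S]*? also spans line breaks
// (a repeated group such as ((.*?)="(.*?)")* only keeps its last iteration in $1)
var scriptTagFormat = /<script\b([^>]*)>([\s\S]*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;"$1>$2</span>');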
Before, I wanted to avoid setting non-standard attributes on the replacement span. This code blindly copies all attributes instead. Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.
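A quick check of why a single group for the attribute list is needed if every attribute should be copied: a repeated capturing group only remembers its last iteration. The sample markup and the toPlaceholder helper are hypothetical, and [^>]* assumes no attribute value contains a '>' character.

```javascript
const html = '<script type="text/javascript" src="app.js">init();</script>';

// A repeated capturing group only remembers its LAST iteration.
const repeated = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/i;
const lastOnly = html.match(repeated)[1];

// One group for the whole attribute list keeps every attribute;
// [\s\S]*? lets the script body span several lines.
const whole = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;

function toPlaceholder(source) {
  return source.replace(whole, '<span class="script-placeholder" style="display:none;"$1>$2</span>');
}

console.log(lastOnly);
console.log(toPlaceholder(html));
console.log(toPlaceholder("<script>\nvar a = 1;\n</script>"));
```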
