regular expression to match specific text not linked - javascript

I would like to write a regular expression in JavaScript to match specific text only when it is not part of an HTML link, i.e.
<a href="...">match text</a>
would not be matched, but
match text
or
<p>match text</p>
would be matched.
(The "match text" will change each time the search is run - I will use something like
var tmpStr = new RegExp("\\bmatch text\\b","g");
where the value of "match text" is read from a database.)
So far my best effort at a regular expression is
\bmatch text\b(?!</a>)
This deals with the closing </a>, but not the opening <a href=...>. This will probably work fine for my purposes, but it does not seem ideal. I'd appreciate any help with refining the regular expression.

You can use a negative look-behind to get the opening <a href=...:
var tmpStr = new RegExp('(?<!<a.*?>)match text(?!</a>)');
Hope that works for you.

Thanks for the very quick and helpful answers. Just to clarify, the regular expression I ended up using was
(?!<a.*?>)\bmatch text\b(?!</a>)
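As a minimal sketch of building such a pattern from a database value: the helper names escapeRegExp and buildUnlinkedMatcher are hypothetical and not from the posts above, and the sketch assumes the stored text may contain regex metacharacters that should be matched literally. Note the doubled backslashes, which are needed when \b is written inside a string passed to new RegExp. It relies only on the closing-tag lookahead, so it will not catch text in the middle of a longer link.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so the database value is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: match `phrase` as a whole word unless it is
// immediately followed by a closing </a> tag.
function buildUnlinkedMatcher(phrase) {
  // "\\b" in a string literal becomes \b (word boundary) in the pattern;
  // a single "\b" would be a backspace character instead.
  return new RegExp('\\b' + escapeRegExp(phrase) + '\\b(?!</a>)', 'g');
}

const sample = '<a href="#">match text</a>, match text, <p>match text</p>';
console.log(sample.match(buildUnlinkedMatcher('match text')).length); // logs 2
```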

Related

Regex that matches everything except given full word

This is similar to "javascript regular expression to not match a word", but the accepted answer there shows a regex that fails to match if the word appears anywhere inside the line. I want to match everything except the full word, so it should still match "__lambda__".
This is also similar to my question "Regex that match everything except the list of strings", but that one gives a solution with String::split and I want a normal match against the full string.
For example, it should match everything except /^lambda$/. I would prefer to have a regex for this, since it will fit my pattern matching function. It should match lambda_ or _lambda and everything else, except the full "lambda" token.
I've tried this:
/^(.(?!lambda))+$/
and
/^((?!lambda).)+$/
this works
/^(.*(?<!lambda))$/
but I would prefer not to use a negative lookbehind if I can avoid it (because of browser support). Also, I have an interpreter written in JavaScript that I need this for, and I would like to use this in the guest language (I will not be able to compile the regex with babel).
Is something like this possible with regex without lookbehind?
EDIT:
THIS IS NOT A DUPLICATE: I don't want to test whether something contains or doesn't contain the word, but whether something is not the exact word. There is no question on Stack Overflow like that.
The question "Regex: Match word not containing" has an answer that is almost the same as mine, but it does not fully answer this question. (It would help in finding the solution, though.)
I was able to find the solution based on "How to negate specific word in regex?":
var re = /^(?!.*\blambda\b).*$/;
var words = ['lambda', '_lambda', 'lambda_', '_lambda_', 'anything'];
words.forEach(word => {
console.log({word, match: word.match(re)});
});
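One caveat with the \b version above: it also rejects longer inputs that merely contain lambda as a separate word, such as "lambda x". If the goal is to reject only the exact string, a lookahead anchored to the end is enough. This is a sketch under the assumption that "full token" means the whole input; the variable names are illustrative only.

```javascript
// Reject only the exact string "lambda"; everything else matches.
// The lookahead (?!lambda$) fails only when the whole input is "lambda".
const notExactlyLambda = /^(?!lambda$).*$/;

const tokens = ['lambda', '_lambda', 'lambda_', '__lambda__', 'lambda x', 'anything'];
const results = tokens.map(token => ({ token, match: notExactlyLambda.test(token) }));

results.forEach(result => console.log(result));
```

No lookbehind is involved, so this should also work in engines that lack lookbehind support.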

How to match everything except specified characters and strings? Regex

I am building a graph drawer and currently working on the math expression parser. I'm done with most parts but I'm stuck at clearing the input text before parsing it. What I'm trying to achieve now is getting rid of unpermitted characters.
For example, in this text:
5ax+4asxxv+sdflog10aloga(132*43)sin(132)
I want to match everything that is not +,-,*,/,^,(,),ln,log,sin,cos,tan,cot,arcsin,arccos,...
and replace them with "".
so that the output is
5x+4xx+log10log(132*43)sin(132)
I need help with the regex.
Spaces don't matter since I clear them out beforehand.
A little bit tricky - at least I couldn't think of a simple way to do what you ask. The regex would get monstrous.
So I did it the other way around - match what you want to keep, and put it back together.
The regex:
[\d+*/^()x-]|ln|log|(?:arc)?(?:sin|cos)|tan|cot
The code:
var re = /[\d+*/^()x-]|ln|log|(?:arc)?(?:sin|cos)|tan|cot/g,
text = '5ax+4asxxv+sdflog10aloga(132*43)sin(132)arccos(1)';
console.log(text.match(re).join(''));

javascript regex to match all custom commented out sections, but not other text

Say I have an HTML doc like so:
<!--FOO-->
some text
<!--BAR-->
some other text
<!--FOO-->
some more text
<!--BAR-->
How can I write a JavaScript regex that matches both occurrences of
<!--FOO-->anytext<!--BAR-->
but not the text in between ('some other text' in this case).
My regex that I thought would work is
/<!--FOO-->(.|\n)*<!--BAR-->/
but it catches the 'some other text' as well.
You need the non-greedy operator ?, like this:
/<!--FOO-->(.|\n)*?<!--BAR-->/
A slightly better version would be this, letting you actually capture the text between the comments:
/<!--FOO-->((?:\n|.)*?)<!--BAR-->/
That said, parsing HTML with regex rarely ends well... See here for the classic explanation of the problem. You are better off using a library, unless your parsing is limited to the very simple case in your question.
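For the simple case in the question, the non-greedy capture can be combined with the g flag to collect every section in one pass. A minimal sketch, assuming the document is available as a string (the doc variable below is a hypothetical stand-in) and that the runtime supports String.prototype.matchAll; [\s\S] is used here as a common alternative to (?:\n|.) for "any character including newlines".

```javascript
// Hypothetical stand-in for the HTML document from the question.
const doc = [
  '<!--FOO-->',
  'some text',
  '<!--BAR-->',
  'some other text',
  '<!--FOO-->',
  'some more text',
  '<!--BAR-->',
].join('\n');

// Lazy quantifier: stop at the first <!--BAR--> after each <!--FOO-->.
const sectionRegex = /<!--FOO-->([\s\S]*?)<!--BAR-->/g;

// Collect the captured text of every section, trimmed of surrounding newlines.
const sections = Array.from(doc.matchAll(sectionRegex), m => m[1].trim());

console.log(sections); // logs [ 'some text', 'some more text' ]
```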

Confused with Regex JS pattern

ok i do have this following data in my div
<div id="mydiv">
<!--
what is your present
<code>alert("this is my present");</code>
where?
<code>alert("here at my left hand");</code>
oh thank you! i love you!! hehe
<code>alert("welcome my honey ^^");</code>
-->
</div>
What I need to do is get all the scripts inside the <code> blocks, and the HTML text outside them, without removing the HTML comment in the div. It's homework given by my professor and I can't modify that div block.
I need to use regular expressions for this and this is what i did
var block = $.trim($("div#mydiv").html()).replace("<!--","").replace("-->","");
var htmlRegex = new RegExp(""); //I don't know what to do here
var codeRegex = new RegExp("^<code(*n)</code>$","igm");
var code = codeRegex.exec(block);
var html = "";
it really doesn't work... please don't give the exact answer.. please teach me.. thank you
I need to have the following blocks for the variable code
alert("this is my present");
alert("here at my left hand");
alert("welcome my honey ^^");
and this is the blocks i need for variable html
what is your present
where?
oh thank you! i love you!! hehe
my question is what is the regex pattern to get the results above?
Parsing HTML with a regular expression is not something you should do.
I'm sure your professor thinks this was really clever: that there's no way to access the DOM API here, and that some minor corner case justifies using regex to parse the DOM because sometimes it's okay.
Well, no, it isn't. If you have complex code in there, what happens? Your regex breaks, and perhaps becomes a security exploit if this is ever in production.
So, here:
http://jsfiddle.net/zfp6D/
Walk the dom, get the nodeType 8 (comment) text value out of the node.
Invoke the HTML parser (that thing that browsers use to parse HTML, rather than regex, why you wouldn't use the HTML parser to parse HTML is totally beyond me, it's like saying "Yeah, I could nail in this nail with a hammer, but I think I'm going to just stomp on the nail with my foot until it goes in").
Find all the CODE elements in the newly parsed HTML.
Log them to console, or whatever you want to do with them.
First of all, you should be aware that because HTML is not a regular language, you cannot do generic parsing using regular expressions that will work for all valid inputs (generic nesting in particular cannot be expressed with regular expressions). Many parsers do use regular expressions to match individual tokens, but other algorithms need to be built around them.
However, for a fixed input such as this, it's just a case of working through the structure you have (though it's still often easier to use different parsing methods than just regular expressions).
First, let's get all the code:
var code = '', match = [];
var regex = new RegExp("<code>(.*?)</code>", "g");
while (match = regex.exec(content)) {
code += match[1] + "\n";
}
I assume content contains the content of the div that you've already extracted. Here the "g" flag says this is for "global" matching, so we can reuse the regex to find every match. The brackets indicate a capturing group, . means any character, * means repeated 0 or more times, and ? means "non-greedy" (see what happens without it to see what it does).
Now we can do a similar thing to get all the other bits, but this time the regex is slightly more complicated:
new RegExp("(<!--|</code>)(.*?)(-->|<code>)", "g")
Here | means "or". So this matches all the bits that start with either "start comment" or "end code" and end with "end comment" or "start code". Note also that we now have 3 sets of brackets, so the part we want to extract is match[2] (the second set).
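Putting the two patterns together for the fixed input from the question might look like the sketch below. It is not the answerer's exact code: since . does not match newlines in JavaScript and the sample spans several lines, it swaps in [\s\S]*? for .*?, and it uses a lookahead for the closing delimiter so that only one capture group is needed. The content string and the extractParts name are assumptions for illustration.

```javascript
// Hypothetical input: the inner HTML of the div from the question, as a string.
const content = [
  '<!--',
  'what is your present',
  '<code>alert("this is my present");</code>',
  'where?',
  '<code>alert("here at my left hand");</code>',
  'oh thank you! i love you!! hehe',
  '<code>alert("welcome my honey ^^");</code>',
  '-->',
].join('\n');

function extractParts(text) {
  // Everything between <code> and </code>, non-greedy.
  const codeRegex = /<code>([\s\S]*?)<\/code>/g;
  // Everything that starts after "<!--" or "</code>" and runs up to
  // (but does not consume) the next "-->" or "<code>".
  const textRegex = /(?:<!--|<\/code>)([\s\S]*?)(?=-->|<code>)/g;

  const code = Array.from(text.matchAll(codeRegex), m => m[1]);
  const html = Array.from(text.matchAll(textRegex), m => m[1].trim())
    .filter(piece => piece.length > 0); // drop whitespace-only gaps
  return { code, html };
}

const parts = extractParts(content);
console.log(parts.code);
console.log(parts.html);
```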
You're doing a lot of unnecessary stuff. .html() gives you the inner contents as a string. You should be able to use regEx to grab exactly what you need from there. Also, try to stick with regEx literals (e.g. /^regexstring$/). You have to escape escape characters using new RegExp which gets really messy. You generally only want to use new RegExp when you need to put a string var into a regEx.
The match function of strings accepts regEx and returns a collection of every match when you add the global flag (e.g. /^regexstring$/g <-- note the 'g'). I would do something like this:
var block = $('#mydiv').html(), //you can set multiple vars in one statement w/commas
matches = block.match(/<code>[^<]*<\/code>/g) || []; //match() returns null when nothing matches
//[^<]* <-- 0 or more characters that aren't '<' - google 'negative character class'
matches = matches.join('_') //lazy way of avoiding a loop - join into a string with a character assumed not to occur in the code
.replace(/<\/*code>/g,'') //\/* 0 or more forward slashes
.split('_');//turn the joined string back into an array; assigned to matches so the result is kept
//Now do what you want with matches. Eval (ew) or append in a script tag (ew).
//You have no control over the 'ew'. I just prefer data to scripts in strings

regular expressions - finding the position of a number and removing brackets around it

I'm stuck. I tried it with regular expressions, but I guess I'm missing something. I'm working with JavaScript.
I have an input like:
(text [number]) the text that follows...
I want an output like:
[number] the text that follows...
I tried it with substr, but my problem is that I do not know the length of the text or number in the brackets. I guess I need the position of the beginning and ending of the number to work with a regEx.
Have you got an idea?
Regexes are the way to go — using JavaScript’s replace function, you don’t need to fiddle with the position of the number in the string.
Try this:
var geoff = '(text 694) the text that follows...';
var geoff_replaced = geoff.replace(/\([^0-9]* ([0-9]*)\)/, '$1');
// geoff_replaced will be "694 the text that follows..."
I don’t do much JavaScript regex stuff, so I totally looked up the above on this guide to JavaScript regexes:
http://www.evolt.org/node/36435
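The replace-based approach above can be wrapped in a small helper. This is a sketch only: the function name unwrapLeadingNumber is hypothetical, and it assumes the parenthesised group sits at the very start of the input and contains at least one digit just before the closing parenthesis.

```javascript
// Replace a leading "(some text 123)" group with just the number.
// Input that does not start with such a group is returned unchanged.
function unwrapLeadingNumber(input) {
  // \(         opening parenthesis at the start of the string
  // [^0-9)]*   any text that is neither a digit nor a closing parenthesis
  // ([0-9]+)   the number, captured as $1
  // \)         closing parenthesis
  return input.replace(/^\([^0-9)]*([0-9]+)\)/, '$1');
}

console.log(unwrapLeadingNumber('(text 694) the text that follows...'));
```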
It'd help to have a real example but I made one up...
Text:
(Some text 1234) some more text.
Regex:
^.+?(?<Number>\d+)\)(?<Text>.+)$
Replacement (JavaScript refers to named groups as $<Name> in the replacement string):
$<Number>$<Text>
Full example:
var fixedText = "(Some text 1234) some more text.".replace(/^.+?(?<Number>\d+)\)(?<Text>.+)$/, "$<Number>$<Text>");
The regex that matches (text [number]) the text that follows... can be something like:
"^\(.*?([0-9]*)\)(.*)$"
or you can just match the opening ( and the closing ) and remove them:
"^(\(.*?)[0-9]*(\)).*$"
