I am making a parser of sorts in JavaScript that takes a mathematical expression given to the script as a string, evaluates it, and does some other things with it. If users want to use built-in JavaScript math functions, they have to enter a string such as "1 + Math.log(x)". That becomes very tedious when things get nested, e.g. "Math.abs(Math.log(Math.pow(x, 2))) + Math.log2(x)". As you can see, the "Math." part not only takes longer to write but also makes the expression less readable. I want to remove the need for it. The way I've done it is with a simple regex that has a list of all JavaScript Math constants and methods and simply prepends "Math." to each match. Simple enough:
input = input.replace(/(E|PI|SQRT2|SQRT1_2|LN2|LN10|LOG2E|LOG10E)/g, "Math.$1");
The same thing happens for the methods. This works fine, but as always it's not very flexible and leaves room for misunderstanding: somebody may come along and insist on typing Math.log(x), which will in turn be replaced with Math.Math.log(x), which won't work.
What I want to know is whether there is some way to match any of these predefined strings (constants and methods) so that they are only matched if they don't already have "Math." in front of them. I have tried this:
^(?!Math\.)(log2|log|exp|abs)
but it is quite useless, as it doesn't work with nesting or with multiple operands. Is there any way to do this purely in regex? That would make the entire process more elegant. Any help would be appreciated.
You can use the following trick so that the name gets replaced whether or not it already has the "Math." prefix:
(?:Math\.)?(log2|log|exp|abs|pow)
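And replace with Math.$1. Because an existing "Math." prefix is consumed as part of the match, it is replaced by a single new prefix instead of being doubled.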
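A minimal sketch of that trick wrapped in a function, assuming a short list of names that you would extend as needed. The `\b` word boundaries are an addition, not part of the answer's pattern: without them, a short name such as `E` would also match inside longer identifiers or user variables. `addMathPrefix` is a hypothetical helper name.

```javascript
// Optional "Math." prefix, then one of the known names as a whole word.
// The \b after the group ensures "log2" is never read as "log" followed by "2".
const MATH_NAMES = /(?:\bMath\.)?\b(log2|log10|log|exp|abs|pow|sqrt|PI|E)\b/g;

// Hypothetical helper: writes every known name back with exactly one "Math." prefix,
// whether or not the user already typed it.
function addMathPrefix(expr) {
  return expr.replace(MATH_NAMES, "Math.$1");
}

console.log(addMathPrefix("abs(log(pow(x, 2))) + Math.log2(x)"));
// Math.abs(Math.log(Math.pow(x, 2))) + Math.log2(x)
console.log(addMathPrefix("2 * PI + E"));
// 2 * Math.PI + Math.E
```

Since the optional group is greedy, an existing `Math.` is always swallowed into the match, so running the function twice on the same string gives the same result.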
I am parsing a series of strings with various formats. The latest edge case has me stumped. I'm not great with regular expressions; believe me, it was a challenge to get this far.
Here are critical snippets from the strings I'm trying to parse. The second example is the current edge case I'm stuck on.
LBP824NW2-58.07789x43.0-207C72
LBP824WW1-77.6875 in. x 3.00 in. 24VDC
I am trying to grab all of the digits (including the decimal point) that make up the width part of the dimension in the string (this would be the first number in the dimension). What has worked in every other case is grabbing all digits from the "-" to the "x" using the following expression:
/-(\d+\.?\d+?)x\B/
However, this does not handle the cases that have inches included in the dimension. I thought about using "look-aheads" or "look-behinds", but I got confused. Any suggestions would be appreciated.
RegEx can be told to look for "zero or one" of things, using (...)? syntax, so if your pattern already works but it gets confused by a new pattern that simply has "more string data embedded in what is otherwise the same pattern" you can add in zero-or-one checks and you should be good to go.
In this case, putting something like (\s*in\.?\s*)? in a few tactical places to either match "any number of spaces (including none) followed by in followed by an optional full stop followed by any number of spaces (including none)" or nothing should work.
That said, "I cannot change the formatting" is almost never an argument, because while you can't change the formatting, you can almost always change what parses it. RegEx might be adequate, but code that first checks which general pattern a string follows and then calls the appropriate function for tokenizing and inspecting that specific pattern should be quite possible. Unless you've been hired to literally update some predefined CLI script that has a grep in it and you're not allowed to touch anything except the pattern...
This is the working solution using regex: -(\d+\.?\d+?)(\s*in\.?\s*|x)
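A quick check of that pattern against the two sample strings from the question, using `String.prototype.match` and reading capture group 1 as the width.

```javascript
// The two sample strings from the question
const samples = [
  "LBP824NW2-58.07789x43.0-207C72",
  "LBP824WW1-77.6875 in. x 3.00 in. 24VDC",
];

// Digits after "-", terminated either by "x" or by an "in." unit
const widthPattern = /-(\d+\.?\d+?)(\s*in\.?\s*|x)/;

const widths = samples.map((s) => {
  const m = s.match(widthPattern);
  return m ? m[1] : null; // null when no width is recognized
});

console.log(widths); // widths → ["58.07789", "77.6875"]
```

Note that `\d+\.?\d+?` needs at least two digits, so a single-digit width such as `-5x3` would not match. If that can occur, `-(\d+(?:\.\d+)?)(\s*in\.?\s*|x)` captures the same values for both samples and also accepts `5`.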
I'm working on several documents that are within just a file, and before working on the documents, I need to define where one document begins and ends. For this, I am using the following regex:
MINISTÉRIO\sDO\sTRABALHO\sE\sEMPREGO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)*)PÁG\s:\s\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)*)Z6:\s\d+
It works 100%. The problem is that sometimes the text doesn't come in the format I showed: it comes with extra spaces and line breaks. In the second example, the document is the same as the previous one, but the regular expression does not match. Why doesn't it work, and how can I fix it?
Also, I need to modify the regex, not the text, because the regex is the only part I have access to.
Note: I'm using Node.js, which is why I tagged this post with JS.
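One likely cause, assuming the variant text differs only in extra whitespace: `\s` matches exactly one whitespace character, so `MINISTÉRIO\sDO` fails as soon as two spaces, or a line break plus indentation, separate the words. The check below compares `\s` with `\s+` on just the header phrase; the sample strings are made up for illustration. The same substitution would presumably be needed for the other `\s` tokens in the full pattern (for example `PÁG\s*:\s*` where the spacing around the colon may vary), but that has not been tested against the real documents.

```javascript
// Same header phrase, once with single spaces and once with extra spaces/line breaks
const compact = "MINISTÉRIO DO TRABALHO E EMPREGO";
const spaced = "MINISTÉRIO  DO\n  TRABALHO E\r\n EMPREGO";

// \s matches exactly one whitespace character...
const singleSpace = /MINISTÉRIO\sDO\sTRABALHO\sE\sEMPREGO/;
// ...while \s+ matches one or more (spaces, tabs, \r, \n)
const flexibleSpace = /MINISTÉRIO\s+DO\s+TRABALHO\s+E\s+EMPREGO/;

console.log(singleSpace.test(compact));   // true
console.log(singleSpace.test(spaced));    // false
console.log(flexibleSpace.test(spaced));  // true
console.log(flexibleSpace.test(compact)); // true
```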
First of all, I'm new to Stack Overflow, so I'm sorry if I posted this in the wrong section.
I need a regex that searches within HTML tags and replaces each - with _.
For example:
<TAG-NAME>-100</TAG-NAME>
would become
<TAG_NAME>-100</TAG_NAME>
note that the value inside the tag wasn't affected.
Can anyone help?
Thanks.
Since JavaScript is the language for DOM manipulation, you should generally consider parsing the XML properly and using JavaScript's DOM traversal functions instead of regular expressions.
Parse the XML document first so that you can use the DOM traversal functions. Then you can traverse all elements and change their names. This automatically excludes text nodes, attributes, comments and all the other annoying things you don't want to change.
If it has to be a regex, here is a makeshift solution. Note that it will fail badly if you have tags (or even just >) inside attribute values or comments (in fact, it will also apply the replacement to comments):
str = str.replace(/-(?=[^<>]*>)/g, '_');
This matches a - if it is followed by a > without encountering a < first. The concept is called a positive lookahead. The g modifier makes sure that all occurrences are replaced.
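Applied to the example from the question, the lookahead rewrites the hyphens in both tag names and leaves the one in the text content alone, because that hyphen reaches a `<` before any `>`.

```javascript
const tagSample = "<TAG-NAME>-100</TAG-NAME>";

// Replace a hyphen only when a ">" follows before any "<",
// i.e. when the hyphen sits inside a tag rather than in text content
const renamed = tagSample.replace(/-(?=[^<>]*>)/g, "_");

console.log(renamed); // <TAG_NAME>-100</TAG_NAME>
```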
Note that this will apply the replacement to anything in front of a >, even attribute values. If you don't want that, you can also make sure that there is an even number of quotes between the hyphen and the closing >, like this:
str = str.replace(/-(?=[^<>"]*(?:"[^<>"]*"[^<>"]*)*>)/g, '_');
This will still change attribute names though.
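To see the difference between the two patterns, here is a made-up tag with a hyphenated attribute name and a hyphen inside an attribute value. The quote-aware version leaves `a-b` alone but, as noted, still rewrites the attribute name `data-x`.

```javascript
const attrSample = '<my-tag data-x="a-b">x-y</my-tag>';

// First pattern: any hyphen before a ">" with no "<" in between
const naive = attrSample.replace(/-(?=[^<>]*>)/g, "_");

// Quote-aware pattern: additionally requires an even number of quotes before the ">"
const quoteAware = attrSample.replace(/-(?=[^<>"]*(?:"[^<>"]*"[^<>"]*)*>)/g, "_");

console.log(naive);      // <my_tag data_x="a_b">x-y</my_tag>
console.log(quoteAware); // <my_tag data_x="a-b">x-y</my_tag>
```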
Testing it on regexpal shows what works and what doesn't; the behavior on comments is especially horrible. Of course this could be handled with an even more complex regex, but I guess you can see where this is going? You should really, really use an XML parser!
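str = str.replace(/(<[^<>-]*)-([^<>]*>)/g, '$1_$2');
This replaces the first hyphen inside each tag (opening or closing), leaving the text between tags untouched; a tag name with several hyphens needs the replacement repeated until nothing changes.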
I am not familiar with JS libraries, but I am pretty sure there are better libraries for parsing HTML.
So I'm working on a micro library, html.js, which creates text nodes with document.createTextNode. But when I want to create a text node containing a&nbsp;b, I get the literal text a&nbsp;b instead of a non-breaking space, so I'm wondering how to escape the & character, ideally without using innerHTML.
JavaScript supports the \uXXXX notation, so for a non-breaking space that would be \u00A0.
document.createTextNode('a\u00A0b');
That's as far as you can get. It's a text node, consisting only of text, and there's no difference between texts created from entity references or from normal characters.
If that's not what you want, you should take a second look at innerHTML. Can't you read it, modify it and put it back?
There's not much built-in functionality in JS to encode/decode HTML entities, but there seem to be some libraries out there that can help you achieve this. Here is one I found on Google. I haven't tried it, but you can check it out or look for others.
http://www.strictly-software.com/htmlencode
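If the goal is to turn entity-encoded input into plain text without innerHTML, a small hand-written decoder may be enough. This is a minimal sketch, not a full implementation: `decodeEntities` and its table are hypothetical and cover only numeric references and a few named entities, whereas HTML defines many more.

```javascript
// Hypothetical lookup table: only a handful of the named HTML entities
const NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00A0",
};

// Decode numeric (&#160; / &#xA0;) and known named (&nbsp;) references.
// Unknown names and out-of-range code points are left untouched.
function decodeEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, ref) => {
    if (ref[0] === "#") {
      const isHex = ref[1] === "x" || ref[1] === "X";
      const code = isHex ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, ref)
      ? NAMED_ENTITIES[ref]
      : match;
  });
}

console.log(decodeEntities("a&nbsp;b") === "a\u00A0b"); // true
```

In a browser, `document.createTextNode(decodeEntities("a&nbsp;b"))` then yields the same text node as `document.createTextNode("a\u00A0b")`. The replacement is a single pass, so `&amp;lt;` decodes to the literal text `&lt;` rather than to `<`.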
I need to search the text in an HTML document for regexes (emails, phone numbers, etc.) and words. The matches need to be highlighted and made anchorable so that a link can be generated to jump to the location of each match. So not only does it need to find matches using patterns, it also needs to do a replacement to add the proper HTML code.
I am currently using jQuery, but I am not very happy with the speed. In a 1.5 MB file it takes about 5 seconds to match 2 regexes, and it gets slower as I add more search criteria.
Does anyone know of a fast method to find regex matches in a large document using javascript?
You say you're "using jQuery" but you don't say how. Have you tried a "highlight" plugin (or, as it sounds like you'd need, a derivation of one)? I've used this one: http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html and it doesn't seem slow to me. Again, you'd have to work on it to make it add the markup you need, but that should be pretty clear - it's not very big.
It seems like what you'd want to do for performance is take your regular expressions and combine them into what amounts to a "token grammar". In other words, you don't want to start from scratch looking for each regex individually throughout the entire document. Instead, you'd want to proceed through it with a regex that matches each possible target (one at a time of course), and each time it finds one you'd replace it with whatever's appropriate. That way you could make just one pass over the document, no matter how big it is and no matter how many patterns you're looking for.
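A minimal sketch of that single-pass idea, under several assumptions: it works on a plain-text string (on a real page you would run it over each text node rather than over raw HTML, or tags and attributes would be matched too), the three patterns are placeholders rather than robust email/phone regexes, and `PATTERNS`, `highlight` and the `match-N` ids are hypothetical names. Each pattern becomes a named group in one alternation, so the text is scanned once regardless of how many patterns there are.

```javascript
// Hypothetical placeholder patterns; real email/phone regexes need more care
const PATTERNS = {
  email: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/,
  phone: /\d{3}-\d{3}-\d{4}/,
  word: /\bregex\b/,
};

// Combine every pattern into one alternation of named groups
const combined = new RegExp(
  Object.entries(PATTERNS)
    .map(([name, re]) => `(?<${name}>${re.source})`)
    .join("|"),
  "g"
);

// Escape the characters that matter in HTML text content
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap each match in an anchorable <mark>; ids are numbered per call
function highlight(text) {
  let html = "";
  let last = 0;
  let n = 0;
  for (const m of text.matchAll(combined)) {
    // The one named group that is defined tells us which pattern fired
    const kind = Object.keys(m.groups).find((k) => m.groups[k] !== undefined);
    html += escapeHtml(text.slice(last, m.index));
    html += `<mark id="match-${n++}" class="${kind}">${escapeHtml(m[0])}</mark>`;
    last = m.index + m[0].length;
  }
  return html + escapeHtml(text.slice(last));
}

console.log(highlight("Mail bob@example.com or call 555-123-4567 about regex."));
```

Because each pattern's `source` is pasted into the alternation, the individual patterns lose their own flags and must not define named groups that clash with the pattern names. At any given position the alternatives are tried left to right, so more specific patterns should come first.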
Edit: Mr. Burkard's plugin doesn't let you search with regexes; it uses indexOf internally. Hmm.