Regex: Find string unless part of another string - javascript

I'm trying to figure out a regex pattern to parse a CSS file, looking for any instances of $, unless it's part of an attribute selector ($=, like in [attr$=foo]).
In other words, I'm looking for a way to find a string unless it's followed by another string. Not sure how to do that.
The script will run on node.js, v8.9.1 w/o flags, so I don't think I have Lookbehind.
Thanks.

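You can try this:
str.match(/\$(?!=)/g);
This returns every "$" that is not followed by "=". The (?!=) part is a negative lookahead, which Node 8 supports even though it lacks lookbehind. (An optional class such as [^=]? would not help, because it can match nothing and so still accepts the "$" in "$=".)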
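To make the "not followed by" idea concrete, here is a minimal sketch using a negative lookahead. The CSS string is a made-up sample, not taken from the question, and the variable names are only for illustration.

```javascript
// Hypothetical sample CSS; only the "$" in [href$=...] is an attribute selector.
const css = 'a { width: $gap; } [href$=".pdf"] { color: $accent; }';

// \$ matches a literal dollar sign; (?!=) is a negative lookahead that
// rejects the match when the very next character is "=".
const pattern = /\$(?!=)/g;

// Collect the index of every surviving match.
const positions = [];
let m;
while ((m = pattern.exec(css)) !== null) {
  positions.push(m.index);
}

// Show a little context after each hit.
for (const pos of positions) {
  console.log(pos, css.slice(pos, pos + 7));
}
```

The exec loop is used instead of str.match because, with the g flag, match only returns the matched text and drops the positions.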

Related

Removing javascript reserved words from string

So, I have a string which is actually a JavaScript script. I need to remove the first reserved JavaScript word from it, but only if it actually has the meaning of the reserved word. That means:
it can't be inside string literals ("" or '', like "return that thing to me");
it has to be preceded and followed by whitespace, linebreak and such;
any other cases where it's not a reserved word.
I'm having a hard time writing a RegExp for this, as there always seems to be at least one case where it doesn't work as intended.
Any help, please?
You have to use a more powerful method than regex, such as a syntax analyzer that breaks your string into an abstract syntax tree. Then look for any keyword you want.
Try using the parser API of SpiderMonkey.
Or a library like UglifyJS or Esprima.

Regular expression: matching word boundaries [duplicate]

I have this gigantic ugly string:
J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM
J0000010: Project name: E:\foo.pf
J0000011: Job name: MBiek Direct Mail Test
J0000020: Document 1 - Completed successfully
I'm trying to extract pieces from it using regex. In this case, I want to grab everything after Project Name up to the part where it says J0000011: (the 11 is going to be a different number every time).
Here's the regex I've been playing with:
Project name:\s+(.*)\s+J[0-9]{7}:
The problem is that it doesn't stop until it hits the J0000020: at the end.
How do I make the regex stop at the first occurrence of J[0-9]{7}?
Make .* non-greedy by adding '?' after it:
Project name:\s+(.*?)\s+J[0-9]{7}:
Using non-greedy quantifiers here is probably the best solution, also because it is more efficient than the greedy alternative: greedy matches generally go as far as they can (here, until the end of the text!) and then backtrack character by character to try to match the part that comes afterwards.
However, consider using a negated character class instead:
Project name:\s+(\S*)\s+J[0-9]{7}:
\S means "everything except whitespace", and this is exactly what you want, as long as the project path itself contains no spaces.
Well, ".*" is a greedy selector. You make it non-greedy by using ".*?". With the latter construct, each time the regex engine is about to match another character with the ".", it first attempts to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", then it matches nothing.
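The difference between the greedy and non-greedy versions can be seen side by side in a small sketch. It assumes the log lines are separated by newlines, and it uses [\s\S] in place of "." because "." does not cross line breaks in JavaScript by default; both are assumptions of this example rather than details from the question.

```javascript
// The log excerpt from the question, joined with newlines (an assumption).
const log = [
  'J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM',
  'J0000010: Project name: E:\\foo.pf',
  'J0000011: Job name: MBiek Direct Mail Test',
  'J0000020: Document 1 - Completed successfully',
].join('\n');

// Greedy: [\s\S]* runs to the end of the text, then backtracks to the
// LAST place where \s+J[0-9]{7}: can still match.
const greedy = /Project name:\s+([\s\S]*)\s+J[0-9]{7}:/;

// Lazy: [\s\S]*? grows one character at a time and stops at the FIRST
// place where \s+J[0-9]{7}: matches.
const lazy = /Project name:\s+([\s\S]*?)\s+J[0-9]{7}:/;

const greedyName = log.match(greedy)[1];
const lazyName = log.match(lazy)[1];

console.log(JSON.stringify(greedyName)); // spans past the J0000011 line
console.log(JSON.stringify(lazyName));   // just the project path
```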
Here's what I used. s contains your original string. This code is .NET specific, but most flavors of regex will have something similar.
string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
I would also recommend you experiment with regular expressions using "Expresso", a great (and free) utility for regex editing and testing.
One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes it easy for them to learn these new concepts.
For example, when building your regex using the UI, and choosing "*", you have the ability to check the checkbox "As few as possible" and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.
Available for download at their site:
http://www.ultrapico.com/Expresso.htm
Express download:
http://www.ultrapico.com/ExpressoDownload.htm
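(Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)
This will work for you.
Using (?:\\\w+)+\.[a-zA-Z]+ instead of .* makes the pattern more restrictive: \\ matches the literal backslash in the path, \w+ the folder or file name, and \. the dot before the extension.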

Include earlier captured group in regex itself

I want this regex to work.
After the first group has been captured, is it possible to refer to it later in the same regex?
(EP \d{5,7})(?:.*[\r\n]+){52}.*$1
I am currently using Notepad++ to find the same thing this way, which works:
(EP \d{5,7})(?:.*[\r\n]+){52}.*\1
Is this possible in a JavaScript or VBScript regexp?
I tried Windows VBScript, JScript and https://regex101.com/#javascript, but it seems I am making some mistake.
In JavaScript a backreference is also denoted by a backslash:
(test)\1
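A short sketch of the difference between \1 and $1 in JavaScript. The sample text is made up, and the {52} line counter from the question is replaced by a simple lazy [\s\S]*? to keep the example small; that swap is a simplification of this sketch, not the asker's pattern.

```javascript
// \1 inside the pattern refers back to whatever group 1 matched.
const repeated = /(EP \d{5,7})[\s\S]*?\1/;

// Hypothetical sample: the same EP number appears on the first and third lines.
const text = 'EP 123456 first\nEP 765432 other\nEP 123456 again';

const hit = text.match(repeated);
console.log(hit !== null ? hit[1] : 'no match');

// $1 and $2 are only meaningful in a replacement string, not in the pattern.
const swapped = 'EP 123456'.replace(/(EP) (\d+)/, '$2 $1');
console.log(swapped);
```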

Extracting data from JavaScript (Python Scraper)

I'm currently using a fusion of urllib2, pyquery, and json to scrape a site, and now I find that I need to extract some data from JavaScript. One thought would be to use a JavaScript engine (like V8), but that seems like overkill for what I need. I would use regular expressions, but the expression for this seems way too complex.
JavaScript:
(function(){DOM.appendContent(this, HTML("<html>"));;})
I need to extract the <html>, but I'm not entirely sure how to do so. The <html> itself can contain basically every character under the sun, so [^"] won't work.
Any thoughts?
Why regex? Can't you just use two substrings as you know how many characters you want to trim off the beginning and end?
string[42:-7]
As well as being quicker than a regex, it then doesn't matter if quotes inside <html> are escaped or not.
If every occurrence of " inside the HTML code is escaped as \" (it is a JavaScript string after all), you could use
HTML\("((?:\\"|.)*?)"\)
to get the parameter to HTML into the first capturing group.
Note that this Regex is not yet escaped to be a Javascript String itself.
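Both suggestions can be tried on a sample payload. The sketch below is in JavaScript rather than Python, purely to keep it runnable next to the pattern as written; the payload with escaped quotes is invented for illustration, and the pattern should carry over to Python's re module as a raw string.

```javascript
// The wrapper from the question, with a hypothetical payload that contains
// escaped double quotes.
const source =
  '(function(){DOM.appendContent(this, HTML("<a href=\\"x\\">hi</a>"));;})';

// (?:\\"|.)*? consumes either an escaped quote (\") or any single character,
// lazily, until the closing "\) can match.
const htmlArg = /HTML\("((?:\\"|.)*?)"\)/;

const match = source.match(htmlArg);
const payload = match ? match[1] : null;
console.log(payload);

// Fixed-offset alternative: drop the 42-character prefix and the
// 7-character suffix, the JavaScript counterpart of string[42:-7].
const sliced = source.slice(42, -7);
console.log(sliced === payload);
```

The slice only works while the wrapper around the payload stays exactly the same length; the regex tolerates changes before HTML(" but depends on inner quotes being escaped.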

Finding beginning and end quotations

I'm starting to write a code syntax highlighter in JavaScript, and I want to highlight text that is in quotes (both "s and 's) in a certain color. It must not be thrown off by one type of quote appearing in the middle of a pair of the other type, but I'm really not sure where to even start. I'm not sure how I should go about finding the quotes and then finding the correct end quote.
Unless you're doing this for the challenge, have a look at Google Code Prettify.
For your problem, you could read up on parsing (and lexers) at Wikipedia. It's a huge topic and you'll find that you'll come upon bigger problems than parsing strings.
To start, you could use regular expressions (although they rarely have the accuracy of a true lexer.) A typical regular expression for matching a string is:
/"(?:[^"\\]+|\\.)*"/
And then the same for ' instead of ".
Otherwise, for a character-by-character parser, you would set some kind of state recording that you're in a string once you hit ". Then, when you hit a " that is not preceded by an odd number of backslashes (an even number of backslashes would escape each other), you exit the string.
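A minimal sketch of such a character-by-character scanner follows. Instead of counting backslashes backwards, it skips the character after every backslash while inside a string, which has the same effect. The function name findStrings and the sample input are hypothetical, and comments and regex literals are deliberately ignored.

```javascript
// Returns [start, end) index pairs for every quoted string in `code`.
// Simplification (an assumption of this sketch): comments and regex
// literals are not recognised, so quotes inside them count as strings.
function findStrings(code) {
  const ranges = [];
  let quote = null; // the quote character that opened the current string
  let start = -1;

  for (let i = 0; i < code.length; i++) {
    const ch = code[i];
    if (quote === null) {
      // Outside a string: either kind of quote opens one.
      if (ch === '"' || ch === "'") {
        quote = ch;
        start = i;
      }
    } else if (ch === '\\') {
      i++; // inside a string: skip the escaped character, whatever it is
    } else if (ch === quote) {
      // Only the same kind of quote that opened the string can close it.
      ranges.push([start, i + 1]);
      quote = null;
    }
  }
  return ranges;
}

// Sample with a ' inside "...", escaped " inside '...', and a trailing \\.
const sample = 'var a = "it\'s", b = \'say \\"hi\\"\', c = "x\\\\";';
const pieces = findStrings(sample).map(([from, to]) => sample.slice(from, to));
pieces.forEach((p) => console.log(p));
```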
You can find quotes using regular expressions but if you're writing a syntax highlighter then the only reliable way is to step through the code, character by character, and decide what to do from there.
E.g. of a Regex
/("|')((?:\\\1|.)+?)\1/g
(matches "this" and 'this' and "thi\"s")
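To see what that regex picks up on a line with mixed quotes, here is a quick sketch; the input line is made up for the example.

```javascript
// ("|') captures the opening quote; \1 later requires the same quote to
// close, and \\\1 lets a backslash-escaped copy of it appear inside.
const quoted = /("|')((?:\\\1|.)+?)\1/g;

// Hypothetical line containing all three cases plus a ' inside "...".
const line = 'say("this", \'that\', "thi\\"s", "it\'s")';

const found = line.match(quoted) || [];
found.forEach((s) => console.log(s));
```

Because of the +?, an empty string such as "" is not matched as a string of its own; changing it to *? would be one way to include those.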
Use a stack: if an unmatched quote is found, push it; if its match is found, pop.
I did it with a single regular expression in PHP using lookbehind assertions. JavaScript engines without lookbehind support cannot do the same (plain backreferences such as \1 do work in JS), and I think lookbehind is what you need if you really want to detect escaped backslashes.
