Regex to extract path tree from window.location.pathname - javascript

Say I access JavaScript's window.location.pathname and get /you/are/here.
Is it possible to construct a regular expression that incrementally matches each part of the path, starting from the beginning? In other words, my_regex.exec(window.location.pathname) would return an array of matches like this:
["/you", "/you/are", "/you/are/here", index: 0, input: "/you/are/here"]

No, a single regular expression match will not do it. You should match "/[a-zA-Z0-9]+" (or whatever pattern captures your identifiers) and then build the cumulative strings by looping over the matches.
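As a minimal sketch of that loop: the helper name pathPrefixes is made up for illustration, and the segment pattern /\/[^\/]+/g is an assumption that treats everything between two slashes as one segment.

```javascript
// Hypothetical helper: returns every cumulative prefix of a path.
function pathPrefixes(pathname) {
  // Each match is one segment including its leading slash, e.g. "/you".
  const segments = pathname.match(/\/[^\/]+/g) || [];
  const prefixes = [];
  let current = "";
  for (const segment of segments) {
    current += segment; // grow the path one segment at a time
    prefixes.push(current);
  }
  return prefixes;
}

console.log(pathPrefixes("/you/are/here"));
// → [ "/you", "/you/are", "/you/are/here" ]
```

In a browser you would call it as pathPrefixes(window.location.pathname).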

You should be able to run it like this:
location.pathname.match(/\w+/g)
That should return an array with all whole words. Of course, a path segment can also contain spaces and periods, as well as % in the case of URL encoding. So to cover those as well:
location.pathname.match(/[\w\s.%]+/g)
The brackets create a character class, which matches any one of the characters listed between them.
Inside the class we have \w for all word characters (A-Z, a-z, 0-9, and the underscore); any type of whitespace (\s); a period (.); and finally a percent sign (%).
After the character class we add + to say that we want to find at least one, but as many as possible.
The g flag at the end makes the search global, so match returns an array with all hits.
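To see the difference between the two patterns, here is a small sketch run against a made-up path (examplePath is an illustration only; in a browser you would use location.pathname instead):

```javascript
// Example path chosen for illustration only.
const examplePath = "/my%20docs/file_1.txt";

// \w+ splits on every non-word character, including % and the period.
const plainWords = examplePath.match(/\w+/g);
console.log(plainWords); // → [ "my", "20docs", "file_1", "txt" ]

// Adding \s, . and % to the class keeps each path segment in one piece.
// (\w already covers the underscore.)
const wholeSegments = examplePath.match(/[\w\s.%]+/g);
console.log(wholeSegments); // → [ "my%20docs", "file_1.txt" ]
```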

Related

Matching variants and mis-spellings of a word using RegEx in MS Word

I am trying to capture variants of a word using the Microsoft Word find and replace function. Here is a searchable snippet:
There are going to be 3 instances of the word successful for the purpose of Regex matching. Here is the second sucesfull and here is another succesfull , both spelt incorrectly.
This is the regex I used in Find and Replace with "Use Wildcards" selected (I have also tried replacing the braces with brackets, with no joy):
<([Ss]uc[1,]es[1,]ful[1,])>
[Ss]uc{1,}es{1,}ful{1,}
Replace the [ ] around the counts with { } and it should work fine. Curly braces specify how many times the preceding character may repeat. Square brackets specify a set of acceptable characters.
So the current regular expression will match the following.
succcccesssfulll
sucesful
successful
Successsssfull
and so on.
I think this is cleaner and easier to type:
[Ss]uc+es+ful+
"+" matches one or more occurrences of the preceding character.
The search string you want would be:
<[sS]uc@es@ful@>
This searches for a whole word (the < and > symbols mark the start and end of a word) starting with either s or S and containing one or more (the @ symbol) of each of c, s, and l.
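Outside Word, the same idea can be checked in an ordinary regex engine. The sketch below uses JavaScript and the + form of the pattern, with \b word boundaries added as an assumption to play the role of Word's < and >, and runs it against the sample sentence from the question.

```javascript
const sample =
  "There are going to be 3 instances of the word successful for the purpose " +
  "of Regex matching. Here is the second sucesfull and here is another " +
  "succesfull , both spelt incorrectly.";

// One or more c, s and l in the positions where the spelling varies.
const variants = sample.match(/\b[Ss]uc+es+ful+\b/g);
console.log(variants);
// → [ "successful", "sucesfull", "succesfull" ]
```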

Parsing units with javascript regex

Say I have a string which contains some units (which may or may not have prefixes) that I want to break into the individual units. For example the string may contain "Btu(th)" or "Btu(th).ft" or even "mBtu(th).ft" where mBtu(th) is the bastardised unit milli thermochemical BTU's (this is purely an example).
I currently have the following (simplified) regex however it fails for the case "mBtu(th).ft":
/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g
Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand JavaScript regex does not support lookbehind, so how do I accurately parse the string?
Additional notes
The regex presented above is greatly simplified around the prefixes and units groups. The prefixes could span multiple characters like 'Ki' and therefore character sets are not suitable.
The desire is for each match to capture the prefix as group 1 and the unit as group 2, i.e. for 'mBtu(th).ft' match one would be ['m','Btu(th)'] and match two would be ['','ft'].
The prefix match needs to be lazy so that the string 'm' would be matched as the unit metres rather than the prefix milli. Likewise the match for 'mm' would need to be the prefix milli and the unit metres.
I would try with:
/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g
At least with the example above, it matches all the units merged into one string.
Another try:
/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g
Each match of this one again covers only one part, but if you use $1, $2, $3, $4, etc., you can extract the other fragments. It ignores the ., (, and ) characters. The problem is keeping track of which groups matched, but it works to some degree.
Or, if you accept multiple separate matches, I think a simple alternative is:
/(m|k|Btu|th|ft)/g
A word boundary will not separate two non-word characters. So, you don't actually want a word boundary since the parentheses and period are not valid word characters. Instead, you want the string to not be followed by a word character, so you can use this instead:
[mk]??(Btu\(th\)|ft|m)(?!\w)
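A minimal sketch of the negative-lookahead approach, adapted so that the prefix is captured as group 1 as the question asks. The (m|k) alternation and the helper name parseUnits are assumptions for illustration; the real prefix and unit lists would be longer.

```javascript
// Capturing variant of the pattern: group 1 = prefix, group 2 = unit.
// (?!\w) requires that the unit is not followed by a word character.
const unitPattern = /(m|k)??(Btu\(th\)|ft|m)(?!\w)/g;

// Hypothetical helper returning [prefix, unit] pairs; a missing prefix
// is reported as an empty string.
function parseUnits(text) {
  return Array.from(text.matchAll(unitPattern), (m) => [m[1] || "", m[2]]);
}

console.log(parseUnits("mBtu(th).ft")); // → [ [ "m", "Btu(th)" ], [ "", "ft" ] ]
console.log(parseUnits("m"));           // → [ [ "", "m" ] ]
console.log(parseUnits("mm"));          // → [ [ "m", "m" ] ]
```

Because the prefix group is lazy, the engine first tries to read a bare unit and only falls back to treating the leading m as a prefix when the lookahead rejects that reading.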
I believe you're after something like this, if I understood correctly that you want to match any kind of element, possibly preceded by the m or k character and separated by parentheses or dots.
/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g
https://regex101.com/r/eQ5nR4/2
If you don't care about being able to match the parentheses and just want the elements returned, you can simply do:
/(m|k?)(\w+)/g
https://regex101.com/r/oC1eP5/1

Matching variable-term equations

I am trying to develop a regular expression to match the following equations:
(Price+10%+100+200)
(Price+20%+200)
(Price+30%)
(Price+100)
(Price-10%-100-200)
(Price-20%-200)
(Price-30%)
(Price-100)
My regex so far is...
/([(])+([P])+([r])+([i])+([c])+([e])+([+]|[-]){1}([\d])+([+]|[-])?([\d])+([%])?([)])/g
..., but it only matches the following equations:
(Price+100+10%)
(Price+100+100)
(Price+200)
(Price-100-10%)
(Price-100-100)
(Price-200)
Can someone help me understand how to make my pattern match the full set of equations provided?
Note: Parentheses and 'Price' are musts in the equations that the pattern must match.
Try this, which matches all the input strings provided in the question:
/\(Price([+-]\d+%?){1,3}\)/g
You can test it in a regex fiddle.
Things to note:
Only use parentheses where you want to group. Parentheses around single-possibility, fixed-quantity matches (e.g. ([P])) provide no value.
Use character classes (opened with [ and closed with ]) for multiple characters that can match at a position in the pattern (e.g. [+-]). Single-possibility character classes (e.g. [P]) similarly provide no value.
Yes, character classes (generally) implicitly escape regex special characters within them (e.g. ( in [(] vs. equivalent \( outside a character class), but to just escape regex special characters (i.e. to match them literally), you are better off not using a character class and just escaping them (e.g. \() – unless multiple characters should match at a position in the pattern (per the previous point to note).
The quantifier {1} is (almost) always useless: drop it.
The quantifier + means "one or more" as you probably know. However, in a series of cases where you used it (i.e. ([(])+([P])+([r])+([i])+([c])+([e])+), it would match many values that I doubt you expect (e.g. ((((((PPPrriiiicccceeeeee): basically, don't overuse it. Stop to consider whether you really want to match one or more of the character (class) or group to which + applies in the pattern.
To match a literal string without any regex special characters like Price, just use the literal string at the appropriate position in the pattern – e.g. Price in \(Price.
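The suggested pattern can be checked against every equation from the question. In this sketch the pattern is anchored with ^ and $ and the g flag is dropped (both are assumptions, made so that test() validates a whole string and keeps no state between calls).

```javascript
// Anchored, non-global variant of the suggested pattern.
const priceEquation = /^\(Price([+-]\d+%?){1,3}\)$/;

const equations = [
  "(Price+10%+100+200)", "(Price+20%+200)", "(Price+30%)", "(Price+100)",
  "(Price-10%-100-200)", "(Price-20%-200)", "(Price-30%)", "(Price-100)",
];

// Every equation from the question should pass.
console.log(equations.every((eq) => priceEquation.test(eq))); // → true

// Strings the original, over-quantified pattern would have accepted or
// that lack a term are rejected.
console.log(priceEquation.test("((PPrice+100)")); // → false
console.log(priceEquation.test("(Price)"));       // → false
```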
/\(Price[+-](\d)+(%)?([+-]\d+%?)?([+-]\d+%?)?\)/g
works on http://www.regexr.com/
/^[(Price]+\d+\d+([%]|[)])&/i
try at your own risk!

Regex for property paths

I am trying to match property-syntax with a Javascript regex. Is there a reliable way to do this? I would need to match a string like the following-
someobject.somekey.somechildkey.somegrandchildkey
I don't need the path members, I just need to know if a string contains a path. For example, given a string like this
This is some long string that contains a property.path.syntax, and I need to test it.
Try this:
/\b(?:\S+?\.)+\S+\b/g
This is bounded by two word boundaries, which should work in most cases (a word character next to a non-word character). Then we lazily repeat one or more non-whitespace characters followed by a . (which needs to be escaped). We use \S for non-whitespace because, as @TJCrowder said, properties can contain many characters. There always has to be another run of non-whitespace characters after the last period.
Working within the limits you've identified in the comments:
/(?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+/g
(Use the g flag if you need to do this repeatedly.)
That says:
Anything starting with a-z, A-Z, _, or $ (emphasizing again this is an incomplete list)
...followed by any number of those plus digits
Followed by one or more non-capturing groups of the same thing, but starting with a .
Or if you need it not to match one.that and should.not in:
blah one.that.1should.not blah
Then:
/(?:\s|^)((?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+)(?:\s|$)/g
That says the same thing as the one earlier, but plus:
Requires whitespace or beginning-of-input at the start ((?:\s|^)) and whitespace or end-of-input at the end ((?:\s|$)).
Uses a capture group so you can get just the property path without the optional whitespace on either side of it
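A short sketch comparing the two patterns from this answer. The names loose and strict are made up for illustration, and the input "see a.b.c here" is an added example.

```javascript
const loose = /(?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+/g;
const strict =
  /(?:\s|^)((?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+)(?:\s|$)/g;

const tricky = "blah one.that.1should.not blah";
console.log(tricky.match(loose));  // → [ "one.that", "should.not" ]
console.log(tricky.match(strict)); // → null

// Group 1 holds the path without the surrounding whitespace.
const found = Array.from("see a.b.c here".matchAll(strict), (m) => m[1]);
console.log(found); // → [ "a.b.c" ]

// Caveat: the strict form needs whitespace right after the path, so a
// path followed by punctuation is not reported.
const sentence =
  "This is some long string that contains a property.path.syntax, and I need to test it.";
console.log(sentence.match(loose));  // → [ "property.path.syntax" ]
console.log(sentence.match(strict)); // → null
```

If trailing punctuation matters for your input, the closing (?:\s|$) would need to be widened accordingly.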
Just to recap, the valid list of JavaScript identifier characters is very large, much larger than \w (which is [a-zA-Z0-9_]). It's not like some languages that only allow those characters. All sorts of normal-to-large-numbers-of-people characters are allowed, such as ç, ö, ñ (and Arabic, and Japanese, and Chinese, and ...). And there are basically no limits on property names (e.g., if you express them as strings), only on property name literals. More: http://ecma-international.org/ecma-262/5.1/#sec-7.6
var expr = /[a-zA-Z_]([a-zA-Z0-9_]*\.[a-zA-Z_][a-zA-Z0-9_]*)+/i;
expr.test("your.test.case");
The above regexp:
doesn't match .test
doesn't match test.
doesn't match test
doesn't match 0test, because it cannot be a JavaScript identifier (you cannot start the name of a variable with a number)
EDIT: as suggested by Paulchenkiller, and also considering that the i at the end stands for "case insensitive", you can also use the following shorter form:
var expr = /[a-z_](\w*\.[a-z_]\w*)+/i;
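The cases listed above can be checked directly. Note that the pattern is not anchored, so it reports whether the string contains a path anywhere. The input "0test.case" is an added example showing that consequence.

```javascript
const expr = /[a-z_](\w*\.[a-z_]\w*)+/i;

const cases = ["your.test.case", ".test", "test.", "test", "0test", "0test.case"];
const outcome = cases.map((input) => expr.test(input));
console.log(outcome);
// → [ true, false, false, false, false, true ]
// "0test.case" is true because the unanchored pattern finds "test.case".
```

To validate an entire string instead, you could wrap the pattern in ^ and $.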

How to find occurrence of multiple strings in a given string using javascript RegExp()

I want to check whether multiple strings all occur in a given string (without using a loop), like this:
my_string = "How to find occurence of multiple sting in a given string using javascript RegExp";
// search operated on this string
// having a-z (lower case) , '_' , 0-9 , and space
// following are the strings wanted to search .( having a-z , 0-9 , and '_')
search_str[0]="How";
search_str[1]="javascript";
search_str[2]="sting";
search_str[3]="multiple";
I don't need their positions.
I just need to know that all of the search_str entries are present in my_string.
The order of search_str never affects the result.
Is there a regular expression available for this?
UPDATE: WHAT AM I MISSING
Among the answers, I found that this one works for the problem above:
if (/^(?=.*\bHow\b)(?=.*\bjavascript\b)(?=.*\bsting\b)(?=.*\bmultiple\b)/.test(subject)) {
// Successful match
}
But in this case it does not work:
m_str="_3_5_1_13_10_11_";
search_str[0]='3';
search_str[1]='1';
tst=new RegExp("^(?=.*\\b_"+search_str[0]+"_\\b)(?=.*\\b_"+search_str[1]+"_\\b)");
if(tst.test(m_str)) alert('fooooo'); else alert('wrong');
if (/^(?=.*\bHow\b)(?=.*\bjavascript\b)(?=.*\bsting\b)(?=.*\bmultiple\b)/.test(subject)) {
// Successful match
}
This assumes that your string doesn't contain newlines. If it does, you need to change all the .s to [\s\S].
I have used word boundary anchors to make sure that Howard or resting don't accidentally provide a match. If you do want to allow that, remove the \bs.
Explanation:
(?=...) is a lookahead assertion: It looks ahead in the string to check whether the enclosed regex could match at the current position without actually consuming characters for the match. Therefore, a succession of lookaheads works like a sequence of regexes (anchored to the start of the string by ^) that are combined with a logical && operator.
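The lookahead chain can also be built from an array of search strings. The sketch below is one possible way to do that, with hypothetical helper names (escapeRegExp, buildAllOfRegex). It also covers the underscore-delimited case from the question's update: that case fails with \b because _ and digits are all word characters, so there is no word boundary between _ and 3. Using the underscores themselves as delimiters avoids the problem.

```javascript
// Escape characters that have a special meaning inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds ^(?=...)(?=...) with one lookahead per term.
// `before` and `after` are the delimiters required around each term.
function buildAllOfRegex(terms, before = "\\b", after = "\\b") {
  const lookaheads = terms.map(
    (term) => "(?=[\\s\\S]*" + before + escapeRegExp(term) + after + ")"
  );
  return new RegExp("^" + lookaheads.join(""));
}

const myString =
  "How to find occurence of multiple sting in a given string using javascript RegExp";
const requiredWords = ["How", "javascript", "sting", "multiple"];
console.log(buildAllOfRegex(requiredWords).test(myString)); // → true

const mStr = "_3_5_1_13_10_11_";
console.log(buildAllOfRegex(["3", "1"]).test(mStr));           // → false
console.log(buildAllOfRegex(["3", "1"], "_", "_").test(mStr)); // → true
```

The sketch uses [\s\S]* rather than .* so that it also works when the subject contains newlines.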
