RegEx start with, contains these, not end with? - javascript

I need a regular expression to find all comparison parts in the following example.
var magicalRegex = "";
var example = '({Color}=="red"|{Color}=="yellow")&{Size}>32';
example.replace(magicalRegex, function (matchedBlock) {
    console.log(matchedBlock);
});
//so i want to see the following result on console
//{Color}=="red"
//{Color}=="yellow"
//{Size}>32
I tried a few things but could not finish. The incomplete pattern I have so far is below.
\{.*?(==|>)
https://regex101.com/r/aodDeX/1
Thanks

Answer
According to the example you have on regex101 as well as the string you have in your code snippet (two different strings) the following regex will do exactly what you want.
Answer 1
({.*?}(?:==|>)(?:\d+|(?:(["']?).*?\2)))
You can see this regex in use here
Answer 2
Note that I've added both single and double quotes in the above regex. If you only need double quotes, use the following regex.
({.*?}(?:==|>)(?:\d+|".*?"))
You can see this regex in use here
Explanation
These regular expressions work as follows:
Match {, followed by any character (except newline) any number of times, but as few as possible, followed by }
Match == or >
Match one or more digits, or a quoted string (any character any number of times, but as few as possible), e.g. "something"
The regex captures the entire section and if you look at the examples on regex101 as presented, you can see what each capture group is matching. You can remove the capture groups if this is not the intended use.
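As a rough sketch of how the double-quote variant could be used from JavaScript: the snippet below drops the outer capture group, escapes the braces (an assumption made for safety; the pattern above leaves them bare), and collects the matches with String.prototype.match and the g flag rather than the replace callback from the question. The names comparisonRegex and comparisons are illustrative only.

```javascript
// Double-quote variant of the pattern, without the outer capture group.
// Escaping { and } is an assumption made here to keep the pattern unambiguous.
const comparisonRegex = /\{.*?\}(?:==|>)(?:\d+|".*?")/g;

const example = '({Color}=="red"|{Color}=="yellow")&{Size}>32';

// With the g flag, match() returns every full match as an array of strings.
const comparisons = example.match(comparisonRegex) || [];

for (const comparison of comparisons) {
  console.log(comparison);
}
// {Color}=="red"
// {Color}=="yellow"
// {Size}>32
```

If the capture groups are needed as well, example.matchAll(comparisonRegex) would be the closer equivalent, since match() with the g flag returns only the full matches.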
Expected Results
Input
Note that the two strings below were used for testing purposes. One string is present in the question and the other is present in the link provided by the OP.
({Renk}=="kirmizi"or{Renk}=="sari")or{Size}>32
({Color}=="red"|{Color}=="yellow")&{Size}>32
Output
Note that the output below shows the full match, which is also capture group 1 (since the whole regex is wrapped in a capture group). Any other groups are disregarded because they are not relevant to the question.
{Renk}=="kirmizi"
{Renk}=="sari"
{Size}>32
{Color}=="red"
{Color}=="yellow"
{Size}>32

Related

How to find which part/group of regular expression fails

I need to find which part of an expression fails. Let's say I have the expression ^-?(\\d*)(,\\d{1,3})*(?:[,]|([.]\\d{0,2}))?$ and I want to know whether it fails while matching the comma (,) or the decimal part. How can I find the unmatched group in a given regular expression?
Break it into smaller chunks and test that each part matches what you expect it to.
Also, as @Avinash Raj has mentioned, online regex checkers like regex101 are indispensable.
These tools highlight what has and hasn't been matched in a given set of data. This will show you where the regex is failing.
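One way to sketch the "smaller chunks" idea in JavaScript is to grow an anchored pattern one chunk at a time and record how much of the input each chunk consumed. The chunk names, the parts array and the traceParts function are all hypothetical, and the capture groups of the original expression are replaced with non-capturing ones for simplicity. This only works this neatly because every chunk of this particular expression is optional, so each growing prefix always matches.

```javascript
// Hypothetical split of ^-?(\d*)(,\d{1,3})*(?:[,]|([.]\d{0,2}))?$ into named chunks.
const parts = [
  { name: "sign", source: "-?" },
  { name: "integer digits", source: "\\d*" },
  { name: "comma groups", source: "(?:,\\d{1,3})*" },
  { name: "trailing comma or decimals", source: "(?:,|\\.\\d{0,2})?" },
];

// Grow the anchored pattern chunk by chunk and record what each chunk consumed.
function traceParts(input) {
  let source = "^";
  let consumed = 0;
  const trace = [];
  for (const part of parts) {
    source += part.source;
    const match = new RegExp(source).exec(input);
    const end = match ? match[0].length : consumed;
    trace.push({ name: part.name, text: input.slice(consumed, end) });
    consumed = end;
  }
  // Whatever is left over is the text that keeps the final $ from matching.
  return { trace, remainder: input.slice(consumed) };
}

console.log(traceParts("1,2345.67"));
```

For the input "1,2345.67" the comma-group chunk consumes ",234" and the remainder is "5.67", which points at the four-digit group as the place where the full expression gives up.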

Non-capturing group ignored by regex101.com

This question arose from my previous question about regular expressions. I am stuck trying to understand the difference in the results I get, and I am worried that there may be a bug in the parsing libraries or something else.
So the initial question was to replace all :/ in a given string, except ones that may be inside tags in that string. The initial string is
not feeling well today :/ check out this link http://example.com
I have tried to use the following regexp to replace only the first :/ in the given example. To skip occurrences inside tags, a non-capturing group is used:
/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g
What was most surprising is that this regexp gives different results depending on tool/language being used. Here's a brief summary of results I got
regex101.com shows 1 match!!
regexpal.com shows 2 matches
regexr.com shows 2 matches
regextester.com shows 2 matches
Below is a JavaScript snippet that checks the same regexp. As you can see, its result also differs from what I expected: there are 2 matches, so 2 replacements occur.
var s = 'not feeling well today :/ check out this link http://example.com';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
It seems that the non-capturing group is simply ignored.
So, what is wrong, why do the results differ, and what is the correct regexp to solve the initial question?
regex101 also returns 2 matches, as you can see on the label and from the 2 different colors in the text.
It is indeed a bit confusing if you look at the MATCH INFORMATION section. However, that is only intended to show you captures, not necessarily matches:
You may as well test this by replacing each match with some string:
https://regex101.com/r/kY6vI5/2
The non-capturing group is not ignored. It simply doesn't create a capture, but it is in fact matched.
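The difference between a match and a capture can be checked directly in JavaScript. The sketch below assumes the sample string originally wrapped the link in an anchor tag (the markup appears to have been lost from the text above, so the tag here is a guess), and the names tagOrSmiley, summary and replacedSafely are illustrative. It also shows one common way to use the capture for the original goal: return the match unchanged whenever group 1 did not participate.

```javascript
const tagOrSmiley = /(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g;

// Assumed input: the anchor tag is a guess at the markup in the original string.
const text =
  'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';

// Two matches in total, but only the first one fills capture group 1.
const matches = [...text.matchAll(tagOrSmiley)];
const summary = matches.map((m) => ({ matched: m[0], captured: m[1] }));
console.log(summary);

// Replace only when the capturing alternative matched; leave tags untouched.
const replacedSafely = text.replace(tagOrSmiley, (whole, smiley) =>
  smiley === undefined ? whole : "smiley_image_here"
);
console.log(replacedSafely);
```

Passing a plain string as the replacement, as in the question, replaces both matches, which is why the tag disappears there as well.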

Parsing units with javascript regex

Say I have a string which contains some units (which may or may not have prefixes) that I want to break into the individual units. For example the string may contain "Btu(th)" or "Btu(th).ft" or even "mBtu(th).ft" where mBtu(th) is the bastardised unit milli thermochemical BTU's (this is purely an example).
I currently have the following (simplified) regex however it fails for the case "mBtu(th).ft":
/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g
Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand that JavaScript regex does not support lookbehind, so how do I accurately parse the string?
Additional notes
The regex presented above is greatly simplified around the prefixes and units groups. The prefixes could span multiple characters like 'Ki' and therefore character sets are not suitable.
The goal is for each match to capture the prefix as group 1 and the unit as group 2, i.e. for 'mBtu(th).ft' match one would be ['m','Btu(th)'] and match two would be ['','ft'].
The prefix match needs to be lazy so that the string 'm' would be matched as the unit metres rather than the prefix milli. Likewise the match for 'mm' would need to be the prefix milli and the unit metres.
I would try with:
/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g
At least with the example above, it matches all the units merged into one string.
DEMO
EDIT
Another try (DEMO):
/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g
This one again matches only one part at a time, but if you use $1, $2, $3, $4, etc. (DEMO) you can extract the other fragments. It ignores the ., ( and ) characters. The difficulty is working out which groups matched, but it works to some degree.
Or, if you accept multiple separate matches, I think a simple alternative is:
/(m|k|Btu|th|ft)/g
A word boundary will not separate two non-word characters. So, you don't actually want a word boundary since the parentheses and period are not valid word characters. Instead, you want the string to not be followed by a word character, so you can use this instead:
[mk]??(Btu\(th\)|ft|m)(?!\w)
Demo
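A minimal sketch of the negative-lookahead approach in JavaScript follows. To get the prefix as its own group, as the question asks, it swaps the character class [mk] for the capturing alternation (m|k); that substitution, and the names unitRegex and parseUnits, are assumptions for illustration rather than part of the answer above.

```javascript
// Lazy optional prefix, then a unit that must not be followed by a word character.
const unitRegex = /(m|k)??(Btu\(th\)|ft|m)(?!\w)/g;

// Returns one [prefix, unit] pair per match; a missing prefix becomes "".
function parseUnits(text) {
  return [...text.matchAll(unitRegex)].map((m) => [m[1] || "", m[2]]);
}

console.log(parseUnits("mBtu(th).ft")); // [ [ 'm', 'Btu(th)' ], [ '', 'ft' ] ]
console.log(parseUnits("m"));           // [ [ '', 'm' ] ]
console.log(parseUnits("mm"));          // [ [ 'm', 'm' ] ]
```

The lazy prefix means a lone "m" is first tried as the unit metres, and the prefix is only used when the lookahead rejects that reading, as happens for "mm" and "mBtu(th)".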
I believe you're after something like this, if I understood correctly that you want to match any kind of element, possibly preceded by the m or k character and separated by parentheses or dots.
/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g
https://regex101.com/r/eQ5nR4/2
If you don't care about being able to match the parentheses but just return the elements you can just do
/(m|k?)(\w+)/g
https://regex101.com/r/oC1eP5/1

Javascript regular expression (unbroken repetitions of a pattern)

Let's say that I have a given string in javascript - e.g., var s = "{{1}}SomeText{{2}}SomeText"; It may be very long (e.g., 25,000+ chars).
NOTE: I'm using "SomeText" here as a placeholder to refer to any number of characters of plain text. In other words, "SomeText" could be any plain text string which doesn't include {{1}} or {{2}}. So the above example could be var s = "{{1}}Hi there. This is a string with one { curly bracket{{2}}Oh, very nice to meet you. I also have one } curly bracket!"; And that would be perfectly valid.
The rules for it are simple:
It does not need to have any instances of {{2}}. However, if it does, then after that instance we cannot encounter another {{2}} unless we find a {{1}} first.
Valid examples:
"{{2}}SomeText"
"{{1}}SomeText{{2}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText{{1}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText{{1}}SomeText{{1}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText{{1}}SomeText{{1}}SomeText{{2}}SomeText"
etc...
Invalid examples:
"{{2}}SomeText{{2}}SomeText"
"{{1}}SomeText{{2}}SomeText{{2}}SomeText"
"{{1}}SomeText{{2}}SomeText{{2}}SomeText{{1}}SomeText"
etc...
This seems like a relatively easy problem to solve - and indeed I could easily solve it without regular expressions, but I'm keen to learn how to do something like this with regular expressions. Unfortunately, I'm not even sure if "conditionals and lookaheads" is a correct description of the issue in this case.
NOTE: If a workable solution is presented that doesn't involve "conditionals and lookaheads" then I will edit the title.
It's probably easier to invert the condition. Try to match any text that contains two consecutive instances of {{2}}, and if it doesn't match that, it's good.
Using this strategy, your pattern can be as simple as:
/{\{2}}([^{]*){\{2}}/
Demonstration
This will match a literal {{2}}, followed by zero or more characters other than {, followed by a literal {{2}}.
Notice that the second { needs to be escaped; otherwise, the regex engine will treat the {2} as a quantifier on the previous { (i.e. {{2} matches exactly two { characters).
In case you need to allow characters like { between the two {{2}}, you can use a pattern like this:
/{\{2}}((?!{\{1}}).)*{\{2}}/
Demonstration
This will match a literal {{2}}, followed by zero or more of any character, so long as those characters do not create a sequence like {{1}}, followed by a literal {{2}}.
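The inverted check can be wrapped in a small validity function. In this sketch every brace is escaped and [\s\S] stands in for the dot so that line breaks between the markers are also covered; both choices, and the names repeatedTwo and isValid, are assumptions added for illustration.

```javascript
// Matches {{2}} ... {{2}} with no {{1}} anywhere in between.
const repeatedTwo = /\{\{2\}\}(?:(?!\{\{1\}\})[\s\S])*\{\{2\}\}/;

// A string is valid when the "two {{2}} in a row" pattern is NOT found.
function isValid(text) {
  return !repeatedTwo.test(text);
}

console.log(isValid("{{1}}SomeText{{2}}SomeText"));           // true
console.log(isValid("{{2}}SomeText"));                        // true
console.log(isValid("{{1}}SomeText{{2}}SomeText{{2}}SomeText")); // false
```

The regex deliberately has no g flag, so test() keeps no state between calls and the function can be reused on many strings.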
(({{1}}SomeText)+({{2}}SomeText)?)*
Broken down:
({{1}}SomeText)+ - 1 to many {{1}} instances (greedy match)
({{2}}SomeText)? - followed by an optional {{2}} instance
Then the whole thing is wrapped in ()* such that the sequence can appear 0 to many times in a row.
No conditionals or lookaheads needed.
You said you can have one instance of {2} first, right?
^(.(?!{2}))(.{2})?(?!{2})((.(?!{2})){1}(.(?!{2}))({2})?)$
Note if {2} is one letter replace all dots with [^{2}]

PHP or Javascript - Find a specific string

Finding a specific string is relatively easy, but I am not sure where to begin on this one. I would need to extract a string that would be different every time, but with similar characteristics.
Here are some example strings I need to find in a paragraph, either at the beginning, end or somewhere in the middle.
7b.9t.7iv.4x
4ir.4i.5i.6t
7ix.7t.4t.0z
As you can see the string will always begin with a number, and would have up to 2 characters after it and will always contain 4 octets separated by dots.
Let me know if you may need more details.
EDIT:
Thanks to the answer below I came up with this, which, while not pretty, does what I need.
$body="test 1f.9t.7iv.4x test 1a.9a.7ab.4xa test ";
$var=preg_match_all("([0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2})",$body,$matches);
$count=count($matches[0]);
$stack = array();
while($count > 0){
    $count--;
    array_push($stack, "<span id='ip_".$matches[0][$count]."'>".$matches[0][$count]."</span>");
}
$stack=array_reverse($stack);
$body=str_replace($matches[0],$stack,$body);
You can use a regular expression.
Something like this to get you started. There may be a better way to match since it's repeated, but....
([0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2})
( Start a capture group
[0-9] match any character 0 through 9
[a-z] match any character [a-z]
{1,2} but only match the previous 1 or 2 times
\. match a literal . the \ is needed as an escape because . is a special character
) End capture group
Both php and javascript allow for regular expression use.
For an even better visual representation you can check out this tool: http://www.debuggex.com/
If you need each octet by itself (as a match) you can add more parentheses () around each [0-9][a-z]{1,2}, which will then store those octets individually.
Also note that \d is the same as [0-9] but I prefer the latter as I find it a little more readable.
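Since the same pattern also works in JavaScript, here is a hedged sketch of a more compact form that folds the three repeated ".octet" parts into a quantified group, then wraps each match in a span in a single replace call. The compact pattern and the names octetPattern, found and wrapped are illustrative assumptions, not taken from the answer; the sample text is the one from the question's edit.

```javascript
// One octet, then exactly three more ".octet" parts.
const octetPattern = /[0-9][a-z]{1,2}(?:\.[0-9][a-z]{1,2}){3}/g;

const body = "test 1f.9t.7iv.4x test 1a.9a.7ab.4xa test ";

// All matching strings, in order of appearance.
const found = body.match(octetPattern) || [];
console.log(found); // [ '1f.9t.7iv.4x', '1a.9a.7ab.4xa' ]

// Wrap every match in a span, using a replacer function instead of a loop.
const wrapped = body.replace(
  octetPattern,
  (match) => `<span id='ip_${match}'>${match}</span>`
);
console.log(wrapped);
```

Using a replacer callback avoids building a parallel array of replacements, which is what the PHP loop in the question's edit does by hand.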
