Matching varying length in JS Regex - javascript

Let me explain my query with an example:
I am capturing the page name from a website. By design, the page name can have a varying number of fields:
It can be
Data1|Data2|Data3
Data1|Data2|Data3|Data4
Data1|Data2
I need to write a regex that matches all of the above scenarios. I have the one below, shared by a previous user:
/(.*?)\|(.*?)\|(.*?)\|(.*)/gm;
The above works well when the string always has four groups, and there is a blank in between. But if I have just two values, the regex fails. Can anyone please guide me?

Not sure what you meant there, but does this help? Note that it only accepts alphanumeric characters and spaces:
/([a-zA-Z 0-9]{1,}\|){1,}[a-zA-Z 0-9]{1,}/g

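This expects at least two data fields and at most four:
/(?:([^|]*)\|){1,3}([^|]*)/gm;
If you also want to allow a single field (no pipe):
/(?:([^|]*)\|){0,3}([^|]*)/gm;
{n,m} means the preceding item may repeat n through m times. JavaScript does not treat {,3} as a quantifier (it is read as literal text), so the lower bound has to be written out as {0,3}.
Notice how I used [^|]* instead of .*?, so I match anything but the pipe |. I also used a non-capturing group (?:) around each field and its pipe, so the pipes themselves are not captured. Be aware that a capturing group inside a repeated group keeps only its last repetition, so to get every field it is simpler to split the matched string on |.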
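As a minimal sketch of the approach above, the pattern can be used only to validate the number of fields, with split() collecting them afterwards. The helper name parsePageName and the anchored pattern are assumptions for illustration, not part of the original answer.

```javascript
// Validates 1 to 4 pipe-separated fields (0 to 3 pipes), anchored to the whole string.
const fieldsPattern = /^(?:[^|]*\|){0,3}[^|]*$/;

// Hypothetical helper: returns the fields as an array, or null if there are too many.
function parsePageName(pageName) {
  if (!fieldsPattern.test(pageName)) {
    return null;
  }
  // A quantified group only remembers its last repetition,
  // so split() is used to collect every field.
  return pageName.split("|");
}

console.log(parsePageName("Data1|Data2|Data3"));
console.log(parsePageName("Data1|Data2|Data3|Data4"));
console.log(parsePageName("Data1|Data2"));
console.log(parsePageName("a|b|c|d|e")); // five fields, so this is rejected
```

If the fields never need validating, a plain split("|") on its own would be enough.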

Related

How can I write a regular expression that matches everything between the first and the last quote?

I try to match multiple values between quotes
(these values can be anything but spaces)
the best I can achieve is to match everything between the first and the last quote
I already checked many SO answers, yet I cannot make it work
here is the regex
\[\[\[(\w*img\w*)\s(\w*id|url\w*)+="([^"]|.*)"\]\]\]
here is the string I try to match (values are numbers but I could have urls or anything similar)
[[[img id="37" w="100" h="70"]]]
I should get all parameters and their respective values, but I get only one parameter with the value being 37" w="100" h="70
I know I am close, but this one is tricky
regards
I don't think you need all the \w.
And I also would suggest splitting the task in two parts as suggested in a comment.
However, I also see an option in doing it in just one step:
\[\[\[img(?:\s(\w+)="([^"]+)")?(?:\s(\w+)="([^"]+)")?(?:\s(\w+)="([^"]+)")?\]\]\]
This is basically the wrapper [[[]]], a normal character part img and then (?:\s(\w+)="([^"]+)")? repeated as many times as you expect attributes to appear. (\w+) matches the name of the attribute and ([^"]+) its value.
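The two-step alternative mentioned above could look roughly like this in JavaScript: first match the [[[img ...]]] wrapper, then pull each name="value" pair out of the captured text. The function name parseImgTag is hypothetical, and the sketch assumes attribute values never contain a double quote.

```javascript
// Step 1: match the [[[img ...]]] wrapper and capture the raw attribute text.
const tagPattern = /\[\[\[img((?:\s+\w+="[^"]+")*)\]\]\]/;
// Step 2: find each name="value" pair inside that text.
const attributePattern = /(\w+)="([^"]+)"/g;

// Hypothetical helper: returns an object of attributes, or null if no tag is found.
function parseImgTag(text) {
  const tag = tagPattern.exec(text);
  if (tag === null) {
    return null;
  }
  const attributes = {};
  for (const [, name, value] of tag[1].matchAll(attributePattern)) {
    attributes[name] = value;
  }
  return attributes;
}

console.log(parseImgTag('[[[img id="37" w="100" h="70"]]]'));
// { id: '37', w: '100', h: '70' }
```

Unlike the single-step regex, this does not need one optional group per expected attribute.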

Javascript - how to use regex process the following complicated string

I have the following string that will occur repeatedly in a larger string:
[SM_g]word[SM_h].[SM_l] "
Notice in this string after the phrase "[SM_g]word[SM_h]" there are three components:
A period (.) This could also be a comma (,)
[SM_l]
"
Zero to all three of these components will always appear after "[SM_g]word[SM_h]". However, they can also appear in any order after "[SM_g]word[SM_h]". For example, the string could also be:
[SM_g]word[SM_h][SM_l]"
or
[SM_g]word[SM_h]"[SM_l].
or
[SM_g]word[SM_h]".
or
[SM_g]word[SM_h][SM_l].
or
[SM_g]word[SM_h].
or simply just
[SM_g]word[SM_h]
These are just some of the examples. The point is that there are three different components (more if you consider the period can also be a comma) that can appear after "[SM_g]word[SM_h]", where these three components can be in any order and sometimes one, two, or all three of the components will be missing.
Not only that, sometimes there will be up to one space between " and the previous component/[SM_g]word[SM_h].
For example:
[SM_g]word[SM_h] ".
or
[SM_g]word[SM_h][SM_l] ".
etc. etc.
I am trying to process this string by moving each of the three components inside of the core string (and preserving the space, in case there is a space between " and the previous component/[SM_g]word[SM_h]).
For example, [SM_g]word[SM_h].[SM_l]" would turn into
[SM_g]word.[SM_l]"[SM_h]
or
[SM_g]word[SM_h]"[SM_l]. would turn into
[SM_g]word"[SM_l].[SM_h]
or, to simulate having a space before "
[SM_g]word[SM_h] ".
would turn into
[SM_g]word ".[SM_h]
and so on.
I've tried several combinations of regex expressions, and none of them have worked.
Does anyone have advice?
You need to put the components in an alternation inside a group that may repeat up to 3 times:
(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})
You may replace word with .*? if it is not a constant or a specific keyword, and \. with [.,] if the period may also be a comma.
Then the replacement string should be:
$1$3$2
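// Group 1: "[SM_g]word", group 2: "[SM_h]", group 3: up to three trailing components
var re = /(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})/g;
var str = `[SM_g]word[SM_h][SM_l] ".`;
// Swapping groups 2 and 3 moves the trailing components in front of [SM_h]
console.log(str.replace(re, `$1$3$2`));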
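To check the pattern against the examples from the question, here is a small sketch that also accepts a comma in place of the period. The names trailingPattern and moveInside are made up for illustration.

```javascript
// Same idea as the regex above, with [.,] so a comma is accepted as well as a period.
const trailingPattern = /(\[SM_g\]word)(\[SM_h\])((?:[.,]|\[SM_l\]| ?"){0,3})/g;

// Hypothetical helper: moves the trailing components in front of [SM_h].
function moveInside(text) {
  return text.replace(trailingPattern, "$1$3$2");
}

const samples = [
  '[SM_g]word[SM_h].[SM_l]"',
  '[SM_g]word[SM_h]"[SM_l].',
  '[SM_g]word[SM_h] ".',
  '[SM_g]word[SM_h]',
];

for (const sample of samples) {
  console.log(sample, "=>", moveInside(sample));
}
```

Note that {0,3} only limits the count, so the same component repeated three times would also be accepted.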
This seems applicable to your process, in other words, to changing the position of sub-strings.
(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?
In the demo, each sub-string is captured in its own capture group for your post-processing.
[SM_g] is captured in group 1, word in group 2, [SM_h] in group 3, the whole trailing part in group 4, [,\.] in group 5, [SM_l] in group 6, and " ?&\\?quot;" in group 7.
Thus, groups 1 to 3 are the core part, group 4 is the trailing part (for checking whether a trailing part exists), and groups 5 to 7 are sub-parts of group 4 for your post-processing.
Therefore, you can easily produce an output string with the matched parts in the order you want by replacing with the captured groups, as follows:
\1\2\7\3 or $1$2$7$3, etc. (JavaScript replacement strings use the $n form.)
For replacing in JavaScript, please refer to this post: JS Regex, how to replace the captured groups only?
But the above regex is not sufficiently precise, because it allows any repetition of the sub-parts in the trailing string, for example \1\2\3\5\5\5\5 or \1\2\3\6\7\7\7\7\5\5\5, etc.
To avoid this, the regex needs a condition that accepts only the possible combinations of the sub-parts of the trailing string. Please refer to this example, https://regex101.com/r/6aM4Pv/1/, for the possible combinations in order.
But a regex that allows only the possible combinations is more complicated, so I leave the simplified regex above to help you understand the idea. Thank you :-)

Match optional domain within string

I've racked my brain over this JS regex and have so far only managed to get parts of it to work or the whole thing to work in certain circumstances.
I have a string like this:
Some string<br>http://anysubdomain.particulardomain.com<br>Rest of string
The goal is to move the domain part to the end of the string, if it's there. The http part is also optional and can also be https. The TLD is always particulardomain.com, the subdomain can be anything.
I've managed to get everything into capture groups when the domain with protocol is present with this regex:
(.*)(https?\:\/\/[a-z\d\-]*\.particulardomain\.com)(.*)
But any attempt at making the domain part and the protocol part within it optional has resulted in no or the wrong matches.
The end result I'm looking for is to have the three parts of the string – beginning, domain, end – in separate capture groups so I can move capture group 2 (the domain part) to the end, or, if there's no domain present, the whole string in the first capture group.
To clarify, here are some examples with the expected output/capture groups:
INPUT:
Some string<br>http://anysubdomain.particulardomain.com<br>Rest of string
OR (no protocol):
Some string<br>anysubdomain.particulardomain.com<br>Rest of string
OUTPUT:
$1: Some string<br>
$2: http://anysubdomain.particulardomain.com
$3: <br>Rest of string
INPUT:
Some string<br>Rest of string
OUTPUT:
$1: Some string<br>Rest of string
$2: empty
$3: empty
One mistake in your regex is that it contains only particular whereas
the source text contains particulardomain, but this is a detail.
Now let's move to the protocol part. You put only one ? (after s),
which means that only s is optional, but both http and :
are still required.
To make the whole protocol optional, you must:
enclose it with a group (either capturing or not),
make this group optional (put ? after it).
And now maybe the most important thing: your regex starts with (.*).
Note that this is the greedy version, which:
initially tries to capture the whole rest of the source string,
then moves back one char at a time, to allow the following
part of the regex to match.
Change it to the reluctant version (.*?) and then the optional
group (https?:)? will match as expected.
Another detail: \ before : is not needed. It does not do
any harm either, but due to the principle "Keep It Simple...",
I recommend to delete it (as I did above).
One more detail: after [a-z\d\-] (the subdomain part) you should put
+, not *, as this part must not be empty.
So the whole regex can be:
(.*?)((https?:)?\/\/[a-z\d\-]+\.particulardomain\.com)(.*)
And the last remark: I am in doubt, whether you really need three
capturing groups. Maybe it would be enough to leave only the content
of the middle capturing group, i.e.:
(https?:)?\/\/[a-z\d\-]+\.particulardomain\.com
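Note that the regexes above still require the // before the subdomain, so the question's input without a protocol would not match. A possible variant, offered as a sketch rather than as the answer's own regex, makes the whole "http://" or "https://" prefix optional and wraps the domain and the tail in an optional group, so a string without the domain ends up entirely in group 1. The names domainPattern and moveDomainToEnd are hypothetical.

```javascript
// Group 1: text before the domain, group 2: the domain (protocol optional),
// group 3: text after the domain. Groups 2 and 3 stay unset when no domain is present.
const domainPattern =
  /^(.*?)(?:((?:https?:\/\/)?[a-z\d-]+\.particulardomain\.com)(.*))?$/;

// Hypothetical helper: moves the domain part to the end of the string.
function moveDomainToEnd(text) {
  // An unset group is substituted as an empty string,
  // so a string without the domain is returned unchanged.
  return text.replace(domainPattern, "$1$3$2");
}

console.log(moveDomainToEnd("Some string<br>http://anysubdomain.particulardomain.com<br>Rest of string"));
console.log(moveDomainToEnd("Some string<br>anysubdomain.particulardomain.com<br>Rest of string"));
console.log(moveDomainToEnd("Some string<br>Rest of string"));
```

The pattern is case-sensitive and assumes the input has no line breaks, as in the question's examples.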
Found a solution. Since, as stated, the goal is to move the domain to the end of the string, if it's present, I'm just matching the domain and anything after it. If there's no domain, nothing matches and hence nothing gets replaced. The problem was the two .* both at the beginning and the end of the regex. Only the one at the end is needed.
REGEX:
([a-z\d\-:\/]+\.particulardomain\.com)(.*)
Works for the following strings:
Domain present:
Start of string 1234<br>https://subdomain.particulardomain.com<br>End of string 999
Domain without protocol:
Start of string 1234<br>subdomain.particulardomain.com<br>End of string 999
No domain:
Start of string 1234<br>End of string 999
Thanks everyone for helping me rethink the problem!
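For reference, a short sketch of how the regex from this answer might be applied with replace(). The replacement string "$2$1" and the name moveDomain are assumptions, since the answer does not show the replacement itself.

```javascript
// Group 1: the domain with an optional protocol, group 2: everything after it.
const domainAndRest = /([a-z\d\-:\/]+\.particulardomain\.com)(.*)/;

// Hypothetical helper: swaps the two groups so the domain ends up last.
function moveDomain(text) {
  return text.replace(domainAndRest, "$2$1");
}

console.log(moveDomain("Start of string 1234<br>https://subdomain.particulardomain.com<br>End of string 999"));
console.log(moveDomain("Start of string 1234<br>subdomain.particulardomain.com<br>End of string 999"));
console.log(moveDomain("Start of string 1234<br>End of string 999")); // no match, returned unchanged
```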
I see good answers here. As you explained, you need three groups and want to move the domain to the end of the string (it is not clear to me whether you mean the entire URL or only the domain, e.g. particulardomain.com).
You can do this:
// I don't know whether the <br> tags matter for your problem; I assume they are only separators.
// This is your input:
let str = "Start of string 1234<br>https://subdomain.particulardomain.com<br>End of string 99";
let group = str.split("<br>");
// Find the part that is the domain, validating it separately with the regExp you made
let indexOfDomain = group.findIndex(part => /^[a-z\d\-:\/]+\.particulardomain\.com$/.test(part));
// If a domain was found, move it to the end
if (indexOfDomain !== -1) {
  group.push(group.splice(indexOfDomain, 1)[0]);
}
console.log(group.join("<br>"));
TO KEEP IN MIND:
Your solution will not work 100% of the time. Why?
Your regExp:
([a-z\d\-:\/]+\.particulardomain\.com)(.*)
will match http, https, or anything else made of those characters that is not a protocol, and it matches nothing for this input (you can test it and leave a comment):
Start of string 1234<br>End of string 999
The regExp from @Valdi_Bo's answer:
(.*?)((https?:)?\/\/[a-z\d\-]+\.particulardomain\.com)(.*)
fits what you described in the question.
It does not fit all of your inputs, though; maybe he did not test it against all of them, because you did not describe them in your question the way you did in your own answer.
In conclusion, in the end you need to extract the domain (I don't know whether that means the entire URL, since you mix the two up). If you do a split and then validate each part with the regExp, it will be easier.

Replace Numbers with dots

I am trying to replace some ID numbers in my system with clickable numbers that open the related record. The problem is that they are sometimes in this format: 123.456.789.
When I use my regex, I can replace them and it works fine. The problem occurs when I also have IP addresses, where the regex also matches: 123.[123.123.123] (the [] indicates where it matches).
How can I prevent this behavior?
I tried something like this: /^(?!\.)([0-9]{3}\.[0-9]{3}\.[0-9]{3})(?!\.)/
I am working on "notes" in a ticket system. When the note contains only the ID or an IP, the regexp is working. When it contains more text like:
Affected IDs:
641.298.855 (this, lead)
213.794.868
948.895.285
Then it no longer matches my IDs. Could you help me with this issue and explain what I am doing wrong?
Add gm modifier:
/^(?!\.)([0-9]{3}\.[0-9]{3}\.[0-9]{3})(?!\.)/gm
https://regex101.com/r/pK1fV4/2
You don't need the negative lookahead at the start, because ^ already anchors the match to the start of a line and the next character must be a digit. Keep both modifiers, though: m makes ^ match at the start of every line, and g is needed to find every match in the note, since without it replace() only handles the first one.
/^([0-9]{3}\.[0-9]{3}\.[0-9]{3})(?!\.)/gm
If you don't use the captured value, you also don't need the capturing group:
/^[0-9]{3}\.[0-9]{3}\.[0-9]{3}(?!\.)/gm
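A minimal sketch of the replacement on a multi-line note. It uses both the g and m flags so that an ID at the start of any line is replaced. The /records/ URL and the name linkIds are invented for illustration; the real link format of the ticket system is not given in the question.

```javascript
// IDs at the start of a line, not followed by another dot (which would indicate an IP).
const idPattern = /^[0-9]{3}\.[0-9]{3}\.[0-9]{3}(?!\.)/gm;

// Hypothetical helper: wraps every matching ID in a link to a made-up /records/ URL.
function linkIds(note) {
  return note.replace(idPattern, (id) => `<a href="/records/${id}">${id}</a>`);
}

const note = [
  "Affected IDs:",
  "641.298.855 (this, lead)",
  "213.794.868",
  "948.895.285",
  "123.123.123.123",
].join("\n");

console.log(linkIds(note));
```

The last line, an IP address, is left untouched because the lookahead rejects a fourth dotted group.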

Regular expression for Phone Numbers with different lengths

I searched on Google for phone number regex validations but haven't been able to make it work based on my requirements.
Basically, I have three separate sets of rules for the prefix:
For 10-digit numbers I need to make sure the first 3 are digits from 2-9.
For 11-digit numbers I need to make sure the first 4 are digits from 1-9.
For anything with 12 or more digits I need to make sure the first 7 are digits from 0-9.
After that I can allow letters like 1888GOSUPER or something like that (this would fall under the second condition)
This is what I have so far but I am not certain if I have covered everything:
var reg10 = /^[2-9]{3}[a-z0-9]+$/i;
var reg11 = /^[1-9]{4}[a-z0-9]+$/i;
var reg12plus = /^[0-9]{7}[a-z0-9]+$/i;
This can be handled by one regex (including your check for length, as suggested by others). Probably can be done more succinctly than this, but I feel this is more readable in the context of your 3 specifically separate prefix requirements:
^(?:[2-9]{3}[a-z0-9]{7})$|^(?:[1-9]{4}[a-z0-9]{7})$|^(?:[0-9]{7}[a-z0-9]{5,})$
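This basically combines your three separate cases via alternation (|). Use it with the i flag, as in your own patterns, so that letters in either case are accepted.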
This can be "normalised" slightly, without "breaking" the clarity of intent, by grouping the entire expression and then surrounding with start/end anchors (rather than repeating these in each option, as above). Although this results in a similar length rule overall, by the time we add our additional non-capturing group:
^(?:(?:[2-9]{3}[a-z0-9]{7})|(?:[1-9]{4}[a-z0-9]{7})|(?:[0-9]{7}[a-z0-9]{5,}))$
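A quick way to try the combined pattern in JavaScript is sketched below, with the i flag added as in the question's own regexes. The sample numbers and the name phonePattern are made up for testing.

```javascript
// 10 characters: first 3 from 2-9; 11 characters: first 4 from 1-9;
// 12 or more characters: first 7 from 0-9. Letters are allowed after the prefix.
const phonePattern =
  /^(?:[2-9]{3}[a-z0-9]{7}|[1-9]{4}[a-z0-9]{7}|[0-9]{7}[a-z0-9]{5,})$/i;

const phoneSamples = [
  "2345551234",   // 10 characters, valid prefix
  "1888GOSUPER",  // 11 characters, valid prefix
  "0123456ABCDE", // 12 characters, valid prefix
  "1234567",      // too short
  "0345551234",   // 10 characters, but the prefix contains 0
];

for (const sample of phoneSamples) {
  console.log(sample, phonePattern.test(sample));
}
```

Because the pattern has no g flag, calling test() repeatedly on the same regex object is safe.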
