Finding a specific string is relatively easy, but I am not sure where to begin with this one. I need to extract a string that is different every time but has similar characteristics.
Here are some example strings I need to find in a paragraph, either at the beginning, at the end, or somewhere in the middle:
7b.9t.7iv.4x
4ir.4i.5i.6t
7ix.7t.4t.0z
As you can see, the string always begins with a digit followed by up to 2 letters, and it always contains 4 such octets separated by dots.
Let me know if you need more details.
EDIT:
Thanks to the answer below, I came up with this, which does what I need:
$body = "test 1f.9t.7iv.4x test 1a.9a.7ab.4xa test ";
// Four "digit + 1-2 letters" octets separated by dots
$pattern = '/[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}/';
// $0 in the replacement is the whole match; preg_replace rewrites every match in a
// single pass, so an ID that appears twice is not wrapped twice
$body = preg_replace($pattern, "<span id='ip_$0'>$0</span>", $body);
You can use a regular expression.
Something like this should get you started. Since the same piece is repeated four times, there may be a more compact way to write it, but...
([0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2})
( Start a capture group
[0-9] match any digit, 0 through 9
[a-z] match any lowercase letter, a through z
{1,2} but only match the previous token 1 or 2 times
\. match a literal dot; the \ is needed as an escape because . is a special character that matches any character
) End capture group
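Here is a rough JavaScript sketch of the pattern in use (the sample paragraph is made up). With the g flag, match() returns every ID, wherever it sits in the text. It also shows one possible compact form, which repeats the "dot plus octet" part with {3} instead of writing it out; both should find the same IDs.

```javascript
// The pattern from above, with the g flag so match() returns every occurrence.
const idPattern = /[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}/g;

// A more compact equivalent: one octet, then three more ".octet" groups.
const compactPattern = /[0-9][a-z]{1,2}(?:\.[0-9][a-z]{1,2}){3}/g;

// Made-up paragraph with IDs at the beginning, in the middle and at the end.
const paragraph = "7b.9t.7iv.4x was logged after 4ir.4i.5i.6t and before 7ix.7t.4t.0z";

// With a global regex, match() returns an array of all matches, or null if none.
const ids = paragraph.match(idPattern) || [];
const compactIds = paragraph.match(compactPattern) || [];

console.log(ids); // → ["7b.9t.7iv.4x", "4ir.4i.5i.6t", "7ix.7t.4t.0z"]
console.log(ids.join() === compactIds.join()); // → true
```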
Both PHP and JavaScript support regular expressions.
For an even better visual representation you can check out this tool: http://www.debuggex.com/
If you need each octet by itself (as a separate match), you can add more parentheses () around each [0-9][a-z]{1,2}, which will then capture those octets individually.
Also note that \d is the same as [0-9], but I prefer the latter as I find it a little more readable.
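To illustrate the capture-group suggestion, here is a small JavaScript sketch (the input string is made up) with each octet in its own group; exec() then returns the whole ID at index 0 and the octets at indexes 1 to 4.

```javascript
// Each octet in its own capture group, so groups 1 to 4 hold the octets.
const octetPattern = /([0-9][a-z]{1,2})\.([0-9][a-z]{1,2})\.([0-9][a-z]{1,2})\.([0-9][a-z]{1,2})/;

const m = octetPattern.exec("test 1f.9t.7iv.4x test");

// m[0] is the whole ID; m.slice(1) drops it and keeps just the four octets.
const whole = m ? m[0] : null;
const octets = m ? m.slice(1) : [];

console.log(whole);  // → "1f.9t.7iv.4x"
console.log(octets); // → ["1f", "9t", "7iv", "4x"]
```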
Related
I need to parse an email.subject line and look for the CaseID-
The CaseID- is sometimes at the beginning and sometimes at the end, so I need to pull it out wherever it is. I don't know the length of the case ID, but I do know it is all digits run together.
Here are sample subject lines:
[EXTERNAL] SupportCentral Case Ownership, CASEID-372146
[EXTERNAL] CaseID-372128, SupportCentral Case Transferred, Testing Dispatch
How do I do this?
Thank you,
Stacy
Welcome to Stack Overflow. This is a job for a regular expression, as @gview mentioned.
const subj = 'Subject: please refer to CaseID-123456 in future correspondence'
const caseNumber = subj.replace(/^.*CaseID-(\d+).*$/,'$1')
console.log(caseNumber)
You can read about regular expressions here.
The regular expression in this case is /^.*CaseID-(\d+).*$/.
/ means the start of the regex
^ means match the start of the string you give it.
. means match any character (except line breaks)
.* means match zero or more of any character
CaseID- means match that text
( means begin a so-called capture group
\d means match any numeric digit [0-9]
\d+ means match one or more numeric digits
) ends the capture group
.* again means match zero or more of any character
$ means match the end of the string.
/ ends the regex
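And in the second argument to replace() we have '$1', which means replace the matched text with the first capture group. Because the pattern matches the whole subject from ^ to $, only the captured digits are left.
And shazam, you have your case ID number: 123456 in my example.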
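One caveat worth checking against the question's own samples: the first subject spells it CASEID-372146, and the pattern above is case-sensitive, so replace() finds no match there and returns the whole subject unchanged. A minimal JavaScript sketch, assuming you only need the digits, uses match() with the i flag instead; extractCaseId is a hypothetical helper name, and it returns null when there is no case ID.

```javascript
// The two sample subjects from the question.
const subjects = [
  "[EXTERNAL] SupportCentral Case Ownership, CASEID-372146",
  "[EXTERNAL] CaseID-372128, SupportCentral Case Transferred, Testing Dispatch",
];

// Case-sensitive replace(): no match on "CASEID-", so the subject comes back unchanged.
console.log(subjects[0].replace(/^.*CaseID-(\d+).*$/, "$1"));

// Hypothetical helper: the digits after "CaseID-" in any casing, or null if absent.
function extractCaseId(subject) {
  const m = subject.match(/CaseID-(\d+)/i); // i flag: also matches CASEID- and caseid-
  return m ? m[1] : null;
}

console.log(subjects.map(extractCaseId)); // → ["372146", "372128"]
```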
In the old days we used to say that the best way to write regular expressions was to whistle into a modem. Now, writing them involves cats on keyboards.
EDIT
Given the clarification in the OP's new edit, the other answer is a better fit. However, I did not have that information when I wrote my answer, so I'd defer to that one.
const id = "CaseID-23453245643245642345632456543245-CaseID"
const regex = /^.*CaseID-(\d+).*$/;
const decode = id.replace(regex,'$1')
console.log(decode);
I have a filename that will be something along the lines of this:
Annual-GDS-Valuation-30th-Dec-2016-082564K.docx
It will contain 5 digits followed by a single letter, but that code may be in a different position in the file name. The leading zero may or may not be there, and it is not needed.
This is the code I came up with after checking examples; however, SelectedFileClientID is always null:
var SelectedFileClientID = files.match(/^d{5}\[a-zA-Z]{1}$/);
I'm not sure what I am doing wrong.
Edit:
The 0 has nothing to do with the code I am trying to extract. It may or may not be there, and it could even be a completely different character, or more than one, but it is not part of the code at all. The client has decided they want to put additional characters there.
There are at least 3 issues with your regex: 1) the pattern is enclosed in anchors (^ and $), so it requires a full-string match; 2) d matches a literal letter d, not a digit, so you need \d to match a digit; 3) \[ matches a literal [, so the character class is ruined.
Use
/\d{5}[a-zA-Z]/
Details:
\d{5} - 5 digits
[a-zA-Z] - an ASCII letter
JS demo:
var s = 'Annual-GDS-Valuation-30th-Dec-2016-082564K.docx';
var m = s.match(/\d{5}[a-zA-Z]/);
console.log(m[0]);
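Note that match() returns null when nothing matches, so m[0] in the demo throws a TypeError for a file name without a code. A sketch with a guard (getClientId is a hypothetical helper name):

```javascript
// Hypothetical helper: pull the "5 digits + 1 letter" client ID out of a file name.
// Returns null instead of throwing when the name contains no such code.
function getClientId(fileName) {
  const m = fileName.match(/\d{5}[a-zA-Z]/);
  return m ? m[0] : null;
}

console.log(getClientId("Annual-GDS-Valuation-30th-Dec-2016-082564K.docx")); // → "82564K"
console.log(getClientId("Annual-GDS-Valuation.docx")); // → null
```

Because the match has to be exactly 5 digits immediately followed by a letter, extra leading characters such as the 0 are simply left out: for the sample name, the match ends up starting at the 8.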
All right, there are a few things wrong...
var matches = files.match(/\-0?(\d{5}[a-zA-Z])\.[a-z]{3,}$/);
var SelectedFileClientID = matches ? matches[1] : '';
So:
First, I get the matches on your string -- .match()
Then, your file name will not start with the digits - so drop the ^
You had forgotten the backslash for digits: \d
Do not backslash your opening square bracket: here it is a regular expression token that starts a character class
There is no need for the {1} after your letters: a character class on its own matches one, and only one, letter.
Hope this helps!
Try this pattern: \d{5}[a-zA-Z]
Try: 0?\d{5}[a-zA-Z]
As you mentioned, the 0 may or may not be there, so 0? takes that into account.
Alternatively, it can be done like this, which can also match other characters in front of the code:
(\w+|\W+|\d+)?\d{5}[a-zA-Z]
I want to check by regex if:
String contains number
String does not contain special characters (!<>?=+#{}_$%)
Now it looks like:
^[^!<>?=+#{}_$%]+$
How should I edit this regex to also check that there is a number somewhere in the string (it must contain at least one)?
You can add [0-9]+ or \d+ into your regex, like this:
^[^!<>?=+#{}_$%]*[0-9]+[^!<>?=+#{}_$%]*$
or
^[^!<>?=+#{}_$%]*\d+[^!<>?=+#{}_$%]*$
For the difference between [0-9] and \d, see here.
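A quick JavaScript check of this pattern (the test strings are made up): it should accept a string only when there is at least one digit and no forbidden character, and it rejects the empty string.

```javascript
// Pattern from this answer: allowed characters, a run of digits, allowed characters.
const re = /^[^!<>?=+#{}_$%]*\d+[^!<>?=+#{}_$%]*$/;

// Made-up inputs covering: no digit, digit at the end, digits at both ends,
// a forbidden character, and the empty string.
for (const s of ["bob", "bob1", "1bob2", "bob#1", ""]) {
  console.log(JSON.stringify(s), re.test(s));
}
// → "bob" false, "bob1" true, "1bob2" true, "bob#1" false, "" false
```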
Just look ahead for the digit:
var re = /^(?=.*\d)[^!<>?=+#{}_$%]+$/;
console.log(re.test('bob'));  // false: no digit
console.log(re.test('bob1')); // true
console.log(re.test('bob#')); // false: no digit, and # is not allowed
The (?=.*\d) part is the lookahead for a single digit somewhere in the input.
You only needed to add the number check, is that right? You can do it like so:
/^(?=.*\d)[^!<>?=+#{}_$%]+$/
We do a lookahead (like peeking at the following characters without moving where we are in the string) to check to see if there is at least one number anywhere in the string. Then we do our normal check to see if none of the characters are those symbols, moving through the string as we go.
Just as a note: . does not match newlines (a.k.a. line breaks), so if the input can span several lines and the digit may come after a line break, change the dot . in the lookahead into [\W\w]. This matches any character whatsoever. You can do this in a number of ways, but they're all pretty much as clunky as each other, so it's up to you.
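A small JavaScript sketch of the difference, using made-up input where the only digit is on the second line. It compares the lookahead pattern above, the same pattern with [\W\w], and (one more option, assuming an ES2018+ engine) the original pattern with the s flag, which makes . match line breaks too:

```javascript
// Lookahead version from above: "." stops at a line break, so a digit that
// only appears on a later line is never seen by (?=.*\d).
const dotVersion = /^(?=.*\d)[^!<>?=+#{}_$%]+$/;

// Same pattern with [\W\w] in the lookahead, which matches any character at all.
const anyCharVersion = /^(?=[\W\w]*\d)[^!<>?=+#{}_$%]+$/;

// Original pattern with the ES2018 s (dotAll) flag, which lets "." match line breaks.
const dotAllVersion = /^(?=.*\d)[^!<>?=+#{}_$%]+$/s;

// Made-up input: the only digit is on the second line. The negated class
// [^...]+ already matches the line break, since it is not one of the listed symbols.
const input = "first line\nsecond line 2";

console.log(dotVersion.test(input));     // → false
console.log(anyCharVersion.test(input)); // → true
console.log(dotAllVersion.test(input));  // → true
```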
Say I have a string which contains some units (which may or may not have prefixes) that I want to break into the individual units. For example, the string may contain "Btu(th)" or "Btu(th).ft" or even "mBtu(th).ft", where mBtu(th) is the bastardised unit milli thermochemical BTUs (this is purely an example).
I currently have the following (simplified) regex however it fails for the case "mBtu(th).ft":
/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g
Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand JavaScript regex does not support lookbehind, so how do I accurately parse the string?
Additional notes
The regex presented above is greatly simplified around the prefixes and units groups. The prefixes could span multiple characters like 'Ki' and therefore character sets are not suitable.
The desire is for each match to capture the prefix as group 1 and the unit as group 2, i.e. for 'mBtu(th).ft' match one would be ['m','Btu(th)'] and match two would be ['','ft'].
The prefix match needs to be lazy so that the string 'm' would be matched as the unit metres rather than the prefix milli. Likewise the match for 'mm' would need to be the prefix milli and the unit metres.
I would try with:
/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g
At least with the example above, it matches all the units merged into one string.
EDIT
Another try:
/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g
This one again matches only one part at a time, but if you use $1, $2, $3, $4, etc. you can extract the other fragments. It ignores the ., ( and ) characters. The tricky part is working out which numbered group actually matched, but it works to some degree.
Or, if you can accept multiple separate matches, I think a simple alternative is:
/(m|k|Btu|th|ft)/g
A word boundary will not separate two non-word characters. So, you don't actually want a word boundary since the parentheses and period are not valid word characters. Instead, you want the string to not be followed by a word character, so you can use this instead:
[mk]??(Btu\(th\)|ft|m)(?!\w)
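Building on this, here is a JavaScript sketch that puts the lazy prefix back in its own group, (m|k)??, so that each match yields [prefix, unit] as the question asks. splitUnits is a hypothetical helper, matchAll() needs an ES2020+ engine, and the prefix and unit lists are still the simplified ones from the question; for multi-character prefixes you would presumably extend the group, e.g. (Ki|m|k)??, which is an untested assumption here.

```javascript
// Prefix in its own lazy, optional group (group 1), unit in group 2, and
// (?!\w) instead of \b, so a unit ending in ")" may be followed directly by ".".
const unitPattern = /(m|k)??(Btu\(th\)|ft|m)(?!\w)/g;

// Hypothetical helper: returns one [prefix, unit] pair per unit found.
function splitUnits(str) {
  // Group 1 is undefined when the lazy prefix matched nothing, so default to "".
  return [...str.matchAll(unitPattern)].map((m) => [m[1] || "", m[2]]);
}

console.log(splitUnits("mBtu(th).ft")); // → [["m", "Btu(th)"], ["", "ft"]]
console.log(splitUnits("mm"));          // → [["m", "m"]]
console.log(splitUnits("m"));           // → [["", "m"]]
```

Because the prefix group is lazy, the engine first tries "no prefix"; for "mm" that makes the first m a unit, (?!\w) then fails on the second m, and backtracking turns the first m into the prefix, which matches the milli-metres reading the question wants.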
I believe you're after something like this, if I understood correctly that you want to match any kind of element, possibly preceded by the m or k character and separated by parentheses or dots.
/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g
https://regex101.com/r/eQ5nR4/2
If you don't need to match the parentheses and just want to return the elements, you can just do:
/(m|k?)(\w+)/g
https://regex101.com/r/oC1eP5/1
I'm learning regex, but this one gives me a headache. I need to match a floating-point number (with either . or , as the decimal separator) and it MUST end with the following characters: €/g.
Valid matches should be for example:
40€/g
43.33€/g
40,2€/g
40.2€/g
38.943€/g
I'd appreciate any help.
The regex will look like:
\d+(?:[.,]\d+)?€/g
In JavaScript, as a regex literal (note that the forward slash needs to be escaped):
/\d+(?:[.,]\d+)?€\/g/
Here's a breakdown of what each part does:
\d+ # one or more digits
(?: # ... don't capture this group separately
[.,] # decimal point or comma
\d+ # one or more digits
)? # make the group optional
€/g # fixed string to match
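A quick JavaScript check against the question's examples. The ^ and $ anchors are an addition here (assuming you want to validate a whole string rather than find the price inside longer text); the made-up invalid strings should all be rejected:

```javascript
// The answer's pattern, anchored with ^ and $ so the whole string must be
// a number followed by €/g. No g flag, so test() keeps no lastIndex state.
const pricePattern = /^\d+(?:[.,]\d+)?€\/g$/;

const valid = ["40€/g", "43.33€/g", "40,2€/g", "40.2€/g", "38.943€/g"];
const invalid = ["40€", "40.€/g", ".5€/g", "40,2 €/g"];

console.log(valid.every((s) => pricePattern.test(s)));  // → true
console.log(invalid.some((s) => pricePattern.test(s))); // → false
```

.5€/g is rejected because \d+ requires at least one digit before the decimal separator.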
If you want to allow something like .123€/g to be valid as well, you can use:
(?=[.,]|\d)(?:\d+)?(?:[.,]\d+)?€/g
That is, both the groups of digits are optional, but at least one must be present (this uses lookahead, which is a bit more tricky).
Note that this will also match constructions like 'word2€/g'. If you want to prevent this, start the regex with (?<=^|\s) (matches if preceded by a space or the start of the string) and end it with (?=$|\s) (matches if followed by a space or the end of the string).
Full-blown version:
(?<=^|\s)(?=[.,]|\d)(?:\d+)?(?:[.,]\d+)?€/g(?=$|\s)
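In JavaScript, the full-blown version needs the / escaped, and the lookbehind (?<=^|\s) requires an ES2018+ engine (current Node.js and browsers support it). A sketch with the g flag to pull every price out of a made-up sentence:

```javascript
// Full-blown version from above as a JS literal: "/" escaped, g flag to find all.
const inText = /(?<=^|\s)(?=[.,]|\d)(?:\d+)?(?:[.,]\d+)?€\/g(?=$|\s)/g;

// Made-up text: two stand-alone prices and one glued to a word.
const text = "Prices: 40,2€/g and .5€/g but not word2€/g";

console.log(text.match(inText)); // → ["40,2€/g", ".5€/g"]
```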
\d+([.,]\d+)?€/g
should work, I guess.
Are you really sure you need a regex for this? It might be easier to leverage the built-in floating-point parsing instead: take whatever comes before the euro sign, normalize commas to decimal points (Number only understands the . form), and then try to parse it with the Number function. Note that you would need to check whether the conversion worked with the Number.isNaN function.
Another possibility is to just use the parseFloat function. Since it ignores any characters after the number, it would parse "40€" as 40. However, that might not be what you want, since it would also accept things like "40a" and "40b".
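A minimal JavaScript sketch of the non-regex approach, assuming the input always ends in €/g; parsePricePerGram is a hypothetical helper name:

```javascript
const SUFFIX = "€/g";

// Hypothetical helper: strip the "€/g" suffix, turn a decimal comma into a
// point (Number() only understands "."), then let Number() do the parsing.
// Returns NaN for anything it cannot parse.
function parsePricePerGram(input) {
  if (!input.endsWith(SUFFIX)) return NaN;
  const numberPart = input.slice(0, -SUFFIX.length).replace(",", ".");
  // Number("") and Number("  ") are 0, so reject an empty number part explicitly.
  if (numberPart.trim() === "") return NaN;
  return Number(numberPart);
}

const price = parsePricePerGram("40,2€/g");
if (!Number.isNaN(price)) {
  console.log(price); // → 40.2
}

// parseFloat() stops at the first character it cannot use, so it is more lenient.
console.log(parseFloat("40€")); // → 40
console.log(parseFloat("40a")); // → 40
```

Be aware that Number() is looser than the regex in other ways: it also accepts surrounding whitespace, exponents such as 1e3 and hex such as 0x10, so "1e3€/g" would parse as 1000.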