RegExp word boundary with special characters (.) in JavaScript

I want to find "U.S.A." (without the quotes) if it is in a string as a whole word.
So good strings should be
U.S.A.
u.s.a.
U.S.A. is a great country
Is this U.S.A. ?
Bad strings are
U.S.A.mnop
U.S.A
I tried using
/\bU\.S\.A\.\b/i
But strangely, the one that works is the following (though it fails for other countries, so it is not useful):
/\bU\.S\.A\.\B/i
This seems to be the opposite of what I understood from the documentation. I have searched for this, and there are lots of similar problems, but none of them helped me understand the issue. I think that the last "." is being consumed by \b and that is why it does not work, but I am still confused.
Can someone please explain this and give the right search pattern? It should also do a proper whole-word search for other strings without special characters.

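You can check for the word boundary first (as you were doing), but the tricky part is the end, where you can't use the word boundary because of the trailing ".". \b only matches between a word character and a non-word character, and both the "." and whatever follows it in your good strings (a space or the end of the string) are non-word. However, you can check for a whitespace character or the end of the string instead: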
/\b(u\.s\.a\.)(?:\s|$)/gi
Check out the Regex101
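As a rough check of the explanation above, the sketch below runs the answer's pattern against the sample strings from the question. Two things here are my own choices rather than part of the answer: the g flag and the capturing group are dropped (with g, test() remembers lastIndex between calls), and the variable names are purely illustrative.

```javascript
// The answer's pattern without the g flag and capturing group (assumption:
// we only need a yes/no result, so neither is required here).
const usa = /\bu\.s\.a\.(?:\s|$)/i;

const good = ["U.S.A.", "u.s.a.", "U.S.A. is a great country", "Is this U.S.A. ?"];
const bad = ["U.S.A.mnop", "U.S.A"];

const goodResults = good.map((s) => usa.test(s)); // every entry is true
const badResults = bad.map((s) => usa.test(s));   // every entry is false

// The original attempt with a trailing \b: after the final "." (a non-word
// character), \b can only match if a word character follows.
const withTrailingBoundary = /\bU\.S\.A\.\b/i;
const failsOnGood = withTrailingBoundary.test("U.S.A. is a great country"); // false
const matchesBad = withTrailingBoundary.test("U.S.A.mnop");                 // true
```

If the trailing whitespace should not be part of the match (for example when replacing), a lookahead such as (?=\s|$) in place of (?:\s|$) is one possible variation.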

Related

How to handle edge case while trying to grab all digits including decimal between two chars using regex

I am parsing a series of strings with various formats. The last edge case encountered has me stumped. I'm not a great regexer, believe me it was a challenge to get to this point.
Here are critical snippets from the strings I'm trying to parse. The second example is the current edge case I'm stuck on.
LBP824NW2-58.07789x43.0-207C72
LBP824WW1-77.6875 in. x 3.00 in. 24VDC
I am trying to grab all of the digits (including the decimal) that make up the width part of the dimension in the string (this would be the first number in the dimension). What works in every other case has been grabbing all digits from the "-" to the "x" using the following expression:
/-(\d+\.?\d+?)x\B/
However, this does not handle the cases that have inches included in the dimension. I thought about using "look-aheads" or "look-behinds", but I got confused. Any suggestions would be appreciated.
RegEx can be told to look for "zero or one" of things, using (...)? syntax, so if your pattern already works but it gets confused by a new pattern that simply has "more string data embedded in what is otherwise the same pattern" you can add in zero-or-one checks and you should be good to go.
In this case, putting something like (\s*in\.?\s*)? in a few tactical places to either match "any number of spaces (including none) followed by in followed by an optional full stop followed by any number of spaces (including none)" or nothing should work.
That said, "I cannot change the formatting" is almost never an argument, because while you can't change the formatting, you can almost always change what parses it. RegEx might be adequate, but some code that checks which general pattern a string follows, and then calls the appropriate function for tokenizing and inspecting that specific pattern, should be quite possible. Unless you've been hired to literally update some predefined CLI script that has a grep in it and you're not allowed to touch anything except for the pattern...
This is the working solution using regex: -(\d+\.?\d+?)(\s*in\.?\s*|x)
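The optional-group idea can be sketched as follows. This is a variation on the pattern above, not the poster's exact expression: the helper name parseWidth is hypothetical, and I have assumed that the width is always followed by an "x", optionally preceded by "in" or "in.". It also uses \d+(?:\.\d+)? so that a single-digit width would still be accepted.

```javascript
// Hypothetical helper: extract the width (first number of the dimension).
function parseWidth(partNumber) {
  // "-" + digits with an optional decimal part + optional "in"/"in." + "x"
  const match = /-(\d+(?:\.\d+)?)(?:\s*in\.?)?\s*x/.exec(partNumber);
  return match ? Number(match[1]) : null;
}

const plainWidth = parseWidth("LBP824NW2-58.07789x43.0-207C72");        // 58.07789
const inchWidth = parseWidth("LBP824WW1-77.6875 in. x 3.00 in. 24VDC"); // 77.6875
```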

Pain-free way to remove trailing commas in javascript

IE (<9?) doesn't tolerate trailing commas at the end of an object or list.
I learned this too late, after developing for a few months in Chrome.
Now I have to search for every place I put a trailing comma, which is really painful.
Is there any way (preferably automatic) to do this? Like an editor plugin, or some script that searches for these commas and replaces them with nothing?
JSLint is an option, but it throws a lot of other warnings, and I have to paste in the scripts (which sometimes contain server-side template tags...).
Some examples would have been good, and it would help to know which editor you use.
Notepad++ and UltraEdit both support Perl regular expression replaces with backreferences.
So you could try a Perl regular expression replace searching with the expression ,(\s*?[)\]]) and using \1 as replace string.
This expression finds a comma before a closing round or square bracket with 0 or more spaces/tabs/line terminators in between, and the replacement keeps everything except the comma.
You should run this replace manually on your JavaScript code and check each match before replacing it. You may also need to run the replace several times in case of multiple commas at the end of a list.
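The same search-and-replace can be sketched as a small script. The function name stripTrailingCommas is hypothetical, and adding } to the character class (so that object literals are covered too) is my assumption, not part of the expression above. Like the editor replace, it is purely textual, so it would also alter a matching comma inside a string literal; review the result before committing it.

```javascript
// Hypothetical helper: drop a comma that directly precedes a closing
// ")", "]" or "}" (the "}" is an added assumption), keeping any
// whitespace between them. In JavaScript the backreference is "$1".
function stripTrailingCommas(source) {
  return source.replace(/,(\s*[)\]}])/g, "$1");
}

const listFixed = stripTrailingCommas("[1, 2, ]");    // "[1, 2 ]"
const objectFixed = stripTrailingCommas("{a: 1,\n}"); // "{a: 1\n}"
```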

Regex will not match

This is my string:
<link href="/post?page=4&tags=example" rel="last" title="Last Page">
From there I am trying to obtain the 4 out of that page parameter, using this regular expression:
link href="/post?page=(.*?)&tags=(.*?)" rel="last"
I will then collect the 4 out of the first group, the tags parameter has a wildcard because the contents can change. However, I don't seem to be getting a match with this, can anyone help?
And I know I shouldn't be using regex to parse HTML, but this is just a small thing and it would be a waste to import a huge module for this.
Assuming you are using a /regex literal/, you will need to escape the / in that path as \/.
Alternatively, it depends on how you are getting this string. Is it really typed that way, or is it part of an innerHTML that you are then reading out again? If that's the case, then the innerHTML won't be what you expect it to be, because the browser will "normalise" it.
If it is an innerHTML, then it'd be far easier to get the tag, then get the tag's href attribute, then regex that.
link href="/post\?page=(.*?)&tags=(.*?)" rel="last"
You forgot the backslash before ?
I think it might be better to change your capture groups to something a little different that still catches everything up to the terminating character:
link href="/post\?page=([^&]+)&tags=([^\"]+)" rel="last"
Putting the negation character (^) first in a character class tells the regex engine "match any character EXCEPT the ones listed here". This makes it very easy to capture everything up until it hits a terminating character, such as the ampersand or the double quote. Assuming you're using PHP or Java, this should also slightly improve regex performance.
If the page parameter always comes first, try the PCRE /\?page=(\d+)/. Match group 1 will contain the page number.
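Putting the suggestions above together, a minimal sketch might look like this. It assumes JavaScript and a regex literal (the question does not say which language is in use), so both the "/" and the "?" are escaped; the variable names are illustrative only.

```javascript
const html = '<link href="/post?page=4&tags=example" rel="last" title="Last Page">';

// "/" is escaped because this is a regex literal, "?" because it is a quantifier.
const linkPattern = /link href="\/post\?page=([^&]+)&tags=([^"]+)" rel="last"/;
const match = linkPattern.exec(html);

const page = match ? match[1] : null; // "4"
const tags = match ? match[2] : null; // "example"

// Unescaped, "t?" means "an optional t", so the literal "?" is never matched.
const unescapedMatches = /post?page=/.test(html); // false
```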

Stuck on a regular expression

I seem to have backed myself up into a corner with this. I'm sure that the answer is going to make me want to smack a brick against my head, but I'm not all that good with regex just yet. So, here goes.
I need to modify this regex so that it fails if it finds any occurrences of pound signs. (#)
My current regex is this;
/^[A-Za-z\.\-\_\s]{1,80}$/i
I tried a number of variations such as;
/[^#]^[A-Za-z\.\-\_\s]{1,80}$/i
/^[[^#]A-Za-z\.\-\_\s]{1,80}$/i
/^[A-Za-z\.\-\_\s^#]{1,80}$/i
None of which work. Can anyone offer any advice, please?
Your original regex should work, because # isn't in the list of characters you specified for the class. You don't need to add anything, it already fails if there's a # in there.
Just use two regexes:
/^[A-Za-z\.\-\_\s]{1,80}$/i
Then filter your input so that you keep only what does NOT match this regex:
/#/
It's far easier to match on patterns that you want to filter out (and then ignoring the matching strings instead of ignoring the complement) than it is to try to construct an "inverse" regex. And there's no reason why you should try to fit everything into one regex.
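A quick check of the first answer's point, that the original character class already rejects "#", could look like the sketch below. The sample inputs are made up for illustration, and the i flag is omitted because A-Za-z already covers both cases.

```javascript
// The original pattern from the question; "#" is not in the class, and the
// ^...$ anchors force every character of the input to come from the class.
const namePattern = /^[A-Za-z.\-_\s]{1,80}$/;

const acceptsPlain = namePattern.test("John Smith-Jones"); // true
const acceptsHash = namePattern.test("John #Smith");       // false

// The second answer's two-step approach gives the same verdict here.
const twoStep = (s) => namePattern.test(s) && !/#/.test(s);
```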
