Best practice for specifying number of new lines in regex - javascript

For a long time I have been using this to specify a number of newlines, and it works fine:
(?:.*[\r\n]+){n,m}
Is this the best practice for doing this?

It's better to use (?:.*\r?\n){n,m}, since [\r\n]+ in your regex matches one or more newline characters greedily. A run of consecutive line breaks (i.e. blank lines) then counts as a single separator, so the {n,m} count won't be exact.
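A small sketch of the difference, assuming Node.js or a browser console; the sample text is made up. Both regexes try to match exactly two lines at the start of the string:

```javascript
// Original form: [\r\n]+ swallows a whole run of line breaks at once.
const greedy = /^(?:.*[\r\n]+){2}/;
// Suggested form: each repetition consumes one optional \r plus exactly one \n.
const exact = /^(?:.*\r?\n){2}/;

const text = "line1\n\n\nline2\nline3\n";

// The greedy version treats the three consecutive "\n" as one separator,
// so its "two lines" cover line1, two empty lines and line2.
console.log(JSON.stringify(text.match(greedy)[0])); // "line1\n\n\nline2\n"

// The exact version counts the empty line as a line of its own.
console.log(JSON.stringify(text.match(exact)[0])); // "line1\n\n"

// Windows line endings still count as one break each with \r?\n.
console.log(JSON.stringify("a\r\nb\r\nc".match(exact)[0])); // "a\r\nb\r\n"
```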

Related

regex works in regex tester but not in pattern

It's quite a simple but, in my opinion, weird problem. I basically have this regex, and I entered a few tests and they work:
(?=^\*)|(?=^.{1,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-\{\}]{1,63}\.?)+(?:[a-zA-Z\{\}]{1,})$)
https://regex101.com/r/hU6tP0/2
But when I try to use it in HTML (as an input's pattern attribute), it fails, whereas if I test it in JavaScript it works.
http://jsfiddle.net/ek6kby2q/9/
I don't have much knowledge of regex at all, so if anyone knows what's going wrong or has any tips to make the regex better, that is welcome.
As an HTML attribute, the pattern must match the whole string from beginning to end. That's why (?=^\*) fails: it is a lookahead, so it matches zero characters and never consumes the "*" itself.
Use this pattern instead:
\*.*|(?!.{255})(?:[A-Za-z_{}-][\w{}-]{0,62}\.?)+[A-Za-z{}]+
(You can omit the ^ and $ anchors, since they are implicit in the pattern attribute.)
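To see why the tester and the form disagree, the sketch below approximates what the browser does by wrapping the attribute value in ^(?: ... )$ so it has to match the whole input. The helper name matchesPatternAttribute is hypothetical, and it ignores flags: browsers compile the pattern with their own flags (the u flag, or the v flag in newer versions of the HTML spec), so treat this as an approximation rather than the browser's exact algorithm.

```javascript
// Hypothetical helper: approximates an <input pattern="..."> check.
// The whole value must match, as if the pattern were wrapped in ^(?: ... )$.
// Assumption: no flags are applied here, unlike real browsers.
function matchesPatternAttribute(pattern, value) {
  return new RegExp(`^(?:${pattern})$`).test(value);
}

// String.raw keeps the backslashes exactly as written in the question/answer.
const original = String.raw`(?=^\*)|(?=^.{1,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-\{\}]{1,63}\.?)+(?:[a-zA-Z\{\}]{1,})$)`;
const suggested = String.raw`\*.*|(?!.{255})(?:[A-Za-z_{}-][\w{}-]{0,62}\.?)+[A-Za-z{}]+`;

// In plain JavaScript, test() only needs a match somewhere, so the
// zero-width lookahead (?=^\*) is enough to accept "*".
console.log(new RegExp(original).test("*")); // true

// As a pattern attribute, that empty match cannot cover the whole value.
console.log(matchesPatternAttribute(original, "*")); // false

// The rewritten pattern actually consumes the "*" (and anything after it).
console.log(matchesPatternAttribute(suggested, "*")); // true
console.log(matchesPatternAttribute(suggested, "example.com")); // true
console.log(matchesPatternAttribute(suggested, "foo bar")); // false
```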

Best way to match and catch doubly-entified character entities/references?

I'm talking about stuff like &amp;amp; which will then render as &amp; when it should actually render as &. In an earlier question I asked how to match entities, but it seems that isn't really possible or realistic with regexes. What, then, is the best way to match double entities?
EDIT: Is this a good way to do it? .replace(/&(?=#?x?[0-9a-z]+);/i, '&');
(I'm using javascript)
I'd go with
pattern &([a-zA-Z0-9]+?;)\1
replacement &$1
to replace just double amps, or:
pattern &([#a-zA-Z0-9]+?;)
EDIT:
your pattern
/&(?=#?x?[0-9a-z]+);/i
also looks good to me.
Note: none of these is something you can trust.
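A minimal sketch of the "strip one level of &amp;" idea, assuming the doubly-encoded text contains a literal &amp; directly in front of a named (&name;), decimal (&#123;) or hexadecimal (&#x1F;) entity body. The regex and the function name undoDoubleEncoding are illustrative, not taken from the answers here:

```javascript
// Matches "&amp;" followed by an entity body and its closing ";".
// Assumption: bodies are hex (#x1F), decimal (#123) or a name starting with a letter.
const doubleEntity = /&amp;(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi;

// Hypothetical helper: collapses "&amp;<body>;" back to "&<body>;".
function undoDoubleEncoding(text) {
  // $1 is the inner entity body without its leading "&" and trailing ";".
  return text.replace(doubleEntity, "&$1;");
}

console.log(undoDoubleEncoding("Tom &amp;amp; Jerry")); // Tom &amp; Jerry
console.log(undoDoubleEncoding("it&amp;#39;s")); // it&#39;s
console.log(undoDoubleEncoding("a &amp; b")); // a &amp; b (unchanged)
```

Like the patterns above, this cannot tell a genuinely double-encoded entity from text that was meant to display a literal entity such as &amp;copy;, which is why normalizing the data first is the more robust route.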
Possibly:
&[a-zA-Z]+;
Though not foolproof.
Normalize your data first. Use whatever you know about the encoding to decode it back to a form where each character or piece of data has only one possible encoding. After that, match this normalized data against a normalized pattern.

How can I write a regex that will match any string?

Yes, I know this sounds counter-intuitive. But I need it for a JavaScript mashup written by someone else. There is a regex value used to select project names to which the mashup will apply. I want it to apply to all projects.
As others have mentioned, the . doesn't match line feeds.
If you must match absolutely every character including \n then you can use this instead...
[\s\S]*
That's whitespace characters, and non-whitespace characters. In other words, that'll match everything.
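A quick check of the difference in JavaScript (the sample string is made up):

```javascript
const twoLines = "first line\nsecond line";

// . does not match line terminators, so .* stops at the first "\n".
console.log(twoLines.match(/.*/)[0]); // first line

// [\s\S] is every whitespace plus every non-whitespace character,
// i.e. any character at all, so [\s\S]* runs to the end of the string.
console.log(twoLines.match(/[\s\S]*/)[0] === twoLines); // true
```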
OK, folks. I'm not strong with regex, but I think I figured this out. I'm just using the following regex:
'.*'
This is working fine for me. The dot allows any character, and the asterisk allows it to be repeated any number of times.
If anybody knows how to limit this to a string that is a single line (ASCII Character set) I'm all ears.
Your answer (.*) works, and by default it will match only a single line. If you wanted it to run across lines, you would need the mode that lets . match newlines (usually called single-line or dotAll mode; multi-line mode only changes what ^ and $ match), but nobody enables it by default AFAIK.
You're correct that .* will match any character; the exception is newline characters ('\n', etc.). What you could try is grouping: (.*) should work for you. If you need help accessing the matched group, more info can be found here: How do you access the matched groups in a JavaScript regular expression?
EDIT: If you are using something other than JavaScript, the implementation of multi-line vs. single-line mode may differ slightly. JavaScript itself had no single-line mode when this was written; ES2018 added one as the s (dotAll) flag.
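A sketch of capturing with (.*) and reading the group back through match(); the input string is made up, and the s (dotAll) flag used at the end needs an ES2018+ engine:

```javascript
const input = "project-alpha\nproject-beta";

// Without flags, . stops at the newline, so group 1 holds only the first line.
const firstLine = input.match(/(.*)/);
console.log(firstLine[1]); // project-alpha

// With the s (dotAll) flag, . also matches "\n", so group 1 spans both lines.
const everything = input.match(/(.*)/s);
console.log(everything[1] === input); // true
```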

How To Create This RegExp

I am looking to find this in a string: XXXX-XXX-XXX, where each X is a digit.
I need to find this in a string using JavaScript, so bonus points to those who can provide the JavaScript too. I tried to create a regex and came up with this: ^[0-9]{4}\-[0-9]{3}\-[0-9]{3}$
Also, I would love to know of any cheat sheets or programs you guys use to create your regular expressions.
I suppose this is what you want:
\d{4}-\d{3}-\d{3}
In doubt? Google for "regex testers".
With your attempt:
^[0-9]{4}\-[0-9]{3}\-[0-9]{3}$
Since - is not a metacharacter outside a character class, there is no need to escape it. (In JavaScript, \- there is a harmless identity escape that still matches a plain hyphen, although it becomes a syntax error if you add the u flag.)
Also, you've anchored the match at the beginning and end of the string, so this will match only strings that consist solely of your number; drop the ^ and $ to find the number inside a longer string.
I know most people like the {3} style of counting, but when the thing being matched is a single digit, I find this more legible:
\d\d\d\d-\d\d\d-\d\d\d
Obviously, if you wanted to extend this to match hexadecimal digits, this spelling would get horrible, but I think it is far more legible than the alternatives:
\d{4}-\d{3}-\d{3}
[[:digit:]]{4}-[[:digit:]]{3}-[[:digit:]]{3}
[0-9]{4}-[0-9]{3}-[0-9]{3}
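Go with whatever is easiest for you to read. (Note that POSIX classes such as [[:digit:]] are not supported by JavaScript regexes, so in JavaScript only the \d and [0-9] forms work.)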
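A quick sanity check of these spellings as JavaScript regexes, anchored so the whole string must match (the sample strings are made up). It also shows what happens to the POSIX form, since JavaScript has no [[:digit:]] class:

```javascript
// The three JavaScript-compatible spellings accept exactly the same strings.
const spellings = [
  /^\d\d\d\d-\d\d\d-\d\d\d$/,
  /^\d{4}-\d{3}-\d{3}$/,
  /^[0-9]{4}-[0-9]{3}-[0-9]{3}$/,
];

for (const re of spellings) {
  console.log(re.test("1234-567-890"), re.test("1234-56-7890")); // true false
}

// JavaScript reads [[:digit:]] as the character set { "[", ":", "d", "i", "g", "t" }
// followed by a literal "]", so the POSIX spelling never matches real digits.
const posix = /^[[:digit:]]{4}-[[:digit:]]{3}-[[:digit:]]{3}$/;
console.log(posix.test("1234-567-890")); // false
```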
I tend to use the perlre(1) manpage as my main reference, knowing full well that it is far more featureful than many regexp engines. I'm prepared to handle the differences considering how conveniently available the perlre manpage is on most systems.
var result = /\d{4}-\d{3}-\d{3}/.exec(myString); // first match as an array, or null
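A slightly fuller sketch for finding the number inside a longer string; the sample text and the name idPattern are made up. With the g flag, String.prototype.match returns every match (or null when there is none), while exec() without g returns only the first:

```javascript
// Unanchored, so the number can appear anywhere in the text.
const idPattern = /\d{4}-\d{3}-\d{3}/g;

const text = "Call 1234-567-890 or 0987-654-321 today.";

// With the g flag, match() returns an array of all matched substrings, or null.
const found = text.match(idPattern);
console.log(found.join(", ")); // 1234-567-890, 0987-654-321

// exec() on a non-global regex returns the first match (or null).
const first = /\d{4}-\d{3}-\d{3}/.exec(text);
console.log(first && first[0]); // 1234-567-890
```

If digits can sit directly next to the number, one option is to add \b at both ends (/\b\d{4}-\d{3}-\d{3}\b/g); without it, "12345-678-9012" yields "2345-678-901".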

Wipe a string but keep its middle part

With a string like "HorsieDoggieBirdie", is there a non-capturing regex replace that would kill "Horsie" and "Birdie", yet keep "Doggie" intact? I can only think of a capturing solution:
s/(Horsie)(Doggie)(Birdie)/$2/g
Is there a non-capturing solution like:
s/Horsie##Doggie##Birdie//g
where ## is some combination of regex codes? The specific problem is in JavaScript (innerHTML.replace) but I'll take Perl suggestions, too.
You don't have to capture the Horsie or the Birdie.
s/Horsie(Doggie)Birdie/$1/g;
A similar thing should work for JavaScript as well. This is probably as efficient as it gets, and at least as fast as using look-around assertions, although you should benchmark it if you want to know for sure. (The results, of course, will depend on the horsies, doggies and birdies in question.)
Mandatory disclaimer: you should know what happens when you use regular expressions with HTML...
You can use Look-Around Assertions:
s/(?:Horsie(?=Doggie))|(?:(?<=Doggie)Birdie)//g;
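Both Perl answers translate fairly directly to JavaScript's String.prototype.replace; a sketch, with the caveat that lookbehind (?<=...) requires an ES2018+ engine:

```javascript
const input = "HorsieDoggieBirdie";

// Capturing version: "$1" in the replacement string refers to group 1.
const viaGroup = input.replace(/Horsie(Doggie)Birdie/g, "$1");

// Look-around version: nothing is captured. "Horsie" is removed only when
// followed by "Doggie", and "Birdie" only when preceded by "Doggie".
const viaLookaround = input.replace(/Horsie(?=Doggie)|(?<=Doggie)Birdie/g, "");

console.log(viaGroup); // Doggie
console.log(viaLookaround); // Doggie
```

The two are not equivalent on partial input: on "HorsieDoggie" the capturing version leaves the string unchanged, because it needs all three words in a row, while the look-around version still removes "Horsie".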
