Need help in regex pattern - javascript

I have this regex pattern:
\=[a-zA-Z\.\:\[\]_\(\)\&\$\%#\-\#\!0-9;=\?/\+\xBF\~]+[?\s+|?>]
and I have this HTML:
1.esc@xyz.com
2.johnross@zys.com
3.johnross@wen.com
Here is the problem:
I need the pattern to skip the first and second ones, because they contain white space and are valid attributes.
But only the third one works, because it doesn't contain white space.
That is, nothing should be selected by the above pattern.
Here is a direct link to test:
http://regexr.com?31r61
Please help!
Thanks,

EDIT:
If you just want to match unquoted attributes, this should work:
[<\s]+[\w]+(=[^\"][^\s>]*)
Kind of inelegant but let me know if that does what you want.
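As a rough illustration of how that pattern behaves, here is a minimal sketch. The sample markup is hypothetical (the HTML from the question was not preserved), and the `g` flag is added only so that every match can be collected.

```javascript
// Hypothetical sample markup: one quoted attribute, one unquoted attribute.
const html = [
  '<a href="mailto:esc@xyz.com?subject=Hello there">one</a>',
  '<a href=mailto:johnross@wen.com>three</a>',
].join("\n");

// Pattern from the answer above; g flag added to find all matches.
const unquotedAttr = /[<\s]+[\w]+(=[^"][^\s>]*)/g;

// Group 1 holds "=" plus the unquoted value.
const unquotedValues = [...html.matchAll(unquotedAttr)].map((m) => m[1]);
console.log(unquotedValues);
```

With this input only the unquoted `href` is reported; the quoted one is skipped because `[^"]` refuses a value that starts with a double quote.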
Which pattern are you trying to match? All three? And if so, which portion? The subject or the email? If you're just trying to match the subject, try using this as the pattern to match:
\=\"mailto:[^?]*\?subject=([^\"]*)\"\>
That will return a match where the group is the subject itself.
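A small sketch of that pattern in use, assuming the link looks like the hypothetical one below (quoted `href`, a `mailto:` address, then a `subject` parameter):

```javascript
// Pattern from the answer above, written as a JavaScript regex literal.
const subjectPattern = /\=\"mailto:[^?]*\?subject=([^\"]*)\"\>/;

// Hypothetical link; the original HTML was not preserved.
const link = '<a href="mailto:esc@xyz.com?subject=Hello there">one</a>';

const subjectMatch = subjectPattern.exec(link);
// Group 1 is the subject text, white space included.
console.log(subjectMatch ? subjectMatch[1] : null);
```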

That is a wicked character class...
Why don't you try something a bit more reasonable? Try this:
\=".*?(?<!\\)"
That will match anything in the quotation marks after href=, if that's what you're trying to get. If you're looking for more than that, this regex can easily be modified.
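A quick sketch of the lookbehind version, assuming a JavaScript engine with lookbehind support (ES2018 or later). The sample tag is hypothetical and includes backslash-escaped quotes to show what `(?<!\\)` is for.

```javascript
// Pattern from the answer above; g flag added to collect every attribute value.
const quotedValue = /=".*?(?<!\\)"/g;

// Hypothetical tag. In the actual string the title is: "say \"hi\""
const tag = '<a href="mailto:a@b.com" title="say \\"hi\\"">';

const quotedMatches = tag.match(quotedValue);
console.log(quotedMatches);
```

The lazy `.*?` stops at the first double quote that is not preceded by a backslash, so the escaped quotes inside `title` do not end the match early. Note that `.` does not match line breaks unless the `s` flag is set.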

Related

Regex for - 'A,B','C'

I have written this regex -
([\s]*'[A-Za-z0-9_: ]*[\,]*[\s]*[A-Za-z0-9_: ]*\'[\s]*)[\,]*
But this does not handle the input 'A,B' 'C' correctly: the comma between the two items is missing, yet it is still reported as a perfect match.
Can anyone please help?
After giving this more thought, I think what you want is something more like this:
^(?<item>\'[a-zA-Z0-9,\s]+\')(\s*,(?&item))*\s*$
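The `(?&item)` subroutine call above is PCRE syntax, which JavaScript's RegExp does not offer. If the target is JavaScript, one possible adaptation is to keep the item pattern in a string and reuse it when building the full expression. This is only a sketch; the names `item` and `listPattern` are made up for the example.

```javascript
// One quoted item: letters, digits, commas and white space between single quotes.
const item = "'[a-zA-Z0-9,\\s]+'";

// First item, then any number of ", item" repetitions, then optional trailing space.
const listPattern = new RegExp(`^${item}(\\s*,${item})*\\s*$`);

console.log(listPattern.test("'A,B','C'")); // comma between items
console.log(listPattern.test("'A,B' 'C'")); // comma missing
```

Because every item after the first must be preceded by a comma, the second input is rejected while the first is accepted.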
You're using an asterisk which will match zero instances. Try using + instead for the characters you want one or more of.
Please provide other examples that you'd expect to match. For this specific case, the following would match, but is very rigid and specific:
\'+[a-zA-Z]+\,\s*[a-zA-Z]+\'+\,\s*\'+[a-zA-Z]+\'+
Edit:
This is more in line with what I think you want:
^(\'[a-zA-Z]+(\,+\s*[a-zA-Z]+)*\'\s*\,*)*$

How to append string after matching field with regex

I want to append a word after the <body> tag. It should not modify or replace anything else, just append the word. I have done something like this. Is it valid to use empty parentheses for the second capture group, or will it match everything?
/(<body[^>]*>)()/, `$1${my_variable}$2`)
The second capture group, designed to capture nothing, will match "nothing" - it will form a match immediately after your closed body tag. There's nothing wrong with doing this for the regex, though you might want to be wary of using [^>]* - this negated character class will gladly match across lines and grab as much input as it can. Handy for matching multi-line tags, but often very dangerous.
Also, if you're on Linux and for some reason have > symbols in filenames (which is valid!), your regex will break horribly.
That being said, valid regex or not, it's usually a bad idea to use regex with html, since HTML isn't a regular language. Also, you could accidentally summon Cthulhu.
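To illustrate the point about the empty capture group, here is a minimal sketch. The values of `myVariable` and `page` are hypothetical; the multi-line `<body>` tag is there to show `[^>]*` running across a line break.

```javascript
const myVariable = "Hello"; // hypothetical word to append
const page = '<html><body\n  class="main">Some info</body></html>';

const bodyPattern = /(<body[^>]*>)()/;

// Group 2 matches the empty string right after the opening tag.
const groups = bodyPattern.exec(page);
console.log(JSON.stringify(groups[2]));

// $1 puts the tag back, then the word, then the (empty) second group.
const appended = page.replace(bodyPattern, `$1${myVariable}$2`);
console.log(appended);
```

Since group 2 is always empty, `$2` contributes nothing, and `$1${myVariable}` alone gives the same result.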
let page = "<html><body>Some info</body></html>";
// replace() returns a new string, so assign the result back
page = page.replace("<body>", `<body>${my_variable}`);
or
page = page.replace(/<body>/i, `<body>${my_variable}`);
If you are in the browser, you can also use document.querySelector("body").innerHTML.
Also depending on which framework you're using there are better ways to accomplish this.

Regex match all except a pattern

I need a little help. I already tried to practice in several ways, but it didn't work as expected. For example, this one.
I want to match all single words except the pattern <br> in JS.
So I tried
(?!<br>)[\s\S]
(?!<|b|r|>)[\s\S]
The problem I have is that the ?! lookahead only seems to exclude the first character, <, not the entire pattern <br>. In reverse, plain <br> matches every <br> and nothing else. How can I make the ?! lookahead apply to the entire word?
Thank you so much!
Here is what I am trying.
The regular expression you are looking for might look like this:
([^>]|<(?!br>)[^>]+>)+(?=<br>|$)
It should work for any tag, try replacing br by p in the above pattern.
However, it would be much easier, more readable, and faster to use:
content.split('<br>').filter(x => x.length)
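The sketch below shows, on a made-up input, why the per-character lookahead from the question drops only the `<` of each `<br>`, and how the `split` approach compares with the longer regex. The `content` value is an assumption, and the `g` flag is added to collect all matches.

```javascript
// The lookahead is checked at a single position, so only "<" is rejected;
// "b", "r" and ">" are not followed by "<br>" and still match.
const perChar = "a<br>b".match(/(?!<br>)[\s\S]/g);
console.log(perChar);

// Hypothetical content with another tag and an empty segment between two <br>.
const content = "foo<br>bar<b>baz</b><br><br>qux";

// Regex from the answer above, with the g flag.
const byRegex = content.match(/([^>]|<(?!br>)[^>]+>)+(?=<br>|$)/g);

// split() cuts at every <br>; filter() drops the empty pieces.
const bySplit = content.split("<br>").filter((x) => x.length);

console.log(byRegex);
console.log(bySplit);
```

On this input both approaches return the same three pieces, with the `<b>` element kept intact.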
Hope it helps.

What's wrong with this regular expression to find URLs?

I'm working on a JavaScript to extract a URL from a Google search URL, like so:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've run this through a couple of regex testers and it seems to work, but in practice the string I get returned looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!
Remove the parentheses in the regex if you do not process the .org (unlikely since it is a literal). As per @Mark's comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;
What you're actually getting is an array of 2 results, the first being the whole match, the second - the group you defined by using parens (.org).
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
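To connect this with the output in the question: when the array returned by `.exec` is converted to a string, its elements are joined with commas, which is where the trailing `,.org` comes from. A minimal sketch using the URL from the question and the corrected pattern with the group kept:

```javascript
const theURL =
  "http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8";

// Corrected pattern from the answers, still with a capture group around .org.
const checkForURL = /[\w\d]+(\.org)/i;
const findTheURL = checkForURL.exec(theURL);

// Converting the array to a string joins the whole match and the group with a comma.
console.log(String(findTheURL));

// Index 0 is the whole match only.
console.log(findTheURL[0]);
```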
You should escape .(DOT) in (.org) regex group or it matches any character. So your regex would become:
/[\w\d]+(\.org)/
To match the url in your example you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$

Find uppercase substrings and wrap with acronym tags

For example, replace the string Yangomo, Congo, DRC with Yangomo, Congo, <acronym>DRC</acronym>. There may potentially be multiple uppercase substrings in each string. I assume this needs some form of regex?
Thanks.
Well, a really simple one might be:
var replaced = original.replace(/\b([A-Z]+)\b/g, '<acronym>$1</acronym>');
Doing this sort of thing always has complications, however; it depends on the source material. (The "\b" thing matches word boundaries, and is an invaluable trick for all sorts of occasions.)
edit — insightful user Buh Buh points out that it might be nice to only affect strings with more than two characters, which would look like /\b([A-Z]{2,})\b/.
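A short sketch comparing the two patterns. The helper name `wrapAcronyms` and the second sample sentence are made up for illustration.

```javascript
// Wrap every match of the given pattern in <acronym> tags.
const wrapAcronyms = (text, pattern) =>
  text.replace(pattern, "<acronym>$1</acronym>");

// The string from the question: only DRC is entirely uppercase.
const simple = wrapAcronyms("Yangomo, Congo, DRC", /\b([A-Z]+)\b/g);
console.log(simple);

// A one-letter uppercase word such as "I" is also wrapped by the first pattern...
const loose = wrapAcronyms("I flew to DRC", /\b([A-Z]+)\b/g);
console.log(loose);

// ...but not by the {2,} variant, which needs at least two uppercase letters.
const strict = wrapAcronyms("I flew to DRC", /\b([A-Z]{2,})\b/g);
console.log(strict);
```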
Personally I would use PHP to explode the string, use a regex to find all uppercase letters /[A-Z]+/ and then use PHP to insert the tags (using str_replace).
