Capture words not followed by symbol - javascript

I need to capture all (English) words except abbreviations, whose pattern is:
"_any-word-symbols-including-dash."
(so there is an underscore at the beginning, a dot at the end, and any letters and dashes in the middle)
I tried something like this:
/\b([A-Za-z-^]+)\b[^\.]/g
but it seems that I don't understand how to work with negative matches.
UPDATE:
I need not just to match the words but to wrap them in some tags. For
"a some words _abbr-abbr. a here" I should get:
<w>a</w> <w>some</w> <w>words</w> _abbr-abbr. <w>a</w> <w>here</w>
So I need to use replace with correct regex:
test.replace(/correct regex/, '<w>$1</w>')

Negative lookahead is written (?!...).
So you can use:
/\b([^_\s]\w*(?!\.))\b/g
Unfortunately, JavaScript engines that predate ES2018 have no lookbehind, so there you can't do a similar trick for "not prefixed by _".
Example:
> a = "a some words _abbr. a here"
> a.replace(/\b([^_\s]\w*(?!\.))\b/g, "<w>$1</w>")
"<w>a</w> <w>some</w> <w>words</w> _abbr. <w>a</w> <w>here</w>"
Following your comment about -, the updated regex is:
/\b([^_\s\-][\w\-]*(?!\.))\b/g
> "abc _abc-abc. abc".replace(/\b([^_\s\-][\w\-]*(?!\.))\b/g, "<w>$1</w>")
"<w>abc</w> _abc-abc. <w>abc</w>"

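A different way to get the same wrapping without any lookbehind is to match the abbreviation as one alternative and the plain word as another, then decide in a replacement callback. The sketch below is an illustration only, not the answer's regex: the helper name wrapWords is hypothetical, and it assumes words consist of ASCII letters.

```javascript
// Hypothetical helper: wrap plain words in <w>...</w>, leaving "_abbr-abbr." style tokens alone.
function wrapWords(text) {
  // First alternative: an abbreviation such as "_abbr-abbr." (matched but not captured).
  // Second alternative: a plain run of ASCII letters, captured in group 1.
  const pattern = /_[A-Za-z-]+\.|([A-Za-z]+)/g;
  return text.replace(pattern, (match, word) =>
    // When group 1 is undefined the abbreviation alternative matched, so return it unchanged.
    word === undefined ? match : `<w>${word}</w>`
  );
}

console.log(wrapWords("a some words _abbr-abbr. a here"));
```

Unlike the lookahead version, this sketch still wraps an ordinary word that happens to be followed by a period, because only tokens starting with an underscore are treated as abbreviations.
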
Related

Javascript regex select word without special characters

I want to match a word without special characters (dot, quotes, etc.) or whitespace. The text I have:
"üstlenmeyeceğimizin üst "ürünlerin daha sağlıklı ve zamanında ulaşabilmesi süstlenmeyeceğimizin şehirlerarası otobüs şirketleriyle çalıştığımızı fakat ısrarınız üstüne oluşabilecek gecikme veya sorunları üstlenmeyeceğimizin teyidini alarak kargoyla gönderim sağladık." üstlenmeyeceğimizin ğtest atest üstlenmeyeceğimizind. test test üst şüst a ğüst .üst üst.büst she sells seashells tüst atest ni ani grüst
asla ısrar etmedim ve ürünlerin sağlığı için i yi olduğuna dair bi r bilgilendirme yapılmadı.
I want to select üst from this text, but there are some different situations, as shown below.
I don't want to match the words listed below:
ğüst
şüst
üstlenmeyeceğimizin
"üstlenmeyeceğimizin
I want to select these listed words:
.üst
üst (there is whitespace before word)
"üst
I wrote this regex: [^a-zçğşöü]üst(?![a-zçğşöü]) but it selects the word together with the special character. I don't want the special character.
In short, I don't want to select the word if it has any leading letters; if a special character or whitespace leads the word, I want to select the word without that character.
If you really want only those 3 words:
I want select those listed words
.üst
üst (there is whitespace before word)
"üst
as you have asked in your question, then this should be enough:
[" .]üst\b
demo: https://regex101.com/r/kfMZxr/1/
If you now want to include whole words in your matches, use:
(?!şğ)[^\s]*üst(?!lenmeyeceğimizin)[^\s]*
https://regex101.com/r/kfMZxr/2
This will allow matches as tüst and grüst in your text as well as büst and üstüne.
(based on: http://www.ecma-international.org/ecma-262/9.0/index.html)
There is no "üst in your text; do you mean "üstlenmeyeceğimizin?
Anyway, try this: [.\s"]üst\b
I need to match words and wrap them in HTML tags, for example:
üst -> <b>üst</b>
I solved my problem with this regex:
([^a-zçğşöü])üst(?![a-zçğşöü])
This regex also matches the leading whitespace or special character, but I captured that character in a group and kept it outside the HTML tags when replacing.
Regex test: https://regexr.com/4amj3
Working example: https://codepen.io/asipek/pen/xBQGPK?editors=0011
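The replacement described above can be sketched as follows. The function name boldUst and the sample string are assumptions made for illustration; the regex is the one from the post, with the g flag added so every occurrence is replaced.

```javascript
// Hypothetical helper: wrap the standalone word "üst" in <b> tags.
function boldUst(text) {
  // Group 1 captures the single non-letter character in front of the word,
  // so it can be written back outside the <b> tag via $1.
  // The negative lookahead rejects "üst" when more letters follow (e.g. "üstüne").
  const pattern = /([^a-zçğşöü])üst(?![a-zçğşöü])/g;
  return text.replace(pattern, "$1<b>üst</b>");
}

console.log(boldUst('ğüst .üst "üst şüst üstüne a üst'));
```

One limitation of this form: the pattern requires a character before the word, so an üst at the very start of the string is not matched.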

How to search for a string with regex and change only part of it?

How do you match using /[a-z]\.\s[a-z]/g but change only the period to a comma?
For example:
"asdf. Aasdfcs. adGDS$gGB. Basdf".replace(/[a-z]\.\s[a-z]/g, ", ")
This will match "s. a", but I want the result to be "asdf. Aasdfcs, adGDS$gGB. Basdf" without altering the letters, changing just the period to a comma. Anything could come before or after this string.
By using capture groups, of course! Use parentheses to form a group and $n to access the group:
"asdfcs. adGDS$gGB".replace(/([a-z])\.\s([a-z])/g, "$1, $2");
For your convenience, see the result with this snippet:
alert("asdfcs. adGDS$gGB".replace(/([a-z])\.\s([a-z])/g, "$1, $2"));
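Applied to the full input string from the question, the same capture-group replacement looks like this (the variable names are illustrative only):

```javascript
// The full sample string from the question.
const input = "asdf. Aasdfcs. adGDS$gGB. Basdf";

// Group 1 is the lowercase letter before the period, group 2 the lowercase letter after the space.
// Both are written back unchanged; only ". " between them becomes ", ".
const output = input.replace(/([a-z])\.\s([a-z])/g, "$1, $2");

console.log(output);
```

Only the period between "Aasdfcs" and "adGDS$gGB" changes, because the other periods are followed by an uppercase letter.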

RegEx - Match Character only when it's not preceded or followed by same character

How would I match the quotation marks around "text" in the string below, but not those around "TEST TEXT", using RegEx? I want quotation marks only when they are by themselves. I tried a negative lookahead (for a second quote), but it still captured the second of the two quotes around TEST TEXT.
This is some "text". This is also some ""TEST TEXT""
Be aware that I need this to scale so sometimes it would be right in the middle of a string so something like this:
/(\s|\w)(\")(?!")/g (using $2...)
Would work in this example but not if the string was:
This is some^"text".This is also some ""TEST TEXT""
I just need quotation marks by themselves.
EDIT
FYI, this needs to be Javascript RegEx so lookbehind would not be an option for me for this one.
Since you have not tagged any particular flavor of regex, I am taking the liberty of using lookbehind as well. You can use:
(?<!")"(?!")[^"]*"
RegEx Demo
Update: For working with Javascript you can use this regex:
/""[^"]*""|(")([^"]*)(")/
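Doubled quotes are consumed by the first alternative without capturing anything. For a lone pair, captured groups #1 and #3 hold the quotation marks and group #2 holds the text between them.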
RegEx Demo
I'm not sure if I really understood well your needs. I'll post this answer to check if it helps you but I can delete it if it doesn't.
So, is this what you want using this regex:
"\w+?"
Working demo
By the way, if you just want to get the content within "..." you can use this regex:
"(\w+?)"
Working demo
You can't do this with a pure JavaScript regexp. I am going to eat my words now however, as you can use the following solution using callback parameters:
var subject = 'This is some "text". This is also some ""TEST TEXT""';
var regex = /""+|(")/g;
var replaced = subject.replace(regex, function($0, $1) {
  if ($1 == "\"") return "-"; // What to replace to?
  else return $0;
});
"This is some -text-. This is also some ""TEST TEXT"""
If you need the regex to split the string, then you can use the above to replace matches with something distinctive, then split on it:
var regex = /""+|(")/g;
var replaced = subject.replace(regex, function($0, $1) {
  if ($1 == "\"") return "☺";
  else return $0;
});
var splits = replaced.split("☺");
["This is some ", "text", ". This is also some ""TEST TEXT"""]
Referenced by: http://www.rexegg.com/regex-best-trick.html
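For runtimes that implement the ES2018 lookbehind assertion (an assumption about the environment; the question above rules it out for its own target), the lookaround idea from the first answer can be reduced to matching only the lone quotation mark. The helper name replaceLoneQuotes is hypothetical.

```javascript
// Hypothetical helper: replace every quotation mark that has no quotation mark
// directly before or after it. Requires lookbehind support (ES2018 and later).
function replaceLoneQuotes(text, replacement) {
  // (?<!") : the previous character is not a quote
  // (?!")  : the next character is not a quote
  return text.replace(/(?<!")"(?!")/g, replacement);
}

console.log(
  replaceLoneQuotes('This is some^"text".This is also some ""TEST TEXT""', "-")
);
```

Because neither assertion consumes a character, this also handles the case from the question where the quote sits directly after a symbol such as ^.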

Complex Regex composition - Regex that match "if"

I'm making a regex to match hashtags for my project. I want the regex to match hashtags whose words are separated by one single space and that don't contain another hashtag, and to match a space only if it is followed by a word (not by another blank space or #).
I'm really curious to know if I can do something like "if" in regular expressions and I hope you can help me with this.
So, in:
"#hashtag?!-=_" "#hashhash#" "#hash tag" "#hash  tag" "#hash #ahuhuhhuasd" "#hash "
The regex must match the following sentences:
"#hashtag?!-=_" "#hashhash" "#hash tag" "#hash" "#hash #ahuhuhhuasd" "#hash"
(all hashtag) (one) (another h.)
Currently, this is my code:
#{1,1}\S+\s{0,1}
You can test this code, but it matches things that aren't desired:
"#ahusdhuas?!__??###hud #ahusdhuads "
It matches the blank space at the end of the string and the 3 '#' inside the string.
None of that content is desired in this string, just "#ahusdhuas?!__??"
Glad if you can help me!
I think this is what you need:
(#(?:\s?[^#\s]+)+)
Here are some tests:
Are any of these what you've been looking for?
Try:
#[^# ]+(?: [^# ]+)*
Match a #, then one or more characters that aren't # or a space, then 0 or more instances of (a space followed by one or more characters that aren't # or a space). The ?: makes the group non-capturing.
If you don't want to match ###hud in #ahusdhuas?!__??###hud #ahusdhuads at all because it begins with three #, you can add the negative lookbehind: (?<!#) to the front of the regex:
(?<!#)#[^# ]+(?: [^# ]+)*
However, that will work in Ruby but not in JavaScript engines that predate ES2018, since they don't support lookbehind. In that case you'd have to use the #[^# ]+(?: [^# ]+)* pattern and, if the match starts after the first character, test the previous character in the string in your code to see if it is a #; if so, reject the match the regex returns.
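The "check the previous character in your own code" step can be sketched like this. The function name findHashtags is hypothetical; the regex is the one given in this answer.

```javascript
// Hypothetical helper: collect hashtags, rejecting any match whose '#'
// comes directly after another '#' (the job the lookbehind would have done).
function findHashtags(text) {
  const pattern = /#[^# ]+(?: [^# ]+)*/g;
  const results = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    // m.index is where the match starts; look one character to the left of it.
    const previous = m.index > 0 ? text[m.index - 1] : "";
    if (previous !== "#") {
      results.push(m[0]);
    }
  }
  return results;
}

console.log(findHashtags("#ahusdhuas?!__??###hud #ahusdhuads "));
```

For the sample string, the "#hud" candidate is dropped because the character before it is another #, while the two other hashtags are kept.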
I think I got it, though I'm not accustomed to JavaScript's regular expressions because I only use Python.
I tested the following on the site regexpal.com given by Monty Wild, it's the only one that showed me all the substrings matched:
(?:^ |^| )(#[^#\s]+(?: [^#\s]+)?)(?:(?=\Z| \Z| \S)| +(?=#))
result
#hashtag?!-=_
#hash tag
#hash
#ahuhuhhuasd
#hash
As JavaScript's regexes don't accept lookbehind assertions, I used a trick so that a hashtag preceded by two or more blanks won't match: these preceding blanks are consumed by the regex engine as trailing blanks of the preceding match. That is the role of the last part, +(?=#), of the regex: it triggers the consumption of the trailing blanks of a match when there is more than one. This consumption happens only if the former part, (?=\Z| \Z| \S), didn't match.
Tried this in a standard HTML page and in Firebug as well.
It works against the inputs you gave.
var hashTags = ["#hashtag?!-=_", "#hashhash#", "#hash tag", "#hash  tag", "#hash #ahuhuhhuasd", "#hash ", "#hash #", "#foo bar baz"];
hashTags.forEach(function(el, idx, arr) {
  console.log(el.match(/#([^#\s]|(( [^\s])(?!\s|$)))+/g));
});
// Console output
> ["#hashtag?!-=_"]
> ["#hashhash"]
> ["#hash tag"]
> ["#hash"]
> ["#hash #ahuhuhhuasd"]
> ["#hash"]
> ["#hash"]
> ["#foo bar baz"]

Match word in a string that does not end in ellipsis

Let's say I have the following string:
ZD:123123 ZD:213123 ZD:2e213 [ZD:213123] [ZD#221313] ZD:234...
I want to match every occurrence except ZD:234... because I don't want any words that end in an ellipsis.
This pattern was doing nicely for me in JavaScript:
/(\[|\(|)ZD[:#]\w+(\]|\)|)/g
However, it still captures the ZD:234 part of ZD:234... which I absolutely don't want it to do.
How can I prevent regex from doing this?
An easy fix is to use a negative lookahead:
/(\[|\(|)ZD[:#]\w+\b(\]|\)|)(?!\.\.\.)/g
Note that I've also added \b to avoid matching on ZD:23.
A bit simplified:
/[\[(]?ZD[:#]\w+\b[\])]?(?!\.\.\.)/g
In case you want matching brackets (no [ZD:123)):
/(?:ZD[:#]\w+|\[ZD[:#]\w+\]|\(ZD[:#]\w+\))\b(?!\.\.\.)/g
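As a quick check, the simplified pattern can be run against the sample string from the question. This is only an illustrative sketch; the variable names are not from the original post.

```javascript
// Sample string from the question.
const input = "ZD:123123 ZD:213123 ZD:2e213 [ZD:213123] [ZD#221313] ZD:234...";

// \b stops the engine from backtracking to a shorter token such as "ZD:23",
// and (?!\.\.\.) rejects a token that is directly followed by three dots.
const pattern = /[\[(]?ZD[:#]\w+\b[\])]?(?!\.\.\.)/g;

const matches = input.match(pattern);
console.log(matches);
```

The five tokens without an ellipsis are returned, and ZD:234... is skipped entirely.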
There is more than one way to skin a cat. The following will work in more browsers by using a simpler regular expression:
function trim(s) {
  return s.replace(/^ | $/g,'').replace(/\s+/g,' ');
}
var x = 'ZD:123123 ZD:213123 ZD:2e213... [ZD:213123] [ZD#221313] ZD:234...';
alert(
  trim(x.replace(/(^| )[^ ]+[\.]{3}( |$)/g,' ')).split(/\s+/)
);
/* shows: ZD:123123,ZD:213123,[ZD:213123],[ZD#221313] */
It removes any space delimited "word" of characters ending in ... and then splits on the space.
