Regex for the following for removing first set of brackets - javascript

I need to strip out "( ( listen) LEE-mər)" from the following text in JavaScript, including the outer brackets. The content within the outer brackets changes dynamically. I also don't want to strip the next set of brackets, (ghosts or spirits).
Here is what I found on Wikipedia: Lemurs ( ( listen) LEE-mər) are a clade of strepsirrhine primates endemic to the island of Madagascar. The word lemur derives from the word lemures (ghosts or spirits) from Roman mythology and was first used to describe a slender loris due to its nocturnal habits and slow pace, but was later applied to the primates on Madagascar.
I got as far as
/\((.*?\()*/g
but it doesn't work. Is this possible with a regex?

Try this regex:
/\(([^)]*)\)[^)]*\)/
To retrieve only what you want:
var text = "Here...";
var response = text.match(/\(([^)]*)\)[^)]*\)/)[0];
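To remove the match rather than retrieve it, the same pattern can be passed to replace(). A minimal sketch, using a shortened copy of the question's text; the extra step that collapses the leftover double space is an addition, not part of the answer above:

```javascript
// Question text, shortened here; the full Wikipedia paragraph works the same way.
const text = "Lemurs ( ( listen) LEE-mər) are a clade of strepsirrhine primates. " +
  "The word lemur derives from the word lemures (ghosts or spirits) from Roman mythology.";

// Same pattern as above: "(", any non-")" chars, ")", any non-")" chars, ")".
// Without the g flag, replace() only removes the first match.
const pattern = /\(([^)]*)\)[^)]*\)/;

// Remove the match, then collapse the double space it leaves behind.
const cleaned = text.replace(pattern, "").replace(/ {2,}/g, " ");
console.log(cleaned);
```

The "(ghosts or spirits)" group survives because, after its closing bracket, there is no second ")" for the pattern to reach.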

You could try this regular expression to match "( ( listen) LEE-mər)" and remove it from the text:
\(\s*(?=\(\s*listen).+?\).+?\)
This does not match (ghosts or spirits).
Here is an example in JavaScript:
const str = "Here is what I found on Wikipedia: Lemurs ( ( listen) LEE-mər) are a clade of strepsirrhine primates endemic to the island of Madagascar. The word lemur derives from the word lemures (ghosts or spirits) from Roman mythology and was first used to describe a slender loris due to its nocturnal habits and slow pace, but was later applied to the primates on Madagascar.";
const transformed = str.replace(/\(\s*(?=\(\s*listen).+?\).+?\)/g, "");
document.getElementById("str").innerHTML = str;
document.getElementById("transformed").innerHTML = transformed;
<div id="str"></div>
<br />
<div id="transformed"></div>
Regex explanation
The regex looks first for a left round bracket followed by zero or more spaces:
\(\s*
which must be followed by an opening bracket and "listen" (checked with a positive lookahead, which consumes no characters):
(?=\(\s*listen)
This is then followed, twice, by a lazy (non-greedy) run of one or more characters and a closing bracket:
.+?\).+?\)
Regex optimisation
You could shorten the original regex to this equivalent one:
\(\s*(?=\(\s*listen)(.+?\)){2}
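Outside the browser snippet, a quick check (no DOM needed) that the shortened pattern removes the same text as the original and still leaves (ghosts or spirits) alone. The text is a shortened copy of the question's paragraph:

```javascript
const str = "Lemurs ( ( listen) LEE-mər) are a clade of strepsirrhine primates endemic to the island of Madagascar. " +
  "The word lemur derives from the word lemures (ghosts or spirits) from Roman mythology.";

// Original: the ".+?\)" unit written out twice.
const original = /\(\s*(?=\(\s*listen).+?\).+?\)/g;
// Shortened: the same ".+?\)" unit repeated with {2}.
const shortened = /\(\s*(?=\(\s*listen)(.+?\)){2}/g;

const a = str.replace(original, "");
const b = str.replace(shortened, "");

console.log(a === b);                           // true
console.log(a.includes("(ghosts or spirits)")); // true
```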

Related

I need some help for a specific regex in javascript

I'm trying to write a correct regex in my JavaScript code, but I'm a bit confused. My goal is to find every occurrence of "rotate" in a string. This should be simple, but I'm lost because my "rotate" can have multiple endings! Here are some examples of what I want to find with the regex:
rotate5
rotate180
rotate-1
rotate-270
The "rotate" word can be at the beginning of my string or at the end, or even in the middle, separated by spaces from other words. The regex will be used in a search-and-replace function.
Can someone help me please?
EDIT: What I tried so far (probably missing some of them):
/\wrotate.*/
/rotate.\w*/
/rotate.\d/
/\Srotate*/
I'm not fully understanding the regex mechanic yet.
Try this regex as a start. It returns every occurrence of "rotate" together with the number (positive or negative) that follows it. Because [0-9]* also allows zero digits, a bare "rotate" matches as well.
/(rotate)([-]?[0-9]*)/g
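Here is some sample code:
var aString = ["rotate5", "rotate180", "rotate-1", "some text rotate-270 rotate-1 more text rotate180"];
for (var x = 0; x < aString.length; x++) {
  var match;
  // A new regex per string, so its lastIndex starts at 0 each time
  var regex = /(rotate)([-]?[0-9]*)/g;
  // With the g flag, exec() returns the next match on each call, then null
  while ((match = regex.exec(aString[x])) !== null) {
    console.log(match);
  }
}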
In this example,
match[0] gives the whole match (e.g. rotate5)
match[1] gives the text "rotate"
match[2] gives the numerical text immediately after the word "rotate"
If there are multiple "rotate" strings in the string, this returns them all.
If you just need to know whether the word is in the string, /rotate/ alone is enough.
But if you also need to match what comes before or after it, @mseifert's answer will work well.
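If you just want to replace the word rotate with another one, you can use the string method String.prototype.replace, like this:
var str = "i am rotating with rotate-90"; str = str.replace('rotate', 'turning');
Note that replace() returns a new string, and with a string pattern it replaces only the first occurrence.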
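Because a string pattern only replaces the first occurrence, replacing every "rotate" needs a regex with the g flag. A sketch, assuming the number should be kept; the pattern and the "turn" replacement are illustrative choices, not from the question:

```javascript
const str = "rotate5 some text rotate-270 rotate-1 more text rotate180";

// \b stops "rotate" from matching inside a longer word (e.g. "derotate5");
// (-?\d+)? optionally captures a signed number right after it.
const rotateRe = /\brotate(-?\d+)?/g;

// The g flag replaces every occurrence; $1 re-inserts the captured number
// (it becomes an empty string when no number followed "rotate").
const replaced = str.replace(rotateRe, "turn$1");
console.log(replaced); // turn5 some text turn-270 turn-1 more text turn180
```

Words like "rotating" are left alone, since "rotat" followed by "i" never spells "rotate".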
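Why your regexes don't work:
/\wrotate.*/
means that a word character ([a-zA-Z0-9_]) must come immediately before "rotate", followed by any characters up to the end of the line, so "rotate" at the start of the string or after a space never matches.
/rotate.\w*/
means "rotate" must be followed by exactly one character of any kind (other than a line break) and then zero or more word characters.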
...............
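To see these explanations in action, a quick sketch runs each pattern from the question against the sample values. The first and last patterns fail on all four (nothing precedes "rotate"), /rotate.\d/ misses rotate5 and cuts rotate180 down to rotate18, and /rotate.\w*/ happens to work for these four samples:

```javascript
const samples = ["rotate5", "rotate180", "rotate-1", "rotate-270"];
const attempts = [/\wrotate.*/, /rotate.\w*/, /rotate.\d/, /\Srotate*/];

for (const re of attempts) {
  // match() returns null when there is no match; otherwise [0] is the matched text.
  const results = samples.map((s) => {
    const m = s.match(re);
    return m ? m[0] : null;
  });
  console.log(String(re), results);
}
```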
Using your description:
The "rotate" word can be at the beginning of my string or at the end, or even in the middle separated by spaces from other words. The regex will be used in a search-and-replace function.
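This regex should do the work:
const regex = /(^rotate|rotate$|\ {1}rotate\ {1})/gm;
Note that it matches only the word "rotate" itself (plus the surrounding spaces in the middle-of-string case), not the number that follows it.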
You can learn more about regular expressions on these sites:
http://www.regular-expressions.info
regex101.com

REGEX - after bracket get data until end bracket

I have a string like the following:
SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)
Question
I am trying to make a regex to get just the information between the round brackets; for example, the end string would look like:
BI1 BI17 BI1234
I have found this example on Stack Overflow, which gets the first value, BI1, but ignores the rest:
Get text between two rounded brackets
This is the regex I created from the above link: /\(([^)]+)\)/g, but it includes the brackets, and I want to remove them.
I am using this website to attempt to solve this query which has a testing window to see if the regex entered works:
http://www.regexr.com
Additional Information
There can be any number of digits, which is why I have given 3 different examples.
This is a continuous string, not on separate lines.
Thanks for any help on this matter.
Rather than matching the bracketed parts, you can split on everything outside them with String#split and the following regex:
\).*?\(|^.*?\(|\).*?$
Yielding code that looks a bit like this:
function getBracketed(str) {
return str.split(/\).*?\(|^.*?\(|\).*?$/).filter(Boolean);
}
(You need to filter out the empty strings that'll appear at the beginning and end if you do it this way - hence the extra operation).
If you need to keep everything inside the parentheses and remove everything else, you might use:
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var result = str.replace(/.*?\(([^()]*)\)/g, " $1").trim();
console.log(result);
If you need to get only the BI+digits pattern inside parentheses, use
/.*?\((BI\d+)\)/g
Details:
.*? - match any 0+ chars other than line break symbols, as few as possible
\( - match a (
(BI\d+) - Group 1 capturing BI + 1 or more digits (\d+) (or [^()]* - zero or more chars other than ( and ))
\) - a closing ).
To get all the values as an array (say, for later joining), use:
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var re = /\((BI\d+)\)/g;
var res = str.match(re).map(function(s) {return s.substring(1, s.length-1);}); // strip the ( and ) from each match
console.log(res);
console.log(res.join(" "));
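A further option, not from the answers above: in engines that support String.prototype.matchAll (ES2020), each match object exposes the capture group directly, so no substring trimming is needed. A sketch:

```javascript
const str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";

// matchAll needs the g flag and yields one match object per hit;
// m[1] is the text captured inside the parentheses, without the brackets.
const values = Array.from(str.matchAll(/\(([^()]*)\)/g), (m) => m[1]);

console.log(values);           // [ 'BI1', 'BI17', 'BI1234' ]
console.log(values.join(" ")); // BI1 BI17 BI1234
```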

Need a javascript function that doesn't care about special characters in a string

I have this html string x:
Michelle Brook
<br></br>
The Content Mine
<br></br>
michelle#contentmine.org
It is taken from first lines of http://www.dlib.org/dlib/november14/brook/11brook.html
I would like x.substring(0,14) to return "Michelle Brook".
The problem is that before the M there are two special characters (Unicode code 10, a line feed) that make x.substring(0,14) return only "Michelle Bro".
In fact, using x.split("") I can see {" "," ","M",.....}
I would rather not remove these characters.
I would like substring to do the right thing while taking those special characters into account. How can I do that? Is there a different JavaScript function that does this?
From your webpage:
window.onload = function() {
var arrStr = document.getElementsByClassName('blue')[0].innerHTML.replace(/[^A-Za-z0-9 <>]/g, '').split('<br>');
alert(arrStr[0].trim());
}
<p class="blue">
Michelle Brook<br>
The Content Mine<br>
michelle#contentmine.org<br><br>
Peter Murray-Rust<br>
University of Cambridge<br>
pm286#cam.ac.uk<br><br>
Charles Oppenheim<br>
City, Northampton and Robert Gordon Universities<br>
c.oppenheim#btinternet.com
<br><br>doi:10.1045/november14-brook
</p>
With the replace function you can remove any character you are not interested in.
In your case I assumed you are looking for letters (uppercase and lowercase), digits and spaces; everything else is removed. To keep other characters too, add them to the character class.
Why not put this string in a function and trim its start and end?
Use .trim() to remove the \n characters (code 10):
The trim() method removes whitespace from both ends of a string. Whitespace in this context is all the whitespace characters (space, tab, no-break space, etc.) and all the line terminator characters (LF, CR, etc.).
x.trim().substring(0,14);
Or using a regex:
var match = x.match(/[\w ]{14}/);
console.log(match[0]);
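A small sketch that reproduces the problem and both fixes, assuming (as the question says) that the string starts with two line feeds; the markup is copied from the question and the rest of the page is omitted:

```javascript
// Two line feeds (char code 10) in front of the name, as described in the question.
const x = "\n\nMichelle Brook<br></br>The Content Mine<br></br>michelle#contentmine.org";

console.log(x.charCodeAt(0));           // 10
console.log(x.trim().substring(0, 14)); // Michelle Brook

// The regex alternative: the first run of 14 word characters or spaces.
const match = x.match(/[\w ]{14}/);
console.log(match[0]);                  // Michelle Brook
```

trim() returns a new string, so x itself still contains the line feeds afterwards.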

regex encapsulation

I've got a question concerning regex.
I was wondering how one could replace encapsulated text, something like {key:23}, with something like <span class="highlightable">23</span>, so that the entity still remains encapsulated, but with something else.
I will do this in JS, but the regex is what matters. I have been searching for a while, probably with the wrong terms; I should probably learn more about regex in general.
In any case, is there someone who knows how to perform this operation with simplicity?
Thanks!
It's important that you find {key:23} in your text first and then replace it with the syntax you want; this way you avoid replacing {key:'sometext'}, which you don't want changed.
var str = "some random text {key:23} some random text {key:name}";
var n = str.replace(/\{key:\d+\}/g, function (x) {
  // x is the whole "{key:<digits>}" match; swap its braces for the span tags
  return x.replace(/\{key:/, '<span class="highlightable">').replace(/\}/, '</span>');
});
This way only {key:AnyNumber} gets replaced, and {key:AnythingOtherThanNumbers} is left untouched.
It seems you are new to regex. You need to learn more about character classes, capturing groups and backreferences.
The regex is fairly basic in your case, as long as you do not need to support nested encapsulated text.
Let's start:
The beginning is {key: - it matches the substring literally. Note that { can be a special character (it starts a limiting quantifier such as {2}), so it is a good idea to escape it: \{key:.
([^}]+) - This is a bit more interesting: the round brackets around it form a capturing group that lets us back-reference the matched text later. [^}]+ means one or more characters (due to +) other than } ([^}] is a negated character class, where ^ means "not").
} matches a } literally.
In the replacement string, we'll get the captured text using a backreference $1.
So, the entire regex will look like:
{key:([^}]+)}
Code snippet:
var re = /{key:([^}]+)}/g;
var str = '{key:23}';
var subst = '<span class="highlightable">$1</span>';
document.getElementById("res").innerHTML = str.replace(re, subst);
.highlightable
{
color: red;
}
<div id="res"></div>
If you want different behaviour depending on the value of key, you'll need to adjust the regex to match either digits only (with \d+) or letters only (say, with [a-zA-Z]+ for English), or other shorthand classes, ranges (= character classes), or their combinations.
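Alternatively, a single pattern can capture any value and let a replacement function decide what to do with it. A sketch assuming numeric keys should be wrapped and all other keys left alone; that rule and the parameter names whole and value are illustrative:

```javascript
const str = "some random text {key:23} some random text {key:name}";

// One pattern for any key value; the callback receives the whole match
// and the captured value, and returns the replacement text.
const result = str.replace(/\{key:([^}]+)\}/g, (whole, value) =>
  /^\d+$/.test(value)
    ? `<span class="highlightable">${value}</span>` // digits: wrap in a span
    : whole                                         // anything else: leave as-is
);

console.log(result);
```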
If your string is in var a, then:
var test = a.replace( /\{key:(\d+)\}/g, "<span class='highlightable'>$1</span>");

Javascript - regex - word boundary (\b) issue

I am having difficulty using \b with Greek characters in a regex.
In this example, [a-zA-ZΆΈ-ώἀ-ῼ]* successfully marks all the words I want (both Greek and English). Now suppose I want to find words with 2 letters. For English I use something like this:
\b[a-zA-Z]{2}\b. Can you help me write a regex that marks 2-letter words in Greek? (Why? My final goal is to remove them.)
Text used:
Greek MONOTONIC:
Το γάρ ούν και παρ' υμίν λεγόμενον, ώς ποτε Φαέθων Ηλίου παίς το του πατρός άρμα ζεύξας δια το μή δυνατός είναι κατά την του πατρός οδόν ελαύνειν τα τ' επί της γής ξυνέκαυσε και αυτός κεραυνωθείς διεφθάρη, τούτο μύθου μέν σχήμα έχον λέγεται, το δέ αληθές εστι των περί γήν και κατ' ουρανόν ιόντων παράλλαξις και διά μακρόν χρόνον γιγνομένη των επί γής πυρί πολλώ φθορά.
Greek POLYTONIC:
Τὸ γὰρ οὖν καὶ παρ' ὑμῖν λεγόμενον, ὥς ποτε Φαέθων Ἡλίου παῖς τὸ τοῦ πατρὸς ἅρμα ζεύξας διὰ τὸ μὴ δυνατὸς εἶναι κατὰ τὴν τοῦ πατρὸς ὁδὸν ἐλαύνειν τὰ τ' ἐπὶ τῆς γῆς ξυνέκαυσε καὶ αὐτὸς κεραυνωθεὶς διεφθάρη, τοῦτο μύθου μὲν σχῆμα ἔχον λέγεται, τὸ δὲ ἀληθές ἐστι τῶν περὶ γῆν καὶ κατ' οὐρανὸν ἰόντων παράλλαξις καὶ διὰ μακρὸν χρόνον γιγνομένη τῶν ἐπὶ τῆς γῆς πυρὶ πολλῷ φθορά.
ENGLISH:
For in truth the story that is told in your country as well as ours, how once upon a time Phaethon, son of Helios, yoked his father's chariot, and, because he was unable to drive it along the course taken by his father, burnt up all that was upon the earth and himself perished by a thunderbolt,—that story, as it is told, has the fashion of a legend, but the truth of it lies in the occurrence of a shifting of the bodies in the heavens which move round the earth, and a destruction of the things on the earth by fierce fire, which recurs at long intervals.
What I've tried so far:
// 1
txt = txt.replace(/\b[a-zA-ZΆΈ-ώἀ-ῼ]{2}\b/g, '');
// 2
tokens = txt.split(/\s+/);
txt = tokens.filter(function(token){ return token.length > 2}).join(' ');
// 3
tokens = txt.split(' ');
txt = tokens.filter(function(token){ return token.length != 3}).join(' ');
2 & 3 were suggested to my question here: Javascript - regex - how to remove words with specified length
EDIT
Read also:
Why can't I use accented characters next to a word boundary?
Javascript + Unicode regexes
Since JavaScript had no lookbehind when this was written (it was added in ES2018), and since word boundaries work only with members of the \w character class, the only way is to use groups (and capturing groups if you want to make a replacement):
(^|[^a-zA-ZΆΈ-ώἀ-ῼ\n])([a-zA-ZΆΈ-ώἀ-ῼ]{2})(?![a-zA-ZΆΈ-ώἀ-ῼ])
Example to remove 2-letter words (the m flag lets ^ match at the start of every line, and $1 puts back the character captured before the word; in a JavaScript replacement string the backreference is $1, not \1):
txt = txt.replace(/(^|[^a-zA-ZΆΈ-ώἀ-ῼ\n])([a-zA-ZΆΈ-ώἀ-ῼ]{2})(?![a-zA-ZΆΈ-ώἀ-ῼ])/gm, '$1');
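Since ES2018, JavaScript does support lookbehind and Unicode property escapes such as \p{L} (any letter, in any script), so in a modern engine the capturing-group pattern above can be written without consuming the preceding character. A sketch assuming an ES2018+ engine (a current browser or Node.js); the function name removeTwoLetterWords is hypothetical, and normalize("NFC") is added so that accented letters typed as base letter plus combining accent count as one letter:

```javascript
// \p{L} = any letter in any script (requires the u flag).
// The lookbehind and lookahead assert "no letter on either side" without
// consuming anything, which is what \b would do if it understood Greek.
const twoLetterWord = /(?<!\p{L})\p{L}{2}(?!\p{L})/gu;

function removeTwoLetterWords(text) {
  return text
    .normalize("NFC")            // compose accents into single letters
    .replace(twoLetterWord, "")  // drop the two-letter words
    .replace(/ {2,}/g, " ")      // tidy up the double spaces left behind
    .trim();
}

console.log(removeTwoLetterWords("Το γάρ ούν και παρ' υμίν λεγόμενον"));
console.log(removeTwoLetterWords("For in truth the story that is told"));
```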
You can use \S
Rather than write a match for "word characters plus these characters" it may be appropriate to use a regex that matches not-whitespace:
\S
It's broader in scope, but simpler to write/use.
If that's too broad - use an exclusive list rather than an inclusive list:
[^\s\.]
That is - any character that is not whitespace and not a dot. In this way it's also easy to add to the exceptions.
Don't try to use \b
Word boundaries don't work with non-ASCII characters, which is easy to demonstrate:
> "yay".match(/\b.*\b/)
["yay"]
> "γaγ".match(/\b.*\b/)
["a"]
Therefore \b can't be used to detect words made of Greek characters: \b only treats [A-Za-z0-9_] as word characters, so every Greek letter counts as a non-word character, just like a space.
Match 2 character words
The following pattern can be used to match two character words:
pattern = /(^|[\s\.,])(\S{2})(?=$|[\s\.,])/g;
(More accurately: it matches any run of exactly two non-whitespace characters.)
That is:
(^|[\s\.,]) - start of string or whitespace/punctuation (back reference 1)
(\S{2}) - two not-whitespace characters (back reference 2)
(?=$|[\s\.,]) - end of string or whitespace/punctuation (positive lookahead, not consumed)
That pattern can be used like so to remove matching words, keeping the separator captured in group 1:
"input string".replace(pattern, "$1");
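A runnable sketch of that pattern on short excerpts of the texts in the question. replace() needs a replacement argument; passing "$1" puts the captured separator back while the two-character word in group 2 is dropped. The function name dropTwoCharWords and the whitespace clean-up are illustrative additions:

```javascript
const pattern = /(^|[\s\.,])(\S{2})(?=$|[\s\.,])/g;

function dropTwoCharWords(text) {
  // "$1" re-inserts group 1 (start of string, whitespace or punctuation);
  // then collapse the doubled spaces and trim the ends.
  return text.replace(pattern, "$1").replace(/\s{2,}/g, " ").trim();
}

console.log(dropTwoCharWords("Το γάρ ούν και παρ' υμίν λεγόμενον, ώς ποτε"));
console.log(dropTwoCharWords("For in truth the story that is told"));
```

Because \S counts any non-space character, an elided form such as τ' (letter plus apostrophe) would also be treated as a two-character word.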
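Try something like this:
\s[a-zA-ZΆΈ-ώἀ-ῼ]{2}\s
Note that it consumes the spaces on both sides, so it misses two-letter words at the very start or end of the text, next to punctuation, or directly after another match.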
