How to delete the last occurrence of the match? - javascript

I have the lyrics in the format like:
[00:26.8]Lo [00:27.0]rem[00:27.2] Ipsum[00:27.4] sam[00:27.6]ple [00:27.9]text[00:28.1] to [00:28.5]
[00:28.51]demonstrate[00:28.7] the[00:28.9] lyrics[00:29.1] text
I use the following regex to match time tags ([hh:mm.ss]):
/\[\d{1,2}:\d{1,2}\.\d{0,2}\]/ig
But how can I find and delete the last time tag ([00:29.1] in the example above)? I understand that I can match all occurrences, take the last one, find the position of the corresponding tag within the text (using lastIndexOf), and then delete the tag. But is there a better way to achieve this?
Upd. There is one more condition - if the time tag is at the beginning of the line, then it shouldn't be removed. I.e. in case of the lyrics:
[00:26.8]Lo [00:27.0]rem[00:27.2] Ipsum[00:27.4] sam[00:27.6]ple [00:27.9]text[00:28.1] to [00:28.5]
[00:28.51]demonstrate
The tag found and deleted should be [00:28.5], not [00:28.51].

Add a negative lookahead assertion to ensure that no other [ follows the matched tag:
/\[\d{1,2}:\d{1,2}\.\d{0,2}\](?!(.|\n)*\[)/
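As a quick check of the lookahead pattern above, here is a minimal sketch run against a shortened version of the sample lyrics. The variable names are only illustrative. Note that this pattern removes the very last tag in the string, so it does not account for the start-of-line condition from the update.

```javascript
// Shortened sample lyrics (illustrative data).
const lyrics =
  "[00:26.8]Lo [00:27.0]rem\n" +
  "[00:28.51]demonstrate[00:28.7] the[00:29.1] text";

// A time tag that is not followed by another "[" anywhere later in the string.
const lastTag = /\[\d{1,2}:\d{1,2}\.\d{0,2}\](?!(.|\n)*\[)/;

const withoutLastTag = lyrics.replace(lastTag, "");
console.log(withoutLastTag);
// [00:26.8]Lo [00:27.0]rem
// [00:28.51]demonstrate[00:28.7] the text
```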

You can also do this without a lookahead by putting a greedy [^]* or [\s\S]* in front of the tag to consume everything before the last one.
var str = str.replace(/^([^]*)(\[\d{1,2}:\d{1,2}\.\d{0,2}\])/, "$1");
Replace with the captured first part.
Regarding the update: add a [^\n] before the tag, so that a tag at the start of a line is skipped:
var str = str.replace(/^([\s\S]*[^\n])(\[\d{1,2}:\d{1,2}\.\d{0,2}\])/, "$1");
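To see how the [^\n] variant behaves, here is a small sketch. The function name removeLastInlineTag and the shortened lyrics are assumptions made for illustration. The greedy [\s\S]* first tries the last tag in the string and backtracks to an earlier one when that tag is preceded by a newline.

```javascript
// Removes the last time tag that is not at the very start of a line.
function removeLastInlineTag(str) {
  return str.replace(/^([\s\S]*[^\n])(\[\d{1,2}:\d{1,2}\.\d{0,2}\])/, "$1");
}

// The last tag overall starts the second line, so the one before it is removed.
const updated = removeLastInlineTag("[00:26.8]Lo [00:28.1] to [00:28.5]\n[00:28.51]demonstrate");
console.log(JSON.stringify(updated));
// "[00:26.8]Lo [00:28.1] to \n[00:28.51]demonstrate"
```

A string whose only tag sits at the very beginning is left unchanged, because [^\n] needs at least one character in front of the tag.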

The following should do it. It makes sure the wanted [hh:mm.ss] tag isn't followed by any other opening or closing bracket before the end of the string, which means that it's the last one you're looking for:
(\[\d{1,2}:\d{1,2}\.\d{0,2}\])(?=[^\]\[]*$)

Related

Repeat a javascript replace until no change is made?

I have a short but complex regular expression to trim spaces regardless of html tags present in the string.
var text = "<span><span>ex ample </span> </span>";
// trim from start; not relevant in this example
text = text.replace(/^((<[^>]*>)*)\s+/g, "$1");
// trim from end
text = text.replace(/\s+((<[^>]*>)*)$/g, "$1");
console.log(text);
<span><span>ex ample </span> </span> - example input
<span><span>ex ample</span></span> - expected output
<span><span>ex ample </span></span> - observed output
How do I achieve my expected output?
I've tried adding the /g flag because it should supposedly match more than once and that should fix it (running the replace twice does work for the example) but it doesn't seem to repeat anything at all.
Alternative ways to trim strings regardless of tags are also appreciated because that is my primary objective. The secondary objective is learning why this didn't work.
You need to add some meaning to your tags: some need their spaces, some don't.
Try this:
text.replace(/\s*(<\/?(span|div)>)\s*/g, "$1")
.trim()
.replace(/\s+/g, ' ');
It:
1. replaces spaces around tags "surrounding" content
2. trims spaces around the whole string
3. removes redundant spaces
The list of "surrounding" tags can be changed to include things like tr...
Steps 2 and 3 might come first to speed things up.
Tried it with:
var text = "<div> <i>ano</i> <b>ther</b> <span> <b>my</b> <i>ex</i> <u> ample </u> </span> </div>";
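The three steps above can be wrapped in a function that returns the result, since replace and trim do not modify the string in place. This is only a sketch; the name trimAroundTags is made up for illustration, and the tag list is the one from the answer.

```javascript
function trimAroundTags(text) {
  return text
    .replace(/\s*(<\/?(span|div)>)\s*/g, "$1") // 1. drop spaces touching span/div tags
    .trim()                                     // 2. trim the whole string
    .replace(/\s+/g, " ");                      // 3. collapse remaining whitespace runs
}

console.log(trimAroundTags("<span><span>ex ample </span> </span>"));
// <span><span>ex ample</span></span>
```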
First answer, prior to comments.
The idea is to remove all spaces between:
a non-space character and an opening tag
a closing tag and a non-space character
text.replace(/([^\s])\s*(<)/g, "$1$2")
.replace(/([>])\s*([^\s])/g, "$1$2")
.trim();
Preamble: don't just copy this, read to the end.
Thinking from the other way around - by replacing until no match is found instead of until no change is made, this seems to work very simply.
var text = "<span><span>ex ample </span> </span>";
var trim_start = /^((<[^>]*>)*)\s+/;
while(text.match(trim_start)) {
text = text.replace(trim_start, "$1");
}
var trim_end = /\s+((<[^>]*>)*)$/;
while (text.match(trim_end)) {
text = text.replace(trim_end, "$1");
}
console.log(text);
The output is as expected - the only space is between ex ample
But this has a big problem if the replace might not change anything. Simply changing \s+ to \s* makes it turn into an infinite cycle. So, all in all, it works for my case but is not robust and to use it, you must be completely sure every single replace will change something when the regex matches.
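On the secondary question of why the g flag did not help: with g, replace finds every non-overlapping match in a single pass over the original string and never rescans its own output. Because the trim_end pattern is anchored with $, it can match only once per pass, so only one run of spaces is removed. A sketch of "repeat until no change" that stops as soon as a pass changes nothing (and therefore cannot loop forever on \s*) might look like the following; replaceUntilStable and maxPasses are hypothetical names, not part of any library.

```javascript
// Applies the replacement repeatedly until the string stops changing.
// maxPasses is a safety limit in case a replacement keeps growing the string.
function replaceUntilStable(text, pattern, replacement, maxPasses = 100) {
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = text.replace(pattern, replacement);
    if (next === text) return next; // nothing changed, so we are done
    text = next;
  }
  return text;
}

const trimmedEnd = replaceUntilStable(
  "<span><span>ex ample </span> </span>",
  /\s+((<[^>]*>)*)$/,
  "$1"
);
console.log(trimmedEnd);
// <span><span>ex ample</span></span>
```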

Trying to write a regex where a newline may appear anywhere in a group

I'm trying to make a regex divide text into two parts and ignore everything that comes after these two parts.
The (insufficient) regex I'm trying to use is:
/Artikelnummer(?:(&&&))(.*)(?:\s*.*)\W?(?:Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&)((.*)&&&(.*)&&&(\d)+)*/
The text I'm matching is saved at these links:
https://regex101.com/r/VDnUoe/1
https://regex101.com/r/j62Mw0/2
Part 1) Everything after Artikelnummer and before Dokumentation... (easy to match)
Part 2) Everything after (?:Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&) that follows the pattern:
text&&&text&&&digits
In one of the above links, the above pattern works except for a new line that is thrown in, which causes some text to be left out that should be included.
The first part is matched:
all&&&Vorwort&&&1&&&all&&&Sicherheit&&&2&&&all&&&Richtlinien und Normen&&&3&&&all&&&Produktbeschreibung&&&4&&&all&&&Installation&&&5&&&all&&&Wichtige Informationene zur Inbetriebnahme&&&6&&&all&&&Projektierung - Wichtige Infos&&&7&&&all&&&Anhang 1&&&8&&&all&&&Anhang 2&&&9&&&all&&&Anhang 3&&&10&&&all&&&Anhang 4&&&11&&&all&&&Anhang 5&&&12&&&all&&&Anhang 6&&&13&&&all&&&Anhang 7&&&14&&&all&&&Anhang 8&&&15&&&all&&&Anhang 9&&&16&&&all&&&Anhang 10&&&17&&&all&&&Anhang 11&&&18&&&all&&&Anhang 12&&&19&&&all&&&Anhang 13&&&20&&&all&&&Anhang 14&&&21&&&all&&&Anhang 15&&&22&&&all&&&Anhang 16&&&23&&&all&&&Anhang 17&&&24&&&all&&&Anhang 18&&&25&&&all&&&Anhang 19&&&26&&&all&&&Anhang 20&&&27&&&all&&&Anhang 21&&&28&&&all&&&Anhang 22&&&29&&&all&&&Anhang 23&&&30&&&all&&&Anhang 24&&&31&&&all&&&Anhang 25&&&32&&&all&&&Anhang 26&&&33
And then this isn't matched, because a newline is inserted:
all&&&Anhang 27&&&34&&&all&&&Anhang 28&&&35&&&all&&&Anhang 29&&&36&&&all&&&Anhang 30&&&37&&&all&&&Anhang 31&&&38&&&all&&&Anhang 32&&&39&&&all&&&Anhang 33&&&40&&&all&&&Anhang 34&&&41&&&all&&&Anhang 35&&&42&&&all&&&Anhang 36&&&43&&&all&&&Anhang 37&&&44&&&all&&&Anhang 38&&&45
My question is, how can this regex be rewritten so that a newline could theoretically be placed anywhere within the second part of the text and still match everything I want?
I'm not sure this is what you want, but this regex works with newlines too:
Artikelnummer(?:(&&&))(.*)(?:\s*.*)\W?(?:Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&)((.*)&&&(.*)&&&(\d)+(\n?)*)*
\n matches newline
? is the quantifier for zero or one (if newline is found or not)
* I added this one in case more than one newline is encountered
I would try a regex like this:
(Artikelnummer([\n|\r| |\S]*)(?=Dokumentation))(([\n|\r| |\S]*&&&){2}\d+)*
This looks for \n, \r, spaces, and all non-space characters.
Second, I wouldn't use ?: for every match. The positive lookahead ?= should cover the requirement for the first group.
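A different technique from the regexes above, shown here only as a sketch: if line breaks carry no meaning in this data (an assumption), they can be removed before matching, so it no longer matters where they were inserted. The sample text, the function name parseEntries, and the property names kks, title and page are all made up for illustration; the real input is only available through the regex101 links.

```javascript
// Hypothetical sample with line breaks in awkward places.
const sampleText =
  "Artikelnummer&&&4711\n" +
  "Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&" +
  "all&&&Vorwort&&&1&&&all&&&Sicher\nheit&&&2&&&all&&&Anhang 1&&&\n3";

const TABLE_HEADER = "Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&";

function parseEntries(text) {
  const start = text.indexOf(TABLE_HEADER);
  if (start === -1) return [];
  // Remove line breaks first so they cannot interrupt an entry.
  const body = text.slice(start + TABLE_HEADER.length).replace(/[\r\n]+/g, "");
  // Sticky (y) matching: each entry must begin exactly where the previous one
  // ended, so anything after the last well-formed entry is ignored.
  const entry = /([^&]+)&&&([^&]+)&&&(\d+)(?:&&&|$)/y;
  const entries = [];
  let match;
  while ((match = entry.exec(body)) !== null) {
    entries.push({ kks: match[1], title: match[2], page: Number(match[3]) });
  }
  return entries;
}

console.log(parseEntries(sampleText));
// three entries: Vorwort (1), Sicherheit (2), Anhang 1 (3)
```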

match text between comment string in javascript code

I am trying to strip out code between some tags. It's from a JavaScript plugin and it has multiple occurrences.
For example:
/*<ltIE8>*/ ╗
if (!item.hasOwnProperty) return false; ╣ this should match / go away
/*</ltIE8>*/ ╝
return item instanceof object; // this should not go away/match
...
/*<ltIE8>*/ ╗
if (!window.addEvenetListener) return false; ╣ this should match / go away
/*</ltIE8>*/ ╝
return window.addEvent;
I would like to match/remove those two blocks.
Tried using lookaheads like \/\*<ltIE8>\*\/(?!=\/\*<\/ltIE8>\*\/)([\s\S]+) but it ends up matching from the first occurrence to the last, and missing the ones in-between.
Example: https://regex101.com/r/iD6mL8/1
Any suggestions? (I will be doing these replacements using JavaScript/NodeJS).
\/\*<ltIE8>\*\/([\s\S]+?)(?=\/\*<\/ltIE8>\*\/)
Try this. See demo.
https://www.regex101.com/r/fG5pZ8/18
I suck at regex, but this seems to work:
(\/\*<ltIE8>\*\/)[\s\S]*?(\/\*<\/ltIE8>\*\/)
The key to this solution is the ?, it tells the regex to not be greedy which basically means it stops when it finds the next /*</ltIE8>*/ rather than going all the way to the very last one.
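For the actual removal in JavaScript/NodeJS, a minimal sketch with the lazy pattern and the g flag could look like this. The sample source is a shortened stand-in for the plugin code, and the trailing \n? (which also drops the line break after each closing comment) is an addition not found in the answers. Note that a pattern ending in a lookahead for the closing comment would leave that comment in place, because a lookahead does not consume what it checks.

```javascript
// Shortened stand-in for the plugin source (illustrative data).
const pluginSource = [
  "/*<ltIE8>*/",
  "if (!item.hasOwnProperty) return false;",
  "/*</ltIE8>*/",
  "return item instanceof object;",
  "/*<ltIE8>*/",
  "if (!window.addEventListener) return false;",
  "/*</ltIE8>*/",
  "return window.addEvent;",
].join("\n");

// Lazy [\s\S]*? stops at the nearest closing comment; g handles every block.
const ltIE8Block = /\/\*<ltIE8>\*\/[\s\S]*?\/\*<\/ltIE8>\*\/\n?/g;

const stripped = pluginSource.replace(ltIE8Block, "");
console.log(stripped);
// return item instanceof object;
// return window.addEvent;
```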

I want to find numbers (including dots and commas), but my regex is not working in JavaScript

I thought this would be very simple, but no matter how many ways I try, it still doesn't work properly.
Below is the test snippet.
"100$ and 1.000,000EUR 1,00.0.000USD .90000000000000000000$ (09898)".replace(/[\.,\d]*/g, '{n}')
And I want the result like below.
{n}$ and {n}EUR {n}USD {n}$ ({n})
The * is your problem, change the regex to /[.,\d]+/g instead.
"100$ and 1.000,000EUR 1,00.0.000USD .90000000000000000000$ (09898)".replace(/[.,\d]+/g, '{n}');
Output
{n}$ and {n}EUR {n}USD {n}$ ({n})
JSFiddle Example. Check the console for the output.
The problem here is that [\.,\d]* can match an empty string. The first step would be to use [.,\d]+ so that at least one of these characters matches.
But a better regex would be \d[.,\d]* because it ensures the replaced characters begin with a digit, so it won't replace periods in sentences.
If you want to go further, you can also use (?=[.,\d]*\d)[.,\d]+ to handle numbers starting with periods. This one would be the proper answer for your case. The lookahead ensures there's at least one digit somewhere in the replaced text.
Note that you don't need to escape the . inside a character class.
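The difference between the three patterns is easiest to see side by side. A small sketch, with variable names chosen only for illustration:

```javascript
const prices = "100$ and 1.000,000EUR 1,00.0.000USD .90000000000000000000$ (09898)";
const sentence = "Done. Total: 5.";

const anyRun = /[.,\d]+/g;                  // any run of digits, dots and commas
const digitFirst = /\d[.,\d]*/g;            // run must start with a digit
const needsDigit = /(?=[.,\d]*\d)[.,\d]+/g; // run must contain at least one digit

console.log(prices.replace(anyRun, "{n}"));       // {n}$ and {n}EUR {n}USD {n}$ ({n})
console.log(prices.replace(digitFirst, "{n}"));   // {n}$ and {n}EUR {n}USD .{n}$ ({n})
console.log(prices.replace(needsDigit, "{n}"));   // {n}$ and {n}EUR {n}USD {n}$ ({n})

console.log(sentence.replace(anyRun, "{n}"));     // Done{n} Total: {n}
console.log(sentence.replace(needsDigit, "{n}")); // Done. Total: {n}
```

The last two lines show why the lookahead version is safer on ordinary text: a lone period is no longer replaced, although a period directly after a number is still consumed.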
\.?\d[^\s]*\d
Try this. Replace with {n}. See demo.
http://regex101.com/r/kP8uF5/3
var re = /\.?\d[^\s]*\d/gm;
var str = '100$ and 1.000,000EUR 1,00.0.000USD .90000000000000000000$ (09898)';
var subst = '{n}';
var result = str.replace(re, subst);

Javascript RegExp Matching weirdness

I have a RegExp:
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
and some text "Champion"
somehow, this is coming back as a match, am I crazy?
0: "pio"
1: "i"
index: 4
input: "Champion"
length: 2
the loop is here:
// construct the pattern dynamically
var someText = "Champion";
var phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
var pat = new RegExp(phrase, "gi"); // <- ends up being
var result;
while( result = pat.exec(someText) ) {
// do stuff!
}
There has to be something wrong with my RegExp, right?
EDIT:
The .? thing was just a quick and dirty attempt to say that I'd like to match one of those words AND/OR one of those words with a single char on either side. ex:
\sNCAA\s
NCAA
NCAA\s
\sNCAA
GOAL:
I'm trying to do some simple hit highlighting based on some search words. I've got a function that gets all of the text nodes on a page, and I'd like to go through them all and highlight any matches to any of the terms in my phrase variable.
I think that I just need to rework how I am building my RegExp.
Well, first of all you're specifying case-insensitivity, and secondly, you are matching the letter I as one of your matchable strings.
Champion would match pio and i, because they both match /.?I.?/gi
It however doesn't match /.?Champions,.?/gi because of the trailing comma.
Add start (^) and end ($) anchors to the regexp.
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
Without the anchors, the regexp's match can start and end anywhere in the string, which is why
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
can match pio and i: because it's actually matching around the (case-insensitive) I. If you leave the anchors off, but remove the ...|I|..., the regex won't match 'Champion':
> /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
null
Champion matches /.?I.?/i.
Your own output notes that it's matching the substring "pio".
Perhaps you meant to bound the expression to the start and end of the input, with ^ and $ respectively:
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
I know you said to ignore the .?, but I can't: it's most likely wrong, and it's most likely going to continue to cause you problems. Explain why they're there and we can tell you how to do it properly. :)
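For the stated goal of highlighting whole search terms inside larger text nodes, one possible alternative to the .? padding and the ^/$ anchors is to wrap the alternation in word boundaries (\b). This is a swapped-in technique rather than something the answers above proposed, and the helper names escapeRegExp and buildTermPattern are hypothetical. It also assumes the trailing comma in "Champions," can be dropped, since \b after a comma would require a word character to follow.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive pattern that matches any term as a whole word.
function buildTermPattern(terms) {
  return new RegExp("\\b(?:" + terms.map(escapeRegExp).join("|") + ")\\b", "gi");
}

const searchTerms = ["NCAA", "Division", "I", "Basketball", "Champions", "1939-2011"];
const termPattern = buildTermPattern(searchTerms);

// The original behaviour: the lone "I" alternative matches inside "Champion".
console.log(/.?(I).?/i.exec("Champion")[0]); // pio

// With word boundaries, the "i" inside "Champion" no longer counts.
console.log("Champion".match(termPattern)); // null
console.log("NCAA Division I Basketball Champions, 1939-2011".match(termPattern));
// [ 'NCAA', 'Division', 'I', 'Basketball', 'Champions', '1939-2011' ]
```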
