Make this RegEx work in Javascript? - javascript

I need to get the data of an SRT file as an array using a regex.
Here is my code so far:
JavaScript:
function readSrt() {
  var srtUrl = 'assets/media/subtitles.srt';
  $.get(srtUrl, function(data) {
    console.log("SRT:", data); // it reads ok
    var regexp = /(.*)\n(.*),\d\d\d --> (.*)\n(.*)/g; // this regex doesn't work
    console.log("SUBS:", data.match(regexp)); // outputs null
  });
}
subtitles.srt:
0
00:00:00,000 --> 00:00:01,000
Instructor…All right, let's start off
1
00:00:01,000 --> 00:00:04,000
here. We were, I think, wrapping up kind
...
14
00:00:40,000 --> 00:00:42,000
mound, basically.
15
00:00:42,000 --> 00:00:44,000
If you go to Colossae today, none of it
...
Need to get:
1. 0
2. 00:00:00
3. 00:00:01,000
4. Instructor…All right, let's start off
I made several attempts on regex101.com, but the regex only seems to work with the PHP flavor, not with JavaScript.
What am I doing wrong and how do I fix it?

One thing that might be wrong in your regex is that .* is greedy: it grabs as much as it can before giving anything back. Try replacing it with its lazy alternative, to match as little as possible each time.
/(.*?)\n(.*?),\d\d\d --> (.*?)\n(.*?)\n/

You're relying on the . wildcard character to also catch \n (and presumably \r), which it won't do (see http://www.regular-expressions.info/javascript.html for the info on that, "There is indeed no /s modifier" bit). Use explicit negated character classes instead for greater win: use [^\n]+\n for "anything and then a newline", [^,]+, for "anything before the comma", etc.

You're close. The main problem with your regex is the part around (.*),\d\d\d, which doesn't work because (.*) will match the entire line of text, making the rest of the regex invalid.
The fixed regex is:
/(.*)\n([^,]+),[\d]{3} --> (.*)\n(.*)(?:\n*)/g
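Putting the answers together, the whole extraction could be sketched roughly as follows. This is only an illustration: parseSrt and the sample string are hypothetical names, not part of the question's code. It assumes a runtime that has String.prototype.matchAll, and it uses \r?\n because . matches neither \r nor \n in JavaScript, so a file with Windows line endings would otherwise produce no match.

```javascript
// Hypothetical helper (not from the question): parse SRT text into cue objects.
function parseSrt(data) {
  // \r?\n tolerates both Unix (\n) and Windows (\r\n) line endings.
  const cue = /(\d+)\r?\n(\d\d:\d\d:\d\d),\d{3} --> (\d\d:\d\d:\d\d,\d{3})\r?\n([^\r\n]*)/g;
  return Array.from(data.matchAll(cue), (m) => ({
    index: m[1], // cue number, e.g. "0"
    start: m[2], // start time without milliseconds, as the question asks
    end: m[3],   // end time with milliseconds
    text: m[4],  // first line of the caption text
  }));
}

// Assumed sample: the first two cues from the question, with \r\n line endings.
const sample =
  "0\r\n00:00:00,000 --> 00:00:01,000\r\nInstructor…All right, let's start off\r\n" +
  "1\r\n00:00:01,000 --> 00:00:04,000\r\nhere. We were, I think, wrapping up kind\r\n";

const cues = parseSrt(sample);
console.log(cues);
```

Inside the question's $.get callback, this would be called as parseSrt(data).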

Related

Regex: Replace last segment of url

I am trying to figure out the correct regex to replace the last segment of a URL with a modified version of that very last segment. (I know that there are similar threads out there, but none seemed to help...)
Example:
https://www.test.com/one/two/three/mypost/
--->
one/two/three?id=mypost
https://www.test.com/one/mypost/
--->
one?id=mypost
Now I am stuck here:
https://regex101.com/r/9GqYaU/1
I can get the last segment in capturing group 2 but how would I replace it?
I think I will have to do something like this:
const url = 'https://www.test.com/one/two/three/mypost/'
const regex = /(http[s]?:\/\/)([^\/]+\/)*(?=\/$|$)/
const path = url.replace(regex, `${myUrlWithoutTheLastSegmentAnd WithoutHTTPS}?id=$2`)
return path
But I have no idea how to get the url without the last segment. I have currently only access to the whole string or group 1 (which is useless in this case) and then group 2, but not the string without group 2.
I would be very glad for any help here. Sometimes I just lack the knowledge of what is possible with regex and how to achieve it.
Thank you in advance.
Cheers
You could use the URL class to extract the pathname and substring to remove the first '/'.
Then, you could put the last part of the pathname in a group and use it as a reference $1 for the replacement.
const url = new URL('https://www.test.com/one/two/three/mypost/').pathname.substring(1)
console.log(url.replace(/\/([^/]*)\/$/, '?id=$1'))
I came across your question yesterday and agree with going down the route of parsing the URL. Once you get there you could even use JavaScript array methods which I prefer to string methods like:
pathname.split("/").filter(p => p.length).pop()
This would separate each folder, ignore any with no length (i.e. handle a trailing slash) and return the last one (mypost).
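The array approach can be taken one step further to produce the exact output the question asks for. The sketch below is only one possible way to do it, and lastSegmentToQuery is a hypothetical name chosen for illustration.

```javascript
// Hypothetical helper: move the last path segment into an ?id= query string.
function lastSegmentToQuery(href) {
  // Split the pathname into folders and drop the empty strings that the
  // leading and trailing slashes produce.
  const segments = new URL(href).pathname.split("/").filter((p) => p.length);
  const last = segments.pop(); // e.g. "mypost"
  return `${segments.join("/")}?id=${last}`;
}

console.log(lastSegmentToQuery("https://www.test.com/one/two/three/mypost/")); // one/two/three?id=mypost
console.log(lastSegmentToQuery("https://www.test.com/one/mypost/")); // one?id=mypost
```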
Anyway, I am also learning regex so sometimes when I find a question like this I just try to find the answer anyway as the best way of learning is doing. It took 24 hours 😂 I came up with this:
/(https?:\/\/).+?([a-z-]*)\/?$/gm
(https?:\/\/) you know what this does. Small correction, you don't need the square brackets. Question mark matches 0 or 1 of the preceding character. As we're only matching s this just works. If you wanted to match s or z you would use [sz]?. I think.
.+? this is the cool one I think I will use in future now I found it. The question mark here has a different meaning - it makes .+ (which means one or more of any character) non-greedy. That means it matches as few characters as possible and hands over as soon as the rest of the pattern can match. Which is...
([a-z-]*) any number of letters or a hyphen. You should maybe change this to include numbers and upper case.
\/? Optional slash
$ all this must apply at the end of the string.
Here is a demo
https://regex101.com/r/mQNkIS/1
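To see what the lazy quantifier changes, the same pattern can be run with and without the question mark after .+ on the URL from the question. This is a small sketch for comparison only; the variable names are made up.

```javascript
const url = "https://www.test.com/one/two/three/mypost/";

// Lazy .+? stops as early as the rest of the pattern allows,
// so group 2 receives the last segment.
const lazy = url.match(/(https?:\/\/).+?([a-z-]*)\/?$/);
console.log(lazy[2]); // mypost

// Greedy .+ swallows the whole path first, and since ([a-z-]*) is allowed
// to match nothing, group 2 ends up empty.
const greedy = url.match(/(https?:\/\/).+([a-z-]*)\/?$/);
console.log(greedy[2] === ""); // true
```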

Regular Expression to find merge conflicts in file

This is the file which contains merge conflicts,
<<<<<<< HEAD
$conf['some_unit_id'] = '4-qw-gg-ds-sometext';
=======
// Some Snippets Site Info
$conf['site_info'] = array(
'customer_service_phone' => '+1 323223232
'logo_path' => 'https://www.google.com/img/icons/src/logo.svg',
'currency' => 'CAD',
'https://www.youtube.com/user/somewebsite/ogog',
'https://www.instagram.com/somewebsite/',
),
);
>>>>>>> ff6df3435231fdff78fwsd83e7dffa0732eft554
// Somes code
$done['rules'] = TRUE;
I am trying to find the best regular expression to detect merge conflicts in the file. Initially I tried:
/(<* HEAD)/
Which will detect only HEAD with some preceding <
I have some other markers as well like :
1. ======
2. >>>>> ff6df3435231fdff78fwsd83e7dffa0732eft554
These two markers must be detected along with the HEAD marker. And if a developer removes only the <* HEAD marker while fixing the conflict and leaves the rest (i.e., ===== and >>> ff6df3435231fdff78fwsd83e7dffa0732eft554), the regular expression should detect that as well.
I am using this regular expression in a pre-commit hook: if any one of these patterns is detected in a file, the commit should fail. I need an exact regex to detect merge conflict markers.
Any solution would be appreciated.
Since they're all the same length, you can use a character class:
/^[<=>]{7}( .+)?$/mg
(make sure to use the multiline flag, m, so that ^ and $ match at every line)
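As a rough sketch of how this pattern could be used for the check, the snippet below lists every marker line in a string. findConflictMarkers and the sample text are hypothetical, and reading the staged files inside the actual pre-commit hook is left out.

```javascript
// Hypothetical helper: return every line that looks like a conflict marker.
function findConflictMarkers(text) {
  const marker = /^[<=>]{7}( .+)?$/gm;
  return Array.from(text.matchAll(marker), (m) => m[0]);
}

// Assumed sample, shortened from the file in the question.
const conflicted = [
  "<<<<<<< HEAD",
  "$conf['some_unit_id'] = '4-qw-gg-ds-sometext';",
  "=======",
  "$conf['currency'] = 'CAD';",
  ">>>>>>> ff6df3435231fdff78fwsd83e7dffa0732eft554",
  "$done['rules'] = TRUE;",
].join("\n");

// Simulate a developer who deleted only the HEAD marker.
const partlyFixed = conflicted.replace("<<<<<<< HEAD\n", "");

console.log(findConflictMarkers(conflicted).length); // 3
console.log(findConflictMarkers(partlyFixed).length); // 2
```

Because each marker is matched on its own, a half-resolved conflict is still reported, which is the behavior the question asks for.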
You can use:
^<{7} HEAD(?:(?!={7})[\s\S])*={7}(?:(?!>{7} \w+)[\s\S])*>{7} \w+
Demo & explanation
You might also match all the lines by checking the start of each line to prevent some of the unnecessary backtracking using [\s\S].
First match the <<<<<<< HEAD part, then match all following lines that do not start with ======= and then match it.
Then match all lines that do not start with >>>>>>>, and finally match >>>>>>> itself followed by a space and one or more characters from [a-z0-9].
^<{7} HEAD(?:\r?\n(?!={7}\r?\n).*)*\r?\n={7}(?:\r?\n(?!>{7} ).*)*\r?\n>{7} [a-z0-9]+
Regex demo
If you want to highlight the markers, you could use a capturing group:
^(<{7} HEAD)(?:\r?\n(?!={7}\r?\n).*)*\r?\n(={7})(?:\r?\n(?!>{7} ).*)*\r?\n(>{7} [a-z0-9]+)
Regex demo
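A quick way to try the capturing-group version in JavaScript is shown below. The sample text and variable names are assumptions for illustration; the m flag is needed so that ^ can match at the start of a line inside the file.

```javascript
const block = /^(<{7} HEAD)(?:\r?\n(?!={7}\r?\n).*)*\r?\n(={7})(?:\r?\n(?!>{7} ).*)*\r?\n(>{7} [a-z0-9]+)/m;

// Assumed sample: one small conflict followed by ordinary code.
const source = "<<<<<<< HEAD\nours\n=======\ntheirs\n>>>>>>> ff6df34\nafter\n";

const found = source.match(block);
console.log(found[1]); // <<<<<<< HEAD
console.log(found[2]); // =======
console.log(found[3]); // >>>>>>> ff6df34
```

Note that this pattern only matches a complete conflict block, so it would not flag a file where one of the three markers has already been removed.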
If I understand correctly, you want to find the blocks of code where a conflict still needs to be resolved. I hope my suggestion helps.
/^<{7}\sHEAD[\s\S]+?>{7}\s\w+$/gm
Details:
Mode: multiline
^<{7}\sHEAD: block code starts with <<<<<<< HEAD
[\s\S]+?: get any character as few times as possible (line break accepted)
>{7}\s\w+$: block code ends with >>>>>>> commit hash
Demo
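The lazy [\s\S]+? matters most when a file contains more than one conflict, because it keeps each match from running on into the next block. A minimal sketch, with a made-up two-conflict sample:

```javascript
const conflictBlock = /^<{7}\sHEAD[\s\S]+?>{7}\s\w+$/gm;

// Assumed sample: two separate conflicts in one file.
const file = [
  "a",
  "<<<<<<< HEAD",
  "x",
  "=======",
  "y",
  ">>>>>>> abc123",
  "b",
  "<<<<<<< HEAD",
  "p",
  "=======",
  "q",
  ">>>>>>> def456",
].join("\n");

// match() returns null when nothing is found, hence the fallback.
const blocks = file.match(conflictBlock) || [];
console.log(blocks.length); // 2
```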

How can I only capture string within url() with regex?

I'll try and make this concise.
I want to only capture the string within the parentheses of:
background-image:url()
I've tried for hours and the best I could come up with was (?<!\#\*)(?<=url\().*(?=\)).
(Try it online)
This is almost perfect except when there is a run on line like this or any minified CSS code:
background-image:url(/images/products/test#2x.png);height:0;background-image:url("/images/products/test#2x.png")
It captures everything from the first open parentheses to the last closed parenthesis including the irrelevant styles in between.
I only want to capture the string between the url()'s.
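Using a positive lookbehind and a positive lookahead: /(?<=background-image:\s*url\().*?(?=\))/ig will return everything inside each background-image: url() call, with \s* to catch spaces. The lazy .*? stops at the first closing parenthesis; a greedy .* would run on to the last one on the line.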
Use this regex:
(?<=background-image:url\()(.+?)(?=\))
It finds every xxx in background-image:url(xxx) without including the surrounding background-image:url() itself.
You can check the result at https://regexr.com.
Note: lookaround = lookahead and lookbehind.
For more detail, see this (Chinese) tutorial: 环视断言 · 应用广泛的超强搜索:正则表达式 ("Lookaround assertions", from "Regular expressions: a widely used, powerful search tool").
Below is the regex which will work for you:
(?<!\#\*)(?<=url\().*?(?=\))
Hope this helps!
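Here is a small sketch of the lazy version run against the minified line from the question. It assumes a JavaScript engine with lookbehind support. The second pattern is a swapped-in alternative that is not from any of the answers: it uses a capture group instead of lookarounds and also drops optional quotes around the URL.

```javascript
const css =
  'background-image:url(/images/products/test#2x.png);height:0;' +
  'background-image:url("/images/products/test#2x.png")';

// Lazy .*? stops at the first closing parenthesis after each url(.
const urls = css.match(/(?<!\#\*)(?<=url\().*?(?=\))/g);
console.log(urls.length); // 2 (the second entry still carries its quotes)

// Alternative without lookbehind: group 1 is the optional quote,
// group 2 is the URL, and \1 requires the same quote to close it.
const unquoted = Array.from(
  css.matchAll(/url\((["']?)(.*?)\1\)/g),
  (m) => m[2]
);
console.log(unquoted);
```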

What Regex would capture both the beginning and end of a string?

I am trying to edit a DateTime string in a TypeScript file.
The string in question is 02T13:18:43.000Z.
I want to trim the first three characters, including the letter T, from the beginning of the string AND also all 5 characters from the end of the string, that is .000Z, including the dot character. Essentially I want the result to look like this: 13:18:43.
From what I found the following pattern (^(.*?)T) can accomplish only the first part of the trim I require, that leaves the initial result like this: 13:18:43.000Z.
What kind of Regex pattern must I use to include the second part of the trim I have mentioned? I have tried to include the following block in the same pattern (Z000.)$ but of course it failed.
Thanks.
Any help would be appreciated.
There is no need to use regular expression in order to achieve that. You can simply use:
let value = '02T13:18:43.000Z';
let newValue = value.slice(3, -5);
console.log(newValue);
It will return 13:18:43, assuming that your string always has the same pattern. According to the documentation, the slice method extracts the substring from beginIndex up to (but not including) endIndex. endIndex is optional, and a negative value counts back from the end of the string.
Since you asked for a regex solution, does this pattern work?
(\d{2}:)+\d{2} or simply \d{2}:\d{2}:\d{2}
It matches one or more digit-digit-colon groups followed by a final pair of digits.
The only disadvantage is that it doesn't validate the values, e.g. it won't reject minutes greater than 59.
I left that check out because I assumed your dates come from a source where the stored data is already valid, e.g. a database.
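Applied to the string from the question, the pattern could be used with match rather than replace, since it describes the part to keep instead of the parts to remove. A minimal sketch, with made-up variable names:

```javascript
const value = "02T13:18:43.000Z";

// match() returns null when the pattern is not found, so guard before indexing.
const match = value.match(/\d{2}:\d{2}:\d{2}/);
const time = match ? match[0] : null;

console.log(time); // 13:18:43
```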
Solution
This should suffice to remove both the prefix from beginning to T and postfix from . to end:
/^.*T|\..*$/g
console.log(new Date().toISOString().replace(/^.*T|\..*$/g, ''))
See the visualization on debuggex
Explanation
The section ^.*T removes all characters up to and including the last encountered T in the string.
The section \..*$ removes all characters from the first encountered . to the end of the string.
The | in between coupled with the global g flag allows the regular expression to match both sections in the string, allowing .replace(..., '') to trim both simultaneously.
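The role of the g flag is easy to check on the exact string from the question. In this small sketch (variable names are assumptions), the first call trims both ends, while the second, without g, stops after the first match and leaves the suffix in place.

```javascript
const value = "02T13:18:43.000Z";

// With g, both alternatives get a chance to match: prefix and suffix are removed.
const both = value.replace(/^.*T|\..*$/g, "");
console.log(both); // 13:18:43

// Without g, only the first match (the prefix up to T) is removed.
const prefixOnly = value.replace(/^.*T|\..*$/, "");
console.log(prefixOnly); // 13:18:43.000Z
```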

Get content with regex in javascript

::head
line 1
line 2
line 3
::content
content 1
content 2
content 3
How do I get the text of the "head" paragraph (the first part) with a regex? This is from a txt file.
Unfortunately, the regex below doesn't work in JavaScript, because of this: "Javascript regex multiline flag doesn't work". So we have to tweak things a bit. A line break in a file shows up in JavaScript strings as \n. On Windows it is preceded by \r, but not on Linux, so our \s* becomes more important now that we're doing this without the line-end anchor ($). I also noticed that you don't need to match the other lines specifically, since line breaks are being ignored anyway.
/(::head[^]*?)\n\s*\n/m
This works in testing in Chrome, so it should work for your needs.
This is a little fancy, but it should fit if it is used in conjunction with many similar properties.
/(::head.*?$^.*?$)^\s*$/m
Note that you need the /m multiline flag.
Here it is tested against your sample data http://rubular.com/r/vtflEgDdkY
First, we check for the ::head marker. That's where we start collecting information in a group with (). Then we match anything with .*, made lazy with ?. Then we find the end of the line with $ and look for more lines of data: the line start ^, then anything .*?, then the line end $. This grabs multiple lines because of the multiline flag, so it's important to use lazy matching (?) so we don't grab too much data. Then we look for an empty line. Normally you just need ^$ for that, but I wanted to make sure this would work if someone had stuck a stray space or tab on the line between sections, so we use \s* to grab whitespace. The * accepts "0 or more" whitespace characters. Notice we didn't include the empty line in the group () because that's not the data you care about.
For further reading on regex, I recommend http://www.regular-expressions.info/tutorial.html It's where I learned everything I know about regex.
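The JavaScript variant from the top of this answer can be tried as follows. This sketch assumes the file has a blank line between the two sections and uses \n line endings; the sample string is made up to match that assumption. The m flag is left off here because the pattern contains no ^ or $ for it to affect.

```javascript
// Assumed sample: the question's text with a blank line between the sections.
const text =
  "::head\nline 1\nline 2\nline 3\n\n::content\ncontent 1\ncontent 2\ncontent 3\n";

// [^] matches any character, including line breaks, in JavaScript.
const match = text.match(/(::head[^]*?)\n\s*\n/);
const head = match ? match[1] : null;

console.log(head);
```

With \r\n line endings the captured group would end in a stray \r, so calling trim() on the result is a reasonable precaution.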
You can use [\s\S]+::content to match everything until ::content:
const text = ...
const matches = text.match(/^([\s\S]+)::content/m)
const content = matches[1]
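A runnable version of this idea, with an assumed sample string standing in for the file contents, might look like the sketch below. The capture includes the ::head line and any trailing line breaks, so the result is trimmed.

```javascript
// Assumed sample text; in practice this would be the contents of the txt file.
const text = "::head\nline 1\nline 2\nline 3\n\n::content\ncontent 1\n";

// Greedy [\s\S]+ captures everything up to the last ::content in the text.
const matches = text.match(/^([\s\S]+)::content/m);
const head = matches ? matches[1].trim() : null;

console.log(head);
```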
