Regular Expression to find merge conflicts in file - javascript

This is the file which contains merge conflicts,
<<<<<<< HEAD
$conf['some_unit_id'] = '4-qw-gg-ds-sometext';
=======
// Some Snippets Site Info
$conf['site_info'] = array(
'customer_service_phone' => '+1 323223232
'logo_path' => 'https://www.google.com/img/icons/src/logo.svg',
'currency' => 'CAD',
'https://www.youtube.com/user/somewebsite/ogog',
'https://www.instagram.com/somewebsite/',
),
);
>>>>>>> ff6df3435231fdff78fwsd83e7dffa0732eft554
// Somes code
$done['rules'] = TRUE;
I am trying to find the best regular expression to detect merge conflicts in a file. Initially I tried:
/(<* HEAD)/
This detects only HEAD preceded by some < characters.
There are other markers as well:
1. =======
2. >>>>>>> ff6df3435231fdff78fwsd83e7dffa0732eft554
These two markers must be detected along with the HEAD marker. And if a developer resolves the conflict by removing only the <<<<<<< HEAD line and leaves the rest (i.e., ======= and >>>>>>> ff6df3435231fdff78fwsd83e7dffa0732eft554), the regular expression should detect that as well.
I am using this regular expression in a pre-commit hook: if any one of the patterns is detected in a file, the commit is aborted. I need an exact regex to detect merge conflict markers.
Any solution would be appreciated.

Since they're all the same length, you can use a character class:
/^[<=>]{7}( .+)?$/mg
(make sure to use the multiline flag, so that ^ and $ match at every line)
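A minimal sketch of how this pattern might be used in a hook script, assuming the file contents have already been read into a string. The helper name hasConflictMarkers is hypothetical. The g flag is left out here on purpose, because a global regex keeps state in lastIndex between test() calls.

```javascript
// Matches any line that starts with seven of <, = or >,
// optionally followed by a space and a label (e.g. "HEAD" or a commit hash).
const CONFLICT_MARKER = /^[<=>]{7}( .+)?$/m;

// Hypothetical helper: returns true if the text contains at least one marker line.
function hasConflictMarkers(text) {
  return CONFLICT_MARKER.test(text);
}

const conflicted = [
  "$conf['some_unit_id'] = '4-qw-gg-ds-sometext';",
  "=======",
  "$conf['currency'] = 'CAD';",
  ">>>>>>> ff6df34",
].join("\n");

console.log(hasConflictMarkers(conflicted));
console.log(hasConflictMarkers("$done['rules'] = TRUE;"));
```

A line made of exactly seven = characters can also occur in legitimate files, so a check like this may occasionally flag a file that has no conflict.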

You can use:
^<{7} HEAD(?:(?!={7})[\s\S])*={7}(?:(?!>{7} \w+)[\s\S])*>{7} \w+
Demo & explanation

You could also match line by line, checking the start of each line, which avoids some of the unnecessary backtracking caused by [\s\S].
First match the <<<<<<< HEAD part, then match all following lines that do not start with =======, and then match the ======= line itself.
Then match all lines that do not start with >>>>>>>, followed by the >>>>>>> line itself and the characters [a-z0-9] of the commit hash.
^<{7} HEAD(?:\r?\n(?!={7}\r?\n).*)*\r?\n={7}(?:\r?\n(?!>{7} ).*)*\r?\n>{7} [a-z0-9]+
Regex demo
If you want to highlight the markers, you could use a capturing group:
^(<{7} HEAD)(?:\r?\n(?!={7}\r?\n).*)*\r?\n(={7})(?:\r?\n(?!>{7} ).*)*\r?\n(>{7} [a-z0-9]+)
Regex demo
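As a rough illustration of the capturing-group version, the sketch below collects every conflict block in a string together with its three markers. The function name findConflictBlocks and the sample text are made up for this example, and the g and m flags are an assumption (the pattern above is shown without delimiters or flags); m is needed so that ^ matches at the start of any line.

```javascript
const CONFLICT_BLOCK =
  /^(<{7} HEAD)(?:\r?\n(?!={7}\r?\n).*)*\r?\n(={7})(?:\r?\n(?!>{7} ).*)*\r?\n(>{7} [a-z0-9]+)/gm;

// Hypothetical helper: returns one entry per conflict block found in the text.
function findConflictBlocks(text) {
  return Array.from(text.matchAll(CONFLICT_BLOCK), (m) => ({
    index: m.index,   // offset of the <<<<<<< HEAD line in the text
    start: m[1],      // "<<<<<<< HEAD"
    separator: m[2],  // "======="
    end: m[3],        // ">>>>>>> " followed by the hash
  }));
}

const sample = [
  "x",
  "<<<<<<< HEAD",
  "ours",
  "=======",
  "theirs",
  ">>>>>>> ff6df34",
  "y",
].join("\n");

console.log(findConflictBlocks(sample));
```

Because this pattern requires all three markers in order, it would not catch a file where only one marker was removed by hand; the single-line check is the stricter option for a pre-commit hook.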

If I understand correctly, you want to find the blocks of code that still contain an unresolved conflict. I hope my suggestion helps.
/^<{7}\sHEAD[\s\S]+?>{7}\s\w+$/gm
Details:
Mode: multiline
^<{7}\sHEAD: the block starts with <<<<<<< HEAD
[\s\S]+?: matches any character, as few times as possible (line breaks included)
>{7}\s\w+$: the block ends with >>>>>>> followed by the commit hash
Demo

Related

Regex: Replace last segment of url

I am trying to figure out the correct regex to replace the last segment of a URL with a modified version of that very last segment. (I know that there are similar threads out there, but none seemed to help...)
Example:
https://www.test.com/one/two/three/mypost/
--->
one/two/three?id=mypost
https://www.test.com/one/mypost/
--->
one?id=mypost
Now I am stuck here:
https://regex101.com/r/9GqYaU/1
I can get the last segment in capturing group 2 but how would I replace it?
I think I will have to do something like this:
const url = 'https://www.test.com/one/two/three/mypost/'
const regex = /(http[s]?:\/\/)([^\/]+\/)*(?=\/$|$)/
const path = url.replace(regex, `${myUrlWithoutTheLastSegmentAnd WithoutHTTPS}?id=$2`)
return path
But I have no idea how to get the url without the last segment. I have currently only access to the whole string or group 1 (which is useless in this case) and then group 2, but not the string without group 2.
I would be very glad for any help here. Sometimes I just lack the knowledge of what is possible with regex and how to achieve it.
Thank you in advance.
Cheers
You could use the URL class to extract the pathname and substring to remove the first '/'.
Then, you could put the last part of the pathname in a group and use it as a reference $1 for the replacement.
const url = new URL('https://www.test.com/one/two/three/mypost/').pathname.substring(1)
console.log(url.replace(/\/([^/]*)\/$/, '?id=$1'))
I came across your question yesterday and agree with going down the route of parsing the URL. Once you get there you could even use JavaScript array methods which I prefer to string methods like:
pathname.split("/").filter(p => p.length).pop()
This would separate each folder, ignore any with no length (i.e. handle a trailing slash) and return the last one (mypost).
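Putting the URL parsing and the array methods together, the whole transformation could be sketched roughly as follows. The function name toQueryPath is hypothetical, and the sketch assumes the path has at least one segment.

```javascript
// Hypothetical helper: turns ".../one/two/three/mypost/" into "one/two/three?id=mypost".
function toQueryPath(href) {
  // Split the pathname into folders and drop empty entries
  // (leading slash, trailing slash).
  const segments = new URL(href).pathname.split("/").filter((p) => p.length);
  // Remove the last segment and keep it for the query string.
  const last = segments.pop();
  return `${segments.join("/")}?id=${last}`;
}

console.log(toQueryPath("https://www.test.com/one/two/three/mypost/"));
console.log(toQueryPath("https://www.test.com/one/mypost/"));
```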
Anyway, I am also learning regex so sometimes when I find a question like this I just try to find the answer anyway as the best way of learning is doing. It took 24 hours 😂 I came up with this:
/(https?:\/\/).+?([a-z-]*)\/?$/gm
(https?:\/\/) you know what this does. Small correction, you don't need the square brackets. Question mark matches 0 or 1 of the preceding character. As we're only matching s this just works. If you wanted to match s or z you would use [sz]?. I think.
.+? this is the cool one I think I will use in future now I found it. The question mark here has a different meaning - it makes .+ (which means one or more of any character) non-greedy. That means it matches as few characters as possible and stops as soon as the rest of the pattern can match. Which is...
([a-z-]*) any number of letters or a hyphen. You should maybe change this to include numbers and upper case.
\/? Optional slash
$ all this must apply at the end of the string.
Here is a demo
https://regex101.com/r/mQNkIS/1
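To read the captured segment in code, something along these lines should work. This is only a sketch: lastSegment is a made-up helper name, and the g and m flags are dropped because String.prototype.match with the g flag returns only the full matches, not the groups.

```javascript
const LAST_SEGMENT = /(https?:\/\/).+?([a-z-]*)\/?$/;

// Hypothetical helper: returns capture group 2 (the last segment), or null if nothing matched.
function lastSegment(url) {
  const match = url.match(LAST_SEGMENT);
  return match ? match[2] : null;
}

console.log(lastSegment("https://www.test.com/one/two/three/mypost/"));
console.log(lastSegment("https://www.test.com/one/mypost"));
```

As noted above, [a-z-] only covers lowercase letters and hyphens, so a segment containing digits or upper case would not be captured in full unless the class is widened.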

Find all `*.html` but not `*.tpl.html` using JavaScript regex

I have a large project that has many *.html files and many *.tpl.html files.
I want to use a regular expression that allows me to differentiate between these two for my Webpack config.
I have tried using laziness to achieve this, like .*?\.html but this also matches *.tpl.html. https://regex101.com/r/a0fl4H/1
How can this be achieved?
Try this:
^(?!.*\.tpl).+\.html$
Demo:
https://regex101.com/r/a0fl4H/8
For regex, this should do it:
/.*?[^.tpl]\.html/
Working example
Edit: This first solution needs improvement. As mentioned in the comments, it gives wrong results for names such as test.t.html, because [^.tpl] is a negated character class: it rejects any single one of the characters ., t, p and l in front of .html, not the sequence .tpl.
This is a working version using a negative lookahead:
^(?!.*\.tpl).*\.html$
bar.html // matches
bar.tpl.html // doesn't match
test.t.html // matches
test.p.html // matches
test.z.html // matches
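A quick way to check the lookahead version is to run it over a list of file names. The sketch below does that; the file list is invented for the example.

```javascript
// Negative lookahead: reject any name that contains ".tpl",
// then require the name to end in ".html".
const PLAIN_HTML = /^(?!.*\.tpl).+\.html$/;

const files = [
  "bar.html",
  "bar.tpl.html",
  "test.t.html",
  "index.tpl.html",
  "notes.txt",
];

const plainHtmlFiles = files.filter((name) => PLAIN_HTML.test(name));
console.log(plainHtmlFiles);
```

In a Webpack config, a regex like this would typically go in the test property of a rule, with a second rule matching /\.tpl\.html$/ for the templates.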

Get content with regex in javascript

::head
line 1
line 2
line 3
::content
content 1
content 2
content 3
How do I get "head" paragraph(first part) text with regex? This is from txt file.
Unfortunately, the regex further below doesn't work in JavaScript because of this: "Javascript regex multiline flag doesn't work". So we have to tweak things a bit. A line break in a file appears in a JavaScript string as \n. On Windows it is preceded by \r, but not on Linux, so our \s* becomes more important now that we're doing this without the line-ending anchor ($). I also noticed that you don't need to gather the other lines specifically, since line breaks are being ignored anyway.
/(::head[^]*?)\n\s*\n/m
This works in testing in Chrome, so it should work for your needs.
This is a little fancy, but it should fit if it is used in conjunction with many similar properties.
/(::head.*?$^.*?$)^\s*$/m
Note that you need the /m multiline flag.
Here it is tested against your sample data http://rubular.com/r/vtflEgDdkY
First, we check for the ::head data. That's where we start collecting information in a group with (). Then we look for anything with .*, made lazy with ?. Then we find the end of the line with $ and look for more lines of data: the line start ^, then anything .*?, then the line end $. This will grab multiple lines because of the multiline flag, so it's important to use the lazy ? so we don't grab too much data. Then we look for an empty line. Normally you just need ^$ for that, but I wanted to make sure this would work if someone had left a stray space or tab on the lines between sections, so we used \s* to match whitespace. The * allows "0 or more" whitespace characters. Notice we didn't include the empty line in the group () because that's not the data you care about.
For further reading on regex, I recommend http://www.regular-expressions.info/tutorial.html It's where I learned everything I know about regex.
You can use [\s\S]+::content to match everything until ::content:
const text = ...
const matches = text.match(/^([\s\S]+)::content/m)
const content = matches[1]
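For a self-contained variant of the [\s\S] approach, the sketch below uses the sample data from the question and captures only the lines between the ::head and ::content markers. Leaving the ::head label itself out of the result is an assumption about what is wanted, and extractHead is a made-up name.

```javascript
const text = [
  "::head",
  "line 1",
  "line 2",
  "line 3",
  "::content",
  "content 1",
  "content 2",
  "content 3",
].join("\n");

// Hypothetical helper: returns the text of the ::head section, or null if it is missing.
function extractHead(source) {
  // [\s\S]*? lazily matches any characters, including line breaks,
  // up to the first line that starts with ::content.
  const match = source.match(/^::head\r?\n([\s\S]*?)^::content/m);
  return match ? match[1].trim() : null;
}

console.log(extractHead(text));
```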

javascript regexp to match path depth

Been struggling for the last hour to try and get this regexp to work but cannot seem to crack it.
It must be a regexp and I cannot use split etc as it is part of a bigger regexp that searches for numerous other strings using .test().
(public\/css.*[!\/]?)
public/css/somefile.css
public/css/somepath/somefile.css
public/css/somepath/anotherpath/somefile.css
Here I am trying to look for path starting with public/css followed by any character except for another forward slash.
so "public/css/somefile.css" should match but the other 2 should not.
A better solution may be to somehow specify the number of levels to match after the prefix using something like
(public\/css\/{1,2}.*)
but I can't seem to figure that out either, some help with this would be appreciated.
edit
No idea why this question has been marked down twice, I have clearly stated the requirement with sample code and test cases and also attempted to solve the issue, why is it being marked down ?
You can use this regex:
/^(public\/css\/[^\/]*?)$/gm
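^ : start of the line
[^\/] : any character except /
*? : zero or more of the preceding token, as few as possible (lazy)
$ : end of the line
g : global flag
m : multi-line flag (makes ^ and $ match at every line)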
Something like this?
/public\/css\/[^\/]+$/
This will match
public/css/[Any characters except for /]$
$ is matching the end of the string in regex.
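The three test paths from the question can be checked with .test() as sketched below. The second pattern, which allows one or two levels after the prefix, is not from either answer; it is just one possible way to express the "number of levels" idea mentioned in the question.

```javascript
// Exactly one path segment after public/css/ (pattern from the answer above).
const ONE_LEVEL = /public\/css\/[^\/]+$/;

// Assumption: one or two segments after public/css, as a repeated group.
const UP_TO_TWO_LEVELS = /^public\/css(?:\/[^\/]+){1,2}$/;

const paths = [
  "public/css/somefile.css",
  "public/css/somepath/somefile.css",
  "public/css/somepath/anotherpath/somefile.css",
];

const oneLevel = paths.map((p) => ONE_LEVEL.test(p));
const upToTwo = paths.map((p) => UP_TO_TWO_LEVELS.test(p));

console.log(oneLevel);
console.log(upToTwo);
```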

Make this RegEx work in Javascript?

I need to get the data of an SRT file as an array using a regex.
Here is my code so far:
JavaScript:
function readSrt() {
  var srtUrl = 'assets/media/subtitles.srt';
  $.get(srtUrl, function(data) {
    console.log("SRT:", data); // it reads ok
    var regexp = /(.*)\n(.*),\d\d\d --> (.*)\n(.*)/g; // this regex doesn't work
    console.log("SUBS:", data.match(regexp)); // outputs null
  });
}
subtitles.srt:
0
00:00:00,000 --> 00:00:01,000
Instructor…All right, let's start off
1
00:00:01,000 --> 00:00:04,000
here. We were, I think, wrapping up kind
...
14
00:00:40,000 --> 00:00:42,000
mound, basically.
15
00:00:42,000 --> 00:00:44,000
If you go to Colossae today, none of it
...
Need to get:
1. 0
2. 00:00:00
3. 00:00:01,000
4. Instructor…All right, let's start off
I made several attempts on regex101.com, but the regex only seems to work with PHP, not JavaScript.
What am I doing wrong and how do I fix it?
One thing that might be wrong in your regex is that .* is greedy. It will start to match in the first caption and only end the match at the last caption. Try replacing it with its lazy alternative, to match as little as possible each time.
/(.*?)\n(.*?),\d\d\d --> (.*?)\n(.*?)\n/
You're relying on the . wildcard character to also catch \n (and presumably \r), which it won't do (see http://www.regular-expressions.info/javascript.html for the info on that, the "There is indeed no /s modifier" bit). Use explicit character classes instead for greater win: use [^\n]+\n for "anything and then a newline", [^,]+, for "anything before the comma", etc.
You're close. The main problem with your regex is the part around (.*),\d\d\d, which doesn't work because (.*) will match the entire line of text, making the rest of the regex invalid.
The fixed regex is:
/(.*)\n([^,]+),[\d]{3} --> (.*)\n(.*)(?:\n*)/g
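Note that data.match() with the g flag returns only the full matches, not the capture groups. A sketch that returns the four requested fields per caption, using matchAll and explicit character classes, could look like this. The names SRT_CUE and parseSrt are hypothetical, the pattern tolerates \r\n line endings as a precaution (the question does not say which line endings the file uses), and only the first text line of each caption is captured.

```javascript
// Groups: 1 = caption number, 2 = start time without milliseconds,
//         3 = end time, 4 = first line of caption text.
const SRT_CUE = /^(\d+)\r?\n([^,\r\n]+),\d{3} --> ([^\r\n]+)\r?\n([^\r\n]*)/gm;

// Hypothetical helper: returns one object per caption found in the SRT data.
function parseSrt(data) {
  return Array.from(data.matchAll(SRT_CUE), ([, index, start, end, text]) => ({
    index,
    start,
    end,
    text,
  }));
}

const sample = [
  "0",
  "00:00:00,000 --> 00:00:01,000",
  "Instructor…All right, let's start off",
  "1",
  "00:00:01,000 --> 00:00:04,000",
  "here. We were, I think, wrapping up kind",
].join("\n");

console.log(parseSrt(sample));
```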
