Regular expression to find file name using any Javascript based library - javascript

I am working with a markdown editor. After a user uploads an image, I see the following line inserted in the editor:
![](/assets/img/content/id/2e65c657cf609fca24893278cdcb2159.gif)
I want to extract the file names by searching the entire content of the editor.
Assume I have the following content in my editor:
Hello world.pdf
![](/assets/img/content/id/2e65c657cf609fca24893278cdcb2159.gif)
This is a test
![](/assets/img/content/id/2e65c657cf609fcaeqwe78cdcb2159.png)adding text
Adding another image
![](/assets/img/content/id/2e65c657cf609f24432434423b2159.jpg)
From the above content, after running the regular expression, I should be able to get:
2e65c657cf609fca24893278cdcb2159.gif
2e65c657cf609fcaeqwe78cdcb2159.png
2e65c657cf609f24432434423b2159.jpg
I have tried something like this. Though it's working, I am not sure it's the best solution:
var myregex = /\(\/assets\/img\/content\/id\/(.+?)\)/gm;
var result, allMatches = [];
while ((result = myregex.exec(data)) !== null) {
  var match = result[1]; // contents of capture group 1 for this match
  allMatches.push(match);
}

Use the following regex:
var fileNames = content.match(/(\w+\.[a-z]{3,4})/gi);
REGEX Explanation
/: Delimiters of regex
(): Capturing Group
\w+: Matches one or more word characters (letters, digits and _ underscore)
\.: Matches . literal
[a-z]{3,4}: Matches 3 to 4 letters
gi: Match all occurrences g and case insensitive i
UPDATE
var fileNames = content.match(/\!\[\].*?(\w+\.[a-z]{3,4})/gi);
fileNames = fileNames.toString();
var names = fileNames.match(/\w+\.[a-z]{3,4}/gi);
alert(names);

Try this (for images)
/(\w+\.(?:gif|png|jpe?g))/gi
or for any file type
/(\w+\.(?:\w{2,4}))/gi
UPDATE
Based on comments made on @Tushar's answer
(?:\/assets\/img\/content\/id\/)(\w+\.(?:\w{2,4}))
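will find any file with the preceding path /assets/img/content/id/. Because the question is tagged javascript, where lookbehind was not available at the time of writing, the path is matched with a non-capturing group instead of a positive lookbehind; the file name is in capture group 1.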
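A minimal sketch of how the path-anchored pattern could be applied to the sample content from the question. The `content` variable and the use of `matchAll` are assumptions for illustration, not part of the original answers; anchoring on the path means "world.pdf" in the first line is not picked up.

```javascript
// Sample editor content from the question, joined into one string (assumed input).
const content = [
  "Hello world.pdf",
  "![](/assets/img/content/id/2e65c657cf609fca24893278cdcb2159.gif)",
  "This is a test",
  "![](/assets/img/content/id/2e65c657cf609fcaeqwe78cdcb2159.png)adding text",
  "Adding another image",
  "![](/assets/img/content/id/2e65c657cf609f24432434423b2159.jpg)",
].join("\n");

// The path is matched literally; capture group 1 holds only the file name.
const pattern = /\/assets\/img\/content\/id\/(\w+\.\w{2,4})/g;

// matchAll requires the g flag and yields one match object per occurrence.
const fileNames = Array.from(content.matchAll(pattern), (m) => m[1]);

console.log(fileNames);
```

Reading capture group 1 from each match avoids the second pass over the joined string that the two-step `match` approach needs.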

Related

Regex to convert markdown to html

My goal is to take a markdown text and create the necessary bold/italic/underline html tags.
Looked around for answers, got some inspiration but I'm still stuck.
I have the following TypeScript code; the regex matches the expression, including the double asterisks:
var text = 'My **bold\n\n** text.\n'
var bold = /(?=\*\*)((.|\n)*)(?<=\*\*)/gm
var html = text.replace(bold, '<strong>$1</strong>');
console.log(html)
Now the result of this is: My <strong>**bold\n\n**</strong> text.
Everything is great aside from the leftover double asterisks.
I also tried to remove them in a later 'replace' statement, but this creates further issues.
How can I ensure they are removed properly?
With your pattern (?=\*\*)((.|\n)*)(?<=\*\*) you assert (not match) with (?=\*\*) that there is ** directly to the right.
Then directly after that, you capture the ** using ((.|\n)*) so then it becomes part of the match.
Then at the end you assert again with (?<=\*\*) that there is ** directly to the left, but ((.|\n)*) has already matched it.
This way you end up with both ** in the match.
You don't need lookarounds at all, as you are already using a capture group.
In Javascript you could match the ** on the left and right and capture any character in a capture group:
\*\*([^]*?)\*\*
But I would suggest using a dedicated parser to parse markdown instead of using a regex.
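As a rough illustration of the capture-group pattern above applied to the question's input, here is a small sketch. The helper name `boldToHtml` is hypothetical, and it handles only `**bold**` spans, which is one reason a dedicated markdown parser remains the safer choice.

```javascript
// Hypothetical helper: converts **bold** spans only, nothing else.
function boldToHtml(markdown) {
  // The ** on both sides are matched (consumed) but not captured;
  // [^]*? lazily matches any character, including newlines (JavaScript-specific).
  return markdown.replace(/\*\*([^]*?)\*\*/g, "<strong>$1</strong>");
}

const html = boldToHtml("My **bold\n\n** text.\n");
console.log(JSON.stringify(html));
```

Because the asterisks are part of the match but outside the capture group, `$1` no longer contains them and no second cleanup pass is needed.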
Just make another call to replaceAll, replacing the ** with an empty string.
var text = 'My **bold\n\n** text.\n'
var bold = /(?=\*\*)((.|\n)*)(?<=\*\*)/gm
var html = text.replace(bold, '<strong>$1</strong>');
html = html.replaceAll(/\*\*/gm,'');
console.log(html)

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.
You can split at the positions where a parenthesised element follows, by using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
Instead of split, use match to create the array. The pattern 1) finds digits in parentheses followed by one or more characters that are digits, uppercase letters, or hyphens, and 2) wraps that whole expression in a group.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));
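If the goal is to work with each identifier and its value separately, the lookahead split can be followed by a second match per part. This is only a sketch: the function name `parseElements` and the `{ ai, value }` object shape are assumptions made for illustration, not something from the question.

```javascript
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";

// Hypothetical helper: splits before each "(digits)" and separates
// the identifier from the value that follows it.
function parseElements(input) {
  return input
    .split(/(?=\(\d+\))/) // zero-length lookahead keeps the delimiter with its value
    .filter(Boolean)
    .map((part) => {
      const m = part.match(/^\((\d+)\)(.*)$/);
      // Parts that do not start with "(digits)" are kept with a null identifier.
      return m ? { ai: m[1], value: m[2] } : { ai: null, value: part };
    });
}

const elements = parseElements(text);
console.log(elements);
```

This does not depend on the order of the identifiers in the input, since each part is parsed on its own.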

How to extract a particular text from url in JavaScript

I have a url like http://www.somedotcom.com/all/~childrens-day/pr?sid=all.
I want to extract childrens-day. How to get that? Right now I am doing it like this
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
url.match('~.+\/');
But what I am getting is ["~childrens-day/"].
Is there a short and sweet way (there surely is) to get the above text without the ~ and / characters, i.e. just childrens-day?
Thanks
You could use a negated character class and a capture group ( ) and refer to capture group #1. The caret (^) inside of a character class [ ] is considered the negation operator.
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
var result = url.match(/~([^~]+)\//);
console.log(result[1]); // "childrens-day"
Note: If you have many URLs inside one string, you may want to add the ? quantifier for a non-greedy match.
var result = url.match(/~([^~]+?)\//);
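To see why the non-greedy quantifier matters with several URLs in one string, here is a small sketch. The second URL and the host names are made up for illustration, and `matchAll` with the g flag is an addition that the answer itself does not use.

```javascript
// Two URLs in one string; both hosts and the second path are hypothetical.
const text =
  "http://a.example/all/~childrens-day/pr http://b.example/all/~mothers-day/pr";

// Greedy: [^~]+ runs up to the next "~" and backtracks to the LAST "/" before it.
const greedy = Array.from(text.matchAll(/~([^~]+)\//g), (m) => m[1]);

// Lazy: [^~]+? stops at the FIRST "/" after each "~".
const lazy = Array.from(text.matchAll(/~([^~]+?)\//g), (m) => m[1]);

console.log(greedy);
console.log(lazy);
```

With a single URL both variants give the same capture, which is why the difference only shows up once the input contains more than one.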
Like so:
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
var matches = url.match(/~(.+?)\//);
console.log(matches[1]);
Working example: http://regex101.com/r/xU4nZ6
Note that your regular expression wasn't written as a regex literal: you passed a string, which String.prototype.match converts to a RegExp, which is why you still got a result.
Use non-capturing groups with a captured group then access the [1] element of the matches array:
(?:~)(.+)(?:/)
Keep in mind that you will need to escape your / if using it also as your RegEx delimiter.
Yes, there is.
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
url.match('~(.+)\/')[1];
Just wrap what you need in a parenthesized group. No other modification to your code is needed.
References: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
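You could just do a string replace on the matched segment. Note that replace returns a new string rather than modifying the original:
var segment = url.match('~.+\/')[0]; // "~childrens-day/"
segment = segment.replace('~', '').replace('/', ''); // "childrens-day"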
http://www.w3schools.com/jsref/jsref_replace.asp

How do I make a regular expression that matches everything on a line after a given character?

If I have a String in JavaScript
key=value
How do I make a RegEx that matches key excluding =?
In other words:
var regex = //Regular Expression goes here
regex.exec("key=value")[0]//Should be "key"
How do I make a RegEx that matches value excluding =?
I am using this code to define a language for the Prism syntax highlighter so I do not control the JavaScript code doing the Regular Expression matching nor can I use split.
Well, you could do this:
/^[^=]*/ // anything not containing = at the start of a line
/[^=]*$/ // anything not containing = at the end of a line
It might be better to look into Prism's lookbehind property, and use something like this:
{
'pattern': /(=).*$/,
'lookbehind': true
}
According to the documentation this would cause the = character not to be part of the token this pattern matches.
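As a quick check of the two anchored patterns above, outside of Prism, the following sketch runs them with `exec` as in the question. The variable names are illustrative only.

```javascript
const line = "key=value";

// Everything from the start of the line up to (not including) the first "=".
const keyPattern = /^[^=]*/;
// Everything after the last "=" up to the end of the line.
const valuePattern = /[^=]*$/;

const key = keyPattern.exec(line)[0];
const value = valuePattern.exec(line)[0];

console.log(key, value);
```

Both results come from index 0 of the match, so no capture group is needed, which suits cases where the calling code only reads the whole match.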
Use this regex: (^.+?)=(.+?$)
Group 1 contains the key.
Group 2 contains the value.
But split is a better solution.
.*=(.*)
This will match anything after =
(.*)=.*
This will match anything before =
Look into greedy vs ungreedy quantifiers if you expect more than one = character.
Edit: as OP has clarified they're using javascript:
var str = "key=value";
var key = str.match(/(.*)=/i)[1]; // before =
var value = str.match(/=(.*)/i)[1]; // after =
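The greedy versus ungreedy remark is easiest to see on an input with more than one = character. The input "a=b=c" below is a made-up example, not from the question.

```javascript
const str = "a=b=c";

// Greedy .* backtracks from the end, so it stops at the LAST "=".
const beforeLast = str.match(/(.*)=/)[1];

// Lazy .*? expands one character at a time, so it stops at the FIRST "=".
const beforeFirst = str.match(/(.*?)=/)[1];

// The search starts from the left, so this captures everything after the FIRST "=".
const afterFirst = str.match(/=(.*)/)[1];

// A leading greedy .* pushes the "=" to the last occurrence.
const afterLast = str.match(/.*=(.*)/)[1];

console.log(beforeLast, beforeFirst, afterFirst, afterLast);
```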
var regex = /^[^=]*/;
regex.exec("key=value");

remove all but a specific portion of a string in javascript

I am writing a little app for Sharepoint. I am trying to extract some text from the middle of a field that is returned:
var ows_MetaInfo="1;#Subject:SW|NameOfADocument
vti_parservers:SR|23.0.0.6421
ContentTypeID:SW|0x0101001DB26Cf25E4F31488B7333256A77D2CA
vti_cachedtitle:SR|NameOfADocument
vti_title:SR|ATitleOfADocument
_Author:SW:|TheNameOfOurCompany
_Category:SW|
ContentType:SW|Document
vti_author::SR|mrwienerdog
_Comments:SW|This is very much the string I need extracted
vti_categories:VW|
vtiapprovallevel:SR|
vti_modifiedby:SR|mrwienerdog
vti_assignedto:SR|
Keywords:SW|Project Name
ContentType _Comments"
So all I want returned is "This is very much the string I need extracted".
Do I need a regex and a string replace? How would you write the regex?
Yes, you can use a regular expression for this (this is the sort of thing they are good for). Assuming you always want the string after the pipe (|) on the line starting with "_Comments:SW|", here's how you can extract it:
var matchresult = ows_MetaInfo.match(/^_Comments:SW\|(.*)$/m);
var comment = (matchresult==null) ? "" : matchresult[1];
Note that the .match() method of the String object returns an array. The first element (index 0) is the entire match (here, the entire match is the whole line, as we anchored it with ^ and $; adding the "m" flag after the regex makes it a multiline regex, allowing ^ and $ to match the start and end of any line within the multi-line input), and the rest of the array holds the submatches captured using parentheses. Above we've captured the part of the line that you want, so it will be present in the second item of the array (index 1).
If there is no match ("_Comments:SW|" doesn't appear in ows_MetaInfo), then .match() will return null, which is why we test it before pulling out the comment.
If you need to adjust the regex for other scenarios, have a look at the Regex docs on Mozilla Dev Network: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
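A self-contained sketch of the multiline match described above, run against a shortened version of the field from the question. The function name `extractComment` and the three-line `sample` are assumptions made to keep the example small.

```javascript
// Shortened stand-in for ows_MetaInfo (three of the original lines).
const sample = [
  "vti_title:SR|ATitleOfADocument",
  "_Comments:SW|This is very much the string I need extracted",
  "vti_categories:VW|",
].join("\n");

// Hypothetical helper: returns the comment text, or "" when the line is absent.
function extractComment(metaInfo) {
  // With the m flag, ^ and $ match at the start and end of each line.
  const m = metaInfo.match(/^_Comments:SW\|(.*)$/m);
  return m === null ? "" : m[1];
}

console.log(extractComment(sample));
console.log(JSON.stringify(extractComment("no comment line here")));
```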
You can use this code:
var match = ows_MetaInfo.match(/_Comments:SW\|([^\n]+)/);
if (match)
document.writeln(match[1]);
I'm far from competent with RegEx, so here is my RegEx-less solution. See comments for further detail.
var extractedText = ExtractText(ows_MetaInfo);
function ExtractText(arg) {
  // Use the pipe delimiter to turn the string into an array
  var aryValues = arg.split("|");
  // The comment text sits between "_Comments:SW|" and the next key, so find
  // the piece that ends with "vti_categories:VW" and strip that key off
  for (var i = 0; i < aryValues.length; i++) {
    if (aryValues[i].indexOf("vti_categories:VW") !== -1)
      return aryValues[i].replace("vti_categories:VW", "").trim();
  }
  return null;
}
