Extract text from HTML with Javascript regex


I am trying to parse a webpage and to get the number reference after <li>YM#. For example I need to get 1234-234234 in a variable from the HTML that contains
<li>YM# 1234-234234 </li>
Many thanks for your help someone!
Rich

Currently, your regex only matches if there is a single digit before the dash and a single digit after it. This will match one or more digits in each place instead (the \s* allows for the space after YM# in your HTML):
/YM#\s*[0-9]+-[0-9]+/g
Then you also need to capture the number, so we wrap it in a capture group:
/YM#\s*([0-9]+-[0-9]+)/g
Finally, to read the capture group, use RegExp.exec instead of String.match, because String.match with the g flag returns only the full matches and not the groups:
var regex = /YM#\s*([0-9]+-[0-9]+)/g;
var match = regex.exec(text);
var id = match[1];
// match[0]: match of the entire regex
// match[1], match[2], ...: one entry per capture group, in order
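The steps above can be wrapped in a small function with a null check. This is only a sketch: `extractYmId` is a hypothetical name, and the optional whitespace after YM# is assumed from the sample HTML in the question.

```javascript
// Hypothetical helper: returns the reference after "YM#", or null if absent.
function extractYmId(html) {
  // No g flag here, so every call starts matching at index 0.
  const match = /YM#\s*([0-9]+-[0-9]+)/.exec(html);
  // exec returns null when nothing matches, so guard before indexing.
  return match ? match[1] : null;
}

console.log(extractYmId("<li>YM# 1234-234234 </li>")); // "1234-234234"
console.log(extractYmId("<li>no reference here</li>")); // null
```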

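(?<=<li>YM#\s)([\d-]+)
http://regexr.com?30ng5
This will match the numbers. Note that the prefix has to be a lookbehind, (?<=...), which JavaScript supports from ES2018 on. A negative lookahead, (?!...), would not tie the match to the <li>YM# prefix, so any run of digits and dashes on the page would match.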

Try this:
(<li>[^#<>]*?# *)([\d\-]+)\b
and get the result in $2.
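In JavaScript, $2 corresponds to index 2 of the array returned by exec. A minimal sketch, assuming the sample HTML from the question (the variable names are illustrative only):

```javascript
const sampleHtml = "<li>YM# 1234-234234 </li>";
// Group 1 is the "<li>YM# " prefix, group 2 is the number itself.
const groups = /(<li>[^#<>]*?# *)([\d\-]+)\b/.exec(sampleHtml);
const reference = groups ? groups[2] : null;
console.log(reference); // "1234-234234"
```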

Related

Matching a line protocol string with regex in JavaScript

I'm trying to match a string like this: test0,id=28084 type=high,18765003 138456387
And I'm using this regex:
const str = `test0,id=28084 type=high,18765003 138456387`;
console.log(str.match(/\s*([a-zA-Z0-9\-_.]+)\s*,\s*([a-zA-Z0-9\-_.]+\s*=\s*[a-zA-Z0-9\-_.]+\s*)*\s*,\s*([0-9.]+)\s+([0-9]+)\s*/))
But I am not getting the id part. Just type=high.
Any help is appreciated.
Edit: I see that I will only get the last capture group. But not stated in the question, I need there to be a dynamic number of fields at that point in the string. I'm wondering if there's some other way to accomplish this.

You can capture in the same group all the key-value pairs and then split them:
const str = `test0,id=28084 type=high,18765003 138456387`;
const matches = str.match(/\s*([a-zA-Z0-9\-_.]+)\s*,\s*((?:[a-zA-Z0-9\-_.]+\s*=\s*[a-zA-Z0-9\-_.]+\s*)*)\s*,\s*([0-9.]+)\s+([0-9]+)\s*/);
// Replace the combined group at index 2 with the individual key=value pairs
matches.splice(2, 1, ...matches[2].trim().split(/\s+/));
console.log(matches);
Notice that I changed the second group in your regex from (...)* to ((?:...)*), so that a single group captures all the pairs.
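If an object is more convenient than a flat array, the same idea can be taken one step further. This is a rough sketch: `parseLine` and the property names `name`, `tags`, `value` and `timestamp` are hypothetical labels guessed from the sample string, not something the question defines.

```javascript
// Hypothetical helper: parses "name,key=value key=value,number number".
function parseLine(line) {
  const m = line.match(
    /^\s*([\w.-]+)\s*,\s*((?:[\w.-]+\s*=\s*[\w.-]+\s*)*)\s*,\s*([0-9.]+)\s+([0-9]+)\s*$/
  );
  if (!m) return null;

  // Walk the captured block of pairs with a second, global regex.
  const tags = {};
  for (const pair of m[2].matchAll(/([\w.-]+)\s*=\s*([\w.-]+)/g)) {
    tags[pair[1]] = pair[2];
  }
  return { name: m[1], tags, value: m[3], timestamp: m[4] };
}

console.log(parseLine("test0,id=28084 type=high,18765003 138456387"));
```

Because the pairs are collected in a loop, any number of key=value fields is handled.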

Javascript regex to find a string and extract it from whole string

I have a JavaScript array of strings that contains URLs like:
http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD
http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF
http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA
http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD
http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD
I want to extract the "first" query parameter values from these URLs.
That is, I need the values DSPN47ZTE1BGMR, RTR22414242144, KOSDFASEWQESAS, POLUJYUSD41234, 54SADFD14242RD.
Because I am not good with regex, I couldn't find a way to extract these values from the array. Any help will be appreciated.

Instead of using regex, why not just create a URL object out of the string and extract the parameters natively?
let url = new URL("http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD");
console.log(url.searchParams.get("first")); // -> "54SADFD14242RD"
If you don't know the name of the first parameter, you can still search the query string manually through the search property of the URL object.
let url = new URL("http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD");
console.log(url.search.match(/\?[^=&]+=([^&]*)/)[1]); // -> "54SADFD14242RD"
Index 1 of the result is the first capture group, here the value of the first parameter (index zero is the whole matched string). Note that .match returns null when nothing matches, so the code above throws an error if there are no parameters in the URL.
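Since the question starts from an array of URLs, the same approach can be applied with map. A minimal sketch, using a shortened copy of the array from the question:

```javascript
const urls = [
  "http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD",
  "http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF",
  "http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA",
];

// searchParams.get returns null when the parameter is missing.
const firstValues = urls.map((u) => new URL(u).searchParams.get("first"));
console.log(firstValues);
// ["DSPN47ZTE1BGMR", "RTR22414242144", "KOSDFASEWQESAS"]
```

Keep in mind that the URL constructor throws on strings that are not valid absolute URLs, so untrusted input may need a try/catch.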

Does it have to use regex? Would something like the following work:
var x = 'http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD';
x.split('?first=')[1].split('&second')[0];

Try this regex:
first=([^&]*)
Capture the contents of Group 1
Explanation:
first= - matches first=
([^&]*) - matches 0+ occurrences of any character that is not a & and stores it in Group 1

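You can use
(?<=\?first=)[^&]+
(?<=\?first=) - positive lookbehind that requires ?first= immediately before the match
[^&]+ - matches one or more characters up to the next & (greedy; a lazy +? with nothing after it would stop after a single character)
Without a positive lookbehind you can do it like this: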
let str = `http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD
http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF
http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA
http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD
http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD`
let op = str.match(/\?first=([^&]+)/g).map(e=> e.split('=')[1])
console.log(op)
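An alternative that reads the capture group directly, instead of splitting each match on =, is String.prototype.matchAll (ES2020). This is a sketch under the assumption that the URLs are separated by whitespace, as in the string above; the variable names are illustrative.

```javascript
const text = `http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD
http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF`;

// matchAll requires the g flag; each result exposes its capture groups.
const values = Array.from(text.matchAll(/[?&]first=([^&\s]+)/g), (m) => m[1]);
console.log(values); // ["DSPN47ZTE1BGMR", "RTR22414242144"]
```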

How to extract a particular text from url in JavaScript

I have a url like http://www.somedotcom.com/all/~childrens-day/pr?sid=all.
I want to extract childrens-day. How to get that? Right now I am doing it like this
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
url.match('~.+\/');
But what I am getting is ["~childrens-day/"].
Is there a (definitely there would be) short and sweet way to get the above text without ["~ and /"] i.e just childrens-day.
Thanks

You could use a negated character class and a capture group ( ) and refer to capture group #1. The caret (^) inside of a character class [ ] is considered the negation operator.
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
var result = url.match(/~([^~]+)\//);
console.log(result[1]); // "childrens-day"
Note: If you have many URLs inside one string, you may want to add the ? quantifier for a non-greedy match.
var result = url.match(/~([^~]+?)\//);
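Because match returns null when the pattern is not found, indexing the result directly can throw. A minimal sketch of a guarded version follows; `extractTildeSegment` is a hypothetical name, and excluding / from the character class is one possible way to stop at the first slash.

```javascript
// Hypothetical helper: returns the text between "~" and the next "/".
function extractTildeSegment(url) {
  const m = url.match(/~([^~\/]+)\//);
  return m ? m[1] : null;
}

console.log(extractTildeSegment("http://www.somedotcom.com/all/~childrens-day/pr?sid=all"));
// "childrens-day"
console.log(extractTildeSegment("http://www.somedotcom.com/all/pr?sid=all")); // null
```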

Like so:
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
var matches = url.match(/~(.+?)\//);
console.log(matches[1]);
Working example: http://regex101.com/r/xU4nZ6
Note that your regular expression wasn't actually properly delimited either, not sure how you got the result you did.

Use non-capturing groups with a captured group then access the [1] element of the matches array:
(?:~)(.+)(?:/)
Keep in mind that you will need to escape your / if using it also as your RegEx delimiter.

Yes, it is.
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
url.match('~(.+)\/')[1];
Just wrap what you need in a parenthesized group. No other changes to your code are needed.
References: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

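You could just do a string replace on the matched text. Note that replace returns a new string rather than modifying the original, so keep the result:
var part = url.match('~.+\/')[0]; // "~childrens-day/"
part = part.replace('~', '').replace('/', ''); // "childrens-day"
http://www.w3schools.com/jsref/jsref_replace.asp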

Get Second to last character position from string using jQuery

I have a dynamically formed string like - part1.abc.part2.abc.part3.abc
In this string I want to get the substring that follows the second to last occurrence of ".", so that I get part3.abc
Is there any direct method available to get this?

You could use:
'part1.abc.part2.abc.part3.abc'.split('.').slice(-2).join('.'); // 'part3.abc'
You don't need jQuery for this.

This has nothing to do with jQuery. You can use a regular expression:
var s = 'part1.abc.part2.abc.part3.abc';
var re = /[^.]+\.[^.]+$/;
var match = s.match(re);
if (match) {
  alert(match[0]); // 'part3.abc'
}
or
'part1.abc.part2.abc.part3.abc'.match(/[^.]+\.[^.]+$/)[0];
but the first is more robust, because it checks for a null result when the string contains no dot.
You could also use split and take the last two elements of the resulting array (if they exist).
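For completeness, the substring can also be found without a regex or an intermediate array by calling lastIndexOf twice. This is only a sketch; `lastTwoSegments` is a hypothetical name, and returning the input unchanged when there are fewer than two dots is an assumption about the desired behaviour.

```javascript
// Hypothetical helper: returns everything after the second to last ".".
function lastTwoSegments(s) {
  const last = s.lastIndexOf(".");
  // No dot, or a dot only at index 0: nothing to cut.
  if (last <= 0) return s;
  const secondLast = s.lastIndexOf(".", last - 1);
  return secondLast === -1 ? s : s.slice(secondLast + 1);
}

console.log(lastTwoSegments("part1.abc.part2.abc.part3.abc")); // "part3.abc"
console.log(lastTwoSegments("part3.abc")); // "part3.abc"
```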

Parse string regex for known keys but leave separator

Ok, So I hit a little bit of a snag trying to make a regex.
Essentially, I want a string like:
error=some=new item user=max dateFrom=2013-01-15T05:00:00.000Z dateTo=2013-01-16T05:00:00.000Z
to be parsed to read
error=some=new item
user=max
dateFrom=2013-01-15T05:00:00.000Z
dateTo=2013-01-16T05:00:00.000Z
So I want it to pull known keywords, and ignore other strings that have =.
My current regex looks like this:
(error|user|dateFrom|dateTo|timeFrom|timeTo|hang)\=[\w\s\f\-\:]+(?![(error|user|dateFrom|dateTo|timeFrom|timeTo|hang)\=])
The list of known keywords is built dynamically, so I can list them in the regex as known.
How could I write it to include this requirement?

You could use a replace like so:
var input = "error=some=new item user=max dateFrom=2013-01-15T05:00:00.000Z dateTo=2013-01-16T05:00:00.000Z";
var result = input.replace(/\s*\b((?:error|user|dateFrom|dateTo|timeFrom|timeTo|hang)=)/g, "\n$1");
result = result.replace(/^\r?\n/, ""); // remove the leading newline added before the first key
Result:
error=some=new item
user=max
dateFrom=2013-01-15T05:00:00.000Z
dateTo=2013-01-16T05:00:00.000Z

Another way to tokenize the string:
var tokens = inputString.split(/ (?=[^= ]+=)/);
The regex looks for a space that is followed by a sequence of characters other than space and equals sign ending in =, and splits at those spaces.
Result:
["error=some=new item", "user=max", "dateFrom=2013-01-15T05:00:00.000Z", "dateTo=2013-01-16T05:00:00.000Z"]
Applying the same technique to the keyword list from your question:
var tokens = inputString.split(/\s*(?=\b(?:error|user|dateFrom|dateTo|timeFrom|timeTo|hang)=)/);
The leading \s* consumes the whitespace before each keyword, so the tokens carry no trailing spaces. This also correctly splits the input that Qtax pointed out in the comments, "error=user=max foo=bar":
["error=", "user=max foo=bar"]
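Since the question mentions that the keyword list is dynamic, the splitting regex can be built from an array, and the tokens turned into an object. This is a rough sketch: `parseKnownKeys` is a hypothetical name, and the keys are assumed to be plain words that need no regex escaping.

```javascript
// Hypothetical helper: splits on known keys and returns a key -> value map.
function parseKnownKeys(input, keys) {
  // Assumes the keys contain no regex metacharacters.
  const splitter = new RegExp("\\s*(?=\\b(?:" + keys.join("|") + ")=)");
  const result = {};
  for (const token of input.split(splitter)) {
    const eq = token.indexOf("="); // only the first "=" separates key and value
    if (eq === -1) continue;
    result[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return result;
}

const knownKeys = ["error", "user", "dateFrom", "dateTo", "timeFrom", "timeTo", "hang"];
console.log(parseKnownKeys(
  "error=some=new item user=max dateFrom=2013-01-15T05:00:00.000Z dateTo=2013-01-16T05:00:00.000Z",
  knownKeys
));
```

Splitting each token at the first = keeps values such as some=new item intact.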
