JavaScript regex: parse a complex URL string - javascript

I need to parse a complex URL string to fetch specific values.
From the following URL string:
/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss
I need to extract this result in array format:
['http://any-feed-url-a.com?filter=hot&format=rss', 'http://any-feed-url-b.com?filter=rising&format=rss']
I already tried /url=([^&]+)/, but it stops at the first & and so does not capture the full query string of each feed URL. I would also like to omit the url= prefix.
Thanks in advance.

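This regex works for me: /url=([a-z:/.?=-]+&[a-z=]+)/g
You can also test this one: /http(s)?:\/\/([a-z-.?=&])+&/g
Note that match with the g flag returns the whole match, so the results of the first regex still include the url= prefix.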
const string = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&url=http://any-feed-url.com?filter=latest&format=rss'
const string2 = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&next=parm&url=http://any-feed-url.com?filter=latest&format=rss'
const regex = /url=([a-z:/.?=-]+&[a-z=]+)/g;
const regex2 = /http(s)?:\/\/([a-z-.?=&])+&/g;
console.log(string.match(regex))
console.log(string2.match(regex2))

Have you tried using the split method instead of a regex?
const urlsArr = "/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss".split("url=");
urlsArr.shift(); // removing first item from array -> "/api/rss/feeds?"
console.log(urlsArr)
The split returns ["/api/rss/feeds?", "http://any-feed-url-a.com?filter=hot&format=rss&", "http://any-feed-url-b.com?filter=rising&format=rss"], and shift() then drops the first item. Note that the first URL keeps its trailing &.
If possible, it's better to use something other than regex. See Coding Horror: "Regular Expressions: Now You Have Two Problems".
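As a rough sketch of the split approach, the trailing & can be trimmed from each piece after splitting. The helper name extractFeedUrls is made up for illustration, and the sketch assumes that no feed URL itself contains the text url=.

```javascript
// Hypothetical helper: split on "url=", drop the leading path, trim trailing "&".
function extractFeedUrls(requestUrl) {
  return requestUrl
    .split("url=")
    .slice(1) // discard everything before the first "url="
    .map(part => part.replace(/&$/, "")); // remove the separator left at the end
}

const feedUrls = extractFeedUrls(
  "/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss"
);
console.log(feedUrls);
// [ 'http://any-feed-url-a.com?filter=hot&format=rss',
//   'http://any-feed-url-b.com?filter=rising&format=rss' ]
```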

You can use matchAll to find the URLs, then map capture group 1 of each match into an array.
const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';
const arr = [...str.matchAll(/url=(.*?)(?=&url=|$)/g)].map(x => x[1]);
console.log(arr)
matchAll isn't supported by older browsers, but looping over exec to fill an array also works.
const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';
const re = /url=(.*?)(?=&url=|$)/g;
const arr = [];
let m;
// exec returns null once there are no more matches
while ((m = re.exec(str)) !== null) {
  arr.push(m[1]);
}
console.log(arr);

If your input is better-formed in reality than shown in the question and you’re targeting a modern JavaScript environment, there’s URL/URLSearchParams:
const input = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot%26format=rss&url=http://any-feed-url-b.com?filter=rising%26format=rss';
const url = new URL(input, 'http://example.com/');
console.log(url.searchParams.getAll('url'));
Notice how & has to be escaped as %26 for it to make sense.
Without this input in a standard form, it’s not clear which rules of URLs are still on the table.
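If you control the code that builds the request URL, one way to get the input into that standard form is to let URLSearchParams do the escaping. This is only a sketch: the feed URLs are the ones from the question, and the base http://example.com/ is a placeholder needed because the input is a relative URL.

```javascript
// Build the query string with URLSearchParams so that "&", "?" and "="
// inside each feed URL are percent-encoded automatically.
const feeds = [
  "http://any-feed-url-a.com?filter=hot&format=rss",
  "http://any-feed-url-b.com?filter=rising&format=rss",
];

const params = new URLSearchParams();
for (const feed of feeds) {
  params.append("url", feed);
}
const requestUrl = "/api/rss/feeds?" + params.toString();

// Parsing the result gives back the original feed URLs unchanged.
const parsed = new URL(requestUrl, "http://example.com/");
const roundTripped = parsed.searchParams.getAll("url");
console.log(roundTripped);
```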

Related

Clean way to get value from string in Javascript

I have this string https://pokeapi.co/api/v2/pokemon/6/
I would like to extract the value after pokemon/, in this case 6. It represents a Pokémon ID, which can range from 1 to N.
I know this is pretty trivial, but I was wondering whether there is a nicer, more future-proof solution. Here is mine:
const foo= "https://pokeapi.co/api/v2/pokemon/6/"
const result = foo.split('/') //[ 'https:', '', 'pokeapi.co', 'api', 'v2', 'pokemon', '6', '' ]
const ids = result[6]
You can grab the value after the last / character like so:
const pokemonID = foo.substring(foo.lastIndexOf("/") + 1)
This uses String.lastIndexOf to find the index of the final slash, then String.substring with a single argument to take the part of the string after that slash. We add 1 to the result of lastIndexOf to omit the slash itself.
For this to work you need to drop the trailing slash (which won't do anything anyway) from your request URL.
This could be abstracted into a utility function that gets the last value of any URL, which is the biggest improvement over the split-and-index approach.
Beware, however, that it takes whatever value comes after the last slash.
Using the string https://pokeapi.co/api/v2/pokemon/6/pokedex would return pokedex.
If you are using Angular, React, Vue, etc. with a built-in router, the framework will have specific APIs that can get the exact parameter you need regardless of URL shape.
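A minimal sketch of such a utility function is below. The name lastPathSegment is hypothetical, and stripping trailing slashes inside the function is an assumption made here so that callers do not have to remove them first.

```javascript
// Hypothetical utility: return whatever follows the last "/" in a URL string,
// ignoring any trailing slashes.
function lastPathSegment(urlString) {
  const trimmed = urlString.replace(/\/+$/, ""); // drop trailing slash(es)
  return trimmed.substring(trimmed.lastIndexOf("/") + 1);
}

console.log(lastPathSegment("https://pokeapi.co/api/v2/pokemon/6/")); // "6"
console.log(lastPathSegment("https://pokeapi.co/api/v2/pokemon/6/pokedex")); // "pokedex"
```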
You should use the built-in URL API to do the splitting correctly for you:
const url = new URL("https://pokeapi.co/api/v2/pokemon/6/");
Then you can get the pathname and split that:
const path = url.pathname.split("/");
After you split it you can get the value 6 by accessing the 5th element here:
const url = new URL("https://pokeapi.co/api/v2/pokemon/6/");
const path = url.pathname.split("/");
console.log(path[4]);
You could also do something like this, where url is the URL string:
url.split('pokemon/')[1].split('/')[0]
Here is what I would do
const result = new URL(url).pathname.split('/');
const id = result[4];
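I am not sure if this is better than yours, but you can locate pokemon/ and read up to the next slash:
const foo = "https://pokeapi.co/api/v2/pokemon/6/";
const marker = "pokemon/";
const start = foo.indexOf(marker) + marker.length;
const end = foo.indexOf("/", start); // -1 if there is no trailing slash
const id = foo.slice(start, end === -1 ? undefined : end); // also works for ids with several digits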
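The answers above that index into the split pathname depend on the ID sitting at a fixed position. As a sketch of a less position-dependent variant, the function below looks up the segment that follows a named segment. The name segmentAfter is made up for illustration, and the input is assumed to be an absolute URL.

```javascript
// Hypothetical helper: return the path segment that follows `name`,
// or null if `name` is missing or is the last segment.
function segmentAfter(urlString, name) {
  const segments = new URL(urlString).pathname.split("/").filter(Boolean);
  const index = segments.indexOf(name);
  return index !== -1 && index + 1 < segments.length ? segments[index + 1] : null;
}

console.log(segmentAfter("https://pokeapi.co/api/v2/pokemon/6/", "pokemon")); // "6"
console.log(segmentAfter("https://pokeapi.co/api/v2/pokemon/6/pokedex", "pokemon")); // "6"
```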

Array contains string the other way

Let's say I have this array:
const urlList = ["/user/profile", "/user/edit", "/verify/device"];
const isContainUrl = urlList.includes(window.location.pathname);
That is fine if the pathname is static like above. But what if I have a URL like /verify/device/{device-id}? I want isContainUrl to be true for that partial match as well. Since I am comparing a longer string against a shorter one, I can't simply use indexOf.
Does anyone have an idea how to do this?
It is possible to use some() and check both edge cases:
const urlList = ["/user/profile", "/user/edit", "/verify/device"];
const veryLongString = '/verify/device/1';
const isContainUrl = urlList.some(s => s.includes(veryLongString)
|| veryLongString.includes(s));
console.log(isContainUrl)
You need to parse the URL according to custom rules, or do some regular-expression matching.
See path-to-regexp.
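One possible custom rule, sketched below without any library, is to treat each list entry as a prefix that must end on a segment boundary. The function name matchesKnownPath is hypothetical. Compared with calling includes in both directions, this rejects look-alikes such as /verify/devices and shorter paths such as /user.

```javascript
const urlList = ["/user/profile", "/user/edit", "/verify/device"];

// Hypothetical helper: true if pathname equals a base path or lies below it.
function matchesKnownPath(pathname, bases) {
  return bases.some(base => pathname === base || pathname.startsWith(base + "/"));
}

console.log(matchesKnownPath("/verify/device/1", urlList)); // true
console.log(matchesKnownPath("/verify/devices", urlList)); // false
console.log(matchesKnownPath("/user", urlList)); // false
```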

Dynamic string cutting

Okay, so I have a filepath with a variable prefix...
C:\Users\susan ivey\Documents\VKS Projects\secc-electron\src\views\main.jade
... now this path will be different for whatever computer I'm working on...
Is there a way to traverse the string up to, say, 'secc-electron\', and drop it and everything before it while preserving the rest? I'm familiar with converting strings to arrays to manipulate elements between delimiters, but I have yet to come up with an answer to this one. Would there be some sort of regex solution instead? I'm not that great with regex, so I wouldn't know where to begin.
What you probably want is to do a split (with regex or not):
Here's an example:
var paragraph = 'C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade';
var splittedString = paragraph.split("secc-electron"); // returns an array of 2 elements: "C:\\Users\\susan ivey\\Documents\\VKS Projects\\" and "\\src\\views\\main.jade"
console.log(splittedString[1]);
You can have a look at this https://www.w3schools.com/jsref/jsref_split.asp to learn more about this function.
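With a regex you can do:
var myPath = 'C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade';
var relativePath = myPath.replace(/.*(?=secc-electron)/, '');
The regex is:
.*(?=secc-electron)
It matches everything up to, but not including, 'secc-electron', so replace returns the rest of the path, starting at secc-electron. To drop secc-electron\ as well, match it instead of looking ahead: /.*secc-electron\\/. Note that backslashes in the string literal must be escaped as \\.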
You can split the string at a certain point, then return the second part of the resulting array:
var string = "C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade"
console.log('string is: ', string)
var newArray = string.split("secc-electron")
console.log('newArray is: ', newArray)
console.log('newArray[1] is: ', newArray[1])
Alternatively you could use path.parse(path); https://nodejs.org/api/path.html#path_path_parse_path and retrieve the parts that you are interested in from the object that gets returned.
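To drop the marker itself along with everything before it, as the question asks, a plain indexOf plus slice is one option. The sketch below uses a made-up helper name, dropThrough, and returns the input unchanged when the marker is not found, which is an assumption about the desired fallback.

```javascript
// Hypothetical helper: remove everything up to and including `marker`.
function dropThrough(fullPath, marker) {
  const index = fullPath.indexOf(marker);
  return index === -1 ? fullPath : fullPath.slice(index + marker.length);
}

// Backslashes must be doubled inside a JavaScript string literal.
const fullPath = "C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade";
const relativePath = dropThrough(fullPath, "secc-electron\\");
console.log(relativePath); // src\views\main.jade
```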

Split A string on a term AND THEN splitting on another term

I have a poorly designed URL query string that I can't easily change e.g.
https://mysite/.shtml?source=999&promotype=promo&cmpid=abc--dfg--hif-_-1234&cm=qrs-stv-_wyx&aff=45628_THIS+IS+Test_Example
I need to extract elements from it e.g. 45628
At the moment I'm using
document.URL.split(/aff=|_/)[5];
But I don't like this solution, because if other parts of the URL structure change, which is highly likely, it will break.
Instead, what I want to say is:
split on "aff=" AND THEN split on "_"
Is there an easy way to do this? I'm looking for a JS answer.
Pretty sure you can do it like this:
document.URL.split("aff=")[1].split("_")[0];
I would start by splitting the string into tokens, if you can. Rather than working with foo=bar&fin=bin, break it down into [['foo', 'bar'], ['fin', 'bin']]. You can do that by splitting on the & and then splitting each of those on the = character:
const data = 'source=999&promotype=promo&cmpid=abc--dfg--hif-_-1234&cm=qrs-stv-_wyx&aff=45628_THIS+IS+Test_Example';
console.log(data.split('&').map(it => it.split('=')));
Next, take the tokens you want and extract the leading digits:
const data = 'source=999&promotype=promo&cmpid=abc--dfg--hif-_-1234&cm=qrs-stv-_wyx&aff=45628_THIS+IS+Test_Example';
const tokens = data.split('&').map(it => it.split('='));
const [key,val] = tokens.find(([key]) => key === 'aff');
console.log(key, val.match(/[0-9]+/));
var url = 'https://mysite/.shtml?source=999&promotype=promo&cmpid=abc--dfg--hif-_-1234&cm=qrs-stv-_wyx&aff=45628_THIS+IS+Test_Example';
var re = new RegExp(/aff=(\d+)/);
var ext = re.exec(url)[1];
alert(ext)
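In a modern environment the same lookup can be sketched with URL and URLSearchParams, which find the aff parameter by name regardless of where it sits in the query string. The variable names here are illustrative. Note that searchParams.get decodes the value, so + becomes a space.

```javascript
const pageUrl = "https://mysite/.shtml?source=999&promotype=promo&cmpid=abc--dfg--hif-_-1234&cm=qrs-stv-_wyx&aff=45628_THIS+IS+Test_Example";

// get() returns null when the parameter is absent.
const aff = new URL(pageUrl).searchParams.get("aff");

// Keep only the part before the first underscore.
const affId = aff === null ? null : aff.split("_")[0];
console.log(affId); // "45628"
```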

Using an array of regex expressions for .match

I have something I am trying to accomplish.
I'd like to take an array built with AJAX/xml.
array[/word0/, /word1/, /word2/]
and put this into a form that could be used in a .match():
result = string.match(array)
I have tried using a for loop and stepping through the array using string.match(array[i]) to no avail.
Is there an easy way to do this?
Edit: You may have a syntax problem. The following is not valid syntax:
array[/word0/, /word1/, /word2/]
Something like this fixes it:
var regexps = [/word0/, /word1/, /word2/];
Original answer:
Javascript RegExps already do this. You're looking for:
var regexp = /word0|word1|word2/;
Assuming your list of matches comes back in the right format, you could achieve this like so:
var words = ["word0", "word1", "word2"];
var regexp = new RegExp(words.join("|"));
str.match(regexp);
http://jsfiddle.net/KALPh/
Your approach was fine. Here's my implementation:
var regexes = [/^def/, /^abc/],
    testString = 'abcdef',
    numRegexes = regexes.length;
for (var x = 0; x < numRegexes; x++) {
  alert(regexes[x].test(testString));
}
To initialize your array, use
var array = [/word0/, /word1/, /word2/];
Then you can use
str.match(array[i])
If your problem is the transmission in "AJAX/xml", then you'll need to build the regular expressions client-side with new RegExp(somestring), where somestring might be, for example, "word0": you can't embed a regex literal in XML.
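Putting these answers together, a sketch of the client-side step might look like the following. It assumes the words arrive as plain strings (the sample list is invented) and should be matched literally, so each one is escaped before being turned into a RegExp. The escapeRegExp helper is a commonly used pattern, not a built-in.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed sample data standing in for the strings received via AJAX/XML.
const words = ["word0", "word1", "a.b"];

// Option 1: one combined regex using alternation.
const combined = new RegExp(words.map(escapeRegExp).join("|"));
console.log(combined.test("some word1 here")); // true

// Option 2: an array of regexes, tested one by one.
const regexes = words.map(word => new RegExp(escapeRegExp(word)));
const matching = regexes.filter(re => re.test("has word0 and a.b"));
console.log(matching.length); // 2
```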
