How to push certain strings from a huge string into an array - JavaScript

This might sound weird, but I want to parse an XML response by pushing every string that matches the following regular expression:
data-href="[\d\w\/:\.\=]*[">]?
into an array.
The reason for this is just testing. A friend has built a web page with Jimdo that displays an image gallery. Now I want to try parsing the XML of that site, fetch only the images referenced by each data-href attribute, and use them in my React Native app.
Any ideas?

Why not just use the String.match method?
var matches = xml_str.match(/data-href="[\d\w\/:\.\=]*[">]?/g);
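Edit: if you want just the URL inside the attribute, use a regex capture group. Note that match with the g flag returns only the whole matches and drops the groups, so use matchAll and read group 1 from each match:
var matches = Array.from(xml_str.matchAll(/data-href="([\d\w\/:\.\=]*)[">]?/g), function (m) { return m[1]; });
console.log(matches);
The matches array then contains only the URLs.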
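One way to collect only the captured URLs is a loop over RegExp.prototype.exec, which also works in engines that lack String.prototype.matchAll. This is a minimal sketch: the sample markup and the name extractDataHrefs are made up for illustration, and the character class is the one from the question, so a URL containing a character outside it (a hyphen, for example) would be cut short.

```javascript
// Hypothetical sample input; the real response would come from the gallery page.
const xmlStr =
  '<a data-href="http://example.com/img/1.jpg">one</a>' +
  '<a data-href="http://example.com/img/2.jpg">two</a>';

// Collects capture group 1 (the URL) from every data-href match.
function extractDataHrefs(text) {
  const pattern = /data-href="([\d\w\/:\.\=]*)[">]?/g;
  const urls = [];
  let m;
  // With the g flag, exec resumes after the previous match on each call.
  while ((m = pattern.exec(text)) !== null) {
    urls.push(m[1]);
  }
  return urls;
}

const urls = extractDataHrefs(xmlStr);
console.log(urls);
```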

How to "black list" pages with a query string parameter

I am using cookieconsent.js to show a consent popup on my website. I need to stop the popup from showing if a page has a certain query string.
The cookieconsent documentation provides a "blacklistPage" option where I can "Specify pages using string or RegExp" to prevent the popup from showing on those pages.
This works fine until I try to use a regex for a query string.
Example of path, filename and query string to match:
/sub-folder/file-name.shtml?value=pair
"blacklistPage": [
"/.*\?value=pair"
]
According to the documentation, it expects either a regex or a string, but you're passing a regex inside a string, which isn't valid.
using a string : ‘/index.html’ (matches ‘/index.html’ exactly)
using RegExp : /\/page_[\d]+.html/ (matched ‘/page_1.html’ and ‘/page_2.html’ etc)
Additionally, you're quoting the blacklistPage key, which doesn't need to be quoted.
By removing the quotes and using a standard JS regex literal, you get the following:
blacklistPage: [
/\/.*\?value=pair/
]
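As a quick check outside the library, the difference between the quoted pattern and the regex literal can be seen in plain JavaScript. This sketch only shows how the two values behave as JavaScript values; it makes no claim about how cookieconsent.js compares them against the page.

```javascript
const path = "/sub-folder/file-name.shtml?value=pair";

// "\?" is not a string escape, so the backslash is dropped
// and the string is just literal text, not a pattern.
const asString = "/.*\?value=pair";
console.log(asString === "/.*?value=pair");
console.log(asString === path);

// A regex literal keeps "\?" as an escaped question mark.
const asRegex = /\/.*\?value=pair/;
console.log(asRegex.test(path));
```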
Alternatively your use case is simple so you could just use a string and avoid regex:
blacklistPage: [
'/sub-folder/file-name.shtml?value=pair'
]
I have come to the conclusion, along with a friend who knows JS much better than I do, that the cookieconsent.js script does not support query strings in blacklistPage.
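If the library really cannot match on the query string, one possible workaround is to check the query string yourself and only start the popup when the check fails. The sketch below covers just the check; hasBlockedParam is a hypothetical helper name, and wiring the result into the cookieconsent setup is left out because it depends on how the script is initialised on your page.

```javascript
// Returns true when the query string contains name=value.
// "search" is expected to look like window.location.search, e.g. "?value=pair".
function hasBlockedParam(search, name, value) {
  const params = new URLSearchParams(search);
  return params.get(name) === value;
}

const blocked = hasBlockedParam("?value=pair", "value", "pair");
const allowed = hasBlockedParam("?other=1", "value", "pair");
console.log(blocked, allowed);
```

In the browser you would call hasBlockedParam(window.location.search, "value", "pair") and skip the popup setup when it returns true.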

JavaScript str.split(/[^a-zA-Z0-9.#]|(username|fname)/) not removing 'username' or 'fname' from string

I have a simple query string in my program:
?username=someone#email.com&fname=
I have come up with a regular expression that selects everything except the data I want:
[^a-zA-Z0-9.#]|(username|fname)
I am trying to use JavaScript's str.split() to split around everything that isn't actually data in the query, like so:
let userinfo = global.location.search.split(/[^a-zA-Z0-9.#]|(username|fname)/).filter(Boolean);
Unfortunately, when this runs, I get the following array:
['username', 'someone#email.com', 'fname'].
I expect just ['someone#email.com'] since "username" and "fname" should be split around from my regex.
I have tested this in https://regex101.com/ and it appears to work fine, so I'm not sure why it doesn't work in JS.
Any ideas?
When you have a capture group in the regexp used with .split() the captured strings are included in the resulting array.
If you need a group but don't want to include it in the result, use a non-capturing group (?:username|fname).
But there's no need for the group in this case at all. /xxx|(yyy|zzz)/ matches the same thing as /xxx|yyy|zzz/, they only differ in what they capture.
/[^a-zA-Z0-9.#]|username|fname/
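A small sketch of the difference, using the query string from the question as a plain string instead of global.location.search:

```javascript
const query = "?username=someone#email.com&fname=";

// Capturing group: the captured text is spliced into the result array.
const withCapture = query
  .split(/[^a-zA-Z0-9.#]|(username|fname)/)
  .filter(Boolean);

// No group at all: the delimiters are simply discarded.
const withoutGroup = query
  .split(/[^a-zA-Z0-9.#]|username|fname/)
  .filter(Boolean);

// Non-capturing group: same result as having no group.
const nonCapturing = query
  .split(/[^a-zA-Z0-9.#]|(?:username|fname)/)
  .filter(Boolean);

console.log(withCapture, withoutGroup, nonCapturing);
```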
You don't need a regex for such tasks; you can use the standard URLSearchParams API:
let searchParams = "?username=someone#email.com&fname="
let parsed = new URLSearchParams(searchParams)
console.log(parsed.get('username'))

Match everything after word using only regex

I'm trying to set up a ShareX custom engine. After the upload I'm given the full URL, for instance http://foo.com/HF139hR, and I can process that string with a regex before copying it to the clipboard. What I want is only the last part of the URL, HF139hR, so I can put it into another URL, say http://foo.com/?viewer=HF139hR. So far I have been using the expression \w+$ to grab it, but sometimes I get an upload error instead, and then the expression grabs the last word of the error message and passes it to ?viewer=.
Doing my research I found \bfoo.com\/\K\S+, which is exactly what I want, but unfortunately \K is not supported in JavaScript.
\bfoo.com/\K\S+
\bfoo.com\/(\S+)
You can use a similar one and grab the group 1 or capture 1
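A minimal sketch of the capture-group approach, assuming the engine follows JavaScript regex rules. The helper name extractId and the error text are made up for illustration. Engines with ES2018 lookbehind support can also get close to \K with (?<=...), shown as a second variant.

```javascript
// Returns the part after "foo.com/", or null when the text has no such URL
// (for example, an upload error message).
function extractId(text) {
  const m = /\bfoo\.com\/(\S+)/.exec(text);
  return m ? m[1] : null;
}

const id = extractId("http://foo.com/HF139hR");
const none = extractId("Upload failed: unknown error"); // hypothetical error text
const viewerUrl = id ? "http://foo.com/?viewer=" + id : null;

// Lookbehind variant: the whole match is already the wanted part.
const viaLookbehind = "http://foo.com/HF139hR".match(/(?<=\bfoo\.com\/)\S+/);

console.log(id, none, viewerUrl, viaLookbehind && viaLookbehind[0]);
```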
You can use this regex: /\/(\w+)(\?+.*)*$/ and read the first capturing group. It captures the last path segment and leaves out any trailing part that starts with ?, such as ?viewer=$1$. You can try it here:
var url="http://foo.com/HF139hR?=viewer=$1$";
var reg=/\/(\w+)(\?+.*)*$/;
alert(url.match(reg)[1]);
If you use only "http://foo.com/HF139hR" as the URL, the group captures the same thing.
And you can take a look at this Regex DEMO where you can see the match information.

trouble using string.replace with regex

Given a regex like this:
http://rubular.com/r/ai1LFT5jvK
I want to use string.replace to replace "subdir" with a string of my choosing.
Doing myStr.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/,otherStr)
only returns the same string, as shown here: http://jsfiddle.net/nLmbV/
If you view the Rubular, it appears to capture what I want it to capture, but on the Fiddle, it doesn't replace it.
I'd like to know why this happens, and what I'm doing wrong. A correct regex or a correct implementation of the replace call would be nice, but most of all, I want to understand what I'm doing wrong so that I can avoid it in the future.
EDIT
I've updated the fiddle to change my regex from:
/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/
to
/^.*\/\/.*\.net\/.*\/(.*)\/.*$/
And according to the fiddle, it just returns hello instead of https://xxxxxxxxxxx.cloudfront.net/dir/hello/Slide1_v2.PNG
It's that little \z in your regex.
You probably forgot to replace it with a $ sign. JavaScript uses ^ and $ as anchors, while Ruby uses \A and \z.
To answer your edit:
The match is always replaced as a whole. You'll want to group both the left side and the right side of the to-be-replaced part and reinsert it in the replacement:
url.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/,"$1hello$2")
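To see all three behaviours side by side, here is a short sketch. The URL is an assumption built from the output quoted in the question, with "subdir" as the directory to replace. In a JavaScript regex without the u flag, \z is just a literal z, so the first pattern never matches this URL and replace returns the input unchanged.

```javascript
const url = "https://xxxxxxxxxxx.cloudfront.net/dir/subdir/Slide1_v2.PNG";

// Ruby-style \z: treated as a literal "z" in JavaScript, so nothing matches.
const withRubyAnchor = url.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/, "hello");

// With $: the pattern matches the whole string, so the whole string is replaced.
const wholeMatch = url.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*$/, "hello");

// Grouping the parts to keep and reinserting them with $1 and $2.
const grouped = url.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/, "$1hello$2");

console.log(withRubyAnchor);
console.log(wholeMatch);
console.log(grouped);
```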
Before I get marked down, I know the question asks about regexp. The reason for this answer is that URLs are nearly impossible to process reliably with a regexp without writing fiendishly complex regexps. It can be done, but it makes your head hurt!
If you are doing this in a browser, you can use an A tag in your script to make things much simpler. The A tag knows how to parse a URL into pieces, and it lets you modify the pieces independently, so you only need to deal with the pathname:
//make a temporary a tag
var a = document.createElement('a');
//set the href property to the url you want to process
a.href = "scheme://host.domain/path/to/the/file?querystring";
//grab the path part of the url, and chop up into an array of directories
var dirs = a.pathname.split('/');
//set 2nd dir name - array is ['','path','to','the','file']
dirs[2]='hello';
//put the path back together
a.pathname = dirs.join('/');
a.href now contains the URL you want.
More lines, but also more hair left when you come back to change the code later.
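Outside a page with a DOM, the same idea can be sketched with the standard URL constructor instead of an A tag. This swaps in a different API from the one used above, and the URL is an assumption based on the question.

```javascript
const u = new URL("https://xxxxxxxxxxx.cloudfront.net/dir/subdir/Slide1_v2.PNG");

// pathname is "/dir/subdir/Slide1_v2.PNG", so the array is
// ['', 'dir', 'subdir', 'Slide1_v2.PNG'].
const dirs = u.pathname.split("/");

// Replace the second directory name.
dirs[2] = "hello";

// Put the path back together; href reflects the change.
u.pathname = dirs.join("/");
const result = u.href;
console.log(result);
```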

JavaScript match current url against string with wildcards?

I'm trying to match the following string:
http://*/*/checkout
to this URL:
http://www.url.com/sub-folder/checkout
Ultimately, what I am trying to do is find a way to display my JavaScript widget only on certain pages by allowing conditions like the one above to be added.
How could I use the string to check whether the current URL matches?
Use a regex:
> /^http:\/\/.*?\/.*?\/checkout$/.test('http://www.url.com/sub-folder/checkout')
true
Here's a more readable version:
RegExp('^http://.*?/.*?/checkout$').test('http://www.url.com/sub-folder/checkout')
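If the conditions are stored as wildcard strings, one option is to build the regex from the string at runtime. A minimal sketch, where wildcardToRegExp is a hypothetical helper: it escapes regex metacharacters in the literal parts and turns each * into .*?, mirroring the hand-written regex above. Using [^/]* instead would keep each * inside a single path segment.

```javascript
// Converts a pattern such as "http://*/*/checkout" into an anchored RegExp.
function wildcardToRegExp(pattern) {
  const source = pattern
    .split("*")
    // Escape characters that are special in a regex.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    // Each * in the pattern becomes a lazy "match anything".
    .join(".*?");
  return new RegExp("^" + source + "$");
}

const rule = wildcardToRegExp("http://*/*/checkout");
const onCheckout = rule.test("http://www.url.com/sub-folder/checkout");
const onCart = rule.test("http://www.url.com/sub-folder/cart");
console.log(onCheckout, onCart);
```

In the browser, the string to test would be window.location.href.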
