Check Array Entries with Regex

Check Array Entries with Regex - javascript

I have an array with one or more entries. Each entry is a string (the list of URLs of the open tabs, obtained via the Firefox SDK). I want to check whether a specific URL is already open in one of the tabs (nothing special so far).
My problem is that the URL in the tab list can take four different forms. For example:
The URL I want to find in the tab list:
https://cmsr-author.de/cf#/content/test/de.html
But the URL can also look like this:
https://cmsr-author.de/content/test/de.html
https://cmsr-author.de/test/de.html
https://cmsr-author.de/cf#/test/de.html
Of course, the last part of the URL (after /test/...) is always something different. If I can't find any of the four URLs in the tab list, I want to trigger some other action.
My solution so far is an if-chain:
if (res !== url1) {
if (res !== url2) {
if ...
But I thought there must be a more elegant way. Maybe via regex? I already have a capture group that catches the first part (which stays the same, https://cmsr-author.ws...) in its four forms, but I don't know how to implement this properly.

var urls = ["https://cmsr-author.de/content/test/de.html", "https://cmsr-author.de/test/de.html", "https://cmsr-author.de/cf#/test/de.html"];
// some() stops at the first URL that contains "cf#" and ends with "/test/de.html"
var contains = urls.some(function (url) {
return url.indexOf("cf#") > -1 && url.endsWith("/test/de.html");
});
console.log(contains);

If you want to use a regex, you can use a group for the middle part; groups are explained in detail here: http://www.regular-expressions.info/refcapture.html
In practice, your regex would look something like this:
https:\/\/cmsr-author\.de\/(content|...|...)\/de\.html
where each ... must be replaced by one of the middle parts of the URL that differ.
Note that | means "or" and provides multiple alternatives within the group. The . must be escaped because it otherwise matches any character, and / must be escaped inside a regex literal because it delimits the literal.
I hope that helps!
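Putting the group idea together, a single pattern can make the cf#/ and content/ prefixes optional instead of listing every combination, and Array.prototype.some stops at the first matching tab. A minimal sketch, assuming the tab URLs are already collected in an array of strings; openTabs, escapeRegExp and isPageOpen are hypothetical names, and only the host comes from the question:

```javascript
// Hypothetical list of open-tab URLs (in the add-on these would come from the tabs API).
const openTabs = [
  "https://example.org/",
  "https://cmsr-author.de/content/test/de.html",
];

// Escape regex metacharacters so a path like "test/de.html" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One pattern accepts all four forms of the same page:
//   /cf#/content/<path>, /content/<path>, /<path>, /cf#/<path>
function isPageOpen(tabUrls, pagePath) {
  const pattern = new RegExp(
    "^https://cmsr-author\\.de/(?:cf#/)?(?:content/)?" + escapeRegExp(pagePath) + "$"
  );
  return tabUrls.some((url) => pattern.test(url));
}

console.log(isPageOpen(openTabs, "test/de.html")); // true
```

If isPageOpen returns false, that is the point where the other action would be triggered.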

My English is not good, so I may not fully understand what you mean. As I understand it, you need a regular expression that matches only the first form. If I am wrong,
please ping me.
I hope that helps!
// No g flag: a global regex keeps lastIndex between test() calls.
var reg = /^https:\/\/cmsr\-author\.de\/cf#\/(?:\w+\/)+test\/de\.html$/i;
var str1 = "https://cmsr-author.de/cf#/content/test/de.html";
var str2 = "https://cmsr-author.de/content/test/de.html";
var str3 = "https://cmsr-author.de/test/de.html";
var str4 = "https://cmsr-author.de/cf#/test/de.html";
console.log(reg.test(str1)); // true
console.log(reg.test(str2)); // false
console.log(reg.test(str3)); // false
console.log(reg.test(str4)); // false
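Be careful with the g flag when one regex object is reused for several test() calls: a global regex starts searching at its lastIndex property and updates it after every call, so the same input can give different answers. A minimal illustration with a toy pattern (globalRe and plainRe are illustrative names):

```javascript
// With the g flag, test() resumes searching at lastIndex.
const globalRe = /^abc$/g;
const first = globalRe.test("abc");  // true; lastIndex is now 3
const second = globalRe.test("abc"); // false; search starts at index 3, ^ fails, lastIndex resets to 0
const third = globalRe.test("abc");  // true again

// Without g, every call starts from index 0.
const plainRe = /^abc$/;
const plainResults = [plainRe.test("abc"), plainRe.test("abc")];

console.log([first, second, third].join(", ")); // true, false, true
console.log(plainResults.join(", "));          // true, true
```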

Related

Checking for a specific URL regex

I need to check for a specific URL pattern using regex. I'm not sure what the approach would be, but I don't think it should be too complex for this case, so regex would be the preferred solution. I just need to check that the exact strings #, shares and assets are in the appropriate slots, for example:
http://some-domain.com/#/shares/a454-rte3-445f-4543/assets
Everything in the URL can be variable (protocol, domain, port, share id) except the exact strings I'm looking for and the slots (slash positions) at which they appear.
Thanks for your help!

You can use
/^https?:\/\/some-domain\.com\/#\/shares\/[^/]+\/assets/i
let url = `http://some-domain.com/#/shares/a454-rte3-445f-4543/assets`
let matched = /^https?:\/\/some-domain\.com\/#\/shares\/[^/]+\/assets/i.test(url)
console.log(matched)
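The pattern above hardcodes the protocol and domain, while the question says those (and the port) may vary. A hedged variant, assuming a scheme://host[:port] URL and that only an optional trailing slash may follow assets; isSharesAssetsUrl is an illustrative name:

```javascript
// Any scheme and host (port included), then the fixed slots "#", "shares", <id>, "assets".
// Assumption: nothing except an optional trailing slash follows "assets".
const sharesAssetsPattern = /^[a-z][a-z\d+.-]*:\/\/[^/]+\/#\/shares\/[^/]+\/assets\/?$/i;

function isSharesAssetsUrl(url) {
  return sharesAssetsPattern.test(url);
}

console.log(isSharesAssetsUrl("http://some-domain.com/#/shares/a454-rte3-445f-4543/assets")); // true
console.log(isSharesAssetsUrl("https://other.example:8080/#/shares/x1/assets"));             // true
console.log(isSharesAssetsUrl("http://some-domain.com/#/share/x1/assets"));                  // false
```

Because [^/]+ cannot cross a slash, the share id must occupy exactly one slot.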

Decided to avoid regex and do it this way instead.
const urlParts = window.location.href.split('/');
if (urlParts[3] === '#' && urlParts[4] === 'shares' && urlParts[6] === 'assets') {
// code goes here...
}
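A variant of the same split approach can let the built-in URL parser separate the fragment first, so the check works on the part after # regardless of scheme, host or port. A sketch, assuming the shares/assets path always lives in the fragment; hasSharesAssetsFragment is a hypothetical name. Unlike the index-based check, it does not look at the path before #:

```javascript
// Returns true when the fragment has the shape "#/shares/<id>/assets".
function hasSharesAssetsFragment(href) {
  const { hash } = new URL(href); // e.g. "#/shares/a454-rte3-445f-4543/assets"
  const parts = hash.split("/");  // ["#", "shares", "<id>", "assets"]
  return (
    parts.length === 4 &&
    parts[0] === "#" &&
    parts[1] === "shares" &&
    parts[2] !== "" &&
    parts[3] === "assets"
  );
}

console.log(hasSharesAssetsFragment("http://some-domain.com/#/shares/a454-rte3-445f-4543/assets")); // true
```

In the browser you would call it as hasSharesAssetsFragment(window.location.href).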

Get data from URL address

I have a website like below:
localhost:3000/D129/1
D129 is a document name, which changes, and 1 is a section within that document.
Both values change depending on what the user selects.
How do I extract just the D129 part from the URL using JavaScript?

window.location.pathname.match(/\/([a-zA-Z\d]*)/)[1]
The line above returns the first string after a slash, i.e. the first path segment:
var path = "localhost:3000/D129/1";
alert(path.match(/\/([a-zA-Z\d]*)/)[1])

You can use .split() and take index [1]:
a = "localhost:3000/D129/1";
a = a.split("/");
alert(a[1]);
This works if your URLs always have the same format; a regex is more robust, but I wanted to answer with simple code. If the URL includes http:// or another protocol, the index changes:
a = "http://localhost:3000/D129/1";
a = a.split("/");
alert(a[3]);
PS: For the regex version, see Tuvia's answer.
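Both answers depend on where the segment sits in the string, and that position shifts when a protocol is present. A sketch using the standard URL class instead, assuming you have the full address including a scheme such as http:// (in the browser, window.location.href); parseDocumentUrl is a hypothetical helper:

```javascript
// Returns the document name and section from a URL such as http://localhost:3000/D129/1.
function parseDocumentUrl(href) {
  // pathname excludes protocol, host, port, query string and hash: "/D129/1"
  const [name, section] = new URL(href).pathname.split("/").filter(Boolean);
  return { name, section };
}

console.log(parseDocumentUrl("http://localhost:3000/D129/1").name); // D129
```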

javascript regex match not working as expected

I'm trying to do something very simple, but I can't get to work the way I intend. I'm sure it's doing exactly what I'm asking it to do, but I'm failing to understand the syntax.
Part 1:
In the following example, I want to extract the part of the string between geotech and Input.
x = "geotechCITYInput"
x.match(/^geotech(.*)(?:Input|List)$/)
The result:
["geotechCITYInput", "CITY"]
I've been writing regex for many years in perl/python and even javascript, but I've never seen the ?: syntax, which, I think, is what I'm supposed to use here.
Part 2:
The higher-level problem I'm trying to solve is more complicated. I have a form with many elements named either geotechXXXXInput or geotechXXXXList. I want to create an array of the XXXX values, but only for names that end with Input.
Example form definition:
obj0.name = "geotechCITYInput"
obj1.name = "geotechCITYList"
obj2.name = "geotechSTATEInput"
obj3.name = "geotechSTATEList"
I ultimately want an array like this:
["CITY","STATE"]
I can iterate over the form objects easily with an API call, but I can't figure out how to write the regex to match the ones I want. This is what I have right now, but it doesn't work.
geotechForm.forEachItem(function(name) {
if (name.match(/Input$/))
inputFieldNames.push( name.match(/^geotech(.*)Input$/) );
});
Any suggestions would be greatly appreciated.

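match() returns an array (or null), not just the captured text, so your code pushes the whole match array. Match once, check that the match succeeded, and push the capture group at index 1. The regex below matches names that start with geotech and end with either Input or List; the middle text is the second item in the array (match[1]) and the suffix is the third (match[2]), which is checked so that only the Input names are collected.
geotechForm.forEachItem(function (name) {
var match = name.match(/^geotech(.*)(Input|List)$/);
// match[1] is the middle text (e.g. CITY), match[2] is the suffix
if (match && match[2] === "Input") {
inputFieldNames.push(match[1]);
}
});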
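To see the non-capturing group from Part 1 and an Input-only filter in isolation, here is a sketch with a hard-coded array standing in for the form (names and inputFieldNames mirror the question; the array itself is an assumption, since geotechForm is not available outside the page):

```javascript
// Hypothetical stand-in for the names visited by geotechForm.forEachItem.
const names = ["geotechCITYInput", "geotechCITYList", "geotechSTATEInput", "geotechSTATEList"];

// (?:Input|List) groups the alternatives without capturing, so match[1] is the only capture.
const partOne = "geotechCITYInput".match(/^geotech(.*)(?:Input|List)$/);
console.log(Array.from(partOne).join(", ")); // geotechCITYInput, CITY

// Anchoring on Input alone skips the List names entirely.
const inputFieldNames = [];
for (const name of names) {
  const match = name.match(/^geotech(.*)Input$/);
  if (match) {
    inputFieldNames.push(match[1]);
  }
}

console.log(inputFieldNames.join(", ")); // CITY, STATE
```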

JavaScript, Regex and null result

I have written this regexp: <(a*)\b[^>]*>.*?</\1>
and is tested on this regexp testing site: http://gskinner.com/RegExr/?2tntr
The point of the regexp is to go through a site's HTML and find all of the links. It should then return them in an array for me to manipulate.
On the regexp testing site it works perfectly, but when put in action with JavaScript on my site it returns null.
JavaScript looks like this:
var data = $('#mainDivOnMiddleOfPage').html();
var pattern = "<(a*).*href=.*>.*</a>";
var modi = "g";
var patt = new RegExp(pattern, modi);
var result = patt.exec(data);
jQuery gets the content of the page. This is tested and verified.
The question is: why does this return null in JavaScript when the regexp tester returns what I expect?

All <a> links:
<a[^>]*?\bhref=['\"](.*?)['\"]
Absolute links only (starting with http):
<a[^>]*?\bhref=['\"](http.*?)['\"]
JavaScript code:
var html = '<a href="test.html">';
var m = html.match(/<a[^>]*?\bhref=['"](.*?)['"]/);
console.log(m[1]); // test.html

I use the following code to do the same thing, and it works for me; try it out:
var data = document.getElementById('mainDivOnMiddleOfPage').innerHTML; // innerHTML keeps the tags; textContent would strip them
var result = data.match(/<(a*).*href=.*>.*<\/a>/);

I'll post this here, since I think it's what you want; it is not a regex solution, however.
$(function(){
$.ajax({
url: "test.htm",
success: function(data){
var array_of_links = $.makeArray($("a",data));
// do your stuff here
}
});
});

I'm aware an answer has already been accepted. However, it's worth mentioning that the current regex solutions match the tags, not the actual hrefs in isolation.
This is where JavaScript falls down: when the global g flag is specified, String.prototype.match returns only the full matches and drops the capture groups.
One way around this is to exploit the callback form of replace(). This collects just the link hrefs, not the tags.
var html = document.body.innerHTML,
links = [];
html.replace(/<a[^>]*?href=('|")(.*?)\1/gi, function($0, $1, $2) {
links.push($2);
});
//links is now an array of hrefs
It also uses a back-reference to close the href attribute, i.e. it makes sure the opening and closing quotes are both single or both double, not mixed.
Side note: as others have mentioned, where possible you'd want to use the DOM rather than regex.
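In current JavaScript engines (ES2020 and later), String.prototype.matchAll returns every match of a global regex together with its capture groups, so the replace() workaround is no longer needed. A sketch using the same back-reference idea on a hard-coded HTML string (the sample markup is made up for illustration; on a live page you would pass document.body.innerHTML):

```javascript
// Sample markup standing in for document.body.innerHTML.
const html =
  '<p><a href="test.html">Test</a> <a class="ext" href=\'https://example.com/\'>Example</a></p>';

// Group 1 is the opening quote, \1 requires the same quote to close, group 2 is the href.
const hrefPattern = /<a\b[^>]*?\bhref=(['"])(.*?)\1/gi;

// matchAll requires the g flag and yields one match array (with groups) per link.
const links = Array.from(html.matchAll(hrefPattern), (match) => match[2]);

console.log(links.join(" ")); // test.html https://example.com/
```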

"The point of the regexp is to go through a sites HTML and find all of the links. It should then return these in an Array for me to manipulate."
I won't add another regex answer, but I want to point out that if you have hold of the document (not just the HTML), it's easier to walk through the links collection. It contains every <a> element that has an href attribute, and also every <area> element that has one:
for (var link, links = document.links, n = links.length, i=0; i<n; i++){
link = links[i];
switch (link.tagName){
case "A":
//do something with the link
break;
case "AREA":
//do something with the area.
break;
}
}

Your problem is that you are not compiling your regex:
patt.compile();
You have to call it before using the regex with the exec() method.

Add body #id based on url

Need help! I've been looking for a solution to this seemingly simple task but can't find an exact one. I'm trying to add a custom id to the <body> tag based on the page's URL. The script I'm using works when the URLs look like these:
- http://localhost.com/index.html
- http://localhost.com/page1.html
- http://localhost.com/page2.html
-> at this level, <body> gets ids like #index, #page1, #page2, etc.
My question is: how can I keep the body id as #page1 or #page2 even when viewing subpages like these?
- http://localhost.com/page1/subpage1
- http://localhost.com/page2/subpage2
Here's the JS code I'm using (found online):
$(document).ready(function() {
var pathname = window.location.pathname;
var getLast = pathname.match(/.*\/(.*)$/)[1];
var truePath = getLast.replace(".html","");
if(truePath === "") {
$("body").attr("id","index");
}
else {
$("body").attr("id",truePath);
}
});
Thanks in advance!
edit: Thanks for all the replies! Basically, I just want to put custom background images on every page based on its body#id. (JS noob here.)
http://localhost.com/page2/subpage2 -> my only problem is how to make the id #page2, not #subpage2, for this link.

Using the JavaScript split function might help here. For example (untested, but the general idea):
var url = window.location.href.replace(/http[s]?:\/\//, '').replace('.html', '');
var segments = url.split('/').slice(1); // drop the host name
$('body').attr('id', segments[0]);
Also, you might want to consider using classes instead of IDs. That way you could assign every path segment as a class...
var url = window.location.href.replace(/http[s]?:\/\//, '').replace('.html', '');
var segments = url.split('/').slice(1); // drop the host name
for (var i = 0; i < segments.length; i++) {
$('body').addClass(segments[i]);
}
EDIT:
Glad it worked. A couple of notes if you're planning on using this for real: if you ever have an extension other than .html, it will get picked up in the class name. You can account for this by changing that replace to a regex...
var url = window.location.href.replace(/http[s]?:\/\//, '');
// Trim extension
url = url.replace(/\.(htm[l]?|asp[x]?|php|jsp)$/,'');
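If there will ever be query strings on the URL, you'll want to filter those out too, before trimming the extension, since the extension regex only matches at the end of the string (this is the one regex I'm not 100% sure about)...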
url = url.replace(/\?.+$/,'');
Also, it's a bit inefficient to call $('body') on every iteration of the for loop, because jQuery has to find the body element again each time. A more performant way, especially if the subfolders end up 2 or 3 levels deep, is to find it once and cache it in a variable, like so:
var $body = $('body');
for (var i = 0; i < segments.length; i++) {
$body.addClass(segments[i]);
}
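For the original goal (an id of page2 on /page2/subpage2), these ideas can be combined without stripping the host by hand: the URL class already separates the path from the host, query string and hash. A minimal sketch, assuming the id should come from the first path segment and that the site root should map to index as in the question's script; bodyIdFromUrl is a hypothetical name:

```javascript
// Returns the first path segment without its file extension, or "index" for the site root.
function bodyIdFromUrl(href) {
  // pathname excludes the host, query string and hash, e.g. "/page2/subpage2"
  const firstSegment = new URL(href).pathname.split("/")[1] || "";
  const withoutExtension = firstSegment.replace(/\.(html?|aspx?|php|jsp)$/, "");
  return withoutExtension === "" ? "index" : withoutExtension;
}

console.log(bodyIdFromUrl("http://localhost.com/page1.html"));     // page1
console.log(bodyIdFromUrl("http://localhost.com/page2/subpage2")); // page2
```

On the page itself this would be used as document.body.id = bodyIdFromUrl(window.location.href);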

Your regex is only going to select the last part of the url.
var getLast = pathname.match(/.*\/(.*)$/)[1];
You're matching anything (.*), followed by a slash, followed by anything (this time capturing it), and then pulling out the first capture group, which is the only one. Because the first .* is greedy, the slash it stops at is the last one in the path.
If you really want to do this (and I have my doubts; this seems like a bad idea), you could just use window.location.pathname, since it already contains the full path.
edit: You really shouldn't need to do this, because the URL of the page is already a unique identifier. I can't think of any situation where you'd need a unique id attribute on the body element of a page. Any time you're dealing with that content (either from client-side JavaScript or from a scraper), you already have a unique identifier: the URL.
What are you actually trying to do?

Try the following. It sets the id to whatever folder or file name appears right after the domain, without the file extension.
$(document).ready(function() {
$("body").attr("id", window.location.pathname.split("/")[1].split(".")[0]);
});

You want to get the first part of the path instead of the last:
var getFirst = pathname.match(/^\/([^\/]*)/)[1];

If your pages all share a common name as in your example ("page"), you could modify your script, including changing the match pattern to include that part:
var getLast = pathname.match(/\/(page\d+)/)[1];
The above would match "page" followed by one or more digits (which also leaves out the .html ending).
