JavaScript regex: how to match XPaths

I am creating a regex for matching XPaths generated by Firebug. Can someone help me with that? Example XPaths are:
.//*[#id='tab-HOME']/li[2]/span/span[1]/span/span[2]/span[2]/span/span
.//*[#id='any_possible_id']/span/span[2]/span/span
Keeping in mind the names allowed for ids in JavaScript, what could the regex be? I want to match
.//*[#id='any_possible_id']/li
Here is what I tried:
alert(/^\.\/\/\*\[[#id=]*\]/.test(xpath));
It is certainly incomplete.

Use [^\]]+ after the id to match everything up to the next ], which closes the [#id=...] predicate:
alert(/^\.\/\/\*\[#id=[^\]]+\]\/li/.test(xpath));
If you do not want to restrict the match to paths that continue with /li, remove the li from the regex.
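As a quick check, the regex from the answer can be run against the sample paths from the question. This is only a sketch: the variable names are made up for illustration, and the paths are copied as they appear above.

```javascript
// Regex from the answer above: anchored at the start, then .//*[#id=...]/li
const liPattern = /^\.\/\/\*\[#id=[^\]]+\]\/li/;

// Sample paths taken from the question.
const paths = [
  ".//*[#id='tab-HOME']/li[2]/span/span[1]/span/span[2]/span[2]/span/span",
  ".//*[#id='any_possible_id']/span/span[2]/span/span",
  ".//*[#id='any_possible_id']/li",
];

// true where the path continues with /li right after the id predicate
const results = paths.map((p) => liPattern.test(p));
console.log(results); // [ true, false, true ]
```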

Related

How to append string after matching field with regex

I want to append a word after the <body> tag. It should not modify or replace anything else, just append a word. I have done something like this. Is it valid? Will the empty parentheses for the second capture group match everything?
page.replace(/(<body[^>]*>)()/, `$1${my_variable}$2`)
The second capture group, designed to capture nothing, will match "nothing": it forms an empty match immediately after the closing > of your body tag. There's nothing wrong with doing this in the regex, though you might want to be wary of using [^>]*. This negated character class will gladly match across lines and grab as much input as it can. Handy for matching multi-line tags, but often very dangerous.
Also, if you're on Linux and for some reason have > symbols in filenames (which is valid!), your regex will break horribly.
That being said, valid regex or not, it's usually a bad idea to use regex with html, since HTML isn't a regular language. Also, you could accidentally summon Cthulhu.
let page = "<html><body>Some info</body></html>";
// replace() returns a new string, so assign the result
page = page.replace("<body>", `<body>${my_variable}`);
or
page = page.replace(/<body>|<BODY>/, `<body>${my_variable}`);
If you are in the browser, you can also use document.querySelector("body").innerHTML.
Also, depending on which framework you're using, there are better ways to accomplish this.
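If a regex is used anyway, one possible variation is to pass a function as the replacement, so that characters such as $1 inside the inserted text are not interpreted as replacement patterns. The sketch below assumes a single <body> tag, possibly with attributes; insertAfterBodyTag is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: inserts `text` right after the opening <body ...> tag.
// A function replacement is used so "$1", "$&" etc. in `text` stay literal.
function insertAfterBodyTag(html, text) {
  return html.replace(/<body\b[^>]*>/i, (tag) => tag + text);
}

const page = '<html><body class="x">Some info</body></html>';
const updated = insertAfterBodyTag(page, "Hi $1 ");
console.log(updated);
// <html><body class="x">Hi $1 Some info</body></html>
```

The caveats from the answer still apply: [^>]* will stop early if an attribute value contains a > character.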

javascript regex for matching attributes in HTML string

Can anyone look at my regex in javascript and suggest a correct one?
I'm trying to select attribute (name/value) pairs in an HTML/XML string like the following:
<unknowncustom:tag attrib1="XX' XX'" attrib2='YY" YY"' attrib3=ZZ""'>/unknowncustom:tag>
SOME TEXT that is not part of any tag and should not be selected, name='XX', y='ee';
<custom:tag attrib1="XX' XX'" attrib2='YY" YY"' attrib3=ZZ""'>/custom:tag>
I found many solutions, but none seem foolproof (including this one: Regular expression for extracting tag attributes).
My current regex selects the first attribute pair, but I can't figure out how to make it select all matching attributes. Here is the regex:
/<\w*:?\w*\s+(?:((\w*)\s*=\s*((?:(?:"[^"]*")|(?:'[^']*')|[^>\s]+))))[^>]*>/gim
Thanks
Let's have a go:
/(\w+)\s*=\s*((["'])(.*?)\3|([^>\s]*)(?=\s|\/>))(?=[^<]*>)/g
Regex is not ideal for this. If your attribute values contain unescaped angle brackets (< or >), it probably will not work.
Proof: http://regex101.com/r/dD4uT4
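To collect every pair rather than only the first, the answer's regex can be applied with String.prototype.matchAll. The following is a minimal sketch on a simplified input (plain quoted values, not the tricky mixed-quote values from the question); the variable names are illustrative.

```javascript
// Regex from the answer above, used with the global flag.
const attrPattern = /(\w+)\s*=\s*((["'])(.*?)\3|([^>\s]*)(?=\s|\/>))(?=[^<]*>)/g;

// Simplified sample: two quoted attributes, plus text outside the tag
// that looks like an attribute and should be ignored.
const html = "<custom:tag attrib1=\"XX\" attrib2='YY'>text name='ZZ'</custom:tag>";

// Group 1 is the name; group 4 is a quoted value, group 5 an unquoted one.
const pairs = [...html.matchAll(attrPattern)].map((m) => [
  m[1],
  m[4] !== undefined ? m[4] : m[5],
]);

console.log(pairs); // [ [ 'attrib1', 'XX' ], [ 'attrib2', 'YY' ] ]
```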

return substring match

I am trying to get just a part of a string with a regex.
This is the string I am testing:
class1 container _box _box_CEC493
The string is a series of classes applied to an element.
What I would like to get is just CEC493, which changes because the regex will be applied to a number of different elements (and therefore to strings like the one above).
The regex I am using now is
/\s_box_([0-9a-zA-Z]+)/
which returns
_box_CEC493, CEC493
How can I modify it in order to get just the second value (CEC493)?
Thank you
You could probably just split the string:
var str = "class1 container _box _box_CEC493";
var match = str.split('_').pop();
alert(match);
The standard way regexes come back is like this:
[0]: Whole result
[1]: First parentheses capture group
etc
So the usual way to access the captured value is result[1]. Does that cause any issues in your case?
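Applied to the string and regex from the question, the indices look like this (a small sketch; the variable names are illustrative):

```javascript
const str = "class1 container _box _box_CEC493";
const result = str.match(/\s_box_([0-9a-zA-Z]+)/);

const whole = result[0]; // the whole match, including the leading space
const code = result[1];  // the first capture group

console.log(JSON.stringify(whole)); // " _box_CEC493"
console.log(code);                  // CEC493
```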
[updated]
Instead of selecting all characters, select until an unwanted character. Since you are selecting from a number of classes, the _box_... class may appear on its own without a space before it, so don't put a space at the beginning of your regex.
str.match(/_box_([^\s]*)/)[1]
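Note that indexing [1] directly throws when there is no match, because match() returns null. A guarded version could look like the sketch below; extractBoxCode is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the part after "_box_" or null if absent.
function extractBoxCode(classNames) {
  const m = classNames.match(/_box_([^\s]*)/);
  return m ? m[1] : null;
}

console.log(extractBoxCode("class1 container _box _box_CEC493")); // CEC493
console.log(extractBoxCode("_box_CEC493"));                       // CEC493
console.log(extractBoxCode("class1 _box"));                       // null
```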

Extracting both the full match, and the last token match in a regexp

I have a little interesting issue here. I have a plaintext URL coming from Excel and I need to change it to an HTML URL with a unique body. Here is the regex code for javascript:
text = text.toString().replace(/=hyperlink\(([#\\\w\s\(\)-\.\/]+)\)/g, "<a href='file:///$1'>$1</a>");
This works perfectly fine for what it does. Example, text is:
=hyperlink("\\share\folder\log\2013\13-05-13\13-05-13.txt")
regex turns it into
\\share\folder\log\2013\13-05-13\13-05-13.txt
However, I need the inner HTML to be just the text file name:
13-05-13.txt
To further complicate the matter, the original text the regex is going through is not a single occurrence. It is an entire spreadsheet with 100's of rows that contain this. So the regex will be matching and replacing 100's of these strings in one operation.
Hopefully it is possible to get this all done in one regexp on the entire string, but I suppose I could loop through each line of the string first...
If there is no way to do this with one regex engine, what do you think the best approach is? (no PHP/Python/Server side. Just Javascript, HTML, Jquery, etc).
I guess you could use this regex:
=hyperlink\("([#\\\w\s\(\)\-\.\/]+\\([^"]+))"\)
And this new replace:
<a href="file:///$1">$2</a>
I'm not sure how your regex was working, but I added the quotes in the regex and replaced the single quotes by double quotes in the replace. Revert those if need be.
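Put together, the replacement might look like the sketch below. It assumes the replacement string is the question's anchor tag with double quotes and $2 as the link text, as the answer describes; the sample row is the one from the question, written as a JavaScript string literal with escaped backslashes.

```javascript
// Regex from the answer: group 1 is the full path, group 2 the file name
// (everything after the last backslash).
const linkPattern = /=hyperlink\("([#\\\w\s\(\)\-\.\/]+\\([^"]+))"\)/g;

// Sample row from the question.
const text = '=hyperlink("\\\\share\\folder\\log\\2013\\13-05-13\\13-05-13.txt")';

// Full path in href, file name only as the link text.
const html = text.replace(linkPattern, '<a href="file:///$1">$2</a>');

console.log(html);
// <a href="file:///\\share\folder\log\2013\13-05-13\13-05-13.txt">13-05-13.txt</a>
```

Because of the g flag, the same call handles every matching row in a multi-line string in one pass.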

match text between two html custom tags but not other custom tags

I have something like the following;-
<--customMarker>Test1<--/customMarker>
<--customMarker key='myKEY'>Test2<--/customMarker>
<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>
I need to be able to replace text between the customMarker tags, I tried the following;-
str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, 'item Replaced')
which works ok. I would like to also ignore custom inner tags and not match or replace them with text.
Also, I need a separate expression to extract the value of the attribute key='myKEY' from the tag with Test2.
Many thanks
EDIT
Actually, I am trying to find things between comment tags, but the comment tags were not displaying correctly, so I had to remove the '!'. There's a unique situation that required comment tags. In any case, if anyone knows enough regex to help, it would be great. Thank you.
In the end, I did something like the following, in case anyone else needs it. Note that word about town is that using regex with HTML tags is not ideal, so do your own research and make up your own mind. For me, it had to be done this way, mostly because I wanted to, but also because it simplified the job in this instance:
var retVal = str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, function (token, match) {
    // Question 1: ignore custom inner tags and do not match or replace them with text.
    // Answer:
    var replacePattern = /<--customInnerMarker*?(.*?)<--\/customInnerMarker-->/g;
    // Remove inner tags from the captured text.
    match = $.trim(match.replace(replacePattern, ''));
    // Question 2: extract the value of the attribute key='myKEY' from the tag with Test2.
    // Answer (moved above the return; it was unreachable after it).
    // Note: this pattern only recognizes double-quoted values.
    var attrPattern = /\w+\s*=\s*".*?"/g;
    var attrMatches = token.match(attrPattern); // array of name/value pairs, or null if none
    // Replace what is left with the required value and return it.
    return token.replace(match, objParams[match]);
});
Can't you use <customMarker> instead? Then you can just use getElementsByTagName('customMarker') and get the inner text and child elements from it.
A regex merely matches an item. Once you have that match, it is up to you what you do with it. This is part of the problem most people have with regular expressions: they try to combine the three different steps into one. The regex match is just the first step.
What you are asking for will not be possible with a single regex. You're going to need a mini state machine if you want to use regular expressions. That is, a logic wrapper around the matches such that it moves through each logical portion.
I would advise you to look in the standard API for a prebuilt engine to parse HTML, rather than rolling your own. If you do need to roll your own, read the flex manual to get a basic understanding of how regular expressions work and of the state machines you build with them. The best example would be the section on matching multi-line C comments.
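As a rough illustration of the "logic wrapper around the matches" idea, the sketch below splits the work into separate steps: find an outer marker, strip inner markers from its body, read the key attribute, then decide on the replacement. It assumes the marker syntax exactly as written in the question (without the '!') and non-nested outer markers; replaceMarkers and its callback are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: `replacer(text, key)` decides what each marker becomes.
function replaceMarkers(str, replacer) {
  // Step 1: match one outer marker; group 1 = attributes, group 2 = body.
  const outer = /<--customMarker([^>]*)>(.*?)<--\/customMarker>/g;
  // Step 2: pattern for inner markers to drop from the body.
  const inner = /<--customInnerMarker[^>]*>.*?<--\/customInnerMarker>/g;
  // Step 3: pattern for the key attribute, single- or double-quoted.
  const keyAttr = /\bkey\s*=\s*(["'])(.*?)\1/;

  return str.replace(outer, (whole, attrs, body) => {
    const text = body.replace(inner, "").trim();
    const keyMatch = attrs.match(keyAttr);
    const key = keyMatch ? keyMatch[2] : null;
    return replacer(text, key);
  });
}

const input = [
  "<--customMarker>Test1<--/customMarker>",
  "<--customMarker key='myKEY'>Test2<--/customMarker>",
  "<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>",
].join("\n");

const output = replaceMarkers(input, (text, key) =>
  "[" + (key === null ? "none" : key) + ":" + text + "]"
);
console.log(output);
// [none:Test1]
// [myKEY:Test2]
// [none:Test3]
```

Markers that span several lines or nest inside each other are outside what this sketch handles, which is where a real parser becomes the better option.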
