I would like to get all the links from a page that match a specific pattern.
I am trying to do this with the Regular Expression Extractor post-processor and a regex like this: <a[^>]* href="([^"]*)".
I checked the response of the GET request and found that those links are not visible in the response; they are only visible in the browser when the mouse is over the text.
Don't use regular expressions to parse HTML data.
I would recommend going for CSS/JQuery Extractor instead, the relevant configuration would be something like:
Reference Name: anything meaningful, e.g. link
CSS/JQuery Expression: a
Attribute: href
Match No: -1
You will be able to see all the extracted link URLs using the Debug Sampler and View Results Tree listener combination. See the article How to Use the CSS/JQuery Extractor in JMeter for more details.
In general, the text shown when the mouse is over a link comes from the title attribute.
So in your case, if title comes directly after href, you can use the following (group 2 captures the mouse-over text):
<a[^>]* href="([^"]*)" title="([^"]*)"
and change Template to use second group as $2$
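To sanity-check what the two groups capture before putting the pattern into JMeter, it can be tried outside JMeter first. The following is only a minimal JavaScript sketch, not JMeter configuration; the sample markup and the variable names are made up for illustration.

```javascript
// Hypothetical sample markup; the real page will differ.
const html =
  '<a class="nav" href="/docs/page1" title="Page one">One</a> ' +
  '<a href="/docs/page2" title="Page two">Two</a>';

// Same pattern as above: group 1 is the href, group 2 is the title.
const linkRegex = /<a[^>]* href="([^"]*)" title="([^"]*)"/g;

// matchAll yields one match object per link; m[1] and m[2] are the groups.
const links = [...html.matchAll(linkRegex)].map((m) => ({
  href: m[1],
  title: m[2],
}));

console.log(links);
```

Note that the pattern only matches when title follows href immediately, separated by a single space.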
I am trying to do some HTML scraping with JavaScript, and would like to take the a href link and turn it into a hyperlink in a Discord embed. I am having trouble with regex; I am finding it very difficult to learn.
I assume I will also need another regex to capture it all so I can replace it with my desired target?
This is an example raw html that I have:
An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a>
To make this readable within a Discord embed, I am looking for the following output:
An **example**, also known as a [**example type**](https://www.example.com/example%20type)
I have tried extracting the URL via regex, and I can match it. However, I am having issues with extracting both the link and the link text (I think it's called the target? The 'example type' in the example) and then replacing the string with my desired output.
I have the following: (https://regexr.com/73574)
/href="[^"]+/g
This matches href="https://www.example.com/example%20type, and feels like a very early step, it includes 'href' in the match, and it does not capture the target.
EDIT:
I apologise, I did not think about additional cases. What if the string has multiple links, with text after them? For example:
An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a> is the first example, and now I have <a href="https://www.example.com/second">second</a> example
with a desired output of:
An **example**, also known as a [**example type**](https://www.example.com/example%20type) is the first example, and now I have [**second**](https://www.example.com/second) example
Try this: (?<=href=")[^"]*
By using a lookbehind, you can verify that the text before the match is href=" without including it in the match.
Demo: https://regex101.com/r/2qMnPt/1
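A minimal sketch of the lookbehind in JavaScript, using the example string from the question. Lookbehind needs an engine that supports ES2018 regex features (current Node.js and browsers do); the variable names are only for illustration.

```javascript
const html =
  'An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a>';

// (?<=href=") asserts that href=" sits right before the match
// without making it part of the match.
const urlRegex = /(?<=href=")[^"]*/g;

// match() returns null when nothing matches, so fall back to an empty array.
const urls = html.match(urlRegex) || [];

console.log(urls);
```

This extracts only the URL; the link text still needs a capturing group or a second pattern.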
You can use regular expression groups to capture things that interest you. My regular expression here might be far from perfect but I don't think that's important here - it shows you a way and you can always improve it if needed.
Things you have to do:
prepare regex that captures groups that you need (anchor tag, anchor text, anchor url),
remove the anchor tag completely from the text
inject anchor text and anchor href into the final string
Here's a quick code example of that:
const anchorRegex = /(<a\shref="([^"]+)">(.+?)<\/a>)/i;
const textToBeParsed = `An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a>`;
const parseText = (text) => {
  // Run the regex against the function argument, not the outer variable
  const matches = anchorRegex.exec(text);
  if (!matches) {
    console.warn("Something went wrong...");
    return;
  }
  // matches[0] is the whole match; groups 1-3 are the tag, the URL and the text
  const [, fullAnchorTag, anchorUrl, anchorText] = matches;
  const textWithoutAnchorTag = text.replace(fullAnchorTag, '');
  return `${textWithoutAnchorTag}[**${anchorText}**](${anchorUrl})`;
};
console.log(parseText(textToBeParsed));
Solution:
const input = 'An **example**, also known as a example type first and second here no u and then done noice';
const output = input.replace(/<a href="([^"]+)">([^<]+)<\/a>/g, '[**$2**]($1)')
console.log(output);
Regex breakdown:
<a href=" - Matches the literal start of the opening HTML tag, up to and including the opening double quote
([^"]+) - This is a capturing group, matches a number of characters that are not double quotes
"> - Matches the closing double quotes, including the closing tag '>'
([^<]+) - Another capturing group, matches a number of characters that are not a less than symbol
<\/a> - Matches the closing HTML tag
I then use the replace method, as seen in my output variable.
replace takes two arguments (regex, replaceWith).
The first argument is the regex. The second argument, [**$2**]($1), refers to the capturing groups in the regex: $1 is the link inside the HTML tag, and $2 is the link text (the name after the link; in my input variable, the first one you see is 'example type').
The only essential parts of the second argument are $2 and $1; the rest is there because I wanted to display them in a certain way: [**target**](link).
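As a worked check, here is the same pattern applied to the two-link example from the question's EDIT. This sketch uses a replacer function instead of the $1/$2 string so the groups get names; toDiscordMarkdown is a hypothetical helper name, not something from the question.

```javascript
const input =
  'An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a>' +
  ' is the first example, and now I have <a href="https://www.example.com/second">second</a> example';

// Hypothetical helper: the callback receives the whole match, then group 1, then group 2.
const toDiscordMarkdown = (text) =>
  text.replace(
    /<a href="([^"]+)">([^<]+)<\/a>/g,
    (match, url, label) => `[**${label}**](${url})`
  );

const output = toDiscordMarkdown(input);
console.log(output);
```

Because of the g flag, every anchor in the string is replaced in place, and the text around the links is left untouched.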
I am currently writing a code snippet that automatically links certain keywords and saves the links it created in an array called linked. I do this last step to prevent a word from being linked twice.
Now the user is writing in a textbox, and when they write a keyword, it gets linked. That works fine. My problem is handling the situation when they delete text from the textbox. This means I have to match all links in the text against the linked array and then remove from the linked array those that aren't in the text anymore. So far the theory. Unfortunately I am stuck with the following error.
Assume we have a text like this:
Test <a href='link1'>Link1</a> <a href='link2'>Link2</a>
I use this regEx (/href='([^\'\"]+)'/g) to get all the hrefs in the text above like so:
var hrefs = $(textInput).val().match(/href='([^\'\"]+)'/g);
This gives me an array that contains the following:
href='link1'
href='link2'
If I start deleting text and end up with something like this:
Test <a href='link1'>Link1</a> <a href='link2
Notice the one ' that is gone, the whole regEx turns out undefined, even though there still is a link in the string. Since I am not an expert with regEx I can't see exactly why? Is there maybe a better regEx for this situation?
You can simplify your regex like this:
/href='[^']+'/g
Demo
http://regex101.com/r/tU2qL0
Use this regex /href=('|")\w+('|")/g like this:
var hrefs = $(textInput).val().match(/href=('|")\w+('|")/g);
This should give you the matches.
BTW, match() is correct. Don't do exec() as @tenub said
Mark it as answer if it helps :)
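One thing to keep in mind with match(): it returns null, not an empty array, when nothing matches, so the result has to be guarded before it is used. The following is a minimal sketch of the pruning step described in the question, assuming linked stores the bare href values; extractHrefs and pruneLinked are hypothetical helper names.

```javascript
// Hypothetical helper: collect the href values currently present in the text.
function extractHrefs(text) {
  const hrefRegex = /href='([^'"]+)'/g;
  // matchAll never returns null; with no matches the array is simply empty.
  return Array.from(text.matchAll(hrefRegex), (m) => m[1]);
}

// Hypothetical helper: keep only the entries of `linked` that still appear in the text.
function pruneLinked(linked, text) {
  const present = new Set(extractHrefs(text));
  return linked.filter((href) => present.has(href));
}

// The half-deleted text from the question: the second link has lost its closing quote.
const text = "Test <a href='link1'>Link1</a> <a href='link2";
const linked = pruneLinked(['link1', 'link2'], text);

console.log(linked);
```

Capturing the value in a group also avoids having to strip href=' and the closing quote from each match afterwards.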
Given a regex like this:
http://rubular.com/r/ai1LFT5jvK
I want to use string.replace to replace "subdir" with a string of my choosing.
Doing myStr.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/,otherStr)
only returns the same string, as shown here: http://jsfiddle.net/nLmbV/
If you view the Rubular, it appears to capture what I want it to capture, but on the Fiddle, it doesn't replace it.
I'd like to know why this happens, and what I'm doing wrong. A correct regex or a correct implementation of the replace call would be nice, but most of all, I want to understand what I'm doing wrong so that I can avoid it in the future.
EDIT
I've updated the fiddle to change my regex from:
/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/
to
/^.*\/\/.*\.net\/.*\/(.*)\/.*$/
And according to the fiddle, it just returns hello instead of https://xxxxxxxxxxx.cloudfront.net/dir/hello/Slide1_v2.PNG
It's that little \z in your regex.
You probably forgot to replace it with a $ sign. JavaScript uses ^ and $ as anchors and does not support Ruby's \A and \z; in a JavaScript regex without the u flag, \z simply matches a literal z.
To answer your edit:
The match is always replaced as a whole. You'll want to group both the left side and the right side of the to-be-replaced part and reinsert it in the replacement:
url.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/,"$1hello$2")
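The three variants side by side, as a small sketch. The input URL with subdir in it is an assumption reconstructed from the question, which only shows the desired result.

```javascript
// Assumed input: the question wants "subdir" replaced by "hello".
const url = 'https://xxxxxxxxxxx.cloudfront.net/dir/subdir/Slide1_v2.PNG';

// 1. Ruby-style \z: in JavaScript this looks for a literal "z", finds none,
//    so nothing matches and replace returns the string unchanged.
const withRubyAnchor = url.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/, 'hello');

// 2. With $ the regex matches the entire URL, and the entire match is replaced.
const wholeMatchReplaced = url.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*$/, 'hello');

// 3. Capture what should be kept and put it back around the new value.
const fixed = url.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/, '$1hello$2');

console.log(withRubyAnchor);
console.log(wholeMatchReplaced);
console.log(fixed);
```

A capturing group on its own does not limit what replace swaps out; it only makes that part of the match available as $1, $2 and so on.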
Before I get marked down, I know the question asks about regexp. The reason for this answer is that URLs are nearly impossible to process reliably with a regexp without writing fiendishly complex regexps. It can be done, but it makes your head hurt!
If you are doing this in a browser, you can use an A tag in your script to make things much simpler. The A tag knows how to parse them into pieces, and it lets you modify the pieces independently, so you only need to deal with the pathname:
//make a temporary a tag
var a = document.createElement('a');
//set the href property to the url you want to process
a.href = "scheme://host.domain/path/to/the/file?querystring"
//grab the path part of the url, and chop up into an array of directories
var dirs = a.pathname.split('/');
//set 2nd dir name - array is ['','path','to','the','file']
dirs[2]='hello';
//put the path back together
a.pathname = dirs.join('/');
a.href now contains the URL you want.
More lines, but also more hair left when you come back to change the code later.
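Outside a browser page (for example in Node.js) there is no document, but the standard URL class offers the same kind of parsing. This is a sketch of the same idea under that assumption; replacePathSegment is a hypothetical helper name, and the input URL with subdir is assumed from the question.

```javascript
// Hypothetical helper: replace one segment of a URL's path.
function replacePathSegment(href, index, newName) {
  const url = new URL(href);
  // pathname starts with "/", so element 0 of the split is an empty string
  const dirs = url.pathname.split('/');
  dirs[index] = newName;
  url.pathname = dirs.join('/');
  return url.href;
}

const result = replacePathSegment(
  'https://xxxxxxxxxxx.cloudfront.net/dir/subdir/Slide1_v2.PNG',
  2,
  'hello'
);

console.log(result);
```

As with the A tag, only the path is touched; the scheme, host and any query string are carried over as they were.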
I would like typeahead.js to behave like jqueryui autocomplete with regards to how it matches items. Using jqueryui autocomplete it's possible to search inside the text items. In typeahead it's only from the beginning of the string.
Autocomplete example: http://goo.gl/O43afF
Typeahead example: http://twitter.github.io/typeahead.js/examples/
With autocomplete, it seems like it's possible to assign a comparison function, but I haven't found anything like that in typeahead.
If I have a list that contains the item "Equestrian (Horses)" then I would like to get a match if I start writing "o".
Typeahead.js code as is will look for prefix matches, as you correctly say. There is a "trick" though: every datum may also contain a tokens element, which as the Typeahead documentation says is "a collection of strings that aid typeahead.js in matching datums with a given query".
The prefix matching is done against tokens. If you don't supply a tokens value for one of your datums, its value is tokenized (space-separated) for you. However, you could supply tokens to get what you want. For example, in your case you would supply a value of tokens that is all the unique substrings of all the words in your query string.
I suggest "all unique substrings of length >= 2", btw.
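A rough sketch of how such a tokens array could be generated. The { value, tokens } datum shape follows the description above; substringTokens is a hypothetical helper, and lowercasing plus splitting on non-word characters are assumptions about how you want matching to behave.

```javascript
// Hypothetical helper: all unique substrings (of at least minLength characters)
// of every word in the value.
function substringTokens(value, minLength = 2) {
  const tokens = new Set();
  // Assumption: match case-insensitively and treat punctuation as a separator.
  const words = value.toLowerCase().split(/\W+/).filter(Boolean);
  for (const word of words) {
    for (let start = 0; start < word.length; start++) {
      for (let end = start + minLength; end <= word.length; end++) {
        tokens.add(word.slice(start, end));
      }
    }
  }
  return [...tokens];
}

const datum = {
  value: 'Equestrian (Horses)',
  tokens: substringTokens('Equestrian (Horses)'),
};

console.log(datum.tokens.length, datum.tokens.includes('orses'));
```

With a minimum length of 2, a single typed "o" will not match yet, but "or" will; passing 1 as minLength would cover the single-character case at the cost of more tokens.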
typeahead's datasource is set via the 'source' parameter, so it's perfectly OK to place a function there instead of an array. Also note that it internally expects an array of strings, so you have to format everything as a string.
Take a look at this fiddle for an example
EDIT: this example now always generates values from Test 0 to Test 9, so you can of course only check by entering parts of "test"
I have something like the following;-
<--customMarker>Test1<--/customMarker>
<--customMarker key='myKEY'>Test2<--/customMarker>
<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>
I need to be able to replace text between the customMarker tags, I tried the following;-
str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, 'item Replaced')
which works ok. I would like to also ignore custom inner tags and not match or replace them with text.
Also I need a separate expression to extract the value of the attribute key='myKEY' from the tag with Test2.
Many thanks
EDIT
Actually, I am trying to find things between comment tags, but the comment tags were not displaying correctly, so I had to remove the '!'. There's a unique situation that required comment tags... In any case, if anyone knows enough regex to help, that would be great. Thank you.
In the end, I did something like the following, in case anyone else needs this. Enjoy! But note: word about town is that using regex with HTML tags is not ideal, so do your own research and make up your own mind. For me, it had to be done this way, mostly because I wanted to, but also because it simplified the job in this instance:
var retVal = str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, function(token, match){
    //question 2: Also I need a separate expression to extract the value of the attribute key='myKEY' from the tag with Test2.
    //answer (this has to run before the return below, otherwise it is never reached):
    var attrPattern = /\w+\s*=\s*(['"]).*?\1/g; //accepts single or double quotes
    var attrMatches = token.match(attrPattern); //array of name/value pairs, or null if there are none
    //question 1: I would like to also ignore custom inner tags and not match or replace them with text.
    //answer:
    var replacePattern = /<--customInnerMarker*?(.*?)<--\/customInnerMarker-->/g;
    //remove inner tags from match
    match = $.trim(match.replace(replacePattern, ''));
    //replace and return what is left with a required value
    return token.replace(match, objParams[match]);
})
Can't you use <customMarker> instead? Then you can just use getElementsByTagName('customMarker') and get the inner text and child elements from it.
A regex merely matches an item. Once you have said match, it is up to you what you do with it. This is part of the problem most people have with using regular expressions, they try and combine the three different steps. The regex match is just the first step.
What you are asking for will not be possible with a single regex. You're going to need a mini state machine if you want to use regular expressions. That is, a logic wrapper around the matches such that it moves through each logical portion.
I would advise you to look in the standard API for a prebuilt engine to parse HTML rather than rolling your own. If you do need to roll your own, read the flex manual to get a basic understanding of how regular expressions work and of the state machines you build with them. The best example would be the section on matching multiline C comments.
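To make the "logic wrapper around the matches" idea concrete, here is a minimal sketch for the marker format shown in the question. The regex only finds the opening and closing markers; the loop keeps the state. topLevelMarkerText is a hypothetical helper, and the sketch assumes the markers are well nested.

```javascript
// Hypothetical helper: returns the text directly inside each <--customMarker>
// block, skipping anything inside <--customInnerMarker>.
function topLevelMarkerText(input) {
  // Matches any opening or closing marker, with optional attributes.
  const markerRegex = /<--(\/?)(customMarker|customInnerMarker)\b[^>]*>/g;
  const results = [];
  let outerDepth = 0; // > 0 while inside <--customMarker>
  let innerDepth = 0; // > 0 while inside <--customInnerMarker>
  let buffer = '';
  let lastIndex = 0;
  let m;
  while ((m = markerRegex.exec(input)) !== null) {
    // Text between the previous marker and this one counts only at top level.
    if (outerDepth > 0 && innerDepth === 0) {
      buffer += input.slice(lastIndex, m.index);
    }
    const isClose = m[1] === '/';
    if (m[2] === 'customInnerMarker') {
      innerDepth += isClose ? -1 : 1;
    } else if (isClose) {
      outerDepth -= 1;
      if (outerDepth === 0) {
        results.push(buffer.trim());
        buffer = '';
      }
    } else {
      outerDepth += 1;
    }
    lastIndex = markerRegex.lastIndex;
  }
  return results;
}

const sample =
  "<--customMarker>Test1<--/customMarker>\n" +
  "<--customMarker key='myKEY'>Test2<--/customMarker>\n" +
  "<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>";

console.log(topLevelMarkerText(sample));
```

The same loop is where attribute extraction or replacement would go, since at each step it knows which marker it is in.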