regular expression to unlink html code with javascript

I'm sorry, I can't believe this question hasn't already been answered on Stack Overflow, but I've searched a lot and can't find a solution.
I want to change HTML code with regular expressions like this:
testing anchor
to
testing anchor
I only want to unlink the text without using DOM functions: the code is in a string, not in the document, and I don't want to remove any tags other than the a ones.

If you really don't want to use DOM functions (why?), you could do:
str = str.replace(/<[^>]*>/g, '')
You can use it if you're fairly confident your HTML isn't complex, but it will fail in many cases, for example with some nested tags or a > inside an attribute value. You might fix some of these problems with more complex regular expressions, but regular expressions aren't the right tool for this job in the general case.
If you don't want to remove tags other than a, do this:
str = str.replace(/<\/?a( [^>]*)?>/g, '')
This changes
<a>testing</a> <b>a</b>nchor<div>test</div><aaa>E</aaa>
to
testing <b>a</b>nchor<div>test</div><aaa>E</aaa>
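Both replacements can be wrapped in small helpers and checked outside the browser. This is a minimal sketch: the names stripAllTags and unlinkAnchors are made up for illustration, and unlinkAnchors uses \s instead of a literal space plus the i flag, so that <A HREF=...> or a tab before the attributes is also caught. The same caveats apply: no awareness of nesting, and a > inside an attribute value will still break it.

```javascript
// Removes every tag. Naive: breaks on ">" inside attribute values.
function stripAllTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Removes only opening/closing <a> tags (with any attributes), keeping the
// link text and leaving other tags such as <b>, <abbr> or <aaa> alone.
function unlinkAnchors(html) {
  return html.replace(/<\/?a(\s[^>]*)?>/gi, "");
}

const sample = "<a>testing</a> <b>a</b>nchor<div>test</div><aaa>E</aaa>";

console.log(unlinkAnchors(sample)); // testing <b>a</b>nchor<div>test</div><aaa>E</aaa>
console.log(stripAllTags(sample));  // testing anchortestE
console.log(unlinkAnchors('<A HREF="/x">link</A> <abbr>x</abbr>')); // link <abbr>x</abbr>
```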

I know you only want a regex, but for future viewers, here is a trivial solution using DOM methods:
var a = document.createElement("div");
a.innerHTML = 'testing anchor';
var wordsOnly = a.textContent || a.innerText;
This won't fail on complicated cases, handles nested tags, and makes it perfectly clear what's happening:
Hey browser! Create an element
Put that HTML in it
Give me back just the text, that's what I want now.
NOTE:
The element we're creating is never added to the actual DOM, since we don't append it anywhere, so it stays invisible. Here is a fiddle to illustrate how this works.

As has been mentioned, you cannot parse HTML with regular expressions. The principal reason is that HTML elements nest, and regular expressions cannot handle arbitrary nesting.
That said, with a few restrictions which I will mention, you can do the following:
string = string.replace(/(\b\w+)\s*<a\s+href="([^"]*)">(.*?)<\/a>/g, '$1 $3');
This requires a word before the tag (spacing between the word and the tag is optional, and the result always has exactly one space), no attributes other than href in the <a> tag, and it accepts anything between the <a> and the first </a>.
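To make the restrictions concrete, here is the pattern wrapped in a hypothetical helper, unlinkAfterWord. This sketch keeps \s* outside the first capturing group, so '$1 $3' always produces exactly one space, and uses a non-greedy (.*?) so that two links on the same line are handled as separate matches. The href values are made up.

```javascript
// Same restrictions as the answer: a word must precede the link and the
// <a> tag may carry only an href attribute in double quotes.
const LINK_AFTER_WORD = /(\b\w+)\s*<a\s+href="([^"]*)">(.*?)<\/a>/g;

function unlinkAfterWord(html) {
  // $1 = preceding word, $2 = href (dropped), $3 = link text.
  return html.replace(LINK_AFTER_WORD, "$1 $3");
}

console.log(unlinkAfterWord('testing <a href="/somewhere">anchor</a>')); // testing anchor
console.log(unlinkAfterWord('testing<a href="/x">anchor</a>'));          // testing anchor
console.log(unlinkAfterWord('see <a href="a">one</a> and <a href="b">two</a>')); // see one and two
```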

You can create a DOM object from the string and use DOM methods to parse it, without ever appending that object to the document.

Related

Scan the DOM/web page for a certain pattern and get the DOM tag

I'm writing a small Chrome extension that adds buttons at specific positions.
These positions are mostly random and can't be determined with normal CSS/jQuery selectors.
I need to scan the whole page for a certain text pattern (regex).
Once I find a match, I need to get the DOM element that contains the text.
I tried parsing the whole source with body.innerHTML, but then I can't get the element object afterwards.
Any ideas on how to accomplish such a task are highly appreciated!

Sounds like you could use :contains() for this.
$(":contains('Your Text')")
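To find text using a regular expression, use .filter() (note that $("*") tests every element, so ancestors such as html and body will match too, because their .text() contains the same text):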
var regex = new RegExp("Your Text");
$("*").filter(function () {
return regex.test($(this).text());
});

JavaScript regex to replace ampersands in all link hrefs on a page

I've been looking for an answer to this question that fits my needs, but either I'm too much of a noob to adapt other use cases, or they're not specific enough for my case.
Basically, I want to use JavaScript/jQuery to replace any and all ampersands (&) on a web page that occur in a link's href with just the word "and". I've tried a couple of different versions of this with no luck:
var link = $("a").attr('href');
link.replace(/&/g, "and");
Thank you

Your current code reads the href of the first matched element into a string and calls replace() on it, but replace() returns a new string (which you discard) and never updates the element(s) in the DOM.
You can instead achieve what you need by providing a function to attr() which will be executed against all elements in the matched set. Try this:
$("a").attr('href', function(i, value) {
return value.replace(/&/g, "and");
});
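The core of the fix is that String.prototype.replace never modifies the string it is called on; it returns a new one, which the original code threw away. The sketch below shows that in plain JavaScript, with a hypothetical list of href strings standing in for the values that .attr('href', fn) passes to its callback, one per element.

```javascript
const href = "/search?q=salt&pepper";
href.replace(/&/g, "and");               // result discarded: href is unchanged
const fixed = href.replace(/&/g, "and"); // keep the returned string instead

// Hypothetical hrefs; map() mirrors the per-element callback of .attr().
const hrefs = ["/a?x=1&y=2", "/b", "/c?p=q&r=s&t=u"];
const updated = hrefs.map((value) => value.replace(/&/g, "and"));

console.log(href);    // /search?q=salt&pepper
console.log(fixed);   // /search?q=saltandpepper
console.log(updated); // [ '/a?x=1andy=2', '/b', '/c?p=qandr=sandt=u' ]
```

Keep in mind that & is the query-string separator, so replacing it changes how any query parameters are read, as the output shows.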

Sometimes when replacing &, I've found that even after replacing it, I'm still left with amp;. Here is a fix for that:
var newUrl = "#Model.UrlToRedirect".replace(/&/gi, '%').replace(/%amp;/gi, '&');
With this solution you replace & twice, and it works. My particular problem was in an MVC app, with window.location.href = #Model.UrlToRedirect: the URL was already partially encoded and had a query string. I tried encoding/decoding, using the C# Uri class, escape(), everything, before coming up with this solution.
The problem with this logic is that any bare & (one that was not part of &amp;) also gets turned into %, which can break the query string later. One solution is to put a hidden field or input on the form like this:
<input type="hidden" value="#Model.UrlToRedirect" id="url-redirect" />
then in your javascript:
window.location.href = document.getElementById("url-redirect").value;
This way, JavaScript won't take the C# string and change it.
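If the only problem is that the URL arrives HTML-encoded (every & written as &amp;), a more direct alternative is to decode just that entity. This is a different technique from the answer's, shown side by side for comparison; the URLs are made up. Unlike turning every & into % first, it leaves bare & separators alone, as the last two lines show.

```javascript
// Decode only the &amp; entity; bare "&" separators are left untouched.
function decodeAmp(url) {
  return url.replace(/&amp;/gi, "&");
}

// The answer's two-step replace, for comparison.
function twoStep(url) {
  return url.replace(/&/gi, "%").replace(/%amp;/gi, "&");
}

const encoded = "/orders?id=7&amp;page=2";
const mixed = "/orders?id=7&amp;page=2&sort=asc";

console.log(decodeAmp(encoded)); // /orders?id=7&page=2
console.log(twoStep(encoded));   // /orders?id=7&page=2
console.log(decodeAmp(mixed));   // /orders?id=7&page=2&sort=asc
console.log(twoStep(mixed));     // /orders?id=7&page=2%sort=asc
```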

How can I separately retrieve the HTML that's before and after a child element inside a parent element?

We're writing a web app that relies on Javascript/jQuery. It involves users filling out individual words in a large block of text, kind of like Mad Libs. We've created a sort of HTML format that we use to write the large block of text, which we then manipulate with jQuery as the user fills it out.
Part of a block of text might look like this:
<span class="fillmeout">This is a test of the <span>NOUN</span> Broadcast System.</span>
Given that markup, I need to separately retrieve and manipulate the text before and after the inner <span>; we're calling those the "prefix" and "suffix".
I know that you can't parse HTML with simple string manipulation, but I tried anyway; I tried using split() on the <span> and </span> tags. It seemed simple enough. Unfortunately, Internet Explorer converts all HTML tags to uppercase, so that technique fails. I could write a special case, but the error has taught me to do this the right way.
I know I could simply use extra HTML tags to manually denote the prefix and suffix, but that seems ugly and redundant; I'd like to keep our markup format as lean and readable and writable as possible.
I've looked through the jQuery docs and can't find a function that does exactly what I need. There are all sorts of functions to add stuff before, after, around and inside elements, but none that I can find to retrieve what's already there. I could remove the inner <span>, but then I don't know how I could tell what came before the deleted element apart from what came after it.
Is there a "right" way to do what I'm trying to do?

Along with simple string manipulation, you can also use a regex.
The i flag makes the match case-insensitive, so this should solve your uppercase problem:
var array = $('.fillmeout').html().split(/<\/?span>/i);
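As a quick check, the split can be run on plain strings standing in for what .html() might return. The uppercase variant is an assumption based on the question's description of Internet Explorer. Note that the result has three entries, so the text inside the span comes back as well.

```javascript
// Stand-ins for .html(): lowercase tags, and the uppercase tags the
// question reports from Internet Explorer.
const lower = "This is a test of the <span>NOUN</span> Broadcast System.";
const upper = "This is a test of the <SPAN>NOUN</SPAN> Broadcast System.";

// The i flag lets <span>, </span>, <SPAN> and </SPAN> all act as delimiters.
const [prefix, noun, suffix] = upper.split(/<\/?span>/i);

console.log(JSON.stringify(prefix));             // "This is a test of the "
console.log(noun);                               // NOUN
console.log(JSON.stringify(suffix));             // " Broadcast System."
console.log(lower.split(/<\/?span>/i).length);   // 3
console.log(upper.split(/<\/?span>/).length);    // 1 (no i flag: no match)
```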

Use your jQuery API! $('.fillmeout').children() and then you can manipulate that element as required.
http://api.jquery.com/children/

For completeness, I thought I should point out that the cleanest answer is to put the prefix and suffix text in their own <span> elements like this; then you can use jQuery selectors and methods to access the desired text directly:
<span class="fillmeout">
<span class="prefix">This is a test of the </span>
<span>NOUN</span>
<span class="suffix"> Broadcast System.</span>
</span>
Then, the code would be as simple as:
var fillme = $(".fillmeout").eq(0);
var prefix = fillme.find(".prefix").text();
var suffix = fillme.find(".suffix").text();
FYI, I would not call this level of simplicity "ugly and redundant" as you theorized. You're using HTML markup to delineate the text into separate elements that you want to separately access. That's just smart, not redundant.
By way of analogy, imagine you have toys of three separate colors (red, white and blue) and they are initially organized by color and you know that sometime in the future you are going to need to have them separated by color again. You also have three boxes to store them in. You can either put them all in one box now and manually sort them out by color again later or you can just take the already separated colors and put them each into their own box so there's no separation work to do later. Which is easier? Which is smarter?
HTML elements are like the boxes. They are containers for your text. If you want the text separated out in the future, you might as well put each piece of text into its own named container so it's easy to access just that piece of text later.

Several of these answers almost got me what I needed, but in the end I found a function not mentioned here: .contents(). It returns all child nodes, including text nodes, as a jQuery object that I can then iterate over (recursively if needed) to find what I need.

I'm not sure if this is the 'right' way either, but you could replace the SPANs with a character you can consistently split the string on:
jQuery('.fillmeout span').replaceWith('|');
http://api.jquery.com/replaceWith/
http://jsfiddle.net/mdarnell/P24se/

You could use
$('.fillmeout span').get(0).previousSibling.textContent
$('.fillmeout span').get(0).nextSibling.textContent
This works in IE9 but, sadly, not in IE versions earlier than 9.

Based on your example, you could use your target as a delimiter to split the sentence.
var str = $('.fillmeout').html();
str = str.split('<span>NOUN</span>');
This would return an array of ["This is a test of the ", " Broadcast System."]. Here's a jsFiddle example.

You could just use the native JavaScript nextSibling and previousSibling properties (coupled with jQuery selectors):
$('.fillmeout span').each(
function(){
var prefix = this.previousSibling.nodeValue,
suffix = this.nextSibling.nodeValue;
});
JS Fiddle proof of concept.
References:
each().
node.nextSibling.
node.previousSibling.

If you want to use the DOM instead of parsing the HTML yourself, and you can't put the desired text in its own elements, then you will need to look through the DOM for text nodes and find the ones before and after the span tag.
jQuery isn't a whole lot of help when dealing with text nodes instead of element nodes so the work is mostly done in plain javascript like this:
$(".fillmeout").each(function() {
var node = this.firstChild, prefix = "", suffix = "", foundSpan = false;
while (node) {
if (node.nodeType == 3) {
// if text node
if (!foundSpan) {
prefix += node.nodeValue;
} else {
suffix += node.nodeValue;
}
} else if (node.nodeType == 1 && node.tagName == "SPAN") {
// if element and span tag
foundSpan = true;
}
node = node.nextSibling;
}
// here prefix and suffix are the text before and after the first
// <span> tag in the HTML
// You can do with them what you want here
});
Note: This code does not assume that all the text before the span is in a single text node. It might be, but it might not, so the code concatenates all the text nodes before and after the span tag. The code would be simpler if you could reference just one text node on each side, but that isn't a 100% safe assumption.
This code also handles the case where there is no text before or after the span.
You can see it work here: http://jsfiddle.net/jfriend00/P9YQ6/
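The loop itself does not depend on jQuery, so its logic can be tried outside a browser. The sketch below runs the same algorithm over plain objects that mimic only the DOM properties it reads (nodeType, nodeValue, tagName, nextSibling); the makeSiblings helper and these node shapes are hypothetical stand-ins, not real DOM nodes. The sample covers the case from the note above, where the prefix is split across two text nodes.

```javascript
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

// Hypothetical helper: chain plain objects via nextSibling, return the first.
function makeSiblings(nodes) {
  nodes.forEach((node, i) => {
    node.nextSibling = nodes[i + 1] || null;
  });
  return nodes[0] || null;
}

// Same algorithm as the answer: text before the first SPAN goes to prefix,
// all text after it goes to suffix.
function splitAroundSpan(firstChild) {
  let node = firstChild;
  let prefix = "";
  let suffix = "";
  let foundSpan = false;
  while (node) {
    if (node.nodeType === TEXT_NODE) {
      if (!foundSpan) {
        prefix += node.nodeValue;
      } else {
        suffix += node.nodeValue;
      }
    } else if (node.nodeType === ELEMENT_NODE && node.tagName === "SPAN") {
      foundSpan = true;
    }
    node = node.nextSibling;
  }
  return { prefix, suffix };
}

// Prefix split across two text nodes, as can happen after DOM manipulation.
const first = makeSiblings([
  { nodeType: TEXT_NODE, nodeValue: "This is a test " },
  { nodeType: TEXT_NODE, nodeValue: "of the " },
  { nodeType: ELEMENT_NODE, tagName: "SPAN" },
  { nodeType: TEXT_NODE, nodeValue: " Broadcast System." },
]);

console.log(splitAroundSpan(first));
// { prefix: 'This is a test of the ', suffix: ' Broadcast System.' }
```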

Is it possible to get jQuery objects from an HTML string that's not in the DOM?

For example in javascript code running on the page we have something like:
var data = '<html>\n <body>\n I want this text ...\n </body>\n</html>';
I'd like to know whether it's possible, ideally with jQuery, to get the text in the body of that HTML string without throwing the whole string into the DOM and selecting from there.

First, it's a string:
var arbitrary = '<html><body>\nSomething<p>This</p>...</body></html>';
Now jQuery turns it into an unattached DOM fragment, applying its internal .clean() method to strip away things like the extra <html>, <body>, etc.
var $frag = $( arbitrary );
You can manipulate this with jQuery functions, even if it's still a fragment:
alert( $frag.filter('p').prop('outerHTML') ); // says "<p>This</p>"
Or of course just get the text content as in your question:
alert( $frag.text() ); // includes "This" in my contrived example
// along with line breaks and other text, etc
You can also later attach the fragment to the DOM:
$('div#something_real').append( $frag );
Where possible, it's often a good strategy to do complicated manipulation on fragments while they're unattached, and then slip them into the "real" page when you're done.

The correct answer to this question, in this exact phrasing, is NO.
If you write something like var a = $("<div>test</div>"), jQuery will add that div to the DOM, and then construct a jQuery object around it.
If you want to do without bothering the DOM, you will have to parse it yourself. Regular expressions are your friend.

It would be easiest, I think, to put that into the DOM, get it from there, then remove it from the DOM again.
jQuery itself is full of tricks like this. It adds all sorts of stuff to the DOM all the time, including when you build something using $('<p>some html</p>'). So if you went down that road you'd still effectively be placing stuff into the DOM and then removing it again, temporarily, except that jQuery would be doing it.

John Resig (jQuery author) created a pure JS HTML parser that you might find useful. An example from that page:
var dom = HTMLtoDOM("<p>Data: <input disabled>");
dom.getElementsByTagName("body").length == 1
dom.getElementsByTagName("p").length == 1
Buuuut... This question contains a constraint that I think you need to be more critical of. Rather than working around a hard-coded HTML string in a JS variable, can you not reconsider why it's that way in the first place? WHAT is that hard-coded string used for?
If it's just sitting there in the script, re-write it as a proper object.
If it's the response from an AJAX call, there is a perfectly good jQuery AJAX API already there. (Added: although jQuery just returns it as a string without any ability to parse it, so I guess you're back to square one there.)

Before you put it in the DOM, it's just a plain string.
You can certainly use a regex.
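For the regex route, here is a minimal sketch applied to the string from the question. The pattern and the bodyText helper are mine, not from any library, and they carry the usual caveats: a single, non-nested body element, and no "</body>" inside comments, scripts or attribute values.

```javascript
// Captures everything between the first <body ...> tag and the next </body>.
const BODY_RE = /<body[^>]*>([\s\S]*?)<\/body>/i;

function bodyText(html) {
  const match = BODY_RE.exec(html);
  if (!match) return null; // no body element found
  // Drop any inner tags, then trim the surrounding whitespace.
  return match[1].replace(/<[^>]*>/g, "").trim();
}

const data = '<html>\n <body>\n I want this text ...\n </body>\n</html>';
console.log(bodyText(data)); // I want this text ...
```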

&nbsp; or space not outputting in jQuery

I have the following being extracted from an XML file and put into a jQuery variable.
links.append($("<a href='"+alink+"'></a>&nbsp;").html(desc));
...however, the &nbsp; does not appear on the page. I need it to separate the links in the output.
I have also tried
links.append($("<a href='"+alink+"'></a>").html(desc));
links.append($("&nbsp;"));
Many thanks!

$("<a href='"+alink+"'></a>&nbsp;")
Yeah, that's actually only creating the <a> element, and discarding the nbsp. When you pass a string into the $() function that looks like(*) HTML, jQuery creates the stretch of markup between the first < in the string and the last >. If you've got any leading or trailing content outside those, it gets thrown away(**). You could fool jQuery by saying:
$("<a href='"+alink+"'></a>&nbsp;<!-- don't ignore me! -->")
This doesn't seem to be documented anywhere, makes no sense whatsoever, and might be considered a bug, but it has been jQuery's normal behaviour for some time so it's probably not going away.
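To see why the trailing &nbsp; disappears, the sketch below uses a pattern modeled on jQuery 1.4's internal quickExpr (reproduced here as an assumption; check the source of your jQuery version) and prints capture group 1, the part that would be turned into elements.

```javascript
// Modeled on jQuery 1.4.x's quickExpr: group 1 runs from the first "<" to
// the last ">", so text after the last ">" is not part of the fragment.
const quickExpr = /^[^<]*(<[\w\W]+>)[^>]*$|^#([\w-]+)$/;

const alink = "/page"; // hypothetical link target
const plain = "<a href='" + alink + "'></a>&nbsp;";
const fooled = "<a href='" + alink + "'></a>&nbsp;<!-- don't ignore me! -->";

console.log(quickExpr.exec(plain)[1]);  // <a href='/page'></a>
console.log(quickExpr.exec(fooled)[1]); // <a href='/page'></a>&nbsp;<!-- don't ignore me! -->
console.log(quickExpr.exec("#main")[2]); // main (the id-selector branch)
```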
When you pass an HTML string to the append function (and other manipulation methods) directly instead of via the $ function, this behaviour does not occur. So:
links.append("<a href='"+alink+"'></a>&nbsp;");
actually does keep the space. But a better way forward is to stop throwing HTML strings about, so you don't have to worry about alink containing ', < or & characters either, and work in a more DOM style:
var link= $('<a/>');
link.attr('href', alink);
link.html(desc);
links.append(link);
links.append('\xA0');
Or, more concisely, using jQuery 1.4's props argument shortcut:
links.append($('<a/>', {href: alink, html: desc}));
links.append('\xA0');
assuming that desc is really something that should contain HTML markup; if not, use text instead.
(I used \xA0, the JavaScript string literal way to include a character U+00A0 NON-BREAKING SPACE as it is a whole two characters shorter than the HTML entity reference. Woohoo!)
(*: how does it tell that a string is HTML? Why, by checking to see if there's a < and > character in it, in that order, of course. Meaning it'll get fooled if you try to use a selector that has those characters in. Brilliant, jQuery, brilliant.(***))
(**: why? see line 125 of jQuery 1.4.2. It builds the HTML fragment from match[1]—the group from the first < to the last > in quickExpr—and not the original string or match[0].)
(***: I'm being sarcastic. The insane over-overloading of the $ function is one of jQuery's worst features.)
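If you do stick with building HTML strings, the characters the answer warns about (', < and &) have to be escaped before they go into the attribute. A minimal sketch; escapeAttr is a hypothetical helper of mine, not a jQuery function, and the DOM-style code above sidesteps the issue entirely. The last line confirms that '\xA0' is the single character U+00A0.

```javascript
// Escape a value for use inside a quoted HTML attribute.
// "&" is replaced first so the entities added afterwards aren't re-escaped.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical link target containing characters that would break the markup.
const alink = "/q?a=1&b='x'";
const html = "<a href='" + escapeAttr(alink) + "'></a>\xA0";

console.log(html); // <a href='/q?a=1&amp;b=&#39;x&#39;'></a> plus a trailing U+00A0
console.log("\xA0" === "\u00A0", "\xA0".length); // true 1
```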

You'd be better off styling with CSS, something like:
links.append($("<a class='link' href='"+alink+"'></a>").html(desc));
In CSS:
a.link {
padding-left : 5px ;
padding-right : 5px ;
}

you could try

We Keep Coding

JavaScript is the programming language of the Web.