I want to replace HTML links in a string with plain-text links, for example:
<a href="test.com"> should become test.com.
I can't figure out a regex that matches all my patterns, because the links might have more attributes, in different orders:
<a class="test" href="test.com" title="test">
How can I achieve that?
let str = '<a class="test" href="test.com" title="test">'
let result = str.split(/href="/)[1].split('"')[0]
console.log(result)
Create a temporary DOM element with the string as its HTML content, then iterate over all a tags and replace each one with its link (the value of its href attribute).
let html = `<a class="test" href="test.com" title="test">`;
// create a temporary div element
let tempDiv = document.createElement('div');
// set html content as your string
tempDiv.innerHTML = html;
// get all a tags and iterate
tempDiv.querySelectorAll('a').forEach(ele => {
  // replace element with corresponding link
  // (ele.href would give the resolved absolute URL instead of the raw attribute)
  ele.replaceWith(ele.getAttribute('href'));
})
// get html content of temporary element
console.log(tempDiv.innerHTML)
Alternatively, you can use DOMParser to parse the HTML content.
let html = `<a class="test" href="test.com" title="test">`;
// parser
let parser = new DOMParser();
// parse the string, which returns a document object
let doc = parser.parseFromString(html, "text/html");
// get all a tags and iterate
doc.querySelectorAll('a').forEach(ele => {
  // replace element with corresponding link
  // (ele.href would give the resolved absolute URL instead of the raw attribute)
  ele.replaceWith(ele.getAttribute('href'));
})
// get html content from body
console.log(doc.body.innerHTML)
UPDATE: With a regex you can extract and replace the a tag as follows (not preferred).
var str = '<a class="test" href="test.com" title="test">';
console.log(str.replace(/<a[^>]*href="([^"]+)"[^>]*>(?:.*?<\/a>)?/g, '$1'));
var str1 = '<a class="test" href="test.com" title="test">abc</a>';
console.log(str1.replace(/<a[^>]*href="([^"]+)"[^>]*>(?:.*?<\/a>)?/g, '$1'));
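If you do go the regex route, the pattern above can be wrapped in a small function. This is only a sketch under a few assumptions: replaceAnchorsWithHref is a hypothetical name, it also accepts single-quoted href values (the original pattern does not), and it still breaks on nested or unclosed anchors, so the DOM approaches remain the safer choice.

```javascript
// Hypothetical helper (not part of the original answer): replaces every <a ...>
// opening tag, plus its text and closing tag when present, with the href value.
function replaceAnchorsWithHref(input) {
  // <a\b     : "<a" as a whole tag name, so <abbr> or <area> do not match
  // \shref   : href must follow whitespace, so data-href is not picked up
  // (["'])   : opening quote; the backreference \1 requires the same closing quote
  // the last optional group swallows the link text and </a> when they exist
  const anchor = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1[^>]*>(?:[\s\S]*?<\/a>)?/gi;
  return input.replace(anchor, '$2');
}

console.log(replaceAnchorsWithHref('<a class="test" href="test.com" title="test">'));
console.log(replaceAnchorsWithHref("see <a href='a.org'>A</a> and <a id=\"x\" href=\"b.org\">B</a>."));
```

Note that an opening tag without its own closing tag will swallow everything up to the next </a> in the string, which is one more reason to treat this as a fallback.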
Reference : Using regular expressions to parse HTML: why not?
I have a basic string with HTML tags in it. I want to remove the "span" tag and all of its contents and return the rest of the string and html.
When I do the following, it returns "here", which is the content of the matched element. I want everything else, not the "span" part. What am I doing wrong?
var temp = '<div>Some text</div><p style="color:red">More text<span>here</span></p><p>Even more</p>';
var clean_temp = $(temp).find('span').remove();
var $temp = $(clean_temp).html(); //Returns "here"
alert($temp); // "here"
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
The jQuery remove() function removes an item and then returns the removed item, as you already seem to have discovered. The object from which it was removed is still there as well, with its content modified.
Also in this case, due to the nature of your source string, doing $() on the source string returns a jQuery collection that wraps 3 separate DOM elements. Doing .find('span').remove() on that collection modifies the middle of these wrapped DOM elements. To reconstruct the HTML, we have to generate HTML from each wrapped DOM element and then join all these HTML parts together.
I created a helper function getHtml() for that purpose, see demo:
function getHtml(jqueryObj) {
return jqueryObj.toArray().map(el => el.outerHTML).join("");
}
var temp = '<div>Some text</div><p style="color:red">More text<span>here</span></p><p>Even more</p>';
var $temp = $(temp);
console.log(getHtml($temp));
$temp.find('span').remove();
console.log(getHtml($temp));
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
You can replace it with an empty string:
var temp = `
<div>Some text</div>
<p style="color:red">
More text <span>here</span>
</p>
<p>
Even more
<span>here as well</span>
</p>`;
// The g flag replaces every match, not just the first one; the i flag makes the match case-insensitive
var expression = /<span(.*?)<\/span>/gi; // no closing '>' after "<span", so spans with attributes match too
var clean_temp = temp.replace(expression, '')
console.log(clean_temp);
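One caveat with the pattern above: the dot does not match line breaks, so a span whose content runs over several lines is left in place, and "<span" without a word boundary would also match a tag such as <spanner>. A slightly stricter variant could look like the sketch below. removeSpans is a hypothetical name, and nested spans are still not handled correctly (the match stops at the first </span>), so this is only reasonable for flat markup.

```javascript
// Hypothetical helper: removes each <span ...>...</span> pair from an HTML string.
function removeSpans(html) {
  // <span\b   : the tag name must end here (does not match <spanner>)
  // [^>]*     : any attributes on the opening tag
  // [\s\S]*?  : the content, including line breaks, as short as possible
  return html.replace(/<span\b[^>]*>[\s\S]*?<\/span>/gi, '');
}

console.log(removeSpans('<p style="color:red">More text<span>here</span></p>'));
console.log(removeSpans('<p>a<span class="x">\nb\n</span>c</p>'));
```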
I want to extract all the HTML tags. For example, from <body id = "myid"> .... </body> I just want to extract <body id ="myid">. Similarly, I want to extract all the HTML tags with their attributes, using JavaScript.
I've tried using a regex to make an array of all the tags enclosed between '<' and '>':
<script>
$(document).ready(function(){
// Get value on button click and show alert
$("#btn_parse").click(function(){
var str = $("#data").val();
var arr = str.split(/[<>]/);
$('#result').text(arr);
});
});
</script>
but it creates an array arr that contains empty strings and garbage, and it also removes the angle brackets '<>',
which I don't want.
So, in a nutshell, I want a script that takes
str = "mystring ... <htmltag id='myid' class='myclass'>i_don't_want_anythin_from_here</htmltag> ...";
and produces an array like:
arr = ["<htmltag id='myid' class='myclass'>","</htmltag>",...];
Here is one dirty way. Add it to the dom so it can be accessed via normal DOM functions, then remove the text, and split the tags and push to an array.
const str = "mystring ... <htmltag id='myid' class='myclass'>i_don't_want_anythin_from_here</htmltag> ...";
const div = document.createElement("div");
div.innerHTML = str;
document.body.appendChild(div);
const tags = div.querySelectorAll("*");
const stripped = [];
tags.forEach(function(tag){
  // empty the element so only its opening and closing tags remain
  tag.innerHTML = "";
  // mark the boundary between the two tags, then split on the marker
  const _tag = tag.outerHTML.replace("></",">~</");
  stripped.push(_tag.split("~"));
});
console.log(stripped);
document.body.removeChild(div);
Assuming you can also get the input from a "live" page then the following should do what you want:
[...document.querySelectorAll("*")]
.map(el=>el.outerHTML.match(/[^>]+>/)[0]+"</"+el.tagName.toLowerCase()+">")
The above will combine the beginning and end tags into one string like
<div class="js-ac-results overflow-y-auto hmx3 d-none"></div>
And here is the same code applied on an arbitrary string:
var mystring="<div class='all'><htmltag id='myid' class='myclass'>i_don't_want_anythin_from_here</htmltag><p>another paragraph</p></div>";
const div=document.createElement("div");
div.innerHTML=mystring;
let res=[...div.querySelectorAll("*")].map(el=>el.outerHTML.match(/[^>]+>/)[0]+"</"+el.tagName.toLowerCase()+">")
console.log(res)
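If no DOM is available, or you want the flat array from the question with opening and closing tags as separate entries, a plain match can be enough. This is a sketch, not a parser: extractTags is a hypothetical name, and it assumes no attribute value contains a '>' character and that the input has no comments or script content that looks like markup.

```javascript
// Hypothetical helper: returns every opening and closing tag, in document order.
function extractTags(input) {
  // "<", an optional "/", a letter starting the tag name, then everything up to ">"
  return input.match(/<\/?[a-zA-Z][^>]*>/g) || [];
}

const str = "mystring ... <htmltag id='myid' class='myclass'>i_don't_want_anythin_from_here</htmltag> ...";
console.log(extractTags(str));
// [ "<htmltag id='myid' class='myclass'>", "</htmltag>" ]
```

Unlike split(/[<>]/), match keeps the angle brackets and skips the text between the tags.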
I have a string with HTML code.
<h2 class="some-class">
<a href="#link" class="link" id="first-link"
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>
I need to get only the text content of the h2.
I created this regular expression:
(?<=>)(.*)(?=<\/h2>)
But it only works if the h2 has no inner tags. Otherwise I get this:
<a href="#link" class="link" id="first-link"
<span class="bold">link</span>
</a>
NEED TO GET THIS
Never use regex to parse HTML, check these famous answers:
Using regular expressions to parse HTML: why not?
RegEx match open tags except XHTML self-contained tags
Instead, generate a temp element with the text as HTML and get content by filtering out text nodes.
var str = `<h2 class="some-class">
<a href="#link" class="link" id="first-link"
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>`;
// generate a temporary DOM element
var temp = document.createElement('div');
// set content
temp.innerHTML = str;
// get the h2 element
var h2 = temp.querySelector('h2');
console.log(
// get all child nodes and convert into array
// for older browser use [].slice.call(h2...)
Array.from(h2.childNodes)
// iterate over elements
.map(function(e) {
// if text node then return the content, else return
// empty string
return e.nodeType === 3 ? e.textContent.trim() : '';
})
// join the string array
.join('')
// you can use reduce method instead of map
// .reduce(function(s, e) { return s + (e.nodeType === 3 ? e.textContent.trim() : ''); }, '')
)
Reference :
Fastest way to convert JavaScript NodeList to Array?
Regex is not good for parsing HTML, but if your HTML is not valid, or you want to use a regex anyway:
(?!>)([^><]+)(?=<\/h2>)
It gets the last text before the closing </h2> tag (if it exists).
To avoid empty results, * was changed to +.
This regex is very limited and only fits narrow situations like the one in the question.
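To show how that pattern behaves in JavaScript, here is a minimal sketch. trailingH2Text is a hypothetical name, the sample string is assumed to have a properly closed a tag, and the (?!>) lookahead is dropped because the character class already excludes '>'. It only returns the text that sits directly before </h2>, so text placed before an inner tag is not picked up.

```javascript
const html = `<h2 class="some-class">
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>`;

// Hypothetical helper: returns the text directly before </h2>, or null.
function trailingH2Text(input) {
  // one or more non-bracket characters that are immediately followed by </h2>
  const match = input.match(/[^<>]+(?=<\/h2>)/);
  return match ? match[0].trim() : null;
}

console.log(trailingH2Text(html)); // "NEED TO GET THIS"
```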
demo
var h2 = document.querySelector('h2')
var h2_clone = h2.cloneNode(true)
// copy the live children collection first, so removing elements does not skip any
for (let el of Array.from(h2_clone.children)) {
  el.remove()
}
alert(h2_clone.innerText)
Let's assume I have the following text:
This is a sample url : http://example.com.
These are some images:
<img src="http://example.com/sample1.png" class="sample-image" />
<img src="http://example.com/sample2.png" class="sample-image" />
Another url http://example2.com
Here is regex code that I am using to parse the above text:
const urls = /(\b(https?|ftp):\/\/[A-Z0-9+&@#\/%?=~_|!:,.;-]*[-A-Z0-9+&@#\/%=~_|])/gim;
const emails = /(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
return function(text) {
  if (text.match(urls)) {
    text = text.replace(urls, '<a href="$1">$1</a>')
  }
  if (text.match(emails)) {
    text = text.replace(emails, '<a href="mailto:$1">$1</a>')
  }
  return text
}
The above code does this to my text:
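This is a sample url : <a href="http://example.com">http://example.com</a>.
These are some images:
<img src="<a href="http://example.com/sample1.png">http://example.com/sample1.png</a>" class="sample-image" />
<img src="<a href="http://example.com/sample2.png">http://example.com/sample2.png</a>" class="sample-image" />
Another url <a href="http://example2.com">http://example2.com</a>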
And I desire the following result:
This is a sample url : <a href="http://example.com">http://example.com</a>.
These are some images:
<img src="http://example.com/sample1.png" class="sample-image" /> <!-- Do not change -->
<img src="http://example.com/sample2.png" class="sample-image" /> <!-- Do not change -->
Another url <a href="http://example2.com">http://example2.com</a>
How can I achieve the above result?
It's always better to avoid using regex for parsing HTML.
RegEx match open tags except XHTML self-contained tags
Using regular expressions to parse HTML: why not?
Instead, generate a temporary DOM element with the content and collect its text nodes. Then apply the regex replace only to the content of those text nodes, leaving the elements as they are.
var html = 'This is a sample url : http://example.com These are some images:<img src="http://example.com/sample1.png" class="sample-image" /><img src="http://example.com/sample2.png" class="sample-image" />Another url http://example2.com';
// regex for replacing content
const urls = /(\b(https?|ftp):\/\/[A-Z0-9+&@#\/%?=~_|!:,.;-]*[-A-Z0-9+&@#\/%=~_|])/gim;
const emails = /(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
// for replacing the content
function update(text) {
  return text.replace(urls, '<a href="$1">$1</a>').replace(emails, '<a href="mailto:$1">$1</a>');
}
// create a DOM element
var temp = document.createElement('div');
// set the string as your content
temp.innerHTML = html;
console.log(
// get all child nodes and convert into array
// for older browser use `[].slice.call()`
Array.from(temp.childNodes)
// iterate over the elements to generate the content array
.map(function(n) {
// if node is text then update the content and return it
if (n.nodeType == 3)
return update(n.textContent);
// otherwise return the html content
else
return n.outerHTML;
// join them
}).join('')
)
UPDATE : In case you need to keep the escaped HTML then you need to add an additional method which generates corresponding escaped HTML of a text node.
var html = 'This is a sample url : http://example.com These are some images:<img src="http://example.com/sample1.png" class="sample-image" /><img src="http://example.com/sample2.png" class="sample-image" />Another url http://example2.com hi <a href="#">Sam</a>';
// regex for replacing content
const urls = /(\b(https?|ftp):\/\/[A-Z0-9+&@#\/%?=~_|!:,.;-]*[-A-Z0-9+&@#\/%=~_|])/gim;
const emails = /(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
// for replacing the content
function update(text) {
  return text.replace(urls, '<a href="$1">$1</a>').replace(emails, '<a href="mailto:$1">$1</a>');
}
// function for generating escaped html content for text node
function getEncodedText(node) {
// temporary element
var temp = document.createElement('div');
// append the text node
temp.appendChild(node);
// get the escaped html content
return temp.innerHTML
}
// create a DOM element
var temp = document.createElement('div');
// set the string as your content
temp.innerHTML = html;
console.log(
// get all child nodes and convert into array
// for older browser use `[].slice.call()`
Array.from(temp.childNodes)
// iterate over the elements to generate the content array
.map(function(n) {
// if node is text then update the escaped html content and return it
if (n.nodeType == 3)
return update(getEncodedText(n));
// otherwise return the html content
else
return n.outerHTML;
// join them
}).join('')
)
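Where no DOM is available (for example on a server), a rough approximation of the same idea is to split the string on tags and only touch the text in between. The sketch below rests on stated assumptions: linkifyOutsideTags and URL_PATTERN are hypothetical names, the pattern is a trimmed version of the URL regex from the question, and attribute values are assumed not to contain '>'. It only protects the tags themselves, so a URL that is already the text of an existing link would be wrapped a second time.

```javascript
// URL pattern adapted from the question (assumption: no capture groups needed here).
const URL_PATTERN = /\b(?:https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;

// Hypothetical helper: wraps URLs in anchors, but only in the text between tags.
function linkifyOutsideTags(html) {
  return html
    // the capturing group makes split() keep the tags: they land at the odd
    // indexes, and the text between them lands at the even indexes
    .split(/(<[^>]*>)/)
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(URL_PATTERN, '<a href="$&">$&</a>'))
    .join('');
}

console.log(linkifyOutsideTags(
  'See http://example.com. <img src="http://example.com/a.png" class="sample-image" /> and http://example2.com'
));
```

The src attribute of the img tag stays untouched, while both plain URLs are wrapped.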
How about:
str='This is a sample url : http://example.com.\nThese are some images:\n<img src="http://example.com/sample1.png" class="sample-image" />\n<img src="http://example.com/sample2.png" class="sample-image" />\nAnother url http://example2.com';
// group 1 keeps the character before the URL; a URL directly after a double quote (an attribute value) is skipped
str= str.replace(/(^|[^"])(https?:\/\/[^"\s]*[^"\s.,;:!?])/g, '$1<a href="$2">$2</a>');
console.log(str);
Output:
This is a sample url : <a href="http://example.com">http://example.com</a>.
These are some images:
<img src="http://example.com/sample1.png" class="sample-image" />
<img src="http://example.com/sample2.png" class="sample-image" />
Another url <a href="http://example2.com">http://example2.com</a>
I have this source:
&lt;div class="page"&gt;&lt;h1&gt;First Page &lt;/h1&gt;&lt;/div&gt;
How can I convert it to HTML and use a selector like $('.page')? I tried assigning the above string to a variable and then using html(), but it doesn't work.
You can parse your string with $.parseHTML. If you inspect the returned array, its first item has a data property that contains the properly formatted HTML string.
EDIT
You can read the properties of the resulting HTML object without appending it to the DOM. Check my edited code.
var test = '&lt;div class="page"&gt;&lt;h1&gt;First Page &lt;/h1&gt;&lt;/div&gt;';
var testHTML = $.parseHTML(test);
var elemHTML = $(testHTML[0].data);
console.log(elemHTML.text());
You can try this :
var test = '&lt;div class="page"&gt;&lt;h1&gt;First Page &lt;/h1&gt;&lt;/div&gt;';
var testHTML = $.parseHTML(test);
$("body").html(testHTML[0].data);
$(".page").css("color","blue");
// Without appending the element to the DOM
var elemHTML = $(testHTML[0].data);
console.log(elemHTML.text());
// To count the elements, you can use a container without appending it to the DOM
var container=$("<div></div>");
container.append(elemHTML);
console.log(container.find(".page").length);
.page{
color:red;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
See comments, first we have to process the entities, then use the result as HTML:
// The string
var str = '&lt;div class="page"&gt;&lt;h1&gt;First Page &lt;/h1&gt;&lt;/div&gt;';
// A wrapper element to put it in
var wrapper = $("<body>");
// Process the character entities
wrapper.html(str);
str = wrapper.text();
// Convert the resulting HTML to a structure
wrapper.html(str);
console.log("Text of .page: ", wrapper.find(".page").text());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
That's verbose for clarity; here's the concise version:
var str = '&lt;div class="page"&gt;&lt;h1&gt;First Page &lt;/h1&gt;&lt;/div&gt;';
var wrapper = $("<body>");
wrapper.html(wrapper.html(str).text());
console.log("Text of .page: ", wrapper.find(".page").text());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
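The first step above, processing the character entities, relies on the browser to do the decoding. Purely as an illustration of what that step does, here is a sketch that decodes a handful of common entities by hand. decodeBasicEntities is a hypothetical name, and it deliberately covers only five entities, so for real input the wrapper approach above is more complete.

```javascript
// Hypothetical helper: decodes only the five most common character entities.
function decodeBasicEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    // &amp; goes last, so "&amp;lt;" becomes "&lt;" rather than "<"
    .replace(/&amp;/g, '&');
}

const encoded = '&lt;div class=&quot;page&quot;&gt;&lt;h1&gt;First Page &lt;/h1&gt;&lt;/div&gt;';
console.log(decodeBasicEntities(encoded));
// <div class="page"><h1>First Page </h1></div>
```

The decoded string is what you would then hand to wrapper.html() or $() to get a structure you can query.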
You can use the following script for this:
$('.page').html('<div class="page"><h1>First Page </h1></div>');
or
var htmlString = '<div class="page"><h1>First Page </h1></div>';
$('.page').html(htmlString);
jQuery automatically converts it to HTML.