How do I parse this piece of innerHTML in JavaScript?

I did this:
var blah = document.getElementById('id').getElementsByClassName('class')[0].innerHTML;
Now I have this in blah:
<a class="title" href="http://www.example.com/" tabindex="1">Some text goes here</a> <span class="domain">(foobar.co.uk)</span>
I want to read the string "Some text goes here" from the HTML using JS (no jQuery). I don't have access to the site's HTML; I'm parsing a webpage to inject JS for a browser extension.
Will I just have to parse it as a string and find my text between > and <, or is there a way to parse innerHTML in JS?

Basic HTML markup that I am assuming you have:
<div id="id">
<div class="class">
<a class="title" href="http://www.example.com/" tabindex="1">Some text goes here</a> <span class="domain">(foobar.co.uk)</span>
</div>
</div>
So select the anchor and read its text:
var theAnchorText = document.getElementById('id').getElementsByClassName('class')[0].getElementsByTagName("a")[0].textContent;
If you need to support IE8, which has innerText but not textContent, fall back to innerText. IE8 also lacks getElementsByClassName, so select the anchor with querySelector, which IE8 supports (in standards mode) for CSS 2.1 selectors:
var theAnchor = document.querySelector("#id .class a");
var theAnchorText = theAnchor.textContent || theAnchor.innerText;
In a modern browser, querySelector alone makes it a lot cleaner:
var theAnchorText = document.querySelector("#id .class a").textContent;

You could approach this in two ways: a regexp, or textContent on a temporary DOM element:
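var foo = "<b>bar</b>";
// Strip anything that looks like a tag. Quick, but it does not decode
// entities such as &amp; and is fooled by ">" inside attribute values.
function regexpStrip(str) {
    return str.replace(/<[^>]*>/g, '');
}
// Let the browser parse the markup, then read back only the text.
// Caution: with untrusted input, markup such as <img onerror=...> can still
// run its handler when assigned to innerHTML; DOMParser does not run it.
function parseViaDOM(str) {
    var el = document.createElement('div');
    el.innerHTML = str;
    return el.textContent;
}
console.log(regexpStrip(foo)); // => "bar"
console.log(parseViaDOM(foo)); // => "bar"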
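To see why the DOM route is usually the safer of the two, here is a quick check of the regexp version on two inputs it mishandles: a > inside a quoted attribute value, and an HTML entity that textContent would decode but the regexp leaves alone. It runs in plain JavaScript with no DOM; the sample strings are made up for illustration.

```javascript
// Same regexp as regexpStrip above: remove anything from "<" to the next ">".
function regexpStrip(str) {
  return str.replace(/<[^>]*>/g, '');
}

// A ">" inside a quoted attribute ends the "tag" too early,
// so part of the attribute leaks into the output.
const quoted = regexpStrip('<b title="a>b">bar</b>');
console.log(quoted); // b">bar

// Entities are not decoded: parseViaDOM would return "Tom & Jerry" here.
const entity = regexpStrip('<i>Tom &amp; Jerry</i>');
console.log(entity); // Tom &amp; Jerry
```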

Related

Add an element/class to the first unique word of a string in JavaScript / jQuery

I want to wrap the first unique word of a string in a tag using JavaScript / jQuery. It is something like:
jQuery(document).ready(function($){
    $(".product_title:contains(NEW)").replace('NEW', '<span class="new">NEW</span>');
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
FROM: <h2 class="product_title">NEW Jeep Wrangler</h2>
INTO: <h2 class="product_title"><span class="new">NEW</span>Jeep Wrangler</h2>
FROM: <h2 class="product_title">REBUILT Jeep Wrangler</h2>
INTO: <h2 class="product_title"><span class="rebuilt">REBUILT</span>Jeep Wrangler</h2>
I tried the approach above to replace the first word with HTML code, but it didn't work: I got "Uncaught TypeError: $(...).replace is not a function" in the console.
How can I achieve this? I can't find any examples on the web that apply a similar approach. Please advise.
UPDATE: I think I'm close. I'm using this approach:
$('.product_title').each(function() {
    var text = $(this).text();
    $(this).text(text.replace('NEW', '<span class="cust-woo-title-tag-new">NEW</span>'));
});
But the tags are displayed as literal text on the front end.
The steps are:
1. Get all the product_title elements that contain "NEW".
2. Store the previous HTML value (e.g. NEW Jeep Wrangler).
3. Remove the word "NEW" (NEW Jeep Wrangler becomes Jeep Wrangler).
4. Update the innerHTML.
$(document).ready(function($){
    //get all the product_title that contains "NEW"
    let contains_new = $(".product_title:contains(NEW)");
    for (let x = 0; x < contains_new.length; x++) {
        //store the previous html value
        let prev_html = contains_new[x].innerHTML;
        //remove the "NEW" word in the prev_html
        let new_html = prev_html.replace('NEW', '');
        //update the inner html
        contains_new[x].innerHTML = `<span class="new">NEW</span>${new_html}`;
    }
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h2 class="product_title">NEW Jeep Wrangler</h2>
<h2 class="product_title">NEW Jeep Wrangler</h2>
Inspect the resulting element in your browser's developer tools to see the HTML structure.
Something like this could work:
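$('h2').each((i, el) => {
    // .each passes (index, element); split the heading text into words
    const txt = $(el).text().trim().split(' ');
    const first = txt.shift(); // remove and keep the first word
    const i_html = `<span class="${first.toLowerCase()}">${first}</span> ${txt.join(' ')}`;
    $(el).html(i_html);
});
If you just want to do it for certain words, create an array with them and check keywords.includes(first) before wrapping.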
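Building on the keyword idea, here is a minimal sketch that keeps the string work in a plain function so it can be checked without a page. The names wrapFirstWord and escapeHtml are hypothetical, and the sketch assumes each heading holds plain text (it is fed from .text(), so any existing markup inside the heading is dropped). Escaping the text before handing it to .html() keeps stray < or & characters from being interpreted as markup.

```javascript
// Hypothetical helper: escape the characters that matter in HTML text and
// in double-quoted attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: if the first word of `text` is one of `keywords`,
// wrap it in a span whose class is the lower-cased word.
function wrapFirstWord(text, keywords) {
  const words = text.trim().split(/\s+/);
  const first = words[0];
  if (!keywords.includes(first)) {
    return escapeHtml(text); // leave other headings unchanged
  }
  const rest = words.slice(1).join(' ');
  return `<span class="${escapeHtml(first.toLowerCase())}">${escapeHtml(first)}</span> ${escapeHtml(rest)}`;
}

console.log(wrapFirstWord('NEW Jeep Wrangler', ['NEW', 'REBUILT']));
// <span class="new">NEW</span> Jeep Wrangler

// In the page (with jQuery loaded), it could be applied to each title like this:
// $('.product_title').each(function () {
//   $(this).html(wrapFirstWord($(this).text(), ['NEW', 'REBUILT']));
// });
```

Like the answers above, this leaves a space between the closing span and the rest of the title.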

How to replace HTML link with text link?

I want to replace HTML links in a string with text links, for example:
<a href="test.com"> should become test.com.
I can't figure out a regex that matches all my patterns, because links might have more attributes in different orders:
<a class="test" href="test.com" title="test">
How can I achieve that?
let str = '<a class="test" href="test.com" title="test">'
let result = str.split(/href="/)[1].split('"')[0]
console.log(result)
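The split version returns only the first href and throws a TypeError when the string has no href at all. A slightly more defensive sketch (extractHrefs is a hypothetical name) collects every double-quoted href with matchAll, which needs an ES2020-capable engine. It still shares the usual regex caveats; for example, it would also pick up attributes such as data-href.

```javascript
// Hypothetical helper: return the value of every href="..." in the string.
// Only double-quoted values are handled, matching the examples in the question.
function extractHrefs(html) {
  return [...html.matchAll(/\bhref\s*=\s*"([^"]*)"/g)].map(m => m[1]);
}

// Returns an array containing "test.com"
console.log(extractHrefs('<a class="test" href="test.com" title="test">'));
// Returns an empty array instead of throwing
console.log(extractHrefs('no links here'));
```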
Create a temporary DOM element with the string as its HTML content, then iterate over all a tags and replace each one with its link (read from the href attribute).
let html = `<a class="test" href="test.com" title="test">`;
// create a temporary div element
let tempDiv = document.createElement('div');
// set html content as your string
tempDiv.innerHTML = html;
// get all a tags and iterate
tempDiv.querySelectorAll('a').forEach(ele => {
    // replace element with corresponding link
    // (ele.href would give the resolved absolute URL instead)
    ele.replaceWith(ele.getAttribute('href'));
});
// get html content of temporary element
console.log(tempDiv.innerHTML);
Alternatively, you can use DOMParser to parse the HTML content.
let html = `<a class="test" href="test.com" title="test">`;
// parser
let parser = new DOMParser();
// parse the string, which returns a document object
let doc = parser.parseFromString(html, "text/html");
// get all a tags and iterate
doc.querySelectorAll('a').forEach(ele => {
    // replace element with corresponding link
    // (ele.href would give the resolved absolute URL instead)
    ele.replaceWith(ele.getAttribute('href'));
});
// get html content from body
console.log(doc.body.innerHTML);
UPDATE: With a regex you can extract and replace the a tag as follows (not preferred):
var str = '<a class="test" href="test.com" title="test">';
console.log(str.replace(/<a[^>]*href="([^"]+)"[^>]*>(?:.*?<\/a>)?/g, '$1'));
var str1 = '<a class="test" href="test.com" title="test">abc</a>';
console.log(str1.replace(/<a[^>]*href="([^"]+)"[^>]*>(?:.*?<\/a>)?/g, '$1'));
Reference: Using regular expressions to parse HTML: why not?

RegExp: get only the text content of a tag (without inner tags)

I have a string with HTML code:
<h2 class="some-class">
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>
I need to get only the text content of the h2 itself.
I created this regular expression:
(?<=>)(.*)(?=<\/h2>)
But it only works if the h2 has no inner tags. Otherwise I get this:
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
Never use regex to parse HTML; see these famous answers:
Using regular expressions to parse HTML: why not?
RegEx match open tags except XHTML self-contained tags
Instead, create a temporary element with the string as its HTML, then get the content by keeping only its text nodes.
var str = `<h2 class="some-class">
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>`;
// generate a temporary DOM element
var temp = document.createElement('div');
// set content
temp.innerHTML = str;
// get the h2 element
var h2 = temp.querySelector('h2');
console.log(
    // get all child nodes and convert into array
    // for older browsers use [].slice.call(h2...)
    Array.from(h2.childNodes)
        // iterate over the child nodes
        .map(function(e) {
            // if text node (nodeType 3) then return the content,
            // else return an empty string
            return e.nodeType === 3 ? e.textContent.trim() : '';
        })
        // join the string array
        .join('')
    // you can use reduce method instead of map
    // .reduce(function(s, e) { return s + (e.nodeType === 3 ? e.textContent.trim() : ''); }, '')
);
Reference:
Fastest way to convert JavaScript NodeList to Array?
Regex is not good for parsing HTML, but if your HTML is not valid, or you want to use a regex anyway:
(?!>)([^><]+)(?=<\/h2>)
It gets the last text before the closing </h2> tag (if such text exists).
To avoid empty matches, I changed * to +.
This regex is very limited and only fits situations like the one in the question.
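A quick way to try that regex outside a regex tester is to run it against the sample string from the question and trim the surrounding newlines. This only checks the expression as given; it is not a general solution.

```javascript
const str = `<h2 class="some-class">
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>`;
// The regex from the answer: a run of characters that are neither "<" nor ">",
// immediately followed by "</h2>".
const match = str.match(/(?!>)([^><]+)(?=<\/h2>)/);
// The captured text keeps the surrounding newlines, so trim it.
const text = match ? match[1].trim() : null;
console.log(text); // NEED TO GET THIS
```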
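var h2 = document.querySelector('h2');
var h2_clone = h2.cloneNode(true);
// children is a live collection, so removing elements while iterating over it
// would skip some; remove the first child element until none are left instead
while (h2_clone.firstElementChild) {
    h2_clone.firstElementChild.remove();
}
alert(h2_clone.innerText);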

Adding elements using browser console

I'm trying to add an element in a webpage using the browser console.
My element is something like:
<a class="myclass" role="myrole" href="/url.com">
<span class="Label">Hello World</span>
</a>
How can I do this?
document.getElementById('myid').innerHTML = "<a class='myclass' role='myrole' href='/url.com'><span class='Label'>Hello World</span></a>";
You can add a dummy div (or any other tag) with some ID and use this code.
HTML:
<div id="myid">
</div>
function addElem(tagName, text)
{
    var elem = document.createElement(tagName), // element to be created
        textNode = document.createTextNode(text), // text node
        bd = document.body; // get body
    elem.appendChild(textNode); // elem appended with text
    bd.appendChild(elem); // body appended with elem
    return elem;
}
Call it like so: addElem('p', 'text here');
Call it from the console and see :)
You can also try it right here: open your console and run the following (it relies on jQuery being loaded on the page):
var _el = '<a class="myclass" role="myrole" href="/url.com"><span class="Label">Hello World</span></a>';
$('.post-text').append(_el);

Finding out what line number an element in the dom occurs on in Javascript?

I've never heard of this, but is it possible to retrieve a node from the DOM using JS and then find out which line of the file that node occurred on?
I'm open to anything: alternative browsers, plugins/add-ons, etc. It doesn't need to be cross-browser per se.
I would assume this is possible somehow, considering that some JS debuggers can find the line number within a script tag, but I'm not entirely sure.
OK, forgive me for how long this is. I thought this was a very interesting question, but while playing with it I quickly realized that innerHTML and its ilk are quite unreliable with regard to preserving whitespace, comments, etc. With that in mind, I fell back to pulling down a full copy of the source so that I could be absolutely sure I had all of it. I then used jQuery and a few (relatively small) regexes to find the location of each node. It seems to work well, although I'm sure I've missed some edge cases. And, yeah, yeah, regexes and two problems, blah blah blah.
Edit: As an exercise in building jQuery plugins, I've modified my code to work reasonably well as a standalone plugin, with an example similar to the HTML below (which I will leave here for posterity). I've tried to make the code slightly more robust (it now handles tags inside quoted strings, such as in onclick attributes), but the biggest remaining bug is that it can't account for any modifications to the page, such as appended elements. I would probably need to use an iframe instead of an ajax call to handle that case.
<html>
<head id="node0">
<!-- first comment -->
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
<style id="node1">
/* div { border: 1px solid black; } */
pre { border: 1px solid black; }
</style>
<!-- second comment -->
<script>
$(function() {
// fetch and display source
var source;
$.ajax({
url: location.href,
type: 'get',
dataType: 'text',
success: function(data) {
source = data;
var lines = data.split(/\r?\n/);
var html = $.map(lines, function(line, i) {
return ['<span id="line_number_', i, '"><strong>', i, ':</strong> ', line.replace(/</g, '&lt;').replace(/>/g, '&gt;'), '</span>'].join('');
}).join('\n');
// now sanitize the raw html so you don't get false hits in code or comments
var inside = false;
var tag = '';
var closing = {
xmp: '<\\/\\s*xmp\\s*>',
script: '<\\/\\s*script\\s*>',
'!--': '-->'
};
var clean_source = $.map(lines, function(line) {
if (inside && line.match(closing[tag])) {
var re = new RegExp('.*(' + closing[tag] + ')', 'i');
line = line.replace(re, "$1");
inside = false;
} else if (inside) {
line = '';
}
if (line.match(/<(script|!--)/)) {
tag = RegExp.$1;
line = line.replace(/<(script|xmp|!--)[^>]*.*(<(\/(script|xmp)|--)?>)/i, "<$1>$2");
var re = new RegExp(closing[tag], 'i');
inside = ! (re).test(line);
}
return line;
});
// nodes we're looking for
var nodes = $.map([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], function(num) { return $('#node' + num) });
// now find each desired node in both the DOM and the source
var line_numbers = $.map(nodes, function(node) {
var tag = node.attr('tagName');
var tags = $(tag);
var index = tags.index(node) + 1;
var count = 0;
for (var i = 0; i < clean_source.length; i++) {
var re = new RegExp('<' + tag, 'gi');
var matches = clean_source[i].match(re);
if (matches && matches.length) {
count += matches.length;
if (count >= index) {
console.debug(node, tag, index, count, i);
return i;
}
}
}
return count;
});
// saved till end to avoid affecting source html
$('#source_pretty').html(html);
$('#source_raw').text(source);
$('#source_clean').text(clean_source.join('\n'));
$.each(line_numbers, function() { $('#line_number_' + this).css('background-color', 'orange'); });
},
});
var false_matches = [
"<div>",
"<div>",
"</div>",
"</div>"
].join('');
});
</script>
</head>
<!-- third comment -->
<body id="node2">
<div>
<pre id="source_pretty">
</pre>
<pre id="source_raw">
</pre>
<pre id="source_clean">
</pre>
</div>
<div id="node3">
<xmp>
<code>
// <xmp> is deprecated, you should put it in <code> instead
</code>
</xmp>
</div>
<!-- fourth comment -->
<div><div><div><div><div><div><span><div id="node4"><span><span><b><em>
<i><strong><pre></pre></strong></i><div><div id="node5"><div></div></div></div></em>
</b></span><span><span id="node6"></span></span></span></div></span></div></div></div></div></div></div>
<div>
<div>
<div id="node7">
<div>
<div>
<div id="node8">
<span>
<!-- fifth comment -->
<div>
<span>
<span>
<b>
<em id="node9">
<i>
<strong>
<pre>
</pre>
</strong>
</i>
<div>
<div>
<div>
</div>
</div>
</div>
</em>
</b>
</span>
<span>
<span id="node10">
</span>
</span>
</span>
</div>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
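The heart of the plugin is the loop that walks the cleaned source line by line, counting opening tags until it reaches the node's position among all tags with the same name. Pulled out as a standalone function (lineOfNthTag is a hypothetical name), it can be tested without a page. Two deliberate differences from the code above: the pattern ends in \b so that searching for b does not also count <body> or <br>, and it returns -1 instead of the running count when the tag is not found.

```javascript
// Hypothetical helper: return the 0-based index of the line containing the
// n-th (1-based) opening tag named `tagName`, or -1 if there are fewer than n.
// Like the plugin above, this is a plain regex count: it assumes `lines` has
// already had comments and script bodies blanked out.
function lineOfNthTag(lines, tagName, n) {
  const re = new RegExp('<' + tagName + '\\b', 'gi');
  let count = 0;
  for (let i = 0; i < lines.length; i++) {
    const matches = lines[i].match(re);
    if (matches) {
      count += matches.length;
      if (count >= n) return i;
    }
  }
  return -1;
}

const cleanLines = ['<html>', '<body>', '<div><div>', '<b>x</b>', '<div id="t">'];
console.log(lineOfNthTag(cleanLines, 'div', 3)); // 4
```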
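Something like this? (Note that outerHTML is the browser's re-serialization of the DOM rather than the original source, and it leaves out anything before <html>, such as the doctype, so the result is approximate.)
var wholeDocument = document.documentElement.outerHTML;
var findNode = document.getElementById('whatever');
var documentUpToFindNode = wholeDocument.substr(0, wholeDocument.indexOf(findNode.outerHTML));
// match() returns null when there are no newlines; the line number is this count + 1
var nlsUpToFindNode = (documentUpToFindNode.match(/\n/g) || []).length;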
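The same idea works on the raw source instead of the re-serialized DOM, for example on text fetched with an ajax call as in the first answer. A minimal sketch, with lineOfSnippet as a hypothetical name: find the first occurrence of a snippet and count the lines before it, which already includes the +1 because line numbers start at 1. It only finds the first occurrence, and the snippet has to match the source text exactly.

```javascript
// Hypothetical helper: 1-based line number of the first occurrence of
// `snippet` in `source`, or -1 if it does not occur.
function lineOfSnippet(source, snippet) {
  const index = source.indexOf(snippet);
  if (index === -1) return -1;
  // Splitting the text before the match on "\n" gives one piece per line.
  return source.slice(0, index).split('\n').length;
}

const page = '<html>\n<body>\n<p id="whatever">hi</p>\n</body>\n</html>';
console.log(lineOfSnippet(page, '<p id="whatever">')); // 3
```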
This can be done. Start by getting the highest node in the document like this:
var htmlNode = document.getElementsByTagName('html')[0];
var node = htmlNode;
while (node.previousSibling !== null) {
    node = node.previousSibling;
}
var firstNode = node;
(this code was tested and retrieved both the doctype node as well as comments above the html node)
Then loop through all nodes (both siblings and children). Older versions of IE drop whitespace-only text nodes, so it's best to use Firefox, Chrome or something similar (you said it wouldn't have to be cross-browser).
When you get to each text node, count the newline characters in it.
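A minimal sketch of that traversal: a depth-first, pre-order walk that adds up the newlines in every text and comment node it passes before reaching the target. The function name estimateLineOf is hypothetical. It only relies on nodeType, nodeValue and childNodes, so in a page you could call it as estimateLineOf(document, document.getElementById('whatever')); the result is an estimate, because newlines inside the tags themselves, or whitespace the parser discards, are not counted.

```javascript
// Hypothetical helper: estimate the 1-based source line of `target` by
// counting "\n" characters in text (nodeType 3) and comment (nodeType 8)
// nodes that come before it in document order.
// Returns -1 if `target` is not found under `root`.
function estimateLineOf(root, target) {
  let count = 0;
  let found = false;
  function walk(node) {
    if (node === target) {
      found = true;
      return;
    }
    if (node.nodeType === 3 || node.nodeType === 8) {
      count += (node.nodeValue.match(/\n/g) || []).length;
    }
    for (const child of node.childNodes || []) {
      walk(child);
      if (found) return;
    }
  }
  walk(root);
  return found ? count + 1 : -1;
}

// Plain objects standing in for DOM nodes, so this runs without a browser.
const target = { nodeType: 1, childNodes: [] };
const doc = {
  nodeType: 9,
  childNodes: [
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: '\n  hello\n', childNodes: [] }] },
    { nodeType: 8, nodeValue: ' a\ncomment ', childNodes: [] },
    target,
  ],
};
console.log(estimateLineOf(doc, target)); // 4
```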
You could try:
- start at the 'whatever' node,
- traverse each previous node back to the beginning of the document, concatenating the HTML of each node,
- then count the newlines in the collected HTML.
Post the code once you've worked it out, because that's a good question :)
