Converting html page represented as text to dom object

I have text that represents an HTML page. I need to convert this text to a DOM object, extract the body element, and append it to my DOM.
I used the following code to convert the text and extract the body element:
$('body', $(text)).length
and:
$(text).filter('body').length
In both cases it returns 0...
To test: http://jsfiddle.net/wEyvr/1/

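jQuery does not parse a whole HTML document the way a browser does: $(html) builds the nodes inside a temporary container element, so the html, head and body tags are dropped and only their contents remain. That is why neither selector finds a body element.
You can extract the content of the body tag with a regular expression and work from there: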
// get the content of the body tags
var body = $(text.match(/<body[\s\S]*?>([\s\S]*?)<\/body>/i)[1]);
// append the content to our DOM
body.appendTo('body');
// bonus - to be able to fully use find -> we need to add single parent
var findBody = $("<body />").html(body.clone());
// now we are able to use selectors and have fun
findBody.find("div.cls").appendTo('body');
HERE is the working code.
EDIT: Changed the code to show both direct append and also using selectors.
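The one-liner above throws when the text has no body element, because match then returns null. A minimal sketch of the same regex idea with a null check; extractBodyHtml is a hypothetical helper name, and the pattern assumes that no attribute value on the body tag contains a > character:

```javascript
// Returns the markup between <body ...> and </body>,
// or null when the text contains no body element.
function extractBodyHtml(text) {
  var match = /<body\b[^>]*>([\s\S]*?)<\/body>/i.exec(text);
  return match ? match[1] : null;
}
```

The result can then be handed to jQuery as before, for example $(jQuery.parseHTML(content)).appendTo('body') once content has been checked for null.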

You can let the browser parse the text by writing it into a temporary iframe, something like this:
var ifr = $("<iframe>"),
doc = ifr.appendTo("body")[0].contentWindow.document,
bodyLength;
doc.open();
doc.write(text);
doc.close();
bodyLength = ifr.contents().find("body").length;
ifr.remove();
alert(bodyLength);
http://jsfiddle.net/wEyvr/2/
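An alternative to the temporary iframe, in browsers that provide DOMParser, is to parse the text into a separate document and read its body. A minimal sketch under that assumption; parseBody and appendParsedBody are hypothetical names, and scripts in the text are not executed while it is parsed this way:

```javascript
// Parses a full HTML page given as text and returns its <body> element.
// Assumes a browser environment where DOMParser is available.
function parseBody(text) {
  var doc = new DOMParser().parseFromString(text, "text/html");
  return doc.body;
}

// Moves every child of the parsed body into `target`,
// an element of the current page.
function appendParsedBody(text, target) {
  var body = parseBody(text);
  while (body.firstChild) {
    // adoptNode detaches the node from the parsed document first
    target.appendChild(document.adoptNode(body.firstChild));
  }
}
```

In the question's terms, appendParsedBody(text, document.body) would append the page's body content to the current DOM.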

Related

Find And Change Element In a Parsed HTML DOM

I am getting an HTML string in response to an ajax request. It is a large HTML string with a lot of hierarchical child nodes.
I parse it using jQuery.parseHTML() to convert it into a DOM. Now I want to change the content of a child node with a certain ID and then regenerate the HTML.
The problem is that whenever I use a jQuery method to select a DOM element and make the change, it returns only that particular node, and jQuery.html() then serializes only that node.
I have tried the following code samples:
var parsedHTML = jQuery.parseHTML( 'htmlstring' );
jQuery(parsedHTML).find('#element-id').text('changed text').html();
or
jQuery(parsedHTML).filter('#element-id').text('changed text').html();
The problem is that it only returns span#element-id, and when html() is applied, the generated HTML contains only the span's text.
How can I change the specific node and then generate the complete HTML again?

Don't chain (or if you do, use end, but simpler really just not to). By chaining, you're saying you only want the HTML of the last set of elements in the chain:
var elements = jQuery(parsedHTML);
elements.filter('#element-id').text('changed text');
var html = elements.html();
But elements.html() will only give you the inner HTML of the first element. To get the full HTML string again, you need to get the outer HTML of each element and join them together:
var html = elements.map(function() {
return this.outerHTML;
}).get().join("");
Note that your use of filter assumes the element is at the top level of the HTML string. If it is, great, that's fine. If it isn't, you'll want find instead.
Example with filter:
var parsedHTML = jQuery.parseHTML(
"<span>no change</span>" +
"<span id='element-id'>change me</span>" +
"<span>no change</span>"
);
var elements = jQuery(parsedHTML);
elements.filter('#element-id').text('changed text');
console.log(elements.map(function() {
return this.outerHTML;
}).get().join(""));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Example with find:
var parsedHTML = jQuery.parseHTML(
"<span>no change</span>" +
"<div>the span is in here<span id='element-id'>change me</span></div>" +
"<span>no change</span>"
);
var elements = jQuery(parsedHTML);
elements.find('#element-id').text('changed text');
console.log(elements.map(function() {
return this.outerHTML;
}).get().join(""));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
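One caveat with the map/outerHTML approach: jQuery.parseHTML also returns text nodes, which have no outerHTML, and jQuery's map drops undefined results, so text that sits at the top level of the string (outside any element) is lost. A hedged sketch of a serializer that keeps such text; nodesToHtml and escapeText are hypothetical helper names, and comment nodes are deliberately skipped:

```javascript
// Escapes the characters that would otherwise be read as markup.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// nodes: an array (or array-like) of DOM nodes, such as the
// result of jQuery.parseHTML().
function nodesToHtml(nodes) {
  return Array.prototype.map.call(nodes, function (node) {
    if (node.nodeType === 1) {   // element node
      return node.outerHTML;
    }
    if (node.nodeType === 3) {   // text node
      return escapeText(node.nodeValue);
    }
    return "";                   // comments and other node types are skipped
  }).join("");
}
```

With the examples above it would be called as nodesToHtml(elements.get()) after the change has been made.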

How to use regex to replace text between tags

I'd like to replace some text in a string that represents a div tag that may or may not also include style and class attributes. For example,
var s = "<div style='xxx' class='xxx'>replaceThisText</div>";
If it were just the tag, I believe I could just do this:
s = s.replace(/<div>[\s\S]*?<\/div>/, '<div>' + newText + '</div>');
But how do I take the attributes into account?

Create a temporary element with your string as its HTML content, get the div inside it, and update that div's content. Then read the HTML back from the temporary element.
var s = "<div style='xxx' class='xxx'>replaceThisText</div>";
// create a temporary div element
var temp = document.createElement('div');
// set content as string
temp.innerHTML = s;
// get div within the temporary element
// and update the content within the div
temp.querySelector('div').innerHTML = 'newText';
// get back the current HTML content in the
// temporary div element
console.log(temp.innerHTML)
Why not regex?
RegEx match open tags except XHTML self-contained tags
Using regular expressions to parse HTML: why not?

A regex is never a good choice for parsing HTML content.
Consider the following short solution using the DOMParser object (for browsers that support DOMParser; see the compatibility table):
var s = "<div style='xxx' class='xxx'>replaceThisText</div>",
tag = (new DOMParser()).parseFromString(s, 'text/html').querySelector('.xxx');
tag.textContent = 'newText'; // replacing with a new text
console.log(tag.outerHTML); // outputs the initial tag representation with replaced content
https://developer.mozilla.org/ru/docs/Web/API/DOMParser

String filtering: need to remove the <style> tag and its contents and keep only the contents in <body>

In our project, we are getting a response from the DB. We are using the same string in two ways.
We have to display the text part alone in one line
We are putting the entire content as an HTML.
We are getting a response similar to this.
"<html><head><title>SomeTitle</title></head><style>a.hover{color:green}cc.a{color:red},pq.a{text-decoration:underline}</style> <body> Some content </body></html>"
I need to get only the content of the body using string manipulation. I need to filter out the contents of all the other tags as well.
For example
Final result should be
Some content
I used text() in some cases, but at times the content inside <style> is also displayed, which is not acceptable.
Note: There are times where I don't get so there should be a check for that as well.
Is there any solution for this?
At times we are getting <style> inside body as well. Is there any way to remove that part?
for example
var str = "<html><head><title>SomeTitle</title></head><style>a.hover{color:green}cc.a{color:red},pq.a{text-decoration:underline}</style> <body> <style>.hello12{color:green}</style>Some content </body></html>";
and I should get just "Some content"

Use DOMParser and read the text content of the body tag: querySelector returns the body element, and its textContent property holds the text.
var str = "<html><head><title>SomeTitle</title></head><style>a.hover{color:green}cc.a{color:red},pq.a{text-decoration:underline}</style> <body> Some content </body></html>";
var parser = new DOMParser();
var doc = parser.parseFromString(str, "text/html");
console.log(
doc.querySelector('body').textContent
)
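FYI: innerText skips script and style content only for rendered elements. A document created by DOMParser is not rendered, so there innerText returns the same text as textContent; remove the style and script elements from the parsed document before reading the text.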
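One way to drop style and script content regardless of which text property is read is to remove those elements from the parsed document first. A minimal sketch, assuming a browser environment with DOMParser; the function name bodyTextWithoutStyles is hypothetical:

```javascript
// Returns the text of the body with <style> and <script> content removed.
function bodyTextWithoutStyles(html) {
  var doc = new DOMParser().parseFromString(html, "text/html");
  // querySelectorAll returns a static list, so removing while looping is safe
  var unwanted = doc.querySelectorAll("style, script");
  for (var i = 0; i < unwanted.length; i++) {
    unwanted[i].parentNode.removeChild(unwanted[i]);
  }
  // trim the whitespace that surrounded the removed elements
  return doc.body ? doc.body.textContent.trim() : "";
}
```

Because the parser always builds a body element for text/html input, the same call also covers responses that arrive without style tags or without an explicit body tag.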

Extracting text from a HTML to be stored as a JS variable, then to be added to a separate HTML's element

Alright, I have seen other questions with similar titles, but they don't do exactly what I'm asking.
I have two HTML documents, one containing my page and one containing an element with a paragraph of text in it, as well as a separate .js file.
What I want to do is extract this text, store it as a JS variable, and then use jQuery to edit the contents of an element within the main page. This is what I came up with, but it didn't work as expected. I'm not sure whether I am making a syntax error or using the wrong code completely:
$(document).ready(function(){
var c1=(#homec.substring(0))
// #homec is the container of the text i need
$(".nav_btn #1").click(function(c1){
$(".pcontent span p") .html(+c1)}
);
});
I know +c1 is most probably wrong, but I have been struggling to find the syntax for this one. Thank you in advance :D

var c1=(#homec.substring(0)) throws a syntax error because #homec is not valid JavaScript: it is a CSS selector, not a variable, so there is nothing to call substring on. To get the html of an element with an id of homec, use the html method:
var c1 = $("#homec").html();
c1 should not be an argument of the click function because it is defined in the parent scope. +c1 is unnecessary because you do not need to coerce c1 to a number.
If you are trying to add content to the end of the paragraph, use the append method:
$(".pcontent span p").append(c1)
That means you should use this code instead:
$(document).ready(function() {
var c1 = $("#homec").html();
$(".nav_btn #1").click(function() {
$(".pcontent span p").append(c1)
});
});
P.S. IDs that start with a digit are not valid in HTML 4 (HTML5 allows them), and #1 is not a valid CSS selector without escaping. Browsers support it, so it won't make anything go awry here, but your pages may not validate.

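Try this (with c1 defined in the enclosing scope rather than taken as the handler's argument, which would be the event object):
$(".nav_btn #1").click(function(){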
var para = $(".pcontent span p");
para.html(para.html() + c1);
});

The jQuery text() function gets the combined text contents of each element in the set of matched elements, including their descendants. You can then use the text(value) function to set the text content of your target paragraph element. Something like this should suffice:
$(document).ready(function() {
var c1 = $("#homec").text();
$(".nav_btn #1").click(function() {
$(".pcontent span p").text(c1);
});
});
See the jQuery documentation for more details on the text() function. If you need to capture the full structure of the other document, then try the html() function instead.

Extracting Metadata from Website

I was wondering whether there is a way in JavaScript to process HTML source code and take out the specific tags that I want.
Sorry if it sounds easy or too simple. I am new to programming.

If you have the HTML in a string, then you can use:
var str = '<html></html>'; // your html text goes here
var div = document.createElement('div');
div.innerHTML = str; // html, head and body tags in str are dropped; their contents are kept
var dom = div.firstChild; // first parsed node (null if nothing was parsed);
// use div itself to reach every parsed node with standard DOM methods
Alternately, use jQuery. jQuery is a library to help you manipulate and access HTML elements more easily. First, add this to the head of your document:
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js"></script>
This is a reference to the jQuery library. Then, wrap your html in a container element so that every parsed node has a single parent:
var foo = $("<div>").html("<p>Your html here</p>");
Or, if your html is in a variable (e.g. str), you can do:
var foo = $("<div>").html(str);
Then, you can manipulate and parse foo in a number of ways. For example, to remove all paragraph elements, you would use
foo.find('p').remove();
Or, to remove the paragraph element with id="bar", use:
foo.find('p#bar').remove();
Once you are done with your modifications, you can get the new html text using:
foo.html();
Why is your html in a string? Is it not the html of the current page?

Use the DOM: it can pull data out of a web page if you know the page's structure.
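As an illustration of pulling metadata out of HTML text with the DOM, here is a minimal sketch that assumes a browser environment with DOMParser and a page that uses ordinary meta tags with name and content attributes; extractMeta is a hypothetical name:

```javascript
// Collects the page title and every <meta name="..." content="...">
// pair from an HTML string into a plain object.
function extractMeta(html) {
  var doc = new DOMParser().parseFromString(html, "text/html");
  var result = { title: doc.title };
  var metas = doc.querySelectorAll("meta[name]");
  for (var i = 0; i < metas.length; i++) {
    result[metas[i].getAttribute("name")] = metas[i].getAttribute("content");
  }
  return result;
}
```

Other tags can be taken out the same way by changing the selector passed to querySelectorAll.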
