HTML web page string into array or JSON in JS - javascript

Greetings!
Is it possible to convert an HTML string to an array or JSON using Javascript?
Something like this:
var stringweb = '<html><head>hi</head><body>my body</body></html>';
And as result, I can have this:
var myarray = {[html,
[head,
[hi]
]
[etc...]
]}
Thanks in advance! :)

As you can tell from the comments above, this doesn't seem like the most robust idea... Anyhow, here is a solution that I think gets you what you asked for. It was fun to write, anyhow.
function htmlStringToArray(str) {
var temp = document.createElement('iframe');
temp.style.display = "none";
document.body.appendChild(temp);
var doc = temp.contentWindow.document;
doc.open();
doc.write(str);
doc.close();
var array = htmlNodeToArray(doc.documentElement);
temp.parentNode.removeChild(temp);
return array;
}
function htmlNodeToArray(node) {
if (node.nodeType == 1) {
var array = [node.tagName];
if (node.childNodes.length) {
for (var i=0, child; child = node.childNodes[i]; i++) {
if (child.nodeType == 1 || child.nodeType == 3) {
array.push(htmlNodeToArray(child));
}
}
} else if (node.innerText) {
array.push([node.innerText]);
}
return array;
} else if (node.nodeType == 3) {
return [node.nodeValue];
}
}
I tried it out in the latest chrome, firefox and IE. Here it is running on jsbin: http://jsbin.com/uqize3/7/edit
BTW your HTML string is invalid. Browsers will move "hi" from inside the <head> into the <body>. I assumed you intended to have a <title> in there.

You can do that in JavaScript, because JavaScript is a sufficiently expressive language as to allow just about anything. However, it's not going to be particularly easy: you're going to have to implement (or find) as complete an HTML parser as is necessary to recognize the particular HTML documents that you want to convert. HTML itself is pretty complicated, and that complexity is greatly magnified by the fact that most of the world's stock of existing HTML documents are badly erroneous. Thus, if you've got well-constrained HTML that you know to be valid, or at least consistently invalid, that might make the task a little easier.
edit — #Hemlock points out, quite wisely, that if you're doing this in a browser (that is, if this code is going to run from inside a web page served to browsers), then you've got it a lot easier. You can hand your HTML over to the browser, perhaps as the content document for an <iframe> element you add to the page. If it's not too awful for the browser to parse (and browsers can cope with surprisingly weird HTML), then once the DOM is ready in the <iframe> you can just walk the DOM and generate whatever sort of different representation you want.

Related

Is there a more efficient way to replace multiple strings in a html?

first of all please excuse my bad title. Don't know how to name it.
This is a student project. My task is to explore/examine many, fairly large, texts in a web app.
But this is not important, that's anyway the way how it needs to be done.
The strings that needs to be replaced are stored in a json file. I'm iterating throw the file with
$.each(stringList, function(index, value) {
var isInContent = $('#myContent').text().indexOf(index) > -1;
if (isInContent) {... replaceString(index); ...}
}
And to replace the string I use
$('#myContent :not(script)').contents().filter(function() {
return this.nodeType === 3;
}).replaceWith(function() {
return this.nodeValue.replace(index, myNewString);
});
It works as it should. But it's a bit slow. I think because it's loading for every replace the $('#content') and writes the whole $('#content') after the replace back to the html.
Is there a more efficient way to do it?
Thank you

createElement() vs. innerHTML When to use?

I have some data in a sql table. I send it via JSON to my JavaScript.
From there I need to compose it into HTML for display to the user by 1 of 2 ways.
By composing the html string and inserting into .innerHTML property of the holding element
By using createElment() for each element I need and appending into the DOM directly
Neither of the questions below gives a quantifiable answer.
From first answer in first link, 3rd Reason ( first two reasons stated don't apply to my environment )
Could be faster in some cases
Can someone establish a base case of when createElement() method is faster and why?
That way people could make an educated guess of which to use, given their environment.
In my case I don't have concerns for preserving existing DOM structure or Event Listeners. Just efficiency ( speed ).
I am not using a library regarding the second link I provided. But it is there for those who may.
Research / Links
Advantages of createElement over innerHTML?
JavaScript: Is it better to use innerHTML or (lots of) createElement calls to add a complex div structure?
Adding to the DOM n times takes n times more time than adding to the DOM a single time. (:P)
This is the logic I'm personally following.
In consequence, when it is about to create, for instance, a SELECT element, and add to it several options, I prefer to add up all options at once using innerHTML than using a createElement call n times.
This is a bit the same as to compare BATCH operation to "one to one"... whenever you can factorise, you should!
EDIT: Reading the comments I understand that there's a feature (DOM DocumentFragment) that allow us saving such overhead and at the same time taking advantage of the DOM encapsulation. In this case, if the performances are really comparable, I would definitely not doubt and chose the DOM option.
I thought I read somewhere that the createElement and appendElement is faster. It makes sense, considering document.write() and innerHTML have to parse a string, and create and append the elements too. I wrote a quick test to confirm this:
<html>
<body>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
function inner() {
var test = '';
for (var i=0; i<10000; i++) {
test += '<p>bogus link with some other <strong>stuff</strong></p>';
}
console.time('innerHTML');
document.getElementById('test').innerHTML = test;
console.timeEnd('innerHTML');
}
function jq() {
var test = '';
for (var i=0; i<10000; i++) {
test += '<p>bogus link with some other <strong>stuff</strong></p>';
}
console.time('jquery');
jQuery('#test').html(test);
console.timeEnd('jquery');
}
function createEl() {
var dest = document.getElementById('test');
console.time('createel');
//dest.innerHTML = '';//Not IE though?
var repl = document.createElement('div');
repl.setAttribute('id','test');
for (var i=0; i<10000; i++) {
var p = document.createElement('p');
var a = document.createElement('a');
a.setAttribute('href','../'); a.setAttribute('target','_blank');
a.appendChild(document.createTextNode("bogus link"));
p.appendChild(a);
p.appendChild(document.createTextNode(" with some other "));
var bold = document.createElement('strong');
bold.appendChild(document.createTextNode("stuff"));
p.appendChild(bold);
repl.appendChild(p);
}
dest.parentNode.replaceChild(repl,dest);
console.log('create-element:');
console.timeEnd('createel');
}
</script>
<button onclick="inner()">innerhtml</button>
<button onclick="jq()">jquery html</button>
<button onclick="createEl()">Create-elements</button>
<div id="test">To replace</div>
</body>
</html>
In this example, the createElement - appendChild method of writing out HTML works significantly faster than innerHTML/jQuery!

Efficiently replacing strings within an HTML block in Javascript

I am using Javascript(with Mootools) to dynamically build a large page using HTML "template" elements, copying the same template many times to populate the page. Within each template I use string keywords that need to be replaced to create the unique IDs. I'm having serious performance issues however in that it takes multiple seconds to perform all these replacements, especially in IE. The code looks like this:
var fieldTemplate = $$('.fieldTemplate')[0];
var fieldTr = fieldTemplate.clone(true, true);
fieldTr.removeClass('fieldTemplate');
replaceIdsHelper(fieldTr, ':FIELD_NODE_ID:', fieldNodeId);
parentTable.grab(fieldTr);
replaceIdsHelper() is the problem method according to IE9's profiler. I've tried two implementations of this method:
// Retrieve the entire HTML body of the element, replace the string and set the HTML back.
var html = rootElem.get('html').replace(new RegExp(replaceStr, 'g'), id);
rootElem.set('html', html);
and
// Load the child elements and replace just their IDs selectively
rootElem.getElements('*').each(function(elem) {
var elemId = elem.get('id');
if (elemId != null) elemId = elemId.replace(replaceStr, id);
elem.set('id', elemId)
});
However, both of these approaches are extremely slow given how many times this method gets called(about 200...). Everything else runs fine, it's only replacing these IDs which seems to be a major performance bottleneck. Does anyone know if there's a way to do this efficiently, or a reason it might be running so slow? The elements start hidden and aren't grabbed by the DOM until after they're created so there's no redrawing happening.
By the way, the reason I'm building the page this way is to keep the code clean, since we need to be able to create new elements dynamically after loading as well. Doing this from the server side would make things much more complicated.
I'm not 100% sure, but it sounds to me that the problem is with the indexing of the dom tree.
First of all, do you must use ids or can you manage with classes? since you say that the replacement of the id is the main issue.
Also, why do you clone part of the dom tree instead of just inserting a new html?
You can use the substitute method of String (when using MooTools), like so:
var template = '<div id="{ID}" class="{CLASSES}">{CONTENT}</div>';
template.substitute({ID: "id1", CLASSES: "c1 c2", CONTENT: "this is the content" });
you can read more about it here http://mootools.net/docs/core/Types/String#String:substitute
Then, just take that string and put it as html inside a container, let's say:
$("container_id").set("html", template);
I think that it might improve the efficiency since it does not clone and then index it again, but I can't be sure. give it a go and see what happens.
there are some things you can do to optimise it - and what #nizan tomer said is very good, the pseudo templating is a good pattern.
First of all.
var fieldTemplate = $$('.fieldTemplate')[0];
var fieldTr = fieldTemplate.clone(true, true);
you should do this as:
var templateHTML = somenode.getElement(".fieldTemplate").get("html"); // no need to clone it.
the template itself should/can be like suggested, eg:
<td id="{id}">{something}</td>
only read it once, no need to clone it for every item - instead, use the new Element constructor and just set the innerHTML - notice it lacks the <tr> </tr>.
if you have an object with data, eg:
var rows = [{
id: "row1",
something: "hello"
}, {
id: "row2",
something: "there"
}];
Array.each(function(obj, index) {
var newel = new Element("tr", {
html: templateHTML.substitute(obj)
});
// defer the inject so it's non-blocking of the UI thread:
newel.inject.delay(10, newel, parentTable);
// if you need to know when done, use a counter + index
// in a function and fire a ready.
});
alternatively, use document fragments:
Element.implement({
docFragment: function(){
return document.createDocumentFragment();
}
});
(function() {
var fragment = Element.docFragment();
Array.each(function(obj) {
fragment.appendChild(new Element("tr", {
html: templateHTML.substitute(obj)
}));
});
// inject all in one go, single dom access
parentTable.appendChild(fragment);
})();
I did a jsperf test on both of these methods:
http://jsperf.com/inject-vs-fragment-in-mootools
surprising win by chrome by a HUGE margin vs firefox and ie9. also surprising, in firefox individual injects are faster than fragments. perhaps the bottleneck is that it's TRs in a table, which has always been dodgy.
For templating: you can also look at using something like mustache or underscore.js templates.

How to find a ModalPopupExtender in a JavaScript?

I need to write a Javascript function that run from Master page, to find a ModalPopup in the contenct page and close it. Following code works, but not what I want. I need use something like mpeEditUser.ClientID, but I got an error. Also, it would be nice if I could find a ModalPopup without knowing its id, by its type (ModalPopupExtender) instead. Any suggestion?
function CloseModalPopup() {
var mpu = $find('ctl00_ContentPlaceHolder1_mpeEditUser');
mpu.hide();
}
Here is my solution: (If you see any problem, please let me know. Thanks)
I get the ModalPopup id in the codebehind, and pass it to my javascript function.
In the Page_Load of the default.master.cs:
ContentPlaceHolder cph = (ContentPlaceHolder)FindControl("ContentPlaceHolder1");
string sMpeID = (AjaxControlToolkit.ModalPopupExtender)cph.FindControl("mpeEditUser");
In my Javascript function:
var mpe = $find('<%=sMpeID%>');
if (mpe != null) {
mpe.hide();
}
Its likely the tag is getting mucked up by being called through another page, this happened to me. I don't know the best fix for you, however the way I addressed the issue was to first find the mpe through a javascript function that looked for a vague match out of all of the elements on the page.
var elemets = document.getElementsByTagName("*");
var mpe;
for (var i = 0; i < elemets.length; i++) {
var id = elemets[i].id
if (id.indexOf("mpe") >= 0) {
mpe = elemets[i];
}
}
If you have more then one mpe on the page you may want to match more if the string. For me the elements function only returned about 50 elements, so it was not too much overhead. That may not be the case for you, but even if you dont use this function in the final product it will assist you in discovering the actual ID of the elment.

How to get the page's full content as string in javascript?

I'm writing a bookmarklet, i.e. a bookmark that contains javascript instead of a URL, and I have some trouble. In fact, I cannot remember how I can get the content of the page as a string, so I can apply a regular expression to find what I want. Can you please help me on this?
Before anyone suggests it, I cannot use getElementBy(Id/Name/Tag), because the data I'm looking for is HTML-commented and inside markups, so I don't think that would work.
Thanks.
You can access it through:
document.body.innerHTML
so I can apply a regular expression to find what I want
Do. Not. Use. Regex. To. Parse. HTML.
Especially when the browser has already parsed it for you! Come ON!
the data I'm looking for is HTML-commented
You can perfectly well grab comment content out of the DOM. eg.
<div id="mything"><!-- la la la I'm a big comment --></div>
alert(document.getElementById('mything').firstChild.data);
And if you need to search the DOM for comment elements:
// Get comment descendents
//
function dom_getComments(parent, recurse) {
var results= [];
for (var childi= 0; childi<parent.childNodes.length; childi++) {
var child= parent.childNodes[childi];
if (child.nodeType==8) // Node.COMMENT_NODE
results.push(child);
else if (recurse && child.nodeType==1) // Node.ELEMENT_NODE
results= results.concat(dom_getComments(child));
}
return results;
}

Categories