In its content attribute the blogger API returns an ugly blob of HTML. I would like to convert this HTML string data into a dom that I can parse. What is the best way to parse this text in order that I can re-render within a js widget I'm building for another website?
I'd rather not write my own parser that reverse engineers the HTML encoding that Google put into place. I'm ideally looking for a library which undoes the HTML escaping and then turns it into a dom which I can inspect with JQuery.
Apparently this question was based on some slightly false premises. I have since managed to embed blogs in my website successfully. I have been using AngularJS, which escapes HTML by default before embedding it into the DOM. This caused a lot of confusion on my side. The response from Google is not escaped.
This means parsing it as a DOM is simply a matter of calling jQuery.parseHTML(). See: http://api.jquery.com/jquery.parsehtml/
Once this is done, whatever jQuery transformations need to be made can be made using AngularJS's jqLite by calling angular.element('').
Finally, the object can be bound to the document.
Alternatively, the raw content of the list of blog posts can be injected as an html string the regular angular way using something like this:
$scope.frontPagePosts = posts.map(function (post) {
    // Mark the content as trusted so that ng-bind-html renders it as HTML.
    post.content = $sce.trustAsHtml(post.content);
    return post;
});
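To illustrate the confusion described above, here is a minimal sketch of what escaping does to a markup string. The function escapeHtml and the sample content are made up for illustration; this is not AngularJS's actual implementation, only an approximation of the visible effect of binding HTML as plain text.

```javascript
// Illustrative only: approximates the visible effect of text-only binding.
// This is NOT AngularJS's real code; the function name is hypothetical.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical post content, unescaped, as the API might return it.
var content = '<p>Hello <b>world</b></p>';

// After escaping, the browser shows the tags as literal text
// instead of rendering a paragraph with a bold word.
console.log(escapeHtml(content));
```

If the page shows raw tags like this, the data itself is probably fine and the escaping is happening at binding time, which is what the $sce.trustAsHtml approach addresses.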
Related
I would like to save a string like "<div class='some_class'>some content</div>", in a mongo document, and later, when I fetch this content, I want to convert it into a DOM node.
How can I edit the content before adding it to the DOM and convert it to a DOM node using meteor?
As suggested by @blaze-sahlzen, you can use Cheerio to manipulate the HTML. But after all, it's just a string that you can also modify with the usual methods.
To show the HTML, I suggest you look into {{{triple braces}}} - they can be used to include raw HTML within your page. You should use these cautiously and make sure the HTML content is safe and free from syntax errors. Be very sure to sanitize (escape/clean) the HTML before you store it in mongo!
If it is a whole template that you want to render, there is an API for it here.
I am using HtmlUnit to read content from a web site.
Everything works perfectly to the point where I am reading the content with:
HtmlDivision div = page.getHtmlElementById("my-id");
Even div.asText() returns the expected String object, but I want to get the original HTML inside <div>...</div> as a String object. How can I do that?
I am not willing to change HtmlUnit to something else, as the web site expects the client to run JavaScript, and HtmlUnit seems to be capable of doing what is required.
If by original HTML you mean the HTML code that HtmlUnit has already formatted, then you can use div.asXml(). If you really are looking for the original HTML the server sent you, then you won't find a way to get it (at least up to v2.14).
Now, as a workaround, you could get the whole text of the page that the server sent you with this answer: How to get the pure raw HTML of a page in HTMLUnit while ignoring JavaScript and CSS?
As a side note, you should probably think twice about why you need the HTML code. HtmlUnit will let you get the data from the code, so there shouldn't be any need to store the source code rather than the information contained in it. Just my 2 cents.
Currently I am creating a website which is completely JS driven. I don't use any HTML pages at all (except index page). Every query returns JSON and then I generate HTML inside JavaScript and insert into the DOM. Are there any disadvantages of doing this instead of creating HTML file with layout structure, then loading this file into the DOM and changing elements with new data from JSON?
EDIT:
All of my pages are loaded with AJAX calls. But I have a structure like this:
<nav></nav>
<div id="content"></div>
<footer></footer>
Basically, I never change nav or footer elements, they are only loaded once, when loading index.html file. Then on every page click I send an AJAX call to the server, it returns data in JSON and I generate HTML code with jQuery and insert like this $('#content').html(content);
Creating separate HTML files, and then for example using $('#someID').html(newContent) to change every element with JSON data, will use even more code and I will need 1 more request to server to load this file, so I thought I could just generate it in browser.
EDIT2:
SEO is not very important, because my website requires logging in so I will create all meta tags in index.html file.
In general, it's a nice way of doing things. I assume that you're updating the page with AJAX each time (although you didn't say that).
There are some things to look out for. If you always have the same URL, then your users can't come back to the same page. And they can't send links to their friends. To deal with this, you can use history.pushState() to update the URL without reloading the page.
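A minimal sketch of the history.pushState() idea, assuming a hypothetical showPage function and a URL scheme of one path segment per page (neither comes from the question). The history object is passed in as a parameter so the logic can be tried outside a browser; in a page you would pass window.history.

```javascript
// `historyApi` is normally window.history. Function names and the URL scheme
// are assumptions made for this sketch.
function showPage(historyApi, pageName, render) {
  var url = '/' + encodeURIComponent(pageName);
  // pushState(state, unusedTitle, url) adds a history entry without reloading.
  historyApi.pushState({ page: pageName }, '', url);
  render(pageName);
  return url;
}

// Re-renders the stored page when the user presses Back or Forward.
function handlePopState(event, render) {
  if (event.state && event.state.page) {
    render(event.state.page);
  }
}

// Browser wiring; skipped when there is no window (for example under Node).
if (typeof window !== 'undefined' && window.addEventListener) {
  window.addEventListener('popstate', function (event) {
    handlePopState(event, function (page) {
      console.log('render', page); // placeholder for the real rendering code
    });
  });
}
```

For shared links to work, the server also has to return the index page for these URLs, and the script has to render the right page from the URL on first load.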
Also, if you're sending more than one request per page and you don't have an HTML structure waiting for them, you may get them back in a different order each time. It's not a problem, just something to be aware of.
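If the order does matter for a particular page, one option is to store each response in a fixed slot and render only once everything has arrived. The sketch below uses a hypothetical helper, createOrderedCollector; the slot index for each request is something you would assign yourself when sending it.

```javascript
// Collects `count` responses that may arrive in any order and hands them
// to `onComplete` in slot order. The helper name is made up for this sketch.
function createOrderedCollector(count, onComplete) {
  var results = new Array(count);
  var received = new Array(count).fill(false);
  var remaining = count;

  return function receive(index, data) {
    // Ignore out-of-range slots and duplicate deliveries.
    if (index < 0 || index >= count || received[index]) {
      return;
    }
    received[index] = true;
    results[index] = data;
    remaining -= 1;
    if (remaining === 0) {
      onComplete(results);
    }
  };
}

// Usage: responses arrive as 2, 0, 1 but are rendered as 0, 1, 2.
var receive = createOrderedCollector(3, function (parts) {
  console.log(parts.join(' | '));
});
receive(2, 'footer data');
receive(0, 'header data');
receive(1, 'body data');
```

Each AJAX success callback would call receive with its own slot number, so the rendering code never depends on arrival order.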
Returning HTML from the AJAX is a bad idea. It means that when you want to change the layout of the page, you need to edit all of your files. If you're returning JSON, it's much easier to make changes in one place.
One thing that definitely matters:
How long will it take you to develop a new system that sends data as JSON, plus the JS required to inject it as HTML into the page?
How long will it take to just return HTML? And how long if you can reuse some of your existing server-side code?
Also check how much server-side interaction your pages involve.
Also, some advantages of creating pure HTML:
1) It's simple markup, and often just as compact or actually more compact than JSON.
2) It's less error prone because all you're getting is markup, and no code.
3) It will be faster to program in most cases because you won't have to write code separately for the client end.
4) The HTML is the content, the JavaScript is the behavior. You're mixing both for absolutely no compelling reason.
In JavaScript or any other scripting language, if an error occurs partway through, the rest of the code will not run.
It is also easier to debug pure HTML pages.
My opinion: use scripting code wherever necessary; the rest of the code you can do in HTML.
It will save the round trip of going to the server, fetching the data, and then displaying it again.
Keep point No. 4 in your mind while coding.
I think that you can consider 3 methods:
1) Sending only JSON to the client and rendering it according to a template (e.g. Handlebars.js).
2) Creating the pages on the server side, which usually renders faster and also lets you cache the page.
3) A mixture of the two: generate partial views on the server and send them to the client. It is like having a Handlebars template on the client and applying the JSON data to it, except that the template lives on the server, is rendered there, and is sent to the client in its final form; on the client you just replace the partial views.
The use case of the application also matters: if you are targeting SEO you should consider ColBeseder's advice, or if you are targeting mobile users, you would probably be better off with the JSON-only response, as it is more lightweight.
EDIT:
According to what you said, you are creating a single-page application. If this is correct, then you can probably go with either JSON or partial views like AngularJS has. But if your server-side logic is written to handle only JSON responses, then you would probably be better off using a template engine on the client, such as Handlebars.js, Underscore, or jQuery Templates; you can define reusable portions of your HTML and apply the JSON data to them.
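The idea of a reusable template filled from JSON can be sketched without any library. The renderTemplate function below is a hypothetical stand-in, not the API of Handlebars or Underscore; it only handles flat {{name}} placeholders and escapes the values it inserts.

```javascript
// Escapes characters that would otherwise be interpreted as markup.
function escapeHtml(value) {
  var map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, function (ch) { return map[ch]; });
}

// Replaces each {{key}} with the escaped value of data[key].
// Unknown keys become an empty string. Hypothetical helper, not a library API.
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : '';
  });
}

// One reusable template, applied to every item in a JSON response.
var postTemplate = '<article><h2>{{title}}</h2><p>{{body}}</p></article>';
var posts = [
  { title: 'First', body: 'Hello' },
  { title: 'Tips & tricks', body: 'Use <b> sparingly' }
];
var html = posts.map(function (post) {
  return renderTemplate(postTemplate, post);
}).join('');
console.log(html);
```

With the layout in the question, the resulting string could then be inserted with $('#content').html(html). A real template engine adds loops, conditionals, and precompilation on top of this basic substitution.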
If you cared about SEO you'd want the HTML there at page load, which is closer to your second strategy than your first.
Update May 2014: Google claims to be getting better at executing Javascript: http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html Still unclear what works and what does not.
Further updates probably belong here: Do Google or other search engines execute JavaScript?
Question
I would like to find all occurrences of, for example, "stackoverflow" in the loaded DOM using JavaScript and replace them with "unknown company".
This text can be a value in html text, html attribute, javascript string - generally all places which could be shown to user.
More details
I cannot search source code, because parts of it are in database, resources, external providers. That is why the easiest way for me is to validate client side.
I have a SPA and 99% is downloaded by AJAX
I am using backbone mixed with standard ASP.NET MVC (but I think it does not change anything)
I cannot provide any code because I do not have an idea how to start
My ideas
Create global handler on ajax success. Search and replace in responseText filtered by content-type: html, text, json, javascript
Read whole DOM into string and make search and replace, but I don't know if it is possible for all above resources.
I hope my question is clear enough, if not I will add more details.
$('.myelement').html(function (index, oldHtml) {
    // The g flag replaces every occurrence, not just the first one.
    return oldHtml.replace(/stackoverflow/gi, 'unknown company');
});
Something like that should replace the text on-the-fly for any given element (and children).
It's up to you to see if it's safe to assume that 'stackoverflow' doesn't appear in any HTML attributes, because they might get replaced too.
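If attributes must stay untouched, an alternative is to walk the tree and change only text nodes. The sketch below is an assumption-laden illustration: replaceInTextNodes is a made-up helper that relies only on the standard nodeType, nodeValue, and childNodes properties, so it works on real DOM nodes and on plain objects shaped like them.

```javascript
var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

// Recursively applies the replacement to text nodes only and returns how
// many text nodes were changed. Attributes are never touched.
function replaceInTextNodes(node, pattern, replacement) {
  if (node.nodeType === TEXT_NODE) {
    var updated = node.nodeValue.replace(pattern, replacement);
    if (updated !== node.nodeValue) {
      node.nodeValue = updated;
      return 1;
    }
    return 0;
  }
  var changed = 0;
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i += 1) {
    changed += replaceInTextNodes(children[i], pattern, replacement);
  }
  return changed;
}

// In a browser this would be called as:
//   replaceInTextNodes(document.body, /stackoverflow/gi, 'unknown company');
```

This sketch also visits the text inside script and style elements, so a real version would probably skip those. Content loaded later by AJAX needs another pass after it has been inserted.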
I'm working on a page that needs to fetch info from some other pages and then display parts of that information/data on the current page.
I have the HTML source code that I need to parse in a string. I'm looking for a library that can help me do this easily. (I just need to extract specific tags and the text they contain)
The HTML is well formed (All closing/ending tags present).
I've looked at some options but they are all being extremely difficult to work with for various reasons.
I've tried the following solutions:
jkl-parsexml library (The library js file itself throws up HTTPError 101)
jQuery.parseXML Utility (Didn't find much documentation/many examples to figure out what to do)
XPATH (The Execute statement is not working but the JS Error Console shows no errors)
And so I'm looking for a more user friendly library or anything(tutorials/books/references/documentation) that can let me use the aforementioned tools better, more easily and efficiently.
An Ideal solution would be something like BeautifulSoup available in Python.
Using jQuery, it would be as simple as $(HTMLstring); to create a jQuery object with the HTML data from the string inside it (this DOM would be disconnected from your document). From there it's very easy to do whatever you want with it--and traversing the loaded data is, of course, a cinch with jQuery.
You can do something like this:
$("string with html here").find("jquery selector")
$("string with html here") creates a document fragment and puts the HTML into it (basically, it parses your HTML). find then searches for elements inside that document fragment (and only inside it). None of this is added to the page DOM.