Find all occurrences of a known string in the loaded DOM and JavaScript

Question
I would like to find all occurrences of, for example, "stackoverflow" in the loaded DOM using JavaScript and replace them with "unknown company".
This text can be in HTML text, an HTML attribute, or a JavaScript string: generally, anywhere it could be shown to the user.
More details
I cannot search the source code, because parts of it live in a database, in resources, and with external providers. That is why the easiest way for me is to validate on the client side.
I have an SPA, and 99% of it is downloaded via AJAX.
I am using Backbone mixed with standard ASP.NET MVC (but I don't think that changes anything).
I cannot provide any code because I have no idea how to start.
My ideas
Create a global handler on AJAX success that searches and replaces in responseText, filtered by content type: HTML, text, JSON, JavaScript.
Read the whole DOM into a string and do the search and replace there, but I don't know whether that is possible for all of the above resources.
I hope my question is clear enough; if not, I will add more details.
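A minimal sketch of the first idea, assuming the global handler can see the raw response text and its Content-Type header (for example via xhr.getResponseHeader("Content-Type")) before the application uses it; how it is wired into jQuery or Backbone is left out. The names rewriteResponse and TEXT_TYPES are illustrative, not part of any library. The search term is escaped so it matches literally, and every occurrence is replaced case-insensitively. For JSON bodies, the replacement must not contain characters that would need JSON escaping, such as quotes or backslashes.

```javascript
// Content types whose bodies are treated as text and rewritten (illustrative list).
const TEXT_TYPES = [
  "text/html",
  "text/plain",
  "application/json",
  "application/javascript",
  "text/javascript",
];

// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every case-insensitive occurrence of `search` in `body`,
// but only when `contentType` is one of TEXT_TYPES.
function rewriteResponse(contentType, body, search, replacement) {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  if (!TEXT_TYPES.includes(mime)) {
    return body; // images, binary data, etc. are passed through untouched
  }
  const pattern = new RegExp(escapeRegExp(search), "gi");
  // A function replacer stops "$&"-style patterns in `replacement` being expanded.
  return body.replace(pattern, () => replacement);
}
```

For example, rewriteResponse("application/json; charset=utf-8", '{"name":"StackOverflow"}', "stackoverflow", "unknown company") returns '{"name":"unknown company"}'.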

$('.myelement').html(function(index, oldHtml) {
  // "g" replaces every occurrence (not just the first); "i" ignores case
  return oldHtml.replace(/stackoverflow/gi, 'unknown company');
});
Something like that should replace the text on the fly in any given element (and its children).
It's up to you to see if it's safe to assume that 'stackoverflow' doesn't appear in any HTML attributes, because they might get replaced too.
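A variant that avoids touching attributes at all is to walk only the text nodes and change their nodeValue in place. Unlike resetting .html(), this keeps the existing child elements, so event handlers already bound to them survive. The sketch below assumes a DOM-like tree (nodeType, nodeName, nodeValue, childNodes) and uses the hypothetical name replaceInTextNodes; in a browser you would pass document.body. It skips SCRIPT and STYLE, since editing a script's text after it has run has no effect, and it will miss a term that is split across elements (e.g. stack<b>overflow</b>). Visible attributes such as title or alt would need separate handling.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in a browser
// Elements whose text is code/styling rather than visible text.
const SKIPPED_ELEMENTS = new Set(["SCRIPT", "STYLE"]);

// Replace matches of `pattern` (use a global regex, e.g. /stackoverflow/gi)
// in every text node under `root`; returns the number of replacements made.
function replaceInTextNodes(root, pattern, replacement) {
  let count = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === TEXT_NODE) {
      const updated = node.nodeValue.replace(pattern, () => {
        count += 1;
        return replacement;
      });
      // Only write back when something changed, to avoid needless DOM mutations.
      if (updated !== node.nodeValue) node.nodeValue = updated;
    } else if (!SKIPPED_ELEMENTS.has(node.nodeName)) {
      // Array.from accepts both a real NodeList and a plain array of children.
      stack.push(...Array.from(node.childNodes || []));
    }
  }
  return count;
}
```

In a browser this would be called as replaceInTextNodes(document.body, /stackoverflow/gi, 'unknown company'), after each AJAX update has been rendered.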

Related

Blogger API - Render blog content on personal website

In its content attribute, the Blogger API returns an ugly blob of HTML. I would like to convert this HTML string into a DOM that I can inspect. What is the best way to parse this text so that I can re-render it within a JS widget I'm building for another website?
I'd rather not write my own parser that reverse engineers the HTML encoding that Google put into place. Ideally I'm looking for a library that undoes the HTML escaping and then turns it into a DOM that I can inspect with jQuery.
Apparently this question was based on some slightly false premises. I have since managed to embed blogs in my website successfully. I have been using AngularJS, which escapes HTML by default before inserting it into the DOM. This caused a lot of confusion on my side. The response from Google is not escaped.
This means parsing it as a DOM is simply a matter of calling jQuery.parseHTML(). See: http://api.jquery.com/jquery.parsehtml/
Once this is done, whatever jQuery transformations are needed can be made with AngularJS's jqLite by calling angular.element('').
Finally, the object can be bound to the document.
Alternatively, the raw content of the list of blog posts can be injected as an HTML string the regular Angular way, using something like this:
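// Mark each post's HTML as trusted so that Angular renders it as markup
// (e.g. with ng-bind-html) instead of escaping it.
$scope.frontPagePosts = posts.map(function(post){
  post.content = $sce.trustAsHtml(post.content);
  return post;
});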

Getting access to the original HTML in HtmlUnit HtmlElement?

I am using HtmlUnit to read content from a web site.
Everything works perfectly to the point where I am reading the content with:
HtmlDivision div = page.getHtmlElementById("my-id");
div.asText() even returns the expected String object, but I want to get the original HTML inside <div>...</div> as a String object. How can I do that?
I am not willing to switch from HtmlUnit to something else, as the website expects the client to run JavaScript, and HtmlUnit seems to be capable of doing what is required.
If by original HTML you mean the HTML code that HtmlUnit has already formatted, then you can use div.asXml(). If, however, you really are looking for the original HTML the server sent you, then you won't find a way to do so (at least up to v2.14).
Now, as a workaround, you could get the whole text of the page that the server sent you with this answer: How to get the pure raw HTML of a page in HTMLUnit while ignoring JavaScript and CSS?
As a side note, you should probably think twice about why you need the HTML code. HtmlUnit lets you get the data out of the code, so there shouldn't be any need to store the source code, only the information it contains. Just my 2 cents.

Output script tags without jQuery, avoiding execution

I have JS calling a remote server through AJAX. The response contains something similar to this:
<script>alert(document.getElementById('some_generated_id').innerHTML; ... </script>
The user copies the response and uses it for their own purposes. I need to make sure that no browser runs the code when I do this:
var response = '<scrip.....';
document.getElementById('output_box').innerHTML = response;
The same should apply to any HTML tag. I know that jQuery's .text() does exactly what I need:
var response = '<scrip.....';
$('#output_box').text(response);
I am looking for any solution, including but not limited to: escaping special characters while still displaying them correctly; adding a zero-width space to tags (this has to be efficient); outputting in parts. It has to be pure JS.
If you're using a server-side language, there is probably a method to escape special characters.
In PHP you could use htmlspecialchars(); it converts characters that have special significance in HTML to HTML entities (e.g. & to &amp;).
They will still display correctly and you'll be able to copy and paste the text, but the JavaScript shouldn't run.
If you need a pure JavaScript solution for this, someone has answered that here: https://stackoverflow.com/a/4835406/15000
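A minimal pure-JavaScript sketch of the escaping approach, roughly what htmlspecialchars() does on the PHP side; the name escapeHtml is illustrative. Current browsers do not execute script elements inserted via innerHTML, but markup such as <img onerror=...> still runs its handler, so escaping is still needed. If the goal is only to display the response as text, setting element.textContent = response is the plain-DOM counterpart of jQuery's .text() and needs no escaping step at all.

```javascript
// Map each character that is significant in HTML to its entity.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Return `text` with those characters replaced, so the browser displays
// them literally instead of parsing them as markup.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```

Usage: document.getElementById('output_box').innerHTML = escapeHtml(response); the user sees and can copy the original markup, but none of it is parsed.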

Parsing HTML using JavaScript

I'm working on a page that needs to fetch info from some other pages and then display parts of that information/data on the current page.
I have the HTML source code that I need to parse in a string. I'm looking for a library that can help me do this easily. (I just need to extract specific tags and the text they contain)
The HTML is well formed (all closing tags are present).
I've looked at some options, but they have all been extremely difficult to work with for various reasons.
I've tried the following solutions:
jkl-parsexml library (The library js file itself throws up HTTPError 101)
jQuery.parseXML Utility (Didn't find much documentation/many examples to figure out what to do)
XPath (the Execute statement is not working, but the JS Error Console shows no errors)
And so I'm looking for a more user-friendly library, or anything (tutorials/books/references/documentation) that can help me use the aforementioned tools better, more easily and more efficiently.
An ideal solution would be something like BeautifulSoup in Python.
Using jQuery, it would be as simple as $(HTMLstring); to create a jQuery object with the HTML data from the string inside it (this DOM would be disconnected from your document). From there it's very easy to do whatever you want with it--and traversing the loaded data is, of course, a cinch with jQuery.
You can do something like this:
$("string with html here").find("jquery selector")
$("string with html here") creates a document fragment and puts the HTML into it (basically, it parses your HTML), without inserting anything into the page DOM. .find() then searches for elements in that fragment, and only inside it. Note that .find() only looks at descendants of the top-level elements; to match the top-level elements themselves, use .filter().

Is there a way to validate the HTML of a page after AJAX operations are performed on it?

I'm writing a web app that inserts and modifies HTML elements via AJAX using jQuery. It works very nicely, but I want to be sure everything is OK under the bonnet. When I view the page source in IE or Chrome, it shows me the original document markup, not what has changed since my AJAX calls.
I love using the W3C validator to check my markup, as it occasionally reminds me that I've forgotten to close a tag, etc. How can I use it to check the markup of my page after the original source served from the server has been changed via JavaScript?
Thank you.
Use the developer tools in Chrome to explore the DOM: they will show you all the HTML you've added with JavaScript.
You can then copy it and paste it into any validator you want.
Or, instead of inserting the markup with jQuery, log it to the console; that way the browser doesn't get the chance to close tags for you:
console.log(myHTML)
Both previous answers make good points about the fact that the browser will 'fix' some of the HTML you insert into the DOM.
Back to your question: you could add the following as a bookmark in your browser. It writes out the contents of the DOM to a new window; copy and paste that into a validator.
javascript:(function(){var w=window.open("");var pre=w.document.createElement("pre");pre.textContent=document.documentElement.outerHTML;w.document.body.appendChild(pre);})();
If you're just concerned about well-formedness (missing closing tags and such), you probably just want to check the structure of the chunks AJAX is inserting. (Once it's part of the DOM, it's going to be well-formed... just not necessarily the structure you intended.) The simplest way to do that would probably be to try parsing it with an XML library (one with an HTML mode that can be made strict, if you're not using XHTML).
Actual validation (testing the "you can't put tag X inside tag Y" rules, which browsers generally don't care much about) is a lot trickier and, depending on how much effort you're willing to put into it, may not be worth the trouble, because if you validate the fragments in isolation you'll get a lot of "this is just a fragment" false positives.
Whichever you decide to use, you need to grab the AJAX responses before the browser parses them, while they're still just a string of text rather than a DOM tree, if you want a reliable test result.
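A minimal sketch of the well-formedness check, run on the response string before it is inserted. Instead of an XML library it uses a plain tag stack, a swapped-in and much cruder technique: it does not handle comments, CDATA, the contents of script/style, or HTML's optional end tags (an unclosed <li> or <p> is reported as an error even though it is valid HTML). The names checkBalanced, VOID_ELEMENTS and TAG_RE are illustrative. In a browser, parsing the fragment (wrapped in a single root element) with DOMParser as "application/xml" and looking for a parsererror element is closer to the XML-library approach, although HTML entities such as &nbsp; are then rejected.

```javascript
// Void elements never have a closing tag in HTML.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "param", "source", "track", "wbr",
]);

// Matches <tag ...>, </tag> and <tag ... />; quoted attribute values may contain ">".
const TAG_RE = /<(\/?)([a-zA-Z][\w:-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;

// Returns { ok: true } if every non-void opening tag is closed in order,
// otherwise { ok: false, error: "..." } describing the first problem found.
function checkBalanced(fragment) {
  const stack = [];
  for (const [, slash, rawName, attrs] of fragment.matchAll(TAG_RE)) {
    const name = rawName.toLowerCase();
    // Void elements and explicitly self-closed tags (<x/>) open nothing.
    if (VOID_ELEMENTS.has(name) || attrs.trim().endsWith("/")) continue;
    if (slash) {
      const open = stack.pop(); // the innermost tag still open
      if (open !== name) {
        return {
          ok: false,
          error: open ? `expected </${open}> but found </${name}>` : `unexpected </${name}>`,
        };
      }
    } else {
      stack.push(name);
    }
  }
  if (stack.length > 0) {
    return { ok: false, error: `unclosed <${stack[stack.length - 1]}>` };
  }
  return { ok: true };
}
```

For instance, checkBalanced("<div><span></div>") reports the crossed tags, while a fragment with <br> and <img .../> inside a properly closed <div> passes.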
