Identify and extract rendered DOM in a web page - javascript

I am interested in extracting the text/string literals that are rendered in the web page during a workflow. I want to isolate the strings that are rendered from the ones that are hidden.
My intention is to find the strings that get rendered and map them against each workflow in my application. (I already have a way to uniquely identify each string, so that part is solved.)
Is there a way to achieve this, preferably in .NET or JScript, or with browser add-ons or some trace option in a browser?
Any help is appreciated, thanks!

How are you pulling the content? One way to get only the visible content is to use jQuery's visible selector. Simply add :visible to your jQuery selector (http://api.jquery.com/visible-selector/). For example, on this page you could use $('.post-text:visible') to find the question text only if it is visible.
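A minimal sketch of what that selector checks, assuming the jQuery 3 definition of :visible (an element counts as visible when it occupies space in the layout). The helper names isVisible and visibleText are made up for illustration, and the elements are passed in rather than looked up, so the logic does not depend on jQuery being loaded:

```javascript
// Mirrors jQuery 3's :visible test: the element has a layout box.
// Note: elements with visibility:hidden or opacity:0 still pass, because
// they keep their space in the layout.
function isVisible(elem) {
  return !!(elem.offsetWidth || elem.offsetHeight || elem.getClientRects().length);
}

// Hypothetical helper: return the trimmed text of every visible element
// in an array-like collection (for example a NodeList).
function visibleText(elements) {
  return Array.prototype.filter.call(elements, isVisible).map(function (elem) {
    return elem.textContent.trim();
  });
}
```

In a browser this could be called as visibleText(document.querySelectorAll('.post-text')). Text hidden with visibility:hidden would still be reported, so a stricter check may be needed depending on how the application hides its strings.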

Related

Crawler4j, Jsoup and JavaScript: extract attribute values modified with JavaScript

I'm using Crawler4j and Jsoup to crawl a website, and it works fine for the HTML text, but there is some important content whose default values are hard-coded in CSS and then set dynamically with JavaScript.
For example, I have the
and I need the width value, which in CSS is hard coded as 10px, but modified in JavaScript to, let's say, 5px.
Is there a way to get this value without using another crawler? Or a simple alternative?
I have already quite a lot of code, so I don't want to rewrite everything if there is a possibility to do that with the Crawler4j.
I hope my question is clear enough and thank you in advance for your help!
This is not possible with crawler4j or with jsoup. They both handle only static HTML content.
There are several open issues related to dynamic JavaScript execution on the official GitHub repository: #49, #197 and #220.
To achieve your objectives, you would need to build a stack based on Selenium, CasperJS and/or PhantomJS, which could then be used for advanced crawling including JavaScript execution.
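Whichever browser-driving tool is chosen, the value itself is read inside the page after the scripts have run. A minimal sketch, assuming a hypothetical helper name effectiveWidth and a selector supplied by the caller (the question does not show the actual element); the window object is passed in as a parameter:

```javascript
// Returns the width the browser actually applies to the first element
// matching `selector`, after CSS and JavaScript have both run.
// `win` is the page's window object; returns null if nothing matches.
function effectiveWidth(win, selector) {
  var elem = win.document.querySelector(selector);
  if (!elem) {
    return null;
  }
  // getComputedStyle reports the resolved value as a string such as "5px".
  return win.getComputedStyle(elem).width;
}
```

With Selenium's Java bindings, a script like this can be run in the page through JavascriptExecutor.executeScript and the result handed back to the existing crawling code.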

generic way to check if tag is shown in the browser

I want to filter only the a tags that are actually shown in the browser at the moment.
I used the getBoundingClientRect function, but there are still some cases that I can't filter out, such as hidden or covered a tags.
Is there a more generic way in pure JavaScript to know exactly which a tags are currently shown and can really be seen by the user?
I need it for my Chrome extension, which uses these tags. So if there is a way to do it with the Chrome extension API, that is a good solution too.
But mostly the problem is li tags where not all the elements inside the list are shown. How can I know which elements are shown and which are not?
Thanks!

disable layout in javascript

Is there an equivalent to Zend Framework's disableLayout and setNoLayout for JavaScript,
where I can just print an object to the browser without the website rendering?
The reason I ask is simply convenience and ease of use (for me). It's much easier for me to quickly scan through data this way than to print objects to Firebug's console (I find it hard to click my way through massive objects).
Thanks in advance..
The layout or markup is rendered by Zend and is received by the browser as is. Technically you can strip the markup out using jQuery if you know where the output starts, but you cannot disable the layout as such using JavaScript, since the browser doesn't know what is layout and what is content.
If the content is inside a container with id "content" and everything around it is the layout, you could do jQuery("body").html(jQuery("#content")) and only that part will remain, stripping everything else. The CSS etc. will still remain. You can always replace the HTML with a structured rendition, or simply remove the head tag to drop the styling: $("head").remove();
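One way to build such a structured rendition is to pretty-print the object as JSON inside a pre block. This is only a sketch: the helper names escapeHtml and renderObjectAsPre are made up, and it assumes the object is JSON-serializable (no functions, no circular references):

```javascript
// Escape the characters that would otherwise be parsed as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: pretty-print a JSON-serializable value as a <pre> block.
function renderObjectAsPre(obj) {
  return '<pre>' + escapeHtml(JSON.stringify(obj, null, 2)) + '</pre>';
}
```

In the page it could then be used as jQuery("body").html(renderObjectAsPre(myObject)), where myObject stands for whatever data needs inspecting.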

Is $.empty() enough for big ajaxy apps?

I've been working on an app, and since it's getting a bit too big I've been thinking of ways to improve memory management, since the app runs mostly on JavaScript. Every time a navigation item is clicked, I call jQuery's empty and then show the HTML loaded via AJAX. For example:
// $.ajaxSetup() is called before this
// $this is the attached element
$.ajax({
    success: function (data) {
        $this.empty().html(data.output).fadeIn(400);
        // more JavaScript stuff like loading TinyMCE or jQuery UI
    }
});
Is this enough to prevent memory leaks? I'm not entirely sure what empty does, but I'm assuming it removes all DOM elements within that div along with any other objects and events. By the way, you can find the app here: http://webproposalgenerator.com/ and http://webproposalgenerator.com/demo.
Any tips on improving the performance/security, or any feedback at all, would be greatly appreciated.
$.fn.empty should be enough, it deletes all data and events associated to the elements and then deletes the elements. It also calls .widget("destroy") on all jquery-ui widget.js based widgets that are defined on those elements.
It is also important to note that jQuery's $.fn.html method calls $.fn.empty() on the given element before appending the HTML. Therefore, if you are using $.fn.html, you don't have to call $.fn.empty.
Actually, my guess was that .html implies .empty anyway, although I'm not sure that's true. As for the performance part: according to the excellent jQuery Fundamentals book, it is a recommended best practice to add content while the element is detached from the DOM with .detach(). I tried to look at the code to give advice but didn't find it. Nice site, by the way.
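The detach suggestion can be sketched as follows, assuming $el is a jQuery object wrapping a single element. The helper name replaceWhileDetached is made up, and whether this actually improves performance in a given app would need measuring:

```javascript
// Hypothetical helper: take $el out of the document, swap its content,
// then put it back. .html() clears the old children (and their jQuery
// data and events) before inserting the new markup.
function replaceWhileDetached($el, html) {
  var $parent = $el.parent();
  var $next = $el.next(); // next element sibling, if any

  $el.detach().html(html);

  // Caveat: .next() skips text nodes, so text nodes that followed $el
  // may end up before it after re-insertion.
  if ($next.length) {
    $el.insertBefore($next);
  } else {
    $el.appendTo($parent);
  }
  return $el;
}
```

In the success callback from the question it could be used as replaceWhileDetached($this, data.output).fadeIn(400);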

Is there a way to create your own HTML element?

Is there a way to create your own HTML element? I want to make a specially designed check box.
I imagine such a thing would be done in JavaScript. Something akin to document.createHTMLElement but the ability to design your own element (and tag).
No, there isn't.
The HTML elements are limited to what the browser will handle. That is to say, if you created a custom Firefox plugin, and then had it handle your special tag, then you "could" do it, for varying interpretations of "doing it". A list of all elements for a particular version of HTML may be found here: http://www.w3.org/TR/html4/index/elements.html
Probably, however, you don't actually want to. If you want to "combine" several existing elements so that they operate together, then you can do that with JavaScript. For example, if you'd like a checkbox to show a dropdown list somewhere when clicked, populated with various things, you may do that.
Perhaps you may like to elaborate on what you actually want to achieve, and we can help further.
Yes, you can create your own tags. You have to create a Schema and import it on your page, and write a JavaScript layer to convert your new tags into existing HTML tags.
An example is fbml (Facebook Markup Language), which includes a schema and a JavaScript layer that Facebook wrote. See this: Open Graph protocol.
Using it you can make a like button really easily:
<fb:like href="http://developers.facebook.com/" width="450" height="80"/>
The easiest way would probably be to write a plugin, say in jQuery (or Dojo, MooTools, pick one).
In case of jQuery you can find some plugins here http://plugins.jquery.com/ and use them as a sample.
You need to write your own doctype and/or use your own namespace to do this.
http://msdn.microsoft.com/en-us/magazine/cc301515.aspx
No, there is not. Moreover it is not allowed in HTML5.
Take a look at the Ample SDK JavaScript GUI library, which enables custom elements and event namespaces client-side (XUL, for example, was implemented there this way) without interfering with the rules of HTML5.
See, for example, how the XUL scale element is implemented: http://github.com/clientside/amplesdk/blob/master/ample/languages/xul/elements/scale.js and its default stylesheet: http://github.com/clientside/amplesdk/blob/master/ample/languages/xul/themes/default/input.css
It's a valid question, but I think the name of the game from the UI side is progressive markup. Build out valid w3 compliant tags and then style them appropriately with javascript (in my case Jquery or Dojo) and CSS. A well-written block of CSS can be reused over and over (my favorite case is Jquery UI with themeroller) and style nearly any element on the page with just a one or two-word addition to the class declaration.
Here are some good jQuery/JavaScript/CSS solutions that are relatively simple:
http://www.filamentgroup.com/examples/customInput/
http://aaronweyenberg.com/90/pretty-checkboxes-with-jquery
http://www.protofunc.com/scripts/jquery/checkbox-radiobutton/
Here's the spec for the upcoming (and promising) jQuery UI update for form elements: http://wiki.jqueryui.com/Checkbox
If you needed to validate input, this is an easy way to get inline validation with a single class or id tag: http://www.position-absolute.com/articles/jquery-form-validator-because-form-validation-is-a-mess/
Ok, so my solution isn't a 10 character, one line solution. However, Jquery Code aside, each individual tag wouldn't be much more than:
<input type="checkbox" id="theid">
So, while there would be a medium chunk of jQuery code, the individual elements would be very small, which is important if you're repeating them 250 times (programmatically), as my last project required. It's easy to code, degrades well, validates well, and because the progressive markup is applied on the user's end, it has virtually no cost on the server end.
My current project is in Symfony--not my choice--which uses complex, bulky server-side tags to render form elements, validate, do javascript onclick, style, etc. This seems like what you were asking for at first....and let me tell you, it's CLUNKY. One tag to call a link can be 10 lines of code long! After being forced to do it, I'm not a fan.
Hm. The first thought is that you could create your own element and do a transformation with XSLT to the valid HTML then.
With the emergence of the W3C Web Components standard, specifically the Custom Elements spec, you can now create your own custom HTML elements and register them with the parser with the document.register() DOM method. (In the finalized spec, this method was replaced by customElements.define().)
X-Tag is a helpful sugar library, developed by Mozilla, that makes it even easier to work with Web Components, have a look: X-Tags.org
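A minimal sketch of a custom checkbox element registered with customElements.define(), the call that current browsers provide. The tag name fancy-checkbox, the class, and the wrapper function defineFancyCheckbox are all made up for illustration; the window object is passed in as a parameter, and keyboard support and styling are left out:

```javascript
// Hypothetical example: registers <fancy-checkbox>, a clickable element
// that tracks its state in the aria-checked attribute.
// `win` is the browser's window object.
function defineFancyCheckbox(win) {
  class FancyCheckbox extends win.HTMLElement {
    constructor() {
      super();
      // Listeners may be attached in the constructor; attributes may not.
      this.addEventListener('click', () => this.toggle());
    }

    // Runs each time the element is attached to the document.
    connectedCallback() {
      this.setAttribute('role', 'checkbox');
      if (!this.hasAttribute('aria-checked')) {
        this.setAttribute('aria-checked', 'false');
      }
    }

    toggle() {
      const checked = this.getAttribute('aria-checked') === 'true';
      this.setAttribute('aria-checked', String(!checked));
    }
  }

  // Custom element names must contain a hyphen.
  win.customElements.define('fancy-checkbox', FancyCheckbox);
  return FancyCheckbox;
}
```

In a browser, calling defineFancyCheckbox(window) once makes <fancy-checkbox></fancy-checkbox> usable in markup, and CSS can style the checked state with a selector such as fancy-checkbox[aria-checked="true"].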
