Bookmarklet for Screen Scraping - javascript

http://dy-verse.blogspot.com/2009/08/screen-scraping-with-javascript-firebug.html
outlines a strategy, which depends on Greasemonkey, to parse a page and submit its contents to a Google spreadsheet. I'd like to adapt this approach to a simple bookmarklet where, instead of hardcoding the address of the page to be parsed, I would manually navigate to the page in question and then execute the bookmarklet.
I need help coding the entry point and assigning the elements to be parsed. My page has 3 (un-nested) tables at the top level of the document, and they have no class names. How do I go about passing those tables into the start() function?
thx

Once you run a bookmarklet on a page, you are free to use any JavaScript inside that page.
You can use document.getElementsByTagName('table') or load a JS library that will help you do the job.
And if you are using a modern browser, you can also use document.querySelectorAll with a CSS selector.
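As a rough sketch of how the entry point could look: the snippet below collects every table on the current page and hands the extracted cell text to a callback. The helper names (tableToRows, collectTables, runBookmarklet) are made up for illustration, and the shape of the data passed to start() is an assumption, since the signature of the article's start() function is not shown here.

```javascript
// Hypothetical helper: turn one table element into an array of rows,
// each row being an array of trimmed cell texts.
function tableToRows(table) {
  var rows = [];
  for (var i = 0; i < table.rows.length; i++) {
    var cells = table.rows[i].cells;
    var row = [];
    for (var j = 0; j < cells.length; j++) {
      row.push((cells[j].textContent || '').trim());
    }
    rows.push(row);
  }
  return rows;
}

// Collect every <table> in the given document, in document order.
function collectTables(doc) {
  return Array.prototype.slice.call(doc.getElementsByTagName('table'));
}

// Hypothetical entry point: `start` stands in for the function from the
// linked article; here it is assumed to accept the parsed table data.
function runBookmarklet(doc, start) {
  var data = collectTables(doc).map(function (table) {
    return tableToRows(table);
  });
  return start(data);
}

// Only runs inside a browser page; logs the data instead of submitting it.
if (typeof document !== 'undefined') {
  runBookmarklet(document, function (data) {
    console.log(data);
  });
}
```

To try it as a bookmarklet, wrap the code in javascript:(function(){ ... })(); and save that as the URL of a bookmark. If only the three top-level tables are wanted, document.querySelectorAll('body > table') is one possible selector, assuming the tables are direct children of body.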

Related

Include code from jQuery load() onto page source code [duplicate]

Many aspects of my site are dynamic. I am using jquery.
I have a div which once the DOM is ready is populated using load().
Then if a button is clicked, using load() once again, this value is replaced by another value.
This kind of setup is common across my site. My homepage is essentially lots of dynamically loaded, refreshed, and changeable content.
What are the repercussions of this for SEO?
I've seen sites where each page is loaded using load() and then displayed using the animation functions... It looks awesome!
People have posed this question before, but no one has answered it properly.
So any ideas? jQuery and SEO?
Thanks
EDIT
Very interesting points. I don't want to overdo my site with JavaScript, just where necessary to make it look good. My homepage, however, is one place of concern.
So when the DOM is ready, it loads content into a div. On clicking a tab, this content is changed, i.e. no JS, no content.
The beauty here for me is that there is no duplicated code. Is the suggestion here that I should simply 'print' some default content, then have the tabs link to pages (with the same content) if JS is disabled, i.e. sacrifice a little duplicate code for SEO?
As far as degrading goes, my only other place of concern is tabs on the same page. I have 3 divs, all containing content. On this page two divs are hidden until a tab is clicked. I used this method first, before I started playing with JS. Would it perhaps be best to load() these tabs, then have the tab buttons link to where the content is pulled from?
Thanks
None of the content loaded via JavaScript will be crawled.
The common and correct approach is to use Progressive Enhancement: all links should be normal <a href="..."> to actual pages so that your site "makes sense" to a search spider; and the click() event overrides the normal functionality with load() so normal users with JavaScript enabled will see the "enhanced" version of your site.
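A minimal sketch of that pattern, assuming the tab links are ordinary anchors matching a hypothetical selector 'a.tab' and the content area is a hypothetical '#content' element. The function name enhanceTabs is made up for illustration; .on(), .load() and event.preventDefault() are standard jQuery calls.

```javascript
// Override normal link navigation with an in-place load().
// Without JavaScript the links still work as plain <a href="..."> links.
function enhanceTabs($, linkSelector, targetSelector) {
  $(linkSelector).on('click', function (event) {
    event.preventDefault();            // stop the normal page navigation
    $(targetSelector).load(this.href); // fetch the linked page into the target
  });
}

// Only runs when jQuery is present on the page.
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    enhanceTabs($, 'a.tab', '#content'); // selectors are assumptions
  });
}
```

Because the href of each link points at a real page, a crawler or a visitor without JavaScript simply follows the link, while everyone else gets the in-place load.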
If your content is navigable when JavaScript is turned off, you'll be a good ways toward being visible to search engines.
Note that search engine crawlers won't be submitting any forms on your site, so if you have any form elements that are meant to navigate between your site's content pages, that content is not navigable by search engines.
Here are guidelines on how to get Google to crawl content loaded with AJAX: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
I use jQuery load() for asynchronous page loads. It greatly improves the user experience, but it is not SEO-friendly. Here's the only solution I have found so far:
On the first load I do not use jQuery load(), and I try to write a cookie with JavaScript: document.cookie = 'checkjs=on';
On the next page load, if the PHP script finds this cookie, it means that JavaScript is enabled and jQuery load() can be used. If there's no such cookie, then JavaScript is off (probably a spider came), so jQuery load() is not used.
if (empty($_COOKIE['checkjs']) || $_COOKIE['checkjs'] !== 'on') { echo 'js is off, hello Google!'; } else { echo 'js is on, can use jquery load'; }
This way I can be sure that most users benefit from asynchronously loaded page blocks, except on the very first load. And spiders get all the content too.
In your case you could just load the same page with new parameter that makes another tab active. Spider is gonna be happy.

Trigger an Event in JavaScript before objects finish loading

I was trying to write a global JavaScript function which overrides any HTML object (img, iframe, links and so on) before it is loaded by the page. The purpose of the overriding action was to change the SRC and HREF of these objects, using the DOM, to some other link.
Unfortunately, I didn't find any way to do that without first loading the object and only then changing it in the onload event.
My second option was to change the SRC and HREF by matching these attributes with a regular expression and replacing the resulting values. I prefer not to do so because it's slow and consumes a lot of time.
I would be glad if someone could share their experience and help me sort this out.
JavaScript only works within the DOM.
You could however, load the page via AJAX, get the content and do any string manipulation on it.
If you are trying to modify items that exist in the static HTML of the page, you cannot modify them with javascript until they are successfully loaded by the browser. There is no way to modify them before that. They may or may not be visible to the viewer before you have a chance to modify them.
To solve this issue, there are a couple of options.
Put CSS style rules in the page that cause all the items you want to modify to be hidden initially; your JavaScript can then modify them and show them, so they will not be seen before your modification.
Don't put the items that you want to modify in the static part of your HTML page. You can either create them programmatically with javascript and insert them into the page or you can load them via ajax, modify them after loading them via ajax and then insert them into the page.
For both of these scenarios, you will have to devise a fallback plan if javascript is not enabled.
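The first option might look roughly like the sketch below. It assumes the page already contains a CSS rule such as img, iframe, a { visibility: hidden; }, and the rewrite rule (swapping a hypothetical host old.example.com for new.example.com) is purely illustrative; rewriteUrl and rewriteAndReveal are made-up names. Note that hiding an element only keeps it out of view: the browser may still have requested the original URL before the script ran.

```javascript
// Hypothetical rewrite rule: point a URL at a different host.
function rewriteUrl(url) {
  return url.replace('//old.example.com/', '//new.example.com/');
}

// Rewrite src/href on each element, then make the element visible again.
function rewriteAndReveal(elements, rewrite) {
  for (var i = 0; i < elements.length; i++) {
    var el = elements[i];
    ['src', 'href'].forEach(function (attr) {
      var value = el.getAttribute(attr);
      if (value !== null) {
        el.setAttribute(attr, rewrite(value));
      }
    });
    // Undo the initial "visibility: hidden" rule assumed in the page CSS.
    el.style.visibility = 'visible';
  }
}

// Only runs inside a browser page, once the static HTML has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    rewriteAndReveal(document.querySelectorAll('img, iframe, a'), rewriteUrl);
  });
}
```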

Script to check if there are other scripts

When archiving web pages, dynamic content has to be treated differently.
How do I detect whether a page uses any JavaScript?
This will end up in a browser extension, so it probably doesn't need to exclude itself from the findings.
Simply checking for <script> tags should be fine.
if (document.querySelectorAll("script").length) {
//there are scripts on this page
}
You could also scan the entire page for onclick and similar handlers in HTML tags, but that would be slow for a big page.
That's actually relatively simple -- does it have any <script> tags? Then it probably has dynamic content. Additionally, you may wish to check for <object> tags as occasionally embedded objects will modify the page as well (though I suppose their presence should also make the page considered 'dynamic')
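The checks mentioned so far (script tags, object tags, inline handlers) could be combined along these lines. This is only a sketch: the list of handler attributes is a small, incomplete sample chosen for illustration, the javascript: link check is an extra assumption, and usesJavaScript is a made-up name.

```javascript
// A sample of inline event-handler attributes; deliberately not exhaustive.
var INLINE_HANDLERS = ['onclick', 'onload', 'onsubmit', 'onchange', 'onmouseover'];

// Returns true if the document contains script/object tags, javascript: links,
// or any element carrying one of the sampled inline handler attributes.
function usesJavaScript(doc) {
  var selectors = ['script', 'object', 'a[href^="javascript:"]'];
  INLINE_HANDLERS.forEach(function (name) {
    selectors.push('[' + name + ']');
  });
  return doc.querySelectorAll(selectors.join(', ')).length > 0;
}

// Only runs inside a browser page.
if (typeof document !== 'undefined') {
  console.log('page appears to use JavaScript:', usesJavaScript(document));
}
```

Handlers attached from script with addEventListener are not visible to this check, but in that case a script tag would normally be present anyway.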
This isn't possible from JavaScript if JavaScript is turned off in your browser. You would have to do some kind of browser check in your server-side code.

How to know which JS script manipulated certain elements of the DOM using Firebug

I am inspecting a website which has tons of JS files loaded from several servers, along with jQuery. The number of JS files is really big. Some are within regular script tags. Others are loaded dynamically via AJAX.
I am interested in certain elements of the DOM which are manipulated by some JS file. I see the dynamically loaded elements in Firebug. I need to know exactly which JS script creates/updates them.
I searched the JS files for the classes and the IDs of the elements, so I could get some clue about which JS file affects them, but I found nothing.
Is there any direct way using Firebug to know exactly which JS file manipulates certain DOM elements?
Thanks in advance.
Not in a direct way.
Use EventBug addon
Then search by the function signature in your script panel to drill down to the js file
Hope this helps!
You should be able to go to the Script tab in Firebug; in the toolbar right below the tab, you can select any of the JavaScript files included on the page.
If you have an idea which file it is coming from, select that file, look through the code, and set breakpoints on the functions you think the event is coming from by clicking on the respective line numbers. Then refresh the page and perform the action that calls the JavaScript.
You might have to put in a few before you narrow it down, but the breakpoints will make it a lot easier to tell which functions are being called for which events.

Editing Facebook Like-Box CSS on the Fly?

I am not a coder, but I am able to get my way around code most of the time. However, I found that this is the best place to ask questions relating to code stuff.
I have been working on a website for a client and I am at 95%. The only problem I have is the Facebook like-box. I have found several tutorials on the web to modify the like-box CSS, and I have implemented most of the recommendations, but I have no favorable results.
Please - stackoverflow help!
I know jquery/javascript is a very powerful language. And the Facebook like-box uses a JavaScript iframe/XFBML.
What code would you use if you were to modify the like-box CSS elements before loading them?
I say load because I am loading my like-box via ".load" AJAX. So, when a user clicks the Facebook button, jQuery loads it.
In short: how would I edit a CSS file on the fly, and then load the edited version afterwards?
thanks
The key problem that you'll have here is that FB's Like button is loaded inside an iframe - a self-contained HTML document within your page (if you use firebug or webkit inspector to inspect the like button, you'll see it's within <body>, <html>, then <iframe>).
The thing about these self-contained pages is that you can't access or manipulate them from the surrounding document (your page). You can change the 'src' attribute (telling the iframe to load a new page), but you can't apply or change styles on the elements inside the page. This is a security limitation that browsers have.
I know that it is possible to have a custom-styled like button, but I don't think it's done with the iframe method.
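To see the restriction described above in practice, a small check like the following can be run from the surrounding page. It is only a sketch, and canReachIframeDocument is a made-up name; it relies on the browser behaviour that, for a frame loaded from another origin, contentDocument is either null or inaccessible.

```javascript
// Returns true only if the surrounding page is allowed to reach the
// document inside the iframe (i.e. it could apply styles to it).
function canReachIframeDocument(iframe) {
  try {
    var doc = iframe.contentDocument;
    return doc !== null && doc !== undefined;
  } catch (e) {
    // Some browsers throw a security error instead of returning null.
    return false;
  }
}

// Only runs inside a browser page: report each iframe on the page.
if (typeof document !== 'undefined') {
  var frames = document.getElementsByTagName('iframe');
  for (var i = 0; i < frames.length; i++) {
    console.log(frames[i].src, canReachIframeDocument(frames[i]));
  }
}
```

For an iframe served from Facebook's domain this is expected to report false, which is why CSS from the host page cannot be applied to the elements inside it.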
