I have started working with JavaScript. I want to count the number of frames/anchors on the Yahoo homepage without opening it (meaning I don't want to load the page in another window or frame). I haven't found a proper solution for this without using AJAX. Can't we create a document object referring to a remote page?
As I am using JavaScript without any framework, can someone guide me on how to do this?
You have very few options if you're working on the client side only, primarily because you'll be dealing with the dreaded, but necessary, cross-domain policy. However, even without cross-domain scripting issues, it won't be possible to accomplish this without AJAX: you'll need to make a request to a server for the page's HTML somehow. I would suggest taking a look at YQL, as it appears to solve the cross-domain issue.
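For illustration, a minimal sketch of fetching a remote page's anchors through YQL via JSONP; the endpoint and query syntax follow YQL's public documentation, so verify them against the live service before relying on this:

```javascript
// YQL fetches the remote page server-side and hands the selected
// nodes back through a JSONP callback, sidestepping the cross-domain
// policy. Plain JavaScript, no framework required.
function handleYql(response) {
    var results = response.query.results; // nodes matched by the XPath
    var count = results && results.a ? results.a.length : 0;
    console.log(count + ' anchors found');
}

var query = 'select * from html where url="http://www.yahoo.com" and xpath="//a"';
var script = document.createElement('script');
script.src = 'http://query.yahooapis.com/v1/public/yql' +
             '?q=' + encodeURIComponent(query) +
             '&format=json&callback=handleYql';
document.head.appendChild(script);
```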
You'll have to execute an HTTP request to get the HTML string to feed into a DOM object. AJAX is the easiest way to do that; why do you want to avoid AJAX?
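Once you have the HTML as a string, counting nodes doesn't require rendering the page at all. A minimal sketch, assuming the fetched markup is already in a variable:

```javascript
// Parse an HTML string into a detached document and count anchors and
// frames without ever displaying the page.
function countNodes(htmlString) {
    var doc = new DOMParser().parseFromString(htmlString, 'text/html');
    var anchors = doc.getElementsByTagName('a').length;
    var frames = doc.getElementsByTagName('frame').length +
                 doc.getElementsByTagName('iframe').length;
    return { anchors: anchors, frames: frames };
}
```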
In plain old Java you can create a DocumentBuilder and source a document from a URL, but I don't believe an equivalent is available in JavaScript.
Related
I am trying to search an outside URL for content matching "title" and return the results to my HTML page in the background through JavaScript. I have been using JavaScript and haven't found any resources that resolve my query; maybe I'm asking the wrong way?
but I would basically search the document with:
var title = document.getElementsByName("title");
The hard part is connecting to the page and searching through the HTML source code.
TIA!
You generally can't get the content from an outside URL unless the server specifically allows you to do so: the server must include a header in its response named Access-Control-Allow-Origin whose value contains a pattern matching, or the name of, your domain.
However, you can always do it from the server side, unless the server specifically blocks you.
You will need to develop a solution in which you grab the content of your outside URL from your own server. It can be anything, such as PHP, Node.js, or C#. After receiving the response from the external server, deliver it in the response to the browser using AJAX or similar. Then you can play with it any way you want using JavaScript or jQuery.
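As a concrete illustration, here is a minimal sketch of such a proxy in Node.js (no external dependencies; the /fetch route and port are arbitrary choices):

```javascript
// The browser calls /fetch?url=... on YOUR domain; the server fetches
// the remote page and relays the HTML back with a permissive CORS header.
const http = require('http');
const https = require('https');

http.createServer((req, res) => {
    const target = new URL(req.url, 'http://localhost').searchParams.get('url');
    if (!target) {
        res.writeHead(400);
        return res.end('Missing ?url= parameter');
    }
    const client = target.startsWith('https') ? https : http;
    client.get(target, (remote) => {
        let html = '';
        remote.on('data', (chunk) => (html += chunk));
        remote.on('end', () => {
            res.writeHead(200, {
                'Content-Type': 'text/html',
                'Access-Control-Allow-Origin': '*' // let your page read it
            });
            res.end(html);
        });
    }).on('error', () => {
        res.writeHead(502);
        res.end('Upstream fetch failed');
    });
}).listen(8080);
```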
Important Note:
Make sure that whatever you are trying to access, in whatever way, you are allowed to access it. If they (your outside URL) want to share something with the public, they should be providing an API or some other solution that gives you access to their content.
Research has led to a solution: implementing a scraper. Many already exist, Scrapy for instance. Just a heads-up for those with the same question.
I'm using Coda, and I'm trying to write a program that uses JavaScript/jQuery to load the NFL's HTML from their stats page (http://www.nfl.com/stats/player) and then remove all of the excess HTML, resulting in several lists of players and their stats.
I've tried using `$('#container').load('http://www.nfl.com/stats/player')`.
This works fine in Coda, but I can't parse the HTML the way I want to.
In Google Chrome I get the error:
XMLHttpRequest cannot load http://www.nfl.com/stats/player. Origin null is not allowed by Access-Control-Allow-Origin.
From what I understand, this is a security feature built into all browsers. Is there a workaround for this issue? Can I use a different type of request?
I understand that I should be using JSONP for this type of request, but I don't believe the NFL has an API that would make this possible.
I've seen questions like this get thrown around, but I don't think anyone's given a really solid answer yet.
I think there are still a lot of people wondering whether there's an easy way to $.get cross-domain HTML and parse it.
You're not allowed to do this because it can be abused for cross-site attacks, where scripts on one site access content from another domain; i.e. you could get cookie information or the like this way.
You will have to do this server side. If you're using PHP you can use `$content = file_get_contents('http://nfl.com/stats/player');`, or you can do it with cURL if you wish.
Otherwise, the legitimate way to do it is through an API, but as you've pointed out, that isn't an option in your case.
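For the client side of that approach, a minimal sketch, assuming a hypothetical proxy.php on your own server that fetches the given URL (e.g. with file_get_contents) and echoes the HTML back:

```javascript
// Same-origin request to your own proxy, so the browser allows it.
$.get('proxy.php', { url: 'http://www.nfl.com/stats/player' }, function (html) {
    // $.parseHTML parses the string without inserting it into the live
    // document (and without running the page's scripts).
    var $page = $($.parseHTML(html));
    // The selector is a guess at the stats markup; adjust to the real page.
    $page.find('table tr').each(function () {
        console.log($(this).text().trim());
    });
});
```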
I want to fetch particular HTML content from a remote website's URL.
The website URL is as follows:
http://www.realtor.com/realestateandhomes-detail/10216-Montwood-Drive_El-Paso_TX_79925_M78337-06548
I want to fetch some specific information from the above website URL.
I have attached an image highlighting the specific area I want; the highlighted portion contains a title, an image, and descriptions.
How can I fetch the contents using jQuery, JavaScript, or a JSON call?
Is there any other way to get these?
You might be interested in checking out pjscrape (disclaimer: this is my project). It's a command-line tool that uses PhantomJS to allow scraping with JavaScript and jQuery in a full browser context.
Scrapers can be written in straight JavaScript, executed in the context of the site you're scraping, with a very simple, jQuery-friendly syntax.
It can scrape a single page, an array of pages, or you can define a function to look for more URLs to spider on each page.
It supports JSON and CSV output, either to file or to STDOUT.
If the site is static and the structure is uniform, it should be very fast to scrape all the content you need into a structured data format.
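A minimal sketch of a pjscrape suite; the option names follow the project's documented examples as best I recall, and the URL and selector are placeholders, so check the README before using:

```javascript
// Configure output, then define a suite that scrapes one page.
pjs.config({
    format: 'json',  // or 'csv'
    writer: 'file',
    outFile: 'out.json'
});

pjs.addSuite({
    url: 'http://www.example.com/listings',
    scraper: function() {
        // Runs inside the page, with jQuery injected by pjscrape.
        return $('h2.listing-title').map(function() {
            return $(this).text();
        }).toArray();
    }
});
```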
This will help you out:
http://papermashup.com/use-jquery-and-php-to-scrape-page-content/
When scraping content, it is vital to consider the following:
Is the content static HTML, or will part of it be rendered by AJAX calls?
In the first case, simple HTTP GET routines like the one linked in JNDPNT's comment will be sufficient.
In the second case, you may want to look at automating Selenium via its WebDriver (a sketch follows below).
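A minimal sketch using the selenium-webdriver Node bindings (npm install selenium-webdriver; assumes a local ChromeDriver, and the URL and selector are placeholders):

```javascript
// Drive a real browser so AJAX-rendered content is present before scraping.
const { Builder, By, until } = require('selenium-webdriver');

(async () => {
    const driver = await new Builder().forBrowser('chrome').build();
    try {
        await driver.get('http://www.example.com');
        // Wait for the AJAX-rendered element before reading it.
        const el = await driver.wait(
            until.elementLocated(By.css('.listing-title')), 10000);
        console.log(await el.getText());
    } finally {
        await driver.quit();
    }
})();
```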
In any case, it might be better to ask your colleague whether he can provide you with an interface to the raw data, e.g. via a web service.
If I'm getting you right, you want the user's browser to scrape the content of another domain on the fly, right?
That will not be possible without proxying the request through some script on the same domain (or via a JSONP request to a service that returns the HTML to you) due to the same-origin policy.
Sorry to disappoint.
Use the Yahoo Pipes (http://pipes.yahoo.com/pipes/) service.
This can be used to grab and manipulate the page's HTML, extracting the bits you want. Data can then be posted server side using the Web Service module or sent directly to the client's browser using an ordinary JavaScript callback.
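A sketch of the callback side, assuming a pipe that renders its output as JSON; the pipe ID, parameter names, and item fields are placeholders based on Pipes' documented URL format as I recall it:

```javascript
// Pipes runs the pipe server-side and wraps the JSON output in the
// callback name you pass, JSONP-style.
function handlePipe(data) {
    var items = data.value.items; // Pipes puts results under value.items
    for (var i = 0; i < items.length; i++) {
        console.log(items[i].title);
    }
}

var s = document.createElement('script');
s.src = 'http://pipes.yahoo.com/pipes/pipe.run' +
        '?_id=YOUR_PIPE_ID&_render=json&_callback=handlePipe';
document.head.appendChild(s);
```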
I'm currently working on a web application that customers can add to their web pages by adding a script link to a JS file on my server. The application reads all the JavaScript files from my server, but I still get an error when trying to use AJAX to get data from my database. I didn't think that would be a problem because the files are on my server.
Can I fix this, or do I have to make a cross-domain solution? I don't have any control over the customers' servers.
Thanks in advance
Mikael
This is not possible: When you execute a remote script, it runs in the context of the containing document.
There are some popular workarounds for this:
Using an iframe, which fixes the cross-domain problem but doesn't integrate well with the remote site (e.g. no custom styling)
Using JSONP to make cross-domain Ajax requests (detailed explanation here; see the sketch after this list)
Using a server-side proxy script (not an option in this scenario)
Using YQL (I'm not familiar with this but it's said to work)
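A minimal JSONP sketch (the endpoint api.example.com is hypothetical; the real server must wrap its JSON response in the callback name it receives):

```javascript
// Script tags are exempt from the same-origin policy, so the response
// arrives as executable code that calls your named function.
function handleData(data) {
    console.log('Received:', data);
}

var script = document.createElement('script');
script.src = 'http://api.example.com/data?callback=handleData';
document.head.appendChild(script);
// The server responds with: handleData({"rows": [...]});
```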
The same-origin policy is based on the host document, not the script itself.
You need to use a cross-domain AJAX technique.
There is this 3rd-party web service. One of the public web methods available is a GetDocument() method, which returns a Document object. The Document object has properties for File (byte[]), ContentType (string), etc.
My question: can I consume this service using JavaScript (MooTools) + AJAX + JSON, retrieve the Document object, in this case an Excel document, and force the file download?
It is true that you typically cannot initiate a download from JavaScript, but there is a Flash component, Downloadify, that does enable client-side file generation.
So you can serve files for download from HTML/JavaScript.
With that problem solved, you still have the problem of how to get the data that you wish to serve from the source web service.
3rd party implies a cross-site request, which is a no-no using XMLHttpRequest (AJAX).
A possible solution to this problem could be to use the common hidden-iframe technique to get the data.
Simply have an appropriate (hidden?) form that correctly posts to the web service, point its action at a hidden iframe element, trap the iframe's Load event, and parse the data returned.
But current browsers have different levels of security measures that limit your ability to access iframes with an external source, so you are actually stuck here. Sorry to get your hopes up.
The only practical, robust way to accomplish what you would like is to have a local server-side script that acts as a proxy between your HTML/JavaScript and the external web service.
Using such a proxy, you can simply go back to using AJAX to get your data to serve up with Downloadify.
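As a hypothetical sketch of that flow, with MooTools fetching from an assumed same-origin proxy route and Downloadify option names taken from its documented examples (verify against the actual library):

```javascript
// /proxy/getDocument is a route you write server-side: it calls the
// third-party GetDocument() method and relays File/ContentType as JSON.
new Request.JSON({
    url: '/proxy/getDocument',
    onSuccess: function (doc) {
        Downloadify.create('download_button', {
            filename: 'report.xls',       // assumed name
            data: doc.File,               // base64 payload from the proxy
            dataType: 'base64',
            swf: 'media/downloadify.swf', // paths depend on your setup
            downloadImage: 'images/download.png',
            width: 100,
            height: 30,
            transparent: true,
            append: false
        });
    }
}).send();
```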
But then, since you are using a server script to get the data, why not just serve the data from the script for download?
These are just my observations on the problem domain you present.