I want to fetch specific HTML content from a remote website's URL.
The URL is as follows:
http://www.realtor.com/realestateandhomes-detail/10216-Montwood-Drive_El-Paso_TX_79925_M78337-06548
I want to fetch some specific information from the page at the above URL.
I have attached an image that highlights the areas I need: a title, an image, and the descriptions.
How can I fetch this content using jQuery, JavaScript, or a JSON call?
Is there any other way to get it?
You might be interested in checking out pjscrape (disclaimer: this is my project). It's a command-line tool using PhantomJS to allow scraping using JavaScript and jQuery in a full browser context.
Scrapers can be written in straight JavaScript, executed in the context of the site you're scraping, with a very simple, jQuery-friendly syntax.
It can scrape a single page or an array of pages, or you can define a function to look for more URLs to spider on each page.
It supports JSON and CSV output, either to file or to STDOUT.
If the site is static and the structure is uniform, it should be very fast to scrape all the content you need into a structured data format.
This will help you out:
http://papermashup.com/use-jquery-and-php-to-scrape-page-content/
When scraping content, it is vital to consider the following:
Is the content static HTML, or will part of it be rendered by AJAX calls?
In the first case, simple HTTP GET routines like the one in the link from JNDPNT's comment will be sufficient.
In the second case, you may want to look at automating Selenium via its WebDriver.
In any case, it might be better to ask your colleague if he can provide you with an interface to the raw data, e.g. over a web service.
If I'm getting you right, you want the user's browser to scrape the content of another domain on the fly, right?
That will not be possible without proxying the request through some script on the same domain (or via a JSONP request to a service that returns the HTML to you), due to the same-origin policy.
Sorry to disappoint.
Use the Yahoo Pipes service (http://pipes.yahoo.com/pipes/).
It can be used to grab and manipulate the page HTML, extracting the bits you want. The data can then be posted server side using the Web Service module, or sent directly to the client's browser using an ordinary JavaScript callback.
Related
This is my situation:
I work with a third party that uses a product called MicroStrategy, which can generate documents and export them as PDF or Excel files. They give me only the web API of this product, and I don't have any web service to work with.
The URL looks like this:
http://<third_part_domain>/microstrategy/asp/Main.aspx?Server=<third_part_domain>&Project=<project_name>&evt=3069&src=Main.aspx.3069&executionMode=3&promptAnswerMode=1&documentID=<doc_id>&uid=<username>&pwd=<password>&<other_parameters_for_request>
I have tried to obtain the file (which I must save on the server side) with Java code, but the response to this URL is an HTML page containing JavaScript that performs more than one redirect, so I cannot interpret the response correctly and would need a browser to obtain the PDF.
So I thought of loading the page in an iframe and, after a while (the server usually takes about 20 seconds), grabbing the PDF object with JavaScript and sending it to my server. But the third party is on another domain, so the browser's cross-origin (CORS) restrictions block everything. To make matters worse, I cannot use the final URL to obtain the file, because MicroStrategy responds with an internal page of the administration console.
So, here is my question:
Is there a way (that does not involve changes on the MicroStrategy server side) to obtain the PDF directly from MicroStrategy?
Or is there a way to work around the origin restrictions from the client side? I have considered implementing a proxy as a solution, but it's too expensive.
Thanks to all!
You need two things in order to download a PDF from MicroStrategy using a URL:
In the document properties, set the default visualization to PDF. This is pretty trivial, and I think any of your MicroStrategy-savvy colleagues can help you with it.
Disable the waiting page. This is more complicated. When MicroStrategy generates a document, it usually needs some time, and while the server is working it shows you a waiting page. That is useful if the request comes from a human (the human can go on StackOverflow), not so much if the call comes from an API.
The instructions to disable the waiting page are here: TN34124: How to Disable the Wait Page in MicroStrategy Web using the MicroStrategy Web SDK 9.x.
But I gather from your question that you have no control over the third-party MicroStrategy application. In that case there is little you can do. You can try asking them to implement the customization that removes the waiting page, or to let you use the taskproc API, but that's a story for another day.
Some options:
Ask the third party to schedule the PDF generation on their side and send it to you by email, or place it in a folder that is shared between you.
Ask for a different URL from the file-share menu options. This will give a URL with 'subscriptionid' in it.
I am trying to search an outside URL for content matching "title" and return the results to my HTML page in the background using JavaScript. I have not found any resources that answer my question; maybe I'm asking it wrong?
Basically, I would search the document with:
var title = document.getElementsByName("title");
The hard part is connecting to the page and searching through the HTML source code.
TIA!
You generally can't get content from an outside URL in the browser unless that server specifically allows it: its response must include an Access-Control-Allow-Origin header whose value matches your origin. From your own server, however, you can fetch the content of any URL.
The server-side route works regardless of that header, unless the remote server blocks you specifically.
You will need to build a solution that fetches the content of the outside URL from your server. It can be written in anything: PHP, Node.js, C#, etc. After receiving the response from the external server, deliver it to the browser via AJAX or any other means. Then you can work with it any way you want using JavaScript or jQuery.
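The server-side approach described above can be sketched roughly as follows, assuming Node.js 18 or later (for the global fetch). This is only an illustration: the names isAllowedTarget, handleProxyRequest, and ALLOWED_HOSTS, the url query parameter, and the example host are all made up for this sketch.

```javascript
// Hypothetical allow-list: only proxy hosts you are permitted to fetch.
const ALLOWED_HOSTS = new Set(["www.example.com"]);

// Accept only absolute http(s) URLs whose host is on the allow-list.
function isAllowedTarget(rawUrl) {
  try {
    const target = new URL(rawUrl);
    const isHttp = target.protocol === "http:" || target.protocol === "https:";
    return isHttp && ALLOWED_HOSTS.has(target.hostname);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
}

// Request handler: reads the target from the "url" query parameter,
// fetches it server-side, and relays the body to the browser.
async function handleProxyRequest(req, res) {
  const target = new URL(req.url, "http://localhost").searchParams.get("url");
  if (!target || !isAllowedTarget(target)) {
    res.writeHead(400, { "Content-Type": "text/plain" });
    res.end("Missing or disallowed target URL");
    return;
  }
  try {
    const upstream = await fetch(target); // global fetch, Node.js 18+
    const html = await upstream.text();
    res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
    res.end(html);
  } catch (err) {
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end("Fetching the target failed");
  }
}

// To serve it (left commented out so the sketch has no side effects):
// require("node:http").createServer(handleProxyRequest).listen(3000);
```

Because the page and the proxy share an origin, the browser can then request something like /fetch?url=... with an ordinary AJAX call and search the returned HTML. The allow-list is there so the endpoint does not turn into an open proxy.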
Important Note:
Make sure you are allowed to access whatever you are trying to access. If the owners of the outside URL want to share something with the public, they are probably providing an API or some other way to access their content.
Research has led to a solution: implementing a scraper. There are many in existence, Scrapy for instance. Just a heads-up for those with the same question.
I'm looking for a way to scrape a website (which uses JavaScript) from the server side and, after analyzing the data, save the output to a MySQL database. I need to navigate from page to page by clicking links and submitting data from the database, without the session expiring. Is this possible using the phpQuery WebBrowser plugin? I've started doing this with CasperJS. I would like to know the pros and cons of both methods. I'm a beginner at coding. Please help.
I would recommend that you use PhantomJS or CasperJS and parse the DOM with JavaScript selectors to get the parts of the pages you want back. Don't use phpQuery as it's based on PHP and would require a separate step in your processing versus using just JavaScript DOM parsing. Also, you won't be able to perform click events using PHP. Anything client side would need to be run in PhantomJS or CasperJS.
It might even be possible to write a full scraping engine using just PHP, if that's your server-side language of choice. You would need to reverse engineer the login process and maintain a cookie jar with your cURL requests to keep your login valid with each request. Once you've established a session with the website, you can then set up your navigation path with an array of links that you would like to crawl. The idea behind web crawling is that you load a page from some link, process the page, and then move on to the next link. You continue this process until all pages have been processed, at which point your crawl is complete.
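To make the cookie-jar-plus-link-array idea concrete, here is a rough sketch. It is written in JavaScript rather than PHP/cURL so it matches the other snippets on this page, and it is not a real library API: CookieJar, crawl, loadPage, and processPage are hypothetical names, and loadPage is assumed to be a callback you supply that performs the HTTP request (sending jar.toHeader() as the Cookie header) and resolves to { html, setCookies }.

```javascript
// Minimal in-memory cookie jar: remembers name=value pairs across requests.
class CookieJar {
  constructor() {
    this.cookies = new Map();
  }

  // Record cookies from an array of raw Set-Cookie header values.
  // Attributes such as Path, Expires, and Domain are ignored in this sketch.
  store(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const pair = header.split(";")[0];
      const eq = pair.indexOf("=");
      if (eq > 0) {
        this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
      }
    }
  }

  // Build the value to send in the Cookie request header.
  toHeader() {
    return [...this.cookies].map(([name, value]) => `${name}=${value}`).join("; ");
  }
}

// Visit each link in order: load the page, process it, move to the next.
// loadPage(url, jar) is a hypothetical callback returning a promise of
// { html, setCookies }; processPage(url, html) is where you extract data.
async function crawl(links, jar, loadPage, processPage) {
  for (const url of links) {
    const { html, setCookies } = await loadPage(url, jar);
    jar.store(setCookies); // keep any refreshed session cookies
    processPage(url, html);
  }
}
```

The jar is passed into every request so that the session established at login keeps being sent, which is what keeps the session from expiring between pages.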
I would check out Google's guide Making AJAX Applications Crawlable; the website you're trying to scrape might have adopted the scheme (making its content crawlable).
Look for #! in the URL's hash fragment; this indicates to the crawler that the site supports the AJAX crawling scheme.
To put it simply, when you come across a URL like this:
www.example.com/ajax.html#!key=value, you would modify it to www.example.com/ajax.html?_escaped_fragment_=key=value. The server should respond with an HTML snapshot of that page.
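The rewrite described above could be sketched like this. toEscapedFragmentUrl is a made-up helper name, and the escaping is a simplification (it only handles whitespace, #, %, &, and +), so treat it as an illustration of the idea rather than a complete implementation of the scheme.

```javascript
// Convert a "#!" (hashbang) URL into its _escaped_fragment_ form.
// Returns null when the URL has no "#!" marker, i.e. the page does not
// advertise support for the AJAX crawling scheme.
function toEscapedFragmentUrl(url) {
  const marker = url.indexOf("#!");
  if (marker === -1) {
    return null;
  }
  const base = url.slice(0, marker);
  const fragment = url.slice(marker + 2);
  // Simplified escaping: percent-encode whitespace and # % & + only.
  const escaped = fragment.replace(/[\s#%&+]/g, (ch) => encodeURIComponent(ch));
  // Append to an existing query string if the base URL already has one.
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escaped;
}

console.log(toEscapedFragmentUrl("www.example.com/ajax.html#!key=value"));
// prints "www.example.com/ajax.html?_escaped_fragment_=key=value"
```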
Here is the Full Specification
I'm trying to build a very simple bit of JavaScript that reads and displays a couple of stock indexes when a webpage is loaded.
I was hoping to find an RSS feed with this data that I could then parse with jQuery.parseXML, but I couldn't track one down. What I did find is this: Yahoo Finance provides a way to download stock data in CSV format, specifying which data you're after via the URL.
So, I'm thinking this might be a way to accomplish what I'm after: when the page is loaded, I could send a request to Yahoo Finance, and then somehow parse the CSV data to get what I need to populate my stock quote. My question relates to the aforementioned "somehow." Is there a way to do this via JavaScript? Is it possible, for example, to load the CSV generated by Yahoo Finance as a string?
I'm also very open to any other suggestions of how to accomplish this. If anyone, for example, knows of an RSS feed from which I could get the S&P/TSX Composite index, please let me know!
You'll probably run into cross-domain request restrictions, as the browser will not let you do that. See the howto on avoiding that. You could also do it on the server side and then query that from your client. It depends on the server-side technology you are using.
After that, parsing the CSV shouldn't be a problem. Use something like string.split on each line.
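A minimal sketch of that split-based parsing, assuming the CSV is simple (no commas or line breaks inside quoted fields). parseSimpleCsv is a made-up name and the sample row is invented data, not a real quote.

```javascript
// Split CSV text into rows of cells. Only handles simple CSV: fields may be
// wrapped in double quotes, but must not contain commas or newlines.
function parseSimpleCsv(text) {
  return text
    .split(/\r?\n/)                           // one entry per line
    .filter((line) => line.trim() !== "")     // drop blank lines
    .map((line) =>
      line.split(",").map((cell) => cell.trim().replace(/^"|"$/g, ""))
    );
}

// Invented sample in the shape "symbol,last price":
const rows = parseSimpleCsv('"^EXAMPLE",12345.67\r\n');
console.log(rows[0][0], Number(rows[0][1]));
```

If the data can contain quoted commas, a proper CSV parser is the safer choice.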
By default, JavaScript is not allowed to make cross-domain requests unless you use JSON-P as your format, so requesting CSV directly from another domain will not be allowed. That makes this a bit problematic. In this case you will probably have to set up a proxy within your own domain that fetches the data from Yahoo server-side and sends it to your JavaScript from within your own domain.
There is a third-party web service. One of its public web methods is GetDocument(). This method returns a Document object, which has properties such as File (byte[]), ContentType (string), etc.
My question: can I call this service using JavaScript (MooTools) + AJAX + JSON, get back the Document object (in this case an Excel document), and force a file download?
It is true that you typically cannot initiate a download from JavaScript, but there is a Flash component, Downloadify, that enables client-side file generation.
So you can serve files for download from HTML/JavaScript.
With that problem solved, you still have the problem of how to get the data that you wish to serve from the source web service.
A third-party service implies cross-domain requests, which are a no-no with XMLHttpRequest (Ajax).
A possible solution to this problem could be to use a common hidden IFrame technique to get the data.
Simply have an appropriate (hidden?) form that posts to the web service, point its target at a hidden IFrame element, trap that element's load event, and parse the data returned.
But current browsers have different levels of security measures that limit your ability to access IFrames with an external source, so you are actually stuck here. Sorry to get your hopes up.
The only practical, robust way to accomplish what you would like to do is to have a local server-side script that acts as a proxy between your HTML/JavaScript and the external web service.
Using such a proxy, you can simply go back to using Ajax to get your data to serve up with Downloadify.
But then, since you are using a server script to get the data, why not just serve the data from the script for download?
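Serving the file straight from the server script could look roughly like this in Node.js. The File and ContentType properties come from the question; FileName is a hypothetical property added for the example, buildDownloadHeaders and sendDocument are made-up names, and res is assumed to be a Node.js http response object.

```javascript
// Build response headers that make the browser offer a "Save as" download.
function buildDownloadHeaders(fileName, contentType, byteLength) {
  // Replace characters that would break the quoted filename parameter.
  const safeName = fileName.replace(/["\\\r\n]/g, "_");
  return {
    "Content-Type": contentType,
    "Content-Length": String(byteLength),
    "Content-Disposition": `attachment; filename="${safeName}"`,
  };
}

// Send a Document-like object ({ File, ContentType, FileName }) as a download.
// FileName is an assumption; the question only mentions File and ContentType.
function sendDocument(res, doc) {
  const bytes = Buffer.from(doc.File); // byte array -> Buffer
  res.writeHead(200, buildDownloadHeaders(doc.FileName, doc.ContentType, bytes.length));
  res.end(bytes);
}
```

The Content-Disposition: attachment header is what triggers the download, so the page only needs to navigate to (or link to) the script's URL and no Flash component is involved.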
These are just my observations on the problem domain you present.