I'm using the Google Feed API to get feeds, and the results contain a link field, which is great. However, a FeedBurner feed returns something to the effect of google.feedproxy.blahblahblah, and I need the actual URL for the HTML. The google.feedproxy URL will redirect. Is there a way, using JavaScript, to get the URL that is being redirected to? Or even just a more elegant way of getting the HTML URL?
Any help is appreciated.
Grab the text in <feedburner:origLink>; it's available in most FeedBurner feeds.
Make sure your Accept header is limited to the RSS and Atom MIME types. If by "HTML URL" you mean the site for the feed, look for a <link rel="alternate"/> in the feed itself; its href is the relevant site.
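For illustration, here is a minimal PHP sketch of pulling <feedburner:origLink> out of a feed with SimpleXML. It assumes an RSS 2.0 feed and a placeholder feed URL; in browser JavaScript the same idea works by querying the namespaced element with getElementsByTagNameNS.

<?php
// Placeholder feed URL; assumes an RSS 2.0 FeedBurner feed.
$xml = simplexml_load_file('http://feeds.feedburner.com/example');

foreach ($xml->channel->item as $item) {
    // <feedburner:origLink> lives in the FeedBurner extension namespace.
    $fb = $item->children('http://rssnamespace.org/feedburner/ext/1.0');
    $orig = (string) $fb->origLink;
    // Fall back to the ordinary <link> when origLink is absent.
    echo ($orig !== '' ? $orig : (string) $item->link), "\n";
}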
In my textbook the URL http://services.faa.gov/airport/status/SFO?format=application/JSON was provided. That link points to a page that provides the content of the original page in JSON format. I want to format another webpage's content into JSON, so I tried copying the method (the link my professor provided for an assignment uses the same format), and I get nothing: http://www.programmableweb.com/apitag/weather?format=application/JSON. Clicking the link from here leads to a search of the website via a search engine; copy-pasting that exact same link just takes you to the actual webpage. My question is: why can't I just append ?format=application/JSON to any URL to get the JSON format of the webpage?
If it matters I'm trying to get JSON data to display via a Chrome extension.
My question is: why can't I just append ?format=application/JSON to any URL to get the JSON format of the webpage?
Because a URL is just data, and there is nothing standard about a query string parameter called "format". The server has to be designed to give you JSON before it can or will do that.
That particular website simply provides a feature where you can get the same data in an alternate format such as JSON. Not all websites provide features like that, and not all of them implement it with the same URL parameter. Some sites serve HTML pages at URLs ending in .html and the same information as JSON at URLs ending in .json; others might provide a separate API. Check whether the website has a "developers" section that gives information on its API, if it has one.
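To make that concrete, here is a hypothetical PHP sketch of what such a server does; the parameter name and payload are invented, and none of this exists on a site that hasn't implemented it:

<?php
// Hypothetical endpoint that honors a ?format= parameter (made-up payload).
$data = array('airport' => 'SFO', 'status' => 'normal');

if (isset($_GET['format']) && strtolower($_GET['format']) === 'application/json') {
    // The site chose to support this; nothing about URLs makes it automatic.
    header('Content-Type: application/json');
    echo json_encode($data);
} else {
    // Default behavior: serve the ordinary HTML page.
    header('Content-Type: text/html');
    echo '<html><body>Airport SFO: status normal</body></html>';
}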
I have used PHP Simple HTML DOM with no success on this issue.
Now I have gone to DOMDocument and DOMXPath, and this does seem promising.
Here is my issue:
I am trying to scrape data from a page which is loaded via a web service request after the page initially shows. It is only milliseconds but because of this, normal scraping shows a template value as opposed to the actual data.
I have found the endpoint URL using the Network tab in Chrome's developer tools, and if I enter that URL into the browser address bar the data displays nicely in JSON format. All good.
My problem arises because any time the site is re-visited or the page refreshed, the suffix of the endpoint URL is randomly generated, so I can't hard-code this URL into my PHP file. For example, the end of the URL is "?=253648592" on the first visit, but on refresh it could be "?=375482910". The base of the URL is static.
Without getting into headless browsers (I tried, and MY head hurts!), is there a way to have XPath find this random URL when the page loads?
Sorry for being so long-winded but I wanted to explain as best I could.
It's probably much easier and faster to just use a regex if you only need one item/value from the HTML. I would like to give an example, but for that I would need a more extended snippet of the HTML that contains the endpoint you want to fetch.
Is it possible to give a snippet of the HTML that contains the endpoint?
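In the meantime, here is a hedged sketch. It assumes the endpoint URL (with its random suffix) appears somewhere in the page source, e.g. in an inline script; the pattern is a guess and will need adjusting to the real markup:

<?php
// Fetch the page source, as in a normal scrape (placeholder URL).
$html = file_get_contents('http://example.com/page-with-endpoint');

// Match the static base followed by the random "?=123456789" suffix.
if (preg_match('~(https?://[^"\']+\?=\d+)~', $html, $m)) {
    $endpointUrl = $m[1];
    // Request the endpoint itself and decode the JSON it returns.
    $json = json_decode(file_get_contents($endpointUrl), true);
    var_dump($json);
}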
For some reason the National Weather Service's XML site does not work for me. When I say "does not work", I mean that I've tried both XMLHttpRequest and jQuery's $.ajax to GET the XML data from http://w1.weather.gov/xml/current_obs/KSFO.xml in order to write a script that displays current weather conditions. This is my code:
(function () {
    // Kick off the weather request as soon as the script runs.
    updateWeather();
})();

function updateWeather() {
    var url = "http://w1.weather.gov/xml/current_obs/KSFO.xml";
    $.ajax({
        url: url,
        dataType: 'xml',
        error: function (xhr) {
            // Show the HTTP status so failures are visible on the page.
            document.getElementById("weatherbox").innerHTML = "error" + xhr.status + xhr.statusText;
        },
        success: function (result, status, xhr) {
            document.getElementById("weatherbox").innerHTML = "success";
        }
    });
}
I know that you typically cannot request information cross-domain, but the NWS site says it's open to the public. Since I'm using an ajax call, and since it seems as though nobody else has this problem, it must not be a cross-domain error; still, I have tried using crossDomain: true in the ajax call. I have tried making the URL "https:..." instead, but that did nothing. I've tried specifying type: 'GET' in the ajax call as well. Every time I run the script it returns error0error. Does anyone have any ideas? A working implementation of an ajax call would be even better; I've been working at this for days, and it's driving me crazy that I can't seem to retrieve this data.
In response to the first comment: I looked into it before, but it seems like the SOAP service is for requesting data packages, such as "the weather in SF from January to September" or something. From the looks of this:
"XML Feeds of Current Weather Conditions
This page provides access to observed current weather conditions for about 1,800 locations across the United States and US Territories. Two file formats designed for computer to computer data transfer are provided. RSS and XML lists are provided to aid the automated dissemination of this information. More information on RSS and XML formats/feeds. Comments and feedback are welcome. There is additional information about this offering via this Product Description Document.
Select a State or Territory to locate XML weather observations feeds available:
Select a State/Territory above to display a list of observation stations. An index list of all available stations is available in XML (900kb): XML Format"
and
"About XML
NWS offers hourly weather observations formatted with xml tags to aid in the parsing of the information by automated programs used to populate databases, display information on webpages or other similar applications. This format is not to be confused with RSS and cannot be read by RSS readers and aggregators. These files present more detailed information than the RSS feeds in strings friendly for parsing. Both the RSS and XML feeds offer URLs to icon images. Additionally, A list of what phrases may appear in the XML tag and suggested icons is available. To access these feeds, select a state and then the last XML link in the column."
from this site: http://w1.weather.gov/xml/current_obs/
I should be able to just use the XML from the link I posted above to retrieve current observation data, not packages like one would use for calculating or predicting forecast trends. And it seems as though the SOAP request service actually would not work for my purposes, because I cannot just order one data point.
You could use a JSONP request to avoid getting CORS errors, but this SOAP service does not wrap data in a script. Just try reading the docs here. You'll most probably have to create a client. NWS also provides a RESTful API; read the tutorials here.
If you can use a PHP proxy, then look at http://www.webresourcesdepot.com/cross-domain-javascript-with-simple-php-proxy/ for the solution, and the corresponding code link at Pastebin.
To summarize, the solution uses an intermediary to the remote site that sits at the same location as your JS code. You invoke the proxy by setting its url parameter to your target. Say you saved the proxy code as 'weatherproxy.php' and your webserver supports PHP with cURL; then you would set your variable as
var url = 'weatherproxy.php?url=http://w1.weather.gov/xml/current_obs/KSFO.xml';
With no other options to your proxy, on success it will return JSON of the form:
{ status: { http_code: 200 }, contents: "your xml contents as a string" }
From there you would have to parse the XML in 'contents'. Alternatively, there is a parameter you can supply to the proxy to return the raw XML: '&mode=native'. I'm not sure, though, that jQuery can properly handle the XML that comes back.
Have fun exploring the code.
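For reference, here is a stripped-down PHP sketch of the idea behind such a proxy; the real Pastebin code adds validation and options like mode=native, and a real deployment must whitelist allowed URLs, which this sketch does not:

<?php
// weatherproxy.php -- minimal same-origin proxy sketch (no URL whitelist!).
$url = $_GET['url'];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$contents = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Mirror the { status, contents } shape described above.
header('Content-Type: application/json');
echo json_encode(array(
    'status'   => array('http_code' => $status),
    'contents' => $contents,
));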
I am developing a PHP-based website. I would like to grab the web page title, content, and thumbnail from any URL submitted by a user. I am not sure how I should proceed. If possible, I would like to avoid any third-party platform such as Embed. Could you please help?
Cheers.
You need to get the source of the entered web URL and parse it with a regex.
To get the source code, use:
file_get_contents();
Then parse it with:
preg_match();
Details of the functions:
http://php.net/manual/en/function.preg-match.php
http://www.w3schools.com/php/func_filesystem_file_get_contents.asp
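A minimal sketch along those lines, grabbing just the page title (a regex is fragile for full HTML, so for the content and thumbnail DOMDocument is sturdier; the og:image meta tag shown is only a common convention):

<?php
// URL as submitted by the user (validate it in real code).
$url = 'http://example.com/';

// Get the page source.
$html = file_get_contents($url);

// Pull the <title> text out with a regex.
if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
    $title = trim($m[1]);
}

// Many pages expose a thumbnail via the og:image meta tag.
if (preg_match('~<meta[^>]+property=["\']og:image["\'][^>]+content=["\']([^"\']+)~i', $html, $m)) {
    $thumb = $m[1];
}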
I have obtained a URL in a variable. Using the URL, I would like to get a particular piece of content from that HTML page.
The URL is http://www.linkedin.com/profile/view?id=1112465
From this page I would like to get the current company data using JavaScript.
So please help me with this.
Assuming you don't work for LinkedIn, here's the simplest answer: you can't.
There are cross-origin limitations that disallow fetching content from a domain other than the one that's requesting it. What does this mean? abc.com can't request content from xyz.com, at least not without special permission.
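That "special permission" is CORS: the remote server has to opt in by sending an Access-Control-Allow-Origin header. As a sketch, this is roughly what xyz.com would have to serve for abc.com to be allowed to read it; LinkedIn, of course, does not do this for arbitrary sites (the payload is made up):

<?php
// On xyz.com's server: explicitly allow cross-origin reads from abc.com.
header('Access-Control-Allow-Origin: http://abc.com');
header('Content-Type: application/json');
echo json_encode(array('company' => 'Example Corp'));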