Trying to build query string and scrape Google results - javascript

I'm trying to build a Google query string, make a request to that page, scrape the HTML, and parse it inside a Chrome extension, which means JavaScript. So I have the following code:
var url = "https://www.google.com/search?#q=" + artist + "+" + title;
searchGoogleSampleInformation(url);
function searchGoogleSampleInformation(url)
{
    var xhr = new XMLHttpRequest();
    xhr.open("GET", url, false);
    xhr.onreadystatechange = function ()
    {
        if (xhr.readyState == 4)
        {
            return parseGoogleInformation(xhr.responseText, url);
        }
    };
    xhr.send();
}

function parseGoogleInformation(search_results, url)
{
    var link = $(".srg li.g:eq(0) .r a", search_results).attr('href');
}
The parse method just grabs the URL of the first search result (which is not what I'll end up doing, but it's just to test that the HTTP request was working). But link is undefined after that line. Then I used alert(url) and verified that my query string was being built correctly; I copied it from the alert window and pasted it into another tab, and it pulled up the results as expected. Then I opened a new window with search_results, and it appeared to be Google's regular homepage with no search at all. I thought the problem might be the asynchrony of the xhr.open call, but flipping that didn't help either. Am I missing something obvious?

It's because "https://www.google.com/search?#q=" + artist + "+" + title initially has no search results in the content. Google renders the page initially with no results and then dynamically loads the results via JavaScript. Since you are just fetching the HTML of the page and processing it the JavaScript in the HTML never gets executed.
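Note also that everything after the # in a URL is a fragment, which the browser never sends to the server, so the server only sees a bare /search request. A minimal sketch of a query string that does reach the server (reusing the artist and title variables from the question) would be:

var query = encodeURIComponent(artist) + "+" + encodeURIComponent(title);
var url = "https://www.google.com/search?q=" + query; // ?q=, not ?#q=

Even then, Google may render results client-side or block automated requests, so treat this as a starting point only.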

You are making a cross-domain Ajax call, which is not allowed by default: you cannot make a cross-domain call unless the server supports it and you pass the appropriate headers.
However, since you mentioned you are building a Chrome extension, it is possible by adding a few fields to the manifest file: https://developer.chrome.com/extensions/xhr#requesting-permission
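For example, a sketch of the relevant manifest entry (manifest v2, as in the linked documentation; the name and version fields are placeholders):

{
    "name": "Sample Extension",
    "version": "1.0",
    "manifest_version": 2,
    "permissions": [
        "https://www.google.com/"
    ]
}

With that host permission in place, the extension's XMLHttpRequest to google.com is exempt from the usual cross-origin restrictions.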

Related

How do I send user to a link upon successful xhr POST?

I have a working web page, complete with a JavaScript function that displays text messages for the "non-successful" results within the same page. Everything is working except this last step.
I need to send a JSON string to my server in a POST, and regardless of outcome, I need the user's browser to navigate to the page returned by the POST (just as if it were an ordinary link, an href="" type of thing). I am using the custom tag [OK_RESULT_URL], which my server replaces with the real URL just before the page is downloaded.
You can see in my code below that I set the URL to [OK_RESULT_URL] and window.location to [OK_RESULT_URL] as well, which seems wrong: it means I'm making two hits to [OK_RESULT_URL], one a POST with a body (which is correct) and the other a GET without a body (which is wrong).
I'm a total newbie to JavaScript, so I'm probably missing something obvious. It's as if, instead of xhr.Send(), I want to say xhr.SendAndNavigateTo()... or something like that.
Thanks for any help you can provide.
onApproval: function (response) {
    showResult("Approved", JSON.stringify(response, null, '\t'));
    let xhr = new XMLHttpRequest();
    let url = "[OK_RESULT_URL]";
    xhr.open("POST", url, true);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            window.location = "[OK_RESULT_URL]";
        }
    };
    var data = JSON.stringify(response);
    xhr.send(data);
}
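One way to avoid the second hit (a sketch, not from the original thread, and assuming the POST response body is itself the HTML page to display) is to replace the current document with the response instead of navigating again:

xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
        // Render the POST response directly; no second GET needed.
        document.open();
        document.write(xhr.responseText);
        document.close();
    }
};

If the server instead returns a URL to go to, reading it from xhr.responseText and assigning that to window.location would likewise avoid requesting [OK_RESULT_URL] twice.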

Modify <div> container on click of a button, insert HTML from GET request

Prerequisites
I have a website that displays a page with an input and a button. On the other end is a server that exposes a very basic HTTP API. The API is called like this:
http://127.0.0.1/api/arg1/arg2/arg3
where argX are the arguments. It returns raw HTML. This HTML code needs to be inserted into the website (which is on another domain). There is a
<div id="container5"></div>
on the website, and the HTML needs to be inserted into this container. The code returned by the API is made specifically for this container: it relies on CSS classes and scripts from the website, i.e. the code is not valid on its own.
The Goal
Here is what I have so far: I've got the API to return what I want, and I have a small piece of JavaScript running on the website that changes the contents of the container:
var element = document.getElementById("container5");
element.innerHTML = "New Contents";
This works so far. Now I need a way to get the HTML from the API into the page. From reading numerous SO questions, it quickly became clear that reading HTML from another URL is close to impossible in JavaScript, due to security constraints.
Is there an easy way to do this with JavaScript, or do I need to rethink the whole process? One last constraint on my side is that I can only insert JS into the website; I can't, for example, upload a new file to the server.
Edit 1: Workaround!
I solved this by using an intermediate PHP file on the requesting server:
<?php
echo file_get_contents('http://example.com');
?>
This serves the HTML content of any URL as a page on the requesting server's own domain. Now the requesting site can read it using JavaScript:
var getHTML = function ( url, callback ) {
    // Feature detection
    if ( !window.XMLHttpRequest ) return;

    // Create new request
    var xhr = new XMLHttpRequest();

    // Setup callback
    xhr.onload = function() {
        if ( callback && typeof( callback ) === 'function' ) {
            callback( this.responseXML );
        }
    };

    // Get the HTML
    xhr.open( 'GET', url );
    xhr.responseType = 'document';
    xhr.send();
};
This replaces the contents of any element:
var element = document.getElementById("resultpage");
getHTML( 'http://localserver.org/test.php', function (response) {
    element.innerHTML = response.documentElement.innerHTML;
});
Check out CORS (https://en.wikipedia.org/wiki/Cross-origin_resource_sharing) and also JSONP, covered in the same article.
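If you control the API server, CORS is the cleaner route: have the API send an Access-Control-Allow-Origin header that permits the website's origin, and the browser will then allow the request directly (a sketch, assuming such a header is in place):

var xhr = new XMLHttpRequest();
xhr.onload = function () {
    // Succeeds only because the API responds with a permissive
    // Access-Control-Allow-Origin header (CORS).
    document.getElementById("container5").innerHTML = xhr.responseText;
};
xhr.open("GET", "http://127.0.0.1/api/arg1/arg2/arg3");
xhr.send();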

Get text from a link in javascript

I am trying to get text from a service on the same server as my webserver. The link is something like this:
http://<OwnIPadres>:8080/calc/something?var=that
This is my code:
function httpGet(theUrl)
{
    alert(theUrl);
    var doc = new XMLHttpRequest();
    doc.onreadystatechange = function() {
        if (doc.readyState == XMLHttpRequest.DONE) {
            alert("text: " + doc.responseText );
            document.getElementById('ctm').text = doc.responseText;
        }
    };
    doc.open("get", theUrl);
    doc.setRequestHeader("Content-Encoding", "UTF-8");
    doc.send();
}
The URL that I print in the first alert is correct: if I paste it into my browser, it shows an HTML page with a table in it. But the alert of my response text is empty. Is it a problem that the text is HTML?
Actually, it's quite OK that your 'text' is 'html'. The problem is that using a different port makes this a cross-origin request. Therefore, your XMLHttpRequest is being stopped by the browser before it ever reaches your service on port 8080.
I'm not sure what else you're doing before and around this code snippet, but you could try an iframe call to your URL to get your data, or you could have the service add an
Access-Control-Allow-Origin: http://<OwnIPadres>:8080/
header to its response (however, that will only get you the most modern browsers).
Finally, you could pull in a JS framework like jQuery, which could help you with fetching this service data.
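For reference, once the service is reachable (via CORS or a same-origin proxy), a cleaned-up version of the client call might look like this (a sketch; 'ctm' is the element id from the question, and textContent replaces the .text property, which plain elements don't have):

function httpGet(theUrl) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        if (xhr.readyState === XMLHttpRequest.DONE && xhr.status === 200) {
            // Plain elements have no .text property; use textContent.
            document.getElementById('ctm').textContent = xhr.responseText;
        }
    };
    xhr.open("GET", theUrl);
    xhr.send();
}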

How to use javascript to get information from the content of another page (same domain)?

Let's say I have a web page (/index.html) that contains the following
<li>
<div>item1</div>
details
</li>
and I would like to have some JavaScript on /index.html load that
/details/item1.html page and extract some information from it.
The page /details/item1.html might contain things like
<div id="some_id">
picture
map
</div>
My task is to write a Greasemonkey script, so changing anything server-side is not an option.
To summarize: JavaScript is running on /index.html, and I would like that code to add some information to /index.html that is extracted from both /index.html and /details/item1.html.
My question is how to fetch information from /details/item1.html.
I have already written code to extract the link (e.g. /details/item1.html) and pass it on to a method that should extract the wanted information (at first, just the .innerHTML of the some_id div is fine; I can process it further later).
The following is my current attempt, but it does not work. Any suggestions?
function get_information(link)
{
    var obj = document.createElement('object');
    obj.data = link;
    document.getElementsByTagName('body')[0].appendChild(obj);
    var some_id = document.getElementById('some_id');
    if (! some_id) {
        alert("some_id == NULL");
        return "";
    }
    return some_id.innerHTML;
}
First:
function get_information(link, callback) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", link, true);
    xhr.onreadystatechange = function() {
        if (xhr.readyState === 4) {
            callback(xhr.responseText);
        }
    };
    xhr.send(null);
}
then
get_information("/details/item1.html", function(text) {
var div = document.createElement("div");
div.innerHTML = text;
// Do something with the div here, like inserting it into the page
});
I have not tested any of this - off the top of my head. YMMV
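Building on the answer above, one way to pull out just the some_id div from the fetched page (a sketch; querySelector is available in any Firefox recent enough to run Greasemonkey scripts like this):

get_information("/details/item1.html", function (text) {
    var div = document.createElement("div");
    div.innerHTML = text;
    var details = div.querySelector("#some_id");
    if (details) {
        // Insert only the wanted fragment into the current page.
        document.body.appendChild(details);
    }
});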
Since only one page exists in the client (browser) at a time, and all other (virtual/possible) pages live on the server, you will have to interact with the server at some point to retrieve the second page; the question is how.
If you can, use an AJAX request to load the second page (and parse it). If that's not an option, I'd say you'll have to load all the pages you want to extract information from at the same time, hide the bits you don't want to show (in hidden DIVs?), and then have your index (or whatever controls the view) retrieve the needed information from there... even though that sounds pretty creepy ;)
You can load the page in a hidden iframe and use normal DOM manipulation to extract the results, or get the text of the page via AJAX, grab the part between <body...> and </body>, and temporarily inject it into a div. (The second might fail for some exotic elements like ins.) I would expect Greasemonkey to have more powerful functions than normal JavaScript for stuff like this, though - it might be worth thumbing through the documentation.
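A sketch of the hidden-iframe variant mentioned above (same-origin pages, as in the question, so the frame's DOM is accessible):

function get_information_via_iframe(link, callback) {
    var iframe = document.createElement('iframe');
    iframe.style.display = 'none';
    iframe.src = link;
    iframe.onload = function () {
        var doc = iframe.contentDocument || iframe.contentWindow.document;
        var node = doc.getElementById('some_id');
        callback(node ? node.innerHTML : '');
        document.body.removeChild(iframe); // clean up the helper frame
    };
    document.body.appendChild(iframe);
}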

Problem with making a simple JS XmlHttpRequest call

Edit: Maybe I made the question more complex than it should be. My question is this: how do you make API calls to a server from JS?
I have to create a very simple client that makes GET and POST calls to our server and parses the returned XML. I am writing this in JavaScript; the problem is I don't know how to program in JS (I started looking into it just this morning)!
As an initial test, I am trying to ping the Twitter API. Here's the function that gets called when the user enters the URL http://api.twitter.com/1/users/lookup.xml and hits the submit button:
function doRequest() {
    var req_url, req_type, body;

    req_url = document.getElementById('server_url').value;
    req_type = document.getElementById('request_type').value;
    alert("Connecting to url: " + req_url + " with HTTP method: " + req_type);

    req = new XMLHttpRequest();
    req.open(req_type, req_url, false, "username", "passwd"); // synchronous conn
    req.onreadystatechange = function() {
        if (req.readyState == 4) {
            alert(req.status);
        }
    };
    req.send(null);
}
When I run this on FF, I get an
Access to restricted URI denied (code: 1012)
error in Firebug. Stuff I googled suggested that this was an FF-specific problem, so I switched to Chrome. There, the second alert comes up but displays 0 as the HTTP status code, which I found weird.
Can anyone spot what the problem is? People say this stuff is easier with jQuery, but learning that on top of JS syntax is a bit too much right now.
For security reasons, you cannot use AJAX to request a file from a different domain.
Since your JavaScript isn't running on http://api.twitter.com, it cannot request files from http://api.twitter.com.
Instead, you can write server-side code on your own domain that fetches the file for you.
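A minimal sketch of such a same-origin relay, here in Node.js (an assumption; any server-side language works the same way):

// proxy.js - relays browser requests from your own origin to the remote API.
var http = require('http');
var https = require('https');

http.createServer(function (req, res) {
    // Forward the incoming path to the API (sketch only; validate
    // req.url before using anything like this in production).
    https.get('https://api.twitter.com' + req.url, function (remote) {
        res.writeHead(remote.statusCode, remote.headers);
        remote.pipe(res);
    }).on('error', function () {
        res.writeHead(502);
        res.end();
    });
}).listen(3000);

The page then requests http://yourdomain:3000/1/users/lookup.xml from its own origin (yourdomain is a placeholder), and the proxy fetches the Twitter URL on its behalf.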
