$.get() doesn't work with redirects

I'm writing an extension for Google Chrome that needs to search the contents of all URLs on a Google search results page.
For example, after searching for jquery in the Google search box, I want to see the title tag of every link on the results page. I get all links with var links=$('a') and then try to use jQuery's $.get() function as shown below, but it doesn't give me the right result:
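var links = $('a');
for (var i = 0; i < links.length; i++) {
  $.get(links[i].href, function(data) {
    console.warn(data); // raw HTML returned for each result link
  });
}
The result is: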
<script>window.googleJavaScriptRedirect=1</script><META name="referrer" content="origin"><script>var m={navigateTo:function(b,a,d){if(b!=a&&b.google){if(b.google.r){b.google.r=0;b.location.href=d;a.location.replace("about:blank");}}else{a.location.replace(d);}}};m.navigateTo(window.parent,window,"https://www.facebook.com/r.php");</script><noscript><META http-equiv="refresh" content="0;URL='https://www.facebook.com/r.php'"></noscript>

AJAX $.get() works with normal HTTP redirects (3xx responses), because the browser follows them automatically.
The problem you have is that the page you are loading with $.get() uses a JavaScript redirect instead. $.get() only returns that page's source as text; the scripts in it are never executed, so the redirect never happens.
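Because $.get() only hands you this text, one workaround is to read the destination out of it and request that URL instead. Below is a minimal sketch, assuming Google's redirect page keeps exactly the shape shown above (an m.navigateTo(...) call whose last argument is the target, plus a <noscript> meta refresh); extractGoogleRedirectTarget is a hypothetical helper, and its regexes will break if Google changes that markup.

```javascript
// Hypothetical helper: pull the destination URL out of a Google
// "googleJavaScriptRedirect" page like the one shown above.
// Assumes that exact markup; returns null when no target is found.
function extractGoogleRedirectTarget(html) {
  // 1. Script form: m.navigateTo(window.parent, window, "https://...")
  var js = html.match(/navigateTo\([^)]*?,\s*"([^"]+)"\s*\)/);
  if (js) return js[1];
  // 2. <noscript> fallback: <META http-equiv="refresh" content="0;URL='https://...'">
  var meta = html.match(/content="\d+\s*;\s*URL='([^']+)'"/i);
  if (meta) return meta[1];
  return null;
}
```

In the extension you would call it on data inside the $.get() callback and, when it returns a URL, issue a second $.get() for that URL to fetch the real page.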

Related

Untraceable HTTP redirection?

I'm currently working on a project to track products from several websites. I use a Python scraper to retrieve all the URLs related to the listed products, and later regularly check whether these URLs are still active.
To do so, I use the Python requests module, send a GET request and look at the response's status code. Usually I get 200, 301, 302 or 404 as expected, except in the following case:
http://www.sephora.fr/Parfum/Parfum-Femme/Totem-Orange-Eau-de-Toilette/P2232006
This product has been removed, and when I open the link (sorry, it's in French), I am briefly shown a placeholder page saying the product is no longer available and am then redirected to the home page (www.sephora.fr).
Oddly, Python still returns a 200 status code, and so do various redirect tracers such as wheregoes.com or redirectdetective.com. The worst part is that the response URL is still the original one, so I can't even trace it that way.
When analyzing with Chrome DevTools and preserving the logs, I see that at some point the page is reloaded. However, I'm unable to find out what triggers it.
I'm guessing this is done client-side via JavaScript, but I'm not quite sure how. Furthermore, I really need to be able to detect this change from within Python.
As a reference, here's a link to a working product:
http://www.sephora.fr/Parfum/Parfum-Femme/Kenzo-Jeu-d-Amour-Eau-de-Parfum/P1894014
Any leads?
Thank you!
Ludwig
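The page has a meta refresh tag that redirects the browser to the root URL after 0 seconds:
<meta http-equiv="refresh" content="0; URL=/" />
Because this is an HTML-level redirect rather than an HTTP 3xx response, requests (and the redirect tracers) see an ordinary 200 response and never follow it; to detect it you have to parse the returned HTML yourself.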
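The asker wants to detect this from Python, but the check is the same in any language: look for a meta refresh tag in the returned HTML and resolve its URL against the page's own URL. A minimal JavaScript sketch (Node.js or a browser, using the standard URL class); findMetaRefresh is a hypothetical helper, and the regex-based matching is a simplification that a real HTML parser would handle more robustly.

```javascript
// Hypothetical helper: detect an HTML <meta http-equiv="refresh"> redirect.
// Returns { delay, url } with url resolved against pageUrl, or null.
function findMetaRefresh(html, pageUrl) {
  // Find a <meta ...> tag whose http-equiv is "refresh" (case-insensitive)
  var tag = html.match(/<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i);
  if (!tag) return null;
  // Pull out the content attribute value, e.g. "0; URL=/"
  var content = tag[0].match(/content\s*=\s*"([^"]*)"|content\s*=\s*'([^']*)'/i);
  if (!content) return null;
  var value = content[1] !== undefined ? content[1] : content[2];
  // Value is "<seconds>" or "<seconds>; URL=<target>" (target may be quoted)
  var parts = value.match(/^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?['"]?([^'"]*)['"]?)?/i);
  if (!parts) return null;
  var target = parts[2] ? parts[2].trim() : "";
  return {
    delay: Number(parts[1]),
    // A refresh without a URL simply reloads the same page
    url: new URL(target || pageUrl, pageUrl).href
  };
}
```

Feed it the body your HTTP client returned (with requests, that is response.text): a 200 status combined with a non-null result means the page redirects client-side, and result.url tells you where.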

How to create crawlable Ajax using jQuery?

I want to make my jQuery AJAX navigation crawlable. How do I do that? Previously my website used jQuery AJAX for its search, but nothing got indexed.
This is the new approach I use: a plain link with the id linkA (labelled "page 1"),
and then I show the result via AJAX instead of letting the browser follow the link:
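$("body").on("click", "#linkA", function (e) {
  // Stop the browser from following the link
  e.preventDefault();
  var href = $(this).attr("href");
  // Ask the server for the content behind this link
  $.ajax({
    type: "POST",
    url: "ajax/return.php",
    data: { page: href },
    success: function (data) {
      // Replace the whole body with the returned HTML
      $("body").html(data);
    }
  });
});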
My questions:
1- Is this approach correct?
2- Is it crawlable?
I think your approach is correct and a good one, but Google has an article about Making AJAX Applications Crawlable (note that Google has since deprecated that AJAX crawling scheme).
As long as the links you provide in the "href" attribute are also rendered correctly by the server when the browser accesses them directly, you're on the safe side. You should also use the HTML5 History API (history.pushState()) to reflect the URL of the page currently shown, so visitors can use their browser's Back and Forward buttons, share links to pages and bookmark them.
Google and the other search engines normally won't execute your JavaScript; they try to access the links you provide directly.
If your site has heavy scripts to load, or static parts like a header, footer and menu, hijacking the links and loading only the needed content via JavaScript is a great way to improve your loading/rendering speed.
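The link hijacking plus pushState advice can be sketched as follows. This is a minimal sketch, not the asker's code: createAjaxNavigator and its loadPage/render/history parameters are hypothetical, and they are passed in so the logic can also run outside a browser.

```javascript
// Hypothetical AJAX navigator: loads the content behind a link with
// loadPage(), shows it with render(), and records the URL with
// history.pushState() so the address bar, Back/Forward and bookmarks work.
function createAjaxNavigator(loadPage, render, history) {
  // Fetch and display the content for one URL; returns a Promise
  function show(href) {
    return Promise.resolve(loadPage(href)).then(function (html) {
      render(html);
    });
  }
  return {
    // Call once on page load so Back can return to the first page
    start: function (href) {
      history.replaceState({ href: href }, "", href);
    },
    // Call from a link's click handler after e.preventDefault()
    navigate: function (href) {
      return show(href).then(function () {
        history.pushState({ href: href }, "", href);
      });
    },
    // Register as a popstate listener: re-render the restored entry
    onPopState: function (event) {
      if (event && event.state && event.state.href) {
        return show(event.state.href);
      }
      return Promise.resolve();
    }
  };
}
```

In the page, loadPage could wrap the existing $.ajax POST to ajax/return.php (Promise.resolve accepts jQuery's thenable jqXHR), render could call something like $("#content").html(html) (the #content element is an assumption), and onPopState would be registered with window.addEventListener("popstate", nav.onPopState).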

GM_xmlhttpRequest's responseText is missing some HTML

If I go to this Google Maps page, some of the HTML is missing in View Source, but shows up in Firebug.
Likewise, when that same URL is passed to my function, the following HTML does not show up in the responseText, but it does show in Firebug when I open the page.
<a id="mapmaker-link" class="kd-button mini left" style="" href="https://www.google.com/mapmaker?ll=41.06877,-112.047203&spn=0.038696,0.132093&t=h&z=14&vpsrc=0&q=1093+W+3090+S,Syracuse,+UT&utm_medium=website&utm_campaign=relatedproducts_maps&utm_source=mapseditbutton_normal">
Here is the function I'm using:
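function updateMap(url) {
  GM_xmlhttpRequest({
    method: 'GET',
    url: url,
    onload: function(resp) {
      // Text after "mapmaker?" in the link's href; undefined when the
      // link is absent from the static source (it is added via AJAX)
      var ll = resp.responseText.split("mapmaker?")[1];
      if (!ll) return;
      // Keep only the first query parameter, e.g. "ll=41.06877,-112.047203"
      ll = ll.split("&")[0];
      document.getElementById('googlemap').href = url + "&" + ll;
    }
  });
}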
I have placed a sample responseText value at pastebin.com/Tt8nrzG8.
The response is "missing" HTML because the called page loads that HTML (and almost all of the page's content) via AJAX.
GM_xmlhttpRequest (and every other current AJAX method) only gets the static source of a given page. Such XHR requests cannot process a requested page's JavaScript the way a browser does when you browse to the page.
In fact, if you save the sample responseText you linked as an HTML file and open it, you'll see that most of the page's content is missing.
See "How to get an AJAX get-request to wait for the page to be rendered before returning a response?" for the same type of problem. But note that its answer recommends using an API, if one is available.
So, use the Google Maps API to get the lat/long you want for your URL.
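A minimal sketch of the API route, assuming the Google Maps Geocoding web service (it requires an API key; YOUR_API_KEY below is a placeholder, and buildGeocodeUrl/parseGeocodeResponse are hypothetical helper names). Only URL building and response parsing are shown, so the pieces can be checked without network access.

```javascript
// Hypothetical helper: build a Geocoding API request URL for an address.
function buildGeocodeUrl(address, apiKey) {
  var params = new URLSearchParams({ address: address, key: apiKey });
  return "https://maps.googleapis.com/maps/api/geocode/json?" + params.toString();
}

// Hypothetical helper: turn the JSON text of a geocoding response into
// "lat,lng" (the same form as the ll= parameter), or null if the lookup failed.
function parseGeocodeResponse(text) {
  var data = JSON.parse(text);
  if (data.status !== "OK" || !data.results || !data.results.length) return null;
  var loc = data.results[0].geometry.location;
  return loc.lat + "," + loc.lng;
}
```

In the userscript you would pass buildGeocodeUrl(address, key) as the url of GM_xmlhttpRequest and call parseGeocodeResponse(resp.responseText) in onload, then append "&ll=" plus the result to the map link.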
Or, the easiest approach is still to have the script also run on Google Maps pages and do a one-time zoom on links with your special URL parameter, as I recommended on your previous question. This has the added advantage that no calls to Google are needed until you actually decide to click your Google Maps link.
If you do opt for the iframe approach (again, NOT recommended for ANY Google site), beware that you will need to adjust the URL to tell Google to allow iframing and the lat/long information will be in a different part of the page.

Is it possible to run a JavaScript function on a website by using the website's URL?

Suppose that we have a website called example.com
On this website we have a single button which, when clicked, runs a JavaScript function, say myFunction().
Is it possible to make an HTTP request from Firefox's URL bar so that, when this request has finished, the button is clicked as well?
For instance, it would be good if I could enter this into my browser's URL bar:
example.com + javascript:myFunction()
which would run the function after loading the page without my having to click the button.
However, this does not work for me.
Thank you in advance.
Nope, that's not possible.
You can't append JS to a URL. You can, however, use JS instead of a URL, e.g. javascript:myFunction()
You can use window.onload:
window.onload = function() {
  // Runs once the page has finished loading
  myFunction();
};
As soon as the page has loaded, myFunction will execute. But unless some kind of bad server-side logic is in place, you cannot make a page execute arbitrary JavaScript after it has loaded without the page knowing about it.
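In other words, the page has to cooperate. If you control example.com, a minimal sketch of such an opt-in could look like this, assuming a hypothetical #run=<name> fragment and an explicit allow-list of functions (runAutoAction is a made-up helper, not a browser API); the page never evaluates code taken from the URL.

```javascript
// Hypothetical opt-in: run a page function named in the URL fragment,
// e.g. https://example.com/#run=myFunction, but only if the page
// registered it in `actions`. Returns the name that ran, or null.
function runAutoAction(href, actions) {
  var hash = new URL(href).hash.slice(1);           // "run=myFunction"
  var name = new URLSearchParams(hash).get("run");  // "myFunction" or null
  // Only call names the page explicitly registered (never eval the URL)
  if (name && Object.prototype.hasOwnProperty.call(actions, name)) {
    actions[name]();
    return name;
  }
  return null;
}
```

On example.com you would call it from the load handler, e.g. window.onload = function () { runAutoAction(location.href, { myFunction: myFunction }); };, and then open https://example.com/#run=myFunction.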

Execute JavaScript retrieved with AJAX in a Chrome extension

I am writing a Chrome extension that injects a div into a website with a content script. The content script makes an AJAX request to a website that I granted permission for in the manifest.json file, and it inserts the data into the div with innerHTML. Part of what the AJAX request returns is JavaScript that needs to be executed. The AJAX request from within the content script works fine.
When I make the same AJAX request from a regular website, the javascript that is returned executes just fine, but when I make the AJAX request from the content script it does not execute. No errors are displayed in the console. I don't want to reload the website, if possible.
I assume that this is a security 'feature' and not a bug. How can I turn off or circumvent this behavior?
First off, what Rob W said is very important: if you don't already know it, a good explanation of the separate environment a content script runs in is worth reading.
You might want to check this out. It's not 100% what you're looking for, but the main part is there. Basically, from your background page (create one if you don't already have one), you use chrome.tabs.executeScript() to execute the script you've downloaded. Unlike a <script> element inserted via innerHTML, which the browser never runs, code injected this way actually executes in the tab. (It still runs in the extension's isolated world, so it can use the page's DOM but not the page's own JavaScript variables.) All you need now is to get that script (as a string) to the background page and determine the tabId to execute it on (the sender's tab).
You can use chrome.extension.sendMessage (chrome.runtime.sendMessage in current Chrome versions) to send it to the background page, and in background.js use chrome.extension.onMessage (now chrome.runtime.onMessage) to receive the message with your script. From there, use the sender argument to get the tabId (sender.tab.id) and build your executeScript call.
One more helpful hint: by default, dynamically executed scripts don't show up under any predictable name in Chrome's debugger, but you can append something like this to the string of your JavaScript:
"\n//# sourceURL=/myFolder/myDynamicJavascript.js"
This makes the script always show up under the "/myFolder/myDynamicJavascript.js" path in Chrome's debugger, allowing you to set breakpoints in the JS code you've inserted. It's a lifesaver.
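The message flow and the sourceURL hint can be combined in the background page. A minimal sketch for a Manifest V2 extension using the APIs named above; the message shape { type: "runScript", code: ... } and the helper names are assumptions, the extension still needs host permission for the tab's site, and the Chrome API is passed in as a parameter so the handler can be exercised outside Chrome.

```javascript
// Append a sourceURL comment so DevTools lists the injected code under a
// fixed name, which lets you set breakpoints in it.
function withSourceURL(code, path) {
  return code + "\n//# sourceURL=" + path;
}

// Hypothetical factory for the background page's onMessage listener.
// executeScript is expected to behave like chrome.tabs.executeScript
// (Manifest V2): executeScript(tabId, { code: "..." }, callback).
function makeRunScriptListener(executeScript) {
  return function (message, sender, sendResponse) {
    // Ignore unrelated messages and messages that did not come from a tab
    if (!message || message.type !== "runScript" || !sender || !sender.tab) {
      return false;
    }
    var code = withSourceURL(message.code, "/myFolder/myDynamicJavascript.js");
    executeScript(sender.tab.id, { code: code }, function () {
      sendResponse({ ok: true });
    });
    // Returning true keeps sendResponse usable from the async callback
    return true;
  };
}

// Wiring inside Chrome (not runnable outside an extension):
// background.js:
//   chrome.runtime.onMessage.addListener(
//     makeRunScriptListener(chrome.tabs.executeScript.bind(chrome.tabs)));
// content script, once the AJAX response's script text is in scriptText:
//   chrome.runtime.sendMessage({ type: "runScript", code: scriptText });
```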
