How Do Google and Bing's Autocompletion Work?

When I use Firebug I see no XHRs. In that case, how is the data retrieved?

By default, XHR is restricted to the same domain as the page making the request, whereas Google and Bing use separate domains to serve their dynamic content.
Instead, they dynamically load new data by adding <script> tags to the page (which show up in the "All" tab of Firebug).

They create <script> elements that point to URLs that return JavaScript that calls a function with the results of the autocomplete (similar to JSONP).
You can see the requests in Firebug's Net tab.
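The script-tag technique described above can be sketched roughly as follows. This is a minimal illustration, not Google's or Bing's actual code: the function name jsonp, the "callback" query parameter, and the callback naming scheme are all assumptions, and the doc and root parameters exist only so the sketch can run outside a browser.

```javascript
// Minimal JSONP sketch (illustrative names, not a real search engine's code).
let jsonpCounter = 0;

function jsonp(url, onData, doc = document, root = globalThis) {
  // Unique global function name the server's response is expected to call.
  const name = `__jsonp_cb_${jsonpCounter++}`;
  const script = doc.createElement("script");

  // The server is assumed to answer with JavaScript such as: name([...]);
  root[name] = (data) => {
    delete root[name]; // clean up the temporary global
    script.remove();   // and the <script> element that loaded it
    onData(data);
  };

  // Assumption: the endpoint reads the function name from "callback".
  const separator = url.includes("?") ? "&" : "?";
  script.src = `${url}${separator}callback=${encodeURIComponent(name)}`;
  doc.head.appendChild(script); // appending the element triggers the request
  return script;
}
```

In a page, a call might look like jsonp("https://suggest.example.com/complete?q=in", console.log), where the URL is a placeholder. Because the request is made by a <script> element rather than XMLHttpRequest, it is listed under JS instead of XHR in the Net tab.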

Related

Is there an alternative to preprocessorScript for Chrome DevTools extensions?

I want to create a custom profiler for JavaScript as a Chrome DevTools Extension. To do so, I'd have to instrument all JavaScript code of a website (parse to AST, inject hooks, generate new source). This should have been easily possible using chrome.devtools.inspectedWindow.reload() and its parameter preprocessorScript described here: https://developer.chrome.com/extensions/devtools_inspectedWindow.
Unfortunately, this feature has been removed (https://bugs.chromium.org/p/chromium/issues/detail?id=438626) because nobody was using it.
Do you know of any other way I could achieve the same thing with a Chrome Extension? Is there any other way I can replace an incoming JavaScript source with a changed version? This question is very specific to Chrome Extensions (and maybe extensions to other browsers). I'm asking this as a last resort before going a different route (e.g. a dedicated app).

Use the Chrome Debugging Protocol.
First, use DOMDebugger.setInstrumentationBreakpoint with eventName: "scriptFirstStatement" as a parameter to add a breakpoint to the first statement of each script.
Second, in the Debugger domain, there is an event called scriptParsed. Listen for it and, when it fires, use Debugger.setScriptSource to change the source.
Finally, call Debugger.resume each time after you have edited a source file with setScriptSource.
Example in semi-pseudo-code:
// Prevent code being executed
cdp.sendCommand("DOMDebugger.setInstrumentationBreakpoint", {
  eventName: "scriptFirstStatement"
});
// Enable Debugger domain to receive its events
cdp.sendCommand("Debugger.enable");
cdp.addListener("message", (event, method, params) => {
  // Script is ready to be edited
  if (method === "Debugger.scriptParsed") {
    cdp.sendCommand("Debugger.setScriptSource", {
      scriptId: params.scriptId,
      scriptSource: `console.log("edited script ${params.url}");`
    }, (err, msg) => {
      // After editing, resume code execution.
      cdp.sendCommand("Debugger.resume");
    });
  }
});
The implementation above is not ideal. It should probably listen to the breakpoint event, get to the script using the associated event data, edit the script and then resume. Listening to scriptParsed and resuming the debugger are two things that shouldn't be coupled together, as this could create problems. It makes for a simpler example, though.

On HTTP you can use the chrome.webRequest API to redirect requests for JS code to data URLs containing the processed JavaScript code.
However, this won't work for inline script tags. It also won't work on HTTPS, since the data URLs are considered unsafe. And data URLs can't be longer than 2MB in Chrome, so you won't be able to redirect to large JS files.
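As a rough sketch of the redirect idea, the processed source can be wrapped in a data URL and returned as the redirect target, falling back to the unmodified request when the result exceeds the 2MB limit mentioned above. The helper names toJavaScriptDataUrl and buildRedirect are hypothetical, and how the processed source is obtained is left out.

```javascript
// The 2MB data URL limit mentioned in the answer (assumed to apply to the whole URL).
const MAX_DATA_URL_LENGTH = 2 * 1024 * 1024;

// Hypothetical helper: wrap processed JavaScript source in a data URL.
function toJavaScriptDataUrl(source) {
  return "data:application/javascript;charset=utf-8," + encodeURIComponent(source);
}

// Build the object a blocking webRequest listener would return:
// a redirect when the data URL fits, or an empty object (request
// continues unchanged) when it is too long.
function buildRedirect(processedSource) {
  const url = toJavaScriptDataUrl(processedSource);
  return url.length <= MAX_DATA_URL_LENGTH ? { redirectUrl: url } : {};
}
```

A blocking chrome.webRequest.onBeforeRequest listener registered for script requests could return buildRedirect(source) for each matching request.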
If the exact order of execution of each script isn't important you could cancel the script requests and then later send a message with the script content to the page. This would make it work on HTTPS.
To address both issues you could redirect the HTML page itself to a data URL, in order to gain more control. That has a few negative consequences though:
1. Can't reload page because URL is fixed to data URL
2. Need to add or update <base> tag to make sure stylesheet/image URLs go to the correct URL
3. Breaks ajax requests that require cookies/authentication (not sure if this can be fixed)
4. No support for localStorage on data URLs
Not sure if this works: in order to fix #1 and #4 you could consider setting up an HTML page within your Chrome extension and then using that as the base page instead of a data URL.
Another idea that may or may not work: Use chrome.debugger to modify the source code.

Inject content script into iFrame across domains (but with optional permissions)? [duplicate]

A content script can be injected programmatically, or permanently by declaring it in the extension's manifest file. Programmatic injection requires host permission, which is generally granted through a browser action or page action.
In my use case, I want to inject into the Gmail, Outlook.com and Yahoo Mail web sites without user action. I can do this by declaring all of them in the manifest, but doing so requires data access to all of those accounts. Some users may want to grant access only to Outlook.com, but not Gmail. Programmatic injection does not work because I need to know when to inject, and using the tabs permission requires yet another permission.
Is there any good way to optionally inject into a web site?

You cannot run code on a site without the appropriate permissions. Fortunately, you can add the host permissions to optional_permissions in the manifest file to declare them optional and still allow the extension to use them.
In response to a user gesture, you can use chrome.permissions.request to request additional permissions. This API can only be used in extension pages (background page, popup page, options page, ...). As of Chrome 36.0.1957.0, the required user gesture also carries over from content scripts, so if you want to, you could add a click event listener from a content script and use chrome.runtime.sendMessage to send the request to the background page, which in turn calls chrome.permissions.request.
Optional code execution in tabs
After obtaining the host permissions (optional or mandatory), you have to somehow inject the content script (or CSS style) in the matching pages. There are a few options, in order of my preference:
1. Use the chrome.declarativeContent.RequestContentScript action to insert a content script in the page. Read the documentation if you want to learn how to use this API.
2. Use the webNavigation API (e.g. chrome.webNavigation.onCommitted) to detect when the user has navigated to the page, then use chrome.tabs.executeScript to insert the content script in the tab (or chrome.tabs.insertCSS to insert styles).
3. Use the tabs API (chrome.tabs.onUpdated) to detect that a page might have changed, and insert a content script in the page using chrome.tabs.executeScript.
I strongly recommend option 1, because it was specifically designed for this use case. Note: This API was added in Chrome 38, but only worked with optional permissions since Chrome 39. Despite the "WARNING: This action is still experimental and is not supported on stable builds of Chrome." in the documentation, the API is actually supported on stable. Initially the idea was to wait for a review before publishing the API on stable, but that review never came and so now this API has been working fine for almost two years.
The second and third options are similar. The difference between the two is that using the webNavigation API adds an additional permission warning ("Read your browsing history"). For this warning, you get an API that can efficiently filter the navigations, so the number of chrome.tabs.executeScript calls can be minimized.
If you don't want to put this extra permission warning in your permission dialog, then you could blindly try to inject on every tab. If your extension has the permission, then the injection will succeed. Otherwise, it fails. This doesn't sound very efficient, and it is not. On the bright side, this method does not require any additional permissions.
When using either of the latter two methods, your content script must be designed in such a way that it can handle multiple insertions (e.g. with a guard). Inserting in frames is also supported (allFrames:true), but only if your extension is allowed to access the tab's URL (or the frame's URL if frameId is set).
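The guard mentioned above could look something like the following sketch. The helper name runOnce and the flag name are made up for illustration; any property name unlikely to collide with other code would do.

```javascript
// Sketch of an injection guard: run init only the first time the
// content script is inserted into a given context.
function runOnce(root, flag, init) {
  if (root[flag]) return false; // a previous insertion already ran
  root[flag] = true;
  init();
  return true;
}

// At the top of the content script (hypothetical flag name):
runOnce(globalThis, "__myExtensionContentScriptLoaded", () => {
  // One-time setup (event listeners, DOM changes, ...) goes here.
});
```

Repeated chrome.tabs.executeScript calls on the same document then become harmless, because the flag set by the first insertion is still present when the script runs again.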

I advise against using declarativeContent APIs because they're deprecated and buggy with CSS, as described by the last comment on https://bugs.chromium.org/p/chromium/issues/detail?id=708115.
Use the new content script registration APIs instead. Here's what you need, in two parts:
Programmatic script injection
There's a new contentScripts.register() API which can programmatically register content scripts and they'll be loaded exactly like content_scripts defined in the manifest:
browser.contentScripts.register({
  matches: ['https://your-dynamic-domain.example.com/*'],
  js: [{file: 'content.js'}]
});
This API is only available in Firefox, but there's a Chrome polyfill you can use. If you're using Manifest v3, there's the native chrome.scripting.registerContentScripts, which does the same thing but slightly differently.
Acquiring new permissions
By using chrome.permissions.request you can add new domains on which you can inject content scripts. An example would be:
// In a content script or options page
document.querySelector('button').addEventListener('click', () => {
  chrome.permissions.request({
    origins: ['https://your-dynamic-domain.example.com/*']
  }, granted => {
    if (granted) {
      /* Use contentScripts.register */
    }
  });
});
And you'll have to add optional_permissions in your manifest.json to allow new origins to be requested:
{
  "optional_permissions": [
    "*://*/*"
  ]
}
In Manifest v3 this property was renamed to optional_host_permissions.
I also wrote some tools to further simplify this for you and for the end user, such as
webext-domain-permission-toggle and webext-dynamic-content-scripts. They will automatically register your scripts on subsequent browser launches and allow the user to remove the new permissions and scripts.

Since the existing answer is now a few years old, optional injection is now much easier and is described here. It says that to inject a new file conditionally, you can use the following code:
// The lines I have commented out are in the documentation, but the uncommented
// lines are the important part
//chrome.runtime.onMessage.addListener((message, callback) => {
//  if (message == "runContentScript") {
    chrome.tabs.executeScript({
      file: 'contentScript.js'
    });
//  }
//});
You will need the activeTab permission to do this.

GM_xmlhttpRequest's responsetext is missing some HTML

If I go to this Google Maps page, some of the HTML is missing in View Source, but shows up in Firebug.
Likewise, when that same URL is passed to my function, the following HTML does not show up in the responseText, but it does show in Firebug when I open the page.
<a id="mapmaker-link" class="kd-button mini left" style="" href="https://www.google.com/mapmaker?ll=41.06877,-112.047203&spn=0.038696,0.132093&t=h&z=14&vpsrc=0&q=1093+W+3090+S,Syracuse,+UT&utm_medium=website&utm_campaign=relatedproducts_maps&utm_source=mapseditbutton_normal">
Here is the function I'm using:
function updateMap(url) {
    GM_xmlhttpRequest({
        method: 'GET',
        url: url,
        onload: function(resp) {
            var ll = resp.responseText.split("mapmaker?")[1];
            ll = ll.split("&")[0];
            document.getElementById('googlemap').href = url+"&"+ll;
        }
    });
}
I have placed a sample responseText value at pastebin.com/Tt8nrzG8.

The response is "missing" HTML because the called page loads that HTML (and almost all of the page's content) via AJAX.
GM_xmlhttpRequest (and all other current AJAX methods) only gets the static source of a given page. Such XHR requests cannot process a requested page's JavaScript, like a browser does when you browse to the page.
In fact, if you save the sample responseText that you linked as an HTML file, you'll see it looks like this:
See "How to get an AJAX get-request to wait for the page to be rendered before returning a response?", for the same type of problem. But note that the answer recommends that you use an API, if one is available.
So, use the Google Maps API to get the lat/long you want for your URL.
Or, the easiest approach is still to have the script also run on Google maps pages and do a one-time zoom on links with your special URL parameter -- like I recommended on your previous question. This has the added advantage that no calls to Google are made/needed until you actually decide to click your Google Maps link.
If you do opt for the iframe approach (again, NOT recommended for ANY Google site), beware that you will need to adjust the URL to tell Google to allow iframing and the lat/long information will be in a different part of the page.
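Whichever approach is chosen, note that the original updateMap handler throws when the "mapmaker?" marker is absent, because split(...)[1] is then undefined. A minimal sketch of a defensive extraction step follows; the function name extractMapmakerParam is made up for illustration.

```javascript
// Return the first query parameter after "mapmaker?" (e.g. "ll=41.0,-112.0"),
// or null when the static source does not contain the link at all.
function extractMapmakerParam(responseText) {
  const parts = responseText.split("mapmaker?");
  if (parts.length < 2) {
    return null; // the link is only added later by the page's own scripts
  }
  return parts[1].split("&")[0];
}
```

Inside onload, the caller can then check for null and skip the href update instead of failing with a TypeError.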

Is it possible to use jQuery to grab the HTML of another web page into a div?

I am trying to integrate with the FireShot API so that, given a URL, I can grab the HTML of another web page into a div and then take a screenshot of it.
Some things I will need to do after getting the HTML
grab <link> & <script> from <head>
grab <body> into <div>
But first, it seems that when I try to do a
$.get("http://google.com", function(data) { ... });
I get a 200 in Firebug, colored red. I think it has to do with sites not allowing you to grab their page with JS? Then is opening a window the best I can do? But how might I control the other page with jQuery or call fsapi on that page?
UPDATE
I tried something like the code below to run some code when the new window is ready, but Firebug says "Permission denied to access property 'document'"
w = window.open($url.val());
setTimeout(function() { // if I dont do this, I always get about:blank, is there a better way around this?
    $(w.document).ready(function() {
        console.log(w.document.body);
    });
}, 1000);

I believe the cross-site security setup within JavaScript is basically blocking this. You'd likely have to proxy the content through your own domain.
I think there are a couple of other options for getting around the cross-site security constraints, but I'm not sure I'd promote them.

If the other page is located within the same domain as your hosting page, yes, you can. Please refer to jQuery's $().load() API.
Otherwise, the browser's cross-site security policy (the same-origin policy) prevents you from doing so. In that case, you can use an iframe instead of a div.
Some jQuery plugins, e.g. thickbox, can load pages into an appropriate container automatically.

Unless I am mistaken, I do not believe you can AJAX a page cross-domain (e.g. from domain1.com to domain2.com). To get around this, you can have a PHP "proxy" script that does the "getting" of the page and then passes it to JS.
For example, in JS you would get() http://mydomain.com/get/?domain=http://google.com and then do what you need to do!
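When building that proxy URL in JS, the target address should be URL-encoded so that its own "?" and "&" characters are not mistaken for parameters of the proxy request. A small sketch, assuming the proxy endpoint and "domain" parameter from the example above (the function name buildProxyUrl is made up):

```javascript
// Build the same-domain proxy URL for a cross-domain target page.
// The default endpoint is the example address used above.
function buildProxyUrl(targetUrl, proxyBase = "http://mydomain.com/get/") {
  return `${proxyBase}?domain=${encodeURIComponent(targetUrl)}`;
}
```

The result can be passed straight to jQuery, for example $.get(buildProxyUrl("http://google.com"), callback), since the request now goes to your own domain.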

Google's search suggest XHR requests

I went to Google with Firebug open. I started typing "in", and then checked the "Net" tab of Firebug, and a couple of new GET requests had been sent to fetch the list of search autocomplete suggestions.
Like:
GET http://clients1.google.com/complete/search?hl=en&client=hp&expIds=17259,17315,23628,24549,26637,26761,26849,26869,27386,27404&q=i&cp=1
But they were classified under the "JS" section, rather than as "XHR" - why is this? Isn't Google making an AJAX GET request behind the scenes?

This is almost certainly a JSONP request, used to get around cross-domain restrictions on XHRs. Essentially, they are dynamically inserting <script /> tags into their page, and that's why it shows up under JS in Firebug.
