Chrome extension: Get the text of a web page from given url - javascript

First of all, I'm a complete newbie at making Chrome extensions. In one part of the extension I receive different URLs, and I want to store the text of each web page to process it later, resulting in an array of boolean values, one for each given URL. Schematically it would be something like this:
var result = [];
function process(text){
    if something -> result.push(true);
    if not -> result.push(false);
}
function main(){
    for (i...){
        url = given[i];
        text = getHTMLText(url);
        process(text);
    }
    final(); // when the loop finishes, call another function that uses the global variable: result
}
I have problems with the main function. First I tried a synchronous XMLHttpRequest; although it works, it's very slow and Chrome always warns that synchronous XMLHttpRequest is deprecated.
for (var i = 0; i < urls.length; i++){
    url = urls[i];
    var req = new XMLHttpRequest();
    req.open('GET', url, false);
    req.send(null);
    if (req.status == 200) detecting(req.responseText);
};
Another solution I found was to use fetch(url), but I don't fully understand the code I found. The returned text is correct, but the process function gives different results on each page refresh.
for (var i = 0; i < urls.length; i++){
    url = urls[i];
    fetch(url).then(function(response) {
        response.text().then(function(text) {
            detecting(text);
        });
    });
};
Another problem, due to my limited knowledge of fetch(), is that I can't store the text outside of the fetch() callback; every time I console.log it I get undefined, which greatly complicates processing the text.
I have seen that maybe it can be done through Chrome's extension APIs, but I can't see how to do it.

The algorithm shown in your main pseudocode can be implemented easily by using async/await and Promise.all, without a for loop:
(async () => {
    const results = await Promise.all(urls.map(processUrl));
    console.log(results);
    // further processing must also be inside this IIFE
})();

async function processUrl(url) {
    try {
        const text = await (await fetch(url)).text();
        return {url, text, status: detecting(text)};
    } catch (error) {
        return {url, error};
    }
}
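If what you ultimately need is the array of booleans from your original sketch, you can derive it from results; a minimal sketch, assuming detecting() returns true or false and that a failed fetch should count as false:
(async () => {
    const results = await Promise.all(urls.map(processUrl));
    // Keep only the boolean flags, in the same order as the input urls.
    const flags = results.map(r => r.status === true);
    final(flags); // the follow-up step from your pseudocode
})();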

Related

nodelist sometimes loads after the function is finished

I have an issue where the nodelist sometimes loads before it is used, but other times it only loads after it is used, causing an error because the list is undefined.
I have done some searching online and I think it is related to the code being asynchronous or synchronous (I have not learned about this yet, so I am unsure if I am correct). Here's my code. Context: getNeighbourhoodData() is called from the onload of my HTML page's body.
function getNeighbourhoodData(){
    var request = new XMLHttpRequest();
    request.open('GET', neighbourhood_url, true);
    // This function will be called when data returns from the web api
    request.onload = function() {
        // get all the restaurant records into our neighbourhood array
        neighbourhood_array = JSON.parse(request.responseText);
        // get User data
        displayNeighbourhoods();
    };
    // This command starts the calling of the restaurant web api
    request.send();
}
function displayNeighbourhoods() {
    var list = document.getElementsByName("neiList");
    console.log(list);
    num = 0;
    alphabet_array = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"];
    console.log(alphabet_array);
    for (var count = 0; count < neighbourhood_array.length; count++) {
        var neighbourhood = neighbourhood_array[count].Neighbourhood;
        if (neighbourhood_array[count].Neighbourhood.startsWith(alphabet_array[num]) == true) {
            var cell = '<li><a class="a--grey" href="/restByNeighbourhood.html" onclick="getName(this)" name="Paya Lebar">' + neighbourhood + '</a></li>';
            list[num].insertAdjacentHTML('beforeend', cell);
            if (count >= neighbourhood_array.length - 1 && num <= 25) {
                num += 1;
                count = -1;
                console.log(num);
            }
        }
        else if (count >= neighbourhood_array.length - 1 && num <= 25) {
            num += 1;
            count = -1;
            console.log(num);
        }
        else if (num >= 26) {
            break;
        }
        else {
            continue;
        }
    }
}
JavaScript is single-threaded, which means only one thing can happen at a time. However, with async calls you can "act" like a multi-threaded language.
For example, the built-in fetch() function returns a Promise that you can await.
async function loadURLodContent() {
    const result = await fetch(/* url-path */);
}
So you can await Promises and write async functions that return Promises.
But this topic isn't an easy one. I'd really recommend getting into Promises and async calls as soon as possible, because you're going to encounter them sooner or later if you develop for the web.
But to your problem... at least from my point of view, you're not giving enough information. Tracer69 has a good proposal for that in the comments.
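That said, the intermittent "undefined" error usually means displayNeighbourhoods() runs before the response (or the DOM) is ready. A minimal sketch of the same flow using fetch and async/await, reusing neighbourhood_url and displayNeighbourhoods from the question's code:
async function getNeighbourhoodData() {
    // Wait for the web api response before touching the DOM.
    const response = await fetch(neighbourhood_url);
    neighbourhood_array = await response.json();
    displayNeighbourhoods();
}

// Run only after the document has been parsed, so the "neiList" elements exist.
document.addEventListener('DOMContentLoaded', getNeighbourhoodData);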

Is it possible to list / kill / ... all pending promises / async events in a headless chrome?

I've got a bunch of integration tests using headless chrome. Because restarting the browser on an entirely new profile is so expensive the harness tries to "clean up" the browser state (flush caches, clear cookies and storage, ...) on teardown.
However there's a recurring issue that during the cleanup phase some async operations resolve and try to do whatever they do in a now nonsensical state.
There are two issues here:
1. Async stack trace support in CDT is listed as experimental and doesn't appear at all in the response (possibly because it has to be enabled via a hidden flag somehow).
2. I have no idea what's still running at that point, and can't really even debug what breaks due to (1).
Is there any way to improve the situation except by trawling through heisenbugs as they occur, trying to slowly make my way up the async call stacks through ever more logging until the root cause is found?
First we make a hook to be able to capture all XHR packets. You'll have to execute this before any of your other scripts load; probably put it in your boot/prepare script before running the tests.
I have implemented below a start and a stop button. Start makes 300 XHR requests, just the "normal" way. If you press stop, you can cancel them all. Ideally you'd put the stop handler's code in a beforeunload event.
If you don't want to stop them, you can analyze their state, requested URLs, etc. from one neat array where you keep track of everything in code.
This example works because the browser will only make "so many" requests at the same time; the rest wait in the queue as pending until a slot comes free. I used 300 requests because I don't know a large/slow source to request from that isn't CORS protected, and this gives us humans enough time to press the stop button (I hope).
function addXMLRequestCallback(callback){
    var oldSend, i;
    if( XMLHttpRequest.callbacks ) {
        // we've already overridden send() so just add the callback
        XMLHttpRequest.callbacks.push( callback );
    } else {
        // create a callback queue
        XMLHttpRequest.callbacks = [callback];
        // store the native send()
        oldSend = XMLHttpRequest.prototype.send;
        // override the native send()
        XMLHttpRequest.prototype.send = function(){
            // process the callback queue
            // the xhr instance is passed into each callback but seems pretty useless
            // you can't tell what its destination is or call abort() without an error
            // so only really good for logging that a request has happened
            // I could be wrong, I hope so...
            // EDIT: I suppose you could override the onreadystatechange handler though
            for( i = 0; i < XMLHttpRequest.callbacks.length; i++ ) {
                XMLHttpRequest.callbacks[i]( this );
            }
            // call the native send()
            oldSend.apply(this, arguments);
        }
    }
}
/**
 * adding some debug data to the XHR objects. Note, don't depend on this,
 * this is against good practices, ideally you'll have your own wrapper
 * to deal with xhr objects and meta data.
 * The same way you can extend the XHR object to catch post data etc...
 */
var xhrProto = XMLHttpRequest.prototype,
    origOpen = xhrProto.open,
    origSend = xhrProto.send;

xhrProto.open = function (method, url) {
    this._url = url;
    return origOpen.apply(this, arguments);
};

xhrProto.send = function (data) {
    this._data = data;
    return origSend.apply(this, arguments);
};
+function() {
    var xhrs = [],
        i,
        statuscount = 0,
        status = document.getElementById('status'),
        DONE = 4;

    addXMLRequestCallback((xhr) => {
        xhrs.push(xhr);
    });

    document.getElementById('start').addEventListener('click', (e) => {
        statuscount = 0;
        var data = JSON.stringify({
            'user': 'person',
            'pwd': 'password',
            'organization': 'place',
            'requiredkey': 'key'
        });
        for(var i = 0; i < 300; i++) {
            var oReq = new XMLHttpRequest();
            oReq.addEventListener("load", (e) => {
                statuscount++;
                status.value = statuscount;
            });
            oReq.open("GET", 'https://code.jquery.com/jquery-3.4.1.js');
            oReq.send(data);
        }
    });

    document.getElementById('cancel').addEventListener('click', (event) => {
        for(i = 0; i < xhrs.length; i++) {
            if(xhrs[i].readyState !== DONE) {
                console.log(xhrs[i]._url, xhrs[i]._data, 'is not done');
            }
        }
        /** Cancel everything */
        for(i = 0; i < xhrs.length; i++) {
            if(xhrs[i]) {
                xhrs[i].abort();
            }
        }
    });
}();
<button id="start">start requests</button>
<button id="cancel">cancel requests</button>
<progress id="status" value="0" max="300"></progress>
Code of addXMLRequestCallback courtesy of meouw from this answer
Code of xhrProto keeping debug variables courtesy of Joel Richard from this answer
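Since the question is about async events in general and not only XHR, the same idea can be applied to fetch() by wrapping the global function. A minimal sketch (the pendingFetches array and the wrapper naming are my own, not part of the answer above):
var pendingFetches = [];
var origFetch = window.fetch;

window.fetch = function(resource, options) {
    var entry = { resource: resource, settled: false };
    pendingFetches.push(entry);
    return origFetch.apply(this, arguments)
        .finally(function() {
            // Mark the request as settled so teardown code can see what is still in flight.
            entry.settled = true;
        });
};

// During teardown: log whatever has not settled yet.
// pendingFetches.filter(function(e) { return !e.settled; })
//     .forEach(function(e) { console.log('still pending:', e.resource); });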

Race conditions using Office.js in Excel

I'm encountering some sort of race condition in the following code where I'm trying to write the response of an HTTP request to the active cell. I've read some possible solutions to the "InvalidObjectPath" errors from Office.js (I'm using ScriptLab specifically), but I don't think I'm trying to use anything across multiple contexts.
The current behavior works sometimes, but other times nothing will get written to the cell.
var counter = 0;
$("#run").click(run);

async function run() {
    try {
        await Excel.run(async (ctx) => {
            var user;
            const sUrl = "https://jsonplaceholder.typicode.com/users/1";
            var client = new HttpClient();
            var range = ctx.workbook.getSelectedRange();
            counter++;
            client.get(sUrl, function (response) {
                var obj = JSON.parse(response);
                user = obj.username;
                range.values = [[user + counter]];
                ctx.sync();
            });
            await ctx.sync();
        });
    }
    catch (error) {
        OfficeHelpers.UI.notify(error);
        OfficeHelpers.Utilities.log(error);
    }
}

var HttpClient = function() {
    this.get = function(aUrl, aCallback) {
        var anHttpRequest = new XMLHttpRequest();
        anHttpRequest.onreadystatechange = function() {
            if (anHttpRequest.readyState == 4 && anHttpRequest.status == 200)
                aCallback(anHttpRequest.responseText);
        }
        anHttpRequest.open("GET", aUrl, true);
        anHttpRequest.send(null);
    }
}
The issue is that you're not awaiting the completion of client.get. This means that [some of the time], the Excel.run will complete and "garbage-collect"(ish) some of the objects (range) before the callback inside of client.get is executed.
You can solve the issue in a number of ways:
Do the call to the web service before you execute the Excel.run. In your example here (which may not be realistic for many other scenarios, but it is here), you're not actually relying on anything from the document before you do your web call. In that case there is no need to be inside Excel.run at all; you can make Excel.run part of the callback of the web-service call.
Wrap your web-service call in a Promise, so that it can be awaited. Something like this:
var HttpClient = function() {
    this.get = function(aUrl) {
        return new Promise(function (resolve, reject) {
            var anHttpRequest = new XMLHttpRequest();
            anHttpRequest.onreadystatechange = function () {
                // Only settle the promise once the request has completed;
                // earlier readyState changes (1-3) must be ignored, otherwise
                // the promise would reject before the response arrives.
                if (anHttpRequest.readyState == 4) {
                    if (anHttpRequest.status == 200) {
                        resolve(anHttpRequest.responseText);
                    } else {
                        reject(anHttpRequest.statusText);
                    }
                }
            };
            anHttpRequest.open("GET", aUrl, true);
            anHttpRequest.send(null);
        });
    };
};
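With that wrapper in place, the run() function from the question could await the web call before writing to the range. A rough sketch of how it might look (my own adaptation of the question's code, not the answer's verbatim solution):
async function run() {
    try {
        await Excel.run(async (ctx) => {
            const range = ctx.workbook.getSelectedRange();
            counter++;
            // Awaiting here keeps the Excel.run batch alive until the response arrives.
            const response = await new HttpClient().get("https://jsonplaceholder.typicode.com/users/1");
            const user = JSON.parse(response).username;
            range.values = [[user + counter]];
            await ctx.sync();
        });
    } catch (error) {
        OfficeHelpers.UI.notify(error);
        OfficeHelpers.Utilities.log(error);
    }
}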
I describe both approaches (and much more) in a book I've been writing about building Office Add-ins using Office.js: https://leanpub.com/buildingofficeaddins/.
BTW, I should say that getting the selection is one of the few times when you don't want to delay a sync, as you want to capture the fleeting point-in-time selection rather than what will become the selection X seconds from now, once the web call succeeds. So this is one of the few cases where you may want to insert an extra await context.sync() even if you don't technically need it. See section "5.8.2: When to sync" in the book for more info.

Iterate through appended json file

I have this code:
function heatMapRange() {
    var script1 = document.createElement('script');
    script1.src = 'allCoords.js';
    document.getElementsByTagName('head')[0].appendChild(script1);
}
which appends the allCoords.js file shown below:
allCoords_callback({
    "coordinates": [
        [50.1729677, 12.6692243, 580000],
        [50.001168, 14.4270033, 2895000],
        [50.6988037, 13.9384015, 945000],
        [50.1218161, 14.4824004, 409900],
        [49.470061, 17.0937597, 1499000],
        [49.8509959, 18.5593087, 380000]
    ]
});
What I want is to iterate through this data with something like this:
function allCoords_callback(results1) {
    for (var i = 0; i < results1.coordinates.length; i++) {
        alert(results1.coordinates[i]);
    }
}
Is it possible?
You can iterate an array in JavaScript with Array.map().
In your example it will be something like:
results1.coordinates.map(function(coordinate) { alert(coordinate); })
That's the iterating part.
Then, another topic is how you get the JSON you need to process. In the example given in the Google Maps docs they do it using JSONP, just because that is the way the real-time earthquake data works. Another way to fetch data is an XMLHttpRequest (AKA AJAX). This is a more common practice and I would recommend using it if possible.
In your case I would re-write your code to look something like this:
function heatMapRange(){
    var request = new XMLHttpRequest();
    request.open('GET', '/allCoords.json', true);
    request.onload = function () {
        if (request.status >= 200 && request.status < 400) {
            // Success!
            var data = JSON.parse(request.responseText);
            // process the data in the response, like
            // iterating through the list of coordinates
            data.coordinates.map(function(coordinate) { alert(coordinate); })
        } else {
            // We reached our target server, but it returned an error
        }
    };
    request.onerror = function () {
        // There was a connection error of some sort
    };
    request.send();
}
which fetches the data from the JSON file allCoords.json:
{
    "coordinates": [
        [50.1729677, 12.6692243, 580000],
        [50.001168, 14.4270033, 2895000],
        [50.6988037, 13.9384015, 945000],
        [50.1218161, 14.4824004, 409900],
        [49.470061, 17.0937597, 1499000],
        [49.8509959, 18.5593087, 380000]
    ]
}
This way of fetching data from a server aligns better with the best practices used in the industry. This is just the straightforward example using a vanilla-JS XMLHttpRequest. There are tons of libraries that simplify this, and there is also the Fetch API, which tackles the topic of fetching resources.
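For reference, a minimal sketch of the same thing with fetch(), assuming the same /allCoords.json file:
function heatMapRange() {
    fetch('/allCoords.json')
        .then(function (response) {
            if (!response.ok) {
                throw new Error('Request failed with status ' + response.status);
            }
            return response.json();
        })
        .then(function (data) {
            // iterate through the list of coordinates
            data.coordinates.forEach(function (coordinate) { alert(coordinate); });
        })
        .catch(function (error) {
            // Connection error or non-2xx status
            console.error(error);
        });
}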
Well, the code at the top works; the problem was that I had disabled alerts in Google Chrome, so closing the tab and reopening the page did the trick.

Concept - Designing a collapsible queue for asynchronous resources

I've noticed that the size of a requested file affects how long the response takes for AJAX calls. So if I fire 3 AJAX GET requests for files of varying size, they may arrive in any order. What I want to do is guarantee the ordering when I append the files to the DOM.
How can I set up a queue system so that when I fire A1 -> A2 -> A3, I can guarantee that they are appended as A1 -> A2 -> A3, in that order?
For example, suppose A2 arrives before A1. I would want the append action to wait for the arrival and loading of A1.
One idea is to create a status checker using a timed callback, like so:
// pseudo-code
function check(ready, func) {
    // check ready somehow
    if (ready) {
        func();
    } else {
        setTimeout(function () {
            check(ready, func);
        }, 1); // check every msec
    }
}
but this seems like a resource-heavy way, as I fire the same function every millisecond until the resource is loaded.
Is this the right path to complete this problem?
status checker using a 1msec-timed callback - but this seems like a resource-heavy way; is this the right path to complete this problem?
No. You should have a look at Promises. That way, you can easily formulate it like this:
var a1 = getPromiseForAjaxResult(resource1url);
var a2 = getPromiseForAjaxResult(resource2url);
var a3 = getPromiseForAjaxResult(resource3url);
a1.then(function(res) {
    append(res);
    return a2;
}).then(function(res) {
    append(res);
    return a3;
}).then(append);
For example, jQuery's .ajax function implements this.
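A more modern way of expressing the same idea (not part of the original answer) is to let the requests run in parallel and rely on Promise.all to hand back the results in the original order; a minimal sketch using fetch:
const urls = [resource1url, resource2url, resource3url];

Promise.all(urls.map(url => fetch(url).then(res => res.text())))
    .then(texts => {
        // Promise.all preserves input order regardless of which response arrived first.
        texts.forEach(append);
    });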
You can try something like this:
var resourceData = {};
var resourcesLoaded = 0;

function loadResource(resource, callback) {
    var xhr = new XMLHttpRequest();
    xhr.onload = function() {
        var state = this.readyState;
        var responseCode = this.status;
        if(state == this.DONE && responseCode == 200) {
            callback(resource, this.responseText);
        }
    };
    xhr.open("get", resource, true);
    xhr.send();
}

//Assuming that resources is an array of path names
function loadResources(resources) {
    for(var i = 0; i < resources.length; i++) {
        loadResource(resources[i], function(resource, responseText) {
            //Store the data of the resource in the resourceData map,
            //using the resource name as the key. Then increment the
            //resource counter.
            resourceData[resource] = responseText;
            resourcesLoaded++;
            //If the number of resources that we have loaded is equal
            //to the total number of resources, it means that we have
            //all our resources.
            if(resourcesLoaded === resources.length) {
                //Manipulate the data in the order that you desire.
                //Everything you need is inside resourceData, keyed
                //by the resource url.
                ...
                ...
            }
        });
    }
}
If certain components (like certain JS files) must be loaded and executed before others, you can queue up your AJAX requests like so:
function loadResource(resource, callback) {
    var xhr = new XMLHttpRequest();
    xhr.onload = function() {
        var state = this.readyState;
        var responseCode = this.status;
        if(state == this.DONE && responseCode == 200) {
            //Do whatever you need to do with this.responseText
            ...
            ...
            callback();
        }
    };
    xhr.open("get", resource, true);
    xhr.send();
}
function run() {
    var resources = [
        "path/to/some/resource.html",
        "path/to/some/other/resource.html",
        ...
        "http://example.org/path/to/remote/resource.html"
    ];
    //Function that sequentially loads the resources, so that the next resource
    //will not be loaded until the first one has finished loading. I accomplish
    //this by calling the function itself in the callback to the loadResource
    //function. This function is not truly recursive since the callback
    //invocation (even though it is the function itself) is an independent call
    //and therefore will not be part of the original callstack.
    function load(i) {
        if (i < resources.length) {
            loadResource(resources[i], function () {
                load(++i);
            });
        }
    }
    load(0);
}
This way, the next file will not be loaded until the previous one has finished loading.
If you cannot use any third-party libraries, you can use my solution. However, your life will probably be much easier if you do what Bergi suggested and use Promises.
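To illustrate that last point, here is a minimal sketch (not part of the original answer) of the same sequential loading written with a promise wrapper and async/await, assuming the loadResource function defined above:
function loadResourceAsync(resource) {
    // Wrap the callback-based loader in a Promise so it can be awaited.
    return new Promise(function (resolve) {
        loadResource(resource, resolve);
    });
}

async function runSequentially(resources) {
    for (const resource of resources) {
        // Each iteration waits until the previous resource has finished loading.
        await loadResourceAsync(resource);
    }
}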
There's no need to call check() every millisecond, just run it in the xhr's onreadystatechange. If you provide a bit more of your code, I can explain further.
I would keep a queue of functions to execute, each of which checks that the previous result has completed before executing.
var remoteResults = [];

function requestRemoteResource(index, fetchFunction) {
    // the argument fetchFunction is a function that fetches the remote content
    // once the content is ready it calls the passed-in function with the result.
    fetchFunction(
        function(result) {
            // add the remote result to the list of results
            remoteResults[index] = result;
            // write as many results as are ready.
            writeResultsWhenReady(index);
        });
}

function writeResultsWhenReady(index) {
    var i;
    // Walk the results from the start; stop at the first one that hasn't arrived yet.
    for(i = 0; i < remoteResults.length; i++) {
        if(!remoteResults[i]) {
            return;
        }
        // Call the function that is the ith result
        // This will modify the dom.
        remoteResults[i]();
        // Blank the result to ensure we don't double execute
        // Store a function so we can do a simple boolean check.
        remoteResults[i] = function(){};
    }
}

requestRemoteResource(0, [Function to fetch the first resource]);
requestRemoteResource(1, [Function to fetch the second resource]);
requestRemoteResource(2, [Function to fetch the third resource]);
Please note that this is currently O(n^2) for simplicity; it would get faster but more complex if you stored an object at every index of remoteResults which had a hasRendered property. Then you would only scan back until you found a result that had not yet arrived or one that had already been rendered.
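A sketch of the hasRendered optimization described in that last paragraph (my own illustration; storeResult is a hypothetical helper playing the role of the fetch callback above):
var remoteResults = [];

function storeResult(index, renderFunction) {
    remoteResults[index] = { render: renderFunction, hasRendered: false };
    writeResultsWhenReady(index);
}

function writeResultsWhenReady(index) {
    // Scan back until we find a result that has not yet arrived (give up for now)
    // or one that has already been rendered (everything before it is done).
    for (var i = index - 1; i >= 0; i--) {
        if (!remoteResults[i]) return;
        if (remoteResults[i].hasRendered) break;
    }
    // Render this result and any later ones that were only waiting on it.
    for (var j = index; j < remoteResults.length && remoteResults[j]; j++) {
        if (!remoteResults[j].hasRendered) {
            remoteResults[j].render(); // modifies the DOM
            remoteResults[j].hasRendered = true;
        }
    }
}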
