Consuming chunked data asynchronously in JavaScript - javascript

I have a (GET) endpoint that sends data in chunks (Transfer-Encoding: chunked). The data is JSON encoded and sent line by line.
Is there a way to consume the data sent by this endpoint in an asynchronous manner in JavaScript (or using some JavaScript library)?
To be clear, I know how to perform an asynchronous GET, but I would like the request not to wait for all the data to be transferred, and instead read it line by line as it arrives. For instance, when doing:
curl http://localhost:8081/numbers
The lines below are shown one by one as they become available (the example server I made waits one second between sending one line and the next).
{"age":1,"name":"John"}
{"age":2,"name":"John"}
{"age":3,"name":"John"}
{"age":4,"name":"John"}
I would like to reproduce the same behavior curl exhibits, but in the browser. I don't want to leave the user waiting until all the data becomes available before showing anything.

Thanks to Dan and Redu I was able to put together an example that consumes data incrementally, using the Fetch API. The caveat is that this does not work in Internet Explorer, and it has to be enabled by the user in Firefox:
/** This works on Edge, Chrome, and Firefox (from version 57). To use this
 example in Firefox, navigate to about:config and change
 - the dom.streams.enabled preference to true
 - the javascript.options.streams preference to true
 See https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream
*/
fetch('http://localhost:8081/numbers').then(function(response) {
  console.log(response);
  const reader = response.body.getReader();
  const decoder = new TextDecoder("utf-8");
  function go() {
    reader.read().then(function(result) {
      if (!result.done) {
        // Assumes each chunk holds exactly one of the JSON lines shown
        // above ({"age":...,"name":...}), as this demo server guarantees.
        var num = JSON.parse(decoder.decode(result.value));
        console.log("Got number " + num.age);
        go();
      }
    });
  }
  go();
});
The full example (with the server) is available at my sandbox. I find it illustrative of the limitations of XMLHttpRequest to compare this version with this one, which does not use the Fetch API.
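As a side note, the same consumption loop can be written more compactly with async/await (same browser-support caveats as above). Like the original, this sketch assumes each received chunk holds exactly one JSON line, which this demo server guarantees but HTTP chunking does not in general:
async function consume() {
  const response = await fetch('http://localhost:8081/numbers');
  const reader = response.body.getReader();
  const decoder = new TextDecoder('utf-8');
  while (true) {
    // Each read() resolves with one chunk, or done once the stream ends.
    const { done, value } = await reader.read();
    if (done) break;
    const num = JSON.parse(decoder.decode(value));
    console.log('Got number ' + num.age);
  }
}
consume();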

Related

scrapy + selenium: <a> tag has no href, but content is loaded by javascript

I'm almost there with my first attempt at using Scrapy and Selenium to collect data from a website with JavaScript-loaded content.
Here is my code:
# -*- coding: utf-8 -*-
import scrapy
from selenium import webdriver
from scrapy.selector import Selector
from scrapy.http import Request
from selenium.webdriver.common.by import By
import time

class FreePlayersSpider(scrapy.Spider):
    name = 'free_players'
    allowed_domains = ['www.forge-db.com']
    start_urls = ['https://www.forge-db.com/fr/fr11/players/?server=fr11']
    driver = {}

    def __init__(self):
        self.driver = webdriver.Chrome('/home/alain/Documents/repository/web/foe-python/chromedriver')
        self.driver.get('https://forge-db.com/fr/fr11/players/?server=fr11')

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        #time.sleep(1)
        sel = Selector(text=self.driver.page_source)
        players = sel.xpath('.//table/tbody/tr')
        for player in players:
            joueur = player.xpath('.//td[3]/a/text()').get()
            guilde = player.xpath('.//td[4]/a/text()').get()
            yield {
                'player': joueur,
                'guild': guilde
            }
        next_page_btn = self.driver.find_element_by_xpath('//a[@class="paginate_button next"]')
        if next_page_btn:
            time.sleep(2)
            next_page_btn.click()
            yield scrapy.Request(url=self.start_urls, callback=self.parse)
        # Close the selenium driver, so in fact it closes the testing browser
        self.driver.quit()

    def parse_players(self):
        pass
I want to collect user names and their respective guild and output them to a CSV file.
For now my issue is how to proceed to the NEXT PAGE and parse the JavaScript-loaded content again.
Even if I'm able to simulate a click on the NEXT tag, I'm not 100% sure the code will proceed through all pages, and I'm not able to parse the new content using the same function.
Any idea how I could solve this issue?
Thanks.
Instead of using Selenium, you should try to recreate the request that updates the table. If you look closely at the HTML under Chrome DevTools, you can see that the request is made with parameters and a response is sent back with the data in a nicely structured format.
Please see here with regards to dynamic content in Scrapy. As it explains, the first step is to ask: is it necessary to recreate browser activity, or can I get the information I need by reverse engineering the HTTP GET requests? Sometimes the information is hidden within <script></script> tags and you can use regex or string methods to extract what you want. Rendering the page and then mimicking browser activity should be thought of as a last resort.
Now before I go into some background on reverse engineering the requests: the website you're trying to get information from requires only reverse engineering the HTTP requests.
Reverse Engineering HTTP requests in Scrapy
For the website itself, we can use Chrome DevTools by right-clicking a page and choosing Inspect. Clicking the Network tab allows you to see all the requests the browser makes to render the page. In this case, you want to see what happens when you click next.
Image1: here
Here you can see all the requests made when you click next on the page. I always look for the biggest response, as that will most likely have your data.
Image2: here
Here you can see the request headers/params etc. — the things you need to make a proper HTTP request. We can see that the referring URL is actually getplayers.php with all the params for the next page added on. If you scroll down you can see all the parameters it sends to getplayers.php. Keep this in mind: sometimes we need to send headers, cookies, and parameters.
Image3: here
Here is the preview of the data we would get back from the server if we make the correct request. It's a nice neat format, which is great for scraping.
Now, you could copy the headers, parameters, and cookies into Scrapy, but it's always worth checking first whether a plain HTTP request with just the URL gives you the data you want; if it does, that is the simplest way.
In this case it does, and in fact you get all the data back in a nice neat format.
Code example
import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['forge-db.com']

    def start_requests(self):
        url = 'https://www.forge-db.com/fr/fr11/getPlayers.php?'
        yield scrapy.Request(url=url)

    def parse(self, response):
        for row in response.json()['data']:
            yield {'name': row[2], 'guild': row[3]}
Settings
In settings.py, you need to set ROBOTSTXT_OBEY = False. The site doesn't want you to access this data, so we need to set it to False. Be careful: you could end up getting banned from the server.
I would also suggest a couple of other settings, to be respectful and to cache the results, so that if you want to play around with this large dataset you don't hammer the server.
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 3
HTTPCACHE_ENABLED = True
HTTPCACHE_DIR = 'httpcache'
Comments on the code
We make a request to https://www.forge-db.com/fr/fr11/getPlayers.php?, and if you were to print the response you would get all the data from the table; it's quite a lot. It looks like it's in JSON format, so we use Scrapy's new response.json() feature to convert it into a Python dictionary. Be sure you have an up-to-date Scrapy to take advantage of this; otherwise you could use the json library that Python provides to do the same thing.
Now you have to look at the preview data a bit here, but the individual rows are within response.json()['data'][i], where i is the row index. The name and guild are within response.json()['data'][i][2] and response.json()['data'][i][3], so we loop over every row in response.json()['data'] and grab the name and guild.
If the data weren't as structured as it is here and needed modifying, I would strongly urge you to use Items or ItemLoaders for creating the fields you then output. You can modify the extracted data more easily with ItemLoaders, and you can handle duplicate items etc. using a pipeline. These are just some thoughts for the future; I almost never yield plain dictionaries when extracting data, particularly for large datasets.

static seeded random.seed() returns mixed results from loaded JS, but expected results in console, and from Postman

I am running a server on Django; one view takes a seed through a URL param in a GET request, generates some data based on that seed, and sends it back.
The URL format:
mysite.com/api/generate/<seed>
expected result:
Submitting a GET to mysite.com/api/generate/99 gets picked up in Django as a seed value of 99. The data returned is chosen with random.choice(), after seeding with random.seed(99), from a database which contains a single column of names. The data returned is the following:
Walker Lewis
Dalia Aguilar
Meghan Ford
Theresa Hughes
Kenna Coffey
Kendra Ho
problem
Here's where I'm getting confused (code below for each):
1000 requests in Postman: all 1000 return identical results
Approx. 100 requests from the Google Chrome console: all are identical
From the generate.js that the server sends with index.html, making the same call: the results degenerate (examples below)
Postman call
very simple, GET mysite.com/api/generate/99
jquery from chrome console
$.ajax({
  url: "/api/generate/99",
  success: function( result ) {
    console.log(result.data);
  }});
jquery from generate.js
$.ajax({
  url: "/api/generate/99",
  success: function( result ) {
    var data = result.data;
    // data is now passed around the script, but debugging at the line above
    // shows that data has already started to vary on a request-by-request basis
  }});
Both Postman and Chrome Console will return the expected results:
Walker Lewis
Dalia Aguilar
Meghan Ford
Theresa Hughes
Kenna Coffey
Kendra Ho
generate.js:
The first two names are always correct
The third is correct the majority of the time
The fourth, 20% at best (estimate)
Anything past the fourth might as well not be seeded; it just seems to be chosen at random from the database
other information
I have confirmed that each request from each source is being sent and received from the server, and not from a cache
Confirmed that all sources are hitting the same server, in the same state, and the same database
If anyone has any advice on this it would be very appreciated.
So it turned out this was due to me sending multiple AJAX requests "at once". When Django was behind Gunicorn/nginx, each request got its own worker and was processed correctly; when requesting the Docker container directly, the front end got back the strange data. That fits how random.seed() works: it sets process-global state, so when two requests interleave in the same process, one can reseed the generator partway through the other's sequence of random.choice() calls.
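For illustration, the client-side trigger is the overlap itself: if the calls are chained so that no two requests are ever in flight at once, the interleaving goes away (the more robust fix is per-request RNG state on the server, e.g. a dedicated random.Random(seed) instance instead of the module-level functions). A jQuery sketch; the URL and the repetition count are purely illustrative:
// Serialize the AJAX calls so no two are in flight at the same time.
var chain = $.Deferred().resolve().promise();
[99, 99, 99].forEach(function (seed) {
  chain = chain.then(function () {
    return $.ajax({ url: "/api/generate/" + seed });
  }).then(function (result) {
    console.log(result.data); // each response now comes from an undisturbed seed
  });
});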

sending continuous text data from java code to html via http request

I am working on developing an application where I am doing an HTTP POST request via Angular. This request is received by Java code, which does its stuff and generates a log of about 50-60 lines, creating one line every second.
I want to show these logs on my HTML page as they are generated. Right now I am collecting all the logs and displaying them once the request finishes.
Can this be done in a continuous manner?
JAVA CODE
The Java code creates an array of 50-60 log lines; it takes 60-90 seconds to finish the operation, and I am sending the array with the code below after converting it to JSON:
response.getWriter().write(applogs);
JAVASCRIPT CODE
var httpPostData = function (postparameters, postData) {
  return $http({
    method : 'POST',
    url    : URL,
    params : postparameters,
    headers: headers,
    data   : postData
  }).success(function (responseData) {
    return responseData.data;
  });
}

var addAppPromise = httpPostData(restartAppParams, app);
addAppPromise.then(function (logs) {
  $scope.logs = logs.data;
});
HTML Code
<span ng-repeat="log in logs">{{log}}<br></span>
You have at least two options (both sketched below):
1) (The uglier but faster and easier one) Make your service respond immediately (don't wait for the 'stuff' to be generated) and create a second service that returns the logs created so far. Then implement polling in JS: call this second service at short, fixed intervals and update the view.
2) Use EventSource to get server-sent events.
You can also use WebSockets, but since you only want your server to feed the client, EventSource should be enough. However, keep in mind that the EventSource API requires polyfills for IE/Edge and special handling on the server side.
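Here is a minimal sketch of both options in an Angular controller; the endpoint URLs (/api/logs, /api/log-stream) and the finished flag are hypothetical:
// Option 1: poll a second endpoint that returns the log lines collected so far.
var poll = setInterval(function () {
  $http.get('/api/logs').then(function (res) {
    $scope.logs = res.data.lines;               // hypothetical response shape
    if (res.data.finished) clearInterval(poll); // stop once the job is done
  });
}, 1000);

// Option 2: have the server push each log line as a server-sent event.
// The server must respond with Content-Type: text/event-stream and write
// "data: <line>\n\n" for every log line it produces.
var source = new EventSource('/api/log-stream');
source.onmessage = function (event) {
  $scope.$apply(function () {      // re-enter Angular's digest cycle
    $scope.logs.push(event.data);
  });
};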

Where is the Google Analytics pixel in my DOM?

How can I identify, using JavaScript, that a Google Analytics pixel (or any pixel for that matter) has been sent and contains the URL parameters I'm looking for?
I thought, since it's a tracking pixel, I could look for it in the DOM, but it doesn't look like it's ever inserted.
Can someone think of a way to analyze the network request made by Google using JavaScript (not a Chrome extension)?
something like
document.whenGooglePixelIsSentDoReallyCoolStuff(function(requestUrl){
});
A few things:
1) The tracking beacons aren't always pixels. Sometimes they're XHR and sometimes they use navigator.sendBeacon depending on the situation and/or your tracker's transport setting, so if you're just looking for pixels you could be looking in the wrong place.
2) You don't need to add an image to the DOM to get it to send the request. Simply doing document.createElement('img').src = "path/to/image.gif" is sufficient.
3) You don't need to use a Chrome extension to debug Google Analytics; you can simply load the debug version of the script instead of the regular version (see the sketch after this list).
4) If you really don't want to use the debug version of Google Analytics and want to track what is sent programmatically, you can override the sendHitTask and intercept hits before they're sent.
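For option 3, loading the debug build is just a matter of swapping the script URL; a minimal sketch:
// Load the debug build of analytics.js instead of the regular one; it logs
// each hit it sends (plus validation messages) to the browser console.
(function () {
  var s = document.createElement('script');
  s.async = true;
  s.src = 'https://www.google-analytics.com/analytics_debug.js';
  document.head.appendChild(s);
})();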
Update (7/21/2015)
You've changed how your question is worded, so I'll answer the new wording by saying you should follow the suggestion I give in #4 above. Here's some code that would work with your hypothetical whenGooglePixelIsSentDoReallyCoolStuff function:
document.whenGooglePixelIsSentDoReallyCoolStuff = function(callback) {
  // Pass the `ga` queue method a function to get access to the default
  // tracker object created via `ga('create', 'UA-XXXX-Y', ...)`.
  ga(function(tracker) {
    // Grab a reference to the default `sendHitTask` function.
    var originalSendHitTask = tracker.get('sendHitTask');
    // Override the `sendHitTask` to call the passed callback.
    tracker.set('sendHitTask', function(model) {
      // When the `sendHitTask` runs, get the hit payload,
      // which is formatted as a URL query string.
      var requestUrl = model.get('hitPayload');
      // Invoke the callback passed to `whenGooglePixelIsSentDoReallyCoolStuff`.
      // If the callback returns `false`, don't send the hit. This allows you
      // to programmatically do something else based on the contents of the
      // request URL.
      if (callback(requestUrl)) {
        originalSendHitTask(model);
      }
    });
  });
};
Note that you'd have to run this function after creating your tracker, but prior to sending your first hit. In other words, you'd have to run it between the following two lines of code:
ga('create', 'UA-XXXX-Y', 'auto');
ga('send', 'pageview');
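Putting it together, the registration sits between those two calls; the callback body here is just illustrative:
ga('create', 'UA-XXXX-Y', 'auto');
document.whenGooglePixelIsSentDoReallyCoolStuff(function(requestUrl) {
  console.log('GA hit payload: ' + requestUrl);
  return true; // return false to suppress the hit
});
ga('send', 'pageview');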

Uploading/Downloading Byte Arrays with AngularJS and ASP.NET Web API

I have spent several days researching and working on a solution for uploading/downloading byte[]’s. I am close, but have one remaining issue that appears to be in my AngularJS code block.
There is a similar question on SO, but it has no responses. See https://stackoverflow.com/questions/23849665/web-api-accept-and-post-byte-array
Here is some background information to set the context before I state my problem.
I am attempting to create a general purpose client/server interface to upload and download byte[]’s, which are used as part of a proprietary server database.
I am using TypeScript, AngularJS, JavaScript, and Bootstrap CSS on the client to create a single page app (SPA).
I am using ASP.NET Web API/C# on the server.
The SPA is being developed to replace an existing product that was developed in Silverlight so it is constrained to existing system requirements. The SPA also needs to target a broad range of devices (mobile to desktop) and major OSs.
With the help of several online resources (listed below), I have gotten most of my code working. I am using an asynchronous media type formatter for byte[]s from the Byte Rot link below.
http://byterot.blogspot.com/2012/04/aspnet-web-api-series-part-5.html
Returning binary file from controller in ASP.NET Web API
I am using a jpeg converted to a Uint8Array as my test case on the client.
The actual system byte arrays will contain mixed content compacted into predefined data packets. However, I need to be able to handle any valid byte array so an image is a valid test case.
The data is transmitted to the server correctly using the client and server code shown below AND the Byte Rot Formatter (NOT shown but available on their website).
I have verified that the jpeg is received properly on the server as a byte[] along with the string parameter metadata.
I have used Fiddler to verify that the correct response is sent back to the client.
The size is correct
The image is viewable in Fiddler.
My problem is that the server response in the Angular client code shown below is not correct.
By incorrect, I mean it has the wrong size (~10K versus ~27.5K) and it is not recognized as a valid value for the Uint8Array constructor. Visual Studio shows JFIF when I place the cursor over the returned "response" shown in the client code below, but there is no other visible indicator of the content.
/********************** Server Code ************************/
Added missing item to code after [FromBody]byte[]
public class ItemUploadController : ApiController {
    [AcceptVerbs("Post")]
    public HttpResponseMessage Upload(string var1, string var2, [FromBody]byte[] item) {
        HttpResponseMessage result = new HttpResponseMessage(HttpStatusCode.OK);
        var stream = new MemoryStream(item);
        result.Content = new StreamContent(stream);
        result.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return result;
    }
}
/***************** Example Client Code ********************/
The only things that I have omitted from the code are the actual variable parameters.
$http({
  url: 'api/ItemUpload/Upload',
  method: 'POST',
  headers: { 'Content-Type': 'application/octet-stream' }, // Added per Byte Rot blog...
  params: {
    // Other params here, including string metadata about uploads
    var1: var1,
    var2: var2
  },
  data: new Uint8Array(item),
  // 'arraybuffer' must be lowercase. Once changed, it fixed my problem.
  responseType: 'arraybuffer', // Added per http://www.html5rocks.com/en/tutorials/file/xhr2/
  transformRequest: [],
})
.success((response, status) => {
  if (status === 200) {
    // The response variable length is about 10K, whereas the correct Fiddler size is ~27.5K.
    // The error that I receive here is that the constructor argument is invalid.
    // My guess is that I am doing something incorrectly with the AngularJS code, but I
    // have implemented everything that I have read about. Any thoughts???
    var unsigned8Int = new Uint8Array(response);
    // For the test case, I want to convert the byte array back to a base64 encoded string
    // before verifying with the original source that was used to generate the byte[] upload.
    var b64Encoded = btoa(String.fromCharCode.apply(null, unsigned8Int));
    callback(b64Encoded);
  }
})
.error((data, status) => {
  console.log('[ERROR] Status Code:' + status);
});
/****************************************************************/
Any help or suggestions would be greatly appreciated.
Thanks...
Edited to include more diagnostic data
First, I used the angular.isArray function to determine that the response value is NOT an array, which I think it should be.
Second, I used the following code to interrogate the response, which appears to be an invisible string. The leading characters do not seem to correspond to any valid sequence in the image byte array code.
var buffer = new ArrayBuffer(response.length);
var data = new Uint8Array(buffer);
var len = data.length, i;
for (i = 0; i < len; i++) {
  data[i] = response[i].charCodeAt(0);
}
Experiment Results
I ran an experiment by creating byte values from 0 to 255 on the server and downloading them. The AngularJS client received the first 128 bytes correctly (i.e., 0,1,...,126,127), but the remaining values were all 65535 in Internet Explorer 11 and 65533 in Chrome and Firefox. Fiddler shows that 256 values were sent over the network, but only 217 characters are received in the AngularJS client code. If I only use 0-127 as the server values, everything seems to work. The symptoms fit the response being decoded as text rather than kept as raw bytes: 0-127 are valid single-byte UTF-8 sequences, while bytes above 127 are not, and 65533 is 0xFFFD, the Unicode replacement character that decoders substitute for invalid sequences.
Fiddler's Hex view of the server response shows 256 bytes with values ranging from 00, 01, ..., FE, FF, which is correct. As I mentioned earlier, I can return an image and view it properly in Fiddler, so the Web API server interface works for both POST and GET.
I am trying vanilla XMLHttpRequest to see if I can get that working outside of the AngularJS environment.
XMLHttpRequest Testing Update
I have been able to confirm that vanilla XMLHttpRequest works with the server for the GET and is able to return the correct byte codes and the test image.
The good news is that I can hack around AngularJS to get my system working, but the bad news is that I do not like doing this. I would prefer to stay with Angular for all my client-side server communication.
I am going to open up a separate issue on Stack Overflow that only deals with the GET byte[] issues that I am have with AngularJS. If I can get a resolution, I will update this issue with the solution for historical purposes to help others.
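For reference, this is roughly the vanilla XMLHttpRequest version that worked for the GET; the download endpoint name is illustrative:
var xhr = new XMLHttpRequest();
xhr.open('GET', 'api/ItemUpload/Download', true); // hypothetical endpoint
xhr.responseType = 'arraybuffer';                 // keep the bytes raw
xhr.onload = function () {
  if (xhr.status === 200) {
    var bytes = new Uint8Array(xhr.response);
    console.log('Received ' + bytes.length + ' bytes'); // all 256 values intact
  }
};
xhr.send();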
Update
Eric Eslinger on Google Groups sent me a small code segment highlighting that responseType should be "arraybuffer", all lower case. I updated the code block above to show the lowercase value and added a note.
Thanks...
I finally received a response from Eric Eslinger on Google Groups. He pointed out that he uses
$http.get('http://example.com/bindata.jpg', {responseType: 'arraybuffer'}).
He mentioned that the camel case was probably significant, which it is. Changing one character got the entire flow working.
All credit goes to Eric Eslinger.
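For the record, the working AngularJS download call then looks roughly like this; the endpoint name and success handler are illustrative:
// responseType (camel-case key, lowercase 'arraybuffer' value) makes $http
// hand back an ArrayBuffer instead of a decoded string.
$http.get('api/ItemUpload/Download', { responseType: 'arraybuffer' })
  .success(function (response) {
    var bytes = new Uint8Array(response); // response is an ArrayBuffer
    var b64Encoded = btoa(String.fromCharCode.apply(null, bytes));
    console.log('Received ' + bytes.length + ' bytes');
  });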
