Huffman's algorithm with stream in Node JS - javascript

I have implemented Huffman's algorithm in Node JS and it looks like this:
huffman.encode(inputFilename, outputFilename)
huffman.decode(inputFilename, outputFilename)
But I want to implement it like this:
inputStream.pipe(HuffmanEncoderStream).pipe(outputStream)
outputStream.pipe(HuffmanDecoderStream).pipe(inputStream)
The problem is that I need to read the contents of the source file twice: first to build the frequency table and the Huffman tree, and then a second time to actually encode the content. So is it possible to implement this with a Transform stream?
P.S. There are no problems with decoding.

Huffman's algorithm requires that you have all the data first in order to compute frequencies. However, nothing stops you from applying Huffman's algorithm to chunks of your data, which allows streaming. If the chunks are large enough (hundreds of KB to a few MB), the overhead of transmitting a code description per chunk will be very small by comparison. If the data is homogeneous, you will get about the same compression. If the data is not homogeneous, your compression might even improve, since each chunk's Huffman code is optimized for the statistics of the data local to that chunk.
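For the block-wise approach, a Transform stream that buffers input until it has a full block and then emits a per-block header followed by the encoded payload could look roughly like this. This is only a sketch: buildTree(), encodeBlock() and serializeHeader() stand in for whatever your existing huffman module provides, and the 1 MB block size is an arbitrary choice.

    // Sketch of a per-block encoder; buildTree(), encodeBlock() and
    // serializeHeader() are assumed to come from your existing huffman code.
    const { Transform } = require('stream');

    class HuffmanEncoderStream extends Transform {
      constructor(blockSize = 1024 * 1024) {   // ~1 MB per block (arbitrary)
        super();
        this.blockSize = blockSize;
        this.pending = [];
        this.pendingLength = 0;
      }

      _transform(chunk, encoding, callback) {
        this.pending.push(chunk);
        this.pendingLength += chunk.length;
        if (this.pendingLength >= this.blockSize) this._encodeBlock();
        callback();
      }

      _flush(callback) {
        if (this.pendingLength > 0) this._encodeBlock();  // last partial block
        callback();
      }

      _encodeBlock() {
        const block = Buffer.concat(this.pending);
        this.pending = [];
        this.pendingLength = 0;

        const tree = buildTree(block);                    // hypothetical helper
        const encoded = encodeBlock(block, tree);         // hypothetical helper
        this.push(serializeHeader(tree, encoded.length)); // hypothetical helper
        this.push(encoded);
      }
    }

The decoder would then read one header, decode exactly that block, and repeat, so fs.createReadStream(input).pipe(new HuffmanEncoderStream()).pipe(fs.createWriteStream(output)) works without ever holding the whole file in memory.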

Related

Format WebM video chunks/Create MediaStream using input as media tracks

Normally I am able to find the answer I am looking for; however, I have come across an issue that I have not found a resolution for yet.
Given a MessageEvent whose body contains a 1-... second WebM video file as a binaryString: I can parse this as a data URL and update the src, but I would instead like to build a growing buffer that can be streamed to srcObj, as if it were the media device.
I am working on a scalable API for broadcasting video data that has as few dependencies as possible.
Trimming the string is possible as well: maybe just trim the binary string with a regex that removes all the header data and continuously append to srcObj. The stream may exceed 1 GB in total chunks, meaning src="..." may not scale well as the string grows over time; another option might be toggling between different video sources to achieve a smoother transition. I can manipulate the binary string in PHP on the server, or use a Python, C++, Ruby, or Node service, as long as it routes the output to the correct socket.
I am not utilizing webRTC.
Thanks, the Stack Overflow community is awesome, I do not get to say that often enough.
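No answer was posted for this one, but for reference, one common way to feed a growing WebM buffer to a <video> element is the MediaSource API rather than srcObject. The sketch below assumes the chunks arrive as ArrayBuffers over some socket-like object and that the codec string matches the encoder; both are assumptions, not part of the original question.

    // Rough sketch only: `socket` and the codec string are assumptions.
    const video = document.querySelector('video');
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener('sourceopen', () => {
      const sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vp8,vorbis"');
      const queue = [];

      // appendBuffer() cannot be called while a previous append is in flight,
      // so queue chunks and drain the queue on 'updateend'.
      sourceBuffer.addEventListener('updateend', () => {
        if (queue.length && !sourceBuffer.updating) {
          sourceBuffer.appendBuffer(queue.shift());
        }
      });

      socket.addEventListener('message', (event) => {
        const chunk = new Uint8Array(event.data);   // send ArrayBuffers, not binary strings
        if (sourceBuffer.updating || queue.length) queue.push(chunk);
        else sourceBuffer.appendBuffer(chunk);
      });
    });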

AFrame / Three.js : why so many (JS)Strings in memory, when loading complex .obj files?

We have a pretty complex web scene into which we dynamically load pretty complex .obj and .mtl files.
After comparing the scene without any of these objects to the one with multiple objects inside, we noticed some weird behaviour:
The Firefox memory heap shows that most of the memory (>100 MB for 5 objects) is used for JS strings. The rest of the memory is used for objects, which is self-explanatory when we have complex object files in there.
But where does the high number of strings come from, and can we reduce it? Does A-Frame convert the content of the .obj files into strings?
We have thought about minimizing the .obj files themselves and reducing the vertex count. Maybe some of you have had similar experiences and/or can give us suggestions on how to solve this problem.
Thank you in advance :-)
OBJ files are text-based, and unfortunately not a particularly efficient way to transfer 3D data. A-Frame has to parse that text to get your data uploaded to the GPU.
If you need to avoid that, I'd suggest trying to convert your OBJ files to a binary format like glTF (.glb). You can do that conversion with obj2gltf (CLI) or https://cesiumjs.org/convertmodel.html (web). A glTF file will load more quickly.
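As a rough idea of what that conversion looks like with the obj2gltf package (treat this as a sketch and check its README, since options may differ between versions; the file names are placeholders):

    // Sketch: convert model.obj to a binary glTF (.glb) with obj2gltf.
    const fs = require('fs');
    const obj2gltf = require('obj2gltf');

    obj2gltf('model.obj', { binary: true })
      .then((glb) => fs.writeFileSync('model.glb', glb))
      .catch((err) => console.error('conversion failed', err));

The resulting .glb can then be loaded with A-Frame's gltf-model component (or three.js's GLTFLoader) instead of the OBJ/MTL loaders.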

Three js displaying 3D images using json file of big size

I have a requirement to display 3D images. I have an .obj file of about 80 MB that I converted to a JSON file, which is also nearly 75 MB. Using three.js I can display a rotating 3D image, but the issue is speed: it takes nearly 1 minute to load.
Furthermore, the client expects two such images on the same page. I can do that as well, but it then takes nearly 4-5 minutes to load the images.
Is there any workaround for increasing the speed?
If the object is 80MB in size then it should not be considered suitable for use in a web application. If the model does not seem to have a high level of detail, then perhaps your exporter has some problem with it and you are getting a lot of extra information that you don't need. If the model simply is very complex, then your only real option is to simplify the model dramatically, or find another model that has a lower polygon count.
If the model has come directly from the client, you are unfortunately going to be faced with the unenviable task of convincing them of the limitations.
If you absolutely have to use these models, perhaps you can try to compress the data and decompress it on the client side. A quick google brought up this Stack Overflow question as a starting point:
Client side data compress/decompress?
EDIT: Perhaps you could break it down into separate pieces. Since it is an indexed format, you would probably still have to download all of the vertex data first, and then chunks of the index data.
If you wanted to go deeper you could try breaking the index data into chunks and computing which vertices you would need for each chunk. There is no guarantee that the final result would look that great though, so it might not be worth the effort. That said, with some analysis you could probably rearrange the indices so the model loaded in a sensible order, say from front to back, or left to right.
If you have a 3d artist at your disposal, it might be easier to break the model down into several models and load them one by one into the scene. This would probably look much cleaner since you could make artistic choices about where to cut the model up.
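To illustrate the "load it piece by piece" idea, a sketch along these lines loads several smaller files one after another so the scene fills in progressively. The file names, the existing scene variable, and the choice of THREE.ObjectLoader for the JSON format are all assumptions about your setup.

    // Sketch: load the model in pieces, adding each one as soon as it arrives.
    const loader = new THREE.ObjectLoader();
    const pieces = ['model-part1.json', 'model-part2.json', 'model-part3.json'];

    function loadNext(index) {
      if (index >= pieces.length) return;
      loader.load(pieces[index], (object) => {
        scene.add(object);        // the user already sees this part
        loadNext(index + 1);      // then fetch the next one
      });
    }

    loadNext(0);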

Algorithm to store text file (with 25% of repeated words) in JavaScript

I have raw data in text file format with a lot of repeated tokens (~25%). I would like to know if there's an algorithm that will help:
(A) store data in compact form
(B) yet, allow at run time to re-constitute the original file.
Any ideas?
More details:
The raw data is consumed in a pure HTML+JavaScript app, for instant search using regexes.
The data is made of tokens containing (case-sensitive) alpha characters, plus a few punctuation symbols.
Tokens are separated by spaces and newlines.
Most promising algorithm so far: the succinct data structures discussed in the links below, but reconstituting the original looks difficult.
http://stevehanov.ca/blog/index.php?id=120
http://ejohn.org/blog/dictionary-lookups-in-javascript/
http://ejohn.org/blog/revised-javascript-dictionary-search/
PS: Server-side gzip is being employed right now, but it's only a transport-layer optimization and doesn't help maximize the use of offline storage, for example. Given the massive 25% repetition, it should be possible to store the data more compactly, shouldn't it?
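For concreteness, the simplest version of what (A) and (B) describe is plain dictionary coding: store every distinct token once, plus an array of indices, and join the tokens back together at run time. The sketch below keeps whitespace runs as tokens so the original text round-trips exactly; whether it actually beats gzipped storage depends on the data.

    // Minimal dictionary-coding sketch (not the succinct structures linked below).
    function compress(text) {
      const dict = [];
      const seen = Object.create(null);
      // Split with a capturing group so the whitespace separators are kept.
      const indices = text.split(/(\s+)/).map((token) => {
        if (!(token in seen)) {
          seen[token] = dict.length;
          dict.push(token);
        }
        return seen[token];
      });
      return { dict, indices };
    }

    function decompress({ dict, indices }) {
      return indices.map((i) => dict[i]).join('');
    }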
Given that the actual use is pretty unclear I have no idea whether this is helpful or not, but for the smallest total size (HTML + JavaScript + data) some people came up with the idea of storing text data in a greyscale .png file, one byte per pixel. A small loader script can then draw the .png to a canvas, read it pixel by pixel and reassemble the original data this way. This gives you deflate compression without having to implement it in JavaScript. See e.g. here for more detailed information.
Please do not use a technique like that unless you have pretty esoteric requirements, e.g. for a size-constrained programming competition. Your coworkers will thank you :-)
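The decode step of that .png trick is typically just canvas pixel reading, roughly like this (a sketch: the file name is a placeholder, and a colour-managed or scaled image will corrupt the bytes):

    // Sketch: recover text stored one byte per pixel in a greyscale PNG.
    const img = new Image();
    img.onload = () => {
      const canvas = document.createElement('canvas');
      canvas.width = img.width;
      canvas.height = img.height;
      const ctx = canvas.getContext('2d');
      ctx.drawImage(img, 0, 0);

      const pixels = ctx.getImageData(0, 0, img.width, img.height).data; // RGBA bytes
      let text = '';
      for (let i = 0; i < pixels.length; i += 4) {
        const byte = pixels[i];                              // greyscale, so R === G === B
        if (byte !== 0) text += String.fromCharCode(byte);   // 0 used as padding
      }
      // `text` now holds the reassembled data.
    };
    img.src = 'data.png';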
Generally speaking, it's a bad idea to try to implement compression in JavaScript. Compression is the exact type of work that JS is the worst at: CPU-intensive calculations.
Remember that JS is single-threaded¹, so for the entire time spent decompressing data, you block the browser UI. In contrast, HTTP gzipped content is decompressed by the browser asynchronously.
Given that you have to reconstruct the entire dataset (so as to test every record against a regex), I doubt the Succinct Trie will work for you. To be honest, I doubt you'll get much better compression than the native gzipping.
¹ Web Workers notwithstanding.

Better for loading speed: large JSON file, or small CSV file that requires processing?

For maximum load speed and page efficiency, is it better to have:
(1) An 18MB JSON file, containing an array of dictionaries, that I can load and start using as a native JavaScript object (e.g. var myname = jsonobj[1]['name']).
(2) A 4MB CSV file, that I need to read using the jquery.csv plugin, and then use lookups to refer to: var nameidx = titles.getPos('name'); var myname = jsonobj[1][nameidx].
I'm not really expecting anyone to give me a definitive answer, but a general suspicion would be very useful. Or tips for how to measure - perhaps I can check the trade-off between load speed and efficiency using Developer Tools.
My suspicion is that any extra efficiency from using a native JavaScript object in (1) will be outweighed by the much smaller size of the CSV file, but I would like to know if others think the same.
Have you considered delivering the JSON content using gzip? Here are some benchmarks on gzip: http://www.cowtowncoder.com/blog/archives/2009/05/entry_263.html
What is your situation? Are you writing some intranet site where you know what browser users are using and have something like a reasonable expectation of bandwidth, or is this a public-facing site?
If you have control of what browsers people use, for example because they're your employees, consider taking advantage of client-side caching. If you're trying to convince people to use this data you should probably consider breaking the data up into chunks and serving it via XHR.
If you really need to serve it all at once then:
Use gzip
Are you doing heavy processing of the data on the client side? How many of the items are you actually likely to go through? If you're only likely to access fewer than 1,000 of them in any given session then I would imagine that the 14MB savings would be worth it. If on the other hand you're comparing all kinds of things against each other all the time (because you're doing some sort of visualization or... anything) then I imagine that the JSON would pay off.
In other words: it depends. Benchmark it.
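If benchmarking shows the smaller CSV wins on transfer time, one compromise is to convert it once after load into the same array-of-objects shape as option (1), so the rest of the code keeps the convenient access pattern. A sketch, assuming the jquery.csv plugin from the question and placeholder file/field names:

    // Sketch: fetch the CSV, convert it once into objects keyed by the header row.
    $.get('data.csv', (raw) => {
      const rows = $.csv.toArrays(raw);       // [[header...], [row...], ...]
      const headers = rows[0];
      const data = rows.slice(1).map((row) => {
        const record = {};
        headers.forEach((name, i) => { record[name] = row[i]; });
        return record;
      });

      const myname = data[1].name;            // same access style as the JSON version
    });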
4MB vs 18MB? Where is the problem? JSON is just a standard format now; CSV is maybe just as good, and it's fine if you're using it. My opinion.
14 MB of data is a HUGE difference, but I would first try serving both with GZIP/Deflate server-side compression and then compare the two requests (the CSV request will probably still win on content length).
Then I would also try to create some data-manipulation tests on jsPerf with both CSV and JSON data, using a real test case / common usage.
That depends a lot on the bandwidth of the connection to the user.
Unless this is only going to be used by people who have a super fast connection to the server, I would say that the best option would be an even smaller file that only contains the actual information that you need to display right away, and then load more data as needed.
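A sketch of that "load more as needed" shape, assuming the data can be split into per-page JSON files on the server (the URL scheme, jQuery usage and render() function are assumptions):

    // Sketch: ship only the first slice, fetch further slices on demand.
    let page = 0;
    const records = [];

    function loadNextPage() {
      return $.getJSON('data/page-' + page + '.json').then((slice) => {
        records.push(...slice);
        page += 1;
        return records;
      });
    }

    loadNextPage().then(render);   // render() is whatever draws the current view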
