I have a file.txt that I need to load into my script and parse via d3.request.
The file is encoded as windows-1250 and contains extra lines that should be removed, so that only lines starting with 'Date' or '2017' pass through.
So far I have been using a CLI workflow to grep the text file (removing the extra lines) and d3's dsv2json to get clean JSON that can be loaded:
$ grep -E '^(Date|2017)' file.txt > file.csv
$ dsv2json -r ';' --input-encoding windows-1250 --output-encoding utf-8 < file.csv > file.json
However, now I need to do these operations programmatically once the txt file is loaded in the script via d3.request.
d3.request('file.txt')
    .mimeType('text/csv')
    .response(function(response) {
        // response.responseText
    })
The responseText gives me the raw data with the wrong encoding and the extra lines. How can I fix this so that it produces clean JSON at the end?
After further investigation I have found a solution.
To decode the file I used TextDecoder. For this to work, the d3.request response type should be set to 'arraybuffer'.
function decode(response) {
    // response is an ArrayBuffer containing the windows-1250 encoded bytes
    const dataView = new DataView(response);
    const decoder = new TextDecoder("windows-1250");
    const decodedString = decoder.decode(dataView);
    return decodedString;
}
To filter out the extra lines I used the following step:
function filterData(rawData) {
    // keep only lines starting with 'Date' or '2017', matching the grep pattern above
    return rawData
        .split(/\n/)
        .filter(row => (row.startsWith('Date') || row.startsWith('2017')))
        .join('\n');
}
So finally, in the context of d3.request:
d3.request('file.txt')
    .header('Content-Type', 'text/csv;charset=windows-1250')
    .mimeType('text/csv')
    .responseType('arraybuffer')
    .response(function(xhr) {
        const decoded = decode(xhr.response);
        const filtered = filterData(decoded);
        const json = d3.dsvFormat(';').parse(filtered);
        return json;
    })
    .get();
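To actually consume the parsed rows, pass a callback to .get(); the callback receives whatever the .response() transform returns. A minimal sketch, reusing the decode and filterData helpers above:

d3.request('file.txt')
    .responseType('arraybuffer')
    .response(function(xhr) {
        return d3.dsvFormat(';').parse(filterData(decode(xhr.response)));
    })
    .get(function(error, rows) {
        if (error) throw error;
        console.log(rows); // array of row objects - the "clean json"
    });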
I have a drag&drop event and I would like to hash the file dragged. I have this:
var file = ev.dataTransfer.items[i].getAsFile();
var hashf = CryptoJS.SHA512(file).toString();
console.log("hashf", hashf)
But when I drag different files, "hashf" is always the same string.
https://jsfiddle.net/9rfvnbza/1/
The issue is that you are attempting to hash the File object itself. Hash algorithms expect a string to hash.
When you pass the File object to the CryptoJS.SHA512() method, the API attempts to convert the object to a string. That conversion results in CryptoJS.SHA512() receiving the same string no matter what File object you provide.
The string is [object File] - you can replace file in your code with that string and discover it is the same hash you've seen all along.
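You can verify this yourself by hashing that literal string (a quick sketch):

// Produces the same digest you have been seeing for every dropped file
console.log(CryptoJS.SHA512("[object File]").toString());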
To fix this, retrieve the text from the file first and pass that to the hashing algorithm:
file.text().then((text) => {
const hashf = CryptoJS.SHA512(text).toString();
console.log("hashf", hashf);
});
If you prefer async/await, you can put it in an IIFE:
(async() => {
const text = await file.text()
const hashf = CryptoJS.SHA512(text).toString();
console.log("hashf", hashf);
})();
I want to send some string data from Python 3 to Node.js. The string contains Korean characters and I am encoding it to UTF-8 (because I don't know another way to send the data safely). When I send it from Python it is a byte stream, and in Node.js I receive it as an array. I convert this array to a string, but now I cannot decode the string back into the original Korean characters.
Here is the code I am using.
python
input = sys.argv[1]
d = bot.get_response(input)
data = str(d).encode('utf8')
print(data)
nodeJs
var utf = require('utf8');
var path = require('path');
var python = require('python-shell');

var pyt = path.normalize('path/to/my/python.exe'),
    scrp = path.normalize('path/to/my/scriptsFolder/');

var options = {
    mode: 'text',
    pythonPath: pyt,
    pythonOptions: ['-u'],
    scriptPath: scrp,
    encoding: 'utf8',
    args: [message]
};

python.run('test.py', options, function (err, results) {
    // here I need to decode 'results'
    var originalString = utf.encode(results.toString()); // that code is not working for me
});
I have used several libs like utf8 to decode it, but that didn't help.
Can someone please give me an idea of how to make this work?
EDIT
I have to add some more info.
I have tried @smarx's approach but it did not work.
I have two cases:
1. If I send the data as a string from Python, here is what I get in Node.js: b'\xec\x95\x88\xeb\x85\x95\xed\x95\x98\xec\x8b\xad\xeb\x8b\x88\xea\xb9\x8c? \xec\x9d\xb4\xed\x9a\xa8\xec\xa2\x85 \xea\xb3\xa0\xea\xb0\x9d\xeb\x8b\x98! \xeb\x8f\x99\xec\x96\x91\xeb\xa7\xa4\xec\xa7\x81\xec\x9e\x85\xeb\x8b\x88\xeb\x8b\xa4
2. If I encode the data and send it, I get: �ȳ��Ͻʴϱ�? ��ȿ�� ������! �
I had exactly the same issue in my project and finally found the answer.
I solved my problem with the code below.
This is needed on Windows (on macOS and Linux the default system encoding is already UTF-8, so the issue doesn't happen there).
I hope it might help you, too!
# In the Python file that your JavaScript calls via the python-shell module, put this code
# (sys.stdout.reconfigure requires Python 3.7+):
import sys
sys.stdout.reconfigure(encoding='utf-8')
I found the hint in the python-shell feature description: "Simple and efficient data transfers through stdin and stdout streams".
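For context, here is a hedged sketch of the Node.js side, mirroring the python.run call from the question (test.py is assumed to be the script containing the reconfigure line above):

var path = require('path');
var python = require('python-shell');

var options = {
    mode: 'text',
    pythonPath: path.normalize('path/to/my/python.exe'),
    scriptPath: path.normalize('path/to/my/scriptsFolder/'),
    pythonOptions: ['-u'],
    encoding: 'utf8',
    args: [message] // message: the input string, as in the question
};

// With sys.stdout reconfigured to UTF-8, results now contains proper Korean strings
python.run('test.py', options, function (err, results) {
    if (err) throw err;
    console.log(results);
});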
I'm still not sure what python.run does, since you won't share that code, but here's my version of the code, which is working fine:
test.py
print("안녕 세상")
app.js
const { exec } = require('child_process');
exec('python3 test.py', function (err, stdout, stderr) {
console.log(stdout);
});
// Output:
// 안녕 세상
I have the same issue when using python-shell.
Here is my solution:
The value returned by .encode('utf-8') is a bytes object, so you need to write it to stdout's buffer directly.
In test.py, this prints UTF-8 JSON that includes some Chinese characters:
import sys, json

sys.stdout.buffer.write(json.dumps({"你好": "世界"}, ensure_ascii=False).encode('utf8'))
print()  # print \n at the end to support python-shell in json mode
In main.js:
let { PythonShell } = require('python-shell');
let fs = require('fs');

let opt = { mode: 'json', pythonOptions: ['-u'], pythonPath: 'python', encoding: 'utf8' };
let pyshell = new PythonShell('lyric.py', opt);

pyshell.on('message', function (message) {
    console.log(message); // *** the console output may still be wrong (still ���)
    let json = JSON.stringify(message);
    fs.writeFile('myjsonfile.json', json, 'utf8', function () {
    }); // *** the output JSON file will be correct UTF-8
});
Result:
This shows the message is received correctly in UTF-8, because the JSON output is correct.
However, the console.log output apparently failed.
I don't know whether there is any way to fix the console.log output. (Windows 10)
I had the same trouble using string data from Python in Node.js.
I solved the problem this way:
Try changing the default code page of the Windows console to UTF-8 (for example by running chcp 65001 before starting Node), if your console's code page is not already UTF-8.
(In my case the default code page was CP949.)
In my case:
I got a message like ������ 2���� ��������.
I tried an online encoding converter (http://code.cside.com/3rdpage/us/url/converter.html)
and found that my strings were being encoded as CP949 but decoded as UTF-8.
I'm converting my code from PHP to Node.js, and I need to convert a part of my code that uses the gzuncompress() function.
For that I'm using zlib.inflateSync, but I don't know which encoding I should use to create the buffer so as to get the same result as in PHP.
Here's what I do with php to decompress a string:
gzuncompress(substr($this->raw, 8))
and here's what I've tried in node.js
zlib.inflateSync(new Buffer(this.raw.substr(8), "encoding"))
So which encoding should I use to make zlib.inflateSync return the same data as gzuncompress?
I am not sure what the exact encoding would be here, but this repo has some PHP translations for Node.js (https://github.com/gamalielmendez/node-fpdf/blob/master/src/PHP_CoreFunctions.js). According to it, the following could work:
const zlib = require('zlib');

const gzuncompress = (data) => {
    // PHP strings are raw bytes, so 'binary' (latin1) maps each character to one byte
    const chunk = (!Buffer.isBuffer(data)) ? Buffer.from(data, 'binary') : data
    const Z1 = zlib.inflateSync(chunk)
    return Z1.toString('binary') //'ascii'
}
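In other words, 'binary' (an alias for 'latin1') is the encoding that maps each byte to one character, which matches how PHP treats strings. A hedged usage sketch applied to the snippet from the question:

// Skip the 8-byte prefix, then inflate - mirrors gzuncompress(substr($this->raw, 8))
const decompressed = gzuncompress(this.raw.substr(8));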
To start off: I am currently using npm fast-csv, which is a nice CSV reader/writer that is pretty straightforward and simple. What I'm attempting to do is use it in conjunction with iconv to process "accented" and other non-ASCII characters, and either convert them to an ASCII equivalent or remove them, depending on the character.
My current process with fast-csv is to bring in a chunk for processing (it comes in as one row) via a read stream, pause the read stream, process the data, pipe the data to a write stream, and then resume the read stream using a callback. fast-csv currently knows where to separate the chunks (rows) based on the format of the data coming in from the read stream.
The entire process looks like this:
var fs = require('fs');
var csv = require('fast-csv');

var stream = fs.createReadStream(inputFileName);

function csvPull(source) {
    var csvWrite = csv.createWriteStream({ headers: true });
    var writableStream = fs.createWriteStream(outputFileName);
    var csvStream = csv()
        .on("data", function (data) {
            csvStream.pause();
            processRow(data, function () {
                csvStream.resume();
            });
        })
        .on("end", function () {
            console.log('END OF CSV FILE');
        });

    csvWrite.pipe(writableStream);
    source.pipe(csvStream);
}

csvPull(stream);
The problem I am currently running into is that, for some reason, my JavaScript does not inherently recognise the non-ASCII characters, so I am resorting to using npm iconv-lite to encode the data stream as it comes in into something usable. However, this presents a bigger issue, as fast-csv will no longer know where to split the chunks (rows) because of the now-encoded data. This is a problem due to the sizes of the CSVs I will be working with; it is not an option to load the entire CSV into a buffer and then decode it.
Are there any suggestions on how I might get around this without writing my own CSV parser into my code?
Try reading your file with 'binary' as the encoding option. I had to read a few CSVs with accented characters and it worked fine with that.
var stream = fs.createReadStream(inputFileName, { encoding: 'binary' });
Unless I misunderstand, you should be able to fix this by setting the encoding on the stream to utf-8 (docs).
for the first line:
var stream = fs.createReadStream(inputFileName, {encoding: 'utf8'});
and if needed:
writableStream = fs.createWriteStream(outputFileName, {defaultEncoding: 'utf8'});
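If the underlying file is in a single-byte legacy encoding rather than UTF-8, another option is to decode the byte stream before fast-csv ever sees it, so the row splitting happens on already-decoded text. A minimal sketch using iconv-lite's decodeStream (the 'win1252' encoding name is an assumption; substitute the file's real encoding):

var fs = require('fs');
var iconv = require('iconv-lite');
var csv = require('fast-csv');

fs.createReadStream(inputFileName)
    .pipe(iconv.decodeStream('win1252')) // decode bytes to text incrementally
    .pipe(csv())
    .on('data', function (row) {
        // row arrives already decoded, so processRow sees proper characters
    })
    .on('end', function () {
        console.log('END OF CSV FILE');
    });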
I have a SOAP API that is returning a file divided into chunks encoded as several Base64 strings.
I'm not able to save it to the file system without breaking it.
This is the pastebin of a whole encoded file, as is, once I download and chain the responses.
What is the way to save it correctly?
I tried many ways:
var f = Ti.Filesystem.getFile(Ti.Filesystem.tempDirectory, 'test.pdf');
...
var blobStream = Ti.Stream.createStream({ source: fileString, mode: Ti.Stream.MODE_READ });
var newBuffer = Ti.createBuffer({ length: fileString.length });
f.write(fileString);
or
var data = Ti.Utils.base64decode(fileString);
var blobStream = Ti.Stream.createStream({ source: data, mode: Ti.Stream.MODE_READ });
var newBuffer = Ti.createBuffer({ length: data.length });
var bytes = blobStream.read(newBuffer);
f.write(fileString);
or
var data = Ti.Utils.base64decode(fileString);
var blobStream = Ti.Stream.createStream({ source: data, mode: Ti.Stream.MODE_READ });
var newBuffer = Ti.createBuffer({ length: data.length });
var bytes = blobStream.read(newBuffer);
f.write(bytes);
but I don't understand which one is the right path.
Do I have to convert the string back to a byte array on my own?
What is the right way to save it?
Do I have to create a buffer from the string or ...?
I think the Base64 encoding of the file is not valid or is incomplete; I've tested it using bash and the base64 utility. You can perform these steps.
Copy and paste the Base64 string into a file called pdf.base64, then run this command:
cat pdf.base64 | base64 --decode >> out.pdf
The output file is not a valid PDF.
You can encode and then decode a valid PDF file to take a look at the generated binary:
cat validfile.pdf | base64 | base64 --decode >> anothervalidfile.pdf
Check whether you are chaining the chunks correctly, or simply get in touch with whoever built the SOAP API.
Before you start downloading your file, you need to create the file stream to write to; writing to a blob is not the way to go:
// Step 1
var outFileStream = Ti.Filesystem.getFile('outfile.bin').open(Ti.Filesystem.MODE_WRITE);
After creating your HTTPClient or socket stream, when you receive a chunk of Base64 data from the server you need to put the decoded data into a Titanium.Buffer. This would probably go into your onload or onstream handler in an HTTPClient:
// Step 2
var rawDecodedFileChunk = Ti.Utils.base64decode(fileString);
var outBuffer = Ti.createBuffer({
    // byteOrder and type may need to be set to match the incoming data
    value: rawDecodedFileChunk
});
Finally you can write the data out to the file stream:
// Step 3
var bytesWritten = outFileStream.write(outBuffer); // writes entire buffer to stream
Ti.API.info("Bytes written:" + bytesWritten); // should match data length
if(outBuffer.length !== bytesWritten) {
Ti.API.error("Not all bytes written!");
}
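Once the final chunk has been written, remember to close the stream so the file is flushed to disk (a small follow-up to the steps above):

// Step 4 (after the last chunk)
outFileStream.close();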
Generally errors come from having the wrong byte order or type of data, or writing in the wrong order. Obviously, this all depends on the server sending the data in the correct order and it being valid!
You may also want to consider the pump command version of this, which allows you to transfer from input stream to output file stream, minimizing your load.