node.js / Write buffer to file

I have a file with a size of 108 bytes.
I want to add some text (a buffer) to this file, let's say "Hello world".
So I wrote the following:
fs.open("./tryit.txt", 'w+', function (err, fd1) {
var buffer = new Buffer("hello world");
fs.write(fd1, buffer, 0, 11, 109, function (err, bytesWrite, buffer) {
})
})
in order to write to the file starting at position 109.
I see that it writes the text, but everything that was in the file before the "hello world" is replaced by NUL characters.
How can I do this? Append mode is not an option, because in some cases I want to write to the middle of the file.

What you want is random access IO (read or write at a specific point in a file).
It's not provided in the default API but you may use an additional package like https://www.npmjs.org/package/random-access-file

From docs:
'w+' - Open file for reading and writing. The file is created (if it does not exist) or truncated (if it exists)
"truncated" means that file becomes empty once opened.
You need a different mode, r+ for instance. a also might work, but not on Linux, according to docs.
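For example, here is a minimal sketch of the fix described above, assuming the goal is to write right after the existing 108 bytes without truncating the file (the offset 108 is just illustrative):

var fs = require('fs');

// 'r+' opens the file for reading and writing without truncating it.
fs.open('./tryit.txt', 'r+', function (err, fd) {
    if (err) throw err;
    var buffer = Buffer.from('hello world'); // new Buffer('hello world') on older Node
    // Write buffer.length bytes at byte offset 108, i.e. just past the existing contents.
    fs.write(fd, buffer, 0, buffer.length, 108, function (err, bytesWritten) {
        if (err) throw err;
        fs.close(fd, function () {});
    });
});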

Related

JavaScript FileReader class is not reading the contents properly

I am using the readAsText method of the FileReader class (JavaScript) with encoding type "UTF-8" to read a file from the client. It works well for all kinds of characters with code values ranging from 1 to 65000. The only problem I have is that when I read the file chunk by chunk, any character with a code value above 3000 is sometimes not read properly. After investigating, I found that it happens only when I do this kind of reading on big files and the particular character happens to sit as the first letter of a chunk. And I tested with multiple chunks of a file. This problem does not happen for all the chunks, only for one or two chunks out of 10. This is weird and strange. Am I missing something here? And do we have any other options for reading a local file in JavaScript? Any help will be much appreciated.
This might be one solution:
new Blob(['hi']).text().then(console.log)
Here is another that is not as cross-browser friendly, but it could work:
new Blob(['foo']) // or new File(['foo'], 'test.txt')
  .stream()
  .pipeThrough(new TextDecoderStream('utf-8'))
  .pipeTo(new WritableStream({
    write(part) {
      console.log(part)
    }
  }))
Another lower-level solution that doesn't depend on WritableStream or TextDecoderStream is to use a regular TextDecoder with the stream option:
var res = ''
const decoder = new TextDecoder()
res += decoder.decode(chunk1, { stream: true })
res += decoder.decode(chunk2, { stream: true })
res += decoder.decode(chunk3, { stream: true })
res += decoder.decode() // flush the end
You can get each chunk by using new Response(blob).body, by using blob.stream(), or simply by slicing the blob with blob.slice(start, end) and using FileReader.prototype.readAsArrayBuffer().
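Here is a minimal sketch of that chunk-by-chunk approach (the helper readInChunks and its parameters are illustrative names): it slices the blob, reads each slice with FileReader.prototype.readAsArrayBuffer, and decodes with a streaming TextDecoder so a multi-byte character split across a chunk boundary is not corrupted.

function readInChunks(file, chunkSize, onText, onDone) {
    var decoder = new TextDecoder('utf-8');
    var offset = 0;
    function readNext() {
        if (offset >= file.size) {
            onText(decoder.decode()); // flush any bytes still buffered by the decoder
            onDone();
            return;
        }
        var reader = new FileReader();
        reader.onload = function () {
            // stream: true keeps an incomplete byte sequence for the next chunk
            onText(decoder.decode(reader.result, { stream: true }));
            offset += chunkSize;
            readNext();
        };
        reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
    }
    readNext();
}

// e.g. readInChunks(someFile, 8192, part => console.log(part), () => console.log('done'))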

Random comma inserted at character 8192 in python "json" result called from node.js

I'm a JS developer just learning python. This is my first time trying to use node (v6.7.0) and python (v2.7.1) together. I'm using restify with python-runner as a bridge to my python virtualenv. My python script uses a RAKE NLP keyword-extraction package.
I can't figure out for the life of me why the data returned in server.js has a random comma inserted at character 8192 (and roughly at multiples of it). There's no pattern except the location; sometimes it's in the middle of an object key string, other times in the value, other times after the comma separating the object pairs. This completely breaks JSON.parse() on the returned data. Example outputs are below. When I run the script from a Python shell, this doesn't happen.
I seriously can't figure out why this is happening, any experienced devs have any ideas?
Sample output in browser
[..., {...ate': 1.0, 'intended recipient': 4.,0, 'correc...}, ...]
Sample output in python shell
[..., {...ate': 1.0, 'intended recipient': 4.0, 'correc...}, ...]
DISREGARD ANY DISCREPANCIES REGARDING OBJECT CONVERSION AND HANDLING IN THE FILES BELOW. THE CODE HAS BEEN SIMPLIFIED TO SHOWCASE THE ISSUE
server.js
var restify = require('restify');
var py = require('python-runner');
var server = restify.createServer({...});

server.get('/keyword-extraction', function (req, res, next) {
    py.execScript(__dirname + '/keyword-extraction.py', {
        bin: '.py/bin/python'
    })
    .then(function (data) {
        fData = JSON.parse(data); // <---- ERROR
        res.json(fData);
    })
    .catch(function (err) {...});
    return next();
});

server.listen(8001, 'localhost', function () {...});
keyword-extraction.py
import csv
import json
import RAKE

f = open('emails.csv', 'rb')
f.readline()  # skip line containing col names
outputData = []
try:
    reader = csv.reader(f)
    for row in reader:
        email = {}
        emailBody = row[7]
        Rake = RAKE.Rake('SmartStoplist.txt')
        rakeOutput = Rake.run(emailBody)
        for tuple in rakeOutput:
            email[tuple[0]] = tuple[1]
        outputData.append(email)
finally:
    f.close()

print(json.dumps(outputData))
This looks suspiciously like a bug related to the size of some buffer, since 8192 is a power of two.
The main thing here is to isolate exactly where the failure is occurring. If I were debugging this, I would:
Take a closer look at the output from json.dumps, by printing several characters on either side of position 8191, ideally the integer character code (unicode, ASCII, or whatever).
If that looks OK, I would try capturing the output from the python script as a file and read that directly in the node server (i.e. don't run a python script).
If that works, then create a python script that takes that file and outputs it without manipulation and have your node server execute that python script instead of the one it is using now.
That should help you figure out where the problem is occurring. From comments, I suspect that this is essentially a bug that you cannot control, unless you can increase the python buffer size enough to guarantee your data will never blow the buffer. 8K is pretty small, so that might be a realistic solution.
If that is inadequate, then you might consider processing the data on the node server to remove every character at n * 8192, if you can consistently rely on that. Good luck.
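As a sketch of the first step above (the helper name inspectAround is hypothetical), you could log the characters and their codes around position 8191 in the node server before calling JSON.parse:

function inspectAround(data, center) {
    for (var i = Math.max(0, center - 5); i <= center + 5 && i < data.length; i++) {
        console.log(i, JSON.stringify(data[i]), data.charCodeAt(i));
    }
}

// inside the .then() handler in server.js, before JSON.parse(data):
// inspectAround(data, 8191);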

How NOT to stop reading file when meeting EOF?

I'm trying to implement a routine for Node.js that would allow one to open a file that is being appended to by some other process at this very moment, and then return chunks of data immediately as they are appended to the file. It can be thought of as similar to the tail -f UNIX command, but acting immediately as chunks become available, instead of polling for changes over time. Alternatively, one can think of it as working with a file the way you do with a socket — expecting on('data') to trigger from time to time until the file is closed explicitly.
In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with a similar purpose), and then just read chunks as the file descriptor is marked "readable". So, when there is nothing to read, it isn't readable, and when something is appended to the file, it becomes readable again.
I somewhat expected this kind of behavior from the following code sample in JavaScript:
function readThatFile(filename) {
    const stream = fs.createReadStream(filename, {
        flags: 'r',
        encoding: 'utf8',
        autoClose: false // I thought this would prevent file closing on EOF too
    });
    stream.on('error', function (err) {
        // handle error
    });
    stream.on('open', function (fd) {
        // save fd, so I can close it later
    });
    stream.on('data', function (chunk) {
        // process chunk
        // fs.close() if I no longer need this file
    });
}
However, this code sample just bails out when EOF is encountered, so I can't wait for a new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats the purpose of Node.js. Alternatively, I could fs.watch() the file for changes, but that won't work over a network, and I don't like the idea of reopening the file all the time instead of just keeping it open.
I've tried to do this:
const fd = fs.openSync(filename, 'r'); // sync for readability's sake
const stream = net.Socket({ fd: fd, readable: true, writable: false });
But I had no luck — net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.
So, any solutions?
UPD: this isn't possible, my answer explains why.
I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.
To explain how tailing would work, I've written a somewhat hacky tail function which will read an entire file and invoke a callback for every line (separated by \n only) and then wait for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.
var fs = require('fs');

function tail(path, callback) {
    var descriptor, bytes = 0, buffer = new Buffer(256), line = '';

    function parse(err, bytesRead, buffer) {
        if (err) {
            callback(err, null);
            return;
        }
        // Keep track of the bytes we have consumed already.
        bytes += bytesRead;
        // Combine the buffered line with the new string data.
        line += buffer.toString('utf-8', 0, bytesRead);
        var i = 0, j;
        while ((j = line.indexOf('\n', i)) != -1) {
            // Callback with a single line at a time.
            callback(null, line.substring(i, j));
            // Skip the newline character.
            i = j + 1;
        }
        // Only keep the unparsed string contents for next iteration.
        line = line.substr(i);
        // Keep reading in the next tick (avoids CPU hogging).
        process.nextTick(read);
    }

    function read() {
        var stat = fs.fstatSync(descriptor);
        if (stat.size <= bytes) {
            // We're currently at the end of the file. Check again in 500 ms.
            setTimeout(read, 500);
            return;
        }
        fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
    }

    fs.open(path, 'r', function (err, fd) {
        if (err) {
            callback(err, null);
        } else {
            descriptor = fd;
            read();
        }
    });

    return {close: function close(callback) {
        fs.close(descriptor, callback);
    }};
}

// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
    console.log(err, line);
});

// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);
All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.
Previous answers have mentioned tail-stream's approach which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.
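As a rough illustration of how those pieces fit together (this is not tail-stream's actual code; watchTail and onChunk are illustrative names):

var fs = require('fs');

function watchTail(path, onChunk) {
    var position = fs.statSync(path).size; // start at the current end of the file
    var fd = fs.openSync(path, 'r');
    var reading = false;

    function readNew() {
        if (reading) return; // don't start overlapping reads
        reading = true;
        fs.stat(path, function (err, stat) {
            if (err || stat.size <= position) {
                reading = false; // nothing new yet
                return;
            }
            var length = stat.size - position;
            var buffer = new Buffer(length); // Buffer.alloc(length) on newer Node
            fs.read(fd, buffer, 0, length, position, function (err, bytesRead) {
                reading = false;
                if (err || !bytesRead) return;
                position += bytesRead;
                onChunk(buffer.slice(0, bytesRead));
                readNew(); // pick up anything appended while we were reading
            });
        });
    }

    var watcher = fs.watch(path, readNew);
    return {
        close: function () {
            watcher.close();
            fs.closeSync(fd);
        }
    };
}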
Another, perhaps hackier, approach might be to just use tail by spawning a child process with it. This of course comes with the limitation that tail must exist on the target platform, but one of node's strengths is doing asynchronous systems development via spawn, and even on Windows you can run node in an alternate shell like msysgit or cygwin to get access to the tail utility.
The code for this:
var spawn = require('child_process').spawn;

var child = spawn('tail', ['-f', 'my.log']);

child.stdout.on('data', function (data) {
    console.log('tail output: ' + data);
});

child.stderr.on('data', function (data) {
    console.log('err data: ' + data);
});
So, it seems people have been looking for an answer to this question for five years already, and there is still no on-topic answer.
In short: you can't. And not just in Node.js in particular; you can't at all.
Long answer: there are a few reasons for this.
First, POSIX standard clarifies select() behavior in this regard as follows:
File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.
So, select() can't help with detecting a write beyond the file end.
With poll() it's similar:
Regular files shall always poll TRUE for reading and writing.
I can't tell for sure about epoll(), since it isn't standardized and you would have to read a rather lengthy implementation, but I would assume it behaves similarly.
Since libuv, which is at the core of the Node.js implementation, uses read(), pread() and preadv() in its uv__fs_read(), none of which block when invoked at the end of a file, an empty buffer is always returned when EOF is encountered. So, no luck here either.
So, summarizing: if such functionality is desired, something must be wrong with your design, and you should revise it.
What you're trying to use is a FIFO file (an acronym for First In, First Out), which, as you said, works like a socket.
There's a Node.js module that allows you to work with FIFO files.
I don't know what you want it for, but there are better ways to work with sockets in Node.js. Try socket.io instead.
You could also have a look at this previous question:
Reading a file in real-time using Node.js
Update 1
I'm not familiar with any module that would do what you want with a regular file, instead of with a socket type one. But as you said, you could use tail -f to do the trick:
// filename must exist at the time of running the script
var filename = 'somefile.txt';
var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);

tail.stdout.on('data', function (data) {
    data = data.toString().replace(/^[\s]+/i, '').replace(/[\s]+$/i, '');
    console.log(data);
});
Then, from the command line, try echo someline > somefile.txt and watch the console.
You might also like to have a look at this: https://github.com/layerssss/node-tailer

Use FileAPI to download big generated data file

The JavaScript process generates a lot of data (200-300 MB). I would like to save this data for further analysis, but the best I have found so far is saving it using this example http://jsfiddle.net/c2U2T/, which is not an option for me because it looks like it requires all the data to be available before the download starts. But what I need is something like this:
var saver = new Saver();
saver.save(); // The Save As ... dialog appears
saver.onaccepted = function () { // user accepted saving
    for (var i = 0; i < 1000000; i++) {
        saver.write(Math.random());
    }
};
Of course, instead of Math.random() there will be some meaningful construction.
@dader - I would build upon dader's example.
Use the HTML5 FileSystem API - but instead of writing each and every line to the file (more IO than it is worth), batch some of the lines in memory in a JavaScript object/array/string and only write them to the file when they reach a certain threshold. You are thus appending to a local file as the process chugs along (which makes it easy to pause/restart/stop, etc.); a sketch of this batching idea appears at the end of this answer.
Of note is the following, an example of how you can spawn the dialog to request the amount of storage you need (it sounds large). Tested in Chrome:
navigator.persistentStorage.queryUsageAndQuota(
    function (usage, quota) {
        var availableSpace = quota - usage;
        var requestingQuota = args.size + usage;
        if (availableSpace >= args.size) {
            window.requestFileSystem(PERSISTENT, availableSpace, persistentStorageGranted, persistentStorageDenied);
        } else {
            navigator.persistentStorage.requestQuota(
                requestingQuota, function (grantedQuota) {
                    window.requestFileSystem(PERSISTENT, grantedQuota - usage, persistentStorageGranted, persistentStorageDenied);
                }, errorCb
            );
        }
    }, errorCb);
When you are done, you can use JavaScript to open a new window with the URL of the file you saved, which you can retrieve via fileEntry.toURL().
Or, when it is done crunching, you can just display that URL in an HTML link so the user can right-click it and Save Link As wherever they want.
But this is something new and cool that you can do entirely in the browser without needing to involve a server in any way at all. Side note: 200-300 MB of data generated by a JavaScript process sounds absolutely huge... that would be a concern for whether you are storing the "right" data...
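Here is a minimal sketch of the batching idea mentioned at the top of this answer (makeBatchedWriter and its parameters are illustrative names), assuming you already have a fileEntry from the sandboxed filesystem:

function makeBatchedWriter(fileEntry, threshold) {
    var batch = [];
    var busy = false;

    function flush() {
        if (busy || batch.length === 0) return;
        busy = true;
        var text = batch.join('');
        batch = [];
        fileEntry.createWriter(function (writer) {
            writer.onwriteend = function () {
                busy = false;
                flush(); // write anything that accumulated while we were busy
            };
            writer.seek(writer.length); // append at the end of the file
            writer.write(new Blob([text], { type: 'text/plain' }));
        });
    }

    return {
        write: function (line) {
            batch.push(line);
            if (batch.length >= threshold) flush();
        },
        flush: flush // call once at the end to write any remaining lines
    };
}

You would then call write() from the generation loop and flush() once when the process finishes.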
What you are actually trying to do is a kind of streaming; the FileAPI is not suited for the task. Instead, I can suggest two options:
The first is to use the XHR facility, i.e. Ajax, by splitting your data into several chunks which are sent to the server sequentially, each chunk in its own request along with an id (for identifying the stream) and a position index (for identifying the chunk position). I won't recommend that, since it adds work to break up and reassemble the data, and since there's a better solution.
The second way of achieving this is to use the WebSocket API. It allows you to send data sequentially to the server as it is generated, following a usual streaming API. I think this is what you need.
This page may be a good place to start: http://binaryjs.com/
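As a rough illustration of the WebSocket option (the endpoint URL and the end-of-stream marker are assumptions, not part of any particular library), the browser side could look like this:

var socket = new WebSocket('ws://example.com/stream'); // assumed server endpoint

socket.onopen = function () {
    for (var i = 0; i < 1000000; i++) {
        // send each value as it is generated instead of holding 200-300 MB in memory;
        // in practice you would also throttle on socket.bufferedAmount
        socket.send(String(Math.random()) + '\n');
    }
    socket.send('__end__'); // assumed end-of-stream marker understood by the server
};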
That's all, folks!
EDIT, considering your comment:
I'm not sure I perfectly get your point, but what about HTML5's FileSystem API?
There are a couple of examples here: http://www.html5rocks.com/en/tutorials/file/filesystem/, among which is this sample that allows you to append data to an existing file. You can also create a new file, etc.:
function onInitFs(fs) {
    fs.root.getFile('log.txt', {create: false}, function (fileEntry) {
        // Create a FileWriter object for our FileEntry (log.txt).
        fileEntry.createWriter(function (fileWriter) {
            fileWriter.seek(fileWriter.length); // Start write position at EOF.
            // Create a new Blob and write it to log.txt.
            var blob = new Blob(['Hello World'], {type: 'text/plain'});
            fileWriter.write(blob);
        }, errorHandler);
    }, errorHandler);
}
EDIT 2 :
What you're trying to do is not possible using JavaScript, as said on SO here. The author nonetheless suggests using a Java applet to achieve the needed behaviour.
To put it in a nutshell, the HTML5 FileSystem API only provides a sandboxed filesystem, i.e. one located in some hidden directory of the browser. So if you want to access the real filesystem, using Java would be fine considering your use case. I guess there is an interface between Java and JavaScript here.
But if you want to make your data available only from the browser (constrained by the same-origin policy), use the FileSystem API.

download string object content as plain text file with Mozilla Firefox extension

In my FF extension I create an object (a string) by retrieving data from the DOM.
Now I need to download a plain text file with the string content. The result should be a CSV file.
I read about the addDownload method, but I'm missing a lot of pieces... any hints?
Mainly, I don't know how to:
"transform" my string into a downloadable object (a file?)
correctly call the addDownload method (nsIURI, etc.)
Thank you very much for the help.
You have a number of approaches. The old-school way is to manipulate nsIFile and nsIFileOutputStream directly, although you can't write null bytes this way. You can also create an nsIStringInputStream from your string and write that to your output stream, or you can use nsIAsyncStreamCopier to copy it asynchronously. FileUtils.jsm and NetUtil.jsm exist to try to make this easier for you.
However, if you are targeting new enough versions of Firefox, you can ignore all that and use the OS.File API instead.
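For example, a minimal sketch of the OS.File route (privileged extension code; the destination path and CSV content are only illustrative):

Components.utils.import("resource://gre/modules/osfile.jsm");

var data = "col1,col2\nfoo,bar\n"; // the CSV string built from the DOM
var encoder = new TextEncoder();
var path = OS.Path.join(OS.Constants.Path.desktopDir, "export.csv");

// writeAtomic writes to a temporary file first, then renames it into place
OS.File.writeAtomic(path, encoder.encode(data), { tmpPath: path + ".tmp" })
    .then(function () {
        // the file has been written
    }, function (err) {
        // handle the error
    });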
This is part of my extension; it shows the Save As dialog so the user can pick the correct location (you can skip that part, or try to find out automatically where the download should be placed):
const nsIFilePicker = Components.interfaces.nsIFilePicker;
var fp = Components.classes["@mozilla.org/filepicker;1"]
           .createInstance(nsIFilePicker);
fp.init(window, "Save as", nsIFilePicker.modeSave);
fp.appendFilters(nsIFilePicker.filterHTML);
fp.appendFilters(nsIFilePicker.filterAll);

var rv = fp.show();
if (rv == nsIFilePicker.returnOK || rv == nsIFilePicker.returnReplace) {
    var file = fp.file;
    // work with the returned nsILocalFile...
    // Check that it has some extension
    var name = file.leafName;
    if (-1 == name.indexOf('.'))
        file.leafName = name + '.html';

    // file is nsIFile, data is a string
    var foStream = Components.classes["@mozilla.org/network/file-output-stream;1"]
                     .createInstance(Components.interfaces.nsIFileOutputStream);

    // flags 0x02 | 0x08 | 0x20 = write, create, truncate
    // (use 0x02 | 0x10 instead to open the file for appending)
    foStream.init(file, 0x02 | 0x08 | 0x20, 0666, 0);
    foStream.write(data, data.length);
    foStream.close();
}
