This question already has answers here:
Use File Content to Determine MIME Type with Node JS
Hey everyone, I'm trying to get a file's type in Node.js. I have to rename the file before it's uploaded, for version control, and once the file is written, run some processing on it.
I'm aware that I can check the file type on the client, but I still think it would be beneficial to do a server-side check as well.
Other solutions have popped the file extension off the file name:
return filename.split('.').pop();
But since I'm renaming this file, and a user could simply rename a malicious file to a whitelisted extension, I'm looking for a solution that actually determines the file type, specifically CSV.
Can anyone point me in the direction of solving this?
To read the file extension, you can use path.extname:
const { extname } = require('path')
console.log(extname('foo.csv')) // .csv
console.log(extname('BAR.CSV')) // .CSV
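Both the split('.').pop() approach from the question and path.extname only look at the name, never at the contents. This small comparison (the file names are made up for illustration) shows where the two disagree:

```javascript
const { extname } = require('path');

// The approach quoted in the question: take whatever follows the last dot.
function popExtension(filename) {
  return filename.split('.').pop();
}

// Print both results for a few edge-case names.
for (const name of ['report.csv', 'README', '.bashrc', 'archive.tar.gz']) {
  console.log(name, JSON.stringify(popExtension(name)), JSON.stringify(extname(name)));
}
```

For README and .bashrc, pop() returns a bogus "extension" (README, bashrc) while extname returns an empty string; for archive.tar.gz both report only the last suffix. Either way, renaming evil.exe to evil.csv fools both.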
As for actually determining the file type (specifically CSV) rather than trusting the extension:
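Usually, to implement this check, you need to read the magic bytes of the file (a fixed signature at the start of the file) and act accordingly, but that only works for a limited set of file types, and CSV is not among them: as plain text, it has no signature.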
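A sketch of a magic-byte check in Node.js, using only fs. The signature table is a small illustrative subset, and readHead/sniffMagic are hypothetical helper names. Note that a CSV upload simply matches nothing:

```javascript
const fs = require('fs');

// A few well-known signatures ("magic bytes"); illustrative, not exhaustive.
const SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },       // "GIF8"
  { mime: 'application/zip', bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04"
];

// Read at most `length` bytes from the start of the file at `path`.
function readHead(path, length = 16) {
  const fd = fs.openSync(path, 'r');
  try {
    const buf = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, buf, 0, length, 0);
    return buf.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Return the MIME type whose signature prefixes `head`, or null if none matches.
function sniffMagic(head) {
  const match = SIGNATURES.find(({ bytes }) =>
    head.length >= bytes.length && bytes.every((b, i) => head[i] === b));
  return match ? match.mime : null;
}
```

A null result only tells you the file is not one of the listed binary formats; for CSV you still need a content check. The file-type package mentioned below does this kind of lookup for many more formats.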
CSV is plain text with a defined format, so you could:
read some lines and try to parse them
require the CSV to have a known header, like id,col1,col2,etc.
use a tool that tries to guess the MIME type of the file (note that mime-types, for example, guesses from the file extension only, so it does not inspect the content)
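The first two suggestions can be combined into a simple heuristic. This sketch assumes a comma delimiter, no quoted fields containing commas or line breaks, and the hypothetical header id,col1,col2; for anything beyond that, a real CSV parser is the safer choice:

```javascript
// Hypothetical expected header; adjust to your data.
const EXPECTED_HEADER = ['id', 'col1', 'col2'];

// Heuristic: does `text` look like CSV with the expected header?
// Assumes comma-separated fields without quoted commas or embedded newlines.
function looksLikeCsv(text, expectedHeader = EXPECTED_HEADER, maxLines = 20) {
  // Inspect only the first few non-empty lines; tolerate \r\n line endings.
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0).slice(0, maxLines);
  if (lines.length === 0) return false;

  const header = lines[0].split(',').map((cell) => cell.trim());
  const headerOk = header.length === expectedHeader.length &&
    header.every((cell, i) => cell === expectedHeader[i]);
  if (!headerOk) return false;

  // Every sampled row must have as many fields as the header.
  return lines.every((line) => line.split(',').length === header.length);
}
```

For small files you could call looksLikeCsv(fs.readFileSync(path, 'utf8')); for large uploads, read only the first chunk and drop its last, possibly partial, line before checking.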
Related
How can I get the MIME type of a text file with no extension in Node.js? I am using the file-type package, but it does not detect the type of text files.
Thanks
Without a helper package, you can't easily tell which format a given text file is in. Text formats have no common place where they declare what they are, so you would have to use a set of regexes, one for each file format.
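A built-ins-only sketch has to make two decisions: is the buffer text at all, and if so, which text format. guessTextMime is a hypothetical helper (not part of file-type); it treats NUL bytes or invalid UTF-8 as binary and tries only JSON, so every other text file comes back as text/plain:

```javascript
// Hypothetical helper: guess a MIME type for an extension-less file
// using only Node.js built-ins.
function guessTextMime(buf) {
  // NUL bytes almost never occur in text files, so treat them as a binary marker.
  if (buf.includes(0)) return null;

  let text;
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of inserting U+FFFD.
    text = new TextDecoder('utf-8', { fatal: true }).decode(buf);
  } catch {
    return null; // not UTF-8; could still be text in another encoding
  }

  // One check per text format you care about; only JSON is tried here.
  const trimmed = text.trim();
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) {
    try {
      JSON.parse(trimmed);
      return 'application/json';
    } catch {
      // looked like JSON but is not; fall through to plain text
    }
  }
  return 'text/plain';
}
```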
I am using the dropbox-js API as the back end for an application I am building.
I need to get the contents of a file, and I understand that the readFile method, which is used to get the contents, only really supports text files.
I can get the contents of a "text/plain" file (i.e. a .txt file) using the following:
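client.readFile(d2.path, {arrayBuffer: true}, function(error, contents){
  if (error) {
    return console.error(error); // bail out if the API call failed
  }
  var decoded = decodeUtf8(contents); // decodeUtf8 comes from the gist linked below
  console.log(decoded);
});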
The API reference for this method is here: http://coffeedoc.info/github/dropbox/dropbox-js/master/classes/Dropbox/Client.html#readFile-instance
The decode function was found here: https://gist.github.com/boushley/5471599
This does not seem to work for any other document type. If I try to read a .docx or .doc file, the result consists of what looks like scrambled characters. Should it work with other document types? How would I read them differently?
I really need it to support more than .txt files.
Edit:
This is a test document (.docx) that I tried to read:
This is how it is decoded (Contents shows that it is indeed an ArrayBuffer, while Decoded is the actual string returned after decoding):
readFile should work for any content type. Presumably the "scrambled characters" you see are exactly the content of the .docx or .doc file you're reading: a .docx file is a ZIP archive of XML files and a .doc file is a binary format, so neither decodes to readable text. (If you looked at the file via type on Windows or cat on Mac/Linux, you would see the same thing.)
So I think the issue you're having is that you want to somehow extract the text from a variety of file formats. Dropbox (and dropbox.js) won't help you with that particular problem; you'll need to find software that understands all those file formats and can convert them to the form you need. For example, textract is a Python library that can do this.
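To see why the bytes look scrambled: a .docx file is a ZIP archive, and the name of its first entry sits uncompressed right after the fixed 30-byte ZIP local file header (name length at offset 26). This sketch reads that name from the ArrayBuffer that readFile returns with {arrayBuffer: true}. firstZipEntryName is a hypothetical helper; it assumes the archive starts with a local header and has an ASCII or UTF-8 entry name, and it does no decompression:

```javascript
// Hypothetical helper: return the first entry name of a ZIP archive, or null.
function firstZipEntryName(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  // Local file header signature 0x04034b50, stored little-endian ("PK\x03\x04").
  if (view.byteLength < 30 || view.getUint32(0, true) !== 0x04034b50) return null;

  const nameLength = view.getUint16(26, true); // "file name length" field
  if (30 + nameLength > view.byteLength) return null;

  // The file name starts right after the 30-byte fixed part of the header.
  const nameBytes = new Uint8Array(arrayBuffer, 30, nameLength);
  return new TextDecoder().decode(nameBytes);
}
```

For a .docx this typically returns an Office Open XML part name such as [Content_Types].xml, which confirms the file is a container of compressed XML parts rather than text. Getting the document text out still means unzipping and parsing those parts, which is the job of a library like textract.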
I am trying to create a file from my extension using this code:
Components.utils.import("resource://gre/modules/FileUtils.jsm");
var file = new FileUtils.File("C:\\Windows\\hello.txt");
But nothing happens: the file doesn't appear.
Any ideas?
Your file variable is an object that represents a file at the location you specified. Creating this object does not create a file on disk (you might instead want to read from the file, for example).
You can now use the nsIFile API to manipulate the file object. For example, you can create a file at that location:
file.create(file.NORMAL_FILE_TYPE, parseInt("0600", 8));
Note that Windows UAC can cause file access to fail. You might want to try:
file.isWritable();
but ultimately you might find that it's not possible to write to directories that UAC protects, so you can instead choose an unprotected location, perhaps using the special directory definitions explained on this useful MDN page: https://developer.mozilla.org/en-US/docs/Code_snippets/File_I_O
How can I validate the content type of a file before uploading it, using JavaScript? I'm not asking about extension validation. I want to validate PDF, plain-text and MS Word files.
I'm using a Django forms.ModelForm to pass a file upload widget to the HTML. I couldn't achieve this on the server side either. Here is that question:
Django - Uploaded file type validation
Maybe, but it won't give you any security, because an attacker could use other means to upload files, thus circumventing your validation.
To check the file type using the extension (which is very insecure, since an extension is trivial to change), you can use JavaScript. See this question: How do I Validate the File Type of a File Upload?
[EDIT] After some googling, I found that the input element has an accept attribute, which takes a list of MIME type patterns. Unfortunately, most browsers ignore it (or only use it to tweak the file selection dialog). See this question: File input 'accept' attribute - is it useful?
[EDIT 2] Right now, it seems that the File API (see "Using files from web applications") is your only way if you really don't want to use file extensions. Each File instance has a type property which contains the MIME type (note, however, that browsers typically derive this value from the file extension rather than from the file's contents).
But this API is a work in progress, so it's not available everywhere. And there is no guarantee that you'll get a MIME type (the property can be "").
So I suggest this approach: try the File API. If it's not available or the type property is empty, use the file extension.
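A sketch of that approach. allowedMimeFor is a hypothetical helper that accepts any object with name and type properties (a browser File qualifies), and the ALLOWED table is an assumption based on the PDF, plain-text and Word types asked about. Since both sources ultimately reflect the file name, treat this as a convenience filter, not a security check:

```javascript
// Assumed allow-list: MIME type -> accepted extensions.
const ALLOWED = {
  'application/pdf': ['.pdf'],
  'text/plain': ['.txt'],
  'application/msword': ['.doc'],
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['.docx'],
};

// Return the allowed MIME type for `file` ({ name, type }), or null.
function allowedMimeFor(file) {
  // 1. Prefer the browser-reported type (may be "" when the browser cannot tell).
  if (file.type) {
    return Object.prototype.hasOwnProperty.call(ALLOWED, file.type) ? file.type : null;
  }
  // 2. Fallback: map the lowercased extension back to an allowed type.
  const dot = file.name.lastIndexOf('.');
  const ext = dot > 0 ? file.name.slice(dot).toLowerCase() : '';
  for (const [mime, exts] of Object.entries(ALLOWED)) {
    if (exts.includes(ext)) return mime;
  }
  return null;
}
```

In a page you might call it from the file input's change handler and reject any file in input.files for which it returns null.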
In theory, you could use the File API to read the files.
You would then need to write parsers in JavaScript for the file formats you care about, to check whether they match.
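A sketch of the read-and-check idea for the three formats in the question. It reads the first 512 bytes with Blob.slice and arrayBuffer(), then compares signatures: %PDF for PDF, the OLE2 compound-file header for legacy .doc, and the ZIP header for .docx. classifyHead and classifyFile are hypothetical names; a ZIP match only proves "some ZIP archive", and the text check is a loose NUL-byte heuristic:

```javascript
// Signatures for the formats in the question.
const SIGS = {
  pdf: [0x25, 0x50, 0x44, 0x46],                          // "%PDF"
  doc: [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1],  // OLE2 compound file
  docx: [0x50, 0x4b, 0x03, 0x04],                         // ZIP container
};

// Classify the first bytes of a file (`head` is a Uint8Array).
function classifyHead(head) {
  for (const [kind, sig] of Object.entries(SIGS)) {
    if (head.length >= sig.length && sig.every((b, i) => head[i] === b)) return kind;
  }
  // Loose heuristic: non-empty and no NUL bytes is treated as plain text.
  return head.length > 0 && !head.includes(0) ? 'text' : null;
}

// Browser usage: read only the first 512 bytes of a File (a Blob) and classify them.
async function classifyFile(file) {
  const head = new Uint8Array(await file.slice(0, 512).arrayBuffer());
  return classifyHead(head);
}
```

As the first answer points out, this still runs on the client, so the server has to repeat any check it relies on.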
I'm using Uploadify + S3, and when I try to upload a file that has question marks in its name, Uploadify doesn't give me the correct filename. For example, if the file is named #?? (copy).mp4, the fileObj.name value sent to the event handlers is # (basically everything from the first question mark onward is removed).
Ignoring the original filename altogether is not an option, because I also need the extension.
If I try to change the scriptData at runtime, the upload will fail for some reason.
Can you help me out with this issue?
The problem is not in Uploadify itself but in ActionScript's FileReference object.
From what I can tell, the FileReference object chops the name at the question mark and only returns the part in front of it.
I tried finding some way of getting to the original filesystem file name before it populated FileReference(event.target).name, but I have next to no knowledge of ActionScript.
I've also thought about renaming on the server, but no MIME type is set when the file is uploaded, due to how FileReference handles the filename. I think it throws away the file extension, since it comes after the question mark.
I looked into hacking the Uploadify JavaScript to validate and sanitize the file name on the client side, or to send something to the server so the name can be fixed when the file is processed, but by the time Uploadify has access to the name, it has already been truncated.