Validating browser upload file name and extension with simple regex - javascript

I got the regexp right. Works perfectly for Firefox ONLY. How would i make this cross browser, cross platform manner. Since it is file name and extension validation you are right i am using File Upload control.
^[a-zA-Z0-9_\.]{3,28}(.pdf|.txt|.doc|.docx|.png|.gif|.jpeg|.jpg|.zip|.rar)$
matches File name must not be empty[ 3, 28 characters long].
Extension must be within the group.
When this works superb in forefox i assume because the fileUpload.value = Filename.extension in firefox. It awfully fails in Google chrome and IE. I am using the above with .net Regular Expression validator and ClientScript enabled.
I know how to validate it on server, so please no server side solutions.
note:
Google chrome:
Provides the fileupload control value as c:\fakePath\filename.extension
IE:
Provides the Full path.

You can't use the ^ to start with if you sometimes have a full path but are only interested in the filename. The dot of the filending should be escaped.
You could try something like this:
[^\\/]{3,}\.(pdf|txt|doc|docx|png|gif|jpeg|jpg|zip|rar)$
As it looks you get only the file with Firefox but the full path with other browsers.
I'd always add a prefix / to your string and than validate the last part after the last fileseprator / or \.
This example uses lookahead to check the fileseparator (or manually added /) before the file and also allows the check of max 28 char for filename. see this online regex tester:
(?<=[\\/])[\w\.]{3,28}\.(?:pdf|txt|doc|docx|png|gif|jpeg|jpg|zip|rar)$

As things stand, your regex validates garbage like the following:
....pdf
____pdf
It also rejects perfectly valid files:
i.jpg
my-pic.jpg
pic.JPG
The easiest is to validate things in multiple steps:
Extract the extension:
\.[a-zA-Z]{3,4}$
Lowercase the extension and validate it against an array of acceptable values.
Optionally validate the file's name (though I'd recommend cleaning it instead):
[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*

Related

Unicode characters cannot be decoded

I use browserless.js (headless Chrome) to fetch the html code of a website, and then use a regular expression to find certain image URLs.
One example is the following:
https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde
There are unicode characters such as \u003d, which should be decoded (in this case to =). The reason is that I want to include these images in a site, and without decoding some of them cannot be displayed (like that one above, just paste the URL; it gives broken-image.webp).
I have tried lots of things, but nothing works.
JSON.parse(JSON.stringify(...))
String.prototype.normalize()
decodeURIComponent
Curiously, the regular expression for "\u003d" (i.e. "\\u003d" in js) does not match that string above, but "u003d" does.
This is all very weird, and my current guess is that browserless is responsible for some weird formatting behind the scenes. Namely, when I console log the URL and copy paste it somewhere else, every method mentioned above works for decoding.
I hope that someone can help me on this.
Just to mark this one as answered. Thomas replied:
JSON.parse(`"${url}"`)

Path separator for Atom / JavaScript on Windows

I have developed an Atom package which calls SyncTeX (a utility for reverse lookup for LaTeX), and then opens in Atom the file specified in the SyncTeX response. I'm developing on Linux, but now a user on Windows tells me that this doesn't work.
In detail: SyncTeX returns a pathname like C:\data\tex\main.tex, with Windows-appropriate backslash separators. My package then calls atom.workspace.open with exactly that returned string, which leads to a JavaScript error (file not found).
Strangely, if the string is modified to use forward slashes instead, C:/data/tex/main.tex, the call works and the file is opened.
My questions:
Is this behavior specific to Atom, or to some underlying technology (JavaScript, Electron, Node, ...)? I was unable to find any documentation on this.
Since the replacement \ → / is apparently necessary, is there a preferred way to implement it? Would a simple String.replace be adequate?
Do I risk breaking compatibility with other platforms if I always do the replacement?
By my best knowledge paths with forward slashes '/' work well everywhere except Windows XP.

Open word on local computer - Office URI Schemes special characters

I want to open word-documents by clicking on a link in my solution. The link below shows how it is structured in ofe for office. This solution is really nice because it works in every browser but i have problems with special characters.
ms-word:ofe|u|file://our.local/Testing ÅÄÖ.DOCX
I Have tried different approaches to solve this problem but its not working when åäö is present in the path. EncodeURI on the path does not help for instance.
https://learn.microsoft.com/en-us/office/client-developer/office-uri-schemes does not describe anything out of the ordinary and only follow URI spec.
Documents without special characters works great but i can not figure out how special characters should be encoded to make it work.
If i take the file:\... part and past it into any browser it is working but not with the ofe prefix. So it should be some problem with encoding due to it is working fine without any special characters.
Running in cmd is also working:
So in this case i guess the browser is encoding the characters before sending to protocolhandler ??

encodeURIComponent encodes differently, depending on environment

I am passing an object via the url using:
encodeURIComponent(JSON.stringify(myObject))
"ä" is encoded as "%C3%A4" on my local server.
Unfortunately it is encoded as "a%CC%88" on the webserver.
Which breaks my app because it is part of the name of a database field which isn't found when wrong encoded. And I can't control that there are no ä's in field names because the app allows users to upload their own data.
How can I make sure that "ä" is always encoded correctly?
SORRY. To make this clear: The encoding happens both times client-side in the browser. But when the web-app is served from the webserver the "ä" is encoded as "%C3%A4" instead of "a%CC%88" (I've tested both in the same chrome browser)
Thanks for all your help. It got me to dig deeper:
I have code that runs on an event. It loops through checkboxes and creates an array of objects containing (also) the field names. The code gets the field names from an attribute named "feld" of the checkbox:
<div class="checkbox">
<label>
<input class="feld_waehlen" type="checkbox" dstyp="Taxonomie" datensammlung="SISF Index 2 (2005)" feld="Artname vollständig">Artname vollständig
</label>
</div>
running this code:
console.log("this.getAttribute('feld') = " + this.getAttribute('feld'));
gives as expected: $(this).attr('feld') = Artname vollständig
If while looping, I run:
console.log('encodeURIComponent("Artname vollständig") = ' + encodeURIComponent("Artname vollständig"));
the answer is correct: encodeURIComponent("Artname vollständig") = Artname%20vollst%C3%A4ndig
But if I run:
console.log("encodeURIComponent(this.getAttribute('feld')) = " + encodeURIComponent(this.getAttribute('feld')));
the answer is: encodeURIComponent(this.getAttribute('feld')) = Artname%20vollsta%CC%88ndig
This happens all in the browser. But the issue only appears, when the web-app is served from the webserver (a couchapp running on cloudant.com).
How can it be that the method "getAttribute" returns a different encoding?
The following code has been tested on Chrome 29 OS X, IE 8 Windows XP.
encodeURIComponent("ä") //%C3%A4"
decodeURIComponent("%C3%A4") //ä
so basically "%C3%A4" should be the expected output.
I think the issue here might be encodeURIComponent require a UTF-8 encoding while your server-side language returns something other than this.
encodeURICompoent - MDN
just a follow up in case somebody runs into this issue later.
It seems to be unique to cloudant.com where my couchapp was hosted.
This is the answer I got from their very helpful support:
OK - I think I've found the culprit. The issue is that, due to internal optimisations (which are not present in CouchDB), the form of unicode strings can get changed. In this case, ä is represented as:
U+0061 LATIN SMALL LETTER A character
U+0308 COMBINING DIAERESIS character (̈)
instead of
U+00E4 LATIN SMALL LETTER A WITH DIAERESIS character (ä)
Both are semantically equivalent, so the fix is to normalize your unicode strings before comparison. Unfortunately, JavaScript has no built-in unicode normalization, but you can use a library such ashttps://github.com/walling/unorm.
It's not an issue for me any more as I changed to a virtual server running on digitalocean.com with vanilla couchdb (and am very happy with it).
But I do think this could hit others developing couchapps in German or other languages needing utf8 and hosting them on cloudant.com
Thanks for your great help.
Alex

CKEDITOR.instance[x].setData not working in IE

Ok, I'm using the CKEditor in a web application. One thing I need to do it set the text in the text area. I've been using the line:
CKEDITOR.instances.setData(html);
...where html is a varible containing HTML.
This works fine in Chrome & Firefox, but not at all in Internet Explorer or Safari.
Can anyone provide an insight as to why, or suggest a work-around?
Many thanks in advance! :-)
Make sure to strip all newlines from the string you pass into setData(). An exception is thrown if you don't, with a message about an unterminated string. The newline characters used by CKEditor are the UNIX-style of \n (in other words, not the DOS version: \r\n).
The newline apparently throws off the parser, making it think that it's the end of the statement.
Also note that if you call getData() to get that value you just set again, CKEditor puts the line breaks and tabs back into it. You'll need to strip them out again if you need to set that value back using setData(). I use a regexp pattern like this to strip out the newlines (and tabs just for completeness):
[\n\t]+
Also make sure that if you use the regular expression to strip them, you need to make sure that the pattern matching will match the \n character (called "single-line" mode in .NET, but I don't know what you're using).

Categories