I use browserless.js (headless Chrome) to fetch the HTML of a website, and then use a regular expression to find certain image URLs.
One example is the following:
https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde
The URL contains Unicode escape sequences such as \u003d, which should be decoded (in this case to =). I want to include these images in a site, and without decoding some of them cannot be displayed (paste the URL above into a browser and you get broken-image.webp).
I have tried lots of things, but nothing works, including:
JSON.parse(JSON.stringify(...))
String.prototype.normalize()
decodeURIComponent
Curiously, the regular expression for "\u003d" (i.e. "\\u003d" in JS) does not match the string above, but "u003d" does.
This is all very strange, and my current guess is that browserless applies some formatting behind the scenes: when I console.log the URL and copy-paste it somewhere else, every method mentioned above decodes it correctly.
I hope someone can help me with this.
Just to mark this one as answered. Thomas replied:
JSON.parse(`"${url}"`)
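Thomas's one-liner works because the scraped text contains the six literal characters \u003d (a backslash followed by u003d), not an = sign. Wrapping the text in quotes turns it into a JSON string literal, and JSON.parse interprets the escapes. This likely also explains the copy-paste observation: pasting the URL into JavaScript source as a string literal makes the parser decode the escapes. It also explains the regex puzzle: /\u003d/ matches an actual = character, so the pattern must escape the backslash (/\\u003d/) to match the literal text. Below is a minimal sketch of the JSON.parse approach next to a regex-based alternative that only rewrites \uXXXX sequences. The function names are hypothetical.

```javascript
// The scraped text contains literal backslash-u sequences ("\u003d" as six characters),
// so in a JS string literal the backslash has to be doubled.
const scraped =
  "https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg" +
  "/revision/latest/top-crop/width/360/height/450?cb\\u003d20141113231800\\u0026path-prefix\\u003dde";

// Approach from the answer: treat the text as the body of a JSON string literal.
// Throws a SyntaxError if the text contains a bare " or a backslash that is not a valid JSON escape.
function decodeViaJson(raw) {
  return JSON.parse(`"${raw}"`);
}

// Alternative: rewrite only \uXXXX sequences and leave every other character untouched.
function decodeUnicodeEscapes(raw) {
  return raw.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(/\u003d/.test(scraped));  // false: this pattern matches an actual "=" character
console.log(/\\u003d/.test(scraped)); // true: this pattern matches a backslash followed by "u003d"
console.log(decodeUnicodeEscapes(scraped));
```

The regex variant is the safer choice when the scraped text may contain quotes or other backslashes, since JSON.parse rejects anything that is not a valid JSON string body.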
I want to open Word documents by clicking a link in my solution. The link below shows how it is structured using the ofe (open for edit) command of the Office URI schemes. This solution is really nice because it works in every browser, but I have problems with special characters.
ms-word:ofe|u|file://our.local/Testing ÅÄÖ.DOCX
I have tried different approaches to solve this problem, but it does not work when åäö is present in the path. For instance, applying encodeURI to the path does not help.
https://learn.microsoft.com/en-us/office/client-developer/office-uri-schemes does not describe anything out of the ordinary; it only follows the URI spec.
Documents without special characters work great, but I cannot figure out how special characters should be encoded to make it work.
If I take the file:\... part and paste it into any browser, it works, but not with the ofe prefix. Since everything works fine without special characters, it must be some encoding problem.
Running it in cmd also works.
So in this case, I guess the browser encodes the characters before passing them to the protocol handler?
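To see exactly what a UTF-8 percent-encoded link looks like, and compare it with what the browser passes to the protocol handler, the link can be built in JavaScript as below. This is a diagnostic sketch, not a confirmed fix. The question already notes that encodeURI alone did not help, and it is an assumption that Word expects UTF-8 percent-encoding here. encodeURI is applied only to the file URL because it would also turn the | separators of the ofe command into %7C. The helper name is hypothetical.

```javascript
// Hypothetical helper: builds an Office URI of the form ms-word:ofe|u|<file URL>.
function buildOfeLink(fileUrl) {
  // Percent-encode the file URL as UTF-8: Å -> %C3%85, Ä -> %C3%84, Ö -> %C3%96, space -> %20.
  // Only the file URL is encoded; encodeURI would otherwise turn the "|" separators into %7C.
  return "ms-word:ofe|u|" + encodeURI(fileUrl);
}

const link = buildOfeLink("file://our.local/Testing ÅÄÖ.DOCX");
console.log(link); // ms-word:ofe|u|file://our.local/Testing%20%C3%85%C3%84%C3%96.DOCX
```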
I have a web app that supports searching: when I go to somedomain.com/search/<QUERY>, it searches for entities matching <QUERY>. The problem is that searching for . or .. doesn't work as expected (which is fairly obvious). What surprised me, though, is that if I manually enter somedomain.com/search/%2E, the browser (tested in Chrome and IE11) converts it to somedomain.com/search/ and issues a request without the necessary payload.
So far I haven't found anything saying it's impossible to make this work, so I came here. Right now I have only one option: replacing . and .. with something like __dotPlaceholder__, but this feels like a dirty hack to me.
Any solution (JS or non-JS) is welcome. Any information on why browsers strip URL-encoded dots would also be nice to have.
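Unfortunately, RFC 3986 defines dot segments ("." and "..") in a URI path to be normalized and removed, e.g. http://example.com/a/./ becomes http://example.com/a/. Because . is an unreserved character, %2E is equivalent to it, so an encoded dot is removed as well. The WHATWG URL Standard that browsers implement likewise treats %2e as a "." segment.
See https://www.rfc-editor.org/rfc/rfc3986#page-33 for more information.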
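The normalization can be observed with the URL class, which follows the WHATWG URL Standard in both browsers and Node. This is a small sketch; somedomain.com is the placeholder host from the question. Moving the search term into a query parameter is shown as one possible alternative to the placeholder hack, not something proposed in the thread.

```javascript
// WHATWG URL parsing treats "%2e"/"%2E" as a "." path segment and "%2e%2e" as "..".
const dot = new URL("http://somedomain.com/search/%2E");
const dotDot = new URL("http://somedomain.com/search/%2E%2E");
console.log(dot.pathname);    // → /search/
console.log(dotDot.pathname); // → /

// encodeURIComponent does not help: "." is unreserved and is left as-is.
console.log(encodeURIComponent(".")); // → .

// Alternative: pass the term in the query string, where dot segments are not removed.
const viaQuery = new URL("http://somedomain.com/search?q=" + encodeURIComponent(".."));
console.log(viaQuery.searchParams.get("q")); // → ..
```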
I am using some JavaScript code in SSRS to open a link in a new window on a report. The report links point to file locations on a server. The code I am using within Reporting Services for the link is:
="javascript:void(window.open('"+ "file:" & Replace(Fields!FilePath.Value,"\","/") + "','_blank'))"
This code works just fine when the file name is something 'normal' such as:
\\myserver\images\Files\1969\1-000-002_SE 82ND AVE 1_1969.pdf
However, when there are special characters (# at least, for sure), I get an error message. For example, with this file name:
\\myserver\images\Files\1978\1-001-003_SE 82nd AVE #12 1_1978.pdf
In this case, the URL that gets returned is:
\\myserver\images\Files\1978\1-001-003_SE 82nd AVE
As can be seen, the URL is cut off at the first instance of the number sign. If I copy the shortcut for the offending link, this is what I get:
javascript:void(window.open('file://myserver/images/Files/1978/1-001-003_SE%2082nd%20AVE%20#12%201_1978.pdf','_blank'))
It appears that the JavaScript is encoding the file path correctly, but something is getting lost in translation between the JavaScript code and the URL.
I am unable to change the file names, so I need to come up with a way to handle the special characters. I have tried using encodeURI() but could not figure out how to format it correctly in SSRS to work with the existing JavaScript.
Any ideas would be welcomed.
URLs recognize percent-encoded characters. So, outside of your JavaScript, use an SSRS Replace function for each special character you expect to find, replacing it with its percent-encoded equivalent. For instance, a pound sign is %23 and a space is %20.
Note: I have some pages that use pound signs to split out URL parameters, and this does NOT seem to work in those cases. However, it might work in your situation. To try this, change your function to the following:
="javascript:void(window.open('"+ "file:" & Replace(Replace(Fields!FilePath.Value,"\","/"),"#","%23") + "','_blank'))"
If this works for you, you can find more of these codes here.
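The underlying problem is that an unencoded # starts the URL fragment, so everything after it is dropped from the path. encodeURI would not fix this on its own, because it deliberately leaves #, ? and / untouched. encodeURIComponent applied to each path segment does encode #. If the link were assembled in JavaScript rather than in the SSRS expression, a sketch could look like this. The function name is hypothetical, and it assumes a UNC path of the form \\server\share\....

```javascript
// Hypothetical helper: converts a UNC path such as \\server\share\file.pdf into a file: URL,
// percent-encoding each path segment so that characters like "#" and spaces survive.
function uncPathToFileUrl(uncPath) {
  const segments = uncPath.replace(/\\/g, "/").split("/");
  // encodeURIComponent turns "#" into %23 and " " into %20, but leaves letters, digits, "-", "_" and "." alone.
  return "file:" + segments.map(encodeURIComponent).join("/");
}

const path = "\\\\myserver\\images\\Files\\1978\\1-001-003_SE 82nd AVE #12 1_1978.pdf";

console.log(encodeURI("AVE #12")); // AVE%20#12 (the "#" is left as-is)
console.log(uncPathToFileUrl(path));
// file://myserver/images/Files/1978/1-001-003_SE%2082nd%20AVE%20%2312%201_1978.pdf
```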
I'm trying to create a simple mailto function using JavaScript. I just need to be able to send some links (like: See this article bla bla).
Some of the links I need to send include spaces and Danish characters, so I've been using the encodeURI() function.
The problem arises when I try to mail the link (sample code below)
var _encodedPath = encodeURI(path);
var _tempString = "mailto:someemail#somewhere.dk?subject=Shared%20from%20some%20page&body=" + _encodedPath;
If I output _tempString to the console, I get the correctly encoded string. However, when I use the same string in the mailto link, it loses its encoding and reverts to the way it was before.
Any clue as to why this is?
Thanks in advance :)
The link is decoded when you click it - that's normal. Since you have an http link within a mailto link, it should be encoded twice.
Email clients do their best to make things that look like links clickable. They typically decide where the link ends in a somewhat arbitrary and unpredictable manner.
In email, the best way to keep a link contiguous is to enclose it in angle-brackets like this:
<http://www.example.com/url with spaces>
But this isn't foolproof. Email is fragile and you can't control the content well enough with a mailto link. It might be better to try to reduce the complexity of the url - perhaps by providing or utilizing a url-shortener service. Any url longer than 74 or so characters is likely to be mangled by some email clients.
You should use encodeURIComponent instead of encodeURI.
More information here.
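Combining the two answers above, here is a sketch that encodes the link once as a URL (encodeURI) and once more as the mailto body value (encodeURIComponent). The mail client's own decoding step then leaves a valid link behind. The helper name, address and link are placeholders. How each mail client renders the body is still outside the script's control, as the earlier answer warns.

```javascript
// Hypothetical helper: builds a mailto: URL whose body is a single link.
function buildMailto(to, subject, link) {
  const encodedLink = encodeURI(link); // the link itself: spaces and æøå become %XX sequences
  return (
    "mailto:" + to +
    "?subject=" + encodeURIComponent(subject) +
    // Encode again so &, ?, # and % inside the link cannot break the mailto query.
    "&body=" + encodeURIComponent(encodedLink)
  );
}

const href = buildMailto(
  "someone@example.com",
  "Shared from some page",
  "http://example.com/artikler/blåbær side.html"
);
console.log(href);
// In a browser: window.location.href = href; opens the default mail client.
```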
This site helped me solve all my troubles with mailto links:
http://www.1ngo.de/web/formular.html
Maybe it's not the nicest way, but it has always worked in every browser I know. It also implements a very neat algorithm that formats the content so that everything comes through correctly. Just try it and play around with the code a little by commenting out parts of it, and you will quickly understand what happens there and how to adapt it to your needs. Although it's a little late, I hope this helps anyone reading this question.
The page is in German, but you just need to copy the code shown there, run it, and experiment with it.
OK, I'm using CKEditor in a web application. One thing I need to do is set the text in the text area. I've been using the line:
CKEDITOR.instances.setData(html);
...where html is a variable containing HTML.
This works fine in Chrome & Firefox, but not at all in Internet Explorer or Safari.
Can anyone provide an insight as to why, or suggest a work-around?
Many thanks in advance! :-)
Make sure to strip all newlines from the string you pass into setData(). If you don't, an exception is thrown with a message about an unterminated string. The newline characters used by CKEditor are UNIX-style \n (in other words, not the DOS-style \r\n).
The newline apparently throws off the parser, making it think that it's the end of the statement.
Also note that if you call getData() to get that value you just set again, CKEditor puts the line breaks and tabs back into it. You'll need to strip them out again if you need to set that value back using setData(). I use a regexp pattern like this to strip out the newlines (and tabs just for completeness):
[\n\t]+
Also, if you use a regular expression to strip them, make sure the pattern actually matches the \n character. A character class such as [\n\t] matches it in any mode; only . needs "single-line" mode in .NET (the s flag in JavaScript) to match newlines. I don't know what you're using, though.
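Here is a minimal JavaScript sketch of the stripping step. It adds \r to the answer's [\n\t]+ pattern so that DOS-style line endings are removed too. It also uses the g flag, without which replace would only remove the first run. Removing the characters outright can join words separated only by a line break; replacing with a single space is an option if that matters. The instance name editor1 in the comment is hypothetical, since CKEDITOR.instances is keyed by editor name.

```javascript
// Remove every run of \r, \n and \t before passing HTML to CKEditor's setData().
function stripLineBreaks(html) {
  return html.replace(/[\r\n\t]+/g, "");
}

const html = "<p>\n\tHello</p>\r\n<p>World</p>";
console.log(stripLineBreaks(html)); // <p>Hello</p><p>World</p>

// In the page (CKEditor 4), with a hypothetical instance named "editor1":
// CKEDITOR.instances.editor1.setData(stripLineBreaks(html));
```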