I'm working to integrate some code with a third party, and sometimes a string argument they pass to a Javascript function I'm writing will be encoded using encodeURIComponent, sometimes it won't be.
Is there a definitive way to check whether it's been encoded using encodeURIComponent
If not, I'll do the encoding then
You could decode it and see if the string is still the same
decodeURIComponent(string) === string
Not reliably, especially in the case where a string may be encoded twice:
encodeURIComponent('http://stackoverflow.com/')
// yields 'http%3A%2F%2Fstackoverflow.com%2F'
encodeURIComponent(encodeURIComponent('http://stackoverflow.com/'))
// yields 'http%253A%252F%252Fstackoverflow.com%252F'
In essence, if you were to try and detect the string encoding when the passed argument is not actually encoded but has qualities of an encoded string, you'd be decoding something you shouldn't.
I'd recommend adding a second parameter in the definition "isURIComponent".
However, if you wanted to attempt, perhaps the following would do the trick:
if ( str.match(/[_\.!~*'()-]/) && str.match(/%[0-9a-f]{2}/i) ) {
// probably encoded with encodeURIComponent
}
This tests that the non alphanumeric characters that don't get encoded are intact, and that hexadecimals exist (e.g. %20 for a space)
Related
I have a problem where I need to preserve %2F in a URI string. However, encodeURI encodes % as %25 (as normal), so the whole string ends up being %252F instead of %2F. How can I escape this so the % doesn't get encoded. This is happening deep within a framework, so JS manipulation is not an option, it needs to be done via string escaping. I've tried using \ in a number of ways, none of it is working.
If you have control during the decode process, an unescape() before decodeURI would help.
This way, every double encoding is consistently handled.
For example:
decodeURIComponent(unescape("http%253A%252F%252Fabcd.com"));
will return
'http://abcd.com'
So from the textarea I take the shortcode %91samurai id="19"%93 it should be [samurai id="19"]:
var not_decoded_content = jQuery('[data-module_type="et_pb_text_forms_00132547"]')
.find('#et_pb_et_pb_text_form_content').html();
But when I try to decode the %91 and %93
self.content = decodeURI(not_decoded_content);
I get the error:
Uncaught URIError: URI malformed
How can i solve this problem?
The encodings are invalid. If you can't fix the whatever-system-produces-them to correctly produce %5B and %5D, then your only option is to do a replacement yourself: replace all %91 with character 91 which is '[', then replace all %93 with character 93 which is ']'.
Note that javascript String Replace as-is won't do "Replace all occurrences". If you need that, then create a loop (while it contains(...) do a replace), or search the internet for javascript replace all, you should find plenty results.
And a final note, I am used to using decodeURIComponent(...). If you can make the whatever-system-produces-them to correctly produce %5B and %5D, and you still get that error, then try using decodeURIComponent(...) instead of decodeURI(...).
The string you're trying to decode is not a URI. Use decodeURIComponent() instead.
UPDATE
Hmm, that's not actually the issue, the issues are the %91 and %93.
encodeURI('[]')
gives %5b%5d, it looks like whatever has encoded this string has used the decimal rather than hexadecimal value.
Decimal 91 = hex 5b
Decimal 93 = hex 5d
Trying again with the hex values
decodeURI('%5bsamurai id="19"%5d') == '[samurai id="19"]'
I know this is not the solution you want to see, but can you try using "%E2%80%98" for %91 and "%E2%80%9C" for %93 ?
The %91 and %93 are part of control characters which html does not like to decode (for reasons beyond me). Simply put, they're not your ordinary ASCII characters for HTML to play around with.
I have string that is encoded in UTF16 and i want to decode it using JS, when i use simple decodeURI()
function i get the desired result but in case when special characters are there in the string like á, ó, etc it do not decodes.
On more analysis i came to know that these characters in the encoded string contains the ASCII value.
Say I have string "Acesse já, Encoded version : "Acesse%20j%E1". How can i get the string from the encode version using java script?
EDIT:
The string is a part of URL
Ok, your string seems to have been encoded using escape, use unescape to decode it!
unescape('Acesse%20j%E1'); // => 'Acesse já'
However, escape and unescape are deprecated, you’d better use encodeURI or encodeURIComponent here.
encodeURIComponent('Acesse já'); // => 'Acesse%20j%C3%A1'
decodeURIComponent('Acesse%20j%C3%A1'); // => 'Acesse já'
I am trying to encode a string in javascript and decode it in php.
I use this code to put the string in a inputbox and then send it via form PUT.
document.getElementById('signature').value= b64EncodeUnicode(ab2str(signature));
And this code to decode
$signature=base64_decode($signature);
Here there is a jsfiddle for the encoding page:
https://jsfiddle.net/okaea662/
The problem is that I always get a string 98% correct but with some different characters.
For example: (the first string is the string printed in the inputbox)
¦S÷ä½m0×C|u>£áWÅàUù»¥ïs7Dþ1Ji%ýÊ{\ö°(úýýÁñxçO9Ù¡ö}XÇIWçβÆü8ú²ðÑOA¤nì6S+̽ i¼?¼ºNËÒo·a©8»eO|PPþBE=HèÑqaX©$Ì磰©b2(Ðç.$nÈR,ä_OX¾xè¥3éÂòkå¾ N,sáW§ÝáV:ö~Å×à<4)íÇKo¡L¤<Í»äA(!xón#WÙÕGù¾g!)ùC)]Q(*}?Ìp
¦S÷ ä½m0×C|u>£áWÅàUù»¥ïs7Dþ1Ji%ýÊ{\ö°(úýýÁñxçO9Ù¡ö}XÇIWçβÆü8ú²ðÑOA¤nì6S+̽ i¼?¼ºNËÒo·a©8»eO|PPþBE=HèÑ qaX©$Ì磰©b2(Ðç.$nÈR,ä_OX¾xè¥3éÂòkå¾ N ,sá W§ÝáV:ö~Å×à<4)íÇKo¡L¤<Í»äA(!xón#WÙÕGù¾g!)ùC)]Q(*}?Ìp
Note that the 4th character is distinct and then there is one or two more somewhere.
The string corresponds to a digital signature so these characters make the signature to be invalid.
I have no idea what is happening here. Any idea? I use Chrome browser and utf-8 encoding in header and metas (Firefox seems to use a different encoding in the inputbox but I will look that problem later)
EDIT:
The encoding to base64 apparently is not the problem. The base64 encoded string is the same in the browser than in the server. If I base64-decode it in javascript I get the original string but if I decode it in PHP I get a slightly different string.
EDIT2:
I still don't know what the problem is but I have avoided it sending the data in a blob with ajax.
Try using this command to encode your string with js:
var signature = document.getElementById('signature');
var base64 = window.btoa(signature);
Now with php, you simply use: base64_decode($signature)
If that doesn't work (I haven't tested it) there may be something wrong with the btoa func. So checkout this link here:
https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding
There is a function in there that should work (if the above does not)
function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode(signature); // "4pyTIMOgIGxhIG1vZGU="
Encoding a string with German umlauts like ä,ü,ö,ß with Javascript encodeURI() causes a weird bug after decoding it in PHP with rawurldecode(). Although the string seems to be correctly decoded it isn't. See below example screenshots from my IDE
Also the strlen() of the - with rawurldecode() - decoded string gives more characters than it really has!
Problems occur when I need to process the decoded string, for example if I want to replace the German characters ä,ü,ö with ae, ue and oe. This can be seen in the example provided here.
I have also made an PHP fiddle where this whole weirdness can be seen.
What I've tried so far:
- utf8_decode
- iconv
- and also first two suggestions from here
This is a Unicode equivalence issue and it looks like your IDE doesnt handle multibyte strings very well.
In unicode you can represent Ü with either:
the single unicode codepoint (U+00DC) or %C3%9C in utf8
or use a capital U (U+0055) with a modifier (U+0308) or %55%CC%88 in utf8
Your GWT string uses the latter method called NFD while your one from PHP uses the first method called NFC. That's why your GWT string is 3 characters longer even though they are both valid encodings of logically identical unicode strings. Your problem is that they are not identical byte for byte in PHP.
More details about utf-8 normalisation.
If you want to do preg replacements on the strings you need to normalise them to the same form first. From your example I can see your IDE is using NFC since it's the PHP string that works. So I suggest normalising to NFC form in PHP (the default), then doing the preg_replace.
http://php.net/manual/en/normalizer.normalize.php
function cleanImageName($name)
{
$name = Normalizer::normalize( $name, Normalizer::FORM_C );
$clean = preg_replace(
Otherwise you have to do something like this which is based on this article.