JavaScript not decoding parameter - javascript

So from the textarea I take the shortcode %91samurai id="19"%93, which should be [samurai id="19"]:
var not_decoded_content = jQuery('[data-module_type="et_pb_text_forms_00132547"]')
.find('#et_pb_et_pb_text_form_content').html();
But when I try to decode the %91 and %93
self.content = decodeURI(not_decoded_content);
I get the error:
Uncaught URIError: URI malformed
How can I solve this problem?

The encodings are invalid. If you can't fix whatever system produces them so that it emits the correct %5B and %5D, then your only option is to do the replacement yourself: replace every %91 with character 91, which is '[', and every %93 with character 93, which is ']'.
Note that JavaScript's String replace() with a plain string argument won't replace all occurrences. If you need that, use a regular expression with the global (g) flag, or loop (while the string contains the token, replace it), as in the sketch below.
And a final note: I am used to using decodeURIComponent(...). If you can make whatever system produces them emit the correct %5B and %5D and you still get that error, then try decodeURIComponent(...) instead of decodeURI(...).
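For example, with the global flag (a minimal sketch using the variables from the question):
var fixed = not_decoded_content
    .replace(/%91/g, '[')   // %91 -> character 91, '['
    .replace(/%93/g, ']');  // %93 -> character 93, ']'
self.content = fixed;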

The string you're trying to decode is not a URI. Use decodeURIComponent() instead.
UPDATE
Hmm, that's not actually the issue; the real problem is the %91 and %93.
encodeURI('[]')
gives %5B%5D, so it looks like whatever encoded this string used the decimal rather than the hexadecimal value of the character codes.
Decimal 91 = hex 5B
Decimal 93 = hex 5D
Trying again with the hex values:
decodeURI('%5bsamurai id="19"%5d') == '[samurai id="19"]'

I know this is not the solution you want to see, but can you try using "%E2%80%98" in place of %91 and "%E2%80%9C" in place of %93?
The %91 and %93 fall in the control-character range, which HTML does not like to decode (for reasons beyond me). Simply put, they're not your ordinary ASCII characters for HTML to play around with.
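For what it's worth, those UTF-8 escapes do decode cleanly with decodeURIComponent() (a quick check only, not a fix for whatever produced %91 and %93 in the first place):
console.log(decodeURIComponent('%E2%80%98')); // "‘" left single quotation mark (U+2018)
console.log(decodeURIComponent('%E2%80%9C')); // "“" left double quotation mark (U+201C)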

Related

UTF8 String not allowing charAt() or substring to pull out specific characters

In my code I'm trying to isolate the first character of a variable; it is the UTF-8 symbol 🌈.
The code and its output are as follows:
Code:
console.log(login_name);
console.log(login_name.charAt(0));
console.log(login_name.substring(0,1));
Output:
🌈 ✨✨✨UTF8MB4
�
�
Obviously, I want .charAt() to print 🌈 and not �. Any known oddities with utf8mb4 that I'm missing? My main problem is I don't know how to word this specific problem.
Also, if I target the ✨ instead of the rainbow, it functions as it should and prints properly.
JavaScript doesn't handle this well out of the box: charAt() operates on UTF-16 code units instead of code points, so a character like 🌈 that needs two code units (a surrogate pair) gets split in half.
Luckily JavaScript has workarounds. To get the characters in a string instead of UTF-16/UCS-2 code units you need to call Array.from(yourstring), which will get you an array of characters. From there on you can get the first element in the usual way.
let characters = Array.from(login_name);
console.log(characters.shift());
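If you only need the first character, the spread syntax or codePointAt() also work per code point (a sketch assuming login_name is the string from the question):
console.log([...login_name][0]);                              // "🌈"
console.log(String.fromCodePoint(login_name.codePointAt(0))); // "🌈"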

JSON.parse string with unexpected token

Parsing this string I get an unexpected token error, what is the unexpected token?
JSON.parse("​[{"attr1":079455,"Attr2": 3},{"Attr1":847987​​,"Attr2": 3}]​​​");
I keep looking at the documentation but I'm just not seeing what's wrong with this string. I've tried all sorts of stringifying and replacing double quotes with single quotes, etc.
JSON format does not allow leading zeroes on numbers, except for the special cases of 0 itself and floating-point numbers that begin with 0. (such as 0.5). See the diagram that shows the format of numbers at http://www.json.org/.
So the number 079455 is not valid JSON.
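A quick way to see the difference (illustration only, using a hypothetical key n):
console.log(JSON.parse('{"n": 79455}').n); // 79455
JSON.parse('{"n": 079455}');               // throws SyntaxError: the leading zero is rejected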
You should fix the program that's generating the JSON in the first place. It should use a library function to produce JSON, instead of formatting it by hand.
If you can't, you could use the following (admittedly clumsy) JavaScript to strip the extraneous leading zeroes; the g flag handles every occurrence, and keeping one captured digit leaves plain 0 and 0.x values alone:
json_str = json_str.replace(/":0+(\d)/g, '":$1');
As well as the incorrect number format, you are not correctly wrapping your string. If you want to include " characters inside the string, you should wrap it with ' instead:
JSON.parse('[{"attr1":79455,"Attr2": 3},{"Attr1":847987,"Attr2": 3}]');

Javascript encodeURI() vs. PHP rawurldecode() and special characters

Encoding a string with German umlauts like ä, ü, ö, ß with JavaScript's encodeURI() causes a weird bug after decoding it in PHP with rawurldecode(). Although the string seems to be correctly decoded, it isn't. See the example screenshots from my IDE below.
Also, strlen() on the rawurldecode()-decoded string reports more characters than the string really has!
Problems occur when I need to process the decoded string, for example when I want to replace the German characters ä, ü, ö with ae, ue and oe. This can be seen in the example provided here.
I have also made a PHP fiddle where this whole weirdness can be seen.
What I've tried so far:
- utf8_decode
- iconv
- and also the first two suggestions from here
This is a Unicode equivalence issue, and it looks like your IDE doesn't handle multibyte strings very well.
In Unicode you can represent Ü with either:
- the single code point U+00DC, which is %C3%9C in UTF-8, or
- a capital U (U+0055) followed by a combining diaeresis (U+0308), which is %55%CC%88 in UTF-8.
Your GWT string uses the latter form, called NFD, while the one from PHP uses the former, called NFC. That's why your GWT string is 3 characters longer, even though both are valid encodings of logically identical Unicode strings. Your problem is that they are not identical byte for byte in PHP.
More details about UTF-8 normalisation.
If you want to do preg_replace() substitutions on the strings, you need to normalise them to the same form first. From your example I can see your IDE is using NFC, since it's the PHP string that works, so I suggest normalising to NFC form in PHP (the default) and then doing the preg_replace():
http://php.net/manual/en/normalizer.normalize.php
function cleanImageName($name)
{
    // Normalise to NFC first, so each umlaut is a single code point
    $name  = Normalizer::normalize( $name, Normalizer::FORM_C );
    // then do the replacement described in the question (ä, ü, ö -> ae, ue, oe)
    $clean = preg_replace( array('/ä/u', '/ü/u', '/ö/u'), array('ae', 'ue', 'oe'), $name );
    return $clean;
}
Otherwise you have to do something like this, which is based on this article.
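The same equivalence can be demonstrated on the JavaScript side with String.prototype.normalize() (illustration only):
var nfc = '\u00DC';   // "Ü" as a single code point (NFC)
var nfd = 'U\u0308';  // "U" plus combining diaeresis (NFD)
console.log(nfc === nfd);                  // false: different code unit sequences
console.log(nfc === nfd.normalize('NFC')); // true once both are in the same form
console.log(nfc.length, nfd.length);       // 1 2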

javascript json - problem decoding ajax json array from php

I'm using PHP's json_encode() to convert an array to JSON, which is then echoed and read by a JavaScript AJAX request.
The problem is that the echoed text contains Unicode escape sequences which the JavaScript JSON.parse() function doesn't convert properly.
An example array value is "2\u00000\u00001\u00000\u0000-\u00001\u00000\u0000-\u00000\u00001", which should be "2010-10-01".
JSON.parse() only gives me "2".
Can anyone help me with this issue?
Example:
var resArray = JSON.parse(this.responseText);
for (var x = 0; x < resArray.length; x++) {
    var twt = resArray[x];
    alert(twt.date);
    break;
}
You have NUL characters (character code zero) in the string. It's actually "2_0_1_0_-_1_0_-_0_1", where _ represents the NUL characters.
The Unicode character escape is actually part of the JSON standard, so the parser should handle it correctly. However, the result is still a string with NUL characters in it, so when you try to use the string in JavaScript the behaviour will depend on what the browser does with the NUL characters.
You can try this in some different browsers:
alert('as\u0000df');
Internet Explorer will display only as, stopping at the NUL.
Firefox will display asdf, but the NUL character itself doesn't show.
The best solution would be to remove the NUL characters before you convert the data to JSON.
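If you can't change the server right away, a client-side stopgap is to strip the NULs after parsing (a sketch reusing the question's resArray; fixing the data at the source is still the better answer):
var resArray = JSON.parse(this.responseText);
var date = resArray[0].date.replace(/\u0000/g, ''); // remove every NUL character
console.log(date); // "2010-10-01"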
To add to what Guffa said:
When you have alternating zero bytes, what has almost certainly happened is that you've read a UTF-16 data source without converting it to an ASCII-compatible encoding such as UTF-8. Whilst you can throw away the nulls, this will mangle the string if it contains any characters outside of ASCII range. (Not an issue for date strings of course, but it may affect any other strings you're reading from the same source.)
Check where your PHP code is reading the 2010-10-01 string from, and either convert it on the fly using eg iconv('utf-16le', 'utf-8', $string), or change the source to use a more reasonable encoding. If it's a text file, for example, save it in a text editor using ‘UTF-8 without BOM’, and not ‘Unicode’, which is a highly misleading name Windows text editors use to mean UTF-16LE.

VBScript chr() appears to return wrong value

I'm trying to convert a character code to a character with chr(), but VBScript isn't giving me the value I expect. According to VBScript, character code 199 is:
�
However, when using something like Javascript's String.fromCharCode, 199 is:
Ç
The second result is what I need to get out of VBScript's chr() function. Any idea what the problem is?
Edited to reflect comments
Chr(199) returns a 2-byte character, which is being interpreted as 2 separate characters.
Use ChrW(199) to return it as a Unicode string.
Use ChrB(199) to return it as a single-byte character.
Encoding is the problem. JavaScript may be interpreting the value as Latin-1; VBScript may be using a different encoding and getting confused.
The fromCharCode() method takes the specified Unicode values and returns a string.
The Chr() function converts the specified ANSI character code to a character.
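For reference, the JavaScript side behaves as described; 199 is the Unicode (and Latin-1) code point for "Ç" (illustration only):
console.log(String.fromCharCode(199)); // "Ç"
console.log('Ç'.charCodeAt(0));        // 199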
