encodeURIComponent encodes differently, depending on environment

encodeURIComponent encodes differently, depending on environment - javascript

I am passing an object via the url using:
encodeURIComponent(JSON.stringify(myObject))
"ä" is encoded as "%C3%A4" on my local server.
Unfortunately it is encoded as "a%CC%88" on the webserver.
Which breaks my app because it is part of the name of a database field which isn't found when wrong encoded. And I can't control that there are no ä's in field names because the app allows users to upload their own data.
How can I make sure that "ä" is always encoded correctly?
SORRY. To make this clear: The encoding happens both times client-side in the browser. But when the web-app is served from the webserver the "ä" is encoded as "%C3%A4" instead of "a%CC%88" (I've tested both in the same chrome browser)
Thanks for all your help. It got me to dig deeper:
I have code that runs on an event. It loops through checkboxes and creates an array of objects containing (also) the field names. The code gets the field names from an attribute named "feld" of the checkbox:
<div class="checkbox">
<label>
<input class="feld_waehlen" type="checkbox" dstyp="Taxonomie" datensammlung="SISF Index 2 (2005)" feld="Artname vollständig">Artname vollständig
</label>
</div>
running this code:
console.log("this.getAttribute('feld') = " + this.getAttribute('feld'));
gives as expected: $(this).attr('feld') = Artname vollständig
If while looping, I run:
console.log('encodeURIComponent("Artname vollständig") = ' + encodeURIComponent("Artname vollständig"));
the answer is correct: encodeURIComponent("Artname vollständig") = Artname%20vollst%C3%A4ndig
But if I run:
console.log("encodeURIComponent(this.getAttribute('feld')) = " + encodeURIComponent(this.getAttribute('feld')));
the answer is: encodeURIComponent(this.getAttribute('feld')) = Artname%20vollsta%CC%88ndig
This happens all in the browser. But the issue only appears, when the web-app is served from the webserver (a couchapp running on cloudant.com).
How can it be that the method "getAttribute" returns a different encoding?

The following code has been tested on Chrome 29 OS X, IE 8 Windows XP.
encodeURIComponent("ä") //%C3%A4"
decodeURIComponent("%C3%A4") //ä
so basically "%C3%A4" should be the expected output.
I think the issue here might be encodeURIComponent require a UTF-8 encoding while your server-side language returns something other than this.
encodeURICompoent - MDN

just a follow up in case somebody runs into this issue later.
It seems to be unique to cloudant.com where my couchapp was hosted.
This is the answer I got from their very helpful support:
OK - I think I've found the culprit. The issue is that, due to internal optimisations (which are not present in CouchDB), the form of unicode strings can get changed. In this case, ä is represented as:
U+0061 LATIN SMALL LETTER A character
U+0308 COMBINING DIAERESIS character (̈)
instead of
U+00E4 LATIN SMALL LETTER A WITH DIAERESIS character (ä)
Both are semantically equivalent, so the fix is to normalize your unicode strings before comparison. Unfortunately, JavaScript has no built-in unicode normalization, but you can use a library such ashttps://github.com/walling/unorm.
It's not an issue for me any more as I changed to a virtual server running on digitalocean.com with vanilla couchdb (and am very happy with it).
But I do think this could hit others developing couchapps in German or other languages needing utf8 and hosting them on cloudant.com
Thanks for your great help.
Alex

Related

Open word on local computer - Office URI Schemes special characters

I want to open word-documents by clicking on a link in my solution. The link below shows how it is structured in ofe for office. This solution is really nice because it works in every browser but i have problems with special characters.
ms-word:ofe|u|file://our.local/Testing ÅÄÖ.DOCX
I Have tried different approaches to solve this problem but its not working when åäö is present in the path. EncodeURI on the path does not help for instance.
https://learn.microsoft.com/en-us/office/client-developer/office-uri-schemes does not describe anything out of the ordinary and only follow URI spec.
Documents without special characters works great but i can not figure out how special characters should be encoded to make it work.
If i take the file:\... part and past it into any browser it is working but not with the ofe prefix. So it should be some problem with encoding due to it is working fine without any special characters.
Running in cmd is also working:
So in this case i guess the browser is encoding the characters before sending to protocolhandler ??

How to make python dictionary from big Javascript object

I have one string. which represents JavaScript object.
https://docs.google.com/document/d/1CFONQntMFMdtD-04rk9uut4UpLyB_OSsDH0bwDZ0tuM/edit?usp=sharing
When i'm using json.loads() python raises Exception: JSONDecodeError: Extra data.
What i'm doing wrong?
P.S.: It's dynamic object, I can't change him

As #DSupreme says, there are extra characters in your data.
This can be caused from various reasons, one being stringifying JSON files and parsing in different programs and languages. If you need to verify if a JSON string is in a correct format, you can use verifying tools such as http://jsonlint.com/ .
For your specific problem, there is an extra ' at the beginning of your string, / for escaping characters and you're also missing the double quote in the end of your string.

Is this what you are looking for
Things i changed:
Removed ' in the beginning
Remove extra / used for escaping double quotes
Added double quotes in the end
{"log_timeout":1000,"featureFlags":{"serverRenderingForBotsOnly":false,"experienceLevelOnFixedPriceJobs":true,"JSUIPaymentVerified":true,"JSUI736SaveSearchRedesign":true,"JSUI341ProposalsFilter":true},"csrfTokenCookieName":"XSRF-TOKEN","csrfTokenHeaderName":"X-Odesk-Csrf-Token","runtime_id":"32a28ec0399c4e8a-DME","clientStatsDMetrics":true,"smfAjax":false,"pageSpeedMetrics":false,"ccstCookieName":"oauth2_global_js_token","pageId":"User","isSearchWithEmptyParams":false,"queryParsedParams":{"q":"python"},"jobs":[{"title":"Python
app engine, Linux,
infrastructure","createdOn":"2017-01-31T16:57:42+00:00","type":2,"ciphertext":"~01440d6a85cc997768","description":"If
you know python and have at least some knowledge of app engine
development. Willing to learn and positive. A big plus if you have
worked with the linux environment and knows how to set up servers on
google-cloud or AWS. \n\nWe can offer a job for a longer period of
time if you are the right fit.\n\nAt least 5 years experience as a
developer required.","category2":"Web, Mobile & Software
Dev","subcategory2":"Other - Software
Development","skills":[{"name":"google-app-engine","prettyName":"Google
App Engine"},{"name":"python","prettyName":"Python"}],"duration":"1 to
3 months","engagement":"30+
hrs\/week","amount":{"currencyCode":"USD","amount":0},"recno":209422776,"client":{"paymentVerificationStatus":null,"location":{"country":null},"totalSpent":0,"totalReviews":0,"totalFeedback":0,"companyRid":0,"companyName":null,"edcUserId":0,"lastContractPlatform":null,"lastContractRid":0,"lastContractTitle":null,"feedbackText":"No
feedback
yet"},"freelancersToHire":0,"relevanceEncoded":"{}","maxAmount":{"currencyCode":"USD","amount":0},"enterpriseJob":false,"tierText":"Intermediate
($$)","isSaved":false,"feedback":null,"proposalsTier":"5 to
10","isApplied":false,"sticky":true,"stickyLabel":"Interesting
Job","jobTs":"1485881862000"},{"title":"Python</span>
Developer","createdOn":"2017-02-01T04:00:20+00:00","type":2,"ciphertext":"~019767ae6381e97a90","description":"Python</span> Developer who can read and
understand the legacy script and apply that understanding to an
existing Python business \nservice. ... Python Developer who can read
and understand the legacy script and apply that understanding to an
existing Python</span> business
\nservice.",
"category2":"Web,
Mobile & Software Dev",
"subcategory2":"Other - Software Development",
"skills":[],
"duration":"Less than 1 week",
"engagement":"Less than 10"
}

Problem was in { } at the end and begin of the string

Browser strips encoded dot character from url

I have a web app, which allows searching. So when I go to somedomain.com/search/<QUERY> it searches for entities according to <QUERY>. The problem is, when I try to search for . or .. it doesn't work as expected (which is pretty obvious). What surprised me though, is that if I manually enter the url of somedomain.com/search/%2E, the browser (tested Chrome and IE11) converts it somedomain.com/search/ and issues a request without necessary payload.
So far I haven't found anything that would say it's not possible to make this work, so I came here. Right now I have only one option: replacing . and .. to something like __dotPlaceholder__, but this feels like a dirty hack to me.
Any solution (js or non-js) will be welcomed. Any information on why do browsers strip url-encoded dots is also a nice-to-have.

Unfortunately part of RFC3986 defines the URI dot segments to be normalised and stripped out in that case, ie http://example.com/a/./ to become http://example.com/a
see https://www.rfc-editor.org/rfc/rfc3986#page-33 for more information

Advanced Javascript Obfuscation

I've been training heavily in JS obfuscation, starting to know my way around all advanced concepts, but I recently found an obfuscated code, I believe it is some form of "Native Javascript Code", I just can't find ANY documentation on this type of obfuscation :
Here is a small extract :
'\141\75\160\162\157\155\160\164\50\47\105\156\164\162\145\172\40'
It is called this way :
eval(eval('\141\75\160\162\157\155\160\164\50\47\105\156\164\162\145\172\40'))
Since the code is the work of another and I encoutered it in a JS challenge I'm not posting the full code, so the example I gave won't work, but the full code does work.
So here is my question:
What type of code is this? And where can I learn more about it?
Any suggestions appreciated :)

It's just a string with the characters escaped. You can read it in the JavaScript console in any browser:
console.log('\141\75\160\162\157\155\160\164\50\47\105\156\164\162\145\172\40')
will print:
"a=prompt('Entrez "

It's just escaped characters, one part outputting the string of a query and another actually running the returned string - try calling it in a console.
eval('\160\162\157\155\160\164\50\47\105\156\164\162\145\172\47\51')
Might help?

These numbers is the ascii codes (http://www.asciitable.com/index/asciifull.gif) of characters (in Octal representation).
You can convert it to characters. This is used when somebody wants to make an XSS attack, or wants to hide the js code.
So the string what you written represents:
a=prompt('Entrez
The js engines, browsers can translate the octal format to the 'real' string. With eval function it could run. (in case the 'translated' code has no syntax errors)

HTML5 History.pushState mangles URL's containing percent encoded non-Ascii (Unicode) chars

In an OSS web app, we have JS code that performs some Ajax update (uses jQuery, not relevant). After the page update, a call is made to the html5 history interface History.pushState, in the following code:
var updateHistory = function(url) {
var context = { state:1, rand:Math.random() };
/* -----> bedfore the problem call <------- */
History.pushState( context, "Questions", url );
/* -----> after the problem call <------- */
setTimeout(function (){
/* HACK: For some weird reson, sometimes something overrides the above pushState so we re-aplly it
This might be caused by some other JS plugin.
The delay of 10msec allows the other plugin to override the URL.
*/
History.replaceState( context, "Questions", url );
}, 10);
};
[Please note: the full code segment is provided for context, the HACK part is not the issue of this question]
The app is i18n'ed and is using URL encoded Unicode segments in the URL's, so just before the marked problem call in the above code, the URL argument contains (as inspected in Firebug):
"/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/scope:all/sort:activity-desc/page:1/"
The encoded segment is utf-8 in percent encoding. The URL in the browser window is: (just for completeness, doesn't really matter)
http://<base-url>/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/
Just after the call, the URL displayed in the browser window changes to:
http://<base-url>/%C3%98%C2%A7%C3%99%C2%84%C3%98%C2%A3%C3%98%C2%B3%C3%98%C2%A6%C3%99%C2%84%C3%98%C2%A9/scope:all/sort:activity-desc/page:1/
The URL encoded segment is just mojibake, the result of using the wrong encoding at some level. The correct URL would've been:
http://<base-url>/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/scope:all/sort:activity-desc/page:1/
This behavior has been tested on both FF and Chrome.
The history interface specs don't mention anything about encoded URL's, but I assume the default standard for URL formation (utf-8 and percent encoding etc) would apply when using URL's in function calls for the interface.
Any idea on what's going on here.
Edit:
I wasn't paying attention to the uppercase H in History - this code is actually using the History.js wrapper for the history interface. I replaced with a direct call to history.pushState (notice the lowercase h) without going through the wrapper, and the code is working as expected as far as I can tell. The issue with the original code still stands - so an issue with the History.js library it seems.

Update
As Doug S explains in the comments below, the latest version of History.js includes a fix for this behaviour. He also found that my solution caused double-encoding when used in browsers (such as IE 9 and below) which require the hash fallback, so I recommend that instead of using the fix detailed below, just download the latest version.
I've kept my original answer below, since it does explain what's going on in much more detail.
Basel found a resolution of sorts, but there's still some confusion about what's happening under the hood. This answer goes into detail about the problem and suggests a better fix. (You can skip straight to the fix if you want.)
The problem
First, open your browser's JS console and run this:
window.encodeURI(window.unescape('%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9'))
Does that look familiar? It should—that's what your URL is being mangled to. The problem lies in the implementation of History.unescapeString, specifically this line:
tmp = window.unescape(result);
window.unescape is a DOM Level 0 function—which is to say, an unstandardised relic from the hoary days of Netscape 2. It uses the escaping rules defined in RFC 2396, according to which characters outside of the unreserved range (alphanumerics and a small set of punctuation symbols) are encoded as octets.
This works fine for the US-ASCII range, but not all (indeed, the vast majority) of the characters in UTF-8 can be represented in a single byte. Since URIs do not have a built-in way of representing the character set being used, window.unescape just assumes each character maps to a single octet and blithely mangles any that don't.
In this example, the first letter in your URL is the Arabic letter alef (ا), represented by two bytes: 0xD8 0xA7. window.unescape interprets these as two separate characters: 0x00 0xD8 (Ø—capital O with stroke) and 0x00 0xA7 (§—section sign).
This is a known issue with History.js.
The fix
As noted above by the asker, the issue can be sidestepped by using the native implementation of the History API instead of the History.js wrapper, i.e. history.pushState instead of History.pushState.
This works for browsers that support the History API, but loses the benefit of having a polyfill for those that don't. Fortunately, there's a better fix. Open up the History.js source you're referencing and find this line (~1059 in my copy):
tmp = window.unescape(result);
Replace it with:
tmp = window.unescape(encodeURIComponent(result));
Or, if you're using the compressed source, replace a.unescape(c) with a.unescape(encodeURIComponent(c)).
To test this change, I ran the History.js HTML5 jQuery test suite on a local web server inside an Arabic-named directory. Before making the change, test 14 fails; after the change, all tests passed.
Credit
Though I found the problem and solution independently, Damien Antipa deserves credit for finding it first and making a pull request with the fix.

I'm still able to reproduce this in the following case:
History.pushState(null, null, "?" + some_Unicode_String_Or_A_String_With_Whitespace);
document.location.hash += "&someStuff";
In this case the _suid parameter gets removed and &someStuff as well. If the string is not unicode or doesn't have whitespaces (so no % chars) - this does not happen.
This workaround worked for me:
History.pushState(null, null, "?" + some_Unicode_String_Or_A_String_With_Whitespace + "&someStuff");

We Keep Coding

JavaScript is the programming language of the Web.