En dash character replaced with a question mark (Ajax call)


I'm making a standard jQuery $.ajax() call, doing a POST. The call passes a string to a PHP controller.
The problem is the following: when an en dash (–) character is used in the string, by the time it reaches PHP it's replaced with a question mark (?). A normal hyphen (-) does not cause the problem.
The site's encoding is UTF-8. I'm not sure how to get around this problem. I probably could do some character replacing, but then do I need to do it for every single "problematic" punctuation mark?
And the problem aside, shouldn't this just work if the encoding is correct?
Confusing.
Update:
I used mb_detect_encoding() on the passed string. The result is "ASCII"... I'm working with legacy code. How do I fix something like that?

On the PHP side, the $_REQUEST global was used to retrieve the Ajax data. After I changed it to $_POST, the en dashes were kept.
I don't really get why $_REQUEST was failing, though.
Anyway, this worked in this case. I truly dislike the devs who wrote this code and created this project :)
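
For anyone debugging the same symptom, it can help to first confirm what the client sends. The sketch below assumes the string goes through standard URL encoding (which, as far as I know, jQuery applies when the data option is an object) and uses a made-up sample string. It only shows that an en dash leaves the browser as three UTF-8 bytes, which suggests that a (?) on the PHP side comes from server-side handling rather than from the Ajax call itself.

```javascript
// Sample text is illustrative only; it is not from the original post.
const text = "pages 10–20";

// URL encoding percent-encodes the en dash (U+2013) as its three UTF-8 bytes.
const encoded = encodeURIComponent(text);
console.log(encoded); // pages%2010%E2%80%9320

// A plain hyphen is ASCII and passes through unchanged.
console.log(encodeURIComponent("10-20")); // 10-20

// Decoding as UTF-8 restores the original string.
console.log(decodeURIComponent(encoded) === text); // true
```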

Related

Salesforce API insert adds special characters

I am using the Salesforce PHP Toolkit to insert the source of JavaScript functions into a custom object I have (just to document the functions I am using, not to execute them in Salesforce).
In my PHP function I am saving a string like:
(function(d,f){var b={src:(d.location.protocol=="https:"?"https:":"http:")...
After I insert this string using the SF API, the result I see in the field is:
(function(d,f){var b={src:(d.location.protocol==&quot;https:&quot;?&quot;...
As you can see, Salesforce has added special characters to my string.
I haven't found any way around that.
Any ideas?

The solution was to remove the htmlspecialchars call from the string. I didn't think SF would accept the string without it, because I could not echo the string without it either (due to the special characters in my string). But it seems the parameter passes without any issue. I'd be happy to hear an explanation if anyone has one.
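
One possible explanation, offered as a guess rather than a confirmed description of the toolkit: htmlspecialchars rewrites the quotes into entities before the string is sent, and the field then stores those entities literally. The sketch below is a rough JavaScript stand-in for that escaping step (escapeHtml is a hypothetical helper, not part of any Salesforce or PHP API), applied to the visible start of the string from the question.

```javascript
// Hypothetical helper that approximates HTML escaping of &, <, > and ".
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Only the visible prefix of the string from the question is used here.
const source = '(d.location.protocol=="https:"?"https:":"http:")';
const escaped = escapeHtml(source);

console.log(escaped);
// (d.location.protocol==&quot;https:&quot;?&quot;https:&quot;:&quot;http:&quot;)
```

Escaping of this kind is meant for the moment a value is written into an HTML page, so data handed to an API is usually better sent unescaped.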

Accurately unescape HTML entities in javascript

In javascript, I need to take a string and HTML un-escape it.
This question over here asks the same question, and the most popular answer involves populating a temporary div.
I've used this as well, but I think I've found a bug.
Simple example, correct behavior
If you have this string: Cats&gt;Dogs
Unescaped, it should be: Cats>Dogs
Malformed example, wrong behavior
If you remove the semicolon and use this instead: Cats&gtDogs
You will get this as a result: Cats>Dogs
Isn't that wrong?
This struck me as odd. From what I understand, an escaped string requires the presence of a terminating semicolon, otherwise it's not escaped. After all, what if I had a store called guitars&amps? For all we know, this company exists but gets no business because it causes null reference exceptions everywhere it has records.
Any ideas on how I could perform unescaping while deliberately leaving an entity alone when its semicolon is missing? Currently, all I can think to do is perform the unescaping myself.
(The WYSIWYG preview in StackOverflow exhibits a similar unusual behavior, by the way. Try entering &ampgt;, this renders as >!)

Isn't that wrong?
Successful HTML parsers are tolerant. This is one of the things distinguishing them from, say, XML parsers. They don't necessarily stick to strict rules about markup, for the simple reason that there's a lot of incorrect markup out there. So they try to figure out what the markup is meant to represent. &gtDogs is more likely to mean >Dogs than &gtDogs, so that's what the parser goes with.
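
If strict behavior is really needed, one option is to do the unescaping by hand, as the question suggests. A minimal sketch, assuming only a handful of named entities matter (the NAMED table and the unescapeStrict name are illustrative, not a standard API); it decodes an entity only when the terminating semicolon is present:

```javascript
// Small illustrative table; a real one would need many more names.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode only semicolon-terminated entities; leave everything else untouched.
function unescapeStrict(s) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return s.replace(pattern, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Keep the original text if the number is not a valid code point.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are kept as written.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

console.log(unescapeStrict("Cats&gt;Dogs")); // Cats>Dogs
console.log(unescapeStrict("Cats&gtDogs"));  // Cats&gtDogs
console.log(unescapeStrict("guitars&amps")); // guitars&amps
```

Because the replacement runs in a single pass, text such as &amp;gt; comes out as &gt; and is not unescaped a second time.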

Should I worry that using GET in a form element doesn't automatically URL-encode angle brackets?

So I decided to use GET in my form element, point it to my cshtml page, and found (as expected) that it automatically URL encodes any passed form values.
I then, however, decided to test if it encodes angle brackets and surprisingly found that it did not when the WebMatrix validator threw a server error warning me about a potentially dangerous value being passed.
I said to myself, "Okay, then I guess I'll use Request.Unvalidated["searchText"] instead of Request.QueryString["searchText"]." Then, as any smart developer who uses Request.Unvalidated does, I tried to make sure that I was being extra careful, but I honestly don't know much about inserting JavaScript into URLs, so I am not sure if I should worry about this or not. I have noticed that it encodes apostrophes, quotation marks, parentheses, and many other JavaScript special characters. (Actually, I'm not even sure if an angle bracket has special meaning in JavaScript or URLs, but it probably does in one, if not both. I know it helps denote a List in C#, but in any event you can write script tags with it if you can find a way to get it onto the HTML page, so I guess that's why WebMatrix's validator screams at me when it sees them.)
Should I find another way to submit this form, one where I can intercept and encode the user data myself, or is it okay to use Request.Unvalidated in this instance without worry?
Please note, as you have probably already noticed, my question comes from a WebMatrix C#.net environment.
Bonus question (if you feel like saving me some time and you already know the answer off the top of your head): If I use Request.Unvalidated will I have to URL-decode the value, or does it do that automatically like Request.QueryString does?
---------------------------UPDATE----------------------------
Since I know I want neither a YSOD nor a custom error page to appear simply because a user included angle brackets in their "searchText", I know I have to use Request.Unvalidated either way, and I know I can encode whatever I want once the value reaches the cshtml page.
So I guess the question really becomes: Should I worry about possible XSS attacks (or any other threat for that matter) inside the URL based on angle brackets alone?
Also, in case this is relevant:
Actually, the value I am using (i.e. "searchText") goes straight to a cshtml page where the value is run through a (rather complex) SQL query that queries many tables in a database (using both JOINS and UNIONS, as well as Aliases and function-based calculations) to determine the number of matches found against "searchText" in each applicable field. Then I remember the page locations of all of these matches, determine a search results order based on relevance (determined by type and number of matches found) and finally use C# to write the search results (as links, of course) to a page.
And I guess it is important to note that the database values could easily contain angle brackets. I know it's safe so far (thanks to HTML encoding), but I suppose it may not be necessary to actually "search" against them. I am confused as to how to proceed for maximum security while meeting functional expectations, and if I choose one way or the other, I may not know I made the wrong decision until it is much too late...

URL and special characters
The URL http://test.com/?param="><script>alert('xss')</script> is "benign" until it is read and ...
printed in a template: Hello #param. (Potential reflected/persisted XSS)
or used in JavaScript: divContent.innerHTML = '<a href="' + window.location.href + ... (Potential DOM XSS)
Otherwise, the browser doesn't evaluate the query string as HTML/script.
Request.Unvalidated/Request.QueryString
You should use Request.Unvalidated["searchText"] if you are expecting to receive special characters.
For example: <b>User content</b><p>Some text...</p>
If your application is working as expected with QueryString["searchText"], you should keep it, since it validates the input for potential XSS.
Ref: http://msdn.microsoft.com/en-us/library/system.web.httprequest.unvalidated.aspx
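
To illustrate the point about the query string being inert until it is read, here is a small sketch using the standard URLSearchParams API. The parameter name searchText comes from the question; the sample value is made up. Under the application/x-www-form-urlencoded rules that URLSearchParams implements, angle brackets are percent-encoded in transit, but the value contains them again once it is decoded, so the encoding that matters for XSS is the one applied when the value is written into the page.

```javascript
// Made-up search text containing angle brackets.
const searchText = "<b>cats</b>";

// Serialize the way a GET form would (application/x-www-form-urlencoded).
const query = new URLSearchParams({ searchText }).toString();
console.log(query); // searchText=%3Cb%3Ecats%3C%2Fb%3E

// Decoding the query string restores the raw angle brackets.
const decoded = new URLSearchParams(query).get("searchText");
console.log(decoded); // <b>cats</b>
```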

What's the best way to handle different single quotes (’) and (') using jQuery/Perl/Mechanize?

I have an HTML form that I'm submitting with jQuery.ajax to a Perl script that uses Mechanize to process the form on a URL. Everything works well, except that when I look at the info sent to the receiving URL, the character (’) gets stored as (â). I'm not sure what's the best way to handle it. I tried JavaScript's escape() and encodeURI(), and replacing (’) in jQuery before sending everything through Ajax, but I'm not sure if it gets treated like the other single quote ('). I can use a JavaScript/jQuery solution or do something with Perl; I'm just not sure how I should handle it.

«’» is RIGHT SINGLE QUOTATION MARK (U+2019). Its UTF-8 encoding is E2 80 99.
If you treat E2 80 99 as iso-8859-1 or as Unicode code points, you get
LATIN SMALL LETTER A WITH CIRCUMFLEX (â)
Unnamed control character.
Unnamed control character.
This is what you are seeing. You have an encoding problem.
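
The byte-level mix-up described above can be reproduced in a few lines. This is only a sketch of the symptom, using the standard TextEncoder and TextDecoder; it cannot show where in the jQuery/Perl/Mechanize chain the wrong decoding actually happens.

```javascript
// U+2019 RIGHT SINGLE QUOTATION MARK encoded as UTF-8.
const bytes = new TextEncoder().encode("\u2019");
console.log(Array.from(bytes, (b) => b.toString(16).toUpperCase()).join(" ")); // E2 80 99

// Wrong: treat each byte as its own code point (what ISO-8859-1 decoding does).
const misread = String.fromCharCode(...bytes);
console.log(misread.length); // 3
console.log(misread[0]);     // â

// Right: decode the three bytes together as UTF-8.
const correct = new TextDecoder("utf-8").decode(bytes);
console.log(correct === "\u2019"); // true
```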

I would think this is more of a problem with encoding (e.g. Unicode, ASCII, etc.) between the languages than an escaping issue. I would check which encoding each language uses; you'll probably have to convert between the two before passing the value(s) between the languages.
Edit: As I stated previously, it's an encoding problem:
http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html
http://ahinea.com/en/tech/perl-unicode-struggle.html

Strip the last character sent by JavaScript through websockets to Python

I'm currently trying out websockets, creating a client in JavaScript and a server in Python.
I'm stuck on a simple problem, though: when I send something from the client to the server it always contains a special ending character, but I don't know how to remove it.
I've tried data[:-1] thinking that would get rid of it, but it didn't.
With the character my JSON code won't validate.
This is what I send through JavaScript:
ws.send('{"test":"test"}');
This is what I get in python:
{"test":"test"}�
I thought the ending character was \xff

The expression "data[:-1]" produces a copy of data without the last character. It doesn't modify the "data" variable. To do that, you have to assign the result back to "data", like so:
data = data[:-1]
My suspicion is the "special ending character" is a bug, somewhere, either in your code or how you're using the APIs. Network code does not generally introduce random characters into the data stream. Good luck!
