Spanish special characters like á ó while displaying shows jumbled or garbage value - javascript

I have a Spanish validation message which I'm trying to display using JavaScript.
All the special characters like those above get changed into entities such as &#243;.
This only happens when I'm using JavaScript; a couple of other Spanish validation messages that I display from the server side are fine.
errorString = "<%:Validation.xyz %>";
I'm trying to get it from a resource file.
Can someone think of a quick workaround?

What you call garbage is actually the HTML-encoded value of the corresponding character, and it is there to protect you from XSS. The encoding happens because you are using <%:, which automatically HTML-encodes the string, but this shouldn't be a problem for your JavaScript. Example:
var text = 'hello &#243;';
document.getElementById('foo').innerHTML = text;
works just fine and displays hello ó in the corresponding DOM element.
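If the message ends up somewhere that does not parse HTML (for example alert() or textContent), the numeric character references would have to be decoded first. A minimal sketch, assuming only numeric references like &#243; need handling; decodeNumericEntities is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical helper: decodes decimal (&#243;) and hex (&#xF3;) numeric
// character references. Named entities such as &amp; are left untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, function (match, code) {
    var codePoint = code.charAt(0).toLowerCase() === 'x'
      ? parseInt(code.slice(1), 16)
      : parseInt(code, 10);
    // Leave out-of-range values as they were instead of throwing.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

// Sample input made up for illustration.
var errorString = decodeNumericEntities('Validaci&#243;n inv&#225;lida');
console.log(errorString);
```

The decoded string is plain text, so it is best assigned with textContent or passed to alert() rather than written back through innerHTML.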

Check that you saved your file with UTF-8 encoding (just in case). Sometimes a file goes into TFS without a UTF-8 BOM, and then things can get messed up on the client side.

Related

How to escape smart quote in French characters while displaying on JSP

Currently, our application supports i18n.
We have one property file for each Locale.
For English, we were able to successfully display the placeholder defined in the property file.
Problem
When we change the Locale / Language settings in the application from English to French we were unable to replace the placeholder.
Placeholder - String Date - This is being successfully returned from the controller but still we couldn't replace on the UI page.
This is because the complete value for the key specified in the JSP is not being rendered properly for French Locale & as a result placeholder is not being replaced.
The special character with which we are facing issue is smart quote (d'essai)
Unicode for this character: U+00B4
We have tried placing UTF-8 encoding in the JSP page using Meta & Page directives but still, it didn't resolve the issue.
Any help is highly appreciated.
My Code Snippet
<fmt:message bundle="${myBundle}" key="myKey"><fmt:param value="${nextDateInString}"/></fmt:message>
Votre période d’essai gratuit prend fin le {0}
As you can see {0} is not getting replaced dynamically for French locale whereas for English it is working as expected.
I also tried StringEscapeUtils.escapeJavascript(myMessageFromProperties), but that did not help either.
The single quote is a special character in MessageFormat strings: it is used to quote text, which is then not processed. See the Java API documentation for MessageFormat.
You need two single quotes to escape it and display one single quote.

Working with characters based on their UTF-8 hex codes

I'm working on something that will read a user's text messages and export them to a csv file, which they can then download. The messages are being retrieved from a third-party web interface; I am essentially using js to grab the html of each message and compiling it as needed. The content of each message is added to a variable which, once all messages are gathered, is given to a new Blob, which is then downloaded.
The problem I am having is that, in this web interface, emoji are represented as images, rather than characters. Thus, when writing a message containing an emoji to a file, the result is as so:
"Blah blah blah <img height="18px" width="18px" class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" data-textvalue="%F0%9F%98%92" src="assets/blank.gif">"
Now, from this image, we can get 2 workable values:
The UTF-8 hex value
F09F9892
and the Unicode codepoint (I may be referring to this wrong, I don't know much about encoding).
U+1f612
Now, what I want to do is take either of these values (whichever works better), and write it to the csv file as the character itself, so that, when viewing the csv file in a text editor or what have you, it would appear as 😒
Though I have no idea where to even start with this. Maybe it's as simple as throwing some syntax around the character values, but I haven't been able to get anything from google, because I'm not familiar enough with encoding to know what to Google.
I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards.
You can then use decodeURIComponent() to decode the percent-encoded string:
decodeURIComponent('%F0%9F%98%92')
Combine that with jQuery to access the data-textvalue-attribute:
decodeURIComponent($(element).data('textvalue'))
I created a simple example on JSFiddle.
For some reason the emoji doesn't render correctly in the result screen in my browser, but that is a font issue. When looking at the result using a DOM inspector (or copying the text into a different application), the result is shown with a smiley.
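Either of the two values from the question can be turned into the character without jQuery. A small sketch, where fromUtf8Hex and fromCodePointHex are hypothetical helper names chosen for illustration:

```javascript
// Percent-encoded UTF-8 bytes, as found in the data-textvalue attribute.
var fromPercent = decodeURIComponent('%F0%9F%98%92');

// Hypothetical helper: turns a bare UTF-8 hex string such as 'F09F9892'
// into percent-encoded form ('%F0%9F%98%92') and decodes it.
function fromUtf8Hex(hex) {
  return decodeURIComponent(hex.replace(/([0-9a-fA-F]{2})/g, '%$1'));
}

// Hypothetical helper: turns a code point written in hex ('1f612'),
// e.g. taken from the sprite-1f612 class name, into the character.
function fromCodePointHex(hex) {
  return String.fromCodePoint(parseInt(hex, 16));
}

console.log(fromPercent === fromUtf8Hex('F09F9892'));   // true
console.log(fromPercent === fromCodePointHex('1f612')); // true
```

All three routes give the same string, so whichever value is easiest to extract from the img element should work.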
CSV file format does not have character encoding information, so Excel usually assumes ASCII.
https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality
Microsoft Excel mangles Diacritics in .csv files?
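One commonly used workaround is to start the file with a UTF-8 byte order mark, which Excel generally takes as a hint that the file is UTF-8. A minimal sketch, assuming the CSV text is built as a string before it goes into the Blob; withUtf8Bom and the sample row are made up for illustration:

```javascript
// Hypothetical helper: prefixes CSV text with U+FEFF so that, once the text
// is written out as UTF-8, the file starts with the bytes EF BB BF.
function withUtf8Bom(csvText) {
  var BOM = '\uFEFF';
  // Avoid adding a second BOM if one is already present.
  return csvText.charAt(0) === BOM ? csvText : BOM + csvText;
}

// Sample data; \uD83D\uDE12 is the surrogate pair for U+1F612.
var csv = withUtf8Bom('sender,message\r\nAlice,"Blah blah blah \uD83D\uDE12"\r\n');

// In a browser the string would then be handed to the Blob, for example:
// new Blob([csv], { type: 'text/csv;charset=utf-8' });
console.log(csv.charCodeAt(0).toString(16)); // feff
```

Whether a given spreadsheet application honours the BOM is worth checking with the versions your users actually have.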

Secure database entry against XSS

I'm creating an app that retrieves the text within a tweet, stores it in the database and then displays it in the browser.
The problem is that if the text contains PHP tags or HTML tags, it might be a security breach.
I looked into strip_tags() but saw some bad reviews. I also saw suggestions to use HTML Purifier, but it was last updated years ago.
So my question is how can I be 100% secure that if the tweet text is "<script> something_bad() </script>" it won't matter?
To state the obvious the tweets are sent to the database from users so I don't want to check all individually before displaying them.
You are NEVER 100% secure, however you should take a look at this. If you use ENT_QUOTES parameter too, currently there are no ways to inject ANY XSS on your website if you're using valid charset (and your users don't use outdated browsers). However, if you want to allow people to only post SOME html tags into their "Tweet" (for example <b> for bold text), you will need to take a deep look at EACH whitelisted tag.
You've passed the first stage, which is to recognise that there is a potential issue, and skipped straight to trying to find a solution, without stopping to think about how you want to deal with the content in this scenario. This is a critical precursor to solving the problem.
The general rule is that you validate input and escape output
validate input
- decide whether to accept or reject it in its entirety
if (htmlentities($input) != $input) {
die("yuck! that tastes bad");
}
escape output
- transform the data appropriately according to where its going.
If you simply....
print "<script> something_bad() </script>";
That would be bad, but....
print json_encode(htmlentities("<script> something_bad() </script>"));
...then you would have to have done something very strange at the front end for the client to be susceptible to a stored XSS attack.
If you're outputting to HTML (and I recommend you always do), simply HTML encode on output to the page.
As client script code is only dangerous when interpreted by the browser, it only needs to be encoded on output. After all, to the database <script> is just text. To the browser <script> tells the browser to interpret the following text as executable code, which is why you should encode it to &lt;script&gt;.
The OWASP XSS Prevention Cheat Sheet shows how you should do this properly depending on output context. Things get complicated when outputting to JavaScript (you may need to hex encode and HTML encode in the right order), so it is often much easier to always output to a HTML tag and then read that tag using JavaScript in the DOM rather than inserting dynamic data in scripts directly.
At the very minimum you should be encoding the < & characters and specifying the charset in metatag/HTTP header to avoid UTF7 XSS.
You need to convert the HTML characters <, > (mainly) into their HTML equivalents &lt;, &gt;.
This will make < and > be displayed in the browser, but not executed, i.e. if you look at the source, an example may be &lt;script&gt;alert('xss')&lt;/script&gt;.
Before you input your data into your database, or on output, use htmlentities().
Further reading: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

Output script tags without jQuery, avoiding execution

I have JS calling remote server through AJAX. The response contains something similar to this
<script>alert(document.getElementById('some_generated_id').innerHTML); ... </script>
User copies the response and uses for own purposes. Now I need to make sure that not a single browser runs the code when I do this:
var response = '<scrip.....';
document.getElementById('output_box').innerHTML = response;
Same should apply to any HTML tags. I know that .text() from jQuery will do exactly what I need:
var response = '<scrip.....';
$('#output_box').text(response);
I am looking for any solutions, including, but not limited to: escaping special characters, however displaying them correctly; adding zero-width space to tags (has to be efficient); outputting in parts. Has to be pure JS.
If you're using a server-side language there is probably a method to escape special characters.
In PHP you could use htmlspecialchars(); it will convert certain characters that have significance in HTML to HTML entities (e.g. & to &amp;).
They will still display correctly and you'll be able to copy and paste the text, but the javascript shouldn't run.
If you need a pure javascript solution for this, someone has answered that here https://stackoverflow.com/a/4835406/15000
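For reference, a pure JS escape along those lines can be quite short. This is a sketch rather than the code from the linked answer, and escapeHtml is a hypothetical name:

```javascript
// Hypothetical helper: replaces the five characters that are significant in
// HTML text and attribute values with their entity equivalents.
function escapeHtml(text) {
  var replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, function (ch) {
    return replacements[ch];
  });
}

var response = "<script>alert('x')</script>";
var safe = escapeHtml(response);

// In the page this would be used as:
// document.getElementById('output_box').innerHTML = safe;
console.log(safe); // &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;
```

In a browser, assigning the raw response to the element's textContent property is the closest plain DOM counterpart to jQuery's .text() and avoids the need for manual escaping.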

french chars html - javascript

I have an html page where I can fill in some text and send it (with javascript) to an sql database.
On my pc everything works fine, but on another one (a French Windows) it doesn't save my chars correctly.
French chars like é, è, â,.. were saved as 'É', or something like that.
I googled a lot but still have not found a solution, and I'm also not able to reproduce the problem on my own pc.
"É" occurs when a character encoded in utf-8 (2 bytes) is read as latin (1 byte). The problem can be on the client side (e.g. by the use of escape) or on the server side (wrong parsing of the form's POST data, database encoding).
Make sure that your html pages encoding is set to something like UTF-8, UTF-16, etc... Also make sure that your strings are escaped properly in javascript.
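To see how a UTF-8-read-as-Latin-1 mix-up produces this kind of garbage, it can be reproduced in a few lines. A sketch with hypothetical helper names; it uses strict ISO-8859-1 (one byte per character), whereas a Windows machine may use windows-1252, which differs for bytes 0x80 to 0x9F:

```javascript
// Hypothetical helper: encodes text as UTF-8 and then (wrongly) treats every
// byte as one Latin-1 character, which is what a misconfigured reader does.
function readUtf8AsLatin1(text) {
  var bytes = new TextEncoder().encode(text); // always UTF-8
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Hypothetical helper: reverses the mistake, assuming every character in the
// damaged text really is a single byte (code unit below 256).
function repairLatin1Misread(text) {
  var bytes = new Uint8Array(text.length);
  for (var i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xFF;
  }
  return new TextDecoder('utf-8').decode(bytes);
}

var damaged = readUtf8AsLatin1('é');
console.log(damaged);                      // Ã©
console.log(repairLatin1Misread(damaged)); // é
```

Repairing after the fact is only a stopgap; the lasting fix is to make the page, the request handling and the database agree on one encoding.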
You need to encode the file in ANSI. I do this myself. For example, in Notepad2 you would click File->Encoding->ANSI and then save.
