On my website visitors can do some inline editing. I use ajax for it with a MySQL database and PHP. I expect the Dutch language to be used on the website.
My challenge is to get the character encoding to work well.
I could use advice on:
the database (do I use UTF-8 or latin1_swedish_ci?)
the tables in the database (I'd prefer to have them match the database)
the escaping to use in the ajax call (x = escape(x);)
the webpage character set (UTF-8? ISO-something?)
how this all works together.
I use nicEdit as javascript wysiwyg editor.
I could of course explain what happens when I want to save ë, and if that helps I will, but I figured it would be best to understand the matter instead of just trying to quick-fix it.
[EDIT]
To elaborate:
I use these in my PHP
$input = stripslashes($input); //(if magic quotes are 'on')
$input = mysql_real_escape_string($input);
$input = strip_tags($input, '<strong><em><span><ul><ol><p><a><br><li>');
In my htmlpage:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Javascript:
x = escape(x);
Database:
MySQL connection collation: utf8_general_ci
Table options: DEFAULT CHARSET=utf8
This is an example of what happens:
I enter (inline) the word Rëg (using 'option+u' then 'e' on my mac).
I save the word. It shows like this: R�g on the webpage.
In the database I find Rëg.
I open the editor, do nothing but save again and it shows: R%uFFFDg in the database as well as on the page. After that it does not change anymore.
Any help is greatly appreciated.
Kim
It shows like this: R�g on the webpage.
You need to instruct the web browser that you're serving the page as UTF-8 so that it interprets it that way. Add the following to the top of your PHP script, before emitting any character to the output:
header('Content-Type: text/html; charset=utf-8');
The <meta> tag alone is not enough; it is the response header which counts for the web browser. By the way, JavaScript's escape() function is deprecated; use encodeURIComponent() instead.
See also:
PHP UTF-8 cheatsheet
Just use UTF-8 for everything, and normally it will just work.
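To tie the pieces together, here is a minimal sketch of an all-UTF-8 setup, assuming the mysqli extension rather than the old mysql_* functions from the question; the connection details, table and column names are placeholders, not from the question:
<?php
// Tell the browser the page is UTF-8, before any output is sent.
header('Content-Type: text/html; charset=utf-8');

// Tell MySQL the connection is UTF-8 so nothing is transcoded on the way in or out.
// (utf8mb4 is preferable on newer MySQL versions if you need the full Unicode range.)
$db = new mysqli('localhost', 'user', 'password', 'mydb');
$db->set_charset('utf8');

// Escape against the UTF-8 connection before building the query.
$text = $db->real_escape_string($_POST['text']);
$db->query("UPDATE pages SET body = '$text' WHERE id = 1");

// Encode on output so stored markup is displayed as text, not executed.
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
?>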
Related
Say I have a form with which the user inputs some information that is submitted to the server using PHP, and in the PHP code I have, say:
$data = $_POST['data'];
// or
$data = strip_tags(@$_POST['data']);
I want to know if strip_tags() is enough to stop JavaScript injection through HTML forms. If not, how else can this be prevented? I have read here.
And also, say I input javascript:void(document.bgColor="blue") in the browser address bar; this changes the whole site's background color to blue. How can JavaScript injection through the address bar be prevented?
Thanks.
I suggest using htmlspecialchars() whenever you want to output something to the browser:
echo htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
Check this out.
For question 2, I'm not sure if that's even possible to prevent. It's not something I've ever considered before. It sounds like you're trying to prevent executing any javascript that wasn't included by you on the page, which would also mean blocking the devtools in the browser from executing anything in the console. This could potentially be hostile to your users, e.g. if they wanted to use a bookmarklet from Instapaper.
For 1, ultimately your goal is to avoid including this injected javascript from the form when you generate a new page. When you output the data from the form, you can wrap it in htmlspecialchars.
It depends on which output you are trying to get.
In some cases, you'll want to keep the HTML tags, including script tags, but you don't want those elements to run when you output them. In that case you should use htmlspecialchars($_POST['data']) (it's also suggested to pass 'UTF-8' as the third parameter).
But if you want to remove the tags entirely, then strip_tags will prevent XSS.
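To make the difference concrete, here is a small illustrative comparison (the input string is made up):
<?php
$input = '<b>Hello</b> <script>alert("XSS")</script>';

// strip_tags removes the markup entirely (optionally keeping an allowlist of tags).
echo strip_tags($input);             // Hello alert("XSS")
echo strip_tags($input, '<b>');      // <b>Hello</b> alert("XSS")

// htmlspecialchars keeps the markup but renders it inert as text.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// &lt;b&gt;Hello&lt;/b&gt; &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
?>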
One function cannot fully protect you from script injection. Consider the following program:
<?php
if (isset($_POST['height']))
    $height = htmlspecialchars($_POST['height'], ENT_QUOTES, 'UTF-8');
else
    $height = 200;
if (isset($_POST['width']))
    $width = htmlspecialchars($_POST['width'], ENT_QUOTES, 'UTF-8');
else
    $width = 300;
echo("
<!DOCTYPE html>
<html>
<body>
<iframe src='whatever' height=$height width=$width>
</iframe>
</body>
</html>
");
The input is sanitized, but javascript will still be executed through a simple injection vector like:
300 onload=alert(String.fromCharCode(88)+String.fromCharCode(83)+String.fromCharCode(83))
You still need to quote your attributes or you are vulnerable, as in this example.
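For comparison, a sketch of the same echo with the attribute values quoted; since ENT_QUOTES encodes both double and single quotes, the payload above can no longer break out of the attribute:
// Quoted attributes: the sanitized value stays inside the quotes,
// so "300 onload=..." is just part of the width value, not a new attribute.
echo("<iframe src='whatever' height='$height' width='$width'></iframe>");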
Another semi-common injection vector exists when user input is echoed into javascript comments, and you can inject new lines or close the comment. I blame it on the 'this shit doesn't work as it should, but let's keep it around in a comment'-style of development.
Note: The XSS protection of many browsers will not run my simple example. If you want to try it use one without protection, or find a vector that defeats it (not sure if there is one for e.g. Chrome).
I have created a website based on
SQL Server 2014
C# (ASP.NET)
Javascript and jQuery
The user can store information within a textbox on my site.
To prevent injection, I can encode/decode special characters.
The user should be able to submit code like below, but the code should not be executed. So far so good.
<script type="text/javascript">
$(document).ready(function () {
console.log('uuuuups.....');
});
</script>
This code will be stored as-is in the database (without encoding first).
Now I would like to offer CKEditor to my users and give them the ability to use the code plugin. The code plugin itself creates the following code:
<pre class="brush:jscript;">
<script type="text/javascript">
$(document).ready(function () {
console.log('uuuuups....');
});
</script></pre>
I have tried to replace characters within SQL like
replace(replace(@text, '<', '&lt;'), '>', '&gt;')
But this seems to break the code when I try to view it.
My problem now is, how to handle this?
Do I have to encode twice?
Every hint will be appreciated.
I am using CKEditor on my website, and I face the same issue with JavaScript injection: how do I prevent it without disrupting the view?
Try, on your server side, to parse the "<script ...>javascript code ...</script>" block and clear it. I think it is not difficult to find this tag in ASP.NET or in PHP.
Good luck
I'm creating an app that retrieves the text within a tweet, store it in the database and then display it on the browser.
The problem is that if the text has PHP tags or HTML tags, it might be a security breach.
I looked into strip_tags() but saw some bad reviews. I also saw suggestions to HTML Purifier but it was last updated years ago.
So my question is how can I be 100% secure that if the tweet text is "<script> something_bad() </script>" it won't matter?
To state the obvious, the tweets are sent to the database by users, so I don't want to check them all individually before displaying them.
You are NEVER 100% secure; however, you should take a look at this. If you use the ENT_QUOTES parameter too, there is currently no way to inject ANY XSS on your website, provided you're using a valid charset (and your users don't use outdated browsers). However, if you want to allow people to post only SOME HTML tags in their tweets (for example <b> for bold text), you will need to take a deep look at EACH whitelisted tag.
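For example (the sample string here is made up), strip_tags with an allowlist keeps whatever attributes those tags carry, which is exactly why each whitelisted tag needs its own inspection:
<?php
$tweet = '<b onmouseover="alert(1)">bold</b> <a href="javascript:alert(2)">link</a>';
echo strip_tags($tweet, '<b><a>');
// Output: <b onmouseover="alert(1)">bold</b> <a href="javascript:alert(2)">link</a>
// The allowed tags survive with their attributes intact, so event handlers and
// javascript: URLs still run; a real whitelist also has to filter attributes.
?>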
You've passed the first stage, which is to recognise that there is a potential issue, and skipped straight to trying to find a solution without stopping to think about how you want to deal with the content. This is a critical precursor to solving the problem.
The general rule is that you validate input and escape output
validate input
- decide whether to accept or reject it in its entirety:
if (htmlentities($input) != $input) {
    die("yuck! that tastes bad");
}
escape output
- transform the data appropriately according to where it's going.
If you simply....
print "<script> something_bad() </script>";
That would be bad, but....
print json_encode(htmlentities("<script> something_bad() </script>"));
...then you would have had to do something very strange at the front end to make the client susceptible to a stored XSS attack.
If you're outputting to HTML (and I recommend you always do), simply HTML encode on output to the page.
As client script code is only dangerous when interpreted by the browser, it only needs to be encoded on output. After all, to the database <script> is just text. To the browser <script> tells the browser to interpret the following text as executable code, which is why you should encode it to &lt;script&gt;.
The OWASP XSS Prevention Cheat Sheet shows how you should do this properly depending on output context. Things get complicated when outputting to JavaScript (you may need to hex encode and HTML encode in the right order), so it is often much easier to always output to an HTML tag and then read that tag using JavaScript in the DOM, rather than inserting dynamic data in scripts directly.
At the very minimum you should be encoding the < and & characters and specifying the charset in the meta tag/HTTP header to avoid UTF-7 XSS.
You need to convert the HTML characters <, > (mainly) into their HTML entity equivalents &lt;, &gt;.
This will make a < and > be displayed in the browser, but not executed; i.e., if you look at the source, an example may be &lt;script&gt;alert('xss')&lt;/script&gt;.
Before you input your data into your database - or on output - use htmlentities().
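A minimal sketch of encoding on output only, so the original tweet text stays untouched in the database (the variable name is illustrative):
<?php
// $tweet comes straight from the database; encode it only at the moment of display.
$tweet = '<script> something_bad() </script>';
echo htmlentities($tweet, ENT_QUOTES, 'UTF-8');
// The browser shows the literal text <script> something_bad() </script> and runs nothing.
?>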
Further reading: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet
I'm trying to display the pound symbol in HTML (from PHP) but all I get is a symbol with a question mark.
The following are things that I've tried.
In PHP:
header('Content-type: text/html; charset=utf-8');
In HTML, put this in the head tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I tried displaying it using a javascript function which converts it to:
�
I suppose it would help if I knew what I was doing... but I guess that's why I'm asking this question :)
Educated guess: you have an ISO-8859-1 encoded pound sign in a UTF-8 encoded page.
Make sure your data is in the right encoding and everything will work fine.
Use &pound;. I had the same problem and solved it using jQuery:
$(this).text('&pound;');
If you try this and it does not work, just change the jQuery method:
$(this).html('&pound;');
This always works, in all contexts...
1st: the pound symbol is a "special" char in UTF-8 encoding (try saving £ in an ISO-8859-1 (or ISO-8859-15) file and serving it as UTF-8, and you will get a broken character).
2nd: change the file's encoding to UTF-8.
There are plenty of ways to do it.
Notepad and Notepad++ are great suggestions.
3rd: use ob_start(); (in PHP) BEFORE YOU MAKE ANY OUTPUT if you are getting weird encoding errors, like the encoding going missing sometimes.
And YES, this solves it!
This kind of error occurs when one page is encoded in Windows-1252 (ANSI), ASCII, or ISO-8859-1(5) and all the others are in UTF-8.
This is a terrible error and can cause weird things like session_start(); not working.
4th: other PHP solutions:
utf8_encode('£');
htmlentities('£');
echo '&pound;';
5th: JavaScript solutions:
document.getElementById('id_goes_here').innerText.replace('Â£', '£');
document.getElementById('id_goes_here').innerText.replace('Â£', "\u00A3");
$(this).html().replace('Â£', '£'); //jquery
$(this).html().replace('Â£', "\u00A3"); //jquery
String.fromCharCode(163);
You MUST send the broken sequence (Â£ here) as the search string, so the replace can repair the broken encoded code point.
Please, avoid these solutions!
Use PHP!
These solutions only show how to 'fix' the error, and the last one only creates the well-encoded char.
Have you tried displaying a &pound;?
Here is an overwhelming list.
You could try using &pound; or &#163; instead of embedding the character directly; if you embed it directly, you're more likely to run into encoding issues in which your editor saves the file as ISO-8859-1 but it's interpreted as UTF-8, or vice versa.
If you want to embed it (or other Unicode characters) directly, make sure you actually save your file as UTF-8, and set the encoding as you did with the Content-Type header. Make sure when you get the file from the server that the header is present and correct, and that the file hasn't been transcoded by the web server.
Or for other code equivalents try:
&pound;
&#163;
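For instance, a page along these lines (file saved as UTF-8, using the header and meta tag already mentioned above) should show the symbol either way; the markup here is only a sketch:
<?php
// Must be sent before any output.
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html>
<meta charset="utf-8">
<p>Entity: &pound; &#163;</p>
<p>Literal: £</p> <!-- only renders correctly if this file really is saved as UTF-8 -->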
You need to save your PHP script file in UTF-8 encoding, and leave the <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> in the HTML.
For text editor, I recommend Notepad++, because it can detect and display the actual encoding of the file (in the lower right corner of the editor), and you can convert it as well.
This works in Chrome, IE, and Firefox.
In Database > table > field type: for example, set the symbol column to varchar(2) with utf8_bin collation.
PHP code:
$symbol = '&pound;';
echo mb_convert_encoding($symbol, 'UTF-8', 'HTML-ENTITIES');
or
html_entity_decode($symbol, ENT_NOQUOTES, 'UTF-8');
And also make sure to set the HTML or XML encoding to encoding="UTF-8".
Note: You should make sure that the database, document type, and PHP code all use the same encoding.
However, the better solution would be using &pound;.
I'm running into a character encoding issue when I load a dropdown using jQuery from an external js file. This only seems to happen when the JavaScript object is not within the page.
For example the below is the JavaScript object.
var langs = [
{value:'zh-CN', text:'中文 (简体) Chinese Simplified'},
{value:'en', text:'English'},
{value:'eo', text:'EsperAnt'},
{value:'es', text:'Español'},
{value:'ja', text:'日本語 (Japanese)'},
{value:'pt-PT', text:'Português'},
{value:'ru', text:'Русский (Russian)'},
];
If this is in my page with the proper meta tags <meta http-equiv="content-type" content="text/html; charset=utf-8" /> the below code works.
$(document).ready(function() {
// Fill language select
$.each(langs, function(i, j){
$('#LangSelect').append($("<option></option>").attr("value", j.value).text(j.text));
});
});
But, since I need languages on more than one page, I've moved the langs object to an external js file and referenced it. After doing this, I run into encoding issues, such as Russian characters becoming РуÑÑкий (Russian).
This encoding issue still appears even when the reference to the external js file is set as below:
<script type="text/javascript" charset="UTF-8" src="externalJS.js"></script>
Is there any way to force the JavaScript object to be loaded with the proper encoding from an external file?
Please note I am experiencing these issues when viewing content on the iPhone Mobile Safari browser. Additionally these pages are simply html and JavaScript without any server side components.
Thanks in advance,
Ben
Is there any way to force the JavaScript object to be loaded with the proper encoding from an external file?
Yes, the script charset attribute, as you quoted. However, it historically didn't work everywhere and was best not relied on. Where it is not supported, the browser will always use the charset of the main page as the charset of the script. So as long as you include the UTF-8 charset parameter in the main page, you should be fine either way.
I am surprised if a modern browser like Mobile Safari doesn't understand it, though.
Is it possible your server might be serving .js files with a bad Content-Type header containing a wrong charset? A combination of unset mime-types for JS plus AddDefaultCharset in Apache could leave you with:
Content-Type: text/plain;charset=iso-8859-1
which might well have the effect of mucking it up.
Make sure you save the JavaScript file using UTF-8 encoding. If you open the file in Notepad++, you can click Format > Encode in UTF-8. (If you try Format > Convert to UTF-8, have a look at the file in a hex editor afterwards; sometimes you end up with some strange characters, likely a UTF-8 byte-order mark, at the beginning of the file.)