I have a web application consisting of a client (mainly AngularJS, jQuery and Bootstrap), a servlet (Tomcat) and a database (MySQL).
The user can enter text in a number of places (a sort of free-text form). The client prepares a JSON and sends it to the servlet, which forwards it to the DB, and a response JSON is returned all the way back to the client.
I found a mishandling (causing a "Character decoding failure" in the servlet) when special characters are included in the text. Specifically, I copied text from MS Word and pasted it into the input fields, and the string included some characters that MS Word replaces automatically (e.g. a straight quote becomes a curly one: if you just type "I don't know", the ' is replaced by ’), causing the error.
I tried removing control characters using myString=myString.replace(/[\x00-\x1F\x7F-\x9F]/g, "") but with no success.
Could anyone suggest the standard practice for properly handling this condition?
Thanks!!!
EDIT:
Here are the lines where the error is being reported (the JSON is quite large, so I'm only showing the relevant sections):
Jul 30, 2016 11:56:29 AM org.apache.tomcat.util.http.Parameters processParameters
INFO: Character decoding failed. Parameter [request] with value [{...,"Text":"I donֳ¢ֲ€ֲ™t know"..."I donֳ¢ֲ€ֲ™t know"...}] has been ignored. Note that the name and value quoted here may be corrupted due to the failed decoding. Use debug level logging to see the original, non-corrupted values.
Note: further occurrences of Parameter errors will be logged at DEBUG level.
Try changing the encoding Tomcat uses to decode URI parameters. In conf/server.xml, find the Connector element and set URIEncoding, like this:
<Connector port="8080" URIEncoding="UTF-8"/>
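As a complementary client-side workaround (once Tomcat and the servlet consistently use UTF-8), you can normalize Word's "smart" punctuation back to plain ASCII before building the JSON. A minimal sketch, with a partial, illustrative mapping rather than an exhaustive one:

```javascript
// Sketch: map common MS Word "smart" punctuation back to ASCII
// before the text is put into the JSON. Partial list only.
function normalizeSmartPunctuation(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'")   // curly single quotes -> '
    .replace(/[\u201C\u201D]/g, '"')   // curly double quotes -> "
    .replace(/[\u2013\u2014]/g, "-")   // en/em dashes -> -
    .replace(/\u2026/g, "...");        // ellipsis -> ...
}
```

This treats the symptom rather than the cause, so it belongs alongside, not instead of, a consistent UTF-8 pipeline.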
Related
I have a standard Dialogflow agent, using JavaScript/Node.js webhooks. It works perfectly well in most cases. I have recently encountered a problem which has me at a complete loss. I am currently saving some JSON objects in conv.data to minimize the external API calls my webhook has to make. For one specific JSON object, fetched from an external API using node-fetch, the response I send from my side looks perfectly ordinary. I use Firebase, and the Firebase logs do not show any error messages or any sign that there might be a problem. But I get this error in the Google Actions console:
UnparseableJsonResponse API Version 2: Failed to parse JSON response string with 'INVALID_ARGUMENT' error: "Parsing terminated before end of input. 8,\\"3\\":12},\\"w ^".
And in the stackdriver logs, the received response does not start with the usual
Received response from agent with body: HTTP/1.1 200 OK Server: ... etc
Instead, it starts in the middle of the external API's JSON:
Received response from agent with body: 8,\\"3\\":12},\\"winPercentage\\":1392}}}}, ... etc
This does not happen the first time the agent responds after fetching the JSON from the external API. The second time the agent responds after the fetch, everything crashes, regardless of whether that second call uses any information from the JSON, unless the JSON file is overwritten between the first and second call. If the file is overwritten, the program runs perfectly. So the problem likely lies in storing and/or parsing this specific JSON file. Unfortunately, the API I use in this application is not a public one, and due to NDAs I cannot give any access to that JSON, so I understand it is probably impossible for you to help me. I will, however, give as much information about the JSON as I can, and hope for the best:
It is valid according to https://codebeautify.org/jsonvalidator and jsonlint.com
It is structured the exact same way as other JSON files from the same API which do not crash the application
It is slightly larger than other JSON files from the same API. It has around 340,000 characters; others are around 280,000-300,000.
All the JSONs, this one as well as those that work, are from a Swedish company, so unusual characters like å, ä and ö are likely present.
The error message is always the same, except the start of the response is in different places in the JSON file. "8,\\"3\\":12}, ...", "ostPosition\\":2 ...", "3804,\\"startPoints\\":2960 ..." are some examples.
I am extremely grateful for any and all help I might receive, even if it's just what questions I need to ask, or where I might try troubleshooting next.
I suspect the problem is that the JSON you're trying to save is larger than the buffer size they allocate for conv.data, although I can't find any documentation to say there is some specific limit.
I'd check to see where the strings you're seeing in the error header are located in the JSON and try to keep it well under that limit.
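If the cause is indeed a size limit on conv.data, a small guard before caching would make the failure explicit instead of silent. A minimal Node.js sketch, assuming a made-up MAX_CONV_DATA_BYTES budget (the real limit, if any, is undocumented):

```javascript
// Sketch: refuse to cache an oversized payload in conv.data.
// MAX_CONV_DATA_BYTES is a guessed budget, not a documented limit.
const MAX_CONV_DATA_BYTES = 100000;

function fitsInConvData(obj) {
  const bytes = Buffer.byteLength(JSON.stringify(obj), "utf8");
  return bytes <= MAX_CONV_DATA_BYTES;
}
```

If the check fails, fall back to re-fetching from the external API instead of caching, and log the measured size so you can home in on the actual threshold empirically.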
My requirement is to allow users to use (type) ANSI characters instead of UTF-8 when they are typing into the text fields of my webpages.
I looked at setting the character set in the HTML meta tag:
<meta charset="ISO-8859-1">
That was helpful for displaying the content in ANSI instead of UTF-8, but it does not stop users from typing UTF-8 characters. Any help is appreciated.
Let's distinguish between two things here: characters the user can type and the encoding used to send this data to the server. These are two separate issues.
A user can type anything they want into a form in their browser. For all intents and purposes these characters have no encoding at this point, they're pure "text"; encodings do not play a role just yet and you cannot restrict the set of available characters with encodings.
Once the user submits the form, the browser has to encode this data into binary somehow, which is where an encoding comes in. Ultimately the browser decides how to encode the data, but it will choose the encoding specified in the HTTP headers, meta elements and/or the accept-charset attribute of the form. The latter should always be the deciding factor, but you'll find buggy behaviour in the real world (*cough*cough*IE*cough*). In practice, all three character set definitions should be identical so as not to cause any confusion.
Now, if your user typed in some "exotic" characters and the browser has decided to encode the data in "ANSI" and the chosen encoding cannot represent those exotic characters, then the browser will typically replace those characters with HTML entities. So, even in this case it doesn't restrict the allowed characters, it simply finds a different way to encode them.
How can I know what encoding is used by the user
You cannot. You can only specify which character set you would like to receive and then double check that that's actually what you did receive. If the expectation doesn't match, reject the input (an HTTP 400 Bad Request response may be in order).
If you want to limit the acceptable set of characters a user may input, you need to do this by checking and rejecting characters independent of their encoding. You can do this in Javascript at input time, and will ultimately need to do this on the server again (since browser-side Javascript ultimately has no influence on what can get submitted to the server).
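A minimal sketch of such an encoding-independent whitelist check in JavaScript (the allowed character class here is just an example; the same check must be repeated on the server, since client-side code can be bypassed):

```javascript
// Sketch: accept only characters from an explicit whitelist,
// independent of any transport encoding. The class below (ASCII
// letters, digits, basic punctuation, Latin-1/Latin Extended-A
// letters) is illustrative; adjust it to your requirements.
const ALLOWED = /^[A-Za-z0-9 .,!?'"()\u00C0-\u017F-]*$/;

function isAcceptable(text) {
  return ALLOWED.test(text);
}
```

Attach this to an input event handler to reject characters at typing time, and run the equivalent check server-side before accepting the submission.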
If you set the encoding of the page to UTF-8 in a meta element and/or HTTP header, it will be interpreted as UTF-8, unless the user deliberately goes to the View -> Encoding menu and selects a different encoding, overriding the one you specified.
In that case, accept-charset would have the effect of setting the submission encoding back to UTF-8 in the face of the user messing about with the page encoding. However, this still won't work in IE, due to the previously discussed problems with accept-charset in that browser.
So it's IMO doubtful whether it's worth including accept-charset just to fix the case where a non-IE user has deliberately sabotaged the page encoding.
I am trying to understand the steps I have to follow in order for data to be input and output securely on a website. This is what I understood so far:
**Procedure**
1) User inputs data.
2) This data is validated using JavaScript. If the data doesn't match the structure you requested, send an error message.
3) The data is also validated using PHP, in case JavaScript is disabled or not supported by the browser. The PHP validation will be almost identical to the JavaScript one. If the data doesn't match the requested structure, send an error message.
4) Open a connection to the database (PDO method).
5) Check the input data against your database using prepared statements (PDO method) and return an error message if required [for example, if the data is an email address then we cannot have 2 users with the same email address / error message: This email address is already registered. If you are already registered please log in or use another email address to register].
6) After all checking is done [client-side (JavaScript) and server-side (PHP)], use prepared statements to insert the un-escaped data into the database.
7) When data is requested and must be displayed in the web browser, only then escape the data on output, to prevent XSS.
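Step 7 in PHP is typically htmlspecialchars; the same idea sketched in JavaScript (the function name is illustrative):

```javascript
// Sketch: escape data only at output time, so the database keeps
// the raw text. Mirrors PHP's htmlspecialchars with ENT_QUOTES.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

Note the order matters: & must be escaped first, or the other replacements would be double-escaped.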
**Security**
A) The PHP script will use session_regenerate_id when there is a change in the level of privilege (from logged in to logged out and vice versa) - mitigates session fixation.
B) SSL will be used to minimize the exposure of data between the client and the server.
C) The form will have a hidden field containing an anti-CSRF token, which will be checked against the one stored in the session - mitigates CSRF.
D) Passwords will be stored after hashing them with the Bcrypt hashing algorithm (with a proper salt).
E) The validation in (2) + (3) will use regular expressions. I understand that a wrong regular expression can cause many errors. Are there any generally accepted regular expressions for validating email addresses, passwords, etc.?
**Questions:**
1) Do I understand the input/output procedure correctly? Am I doing something wrong?
2) I know that security-wise you can never be 100% protected. What else should I do? Is anything I wrote above wrong?
Thanks in advance.
Yes, you understand it all right in general.
That is, as you noted yourself, an endlessly open topic. There are thousands of vectors. Some of them: include injection (never include or read a file taken blindly from user input), eval (avoid this operator like hot iron), and upload injection, which is itself a wide topic with multiple issues (in short, always verify the input data format).
As for regexps - oh, yes. Just try Google.
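For email specifically, a deliberately loose pattern is usually the pragmatic choice; fully validating RFC 5322 addresses by regex is not practical, and the real test is sending a confirmation mail. A sketch:

```javascript
// A commonly used, deliberately loose email pattern: exactly one "@",
// no whitespace, and at least one dot in the domain part.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function looksLikeEmail(value) {
  return EMAIL_RE.test(value);
}
```

Anything this pattern rejects is certainly not deliverable; anything it accepts still needs the confirmation-mail round trip to be trusted.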
What is the maximum amount of text a textarea can accept? The HTML page works fine when the text is limited to about 130-140 words, but when the text exceeds that limit it doesn't do anything (just hangs). This text is passed through JavaScript for some manipulation and displayed in another textarea. If there is a limit, how can I make it accept a large amount of text?
UPDATE :
I get the following error when I check the error log
request failed: URI too long (longer than 8190)
I am using the following line to pass the text through javascript
xmlhttp.open("GET","./analyze.pl?unk="+str ,true);
The problem isn't with the <textarea>. The problem is that you are creating a URL that is too long.
Submit the data using a POST, not a GET, and the problem will go away.
As a general rule: if you have occasion to worry about URL length, you are probably passing too much data via query string parameters. From a REST standpoint, consider that a GET is used to retrieve a resource. A GET should not be used to submit data that will create/update a resource (such as one might do when entering data into a <textarea>).
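A sketch of the switch to POST, assuming the analyze.pl endpoint from the question accepts a standard form-encoded body (buildPostBody is an illustrative helper, not part of any library):

```javascript
// Sketch: move the payload out of the URL and into a POST body.
function buildPostBody(params) {
  return Object.keys(params)
    .map(k => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .join("&");
}

// Usage with the original XHR (not executed here):
// xmlhttp.open("POST", "./analyze.pl", true);
// xmlhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
// xmlhttp.send(buildPostBody({ unk: str }));
```

The server-side script reads the same parameter name as before; only the transport changes, and the 8190-byte URI limit no longer applies.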
I use a maxlength of 8000 on one of my sites and it didn't have any problems. Your JavaScript must be the cause of the problem (my guess is an infinite/long loop), OR it must be the browser you're testing on, OR your computer.
It would be best if you showed your JavaScript code.
I'm developing a greasemonkey plugin, which is supposed to send a form in background using POST (GM_xmlhttpRequest) on an application not under my control. That application is written in PHP and seems to expect all its input in windows-1250 encoding. What I need to do is to take all the form fields as they are, edit just one of them and resubmit. Some of the fields use accented characters and are limited in length.
Not a problem in theory - I iterate over all form fields, use the encodeURIComponent function on the values and concatenate everything to a post request body. HOWEVER. The encodeURIComponent function always encodes characters according to UTF-8, which leads to all sorts of problems. Because PHP doesn't seem to recode my request to windows-1250 properly, it misinterprets multibyte strings and comes to the conclusion that the resubmitted values are longer than the allowed 40 characters and dies on me. Or the script just dies silently without giving me any sort of useful feedback.
I have tested this by looking at the POST body firefox is sending when I submit the form in a browser window and then resending the same data to the server using xhr. Which worked. For example the string:
Zajišťujeme profesionální modelky
Looks as follows, when encoded by encodeURIComponent:
Zaji%C5%A1%C5%A5ujeme%20profesion%C3%A1ln%C3%AD%20modelky
Same thing using urlencode in PHP (source text in windows-1250) or Firefox:
Zaji%9A%9Dujeme+profesion%E1ln%ED+modelky
Apparently, I need to encode the post body as if it were in windows-1250, or somehow make the server accept UTF-8 (which I doubt is possible). I tried all kinds of other functions like escape or encodeURI, but the output is not much different; all of them seem to output UTF-8.
Is there any way out of this?
Another way to get Firefox to encode a URL is to set it as the href of a link. The property (NOT attribute) will always read back as an absolute link urlencoded in the page's encoding.
For a GET request you would simply set the href as http://server/cgi?var=value and read back the encoded form. For a POST request you would have to take the extra step to separate the data (you can't use ?var=value on its own because the link reads back as an absolute link).
Let the browser encode the form. Put it in a hidden iframe and call submit() on it.
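If you do need to build the windows-1250 percent-encoding by hand instead, a lookup-table approach works. The sketch below maps only the four non-ASCII characters from the example string; a real implementation needs the full windows-1250 code page:

```javascript
// Sketch: urlencode a string as windows-1250 using a partial lookup
// table (PHP urlencode style: space becomes "+"). Only the non-ASCII
// characters from the example are mapped; extend CP1250 for real use.
const CP1250 = {
  "\u0161": 0x9A, // š
  "\u0165": 0x9D, // ť
  "\u00E1": 0xE1, // á
  "\u00ED": 0xED  // í
};

function urlencodeCp1250(text) {
  let out = "";
  for (const ch of text) {
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (ch === " ") {
      out += "+";
    } else {
      const code = CP1250[ch] !== undefined ? CP1250[ch] : ch.charCodeAt(0);
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}
```

On the example string this reproduces the windows-1250 form shown above (Zaji%9A%9Dujeme+profesion%E1ln%ED+modelky), so the resubmitted values stay within the server's 40-character limit.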