Javascript Html Swedish characters strange behavior

I would like to ask about some strange behavior I am seeing with ASCII escape sequences for some Swedish characters.
More specifically, to support a multilingual site, I have a JSON file in which I have specified all required messages in Swedish, e.g. 'Avancerad sök', with the ö written as an escape sequence.
When the page loads the first time, I set this value on a text input and it is displayed properly: 'Avancerad sök'. But when I click a button and set the value of this input again, the escape sequence is shown literally instead of the ö.
Has anyone faced a similar problem?
Thanks a lot!
Code:
q('#keyword').val(qLanguage.qAdvancedHint);
I execute this code both times. qLanguage is an object that I fill from the JSON file, and qAdvancedHint is a specific key.

I don't know what this specific encoding is called. I tested with JavaScript's unescape method, but it didn't work.
However, one solution, albeit an ugly one, is to ask jQuery to parse the string for you and then set the result as the value:
var text = $("<span/>").html(qLanguage.qAdvancedHint).text();
q('#keyword').val(text);
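
If you would rather not build a throwaway DOM element, the decoding can also be done on the string itself. The following is only a minimal sketch, assuming the JSON stores HTML character references such as &ouml; or &#246;. The helper name decodeEntities and the small NAMED table are hypothetical and cover only a handful of names; a complete decoder would need the full HTML entity list.

```javascript
// Hypothetical helper (not part of jQuery): decodes numeric character
// references and a small, hand-picked set of named ones.
const NAMED = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'",
  aring: 'å', auml: 'ä', ouml: 'ö', Aring: 'Å', Auml: 'Ä', Ouml: 'Ö'
};

function decodeEntities(text) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return text.replace(pattern, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Unknown names are left as they are.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}
```

The decoded string can then be passed to val() in place of the raw value, e.g. q('#keyword').val(decodeEntities(qLanguage.qAdvancedHint));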

Related

What is the right way to safely and accurately insert user-provided URL data into an HTML5 document?

Given an arbitrary customer input in a web form for a URL, I want to generate a new HTML document containing that URL within an href. My question is how am I supposed to protect that URL within my HTML.
What should be rendered into the HTML for the following URLs that are entered by an unknown end user:
http://example.com/?file=some_19%affordable.txt
http://example.com/url?source=web&last="f o o"&bar=<
https://www.google.com/url?source=web&sqi=2&url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&last=%22foo%22
If we assume that the URLs are already uri-encoded, which I think is reasonable if they are copying it from a URL bar, then simply passing it to attr() produces a valid URL and document that passes the Nu HTML checker at validator.w3.org/nu.
To see it in action, we set up a JS fiddle at https://jsfiddle.net/kamelkev/w8ygpcsz/2/ where replacing the URLs in there with the examples above can show what is happening.
For future reference, this consists of an HTML snippet
<a>My Link</a>
and this JS:
$(document).ready(function() {
    $('a').attr('href', 'http://example.com/request.html?data=>');
    $('a').attr('href2', 'http://example.com/request.html?data=<');
    alert($('a').get(0).outerHTML);
});
So with URL 1, it is not possible to tell if it is URI encoded or not by looking at it mechanically. You can surmise based on your human knowledge that it is not, and is referring to a file named some_19%affordable.txt. When run through the fiddle, it produces
My Link
Which passes the HTML5 validator no problem. It likely is not what the user intended though.
The second URL is clearly not URI encoded. The question becomes what is the right thing to put into the HTML to prevent HTML parsing problems.
Running it through the fiddle, Safari 10 produces this:
My Link
and pretty much every other browser produces this:
My Link
Neither of these passes the validator. Three complaints are possible: the literal double quote (from un-escaping HTML), the spaces, or the trailing < character (also from un-escaping HTML). It just shows you the first of these it finds. This is clearly not valid HTML.
There are two ways to try to fix this. The first is to HTML-escape the URL before giving it to attr(). This, however, results in every & becoming &amp;, and entities such as &amp; and &lt; are then double-escaped by attr(), so the URL in the document is entirely inaccurate. It looks like this:
My Link
The other is to URI-encode it before passing to attr(), which does result in a proper validating URL which actually clicks to the intended destination. It looks like this:
My Link
Finally, for the third URL, which is properly URI encoded, the proper HTML that validates does come out.
My Link
and it does what the user would expect to happen when clicked.
Based on this, the algorithm should be:
if url is encoded then
    pass as-is to attr()
else
    pass encodeURI(url) to attr()
however, the "is encoded" test seems to be impossible to detect in the affirmative based on these two prior discussions (indeed, see example URL 1):
How to find out if string has already been URL encoded?
How to know if a URL is decoded/encoded?
If we bypass the attr() method and forcibly insert the HTML-escaped version of example URL 2 into the document structure, it would look like this:
My Link
Which seemingly looks like valid HTML, yet fails the HTML5 validator because it unescapes to have invalid URL characters. The browsers, however, don't seem to mind it. Unfortunately, if you do any other manipulation of the object, the browser will re-escape all the &'s anyway.
As you can see, this is all very confusing. This is the first time we're using the browser itself to generate the HTML, and we are not sure if we are getting it right. Previously, we did it server side using templates, and only did the HTML-escape filter.
What is the right way to safely and accurately insert user-provided
URL data into an HTML5 document (using JavaScript)?

If you can assume the URL is either encoded or not encoded, you may be able to get away with something along the lines of this. Try to decode the URL, treat an error as the URL not being encoded and you should be left with a decoded URL.
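<script>
var inputurl = 'http://example.com/?file=some_19%affordable.txt';
var myurl;
try {
    // decodeURI throws a URIError on malformed escape sequences
    myurl = decodeURI(inputurl);
}
catch(error) {
    // treat the error as "the URL was not encoded" and keep the input as-is
    myurl = inputurl;
}
console.log(myurl);
</script>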
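
A different approach, not taken from the answer above, is to let the standard URL parser do the normalization instead of guessing whether the input is encoded. The sketch below assumes an environment that provides the URL constructor (current browsers and Node.js). The helper name toSafeHref is hypothetical, and rejecting everything except http and https is an added assumption aimed at keeping schemes such as javascript: out of the href.

```javascript
// Hypothetical helper: returns a normalized href for http(s) URLs,
// or null when the input cannot be used.
function toSafeHref(input) {
  let parsed;
  try {
    // The parser percent-encodes characters that are not allowed in a URL
    // and leaves existing percent sequences alone.
    parsed = new URL(input);
  } catch (error) {
    return null; // not an absolute URL the parser accepts
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return null; // refuse javascript:, data: and other schemes
  }
  return parsed.href;
}
```

A non-null result can then be handed to attr(), e.g. $('a').attr('href', toSafeHref(userInput)). This still cannot tell what the user meant by an ambiguous input such as the first example URL; it only guarantees that what ends up in the document is a syntactically valid URL.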

Replace / restrict non-standard characters in CKEDITOR

I have a CKEDITOR instance (version 4.5.7) into which users input content. This content posts to a database field with the collation SQL_Latin1_General_CP1_CI_AS.
The problem comes when a user pastes text from Word or a similar rich-text editor. Two characters in particular get malformed when they hit the database: the right double quotation mark (”) and the en dash (–).
I have already set config.entities to false to prevent the characters from being converted into their HTML equivalents. Now I'm looking for a place where I can intercept the process to find/replace any offending characters. Although the javascript for this sort of thing is easy enough ( text = text.replace('”', '"') ), I'm not sure where to put it in order to make this happen. I've tried placing it in various places within the CKEDITOR.htmlParser.basicWriter function, but nothing so far has worked.
This seems like it would be a fairly common problem - is there perhaps a way to set collation on the editor so it matches the database?
Thank you for any advice.

I kept plunking away in the basicWriter function until eventually I was surprised to find one place that actually does work. Basically, this is the process I used to solve this problem without editing ckeditor.js
Download and open an uncompressed version of the ckeditor.js file.
Locate and copy the entire CKEDITOR.htmlParser.basicWriter function into the bottom of your config.js file. This basically redefines the function, overriding the real one but allowing us to make customizations to it without necessarily breaking future updates.
In the copied function within config.js, locate the getHtml section and customize the html variable before it gets returned. Below is a template to help you locate this section
getHtml: function( reset ) {
    var html = this._.output.join( '' );
    // this is where we can replace individual characters or make other
    // customizations; the /g flag replaces every occurrence, not just the first
    html = html.replace( /”/g, '"' );
    html = html.replace( /–/g, '-' );
    if ( reset )
        this.reset();
    return html;
}
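
If more characters turn out to cause trouble, the individual replace calls can be collapsed into one lookup table. This is only a sketch: the function name toPlainPunctuation and the choice of replacements are assumptions, and which characters actually need replacing depends on what the database collation can store.

```javascript
// Hypothetical mapping from typographic characters to plain equivalents;
// extend it with whatever else gets mangled on the way to the database.
const REPLACEMENTS = {
  '\u201C': '"',   // left double quotation mark
  '\u201D': '"',   // right double quotation mark
  '\u2018': "'",   // left single quotation mark
  '\u2019': "'",   // right single quotation mark
  '\u2013': '-',   // en dash
  '\u2014': '-',   // em dash
  '\u2026': '...'  // horizontal ellipsis
};

// One character class built from the keys, applied globally.
const PATTERN = new RegExp('[' + Object.keys(REPLACEMENTS).join('') + ']', 'g');

function toPlainPunctuation(html) {
  return html.replace(PATTERN, (ch) => REPLACEMENTS[ch]);
}
```

Inside getHtml, the replace lines would then become html = toPlainPunctuation( html );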

Javascript DOM nodes with special characters in the name

UPDATE: OK, this turns out to be an encoding/decoding problem. In the actual SOURCE HTML, the ID is rendered as follows:
<input type="text" id="addInput0436%2E20_0" name="quantity" value="1" />
And the Javascript to reference it is rendered as follows:
javascript:app.catalog.product.updateAvailability('availabilityContainer0436%2E20_0','0436%2E20_0','PFC','0436.20', dojo.attr(dojox.html.entities.decode('addInput0436%2E20_0'), 'value'));
However, by the time the Javascript call gets to the dojo.attr() routine, the encoded value of addInput0436%2E20_0 has helpfully been decoded to addInput0436.20_0 which of course no longer matches.
So I need to either force the JS to not decode this string, or force HTML/Tomcat to not encode the HTML ID. Ugh.
ORIGINAL POST:
I'm trying to identify a problem with the following line of code:
document.getElementById('addInput0436.20_0')
This should return the DOM node with that ID (which does exist in the document) but instead returns null.
I suspect that it is the special character in the node name, but I'm not sure how to fix it. Anyone run into this before?

I'm posting this answer in case it helps someone else:
We finally resolved this by re-encoding the (previously decoded) tag ID inside the Javascript routine that needed it. We were not able to find any other way around this behavior.
Bottom line: Don't use URL-encoded strings as tag ID names. Doing so seems to be asking for trouble.
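
For illustration, the re-encoding step could look something like the sketch below. The function name reencodeId is hypothetical and the original routine is not shown in this answer. Note that encodeURIComponent does not help here, because it leaves the period unescaped, so the replacement has to be done explicitly.

```javascript
// Hypothetical helper: turns the decoded ID back into the form that
// appears in the HTML source, where '.' was rendered as '%2E'.
function reencodeId(decodedId) {
  return decodedId.replace(/\./g, '%2E');
}
```

The lookup then becomes document.getElementById(reencodeId('addInput0436.20_0')), which searches for addInput0436%2E20_0.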

Can someone help me see how to deal with single quote escaping in the following scenario

We write JS programs for clients and let them customize the display text. Here is what we did:
We have a raw JS file in which those strings are replaced with tokens, for example
month = [_MonthToken_];
name = '_NameToken_';
and an XML file in which the user specifies the text, like
<xml>
<token name="MonthToken">'Jan','Feb','March'</token>
<token name="NameToken">Alice</token>
</xml>
and a generator that replaces each token with its text and generates the final JS file:
month = ['Jan','Feb','March'];
name = 'Alice';
However, I found a bug in this scenario. When somebody specifies the name as "D'Angelo" (for example), the JS will run into an error because the name variable becomes
name='D'Angelo'
We have thought of several ways to fix the problem, but none of them is perfect.
We could ask our clients to escape the characters, but that seems inappropriate given that they may not know JS and that there are more characters to escape (", ), which could make them unhappy :|
We also thought of changing the generator to escape ', but sometimes the text replaces an array, and the single quotes there should not be escaped. (There are other cases; we could detect them one by one, but that is tedious.)
We may have done something wrong in the whole scenario/architecture, but we don't want to change it unless we have confirmed that it is definitely necessary.
So, is there any solution? I will look into every idea. Thank you in advance!
(I may also need a better title :P)

I think your XML schema is poorly designed, and this is the root cause of your problems.
Basically, you are forcing the author of the XML to put JavaScript code inside the name="MonthToken" element, while pretending that she can do this without knowing JavaScript syntax. I guess that you are planning to use eval on the parsed element content to build the month and name variables.
The problem you discovered is not the only one: you are also exposed to JavaScript code injection. What if a user forges an element such as:
<token name="MonthToken">alert('put some evil instruction here')</token>
I would suggest changing the XML schema like this:
<xml>
<token name="MonthToken">Jan</token>
<token name="MonthToken">Feb</token>
<token name="MonthToken">March</token>
<token name="NameToken">Alice</token>
</xml>
Then, in your generator, you'll have to parse the content of each MonthToken element and add it to the month array. Do the same for the name variable.
This way:
You don't use eval, so there is no possibility of code injection
Your user no longer has to know how to quote month names
You automatically handle quotes or apostrophes in names, because you are not using them as JS code.
If you want the month variable to become a string when the user enters just one month, simply transform the variable with something similar to this:
if (month.length == 1) {
month = month[0];
}
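
On the generator side, one way to emit the values safely is to serialize each one with JSON.stringify, which adds the quotes and escapes for you. The sketch below is an illustration under assumptions, not the asker's actual generator: it supposes the raw template uses bare tokens (month = _MonthToken_; rather than month = [_MonthToken_];), that the XML has already been parsed into a plain object, and the function name generate is hypothetical.

```javascript
// Hypothetical generator step: replaces _SomethingToken_ placeholders with
// JavaScript literals produced by JSON.stringify.
function generate(template, tokens) {
  return template.replace(/_([A-Za-z0-9]+Token)_/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(tokens, name)
      ? JSON.stringify(tokens[name]) // strings get quoted and escaped
      : placeholder);                // unknown tokens are left untouched
}

// Assumed template with bare tokens (no quotes or brackets around them).
const template = 'month = _MonthToken_;\nname = _NameToken_;';

// Assumed result of parsing the XML: repeated elements become an array.
const output = generate(template, {
  MonthToken: ['Jan', 'Feb', 'March'],
  NameToken: "D'Angelo"
});

console.log(output);
```

Because the values are written as double-quoted JSON strings, an apostrophe in a name no longer terminates the literal, and the XML author never has to type any JavaScript syntax.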

unterminated string literal error in salesforce due to data in multiple lines

I am using a standard page with a button whose behavior is written in JavaScript. It calls a controller (class).
Whenever I click the button, it gives me this error:
A problem with the OnClick JavaScript for this button or link was encountered:
unterminated string literal.
Javascript:
try
{
    alert('hi1');
}
catch(Err)
{
    alert('Error in creation'+Err);
}
After searching and some trial and error, I reduced the button to the simple alert code above.
I found that the error occurs whenever a record contains data with line breaks, i.e. data spanning multiple lines. The error occurs whether or not that particular field is used in the JavaScript code or the class.
I found a similar problem here: unterminated string literal error in salesforce
but no solution is given at that link.

Try wrapping SUBSTITUTE() or even URLENCODE() around the value of your long text field, then decode it in JS?
Back in the S-Control days I used to cheat by putting the long text values in <p id="myId" style="visibility:hidden">{!mergefield}</p> and then referencing them with getElementById...
If you're really calling your controller from JS, then probably either you have control over the newlines (replace them with something?) or you don't need to display the whole error message. Maybe err.details? Also, it would help to use a proper JS debugging tool...
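
The JavaScript half of the first suggestion (encode on the server, decode in the button code) might look roughly like the sketch below. It assumes the merged value arrives percent-encoded, so that a line break shows up as %0A and no longer splits the string literal. Whether spaces arrive as %20 or as + is not confirmed here, so both are handled. The function name decodeMergeValue is hypothetical.

```javascript
// Hypothetical helper: decodes a percent-encoded merge value.
function decodeMergeValue(encoded) {
  // A literal plus sign would itself be encoded as %2B, so any '+' left in
  // the input is treated as an encoded space before decoding.
  return decodeURIComponent(encoded.replace(/\+/g, '%20'));
}

// Example input: two lines of text after encoding.
const decoded = decodeMergeValue('line%20one%0Aline+two');
console.log(decoded);
```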
