Chinese/Japanese characters in a search box and form - javascript

Why is it that when I enter 漢 in Firefox, the GET request becomes:
q=%E6%BC%A2&start=0
However, when I type the same Chinese character in IE8, the GET request is:
q=?&start=0
IE8 turns the character into a question mark.

Mark the encoding of the page as UTF-8 and this problem will go away. Firefox will fail to autodetect your encoding without this hint sometimes, too. And you may have manually changed the encoding in IE once, so that becomes the new default for unmarked pages.
Put this in your <HEAD>:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
If your content isn't really in UTF-8, then you'll need to use an alternate method. There's an HTML attribute on FORM (accept-charset) that hints to IE that you want non-ANSI codepage characters to be sent as UTF-8, but it's far nicer to just use the correct content type.
Also, the address bar may not be the best place to look at the resulting text, as the last time I checked, it didn't reliably work with non-ACP characters. Make sure you're looking at the actual request data.
If you're talking about entering text into the address bar or search box in the browser, and not a specific web page, I can't reproduce this problem on English Windows 7. Perhaps you're using a very old version of Windows and your system ANSI code page does not contain that character; Win95/Win98/WinME would certainly have that problem.
Edited to add:
In IE 8, entering the character you specified on a page containing this content works exactly as expected for me. I've verified this with Fiddler. Whatever problem you are having is probably different than what you have described so far.
<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
<form accept-charset="utf-8" method="get" action="http://www.example.com/something">
<input type="text" name="q">
<input type="submit">
</form>
</BODY>
</HTML>
You don't actually need accept-charset unless you are using a different encoding for the page itself, but I am leaving it in for illustrative purposes. For it to be actually useful, at least in earlier versions of IE (things may have changed; a colleague of mine specified the behavior back in IE5 or so), you need a hidden "_charset_" field with no value to encourage the browser to report which charset it actually used, but that's superfluous in a UTF-8 page.

This can be either a font installation issue or a URL encoding issue.
One of the main issues I have seen with CJK (Chinese, Japanese and Korean) characters is that the East Asian language fonts are not installed by default when the OS is installed. These characters can show up properly in MS Word even without that installation.
To make sure all applications on the OS can handle CJK characters, do the following:
Go to Control Panel
Select Regional and Language Options
Go to the language tab
Select the checkbox to install fonts for East Asian languages
You will probably need the Windows CD to proceed.
After that, IE8 should hopefully show the characters properly.
Also, if you are doing any URL encoding, make sure you always use UTF-8 as the character encoding when dealing with non-ASCII characters.
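As a minimal sketch of that last point, assuming the encoding is done in JavaScript: encodeURIComponent always percent-encodes the UTF-8 bytes of a string, whatever the page encoding is. The helper name buildQuery is hypothetical; the q and start parameters are the ones from the question.

```javascript
// Hypothetical helper: builds the query string shown in the question.
function buildQuery(term, start) {
  // encodeURIComponent always percent-encodes the UTF-8 form of the string,
  // independent of the charset of the page.
  return "q=" + encodeURIComponent(term) + "&start=" + encodeURIComponent(String(start));
}

console.log(buildQuery("\u6F22", 0)); // "\u6F22" is 漢; prints q=%E6%BC%A2&start=0
```

This only covers URLs built in script. A plain form submission is still encoded by the browser according to the page and form charset.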

To begin with, IE believes that Chinese characters can be sent 'as is' in UTF-8, while Firefox thinks they need to be URL-encoded.
Have you watched the GET request on the wire? I bet that it's really a three-byte sequence and that the tool you are using to display it is reducing it to a ?.
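To see what that three-byte sequence looks like, here is a small sketch that lists the UTF-8 bytes of a string in hex. The function name utf8Hex is hypothetical; TextEncoder is the standard API available in current browsers and in Node.js.

```javascript
// Hypothetical helper: returns the UTF-8 bytes of a string as uppercase hex pairs.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(utf8Hex("\u6F22")); // "\u6F22" is 漢; prints E6 BC A2
```

Those are the same three bytes that Firefox shows percent-encoded as %E6%BC%A2.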

Why is Unicode allowed in tag names?

On this webpage, one of 1000s that I am scanning, I found a tag name with Unicode 0x97 in it.
It uses
<!?~V[if lt IE 7]>
which contains 0xc2 0x96
According to a unicode converter c2 96 is
U+0096 START OF GUARDED AREA
Based on
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />
I'd say the encoding is not Unicode; it's windows-1251.
The line you're asking about reads:
<!—rating forum -->
That "weird" character is an em dash. My experience with these is that they're usually the result of typing -- (double hyphen) into Microsoft Office Word which then auto-corrects it to an em dash.
It's not valid HTML, but it works in the browser because browsers generally try to fix up broken HTML as best they can. In this case, you have an element that starts with <!, enough to guess that, while not the valid <!--, it's still probably the beginning of an embedded comment.
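If you are scanning many pages, a rough way to flag this pattern is to look for "<!" followed directly by a dash character instead of "--". This is only a sketch: the function name is hypothetical, and it assumes the page bytes have already been decoded with the charset the page declares (windows-1251 here).

```javascript
// Hypothetical helper: returns the offsets of comment openers where "--"
// appears to have been replaced by an en dash (U+2013) or em dash (U+2014).
function findBrokenCommentOpeners(html) {
  const pattern = /<![\u2013\u2014]/g;
  const offsets = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    offsets.push(match.index);
  }
  return offsets;
}

// One broken opener, at offset 9 (right after "<p>ok</p>").
console.log(findBrokenCommentOpeners("<p>ok</p><!\u2014rating forum -->"));
```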

Cross-browser number input with number keyboard, but without arrow buttons

I am designing a web app. In some places, the user will have to enter a single digit. I would like the input for the digit to meet the following criteria:
I don't want the arrow buttons (a.k.a. "spin buttons") to show up, since I think they are unnecessary clutter.
I want the number keyboard to show up on any device without a physical keyboard, or at least on recent iOS and Android devices. The normal text keyboard is full of unnecessary clutter.
I do not want the validation behavior for number inputs built into recent versions of Firefox (red border, tooltip). I am planning to use my own validation, or maybe do without it.
I would prefer to avoid any browser-specific hacks.
This is what I tried and why my attempts were disappointing:
First, I tried using <input type="number">. The arrow buttons showed up in Firefox on my Mac. I managed to disable Firefox's validation by setting an input event handler on the input with the line this.setCustomValidity(" ").
Next, I tried doing some research and found this question. It mentioned a bunch of stuff that looked like it would only work on Webkit. I didn't actually try it.
Next, I tried using a text input (<input type="text">) with the pattern attribute set to [0-9]*. According to this answer, that should make the number keyboard come up. It did that in iOS, but not in Android.
I know about the inputmode attribute that is meant for this purpose, but it doesn't seem to be supported at all.
Does anyone know of a way to implement a sane cross-browser number input? Is what I am asking for unreasonable?
I had the same problem, but after a lot of trial and error found a very simple work-around: live demo.
The code:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Demo</title>
<style>
input[type="tel"]:invalid {
box-shadow: none; /* 0 doesn't work */
}
</style>
</head>
<body>
<form novalidate>
<input type="tel">
</form>
</body>
</html>
There is (just) one imperfection: on Android, it pulls up the phone-number keypad instead of the normal numeric keypad. But those are almost identical, and I don't think many users will notice.
The novalidate (for which novalidate="novalidate" is valid syntax as well) plus the CSS declaration might seem redundant, but there is a lot going on in browser world when it comes to form behavior, with significant interbrowser and intrabrowser inconsistencies. Have a look here, for example.
Apparently, that behavior is not yet standardized by the W3C. Therefore, I would choose redundancy over taking chances.
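Since novalidate switches off the browser's own checks, the question's "my own validation" still has to be written. A minimal sketch, assuming the field should hold exactly one ASCII digit; the function name isSingleDigit and the invalid-digit class are hypothetical.

```javascript
// Hypothetical validator: true only for a string that is exactly one ASCII digit.
function isSingleDigit(value) {
  return /^[0-9]$/.test(value);
}

// Browser-only wiring: mark the field while its content is not a single digit.
if (typeof document !== "undefined") {
  const input = document.querySelector('input[type="tel"]');
  if (input) {
    input.addEventListener("input", () => {
      // "invalid-digit" is a hypothetical class name to style as you like.
      input.classList.toggle("invalid-digit", !isSingleDigit(input.value));
    });
  }
}
```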

jQuery/javascript not working on all pages

So I'm doodling with a little site with some html/css/javascript experiments so I can learn to be a better web-programmer. I am really a designer, and total n00b at this.
Problem:
I have some JavaScript running on multiple pages of my site, and it is, as per usual, in a separate .js file. However, it only seems to be working on this page:
http://www.carlpapworth.com/htmlove/colors.html
And not on these:
http://www.carlpapworth.com/htmlove/arrows.html
http://www.carlpapworth.com/htmlove/fumbling.html
You see, the big splash with the heart is supposed to be hidden by this function:
$(document).ready(function() {
    $('#reward').hide();
    $('#goal a').click(function(){
        $('#reward').fadeIn(1000);
    });
    $('.exit').click(function(){
        $('#collection1').css('color', '#ff63ff');
    });
});
To me, the "Head"-code in all these pages looks exactly the same, so I can't figure out the problem.
Please help!
SOLVED! It was the encoding, that was set to UTF-16! So I just changed it as Jezen Thomas said in Coda! Thanks a million!
This was an interesting question. I tried copying your site to my machine and testing locally, and everything worked just fine. However, I believe I've discovered the source of the problem.
http://validator.w3.org/i18n-checker/check?uri=www.carlpapworth.com%2Fhtmlove%2Ffumbling.html#validate-by-uri+
You're trying to force UTF-8 with your meta tag, <meta charset='UTF-8' />. However, the w3 i18n validator detected that your file also contains a UTF-16LE Byte-Order Mark (BOM).
The w3 has this to say on removing the BOM:
If you have an editor which shows the characters that make up the
UTF-8 signature you may be able to delete them by hand. Chances are,
however, that the BOM is there in the first place because you didn't
see it.
Check whether your editor allows you to specify whether a UTF-8
signature is added or kept during a save. Such an editor provides a
way of removing the signature by simply reading the file in then
saving it out again. For example, if Dreamweaver detects a BOM the
Save As dialogue box will have a check mark alongside the text
"Include Unicode Signature (BOM)". Just uncheck the box and save.
I'm not sure if it'll fix the problem in your case, but I don't like the fact that you've used HTML comments before your doctype declaration. Please move <!DOCTYPE html> to the top of the file. Also, in Coda, go to Text > Encoding and verify that UTF-8 is selected. If you can, show the invisible characters and remove anything that looks suspect.
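If your editor won't show the BOM, you can check the first bytes of the file yourself. The following is a sketch only: detectBom is a hypothetical helper that takes the file contents as a byte array and recognises just the UTF-8 and UTF-16 marks (it ignores UTF-32).

```javascript
// Hypothetical helper: reports which byte-order mark, if any, starts a byte array.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8"; // EF BB BF
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE"; // FF FE
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE"; // FE FF
  }
  return null; // no BOM recognised
}

// "<" encoded as UTF-16LE, preceded by the UTF-16LE BOM.
console.log(detectBom(new Uint8Array([0xff, 0xfe, 0x3c, 0x00]))); // UTF-16LE
```

In Node.js, the buffer returned by fs.readFileSync can be passed in directly, since a Buffer is indexed the same way as a Uint8Array.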
As Jezen Thomas suggested, it may be an encoding issue. Try re-saving the file as UTF-8.
Check out this topic on SO for more details.

Hebrew characters not shown on an HTML5 template

I've been trying to embed some Hebrew characters in Thom Sander's free HTML5 template (download link).
For example, I've tried to change a left-side menu item text to Hebrew, i.e.,
Home Page => עמוד הבית
For some reason the Hebrew characters are not shown at all.
When I add Hebrew in other places in the document, it is shown correctly. At first I thought this might be an encoding issue, but the encoding declared in the head seems to be valid: UTF-8. I think there might be some JS code ignoring the Hebrew text, but I'm not sure.
Any ideas?
It seems someone has already found a solution for this. I didn't implement the whole solution, but I tested it with your files and it works.
You can find the solution here
Basically, you just need to use CufonRTL.js to be able to use Hebrew and Cufon.js together.
You can find CufonRTL.js at the beginning of the blog post, or just download the file from here
Then you'll have to load CufonRTL.js and execute something like:
CufonRTL.RTL('#menu a');
So the menu links would support Hebrew while using the Cufon library & custom font.
The reason you cannot embed Hebrew characters into your website is because the template uses the Cufon technique, which doesn't support right-to-left languages.
Planned features:
Support for right-to-left and bi-directional text
However, it looks like there is a way around it:
Using Cufon with Right-To-Left Text
Try adding this rule to the CSS
html { unicode-bidi: embed; }
http://www.w3schools.com/jsref/prop_style_unicodebidi.asp
The unicodeBidi property is used with the direction property to set or
return whether the text should be overridden to support multiple
languages in the same document.
Be sure to use:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
Or (as a new HTML5 standard):
<meta charset='utf-8'>
And try adding this property in your CSS:
unicode-bidi: embed;
You can also try displaying the text using HTML entities instead of literal Unicode characters: &#1425; &#1426; &#1427;
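If you want to try the entity route, the numeric references can be generated rather than looked up by hand. A minimal sketch, assuming decimal references are wanted for every non-ASCII character; the function name toNumericEntities is hypothetical.

```javascript
// Hypothetical helper: replaces every non-ASCII code point with a decimal
// numeric character reference such as &#1425; and leaves ASCII untouched.
function toNumericEntities(text) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    return codePoint > 0x7f ? "&#" + codePoint + ";" : ch;
  }).join("");
}

console.log(toNumericEntities("\u0591")); // &#1425;
console.log(toNumericEntities("עמוד הבית")); // the menu text from the question
```

Entities only change how the characters are written in the source. They will not help if the rendering library drops right-to-left text.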

Title UTF-8 on HTML

I'm having a problem with a UTF-8 character in the page title.
I want to add this to the title of the page --> ♫ <--- (the music symbol)
The thing is, sometimes it works (in Google Chrome) and sometimes it doesn't (when it doesn't work, a square appears, which I assume indicates an encoding error).
And in Firefox, it always works in the title at the top of the window, but the title shown in the bar below displays the square again. :/
What do I do to fix this?
I'm setting the title via JavaScript from a .js file, which I also serve as UTF-8 using PHP.
var title = "♫ My Music";
document.title = title;
By the way, it always seems to work on Linux, but on Windows it behaves as described above. =/
Thanks in advance.
This is a font problem rather than an encoding problem; a small rectangle in place of a character typically means that the character is not present in the font(s) used.
Browsers typically use some specific fonts when they render title element contents (somewhere outside the page itself). These fonts may depend on the settings in the operating systems. On Windows 7 for example, the default for them is 9pt Segoe UI, a relatively rich font, whereas older systems have more limited fonts. Anyway, that’s outside an author’s hands.
So the conclusion is that special symbols should be avoided in title elements. Their rendering is not guaranteed, and they probably have no value as far as search engines are concerned.
Have you added the meta tag for charset to your HTML?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
sometimes it works (on Google Chrome) and sometimes it doesn't (when it doesn't work, it appears a square that is supposed to be an error of encoding) weirdly
This is most likely down to the font used by your browser, and your Operating System. If the character is defined in the font, it will show up. If it isn't, it won't.
There isn't much you can do about this, unfortunately - both these things are outside of your control.
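To rule out the encoding of the .js file itself as a cause, one option is to write the symbol as an ASCII-only escape. This is a sketch, and it does not fix a missing glyph: if the font used for the title bar lacks the character, the square will still appear.

```javascript
// "\u266B" is the ASCII-only escape for U+266B, the ♫ symbol from the question,
// so the result no longer depends on how the .js file is saved or served.
const title = "\u266B My Music";

// Only touch document.title when running in a browser.
if (typeof document !== "undefined") {
  document.title = title;
}
```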
Somewhat related: Unicode support in Web standard fonts
I resolved this problem (for "Año", "year" in Spanish) with this notation:
title="A & # 241 ; o"
IMPORTANT: I inserted blank spaces between the characters only so that the entity is displayed here instead of being rendered; remove them in real code.
