Using UTF-8 in a JavaScript file

I'm trying to use ☰ in external javascript file
$('<div />',{
text: '☰',
......
But I can't save the file; the editor says:
The document's current encoding can not correctly save all of the characters within the document. You may want to change to UTF-8 or an encoding that supports the special characters in this document.
What should I do?

You should convert the file to UTF-8, then try pasting the character in again once it's converted and saved.
Your file could be in one of many, many formats, depending on your editor, but if you're just using a plain text editor like Notepad, characters that don't fit happily into ASCII are going to cause you problems.
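Alternatively, you can sidestep the file-encoding issue entirely by writing the character as a Unicode escape, as other answers on this page also suggest; the source file then stays pure ASCII. A minimal sketch of the same jQuery call (U+2630 is ☰):
$('<div />', {
    text: '\u2630' // Unicode escape for ☰, so no non-ASCII bytes in the file
});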

Related

How to determine if a JavaScript ArrayBuffer contains string or binary data?

I am building a simple application where users can load any file into the Monaco editor in a web browser.
I'm trying to work out if the file that the user has loaded is text, and therefore editable.
In JavaScript, the library I am using to load returns the loaded file as an ArrayBuffer. Of course I can just convert this to text regardless of whether or not it is text or binary and throw the result into the editor. Presumably binary converted to text will display as garbage in the Monaco editor.
I could also examine the MIME type of the loaded file. This would go a long way towards solving the problem, but it means I somehow have to know which MIME types are text, and I have not been able to find anything that specifies this. Also, it means any file without the correct MIME type set would not be editable.
So my question is, is there a way to know if the contents of a JavaScript ArrayBuffer is text or binary data such as an image or executable code, by examining the data itself, rather than referring to mime type?
EDIT:
This question is not a duplicate of questions that simply ask how to convert an ArrayBuffer to text. Converting an ArrayBuffer to text doesn't tell you whether the file contains editable text or binary data. Additional information is needed, such as the magic numbers suggested in the answers to this question.
You can check the magic numbers of the ArrayBuffer. Magic numbers are constant byte sequences found at known positions in a file's buffer that you can check to distinguish between many file formats.
Wikipedia - Magic numbers
This NPM module uses that approach. Here's a list of the module's supported file types; you can see that they don't support text types.
For SVG you can use https://github.com/sindresorhus/is-svg.
For CSV you can use https://www.npmjs.com/package/detect-csv, but you can't be 100% sure, as they themselves point out.
UPDATE: I've written an article about this which contains more explanation and a little sandbox.
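To make the approach concrete, here is a minimal hand-rolled sketch of a magic-number check (PNG shown as one example; the module above covers many more formats):
// Returns true if the ArrayBuffer starts with the 8-byte PNG signature
function looksLikePng(buffer) {
    if (buffer.byteLength < 8) return false;
    var bytes = new Uint8Array(buffer, 0, 8);
    var pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
    return pngSignature.every(function (expected, i) {
        return bytes[i] === expected;
    });
}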

Is it possible to insert a base64 Word file using the Office 2013 Office API

I have a successfully running script that loads Word files from SharePoint and inserts them into Word 2017 (Office 365 Word local client, not online)
The current script reads the files using Ajax, extracts the base64 file, and uses
body.insertFileFromBase64(myBase64, end)
I now need to extend the functionality to support Word 2013 (i.e. use the Office.js instead of the Word JavaScript api). So the code has changed to
Office.context.document.setSelectedDataAsync(file, someCoercionType)
I hoped to be able to use a variant of
Office.context.document.setSelectedDataAsync(myBase64, {coercionType: Office.CoercionType.Ooxml}, function (
But I get an error back "The Format of the specified data object is invalid", which is correct enough as the Office API assumes a base64 file is an image.
Is it possible to convert the Base64 file to XML in JavaScript? (Elsewhere in my code I unzip the docx and extract bookmarks, but only from document.xml which lacks all formatting and images, footers etc.)
Base64 is simply a binary-to-text encoding and is blissfully unaware of the underlying content type. So if your source content was OOXML, decoding it will give you that OOXML back. What you cannot do is type conversion. For example, a Base64-encoded JPEG cannot be decoded directly into a BMP. To do that you would need to first decode it and then convert from JPEG to BMP using some other tool.
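Since you already unzip the docx elsewhere in your code, the same approach applies here: decode the Base64, treat the result as a ZIP archive, and pull out the markup. A minimal sketch, assuming the JSZip library and the myBase64 variable from the question:
// A .docx is a ZIP archive; the main body lives in word/document.xml
JSZip.loadAsync(myBase64, { base64: true })
    .then(function (zip) {
        return zip.file('word/document.xml').async('string');
    })
    .then(function (documentXml) {
        // Raw OOXML of the body only; styles, images, and footers live in other parts
        console.log(documentXml.substring(0, 200));
    });
Bear in mind that, as you note, word/document.xml alone lacks formatting, images, and footers, and the Ooxml coercion type expects valid Open XML markup, so the body XML by itself may not round-trip everything.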
If you're seeking to manipulate or extract content from an existing document, you may want to look at Aspose.Words. Aspose provides tools that allow you to work with Word documents programmatically (they have similar tools for a slew of other file types as well). Using this, you may be able to extract the OOXML you're looking for so you can then insert it into Word using Office.js.
At the moment, the only Coercion Type that accepts Base64 encoded content is Office.CoercionType.Image.

Working with characters based on their UTF-8 hex codes

I'm working on something that will read a user's text messages and export them to a csv file, which they can then download. The messages are being retrieved from a third-party web interface; I am essentially using JS to grab the HTML of each message and compiling it as needed. The content of each message is added to a variable which, once all messages are gathered, is given to a new Blob, which is then downloaded.
The problem I am having is that, in this web interface, emoji are represented as images, rather than characters. Thus, when writing a message containing an emoji to a file, the result is as so:
"Blah blah blah <img height="18px" width="18px" class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" data-textvalue="%F0%9F%98%92" src="assets/blank.gif">"
Now, from this image, we can get 2 workable values:
The UTF-8 hex value
F09F9892
and the Unicode codepoint (I may be referring to this wrong, I don't know much about encoding).
U+1f612
Now, what I want to do is take either of these values (whichever works better) and write it to the csv file as the character itself, so that when viewing the csv file in a text editor or what have you, it would appear as the actual emoji (😒).
Though I have no idea where to even start with this. Maybe it's as simple as throwing some syntax around the character values, but I haven't been able to get anything from google, because I'm not familiar enough with encoding to know what to Google.
I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards.
You can then use decodeURIComponent() to decode the percent-encoded string:
decodeURIComponent('%F0%9F%98%92')
Combine that with jQuery to access the data-textvalue-attribute:
decodeURIComponent($(element).data('textvalue'))
I created a simple example on JSFiddle.
For some reason the emoji doesn't render correctly in the result screen in my browser, but that is a font issue. When looking at the result using a DOM inspector (or copying the text into a different application), the result is shown with a smiley.
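Putting the pieces together, a minimal sketch of the preprocessing step (the .message selector and surrounding markup are hypothetical, based on the snippet in the question):
// Replace each emoji <img> with its decoded character before reading the text
$('.message img.emoji').each(function () {
    var decoded = decodeURIComponent($(this).data('textvalue')); // '%F0%9F%98%92' -> the emoji
    $(this).replaceWith(document.createTextNode(decoded));
});
var messageText = $('.message').text(); // now contains the real emoji characters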
The CSV file format does not carry character-encoding information, so Excel usually assumes a legacy (ANSI) encoding rather than UTF-8.
https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality
Microsoft Excel mangles Diacritics in .csv files?
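One common workaround from that question is to prepend a UTF-8 byte order mark so Excel detects the encoding; a minimal sketch, where csvContent stands in for the compiled message text:
// The BOM (U+FEFF) signals to Excel that the .csv is UTF-8
var bom = '\uFEFF';
var blob = new Blob([bom + csvContent], { type: 'text/csv;charset=utf-8' });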

Javascript character encoding

In an external javascript file I have a function that is used to append text to table cells (within the HTML doc that the javascript file is added to), text that can sometimes have Finnish characters (such as ä). That text is passed as an argument to my function:
content += addTableField(XML, 'Käyttötarkoitus', 'purpose', 255);
The problem is that diacritics such as "ä" get converted to some other bogus characters, such as "�". I see this when viewing the HTML doc in a browser. This is obviously not desirable, and is quite strange as well since the character encoding for the HTML doc is UTF-8.
How can I solve this problem?
Thanks in advance for helping out!
The file that contains content += addTableField(XML, 'Käyttötarkoitus', 'purpose', 255); is not saved in UTF-8 encoding.
I don't know which editor you are using, but the encoding option is usually found in the preferences or in the Save As dialog.
If you can't get this to work you could always write out the literal code points in javascript:
content += addTableField(XML, 'K\u00E4ytt\u00f6tarkoitus', 'purpose', 255);
credit: triplee
To check out the character encoding announced by a server, you can use Firebug (in the Info menu, there’s a command for viewing HTTP headers). Alternatively, you can use online services like Web-Sniffer.
If the headers for the external .js file specify a charset parameter, you need to use that encoding, unless you can change the relevant server settings (perhaps a .htaccess file).
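For example, on an Apache server a single .htaccess directive can announce the encoding for script files (assuming Apache with mod_mime; adjust for your server):
AddCharset utf-8 .js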
If they lack a charset parameter, you can specify the encoding in the script element, e.g. <script src="foo.js" charset="utf-8">.
The declared encoding should of course match the actual encoding, which you can normally select when you save a file (using “Save As” command if needed).
The character encoding of the HTML document does not carry over to any external resource; each resource's encoding is determined separately.
You will need to deliver the script file with UTF-8 character encoding. If it was saved as UTF-8 and still breaks, your server configuration is announcing the wrong charset.

Change encoding from UTF-8 to ISO-8859-2 in Javascript

I would like to change string encoding from UTF-8 to ISO-8859-2 in Javascript. How can I do it?
I need it because I've designed a widget. A user just copies a <script> tag from my site and puts it on his own site. This script creates a div and puts the widget contents, with text, into that div.
If the target website is in UTF-8 encoding, it works fine. But when it is in ISO-8859-2, the UTF-8-encoded text is displayed on the ISO-8859-2 site and as a result I see garbage.
Instead of using e.g. "ĉ" in your JavaScript code, use Unicode escapes such as "\u0109".
If you're in control of the output, you can replace all special characters with unicode escapes (e.g. \u00e4 for ä). The browser can interpret it correctly regardless of document encoding.
The easiest way to do this would be to put the string through a JSON encoder. Both PHP's and Ruby's do that. I don't know about other implementations, though.
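If the escaping has to happen in JavaScript itself, a minimal sketch of the same idea:
// Escape every non-ASCII (BMP) character as a \uXXXX sequence,
// so the output survives any document encoding
function escapeNonAscii(str) {
    return str.replace(/[\u0080-\uffff]/g, function (ch) {
        return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}
escapeNonAscii('ĉ'); // returns the six-character string "\u0109"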
Another solution that might work is to add charset="utf-8" to the <script> tag.
I suppose you just need to convert your widget from UTF-8 to ISO-8859-2 and provide two versions of the script.
