How do you HTML-encode a string using JavaScript? [duplicate]

I’m using JavaScript to pull a value out from a hidden field and display it in a textbox. The value in the hidden field is encoded.
For example,
<input id='hiddenId' type='hidden' value='chalk &amp; cheese' />
gets pulled into
<input type='text' value='chalk & cheese' />
via some jQuery to get the value from the hidden field (it’s at this point that I lose the encoding):
$('#hiddenId').attr('value')
The problem is that when I read chalk &amp; cheese from the hidden field, JavaScript seems to lose the encoding. I do not want the value to be chalk & cheese. I want the literal &amp; to be retained.
Is there a JavaScript library or a jQuery method that will HTML-encode a string?

EDIT: This answer was posted a long time ago, and the htmlDecode function introduced an XSS vulnerability. It has been modified, changing the temporary element from a div to a textarea, which reduces the XSS risk. Nowadays, though, I would encourage you to use the DOMParser API, as suggested in another answer.
I use these functions:
function htmlEncode(value){
// Create an in-memory element, set its inner text (which is automatically encoded)
// Then grab the encoded contents back out. The element never exists on the DOM.
return $('<textarea/>').text(value).html();
}
function htmlDecode(value){
return $('<textarea/>').html(value).text();
}
Basically a textarea element is created in memory, but it is never appended to the document.
On the htmlEncode function I set the innerText of the element, and retrieve the encoded innerHTML; on the htmlDecode function I set the innerHTML value of the element and the innerText is retrieved.

The jQuery trick doesn't encode quote marks and in IE it will strip your whitespace.
Based on the escape templatetag in Django, which I guess is heavily used/tested already, I made this function which does what's needed.
It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue - and it encodes quote marks, which is essential if you're going to use the result inside an attribute value for example.
function htmlEscape(str) {
return str
.replace(/&/g, '&amp;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');
}
// I needed the opposite function today, so adding here too:
function htmlUnescape(str){
return str
.replace(/&quot;/g, '"')
.replace(/&#39;/g, "'")
.replace(/&lt;/g, '<')
.replace(/&gt;/g, '>')
.replace(/&amp;/g, '&');
}
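The order of the replacements in these two chains matters: when escaping, & has to be handled first, and when unescaping, &amp; has to be handled last. Otherwise the ampersands that belong to other entities get processed a second time. A minimal sketch of the escaping side, using two hypothetical helper names that exist only for this illustration:

```javascript
// Hypothetical helpers for illustration only.
function escapeAmpFirst(str) {
  // Correct order: '&' is escaped before any entity is introduced.
  return str.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

function escapeAmpLast(str) {
  // Wrong order: the '&' inside the freshly produced '&lt;' is escaped again.
  return str.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

var good = escapeAmpFirst('a < b & c');
var bad = escapeAmpLast('a < b & c');
console.log(good); // a &lt; b &amp; c
console.log(bad);  // a &amp;lt; b &amp; c
```

The second result would display as the literal text "&lt;" in a browser instead of "<".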
Update 2013-06-17:
In the search for the fastest escaping I have found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
(also referenced here: Fastest method to replace all instances of a character in a string)
Some performance results here:
http://jsperf.com/htmlencoderegex/25
It gives identical result string to the builtin replace chains above. I'd be very happy if someone could explain why it's faster!?
Update 2015-03-04:
I just noticed that AngularJS are using exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435
They add a couple of refinements: they appear to be handling an obscure Unicode issue as well as converting all non-alphanumeric characters to entities. I was under the impression the latter was not necessary as long as you have a UTF-8 charset specified for your document.
I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44
Update 2016-04-06:
You may also wish to escape forward-slash /. This is not required for correct HTML encoding; however, it is recommended by OWASP as an anti-XSS safety measure. (Thanks to @JNF for suggesting this in the comments.)
.replace(/\//g, '&#x2F;');

Here's a non-jQuery version that is considerably faster than both the jQuery .html() version and the .replace() version. This preserves all whitespace, but like the jQuery version, doesn't handle quotes.
function htmlEncode( html ) {
return document.createElement( 'a' ).appendChild(
document.createTextNode( html ) ).parentNode.innerHTML;
};
Speed: http://jsperf.com/htmlencoderegex/17
Demo:
Script:
function htmlEncode( html ) {
return document.createElement( 'a' ).appendChild(
document.createTextNode( html ) ).parentNode.innerHTML;
};
function htmlDecode( html ) {
var a = document.createElement( 'a' ); a.innerHTML = html;
return a.textContent;
};
document.getElementById( 'text' ).value = htmlEncode( document.getElementById( 'hidden' ).value );
//sanity check
var html = '<div> & hello</div>';
document.getElementById( 'same' ).textContent =
'html === htmlDecode( htmlEncode( html ) ): '
+ ( html === htmlDecode( htmlEncode( html ) ) );
HTML:
<input id="hidden" type="hidden" value="chalk &amp; cheese" />
<input id="text" value="" />
<div id="same"></div>

I know this is an old one, but I wanted to post a variation of the accepted answer that will work in IE without removing lines:
function multiLineHtmlEncode(value) {
var lines = value.split(/\r\n|\r|\n/);
for (var i = 0; i < lines.length; i++) {
lines[i] = htmlEncode(lines[i]);
}
return lines.join('\r\n');
}
function htmlEncode(value) {
return $('<div/>').text(value).html();
}

Underscore provides _.escape() and _.unescape() methods that do this.
> _.unescape( "chalk &amp; cheese" );
"chalk & cheese"
> _.escape( "chalk & cheese" );
"chalk &amp; cheese"

Good answer. Note that if the value to encode is undefined or null with jQuery 1.4.2 you might get errors such as:
jQuery("<div/>").text(value).html is not a function
OR
Uncaught TypeError: Object has no method 'html'
The solution is to modify the function to check for an actual value:
function htmlEncode(value){
if (value) {
return jQuery('<div/>').text(value).html();
} else {
return '';
}
}

For those who prefer plain javascript, here is the method I have used successfully:
function escapeHTML (str)
{
var div = document.createElement('div');
var text = document.createTextNode(str);
div.appendChild(text);
return div.innerHTML;
}

FWIW, the encoding is not being lost. The encoding is used by the markup parser (the browser) during page load. Once the source has been read and parsed and the browser has the DOM loaded into memory, each entity has been converted into the character it represents. So by the time your JS executes and reads anything from memory, the character it gets is the one the entity represented.
I may be operating strictly on semantics here, but I wanted you to understand the purpose of encoding. The word "lost" makes it sound like something isn't working like it should.
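To make that concrete, here is a minimal sketch. It assumes the value read back from the field is the already-decoded string, and encodeForDisplay is a hypothetical helper name, not a browser or jQuery API:

```javascript
// The markup contains value='chalk &amp; cheese'. After parsing, the DOM holds
// the decoded string, which is what .attr('value') or .val() hands back.
var decoded = 'chalk & cheese';

// Hypothetical helper: re-encode the characters that are special in HTML.
function encodeForDisplay(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var reEncoded = encodeForDisplay(decoded);
console.log(reEncoded); // chalk &amp; cheese
```

In other words, if the literal &amp; is wanted in the output, the decoded value has to be encoded again rather than "preserved".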

Faster without jQuery. You can encode every character in your string:
function encode(e){return e.replace(/[^]/g,function(e){return"&#"+e.charCodeAt(0)+";"})}
Or just target the main characters to worry about (&, linebreaks, <, >, " and ') like:
function encode(r){
return r.replace(/[\x26\x0A\<>'"]/g,function(r){return"&#"+r.charCodeAt(0)+";"})
}
test.value=encode('Encode HTML entities!\n\n"Safe" escape <script id=\'\'> & useful in <pre> tags!');
testing.innerHTML=test.value;
/*************
* \x26 is &ampersand (it has to be first),
* \x0A is newline,
*************/
<textarea id=test rows="9" cols="55"></textarea>
<div id="testing">www.WHAK.com</div>

Prototype has it built into the String class. So if you are using or plan to use Prototype, you can do something like:
'<div class="article">This is an article</div>'.escapeHTML();
// -> "&lt;div class="article"&gt;This is an article&lt;/div&gt;"

Here is a simple javascript solution. It extends String object with a method "HTMLEncode" which can be used on an object without parameter, or with a parameter.
String.prototype.HTMLEncode = function(str) {
var result = "";
var str = (arguments.length===1) ? str : this;
for(var i=0; i<str.length; i++) {
var chrcode = str.charCodeAt(i);
result+=(chrcode>128) ? "&#"+chrcode+";" : str.substr(i,1)
}
return result;
}
// TEST
console.log("stetaewteaw æø".HTMLEncode());
console.log("stetaewteaw æø".HTMLEncode("æåøåæå"))
I have made a gist "HTMLEncode method for javascript".
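One caveat with charCodeAt is that it returns UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) comes out as two separate surrogate entities, which HTML parsers do not accept as valid character references. A minimal sketch of a variant that iterates by code point instead; encodeNonAscii is a hypothetical standalone function, not the gist's code, and it assumes an ES2015+ environment:

```javascript
// Hypothetical variant: encode every non-ASCII character as a numeric entity.
function encodeNonAscii(str) {
  var result = '';
  // for...of iterates over code points, so surrogate pairs stay together.
  for (var ch of String(str)) {
    var code = ch.codePointAt(0);
    result += code > 127 ? '&#' + code + ';' : ch;
  }
  return result;
}

console.log(encodeNonAscii('æø'));  // &#230;&#248;
console.log(encodeNonAscii('a😍')); // a&#128525;
```

It is written as a plain function rather than a String.prototype extension only to keep the sketch self-contained.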

Based on angular's sanitize... (es6 module syntax)
// ref: https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js
const SURROGATE_PAIR_REGEXP = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
const NON_ALPHANUMERIC_REGEXP = /([^\#-~| |!])/g;
const decodeElem = document.createElement('pre');
/**
* Decodes html encoded text, so that the actual string may
* be used.
* @param value
* @returns {string} decoded text
*/
export function decode(value) {
if (!value) return '';
decodeElem.innerHTML = value.replace(/</g, '&lt;');
return decodeElem.textContent;
}
/**
* Encodes all potentially dangerous characters, so that the
* resulting string can be safely inserted into attribute or
* element text.
* @param value
* @returns {string} encoded text
*/
export function encode(value) {
if (value === null || value === undefined) return '';
return String(value).
replace(/&/g, '&amp;').
replace(SURROGATE_PAIR_REGEXP, value => {
var hi = value.charCodeAt(0);
var low = value.charCodeAt(1);
return '&#' + (((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000) + ';';
}).
replace(NON_ALPHANUMERIC_REGEXP, value => {
return '&#' + value.charCodeAt(0) + ';';
}).
replace(/</g, '&lt;').
replace(/>/g, '&gt;');
}
export default {encode,decode};

My pure-JS function:
/**
* HTML entities encode
*
* @param {string} str Input text
* @return {string} Filtered text
*/
function htmlencode (str){
var div = document.createElement('div');
div.appendChild(document.createTextNode(str));
return div.innerHTML;
}
JavaScript HTML Entities Encode & Decode

As far as I know, there isn't any straightforward HTML encode/decode method in JavaScript.
However, what you can do is use JS to create an arbitrary element, set its inner text, then read it back using innerHTML.
Let's say, with jQuery, this should work:
var helper = $('chalk & cheese').hide().appendTo('body');
var htmled = helper.html();
helper.remove();
Or something along these lines.

You shouldn't have to escape/encode values in order to shuttle them from one input field to another.
<form>
<input id="button" type="button" value="Click me">
<input type="hidden" id="hiddenId" name="hiddenId" value="I like cheese">
<input type="text" id="output" name="output">
</form>
<script>
$(document).ready(function(e) {
$('#button').click(function(e) {
$('#output').val($('#hiddenId').val());
});
});
</script>
JS doesn't go inserting raw HTML or anything; it just tells the DOM to set the value property (or attribute; not sure). Either way, the DOM handles any encoding issues for you. Unless you're doing something odd like using document.write or eval, HTML-encoding will be effectively transparent.
If you're talking about generating a new textbox to hold the result...it's still as easy. Just pass the static part of the HTML to jQuery, and then set the rest of the properties/attributes on the object it returns to you.
$box = $('<input type="text" name="whatever">').val($('#hiddenId').val());

I had a similar problem and solved it using JavaScript's encodeURIComponent function.
For example, in your case if you use:
<input id='hiddenId' type='hidden' value='chalk & cheese' />
and
encodeURIComponent($('#hiddenId').attr('value'))
you will get chalk%20%26%20cheese. Even spaces are kept.
In my case, I had to encode a forward slash, and this code works perfectly:
encodeURIComponent('name/surname')
and I got name%2Fsurname
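Note that encodeURIComponent performs percent-encoding for URLs, which is a different scheme from HTML entity encoding: it is reversible with decodeURIComponent, but a browser will not turn %26 back into & when rendering HTML. A short sketch of the round trip:

```javascript
var encoded = encodeURIComponent('chalk & cheese');
console.log(encoded); // chalk%20%26%20cheese

var decodedAgain = decodeURIComponent(encoded);
console.log(decodedAgain); // chalk & cheese

// '<' and '>' are percent-encoded too; they are not turned into '&lt;' / '&gt;'.
var tag = encodeURIComponent('<b>');
console.log(tag); // %3Cb%3E
```

So this fits when the value is headed into a URL, and is less suitable when the goal is to show the text inside HTML markup.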

Here's a little bit that emulates the Server.HTMLEncode function from Microsoft's ASP, written in pure JavaScript:
function htmlEncode(s) {
var ntable = {
"&": "amp",
"<": "lt",
">": "gt",
"\"": "quot"
};
s = s.replace(/[&<>"]/g, function(ch) {
return "&" + ntable[ch] + ";";
})
s = s.replace(/[^ -\x7e]/g, function(ch) {
return "&#" + ch.charCodeAt(0).toString() + ";";
});
return s;
}
The result does not encode apostrophes, but encodes the other HTML specials and any character outside the 0x20-0x7e range.

If you want to use jQuery, I found this:
http://www.jquerysdk.com/api/jQuery.htmlspecialchars
(part of jquery.string plugin offered by jQuery SDK)
The problem with Prototype I believe is that it extends base objects in JavaScript and will be incompatible with any jQuery you may have used. Of course, if you are already using Prototype and not jQuery, it won't be a problem.
EDIT: Also there is this, which is a port of Prototype's string utilities for jQuery:
http://stilldesigning.com/dotstring/

var htmlEnDeCode = (function() {
var charToEntityRegex,
entityToCharRegex,
charToEntity,
entityToChar;
function resetCharacterEntities() {
charToEntity = {};
entityToChar = {};
// add the default set
addCharacterEntities({
'&amp;' : '&',
'&gt;' : '>',
'&lt;' : '<',
'&quot;' : '"',
'&#39;' : "'"
});
}
function addCharacterEntities(newEntities) {
var charKeys = [],
entityKeys = [],
key, echar;
for (key in newEntities) {
echar = newEntities[key];
entityToChar[key] = echar;
charToEntity[echar] = key;
charKeys.push(echar);
entityKeys.push(key);
}
charToEntityRegex = new RegExp('(' + charKeys.join('|') + ')', 'g');
entityToCharRegex = new RegExp('(' + entityKeys.join('|') + '|&#[0-9]{1,5};' + ')', 'g');
}
function htmlEncode(value){
var htmlEncodeReplaceFn = function(match, capture) {
return charToEntity[capture];
};
return (!value) ? value : String(value).replace(charToEntityRegex, htmlEncodeReplaceFn);
}
function htmlDecode(value) {
var htmlDecodeReplaceFn = function(match, capture) {
return (capture in entityToChar) ? entityToChar[capture] : String.fromCharCode(parseInt(capture.substr(2), 10));
};
return (!value) ? value : String(value).replace(entityToCharRegex, htmlDecodeReplaceFn);
}
resetCharacterEntities();
return {
htmlEncode: htmlEncode,
htmlDecode: htmlDecode
};
})();
This is from ExtJS source code.

<script>
String.prototype.htmlEncode = function () {
return String(this)
.replace(/&/g, '&amp;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');
}
var aString = '<script>alert("I hack your site")</script>';
console.log(aString.htmlEncode());
</script>
Will output: &lt;script&gt;alert(&quot;I hack your site&quot;)&lt;/script&gt;
.htmlEncode() will be accessible on all strings once defined.

HtmlEncodes the given value
var htmlEncodeContainer = $('<div />');
function htmlEncode(value) {
if (value) {
return htmlEncodeContainer.text(value).html();
} else {
return '';
}
}

I ran into some issues with backslash in my Domain\User string.
I added this to the other escapes from Anentropic's answer
.replace(/\\/g, '&#92;')
Which I found here:
How to escape backslash in JavaScript?

Picking what escapeHTML() is doing in the prototype.js
Adding this script helps you escapeHTML:
String.prototype.escapeHTML = function() {
return this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;')
}
now you can call escapeHTML method on strings in your script, like:
var escapedString = "<h1>this is HTML</h1>".escapeHTML();
// gives: "&lt;h1&gt;this is HTML&lt;/h1&gt;"
Hope it helps anyone looking for a simple solution without having to include the entire prototype.js

Using some of the other answers here I made a version that replaces all the pertinent characters in one pass irrespective of the number of distinct encoded characters (only one call to replace()) so will be faster for larger strings.
It doesn't rely on the DOM API to exist or on other libraries.
window.encodeHTML = (function() {
function escapeRegex(s) {
return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
var encodings = {
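'&' : '&amp;',
'"' : '&quot;',
'\'' : '&#39;',
'<' : '&lt;',
'>' : '&gt;',
'/' : '&#x2F;' // forward slash (the key was a backslash, which would have been turned into a slash)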
};
function encode(what) { return encodings[what]; };
var specialChars = new RegExp('[' +
escapeRegex(Object.keys(encodings).join('')) +
']', 'g');
return function(text) { return text.replace(specialChars, encode); };
})();
Having run that once, you can now call
encodeHTML('<>&"\'')
to get &lt;&gt;&amp;&quot;&#39;

function encodeHTML(str) {
return document.createElement("a").appendChild(
document.createTextNode(str)).parentNode.innerHTML;
};
function decodeHTML(str) {
var element = document.createElement("a");
element.innerHTML = str;
return element.textContent;
};
var str = "<"
var enc = encodeHTML(str);
var dec = decodeHTML(enc);
console.log("str: " + str, "\nenc: " + enc, "\ndec: " + dec);

Necromancing.
There's certainly no jQuery required for that!
Here is a JavaScript port of System.Web.HttpUtility (C#; disclaimer: not very well tested):
"use strict";
function htmlDecode(s) {
if (s == null)
return null;
if (s.length == 0)
return "";
if (s.indexOf('&') == -1)
return s;
function isDigit(str) {
return /^\d+$/.test(str);
}
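function isHexDigit(str) {
// Called with a single character, so accept one or more hex digits.
// No /g flag: it would make test() stateful between calls.
return /^[0-9A-Fa-f]+$/.test(str);
}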
function initEntities() {
var entities = {};
entities["nbsp"] = '\u00A0';
entities["iexcl"] = '\u00A1';
entities["cent"] = '\u00A2';
entities["pound"] = '\u00A3';
entities["curren"] = '\u00A4';
entities["yen"] = '\u00A5';
entities["brvbar"] = '\u00A6';
entities["sect"] = '\u00A7';
entities["uml"] = '\u00A8';
entities["copy"] = '\u00A9';
entities["ordf"] = '\u00AA';
entities["laquo"] = '\u00AB';
entities["not"] = '\u00AC';
entities["shy"] = '\u00AD';
entities["reg"] = '\u00AE';
entities["macr"] = '\u00AF';
entities["deg"] = '\u00B0';
entities["plusmn"] = '\u00B1';
entities["sup2"] = '\u00B2';
entities["sup3"] = '\u00B3';
entities["acute"] = '\u00B4';
entities["micro"] = '\u00B5';
entities["para"] = '\u00B6';
entities["middot"] = '\u00B7';
entities["cedil"] = '\u00B8';
entities["sup1"] = '\u00B9';
entities["ordm"] = '\u00BA';
entities["raquo"] = '\u00BB';
entities["frac14"] = '\u00BC';
entities["frac12"] = '\u00BD';
entities["frac34"] = '\u00BE';
entities["iquest"] = '\u00BF';
entities["Agrave"] = '\u00C0';
entities["Aacute"] = '\u00C1';
entities["Acirc"] = '\u00C2';
entities["Atilde"] = '\u00C3';
entities["Auml"] = '\u00C4';
entities["Aring"] = '\u00C5';
entities["AElig"] = '\u00C6';
entities["Ccedil"] = '\u00C7';
entities["Egrave"] = '\u00C8';
entities["Eacute"] = '\u00C9';
entities["Ecirc"] = '\u00CA';
entities["Euml"] = '\u00CB';
entities["Igrave"] = '\u00CC';
entities["Iacute"] = '\u00CD';
entities["Icirc"] = '\u00CE';
entities["Iuml"] = '\u00CF';
entities["ETH"] = '\u00D0';
entities["Ntilde"] = '\u00D1';
entities["Ograve"] = '\u00D2';
entities["Oacute"] = '\u00D3';
entities["Ocirc"] = '\u00D4';
entities["Otilde"] = '\u00D5';
entities["Ouml"] = '\u00D6';
entities["times"] = '\u00D7';
entities["Oslash"] = '\u00D8';
entities["Ugrave"] = '\u00D9';
entities["Uacute"] = '\u00DA';
entities["Ucirc"] = '\u00DB';
entities["Uuml"] = '\u00DC';
entities["Yacute"] = '\u00DD';
entities["THORN"] = '\u00DE';
entities["szlig"] = '\u00DF';
entities["agrave"] = '\u00E0';
entities["aacute"] = '\u00E1';
entities["acirc"] = '\u00E2';
entities["atilde"] = '\u00E3';
entities["auml"] = '\u00E4';
entities["aring"] = '\u00E5';
entities["aelig"] = '\u00E6';
entities["ccedil"] = '\u00E7';
entities["egrave"] = '\u00E8';
entities["eacute"] = '\u00E9';
entities["ecirc"] = '\u00EA';
entities["euml"] = '\u00EB';
entities["igrave"] = '\u00EC';
entities["iacute"] = '\u00ED';
entities["icirc"] = '\u00EE';
entities["iuml"] = '\u00EF';
entities["eth"] = '\u00F0';
entities["ntilde"] = '\u00F1';
entities["ograve"] = '\u00F2';
entities["oacute"] = '\u00F3';
entities["ocirc"] = '\u00F4';
entities["otilde"] = '\u00F5';
entities["ouml"] = '\u00F6';
entities["divide"] = '\u00F7';
entities["oslash"] = '\u00F8';
entities["ugrave"] = '\u00F9';
entities["uacute"] = '\u00FA';
entities["ucirc"] = '\u00FB';
entities["uuml"] = '\u00FC';
entities["yacute"] = '\u00FD';
entities["thorn"] = '\u00FE';
entities["yuml"] = '\u00FF';
entities["fnof"] = '\u0192';
entities["Alpha"] = '\u0391';
entities["Beta"] = '\u0392';
entities["Gamma"] = '\u0393';
entities["Delta"] = '\u0394';
entities["Epsilon"] = '\u0395';
entities["Zeta"] = '\u0396';
entities["Eta"] = '\u0397';
entities["Theta"] = '\u0398';
entities["Iota"] = '\u0399';
entities["Kappa"] = '\u039A';
entities["Lambda"] = '\u039B';
entities["Mu"] = '\u039C';
entities["Nu"] = '\u039D';
entities["Xi"] = '\u039E';
entities["Omicron"] = '\u039F';
entities["Pi"] = '\u03A0';
entities["Rho"] = '\u03A1';
entities["Sigma"] = '\u03A3';
entities["Tau"] = '\u03A4';
entities["Upsilon"] = '\u03A5';
entities["Phi"] = '\u03A6';
entities["Chi"] = '\u03A7';
entities["Psi"] = '\u03A8';
entities["Omega"] = '\u03A9';
entities["alpha"] = '\u03B1';
entities["beta"] = '\u03B2';
entities["gamma"] = '\u03B3';
entities["delta"] = '\u03B4';
entities["epsilon"] = '\u03B5';
entities["zeta"] = '\u03B6';
entities["eta"] = '\u03B7';
entities["theta"] = '\u03B8';
entities["iota"] = '\u03B9';
entities["kappa"] = '\u03BA';
entities["lambda"] = '\u03BB';
entities["mu"] = '\u03BC';
entities["nu"] = '\u03BD';
entities["xi"] = '\u03BE';
entities["omicron"] = '\u03BF';
entities["pi"] = '\u03C0';
entities["rho"] = '\u03C1';
entities["sigmaf"] = '\u03C2';
entities["sigma"] = '\u03C3';
entities["tau"] = '\u03C4';
entities["upsilon"] = '\u03C5';
entities["phi"] = '\u03C6';
entities["chi"] = '\u03C7';
entities["psi"] = '\u03C8';
entities["omega"] = '\u03C9';
entities["thetasym"] = '\u03D1';
entities["upsih"] = '\u03D2';
entities["piv"] = '\u03D6';
entities["bull"] = '\u2022';
entities["hellip"] = '\u2026';
entities["prime"] = '\u2032';
entities["Prime"] = '\u2033';
entities["oline"] = '\u203E';
entities["frasl"] = '\u2044';
entities["weierp"] = '\u2118';
entities["image"] = '\u2111';
entities["real"] = '\u211C';
entities["trade"] = '\u2122';
entities["alefsym"] = '\u2135';
entities["larr"] = '\u2190';
entities["uarr"] = '\u2191';
entities["rarr"] = '\u2192';
entities["darr"] = '\u2193';
entities["harr"] = '\u2194';
entities["crarr"] = '\u21B5';
entities["lArr"] = '\u21D0';
entities["uArr"] = '\u21D1';
entities["rArr"] = '\u21D2';
entities["dArr"] = '\u21D3';
entities["hArr"] = '\u21D4';
entities["forall"] = '\u2200';
entities["part"] = '\u2202';
entities["exist"] = '\u2203';
entities["empty"] = '\u2205';
entities["nabla"] = '\u2207';
entities["isin"] = '\u2208';
entities["notin"] = '\u2209';
entities["ni"] = '\u220B';
entities["prod"] = '\u220F';
entities["sum"] = '\u2211';
entities["minus"] = '\u2212';
entities["lowast"] = '\u2217';
entities["radic"] = '\u221A';
entities["prop"] = '\u221D';
entities["infin"] = '\u221E';
entities["ang"] = '\u2220';
entities["and"] = '\u2227';
entities["or"] = '\u2228';
entities["cap"] = '\u2229';
entities["cup"] = '\u222A';
entities["int"] = '\u222B';
entities["there4"] = '\u2234';
entities["sim"] = '\u223C';
entities["cong"] = '\u2245';
entities["asymp"] = '\u2248';
entities["ne"] = '\u2260';
entities["equiv"] = '\u2261';
entities["le"] = '\u2264';
entities["ge"] = '\u2265';
entities["sub"] = '\u2282';
entities["sup"] = '\u2283';
entities["nsub"] = '\u2284';
entities["sube"] = '\u2286';
entities["supe"] = '\u2287';
entities["oplus"] = '\u2295';
entities["otimes"] = '\u2297';
entities["perp"] = '\u22A5';
entities["sdot"] = '\u22C5';
entities["lceil"] = '\u2308';
entities["rceil"] = '\u2309';
entities["lfloor"] = '\u230A';
entities["rfloor"] = '\u230B';
entities["lang"] = '\u2329';
entities["rang"] = '\u232A';
entities["loz"] = '\u25CA';
entities["spades"] = '\u2660';
entities["clubs"] = '\u2663';
entities["hearts"] = '\u2665';
entities["diams"] = '\u2666';
entities["quot"] = '\u0022';
entities["amp"] = '\u0026';
entities["lt"] = '\u003C';
entities["gt"] = '\u003E';
entities["OElig"] = '\u0152';
entities["oelig"] = '\u0153';
entities["Scaron"] = '\u0160';
entities["scaron"] = '\u0161';
entities["Yuml"] = '\u0178';
entities["circ"] = '\u02C6';
entities["tilde"] = '\u02DC';
entities["ensp"] = '\u2002';
entities["emsp"] = '\u2003';
entities["thinsp"] = '\u2009';
entities["zwnj"] = '\u200C';
entities["zwj"] = '\u200D';
entities["lrm"] = '\u200E';
entities["rlm"] = '\u200F';
entities["ndash"] = '\u2013';
entities["mdash"] = '\u2014';
entities["lsquo"] = '\u2018';
entities["rsquo"] = '\u2019';
entities["sbquo"] = '\u201A';
entities["ldquo"] = '\u201C';
entities["rdquo"] = '\u201D';
entities["bdquo"] = '\u201E';
entities["dagger"] = '\u2020';
entities["Dagger"] = '\u2021';
entities["permil"] = '\u2030';
entities["lsaquo"] = '\u2039';
entities["rsaquo"] = '\u203A';
entities["euro"] = '\u20AC';
return entities;
}
var Entities = initEntities();
var rawEntity = [];
var entity = [];
var output = [];
var len = s.length;
var state = 0;
var number = 0;
var is_hex_value = false;
var have_trailing_digits = false;
for (var i = 0; i < len; i++) {
var c = s[i];
if (state == 0) {
if (c == '&') {
entity.push(c);
rawEntity.push(c);
state = 1;
}
else {
output.push(c);
}
continue;
}
if (c == '&') {
state = 1;
if (have_trailing_digits) {
entity.push(number.toString());
have_trailing_digits = false;
}
output.push(entity.join(""));
entity = [];
entity.push('&');
continue;
}
if (state == 1) {
if (c == ';') {
state = 0;
output.push(entity.join(""));
output.push(c);
entity = [];
}
else {
number = 0;
is_hex_value = false;
if (c != '#') {
state = 2;
}
else {
state = 3;
}
entity.push(c);
rawEntity.push(c);
}
}
else if (state == 2) {
entity.push(c);
if (c == ';') {
var key = entity.join("");
if (key.length > 1 && Entities.hasOwnProperty(key.substr(1, key.length - 2)))
key = Entities[key.substr(1, key.length - 2)].toString();
output.push(key);
state = 0;
entity = [];
rawEntity = [];
}
}
else if (state == 3) {
if (c == ';') {
if (number == 0)
output.push(rawEntity.join("") + ";");
else if (number > 65535) {
output.push("&#");
output.push(number.toString());
output.push(";");
}
else {
output.push(String.fromCharCode(number));
}
state = 0;
entity = [];
rawEntity = [];
have_trailing_digits = false;
}
else if (is_hex_value && isHexDigit(c)) {
number = number * 16 + parseInt(c, 16);
have_trailing_digits = true;
rawEntity.push(c);
}
else if (isDigit(c)) {
number = number * 10 + (c.charCodeAt(0) - '0'.charCodeAt(0));
have_trailing_digits = true;
rawEntity.push(c);
}
else if (number == 0 && (c == 'x' || c == 'X')) {
is_hex_value = true;
rawEntity.push(c);
}
else {
state = 2;
if (have_trailing_digits) {
entity.push(number.toString());
have_trailing_digits = false;
}
entity.push(c);
}
}
}
if (entity.length > 0) {
output.push(entity.join(""));
}
else if (have_trailing_digits) {
output.push(number.toString());
}
return output.join("");
}
function htmlEncode(s) {
if (s == null)
return null;
if (s.length == 0)
return s;
var needEncode = false;
for (var i = 0; i < s.length; i++) {
var c = s[i];
if (c == '&' || c == '"' || c == '<' || c == '>' || c.charCodeAt(0) > 159
|| c == '\'') {
needEncode = true;
break;
}
}
if (!needEncode)
return s;
var output = [];
var len = s.length;
for (var i = 0; i < len; i++) {
var ch = s[i];
switch (ch) {
case '&':
output.push("&amp;");
break;
case '>':
output.push("&gt;");
break;
case '<':
output.push("&lt;");
break;
case '"':
output.push("&quot;");
break;
case '\'':
output.push("&#39;");
break;
case '\uff1c':
output.push("&#65308;");
break;
case '\uff1e':
output.push("&#65310;");
break;
default:
if (ch.charCodeAt(0) > 159 && ch.charCodeAt(0) < 256) {
output.push("&#");
output.push(ch.charCodeAt(0).toString());
output.push(";");
}
else
output.push(ch);
break;
}
}
return output.join("");
}

Related

Javascript - how to render an output by typing three (or more) alphabets into the input?

I am making an html page which is a typer of a foreign script.
my progress: HERE
Here's the entire javascript:
function getReplacedText(latinText) {
if (!latinText) {
return "";
}
var replacedText = "";
for (var i = 0, len = latinText.length; i < len; i++) {
var curLetter = latinText[i];
var pos1Txt = latinText[i + 1];
var pos2Txt = latinText[i + 2];
if (!(curLetter == "")) {
var dualLetter = latreplaced[curLetter + pos1Txt];
if (dualLetter) {
replacedText += dualLetter;
i++;
continue;
}
}
replacedText += latreplaced[curLetter] || curLetter;
}
return replacedText;
}
var latreplaced = {
"u":"う",
"ku":"く",
"tsu":"つ",
};
function onLatinTextChange(txt) {
var replacedTextareaElem = document.getElementById("replaced_textarea");
var div = document.createElement("div");
var replacedHtmlEntities = getReplacedText(txt);
div.innerHTML = replacedHtmlEntities;
replacedTextareaElem.value = div.innerText;
}
The purpose of this project is to create a virtual phonetic keyboard for typing certain foreign scripts using only Latin letters, without installing the corresponding keyboard layout.
Basically, if you enter a letter into the input <textarea>, it renders the corresponding foreign character. (For instance, input 'u' > output 'う', input 'ku' > output 'く')
Here is my problem: so far I have enabled rendering an output when one or two letters are typed into the input box. But I cannot figure out how to do the same for three letters. (For instance, input 'tsu' > output 'つ')
"u":"う", // <- can convert
"ku":"く", // <- can convert
"tsu":"つ", // <- cannot convert!
In the javascript code, there is a var called dualLetter, which goes by the following script:
var dualLetter = latreplaced[curLetter + pos1Txt];
How can I edit this part of the code (or the entire JavaScript) to be able to convert 3 or more input letters? Do I need to make a var tripleLetter, or create a whole new system? Any alternative approaches would also be helpful.
[edit] A solution inspired by your code:
I changed the main function, but this definitely works.
live demo : https://jsfiddle.net/alias_gui3/wds426mq/12/
source code :
var dictionnary = {
"u":"う",
"ku":"く",
"tsu":"つ",
"test for spaces": "😍"
};
var maxLength = Object.keys(dictionnary)
.reduce((a, b) => a.length > b.length ? a : b) // get the longest word
.length; // and its length
function translate (text) {
var translated = "";
var cur = 0;
while (cur < text.length) {
var testedPhoneme;
var symbol = undefined;
for (var length = maxLength; length > 0; length --) {
testedPhoneme = text.substr(cur, length);
if (dictionnary[testedPhoneme]) {
symbol = dictionnary[testedPhoneme];
break; // stop the loop
}
}
if (symbol) {
translated += symbol;
cur += testedPhoneme.length;
}
else {
translated += text[cur]
cur++;
}
}
return translated
}
function onLatinTextChange(txt) {
var replacedTextareaElem = document.getElementById("replaced_textarea");
var div = document.createElement("div");
var replacedHtmlEntities = translate(txt);
div.innerHTML = replacedHtmlEntities;
replacedTextareaElem.value = div.innerText;
}
[previous post] a simple solution :
I suggest you split your text using spaces.
If I understand correctly, you want to type u ku tsu to get うくつ, not ukutsu. If that is right, then something like this could work:
const dictionnary = {
"u": "う",
"ku": "く",
"tsu": "つ"
};
var text = "u ku tsu"; // example input
var phonemes = text.split(' ') // split text by spaces
var translatedArray = phonemes.map(function (phoneme) {
return dictionnary[phoneme] || phoneme
// will return the latin phoneme if it is not in the dictionnary
})
var translatedString = translatedArray.join('') // "うくつ"

How to add or replace a query parameter in a URL using Javascript/jQuery?

I'm using jQuery 1.12. I want to replace a query string parameter in my window's URL query string, or add the parameter if it doesn't exist. I tried the below:
new_url = window.location.href.replace( /[\?#].*|$/, "?order_by=" + data_val )
window.location.href = new_url
but what I'm discovering is that this wipes out all previous parameters in the query string, which I don't want. If the query string is:
?a=1&b=2
I would want the new query string to be:
?a=1&b=2&order_by=data
and if the query string was:
?a=2&b=3&order_by=old_data
it would become:
?a=2&b=3&order_by=data
You could use a jQuery plugin to do all the heavy lifting for you. It will parse the query string and also reconstruct the updated query string for you. Much less code to deal with.
Plugin Download Page
Github Repo
// URL: ?a=2&b=3&order_by=old_data
var order_by = $.query.get('order_by');
//=> old_data
// Conditionally modify parameter value
if (order_by) {
order_by = "data";
}
// Inject modified parameter back into query string
var newUrl = $.query.set("order_by", order_by).toString();
//=> ?a=2&b=3&order_by=data
For those using Node.js, there is a package for this available in NPM.
NPM Package
Github Repo
var queryString = require('query-string');
var parsed = queryString.parse('?a=2&b=3&order_by=old_data'); // location.search
// Conditionally modify parameter value
if (parsed.order_by) {
parsed.order_by = 'data';
}
// Inject modified parameter back into query string
const newQueryString = queryString.stringify(parsed);
//=> a=2&b=3&order_by=data
A good solution ought to handle all of the following:
A URL that already has an order_by query parameter, optionally with whitespace before the equals sign. This can be further divided into cases where the order_by appears at the start, middle or end of the query string.
A URL that doesn't already have an order_by query parameter but does already have a question mark to delimit the query string.
A URL that doesn't already have an order_by query parameter and doesn't already have a question mark to delimit the query string.
The following will handle the cases above:
if (/[?&]order_by\s*=/.test(oldUrl)) {
newUrl = oldUrl.replace(/(?:([?&])order_by\s*=[^?&]*)/, "$1order_by=" + data_val);
} else if (/\?/.test(oldUrl)) {
newUrl = oldUrl + "&order_by=" + data_val;
} else {
newUrl = oldUrl + "?order_by=" + data_val;
}
as demonstrated below:
getNewUrl("?a=1&b=2");
getNewUrl("?a=2&b=3&order_by=old_data");
getNewUrl("?a=2&b=3&order_by = old_data&c=4");
getNewUrl("?order_by=old_data&a=2&b=3");
getNewUrl("http://www.stackoverflow.com");
function getNewUrl(oldUrl) {
var data_val = "new_data";
var newUrl;
if (/[?&]order_by\s*=/.test(oldUrl)) {
newUrl = oldUrl.replace(/(?:([?&])order_by\s*=[^?&]*)/, "$1order_by=" + data_val);
} else if (/\?/.test(oldUrl)) {
newUrl = oldUrl + "&order_by=" + data_val;
} else {
newUrl = oldUrl + "?order_by=" + data_val;
}
console.log(oldUrl + "\n...becomes...\n" + newUrl);
}
something like this?
let new_url = "";
if (window.location.search && window.location.search.indexOf('order_by=') != -1) {
new_url = window.location.search.replace( /order_by=\w*\d*/, "order_by=" + data_val);
} else if (window.location.search) {
new_url = window.location.search + "&order_by=" + data_val;
} else {
new_url = window.location.search + "?order_by=" + data_val;
}
window.location.href = new_url;
function addOrReplaceOrderBy(newData) {
var stringToAdd = "order_by=" + newData;
if (window.location.search == "")
return window.location.href + stringToAdd;
if (window.location.search.indexOf('order_by=') == -1)
return window.location.href + stringToAdd;
var newSearchString = "";
var searchParams = window.location.search.substring(1).split("&");
for (var i = 0; i < searchParams.length; i++) {
if (searchParams[i].indexOf('order_by=') > -1) {
searchParams[i] = "order_by=" + newData;
break;
}
}
return window.location.href.split("?")[0] + "?" + searchParams.join("&");
}
window.location.href = addOrReplaceOrderBy("new_order_by");
A little long but I think it works as intended.
You can remove a parameter from the query string using URLSearchParams: https://developer.mozilla.org/ru/docs/Web/API/URLSearchParams?param11=val
It is not yet supported by IE and Safari, but you can use it by adding a polyfill: https://github.com/jerrybendy/url-search-params-polyfill
To access or modify the query part of the URI, use the "search" property of window.location.
Working code example:
var a = document.createElement("a")
a.href = "http://localhost.com?param1=val&param2=val2&param3=val3#myHashCode";
var queryParams = new URLSearchParams(a.search)
queryParams.delete("param2")
a.search = queryParams.toString();
console.log(a.href);
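The same API also covers the add-or-replace case from the question, because URLSearchParams.set overwrites an existing key and appends a missing one. Below is a minimal sketch, assuming a runtime that provides the standard URL class (modern browsers, Node.js). The helper name setQueryParam and the example.com URLs are made up for illustration.

```javascript
// Hypothetical helper: add or replace one query parameter in an absolute URL string.
function setQueryParam(urlString, key, value) {
  var url = new URL(urlString);       // throws if urlString is not an absolute URL
  url.searchParams.set(key, value);   // replaces an existing key, or appends a new one
  return url.toString();              // path and #fragment are left untouched
}

console.log(setQueryParam("http://example.com/list?a=2&b=3&order_by=old_data", "order_by", "data"));
console.log(setQueryParam("http://example.com/list?a=1&b=2#top", "order_by", "data"));
```

In a page, the result could be assigned to window.location.href, or passed to history.pushState to avoid a reload.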
Try this:
For reading parameters:
const data = ['example.com?var1=value1&var2=value2&var3=value3', 'example.com?a=2&b=2&order_by=data']
const getParameters = url => {
const parameters = url.split('?')[1],
regex = /(\w+)=(\w+)/g,
obj = {}
let temp
while (temp = regex.exec(parameters)){
obj[temp[1]] = decodeURIComponent(temp[2])
}
return obj
}
for(let url of data){
console.log(getParameters(url))
}
For placing only this parameters:
const data = ['example.com?zzz=asd']
const parameters = {a:1, b:2, add: "abs"}
const setParameters = (url, parameters) => {
const keys = Object.keys(parameters)
let temp = url.split('?')[0] + '?'
for (let i = 0; i < keys.length; i++) {
temp += `${keys[i]}=${parameters[keys[i]]}${i == keys.length - 1 ? '' : '&'}`
}
return temp
}
for (let url of data){
console.log(setParameters(url, parameters))
}
And finally, for inserting (or replacing, if the parameter already exists):
const data = ['example.com?a=123&b=3&sum=126']
const parameters = {order_by: 'abc', a: 11}
const insertParameters = (url, parameters) => {
const keys = Object.keys(parameters)
let result = url
for (let i = 0; i < keys.length; i++){
if (result.indexOf(keys[i]) === -1) {
result += `&${keys[i]}=${encodeURIComponent(parameters[keys[i]])}`
} else {
let regex = new RegExp(`${keys[i]}=(\\w+)`)
result = result.replace(regex, `${keys[i]}=${encodeURIComponent(parameters[keys[i]])}`)
}
}
return result
}
for (let url of data){
console.log(insertParameters(url, parameters))
}
Hope this works for you ;)
After using function just replace window.location.href
This small function could help.
function changeSearchQueryParameter(oldParameter,newParameter,newValue) {
var parameters = location.search.replace("?", "").split("&").filter(function(el){ return el !== "" });
var out = "";
var count = 0;
if(oldParameter.length>0) {
if(newParameter.length>0 && (newValue.length>0 || newValue>=0)){
out += "?";
var params = [];
parameters.forEach(function(v){
var vA = v.split("=");
if(vA[0]==oldParameter) {
vA[0]=newParameter;
if((newValue.length>0 || newValue>=0)) {
vA[1] = newValue;
}
} else {
count++;
}
params.push(vA.join("="));
});
if(count==parameters.length) {
params.push([newParameter,newValue].join("="));
}
params = params.filter(function(el){ return el !== "" });
if(params.length>1) {
out += params.join("&");
}
if(params.length==1) {
out += params[0];
}
}
} else {
if((newParameter.length>0) && (newValue.length>0 || newValue>=0)){
if(location.href.indexOf("?")!==-1) {
var out = "&"+newParameter+"="+newValue;
} else {
var out = "?"+newParameter+"="+newValue;
}
}
}
return location.href+out;
}
// if old query parameter is declared but does not exist in url then new parameter and value is simply added if it exists it will be replaced
console.log(changeSearchQueryParameter("ib","idx",5));
// add new parameter and value in url
console.log(changeSearchQueryParameter("","idy",5));
// if no new or old parameter are present url does not change
console.log(changeSearchQueryParameter("","",5));
console.log(changeSearchQueryParameter("","",""));
Maybe you could try tweaking the regular expression to retrieve only the values you're looking for, then add or update them in a helper function, something like this:
function paramUpdate(param) {
var url = window.location.href,
regExp = new RegExp(param.key + '=([a-z0-9\-\_]+)(?:&)?'),
existsMatch = url.match(regExp);
if (!existsMatch) {
return url + '&' + param.key + '=' + param.value
}
var paramToUpdate = existsMatch[0],
valueToReplace = existsMatch[1],
updatedParam = paramToUpdate.replace(valueToReplace, param.value);
return url.replace(paramToUpdate, updatedParam);
}
var new_url = paramUpdate({
key: 'order_by',
value: 'id'
});
window.location.href = new_url;
Hope it works well for your needs!
To use Regex pattern, I prefer this one:
var oldUrl = "http://stackoverflow.com/";
var data_val = "newORDER" ;
var r = /^(.+order_by=).+?(&|$)(.*)$/i ;
var newUrl = "";
var matches = oldUrl.match(r) ;
if(matches===null){
newUrl = oldUrl + ((oldUrl.indexOf("?")>-1)?"&":"?") + "order_by=" + data_val ;
}else{
newUrl = matches[1]+data_val+matches[2]+matches[3] ;
}
console.log(newUrl);
If no order_by exists, matches is null and order_by=... is appended after ? or & (a new parameter needs & when other parameters already exist).
If order_by exists, matches holds three captured groups, see here
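To make the three groups concrete, here is a small sketch that wraps the same pattern in a function and runs it on a few URLs. The function name replaceOrderBy and the example.com URLs are assumptions for illustration. Note that the array returned by match also carries the full match at index 0, ahead of the three groups.

```javascript
// Hypothetical wrapper around the pattern above, so it can be tried on several URLs.
function replaceOrderBy(oldUrl, data_val) {
  var r = /^(.+order_by=).+?(&|$)(.*)$/i;
  var matches = oldUrl.match(r);
  if (matches === null) {
    // No order_by yet: append it with "&" if a query string exists, else with "?".
    return oldUrl + (oldUrl.indexOf("?") > -1 ? "&" : "?") + "order_by=" + data_val;
  }
  // matches[0] is the whole URL; [1] is everything up to and including "order_by=",
  // [2] is the "&" that follows the old value (or "" at the end), [3] is the rest.
  return matches[1] + data_val + matches[2] + matches[3];
}

console.log(replaceOrderBy("http://example.com/?a=2&order_by=old&c=4", "newORDER"));
console.log(replaceOrderBy("http://example.com/?a=2", "newORDER"));
console.log(replaceOrderBy("http://example.com/", "newORDER"));
```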
Based on AVAVT´s answer I improved it so it takes any key, and I also fixed the missing "?" if there was no querystring
function addOrReplace(key, value) {
var stringToAdd = key+"=" + value;
if (window.location.search == "")
return window.location.href + '?'+stringToAdd;
if (window.location.search.indexOf(key+'=') == -1)
return window.location.href + '&' + stringToAdd;
var newSearchString = "";
var searchParams = window.location.search.substring(1).split("&");
for (var i = 0; i < searchParams.length; i++) {
if (searchParams[i].indexOf(key+'=') > -1) {
searchParams[i] = key+"=" + value;
break;
}
}
return window.location.href.split("?")[0] + "?" + searchParams.join("&");
}
Usage:
window.location.href = addOrReplace('order_by', 'date_created');
If you don't want to reload the page, you can use the pushState API:
if (history.pushState) {
var newurl = addOrReplace('order_by', 'date_created');
window.history.pushState({path:newurl},'',newurl);
}
function myFunction() {
var str = "https://www.citicards.com/cards/credit/application/flow.action?app=UNSOL&HKOP=828cca70910b4fe25e118bd0b59b89c3c7c560df877909495d8252d20026cf8d&cmp=afa|acquire|2003|comparecards&ranMID=44660&ranEAID=2759285&ProspectID=516511657A844EF3A6F0C2B1E85FEFB0&ID=3000";
var res = str.split("&");
var myKey;
if (!str.includes("ranSiteID")) {
console.log("key not found ");
res.push('ranSiteID=samplearsdyfguh.090-nuvbknlmc0.gvyhbjknl')
console.log(res.join("&"));
} else {
res.map(function(key) {
console.log("my keys", key);
if (key.includes("ranSiteID")) {
console.log("my required-->key", key);
myKey = key.split("=");
console.log(myKey);
}
})
}
document.getElementById("demo").innerHTML = res;
}
<!DOCTYPE html>
<html>
<body>
<p>Click the button to display the array values after the split.</p>
<button onclick="myFunction()">Try it</button>
<p id="demo"></p>
</body>
</html>

javascript regex replace expressions found with themselves within tags

I have a form where a user can enter some code, and I then want to use JavaScript to display it back to them in the web page. I've already used a regex in JavaScript to replace all the < > tag symbols with the HTML keywords &lt and &gt, but I want to highlight in blue all instances of a tag being opened and then ending with a space or a close tag. I can find the expressions I want, but I then want to surround each of them with <span> tags.
The actual code is long but Here's some example code that covers what I want to do:
//example of what a user might put in
var text = "<div id='main'>Here is some <b>bold</b> text.</div>";
//Replace all tag symbols with html keywords
text = text.replace(/\r?</g,'&lt');
//now the expression to get what i want to highlight blue
var regExp = /\&lt[a-zA-Z]+(\s|&gt)/g;
And now I want to find the expressions, and replace them all with themselves wrapped inside span tags, like:
text = text.replace(regExp,"<span class='bluefont'>EACH EXPRESSION FOUND</span>");
I don't know how to do this or if it's even possible just using replace, but it would be really handy if it is.
I know there are external libraries for syntax highlighting but I don't want to use any external libraries for this. I'm using [a-zA-Z] instead of checking for legal tag names in html because I want this to work for xml/xhtml too.
There is a simple solution: replace all < and > with entities, then use the regexp (&lt;[a-zA-Z]*&gt;). But to be exact, I would use something else for the first replacement, like below.
var markedString = htmlentities('My cat is <span>fluffy</span>');
markedString = markedString.replace(new RegExp('(&lt;[a-zA-Z]*&gt;)', 'g'), '<span class="marked">$1</span>');
function htmlentities (string, quote_style, charset, double_encode) {
var hash_map = this.get_html_translation_table('HTML_ENTITIES', quote_style),
symbol = '';
string = string == null ? '' : string + '';
if (!hash_map) {
return false;
}
if (quote_style && quote_style === 'ENT_QUOTES') {
hash_map["'"] = '&#039;';
}
if (!!double_encode || double_encode == null) {
for (symbol in hash_map) {
if (hash_map.hasOwnProperty(symbol)) {
string = string.split(symbol).join(hash_map[symbol]);
}
}
} else {
string = string.replace(/([\s\S]*?)(&(?:#\d+|#x[\da-f]+|[a-zA-Z][\da-z]*);|$)/g, function (ignore, text, entity) {
for (symbol in hash_map) {
if (hash_map.hasOwnProperty(symbol)) {
text = text.split(symbol).join(hash_map[symbol]);
}
}
return text + entity;
});
}
return string;
}
function get_html_translation_table (table, quote_style) {
var entities = {},
hash_map = {},
decimal;
var constMappingTable = {},
constMappingQuoteStyle = {};
var useTable = {},
useQuoteStyle = {};
// Translate arguments
constMappingTable[0] = 'HTML_SPECIALCHARS';
constMappingTable[1] = 'HTML_ENTITIES';
constMappingQuoteStyle[0] = 'ENT_NOQUOTES';
constMappingQuoteStyle[2] = 'ENT_COMPAT';
constMappingQuoteStyle[3] = 'ENT_QUOTES';
useTable = !isNaN(table) ? constMappingTable[table] : table ? table.toUpperCase() : 'HTML_SPECIALCHARS';
useQuoteStyle = !isNaN(quote_style) ? constMappingQuoteStyle[quote_style] : quote_style ? quote_style.toUpperCase() : 'ENT_COMPAT';
if (useTable !== 'HTML_SPECIALCHARS' && useTable !== 'HTML_ENTITIES') {
throw new Error("Table: " + useTable + ' not supported');
// return false;
}
entities['38'] = '&amp;';
if (useTable === 'HTML_ENTITIES') {
entities['160'] = '&nbsp;';
entities['161'] = '&iexcl;';
entities['162'] = '&cent;';
entities['163'] = '&pound;';
entities['164'] = '&curren;';
entities['165'] = '&yen;';
entities['166'] = '&brvbar;';
entities['167'] = '&sect;';
entities['168'] = '&uml;';
entities['169'] = '&copy;';
entities['170'] = '&ordf;';
entities['171'] = '&laquo;';
entities['172'] = '&not;';
entities['173'] = '&shy;';
entities['174'] = '&reg;';
entities['175'] = '&macr;';
entities['176'] = '&deg;';
entities['177'] = '&plusmn;';
entities['178'] = '&sup2;';
entities['179'] = '&sup3;';
entities['180'] = '&acute;';
entities['181'] = '&micro;';
entities['182'] = '&para;';
entities['183'] = '&middot;';
entities['184'] = '&cedil;';
entities['185'] = '&sup1;';
entities['186'] = '&ordm;';
entities['187'] = '&raquo;';
entities['188'] = '&frac14;';
entities['189'] = '&frac12;';
entities['190'] = '&frac34;';
entities['191'] = '&iquest;';
entities['192'] = '&Agrave;';
entities['193'] = '&Aacute;';
entities['194'] = '&Acirc;';
entities['195'] = '&Atilde;';
entities['196'] = '&Auml;';
entities['197'] = '&Aring;';
entities['198'] = '&AElig;';
entities['199'] = '&Ccedil;';
entities['200'] = '&Egrave;';
entities['201'] = '&Eacute;';
entities['202'] = '&Ecirc;';
entities['203'] = '&Euml;';
entities['204'] = '&Igrave;';
entities['205'] = '&Iacute;';
entities['206'] = '&Icirc;';
entities['207'] = '&Iuml;';
entities['208'] = '&ETH;';
entities['209'] = '&Ntilde;';
entities['210'] = '&Ograve;';
entities['211'] = '&Oacute;';
entities['212'] = '&Ocirc;';
entities['213'] = '&Otilde;';
entities['214'] = '&Ouml;';
entities['215'] = '&times;';
entities['216'] = '&Oslash;';
entities['217'] = '&Ugrave;';
entities['218'] = '&Uacute;';
entities['219'] = '&Ucirc;';
entities['220'] = '&Uuml;';
entities['221'] = '&Yacute;';
entities['222'] = '&THORN;';
entities['223'] = '&szlig;';
entities['224'] = '&agrave;';
entities['225'] = '&aacute;';
entities['226'] = '&acirc;';
entities['227'] = '&atilde;';
entities['228'] = '&auml;';
entities['229'] = '&aring;';
entities['230'] = '&aelig;';
entities['231'] = '&ccedil;';
entities['232'] = '&egrave;';
entities['233'] = '&eacute;';
entities['234'] = '&ecirc;';
entities['235'] = '&euml;';
entities['236'] = '&igrave;';
entities['237'] = '&iacute;';
entities['238'] = '&icirc;';
entities['239'] = '&iuml;';
entities['240'] = '&eth;';
entities['241'] = '&ntilde;';
entities['242'] = '&ograve;';
entities['243'] = '&oacute;';
entities['244'] = '&ocirc;';
entities['245'] = '&otilde;';
entities['246'] = '&ouml;';
entities['247'] = '&divide;';
entities['248'] = '&oslash;';
entities['249'] = '&ugrave;';
entities['250'] = '&uacute;';
entities['251'] = '&ucirc;';
entities['252'] = '&uuml;';
entities['253'] = '&yacute;';
entities['254'] = '&thorn;';
entities['255'] = '&yuml;';
}
if (useQuoteStyle !== 'ENT_NOQUOTES') {
entities['34'] = '&quot;';
}
if (useQuoteStyle === 'ENT_QUOTES') {
entities['39'] = '&#39;';
}
entities['60'] = '&lt;';
entities['62'] = '&gt;';
// ascii decimals to real symbols
for (decimal in entities) {
if (entities.hasOwnProperty(decimal)) {
hash_map[String.fromCharCode(decimal)] = entities[decimal];
}
}
return hash_map;
}
function htmlentities from http://phpjs.org/functions/htmlentities/
function get_html_translation_table from http://phpjs.org/functions/get_html_translation_table/
I'm no regex expert, but I believe you could do it using backreferences in your regular expression.
var regExp = /(\&lt[a-zA-Z]+(\s|&gt))/g; //Note the additional parentheses
text = text.replace(regExp,"<span class='bluefont'>$1</span>");
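An alternative that needs no extra parentheses is the special replacement pattern $&, which String.prototype.replace expands to the whole matched substring. The sketch below uses the sample input from the question. As an assumption on my part, it escapes with the full entities &lt; and &gt; (with the trailing semicolon) and escapes > as well as <.

```javascript
// Sample input, taken from the question.
var text = "<div id='main'>Here is some <b>bold</b> text.</div>";

// Escape the angle brackets first, using complete entities.
var escaped = text.replace(/</g, "&lt;").replace(/>/g, "&gt;");

// "$&" in the replacement string stands for the entire match,
// so every opening tag found is wrapped in a span around itself.
var highlighted = escaped.replace(/&lt;[a-zA-Z]+(\s|&gt;)/g, "<span class='bluefont'>$&</span>");

console.log(highlighted);
```

Closing tags such as </b> are left unwrapped here, because the pattern requires a letter directly after &lt;.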

building a database string

I'm trying to build a database based on some arbitrary data on a website. It's complex and changes for each site so I'll spare the details. Here's basically what I'm trying to do
function level0(arg) { textarea.innerHTML += arg + ' = {'; }
function level1(arg) { textarea.innerHTML += '\n\t' + arg + ': ['; }
function level2(arg) { textarea.innerHTML += arg + ', '; }
And so on. The thing is some level1's don't have any children and I can't get the formatting right.
My three problems are as follows.
The ending commas are going to break in IE (thank you MS)
Empty level1's shouldn't be printed if they don't have any children
Closing /curly?brackets/
HERE'S A DEMO of what I have so far. Notice the ending commas, the empty sub2 which shouldn't be printed, and no closing brackets or braces
Do I need to redesign the entire thing?
Is there also a way to have this all in one function so I don't have to worry if I add another layer?
EDIT
This needs to be done in a string format, I can't build an object and then stringify it, mostly because I need to know which element I'm in the middle of adding to.
Overall, it looks like you might still want to build an object, but in case you insist on not building one, here is a sample solution:
function Printer() {
var result = '',
lastLevel = null,
close = {0:'\n}', 1:']', 2:''},
delimiter = {0: ',\n', 1:',\n', 2:','};
function closeLevel(level, noDelimiter) {
if(lastLevel === null)
return;
var l = lastLevel, d = level == lastLevel;
while(l >= level) {
result += close[l] + (l == level && !noDelimiter ? delimiter[l]:'');
l--;
}
}
this.level0 = function(arg) {
closeLevel(0);
result += arg + ' = {\n';
lastLevel = 0;
};
this.level1 = function(arg) {
closeLevel(1);
result += '\t' + arg + ': [';
lastLevel = 1;
};
this.level2 = function(arg) {
closeLevel(2);
result += arg;
lastLevel = 2;
};
this.getResult = function() {
closeLevel(lastLevel, true);
return result;
}
}
var p = new Printer();
p.level0('head');
p.level1('sub1');
p.level2('item1');p.level2('item2');p.level2('item3');
p.level1('sub2');
p.level1('sub3');
p.level2('newthing');
p.level0('head2');
document.getElementById('textarea').value = p.getResult();
You could see it in action here.
I'm not sure why you're building what looks like objects with nested arrays, using string concatenation. Something like this would be much simpler, since it wouldn't require fixing trailing commas, etc:
Edit: I've updated the code to make it keep track of the last level put in.
function Db() {
var level0, level1;
var data = new Object();
this.level0 = function(arg) {
level0 = new Object();
data[arg] = level0;
}
this.level1 = function(arg) {
level1 = new Array();
level0[arg] = level1;
}
this.level2 = function(arg) {
level1.push(arg);
}
this.toString = function() {
var s = '';
for(var i in data) {
s += i + '\n';
for(var j in data[i]) {
if(data[i][j].length>0) {
s += '\t' + j + ': [' + data[i][j] + ']\n' ;
}
}
}
return s;
}
}
Use like this:
var db = new Db();
db.level0('head');
db.level1('sub1');
db.level2('item1');db.level2('item2');db.level2('item3');
I've tested this in the demo you linked and it works just fine.
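If a plain object is acceptable, the serialization step can also be left to JSON.stringify, which never emits trailing commas and always closes its brackets. The following is only a sketch: the data literal mirrors the demo calls from the question, and dropEmptyArrays is a made-up name for a replacer function that omits childless level1 entries.

```javascript
// Hypothetical data in the shape that the level0/level1/level2 calls would build.
var data = {
  head: { sub1: ["item1", "item2", "item3"], sub2: [], sub3: ["newthing"] },
  head2: {}
};

// Replacer: returning undefined for an object property makes JSON.stringify skip it,
// so keys whose value is an empty array are left out of the output.
function dropEmptyArrays(key, value) {
  return Array.isArray(value) && value.length === 0 ? undefined : value;
}

var output = JSON.stringify(data, dropEmptyArrays, "\t");
console.log(output);
```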

Pretty printing XML with javascript

I have a string that represents a non indented XML that I would like to pretty-print. For example:
<root><node/></root>
should become:
<root>
<node/>
</root>
Syntax highlighting is not a requirement. To tackle the problem I first transform the XML to add carriage returns and white spaces and then use a pre tag to output the XML. To add new lines and white spaces I wrote the following function:
function formatXml(xml) {
var formatted = '';
var reg = /(>)(<)(\/*)/g;
xml = xml.replace(reg, '$1\r\n$2$3');
var pad = 0;
jQuery.each(xml.split('\r\n'), function(index, node) {
var indent = 0;
if (node.match( /.+<\/\w[^>]*>$/ )) {
indent = 0;
} else if (node.match( /^<\/\w/ )) {
if (pad != 0) {
pad -= 1;
}
} else if (node.match( /^<\w[^>]*[^\/]>.*$/ )) {
indent = 1;
} else {
indent = 0;
}
var padding = '';
for (var i = 0; i < pad; i++) {
padding += ' ';
}
formatted += padding + node + '\r\n';
pad += indent;
});
return formatted;
}
I then call the function like this:
jQuery('pre.formatted-xml').text(formatXml('<root><node1/></root>'));
This works perfectly fine for me but while I was writing the previous function I thought that there must be a better way. So my question is do you know of any better way given an XML string to pretty-print it in an html page? Any javascript frameworks and/or plugins that could do the job are welcome. My only requirement is this to be done on the client side.
This can be done using native JavaScript tools, without 3rd party libs, extending @Dimitre Novatchev's answer:
var prettifyXml = function(sourceXml)
{
var xmlDoc = new DOMParser().parseFromString(sourceXml, 'application/xml');
var xsltDoc = new DOMParser().parseFromString([
// describes how we want to modify the XML - indent everything
'<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
' <xsl:strip-space elements="*"/>',
' <xsl:template match="para[content-style][not(text())]">', // change to just text() to strip space in text nodes
' <xsl:value-of select="normalize-space(.)"/>',
' </xsl:template>',
' <xsl:template match="node()|@*">',
' <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>',
' </xsl:template>',
' <xsl:output indent="yes"/>',
'</xsl:stylesheet>',
].join('\n'), 'application/xml');
var xsltProcessor = new XSLTProcessor();
xsltProcessor.importStylesheet(xsltDoc);
var resultDoc = xsltProcessor.transformToDocument(xmlDoc);
var resultXml = new XMLSerializer().serializeToString(resultDoc);
return resultXml;
};
console.log(prettifyXml('<root><node/></root>'));
Outputs:
<root>
<node/>
</root>
JSFiddle
Note, as pointed out by @jat255, pretty printing with <xsl:output indent="yes"/> is not supported by Firefox. It only seems to work in Chrome, Opera and probably the rest of the WebKit-based browsers.
From the text of the question I get the impression that a string result is expected, as opposed to an HTML-formatted result.
If this is so, the simplest way to achieve this is to process the XML document with the identity transformation and with an <xsl:output indent="yes"/> instruction:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applying this transformation on the provided XML document:
<root><node/></root>
most XSLT processors (.NET XslCompiledTransform, Saxon 6.5.4 and Saxon 9.0.0.2, AltovaXML) produce the wanted result:
<root>
<node />
</root>
Found this thread when I had a similar requirement but I simplified OP's code as follows:
function formatXml(xml, tab) { // tab = optional indent value, default is tab (\t)
var formatted = '', indent= '';
tab = tab || '\t';
xml.split(/>\s*</).forEach(function(node) {
if (node.match( /^\/\w/ )) indent = indent.substring(tab.length); // decrease indent by one 'tab'
formatted += indent + '<' + node + '>\r\n';
if (node.match( /^<?\w[^>]*[^\/]$/ )) indent += tab; // increase indent
});
return formatted.substring(1, formatted.length-3);
}
works for me!
Slight modification of efnx clckclcks's javascript function. I changed the formatting from spaces to tab, but most importantly I allowed text to remain on one line:
var formatXml = this.formatXml = function (xml) {
var reg = /(>)\s*(<)(\/*)/g; // updated Mar 30, 2015
var wsexp = / *(.*) +\n/g;
var contexp = /(<.+>)(.+\n)/g;
xml = xml.replace(reg, '$1\n$2$3').replace(wsexp, '$1\n').replace(contexp, '$1\n$2');
var pad = 0;
var formatted = '';
var lines = xml.split('\n');
var indent = 0;
var lastType = 'other';
// 4 types of tags - single, closing, opening, other (text, doctype, comment) - 4*4 = 16 transitions
var transitions = {
'single->single': 0,
'single->closing': -1,
'single->opening': 0,
'single->other': 0,
'closing->single': 0,
'closing->closing': -1,
'closing->opening': 0,
'closing->other': 0,
'opening->single': 1,
'opening->closing': 0,
'opening->opening': 1,
'opening->other': 1,
'other->single': 0,
'other->closing': -1,
'other->opening': 0,
'other->other': 0
};
for (var i = 0; i < lines.length; i++) {
var ln = lines[i];
// Luca Viggiani 2017-07-03: handle optional <?xml ... ?> declaration
if (ln.match(/\s*<\?xml/)) {
formatted += ln + "\n";
continue;
}
// ---
var single = Boolean(ln.match(/<.+\/>/)); // is this line a single tag? ex. <br />
var closing = Boolean(ln.match(/<\/.+>/)); // is this a closing tag? ex. </a>
var opening = Boolean(ln.match(/<[^!].*>/)); // is this even a tag (that's not <!something>)
var type = single ? 'single' : closing ? 'closing' : opening ? 'opening' : 'other';
var fromTo = lastType + '->' + type;
lastType = type;
var padding = '';
indent += transitions[fromTo];
for (var j = 0; j < indent; j++) {
padding += '\t';
}
if (fromTo == 'opening->closing')
formatted = formatted.substr(0, formatted.length - 1) + ln + '\n'; // substr removes line break (\n) from prev loop
else
formatted += padding + ln + '\n';
}
return formatted;
};
Personally, I use google-code-prettify with this function:
prettyPrintOne('<root><node1><root>', 'xml')
Or if you'd just like another js function to do it, I've modified Darin's (a lot):
var formatXml = this.formatXml = function (xml) {
var reg = /(>)(<)(\/*)/g;
var wsexp = / *(.*) +\n/g;
var contexp = /(<.+>)(.+\n)/g;
xml = xml.replace(reg, '$1\n$2$3').replace(wsexp, '$1\n').replace(contexp, '$1\n$2');
var pad = 0;
var formatted = '';
var lines = xml.split('\n');
var indent = 0;
var lastType = 'other';
// 4 types of tags - single, closing, opening, other (text, doctype, comment) - 4*4 = 16 transitions
var transitions = {
'single->single' : 0,
'single->closing' : -1,
'single->opening' : 0,
'single->other' : 0,
'closing->single' : 0,
'closing->closing' : -1,
'closing->opening' : 0,
'closing->other' : 0,
'opening->single' : 1,
'opening->closing' : 0,
'opening->opening' : 1,
'opening->other' : 1,
'other->single' : 0,
'other->closing' : -1,
'other->opening' : 0,
'other->other' : 0
};
for (var i=0; i < lines.length; i++) {
var ln = lines[i];
var single = Boolean(ln.match(/<.+\/>/)); // is this line a single tag? ex. <br />
var closing = Boolean(ln.match(/<\/.+>/)); // is this a closing tag? ex. </a>
var opening = Boolean(ln.match(/<[^!].*>/)); // is this even a tag (that's not <!something>)
var type = single ? 'single' : closing ? 'closing' : opening ? 'opening' : 'other';
var fromTo = lastType + '->' + type;
lastType = type;
var padding = '';
indent += transitions[fromTo];
for (var j = 0; j < indent; j++) {
padding += ' ';
}
formatted += padding + ln + '\n';
}
return formatted;
};
All of the javascript functions given here won't work for an xml document having unspecified white spaces between the end tag '>' and the start tag '<'. To fix them, you just need to replace the first line in the functions
var reg = /(>)(<)(\/*)/g;
by
var reg = /(>)\s*(<)(\/*)/g;
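A quick way to see the difference between the two patterns is to run both on input that already contains whitespace between tags. This sketch only counts the lines each pattern produces. The variable names strict and tolerant are mine.

```javascript
var strict = /(>)(<)(\/*)/g;        // only matches when ">" is directly followed by "<"
var tolerant = /(>)\s*(<)(\/*)/g;   // also swallows whitespace between the two tags

// Input that is already partly formatted with newlines and spaces.
var xml = "<root>\n  <node/>\n</root>";

var strictLines = xml.replace(strict, "$1\r\n$2$3").split("\r\n");
var tolerantLines = xml.replace(tolerant, "$1\r\n$2$3").split("\r\n");

console.log(strictLines.length);    // nothing matched, so the input stays one chunk
console.log(tolerantLines.length);  // one entry per tag
```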
What about creating a stub node (document.createElement('div'), or your library's equivalent), filling it with the XML string (via innerHTML) and calling a simple recursive function on the root element, or on the stub element in case you don't have a root? The function would call itself for all the child nodes.
You could then syntax-highlight along the way, be certain the markup is well-formed (done automatically by the browser when appending via innerHTML), etc. It wouldn't be that much code and would probably be fast enough.
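A rough sketch of such a recursive function follows. It is not a complete serializer: attributes, comments and mixed content are ignored, and the name printNode is made up. It only relies on the nodeType, nodeName, nodeValue and childNodes properties, so it works on DOM nodes as well as on the plain stand-in objects used here for demonstration.

```javascript
// Recursively renders a DOM-like node as indented markup (attributes are ignored).
function printNode(node, indent) {
  indent = indent || "";
  if (node.nodeType === 3) {                        // text node
    var text = node.nodeValue.trim();
    return text ? indent + text + "\n" : "";        // drop whitespace-only text
  }
  if (node.nodeType !== 1) return "";               // skip comments and other node types
  var children = Array.prototype.slice.call(node.childNodes);
  if (children.length === 0) return indent + "<" + node.nodeName + "/>\n";
  var out = indent + "<" + node.nodeName + ">\n";
  children.forEach(function (child) {
    out += printNode(child, indent + "  ");         // one level deeper per recursion
  });
  return out + indent + "</" + node.nodeName + ">\n";
}

// Stand-in tree for <root><node/></root>; in a browser, a real element could be passed instead.
var fakeRoot = {
  nodeType: 1, nodeName: "root",
  childNodes: [{ nodeType: 1, nodeName: "node", childNodes: [] }]
};
console.log(printNode(fakeRoot));
```

Keep in mind that nodes created through innerHTML on an HTML element report upper-case nodeName values, so an XML parse may be the better source if case matters.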
If you are looking for a JavaScript solution just take the code from the Pretty Diff tool at http://prettydiff.com/?m=beautify
You can also send files to the tool using the s parameter, such as:
http://prettydiff.com/?m=beautify&s=https://stackoverflow.com/
You can get pretty formatted xml with xml-beautify
var prettyXmlText = new XmlBeautify().beautify(xmlText,
{indent: " ",useSelfClosingElement: true});
indent:indent pattern like white spaces
useSelfClosingElement: true=>use self-closing element when empty element.
JSFiddle
Original(Before)
<?xml version="1.0" encoding="utf-8"?><example version="2.0">
<head><title>Original aTitle</title></head>
<body info="none" ></body>
</example>
Beautified(After)
<?xml version="1.0" encoding="utf-8"?>
<example version="2.0">
<head>
<title>Original aTitle</title>
</head>
<body info="none" />
</example>
For a current project I had the need to prettify and colorize XML without extra libraries. The following self contained code works quite well.
function formatXml(xml,colorize,indent) {
function esc(s){return s.replace(/[-\/&<> ]/g,function(c){ // Escape special chars
return c==' '?'&nbsp;':'&#'+c.charCodeAt(0)+';';});}
var sm='<div class="xmt">',se='<div class="xel">',sd='<div class="xdt">',
sa='<div class="xat">',tb='<div class="xtb">',tc='<div class="xtc">',
ind=indent||' ',sz='</div>',tz='</div>',re='',is='',ib,ob,at,i;
if (!colorize) sm=se=sd=sa=sz='';
xml.match(/(?<=<).*(?=>)|$/s)[0].split(/>\s*</).forEach(function(nd){
ob=('<'+nd+'>').match(/^(<[!?\/]?)(.*?)([?\/]?>)$/s); // Split outer brackets
ib=ob[2].match(/^(.*?)>(.*)<\/(.*)$/s)||['',ob[2],'']; // Split inner brackets
at=ib[1].match(/^--.*--$|=|('|").*?\1|[^\t\n\f \/>"'=]+/g)||['']; // Split attributes
if (ob[1]=='</') is=is.substring(ind.length); // Decrease indent
re+=tb+tc+esc(is)+tz+tc+sm+esc(ob[1])+sz+se+esc(at[0])+sz;
for (i=1;i<at.length;i++) re+=(at[i]=="="?sm+"="+sz+sd+esc(at[++i]):sa+' '+at[i])+sz;
re+=ib[2]?sm+esc('>')+sz+sd+esc(ib[2])+sz+sm+esc('</')+sz+se+ib[3]+sz:'';
re+=sm+esc(ob[3])+sz+tz+tz;
if (ob[1]+ob[3]+ib[2]=='<>') is+=ind; // Increase indent
});
return re;
}
See https://jsfiddle.net/dkb0La16/
Or just print out the special HTML characters?
Ex: <xmlstuff>&#10;&#09;<node />&#10;</xmlstuff>
&#09; Horizontal tab
&#10; Line feed
XMLSpectrum formats XML, supports attribute indentation and also does syntax-highlighting for XML and any embedded XPath expressions:
XMLSpectrum is an open source project, coded in XSLT 2.0 - so you can run this server-side with a processor such as Saxon-HE (recommended) or client-side using Saxon-CE.
XMLSpectrum is not yet optimised to run in the browser - hence the recommendation to run this server-side.
Here is another function to format XML:
function formatXml(xml){
var out = "";
var tab = " ";
var indent = 0;
var inClosingTag=false;
var dent=function(no){
out += "\n";
for(var i=0; i < no; i++)
out+=tab;
}
for (var i=0; i < xml.length; i++) {
var c = xml.charAt(i);
if(c=='<'){
// handle </
if(xml.charAt(i+1) == '/'){
inClosingTag = true;
dent(--indent);
}
out+=c;
}else if(c=='>'){
out+=c;
// handle />
if(xml.charAt(i-1) == '/'){
out+="\n";
//dent(--indent)
}else{
if(!inClosingTag)
dent(++indent);
else{
out+="\n";
inClosingTag=false;
}
}
}else{
out+=c;
}
}
return out;
}
Xml formatting can be done by parsing the xml, adding or changing text nodes in the dom tree for indentation and then serializing the DOM back to xml.
Please check formatxml function in https://jsonbrowser.sourceforge.io/formatxml.js
You can see the function in action in https://jsonbrowser.sourceforge.io/
under the Xml tab.
Below is the simplified code.
formatxml.js adds error checking, optional removal of comments, indent as a parameter and handles non-space text between parent nodes.
const parser = new DOMParser();
const serializer = new XMLSerializer();
function formatXml(xml) {
let xmlDoc = parser.parseFromString(xml, 'application/xml');
let rootElement = xmlDoc.documentElement;
indentChildren(xmlDoc, rootElement, "\n", "\n ");
xml = serializer.serializeToString(xmlDoc);
return xml;
}
function indentChildren(xmlDoc, node, prevPrefix, prefix) {
let children = node.childNodes;
let i;
let prevChild = null;
let prevChildType = 1;
let child = null;
let childType;
for (i = 0; i < children.length; i++) {
child = children[i];
childType = child.nodeType;
if (childType != 3) {
if (prevChildType == 3) {
// Update prev text node with correct indent
prevChild.nodeValue = prefix;
} else {
// Create and insert text node with correct indent
let textNode = xmlDoc.createTextNode(prefix);
node.insertBefore(textNode, child);
i++;
}
if (childType == 1) {
let isLeaf = child.childNodes.length == 0 || child.childNodes.length == 1 && child.childNodes[0].nodeType != 1;
if (!isLeaf) {
indentChildren(xmlDoc, child, prefix, prefix + " ");
}
}
}
prevChild = child;
prevChildType =childType;
}
if (child != null) {
// Previous level indentation after last child
if (childType == 3) {
child.nodeValue = prevPrefix;
} else {
let textNode = xmlDoc.createTextNode(prevPrefix);
node.append(textNode);
}
}
}
Reference: https://www.w3schools.com/XML/dom_intro.asp
var formatXml = this.formatXml = function (xml) {
var reg = /(>)(<)(\/*)/g;
var wsexp = / *(.*) +\n/g;
var contexp = /(<.+>)(.+\n)/g;
xml = xml.replace(reg, '$1\n$2$3').replace(wsexp, '$1\n').replace(contexp, '$1\n$2');
var pad = 0;
var formatted = '';
var lines = xml.split('\n');
var indent = 0;
var lastType = 'other';
var reg = /(>)\s*(<)(\/*)/g;
xml = xml.replace(/\r|\n/g, ''); //deleting already existing whitespaces
xml = xml.replace(reg, '$1\r\n$2$3');
Use the method above to pretty-print the XML, then put the result into any div with jQuery's text() method. For example, if the id of the div is xmldiv, use:
$("#xmldiv").text(formatXml(yourXmlString));
You could also use Saxon-JS client-side:
<script src="SaxonJS/SaxonJS2.js"></script>
<script>
let myXML = `<root><node/></root>`;
SaxonJS.getResource({
text: myXML.replace(`xml:space="preserve"`, ''),
type: "xml"
}).then(doc => {
const output = SaxonJS.serialize(doc, {method: "xml", indent: true, "omit-xml-declaration":true});
console.log(output);
})
</script>
Saxon-JS Installation client-side
Saxon-JS Download page
This may involve creating nodes as objects, but you can have total control over exporting pretty formatted xml.
The following will return a string array of the lines which you can join with a new line delimiter "\n".
/**
* The child of an XML node can be raw text or another xml node.
*/
export type PossibleNode = XmlNode | string;
/**
* Base XML Node type.
*/
export interface XmlNode {
tag: string;
attrs?: { [key: string]: string };
children?: PossibleNode[];
}
/**
* Exports the given XML node to a string array.
*
* @param node XML Node
* @param autoClose Auto close the tag
* @param indent Indentation level
* @returns String array
*/
export function xmlNodeToString(
node: XmlNode,
autoClose: boolean = true,
indent: number = 0
): string[] {
const indentStr = " ".repeat(indent);
const sb: string[] = [];
sb.push(`${indentStr}<${node.tag}`);
if (node.attrs) {
for (const key in node.attrs) {
sb.push(`${indentStr} ${key}="${node.attrs[key]}"`);
}
}
if (node.children) {
if (node.children.length === 1 && typeof node.children[0] === "string") {
sb[sb.length - 1] += ">" + node.children[0];
} else {
sb.push(`${indentStr}>`);
for (const child of node.children) {
if (typeof child === "string") {
sb.push(`${indentStr} ${child}`);
} else {
const lines = xmlNodeToString(child, autoClose, indent + 1);
sb.push(...lines.map((line) => `${indentStr} ${line}`));
}
}
}
if (autoClose) {
if (node.children.length === 1 && typeof node.children[0] === "string") {
sb[sb.length - 1] += `</${node.tag}>`;
} else {
sb.push(`${indentStr}</${node.tag}>`);
}
}
} else {
if (autoClose) {
sb.push(`${indentStr}/>`);
} else {
sb.push(`${indentStr}>`);
}
}
return sb;
}
Updates appreciated on the gist: https://gist.github.com/rodydavis/acd609560ab0416b60681fddabc43eee
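As a usage sketch for the node-object approach, here is a condensed plain-JavaScript variant. The names nodeToLines and escapeXml are hypothetical and not part of the gist; unlike the function above, this version keeps attributes on the opening line and escapes special characters, which is an assumption about what you would want rather than the gist's behavior.

```javascript
// Escape the five characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical condensed variant of xmlNodeToString: returns an array of lines.
function nodeToLines(node, indent = 0) {
  const pad = '  '.repeat(indent);
  const attrs = Object.entries(node.attrs || {})
    .map(([key, value]) => ` ${key}="${escapeXml(value)}"`)
    .join('');
  const children = node.children || [];
  // No children: emit a self-closing tag.
  if (children.length === 0) {
    return [`${pad}<${node.tag}${attrs}/>`];
  }
  // A single text child stays on the same line as its tags.
  if (children.length === 1 && typeof children[0] === 'string') {
    return [`${pad}<${node.tag}${attrs}>${escapeXml(children[0])}</${node.tag}>`];
  }
  const lines = [`${pad}<${node.tag}${attrs}>`];
  for (const child of children) {
    if (typeof child === 'string') {
      lines.push(`${pad}  ${escapeXml(child)}`);
    } else {
      lines.push(...nodeToLines(child, indent + 1));
    }
  }
  lines.push(`${pad}</${node.tag}>`);
  return lines;
}

// Build a small tree and join the lines with "\n".
const tree = {
  tag: 'catalog',
  attrs: { name: 'chalk & cheese' },
  children: [
    { tag: 'item', children: ['first'] },
    { tag: 'empty' },
  ],
};
console.log(nodeToLines(tree).join('\n'));
```

Returning an array of lines rather than one string lets the caller choose the line delimiter, as the answer above suggests.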
The Xml-to-json library has a formatXml(xml) method. I am the maintainer of the project.
var prettyXml = formatXml("<a><b/></a>");
// <a>
// <b/>
// </a>
This is my version, which may be useful for others, using StringBuilder.
Saw that someone had the same piece of code.
public String FormatXml(String xml, String tab)
{
var sb = new StringBuilder();
int indent = 0;
// find all elements
foreach (string node in Regex.Split(xml,@">\s*<"))
{
// if at end, lower indent
if (Regex.IsMatch(node, @"^\/\w")) indent--;
sb.AppendLine(String.Format("{0}<{1}>", string.Concat(Enumerable.Repeat(tab, indent).ToArray()), node));
// if at start, increase indent
if (Regex.IsMatch(node, @"^<?\w[^>]*[^\/]$")) indent++;
}
// correct first < and last > from the output
String result = sb.ToString().Substring(1);
return result.Remove(result.Length - Environment.NewLine.Length-1);
}
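Since the question is about JavaScript, the same split-on-tag-boundaries idea can be sketched in JavaScript. This is an illustrative port, not the author's code: the name formatXmlBySplitting is hypothetical, and it inherits the limitations of the regex approach (for example, the indent-increase pattern needs at least two characters, so single-letter tag names are not indented, and comments or CDATA are not handled).

```javascript
// Hypothetical JavaScript port of the regex-splitting approach shown above.
function formatXmlBySplitting(xml, tab) {
  let indent = 0;
  const lines = [];
  // Split between adjacent tags; each piece has lost its ">" and "<" boundary.
  for (const node of xml.trim().split(/>\s*</)) {
    // A closing tag lowers the indent before it is written.
    if (/^\/\w/.test(node)) indent--;
    lines.push(tab.repeat(Math.max(indent, 0)) + '<' + node + '>');
    // An opening tag that is not self-closing and has no inline text raises it.
    if (/^<?\w[^>]*[^\/]$/.test(node)) indent++;
  }
  // Drop the extra "<" added to the first piece and the extra ">" on the last.
  return lines.join('\n').slice(1, -1);
}

const sample =
  '<root><item>one</item><empty/><group><leaf>x</leaf></group></root>';
console.log(formatXmlBySplitting(sample, '  '));
```

Elements that contain only text stay on one line because the split only happens where one tag directly follows another.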
