Javascript encode HTML entities on server - javascript

In my server application (on Parse Cloud Code), I want to save some string data. It contains HTML entities, which I want to encode.
So I found a solution in JavaScript:
var txt = document.createElement("textarea");
txt.innerHTML = html;
return txt.value;
This code works perfectly on HTML pages, where document exists. But there is no such variable on the server.
How can I declare a document variable? Or maybe you know another solution for encoding HTML entities?

You could use html-entities on Node. Install it with:
npm install html-entities
Then you get entities.encode(..) and entities.decode(..) functions:
var Entities = require('html-entities').XmlEntities;
var entities = new Entities();
console.log(entities.encode('<>"\'&©®')); // &lt;&gt;&quot;&apos;&amp;©®
There are more examples in the usage section of the GitHub repo.

function encode(r) {
  return r.replace(/[\x26\x0A<>'"]/g, function (c) { return "&#" + c.charCodeAt(0) + ";"; });
}
test.value = encode('How to encode\nonly html tags &<>\'" nice & fast!');
/*************
* \x26 is &ampersand (it has to be first),
* \x0A is newline,
*************/
<textarea id=test rows=11 cols=55>www.WHAK.com</textarea>
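On a server without document (Node.js, or Parse Cloud Code as in the question), the same idea can be written as a plain function with no DOM access. The following is only a minimal sketch under stated assumptions, not part of either answer: the names escapeHtml and unescapeHtml are hypothetical, and only the five characters that are significant in HTML are handled.

```javascript
// Hypothetical helper names; only the five HTML-significant characters are handled.
var ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
var UNESCAPES = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };

// Replace each special character with its entity in a single pass,
// so an ampersand produced by one replacement is never escaped again.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, function (ch) { return ESCAPES[ch]; });
}

// Reverse of escapeHtml, for the same five entities only.
function unescapeHtml(text) {
  return String(text).replace(/&(?:amp|lt|gt|quot|#39);/g, function (entity) {
    return UNESCAPES[entity];
  });
}

console.log(escapeHtml('<a href="x">Tom & Jerry\'s</a>'));
```

Note that the textarea snippet in the question turns entities back into characters, which corresponds to unescapeHtml here rather than escapeHtml. A full entity table (such as the one shipped with html-entities) is needed if named entities beyond these five can appear.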

Since I asked this question, I have learned JavaScript and AJAX. So my suggestion is to use AJAX and JSON for communication between the browser and the server side.

Related

How can I escape a string to ensure that it is a valid string literal in JS source?

I have a Qt application that embeds a web browser (QWebEngineView). I would like to call a javascript function with a string argument from the C++ application. The means of doing this is calling
page()->runJavaScript("setContent(\"hello\");");
This works in simple cases. However, if I try and load, say, a C++ source file and use that as the parameter of setContent, this will break, because I can't simply assemble the string like this:
auto js = QString("setContent(\"%1\");").arg(fileStr);
I tried the following:
fileStr = fileStr.replace('"', "\\\"");
fileStr = fileStr.replace("\n", "\\n");
But apparently this does not escape the string correctly: I get an error when I run this JavaScript. How can I universally escape a long string with newlines and possible special characters so that I can construct a valid JS fragment like this?
So, after some research, I came across QWebChannel which is meant for bi-directional communication between the application and the hosted webpage. The imported qwebchannel.js in the examples can be found here. From there, this is what I did:
In C++:
auto channel = new QWebChannel(this);
page()->setWebChannel(channel);
channel->registerObject("doc", Doc);
In HTML/JS:
new QWebChannel(qt.webChannelTransport,
  function(channel) {
    var doc = channel.objects.doc; // this is "doc" from the registerObject call
    editor.setValue(doc.text);
    doc.textChanged.connect(updateText); // textChanged is a signal of the class of doc.
  }
);
So, even though this does not directly answer the question, what is presented here can be used to achieve the same effect.
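For the literal question (building a valid string literal), one common approach is to serialize the string as JSON, because a JSON string is also a valid JavaScript string literal. The two replace calls in the question leave backslashes and carriage returns untouched, and a C++ source file contains plenty of backslashes. The sketch below shows the encoding in JavaScript only; toJsStringLiteral is a hypothetical name, and producing the same encoding on the C++ side is assumed rather than shown.

```javascript
// Hypothetical helper: turn any string into a JavaScript string literal.
function toJsStringLiteral(text) {
  // JSON.stringify adds the surrounding quotes and escapes backslashes,
  // double quotes, newlines and other control characters.
  return JSON.stringify(String(text))
    // U+2028 and U+2029 count as line terminators in engines older than ES2019.
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// A small stand-in for the contents of a C++ source file.
var fileStr = '#include <cstdio>\nint main() { printf("hi\\n"); }\r\n';
var js = 'setContent(' + toJsStringLiteral(fileStr) + ');';
console.log(js);
```

The resulting fragment can be evaluated as is, and setContent receives the original text unchanged, including quotes, backslashes and line breaks.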

Sending Javascript code to client-side is not escaped correctly

I'm struggling with an encoding problem in a small system I'm building.
In an HTML page, this script is loaded:
<script src="http://localhost:8000/serving/dk-711218"></script>
and normally I can't access the HTML so everything has to be done inside the javascript file.
The server-side scripts are made in Node.js and it returns pieces of code depending on some settings in customizable XML files. For instance, when displaying an image the system returns a simple
<img src="mypicture.jpg" />
and if it's a text, it returns
<div class="myClass">This is a test</div>
and if they have special behaviors, this code is included as well.
These parts work as intended. These chunks of code reside inside a number of classes and are returned as needed, so that the code is built up gradually.
So far, so good.
The problem is returning the SWFobject library code, because it seems to corrupt the code on the fly.
All code has been escaped and encoded with encodeURIComponent so that the system just needs to decode and unescape. But the validation fails.
Here's an example of the first few lines before encoding/escaping:
var%2520swfobject%253Dfunction%2528%2529%257Bvar...
Here's what a piece of the SWFObject looks like in the Firefox source code window when accessing the page:
and here's what a piece of the decoded SWFObject looks like in the same window:
This occurs in several places, and what these occurrences have in common is that the less-than character seems, for unknown reasons, to be interpreted differently.
Here's the view renderer; I can't figure out whether the problem is caused in the code or when rendering the code.
Any ideas about what's causing this behavior? Or perhaps some advice on best practice when including code this way?
Responses to comments:
try JSON.stringify
I've tried the JSON solution out as well and it does the trick!
What I did was - as before - to pre-process the included code, using a little tool I built with two input-fields and a JSON.stringify-command between the two. This resulted in the content of returnvar:
Module.prototype.opens = function () {
var returnvar = "var swfobject=function(){var D=\"undefined\",r=\"object\",S=\"Shockwave Flash\",W=\"ShockwaveFlash.ShockwaveFlash\",q=\"application/x-shockwave-flash\",R=\"SWFObjectExprInst\"... etc.
and a JSON.parse is used to correct it again in the renderer:
router.get('/serving/:id', function (req, res) {
  var readSymbolicsXMLCallback = function(data) {
    res.render('index', {
      id: req.params.id,
      embedcode: JSON.parse(data)
    });
  }
  var embedcode = readSymbolicsXML(req.params.id, readSymbolicsXMLCallback);
});
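A sequence such as %2520 in the encoded sample above suggests the text was percent-encoded twice, since %25 is itself the encoding of the percent sign. The sketch below uses a made-up source string (not the real SWFObject code) to show why a single decode is then not enough, and how the JSON.stringify / JSON.parse pair used in the accepted approach round-trips the same text in one step each way.

```javascript
// Made-up stand-in for a piece of library code containing quotes and "<".
var source = 'var swfobject=function(){var D="undefined";if(a<b){}}';

// Encoding twice turns every "%" into "%25", so a space becomes "%2520".
var once = encodeURIComponent(source);
var twice = encodeURIComponent(once);

// One decode of doubly encoded text still leaves percent escapes behind.
console.log(decodeURIComponent(twice) === once);                       // true
console.log(decodeURIComponent(decodeURIComponent(twice)) === source); // true

// JSON.stringify / JSON.parse restores the exact text in one step each way.
var stored = JSON.stringify(source);
console.log(JSON.parse(stored) === source);                            // true
```

Whether the view engine additionally HTML-escapes the value when rendering is a separate matter that this sketch does not cover.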

Remove lines from CSS that contain references to external files

I'm working on a simple webmail script in php. The content of a message body is retrieved using jQuery which gets the content returned from a php script. For example:
$.get("file.php", function(data) { /* Data is the message content */ });
From here, I'm then writing the string in data to the document of an iFrame. I want to make sure that the content returned is sanitized and one step to this is removing all references to external files, particularly remote files accessed over http. For example, javascript files or images on a server somewhere. It's important to do this because not only may external scripts try to manipulate my page, external images may be running through a dynamic engine like php and confirming to spammers that my email address is active and able to receive mail, and some images can apparently contain viruses.
The following script can remove a lot of things that may be hazardous:
function sanitize(str) {
  var html = $(str);
  var evil = ["head","base","link","script","img","object","embed","video","audio","iframe"];
  // "var e" keeps the loop counter from leaking into the global scope
  for (var e = 0; e < evil.length; e++) { html.find(evil[e]).remove(); }
  var result = html.wrap("<div>").parent().html();
  return result;
}
But my question is this: how can I remove a line of CSS that contains a reference to an external file? For example, if the message body content contained a <style> tag and inside it was this:
background-image: url(http://some/dodgy/server/image.jpg);
how would I remove that line from the string?
This has not been tested, but you can try something like:
str = str.replace(/background-image:\s*url\([^)]*\)\s*;?\s*/ig, "");
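The same idea can be generalized so it is not tied to background-image. The sketch below removes any CSS declaration whose value contains a url(...) pointing at an http:, https: or protocol-relative address, and leaves relative URLs alone. stripRemoteUrls is a hypothetical name, and this is an illustration only: a regular expression is not a complete CSS sanitizer, and constructs such as @import are not handled.

```javascript
// Hypothetical helper: drop every CSS declaration whose value contains url(...)
// pointing at an http:, https: or protocol-relative (//host/...) address.
function stripRemoteUrls(css) {
  var remoteDeclaration =
    /[\w-]+\s*:[^;{}]*url\(\s*['"]?\s*(?:https?:)?\/\/[^)]*\)[^;{}]*;?/gi;
  return css.replace(remoteDeclaration, '');
}

var css = 'p { color: red; background-image: url(http://some/dodgy/server/image.jpg); margin: 0; }';
console.log(stripRemoteUrls(css));
```

Declarations before and after the removed one are kept, so the rest of the rule still applies.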

WP8 App misbehaving due to StreamWrite in JavaScript

I would like to save the results calculated on html page in a textfile using javascript.
<script type="text/javascript">
window.onload = function () {
var sw : StreamWriter = new StreamWriter("HTML_Results.txt");
sr.Write('xyz");
*** calculations ******
sr.Write (result);
}
</script>
When I do this, my WP8 app misbehaves and does not display images as usual. The app is an Image Fader (it calculates FPS).
Also tried:
StreamWriter sr;
try {
sr = new StreamWriter("\HTML5\HTMLResults.txt");
sr.Write("xyz");
File.SetAttributes("HTML5\HTMLResults.txt", FileAttributes.Hidden);
} catch(IOException ex) {
console.write ("error writing"); //handling IO
}
The aim is to:
Extract the calculated values of several HTML pages (after they are loaded one by one) into a single text file.
Have a resulting HTML page that reads this text file and displays the results in a tabular format.
Is there a better way to do this job, or can the above be rectified and used? I'd appreciate any help.
Perhaps I've misunderstood your code, but it looks like you're trying to write C#/.NET code (StreamWriter and File.SetAttributes are .NET classes, not JavaScript) inside JavaScript script tags. You cannot do that in an HTML document. As far as I know, client-side JavaScript (which, given your <script> tags, is I guess what you're trying to write) can't perform the kind of file I/O operations you seem to want here.
To do that kind of file I/O in JavaScript you need Node.js, and then you're talking server-side. The closest you can get on the client side is the localStorage feature in HTML5 (not supported by all browsers).
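As a rough illustration of the localStorage suggestion: each page could append its calculated value under a shared key, and a results page on the same origin could read the list back and build a table from it. The helper names and the key "fpsResults" are hypothetical, and the in-memory stand-in exists only so the sketch also runs outside a browser.

```javascript
// Hypothetical helpers; "storage" is any object with getItem/setItem,
// such as window.localStorage in a browser.
function appendResult(storage, key, result) {
  var existing = JSON.parse(storage.getItem(key) || '[]');
  existing.push(result);
  storage.setItem(key, JSON.stringify(existing));
}

function readResults(storage, key) {
  return JSON.parse(storage.getItem(key) || '[]');
}

// Stand-in with the same two methods, used here instead of a real browser store.
function createMemoryStorage() {
  var data = {};
  return {
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    },
    setItem: function (k, v) { data[k] = String(v); }
  };
}

// In a browser, pass window.localStorage instead of the stand-in.
var storage = createMemoryStorage();
appendResult(storage, 'fpsResults', { page: 'fader1.html', fps: 58 });
appendResult(storage, 'fpsResults', { page: 'fader2.html', fps: 47 });
console.log(readResults(storage, 'fpsResults').length); // 2
```

The values are stored as JSON text because localStorage only holds strings.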

Using javascript to rename multiple HTML files using the <TITLE></TITLE> in each file

I have used HTTRACK to download Federal regulations from a government website, and the resulting HTML files are not intuitively named. Each file has a <TITLE></TITLE> tag set, that would serve nicely to name each file in a fashion that will lend itself to ebook creation. I want to turn these regulations into an ebook for my Kindle, so that I can have the regulations readily available for reference, rather than having to carry volumes of books with me everywhere.
My preferred text/hex editor, UltraEdit Professional 15.20.0.1026, has scripting commands enabled through an embedded JavaScript engine. In researching possible solutions to my problem, I found xmlTitleSave on the IDM UltraEdit website.
// ----------------------------------------------------------------------------
// Script Name: xmlTitleSave.js
// Creation Date: 2008-06-09
// Last Modified:
// Copyright: none
// Purpose: find the <title> value in an XML document, then saves the file as the
// title.xml in a user-specified directory
// ----------------------------------------------------------------------------
//Some variables we need
var regex = "<title>(.*)</title>" //Perl regular expression to find title string
var file_path = UltraEdit.getString("Path to save file at? !! MUST PRE EXIST !!",1);
// Start at the beginning of the file
UltraEdit.activeDocument.top();
UltraEdit.activeDocument.unicodeToASCII();
// Turn on regular expressions
UltraEdit.activeDocument.findReplace.regExp = true;
// Find it
UltraEdit.activeDocument.findReplace.find(regex);
// Load it into a selection
var titl = UltraEdit.activeDocument.selection;
// Javascript function 'match' will match the regex within the javascript engine
// so we can extract the actual title via array
t = titl.match(regex);
// 't' is an array of the match from 'titl' based on the var 'regex'
// the 2nd value of the array gives us what we need... then append '.xml'
saveTitle = t[1]+".xml";
UltraEdit.saveAs(file_path + saveTitle);
// Uncomment for debugging
// UltraEdit.outputWindow.write("titl = " + titl);
// UltraEdit.outputWindow.write("t = " + t);
My question is two-fold:
Can this JavaScript be modified to extract the <TITLE></TITLE> contents from an HTML file and rename the files?
If the JavaScript cannot be modified easily, is there a script/program/black magic/animal sacrifice that can accomplish the same thing?
EDIT:
I have been able to get the script to work as desired by removing the UltraEdit.activeDocument.unicodeToASCII(); line and changing the file extension to .html. My only issue now is that while this script works on single open files, it does not batch process the directory.
You can use just about any "scriptable" language to do something like this pretty quickly. Ruby is my favorite:
require 'fileutils'
dir = "/your/directory"
files = Dir["#{dir}/*.html"]
files.each do |file|
  html = IO.read(file)
  next unless html =~ /<title>([^<]+)<\/title>/i # skip files without a title
  title = $1
  FileUtils.mv file, "#{dir}/#{title}.html"
  puts "Renamed #{file} to #{title}.html."
end
Obviously if your UltraEdit script worked for you this might be obtuse, but for anybody running a different env, hopefully this is useful.
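Since the question asks about JavaScript, roughly the same batch rename can be sketched for Node.js. This is an illustration under assumptions rather than a tested tool: renameByTitle is a hypothetical name, files are assumed to be UTF-8, and replacing characters that Windows does not allow in file names with an underscore is a design choice that the Ruby version does not make.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: rename every .html file in dir after its <title> text.
function renameByTitle(dir) {
  var renamed = [];
  fs.readdirSync(dir).forEach(function (name) {
    if (path.extname(name).toLowerCase() !== '.html') return;
    var file = path.join(dir, name);
    var match = /<title>([^<]+)<\/title>/i.exec(fs.readFileSync(file, 'utf8'));
    if (!match) return; // no usable title: leave the file alone
    // Assumption: replace characters that are invalid in Windows file names.
    var title = match[1].trim().replace(/[\\\/:*?"<>|]/g, '_');
    if (!title) return;
    var target = path.join(dir, title + '.html');
    if (target === file || fs.existsSync(target)) return; // never overwrite
    fs.renameSync(file, target);
    renamed.push(target);
  });
  return renamed;
}
```

Calling renameByTitle("/your/directory") returns the list of new paths. Files without a title, and files whose target name already exists, are left untouched.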
Does this not work out of the box?
I don't know anything about UltraEdit, but as far as a regex engine is concerned, if it can parse <title>(.*)</title> out of an XML document, it can do the exact same for HTML.
Just modify the final file title to .html instead of .xml
saveTitle = t[1]+".html";
Assuming you can get that script to work as it's intended (point being I don't know UltraEdit), I'm pretty confident that same process will work for HTML.
XML and HTML are both plain text, and that script is simply running a regular expression on the text to extract the title tags, which are the same in both; the only thing you need to do is change this line:
saveTitle = t[1]+".xml";
to this:
saveTitle = t[1]+".html";
After much searching and trial and error on the scripting side, I ran across a fantastic program for Windows that does the renaming via TITLE tags: Flexible Renamer 8.3. The author's website is http://hp.vector.co.jp/authors/VA014830/english/FlexRena/, and it handles every bit of what I needed. Many thanks to @coreyward and @Yuji for their fantastic advice on the scripting end of things.
