Html Agility Pack messing with my javascript

I'm using the Html Agility Pack to output some JavaScript in the head of my document. But after saving the document to the file system I noticed that the JavaScript source had been modified. I guess this happens because HAP is trying to validate my script. Is it possible to prevent this? As you can see below, I have already tried setting different options.
My code using HAP:
var htmlDoc = new HtmlDocument();
htmlDoc.OptionCheckSyntax = false;
htmlDoc.OptionAutoCloseOnEnd = false;
htmlDoc.OptionFixNestedTags = false;
htmlDoc.LoadHtml(htmlContent);
HtmlNode headNode = htmlDoc.DocumentNode.SelectSingleNode("//head");
headNode.AddScriptNode(htmlDoc, "../../Scripts/jquery-1.7.1.min.js");
Extension Method for adding the script tag
public static void AddScriptNode(this HtmlNode headNode, HtmlDocument htmlDoc, string filePath)
{
string content = "";
using (StreamReader rdr = File.OpenText(filePath))
{
content = rdr.ReadToEnd();
}
if(headNode != null)
{
HtmlNode scripts = htmlDoc.CreateElement("script");
scripts.Attributes.Add("type", "text/javascript");
scripts.InnerHtml = "\n" + content + "\n";
headNode.AppendChild(scripts);
}
}

My assumption: when you set scripts.InnerHtml, the Agility Pack parses the content as HTML, so anything in the script that looks like a tag is converted into HTML nodes.
To avoid this, set the content of the <script> as text. Unfortunately, the HtmlNode.InnerText property is read-only, but there is a workaround: add a text node (a comment node is preferable) to your <script> node:
if(headNode != null)
{
HtmlNode scripts = htmlDoc.CreateElement("script");
scripts.Attributes.Add("type", "text/javascript");
scripts.AppendChild(htmlDoc.CreateComment("\n" + content + "\n"));
headNode.AppendChild(scripts);
}
With this, the body of your script is added as a comment node, so it is wrapped in <!-- and -->.

Related

nodejs block html in socket.io messages [duplicate]

Is there an easy way to take a string of html in JavaScript and strip out the html?

If you're running in a browser, then the easiest way is just to let the browser do it for you...
function stripHtml(html)
{
let tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}
Note: as folks have noted in the comments, this is best avoided if you don't control the source of the HTML (for example, don't run this on anything that could've come from user input). For those scenarios, you can still let the browser do the work for you - see Saba's answer on using the now widely-available DOMParser.

myString.replace(/<[^>]*>?/gm, '');

Simplest way:
jQuery(html).text();
That retrieves all the text from a string of html.

I would like to share an edited version of Shog9's accepted answer.
As Mike Samuel pointed out in a comment, that function can execute inline JavaScript code.
But Shog9 is right when saying "let the browser do it for you..."
So here is my edited version, using DOMParser:
function strip(html){
let doc = new DOMParser().parseFromString(html, 'text/html');
return doc.body.textContent || "";
}
Here is the code to test the inline JavaScript:
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
Also, it does not request resources (like images) on parse:
strip("Just text <img src='https://assets.rbl.ms/4155638/980x.jpg'>")

As an extension to the jQuery method: if your string might not contain HTML (e.g. if you are trying to remove HTML from a form field),
jQuery(html).text();
will return an empty string if there is no HTML.
Use:
jQuery('<p>' + html + '</p>').text();
instead.
Update:
As has been pointed out in the comments, in some circumstances this solution will execute JavaScript contained within html. If the value of html could be influenced by an attacker, use a different solution.

Converting HTML for Plain Text emailing keeping hyperlinks (a href) intact
The above function posted by hypoxide works fine, but I was after something that would convert HTML created in a web rich-text editor (for example FCKEditor) and clear out all the HTML but keep all the links, because I wanted both an HTML and a plain-text version in order to build the correct parts of an SMTP email.
After a long time searching Google, my colleagues and I came up with this, using the regex engine in JavaScript:
str='this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>';
str=str.replace(/<br>/gi, "\n");
str=str.replace(/<p\b[^>]*>/gi, "\n");
str=str.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
str=str.replace(/<(?:.|\s)*?>/g, "");
the str variable starts out like this:
this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>
and then after the code has run it looks like this:-
this string has html code i want to remove
Link Number 1 -> BBC (Link->http://www.bbc.co.uk) Link Number 1
Now back to normal text and stuff
As you can see, all the HTML has been removed and the link has been preserved, with the hyperlinked text still intact. I have also replaced the <p> and <br> tags with \n (newline character) so that some visual formatting is retained.
To change the link format (e.g. BBC (Link->http://www.bbc.co.uk)), just edit the $2 (Link->$1) replacement string, where $1 is the href URL/URI and $2 is the hyperlinked text. With the links directly in the body of the plain text, most mail clients convert them so that the user can click on them.
Hope you find this useful.
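The $1/$2 substitution described above can be tried in isolation. This is only a minimal sketch: the helper name linksToText and the sample string are made up for illustration, and the pattern uses [^>]* rather than the greedier .* from the answer so that two links on one line are handled separately.

```javascript
// Hypothetical helper: rewrite <a href="url">text</a> as "text (Link->url)".
// $1 is the captured href value, $2 is the captured link text.
function linksToText(html) {
  return html.replace(
    /<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)<\/a>/gi,
    "$2 (Link->$1)"
  );
}

// Made-up sample input with two links on the same line.
const sample =
  'See <a href="http://www.bbc.co.uk">BBC</a> and ' +
  '<a class="x" href="http://example.com/">Example</a>.';

console.log(linksToText(sample));
```

Changing the second argument of replace is all that is needed to get a different link format, for example "$2 <$1>".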

An improvement to the accepted answer.
function strip(html)
{
var tmp = document.implementation.createHTMLDocument("New").body;
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}
This way something running like this will do no harm:
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
Firefox, Chromium and Explorer 9+ are safe.
Opera Presto is still vulnerable.
Also images mentioned in the strings are not downloaded in Chromium and Firefox saving http requests.

This should do the work on any Javascript environment (NodeJS included).
const text = `
<html lang="en">
<head>
<style type="text/css">*{color:red}</style>
<script>alert('hello')</script>
</head>
<body><b>This is some text</b><br/></body>
</html>`;
// Remove style tags and content (the result is kept in a new variable)
const stripped = text.replace(/<style[^>]*>.*<\/style>/gm, '')
// Remove script tags and content
.replace(/<script[^>]*>.*<\/script>/gm, '')
// Remove all opening, closing and orphan HTML tags
.replace(/<[^>]+>/gm, '')
// Remove leading spaces and repeated CR/LF
.replace(/([\r\n]+ +)+/gm, '');

I altered Jibberboy2000's answer to include several <BR /> tag formats, remove everything inside <SCRIPT> and <STYLE> tags, format the resulting HTML by removing multiple line breaks and spaces and convert some HTML-encoded code into normal. After some testing it appears that you can convert most of full web pages into simple text where page title and content are retained.
In the simple example,
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<!--comment-->
<head>
<title>This is my title</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style>
body {margin-top: 15px;}
a { color: #D80C1F; font-weight:bold; text-decoration:none; }
</style>
</head>
<body>
<center>
This string has <i>html</i> code i want to <b>remove</b><br>
In this line <a href="http://www.bbc.co.uk">BBC</a> with link is mentioned.<br/>Now back to "normal text" and stuff using <html encoding>
</center>
</body>
</html>
becomes
This is my title
This string has html code i want to remove
In this line BBC (http://www.bbc.co.uk) with link is mentioned.
Now back to "normal text" and stuff using
The JavaScript function and test page look like this:
function convertHtmlToText() {
var inputText = document.getElementById("input").value;
var returnText = "" + inputText;
//-- remove BR tags and replace them with line break
returnText=returnText.replace(/<br>/gi, "\n");
returnText=returnText.replace(/<br\s\/>/gi, "\n");
returnText=returnText.replace(/<br\/>/gi, "\n");
//-- remove P and A tags but preserve what's inside of them
returnText=returnText.replace(/<p\b[^>]*>/gi, "\n");
returnText=returnText.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 ($1)");
//-- remove all inside SCRIPT and STYLE tags
returnText=returnText.replace(/<script.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/script>/gi, "");
returnText=returnText.replace(/<style.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/style>/gi, "");
//-- remove all else
returnText=returnText.replace(/<(?:.|\s)*?>/g, "");
//-- get rid of more than 2 multiple line breaks:
returnText=returnText.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/gim, "\n\n");
//-- get rid of more than 2 spaces:
returnText = returnText.replace(/ +(?= )/g,'');
//-- get rid of html-encoded characters:
returnText=returnText.replace(/&nbsp;/gi," ");
returnText=returnText.replace(/&amp;/gi,"&");
returnText=returnText.replace(/&quot;/gi,'"');
returnText=returnText.replace(/&lt;/gi,'<');
returnText=returnText.replace(/&gt;/gi,'>');
//-- return
document.getElementById("output").value = returnText;
}
It was used with this HTML:
<textarea id="input" style="width: 400px; height: 300px;"></textarea><br />
<button onclick="convertHtmlToText()">CONVERT</button><br />
<textarea id="output" style="width: 400px; height: 300px;"></textarea><br />

var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
This is a regex version, which is more resilient to malformed HTML, like:
Unclosed tags
Some text <img
"<", ">" inside tag attributes
Some text <img alt="x > y">
Newlines
Some <a
href="http://google.com">
The code
var html = '<br>This <img alt="a>b" \r\n src="a_b.gif" />is > \nmy<>< > <a>"text"</a'
var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
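To see what the quoted-attribute alternatives buy you, here is a small side-by-side sketch. The function names stripNaive and stripQuoteAware are made up for this comparison; the second pattern is the one from the answer, the first is the simpler "everything up to the next >" form.

```javascript
// Naive pattern: treats the first ">" after "<" as the end of the tag.
function stripNaive(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Pattern from the answer: a quoted attribute value is consumed as one unit,
// and an unclosed tag is allowed to run to the end of the string.
function stripQuoteAware(html) {
  return html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
}

// Two of the malformed cases listed above.
const inputs = ['Some text <img alt="x > y">', "Some text <img"];

for (const input of inputs) {
  console.log(
    JSON.stringify(stripNaive(input)),
    JSON.stringify(stripQuoteAware(input))
  );
}
```

With the first input the naive version stops at the ">" inside the alt attribute and leaves part of the tag behind; with the second it leaves the unclosed tag untouched. The quote-aware version removes the tag in both cases.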

from CSS tricks:
https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/
const originalString = `
<div>
<p>Hey that's <span>somthing</span></p>
</div>
`;
const strippedString = originalString.replace(/(<([^>]+)>)/gi, "");
console.log(strippedString);

Another, admittedly less elegant solution than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.
var bodyContent = document.getElementsByTagName('body')[0];
var result = appendTextNodes(bodyContent);
function appendTextNodes(element) {
var text = '';
// Loop through the childNodes of the passed in element
for (var i = 0, len = element.childNodes.length; i < len; i++) {
// Get a reference to the current child
var node = element.childNodes[i];
// Append the node's value if it's a text node
if (node.nodeType == 3) {
text += node.nodeValue;
}
// Recurse through the node's children, if there are any
if (node.childNodes.length > 0) {
text += appendTextNodes(node);
}
}
// Return the final result
return text;
}

If you want to keep the links and the structure of the content (h1, h2, etc.), then you should check out TextVersionJS. You can use it with any HTML, although it was created to convert an HTML email to plain text.
The usage is very simple. For example in node.js:
var createTextVersion = require("textversionjs");
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
Or in the browser with pure js:
<script src="textversion.js"></script>
<script>
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
</script>
It also works with require.js:
define(["textversionjs"], function(createTextVersion) {
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
});

const htmlParser= new DOMParser().parseFromString("<h6>User<p>name</p></h6>" , 'text/html');
const textString= htmlParser.body.textContent;
console.log(textString)

A lot of people have answered this already, but I thought it might be useful to share the function I wrote that strips HTML tags from a string but allows you to include an array of tags that you do not want stripped. It's pretty short and has been working nicely for me.
function removeTags(string, array){
return array ? string.split("<").filter(function(val){ return f(array, val); }).map(function(val){ return f(array, val); }).join("") : string.split("<").map(function(d){ return d.split(">").pop(); }).join("");
function f(array, value){
return array.map(function(d){ return value.includes(d + ">"); }).indexOf(true) != -1 ? "<" + value : value.split(">")[1];
}
}
var x = "<span><i>Hello</i> <b>world</b>!</span>";
console.log(removeTags(x)); // Hello world!
console.log(removeTags(x, ["span", "i"])); // <span><i>Hello</i> world!</span>

For easier solution, try this => https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");

It is also possible to use the fantastic htmlparser2 pure JS HTML parser. Here is a working demo:
var htmlparser = require('htmlparser2');
var body = '<p><div>This is </div>a <span>simple </span> <img src="test"></img>example.</p>';
var result = [];
var parser = new htmlparser.Parser({
ontext: function(text){
result.push(text);
}
}, {decodeEntities: true});
parser.write(body);
parser.end();
result.join('');
The output will be This is a simple example.
See it in action here: https://tonicdev.com/jfahrenkrug/extract-text-from-html
This works in both node and the browser if you pack your web application using a tool like webpack.

I made some modifications to the original Jibberboy2000 script.
Hope it'll be useful for someone.
str = '**ANY HTML CONTENT HERE**';
str=str.replace(/<\s*br\/*>/gi, "\n");
str=str.replace(/<\s*a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
str=str.replace(/<\s*\/*.+?>/ig, "\n");
str=str.replace(/ {2,}/gi, " ");
str=str.replace(/\n+\s*/gi, "\n\n");

After trying all of the answers mentioned, I found that most if not all of them had edge cases and couldn't completely support my needs.
I started exploring how php does it and came across the php.js lib which replicates the strip_tags method here: http://phpjs.org/functions/strip_tags/

function stripHTML(my_string){
var charArr = my_string.split(''),
resultArr = [],
htmlZone = 0,
quoteZone = 0;
for( var x=0; x < charArr.length; x++ ){
switch( charArr[x] + htmlZone + quoteZone ){
case "<00" : htmlZone = 1;break;
case ">10" : htmlZone = 0;resultArr.push(' ');break;
case '"10' : quoteZone = 1;break;
case "'10" : quoteZone = 2;break;
case '"11' :
case "'12" : quoteZone = 0;break;
default : if(!htmlZone){ resultArr.push(charArr[x]); }
}
}
return resultArr.join('');
}
Accounts for > inside attributes and <img onerror="javascript"> in newly created dom elements.
usage:
clean_string = stripHTML("string with <html> in it")
demo:
https://jsfiddle.net/gaby_de_wilde/pqayphzd/
demo of top answer doing the terrible things:
https://jsfiddle.net/gaby_de_wilde/6f0jymL6/1/

Here's a version which sorta addresses @MikeSamuel's security concern:
function strip(html)
{
try {
var doc = document.implementation.createDocument('http://www.w3.org/1999/xhtml', 'html', null);
doc.documentElement.innerHTML = html;
return doc.documentElement.textContent||doc.documentElement.innerText;
} catch(e) {
return "";
}
}
Note that it will return an empty string if the HTML markup isn't valid XML (i.e. tags must be closed and attributes must be quoted). This isn't ideal, but it does avoid the potential security exploit.
If you need to handle markup that isn't valid XML, you could try using:
var doc = document.implementation.createHTMLDocument("");
but that isn't a perfect solution either, for other reasons.

I think the easiest way is to just use regular expressions, as someone mentioned above, although there's no reason to use a bunch of them. Try:
stringWithHTML = stringWithHTML.replace(/<\/?[a-z][a-z0-9]*[^<>]*>/ig, "");

The code below allows you to retain some HTML tags while stripping all others:
function strip_tags(input, allowed) {
allowed = (((allowed || '') + '')
.toLowerCase()
.match(/<[a-z][a-z0-9]*>/g) || [])
.join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)
var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
return input.replace(commentsAndPhpTags, '')
.replace(tags, function($0, $1) {
return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
});
}

I just needed to strip out the <a> tags and replace them with the text of the link.
This seems to work great.
htmlContent= htmlContent.replace(/<a.*href="(.*?)">/g, '');
htmlContent= htmlContent.replace(/<\/a>/g, '');

The accepted answer mostly works fine; however, in IE, if the html string is null you get "null" (instead of ''). Fixed:
function strip(html)
{
if (html == null) return "";
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}

A safer way to strip the html with jQuery is to first use jQuery.parseHTML to create a DOM, ignoring any scripts, before letting jQuery build an element and then retrieving only the text.
function stripHtml(unsafe) {
return $($.parseHTML(unsafe)).text();
}
Can safely strip html from:
<img src="unknown.gif" onerror="console.log('running injections');">
And other exploits.
nJoy!

const strip=(text) =>{
return (new DOMParser()?.parseFromString(text,"text/html"))
?.body?.textContent
}
const value=document.getElementById("idOfEl").value
const cleanText=strip(value)

With jQuery you can simply retrieve it using
$('#elementID').text()

I have created a working regular expression myself:
str=str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, '');

simple 2 line jquery to strip the html.
var content = "<p>checking the html source </p><p> </p><p>with </p><p>all</p><p>the html </p><p>content</p>";
var text = $(content).text();//It gets you the plain text
console.log(text);//check the data in your console
cj("#text_area_id").val(text);//set your content to text area using text_area_id

How to insert a javascript file to my BHO Extension for IE-11? I want to insert a javascript file with multiple lines

I want to insert a javascript into a page whenever a site loads using Browser Helper object in IE-11.
public void OnDocumentComplete(object pDisp, ref object URL)
{
HTMLDocument document = (HTMLDocument)webBrowser.Document;
IHTMLElement head = (IHTMLElement)((IHTMLElementCollection)
document.all.tags("head")).item(null, 0);
IHTMLScriptElement scriptObject =
(IHTMLScriptElement)document.createElement("script");
scriptObject.type = @"text/javascript";
scriptObject.text = "\nfunction hidediv(){document.getElementById" +
"('myOwnUniqueId12345').style.visibility = 'hidden';}\n\n";
((HTMLHeadElement)head).appendChild((IHTMLDOMNode)scriptObject);
string div = "<div id=\"myOwnUniqueId12345\" style=\"position:" +
"fixed;bottom:0px;right:0px;z-index:9999;width:300px;" +
"height:150px;\"> <div style=\"position:relative;" +
"float:right;font-size:9px;\"><a " +
"href=\"javascript:hidediv();\">close</a></div>" +
"My content goes here ...</div>";
document.body.insertAdjacentHTML("afterBegin", div);
}
I want to insert a JavaScript file into the page instead of setting scriptObject.text.

Is there an easy way to add the struts 1.3 html styleId attribute without touching every element?

I am currently working with legacy code to attempt to get it to work correctly in newer browsers. The code is written with Struts 1.3 and makes use of the html tag library extensively in the following manner:
<html:text property="myTextInput" maxlength="10"/>
Which produces the following html when rendered:
<input name="myTextInput" type="text" maxlength="10" value="">
In old versions of IE, one could use document.getElementById('myTextInput') to get a reference even if the element only had a name attribute and didn't have an id attribute. When using the jsp html tags, the name property generates the name attribute in the html code but doesn't generate the id attribute.
I found that adding styleId to the html tag in the JSP does add the id attribute to the resulting HTML, but this means I would have to touch every single html tag element in all the JSPs and change it to something like:
<html:text property="myTextInput" styleId="myTextInput" maxlength="10"/>
I also found document.getElementsByName(), but using it would mean touching a lot of JavaScript, and (due to bad code) I don't know whether a given call is really referring to an element by its id or its name, so this could cause some issues.
Is there an easy way to add the styleId attribute without touching every element?

I ended up writing a small Java main method to deal with this. I use a regex to find the html elements (select, option, text, hidden, textarea) that don't already have a styleId attribute and then add a styleId attribute with the same value as the property attribute. This could be expanded to process a bunch of files at once, but for now I just wanted something that handles individual files so I could easily check them against source control and make sure it worked correctly. It's a quick and dirty solution so that I wouldn't have to comb through tons of JSP files manually, so I'm sure there are some edge cases it doesn't deal with. With that said:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JspModifierStyleId {
public static void main(String[] args) throws IOException {
String lineEnding = "\r\n";
String baseDir= "C:/path/to/your/directory/"; //Change this to suit your directory
String origFileName= "OriginalFile.jsp"; //Change this to suit your original file that needs the attribute added
File origFile = new File(baseDir + origFileName);
String tempFileName = "TemporaryFile.jsp";
File tempFile = new File(baseDir + tempFileName);
Pattern p = Pattern.compile("^(?!.*styleId)\\s*<html:(?:select|option|text|hidden|textarea)\\s.*property=\"([a-zA-Z0-9.]*)\".+");
FileReader in = new FileReader(origFile);
FileWriter out = new FileWriter(tempFile);
BufferedReader br = new BufferedReader(in);
BufferedWriter bw = new BufferedWriter(out);
String line;
while ((line = br.readLine()) != null) {
Matcher m = p.matcher(line);
if(m.matches()){
String strWithStyleId = line.substring(0, m.start(1)) + m.group(1) + "\" styleId=\"" + line.substring(m.start(1));
bw.write(strWithStyleId + lineEnding);
System.out.println(strWithStyleId);
}else {
bw.write(line + lineEnding);
}
}
br.close();
bw.close();
//copies back to original file, BE CAREFUL!!!
copyFile(tempFile, origFile);
}
public static void copyFile(File sourceFile, File destFile) throws IOException {
if(!destFile.exists()) {
destFile.createNewFile();
}
FileChannel source = null;
FileChannel destination = null;
try {
source = new FileInputStream(sourceFile).getChannel();
destination = new FileOutputStream(destFile).getChannel();
destination.transferFrom(source, 0, source.size());
}
finally {
if(source != null) {
source.close();
}
if(destination != null) {
destination.close();
}
}
}
}

Extract only javascript from a script tag

I would like to extract only the JavaScript from script tags in an HTML document so that I can pass it to a JS parser like esprima. I am using Node.js to write this application and have the content extracted from the script tag as a string.
The problem is that the JavaScript extracted from HTML documents sometimes contains HTML comments, which I want to remove.
<!-- var a; --> should be converted to var a
A simple removal of <!-- and --> does not work, since it fails in the case <!-- if(j-->0); --> where it removes the middle -->
I would also like to remove identifiers like [if !IE] and [endif] which are sometimes found inside script tags.
I would also like to extract the JS inside CDATA segments.
<![CDATA[ var a; ]]> should be converted to var a
Is all this possible using a regex or is something more required?
In short I would like to sanitize the JS from script tags so that I can safely pass it into a parser like esprima.
Thanks!
EDIT:
Based on @user568109's answer, this is the rough code that parses through HTML comments and CDATA segments inside script tags:
var htmlparser = require("htmlparser2");
var jstext = '';
var parser = new htmlparser.Parser({
onopentag: function(name, attribs){
if(name === "script" && attribs.type === "text/javascript"){
jstext = '';
//console.log("JS! Hooray!");
}
},
ontext: function(text) {
jstext += text;
},
onclosetag: function(tagname) {
if(tagname === "script") {
console.log(jstext);
jstext = '';
}
},
oncomment : function(data) {
if(jstext) {
jstext += data;
}
}
}, {
xmlMode:true
});
parser.write(input);
parser.end()

That is the job of a parser. See htmlparser2, or esprima itself. Please don't use regex to parse HTML; it is seductive, but you will waste your precious time and effort trying to match more and more tags.
An example from the page:
var htmlparser = require("htmlparser2");
var parser = new htmlparser.Parser({
onopentag: function(name, attribs){
if(name === "script" && attribs.type === "text/javascript"){
console.log("JS! Hooray!");
}
},
ontext: function(text){
console.log("-->", text);
},
onclosetag: function(tagname){
if(tagname === "script"){
console.log("That's it?!");
}
}
});
parser.write("Xyz <script type='text/javascript'>var foo = '<<bar>>';</script>");
parser.end();
Output (simplified):
--> Xyz
JS! Hooray!
--> var foo = '<<bar>>';
That's it?!
It will give you all the tags: divs, comments, scripts, etc. But you would have to validate the script inside the comments yourself. Also, CDATA is valid in XML (XHTML), and htmlparser2 would detect it as a comment, so you would have to check those too.
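Once a parser has handed you the text of a single script element, the wrapper markers from the question can be trimmed with patterns anchored to the start and end of that text, which leaves a --> in the middle of the code alone. This is only a sketch under the assumption that the markers appear at the very beginning and very end of the script text; the function name unwrapScriptText is made up, and conditional-comment markers such as [if !IE] are not handled.

```javascript
// Hypothetical helper: strip a leading "<!--" / "<![CDATA[" and a trailing
// "-->" / "]]>" from script text that has already been extracted.
// The ^ and $ anchors mean only the outermost markers are touched.
function unwrapScriptText(source) {
  return source
    .replace(/^\s*<!--/, "")
    .replace(/-->\s*$/, "")
    .replace(/^\s*<!\[CDATA\[/, "")
    .replace(/\]\]>\s*$/, "")
    .trim();
}

console.log(unwrapScriptText("<!-- var a; -->"));
console.log(unwrapScriptText("<!-- if(j-->0); -->"));
console.log(unwrapScriptText("<![CDATA[ var a; ]]>"));
```

This does not replace the parser: finding the script elements in the HTML is still the parser's job, and the result should still be checked by passing it to esprima.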

Insert javascript from an external script into javascript in the HTML

I have some JS from an external JS file that I want to insert inside of a JS function in the HTML file. I can not touch the JS script in the HTML file, so I am wondering if this method can be done.
Here is the JS I want to insert inside of the JS function in the HTML file.
// FIRST JS TO INSERT
if (OS == "mobile"){
killVideoPlayer();
}
// SECOND JS TO INSERT
if (OS == "mobile"){
loadHtmlFiveVideo();
if (!document.all){
flvPlayerLoaded = false;
}
}else {
loadVideoPlayer();
}
Then I want to insert it into here.
<script>
function mediaTypeCheck() {
if (bsiCompleteArray[arrayIndex].mediaType == "video") {
// INSERT FIRST JS HERE
document.getElementById("bsi-video-wrap").style.display = "block";
document.getElementById('pngBsi').style.display = "block";
document.getElementById("frame_photo").style.display = "none";
document.getElementById("relativeFrame").style.display = "block";
document.getElementById("buy-me-photo-button-bsi").style.display = "none";
// INSERT SECOND JS HERE
loadVideoPlayer();
}
if (bsiCompleteArray[arrayIndex].mediaType == "photo") {
killVideoPlayer();
document.getElementById("bsi-video-wrap").style.display = "none";
document.getElementById('pngBsi').style.display = "block";
document.getElementById("relativeFrame").style.display = "block";
document.getElementById("frame_photo").style.display = "block";
document.getElementById("buy-me-photo-button-bsi").style.display = "block";
if (!document.all){
flvPlayerLoaded = false;
}
}
}
</script>
Thank you!

In JavaScript, you can overwrite variables with new values at any time, including functions.
By the looks of it, you could replace the mediaTypeCheck function with one of your own that does what you need and then calls the original function.
E.g.
(function(){
// keep track of the original mediaTypeCheck
var old_function = mediaTypeCheck;
// overwrite mediaTypeCheck with your wrapper function
mediaTypeCheck = function() {
if ( conditions ) {
// do whatever you need to, then ...
}
return old_function();
};
})();
The above can be loaded from any script, so long as it happens after the mediaTypeCheck function is defined.
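A self-contained version of the same wrapping idea, runnable outside the page, might look like this. The mediaTypeCheck defined here is only a stand-in for the page's function, and the log array exists just to make the call order visible; both are assumptions for the sake of the example.

```javascript
// Records the order in which things run (illustration only).
const log = [];

// Stand-in for the page's own function, which we are not allowed to edit.
function mediaTypeCheck() {
  log.push("original");
  return "checked";
}

(function () {
  // Keep a reference to the original before overwriting the name.
  var original = mediaTypeCheck;

  mediaTypeCheck = function () {
    log.push("before");
    // Forward `this` and any arguments to the original function.
    var result = original.apply(this, arguments);
    log.push("after");
    return result;
  };
})();

const result = mediaTypeCheck();
console.log(log.join(","), result);
```

Note that a wrapper can only run code before or after the original call; it cannot place statements in the middle of the original function body.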

The easiest way for me in the past has been using server-side includes. Depending on your back end, you can set up a PHP or ASP page or whatever to respond with a mime type that mimics ".js".
I'm not a PHP guy, but you'd do something like this: (if my syntax is incorrect, please someone else fix it)
<?php
//adding this header will make your browser think that this is a real .js page
header( 'Content-Type: application/javascript' );
?>
//your normal javascript here
<script>
function mediaTypeCheck() {
if (bsiCompleteArray[arrayIndex].mediaType == "video") {
//here is where you would 'include' your first javascript page
<?php
require_once(dirname(__FILE__) . "/file.php");
?>
//now continue on with your normal javascript code
document.getElementById("bsi-video-wrap").style.display = "block";
document.getElementById('pngBsi').style.display = "block";
.......

You cannot insert JS inside JS. What you can do is insert another script tag into the DOM and specify the src of the external JS file.

You can directly insert a JS file using $.getScript(url);
If you have the script as text, you can create a script tag:
var script = document.createElement('script');
script.innerText = text;
document.getElementsByTagName('head')[0].appendChild(script);

The issue in your case is that you cannot touch the script in the html, so I'll say that it cannot be done on the client side.
If you could at least touch the script tag (not the script itself), then you could add a custom type to manipulate it before it executes, for example:
<script type="WaitForInsert">

Your question looks somewhat strange, but it seems to be possible. Try my quick and dirty working code and adapt it to your own situation:
$(document.body).ready(function loadBody(){
var testStr = test.toString();
var indexAcc = testStr.indexOf("{");
var part1 = testStr.substr(0, indexAcc + 1);
var part2 = testStr.substr(indexAcc + 1, testStr.length - indexAcc - 2);
var split = part2.split(";");
split.pop();
split.splice(0,0, "alert('a')");
split.splice(split.length -1, 0, "alert('d')");
var newStr = part1 + split.join(";") + ";" + "}";
$.globalEval(newStr);
test();
});
function test(){
alert('b');
alert('c');
alert('e');
}
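The same source-rewriting idea can be sketched without jQuery by rebuilding the function with new Function. This is a rough illustration, not a robust tool: the helper name injectIntoBody is made up, it assumes a plain function declaration with simple parameters, it only adds statements at the start and end of the body, and the rebuilt function loses access to any variables the original closed over.

```javascript
// Example target function (stand-in for the function you cannot edit).
function test(out) {
  out.push("b");
  out.push("c");
}

// Hypothetical helper: rebuild `fn` from its source text with extra
// statements placed at the start and at the end of its body.
function injectIntoBody(fn, first, last) {
  var source = fn.toString();
  var open = source.indexOf("{");
  var close = source.lastIndexOf("}");
  // Parameter list is the text between the first "(" and the first ")".
  var params = source.slice(source.indexOf("(") + 1, source.indexOf(")"));
  var body = source.slice(open + 1, close);
  return new Function(params, first + "\n" + body + "\n" + last);
}

var patched = injectIntoBody(test, 'out.push("a");', 'out.push("d");');
var out = [];
patched(out);
console.log(out.join(""));
```

Because it depends on the exact text returned by toString, this approach is fragile; wrapping the function, where that is enough, is usually the safer choice.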
