How to adding special html chars without using innerHTML

How to adding special html chars without using innerHTML - javascript

So I'm working on a micro lib, html.js, and basically it creates text nodes with document.createTextNode but when I want to create a text node with a b I get a&nbsp;b so I'm wondering how to escape the & char, without using innerHTML ideally..

Javascript supports the \uXXXX notation, so in the case of a non-breaking space, that would be \u00A0.
document.createTextNode('a\u00A0b');
That's as far as you can get. It's a text node, consisting only of text, and there's no difference between texts created from entity references or from normal characters.
If that's not what you want, you should take a second look at innerHtml. Can't you read it, modify it and put it back?

There's not much functionality in js to encode/decode html entities. Seems like there some libraries out there, though, that can help you achieve this. Here is one I found on goodle.. haven't tried it, but you can check it out, or look for others.
http://www.strictly-software.com/htmlencode

Related

Regex replace with multiple wildcards works in PHP, not in JavaScript

I'm attempting to implement center alignment for two Markdown parsers:
In PHP for Parsedown (successfully)
In JavaScript for Bootstrap Markdown (without success)
The idea I'm following and finding the easiest is to work with the final HTML output, and just snap inline styling onto the tags.
The following regex does what I need, it adds style="text-align:center;" to any element so far*, as needed:
$text = preg_replace('/\<(.*?)\>\->(.*?)<\-\<\/(.*?)\>/', '<$1 style="text-align:center;">$2</$3>', $text);
That is, <p>text</p> becomes <p style="text-align:center;">text</p>.
However, when I attempted to port this into JavaScript to also make it available for previewing on client-side, the pattern does not match as it should:
content = content.replace('/\<(.*?)\>\->(.*?)<\-\<\/(.*?)\>/', '<$1 style="text-align:center;">$2</$3>');
The replacement in content does not occur.
I'm aware there are slight differences between Regex of PHP and JavaScript, but I have found examples for all the expected behavior here on both sides, working.
*If someone is wondering by any chance, I'm also successfully adding the center alignment to tags that already have a style attribute - on server side only, so far.

You'll need to use the literal syntax for regular expression in JavaScript, like so:
content = content.replace(/\<(.*?)\>\->(.+)<\-\<\/(.+)\>/gi, '<$1 style="text-align:center;">$2</$3>');
Note that the gi at the end of the regular expression simply enables global searching (that is, replace all occurrences matching the pattern) and case-insensitive matching. They are both technically optional, but you will most likely want the g flag enabled for certain. However, keeping the i flag is up to you (depends on whether or not your content contains &GT;, for example).

how to prevent scripts from being run

SO kept preventing me from posting the title I wanted so finally got a title that let me post though it kind of sucks so feel free to edit/change it.
I have fields a user can fill in and in the javascript we have
'${chart.title}'
and stuff like that. Is it sufficient to just strip out the single quote character such that they cannot escape it back to javascript? or are there other ways to close out the string that started with the single quote character.
${chart.title} inserts the title a user typed in on a previous page so naturally they could type something like "Title'+callMethod()+'RestOfTitle" injecting a callMethod into my javascript.
thanks,
Dean

The best way would be to restrict the input to alphanumerical and space characters.
If you want to allow anything inside the title, you can use a escaping function.
http://xkr.us/articles/javascript/encode-compare/
Just stripping the string of single quote characters is definitely not enough. Think of new lines for one reason.

There are couple of options.
First go very restrictive way and do both so called white-list validation for input field for you title and always encode the text that you output to the page. That will filtered out all unwanted (and potentially dangerous) characters and make sure that if some of them pass filter (or somebody update the text to contains some js code after the filters were applied) the encoding procedure make all malicious js scripts not runable (it turns it into plain text).
Second you do let your users input what ever they want (which is highly unrecommended way but sometime developers asked to do it) but always encode the text that you output to the page.
You can implement white-list validation by yourself using regular expression or you can use one of the libraries.

How can I get the actual displayed text from an HTML TextNode instead of the HTML markup?

I'm trying to turn a DOM node and all its children into a plain text markup of my design. I can use node.childNodes to get a list of all the content and recursively turn it into my string format.
However, when I take text out of a TextNode, it includes newlines and spaces that aren't visible on the page. For plain text I want to get the same appearance that was on the HTML - so there shouldn't be lots of indentations before the text or newlines after it, even if they were in the HTML markup, because my browser stripped those out when it rendered the HTML.
The obvious answer would be to .trim() the string myself - except that this can take out spaces that are supposed to exist in the text, in the case of something like <em>text.</em> moretext. The latter textnode loses the space before it.
Even if that was working it's also philosophically unappealing. I want this algorithm to be based on the text presented to the user. The webpage conceals implementation details like spaces, tabs, and newlines in the underlying markup and I would like to remain within that abstraction using whatever it used to trim them down, rather than the approximation granted by trim(). Ideally there would be an equivalent of node.textContent that has a list of both plain textand child elements somehow.
I haven't been able to find anything about this and I can't see a good way to code it to be smart about those spaces (short of comparing the .textContent and .nodeValue strings or parsing innerHTML myself or something). Help?

document.getElementById("someid").innerText.replace(/\s+/g," ")
The trim method removes the space at the head and the end of a string, but not in the middle

I have written an implementation of exactly this as part of my Rangy library's TextRange module, but it's a lot of code to include for just this.
var displayedText = rangy.innerText(node);

regex to change text inside a html tag

First of all I'm new to stackoverflow so I'm sorry if I posted this in the wrong section.
I need a regex to search within the html tag and replace the - with a _
e.g:
<TAG-NAME>-100</TAG-NAME>
would become
<TAG_NAME>-100</TAG_NAME>
note that the value inside the tag wasn't affected.
Can anyone help?
Thanks.

Since JavaScript is the language for DOM manipulation, you should generally consider parsing the XML properly and using JavaScript's DOM traversal functions instead of regular expressions.
Here is some example code on how to parse an XML document so that you can use the DOM traversal functions. Then you can traverse all elements and change their names. This will automatically exclude text nodes, attributes, comments and all other annoying things, you don't want to change.
If it has to be a regex, here is a makeshift solution. Note that it will badly fail you if you have tags (or even only >) inside attribute names or comments (in fact it will also apply the replacement to comments):
str = str.replace(/-(?=[^<>]*>)/g, '_');
This will match a - if it is followed by a > without encountering a < before. The concept is called a negative lookahead. The g modifier makes sure that all occurrences are replaced.
Note that this will apply the replacement to anything in front of a >. Even attribute values. If you don't want that you could also make sure that there is an even number of quotes between the hyphen and the closing >, like this:
str = str.replace(/-(?=[^<>"]*(?:"[^<>"]*"[^<>"]*)*>)/g, '_');
This will still change attribute names though.
Here is a regexpal demo that shows what works and what doesn't work. Especially the comment behavior is quite horrible. Of course this could be taken care of with an even more complex regex, but I guess you see where this is going? You should really, really use an XML parser!

s/(\<[^\>]+\>)\-([^\<]+\<\/)/\1_\2/
Although I am not familiar with JS libraries, but I am pretty sure there would be better libraries to parse HTML.

Replace space with underscore in Netbeans Code Templates

I created the below JavaScript Code Template in Netbeans 7.1.1
This is to generate code automatically in the editor, instead of typing it.
do_this('${selection}${cursor}',13)
which will result in the below code, with the cursor between quotes
do_this('',13)
The template automatically places the text I have highlighted, between quotes.
Now, the problem: I would like to replace any spaces within the selected/highlighted piece of code, with underscores. I think this may be possible with Regular Expressions (regex), however I am not sure how to go about it.
Thanks

Not sure about the Netbeans specific stuff, but once you have the selection you could do something like this:
selection = selection.split(" ").join("_");

This is not possible in NetBeans Code Templates, as they do not have any functions to manipulate the data in variables. If I have a work around for it, I will post it here.
Thanks.

We Keep Coding

JavaScript is the programming language of the Web.