Javascript syntax highlighter that plays nicely with Markdown


I've looked at a few JavaScript programs that add syntax highlighting to code blocks on a page, but all the ones I've found require setting an attribute on the code block to say which language is being used. I am generating the HTML with Markdown, so I have no way of setting these attributes. Are there any that will do this automatically, without needing an attribute to be set?
The only way I can think of this working is with a shebang line:
#!/usr/bin/ruby
def foo(bar)
  bar
end
And it will know it's Ruby, and maybe even not display the shebang line (having a shebang for a one- or two-line fragment will get tiring).
I won't be needing it to handle any very obscure languages, but it would be great if I could easily write new definitions.
Thanks.
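The shebang idea from the question could be sketched as a small pre-processing step that runs before whichever highlighter you pick. This is a minimal sketch under stated assumptions: `extractShebangLanguage` and the interpreter-to-language table are hypothetical, and only the first line of a block is examined.

```javascript
// Hypothetical mapping from interpreter name to a highlighter language id.
const INTERPRETERS = {
  ruby: "ruby",
  python: "python",
  python3: "python",
  node: "javascript",
  bash: "bash",
  sh: "bash",
};

// If the first line is a shebang, return the language it names and the code
// without that line; otherwise return the code unchanged with language null.
function extractShebangLanguage(code) {
  const match = /^#!(\S+)(?:[ \t]+(\S+))?[^\n]*\n?/.exec(code);
  if (!match) {
    return { language: null, code };
  }
  // Handle both "#!/usr/bin/ruby" and "#!/usr/bin/env ruby".
  let interpreter = match[1].split("/").pop();
  if (interpreter === "env" && match[2]) {
    interpreter = match[2];
  }
  return {
    language: INTERPRETERS[interpreter] || null,
    code: code.slice(match[0].length), // the shebang line is dropped
  };
}
```

The returned `language` would then be written into whatever attribute or class your highlighter expects, and the stripped `code` put back into the block, so the shebang never shows on the page.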

Google Code Prettify should do the job. Stack Overflow uses it too (on the markup generated by Markdown), and it guesses the language automatically.

It's my understanding that the Markdown spec allows for the presence of actual markup as a fallback:
For any markup that is not covered by Markdown's syntax, you simply use HTML itself. There's no need to preface it or delimit it to indicate that you're switching from Markdown to HTML; you just use the tags.
The only restrictions are that block-level HTML elements -- e.g. <div>, <table>, <pre>, <p>, etc. -- must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces.
So, if you've got a syntax highlighter you really like that doesn't auto-detect, you could simply put a literal <code> block with the appropriate attribute into your Markdown (wrapped in <pre>, since <code> on its own is an inline element). I don't think that particularly violates the goals of Markdown, either; it's a fairly straightforward and readable indicator.
It also might not be that hard to roll your own script that runs once the DOM is ready, finds the code blocks, and adds the appropriate attributes for your chosen syntax highlighter based on a few heuristics you devise for their contents. But if there's a library out there that already does this, that obviously has some advantages. :)
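A rough sketch of that roll-your-own approach, assuming a hypothetical `guessLanguage` helper and a handful of made-up rules; real detectors such as Prettify's are considerably more involved. The DOM step is only shown as a comment because the attribute or class name depends on the highlighter you choose.

```javascript
// Hypothetical heuristics: each rule pairs a language with patterns that are
// fairly characteristic of it. The first rule whose patterns all match wins.
const RULES = [
  { language: "ruby", patterns: [/^\s*def \w+.*$/m, /^\s*end\s*$/m] },
  { language: "python", patterns: [/^\s*def \w+\(.*\):\s*$/m] },
  { language: "javascript", patterns: [/\bfunction\b|=>/, /[;{}]/] },
];

// Return the language of the first matching rule, or null if none match.
function guessLanguage(code) {
  for (const rule of RULES) {
    if (rule.patterns.every((pattern) => pattern.test(code))) {
      return rule.language;
    }
  }
  return null;
}

// In the browser this could run once the DOM is ready, e.g. (the class name
// is an assumption; use whatever your highlighter reads):
// document.querySelectorAll("pre code").forEach((block) => {
//   const language = guessLanguage(block.textContent);
//   if (language) block.classList.add("language-" + language);
// });
```

Ordering matters here: the Ruby rule is checked first and requires a bare `end` line, so Python's `def ...:` blocks fall through to the second rule.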

Related

What is this HTML notation and how can I use it myself?

AddThis uses a notation which seems to extend the parameters available in an HTML div tag.
The tag that contains the button array can include additional parameters such as:
<div addthis:url="someUrl"> </div>
Along with some CSS classes defined for the element, this seems to give their JavaScript code access to manipulate the element AND read the value of the additional addthis: parameter.
I'd like to implement something similar myself but am confused as to how to correctly allow additional parameters in the standard HTML tags.
I've also seen the AddThis code throw W3C validation errors sometimes so wonder if this is entirely legitimate.
Searching around I've found some discussions about extending the HTML tags via extending the prototypes in JavaScript but everything I've read seems to be about adding new events etc.
This addthis:url notation looks more 'schema'-like to me, or am I on completely the wrong track?
I've made some progress on this, at least functionally, but what I have now breaks the HTML validation quite seriously.
To explain a little further what I am trying to achieve...
In the same way that AddThis allows you to include their sharing elements by adding a simple <DIV> tag to your page and including some JavaScript, I want to provide similar functionality with <IMG> tags.
Someone wanting to use this functionality will include an <IMG> tag that has some additional name=value pairs that are outside the standard image tag's attributes and are defined by my spec.
The JavaScript that is included will then read these additional attributes and perform some actions on the image tags.
To this end I have the following:
<IMG id="first image" class="imageThatCanBeWorkedOn" src="holding.png"
my-API-name:attribute1="some data"
my-API-name:attribute2="some other data">
I then use getAttribute('my-API-name:attribute1') to access the additional tag data from JavaScript.
(I'm selecting all of the tags with a particular class name into an array and then processing each tag in turn, in case anyone is interested.)
This all works great - I can manipulate the <IMG> tags as needed based on the additional data, BUT the markup is not valid HTML according to the W3C validator.
With the above I get:
Warning Attribute my-API-name:attribute1 is not serializable as XML 1.0.
Warning Attribute my-API-name:attribute2 is not serializable as XML 1.0.
Error: Attribute my-API-name:attribute1 not allowed on element img at this point.
Error: Attribute my-API-name:attribute2 not allowed on element img at this point.
If I remove the : from the attribute name (eg my-API-name-attribute2) the 'not serializable' warnings disappear but I still get the 'not allowed' errors.
So how would I add these additional bits of data to an <IMG> tag without invalidating the markup, while maintaining a level of clarity/branding by including the 'my-API-name' part in the way that AddThis does?
(I note from the comments that I could use data- attributes. I haven't tried these yet, but I'd prefer to be able to do this in the 'branded' way that AddThis seems to have managed without breaking their users' markup.)

If we were talking about XML (which includes XHTML) it'd be a namespace prefix. In HTML5 it's just a regular attribute:
Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control characters, and any characters that are not defined by Unicode.
...though it's slightly harder to manipulate (not by much) and totally non-standard.

I'd like to implement something similar myself but am confused as to how to correctly allow additional parameters in the standard HTML tags.
Before HTML5, some web developers deployed a technique of adding custom data to an element's class attribute (or to any other attribute which will happily attach itself to any element).
This worked, but it was self-evidently a hack.
For this reason HTML5 introduced custom data-* attributes as the standard approach to extending an element's attributes - and data-* is precisely what you should be deploying.
So how would I add these additional bits of data to an <IMG> tag and not invalidate the markup but while maintaining a level of clarity/branding by including the 'my-API-name' part in the way that AddThis does?
<img id="first-image" class="imageThatCanBeWorkedOn" src="holding.png"
data-myAPIName_attribute1="some data"
data-myAPIName_attribute2="some other data" />
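Script reads these values back through `element.dataset` (or `getAttribute`). One detail worth knowing: in HTML documents the parser lowercases attribute names, so `data-myAPIName_attribute1` shows up as `dataset.myapiname_attribute1`, while a hyphen followed by a lowercase letter becomes camelCase. The helper below is a small sketch of that name conversion, based on my reading of the HTML spec's `dataset` rules; `datasetKey` is a hypothetical name, not a browser API.

```javascript
// Convert a data-* attribute name, as written in HTML source, into the key
// under which it appears on element.dataset in an HTML document.
function datasetKey(attributeName) {
  // The HTML parser lowercases attribute names in HTML documents.
  const lowered = attributeName.toLowerCase();
  if (!lowered.startsWith("data-")) {
    throw new Error("Not a data-* attribute: " + attributeName);
  }
  // Drop the "data-" prefix, then turn each "-x" (x = a-z) into "X".
  return lowered
    .slice("data-".length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}
```

So a hyphenated brand prefix such as `data-my-api-name-attribute1` reads back as `dataset.myApiNameAttribute1`, which may be friendlier to script than the underscore form.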
Further Reading:
Time Travel back to 2010: http://html5doctor.com/html5-custom-data-attributes/
Time Travel back to 2008: http://ejohn.org/blog/html-5-data-attributes/

Clientside HTML Minification

Is there a way to do this kind of minification with JavaScript and update the DOM client-side?
Input:
<div class="parentDiv">
    <div class="childDiv">Some text</div>
    <div class="childDiv">Some text</div>
</div>
Output:
<div class="parentDiv"><div class="childDiv">Some text</div><div class="childDiv">Some text</div></div>
I know it's useless to do the minification after all the content has been downloaded.
The point here is to stop the indentation from creating gaps between my divs. I know that if I put a comment between the tags the gap won't appear, but the code gets hard to understand with so many comments between my div tags.
See this [post] and you'll understand what I mean.

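I managed to achieve what I wanted and even created a jQuery plugin for it.
jQuery.fn.clearWhiteSpace = function () {
  // Handle each matched element separately: .html() alone reads only the first one
  return this.each(function () {
    var $el = jQuery(this);
    // Drop every line break together with the indentation (spaces or tabs) after it
    $el.html($el.html().replace(/\n\s*/g, ""));
  });
};
$(".parentDiv").clearWhiteSpace();
There is an example I wrote on jsFiddle.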
But thanks for all your effort. :)

If it's a minification the DOM won't update. Also there's nothing client-side minification accomplishes: it's not faster to download and it's not obfuscated from the client.
For what you wrote, you can replace '\n' with '' I guess.
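Taking that suggestion a step further, the whitespace that sits *between* tags can be removed with a single regular expression. This is a minimal sketch (`collapseTagWhitespace` is a hypothetical name), and it assumes the markup contains no `<pre>`, `<textarea>`, or inline elements where the whitespace between tags is significant.

```javascript
// Remove whitespace runs that sit entirely between a ">" and the next "<".
// Text such as "Some text" inside an element is left alone.
function collapseTagWhitespace(html) {
  return html.replace(/>\s+</g, "><").trim();
}

const input = [
  '<div class="parentDiv">',
  '    <div class="childDiv">Some text</div>',
  '    <div class="childDiv">Some text</div>',
  "</div>",
].join("\n");

console.log(collapseTagWhitespace(input));
```

Run on the question's input, this prints the single-line output shown there. In the browser you would apply it as `el.innerHTML = collapseTagWhitespace(el.innerHTML)`, keeping in mind that reassigning `innerHTML` recreates the child nodes and drops any event listeners attached to them.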

Try this JavaScript minification script: http://prettydiff.com/lib/jspretty.js

You need to be careful when parsing documents, especially with special characters in attributes. You can write your own DOM parser, but, why re-invent the wheel?
Here is a great parser, written in JavaScript: https://www.npmjs.com/package/html-minifier
Instructions are documented.
The above method is to "minify" production code; however, if it's a visual spacing issue, then see below:
Update:
"White-space" is mostly ignored when it comes to block-elements.
To ensure that your inline-block elements are not separated by "white-space", you can arrange the code for your blocks so that no "space" separates them; other than that, here's what really matters:
Proper CSS & HTML
Make sure all your HTML tags are "paired" correctly: each open-tag has a close-tag. This does not apply to "void" elements like <img /> or <input />, as these are "self-closing".
If you need blocks placed next to each other, use <div> tags styled with CSS as display:inline-block. You can also use "table-cells", which do NOT have to be <td> tags, since you can achieve this with CSS as well by styling them as display:table-cell.
You can also have elements be wrapped and packed tightly together (as mentioned above) by specifying their style as: float:left (or "right").
It is good practice to place your styles in CSS style-sheets rather than inline, as inline styles make your code unmanageable; however, some style-sheets are persistent (see below) and the only way to override such styles is with inline styles.
If you're coding in someone else's code-base and none of the above works, you can write some style-sheets of your own that override the others by adding !important after each property. You can use this to override any property, but in this case it would typically be margin or border-...
Lastly, make sure there are no unneeded non-breaking spaces between your elements; these look like this: &nbsp;
If you need more info on how to write the modern HTML5 markup and CSS3 style-sheet language, the "Mozilla Developer Network" is a great reference: https://developer.mozilla.org

So let's attempt to solve this issue: "The point here is to stop the indentation to create gaps between my divs." What I can deduce from that sentence + the [post] page + its linked answer page is that client-side HTML minification isn't the correct solution for this problem.
Have you looked into using inline-block or CSS resets first, before attempting to minify the HTML code or munge it by adding blank comments between the HTML tags?
The linked answer page discusses using inline-block to eliminate the spacing, which is occurring between your HTML elements. Those two pages also discuss resetting the font styles to fix the spacing issues.
CSS Resets can be used to fix gaps between elements. There is a list of the most popular CSS Resets at http://cssreset.com. If needed, it should be easy to extend them to override any font settings, thus normalizing how the fonts treat the white-space characters.
So empty comments shouldn't need to be injected between HTML tags to fix spacing issues with whitespace characters. If CSS is used to fix the styles, then the HTML will be readable. If the HTML is minified, it will be harder to read & debug. I'd suggest not minifying your HTML using JavaScript. Rather, try fixing the spacing issues with CSS.
(As for how minification works under its hood... see my answer at this SO question.)

Minify HTML in the browser with vanilla JS.
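const minify_html = (dom_node) => {
  // Copy the live NodeList first: removing children while iterating it skips nodes
  Array.from(dom_node.childNodes).forEach(node => {
    if (node.nodeType === Node.TEXT_NODE) {
      // Whitespace-only text nodes are the indentation and line breaks
      if (node.nodeValue.trim().length === 0) {
        dom_node.removeChild(node);
      }
    } else if (node.nodeType === Node.ELEMENT_NODE) {
      // Recurse so that nested indentation is removed as well
      minify_html(node);
    }
  });
};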
I created an example with 1,000 elements, and my computer can minify the html in less than 15ms, but it may be slower or faster depending on the device running the code.
https://jsfiddle.net/shwajyxr/

Can't create some kinds of nodes from HTML strings using innerHTML

The following javascript (run in the chrome console) does not do what I'd expect:
> var elem = document.createElement("foo");
undefined
> elem.innerHTML = "<tr></tr>"
"<tr></tr>"
> elem.outerHTML
"<foo></foo>"
The <tr> tag has disappeared!
This seems specific to table-related elements. Using <div> or <span> works as expected.
I expect what I'm doing is invalid, as "foo" is not a known element, and presumably table-related elements can only appear within a <table>. Interestingly, the following code works just fine:
> var elem = document.createElement("foo"), tr = document.createElement("tr");
> elem.appendChild(tr);
> elem.outerHTML
"<foo><tr></tr></foo>"
So it seems like the construction itself (a <tr> not within a <table>) is allowable, but the method of using innerHTML to place it there does not work. Perhaps this goes through some HTML cleanup, which removes things that are not strictly valid, while creating DOM nodes directly is not subject to the same validation.
My question: is there any way to populate an arbitrary DOM node from a string without running into such cleanup / validation issues? My use case will end up with a perfectly valid structure (I plan to place this as the child of a <table> sometime later), but the browser is stopping me while I'm trying to build the individual parts.
It sounds a little like DocumentFragment should be what I'm looking for, but as far as I can tell those are only constructable programmatically - they don't support innerHTML.
Some background on why I want to do this:
My use case is javascript-based live templating (i.e not outputting html strings, but actual DOM nodes). So the requirements are:
template input must be allowed to be arbitrary HTML (this is why I'm using innerHTML and not constructing nodes programmatically)
it must be possible to create sub-templates that are then attached into a larger document (that's why I can't just create the whole <table> at once).
The second point is how I encountered this bug. My template contains a sub-template.
var row = Html("<tr></tr>");
var table = Html(["<table><thead>", row, "</thead></table>"]);
I will later add code like:
row.append(Html(["<td>", column.header, "</td>"]));
to actually populate the columns. So when it's fully constructed, the html will be valid. But in the intermediate stages, each template / snippet is constructed under a single element. That means that templates like:
Html(["Hello <span>", name, "</span>"]);
still come out as a single node (so that they can be manipulated as a single entity):
<foo>Hello <span>bob</span></foo>
When the template results in only a single child inside the <foo>, the outer node is removed. But during construction, the row template above should look like <foo><tr></tr></foo>. Due to the validation behaviour I'm seeing when using innerHTML it just ends up as <foo></foo>.
I've checked that all the code works the same in both Firefox and Chrome, so I don't think I'm just hitting a browser bug.

Unfortunately the answer to your general question is no, there is no way to use innerHTML to add arbitrarily incomplete HTML fragments. I know this is not the answer you want to hear but that's the way it is.
One of the most misunderstood things about innerHTML stems from the way the API is designed. It overloads the + and = operators to perform DOM insertion. This tricks programmers into thinking that it is merely doing string operations when in fact innerHTML behaves more like a function than a variable. It would be less confusing to people if innerHTML were designed like this:
element.innerHTML('some <b>html</b> here');
Unfortunately, it's too late to change the API, so we must instead understand that it really is an API rather than merely an attribute/variable.
Now, to understand the so-called "validation" behavior of innerHTML: when you modify innerHTML it triggers a call to the browser's HTML compiler. It's the same compiler that compiles your HTML file/document. There's nothing special about the HTML compiler that innerHTML calls. Therefore, whatever you can do in an HTML file you can pass to innerHTML (the one exception being that embedded scripts don't get executed, probably for security reasons).
This makes sense from the point of view of a browser developer. Why include two separate HTML compilers in the browser? Especially considering the fact that HTML compilers are huge, complex beasts.
The downside to this is that incomplete HTML will be handled the same way it is handled in HTML documents. In the case of <td> elements not inside a table, most browsers will simply strip them away (as you've observed for yourself). That is essentially what you're trying to do: create invalid/incomplete HTML.
There are two workarounds to this:
Extract the table from the page, then use string processing (regex, etc.) to insert the <td> into the table string, then innerHTML the whole table back into the page.
Parse the inserted HTML string and if you find any <td> or <tr> (or <option>) extract out the html element and insert it using DOM methods.
Unfortunately both are quite painful.

Mihai Stancu's comment about jQuery made me think: surely jQuery manages this if you call $("<tr></tr>"). I know jQuery has a shortcut for strings that look like single tags, but it must work for complex HTML as well.
So I took a dive into the jQuery source code, and found just the ticket:
https://github.com/jquery/jquery/blob/6a0ee2d9ed34b81d4ad0662423bf815a3110990f/src/manipulation.js#L450
It's using a regex to detect just the name of the first tag in the string, then using this info to figure out what "context" it needs to wrap it in for the innerHTML process to treat it as valid. I think this technique should work for all well-formed inputs.
I've adopted this code into a standalone function which will turn an arbitrary string into a DOM node:
https://gist.github.com/gfxmonk/5299096
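The same wrap-in-context technique can be sketched in plain JavaScript. This is neither jQuery's code nor the gist's: the wrapper table below is a cut-down, hypothetical version covering a few table and select elements, and `firstTagName`, `wrapForParsing`, and `htmlToFragment` are illustrative names. The string-handling functions run anywhere; `htmlToFragment` needs a browser `document`.

```javascript
// Hypothetical wrapper table: [depth, opening tags, closing tags] that the
// parser needs around a fragment starting with the given tag.
const WRAP_MAP = {
  option: [1, "<select multiple>", "</select>"],
  thead: [1, "<table>", "</table>"],
  tbody: [1, "<table>", "</table>"],
  tr: [2, "<table><tbody>", "</tbody></table>"],
  td: [3, "<table><tbody><tr>", "</tr></tbody></table>"],
  th: [3, "<table><tbody><tr>", "</tr></tbody></table>"],
};
const NO_WRAP = [0, "", ""];

// Name of the first tag in the string, e.g. "tr" for "<tr><td>x</td></tr>".
function firstTagName(html) {
  const match = /<([a-z][^\/\0>\x20\t\r\n\f]*)/i.exec(html);
  return match ? match[1].toLowerCase() : "";
}

// Wrap the HTML in whatever context the parser needs, and report how many
// levels deep the original nodes will end up.
function wrapForParsing(html) {
  const [depth, open, close] = WRAP_MAP[firstTagName(html)] || NO_WRAP;
  return { depth, html: open + html + close };
}

// Browser-only: parse the wrapped string, walk down to the original nodes,
// and move them into a DocumentFragment.
function htmlToFragment(html) {
  const { depth, html: wrapped } = wrapForParsing(html);
  const container = document.createElement("div");
  container.innerHTML = wrapped;
  let parent = container;
  for (let i = 0; i < depth; i++) {
    parent = parent.lastChild;
  }
  const fragment = document.createDocumentFragment();
  while (parent.firstChild) {
    fragment.appendChild(parent.firstChild);
  }
  return fragment;
}
```

For the question's case, `htmlToFragment("<tr></tr>")` would yield a fragment containing the `<tr>`, which can then be attached to the `<foo>` wrapper with `appendChild`, the DOM method the question already found to work.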

Generating/selecting non-standard HTML tags with jQuery, a good idea?

I've noticed that jQuery can create, and access non-existent/non-standard HTML tags. For example,
$('<fake></fake>').html('blah').appendTo('body'); // .append() returns body, so chaining .html() onto it would overwrite the whole body
var foo = $('fake').html(); // foo === 'blah'
Will this break in some kind of validation? Is it a bad idea, or are there times this is useful? The main question is, although it can be done, should it be done?
Thanks in advance!

You can use non-standard HTML tags and most of the browsers should work fine, that's why you can use HTML5 tags in browsers that don't recognize them and all you need to do is tell them how to style them (particularly which tags are display: block). But I wouldn't recommend doing it for two reasons: first it breaks validation, and second you may use some tag that will later get added to HTML and suddenly your page stops working in newer browsers.

The biggest issue I see with this is that if you create a tag that's useful to you, who's to say it won't someday become standard? If that happens it may end up playing a role or get styles that you don't anticipate, breaking your code.

The rules of HTML do say that if manipulated through script the result should be valid both before and after the manipulation.
Validation is a means to an end, so if it works for you in some way, then I wouldn't worry too much about it. That said, I wouldn't do it to "sneak" past validation while using something like Facebook's <fb:fan /> element; I'd just suck it up and admit the code wasn't valid.

HTML as such allows you to use any markup you like. Browsers may react differently to unknown tags (and don't they react differently to known ones, too?), but the general bottom line is that they ignore unknown tags and try to render their contents instead.
So technically, nothing is stopping you from using <fake> elements (compare what IE7 would do with an HTML5 page and the new tags defined there). HTML standardization has always been an after-the-fact process. Browser vendors invented tags and at some point the line was drawn and it was called HTMLx.
The real question is whether you positively must do it, and whether you care if the W3C validator likes your document or not. Or whether you care if your fellow programmers like your document or not.
If you can do the same and stay within the standard, it's not worth the hassle.

There's really no reason to do something like this. The better way is to use classes like
<p class = "my_class">
And then do something like
$('p.my_class').html('bah');
Edit:
The main reason that it's bad to use fake tags is because it makes your HTML invalid and could screw up the rendering of your page on certain browsers since they don't know how to treat the tag you've created (though most would treat it as some kind of DIV).
That's the main reason this isn't good, it just breaks standards and leads to confusing code that is difficult to maintain because you have to explain what your custom tags are for.
If you were really determined to use custom tags, you could make your web page a valid XML file and then use XSLT to transform the XML into valid HTML. But in this case, I'd just stick with classes.

Use of CODE tag in HTML, How do I get it to display the code?

Quick question,
If I want to document some code on a basic HTML page and put it within a CODE tag, how can I quickly convert the code between those tags so that it displays properly when the page renders? I know I can write a JavaScript find and replace and search through the string over and over until it's done replacing all the characters, but is there a better way?
Or, is there a jQuery way to do it if I need to use javascript?

I think the <code> tag is more for displaying with a certain font, rather than layout. <code> seems to just use a monospaced font.
You might be looking for the <pre> tag (for pre formatted). That will preserve line breaks and spaces.
If the code you are trying to display is HTML code itself, then I think you'd have to change all the <'s to &lt;'s ahead of time.
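Rather than repeated find-and-replace passes, a single `replace` with a lookup table escapes every special character in one go. This is a minimal sketch (`escapeHtml` is a hypothetical helper, not a built-in). In the browser, assigning the raw code to an element's `textContent` (or using jQuery's `.text()`) achieves the same effect without any escaping code at all.

```javascript
// Characters that would otherwise be parsed as markup.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape a code sample so it can be placed inside <pre><code> as literal text.
// One pass over the string means "&" is never escaped twice.
function escapeHtml(code) {
  return code.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log("<pre><code>" + escapeHtml('<a href="#">link</a>') + "</code></pre>");
```

Wrapping the result in `<pre>`, as suggested above, keeps the line breaks and indentation of the sample intact.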

Sounds like you may be looking for syntax highlighting. Take a look at Google's syntax highlighter.
