Regex replace with multiple wildcards works in PHP, not in JavaScript - javascript

I'm attempting to implement center alignment for two Markdown parsers:
In PHP for Parsedown (successfully)
In JavaScript for Bootstrap Markdown (without success)
The idea I'm following and finding the easiest is to work with the final HTML output, and just snap inline styling onto the tags.
The following regex does what I need: it adds style="text-align:center;" to any element I have tried so far*:
$text = preg_replace('/\<(.*?)\>\->(.*?)<\-\<\/(.*?)\>/', '<$1 style="text-align:center;">$2</$3>', $text);
That is, <p>text</p> becomes <p style="text-align:center;">text</p>.
However, when I attempted to port this into JavaScript to also make it available for previewing on client-side, the pattern does not match as it should:
content = content.replace('/\<(.*?)\>\->(.*?)<\-\<\/(.*?)\>/', '<$1 style="text-align:center;">$2</$3>');
The replacement in content does not occur.
I'm aware there are slight differences between Regex of PHP and JavaScript, but I have found examples for all the expected behavior here on both sides, working.
*If someone is wondering by any chance, I'm also successfully adding the center alignment to tags that already have a style attribute - on server side only, so far.

You'll need to use the literal syntax for regular expression in JavaScript, like so:
content = content.replace(/\<(.*?)\>\->(.+)<\-\<\/(.+)\>/gi, '<$1 style="text-align:center;">$2</$3>');
Note that the gi at the end of the regular expression simply enables global searching (that is, replace all occurrences matching the pattern) and case-insensitive matching. They are both technically optional, but you will most likely want the g flag enabled for certain. However, keeping the i flag is up to you (depends on whether or not your content contains &GT;, for example).
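To make the difference concrete, here is a minimal sketch comparing the quoted pattern with a regex literal. The sample input and variable names are made up for illustration, the unnecessary backslashes before <, > and - are dropped, and the lazy (.*?) groups from the question are kept so that each match stays within one element:

```javascript
// Sample input using the question's "->text<-" centering markers (illustrative only).
const input = '<p>->centered<-</p> <h2>->title<-</h2>';
const replacement = '<$1 style="text-align:center;">$2</$3>';

// A quoted pattern is a plain string: replace() searches for that literal text,
// finds nothing, and returns the input unchanged.
const asString = input.replace('/<(.*?)>->(.*?)<-<\\/(.*?)>/', replacement);

// A regex literal is compiled as a pattern; the g flag replaces every occurrence.
const asRegex = input.replace(/<(.*?)>->(.*?)<-<\/(.*?)>/g, replacement);

console.log(asString === input); // true
console.log(asRegex);
// <p style="text-align:center;">centered</p> <h2 style="text-align:center;">title</h2>
```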

Delimiting documents with regular expressions

I'm working with several documents that are stored in a single file, and before processing them I need to determine where each document begins and ends. For this, I am using the following regex:
MINISTÉRIO\sDO\sTRABALHO\sE\sEMPREGO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)*)PÁG\s:\s\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)*)Z6:\s\d+
Example is here
It works perfectly on that input. The problem is that sometimes the text does not arrive in the form I showed; it comes with extra spaces and line breaks. As you can see here, the document is the same as the previous one, but the regular expression does not match. Why does it fail, and how can I fix it?
Also, I need to modify the regex, not the text, because the regex is the only part I have access to.
Note: I'm using Node.js, which is why I'm tagging this post with JS.

VB.Net remove whitespace with regex excluding content inside <script>

I want to reduce the size of my HTML output stream by removing all empty lines and whitespace. However, I'm not very good at regex, and the pattern I have seems to remove more than wanted, e.g. whole script blocks. How can I make sure that <script> blocks are kept intact?
This is what I have so far:
html = Regex.Replace(html, ">\s+<", "><", RegexOptions.Compiled)
I think you're looking for conditional regex. Look at the examples here: Regex Tutorial If-Then-Else
Note that regex flavors differ between systems (.NET, Python, etc.).
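An alternative that avoids conditionals is to split the markup around the script blocks and collapse whitespace only in the remaining pieces. The sketch below shows the idea in JavaScript rather than VB.Net; the function name and sample markup are hypothetical, and it assumes script blocks are not nested and that no string inside a script contains a closing script tag:

```javascript
// Collapse whitespace between tags everywhere except inside <script> blocks.
function collapseWhitespaceOutsideScripts(html) {
  // The capturing group makes split() keep each <script> block, at the odd indexes.
  const parts = html.split(/(<script\b[\s\S]*?<\/script\s*>)/i);
  return parts
    .map((part, index) => (index % 2 === 1 ? part : part.replace(/>\s+</g, '><')))
    .join('');
}

// Illustrative input: the string inside the script contains "> <", which the
// plain ">\s+<" replacement would otherwise rewrite.
const page = '<div>\n  <p>a</p>\n</div>\n<script>var s = "a >  < b";</script>\n<p>b</p>';
const minified = collapseWhitespaceOutsideScripts(page);
console.log(minified);
```

Whitespace directly next to a script block is left in place, because each piece is processed on its own.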

regex to change text inside a html tag

First of all I'm new to stackoverflow so I'm sorry if I posted this in the wrong section.
I need a regex to search within the html tag and replace the - with a _
e.g:
<TAG-NAME>-100</TAG-NAME>
would become
<TAG_NAME>-100</TAG_NAME>
note that the value inside the tag wasn't affected.
Can anyone help?
Thanks.
Since JavaScript is the language for DOM manipulation, you should generally consider parsing the XML properly and using JavaScript's DOM traversal functions instead of regular expressions.
Here is some example code on how to parse an XML document so that you can use the DOM traversal functions. Then you can traverse all elements and change their names. This will automatically exclude text nodes, attributes, comments and all other annoying things, you don't want to change.
If it has to be a regex, here is a makeshift solution. Note that it will fail badly if you have tags (or even just a >) inside attribute values or comments (in fact, it will also apply the replacement inside comments):
str = str.replace(/-(?=[^<>]*>)/g, '_');
This will match a - if it is followed by a > without encountering a < before. The concept is called a positive lookahead. The g modifier makes sure that all occurrences are replaced.
Note that this will apply the replacement to anything in front of a >. Even attribute values. If you don't want that you could also make sure that there is an even number of quotes between the hyphen and the closing >, like this:
str = str.replace(/-(?=[^<>"]*(?:"[^<>"]*"[^<>"]*)*>)/g, '_');
This will still change attribute names though.
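A quick way to see the difference between the two patterns is to run them on a tag that carries an attribute. The sample string and variable names here are made up for illustration:

```javascript
// Illustrative input: hyphens in the tag name, an attribute name,
// an attribute value, and the text content.
const sample = '<TAG-NAME data-id="a-b">-100</TAG-NAME>';

// Lookahead only: a hyphen is replaced when a ">" follows with no "<" or ">" in between.
const simple = sample.replace(/-(?=[^<>]*>)/g, '_');

// Additionally require an even number of double quotes before the closing ">",
// which skips hyphens inside quoted attribute values.
const quoteAware = sample.replace(/-(?=[^<>"]*(?:"[^<>"]*"[^<>"]*)*>)/g, '_');

console.log(simple);     // <TAG_NAME data_id="a_b">-100</TAG_NAME>
console.log(quoteAware); // <TAG_NAME data_id="a-b">-100</TAG_NAME>
```

In both cases the text content -100 is untouched, while the attribute name data-id is still rewritten.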
Here is a regexpal demo that shows what works and what doesn't work. Especially the comment behavior is quite horrible. Of course this could be taken care of with an even more complex regex, but I guess you see where this is going? You should really, really use an XML parser!
s/(\<[^\>]+\>)\-([^\<]+\<\/)/\1_\2/
I am not familiar with JS libraries, but I am pretty sure there are better options, such as a library that parses the HTML.

How to add special HTML chars without using innerHTML

So I'm working on a micro lib, html.js, which creates text nodes with document.createTextNode. When I try to create a text node containing a non-breaking space, the entity is not interpreted and I get the literal text a&nbsp;b, so I'm wondering how to escape the & char, ideally without using innerHTML.
Javascript supports the \uXXXX notation, so in the case of a non-breaking space, that would be \u00A0.
document.createTextNode('a\u00A0b');
That's as far as you can get. It's a text node, consisting only of text, and there's no difference between texts created from entity references or from normal characters.
If that's not what you want, you should take a second look at innerHTML. Can't you read it, modify it and put it back?
There's not much built-in functionality in JS to encode or decode HTML entities. There seem to be some libraries out there, though, that can help you achieve this. Here is one I found on Google; I haven't tried it, but you can check it out, or look for others.
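If only a handful of entities matter, a small decoder is easy to write by hand. This is a minimal sketch, not a complete implementation: the NAMED_ENTITIES table and the decodeEntities name are hypothetical, and only the listed names plus numeric references are handled.

```javascript
// Hypothetical lookup table; extend it with whatever named entities you need.
const NAMED_ENTITIES = { nbsp: '\u00A0', amp: '&', lt: '<', gt: '>', quot: '"' };

// Replace &name;, &#65; and &#x41; style references with the characters they stand for.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Names are case-sensitive; unknown names are left as they are.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities('a&nbsp;b') === 'a\u00A0b'); // true
```

In the browser, the decoded string can then be passed straight to document.createTextNode.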
http://www.strictly-software.com/htmlencode

Javascript syntax highlighter that plays nicely with Markdown

I've looked at a few JavaScript programs that add syntax highlighting to code blocks on a page, but all the ones I've found require setting an attribute on the code block to tell it what language is being used. I am generating the HTML with Markdown, so I have no way of setting these attributes. Are there any that will detect the language automatically, without needing an attribute to be set?
The only way I can think of this working is with a shebang line;
#!/usr/bin/ruby
def foo(bar)
bar
end
And it will know it's Ruby, and maybe even not display the shebang line (although adding a shebang to a one- or two-line fragment will get tiring).
I won't need it to handle any very obscure languages, but it would be great if I could easily write new definitions.
Thanks.
Google Prettifier should do the job. StackOverflow uses it, too (with the markup generated by Markdown). It determines the language automatically.
It's my understanding that the Markdown spec allows for the presence of actual markup as a fallback:
For any markup that is not covered by Markdown's syntax, you simply use HTML itself. There's no need to preface it or delimit it to indicate that you're switching from Markdown to HTML; you just use the tags.
The only restrictions are that block-level HTML elements -- e.g. <div>, <table>, <pre>, <p>, etc. -- must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces.
So, if you've got a syntax highlighter you really like that doesn't auto-detect, you could simply throw a literal <code> block with the appropriate attribute into your Markdown. I don't think it particularly violates the goals of Markdown, either... it's a fairly straightforward and readable indicator.
It also might not be that hard to roll your own script that executes first after the DOM is ready, finds code blocks, and inserts appropriate attributes for the syntax highlighter of your choice into them depending on a few heuristics that you devise for their contents, but if there's a library out there that already does it, obviously that has some advantages. :)
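As a rough illustration of the roll-your-own idea, the language guess can be a list of patterns tried in order. Everything here is an assumption for the sake of the sketch: the LANGUAGE_HINTS table, the function names, and the heuristics themselves, which are deliberately crude.

```javascript
// Hypothetical heuristics, tried in order; the first matching entry wins.
const LANGUAGE_HINTS = [
  { lang: 'ruby', test: /^#!.*\bruby\b/ },
  { lang: 'python', test: /^#!.*\bpython/ },
  // A "def" line with a later line consisting only of "end" looks like Ruby.
  { lang: 'ruby', test: /^\s*def\s.*\n[\s\S]*?^\s*end\s*$/m },
  // A "def name(...):" line looks like Python.
  { lang: 'python', test: /^\s*def\s+\w+\(.*\):\s*$/m },
];

// Return the guessed language name, or null when nothing matches.
function guessLanguage(code) {
  const hit = LANGUAGE_HINTS.find(({ test }) => test.test(code));
  return hit ? hit.lang : null;
}

// Drop a leading shebang line so it is not displayed.
function stripShebang(code) {
  return code.replace(/^#!.*(?:\r?\n|$)/, '');
}

const snippet = '#!/usr/bin/ruby\ndef foo(bar)\n  bar\nend';
console.log(guessLanguage(snippet)); // ruby
console.log(stripShebang(snippet));
```

In the page, a script running after the DOM is ready could loop over document.querySelectorAll('pre code'), call guessLanguage on each element's textContent, and set whatever attribute the chosen highlighter expects. New definitions are just extra entries in the table.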
