I'm trying to make a parser for formatting JavaScript in a contextual format. First I want to be able to convert the input JavaScript into a single line, and then format the code based on my requirements. This does not remove all of the newlines or whitespace:
txt = $.trim(txt);
txt = txt.replace("\n", "");
How can I convert the text into one line?
Use a regular expression with the "global" flag set:
txt = txt.replace(/\n/g, "");
However, you should be careful about removing line breaks in JavaScript: you might break code that depends on automatic semicolon insertion. Why not use an off-the-shelf parser like Esprima?
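To see the semicolon-insertion risk concretely, here is a small sketch. The sample source is made up for illustration, and the Function constructor is used only to check whether a string parses as a function body; the compiled function is never called.

```javascript
// Hypothetical input: statements that rely on automatic semicolon insertion.
const source = "var a = 1\nvar b = 2\nreturn a + b";

// The naive approach: delete every newline.
const oneLine = source.replace(/\n/g, ""); // "var a = 1var b = 2return a + b"

// Returns true if `code` parses as a function body, false on a SyntaxError.
function parses(code) {
  try {
    new Function(code); // compiles the code without running it
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses(source));  // true
console.log(parses(oneLine)); // false: "1var" is not valid JavaScript
```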
Use:
the \s character class, which matches any whitespace character (carriage return, line feed, tab, space, ...)
the global g flag, so that every match is replaced, not just the first one.
var text = txt.replace(/\s+/g, ' ');
Hope it helps
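A quick sketch of that one-liner on a Windows-style (\r\n) snippet, plus one caveat it shares with any whitespace-collapsing approach. The sample strings are made up for illustration.

```javascript
const source = "function add(a, b) {\r\n\treturn a + b;\r\n}";

// \s matches any whitespace (space, tab, \r, \n, ...); + together with the g flag
// collapses every run of it, anywhere in the string, into a single space.
const oneLine = source.replace(/\s+/g, " ");
console.log(oneLine); // function add(a, b) { return a + b; }

// Caveat: whitespace inside string literals is collapsed too.
const literal = 'var s = "a   b";'.replace(/\s+/g, " ");
console.log(literal); // var s = "a b";
```

Replacing with a space rather than an empty string keeps tokens such as function and add apart, which deleting the whitespace outright would not.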
If the text comes from Windows (or any other source that uses \r\n line endings), it is worth removing both characters.
Use /\r/g as well, so that all \r characters are replaced, not just the first one.
var noNewLines = txt.replace(/\r/g, "").replace(/\n/g, "");
You have to be pretty sure there are no single-line comments and no missing semicolons.
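The single-line comment problem is easy to reproduce: once the line break is gone, the // comment swallows the rest of the code. The sample is made up, and the Function constructor is used here only as a quick way to run the snippet.

```javascript
// Hypothetical input containing a single-line comment.
const source = "let x = 1; // start at one\nx = x + 1;\nreturn x;";

const oneLine = source.replace(/\r/g, "").replace(/\n/g, "");
// oneLine is "let x = 1; // start at onex = x + 1;return x;"

console.log(new Function(source)());  // 2
console.log(new Function(oneLine)()); // undefined: everything after // is now a comment
```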
You can try to minify your code using a tool such as https://javascript-minifier.com/
However, this will also change your variable names.
I am attempting to generate some code using escodegen's .generate() function, which returns a string.
Unfortunately, it does not remove all of the semicolons (only those on blocks of code), so I need to remove them myself. I am using the .replace() function, but for some reason the semicolons are not removed.
Here is what I currently have:
generatedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions)
const cleanGeneratedCode = generatedCode.replace(';', '')
console.log('cleanGeneratedCode ', cleanGeneratedCode) // string stays the exact same.
Am I doing something wrong or missing something perhaps?
As per MDN, if you provide a substring instead of a regex as the pattern:
It is treated as a verbatim string and is not interpreted as a regular expression. Only the first occurrence will be replaced.
So the output probably isn't exactly the same as the generated code; rather, only the first semicolon has been removed. To remedy this, simply use a regex with the "global" flag (g). An example:
const cleanGeneratedCode = escodegen.generate(esprima.parseModule(code), escodegenOptions).replace(/;/g, '');
console.log('Clean generated code: ', cleanGeneratedCode);
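A small sketch of the difference, run on a made-up string standing in for escodegen's output (escodegen itself is not needed here). It also shows a side effect worth knowing: a blanket /;/g removes semicolons inside string literals and for (...) headers too. String.prototype.replaceAll (ES2021) is an alternative to the global regex.

```javascript
// Made-up stand-in for escodegen.generate(...) output.
const generatedCode = "const a = 1;\nconst b = 'x;y';\nfor (let i = 0; i < 2; i++) {}";

const countSemicolons = (s) => s.split(";").length - 1;

// String pattern: only the first ";" is removed.
const firstOnly = generatedCode.replace(";", "");
// Global regex: every ";" is removed (replaceAll with a string does the same).
const allRemoved = generatedCode.replace(/;/g, "");
const allRemovedToo = generatedCode.replaceAll(";", "");

console.log(countSemicolons(generatedCode)); // 5
console.log(countSemicolons(firstOnly));     // 4
console.log(countSemicolons(allRemoved));    // 0
console.log(allRemoved === allRemovedToo);   // true
console.log(allRemoved); // the literal is now 'xy' and the for header no longer parses
```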
Through database queries, I am retrieving data that I previously inserted through an HTML textarea or input. When I get the response from my DB as a JSON object, the text field looks like this:
obj : {
text : [some_text] ↵ [some_text]
}
I tried to replace it with this function:
string_convert = function(string){
return string.replace("↵",'<br>')
.replace('&crarr','<br>')
.replace('/[\n\r]/g','<br>');
}
I have to show this string in HTML, but it does not seem to work. I'm using UTF-8.
Any advice?
The problem you have is that you have enclosed your regex in quotes. This is incorrect.
.replace('/[\n\r]/g','<br>');
         ^         ^
         remove these two quotes
The quotes are unnecessary because the regex is already delimited by the slashes.
By putting quotes in there, you've actually told it that you want to replace a fixed string rather than a regular expression. The fixed string may look like an expression, but with the quotes, it will just be seen as a plain string.
Remove the quotes and it will be seen as an expression, and it will work just fine.
One other thing, though -- in order to make your regex work perfectly, I'd also suggest modifying it slightly. As it stands, it will just replace all the \n and \r characters with <br>. But in some cases, they may come together as a \r\n pair. This should be a single line break, but your expression will replace it with two <br>s.
You could use an expression like this instead:
/\r\n|\n|\r/g
Hope that helps.
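A corrected version of the question's function along the lines of this answer: the quotes are removed and the \r\n-aware pattern is used. The name stringConvert is just a tidier spelling of the question's string_convert, and the ↵ replacements are dropped on the assumption that ↵ is only how the console displays a newline, not a character actually stored in the data.

```javascript
// Replace each line break (\r\n, \n, or \r) with exactly one <br>.
// \r\n comes first in the alternation so a Windows line ending is consumed as one unit.
function stringConvert(text) {
  return text.replace(/\r\n|\n|\r/g, "<br>");
}

console.log(stringConvert("first line\r\nsecond line\nthird line"));
// first line<br>second line<br>third line
```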
You are missing the ending semicolons (;) in your code:
string_convert = function(aString){
return aString.replace("↵",'<br>').replace('↵','<br>');
}
This does not necessarily solve your problem, but it could.
From: Trying to translate a carriage return into a html tag in Javascript?
text = text.replace(/(\r\n|\n|\r)/g,"<br />");
We have text like:
this is a test :rep more text more more :rep2 another text text qweqweqwe.
or
this is a test :rep:rep2 more text more more :rep2:rep another text text qweqweqwe. (without space)
We need to replace :rep with TEXT1 and :rep2 with TEXT2.
Problem:
When we try to replace them using something like:
rgobj = new RegExp(":rep","gi");
txt = txt.replace(rgobj,"TEXT1");
rgobj = new RegExp(":rep2","gi");
txt = txt.replace(rgobj,"TEXT2");
We get TEXT1 for both of them, because :rep2 starts with :rep and :rep is processed first.
If you require that :rep always end with a word boundary, make it explicit in the regex:
new RegExp(":rep\\b","gi");
(If you don't require a word boundary, you can't distinguish what is meant by "hello I got :rep24 eggs" -- is that :rep, :rep2, or :rep24?)
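A minimal sketch of the word-boundary idea on the question's two sample texts. It matches both tokens with one global regex and looks each match up in a map, so the inserted text is never rescanned. With separate sequential passes the no-space case can still go wrong: after the first pass, a remaining token can sit directly against word characters (":rep2:rep" becomes ":rep2TEXT1"), so it no longer ends at a word boundary. The map and expandTokens are illustrative names; the lookup is case-sensitive, unlike the question's "gi" flags.

```javascript
// Illustrative token map taken from the question's example.
const replacements = { ":rep": "TEXT1", ":rep2": "TEXT2" };

// \b after each token requires it to end at a word boundary, so ":rep\b"
// cannot match the start of ":rep2" (or ":rep24").
const tokenPattern = /:rep2\b|:rep\b/g;

function expandTokens(text) {
  // One pass: each token is replaced once and the inserted text is not rescanned.
  return text.replace(tokenPattern, (token) => replacements[token]);
}

console.log(expandTokens("this is a test :rep more text more more :rep2 another text text qweqweqwe."));
// this is a test TEXT1 more text more more TEXT2 another text text qweqweqwe.
console.log(expandTokens("this is a test :rep:rep2 more text more more :rep2:rep another text text qweqweqwe."));
// this is a test TEXT1TEXT2 more text more more TEXT2TEXT1 another text text qweqweqwe.
```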
EDIT:
Based on the new information that the match strings are provided by the user, the best solution is to sort the match strings by length, longest first, and perform the replacements in that order. That way the longest strings get replaced first, eliminating the risk that the beginning of a long string will be partially replaced by a shorter substring match included in that long string. Thus, :replongeststr is replaced before :replong, which is replaced before :rep.
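For user-provided match strings, the sort-by-length idea can be sketched as a single pass: sort the keys longest first, escape them, and join them into one alternation. JavaScript tries alternatives left to right, so at each position the longest key that matches wins. escapeRegExp and replaceLongestFirst are hypothetical helpers, not built-ins.

```javascript
// Hypothetical helper: escape regex metacharacters so user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace every key of `map` in `text`, trying longer keys first.
function replaceLongestFirst(text, map) {
  const keys = Object.keys(map).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return text;
  const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "g");
  // A callback (rather than a replacement string) avoids special "$" patterns in user values.
  return text.replace(pattern, (match) => map[match]);
}

const userMap = { ":rep": "TEXT1", ":replong": "LONG", ":replongeststr": "LONGEST" };
console.log(replaceLongestFirst(":replongeststr :replong :rep", userMap));
// LONGEST LONG TEXT1
```

Doing everything in one replace call also means a replacement value is never matched again by a later key, which sequential passes cannot guarantee.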
If your data is always consistent, replace :rep2 before :rep.
Otherwise, you could search for :rep\s, searching for the space after the keyword. Just make sure you replace the space as well.
OK, I have the following data in my div:
<div id="mydiv">
<!--
what is your present
<code>alert("this is my present");</code>
where?
<code>alert("here at my left hand");</code>
oh thank you! i love you!! hehe
<code>alert("welcome my honey ^^");</code>
-->
</div>
What I need to do is get all the scripts inside the <code> blocks, plus the HTML text nodes, without removing the HTML comments inside. It's homework given by my professor, and I can't modify that div block.
I need to use regular expressions for this, and this is what I did:
var block = $.trim($("div#mydiv").html()).replace("<!--","").replace("-->","");
var htmlRegex = new RegExp(""); //I don't know what to do here
var codeRegex = new RegExp("^<code(*n)</code>$","igm");
var code = codeRegex.exec(block);
var html = "";
It really doesn't work. Please don't just give me the exact answer; please teach me. Thank you!
I need the variable code to contain the following blocks:
alert("this is my present");
alert("here at my left hand");
alert("welcome my honey ^^");
and these are the blocks I need for the variable html:
what is your present
where?
oh thank you! i love you!! hehe
My question is: what regex pattern gets the results above?
Parsing HTML with a regular expression is not something you should do.
I'm sure your professor thinks they were really clever: that there's no way to access the DOM API, and that they can wave a banner around and justify some minor corner case where using regex to parse the DOM is sometimes okay.
Well, no, it isn't. If you have complex code in there, what happens? Your regex breaks, and perhaps becomes a security exploit if this is ever in production.
So, here:
http://jsfiddle.net/zfp6D/
Walk the DOM and get the text value out of the node with nodeType 8 (a comment).
Invoke the HTML parser (the thing browsers use to parse HTML) rather than a regex. Why you wouldn't use the HTML parser to parse HTML is totally beyond me; it's like saying "Yeah, I could nail in this nail with a hammer, but I think I'm going to just stomp on the nail with my foot until it goes in".
Find all the CODE elements in the newly parsed HTML.
Log them to console, or whatever you want to do with them.
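A browser-only sketch of the four steps above; it needs DOMParser and a live document, so it will not do anything useful in plain Node.js. The function name extractFromComments is hypothetical, and the answer's own jsfiddle may differ in detail.

```javascript
const COMMENT_NODE = 8; // same value as Node.COMMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Hypothetical helper: walk the container's children, parse each comment's text
// as HTML, and split the result into <code> contents and plain text.
function extractFromComments(container) {
  const code = [];
  const html = [];
  for (const node of container.childNodes) {
    if (node.nodeType !== COMMENT_NODE) continue;
    // Hand the comment's raw text to the browser's HTML parser.
    const doc = new DOMParser().parseFromString(node.nodeValue, "text/html");
    for (const el of doc.body.querySelectorAll("code")) {
      code.push(el.textContent);
    }
    for (const child of doc.body.childNodes) {
      if (child.nodeType === TEXT_NODE && child.nodeValue.trim() !== "") {
        html.push(child.nodeValue.trim());
      }
    }
  }
  return { code, html };
}

// Usage in the browser:
// const { code, html } = extractFromComments(document.getElementById("mydiv"));
```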
First of all, you should be aware that because HTML is not a regular language, you cannot do generic parsing using regular expressions that will work for all valid inputs (generic nesting in particular cannot be expressed with regular expressions). Many parsers do use regular expressions to match individual tokens, but other algorithms need to be built around them.
However, for a fixed input such as this, it's just a case of working through the structure you have (though it's still often easier to use different parsing methods than just regular expressions).
First, let's get all the code:
var code = '', match = [];
var regex = new RegExp("<code>(.*?)</code>", "g");
while (match = regex.exec(content)) {
code += match[1] + "\n";
}
I assume content contains the content of the div that you've already extracted. Here the "g" flag means "global" matching, so we can reuse the regex to find every match. The brackets indicate a capturing group, . means any character (except line breaks), * means repeated 0 or more times, and ? makes the repetition "non-greedy" (try removing it to see what it does).
Now we can do a similar thing to get all the other bits, but this time the regex is slightly more complicated:
new RegExp("(<!--|</code>)(.*?)(-->|<code>)", "g")
Here | means "or". So this matches all the bits that start with either "start comment" or "end code" and end with "end comment" or "start code". Note also that we now have 3 sets of brackets, so the part we want to extract is match[2] (the second set).
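Putting the two regexes together as a runnable sketch on the question's markup. One adjustment is needed for this input: . does not match line breaks, and the text between the comment markers and the <code> tags spans several lines, so the sketch adds the s (dotAll, ES2018) flag and trims the captured text. The collect helper is hypothetical.

```javascript
// The div's inner HTML from the question (assumed to be already extracted into `content`).
const content = `
<!--
what is your present
<code>alert("this is my present");</code>
where?
<code>alert("here at my left hand");</code>
oh thank you! i love you!! hehe
<code>alert("welcome my honey ^^");</code>
-->
`;

// Hypothetical helper: collect capture group `group` from every match of a global regex.
function collect(regex, text, group) {
  const results = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    results.push(match[group].trim());
  }
  return results;
}

const code = collect(/<code>(.*?)<\/code>/gs, content, 1);
const html = collect(/(<!--|<\/code>)(.*?)(-->|<code>)/gs, content, 2)
  .filter((text) => text !== ""); // drop the empty gap between the last </code> and -->

console.log(code);
console.log(html);
```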
You're doing a lot of unnecessary stuff. .html() gives you the inner contents as a string, and you should be able to use a regex to grab exactly what you need from there. Also, try to stick with regex literals (e.g. /^regexstring$/). With new RegExp you have to escape the escape characters themselves, which gets really messy. You generally only want to use new RegExp when you need to put a string variable into a regex.
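A small sketch of the escaping point: the same pattern written as a literal and as a string passed to new RegExp, followed by the case the answer reserves the constructor for, building a pattern from a variable. The patterns and tagName are illustrative.

```javascript
// The same pattern as a literal and via the constructor: in the string form,
// every backslash has to be doubled.
const literal = /<code>\s*(\w+)\(/;
const constructed = new RegExp("<code>\\s*(\\w+)\\(");
console.log(literal.source === constructed.source); // true

// Where the constructor is worth it: the pattern depends on a runtime value.
const tagName = "code"; // hypothetical variable
const tagPattern = new RegExp("<" + tagName + ">([^<]*)</" + tagName + ">", "g");
console.log("<code>alert(1);</code> <code>alert(2);</code>".match(tagPattern));
```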
String.prototype.match accepts a regex and, when the regex has the global flag, returns an array of every match (e.g. /^regexstring$/g; note the 'g'). I would do something like this:
var block = $('#mydiv').html(), //you can set multiple vars in one statement w/commas
matches = block.match(/<code>[^<]*<\/code>/g) || []; //match returns null when nothing matches
//[^<]* <-- 0 or more characters that aren't '<' - google 'negated character class'
//map avoids an explicit loop, and unlike join/split it needs no "safe" separator character
matches = matches.map(function (m) {
return m.replace(/<\/?code>/g, ''); //\/? <-- an optional forward slash, so both tags go
});
//Now do what you want with matches. Eval (ew) or append in a script tag (ew).
//You have no control over the 'ew'. I just prefer data to scripts in strings.
I receive some XML as plain text (and no, I can't use the DOM or JSON, because apparently I am not allowed to). I want to extract all the elements enclosed in a certain element and put them into an array, where I can strip out the text of the individual segments.
I am used to POSIX regex, and I will never really understand the point of PCRE regex, nor do I get the syntax.
Here is the code I am using:
var strResponse = objResponse.text;
var strRegex = new RegExp("<item>(.*?)<\/item>","i");
var arrMatches = "";
var match;
while (match = strRegex.exec(strResponse)) {
arrMatches[] = match[1];
}
I have no idea why this code won't find any matches. Can someone please help me with this, and perhaps explain what exactly I keep doing wrong with the PCRE syntax?
If those tags are on different lines, . will not match the newline characters, and therefore your expression will not match. This is just a guess; I don't know your source.
You can try
var strRegex = new RegExp("<item>([\\s\\S]*?)<\\/item>","i");
[\\s\\S] is a character class containing all whitespace and all non-whitespace characters, so it matches any character; line breaks are covered by the whitespace characters.
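The [\s\S] fix can be combined with two other problems in the question's code: arrMatches[] = ... is PHP syntax and a SyntaxError in JavaScript, and without the g flag exec() always searches from the start of the string, so once a match is found the while loop never ends. A sketch with all three fixed, using a made-up response string in place of objResponse.text:

```javascript
// Hypothetical stand-in for objResponse.text from the question.
const strResponse = "<list>\n  <item>\n    first\n  </item>\n  <item>second</item>\n</list>";

// [\s\S] matches any character, including line breaks.
// The g flag makes exec() continue from lastIndex instead of restarting at 0.
const strRegex = /<item>([\s\S]*?)<\/item>/gi;

const arrMatches = []; // an array, not a string
let match;
while ((match = strRegex.exec(strResponse)) !== null) {
  arrMatches.push(match[1].trim()); // JavaScript has no PHP-style arr[] = value
}

console.log(arrMatches); // [ 'first', 'second' ]
```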
The best way to complete this task is to parse it as proper HTML and navigate it with the DOM parser, as described here:
Javascript function to parse HTML string into DOM?
Regex is very error-prone here and, in general, is not well suited to parsing irregular text such as HTML structure.