We have report pages that append a <pre> block at the bottom, listing information line by line.
Example:
<pre>
Site Report Info
This is where any error will appear as a query string of numbers: 938109283091238109281092
This is where the account ID will be.
This is where the account reference pin will be.
So on...
So on...
So on...
So on...
So on...
</pre>
Using JavaScript or jQuery (perhaps with a regex), how can I put all of this into one array where each line is an array element? I assume a regex is needed, since the lines are delimited by line breaks (\n)?
var lines = $('pre').text().split('\n') should do the trick.
You don't need to use jQuery to get the text, of course, but if you're doing web programming, jQuery is pretty ubiquitous.
You may also want to trim the results to get rid of extra whitespace (or not, depending on your application):
var lines = $('pre').text().split('\n').map(function(l) { return l.trim(); });
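Without jQuery, the same idea can be sketched in plain JavaScript. This is a minimal sketch that assumes the text comes from the `<pre>` element's `textContent`. The helper name `textToLines` and its `trim`/`dropEmpty` options are made up for illustration. Splitting on `/\r?\n/` instead of `'\n'` also copes with Windows line endings.

```javascript
// Split a block of text (e.g. the textContent of a <pre>) into an array of lines.
// /\r?\n/ handles both Unix (\n) and Windows (\r\n) line endings.
function textToLines(text, { trim = true, dropEmpty = false } = {}) {
  let lines = text.split(/\r?\n/);
  if (trim) {
    // Remove leading/trailing whitespace from every line.
    lines = lines.map((line) => line.trim());
  }
  if (dropEmpty) {
    // Drop blank lines, such as the one produced by a trailing newline.
    lines = lines.filter((line) => line.length > 0);
  }
  return lines;
}
```

In a page it would be called as `textToLines(document.querySelector("pre").textContent, { dropEmpty: true })`. Because the example `<pre>` ends with a newline right before `</pre>`, the last element is an empty string unless `dropEmpty` is set.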
Since this question is less about a specific regex problem than about design and approach, it may take a while to understand the requirements and their dependencies. I have done everything I can to make it as easy as possible with this fully working, if not elegant, solution (dead link).
I need to optimize text on a messaging platform; the text is created and edited by others and may have to be sanitized with regex. All optimizations need to be done with a single regex, since they happen often and are quite expensive (or am I wrong about this?). Furthermore, the regex needs to be language-agnostic (at least compatible with JavaScript and PHP). Last but not least, the optimized text must not contain (additional) HTML, as it is used in a text-only environment.
Requirements
Optimize lines
Remove single lines
Do not remove single lines that end with two spaces or no space (this lets editors force a newline)
Do not remove empty lines (double line breaks)
Do not remove single lines that start with symbol|char|digit|entity+space (raw lists)
Condense multiple consecutive empty lines (double line breaks) to one double line break
Optimize spaces
Remove excess spaces
Do not remove spaces at the end of a sentence
Optimize comments
Remove single line comments
Do not remove trailing comments
Overall
Preserve HTML and do not add HTML
Intermediate solution
So far, my solution combines 4 regexes that match my requirements; each match gets replaced by a single space:
Matches single lines while leaving empty lines intact and preserving raw lists: " \n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][.):*] )" (note the leading space: only a line break preceded by a space is matched; the length is due to several list-style types I want to support)
Matches excess empty lines: (\n+)(?=\n\n)
Matches excess spaces (two or more in a row): " + "
Matches single line comments (while ignoring trailing comments): ^\n?\/\/ .+\n
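To keep the optimization inexpensive, I concatenate them with | into a single regex that I can use in JavaScript (as well as PHP).
// The four patterns joined with | into one global, multiline regex
const r = new RegExp(" \n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][.):*] )|(\n+)(?=\n\n)| + |^\n?\/\/ .+\n", "gm");
// Read the textarea's raw text via .value: innerHTML returns it HTML-escaped,
// so a leading ">" arrives as "&gt;" and is not recognised as a list marker.
const i = document.getElementById("input").value;
const p = " ";
const o = i.replace(r, p);
document.getElementById("output").value = o;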
#input, #output { width: 100%; height: 88vh; }
#input { display: none; } #output { border: none; }
<textarea id="input">
MAKE PARAGRAPHS
This is the first paragraph.
Some sentences end with newlines.
Some don't. We need to cope with that.
This is the second paragraph.
It contains some unnecessary spaces.
Even at the end of a line.
This is the third paragraph.
Some sentences end with question- and exclamation-marks.
I hope that is ok for you. Is it? That's great! Really.
KEEP LISTS
This is an unordered list, starting with a minus+space:
- This is the first item.
- This is the second item.
- This is the third item.
Here is an unordered list, starting with entity|symbol+space:
• This is the second item.
> This is the third item. // Works in php only
* This is the fifth item.
This is a (manually) ordered lists, starting with char|digit+entity+space:
1. This is the first item.
b) This is the second item.
3: This is the third item.
Here is a mathematical list, starting with operators:
+ Plus
- Minus
% Percentage
/ Division
* Multiply
~ Like
= Equal
These are (manually) ordered lists, which are not summed up because they do not end with a space:
1 This is the first item.
b This is the second item.
I like the third item.
First: This works.
Second: It works great.
Third: That is nice!
KEEP HTML
The input text may contain Html.
The output text must simply keep it for further processing.
The output must not add Html as it is processed in a text-only environment.
I know this sounds stupid, but it isn't.
REMOVE COMMENTS
Single/whole line comments are being removed.
// Sources
// Removing single lines: https://regex101.com/r/qU1eP8/5
// Removing comments: https://www.perlmonks.org/?node_id=996552
// Tests
// Dialog: https://api.sefzig.net/dialog/test/regex/
// Jsbin: https://jsbin.com/goromad/edit?output
// Regex101: https://regex101.com/r/Xz5atA/2
// Regexr: https://regexr.com/45svm
Thank you, regex ♥ // Problem solved
~Fin~
</textarea>
<textarea id="output"><!-- Press "Run" --></textarea>
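One way to keep the single-regex constraint while making each requirement readable and testable on its own is to define the four patterns separately and join their sources at startup. This is a sketch assuming JavaScript as the target; the rule names and the `optimize` function are hypothetical, while the patterns themselves are copied from the combined regex in the question. Each source is wrapped in a non-capturing group so that a future rule containing a top-level `|` cannot change the alternation.

```javascript
// Each requirement as its own regex (patterns copied from the question).
// The rule names are hypothetical labels, not part of the original code.
const rules = {
  // A space + line break not followed by another break or a raw-list marker.
  singleLineBreak: / \n(?!\n|[-_.○•♥→›>+%\/*~=] |[a-zA-Z_1-9+][.):*] )/,
  // Extra line breaks in front of a final double line break.
  excessEmptyLines: /(\n+)(?=\n\n)/,
  // Two or more consecutive spaces.
  excessSpaces: / + /,
  // A whole-line "// " comment at the start of a line.
  lineComment: /^\n?\/\/ .+\n/,
};

// Join the sources into one regex; (?:...) keeps each rule's alternation self-contained.
const combined = new RegExp(
  Object.values(rules)
    .map((re) => "(?:" + re.source + ")")
    .join("|"),
  "gm"
);

// Hypothetical wrapper: replace every match with a single space, as in the question.
function optimize(text) {
  return text.replace(combined, " ");
}
```

The behaviour is the same as the question's regex, including the space that is left where a run of empty lines was collapsed (`"a\n\n\n\nb"` becomes `"a \n\nb"`). In the page it would be called on the textarea's text, e.g. `optimize(document.getElementById("input").value)`.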
My request
Since I am not a regex expert and my approach feels cumbersome, I'd like to hear your suggestions. I know regexes are expensive and that everything can be done better.
You might wonder about a few details I haven't mentioned here for the sake of clarity. You might also want to test my regexes. This is why I have set up a sandbox that isolates the requirements (regexes) and contains an example text with all use cases as well as a detailed description:
https://api.sefzig.net/dialog/test/regex/ (dead link)
In case you want to use the features of great tools out there, here you go:
Regexr: https://regexr.com/45svm
Regex101: https://regex101.com/r/Xz5atA/2
Jsbin: https://jsbin.com/goromad/edit?output
Thank you
for helping me straighten out this important feature of my messaging platform! Please feel free to enhance my approach, suggest an alternative, or use the results in your own project ♥
This is my first question on Stack Overflow. I have researched a lot. Please bear with me if I have done anything wrong, and help me fix it.
In Chrome Developer Tools I can see the line breaks ("↵") in a string:
<br>↵↵<br>Event Status: confirmed↵<br>Event Description: Hog Day (Night )and Hog Day (Day)↵↵Friday...
If I double-click this and paste it into Notepad, the line breaks are preserved.
When I save the string to an object like so,
var summary = el.find("summary").text();
var volunteerEvent = {title: title, summary: summary}
and eventually display it on a page,
$('#volunteerEventDescription').empty().html(event.summary);
the line breaks are gone and it's a huge chunk of text.
How do I keep the newlines?
I see two obvious options. Which one is the right one for you depends on how much control over the formatting you want.
Use the pre tag and the newlines will be respected. pre is for preformatted text and uses a monospaced (non-proportional) font by default, so it may not render as you would wish. See pre on MDN for more details.
Replace the new lines with the br tag. You can do this with a regular expression: stringValue.replace(/\n/g, '<br/>'). A more robust regular expression is present on another question: jQuery convert line breaks to br (nl2br equivalent).
A JavaScript equivalent of PHP's nl2br function can be found in php.js: http://phpjs.org/functions/nl2br/. nl2br, as the name might subtly suggest, converts newlines to break tags.
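A minimal sketch of the second option as a reusable helper. The name `nl2br` mirrors PHP's function, but this is not the php.js code, and its `escape` option is an assumption of this sketch. Like PHP's nl2br, it keeps each original line break and inserts a tag in front of it (`$&` in the replacement string stands for the whole match), and it treats `\r\n`, `\r` and `\n` as line breaks. Escaping is off by default because the summary in the question already contains `<br>` markup; turn it on for untrusted text.

```javascript
// Insert a <br> tag before every line break (\r\n, \r or \n), keeping the break
// itself, similar in spirit to PHP's nl2br. "$&" in the replacement is the match.
function nl2br(str, { escape = false } = {}) {
  let s = String(str);
  if (escape) {
    // Optional: neutralise existing markup first, for untrusted text.
    s = s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  return s.replace(/\r\n|\r|\n/g, "<br>$&");
}
```

With jQuery, the display line from the question would become `$('#volunteerEventDescription').empty().html(nl2br(event.summary));`.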
I'm trying to parse a text file and store it in an array, but I can't seem to get rid of the unneeded characters.
For example, some of the text will be "fi nd" or "job;" or "writ,er".
Right now I'm using
lettersTemp = InputDataLine.match(/[a-zA-Z]['*]/);
to parse the text file, but that obviously isn't working, because I'm pulling the entire string and not getting rid of the extra characters. Does anyone have some advice or an easier way to do this?
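Is the result supposed to be a string that consists only of letters (without spaces separating words)? If so, you can use the following code to extract the letters:
lettersTemp = InputDataLine.match(/[a-zA-Z]+/g);
This returns an array of the runs of letters in the line (or null if there are none); join('') turns it back into one string. Then you can append each line's result to an array.
If you want to append each individual word of each line to the array, that seems impossible, since spaces can appear inside words (as in "fi nd") and so cannot be used to find word boundaries.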
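If the goal is one letters-only string per line (so "fi nd" becomes "find"), a sketch could strip every non-letter with `replace` instead of collecting runs with `match`, which also avoids `match` returning `null` for a line without letters. The function name `lettersPerLine` is hypothetical, and, like the question's `[a-zA-Z]`, it assumes ASCII letters only.

```javascript
// Hypothetical helper: one letters-only string per non-empty line.
function lettersPerLine(fileText) {
  return fileText
    .split(/\r?\n/)
    // Remove everything that is not an ASCII letter: spaces, punctuation, digits.
    .map((line) => line.replace(/[^a-zA-Z]/g, ""))
    // Skip lines that had no letters at all.
    .filter((line) => line.length > 0);
}
```

For the examples in the question, `lettersPerLine("fi nd\njob;\nwrit,er")` gives `["find", "job", "writer"]`.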
I am trying to replace two multi-line comments (on a single line), together with the JavaScript code between them. I am using a build tool that reads the entire file, and I need to replace a specific string (made up of comments) during the build.
Example:
var data = /*testThisDelete:start*/new Date();/*testThisDelete:end*/
Once replaced, it should look like this:
var data = 4.6.88
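Try something like this to get started (a regex literal keeps \* escaped; in a '...' string passed to new RegExp it would have to be written \\*, otherwise it collapses to a bare *):
"your file as a string".replace(/\/\*testThisDelete:start\*\/[\s\S]*?\/\*testThisDelete:end\*\//, '"replacement text"');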
See this post for a lot of useful additional info: JavaScript replace/regex
Are you looking for:
^.+?(\/\*testThisDelete:start\*\/.+?\/\*testThisDelete:end\*\/)$
With this, you should be able to replace the first captured group (the two marker comments and the code between them) with whatever you want.
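For the build step, a small helper could build the pattern from the marker name. This is a sketch; `replaceMarkedBlock` and `escapeRegExp` are hypothetical names (the escaping character set is the one commonly used for this purpose, e.g. on MDN). It uses a lazy `[\s\S]*?` so a block may span several lines and several blocks in one file are replaced separately, and a replacer function so that `$` sequences in the replacement are inserted literally.

```javascript
// Escape characters that have a special meaning in a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every /*NAME:start*/ ... /*NAME:end*/ block (markers included).
function replaceMarkedBlock(source, name, replacement) {
  const n = escapeRegExp(name);
  // [\s\S]*? matches lazily across line breaks, so each block ends at its own end marker.
  const re = new RegExp(`/\\*${n}:start\\*/[\\s\\S]*?/\\*${n}:end\\*/`, "g");
  // A replacer function inserts `replacement` literally ("$" patterns are not expanded).
  return source.replace(re, () => replacement);
}
```

Applied to the example, `replaceMarkedBlock(fileText, "testThisDelete", "4.6.88")` turns the line into `var data = 4.6.88`.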
I have something like the following:
<--customMarker>Test1<--/customMarker>
<--customMarker key='myKEY'>Test2<--/customMarker>
<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>
I need to be able to replace the text between the customMarker tags. I tried the following:
str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, 'item Replaced')
which works. I would also like to ignore the custom inner tags, so that they are neither matched nor replaced with text.
Also, I need a separate expression to extract the value of the attribute key='myKEY' from the tag containing Test2.
Many thanks
EDIT
Actually, I am trying to find things between comment tags, but the comment tags were not displaying correctly, so I had to remove the '!'. There's a unique situation that requires comment tags. In any case, if anyone knows enough regex to help, it would be great. Thank you.
In the end, I did something like the following, in case anyone else needs it. Note, though, that using regex on HTML tags is generally considered a bad idea, so do your own research and make up your own mind. For me, it had to be done this way, mostly because I wanted to, but also because it simplified the job in this instance:
var retVal = str.replace(/<--customMarker(?:\s[^>]*)?>(.*?)<--\/customMarker>/g, function(token, match){
//question 1: I would like to also ignore custom inner tags and not match or replace them with text.
//answer: (the inner tags end in ">", like the outer ones; the outer pattern also accepts attributes such as key='myKEY')
var replacePattern = /<--customInnerMarker>.*?<--\/customInnerMarker>/g;
//remove inner tags from match
match = match.replace(replacePattern, '').trim();
//question 2: Also I need a separate expression to extract the value of the attribute key='myKEY' from the tag with Test2.
//answer (placed before the return so it is reachable; accepts single or double quotes):
var attrPattern = /\w+\s*=\s*(['"]).*?\1/g;
var attrMatches = token.match(attrPattern);//returns a list of attributes as name/value pairs in an array, or null
//replace and return what is left with a required value (objParams is defined elsewhere)
return token.replace(match, objParams[match]);
})
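For question 2 on its own, a capture group can return just the attribute value rather than the whole `key='myKEY'` pair. This is a sketch; the function name `extractMarkerKeys` is hypothetical, and it assumes the `<--customMarker ...>` tag format shown in the question, with the value in single or double quotes.

```javascript
// Hypothetical helper: the value of key='...' (or key="...") from every
// <--customMarker ...> opening tag in the string.
function extractMarkerKeys(str) {
  // \1 requires the closing quote to match the opening one.
  const re = /<--customMarker\b[^>]*?\bkey\s*=\s*(['"])(.*?)\1[^>]*>/g;
  // matchAll needs the g flag; group 2 is the attribute value.
  return Array.from(str.matchAll(re), (m) => m[2]);
}
```

On the three sample lines from the question, only the Test2 tag has a key, so the result is `["myKEY"]`.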
Can't you use <customMarker> instead? Then you can just use getElementsByTagName('customMarker') and get the inner text and child elements from it.
A regex merely matches an item. Once you have that match, it is up to you what you do with it. This is part of the problem most people have with regular expressions: they try to combine the three different steps. The regex match is just the first step.
What you are asking for will not be possible with a single regex. You're going to need a mini state machine if you want to use regular expressions. That is, a logic wrapper around the matches such that it moves through each logical portion.
I would advise you to look in the standard API for a prebuilt engine to parse HTML rather than rolling your own. If you do need to roll your own, read the flex manual to get a basic understanding of how regular expressions work and of the state machines you build with them. The best example is the section on matching multi-line C comments.
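To illustrate the "mini state machine" idea from this answer (rather than the prebuilt-parser route it recommends), here is a sketch in which the regex only finds marker tags and a stack tracks nesting, so only text directly inside an outer customMarker is transformed. The function name `replaceOuterText`, the `transform` callback, and the tag pattern `<--name ...>` / `<--/name>` are assumptions based on the question's sample.

```javascript
// Tokenize with a regex, then walk the tokens with a stack (a tiny state machine).
// Only text directly inside an outer customMarker is passed to transform();
// text inside nested markers (e.g. customInnerMarker) is copied unchanged.
function replaceOuterText(str, transform) {
  const tagRe = /<--(\/?)(\w+)[^>]*>/g; // matches <--name ...> and <--/name>
  const stack = [];
  let out = "";
  let last = 0;
  let m;
  while ((m = tagRe.exec(str)) !== null) {
    const text = str.slice(last, m.index);
    const directlyInOuter = stack.length === 1 && stack[0] === "customMarker";
    out += directlyInOuter ? transform(text) : text;
    const [tag, closing, name] = m;
    if (closing) {
      // Pop only when the closer matches the innermost open tag.
      if (stack[stack.length - 1] === name) stack.pop();
    } else {
      stack.push(name);
    }
    out += tag;
    last = tagRe.lastIndex;
  }
  return out + str.slice(last);
}
```

`transform` is called once per text segment, so a marker with an inner marker in the middle (the Test3 line) produces two calls: one for "Test3 " and one for the space after the inner marker.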