preg replace: replace all line breaks outside a title attribute - javascript

In general I just want to use a function that minifies my output (by removing line breaks and tabs), but the problem is that with ordinary code like this
return str_replace(array("\r\n", "\t"), '', $s);
the title attributes (e.g. the tooltip that appears when you hover over a word) are minified as well and their line breaks get lost. I want to keep the line breaks inside a title="textwithlinebreakhere" but remove all line breaks outside.
I have no idea how to do that, so I hope you can help me.
Thanks!

You should use preg_replace() (it already replaces every match) and then use (?<!your_match_here) and its siblings. By siblings I mean negative and positive lookbehind and lookahead, which make the match conditional on whether a character comes before or after a certain letter, sign, or digit.
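As a rough illustration of a related idea, here is a minimal sketch in JavaScript (the question itself uses PHP, where preg_replace_callback() could play the same role). Instead of lookbehind, it matches either a whole title attribute or a run of line breaks and tabs, and only keeps the former. The function name minifyOutsideTitle and the sample markup are made up for this example, and the sketch assumes double-quoted title values only.

```javascript
// Hypothetical helper: strips CR, LF and tab characters everywhere except
// inside double-quoted title="..." attribute values.
function minifyOutsideTitle(html) {
  // Alternative 1 captures a whole title attribute so it can be kept as-is.
  // Alternative 2 matches runs of CR, LF and tab, which are dropped.
  return html.replace(/(title\s*=\s*"[^"]*")|[\r\n\t]+/g, function (match, titleAttr) {
    return titleAttr !== undefined ? titleAttr : "";
  });
}

// Made-up sample input for illustration.
const sampleHtml = '<p>\n\t<span title="line one\nline two">hover</span>\r\n</p>';
console.log(JSON.stringify(minifyOutsideTitle(sampleHtml)));
```

Note that this also removes lone \n characters, which the str_replace() call in the question does not, and that it would not protect single-quoted attribute values.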

Remove undesired characters with trim_all() - PHP :
This function was inspired by PHP's built-in function trim, which removes undesired characters from the start and end of a string and, when no such characters are provided as the second argument, removes the white-space characters in its default list.
So, what does trim_all() do?
trim_all() was intended to remove all instances of white-spaces from both ends of the string, as well as remove duplicate white-space characters inside the string. But, later on, I made it a general purpose function to do a little more than just white-space trimming and cleaning, and made it accept characters-to-replace and characters-to-replace-with. With this function, you can:
normalize white-spaces, so that all multiple \r , \n , \t , \r\n , \0 , 0x0b , 0x20 and all control characters are replaced with a single space and also trimmed from both ends of the string;
remove all undesired characters;
remove duplicates;
replace multiple occurrences of characters with a character or string.
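function trim_all( $str , $what = NULL , $with = ' ' )
{
    if( $what === NULL )
    {
        //  Character   Decimal   Use
        //  "\0"        0         Null character
        //  "\t"        9         Tab
        //  "\n"        10        New line
        //  "\x0B"      11        Vertical tab
        //  "\r"        13        Carriage return (new line on classic Mac OS)
        //  " "         32        Space
        $what = "\\x00-\\x20";    // all white-spaces and control chars
    }
    // $what uses regex character-class syntax, which trim() does not understand
    // (its character list uses ".." for ranges), so both ends are stripped with
    // preg_replace() before the remaining runs are collapsed into $with.
    $str = preg_replace( '/^[' . $what . ']+|[' . $what . ']+$/' , '' , $str );
    return preg_replace( '/[' . $what . ']+/' , $with , $str );
}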
This function can be helpful when you want to remove unwanted characters from users' input. Here is how to use it.
Example Use :
$full_name = trim_all( $_POST['full_name'] );
or
$full_name = trim_all( $full_name , "\t" , "" );
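For readers who want the same behaviour on the client side, the following is a minimal JavaScript sketch of the idea, not a port endorsed by the original author. The name trimAll and its defaults are assumptions chosen to mirror the PHP version; as there, the what argument is expected to be written in regex character-class syntax.

```javascript
// Hypothetical JavaScript counterpart of trim_all().
// `what` is the inside of a regex character class, e.g. "\\x00-\\x20" or "\\t".
function trimAll(str, what = "\\x00-\\x20", replacement = " ") {
  const cls = "[" + what + "]+";
  // Strip matching runs from both ends first...
  const trimmed = str.replace(new RegExp("^" + cls + "|" + cls + "$", "g"), "");
  // ...then collapse each remaining run into the replacement string.
  return trimmed.replace(new RegExp(cls, "g"), replacement);
}

console.log(trimAll("  John \t\n Smith  ")); // whitespace normalized
console.log(trimAll("a\t\tb", "\\t", ""));   // tabs removed
```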


Regular expression not supporting line skipping \n

I'm trying to find the right regular expression for a number followed by a line break ( \n ), but it never works.
My regular expression is /^\d+(\n)$/.
EDIT : The text area contains :
22\n
33
Here's my code ( I'm trying to validate what's in the textarea, which should contain only numbers with \n at the end of each line ) :
function valideChamp()
{
    var rExp1 = /^\d+(\n)$/;
    var aChamps = document.querySelector("textarea").value;
    if (rExp1.test(aChamps.value)==true){
        alert("Valide")
    }
    else {
        alert("Invalide")
        return false;
    }
}
If you want to check for any line containing only a number on it, you can use:
/(^|\n)\d+(\r?\n)/
If you just want to check that there's only a number, and then a newline, and nothing else:
/^\d+(\r?\n)$/
(which is what you were checking for, but that's an odd input pattern.)
If you want to make sure textarea ONLY has lines that are numbers, it might be simpler to check that string.replace(/[0-9\r\n]/g, '') == ''. This will confirm if it contains only numbers and newlines.
Remove ".value"
from this line:
if (rExp1.test(aChamps.value)==true){
You're using $ and \n together which is slightly redundant. Try
/\d+$/gm
where g = global flag and m = multiline flag. Note this will match multiple lines.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
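To tie the suggestions in this thread together, here is a minimal sketch of a whole-textarea check. It assumes the goal is "every line contains only digits, with an optional trailing line break"; the function name isNumbersOnePerLine is made up for this example.

```javascript
// Hypothetical validator: true when every line of `text` consists only of
// digits. Lines may be separated by \n or \r\n, and one trailing line break
// is tolerated. No g flag is used, so test() keeps no state between calls.
function isNumbersOnePerLine(text) {
  return /^\d+(?:\r?\n\d+)*(?:\r?\n)?$/.test(text);
}

console.log(isNumbersOnePerLine("22\n33"));  // valid input from the question
console.log(isNumbersOnePerLine("22\nabc")); // contains a non-numeric line
```

In the question's function, this would be called with the string already read from the textarea (aChamps), not with aChamps.value.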

JavaScript's negative look-ahead doesn't work as expected?

I have some data in a textarea :
(yes it is multiline)
"#ObjectTypeID", DbType.In
"#ObjectID", DbType.Int32,
"#ClaimReasonID", DbType.I
"#ClaimReasonDetails", DbTy
"#AccidendDate", DbType.Da
"#AccidendPlaceID", DbType
"#AccidendPlaceDetails", Db
"#TypeOfMedicalTreatment",
"#MedicalTreatmentDate", Db
"#CreatedBy", DbType.Int32
"#Member_ID", DbType.Strin
.ExecuteScalar(command).ToS
In each row, I want to remove the section from the second " (inclusive) to the end of the row.
I've managed to do this :
value=value.replace(/\"[a-z,. ]+(?!.*\")/gi,'')
Which means : search for the first " that has characters after it and does not have a later "
This will yield the required results :
"#ObjectTypeID
"#ObjectID32,
"#ClaimReasonID
"#ClaimReasonDetails
"#AccidendDate
"#AccidendPlaceID
"#AccidendPlaceDetails
"#TypeOfMedicalTreatment
"#MedicalTreatmentDate
"#CreatedBy32
"#Member_ID
.ExecuteScalar(command).ToS
Question:
I understand why it is working, but I don't understand why the following is not working :
value=value.replace(/\".+(?!.*\")/gi,'')
http://jsbin.com/fanep/4/edit
I mean : it is supposed to search for a " that has characters after it and doesn't have a later " ....
What am I missing ? I really hate to declare [a-z,. ]
+ is greedy. Since "the whole thing" matches your rule of "must not have a " after", it will go with that.
The reason your first regex works is because you are disallowing most characters by explicitly whitelisting certain ones.
To fix, try adding ? after the + - this will make it lazy instead, matching as little as possible while still meeting the rules.
Additionally, you are searching for the stuff you want to keep... and then deleting it.
Try this instead:
val = val.replace(/"[^"]*(?=[\r\n]|$)/g,'');
This will remove everything from the last " to the end of a line (or end of the input).
value=value.replace(/\"[a-z,. ]+(?!.*\")/gi,'')
means: search for the first " that has characters after it and does not have a later "
To be exact: It matches the first " that has some of the characters [a-z,. ] after it, which then is not (in any distance) followed by another ".
I dont understand why the following is not working:
value=value.replace(/\".+(?!.*\")/gi,'')
You have removed the restriction of the character class. .+ will now match any char, including quotes. Regardless whether greedy or not, it will now find the first " that is followed by an amount of any characters (including other quotes) that are no more followed by quotes - i.e. it will suffice if .+ matches until the last quote.
I really hate to declare [a-z,. ]
You can just use the class of all characters except quotes: [^"]. Indeed, I think the following lookahead-free version matches your intent better:
value = value.replace(/"[^"\n\r]*/gi, '');
The one that doesn't work fails because the .+ is greedy. It eats up all it can. (Visual tools can help here, such as this one: http://regex101.com/r/eJ5kJ2/1) We can make it clearer that .+ is matching too much by putting it in a capture group: http://regex101.com/r/qF7nR9/1
In your one that does work (http://regex101.com/r/kR8vL6/1), you've changed that to [a-z,. ]+, which means "one or more a to z, comma, period, or space" (note that the . there is just a period, not a wildcard). That's much more limited (in particular, it doesn't include #).
Side note: There's no need to escape the " with a backslash, " isn't a special character in regular expressions.
Why the below regex is not working?
\".+(?!.*\")
Answer:
\" matches the first " and the following .+ matches greedily up to the last character of the line. Because that last character is not followed by any characters (zero or more) plus a ", the negative lookahead succeeds there, so the regex matches the whole line from the first " onward.
For your case, you could simply use the regex below to match from the second " up to the end-of-line anchor (in JavaScript, add the m flag so that $ matches at the end of each line).
\"[^"\n]*$
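The behaviour discussed in these answers can be checked with a small sketch. The three rows below are taken from the question's sample data; the variable names are made up for this example, and the anchored pattern is the one suggested above with the g and m flags added.

```javascript
// Three rows from the question's textarea, joined with line breaks.
const rows = [
  '"#ObjectTypeID", DbType.In',
  '"#ClaimReasonID", DbType.I',
  '.ExecuteScalar(command).ToS'
].join("\n");

// Greedy .+ runs to the end of each line, where no later quote exists,
// so everything from the FIRST quote onward is removed.
const greedyResult = rows.replace(/".+(?!.*")/g, "");

// Lazy .+? stops as soon as no quote lies ahead, which is right after the
// second quote, so the quoted name is removed instead of the tail.
const lazyResult = rows.replace(/".+?(?!.*")/g, "");

// Anchoring to the end of each line removes the second quote and the tail.
const anchoredResult = rows.replace(/"[^"\n]*$/gm, "");

console.log(JSON.stringify(greedyResult));
console.log(JSON.stringify(lazyResult));
console.log(JSON.stringify(anchoredResult));
```

Only the anchored version leaves the leading "#Name part of each row in place.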

How can I remove all white spaces?

I am using the following:
var xpto = "AB23. 3434-.34212 23 42."
I'm removing the ".", the "-" and the " ":
xpto.replace(/\./g,"").replace(/\-/g,"").replace("/\s/g","")
How can I remove all white spaces?
Your last replace is using a string, not a regular expression. You also don't seem to have kept the result:
xpto = xpto.replace(/\./g,"").replace(/\-/g,"").replace(/\s/g,"");
// The last pattern, /\s/g, has no quotes around it.
// The result is assigned back to xpto so that it is remembered.
You can also shorten that and just call replace once, using a character class ([...]):
xpto = xpto.replace(/[-.\s]/g,"");
(Note that when using the - character literally in a character class, you must make it the first character after the opening [ or the last character before the closing ], or put a backslash in front of it. If it appears between two other characters ([a-z], for instance), it means "any character in the range".)
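A short sketch of the two points made in this answer, using the string from the question. The variable names other than xpto are made up for the example.

```javascript
const xpto = "AB23. 3434-.34212 23 42.";

// A string argument is searched for literally. In a string literal "\s" is
// just "s", so this looks for the text "/s/g", finds nothing, and returns
// the input unchanged.
const viaString = xpto.replace("/\s/g", "");

// A regex literal with a character class removes dots, hyphens and whitespace.
const viaRegex = xpto.replace(/[-.\s]/g, "");

// Hyphen placement inside a character class:
const asRange = "a-c b".replace(/[a-c]/g, "");   // a through c: removes a, b, c
const asLiteral = "a-c b".replace(/[ac-]/g, ""); // literal a, c and hyphen

console.log(viaString, viaRegex, JSON.stringify(asRange), JSON.stringify(asLiteral));
```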
You can remove white spaces using replace function
xpto.replace(/\s/g,'');
Your error comes from the quotes around your last regex, however I might also point out that you are calling replace way more than needed:
xpto = xpto.replace(/[\s.-]/g,"");
This will strip out spaces, dots and hyphens.
You did it almost right, but you need to remove the quotation marks "" around /\s/g. Also, you want to assign the replaced string back to xpto, so that you can then do something with it.
Javascript
var xpto = "AB23. 3434-.34212 23 42."
xpto = xpto.replace(/\./g,"").replace(/\-/g,"").replace(/\s/g,"");
Output
AB233434342122342

remove unwanted commas in JavaScript

I want to remove all unnecessary commas from the start/end of the string.
E.g. google, yahoo,, , should become google, yahoo.
If possible ,google,, , yahoo,, , should become google,yahoo.
I've tried the below code as a starting point, but it seems to be not working as desired.
trimCommas = function(s) {
s = s.replace(/,*$/, "");
s = s.replace(/^\,*/, "");
return s;
}
In your example you also want to trim the commas when there are spaces between them at the start or at the end. Use something like this:
str.replace(/^[,\s]+|[,\s]+$/g, '').replace(/,[,\s]*,/g, ',');
Note the use of the 'g' modifier for global replace.
You need this:
s = s.replace(/[,\s]{2,}/,""); //Removes double or more commas / spaces
s = s.replace(/^,*/,""); //Removes all commas from the beginning
s = s.replace(/,*$/,""); //Removes all commas from the end
EDIT: Made all the changes - should work now.
My take:
var cleanStr = str.replace(/^[\s,]+/,"")
.replace(/[\s,]+$/,"")
.replace(/\s*,+\s*(,+\s*)*/g,",")
This one will work with opera, internet explorer, whatever
Actually tested this last one, and it works!
What you need to do is replace all groups of "space and comma" with a single comma and then remove commas from the start and end:
trimCommas = function(str) {
str = str.replace(/[,\s]*,[,\s]*/g, ",");
str = str.replace(/^,/, "");
str = str.replace(/,$/, "");
return str;
}
The first one replaces every sequence of white space and commas with a single comma, provided there's at least one comma in there. This handles the edge case left in the comments for "Internet Explorer".
The second and third get rid of the comma at the start and end of string where necessary.
You can also add (to the end):
str = str.replace(/[\s]+/g, " ");
to collapse multi-spaces down to one space and
str = str.replace(/,/g, ", ");
if you want them to be formatted nicely (space after each comma).
A more generalized solution would be to pass parameters to indicate behaviour:
Passing true for collapse will collapse the spaces within a section (a section being defined as the characters between commas).
Passing true for addspace will use ", " to separate sections rather than just "," on its own.
That code follows. It may not be necessary for your particular case but it might be better for others in terms of code re-use.
trimCommas = function(str,collapse,addspace) {
    str = str.replace(/[,\s]*,[,\s]*/g, ",").replace(/^,/, "").replace(/,$/, "");
    if (collapse) {
        // g flag so that every run of spaces is collapsed, not just the first
        str = str.replace(/[\s]+/g, " ");
    }
    if (addspace) {
        str = str.replace(/,/g, ", ");
    }
    return str;
}
First hit on Google for "Javascript Trim": http://www.somacon.com/p355.php. You seem to have implemented this using commas, and I don't see why it would be a problem (though you escaped the comma in the second one and not in the first).
Not quite as sophisticated, but simple with:
',google,, , yahoo,, ,'.replace(/\s/g, '').replace(/,+/g, ',');
You should be able to use only one replace call:
/^( *, *)+|(, *(?=,|$))+/g
Test:
'google, yahoo,, ,'.replace(/^( *, *)+|(, *(?=,|$))+/g, '');
"google, yahoo"
',google,, , yahoo,, ,'.replace(/^( *, *)+|(, *(?=,|$))+/g, '');
"google, yahoo"
Breakdown:
/
^( *, *)+ # Match start of string followed by zero or more spaces
# followed by , followed by zero or more spaces.
# Repeat one or more times
| # regex or
(, *(?=,|$))+ # Match , followed by zero or more spaces which have a comma
# after it or EOL. Repeat one or more times
/g # `g` modifier will run on until there is no more matches
(?=...) is a lookahead, which will not move the position of the match but only verifies that the given characters come after it. In our case we look for , or EOL.
match() is much better tool for this than replace()
str = " aa, bb,, cc , dd,,,";
newStr = str.match(/[^\s,]+/g).join(",")
alert("[" + newStr + "]")
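One thing to watch with the match() approach is that match() returns null when nothing matches, so join() would throw on an input made only of commas and spaces. A minimal sketch with a guard follows; the function name splitCommaList is made up for this example.

```javascript
// Hypothetical helper around the match() approach: collects every run of
// characters that is neither whitespace nor a comma and joins them with ",".
function splitCommaList(str) {
  const parts = str.match(/[^\s,]+/g); // null when there is no match at all
  return parts === null ? "" : parts.join(",");
}

console.log(splitCommaList(" aa, bb,, cc , dd,,,"));
console.log(splitCommaList(",google,, , yahoo,, ,"));
```

Because whitespace also separates items here, an entry with an inner space (for example "new york") would be split into two items.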
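To collapse runs such as ",,", ",,," and ",,,," into a single ",", you can use the code below. (Two passes of /,,/g still leave two commas for a run of five; /,+/g handles runs of any length.)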
var abc = new String("46590,26.91667,75.81667,,,45346,27.18333,78.01667,,,45630,12.97194,77.59369,,,47413,19.07283,72.88261,,,45981,13.08784,80.27847,,");
var pqr= abc.replace(/,,/g,',').replace(/,,/g, ',');
alert(pqr);

Sort lines on webpage using javascript/ regex

I'd like to write a Greasemonkey script that requires finding lines ending with a string ("copies.") & sorting those lines based on the number preceding that string.
The page I'm looking to modify does not use tables unfortunately, just the br/ tag, so I assume that this will involve Regex:
http://www.publishersweekly.com/article/CA6591208.html
(Lines without the matching string will just be ignored.)
Would be grateful for any tips to get me started.
Most times, HTML and RegEx do not go together, and when parsing HTML your first thought should not be RegEx.
However, in this situation, the markup looks simple enough that it should be okay, at least until Publishers Weekly changes how they do that page.
Here's a function that will extract the data, grab the appropriate lines, sort them, and put them back again:
($j is jQuery)
function reorderPwList()
{
    var Container = $j('#article span.table');
    var TargetLines = /^.+?(\d+(?:,\d{3})*) copies\.<br ?\/?>$/gmi;
    var Lines = Container.html().match( TargetLines );
    Lines.sort( sortPwCopies );
    Container.html( Lines.join('\n') );

    function sortPwCopies()
    {
        function getCopyNum()
        { return arguments[0].replace(TargetLines,'$1').replace(/\D/g,'') }

        return getCopyNum(arguments[0]) - getCopyNum(arguments[1]);
    }
}
And an explanation of the regex used there:
^                 # start of line
.+?               # lazy match one or more non-newline characters
(                 # start capture group $1
  \d+             # match one or more digits (0-9)
  (?:             # non-capture group
    ,\d{3}        # comma, then three digits
  )*              # end group, repeat zero or more times
)                 # end group $1
 copies\.         # literal text, with . escaped
<br ?\/?>         # match a br tag, with optional space or slash just in case
$                 # end of line
(For readability, I've indented the groups - only the spaces before 'copies' and after 'br' are valid ones.)
The regex flags gmi are used, for global, multi-line mode, case-insensitive matching.
<OLD ANSWER>
Once you've extracted just the text you want to look at (using DOM/jQuery), you can then pass it to the following function, which will put the relevant information into a format that can then be sorted:
function makeSortable(Text)
{
// Mark sortable lines and put number before main content.
Text = Text.replace
( /^(.*)([\d,]+) copies\.<br \/>/gm
, "SORT ME$2 $1"
);
// Remove anything not marked for sorting.
Text = Text.replace( /^(?!SORT ME).*$/gm , '' );
// Remove blank lines.
Text = Text.replace( /\n{2,}/g , '\n' );
// Remove sort token.
Text = Text.replace( /SORT ME/g , '' );
return Text;
}
You'll then need a sort function to ensure that the numbers are sorted correctly (the standard JS array.sort method will sort on text, and put 100,000 before 20,000).
Oh, and here's a quick explanation of the regexes used here:
/^(.*)([\d,]+) copies\.<br \/>/gm
    /.../gm       a regex with global-match and multi-line modes
    ^             matches start of line
    (.*)          capture to $1, any char (except newline), zero or more times
    ([\d,]+)      capture to $2, any digit or comma, one or more times
     copies       literal text
    \.<br \/>     literal text, with . and / escaped (they would be special otherwise)
/^(?!SORT ME).*$/gm
    /.../gm       again, enable global and multi-line
    ^             match start of line
    (?!SORT ME)   a negative lookahead, fails the match if text 'SORT ME' is after it
    .*            any char (except newline), zero or more times
    $             end of line
/\n{2,}/g
    \n{2,}        a newline character, two or more times
</OLD ANSWER>
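The point about text versus numeric sorting can be sketched as follows. The sample lines and the helper name copiesOf are made up for this example; the real page content is not reproduced here.

```javascript
// Hypothetical helper: extracts the number in front of " copies." and
// returns it as an integer (NaN when the line does not match).
function copiesOf(line) {
  const m = line.match(/([\d,]+) copies\./);
  return m ? parseInt(m[1].replace(/,/g, ""), 10) : NaN;
}

// Default sort compares strings, so "100,000" lands before "20,000".
const asText = ["20,000", "100,000", "5,500"].sort();

// A comparator that subtracts the parsed numbers sorts by value instead.
const sampleLines = [
  "Title A. 100,000 copies.",
  "Title B. 20,000 copies.",
  "Title C. 5,500 copies."
];
const byCopies = sampleLines.slice().sort(function (a, b) {
  return copiesOf(a) - copiesOf(b);
});

console.log(asText);
console.log(byCopies.map(copiesOf));
```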
You can start with something like this (just copy and paste it into the Firebug console):
// where are the things
var elem = document.getElementById("article").
getElementsByTagName("span")[1].
getElementsByTagName("span")[0];
// extract lines into array
var lines = []
elem.innerHTML.replace(/.+?\d+\s+copies\.\s*<br>/g,
function($0) { lines.push($0) });
// sort an array
// lines.sort(function(a, b) {
// var ma = a.match(/(\d+),(\d+)\s+copies/);
// var mb = b.match(/(\d+),(\d+)\s+copies/);
//
// return parseInt(ma[1] + ma[2]) -
// parseInt(mb[1] + mb[2]);
lines.sort(function(a, b) {
function getNum(p) {
return parseInt(
p.match(/([\d,]+)\s+copies/)[1].replace(/,/g, ""));
}
return getNum(a) - getNum(b);
})
// put it back
elem.innerHTML = lines.join("");
It's not clear to me what it is you're trying to do. When posting questions here, I encourage you to post (a part of) your actual data and clearly indicate what exactly you're trying to match.
But, I am guessing you know very little regex, in which case, why use regex at all? If you study the topic a bit, you will soon know that regex is not some magical tool that produces whatever it is you're thinking of. Regex cannot sort in whatever way. It simply matches text, that's all.
Have a look at this excellent on-line resource: http://www.regular-expressions.info/
And if after reading you think a regex solution to your problem is appropriate, feel free to elaborate on your question and I'm sure I, or someone else is able to give you a hand.
Best of luck.
