Related
I have a string that output
20153 Risk
What i am trying to achieve is getting only letters, i have achieved by getting only numbers using regular expression which is
const cf_regex_number = cf_input.replace(/\D/g, '');
this will return only 20153 . But as soon as i tried to only get letters , its returning the while string instead of Risk . i have done my research and the regular expression to get only letters is using **/^[a-zA-Z]*$/**
This is my line of code i tried to get only letters
const cf_regex_character = cf_input.replace(/^[a-zA-Z]*$/,'')
but instead of returning Risk , it is returning 20153 Risk which is the whole line of string .
/[^a-z]+/i
The [ brackets ] signify a range of characters; specifically, a to z in this case.
Actually the i flag means insensitive to case, so that includes A to Z also.
The caret ^ inverts the pattern; it means, anything not in the specified range.
And the + means continue adding characters to the match as long as they are they within that range.
Then stop matching.
In effect this matches everything up to the space in 20153 Risk.
Then you replace this match with the empty string '' and what you've got left is Risk.
const string = '20153 Risk';
const result = string.replace(/[^a-z]+/i, '');
console.log(result);
Your first pattern is locating every non-digit and replacing it with nothing.
On the other hand, your second pattern is locating just the first occurence of a pattern, and the pattern is looking for start of string, followed by letters, followed by end of string. There is no such sequence - if you start from the start of string, there are exactly zero letters, and then you are left very far from the expected end of the string. Even if that worked, you are deleting letters, not non-letters.
This pattern is parallel to your first one (delete any occurence of a non-letter):
const cf_regex_character = cf_input.replace(/[^a-zA-Z]/g,'')
but possibly a better way to go is to extract the desired substring, instead of deleting everything that it is not:
const letters = cf_input.match(/[a-z]+/i)[0];
const numbers = cf_input.match(/\d+/)[0];
(This is if you know there is such a substring; if you are unsure it would be better to code a bit more defensively.)
cf_input="20153 Risk"
const cf_regex_character = cf_input.replace(/\d+\s/,'')
console.log(cf_regex_character)
str="20153 Risk"
reg=/[a-z]+/gi
res=str.match(reg)
console.log(res[0])
I want to write a regular expression, in JavaScript, for finding the string starting and ending with :.
For example "hello :smile: :sleeping:" from this string I need to find the strings which are starting and ending with the : characters. I tried the expression below, but it didn't work:
^:.*\:$
My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g
function replaceEmojiTags(text) {
return text.replace(emojiPattern, function (tag, emotion) {
// The emotion will be the captured word between your tags,
// so either "sleep" or "sleeping" in your example
//
// In this function you would take that emotion and return
// whatever you want based on the input parameter and the
// whole tag would be replaced
//
// As an example, let's say you had a bunch of GIF images
// for the different emotions:
return '<img src="/img/emoji/' + emotion + '.gif" />';
});
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to be changed since it is only looking for word characters. For this there is probably also a restriction such that the hyphen must join two words so that it shouldn't accept "-big-smile" or "big-smile-". For that you need to change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".
The ^ and $ are anchors (start and end respectively). These cause your regex to explicitly match an entire string which starts with : has anything between it and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more so you'll be matching :: as well. It'll be better to change this to + which means one or more. In fact if you're just looking for text you may want to use a range [a-z0-9] with a case insensitive modifier.
If we put it all together we'll have regex like this /:([a-z0-9]+):/gmi
match a string beginning with : with any alphanumeric character one or more times ending in : with the modifiers g globally, m multi-line and i case insensitive for things like :FacePalm:.
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.
Is there a way to achieve the equivalent of a negative lookbehind in JavaScript regular expressions? I need to match a string that does not start with a specific set of characters.
It seems I am unable to find a regex that does this without failing if the matched part is found at the beginning of the string. Negative lookbehinds seem to be the only answer, but JavaScript doesn't has one.
This is the regex that I would like to work, but it doesn't:
(?<!([abcdefg]))m
So it would match the 'm' in 'jim' or 'm', but not 'jam'
Since 2018, Lookbehind Assertions are part of the ECMAScript language specification.
// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
Answer pre-2018
As Javascript supports negative lookahead, one way to do it is:
reverse the input string
match with a reversed regex
reverse and reformat the matches
const reverse = s => s.split('').reverse().join('');
const test = (stringToTests, reversedRegexp) => stringToTests
.map(reverse)
.forEach((s,i) => {
const match = reversedRegexp.test(s);
console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
});
Example 1:
Following #andrew-ensley's question:
test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)
Outputs:
jim true token: m
m true token: m
jam false token: Ø
Example 2:
Following #neaumusic comment (match max-height but not line-height, the token being height):
test(['max-height', 'line-height'], /thgieh(?!(-enil))/)
Outputs:
max-height true token: height
line-height false token: Ø
Lookbehind Assertions got accepted into the ECMAScript specification in 2018.
Positive lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);
Negative lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);
Platform support:
✔️ V8
✔️ Google Chrome 62.0
✔️ Microsoft Edge 79.0
✔️ Node.js 6.0 behind a flag and 9.0 without a flag
✔️ Deno (all versions)
✔️ SpiderMonkey
✔️ Mozilla Firefox 78.0
🛠️ JavaScriptCore: support in beta as of 2023-02-16: the feature has been merged
✔️ Apple Safari 16.4 (in beta as of 2023-02-16)
✔️ iOS 16.4 beta WebView (all browsers on iOS + iPadOS)
✔️ Bun 0.2.2
❌ Chakra: Microsoft was working on it but Chakra is now abandoned in favor of V8
❌ Internet Explorer
❌ Edge versions prior to 79 (the ones based on EdgeHTML+Chakra)
Let's suppose you want to find all int not preceded by unsigned :
With support for negative look-behind:
(?<!unsigned )int
Without support for negative look-behind:
((?!unsigned ).{9}|^.{0,8})int
Basically idea is to grab n preceding characters and exclude match with negative look-ahead, but also match the cases where there's no preceeding n characters. (where n is length of look-behind).
So the regex in question:
(?<!([abcdefg]))m
would translate to:
((?!([abcdefg])).|^)m
You might need to play with capturing groups to find exact spot of the string that interests you or you want to replace specific part with something else.
Mijoja's strategy works for your specific case but not in general:
js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama
Here's an example where the goal is to match a double-l but not if it is preceded by "ba". Note the word "balll" -- true lookbehind should have suppressed the first 2 l's but matched the 2nd pair. But by matching the first 2 l's and then ignoring that match as a false positive, the regexp engine proceeds from the end of that match, and ignores any characters within the false positive.
Use
newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});
You could define a non-capturing group by negating your character set:
(?:[^a-g])m
...which would match every m NOT preceded by any of those letters.
This is how I achieved str.split(/(?<!^)#/) for Node.js 8 (which doesn't support lookbehind):
str.split('').reverse().join('').split(/#(?!$)/).map(s => s.split('').reverse().join('')).reverse()
Works? Yes (unicode untested). Unpleasant? Yes.
following the idea of Mijoja, and drawing from the problems exposed by JasonS, i had this idea; i checked a bit but am not sure of myself, so a verification by someone more expert than me in js regex would be great :)
var re = /(?=(..|^.?)(ll))/g
// matches empty string position
// whenever this position is followed by
// a string of length equal or inferior (in case of "^")
// to "lookbehind" value
// + actual value we would want to match
, str = "Fall ball bill balll llama"
, str_done = str
, len_difference = 0
, doer = function (where_in_str, to_replace)
{
str_done = str_done.slice(0, where_in_str + len_difference)
+ "[match]"
+ str_done.slice(where_in_str + len_difference + to_replace.length)
len_difference = str_done.length - str.length
/* if str smaller:
len_difference will be positive
else will be negative
*/
} /* the actual function that would do whatever we want to do
with the matches;
this above is only an example from Jason's */
/* function input of .replace(),
only there to test the value of $behind
and if negative, call doer() with interesting parameters */
, checker = function ($match, $behind, $after, $where, $str)
{
if ($behind !== "ba")
doer
(
$where + $behind.length
, $after
/* one will choose the interesting arguments
to give to the doer, it's only an example */
)
return $match // empty string anyhow, but well
}
str.replace(re, checker)
console.log(str_done)
my personal output:
Fa[match] ball bi[match] bal[match] [match]ama
the principle is to call checker at each point in the string between any two characters, whenever that position is the starting point of:
--- any substring of the size of what is not wanted (here 'ba', thus ..) (if that size is known; otherwise it must be harder to do perhaps)
--- --- or smaller than that if it's the beginning of the string: ^.?
and, following this,
--- what is to be actually sought (here 'll').
At each call of checker, there will be a test to check if the value before ll is not what we don't want (!== 'ba'); if that's the case, we call another function, and it will have to be this one (doer) that will make the changes on str, if the purpose is this one, or more generically, that will get in input the necessary data to manually process the results of the scanning of str.
here we change the string so we needed to keep a trace of the difference of length in order to offset the locations given by replace, all calculated on str, which itself never changes.
since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but i thought the example, already complicated by the replacings, would be clearer with another variable (str_done).
i guess that in terms of performances it must be pretty harsh: all those pointless replacements of '' into '', this str.length-1 times, plus here manual replacement by doer, which means a lot of slicing...
probably in this specific above case that could be grouped, by cutting the string only once into pieces around where we want to insert [match] and .join()ing it with [match] itself.
the other thing is that i don't know how it would handle more complex cases, that is, complex values for the fake lookbehind... the length being perhaps the most problematic data to get.
and, in checker, in case of multiple possibilities of nonwanted values for $behind, we'll have to make a test on it with yet another regex (to be cached (created) outside checker is best, to avoid the same regex object to be created at each call for checker) to know whether or not it is what we seek to avoid.
hope i've been clear; if not don't hesitate, i'll try better. :)
Using your case, if you want to replace m with something, e.g. convert it to uppercase M, you can negate set in capturing group.
match ([^a-g])m, replace with $1M
"jim jam".replace(/([^a-g])m/g, "$1M")
\\jiM jam
([^a-g]) will match any char not(^) in a-g range, and store it in first capturing group, so you can access it with $1.
So we find im in jim and replace it with iM which results in jiM.
As mentioned before, JavaScript allows lookbehinds now. In older browsers you still need a workaround.
I bet my head there is no way to find a regex without lookbehind that delivers the result exactly. All you can do is working with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex that counts out what should not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is the first group $1.
In your case Before=[abcdefg] which is easy to negate NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.
If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetitive tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but still have to use the first group or use the regular expression (?!Before)(.{n})(Wanted) and use the second group. In this example, the pattern Before actually has a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag, see my code snippet:
function TestSORegEx() {
var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
var reg = /(?![abcdefg])(.{1})(m)/gm;
var out = "Matches and groups of the regex " +
"/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
var match = reg.exec(s);
while(match) {
var start = match.index + match[1].length;
out += "\nWhole match: " + match[0] + ", starts at: " + match.index
+ ". Desired match: " + match[2] + ", starts at: " + start + ".";
match = reg.exec(s);
}
out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
+ s.replace(reg, "$1*$2*");
alert(out);
}
This effectively does it
"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null
Search and replace example
"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"
Note that the negative look-behind string must be 1 character long for this to work.
Is there a way to achieve the equivalent of a negative lookbehind in JavaScript regular expressions? I need to match a string that does not start with a specific set of characters.
It seems I am unable to find a regex that does this without failing if the matched part is found at the beginning of the string. Negative lookbehinds seem to be the only answer, but JavaScript doesn't has one.
This is the regex that I would like to work, but it doesn't:
(?<!([abcdefg]))m
So it would match the 'm' in 'jim' or 'm', but not 'jam'
Since 2018, Lookbehind Assertions are part of the ECMAScript language specification.
// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
Answer pre-2018
As Javascript supports negative lookahead, one way to do it is:
reverse the input string
match with a reversed regex
reverse and reformat the matches
const reverse = s => s.split('').reverse().join('');
const test = (stringToTests, reversedRegexp) => stringToTests
.map(reverse)
.forEach((s,i) => {
const match = reversedRegexp.test(s);
console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
});
Example 1:
Following #andrew-ensley's question:
test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)
Outputs:
jim true token: m
m true token: m
jam false token: Ø
Example 2:
Following #neaumusic comment (match max-height but not line-height, the token being height):
test(['max-height', 'line-height'], /thgieh(?!(-enil))/)
Outputs:
max-height true token: height
line-height false token: Ø
Lookbehind Assertions got accepted into the ECMAScript specification in 2018.
Positive lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);
Negative lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);
Platform support:
✔️ V8
✔️ Google Chrome 62.0
✔️ Microsoft Edge 79.0
✔️ Node.js 6.0 behind a flag and 9.0 without a flag
✔️ Deno (all versions)
✔️ SpiderMonkey
✔️ Mozilla Firefox 78.0
🛠️ JavaScriptCore: support in beta as of 2023-02-16: the feature has been merged
✔️ Apple Safari 16.4 (in beta as of 2023-02-16)
✔️ iOS 16.4 beta WebView (all browsers on iOS + iPadOS)
✔️ Bun 0.2.2
❌ Chakra: Microsoft was working on it but Chakra is now abandoned in favor of V8
❌ Internet Explorer
❌ Edge versions prior to 79 (the ones based on EdgeHTML+Chakra)
Let's suppose you want to find all int not preceded by unsigned :
With support for negative look-behind:
(?<!unsigned )int
Without support for negative look-behind:
((?!unsigned ).{9}|^.{0,8})int
Basically idea is to grab n preceding characters and exclude match with negative look-ahead, but also match the cases where there's no preceeding n characters. (where n is length of look-behind).
So the regex in question:
(?<!([abcdefg]))m
would translate to:
((?!([abcdefg])).|^)m
You might need to play with capturing groups to find exact spot of the string that interests you or you want to replace specific part with something else.
Mijoja's strategy works for your specific case but not in general:
js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama
Here's an example where the goal is to match a double-l but not if it is preceded by "ba". Note the word "balll" -- true lookbehind should have suppressed the first 2 l's but matched the 2nd pair. But by matching the first 2 l's and then ignoring that match as a false positive, the regexp engine proceeds from the end of that match, and ignores any characters within the false positive.
Use
newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});
You could define a non-capturing group by negating your character set:
(?:[^a-g])m
...which would match every m NOT preceded by any of those letters.
This is how I achieved str.split(/(?<!^)#/) for Node.js 8 (which doesn't support lookbehind):
str.split('').reverse().join('').split(/#(?!$)/).map(s => s.split('').reverse().join('')).reverse()
Works? Yes (unicode untested). Unpleasant? Yes.
following the idea of Mijoja, and drawing from the problems exposed by JasonS, i had this idea; i checked a bit but am not sure of myself, so a verification by someone more expert than me in js regex would be great :)
var re = /(?=(..|^.?)(ll))/g
// matches empty string position
// whenever this position is followed by
// a string of length equal or inferior (in case of "^")
// to "lookbehind" value
// + actual value we would want to match
, str = "Fall ball bill balll llama"
, str_done = str
, len_difference = 0
, doer = function (where_in_str, to_replace)
{
str_done = str_done.slice(0, where_in_str + len_difference)
+ "[match]"
+ str_done.slice(where_in_str + len_difference + to_replace.length)
len_difference = str_done.length - str.length
/* if str smaller:
len_difference will be positive
else will be negative
*/
} /* the actual function that would do whatever we want to do
with the matches;
this above is only an example from Jason's */
/* function input of .replace(),
only there to test the value of $behind
and if negative, call doer() with interesting parameters */
, checker = function ($match, $behind, $after, $where, $str)
{
if ($behind !== "ba")
doer
(
$where + $behind.length
, $after
/* one will choose the interesting arguments
to give to the doer, it's only an example */
)
return $match // empty string anyhow, but well
}
str.replace(re, checker)
console.log(str_done)
my personal output:
Fa[match] ball bi[match] bal[match] [match]ama
the principle is to call checker at each point in the string between any two characters, whenever that position is the starting point of:
--- any substring of the size of what is not wanted (here 'ba', thus ..) (if that size is known; otherwise it must be harder to do perhaps)
--- --- or smaller than that if it's the beginning of the string: ^.?
and, following this,
--- what is to be actually sought (here 'll').
At each call of checker, there will be a test to check if the value before ll is not what we don't want (!== 'ba'); if that's the case, we call another function, and it will have to be this one (doer) that will make the changes on str, if the purpose is this one, or more generically, that will get in input the necessary data to manually process the results of the scanning of str.
here we change the string so we needed to keep a trace of the difference of length in order to offset the locations given by replace, all calculated on str, which itself never changes.
since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but i thought the example, already complicated by the replacings, would be clearer with another variable (str_done).
i guess that in terms of performances it must be pretty harsh: all those pointless replacements of '' into '', this str.length-1 times, plus here manual replacement by doer, which means a lot of slicing...
probably in this specific above case that could be grouped, by cutting the string only once into pieces around where we want to insert [match] and .join()ing it with [match] itself.
the other thing is that i don't know how it would handle more complex cases, that is, complex values for the fake lookbehind... the length being perhaps the most problematic data to get.
and, in checker, in case of multiple possibilities of nonwanted values for $behind, we'll have to make a test on it with yet another regex (to be cached (created) outside checker is best, to avoid the same regex object to be created at each call for checker) to know whether or not it is what we seek to avoid.
hope i've been clear; if not don't hesitate, i'll try better. :)
Using your case, if you want to replace m with something, e.g. convert it to uppercase M, you can negate set in capturing group.
match ([^a-g])m, replace with $1M
"jim jam".replace(/([^a-g])m/g, "$1M")
\\jiM jam
([^a-g]) will match any char not(^) in a-g range, and store it in first capturing group, so you can access it with $1.
So we find im in jim and replace it with iM which results in jiM.
As mentioned before, JavaScript allows lookbehinds now. In older browsers you still need a workaround.
I bet my head there is no way to find a regex without lookbehind that delivers the result exactly. All you can do is working with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex that counts out what should not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is the first group $1.
In your case Before=[abcdefg] which is easy to negate NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.
If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetitive tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but still have to use the first group or use the regular expression (?!Before)(.{n})(Wanted) and use the second group. In this example, the pattern Before actually has a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag, see my code snippet:
function TestSORegEx() {
var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
var reg = /(?![abcdefg])(.{1})(m)/gm;
var out = "Matches and groups of the regex " +
"/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
var match = reg.exec(s);
while(match) {
var start = match.index + match[1].length;
out += "\nWhole match: " + match[0] + ", starts at: " + match.index
+ ". Desired match: " + match[2] + ", starts at: " + start + ".";
match = reg.exec(s);
}
out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
+ s.replace(reg, "$1*$2*");
alert(out);
}
This effectively does it
"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null
Search and replace example
"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"
Note that the negative look-behind string must be 1 character long for this to work.
I'd like to write a Greasemonkey script that requires finding lines ending with a string ("copies.") & sorting those lines based on the number preceding that string.
The page I'm looking to modify does not use tables unfortunately, just the br/ tag, so I assume that this will involve Regex:
http://www.publishersweekly.com/article/CA6591208.html
(Lines without the matching string will just be ignored.)
Would be grateful for any tips to get me started.
Most times, HTML and RegEx do not go together, and when parsing HTML your first thought should not be RegEx.
However, in this situation, the markup looks simple enough that it should be okay - at least until Publisher Weekly change how they do that page.
Here's a function that will extract the data, grab the appropriate lines, sort them, and put them back again:
($j is jQuery)
function reorderPwList()
{
var Container = $j('#article span.table');
var TargetLines = /^.+?(\d+(?:,\d{3})*) copies\.<br ?\/?>$/gmi
var Lines = Container.html().match( TargetLines );
Lines.sort( sortPwCopies );
Container.html( Lines.join('\n') );
function sortPwCopies()
{
function getCopyNum()
{ return arguments[0].replace(TargetLines,'$1').replace(/\D/g,'') }
return getCopyNum(arguments[0]) - getCopyNum(arguments[1]);
}
}
And an explanation of the regex used there:
^ # start of line
.+? # lazy match one or more non-newline characters
( # start capture group $1
\d+ # match one or more digits (0-9)
(?: # non-capture group
,\d{3} # comma, then three digits
)* # end group, repeat zero or more times
) # end group $1
copies\. # literal text, with . escaped
<br ?\/?> # match a br tag, with optional space or slash just in case
$ # end of line
(For readability, I've indented the groups - only the spaces before 'copies' and after 'br' are valid ones.)
The regex flags gmi are used, for global, multi-line mode, case-insensitive matching.
<OLD ANSWER>
Once you've extracted just the text you want to look at (using DOM/jQuery), you can then pass it to the following function, which will put the relevant information into a format that can then be sorted:
function makeSortable(Text)
{
// Mark sortable lines and put number before main content.
Text = Text.replace
( /^(.*)([\d,]+) copies\.<br \/>/gm
, "SORT ME$2 $1"
);
// Remove anything not marked for sorting.
Text = Text.replace( /^(?!SORT ME).*$/gm , '' );
// Remove blank lines.
Text = Text.replace( /\n{2,}/g , '\n' );
// Remove sort token.
Text = Text.replace( /SORT ME/g , '' );
return Text;
}
You'll then need a sort function to ensure that the numbers are sorted correctly (the standard JS array.sort method will sort on text, and put 100,000 before 20,000).
Oh, and here's a quick explanation of the regexes used here:
/^(.*)([\d,]+) copies\.<br \/>/gm
/.../gm a regex with global-match and multi-line modes
^ matches start of line
(.*) capture to $1, any char (except newline), zero or more times
([\d,]+) capture to $2, any digit or comma, one or more times
copies literal text
\.<br \/> literal text, with . and / escaped (they would be special otherwise)
/^(?!SORT ME).*$/gm
/.../gm again, enable global and multi-line
^ match start of line
(?!SORT ME) a negative lookahead, fails the match if text 'SORT ME' is after it
.* any char (except newline), zero or more times
$ end of line
/\n{2,}/g
\n{2,} a newline character, two or more times
</OLD ANSWER>
you can start with something like this (just copypaste into the firebug console)
// where are the things
var elem = document.getElementById("article").
getElementsByTagName("span")[1].
getElementsByTagName("span")[0];
// extract lines into array
var lines = []
elem.innerHTML.replace(/.+?\d+\s+copies\.\s*<br>/g,
function($0) { lines.push($0) });
// sort an array
// lines.sort(function(a, b) {
// var ma = a.match(/(\d+),(\d+)\s+copies/);
// var mb = b.match(/(\d+),(\d+)\s+copies/);
//
// return parseInt(ma[1] + ma[2]) -
// parseInt(mb[1] + mb[2]);
lines.sort(function(a, b) {
function getNum(p) {
return parseInt(
p.match(/([\d,]+)\s+copies/)[1].replace(/,/g, ""));
}
return getNum(a) - getNum(b);
})
// put it back
elem.innerHTML = lines.join("");
It's not clear to me what it is you're trying to do. When posting questions here, I encourage you to post (a part of) your actual data and clearly indicate what exactly you're trying to match.
But, I am guessing you know very little regex, in which case, why use regex at all? If you study the topic a bit, you will soon know that regex is not some magical tool that produces whatever it is you're thinking of. Regex cannot sort in whatever way. It simply matches text, that's all.
Have a look at this excellent on-line resource: http://www.regular-expressions.info/
And if after reading you think a regex solution to your problem is appropriate, feel free to elaborate on your question and I'm sure I, or someone else is able to give you a hand.
Best of luck.