Sort lines on webpage using javascript/ regex


I'd like to write a Greasemonkey script that requires finding lines ending with a string ("copies.") & sorting those lines based on the number preceding that string.
The page I'm looking to modify does not use tables unfortunately, just the br/ tag, so I assume that this will involve Regex:
http://www.publishersweekly.com/article/CA6591208.html
(Lines without the matching string will just be ignored.)
Would be grateful for any tips to get me started.

Most times, HTML and RegEx do not go together, and when parsing HTML your first thought should not be RegEx.
However, in this situation, the markup looks simple enough that it should be okay - at least until Publishers Weekly changes how they do that page.
Here's a function that will extract the data, grab the appropriate lines, sort them, and put them back again:
($j is jQuery)
function reorderPwList()
{
    var Container = $j('#article span.table');
    var TargetLines = /^.+?(\d+(?:,\d{3})*) copies\.<br ?\/?>$/gmi;
    // match() returns null when nothing matches, so fall back to an empty array.
    var Lines = Container.html().match( TargetLines ) || [];
    Lines.sort( sortPwCopies );
    Container.html( Lines.join('\n') );

    // Compare two lines by their copy counts (ascending).
    function sortPwCopies(a, b)
    {
        return getCopyNum(a) - getCopyNum(b);
    }

    // Reduce a line to its copy count and strip the thousands separators.
    function getCopyNum(line)
    { return Number( line.replace(TargetLines,'$1').replace(/\D/g,'') ); }
}
And an explanation of the regex used there:
^ # start of line
.+? # lazy match one or more non-newline characters
( # start capture group $1
\d+ # match one or more digits (0-9)
(?: # non-capture group
,\d{3} # comma, then three digits
)* # end group, repeat zero or more times
) # end group $1
copies\. # literal text, with . escaped
<br ?\/?> # match a br tag, with optional space or slash just in case
$ # end of line
(For readability, I've indented the groups - only the spaces before 'copies' and after 'br' are valid ones.)
The regex flags gmi are used, for global, multi-line mode, case-insensitive matching.
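If you'd like to try the same idea without jQuery or the live page, here is a minimal sketch that works on a plain HTML string. The helper name sortCopyLines and the sample lines are made up for illustration; only the regex comes from the function above.

```javascript
// Sketch: sort "... N copies.<br>" lines of an HTML fragment by N (ascending).
// sortCopyLines is a hypothetical helper name, not part of the original answer.
function sortCopyLines(html) {
  var targetLines = /^.+?(\d+(?:,\d{3})*) copies\.<br ?\/?>$/gmi;
  // Lines that do not end in "N copies.<br>" are simply not matched.
  var lines = html.match(targetLines) || [];

  function copyCount(line) {
    var m = /(\d+(?:,\d{3})*) copies\.<br ?\/?>$/i.exec(line);
    return Number(m[1].replace(/,/g, ''));
  }

  return lines
    .slice()
    .sort(function (a, b) { return copyCount(a) - copyCount(b); })
    .join('\n');
}

// Made-up sample data in the same shape as the page's lines.
var sample = [
  'Title A. 100,000 copies.<br>',
  'A line without a count<br>',
  'Title B. 20,000 copies.<br />',
  'Title C. 7,500 copies.<br/>'
].join('\n');

console.log(sortCopyLines(sample));
```

In a Greasemonkey script you would feed it the container's innerHTML and write the result back.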
<OLD ANSWER>
Once you've extracted just the text you want to look at (using DOM/jQuery), you can then pass it to the following function, which will put the relevant information into a format that can then be sorted:
function makeSortable(Text)
{
    // Mark sortable lines and put number before main content.
    // (.*?) is lazy so that $2 captures the whole number, not just its last digit.
    Text = Text.replace
        ( /^(.*?)([\d,]+) copies\.<br \/>/gm
        , "SORT ME$2 $1"
        );
    // Remove anything not marked for sorting.
    Text = Text.replace( /^(?!SORT ME).*$/gm , '' );
    // Remove blank lines.
    Text = Text.replace( /\n{2,}/g , '\n' );
    // Remove sort token.
    Text = Text.replace( /SORT ME/g , '' );
    return Text;
}
You'll then need a sort function to ensure that the numbers are sorted correctly (the standard JS array.sort method will sort on text, and put 100,000 before 20,000).
Oh, and here's a quick explanation of the regexes used here:
/^(.*?)([\d,]+) copies\.<br \/>/gm
/.../gm a regex with global-match and multi-line modes
^ matches start of line
(.*?) capture to $1, any char (except newline), zero or more times, as few as possible
([\d,]+) capture to $2, any digit or comma, one or more times
copies literal text
\.<br \/> literal text, with . and / escaped (they would be special otherwise)
/^(?!SORT ME).*$/gm
/.../gm again, enable global and multi-line
^ match start of line
(?!SORT ME) a negative lookahead, fails the match if text 'SORT ME' is after it
.* any char (except newline), zero or more times
$ end of line
/\n{2,}/g
\n{2,} a newline character, two or more times
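As for the sort function mentioned above, here is one possible sketch. It assumes each line starts with the copy count (the shape makeSortable is meant to produce); the names leadingNumber and byLeadingNumber and the sample titles are hypothetical.

```javascript
// Sketch: numeric sort for lines shaped like "100,000 Some title" (number first).
// leadingNumber and byLeadingNumber are hypothetical names used for illustration.
function leadingNumber(line) {
  var m = /^[\d,]+/.exec(line);
  // Strip the thousands separators before converting; lines without a number count as 0.
  return m ? Number(m[0].replace(/,/g, '')) : 0;
}

function byLeadingNumber(a, b) {
  return leadingNumber(a) - leadingNumber(b);
}

var sortable = ['100,000 Big Book', '20,000 Small Book', '7,500 Tiny Book'];

// The default sort compares text, so "100,000" lands before "20,000".
var textOrder = sortable.slice().sort();

// The comparator sorts on the numeric value instead.
var numericOrder = sortable.slice().sort(byLeadingNumber);

console.log(textOrder);
console.log(numericOrder);
```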
</OLD ANSWER>

You can start with something like this (just copy and paste it into the Firebug console):
// where are the things
var elem = document.getElementById("article").
getElementsByTagName("span")[1].
getElementsByTagName("span")[0];
// extract lines into array
var lines = []
elem.innerHTML.replace(/.+?\d+\s+copies\.\s*<br>/g,
function($0) { lines.push($0) });
// sort an array
// lines.sort(function(a, b) {
//     var ma = a.match(/(\d+),(\d+)\s+copies/);
//     var mb = b.match(/(\d+),(\d+)\s+copies/);
//
//     return parseInt(ma[1] + ma[2]) -
//         parseInt(mb[1] + mb[2]);
// });
lines.sort(function(a, b) {
    function getNum(p) {
        // Strip the thousands separators, then parse as a base-10 integer.
        return parseInt(
            p.match(/([\d,]+)\s+copies/)[1].replace(/,/g, ""), 10);
    }
    return getNum(a) - getNum(b);
});
// put it back
elem.innerHTML = lines.join("");

It's not clear to me what you're trying to do. When posting questions here, I encourage you to post (a part of) your actual data and clearly indicate what exactly you're trying to match.
But I am guessing you know very little regex, in which case, why use regex at all? If you study the topic a bit, you will soon see that regex is not some magical tool that produces whatever you're thinking of. Regex cannot sort anything. It simply matches text, that's all.
Have a look at this excellent on-line resource: http://www.regular-expressions.info/
And if, after reading, you think a regex solution to your problem is appropriate, feel free to elaborate on your question and I'm sure I, or someone else, will be able to give you a hand.
Best of luck.

Related

How to write regexp for finding :smile: in javascript?

I want to write a regular expression, in JavaScript, for finding substrings that start and end with :.
For example, from the string "hello :smile: :sleeping:" I need to find the substrings that start and end with the : character. I tried the expression below, but it didn't work:
^:.*\:$

My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g
function replaceEmojiTags(text) {
return text.replace(emojiPattern, function (tag, emotion) {
// The emotion will be the captured word between your tags,
// so either "smile" or "sleeping" in your example
//
// In this function you would take that emotion and return
// whatever you want based on the input parameter and the
// whole tag would be replaced
//
// As an example, let's say you had a bunch of GIF images
// for the different emotions:
return '<img src="/img/emoji/' + emotion + '.gif" />';
});
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to be changed since it is only looking for word characters. For this there is probably also a restriction such that the hyphen must join two words so that it shouldn't accept "-big-smile" or "big-smile-". For that you need to change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".
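To see which tags the hyphen-aware pattern accepts, here is a small sketch. The helper findEmotions and the test string are made up for illustration, and it relies on String.prototype.matchAll, which needs a reasonably modern engine.

```javascript
// Sketch: list the emotions the hyphen-aware pattern captures.
// findEmotions is a hypothetical helper, not part of the answer's code.
var emojiPattern = /:(\w+(-\w+)*):/g;

function findEmotions(text) {
  // matchAll yields one match object per tag; index 1 is the captured emotion.
  return Array.from(text.matchAll(emojiPattern), function (m) { return m[1]; });
}

// ":-big-smile:" and ":big-smile-:" are skipped because the hyphen must join two words.
var found = findEmotions('hello :smile: :big-smile: :-big-smile: :big-smile-:');
console.log(found);
```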

The ^ and $ are anchors (start and end respectively). These cause your regex to match only an entire string which starts with :, has anything in between, and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more so you'll be matching :: as well. It'll be better to change this to + which means one or more. In fact if you're just looking for text you may want to use a range [a-z0-9] with a case insensitive modifier.
If we put it all together we'll have a regex like this: /:([a-z0-9]+):/gmi
This matches a string beginning with :, followed by one or more alphanumeric characters, and ending in :, with the modifiers g (global), m (multi-line) and i (case-insensitive, for things like :FacePalm:). Since the pattern has no ^ or $, the m modifier has no effect here.
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.
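The points above about anchors and quantifiers can be checked quickly in a console. This is just a sketch; the variable names are arbitrary.

```javascript
var text = 'hello :smile: :sleeping:';

// With ^ and $ the whole string has to start and end with a colon.
var anchoredOnSentence = /^:.*\:$/.test(text);   // the sentence starts with "h"
var anchoredOnTag = /^:.*\:$/.test(':smile:');   // a lone tag does match

// * allows zero characters between the colons, + requires at least one.
var starAcceptsEmpty = /:.*:/.test('::');
var plusAcceptsEmpty = /:[a-z0-9]+:/i.test('::');

// Without anchors, the global flag collects every tag in the string.
var tags = text.match(/:([a-z0-9]+):/gi);

console.log(anchoredOnSentence, anchoredOnTag, starAcceptsEmpty, plusAcceptsEmpty, tags);
```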

How to replace with regex in javascript [duplicate]

I need to replace HYD and HYD. with HYDRAULIC
But as you see HYD. does not get converted. What am I doing wrong?
console.log("HYD", /\bHYD\b/gi.test("HYD")) // OK!
console.log("HYD,CYLINDER", /\bHYD\b/gi.test("HYD,CYLINDER")) // OK!
console.log("HYD,CYLINDER", /\bHYD\b/gi.test("HYD,CYLINDER")) // <- OK!
console.log("HYD. CYLINDER", /\bHYD\.\b/gi.test("HYD. CYLINDER")) // NOT OK! Did not recognize HYD.
console.log("HYD.,CYLINDER", /\bHYD\.\b/gi.test("HYD.,CYLINDER")) // <- NOT OK! I need to convert HYD. to HYDRAULIC.
// Example:
const abbreviation = "HYD.";
const expansion = "HYDRAULIC";
if(/\bHYD\.\b/gi.test("HYD.,CYLINDER")) { // as this does not return true, I can't go on to do the replacement
"HYD.,CYLINDER".replace(abbreviation, expansion)
}

The root cause of the problem is quite obvious: in between a . and ,, there is no word boundary, and thus, HYD., does not match the /\bHYD\.\b/ or /HYD\.?\b/ regexps.
Moreover, since you are building the regex pattern dynamically, you can't play around with alternatives much.
In this case, the easiest and most convenient solution is by using the unambiguous word boundary (?!\w), a lookahead that fails the match if there is a word char (letter, digit or _, this may be further customized) immediately to the right of the current position.
Thus, you need to build the pattern like this (considering that all the search values consist of word chars):
new RegExp("\\b" + val + "\\.?(?!\\w)")
Note that the backslashes need to be double escaped. The \\.? pattern will match 1 or 0 dots, and (?!\\w) will require a trailing word boundary.
Note that in case the search values can have special chars (non-word chars), you will need to use something like
new RegExp("(^|\\W)" + val.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "\\.?(?!\\w)")
and replace with "$1" + expansion.replace(/\$/g, '$$$$').
Yes, .replace(/\$/g, '$$$$') is a necessary action if you are replacing with dynamic literal replacement patterns (as the literal $ must be doubled inside replacement patterns in JS).
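Putting those pieces together, a minimal sketch could look like the following. The function names escapeRegExp and expandAbbreviation are hypothetical, the abbreviation is passed without its trailing dot, and it is assumed to start with a word character so that \b is usable at the front.

```javascript
// Sketch: expand an abbreviation, with or without a trailing dot.
// escapeRegExp and expandAbbreviation are hypothetical helper names.
function escapeRegExp(s) {
  // Escape regex metacharacters so the abbreviation is matched literally.
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

function expandAbbreviation(text, abbreviation, expansion) {
  // \b at the front, optional dot, then (?!\w) instead of a trailing \b.
  var pattern = new RegExp('\\b' + escapeRegExp(abbreviation) + '\\.?(?!\\w)', 'gi');
  // Double any $ so the expansion is inserted as literal text.
  return text.replace(pattern, expansion.replace(/\$/g, '$$$$'));
}

console.log(expandAbbreviation('HYD.,CYLINDER', 'HYD', 'HYDRAULIC'));
console.log(expandAbbreviation('HYD CYLINDER', 'HYD', 'HYDRAULIC'));
console.log(expandAbbreviation('HYDRANT', 'HYD', 'HYDRAULIC'));
```

Because the pattern is built without a test() call first, there is no need for the if around the replacement.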

REGEX - after bracket get data until end bracket

I have a string like the following:
SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)
Question
I am trying to make a regex to get just the information between the round brackets (parentheses), for example the end string would look like:
BI1 BI17 BI1234
I found this example on Stack Overflow, which gets the first value, BI1, but ignores the rest:
Get text between two rounded brackets
This is the regex I created from the above link: /\(([^)]+)\)/g. However, its matches include the brackets, which I want to remove.
I am using this website to attempt to solve this query which has a testing window to see if the regex entered works:
http://www.regexr.com
Additional Information
There can be any number of digits, which is why I have given 3 different examples.
This is one continuous string, not separate lines.
Thanks for any help on this matter.

A global match returns the brackets along with the text, but you can get just the contents with string#split and the following regex:
\).*?\(|^.*?\(|\).*?$
Yielding code that looks a bit like this:
function getBracketed(str) {
return str.split(/\).*?\(|^.*?\(|\).*?$/).filter(Boolean);
}
(You need to filter out the empty strings that'll appear at the beginning and end if you do it this way - hence the extra operation).

If you need to keep all inside parentheses and remove everything else, you might use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var result = str.replace(/.*?\(([^()]*)\)/g, " $1").trim();
console.log(result);
If you need to get only the BI+digits pattern inside parentheses, use
/.*?\((BI\d+)\)/g
Details:
.*? - match any 0+ chars other than linebreak symbols
\( - match a (
(BI\d+) - Group 1 capturing BI + 1 or more digits (\d+) (or [^()]* - zero or more chars other than ( and ))
\) - a closing ).
To get all the values as array (say, for later joining), use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var re = /\((BI\d+)\)/g;
var res = str.match(re).map(function(s) { return s.substring(1, s.length - 1); });
console.log(res);
console.log(res.join(" "));
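If your environment has String.prototype.matchAll, another option is to read capture group 1 directly instead of trimming the brackets afterwards. A minimal sketch, with bracketedValues as a made-up helper name:

```javascript
// Sketch: collect the text inside each pair of parentheses via capture group 1.
// bracketedValues is a hypothetical helper name.
function bracketedValues(str) {
  // matchAll needs the g flag; each match object exposes the group at index 1.
  return Array.from(str.matchAll(/\(([^()]+)\)/g), function (m) { return m[1]; });
}

var input = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var values = bracketedValues(input);

console.log(values);
console.log(values.join(" "));
```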

Regex for extensive phone number validation

I have a number of rules which I need to apply to a phone number input field, following is my attempt:
var positive_checks = new Array(
/^[0-9]{8}$/g // 1. Must have 8 digits exactly
);
var negative_checks = new Array(
/^[0147]/g, // 2. Must not start with 0,1,4 or 7
/^[9]{3}/g, // 3. Must not start with 999
/(.)\\1*$/g // 4. Must not be all the same number
);
for (i in positive_checks) {
if (str.search(positive_checks[i]) < 0) {
return false;
}
}
for (i in negative_checks) {
if (str.search(negative_checks[i]) >= 0) {
return false;
}
}
All rules are working except rule 4, which I don't fully understand, other than it uses back-references somehow. I think there was mention that the environment needs to allow for back-references, is Javascript such an environment?
Secondarily, I'd be interested to try and rework all rules so I only need to have a single rule array and loop and not need to check for negative checks, is that possible in each of these instances? Ultimately I'm looking for a Javascript solution, however being able to use regex for all 4 makes it nicer looking code in my opinion, and being form validation logic means that performance is not really an issue here.

Rule four probably doesn't work because of the doubled backslash in your backreference (inside a regex literal, \\1 matches a literal backslash followed by 1). I would also anchor it and change the * quantifier to +, meaning "one or more times":
/^(.)\1+$/g
Explanation:
^ # the beginning of the string
( # group and capture to \1:
. # any character except \n
) # end of \1
\1+ # what was matched by capture \1 (1 or more times)
$ # the end of the string
A one-liner that will validate all of your requirements:
var re = /^(?=.{8}$)(?!999|[0147]|(.)\1+$)[0-9]+$/
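Here is a quick way to try a single-regex version against the four rules. It is only a sketch: the names phonePattern and isValidPhone and the sample numbers are made up, and the same-digit check is anchored with $ inside the lookahead so that only numbers made up entirely of one digit are rejected.

```javascript
// Sketch: one regex for all four rules (no g flag, so test() keeps no state).
// phonePattern and isValidPhone are hypothetical names.
var phonePattern = /^(?=.{8}$)(?!999|[0147]|(.)\1+$)[0-9]+$/;

function isValidPhone(str) {
  return phonePattern.test(str);
}

var samples = ['23456789', '2345678', '12345678', '99912345', '22222222', '22345678'];
samples.forEach(function (s) {
  console.log(s, isValidPhone(s));
});
```

Leaving out the g flag matters when the same regex object is reused with test(), because a global regex remembers its lastIndex between calls.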

Use regexr.com/39khr and hover over the different parts of your expression to see what they do.
As you don't say what doesn't work, i.e. give examples of a number that is rejected but should be accepted, or the other way around, it's very hard to give you an answer.

How is RegEx handled differently in VBA and JavaScript?

I'm using a regular expression in Excel VBA to parse the results of a swim meet. The code reads a row of text that was copied from a PDF and outputs the important data into individual cells. Since the format of the string varies throughout the source PDF, the regular expression is quite complicated. Still, I'm able to parse 95% of the data at this point.
Some of the rows that are not being parsed are confusing me, though. VBA is clearly not able to find a match with the regular expression, but when I copy the exact same regex and string into this website, JavaScript is able to find a match without a problem. Is there something different in the way VBA and JavaScript handle regular expressions that might account for this?
Here's the string that VBA refuses to match:
12. NUNEZ CHENG, Walter 74 Club Tennis Las Terr 3:44.57 123
Here's the function I'm using in Excel (mostly successfully):
Function singleLineResults(SourceString As String) As Variant
Dim cSubmatches As Variant
Dim collectionArray(11) As String
Dim cnt As Integer
Dim oMatches As MatchCollection
With New RegExp
.MultiLine = MultiLine
.IgnoreCase = IgnoreCase
.Global = False
'1. JAROSOVA, Lenka 27 Swimmpower Prague 2:26.65 605 34.45 37.70 37.79 36.71
.Pattern = "(\d*)\.?\s?([^,]+),\s([^\d]+)\s?(\d+)\s((?:[A-Z]{3})?)\s?((?:(?!\d\:\d).)*)\s?((?:\d+:)?\d+\.\d+)(?:\s(\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:Splash Meet Manager 11, Build \d{5} Registered to [\w\s]+ 2014-08-\d+ \d+:\d+ - Page \d+)?$"
Set oMatches = .Execute(SourceString)
If oMatches.Count > 0 Then
For Each submatch In oMatches(0).SubMatches
collectionArray(cnt) = submatch '.Value
cnt = cnt + 1
Next
Else
singleLineResults = Null
End If
End With
singleLineResults = collectionArray()
End Function

Could you add more examples of what actually matches, e.g. the surrounding lines that match and, better yet, examples that are not supposed to match, if any?
I've tried to clean up the regex a bit, removing groups that are not used to match that particular line, to make the error more obvious, and changed how one of the groups works, which might actually fix the issue:
(\d*)
\.?\s?
([^,]+)
,\s
([^\d]+)
\s?
(\d+)
\s
(
(?:[A-Z]{3})?
)
\s?
(
# OLD SOLUTION
# (?:
# (?!\d\:\d)
# .
# )*
# NEW SOLUTION
.*?
)
\s?
(
(?:\d+:)?
\d+\.\d+
)
(?:
\s
(\d+)
)?
$
See example on regex101.
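For reference, this is how the simplified pattern behaves on the problem row in JavaScript. It's only a sketch for checking the pattern itself (the variable names are arbitrary) and says nothing about how the VBScript engine used by VBA treats it, so the same check is still worth running in Excel.

```javascript
// Sketch: the simplified pattern above, collapsed to one line, run in JavaScript.
var simplified = /(\d*)\.?\s?([^,]+),\s([^\d]+)\s?(\d+)\s((?:[A-Z]{3})?)\s?(.*?)\s?((?:\d+:)?\d+\.\d+)(?:\s(\d+))?$/;

var row = '12. NUNEZ CHENG, Walter 74 Club Tennis Las Terr 3:44.57 123';
var rowMatch = simplified.exec(row);

if (rowMatch) {
  // Index 0 is the whole match; the capture groups start at index 1.
  console.log(rowMatch.slice(1));
} else {
  console.log('no match');
}
```

Note that the first-name group keeps its trailing space, because [^\d]+ also consumes the space before the digits.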
The group that puzzles me the most, however, is this one:
(?:[A-Z]{3})?
Why the 3 character limit, when it only matches the first 3 letters in the street name?
