Grep a page in firefox using javascript


I'm writing an extension to provide grep functionality in Firefox. At my workplace we access all log files through a browser, and grep functionality would be ideal for filtering results, e.g. looking at only particular logging levels (INFO, WARN, ERROR).
I've set up the extension boilerplate.
I was wondering if I could get some hints on the required JavaScript. I'm after a function:
function grepPage(regex){
...
}
which would apply the regex to each line in the loaded text file in Firefox, and change the page to display only the lines that match.
This is the type of thing I could spend ages trying to work out, when I'm sure there would be simpler ways of doing this.
Any help would be highly appreciated.
Cheers,
Ben

Two ways to look at this.
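One, you could avoid re-inventing the wheel:
http://api.jquery.com/jQuery.grep/
(jQuery.grep filters an array with a callback, so you would still need to split the page text into lines first.)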
Two, you could whip up a quick function (this example doesn't require jQuery or other libraries).
function grepPage(regex) {
    // Accept either a RegExp or a pattern string such as "WARN|ERROR"
    if (!(regex instanceof RegExp)) { regex = new RegExp(regex); }
    // A plain-text log has no markup, so textContent gives the raw lines
    // (innerHTML would escape characters such as & and <)
    var lines = document.body.textContent.split("\n");
    var matches = [];
    for (var i = 0; i < lines.length; i++) {
        // Don't pass a /g regex here: test() would then carry lastIndex from line to line
        if (regex.test(lines[i])) { matches.push(lines[i]); }
    }
    // Now the 'matches' array contains all your matches; do as you will with it.
    return matches;
}
Warning, untested, but it should work. :)
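The question also asks for the page itself to show only the matching lines. A minimal sketch of that, assuming the log is opened as a plain-text page (Firefox typically renders those inside a single <pre>) and that the extension can run code against the page's document; grepLines and showOnlyMatchingLines are hypothetical names:

```javascript
// Pure part: keep only the lines of `text` that match `pattern`.
// `pattern` may be a RegExp or a string such as "WARN|ERROR".
function grepLines(text, pattern) {
  // Rebuild the regex without the g/y flags: with them, test() resumes
  // from lastIndex and can silently skip matching lines.
  const re = pattern instanceof RegExp
    ? new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""))
    : new RegExp(pattern);
  return text.split("\n").filter((line) => re.test(line));
}

// DOM part. Assumes the log is shown inside a <pre> (falls back to <body>).
let originalLog = null; // full log text, captured on the first call

function showOnlyMatchingLines(pattern, doc) {
  const target = doc.querySelector("pre") || doc.body;
  if (originalLog === null) {
    originalLog = target.textContent;
  }
  // Always filter the original text, so a second grep isn't applied
  // to output that was already filtered.
  target.textContent = grepLines(originalLog, pattern).join("\n");
}
```

From the extension you would call something like showOnlyMatchingLines(/WARN|ERROR/, document). Since new RegExp("") matches every line, showOnlyMatchingLines("", document) brings the full log back.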

Related

RegEx change function name and parameter of string

I'm awful with RegEx to begin with. Anyway I tried my best and think I got pretty far, but I'm not exactly there yet...
What I have:
A JavaScript source file that I need to process in Node.js. It can look like this:
var str = "require(test < 123)\n\nrequire(test2 !== test)\n\nfunction(dontReplaceThisParam) {\n console.log(dontReplaceThisParam)\n}";
What I came up with:
console.log(str.replace(/\(\s*([^)].+?)\s*\)/g, 'Debug$&, \'error_1\''))
There are a few problems:
I want the error string to go inside the parentheses so it acts as a second parameter.
All function calls (I think even everything with parentheses) get replaced, but only calls to "require(xxx)" should be touched.
Also, the error codes should somehow increment if possible...
So a string like "require(test == 123)" should convert to "requireDebug(test == 123, 'error_N')" but only calls to "require"...
What currently gets outputted by my code:
requireDebug(test < 123), 'error_1'
requireDebug(test2 !== test), 'error_1'
functionDebug(dontReplaceThisParam), 'error_1' {
console.logDebug(dontReplaceThisParam), 'error_1'
}
What I need:
requireDebug(test < 123, 'error_1')
requireDebug(test2 !== test, 'error_2')
function(dontReplaceThisParam) {
console.log(dontReplaceThisParam)
}
I know I could just do this manually, but we're talking about a few hundred source files. I also know this isn't a very good approach, but the debugger inside the require function isn't working, so I need to make my own debug function with an error code to locate the error. It's pretty much all I can do at the moment...
Any help is greatly appreciated!

Start the regex with require, and since you need an incrementing counter, pass a function as the second arg to replace, so that you can increment and insert the counter for each match.
var str = "require(test < 123)\n\nrequire(test2 !== test)\n\nfunction(dontReplaceThisParam) {\n console.log(dontReplaceThisParam)\n}";
var counter = 0;
console.log(str.replace(/require\(\s*([^)].+?)\s*\)/g, (match, arg) =>
`requireDebug(${arg}, 'error_${++counter}')`
));
Other than that, your code was unaltered.
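Because the question mentions a few hundred files, it may help to keep the numbering going from one file to the next so every error code stays unique. A sketch under that assumption (addDebugCodes is a hypothetical name); it also adds \b so that identifiers that merely end in "require" are left alone:

```javascript
// Hypothetical helper: rewrites every require(...) call in `source` to
// requireDebug(..., 'error_N'), numbering from startAt + 1.
// Returns the new source plus the last number used, so the numbering can
// continue in the next file.
function addDebugCodes(source, startAt = 0) {
  let counter = startAt;
  // \b keeps identifiers such as "myrequire(" out of the match; like the
  // regex above, this assumes the argument itself contains no ")".
  const code = source.replace(/\brequire\(\s*([^)]+?)\s*\)/g, (match, arg) =>
    `requireDebug(${arg}, 'error_${++counter}')`
  );
  return { code, lastCode: counter };
}
```

Passing one file's lastCode as the startAt of the next call numbers the whole project consecutively.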

How to make indexOf only match 'hi' as a match and not 'hirandomstuffhere'?

Some time ago I was playing around with a Steam bot and made it auto-reply when you said things in an array, i.e. a 'hello-triggers' array containing things like "hi", "hello" and so on. I made it so that whenever it received a message, it would check for matches using indexOf(), and everything worked fine until I noticed it would treat 'hiasodkaso' or 'hidemyass' as a "hi" trigger.
So it would match anything that contained the word even if it was in the middle of a word.
How would I go about making indexOf only notice it if it's the exact word, and not something else in the same word?
I do not have the script that I use but I will make an example that is pretty much like it:
var hiTriggers = ['hi', 'hello', 'yo'];
// here goes the receiving message function and what not, then:
for(var i = 0; i < hiTriggers.length; i++) {
if(message.indexOf(hiTriggers[i]) >= 0) {
bot.sendMessage(SteamID, randomHelloMsg[Math.floor(Math.random() * randomHelloMsg.length)]); // randomHelloMsg is already defined
}
}
Regex wouldn't be used for this, right? As it is to be used for expressions or whatever. (my English isn't awesome, ikr)
Thanks in advance. If I wasn't clear enough on something, please let me know and I'll edit/formulate it in another way! :)

You can extend prototype:
String.prototype.regexIndexOf = function(regex, startpos) {
var indexOf = this.substring(startpos || 0).search(regex);
return (indexOf >= 0) ? (indexOf + (startpos || 0)) : indexOf;
}
and do:
var foo = "hia hi hello";
foo.regexIndexOf(/\bhi\b/); // 4
Or if you don't want to extend the String object:
foo.search(/\bhi\b/); // 4
Both examples were adapted from the top answers to "Is there a version of JavaScript's String.indexOf() that allows for regular expressions?" (\b on both sides marks a word boundary, so the "hi" inside "hia" is skipped.)

"Regex wouldn't be used for this, right? As it is to be used for expressions or whatever. (my English isn't awesome, ikr)"
Actually, regex is for any old pattern matching. It's absolutely useful for this.
fmsf's answer should work for what you're trying to do; however, in general, extending native objects' prototypes is frowned upon, afaik. You can easily break libraries by doing so, so I'd avoid it when possible. In this case you could use his regexIndexOf function by itself, or in concert with something like:
//takes a message and a word, and returns the index of the word as a whole word (or -1)
//assumes the word contains no regex special characters
function regexIndexWord(message, word){
    return message.search(new RegExp("\\b" + word + "\\b"));
}
Which would let you search based on your array of words without having to add the special symbols to each one.
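Putting the pieces together for the bot: instead of looping over the triggers with indexOf, the whole array can be turned into a single word-boundary pattern. A sketch; escapeRegExp, makeTriggerRegex and isGreeting are hypothetical names, and the case-insensitive i flag is an assumption (indexOf was case-sensitive):

```javascript
// Escape regex metacharacters so each trigger is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds one pattern like /\b(?:hi|hello|yo)\b/i from a list of words.
// The "i" flag (case-insensitive) is an assumption, not part of the question.
function makeTriggerRegex(words) {
  return new RegExp("\\b(?:" + words.map(escapeRegExp).join("|") + ")\\b", "i");
}

const hiTriggers = ["hi", "hello", "yo"];
const hiRegex = makeTriggerRegex(hiTriggers);

// Replaces the indexOf loop: true only when a trigger appears as a whole word.
function isGreeting(message) {
  return hiRegex.test(message);
}
```

Note that \b only treats ASCII letters, digits and underscores as word characters, so a trigger followed by punctuation ("hi!") still counts as a whole word.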

How to get all possible overlapping matches for a string

I'm working on the MIU system problem from "Gödel, Escher, Bach" chapter 2.
One of the rules states
Rule III: If III occurs in one of the strings in your collection, you may make a new string with U in place of III.
Which means that the string MIII can become MU, but for other, longer strings there may be multiple possibilities [matches in brackets]:
MIIII could yield
M[III]I >> MUI
MI[III] >> MIU
MUIIIUIIIU could yield
MU[III]UIIIU >> MUUUIIIU
MUIIIU[III]U >> MUIIIUUU
MUIIIIU could yield
MU[III]IU >> MUUIU
MUI[III]U >> MUIUU
Clearly regular expressions such as /(.*)III(.*)/ are helpful, but I can't seem to get them to generate every possible match, just the first one it happens to find.
Is there a way to generate every possible match?
(Note, I can think of ways to do this entirely manually, but I am hoping there is a better way using the built in tools, regex or otherwise)
(Edited to clarify overlapping needs.)

Here's the regex you need: /III/g - simple enough, right? Now here's how you use it:
var text = "MUIIIUIIIU", find = "III", replace = "U",
regex = new RegExp(find,"g"), matches = [], match;
while(match = regex.exec(text)) {
matches.push(match);
regex.lastIndex = match.index+1;
}
That regex.lastIndex... line overrides the usual regex behaviour of not matching results that overlap. Also, I'm using the RegExp constructor to make this more flexible; you could even build it into a function this way.
Now you have an array of match objects, you can do this:
matches.forEach(function(m) { // older browsers need a shim or old-fashioned for loop
console.log(text.substr(0,m.index)+replace+text.substr(m.index+find.length));
});
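For the MIU puzzle itself, the same lastIndex trick can be wrapped in a function that returns every string Rule III produces. A sketch (applyRuleIII is a hypothetical name) with the pattern and replacement fixed to III and U:

```javascript
// Returns every string obtainable from `s` by one application of Rule III
// (replace one occurrence of III with U), including overlapping occurrences.
function applyRuleIII(s) {
  const re = /III/g;
  const results = [];
  let m;
  while ((m = re.exec(s)) !== null) {
    // Keep what is before and after this particular III, put U in between.
    results.push(s.slice(0, m.index) + "U" + s.slice(m.index + 3));
    // Restart the search one character later instead of after the whole
    // match, so overlapping occurrences (as in IIII) are found too.
    re.lastIndex = m.index + 1;
  }
  return results;
}
```

For example, applyRuleIII("MIIII") yields ["MUI", "MIU"], matching the two results listed in the question.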

Sometimes regexes are overkill. In your case a simple indexOf might be fine too!
Here is, admittedly, a hack, but you can transform it into pretty, reusable code on your own:
var s = "MIIIIIUIUIIIUUIIUIIIIIU";
var results = [];
for (var i = 0; true; i += 1) {
i = s.indexOf("III", i);
if (i === -1) {
break;
}
results.push(i);
}
console.log("Match positions: " + JSON.stringify(results));
It takes care of overlaps just fine, and at least to me, the indexOf just looks simpler.
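One way to turn the indexOf hack into reusable code, as a sketch (overlappingIndexes is a hypothetical name). It advances by one character after each hit rather than by the needle's length, which is what lets overlapping matches through:

```javascript
// Returns the start index of every occurrence of `needle` in `haystack`,
// including overlapping ones, using only String.prototype.indexOf.
function overlappingIndexes(haystack, needle) {
  const results = [];
  // indexOf("") never returns -1, so an empty needle would loop forever.
  if (needle === "") return results;
  let i = haystack.indexOf(needle);
  while (i !== -1) {
    results.push(i);
    i = haystack.indexOf(needle, i + 1); // step one char, not needle.length
  }
  return results;
}
```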

non-casesensitive str.search(array[i]) in <script> tag is not working

I have a problem using an array of keywords with a case-insensitive search in innerHTML.
I am trying to write a Greasemonkey script that will remove <script> tags containing certain keywords:
function removebadcriptts() {
var scriptslinks = ['jumper.php','redirect.php'];
var theLinks = document.getElementsByTagName("script");
for (var i=0; i<scriptslinks.length; i++)
{
for (var j=0;j<theLinks.length;j++)
{
if (theLinks[i].innerHTML.search("/"+scriptslinks[i]+"/i/") !== -1)
// /keyword/i (case-insensitive regular expression) is not working here
{
console.error("InnerHTML Keyword found ");
theLinks[j].parentNode.removeChild(theLinks[j]);
}
else
{
console.error("InnerHTML Keyword not found ");
}
}
}
}
Can anybody help me match and remove this kind of script from the web page? Also, how can I catch scripts that inject other scripts into the loaded page?

When the Greasemonkey script runs, the scripts that are already on the page have already run. Removing them won't undo what these scripts did to the page.
Also, scripts that are inserted after the Greasemonkey script runs will not be caught, so this Greasemonkey script probably will not work.
An alternative would be the NoScript add-on, because it is already designed to prevent scripts from running.
Edit: As the OP said that the primary problem is to make the search work, instead of storing patterns in a string inside an array, you can store patterns directly.
var scriptslinks = [/jumper\.php/i, /redirect\.php/i];
And when matching
if (theLinks[j].innerHTML.search(scriptslinks[i]) !== -1)
Note that a regex is passed directly to the search function. Also theLinks[i] should be theLinks[j].
Another solution: Use a single pattern.
if (theLinks[j].innerHTML.search(/jumper\.php|redirect\.php/i) !== -1)
That way you don't need two nested for loops, and I think it will be faster, as the engine can search for both patterns at once.
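A sketch of the whole function with the single-pattern fix applied. Two things go beyond the answer and are assumptions: it checks each script's src as well as its text, and it walks the list backwards, because getElementsByTagName returns a live collection and removing an element while looping forward skips the one after it. As the answer notes, removing a script does not undo what it has already done. isBadScript and removeBadScripts are hypothetical names:

```javascript
// One case-insensitive pattern for every keyword; "\." matches a literal dot.
const BAD_SCRIPT_RE = /jumper\.php|redirect\.php/i;

// Pure check, kept separate so it can be tested without a DOM.
function isBadScript(src, body) {
  return BAD_SCRIPT_RE.test(src || "") || BAD_SCRIPT_RE.test(body || "");
}

function removeBadScripts(doc) {
  const scripts = doc.getElementsByTagName("script"); // live collection
  // Walk backwards: removing an element shifts the indexes of later ones,
  // so a forward loop would skip the script right after each removal.
  for (let j = scripts.length - 1; j >= 0; j--) {
    const s = scripts[j];
    if (isBadScript(s.src, s.textContent)) {
      s.parentNode.removeChild(s);
    }
  }
}
```

In the user script you would call removeBadScripts(document).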

search expects a RegExp object. So try this:
theLinks[j].innerHTML.search(new RegExp(scriptslinks[i], "i"))
Although you can pass a string too, it is only used to create a RegExp object as in new RegExp(string), and you can't set the i modifier that way.
Furthermore, you should escape the special characters of a regular expression, such as the dot (.). You can use this method to do so:
RegExp.quote = function(str) {
return str.replace(/(?=[\\^$*+?.()|{}[\]])/g, "\\");
}
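If you'd rather keep the keywords as plain strings, here is a sketch of how the escaping fits in. quoteRegExp and containsKeyword are hypothetical names; quoteRegExp uses the same lookahead replacement as RegExp.quote, just without adding a property to RegExp:

```javascript
// Inserts a backslash before every regex metacharacter (zero-width lookahead).
function quoteRegExp(str) {
  return str.replace(/(?=[\\^$*+?.()|{}[\]])/g, "\\");
}

const scriptslinks = ["jumper.php", "redirect.php"];
// One case-insensitive RegExp per keyword, with "." matched literally.
const patterns = scriptslinks.map((k) => new RegExp(quoteRegExp(k), "i"));

function containsKeyword(text) {
  return patterns.some((re) => text.search(re) !== -1);
}
```

Without the escaping, "jumper.php" would also match text such as "jumperXphp", because an unescaped dot matches any character.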

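Do you mean something like this, matching each script's src attribute instead of its text?
function removebadcriptts() {
    var scriptslinks = ['jumper.php','redirect.php'];
    var theLinks = document.getElementsByTagName("script");
    for (var i=0; i<scriptslinks.length; i++)
    {
        // theLinks is a live collection: loop backwards so a removal doesn't skip the next script
        for (var j=theLinks.length-1; j>=0; j--)
        {
            // use j (not i) to pick the script; lower-casing src makes the match case-insensitive
            if (theLinks[j].src.toLowerCase().indexOf(scriptslinks[i]) !== -1)
            {
                console.error("SRC Keyword found ");
                theLinks[j].parentNode.removeChild(theLinks[j]);
            }
            else
            {
                console.error("SRC Keyword not found ");
            }
        }
    }
}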

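Try using .match(). The regex has to be built with the RegExp constructor, because a literal like /scriptslink[i]/i matches the text "scriptslink" followed by an "i", not the array element:
theLinks[j].innerHTML.match(new RegExp(scriptslinks[i], "i"))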

Javascript: Whitespace Characters being Removed in Chrome (but not Firefox)

Why would the code below eliminate the whitespace around matched keyword text when replacing it with an anchor link? Note that this error only occurs in Chrome, not in Firefox.
For complete context, the file is located at: http://seox.org/lbp/lb-core.js
To view the code in action (no errors found yet), the demo page is at http://seox.org/test.html. Copying and pasting the first paragraph into a rich text editor (e.g. Dreamweaver, or Gmail with the rich text editor turned on) will reveal the problem, with words bunched together. Pasting it into a plain text editor will not.
// Find page text (not in links) -> doxdesk.com
function findPlainTextExceptInLinks(element, substring, callback) {
for (var childi= element.childNodes.length; childi-->0;) {
var child= element.childNodes[childi];
if (child.nodeType===1) {
if (child.tagName.toLowerCase()!=='a')
findPlainTextExceptInLinks(child, substring, callback);
} else if (child.nodeType===3) {
var index= child.data.length;
while (true) {
index= child.data.lastIndexOf(substring, index);
if (index===-1 || limit.indexOf(substring.toLowerCase()) !== -1)
break;
// don't match an alphanumeric char
var dontMatch =/\w/;
if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
break;
// alert(child.nodeValue.charAt(index+keyword.length + 1));
callback.call(window, child, index)
}
}
}
}
// Linkup function, call with various type cases (below)
function linkup(node, index) {
node.splitText(index+keyword.length);
var a= document.createElement('a');
a.href= linkUrl;
a.appendChild(node.splitText(index));
node.parentNode.insertBefore(a, node.nextSibling);
limit.push(keyword.toLowerCase()); // Add the keyword to memory
urlMemory.push(linkUrl); // Add the url to memory
}
// lower case (already applied)
findPlainTextExceptInLinks(lbp.vrs.holder, keyword, linkup);
Thanks in advance for your help. I'm nearly ready to launch the script, and will gladly comment in kudos to you for your assistance.

It's not anything to do with the linking functionality; it happens to copied links that are already on the page too, and the credit content, even if the processSel() call is commented out.
It seems to be a weird bug in Chrome's rich text copy function. The content in the holder is fine; if you cloneContents the selected range and alert its innerHTML at the end, the whitespaces are clearly there. But whitespaces just before, just after, and at the inner edges of any inline element (not just links!) don't show up in rich text.
Even if you add new text nodes to the DOM containing spaces next to a link, Chrome swallows them. I was able to make it look right by inserting non-breaking spaces:
var links= lbp.vrs.holder.getElementsByTagName('a');
for (var i= links.length; i-->0;) {
links[i].parentNode.insertBefore(document.createTextNode('\xA0 '), links[i]);
links[i].parentNode.insertBefore(document.createTextNode(' \xA0'), links[i].nextSibling);
}
but that's pretty ugly, should be unnecessary, and doesn't fix up other inline elements. Bad Chrome!
var keyword = links[i].innerHTML.toLowerCase();
It's unwise to rely on innerHTML to get text from an element, as the browser may escape or not-escape characters in it. Most notably &, but there's no guarantee over what characters the browser's innerHTML property will output.
As you seem to be using jQuery already, grab the content with text() instead.
var isDomain = new RegExp(document.domain, 'g');
if (isDomain.test(linkUrl)) { ...
That'll fail every second time, because global regexps remember their previous state (lastIndex): when used with methods like test, you're supposed to keep calling repeatedly until they return no match.
You don't seem to need g (multiple matches) here... but then you don't seem to need regexp here either as a simple String indexOf would be more reliable. (In a regexp, each . in the domain would match any character in the link.)
Better still, use the URL decomposition properties on Location to do a direct comparison of hostnames, rather than crude string-matching over the whole URL:
if (location.hostname===links[i].hostname) { ...
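The three points above can be checked quickly outside the page. A small sketch using a sample URL on the question's domain (the path is made up); the URL constructor stands in for the hostname property of a link element so the sketch also runs outside a browser:

```javascript
// Sample link on the question's domain (the path is made up).
const linkUrl = "http://seox.org/page";

// 1. A global RegExp keeps lastIndex between test() calls, so the same
//    check on the same string alternates between true and false.
const isDomain = new RegExp("seox.org", "g");
const results = [isDomain.test(linkUrl), isDomain.test(linkUrl), isDomain.test(linkUrl)];
console.log(results); // [ true, false, true ]

// 2. Unescaped, the "." in the domain matches any character.
const looseMatch = new RegExp("seox.org").test("http://seoxXorg/");
console.log(looseMatch); // true

// 3. Comparing hostnames avoids both problems. In the page you would compare
//    location.hostname with links[i].hostname.
const sameHost = new URL(linkUrl).hostname === "seox.org";
console.log(sameHost); // true
```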

// don't match an alphanumeric char
var dontMatch =/\w/;
if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
break;
If you want to match words on word boundaries, and case insensitively, I think you'd be better off using a regex rather than plain substring matching. That'd also save doing four calls to findText for each keyword as it is at the moment. You can grab the inner bit (in if (child.nodeType==3) { ...) of the function in this answer and use that instead of the current string matching.
The annoying thing about making regexps from string is adding a load of backslashes to the punctuation, so you'll want a function for that:
// Backslash-escape string for literal use in a RegExp
//
function RegExp_escape(s) {
return s.replace(/([/\\^$*+?.()|[\]{}])/g, '\\$1')
};
var keywordre= new RegExp('\\b'+RegExp_escape(keyword)+'\\b', 'gi');
You could even do all the keyword replacements in one go for efficiency:
var keywords= [];
var hrefs= [];
for (var i=0; i<links.length; i++) {
...
var text= $(links[i]).text();
keywords.push('(\\b'+RegExp_escape(text)+'\\b)');
hrefs.push(links[i].href);
}
var keywordre= new RegExp(keywords.join('|'), 'gi');
and then for each match in linkup, check which match group has non-zero length and link with the hrefs entry of the same number (group n corresponds to hrefs[n-1]).
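A sketch of that combined approach, run against a plain string rather than DOM text nodes. findKeywordLinks is a hypothetical name, and it repeats RegExp_escape from above so the block is self-contained. In JavaScript a group that didn't take part in the match comes back as undefined, so that is what the check looks for:

```javascript
// Backslash-escape string for literal use in a RegExp (as above)
function RegExp_escape(s) {
  return s.replace(/([/\\^$*+?.()|[\]{}])/g, '\\$1');
}

// Hypothetical helper: finds every keyword hit in `text` and pairs it with
// the href at the same position in `hrefs`.
function findKeywordLinks(text, keywords, hrefs) {
  // An empty alternation would match the empty string and never advance.
  if (keywords.length === 0) return [];
  const re = new RegExp(
    keywords.map((k) => '(\\b' + RegExp_escape(k) + '\\b)').join('|'), 'gi');
  const hits = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // Group g (1-based) belongs to keywords[g - 1] and hrefs[g - 1].
    for (let g = 1; g < m.length; g++) {
      if (m[g] !== undefined) {
        hits.push({ index: m.index, text: m[0], href: hrefs[g - 1] });
        break;
      }
    }
  }
  return hits;
}
```

In linkup, the index and matched text of each hit give the position for splitText, and href gives the link target.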

I'd like to help you more, but it's hard to guess without being able to test it. I suppose you can get around it by adding space-like characters, such as non-breaking spaces, around your links.
By the way, this feature of yours that adds helpful links on copying is really interesting.
