Regex efficiency, better way to select text among html


This is about a Chrome Extension.
Suppose a user selects any text on a page, then clicks a button to save it. Via window.getSelection() I can get that text without the underlying HTML markup.
I store that text. For demo purposes, let's say the text is:
"John was much more likely to buy if he knew the price beforehand"
The next time the user visits the page, I want to find that text on the page. The issue is, the html for that text is actually:
<b>John was much more likely to buy if he knew the price <span class="italic">beforehand</span></b>
The second issue is that this system needs to work even if the selection is dirty, i.e. it starts/ends mid DOM node.
What I've built is a bit of a fat solution, so I am curious how I can make it more efficient and/or smaller. This is the whole thing:
text.split("").map(function(el, i, arr){
if(specials.includes(el)){
return "\\"+el;
}
return el;
})
.join("(?:\\s*<[^>]+>\\s*)*\\s*");
where text is the saved text and specials is
var specials = [
'/', '.', '*', '+', '?', '|',
'(', ')', '[', ']', '{', '}', '\\'
];
The process is:
Split text into single characters
For each character, check whether it's a special character and, if so, prepend it with \
Join all characters together with a regex that allows any whitespace or HTML tags in between
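For reference, the three steps above can be sketched a bit more compactly by testing each character against one escape regex instead of an array lookup. This is only a sketch of the same idea, not a faster algorithm; the names GAP, escapeRegExpChar and buildTagTolerantPattern are made up for illustration, and unlike the specials list it also escapes ^ and $.

```javascript
// Sketch only: helper names are hypothetical, not part of the original code.
// GAP is the same "optional tags and whitespace" fragment used in the question.
var GAP = "(?:\\s*<[^>]+>\\s*)*\\s*";

// Escape a single character if it has a special meaning in a regex.
function escapeRegExpChar(ch) {
  return /[\\^$.*+?()[\]{}|\/]/.test(ch) ? "\\" + ch : ch;
}

// Build a RegExp that matches `text` even when tags/whitespace sit between characters.
function buildTagTolerantPattern(text) {
  return new RegExp(text.split("").map(escapeRegExpChar).join(GAP));
}

var html = '<b>John was much more likely to buy if he knew the price ' +
           '<span class="italic">beforehand</span></b>';
var saved = "John was much more likely to buy if he knew the price beforehand";
console.log(buildTagTolerantPattern(saved).test(html));
```

The pattern still grows with the length of the saved text, so the "bruteforcing" concern in the question applies to this version as well.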
My question is, can it be done in a better way? I get the "bruteforcing" feeling with this solution and I don't know if it would actually cause lag on larger sites/selection texts.
Plus, it doesn't work for SPAs where text may update a bit after the DOM is ready.
Thank you for any input.
EDIT:
So initially I was using mark.js, which didn't handle this at all, but not 12 hours after I posted this question the maintainer released v8.0.0, which uses NodeList and handles my use case. The feature is "acrossElements", located here.

1. Create a Range object.
2. Set it so that it spans the entire document from start to end.
3. Check if the string of interest is in its toString().
4. Clone the range twice.
5. Apply binary search by moving the start/end points of the subranges to roughly their midpoint. This can be approximated by finding the first descendant with > 1 child nodes and then splitting the child list.
6. Go to 3.
This should take roughly n log m steps, where n is the document text length and m the number of nodes.

Build the entire text representation of the document manually from each node with nodeType of Node.TEXT_NODE, saving the node reference and its text's start/end positions relative to the overall string in an array. Do it just once as DOM is slow, and you might want to search for multiple strings. Otherwise the other answer might be much faster (without actual benchmarks it's a moot point).
Apply HTML whitespace coalescing rules.
Otherwise you'll end up with huge spans of spaces and newline characters.
For example, Range.toString() doesn't strip them, meaning you'd have to convert your string to a RegExp with [\s\n\r]+ instead of spaces and all other special characters like {}()[]|^$*.?+ escaped.
Anyway, it'd be wise to use the converted RegExp on document.body.textContent before proceeding (easy to implement, many examples on the net, thus not included below).
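As a rough sketch of that string-to-RegExp conversion (the function name toWhitespaceTolerantRegExp is my own, and this assumes runs of whitespace in the saved string should match any run of whitespace in the page text):

```javascript
// Sketch only: the function name is hypothetical.
// Escapes regex metacharacters, then lets every whitespace run match \s+.
function toWhitespaceTolerantRegExp(str) {
  var escaped = str.trim()
    .replace(/[\\^$.*+?()[\]{}|]/g, "\\$&") // escape special characters
    .replace(/\s+/g, "\\s+");               // any whitespace run -> \s+
  return new RegExp(escaped);
}

var re = toWhitespaceTolerantRegExp("knew the  price (beforehand)?");
console.log(re.source);
console.log(re.test("he knew\n\t the price   (beforehand)? ok"));
```

In a page you would then run something like re.test(document.body.textContent) as a cheap pre-check before doing any per-node work.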
A simplified implementation for plain-string search follows.
function TextMap(baseElement) {
this.baseElement = baseElement || document.body;
var textArray = [], textNodes = [], textLen = 0, collapseSpace = true;
var walker = document.createTreeWalker(this.baseElement, NodeFilter.SHOW_TEXT);
while (walker.nextNode()) {
var node = walker.currentNode;
var nodeText = node.textContent;
var parentName = node.parentNode.localName;
if (parentName==='noscript' || parentName==='script' || parentName==='style') {
continue;
}
if (parentName==='textarea' || parentName==='pre') {
nodeText = nodeText.replace(/^(\r\n|[\r\n])/, '');
collapseSpace = false;
} else {
nodeText = nodeText.replace(/^[\s\r\n]+/, collapseSpace ? '' : ' ')
.replace(/[\s\r\n]+$/, ' ');
collapseSpace = nodeText.endsWith(' ');
}
if (nodeText) {
var len = nodeText.length;
textArray.push(nodeText);
textNodes.push({
node: node,
start: textLen,
end: textLen + len - 1,
});
textLen += len;
}
}
this.text = textArray.join('');
this.nodeMap = textNodes;
}
TextMap.prototype.indexOf = function(str) {
var pos = this.text.indexOf(str);
if (pos < 0) {
return [];
}
var index1 = this.bisectLeft(pos);
var index2 = this.bisectRight(pos + str.length - 1, index1);
return this.nodeMap.slice(index1, index2 + 1)
.map(function(info) { return info.node });
}
TextMap.prototype.bisect =
TextMap.prototype.bisectLeft = function(pos) {
var a = 0, b = this.nodeMap.length - 1;
while (a < b - 1) {
var c = (a + b) / 2 |0;
if (this.nodeMap[c].start > pos) {
b = c;
} else {
a = c;
}
}
return this.nodeMap[b].start > pos ? a : b;
}
TextMap.prototype.bisectRight = function(pos, startIndex) {
var a = startIndex |0, b = this.nodeMap.length - 1;
while (a < b - 1) {
var c = (a + b) / 2 |0;
if (this.nodeMap[c].end > pos) {
b = c;
} else {
a = c;
}
}
return this.nodeMap[a].end >= pos ? a : b;
}
Usage:
var textNodes = new TextMap().indexOf('<span class="italic">');
When executed on this question's page:
[text, text, text, text, text, text]
Those are text nodes, so to access corresponding DOM elements use the standard .parentNode:
var textElements = textNodes.map(function(n) { return n.parentNode });
Array[6]
  0: span.tag
  1: span.pln
  2: span.atn
  3: span.pun
  4: span.atv
  5: span.tag
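To see how the start/end bookkeeping maps a match back to nodes without needing a DOM, here is a small sketch that uses plain objects in place of text nodes. The data and the names findEntry and entriesFor are made up for illustration; the binary search is a simplified variant, not the bisectLeft/bisectRight code above.

```javascript
// Sketch only: plain strings stand in for text nodes (hypothetical data).
var pieces = [
  "John was much more likely to buy if he knew the price ",
  "beforehand",
  " and so on"
];

// Record each piece's inclusive start/end offsets within the joined text.
var nodeMap = [];
var offset = 0;
pieces.forEach(function (text, i) {
  nodeMap.push({ id: i, start: offset, end: offset + text.length - 1 });
  offset += text.length;
});
var fullText = pieces.join("");

// Binary search: index of the first entry whose end offset is >= pos.
function findEntry(map, pos) {
  var lo = 0, hi = map.length - 1;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (map[mid].end < pos) { lo = mid + 1; } else { hi = mid; }
  }
  return lo;
}

// Ids of all pieces that a match of `str` overlaps (empty array if not found).
function entriesFor(str) {
  var at = fullText.indexOf(str);
  if (at < 0) { return []; }
  var first = findEntry(nodeMap, at);
  var last = findEntry(nodeMap, at + str.length - 1);
  return nodeMap.slice(first, last + 1).map(function (e) { return e.id; });
}

console.log(entriesFor("price beforehand")); // ids 0 and 1
```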

Related

Parse semi-structured values

it's my first question here. I tried to find an answer but couldn't, honestly, figure out which terms should I use, so sorry if it has been asked before.
Here it goes:
I have thousands of records in a .txt file, in this format:
(1, 3, 2, 1, 'John (Finances)'),
(2, 7, 2, 1, 'Mary Jane'),
(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),
... and so on. The first value is the PK, the other 3 are Foreign Keys, the 5th is a string.
I need to parse them as JSON (or something) in Javascript, but I'm having troubles because some strings have parentheses+comma (on 3rd record, "Janitor", e.g.), so I can't use substring... maybe trimming the right part, but I was wondering if there is some smarter way to parse it.
Any help would be really appreciated.
Thanks!

You can't (read: probably shouldn't) use a regular expression for this. What if the parentheses contain another pair or one is mismatched?
The good news is that you can easily construct a tokenizer/parser for this.
The idea is to keep track of your current state and act accordingly.
Here is a sketch of a parser I've just written; the point is to show you the general idea. Let me know if you have any conceptual questions about it.
It works (demo here), but I beg you not to use it in production before understanding and patching it.
How it works
So, how do we build a parser:
var State = { // remember which state the parser is at.
BeforeRecord:0, // at the (
DuringInts:1, // at one of the integers
DuringString:2, // reading the name string
AfterRecord:3 // after the )
};
We'll need to keep track of the output, and the current working object since we'll parse these one at a time.
var records = []; // to contain the results
var state = State.BeforeRecord;
Now, we iterate the string, keep progressing in it and read the next character
for(var i = 0;i < input.length; i++){
if(state === State.BeforeRecord){
// handle logic when in (
}
...
if(state === State.AfterRecord){
// handle that state
}
}
Now, all that's left is to consume it into the object at each state:
If it's at ( we start parsing and skip any whitespaces
Read all the integers and ditch the ,
After four integers, read the string from ' to the next ' reaching the end of it
After the string, read until the ) , store the object, and start the cycle again.
The implementation is not very difficult either.
The parser
var State = { // keep track of the state
BeforeRecord:0,
DuringInts:1,
DuringString:2,
AfterRecord:3
};
var records = []; // to contain the results
var state = State.BeforeRecord;
var input = " (1, 3, 2, 1, 'John (Finances)'), (2, 7, 2, 1, 'Mary Jane'), (3, 7, 3, 2, 'Gerald (Janitor), Broflowski')," // sample input
var workingRecord = {}; // what we're reading into.
for(var i = 0;i < input.length; i++){
var token = input[i]; // read the current input
if(state === State.BeforeRecord){ // before reading a record
if(token === ' ') continue; // ignore whitespaces between records
if(token === '('){ state = State.DuringInts; continue; }
throw new Error("Expected ( before new record");
}
if(state === State.DuringInts){
if(token === ' ') continue; // ignore whitespace
for(var j = 0; j < 4; j++){
if(token === ' ') {token = input[++i]; j--; continue;} // ignore whitespace
var curNum = '';
while(token != ","){
if(!/[0-9]/.test(token)) throw new Error("Expected number, got " + token);
curNum += token;
token = input[++i]; // get the next token
}
workingRecord[j] = Number(curNum); // set the data on the record
token = input[++i]; // remove the comma
}
state = State.DuringString;
continue; // progress the loop
}
if(state === State.DuringString){
if(token === ' ') continue; // skip whitespace
if(token === "'"){
var str = "";
token = input[++i];
var lenGuard = 1000;
while(token !== "'"){
str+=token;
if(lenGuard-- === 0) throw new Error("Error, string length bounded by 1000");
token = input[++i];
}
workingRecord.str = str;
token = input[++i]; // remove )
state = State.AfterRecord;
continue;
}
}
if(state === State.AfterRecord){
if(token === ' ') continue; // ignore whitespace
if(token === ',') { // got the "," between records
state = State.BeforeRecord;
records.push(workingRecord);
workingRecord = {}; // new record;
continue;
}
throw new Error("Invalid token found " + token);
}
}
console.log(records); // logs [Object, Object, Object]
// each object has four numbers and a string, for example
// records[0][0] is 1, records[0][1] is 3 and so on,
// records[0].str is "John (Finances)"

I echo Ben's sentiments about regular expressions usually being bad for this, and I completely agree with him that tokenizers are the best tool here.
However, given a few caveats, you can use a regular expression here. This is because any ambiguities in your (, ), , and ' can be attributed (AFAIK) to your final column; as all of the other columns will always be integers.
So, given:
The input is perfectly formed (with no unexpected (, ), , or ').
Each record is on a new line, per your edit
The only new lines in your input will be to break to the next record
... the following should work (Note "new lines" here are \n. If they're \r\n, change them accordingly):
var input = /* Your input */;
var output = input.split(/\n/g).map(function (cols) {
cols = cols.match(/^\((\d+), (\d+), (\d+), (\d+), '(.*)'\)/).slice(1);
return cols.slice(0, 4).map(Number).concat(cols[4]);
});
The code splits on new lines, then goes through row by row and splits into cells using a regular expression, which greedily attributes as much as it can to the final cell. It then turns the first 4 elements into integers, and sticks the 5th element (the string) onto the end.
This gives you an array of records, where each record is itself an array. The first 4 elements are your PK and foreign keys (as integers) and your 5th element is the string.
For example, given your input, use output[2][4] to get "Gerald (Janitor), Broflowski", and output[1][0] to get the PK 2 of the second record (don't forget JavaScript arrays are zero-indexed).
You can see it working here: http://jsfiddle.net/56ThR/
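Since the question asks for JSON, here is a sketch that applies the same regular expression but builds objects instead of arrays. The property names pk, fks and name are my own choice, and the same caveats apply (well-formed input, one record per line).

```javascript
// Sketch only: property names (pk, fks, name) are hypothetical.
var input = "(1, 3, 2, 1, 'John (Finances)'),\n" +
            "(2, 7, 2, 1, 'Mary Jane'),\n" +
            "(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),";

var rowPattern = /^\((\d+), (\d+), (\d+), (\d+), '(.*)'\)/;

var records = input.split(/\r?\n/)                 // tolerate \n and \r\n
  .filter(function (line) { return line.trim() !== ""; })
  .map(function (line) {
    var m = line.match(rowPattern);
    if (!m) { throw new Error("Unparseable line: " + line); }
    return {
      pk: Number(m[1]),
      fks: [Number(m[2]), Number(m[3]), Number(m[4])],
      name: m[5]                                   // greedy: everything up to the last ')
    };
  });

console.log(JSON.stringify(records[2]));
// {"pk":3,"fks":[7,3,2],"name":"Gerald (Janitor), Broflowski"}
```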

Another option would be to convert it into something that looks like an Array and eval it. I know it is not recommended to use eval, but it's a cool solution :)
var lines = input.split("\n");
var output = [];
for(var v in lines){
// Remove opening (
lines[v] = lines[v].slice(1);
// Remove closing ) and what is after
lines[v] = lines[v].slice(0, lines[v].lastIndexOf(')'));
output[v] = eval("[" + lines[v] + "]");
}
So, the eval parameter would look like: [1, 3, 2, 1, 'John (Finances)'], which is indeed an Array.
Demo: http://jsfiddle.net/56ThR/3/
And, it can also be written shorter like this:
var lines = input.split("\n");
var output = lines.map( function(el) {
return eval("[" + el.slice(1).slice(0, el.lastIndexOf(')') - 1) + "]");
});
Demo: http://jsfiddle.net/56ThR/4/

You can always do it "manually" :)
var lines = input.split("\n");
var output = [];
for(var v in lines){
output[v] = [];
// Remove opening (
lines[v] = lines[v].slice(1);
// Get integers
for(var i = 0; i < 4; ++i){
var pos = lines[v].indexOf(',');
output[v][i] = parseInt(lines[v].slice(0, pos));
lines[v] = lines[v].slice(pos+1);
}
// Get string between apostrophes
lines[v] = lines[v].slice(lines[v].indexOf("'") + 1);
output[v][4] = lines[v].slice(0, lines[v].indexOf("'"));
}
Demo: http://jsfiddle.net/56ThR/2/

What you have here is basically a CSV (comma-separated values) file which you wish to parse.
The easiest way would be to use an external library that will take care of most of the issues you have.
Example: jquery csv library is a good one. https://code.google.com/p/jquery-csv/

Is it possible to highlight all words on a web page without destroying the layout?

I've written an extension for Firefox which highlights all words on a web page (excluding some words in a given list).
What I've noticed is that (besides my extension being terribly slow) some web pages get "destroyed"; more specifically, the layout gets destroyed (particularly on websites with overlay advertising or fancy drop-down menus).
My code wraps <span> tags around every "word", or to be precise around every token, because I'm splitting the text nodes with whitespace as the separator.
So is it possible at all to accomplish this task without destroying the page's layout?
I'm iterating over all text nodes, splitting them, and iterating over every token.
When the token is in my list, I don't highlight it; otherwise I wrap the <span> tag around it.
So any suggestions on how this could be done faster would be helpful, too.
Here are some screenshots for a correctly highlighted and a not correctly highlighted web page:
right:
en.wikipedia.org before highlighting,
en.wikipedia.org after highlighting.
wrong:
developer.mozilla.org before highlighting,
developer.mozilla.org after highlighting.

OK. Study this code. It searches for all instances of "is" and highlights each one if it is not surrounded by word characters. Put this in your Scratchpad while this tab is focused. You will see that words like "List" and other words containing "is" are not highlighted, but all the standalone "is"'s are.
I basically made an addon here for you. You can now release this as an addon called RegEx FindBar and take all the credit....
var doc = gBrowser.contentDocument;
var ctrler = _getSelectionController(doc.defaultView);
var searchRange = doc.createRange();
searchRange.selectNodeContents(doc.documentElement);
let startPt = searchRange.cloneRange();
startPt.collapse(true);
let endPt = searchRange.cloneRange();
endPt.collapse(false);
let retRange = null;
let finder = Cc["@mozilla.org/embedcomp/rangefind;1"].createInstance().QueryInterface(Ci.nsIFind);
finder.caseSensitive = false;
var i = 0;
while (retRange = finder.Find('is', searchRange, startPt, endPt)) {
i++;
var stCont = retRange.startContainer;
var endCont = retRange.endContainer;
console.log('retRange(' + i + ') = ', retRange);
console.log('var txt = retRange.commonAncestorContainer.data',retRange.commonAncestorContainer.data);
//now test if one position before startOffset and one position after endOffset are WORD characters
var isOneCharBeforeStCharWordChar; //var that holds if the character before the start character is a word character
if (retRange.startOffset == 0) {
//no characters before this character so obviously not a word char
isOneCharBeforeStCharWordChar = false;
} else {
var oneCharBeforeStChar = stCont.data.substr(retRange.startOffset-1,1);
if (/\w/.test(oneCharBeforeStChar)) {
isOneCharBeforeStCharWordChar = true;
} else {
isOneCharBeforeStCharWordChar = false;
}
console.log('oneCharBeforeStChar',oneCharBeforeStChar);
}
var isOneCharAfterEndCharWordChar; //var that holds if the character after the end character is a word character
if (retRange.endOffset == endCont.length) {
//no characters after this character so obviously not a word char
isOneCharAfterEndCharWordChar = false;
} else {
var oneCharAferEndChar = endCont.data.substr(retRange.endOffset,1); //endOffset is exclusive, so it already points at the character right after the match
if (/\w/.test(oneCharAferEndChar)) {
isOneCharAfterEndCharWordChar = true;
} else {
isOneCharAfterEndCharWordChar = false;
}
console.log('oneCharAferEndChar',oneCharAferEndChar);
}
if (isOneCharBeforeStCharWordChar == false && isOneCharAfterEndCharWordChar == false) {
//highlight it as surrounding characters are not word characters
_highlightRange(retRange, ctrler);
console.log('highlighted it as it was not surrounded by word characters');
} else {
console.log('NOT highlighting it as it was surrounded by word characters');
}
//break;
startPt = retRange.cloneRange();
startPt.collapse(false);
}
/*********************/
function _getEditableNode(aNode) {
while (aNode) {
if (aNode instanceof Ci.nsIDOMNSEditableElement)
return aNode.editor ? aNode : null;
aNode = aNode.parentNode;
}
return null;
}
function _highlightRange(aRange, aController) {
let node = aRange.startContainer;
let controller = aController;
let editableNode = this._getEditableNode(node);
if (editableNode)
controller = editableNode.editor.selectionController;
let findSelection = controller.getSelection(Ci.nsISelectionController.SELECTION_FIND);
findSelection.addRange(aRange);
if (editableNode) {
// Highlighting added, so cache this editor, and hook up listeners
// to ensure we deal properly with edits within the highlighting
if (!this._editors) {
this._editors = [];
this._stateListeners = [];
}
let existingIndex = this._editors.indexOf(editableNode.editor);
if (existingIndex == -1) {
let x = this._editors.length;
this._editors[x] = editableNode.editor;
this._stateListeners[x] = this._createStateListener();
this._editors[x].addEditActionListener(this);
this._editors[x].addDocumentStateListener(this._stateListeners[x]);
}
}
}
function _getSelectionController(aWindow) {
// display: none iframes don't have a selection controller, see bug 493658
if (!aWindow.innerWidth || !aWindow.innerHeight)
return null;
// Yuck. See bug 138068.
let docShell = aWindow.QueryInterface(Ci.nsIInterfaceRequestor)
.getInterface(Ci.nsIWebNavigation)
.QueryInterface(Ci.nsIDocShell);
let controller = docShell.QueryInterface(Ci.nsIInterfaceRequestor)
.getInterface(Ci.nsISelectionDisplay)
.QueryInterface(Ci.nsISelectionController);
return controller;
}
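The word-boundary test in the loop can also be pulled out into a small pure function, which makes it easy to try outside the browser chrome. This is only a sketch; isWholeWordMatch is a made-up name, and it works on a plain string with an exclusive end offset (the same convention Range.endOffset uses).

```javascript
// Sketch only: hypothetical helper operating on a plain string.
// `start` is the index of the first matched character, `end` is exclusive.
function isWholeWordMatch(text, start, end) {
  var before = start > 0 ? text.charAt(start - 1) : "";
  var after = end < text.length ? text.charAt(end) : "";
  // An empty string never matches \w, so matches at the edges count as whole words.
  return !/\w/.test(before) && !/\w/.test(after);
}

var sample = "This is a List";
console.log(isWholeWordMatch(sample, 2, 4));   // "is" inside "This"
console.log(isWholeWordMatch(sample, 5, 7));   // standalone "is"
console.log(isWholeWordMatch(sample, 11, 13)); // "is" inside "List"
```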

Oh, I edited my solution out and will update with a proper solution; I see you want to highlight all words.
This is how Firefox highlights things without changing the document: Finder.jsm - _highlight function. You will have to copy this and use it for the whole document; if you need help let me know and I'll do it.
Here was my solution to highlight all matches of single word: https://stackoverflow.com/a/22206366/1828637
Here man this is how you are going to highlight the whole document, I didn't finish the snippet but this is the start of it: Gist - HighlightTextInDocument

Here's the copy-paste answer to highlight everything in the document. As you learn more about it, share with us, e.g. how you can highlight with a different color; right now it's all pink O_O
function _getEditableNode(aNode) {
while (aNode) {
if (aNode instanceof Ci.nsIDOMNSEditableElement)
return aNode.editor ? aNode : null;
aNode = aNode.parentNode;
}
return null;
}
function _highlightRange(aRange, aController) {
let node = aRange.startContainer;
let controller = aController;
let editableNode = this._getEditableNode(node);
if (editableNode)
controller = editableNode.editor.selectionController;
let findSelection = controller.getSelection(Ci.nsISelectionController.SELECTION_FIND);
findSelection.addRange(aRange);
if (editableNode) {
// Highlighting added, so cache this editor, and hook up listeners
// to ensure we deal properly with edits within the highlighting
if (!this._editors) {
this._editors = [];
this._stateListeners = [];
}
let existingIndex = this._editors.indexOf(editableNode.editor);
if (existingIndex == -1) {
let x = this._editors.length;
this._editors[x] = editableNode.editor;
this._stateListeners[x] = this._createStateListener();
this._editors[x].addEditActionListener(this);
this._editors[x].addDocumentStateListener(this._stateListeners[x]);
}
}
}
function _getSelectionController(aWindow) {
// display: none iframes don't have a selection controller, see bug 493658
if (!aWindow.innerWidth || !aWindow.innerHeight)
return null;
// Yuck. See bug 138068.
let docShell = aWindow.QueryInterface(Ci.nsIInterfaceRequestor)
.getInterface(Ci.nsIWebNavigation)
.QueryInterface(Ci.nsIDocShell);
let controller = docShell.QueryInterface(Ci.nsIInterfaceRequestor)
.getInterface(Ci.nsISelectionDisplay)
.QueryInterface(Ci.nsISelectionController);
return controller;
}
var doc = gBrowser.contentDocument;
var searchRange = doc.createRange();
searchRange.selectNodeContents(doc.documentElement);
_highlightRange(searchRange,_getSelectionController(gBrowser.contentWindow))

@jervis, I can't comment on your comment under @Noitidart's code as I don't have 50 rep yet, so I have to post here.
Re:
I did it with 'gFindBar._highlightDoc(true, word)' now. I'm using firefox 17, so i dont know if gFindBar is state of the art. – jervis 40 mins ago
But I tested his code and it works.
Don't use gFindBar.
Copy it and then paste it into your Scratchpad.
Why are you using gFindBar._highlightDoc(true, word)? I thought you wanted to highlight everything in the document? Where did you get _highlightDoc from? I don't see that anywhere in @Noitidart's code.
Regarding your comment on iterating over all words and using gFindBar._highlightDoc:
I did it with 'gFindBar._highlightDoc(true, word)' now. I'm using firefox 17, so i dont know if gFindBar is state of the art. – jervis 39 mins ago
Dude, why do that? I saw @Noitidart posted a per-word solution on the linked topic: gBrowser.tabContainer.childNodes[0].linkedBrowser.finder.highlight(true, 'YOUR_WORD_HERE'); that is extremely easy, one line, and no need to create text nodes, spans or anything. You have to run this code on each tab you want to highlight in.

Searching for most performant way for string replacing with javascript

I'm programming my own autocomplete textbox control using C# and JavaScript on the client side. On the client side I want to replace the characters in a string that match the characters the user was searching for, in order to highlight them. For example, if the user was searching for the characters 'bue', I want to replace these letters in the word 'marbuel' like so:
mar<span style="color:#81BEF7;font-weight:bold">bue</span>l
in order to give the matching part another color. This works pretty well if I have 100-200 items in my autocomplete, but when it comes to 500 or more, it takes too much time.
The following code shows my method which does the logic for this:
HighlightTextPart: function (text, part) {
var currentPartIndex = 0;
var partLength = part.length;
var finalString = '';
var highlightPart = '';
var bFoundPart = false;
var bFoundPartHandled = false;
var charToAdd;
for (var i = 0; i < text.length; i++) {
var myChar = text[i];
charToAdd = null;
if (!bFoundPart) {
var myCharLower = myChar.toLowerCase();
var charToCompare = part[currentPartIndex].toLowerCase();
if (charToCompare == myCharLower) {
highlightPart += myChar;
if (currentPartIndex == partLength - 1)
bFoundPart = true;
currentPartIndex++;
}
else {
currentPartIndex = 0;
highlightPart = '';
charToAdd = myChar;
}
}
else
charToAdd = myChar;
if (bFoundPart && !bFoundPartHandled) {
finalString += '<span style="color:#81BEF7;font-weight:bold">' + highlightPart + '</span>';
bFoundPartHandled = true;
}
if (charToAdd != null)
finalString += charToAdd;
}
return finalString;
},
This method only highlights the first occurrence of the matching part.
I use it as follows. Once the response comes back from the server, I build an HTML UL list of the matching items by looping over each item, and in each iteration I call this method to highlight the matching part.
As I said, for up to 100 items it works pretty well, but it is too much for 500 or more.
Is there any way to make it faster? Maybe by using regex or some other technique?
I also thought about using "setTimeout" to do it in a separate function, or maybe doing it only for the items that are currently visible, because only a couple of items are visible while for the others you have to scroll.

Try limiting visible list size, so you are only showing 100 items at maximum for example. From a usability standpoint, perhaps even go down to only 20 items, so it would be even faster than that. Also consider using classes - see if it improves performance. So instead of
mar<span style="color:#81BEF7;font-weight:bold">bue</span>l
You will have this:
mar<span class="highlight">bue</span>l

String replacement in JavaScript is pretty easy with String.replace():
function linkify(s, part)
{
return s.replace(part, function(m) {
return '<span style="color:#81BEF7;font-weight:bold">' + htmlspecialchars(m) + '</span>';
});
}
function htmlspecialchars(txt)
{
// replace & first so the entities produced below are not escaped twice
return txt.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;');
}
console.log(linkify('marbuel', 'bue'));

I fixed this problem by using a regex instead of the method I posted previously. I now replace the string with the following code:
return text.replace(new RegExp('(' + part + ')', 'gi'), "<span>$1</span>");
This is pretty fast, much faster than the code above. 500 items in the autocomplete seem to be no problem. But can anybody explain why this is so much faster than my method or doing it with string.replace without a regex? I have no idea.
Thx!
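One thing to watch with this approach: part comes from user input, so characters like . or + would be interpreted as regex syntax. A sketch of the same replace with the search term escaped first follows; escapeRegExp and highlightAll are names I made up, the "highlight" class is borrowed from the other answer, and it assumes text itself is already safe to insert as HTML.

```javascript
// Sketch only: helper names are hypothetical.
// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Wrap every case-insensitive occurrence of `part` in a highlight span.
// Assumes `text` contains no markup that needs HTML-escaping.
function highlightAll(text, part) {
  if (!part) { return text; } // an empty pattern would match everywhere
  var re = new RegExp("(" + escapeRegExp(part) + ")", "gi");
  return text.replace(re, '<span class="highlight">$1</span>');
}

console.log(highlightAll("marbuel", "bue"));
// mar<span class="highlight">bue</span>l
```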

Javascript for Variations with Repetition (combinatorics) of missing string characters

My question is similar to THIS question that hasn't been answered yet.
How can I make my code (or any javascript code that might be suggested?) find all possible solutions of a known string length with multiple missing characters in variation with repetition?
I'm trying to take a string of known character lengths and find missing characters from that string. For example:
var missing_string = "ov!rf!ow"; //where "!" are the missing characters
I'm hoping to run a script with a specific array such as:
var r = new Array("A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9);
To find all the possible variations with repetition of those missing characters to get a result of:
ovArfAow
ovBrfAow
ovCrfAow
...
ovBrfBow
ovBrfCow
...
etc //ignore the case insensitive, just to emphasize the example
and of course, eventually find ovErfLow within all the variations with repetition.
I've been able to make it work with 1 (single) missing character. However, when I put 2 missing characters into my code, it obviously repeats the same array character for both missing characters, which is GREAT for repetition, but I also need to find the variations without repetition, and I might need 3-4 missing characters as well, which may or may not be repeated. Here's what I have so far:
var r = new Array("A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9);
var missing_string = "he!!ow!r!d";
var bt_lng = missing_string.length;
var bruted="";
for (z=0; z<r.length; z++) {
for(var x=0;x<bt_lng;x++){
for(var y=0;y<r.length;y++){
if(missing_string.charAt(x) == "!"){
bruted += r[z];
break;
}
else if(missing_string.charAt(x) == r[y]){
bruted += r[y];
}
}
}
console.log("br: " + bruted);
bruted="";
}
This works GREAT with just ONE "!":
helloworAd
helloworBd
helloworCd
...
helloworLd
However with 2 or more "!", I get:
heAAowArAd
heBBowBrBd
heCCowCrCd
...
heLLowLrLd
which is good for the repetition part but I also need to test all possible array M characters in each missing character spot.

Maybe the following function in pure javascript is a possible solution for you. It uses Array.prototype.reduce to create the cartesian product c of the given alphabet x, whereby its power n depends on the count of the exclamation marks in your word w.
function combinations(w) {
var x = new Array(
"A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9
),
n = w.match(/\!/g).length,
x_n = new Array(),
r = new Array(),
c = null;
for (var i = n; i > 0; i--) {
x_n.push(x);
}
c = x_n.reduce(function(a, b) {
var c = [];
a.forEach(function(a) {
b.forEach(function(b) {
c.push(a.concat([b]));
});
});
return c;
}, [[]]);
for (var i = 0, j = 0; i < c.length; i++, j = 0) {
r.push(w.replace(/\!/g, function(s, k) {
return c[i][j++];
}));
}
return r;
}
Call it like this console.log(combinations("ov!rf!ow")) in your browser console.
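With 36 characters and 3-4 missing positions the cartesian product gets large (36^4 is about 1.7 million arrays), so an alternative is to count through the positions like an odometer and hand each candidate to a callback. This is only a sketch of that idea; fillMissing and its visit parameter are made-up names, and it is not the reduce-based approach above.

```javascript
// Sketch only: hypothetical function; calls visit(candidate) for each variation.
function fillMissing(pattern, alphabet, visit) {
  if (alphabet.length === 0) { return; }
  // Positions of the "!" placeholders.
  var slots = [];
  for (var i = 0; i < pattern.length; i++) {
    if (pattern.charAt(i) === "!") { slots.push(i); }
  }
  var counters = slots.map(function () { return 0; }); // one digit per slot
  var chars = pattern.split("");
  while (true) {
    slots.forEach(function (pos, k) {
      chars[pos] = String(alphabet[counters[k]]);
    });
    visit(chars.join(""));
    // Increment like an odometer: first slot changes fastest, carry to the next.
    var k = 0;
    while (k < counters.length && ++counters[k] === alphabet.length) {
      counters[k] = 0;
      k++;
    }
    if (k === counters.length) { break; } // every slot wrapped around: done
  }
}

var found = [];
fillMissing("o!r!", ["A", "B"], function (s) { found.push(s); });
console.log(found); // oArA, oBrA, oArB, oBrB
```

Because candidates are visited one at a time, you can stop collecting as soon as the one you are looking for turns up instead of keeping every variation in memory.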

What is a robust & nifty way to cycle thru a string and shave off a part of it till

I have a string and in some instances it can be over 150 chars in length (including spaces and special chars). I was just going to take the current length, subtract 150 (if greater than 150) and, with the remainder, shave off a part of the string. I am curious if there is a robust way to do it? The issue is, I don't necessarily want to shave the end. I want to shave the part that resides in a "span" with a certain ID. I want to shorten that section of the string and end it with "...". So, I have this.
For example. I have.
<div id="divid">
Funny thing is, I went to the store <span id="spanid">on a Tuesday afternoon while the sun was in the sky</span> and rode home with excitement and glee. Did I say it was Tuesday?
</div>
var txtcount = jQuery('#divid').text().length;
var spanidcount = jQuery('#spanid').text().length;
if(txtcount > 140){
var tocut = txtcount - 140;
// here I would reduce the contents of spanid so that the total string count is 140 or less. and have spanid end with "..." - with the ... counting toward the total of 140.
}

A cleaner way would be to use CSS text-overflow:ellipsis on your div. Sample fiddle.
The advantage of this approach is that you don't have to trust font size and variable letter widths not to screw you up. The text is always cut exactly where it needs to be. And if the div is resized, the ellipsis is automatically adjusted to the right length.

The best thing to do is implement a truncate function. You don't have to extend the String prototype, but I did in this case. :P
http://jsfiddle.net/j89em/1/
String.prototype.truncate = function (len, trail) {
len = len || 10; // default to 10
trail = trail || '...';
return len < this.length ? this.substring(0, len - trail.length) + trail : this;
};
var $div = $('div'),
$span = $div.find('span');
$span.text($span.text().truncate(25));
So you could actually test the total text length and apply the truncate method if needed.
if ($div.text().length > 140) {
$span.text($span.text().truncate(25));
}
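The fixed 25 above is just a guess at how much to keep. If the goal is for the whole div text to land on the limit, the span's budget can be computed from the text around it. A sketch with plain strings follows; trimMiddle is a made-up name, and in the page you would feed it the text before the span, the span text, and the text after it.

```javascript
// Sketch only: hypothetical helper working on plain strings.
// Returns the (possibly shortened) middle part so that
// before + result + after is at most maxLen characters.
function trimMiddle(before, middle, after, maxLen) {
  var total = before.length + middle.length + after.length;
  if (total <= maxLen) { return middle; }          // nothing to cut
  var ellipsis = "...";
  var budget = maxLen - before.length - after.length; // chars left for the span
  if (budget <= ellipsis.length) {
    // Not even room for text plus "...": return as many dots as fit.
    return budget > 0 ? ellipsis.slice(0, budget) : "";
  }
  return middle.slice(0, budget - ellipsis.length) + ellipsis;
}

var before = "Funny thing is, I went to the store ";
var middle = "on a Tuesday afternoon while the sun was in the sky";
var after = " and rode home with excitement and glee. Did I say it was Tuesday?";
var shortened = trimMiddle(before, middle, after, 140);
console.log(shortened);
console.log((before + shortened + after).length);
```

If the surrounding text alone already exceeds the limit, this returns an empty string and the rest would still have to be trimmed some other way.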

You can do it like this:
html:
<p>Funny thing is, I went to the store <span>on a Tuesday afternoon while the sun was in the sky</span> and rode home with excitement and glee. Did I say it was Tuesday?</p>
<button>Reduce</button>
jQ:
var reduceStr = function(str, maxLen) {
return str.length > maxLen ? str.substr(0, maxLen - 3) + '...' : str; // only cut when too long; the ellipsis counts toward maxLen
};
$('button').click(function(){
$('span').text(reduceStr($('span').text(), 30));
});

Include this (plug-in) in a script tag or linked from an external .js file sometime after jQuery has been loaded:
(function($) {
$.fn.trimFluff = function(options) {
var settings = $.extend({
'childSelector': '#spanid',
'maxLength': 140
}, options);
return this.each(function() {
var container = $(this);
var child = $(settings.childSelector);
var containerLen = container.text().length;
var childLen = child.text().length;
var fluffToTrim = containerLen - settings.maxLength;
if (containerLen > settings.maxLength) {
if (fluffToTrim > childLen) { //'fluffToTrim' is larger than the child contents...
$(this).find(settings.childSelector).remove(); //remove child
containerLen = container.text().length; //recalc new length
fluffToTrim = containerLen - settings.maxLength; //recalc 'fluffToTrim'
if (containerLen > settings.maxLength) {
//remove "offending length" characters + 3 for the ellipsis and replace with the ellipsis
container.text(container.text().substring(0, containerLen - fluffToTrim - 3) + '...');
//string is now under (or equal to) 140 characters
}
} else {
//remove "offending length" characters + 3 for the ellipsis, from the child, and replace with the ellipsis
child.text(child.text().substring(0, childLen - fluffToTrim - 3) + '...');
}
}
});
};
}(jQuery));
Then call it like so:
$('#divid').trimFluff();
or pass in an options object. There are two options, childSelector which accepts any valid jQuery selector (or element, or jQuery object) and maxLength (which I hope is self-explanatory :) ).
Examples:
$('#divid').trimFluff({childSelector: 'span', maxLength: 150});
$('#divid').trimFluff({childSelector: $('#spanid'), maxLength: 140});
$('#divid').trimFluff({childSelector: 'span#spanid.customClass', maxLength: 160});
var s = document.getElementById('spanid');
$('#divid').trimFluff({childSelector: s});
This will trim from the child first, keeping the left side, and if the text to cut is larger than the child itself, it will remove the child entirely and trim the remaining contents of the div (or other container) until the text is no longer than maxLength.
This does virtually no error checking, but it will work with any jQuery object where the text() function does something.
Have fun with it.
