jQuery / JavaScript Parsing strings the proper way - javascript

Recently, I've been attempting to emulate a small language in jQuery and JavaScript, yet I've come across what I believe is an issue. I think that I may be parsing everything completely wrong.
In the code:
#name Testing
#inputs
#outputs
#persist
#trigger
print("Test")
The current way I am separating and parsing the string is by splitting all of the code into lines, and then reading through this lines array using searches and splits. For example, I would find the name using something like:
if(typeof lines[line] === 'undefined')
{
}
else
{
if(lines[line].search('#name') == 0)
{
name = lines[line].split(' ')[1];
}
}
But I think that I may be largely wrong on how I am handling parsing.
While reading through examples on how other people are handling parsing of code blocks like this, it appeared that people parsed the entire block, instead of splitting it into lines as I do. I suppose the question of the matter is, what is the proper and conventional way of parsing things like this, and how do you suggest I use it to parse something such as this?

In simple cases like this regular expressions is your tool of choice:
matches = code.match(/#name\s+(\w+)/)
name = matches[1]
To parse "real" programming languages regexps are not powerful enough, you'll need a parser, either hand-written or automatically generated with a tool like PEG.

A general approach to parsing, that I like to take often is the following:
loop through the complete block of text, character by character.
if you find a character that signalizes the start of one unit, call a specialized subfunction to parse the next characters.
within each subfunction, call additional subfunctions if you find certain characters
return from every subfunction when a character is found, that signalizes, that the unit has ended.
Here is a small example:
var text = "#func(arg1,arg2)"
function parse(text) {
var i, max_i, ch, funcRes;
for (i = 0, max_i = text.length; i < max_i; i++) {
ch = text.charAt(i);
if (ch === "#") {
funcRes = parseFunction(text, i + 1);
i = funcRes.index;
}
}
console.log(funcRes);
}
function parseFunction(text, i) {
var max_i, ch, name, argsRes;
name = [];
for (max_i = text.length; i < max_i; i++) {
ch = text.charAt(i);
if (ch === "(") {
argsRes = parseArguments(text, i + 1);
return {
name: name.join(""),
args: argsRes.arr,
index: argsRes.index
};
}
name.push(ch);
}
}
function parseArguments(text, i) {
var max_i, ch, args, arg;
arg = [];
args = [];
for (max_i = text.length; i < max_i; i++) {
ch = text.charAt(i);
if (ch === ",") {
args.push(arg.join(""));
arg = [];
continue;
} else if (ch === ")") {
args.push(arg.join(""));
return {
arr: args,
index: i
};
}
arg.push(ch);
}
}
FIDDLE
this example just parses function expressions, that follow the syntax "#functionName(argumentName1, argumentName2, ...)". The general idea is to visit every character exactly once without the need to save current states like "hasSeenAtCharacter" or "hasSeenOpeningParentheses", which can get pretty messy when you parse large structures.
Please note that this is a very simplified example and it misses all the error handling and stuff like that, but I hope the general idea can be seen. Note also that I'm not saying that you should use this approach all the time. It's a very general approach, that can be used in many scenerios. But that doesn't mean that it can't be combined with regular expressions for instance, if it, at some part of your text, makes more sense than parsing each individual character.
And one last remark: you can save yourself the trouble if you put the specialized parsing function inside the main parsing function, so that all functions have access to the same variable i.

Related

Number formatting in template strings (Javascript - ES6)

I was wondering if it is possible to format numbers in Javascript template strings, for example something like:
var n = 5.1234;
console.log(`This is a number: $.2d{n}`);
// -> 5.12
Or possibly
var n = 5.1234;
console.log(`This is a number: ${n.toString('.2d')}`);
// -> 5.12
That syntax obviously doesn't work, it is just an illustration of the type of thing I'm looking for.
I am aware of tools like sprintf from underscore.string, but this seems like something that JS should be able to do out the box, especially given the power of template strings.
EDIT
As stated above, I am already aware of 3rd party tools (e.g. sprintf) and customised functions to do this. Similar questions (e.g. JavaScript equivalent to printf/String.Format) don't mention template strings at all, probably because they were asked before the ES6 template strings were around. My question is specific to ES6, and is independent of implementation. I am quite happy to accept an answer of "No, this is not possible" if that is case, but what would be great is either info about a new ES6 feature that provides this, or some insight into whether such a feature is on its way.
No, ES6 does not introduce any new number formatting functions, you will have to live with the existing .toExponential(fractionDigits), .toFixed(fractionDigits), .toPrecision(precision), .toString([radix]) and toLocaleString(…) (which has been updated to optionally support the ECMA-402 Standard, though).
Template strings have nothing to do with number formatting, they just desugar to a function call (if tagged) or string concatenation (default).
If those Number methods are not sufficient for you, you will have to roll your own. You can of course write your formatting function as a template string tag if you wish to do so.
You should be able to use the toFixed() method of a number:
var num = 5.1234;
var n = num.toFixed(2);
If you want to use ES6 tag functions here's how such a tag function would look,
function d2(pieces) {
var result = pieces[0];
var substitutions = [].slice.call(arguments, 1);
for (var i = 0; i < substitutions.length; ++i) {
var n = substitutions[i];
if (Number(n) == n) {
result += Number(substitutions[i]).toFixed(2);
} else {
result += substitutions[i];
}
result += pieces[i + 1];
}
return result;
}
which can then be applied to a template string thusly,
d2`${some_float} (you can interpolate as many floats as you want) of ${some_string}`;
that will format the float and leave the string alone.
Here's a fully ES6 version of Filip Allberg's solution above, using ES6 "rest" params. The only thing missing is being able to vary the precision; that could be done by making a factory function. Left as an exercise for the reader.
function d2(strs, ...args) {
var result = strs[0];
for (var i = 0; i < args.length; ++i) {
var n = args[i];
if (Number(n) == n) {
result += Number(args[i]).toFixed(2);
} else {
result += args[i];
}
result += strs[i+1];
}
return result;
}
f=1.2345678;
s="a string";
console.log(d2`template: ${f} ${f*100} and ${s} (literal:${9.0001})`);
While template-string interpolation formatting is not available as a built-in, you can get equivalent behavior with Intl.NumberFormat:
const format = (num, fraction = 2) => new Intl.NumberFormat([], {
minimumFractionDigits: fraction,
maximumFractionDigits: fraction,
}).format(num);
format(5.1234); // -> '5.12'
Note that regardless of your implementation of choice, you might get bitten by rounding errors:
(9.999).toFixed(2) // -> '10.00'
new Intl.NumberFormat([], {
minimumFractionDigits: 2,
maximumFractionDigits: 2, // <- implicit rounding!
}).format(9.999) // -> '10.00'
based on ES6 Tagged Templates (credit to https://stackoverflow.com/a/51680250/711085), this will emulate typical template string syntax in other languages (this is loosely based on python f-strings; I avoid calling it f in case of name overlaps):
Demo:
> F`${(Math.sqrt(2))**2}{.0f}` // normally 2.0000000000000004
"2"
> F`${1/3}{%} ~ ${1/3}{.2%} ~ ${1/3}{d} ~ ${1/3}{.2f} ~ ${1/3}"
"33% ~ 33.33% ~ 0 ~ 0.33 ~ 0.3333333333333333"
> F`${[1/3,1/3]}{.2f} ~ ${{a:1/3, b:1/3}}{.2f} ~ ${"someStr"}`
"[0.33,0.33] ~ {\"a\":\"0.33\",\"b\":\"0.33\"} ~ someStr
Fairly simple code using :
var FORMATTER = function(obj,fmt) {
/* implements things using (Number).toFixed:
${1/3}{.2f} -> 0.33
${1/3}{.0f} -> 1
${1/3}{%} -> 33%
${1/3}{.3%} -> 33.333%
${1/3}{d} -> 0
${{a:1/3,b:1/3}}{.2f} -> {"a":0.33, "b":0.33}
${{a:1/3,b:1/3}}{*:'.2f',b:'%'} -> {"a":0.33, "b":'33%'} //TODO not implemented
${[1/3,1/3]}{.2f} -> [0.33, 0.33]
${someObj} -> if the object/class defines a method [Symbol.FTemplate](){...},
it will be evaluated; alternatively if a method [Symbol.FTemplateKey](key){...}
that can be evaluated to a fmt string; alternatively in the future
once decorators exist, metadata may be appended to object properties to derive
formats //TODO not implemented
*/
try {
let fracDigits=0,percent;
if (fmt===undefined) {
if (typeof obj === 'string')
return obj;
else
return JSON.stringify(obj);
} else if (obj instanceof Array)
return '['+obj.map(x=> FORMATTER(x,fmt))+']'
else if (typeof obj==='object' && obj!==null /*&&!Array.isArray(obj)*/)
return JSON.stringify(Object.fromEntries(Object.entries(obj).map(([k,v])=> [k,FORMATTER(v,fmt)])));
else if (matches = fmt.match(/^\.(\d+)f$/))
[_,fracDigits] = matches;
else if (matches = fmt.match(/^(?:\.(\d+))?(%)$/))
[_,fracDigits,percent] = matches;
else if (matches = fmt.match(/^d$/))
fracDigits = 0;
else
throw 'format not recognized';
if (obj===null)
return 'null';
if (obj===undefined) {
// one might extend the above syntax to
// allow for example for .3f? -> "undefined"|"0.123"
return 'undefined';
}
if (percent)
obj *= 100;
fracDigits = parseFloat(fracDigits);
return obj.toFixed(fracDigits) + (percent? '%':'');
} catch(err) {
throw `error executing F\`$\{${someObj}\}{${fmt}}\` specification: ${err}`
}
}
function F(strs, ...args) {
/* usage: F`Demo: 1+1.5 = ${1+1.5}{.2f}`
--> "Demo: 1+1.5 = 2.50"
*/
let R = strs[0];
args.forEach((arg,i)=> {
let [_,fmt,str] = strs[i+1].match(/(?:\{(.*)(?<!\\)\})?(.*)/);
R += FORMATTER(arg,fmt) + str;
});
return R;
}
sidenote: The core of the code is as follows. The heavy lifting is done by the formatter. The negative lookbehind is somewhat optional, and to let one escape actual curly braces.
let R = strs[0];
args.forEach((arg,i)=> {
let [_,fmt,str] = strs[i+1].match(/(?:\{(.*)(?<!\\)\})?(.*)/);
R += FORMATTER(arg,fmt) + str;
});
You can use es6 tag functions. I don't know ready for use of that.
It might look like this:
num`This is a number: $.2d{n}`
Learn more:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
https://developers.google.com/web/updates/2015/01/ES6-Template-Strings

Does JavaScript support array/list comprehensions like Python?

I'm practicing/studying both JavaScript and Python. I'm wondering if Javascript has the equivalence to this type of coding.
I'm basically trying to get an array from each individual integer from the string for practice purposes. I'm more proficient in Python than JavaScript
Python:
string = '1234-5'
forbidden = '-'
print([int(i) for i in str(string) if i not in forbidden])
Does Javascript have something similar for me to do above?
Update: Array comprehensions were removed from the standard. Quoting MDN:
The array comprehensions syntax is non-standard and removed starting with Firefox 58. For future-facing usages, consider using Array.prototype.map, Array.prototype.filter, arrow functions, and spread syntax.
See this answer for an example with Array.prototype.map:
let emails = people.map(({ email }) => email);
Original answer:
Yes, JavaScript will support array comprehensions in the upcoming EcmaScript version 7.
Here's an example.
var str = "1234-5";
var ignore = "-";
console.log([for (i of str) if (!ignore.includes(i)) i]);
Given the question's Python code
print([int(i) for i in str(string) if i not in forbidden])
this is the most direct translation to JavaScript (ES2015):
const string = '1234-5';
const forbidden = '-';
console.log([...string].filter(c => !forbidden.includes(c)).map(c => parseInt(c)));
// result: [ 1, 2, 3, 4, 5 ]
Here is a comparison of the Python and JavaScript code elements being used:
(Python -> Javascript):
print -> console.log
unpacking string to list -> spread operator
list comprehension 'if' -> Array.filter
list comprehension 'for' -> Array.map
substr in str? -> string.includes
Reading the code, I assume forbidden can have more than 1 character. I'm also assuming the output should be "12345"
var string = "12=34-5";
var forbidden = "=-";
console.log(string.split("").filter(function(str){
return forbidden.indexOf(str) < 0;
}).join(""))
If the output is "1" "2" "3" "4" "5" on separate lines
var string = "12=34-5";
var forbidden = "=-";
string.split("").forEach(function(str){
if (forbidden.indexOf(str) < 0) {
console.log(str);
}
});
Not directly, but it's not hard to replicate.
var string = "1234-5";
var forbidden = "-";
string.split("").filter(function(str){
if(forbidden.indexOf(str) < 0) {
return str;
}
}).forEach(function(letter) { console.log(letter);});
I guess more directly:
for(var i=0 ; i < str.length ; i++) {
if(forbidden.indexOf(str) < 0) {
console.log(str[i]);
}
}
But there's no built in way to filter in your for loop.
You could easily achieve this behavior using an application functor.
Array.prototype.ap = function(xs) {
return this.reduce((acc, f) => acc.concat(xs.map(f)), [])
}
const result = [x => x +1].ap([2])
console.log(result)
JavaScript no longer supports array comprehensions.
I too was looking for the JavaScript equivalent. Mozilla Developer's Network indicates that this functionality is no longer supported.
The preferred syntax is referenced in the aforementioned link.
For "completeness"-sake, here's a shorter regexp version.
var str = "1234-5";
var ignore = "-=";
console.log(str.replace(new RegExp(ignore.split("").join("|")), "").split(""));
EDIT: To make sure that RegExp does not "choke" on special characters, ignore can be implemented as regexp literal, instead of a string:
var str = "1234-5";
var ignore = /[\+=-]/;
console.log(str.replace(ignore, "").split(""));
It does have a poor mans version
const string = '1234-5'
const forbidden = '-'
print([int(i) for i in str(string) if i not in forbidden])
const result = string.split('').filter(char => char !== forbidden);
console.log(result)
In JS you can only iterate over single elements in array, so no extraction of multiple entries at a time like in Python.
For this particular case you should use a RegExp to filter the string though.
You could have a look at CoffeeScript.
CoffeeScript adds missing features to java-script and allows you to write cleaner, more readable code. https://coffeescript.org/#coffeescript-2
You write a .coffee file and the coffeScript-compiler compiles your coffee file into a JavaScript file. Because the translation into JavaScript happens by compiling, the script should not run any slower.
So your code would look like the following in coffee script:
string = '1234-5'
forbidden = '-'
alert(JSON.stringify(+i for i in string when i isnt forbidden))
Honestly, this is even easier to read then the python counterpart. And it compiles quickly to the fallowing JavaScript:
var forbidden, i, string;
string = '1234-5';
forbidden = '-';
alert(JSON.stringify((function() {
var j, len, results;
results = [];
for (j = 0, len = string.length; j < len; j++) {
i = string[j];
if (i !== forbidden) {
results.push(+i);
}
}
return results;
})()));
You don’t even need to install anything. On their website you can play around with it, and it will show you the translated JavaScript code.
Javascript doesn't need list comprehensions because the map and filter functions work better in the language compared to Python.
In Python:
[int(i) for i in '1234-5' if i != '-']
# is equivalent to the ugly
list(map(lambda _: int(_),filter(lambda _: _!='-','1234-5')))
# so we use list comprehensions
In Javascript, to me this is fine once you're familiar with the syntax:
[...'1234-5'].filter(_=> _!='-').map(_=> parseInt(_))

Node.innerHTML giving tag names in lower case

I am iterating NodeList to get Node data, but while using Node.innerHTML i am getting the tag names in lowercase.
Actual Tags
<Panel><Label>test</Label></Panel>
giving as
<panel><label>test</label></panel>
I need these tags as it is. Is it possible to get it with regular expression? I am using it with dojo (is there any way in dojo?).
var xhrArgs = {
url: "./user/"+Runtime.userName+"/ws/workspace/"+Workbench.getProject()+"/lib/custom/"+(first.type).replace(".","/")+".html",
content: {},
sync:true,
load: function(data){
var test = domConstruct.toDom(data);
dojo.forEach(dojo.query("[id]",test),function(node){
domAttr.remove(node,"id");
});
var childEle = "";
dojo.forEach(test.childNodes,function(node){
if(node.innerHTML){
childEle+=node.innerHTML;
}
});
command.add(new ModifyCommand(newWidget,{},childEle,context));
}
};
You cannot count on .innerHTML preserving the exact nature of your original HTML. In fact, in some browsers, it's significantly different (though generates the same results) with different quotation, case, order of attributes, etc...
It is much better to not rely on the preservation of case and adjust your javascript to deal with uncertain case.
It is certainly possible to use a regular expression to do a case insensitive search (the "i" flag designates its searches as case insensitive), though it is generally much, much better to use direct DOM access/searching rather than innerHTML searching. You'd have to tell us more about what exactly you're trying to do before we could offer some code.
It would take me a bit to figure that out with a regex, but you can use this:
var str = '<panel><label>test</label></panel>';
chars = str.split("");
for (var i = 0; i < chars.length; i++) {
if (chars[i] === '<' || chars[i] === '/') {
chars[i + 1] = chars[i + 1].toUpperCase();
}
}
str = chars.join("");
jsFiddle
I hope it helps.
If you are trying to just capitalise the first character of the tag name, you can use:
var s = 'panel';
s.replace(/(^.)(.*)/,function(m, a, b){return a.toUpperCase() + b.toLowerCase()}); // Panel
Alternatively you can use string manipulation (probably more efficient than a regular expression):
s.charAt(0).toUpperCase() + s.substring(1).toLowerCase(); // Panel
The above will output any input string with the first character in upper case and everything else lower case.
this is not thoroughly tested , and is highly inefficcient, but it worked quite quickly in the console:
(also, it's jquery, but it can be converted to pure javascript/DOM easily)
in jsFiddle
function tagString (element) {
return $(element).
clone().
contents().
remove().
end()[0].
outerHTML.
replace(/(^<\s*\w)|(<\/\s*\w(?=\w*\s*>$))/g,
function (a) {
return a.
toUpperCase();
}).
split(/(?=<\/\s*\w*\s*>$)/);
}
function capContents (element) {
return $(element).
contents().
map(function () {
return this.nodeType === 3 ? $(this).text() : capitalizeHTML(this);
})
}
function capitalizeHTML (selector) {
var e = $(selector).first();
var wrap = tagString(e);
return wrap[0] + capContents(e).toArray().join("") + wrap[1];
}
capitalizeHTML('body');
also, besides being a nice exercise (in my opinion), do you really need to do this?

Javascript for Variations with Repetition (combinatorics) of missing string characters

My question is similar to THIS question that hasn't been answered yet.
How can I make my code (or any javascript code that might be suggested?) find all possible solutions of a known string length with multiple missing characters in variation with repetition?
I'm trying to take a string of known character lengths and find missing characters from that string. For example:
var missing_string = "ov!rf!ow"; //where "!" are the missing characters
I'm hoping to run a script with a specific array such as:
var r = new Array("A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9);
To find all the possible variations with repetition of those missing characters to get a result of:
ovArfAow
ovBrfAow
ovCrfAow
...
ovBrfBow
ovBrfCow
...
etc //ignore the case insensitive, just to emphasize the example
and of course, eventually find ovErfLow within all the variations with repetition.
I've been able to make it work with 1 (single) missing character. However, when I put 2 missing characters with my code it obviously repeats the same array character for both missing characters which is GREAT for repition but I also need to find without repetition as well and might need to have 3-4 missing characters as well which may or may not be repeated. Here's what I have so far:
var r = new Array("A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9);
var missing_string = "he!!ow!r!d";
var bt_lng = missing_string.length;
var bruted="";
for (z=0; z<r.length; z++) {
for(var x=0;x<bt_lng;x++){
for(var y=0;y<r.length;y++){
if(missing_string.charAt(x) == "!"){
bruted += r[z];
break;
}
else if(missing_string.charAt(x) == r[y]){
bruted += r[y];
}
}
}
console.log("br: " + bruted);
bruted="";
}
This works GREAT with just ONE "!":
helloworAd
helloworBd
helloworCd
...
helloworLd
However with 2 or more "!", I get:
heAAowArAd
heBBowBrBd
heCCowCrCd
...
heLLowLrLd
which is good for the repetition part but I also need to test all possible array M characters in each missing character spot.
Maybe the following function in pure javascript is a possible solution for you. It uses Array.prototype.reduce to create the cartesian product c of the given alphabet x, whereby its power n depends on the count of the exclamation marks in your word w.
function combinations(w) {
var x = new Array(
"A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9
),
n = w.match(/\!/g).length,
x_n = new Array(),
r = new Array(),
c = null;
for (var i = n; i > 0; i--) {
x_n.push(x);
}
c = x_n.reduce(function(a, b) {
var c = [];
a.forEach(function(a) {
b.forEach(function(b) {
c.push(a.concat([b]));
});
});
return c;
}, [[]]);
for (var i = 0, j = 0; i < c.length; i++, j = 0) {
r.push(w.replace(/\!/g, function(s, k) {
return c[i][j++];
}));
}
return r;
}
Call it like this console.log(combinations("ov!rf!ow")) in your browser console.

Fastest way to search string in javascript

I have a hidden field on my page that stores space separated list of emails.
I can have maximum 500 emails in that field.
What will be the fastest way to search if a given email already exists in that list?
I need to search multiple emails in a loop
use RegEx to find a match
use indexOf()
convert the list to a
javascript dictionary and then
search
If this is an exact duplicate, please let me know the other question.
Thanks
EDIT:
Thanks everyone for your valuable comments and answers.
Basically my user has a list of emails(0-500) in db.
User is presented with his own contact list.
User can then choose one\more emails from his contact list to add to the list.
I want to ensure at client side that he is not adding duplicate emails.
Whole operation is driven by ajax, so jsvascript is required.
The answer is: It depends.
It depends on what you actually want to measure.
It depends on the relationship between how many you're searching for vs. how many you're searching.
It depends on the JavaScript implementation. Different implementations usually have radically different performance characteristics. This is one of the many reasons why the rule "Don't optimize prematurely" applies especially to cross-implementation JavaScript.
...but provided you're looking for a lot fewer than you have in total, it's probably String#indexOf unless you can create the dictionary once and reuse it (not just this one loop of looking for X entries, but every loop looking for X entries, which I tend to doubt is your use-case), in which case that's hands-down faster to build the 500-key dictionary and use that.
I put together a test case on jsperf comparing the results of looking for five strings buried in a string containing 500 space-delimited, unique entries. Note that that jsperf page compares some apples and oranges (cases where we can ignore setup and what kind of setup we're ignoring), but jsperf was being a pain about splitting it and I decided to leave that as an exercise for the reader.
In my tests of what I actually think you're doing, Chrome, Firefox, IE6, IE7 and IE9 did String#indexOf fastest. Opera did RegExp alternation fastest. (Note that IE6 and IE7 don't have Array#indexOf; the others do.) If you can ignore dictionary setup time, then using a dictionary is the hands-down winner.
Here's the prep code:
// ==== Main Setup
var toFind = ["aaaaa100#zzzzz", "aaaaa200#zzzzz", "aaaaa300#zzzzz", "aaaaa400#zzzzz", "aaaaa500#zzzzz"];
var theString = (function() {
var m, n;
m = [];
for (n = 1; n <= 500; ++n) {
m.push("aaaaa" + n + "#zzzzz");
}
return m.join(" ");
})();
// ==== String#indexOf (and RegExp) setup for when we can ignore setup
var preppedString = " " + theString + " ";
// ==== RegExp setup for test case ignoring RegExp setup time
var theRegExp = new RegExp(" (?:" + toFind.join("|") + ") ", "g");
// ==== Dictionary setup for test case ignoring Dictionary setup time
var theDictionary = (function() {
var dict = {};
var index;
var values = theString.split(" ");
for (index = 0; index < values.length; ++index) {
dict[values[index]] = true;
}
return dict;
})();
// ==== Array setup time for test cases where we ignore array setup time
var theArray = theString.split(" ");
The String#indexOf test:
var index;
for (index = 0; index < toFind.length; ++index) {
if (theString.indexOf(toFind[index]) < 0) {
throw "Error";
}
}
The String#indexOf (ignore setup) test, in which we ignore the (small) overhead of putting spaces at either end of the big string:
var index;
for (index = 0; index < toFind.length; ++index) {
if (preppedString.indexOf(toFind[index]) < 0) {
throw "Error";
}
}
The RegExp alternation test:
// Note: In real life, you'd have to escape the values from toFind
// to make sure they didn't have special regexp chars in them
var regexp = new RegExp(" (?:" + toFind.join("|") + ") ", "g");
var match, counter = 0;
var str = " " + theString + " ";
for (match = regexp.exec(str); match; match = regexp.exec(str)) {
++counter;
}
if (counter != 5) {
throw "Error";
}
The RegExp alternation (ignore setup) test, where we ignore the time it takes to set up the RegExp object and putting spaces at either end of the big string (I don't think this applies to your situation, the addresses you're looking for would be static):
var match, counter = 0;
for (match = theRegExp.exec(preppedString); match; match = theRegExp.exec(preppedString)) {
++counter;
}
if (counter != 5) {
throw "Error";
}
The Dictionary test:
var dict = {};
var index;
var values = theString.split(" ");
for (index = 0; index < values.length; ++index) {
dict[values[index]] = true;
}
for (index = 0; index < toFind.length; ++index) {
if (!(toFind[index] in dict)) {
throw "Error";
}
}
The Dictionary (ignore setup) test, where we don't worry about the setup time for the dictionary; note that this is different than the RegExp alternation (ignore setup) test because it assumes the overall list is invariant:
var index;
for (index = 0; index < toFind.length; ++index) {
if (!(toFind[index] in theDictionary)) {
throw "Error";
}
}
The Array#indexOf test (note that some very old implementations of JavaScript may not have Array#indexOf):
var values = theString.split(" ");
var index;
for (index = 0; index < toFind.length; ++index) {
if (values.indexOf(toFind[index]) < 0) {
throw "Error";
}
}
The Array#indexOf (ignore setup) test, which like Dictionary (ignore setup) assumes the overall list is invariant:
var index;
for (index = 0; index < toFind.length; ++index) {
if (theArray.indexOf(toFind[index]) < 0) {
throw "Error";
}
}
Instead of looking for the fastest solution, you first need to make sure that you’re actually having a correct solution. Because there are four cases an e-mail address can appear and a naive search can fail:
Alone: user#example.com
At the begin: user#example.com ...
At the end: ... user#example.com
In between: ... user#example.com ...
Now let’s analyze each variant:
To allow arbitrary input, you will need to escape the input properly. You can use the following method to do so:
RegExp.quote = function(str) {
return str.toString().replace(/(?=[.?*+^$[\]\\(){}-])/g, "\\");
};
To match all four cases, you can use the following pattern:
/(?:^|\ )user#example\.com(?![^\ ])/
Thus:
var inList = new RegExp("(?:^| )" + RegExp.quote(needle) + "(?![^ ])").test(haystack);
Using indexOf is a little more complex as you need to check the boundaries manually:
var pos = haystack.indexOf(needle);
if (pos != -1 && (pos != 0 && haystack.charAt(pos-1) !== " " || haystack.length < (pos+needle.length) && haystack.charAt(pos+needle.length) !== " ")) {
pos = -1;
}
var inList = pos != -1;
This one is rather quite simple:
var dict = {};
haystack.match(/[^\ ]+/g).map(function(match) { dict[match] = true; });
var inList = dict.hasOwnProperty(haystack);
Now to test what variant is the fastest, you can do that at jsPerf.
indexOf() is most probably the fastest just keep in mind you need to search for two possible cases:
var existingEmails = "email1, email2, ...";
var newEmail = "somethingHere#email.com";
var exists = (existingEmails.indexOf(newEmail + " ") >= 0) || (existingEmails.indexOf(" " + newEmail ) > 0);
You're asking a question with too many unstated variables for us to answer. For example, how many times do you expect to perform this search? only once? A hundred times? Is this a fixed list of emails, or does it change every time? Are you loading the emails with the page, or by AJAX?
IF you are performing more than one search, or the emails are loaded with the page, then you are probably best off creating a dictionary of the names, and using the Javascript in operator.
If you get the string from some off-page source, and you only search it once, then indexOf may well be better.
In all cases, if you really care about the speed, you're best off making a test.
But then I'd ask "Why do you care about the speed?" This is a web page, where loading the page happens at network speeds; the search happens at more or less local-processor speed. It's very unlikely that this one search will make a perceptible difference in the behavior of the page.
Here is a little explanation:
Performing a dictionary lookup is relatively complicated - very fast compared with (say) a linear lookup by key when there are lots of keys, but much more complicated than a straight array lookup. It has to calculate the hash of the key, then work out which bucket that should be in, possibly deal with duplicate hashes (or duplicate buckets) and then check for equality.
As always, choose the right data structure for the job - and if you really can get away with just indexing into an array (or List) then yes, that will be blindingly fast.
The above has been taken from one of the blog posts of #Jon Skeet.
I know this is an old question, but here goes an answer for those who might need in the future.
I made some tests and the indexOf() method is impossibly fast!
Tested the case on Opera 12.16 and it took 216µs to search and possibly find something.
Here is the code used:
console.time('a');
var a=((Math.random()*1e8)>>0).toString(16);
for(var i=0;i<1000;++i)a=a+' '+((Math.random()*1e8)>>0).toString(16)+((Math.random()*1e8)>>0).toString(16)+((Math.random()*1e8)>>0).toString(16)+((Math.random()*1e8)>>0).toString(16);
console.timeEnd('a');
console.time('b');
var b=(' '+a).indexOf(((Math.random()*1e8)>>0).toString(16));
console.timeEnd('b');
console.log([a,b]);
In the console you will see a huge output.
The timer 'a' counts the time taken to make the "garbage", and the timer 'b' is the time to search for the string.
Just adding 2 spaces, one before and one after, on the email list and adding 1 space before and after the email, you are set to go.
I use it to search for a class in an element without jQuery and it works pretty fast and fine.

Categories