I'm creating a text editor and have just finished writing the highlighting algorithms, so that each syntax element appears in a different color and renders in the right position using the proper parse trees.
I was wondering if anyone could provide, or point me to, a test or series of test cases to make sure nothing will break. The test case(s) should cover all of JavaScript syntax as it is used on the web, including edge cases (e.g., syntax like throw, although it is rarely used), DOM creation and manipulation, etc.
I have added the following static test case. It should cover all the syntax.
There are a few things to note: since the code is parsed recursively at the grammar level, only basic cases are required. For example, to the editor:
a[1]; and a[1][2][3][4][5]; would be the same syntax, since the second line is just recursively more subscripts than the first.
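To illustrate why one basic case is enough, here is a minimal sketch of a subscript rule. The mini-grammar and the names parseMember and nodeTypes are assumptions made up for this example, not the editor's actual parser. The same rule is applied once per subscript, so a deep chain produces no node type that the shallow one lacks.

```javascript
// Hypothetical mini-grammar: Member := Identifier ( "[" Number "]" )*
function parseMember(source) {
  var tokens = source.match(/[A-Za-z_$][\w$]*|\d+|[\[\];]/g) || [];
  var pos = 0;
  var node = { type: 'Identifier', name: tokens[pos++] };
  // The same rule is applied once per "[...]", wrapping the previous node.
  while (tokens[pos] === '[') {
    if (tokens[pos + 2] !== ']') {
      throw new SyntaxError('expected ] after subscript');
    }
    node = { type: 'Subscript', object: node, index: Number(tokens[pos + 1]) };
    pos += 3;
  }
  return node;
}

// Collects the distinct node types found in a parsed chain.
function nodeTypes(node) {
  var seen = {};
  for (var current = node; current; current = current.object) {
    seen[current.type] = true;
  }
  return Object.keys(seen).sort();
}

var shallow = nodeTypes(parseMember('a[1];'));
var deep = nodeTypes(parseMember('a[1][2][3][4][5];'));
```

Both shallow and deep contain the same two node types, which is the sense in which the two statements are "the same syntax" to a highlighter.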
The test case I have created has been moved to an answer below.
Interesting question. I think my initial approach, barring any other interesting suggestions here, would be to grab a bunch of JavaScript from fairly major libraries. I'm thinking jQuery, Mootools, Prototype, etc.
Then, once you've done a few major libs, do some smaller ones. I'd check out GitHub. Maybe look at Underscore, HeadJS, and maybe some others at https://github.com/languages/JavaScript.
I would also take a couple minified libraries, run them through JSBeautifier. Not sure if beautified JS may have slightly altered syntax from the original.
Lastly, I would consider running some of these libraries through JSLint, and then manually go through and modify the sources to explicitly hit some of the 'rules' that JSLint has laid out.
EDIT: And by "hit", I mean make sure you cover both scenarios offered by each rule, not just the 'clean' version.
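When feeding library sources through the highlighter, one cheap invariant to check is that the tokens tile the input exactly, with no gaps, overlaps or dropped characters. The sketch below assumes a tokenizer that returns objects with start and text fields; toyTokenize is a hypothetical stand-in for the editor's real highlighter, and checkCoverage is an illustrative name.

```javascript
// Hypothetical tokenizer standing in for the editor's highlighter.
function toyTokenize(source) {
  var pattern = /\s+|[A-Za-z_$][\w$]*|\d+|[\s\S]/g;
  var tokens = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    tokens.push({ start: match.index, text: match[0] });
  }
  return tokens;
}

// Returns null when the tokens cover the source exactly, else a message.
function checkCoverage(source, tokenize) {
  var offset = 0;
  var tokens = tokenize(source);
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i].start !== offset) {
      return 'gap or overlap at offset ' + offset;
    }
    offset += tokens[i].text.length;
  }
  return offset === source.length ? null : 'input not fully consumed';
}

// A deliberately broken tokenizer that silently drops whitespace.
function lossyTokenize(source) {
  return toyTokenize(source).filter(function (token) {
    return token.text.trim() !== '';
  });
}

var okResult = checkCoverage('a[1] = b;', toyTokenize);
var badResult = checkCoverage('a = 1', lossyTokenize);
```

The same check can be run unchanged over each library file, which turns "does anything break?" into a pass/fail result per file.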
One possible approach: there are various applications that will generate random pieces of code starting from a BNF grammar of a language (such as this one), and there are grammar files for JavaScript available.
That won't get you a static test case that you can script tests against with known expected results, necessarily, but might be a good way to test your parser against unexpected (but legal) strings and make sure it doesn't break.
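As a rough sketch of the idea, assuming a tiny made-up grammar rather than a real JavaScript BNF file, random derivations can be produced in a few lines. The grammar object, makeRandom and generate are all hypothetical names for this example. A seeded generator is used so that any input that breaks the highlighter can be reproduced.

```javascript
// Toy grammar (an assumption for this sketch); keys are nonterminals.
var grammar = {
  STATEMENT: [['EXPR', ';']],
  EXPR: [['TERM'], ['TERM', ' + ', 'EXPR'], ['(', 'EXPR', ')']],
  TERM: [['a'], ['1'], ['a', '[', 'EXPR', ']']]
};

// Small deterministic generator (linear congruential) for reproducible runs.
function makeRandom(seed) {
  var state = seed >>> 0;
  return function () {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Expands a symbol into a string by picking rules at random.
function generate(symbol, random, depth) {
  var rules = grammar[symbol];
  if (!rules) {
    return symbol; // not a nonterminal, so emit it as literal text
  }
  // Past the depth limit, always take the first rule so the recursion ends.
  var rule = depth > 6 ? rules[0] : rules[Math.floor(random() * rules.length)];
  return rule.map(function (part) {
    return generate(part, random, depth + 1);
  }).join('');
}

var sample = generate('STATEMENT', makeRandom(42), 0);
```

Each generated string is legal JavaScript by construction, so any crash or mis-positioned token on such input points at the highlighter rather than at the test data.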
This is so far the best test case I was able to come up with.
EDIT: Added regexp, and throw. This case is syntactically valid and should cover all cases of JS. Please message me directly if you find anything missing so that I can add it here.
a = 1;
b = { 'a' : a };
c = 'a';
d = this;
var patt1=/w3ghouls/i;
throw "Err3";
function e(a,b,c){
d += a + b + c++;
return d;
}
this.xy.z = function(a, b){
var x = null;
}
var f = function(a,b){
if(a == b || (b === a && a)){
var f = [a,b];
try{
f = f.slice(0);
}catch(e){
console.log(e * e + '');
}
}else if(a){
a = null;
a = undefined;
b = typeof a;
b = true;
b = false;
}else{
switch(c){
case 'c':
break;
default:
null;
break;
}
}
}
for(var i =0; i <= a.length; i++){
do{
continue;
null;
}while(a != b);
}
if(a == b)
(a) ? null : null;
/* This is a finished
test case */
A good way to start would be to run this through JSLint to see if your JavaScript is valid. It is the best checking tool I know of, but I'm not sure how well it will do at checking whether code is broken. :(
Hope that helps.
So I'm building a small app where you can evaluate some pieces of JavaScript code, but I'm having a huge "moral" problem:
Initially I wanted to use eval, but I found out about its dangers, so I quickly looked for an alternative.
The closest thing I could find was the Function constructor, but for one thing it doesn't evaluate simple pieces of code, such as 2 + 3, since it needs a return statement, whereas eval doesn't; it's also not that much better security-wise than eval (at least from what I've gathered).
Are there any other ways to evaluate a string as if it were code?
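The difference described above can be seen in a short sketch. The helper name evaluateExpression is made up for this example, and it assumes the input is a single expression; it is no safer than eval.

```javascript
// eval returns the value of the last expression statement it runs.
var viaEval = eval('2 + 3');

// A Function body is a function body: without return the result is undefined.
var viaFunctionNoReturn = new Function('2 + 3')();
var viaFunctionWithReturn = new Function('return 2 + 3')();

// Hypothetical helper: wraps a single expression so callers can omit return.
function evaluateExpression(expression) {
  return new Function('return (' + expression + ');')();
}

var viaHelper = evaluateExpression('2 + 3');
```

Here viaEval, viaFunctionWithReturn and viaHelper are all 5, while viaFunctionNoReturn is undefined.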
If you want to evaluate JavaScript code, use eval. Is it dangerous? Yes. But that's only because evaluating JavaScript is dangerous. There's no safe way to evaluate JavaScript. If you want to evaluate JavaScript, use eval.
Take every security precaution possible. It's impossible to know what security precautions you should take without knowing more details on what you want to support and how you plan to implement it.
This may be useful:
Is It Possible to Sandbox JavaScript Running In the Browser?
https://github.com/google/caja
You can easily make your own interpreter of JS in JS. I made such a thing for www.Photopea.com (File - Scripts, I want to let users execute scripts over PSD documents).
Acorn is an advanced JS parser, which takes a string (JS code) and returns a syntax tree. Then, start at the root of the syntax tree and execute commands one by one.
"Jump" across the tree recursively. Use the JS call stack of the environment as a call stack of the interpreted code. Use JS objects {var1: ..., var2: ...} to store values of variables in each execution space (global, local in a function ...).
You can allow that code to access data from the outer environment through some interface, or make it completely sandboxed. I thought that making my own interpreter would take me a week, but I made it in about 6 hours :)
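A minimal sketch of such a tree-walking evaluator follows. It is not the Photopea interpreter: to stay self-contained it skips the parsing step and uses a hand-built tree shaped like the ESTree nodes that parsers such as Acorn produce, and it handles only literals, identifiers and four arithmetic operators. The names evaluate and tree are assumptions for this example.

```javascript
// Walks an ESTree-style node and computes its value.
// scope is a plain object holding the variables the script may read.
function evaluate(node, scope) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'Identifier':
      // Only names placed in scope are visible; nothing leaks from outside.
      if (!Object.prototype.hasOwnProperty.call(scope, node.name)) {
        throw new ReferenceError(node.name + ' is not defined');
      }
      return scope[node.name];
    case 'BinaryExpression':
      var left = evaluate(node.left, scope);
      var right = evaluate(node.right, scope);
      switch (node.operator) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        case '/': return left / right;
      }
      throw new Error('unsupported operator ' + node.operator);
    default:
      throw new Error('unsupported node type ' + node.type);
  }
}

// Hand-built tree for the expression: price * 2 + 1
var tree = {
  type: 'BinaryExpression',
  operator: '+',
  left: {
    type: 'BinaryExpression',
    operator: '*',
    left: { type: 'Identifier', name: 'price' },
    right: { type: 'Literal', value: 2 }
  },
  right: { type: 'Literal', value: 1 }
};

var result = evaluate(tree, { price: 20 });
```

Because identifiers are resolved only against the scope object, the interpreted code cannot reach window or document unless the host chooses to put them there.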
Please never ever use eval, no matter what; there is a much better alternative. Instead of eval, use new Function. eval is evil, there's no question about that, but most people skip over the most evil aspect of eval: it gives you access to variables in your local scope. Back in the 90s, before the concept of JIT compilation, evals sounded like a good idea (and they were): just insert some additional lines dynamically into the code you're already executing line by line. This also meant that evals didn't really slow things down all that much. Nowadays, however, eval statements are very taxing on JIT compilers, which internally remove the concept of variable names entirely. To evaluate an eval statement, a JIT compiler has to figure out where all of its variables are stored and match them with unknown globals found in the evaled statement. The problem extends even deeper if you get really technical.
But with new Function, the JIT compiler doesn't have to do any expensive variable name lookups: the entire code block is self-contained and in the global scope. For example, take the following terribly inefficient eval snippet. Please note that this is only for the purpose of being an example. In production code, you shouldn't even be using eval or new Function to generate a function from a string whose content is already known.
var a = {
prop: -1
};
var k = eval('(function(b){return a.prop + b;})');
alert( k(3) ); // will alert 2
Now, let's take a look at the much better new Function alternative.
var a = {
prop: -1
};
var k = (new Function('a', 'b', 'return a.prop + b')).bind(undefined, a);
alert( k(3) ); // will alert 2
Notice the difference? There is a major one: the eval is executed inside the local scope while the new Function is executed inside the global one.
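The scope difference can be checked directly with a small sketch. The names scopeDemo and localOnlyValue are made up for this example, and it assumes no global called localOnlyValue exists.

```javascript
function scopeDemo() {
  var localOnlyValue = 'visible only inside scopeDemo';
  // Direct eval runs in the calling scope, so it can see the local variable.
  var seenByEval = eval('typeof localOnlyValue');
  // new Function creates a function whose only outer scope is the global one.
  var seenByFunction = new Function('return typeof localOnlyValue')();
  return { seenByEval: seenByEval, seenByFunction: seenByFunction };
}

var scopes = scopeDemo();
```

scopes.seenByEval is "string", while scopes.seenByFunction is "undefined": the evaled code can read the local variable, the constructed function cannot.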
Now, onto the next problem: security. There is a lot of talk about how security is difficult, and yes, with eval it is pretty much impossible (e.g. if you wrap the whole code in a sandboxing function, then all you have to do is prematurely end the function and start a new one to execute code freely in the current scope). But with new Function, you can easily (though not the most efficiently) sandbox anything. Look at the following code.
var whitelist = ['Math', 'Number', 'Object', 'Boolean', 'Array'];
var blacklist = Object.getOwnPropertyNames(window).filter(function(x){
return whitelist.indexOf(x) === -1 && !/^[^a-zA-Z]|\W/.test(x)
});
var listlen = blacklist.length;
var blanklist = (new Array(listlen+1)).fill(undefined);
function sandboxed_function(){
"use-strict";
blacklist.push.apply(blacklist, arguments);
blacklist[blacklist.length-1] =
'"use-strict";' + arguments[arguments.length-1];
var newFunc = Function.apply(
Function,
blacklist
);
blacklist.length = listlen;
return newFunc.bind.apply(newFunc, blanklist);
}
Then, fiddle around with the whitelist, get it just the way you want it, and then you can use sandboxed_function just like new Function. For example:
var myfunc = sandboxed_function('return "window = " + window + "\\ndocument = " + document + "\\nBoolean = " + Boolean');
output.textContent = myfunc();
<pre id="output"></pre>
As for writing code to be run under this strict sandbox, you may be asking: if window is undefined, how do I test for the existence of methods? There are two solutions. The first is simply to use typeof, like so.
output.textContent = 'typeof foobar = ' + typeof foobar;
<div id="output"></div>
As you can see in the above code, using typeof will not throw an error; it simply returns the string "undefined". The second primary way to check for a global is the try/catch method.
try {
if (foobar)
output.textContent = 'foobar.constructor = ' + foobar.constructor;
else
output.textContent = 'foobar.constructor = undefined';
} catch(e) {
output.textContent = 'foobar = undefined';
}
<div id="output"></div>
So, in conclusion, I hope my code snippets gave you some insight into a much better, nicer, cleaner alternative to eval. And I hope I have inspired you to a greater purpose: snubbing eval. As for browser compatibility, while sandboxed_function will run in IE9, IE10+ is required for it to actually sandbox anything. This is because the "use-strict" statement is essential to eliminating many of the sneaky sandbox-breaking tricks, like the one below.
var whitelist = ['Math', 'Number', 'Object', 'Boolean', 'Array'];
var blacklist = Object.getOwnPropertyNames(window).filter(function(x){
return whitelist.indexOf(x) === -1 && !/^[^a-zA-Z]|\W/.test(x)
});
var listlen = blacklist.length;
var blanklist = (new Array(listlen+1)).fill(undefined);
function sandboxed_function(){
blacklist.push.apply(blacklist, arguments);
blacklist[blacklist.length-1] =
/*'"use-strict";' +*/ arguments[arguments.length-1];
var newFunc = Function.apply(
Function,
blacklist
);
blacklist.length = listlen;
return newFunc.bind.apply(newFunc, blanklist);
}
var myfunc = sandboxed_function(`return (function(){
var snatched_window = this; // won't work in strict mode where the this
// variable doesn't need to be an object
return snatched_window;
}).call(undefined)`);
output.textContent = "Successfully broke out: " + (myfunc() === window);
<pre id="output"></pre>
One final comment: if you are going to allow event APIs into your sandboxed environment, you must be careful. The view property can be a window object, so you have to erase that too. There are several other things, but I would recommend researching thoroughly and exploring the objects in Chrome's console.
In JavaScript I recently realized you could use the OR || logical operator for assignment, and I want to know if it's considered bad practice.
In particular, I have some functions that take an optional array as input. If the input is null or undefined I should just set it to an empty array []; if it has content, it should take the content.
I found that assignment using the OR operator handles that perfectly in a single line, and it's clean. However, it feels like the kind of thing that might be considered bad practice, or may have some horrible pitfalls I'm not considering.
Another approach is a simple if check, which is fairly safe in general.
I want to know if the || approach seen below has any pitfalls I'm not considering. It works in this scenario, but I would appreciate knowing whether I should keep using it in the future or stop using it altogether.
https://jsbin.com/nozuxiwawa/1/edit?js,console
var myArray = ['Some', 'Strings', 'Whatever'];
// Just assign using OR
var pathOne = function(maybeAnArray) {
var array = maybeAnArray || [];
console.log(array);
}
// Assign using IF
var pathTwo = function(maybeAnArray) {
var array = [];
// Covers null and undefined
if (maybeAnArray != null) {
array = maybeAnArray;
}
console.log(array);
}
console.log('Path one:');
pathOne(myArray); // ['Some', 'Strings', 'Whatever']
pathOne(null); // []
console.log('\nPath two:');
pathTwo(myArray); // ['Some', 'Strings', 'Whatever']
pathTwo(null); // []
IMHO the use of the OR || for the purposes of assignment is perfectly valid and is good practice. We certainly use it in our projects and I've seen it used in lots of 3rd party projects that we use.
The thing you need to be aware of is how certain JavaScript values can be coerced to other values. For example, if you're ORing values such as "", false or 0, they are treated as false. This means that when you have the following:
function f(o) {
var x = o || -1;
return x;
}
Calling:
f(0)
...will return -1... but calling
f(1)
...will return 1, even though in both cases you passed a number: because 0 is treated as false, -1 is assigned to x.
That said, as long as you're aware of how the OR operator will treat the operands you use with it, it is good JavaScript practice to use it.
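When falsy values such as 0 or "" are legitimate inputs, a small helper that tests only for null and undefined avoids the pitfall above. This is a sketch; the name withDefault is made up for the example.

```javascript
// Falls back only when value is null or undefined, not for 0, "" or false.
function withDefault(value, fallback) {
  return value != null ? value : fallback;
}

var orZero = 0 || -1;                      // 0 is falsy, so the fallback wins
var helperZero = withDefault(0, -1);       // 0 is kept
var helperNull = withDefault(null, -1);    // fallback is used
var helperMissing = withDefault(undefined, []); // fallback is used
```

For the optional-array case in the question the two approaches behave the same, since arrays are always truthy.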
I prefer the first option; it reads clearly to me. But when I need to share my code with others, I would consider using the second, since it will be clearer to everyone.
I'm now using Sonar, and I prefer the second option there too, since it is easier for a machine to analyze in integration work.
A last idea is to use
if(maybeAnArray !== void(0))
Two reasons:
it compares both type and value, with no type coercion
void(0) works the same in all browsers
Hope it helps.
When given the option, I prefer concise code (which must still be readable).
I would say || is common enough that it is considered good practice. Once one has seen it a few times it reads just fine.
In my opinion there are a few reasons why you should rather use the second option:
First of all it's much more readable - new developers that are still learning can have problems with understanding notation like var myArray = someArrayArg || [];
If you are using some kind of code checkers like JSLint, they will return warnings and/or errors like Expected a conditional expression and instead saw an assignment. for the statement with var myArray = someArrayArg || [];
We already have something like var myArray = someArrayArg ? someArrayArg : []; that works pretty well
I've got this error message, that I'm not a fan of.
Bad line breaking before '?'.
I feel like
var s = (a === b)
? 'one'
: 'two';
looks better. Crockford says:
Semicolon insertion can mask copy/paste errors. If you always break lines after operators, then JSLint can do a better job of finding those errors.
Can someone give me an example or two, of the kind of copy/paste errors he's referring to?
Update:
var s = (a === b)
? 'one'
: 'two';
looks better than
var s;
if(a === b) {
s = 'one';
} else {
s = 'two';
}
(As requested, my comments re-posted as an answer:)
The "obvious" copy/paste error in the example you show would be to copy the first line:
var s = (a === b)
...which of course is valid code on its own but clearly doesn't do the same thing as the three lines together. One would hope that people would look at surrounding code before copying one line, but you never know.
The point that I think Mr Crockford is trying to make is that if you deliberately split a multi-line expression up in a way that the individual lines are not valid code on their own, then if you accidentally copy just one line of the expression it will likely cause a syntax error when you paste it somewhere else. Which is good because syntax errors are reported by the browser and/or JSLint/JSHint, and so easier to find than the more subtle bugs created if you copy/paste a line that is valid on its own. So if you "always break lines after operators" as Crockford suggests:
var s = (a === b) ?
'one' :
'two';
...then the only line of the ternary that is valid code on its own (the third) doesn't really look complete, and so would be easier to spot as a mistake if pasted on its own because it so obviously doesn't do anything on its own - and it's less likely to be copied by itself in the first place for the same reason.
(Having said that, I don't stress about the ternary operator in my own code, and I think the above looks ugly. I put a short ternary expression on one line, a longer one over two lines with the line break after the middle operand and the : lined up under the ?, or a really long one on three lines like yours.)
The most (in)famous example is as follows:
function one() {
return
{
val: 1
};
}
alert(one()); // undefined
vs
function one() {
return {
val: 1
};
}
alert(one()); // [object Object]
The type of copy-paste errors he's referring to are the ones where you hand your code off to someone else, or yourself in 6 months, and that other person haphazardly copies your code, ending on the closing paren of the condition, assuming that the assignment is meant to be the value of the evaluated right-hand side.
This seems implausible, and in a sense, you would hope that it is...
But I know that auto-insertion has borked code for my company multiple times, now, and they still haven't forced adoption of explicit semicolons, still treat JS as if new lines were significant and still make cut/paste errors, through neglect plus lack of tools/version-management/build-systems.
Say you paste a function expression in immediately before,
var a = 1, b = 1; // a === b, expect 'one'
(function(){
console.log('called');
})
(a === b)
? 'one'
: 'two'
// called
// "two"
Once, I saw an example like this:
var a, x, y;
var r = 10;
with (Math) {
a = PI * r * r;
x = r * cos(PI);
y = r * sin(PI / 2);
}
And it looks very convenient, because that way I don't have to type all the Math.s.
But when I take a look at the MDN, it says:
Using with is not recommended, and is forbidden in ECMAScript 5 strict mode. The recommended alternative is to assign the object whose properties you want to access to a temporary variable.
So is it okay to use with()? In HTML5?
The MDN you linked says Using with is not recommended...
with is an excellent way of making spaghetti code for lunch.
You might like it, but the guy that will need to debug it will curse you.
JavaScript has some very weird operators, like the comma operator (,).
Can you understand what the following code does?
var a = "a";
var b = "b";
a = [b][b = a,0];
Well, it swaps a and b... If you don't understand it, neither will the guy who has to maintain your with code. Don't use hacks; hacks are cool in charades games, not in real code.
When is the comma operator useful?
The comma swap Fiddle
It is okay to use any feature of JavaScript, so long as you understand it.
For example, using with you can access existing properties of an object, but you cannot create new ones.
Observe:
var obj = {a:1,b:2};
with(obj) {
a = 3;
c = 5;
}
// obj is now {a:3,b:2}, and there is a global variable c with the value 5
It can be useful for shortening code, such as:
with(elem.parentNode.children[elem.parentNode.children.length-3].lastChild.style) {
backgroundColor = "red";
color = "white";
fontWeight = "bold";
}
Because the properties of the style object already exist.
I hope this explanation is clear enough.
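For comparison, the alternative from the MDN passage quoted in the question (a temporary variable) gives similar brevity without with. This is only a sketch: fakeElement is a plain object standing in for a real DOM element, so it can run outside a browser.

```javascript
// The Math example from the question, using a short alias instead of with.
var r = 10;
var M = Math;
var a = M.PI * r * r;
var x = r * M.cos(M.PI);
var y = r * M.sin(M.PI / 2);

// The style example: hold the deeply nested object in a temporary variable.
// fakeElement is a stand-in (assumption) for the element in the answer above.
var fakeElement = { style: {} };
var style = fakeElement.style;
style.backgroundColor = 'red';
style.color = 'white';
style.fontWeight = 'bold';
```

Unlike with, a misspelled property name here creates a property on the style object rather than an accidental global variable.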
In his excellent book "Javascript: The Good Parts", Douglas Crockford lists the "with Statement" in Appendix B: The Bad Parts.
He says "Unfortunately its results can sometimes be unpredictable, so it should be avoided".
He goes on to give an example, where an assignment inside the with will operate on different variables depending on whether the object is defined or not.
See With statement considered harmful (but less detailed than the explanation in the book).
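The kind of ambiguity described can be sketched as follows. This is an illustration written for this thread, not the example from the book. The function is built with the Function constructor only so that its body runs in non-strict mode, where with is still allowed; the name assignInsideWith is made up.

```javascript
// The same assignment hits different targets depending on the object passed in.
var assignInsideWith = new Function('obj',
  'var value = "outer";\n' +
  'with (obj) {\n' +
  '  value = "assigned";\n' +  // obj.value if it exists, otherwise the local
  '}\n' +
  'return { local: value, property: obj.value };'
);

var whenPropertyExists = assignInsideWith({ value: 'original' });
var whenPropertyMissing = assignInsideWith({});
```

With the property present, the object is modified and the local variable keeps "outer"; with it missing, the local variable is overwritten and the object is left untouched. Nothing in the body of the with block tells the reader which of the two will happen.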
The main issue I'm thinking about is whether assigning a variable in an if statement is safe and reliable across different browsers. If it is safe, I'd like to use it.
Here it reads the querystring and if the querystring variable SN is either Twitter or Facebook then it enters the if and you can use the variable, if the querystring variable doesn't exist or is some other value then it goes into the else.
if(socialNetwork = (window.location.search.indexOf("SN=Twitter") > 0) ? "Twitter" : ((window.location.search.indexOf("SN=Facebook") > 0) ? "Facebook" : null))
{
alert(socialNetwork);
}
else
{
alert("nope");
}
It is part of the language design and should work in every browser, but it's very difficult to read.
That's ugly.
var uselessSocialNetworkingApp = window.location.search.replace(/.*\bSN=(\w+)\b.*/, "$1");
if (uselessSocialNetworkingApp)
alert("yay!");
else
alert("no");
It's kind-of funny that there'd be that hideous construction in the "if" header, but that it'd be an "if" instead of a "? :" expression inside the "alert" argument list :-)
Also, to be at least slightly sympathetic to the intended style, this is an example of what the "let" statement in ultra-modern Javascript is for.
Oh my! This is valid and should always work, assuming that you create the socialNetwork variable elsewhere (don't ever create implied globals). However, this is really a strange way to solve your problem. Why not create a function that returns the social network to abstract this a little?
That said, if you really want a one line solution, how about this?:
alert(function(){ var m = /SN=([A-Za-z]+)/.exec(window.location.search); return (m ? m[1] : null)}());
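The function suggested above might look something like the following sketch. The name getSocialNetwork is made up for this example, and it uses URLSearchParams, which assumes a reasonably modern browser (or Node); it takes the query string as a parameter so it can be tested without a window object.

```javascript
// Returns "Twitter" or "Facebook" when the SN parameter has one of those
// values, and null when it is missing or holds anything else.
function getSocialNetwork(search) {
  var value = new URLSearchParams(search).get('SN');
  return value === 'Twitter' || value === 'Facebook' ? value : null;
}

// In a page this would be called as getSocialNetwork(window.location.search).
var fromTwitter = getSocialNetwork('?ref=home&SN=Twitter');
var fromOther = getSocialNetwork('?SN=MySpace');
var fromNothing = getSocialNetwork('');
```

The assignment then happens on its own line, and the if statement only has to test the result.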
location.socialNetwork = (function(){
var s= location.search || '';
s= /SN=([a-zA-Z]+)/.exec(s) || [];
return s[1] || null;
})()
alert(location.socialNetwork)