Is there any way to do this? Or to limit the execution time of eval() (e.g. to no more than 1 second)?
You could try one of the minifiers, such as UglifyJS. They all include a parser which might be fairly easy to extract (UglifyJS contains a file called "parse-js.js", although I haven't looked at it in detail).
My motive is to mangle variable and function names and also encrypt strings in a javascript file.
For this I only need to separate strings, comments, and variable/function names.
I've tried UglifyJS2, but I need more control, so I tried to write a lexer myself using Flex.
I'm able to take care of comments and quoted strings.
However, I'm stuck on regular expression literals. For example, /"/ is a regular expression containing a quote, and it causes my string parsing to fail.
It looks like correctly identifying a regular expression would need a Bison parser with grammar rules; otherwise comments, strings, and regular expressions get mixed up.
I don't want to get that far and use Bison.
One workaround is to move all the regular expression code into functions in a separate file.
Is there any other alternative so that I can handle this in Flex itself?
If you can run JavaScript, you can use Esprima, a JavaScript parser coded in JavaScript. It can even run in your browser or any runtime like NodeJS.
It can output just tokens or abstract syntax trees. I believe that this should be enough for you.
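If a full parser really is more than you want, the usual lexer-only trick for the regex problem is to remember the previous significant token: a "/" starts a regular expression unless that token could end an expression. In Flex this would map to a start condition or a state flag. Below is a minimal sketch of the idea in JavaScript. The function name scan and the token shapes are made up for illustration, this is not how Esprima does it internally, and the heuristic is knowingly incomplete (template literals are not handled, and cases such as if (x) /re/.test(y) or a "/" right after "}" can be misclassified).

```javascript
// Hypothetical helper: splits source into comments, strings, regex literals,
// names, numbers and single-character punctuation.
// Heuristic: "/" starts a regex unless the previous significant token could
// end an expression (identifier, literal, ")" or "]").
function scan(source) {
  const tokens = [];
  // Keywords after which an expression (and so a regex) may start.
  const keywordsBeforeRegex = new Set([
    'return', 'typeof', 'in', 'of', 'case', 'delete', 'void',
    'throw', 'new', 'instanceof', 'do', 'else',
  ]);
  let prevEndsExpr = false;
  let i = 0;

  while (i < source.length) {
    const ch = source[i];
    const rest = source.slice(i);
    let m;
    if ((m = /^\s+/.exec(rest))) {
      // Whitespace does not change the state.
    } else if ((m = /^(\/\/[^\n]*|\/\*[\s\S]*?\*\/)/.exec(rest))) {
      // Comments do not change the state either.
      tokens.push({ type: 'comment', text: m[0] });
    } else if ((m = /^("(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')/.exec(rest))) {
      tokens.push({ type: 'string', text: m[0] });
      prevEndsExpr = true;
    } else if (ch === '/' && !prevEndsExpr &&
        (m = /^\/(?:\\.|\[(?:\\.|[^\]\\\n])*\]|[^\/\\\[\n])+\/[a-z]*/.exec(rest))) {
      // Body allows escapes and character classes, so /[/"]/ stays one token.
      tokens.push({ type: 'regex', text: m[0] });
      prevEndsExpr = true;
    } else if ((m = /^[A-Za-z_$][\w$]*/.exec(rest))) {
      tokens.push({ type: 'name', text: m[0] });
      prevEndsExpr = !keywordsBeforeRegex.has(m[0]);
    } else if ((m = /^\d[\w.]*/.exec(rest))) {
      tokens.push({ type: 'number', text: m[0] });
      prevEndsExpr = true;
    } else {
      m = [ch];
      tokens.push({ type: 'punct', text: ch });
      // "}" is ambiguous (block vs. object literal); treated as "regex may follow".
      prevEndsExpr = ch === ')' || ch === ']';
    }
    i += m[0].length;
  }
  return tokens;
}
```

For example, scan('var re = /"/; x = a / b;') reports /"/ as one regex token and the later "/" as plain punctuation, which is the distinction the Flex lexer was missing.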
I'm working in a system where there is no document and no jQuery, but I do have to present HTML entities in an understandable way. So the trick of putting the string in an element and then taking the .text() won't work.
I need a pure JavaScript solution. The system isn't reachable from the outside, there is no user-input so security is not really an issue.
Thanks for any help, I'm out of ideas (not that I had too many to begin with)...
Perhaps I should clarify: what I am looking for is a function (or pointers in the right direction) that can take a string containing entity substrings and translate those into the corresponding characters. So it should be able to translate "blah &lt; blahblah" into "blah < blahblah".
There are no additional frameworks I can use other than pure javascript.
UPDATE:
I've got the HTML4 part working. It wasn't extremely difficult, but I have been busy with other things. Here's the fiddle: html4 entities to characters.
You could have done the same with a dictionary with just the characters already in there, but I didn't feel like making such a dictionary. The function is fairly simple but I guess it could do with some refactoring, can't really be bothered at the moment...
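For reference, the dictionary approach mentioned above might look roughly like this. It is only a sketch, not the code from the fiddle: the names NAMED_ENTITIES and decodeEntities are made up, and the table is deliberately tiny (a real one would list every HTML4 entity name). Numeric entities such as &#39; and &#x27; are handled without any table.

```javascript
// Hypothetical, incomplete table; extend it with the remaining HTML4 names.
const NAMED_ENTITIES = {
  lt: '<',
  gt: '>',
  amp: '&',
  quot: '"',
  nbsp: '\u00A0',
};

// Decodes named, decimal and hexadecimal entities in a single pass.
// Unknown or out-of-range entities are left untouched.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Entity names are case-sensitive, so look the name up exactly as written.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}
```

Because everything happens in one replace pass, already-decoded output is never decoded again: "&amp;lt;" becomes "&lt;" rather than "<".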
This function exists in PHP (htmlspecialchars_decode). As such, you'll find a javascript port from PHPJS. This is based on a very established codebase, and should be better than rolling something on your own.
Edit / Add:
Flub on my part. I didn't read the entities part properly. You want the equiv of html_entity_decode:
http://phpjs.org/functions/html_entity_decode/
Assuming you are using nodejs, cheerio is exactly what you need. I have used it myself a couple of times with great success for off-browser testing of HTML structures returned from servers.
https://github.com/cheeriojs/cheerio
The most awesome part is that it uses jQuery API.
I'm not sure why Microsoft hasn't allowed an Intellisense-friendly environment for using native regex with JavaScript in WebMatrix, but here is what I see when I attempt an otherwise normal JavaScript function:
As you can see, while this is perfectly valid JavaScript (using native regex), WebMatrix's Intellisense for my .js file is showing more random colors than a kaleidoscope.
I probably shouldn't be complaining, since it works, but I would like to restore human-readability if I can. I have noticed that this hasn't been mentioned anywhere else before (that I can find), and I was wondering if there were a more aesthetically pleasing way to handle this, given the environment.
I've tried using something like new RegExp(/&/g) for the first argument in the replace function, but of course, it produces the same glitch.
I've also tried storing the regex in a string, but I don't think that is in the format the first argument expects, so no dice there either.
I am not a master at regular expressions, by any means, so I apologize if I am overlooking a simple workaround here.
Is there anything I can do to retain this function in a more human-readable way?
--------------------------UPDATE-----------------------------
I just noticed that the line input = input.replace(/'/g, ",") is actually replacing an apostrophe with a comma. Rather than reload the picture, I will just mention that here (the proper hex code for an apostrophe is &#x27;).
new RegExp(/&/g) does not really make sense as the /&/g already creates a RegExp object. You can use new RegExp('&', 'g') instead.
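A small sketch of what that could look like for an escaping function written without any regex literals. The function name escapeHtml and the exact set of replacements are assumptions, since the original function is only visible in the screenshot. Note that with the string form, any backslash in the pattern has to be doubled, e.g. new RegExp('\\d+', 'g') for /\d+/g.

```javascript
// The literal and the constructor call below build equivalent regexes;
// only the second avoids regex-literal syntax in the source file.
const literal = /&/g;
const constructed = new RegExp('&', 'g');

// Hypothetical escaping helper using only the constructor form.
// "&" must be replaced first so the entities added later are not re-escaped.
function escapeHtml(input) {
  return input
    .replace(new RegExp('&', 'g'), '&amp;')
    .replace(new RegExp('<', 'g'), '&lt;')
    .replace(new RegExp('>', 'g'), '&gt;')
    .replace(new RegExp('"', 'g'), '&quot;')
    .replace(new RegExp("'", 'g'), '&#x27;');
}
```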
By the way, this is not the only problem there is with JavaScript Intellisense.
I need to split a JavaScript file into single instructions. For example
a = 2;
foo()
function bar() {
b = 5;
print("spam");
}
has to be separated into three instructions. (assignment, function call and function definition).
Basically, I need to instrument the code by injecting checks between these instructions. Splitting on ";" obviously wouldn't work, because instructions can also end with a newline, and I may not want to instrument code inside function and class definitions (I don't know yet). I took a course about grammars with Flex/Bison, but in this case the semantic action for the rule would be "print all the descendants in the parse tree and put my code at the end", which I don't think can be done with basic Bison. How do I do this? I also need to split the code because I need to interface with Python through python-spidermonkey.
Or... is there a library out there already which saves me from reinventing the wheel? It doesn't have to be in Python.
Why not use a JavaScript parser? There are lots, including a Python API for ANTLR and a Python wrapper around SpiderMonkey.
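Once a parser has produced a tree, the splitting itself is short. The sketch below assumes an ESTree-style Program node whose top-level children carry range: [start, end] character offsets, which is what Esprima emits when asked for ranges. The function names splitTopLevel and instrument are made up for illustration.

```javascript
// Assumes `ast` is an ESTree-style Program whose top-level nodes have
// `range: [start, end]` offsets into `source`.
function splitTopLevel(source, ast) {
  return ast.body.map((node) => ({
    type: node.type,
    text: source.slice(node.range[0], node.range[1]),
  }));
}

// Hypothetical instrumentation step: emit each top-level statement followed by
// a check call on its own line. Bodies of functions are left untouched.
// Assumes `checkCall` starts with an identifier, so a preceding statement that
// lacks a semicolon (like "foo()") is still terminated by the line break.
function instrument(source, ast, checkCall) {
  return splitTopLevel(source, ast)
    .map((stmt) => stmt.text + '\n' + checkCall)
    .join('\n');
}
```

With Esprima available, esprima.parseScript(source, { range: true }) should provide a suitable ast. For the example in the question it yields three top-level nodes: two expression statements and one function declaration.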
JavaScript is tricky to parse; you need a full JavaScript parser.
The DMS Software Reengineering Toolkit can parse full JavaScript and build a corresponding AST.
AST operators can then be used to walk over the tree to "split it". Even easier, however, is to apply source-to-source transformations that look for one surface syntax (JavaScript) pattern and replace it with another. You can use such transformations to insert the instrumentation into the code, rather than splitting the code to make holes in which to do the insertions. After the transformations are complete, DMS can regenerate valid JavaScript code (complete with the original comments if unaffected).
Why not use an existing JavaScript interpreter like Rhino (Java) or python-spidermonkey (not sure whether this one is still alive)? It will parse the JS and then you can examine the resulting parse tree. I'm not sure how easy it will be to recreate the original code but that mostly depends on how readable the instrumented code must be. If no one ever looks at it, just generate a really compact form.
pyjamas might also be of interest; this is a Python to JavaScript transpiler.
[EDIT] While this doesn't solve your problem at first glance, you might use it for a different approach: Instead of instrumenting JavaScript, write your code in Python instead (which can be easily instrumented; all the tools are already there) and then convert the result to JavaScript.
Lastly, if you want to solve your problem in Python but can't find a parser: Use a Java engine to add comments to the code which you can then search for in Python to instrument the code.
Why not try a javascript beautifier?
For example http://jsbeautifier.org/
Or see Command line JavaScript code beautifier that works on Windows and Linux
Forget my parser. https://bitbucket.org/mvantellingen/pyjsparser is a great and complete parser. I've fixed a couple of its bugs here: https://bitbucket.org/nullie/pyjsparser
I have a web application that uses TONS of javascript, and as usual, there are a lot of textual constants that will be displayed to the user embedded in the code itself.
What do you think is the best way to make this localizable?
I know I need to take those strings off of the code and replace them with constants, which will be defined into some external place.
For the server side, ASP.Net provides some very neat capabilities for dealing with this.
What's the best way to do this in JavaScript?
The best idea I have is to have a JS file with ALL the string constants of the JS of the site (I'd have different copies of this, one for each language), and then on each page, I include this script first, before all the others.
This seems like the most centralized way, that also wastes the least bandwidth.
Are there any other better approaches?
Thanks!
Here's how we did it (in ASP.NET), pretty much along the lines of what you've mentioned:
1) Created two JavaScript files: one that defines all the JavaScript functions/DOM manipulations required by the site, and a second one called Messages.js that defines all the string literals that need to be localized, something like var ALERT_MSG = "Alert message in english".
2) Created a different version of Messages.js for each locale that we support and localized the strings. The localized js files were named using the messages.locale.js naming convention (e.g. messages.fr-FR.js).
3) Included the js files within the "ScriptManager" and provided the ResourceUICultures for the Messages.js file: this ensures that the correct localized file is embedded in the html output (if you are not using ASP.net you can build this basic functionality by doing some culture sniffing and including the appropriate js file).
4) Voila!
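If you are not on ASP.NET, the "culture sniffing" from step 3 can be a few lines of your own. A minimal sketch, assuming the messages.locale.js naming convention from step 2; the SUPPORTED list, the en-US fallback and the function name pickMessagesFile are made up for illustration:

```javascript
// Hypothetical list of locales for which a messages.<locale>.js file exists.
const SUPPORTED = ['en-US', 'fr-FR', 'de-DE'];

// Returns the file name to include for the requested locale:
// exact match first, then same language with another region, then the fallback.
function pickMessagesFile(requested, supported = SUPPORTED, fallback = 'en-US') {
  const wanted = requested.toLowerCase();
  let hit = supported.find((loc) => loc.toLowerCase() === wanted);
  if (!hit) {
    const lang = wanted.split('-')[0];
    hit = supported.find((loc) => loc.toLowerCase().split('-')[0] === lang);
  }
  return 'messages.' + (hit || fallback) + '.js';
}
```

In a browser the requested value could come from navigator.language; on the server it would typically come from the Accept-Language header or a user preference.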
Your approach makes sense. Details:
I'd have the strings for each language in an object.
localized={"cat":"chat","dog":"chien"};
Then in code:
localized["cat"]
The quotes around the keys and the bracket notation (rather than the more common object dot notation) are there to avoid collisions with JavaScript reserved words.
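To make that concrete, here is a small sketch with a key that happens to be a reserved word, plus a lookup helper that falls back to the key itself when a translation is missing. The helper name t and the extra "class" entry are made up for illustration.

```javascript
// Quoted keys are valid even when the key is a reserved word such as "class".
const localized = { "cat": "chat", "dog": "chien", "class": "classe" };

// Hypothetical lookup helper: returns the translation, or the key itself if
// none exists. hasOwnProperty keeps inherited names like "toString" from
// being mistaken for translations.
function t(key) {
  return Object.prototype.hasOwnProperty.call(localized, key)
    ? localized[key]
    : key;
}
```

Falling back to the key means a missing translation shows up as untranslated text in the page rather than as "undefined".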
There is a gettext library but I haven't used it.
Your approach sounds good enough.
If you have lots of strings and you are concerned about the bulkiness of the file, you may want to consider a script that creates a single JavaScript file for each language by concatenating the application JavaScript and the locale JavaScript and then applying something like Minify.
You'll waste some CPU cycles on publishing but you'll save some round trips...
There's a library for localizing JavaScript applications: https://github.com/wikimedia/jquery.i18n
The strings are stored in JSON files, as pretty much everybody else suggests, but it has a few more features:
It can do parameter replacement, supports gender (clever he/she handling), number (clever plural handling, including languages that have more than one plural form), and custom grammar rules that some languages need.
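To give a feel for what parameter replacement and plural handling involve, here is a rough sketch built on the standard Intl.PluralRules API. This is not jquery.i18n's implementation or its message syntax: the messages table, the $1 placeholder and the format function are made up for illustration.

```javascript
// Hypothetical message table: one template per plural category of the language.
const messages = {
  en: {
    files: { one: '$1 file selected', other: '$1 files selected' },
  },
};

// Picks the template matching the plural category of `count` in `locale`
// and substitutes the number for the $1 placeholder.
function format(locale, key, count) {
  const category = new Intl.PluralRules(locale).select(count); // "one", "other", ...
  const forms = messages[locale][key];
  const template = forms[category] || forms.other;
  return template.replace('$1', String(count));
}
```

English only needs the "one" and "other" categories; languages with more plural forms would add entries such as "few" or "many" to the same table, which is the part a library takes off your hands.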
The only requirement is jQuery.