Find all string variables in javascript files

Find all string variables in javascript files - javascript

i am facing the problem that i have to translate a larger html and javascript project into several languages. The html content was no problem, but the numerous javascript files are problematic, since i was a bit lazy during the development process. For instance, if i needed a message text, i just added it in the concerning position.
My approach now is, that i am using a build-in file search (Eclipse) for every occurrence of " and ', which i am getting line-wise. This would be extremely time consuming and errors are unavoidable.
Here are some examples that occur in the files:
var d = "Datum: " + d.getDate()+"."+(d.getMonth()+1)+"."+d.getFullYear();
showYesNoDialog("heading text","Are you sure?",function(){};
Sometimes i am mixing " and ', sometimes a string goes over several lines:
var list="";
list+='<li data-role="list-divider">';
list+='Text To Translate';
list+='</li>';
Things i don't want to get, are jquery selectors, e.g.:
$("input[name^=checkbox]").each(function () {};
Do you see any time saving method to get all of the strings that i would like to translate?
Regex? A java interpreter? Grep?
I know, that is a bit unusual question.
So any suggestion would be great.
Thanks in advance!

It is better to use some kind of the lexical scanner that converts the code into the tokens and then walk over the list of tokens (or syntax tree). There is a number of such tools (I even created one of them myself - here you can find some of the examples https://github.com/AlexAtNet/spelljs/blob/master/test/scan.js).
With it you can scan the JS file and just iterate over the tokens:
var scan = require('./..../scan.js');
scan(text).forEach(function (item) {
if (item.str) {
console.log(item);
}
});

Related

Javascript Bookmarklet Unresponsive

Javascript newb here. Creating a bookmarklet to automate a simple task at work. Mostly a learning exercise. It will scan a transcript on CNN.com, for instance: (http://transcripts.cnn.com/TRANSCRIPTS/1302/28/acd.01.html). It will grab the lead stories at the top of the page, the name and title of the guests on the show, and format them so that they can be copy pasted into another document.
I've come up with a simple version that includes some jQuery that grabs the subheading and then uses a regular expression to find the names of the guests (it will also exclude everything between (begin videoclip) and (end videoclip), but I haven't gotten that far yet. It then alerts them (will eventually print them in a pop-up window, alert is just for troubleshooting purposes).
I'm using http://benalman.com/code/test/jquery-run-code-bookmarklet/ to create the bookmarklet. My problem is that once the bookmarklet is created it is completely unresponsive. Click on it and nothing happens. I've tried minimizing the code first with no result. My guess is that cnn.com's javascript is conflicting with mine but I'm not sure how to get around that. Or do I need to include some code to load and store the text on the current page? Here's the code (I've included comments, but I took these out when I used the bookmarklet generator.) Thanks for any help!
//Grabs the subheading
var leadStories=$(".cnnTransSubHead").text();
//Scans the webpage for guest name and title. Includes a regular expression to find any
//string that starts with a capital letter, includes a comma, and ends in a colon.
var scanForGuests=/[A-Z ].+,[A-Z0-9 ].+:/g;
//Joins the array created by scanForGuests with a semicolon instead of a comma
var guests=scanForGuests.join(‘; ‘);
//Creates an alert in the proper format including stories and guests.
alert(“Lead Stories: “ + leadStories + “. ” + guests + “. SEE TRANSCRIPT FIELD FOR FULL TRANSCRIPT.“)

Go to the page. Open up developer tools (ctrl+shift+j in chrome) and paste your code in the console to see what's wrong.
The $ in var leadStories = $(".cnnTransSubHead").text(); is from jQuery and the link provided does not have jQuery loaded into the page.
On any modern browser you should be able to achieve the same results without jQuery:
var leadStories = document.getElementsByClassName('cnnTransSubHead')
.map(function(el) { return el.innerText } );
next we have:
var scanForGuests=/[A-Z ].+,[A-Z0-9 ].+:/g;
var guests=scanForGuests.join('; ');
scanForGuests IS a regular expression, you never actually matched it to anything - so .join() is going to throw an error. I'm not exactly sure what you're trying to do. Are you trying to scan the full text of the page for that regex? In that case something like this would be your best bet
document.body.innerText.match(scanForGuests);
keep in mind that while innerText removes html markup, it's far from perfect and what pops up in it is very much at the mercy of how the page's html is structured. That said, on my quick test it seems to work.
Finally, for something like this you should use an immediately invoked function or you're sticking all your variables into the global context.
So putting it all together you get something like this:
(function() {
var leadStories = document.getElementsByClassName('cnnTransSubHead')
.map(function(el) { return el.innerText } );
var scanForGuests=/[A-Z ].+,[A-Z0-9 ].+:/g;
var guests = document.body.innerText.match(scanForGuests).join("; ");
alert("Leads: " + leadStories + " Guests: " + guests);
})();

Javascript creating an HTML object vs creating an HTML string

Pretty simple question that I couldn't find an answer to, maybe because it's a non-issue, but I'm wondering if there is a difference between creating an HTML object using Javascript or using a string to build an element. Like, is it a better practice to declare any HTML elements in JS as JS objects or as strings and let the browser/library/etc parse them? For example:
jQuery('<div />', {'class': 'example'});
vs
jQuery('<div class="example></div>');
(Just using jQuery as an example, but same question applies for vanilla JS as well.)
It seems like a non-issue to me but I'm no JS expert, and I want to make sure I'm doing it right. Thanks in advance!

They're both "correct". And both are useful at different times for different purposes.
For instance, in terms of page-speed, these days it's faster to just do something like:
document.body.innerHTML = "<header>....big string o' html text</footer>";
The browser will spit it out in an instant.
As a matter of safety, when dealing with user-input, it's safer to build elements, attach them to a documentFragment and then append them to the DOM (or replace a DOM node with your new version, or whatever).
Consider:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = "<p>" + userPost + "</p>";
commentList.innerHTML += paragraph;
Versus:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = document.createElement("p");
paragraph.appendChild( document.createTextNode(userPost) );
commentList.appendChild(paragraph);
One does bad things and one doesn't.
Of course, you don't have to create textNodes, you could use innerText or textContent or whatever (the browser will create the text node on its own).
But it's always important to consider what you're sharing and how.
If it's coming from anywhere other than a place you trust (which should be approximately nowhere, unless you're serving static pages, in which case, why are you building html?), then you should keep injection in mind -- only the things you WANT to be injected should be.

Either can be preferable depending on your particular scenario—ie, if everything is hard-coded, option 2 is probably better, as #camus said.
One limitation with the first option though, is that this
$("<div data-foo='X' />", { 'class': 'example' });
will not work. That overload expects a naked tag as the first parameter with no attributes at all.
This was reported here

1/ is better if your attribubes depends on variables set before calling the $ function , dont have to concatenate strings and variables. Aside from that fact ,since you can do both , and it's just some js code somebody else wrote , not a C++ DOM API hardcoded in the browser...

Javascript - get strings inside a string

var str = '<div part="1">
<div>
...
<p class="so">text</p>
...
</div>
</div><span></span>';
I got a long string stored in var str, I need to extract the the strings inside div part="1". Can you help me please?

you could create a DOM element and set its innerHTML to your string.
Then you can iterate through the childNodes and read the attributes you want ;)
example
var str = "<your><html>";
var node = document.createElement("div");
node.innerHTML = str;
for(var i = 0; i < node.childNodes.length; i++){
console.log(node.childNodes[i].getAttribute("part"));
}

If you're using a library like JQuery, this is trivially easy without having to go through the horrors of parsing HTML with regex.
Simply load the string into a JQuery object; then you'll be able to query it using selectors. It's as simple as this:
var so = $(str).find('.so');
to get the class='so' elememnt.
If you want to get all the text in part='1', then it would be this:
var part1 = $(str).find('[part=1]').text();
Similar results can be achieved with Prototype library, or others. Without any library, you can still do the same thing using the DOM, but it'll be much harder work.
Just to clarify why it's a bad idea to do this sort of thing in regex:
Yes, it can be done. It is possible to scan a block of HTML code with regex and find things within the string.
However, the issue is that HTML is too variable -- it is defined as a non-regular language (bear in mind that the 'reg' in 'regex' is for 'regular').
If you know that your HTML structure is always going to look the same, it's relatively easy. However if it's ever going to be possible that the incoming HTML might contain elements or attributes other than the exact ones you're expecting, suddenly writing the regex becomes extremely difficult, because regex is designed for searching in predictable strings. When you factor in the possibility of being given invalid HTML code to parse, the difficulty factor increases even more.
With a lot of effort and good understanding of the more esoteric parts of regex, it can be done, with a reasonable degree of reliability. But it's never going to be perfect -- there's always going to be the possibility of your regex not working if it's fed with something it doesn't expect.
By contrast, parsing it with the DOM is much much simpler -- as demonstrated, with the right libraries, it can be a single line of code (and very easy to read, unlike the horrific regex you'd need to write). It'll also be much more efficient to run, and gives you the ability to do other search operations on the same chunk of HTML, without having to re-parse it all again.

Techniques to avoid building HTML strings in JavaScript?

It is very often I come across a situation in which I want to modify, or even insert whole blocks of HTML into a page using JavaScript. Usually it also involves changing several parts of the HTML dynamically depending on certain parameters.
However, it can make for messy/unreadable code, and it just doesn't seem right to have these little snippets of HTML in my JavaScript code, dammit.
So, what are some of your techniques to avoid mixing HTML and JavaScript?

The Dojo toolkit has a quite useful system to deal with HTML fragments/templates. Let's say the HTML snippet mycode/mysnippet.tpl.html is something like the following
<div>
<span dojoAttachPoint="foo"></span>
</div>
Notice the dojoAttachPoint attribute. You can then make a widget mycode/mysnippet.js using the HTML snippet as its template:
dojo.declare("mycode.mysnippet", [dijit._Widget, dijit._Templated], {
templateString: dojo.cache("mycode", "mysnippet.tpl.html"),
construct: function(bar){
this.bar = bar;
},
buildRendering: function() {
this.inherited(arguments);
this.foo.innerHTML = this.bar;
}
});
The HTML elements given attach point attributes will become class members in the widget code. You can then use the templated widget like so:
new mycode.mysnippet("A cup of tea would restore my normality.").placeAt(someOtherDomElement);
A nice feature is that if you use dojo.cache and Dojo's build system, it will insert the HTML template text into the javascript code, so that the client doesn't have to make a separate request.
This may of course be way too bloated for your use case, but I find it quite useful - and since you asked for techniques, there's mine. Sitepoint has a nice article on it too.

There are many possible techniques. Perhaps the most obvious is to have all elements on the page but have them hidden - then your JS can simply unhide them/show them as required. This may not be possible though for certain situations. What if you need to add a number (unspecified) of duplicate elements (or groups of elements)? Then perhaps have the elements in question hidden and using something like jQuery's clone function insert them as required into the DOM.
Alternatively if you really have to build HTML on the fly then definitely make your own class to handle it so you don't have snippets scattered through your code. You could employ jQuery literal creators to help do this.

I'm not sure if it qualifies as a "technique", but I generally tend to avoid constructing blocks of HTML in JavaScript by simply loading the relevant blocks from the back-end via AJAX and using JavaScript to swap them in and out/place them as required. (i.e.: None of the low-level text shuffling is done in JavaScript - just the DOM manipulation.)
Whilst you of course need to allow for this during the design of the back-end architecture, I can't help but think to leads to a much cleaner set up.

Sometimes I utilise a custom method to return a node structure based on provided JSON argument(s), and add that return value to the DOM as required. It ain't accessible once JS is unavailable like some backend solutions could be.

After reading some of the responses I managed to come up with my own solution using Python/Django and jQuery.
I have the HTML snippet as a Django template:
<div class="marker_info">
<p> _info_ </p>
more info...
</div>
In the view, I use the Django method render_to_string to load the templates as strings stored in a dictionary:
snippets = { 'marker_info': render_to_string('templates/marker_info_snippet.html')}
The good part about this is I can still use the template tags, for example, the url function. I use simplejson to dump it as JSON and pass it into the full template. I still wanted to dynamically replace strings in the JavaScript code, so I wrote a function to replace words surrounded by underscores with my own variables:
function render_snippet(snippet, dict) {
for (var key in dict)
{
var regex = new RegExp('_' + key + '_', 'gi');
snippet = snippet.replace(regex, dict[key]);
}
return snippet;
}

JavaScript multiline strings and templating?

I have been wondering if there is a way to define multiline strings in JavaScript like you can do in languages like PHP:
var str = "here
goes
another
line";
Apparently this breaks up the parser. I found that placing a backslash \ in front of the line feed solves the problem:
var str = "here\
goes\
another\
line";
Or I could just close and reopen the string quotes again and again.
The reason why I am asking because I am making JavaScript based UI widgets that utilize HTML templates written in JavaScript. It is painful to type HTML in strings especially if you need to open and close quotes all the time. What would be a good way to define HTML templates within JavaScript?
I am considering using separate HTML files and a compilation system to make everything easier, but the library is distributed among other developers so that HTML templates have to be easy to include for the developers.

No thats basically what you have to do to do multiline strings.
But why define the templates in javascript anwyay? why not just put them into a file and have a ajax call load them up in a variable when you need them?
For instantce (using jquery)
$.get('/path/to/template.html', function(data) {
alert(data); //will alert the template code
});

#slebetman, Thanks for the detailed example.
Quick comment on the substitute_strings function.
I had to revise
str.replace(n,substitutions[n]);
to be
str = str.replace(n,substitutions[n]);
to get it to work. (jQuery version 1.5? - it is pure javascript though.)
Also when I had below situation in my template:
$CONTENT$ repeated twice $CONTENT$ like this
I had to do additional processing to get it to work.
str = str.replace(new RegExp(n, 'g'), substitutions[n]);
And I had to refrain from $ (regex special char) as the delimiter and used # instead.
Thought I would share my findings.

There are several templating systems in javascript. However, my personal favorite is one I developed myself using ajax to fetch XML templates. The templates are XML files which makes it easy to embed HTML cleanly and it looks something like this:
<title>This is optional</title>
<body><![CDATA[
HTML content goes here, the CDATA block prevents XML errors
when using non-xhtml html.
<div id="more">
$CONTENT$ may be substituted using replace() before being
inserted into $DOCUMENT$.
</div>
]]></body>
<script><![CDATA[
/* javascript code to be evaled after template
* is inserted into document. This is to get around
* the fact that this templating system does not
* have its own turing complete programming language.
* Here's an example use:
*/
if ($HIDE_MORE$) {
document.getElementById('more').display = 'none';
}
]]></script>
And the javascript code to process the template goes something like this:
function insertTemplate (url_to_template, insertion_point, substitutions) {
// Ajax call depends on the library you're using, this is my own style:
ajax(url_to_template, function (request) {
var xml = request.responseXML;
var title = xml.getElementsByTagName('title');
if (title) {
insertion_point.innerHTML += substitute_strings(title[0],substitutions);
}
var body = xml.getElementsByTagName('body');
if (body) {
insertion_point.innerHTML += substitute_strings(body[0],substitutions);
}
var script = xml.getElementsByTagName('script');
if (script) {
eval(substitute_strings(script[0],substitutions));
}
});
}
function substitute_strings (str, substitutions) {
for (var n in substitutions) {
str.replace(n,substitutions[n]);
}
return str;
}
The way to call the template would be:
insertTemplate('http://path.to.my.template', myDiv, {
'$CONTENT$' : "The template's content",
'$DOCUMENT$' : "the document",
'$HIDE_MORE$' : 0
});
The $ sign for substituted strings is merely a convention, you may use % of # or whatever delimiters you prefer. It's just there to make the part to be substituted unambiguous.
One big advantage to using substitutions on the javascript side instead of server side processing of the template is that this allows the template to be plain static files. The advantage of that (other than not having to write server side code) is that you can then set the caching policy for the template to be very aggressive so that the browser only needs to fetch the template the first time you load it. Subsequent use of the template would come from cache and would be very fast.
Also, this is a very simple example of the implementation to illustrate the mechanism. It's not what I'm using. You can modify this further to do things like multiple substitution, better handling of script block, handle multiple content blocks by using a for loop instead of just using the first element returned, properly handling HTML entities etc.
The reason I really like this is that the HTML is simply HTML in a plain text file. This avoids quoting hell and horrible string concatenation performance issues that you'll usually find if you directly embed HTML strings in javascript.

I think I found a solution I like.
I will store templates in files and fetch them using AJAX. This works for development stage only. For production stage, the developer has to run a compiler once that compiles all templates with the source files. It also compiles JavaScript and CSS to be more compact and it compiles them to a single file.
The biggest problem now is how to educate other developers doing that. I need to build it so that it is easy to do and understand why and what are they doing.

You could also use \n to generate newlines. The html would however be on a single line and difficult to edit. But if you generate the JS using PHP or something it might be an alternative

We Keep Coding

JavaScript is the programming language of the Web.