Why is the Alternative symbol of the ECMAScript RegExp grammar left recursive? - javascript

I cannot for the life of me figure out why Alternative is left recursive. It really throws a wrench into my parser.
Alternative ::
    [empty]
    Alternative Term
Here is a note in the semantics portion of the spec that is not exactly clear. Maybe the reasoning would be revealed once I understand this?
NOTE Consecutive Terms try to simultaneously match consecutive portions of the input String. If the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative.
What kind of parser can properly handle a left recursive grammar?

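Because left recursion works much better for certain types of parser, for example the LR-style parsers that yacc generates (see section 6.2 here for an explanation). Such a parser can reduce a left-recursive rule as soon as each Term is read, whereas a right-recursive rule makes its stack grow with the length of the input.
If it's causing trouble for your particular parser, then by all means swap it over to right recursion. That changes the shape of the parse tree, but not the language the grammar defines.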
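For a hand-written recursive descent parser, another common workaround is to turn the left recursion into a loop. The sketch below assumes a toy grammar in which a Term is any single character other than `|` or `)`, which is far simpler than the real spec. The function name and node shapes are made up for illustration.

```javascript
// Toy grammar (an assumption for illustration, not the real ECMAScript rules):
//   Alternative :: [empty] | Alternative Term
//   Term        :: any single character other than '|' or ')'
function parseAlternative(input, pos = 0) {
  // Start from the [empty] Alternative.
  let node = { type: "Empty" };
  // Each iteration applies "Alternative Term" once, so the tree still
  // leans to the left, exactly as the left-recursive rule describes.
  while (pos < input.length && input[pos] !== "|" && input[pos] !== ")") {
    const term = { type: "Term", char: input[pos] };
    node = { type: "Alternative", left: node, term };
    pos += 1;
  }
  return { node, pos };
}

const parsed = parseAlternative("ab|c");
console.log(parsed.pos);            // 2 (stopped at the '|')
console.log(parsed.node.term.char); // "b" (the last Term is outermost)
```

The loop never calls parseAlternative on itself before consuming input, so it cannot recurse forever the way a naive translation of the rule would.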

Related

Improvement of JS Regex to restrict all letters of a word in a specific range

I'm solving the Ranges challenge in RegexGolf, but I'm somewhat stuck in trying to shorten the regex.
Here is a screenshot of the conditions -
My current solution is \b[a-f]+\b. This pattern puts the required range [a-f] between word boundaries. While this works, the regex has 10 characters, and the result list shows submissions with 8, and even 1 character.
Would appreciate any insights on improving this regex.
First, please note that shorter doesn't necessarily mean better, faster or more readable. But as this is a golfing challenge:
This site seems to handle every input as a separate string. While the word boundaries you are using are fine, the start and end of string anchors (^ and $) are 1 character shorter each. I don't see how it could be minimized further, so your regex could be
^[a-f]+$
Note: One of the 1-score solutions carries the comment "i dont know regex but i know javascript", so I'd guess that there was some cheating involved.
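A quick way to see that the two patterns agree when every word is tested as its own string, and where they differ otherwise. The sample words below are made up for illustration and are not the actual RegexGolf lists.

```javascript
const bounded = /\b[a-f]+\b/;  // the original 10-character pattern
const anchored = /^[a-f]+$/;   // the 8-character pattern

// Hypothetical sample words (not the real challenge data).
const words = ["abac", "facade", "beam", "decor"];
for (const word of words) {
  // Both patterns give the same answer for single-word inputs.
  console.log(word, bounded.test(word), anchored.test(word));
}

// They only differ when one string holds several words:
console.log(bounded.test("big bad cat"));  // true  ("bad" matches)
console.log(anchored.test("big bad cat")); // false (whole string must match)
```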

Unicode 'not perpendicular' symbol

According to this source, we have a neat way to show the parallel, not parallel and perpendicular symbols. What I'm looking for, though, is a crossed-out perpendicular symbol, or 'not perpendicular' symbol. I know it's very rare, but its usage can easily be justified from a math point of view, hence my question.
But it doesn't seem to be in the Unicode character list.
How can I add the symbol and render it with HTML and JavaScript?
(Turning my comment into an answer.)
You can use Unicode's combining characters, e.g.:
U+27C2 (PERPENDICULAR) followed by either
U+0338 (COMBINING LONG SOLIDUS OVERLAY):
⟂̸
or U+0337 (COMBINING SHORT SOLIDUS OVERLAY):
⟂̷
Or some other character that you find fitting.
Naturally this will only work with supportive fonts and proper renderers (i.e. browsers in this case), but then again font limitations can always cause problems, with or without combining characters.
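In JavaScript the combined symbol can be written with escape sequences, and in HTML with numeric character references. A minimal sketch follows; the variable names and the toHtmlEntities helper are made up for illustration.

```javascript
// PERPENDICULAR followed by a combining solidus overlay.
const notPerpLong = "\u27C2\u0338";  // with COMBINING LONG SOLIDUS OVERLAY
const notPerpShort = "\u27C2\u0337"; // with COMBINING SHORT SOLIDUS OVERLAY

// Hypothetical helper: turn each code point into an HTML numeric reference.
function toHtmlEntities(text) {
  return [...text]
    .map((ch) => "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";")
    .join("");
}

console.log(notPerpLong, notPerpShort);
console.log(toHtmlEntities(notPerpLong)); // &#x27C2;&#x338;
```

In a browser, assigning one of these strings to an element's textContent should display the symbol, subject to the font support mentioned above.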

Logical Expression Visualizer

I've developed a business rule engine in which users can write rules in boolean syntax.
For example rules are: R1, R2, R3
Sample Expression: (R1 AND R2) OR R3
I want to visualize this expression. For example, a visualization framework might display the expression as a tree view and color its nodes.
Is there any JavaScript or other code framework to achieve this?
(Application is an ASP.NET application)
I can't help but answer this one even though my answer may not help you easily solve your problem. Back in 1998, my very first Javascript project was precisely a boolean expression visualizer.
The code is not available anywhere, so I can't share it. (I doubt even my former employer still has a copy.) And even if it were, it ran on IE4, 5.0 and 5.5; I don't think it was ever updated for IE6, and I don't know if it ran there.
But I can still tell you the basic ideas, and even today, I'm still fairly proud of the results, although I know I would shudder to see the actual code.
Of course a boolean expression can easily be represented by a tree structure. Each non-leaf node was either an AND, an OR, or a NOT node in the tree, and the ANDs and ORs could have multiple children (so I represented "A and B and C and D" as AND(A, B, C, D), not just as combinations of binary ANDs). To display the data, I simply used nested boxes. ANDs ran horizontally, ORs ran vertically, with the keyword "and" or "or" repeated between the blocks. NOT was just a box in a box with the keyword "not" in the outer one.
My leaf nodes were associated with real data scenarios that the user could use for testing, so instead of just "A" and "B", they looked, for instance, like
age < 30
gender = 'F'
income > 40000
The user could enter sample data for the fields age, gender, and income, and the output would change to a red-green display to show whether each block of the expression, and of course the entire expression, was true or false.
The fields to use were configurable, and the test cases were saved for future elaboration.
This was a very fun project, and it helped in communication between business people who were writing rules, and programmers who implemented them, groups who often had very different ideas of how one might use the word "and" in polite company. :-)
But the main point is that one very useful way to visualize a boolean expression is with simple boxes: NOT is a box in a box, with the word "not" in the outer one. ORs are boxes containing vertically grouped boxes with "or" in between, and ANDs are boxes containing horizontally grouped boxes with "and" in between. If you can actually assign truthy/falsey values to your primitives, then green for truthy boxes and red for falsey ones makes for a very compelling display.
...
But you'll have to write your own code. Sorry.
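As a starting point, the nested-box idea described above might be sketched like this. It is not the original 1998 code: the node shapes, helper names, and inline styles are all assumptions made for illustration.

```javascript
// Hypothetical node constructors; AND and OR take any number of children.
const and = (...children) => ({ op: "and", children });
const or = (...children) => ({ op: "or", children });
const not = (child) => ({ op: "not", children: [child] });
const leaf = (label, test) => ({ op: "leaf", label, test });

// Evaluate a node against one set of sample data.
function evaluate(node, data) {
  switch (node.op) {
    case "leaf": return Boolean(node.test(data));
    case "not": return !evaluate(node.children[0], data);
    case "and": return node.children.every((c) => evaluate(c, data));
    case "or": return node.children.some((c) => evaluate(c, data));
    default: throw new Error("Unknown node type: " + node.op);
  }
}

const escapeHtml = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Render nested boxes as an HTML string: green border if true, red if false.
function render(node, data) {
  const color = evaluate(node, data) ? "green" : "red";
  const box = `border:2px solid ${color};padding:4px;margin:4px;`;
  if (node.op === "leaf") {
    return `<div style="${box}">${escapeHtml(node.label)}</div>`;
  }
  if (node.op === "not") {
    return `<div style="${box}">not ${render(node.children[0], data)}</div>`;
  }
  // ANDs run horizontally, ORs run vertically, keyword between the blocks.
  const direction = node.op === "and" ? "row" : "column";
  const inner = node.children
    .map((c) => render(c, data))
    .join(`<span>${node.op}</span>`);
  return `<div style="${box}display:flex;flex-direction:${direction};">${inner}</div>`;
}

const rule = or(
  and(leaf("age < 30", (d) => d.age < 30), leaf("gender = 'F'", (d) => d.gender === "F")),
  leaf("income > 40000", (d) => d.income > 40000)
);
const sample = { age: 25, gender: "M", income: 50000 };
console.log(evaluate(rule, sample)); // true (the income leaf is true)
console.log(render(rule, sample));
```

Returning a string keeps the sketch independent of any page or framework; the result can be assigned to an element's innerHTML, or generated server-side in the ASP.NET application with the same logic.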

"Javascript, The Good Parts": Railroad Diagrams

I'm reading "Javascript, The Good Parts" by Douglas Crockford, and having difficulty understanding the use of all the railroad diagrams. He also doesn't elaborate much on this. He just says the following (on pg. 21):
The rules for interpreting these diagrams are simple:
You start on the left edge and follow the tracks to the right edge.
As you go, you will encounter literals in ovals, and rules or descriptions in rectangles.
Any sequence that can be made by following the tracks is legal.
Any sequence that cannot be made by following the tracks is not legal.
Railroad diagrams with one bar at each end allow whitespace to be inserted between any pair of tokens. Railroad diagrams with two bars at each end do not.
I am aware that this book is considered to be a fundamental read for anyone who's really serious about Javascript, and I would very much like to understand the concepts he's addressing. But something just isn't clicking about the whole railroad diagram thing.
Could anyone explain his use of the railroad diagrams? Examples would be great.
This IBM page probably has the simplest explanation.
The Wikipedia page offers more info in how to construct them.
Railroad diagrams (syntax diagrams, http://en.wikipedia.org/wiki/Syntax_diagram) are a graphical way to explain a grammar. If all you want to do is understand a railroad diagram, understand that you start at the left and follow the line (track). When you encounter a symbol/name, you follow that symbol's own track until it is done, and then come back to where you left off.
It also helps to read about BNF and EBNF (Extended Backus-Naur Form, http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), which is a formal way of describing a language grammar using a set of productions, or rewrite rules. BNF/EBNF work the same way as railroad diagrams, but use symbolic notation, the ::= production symbol, and a more formal/mathematical way to document a grammar.
I am also reading this book. It took me a long time, but I finally understand railroad diagrams.
First, as @ChuckCottrill mentioned, you should have a basic knowledge of syntax diagrams and BNF/EBNF. But even after reading about those, I was still confused until I compared three diagrams for three different situations:
zero or more, zero or one, one or more
To understand their differences (as the following picture shows), the point is
"You start on the left edge and follow the tracks to the right edge."
So imagine you are the train: you can only turn right, you cannot turn left.
The above picture was created with http://bottlecaps.de/rr/
In the "Edit Grammar" tab, input the following grammar:
zeroormore ::= element*
zeroorone ::= element?
oneormore ::= element+
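The same three operators exist in JavaScript regular expressions, which gives a quick way to check which sequences each diagram allows. In this sketch the literal x stands in for element, which is an assumption made for illustration.

```javascript
// "element" is represented by the literal character x.
const zeroOrMore = /^x*$/;
const zeroOrOne = /^x?$/;
const oneOrMore = /^x+$/;

for (const s of ["", "x", "xxx"]) {
  console.log(JSON.stringify(s), zeroOrMore.test(s), zeroOrOne.test(s), oneOrMore.test(s));
}
// ""    true true  false
// "x"   true true  true
// "xxx" true false true
```

A sequence is legal exactly when a train could travel the corresponding diagram from left to right while passing through element that many times.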

String similarity [duplicate]

I'm building a website that should collect various news feeds, and I would like the texts to be compared for similarity. What I need is some sort of news text similarity algorithm.
I know that PHP has the similar_text function, but I'm not sure how good it is, and I need it for JavaScript.
Could anyone point me to an example, a plugin, or any instructions on how this is possible, or at least tell me where to start investigating?
There's a javascript implementation of the Levenshtein distance metric, which is often used for text comparisons. If you want to compare whole articles or headlines though you might be better off looking at intersections between the sets of words that make up the text (and frequencies of those words) rather than just string similarity measures.
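For reference, both ideas fit in a few lines. This is a minimal sketch rather than the implementation linked above. Scoring word overlap with the Jaccard index is one common choice and is an assumption here, as are the function names.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Word-set overlap (Jaccard index): shared words / all distinct words.
function wordOverlap(textA, textB) {
  const words = (t) => new Set(t.toLowerCase().split(/\W+/).filter(Boolean));
  const a = words(textA);
  const b = words(textB);
  const shared = [...a].filter((w) => b.has(w)).length;
  const total = new Set([...a, ...b]).size;
  return total === 0 ? 0 : shared / total;
}

console.log(levenshtein("kitten", "sitting"));          // 3
console.log(wordOverlap("the cat sat", "the cat ran")); // 0.5
```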
The question whether two texts are similar is a philosophical one as long as you don't specify exactly what it should mean. Consider the Strings "house" and "mouse". Seen from a semantic level they are not very similar, but they are very similar regarding their "physical appearance", because only one letter is different (and in this case you could go by Levenshtein distance).
To decide about similarity you need an appropriate text representation. You could, for instance, extract and count all n-grams and compare the two resulting frequency vectors using a similarity measure such as cosine similarity. Or you could remove all stopwords, stem the remaining words to their root form, sum up their occurrences and use this as input for a similarity measure.
There are plenty of approaches and papers on this topic, e.g. this one about short texts. In any case: the higher the abstraction level at which you want to decide whether two texts are similar, the more difficult it will get. I think your question is a non-trivial one (and hence my answer is rather abstract) ... ;-)
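The n-gram variant can be sketched as follows. It uses character n-grams with a default size of 3, which is an arbitrary choice for illustration. Word n-grams work the same way, and the function names are made up.

```javascript
// Count every character n-gram in the text.
function ngramCounts(text, n = 3) {
  const counts = new Map();
  const s = text.toLowerCase();
  for (let i = 0; i + n <= s.length; i++) {
    const gram = s.slice(i, i + n);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Cosine similarity of two frequency vectors stored as Maps.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [gram, count] of a) {
    normA += count * count;
    dot += count * (b.get(gram) || 0);
  }
  for (const count of b.values()) normB += count * count;
  if (normA === 0 || normB === 0) return 0; // no n-grams to compare
  return dot / Math.sqrt(normA * normB);
}

// With bigrams, "house" and "mouse" share 3 of their 4 n-grams each.
console.log(cosineSimilarity(ngramCounts("house", 2), ngramCounts("mouse", 2))); // 0.75
```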
