I have essentially the same question as PEG for Python style indentation, but I'd like to get a little more direction regarding this answer.
The answer successfully generates an array of strings that are each line of input with 'INDENT' and 'DEDENT' between lines. It seems like he's pretty much used PEG.js to tokenize, but no real parsing is happening.
So how can I extend his example to do some actual parsing?
As an example, how can I change this grammar:
start = obj
obj = id:id children:(indent obj* outdent)?
{
if (children) {
let o = {}; o[id] = children[1];
return o;
} else {
return id;
}
}
id = [a-z]
indent = '{'
outdent = '}'
to use indentation instead of braces to delineate blocks, and still get the same output?
(Use http://pegjs.majda.cz/online to test that grammar with the following input: a{bcd{zyx{}}})
Parser:
// do not use result cache, nor line and column tracking
{ var indentStack = [], indent = ""; }
start
= INDENT? l:line
{ return l; }
line
= SAMEDENT line:(!EOL c:. { return c; })+ EOL?
children:( INDENT c:line* DEDENT { return c; })?
{ var o = {}; o[line] = children; return children ? o : line.join(""); }
EOL
= "\r\n" / "\n" / "\r"
SAMEDENT
= i:[ \t]* &{ return i.join("") === indent; }
INDENT
= &(i:[ \t]+ &{ return i.length > indent.length; }
{ indentStack.push(indent); indent = i.join(""); pos = offset; })
DEDENT
= { indent = indentStack.pop(); }
Input:
a
b
c
d
z
y
x
Output:
{
"a": [
"b",
"c",
{
"d": [
"z",
"y",
"x"
]
}
]
}
It cannot parse an empty object (the last x); however, that should be easy to fix. The trick here is the SAMEDENT rule, which succeeds only when the indentation level hasn't changed. INDENT and DEDENT change the current indentation level without changing the position in the text (that is what pos = offset is for).
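For readers who want to see the indentation-stack idea outside of a grammar, here is a minimal hand-written sketch in plain JavaScript. It is not part of the grammar above: parseIndented and its internals are hypothetical names, and it assumes one name per line and that a dedent always returns to an indentation that is still open.

```javascript
// Hypothetical helper mirroring what SAMEDENT / INDENT / DEDENT do in the
// grammar: a stack remembers every indentation level that is still open.
function parseIndented(text) {
  const root = [];
  // Each entry holds an indentation string and the list collecting its nodes.
  const stack = [{ indent: null, items: root }];
  for (const raw of text.split(/\r\n|\n|\r/)) {
    if (raw.trim() === "") continue;              // skip blank lines
    const indent = raw.match(/^[ \t]*/)[0];
    const name = raw.slice(indent.length);
    const top = stack[stack.length - 1];
    if (top.indent === null) {
      top.indent = indent;                        // first line fixes the base level
    } else if (indent.length > top.indent.length) {
      // INDENT: the previous name becomes the key of a new block
      const parent = top.items[top.items.length - 1];
      const children = [];
      top.items[top.items.length - 1] = { [parent]: children };
      stack.push({ indent, items: children });
    } else {
      // SAMEDENT needs no work; DEDENT pops until an open level matches
      while (stack.length > 1 && stack[stack.length - 1].indent !== indent) {
        stack.pop();
      }
      if (stack[stack.length - 1].indent !== indent) {
        throw new Error("Inconsistent dedent at: " + JSON.stringify(raw));
      }
    }
    stack[stack.length - 1].items.push(name);
  }
  return root.length === 1 ? root[0] : root;
}

console.log(JSON.stringify(parseIndented("a\n  b\n  c\n  d\n    z\n    y\n    x")));
```

The stack serves the same purpose as indentStack in the grammar: pushing on a deeper line and popping on a shallower one is what turns flat lines into nested blocks.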
Update 2021
Here is a working example which runs in the online playground of Peggy.js. Peggy.js is a fork of PEG.js under active development. PEG.js was discontinued by David Majda.
The example shows how the INDENT, SAMEDENT and DEDENT rules are parsed, and how to use parsing locations. Check the console log.
It uses these syntaxes, which may not be known from other parser generators:
(top of file)
{{...}} (Global initializer) – Run ... on parser generation.
{...} (Per-parse initializer) – Run ... on parser instantiation.
(in-file)
X {...} (action) – Do ... when X succeeds. Variables from the initializers are available. If ... returns something, it will replace what X returns.
$X – Return the raw text parsed with X, instead of the result of X.
... @X ... (pluck operator) – Replace the result of ... X ... with the result of X.
X &{...} (predicate) – "and ... also needs to be true for X to succeed".
X = &(...) – If ... succeeds, X succeeds. ... consumes no input.
See the docs for more information.
{{
console.clear()
console.log('Parser generated')
}}
{
let indentstack = []
let indent = ''
function found (what) {
let loc = location()
console.log(`[${loc.start.line}:${loc.start.column} - ${loc.end.line}:${loc.end.column}] found ${what}`)
}
console.log('Parser instantiated')
}
DOCUMENT = NEWLINES? @THINGS NEWLINES? _
THINGS = ( SAMEDENT @( OBJECT / LINE ) )*
OBJECT = key:KEY childs:(BLOCK / INLINE) {
found(`object "${key}"`)
let o = {}
o[key] = childs
return o
}
KEY = @$( [^ \t\r\n:]+ ) _ ':' _
BLOCK = NEWLINES INDENT @THINGS DEDENT
INLINE = line:LINE { return [line] }
LINE = text:$( (!EOL .)+ ) NEWLINES? {
found(`line "${text}"`)
return text
}
INDENT = &(
spaces:$( [ \t]+ ) &{
return spaces.length > indent.length
} {
indentstack.push(indent)
indent = spaces
}
) {
found('indent')
}
SAMEDENT = spaces:$( [ \t]* ) &{
return spaces === indent
} {
found('samedent')
}
/* Because of this rule, results cache must be disabled */
DEDENT = &{
indent = indentstack.pop()
return true
} {
found('dedent')
}
_ = [ \t]*
EOL = '\r\n' / '\n' / '\r'
NEWLINES = (_ EOL)+
/* Test with this input
H:
a
b
c
G:
d
e
f
*/
Old Answer
Here is a fix for @Jakub Kulhan's grammar which works in PEG.js v0.10.0. The last line needs to be changed to = &{ indent = indentStack.pop(); return true;} because PEG.js no longer allows standalone actions ({...}) in a grammar. This line is now a predicate (&{...}) which always succeeds (return true;).
I also removed the pos = offset; because it gives the error offset is not defined. Jakub was probably referring to a global variable available in older versions of PEG.js. PEG.js now provides the location() function, which returns an object containing offset and other information.
// do not use result cache, nor line and column tracking
{ var indentStack = [], indent = ""; }
start
= INDENT? l:line
{ return l; }
line
= SAMEDENT line:(!EOL c:. { return c; })+ EOL?
children:( INDENT c:line* DEDENT { return c; })?
{ var o = {}; o[line] = children; return children ? o : line.join(""); }
EOL
= "\r\n" / "\n" / "\r"
SAMEDENT
= i:[ \t]* &{ return i.join("") === indent; }
INDENT
= &(i:[ \t]+ &{ return i.length > indent.length; }
{ indentStack.push(indent); indent = i.join(""); })
DEDENT
= &{ indent = indentStack.pop(); return true;}
Starting with v0.11.0, PEG.js also supports the value plucking operator, @, which would make this grammar even simpler, but as it is currently not in the online parser I will refrain from adding it to this example.
Related
I'm trying to convert some math related strings containing absolute values, using Regex in Javascript.
I would like to convert all occurrences of |foo| to abs(foo).
How can I detect if the character is opening or closing, given that they could also be nested?
Basically I would like to convert all occurrences of opening | to abs( and all closing | to ). Whatever is between the vertical bars is unchanged.
Some examples of possible input and desired output:
|x|+12 abs(x)+12
|x|+12+|x+2| abs(x)+12+abs(x+2)
|x|+|x+|z|| abs(x)+abs(x+abs(z))
Any ideas?
There are regex dialects that support nesting; JavaScript's is not one of them. You can, however, do this in steps:
tag the |s with nesting level (+1, -1, as you go from left to right)
identify start and end | of same level from left to right based on tags, and from lowest level to highest level
clean up left over tags in case of unbalanced input
Functional code with test cases up to 3 levels (the code works to any level) :
function fixAbs(str) {
const startTag = '{{s%L%}}';
const endTag = '{{e%L%}}';
const absRegex = /\{\{s(\d+)\}\}(.*?)\{\{e\1\}\}/g;
let level = 0;
str = str
.replace(/ /g, '') // remove all spaces
.replace(/(\|*)?(\w+)(\|*)?/g, function(m, c1, c2, c3) {
// regex matches variables with all leading and trailing `|`s
let s = c2;
if(c1) {
// add a start tag to each leading `|`: `{{s0}}`, `{{s1}}`, ...
// and post-increase level
s = '';
for(let i = 0; i < c1.length; i++) {
s += startTag.replace(/%L%/, level++);
}
s += c2;
}
if(c3) {
// decrease level,
// and add a end tag to each trailing `|`: `{{e2}}`, `{{e1}}`, ...
for(let i = 0; i < c3.length; i++) {
s += endTag.replace(/%L%/, --level);
}
}
return s;
});
// find matching start and end tag from left to right,
// repeat for each level
while(str.match(absRegex)) {
str = str.replace(absRegex, function(m, c1, c2, c3) {
return 'abs(' + c2 + ')';
});
}
// clean up tags in case of unbalanced input
str = str.replace(/\{\{[se]-?\d+\}\}/g, '|');
return str;
}
const testCases = [
'|x|+12',
'|x|+|y+|z||',
'|x|+||y|+z|',
'|x|+|x+|y|+z|',
'|x|+|x+|y+|t||+z|',
'|x|+12+|2+x|',
'|x|+12+|x+2|'
];
testCases.forEach(str => {
let result = fixAbs(str);
console.log('"' + str + '" ==> "' + result + '"');
});
Output:
"|x|+12" ==> "abs(x)+12"
"|x|+|y+|z||" ==> "abs(x)+abs(y+abs(z))"
"|x|+||y|+z|" ==> "abs(x)+abs(abs(y)+z)"
"|x|+|x+|y|+z|" ==> "abs(x)+abs(x+abs(y)+z)"
"|x|+|x+|y+|t||+z|" ==> "abs(x)+abs(x+abs(y+abs(t))+z)"
"|x|+12+|2+x|" ==> "abs(x)+12+abs(2+x)"
"|x|+12+|x+2|" ==> "abs(x)+12+abs(x+2)"
Code is annotated with comments for clarity.
This is based on a TWiki blog at https://twiki.org/cgi-bin/view/Blog/BlogEntry201109x3
Write a function that reverses characters in (possibly nested) parentheses in the input string.
Input strings will always be well-formed with matching ()s.
For inputString = "(bar)", the output should be
reverseInParentheses(inputString) = "rab";
For inputString = "foo(bar)baz", the output should be
reverseInParentheses(inputString) = "foorabbaz";
For inputString = "foo(bar(baz))blim", the output should be
reverseInParentheses(inputString) = "foobazrabblim".
[input] string inputString
A string consisting of lowercase English letters and the characters ( and ). It is guaranteed that all parentheses in inputString form a regular bracket sequence.
Guaranteed constraints:
0 ≤ inputString.length ≤ 50.
[output] string
Return inputString, with all the characters that were in parentheses reversed.
My Solution
JavaScript
function reverseInParentheses(inputString) {
let arr = inputString
let start = arr.indexOf(')') < arr.lastIndexOf('(') ? arr.indexOf('(') : arr.lastIndexOf('(')
let end = arr.indexOf(')')
let temp = arr.substring(start + 1, end)
if(start !== -1 && end !== -1){
return reverseInParentheses(arr.substring(0, start) +
[...temp].reverse().join('') +
arr.substring(end + 1))
}
return arr
}
Problem
I am passing all cases except for final hidden case, no runtime or execution time limit error is being returned. So I am having trouble figuring out what scenario is causing the fail. I really want to use my own solution instead of copying the regex ones and in my mind this solution should work, perhaps a more experienced mind can show my folly. Thanks in advance.
The problem is that your calculation of start and end really doesn't work. And there's no simple fix to this problem.
The comment from Jonas Wilms suggests trying '((see)(you))'. For this test case, you will get start and end like this:
          0    5
          ((see)(you))
          ^    ^
start ----'    '---- end
Note that the start and end are not an actual pair here. There's another '(' in between.
You can fix this up by doing a more sophisticated calculation of these values, by iterating through the characters, updating start every time you hit a '(' and updating end when you hit a ')', then stopping.
That might look like this:
function reverseInParentheses(inputString) {
let arr = inputString
let i = 0, start = 0, end = -1
while (end < start && i < arr.length) {
if (arr[i] == '(') {start = i}
if (arr[i] == ')') {end = i}
i++
}
let temp = arr.substring(start + 1, end)
if(start !== -1 && end !== -1){
return reverseInParentheses(arr.substring(0, start) +
[...temp].reverse().join('') +
arr.substring(end + 1))
}
return arr
}
console .log (reverseInParentheses('(bar)'))
console .log (reverseInParentheses('foo(bar)baz'))
console .log (reverseInParentheses('foo(bar(baz))blim'))
console .log (reverseInParentheses('((see)(you))'))
I don't particularly like this, combining the iteration to find the parentheses with recursion to keep reapplying the function until there are none left. It feels awkward.
There are other solutions, as you noted. One would be to use regular expressions. Note that the language of balanced parentheses is not a regular language, and hence cannot be captured by any one regular expression, but you can repeatedly apply regular expression operations in an iteration or a recursion to get this to work. Here is one version of that.
const rev = ([...cs]) => cs.reverse().join('')
const reverseInParentheses = (s) =>
/\(([^)]*)\)/ .test (s)
? reverseInParentheses (s .replace(/(.*)\(([^)]*)\)(.*)/, (_, a, b, c) => a + rev(b) + c))
: s
console .log (reverseInParentheses('(bar)'))
console .log (reverseInParentheses('foo(bar)baz'))
console .log (reverseInParentheses('foo(bar(baz))blim'))
console .log (reverseInParentheses('((see)(you))'))
Briefly, this finds innermost pairs of parentheses, replaces them with the reversal of their content, then recurs on the result, bottoming out when there are no more pairs found.
This solution was thrown together, and there are probably better regular expression operations available.
But I actually prefer a different approach altogether, treating the characters of the string as events for a simple state machine, with a stack of nested parenthesized substrings. Here is what I wrote:
const reverseInParentheses = ([c, ...cs], res = ['']) =>
c == undefined
? res [0]
: c == '('
? reverseInParentheses (cs, [...res, ''])
: c == ')'
? reverseInParentheses (cs, [...res.slice(0, -2), res[res.length - 2] + [...res[res.length - 1]].reverse().join('')])
: reverseInParentheses (cs, [...res.slice(0, -1), res[res.length - 1] + c])
console .log (reverseInParentheses('(bar)'))
console .log (reverseInParentheses('foo(bar)baz'))
console .log (reverseInParentheses('foo(bar(baz))blim'))
console .log (reverseInParentheses('((see)(you))'))
We can examine the behavior by adding this as the first line of the body expression:
console .log (`c: ${c ? `"${c}"` : '< >'}, cs: "${cs.join('')}", res: ["${res.join('", "')}"]`) ||
For '((see)(you))', we would get something like this:
curr (c)   remaining (cs)   stack (res)
"("        "(see)(you))"    [""]
"("        "see)(you))"     ["", ""]
"s"        "ee)(you))"      ["", "", ""]
"e"        "e)(you))"       ["", "", "s"]
"e"        ")(you))"        ["", "", "se"]
")"        "(you))"         ["", "", "see"]
"("        "you))"          ["", "ees"]
"y"        "ou))"           ["", "ees", ""]
"o"        "u))"            ["", "ees", "y"]
"u"        "))"             ["", "ees", "yo"]
")"        ")"              ["", "ees", "you"]
")"        ""               ["", "eesuoy"]
< >        < >              ["yousee"]
I chose to process this state machine recursively, because I prefer working with immutable data, not reassigning variables, etc. But this technique should work equally well with an iterative approach.
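As a rough illustration of that iterative variant, here is a minimal sketch of the same state machine written as a loop. The name reverseInParenthesesIter is made up for this example, and unlike the recursive version it mutates its stack in place.

```javascript
// Iterative counterpart of the recursive state machine: `stack` plays the
// role of `res`, with one entry per currently open parenthesis level.
function reverseInParenthesesIter(inputString) {
  const stack = [''];
  for (const c of inputString) {
    if (c === '(') {
      stack.push('');                       // open a new nested level
    } else if (c === ')') {
      const inner = stack.pop();            // close it and reverse its content
      stack[stack.length - 1] += [...inner].reverse().join('');
    } else {
      stack[stack.length - 1] += c;         // ordinary character
    }
  }
  return stack[0];
}

console.log(reverseInParenthesesIter('foo(bar(baz))blim')); // "foobazrabblim"
console.log(reverseInParenthesesIter('((see)(you))'));      // "yousee"
```

Like the recursive version, it assumes well-formed input, which the problem statement guarantees.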
String reverseInParentheses(String inputString) {
//recursion
int start = -1;
int end = -1 ;
for(int i = 0; i < inputString.length(); i++){
if(inputString.charAt(i) == '('){
start = i;
}
if(inputString.charAt(i) == ')'){
end = i;
String reverse = new StringBuilder(inputString.substring(start+1, end)).reverse().toString();
return reverseInParentheses(inputString.substring(0, start) + reverse+ inputString.substring(end+1));
}
}
return inputString;
}
function solution(inputString) {
let s;
let e = 0;
while (e < inputString.length) {
//if we saw a ')', we mark the index as e, then we go back the
//nearest '(', and mark the index as s
if (inputString[e] === ')') {
s = e;
while (inputString[s] !== '(') {
s--;
}
//get the string in the parenthesis
let beforeRevert = inputString.slice(s + 1, e);
//revert it
let reversed = beforeRevert.split('').reverse().join('');
//put pieces together to get a new inputString
inputString = inputString.slice(0, s) + reversed +
inputString.slice(e + 1, inputString.length)
//because we get rid of the '(' and ')', now we are at index e-1 of
//new inputString
e--;
} else {
e++;
}
}
return inputString;
}
You could try this. It worked for me.
function solution(s) {
while (true) {
let c = s.indexOf(")");
if (c === -1) {
break;
}
let o = s.substring(0, c).lastIndexOf("(");
let start = s.substring(0, o);
let middle = s.substring(o + 1, c).split("").reverse().join("");
let end = s.substring(c + 1, s.length);
s = start + middle + end;
}
return s;
}
My string has two parts separated by /.
The part to the left of the slash may be any string, as long as it does not end with "HAHAHA".
The part to the right of the slash may be any string, and it is allowed to end with "HAHAHA".
I want to do this using only a regular expression and the match function, which should return the matched parts.
For example:
Accept : fooo/baarHAHAHA
Reject : fooHAHAHA/baaar
If the string has only one part, for example baarHAHAHA, it should be accepted, with a result like this:
string: baarHAHAHA
Group1: empty
Group2: baarHAHAHA
Any ideas?
You can try
^(\w*?)(?<!HAHAHA)\/?(\w+)$
Explanation of the above regex:
^, $ - Represents start and end of the line respectively.
(\w*?) - Represents first capturing group capturing the word characters([a-zA-Z0-9_]) zero or more times lazily.
(?<!HAHAHA) - Represents a negative look-behind not matching if the first captured group contains HAHAHA at the end.
\/? - Matches / literally zero or one time.
(\w+) - Represents second capturing group matching word characters([0-9a-zA-Z_]) one or more times.
You can find the demo of the above regex in here.
const regex = /^(\w*?)(?<!HAHAHA)\/?(\w+)$/gm;
const str = `
fooo/baarHAHAHA
fooHAHAHA/baaar
/baar
barHAHAHA
`;
let m;
let resultString = "";
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
if(m[1] === "")resultString = resultString.concat(`GROUP 1: empty\nGROUP 2: ${m[2]}\n`);
else resultString = resultString.concat(`GROUP 1: ${m[1]}\nGROUP 2: ${m[2]}\n`);
}
console.log(resultString);
You don't need a regex for this, which also spares you the overhead of running one. A simple string.split() is enough to separate the parts; then you can check whether the first part ends with "HAHAHA" using the string.endsWith() method.
const a = 'fooHAHAHA/bar';
const b = 'foo/bar';
const c = 'fooHAHAHA';
console.log(a.split('/')); // Array [ 'fooHAHAHA', 'bar' ]
console.log(b.split('/')); // Array [ 'foo', 'bar' ]
console.log(c.split('/')); // Array [ 'fooHAHAHA' ]
// therefore ...
function splitMyString(str) {
const strSplit = str.split('/');
if (strSplit.length > 1) {
if (strSplit[0].endsWith('HAHAHA')) {
return ''; // or whatever you want to do if it gets rejected ...
}
}
return str;
}
console.log('a: ', splitMyString(a)); // ''
console.log('b: ', splitMyString(b)); // foo/bar
console.log('c: ', splitMyString(c)); // fooHAHAHA
Alternative non-regex solution:
const a = 'fooHAHAHA/bar';
const b = 'foo/bar';
const c = 'fooHAHAHA';
function splitMyString(str) {
const separator = str.indexOf('/');
if (separator !== -1) {
const firstPart = str.substring(0, separator);
if (firstPart.endsWith('HAHAHA')) {
return ''; // or whatever you want to do if it gets rejected ...
}
}
return str;
}
console.log('a: ', splitMyString(a)); // ''
console.log('b: ', splitMyString(b)); // foo/bar
console.log('c: ', splitMyString(c)); // fooHAHAHA
var str, re;
function match(rgx, str) {
this.str = str;
this.patt = rgx
var R = [], r;
while ((r = rgx.exec(str)) !== null) { // use the rgx parameter, not the global re
R.push({
"match": r[0],
"groups": r.slice(1)
})
}
return R;
}
str = `
fooo/baarHAHAHA
fooHAHAHA/baaar
/baar
barHAHAHA
barr/bhHAHAHA
`;
re = /(?<=\s|^)(.*?)\/(.*?HAHAHA)(?=\s)/g;
console.log(match(re, str))
Reference:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
Edit: When I wrote this code, my idea was to let the caller read str back from the result, together with the matches and groups. But if the function sets this.str = str and also returns a value, the this.str assignment is lost.
I am trying to generate a syntax tree, for a given string with simple math operators (+, -, *, /, and parenthesis).
Given the string "1 + 2 * 3":
It should return an array like this:
["+",
[1,
["*",
[2,3]
]
]
]
I wrote a function that transforms "1 + 2 * 3" into [1,"+",2,"*",3].
The problem is that I have no idea how to give priority to certain operations.
My code is:
function isNumber(ch){
switch (ch) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
case '.':
return true;
break;
default:
return false;
break;
}
}
function generateSyntaxTree(text){
if (typeof text != 'string') return [];
var code = text.replace(new RegExp("[ \t\r\n\v\f]", "gm"), "");
var codeArray = [];
var syntaxTree = [];
// Put it in its own scope
(function(){
var lastPos = 0;
var wasNum = false;
for (var i = 0; i < code.length; i++) {
var cChar = code[i];
if (isNumber(cChar)) {
if (!wasNum) {
if (i != 0) {
codeArray.push(code.slice(lastPos, i));
}
lastPos = i;
wasNum = true;
}
} else {
if (wasNum) {
var n = Number(code.slice(lastPos, i));
if (isNaN(n)) {
throw new Error("Invalid Number");
return [];
} else {
codeArray.push(n);
}
wasNum = false;
lastPos = i;
}
}
}
if (wasNum) {
var n = Number(code.slice(lastPos, code.length));
if (isNaN(n)) {
throw new Error("Invalid Number");
return [];
} else {
codeArray.push(n);
}
}
})();
// At this moment, codeArray = [1,"+",2,"*",3]
return syntaxTree;
}
alert('Returned: ' + generateSyntaxTree("1 + 2 * 3"));
The way to write a top-down parser, if you are not using flex/bison or a similar package, is to first write a tokenizer that reads the input and serves tokens.
Basically, you need a tokenizer that provides getNextToken, peekNextToken and skipNextToken.
Then you work your way down using this structure.
// parser.js
var input, currToken, pos;
var TOK_OPERATOR = 1;
var TOK_NUMBER = 2;
var TOK_EOF = 3;
function nextToken() {
var c, tok = {};
while(pos < input.length) {
c = input.charAt(pos++);
switch(c) {
case '+':
case '-':
case '*':
case '/':
case '(':
case ')':
tok.op = c;
tok.type = TOK_OPERATOR;
return tok;
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
tok.value = c;
tok.type = TOK_NUMBER;
return tok;
default:
throw "Unexpected character: " + c;
}
}
tok.type = TOK_EOF;
return tok;
}
function getNextToken() {
var ret;
if(currToken)
ret = currToken;
else
ret = nextToken();
currToken = undefined;
return ret;
}
function peekNextToken() {
if(!currToken)
currToken = nextToken();
return currToken;
}
function skipNextToken() {
if(!currToken)
currToken = nextToken();
currToken = undefined;
}
function parseString(str) {
input = str;
pos = 0;
return expression();
}
function expression() {
return additiveExpression();
}
function additiveExpression() {
var left = multiplicativeExpression();
var tok = peekNextToken();
while(tok.type == TOK_OPERATOR && (tok.op == '+' || tok.op == '-') ) {
skipNextToken();
var node = {};
node.op = tok.op;
node.left = left;
node.right = multiplicativeExpression();
left = node;
tok = peekNextToken();
}
return left;
}
function multiplicativeExpression() {
var left = primaryExpression();
var tok = peekNextToken();
while(tok.type == TOK_OPERATOR && (tok.op == '*' || tok.op == '/') ) {
skipNextToken();
var node = {};
node.op = tok.op;
node.left = left;
node.right = primaryExpression();
left = node;
tok = peekNextToken();
}
return left;
}
function primaryExpression() {
var tok = peekNextToken();
if(tok.type == TOK_NUMBER) {
skipNextToken();
node = {};
node.value = tok.value;
return node;
}
else
if(tok.type == TOK_OPERATOR && tok.op == '(') {
skipNextToken();
var node = expression(); // The beauty of recursion
tok = getNextToken();
if(tok.type != TOK_OPERATOR || tok.op != ')')
throw "Error ) expected";
return node
}
else
throw "Error: unexpected token " + JSON.stringify(tok);
}
As you can see, you start by requesting the lowest-precedence operation, which requires the next-higher-precedence operation as its left and right terms, and so on. Unary operators have a slightly different structure. The neat thing is the recursion at the end, when a parenthesis is encountered.
Here is a demo page that uses the parser and renders the parse tree (I had the code for it lying around...)
<html>
<head>
<title>tree</title>
<script src="parser.js"></script>
</head>
<body onload="testParser()">
<script>
function createTreeNode(x, y, val, color) {
var node = document.createElement("div");
node.style.position = "absolute";
node.style.left = "" + x;
node.style.top = "" + y;
node.style.border= "solid";
node.style.borderWidth= 1;
node.style.backgroundColor= color;
node.appendChild(document.createTextNode(val));
return node;
};
var yStep = 24;
var width = 800;
var height = 600;
var RED = "#ffc0c0";
var BLUE = "#c0c0ff";
container = document.createElement("div");
container.style.width = width;
container.style.height = height;
container.style.border = "solid";
document.body.appendChild(container);
var svgNS = "http://www.w3.org/2000/svg";
function renderLink(x1, y1, x2, y2)
{
var left = Math.min(x1,x2);
var top = Math.min(y1,y2);
var width = 1+Math.abs(x2-x1);
var height = 1+Math.abs(y2-y1);
var svg = document.createElementNS(svgNS, "svg");
svg.setAttribute("x", left);
svg.setAttribute("y", top);
svg.setAttribute("width", width );
svg.setAttribute("height", height );
var line = document.createElementNS(svgNS,"line");
line.setAttribute("x1", (x1 - left) );
line.setAttribute("x2", (x2 - left) );
line.setAttribute("y1", (y1 - top) );
line.setAttribute("y2", (y2 - top) );
line.setAttribute("stroke-width", "1");
line.setAttribute("stroke", "black");
svg.appendChild(line);
var div = document.createElement("div");
div.style.position = "absolute";
div.style.left = left;
div.style.top = top;
div.style.width = width;
div.style.height = height;
div.appendChild(svg);
container.appendChild(div);
}
function getHeight(dom) {
var h = dom.offsetHeight;
return h;
}
function getWidth(dom) {
var w = dom.offsetWidth;
return w;
}
function renderTree(x, y, node, width, height)
{
if(height < 1.5*yStep)
height = 1.5*yStep;
var val;
if(node.op) {
val = node.op;
color = BLUE;
}
else
if(node.value) {
val = node.value;
color = RED;
}
else
val = "?";
var dom = createTreeNode(x, y, val, color);
container.appendChild(dom);
var w = getWidth(dom);
var h = getHeight(dom);
var nx, ny;
var child;
if(node.left) {
nx = x - width/2;
ny = y+height;
var child = renderTree(nx, ny, node.left, width/2, height/2);
renderLink(x+w/2, y+h, nx+getWidth(child)/2, ny);
}
if(node.right) {
nx = x + width/2;
ny = y+height;
child = renderTree(nx, ny, node.right, width/2, height/2);
renderLink(x+w/2, y+h, nx+getWidth(child)/2, ny);
}
return dom;
}
var root;
function testParser()
{
var str = "1+2*5-5*(9+2)";
var exp = document.createElement("div");
exp.appendChild(document.createTextNode(str));
container.appendChild(exp);
var tree = parseString(str);
renderTree(width/2, 20, tree, width/2, 4*yStep);
}
</script>
</body>
</html>
The thing to do is to use a parser generator such as Bison (typically paired with the flex lexer generator) or ANTLR; a Google search will find one for your language.
But if you are doing this for fun or to learn how parsers work, look up "recursive descent parser" on Wikipedia.
A simple recursive descent parser can be easily made for simple expressions like this. You can define the grammar as:
<expression> ::= <term> | <term> <add_op> <expression>
<term> ::= <factor> | <factor> <mul_op> <term>
<factor> ::= ( <expression> ) | <number>
<add_op> ::= + | -
<mul_op> ::= * | /
Notice that by making the rule for <term> contain the rule for <factor> this grammar makes sure all multiplication/division operations occur lower in the parse tree than any addition/subtraction. This ensures those operations are evaluated first.
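To make the grammar concrete, here is a minimal recursive descent sketch in JavaScript with one function per rule, returning the nested-array shape the question asks for. The name parseExpression and the tokenizing regex are assumptions of this sketch. It only handles non-negative numbers and silently ignores characters it does not recognize.

```javascript
// Sketch of the grammar above. Each function corresponds to one rule and
// returns either a number or a node of the form [operator, [left, right]].
function parseExpression(text) {
  const tokens = text.match(/\d+(?:\.\d+)?|[-+*\/()]/g) || [];
  let pos = 0;

  // <factor> ::= ( <expression> ) | <number>
  function factor() {
    const tok = tokens[pos++];
    if (tok === '(') {
      const inner = expression();
      if (tokens[pos++] !== ')') throw new Error('Expected )');
      return inner;
    }
    if (tok === undefined || !/^\d/.test(tok)) {
      throw new Error('Expected a number, got ' + tok);
    }
    return Number(tok);
  }

  // <term> ::= <factor> | <factor> <mul_op> <term>
  function term() {
    const left = factor();
    const op = tokens[pos];
    if (op === '*' || op === '/') {
      pos++;
      return [op, [left, term()]];
    }
    return left;
  }

  // <expression> ::= <term> | <term> <add_op> <expression>
  function expression() {
    const left = term();
    const op = tokens[pos];
    if (op === '+' || op === '-') {
      pos++;
      return [op, [left, expression()]];
    }
    return left;
  }

  const tree = expression();
  if (pos < tokens.length) throw new Error('Unexpected token ' + tokens[pos]);
  return tree;
}

console.log(JSON.stringify(parseExpression('1 + 2 * 3'))); // ["+",[1,["*",[2,3]]]]
```

One caveat: because the rules are right-recursive, this literal translation groups 1 - 2 - 3 as 1 - (2 - 3). Parsers that need the usual left-to-right grouping typically replace the recursion in term and expression with a loop.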
Similar to the approach in other answers, here is another recursive implementation. It has the following distinctive characteristics:
It produces the nested array structure that is described in the question.
It supports signed numbers, so that -1 (without intermediate space) can be interpreted as a literal, not necessarily as an operator.
It supports unary minus, such as the first minus in this example: -(-1). It would also accept the string - -1 or --1, ...etc.
It supports decimal numbers with a mandatory digit before the decimal point.
It uses a regular expression to identify tokens. This will match number literals as one token, and any other, single non white space character.
Throws an error when there is a syntax validation error, with an indication where in the input string the error occurred.
The supported grammar can be described as:
<literal> ::= [ '-' ] <digit> { <digit> } [ '.' { <digit> } ] ; no white space allowed
<operator2> ::= '*' | '/'
<operator1> ::= '+' | '-'
<factor> ::= '-' <factor> | '(' <expression> ')' | <literal>
<term> ::= [ <term> <operator2> ] <factor>
<expression> ::= [ <expression> <operator1> ] <term>
Precedence is given to match the minus sign as part of a <literal> when possible.
Interactive snippet
function parse(s) {
// Create a closure for the two variables needed to iterate the input:
const
get = ((tokens, match=tokens.next().value) =>
// get: return current token when it is of the required group, and move forward,
// else if it was mandatory, throw an error, otherwise return undefined
(group, mandatory) => {
if (match?.groups[group] !== undefined)
return [match?.groups[group], match = tokens.next().value][0];
if (mandatory)
throw `${s}\n${' '.repeat(match?.index ?? s.length)}^ Expected ${group}`;
}
)( // Get iterator that matches tokens with named capture groups.
s.matchAll(/(?<number>(?:(?<![\d.)]\s*)-)?\d+(?:\.\d*)?)|(?<open>\()|(?<close>\))|(?<add>\+|(?<unary>-))|(?<mul>[*\/])|(?<end>$)|\S/g)
),
// node: Creates a tree node from given operation
node = (operation, ...values) => [operation, values],
// Grammar rules implementation, using names of regex capture groups, returning nodes
factor = (op=get("unary")) =>
op ? node(op, factor()) : get("open") ? expr("close") : +get("number", 1),
term = (arg=factor(), op=get("mul")) =>
op ? term(node(op, arg, factor())) : arg,
expr = (end, arg=term(), op=get("add")) =>
op ? expr(end, node(op, arg, term())) : (get(end, 1), arg);
return expr("end");
}
// I/O Management
const [input, output] = document.querySelectorAll("input, pre");
(input.oninput = () => {
try {
output.textContent = JSON.stringify(parse(input.value), null, 2)
} catch(err) {
output.textContent = err;
}
})();
input { width: 100%; margin-bottom: 10px; }
Math expression: <input value="1 + 2 * 3">
<pre></pre>
Explanations
tokens is an iterator over the input based on a regular expression. The regex has a look-behind assertion to ensure that the minus -- if present -- is not a binary operator, and can be included in the match of the numerical literal. The regex defines named groups, so that the code can rely on names and doesn't have to refer to literal characters.
get uses this iterator to get the next token in a shared variable (match) and return the previous one. get takes an argument to specify which named group is expected to have a match. If this is indeed the case, the next token is read; otherwise get checks whether the match was mandatory. If so, an exception is thrown; otherwise the function returns undefined, so the caller can try another grammar rule.
term, factor and expr implement the grammar rules with the corresponding names. They rely on get (with argument) to decide which way to go in the grammar rules. These functions all return trees (root nodes).
node constructs a node in the output tree, bottom up. If nodes in the tree should be something different than arrays, or some reduction should be performed (merging nodes) then this is the function to change.
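As an illustration of how the resulting tree can be consumed, here is a small sketch of an evaluator for the nested-array shape that node produces, [operation, [operands...]] with numbers as leaves. evaluateTree is a hypothetical helper and not part of the answer's code; it assumes a node with a single operand is a unary minus.

```javascript
// Walks a tree shaped like ["+", [1, ["*", [2, 3]]]] and computes its value.
function evaluateTree(tree) {
  if (typeof tree === 'number') return tree;      // leaf
  const [op, operands] = tree;
  const values = operands.map(evaluateTree);      // evaluate children first
  if (values.length === 1 && op === '-') return -values[0];  // unary minus
  const [a, b] = values;
  switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return a / b;
    default: throw new Error('Unknown operator: ' + op);
  }
}

console.log(evaluateTree(['+', [1, ['*', [2, 3]]]])); // 7
```

If node were changed to build objects instead of arrays, only the destructuring at the top of this function would need to follow suit.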
Have you read up on the theory behind parsers? Wikipedia (as always) has some good articles to read:
LR parser
Recursive descent parser
I built a fun little calculator once and had the same problem as you. I solved it by first building the syntax tree without keeping operator precedence in mind. Each node has a precedence value, and when evaluating non-constants, I'd check the left node: if it had lower precedence, I'd rotate the tree clockwise, bringing that node into evaluation and evaluating it first; likewise for the right node. Then I'd just try to evaluate again. It seemed to work well enough for me.
I want to create a string in JavaScript that contains all ascii characters. How can I do this?
var s = ' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~';
My javascript is a bit rusty, but something like this:
s = '';
for( var i = 32; i <= 126; i++ )
{
s += String.fromCharCode( i );
}
Not sure if the range is correct though.
Edit:
Seems it should be 32 to 127 then. Adjusted.
Edit 2:
Since char 127 isn't a printable character either, we'll have to narrow it down to 32 <= c <= 126, instead of 32 <= c <= 127.
Just loop the character codes and convert each to a character:
var s = '';
for (var i=32; i<=127;i++) s += String.fromCharCode(i);
Just wanted to put this here for reference. (takes about 13/100 to 26/100 of a ms on my computer to generate).
var allAsciiPrintables = JSON.stringify((Array.from(Array(126 + 1).keys()).slice(32).map((item) => {
return String.fromCharCode(item);
})).join(''));
Decomposed:
var allAsciiPrintables = (function() {
/* ArrayIterator */
var result = Array(126 + 1).keys();
/* [0, 126] */
result = Array.from(result);
/* [32, 126] */
result = result.slice(32);
/* transform each item from Number to its ASCII as String. */
result = result.map((item) => {
return String.fromCharCode(item);
});
/* convert from array of each string[1] to a single string */
result = result.join('');
/* create an escaped string so you can replace this code with the string
to avoid having to calculate this on each time the program runs */
result = JSON.stringify(result);
/* return the string */
return result;
})();
The most efficient solution, if you do want to generate the whole set each time the script runs, is probably the following (it takes around 3/100 to 35/100 of a millisecond on my computer to generate).
var allAsciiPrintables = (() => {
var result = new Array(126 - 32 + 1); // 95 printable characters, codes 32 to 126
for (var i = 32; i <= 126; ++i) {
result[i - 32] = (String.fromCharCode(i));
}
return JSON.stringify(result.join(''));
})();
Strangely, this is only 3-10 times slower than assigning the string literal directly (with backticks to tell JavaScript to avoid most backslash parsing).
var x;
var t;
t = performance.now();
x = '!\"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~';
t = performance.now() - t;
console.log(t);
This is a version written in python. Gives all ASCII characters in order as a single string.
all_ascii = ''.join(chr(k) for k in range(128)) # 7 bits
all_chars = ''.join(chr(k) for k in range(256)) # 8 bits
printable_ascii = ''.join(chr(k) for k in range(128) if len(repr(chr(k))) == 3)
>>> print(printable_ascii)
' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~'
The last string here, printable_ascii contains only those characters that contain no escapes (i.e. have length == 1). The chars like: \x05, \x06 or \t, \n which does not have its own glyph in your system's font, are filtered out.
len(repr(chr(k))) == 3 includes 2 quotes that come from repr call.
Without doing several appends:
var s = Array.apply(null, Array(127-32))
.map(function(x,i) {
return String.fromCharCode(i+32);
}).join("");
document.write(s);
Here is an ES6 one liner:
asciiChars = Array.from({ length: 95 }, (e, i) => String.fromCharCode(i + 32)).join('');
console.log(asciiChars)
let str = ''; // declare an empty string
for( var i = 32; i <= 126; i++ )
{
// String.fromCharCode takes an integer character code and returns the
// corresponding character; each one is appended to str by concatenation.
// The loop starts at 32 and ends at 126.
str = str + String.fromCharCode( i );
}
Here is a version in coffeescript
require 'fluentnode'
all_Ascii = ->
(String.fromCharCode(c) for c in [0..255])
describe 'all Ascii', ->
it 'all_Ascii', ->
all_Ascii.assert_Is_Function()
all_Ascii().assert_Size_Is 256
all_Ascii()[0x41].assert_Is 'A'
all_Ascii()[66 ].assert_Is 'B'
all_Ascii()[50 ].assert_Is '2'
all_Ascii()[150 ].assert_Is String.fromCharCode(150)