Issue with a Jison Grammar, Strange error from generate dparser

Issue with a Jison Grammar, Strange error from generate dparser - javascript

I am writing a simple Jison grammar in order to get some experience before starting a more complex project. I tried a simple grammar which is a comma separated list of numeric ranges, with ranges where the beginning and ending values were the same to use a single number shorthand. However, when running the generated parser on some test input I get an error which doe snot make alot of sense to me. Here is the grammar i came up with:
/* description: Parses end executes mathematical expressions. */
/* lexical grammar */
%lex
%%
\s+ /* skip whitespace */
[0-9]+ {return 'NUMBER'}
"-" {return '-'}
"," {return ','}
<<EOF>> {return 'EOF'}
. {return 'INVALID'}
/lex
/* operator associations and precedence */
%start ranges
%% /* language grammar */
ranges
: e EOF
{return $1;}
;
e : rng { $$ = $1;}
| e ',' e {alert('e,e');$$ = new Array(); $$.push($1); $$.push($3);}
;
rng
: NUMBER '-' NUMBER
{$$ = new Array(); var rng = {Start:$1, End: $3; }; $$.push(rng); }
| NUMBER
{$$ = new Array(); var rng = {Start:$1, End: $1; }; $$.push(rng);}
;
NUMBER: {$$ = Number(yytext);};
The Test input is this:
5-10,12-16
The output is:
Parse error on line 1:
5-10,12-16
^
Expecting '-', 'EOF', ',', got '8'
If it put an 'a' at the front i get and expected error about finding "INVALID" but i dont have an "8" in the input string so i wondering if this is an internal state?
I am using the online parser generator at: http://zaach.github.io/jison/try/
thoughts?

This production is confusing Jison (and it confused me, too :) ):
NUMBER: {$$ = Number(yytext);};
NUMBER is supposed to be a terminal, but the above production declares it as a non-terminal with an empty body. Since it can match nothing, it immediately matches, and your grammar doesn't allow two consecutive NUMBERs. Hence the error.
Also, your grammar is ambiguous, although I suppose Jison's default will solve the issue. It would be better to be explicit, though, since it's easy. Your rule:
e : rng
| e ',' e
does not specify how , "associates": in other words, whether rng , rng , rng should be considered as e , rng or rng , e. The first one is probably better for you, so you should write it explicitly:
e : rng
| e ',' rng
One big advantage of the above is that you don't need to create a new array in the second production; you can just push $3 onto the end of $1 and set $$ to $1.

Related

iPhone stops running Javascript with exec() [duplicate]

Is there a way to achieve the equivalent of a negative lookbehind in JavaScript regular expressions? I need to match a string that does not start with a specific set of characters.
It seems I am unable to find a regex that does this without failing if the matched part is found at the beginning of the string. Negative lookbehinds seem to be the only answer, but JavaScript doesn't has one.
This is the regex that I would like to work, but it doesn't:
(?<!([abcdefg]))m
So it would match the 'm' in 'jim' or 'm', but not 'jam'

Since 2018, Lookbehind Assertions are part of the ECMAScript language specification.
// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
Answer pre-2018
As Javascript supports negative lookahead, one way to do it is:
reverse the input string
match with a reversed regex
reverse and reformat the matches
const reverse = s => s.split('').reverse().join('');
const test = (stringToTests, reversedRegexp) => stringToTests
.map(reverse)
.forEach((s,i) => {
const match = reversedRegexp.test(s);
console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
});
Example 1:
Following #andrew-ensley's question:
test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)
Outputs:
jim true token: m
m true token: m
jam false token: Ø
Example 2:
Following #neaumusic comment (match max-height but not line-height, the token being height):
test(['max-height', 'line-height'], /thgieh(?!(-enil))/)
Outputs:
max-height true token: height
line-height false token: Ø

Lookbehind Assertions got accepted into the ECMAScript specification in 2018.
Positive lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);
Negative lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);
Platform support:
✔️ V8
✔️ Google Chrome 62.0
✔️ Microsoft Edge 79.0
✔️ Node.js 6.0 behind a flag and 9.0 without a flag
✔️ Deno (all versions)
✔️ SpiderMonkey
✔️ Mozilla Firefox 78.0
🛠️ JavaScriptCore: support in beta as of 2023-02-16: the feature has been merged
✔️ Apple Safari 16.4 (in beta as of 2023-02-16)
✔️ iOS 16.4 beta WebView (all browsers on iOS + iPadOS)
✔️ Bun 0.2.2
❌ Chakra: Microsoft was working on it but Chakra is now abandoned in favor of V8
❌ Internet Explorer
❌ Edge versions prior to 79 (the ones based on EdgeHTML+Chakra)

Let's suppose you want to find all int not preceded by unsigned :
With support for negative look-behind:
(?<!unsigned )int
Without support for negative look-behind:
((?!unsigned ).{9}|^.{0,8})int
Basically idea is to grab n preceding characters and exclude match with negative look-ahead, but also match the cases where there's no preceeding n characters. (where n is length of look-behind).
So the regex in question:
(?<!([abcdefg]))m
would translate to:
((?!([abcdefg])).|^)m
You might need to play with capturing groups to find exact spot of the string that interests you or you want to replace specific part with something else.

Mijoja's strategy works for your specific case but not in general:
js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama
Here's an example where the goal is to match a double-l but not if it is preceded by "ba". Note the word "balll" -- true lookbehind should have suppressed the first 2 l's but matched the 2nd pair. But by matching the first 2 l's and then ignoring that match as a false positive, the regexp engine proceeds from the end of that match, and ignores any characters within the false positive.

Use
newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});

You could define a non-capturing group by negating your character set:
(?:[^a-g])m
...which would match every m NOT preceded by any of those letters.

This is how I achieved str.split(/(?<!^)#/) for Node.js 8 (which doesn't support lookbehind):
str.split('').reverse().join('').split(/#(?!$)/).map(s => s.split('').reverse().join('')).reverse()
Works? Yes (unicode untested). Unpleasant? Yes.

following the idea of Mijoja, and drawing from the problems exposed by JasonS, i had this idea; i checked a bit but am not sure of myself, so a verification by someone more expert than me in js regex would be great :)
var re = /(?=(..|^.?)(ll))/g
// matches empty string position
// whenever this position is followed by
// a string of length equal or inferior (in case of "^")
// to "lookbehind" value
// + actual value we would want to match
, str = "Fall ball bill balll llama"
, str_done = str
, len_difference = 0
, doer = function (where_in_str, to_replace)
{
str_done = str_done.slice(0, where_in_str + len_difference)
+ "[match]"
+ str_done.slice(where_in_str + len_difference + to_replace.length)
len_difference = str_done.length - str.length
/* if str smaller:
len_difference will be positive
else will be negative
*/
} /* the actual function that would do whatever we want to do
with the matches;
this above is only an example from Jason's */
/* function input of .replace(),
only there to test the value of $behind
and if negative, call doer() with interesting parameters */
, checker = function ($match, $behind, $after, $where, $str)
{
if ($behind !== "ba")
doer
(
$where + $behind.length
, $after
/* one will choose the interesting arguments
to give to the doer, it's only an example */
)
return $match // empty string anyhow, but well
}
str.replace(re, checker)
console.log(str_done)
my personal output:
Fa[match] ball bi[match] bal[match] [match]ama
the principle is to call checker at each point in the string between any two characters, whenever that position is the starting point of:
--- any substring of the size of what is not wanted (here 'ba', thus ..) (if that size is known; otherwise it must be harder to do perhaps)
--- --- or smaller than that if it's the beginning of the string: ^.?
and, following this,
--- what is to be actually sought (here 'll').
At each call of checker, there will be a test to check if the value before ll is not what we don't want (!== 'ba'); if that's the case, we call another function, and it will have to be this one (doer) that will make the changes on str, if the purpose is this one, or more generically, that will get in input the necessary data to manually process the results of the scanning of str.
here we change the string so we needed to keep a trace of the difference of length in order to offset the locations given by replace, all calculated on str, which itself never changes.
since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but i thought the example, already complicated by the replacings, would be clearer with another variable (str_done).
i guess that in terms of performances it must be pretty harsh: all those pointless replacements of '' into '', this str.length-1 times, plus here manual replacement by doer, which means a lot of slicing...
probably in this specific above case that could be grouped, by cutting the string only once into pieces around where we want to insert [match] and .join()ing it with [match] itself.
the other thing is that i don't know how it would handle more complex cases, that is, complex values for the fake lookbehind... the length being perhaps the most problematic data to get.
and, in checker, in case of multiple possibilities of nonwanted values for $behind, we'll have to make a test on it with yet another regex (to be cached (created) outside checker is best, to avoid the same regex object to be created at each call for checker) to know whether or not it is what we seek to avoid.
hope i've been clear; if not don't hesitate, i'll try better. :)

Using your case, if you want to replace m with something, e.g. convert it to uppercase M, you can negate set in capturing group.
match ([^a-g])m, replace with $1M
"jim jam".replace(/([^a-g])m/g, "$1M")
\\jiM jam
([^a-g]) will match any char not(^) in a-g range, and store it in first capturing group, so you can access it with $1.
So we find im in jim and replace it with iM which results in jiM.

As mentioned before, JavaScript allows lookbehinds now. In older browsers you still need a workaround.
I bet my head there is no way to find a regex without lookbehind that delivers the result exactly. All you can do is working with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex that counts out what should not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is the first group $1.
In your case Before=[abcdefg] which is easy to negate NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.
If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetitive tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but still have to use the first group or use the regular expression (?!Before)(.{n})(Wanted) and use the second group. In this example, the pattern Before actually has a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag, see my code snippet:
function TestSORegEx() {
var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
var reg = /(?![abcdefg])(.{1})(m)/gm;
var out = "Matches and groups of the regex " +
"/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
var match = reg.exec(s);
while(match) {
var start = match.index + match[1].length;
out += "\nWhole match: " + match[0] + ", starts at: " + match.index
+ ". Desired match: " + match[2] + ", starts at: " + start + ".";
match = reg.exec(s);
}
out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
+ s.replace(reg, "$1*$2*");
alert(out);
}

This effectively does it
"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null
Search and replace example
"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"
Note that the negative look-behind string must be 1 character long for this to work.

Antlr4 lexer rule should only match if at the beginning of line

I am using Antlr4 with JavaScript and C#. I have a rule which should match only if it is at the start of line. If a expression starts with REM it should be recognized as comment and therefore be hidden. Otherwise it is not a comment.
REM 1+1 is a comment
1+1 REM 2 is not a comment
The bellow code in my Lexer is the best I could do till now. But the problem is it works only if a have a new line before which is actually not that good.
START_COMMENT : ('\r\n' | '\n' | '\f') ([R][E][M] | [;] | [#][ ]) ~[\r\n]* -> channel(HIDDEN);
I am interested to know if there is some kind of trick to tell the Lexer that a rule should only match if it is at the beginning of the line and nowhere else ?

You could use a predicate that tests if the current character index of the lexer is 0 (indicating the start of a line). The disadvantage of this is that you add target specific code to your grammar. For JavaScript, this should work:
REMARK
: {this.getCharIndex() === 0}? ( R E M | ';' | '# ' ) ~[\r\n]*
;
framgment R : [rR];
framgment E : [eE];
framgment M : [mM];

this is probably a hacky way to do it, but when I had this problem, I simply told it to look for a new line before it, and then in my code, I checked if the inputted string is supposed to start with it, and in that case, I added a new line to the top. So I basically did this
comment: NEWLINE '//';
NEWLINE: [\r\n] -> skip;
then in my code:
if (content.startsWith("//")) {
content = "\n" + content;
}
Of course this has the disadvantage of any errors being one line off, but it may work in your case.

Negative Look Behind alternative JAVASCRIPT [duplicate]

Is there a way to achieve the equivalent of a negative lookbehind in JavaScript regular expressions? I need to match a string that does not start with a specific set of characters.
It seems I am unable to find a regex that does this without failing if the matched part is found at the beginning of the string. Negative lookbehinds seem to be the only answer, but JavaScript doesn't has one.
This is the regex that I would like to work, but it doesn't:
(?<!([abcdefg]))m
So it would match the 'm' in 'jim' or 'm', but not 'jam'

Since 2018, Lookbehind Assertions are part of the ECMAScript language specification.
// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
Answer pre-2018
As Javascript supports negative lookahead, one way to do it is:
reverse the input string
match with a reversed regex
reverse and reformat the matches
const reverse = s => s.split('').reverse().join('');
const test = (stringToTests, reversedRegexp) => stringToTests
.map(reverse)
.forEach((s,i) => {
const match = reversedRegexp.test(s);
console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
});
Example 1:
Following #andrew-ensley's question:
test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)
Outputs:
jim true token: m
m true token: m
jam false token: Ø
Example 2:
Following #neaumusic comment (match max-height but not line-height, the token being height):
test(['max-height', 'line-height'], /thgieh(?!(-enil))/)
Outputs:
max-height true token: height
line-height false token: Ø

Lookbehind Assertions got accepted into the ECMAScript specification in 2018.
Positive lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);
Negative lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);
Platform support:
✔️ V8
✔️ Google Chrome 62.0
✔️ Microsoft Edge 79.0
✔️ Node.js 6.0 behind a flag and 9.0 without a flag
✔️ Deno (all versions)
✔️ SpiderMonkey
✔️ Mozilla Firefox 78.0
🛠️ JavaScriptCore: support in beta as of 2023-02-16: the feature has been merged
✔️ Apple Safari 16.4 (in beta as of 2023-02-16)
✔️ iOS 16.4 beta WebView (all browsers on iOS + iPadOS)
✔️ Bun 0.2.2
❌ Chakra: Microsoft was working on it but Chakra is now abandoned in favor of V8
❌ Internet Explorer
❌ Edge versions prior to 79 (the ones based on EdgeHTML+Chakra)

Let's suppose you want to find all int not preceded by unsigned :
With support for negative look-behind:
(?<!unsigned )int
Without support for negative look-behind:
((?!unsigned ).{9}|^.{0,8})int
Basically idea is to grab n preceding characters and exclude match with negative look-ahead, but also match the cases where there's no preceeding n characters. (where n is length of look-behind).
So the regex in question:
(?<!([abcdefg]))m
would translate to:
((?!([abcdefg])).|^)m
You might need to play with capturing groups to find exact spot of the string that interests you or you want to replace specific part with something else.

Mijoja's strategy works for your specific case but not in general:
js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama
Here's an example where the goal is to match a double-l but not if it is preceded by "ba". Note the word "balll" -- true lookbehind should have suppressed the first 2 l's but matched the 2nd pair. But by matching the first 2 l's and then ignoring that match as a false positive, the regexp engine proceeds from the end of that match, and ignores any characters within the false positive.

Use
newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});

You could define a non-capturing group by negating your character set:
(?:[^a-g])m
...which would match every m NOT preceded by any of those letters.

This is how I achieved str.split(/(?<!^)#/) for Node.js 8 (which doesn't support lookbehind):
str.split('').reverse().join('').split(/#(?!$)/).map(s => s.split('').reverse().join('')).reverse()
Works? Yes (unicode untested). Unpleasant? Yes.

following the idea of Mijoja, and drawing from the problems exposed by JasonS, i had this idea; i checked a bit but am not sure of myself, so a verification by someone more expert than me in js regex would be great :)
var re = /(?=(..|^.?)(ll))/g
// matches empty string position
// whenever this position is followed by
// a string of length equal or inferior (in case of "^")
// to "lookbehind" value
// + actual value we would want to match
, str = "Fall ball bill balll llama"
, str_done = str
, len_difference = 0
, doer = function (where_in_str, to_replace)
{
str_done = str_done.slice(0, where_in_str + len_difference)
+ "[match]"
+ str_done.slice(where_in_str + len_difference + to_replace.length)
len_difference = str_done.length - str.length
/* if str smaller:
len_difference will be positive
else will be negative
*/
} /* the actual function that would do whatever we want to do
with the matches;
this above is only an example from Jason's */
/* function input of .replace(),
only there to test the value of $behind
and if negative, call doer() with interesting parameters */
, checker = function ($match, $behind, $after, $where, $str)
{
if ($behind !== "ba")
doer
(
$where + $behind.length
, $after
/* one will choose the interesting arguments
to give to the doer, it's only an example */
)
return $match // empty string anyhow, but well
}
str.replace(re, checker)
console.log(str_done)
my personal output:
Fa[match] ball bi[match] bal[match] [match]ama
the principle is to call checker at each point in the string between any two characters, whenever that position is the starting point of:
--- any substring of the size of what is not wanted (here 'ba', thus ..) (if that size is known; otherwise it must be harder to do perhaps)
--- --- or smaller than that if it's the beginning of the string: ^.?
and, following this,
--- what is to be actually sought (here 'll').
At each call of checker, there will be a test to check if the value before ll is not what we don't want (!== 'ba'); if that's the case, we call another function, and it will have to be this one (doer) that will make the changes on str, if the purpose is this one, or more generically, that will get in input the necessary data to manually process the results of the scanning of str.
here we change the string so we needed to keep a trace of the difference of length in order to offset the locations given by replace, all calculated on str, which itself never changes.
since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but i thought the example, already complicated by the replacings, would be clearer with another variable (str_done).
i guess that in terms of performances it must be pretty harsh: all those pointless replacements of '' into '', this str.length-1 times, plus here manual replacement by doer, which means a lot of slicing...
probably in this specific above case that could be grouped, by cutting the string only once into pieces around where we want to insert [match] and .join()ing it with [match] itself.
the other thing is that i don't know how it would handle more complex cases, that is, complex values for the fake lookbehind... the length being perhaps the most problematic data to get.
and, in checker, in case of multiple possibilities of nonwanted values for $behind, we'll have to make a test on it with yet another regex (to be cached (created) outside checker is best, to avoid the same regex object to be created at each call for checker) to know whether or not it is what we seek to avoid.
hope i've been clear; if not don't hesitate, i'll try better. :)

Using your case, if you want to replace m with something, e.g. convert it to uppercase M, you can negate set in capturing group.
match ([^a-g])m, replace with $1M
"jim jam".replace(/([^a-g])m/g, "$1M")
\\jiM jam
([^a-g]) will match any char not(^) in a-g range, and store it in first capturing group, so you can access it with $1.
So we find im in jim and replace it with iM which results in jiM.

As mentioned before, JavaScript allows lookbehinds now. In older browsers you still need a workaround.
I bet my head there is no way to find a regex without lookbehind that delivers the result exactly. All you can do is working with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex that counts out what should not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is the first group $1.
In your case Before=[abcdefg] which is easy to negate NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.
If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetitive tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but still have to use the first group or use the regular expression (?!Before)(.{n})(Wanted) and use the second group. In this example, the pattern Before actually has a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag, see my code snippet:
function TestSORegEx() {
var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
var reg = /(?![abcdefg])(.{1})(m)/gm;
var out = "Matches and groups of the regex " +
"/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
var match = reg.exec(s);
while(match) {
var start = match.index + match[1].length;
out += "\nWhole match: " + match[0] + ", starts at: " + match.index
+ ". Desired match: " + match[2] + ", starts at: " + start + ".";
match = reg.exec(s);
}
out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
+ s.replace(reg, "$1*$2*");
alert(out);
}

This effectively does it
"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null
Search and replace example
"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"
Note that the negative look-behind string must be 1 character long for this to work.

Using Jison to create / translate simple script to another language

Have been playing with Jison to try to create an "interpreter" for a very simple scripting syntax (this is just for a personal messing around project, no business case!)
It's been about 20 years since I had to create a compiler, and I think I'm just not grasping some of the concepts.
What I am thinking of doing is give a program of very simple statements, one per line, to Jison, and get a stream of Javascript statements back that then perform the actions.
I may be looking at this wrong - maybe I need to actually perform the actions during the parse? This doesn't sound right though.
Anyway, what I've got is (I'm trying this online btw http://zaach.github.io/jison/try/)
/* lexical grammar */
%lex
%options case-insensitive
%%
\s+ /* skip whitespace */
is\s+a\b return 'OCREATE'
is\s+some\b return 'ACREATE'
[_a-zA-Z]+[_a-zA-Z0-9]*\b return 'IDENTIFIER'
<<EOF>> return 'EOF'
/lex
/* operator associations and precedence */
%start input
%% /* language grammar */
input
: /* empty */
| program EOF
{ return $1; }
;
program
: expression
{ $$ = $1; }
| program expression
{ $$ = $1; }
;
expression
: IDENTIFIER OCREATE IDENTIFIER
{ $$ = 'controller.createOne(\'' + $1 + '\', \'' + $3 + '\');' }
| IDENTIFIER ACREATE IDENTIFIER
{ $$ = 'controller.createSeveral(\'' + $1 + '\', \'' + $3 + '\');' }
;
So, for the input:
basket is some apples
orange is a fruit
...I want:
controller.createSeveral('basket', 'apples');
controller.createOne('orange', 'fruit');
What I am getting is:
controller.createSeveral('basket', 'apples');
This kind of makes sense to me to get a single result, but I have no idea what to do to progress with building my output.

The problem is in your second production for program:
program
: expression
{ $$ = $1; }
| program expression
{ $$ = $1; }
What the second production is saying, basically, is "a program can be a (shorter) program followed by an expression, but its semantic value is the value of the shorter program."
You evidently want the value of the program to be augmented by the value of the expression, so you need to say that:
program
: expression
{ $$ = $1; }
| program expression
{ $$ = $1.concat($2); }
(or $$ = $1 + $2 if you prefer. And you might want a newline for readability.)

Negative lookbehind equivalent in JavaScript

Is there a way to achieve the equivalent of a negative lookbehind in JavaScript regular expressions? I need to match a string that does not start with a specific set of characters.
It seems I am unable to find a regex that does this without failing if the matched part is found at the beginning of the string. Negative lookbehinds seem to be the only answer, but JavaScript doesn't has one.
This is the regex that I would like to work, but it doesn't:
(?<!([abcdefg]))m
So it would match the 'm' in 'jim' or 'm', but not 'jam'

Since 2018, Lookbehind Assertions are part of the ECMAScript language specification.
// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
Answer pre-2018
As Javascript supports negative lookahead, one way to do it is:
reverse the input string
match with a reversed regex
reverse and reformat the matches
const reverse = s => s.split('').reverse().join('');
const test = (stringToTests, reversedRegexp) => stringToTests
.map(reverse)
.forEach((s,i) => {
const match = reversedRegexp.test(s);
console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
});
Example 1:
Following #andrew-ensley's question:
test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)
Outputs:
jim true token: m
m true token: m
jam false token: Ø
Example 2:
Following #neaumusic comment (match max-height but not line-height, the token being height):
test(['max-height', 'line-height'], /thgieh(?!(-enil))/)
Outputs:
max-height true token: height
line-height false token: Ø

Lookbehind Assertions got accepted into the ECMAScript specification in 2018.
Positive lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);
Negative lookbehind usage:
console.log(
"$9.99 €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);
Platform support:
✔️ V8
✔️ Google Chrome 62.0
✔️ Microsoft Edge 79.0
✔️ Node.js 6.0 behind a flag and 9.0 without a flag
✔️ Deno (all versions)
✔️ SpiderMonkey
✔️ Mozilla Firefox 78.0
🛠️ JavaScriptCore: support in beta as of 2023-02-16: the feature has been merged
✔️ Apple Safari 16.4 (in beta as of 2023-02-16)
✔️ iOS 16.4 beta WebView (all browsers on iOS + iPadOS)
✔️ Bun 0.2.2
❌ Chakra: Microsoft was working on it but Chakra is now abandoned in favor of V8
❌ Internet Explorer
❌ Edge versions prior to 79 (the ones based on EdgeHTML+Chakra)

Let's suppose you want to find all int not preceded by unsigned :
With support for negative look-behind:
(?<!unsigned )int
Without support for negative look-behind:
((?!unsigned ).{9}|^.{0,8})int
Basically idea is to grab n preceding characters and exclude match with negative look-ahead, but also match the cases where there's no preceeding n characters. (where n is length of look-behind).
So the regex in question:
(?<!([abcdefg]))m
would translate to:
((?!([abcdefg])).|^)m
You might need to play with capturing groups to find exact spot of the string that interests you or you want to replace specific part with something else.

Mijoja's strategy works for your specific case but not in general:
js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama
Here's an example where the goal is to match a double-l but not if it is preceded by "ba". Note the word "balll" -- true lookbehind should have suppressed the first 2 l's but matched the 2nd pair. But by matching the first 2 l's and then ignoring that match as a false positive, the regexp engine proceeds from the end of that match, and ignores any characters within the false positive.

Use
newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});

You could define a non-capturing group by negating your character set:
(?:[^a-g])m
...which would match every m NOT preceded by any of those letters.

This is how I achieved str.split(/(?<!^)#/) for Node.js 8 (which doesn't support lookbehind):
str.split('').reverse().join('').split(/#(?!$)/).map(s => s.split('').reverse().join('')).reverse()
Works? Yes (unicode untested). Unpleasant? Yes.

following the idea of Mijoja, and drawing from the problems exposed by JasonS, i had this idea; i checked a bit but am not sure of myself, so a verification by someone more expert than me in js regex would be great :)
var re = /(?=(..|^.?)(ll))/g
// matches empty string position
// whenever this position is followed by
// a string of length equal or inferior (in case of "^")
// to "lookbehind" value
// + actual value we would want to match
, str = "Fall ball bill balll llama"
, str_done = str
, len_difference = 0
, doer = function (where_in_str, to_replace)
{
str_done = str_done.slice(0, where_in_str + len_difference)
+ "[match]"
+ str_done.slice(where_in_str + len_difference + to_replace.length)
len_difference = str_done.length - str.length
/* if str smaller:
len_difference will be positive
else will be negative
*/
} /* the actual function that would do whatever we want to do
with the matches;
this above is only an example from Jason's */
/* function input of .replace(),
only there to test the value of $behind
and if negative, call doer() with interesting parameters */
, checker = function ($match, $behind, $after, $where, $str)
{
if ($behind !== "ba")
doer
(
$where + $behind.length
, $after
/* one will choose the interesting arguments
to give to the doer, it's only an example */
)
return $match // empty string anyhow, but well
}
str.replace(re, checker)
console.log(str_done)
my personal output:
Fa[match] ball bi[match] bal[match] [match]ama
the principle is to call checker at each point in the string between any two characters, whenever that position is the starting point of:
--- any substring of the size of what is not wanted (here 'ba', thus ..) (if that size is known; otherwise it must be harder to do perhaps)
--- --- or smaller than that if it's the beginning of the string: ^.?
and, following this,
--- what is to be actually sought (here 'll').
At each call of checker, there will be a test to check if the value before ll is not what we don't want (!== 'ba'); if that's the case, we call another function, and it will have to be this one (doer) that will make the changes on str, if the purpose is this one, or more generically, that will get in input the necessary data to manually process the results of the scanning of str.
here we change the string so we needed to keep a trace of the difference of length in order to offset the locations given by replace, all calculated on str, which itself never changes.
since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but i thought the example, already complicated by the replacings, would be clearer with another variable (str_done).
i guess that in terms of performances it must be pretty harsh: all those pointless replacements of '' into '', this str.length-1 times, plus here manual replacement by doer, which means a lot of slicing...
probably in this specific above case that could be grouped, by cutting the string only once into pieces around where we want to insert [match] and .join()ing it with [match] itself.
the other thing is that i don't know how it would handle more complex cases, that is, complex values for the fake lookbehind... the length being perhaps the most problematic data to get.
and, in checker, in case of multiple possibilities of nonwanted values for $behind, we'll have to make a test on it with yet another regex (to be cached (created) outside checker is best, to avoid the same regex object to be created at each call for checker) to know whether or not it is what we seek to avoid.
hope i've been clear; if not don't hesitate, i'll try better. :)

Using your case, if you want to replace m with something, e.g. convert it to uppercase M, you can negate set in capturing group.
match ([^a-g])m, replace with $1M
"jim jam".replace(/([^a-g])m/g, "$1M")
\\jiM jam
([^a-g]) will match any char not(^) in a-g range, and store it in first capturing group, so you can access it with $1.
So we find im in jim and replace it with iM which results in jiM.

As mentioned before, JavaScript allows lookbehinds now. In older browsers you still need a workaround.
I bet my head there is no way to find a regex without lookbehind that delivers the result exactly. All you can do is working with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex that counts out what should not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is the first group $1.
In your case Before=[abcdefg] which is easy to negate NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.
If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetitive tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but still have to use the first group or use the regular expression (?!Before)(.{n})(Wanted) and use the second group. In this example, the pattern Before actually has a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag, see my code snippet:
function TestSORegEx() {
var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
var reg = /(?![abcdefg])(.{1})(m)/gm;
var out = "Matches and groups of the regex " +
"/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
var match = reg.exec(s);
while(match) {
var start = match.index + match[1].length;
out += "\nWhole match: " + match[0] + ", starts at: " + match.index
+ ". Desired match: " + match[2] + ", starts at: " + start + ".";
match = reg.exec(s);
}
out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
+ s.replace(reg, "$1*$2*");
alert(out);
}

This effectively does it
"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null
Search and replace example
"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"
Note that the negative look-behind string must be 1 character long for this to work.

We Keep Coding

JavaScript is the programming language of the Web.

Issue with a Jison Grammar, Strange error from generate dparser - javascript

Related

iPhone stops running Javascript with exec() [duplicate]

Antlr4 lexer rule should only match if at the beginning of line

Negative Look Behind alternative JAVASCRIPT [duplicate]

Using Jison to create / translate simple script to another language

Negative lookbehind equivalent in JavaScript

Categories

Resources