JavaScript's negative look-ahead doesn't work as expected?


I have some data in a textarea (yes, it is multiline):
"#ObjectTypeID", DbType.In
"#ObjectID", DbType.Int32,
"#ClaimReasonID", DbType.I
"#ClaimReasonDetails", DbTy
"#AccidendDate", DbType.Da
"#AccidendPlaceID", DbType
"#AccidendPlaceDetails", Db
"#TypeOfMedicalTreatment",
"#MedicalTreatmentDate", Db
"#CreatedBy", DbType.Int32
"#Member_ID", DbType.Strin
.ExecuteScalar(command).ToS
In each row, I want to remove the section that runs from the " (inclusive) to the end of the row.
I've managed to do this:
value=value.replace(/\"[a-z,. ]+(?!.*\")/gi,'')
Which means: search for the first " that has characters after it and does not have a later " following it.
This will yield the required results :
"#ObjectTypeID
"#ObjectID32,
"#ClaimReasonID
"#ClaimReasonDetails
"#AccidendDate
"#AccidendPlaceID
"#AccidendPlaceDetails
"#TypeOfMedicalTreatment
"#MedicalTreatmentDate
"#CreatedBy32
"#Member_ID
.ExecuteScalar(command).ToS
Question:
I understand why it is working, but I don't understand why the following is not working:
value=value.replace(/\".+(?!.*\")/gi,'')
http://jsbin.com/fanep/4/edit
I mean: it is supposed to search for a " that has characters after it and doesn't have a later " following it.
What am I missing? I really hate to declare [a-z,. ]

+ is greedy. Since "the whole thing" matches your rule of "must not have a " after", it will go with that.
The reason your first regex works is because you are disallowing most characters by explicitly whitelisting certain ones.
Adding ? after the + makes it lazy instead, matching as little as possible while still meeting the rules, but on its own that is not enough here: .+? can still grow past the second " until no quote lies ahead.
Additionally, you are searching for the stuff you want to keep... and then deleting it.
Try this instead:
val = val.replace(/"[^"\r\n]*(?=[\r\n]|$)/g,'');
This will remove everything from the last " to the end of a line (or end of the input).
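As a quick check, here is a minimal sketch that runs this idea against three of the sample rows. It uses a character class that excludes line breaks as well as quotes, so that a match can never run past the end of its own row; the variable names are only illustrative.

```javascript
// Three of the sample rows from the question, joined the way a textarea would hold them.
const sample = [
  '"#ObjectTypeID", DbType.In',
  '"#ObjectID", DbType.Int32,',
  '.ExecuteScalar(command).ToS'
].join('\n');

// From the last quote on each row up to (not including) the line break or end of input.
// [^"\r\n] keeps the match inside one row, so rows without quotes are left alone.
const cleaned = sample.replace(/"[^"\r\n]*(?=[\r\n]|$)/g, '');

console.log(cleaned);
// "#ObjectTypeID
// "#ObjectID
// .ExecuteScalar(command).ToS
```

Without the line-break exclusion, the match starting at the last quote in the input could extend across any following rows that contain no quote.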

value=value.replace(/\"[a-z,. ]+(?!.*\")/gi,'')
means: search for the first " that has characters after it and does not have a later " following it
To be exact: It matches the first " that has some of the characters [a-z,. ] after it, which then is not (in any distance) followed by another ".
I don't understand why the following is not working:
value=value.replace(/\".+(?!.*\")/gi,'')
You have removed the restriction of the character class. .+ will now match any character, including quotes. Regardless of whether it is greedy or not, it will now find the first " that is followed by some run of characters (possibly including other quotes) after which no more quotes follow; that is, it suffices for .+ to match up to and including the last quote.
I really hate to declare [a-z,. ]
You can just use the class of all characters except quotes: [^"]. Indeed, I think the following lookahead-free version matches your intent better:
value = value.replace(/"[^"\n\r]*/gi, '');
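To see the "regardless of whether it is greedy or not" point concretely, here is a small sketch on one sample row. The variable names are illustrative and not part of the original code.

```javascript
const row = '"#ObjectTypeID", DbType.In';

// Greedy: .+ runs to the end of the row, where no quote follows,
// so the negative lookahead succeeds and the whole row is matched.
const greedy = row.match(/".+(?!.*")/)[0];

// Lazy: .+? grows one character at a time until it has swallowed the second quote,
// which is the first position with no quote ahead of it.
const lazy = row.match(/".+?(?!.*")/)[0];

console.log(greedy); // "#ObjectTypeID", DbType.In
console.log(lazy);   // "#ObjectTypeID"
```

In both cases the match starts at the first quote rather than the last one, which is why neither variant removes only the trailing section.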

The one that doesn't work fails because the .+ is greedy. It eats up all it can. (Visual tools can help here, such as this one: http://regex101.com/r/eJ5kJ2/1) We can make it clearer that .+ is matching too much by putting it in a capture group: http://regex101.com/r/qF7nR9/1
In your one that does work (http://regex101.com/r/kR8vL6/1), you've changed that to [a-z,. ]+, which means "one or more a to z, comma, period, or space" (note that the . there is just a period, not a wildcard). That's much more limited (in particular, it doesn't include #).
Side note: There's no need to escape the " with a backslash, " isn't a special character in regular expressions.

Why the below regex is not working?
\".+(?!.*\")
Answer:
\" matches the first " and the following .+ would greedily match up to the last character. Because the last character in a line isn't followed by zero or more characters plus a ", the negative lookahead succeeds there, so the above regex would undoubtedly match the whole line.
For your case, you could simply use the regex below to match from the second " up to the end-of-line anchor.
\"[^"\n]*$
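One detail worth checking when the whole textarea value is processed in a single call: in JavaScript, $ only matches at the end of each line when the m flag is set. A minimal sketch, assuming the rows are separated by \n (the variable names are illustrative):

```javascript
const text = '"#CreatedBy", DbType.Int32\n"#Member_ID", DbType.Strin';

// Without the m flag, $ matches only at the very end of the input,
// so only the last row is trimmed.
const withoutFlag = text.replace(/"[^"\n]*$/g, '');

// With the m flag, $ also matches before each line break,
// so every row is trimmed.
const withFlag = text.replace(/"[^"\n]*$/gm, '');

console.log(withoutFlag);
console.log(withFlag);
```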

Related

Javascript - Regex for double quotes on inch measurements

Given measurement data like:
2"
3" Contract
When coming back from the server it looks like this:
"\"2\"\"\""
"\"3\"\" Contract\""
e.g. as shown within the image:
I want the data to be displayed as a proper measurement to the user. So:
2"
3" Contract
As shown above
I resorted to complicated regexes to get the second example working (3" Contract) but it would just turn 2" to 2.
let measurement_formatted = value.replace("\"\"", '\"');
measurement_formatted = measurement_formatted.replace(/(^"|"$)/g, '');
measurement_formatted = measurement_formatted.replace("\"", '\"');
How can I develop a proper regex for both cases?

First of all, those \ before the " are just there to tell you that the " (preceded by a \) is being escaped.
Based on that, the string "\"3\"\" Contract\"" is the same as '"3"" Contract"' because escaping " is no longer needed when the string is delimited by ' character.
To answer, or rather land some help (which I'll always gladly do), you may use the following regex /^"*|(\D)"/g in conjunction with the replace method :
/ : tells the JS engine that we're creating a regex.
^"* : tells the JS engine to match any " at the start of the string (0 or more).
| : acts as the logical OR operator.
(\D)" :
(\D) : creates a matching group of any NON-NUMERIC character.
" : the literal " character.
g : tells the JS engine to match all the occurrences of that regex.
The idea here is to tell the replace method to replace all " characters that are preceded by a non-numeric character with that matched non-numeric character and entirely delete the " character.
Here's a live example :
const regex = /^"*|(\D)"/g;
/** $1 : means write down the first matched capturing group */
console.log('"3"" Contract"'.replace(regex, '$1')); // 3" Contract
console.log('2"'.replace(/^"*|(\D)"/g, '$1')); // 2"
Learn more about the replace method.
Hope I managed to land some help.
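As a possible alternative for the raw server values, here is a sketch that assumes the server uses CSV-style quoting: the whole value is wrapped in one pair of quotes and every inner " is doubled. That assumption is inferred from the two samples in the question, and formatMeasurement is a hypothetical helper name.

```javascript
// Hypothetical helper, assuming CSV-style quoting of the server value.
function formatMeasurement(raw) {
  // Drop one wrapping pair of quotes, if the value has one.
  const unwrapped = raw.replace(/^"([\s\S]*)"$/, '$1');
  // Collapse every doubled quote into a single one.
  // The g flag matters: replace with a plain string pattern only touches the first occurrence.
  return unwrapped.replace(/""/g, '"');
}

console.log(formatMeasurement('"2"""'));          // 2"
console.log(formatMeasurement('"3"" Contract"')); // 3" Contract
```

A value that is already clean, such as 2", passes through unchanged because it does not start with a quote.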

Trying to write a regex where a newline may appear anywhere in a group

I'm trying to make a regex divide text into two parts and ignore everything that comes after these two parts.
The (insufficient) regex I'm trying to use is:
/Artikelnummer(?:(&&&))(.*)(?:\s*.*)\W?(?:Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&)((.*)&&&(.*)&&&(\d)+)*/
The text I'm matching is saved at these links:
https://regex101.com/r/VDnUoe/1
https://regex101.com/r/j62Mw0/2
Part 1) Everything after Artikelnummer and before Dokumentation... (easy to match)
Part 2) Everything after (?:Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&) that follows the pattern:
text&&&text&&&digits
In one of the above links, the above pattern works except for a new line that is thrown in, which causes some text to be left out that should be included.
The first part is matched:
all&&&Vorwort&&&1&&&all&&&Sicherheit&&&2&&&all&&&Richtlinien und Normen&&&3&&&all&&&Produktbeschreibung&&&4&&&all&&&Installation&&&5&&&all&&&Wichtige Informationene zur Inbetriebnahme&&&6&&&all&&&Projektierung - Wichtige Infos&&&7&&&all&&&Anhang 1&&&8&&&all&&&Anhang 2&&&9&&&all&&&Anhang 3&&&10&&&all&&&Anhang 4&&&11&&&all&&&Anhang 5&&&12&&&all&&&Anhang 6&&&13&&&all&&&Anhang 7&&&14&&&all&&&Anhang 8&&&15&&&all&&&Anhang 9&&&16&&&all&&&Anhang 10&&&17&&&all&&&Anhang 11&&&18&&&all&&&Anhang 12&&&19&&&all&&&Anhang 13&&&20&&&all&&&Anhang 14&&&21&&&all&&&Anhang 15&&&22&&&all&&&Anhang 16&&&23&&&all&&&Anhang 17&&&24&&&all&&&Anhang 18&&&25&&&all&&&Anhang 19&&&26&&&all&&&Anhang 20&&&27&&&all&&&Anhang 21&&&28&&&all&&&Anhang 22&&&29&&&all&&&Anhang 23&&&30&&&all&&&Anhang 24&&&31&&&all&&&Anhang 25&&&32&&&all&&&Anhang 26&&&33
And then this isn't matched, because a newline is inserted:
all&&&Anhang 27&&&34&&&all&&&Anhang 28&&&35&&&all&&&Anhang 29&&&36&&&all&&&Anhang 30&&&37&&&all&&&Anhang 31&&&38&&&all&&&Anhang 32&&&39&&&all&&&Anhang 33&&&40&&&all&&&Anhang 34&&&41&&&all&&&Anhang 35&&&42&&&all&&&Anhang 36&&&43&&&all&&&Anhang 37&&&44&&&all&&&Anhang 38&&&45
My question is, how can this regex be rewritten so that a newline could theoretically be placed anywhere within the second part of the text and still match everything I want?

I'm not sure this is what you want, anyway this regex works with newlines too:
Artikelnummer(?:(&&&))(.*)(?:\s*.*)\W?(?:Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&)((.*)&&&(.*)&&&(\d)+(\n?)*)*
\n matches a newline
? is the quantifier for zero or one (whether a newline is found or not)
* I added this one in case more newlines are encountered

I would try a regex like this:
(Artikelnummer([\n|\r| |\S]*)(?=Dokumentation))(([\n|\r| |\S]*&&&){2}\d+)*
This looks for \n, \r, spaces and all non-space characters.
Second, I wouldn't use ?: on every group. The positive lookahead ?= should give you the requirements for the first group.
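Another way to make the position of the newline irrelevant is to drop line breaks before matching and then collect the text&&&text&&&digits triples with a global regex. This is only a sketch: it assumes the line breaks are layout artifacts that carry no meaning, and parseEntries and the field names first, second and digits are hypothetical.

```javascript
// Hypothetical helper: pulls text&&&text&&&digits triples out of the second part.
function parseEntries(body) {
  // Assumption: line breaks are artifacts, so they are removed before matching.
  const flat = body.replace(/[\r\n]+/g, '');
  // Two runs of non-& text and a run of digits, each separated by &&&,
  // followed by either another &&& or the end of the input.
  const entry = /([^&]+)&&&([^&]+)&&&(\d+)(?:&&&|$)/g;
  return Array.from(flat.matchAll(entry), (m) => ({
    first: m[1],
    second: m[2],
    digits: Number(m[3])
  }));
}

const body = 'all&&&Anhang 26&&&33&&&\nall&&&Anhang 27&&&34&&&all&&&Anhang 28&&&35';
const entries = parseEntries(body);
console.log(entries.length);    // 3
console.log(entries[1].second); // Anhang 27
```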

Complex Regex composition - Regex that match "if"

I'm making a regex to match hashtags for my project. I want the regex to match hashtags that are separated by a single space, that don't contain another hashtag, and to match a space in the string only if it is followed by a word (not by another blank space or #).
I'm really curious to know if I can do something like "if" in regular expressions and I hope you can help me with this.
So, in:
"#hashtag?!-=_" "#hashhash#" "#hash tag" "#hash tag" "#hash #ahuhuhhuasd" "#hash "
The regex must match the following sentences:
"#hashtag?!-=_" "#hashhash" "#hash tag" "#hash" "#hash #ahuhuhhuasd" "#hash"
(all hashtag) (one) (another h.)
Actually, this is my code:
#{1,1}\S+\s{0,1}
You can test this code here, but it matches things that aren't desired:
"#ahusdhuas?!__??###hud #ahusdhuads "
It matches the blank space at the end of the string and the three '#' inside the string.
None of that content is desired in this string, just "#ahusdhuas?!__??"
Glad if you can help me!

I think this is what you need :
(#(?:\s?[^#\s]+)+)
Here are some tests :

Are any of these what you've been looking for?

Try:
#[^# ]+(?: [^# ]+)*
Match a #, then one or more characters that aren't # or a space, then 0 or more instances of (a space followed by one or more characters that aren't # or a space). The ?: makes the group non-capturing.
If you don't want to match ###hud in #ahusdhuas?!__??###hud #ahusdhuads at all because it begins with three #, you can add the negative lookbehind: (?<!#) to the front of the regex:
(?<!#)#[^# ]+(?: [^# ]+)*
However, that will work in Ruby but not in JavaScript engines that predate ES2018, which is when lookbehind assertions were added to the language. In that case you'd have to use the #[^# ]+(?: [^# ]+)* pattern, and if the match starts after the first character, test the previous character in the string in your code to see if it is a #, and if so, reject the match the regex returns.
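The "test the previous character in your code" step could be sketched as follows. findHashtags is a hypothetical helper name, and the sample string is the one from the question.

```javascript
// Hypothetical helper: emulates the (?<!#) lookbehind by inspecting
// the character just before each match.
function findHashtags(text) {
  const pattern = /#[^# ]+(?: [^# ]+)*/g;
  const results = [];
  for (const match of text.matchAll(pattern)) {
    // Reject a match whose preceding character is another #.
    if (match.index > 0 && text[match.index - 1] === '#') {
      continue;
    }
    results.push(match[0]);
  }
  return results;
}

const found = findHashtags('#ahusdhuas?!__??###hud #ahusdhuads ');
console.log(found); // [ '#ahusdhuas?!__??', '#ahusdhuads' ]
```

The #hud candidate is dropped because it is preceded by #, while the trailing space after #ahusdhuads is never part of a match.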

I think I got it, though I'm not accustomed to Javascript's regex expression because I only use Python.
I tested the following on the site regexpal.com given by Monty Wild, it's the only one that showed me all the substrings matched:
(?:^ |^| )(#[^#\s]+(?: [^#\s]+)?)(?:(?=\Z| \Z| \S)| +(?=#))
result
#hashtag?!-=_
#hash tag
#hash
#ahuhuhhuasd
#hash
As JavaScript's regexes (before ES2018) don't accept lookbehind assertions, I used a trick so that a hashtag preceded by two or more blanks won't match: these preceding blanks are consumed by the regex engine as trailing blanks of the preceding match. That is the role of the last part, +(?=#), of the regex: it triggers such a consumption of the trailing blanks of a match if there is more than one. This consumption happens only if the former part, (?=\Z| \Z| \S), didn't match.

Tried this in a standard HTML page and in Firebug as well ...
Works against the inputs you gave.
var hashTags = ["#hashtag?!-=_", "#hashhash#", "#hash tag", "#hash tag", "#hash #ahuhuhhuasd", "#hash ", "#hash #", "#foo bar baz"];
hashTags.forEach(function(el, idx, arr) {
  console.log(el.match(/#([^#\s]|(( [^\s])(?!\s|$)))+/g));
});
// Console output
> ["#hashtag?!-=_"]
> ["#hashhash"]
> ["#hash tag"]
> ["#hash"]
> ["#hash #ahuhuhhuasd"]
> ["#hash"]
> ["#hash"]
> ["#foo bar baz"]

Javascript Regular Expressions Functionality

I've spent a few hours on this and I can't seem to figure this one out.
In the code below, I'm trying to understand exactly what and how the regular expressions in the url.match are working.
As the code is below, it doesn't work. However if I remove (?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp) it seems to give me the output that I want.
However, I don't want to remove this without understanding what it is doing.
I found a pretty useful resource, but after a few hours I still can't precisely determine what these expressions are doing:
https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions#Using_Parenthesized_Substring_Matches
Could someone break this down for me and explain how exactly it is parsing the strings. The expressions themselves and the placement of the parentheses is not really clear to me and frankly very confusing.
Any help is appreciated.
(function($) {
  $(document).ready(function() {
    function parse_keywords(url){
      var matches = url.match(/.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp)/);
      return matches ? matches[1].split('+') : [];
    }
    myRefUrl = "http://www.google.com/url?sa=f&rct=j&url=https://www.mydomain.com/&q=my+keyword+from+google&ei=fUpnUaage8niAKeiICgCA&usg=AFQjCNFAlKg_w5pZzrhwopwgD12c_8z_23Q";
    myk1 = (parse_keywords(myRefUrl));
    kw="";
    for (i=0;i<myk1.length;i++) {
      if (i == (myk1.length - 1)) {
        kw = kw + myk1[i];
      }
      else {
        kw = kw + myk1[i] + '%20';
      }
    }
    console.log (kw);
    if (kw != null && kw != "" && kw != " " && kw != "%20") {
      orighref = $('a#applynlink').attr('href');
      $('a#applynlink').attr('href', orighref + '&scbi=' + kw);
    }
  });
})(jQuery);

Let's break this regex down.
/
Begin regex.
.*
Match zero or more anything - basically, we're willing to match this regex at any point into the string.
(?:\?p=
|\?q=
|&q=
|\?s=)
In this, the ?: means 'do not capture anything inside of this group'. See http://www.regular-expressions.info/refadv.html
The \? means take ? literally, which is normally a character meaning 'match 0 or 1 copies of the previous token' but we want to match an actual ?.
Other than that, it's just looking for a multitude of different options to select (| means 'the regex is valid if I match either what's before me or after me').
([a-zA-Z0-9 +]*)
Now we match zero or more of any of the following characters in any arrangement: a-zA-Z0-9, space and +. And since it is inside a () with no ?: we DO capture it.
(?:&toggle=
|&ie=utf-8
|&FORM=
|&aq=
|&x=
|&gwp)
We see another ?: so this is another non-capturing group.
Other than that, it is just full of literal characters separated by |s, so it is not doing any fancy logic.
/
End regex.
In summary, this regex looks through the string for any instance of the first non-capturing group, captures the run of allowed characters that follows it, then looks for any instance of the second non-capturing group to 'cap' it off, and returns everything that was between those two non-capturing groups. (Think of it as a 'sandwich': we look for the header and footer and capture everything in between that we're interested in.)
After the regex runs, we do this:
return matches ? matches[1].split('+') : [];
Which grabs the captured group and splits it on + into an array of strings.

For situations like this, it's really helpful to visualize it with www.debuggex.com (which I built). It immediately shows you the structure of your regex and allows you to walk through step-by-step.
In this case, the reason it works when you remove the last part of your regex is because none of the strings &toggle=, &ie=utf-8, etc are in your sample url. To see this, drag the grey slider above the test string on debuggex and you'll see that it never makes it past the & in that last group.
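The effect can be reproduced outside the browser with a short sketch. The relaxed pattern below is only an assumed variation that ends the capture at the next & or at the end of the string; whether that is acceptable depends on which referrers the original list was written for.

```javascript
const refUrl = 'http://www.google.com/url?sa=f&rct=j&url=https://www.mydomain.com/&q=my+keyword+from+google&ei=fUpnUaage8niAKeiICgCA&usg=AFQjCNFAlKg_w5pZzrhwopwgD12c_8z_23Q';

// Original pattern: after the keywords it insists on one of the listed parameters.
// The sample URL continues with &ei=, which is not in the list, so there is no match.
const original = /.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp)/;
const originalResult = refUrl.match(original);
console.log(originalResult); // null

// Assumed relaxation: any following parameter, or the end of the string, closes the capture.
const relaxed = /.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&|$)/;
const matches = refUrl.match(relaxed);
const keywords = matches ? matches[1].split('+') : [];
console.log(keywords); // [ 'my', 'keyword', 'from', 'google' ]
```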

How to make this simple regexp?

I need to check that a string starts and ends with an alphanumeric character, is between 5 and 20 characters long, and has either one space or none between the characters. I tried /^[a-z\s?A-Z0-9]{5,20}$/ but this is not working.
EDIT
test test -should pass
testtest -should pass
test test test -should not pass

You can't do this with traditional regex without writing a ridiculously long expression, so you need to use a look-ahead:
/^(?=(\w| ){5,20}$)\w+ ?\w+$/
This says: make sure there are between 5 and 20 characters in the match, then match /\w+ ?\w+/
Note I used \w for simplification. It is the same as your character class above except it also accepts underscores. If you don't want to match them you have to do:
/^(?=[a-zA-Z0-9 ]{5,20}$)[a-zA-Z0-9]+ ?[a-zA-Z0-9]+$/
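A quick sanity check of the look-ahead approach against the three inputs from the question, as a minimal sketch that assumes the 5 to 20 character range the question asks for (the variable names are illustrative):

```javascript
// The look-ahead checks the overall length and allowed characters;
// the rest of the pattern allows at most one space, never at either end.
const pattern = /^(?=[a-zA-Z0-9 ]{5,20}$)[a-zA-Z0-9]+ ?[a-zA-Z0-9]+$/;

const inputs = ['test test', 'testtest', 'test test test'];
const results = inputs.map((s) => pattern.test(s));

console.log(results); // [ true, true, false ]
```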

You can't put a ? inside of [...]. [...] is used to specify a set of characters precisely, you can't maybe (?) have a character inside a set of characters. The occurrence of any specific characters is already optional, the ? is meaningless.
If you allow any number of spaces inside your match, just remove the question mark. If you want to allow a single space but no more, then regular expressions alone can't do that for you, you'd need something like
if (/^[a-z\sA-Z0-9]{5,20}$/.test(myString) && (myString.match(/\s/g) || []).length <= 1)
You couldn't do this with a single traditional regex without it being dozens of lines long; regexes are meant for matching simpler patterns than this.
If you only want to use regexes, you could use two instead of one. The first matches the general pattern, the second ensures that at most one whitespace character is found.
if (/^[a-z\sA-Z0-9]{5,20}$/.test(myString) && /^[^\s]*\s?[^\s]*$/.test(myString)) {
Example Usage
var inputs = ["test test", "testtest", "test test test"];
for (var index = 0; index < inputs.length; index++) {
  var myString = inputs[index];
  // First regex: overall shape and length. Second regex: at most one whitespace character.
  if (/^[a-z\sA-Z0-9]{5,20}$/.test(myString) && /^[^\s]*\s?[^\s]*$/.test(myString)) {
    console.log(myString + " matches.");
  } else {
    console.log(myString + " does not match.");
  }
}
This produces the output specified in your question.

Meh, so here's the ridiculously long traditional regex for the same:
(?i)[a-z0-9]+( [a-z0-9]+)?{5,12}
JS version (without the nested quantifier):
/^([a-z0-9]( [a-z0-9])?){5,12}$/i
