Regular expression that removes the second occurrence of a character in a string - javascript

I'm trying to write a JavaScript function that removes any second occurrence of a character using a regular expression. Here is my function:
var removeSecondOccurrence = function(string) {
return string.replace(/(.*)\1/gi, '');
}
It only removes consecutive occurrences. I'd like it to remove non-consecutive ones as well. For example, papirana should become pairn.
Please help

A non-regexp solution:
"papirana".split("").filter(function(x, n, self) { return self.indexOf(x) == n }).join("")
Regexp code is complicated, because JS (before ES2018) doesn't support lookbehinds:
str = "papirana";
re = /(.)(.*?)\1/;
while(str.match(re)) str = str.replace(re, "$1$2")
or a variation of the first method:
"papirana".replace(/./g, function(a, n, str) { return str.indexOf(a) == n ? a : "" })

Using a zero-width lookahead assertion you can do something similar
"papirana".replace(/(.)(?=.*\1)/g, "")
returns
"pirna"
The letters are of course the same, just in a different order.
Passing the reverse of the string and using the reverse of the result you can get what you're asking for.

This is how you would do it with a loop:
var removeSecondOccurrence = function(string) {
  var results = "";
  for (var i = 0; i < string.length; i++)
    // keep a character only if it has not been added to results yet
    if (results.indexOf(string.charAt(i)) === -1)
      results += string.charAt(i);
  return results;
}
Basically: for each character in the input, if you haven't seen that character already, add it to the results. Clear and readable, at least.
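For longer inputs, the same "have I seen this character already?" idea can be sketched with a Set, so the lookup does not rescan the result string each time. This is only an illustration; the name removeRepeatedChars is made up for this example and is not from the answer above.

```javascript
// Hypothetical helper: keeps only the first occurrence of each character.
function removeRepeatedChars(input) {
  var seen = new Set();   // characters already copied to the result
  var result = "";
  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i);
    if (!seen.has(ch)) {
      seen.add(ch);
      result += ch;
    }
  }
  return result;
}

console.log(removeRepeatedChars("papirana")); // "pairn"
```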

What Michelle said.
In fact, I strongly suspect it cannot be done using regular expressions. Or rather, you can if you reverse the string, remove all but the first occurrences, then reverse again, but it's a dirty trick and what Michelle suggests is way better (and probably faster).
If you're still hot on regular expressions...
"papirana".
split("").
reverse().
join("").
replace(/(.)(?=.*\1)/g, '').
split("").
reverse().
join("")
// => "pairn"
The reason why you can't find all but the first occurrence without all the flippage is twofold:
JavaScript (before ES2018) does not have lookbehinds, only lookaheads
Even if it did, few regexp flavours allow variable-length lookbehinds (.NET is one that does)
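For what it's worth, engines that implement ES2018 do have lookbehind assertions, and they are not restricted to fixed length. A minimal sketch under that assumption (it will throw a SyntaxError on older engines; the function name is made up for this example):

```javascript
// Assumes an ES2018+ engine with lookbehind support.
// Matches a character only if the same character already occurs earlier:
// the lookbehind requires "\1, then at least one character" ending right
// after the current character, so \1 must sit somewhere before it.
function removeLaterOccurrences(input) {
  return input.replace(/(.)(?<=\1.+)/g, "");
}

console.log(removeLaterOccurrences("papirana")); // "pairn"
```

No reversing is needed here because the assertion looks backwards from each character instead of forwards.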

Related

Javascript regexes - Lookbehind and lookahead at the same time

I am trying to create a regex in JavaScript that matches the character b if it is not preceded or followed by the character a.
Apparently, JavaScript regexes don't have negative lookbehind readily implemented, making the task difficult. I came up with the following one, but it does not work.
"ddabdd".replace(new RegExp('(?:(?![a]b(?![a])))*b(?![a])', 'i'),"c");
That is the best I could come up with. Here, the b should not match because it is preceded by an a, but it does match.
So some examples on what I want to achieve
"ddbdd" matches the b
"b" matches the b
"ddb" matches the b
"bdd" matches the b
"ddabdd" or "ddbadd" does not match the b
It seems you could use a capturing group containing either the beginning of string anchor or a negated character class preceding "b" while using Negative Lookahead to assert that "a" does not follow as well. Then you would simply reference $1 inside of the replacement call along with the rest of your replacement string.
var s = 'ddbdd b ddb bdd ddabdd ddabdd ddbadd';
var r = s.replace(/(^|[^a])b(?!a)/gi, '$1c');
console.log(r); //=> "ddcdd c ddc cdd ddabdd ddabdd ddbadd"
Edit: As @nhahtdh pointed out in a comment, consecutive characters are a problem for the approach above, so you may want to consider a callback.
var s = 'ddbdd b ddb bdd ddabdd ddabdd ddbadd sdfbbfds';
var r = s.replace(/(a)?b(?!a)/gi, function($0, $1) {
return $1 ? $0 : 'c';
});
console.log(r); //=> "ddcdd c ddc cdd ddabdd ddabdd ddbadd sdfccfds"
There is no way to emulate the behavior of look-behind with regex alone in this case, since there may be consecutive b in the string, which requires the zero-width property of a look-behind to check the immediately preceding character.
Since the condition in the look-behind is quite simple, you can check for it in the replacement function:
inputString.replace(/b(?!a)/gi, function ($0, idx, str) {
if (idx == 0 || !/a/i.test(str[idx - 1])) { // Equivalent to (?<!a)
return 'c';
} else {
return $0; // $0 is the text matched by /b(?!a)/
}
});
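If you can rely on an ES2018+ engine, the check done in the callback above can be written directly as a negative lookbehind. This is a sketch under that assumption; replaceLoneB is a name invented for the example.

```javascript
// Assumes an ES2018+ engine with lookbehind support.
// Replaces every b/B that is neither preceded nor followed by a/A.
function replaceLoneB(input) {
  return input.replace(/(?<!a)b(?!a)/gi, "c");
}

console.log(replaceLoneB("ddbdd b ddb bdd ddabdd ddbadd sdfbbfds"));
// "ddcdd c ddc cdd ddabdd ddbadd sdfccfds"
```

Both assertions are zero-width, so consecutive b characters are handled without any lastIndex bookkeeping.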
What you are really trying to do here is write a parser for a tiny language. Regexp is good at some parsing tasks, but bad at many (and JS regexps are somewhat underpowered). You may be able to find a regexp to work in a particular situation, then when your syntax rules change, the regexp may be difficult or impossible to change to reflect that. The simple program below has the advantage that it is readable and maintainable. It does exactly what it says.
function find_bs(str) {
var indexes = [];
for (var i = 0; i < str.length; i++) {
if (str[i] === 'b' && str[i-1] !== 'a' && str[i+1] !== 'a')
indexes.push(i);
}
return indexes;
}
Using a regexp
If you absolutely insist on using a regexp, you can use the trick of resetting the lastIndex property on the regexp in conjunction with RegExp.exec:
function find_bs(str) {
var indexes = [];
var regexp = /[^a]b[^a]/g; // b with a non-a character on both sides
var matches;
while (matches = regexp.exec(str)) {
indexes.push(matches.index + 1);
regexp.lastIndex -= 2;
}
return indexes;
}
You will need to tweak the logic to handle the beginning and end of the string.
How this works
We find the entire xbx string using the regexp. The index of b will be one plus the index of the match, so we record this. Before we do the next match, we reset lastIndex, which governs the starting point from which the search will continue, back to the b, so it serves as the first character of any following potential match.
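One possible tweak for the beginning and end of the string is to pad the input with a sentinel character on each side, so every b has a neighbour to test. The sketch below assumes the pattern requires a non-a character on both sides of the b; the function name and the space used as padding are choices made for this example.

```javascript
// Hypothetical variant of the lastIndex trick that also handles string edges.
function findBsPadded(str) {
  var indexes = [];
  var padded = " " + str + " ";   // sentinels: any character other than "a" works
  var regexp = /[^a]b[^a]/g;
  var match;
  while ((match = regexp.exec(padded)) !== null) {
    // b sits at match.index + 1 in the padded string,
    // which is match.index in the original string
    indexes.push(match.index);
    // step back so this b can be the left neighbour of the next match
    regexp.lastIndex -= 2;
  }
  return indexes;
}

console.log(findBsPadded("ddbdd"));  // [2]
console.log(findBsPadded("b"));      // [0]
console.log(findBsPadded("dbbd"));   // [1, 2]
console.log(findBsPadded("ddabdd")); // []
```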

Count parentheses with regular expression

My string is: (as(dh(kshd)kj)ad)... ()()
How is it possible to count the parentheses with a regular expression? I would like to select the string which begins at the first opening bracket and ends before the ...
Applying that to the above example, that means I would like to get this string: (as(dh(kshd)kj)ad)
I tried to write it, but this doesn't work:
var str = "(as(dh(kshd)kj)ad)... ()()";
document.write(str.match(/(.*)/m));
As I said in the comments, contrary to popular belief (don't believe everything people say) matching nested brackets is possible with regex.
The downside of using it is that you can only do it up to a fixed level of nesting. And for every additional level you wish to support, your regex will be bigger and bigger.
But don't take my word for it. Let me show you. The regex \([^()]*\) matches one level. For up to two levels see the regex here. To match your case, you'd need:
\(([^()]*|\(([^()]*|\([^()]*\))*\))*\)
It would match the leading (as(dh(kshd)kj)ad) part of (as(dh(kshd)kj)ad)... ()()
Check the DEMO HERE and see what I mean by fixed level of nesting.
And so on. To keep adding levels, all you have to do is change the last [^()]* part to ([^()]*|\([^()]*\))* (check three levels here). As I said, it will get bigger and bigger.
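Since each extra level is a mechanical substitution, the pattern can also be built in a loop. This is a rough sketch, not the answerer's code: nestedParenRegex is a made-up name, and it uses (?:[^()]|...)* with a single-character alternative rather than ([^()]*|...)*, which matches the same strings but avoids a star nested inside a star.

```javascript
// Hypothetical builder for a regex that matches parentheses nested
// up to `depth` levels (depth 1 means no nesting at all).
function nestedParenRegex(depth) {
  var inner = "[^()]*";   // innermost level: no parentheses allowed
  for (var level = 1; level < depth; level++) {
    // one more level: either a plain character or a bracketed group
    // containing everything the previous level allowed
    inner = "(?:[^()]|\\(" + inner + "\\))*";
  }
  return new RegExp("\\(" + inner + "\\)");
}

var sample = "(as(dh(kshd)kj)ad)... ()()";
console.log(nestedParenRegex(3).exec(sample)[0]); // "(as(dh(kshd)kj)ad)"
console.log(nestedParenRegex(2).exec(sample)[0]); // "(dh(kshd)kj)"
```

The second call shows the fixed-depth limitation: with only two levels allowed, the outermost pair can no longer be matched.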
See Tim's answer for why this won't work, but here's a function that'll do what you're after instead.
function getFirstBracket(str){
var pos = str.indexOf("("),
bracket = 0;
if(pos===-1) return false;
for(var x=pos; x<str.length; x++){
var char = str.substr(x, 1);
bracket = bracket + (char=="(" ? 1 : (char==")" ? -1 : 0));
if(bracket==0) return str.substr(pos, (x+1)-pos);
}
return false;
}
getFirstBracket("(as(dh(kshd)kj)ad)... ()(");
There is a possibility and your approach was quite good:
Match will give you an array if you had some hits, if so you can look up the array length.
var str = "(as(dh(kshd)kj)ad)... ()()",
match = str.match(new RegExp('.*?(?:\\(|\\)).*?', 'g')),
count = match ? match.length : 0;
This regular expression will get all parts of your text that include round brackets. See http://gskinner.com/RegExr/ for a nice online regex tester.
Now you can use count for all brackets.
match will deliver an array that looks like:
["(", "as(", "dh(", "kshd)", "kj)", "ad)", "... (", ")", "(", ")"]
Now you can start sorting your results:
var newStr = '', open = 0, close = 0;
for (var n = 0, m = match.length; n < m; n++) {
if (match[n].indexOf('(') !== -1) {
open++;
newStr += match[n];
} else {
if (open > close) newStr += match[n];
close++;
}
if (open === close) break;
}
... and newStr will be (as(dh(kshd)kj)ad)
This is probably not the nicest code but it will make it easier to understand what you're doing.
With this approach there is no limit of nesting levels.
This is not possible with a JavaScript regex. Generally, regular expressions can't handle arbitrary nesting because that can no longer be described by a regular language.
Several modern regex flavors do have extensions that allow for recursive matching (like PHP, Perl or .NET), but JavaScript is not among them.
No. Regular expressions express regular languages. Finite automata (FA) are the machines that recognise regular languages. An FA is, as its name implies, finite in memory. With finite memory, an FA cannot remember an arbitrary number of open parentheses, which is what you would need in order to do what you want.
I suggest you use an algorithm involving a counter to solve your problem.
try this jsfiddle
var str = "(as(dh(kshd)kj)ad)... ()()";
document.write(str.match(/\((.*?)\.\.\./m)[1] );

Split string with a single occurence (not twice) of a delimiter in Javascript

This is better explained with an example. I want to achieve an split like this:
two-separate-tokens-this--is--just--one--token-another
->
["two", "separate", "tokens", "this--is--just--one--token", "another"]
I naively tried str.split(/-(?!-)/) and it won't match the first occurrence of double delimiters, but it will match the second (as it is not followed by the delimiter):
["two", "separate", "tokens", "this-", "is-", "just-", "one-", "token", "another"]
Do I have a better alternative than looping through the string?
By the way, the next step should be replacing the two consecutive delimiters by just one, so it's kind of escaping the delimiter by repeating it... So the final result would be this:
["two", "separate", "tokens", "this-is-just-one-token", "another"]
If that can be achieved in just one step, that should be really awesome!
str.match(/(?!-)(.*?[^\-])(?=(?:-(?!-)|$))/g);
Check this fiddle.
Explanation:
Non-greedy pattern (?!-)(.*?[^\-]) match a string that does not start and does not end with dash character and pattern (?=(?:-(?!-)|$)) requires such match to be followed by single dash character or by end of line. Modifier /g forces function match to find all occurrences, not just a single (first) one.
Edit (based on OP's comment):
str.match(/(?:[^\-]|--)+/g);
Check this fiddle.
Explanation:
Pattern (?:[^\-]|--) will match a non-dash character or a double-dash string. The + quantifier repeats that group as many times as possible. Modifier /g forces function match to find all occurrences, not just a single (first) one.
Note:
Pattern /(?:[^-]|--)+/g works in JavaScript as well, but JSLint requires - to be escaped inside square brackets and reports an error otherwise.
You would need a negative lookbehind assertion as well as your negative lookahead:
(?<!-)-(?!-)
http://regexr.com?31qrn
Unfortunately, the JavaScript regular expression engine (before ES2018) does not support negative lookbehinds. I believe the only workaround is to inspect your results afterwards and remove any matches that would have failed the lookbehind assertion (or in this case, combine them back into a single match).
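On an engine that implements ES2018, the lookbehind pattern shown above can be passed straight to split. A small sketch under that assumption (splitOnSingleHyphen is a name made up for this example):

```javascript
// Assumes an ES2018+ engine with lookbehind support.
// Splits only on hyphens that have no hyphen directly before or after them.
function splitOnSingleHyphen(input) {
  return input.split(/(?<!-)-(?!-)/);
}

console.log(splitOnSingleHyphen("two-separate-tokens-this--is--just--one--token-another"));
// ["two", "separate", "tokens", "this--is--just--one--token", "another"]
```

Note that a run of three or more hyphens is never split by this pattern, because every hyphen in the run has a hyphen next to it.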
@Ωmega has the right idea in using match instead of split, but his regex is more complicated than it needs to be. Try this one:
s.match(/[^-]+(?:--[^-]+)*/g);
It reads exactly the way you expect it to work: Consume one or more non-hyphens, and if you encounter a double hyphen, consume that and go on consuming non-hyphens. Repeat as necessary.
EDIT: Apparently the source string may contain runs of two or more consecutive hyphens, which should not be treated as delimiters. That can be handled by adding a + to the second hyphen:
s.match(/[^-]+(?:--+[^-]+)*/g);
You can also use a {min,max} quantifier:
s.match(/[^-]+(?:-{2,}[^-]+)*/g);
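The question also asks for the doubled delimiter to be collapsed to a single one afterwards. One way to sketch both steps together, using the first pattern above (the function name tokenize is invented for this example, and it assumes hyphens only ever appear singly or in pairs):

```javascript
// Hypothetical two-step tokenizer: match tokens, then unescape "--" to "-".
function tokenize(input) {
  // match returns null when nothing matches, so fall back to an empty array
  var tokens = input.match(/[^-]+(?:--[^-]+)*/g) || [];
  return tokens.map(function (token) {
    return token.replace(/--/g, "-");
  });
}

console.log(tokenize("two-separate-tokens-this--is--just--one--token-another"));
// ["two", "separate", "tokens", "this-is-just-one-token", "another"]
```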
I don't know how to do it purely with the regex engine in JS. You could do it this way that is a little less involved than manually parsing:
var str = "two-separate-tokens-this--is--just--one--token-another";
str = str.replace(/--/g, "#!!#");
var split = str.split(/-/);
for (var i = 0; i < split.length; i++) {
split[i] = split[i].replace(/#!!#/g, "--");
}
Working demo: http://jsfiddle.net/jfriend00/hAhAB/
You can achieve this without negative lookbehind (as @jbabey mentioned, these are not supported in JS) like this (inspired by this article):
\b-\b
The regular expressions weren't very good with edge cases (like 5 consecutive delimiters), and I also had to deal with replacing the double delimiters with a single one (which gets tricky because '----'.replace('--', '-') gives '---' rather than '--').
So I wrote a function that loops over the characters and does everything in one go (although I'm concerned that using the string accumulator can be slow :-s)
f = function(id, delim) {
var result = [];
var acc = '';
var i = 0;
while(i < id.length) {
if (id[i] == delim) {
if (id[i+1] == delim) {
acc += delim;
i++;
} else {
result.push(acc);
acc = '';
}
} else {
acc += id[i];
}
i++;
}
if (acc != '') {
result.push(acc);
}
return result;
}
and some tests:
> f('a-b--', '-')
["a", "b-"]
> f('a-b---', '-')
["a", "b-"]
> f('a-b---c', '-')
["a", "b-", "c"]
> f('a-b----c', '-')
["a", "b--c"]
> f('a-b----c-', '-')
["a", "b--c"]
> f('a-b----c-d', '-')
["a", "b--c", "d"]
> f('a-b-----c-d', '-')
["a", "b--", "c", "d"]
(If the last token is empty, it's meant to be skipped)

Can regex matches in javascript match any word after an equal operator?

I am trying to target ?state=wildcard in this statement :
?state=uncompleted&dancing=yes
I would like to target the entire line ?state=uncomplete, but also allow it to find whatever word would be after the = operator. So uncomplete could also be completed, unscheduled, or what have you.
One caveat: I could match everything up to the ampersand, but what if there is no ampersand and the state param is by itself?
Try this regular expression:
var regex = /\?state=([^&]+)/;
var match = '?state=uncompleted&dancing=yes'.match(regex);
match; // => ["?state=uncompleted", "uncompleted"]
It will match every character after the string "\?state=" except an ampersand, all the way to the end of the string, if necessary.
Alternative regex: /\?state=(.+?)(?:&|$)/
It will match everything up to the first & char or the end of the string
IMHO, you don't need regex here. For something this simple, plain string splitting is enough and arguably easier to read. Why not do something like this:
var URI = '?state=done&user=ME'.split('&');
var passedVals = [];
This gives us ['?state=done','user=ME'], now just do a for loop:
for (var i=0;i<URI.length;i++)
{
passedVals.push(URI[i].split('=')[1]);
}
passedVals will contain whatever you need. The added benefit of this is that you can parse a request into an Object:
var URI = 'state=done&user=ME'.split('&');
var urlObjects ={};
for (var i=0;i<URI.length;i++)
{
urlObjects[URI[i].split('=')[0]] = URI[i].split('=')[1];
}
I left out the '?' at the start of the string, because a simple .replace('?','') can fix that easily...
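In environments that provide URLSearchParams (modern browsers and Node.js), the same parsing can be sketched without manual splitting. The helper names below are made up for this example; URLSearchParams itself ignores a leading '?' and decodes percent-encoding.

```javascript
// Hypothetical helpers built on the standard URLSearchParams API.
function getQueryParam(query, name) {
  return new URLSearchParams(query).get(name);   // null when the key is absent
}

function queryToObject(query) {
  var result = {};
  new URLSearchParams(query).forEach(function (value, key) {
    result[key] = value;   // later duplicates overwrite earlier ones
  });
  return result;
}

console.log(getQueryParam("?state=uncompleted&dancing=yes", "state")); // "uncompleted"
console.log(getQueryParam("?state=done", "state"));                    // "done"
console.log(queryToObject("?state=done&user=ME"));                     // { state: "done", user: "ME" }
```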
You can match as many characters that are not a &. If there aren't any &s at all, that will of course also work:
/(\?state=[^&]+)/.exec("?state=uncompleted");
/(\?state=[^&]+)/.exec("?state=uncompleted&a=1");
// both: ["?state=uncompleted", "?state=uncompleted"]

Create RegExps on the fly using string variables

Say I wanted to make the following re-usable:
function replace_foo(target, replacement) {
return target.replace("string_to_replace",replacement);
}
I might do something like this:
function replace_foo(target, string_to_replace, replacement) {
return target.replace(string_to_replace,replacement);
}
With string literals this is easy enough. But what if I want to get a little more tricky with the regex? For example, say I want to replace everything but string_to_replace. Instinctually I would try to extend the above by doing something like:
function replace_foo(target, string_to_replace, replacement) {
return target.replace(/^string_to_replace/,replacement);
}
This doesn't seem to work. My guess is that it thinks string_to_replace is a string literal, rather than a variable representing a string. Is it possible to create JavaScript regexes on the fly using string variables? Something like this would be great if at all possible:
function replace_foo(target, string_to_replace, replacement) {
var regex = "/^" + string_to_replace + "/";
return target.replace(regex,replacement);
}
There's new RegExp(string, flags), where flags is a string of flag characters such as g or i. So
'GODzilla'.replace( new RegExp('god', 'i'), '' )
evaluates to
zilla
With string literals this is easy enough.
Not really! The example only replaces the first occurrence of string_to_replace. More commonly you want to replace all occurrences, in which case, you have to convert the string into a global (/.../g) RegExp. You can do this from a string using the new RegExp constructor:
new RegExp(string_to_replace, 'g')
The problem with this is that any regex-special characters in the string literal will behave in their special ways instead of being normal characters. You would have to backslash-escape them to fix that. Unfortunately, there's not a built-in function to do this for you, so here's one you can use:
function escapeRegExp(s) {
return s.replace(/[-/\\^$*+?.()|[\]{}]/g, '\\$&')
}
Note also that when you use a RegExp in replace(), the replacement string now has a special character too, $. This must also be escaped if you want to have a literal $ in your replacement text!
function escapeSubstitute(s) {
return s.replace(/\$/g, '$$$$');
}
(Four $s because that is itself a replacement string—argh!)
Now you can implement global string replacement with RegExp:
function replace_foo(target, string_to_replace, replacement) {
var relit= escapeRegExp(string_to_replace);
var sub= escapeSubstitute(replacement);
var re= new RegExp(relit, 'g');
return target.replace(re, sub);
}
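To see why both escaping steps matter, here is a small self-contained comparison. The two helpers are repeated from above so the snippet runs on its own (with the / in the character class escaped, which is equivalent); the sample strings are made up for the example.

```javascript
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}
function escapeSubstitute(s) {
  return s.replace(/\$/g, "$$$$");
}

// Without escaping, "." in the search string matches any character.
var loose = "a.c abc".replace(new RegExp("a.c", "g"), "X");
var strict = "a.c abc".replace(new RegExp(escapeRegExp("a.c"), "g"), "X");
console.log(loose);  // "X X"
console.log(strict); // "X abc"

// Without escaping, "$&" in the replacement means "the matched text".
var special = "cost".replace(/cost/, "$&");
var literal = "cost".replace(/cost/, escapeSubstitute("$&"));
console.log(special); // "cost"
console.log(literal); // "$&"
```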
What a pain. Luckily if all you want to do is a straight string replace with no additional parts of regex, there is a quicker way:
s.split(string_to_replace).join(replacement)
...and that's all. This is a commonly-understood idiom.
say I want to replace everything but string_to_replace
What does that mean, you want to replace all stretches of text not taking part in a match against the string? A replacement with ^ certainly doesn't do this, because ^ means a start-of-string token, not a negation. ^ is only a negation in [] character groups. There are also negative lookaheads (?!...), but there are problems with that in JScript so you should generally avoid it.
You might try matching ‘everything up to’ the string, and using a function to discard any empty stretch between matching strings:
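// lazy (.*?) stops at the next occurrence; 'g' repeats over the whole target
var re= new RegExp('(.*?)($|'+escapeRegExp(string_to_replace)+')', 'g');
// the callback receives the whole match followed by each captured group
return target.replace(re, function(match, before, found) {
  return before===''? found : replacement+found;
});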
Here, again, a split might be simpler:
var parts= target.split(string_to_match);
for (var i= parts.length; i-->0;)
if (parts[i]!=='')
parts[i]= replacement;
return parts.join(string_to_match);
As the others have said, use new RegExp(pattern, flags) to do this. It is worth noting that you will be passing string literals into this constructor, so every backslash will have to be escaped. If, for instance you wanted your regex to match a backslash, you would need to say new RegExp('\\\\'), whereas the regex literal would only need to be /\\/. Depending on how you intend to use this, you should be wary of passing user input to such a function without adequate preprocessing (escaping special characters, etc.) Without this, your users may get some very unexpected results.
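A quick way to check the backslash doubling is to compare the source of a regex literal with one built from a string. A minimal illustration (the variable names are just for this example):

```javascript
var fromLiteral = /\d+/;
var fromString = new RegExp("\\d+");   // "\\d+" is the three characters \d+
var forgotten = new RegExp("\d+");     // "\d" in a string literal is just "d"

console.log(fromLiteral.source === fromString.source); // true
console.log(forgotten.source);                          // "d+"
console.log(forgotten.test("123"));                     // false

// Matching one literal backslash takes four backslashes in a string literal.
console.log(new RegExp("\\\\").test("a\\b"));           // true
```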
Yes you can.
https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
function replace_foo(target, string_to_replace, replacement) {
var regex = new RegExp("^" + string_to_replace);
return target.replace(regex, replacement);
}
A really simple solution to this is this:
function replace(target, string_to_replace, replacement) {
return target.split(string_to_replace).join(replacement);
}
No need for Regexes at all
It also seems to be the fastest on modern browsers https://jsperf.com/replace-vs-split-join-vs-replaceall
I think I have a very good example for highlighting text in a string (it matches case-insensitively, but the highlighted text keeps its original case)
function getHighlightedText(basicString, filterString) {
if ((basicString === "") || (basicString === null) || (filterString === "") || (filterString === null)) return basicString;
return basicString.replace(new RegExp(filterString.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), 'gi'),
function(match)
{return "<mark>"+match+"</mark>"});
}
http://jsfiddle.net/cdbzL/1258/
