preg_match_all to javascript - javascript

Can anyone help converting this PHP regex pattern to a JavaScript compatible one?
PHP
preg_match_all('/(?ims)([a-z0-9\s\.\:#_\-#,]+)\{([^\}]*)\}/', $data, $matches);
Javascript
matches=data.match(/(?ims)([a-z0-9\s\.\:#_\-#,]+)\{([^\}]*)\}/);
Thanks

To mimic the PHP output, you need to look at more than only the regular expression. For a single match, the JavaScript match method will do the trick, but for multiple matches it will no longer return the captured groups. In fact, there is no out-of-the-box statement in JavaScript that is equivalent to preg_match_all.
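A small sketch of that difference, using the sample string from further down in this answer. The variable names are only illustrative, and the pattern is the one from the question with the inline modifiers removed:

```javascript
// Illustrative names; not part of the original question.
const sample = "Alpha{78} Beta{333}";

// Without the g flag, match returns the first match followed by its captured groups.
const single = sample.match(/([a-z0-9\s.:#_\-,]+)\{([^}]*)\}/i);
console.log(Array.from(single)); // ["Alpha{78}", "Alpha", "78"]

// With the g flag, match returns every full match but drops the groups.
const many = sample.match(/([a-z0-9\s.:#_\-,]+)\{([^}]*)\}/gi);
console.log(many); // ["Alpha{78}", " Beta{333}"]
```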
The method that comes closest is regex.exec, being the only method that can both return the captured groups and multiple matches. But it does not return all the matches in one go. Instead you need to iterate over them, for instance like this:
for (var matches = [], result; (result = regex.exec(data)); matches.push(result));
The regular expression also needs some adjustments:
In JavaScript the modifiers (ims) cannot be specified inline at the start of the pattern as you have done; they must be specified at the end, after the closing slash. Note that in PHP you can do the same.
Secondly, JavaScript had no support for the s modifier before ES2018. But in your case this is not a problem, as your regular expression does not rely on it: it contains no unescaped dot, so you would get the same results without that modifier.
In order for the exec method to return multiple matches, the regular expression must use the g modifier, which is neither needed nor allowed in PHP's preg_match_all; the _all in the function name already takes care of that.
So, in JavaScript, the regular expression will be defined like this:
var regex = /([a-z0-9\s\.\:#_\-#,]+)\{([^\}]*)\}/gim;
Finally, the matches are returned in a different format than in PHP. Let's say the data is "Alpha{78} Beta{333}", then PHP will return:
[["Alpha{78}"," Beta{333}"],["Alpha"," Beta"],["78","333"]]
But the above JavaScript code returns that data in a transposed way (rows and columns are swapped):
[["Alpha{78}","Alpha","78"],[" Beta{333}"," Beta","333"]]
So, if you also want that to be the same, you need to transpose that array. Here is a generic transpose function you could use for that:
function transpose(a) {
    return a[0].map(function (val, c) {
        return a.map(function (r) {
            return r[c];
        });
    });
}
So putting it all together, this will do the job:
function transpose(a) {
    return a[0].map(function (val, c) {
        return a.map(function (r) {
            return r[c];
        });
    });
}
// test data:
var data = "Alpha{78} Beta{333}";
var regex = /([a-z0-9\s\.\:#_\-#,]+)\{([^\}]*)\}/gim;
// collect all matches in array
for (var matches = [], result; (result = regex.exec(data)); matches.push(result));
// reorganise the array to mimic PHP output:
matches = transpose(matches);
console.log(JSON.stringify(matches));
Output:
[["Alpha{78}"," Beta{333}"],["Alpha"," Beta"],["78","333"]]
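In engines that support String.prototype.matchAll (ES2020), the exec loop and the transpose step can be folded into one helper. This is only a sketch of an alternative, not the approach used above: pregMatchAll is a hypothetical name, and the pattern is the one from the question with redundant escapes removed.

```javascript
// Hypothetical helper (not a built-in): returns matches grouped per capture group,
// in the same layout as PHP's default preg_match_all output.
function pregMatchAll(regex, data) {
  // matchAll requires the g flag and throws a TypeError without it.
  const rows = Array.from(data.matchAll(regex), (m) => Array.from(m));
  if (rows.length === 0) return [];
  // Transpose: one array per group instead of one array per match.
  return rows[0].map((_, col) => rows.map((row) => row[col]));
}

const grouped = pregMatchAll(/([a-z0-9\s.:#_\-,]+)\{([^}]*)\}/gim, "Alpha{78} Beta{333}");
console.log(JSON.stringify(grouped));
// [["Alpha{78}"," Beta{333}"],["Alpha"," Beta"],["78","333"]]
```

For input with no matches this sketch returns an empty array.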

Related

Regex to define the number of appearances substituted [duplicate]

I'd like to know how to replace a capture group with its uppercase in JavaScript. Here's a simplified version of what I've tried so far that's not working:
> a="foobar"
'foobar'
> a.replace( /(f)/, "$1".toUpperCase() )
'foobar'
> a.replace( /(f)/, String.prototype.toUpperCase.apply("$1") )
'foobar'
Would you explain what's wrong with this code?
You can pass a function to replace.
var r = a.replace(/(f)/, function(v) { return v.toUpperCase(); });
Explanation
a.replace( /(f)/, "$1".toUpperCase())
In this example you pass a string to the replace function. Since you are using the special replacement syntax ($N inserts the Nth capture), you are simply replacing the match with itself. The toUpperCase call is deceptive because it only upper-cases the replacement string itself, which is pointless: the $ and 1 characters have no upper case, so the value passed to replace is still "$1".
a.replace( /(f)/, String.prototype.toUpperCase.apply("$1"))
Believe it or not the semantics of this expression are exactly the same.
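A short sketch of the order of evaluation described above (the variable name replacement is only illustrative):

```javascript
const a = "foobar";

// The argument is evaluated before replace runs, and "$1" has no letters to change.
const replacement = "$1".toUpperCase();
console.log(replacement); // "$1"

// So this call substitutes the captured "f" for itself.
console.log(a.replace(/(f)/, replacement)); // "foobar"

// A function is called once per match, so it can transform the matched text.
console.log(a.replace(/(f)/, (match) => match.toUpperCase())); // "Foobar"
```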
I know I'm late to the party but here is a shorter method that is more along the lines of your initial attempts.
a.replace('f', String.call.bind(a.toUpperCase));
So where did you go wrong and what is this new voodoo?
Problem 1
As stated before, you were attempting to pass the result of a called method as the second parameter of String.prototype.replace(), when instead you ought to be passing a reference to a function.
Solution 1
That's easy enough to solve. Simply removing the parameters and parentheses will give us a reference rather than executing the function.
a.replace('f', String.prototype.toUpperCase.apply)
Problem 2
If you attempt to run the code now you will get an error stating that undefined is not a function and therefore cannot be called. This is because String.prototype.toUpperCase.apply is actually a reference to Function.prototype.apply() via JavaScript's prototypal inheritance. So what we are actually doing looks more like this
a.replace('f', Function.prototype.apply)
Which is obviously not what we have intended. How does it know to run Function.prototype.apply() on String.prototype.toUpperCase()?
Solution 2
Using Function.prototype.bind() we can create a copy of Function.prototype.apply with its context specifically set to String.prototype.toUpperCase. We now have the following
a.replace('f', Function.prototype.apply.bind(String.prototype.toUpperCase))
Problem 3
The last issue is that String.prototype.replace() will pass several arguments to its replacement function. However, Function.prototype.apply() expects the second parameter to be an array but instead gets either a string or number (depending on if you use capture groups or not). This would cause an invalid argument list error.
Solution 3
Luckily, we can simply substitute in Function.prototype.call() (which accepts any number of arguments, none of which have type restrictions) for Function.prototype.apply(). We have now arrived at working code!
a.replace(/f/, Function.prototype.call.bind(String.prototype.toUpperCase))
Shedding bytes!
Nobody wants to type prototype a bunch of times. Instead we'll leverage the fact that we have objects that reference the same methods via inheritance. The String constructor, being a function, inherits from Function's prototype. This means that we can substitute in String.call for Function.prototype.call (actually we can use Date.call to save even more bytes but that's less semantic).
We can also leverage our variable 'a': since its prototype includes a reference to String.prototype.toUpperCase, we can swap that out for a.toUpperCase. Combining the 3 solutions above with these byte-saving measures gives the code at the top of this post.
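To make the chain of substitutions concrete, here is a minimal sketch; the name upper is only illustrative:

```javascript
const a = "foobar";

// String inherits call from Function.prototype, so both names refer to the same function.
console.log(String.call === Function.prototype.call); // true
// The instance method is the one that lives on String.prototype.
console.log(a.toUpperCase === String.prototype.toUpperCase); // true

// Binding call to toUpperCase yields a function whose first argument becomes `this`.
const upper = Function.prototype.call.bind(String.prototype.toUpperCase);

// replace invokes upper(match, offset, wholeString), which amounts to
// String.prototype.toUpperCase.call(match, offset, wholeString); the extras are ignored.
console.log(upper("f", 0, a)); // "F"
console.log(a.replace(/f/, upper)); // "Foobar"
console.log(a.replace("f", String.call.bind(a.toUpperCase))); // "Foobar"
```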
Why don't we just look up the definition?
If we write:
a.replace(/(f)/, x => x.toUpperCase())
we might as well just say:
a.replace('f','F')
Worse, I suspect nobody realises that their examples have been working only because they were capturing the whole regex with parentheses. If you look at the definition, the first parameter passed to the replacer function is actually the whole matched pattern and not the pattern you captured with parentheses:
function replacer(match, p1, p2, p3, offset, string)
If you want to use the arrow function notation:
a.replace(/xxx(yyy)zzz/, (match, p1) => p1.toUpperCase())
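The following sketch records what the replacer actually receives; the input string and the seen array are made up for illustration. Note that the return value replaces the whole match, so the xxx and zzz parts disappear from the result:

```javascript
const seen = [];
const result = "aaxxxyyyzzzbb".replace(/xxx(yyy)zzz/, (match, p1, offset, string) => {
  // match is the whole matched text, p1 the first captured group.
  seen.push(match, p1, offset, string);
  return p1.toUpperCase();
});

console.log(seen); // ["xxxyyyzzz", "yyy", 2, "aaxxxyyyzzzbb"]
console.log(result); // "aaYYYbb"
```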
Old post, but it is worth extending @ChaosPandion's answer to other use cases with a more restrictive RegEx, e.g. ensuring that the (f) capturing group is surrounded by a specific format such as /z(f)oo/:
> a="foobazfoobar"
'foobazfoobar'
> a.replace(/z(f)oo/, function($0,$1) {return $0.replace($1, $1.toUpperCase());})
'foobazFoobar'
// Improve the RegEx so `(f)` will only get replaced when it begins with a dot or new line, etc.
I just want to highlight that the two parameters of the function make it possible to find a specific format and replace only the capturing group within it.
SOLUTION
a.replace(/(f)/,(m,g)=>g.toUpperCase())
To replace all occurrences of the group, use the /(f)/g regexp. The problem in your code: String.prototype.toUpperCase.apply("$1") and "$1".toUpperCase() both give "$1" (try it in the console yourself), so in fact you call a.replace( /(f)/, "$1") twice, which changes nothing.
let a= "foobar";
let b= a.replace(/(f)/,(m,g)=>g.toUpperCase());
let c= a.replace(/(o)/g,(m,g)=>g.toUpperCase());
console.log("/(f)/ ", b);
console.log("/(o)/g", c);
Given a dictionary (an object, in this case a Map) of properties and values, and using .bind() as described in other answers:
const regex = /([A-Za-z0-9]+)/;
const dictionary = new Map([["hello", 123]]);
let str = "hello";
str = str.replace(regex, dictionary.get.bind(dictionary));
console.log(str);
Using a plain JavaScript object, with a function defined on it that returns the matched property's value, or the original string if no match is found:
const regex = /([A-Za-z0-9]+)/;
const dictionary = {
"hello": 123,
[Symbol("dictionary")](prop) {
return this[prop] || prop
}
};
let str = "hello";
str = str.replace(regex, dictionary[Object.getOwnPropertySymbols(dictionary)[0]].bind(dictionary));
console.log(str);
To convert a string from CamelCase to snake_case (e.g. for filenames), use a callback with a ternary operator.
The group captured with () in the regexp given as the first (left) argument of replace is passed to the second (right) argument, which is a callback function.
x is the whole match and y is the first captured group; they hold the same string here because the parentheses enclose the entire pattern. index (the third parameter) is the position where the match begins in the original string.
Therefore a ternary operator can be used to avoid placing _ before the first occurrence.
let str = 'MyStringName';
str = str.replace(/([^a-z0-9])/g, (x,y,index) => {
    return index != 0 ? '_' + x.toLowerCase() : x.toLowerCase();
});
console.log(str);

How can I perform a global replace on a variable?

I have a basic replace function, but I need it to perform a global replace, as it seems to be stopping on the first instance. I do not want to do it with a Regex. Applying the global attribute seems easy enough in most examples, but I am passing in a variable as the value to be replaced, and /g is having no impact. What am I doing wrong? Here is the example without the /g:
test string
"Why is my ^%friend so ^%? Maybe I need a ^!% one, abrand^!% one"
Simple replace function
function translate(oddStr) {
var tagDictionary = {};
tagDictionary['^%'] = 'odd';
tagDictionary['^!%'] = 'new';
Object.keys(tagDictionary).forEach( function (tag) {
oddStr = oddStr.replace(tag, tagDictionary[tag]);
});
return oddStr;
};
This function replaces only the first instance of each tag, as expected. How can I apply /g to the tag variable in the forEach?
Use a split-join combo like this:
oddStr = oddStr.split(tag).join(tagDictionary[tag]);
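Dropped into the function from the question, the split-join line could look like the sketch below. Because split takes the tag as a literal string, no regex escaping is involved. Newer engines also provide String.prototype.replaceAll, which accepts a plain string in the same way.

```javascript
function translate(oddStr) {
  const tagDictionary = { "^%": "odd", "^!%": "new" };
  for (const tag of Object.keys(tagDictionary)) {
    // split cuts the string at every literal occurrence of tag;
    // join glues the pieces back together with the replacement in between.
    oddStr = oddStr.split(tag).join(tagDictionary[tag]);
  }
  return oddStr;
}

console.log(translate("Why is my ^%friend so ^%? Maybe I need a ^!% one, abrand^!% one"));
// "Why is my oddfriend so odd? Maybe I need a new one, abrandnew one"
```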
"Why is my ^% friend so ^%? Maybe I need a ^!% one, abrand ^!% one".replace(/\^%/g, 'odd').replace(/\^!%/g, 'new')
"Why is my odd friend so odd? Maybe I need a new one, abrand new one"
If you need to create the regular expression from string, you can use RegExp constructor: new RegExp('\\^%', 'g').
If you don't have control over the tag-dictionary and it is coming from some external resource, then you will have to properly escape the tags.
Instead of using ad hoc symbols for templating, you should ideally use something like lodash.template.
You need to escape the regex special characters (^ means start of string):
function translate(oddStr) {
    var tagDictionary = {
        '\\^%' : "odd",
        '\\^!%' : 'new'
    };
    Object.keys(tagDictionary).forEach( function (tag) {
        var r = new RegExp(tag, "g");
        oddStr = oddStr.replace(r, tagDictionary[tag]);
    });
    return oddStr;
};
console.log(translate("Why is my ^%friend so ^%? Maybe I need a ^!% one, a brand ^!% one"));
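If the tags cannot be escaped by hand, for example because the dictionary comes from elsewhere, they can be escaped programmatically before building the RegExp. The sketch below assumes two hypothetical helpers, escapeRegExp and translateTags; neither is a built-in.

```javascript
// Hypothetical helper: prefixes every regex metacharacter with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replaces every occurrence of each tag in the dictionary.
function translateTags(oddStr, tagDictionary) {
  for (const tag of Object.keys(tagDictionary)) {
    const pattern = new RegExp(escapeRegExp(tag), "g");
    // A function replacement keeps "$" sequences in the value from being interpreted.
    oddStr = oddStr.replace(pattern, () => tagDictionary[tag]);
  }
  return oddStr;
}

const tags = { "^%": "odd", "^!%": "new" };
console.log(translateTags("Why is my ^%friend so ^%? Maybe I need a ^!% one, a brand ^!% one", tags));
// "Why is my oddfriend so odd? Maybe I need a new one, a brand new one"
```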

syntax for javascript reverse function

hey guys I understand that the following will work for reversing a string passed to the function:
function reverseString(str) {
return str.split('').reverse().join('');
}
reverseString("hello");
however can someone help me understand why the following won't work?
function reverseString(str) {
str.split(' ');
str.reverse();
str.join(' ');
return str;
}
Those functions don't modify the string; strings are immutable. The functions return new values.
So, a statement like
str.split('');
is a valid statement, but it has no net effect because the returned array is ignored. In the first example you quoted, the returned values are used as object contexts from which the subsequent functions are accessed and called. The return statement there returns the result of the last function call in the chain (the .join() call).
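A quick sketch of both points: the original string is left untouched by split, and a string has no reverse or join method at all, which is why the second function in the question throws a TypeError at str.reverse(). The variable names are illustrative.

```javascript
const str = "hello";

// split returns a new array; the string itself is unchanged.
const chars = str.split("");
console.log(str); // "hello"
console.log(chars); // ["h", "e", "l", "l", "o"]

// reverse and join exist on arrays, not on strings.
console.log(typeof str.reverse); // "undefined"
console.log(typeof str.join); // "undefined"
console.log(typeof chars.reverse); // "function"
```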
Try using var. If the expected result is to redefine str, assign it with str = /* new value */:
function reverseString(str) {
var copy = str.split("");
copy.reverse();
str = copy.join("");
return str;
}
console.log(reverseString("hello"))
Firstly, strings are immutable. You create the new string you need by calling methods that return a value, as the first function does.
The first function calls the methods in a chain, meaning that the return value of one call is the object on which the next method is invoked.
str.split('').reverse().join('');
Here, split returns an array, the array's reverse method reverses the order of its elements, and finally join concatenates the array elements with '' (the empty string) as separator.
The second function, by contrast, is just a sequence of independent statements whose return values are discarded. In addition, str is still a string there, and strings have no reverse or join method, so str.reverse() throws a TypeError.
The method calls are chained so that each method uses the return value from the previous. You can do the same in separate statements if you keep the return value so that you can use it in the next statement:
function reverseString(str) {
var arr = str.split('');
arr = arr.reverse();
var result = arr.join('');
return result;
}
console.log(reverseString("hello"));
Note also that there is a difference between split('') and split(' '). The first one splits between each character while the second one splits at space characters.

Intersecting texts to find common words

I'm trying to find the best way of intersecting a set of texts to find the words they have in common. Given this scenario:
var t1 = 'My name is Mary-Ann, and I come from Kansas!';
var t2 = 'John, meet Mary, she comes from far away';
var t3 = 'Hi Mary-Ann, come here, nice to meet you!';
intersection result should be:
var result =["Mary"];
It should be able to ignore punctuation marks like .,!?-
Would a solution with regular expressions be optimal?
Here's a tested solution :
function intersect() {
    var set = {};
    [].forEach.call(arguments, function(a,i){
        var tokens = a.match(/\w+/g);
        if (!i) {
            tokens.forEach(function(t){ set[t]=1 });
        } else {
            for (var k in set){
                if (tokens.indexOf(k)<0) delete set[k];
            }
        }
    });
    return Object.keys(set);
}
This function is variadic, you can call it with any number of texts :
console.log(intersect(t1, t2, t3)) // -> ["Mary"]
console.log(intersect(t1, t2)) // -> ["Mary", "from"]
console.log(intersect()) // -> []
If you need to support non-English languages, then this regex won't be enough, because \w matches only ASCII letters, digits and the underscore. Either you use a regex library or you define your regex by explicitly excluding characters, as in a.match(/[^\s\-.,!?]+/g); (this will probably be enough for you).
Detailed explanation :
The idea is to fill a set with the tokens of the first text and then remove from the set the tokens missing in the other texts.
The set is a JavaScript object used as a map. Some purists would have used Object.create(null) to avoid a prototype, I like the simplicity of {}.
As I want my function to be variadic, I use arguments instead of defining the passed texts as explicit arguments.
arguments isn't a real array, so to iterate over it you need either a for loop or a trick like [].forEach.call. It works because arguments is "array-like".
To tokenize, I simply use match to match words, nothing special here (see note above regarding better support of other languages, though)
I use !i to check if it's the first text. In that case, I simply copy the tokens as properties in the set. A value must be used, I use 1. In the future, ES6 sets will make the intent more obvious here.
For the following texts, I iterate over the elements of the sets (the keys) and I remove the ones which are not in the array of tokens (tokens.indexOf(k)<0)
Finally, I return the elements of the sets because we want an array. The simplest solution is to use Object.keys.
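For comparison, the same idea can be sketched with ES6 Set and rest parameters. This is an alternative formulation under the same tokenizing assumption (\w+), not the tested solution above; intersectWords is a hypothetical name.

```javascript
// Hypothetical ES6 variant of the function above.
function intersectWords(...texts) {
  if (texts.length === 0) return [];
  // One Set of tokens per text; match returns null when a text has no words.
  const tokenSets = texts.map((text) => new Set(text.match(/\w+/g) || []));
  const [first, ...rest] = tokenSets;
  // Keep the tokens of the first text that occur in every other text.
  return [...first].filter((word) => rest.every((set) => set.has(word)));
}

const t1 = 'My name is Mary-Ann, and I come from Kansas!';
const t2 = 'John, meet Mary, she comes from far away';
const t3 = 'Hi Mary-Ann, come here, nice to meet you!';

console.log(intersectWords(t1, t2, t3)); // ["Mary"]
console.log(intersectWords(t1, t2)); // ["Mary", "from"]
console.log(intersectWords()); // []
```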

Should I regexp.test before I string.replace?

When I want to replace some parts of a string, should I call replace directly like this?
var r1 = /"\+((:?[\w\.]+)(:?(:?\()(:?.*?)(:?\))|$){0,1})\+"/g;
arg = arg.replace(r1, function(outer, inner){
return eval(inner);
});
Or test for a match first, and then replace if it's a hit, like this?
var r1 = /"\+((:?[\w\.]+)(:?(:?\()(:?.*?)(:?\))|$){0,1})\+"/g;
if (r1.test(arg)) {
arg = arg.replace(r1, function(outer, inner){
return eval(inner);
});
}
I guess this boils down to how the string.replace(regex, string) function works. Will it go into my callback even if there is no match, or will it then simply return arg? In that case I assume the calling replace directly is the right way to go to avoid having the regex engine match the string twice?
You don't have to use test. The function passed to replace is only executed when a match occurs.
No matches: No function (eval) call
1 match: 1 call
2 matches: 2 calls
etc.
Also, why are you using eval? eval executes the parameter, as if it's a JavaScript expression. Since you know the input format, it's likely that you're able to achieve the same behaviour without eval.
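The call counts listed above can be checked with a small sketch; the counter and the sample strings are made up for illustration:

```javascript
let calls = 0;
const shout = (match) => {
  calls += 1; // count how often replace invokes the callback
  return match.toUpperCase();
};

// No match: the callback is never called and the string comes back unchanged.
const untouched = "abc".replace(/x/g, shout);
console.log(untouched, calls); // "abc" 0

// Two matches: two calls.
const changed = "axbx".replace(/x/g, shout);
console.log(changed, calls); // "aXbX" 2
```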
