How to extract a text using regex? - javascript

My Text
1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#
Result Should Be:
JASSER-PC
anas kayyat
anas kayyat
I am using :
(?<=###)(.+)(?=#-#)
But it gives me that :
JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat

When this was written, JavaScript’s regular expressions didn’t support look-behind assertions (i.e. (?<=…) and (?<!…)); they were only added in ES2018, so in older engines you can’t use that regular expression. Either way, you can use this:
###(.+)(?=#-#)
Then just take the matched string of the first group. Additionally, to only match as little as possible, make the + quantifier non-greedy by using +?.

The group (.+) will match as much as it can (it's "greedy"). To make it find a minimal match you can use (.+?).
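To make the difference concrete, here is a small sketch that runs both quantifiers against the string from the question. The variable names are only illustrative.

```javascript
const input = '1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#';

// Greedy: .+ runs to the end of the string, then backs up to the LAST "#-#".
const greedy = /###(.+)#-#/.exec(input)[1];

// Lazy: .+? stops at the FIRST "#-#" it can reach.
const lazy = /###(.+?)#-#/.exec(input)[1];

console.log(greedy); // the over-long result shown in the question
console.log(lazy);   // JASSER-PC
```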

JavaScript engines before ES2018 do not support lookbehinds. Make the quantifier non-greedy, and use:
var regex = /###(.+?)#-#/g;
var strings = [];
var result;
while ((result = regex.exec(input)) != null) {
strings.push(result[1]);
}
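In engines that implement ES2018, lookbehind is available, so the original idea from the question also works once the quantifier is made lazy. A minimal sketch, assuming a recent browser or Node.js; input is the string from the question.

```javascript
const input = '1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#';

// (?<=###) requires "###" just before the match; (?=#-#) requires "#-#" just after.
// Neither delimiter is part of the match, so no capture group is needed.
const names = input.match(/(?<=###).+?(?=#-#)/g) || [];

console.log(names); // [ 'JASSER-PC', 'anas kayyat', 'anas kayyat' ]
```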

I'll give you a non-regex answer, since using regular expressions isn't always appropriate, be it for speed or for the readability of the regex itself:
function getText(text) {
var arr = text.split("###"); // arr now contains [1618148163,JASSER-PC#-#1125015374,anas kayyat#-#1543243035,anas kayyat#-#]
var newarr = [];
for(var i = 0; i < arr.length; i++) {
var index = arr[i].indexOf("#-#");
if(index != -1) { // if an array element doesn't contain "#-#", we ignore it
newarr.push(arr[i].substring(0, index));
}
}
return newarr;
}
Now, using
getText("1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#");
returns what you wanted.

Related

Dynamically compare accuracy of regex patterns for match

Supposing I have two regexes and both match a string, but one of them matches it in a stricter way, is there a way to figure that out programmatically?
For example, I'm matching this string:
/path/on/file/system
and I have the following two regular expressions:
const opt1 = /\/path\/on/;
const opt2 = /\/path/;
I can see with my eyes that opt1 is stricter, but how can javascript know about that?
Is converting the regex to a string and checking for character length a good measure of strictness?
You can approximate this by treating the length of the regex source as a rough measure of strictness:
Sort your regular expressions by length.
Loop through the sorted array and check each one for a match.
Return the longest matching element.
function check(arrayOfRegEx, str) {
//sort the regex array by length
var sortedArr = arrayOfRegEx.slice().sort(function(a, b) {
return a.toString().length - b.toString().length || a.toString().localeCompare(b.toString());
});
let mostStrict = '';
sortedArr.forEach(function(reg) {
// test the regex directly; there is no need to rebuild it from its string form
if(reg.test(str)) {
mostStrict = reg;
}
});
return mostStrict;
}
var result = check([/\/path/, /\/test\/test/, /\/path\/on/], '/path/on/file/system');
console.log(result); // returns /\/path\/on/
And of course you can tweak this function to fit your needs
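Source length is only a rough heuristic: a pattern such as /\/path|\/some\/long\/alternative/ is longer than /\/path\/on/ yet matches less of the example path. One alternative sketch, under the assumption that "stricter" means "consumes more of the input", ranks patterns by the length of the text they actually match. The function name mostSpecific is hypothetical.

```javascript
// Return the pattern whose first match covers the most characters of str,
// or null when nothing matches. Ties keep the earlier pattern.
function mostSpecific(patterns, str) {
  let best = null;
  let bestLength = -1;
  for (const re of patterns) {
    const m = str.match(re);
    if (m && m[0].length > bestLength) {
      best = re;
      bestLength = m[0].length;
    }
  }
  return best;
}

const candidates = [/\/path/, /\/path|\/some\/long\/alternative/, /\/path\/on/];
console.log(mostSpecific(candidates, '/path/on/file/system')); // /\/path\/on/
```

A catch-all such as /.*/ would win under this measure, so it is no more a true definition of strictness than source length is.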

How to match a string against an array of pattern in Javascript?

I've two variables:
var input = "user1#gmail.com";
var preferredPatterns = ["*#gmail.com", "*#yahoo.com", "*#live.com"];
I want to match the input against the array of preferred patterns. If any of the patterns matches, I have to perform a certain task (in this example, the input is a definite match). How can I match against an array of patterns in JavaScript?
You can compile your patterns (if they are valid regular expressions) into one for performance:
var masterPattern = new RegExp(patterns.join('|'));
Putting it all together:
var input = 'user1#gmail.com';
var preferredPatterns = [
".*#gmail.com$",
".*#yahoo.com$",
".*#live.com$"
];
var masterPattern = new RegExp(preferredPatterns.join('|'));
console.log(masterPattern.test(input));
// true
You need to use the RegExp constructor when passing a variable as a regex.
var input = 'user1#gmail.com';
var preferredPatterns = [".*#gmail\\.com$", ".*#yahoo\\.com$", ".*#live\\.com$"];
for (var i = 0; i < preferredPatterns.length; i++) {
if(input.match(RegExp(preferredPatterns[i]))) {
console.log(preferredPatterns[i])
}
}
Dot is a special meta-character in regex which matches any character. You need to escape the dot in the regex to match a literal dot.
As @zerkms said, you could also use the list of patterns below.
var preferredPatterns = ["#gmail\\.com$", "#yahoo\\.com$", "#live\\.com$"];
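If the patterns have to stay in the wildcard form from the question (*#gmail.com), one option is to translate each wildcard into a regular expression first. The sketch below assumes that * is the only wildcard and that it means "any run of characters"; wildcardToRegExp and matchesAny are hypothetical helper names.

```javascript
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters (including "."), then turn the escaped "\*" into ".*".
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\\\*/g, '.*') + '$');
}

function matchesAny(input, patterns) {
  return patterns.some(function (p) {
    return wildcardToRegExp(p).test(input);
  });
}

console.log(matchesAny('user1#gmail.com', ['*#gmail.com', '*#yahoo.com', '*#live.com'])); // true
console.log(matchesAny('user1#gmailxcom', ['*#gmail.com'])); // false
```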
Try this helper function:
/**
* Returns an integer representing the number of items in the patterns
* array that occur as a substring of str
*/
function check(str, patterns) {
return patterns.reduce(function (previous, current) {
return previous + (str.indexOf(current) == -1 ? 0 : 1);
}, 0);
}
check("user#gmail.com", ["#gmail.com", "#yahoo.com", "#live.com"]); // returns 1
check("user#live.com", ["#gmail.com", "#yahoo.com", "#live.com"]); // returns 1
If you want a general approach to matching against a list of regular expressions then some version of Avinash Raj's answer will work.
Based on the fact that you are specifying certain domains, you might want to match any valid email address using the regex here, and if it matches then check if the domain is a preferred one. There are a number of different ways you could do that of course. Here's just a simple example, splitting on the # and using jQuery.inArray() to check if the domain is preferred.
var preferredDomains = ["gmail.com", "yahoo.com", "live.com"];
function isValid(inputVal) {
var re = /^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$/i;
return re.test(inputVal) && $.inArray(inputVal.split('#')[1], preferredDomains) > -1;
}
The advantage here is that the underlying regex doesn't change, just the much easier to read/maintain list of domains. You could tweak this to capture the domain in a group, instead of using split().
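The suggestion to capture the domain in a group could look like the sketch below. It reuses the address regex from the answer, with the last two groups merged into one so that the whole domain lands in a single capture, and it swaps jQuery.inArray() for the built-in indexOf; both changes are assumptions made for this illustration. The function name isPreferred is hypothetical.

```javascript
var preferredDomains = ['gmail.com', 'yahoo.com', 'live.com'];

function isPreferred(inputVal) {
  // Group 1 is the local part, group 2 the full domain after the "#" separator.
  var re = /^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66}\.[a-z]{2,6}(?:\.[a-z]{2})?)$/i;
  var m = re.exec(inputVal);
  return m !== null && preferredDomains.indexOf(m[2].toLowerCase()) > -1;
}

console.log(isPreferred('user1#gmail.com'));   // true
console.log(isPreferred('user1#example.org')); // false
```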
Without a regexp, you may do as follows:
function matchPattern(xs, [y,...ys]){
function helper([x,...xs]){
return "*" + xs.join('') === y ? true
: xs.length ? helper(xs)
: false;
}
return helper(xs) ? y
: ys.length ? matchPattern(xs,ys)
: "No Match..!";
}
var input = "user1#gmail.com",
preferredPatterns = ["*#yahoo.com", "*#live.com", "*#gmail.com"],
result = matchPattern(input, preferredPatterns);
console.log(result);
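preferredPatterns.forEach(function(element){
// escape the dots and turn the "*" wildcard into ".*" before building the regex
var regex = new RegExp('^' + element.replace(/\./g, '\\.').replace(/\*/g, '.*') + '$');
if(regex.test(input)){
console.log('matching ' + element)
}
})
You can write your custom logic for the case where the string matches any pattern.
You may iterate through the array and then use a regex to compare the input with the individual items.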

replace non matches between delimiters

I have an input string:
12345,3244,654,ffgv,87676,988ff,87657
I'm having difficulty transforming all terms in the string that are not five-digit numbers into the constant 34567 using regular expressions. The output would look like this:
12345,34567,34567,34567,87676,34567,87657
For this, I looked at two options:
A negated character class: not useful, because a class negates single characters rather than a sequence, so an expression such as ,[^\d{5}], does not do what is needed.
Lookahead and lookbehind: the issue here is that an expression such as ,(?!\d{5}) or (?<!\d{5}), does not include the non-matched part in its result, so there is nothing to substitute.
Once the desired expression is found, it should give a result in which the non-matched part can be replaced using tagged regions like \1, \2.
Is there any mechanism in regular expression tools to achieve the output as mentioned in the above example?
Edit: I really appreciate those who have answered non-regex solutions, but I would be more thankful if you provide a regex-based solution.
You don't need regex for this. You can use str.split to split the string at commas first and then for each item check if its length is greater than or equal to 5 and it contains only digits(using str.isdigit). Lastly combine all the items using str.join.
>>> s = '12345,3244,654,ffgv,87676,988ff,87657'
>>> ','.join(x if len(x) >= 5 and x.isdigit() else '34567' for x in s.split(','))
'12345,34567,34567,34567,87676,34567,87657'
Javascript version:
function isdigit(s){
for(var i=0; i <s.length; i++){
if(!(s[i] >= '0' && s[i] <= '9')){
return false;
}
}
return true;
}
arr = "12345,3244,654,ffgv,87676,988ff,87657".split(",");
for(var i=0; i < arr.length; i++){
if(arr[i].length < 5 || ! isdigit(arr[i])) arr[i] = '34567';
}
output = arr.join(",")
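The same split, test, and join idea can be written more compactly. This sketch assumes an ES5 or later environment and, unlike the loop above, treats only terms of exactly five digits as valid; normalizeTerms is a hypothetical name.

```javascript
function normalizeTerms(csv, standIn) {
  return csv
    .split(',')
    .map(function (term) {
      // Keep a term only if it consists of exactly five digits.
      return /^\d{5}$/.test(term) ? term : standIn;
    })
    .join(',');
}

console.log(normalizeTerms('12345,3244,654,ffgv,87676,988ff,87657', '34567'));
// 12345,34567,34567,34567,87676,34567,87657
```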
Try the following: /\b(?!\d{5})[^,]+\b/g
It constrains the expression between word boundaries (\b),
followed by a negative look-ahead that rejects five-digit numbers ((?!\d{5})),
followed by any run of characters other than a comma ([^,]+).
const expression = /\b(?!\d{5})[^,]+\b/g;
const input = '12345,3244,654,ffgv,87676,988ff,87657';
const expectedOutput = '12345,34567,34567,34567,87676,34567,87657';
const output = input.replace(expression, '34567');
console.log(output === expectedOutput, expectedOutput, output);
This approach uses /\b(?:(\d{5})|(\w+))\b/g:
we match on boundaries (\b)
our first capture group captures "good strings"
our looser capture group gets the leftovers (bad strings)
our replacer() function knows the difference
const str = '12345,3244,654,ffgv,87676,988ff,87657';
const STAND_IN = '34567';
const massageString = (str) => {
// the non-capturing group makes both \b anchors apply to both alternatives
const pattern = /\b(?:(\d{5})|(\w+))\b/g;
const replacer = (match, goodstring, badstring) => {
if (goodstring) {
return goodstring;
} else {
return STAND_IN;
}
}
const r = str.replace(pattern,replacer);
return r;
};
console.log( massageString(str) );
I think the following would work for values no longer than 5 alphanumeric characters:
(,(?!\d{5})\w{1,5})
If values can be longer than 5 alphanumeric characters, drop the upper bound of 5 from the expression above:
(,(?!\d{5})\w{1,})
and you can replace using:
,34567
You can see a demo on regex101. Of course, there might be faster non-regex methods for specific languages as well (python, perl or JS)
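One limitation of the comma-anchored patterns above is that the first term has no leading comma, so it is never examined. A sketch that also covers the first term, and that treats numbers longer than five digits as invalid, follows. The replacement callback and the name replaceInvalid are assumptions made for illustration.

```javascript
function replaceInvalid(csv, standIn) {
  // (^|,)             start of string or a separating comma (kept via sep)
  // (?!\d{5}(?:,|$))  skip terms that are exactly five digits
  // [^,]+             the offending term itself
  return csv.replace(/(^|,)(?!\d{5}(?:,|$))[^,]+/g, function (match, sep) {
    return sep + standIn;
  });
}

console.log(replaceInvalid('12345,3244,654,ffgv,87676,988ff,87657', '34567'));
// 12345,34567,34567,34567,87676,34567,87657
```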

javascript:get strings between specific strings

I have a long string like this: detail?ww=hello"....detail?ww=that". I want to get all strings between detail?ww= and the next ". I use .match(/detail\?ww=.+\"/g), but the array I get contains detail?ww= and ". How can I get only the strings, without detail?ww= and "?
If JavaScript understood lookbehind, you could use that to match strings preceded by detail?ww= and followed by ". Lookbehind only arrived with ES2018, so in engines without it a little more processing is required:
var str = 'detail?ww=hello"....detail?ww=that"';
var regexG = /detail\?ww\=(.+?)\"/g;
var regex = /detail\?ww\=(.+?)\"/;
var matches = str.match(regexG).map(function(item){ return item.match(regex)[1] });
console.log(matches);
Some changes to your regexp:
+? - non-greedy quantifier.
You could do this using a basic loop :
var result = [],
s = 'detail?ww=hello"....detail?ww=that"',
r = /detail\?ww=([^"]+)/g,
m;
while (m = r.exec(s)) {
result.push(m[1]);
}
result; // ["hello", "that"]
[^"]+ : any char except double quotes, one or more times
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
Keep in mind that map is not supported by IE8 and below : http://kangax.github.io/es5-compat-table/#Array.prototype.map. If you really don't like loops but need a cross browser compatible solution, here is an alternative :
var s = 'detail?ww=hello"....detail?ww=that"';
s = s.replace(/.*?detail\?ww=([^"]+")/g, '$1').match(/[^"]+/g) || [];
s; // ["hello", "that"]
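In engines that support ES2020, String.prototype.matchAll gives the same result without an explicit loop or a second regex. A minimal sketch under that assumption:

```javascript
const s = 'detail?ww=hello"....detail?ww=that"';

// matchAll requires the g flag; each match object carries its capture groups.
const values = Array.from(s.matchAll(/detail\?ww=([^"]+)"/g), function (m) {
  return m[1];
});

console.log(values); // [ 'hello', 'that' ]
```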

Javascript regex - split string

Struggling with a regex requirement. I need to split a string into an array wherever it finds a forward slash. But not if the forward slash is preceded by an escape.
Eg, if I have this string:
hello/world
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = world
And if I have this string:
hello/wo\/rld
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = wo/rld
Any ideas?
I wouldn't use split() for this job. It's much easier to match the path components themselves, rather than the delimiters. For example:
var subject = 'hello/wo\\/rld';
var regex = /(?:[^\/\\]+|\\.)+/g;
var matched = null;
while (matched = regex.exec(subject)) {
print(matched[0]);
}
output:
hello
wo\/rld
test it at ideone.com
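Note that the matched components still carry the escaping backslash (wo\/rld). If the array from the question (wo/rld) is wanted, the escapes can be removed afterwards. A sketch, assuming a backslash escapes whichever single character follows it; splitEscaped is a hypothetical name.

```javascript
function splitEscaped(subject) {
  // Match the path components themselves rather than the delimiters.
  var parts = subject.match(/(?:[^\/\\]+|\\.)+/g) || [];
  return parts.map(function (part) {
    // Drop each escaping backslash and keep the character it protected.
    return part.replace(/\\(.)/g, '$1');
  });
}

console.log(splitEscaped('hello/wo\\/rld')); // [ 'hello', 'wo/rld' ]
```

Empty components (as in a//b) are skipped by this pattern, which may or may not be what is wanted.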
The following is a little long-winded but will work, and avoids the problem with IE's broken split implementation by not using a regular expression.
function splitPath(str) {
var rawParts = str.split("/"), parts = [];
for (var i = 0, len = rawParts.length, part; i < len; ++i) {
part = "";
while (rawParts[i].slice(-1) == "\\") {
part += rawParts[i++].slice(0, -1) + "/";
}
parts.push(part + rawParts[i]);
}
return parts;
}
var str = "hello/world\\/foo/bar";
alert( splitPath(str).join(",") );
Here's a way adapted from the techniques in this blog post:
var str = "Testing/one\\/two\\/three";
var result = str.replace(/(\\)?\//g, function($0, $1){
return $1 ? '/' : '[****]';
}).split('[****]');
Given:
Testing/one\/two\/three
The result is:
[0]: Testing
[1]: one/two/three
That first uses the simple "fake" lookbehind to replace / with [****] and to replace \/ with /, then splits on the [****] value. (Obviously, replace [****] with anything that won't be in the string.)
/*
If you are getting your string from an ajax response or a data base query,
that is, the string has not been interpreted by javascript,
you can match character sequences that either have no slash or have escaped slashes.
If you are defining the string in a script, escape the escapes and strip them after the match.
*/
var s='hello/wor\\/ld';
s=s.match(/(([^\/]*(\\\/)+)([^\/]*)+|([^\/]+))/g) || [s];
alert(s.join('\n'))
s.join('\n').replace(/\\/g,'')
/* returned value: (String)
hello
wor/ld
*/
Here's an example at rubular.com
For short code, you can use reverse to simulate negative lookbehind
function reverse(s){
return s.split('').reverse().join('');
}
var parts = reverse(myString).split(/[/](?!\\(?:\\\\)*(?:[^\\]|$))/g).reverse();
for (var i = parts.length; --i >= 0;) { parts[i] = reverse(parts[i]); }
but to be efficient, it's probably better to split on /[/]/ and then walk the array and rejoin elements that have an escape at the end.
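Where ES2018 lookbehind is available, the reversal trick is no longer needed: a negative lookbehind can be used directly in split(). The sketch below assumes a slash is escaped by exactly one preceding backslash and that backslashes themselves are never escaped (the reverse-based pattern above additionally handles runs of backslashes); splitOnUnescapedSlash is a hypothetical name.

```javascript
function splitOnUnescapedSlash(str) {
  return str
    .split(/(?<!\\)\//) // split on "/" unless the previous character is "\"
    .map(function (part) {
      return part.replace(/\\\//g, '/'); // turn "\/" back into "/"
    });
}

console.log(splitOnUnescapedSlash('hello/wo\\/rld')); // [ 'hello', 'wo/rld' ]
```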
Something like this may take care of it for you.
var str = "/hello/wo\\/rld/";
var split = str.replace(/^\/|\\?\/|\/$/g, function(match) {
if (match.indexOf('\\') == -1) {
return '\x00';
}
return match;
}).split('\x00');
alert(split);
