I have a problem with a regex and need some help.
Currently my URLs have the form
/search/:year/:month/:day/xxxx, where :month and :day may or may not be present. I need to replace the /search/:year/:month/:day pattern in the URL with an empty string, i.e. keep only the remaining part of the URL. Some examples:
1.'/search/2017/02/03/category/gun' => '/category/gun'
2.'/search/2017/02/03/' => '/'
3.'/search/2017/01/category/gun' => '/category/gun/'
4.'/search/2017/category/gun/' => '/category/gun/'
5.'/search/2018/?category=gun&type%5B%5D=sendo' => '/?category=gun&type%5B%5D=sendo/'
I tried regex = /^\/search\/((?:\d{4}))(?:\/((?:\d|1[012]|0[1-9])))?(?:\/((?:[0-3]\d)))/
But it fails for the case /search/2017/category/gun/
const regex = /^\/search\/((?:\d{4}))(?:\/((?:\d|1[012]|0[1-9])))?(?:\/((?:[0-3]\d)))/
const testLink = [
'/search/2017/02/03/category/gun/',
'/search/2017/01/category/gun/',
'/search/2017/category/gun/',
'/search/2017/02/03/category/gun/',
'/search/2018/?category=gun&type%5B%5D=sendo'
]
testLink.forEach((value, i) => {
console.log(value.replace(regex, ''))
console.log('-------------------')
})
Use this regex pattern (\/search\/.*\d+)(?=\/)
Demo
const regex = /(\/search\/.*\d+)(?=\/)/g;
const testLink = [
'/search/2017/02/03/category/gun/',
'/search/2017/01/category/gun/',
'/search/2017/',
'/search/2017/02/03/category/gun/',
'/search/2018/?category=gun&type%5B%5D=sendo'
]
testLink.forEach((value, i) => {
console.log(value.replace(regex, ''))
console.log('-------------------')
})
Splitting on the following pattern seems to work:
\d{4}(?:/\d{2})*
This would place the target you want as the second element in the split array, with the first portion being what precedes the year-digit portions of the path.
input = '/search/2017/02/03/category/gun';
parts = input.split(/\d{4}(?:\/\d{2})*/);
console.log(parts[1]);
console.log('/search/2017/02/03'.split(/\d{4}(?:\/\d{2})*/)[1]);
console.log('/search/2017/01/category/gun'.split(/\d{4}(?:\/\d{2})*/)[1]);
console.log('/search/2017/category/gun/'.split(/\d{4}(?:\/\d{2})*/)[1]);
Whether or not you expect a final trailing path separator is not entirely clear. In any case, the above answer can be modified to match that expectation.
Here's one way:
strings = [
'/search/2017/02/03/category/gun',
'/search/2017/02/03/',
'/search/2017/01/category/gun',
'/search/2017/category/gun/',
];
for (var i = 0; i < strings.length; i++) {
alert(strings[i].replace(/\/search\/[0-9\/]+(\/)(category\/[^\/$]+)?.*/,'$1$2'));
}
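You can use the same regex, but reverse the order of the alternatives in this part: (?:\d|1[012]|0[1-9])
so that it becomes (?:0[1-9]|1[012]|\d), and make the last group optional: (?:\/((?:[0-3]\d)))? The order matters once the day group is optional: with \d first, the month group is satisfied by just the first digit of a month such as 02.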
const regex = /^\/search\/((?:\d{4}))(?:\/((?:0[1-9]|1[012]|\d)))?(?:\/((?:[0-3]\d)))?/
const testLink = [
'/search/2017/02/03/category/gun/',
'/search/2017/01/category/gun/',
'/search/2017/category/gun/',
'/search/2017/02/03/category/gun/'
]
output
/category/gun/
/category/gun/
/category/gun/
/category/gun/
If you are sure of the date format, you can simplify the expression to ^\/search\/\d{4}(?:\/\d{1,2}){0,2}
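As a quick check, the simplified expression can be run against the five examples from the question. This is only a sketch: the helper name stripDatePrefix is hypothetical, and the (?=\/|$) lookahead is an assumption added here so that a longer number such as /search/2017/12345 is not cut in half.

```javascript
// Hypothetical helper: removes /search/:year[/:month[/:day]] from the start of a path.
// Assumption: the lookahead (?=\/|$) is not part of the answer above; it only accepts
// a month or day segment when it is followed by "/" or the end of the string.
const datePrefix = /^\/search\/\d{4}(?:\/\d{1,2}(?=\/|$)){0,2}/;
const stripDatePrefix = (path) => path.replace(datePrefix, '');

const examples = [
  '/search/2017/02/03/category/gun',
  '/search/2017/02/03/',
  '/search/2017/01/category/gun',
  '/search/2017/category/gun/',
  '/search/2018/?category=gun&type%5B%5D=sendo',
];
examples.forEach((path) => console.log(stripDatePrefix(path)));
```

Whatever follows the date segments, including a trailing slash or a query string, is returned unchanged.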
I want to know an algorithm for the question below, in JavaScript.
Check whether the given word can be turned into "programming" by removing a substring from it. You can remove only one substring from the given word.
Answer with 'yes' or 'no'.
example answer explanation
"progabcramming" yes remove substring 'abc'
"programmmeding" yes remove substring 'med'
"proasdgrammiedg" no you have to remove 2 substrings, 'asd' and 'ied',
which is not allowed
"pxrogramming" yes remove substring 'x'
"pxrogramminyg" no you have to remove 2 substrings, 'x' and 'y',
which is not allowed
Please suggest an algorithm to solve it.
{
// will create a regexp for fuzzy search
const factory = (str) => new RegExp(str.split('').join('(.*?)'), 'i')
const re = factory('test') // re = /t(.*?)e(.*?)s(.*?)t/i
const matches = re.exec('te-abc-st') ?? [] // an array of captured groups
const result = matches
.slice(1) // first element is a full match, we don't need it
.filter(group => group.length) // we're also not interested in empty matches
// now result contains a list of captured groups
// in this particular example a single '-abc-'
}
I'm not sure how efficient this code is, but the only thing I can come up with is using a regular expression.
const word = 'programming';
const test = ['progabcramming', 'programmmeding', 'proasdgrammiedg', 'pxrogramming', 'pxrogramminyg', 'programming'];
// create regular expression manually
// const regexp = /^(?:p.+rogramming|pr.+ogramming|pro.+gramming|prog.+ramming|progr.+amming|progra.+mming|program.+ming|programm.+ing|programmi.+ng|programmin.+g)$/;
// create regular expression programmatically
const alternatives = [];
for (let i = 1; i < word.length; i++) {
  // prefix, at least one removable character, then the rest of the word
  alternatives.push(`${word.substring(0, i)}.+${word.substring(i)}`);
}
// wrap the alternation in a group so that ^ and $ apply to every branch
const regexp = new RegExp(`^(?:${alternatives.join('|')})$`);
// content output
let content = '';
test.forEach(element => {
content += `${element}: ${regexp.test(element)}\n`;
});
document.body.innerText = content;
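The same check can also be sketched without a regular expression, by comparing the common prefix and the common suffix of the two words: if together they cover the whole target, the part in between is the single substring to remove. The function name canBecome is hypothetical, and treating an input that already equals the target as 'yes' (nothing needs to be removed) is an assumption the question does not settle.

```javascript
// Hypothetical helper: can `input` become `target` by removing at most one contiguous substring?
// Assumption: an input that already equals the target counts as a match.
const canBecome = (input, target = 'programming') => {
  // removing characters can never make the input longer
  if (input.length < target.length) return false;

  // length of the common prefix
  let prefix = 0;
  while (prefix < target.length && input[prefix] === target[prefix]) prefix++;

  // length of the common suffix
  let suffix = 0;
  while (
    suffix < target.length &&
    input[input.length - 1 - suffix] === target[target.length - 1 - suffix]
  ) suffix++;

  // the kept prefix and suffix together must cover the whole target
  return prefix + suffix >= target.length;
};

['progabcramming', 'programmmeding', 'proasdgrammiedg', 'pxrogramming', 'pxrogramminyg']
  .forEach((w) => console.log(w, canBecome(w) ? 'yes' : 'no'));
```

This makes a single pass from each end of the word, so the work grows linearly with the length of the target.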
Lowercase everything after first appearance of the character in a string in JS
One option is using regular expression:
str.replace(/\.([^.]*?)$/, (m) => m.toLowerCase())
What you can do is split the string at ".", convert the last part with .toLowerCase(), and finally .join() everything back together.
const t = 'qwery.ABC.ABC';
const parts = t.split(".");
console.log(parts.slice(0, -1).join(".") + "." + parts[parts.length - 1].toLowerCase());
One could argue whether that would actually be a cleaner variant. What usually isn't a bad idea for code readability is writing a utility function for that use case.
const t = "qwery.ABC.ABC";
const lastBitToLowerCase = (text, separator) => {
const parts = text.split(separator); // use the function argument, not the outer variable t
return `${parts.slice(0, -1).join(separator)}${separator}${parts[
parts.length - 1
].toLowerCase()}`;
};
const result = lastBitToLowerCase(t, "."); // "qwery.ABC.abc"
Regex using negative lookahead:
const re = /\.((?:.(?!\.))+)$/;
const inputs = [
"qwerty.ABC.ABC",
"yuiop.uu",
"QWERT.YUIOP"
];
inputs.forEach(input => {
const result = input.replace(re, x => x.toLowerCase());
console.log(input, "-->", result);
});
Regex described here: https://regexr.com/6qk6r
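For comparison, the same result can be sketched with lastIndexOf() and slice(), without a regex or an intermediate array. The function name lowerAfterLast is hypothetical, and returning the input unchanged when the separator is missing is an assumption.

```javascript
// Hypothetical helper: lowercases everything from the last occurrence of `separator` onwards.
// Assumption: a string without the separator is returned unchanged.
const lowerAfterLast = (text, separator = '.') => {
  const index = text.lastIndexOf(separator);
  if (index === -1) return text;
  return text.slice(0, index) + text.slice(index).toLowerCase();
};

['qwerty.ABC.ABC', 'yuiop.uu', 'QWERT.YUIOP'].forEach((input) => {
  console.log(input, '-->', lowerAfterLast(input));
});
```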
I have the following long string. How do I extract all the values that are in between "url=" and "," so that I then have the following array?
"load("#bazel_tools//tools/build_defs/repo:http.bzl","http_jar")definclude_java_deps():http_jar(name="com_google_inject_guice",sha256="b378ffc35e7f7125b3c5f3a461d4591ae1685e3c781392f0c854ed7b7581d6d2",url="https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar",)http_jar(name="org_sonatype_sisu_inject_cglib",sha256="42e1dfb26becbf1a633f25b47e39fcc422b85e77e4c0468d9a44f885f5fa0be2",url="https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar",)http_jar(name="javax_inject_javax_inject",sha256="91c77044a50c481636c32d916fd89c9118a72195390452c81065080f957de7ff",url="https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",)"
[
https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar,
https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar,
https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar
]
I've tried the following, but it only gives me the first occurrence of it, but I need them all. Thanks!
var arr = contents.split('url=').pop().split(',')
for(i in arr) {
console.log(arr[i]);
}
You can solve this by using a Regular Expression
const regEx = /(?:url=")([^,]+)(?:",)/gm;
const string = 'load("#bazel_tools//tools/build_defs/repo:http.bzl","http_jar")definclude_java_deps():http_jar(name="com_google_inject_guice",sha256="b378ffc35e7f7125b3c5f3a461d4591ae1685e3c781392f0c854ed7b7581d6d2",url="https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar",)http_jar(name="org_sonatype_sisu_inject_cglib",sha256="42e1dfb26becbf1a633f25b47e39fcc422b85e77e4c0468d9a44f885f5fa0be2",url="https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar",)http_jar(name="javax_inject_javax_inject",sha256="91c77044a50c481636c32d916fd89c9118a72195390452c81065080f957de7ff",url="https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",)'
const matches = string.matchAll(regEx);
for (const match of matches) {
console.log(match[1]);
}
Or with string methods
const string = 'load("#bazel_tools//tools/build_defs/repo:http.bzl","http_jar")definclude_java_deps():http_jar(name="com_google_inject_guice",sha256="b378ffc35e7f7125b3c5f3a461d4591ae1685e3c781392f0c854ed7b7581d6d2",url="https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar",)http_jar(name="org_sonatype_sisu_inject_cglib",sha256="42e1dfb26becbf1a633f25b47e39fcc422b85e77e4c0468d9a44f885f5fa0be2",url="https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar",)http_jar(name="javax_inject_javax_inject",sha256="91c77044a50c481636c32d916fd89c9118a72195390452c81065080f957de7ff",url="https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",)'
const arr = string.split('url="');
const urls = arr
.filter((subStr) => subStr.includes('https://'))
.map((subStr) => subStr.split('",)')[0]);
console.log(urls);
The RegEx solution is of course far more flexible.
Be aware that regular expressions can be "unsafe", which means they might take extremely long to evaluate depending on the input. Libraries like this can help you detect these.
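A variant of the regex approach that collects the matches straight into an array might look like the sketch below. The helper name extractUrls and the shortened sample string are hypothetical, and it assumes every URL is wrapped in double quotes and contains no double quote itself. The pattern uses a single negated character class with no nested quantifiers.

```javascript
// Hypothetical helper: returns every value that appears as url="...".
// Assumption: URLs are double-quoted and never contain a double quote.
const extractUrls = (text) =>
  Array.from(text.matchAll(/url="([^"]*)"/g), (match) => match[1]);

// Shortened, made-up sample in the same shape as the string from the question.
const sample =
  'http_jar(name="a",sha256="00",url="https://example.com/a.jar",)' +
  'http_jar(name="b",sha256="11",url="https://example.com/b.jar",)';
console.log(extractUrls(sample));
```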
I have implemented the search feature using JavaScript and Regex. Firstly, I converted the input string into tokens then searched for it in the target array.
This is the sample code.
const tokens = inputString
.toLowerCase()
.split(' ')
.filter(function (token) {
return token.trim() !== ''
})
const searchTermRegex = new RegExp(tokens.join(' '), 'gim')
const filteredList = targetArray.filter(function (item) {
return item.match(searchTermRegex)
})
This code runs fine; the only problem is that it finds nothing when the words appear in a different order.
For example, if the target string is "scrape the data from pages" and I search for "data scrape", it does not detect it.
What's a better solution for this?
Expected output: if at least a single word from the input is present in the target string, that string should be shown in the final output.
Since it's not clear whether you need both words in your targeted result or either of them, and there is already an answer for the either case, I'm adding a solution for when both words are needed, for which you'll need positive lookaheads. It also seems that each word of inputString has to match as a complete word, which is why I've added word boundaries to the regex using \b. If that's not needed, you can change the token mapper to:
(?=.*${token})
I've also refactored your code a little; hope that helps.
const tokens = inputString
.split(' ')
.filter(Boolean)
.map(token => `(?=.*\\b${token}\\b)`);
const searchTermRegex = new RegExp(tokens.join(''), 'gim');
const filteredList = targetArray.filter(item => item.match(searchTermRegex));
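Because the tokens go into the pattern unescaped, a search term containing characters such as ( or + would change the regex. A possible sketch that escapes each token and supports both readings of the question (every word, or at least one word) follows; the names escapeRegExp and buildSearchRegex and the mode parameter are hypothetical, and the sample list is made up.

```javascript
// Escape characters that have a special meaning inside a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: mode 'all' requires every word, mode 'any' requires at least one.
// Assumption: words are separated by whitespace and should match as whole words (\b).
const buildSearchRegex = (inputString, mode = 'all') => {
  const tokens = inputString.split(/\s+/).filter(Boolean).map(escapeRegExp);
  const source = mode === 'all'
    ? tokens.map((token) => `(?=.*\\b${token}\\b)`).join('')
    : `\\b(?:${tokens.join('|')})\\b`;
  // no "g" flag: test() on a global regex keeps state between calls
  return new RegExp(source, 'i');
};

const list = ['scrape the data from pages', 'clean the data', 'nothing here'];
const allWords = buildSearchRegex('data scrape', 'all');
const anyWord = buildSearchRegex('data scrape', 'any');
console.log(list.filter((item) => allWords.test(item)));
console.log(list.filter((item) => anyWord.test(item)));
```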
You could do it like this. I think split is not necessary.
const inputString = "scrape the data from pages"
const targetArray = ["data", "scrape"]
const filteredList = targetArray.every(function(item) {
return inputString.indexOf(item) > -1
})
console.log("matchStatus", filteredList)
OR Regex
const inputString = "scrape the data from pages"
const targetArray = ["data", "scrape"]
const filteredList = new RegExp(targetArray.join('|'),'gim').test(inputString)
console.log("matchStatus", filteredList)
I'd like to transform a string like:
hello!world.what?up into ["hello!", "world.", "what?", "up"]
.split(/[?=<\.\?\!>]+/) is close to what I'm after, which returns:
["hello", "world", "what", "up"]
.split(/(?=[\?\!\.])/) is a bit closer yet, which returns:
["hello", "!world", ".what", "?up"]
This does the trick, but it's not pretty:
.split(/(?=[\?\!\.])/).map((s, idx, arr) => { if (idx > 0) s = s.slice(1); return idx < arr.length - 1 ? s + arr[idx+1][0] : s }).filter(s => s)
How would I rephrase this to achieve the desired output?
Edit: Updated question.
Not sure of the real requirement but to accomplish what you want you could use .match instead of .split.
const items =
'hello!world.what?'.match(/\w+\W/g);
console.log(items);
update after comment
You could add a group for any character you want to use as the terminator for each part.
const items =
'hello!world.what?'.match(/\w+[!.?]/g);
console.log(items);
additional update
The previous solution only selects alphanumeric characters before the !.?
If you want to match any character except the delimiters, use
const items =
'hello!world.what?up'.match(/[^!.?]+([!.?]|$)/g);
console.log(items);
One solution could be to first use replace() to add a token after each searched character, and then split on this token.
let input = "hello!world.what?";
const customSplit = (str) =>
{
let token = "#";
return str.replace(/[!.?]/g, (match) => match + token)
.split(token)
.filter(Boolean);
}
console.log(customSplit(input));
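Another option, sketched here under the assumption that the JavaScript engine supports lookbehind assertions (part of ES2018), is to split on the empty position right after each delimiter, so no placeholder token is needed. The function name splitKeepingDelimiters is hypothetical.

```javascript
// Hypothetical helper: splits after each "!", "." or "?" and keeps the delimiter
// attached to the preceding part.
// Assumption: the runtime supports lookbehind assertions, i.e. (?<=...).
const splitKeepingDelimiters = (text) => text.split(/(?<=[!.?])/);

console.log(splitKeepingDelimiters('hello!world.what?up'));
```

Since the pattern matches a position rather than a character, nothing is removed from the string and no character has to be reserved as a token.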