Matching Words With or Without Hyphens - javascript

So I'm basically trying to match words in a string that may or may not contain hyphens.
For instance, in the strings below:
let firstStr = "filter-table";
let secondStr = "filter-second-table";
"filter-" is a required keyword, and so, I'd want to match the words containing "filter-" followed by any character/word (hyphenated or not).
Using the following:
secondStr.match(/filter-\w+/);
"firstStr" matches correctly but not "secondStr". "secondStr" only matches "filter-second" and not the hyphenated word after - "filter-second-table".
I'd want to be able to match any potential hyphenated word as in "second-table".

Make a group that can be matched multiple times like this: /filter(-\w+)+/
myTest("filter-table");
myTest("filter-second-table");
myTest("filter-awesome-second-table");
function myTest(string) {
  let matches = string.match(/filter(-\w+)+/);
  console.log(string, matches ? "matches!" : "doesn't match");
}
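To see which text the pattern actually captures, rather than only whether it matches, a minimal sketch along these lines may help. The helper name findFilterWord is hypothetical, and the group is written as non-capturing, (?:-\w+)+, on the assumption that only the full match is needed.

```javascript
// Hypothetical helper: returns the full "filter-..." word, or null if absent.
function findFilterWord(text) {
  // (?:-\w+)+ repeats "-word" one or more times without creating a capture group.
  const match = text.match(/filter(?:-\w+)+/);
  return match ? match[0] : null;
}

console.log(findFilterWord("filter-table"));
console.log(findFilterWord("filter-second-table"));
console.log(findFilterWord("see filter-awesome-second-table here"));
console.log(findFilterWord("filter")); // no hyphenated part, so no match
```

Because match[0] is the whole match, it contains every hyphenated part, however many there are.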

Related

Regexp to explode url

I have a URL string like "home/products/product_name_1/details/some_options"
and I want to parse it into an array with a regexp: ["home", "products", "product", "details", "some"]
So the rule is: split into words at each slash, and if a word contains underscores, keep only the part that comes before the first underscore.
The JavaScript equivalent without a regex is
str.split("/").map(item => item.indexOf("_") > -1 ? item.split("_")[0] : item)
Please help!
You can use this pattern:
(?<!\w)[^/_]+
Result:
['home', 'products', 'product', 'details', 'some']
Python code:
import re
url = "home/products/product_name_1/details/some_options"
# Raw string so that \w reaches the regex engine unchanged
print(re.findall(r'(?<!\w)[^/_]+', url))
# ['home', 'products', 'product', 'details', 'some']
Try this:
const input = ["home/products/product_name_1/details/some_options",
  "company/products/cars_all/details/black_color",
  "public/places/1_cities/disctricts/1234_something"];
const pattern = /([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi;
input.forEach(el => {
  const matches = el.matchAll(pattern);
  for (const match of matches) {
    console.log(match[1]);
  }
});
Remove \d from the regex pattern if you don't want digits in the URL.
I have used matchAll here. matchAll returns an iterator; iterate over it to get each match object, in which the first element is the full match and the second element (index 1) is the required group.
Note that a final segment without an underscore (for example the "b" in "a/b") is not matched, because the pattern requires a following '/' or '_'.
Breakdown of /([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi:
([a-zA-Z\d]*) capture group that matches letters and digits
(?:\/|_.*?(?:\/|$)) non-capturing group that matches either '/' or '_' followed by everything up to the next '/' or the end of the line
/gmi flags: global, multiline, case-insensitive
You can test this regex here: https://regex101.com/r/B5Bo74/1
You can use:
\b[^\W_]+
\b A word boundary to prevent a partial match
[^\W_]+ Match 1+ word characters except for _
See a regex demo.
const s = "home/products/product_name_1/details/some_options";
const regex = /\b[^\W_]+/g;
console.log(s.match(regex));
If there has to be a leading / or the start of the string before the match, you can use an alternation (?:^|\/) and use a capture group for the values that you want to keep:
const s = "home/products/product_name_1/details/some_options";
const regex = /(?:^|\/)([^\W_]+)/g;
console.log(Array.from(s.matchAll(regex), m => m[1]));
Given input:
string "home/products/product_name_1/details/some_options"
Expected output:
array ["home", "products", "product", "details", "some"]
Note: ignore/exclude name, 1, options (because word occurs after 1st underscore).
Task:
split URI by slash into a set of path-segments (words)
(if the path-segment or word contains underscores) remove the part after first underscore
Regex to match
With a regex \/|_\w+ you could match the URL-path separator (slash) and excluded word-part (every word after an underscore).
Then use this regex
either as separator to split the string into its parts(excluding the regex matches): e.g. in JS split(/\/|_\w+/)
or as search-pattern in replace to prepare a string that can be easily split: e.g. in JS replaceAll(/\/|_\w+/g, ',') to obtain a CSV row which can be easily split by comma with split(',')
Beware: The regular-expression itself (flavor) and functions to apply it depend on your environment/regex-engine and script-/programming-language.
Regex applied in Javascript
split by regex
For example in Javascript use url.split(/\/|_\w*/) where:
/pattern/: everything inside the slashes is the regex-pattern
\/: an escaped slash (URL-path-separator)
|: the alternation operator, interpreted as boolean OR
_\w*: an underscore followed by zero or more (*) word-characters (\w, i.e. letter from alphabet, numeric digit or underscore)
See also:
Use of capture groups in String.split()
However, this also returns empty strings (the empty pieces left between an underscore match and the following slash). We can remove them with a filter whose predicate s => s is truthy only for non-empty strings.
Demo to solve your task:
const url = "home/products/product_name_1/details/some_options";
let firstWordsInSegments = url.split(/\/|_\w*/).filter(s => s);
console.log(firstWordsInSegments);
const urlDuplicate = "home/products/product_name_1/details/some_options/_/home";
console.log(urlDuplicate.split(/\/|_\w*/).filter(s => s)); // contains duplicates in output array
replace into CSV, then split and exclude (map,replace,filter)
The CSV containing path-segments can be split by comma and resulting parts (path-segments) can be filtered or replaced to exclude unwanted sub-parts.
using:
replaceAll to transform to CSV or remove empty strings. Note: global flag required when calling replaceAll with regex
map to remove unwanted parts after underscore
filter(s => s) to filter out empty strings
const url = "home/products/product_name_1/details/some_options";
// step by step
let pathSegments = url.split('/');
console.log('pathSegments:', pathSegments);
let firstWordsInSegments = pathSegments.map(s => s.replaceAll(/_\w*/g,''));
console.log(firstWordsInSegments);
// replace to obtain CSV and then split
let csv = "home/products/product_name_1/details/some_options/_/home".replaceAll(/\/|_\w+/g, ',');
console.log('csv:', csv);
let parts = csv.split(',');
console.log('parts:', parts); // contains empty parts
let nonEmptyParts = parts.filter(s => s);
console.log('nonEmptyParts:', nonEmptyParts); // filtered out empty parts
Bonus Tip
Try your regex online (e.g. regex101 or regexplanet). See the demo on regex101.
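You could split the URL with this regex:
(_\w*)+|(\/)
This matches the /, _name_1 and _options. Note that JavaScript's split includes the contents of capture groups in its result, so there you would use the non-capturing form (?:_\w*)+|\/ instead.
But depending on what you are trying to do, or which language you use, there are much better options for this.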
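As a rough illustration of this split-based idea in JavaScript, the sketch below compares the capturing and non-capturing forms of the pattern. The variable names are made up for the example, and the filter step for empty strings is an addition that this answer does not mention.

```javascript
const url = "home/products/product_name_1/details/some_options";

// Capturing groups: split() splices each group's value (or undefined) into the result.
const withCaptures = url.split(/(_\w*)+|(\/)/);

// Non-capturing form: only the pieces between separators are returned.
// filter(s => s) drops the empty strings left next to "_name_1" and "_options".
const firstWords = url.split(/(?:_\w*)+|\//).filter(s => s);

console.log(withCaptures);
console.log(firstWords);
```

The first array is cluttered with separators and undefined entries, which is why the non-capturing form is usually the more convenient one for split.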
You can try a pattern like \/([^\/_]+){1,} (assuming that the path starts with '/' and the components are separated by '/'); depending on the language, you might get an array or an iterator that yields the components.
Try ^[[:alpha:]]+|(?<=\/)[[:alpha:]]+, or ^[a-zA-Z]+|(?<=\/)[a-zA-Z]+ if [[:alpha:]] is not supported. It matches one or more letters at the beginning of the string or after a slash, up to the first non-letter.
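JavaScript regular expressions do not support POSIX classes such as [[:alpha:]], so the [a-zA-Z] variant is the one to try there. A minimal sketch, assuming an engine with lookbehind support (current Node.js and browsers):

```javascript
const url = "home/products/product_name_1/details/some_options";

// ^[a-zA-Z]+ takes the letters at the start of the string;
// (?<=\/)[a-zA-Z]+ takes the letters that directly follow a slash.
const leadingWords = url.match(/^[a-zA-Z]+|(?<=\/)[a-zA-Z]+/g);

console.log(leadingWords);
```

Because only letters are matched, a segment that starts with a digit (such as 1_cities) would contribute nothing with this pattern.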

Regex to match string in a sentence

I am trying to find a specific string in a sentence. The task says:
Find the position of the string "ten" within a sentence, without using the exact string directly (this can be avoided in many ways using just a bit of RegEx). Print as many spaces as there were characters in the original sentence before the aforementioned string appeared, and then the string itself in lowercase.
I've gotten this far:
let words = 'A ton of tunas weighs more than ten kilograms.'
function findTheNumber(){
let regex=/t[a-z]*en/gi;
let output = words.match(regex)
console.log(words)
console.log(output)
}
console.log(findTheNumber())
The result should be:
input = A ton of tunas weighs more than ten kilograms.
output = ten(ENTER)
You could try a regex replacement approach, with the help of a callback function:
var input = "A ton of tunas weighs more than ten kilograms.";
// With no capture groups, the callback receives (match, offset, string); only match is needed here.
var output = input.replace(/\w+/g, function(match) {
  if (!match.match(/^[t][e][n]$/)) {
    return match.replace(/\w/g, " ");
  }
  else {
    return match;
  }
});
console.log(input);
console.log(output);
The above logic matches every word in the input sentence, and then selectively replaces every word which is not ten with an equal number of spaces.
You can use
let text = 'A ton of tunas weighs more than ten kilograms.'
function findTheNumber(words){
console.log( words.replace(/\b(t[e]n)\b|[^.]/g, (x,y) => y ?? " ") )
}
findTheNumber(text)
The \b(t[e]n)\b part is a whole-word search pattern for ten.
The \b(t[e]n)\b|[^.] regex will match and capture ten into Group 1 and will match any char but . (as you need to keep it at the end). If Group 1 matches, it is kept (ten remains in the output), else the char matched is replaced with a space.
Depending on what chars you want to keep, you may adjust the [^.] pattern. For example, if you want to keep all non-word chars, you may use \w.
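Read literally, the task asks for the leading spaces followed by the word alone. A minimal sketch of that reading uses the index of the match to build the padding; the helper name indentToWord is hypothetical, and returning null when the word is absent is an assumption.

```javascript
// Hypothetical helper: pads with one space per character that precedes the word.
function indentToWord(sentence) {
  // t[e]n avoids writing the literal string; \b keeps it a whole word; i ignores case.
  const match = /\bt[e]n\b/i.exec(sentence);
  if (!match) return null;
  return " ".repeat(match.index) + match[0].toLowerCase();
}

const sentence = "A ton of tunas weighs more than ten kilograms.";
const line = indentToWord(sentence);
console.log(sentence);
console.log(line);
```

Printing the two lines one under the other places the word directly below its position in the original sentence.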

How to create a new string by removing anything that doesn't match a regex from the old one (in JavaScript)

I have a regex for matching IMDB IDs from input, like so
const reg = /(tt[0-9]{7,8})/
an input can be any link from IMDB, e.g.
https://www.imdb.com/title/tt0468569/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=2K0PR4FAVS54AC50131G&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_4
What I'm trying to do is make a new string from this input, leaving only the ID.
So the expected output is tt0468569
I've only been able to find examples of how to remove everything that does match the regex, which is the opposite of what I need.
I want something like
const reg = /(tt[0-9]{7,8})/
var input = "https://www.imdb.com/title/tt0468569/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=2K0PR4FAVS54AC50131G&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_4"
var result = input.replace(!reg, '')
console.log(result)
Any help would be appreciated
This would do:
\d matches a digit. You can choose \d{7,8}, \d+ (one or more digits), or \d{7,} (seven or more digits).
See the documentation for match().
const s = "https://www.imdb.com/title/tt0468569/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=2K0PR4FAVS54AC50131G&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_4"
console.log(s.match(/tt\d{7,8}/)[0]);
And we can also guard against the case of not found:
const s = "https://www.imdb.com/title/tt0468569/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=2K0PR4FAVS54AC50131G&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_4"
// to guard against "not found":
const matches = s.match(/tt\d{7,8}/);
const id = matches && matches[0];
console.log(id);
// Example when "not found"
const matches2 = s.match(/tt\d{20}/);
const id2 = matches2 && matches2[0];
console.log(id2);
In matches && matches[0], if matches is truthy (an array is always truthy), the second operand is evaluated and its value is used. When there is no match, matches is null, which is falsy, so && short-circuits and the result is null.
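In engines that support optional chaining and nullish coalescing, the same guard can be written more compactly. This is only a sketch: extractId is a hypothetical helper name, and the URLs are shortened examples.

```javascript
// Hypothetical helper: returns the first IMDb-style ID in the text, or null.
function extractId(text) {
  // ?.[0] yields undefined when match() returns null; ?? null turns that into null.
  return text.match(/tt\d{7,8}/)?.[0] ?? null;
}

console.log(extractId("https://www.imdb.com/title/tt0468569/?ref_=chttp_tt_4"));
console.log(extractId("https://www.imdb.com/chart/top"));
```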
You are trying to match something, so you probably won't use replace(), but if you do, it is:
const s = "https://www.imdb.com/title/tt0468569/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=2K0PR4FAVS54AC50131G&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_4"
console.log(s.replace(/.*?(tt\d{7,8}).*/, '$1'));
The .*? matches any number of characters non-greedily, so the group captures the first occurrence of tt_______. With a greedy .*, the parenthesized pattern would capture the last occurrence of tt_______ if there is more than one. The regex matches the whole input, which is then replaced with the first parenthesized group ($1).
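The difference between the lazy and the greedy prefix only shows up when the input contains more than one candidate. A small sketch with a made-up input string:

```javascript
// Hypothetical input containing two ID-like substrings.
const text = "first tt0000001 then tt0000002 end";

// Lazy prefix: stops at the earliest position where the group can match.
const lazy = text.replace(/.*?(tt\d{7,8}).*/, "$1");

// Greedy prefix: consumes as much as possible, leaving the last occurrence for the group.
const greedy = text.replace(/.*(tt\d{7,8}).*/, "$1");

console.log(lazy);
console.log(greedy);
```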

How to match bold markdown if it isn't preceded with a backslash?

I'm looking to match bolded markdown. Here are some examples:
qwer *asdf* zxcv matches *asdf*
qwer*asdf*zxcv matches *asdf*
qwer \*asdf* zxcv does not match
*qwer* asdf zxcv matches *qwer*
A negative look behind like this (?<!\\)\*(.*)\* works.
Except there is no browser support in Firefox, so I cannot use it.
Similarly, I can get very close with (^|[^\\])\*(.*)\*
The issue is that there are two capture groups, and I need the index of the second capture group, and Javascript only returns the index of the first capture group. I can bandaid it in this case by just adding 1, but in other cases this hack will not work.
My reasoning for doing this is that I'm trying to replace a small subset of Markdown with React components. As an example, I'm trying to convert this string:
qwer *asdf* zxcv *123*
Into this array:
[ "qwer ", <strong>asdf</strong>, " zxcv ", <strong>123</strong> ]
Where the second and fourth elements are created via JSX and included as array elements.
You will also need to take into account that a backslash occurring before an asterisk may itself be escaped by a backslash, in which case the asterisk should be considered the start of bold markup, unless that backslash is in turn preceded by another backslash, and so on.
So I would suggest this regular expression:
((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*
If the purpose is to replace these with tags, like <strong> ... </strong>, then just use JavaScript's replace as follows:
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);
let regex = /((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*/g;
let res = s.replace(regex, "$1<strong>$2</strong>");
console.log(res);
If the bolded words should be converted to a React component and stored in an array with the other pieces of plain text, then you could use split and map:
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);
let regex = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/g;
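let res = s.split(regex).map((s, i) =>
  // React.createElement is the React API for creating elements; key avoids the list-key warning
  i%3 === 2 ? React.createElement("strong", { key: i }, s) : s
);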
Since there are two capture groups in the "delimiter" for the split call, one having the preceding character(s) and the second the word itself, every third item in the split result is a word to be bolded, hence the i%3 expression.
This should do the trick:
/(?:^|[^\\])(\*[^*]+[^\\]\*)/
The only capturing group there is the string surrounded by *'s.
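Since match.index refers to the start of the whole match, which may include the one character before the asterisk, the position of the captured group has to be derived. One possible sketch, relying on the fact that in this particular regex the group ends where the whole match ends (findBold is a hypothetical helper name):

```javascript
// Hypothetical helper: finds the first unescaped *bold* span and its position.
function findBold(text) {
  const match = /(?:^|[^\\])(\*[^*]+[^\\]\*)/.exec(text);
  if (!match) return null;
  // Group 1 sits at the end of the full match, so its start is
  // the end of the match minus the length of the group.
  const start = match.index + match[0].length - match[1].length;
  return { text: match[1], start };
}

console.log(findBold("qwer *asdf* zxcv"));
console.log(findBold("*qwer* asdf zxcv"));
console.log(findBold(String.raw`qwer \*asdf* zxcv`));
```

This avoids hard-coding the "+ 1" adjustment, because the offset is computed from the actual lengths of the match and the group.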

Regex to accept only variable names

How do I write a regex that accepts only words or letters and split them by ,?
I have tried array = input.replace(/ /g, '').split(','), but then h-e,a<y will become ['h-e','a<y'] I want to accept only variables, so I guess h-e,a<y should become ['he','ay'].
Would it be something like
array = input.replace(/[\s|^\w]/g, '').split(',')
You could use this regex to find all the characters that you want to remove from your string:
array = input.replace(/[-><?.:;]+/g, '').split(',')
You will replace all the characters that are inside the [ ].
Split first, then remove characters not allowed:
input.split(',').map(fix)
where
function fix(s) {
  return s.replace(/[^\w$]/g, '');
}
Another approach is to grab the characters you want, instead of throwing away the ones you don't:
function fix(s) {
  // match() returns null when nothing matches, so fall back to an empty array
  return (s.match(/[\w$]/g) || []).join('');
}
Actually, this is not exactly right, because JavaScript variable names can also contain Unicode characters such as Σ. Also, this would not fix up leading numeric characters, which JavaScript variable names cannot start with.
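A rough sketch that addresses both caveats could use Unicode property escapes. It only approximates the identifier rules (reserved words are not checked, and \p{L} and \p{N} are not exactly the sets the language specification uses). The helper names toIdentifier and parseNames are hypothetical, and dropping empty names is an assumption.

```javascript
// Hypothetical helper: approximates identifier rules; reserved words are not checked.
function toIdentifier(s) {
  return s
    .replace(/[^\p{L}\p{N}_$]/gu, "") // keep Unicode letters, digits, _ and $
    .replace(/^\p{N}+/u, "");          // drop leading digits, which cannot start a name
}

// Hypothetical helper: split on commas, clean each piece, drop empty results.
function parseNames(input) {
  return input.split(",").map(toIdentifier).filter(name => name !== "");
}

console.log(parseNames("h-e,a<y"));
console.log(parseNames("1abc, Σ-x ,$ok"));
```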
