Handling Dash Character in Regular Expression for Filenames - javascript

I have a string which will be used to create a filename. The original string pattern may include a dash. Recently, the pattern has changed, and I need to adjust the regular expression so that it removes dashes in the middle or near the end of the string but keeps those at the beginning.
Regular Expression Pattern Rules/Requirements:
Replace all special characters with an underscore with some exceptions
Remove dashes not located at the beginning of the string
The dashes which need to be kept are typically between numeric values [0-9] and can appear any number of times in the string (e.g. "23-564-8 Testing - The - String" -> "23-564-8_testing_the_string")
The dashes which should be converted to underscores are typically between [a-zA-Z] characters (e.g. "Testing - The - String" -> "testing_the_string")
Examples of Potential Strings:
23-564-8 Testing the String -> Expected Output: 23-564-8_testing_the_string
Testing - The String -> Expected Output: testing_the_string
23-564-8 Testing - The - String -> Expected Output: 23-564-8_testing_the_string
Opinion: Personally, I'm not a fan of including dashes in filenames, but it is a requirement
Current Regexp Solution:
var str = "23-564-8 Testing the String";
var result = str.replace(/[^a-zA-Z0-9-]/g, '_').replace(/__/g, '_'); // "23-564-8_Testing_the_String"
Question: What is the best way to handle this case? My current solution leaves all dashes in the string.

You may use this regex with a negative lookahead:
/[^a-zA-Z0-9-]+|-(?!\d)/g
RegEx Details:
[^a-zA-Z0-9-]+: Match one or more characters that are neither a hyphen nor alphanumeric
|: OR
-(?!\d): Match hyphen if it is NOT immediately followed by a digit
Code:
const arr = [
'23-564-8 Testing the String',
'Testing - The String',
'-23-564-8 Testing - The - String'
]
const re = /[^a-zA-Z0-9-]+|-(?!\d)/g
var result = []
arr.forEach(el => {
result.push( el.replace(re, '_').replace(/_{2,}/g, '_') )
})
console.log( result )
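The expected outputs in the question are lowercase, while the snippet above keeps the original case. Below is a minimal sketch that wraps the same regex in a helper and adds toLowerCase(); the helper name toFilename is hypothetical, and lowercasing is an assumption drawn from the expected outputs:

```javascript
// Hypothetical helper: the lookahead regex from the answer, plus lowercasing
// (assumed from the question's expected outputs).
const SPECIAL_OR_LOOSE_DASH = /[^a-zA-Z0-9-]+|-(?!\d)/g;

function toFilename(str) {
  return str
    .replace(SPECIAL_OR_LOOSE_DASH, '_') // special chars, and dashes not followed by a digit
    .replace(/_{2,}/g, '_')             // collapse runs such as "___" produced by " - "
    .toLowerCase();
}

['23-564-8 Testing the String', 'Testing - The String', '23-564-8 Testing - The - String']
  .forEach(s => console.log(toFilename(s)));
```

Dashes followed by a digit survive the first replace, so "23-564-8" stays intact while " - " between words becomes a single underscore.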

The following regex pattern can be used with the replacement string $1_:
(\d+(?:-+\d+)+)?[\W\-_]+
The pattern consists of two parts:
(\d+(?:-+\d+)+)? captures numbers joined by allowed dashes into Group 1
[\W\-_]+ matches one or more special characters (non-word characters or underscores) to be replaced
Group 1 is required to prevent allowed dashes from being replaced. The $1 token in the replacement string ensures that the content of Group 1 is kept in the result.
This Regex pattern also handles the scenario of duplicate _ characters, so .replace(/__/g, '_') is no longer required. The code can be transformed to:
var str = "23-564-8 Testing the String";
var res = str.replace(/(\d+(?:-+\d+)+)?[\W\-_]+/g, "$1_");
console.log(res);
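One input worth checking against both patterns is a dash-separated number at the very end of the string (a hypothetical example, not from the question). With the Group 1 pattern, the group only matches when special characters follow the number, so the trailing dash falls through to [\W\-_]+ and is replaced; the lookahead pattern keeps it:

```javascript
// Compare the two suggested patterns on a hypothetical input whose
// dash-separated number ends the string.
const lookaheadPattern = /[^a-zA-Z0-9-]+|-(?!\d)/g;
const groupPattern = /(\d+(?:-+\d+)+)?[\W\-_]+/g;

const input = 'Test 1-2';
const kept = input.replace(lookaheadPattern, '_').replace(/_{2,}/g, '_');
const replaced = input.replace(groupPattern, '$1_');

console.log(kept);     // "Test_1-2"
console.log(replaced); // "Test_1_2"
```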

Related

Regex Capture Character and Replace with another

I am trying to replace special characters that are preceded by digits with a dot.
const time = "17:34:12:p. m.";
const output = time.replace(/\d+(.)/g, '.');
// Expected Output "17.34.12.p. m."
console.log(output);
I wrote a regex that should capture any character preceded by one or more digits, but the replacement also replaces the digits. Can someone please help me figure out the issue?
You can use
const time = "17:34:12:p. m.";
const output = time.replace(/(\d)[\W_]/g, '$1.');
console.log(output);
The time.replace(/(\d)[\W_]/g, '$1.') call matches and captures a digit into Group 1, then matches one non-word or underscore character after it; the $1. replacement puts the digit back and replaces that character (here :) with a dot.
If you want to "subtract" whitespace pattern from [\W_], use (?:[^\w\s]|_).
For more special-character patterns, see "Check for special characters in string".
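Since ECMAScript 2018, JavaScript also supports lookbehind assertions, so the digit does not need to be captured and put back. A sketch, assuming a runtime with ES2018 lookbehind support:

```javascript
// Assumes ES2018 lookbehind support.
const time = '17:34:12:p. m.';
// (?<=\d) checks that a digit precedes the position without consuming it,
// so only the special character itself is replaced.
const output = time.replace(/(?<=\d)[\W_]/g, '.');
console.log(output); // "17.34.12.p. m."
```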
You should look for characters that are neither word characters (\w) nor whitespace (\s) and replace them with a dot.
It helps to use a live regex tester such as regex101: https://regex101.com/r/xIStHH/1
const time = "17:34:12:p. m.";
const output = time.replace(/[^\w\s]/g, '.');
// Expected Output "17.34.12.p. m."
console.log(output);

How to match regular expression In Javascript

I have the string [FBWS-1] comes first than [FBWS-2]
In this string, I want to find all occurrences of [FBWS-NUMBER]
I tried this :
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/^([[A-Z]-[0-9]])$/.test(term));
I want to get all the NUMBERS where the [FBWS-NUMBER] string is matched.
But I had no success. I'm new to regular expressions.
Can anyone help me, please?
Note that ^([[A-Z]-[0-9]])$ matches the start of a string (^), a [ or an uppercase ASCII letter (with [[A-Z]), -, an ASCII digit, and a ] char at the end of the string. So, basically, it only matches strings like [-2] or Z-3].
You may use
/\[[A-Z]+-[0-9]+]/g
NOTE If you need to "hardcode" FBWS (to only match values like FBWS-123 and not ABC-3456), use it instead of [A-Z]+ in the pattern, /\[FBWS-[0-9]+]/g.
Details
\[ - a [ char
[A-Z]+ - one or more (due to + quantifier) uppercase ASCII letters
- - a hyphen
[0-9]+ - one or more (due to + quantifier) ASCII digits
] - a ] char.
The /g modifier used with String#match() returns all found matches.
JS demo:
var term = "[FBWS-1] comes first than [FBWS-2]";
console.log(term.match(/\[[A-Z]+-[0-9]+]/g));
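The question asks for the NUMBERS themselves. A sketch that wraps the digits in a capturing group and collects them with String.prototype.matchAll (ES2020, which requires the g flag):

```javascript
const term = '[FBWS-1] comes first than [FBWS-2]';
// Group 1 captures only the digits after the hyphen.
const numbers = Array.from(term.matchAll(/\[[A-Z]+-([0-9]+)\]/g), m => Number(m[1]));
console.log(numbers); // [1, 2]
```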
You can use:
[\w+-\d]
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/[\w+-\d]/.test(term));
There are several reasons why your existing regex doesn't work.
You're trying to match the beginning and end of your string when you actually want something in between, so don't use ^ and $.
You're only matching one alphabetic character with [A-Z]; add the + quantifier to match one or more.
You can shorten [0-9] to \d. Note that \w is not a drop-in replacement for [A-Z]: it also matches lowercase letters, digits and _. The brackets are generally unnecessary.
Note that your code only returns a true/false value because you're using test(); at the moment it's unclear whether this is what you want. To get a collection of matches, use match() with the global modifier (/g) instead of test().
Here is an example using string.match(reg) to get all matches strings:
var term = "[FBWS-1] comes first than [FBWS-2]";
var reg1 = /\[[A-Z]+-[0-9]+\]/g;
var reg2 = /\[FBWS-[0-9]+\]/g;
var arr1 = term.match(reg1);
var arr2 = term.match(reg2);
console.log(arr1);
console.log(arr2);
Your regular expression /^([[A-Z]-[0-9]])$/ is wrong.
Give this regex a try: /\[FBWS-\d+\]/g
Remove the g flag if you only want the first match, as g will find all matches.
Edit: Someone mentioned that you want ["any combination"-"number"]; if that's what you're looking for, then this should work: /\[[A-Z]+-\d+\]/

regular expression to match special characters between delimiters

I have a basic string and would like to get only specific characters between the brackets.
Base string: This is a test string [more or less]
This regex captures all r's and e's just fine:
(r|e)
=> This is a test string [more or less]
Now I want to combine it with a bracket pattern so that only the r's and e's between the brackets are matched, but unfortunately this doesn't work:
\[(r|e)\]
Expected result should be: more or less
Can someone explain?
Edit: the problem is very similar to this one: Regular Expression to find a string included between two characters while EXCLUDING the delimiters
but with the difference that I don't want to get the whole string between the brackets.
Follow up problem
base string = 'this is a link:/en/test/äpfel/öhr[MyLink_with_äöü] BREAK äöü is now allowed'
I need a regex for finding the non-ascii characters äöü in order to replace them but only in the link:...] substring which starts with the word link: and ends with a ] char.
The result string will look like this:
result string = 'this is a link:/en/test/apfel/ohr[MyLink_with_aou] BREAK äöü is now allowed'
The regex /[äöü]+(?=[^\]\[]*])/g from the solution in the comments only delivers the äöü chars between the two brackets.
I know that the regex uses a lookahead with a character list, but I wonder why this one does not work:
/link:([äöü]+(?=[^\]\[]*])/
thanks
You can use the following solution: match all between link: and ], and replace your characters only inside the matched substrings inside a replace callback method:
var hashmap = {"ä":"a", "ö":"o", "ü":"u"};
var s = 'this is a link:/en/test/äpfel/öhr[MyLink_with_äöü] BREAK äöü is now allowed';
var res = s.replace(/\blink:[^\]]*/g, function(m) { // m = link:/en/test/äpfel/öhr[MyLink_with_äöü (the closing ] is not included)
return m.replace(/[äöü]/g, function(n) { // n = ä, then ö, then ü,
return hashmap[n]; // each time replaced with the hashmap value
});
});
console.log(res);
Pattern details:
\b - a leading word boundary
link: - whole word link with a : after it
[^\]]* - zero or more chars other than ] (a [^...] is a negated character class that matches any char/char range(s) but the ones defined inside it).
Also, see Efficiently replace all accented characters in a string?
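For the original r/e question, the lookahead pattern quoted in the follow-up can be adapted. The \[(r|e)\] attempt fails because it requires [ to be immediately followed by a single r or e and then ]. This sketch assumes the brackets are not nested:

```javascript
const base = 'This is a test string [more or less]';
// [re] matches a single r or e. The lookahead then requires that, scanning
// forward over characters that are neither [ nor ], the next bracket is ].
const inside = base.match(/[re](?=[^\]\[]*\])/g);
console.log(inside); // ['r', 'e', 'r', 'e']
```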

regex preceded by two or more special characters

I am stuck creating a regex such that, if the word is preceded or followed by more than one special character on either side, the regex exec method returns null. Only if the word is wrapped with exactly one bracket on each side should exec return a result. Below is the regular expression I have come up with.
Only a string like "(test)" should make regex.exec return values; for other combinations such as "((test))", "((test)" or "(test))" it should return null. The code below does not return null as it should. Please suggest.
var w1 = "\(test\)";
alert(new RegExp('(^|[' + '\(\)' + '])(' + w1 + ')(?=[' + '\(\)' + ']|$)', 'g').exec("this is ((test))"))
If you have a list of words and want to filter them, you can do the following.
string.split(' ').filter(function(word) {
return !(/^[!##$%^&*()]{2,}.+/).test(word) && !(/[!##$%^&*()]{2,}$/).test(word);
});
The split() function splits a string at a space character and returns an array of words, which we can then filter.
To keep only the valid words, we test two regular expressions to check whether the word starts or ends with 2 or more special characters, and keep the word only if neither matches.
RegEx Breakdown
^ - Expression starts with the following
[] - A single character in the block
!##$%^&*() - These are the special characters I used. Replace them with the ones you want.
{2,} - Matches 2 or more of the preceding character class
.+ - Matches 1 or more of any character
$ - Expression ends with the following
To use the exec method instead, do this:
!(/^[!##$%^&*()]{2,}.+/).exec(string) && !(/[!##$%^&*()]{2,}$/).exec(string)
If I understand correctly, you are looking for any string which contains (test), anywhere in it, and exactly that, right?
In that case, what you probably need is the following:
var regExp = /.*[^)]\(test\)[^)].*/;
alert(regExp.exec("this is ((test))")); // → null
alert(regExp.exec("this is (test))" )); // → null
alert(regExp.exec("this is ((test)" )); // → null
alert(regExp.exec("this is (test) ...")); // → ["this is (test) ..."]
Explanation:
.* matches any character (except newline) between zero and unlimited times, as many times as possible.
[^)] match a single character but not the literal character )
For the inputs above, this makes sure your test string is present in the given string and is wrapped with only one parenthesis on each side.
You can use the following regex:
(^|[^(])(\(test\))(?!\))
Replace with $1<span style="new">$2</span>.
The regex features an alternation group (^|[^(]) that matches either the start of the string ^ or any character other than (. This alternation is a workaround for JS regex engines that do not support look-behinds (lookbehind assertions were only added in ECMAScript 2018).
Then, (\(test\)) matches and captures (test). Note that the round brackets are escaped; if they were not, they would be treated as capturing group delimiters.
The (?!\)) is a look-ahead that makes sure there is no literal ) right after test). Look-aheads are fully supported by JS regex engines.
A JS snippet:
var re = /(^|[^(])(\(test\))(?!\))/gi;
var str = 'this is (test)\nthis is ((test))\nthis is ((test)\nthis is (test))\nthis is ((test\nthis is test))';
var subst = '$1<span style="new">$2</span>';
var result = str.replace(re, subst);
alert(result);
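In engines that support lookbehind (ES2018 and later), the alternation workaround can be replaced with a negative lookbehind, which also removes the need for $1. A sketch under that assumption; $& in the replacement string inserts the whole match:

```javascript
// Assumes ES2018 lookbehind support.
// (?<!\() : no ( right before; (?!\)) : no ) right after.
const re = /(?<!\()\(test\)(?!\))/gi;
const str = 'this is (test)\nthis is ((test))\nthis is ((test)\nthis is (test))\nthis is ((test\nthis is test))';
const result = str.replace(re, '<span style="new">$&</span>');
console.log(result); // only the first line is wrapped in a span
```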

Regex needed to split a string by "."

I need a regex in JavaScript. I have a string:
'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'
I want to split this string by periods such that I get an array:
[
'*window',
'some1',
'some\.2', //ignore the . because it's escaped
'(a.b + ")" ? cc\.c : d.n [a.b, cc\.c])', //ignore everything inside ()
'some\.3',
'(this.o.p ? ".mike." [ff\.])',
'some5'
]
What regex will do this?
var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = result || []; // match() already returns an Array (or null when nothing matches)
Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:
/ Start of RegExp literal
(?: Create a group without reference (example: say, group A)
\( `(` character
(?: Create a group without reference (example: say, group B)
(['"]) ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
\) `)` character
\1 The character as matched at group 1, either `'` or `"`
| OR
[^)]+? Any non-`)` character, at least once (see below)
)+ End of group (B). Let this group occur at least once
| OR
\\\. `\.` (escaped backslash and dot, because they're special chars)
| OR
[^.]+? Any non-`.` character, at least once (see below)
)+ End of group (A). Let this group occur at least once
/g "End of RegExp, global flag"
/*Summary: Match everything which is not satisfying the split-by-dot
condition as specified by the OP*/
There's a difference between + and +?. A single plus attempts to match as many characters as possible, while +? matches only as many characters as necessary to get the RegExp match. Example: on 123, \d+? matches 1 and \d+ matches 123.
The String.match method performs a global match because of the /g (global) flag. The match function with the g flag returns an array consisting of all matched substrings.
When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:
Index 0: <Whole match>
Index 1: <Group 1>
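A small sketch of the two behaviours described above; the sample strings are made up for illustration:

```javascript
// Greedy (+) vs. lazy (+?) on the same input.
const greedy = '123'.match(/\d+/)[0];  // '123'
const lazy = '123'.match(/\d+?/)[0];   // '1'

// String#match with the g flag: every whole match, no groups.
const all = 'some1.some5'.match(/some(\d)/g);  // ['some1', 'some5']
// Without g: only the first match, with its groups at index 1, 2, ...
const first = 'some1.some5'.match(/some(\d)/); // first[0] is 'some1', first[1] is '1'

console.log(greedy, lazy, all, first[0], first[1]);
```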
The regex below:
result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);
can be used to acquire the desired results. Group 1 holds the results, since you want to omit the trailing dot.
Use this:
var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var match = myregexp.exec(subject);
while (match != null) {
for (var i = 0; i < match.length; i++) {
// matched text: match[i]
}
match = myregexp.exec(subject);
}
Explanation :
// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
//
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
// Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
// Match the character “(” literally «\(»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match a single character NOT present in the list “'"” «[^'"]»
// Match the character “)” literally «\)»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match any character that is NOT a “A \ character” «[^\\]»
// Match the regular expression below «(?:\.|$)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
// Match the character “.” literally «\.»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
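A runnable version of the exec loop above that collects Group 1 (each piece without its trailing dot). The subject string is the one from the question, written with doubled backslashes as in the first answer:

```javascript
const subject = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
const myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;

const parts = [];
let match;
// exec() with the g flag advances lastIndex, so each call finds the next piece.
while ((match = myregexp.exec(subject)) !== null) {
  parts.push(match[1]); // Group 1 excludes the separating dot
}
console.log(parts);
```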
It is notoriously difficult to use a Regex to do balanced parenthesis matching, especially in Javascript.
You would be way better off creating your own parser. Here's a clever way to do this that uses the strengths of regexes:
Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
Add the matching text to a buffer.
If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.
This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
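A minimal sketch of the steps above. The function name splitOnTopLevelDots is hypothetical, and quote tracking is an addition beyond the four listed steps (the question's input contains a ) inside quotes, which would otherwise unbalance the count). Escapes are handled by the (\\.) alternative, so escaped dots are never split on:

```javascript
// Hypothetical helper: splits on "." only when brackets are balanced and we
// are not inside a quoted string. Escaped characters (\x) are never split on.
function splitOnTopLevelDots(input) {
  // Alternatives: 1 escape, 2 opening bracket, 3 closing bracket, 4 quote, 5 dot.
  const pattern = /(\\.)|([(\[{])|([)\]}])|(['"])|(\.)/g;
  const results = [];
  let depth = 0;      // number of currently open brackets
  let quote = null;   // the quote character we are inside, or null
  let buffer = '';    // text of the piece being built
  let lastIndex = 0;  // end of the previous match

  // replace() is used only to visit every match; its return value is ignored.
  input.replace(pattern, (match, escaped, open, close, q, dot, offset) => {
    buffer += input.slice(lastIndex, offset); // plain text before this match
    lastIndex = offset + match.length;

    if (dot && depth === 0 && quote === null) {
      results.push(buffer); // balanced, unquoted dot: close the current piece
      buffer = '';
      return match;
    }
    if (q) {
      if (quote === null) quote = q;        // opening quote
      else if (quote === q) quote = null;   // matching closing quote
    } else if (quote === null) {
      if (open) depth++;
      if (close) depth--;
    }
    buffer += match; // keep escapes, brackets, quotes and inner dots
    return match;
  });

  results.push(buffer + input.slice(lastIndex)); // trailing piece
  return results;
}

const input = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
console.log(splitOnTopLevelDots(input));
```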
Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:
Uses a Regex pattern to find the splits
Only splits if there are balanced parenthesis
Only splits if there are balanced quotes
Allows escaping of parenthesis, quotes, and splits using \
This code will work perfectly for your example.
You don't need a complex regex for this:
var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
output:
["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]
So, I was working on this, and now I see that @FailedDev's answer is not a failure after all, since it was pretty nice. :)
Anyhow, here's my solution. I'll just post the regex only.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)
Sadly, this won't work in your case, since I'm using a negative lookbehind, which older JavaScript regex engines do not support (lookbehind assertions were added in ECMAScript 2018). It should work as intended in other engines, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.
