In the URLs
https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg
https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg
I'm trying to capture 5b026cdb06921e7ca5f7a24aff46512e in both of these strings. The string will always happen after the last slash, it will be a random assortment of letters and numbers, it may or may not have --randomtext appended, and it will have .jpg at the end.
I currently have ([^\/]+)$ to extract any string after the last slash, but would like to know how to capture everything before .jpg and --randomtext(if present). I will be using this in javascript.
If what comes after the last forward slash is a random assortment of lowercase letters and digits (a-z0-9), one option is to use a capturing group.
^.*\/([a-z0-9]+).*\.jpg$
In parts
^ Start of string
.*\/ Match until including the last /
([a-z0-9]+) Capture in group 1 matching 1+ chars a-z or digits 0-9
.* Match any char except a newline 0+ times
\.jpg Match .jpg
$ End of string
const regex = /^.*\/([a-z0-9]+).*\.jpg$/;
["https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg",
"https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg"
].forEach(s => console.log(s.match(regex)[1]));
You can split on /, take the last part, and then replace everything from -- onward (or a trailing .jpg) with an empty string.
let arr = ["https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg","https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg"]
let getText = (url) =>{
return url.split('/').pop().replace(/(--.*|\.jpg)$/g,'')
}
arr.forEach(url=> console.log(getText(url)))
If -- may appear more than once, then instead of replacing you can simply use match(/^[a-z0-9]+/g) on the last part and take the first element of the matched array.
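A minimal sketch of that match-based variant, assuming the id only ever contains lowercase letters and digits. The helper name getId and the first sample URL (with two -- separators) are made up for this example:

```javascript
// Hypothetical helper: take the last path segment, then keep only
// the leading run of a-z0-9 characters.
const getId = (url) => {
  const lastPart = url.split('/').pop();
  const found = lastPart.match(/^[a-z0-9]+/);
  return found ? found[0] : null; // null when the segment does not start with an id
};

console.log(getId('https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding--vendors.jpg'));
console.log(getId('https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg'));
```

Returning null instead of indexing into the result directly avoids a TypeError when a URL does not follow the expected shape.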
Use:
([^\/]*?)(?:--.*)?\.jpg$
and your desired match will be in $1
https://regex101.com/r/gZ9kSi/1
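For use in JavaScript, that pattern could be wired up roughly like this. It is only a sketch: the names jpgIdPattern and extractId are invented for the example, and the null fallback is an assumption about how you want to handle non-matching input.

```javascript
// Group 1 lazily takes characters after the last slash, stopping
// before an optional "--suffix" and the required ".jpg" at the end.
const jpgIdPattern = /([^\/]*?)(?:--.*)?\.jpg$/;

// Hypothetical helper: returns the captured id, or null if the URL does not match.
const extractId = (url) => {
  const found = url.match(jpgIdPattern);
  return found ? found[1] : null;
};

[
  'https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg',
  'https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg'
].forEach((url) => console.log(extractId(url)));
```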
Related
I have a URL as follow:
https://res.cloudinary.com/frivillighet-norge/image/upload/v1501681528/5648f10ae4b09f27e34dd22a.jpg
and I want to match only the id of the picture at the end of the string without including .jpg. So far, I have written something like that: ^[A-Za-z0-9]{24}$ which matches a string of numbers and letters with a length of 24, since my id in the string has always length 24, but this does not work as it matches strings of length 24 only.
Any help would be appreciated.
[A-Za-z0-9]{24}(?=(\.jpg))
(?=(\.jpg)) is a positive lookahead. It requires .jpg to follow immediately after the match but does not include it in the match.
You could make the pattern a bit more specific by matching the protocol followed by matching 1+ occurrences of a non whitespace char \S+.
Then match the last occurrence of / and capture the id which consists of 24 characters ([A-Za-z0-9]{24}) followed by matching a dot and 2 or more times a char a-z \.[a-z]{2,}
If you want to match the whole string, you could add anchors to assert the start ^ and end $ of the string.
The id is in capture group 1.
^https?:\/\/\S+\/([A-Za-z0-9]{24})\.[a-z]{2,}$
const regex = /^https?:\/\/\S+\/([A-Za-z0-9]{24})\.[a-z]{2,}$/;
const str = `https://res.cloudinary.com/frivillighet-norge/image/upload/v1501681528/5648f10ae4b09f27e34dd22a.jpg`;
console.log(str.match(regex)[1])
Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything which is not alphanumeric or alphabet in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters in between the alphanumeric is left behind. The last ê is also left behind.
This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-@\[-`\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-@\[-`\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*([^ ]*?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*(?![!-\/:-@\[-`\{-~])([^ ]+?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);
Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not a single regex; it relies on some JS helper code.
Rough explanation: We first split the string into an array of substrings, divided by spaces. We then transform each substring by stripping the leading and trailing special characters. We match those special characters with [^a-zA-Z0-9ê]*; because of the leading ^ inside the brackets, the class matches every character except those listed, that is, all special characters. Between these two classes we capture the relevant characters with ([\w\W]*?). \w matches word characters and \W matches non-word characters, so [\w\W] matches any character. Appending ? to * makes the quantifier lazy, so the group stops as early as possible and leaves the trailing special characters to the class that follows. We also start the regex with ^ and end it with $ so that it covers the entire substring (they anchor the match to the start and the end of the string). With .exec(s)[1] we then run the regex on the substring and return the first capturing group from our transform function. Note that this is an empty string if a substring contains no proper characters. At the end we join the substrings with spaces.
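Since the question mentions alphabets in other languages, a different technique from the answers above is to use Unicode property escapes (\p{L} for letters, \p{N} for numbers, with the u flag) instead of listing ranges or hard-coding ê. This is only a sketch under the assumption that words are separated by single spaces; stripEdges is a made-up name.

```javascript
// Hypothetical helper: for each space-separated word, remove leading and
// trailing characters that are neither Unicode letters nor numbers.
const stripEdges = (text) =>
  text
    .split(' ')
    .map((word) => word.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ''))
    .join(' ');

console.log(stripEdges('&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'));
```

Symbols between letters are left alone because the two alternatives are anchored to the start and end of each word.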
Say I have this url:
git+https://github.com/ORESoftware/npp.git
I want to remove the first characters that do not match "http". I also want to remove the .git, but not sure how to do that reliably.
So I am looking to get this string:
https://github.com/ORESoftware/npp
As a side note, I'm not sure how that URL differs from:
www.github.com/ORESoftware/npp
You could try this:
let s = 'git+https://github.com/ORESoftware/npp.git';
console.log(s.replace(/^.*?(http.*?)\.git$/, '$1'))
Output:
https://github.com/ORESoftware/npp
This regex works as follows:
^.*? is a non-greedy match from the beginning of the string until the next element which does match, in this case the (http.*?) capturing group.
(http.*?) is a capturing group which captures everything from the http until the next match (since .*? is again non-greedy)
\.git$ matches a trailing .git on the string.
The replacement string $1 replaces the contents of the original string with only the contents of the capturing group. In this case that is everything from http until the last character before .git.
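One caveat with the single regex is that it only does anything when the string ends in .git. If that suffix may be missing, the two steps could be handled separately, along these lines. This is a sketch, not a general URL parser; cleanRepoUrl is a hypothetical name, and it assumes the part you want starts at http:// or https://.

```javascript
// Hypothetical helper: drop everything before the first http(s)://,
// then drop a trailing ".git" if there is one.
const cleanRepoUrl = (url) =>
  url
    .replace(/^.*?(?=https?:\/\/)/, '') // lookahead keeps the protocol itself
    .replace(/\.git$/, '');

console.log(cleanRepoUrl('git+https://github.com/ORESoftware/npp.git'));
console.log(cleanRepoUrl('https://github.com/ORESoftware/npp'));
```

A string that contains no http:// or https:// at all is left unchanged by the first replace.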
I'm trying to create a regex using javascript that will allow names like abc-def but will not allow abc-
(hyphen is also the only nonalpha character allowed)
The name has to be a minimum of 2 characters. I started with
^[a-zA-Z-]{2,}$, but it's not good enough so I'm trying something like this
^([A-Za-z]{2,})+(-[A-Za-z]+)*$.
It can have more than one - in a name but it should never start or finish with -.
It's allowing names like xx-x but not names like x-x. I'd like to achieve that x-x is also accepted but not x-.
Thanks!
Option 1
This option matches strings that begin and end with a letter and ensures that two - are never consecutive, so a string like a--a is invalid. To allow that case, see Option 2.
^[a-z]+(?:-?[a-z]+)+$
^ Assert position at the start of the line
[a-z]+ Match any lowercase ASCII letter one or more times (with i flag this also matches uppercase variants)
(?:-?[a-z]+)+ Match the following one or more times
-? Optionally match -
[a-z]+ Match any ASCII letter (with i flag)
$ Assert position at the end of the line
var a = [
"aa","a-a","a-a-a","aa-aa-aa","aa-a", // valid
"aa-a-","a","a-","-a","a--a" // invalid
]
var r = /^[a-z]+(?:-?[a-z]+)+$/i
a.forEach(function(s) {
console.log(`${s}: ${r.test(s)}`)
})
Option 2
If you want to match strings like a--a then you can instead use the following regex:
^[a-z]+[a-z-]*[a-z]+$
var a = [
"aa","a-a","a-a-a","aa-aa-aa","aa-a","a--a", // valid
"aa-a-","a","a-","-a" // invalid
]
var r = /^[a-z]+[a-z-]*[a-z]+$/i
a.forEach(function(s) {
console.log(`${s}: ${r.test(s)}`)
})
You can use a negative lookahead:
/(?!.*-$)^[a-z][a-z-]+$/i
Breakdown:
// Negative lookahead so that it can't end with a -
(?!.*-$)
// The actual string must begin with a letter a-z
[a-z]
// Any following characters can be a-z or -, and there must be at least 1 of them
[a-z-]+
let regex = /(?!.*-$)^[a-z][a-z-]+$/i;
let test = [
'xx-x',
'x-x',
'x-x-x',
'x-',
'x-x-x-',
'-x',
'x'
];
test.forEach(string => {
console.log(string, ':', regex.test(string));
});
The problem is that the first group requires two or more [A-Za-z]. You need to modify it to accept one or more characters:
^[A-Za-z]+((-[A-Za-z]{1,})+)?$
Edit: solved some commented issues
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('xggg-dfe'); // Logs true
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('x-d'); // Logs true
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('xggg-'); // Logs false
Edit 2: Edited to accept characters only
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('abc'); // Logs true
Use this if you want to accept such as A---A as well :
^(?!-|.*-$)[A-Za-z-]{2,}$
https://regex101.com/r/4UYd9l/4/
If you don't want to accept such as A---A do this:
^(?!-|.*[-]{2,}.*|.*-$)[A-Za-z-]{2,}$
https://regex101.com/r/qH4Q0q/4/
Both accept only strings of at least two characters from [A-Za-z-] that do not start or end with -, which is enforced by the negative lookahead (?!-|.*-$).
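A quick way to compare the two lookahead patterns side by side; the variable names and the sample list are made up for this sketch.

```javascript
const allowRepeated = /^(?!-|.*-$)[A-Za-z-]{2,}$/;
const rejectRepeated = /^(?!-|.*[-]{2,}.*|.*-$)[A-Za-z-]{2,}$/;

// Sample names chosen for illustration only.
const sampleNames = ['x-x', 'xx', 'A---A', 'x-', '-x', 'x'];

sampleNames.forEach((name) => {
  console.log(`${name}: ${allowRepeated.test(name)} / ${rejectRepeated.test(name)}`);
});
```

Only A---A should differ between the two: the first pattern accepts it and the second rejects it.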
Try this /([a-zA-Z]{1,}-[a-zA-Z]{1,})/g
I suggest the following :
^[a-zA-Z][a-zA-Z-]*[a-zA-Z]$
It validates :
that the matched string is at least composed of two characters (the first and last character classes are matched exactly once)
that the first and the last characters aren't dashes (the first and last character classes do not include -)
that the string can contain dashes and be longer than 2 characters (the second character class includes dashes and will consume as many characters as needed, dashes included).
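A small check of those three points, using sample names in the spirit of the question; namePattern and nameSamples are names invented for this sketch.

```javascript
const namePattern = /^[a-zA-Z][a-zA-Z-]*[a-zA-Z]$/;

// Sample inputs chosen for illustration only.
const nameSamples = ['abc-def', 'x-x', 'xx', 'a--a', 'abc-', '-abc', 'x'];

nameSamples.forEach((name) => {
  console.log(`${name}: ${namePattern.test(name)}`);
});
```

Note that this pattern accepts consecutive dashes such as a--a, since the middle class allows any mix of letters and dashes.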
^(?=[A-Za-z](?:-|[A-Za-z]))(?:(?:-|^)[A-Za-z]+)+$
Asserts that
the first character is a-z
the second is a-z or hyphen
If this matches
looks for groups of one or more letters prefixed by a hyphen or start of string, all the way to end of string.
You can also use the i flag to make it case-insensitive.
I would like to test whether the user typed only alphanumeric characters, optionally separated by a single "-".
hello-world -> Match
hello-first-world -> match
this-is-my-super-world -> match
hello--world -> NO MATCH
hello-world-------this-is -> NO MATCH
-hello-world -> NO MATCH (leading dash)
hello-world- -> NO MATCH (trailing dash)
Here is what I have so far, but I don't know how to handle the "-" sign so that it is only accepted once in a row, without repeating.
var regExp = /^[A-Za-z0-9-]+$/;
Try this:
/^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$/
This will only match one or more runs of alphanumeric characters separated by a single -. If you do not want to allow single words (e.g. just hello), replace the * quantifier with + to require one or more repetitions of the last group.
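Run against the examples from the question, the pattern could be checked like this (slugPattern and slugSamples are names made up for the sketch; hello is added to show the single-word case):

```javascript
const slugPattern = /^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$/;

const slugSamples = [
  'hello-world',
  'hello-first-world',
  'this-is-my-super-world',
  'hello',
  'hello--world',
  'hello-world-------this-is',
  '-hello-world',
  'hello-world-'
];

slugSamples.forEach((s) => console.log(`${s}: ${slugPattern.test(s)}`));
```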
Here you go (this works).
var regExp = /^[A-Za-z0-9]+([-]{1}[A-Za-z0-9]+)+$/;
letters and numbers greedy, single dash, repeat this combination, end with letters and numbers.
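(^-)|-{2,}|[^a-zA-Z0-9-]|(-$) looks for invalid parts (a leading dash, repeated dashes, a character that is not alphanumeric or a dash, or a trailing dash), so zero matches of that pattern would satisfy your requirement.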
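A sketch of how that inverted check might be used. The character class here includes 0-9 because the question allows digits, and the non-empty check is an added assumption, since an empty string contains no invalid part either. isValidSlug is a hypothetical name.

```javascript
// Matches any invalid part: leading dash, repeated dashes,
// a character outside a-z, A-Z, 0-9 and "-", or a trailing dash.
const invalidPart = /(^-)|-{2,}|[^a-zA-Z0-9-]|(-$)/;

// Hypothetical helper: valid means non-empty and free of invalid parts.
// No g flag, so repeated test() calls do not depend on lastIndex.
const isValidSlug = (s) => s.length > 0 && !invalidPart.test(s);

['hello-world', 'hello--world', '-hello-world', 'hello-world-', 'hello world']
  .forEach((s) => console.log(`${s}: ${isValidSlug(s)}`));
```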
I'm not entirely sure if this works because I haven't done regex in awhile, but it sounds like you need the following:
/^[A-Za-z0-9]+(-[A-Za-z0-9]+)*$/
Your requirement breaks down as follows:
One or more alphanumeric characters to start (that way you ALWAYS have an alphanumeric character at the start).
The second half is a "-" followed by one or more alphanumeric characters. This part is optional, so the whole group may occur 0 or more times. That way you'll have 0 or more instances of the dash followed by 1+ alphanumeric characters.
I'm just not sure if I did the regex properly to follow that format.
The expression can be simplified to: /^[^\W_]+(?:-[^\W_]+)+$/
Explanation:
^ match the start of string
[^\W_]+ match one or more word(a-zA-Z0-9) chars
(?:-[^\W_]+)+ match one or more groups of '-' followed by word chars
$ match the end of string
Test: https://regex101.com/r/MODQxw/1