express js: Conditional route parameters with RegEx - javascript

I need to match a route that has this form: /city-state-country
Where city can be in formats: san-francisco (multiword separated by '-') or newtown (single word).
And also some countries have state missing, so '-state' param in route should be optional.
How can I strictly match my route pattern, so that it accepts exactly 2 or 3 parameters separated by '-'?
I had something like this:
app.get(/([A-Za-z\-\']+)-([A-Za-z\']+)-([A-Za-z\']+)/, routes.index_location);
but it didn't work.
Ultimately, cases like these should not work:
/c/san-jose-ca-us
/san-jose-ca-us-someweirdstuff

san-jose-ca-us-someweirdstuff can be parsed as san-jose-ca (city) - us (state) - someweirdstuff (country), so it's a perfectly valid case.
Unless you missed something, the task is impossible in general. We know that us isn't a state, but a regexp doesn't.
You can try to limit an amount of dashes in the city to one, or enumerate all possible countries, or do something like that... Anyway, this has nothing to do with regular expressions, really.
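Here is a sketch of the "limit it" idea. It assumes, beyond what the question states, that state and country are always two-letter codes. Anchoring the pattern with ^ and $ then rejects both counter-examples from the question. When a city's last word is itself two letters long, the pattern still has to guess, which is the ambiguity described above. The helper name parseLocationPath is hypothetical. In Express 4, the capture groups of a RegExp route should show up as req.params[0], req.params[1] and so on, so the same regex could be passed to app.get.

```javascript
// Assumption: state and country are two-letter codes and the city is one or
// more dash-separated words. The ^ and $ anchors reject extra prefixes/suffixes.
const LOCATION_RE = /^\/([a-z]+(?:-[a-z]+)*?)(?:-([a-z]{2}))?-([a-z]{2})$/i;

// Hypothetical helper: returns the parts, or null when the path does not fit.
function parseLocationPath(path) {
  const m = LOCATION_RE.exec(path);
  if (!m) return null;
  // m[2] is undefined when the optional state segment is absent.
  return { city: m[1], state: m[2], country: m[3] };
}

console.log(parseLocationPath("/san-jose-ca-us"));
console.log(parseLocationPath("/newtown-us"));
console.log(parseLocationPath("/c/san-jose-ca-us"));
console.log(parseLocationPath("/san-jose-ca-us-someweirdstuff"));
```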

Actually, there is a way, but it takes a multi-step process. In the first pass, replace the dash before each two-letter state (since states are optional) with a different delimiter. In the second pass, do the same for the country, so that whatever is left can be recognized as the city. In the third pass, replace all remaining (city) dashes with some other character, then turn the state and country delimiters back into dashes. In the final pass, replace the temporary delimiter inside the city with whatever delimiter you expect.
For instance:
replace /-(al|ca|az...)-/ with ~$1- san-jose-ca-us = san-jose~ca-us
replace /-([^-~]+)$/ with ~$1 san-jose~ca-us = san-jose~ca~us
replace /-/g with * san-jose~ca~us = san*jose~ca~us
replace /~/g with - san*jose~ca~us = san*jose-ca-us
etc.
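Below is a minimal JavaScript sketch of the four passes described above. The regexes are adjusted so each pass does what the prose says: a state is only marked when the country directly follows it, and the later passes use the g flag. The STATES list is hypothetical and abbreviated because the answer elides the real list, and parseCityStateCountry is an illustrative name. The sketch resolves the 2-or-3-part split but cannot reject a string like san-jose-ca-us-someweirdstuff, which it simply reads as a long city name.

```javascript
// Hypothetical, abbreviated list of two-letter state codes; a real app
// would need the complete list (the answer elides it with "...").
const STATES = ["al", "az", "ca", "ny"];

function parseCityStateCountry(slug) {
  // Pass 1: if the second-to-last segment is a known state, mark it with "~".
  const statePass = new RegExp(`-(${STATES.join("|")})-(?=[^-]+$)`);
  let s = slug.replace(statePass, "~$1-");
  // Pass 2: mark the country (the last dash-separated segment) with "~".
  s = s.replace(/-([^-~]+)$/, "~$1");
  // Pass 3: every remaining dash belongs to the city; swap it for "*".
  s = s.replace(/-/g, "*");
  // Pass 4: turn the "~" markers back into the expected "-" delimiter.
  s = s.replace(/~/g, "-");
  // "-" now separates exactly 2 or 3 parts; restore the city's own dashes.
  const parts = s.split("-");
  const city = parts[0].replace(/\*/g, "-");
  return parts.length === 3
    ? { city, state: parts[1], country: parts[2] }
    : { city, state: undefined, country: parts[1] };
}

console.log(parseCityStateCountry("san-jose-ca-us"));
console.log(parseCityStateCountry("newtown-us"));
```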

If you only want to keep your information in a one-level hierarchy, you can try an underscore as the delimiter, so your URL would look like: city_state_country
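With an underscore between the parts, dashes can stay inside city names and the "2 or 3 parameters" rule fits in a single anchored regex. This is a sketch that assumes the city_state_country layout suggested above. The regex and the helper name parseUnderscorePath are illustrative, not part of the original answer.

```javascript
// "_" separates the parts, so "-" is free to appear inside a city name.
// Exactly two or three "_"-separated parts are accepted; the state is optional.
const UNDERSCORE_RE = /^\/([a-z-]+)_(?:([a-z-]+)_)?([a-z-]+)$/i;

// Hypothetical helper: returns the parts, or null for anything else.
function parseUnderscorePath(path) {
  const m = UNDERSCORE_RE.exec(path);
  return m ? { city: m[1], state: m[2], country: m[3] } : null;
}

console.log(parseUnderscorePath("/san-jose_ca_us"));
console.log(parseUnderscorePath("/newtown_us"));
console.log(parseUnderscorePath("/san-jose_ca_us_someweirdstuff"));
```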

Related

Splitting a string at question mark, exclamation mark, or period in javascript and retain those marks?

I was a bit surprised that no one seems to have had this exact issue in JavaScript...
I tried several different solutions, but none of them parsed the content correctly.
The closest one I tried (I stole its regex from a PHP solution):
const test = `abc?aaa.abcd?.aabbccc!`;
const sentencesList = test.split("/(\?|\.|!)/");
But the result is just
["abc?aaa.abcd?.aabbccc!"]
What I want to get is
['abc?', 'aaa.', 'abcd?','.', 'aabbccc!']
I am so confused.. what exactly is wrong?
test.match(/[a-z]*[?!.]/g) will do what you want:
const test = `abc?aaa.abcd?.aabbccc!`;
console.log(test.match(/[a-z]*[?!.]/g))
To help you out: what you wrote is not a regex. test.split("/(\?|\.|!)/"); passes a plain string, so split looks for that literal text, which never occurs in your input. A regex would be, for example, test.split(/(\?|\.|!)/);. Even that would still not be the regex you're looking for.
The problem with this regex is that it's looking for a ?, ., or ! character only, and capturing that lone character. What you want to do is find any number of characters, followed by one of those three characters.
Next, String.split does accept a regex, but it splits around the matches. With a capturing group, the delimiters come back as separate array elements (['abc', '?', 'aaa', ...]) instead of staying attached to each sentence. A function that returns the matches themselves, such as String.match, fits better here.
Putting this all together, you'll want to start your regex with something like this: /.*?/. The dot matches any character, the asterisk means 0 or more, and the question mark makes it "non-greedy", i.e. it tries to match as few characters as possible while still keeping a valid match.
To search for your three characters, you would follow this up with /[?!.]/ to indicate you want one of these three characters (so far we have /.*?[?!.]/). Lastly, you want to add the g flag so it searches for every instance, rather than only the first. /.*?[?!.]/g. Now we can use it in match:
const rawText = `abc?aaa.abcd?.aabbccc!`;
const matchedArray = rawText.match(/.*?[?!.]/g);
console.log(matchedArray);
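For comparison, this sketch puts String.split with a real regex literal next to String.match, using the question's input. With a capturing group, split keeps the marks but as separate elements. A zero-width lookbehind instead splits after each mark, so each mark stays with its sentence. The lookbehind variant is an addition not taken from the answers above, and it needs an engine with ES2018 lookbehind support.

```javascript
const test = `abc?aaa.abcd?.aabbccc!`;

// A real regex literal (no quotes). With a capturing group, split() keeps
// the delimiters, but as separate array elements.
const withCapture = test.split(/(\?|\.|!)/);

// A zero-width lookbehind splits *after* each mark, so each mark stays
// attached to its sentence.
const withLookbehind = test.split(/(?<=[?!.])/);

// match() with the g flag returns every shortest run ending in a mark.
const withMatch = test.match(/.*?[?!.]/g);

console.log(withCapture);
console.log(withLookbehind);
console.log(withMatch);
```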
The following code works, and I do not think we need a pattern match. I take that back: I have been answering in Java, and this split also discards the punctuation marks rather than keeping them.
final String S = "An sentence may end with period. Does it end any other way? Ofcourse!";
final String[] simpleSentences = S.split("[?!.]");
//now simpleSentences array has three elements in it.

Javascript - how to use regex process the following complicated string

I have the following string that will occur repeatedly in a larger string:
[SM_g]word[SM_h].[SM_l] "
Notice that in this string, after the phrase "[SM_g]word[SM_h]", there are three components:
A period (.) This could also be a comma (,)
[SM_l]
"
Anywhere from zero to all three of these components can appear after "[SM_g]word[SM_h]", and they can appear in any order. For example, the string could also be:
[SM_g]word[SM_h][SM_l]"
or
[SM_g]word[SM_h]"[SM_l].
or
[SM_g]word[SM_h]".
or
[SM_g]word[SM_h][SM_l].
or
[SM_g]word[SM_h].
or simply just
[SM_g]word[SM_h]
These are just some of the examples. The point is that there are three different components (more if you count the comma alternative to the period) that can appear after "[SM_g]word[SM_h]", in any order, and sometimes one, two, or all three of them will be missing.
On top of that, there may sometimes be a single space between " and the preceding component or [SM_g]word[SM_h].
For example:
[SM_g]word[SM_h] ".
or
[SM_g]word[SM_h][SM_l] ".
etc. etc.
I am trying to process this string by moving each of the three components inside the core string, before [SM_h] (and preserving the space, in case there is a space between " and the previous component/[SM_g]word[SM_h]).
For example, [SM_g]word[SM_h].[SM_l]" would turn into
[SM_g]word.[SM_l]"[SM_h]
or
[SM_g]word[SM_h]"[SM_l]. would turn into
[SM_g]word"[SM_l].[SM_h]
or, to simulate having a space before "
[SM_g]word[SM_h] ".
would turn into
[SM_g]word ".[SM_h]
and so on.
I've tried several combinations of regex expressions, and none of them have worked.
Does anyone have advice?
Put each component into an alternation inside a grouping construct, and allow that group to repeat at most 3 times:
(\[SM_g]word)(\[SM_h])((?:[.,]|\[SM_l]| ?"){0,3})
You may replace word with .*? if it is not a constant or specific keyword.
Then use this replacement string:
$1$3$2
var re = /(\[SM_g]word)(\[SM_h])((?:[.,]|\[SM_l]| ?"){0,3})/g;
var str = `[SM_g]word[SM_h][SM_l] ".`;
console.log(str.replace(re, `$1$3$2`));
This seems applicable to your process, in other words, changing the position of sub-strings:
(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?
Each sub-string is captured into its own capture group for your post-processing.
[SM_g] is captured to group1, word to group2, [SM_h] to group3, and string of all trailing part is to group4, [,\.] to group5, [SM_l] to group6, " ?&\\?quot;" to group7.
Thus, groups 1 to 3 are the core part, group 4 is the trailing part (for checking whether a trailing part exists), and groups 5 to 7 are sub-parts of group 4 for your post-processing. Because groups 5 to 7 sit inside a lookahead, only the component immediately after [SM_h] is captured.
You can then build the output in whatever order you want by referencing the captured groups in the replacement, as follows:
\1\2\7\3 or $1$2$7$3 etc..
For replacing in JavaScript, please refer to the post "JS Regex, how to replace the captured groups only?".
But the regex above is not precise enough, because it allows any repetition of the sub-parts of the trailing string, for example \1\2\3\5\5\5\5 or \1\2\3\6\7\7\7\7\5\5\5, etc.
To avoid this, the regex needs a condition that accepts only the possible combinations of the trailing sub-parts. Please refer to this example for the possible combinations in order: https://regex101.com/r/6aM4Pv/1/
But a regex that allows only the possible combinations becomes much more complicated, so I've left the simplified regex above to help you understand the idea. Thank you :-)
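One way to reject repeated components without writing out every allowed combination is to check them in a replacer callback. This sketch is a swapped-in technique, not the answer's regex101 approach. It reuses the {0,3} alternation from the first answer with word generalized to .*?. It treats "." and "," as one component, as the question describes, and leaves any occurrence with a repeated component unchanged. The function name moveComponents is hypothetical.

```javascript
// Up to three trailing components after [SM_h]; "." and "," count as one kind.
const re = /(\[SM_g].*?)(\[SM_h])((?:[.,]|\[SM_l]| ?"){0,3})/g;

function moveComponents(str) {
  return str.replace(re, (whole, core, close, trailing) => {
    // Split the trailing run back into its individual components.
    const parts = trailing.match(/[.,]|\[SM_l]| ?"/g) || [];
    // Normalise "," to "." and ' "' to '"' so duplicates are detected.
    const kinds = parts.map((p) => (p === "," ? "." : p.trim()));
    // A repeated component is not a valid combination: leave it untouched.
    if (new Set(kinds).size !== kinds.length) return whole;
    // Move the components in front of [SM_h], preserving any space.
    return core + trailing + close;
  });
}

console.log(moveComponents(`[SM_g]word[SM_h][SM_l] ".`));
console.log(moveComponents(`[SM_g]word[SM_h]..`));
```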

What Regex would capture both the beginning and end from of a string?

I am trying to edit a DateTime string in typescript file.
The string in question is 02T13:18:43.000Z.
I want to trim the first three characters, including the letter T, from the beginning of the string, and also the last 5 characters, .000Z (including the dot), from the end. Essentially, I want the result to look like this: 13:18:43.
From what I found the following pattern (^(.*?)T) can accomplish only the first part of the trim I require, that leaves the initial result like this: 13:18:43.000Z.
What kind of Regex pattern must I use to include the second part of the trim I have mentioned? I have tried to include the following block in the same pattern (Z000.)$ but of course it failed.
Thanks.
Any help would be appreciated.
There is no need to use regular expression in order to achieve that. You can simply use:
let value = '02T13:18:43.000Z';
let newValue = value.slice(3, -5);
console.log(newValue);
It will return 13:18:43, assuming that your string always has the same pattern. According to the documentation, slice extracts the characters from beginIndex up to, but not including, endIndex; a negative endIndex counts back from the end of the string, and endIndex is optional.
As I see you want a regex solution, does this pattern work?
(\d{2}:)+\d{2} or simply \d{2}:\d{2}:\d{2}
It matches one or more "two digits followed by a colon" pairs and then two final digits, i.e. the hh:mm:ss part.
The only disadvantage is that it doesn't check whether the values are valid (for example, that the minutes are not above 59).
I didn't include such checks because I assumed you get your dates from a source where the stored data is already valid, e.g. a database.
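The answer gives the patterns but no calling code. Here is a sketch of applying both with String.match to the question's string. match without the g flag returns the first match at index 0, or null when nothing matches.

```javascript
const value = "02T13:18:43.000Z";

// Either pattern finds the first hh:mm:ss run; match() returns null if none.
const m1 = value.match(/(\d{2}:)+\d{2}/);
const m2 = value.match(/\d{2}:\d{2}:\d{2}/);

console.log(m1 && m1[0]);
console.log(m2 && m2[0]);
```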
Solution
This should suffice to remove both the prefix from beginning to T and postfix from . to end:
/^.*T|\..*$/g
console.log(new Date().toISOString().replace(/^.*T|\..*$/g, ''))
See the visualization on debuggex
Explanation
The section ^.*T removes all characters up to and including the last encountered T in the string.
The section \..*$ removes all characters from the first encountered . to the end of the string.
The | in between coupled with the global g flag allows the regular expression to match both sections in the string, allowing .replace(..., '') to trim both simultaneously.
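The example above runs on new Date().toISOString(), so its output changes every time. This sketch applies the same replace() to the exact string from the question and to a full ISO timestamp, so the result is predictable.

```javascript
// Same regex as above: strip everything up to the last "T" and from the first "." on.
const trimTime = (s) => s.replace(/^.*T|\..*$/g, "");

console.log(trimTime("02T13:18:43.000Z"));         // 13:18:43
console.log(trimTime("2020-01-02T13:18:43.000Z")); // 13:18:43
```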

Custom Backbonejs route parameters

Consider this URL:
domains.google.com/registrar#t=b
note:
#t=b
In this example, the variable "t" stores the current tab on the page where "b" is for billing.
How can I achieve query like parameters in backbone as shown above?
I understand Backbone has routes that support parameters in urls, but this is limited to when the data is in a hierarchy, for example: item/:id
But what about application settings that would not work well in a directory like structure?
The only solution I can think of is a custom parser and break up the key/values myself.
Any ideas?
Expanding on @try-catch-finally's comment, I'm going to show you how to create your own route with a simple regex pattern that matches your conditions.
Here's the regex we'll use:
^\w+? # match one word (or at least a character) at the
# beginning of the route
[=] # until we parse a single 'equals' sign
( # then start saving the match inside the parenthesis
[a-zA-Z0-9]* # which is any combination of letters and numbers
) # and stop saving
Putting it all together the regex looks like: /^\w+?[=]([a-zA-Z0-9]*)/.
Now we set up our router,
var MyRouter = Backbone.Router.extend({
initialize: function(options) {
// Matches t=b, passing "b" to this.testFn
this.route(/^\w+?[=]([a-zA-Z0-9]*)/, "testFn");
},
routes: {
// Optional additional routes
},
testFn: function (id) {
console.log('id: ' + id );
}
});
var router = new MyRouter();
Backbone.history.start();
The TL;DR is that in MyRouter.initialize we added a route that takes the regex above and invokes the MyRouter.testFn function. Any call to http://yourdomain.com#word1=word2 will call MyRouter.testFn with the word after the equals sign (word2) as its parameter. Of course, each word can be a single character, like t=b in your question.
Expanding your parameters
Say you want to pull multiple parameters, not just one. The key to understanding how Backbone pulls your parameters is the capturing group (). A capturing group lets you "save" the match defined within the parentheses. Backbone uses these saved matches as the parameters it passes to the route callback.
So if you want to capture two parameters in your route you'd use this regex:
^\w+?[=]([a-zA-Z0-9]*)[,]\w+?[=]([a-zA-Z0-9]*)
which simply says to expect a comma delimiter between the two parameter placeholders. It would match,
t=b,some=thing
More general route patterns
You can repeat the [,]\w+?[=]([a-zA-Z0-9]*) pattern as many times as you need. If you want to generalize the pattern, you could use the non-capturing group (?: ... ) and do something like:
^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))?(?:[,]\w+?[=]([a-zA-Z0-9]*))?
The regex above will look for at least one match and will optionally take two more matches. By placing a ? token at the end of the (?: ... ) group, we say the pattern in the parenthesis may be found zero or one times (i.e. it may or may not be there). This allows you to set a route when you know you can expect up to 3 parameters, but sometimes you may want only one or two.
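To see what a route callback would receive, this sketch runs the "up to three" regex through RegExp.prototype.exec without Backbone. The groups after index 0 are the values Backbone would hand to the callback. Groups that did not take part in the match come back as undefined from exec, and Backbone may normalize them before calling you.

```javascript
// Up to three comma-separated key=value pairs; the 2nd and 3rd are optional.
const upToThree = /^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))?(?:[,]\w+?[=]([a-zA-Z0-9]*))?/;

// exec() returns [wholeMatch, group1, group2, group3]; slice(1) keeps the groups.
console.log(upToThree.exec("t=b").slice(1));            // ["b", undefined, undefined]
console.log(upToThree.exec("t=b,some=thing").slice(1)); // ["b", "thing", undefined]
console.log(upToThree.exec("t=b,z=d,g=f").slice(1));    // ["b", "d", "f"]
```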
Is there a truly general route?
You may be asking yourself, why not simply use one greedy (?: ... ) group that will allow an unlimited number of matches. Something like,
^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))*
With this regex pattern you must supply one parameter, but you can take an unlimited number of subsequent matches. Unfortunately, while the regex will work fine, you won't get the desired result. (See, for example, this Question.)
That's a limitation of JavaScript regexes. With a repeated capturing group (i.e. the ([a-zA-Z0-9]*) group repeats with every repetition of the (?: ... ) non-capturing group), JavaScript only keeps the last match. So if you pass a route like t=b,z=d,g=f,w=1ee, you'll only get b (from the first, non-repeated group) and 1ee; d and f are lost. So, unfortunately, you need to know the maximum number of parameters your route should take and code them into your regex pattern manually, as we did above.
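The first part of this sketch demonstrates the lost captures. The second shows the "custom parser" alternative mentioned in the question: capture the whole key/value list with one group and split it yourself. ALL_PAIRS and parseFragmentParams are hypothetical names, not Backbone APIs. Wiring them in, for example by passing ALL_PAIRS to this.route and calling the parser inside the callback, is an assumption.

```javascript
// The repeated group keeps only its last capture: "d" and "f" are lost.
const repeated = /^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))*/;
console.log(repeated.exec("t=b,z=d,g=f,w=1ee").slice(1)); // ["b", "1ee"]

// Alternative: one group captures the entire comma-separated list.
const ALL_PAIRS = /^(\w+=[a-zA-Z0-9]*(?:,\w+=[a-zA-Z0-9]*)*)$/;

// Hypothetical helper: turn "k1=v1,k2=v2" into { k1: "v1", k2: "v2" }.
function parseFragmentParams(fragment) {
  const params = {};
  for (const pair of fragment.split(",")) {
    const [key, value = ""] = pair.split("=");
    if (key) params[key] = value;
  }
  return params;
}

const whole = ALL_PAIRS.exec("t=b,z=d,g=f,w=1ee")[1];
console.log(parseFragmentParams(whole)); // { t: "b", z: "d", g: "f", w: "1ee" }
```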

What's wrong with this regular expression to find URLs?

I'm working on some JavaScript to extract a URL from a Google search URL, like so:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've run this through a couple of regex testers and it seems to work, but in practice the string I get back looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!
Remove the parentheses from the regex unless you need to process the .org separately (unlikely, since it is a literal). As per @Mark's comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;
What you're actually getting is an array of 2 results, the first being the whole match, the second - the group you defined by using parens (.org).
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
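The trailing ",.org" comes from turning that result array into a string. Converting an array to a string, as string concatenation or alert() does, joins its elements with commas. This sketch reproduces it with the question's URL and shows that taking index 0 gives only the whole match.

```javascript
const theURL = "http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8";

// With a group, exec() returns [wholeMatch, group1]; String() joins them with ",".
const withGroup = /[\w\d]+(\.org)/i.exec(theURL);
console.log(String(withGroup)); // thisisthepartiwanttofind.org,.org

// Index 0 is always the whole match, group or no group.
const noGroup = /[\w\d]+\.org/i.exec(theURL);
console.log(noGroup[0]); // thisisthepartiwanttofind.org
```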
You should escape the . (dot) in your (.org) group, or it matches any character. So your regex would become:
/[\w\d]+(\.org)/
To match the url in your example you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$
