I have a situation where I need to take a sentence and check certain boxes and/or enter numbers into text inputs.
The boxes are things like..
every day
every week
every month
So I'm not sure if it would be better to use different regex objects to search for the different situations, or if I should try to make 1 big regex object and then switch/case the results.
Here are some examples of what the string can be:
every day
every weekday
every week on sunday, monday, wednesday
every 3 weeks on sunday, friday
every first sunday of every month
day 1 of every 2 months
every january 1
I can do OK when it comes to regex, but this is out of my league, and I'm not sure if I should use different regex objects or try to make one big one. Thanks for any help.
It looks like /\w+\s?(\d+?)?\s(\w+)/ takes care of "every week", "every day", "every month", "every year", "every 10 days", etc.
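A quick check of that pattern against a few of the sample strings shows where it helps and where it stops helping. This is only a sketch; the `describe` helper is a hypothetical name introduced for illustration.

```javascript
// The pattern quoted above, unchanged.
const simple = /\w+\s?(\d+?)?\s(\w+)/;

// Hypothetical helper: returns the two captures (count and unit), or null.
function describe(text) {
  const m = text.match(simple);
  return m ? { count: m[1], unit: m[2] } : null;
}

console.log(describe("every week"));              // count: undefined, unit: "week"
console.log(describe("every 10 days"));           // count: "10", unit: "days"
console.log(describe("day 1 of every 2 months")); // count: "1", unit: "of"
```

The last call matches, but on the wrong words, which is the kind of case that pushes this toward a real parser.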
I don't think regular expressions are going to help you much here. They might be able to do some of the really simple matching, but what you're really looking at here is a grammar parsing problem. You might want to read up on languages designed to express abstract grammars, like Extended Backus-Naur Form (EBNF). It sounds intimidating, but it's really not that hard to grasp. Once you're able to describe your grammar in a formal language, suddenly parsing it becomes much easier (at the very least, you have a specification of what kind of inputs are valid). For example, you might have the following EBNF for your problem:
expression = "every" time-unit|time-unit-list|composite-time-unit
time-unit = { ordinal } "day" | "weekday"
ordinal = "first" | "second" | "third" | ...
And so on. This is not a trivial job; parsing an English sentence, even a fairly restrictive one like this, can be quite involved. However, it is a well-established and rigorous method.
Once you've got your grammar defined, you can build a parser for it. This is a matter of looking for terminals (like "every") and then matching them to a rule. For example, you might have something like the following (pseudocode):
words = split(/\s+/,lowercase(input))
if( words[0] == "every" ) {
switch( words[1] ) {
case "first":
case "second":
case "third":
...
parseTimeUnit(words);
break;
case "day":
everyDay = true;
break;
...
}
}
Depending on the complexity of your grammar, you might look into automatically generating the parser with something like Yacc.
You've bitten off a big hunk of a problem, but it's a rewarding one to work through, so good luck!
Update: I only suggested Yacc because it's among the oldest parser generators I know of. However, there are a million of them, and a lot of them will emit Javascript for you. You can check out Wikipedia's comparison of parser generators for more information.
It seems like what you are trying to do is parse a string into some data structure, and that I believe is not a job for regex (although it could be a part of the solution).
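To make the "parse into a data structure" idea concrete, here is a minimal hand-written parser sketch for the seven sample strings in the question. The function name `parseRecurrence`, the shape of the returned object, and the word lists are all assumptions made for illustration; a real grammar would need more rules and better error messages.

```javascript
const DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];
const MONTHS = ["january", "february", "march", "april", "may", "june", "july",
                "august", "september", "october", "november", "december"];
const ORDINALS = ["first", "second", "third", "fourth", "last"];
const UNITS = ["day", "weekday", "week", "month", "year"];

// Hypothetical parser: turns a recurrence sentence into a plain object.
function parseRecurrence(input) {
  // Tokenize: lowercase, treat commas as whitespace, split on runs of whitespace.
  const tokens = input.toLowerCase().replace(/,/g, " ").trim().split(/\s+/);
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const expect = (word) => {
    if (next() !== word) throw new SyntaxError('expected "' + word + '" in "' + input + '"');
  };

  // interval-unit = "every" [ number ] unit     e.g. "every 3 weeks"
  function intervalUnit() {
    expect("every");
    let interval = 1;
    if (/^\d+$/.test(peek())) interval = Number(next());
    const unit = String(next()).replace(/s$/, ""); // "weeks" -> "week"
    if (!UNITS.includes(unit)) throw new SyntaxError('unknown unit "' + unit + '"');
    return { interval, unit };
  }

  let result;
  if (peek() === "day") {
    // "day" number "of" interval-unit           e.g. "day 1 of every 2 months"
    next();
    const dayOfMonth = Number(next());
    expect("of");
    result = { dayOfMonth, ...intervalUnit() };
  } else if (peek() === "every" && ORDINALS.includes(tokens[1])) {
    // "every" ordinal weekday "of" interval-unit
    next();
    const ordinal = next();
    const day = next();
    if (!DAYS.includes(day)) throw new SyntaxError('expected a weekday, got "' + day + '"');
    expect("of");
    result = { ordinal, day, ...intervalUnit() };
  } else if (peek() === "every" && MONTHS.includes(tokens[1])) {
    // "every" month number                      e.g. "every january 1"
    next();
    result = { interval: 1, unit: "year", month: next(), dayOfMonth: Number(next()) };
  } else {
    // interval-unit [ "on" weekday-list ]
    result = intervalUnit();
    if (peek() === "on") {
      next();
      result.days = [];
      while (DAYS.includes(peek())) result.days.push(next());
    }
  }
  if (pos < tokens.length) throw new SyntaxError('unexpected "' + peek() + '"');
  return result;
}

console.log(parseRecurrence("every 3 weeks on sunday, friday"));
console.log(parseRecurrence("every first sunday of every month"));
```

The resulting object can then drive the checkboxes and number inputs directly, instead of switching on regex captures.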
First post on here!
I've done a couple hours of research, and I can't seem to find any actual answers to this, though it may be my understanding that's wrong.
I want to convert a string, let's say "Hello 123", into any base N, let's say N = 32 for simplicity.
My Attempt
Using Javascript's built-in methods (Found through other websites, and):
function stringToBase(string, base) {
return parseInt(string, 10).toString(base);
}
So, this encodes the string to base 10 (decimal) and then into the base I want, however the caveat with this is that it only works from 2 to 36, which is good, but not really in the range that I'm looking for.
More
I'm aware that I can use the JS BigInt, but I'm looking to convert with bases as high as 65536 that uses an arbitrary character set that does not stop when encountering ASCII or (yes I'm aware it's completely useless, I'm just having some fun and I'm very persistent). Most solutions I've seen use an alphabet string or array (e.g. "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+-").
I've seen a couple threads that say that encoding to a radix which is not divisible by 2 won't work, is that true? Since base 85, 91, exist.
I know that the methods atob() and btoa() exist, but this is only for Radix/Base 64.
Some links:
I had a look at this github page: https://github.com/gliese1337/base-to-base/blob/main/src/index.ts , but it's in typescript and I'm not even sure what's going on.
This one is in JS: https://github.com/adanilo/base128codec/blob/master/b128image.js . It makes a bit more sense than the last one, but the fact there is a whole github page just for Base 128 sort of implies that they're all unique and may not be easily converted.
This is the aim of the last and final base: https://github.com/qntm/base65536 . The output of "Hello World!" for instance, is "驈ꍬ啯𒁗ꍲ噤".
(I can code Java much better than JS, so if there is a Java solution, please let me know as well.)
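One common way to handle an arbitrary alphabet is to treat the string's bytes as a single large number and repeatedly divide by the base. Below is a minimal sketch of that idea using BigInt; the name `encodeBaseN` is hypothetical, it assumes UTF-8 as the byte encoding, and it is not how base65536 or Base85 are actually specified (those define their own block layouts). Because it is plain integer division, the base does not need to be a power of 2.

```javascript
// Hypothetical helper: encodes the UTF-8 bytes of `text` using the symbols in `alphabet`.
// The base is simply the number of symbols, so any base >= 2 works.
function encodeBaseN(text, alphabet) {
  // Array.from splits by code point, so symbols outside the BMP count as one digit.
  const digits = Array.from(alphabet);
  const base = BigInt(digits.length);
  if (base < 2n) throw new RangeError("alphabet needs at least 2 symbols");

  // Interpret the bytes as one big base-256 number.
  let value = 0n;
  for (const byte of new TextEncoder().encode(text)) {
    value = value * 256n + BigInt(byte);
  }

  // Repeated division by the base yields the digits, least significant first.
  if (value === 0n) return digits[0];
  let out = "";
  while (value > 0n) {
    out = digits[Number(value % base)] + out;
    value /= base; // BigInt division truncates
  }
  return out;
}

console.log(encodeBaseN("Hi", "0123456789abcdef")); // "4869" (0x48, 0x69)
console.log(encodeBaseN("A", "0123456789"));        // "65"
```

One limitation of this sketch: leading zero bytes are lost, so a decoder would need the original length or a padding convention to round-trip every input.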
With some addresses, a building might take up multiple door numbers, for example 13 - 15 StreetName.
The "13 - 15" is the part I am focusing on. How would you write a regular expression to pick out this part?
I thought of something like [0-9] - [0-9], which works for 1 - 3. For an address like 12 - 13, [0-9][0-9] - [0-9][0-9] could work, but I also want to make sure that something like 13 - 3 doesn't match, as the addresses cannot go backwards, and that something like 99 - 103 does match, where the numbers are different lengths. Is it really simple and I'm missing something?
I'm still a student and not very good at regular expressions, I just need it for some js I'm doing and have spent far too long getting nowhere.
Thank you.
There's not really a good way to do this since you're effectively trying to parse something that is not a regular language. Referencing something that you've seen before is allowed by several regular expression languages though, but that won't help you in this specific case.
We can easily go for the brute-force solution though :)
https://regex101.com/r/4bRmiL/1
^(\d{3} - \d{3,}|\d{2} - \d{2,}|\d - \d+)$
As you can see though, it still breaks for cases like 5 - 1 which are probably invalid. That's something you need to check outside of the regex.
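The "check outside of the regex" step could look like the sketch below: the regex only picks out the two numbers, and ordinary numeric comparison rejects backwards ranges. The function name `extractDoorRange` is hypothetical, and the sketch assumes the range sits at the start of the address.

```javascript
// Hypothetical helper: returns { from, to } for a leading "13 - 15" style range,
// or null when there is no range or it runs backwards.
function extractDoorRange(address) {
  // Two runs of digits separated by a hyphen, with optional spaces around it.
  const m = address.match(/^\s*(\d+)\s*-\s*(\d+)\b/);
  if (!m) return null;
  const from = Number(m[1]);
  const to = Number(m[2]);
  // Compare as numbers, so 99 - 103 passes and 13 - 3 fails.
  return from < to ? { from, to } : null;
}

console.log(extractDoorRange("13 - 15 StreetName")); // { from: 13, to: 15 }
console.log(extractDoorRange("99 - 103 StreetName")); // { from: 99, to: 103 }
console.log(extractDoorRange("13 - 3 StreetName"));   // null
```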
I don't think this is even possible with regex. I would instead just do something like this:
var addresses = [
"12 - 14 State St.",
"14 - 12 State St.",
];
addresses.forEach(address => console.log(validAddress(address)));
function validAddress(address) {
return !!(address.match(/\d+\s?-\s?\d+/) || []).filter(a => {
// Convert to numbers first: comparing the strings would wrongly reject "99 - 103".
var numbers = a.split('-').map(b => Number(b.trim()));
return numbers.length === 2 && numbers[0] < numbers[1];
}).length;
}
I'm looking for an algorithm that, given a first and a last name, generates an id consisting purely of alphanumeric characters. I would also want the id to be as short as possible while maintaining uniqueness. I was hoping for around 10-12 characters: something that a human could enter.
I have read about suggestions of computing a hash, then simply taking the first n bytes and calling modulus with 36 (the idea is that you have a mapping from 0-35 to the letters a-z 0-9).
Also heard suggestions of maybe truncating and using a higher base to pack more bits into the id.
I guess I could append some encoding of the generation time to the produced id to make it unique but again I need a way for this to be short.
What's your opinion? Are there specific hashing algorithms/truncating methods I should go for? I'll be implementing it in javascript as part of a static html page used as a local webapp.
I am just worried as crypto is hard and I would welcome advice from anyone who thinks they know what they are doing with it.
If it helps, the number of ids I expect to make is small: a 4-digit count, so a few thousand.
One technique would be to just use a combination of the first name and last name, similar to how large companies create email aliases. If you only are creating a few thousand, it wouldn't be hard to work around collisions. These are probably the most human friendly type of id to deal with. For example, Bill Smith would be billsm or something similar.
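A minimal sketch of the alias approach with collision handling follows. The name `makeAlias`, the "first name plus two letters of the last name" rule, and the numeric suffix for collisions are all assumptions for illustration; it also assumes names reduce to ASCII letters and digits, so names in other scripts would need a different cleaning step.

```javascript
// Hypothetical helper: builds a short alias and records it in `taken` (a Set of used aliases).
function makeAlias(firstName, lastName, taken) {
  // Lowercase and drop everything that is not a-z or 0-9.
  const clean = (s) => s.toLowerCase().replace(/[^a-z0-9]/g, "");
  const base = clean(firstName) + clean(lastName).slice(0, 2);
  // On a collision, append 2, 3, 4, ... until the alias is unused.
  let alias = base;
  for (let n = 2; taken.has(alias); n++) alias = base + n;
  taken.add(alias);
  return alias;
}

const taken = new Set();
console.log(makeAlias("Bill", "Smith", taken)); // "billsm"
console.log(makeAlias("Bill", "Smythe", taken)); // "billsm2"
```

With only a few thousand ids, a Set lookup like this is enough; nothing cryptographic is needed for uniqueness.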
If you don't want your ids to be easily guessable (though if guessing an id breaks your security model you should probably look into that) then you can go with something like the following (untested javascript pseudocode):
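var sequence = 0,
    shardId = 1,            // 5 bits: 0-31
    epoch = 1357027200000;  // custom epoch in milliseconds (1 Jan 2013)
function nextId() {
    // 10-bit sequence counter, wraps at 1024
    sequence = (sequence + 1) % 1024;
    var now = Date.now() - epoch;
    // JavaScript's << and | truncate to 32 bits, which would discard most of
    // the timestamp, so the fields are packed with BigInt arithmetic instead:
    // timestamp * 2^15 + shardId * 2^10 + sequence
    var id = BigInt(now) * 32768n + BigInt(shardId * 1024 + sequence);
    return id.toString(36);
}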
Hi everybody.
Several days ago an interviewer asked me a question,
and I couldn't answer it. Maybe there is a JS guru on this site. =)
We have just one string: VARNAME[byte][byte][byte][byte], where each [byte] is a placeholder for one char.
Question: how do you write correct JS if each pair of [byte][byte], read as a hex number, MUST NOT be more than 1000 in decimal?
I tried the following:
1) VARNAME[20][3D][09][30], which is equal to
2) VARNAME<space>=<tab>0, and that is correct JS code, BUT!
3) 0x203D = 8253 in decimal, which is not correct; it must be <= 1000
0x0930 = 2352, also not correct; it must be <= 1000!
I tried replacing 20 with 09; then:
0x093D = 2365, which is better, but still more than 1000 =(
How can I do it? The interviewer says it is possible because the chars can be anything (I mean
varname;<space><space><space> and so on), but he could not tell me an answer.
Who can solve it, guys?
The question as described has no answer.
The lowest code point that can appear in an expression context after a variable reference is \u0009, which, as you point out, results in a value greater than 1000 (>= 2304). The ECMAScript 5 specification requires a JavaScript environment to generate an early error when an invalid character is encountered. The only things legal here are an identifier continuation character or an InputElementDiv, which is one of WhiteSpace, LineTerminator, Comment, Token, or DivPunctuator, none of which allow code points in the range \u0000-\u0003, which would be required for the question to have an answer.
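The arithmetic behind those ranges can be checked with a short sketch. The helper names `pairValue` and `maxHighByte` are hypothetical; the only assumption is that each pair is read as a big-endian 16-bit number, as in the question.

```javascript
// Hypothetical helper: value of a [byte][byte] pair read as one hex number.
const pairValue = (hi, lo) => hi * 256 + lo;

// Hypothetical helper: the largest first byte that can still keep a pair <= limit.
const maxHighByte = (limit) => Math.floor(limit / 256);

console.log(pairValue(0x20, 0x3d)); // 8253, the space-and-equals pair from the question
console.log(pairValue(0x09, 0x00)); // 2304, the smallest value with a tab as first byte
console.log(maxHighByte(1000));     // 3, so the first byte would have to be 0x00-0x03
```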
There are some environments that terminate parsing when a \u0000 is encountered (the C end-of-string character), but those do not conform to ES5 in this respect.
The statement that JavaScript allows any character in this position is simply wrong.
This all changes if VARNAME is in a string or a regular expression, however, since both can contain characters in the range \u0000-\u0003. If this is the trick the interviewer is looking for, I can only say that it was an unfair question.
Remember, in an interview, you are interviewing the company as much, or more, than the company is interviewing you. I would have serious reservations about joining a company that considers such a question a valid question to use in an interview.
I asked a similar question on how to do this on the server side (SQL), however it makes more sense to accomplish this on the client side, based on the app architecture.
I've got an MVC3 app with Razor on the .NET Framework, where I have model data available, and I would like to parse a given string and return the first dollar value from it using JavaScript / regex.
For example, each of the following lines represents a sample data set:
Used knife set for sale $200.00 or best offer.
$4,500 Persian rug for sale.
Today only, $100 rebate.
Five items for sale: $20 Motorola phone car charger, $150 PS2, $50.00 3 foot high shelf.
I've seen a few issues already including the # in JS and a few other pitfalls I would like to try to avoid.
Thanks.
var m = line.match(/\$[0-9,]+\.?\d*/);
if (m)
return m[0];
should give you a hint. This regex returns a string consisting of a dollar sign, some digits or commas, and optionally a dot followed by a few more digits. You might want to make it stricter (only 2 decimals, not starting with zero, etc.).
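One possible stricter variant is sketched below. The function name `firstDollarValue` and the exact rules (comma groups of three digits, an optional cents part of exactly two digits) are assumptions about what "stricter" should mean here, so adjust them to your data.

```javascript
// Hypothetical helper: returns the first dollar amount in `line` as a number, or null.
function firstDollarValue(line) {
  // "$", then either comma-grouped digits (4,500) or plain digits (200),
  // then an optional ".dd" cents part. A trailing "." or "," is not consumed.
  const m = line.match(/\$(\d{1,3}(?:,\d{3})+|\d+)(\.\d{2})?/);
  if (!m) return null;
  // Strip the "$" and the thousands separators before converting.
  return parseFloat(m[0].replace(/[$,]/g, ""));
}

console.log(firstDollarValue("Used knife set for sale $200.00 or best offer.")); // 200
console.log(firstDollarValue("$4,500 Persian rug for sale."));                   // 4500
console.log(firstDollarValue("Five items for sale: $20 Motorola phone car charger, $150 PS2, $50.00 3 foot high shelf.")); // 20
```

Without the `g` flag, `match` stops at the first hit, which is what "first dollar value" needs.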