I'm having no end of trouble coming up with an appropriate regex or set of regexes.
What I want to do is detect:
A continuous run of digits of length 13 through 19
A continuous run of 13 through 19 digits interspersed with whitespace
A continuous run of 13 through 19 digits interspersed with dashes
The basic business requirement is to warn a user that they may have entered a credit card number in a text field and they ought not to do that (though only a warning, not a hard error). The text field could span multiple lines, could be up to 8k long, a CC # could be embedded anywhere (unlikely to split across multiple lines), and there could be more than 1 CC # (though detecting the presence of at least 1 is all I need; I don't need the actual value). I don't need to validate the check digit.
The length check can be done externally, i.e. I'm happy to loop through a set of matches, drop any whitespace/dashes and then do a length comparison.
But... JavaScript regex is defeating my every attempt (just not in the right "head space") so I thought I'd ask here :)
Thanks!
Here are all the rules required for credit card validation. You should be able to easily do this in Javascript.
Seems like a fairly simple regex. I've made your requirements a little more stringent - as it was, you'd match lists of dates, such as "1999-04-02 2009-12-09 2003-11-21". I've assumed the sections will be in three to six groups and will themselves be groups of three to eight numbers/dashes/whitespace. You can tune these numbers fairly easily.
/[0-9]{13,19}|([0-9- ]{3,8}){3,6}/
I'm not sure this is what you want, but I figured it was worth a go. If this isn't what you're looking for, perhaps you could show us some regexes that come close or give an impression of what you want?
You want to regex 8k of text with JavaScript? Don't waste your time (or the user's poor CPU) doing this with JavaScript. You're going to need server-side validation anyway, so just leave it to that.
Here is a great script for many credit card validation steps.
http://javascript.internet.com/forms/val-credit-card.html
/(?:\d[ -]?){12,18}\d/
Checks for 13-19 digits, each of which (except the last) may be followed by a single space or dash. This is a very liberal regex, though, and you may want to narrow it down to known actual credit card formats.
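A minimal sketch of how that pattern could drive the warning. The function name mayContainCardNumber is hypothetical; only the regex comes from the answer above, and since only the presence of a candidate matters, test() is enough.

```javascript
// Pattern from the answer above: 13 to 19 digits, each digit except the
// last optionally followed by one space or one dash.
const CARD_LIKE = /(?:\d[ -]?){12,18}\d/;

// Hypothetical helper: true if the text contains at least one run that
// looks like a card number. It does not validate the check digit.
function mayContainCardNumber(text) {
  return CARD_LIKE.test(text);
}

console.log(mayContainCardNumber("my card is 4111 1111 1111 1111, thanks")); // true
console.log(mayContainCardNumber("call 555-1234 after 5")); // false
```

Because the pattern is so liberal, it will also flag things like a list of dates separated by single spaces, so it is only suitable for a soft warning.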
Related
I have been working on a regular expression to split a text into sentences, but I have been having problems with numbers like 13.4 and with email addresses, because of the '.'. Does anyone know how to fix it?
/([^\n\r\.!\?:;]+[\.!\?:;]\s)|([^\.!\?]+$)/g
As @WiktorStribizew said, this task is too hard to be completed with just a regex. But you can still get an approximation by matching punctuation followed by whitespace, a new line, or the end of the text:
/.*?[.!?:;](\s|[\r\n]|$)/gm
You can see an example here.
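As a rough illustration of that approximation, the sketch below wraps the regex in a hypothetical splitSentences helper. The sample sentence is made up; it shows that a '.' inside 13.4 or inside an email address is not treated as a sentence end, because it is not followed by whitespace.

```javascript
// Hypothetical helper around the regex from the answer above.
// Text after the last punctuation mark is not matched and is dropped.
function splitSentences(text) {
  const matches = text.match(/.*?[.!?:;](\s|[\r\n]|$)/gm) || [];
  return matches.map((sentence) => sentence.trim());
}

const parts = splitSentences(
  "Write to bob@example.com today. It costs 13.4 dollars! Done?"
);
console.log(parts);
// [ 'Write to bob@example.com today.', 'It costs 13.4 dollars!', 'Done?' ]
```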
I'm working on a regex that lets me split into chunks a long text that could have #variables# inside. The rules to do the splitting basically are:
Split by each #photo# or #childphoto# variable, looking behind or ahead for text so as not to cut a sentence.
Each chunk should have only one #photo# or #childphoto# variable, or not have any of these variables
Also, the chunk should be less than 350 characters
The chunk should not have to cut words or sentences
The chunk should not have to cut any of the possible text variables into the text #anyOtherVariables#
Currently, I have this Regex
/^.*[\S\s]{0,350}[\s\S](?<=(#photo#|#childphoto#)).*/
That currently is working with the .match() JavaScript method to extract the chunks of text that have the variables using the 'look behind' approach, but is not working with the other chunks that do not match the 'look behind' condition, is there a way to include the other parts?
Here are the regex and the test case: https://regex101.com/r/kdKHkQ/1
I would really appreciate any help with that.
Here is a single JavaScript regex that does what you have specified:
^\b(?=([^]*))[^]{0,350}$(?<=(?![^]{1,}\1$)(?:#(photo|childphoto)#)?[^]*?)(?<!(?=\1$)(?:[^]*?#(photo|childphoto)#){2}[^]*?)
Demo on regex101
It enforces the 350 character limit by taking a snapshot (using lookahead) at the beginning, consuming and capturing up to 350 characters, and then using a lookbehind to look no further back than the snapshotted beginning, to assert that one of the variables in question is inside the just-captured string. Then it uses a negative lookbehind to enforce that there are not two or more of the variables in question in the just-captured string.
I did not understand your rule "The chunk should not have to cut any of the possible text variables into the text #anyOtherVariables#". If by that you mean that the lines containing variables other than #photo# or #childphoto# should be skipped over (not matched), then this regex does not do that, but it could be easily modified to do so.
Now, practically speaking, it would probably be better to implement this in code, or a combination of code and regex, but this demonstrates that exactly what you asked is possible with a pure regex.
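For comparison, here is a minimal sketch of the "code plus regex" route. It is not equivalent to the regex above: it assumes sentences end in '.', '!' or '?' followed by whitespace, packs whole sentences greedily, and leaves a single sentence untouched even if it is over the limit or holds two variables. The names chunkText, countPhotoVars and MAX_LEN are hypothetical.

```javascript
const MAX_LEN = 350;
const PHOTO_VAR = /#(?:photo|childphoto)#/g;

// Number of #photo# / #childphoto# variables in a string.
function countPhotoVars(s) {
  return (s.match(PHOTO_VAR) || []).length;
}

// Greedily pack whole sentences into chunks so that each chunk stays
// within maxLen and contains at most one photo variable.
function chunkText(text, maxLen = MAX_LEN) {
  // Split after sentence-ending punctuation; other #variables# stay intact
  // because a sentence is never cut.
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    const tooLong = candidate.length > maxLen;
    const tooManyVars = countPhotoVars(candidate) > 1;
    if (current && (tooLong || tooManyVars)) {
      chunks.push(current); // close the current chunk, start a new one
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText("Hi #photo# there. Look #childphoto# now. Bye."));
// [ 'Hi #photo# there.', 'Look #childphoto# now. Bye.' ]
```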
I would like to point out that calling this "splitting by each #photo# or #childphoto# variable" is disingenuous, and if I actually took that literally, it would be breaking your other rule, that the chunk should not cut sentences. That is probably why you got downvoted.
I'm posting my answer here, despite the fact that you got downvoted, because I already answered this on reddit and you disappeared without commenting.
I'm solving the Ranges challenge in RegexGolf, but I'm somewhat stuck in trying to shorten the regex.
Here is a screenshot of the conditions -
My current solution is \b[a-f]+\b. This pattern has the required range [a-f] in a word boundary. While this works, the regex has 10 characters, and the result list shows submissions with 8, and even 1 character.
Would appreciate any insights on improving this regex.
First, please note that shorter doesn't necessarily mean better, faster or more readable. But as this is a golfing challenge:
This site seems to handle every input as a separate string. While the word boundaries you are using are fine, using start and end of string anchors (^ and $) will be 1 character shorter each. I don't see how it could be minimized further, so your regex could be
^[a-f]+$
Note: One of the 1-score solutions has the comment "i dont know regex but i know javascript", so I'd guess that there was some cheating involved.
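A quick way to check that the two patterns agree when every input is a single word. The sample words are made up for illustration, since the challenge's actual lists are not reproduced here.

```javascript
const withBoundaries = /\b[a-f]+\b/; // original, 10 characters
const withAnchors = /^[a-f]+$/; // shortened, 8 characters

// Hypothetical inputs: the first three use only the letters a-f.
const samples = ["abac", "facade", "beef", "boar", "coax", "dirt"];

const agree = samples.every(
  (word) => withBoundaries.test(word) === withAnchors.test(word)
);
console.log(agree); // true

// The patterns differ on multi-word input: "bad cab" passes the
// word-boundary version but not the anchored one.
console.log(withBoundaries.test("bad cab"), withAnchors.test("bad cab")); // true false
```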
This question already has answers here:
How to count the correct length of a string with emojis in javascript?
(9 answers)
Closed 8 years ago.
I ran into an issue with counting unicode characters. I need to count total combined unicode characters.
Take this character for example:
द्ध
if you use .length property on this string it gives you 3. Which is technically correct as it is a combination of
द, ् and ध
However, put द्ध in a text area and you will see, when using the arrow keys, that it is treated as one character. Only when you use backspace do you realize that there are 3 characters.
Edit: Also for your test case please consider that it could be a word. It could be something like,
द्धद्द
This should give 2 with .length, but gives 6
This is a problem when you want to get or set the current caret position in input elements.
Your example “द्ध” is a string of three Unicode characters, and the length property correctly indicates this.
What you apparently want to count is “characters” in some other sense, something like “what a speaker of a language intuitively sees as one character”. This is a vague and mutable concept. The Unicode Standard Annex UAX #29, Unicode Text Segmentation, tries to analyze the concept, calling it “grapheme cluster”, and describes some algorithms for working with it.
Unfortunately, JavaScript has no built-in tools for recognizing whether a character is e.g. a combining mark and thus should be regarded as part of a cluster. However, if you can limit yourself to handling just one writing system, you can probably code the operations manually, referring to possible Unicode characters by their code numbers.
Moreover, if the intent is to make the count match the way some input editor works (e.g. how the arrow keys move over characters), you would need to know the logic of that editor. It may implement Unicode grapheme clusters in some sense, or something else.
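A rough sketch of the "one writing system, coded manually" idea for Devanagari. It rests on two assumptions that are not from the answer: the engine supports ES2018 Unicode property escapes (\p{M} for combining marks), and a consonant following a virama (U+094D) belongs to the same cluster. The function name is hypothetical, and the result will not necessarily match how a given editor moves its caret.

```javascript
const VIRAMA = 0x094d; // DEVANAGARI SIGN VIRAMA

// Hypothetical counter: a new cluster starts at every character that is
// neither a combining mark nor directly preceded by a virama.
function countDevanagariClusters(text) {
  let count = 0;
  let joinNext = false;
  for (const ch of text) {
    const isMark = /\p{M}/u.test(ch);
    if (!isMark && !joinNext) {
      count += 1;
    }
    joinNext = ch.codePointAt(0) === VIRAMA;
  }
  return count;
}

// U+0926 U+094D U+0927: the three-code-unit example from the question.
console.log(countDevanagariClusters("\u0926\u094d\u0927")); // 1
// The six-code-unit word from the question.
console.log(countDevanagariClusters("\u0926\u094d\u0927\u0926\u094d\u0926")); // 2
```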
I'm getting bombarded by spam with posts like the one below, so what would be the best and most efficient way to remove all the jargon from something like this:
<textarea id="comment">ȑ̉̽ͧ̔͆ͦ̊͛̿͗҉̷̢̧̫̗̗͎͈͕e̷̪͓̼̼̣̻̻͙͔̳̘̗͙̬̱͎ͭ̃͗ͩͯͥͬ̂ͧ͐͌̑̅͢͜ͅd̴̦̺̖̣͎̲̥͕̗̺̯̤͗ͬ͌ͧ̓͒ͭ́̋ͩͥ͊̇̓̌ͫ̃́́͠</textarea>
I'm assuming RegEx, but what exactly are those things called and how would they be referenced in a RegExp? The problem lies within a <textarea> tag, and upon retrieving the value, I'd like to be able to remove all that jargon from the value and have it only display the real characters, which in this case should be "red".
Allowing other kinds of Unicode characters is essential, but not characters that stack on top of each other.
Zalgo waits behind the wall.
You want to filter out combining characters, such as the diacritical marks listed here.
You should be able to get away with a simple character class pattern match, i.e.:
fooString.replace(/[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]/g, "");
If you want to limit content to one combination per character (not that this really alleviates all negative side-effects), you could simply use
fooString.replace(/([\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f])[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]*/g, "$1");
EDIT: Added a number of other combining character ranges. This is most likely still not exhaustive.
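The two replacements can be sketched as small helpers. The function names are hypothetical, the character ranges are the ones listed above (so still not exhaustive), and the g flag is needed so that every match is replaced rather than only the first one.

```javascript
// Remove every combining mark in the listed ranges.
function stripCombining(text) {
  return text.replace(
    /[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]/g,
    ""
  );
}

// Keep only the first combining mark of each stacked run.
function limitCombining(text) {
  return text.replace(
    /([\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f])[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]*/g,
    "$1"
  );
}

// "red" with two marks stacked on the "e" and one on the "d".
const noisy = "re\u0301\u0302d\u0300";
console.log(stripCombining(noisy)); // "red"
console.log(limitCombining(noisy) === "re\u0301d\u0300"); // true
```

Note that replace() returns a new string, so the result has to be assigned back (for example to the textarea's value) to take effect.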
Removing combining diacriticals will make input of some languages (such as Vietnamese) difficult or impossible, so you should reconsider.