Extract numeric and text parts of a string, in varying formats - javascript

I'm trying to put together a RegEx to split a variety of possible user inputs, and while I've managed to succeed with some cases, I've not managed to cover every case that I'd like to.
Possible inputs, and expected outputs
"1 day" > [1,"day"]
"1day" > [1,"day"]
"10,000 days" > [10000,"days"]
Is it possible to split the numeric and text parts from the string without necessarily having a space, and to also remove the commas etc from the string at the same time?
This is what I've got at the moment
[a-zA-Z]+|[0-9]+
Which seems to split the numeric and text portions nicely, but is tripped up by commas. (Actually, as I write this, I'm thinking I could use the last part of the results array as the text part, and concatenate all the other parts as the numeric part?)

var test = [
    '1 day',
    '1day',
    '10,000 days',
];
console.log(test.map(function (a) {
    a = a.replace(/(\d),(\d)/g, '$1$2'); // remove the commas
    return a.match(/^(\d+)\s*(.+)$/); // split in two parts
}));

This regular expression works, apart from removing the comma from the matched number string:
([0-9,]+) *(.*)
You cannot "ignore" a character in a returned regular expression match string, so you will just have to remove the comma from the returned regex match afterwards.
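The two answers above can be combined in one step: capture the run of digits and commas together with the trailing text, then strip the commas and convert the number afterwards. The following is only a sketch; the name splitQuantity is hypothetical, and it assumes the number comes first and that commas appear only as thousands separators.

```javascript
// Hypothetical helper: split "10,000 days" into [10000, "days"].
function splitQuantity(input) {
  // Group 1: digits with optional commas; group 2: the remaining text.
  var m = input.trim().match(/^([\d,]+)\s*(.+)$/);
  if (!m) return null; // input does not start with a number followed by text
  var value = Number(m[1].replace(/,/g, "")); // drop the commas, then convert
  return [value, m[2]];
}

console.log(["1 day", "1day", "10,000 days"].map(splitQuantity));
```

Returning null for input without a leading number is a design choice of this sketch; throwing an error would work equally well.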

Related

split on words except when phrase contains that word

I am trying to split where clauses, I want to split text on AND|OR|NOT except when NOT is in the 'phrase' NOT IN or NOT LIKE or IS NOT NULL.
1st example:
DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL
I expect 3 segments, splitting on the AND's, but not on the NOT in this instance.
2nd ex:
(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0
I want to split on this NOT & AND, also expecting 3 segments in this instance
I'm trying this look ahead from another almost similar example but it is not splitting at all. I have next to zero regex experience. Not sure what I'm missing or if it's even possible.
Thanks in advance.
var ignoreRegex = /(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)/g
var filterArray = filterBy.split(new RegExp(ignoreRegex));
Try with:
\b(AND|OR|NOT(?!\s+(?:NULL|IN|LIKE)\b))\b
About your regex:
(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)
[NOT IN] - this is a character class [...]. It matches a single character from the ones you put in it, so it can match N, O, T, etc., not a whole word or phrase.
([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL]) - this whole part can actually match only one character, because it doesn't use any quantifiers or intervals, so it doesn't work as you expect at all.
So the whole regex matches at a position that is followed somewhere by AND, OR, or NOT, but only if the rest of the line contains no standalone letter or space from those character classes, so it will probably not match anything.
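To make the lookahead idea concrete, here is a minimal sketch that splits the two example clauses from the question. The helper name splitWhere is hypothetical. The sketch assumes upper-case keywords, adds BETWEEN to the excluded phrases because the question's own attempt listed it, and does not handle keywords that appear inside quoted string literals. A non-capturing group is used so that the delimiters themselves are not returned by split.

```javascript
// Hypothetical helper: split a WHERE clause on AND, OR, and standalone NOT.
function splitWhere(clause) {
  // NOT counts as a delimiter unless it is followed by NULL, IN, LIKE, or BETWEEN.
  var delimiter = /\b(?:AND|OR|NOT(?!\s+(?:NULL|IN|LIKE|BETWEEN)\b))\b/;
  return clause.split(delimiter).map(function (part) {
    return part.trim();
  });
}

console.log(splitWhere("DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL"));
console.log(splitWhere("(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0"));
```

Both calls yield three segments, as the question expects. In the second one the first segment is just the opening parenthesis that precedes NOT.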

How to write regex for this javascript string

How to write this string below
"(22.0796251, 82.13914120000004),36", "(22.744108, 77.73696700000005),48",...and so on
Like this:
(22.0796251, 82.13914120000004) 36
(22.744108, 77.73696700000005) 48
...and so on.................. ..
How can I do this using a regex in JavaScript?
My attempt is this:
substring = test.split(',');
where test contains the data to be formatted, but it's wrong.
You should use the ability of split to split on regular expressions and then keep them in the results. To do this, simply put a capturing group in the regexp. In your case, you will "split" on things in double quote marks:
pieces = test.split(/"(.*?)"/)
//                    ^^^^^ capture group
// ["", "(22.0796251, 82.13914120000004),36", ", ", "(22.744108, 77.73696700000005),48", ""]
The question mark is to make sure it doesn't eat up all the characters up through the last quote in the input. It makes the * quantifier "non-greedy".
Now get rid of the junk (empty strings and ", "):
pieces = pieces.filter(function (seg) { return !/^[, ]*$/.test(seg); })
// ["(22.0796251, 82.13914120000004),36", "(22.744108, 77.73696700000005),48"]
Next you can break down each piece with another regexp, as in
arrays = pieces.map(function (piece) { return piece.match(/(\(.*\)),(\d+)/).slice(1); });
// [["(22.0796251, 82.13914120000004)", "36"], ["(22.744108, 77.73696700000005)", "48"]]
The slice is to get rid of the first element of the array returned by match, which is the entire match and we don't need that.
Now print out arrays, split its elements further, or do whatever else you want with it.
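As an alternative to the split, filter, and map steps above, the same result can be sketched with a single global regex and an exec loop. This swaps in a different technique from the answer's. The helper name formatEntries and the variable sample are hypothetical, and the sketch assumes the data is one string holding quoted entries of the form "(lat, lng),number".

```javascript
// Assumed input: one string containing quoted "(lat, lng),number" entries.
var sample = '"(22.0796251, 82.13914120000004),36", "(22.744108, 77.73696700000005),48"';

// Hypothetical helper: return one "(lat, lng) number" line per quoted entry.
function formatEntries(text) {
  // Group 1: the parenthesised pair; group 2: the digits after the comma.
  var entry = /"(\([^)]*\)),(\d+)"/g;
  var lines = [];
  var m;
  while ((m = entry.exec(text)) !== null) {
    lines.push(m[1] + " " + m[2]);
  }
  return lines;
}

console.log(formatEntries(sample).join("\n"));
```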

Javascript regex match returning a string with comma at the end

Just as the title says, I'm trying to parse a string, for example
2x + 3y
and I'm trying to get only the coefficients (i.e. 2 and 3).
I first tokenized it with the space character as the delimiter, giving me "2x" "+" "3y".
Then I ran each token through this statement to get only the coefficients:
var number = eqTokens[i].match(/(\-)?\d+/);
I tried printing the output, but it gave me "2,".
Why is it printing like this, and how do I fix it? I tried using:
number = number.replace(/[,]/, "");
but this just gives me an error that number.replace is not a function.
What's wrong with this?
> "2x + 3y".match(/-?\d+(?=[A-Za-z]+)/g)
[ '2', '3' ]
The above regex matches a number only if it is followed by one or more letters.
Without the g flag, match returns an array containing the whole match followed by one entry per capture group. Since you put the optional negative sign in parentheses, it's a capture group. That group is optional, so when there is no sign it yields undefined in addition to your actual match.
Input 2x -> Your output: ["2", undefined], which prints out as "2,"
Input -2x -> Your output: ["-2", "-"]
Remove the parentheses around the negative.
This is just for the sake of explaining why your case is breaking but personally I'd use Avinash's answer.
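The behaviour described in both answers can be checked with a short sketch. The helper name coefficients is hypothetical, and the sketch assumes any minus sign is written directly before the digits (so "2x - 3y", with a space after the sign, would lose the sign).

```javascript
// Without the g flag, match returns [wholeMatch, group1, ...].
var single = "2x".match(/(\-)?\d+/);
console.log(String(single)); // the unmatched group joins as an empty string

// Hypothetical helper: collect signed integer coefficients as numbers.
function coefficients(expression) {
  // With the g flag, match returns only the whole matches (or null if none).
  var found = expression.match(/-?\d+(?=[A-Za-z])/g);
  return found ? found.map(Number) : [];
}

console.log(coefficients("2x + 3y"));
console.log(coefficients("-2x + 3y"));
```

The error about number.replace in the question has the same cause: the result of match is an array, not a string.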

Empty strings in array after using the split method with a regexp

I'm reading through Chapter 5 of Professional JavaScript for Web Developers and came across this example involving the split method and a regular expression. My confusion stems from the output of the variable colors3. Why does the array contain an empty string before and after the commas?
var colorText = "red,blue,green,yellow";
var colors1 = colorText.split(","); //["red", "blue", "green", "yellow"]
var colors2 = colorText.split(",", 2); //["red", "blue"]
var colors3 = colorText.split(/[^\,]+/); //["", ",", ",", ",", ""]
In the last case, you're defining separator as "any run of characters that aren't commas".
Because nothing precedes the first "separator" ("red") and nothing follows the last "separator" ("yellow"). Split presumes that the first separator is preceded by a value, and that the last separator is followed by a value -- as they are, in your first and second examples, and in any normal case such as a line in a CSV file. The only quasi-exception would be if the first (or last) value in the CSV line were an empty string; in that case, what would you see if there were an empty string followed by a separator?
You would see just a seemingly orphaned separator at the beginning of the line (or a separator at the end). It has to be this way because you have to support empty values.
If you preceded "red" with a comma, you would see an initial empty string in the first array, and an initial comma in the last.
I think you're thrown off by the fact that your last regex redefines "separator" as a set of characters normally regarded as data, and redefines "data" as a character normally defined as a separator.
Accept the arbitrariness. Let it flow through you. They're not commas and letters, they're zeroes and ones.
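A small sketch of the cases discussed above, using the book's colorText string. The variable names byWords, leading, and nonEmpty are only for illustration.

```javascript
var colorText = "red,blue,green,yellow";

// The separator is each run of non-comma characters, so the commas become the "data".
var byWords = colorText.split(/[^,]+/);
console.log(byWords);

// A leading comma gives the ordinary split an empty first value.
var leading = ("," + colorText).split(",");
console.log(leading);

// If the empty strings are not wanted, they can be dropped afterwards.
var nonEmpty = byWords.filter(function (s) { return s !== ""; });
console.log(nonEmpty);
```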

Split string by HTML entities?

My string contains a lot of HTML entities, like this:
&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;
And I want to split it by HTML entities into this:
Hello
everybody
there
Can anybody suggest a way to do this, please? Maybe using a regex?
It looks like you can just split on the regex &[^;]*;. That is, the delimiters are strings that start with &, end with ;, and have anything but ; in between.
If you can have multiple delimiters in a row, and you don't want the empty strings between them, just use (&[^;]*;)+ (or, in general, a (delim)+ pattern).
If you can have delimiters at the beginning or end of the string, and you don't want the empty strings caused by them, then just trim them away before you split.
Example
Here's a snippet to demonstrate the above ideas (see also on ideone.com):
var s = "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;";
print (s.split(/&[^;]*;/));
// ,Hello,,everybody,,there,
print (s.split(/(?:&[^;]*;)+/));
// ,Hello,everybody,there,
print (
    s.replace(/^(?:&[^;]*;)+/, "")
     .replace(/(?:&[^;]*;)+$/, "")
     .split(/(?:&[^;]*;)+/)
);
// Hello,everybody,there
var a = str.split(/\&[#a-z0-9]+\;/); should do it, although you'll end up with empty slots in the array when you have two entities next to each other.
split(/&.*?;(?=[^&]|$)/)
and cut the last and first result:
["", "Hello", "everybody", "there", ""]
>> "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;".split(/(?:&[^;]+;)+/)
['', 'Hello', 'everybody', 'there', '']
The regex is: /(?:&[^;]+;)+/
Matches entities as & followed by 1+ non-; characters, followed by a ;. Then matches at least one of those (or more) as the split delimiter. The (?:expression) non-capturing syntax is used so that the delimiters captured don't get put into the result array (split() puts capture groups into the result array if they appear in the pattern).
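Instead of trimming the entities before splitting, the empty strings can also be filtered out afterwards. The sketch below shows that variant, along with the effect of a capturing group on split. The helper name splitOnEntities is hypothetical, and the input literal is an assumption reconstructed from the outputs shown in the answers.

```javascript
// Assumed input, reconstructed from the outputs shown in the answers.
var s = "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;";

// Hypothetical helper: split on runs of entities and drop empty strings.
function splitOnEntities(text) {
  return text.split(/(?:&[^;]+;)+/).filter(function (part) {
    return part !== "";
  });
}

console.log(splitOnEntities(s));

// With a capturing group, split also returns the delimiters themselves.
var withDelimiters = "a&amp;b".split(/(&[^;]+;)/);
console.log(withDelimiters);
```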
