How to write regex for this javascript string - javascript

How can I reformat the strings below
"(22.0796251, 82.13914120000004),36", "(22.744108, 77.73696700000005),48",...and so on
Like this:
(22.0796251, 82.13914120000004) 36
(22.744108, 77.73696700000005) 48
...and so on.................. ..
How can I do this using a regex in JavaScript?
My try is this:
substring = test.split(',');
where test contains the data to be formatted. But it's wrong.

You can use the ability of split to split on a regular expression and keep the matched separators in the results. To do this, simply put a capturing group in the regexp. In your case, you will "split" on things in double quote marks:
pieces = test.split(/(".*?")/)
//                   ^^^^^^^ capture group
// ["", "\"(22.0796251, 82.13914120000004),36\"", ", ", "\"(22.744108, 77.73696700000005),48\"", ""]
The question mark is to make sure it doesn't eat up all the characters up through the last quote in the input. It makes the * quantifier "non-greedy".
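As a quick illustration of the difference between the greedy and the non-greedy quantifier, here is a minimal sketch; the sample string is made up for the example and is not from the question:

```javascript
// Hypothetical sample with two quoted segments.
const sample = '"a", "b"';

// Greedy: .* runs on to the last quote in the input.
const greedy = sample.match(/".*"/)[0];

// Non-greedy: .*? stops at the first closing quote it can reach.
const lazy = sample.match(/".*?"/)[0];

console.log(greedy); // "a", "b"
console.log(lazy);   // "a"
```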
Now get rid of the junk (empty strings and ", "):
pieces = pieces.filter(function(seg) { return !/^[, ]*$/.test(seg); })
// The quote marks are still part of each piece:
// ["\"(22.0796251, 82.13914120000004),36\"", "\"(22.744108, 77.73696700000005),48\""]
Next you can break down each piece with another regexp, as in
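// Break each piece at its last comma (the one outside the parentheses) and drop the surrounding quotes.
arrays = pieces.map(function(piece) { return piece.match(/"(.*),(.*)"/).slice(1); });
// [["(22.0796251, 82.13914120000004)", "36"], ["(22.744108, 77.73696700000005)", "48"]]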
The slice is to get rid of the first element of the array returned by match, which is the entire match and we don't need that.
Now print out arrays, split its elements further, or do whatever else you want with it.
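Putting the steps together, one possible end-to-end sketch looks like this. The function name formatPairs and the variable quoted are made up for illustration, and the code assumes the input really contains the double quote characters shown in the question:

```javascript
// Assumed input: the quotes are part of the data, as in the question.
const quoted = '"(22.0796251, 82.13914120000004),36", "(22.744108, 77.73696700000005),48"';

// Hypothetical helper: returns one "(lat, lng) value" string per quoted segment.
function formatPairs(input) {
  return input
    .split(/(".*?")/)                                // keep the quoted segments
    .filter(function (seg) { return !/^[, ]*$/.test(seg); }) // drop "" and ", "
    .map(function (piece) {
      // Greedy (.*) backtracks to the last comma, which sits outside the parentheses.
      return piece.match(/"(.*),(.*)"/).slice(1).join(' ');
    });
}

const formatted = formatPairs(quoted);
console.log(formatted.join('\n'));
// (22.0796251, 82.13914120000004) 36
// (22.744108, 77.73696700000005) 48
```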

Related

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.
You can split at the positions where a parenthesised element follows, by using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
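Because the lookahead consumes no characters, each delimiter stays attached to the value after it. If the application identifier and the value are then needed separately, a second regexp can take each token apart without relying on the order of the identifiers. This is only a sketch; the names udi, tokens and fields, and the ai/value property names, are assumptions for illustration:

```javascript
const udi = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";

// Split just before every "(digits)" group; nothing is consumed.
const tokens = udi.split(/(?=\(\d+\))/);

// Take each token apart into identifier and value.
const fields = tokens.map(function (token) {
  const m = token.match(/^\((\d+)\)(.*)$/);
  return { ai: m[1], value: m[2] };
});

console.log(tokens);
// ["(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888"]
console.log(fields[1]);
// { ai: "240", value: "A" }
```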
Instead of split, use match to create the array. The pattern looks for digits in parentheses followed by one or more digits, uppercase letters, or hyphens; with the g flag, match returns every such token in the string.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));

How to write regexp for finding :smile: in javascript?

I want to write a regular expression, in JavaScript, for finding the string starting and ending with :.
For example, from the string "hello :smile: :sleeping:" I need to find the substrings which start and end with the : character. I tried the expression below, but it didn't work:
^:.*\:$
My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g
function replaceEmojiTags(text) {
  return text.replace(emojiPattern, function (tag, emotion) {
    // The emotion will be the captured word between your tags,
    // so either "smile" or "sleeping" in your example
    //
    // In this function you would take that emotion and return
    // whatever you want based on the input parameter and the
    // whole tag would be replaced
    //
    // As an example, let's say you had a bunch of GIF images
    // for the different emotions:
    return '<img src="/img/emoji/' + emotion + '.gif" />';
  });
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to be changed since it is only looking for word characters. For this there is probably also a restriction such that the hyphen must join two words so that it shouldn't accept "-big-smile" or "big-smile-". For that you need to change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".
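To check which tags the hyphen-aware pattern accepts, a small sketch can run it against a few candidates. It uses an anchored, non-global variant of the pattern so that each candidate is tested as a whole; the names singleTag, candidates and accepted are made up for the example:

```javascript
// Anchored, non-global variant, used here only to validate one tag at a time.
const singleTag = /^:(\w+(-\w+)*):$/;

const candidates = [':smile:', ':big-smile:', ':big-smile-bigger:', ':-big-smile:', ':big-smile-:'];

// Keep only the candidates the pattern accepts.
const accepted = candidates.filter(function (tag) { return singleTag.test(tag); });

console.log(accepted);
// [":smile:", ":big-smile:", ":big-smile-bigger:"]
```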
The ^ and $ are anchors (start and end respectively). These cause your regex to match only an entire string which starts with :, has anything in between, and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more, so you'll be matching :: as well. It's better to change this to +, which means one or more. In fact, if you're just looking for text you may want to use a range such as [a-z0-9] with the case-insensitive modifier.
If we put it all together we'll have a regex like this: /:([a-z0-9]+):/gmi
It matches a : followed by one or more alphanumeric characters and a closing :, with the modifiers g (global), m (multi-line) and i (case insensitive, for things like :FacePalm:). The m modifier only changes how ^ and $ behave, so with the anchors removed it has no effect and can be dropped.
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.
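If only the names between the colons are wanted, one option is matchAll, which exposes the capture group of every match. The sketch below also shows that the anchored pattern from the question does not match the sample string; the variable names are assumptions for illustration:

```javascript
const message = 'hello :smile: :sleeping:';

// The anchored attempt requires the whole string to start and end with ":".
const anchoredMatches = /^:.*:$/.test(message);

// matchAll needs the g flag; m[1] is the text captured between the colons.
const names = Array.from(message.matchAll(/:([a-z0-9]+):/gi), function (m) { return m[1]; });

console.log(anchoredMatches); // false
console.log(names);           // ["smile", "sleeping"]
```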

Javascript skip double pipes in a string

I have the following string:
var test = "test|2014-07-22 12:13:47||ASD|\|nameOfSomething123\||anothersmt";
var s = test.split('|');
console.log(s);
//outputs
[ 'test',
'2014-07-22 12:13:47',
'',
'ASD',
'',
'nameOfSomething123',
'',
'anothersmt' ]
Because \|nameOfSomething123\| also contains pipes, split('|') gives the wrong result: I need to get rid of the extra empty entries around nameOfSomething123.
I would like to split the string while skipping \|nameOfSomething123\|.
Does anyone know how to solve this?
Thank you.
First, I'm going to assume that your test string actually contains \| sequences. If you were to write the string literal as you've shown, \| would be interpreted as an escape sequence for |. For this script to work as you've shown, you'd need to write test like this:
var test = "test|2014-07-22 12:13:47||ASD|\\|nameOfSomething123\\||anothersmt";
You can accomplish this pretty easily using match instead of split:
test.match(/(\\\||[^|])+/g);
// outputs
[ "test",
"2014-07-22 12:13:47",
"ASD",
"\|nameOfSomething123\|",
"anothersmt" ]
This pattern matches one or more sequences of either \| or any character other than |. Note that the \ and the | need to be escaped to refer to literal \ and | characters. Given your sample input, this should accomplish the goal. (Of course, if the \ can be escaped too, that complicates it a bit.)
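As a runnable sketch of the match approach, the following also strips the backslashes from the escaped pipes afterwards. That last step is an assumption about what the fields are used for, and the names raw, matchedFields and unescaped are made up for the example:

```javascript
// The string literal doubles the backslashes so the data contains real \| sequences.
const raw = "test|2014-07-22 12:13:47||ASD|\\|nameOfSomething123\\||anothersmt";

// Each field is a run of escaped pipes (\|) or non-pipe characters.
const matchedFields = raw.match(/(\\\||[^|])+/g);

// Optional: turn each escaped \| back into a plain |.
const unescaped = matchedFields.map(function (field) {
  return field.replace(/\\\|/g, '|');
});

console.log(unescaped);
// ["test", "2014-07-22 12:13:47", "ASD", "|nameOfSomething123|", "anothersmt"]
```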
If you need to capture empty strings between two pipes like ||, then you can use split around the matched values and filter out the separators. For example:
test.split(/((?:\\\||[^|])*)/g).filter(function(x, i) { return i % 2 });
// outputs
[ "test",
"2014-07-22 12:13:47",
"",
"ASD",
"\|nameOfSomething123\|",
"anothersmt" ]
This works because split will return any captured substrings as a separate entry in the result array. Then filter just picks every other element from the result. Note that filter requires ECMAScript 5.1 or later, so it may not work in older browsers. If this is a problem, see the polyfill option described in the linked documentation.
I don't see why this is a hard problem. If your separator is always |, then the only case when you get an empty string from .split is going to be when you have a double | (or triple or quadruple). As long as the double pipes have no semantic purpose for you, all you need to do is get rid of the empty strings:
function check_for_empty_string(element){
if (element.length != 0) return element;
}
s = s.filter(check_for_empty_string);
Now s should only contain non-empty strings and you're done. Array.filter is a JavaScript built-in that takes a callback that checks each element. Elements for which the callback returns a truthy value pass through the filter into the new array; here a non-empty string is returned (truthy) and undefined otherwise (falsy). I've assigned the result back to the old variable for brevity, but .filter returns a new array, so you can keep the old one if you want.
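A minimal sketch of this filter, run against the string literal exactly as written in the question, could look like the following. Inside that literal \| is just |, so the data contains no backslashes; the names pipeDelimited and nonEmpty are assumptions for illustration:

```javascript
// The literal from the question; here \| evaluates to a plain |.
const pipeDelimited = "test|2014-07-22 12:13:47||ASD|\|nameOfSomething123\||anothersmt";

// Keep only the elements for which the callback returns true.
const nonEmpty = pipeDelimited.split('|').filter(function (element) {
  return element.length > 0;
});

console.log(nonEmpty);
// ["test", "2014-07-22 12:13:47", "ASD", "nameOfSomething123", "anothersmt"]
```

Returning an explicit boolean makes the intent of the callback a little clearer than returning the element itself.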

Javascript Regex: replacing the last dot for a comma

I have the following code:
var x = "100.007"
x = String(parseFloat(x).toFixed(2));
return x
=> 100.01
This works awesomely just how I want it to work. I just want a tiny addition, which is something like:
var x = "100,007"
x.replace(",", ".")
x.replace
x = String(parseFloat(x).toFixed(2));
x.replace(".", ",")
return x
=> 100,01
However, this code will replace the first occurrence of the ",", where I want to catch the last one. Any help would be appreciated.
You can do it with a regular expression:
x = x.replace(/,([^,]*)$/, ".$1");
That regular expression matches a comma followed by any amount of text not including a comma. The replacement string is just a period followed by whatever it was that came after the original last comma. Other commas preceding it in the string won't be affected.
Now, if you're really converting numbers formatted in "European style" (for lack of a better term), you're also going to need to worry about the "." characters in places where a "U.S. style" number would have commas. I think you would probably just want to get rid of them:
x = x.replace(/\./g, '');
When you use the ".replace()" function on a string, you should understand that it returns the modified string. It does not modify the original string, however, so a statement like:
x.replace(/something/, "something else");
has no effect on the value of "x".
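Combining these points, the whole round trip from the question might be sketched as below. The function name roundDecimalComma is made up, and the code assumes "." is only ever a thousands separator in the input and that the output does not need thousands separators:

```javascript
// Hypothetical helper: "100,007" -> "100,01"
function roundDecimalComma(input) {
  const normalized = input
    .replace(/\./g, '')              // drop thousands separators (assumption)
    .replace(/,([^,]*)$/, '.$1');    // last comma becomes the decimal point
  // Each replace returns a new string, so the results are chained or assigned.
  return parseFloat(normalized).toFixed(2).replace('.', ',');
}

console.log(roundDecimalComma('100,007'));   // 100,01
console.log(roundDecimalComma('1.234,567')); // 1234,57
```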
You can use a regexp. You want to replace the last ',', so the basic idea is to replace the ',' for which there's no ',' after.
x.replace(/,([^,]*)$/, ".$1");
Will return what you want :-).
You could do it using the lastIndexOf() function to find the last occurrence of the , and replace it.
The alternative is to use a regular expression with the end of line marker:
myOldString.replace(/,([^,]*)$/, ".$1");
You can use lastIndexOf to find the last occurrence of ,. Then you can use slice to join the parts before and after the , with a . in between.
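A minimal sketch of the lastIndexOf and slice approach, with a made-up function name replaceLastComma, might be:

```javascript
// Hypothetical helper: replaces only the last comma with a dot.
function replaceLastComma(text) {
  const index = text.lastIndexOf(',');
  if (index === -1) {
    return text; // no comma, nothing to replace
  }
  return text.slice(0, index) + '.' + text.slice(index + 1);
}

console.log(replaceLastComma('100,007'));  // 100.007
console.log(replaceLastComma('1,234,56')); // 1,234.56
```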
You don't need to worry about whether or not it's the last ".", because there is only one. JavaScript doesn't store numbers internally with comma or dot-delimited sets.

Split string by HTML entities?

My string contain a lot of HTML entities, like this
&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;
And I want to split it by HTML entities into this :
Hello
everybody
there
Can anybody suggest a way to do this, please? Maybe using a regex?
It looks like you can just split on the &[^;]*; regex. That is, the delimiters are strings that start with &, end with ;, and in between can contain anything but ;.
If you can have multiple delimiters in a row, and you don't want the empty strings between them, just use (&[^;]*;)+ (or in general the (delim)+ pattern).
If you can have delimiters at the beginning or end of the string, and you don't want the empty strings caused by them, then just trim them away before you split.
Example
Here's a snippet to demonstrate the above ideas (see also on ideone.com):
var s = "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;";
print (s.split(/&[^;]*;/));
// ,Hello,,everybody,,there,
print (s.split(/(?:&[^;]*;)+/));
// ,Hello,everybody,there,
print (
s.replace(/^(?:&[^;]*;)+/, "")
.replace(/(?:&[^;]*;)+$/, "")
.split(/(?:&[^;]*;)+/)
);
// Hello,everybody,there
var a = str.split(/\&[#a-z0-9]+\;/); should do it, although you'll end up with empty slots in the array when you have two entities next to each other.
split(/&.*?;(?=[^&]|$)/)
and cut the last and first result:
["", "Hello", "everybody", "there", ""]
>> "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;".split(/(?:&[^;]+;)+/)
['', 'Hello', 'everybody', 'there', '']
The regex is: /(?:&[^;]+;)+/
Matches entities as & followed by 1+ non-; characters, followed by a ;. Then matches at least one of those (or more) as the split delimiter. The (?:expression) non-capturing syntax is used so that the delimiters captured don't get put into the result array (split() puts capture groups into the result array if they appear in the pattern).
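As an alternative to trimming or cutting the first and last results, the empty strings can be filtered out after the split. This is a sketch that assumes the input is the entity-encoded string from the question and that empty strings are never wanted; the names encoded and words are made up for illustration:

```javascript
// Assumed input: the entity-encoded string, with &nbsp; between the words.
const encoded = '&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;';

const words = encoded
  .split(/(?:&[^;]+;)+/) // one or more consecutive entities act as one delimiter
  .filter(Boolean);      // drop the empty strings at the start and end

console.log(words);
// ["Hello", "everybody", "there"]
```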
