How to compress an IPv6 address using JavaScript?

I have seen code that compresses an IPv6 address in Java; the link shows the same.
Below is the Java code:
String resultString = subjectString.replaceAll("((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)", "::$2");
But in JavaScript I am confused about how to write the regex so that it matches the same way. Can you share some pointers?
Example : fe80:00:00:00:8e3:a11a:2a49:1148
Result : fe80::8e3:a11a:2a49:1148
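A rough JavaScript translation of that Java pattern, offered as a sketch rather than a tested port: a Java string literal doubles every backslash, so in a JavaScript regex literal each \\ becomes a single \. JavaScript supports the same lookahead and \1 backreference. The pattern only recognizes zero groups written as a single 0, so the sketch first trims leading zeros with a hypothetical trimGroups helper. It also does not collapse a run of zeros at the very start of the address.

```javascript
// Hypothetical helper: strip leading zeros from each group, keeping one "0"
// for all-zero groups, so "00" becomes "0" as the Java pattern expects.
function trimGroups(ip) {
  return ip
    .split(':')
    .map(group => group.replace(/^0+(?=[0-9a-f])/i, ''))
    .join(':');
}

// The Java pattern with its doubled backslashes undone.
// $1 is the run of ":0" groups; the negative lookahead rejects that run if
// the same run followed by one more ":0" appears later (a longer run exists).
const javaPattern = /((?::0\b){2,}):?(?!\S*\b\1:0\b)(\S*)/;

function compressWithJavaPattern(ip) {
  return trimGroups(ip).replace(javaPattern, '::$2');
}

console.log(compressWithJavaPattern('fe80:00:00:00:8e3:a11a:2a49:1148')); // fe80::8e3:a11a:2a49:1148
```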

There are a couple of problems with the other answer by @ClasG:
If the repeating zeroes are at the beginning of the IPv6 address or it's all zeroes, only 1 colon is replaced.
If the repeating zeroes are at the end, they're not replaced.
I suggest using the regex \b:?(?:0+:?){2,} and replacing the match with :: (two colons).
Regex101 tests
JavaScript example:
var ips = [
'2001:0db8:ac10:0000:0000:0000:0000:ffff',
'2001:0db8:ac10:0000:0000:0000:0000:0000',
'0:0:0:0:0:2001:0db8:ac10',
'2001:0db8:ac10:aaaa:0000:bbbb:cccc:ffff',
'2001:0db8:ac10:0000:0000:bbbb:00:00'
];
for (var i = 0; i < ips.length; i++) {
document.write(ips[i].replace(/\b:?(?:0+:?){2,}/, '::') + "<br>");
}
Note: The Regex101 tests replace multiple runs of zeroes. Whatever language you use, you'll have to limit the number of replacements to 1: in JavaScript, omit the global flag; in PHP, set the $limit argument of preg_replace to 1.
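To see why the global flag has to be left off, here is a small comparison of both forms on a made-up address that has two separate runs of zero groups. It also shows a limitation: without the flag, the first run is replaced even when a later run is longer.

```javascript
// The same pattern with and without the global flag.
const pattern = /\b:?(?:0+:?){2,}/;
const patternGlobal = /\b:?(?:0+:?){2,}/g;

// Made-up example address with two separate runs of zero groups.
const ip = '2001:0:0:1:0:0:0:1';

// Without "g", only the first run is replaced, so there is a single "::".
// The first run is not necessarily the longest one.
const once = ip.replace(pattern, '::');

// With "g", every run is replaced, giving two "::" (not a valid IPv6 address).
const everywhere = ip.replace(patternGlobal, '::');

console.log(once);       // 2001::1:0:0:0:1
console.log(everywhere); // 2001::1::1
```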

You can do it by replacing
\b(?:0+:){2,}
with
:
function compIPV6(input) {
return input.replace(/\b(?:0+:){2,}/, ':');
}
document.write(compIPV6('2001:db8:0:0:0:0:2:1') + '<br/>');
document.write(compIPV6('fe80:00:00:00:8e3:a11a:2a49:1148') + '<br/>');
Check it out at regex101.

You can use this function to compress an IPv6 address and also remove leading zeros:
function compressIPV6(input) {
var formatted = input.replace(/\b(?:0+:){2,}/, ':');
var finalAddress = formatted.split(':')
.map(function(group) {
// Strip leading zeros, but keep a single "0" for an all-zero group.
return group.replace(/^0+(?=[0-9a-fA-F])/, '');
}).join(':');
return finalAddress;
}
document.write(compressIPV6('2001:0db8:0000:0000:0000:0000:1428:57ab'));

You can use a function that considers all of the needed cases:
const compressIPV6 = (ip) => {
// First remove the leading 0s of each group. If a group is '0000', replace it with '0'
let output = ip.split(':').map(terms => terms.replace(/\b0+/g, '') || '0').join(":");
// Then search for all runs of consecutive '0' groups
let zeros = [...output.matchAll(/\b:?(?:0+:?){2,}/g)];
// If there are any runs, find the longest one and replace it with '::'
if (zeros.length > 0) {
let max = '';
zeros.forEach(item => {
if (item[0].replaceAll(':', '').length > max.replaceAll(':', '').length) {
max = item[0];
}
})
output = output.replace(max, '::');
}
return output;
}
document.write(compressIPV6('38c1:3db8:0000:0000:0000:0000:0043:000a') + '<br/>');
document.write(compressIPV6('0000:0000:0000:0000:38c1:3db8:0043:000a') + '<br/>');
document.write(compressIPV6('38c1:3db8:0000:0043:000a:0000:0000:0000') + '<br/>');
document.write(compressIPV6('38c1:0000:0000:3db8:0000:0000:0000:12ab') + '<br/>');
If there's more than one run of consecutive '0' groups with the same length, only the first one is replaced. This works regardless of whether the run of zeroes is at the beginning, in the middle, or at the end.
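For comparison, here is a regex-free sketch of the same longest-run idea, written against the recommendations in RFC 5952: drop leading zeros, use lowercase hex, never use :: for a single zero group, and pick the first run when two runs are equally long. It assumes a fully expanded input of exactly eight hexadecimal groups without an embedded IPv4 part; compressIPv6 is a made-up name.

```javascript
// Compress a fully expanded IPv6 address (exactly 8 hex groups, no "::").
function compressIPv6(address) {
  const groups = address.split(':');
  if (groups.length !== 8 || !groups.every(g => /^[0-9a-f]{1,4}$/i.test(g))) {
    throw new Error('expected a fully expanded address with 8 hex groups');
  }
  // Lowercase each group and drop leading zeros, keeping one "0" for zero groups.
  const short = groups.map(g => g.toLowerCase().replace(/^0+(?=.)/, ''));

  // Find the longest run of "0" groups; strict ">" keeps the first run on a tie.
  let bestStart = -1;
  let bestLen = 0;
  for (let i = 0; i < short.length; ) {
    if (short[i] !== '0') {
      i++;
      continue;
    }
    let j = i;
    while (j < short.length && short[j] === '0') j++;
    if (j - i > bestLen) {
      bestStart = i;
      bestLen = j - i;
    }
    i = j;
  }

  // A single zero group stays as "0" instead of being replaced by "::".
  if (bestLen < 2) return short.join(':');

  const head = short.slice(0, bestStart).join(':');
  const tail = short.slice(bestStart + bestLen).join(':');
  return head + '::' + tail;
}

console.log(compressIPv6('38c1:0000:0000:3db8:0000:0000:0000:12ab')); // 38c1:0:0:3db8::12ab
```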

Related

RegExp replace all letter but not first and last

I need to replace all letters of a name, except the first and last, with ****.
Example:
Jeniffer -> J****r
I tried $(this).text( $(this).text().replace(/([^\w])\//g, "*"))
Also, if the name is Ron -> R****n
You can use a regular expression for this, by capturing the first and last letters in a capture group and ignoring all letters between them, then using the capture groups in the replacement:
var updated = name.replace(/^(.).*(.)$/, "$1****$2");
Live Example:
function obscure(name) {
return name.replace(/^(.).*(.)$/, "$1****$2");
}
function test(name) {
console.log(name, "=>", obscure(name));
}
test("Ron");
test("Jeniffer");
But it's perhaps easier without a regex:
var updated = name[0] + "****" + name[name.length - 1];
Live Example:
function obscure(name) {
return name[0] + "****" + name[name.length - 1];
}
function test(name) {
console.log(name, "=>", obscure(name));
}
test("Ron");
test("Jeniffer");
Both of those do assume the names will be at least two characters long. I pity the fool who tries this on Mr. T's surname.
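If one-character names can occur, one option is to check the length first. This is only a sketch: returning a too-short name unchanged is an assumption (you may prefer to mask it completely), and obscureSafely is a made-up name.

```javascript
// Mask a name as first letter + "****" + last letter.
// Assumption: names shorter than two characters are returned unchanged.
function obscureSafely(name) {
  if (name.length < 2) {
    return name;
  }
  return name[0] + '****' + name[name.length - 1];
}

console.log(obscureSafely('Jeniffer')); // J****r
console.log(obscureSafely('Ron'));      // R****n
console.log(obscureSafely('T'));        // T
```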
Since you always need exactly four asterisks, you can create a reusable function that builds this format for you:
function replace(str){
var firstChar = str.charAt(0);
var lastChar = str.charAt(str.length-1);
return firstChar + '****' + lastChar;
}
var str = 'Jeniffer';
console.log(replace(str));
str = 'America';
console.log(replace(str))
It appears that you're looking for regex lookaround.
Regex: (?<=\w)(\w+)(?=\w) - group 1 matches all word characters that are preceded by one word character and followed by another one.
Tests: https://regex101.com/r/PPeEqx/2/
More Info: https://www.regular-expressions.info/lookaround.html
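In JavaScript, the lookaround pattern can be passed straight to replace so that only the middle letters are swapped for the asterisks. This is a sketch: lookbehind (?<=...) requires an engine with ES2018 regex support, so check your target browsers, and without the g flag only the first word of a multi-word string is masked. obscureWithLookaround is a made-up name.

```javascript
// Replace the run of word characters that has another word character on both sides.
// Lookbehind (?<=\w) needs ES2018+; without the g flag only the first match is replaced.
function obscureWithLookaround(name) {
  return name.replace(/(?<=\w)\w+(?=\w)/, '****');
}

console.log(obscureWithLookaround('Jeniffer')); // J****r
console.log(obscureWithLookaround('Ron'));      // R****n
console.log(obscureWithLookaround('Al'));       // Al (no middle letters to replace)
```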
Match the first and last characters, then append **** to the first one and add the last one:
const firstName = 'Jeniffer';
const result = firstName.match(/^.|.$/gi).reduce((s, c, i) => `${s}${!i ? `${c}****` : c }`, '');
console.log(result);

How to separate the values of a line of .csv file which contains commas in data? [duplicate]

I have the following type of string:
var string = "'string, duppi, du', 23, lala"
I want to split the string into an array on each comma, but only the commas outside the single quotation marks.
I can't figure out the right regular expression for the split...
string.split(/,/)
will give me
["'string", " duppi", " du'", " 23", " lala"]
but the result should be:
["string, duppi, du", "23", "lala"]
Is there a cross-browser solution?
Disclaimer
2014-12-01 Update: The answer below works only for one very specific format of CSV. As correctly pointed out by DG in the comments, this solution does NOT fit the RFC 4180 definition of CSV and it also does NOT fit MS Excel format. This solution simply demonstrates how one can parse one (non-standard) CSV line of input which contains a mix of string types, where the strings may contain escaped quotes and commas.
A non-standard CSV solution
As austincheney correctly points out, you really need to parse the string from start to finish if you wish to properly handle quoted strings that may contain escaped characters. Also, the OP does not clearly define what a "CSV string" really is. First we must define what constitutes a valid CSV string and its individual values.
Given: "CSV String" Definition
For the purpose of this discussion, a "CSV string" consists of zero or more values, where multiple values are separated by a comma. Each value may consist of:
A double quoted string. (may contain unescaped single quotes.)
A single quoted string. (may contain unescaped double quotes.)
A non-quoted string. (may NOT contain quotes, commas or backslashes.)
An empty value. (An all whitespace value is considered empty.)
Rules/Notes:
Quoted values may contain commas.
Quoted values may contain escaped-anything, e.g. 'that\'s cool'.
Values containing quotes, commas, or backslashes must be quoted.
Values containing leading or trailing whitespace must be quoted.
The backslash is removed from all: \' in single quoted values.
The backslash is removed from all: \" in double quoted values.
Non-quoted strings are trimmed of any leading and trailing spaces.
The comma separator may have adjacent whitespace (which is ignored).
Find:
A JavaScript function which converts a valid CSV string (as defined above) into an array of string values.
Solution:
The regular expressions used by this solution are complex. And (IMHO) all non-trivial regexes should be presented in free-spacing mode with lots of comments and indentation. Unfortunately, JavaScript does not allow free-spacing mode. Thus, the regular expressions implemented by this solution are first presented in native regex syntax (expressed using Python's handy r"""...""" raw multi-line string syntax).
First, here is a regular expression which validates that a CSV string meets the above requirements:
Regex to validate a "CSV string":
re_valid = r"""
# Validate a CSV string having single, double or un-quoted values.
^ # Anchor to start of string.
\s* # Allow whitespace before value.
(?: # Group for value alternatives.
'[^'\\]*(?:\\[\S\s][^'\\]*)*' # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*" # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)* # or Non-comma, non-quote stuff.
) # End group of value alternatives.
\s* # Allow whitespace after value.
(?: # Zero or more additional values
, # Values separated by a comma.
\s* # Allow whitespace before value.
(?: # Group for value alternatives.
'[^'\\]*(?:\\[\S\s][^'\\]*)*' # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*" # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)* # or Non-comma, non-quote stuff.
) # End group of value alternatives.
\s* # Allow whitespace after value.
)* # Zero or more additional values
$ # Anchor to end of string.
"""
If a string matches the above regex, then that string is a valid CSV string (according to the rules previously stated) and may be parsed using the following regex. The following regex is then used to match one value from the CSV string. It is applied repeatedly until no more matches are found (and all values have been parsed).
Regex to parse one value from valid CSV string:
re_value = r"""
# Match one value in valid CSV string.
(?!\s*$) # Don't match empty last value.
\s* # Strip whitespace before value.
(?: # Group for value alternatives.
'([^'\\]*(?:\\[\S\s][^'\\]*)*)' # Either $1: Single quoted string,
| "([^"\\]*(?:\\[\S\s][^"\\]*)*)" # or $2: Double quoted string,
| ([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*) # or $3: Non-comma, non-quote stuff.
) # End group of value alternatives.
\s* # Strip whitespace after value.
(?:,|$) # Field ends on comma or EOS.
"""
Note that there is one special case value that this regex does not match - the very last value when that value is empty. This special "empty last value" case is tested for and handled by the js function which follows.
JavaScript function to parse CSV string:
// Return array of string values, or NULL if CSV string not well formed.
function CSVtoArray(text) {
var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
var re_value = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
// Return NULL if input string is not well formed CSV string.
if (!re_valid.test(text)) return null;
var a = []; // Initialize array to receive values.
text.replace(re_value, // "Walk" the string using replace with callback.
function(m0, m1, m2, m3) {
// Remove backslash from \' in single quoted values.
if (m1 !== undefined) a.push(m1.replace(/\\'/g, "'"));
// Remove backslash from \" in double quoted values.
else if (m2 !== undefined) a.push(m2.replace(/\\"/g, '"'));
else if (m3 !== undefined) a.push(m3);
return ''; // Return empty string.
});
// Handle special case of empty last value.
if (/,\s*$/.test(text)) a.push('');
return a;
};
Example input and output:
In the following examples, curly braces are used to delimit the {result strings}. (This is to help visualize leading/trailing spaces and zero-length strings.)
// Test 1: Test string from original question.
var test = "'string, duppi, du', 23, lala";
var a = CSVtoArray(test);
/* Array has 3 elements:
a[0] = {string, duppi, du}
a[1] = {23}
a[2] = {lala} */
// Test 2: Empty CSV string.
var test = "";
var a = CSVtoArray(test);
/* Array has 0 elements: */
// Test 3: CSV string with two empty values.
var test = ",";
var a = CSVtoArray(test);
/* Array has 2 elements:
a[0] = {}
a[1] = {} */
// Test 4: Double quoted CSV string having single quoted values.
// (\\' in the JS literal puts a real backslash before the quote in the CSV text.)
var test = "'one','two with escaped \\' single quote', 'three, with, commas'";
var a = CSVtoArray(test);
/* Array has 3 elements:
a[0] = {one}
a[1] = {two with escaped ' single quote}
a[2] = {three, with, commas} */
// Test 5: Single quoted CSV string having double quoted values.
// (\\" in the JS literal puts a real backslash before the quote in the CSV text.)
var test = '"one","two with escaped \\" double quote", "three, with, commas"';
var a = CSVtoArray(test);
/* Array has 3 elements:
a[0] = {one}
a[1] = {two with escaped " double quote}
a[2] = {three, with, commas} */
// Test 6: CSV string with whitespace in and around empty and non-empty values.
var test = " one , 'two' , , ' four' ,, 'six ', ' seven ' , ";
var a = CSVtoArray(test);
/* Array has 8 elements:
a[0] = {one}
a[1] = {two}
a[2] = {}
a[3] = { four}
a[4] = {}
a[5] = {six }
a[6] = { seven }
a[7] = {} */
Additional notes:
This solution requires that the CSV string be "valid". For example, unquoted values may not contain backslashes or quotes, e.g. the following CSV string is NOT valid:
var invalid1 = "one, that's me!, escaped \, comma"
This is not really a limitation because any sub-string may be represented as either a single or double quoted value. Note also that this solution represents only one possible definition for: "Comma Separated Values".
Edit: 2014-05-19: Added disclaimer.
Edit: 2014-12-01: Moved disclaimer to top.
RFC 4180 solution
This does not handle the string in the question, because its format does not conform to RFC 4180, where the only accepted escape is a double quote doubled. The solution below works correctly with CSV files downloaded from Google Sheets.
UPDATE (3/2017)
Parsing a single line at a time would be wrong: according to RFC 4180, fields may contain CRLF, which breaks any line-by-line reader. Here is an updated version that parses a whole CSV string:
'use strict';
function csvToArray(text) {
let p = '', row = [''], ret = [row], i = 0, r = 0, s = !0, l;
for (l of text) {
if ('"' === l) {
if (s && l === p) row[i] += l;
s = !s;
} else if (',' === l && s) l = row[++i] = '';
else if ('\n' === l && s) {
if ('\r' === p) row[i] = row[i].slice(0, -1);
row = ret[++r] = [l = '']; i = 0;
} else row[i] += l;
p = l;
}
return ret;
};
let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"\r\n"2nd line one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"';
console.log(csvToArray(test));
OLD ANSWER
(Single line solution)
function CSVtoArray(text) {
let ret = [''], i = 0, p = '', s = true;
for (let l in text) {
l = text[l];
if ('"' === l) {
s = !s;
if ('"' === p) {
ret[i] += '"';
l = '-';
} else if ('' === p)
l = '-';
} else if (s && ',' === l)
l = ret[++i] = '';
else
ret[i] += l;
p = l;
}
return ret;
}
let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,five for fun';
console.log(CSVtoArray(test));
And just for fun, here is how you create CSV from an array:
function arrayToCSV(row) {
for (let i in row) {
row[i] = row[i].replace(/"/g, '""');
}
return '"' + row.join('","') + '"';
}
let row = [
"one",
"two with escaped \" double quote",
"three, with, commas",
"four with no quotes (now has)",
"five for fun"
];
let text = arrayToCSV(row);
console.log(text);
I liked FakeRainBrigand's answer; however, it has a few problems: it cannot handle whitespace between a quote and a comma, and it does not support two consecutive commas. I tried editing his answer, but my edit was rejected by reviewers who apparently did not understand my code. Here is my version of FakeRainBrigand's code.
There is also a fiddle: http://jsfiddle.net/xTezm/46/
String.prototype.splitCSV = function() {
var matches = this.match(/(\s*"[^"]+"\s*|\s*[^,]+|,)(?=,|$)/g);
for (var n = 0; n < matches.length; ++n) {
matches[n] = matches[n].trim();
if (matches[n] == ',') matches[n] = '';
}
if (this[0] == ',') matches.unshift("");
return matches;
}
var string = ',"string, duppi, du" , 23 ,,, "string, duppi, du",dup,"", , lala';
var parsed = string.splitCSV();
alert(parsed.join('|'));
I had a very specific use case where I wanted to copy cells from Google Sheets into my web app. Cells could include double quotes and newline characters. When copying and pasting, the cells are delimited by tab characters, and cells with unusual data are double-quoted. I tried the main solution here, the linked article using regexp, jQuery-CSV, and CSVToArray. Papa Parse (http://papaparse.com/) is the only one that worked out of the box: copying and pasting from Google Sheets is seamless with its default auto-detect options.
PEG(.js) grammar that handles RFC 4180 examples at http://en.wikipedia.org/wiki/Comma-separated_values:
start
= [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }
line
= first:field rest:("," text:field { return text; })*
& { return !!first || rest.length; } // ignore blank lines
{ rest.unshift(first); return rest; }
field
= '"' text:char* '"' { return text.join(''); }
/ text:[^\n\r,]* { return text.join(''); }
char
= '"' '"' { return '"'; }
/ [^"]
Test at http://jsfiddle.net/knvzk/10 or https://pegjs.org/online.
Download the generated parser at https://gist.github.com/3362830.
People seemed to be against RegEx for this. Why?
(\s*'[^']+'|\s*[^,]+)(?=,|$)
Here's the code. I also made a fiddle.
String.prototype.splitCSV = function() {
var regex = /(\s*'[^']+'|\s*[^,]+)(?=,|$)/g;
return this.match(regex);
}
}
var string = "'string, duppi, du', 23, 'string, duppi, du', lala";
console.log( string.splitCSV() );
Adding one more to the list, because I find all of the above not quite "KISS" enough.
This one uses regex to find either commas or newlines while skipping over quoted items. Hopefully this is something newcomers can read through on their own. The splitFinder regexp does three things (separated by |):
, - finds commas
\r?\n - finds new lines, (potentially with carriage return if the exporter was nice)
"(\\"|[^"])*?" - skips anything surrounded by quotes, because commas and newlines don't matter in there. If there is an escaped quote \\" in the quoted item, it will get captured before an end quote can be found.
const splitFinder = /,|\r?\n|"(\\"|[^"])*?"/g;
function csvTo2dArray(parseMe) {
let currentRow = [];
const rowsOut = [currentRow];
let lastIndex = splitFinder.lastIndex = 0;
// add text from lastIndex to before a found newline or comma
const pushCell = (endIndex) => {
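// endIndex can legitimately be 0 (a leading comma), so only default it when omitted
if (endIndex === undefined) endIndex = parseMe.length;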
const addMe = parseMe.substring(lastIndex, endIndex);
// remove quotes around the item
currentRow.push(addMe.replace(/^"|"$/g, ""));
lastIndex = splitFinder.lastIndex;
}
let regexResp;
// for each regexp match (either comma, newline, or quoted item)
while (regexResp = splitFinder.exec(parseMe)) {
const split = regexResp[0];
// if it's not a quote capture, add an item to the current row
// (quote captures will be pushed by the newline or comma following)
if (split.startsWith(`"`) === false) {
const splitStartIndex = splitFinder.lastIndex - split.length;
pushCell(splitStartIndex);
// then start a new row if newline
const isNewLine = /^\r?\n$/.test(split);
if (isNewLine) { rowsOut.push(currentRow = []); }
}
}
// make sure to add the trailing text (no commas or newlines after)
pushCell();
return rowsOut;
}
const rawCsv = `a,b,c\n"test\r\n","comma, test","\r\n",",",\nsecond,row,ends,with,empty\n"quote\"test"`
const rows = csvTo2dArray(rawCsv);
console.log(rows);
No regexp, readable, and according to https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules:
function csv2arr(str: string) {
let line = ["",];
const ret = [line,];
let quote = false;
for (let i = 0; i < str.length; i++) {
const cur = str[i];
const next = str[i + 1];
if (!quote) {
const cellIsEmpty = line[line.length - 1].length === 0;
if (cur === '"' && cellIsEmpty) quote = true;
else if (cur === ",") line.push("");
else if (cur === "\r" && next === "\n") { line = ["",]; ret.push(line); i++; }
else if (cur === "\n" || cur === "\r") { line = ["",]; ret.push(line); }
else line[line.length - 1] += cur;
} else {
if (cur === '"' && next === '"') { line[line.length - 1] += cur; i++; }
else if (cur === '"') quote = false;
else line[line.length - 1] += cur;
}
}
return ret;
}
If you can have your quote delimiter be double quotes, then this is a duplicate of Example JavaScript code to parse CSV data.
You can either translate all single-quotes to double-quotes first:
string = string.replace( /'/g, '"' );
...or you can edit the regex in that question to recognize single-quotes instead of double-quotes:
// Quoted fields.
"(?:'([^']*(?:''[^']*)*)'|" +
However, this assumes certain markup that is not clear from your question. Please clarify what all the various possibilities of markup can be, per my comment on your question.
I've used regex a number of times, but I always have to relearn it each time, which is frustrating :-)
So here's a non-regex solution:
function csvRowToArray(row, delimiter = ',', quoteChar = '"'){
let nStart = 0, nEnd = 0, a=[], nRowLen=row.length, bQuotedValue;
while (nStart <= nRowLen) {
bQuotedValue = (row.charAt(nStart) === quoteChar);
if (bQuotedValue) {
nStart++;
nEnd = row.indexOf(quoteChar + delimiter, nStart)
} else {
nEnd = row.indexOf(delimiter, nStart)
}
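// No more delimiters: the value runs to the end of the row (minus a closing quote).
if (nEnd < 0) nEnd = (bQuotedValue && row.endsWith(quoteChar)) ? nRowLen - 1 : nRowLen;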
a.push(row.substring(nStart,nEnd));
nStart = nEnd + delimiter.length + (bQuotedValue ? 1 : 0)
}
return a;
}
How it works:
Pass the CSV row in as row.
While the start position of the next value is within the row, do the following:
If this value is quoted, skip the opening quote and set nEnd to the closing quote (the quote that is followed by a delimiter).
Otherwise, set nEnd to the next delimiter.
Add the value to an array.
Set nStart to nEnd plus the length of the delimiter (plus one if the value was quoted, to skip the closing quote).
Sometimes it's good to write your own small function, rather than use a library. Your own code is going to perform well and use only a small footprint. In addition, you can easily tweak it to suit your own needs.
Regular expressions to the rescue! These few lines of code properly handle quoted fields with embedded commas, quotes, and newlines based on the RFC 4180 standard.
function parseCsv(data, fieldSep, newLine) {
fieldSep = fieldSep || ',';
newLine = newLine || '\n';
var nSep = '\x1D';
var qSep = '\x1E';
var cSep = '\x1F';
var nSepRe = new RegExp(nSep, 'g');
var qSepRe = new RegExp(qSep, 'g');
var cSepRe = new RegExp(cSep, 'g');
var fieldRe = new RegExp('(?<=(^|[' + fieldSep + '\\n]))"(|[\\s\\S]+?(?<![^"]"))"(?=($|[' + fieldSep + '\\n]))', 'g');
var grid = [];
data.replace(/\r/g, '').replace(/\n+$/, '').replace(fieldRe, function(match, p1, p2) {
return p2.replace(/\n/g, nSep).replace(/""/g, qSep).replace(/,/g, cSep);
}).split(/\n/).forEach(function(line) {
var row = line.split(fieldSep).map(function(cell) {
return cell.replace(nSepRe, newLine).replace(qSepRe, '"').replace(cSepRe, ',');
});
grid.push(row);
});
return grid;
}
const csv = 'A1,B1,C1\n"A ""2""","B, 2","C\n2"';
const separator = ','; // field separator, default: ','
const newline = ' <br /> '; // newline representation in case a field contains newlines, default: '\n'
var grid = parseCsv(csv, separator, newline);
// expected: [ [ 'A1', 'B1', 'C1' ], [ 'A "2"', 'B, 2', 'C <br /> 2' ] ]
Contrary to what is stated elsewhere, you don't need a finite state machine: the regular expression handles RFC 4180 properly thanks to positive lookbehind, negative lookbehind, and positive lookahead.
Clone/download code at https://github.com/peterthoeny/parse-csv-js
I also faced the same kind of problem when I had to parse a CSV file.
The file has an address column whose values contain ','.
After parsing that CSV file, the keys were mismatched when converting it into JSON.
I used Node.js to parse the file, with libraries like Baby Parse and csvtojson.
Example file:
address,pincode
foo,baar , 123456
When I parsed it directly to JSON without Baby Parse, I got:
[{
address: 'foo',
pincode: 'baar',
'field3': '123456'
}]
So I wrote code that replaces the comma (,) inside every field with another delimiter:
/*
csvString(input) = "address, pincode\\nfoo, bar, 123456\\n"
output = "address, pincode\\nfoo {YOUR DELIMITER} bar, 123455\\n"
*/
const removeComma = function(csvString){
let delimiter = '|'
let Baby = require('babyparse')
let arrRow = Baby.parse(csvString).data;
/*
arrRow = [
[ 'address', 'pincode' ],
[ 'foo, bar', '123456']
]
*/
return arrRow.map((singleRow, index) => {
//the data will include
/*
singleRow = [ 'address', 'pincode' ]
*/
return singleRow.map(singleField => {
// replace the commas inside the field
return singleField.split(',').join(delimiter)
})
}).reduce((acc, value, key) => {
acc = acc +(Array.isArray(value) ?
value.reduce((acc1, val)=> {
acc1 = acc1+ val + ','
return acc1
}, '') : '') + '\n';
return acc;
},'')
}
The returned string can be passed to the csvtojson library, and the result can then be used:
const csv = require('csvtojson')
let csvString = "address, pincode\nfoo, bar, 123456\n"
let jsonArray = []
let modifiedCsvString = removeComma(csvString)
csv()
.fromString(modifiedCsvString)
.on('json', json => jsonArray.push(json))
.on('end', () => {
/* do any thing with the json Array */
})
Now you can get the output like:
[{
address: 'foo, bar',
pincode: 123456
}]
My answer presumes your input is a reflection of code/content from web sources where single and double quote characters are fully interchangeable, provided they occur as a non-escaped matching set.
You cannot use regex for this. You actually have to write a micro parser to analyze the string you wish to split. For the sake of this answer, I will call the quoted parts of your strings sub-strings. You need to specifically walk across the string. Consider the following case:
var a = "some sample string with \"double quotes\" and 'single quotes' and some craziness like this: \\\" or \\'",
b = "sample of code from JavaScript with a regex containing a comma /\,/ that should probably be ignored.";
In this case you have absolutely no idea where a sub-string starts or ends by simply analyzing the input for a character pattern. Instead, you have to write logic that decides whether a quote character is used as a quote character, whether it is itself unquoted, and whether it follows an escape.
I am not going to write code of that complexity for you, but you can look at something I recently wrote that has the pattern you need. This code has nothing to do with commas, but it is otherwise a good enough micro-parser for you to follow when writing your own code. Look into the asifix function of the following application:
https://github.com/austincheney/Pretty-Diff/blob/master/fulljsmin.js
To complement the regex-based CSVtoArray answer:
If you need to parse quotes escaped with another quote, example:
"some ""value"" that is on xlsx file",123
You can use
function parse(text) {
const csvExp = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|"([^""]*(?:"[\S\s][^""]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
const values = [];
text.replace(csvExp, (m0, m1, m2, m3, m4) => {
if (m1 !== undefined) {
values.push(m1.replace(/\\'/g, "'"));
}
else if (m2 !== undefined) {
values.push(m2.replace(/\\"/g, '"'));
}
else if (m3 !== undefined) {
values.push(m3.replace(/""/g, '"'));
}
else if (m4 !== undefined) {
values.push(m4);
}
return '';
});
if (/,\s*$/.test(text)) {
values.push('');
}
return values;
}
When I read the CSV file into a string, it contained null characters between the strings, so try removing \0 line by line. It works for me.
stringLine = stringLine.replace(/\0/g, "" );
Try this one.
function parseCSV(csv) {
let quotes = [];
let token = /(?:(['"`])([\s\S]*?)\1)|([^\t,\r\n]+)\3?|([\r\n])/gm;
let text = csv.replace(/\\?(['"`])\1?/gm, s => s.length != 2 ? s : `_r#${quotes.push(s) - 1}`);
return [...text.matchAll(token)]
.map(t => (t[2] || t[3] || t[4])
.replace(/^_r#\d+$/, "")
.replace(/_r#\d+/g, q => quotes[q.replace(/\D+/, '')][1]))
.reduce((a, b) => /^[\r\n]$/g.test(b)
? a.push([]) && a
: a[a.length - 1].push(b) && a, [[]])
.filter(d => d.length);
}
Use the npm library csv-string to parse the strings instead of split: https://www.npmjs.com/package/csv-string
It handles commas inside quotes and empty entries.
This one is based on niry's answer, but uses a semicolon as the separator:
'use strict';
function csvToArray(text) {
let p = '', row = [''], ret = [row], i = 0, r = 0, s = !0, l;
for (l of text) {
if ('"' === l) {
if (s && l === p) row[i] += l;
s = !s;
} else if (';' === l && s) l = row[++i] = '';
else if ('\n' === l && s) {
if ('\r' === p) row[i] = row[i].slice(0, -1);
row = ret[++r] = [l = '']; i = 0;
} else row[i] += l;
p = l;
}
return ret;
};
let test = '"one";"two with escaped """" double quotes""";"three; with; commas";four with no quotes;"five with CRLF\r\n"\r\n"2nd line one";"two with escaped """" double quotes""";"three, with; commas and semicolons";four with no quotes;"five with CRLF\r\n"';
console.log(csvToArray(test));
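Rather than copying the function for each separator, the separator could be passed in as a parameter. The sketch below is a hypothetical refactoring of niry's parser (csvToArrayWith is a made-up name); it assumes a single-character separator and RFC 4180 style "" escaping, and it has only been reasoned through on small inputs like the ones in the usage lines.

```javascript
// Parse CSV text into rows of fields; `sep` must be a single character.
function csvToArrayWith(text, sep = ',') {
  let prev = '';
  let row = [''];
  const rows = [row];
  let col = 0;
  let outsideQuotes = true;

  for (const ch of text) {
    if (ch === '"') {
      // A quote right after a closing quote is the second half of an escaped "" pair.
      if (outsideQuotes && prev === '"') row[col] += '"';
      outsideQuotes = !outsideQuotes;
    } else if (ch === sep && outsideQuotes) {
      // A separator outside quotes starts a new field.
      col += 1;
      row[col] = '';
    } else if (ch === '\n' && outsideQuotes) {
      // A newline outside quotes ends the row; drop the '\r' of a CRLF pair.
      if (prev === '\r') row[col] = row[col].slice(0, -1);
      row = [''];
      rows.push(row);
      col = 0;
    } else {
      row[col] += ch;
    }
    prev = ch;
  }
  return rows;
}

console.log(csvToArrayWith('a;"b;c";"d""e"\r\nf;g', ';')); // [ [ 'a', 'b;c', 'd"e' ], [ 'f', 'g' ] ]
console.log(csvToArrayWith('x,"y, z"'));                    // [ [ 'x', 'y, z' ] ]
```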
Aside from the excellent and complete answer from ridgerunner, I thought of a very simple workaround for when your backend runs PHP.
Add this PHP file to your domain's backend (say: csv.php)
<?php
session_start(); // Optional
header("Content-Type: application/json; charset=UTF-8");
// Set the delimiter and the End of Line character of your CSV content:
echo json_encode(array_map('str_getcsv', str_getcsv($_POST["csv"], "\n")));
?>
Now add this function to your JavaScript toolkit (I believe it may need some revision to be fully cross-browser).
function csvToArray(csv) {
var oXhr = new XMLHttpRequest;
oXhr.addEventListener("readystatechange",
function () {
if (this.readyState == 4 && this.status == 200) {
console.log(this.responseText);
console.log(JSON.parse(this.responseText));
}
}
);
oXhr.open("POST","path/to/csv.php",true);
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
oXhr.send("csv=" + encodeURIComponent(csv));
}
It will cost you one Ajax call, but at least you won't duplicate code or include any external library.
Ref: http://php.net/manual/en/function.str-getcsv.php
You can use papaparse.js like the example below:
<!DOCTYPE html>
<html lang="en">
<head>
<title>CSV</title>
</head>
<body>
<input type="file" id="files" multiple="">
<button onclick="csvGetter()">CSV Getter</button>
<h3>The Result will be in the Console.</h3>
<script src="papaparse.min.js"></script>
<script>
function csvGetter() {
var file = document.getElementById('files').files[0];
Papa.parse(file, {
complete: function(results) {
console.log(results.data);
}
});
}
</script>
</body>
</html>
Don't forget to put papaparse.min.js in the same folder.
According to this blog post, this function should do it:
String.prototype.splitCSV = function(sep) {
for (var foo = this.split(sep = sep || ","), x = foo.length - 1, tl; x >= 0; x--) {
if (foo[x].replace(/'\s+$/, "'").charAt(foo[x].length - 1) == "'") {
if ((tl = foo[x].replace(/^\s+'/, "'")).length > 1 && tl.charAt(0) == "'") {
foo[x] = foo[x].replace(/^\s*'|'\s*$/g, '').replace(/''/g, "'");
} else if (x) {
foo.splice(x - 1, 2, [foo[x - 1], foo[x]].join(sep));
} else foo = foo.shift().split(sep).concat(foo);
} else foo[x] = foo[x].replace(/''/g, "'");
} return foo;
};
You would call it like so:
var string = "'string, duppi, du', 23, lala";
var parsed = string.splitCSV();
alert(parsed.join("|"));
This jsfiddle kind of works, but it looks like some of the elements have spaces before them.

Add colon (:) after every 2nd character using Javascript

I have a string and want to add a colon after every 2nd character (but not after the last set), e.g.:
12345678
becomes
12:34:56:78
I've been using .replace(), e.g.:
mystring = mystring.replace(/(.{2})/g, NOT SURE WHAT GOES HERE)
but none of the regexes I've tried for inserting the colon work, and I haven't been able to find anything useful on Google.
Can anyone point me in the right direction?
Without the need to remove any trailing colons:
mystring = mystring.replace(/..\B/g, '$&:')
\B matches a zero-width non-word boundary; in other words, when it hits the end of the string, it won't match (as that is considered to be a word boundary) and therefore won't perform the replacement (hence no trailing colon, either).
$& contains the matched substring (so you don't need to use a capture group).
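A quick check of the `\B` version (the wrapper name `addColons` is only for illustration). Note that `\B` behaves this way only between word characters (letters, digits, `_`), which covers digit and hex strings; with an odd length the last character is simply left on its own.

```javascript
// Hypothetical wrapper around the \B answer above.
function addColons(s) {
  // ".." grabs two characters; \B requires that we are NOT at a word boundary,
  // so the final pair (followed by the end of the string) is never matched.
  return s.replace(/..\B/g, '$&:');
}

console.log(addColons('12345678'));     // 12:34:56:78
console.log(addColons('1234567'));      // 12:34:56:7
console.log(addColons('aabbccddeeff')); // aa:bb:cc:dd:ee:ff
```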
mystring = mystring.replace(/(..)/g, '$1:').slice(0,-1)
This is what comes to mind immediately. I just strip off the final character to get rid of the colon at the end.
If you want to use this for odd length strings as well, you just need to make the second character optional. Like so:
mystring = mystring.replace(/(..?)/g, '$1:').slice(0,-1)
If you're looking for an approach other than RegEx, try this:
var str = '12345678';
var output = '';
for (var i = 0; i < str.length; i++) {
output += str.charAt(i);
// Add a colon after every second character, but not after the last one
if (i % 2 == 1 && i < str.length - 1) {
output += ':';
}
}
alert(output);
A somewhat different approach without regex could be using Array.prototype.reduce:
Array.prototype.reduce.call('12345678', function(acc, item, index){
return acc += index && index % 2 === 0 ? ':' + item : item;
}, ''); //12:34:56:78
mystring = mystring.replace(/(.{2})/g, ':$1').slice(1)
Try this.
Easy: just match every group of up to 2 characters and join the resulting array with ':'
mystring.match(/.{1,2}/g).join(':')
var mystring = '12345678';
document.write(mystring.match(/.{1,2}/g).join(':'))
no string slicing / trimming required.
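One edge case worth knowing: `String.prototype.match` with the `g` flag returns `null` (not an empty array) when nothing matches, so for an empty string the one-liner throws a `TypeError` on `.join`. A minimal guarded sketch (the function name is hypothetical):

```javascript
function colonEvery2(s) {
  // match() returns null instead of [] when there is no match (e.g. ''),
  // so fall back to an empty array before joining.
  return (s.match(/.{1,2}/g) || []).join(':');
}

console.log(colonEvery2('12345678')); // 12:34:56:78
console.log(colonEvery2('1234567'));  // 12:34:56:7
console.log(colonEvery2(''));         // (empty string)
```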
It's easier if you tweak what you're searching for to avoid an end-of-line colon (using a negative lookahead):
mystring = mystring.replace(/(.{2})(?!$)/g, '$1:');
mystring = mystring.replace(/(.{2})/g, '$1:')
Give that a try (it leaves a trailing colon, as in 12:34:56:78:, which .slice(0, -1) removes).
I like my approach the best :)
function colonizer(strIn){
var rebuiltString = '';
strIn.split('').forEach(function(ltr, i){
// Add a colon after every second character, but not after the last one
rebuiltString += (i % 2 && i < strIn.length - 1) ? ltr + ':' : ltr;
});
return rebuiltString;
}
alert(colonizer('Nicholas Abrams'));
Here is a demo
http://codepen.io/anon/pen/BjjNJj

Count number of words in string using JavaScript

I am trying to count the number of words in a given string using the following code:
var t = document.getElementById('MSO_ContentTable').textContent;
if (t == undefined) {
var total = document.getElementById('MSO_ContentTable').innerText;
} else {
var total = document.getElementById('MSO_ContentTable').textContent;
}
countTotal = cword(total);
function cword(w) {
var count = 0;
var words = w.split(" ");
for (i = 0; i < words.length; i++) {
// inner loop -- do the count
if (words[i] != "") {
count += 1;
}
}
return (count);
}
In that code I am getting data from a div tag and sending it to the cword() function for counting. However, the return value is different in IE and Firefox. Is any change required in the regular expression? I have checked that both browsers send the same string, so the problem must be inside the cword() function.
[edit 2022, based on a comment] Nowadays, one would not extend the native prototype this way. A way to extend the native prototype without the danger of naming conflicts is to use an ES2015 Symbol as the property key.
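A minimal sketch of the Symbol approach, assuming an ES2015+ engine. The symbol description `countWords` is illustrative, and counting runs of non-whitespace is a choice made here (the old answer below splits on `/\s+\b/` instead):

```javascript
// A Symbol key cannot collide with string-named properties that other code
// (or a future version of JavaScript) might add to String.prototype.
const countWords = Symbol('countWords');

Object.defineProperty(String.prototype, countWords, {
  // Count runs of non-whitespace characters; match() returns null when there are none.
  value: function () {
    const words = String(this).match(/\S+/g);
    return words ? words.length : 0;
  },
  writable: true,
  configurable: true,
  enumerable: false,
});

console.log('this string has five words'[countWords]()); // 5
console.log(''[countWords]());                           // 0
```

Only code that holds a reference to `countWords` can call the method, which is exactly what keeps it from clashing with anyone else's prototype additions.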
Old answer: you can use split and add a wordcounter to the String prototype:
if (!String.prototype.countWords) {
String.prototype.countWords = function() {
return this.length && this.split(/\s+\b/).length || 0;
};
}
console.log(`'this string has five words'.countWords() => ${
'this string has five words'.countWords()}`);
console.log(`'this string has five words ... and counting'.countWords() => ${
'this string has five words ... and counting'.countWords()}`);
console.log(`''.countWords() => ${''.countWords()}`);
I would prefer a RegEx only solution:
var str = "your long string with many words.";
var wordCount = str.match(/(\w+)/g).length;
alert(wordCount); //6
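The regex is
\w+ one or more word characters
/g global: return all matches instead of stopping after the first one
With the g flag, match() returns an array of every full match, so its length is the word count (the parentheses are not actually needed). Note that match() returns null when nothing matches, so an empty string would make .length throw.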
This is the best solution I've found:
function wordCount(str) {
var m = str.match(/[^\s]+/g)
return m ? m.length : 0;
}
This inverts the whitespace selection, which is better than \w+ because \w only matches the Latin alphabet, digits and _ (see http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.6)
If you're not careful with whitespace matching you'll count empty strings, strings with leading and trailing whitespace, and all whitespace strings as matches while this solution handles strings like ' ', ' a\t\t!\r\n#$%() d ' correctly (if you define 'correct' as 0 and 4).
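To see the difference the two patterns make, here is a small comparison using the test strings from the answer above, plus one with an apostrophe and one with accented letters (the helper name `countMatches` is hypothetical):

```javascript
// Count regex matches, treating "no match" (null) as zero.
function countMatches(str, re) {
  const m = str.match(re);
  return m ? m.length : 0;
}

const samples = [' ', ' a\t\t!\r\n#$%() d ', "don't stop", 'naïve café'];

for (const s of samples) {
  // \w+ only knows [A-Za-z0-9_], so punctuation and non-ASCII letters split words or vanish;
  // [^\s]+ (the same as \S+) counts every run of non-whitespace characters.
  console.log(JSON.stringify(s), countMatches(s, /\w+/g), countMatches(s, /[^\s]+/g));
}
```

With `\w+` the last two samples come out as 3 words each ("don", "t", "stop" and "na", "ve", "caf"), while `[^\s]+` gives 2 for both.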
You can make clever use of the replace() method even though you are not replacing anything.
var str = "the very long text you have...";
var counter = 0;
// Count each run of word characters (\b+ is a syntax error, and /\b/g alone would match twice per word)
str.replace(/\w+/g, function (a) {
// For each word found, increase the counter by 1
counter++;
return a; // leave the text unchanged
});
alert(counter);
The regex can be improved, for example to exclude HTML tags.
//Count words in a string or what appears as words :-)
function countWordsString(string){
// Trim first so leading/trailing whitespace is not counted as a separator
string = string.trim();
if (string === '') return 0;
var counter = 1;
// Change runs of whitespace into one space
string = string.replace(/\s+/g, ' ');
// Every remaining space separates two words
string.replace(/ /g, function (a) {
counter++;
return a;
});
return counter;
}
var numberWords = countWordsString('the very long text you have...');

Regular Expression for formatting numbers in JavaScript

I need to display a formatted number on a web page using JavaScript. I want to format it so that there are commas in the right places. How would I do this with a regular expression? I've gotten as far as something like this:
myString = myString.replace(/^(\d{3})*$/g, "${1},");
...and then realized this would be more complex than I think (and the regex above is not even close to what I need). I've done some searching and I'm having a hard time finding something that works for this.
Basically, I want these results:
45 becomes 45
3856 becomes 3,856
398868483992 becomes 398,868,483,992
...you get the idea.
This can be done in a single regex, no iteration required. If your browser supports ECMAScript 2018, you could simply use lookaround and just insert commas at the right places:
Search for (?<=\d)(?=(\d\d\d)+(?!\d)) and replace all with ,
Older JavaScript engines don't support lookbehind, so that doesn't work there. Fortunately, we only need to change a little bit:
Search for (\d)(?=(\d\d\d)+(?!\d)) and replace all with \1,
So, in JavaScript, that would look like:
result = subject.replace(/(\d)(?=(\d\d\d)+(?!\d))/g, "$1,");
Explanation: Assert that from the current position in the string onwards, it is possible to match digits in multiples of three, and that there is a digit left of the current position.
This will also work with decimals (123456.78) as long as there aren't too many digits "to the right of the dot" (otherwise you get 123,456.789,012).
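Both variants as functions (the names are illustrative). The first requires an engine with ES2018 lookbehind; the second also runs in older engines. The last line reproduces the decimal caveat just mentioned:

```javascript
// ES2018+: zero-width match at each position that has a digit on its left and
// a multiple of three digits (then a non-digit or the end) on its right.
function withLookbehind(s) {
  return s.replace(/(?<=\d)(?=(\d\d\d)+(?!\d))/g, ',');
}

// Older engines: consume the digit on the left instead and put it back with $1.
function withCapture(s) {
  return s.replace(/(\d)(?=(\d\d\d)+(?!\d))/g, '$1,');
}

console.log(withLookbehind('3856'));       // 3,856
console.log(withCapture('398868483992'));  // 398,868,483,992
console.log(withCapture('123456.789012')); // 123,456.789,012 (commas leak into the decimals)
```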
You can also define it in a Number prototype, as follows:
Number.prototype.format = function(){
return this.toString().replace(/(\d)(?=(\d{3})+(?!\d))/g, "$1,");
};
And then using it like this:
var num = 1234;
alert(num.format());
Credit: Jeffrey Friedl, Mastering Regular Expressions, 3rd edition, pp. 66-67
Formatting a number can be handled elegantly with one line of code.
This code extends the Number object; usage examples are included below.
Code:
Number.prototype.format = function () {
return this.toString().split( /(?=(?:\d{3})+(?:\.|$))/g ).join( "," );
};
How it works
The regular expression uses a look-ahead to find positions within the string where the only thing to the right of it is one or more groupings of three numbers, until either a decimal or the end of string is encountered. The .split() is used to break the string at those points into array elements, and then the .join() merges those elements back into a string, separated by commas.
The concept of finding positions within the string, rather than matching actual characters, is important in order to split the string without removing any characters.
Usage examples:
var n = 9817236578964235;
alert( n.format() ); // Displays "9,817,236,578,964,236": the literal exceeds Number.MAX_SAFE_INTEGER, so its last digit gets rounded
n = 87345.87;
alert( n.format() ); // Displays "87,345.87"
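One caveat with this lookahead: it can also fire inside the fractional part when the digits after the dot come in a multiple of three, so `(1.234).format()` yields `"1.,234"`. A minimal sketch that applies the same split only to the integer part. The name `formatWithSplit` is hypothetical, it assumes a finite number whose `toString()` has no exponent, and the added `\B` keeps a comma from appearing right after a minus sign:

```javascript
function formatWithSplit(n) {
  var parts = n.toString().split('.');
  // Split the integer part before every group of three digits that runs to its end;
  // \B forbids a split at the very start or right after a leading "-".
  parts[0] = parts[0].split(/\B(?=(?:\d{3})+$)/).join(',');
  return parts.join('.');
}

console.log(formatWithSplit(87345.87)); // 87,345.87
console.log(formatWithSplit(1.234));    // 1.234
console.log(formatWithSplit(-123));     // -123
console.log(formatWithSplit(1234567));  // 1,234,567
```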
Of course, the code can easily be extended or changed to handle locale considerations. For example, here is a new version of the code that automatically detects the locale settings and swaps the use of commas and periods.
Locale-aware version:
Number.prototype.format = function () {
if ((1.1).toLocaleString().indexOf(".") >= 0) {
return this.toString().split( /(?=(?:\d{3})+(?:\.|$))/g ).join( "," );
}
else {
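// toString() always uses "." for decimals, so group with "," first and then swap "." and ","
return this.toString().split( /(?=(?:\d{3})+(?:\.|$))/g ).join( "," ).replace( /[.,]/g, function (c) { return c === "." ? "," : "."; } );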
}
};
Unless it's really necessary, I prefer the simplicity of the first version though.
With the caveat that Intl.NumberFormat and Number.prototype.toLocaleString() now exist for exactly this purpose in JavaScript:
The other answers using regular expressions all break down for decimal numbers (although the authors seem not to know this, because they have only tested with 1 or 2 decimal places). This is because without lookbehind, JS regular expressions have no way to know whether you are working with the block of digits before or after the decimal point. That leaves two ways to address this with JS regular expressions:
Know whether there is a decimal point in the number, and use different regular expressions depending on that:
/(\d)(?=(\d{3})+$)/g for integers
/(\d)(?=(\d{3})+\.)/g for decimals
Use two regular expressions: one to match the integer portion, and a second to do a replace on it.
function format(num) {
return num.toString().replace(/^[+-]?\d+/, function(int) {
return int.replace(/(\d)(?=(\d{3})+$)/g, '$1,');
});
}
console.log(format(332432432))
console.log(format(332432432.3432432))
console.log(format(-332432432))
console.log(format(1E6))
console.log(format(1E-6))
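For completeness, a minimal sketch of the built-in route this answer mentions first. The `'en-US'` locale is an assumption; other locales use other group and decimal separators:

```javascript
// Intl.NumberFormat: create the formatter once and reuse it.
const fmt = new Intl.NumberFormat('en-US');

console.log(fmt.format(398868483992)); // 398,868,483,992
console.log(fmt.format(-332432432));   // -332,432,432

// toLocaleString takes the same locale and options. By default at most three
// fraction digits are shown (1234.5678 becomes "1,234.568"), so raise the limit to keep more.
console.log((1234.5678).toLocaleString('en-US', { maximumFractionDigits: 4 })); // 1,234.5678
```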
function numberWithCommas(x) {
return x.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}
var num=numberWithCommas(2000000); //any number
console.log(num);
Try this
// You might want to take decimals into account
Number.prototype.commas= function(){
var s= '', temp,
num= this.toString().split('.'), n=num[0];
while(n.length> 3){
temp= n.substring(n.length-3);
s= ','+temp+s;
n= n.slice(0, -3);
}
if(n) s= n+s;
if(num[1]) s+='.'+num[1];
return s;
}
var n= 10000000000.34;
n.commas(); // returned value: (String) "10,000,000,000.34"
underscore.string has a nice implementation.
I've amended it slightly to accept numeric strings.
function numberFormat(number, dec, dsep, tsep) {
if (isNaN(number) || number == null) return '';
number = parseFloat(number).toFixed(~~dec);
tsep = typeof tsep == 'string' ? tsep : ',';
var parts = number.split('.'),
fnums = parts[0],
decimals = parts[1] ? (dsep || '.') + parts[1] : '';
return fnums.replace(/(\d)(?=(?:\d{3})+$)/g, '$1' + tsep) + decimals;
}
console.log(numberFormat(123456789))
console.log(numberFormat(123456789.123456789))
console.log(numberFormat(-123456789))
console.log(numberFormat(1E6))
console.log(numberFormat(1E-6))
console.log('---')
console.log(numberFormat(123456789, 6, ',', '_'))
console.log(numberFormat(123456789.123456789, 6, ',', '_'))
console.log(numberFormat(-123456789, 6, ',', '_'))
console.log(numberFormat(1E6, 6, ',', '_'))
console.log(numberFormat(1E-6, 6, ',', '_'))
One RegExp for integers and decimals:
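// Formats number 1234.5678 into string "1 234.5678". Needs lookbehind support (ES2018+).
function formatNumber(number) {
// Insert a space before each group of three digits that runs up to a non-digit,
// unless the position is in the fractional part (after "." and digits) or at the start.
return number.toString().replace(/(?<!\.\d*|^)(?=(\d{3})+(?!\d))/g, ' ');
}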
console.log(formatNumber(1234.5678)); // "1 234.5678"
console.log(formatNumber(123)); // "123"
console.log(formatNumber(123.45678)); // "123.45678"
console.log(formatNumber(123456789.11111111)); // "123 456 789.1111111"
Try something like this:
function add_commas(numStr)
{
numStr += '';
var x = numStr.split('.');
var x1 = x[0];
var x2 = x.length > 1 ? '.' + x[1] : '';
var rgx = /(\d+)(\d{3})/;
while (rgx.test(x1)) {
x1 = x1.replace(rgx, '$1' + ',' + '$2');
}
return x1 + x2;
}
If you really want a regex, you can use two in a while loop:
while(num.match(/\d{4}/)) {
num = num.replace(/(\d{3})(,\d|$)/, ',$1$2');
}
And if you want to be fancy, you can format numbers with decimal points too:
while(num.match(/\d{4}(\,|\.)/)) {
num = num.replace(/(\d{3})(,\d|$|\.)/, ',$1$2');
}
Edit:
You can also do this with 2 regular expressions and no loop, splits, joins, etc:
num = num.replace(/(\d{1,2}?)((\d{3})+)$/, "$1,$2");
num = num.replace(/(\d{3})(?=\d)/g, "$1,");
The first regex puts a comma after the first 1 or 2 digits if the remaining number of digits is divisible by three. The second regex places a comma after every remaining group of 3 digits.
These won't work with decimals, but they work great for positive and negative integers.
Test output:
45
3,856
398,868,483,992
635
12,358,717,859,918,856
-1,388,488,184
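Wrapped as a function (the name `formatInteger` is hypothetical), the two replacements reproduce the test output above. The inputs are passed as strings here because 12358717859918856 is larger than Number.MAX_SAFE_INTEGER and would lose its last digit as a number:

```javascript
// Integers only, as noted above.
function formatInteger(num) {
  num = String(num);
  // 1) comma after the leading 1-2 digits when the rest is a multiple of 3 digits
  num = num.replace(/(\d{1,2}?)((\d{3})+)$/, '$1,$2');
  // 2) comma after every remaining group of 3 digits that is followed by a digit
  num = num.replace(/(\d{3})(?=\d)/g, '$1,');
  return num;
}

['45', '3856', '398868483992', '635', '12358717859918856', '-1388488184']
  .forEach(function (n) { console.log(formatInteger(n)); });
```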
Someone mentioned that lookbehind isn't possible in JavaScript RegExp; that was true before ES2018, which added lookbehind assertions. Here is a great page that explains lookaround (lookahead and lookbehind):
http://www.regular-expressions.info/lookaround.html
I think you would necessarily have to do multiple passes to achieve this with regular expressions. Try the following:
Run a regex for one digit followed by 3 digits.
If that regex matches, replace it with the first digit, then a comma, then the next 3 digits.
Repeat until (1) finds no matches.
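A sketch of those three steps. Two details are assumptions added here: the pattern includes `(?!\d)` so that each pass works on the last four digits of a run (a plain `/(\d)(\d{3})/` inserts commas from the left, giving 1,234567 after the first pass on 1234567), and only the integer part is processed so decimals are left alone. The function name is hypothetical:

```javascript
function addCommasMultiPass(value) {
  var parts = String(value).split('.');
  // Step 1: one digit followed by three digits, pinned to the end of a digit run.
  var re = /(\d)(\d{3})(?!\d)/;
  // Steps 2-3: insert the comma and repeat until nothing matches.
  while (re.test(parts[0])) {
    parts[0] = parts[0].replace(re, '$1,$2');
  }
  return parts.join('.');
}

console.log(addCommasMultiPass(45));           // 45
console.log(addCommasMultiPass(3856));         // 3,856
console.log(addCommasMultiPass(398868483992)); // 398,868,483,992
console.log(addCommasMultiPass('1234.5678'));  // 1,234.5678
```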
Iteration isn't necessary
function formatNumber(n, separator) {
separator = separator || ",";
n = n.toString()
.split("").reverse().join("")
.replace(/(\d{3})/g, "$1" + separator)
.split("").reverse().join("");
// Strings that have a length that is a multiple of 3 will have a leading separator
return n[0] == separator ? n.substr(1) : n;
}
var testCases = [1, 45, 2856, 398868483992];
for ( var i = 0; i < testCases.length; i++ ) {
console.info(testCases[i]);
console.log(formatNumber(testCases[i]));
}
Results
1
1
45
45
2856
2,856
398868483992
398,868,483,992
First reverse the character array, then add a comma after every group of three digits that is followed by another digit (so never at the end of the string or before a - sign). Then reverse the character array again and make it a string again.
function add_commas(numStr){
return numStr.split('').reverse().join('').replace(/(\d{3})(?=\d)/g, "$1,").split('').reverse().join('');
}
Brandon,
I didn't see too many answers working the regex from the decimal point back, so I thought I might chime in.
I wondered if there is any elegant benefit to re-writing the regexp to scan from the back forward...
function addCommas(inputText) {
// pattern works from right to left
var commaPattern = /(\d+)(\d{3})(\.\d*)*$/;
var callback = function (match, p1, p2, p3) {
return p1.replace(commaPattern, callback) + ',' + p2 + (p3 || '');
};
return inputText.replace(commaPattern, callback);
}
This accounts for any decimal place.
After a lot of searching, I came up with a regex that accepts all formats:
(\d+[-, ,(]{0,3}\d+[-, ,(,)]{0,3}\d+[-, ,(,)]{0,3}\d+[)]{0,2})
