Remove all ANSI colors/styles from strings - javascript

I use a library that adds ANSI colors / styles to strings. For example:
> "Hello World".rgb(255, 255, 255)
'\u001b[38;5;231mHello World\u001b[0m'
> "Hello World".rgb(255, 255, 255).bold()
'\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m'
When I do:
console.log('\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m')
a "Hello World" white and bold message will be output.
Having a string like '\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m' how can these elements be removed?
foo('\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m') //=> "Hello World"
Maybe a good regular expression? Or is there any built-in feature?
The workaround I was thinking of was to create a child process:
require("child_process")
.exec("node -pe \"console.error('\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m')\""
, function (err, stderr, stdout) { console.log(stdout);
});
But the output is the same...

The regex you should be using is
/[\u001b\u009b][[()#;?]*(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-ORZcf-nqry=><]/g
This matches most of the ANSI escape codes, beyond just colors, including the extended VT100 codes, archaic/proprietary printer codes, etc.
Note that the \u001b in the above regex may not work for your particular library (even though it should); check out my answer to a similar question regarding acceptable escape characters if it doesn't.
If you don't like regexes, you can always use the strip-ansi package.
For instance, the string jumpUpAndRed below contains ANSI codes for jumping to the previous line, writing some red text, and then going back to the beginning of the next line, some of which require suffixes other than m.
var jumpUpAndRed = "\x1b[F\x1b[31;1mHello, there!\x1b[m\x1b[E";
var justText = jumpUpAndRed.replace(
/[\u001b\u009b][[()#;?]*(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-ORZcf-nqry=><]/g, '');
console.log(justText);
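If the pattern is needed in more than one place, it can be wrapped in a small helper. This is only a sketch: the name removeAnsiCodes is made up for illustration and is not the API of the strip-ansi package; the pattern itself is copied unchanged from the answer above.

```javascript
// Pattern copied from the answer above; the helper name is hypothetical.
const ANSI_PATTERN = /[\u001b\u009b][[()#;?]*(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-ORZcf-nqry=><]/g;

// Returns the text with every matched escape sequence removed.
function removeAnsiCodes(text) {
  return text.replace(ANSI_PATTERN, '');
}

// The string from the question (bold + 256-color foreground).
const styledHello = '\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m';
console.log(removeAnsiCodes(styledHello)); // Hello World
```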

The escape character is \u001b, and the sequence from [ until first m is encountered is the styling. You just need to remove that. So, replace globally using the following pattern:
/\u001b\[.*?m/g
Thus,
'\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m'.replace(/\u001b\[.*?m/g, '')
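One caveat with the lazy .*? is that it swallows ordinary text when an escape sequence does not end in m. The sketch below assumes only color/style (SGR) sequences should be removed and restricts what may appear between [ and m to digits and semicolons; the variable names are illustrative.

```javascript
// Pattern from the answer above: anything from ESC[ up to the first "m".
const lazyPattern = /\u001b\[.*?m/g;
// Stricter variant (an assumption, not from the answer): only digits and ";" before "m".
const sgrOnlyPattern = /\u001b\[[0-9;]*m/g;

const styledText = '\u001b[1m\u001b[38;5;231mHello World\u001b[0m\u001b[22m';
// ESC[2K ("erase line") is not a color code and does not end in "m".
const withEraseLine = '\u001b[2Kcomplete\u001b[0m';

const strictResult = styledText.replace(sgrOnlyPattern, '');
const lazyOnErase = withEraseLine.replace(lazyPattern, '');
const strictOnErase = withEraseLine.replace(sgrOnlyPattern, '');

console.log(strictResult);                  // Hello World
console.log(lazyOnErase);                   // plete  (the "com" of "complete" was eaten)
console.log(JSON.stringify(strictOnErase)); // "\u001b[2Kcomplete"  (non-color code left alone)
```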

Color codes have the form ESC[39m, so the shortest regexp for them is /\u001b[^m]*?m/g
where \u001b is the ESC character,
[^m]*? matches any characters up to the next m (non-greedy),
m matches the terminating m itself, and the g flag replaces all occurrences.
Example:
var line="\x1B[90m2021-02-03 09:35:50.323\x1B[39m\t\x1B[97mFinding: \x1B[39m\x1B[97m»\x1B[39m\x1B[33m42125121242\x1B[39m\x1B[97m«\x1B[39m\x1B[0m\x1B[0m\t\x1B[92mOK\x1B[39m";
console.log(line.replace(/\u001b[^m]*?m/g,""));
// -> 2021-02-03 09:35:50.323 Finding: »42125121242« OK ( without colors )
console.log(line);
// -> 2021-02-03 09:35:50.323 Finding: »42125121242« OK ( colored )

Related

Remove everything after constant using regex

I've got XML that has additional information, BLAH, in each tag. When creating the tags, I've separated the extra info from the tag name with a constant (XMLSPLIT as constant XML_SPLITTER)... I needed to do this because I'm generating my XML from a JSON object and I can't have multiple keys that are the same thing... but in the XML output, can't have that superfluous stuff.
For example:
....
<SetXMLSPLITBLAH>
<Value>9</Value>
<SetType>
<Name>Foo</Name>
</SetType>
</SetXMLSPLITBLAH>
...
So, after generating the XML, I go through and clean it. I'm trying to do it with a regex. I figure, I want to remove anything on a line after the splitter and replace it with just the >.
let reg = new RegExp("<Set"+XML_SPLITTER+"(.*)\/g");
cleanXML = dirtyXML.replace(reg, "<Set>")
This fails to work.
I will note that I tried reg = /<Set(.*)/g; and that worked just fine... but it also captures "SetType" and any other tag that starts with "<Set".
It's because ^ is a Regex special character that indicates "beginning of line". You'd need to escape it like \^ for this to work. Something like /<Set\^\^[^>]*>/g should do the trick.
Small note: The above regex assumes that the "BLAH" string in your example will never contain the > character... but if it does, then your XML is super malformed anyway.
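When the splitter lives in a variable, it can be escaped programmatically before it is spliced into the pattern. The following is a minimal sketch assuming the splitter is ^^, as the answer above implies; escapeRegExp is a helper written here for illustration, not a built-in, and the sample XML is shortened.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a literal string.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const XML_SPLITTER = '^^'; // assumed splitter value
const dirtyXML = '<Set^^BLAH>\n<Value>9</Value>\n<SetType>\n</SetType>\n</Set^^BLAH>';

// Matches "<Set" or "</Set", then the splitter, then everything up to and including ">".
const splitterPattern = new RegExp('(</?Set)' + escapeRegExp(XML_SPLITTER) + '[^>]*>', 'g');
const cleanXML = dirtyXML.replace(splitterPattern, '$1>');

console.log(cleanXML);
```

Because the splitter is escaped first, the same code keeps working if the constant is later changed to some other string containing special characters.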
Using .* will match > and if - for some reason - your XML file is not broken up into multiple lines (i.e. minified), you'll match more than you should. To avoid this, you can use [^>]* to match everything up to the >.
Since you've gracefully included a splitter, it'll make matching much easier and much more predictable (as you mentioned, you match SetType without a splitter).
Without a splitter, you'd have to use a regex pattern that resembles <Set(?!Type>)[^>]* or <Set(?!(?:Type|SomethingElse)>)[^>]* if you had more than just one suffix to Set that should remain. These methods use a negative lookahead to assert what follows does not match.
var str = `<SetXMLSPLITBLAH>
<Value>9</Value>
<SetType>
<Name>Foo</Name>
</SetType>
</SetXMLSPLITBLAH>`
var XML_SPLITTER = 'XMLSPLIT'
var p = `(</?)Set${XML_SPLITTER}[^>]*`
var r = new RegExp(p,'g')
var x = str.replace(r,'$1Set')
console.log(x)
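For the case without a splitter, the negative-lookahead pattern mentioned above can be sketched as follows. The sample XML and variable names are made up for illustration, and Type is assumed to be the only suffix that must survive.

```javascript
// Sample input without a splitter: "BLAH" directly follows "Set".
const xmlWithoutSplitter = '<SetBLAH>\n<SetType>\n<Name>Foo</Name>\n</SetType>\n</SetBLAH>';

// "(<\/?)" keeps the opening "<" or "</"; "(?!Type>)" refuses to match <SetType> and </SetType>.
const lookaheadPattern = /(<\/?)Set(?!Type>)[^>]*/g;

const cleanedXml = xmlWithoutSplitter.replace(lookaheadPattern, '$1Set');
console.log(cleanedXml);
```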

Replace backslash in a string using regex

Why does this work:
'ye+low'.replace(/\+/g, 'l')
// > yellow
but this does NOT work:
'ye\low'.replace(/\\/g, 'l')
// > yelow
??
I need to replace ONE backslash with something, but I can't seem to make it happen.
NOTE: I CAN'T change the string as it comes in a variable.
EDIT: I understand \ is an escape character in javascript. This is fine with my understanding and I read plenty of other SO answers in this regard. My question is: "Ok I know, but still: HOW DO I REPLACE ye\low to be yellow using javascript?" I understand regex may not be the way to go because of its interpretation of backslashes, but I bet there is some way to get the desired output in some fashion.
Your code shows \l, which is not 2 characters, but one character. It is an invalid escape code that falls back to just l. If you want to represent a backslash in code, you have to escape the backslash like this 'ye\\low'. This might look like two backslashes, but this is the code that represents ONE backslash.
This is a string of 5 characters: 'ye\low'.
console.log('ye\low')
// "yelow"
'ye\low'.length === 5
These two blocks of code are identical:
'ye\low'.replace(/\\/g, 'l')
'yelow'.replace(/\\/g, 'l')
The character \l is invalid and is translated to an l with no slash.
If your string has a backslash in it, you have to escape the backslash like this: 'ye\\low'
const yelloWith_ONE_Backslash = 'ye\\low'
console.log(yelloWith_ONE_Backslash)
// "ye\low"
'ye\\low'.length === 6
// true
console.log('yelow')
// "yelow"
console.log('ye\low')
// "yelow"
console.log('ye\\low')
// "ye\low"
console.log('ye\\\\low')
// "ye\\low"
So you would do this:
'ye\\low'.replace(/\\/g, 'l')
Demo
var input = prompt('Try to type `ye\\low`')
var replaced = input.replace(/\\/g, 'l')
alert(replaced)
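The same replacement can be checked without a prompt. In this sketch the backslash is built from its character code (92) to imitate a string that arrives at runtime with a real backslash in it, so no escaping in a literal is involved; the variable names are illustrative.

```javascript
// Build the 6-character string y, e, \, l, o, w without writing a string escape.
const backslash = String.fromCharCode(92);
const runtimeInput = 'ye' + backslash + 'low';

console.log(runtimeInput.length); // 6

// Regex version: /\\/ is the pattern for ONE literal backslash.
const viaRegex = runtimeInput.replace(/\\/g, 'l');

// Non-regex alternative: split on the backslash and join with the replacement.
const viaSplitJoin = runtimeInput.split('\\').join('l');

console.log(viaRegex);     // yellow
console.log(viaSplitJoin); // yellow
```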
I found this to work for me:
// This allows backslash to be ineffective, meaning ye\low will remain as a string with 6 characters INCLUDING the \
var value = String.raw`ye\low`;
console.log( value.replace(/\\/g, 'l') )
Output is
yellow
Use this with caution if you must support very old browsers: String.raw is an ES2015 feature, and Internet Explorer does not support it. See more here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw

JavaScript regex match characters inside quotes and not in character set

I have a string I would like to split using #, ., [], or {} characters, as in CSS. The desired functionality is:
- Input:
"div#foo[bar='value'].baz{text}"
- Output:
["div", "#foo", "[bar='value'", ".baz", "{text"]
This is easy enough, with this RegEx:
input.match(/([#.\[{]|^.*?)[^#.\[{\]}]*/g)
However, this doesn't ignore syntax characters inside quotes, as I would like it to. (e.g. "div[bar='value.baz']" should ignore the .)
How can I make the second part of my RegEx (the [^#.\[{\]}]* portion) capture not only the negated character set, but also any character within quotes. In other words, how can I implement the RegEx, (\"|').+?\1 into my current one.
Edit:
I've figured out a regex that works decently, but it can't handle escaped quotes inside quotes (for example: "stuff here \\" quote "). If someone knows how to do that, it would be extremely helpful:
str.match(/([#.\[{]|^.*?)((['"]).*?\3|[^.#\[\]{\}])*/g);
var str = "div#foo[bar='value.baz'].baz{text}";
str.match(/(^|[\.#[\]{}])(([^'\.#[\]{}]+)('[^']*')?)+/g)
// [ 'div', '#foo', '[bar=\'value.baz\'', '.baz', '{text' ]
var tokens = myCssString.match(/\/\*[\s\S]*?\*\/|"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'|[\{\}:;\(\)\[\]./#]|\s+|[^\s\{\}:;\(\)\[\]./'"#]+/g);
Given your string, it produces
div
#
foo
[
bar=
'value.foo'
]
.
baz
{
text
}
The RegExp above is loosely based on the CSS 2.1 lexical grammar
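As a rough, self-contained check of that tokenizer, the sketch below wraps the expression in a function and runs it on the string from the question and on a value containing an escaped quote. The names CSS_TOKEN and tokenize are made up for illustration, and the double-quoted alternative is written with the * outside the group, mirroring the single-quoted one.

```javascript
// Alternatives, in order: comment | "string" | 'string' | punctuation | whitespace | anything else.
const CSS_TOKEN = /\/\*[\s\S]*?\*\/|"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'|[\{\}:;\(\)\[\]./#]|\s+|[^\s\{\}:;\(\)\[\]./'"#]+/g;

// Hypothetical wrapper: always returns an array, even when nothing matches.
function tokenize(css) {
  return css.match(CSS_TOKEN) || [];
}

const selectorTokens = tokenize("div#foo[bar='value.baz'].baz{text}");
console.log(selectorTokens);
// [ 'div', '#', 'foo', '[', 'bar=', "'value.baz'", ']', '.', 'baz', '{', 'text', '}' ]

// A backslash-escaped quote inside the quoted value stays within one token.
const escapedTokens = tokenize("a[b='it\\'s.x']");
console.log(escapedTokens.length); // 5
```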
Firstly, and I can't stress this enough: you shouldn't use regexps to parse CSS. You should use a real parser, for instance http://glazman.org/JSCSSP/ or similar; many have built them, so there is no need for you to reinvent the wheel.
That said, to solve your current problem do this:
var str = "div#foo[bar='value.foo'].baz{text}";
str.match(/([#.\[{]|^.*?)(?:[^#\[{\]}]*|\.*)/g);
//["div", "#foo", "[bar='value.foo'", ".baz", "{text"]

What does this JS do?

var passwordArray = pwd.replace(/\s+/g, '').split(/\s*/);
I found the above line of code in a rather poorly documented JavaScript file, and I don't know exactly what it does. I think it splits a string into an array of characters, similar to PHP's str_split. Am I correct, and if so, is there a better way of doing this?
It removes all whitespace from the password and then splits the password into an array of characters.
Converting a string into an array of characters is a bit redundant, because you can already access the characters of a string through brackets (though not in older IE :( ) or through the string method "charAt":
var a = "abcdefg";
alert(a[3]);//"d"
alert(a.charAt(1));//"b"
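It does the same as pwd.split(/\s*/), except that split alone leaves an empty string in the array when pwd starts or ends with whitespace.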
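pwd.replace(/\s+/g, '').split(/\s*/) removes all whitespace (tab, space, CR/LF, etc.) and splits the remainder (the string returned from the replace operation) into an array of characters. Combining the two is redundant: after the replace there is no whitespace (\s) left in pwd, so the \s* in the split pattern never matches anything but the empty string.
Hence pwd.split(/\s*/) alone should be sufficient, as long as pwd does not start or end with whitespace (in that case split alone leaves an empty string at that end of the array). So: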
'hello cruel\nworld\t how are you?'.split(/\s*/)
// prints in alert: h,e,l,l,o,c,r,u,e,l,w,o,r,l,d,h,o,w,a,r,e,y,o,u,?
as will
'hello cruel\nworld\t how are you?'.replace(/\s+/g, '').split(/\s*/)
The replace portion removes all whitespace from the password. The \s+ pattern matches one or more whitespace characters. The g flag makes it match every run of whitespace, and each run is replaced with an empty string.
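As for a simpler way to do this, one possible sketch is to strip the whitespace and then split on the empty string. The function name toChars is made up for illustration. The last line shows what happens when split(/\s*/) is used on its own with leading whitespace.

```javascript
// Remove all whitespace, then split into single characters (UTF-16 code units).
function toChars(pwd) {
  return pwd.replace(/\s+/g, '').split('');
}

const cleanedChars = toChars(' a b\tc\n');
console.log(cleanedChars); // [ 'a', 'b', 'c' ]

// split(/\s*/) alone keeps an empty string where the input starts with whitespace.
const splitOnly = ' a b'.split(/\s*/);
console.log(splitOnly); // [ '', 'a', 'b' ]
```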

Create a permalink with JavaScript

I have a textbox where a user puts a string like this:
"hello world! I think that __i__ am awesome (yes I am!)"
I need to create a correct URL like this:
hello-world-i-think-that-i-am-awesome-yes-i-am
How can it be done using regular expressions?
Also, is it possible to do it with Greek (for example)?
"Γεια σου κόσμε"
turns to
geia-sou-kosme
In other programming languages (Python/Ruby) I am using a translation array. Should I do the same here?
Try this:
function doDashes(str) {
var re = /[^a-z0-9]+/gi; // global and case insensitive matching of non-char/non-numeric
var re2 = /^-*|-*$/g; // get rid of any leading/trailing dashes
str = str.replace(re, '-'); // perform the 1st regexp
return str.replace(re2, '').toLowerCase(); // ..aaand the second + return lowercased result
}
console.log(doDashes("hello world! I think that __i__ am awesome (yes I am!)"));
// => hello-world-i-think-that-i-am-awesome-yes-i-am
As for the greek characters, yeah I can't think of anything else than some sort of lookup table used by another regexp.
Edit, here's the oneliner version:
Edit, added toLowerCase():
Edit, embarrassing fix to the trailing regexp:
function doDashes2(str) {
return str.replace(/[^a-z0-9]+/gi, '-').replace(/^-*|-*$/g, '').toLowerCase();
}
A simple regex for doing this job is one that matches all "non-word" characters and replaces them with a -. Before applying this regex, convert the string to lowercase. This alone is not foolproof, since the result may end with a dash.
[^a-z]+
Thus, after the replacement; you can trim the dashes (from the front and the back) using this regex:
^-+|-+$
You'd have to create the Greek-to-Latin glyph translation yourself; regex can't help you there. Using a translation array is a good idea.
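A translation table of that kind might look like the sketch below. The mapping is an illustrative assumption, not a standard transliteration scheme: it is chosen so that the example from the question comes out as geia-sou-kosme, and a real application would need to decide on digraphs and other special cases. The names GREEK_TO_LATIN and slugify are made up.

```javascript
// Illustrative lowercase Greek-to-Latin table (an assumption, not an official scheme).
const GREEK_TO_LATIN = {
  α: 'a', β: 'v', γ: 'g', δ: 'd', ε: 'e', ζ: 'z', η: 'i', θ: 'th',
  ι: 'i', κ: 'k', λ: 'l', μ: 'm', ν: 'n', ξ: 'x', ο: 'o', π: 'p',
  ρ: 'r', σ: 's', ς: 's', τ: 't', υ: 'u', φ: 'f', χ: 'ch', ψ: 'ps', ω: 'o'
};

function slugify(text) {
  return text
    .normalize('NFD')                                   // split accented letters into base + accent
    .replace(/[\u0300-\u036f]/g, '')                    // drop the combining accents
    .toLowerCase()
    .replace(/[α-ω]/g, ch => GREEK_TO_LATIN[ch] || ch)  // table lookup for Greek letters
    .replace(/[^a-z0-9]+/g, '-')                        // everything else becomes a dash
    .replace(/^-+|-+$/g, '');                           // trim leading/trailing dashes
}

console.log(slugify('Γεια σου κόσμε')); // geia-sou-kosme
console.log(slugify('hello world! I think that __i__ am awesome (yes I am!)'));
// hello-world-i-think-that-i-am-awesome-yes-i-am
```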
I can't really say for Greek characters, but for the first example, a simple:
/[^a-zA-Z]+/
Will do the trick when using it as your pattern, and replacing the matches with a "-"
As for the Greek characters, I'd suggest using an array with all the "character translations", and then adding its values to the regular expression.
To roughly build the url you would need something like this.
var textbox = "hello world! I think that __i__ am awesome (yes I am!)";
var url = textbox.toLowerCase().replace(/[^a-z\s]/g, '').replace(/\s+/g, ' ').replace(/\s/g, '-');
It removes all characters other than letters and whitespace, collapses repeated spaces, and then replaces each remaining space with a dash.
You could use another regular expression to replace the greek characters with english characters.
