\s RegEx not capturing new line data


I am trying to clean up input and put it into a desired format. Basically, we have serial numbers that are entered several different ways: newline-delimited (Enter key), space, comma, etc.
My problem with the code below is that, in testing, newline-delimited input isn't working. According to w3schools and 2 other sites:
The \s metacharacter is used to find a whitespace character.
A whitespace character can be:
-A space character
-A tab character
-A carriage return character
-A new line character
-A vertical tab character
-A form feed character
This should mean that I can catch basically any new line. In Netsuite, the user is entering the value as:
SN1SN2SN3
I want this to change to "SN1,SN2,SN3,". Currently the \s RegEx is not picking up the newline. Any help would be appreciated.
**For the record: while I am using Netsuite (CRM) to get the input, the rest of this code is typical javascript and regex work. This is why I am using all 3 tags: netsuite, js, and regex.
function fixSerailNumberString(s_serialNum){
    var cleanString = '';
    var regExSpace = new RegExp('\\s',"g");
    if(regExSpace.test(s_serialNum)){
        var a_splitSN = s_serialNum.split(regExSpace);
        for(var i = 0; i < a_splitSN.length;i++){
            if(a_splitSN[i].length!=0){
                cleanString = cleanString + a_splitSN[i]+((a_splitSN[i].split(',').length>1)?'':',');
            }
        }
        return cleanString;
    }
    else{
        alert("No cleaning needed");
        return s_serialNum;
    }
}
EDITS:
1-I need to handle both if it has spaces (such as "sn1, sn2, sn3" needs to become "sn1,sn2,sn3") and this newline issue. What I have above works for the spaces.
2- I am not sure if it matters, but the field is a textarea. Does that impact this?

@Cheery found why this was happening. As I said, I was using the Netsuite API to get the data. In the Netsuite UI the data looked like each value was on a new line; however, a console.log showed that the values were not separated by newlines.
Example:
UI displayed:
sn1
sn2
sn3
Console.log displayed:
sn1sn2sn3
I was assuming the UI translated into the actual value and didn't think to check what the string was.

NetSuite multi-select fields (like the Serial Numbers transaction column) usually return all selected values as a single string, as you've noted with "sn1sn2sn3"; however, each of these values is actually separated by a non-printing character \x05. Try .split(/\x05/).join(',')
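A minimal sketch of how that split could be combined with the space and comma handling from the question. The function name normalizeSerialNumbers is hypothetical, and the trailing comma is kept only because the question asks for "SN1,SN2,SN3,".

```javascript
// Hypothetical helper: split on the \x05 separator, any whitespace, or commas,
// drop empty pieces, and rejoin with commas (plus the trailing comma the question wants).
function normalizeSerialNumbers(raw) {
  var parts = String(raw).split(/[\x05\s,]+/).filter(function (part) {
    return part.length > 0;
  });
  return parts.length ? parts.join(',') + ',' : '';
}

console.log(normalizeSerialNumbers('sn1\x05sn2\x05sn3')); // sn1,sn2,sn3,
console.log(normalizeSerialNumbers('sn1, sn2,\nsn3'));    // sn1,sn2,sn3,
```

Logging the raw value with JSON.stringify first is a quick way to confirm which separator the field really contains.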

Related

Regex-rule for matching single word during input (TipTap InputRule)

I'm currently experimenting with TipTap, an editor framework.
My goal is to build a Custom Node extension for TipTap that wraps a single word in <w>-Tags whenever a user is typing text. In TipTap I can write an InputRule with Regex for this purpose.
For example the rule /(?:^|\s)((?:~)((?:[^~]+))(?:~))$/ will match text between two tildes (~text~) and wrap it with <strike>-Tags.
Click here for my Codesandbox
I was trying for so long and can't figure it out. Here are the rules that I tried:
/**
* Regex that matches a word node during input
*/
// Will match words between two tilde characters; I'm using this expression from the documentation as my starting point.
//const inputRegex = /(?:^|\s)((?:~)((?:[^~]+))(?:~))$/
// Will match a word but will append the following text to that word without the space inbetween
//const inputRegex = /\b\w+\b\s$/
// Will match a word but will append the following text to previous word without the space inbetween; Will work with double spaces
//const inputRegex = /(?:^|\s\b)(?:[^\s])(\w+\b)(?:\s)$/
// Will match a word but will swallow every second character
//const inputRegex = /\b([^\s]+)\b$/g
// Will match every second word
//const inputRegex = /\b([^\s]+)\b\s(?:\s)$/
// Will match every word but swallow spaces; Will work if I insert double spaces
const inputRegex = /\b([^\s]+)(?:\b)\s$/

The problem here is the choice of delimiter, which is space.
This becomes clear when we see the code for markInputRule.ts (line 37 to be precise)
if (captureGroup) {
const startSpaces = fullMatch.search(/\S/)
const textStart = range.from + fullMatch.indexOf(captureGroup)
const textEnd = textStart + captureGroup.length
const excludedMarks = getMarksBetween(range.from, range.to, state.doc)
When we are using '~' as delimiters, the input rule tries to place the markers for start and end without the delimiters, and provides the enclosed text to the extension tag (CustomItalic, in your case). You can test this by entering strike-through text enclosed in '~': the '~' characters are stripped out and the text is put inside the strike-through tag.
This is exactly the cause of your double-space problem: when your match is a word followed by a space, the space is replaced and then the text is entered into the tag.
I have tried to work around this using negative look-ahead patterns, but the problem remains in the code of the file mentioned above.
What I would suggest here is to copy the code in markInputRule.ts and make a custom InputRule as per your requirements, which would be way easier than working with the in-built one. Hope this helps.

I assume the problem lies within the "space". Depending on the browser, the final "space" is either not represented at all in the underlying html (Firefox) or replaced with &nbsp; (e.g. Chrome).
I suggest you replace the \s with (\s|\&nbsp;) in your regex.

Javascript regex to match more than one line break

I need a javascript regex that will not allow more than one line break or carriage return. One line break is OK, more than one should not be permitted. I have this which does not allow any, but I'm unable to modify it to allow only one line break?
^[^\n\r]*$

The round brackets constitute a group. Your group is "\n\r", which should not occur multiple times, so you use a "+", which means one or more. In the following example, every run of one or more "\n\r" is replaced with a single "\n\r" (written as the literal text \n\r so that it is visible in the console output).
var multiple = "hello\n\r\n\rworld\n\r!"
var single = multiple.replace(/(\n\r)+/g, "\\n\\r");
console.log(single);

Instead of looking for ^[\n\r]*, look for ^\n\r[\n\r]*
var regpat = /^(\n\r)[\n\r]*/;
var str = "\n\r\n\r";
str = str.replace(regpat, '$1'); // collapses the leading run to a single "\n\r"

You can use match to test for multiple \n and show an alert, like so:
var text = "hello\nworld\n\nmore here\n"
if (text.match(/\n[\n]+/g)){
    alert("Error: multiple new lines");
}
You may want to remove the \r first, or alter the above to match \r as well.
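For the validation the question actually asks for, one possible pattern is sketched below. It is an assumption about the intent: it treats \r\n as a single line break and rejects anything with a second break. The variable name is illustrative only.

```javascript
// Sketch: accept text containing at most one line break (\r\n, \r, or \n).
var atMostOneLineBreak = /^[^\r\n]*(?:\r\n|\r|\n)?[^\r\n]*$/;

console.log(atMostOneLineBreak.test('hello world'));    // true
console.log(atMostOneLineBreak.test('hello\nworld'));   // true
console.log(atMostOneLineBreak.test('hello\r\nworld')); // true
console.log(atMostOneLineBreak.test('hello\n\nworld')); // false
```

The \r\n alternative is listed first so that a Windows-style break is consumed as one unit rather than counted as two.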

How to remove leading and trailing white spaces from input text?

I need to fix a bug in AngularJS application, which has many forms to submit data-
Every Text box in forms is accepting whitespaces(both leading and trailing) and saving them into the database. So in order to fix this I used ng-trim="true", it worked and data is getting saved correctly in the back-end.
Problem: Even after using ng-trim when I click on save/update, the form UI shows the text with white-spaces not the trimmed data. It shows correct data only when I refresh the page.
Can anyone guide me.. what will be the approach to fix this?
P.S. - I'm new to both JavaScript and Angular!
Thanks

Using the trim() method works fine, but it is only available in newer browsers.
function removeWhitespaceUsingTrimMethod() {
    var str = " This is whitespace string for testing purpose ";
    var wsr = str.trim();
    alert(wsr);
}
Output:
This is whitespace string for testing purpose
From Docs:
(method) String.trim(): string
Removes the leading and trailing white space and line terminator
characters from a string.
Using replace() method – works in all browsers
Syntax:
testStr.replace(rgExp, replaceText);
str.replace(/^\s+|\s+$/g, '');
function removeWhitespaceUsingReplaceMethod() {
    var str = " This is whitespace string for testing purpose ";
    var wsr = str.replace(/^\s+|\s+$/g, '');
    alert(wsr);
}
Output:
This is whitespace string for testing purpose

Use string = string.trim() where string is the variable holding your string value.

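When using reactive forms:
this.form.controls['email'].valueChanges
    .subscribe(x => {
        // Compare against the trimmed value so that inner spaces cannot re-trigger setValue forever.
        if (x !== x.trim()) {
            console.log('has leading or trailing spaces')
            this.form.controls['email'].setValue(x.trim(), { emitEvent: false })
        } else {
            console.log('no leading or trailing spaces')
        }
    })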

How to allow whitespace in Regex

I have a regex which allows only to enter integers and floats in a text box.
Regex Code:-
("^[0-9]*(?:[.][0-9]*|)$");
But it gives an error when the user enters whitespace at the beginning and end of the entered values. I want the user to allow spaces at the beginning and at the end as optional, so I changed the regex as below but it didn't work.
Note: Spaces may be spaces or tabs.
Test Case: User might enter:
"10","10.23"," 10","10 "," 10.23","10.23 "
Any number of spaces are allowed.
("^(?:\s)*[0-9]*(?:[.][0-9]*|)$")
I am newbie with regex, so any help will be highly appreciated.
Thank you.

Try this:
/^\s*[0-9]*(?:[.][0-9]*|)\s*$/;
You don't have to wrap a single entity in a group to repeat it, and I have added a second zero-or-more-spaces at the end which is what you are missing to make it work.
Note: You have not posted the code you use to create the RegExp object, but if it is new RegExp(string), remember to escape your backslashes (by doubling them):
var r = new RegExp("^\\s*[0-9]*(?:[.][0-9]*|)\\s*$");
Also, as @Blender suggests, this can be simplified to:
/^\s*[0-9]*(?:\.[0-9]*)?\s*$/;
Or, using \d instead of [0-9]:
/^\s*\d*(?:\.\d*)?\s*$/;
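As a quick check, the last pattern can be run against the test cases from the question. The variable names and the two extra rejection samples ('10.24.25' and '1 0') are added here for illustration.

```javascript
// The \d-based pattern from the answer, checked against the question's test cases.
var numberWithOptionalSpaces = /^\s*\d*(?:\.\d*)?\s*$/;

var samples = ['10', '10.23', ' 10', '10 ', ' 10.23', '10.23 ', '10.24.25', '1 0'];
var results = samples.map(function (s) {
  return numberWithOptionalSpaces.test(s);
});

console.log(results); // six true values, then false, false
```

Because every part of the pattern is optional, it also accepts an empty string or a lone "."; changing the first \d* to \d+ would require at least one leading digit.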

You don't necessarily need a Regular Expression: !isNaN(Number(textboxvalue.trim())) would be sufficient.
Otherwise, try /^\s{0,}\d+\.{0,1}\d+\s{0,}$/. Test:
var testvalues = ["10","10.23"," 10","10 "," 10.23","10.23 ","10.24.25"];
for (var i=0;i<testvalues.length;i+=1){
console.log(/^\s{0,}\d+\.{0,1}\d+\s{0,}$/.test(testvalues[i]));
}
//=> 6 x true, 1 x false

Chrome counts characters wrong in textarea with maxlength attribute

Here is an example:
$(function() {
$('#test').change(function() {
$('#length').html($('#test').val().length)
})
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id=test maxlength=10></textarea>
length = <span id=length>0</span>
Fill the textarea with lines (one character per line) until the browser stops accepting input.
When you finish, leave the textarea, and the js code will count the characters too.
In my case I could enter only 7 characters (including whitespace) before Chrome stopped me, although the value of the maxlength attribute is 10.

Here's how to get your javascript code to match the amount of characters the browser believes is in the textarea:
http://jsfiddle.net/FjXgA/53/
$(function () {
$('#test').keyup(function () {
var x = $('#test').val();
var newLines = x.match(/(\r\n|\n|\r)/g);
var addition = 0;
if (newLines != null) {
addition = newLines.length;
}
$('#length').html(x.length + addition);
})
})
Basically you just count the total line breaks in the textbox and add 1 to the character count for each one.

Your carriage returns are considered 2 characters each when it comes to maxlength.
1\r\n
1\r\n
1\r\n
1
But it seems that the javascript only counts one character of each \r\n (I am not sure which one), which only adds up to 7.

It seems like the right method, based on Pointy's answer above, is to count all new lines as two characters. That will standardize it across browsers and match what will get sent when it's posted.
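So we could follow the spec and replace every line break, whether it is a lone Carriage Return, a lone New Line, or an existing Carriage Return - Line Feed pair, with a Carriage Return - Line Feed pair. Matching the existing pair as a unit matters: a pattern such as /\r(?!\n)|\n(?!\r)/ also matches the New Line inside \r\n and turns it into \r\r\n.
var len = $('#test').val().replace(/\r\n|\r|\n/g, "\r\n").length;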
Then use that variable to display the length of the textarea value, or limit it, and so on.
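The same idea can be written as a small jQuery-free function, which makes it easy to check. This is only a sketch; the name submittedLength is hypothetical.

```javascript
// Hypothetical helper: length of a textarea value as it would be submitted,
// with every line break counted as the two-character pair \r\n.
function submittedLength(value) {
  return value.replace(/\r\n|\r|\n/g, '\r\n').length;
}

console.log(submittedLength('1\n1\n1\n1'));       // 10
console.log(submittedLength('1\r\n1\r\n1\r\n1')); // 10
```

The first call matches the example from the question: four characters plus three line breaks give 7 in JavaScript but 10 once each break counts as two.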

For reasons unknown, jQuery always converts all newlines in the value of a <textarea> to a single character. That is, if the browser gives it \r\n for a newline, jQuery makes sure it's just \n in the return value of .val().
Chrome and Firefox both count the length of <textarea> tags the same way for the purposes of "maxlength".
However, the HTTP spec insists that newlines be represented as \r\n. Thus, jQuery, webkit, and Firefox all get this wrong.
The upshot is that "maxlength" on <textarea> tags is pretty much useless if your server-side code really has a fixed maximum size for a field value.
edit — at this point (late 2014) it looks like Chrome (38) behaves correctly. Firefox (33) however still doesn't count each hard return as 2 characters.

It looks like javascript is also counting the length of the new line characters.
Try using:
var x = $('#test').val();
x = x.replace(/(\r\n|\n|\r)/g,"");
$('#length').html(x.length);
I used it in your fiddle and it was working. Hope this helps.

That is because a new line is actually 2 bytes, and is therefore counted as 2 characters. JavaScript doesn't see it that way and counts only 1, which gives the total of 7 (with 3 new lines).

Here's a more universal solution, which overrides the jQuery 'val' function. Will be making this issue into a blog post shortly and linking here.
var originalVal = $.fn.val;
$.fn.val = function (value) {
if (typeof value == 'undefined') {
// Getter
if ($(this).is("textarea")) {
return originalVal.call(this)
.replace(/\r\n/g, '\n') // reduce all \r\n to \n
.replace(/\r/g, '\n') // reduce all \r to \n (we shouldn't really need this line. this is for paranoia!)
.replace(/\n/g, '\r\n'); // expand all \n to \r\n
// this two-step approach allows us to not accidentally catch a perfect \r\n
// and turn it into a \r\r\n, which wouldn't help anything.
}
return originalVal.call(this);
}
else {
// Setter
return originalVal.call(this, value);
}
};

If you want to get remaining content length of text area then you can use match on the string containing the line breaks.
HTML:
<textarea id="content" rows="5" cols="15" maxlength="250"></textarea>
JS:
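var getContentWidthWithNextLine = function(){
    // Read the text from the textarea; the element itself has no length.
    var text = document.getElementById('content').value;
    return 250 - text.length + (text.match(/\n/g)||[]).length;
}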

var value = $('#textarea').val();
var numberOfLineBreaks = (value.match(/\n/g)||[]).length;
$('#textarea').attr("maxlength",500+numberOfLineBreaks);
This works perfectly in Google Chrome, but avoid the script in IE! In IE the line break is counted only once, so do not use this solution there.

Textareas are still not fully in sync among browsers. I noticed 2 major problems: Carriage returns and Character encodings
Carriage return
By default, carriage returns are handled as 2 characters, \r\n (Windows style).
The problem is that Chrome and Firefox will count it as one character. You can also select it to see that an invisible character is selected, like a space.
A workaround is found here:
var length = $.trim($(this).val()).split(" ").join("").split('\n').join('').length;
Jquery word counts when user type line break
Internet Explorer, on the other hand, will count it as 2 characters.
Their representation is:
Binary: 00001101 00001010
Hex: 0D0A
They are represented in UTF-8 as 2 characters and counted for maxlength as 2 characters.
The HTML entities can be
1) Created from javascript code:
<textarea id='txa'></textarea>
document.getElementById("txa").value = String.fromCharCode(13, 10);
2) Parsed from the content of the textarea:
Ansi code: &#13;&#10;
<textarea>Line one.&#13;&#10;Line two.</textarea>
3) Inserted from keyboard Enter key
4) Defined as the multiline content of the textbox
<textarea>Line one.
Line two.</textarea>
Character Encoding
The character encoding of an input field like a textarea is independent of the character encoding of the page. This is important if you plan to count bytes. So, if you have a meta header that defines an ANSI encoding for your page (with 1 byte per character), the content of your textbox is still UTF-8, where non-ASCII characters take 2 or more bytes.
A workaround for the character encoding is provided here:
function htmlEncode(value){
// Create a in-memory div, set its inner text (which jQuery automatically encodes)
// Then grab the encoded contents back out. The div never exists on the page.
return $('<div/>').text(value).html();
}
function htmlDecode(value){
return $('<div/>').html(value).text();
}
HTML-encoding lost when attribute read from input field
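If the goal is to count bytes rather than characters, one option in current browsers and Node.js is the standard TextEncoder API. This is a sketch that is not part of the original answer; the function name utf8ByteLength is hypothetical.

```javascript
// Sketch: count the UTF-8 bytes of a string rather than its UTF-16 code units.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log('abc'.length, utf8ByteLength('abc'));       // 3 3
console.log('\u00e9'.length, utf8ByteLength('\u00e9')); // 1 2  (e with acute accent)
console.log('\u20ac'.length, utf8ByteLength('\u20ac')); // 1 3  (euro sign)
```

The byte count depends on the characters entered, so a server-side byte limit cannot be enforced by string length or maxlength alone.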
