Regex-rule for matching single word during input (TipTap InputRule)

I'm currently experimenting with TipTap, an editor framework.
My goal is to build a Custom Node extension for TipTap that wraps a single word in <w>-Tags whenever a user is typing text. In TipTap I can write an InputRule with a regex for this purpose.
For example the rule /(?:^|\s)((?:~)((?:[^~]+))(?:~))$/ will match text between two tildes (~text~) and wrap it with <strike>-Tags.
I was trying for so long and can't figure it out. Here are the rules that I tried:
/**
* Regex that matches a word node during input
*/
// Will match words between two tilde characters; I'm using this expression from the documentation as my starting point.
//const inputRegex = /(?:^|\s)((?:~)((?:[^~]+))(?:~))$/
// Will match a word but will append the following text to that word without the space inbetween
//const inputRegex = /\b\w+\b\s$/
// Will match a word but will append the following text to previous word without the space inbetween; Will work with double spaces
//const inputRegex = /(?:^|\s\b)(?:[^\s])(\w+\b)(?:\s)$/
// Will match a word but will swallow every second character
//const inputRegex = /\b([^\s]+)\b$/g
// Will match every second word
//const inputRegex = /\b([^\s]+)\b\s(?:\s)$/
// Will match every word but swallow spaces; Will work if I insert double spaces
const inputRegex = /\b([^\s]+)(?:\b)\s$/

The problem here is the choice of delimiter, which is space.
This becomes clear when we see the code for markInputRule.ts (line 37 to be precise)
if (captureGroup) {
const startSpaces = fullMatch.search(/\S/)
const textStart = range.from + fullMatch.indexOf(captureGroup)
const textEnd = textStart + captureGroup.length
const excludedMarks = getMarksBetween(range.from, range.to, state.doc)
When '~' is the delimiter, the input rule places the start and end of the mark inside the delimiters and passes only the enclosed text to the extension tag (CustomItalic, in your case). You can test this by typing strike-through text enclosed in '~': the '~' characters are stripped and the text is put inside the strike-through tag.
This is exactly the cause of your double-space problem: when your match is a word followed by a space, the space is treated as a delimiter and removed, and only the word is placed inside the tag.
I have tried to work around this using negative look-ahead patterns, but the problem remains in the code of the file mentioned above.
What I would suggest here is to copy the code in markInputRule.ts and make a custom InputRule as per your requirements, which would be way easier than working with the in-built one. Hope this helps.
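To see why the space disappears, the offset arithmetic quoted above can be replayed in plain JavaScript. This is only a sketch: simulateMarkRule is a hypothetical helper, not part of TipTap, and it assumes that everything in the match outside the capture group is dropped, as the excerpt suggests.

```javascript
// Plain-JS illustration only; nothing here imports or calls TipTap.
// `simulateMarkRule` is a hypothetical helper invented for this sketch.
const inputRegex = /\b([^\s]+)(?:\b)\s$/;

function simulateMarkRule(textBeforeCursor, regex) {
  const match = regex.exec(textBeforeCursor);
  if (!match) return null;

  const fullMatch = match[0];
  const captureGroup = match[match.length - 1];
  const from = match.index;

  // Same arithmetic as the excerpt: locate the capture group inside the match.
  const textStart = from + fullMatch.indexOf(captureGroup);
  const textEnd = textStart + captureGroup.length;

  // Assumption: whatever lies in the match but outside the capture group
  // is treated as a delimiter and removed.
  const dropped = fullMatch.slice(textEnd - from);
  const resultingText =
    textBeforeCursor.slice(0, from) + textBeforeCursor.slice(textStart, textEnd);

  return { marked: captureGroup, dropped, resultingText };
}

const simulated = simulateMarkRule("hello world ", inputRegex);
console.log(simulated);
```

Here the trailing space ends up in dropped rather than in the resulting text, which matches the "swallowed spaces" behaviour described in the question.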

I assume the problem lies with the "space". Depending on the browser, the final "space" is either not represented at all in the underlying HTML (Firefox) or replaced with a non-breaking space, &nbsp; (e.g. Chrome).
I suggest you replace the \s with (\s|\&nbsp;) in your regex.

Related

Regex to convert markdown to html

My goal is to take a markdown text and create the necessary bold/italic/underline html tags.
Looked around for answers, got some inspiration but I'm still stuck.
I have the following typescript code, the regex matches the expression including the double asterisk:
var text = 'My **bold\n\n** text.\n'
var bold = /(?=\*\*)((.|\n)*)(?<=\*\*)/gm
var html = text.replace(bold, '<strong>$1</strong>');
console.log(html)
Now the result of this is: My <strong>**bold\n\n**</strong> text.
Everything is great aside from the leftover double asterisk.
I also tried to remove them in a later 'replace' statement, but this creates further issues.
How can I ensure they are removed properly?

With your pattern (?=\*\*)((.|\n)*)(?<=\*\*) you assert (not match) with (?=\*\*) that there is ** directly to the right.
Then directly after that, you capture the ** using ((.|\n)*) so then it becomes part of the match.
Then at the end you assert again with (?<=\*\*) that there is ** directly to the left, but ((.|\n)*) has already matched it.
This way you end up with all the ** in the match.
You don't need lookarounds at all, as you are already using a capture group.
In Javascript you could match the ** on the left and right and capture any character in a capture group:
\*\*([^]*?)\*\*
But I would suggest using a dedicated parser to parse markdown instead of using a regex.
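As a quick check, the capture-group pattern from this answer can be run against the string from the question. This is a minimal sketch; the variable names are only illustrative.

```javascript
// The sample text from the question: the bold span contains newlines.
const markdownText = 'My **bold\n\n** text.\n';

// Match the ** on both sides and capture everything in between.
// [^] matches any character, including newlines; *? keeps the match lazy.
const boldPattern = /\*\*([^]*?)\*\*/g;

const boldHtml = markdownText.replace(boldPattern, '<strong>$1</strong>');
console.log(JSON.stringify(boldHtml));
```

Because the asterisks are matched outside the capture group, they are not part of $1 and do not appear in the output.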

Just make another call to replaceAll, replacing the ** with an empty string.
var text = 'My **bold\n\n** text.\n'
var bold = /(?=\*\*)((.|\n)*)(?<=\*\*)/gm
var html = text.replace(bold, '<strong>$1</strong>');
html = html.replaceAll(/\*\*/gm,'');
console.log(html)

Repeat a javascript replace until no change is made?

I have a short but complex regular expression to trim spaces regardless of html tags present in the string.
var text = "<span><span>ex ample </span> </span>";
// trim from start; not relevant in this example
text = text.replace(/^((<[^>]*>)*)\s+/g, "$1");
// trim from end
text = text.replace(/\s+((<[^>]*>)*)$/g, "$1");
console.log(text);
<span><span>ex ample </span> </span> - example input
<span><span>ex ample</span></span> - expected output
<span><span>ex ample </span></span> - observed output
How do I achieve my expected output?
I've tried adding the /g flag because it should supposedly match more than once and that should fix it (running the replace twice does work for the example) but it doesn't seem to repeat anything at all.
Alternative ways to trim strings regardless of tags are also appreciated because that is my primary objective. The secondary objective is learning why this didn't work.

You need to add some meaning to your tags, some need their spaces, some don't.
Try this:
text.replace(/\s*(<\/?(span|div)>)\s*/g, "$1")
.trim()
.replace(/\s+/g, ' ');
It:
replaces spaces around tags "surrounding" content
trims spaces around global string
removes redundant spaces
The list of "surrounding" tags can be changed to include things like tr...
Steps 2 and 3 might come first to speed things up.
Tried it with:
var text = "<div> <i>ano</i> <b>ther</b> <span> <b>my</b> <i>ex</i> <u> ample </u> </span> </div>";
First answer, prior to comments.
The idea is to remove all spaces between:
a non-space character and an opening tag
a closing tag and a non-space character
text.replace(/([^\s])\s*(<)/g, "$1$2")
.replace(/([>])\s*([^\s])/g, "$1$2")
.trim();

Preamble: don't just copy this, read to the end.
Thinking from the other way around - by replacing until no match is found instead of until no change is made, this seems to work very simply.
var text = "<span><span>ex ample </span> </span>";
var trim_start = /^((<[^>]*>)*)\s+/;
while (text.match(trim_start)) {
    text = text.replace(trim_start, "$1");
}
var trim_end = /\s+((<[^>]*>)*)$/;
while (text.match(trim_end)) {
    text = text.replace(trim_end, "$1");
}
console.log(text);
The output is as expected - the only space is between ex ample
But this has a big problem if the replace might not change anything. Simply changing \s+ to \s* makes it turn into an infinite cycle. So, all in all, it works for my case but is not robust and to use it, you must be completely sure every single replace will change something when the regex matches.
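One way around the infinite-loop risk is to go back to the idea in the title and stop when a pass changes nothing. The following is a sketch under that assumption; replaceUntilStable and its maxPasses safety cap are hypothetical names, not an established API.

```javascript
// Hypothetical helper: repeat a replace until the string stops changing.
// maxPasses is a safety cap in case a replacement keeps growing the string.
function replaceUntilStable(input, pattern, replacement, maxPasses = 100) {
  let current = input;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = current.replace(pattern, replacement);
    if (next === current) {
      return current; // nothing changed, so further passes are pointless
    }
    current = next;
  }
  return current;
}

const trimStartPattern = /^((<[^>]*>)*)\s+/;
const trimEndPattern = /\s+((<[^>]*>)*)$/;

let trimmed = "<span><span>ex ample </span> </span>";
trimmed = replaceUntilStable(trimmed, trimStartPattern, "$1");
trimmed = replaceUntilStable(trimmed, trimEndPattern, "$1");
console.log(trimmed);

// The \s* variant still matches after the last useful pass, but the
// replacement leaves the string unchanged, so the loop ends.
const trimmedLoose = replaceUntilStable(
  "<span><span>ex ample </span> </span>",
  /\s*((<[^>]*>)*)$/,
  "$1"
);
```

Comparing the string before and after each pass means a pattern that matches without changing anything ends the loop instead of spinning forever.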

How to write regexp for finding :smile: in javascript?

I want to write a regular expression, in JavaScript, for finding the string starting and ending with :.
For example "hello :smile: :sleeping:" from this string I need to find the strings which are starting and ending with the : characters. I tried the expression below, but it didn't work:
^:.*\:$

My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g
function replaceEmojiTags(text) {
    return text.replace(emojiPattern, function (tag, emotion) {
        // The emotion will be the captured word between your tags,
        // so either "smile" or "sleeping" in your example
        //
        // In this function you would take that emotion and return
        // whatever you want based on the input parameter and the
        // whole tag would be replaced
        //
        // As an example, let's say you had a bunch of GIF images
        // for the different emotions:
        return '<img src="/img/emoji/' + emotion + '.gif" />';
    });
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to be changed since it is only looking for word characters. For this there is probably also a restriction such that the hyphen must join two words so that it shouldn't accept "-big-smile" or "big-smile-". For that you need to change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".
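The hyphen rule can be checked with a few sample tags. This is a small sketch; the sample string is made up for illustration.

```javascript
// Pattern from the answer: a word, then zero or more "-word" groups.
const hyphenEmojiPattern = /:(\w+(-\w+)*):/g;

// Two valid tags followed by two that break the hyphen rule.
const hyphenSample = ':smile: :big-smile: :-big-smile: :big-smile-:';

// m[1] is the outer capture group, i.e. the full emotion name.
const acceptedNames = Array.from(
  hyphenSample.matchAll(hyphenEmojiPattern),
  (m) => m[1]
);
console.log(acceptedNames);
```

Only the tags whose hyphens join two words are picked up; a leading or trailing hyphen prevents a match.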

The ^ and $ are anchors (start and end respectively). These cause your regex to explicitly match an entire string which starts with : has anything between it and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more so you'll be matching :: as well. It'll be better to change this to + which means one or more. In fact if you're just looking for text you may want to use a range [a-z0-9] with a case insensitive modifier.
If we put it all together we'll have regex like this /:([a-z0-9]+):/gmi
This matches a string beginning with :, followed by one or more alphanumeric characters and ending in :, with the modifiers g (global), m (multi-line) and i (case-insensitive) for things like :FacePalm:.
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.
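Note that match() with the g flag returns the whole tags, colons included. If only the names are needed, the capture group can be read with matchAll, as in this sketch (the sample sentence is extended with :FacePalm: for illustration).

```javascript
const taggedText = 'Hello :smile: and jolly :wave: :FacePalm:';
const tagPattern = /:([a-z0-9]+):/gi;

// match() with the g flag: full matches only, capture groups are discarded.
const fullTags = taggedText.match(tagPattern);

// matchAll() yields one match object per hit; index 1 is the capture group.
const tagNames = Array.from(taggedText.matchAll(tagPattern), (m) => m[1]);

console.log(fullTags);
console.log(tagNames);
```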

\s RegEx not capturing new line data

I am trying to clean up input and put it into a desired way. Basically, we have serialnumbers that are entered several different ways - enter delimited (newline), space, comma, etc.
My problem in my code below in testing is that new line delimited isn't working. According to w3schools and 2 other sites:
The \s metacharacter is used to find a whitespace character.
A whitespace character can be:
-A space character
-A tab character
-A carriage return character
-A new line character
-A vertical tab character
-A form feed character
This should mean that I can catch basically any new line. In Netsuite, the user is entering the value as:
SN1
SN2
SN3
I want this to change to "SN1,SN2,SN3,". Currently the \s RegEx is not picking up the newline? Any help would be appreciated.
**For the record - while I am using Netsuite (CRM) to get the input, the rest of this code is typical javascript and regex work. This is why I am using all 3 tags - netsuite, js, and regex
function fixSerailNumberString(s_serialNum){
    var cleanString = '';
    var regExSpace = new RegExp('\\s',"g");
    if(regExSpace.test(s_serialNum)){
        var a_splitSN = s_serialNum.split(regExSpace);
        for(var i = 0; i < a_splitSN.length;i++){
            if(a_splitSN[i].length!=0){
                cleanString = cleanString + a_splitSN[i]+((a_splitSN[i].split(',').length>1)?'':',');
            }
        }
        return cleanString;
    }
    else{
        alert("No cleaning needed");
        return s_serialNum;
    }
}
EDITS:
1-I need to handle both if it has spaces (such as "sn1, sn2, sn3" needs to become "sn1,sn2,sn3") and this newline issue. What I have above works for the spaces.
2- I am not sure if it matters, but the field is a textarea. Does that impact this?

@Cheery found why this was happening. As I said, I got the data from Netsuite and was using the API to get the data. In the UI of Netsuite this data did look like each value was on a new line; however, when doing a console.log the values were not.
Example:
UI displayed:
sn1
sn2
sn3
Console.log displayed:
sn1sn2sn3
I was assuming the UI translated into the actual value and didn't think to check what the string was.

NetSuite multi-select fields (like the Serial Numbers transaction column) usually return all selected values as a single string, as you've noted with "sn1sn2sn3"; however, each of these values is actually separated by a non-printing character \x05. Try .split(/\x05/).join(',')
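A sketch that combines this with the other delimiters from the question follows. It takes the \x05 separator on trust from the answer above rather than from NetSuite documentation, and normalizeSerialNumbers is a hypothetical helper name.

```javascript
// Hypothetical helper: split on whitespace, commas and the \x05 control
// character (the separator reported above, not verified here), drop empty
// pieces and join with commas.
function normalizeSerialNumbers(raw) {
  return raw
    .split(/[\s,\x05]+/)
    .filter((part) => part.length > 0)
    .join(',');
}

const fromApi = 'sn1\x05sn2\x05sn3';

// JSON.stringify escapes control characters, which makes a separator
// visible that a plain console.log would hide.
const visible = JSON.stringify(fromApi);
console.log(visible);

const fromMultiSelect = normalizeSerialNumbers(fromApi);
const fromTextarea = normalizeSerialNumbers('sn1, sn2,\nsn3 ');
console.log(fromMultiSelect, fromTextarea);
```

Logging the value through JSON.stringify is a cheap way to check what actually separates the entries before writing the split pattern.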

Replace words of text area

I have made a javascript function to replace some words with other words in a text area, but it doesn't work. I have made this:
function wordCheck() {
    var text = document.getElementById("eC").value;
    var newText = text.replace(/hello/g, '<b>hello</b>');
    document.getElementById("eC").innerText = newText;
}
When I alert the variable newText, the console says that the variable doesn't exist.
Can anyone help me?
Edit:
Now it replaces the words, but it inserts the literal text <b>hello</b>, whereas I want the word shown in bold. Is there a solution?

Update:
In response to your edit, about your wanting to see the word "hello" show up in bold. The short answer to that is: it can't be done. Not in a simple textarea, at least. You're probably looking for something more like an online WYSIWYG editor, or at least an RTE (Rich Text Editor). There are a couple of them out there, like tinyMCE, for example, which is a decent WYSIWYG editor. A list of RTEs and HTML editors can be found here.
First off: As others have already pointed out: a textarea element's contents is available through its value property, not the innerText. You get the contents alright, but you're trying to update it through the wrong property: use value in both cases.
If you want to replace all occurrences of a string/word/substring, you'll have to resort to using a regular expression, using the g modifier. I'd also recommend making the matching case-insensitive, to replace "hello", "Hello" and "HELLO" all the same:
var txtArea = document.querySelector('#eC');
txtArea.value = txtArea.value.replace(/(hello)/gi, '<b>$1</b>');
As you can see: I captured the match, and used it in the replacement string, to preserve the caps the user might have used.
But wait, there's more:
What if, for some reason, the input already contains <b>Hello</b>, or contains a word containing the string "hello", as in "The company is called hellonearth"? Enter conditional matches (aka lookaround assertions) and word boundaries:
txtArea.value = txtArea.value.replace(/(?<!>)\b(hello)\b(?!<)/gi, '<b>$1</b>');
How it works:
(?<!>): Only match the rest if it isn't preceded by a > char (be more specific, if you want to, and use (?<!<b>)). This is called a negative lookbehind. A negative look-ahead such as (?!>) would inspect the character that follows, which at this position is always the h of hello, so it would never reject anything.
\b: a word boundary, to make sure we're not matching part of a word
(hello): match and capture the string literal, provided (as explained above) it is not preceded by a > and there is a word boundary
(?!<): a negative look-ahead: only match if the word isn't followed by a < char, because we don't want to find a closing </b>; you can replace this with the more specific (?!<\/b>)
/gi: modifiers, or flags, that affect the entire pattern: g for global (meaning this pattern will be applied to the entire string, not just a single match). The i tells the regex engine the pattern is case-insensitive, ie: h matches both the upper and lowercase character.
The replacement string <b>$1</b>: when the replacement string contains $n substrings, where n is a number, they are treated as backreferences. A regex can group matches into various parts, each group has a number, starting with 1, depending on how many groups you have. We're only grouping one part of the pattern, but suppose we wrote:
'foobar hello foobar'.replace(/(hel)(lo)/g, '<b>$1-$2</b>');
The output would be "foobar <b>hel-lo</b> foobar", because we've split the match up into 2 parts, and added a dash in the replacement string.
I think I'll leave the introduction to RegExp at that... even though we've only scratched the surface, I think it's quite clear now just how powerful regex's can be. Put some time and effort into learning more about this fantastic tool, it is well worth it.
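The lookaround idea can be tried on plain strings without a textarea. Note the assumption in this sketch: the "not preceded by >" check is written as a negative lookbehind, (?<!>), which needs a JavaScript engine with lookbehind support; wrapHello is a hypothetical helper name.

```javascript
// Assumption: "not preceded by >" is expressed with a lookbehind, (?<!>).
// (?!<) is a lookahead that rejects a match directly followed by "<".
const helloPattern = /(?<!>)\b(hello)\b(?!<)/gi;

// Hypothetical helper that wraps standalone "hello" words in <b> tags.
function wrapHello(input) {
  return input.replace(helloPattern, '<b>$1</b>');
}

const wrappedPlain = wrapHello('Hello world, hello!');
const wrappedExisting = wrapHello('<b>Hello</b> hellonearth');
console.log(wrappedPlain);
console.log(wrappedExisting);
```

The first string gets both words wrapped with their original capitalisation, while the second is left alone: one hello is already inside <b> tags and the other is only part of a longer word.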

For a <textarea>, you need to use the .value property.
document.getElementById("eC").value = newText;
And, as Barmar mentioned, replace() with a string pattern replaces only the first occurrence. To replace every occurrence, you need to use a simple regex. Note that I removed the quotes; /g means global replace.
var newText = text.replace(/hello/g, '<b>hello</b>');
But if you really want bold text, you need to use a contenteditable div, not a textarea:
<div id="eC" contenteditable></div>
So then you need to access innerHTML:
function wordCheck() {
    var text = document.getElementById("eC").innerHTML;
    var newText = text.replace(/hello/g, '<b>hello</b>');
    newText = newText.replace(/<b><b>/g,"<b>");//These two lines are there to prevent <b><b>hello</b></b>
    newText = newText.replace(/<\/b><\/b>/g,"</b>");
    document.getElementById("eC").innerHTML = newText;
}
