Get the text next to # char in string with JavaScript - javascript

I'm implementing a mentions mechanism in a React based chat application. When the user types # in the textarea I am opening a list of members of the group and performing a search in that list using the text which comes after the # char. The code for search query extraction is as follows:
const regexp = /#(\S+)/g;
const text = regexp.exec(message);
let mentionText = '';
mentionText = text ? text[0] : '#';
This works only if there's a single # char in the string. For example, if an e-mail address appears in the message before the # char that opens the list of members, this will not work, because I'm taking the first match, which is the e-mail.
Is there a better, more elegant way to get the search term from the entered string? I basically need the text that comes after a # char that is not part of an e-mail, for example. To be more precise: the text after a # char that has a leading space. That way I can tell for sure that the user wants to write a mention. If there's a space with a # sign right after it, the user is writing a mention. The problem is how to identify that # with a leading space.
Thanks!

A regular expression with the global (g) flag stores the index of its last match in its lastIndex property, so each call to exec() continues from there and returns the next match. Just keep calling exec() in a while loop until it returns null.
const message = "#player with player#email went to #place."
const regexp = /#(\S+)/g;
const mentions = [];
let text;
while ((text = regexp.exec(message)) !== null) {
  // text[1] is the capture group: the text after the # char
  mentions.push(text[1]);
}
console.log("mentions", mentions)

At the start of the regular expression, alternate between a space and the start of the string. Then lose the global flag (you're only looking for one match, after all). Use optional chaining to keep things concise.
const mentionText = /(?: |^)#(\S+)/.exec(message)?.[1] ?? '#';
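As a quick check of the one-liner above, here is a minimal sketch that wraps it in a function and runs it against a message containing an e-mail-like token. The function name getMentionText and the sample messages are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper; the name getMentionText is an assumption for illustration.
function getMentionText(message) {
  // A mention is a # at the start of the string or right after a space.
  const match = /(?: |^)#(\S+)/.exec(message);
  // Fall back to '#' when nothing matches, as in the original code.
  return match?.[1] ?? '#';
}

console.log(getMentionText('mail player#example.com or ask #alice')); // "alice"
console.log(getMentionText('#bob are you there?'));                   // "bob"
console.log(getMentionText('no mention here'));                       // "#"
```

Note that without the global flag this returns the first mention in the message, which may not be the one currently being typed if the message already contains an earlier mention.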

Related

Regex-rule for matching single word during input (TipTap InputRule)

I'm currently experimenting with TipTap, an editor framework.
My goal is to build a Custom Node extension for TipTap that wraps a single word in <w>-Tags whenever a user is typing text. In TipTap I can write an InputRule with a regex for this purpose.
For example the rule /(?:^|\s)((?:~)((?:[^~]+))(?:~))$/ will match text between two tildes (~text~) and wrap it with <strike>-Tags.
Click here for my Codesandbox
I was trying for so long and can't figure it out. Here are the rules that I tried:
/**
* Regex that matches a word node during input
*/
// Will match words between two tilde characters; I'm using this expression from the documentation as my starting point.
//const inputRegex = /(?:^|\s)((?:~)((?:[^~]+))(?:~))$/
// Will match a word but will append the following text to that word without the space inbetween
//const inputRegex = /\b\w+\b\s$/
// Will match a word but will append the following text to previous word without the space inbetween; Will work with double spaces
//const inputRegex = /(?:^|\s\b)(?:[^\s])(\w+\b)(?:\s)$/
// Will match a word but will swallow every second character
//const inputRegex = /\b([^\s]+)\b$/g
// Will match every second word
//const inputRegex = /\b([^\s]+)\b\s(?:\s)$/
// Will match every word but swallow spaces; Will work if I insert double spaces
const inputRegex = /\b([^\s]+)(?:\b)\s$/
The problem here is the choice of delimiter, which is space.
This becomes clear when we see the code for markInputRule.ts (line 37 to be precise)
if (captureGroup) {
const startSpaces = fullMatch.search(/\S/)
const textStart = range.from + fullMatch.indexOf(captureGroup)
const textEnd = textStart + captureGroup.length
const excludedMarks = getMarksBetween(range.from, range.to, state.doc)
When we use '~' as the delimiter, the input rule places the start and end markers inside the delimiters and passes only the enclosed text to the extension tag (CustomItalic, in your case). You can test this by entering strike-through text enclosed in '~': the '~' characters are stripped and the text is put inside the strike-through tag.
This is exactly the cause of your double-space problem: when your match is a word plus a space, the space is treated as a delimiter and removed before the text is placed into the tag.
I have tried to work around this using negative look-ahead patterns, but the problem remains in the code of the file mentioned above.
What I would suggest is to copy the code in markInputRule.ts and write a custom InputRule that fits your requirements, which would be much easier than working with the built-in one. Hope this helps.
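To make the explanation above concrete, here is a small plain-JavaScript sketch (no TipTap involved) that mirrors the index arithmetic from the quoted snippet. It assumes that match.index can stand in for range.from, which is a simplification; the sample input is also an assumption.

```javascript
// The last rule from the question: a word followed by one whitespace character.
const inputRegex = /\b([^\s]+)(?:\b)\s$/;
const typed = 'hello world ';

const match = inputRegex.exec(typed);
const fullMatch = match[0];    // everything the rule matched, including the space
const captureGroup = match[1]; // only the word

// Same arithmetic as in the quoted markInputRule.ts excerpt,
// with match.index standing in for range.from (an assumption).
const textStart = match.index + fullMatch.indexOf(captureGroup);
const textEnd = textStart + captureGroup.length;

console.log(JSON.stringify(fullMatch));        // "world "
console.log(JSON.stringify(captureGroup));     // "world"
console.log(textStart, textEnd, typed.length); // 6 11 12
```

The matched range ends at 12 while the captured text ends at 11, so the trailing space sits outside the capture group. If the rule treats everything outside the capture group as a delimiter, as described above, that space would be the part that gets dropped.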
I assume the problem lies within the "space". Depending on the browser, the final "space" is either not represented at all in the underlying HTML (Firefox) or replaced with &nbsp; (e.g. Chrome).
I suggest you replace the \s with (\s|\&nbsp;) in your regex.

Temporarily remove URL from string

I've created a Twitter bot that copies the tweets of a certain user and then uwu-fies them, meaning it just changes some characters to make them funny; Elon becomes Ewon, for example. Now of course it's very debatable how funny this actually is, but I think that's beside the point for now.
If I get a tweet with a URL, of course the URL can't be uwu-fied, since it would become invalid. The way I've solved this right now is: search for the URL using a regex, replace it with a performance.now() value (I used to use a UUID v4, but that also contains characters that would get uwu-fied) and save an object with the URL and the performance.now() value that was used.
Then, when the uwu-fication is done, I can reconstruct it using the saved object. This does work, but it feels like a bodged solution. The only other solution I could think of is generating a UUID that only contains characters that won't get uwu-fied?
EDIT:
Based on the currently accepted answer, I've solved the problem by transforming my code into this:
// Split the sentence into words
const words = sentence.split(` `);
// No global flag: a /g regex keeps lastIndex between test() calls and can skip URLs
const pattern = /(?:https?|ftp):\/\/[\n\S]+/;
// If the word is a URL, keep it as-is; otherwise uwufy it
const uwufied = words
  .map(word => (pattern.test(word) ? word : uwufyWord(word)))
  .join(` `);
You can split the tweet into an array with .split(" ") and then loop over that array, handling the tweet word by word. At the start of each iteration, check that the "word" is not a URL, then apply your replacements.
let tweet = "Hello World. What's up?"
let arr = tweet.split(" ")
let output = ""
for (const word of arr) {
// Check that it's not an URL here
// Replace here
output += word + " "
}
// Use output here
console.log(output)
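The loop above leaves the URL check and the replacement as comments. A minimal end-to-end sketch could look like the following; the uwufyWord shown here is only a stand-in (it swaps l and r for w) and the URL pattern is an assumption, so substitute your real transformation and pattern.

```javascript
// Stand-in for the real transformation (assumption): l/r become w.
function uwufyWord(word) {
  return word.replace(/[lr]/g, 'w').replace(/[LR]/g, 'W');
}

// A word counts as a URL if it starts with a known scheme (assumed pattern).
// No global flag, so test() carries no state between calls.
const urlPattern = /^(?:https?|ftp):\/\/\S+$/;

function uwufySentence(sentence) {
  return sentence
    .split(' ')
    .map(word => (urlPattern.test(word) ? word : uwufyWord(word)))
    .join(' ');
}

console.log(uwufySentence('Elon really likes https://example.com/rockets lately'));
// "Ewon weawwy wikes https://example.com/rockets watewy"
```

Splitting on a single space keeps the sketch short; a URL followed directly by punctuation or a line break would need a more careful tokenizer.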

Regex to match #word [duplicate]

I am writing an application in Node.js that allows users to mention each other in messages, like on Twitter. I want to be able to find the mentioned user and send them a notification. How can I pull #usernames out of a string to find mentions in Node.js?
Any advice, regex, problems?
I have found that this is the best way to find mentions inside of a string in javascript.
var str = "#jpotts18 what is up man? Are you hanging out with #kyle_clegg";
var pattern = /\B#[a-z0-9_-]+/gi;
str.match(pattern);
// => ["#jpotts18", "#kyle_clegg"]
I have purposefully restricted it to upper- and lowercase alphanumeric characters plus the - and _ symbols, so that periods are not picked up as part of a username, as in (#j.potts).
This is what twitter-text.js is doing behind the scenes.
// Mention related regex collection
twttr.txt.regexen.validMentionPrecedingChars = /(?:^|[^a-zA-Z0-9_!#$%&*#@]|RT:?)/;
twttr.txt.regexen.atSigns = /[#@]/;
twttr.txt.regexen.validMentionOrList = regexSupplant(
'(#{validMentionPrecedingChars})' + // $1: Preceding character
'(#{atSigns})' + // $2: At mark
'([a-zA-Z0-9_]{1,20})' + // $3: Screen name
'(\/[a-zA-Z][a-zA-Z0-9_\-]{0,24})?' // $4: List (optional)
, 'g');
twttr.txt.regexen.endMentionMatch = regexSupplant(/^(?:#{atSigns}|[#{latinAccentChars}]|:\/\/)/);
Please let me know if you have used anything that is more efficient, or accurate. Thanks!
Twitter has a library that you should be able to use for this. https://github.com/twitter/twitter-text-js.
I haven't used it, but if you trust its description, "the library provides autolinking and extraction for URLs, usernames, lists, and hashtags". You should be able to use it in Node with npm install twitter-text.
While I understand that you're not looking for Twitter usernames, the same logic still applies and you should be able to use it fine (it does not validate that extracted usernames are valid twitter usernames). If not, forking it for your own purposes may be a very good place to start.
Edit: I looked at the docs closer, and there is a perfect example of what you need right here.
var usernames = twttr.txt.extractMentions("Mentioning #twitter and #jack")
// usernames == ["twitter", "jack"]
Here is how you can extract mentions from an Instagram caption with JavaScript and Underscore.
var _ = require('underscore');
function parseMentions(text) {
var mentionsRegex = new RegExp('#([a-zA-Z0-9\_\.]+)', 'gim');
var matches = text.match(mentionsRegex);
if (matches && matches.length) {
matches = matches.map(function(match) {
return match.slice(1);
});
return _.uniq(matches);
} else {
return [];
}
}
I would respect names with diacritics, or characters from any language, by using \p{L}.
/(?<=^| )#\p{L}+/gu
Example on Regex101.com with description.
PS:
Don't use \B since it will match ##wrong.
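The difference between the two patterns can be seen in a short comparison. This is only a sketch with a made-up sample string; note that lookbehind assertions and \p{L} require an engine with ES2018 support.

```javascript
const sample = 'hi #josé, see ##wrong and mail#host';

// Lookbehind: the # must be at the start of the string or right after a space.
const withLookbehind = sample.match(/(?<=^| )#\p{L}+/gu);

// \B variant from the earlier answer: ASCII-only character class.
const withNonBoundary = sample.match(/\B#[a-z0-9_-]+/gi);

console.log(withLookbehind);  // [ '#josé' ]
console.log(withNonBoundary); // [ '#jos', '#wrong' ]
```

The \B version cuts the name at the accented letter and also picks up the second # of ##wrong, while both versions skip mail#host.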

HipChat Bot JS REGEX

I'm completely useless with regular expressions and need help with this one. I'm currently creating a HipChat bot for my work which will create JIRA tickets from HipChat. HipChat bots have the ability to monitor a chat room for keywords, then run JavaScript if the keyword has been used.
In my case, I would like the Bot to monitor for -
"/ask ********************************"
Where * = text of the JIRA issue body of unlimited length
So for this, I would need a regex to hook on to, and also a regex to move the description text into a variable. Would anyone here be able to assist?
If I haven't explained myself well, below is an example of how "Karma" works (addon.webhook). Thanks!
https://bitbucket.org/atlassianlabs/ac-koa-hipchat-karma/src/cca57e089a2f630d924cd5df23211f0da3617063/web.js?at=master
You could simply use this regex:
^\/ask (.*)$
It will capture the text after /ask. Note the escaped forward slash. Since you're using JavaScript, the actual expression is written with delimiters, e.g.:
/^\/ask (.*)$/
Here's a regex101 to play with: https://regex101.com/r/tZ0iB6/1
It's quite a simple regex:
var matches = /^\/ask\s+(.*)$/i.exec(str);
if (matches && matches.length > 0) {
var content = matches[1];
}
If you want to match multiple different commands, you can use something like this:
var matches = /^\/([^\s]+)\s+(.*)$/.exec(str);
if (matches && matches.length > 0) {
var command = matches[1];
var content = matches[2];
switch(command.toLowerCase()) {
case 'ask':
// do something
break;
default:
// command not found
break;
}
}
Edit:
Fixed the 'a' variable
the variable str has to be the input string.
var str = '/ask *****';
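Putting the two snippets above together, a small parser might look like this. It is a sketch only: the function name parseCommand and the returned object shape are assumptions, and nothing here touches the HipChat API.

```javascript
// Hypothetical helper: splits "/command rest of message" into its parts.
function parseCommand(message) {
  const match = /^\/(\S+)\s+(.*)$/.exec(message);
  if (!match) {
    return null; // not a command
  }
  return { command: match[1].toLowerCase(), content: match[2] };
}

console.log(parseCommand('/ask Printer on floor 3 is broken'));
// { command: 'ask', content: 'Printer on floor 3 is broken' }
console.log(parseCommand('/ASK anything')); // { command: 'ask', content: 'anything' }
console.log(parseCommand('just chatting')); // null
```

Because . does not match line breaks, a multi-line message would return null here; swapping (.*) for ([\s\S]*) is one way to allow a multi-line issue body.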

jQuery, remove certain chunks of text

Is it possible, using jQuery (Or anything else), to remove certain bits of text from an element but leave the rest intact?
I'm using a Wordpress plugin that compiles all my tweets into WP posts but the post titles are automatically saved using the full text body of the tweet. For example;
#username http://t.co/XXXXXXXX #hashtag
I want to be able to remove the hyperlink and also the #hashtag
The hashtag will always be the same (Ie; it will always be #hashtag), but the hyperlink will change with every post. Is there a way that I can do this?
Things could be harder or easier depending on whether the username, link and hashtag are always in the same position of the tweet or not. But I might suggest splitting the tweet string on ' ' and looping to construct a string without words that begin with '#', 'http://t.co', and '#'.
It would be easier with a full example because you may not want to remove all the handles, usernames, and hashtags. But I do suspect there is some uniformity in the format you may want to exploit.
Example:
var words = $(el).text().split(' ');
var cleaned = [];
for (var i = 0, len = words.length; i < len; i++) {
if (words[i].indexOf('#') !== 0) { // keep words that do not start with '#'; add similar checks for other prefixes
cleaned.push(words[i]);
}
}
var cleaned_string = cleaned.join(' ');
You can use regular expressions to remove the URL and the hashtags, for example:
To remove the http part:
fullTweet.replace(/http:\/\/.+?\s/g, "");
The regular expression means http:// followed by one or more characters up to the first space (the quantifier is lazy, i.e. +?, so it stops at the first space).
To remove the hashtag
fullTweet.replace(/#.+?(\s|$)/g, "");
Here it's a # followed by any character until a space or end of string
Here's a complete example.
Good reference for javascript regular expressions.
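As a combined sketch of the two replacements, the function below strips the link and the fixed hashtag from a title. It swaps in \S+ for the link (so a URL at the very end of the string is removed too) and matches the literal #hashtag from the question; the function name cleanTitle and the sample title are assumptions.

```javascript
// Hypothetical helper: removes the link and the fixed hashtag from a post title.
function cleanTitle(fullTweet) {
  return fullTweet
    .replace(/https?:\/\/\S+/g, '') // the link, up to the next whitespace or end of string
    .replace(/#hashtag\b/g, '')     // the hashtag, which is always the same
    .replace(/\s+/g, ' ')           // collapse the gaps left behind
    .trim();
}

console.log(cleanTitle('Great read http://t.co/XXXXXXXX #hashtag')); // "Great read"
```

With jQuery, the result could then be written back with something like $(el).text(cleanTitle($(el).text())), where el is whichever element holds the title.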
Jquery Link:
var test = "#username http://t.co/XXXXXXXX #hashtag";
var matchindex = test.indexOf("#");
alert(matchindex); // Tells you at what index the hashtag starts
var res = test.split("#");
alert(res[0]); // Gives you the part of the string before the hashtag
You can do something like this to get the string without the hashtag. You'll have to get the whole text as a string into a variable first.
