How to remove all of string up to and including hyphen - javascript

I am using javascript in a Mirth transformer. I apologize for my ignorance but I have no javascript training and have a hard time successfully utilizing info from similar threads here or elsewhere. I am trying to trim a string from 'Room-Bed' to be just 'Bed'. The "Room" and "Bed" values will fluctuate. All associated data is coming in from an ADT interface where our client is sending both the room and bed values, separated by a hyphen, in the bed field creating unnecessary redundancy. Please help me with the code needed to produce the result of 'Bed' from the received 'Room-Bed'.

There are many ways to reduce the string you have to the string you want. Which you choose will depend on your inputs and the output you want. For your simple example, all will work. But if you have strings come in with multiple hyphens, they'll render different results. They'll also have different performance characteristics. Balance the performance of it with how often it will be called, and whichever you find to be most readable.
// Turns the string in to an array, then pops the last instance out: 'Bed'!
'Room-Bed'.split('-').pop() === 'Bed'
// Looks for the given pattern,
// replacing the match with everything after the hyphen.
'Room-Bed'.replace(/.+-(.+)/, '$1') === 'Bed'
// Finds the first index of -,
// and creates a substring based on it.
'Room-Bed'.substr('Room-Bed'.indexOf('-') + 1) === 'Bed'

Related

How to select and replace certain words in input strings regardless of the full string. (More info for clarification below)

I am making a filter for a chat room I own.
I was succesful in having it turn NSFW words into a bunch of symbols and astericks to censor it, but many people bypass it by simply putting a backslash, period, or other symbol/letter after it because I only put in the words without the punctation and symbols. They also come up with a bit more creative methods such as eeeNSFWeee so the filter doesn't count it as a word.
Is there a way to make it so that the filter will select certain characters that form a word in a string and replace them (with or without replacing the extra characters connected to the message)?
The Filter is made in javascript and Socket.io
Filter code:
const array = [
"NSFW",
"Bad Word"
"Innapropiate Word"
];
message = message
.split(" ")
.map((word) => (array.includes(word.toLowerCase()) ? "$#!%" : word))
.join(" ");
For an example if somebody typed "Bad Word" exactly like that (caps are not a problem), it would censor it succesfully.
But if somebody typed "Bad Word." that would be a problem because since it has a period it would count it as a different word, thats what I need fixed.
There are a number of approaches you could take here.
You could use replace() if you just want to remove symbols. For example:
word.replace(/[&\/\\#,+()$~%.`'"!;\^:*?<>{}_\[\]]/g, '')
You could use Regular Expressions in general, which allows you to match on patterns instead of exact string matching.
You could also use more complex fuzzy matching libraries or custom fuzzy matching to accomplish your goal. This post may be helpful.

How to find key words in paragraph of text?

I'm trying to find a fast(milliseconds or seconds) solution for having an inputted block of text and a large list(11 million) of specific words/phrases to test against. So I would like to see what words/phrases are in the inputted paragraph?
We use Javascript and have SQL, MongoDB & DynamoDB as existing data stores that we can integrate this solution into.
I've done searching on this problem but can only find checking if words exist in text. not the other way around.
All ideas are welcome!
In cases like these you want to eliminate as much unnecessary data as possible. Assuming that order matters:
First things first, make sure you have a B tree index built on your phrases database clustered on the phrase. This will speed up range lookup times.
Let n = 2 (or 1, if you're into that)
Split the text block into phrases of length n and perform a query for phrases in the dictionary that begin with any of the phrase pairs ('My Phrase%'). This won't perform 4521 million string comparisons thanks to the index.
Remember the phrases that are an exact match
Let n = n + 1
Repeat from step 3 using the reduced dictionary, until the reduced dictionary is empty
You can also make small optimizations here and there depending on the kind of matches you're looking for, such as, not matching across punctuation, only phrases of a certain word length, etc. In any case, I'd expect the time bottleneck here to be on disk access, rather than actual comparisons.
Also, I'm pretty sure I based this algorithm off of an existing one but I don't remember its name so bonus points to anyone who can name it. I think it had something to do with data warehousing/mining and calculating frequencies and patterns?

Javascript - how to use regex process the following complicated string

I have the following string that will occur repeatedly in a larger string:
[SM_g]word[SM_h].[SM_l] "
Notice in this string after the phrase "[SM_g]word[Sm_h]" there are three components:
A period (.) This could also be a comma (,)
[SM_l]
"
Zero to all three of these components will always appear after "[SM_g]word[SM_h]". However, they can also appear in any order after "[SM_g]word[SM_h]". For example, the string could also be:
[SM_g]word[SM_h][SM_l]"
or
[SM_g]word[SM_h]"[SM_l].
or
[SM_g]word[SM_h]".
or
[SM_g]word[SM_h][SM_1].
or
[SM_g]word[SM_h].
or simply just
[SM_g]word[SM_h]
These are just some of the examples. The point is that there are three different components (more if you consider the period can also be a comma) that can appear after "[SM_h]word[SM_g]" where these three components can be in any order and sometimes one, two, or all three of the components will be missing.
Not only that, sometimes there will be up to one space before " and the previous component/[SM_g]word[SM_h].
For example:
[SM_g]word[SM_h] ".
or
[SM_g]word[SM_h][SM_l] ".
etc. etc.
I am trying to process this string by moving each of the three components inside of the core string (and preserving the space, in case there is a space before &\quot; and the previous component/[SM_g]word[SM_h]).
For example, [SM_g]word[SM_h].[SM_l]" would turn into
[SM_g]word.[SM_l]"[SM_h]
or
[SM_g]word[SM_h]"[SM_l]. would turn into
[SM_g]word"[SM_l].[SM_h]
or, to simulate having a space before "
[SM_g]word[SM_h] ".
would turn into
[SM_g]word ".[SM_h]
and so on.
I've tried several combinations of regex expressions, and none of them have worked.
Does anyone have advice?
You need to put each component within an alternation in a grouping construct with maximum match try of 3 if it is necessary:
\[SM_g]word(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})
You may replace word with .*? if it is not a constant or specific keyword.
Then in replacement string you should do:
$1$3$2
var re = /(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})/g;
var str = `[SM_g]word[SM_h][SM_l] ".`;
console.log(str.replace(re, `$1$3$2`));
This seems applicable for your process, in other word, changing sub-string position.
(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?
Demo,,, in which all sub-strings are captured to each capture group respectively for your post processing.
[SM_g] is captured to group1, word to group2, [SM_h] to group3, and string of all trailing part is to group4, [,\.] to group5, [SM_l] to group6, " ?&\\?quot;" to group7.
Thus, group1~3 are core part, group4 is trailing part for checking if trailing part exists, and group5~7 are sub-parts of group4 for your post processing.
Therefore, you can get easily matched string's position changed output string in the order of what you want by replacing with captured groups like follows.
\1\2\7\3 or $1$2$7$3 etc..
For replacing in Javascript, please refer to this post. JS Regex, how to replace the captured groups only?
But above regex is not sufficiently precise because it may allow any repeatitions of the sub-part of the trailing string, for example, \1\2\3\5\5\5\5 or \1\2\3\6\7\7\7\7\5\5\5, etc..
To avoid this situation, it needs to adopt condition which accepts only the possible combinations of the sub-parts of the trailing string. Please refer to this example. https://regex101.com/r/6aM4Pv/1/ for the possible combinations in the order.
But if the regex adopts the condition of allowing only possible combinations, the regex will be more complicated so I leave the above simplified regex to help you understand about it. Thank you:-)

How to split a string with space delimiter in google apps script

I want to get the first space delimited string out of a longer string with multiple words in it. Lots of examples, all of which do something like .split(" "). Google Apps script tells me that split can't take null as a parameter. I tried setting a variable to " ", even tried String.fromCharCode(32), but GAS tells me that's null.
Search, indexOf have the same issue -- GAS tells me it can't do those with a null argument. How do I tell GAS that a single space is not a null?
Yes, this was a case of the string being null in some cases. I was using the file method getDescription. After using Logger a bunch, I found the problem and I solved those some cases by this check.
var desctext = file.getDescription();
if ( desctext == null) { desctext = " ; ; ; "};
It's been many years since I've done anything with REGEX, and is was confusing then. Any suggestions on a good site for learning it?
I also changed my delimiter to a semicolon, even though it makes the user input more tedious.
The background is that I'm writing a script to work with a photo inventory where I want to keep the important information in the photo's EXIF. Not sure if Google is actually storing it there, but in any case the only fields available aside from the date and filename is the description. So I'm having the users type in coded filenames and descriptions with semicolon delimiters to indicate things like category, item name, status and value.

Node.js Emoji Parsing

I'm trying to parse an incoming string to determine whether it contains any non-emojis.
I've gone through this great article by Mathias and am leveraging both native punycode for the encoding / decoding and regenerate for the regex generation. I'm also using EmojiData to get my dictionary of emojis.
With that all said, certain emojis continue to be pesky little buggers and refuse to match. For certain emoji, I continue to get a pair of code points.
// Example of a single code point:
console.log(punycode.ucs2.decode('💩'));
>> [ 128169 ]
// Example of a paired code point:
console.log(punycode.ucs2.decode('⌛️'));
>> [ 8987, 65039 ]
Mathias touches on this in his article (and gives an example of punycode working around this) but even using his example I get an incorrect response:
function countSymbols(string) {
return punycode.ucs2.decode(string).length;
}
console.log(countSymbols('💩'));
>> 1
console.log(countSymbols('⌛️'));
>> 2
What is the best way to detect whether a string contains all emojis or not? This is for a proof of concept so the solution can be as brute force as need be.
--- UPDATE ---
A little more context on my pesky emoji above.
These are visually identical but in fact different unicode values (the second one is from the example above):
⌛ // \u231b
⌛️ // \u231b\ufe0f
The first one works great, the second does not. Unfortunately, the second version is what iOS seems to use (if you copy and paste from iMessage you get the second one, and when receiving a text from Twilio, same thing).
The U+FE0F is not a combining mark, it's a variation sequence that controls the rendering of the glyph (see this answer). Removing such sequences may change the appearance of the character, for example: U+231B+U+FE0E (⌛︎).
Also, emoji sequences can be made from multiple code points. For example, U+0032 (2) is not an emoji by itself, but U+0032+U+20E3 (2⃣) or U+0032+U+20E3+U+FE0F (2⃣️) is—but U+0041+U+20E3 (A⃣) isn't. A complete list of emoji sequences are maintained in the emoji-data.txt file by the Unicode Consortium (the emoji-data-js library appears to have this information).
To check if a string contains emoji characters, you will need to test if any single character is in emoji-data.txt, or starts a substring for a sequence in it.
If, hypothetically, you know what non-emoji characters you expect to run into, you can use a little lodash magic via their toArray or split modules, which are emoji aware. For example, if you want to see if a string contains alphanumeric characters, you could write a function like so:
function containsAlphaNumeric(string){
return _(string).toArray().filter(function(char){
return char.match(/[a-zA-Z0-9]/);
}).value().length > 0 ? true : false;
}

Categories