How to match a verb in any tense in Compromise.js

The rather excellent compromise.js offers, among other things, a match function.
I'm struggling to get it to work on variants of a verb:
var nlp = require('compromise');
var sentences = [
'I am discharging you',
'I have discharged you',
'I will discharge him',
'I discharged you',
'Monkey'
];
let doc = nlp(sentences.join('. '));
console.log(doc.match('discharge').sentences().out('text'));
/* Output:
discharge
*/
The above matches only 1 sentence out of the expected 4.
How can I get it to match all 4 sentences shown above that contain a conjugation of the word 'discharge'?
Running the following does correctly find the conjugations of the verb 'discharge':
doc.verbs().conjugate()
/* Output:
[ { PastTense: 'discharged',
PresentTense: 'discharges',
Infinitive: 'discharge',
Gerund: 'discharging',
Actor: 'discharger',
FutureTense: 'will discharge' },
{ PastTense: 'had',
PresentTense: 'has',
Infinitive: 'have',
Gerund: 'having',
Actor: 'haver',
Participle: 'had',
FutureTense: 'will have' },
{ PastTense: 'discharged',
PresentTense: 'discharges',
Infinitive: 'discharge',
Gerund: 'discharging',
Actor: 'discharger',
FutureTense: 'will discharge' },
{ PastTense: 'discharged',
PresentTense: 'discharges',
Infinitive: 'discharge',
Gerund: 'discharging',
Actor: 'discharger',
FutureTense: 'will discharge' } ]
*/

The goal of .match() is to provide a quick way to describe any
grammatical pattern, or match condition, using a human-readable, and
mostly-reasonable style. Ref
You can use a regex pattern in match, and you don't need .sentences():
// nlp is the global exposed by the compromise script tag at the end of this snippet
var sentences = ['I am discharging you','I have discharged you','I will discharge him','I discharged you','Monkey'];
let doc = nlp(sentences.join('. '));
console.log(doc.match('/discharg(ing|e|ed)/').out('text'));
// to capture all verbs
console.log(doc.match('#Verb').out('array'));
<script src="https://unpkg.com/compromise@latest/builds/compromise.min.js"></script>

Early versions of compromise tried to store a 'root' conjugation for every verb for this purpose, but it became too slow on large texts.
Perhaps the best way to do this is to conjugate the terms in the document to a known tense, then look for that form:
let doc = nlp('i discharged and was discharging')
doc.verbs().toInfinitive()
doc.match('discharge').length
// 2
https://runkit.com/spencermountain/5d080c35d95eb800198fcc78
cheers

Related

Partial String Match - Dynamic Strings

For certain dynamic strings like:
covid-19 testing status upto may 05,2021
covid-19 testing status upto may 04,2021
covid-19 testing status upto may 01,2021
....
covid-19 testing status upto {{date}}
and others like:
Jack and Jones are friends
Jack and JC are friends
Jack and Irani are friends
.....
Jack and {{friend-name}} are friends
I want to match the incoming string like:
covid-19 testing status upto may 01,2021
with
covid-19 testing status upto {{date}}
and if there is a match, I want to extract the value of date.
Similarly, for an incoming string like
Jack and JC are friends
I want to match with
Jack and {{friend-name}} are friends
and extract JC or the friend-name. How could I do this?
I am trying to create a setup where dynamic strings like these can be merged into one pattern. There could be thousands of incoming strings that I want to match against the existing patterns.
INCOMING_STRINGS -------EXISTING-PATTERNS----->
[
covid-19 testing status upto {{date}},
Jack and {{friend-name}} are friends,
....
] ---> FIND THE PATTERN AND EXTRACT THE DYNAMIC VALUE
EDIT
It is not guaranteed that the pattern will always exist in the incoming strings.

It's very easy to use regex for the second of your examples. If you use a capturing group for the "friend name" part you can extract that with ease:
const re = /Jack and ([a-zA-Z]+) are friends/
const inputs = ["Jack and Jones are friends",
"Jack and JC are friends",
"Jack and Irani are friends",
"Bob and John are friends"] // last one won't match
for(let i=0;i<inputs.length;i++){
const match = inputs[i].match(re);
if(match)
console.log("friend=",match[1]);
else
console.log("No match for the string:", inputs[i])
}
The first example is slightly harder, but only because the regex is more difficult to write. Assuming the format is always "short month name, space, two-digit day, comma, four-digit year", it is doable:
const re = /covid-19 testing status upto ((jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec) (0?[1-9]|[12][0-9]|3[01]),\d{4})/
const inputs = ["covid-19 testing status upto may 05,2021",
"covid-19 testing status upto may 04,2021",
"covid-19 testing status upto may 01,2021",
"covid-19 testing status upto 01/01/2020"] // wrong date format
for(let i=0;i<inputs.length;i++){
const match = inputs[i].match(re);
if(match)
console.log("date=",match[1]);
else
console.log("No match for the string:", inputs[i])
}
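If the date is needed as numbers rather than as text, the same pattern can be written with named capture groups. This is only a sketch: MONTHS, statusDateRe and parseStatusDate are names made up for this illustration, and it assumes the same "month day,year" format as above, matched case-insensitively against the whole string.

```javascript
// Hypothetical helper names; the date format is assumed to be "mon DD,YYYY".
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Named groups make the three parts of the date available by name.
const statusDateRe = new RegExp(
  '^covid-19 testing status upto ' +
  `(?<month>${MONTHS.join('|')}) ` +
  '(?<day>0?[1-9]|[12][0-9]|3[01]),' +
  '(?<year>\\d{4})$',
  'i'
);

// Returns {year, month, day} as numbers, or null when the input does not match.
const parseStatusDate = (input) => {
  const match = input.match(statusDateRe);
  if (!match) return null;
  const { month, day, year } = match.groups;
  return {
    year: Number(year),
    month: MONTHS.indexOf(month.toLowerCase()) + 1,
    day: Number(day),
  };
};

console.log(parseStatusDate('covid-19 testing status upto may 05,2021'));
console.log(parseStatusDate('covid-19 testing status upto 01/01/2020'));
```

The second call returns null because the date is not in the assumed format.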

It's fairly unclear to me what your actual inputs and outputs should be. Here's an attempt that guesses at that. With inputs like
[{
sample: 'covid-19 testing status upto may 05,2021',
extract: 'may 05,2021',
propName: 'date'
}, {
sample: 'Jack and Jones are friends',
extract: 'Jones',
propName: 'friend-name'
}]
we generate a function which can be used like this:
mySubs ('Jack and William are friends')
//=> {"friend-name": "William"}
or
mySubs ('covid-19 testing status upto apr 30,2021')
//=> {"date": "apr 30,2021"}
or
mySubs ('Jack and Jessica are friends who discussed covid-19 testing status upto apr 27,2021')
//=> {"date": "apr 27,2021", "friend-name": "Jessica"}
and which would yield an empty object if nothing matched.
We do this by dynamically generating regular expressions for our samples, ones which will capture the substitutions made:
const regEscape = (s) =>
s .replace (/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
const makeTester = ({sample, extract, propName}) => ({
regex: new RegExp (
regEscape (sample .slice (0, sample .indexOf (extract))) +
'(.+)' +
regEscape (sample .slice (sample .indexOf (extract) + extract .length))
),
propName
})
const substitutes = (configs, testers = configs .map (makeTester)) => (sentence) =>
Object.assign ({}, ...testers .flatMap (({regex, propName}) => { // {} target keeps this safe when configs is empty
const match = sentence .match (regex)
return (match)
? {[propName]: match[1]}
: {}
}))
const configs = [{
sample: 'covid-19 testing status upto may 05,2021',
extract: 'may 05,2021',
propName: 'date'
}, {
sample: 'Jack and Jones are friends',
extract: 'Jones',
propName: 'friend-name'
}]
const mySubs = substitutes (configs)
console .log (mySubs ('Jack and William are friends'))
console .log (mySubs ('covid-19 testing status upto apr 30,2021'))
console .log (mySubs ('Jack and Jessica are friends who discussed covid-19 testing status upto apr 27,2021'))
console .log (mySubs ('Some random string that does not match'))
If you needed to also report what templates matched, you could add a name to each template, and then carry the results through the two main functions to give results like this:
{"covid": {"date": "apr 27,2021"}, "friends": {"friend-name": "Jessica"}}
It's only slightly more complex:
const regEscape = (s) =>
s .replace (/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
const makeTester = ({name, sample, extract, propName}) => ({
regex: new RegExp (
regEscape (sample .slice (0, sample .indexOf (extract))) +
'(.+)' +
regEscape (sample .slice (sample .indexOf (extract) + extract .length))
),
propName,
name
})
const substitutes = (configs, testers = configs.map(makeTester)) => (sentence) =>
Object.assign ({}, ...testers .flatMap (({name, regex, propName}) => { // {} target keeps this safe when configs is empty
const match = sentence .match (regex)
return (match)
? {[name]: {[propName]: match[1]}}
: {}
}))
const configs = [{
name: 'covid',
sample: 'covid-19 testing status upto may 05,2021',
extract: 'may 05,2021',
propName: 'date'
}, {
name: 'friends',
sample: 'Jack and Jones are friends',
extract: 'Jones',
propName: 'friend-name'
}]
const mySubs = substitutes (configs)
console .log (mySubs ('Jack and William are friends'))
console .log (mySubs ('covid-19 testing status upto apr 30,2021'))
console .log (mySubs ('Jack and Jessica are friends who discussed covid-19 testing status upto apr 27,2021'))
console .log (mySubs ('Some random string that does not match'))
Either way, this has some limitations. It's hard to figure out what to do if the templates overlap in odd ways, etc. It's also possible that you want to match only complete sentences, and my examples of double-matching won't make sense. If so, you can just prepend a '^' and append a '$' to the string passed to new RegExp.
Again, this is a guess at your requirements. The important thing here is that you might be able to dynamically generate regexes to use.
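If the existing patterns are already stored in the {{placeholder}} form shown in the question, the regexes could also be generated directly from those template strings. The following is a rough sketch under that assumption; escapeRegExp, compileTemplate and makeExtractor are hypothetical names, each regex is anchored with '^' and '$' so only complete strings match, and the first matching template wins.

```javascript
// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Turn 'Jack and {{friend-name}} are friends' into an anchored regex plus
// the ordered list of placeholder names found in the template.
const compileTemplate = (template) => {
  const names = [];
  const source = template
    .split(/\{\{(.+?)\}\}/) // odd indexes hold the placeholder names
    .map((part, i) => {
      if (i % 2 === 0) return escapeRegExp(part);
      names.push(part);
      return '(.+?)';
    })
    .join('');
  return { template, names, regex: new RegExp(`^${source}$`) };
};

// Compile all templates once, then reuse them for every incoming string.
const makeExtractor = (templates) => {
  const compiled = templates.map(compileTemplate);
  return (input) => {
    for (const { template, names, regex } of compiled) {
      const match = input.match(regex);
      if (match) {
        const values = {};
        names.forEach((name, i) => { values[name] = match[i + 1]; });
        return { template, values };
      }
    }
    return null; // no template matched
  };
};

const extract = makeExtractor([
  'covid-19 testing status upto {{date}}',
  'Jack and {{friend-name}} are friends',
]);

console.log(extract('covid-19 testing status upto may 01,2021'));
console.log(extract('Jack and JC are friends'));
console.log(extract('Some random string that does not match'));
```

Compiling once up front matters if thousands of incoming strings are checked against the same templates.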

splitting an string list with proper aligning the string elements

Here is an example of the string:
Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew
I want to split it so that Company 1 is treated as one company and company, Inc. as another. The problem is that with the logic below, company, Inc. is split into two companies. How can I resolve this? Strings like company, Inc. should be kept as a single element.
const companies = company.split(",");
The string can be anything; this is just an example, and the names can be arbitrary. So I am looking for generic logic that works for any string with the same structure.
Note: in the examples below, $ stands in for the separating comma, to make clear where the string should be split.
Object:
Example 1
{
_id: 5de4debcccea611e4d14d4d5
companies: One Bros. Inc. & Might Bros. Dist. Corp.$Pages, Inc.$Google Inc. Search$Aphabet Inc. tech.
}
Example 2
{
_id: 5de4debccc333611e4d14d4f5
companies: Google Comp. Inc.$Google Comp. Inc. Estd.$Tree, Ltd.$Tree, Ltd.
}

First I split on 'ompany' rather than 'company', because you have one instance of 'Company' with a capital C -- see the output of the first console log within a comment below.
Then I put things back together using reduce. map is not the right choice here, because the result needs one fewer element than the number of fragments, so the first thing I do inside the reduce callback is make sure I do not look beyond the end of the array.
Then I split each fragment on ',' and pop off the last element, which puts either "C" or "c" back together with "ompany". Then I strip any trailing ',c' from the next fragment and append the result to the company. Finally I push the entire result onto the array I'm building with reduce. See the commented results at the bottom. It is also on repl.it: https://repl.it/@dexygen/splitOnCompanyStringLiteral
This is a fairly concise way to do this but again if you can do anything to improve your data, you won't have to use such unnecessarily complicated code.
const companiesStr = "Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew";
const companySuffixFragments = companiesStr.split("ompany");
console.log(companySuffixFragments);
/*
[ 'C', ' 1,c', ' ltd 2,c', ', Inc.,c', ' Nine nine, ltd,c', ' ew' ]
*/
const companiesArr = companySuffixFragments.reduce((companies, fragment, index, origArr) => {
if (index < companySuffixFragments.length - 1) {
let company = fragment.split(',').pop() + 'ompany'
company = company + origArr[index + 1].replace(/,c$/, '');
companies.push(company);
}
return companies
}, []);
console.log(companiesArr);
/*
[ 'Company 1',
'company ltd 2',
'company, Inc.',
'company Nine nine, ltd',
'company ew' ]
*/
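The same idea can probably be written more compactly with a lookahead, so that only commas directly followed by "company" (in any case) count as separators. This is a sketch that relies on the same assumption as the code above, namely that every entry starts with the word "company"; splitCompanies is a name made up for the example.

```javascript
const companiesStr = 'Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew';

// Split only on commas that are immediately followed by "company".
// The lookahead (?=...) checks the following text without consuming it.
const splitCompanies = (s) => s.split(/,(?=company)/i);

console.log(splitCompanies(companiesStr));
```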

First replace the commas that belong to a name with another symbol (I am using & here), then split the string on , and put the commas back:
var str= 'Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew';
str = str.replace(', Inc.','& Inc.');
/*str = str.replace(', ltd','& ltd');*/
console.log(str.split(',').map((e)=>{return e.replace('&',',').trim()}));
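The placeholder step might be avoidable when the data follows one rule consistently: in the example string, a comma inside a name is always followed by a space, while a separating comma never is. Assuming that holds for the real data (it is an assumption, not something the question guarantees), a negative lookahead can split on the separators only. splitNames is a hypothetical name.

```javascript
// Split on commas that are NOT followed by whitespace; commas inside a
// name such as "company, Inc." are followed by a space and are left alone.
const splitNames = (s) => s.split(/,(?!\s)/).map((name) => name.trim());

console.log(splitNames('Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew'));
console.log(splitNames('Tree, Ltd.,Tree, Ltd.'));
```

This does not depend on the word "company" or on a fixed list of suffixes, but it breaks as soon as a separator is written with a trailing space.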

Try the solution below.
var str = ["company 1","company ltd 2","company", "Inc.","company Nine nine", "ltd","company ews"];
var str2 = str.toString();
var str3 = str2.split("company");
// Replace the commas in each non-empty fragment with spaces and put "Company" back in front.
function myFunction(item, index, arr) {
  if (item != "") {
    let var2 = item.replace(/,/g, " ");
    var2 = "Company" + var2;
    arr[index] = var2;
  }
}
str3.forEach(myFunction);
Output:
str3
(6) ["", "Company 1 ", "Company ltd 2 ", "Company Inc. ", "Company Nine nine ltd ", "Company ews"]
Then remove the first element of the array.

As has been commented I'd try to get a more clean String so that you don't have to write "strange" code to get what you need.
If you can't do that right now this code should solve your problem:
let string = 'Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew';
let array = string.split(',');
const filterFnc = (array) => {
let newArr = [],
i = 0;
for(i = 0; i < array.length; i++) {
if(array[i].toLowerCase().indexOf('company') !== -1) {
newArr.push(array[i]);
} else {
// append the fragment to the previous entry, restoring the comma that split() removed
newArr[newArr.length - 1] += `,${array[i]}`;
}
}
return newArr;
};
let filteredArray = filterFnc(array);

How to remove colon based emojis from text using javascript

How can I remove all instances of :smile: style emojis from a string using JavaScript? Below is an example I got as JSON with :point_right: in it. I'd love to remove all of them from a string.
[ { service_name: 'Instagram',
title: 'Instagram: “:point_right: Real people, making real products from real plants, using their actual hands to put them in boxes that show up on your doorstep.…”',
text: '36 Likes, 2 Comments - “:point_right: Real people, making real products',
ts: '1523497358.000299' } ]

Just use String.prototype.replace() with a regular expression:
const input = 'Instagram: “:point_right: Real people, making real products from real plants, using their actual hands to put them in boxes that show up on your doorstep.…”';
const output = input.replace(/:\w+:/g, '');
console.log(output);
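Two details may be worth handling on top of this: \w does not cover shortcodes that contain + or - (such as :+1:), and removing only the shortcode leaves its trailing space behind. A possible variant is sketched below; stripShortcodes is a made-up name, and the pattern still cannot tell a shortcode from something like the middle of "10:30:45", so it is a heuristic rather than a complete solution.

```javascript
// Remove :shortcode: tokens, including names with + or -, together with
// one whitespace character that directly follows the token.
const stripShortcodes = (text) => text.replace(/:[\w+-]+:\s?/g, '');

console.log(stripShortcodes('Instagram: “:point_right: Real people, making real products'));
console.log(stripShortcodes('Nice :+1: work'));
```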

Assuming the emojis are all one word, between :s:
const obj = {
service_name: 'Instagram',
title: 'Instagram: “:point_right: Real people, making real products from real plants, using their actual hands to put them in boxes that show up on your doorstep.…”',
text: '36 Likes, 2 Comments - “:point_right: Real people, making real products',
ts: '1523497358.000299'
}
obj.title = obj.title.replace(/:[^ ]+:/g, '');
obj.text = obj.text.replace(/:[^ ]+:/g, '');
console.log(obj);

Based on the answer to "Replacing values in JSON object", you could do this:
var json=[ { service_name: 'Instagram',
title: 'Instagram: “:point_right: Real people, making real products from real plants, using their actual hands to put them in boxes that show up on your doorstep.…”',
text: '36 Likes, 2 Comments - “:point_right: Real people, making real products',
ts: '1523497358.000299' }];
var rep = JSON.stringify(json).replace(/(“)(:[^:]+:)/g, '$1');
var New = JSON.parse(rep);
console.log(New);

Try this :
// JSON Object
var jsonObj = [{
"service_name": "Instagram",
"title": "Instagram: “:point_right: Real people, making real products from real plants, using their actual hands to put them in boxes that show up on your doorstep.…”",
"text": "36 Likes, 2 Comments - “:point_right: Real people, making real products",
"ts": "1523497358.000299"
}];
// Replace :point_right: with blank string.
jsonObj.map(obj => {
obj.title = obj.title.replace(":point_right: ", "");
obj.text = obj.text.replace(":point_right: ", "");
return obj;
});
// Output
console.log(jsonObj);
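If the field names or the emoji names are not known in advance, one option is to walk the whole structure and apply a global regex to every string it contains. The sketch below assumes plain JSON-like data (arrays, objects, strings, numbers) and returns a new structure instead of modifying the original; SHORTCODE and stripDeep are names invented for the example.

```javascript
// Matches any :word: token plus one following whitespace character.
const SHORTCODE = /:\w+:\s?/g;

// Recursively copy the value, stripping shortcodes from every string.
const stripDeep = (value) => {
  if (typeof value === 'string') return value.replace(SHORTCODE, '');
  if (Array.isArray(value)) return value.map(stripDeep);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, stripDeep(inner)])
    );
  }
  return value; // numbers, booleans, null are returned unchanged
};

const data = [{
  service_name: 'Instagram',
  text: '36 Likes, 2 Comments - “:point_right: Real people, making real products',
  ts: '1523497358.000299'
}];

console.log(stripDeep(data));
```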

set expected result as true when find in a string partly correct protractor

Hello everyone, I have the following code:
it('Search for string', function () {
var MySearch = element(by.model('searchQuery'));
MySearch.sendKeys('Apple Pomace');
expect(MySearch.getAttribute('value')).toBe('Apple Pomace');
element(by.buttonText('Search')).click();
//browser.pause();
var optionTexts = element.all(by.repeater('product in products')).map(function (Options) {
return Options.getText();
});
optionTexts.then(function (array){
expect(array).toContain("Apple Pomace");
});
});
Then I get this result:
[ 'Apple Pomace\nFinest pressings of apples. Allergy disclaimer: Might contain traces of worms. Can be sent back to us for recycling.\n0.89' ]
Now I want to check whether the string contains Apple Pomace.
I have tried the following code:
expect(array).toContain('Apple Pomace');
Then I get:
Expected [ 'Apple Pomace
Finest pressings of apples. Allergy disclaimer: Might contain traces of worms. Can be sent back to us for recycling.
0.89' ] to contain 'Apple Pomace'. <Click to see difference>
How do I make the test pass even if the whole string doesn't match my result?
Or validate the string only up to the first "\n"?
Thank you in advance

First of all, element.all(by.repeater('product in products')).getText() returns an array of strings. If you use the toContain matcher on the array, it checks whether an element equal to the whole string is present in the array.
In your case, you need to check whether any string in the array contains the text Apple Pomace. To achieve this, transform the result array into a single string and then apply the toContain matcher to it.
var displayedResults = element.all(by.repeater('product in products')).getText()
.then(function(resultArray){
return resultArray.join(); // will convert the array to string.
})
expect(displayedResults).toContain("Apple Pomace");
Hope this might help you!
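For the other option raised in the question, comparing only the part before the first "\n", the array could be reduced to first lines before the assertion. Below is a plain JavaScript sketch of that step; firstLines is a hypothetical helper, and the sample text is shortened from the output in the question. Inside the spec it would presumably be used as expect(firstLines(array)).toContain('Apple Pomace').

```javascript
// Keep only the text before the first line break of each result.
const firstLines = (texts) => texts.map((text) => text.split('\n')[0]);

const results = ['Apple Pomace\nFinest pressings of apples.\n0.89'];

console.log(firstLines(results));
console.log(firstLines(results).includes('Apple Pomace'));
```

Unlike joining the array into one string, this only passes when a product name matches exactly, not when the words merely appear somewhere in a description.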

Splitting Name into last name, first name, middle initial

I have user input from a textbox that contains a user's name.
The input can look like this:
var input = "Doe, John M";
However, the input can be a whole lot more complex,
like:
var input = "Doe Sr, John M"
or "Doe, John M"
or "Doe, John, M"
or even "Doe Sr, John,M"
What I'd like to do is separate the last name (with the sr or jr) the first name, and then the middle initial.
So, these strings become :
var input = "Doe#John#M" or "Doe Sr#John#M" or "Doe#John#M"
I've tried this regular expression,
input = input.replace(/\s*,\s*/g, '#');
but this doesn't take into account the last middle initial.

I'm sure this can probably be done via RegEx but splitting the string into arrays is often faster and a little less complex (IMO). Try this function:
var parseName = function(s) {
var last = s.split(',')[0];
var first = s.split(',')[1].split(' ')[1];
var mi = s[s.length-1];
return {
first: first,
mi: mi,
last: last
};
};
You call it just passing in the name e.g. parseName('Doe, John M') and it returns an object with first, mi, last. I created a jsbin you can try that tests the formats of names you show in your question.
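A slightly more defensive variant of the same splitting approach is sketched here. It treats everything before the first comma as the last name and accepts either a comma or whitespace between first name and middle initial, so a missing space after a comma or a missing middle initial should not break it. parseNameParts and toHashForm are hypothetical names, and the sketch assumes the input always has the "Last, First [M]" shape.

```javascript
// Split "Last[, ]First[ ,]M" into its parts; returns null without a comma.
const parseNameParts = (input) => {
  const commaAt = input.indexOf(',');
  if (commaAt === -1) return null;
  const last = input.slice(0, commaAt).trim();
  // Remaining tokens may be separated by commas, spaces, or both.
  const [first = '', mi = ''] = input
    .slice(commaAt + 1)
    .split(/[\s,]+/)
    .filter(Boolean);
  return { last, first, mi };
};

// Produce the "Last#First#M" form asked for in the question.
const toHashForm = (input) => {
  const parts = parseNameParts(input);
  if (!parts) return input;
  return [parts.last, parts.first, parts.mi].filter(Boolean).join('#');
};

['Doe, John M', 'Doe Sr, John M', 'Doe, John, M', 'Doe Sr, John,M']
  .forEach((name) => console.log(toHashForm(name)));
```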

Checkout humanparser on npm.
https://www.npmjs.org/package/humanparser
Parse a human name string into salutation, first name, middle name, last name, suffix.
Install
npm install humanparser
Usage
var human = require('humanparser');
var fullName = 'Mr. William R. Jenkins, III'
, attrs = human.parseName(fullName);
console.log(attrs);
//produces the following output
{ saluation: 'Mr.',
firstName: 'William',
suffix: 'III',
lastName: 'Jenkins',
middleName: 'R.',
fullName: 'Mr. William R. Jenkins, III' }

The following seems to match your requirements
input = input.replace(/\s*,\s*|\s+(?=\S+$)/g, '#');
