I'm trying to find the most efficient way to intersect a set of texts and find the words common to all of them. Given this scenario:
var t1 = 'My name is Mary-Ann, and I come from Kansas!';
var t2 = 'John, meet Mary, she comes from far away';
var t3 = 'Hi Mary-Ann, come here, nice to meet you!';
The intersection result should be:
var result = ["Mary"];
It should be able to ignore punctuation marks like .,!?-
Would a solution with regular expressions be optimal?
Here's a tested solution:
function intersect() {
var set = {};
[].forEach.call(arguments, function(a,i){
var tokens = a.match(/\w+/g);
if (!i) {
tokens.forEach(function(t){ set[t]=1 });
} else {
for (var k in set){
if (tokens.indexOf(k)<0) delete set[k];
}
}
});
return Object.keys(set);
}
This function is variadic: you can call it with any number of texts:
console.log(intersect(t1, t2, t3)) // -> ["Mary"]
console.log(intersect(t1, t2)) // -> ["Mary", "from"]
console.log(intersect()) // -> []
If you need to support non-English languages, this regex won't be enough, because \w only matches ASCII letters, digits and the underscore, and Unicode support in JavaScript regexes has historically been poor. Either use a regex library, or define your regex by explicitly excluding characters, as in a.match(/[^\s\-.,!?]+/g) (this will probably be enough for you).
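As a hedged alternative to the exclusion-based regex above: engines that implement ES2018 Unicode property escapes can match letters and digits from any script with the u flag. The helper name tokenize is hypothetical, and the sketch assumes your runtime supports \p{...}.

```javascript
// Hypothetical helper: split a text into word tokens for any script.
// Assumes an engine with ES2018 Unicode property escapes (the `u` flag).
function tokenize(text) {
  // \p{L} matches any letter, \p{N} any numeric character.
  return text.match(/[\p{L}\p{N}]+/gu) || [];
}

console.log(tokenize('Mary-Ann, come here!')); // [ 'Mary', 'Ann', 'come', 'here' ]
console.log(tokenize('Привет, мир!'));         // [ 'Привет', 'мир' ]
```

The `|| []` fallback covers texts with no tokens at all, where match returns null.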
Detailed explanation :
The idea is to fill a set with the tokens of the first text and then remove from the set the tokens missing in the other texts.
The set is a JavaScript object used as a map. Some purists would have used Object.create(null) to avoid a prototype, I like the simplicity of {}.
As I want my function to be variadic, I use arguments instead of defining the passed texts as explicit arguments.
arguments isn't a real array, so to iterate over it you need either a for loop or a trick like [].forEach.call. It works because arguments is "array-like".
To tokenize, I simply use match to match words, nothing special here (see note above regarding better support of other languages, though)
I use !i to check if it's the first text. In that case, I simply copy the tokens as properties in the set. A value must be used, I use 1. In the future, ES6 sets will make the intent more obvious here.
For the following texts, I iterate over the elements of the set (the keys) and remove the ones which are not in the array of tokens (tokens.indexOf(k)<0).
Finally, I return the elements of the set as an array, because that is what we want. The simplest solution is to use Object.keys.
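For comparison, here is a minimal sketch of the same idea written with an ES6 Set, which the explanation above anticipates. The name intersectWords and the rest parameter are my own choices, not part of the original answer.

```javascript
// Hypothetical ES6 variant of intersect(): same algorithm, Set instead of object.
function intersectWords(...texts) {
  if (texts.length === 0) return [];
  // Seed the set with the tokens of the first text.
  const common = new Set(texts[0].match(/\w+/g) || []);
  for (const text of texts.slice(1)) {
    const tokens = new Set(text.match(/\w+/g) || []);
    // Deleting from a Set while iterating over it is safe in JavaScript.
    for (const word of common) {
      if (!tokens.has(word)) common.delete(word);
    }
  }
  return [...common];
}

const t1 = 'My name is Mary-Ann, and I come from Kansas!';
const t2 = 'John, meet Mary, she comes from far away';
const t3 = 'Hi Mary-Ann, come here, nice to meet you!';
console.log(intersectWords(t1, t2, t3)); // [ 'Mary' ]
console.log(intersectWords(t1, t2));     // [ 'Mary', 'from' ]
```

Using a Set for each later text also replaces the linear indexOf scan with a constant-time lookup, which may matter for long texts.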
I'm new to JavaScript, and I'm learning how to use functions. It sometimes confuses me why we need to declare a new variable and use it in the operation we want to perform. Let's look at the code.
function reverse(word) {
    Array.from(word);
    let wordLength = '';
    for (let i = word.length - 1; i >= 0; i--) {
        wordLength += word[i];
    }
    return wordLength;
}
I'm sure you know this is one of the ways of reversing a string in JavaScript. My questions are:
Why do we need to declare a new variable within the function, when should we declare it?
Why can't I just type console.log(word[i]);?
What does it mean by wordLength+=word[i];?
Why should we return the new variable(wordLength), instead of the function(reverse) after the loop?
Why do we need to declare a new variable within the function...
Because you need a place to store the reversed word as you build it. (Note: wordLength isn't a good name for that variable. It doesn't contain the word's length. It contains the characters of the reversed word.)
...when should we declare it?
Any time before you first need it.
Why can't I just type console.log(word[i]);?
Because the goal of the exercise is to build a string containing the reversed word, not just to output it. (And because console.log writes a new line each time you call it.)
What does it mean by wordLength+=word[i];?
That adds the character in word[i] to the end of wordLength. For instance, if the word is "start", wordLength starts out with "", then gets "t" added to it to make it "t", then gets "r" added to it to make "tr", and so on.
(+= is a shorthand way to write wordLength = wordLength + word[i];. There are several of these compound assignment operators, most of them for math: -=, *=, etc.)
Side note: The Array.from call in your code isn't doing anything useful. It's creating an array, but then throwing that array away because nothing uses the return value. The rest of the code is using the string you receive in word.
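Putting those points together, a cleaned-up version might look like the sketch below. The variable name reversed is my own suggestion; it drops the unused Array.from call and declares the loop counter with let.

```javascript
function reverse(word) {
  // Accumulator: starts empty and grows by one character per iteration.
  let reversed = '';
  for (let i = word.length - 1; i >= 0; i--) {
    reversed += word[i]; // same as: reversed = reversed + word[i]
  }
  return reversed;
}

console.log(reverse('start')); // "trats"
```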
Why do we need to declare a new variable within the function, when should we declare it?
Variables are a place to store data. If your algorithm needs to keep some data for later use, you need variables. Well-named variables are also a good way to make code easy to understand.
Why can't I just type console.log(word[i]);?
You can, but it will do nothing useful. Your goal is to build a string and return it. Your function will be used something like this:
const word = getSomeText()
const reversedText = reverse(word)
doSomeStuff(reversedText) // whatever, send it online, or render it on screen some fancy way, not in the console.
So you need to return the actual string, rather than solve a puzzle and show the answer whichever way you like.
Why should we return the new variable(wordLength), instead of the function(reverse) after the loop?
Because it contains the reversed word and your function is supposed to return it. There are rare, complicated occasions when a function returning itself is useful, but they have nothing in common with your task.
Why do we need to declare a new variable within the function, when should we declare it?
A variable is required to store a value that can be changed later on. In your case the wordLength variable is required to store the reversed string.
It's best to declare variables where you first use them, to ensure that they are always initialized to some valid value.
Why can't I just type console.log(word[i])
console.log() only prints output; it is no use if you want to return something. As written, it would just print each word[i] on its own, not the whole reversed string.
What does it mean by wordLength+=word[i]
It means that on each iteration you are concatenating word[i] onto the wordLength variable.
wordLength+=word[i] is a shorthand for wordLength = wordLength + word[i]. If the left hand side of the + operator is a string, JavaScript will coerce the right hand side to a string.
Why should we return the new variable(wordLength), instead of the function(reverse) after the loop ?
Because this is what you expect from the function: it builds the reversed string, so that is what it should return.
I'd like to know how to replace a capture group with its uppercase in JavaScript. Here's a simplified version of what I've tried so far that's not working:
> a="foobar"
'foobar'
> a.replace( /(f)/, "$1".toUpperCase() )
'foobar'
> a.replace( /(f)/, String.prototype.toUpperCase.apply("$1") )
'foobar'
Would you explain what's wrong with this code?
You can pass a function to replace.
var r = a.replace(/(f)/, function(v) { return v.toUpperCase(); });
Explanation
a.replace( /(f)/, "$1".toUpperCase())
In this example you pass a string to the replace function. Since you are using the special replacement syntax ($N inserts the Nth capture), you are simply substituting the same value back. The toUpperCase call is deceptive because it only upper-cases the replacement string itself, which is pointless: the characters $ and 1 have no upper-case form, so the value is still "$1".
a.replace( /(f)/, String.prototype.toUpperCase.apply("$1"))
Believe it or not the semantics of this expression are exactly the same.
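A short check of that claim, as a sketch you can paste into a console:

```javascript
// toUpperCase runs before replace is ever called, and "$1" has no letters to change.
const replacement = "$1".toUpperCase();
console.log(replacement === "$1");                                 // true
console.log(String.prototype.toUpperCase.apply("$1") === "$1");    // true

// So both original attempts are equivalent to replacing the match with itself.
console.log("foobar".replace(/(f)/, replacement));                 // "foobar"

// A function is evaluated per match, so it sees the actual matched text.
console.log("foobar".replace(/(f)/, (match) => match.toUpperCase())); // "Foobar"
```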
I know I'm late to the party but here is a shorter method that is more along the lines of your initial attempts.
a.replace('f', String.call.bind(a.toUpperCase));
So where did you go wrong and what is this new voodoo?
Problem 1
As stated before, you were attempting to pass the result of a called method as the second parameter of String.prototype.replace(), when instead you ought to be passing a reference to a function.
Solution 1
That's easy enough to solve. Simply removing the parameters and parentheses will give us a reference rather than executing the function.
a.replace('f', String.prototype.toUpperCase.apply)
Problem 2
If you attempt to run the code now you will get an error stating that undefined is not a function and therefore cannot be called. This is because String.prototype.toUpperCase.apply is actually a reference to Function.prototype.apply() via JavaScript's prototypal inheritance. So what we are actually doing looks more like this:
a.replace('f', Function.prototype.apply)
Which is obviously not what we have intended. How does it know to run Function.prototype.apply() on String.prototype.toUpperCase()?
Solution 2
Using Function.prototype.bind() we can create a copy of Function.prototype.apply with its context specifically set to String.prototype.toUpperCase. We now have the following:
a.replace('f', Function.prototype.apply.bind(String.prototype.toUpperCase))
Problem 3
The last issue is that String.prototype.replace() will pass several arguments to its replacement function. However, Function.prototype.apply() expects the second parameter to be an array but instead gets either a string or number (depending on if you use capture groups or not). This would cause an invalid argument list error.
Solution 3
Luckily, we can simply substitute in Function.prototype.call() (which accepts any number of arguments, none of which have type restrictions) for Function.prototype.apply(). We have now arrived at working code!
a.replace(/f/, Function.prototype.call.bind(String.prototype.toUpperCase))
Shedding bytes!
Nobody wants to type prototype a bunch of times. Instead we'll leverage the fact that we have objects that reference the same methods via inheritance. The String constructor, being a function, inherits from Function's prototype. This means that we can substitute in String.call for Function.prototype.call (actually we can use Date.call to save even more bytes but that's less semantic).
We can also leverage our variable a: since its prototype includes a reference to String.prototype.toUpperCase, we can swap that out for a.toUpperCase. Combining the 3 solutions above with these byte-saving measures gives the code at the top of this post.
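The steps above can be checked piece by piece. This is only an illustrative sketch; the names toUpper and viaApply are hypothetical.

```javascript
// call bound to toUpperCase: toUpper(x, ...rest) behaves like x.toUpperCase(...rest).
const toUpper = Function.prototype.call.bind(String.prototype.toUpperCase);
console.log(toUpper('f', 0, 'foobar'));       // "F" (extra arguments are ignored)
console.log('foobar'.replace(/f/, toUpper));  // "Foobar"

// String is a function, so it inherits call from Function.prototype.
console.log(String.call === Function.prototype.call); // true

// The apply-based version fails because the second argument is not array-like.
const viaApply = Function.prototype.apply.bind(String.prototype.toUpperCase);
let applyError = null;
try {
  'foobar'.replace('f', viaApply);
} catch (err) {
  applyError = err;
}
console.log(applyError instanceof TypeError); // true
```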
Why don't we just look up the definition?
If we write:
a.replace(/(f)/, x => x.toUpperCase())
we might as well just say:
a.replace('f','F')
Worse, I suspect nobody realises that their examples have been working only because they were capturing the whole regex with parentheses. If you look at the definition, the first parameter passed to the replacer function is actually the whole matched pattern and not the pattern you captured with parentheses:
function replacer(match, p1, p2, p3, offset, string)
If you want to use the arrow function notation:
a.replace(/xxx(yyy)zzz/, (match, p1) => p1.toUpperCase())
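One thing to keep in mind with this form: the return value replaces the whole match, not just the captured group. A minimal sketch with made-up input strings:

```javascript
// Returning only p1 drops the surrounding "xxx" and "zzz", because the
// replacer's return value substitutes the entire matched text.
console.log('xxxyyyzzz'.replace(/xxx(yyy)zzz/, (match, p1) => p1.toUpperCase()));
// "YYY"

// To keep the surrounding text, capture it too and reassemble the pieces.
console.log('--xxxyyyzzz--'.replace(/(xxx)(yyy)(zzz)/,
  (match, before, middle, after) => before + middle.toUpperCase() + after));
// "--xxxYYYzzz--"
```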
This is an old post, but it is worth extending ChaosPandion's answer to other use cases with a more restrictive RegEx, e.g. ensuring that the capturing group (f) is surrounded by a specific format such as /z(f)oo/:
> a="foobazfoobar"
'foobazfoobar'
> a.replace(/z(f)oo/, function($0,$1) {return $0.replace($1, $1.toUpperCase());})
'foobazFoobar'
// Improve the RegEx so `(f)` will only get replaced when it begins with a dot or new line, etc.
I just want to highlight that the two parameters of the function make it possible to find a specific format and replace a capturing group within it.
SOLUTION
a.replace(/(f)/,(m,g)=>g.toUpperCase())
To replace all occurrences of the group, use the /(f)/g regexp. The problem in your code: String.prototype.toUpperCase.apply("$1") and "$1".toUpperCase() both evaluate to "$1" (try it in the console yourself), so in both cases you are in fact calling a.replace( /(f)/, "$1"), which replaces the match with itself and changes nothing.
let a= "foobar";
let b= a.replace(/(f)/,(m,g)=>g.toUpperCase());
let c= a.replace(/(o)/g,(m,g)=>g.toUpperCase());
console.log("/(f)/ ", b);
console.log("/(o)/g", c);
Given a dictionary (in this case a Map) of properties and values, and using .bind() as described in the other answers:
const regex = /([A-Za-z0-9]+)/;
const dictionary = new Map([["hello", 123]]);
let str = "hello";
str = str.replace(regex, dictionary.get.bind(dictionary));
console.log(str);
Using a plain JavaScript object, with a function defined to return the matched property value of the object, or the original string if no match is found:
const regex = /([A-Za-z0-9]+)/;
const dictionary = {
"hello": 123,
[Symbol("dictionary")](prop) {
return this[prop] || prop
}
};
let str = "hello";
str = str.replace(regex, dictionary[Object.getOwnPropertySymbols(dictionary)[0]].bind(dictionary));
console.log(str);
To convert a string from CamelCase to bash_case (e.g. for filenames), use a callback with a ternary operator.
The group captured with () in the regexp passed as the first (left) argument of replace is sent to the second (right) argument, which is a callback function.
x and y both give the captured string: x is the whole match and y is the first capture group, and they are identical here because the group spans the entire pattern. index (the third parameter) gives the position of the match in the original string.
Therefore a ternary operator can be used to avoid placing _ before the first occurrence.
let str = 'MyStringName';
str = str.replace(/([^a-z0-9])/g, (x,y,index) => {
return index != 0 ? '_' + x.toLowerCase() : x.toLowerCase();
});
console.log(str);
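To see what the callback actually receives, the sketch below records its first three arguments for each match. The array name calls is hypothetical, and the callback returns the match unchanged so the string itself is not modified.

```javascript
const calls = [];
'MyStringName'.replace(/([^a-z0-9])/g, (match, group1, offset) => {
  // match: the whole matched text; group1: the first capture group;
  // offset: position of the match in the original string.
  calls.push([match, group1, offset]);
  return match;
});
console.log(calls);
// [ [ 'M', 'M', 0 ], [ 'S', 'S', 2 ], [ 'N', 'N', 8 ] ]
```

Because the parentheses wrap the entire pattern, match and group1 are always equal in this particular regexp.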
this.breakintoletters=()=>
this.lengthi!==0?(this.title2=this.title,this.title2.split(),this.title2.
map((x)=>this.arol.push(new letter(x))))
:!!false
So basically this is the code. It's supposed to break the string into letters and then push the corresponding objects into an array.
It checks the length of the string and proceeds if it is not 0. It throws an error where the map function is called: a TypeError telling me it's not a function. The editor is not showing any errors. I would appreciate help.
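I suggest a different approach: check this.lengthi in advance and return either false or, later, the mapped lengths returned by the push calls. Note that split returns a new array rather than changing the string, so its result has to be assigned, and that a separator of '' is needed to get individual letters.
this.breakintoletters = () => {
    if (!this.lengthi) return false;
    // split('') returns an array of characters; the string itself is not modified
    this.title2 = this.title.split('');
    return this.title2.map(x => this.arol.push(new letter(x)));
};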
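You're not assigning the split value back to this.title2, and then you call map on this.title2, which is still a string. Note also that split() without a separator returns a one-element array containing the whole string; use split('') to get individual characters.
this.breakintoletters = () =>
    this.lengthi
        ? (this.title2 = this.title,
           this.title2 = this.title2.split(''),
           this.title2.map((x) => this.arol.push(new letter(x))))
        : false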
IMO you should make your code concise only up to the point where it stays readable. You can simplify it in the following manner:
this.breakintoletters = () => {
    if (this.lengthi === 0) return false;
    this.title2 = this.title;
    this.title2 = this.title2.split('');
    return this.title2.map((x) => this.arol.push(new letter(x)));
}
(this.title2=this.title, this.title2.split(), this.title2.
map((x)=>this.arol.push(new letter(x))))
Is this.arol the name of an array?
Try restructuring it to be:
this.title.split('').map((x) => this.arol.push(new letter(x)))
Methods like split() join() map() etc can be chained together.
I would rethink using the map function here, though, as well as the ternary, which other commenters covered above. It works technically, but if the goal is to iterate through the string in order to push certain values, it'd be better to use a for loop. map is more for when you want to apply the same function to each individual element and collect the results.
Also, this is just a formatting thing, but it is a lot easier to read and understand your code when you put spaces between variables and operators and choose variable names that make sense for what you are doing (this.bookLength, this.reverseAr), or at least use the generic this.array or this.arr. It makes it easier to ask questions like this, because you'll get fewer clarifying questions about typos. And if you are ever planning to work on a larger code base, it's important to write clean code that is understandable to someone who doesn't know you.
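As a rough sketch of the loop-based alternative suggested above: the Letter class and the breakIntoLetters function are hypothetical stand-ins for the poster's letter constructor and method, which are not shown in the question.

```javascript
// Hypothetical stand-in for the poster's `letter` constructor.
class Letter {
  constructor(char) {
    this.char = char;
  }
}

// Build one Letter per character with a plain loop instead of map + push.
function breakIntoLetters(title) {
  const letters = [];
  for (const char of title) {
    letters.push(new Letter(char));
  }
  return letters;
}

// split() without a separator does not break the string into letters:
console.log('abc'.split());   // [ 'abc' ]
console.log('abc'.split('')); // [ 'a', 'b', 'c' ]
console.log(breakIntoLetters('abc').map((l) => l.char).join('|')); // "a|b|c"
```

An empty title simply yields an empty array, so no separate length check is needed in this version.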
I am working through the exercises in the book Object-Oriented JavaScript by Stoyan Stefanov. The exercise is asking me to create a function constructor for a String object. None of the built-in String properties or methods can be used. I am trying to recreate returning a character at a certain index of a string. So the following code is the part of the exercise I am having difficulty getting to work:
var s = new MyString('hello');
s[0];
I cannot figure out how to have my function constructor return the character at the index specified. I should be able to display to the screen the character 'h'. I was able to specifically target certain indexes but that would not be usable as there could be any number of characters in the string passed into the function constructor. Here is the code for that, this return value is for the constructor itself:
return {
'0': this.string[0] // Is this code using built-in String object properties or methods?
}
Thanks if you can point me in the right direction.
A simple way to achieve this is to not make it act like a real string, but only deal with the letters that do exist by running over the input string as an array using forEach:
var MyString = function(content) {
var thisObject = this;
var letters = content.split('');
letters.forEach(function(letter, position) {
thisObject[position] = letter;
});
};
JS objects are all dynamic property/value maps, so you can set a binding that is effectively this[0] = 't'; this[1] = 'h'; this[2] = 'e' and have something that works.
Does this make sense to do? Not... really? I don't quite see what this exercise teaches you if it's telling you that your code should allow for yourstring[somenumber], but this would be one way to do it.
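If the exercise really forbids split and forEach as well, one possible variation is a plain loop that relies only on bracket indexing of the input. Whether bracket access counts as a "built-in String property" is a judgment call, so treat this as a sketch under that assumption rather than the book's intended solution.

```javascript
function MyString(content) {
  // Assumption: reading content[i] is allowed; no String methods or .length are used.
  let i = 0;
  while (content[i] !== undefined) {
    this[i] = content[i]; // copy each character to a numeric property
    i++;
  }
  this.length = i; // our own length property, counted by hand
}

const s = new MyString('hello');
console.log(s[0]);     // "h"
console.log(s.length); // 5
```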
a="12345"
a[2]=3
a[2]='9'
console.log(a) //=> "12345"
What is going on? This quirk cost me an hour of painful debugging. How can I avoid this in a sensible way?
You cannot use brackets to rewrite individual characters of the string; only 'getter' (i.e. read) access is available. Quoting the doc (MDN):
For character access using bracket notation, attempting to delete or
assign a value to these properties will not succeed. The properties
involved are neither writable nor configurable.
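One way to make this failure visible instead of silent is strict mode, where assigning to a non-writable property throws. A minimal sketch (the function name is hypothetical):

```javascript
function assignInStrictMode() {
  'use strict';
  const a = '12345';
  try {
    a[2] = '9'; // index properties of a string are read-only
    return 'assignment silently ignored';
  } catch (err) {
    return err instanceof TypeError ? 'TypeError thrown' : 'other error';
  }
}

console.log(assignInStrictMode()); // "TypeError thrown"
```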
That covers the "what's going on" part of the question. As for the "how to replace" part, there's a useful snippet (taken from an answer written long, long ago):
String.prototype.replaceAt = function(index, char) {
return this.slice(0, index) + char + this.slice(index+char.length);
}
You may use it as it is (biting the bullet of extending a native JS object), or inject this code as a method in some utility object (obviously it should be rewritten a bit, taking the source string as its first param and working with it instead of this).
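The standalone rewrite mentioned above could look like this sketch; the function and parameter names are my own.

```javascript
// Same logic as replaceAt, but as a plain function that takes the source
// string as its first parameter instead of extending String.prototype.
function replaceAt(source, index, replacement) {
  return source.slice(0, index) + replacement + source.slice(index + replacement.length);
}

console.log(replaceAt('12345', 2, '9'));  // "12945"
console.log(replaceAt('12345', 1, 'ab')); // "1ab45"
```

Strings are immutable, so the function returns a new string and the caller has to use the return value.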
According to this question, this is not supported among all browsers.
If your strings aren't too long, you can do this relatively easily like this:
var a="12345";
a = a.split("");
a[2]='9';
a = a.join("");
console.log(a);
var letters = a.split('');
letters[2] = 3;
letters[2] = 9;
console.log(letters.join(''));
http://jsfiddle.net/XWwKz/
Cheers