Check similarity between strings in Javascript [closed]

How to compare 2 text strings to see if they are similar, for example:
var a = "Hello Blue World";
var b = "Hello Blut World?";
if(a similar b)
{
console.log(true);
}

You could use the string-similarity library.
It finds the degree of similarity between two strings based on Dice's coefficient, which, according to the library's description, is mostly better than Levenshtein distance.
var a = "Hello Blue World";
var b = "Hello Blut World?";
var stringSimilarity = require("string-similarity");
var similarityCoef = stringSimilarity.compareTwoStrings(a, b);
if (similarityCoef > 0.8) { console.log(true); }
Note that this prints true when the similarity coefficient is above 0.8 (an 80% match). You can adjust this threshold to your needs.
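If you would rather not add a dependency, the idea behind Dice's coefficient can be sketched in a few lines. This is a minimal illustration that compares character bigrams after stripping whitespace. The helper names bigramCounts and diceCoefficient are made up for this example, and the library's own implementation may differ in details, so its scores need not match exactly.

```javascript
// Count every pair of adjacent characters (bigram) in the text, ignoring whitespace.
function bigramCounts(text) {
  const compact = text.replace(/\s+/g, "");
  const counts = new Map();
  for (let i = 0; i < compact.length - 1; i++) {
    const pair = compact.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Dice's coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b).
function diceCoefficient(a, b) {
  const first = bigramCounts(a);
  const second = bigramCounts(b);
  let totalFirst = 0;
  let totalSecond = 0;
  let shared = 0;
  for (const count of first.values()) totalFirst += count;
  for (const [pair, count] of second) {
    totalSecond += count;
    shared += Math.min(count, first.get(pair) || 0);
  }
  // Strings too short to contain a bigram: fall back to exact equality.
  if (totalFirst + totalSecond === 0) return a === b ? 1 : 0;
  return (2 * shared) / (totalFirst + totalSecond);
}

console.log(diceCoefficient("night", "nacht")); // 0.25 (only "ht" is shared)
```

With the strings from the question this sketch also lands above the 0.8 threshold, but treat the exact value as specific to this simplified version.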

That's tricky, because you somehow have to say, in percent, what similarity means for you. What about this approach?
You compare the strings character by character, position by position, and count the matches. I know that this will fail as soon as there is one additional character very early in one of the strings, but for a start it should suffice.
var a = "Hello Blue World";
var b = "Hello Blut World?";
// only compare both strings with their mutual length, because of the loop we use
const mutualLength = (a.length > b.length) ? b.length : a.length;
const similarityAt = 90; // percent
let matchCount = 0;
// with each match increase matchCount by 1
for (let pointer = 0; pointer < mutualLength; pointer++) {
if (a.charAt(pointer) === b.charAt(pointer)) {
matchCount++;
}
}
// compute similarity in percent
const similarity = (matchCount * 100) / mutualLength;
console.log('Similarity given: ' + (similarity >= similarityAt));
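To cope with an extra or missing character early in a string, one option is to measure edit distance instead of comparing fixed positions. The following is a sketch of the standard Levenshtein dynamic-programming recurrence, not part of the answer above. levenshtein and editSimilarity are illustrative names, and dividing by the longer length is just one possible way to normalize.

```javascript
// Minimum number of single-character insertions, deletions and substitutions
// needed to turn a into b (row-by-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i - 1]
        curr[j - 1] + 1,    // insert b[j - 1]
        prev[j - 1] + cost  // keep or substitute
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function editSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(levenshtein("Hello Blue World", "Hello Blut World?")); // 2
```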

B"H
It depends on the degree of similarity you need. If you want a percentage that tells you how much matches, you can simply loop through the shorter string and count how many characters match the character at the same index of the longer string (you could instead pad the shorter string with whitespace, but that might skew the calculation). Then divide the number of matches by the length of the shorter string to get the fraction that is equal.
var str1 = "Hello blue world"
var str2 = "Hello blut world?!"
var shortest = str2.length >= str1.length?str1:str2
var longest = str2.length < str1.length?str1:str2
var matches= 0
shortest
.split("")
//Just check if index of shortest
//Matches index of longest, and if so (&& means do next
//Expression) add the total number of matches by one
.forEach (
(x,k)=>
((x==longest[k]) && (matches++) )
)
//Final result, divide matches by total length
var similarity = matches / shortest.length

Related

Calculate an alphabetic score for a word

How can I generate a numeric score for a string, which I can later use to order things alphabetically?
(I'd like to add objectIds to a redis sorted set based on a name property. This needs a numeric score. My list-of-things might be too big to sort all at once, hence wanting to score each item individually)
Words earlier in an alphabetic list should have a lower score, with 'a' = 0.
My naive approach so far; (letter alphabetic position from Replace a letter with its alphabet position )
function alphaScoreString(inputString) {
let score = 0
inputString
.trim()
.toLowerCase()
.split('')
.map((letter, index) => {
const letterNumber = parseInt(letter, 36) - 10
if (letterNumber >= 0) {
score += letterNumber / (index + 1)
}
})
return score * 1000
}
This does not work, as
alphaScoreString('bb')
1500
alphaScoreString('bc')
2000
alphaScoreString('bbz')
9833.333333333334
You can see that 'bbz' has a higher score than 'bc', whereas it should be lower, as 'bbz' would come before 'bc' in an alphabetical list.

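You can convert each character to its Unicode code unit value and pad each value to a fixed width, so that every character occupies the same number of digits (e.g. "H" = 72, padded to 0072). Comparing the resulting strings character by character still gives the alphabetical order of the original strings. Four digits only cover values up to 9999; charCodeAt can return values up to 65535, so use five digits if such characters can occur: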
var instring = "Hello World";
var output = "";
for (let i = 0; i < instring.length; i++) {
    // pad every code unit to the same width so the keys stay comparable
    const newchar = String(instring.charCodeAt(i)).padStart(4, '0');
    output = output.concat(newchar);
}
console.log(output);
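As a sketch of how such padded keys could be used, the hypothetical helper sortKey below builds the key with five digits per character (enough for any value charCodeAt can return) and compares the keys as strings. Note that the result is a string rather than a number, so it illustrates the ordering property but is not by itself a numeric score.

```javascript
// Build a fixed-width key: every UTF-16 code unit becomes exactly five digits.
function sortKey(text) {
  let key = "";
  for (let i = 0; i < text.length; i++) {
    key += String(text.charCodeAt(i)).padStart(5, "0");
  }
  return key;
}

// Comparing the keys as strings reproduces the order of the original strings.
const words = ["bc", "bbz", "bb"];
words.sort((x, y) => {
  const kx = sortKey(x);
  const ky = sortKey(y);
  return kx < ky ? -1 : kx > ky ? 1 : 0;
});
console.log(words); // [ 'bb', 'bbz', 'bc' ]
```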

Answer written in Python.
char_codex = {'a':0.01, 'b':0.02, 'c':0.03, 'd':0.04, 'e':0.05, 'f':0.06,
'g':0.07, 'h':0.08, 'i':0.09, 'j':0.10, 'k':0.11, 'l':0.12,
'm':0.13, 'n':0.14, 'o':0.15, 'p':0.16, 'q':0.17, 'r':0.18,
's':0.19, 't':0.20, 'u':0.21, 'v':0.22, 'w':0.23, 'x':0.24,
'y':0.25, 'z':0.26}
def alphabetic_score(word):
    bitwiseshift = '1'
    scores = [0.00] * len(word)
    for index, letter in enumerate(word.lower()):
        if index == 0:
            scores[index] = char_codex[letter]
        else:
            # append two zeros so each later letter weighs 100 times less
            bitwiseshift = bitwiseshift + '00'
            scores[index] = char_codex[letter] / int(bitwiseshift)
    return sum(scores)
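The same positional-weighting idea can be sketched in JavaScript with integers instead of fractions. This is a sketch under stated assumptions, not the answer's method: it uses base 27 (0 for "no letter", 1 to 26 for a to z), only looks at the first MAX_LETTERS letters, and ignores anything that is not a to z. alphaScore and MAX_LETTERS are names chosen for this example. With 10 letters the largest score stays below 2^53, so it is exact as a double. Unlike what the question asks for, 'a' does not score 0 here; only the ordering is preserved.

```javascript
const MAX_LETTERS = 10; // 27 ** 10 is well below Number.MAX_SAFE_INTEGER

function alphaScore(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  let score = 0;
  for (let i = 0; i < MAX_LETTERS; i++) {
    // 1..26 for a..z, 0 once the word has ended, so shorter words sort first.
    const digit = i < letters.length ? letters.charCodeAt(i) - 96 : 0;
    score = score * 27 + digit;
  }
  return score;
}

console.log(alphaScore("bb") < alphaScore("bbz")); // true
console.log(alphaScore("bbz") < alphaScore("bc")); // true
```

Words that share their first 10 letters get the same score in this sketch, so a tie-breaker would still be needed for those.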

Find highest number of decimal places in String [closed]

The example string(s) can look like that:
"3.0000+3"
"3.00+3.00"
"3.00+3.000"
I want to find the highest amount of decimal places out of the numbers inside 1 string

The most straightforward way to do this is to iterate over the string, check for the occurrence of a dot, and from there count the number of digits up to the next character that is not a digit, or the end of the string. Since your string contains multiple numbers, you need an extra variable that holds the highest number of decimal places found so far.
e.g.
var str = "3.00+3.000";
function getDecimalPlaces(numb) {
var highest = 0;
var counter = 0;
for (var a = 0; a < numb.length; a++) {
if (numb.charAt(a - 1) == ".") {
do {
counter++;
a++;
}
while (!isNaN(numb.charAt(a)) && a < numb.length);
}
if (counter > highest) {
highest = counter;
}
counter = 0;
}
return highest;
}
console.log(str + " has " + getDecimalPlaces(str) + " decimal places");
This can be made a bit more elegant by using a regular expression in conjunction with the .match() method. This searches a string for a given pattern and returns an array of results.
var str = "3.00+3.000";
console.log(str.match(/(\.)([0-9]+)/g));
This will return an array like:
[".00", ".000"]
By comparing the lengths of its elements (minus 1, since each element includes the dot) we can get the number of decimal places using this short function:
var str = "3.00+3.000";
// start from 0 so that one, many, or no decimal parts all give a number
var highest = (str.match(/(\.)([0-9]+)/g) || []).reduce(function(max, match) {
return Math.max(max, match.length - 1);
}, 0);
console.log(str + " has " + highest + " decimal places");

Find extra character between two strings [closed]

How can I find an extra character between two strings in an optimal way.
Ex1: S1 - 'abcd', S2 - 'abcxd', output - 'x'
Ex2: S1 - '100001', S2 - '1000011', output - '1'
We can do this by traversing linearly and comparing each character in O(n). I want this to be done in a more efficient way, say in O(log n).

Baseline method (O(n)): Just comparing chars and narrowing in on both sides each cycle.
function findDiffChar(base, baseExtraChar) {
let extraLastIndex = base.length;
let lastIndex = extraLastIndex - 1;
for (let i = 0; i <= extraLastIndex / 2; i++) { // <= so the middle position is checked too
console.log(`Loop: ${i}`);
if (base[i] !== baseExtraChar[i])
return baseExtraChar[i];
if (base[lastIndex - i] !== baseExtraChar[extraLastIndex - i])
return baseExtraChar[extraLastIndex - i];
}
return false;
}
console.log(findDiffChar('FOOOOOAR', 'FOOOOOBAR')); // B
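Improved method using binary search: compare halves until you've narrowed it down to one character. This takes O(log n) recursive steps, although each comparison of two halves is itself linear in their length, so the total number of character comparisons is still O(n) in the worst case.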
function findDiffChar(base, baseExtraChar) {
if (baseExtraChar.length === 1) return baseExtraChar.charAt(0);
let halfBaseLen = Number.parseInt(base.length / 2) || 1;
let halfBase = base.substring(0,halfBaseLen);
let halfBaseExtra = baseExtraChar.substring(0,halfBaseLen);
return (halfBase !== halfBaseExtra)
? findDiffChar(halfBase, halfBaseExtra)
: findDiffChar(base.substring(halfBaseLen),baseExtraChar.substring(halfBaseLen));
}
console.log(findDiffChar('FOOOOAR', 'FOOOOBAR')); // B
console.log(findDiffChar('---------', '--------X')); // X
console.log(findDiffChar('-----------', '-----X-----')); // X
console.log(findDiffChar('------------', '---X--------')); // X
console.log(findDiffChar('----------', '-X--------')); // X
console.log(findDiffChar('----------', 'X---------')); // X
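For comparison, a plain left-to-right scan is sketched below. It assumes, as in the question, that the second string is the first one with exactly one character inserted. findExtraChar is a name chosen for this example. It does at most n character comparisons and no substring copying, which makes it a handy baseline when checking the other variants.

```javascript
// Return the character that withExtra has in addition to base.
// Assumes withExtra is base with exactly one character inserted somewhere.
function findExtraChar(base, withExtra) {
  for (let i = 0; i < base.length; i++) {
    if (base[i] !== withExtra[i]) {
      return withExtra[i]; // first position where the strings diverge
    }
  }
  // No mismatch found: the extra character was appended at the end.
  return withExtra[base.length];
}

console.log(findExtraChar("abcd", "abcxd"));      // x
console.log(findExtraChar("100001", "1000011"));  // 1
```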

How can I split a string into a given number of lines?

Here is my question:
Given a string, which is made up of space separated words, how can I split that into N strings of (roughly) even length, only breaking on spaces?
Here is what I've gathered from research:
I started by researching word-wrapping algorithms, because it seems to me that this is basically a word-wrapping problem. However, the majority of what I've found so far (and there is A LOT out there about word wrapping) assumes that the width of the line is a known input, and the number of lines is an output. I want the opposite.
I have found a (very) few questions, such as this that seem to be helpful. However, they are all focused on the problem as one of optimization - e.g. how can I split a sentence into a given number of lines, while minimizing the raggedness of the lines, or the wasted whitespace, or whatever, and do it in linear (or NlogN, or whatever) time. These questions seem mostly to be unanswered, as the optimization part of the problem is relatively "hard".
However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.
Here is what I've come up with:
I think I have a workable method for the case when N=3. I start by putting the first word on the first line, the last word on the last line, and then iteratively putting another word on the first and last lines, until my total width (measured by the length of the longest line) stops getting shorter. This usually works, but it gets tripped up if your longest words are in the middle of the line, and it doesn't seem very generalizable to more than 3 lines.
var getLongestHeaderLine = function(headerText) {
//Utility function definitions
var getLongest = function(arrayOfArrays) {
return arrayOfArrays.reduce(function(a, b) {
return a.length > b.length ? a : b;
});
};
var sumOfLengths = function(arrayOfArrays) {
return arrayOfArrays.reduce(function(a, b) {
return a + b.length + 1;
}, 0);
};
var getLongestLine = function(lines) {
return lines.reduce(function(a, b) {
return sumOfLengths(a) > sumOfLengths(b) ? a : b;
});
};
var getHeaderLength = function(lines) {
return sumOfLengths(getLongestLine(lines));
}
//first, deal with the degenerate cases
if (!headerText)
return headerText;
headerText = headerText.trim();
var headerWords = headerText.split(" ");
if (headerWords.length === 1)
return headerText;
if (headerWords.length === 2)
return getLongest(headerWords);
//If we have more than 2 words in the header,
//we need to split them into 3 lines
var firstLine = headerWords.splice(0, 1);
var lastLine = headerWords.splice(-1, 1);
var lines = [firstLine, headerWords, lastLine];
//The header length is the length of the longest
//line in the header. We will keep iterating
//until the header length stops getting shorter.
var headerLength = getHeaderLength(lines);
var lastHeaderLength = headerLength;
while (true) {
//Take the first word from the middle line,
//and add it to the first line
firstLine.push(headerWords.shift());
headerLength = getHeaderLength(lines);
if (headerLength > lastHeaderLength || headerWords.length === 0) {
//If we stopped getting shorter, undo
headerWords.unshift(firstLine.pop());
break;
}
//Take the last word from the middle line,
//and add it to the last line
lastHeaderLength = headerLength;
lastLine.unshift(headerWords.pop());
headerLength = getHeaderLength(lines);
if (headerLength > lastHeaderLength || headerWords.length === 0) {
//If we stopped getting shorter, undo
headerWords.push(lastLine.shift());
break;
}
lastHeaderLength = headerLength;
}
return getLongestLine(lines).join(" ");
};
debugger;
var header = "an apple a day keeps the doctor away";
var longestHeaderLine = getLongestHeaderLine(header);
debugger;
EDIT: I tagged javascript, because ultimately I would like a solution I can implement in that language. It's not super critical to the problem though, and I would take any solution that works.
EDIT#2: While performance is not what I'm most concerned about here, I do need to be able to perform whatever solution I come up with ~100-200 times, on strings that can be up to ~250 characters long. This would be done during a page load, so it needs to not take forever. For example, I've found that trying to offload this problem to the rendering engine by putting each string into a DIV and playing with the dimensions doesn't work, since it (seems to be) incredibly expensive to measure rendered elements.

Try this. For any reasonable N, it should do the job:
function format(srcString, lines) {
var target = "";
var arr = srcString.split(" ");
var c = 0;
var MAX = Math.ceil(srcString.length / lines);
for (var i = 0, len = arr.length; i < len; i++) {
var cur = arr[i];
if(c + cur.length > MAX) {
target += '\n' + cur;
c = cur.length;
}
else {
if(target.length > 0)
target += " ";
target += cur;
c += cur.length;
}
}
return target;
}
alert(format("this is a very very very very " +
"long and convoluted way of creating " +
"a very very very long string",7));

You may want to give this canvas-based solution a try. It is only a quick shot and will need optimization, but canvas might be a good idea because you can measure real text widths. You can also set the font to the one actually used, and so on. Important to note: this won't be the most performant way of doing things, because it creates a lot of canvases.
var t = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`;
function getTextTotalWidth(text) {
var canvas = document.createElement("canvas");
var ctx = canvas.getContext("2d");
ctx.font = "12px Arial";
ctx.fillText(text,0,12);
return ctx.measureText(text).width;
}
function getLineWidth(lines, totalWidth) {
return totalWidth / lines ;
}
function getAverageLetterSize(text) {
var t = text.replace(/\s/g, "").split("");
var sum = t.map(function(d) {
return getTextTotalWidth(d);
}).reduce(function(a, b) { return a + b; });
return sum / t.length;
}
function getLines(text, numberOfLines) {
var lineWidth = getLineWidth(numberOfLines, getTextTotalWidth(text));
var letterWidth = getAverageLetterSize(text);
var t = text.split("");
return createLines(t, letterWidth, lineWidth);
}
function createLines(t, letterWidth, lineWidth) {
var i = 0;
var res = t.map(function(d) {
if (i < lineWidth || d != " ") {
i+=letterWidth;
return d;
}
i = 0;
return "<br />";
})
return res.join("");
}
var div = document.createElement("div");
div.innerHTML = getLines(t, 7);
document.body.appendChild(div);

I'm sorry this is C#. I had created my project already when you updated your post with the Javascript tag.
Since you said all you care about is roughly the same line length... I came up with this. Sorry for the simplistic approach.
private void DoIt() {
List<string> listofwords = txtbx_Input.Text.Split(' ').ToList();
int totalcharcount = 0;
int neededLineCount = int.Parse(txtbx_LineCount.Text);
foreach (string word in listofwords)
{
totalcharcount = totalcharcount + word.Count(char.IsLetter);
}
int averagecharcountneededperline = totalcharcount / neededLineCount;
List<string> output = new List<string>();
int positionsneeded = 0;
while (output.Count < neededLineCount)
{
string tempstr = string.Empty;
while (positionsneeded < listofwords.Count)
{
tempstr += " " + listofwords[positionsneeded];
if ((positionsneeded != listofwords.Count - 1) && (tempstr.Count(char.IsLetter) + listofwords[positionsneeded + 1].Count(char.IsLetter) > averagecharcountneededperline))//if (this is not the last word) and (we are going to bust the average)
{
if (output.Count + 1 == neededLineCount)//if we are writing the last line
{
//who cares about exceeding.
}
else
{
//we're going to exceed the allowed average, gotta force this loop to stop
positionsneeded++;//dont forget!
break;
}
}
positionsneeded++;//increment the needed position by one
}
output.Add(tempstr);//store the string in our list of string to output
}
//display the line on the screen
foreach (string lineoftext in output)
{
txtbx_Output.AppendText(lineoftext + Environment.NewLine);
}
}

(Adapted from here, How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?)
If we consider the word lengths as a list of numbers, we can binary search the partition.
Our max length ranges from 0 to sum(word-length list) + (num words - 1), where the last term accounts for the spaces. Take mid as the midpoint of the current range. We check whether mid can be achieved by partitioning into N sets in O(m) time: traverse the list, adding (word_length + 1) to the current part while the current sum is less than or equal to mid. When the sum passes mid, start a new part. If the result has N or fewer parts, mid is achievable.
If mid can be achieved, try a lower range; otherwise, a higher range. The time complexity is O(m log num_chars). (You'll also have to consider how deleting a space per part, meaning where the line break would go, features into the calculation.)
JavaScript code (adapted from http://articles.leetcode.com/the-painters-partition-problem-part-ii):
function getK(arr,maxLength) {
var total = 0,
k = 1;
for (var i=0; i<arr.length; i++) {
total += arr[i] + 1;
if (total > maxLength) {
total = arr[i];
k++;
}
}
return k;
}
function partition(arr,n) {
var lo = Math.max(...arr),
hi = arr.reduce((a,b) => a + b);
while (lo < hi) {
var mid = lo + ((hi - lo) >> 1);
var k = getK(arr,mid);
if (k <= n){
hi = mid;
} else{
lo = mid + 1;
}
}
return lo;
}
var s = "this is a very very very very "
+ "long and convoluted way of creating "
+ "a very very very long string",
n = 7;
var words = s.split(/\s+/),
maxLength = partition(words.map(x => x.length),n);
console.log('max sentence length: ' + maxLength);
console.log(words.length + ' words');
console.log(n + ' lines')
console.log('')
var i = 0;
for (var j=0; j<n; j++){
var str = '';
while (true){
if (!words[i] || str.length + words[i].length > maxLength){
break
}
str += words[i++] + ' ';
}
console.log(str);
}

Using the Java String split() method to split a string: how and where to apply this string manipulation technique.
We'll look at what the Java split() method does and how to apply it. The principles are explained simply, with programming examples, either as a separate explanation or in the comments of the programs.
The Java String split() method divides the calling String into pieces and returns them as an array, as the name implies. The pieces are separated at the delimiter (for example "" or " ") or regular expression that we supply.
Syntax
String[ ] split(String regExp)
First Case: It involves initializing a Java String variable with a variety of words separated by spaces, using the Java String Split() method, and evaluating the results. We can effectively print each word without the space using the Java Split() function.
Second Case: In this case, we initialize a Java String variable and attempt to split or deconstruct the main String variable to use the String Split() method utilizing a substring of the initialized String variable.
Third Case: In this case, we will attempt to split a String using its character by taking a String variable (a single word).
You can check out other approaches to this problem on YouTube and even coding websites on google such as Coding Ninjas

This old question was revived by a recent answer, and I think I have a simpler technique than the answers so far:
const evenSplit = (text = '', lines = 1) => {
if (lines < 2) {return [text]}
const baseIndex = Math .round (text .length / lines)
const before = text .slice (0, baseIndex) .lastIndexOf (' ')
const after = text .slice (baseIndex) .indexOf (' ') + baseIndex
const index = after - baseIndex < baseIndex - before ? after : before
return [
text .slice (0, index),
... evenSplit (text .slice (index + (before > -1 ? 1 : 0)), lines - 1)
]
}
const text = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`
const display = (lines) => console .log (lines .join ('\n'))
display (evenSplit (text, 7))
display (evenSplit (text, 5))
display (evenSplit (text, 12))
display (evenSplit (`this should be three lines, but it has a loooooooooooooooooooooooooooooooong word`, 3))
It works by finding the first line then recurring on the remaining text with one fewer lines. The recursion bottoms out when we have a single line. To calculate the first line, we take an initial target index which is just an equal share of the string based on its length and the number of lines. We then check to find the closest space to that index, and split the string there.
It does no optimization, and could certainly be occasionally misled by long words, but mostly it just seems to work.
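Another option, sketched here under the assumption that there are at least as many words as requested lines, is to work on whole words and recompute the target width for the lines that remain after each split. splitIntoLines is a hypothetical name, and the tie-breaking (prefer the longer line when both are equally close to the target) is an arbitrary choice.

```javascript
// Greedy split on spaces: each line grows while that brings it closer to
// the average length of the text that is still left to distribute.
function splitIntoLines(text, lineCount) {
  const words = text.trim().split(/\s+/);
  const lines = [];
  let start = 0;
  for (let k = 0; k < lineCount && start < words.length; k++) {
    const linesLeft = lineCount - k;
    if (linesLeft === 1) {
      lines.push(words.slice(start).join(" ")); // last line takes the rest
      break;
    }
    const remaining = words.slice(start).join(" ").length;
    const target = remaining / linesLeft;
    let end = start + 1;               // always take at least one word
    let length = words[start].length;
    // Leave at least one word for every later line.
    while (end < words.length - (linesLeft - 1)) {
      const next = length + 1 + words[end].length;
      if (Math.abs(next - target) > Math.abs(length - target)) break;
      length = next;
      end++;
    }
    lines.push(words.slice(start, end).join(" "));
    start = end;
  }
  return lines;
}

console.log(splitIntoLines("an apple a day keeps the doctor away", 3));
```

Like the recursive version, this does no global optimization, so a very long word can still produce one oversized line.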

Splitting the binary string in half

I am trying to split a binary number in half and then just add 4 zeroes.
For example for 10111101 I want to end up with only the first half of the number and make the rest of the number zeroes. What I want to end up would be 10110000.
Can you help me with this?

Use substring to split and then looping to pad
var str = '10111101';
var output = str.substring( 0, str.length/2 );
for ( var counter = 0; counter < str.length/2; counter++ )
{
output += "0";
}
alert(output)

try this (one-liner)
var binary_str = '10111101';
var padded_binary = binary_str.slice(0, binary_str.length/2) + new Array(binary_str.length/2+1).join('0');
console.log([binary_str,padded_binary]);
sample output
['10111101','10110000']
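For strings of any length, the same result can be sketched with slice and padEnd (standard since ES2017). The function name zeroSecondHalf is made up for this example, and rounding the kept half down for odd lengths is an assumption.

```javascript
// Keep the first half of the digits and fill the rest with zeros.
function zeroSecondHalf(bits) {
  const keep = Math.floor(bits.length / 2); // digits to keep from the left
  return bits.slice(0, keep).padEnd(bits.length, "0");
}

console.log(zeroSecondHalf("10111101")); // 10110000
```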

I guess you are using JavaScript...
"10111101".substr(0, 4) + "0000";

It's a bit unclear if you are trying to operate on numbers or strings. The answers already given do a good job of showing how to operate on strings. If you want to operate with numbers only, you can do something like:
// count the number of leading 0s in a 32-bit word
function nlz32 (word) {
var count;
for (count = 0; count < 32; count ++) {
if (word & (1 << (31 - count))) {
break;
}
}
return count;
}
function zeroBottomHalf (num) {
var digits = 32 - nlz32(num); // count # of digits in num
var half = Math.floor(digits / 2);// how many to set to 0
var lowerMask = (1 << half) - 1; //mask for lower bits: 0b00001111
var upperMask = ~lowerMask //mask for upper bits: 0b11110000
return num & upperMask;
}
var before = 0b10111101;
var after = zeroBottomHalf(before);
console.log('before = ', before.toString(2)); // outputs: 10111101
console.log('after = ', after.toString(2)); // outputs: 10110000
In practice, it is probably simplest to convert your number to a string with num.toString(2), then operate on it like a string as in one of the other answers. At the end you can convert back to a number with parseInt(str, 2).
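As a side note, the leading-zero count can also come from the built-in Math.clz32 (ES2015). The sketch below assumes a non-negative integer that fits in 32 bits. zeroLowHalf is a name chosen here, not something from the code above.

```javascript
// Clear the lower half of the significant binary digits of num.
function zeroLowHalf(num) {
  const digits = 32 - Math.clz32(num);  // number of significant binary digits
  const half = Math.floor(digits / 2);  // how many low bits to clear
  // Shift the low bits out and back in as zeros; >>> 0 keeps the result unsigned.
  return ((num >>> half) << half) >>> 0;
}

console.log(zeroLowHalf(0b10111101).toString(2)); // 10110000
```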

If you have a real number, not string, then just use binary arithmetic. Assuming your number is always 8 binary digits long - your question is kinda vague on that - it'd be simply:
console.log((0b10111101 & 0b11110000).toString(2))
// 10110000
