Sort array by specific array objects column [duplicate] - javascript

I have a list of objects I wish to sort based on a field attr of type string. I tried using -
list.sort(function (a, b) {
return a.attr - b.attr
})
but found that - doesn't appear to work with strings in JavaScript. How can I sort a list of objects based on an attribute with type string?

Use String.prototype.localeCompare as per your example:
list.sort(function (a, b) {
return ('' + a.attr).localeCompare(b.attr);
})
We force a.attr to be a string to avoid exceptions. localeCompare has been supported since Internet Explorer 6 and Firefox 1. You may also see the following code used that doesn't respect a locale:
if (item1.attr < item2.attr)
return -1;
if ( item1.attr > item2.attr)
return 1;
return 0;

An updated answer (October 2014)
I was really annoyed about this string natural sorting order so I took quite some time to investigate this issue.
Long story short
localeCompare() character support is badass, just use it.
As pointed out by Shog9, the answer to your question is:
return item1.attr.localeCompare(item2.attr);
Bugs found in all the custom JavaScript "natural string sort order" implementations
There are quite a bunch of custom implementations out there, trying to do string comparison more precisely called "natural string sort order"
When "playing" with these implementations, I always noticed some strange "natural sorting order" choice, or rather mistakes (or omissions in the best cases).
Typically, special characters (space, dash, ampersand, brackets, and so on) are not processed correctly.
You will then find them appearing mixed up in different places, typically that could be:
some will be between the uppercase 'Z' and the lowercase 'a'
some will be between the '9' and the uppercase 'A'
some will be after lowercase 'z'
When one would have expected special characters to all be "grouped" together in one place, except for the space special character maybe (which would always be the first character). That is, either all before numbers, or all between numbers and letters (lowercase & uppercase being "together" one after another), or all after letters.
My conclusion is that they all fail to provide a consistent order when I start adding barely unusual characters (i.e., characters with diacritics or characters such as dash, exclamation mark and so on).
Research on the custom implementations:
Natural Compare Lite https://github.com/litejs/natural-compare-lite : Fails at sorting consistently https://github.com/litejs/natural-compare-lite/issues/1 and http://jsbin.com/bevututodavi/1/edit?js,console, basic Latin characters sorting http://jsbin.com/bevututodavi/5/edit?js,console
Natural Sort https://github.com/javve/natural-sort : Fails at sorting consistently, see issue https://github.com/javve/natural-sort/issues/7 and see basic Latin characters sorting http://jsbin.com/cipimosedoqe/3/edit?js,console
JavaScript Natural Sort https://github.com/overset/javascript-natural-sort: seems rather neglected since February 2012, Fails at sorting consistently, see issue https://github.com/overset/javascript-natural-sort/issues/16
Alphanum http://www.davekoelle.com/files/alphanum.js , Fails at sorting consistently, see http://jsbin.com/tuminoxifuyo/1/edit?js,console
Browsers' native "natural string sort order" implementations via localeCompare()
localeCompare() oldest implementation (without the locales and options arguments) is supported by Internet Explorer 6 and later, see http://msdn.microsoft.com/en-us/library/ie/s4esdbwz(v=vs.94).aspx (scroll down to localeCompare() method).
The built-in localeCompare() method does a much better job at sorting, even international & special characters.
The only problem using the localeCompare() method is that "the locale and sort order used are entirely implementation dependent". In other words, when using localeCompare such as stringOne.localeCompare(stringTwo): Firefox, Safari, Chrome, and Internet Explorer have a different sort order for Strings.
Research on the browser-native implementations:
http://jsbin.com/beboroyifomu/1/edit?js,console - basic Latin characters comparison with localeCompare()
http://jsbin.com/viyucavudela/2/ - basic Latin characters comparison with localeCompare() for testing on Internet Explorer 8
http://jsbin.com/beboroyifomu/2/edit?js,console - basic Latin characters in string comparison : consistency check in string vs when a character is alone
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare - Internet Explorer 11 and later supports the new locales & options arguments
Difficulty of "string natural sorting order"
Implementing a solid algorithm (meaning: consistent but also covering a wide range of characters) is a very tough task. UTF-8 contains more than 2000 characters and covers more than 120 scripts (languages).
Finally, there are some specification for this tasks, it is called the "Unicode Collation Algorithm", which can be found at http://www.unicode.org/reports/tr10/. You can find more information about this on this question I posted https://softwareengineering.stackexchange.com/questions/257286/is-there-any-language-agnostic-specification-for-string-natural-sorting-order
Final conclusion
So considering the current level of support provided by the JavaScript custom implementations I came across, we will probably never see anything getting any close to supporting all this characters and scripts (languages). Hence I would rather use the browsers' native localeCompare() method. Yes, it does have the downside of being non-consistent across browsers but basic testing shows it covers a much wider range of characters, allowing solid & meaningful sort orders.
So as pointed out by Shog9, the answer to your question is:
return item1.attr.localeCompare(item2.attr);
Further reading:
https://softwareengineering.stackexchange.com/questions/257286/is-there-any-language-agnostic-specification-for-string-natural-sorting-order
How to sort strings in JavaScript
Natural sort of alphanumerical strings in JavaScript
Sort Array of numeric & alphabetical elements (Natural Sort)
Sort mixed alpha/numeric array
https://web.archive.org/web/20130929122019/http://my.opera.com/GreyWyvern/blog/show.dml/1671288
https://web.archive.org/web/20131005224909/http://www.davekoelle.com/alphanum.html
http://snipplr.com/view/36012/javascript-natural-sort/
http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
Thanks to Shog9's nice answer, which put me in the "right" direction I believe.

Answer (in Modern ECMAScript)
list.sort((a, b) => (a.attr > b.attr) - (a.attr < b.attr))
Or
list.sort((a, b) => +(a.attr > b.attr) || -(a.attr < b.attr))
Description
Casting a boolean value to a number yields the following:
true -> 1
false -> 0
Consider three possible patterns:
x is larger than y: (x > y) - (y < x) -> 1 - 0 -> 1
x is equal to y: (x > y) - (y < x) -> 0 - 0 -> 0
x is smaller than y: (x > y) - (y < x) -> 0 - 1 -> -1
(Alternative)
x is larger than y: +(x > y) || -(x < y) -> 1 || 0 -> 1
x is equal to y: +(x > y) || -(x < y) -> 0 || 0 -> 0
x is smaller than y: +(x > y) || -(x < y) -> 0 || -1 -> -1
So these logics are equivalent to typical sort comparator functions.
if (x == y) {
return 0;
}
return x > y ? 1 : -1;

Since strings can be compared directly in JavaScript, this will do the job:
list.sort(function (a, b) {
return a.attr < b.attr ? -1: 1;
})
This is a little bit more efficient than using
return a.attr > b.attr ? 1: -1;
because in case of elements with same attr (a.attr == b.attr), the sort function will swap the two for no reason.
For example
var so1 = function (a, b) { return a.atr > b.atr ? 1: -1; };
var so2 = function (a, b) { return a.atr < b.atr ? -1: 1; }; // Better
var m1 = [ { atr: 40, s: "FIRST" }, { atr: 100, s: "LAST" }, { atr: 40, s: "SECOND" } ].sort (so1);
var m2 = [ { atr: 40, s: "FIRST" }, { atr: 100, s: "LAST" }, { atr: 40, s: "SECOND" } ].sort (so2);
// m1 sorted but ...: 40 SECOND 40 FIRST 100 LAST
// m2 more efficient: 40 FIRST 40 SECOND 100 LAST

You should use > or < and == here. So the solution would be:
list.sort(function(item1, item2) {
var val1 = item1.attr,
val2 = item2.attr;
if (val1 == val2) return 0;
if (val1 > val2) return 1;
if (val1 < val2) return -1;
});

Nested ternary arrow function
(a,b) => (a < b ? -1 : a > b ? 1 : 0)

I had been bothered about this for long, so I finally researched this and give you this long winded reason for why things are the way they are.
From the spec:
Section 11.9.4 The Strict Equals Operator ( === )
The production EqualityExpression : EqualityExpression === RelationalExpression
is evaluated as follows:
- Let lref be the result of evaluating EqualityExpression.
- Let lval be GetValue(lref).
- Let rref be the result of evaluating RelationalExpression.
- Let rval be GetValue(rref).
- Return the result of performing the strict equality comparison
rval === lval. (See 11.9.6)
So now we go to 11.9.6
11.9.6 The Strict Equality Comparison Algorithm
The comparison x === y, where x and y are values, produces true or false.
Such a comparison is performed as follows:
- If Type(x) is different from Type(y), return false.
- If Type(x) is Undefined, return true.
- If Type(x) is Null, return true.
- If Type(x) is Number, then
...
- If Type(x) is String, then return true if x and y are exactly the
same sequence of characters (same length and same characters in
corresponding positions); otherwise, return false.
That's it. The triple equals operator applied to strings returns true iff the arguments are exactly the same strings (same length and same characters in corresponding positions).
So === will work in the cases when we're trying to compare strings which might have arrived from different sources, but which we know will eventually have the same values - a common enough scenario for inline strings in our code. For example, if we have a variable named connection_state, and we wish to know which one of the following states ['connecting', 'connected', 'disconnecting', 'disconnected'] is it in right now, we can directly use the ===.
But there's more. Just above 11.9.4, there is a short note:
NOTE 4
Comparison of Strings uses a simple equality test on sequences of code
unit values. There is no attempt to use the more complex, semantically oriented
definitions of character or string equality and collating order defined in the
Unicode specification. Therefore Strings values that are canonically equal
according to the Unicode standard could test as unequal. In effect this
algorithm assumes that both Strings are already in normalized form.
Hmm. What now? Externally obtained strings can, and most likely will, be weird unicodey, and our gentle === won't do them justice. In comes localeCompare to the rescue:
15.5.4.9 String.prototype.localeCompare (that)
...
The actual return values are implementation-defined to permit implementers
to encode additional information in the value, but the function is required
to define a total ordering on all Strings and to return 0 when comparing
Strings that are considered canonically equivalent by the Unicode standard.
We can go home now.
tl;dr;
To compare strings in javascript, use localeCompare; if you know that the strings have no non-ASCII components because they are, for example, internal program constants, then === also works.

An explanation of why the approach in the question doesn't work:
let products = [
{ name: "laptop", price: 800 },
{ name: "phone", price:200},
{ name: "tv", price: 1200}
];
products.sort( (a, b) => {
{let value= a.name - b.name; console.log(value); return value}
});
> 2 NaN
Subtraction between strings returns NaN.
Echoing Alejadro's answer, the right approach is:
products.sort( (a,b) => a.name > b.name ? 1 : -1 )

A typescript sorting method modifier using a custom function to return a sorted string in either ascending or descending order
const data = ["jane", "mike", "salome", "ababus", "buisa", "dennis"];
const sortStringArray = (stringArray: string[], mode?: 'desc' | 'asc') => {
if (!mode || mode === 'asc') {
return stringArray.sort((a, b) => a.localeCompare(b))
}
return stringArray.sort((a, b) => b.localeCompare(a))
}
console.log(sortStringArray(data, 'desc'));// [ 'salome', 'mike', 'jane', 'dennis', 'buisa', 'ababus' ]
console.log(sortStringArray(data, 'asc')); // [ 'ababus', 'buisa', 'dennis', 'jane', 'mike', 'salome' ]

There should be ascending and descending orders functions
if (order === 'asc') {
return a.localeCompare(b);
}
return b.localeCompare(a);

If you want to control locales (or case or accent), then use Intl.collator:
const collator = new Intl.Collator();
list.sort((a, b) => collator.compare(a.attr, b.attr));
You can construct a collator like:
new Intl.Collator("en");
new Intl.Collator("en", {sensitivity: "case"});
...
See the above link for documentation.
Note: unlike some other solutions, it handles null, undefined the JavaScript way, i.e., moves them to the end.

Use sort() straightforward without any - or <
const areas = ['hill', 'beach', 'desert', 'mountain']
console.log(areas.sort())
// To print in descending way
console.log(areas.sort().reverse())

In your operation in your initial question, you are performing the following operation:
item1.attr - item2.attr
So, assuming those are numbers (i.e. item1.attr = "1", item2.attr = "2") You still may use the "===" operator (or other strict evaluators) provided that you ensure type. The following should work:
return parseInt(item1.attr) - parseInt(item2.attr);
If they are alphaNumeric, then do use localCompare().

list.sort(function(item1, item2){
return +(item1.attr > item2.attr) || +(item1.attr === item2.attr) - 1;
})
How they work samples:
+('aaa'>'bbb')||+('aaa'==='bbb')-1
+(false)||+(false)-1
0||0-1
-1
+('bbb'>'aaa')||+('bbb'==='aaa')-1
+(true)||+(false)-1
1||0-1
1
+('aaa'>'aaa')||+('aaa'==='aaa')-1
+(false)||+(true)-1
0||1-1
0

var str = ['v','a','da','c','k','l']
var b = str.join('').split('').sort().reverse().join('')
console.log(b)

<!doctype html>
<html>
<body>
<p id = "myString">zyxtspqnmdba</p>
<p id = "orderedString"></p>
<script>
var myString = document.getElementById("myString").innerHTML;
orderString(myString);
function orderString(str) {
var i = 0;
var myArray = str.split("");
while (i < str.length){
var j = i + 1;
while (j < str.length) {
if (myArray[j] < myArray[i]){
var temp = myArray[i];
myArray[i] = myArray[j];
myArray[j] = temp;
}
j++;
}
i++;
}
var newString = myArray.join("");
document.getElementById("orderedString").innerHTML = newString;
}
</script>
</body>
</html>

Related

getting wrong answer for finding largest of 2 numbers using readline-sync in js [duplicate]

I store some parameters client-side in HTML and then need to compare them as integers. Unfortunately I have come across a serious bug that I cannot explain. The bug seems to be that my JS reads parameters as strings rather than integers, causing my integer comparisons to fail.
I have generated a small example of the error, which I also can't explain. The following returns 'true' when run:
console.log("2" > "10")
Parse the string into an integer using parseInt:
javascript:alert(parseInt("2", 10)>parseInt("10", 10))
Checking that strings are integers is separate to comparing if one is greater or lesser than another. You should always compare number with number and string with string as the algorithm for dealing with mixed types not easy to remember.
'00100' < '1' // true
as they are both strings so only the first zero of '00100' is compared to '1' and because it's charCode is lower, it evaluates as lower.
However:
'00100' < 1 // false
as the RHS is a number, the LHS is converted to number before the comparision.
A simple integer check is:
function isInt(n) {
return /^[+-]?\d+$/.test(n);
}
It doesn't matter if n is a number or integer, it will be converted to a string before the test.
If you really care about performance, then:
var isInt = (function() {
var re = /^[+-]?\d+$/;
return function(n) {
return re.test(n);
}
}());
Noting that numbers like 1.0 will return false. If you want to count such numbers as integers too, then:
var isInt = (function() {
var re = /^[+-]?\d+$/;
var re2 = /\.0+$/;
return function(n) {
return re.test((''+ n).replace(re2,''));
}
}());
Once that test is passed, converting to number for comparison can use a number of methods. I don't like parseInt() because it will truncate floats to make them look like ints, so all the following will be "equal":
parseInt(2.9) == parseInt('002',10) == parseInt('2wewe')
and so on.
Once numbers are tested as integers, you can use the unary + operator to convert them to numbers in the comparision:
if (isInt(a) && isInt(b)) {
if (+a < +b) {
// a and b are integers and a is less than b
}
}
Other methods are:
Number(a); // liked by some because it's clear what is happening
a * 1 // Not really obvious but it works, I don't like it
Comparing Numbers to String Equivalents Without Using parseInt
console.log(Number('2') > Number('10'));
console.log( ('2'/1) > ('10'/1) );
var item = { id: 998 }, id = '998';
var isEqual = (item.id.toString() === id.toString());
isEqual;
use parseInt and compare like below:
javascript:alert(parseInt("2")>parseInt("10"))
Always remember when we compare two strings.
the comparison happens on chacracter basis.
so '2' > '12' is true because the comparison will happen as
'2' > '1' and in alphabetical way '2' is always greater than '1' as unicode.
SO it will comeout true.
I hope this helps.
You can use Number() function also since it converts the object argument to a number that represents the object's value.
Eg: javascript:alert( Number("2") > Number("10"))
+ operator will coerce the string to a number.
console.log( +"2" > +"10" )
The answer is simple. Just divide string by 1.
Examples:
"2" > "10" - true
but
"2"/1 > "10"/1 - false
Also you can check if string value really is number:
!isNaN("1"/1) - true (number)
!isNaN("1a"/1) - false (string)
!isNaN("01"/1) - true (number)
!isNaN(" 1"/1) - true (number)
!isNaN(" 1abc"/1) - false (string)
But
!isNaN(""/1) - true (but string)
Solution
number !== "" && !isNaN(number/1)
The alert() wants to display a string, so it will interpret "2">"10" as a string.
Use the following:
var greater = parseInt("2") > parseInt("10");
alert("Is greater than? " + greater);
var less = parseInt("2") < parseInt("10");
alert("Is less than? " + less);

Sorting alphanumeric array with java-script

I have a alphanumeric array -
var a1 =["3800tl-24G-2XG-PoEP", "3500", "stack-TA", "2620-48-PoEP", "2120", "3800tl-24G-2XG-PoEP"];
I tried sorting this array using natsort which gives me the correct output -
`Array ( [4] => 2120 [3] => 2620-48-PoEP [1] => 3500 [0] => 3800tl-24G-2XG-PoEP [5] => 3800tl-24G-2XG-PoEP [2] => stack-TA )`
I tried doing the same with javascript -
var a1 =["3800tl-24G-2XG-PoEP", "3500", "stack-TA", "2620-48-PoEP", "2120", "3800tl-24G-2XG-PoEP"];
var a2 = a1.sort(function(a,b){
var charPart = [a.substring(0,1), b.substring(0,1)],
numPart = [a.substring(1)*1, b.substring(1)*1];
if(charPart[0] < charPart[1]) return -1;
else if(charPart[0] > charPart[1]) return 1;
else{ //(charPart[0] == charPart[1]){
if(numPart[0] < numPart[1]) return -1;
else if(numPart[0] > numPart[1]) return 1;
return 0;
}
});
$('#r').html(a2.toString())
which gives me the output -
2620-48-PoEP,2120,3800tl-24G-2XG-PoEP,3500,3800tl-24G-2XG-PoEP,stack-TA.
Why the javascript code is not taking 2120 as the 1st element.
What is wrong with the javascript code ?
Replace this line...
[a.substring(1)*1, b.substring(1)*1]
... with ...
[parseInt(a.substring(1), 10), parseInt(b.substring(1), 10)]
... and it will work (demo), though the sorting function still won't be as universal as possible.
Explanation: a.substr(1)*1 (better written as just +a.substr(1), btw) casts that substring into Number type. In theory, that should give you a real number. In JS practice, however, it'll fail (giving NaN) if the rest of the string is not a complete representation of a number, having nothing except but that number.
For example, for 2620-48-PoEP line, '620-48-PoEP' is cast to Number. Now JS cannot interpret it as 620 - it'll be too bold for it, as there are some other meaningful (=non-whitespace) characters there. So instead of giving you 620, it gives you NaN.
Now, any comparison between NaN and NaN is false (expect !=) - and that'll essentially make your sorting function gone haywire.
As a sidenote, array.sort updates the object in-place - you don't need to use another reference to it (a2 variable), you can work with a1.

Sorting of Javascript array [duplicate]

This question already has answers here:
How to perform case-insensitive sorting array of string in JavaScript?
(15 answers)
Closed 8 years ago.
Say if I had a Javascript array as following and I sorted the array before printing it to console
var f = ["Salil", "Sam", "Anil", "John", "ajay", "Baba"];
f.sort();
console.log(f);
It prints as follows
Anil,Baba,John,Salil,Sameer,ajay
Why does ajay prints last, how does the case of the first letter plays a role?
thanks,
Salil
The lower-case letters come after the upper-case letters in Unicode (in the base code page at least).
You can sort so that you pay no attention to case:
f.sort(function(w1, w2) {
w1 = w1.toLowerCase(); w2 = w2.toLowerCase();
return w1 < w2 ? -1 : w2 < w1 ? 1 : 0;
}
More complicated collations require more complicated comparison routines. The function you pass to .sort() should take two arguments, those being two members of the array to be compared. It's pretty important that your comparator function always returns the same answer for any two strings (in either order!), or else things can get weird.
Because lowercase letters come after uppercase letters.
To enable caseless sorting, try this:
f.sort(function(a,b) {return a.toLowerCase() < b.toLowerCase() ? -1 : 1;});
Here is an example in long form to make it a bit clearer with the whole if/else if/else chain for returning -1/1/0 from the sort comparison:
var caseInsensitiveComparator = function(a, b) {
var _a = a.toLowerCase(),
_b = b.toLowerCase();
if (_a < _b) {
return -1;
} else if (_b < _a) {
return 1;
} else {
return 0;
}
};
var f = ["Salil", "Sam", "Anil", "John", "ajay", "Baba"];
f.sort(caseInsensitiveComparator);
console.log(f);
> ["ajay", "Anil", "Baba", "John", "Salil", "Sam"]
The Mozilla Developer Network page on Array.prototype.sort has an explanation on how sort works.

How to compare Unicode strings in Javascript?

When I wrote in JavaScript "Ł" > "Z" it returns true. In Unicode order it should be of course false. How to fix this? My site is using UTF-8.
You can use Intl.Collator or String.prototype.localeCompare, introduced by ECMAScript Internationalization API:
"Ł".localeCompare("Z", "pl"); // -1
new Intl.Collator("pl").compare("Ł","Z"); // -1
-1 means that Ł comes before Z, like you want.
Note it only works on latest browsers, though.
Here is an example for the french alphabet that could help you for a custom sort:
var alpha = function(alphabet, dir, caseSensitive){
return function(a, b){
var pos = 0,
min = Math.min(a.length, b.length);
dir = dir || 1;
caseSensitive = caseSensitive || false;
if(!caseSensitive){
a = a.toLowerCase();
b = b.toLowerCase();
}
while(a.charAt(pos) === b.charAt(pos) && pos < min){ pos++; }
return alphabet.indexOf(a.charAt(pos)) > alphabet.indexOf(b.charAt(pos)) ?
dir:-dir;
};
};
To use it on an array of strings a:
a.sort(
alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZaàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz')
);
Add 1 or -1 as the second parameter of alpha() to sort ascending or descending.
Add true as the 3rd parameter to sort case sensitive.
You may need to add numbers and special chars to the alphabet list
You may be able to build your own sorting function using localeCompare() that - at least according to the MDC article on the topic - should sort things correctly.
If that doesn't work out, here is an interesting SO question where the OP employs string replacement to build a "brute-force" sorting mechanism.
Also in that question, the OP shows how to build a custom textExtract function for the jQuery tablesorter plugin that does locale-aware sorting - maybe also worth a look.
Edit: As a totally far-out idea - I have no idea whether this is feasible at all, especially because of performance concerns - if you are working with PHP/mySQL on the back-end anyway, I would like to mention the possibility of sending an Ajax query to a mySQL instance to have it sorted there. mySQL is great at sorting locale aware data, because you can force sorting operations into a specific collation using e.g. ORDER BY xyz COLLATE utf8_polish_ci, COLLATE utf8_german_ci.... those collations would take care of all sorting woes at once.
Mic's code improved for non-mentioned chars:
var alpha = function(alphabet, dir, caseSensitive){
dir = dir || 1;
function compareLetters(a, b) {
var ia = alphabet.indexOf(a);
var ib = alphabet.indexOf(b);
if(ia === -1 || ib === -1) {
if(ib !== -1)
return a > 'a';
if(ia !== -1)
return 'a' > b;
return a > b;
}
return ia > ib;
}
return function(a, b){
var pos = 0;
var min = Math.min(a.length, b.length);
caseSensitive = caseSensitive || false;
if(!caseSensitive){
a = a.toLowerCase();
b = b.toLowerCase();
}
while(a.charAt(pos) === b.charAt(pos) && pos < min){ pos++; }
return compareLetters(a.charAt(pos), b.charAt(pos)) ? dir:-dir;
};
};
function assert(bCondition, sErrorMessage) {
if (!bCondition) {
throw new Error(sErrorMessage);
}
}
assert(alpha("bac")("a", "b") === 1, "b is first than a");
assert(alpha("abc")("ac", "a") === 1, "shorter string is first than longer string");
assert(alpha("abc")("1abc", "0abc") === 1, "non-mentioned chars are compared as normal");
assert(alpha("abc")("0abc", "1abc") === -1, "non-mentioned chars are compared as normal [2]");
assert(alpha("abc")("0abc", "bbc") === -1, "non-mentioned chars are compared with mentioned chars in special way");
assert(alpha("abc")("zabc", "abc") === 1, "non-mentioned chars are compared with mentioned chars in special way [2]");
You have to keep two sortkey strings. One is for primary order, where German ä=a (primary a->a) and French é=e (primary sortkey e->e) and one for secondary order, where ä comes after a (translating a->azzzz in secondary key) or é comes after e (secondary key e->ezzzz). Especially in Czech some letters are variations of a letter (áéí…) whereas others stand in their full right in the list (ABCČD…GHChI…RŘSŠT…). Plus the problem to consider digraphs a single letters (primary ch->hzzzz). No trivial problem, and there should be a solution within JS.
Funny, I have to think about that problem and finished searching here, because it came in mind, that I can use my own javascript module. I wrote a module to generate a clean URL, therefor I have to translitate the input string... (http://pid.github.io/speakingurl/)
var mySlug = require('speakingurl').createSlug({
maintainCase: true,
separator: " "
});
var input = "Schöner Titel läßt grüßen!? Bel été !";
var result;
slug = mySlug(input);
console.log(result); // Output: "Schoener Titel laesst gruessen bel ete"
Now you can sort with this results. You can ex. store the original titel in the field "title" and the field for sorting in "title_sort" with the result of mySlug.

Find the longest common starting substring in a set of strings [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
This is a challenge to come up with the most elegant JavaScript, Ruby or other solution to a relatively trivial problem.
This problem is a more specific case of the Longest common substring problem. I need to only find the longest common starting substring in an array. This greatly simplifies the problem.
For example, the longest substring in [interspecies, interstelar, interstate] is "inters". However, I don't need to find "ific" in [specifics, terrific].
I've solved the problem by quickly coding up a solution in JavaScript as a part of my answer about shell-like tab-completion (test page here). Here is that solution, slightly tweaked:
function common_substring(data) {
var i, ch, memo, idx = 0
do {
memo = null
for (i=0; i < data.length; i++) {
ch = data[i].charAt(idx)
if (!ch) break
if (!memo) memo = ch
else if (ch != memo) break
}
} while (i == data.length && idx < data.length && ++idx)
return (data[0] || '').slice(0, idx)
}
This code is available in this Gist along with a similar solution in Ruby. You can clone the gist as a git repo to try it out:
$ git clone git://gist.github.com/257891.git substring-challenge
I'm not very happy with those solutions. I have a feeling they might be solved with more elegance and less execution complexity—that's why I'm posting this challenge.
I'm going to accept as an answer the solution I find the most elegant or concise. Here is for instance a crazy Ruby hack I come up with—defining the & operator on String:
# works with Ruby 1.8.7 and above
class String
def &(other)
difference = other.to_str.each_char.with_index.find { |ch, idx|
self[idx].nil? or ch != self[idx].chr
}
difference ? self[0, difference.last] : self
end
end
class Array
def common_substring
self.inject(nil) { |memo, str| memo.nil? ? str : memo & str }.to_s
end
end
Solutions in JavaScript or Ruby are preferred, but you can show off clever solution in other languages as long as you explain what's going on. Only code from standard library please.
Update: my favorite solutions
I've chosen the JavaScript sorting solution by kennebec as the "answer" because it struck me as both unexpected and genius. If we disregard the complexity of actual sorting (let's imagine it's infinitely optimized by the language implementation), the complexity of the solution is just comparing two strings.
Other great solutions:
"regex greed" by FM takes a minute or two to grasp, but then the elegance of it hits you. Yehuda Katz also made a regex solution, but it's more complex
commonprefix in Python — Roberto Bonvallet used a feature made for handling filesystem paths to solve this problem
Haskell one-liner is short as if it were compressed, and beautiful
the straightforward Ruby one-liner
Thanks for participating! As you can see from the comments, I learned a lot (even about Ruby).
It's a matter of taste, but this is a simple javascript version:
It sorts the array, and then looks just at the first and last items.
//longest common starting substring in an array
function sharedStart(array){
var A= array.concat().sort(),
a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;
while(i<L && a1.charAt(i)=== a2.charAt(i)) i++;
return a1.substring(0, i);
}
DEMOS
sharedStart(['interspecies', 'interstelar', 'interstate']) //=> 'inters'
sharedStart(['throne', 'throne']) //=> 'throne'
sharedStart(['throne', 'dungeon']) //=> ''
sharedStart(['cheese']) //=> 'cheese'
sharedStart([]) //=> ''
sharedStart(['prefix', 'suffix']) //=> ''
In Python:
>>> from os.path import commonprefix
>>> commonprefix('interspecies interstelar interstate'.split())
'inters'
Ruby one-liner:
l=strings.inject{|l,s| l=l.chop while l!=s[0...l.length];l}
My Haskell one-liner:
import Data.List
commonPre :: [String] -> String
commonPre = map head . takeWhile (\(x:xs)-> all (==x) xs) . transpose
EDIT: barkmadley gave a good explanation of the code below. I'd also add that haskell uses lazy evaluation, so we can be lazy about our use of transpose; it will only transpose our lists as far as necessary to find the end of the common prefix.
You just need to traverse all strings until they differ, then take the substring up to this point.
Pseudocode:
loop for i upfrom 0
while all strings[i] are equal
finally return substring[0..i]
Common Lisp:
(defun longest-common-starting-substring (&rest strings)
(loop for i from 0 below (apply #'min (mapcar #'length strings))
while (apply #'char=
(mapcar (lambda (string) (aref string i))
strings))
finally (return (subseq (first strings) 0 i))))
Yet another way to do it: use regex greed.
words = %w(interspecies interstelar interstate)
j = '='
str = ['', *words].join(j)
re = "[^#{j}]*"
str =~ /\A
(?: #{j} ( #{re} ) #{re} )
(?: #{j} \1 #{re} )*
\z/x
p $1
And the one-liner, courtesy of mislav (50 characters):
p ARGV.join(' ').match(/^(\w*)\w*(?: \1\w*)*$/)[1]
In Python I wouldn't use anything but the existing commonprefix function I showed in another answer, but I couldn't help to reinvent the wheel :P. This is my iterator-based approach:
>>> a = 'interspecies interstelar interstate'.split()
>>>
>>> from itertools import takewhile, chain, izip as zip, imap as map
>>> ''.join(chain(*takewhile(lambda s: len(s) == 1, map(set, zip(*a)))))
'inters'
Edit: Explanation of how this works.
zip generates tuples of elements taking one of each item of a at a time:
In [6]: list(zip(*a)) # here I use list() to expand the iterator
Out[6]:
[('i', 'i', 'i'),
('n', 'n', 'n'),
('t', 't', 't'),
('e', 'e', 'e'),
('r', 'r', 'r'),
('s', 's', 's'),
('p', 't', 't'),
('e', 'e', 'a'),
('c', 'l', 't'),
('i', 'a', 'e')]
By mapping set over these items, I get a series of unique letters:
In [7]: list(map(set, _)) # _ means the result of the last statement above
Out[7]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s']),
set(['p', 't']),
set(['a', 'e']),
set(['c', 'l', 't']),
set(['a', 'e', 'i'])]
takewhile(predicate, items) takes elements from this while the predicate is True; in this particular case, when the sets have one element, i.e. all the words have the same letter at that position:
In [8]: list(takewhile(lambda s: len(s) == 1, _))
Out[8]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s'])]
At this point we have an iterable of sets, each containing one letter of the prefix we were looking for. To construct the string, we chain them into a single iterable, from which we get the letters to join into the final string.
The magic of using iterators is that all items are generated on demand, so when takewhile stops asking for items, the zipping stops at that point and no unnecessary work is done. Each function call in my one-liner has a implicit for and an implicit break.
This is probably not the most concise solution (depends if you already have a library for this), but one elegant method is to use a trie. I use tries for implementing tab completion in my Scheme interpreter:
http://github.com/jcoglan/heist/blob/master/lib/trie.rb
For example:
tree = Trie.new
%w[interspecies interstelar interstate].each { |s| tree[s] = true }
tree.longest_prefix('')
#=> "inters"
I also use them for matching channel names with wildcards for the Bayeux protocol; see these:
http://github.com/jcoglan/faye/blob/master/client/channel.js
http://github.com/jcoglan/faye/blob/master/lib/faye/channel.rb
Just for the fun of it, here's a version written in (SWI-)PROLOG:
common_pre([[C|Cs]|Ss], [C|Res]) :-
maplist(head_tail(C), [[C|Cs]|Ss], RemSs), !,
common_pre(RemSs, Res).
common_pre(_, []).
head_tail(H, [H|T], T).
Running:
?- S=["interspecies", "interstelar", "interstate"], common_pre(S, CP), string_to_list(CPString, CP).
Gives:
CP = [105, 110, 116, 101, 114, 115],
CPString = "inters".
Explanation:
(SWI-)PROLOG treats strings as lists of character codes (numbers). All the predicate common_pre/2 does is recursively pattern-match to select the first code (C) from the head of the first list (string, [C|Cs]) in the list of all lists (all strings, [[C|Cs]|Ss]), and appends the matching code C to the result iff it is common to all (remaining) heads of all lists (strings), else it terminates.
Nice, clean, simple and efficient... :)
A javascript version based on #Svante's algorithm:
function commonSubstring(words){
var iChar, iWord,
refWord = words[0],
lRefWord = refWord.length,
lWords = words.length;
for (iChar = 0; iChar < lRefWord; iChar += 1) {
for (iWord = 1; iWord < lWords; iWord += 1) {
if (refWord[iChar] !== words[iWord][iChar]) {
return refWord.substring(0, iChar);
}
}
}
return refWord;
}
Combining answers by kennebec, Florian F and jberryman yields the following Haskell one-liner:
commonPrefix l = map fst . takeWhile (uncurry (==)) $ zip (minimum l) (maximum l)
With Control.Arrow one can get a point-free form:
commonPrefix = map fst . takeWhile (uncurry (==)) . uncurry zip . (minimum &&& maximum)
Python 2.6 (r26:66714, Oct 4 2008, 02:48:43)
>>> a = ['interspecies', 'interstelar', 'interstate']
>>> print a[0][:max(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
) + 1]
inters
i for i in range(min(map(len, a))), number of maximum lookups can't be greater than the length of the shortest string; in this example this would evaluate to [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
len(set(map(lambda e: e[i], a))), 1) create an array of the i-thcharacter for each string in the list; 2) make a set out of it; 3) determine the size of the set
[i for i in range(min(map(len, a))) if len(set(map(lambda e: e[i], a))) == 1], include just the characters, for which the size of the set is 1 (all characters at that position were the same ..); here it would evaluate to [0, 1, 2, 3, 4, 5]
finally take the max, add one, and get the substring ...
Note: the above does not work for a = ['intersyate', 'intersxate', 'interstate', 'intersrate'], but this would:
>>> index = len(
filter(lambda l: l[0] == l[1],
[ x for x in enumerate(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
)]))
>>> a[0][:index]
inters
Doesn't seem that complicated if you're not too concerned about ultimate performance:
def common_substring(data)
data.inject { |m, s| s[0,(0..m.length).find { |i| m[i] != s[i] }.to_i] }
end
One of the useful features of inject is the ability to pre-seed with the first element of the array being interated over. This avoids the nil memo check.
puts common_substring(%w[ interspecies interstelar interstate ]).inspect
# => "inters"
puts common_substring(%w[ feet feel feeble ]).inspect
# => "fee"
puts common_substring(%w[ fine firkin fail ]).inspect
# => "f"
puts common_substring(%w[ alpha bravo charlie ]).inspect
# => ""
puts common_substring(%w[ fork ]).inspect
# => "fork"
puts common_substring(%w[ fork forks ]).inspect
# => "fork"
Update: If golf is the game here, then 67 characters:
def f(d)d.inject{|m,s|s[0,(0..m.size).find{|i|m[i]!=s[i]}.to_i]}end
This one is very similar to Roberto Bonvallet's solution, except in ruby.
chars = %w[interspecies interstelar interstate].map {|w| w.split('') }
chars[0].zip(*chars[1..-1]).map { |c| c.uniq }.take_while { |c| c.size == 1 }.join
The first line replaces each word with an array of chars. Next, I use zip to create this data structure:
[["i", "i", "i"], ["n", "n", "n"], ["t", "t", "t"], ...
map and uniq reduce this to [["i"],["n"],["t"], ...
take_while pulls the chars off the array until it finds one where the size isn't one (meaning not all chars were the same). Finally, I join them back together.
The accepted solution is broken (for example, it returns a for strings like ['a', 'ba']). The fix is very simple, you literally have to change only 3 characters (from indexOf(tem1) == -1 to indexOf(tem1) != 0) and the function would work as expected.
Unfortunately, when I tried to edit the answer to fix the typo, SO told me that "edits must be at least 6 characters". I could change more then those 3 chars, by improving naming and readability but that feels like a little bit too much.
So, below is a fixed and improved (at least from my point of view) version of the kennebec's solution:
function commonPrefix(words) {
max_word = words.reduce(function(a, b) { return a > b ? a : b });
prefix = words.reduce(function(a, b) { return a > b ? b : a }); // min word
while(max_word.indexOf(prefix) != 0) {
prefix = prefix.slice(0, -1);
}
return prefix;
}
(on jsFiddle)
Note, that it uses reduce method (JavaScript 1.8) in order to find alphanumeric max / min instead of sorting the array and then fetching the first and the last elements of it.
While reading these answers with all the fancy functional programming, sorting and regexes and whatnot, I just thought: what's wrong a little bit of C? So here's a goofy looking little program.
#include <stdio.h>
int main (int argc, char *argv[])
{
int i = -1, j, c;
if (argc < 2)
return 1;
while (c = argv[1][++i])
for (j = 2; j < argc; j++)
if (argv[j][i] != c)
goto out;
out:
printf("Longest common prefix: %.*s\n", i, argv[1]);
}
Compile it, run it with your list of strings as command line arguments, then upvote me for using goto!
Here's a solution using regular expressions in Ruby:
def build_regex(string)
arr = []
arr << string.dup while string.chop!
Regexp.new("^(#{arr.join("|")})")
end
def substring(first, *strings)
strings.inject(first) do |accum, string|
build_regex(accum).match(string)[0]
end
end
I would do the following:
Take the first string of the array as the initial starting substring.
Take the next string of the array and compare the characters until the end of one of the strings is reached or a mismatch is found. If a mismatch is found, reduce starting substring to the length where the mismatch was found.
Repeat step 2 until all strings have been tested.
Here’s a JavaScript implementation:
var array = ["interspecies", "interstelar", "interstate"],
prefix = array[0],
len = prefix.length;
for (i=1; i<array.length; i++) {
for (j=0, len=Math.min(len,array[j].length); j<len; j++) {
if (prefix[j] != array[i][j]) {
len = j;
prefix = prefix.substr(0, len);
break;
}
}
}
Instead of sorting, you could just get the min and max of the strings.
To me, elegance in a computer program is a balance of speed and simplicity.
It should not do unnecessary computation, and it should be simple enough to make its correctness evident.
I could call the sorting solution "clever", but not "elegant".
Oftentimes it's more elegant to use a mature open source library instead of rolling your own. Then, if it doesn't completely suit your needs, you can extend it or modify it to improve it, and let the community decide if that belongs in the library.
diff-lcs is a good Ruby gem for least common substring.
My solution in Java:
public static String compute(Collection<String> strings) {
if(strings.isEmpty()) return "";
Set<Character> v = new HashSet<Character>();
int i = 0;
try {
while(true) {
for(String s : strings) v.add(s.charAt(i));
if(v.size() > 1) break;
v.clear();
i++;
}
} catch(StringIndexOutOfBoundsException ex) {}
return strings.iterator().next().substring(0, i);
}
Golfed JS solution just for fun:
w=["hello", "hell", "helen"];
c=w.reduce(function(p,c){
for(r="",i=0;p[i]==c[i];r+=p[i],i++){}
return r;
});
Here's an efficient solution in ruby. I based the idea of the strategy for a hi/lo guessing game where you iteratively zero in on the longest prefix.
Someone correct me if I'm wrong, but I think the complexity is O(n log n), where n is the length of the shortest string and the number of strings is considered a constant.
def common(strings)
lo = 0
hi = strings.map(&:length).min - 1
return '' if hi < lo
guess, last_guess = lo, hi
while guess != last_guess
last_guess = guess
guess = lo + ((hi - lo) / 2.0).ceil
if strings.map { |s| s[0..guess] }.uniq.length == 1
lo = guess
else
hi = guess
end
end
strings.map { |s| s[0...guess] }.uniq.length == 1 ? strings.first[0...guess] : ''
end
And some checks that it works:
>> common %w{ interspecies interstelar interstate }
=> "inters"
>> common %w{ dog dalmation }
=> "d"
>> common %w{ asdf qwerty }
=> ""
>> common ['', 'asdf']
=> ""
Fun alternative Ruby solution:
def common_prefix(*strings)
chars = strings.map(&:chars)
length = chars.first.zip( *chars[1..-1] ).index{ |a| a.uniq.length>1 }
strings.first[0,length]
end
p common_prefix( 'foon', 'foost', 'forlorn' ) #=> "fo"
p common_prefix( 'foost', 'foobar', 'foon' ) #=> "foo"
p common_prefix( 'a','b' ) #=> ""
It might help speed if you used chars = strings.sort_by(&:length).map(&:chars), since the shorter the first string, the shorter the arrays created by zip. However, if you cared about speed, you probably shouldn't use this solution anyhow. :)
Javascript clone of AShelly's excellent answer.
Requires Array#reduce which is supported only in firefox.
var strings = ["interspecies", "intermediate", "interrogation"]
var sub = strings.reduce(function(l,r) {
while(l!=r.slice(0,l.length)) {
l = l.slice(0, -1);
}
return l;
});
This is by no means elegant, but if you want concise:
Ruby, 71 chars
def f(a)b=a[0];b[0,(0..b.size).find{|n|a.any?{|i|i[0,n]!=b[0,n]}}-1]end
If you want that unrolled it looks like this:
def f(words)
first_word = words[0];
first_word[0, (0..(first_word.size)).find { |num_chars|
words.any? { |word| word[0, num_chars] != first_word[0, num_chars] }
} - 1]
end
It's not code golf, but you asked for somewhat elegant, and I tend to think recursion is fun. Java.
/** Recursively find the common prefix. */
public String findCommonPrefix(String[] strings) {
int minLength = findMinLength(strings);
if (isFirstCharacterSame(strings)) {
return strings[0].charAt(0) + findCommonPrefix(removeFirstCharacter(strings));
} else {
return "";
}
}
/** Get the minimum length of a string in strings[]. */
private int findMinLength(final String[] strings) {
int length = strings[0].size();
for (String string : strings) {
if (string.size() < length) {
length = string.size();
}
}
return length;
}
/** Compare the first character of all strings. */
private boolean isFirstCharacterSame(String[] strings) {
char c = string[0].charAt(0);
for (String string : strings) {
if (c != string.charAt(0)) return false;
}
return true;
}
/** Remove the first character of each string in the array,
and return a new array with the results. */
private String[] removeFirstCharacter(String[] source) {
String[] result = new String[source.length];
for (int i=0; i<result.length; i++) {
result[i] = source[i].substring(1);
}
return result;
}
A ruby version based on #Svante's algorithm. Runs ~3x as fast as my first one.
def common_prefix set
i=0
rest=set[1..-1]
set[0].each_byte{|c|
rest.each{|e|return set[0][0...i] if e[i]!=c}
i+=1
}
set
end
My Javascript solution:
IMOP, using sort is too tricky.
My solution is compare letter by letter through looping the array.
Return string if letter is not macthed.
This is my solution:
var longestCommonPrefix = function(strs){
if(strs.length < 1){
return '';
}
var p = 0, i = 0, c = strs[0][0];
while(p < strs[i].length && strs[i][p] === c){
i++;
if(i === strs.length){
i = 0;
p++;
c = strs[0][p];
}
}
return strs[0].substr(0, p);
};
Realizing the risk of this turning into a match of code golf (or is that the intention?), here's my solution using sed, copied from my answer to another SO question and shortened to 36 chars (30 of which are the actual sed expression). It expects the strings (each on a seperate line) to be supplied on standard input or in files passed as additional arguments.
sed 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'
A script with sed in the shebang line weighs in at 45 chars:
#!/bin/sed -f
N;s/^\(.*\).*\n\1.*$/\1\n\1/;D
A test run of the script (named longestprefix), with strings supplied as a "here document":
$ ./longestprefix <<EOF
> interspecies
> interstelar
> interstate
> EOF
inters
$

Categories