How to compare Unicode strings in Javascript? - javascript

When I wrote in JavaScript "Ł" > "Z" it returns true. In Unicode order it should be of course false. How to fix this? My site is using UTF-8.

You can use Intl.Collator or String.prototype.localeCompare, introduced by ECMAScript Internationalization API:
"Ł".localeCompare("Z", "pl"); // -1
new Intl.Collator("pl").compare("Ł","Z"); // -1
-1 means that Ł comes before Z, like you want.
Note it only works on latest browsers, though.

Here is an example for the french alphabet that could help you for a custom sort:
var alpha = function(alphabet, dir, caseSensitive){
return function(a, b){
var pos = 0,
min = Math.min(a.length, b.length);
dir = dir || 1;
caseSensitive = caseSensitive || false;
if(!caseSensitive){
a = a.toLowerCase();
b = b.toLowerCase();
}
while(a.charAt(pos) === b.charAt(pos) && pos < min){ pos++; }
return alphabet.indexOf(a.charAt(pos)) > alphabet.indexOf(b.charAt(pos)) ?
dir:-dir;
};
};
To use it on an array of strings a:
a.sort(
alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZaàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz')
);
Add 1 or -1 as the second parameter of alpha() to sort ascending or descending.
Add true as the 3rd parameter to sort case sensitive.
You may need to add numbers and special chars to the alphabet list

You may be able to build your own sorting function using localeCompare() that - at least according to the MDC article on the topic - should sort things correctly.
If that doesn't work out, here is an interesting SO question where the OP employs string replacement to build a "brute-force" sorting mechanism.
Also in that question, the OP shows how to build a custom textExtract function for the jQuery tablesorter plugin that does locale-aware sorting - maybe also worth a look.
Edit: As a totally far-out idea - I have no idea whether this is feasible at all, especially because of performance concerns - if you are working with PHP/mySQL on the back-end anyway, I would like to mention the possibility of sending an Ajax query to a mySQL instance to have it sorted there. mySQL is great at sorting locale aware data, because you can force sorting operations into a specific collation using e.g. ORDER BY xyz COLLATE utf8_polish_ci, COLLATE utf8_german_ci.... those collations would take care of all sorting woes at once.

Mic's code improved for non-mentioned chars:
var alpha = function(alphabet, dir, caseSensitive){
dir = dir || 1;
function compareLetters(a, b) {
var ia = alphabet.indexOf(a);
var ib = alphabet.indexOf(b);
if(ia === -1 || ib === -1) {
if(ib !== -1)
return a > 'a';
if(ia !== -1)
return 'a' > b;
return a > b;
}
return ia > ib;
}
return function(a, b){
var pos = 0;
var min = Math.min(a.length, b.length);
caseSensitive = caseSensitive || false;
if(!caseSensitive){
a = a.toLowerCase();
b = b.toLowerCase();
}
while(a.charAt(pos) === b.charAt(pos) && pos < min){ pos++; }
return compareLetters(a.charAt(pos), b.charAt(pos)) ? dir:-dir;
};
};
function assert(bCondition, sErrorMessage) {
if (!bCondition) {
throw new Error(sErrorMessage);
}
}
assert(alpha("bac")("a", "b") === 1, "b is first than a");
assert(alpha("abc")("ac", "a") === 1, "shorter string is first than longer string");
assert(alpha("abc")("1abc", "0abc") === 1, "non-mentioned chars are compared as normal");
assert(alpha("abc")("0abc", "1abc") === -1, "non-mentioned chars are compared as normal [2]");
assert(alpha("abc")("0abc", "bbc") === -1, "non-mentioned chars are compared with mentioned chars in special way");
assert(alpha("abc")("zabc", "abc") === 1, "non-mentioned chars are compared with mentioned chars in special way [2]");

You have to keep two sortkey strings. One is for primary order, where German ä=a (primary a->a) and French é=e (primary sortkey e->e) and one for secondary order, where ä comes after a (translating a->azzzz in secondary key) or é comes after e (secondary key e->ezzzz). Especially in Czech some letters are variations of a letter (áéí…) whereas others stand in their full right in the list (ABCČD…GHChI…RŘSŠT…). Plus the problem to consider digraphs a single letters (primary ch->hzzzz). No trivial problem, and there should be a solution within JS.

Funny, I have to think about that problem and finished searching here, because it came in mind, that I can use my own javascript module. I wrote a module to generate a clean URL, therefor I have to translitate the input string... (http://pid.github.io/speakingurl/)
var mySlug = require('speakingurl').createSlug({
maintainCase: true,
separator: " "
});
var input = "Schöner Titel läßt grüßen!? Bel été !";
var result;
slug = mySlug(input);
console.log(result); // Output: "Schoener Titel laesst gruessen bel ete"
Now you can sort with this results. You can ex. store the original titel in the field "title" and the field for sorting in "title_sort" with the result of mySlug.

Related

Check length between two characters in a string

So I've been working on this coding challenge for about a day now and I still feel like I haven't scratched the surface even though it's suppose to be Easy. The problem asks us to take a string parameter and if there are exactly 3 characters (not including spaces) in between the letters 'a' and 'b', it should be true.
Example: Input: "maple bread"; Output: false // Because there are > 3 places
Input: "age bad"; Output: true // Exactly three places in between 'a' and 'b'
Here is what I've written, although it is unfinished and most likely in the wrong direction:
function challengeOne(str) {
let places = 0;
for (let i=0; i < str.length; i++) {
if (str[i] != 'a') {
places++
} else if (str[i] === 'b'){
}
}
console.log(places)
}
So my idea was to start counting places after the letter 'a' until it gets to 'b', then it would return the amount of places. I would then start another flow where if 'places' > 3, return false or if 'places' === 3, then return true.
However, attempting the first flow only returns the total count for places that aren't 'a'. I'm using console.log instead of return to test if it works or not.
I'm only looking for a push in the right direction and if there is a method I might be missing or if there are other examples similar to this. I feel like the solution is pretty simple yet I can't seem to grasp it.
Edit:
I took a break from this challenge just so I could look at it from fresh eyes and I was able to solve it quickly! I looked through everyone's suggestions and applied it until I found the solution. Here is the new code that worked:
function challengeOne(str) {
// code goes here
str = str.replace(/ /g, '')
let count = Math.abs(str.lastIndexOf('a')-str.lastIndexOf('b'));
if (count === 3) {
return true
} else return false
}
Thank you for all your input!
Here's a more efficient approach - simply find the indexes of the letter a and b and check whether the absolute value of subtracting the two is 4 (since indexes are 0 indexed):
function challengeOne(str) {
return Math.abs(str.indexOf("a") - str.indexOf("b")) == 4;
}
console.log(challengeOne("age bad"));
console.log(challengeOne("maple bread"));
if there are exactly 3 characters (not including spaces)
Simply remove all spaces via String#replace, then perform the check:
function challengeOne(str) {
return str = str.replace(/ /g, ''), Math.abs(str.indexOf("a") - str.indexOf("b")) == 4;
}
console.log(challengeOne("age bad"));
console.log(challengeOne("maple bread"));
References:
Math#abs
String#indexOf
Here is another approach: This one excludes spaces as in the OP, so the output reflects that. If it is to include spaces, that line could be removed.
function challengeOne(str) {
//strip spaces
str = str.replace(/\s/g, '');
//take just the in between chars
let extract = str.match(/a(.*)b/).pop();
return extract.length == 3
}
console.log(challengeOne('maple bread'));
console.log(challengeOne('age bad'));
You can go recursive:
Check if the string starts with 'a' and ends with 'b' and check the length
Continue by cutting the string either left or right (or both) until there are 3 characters in between or the string is empty.
Examples:
maple bread
aple brea
aple bre
aple br
aple b
ple
le
FALSE
age bad
age ba
age b
TRUE
const check = (x, y, z) => str => {
const exec = s => {
const xb = s.startsWith(x);
const yb = s.endsWith(y);
return ( !s ? false
: xb && yb && s.length === z + 2 ? true
: xb && yb ? exec(s.slice(1, -1))
: xb ? exec(s.slice(0, -1))
: exec(s.slice(1)));
};
return exec(str);
}
const challenge = check('a', 'b', 3);
console.log(`
challenge("maple bread"): ${challenge("maple bread")}
challenge("age bad"): ${challenge("age bad")}
challenge("aabab"): ${challenge("aabab")}
`)
I assume spaces are counted and your examples seem to indicate this, although your question says otherwise. If so, here's a push that should be helpful. You're right, there are JavaScript methods for strings, including one that should help you find the index (location) of the a and b within the given string.
Try here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#instance_methods

Sort array by specific array objects column [duplicate]

I have a list of objects I wish to sort based on a field attr of type string. I tried using -
list.sort(function (a, b) {
return a.attr - b.attr
})
but found that - doesn't appear to work with strings in JavaScript. How can I sort a list of objects based on an attribute with type string?
Use String.prototype.localeCompare as per your example:
list.sort(function (a, b) {
return ('' + a.attr).localeCompare(b.attr);
})
We force a.attr to be a string to avoid exceptions. localeCompare has been supported since Internet Explorer 6 and Firefox 1. You may also see the following code used that doesn't respect a locale:
if (item1.attr < item2.attr)
return -1;
if ( item1.attr > item2.attr)
return 1;
return 0;
An updated answer (October 2014)
I was really annoyed about this string natural sorting order so I took quite some time to investigate this issue.
Long story short
localeCompare() character support is badass, just use it.
As pointed out by Shog9, the answer to your question is:
return item1.attr.localeCompare(item2.attr);
Bugs found in all the custom JavaScript "natural string sort order" implementations
There are quite a bunch of custom implementations out there, trying to do string comparison more precisely called "natural string sort order"
When "playing" with these implementations, I always noticed some strange "natural sorting order" choice, or rather mistakes (or omissions in the best cases).
Typically, special characters (space, dash, ampersand, brackets, and so on) are not processed correctly.
You will then find them appearing mixed up in different places, typically that could be:
some will be between the uppercase 'Z' and the lowercase 'a'
some will be between the '9' and the uppercase 'A'
some will be after lowercase 'z'
When one would have expected special characters to all be "grouped" together in one place, except for the space special character maybe (which would always be the first character). That is, either all before numbers, or all between numbers and letters (lowercase & uppercase being "together" one after another), or all after letters.
My conclusion is that they all fail to provide a consistent order when I start adding barely unusual characters (i.e., characters with diacritics or characters such as dash, exclamation mark and so on).
Research on the custom implementations:
Natural Compare Lite https://github.com/litejs/natural-compare-lite : Fails at sorting consistently https://github.com/litejs/natural-compare-lite/issues/1 and http://jsbin.com/bevututodavi/1/edit?js,console, basic Latin characters sorting http://jsbin.com/bevututodavi/5/edit?js,console
Natural Sort https://github.com/javve/natural-sort : Fails at sorting consistently, see issue https://github.com/javve/natural-sort/issues/7 and see basic Latin characters sorting http://jsbin.com/cipimosedoqe/3/edit?js,console
JavaScript Natural Sort https://github.com/overset/javascript-natural-sort: seems rather neglected since February 2012, Fails at sorting consistently, see issue https://github.com/overset/javascript-natural-sort/issues/16
Alphanum http://www.davekoelle.com/files/alphanum.js , Fails at sorting consistently, see http://jsbin.com/tuminoxifuyo/1/edit?js,console
Browsers' native "natural string sort order" implementations via localeCompare()
localeCompare() oldest implementation (without the locales and options arguments) is supported by Internet Explorer 6 and later, see http://msdn.microsoft.com/en-us/library/ie/s4esdbwz(v=vs.94).aspx (scroll down to localeCompare() method).
The built-in localeCompare() method does a much better job at sorting, even international & special characters.
The only problem using the localeCompare() method is that "the locale and sort order used are entirely implementation dependent". In other words, when using localeCompare such as stringOne.localeCompare(stringTwo): Firefox, Safari, Chrome, and Internet Explorer have a different sort order for Strings.
Research on the browser-native implementations:
http://jsbin.com/beboroyifomu/1/edit?js,console - basic Latin characters comparison with localeCompare()
http://jsbin.com/viyucavudela/2/ - basic Latin characters comparison with localeCompare() for testing on Internet Explorer 8
http://jsbin.com/beboroyifomu/2/edit?js,console - basic Latin characters in string comparison : consistency check in string vs when a character is alone
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare - Internet Explorer 11 and later supports the new locales & options arguments
Difficulty of "string natural sorting order"
Implementing a solid algorithm (meaning: consistent but also covering a wide range of characters) is a very tough task. UTF-8 contains more than 2000 characters and covers more than 120 scripts (languages).
Finally, there are some specification for this tasks, it is called the "Unicode Collation Algorithm", which can be found at http://www.unicode.org/reports/tr10/. You can find more information about this on this question I posted https://softwareengineering.stackexchange.com/questions/257286/is-there-any-language-agnostic-specification-for-string-natural-sorting-order
Final conclusion
So considering the current level of support provided by the JavaScript custom implementations I came across, we will probably never see anything getting any close to supporting all this characters and scripts (languages). Hence I would rather use the browsers' native localeCompare() method. Yes, it does have the downside of being non-consistent across browsers but basic testing shows it covers a much wider range of characters, allowing solid & meaningful sort orders.
So as pointed out by Shog9, the answer to your question is:
return item1.attr.localeCompare(item2.attr);
Further reading:
https://softwareengineering.stackexchange.com/questions/257286/is-there-any-language-agnostic-specification-for-string-natural-sorting-order
How to sort strings in JavaScript
Natural sort of alphanumerical strings in JavaScript
Sort Array of numeric & alphabetical elements (Natural Sort)
Sort mixed alpha/numeric array
https://web.archive.org/web/20130929122019/http://my.opera.com/GreyWyvern/blog/show.dml/1671288
https://web.archive.org/web/20131005224909/http://www.davekoelle.com/alphanum.html
http://snipplr.com/view/36012/javascript-natural-sort/
http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
Thanks to Shog9's nice answer, which put me in the "right" direction I believe.
Answer (in Modern ECMAScript)
list.sort((a, b) => (a.attr > b.attr) - (a.attr < b.attr))
Or
list.sort((a, b) => +(a.attr > b.attr) || -(a.attr < b.attr))
Description
Casting a boolean value to a number yields the following:
true -> 1
false -> 0
Consider three possible patterns:
x is larger than y: (x > y) - (y < x) -> 1 - 0 -> 1
x is equal to y: (x > y) - (y < x) -> 0 - 0 -> 0
x is smaller than y: (x > y) - (y < x) -> 0 - 1 -> -1
(Alternative)
x is larger than y: +(x > y) || -(x < y) -> 1 || 0 -> 1
x is equal to y: +(x > y) || -(x < y) -> 0 || 0 -> 0
x is smaller than y: +(x > y) || -(x < y) -> 0 || -1 -> -1
So these logics are equivalent to typical sort comparator functions.
if (x == y) {
return 0;
}
return x > y ? 1 : -1;
Since strings can be compared directly in JavaScript, this will do the job:
list.sort(function (a, b) {
return a.attr < b.attr ? -1: 1;
})
This is a little bit more efficient than using
return a.attr > b.attr ? 1: -1;
because in case of elements with same attr (a.attr == b.attr), the sort function will swap the two for no reason.
For example
var so1 = function (a, b) { return a.atr > b.atr ? 1: -1; };
var so2 = function (a, b) { return a.atr < b.atr ? -1: 1; }; // Better
var m1 = [ { atr: 40, s: "FIRST" }, { atr: 100, s: "LAST" }, { atr: 40, s: "SECOND" } ].sort (so1);
var m2 = [ { atr: 40, s: "FIRST" }, { atr: 100, s: "LAST" }, { atr: 40, s: "SECOND" } ].sort (so2);
// m1 sorted but ...: 40 SECOND 40 FIRST 100 LAST
// m2 more efficient: 40 FIRST 40 SECOND 100 LAST
You should use > or < and == here. So the solution would be:
list.sort(function(item1, item2) {
var val1 = item1.attr,
val2 = item2.attr;
if (val1 == val2) return 0;
if (val1 > val2) return 1;
if (val1 < val2) return -1;
});
Nested ternary arrow function
(a,b) => (a < b ? -1 : a > b ? 1 : 0)
I had been bothered about this for long, so I finally researched this and give you this long winded reason for why things are the way they are.
From the spec:
Section 11.9.4 The Strict Equals Operator ( === )
The production EqualityExpression : EqualityExpression === RelationalExpression
is evaluated as follows:
- Let lref be the result of evaluating EqualityExpression.
- Let lval be GetValue(lref).
- Let rref be the result of evaluating RelationalExpression.
- Let rval be GetValue(rref).
- Return the result of performing the strict equality comparison
rval === lval. (See 11.9.6)
So now we go to 11.9.6
11.9.6 The Strict Equality Comparison Algorithm
The comparison x === y, where x and y are values, produces true or false.
Such a comparison is performed as follows:
- If Type(x) is different from Type(y), return false.
- If Type(x) is Undefined, return true.
- If Type(x) is Null, return true.
- If Type(x) is Number, then
...
- If Type(x) is String, then return true if x and y are exactly the
same sequence of characters (same length and same characters in
corresponding positions); otherwise, return false.
That's it. The triple equals operator applied to strings returns true iff the arguments are exactly the same strings (same length and same characters in corresponding positions).
So === will work in the cases when we're trying to compare strings which might have arrived from different sources, but which we know will eventually have the same values - a common enough scenario for inline strings in our code. For example, if we have a variable named connection_state, and we wish to know which one of the following states ['connecting', 'connected', 'disconnecting', 'disconnected'] is it in right now, we can directly use the ===.
But there's more. Just above 11.9.4, there is a short note:
NOTE 4
Comparison of Strings uses a simple equality test on sequences of code
unit values. There is no attempt to use the more complex, semantically oriented
definitions of character or string equality and collating order defined in the
Unicode specification. Therefore Strings values that are canonically equal
according to the Unicode standard could test as unequal. In effect this
algorithm assumes that both Strings are already in normalized form.
Hmm. What now? Externally obtained strings can, and most likely will, be weird unicodey, and our gentle === won't do them justice. In comes localeCompare to the rescue:
15.5.4.9 String.prototype.localeCompare (that)
...
The actual return values are implementation-defined to permit implementers
to encode additional information in the value, but the function is required
to define a total ordering on all Strings and to return 0 when comparing
Strings that are considered canonically equivalent by the Unicode standard.
We can go home now.
tl;dr;
To compare strings in javascript, use localeCompare; if you know that the strings have no non-ASCII components because they are, for example, internal program constants, then === also works.
An explanation of why the approach in the question doesn't work:
let products = [
{ name: "laptop", price: 800 },
{ name: "phone", price:200},
{ name: "tv", price: 1200}
];
products.sort( (a, b) => {
{let value= a.name - b.name; console.log(value); return value}
});
> 2 NaN
Subtraction between strings returns NaN.
Echoing Alejadro's answer, the right approach is:
products.sort( (a,b) => a.name > b.name ? 1 : -1 )
A typescript sorting method modifier using a custom function to return a sorted string in either ascending or descending order
const data = ["jane", "mike", "salome", "ababus", "buisa", "dennis"];
const sortStringArray = (stringArray: string[], mode?: 'desc' | 'asc') => {
if (!mode || mode === 'asc') {
return stringArray.sort((a, b) => a.localeCompare(b))
}
return stringArray.sort((a, b) => b.localeCompare(a))
}
console.log(sortStringArray(data, 'desc'));// [ 'salome', 'mike', 'jane', 'dennis', 'buisa', 'ababus' ]
console.log(sortStringArray(data, 'asc')); // [ 'ababus', 'buisa', 'dennis', 'jane', 'mike', 'salome' ]
There should be ascending and descending orders functions
if (order === 'asc') {
return a.localeCompare(b);
}
return b.localeCompare(a);
If you want to control locales (or case or accent), then use Intl.collator:
const collator = new Intl.Collator();
list.sort((a, b) => collator.compare(a.attr, b.attr));
You can construct a collator like:
new Intl.Collator("en");
new Intl.Collator("en", {sensitivity: "case"});
...
See the above link for documentation.
Note: unlike some other solutions, it handles null, undefined the JavaScript way, i.e., moves them to the end.
Use sort() straightforward without any - or <
const areas = ['hill', 'beach', 'desert', 'mountain']
console.log(areas.sort())
// To print in descending way
console.log(areas.sort().reverse())
In your operation in your initial question, you are performing the following operation:
item1.attr - item2.attr
So, assuming those are numbers (i.e. item1.attr = "1", item2.attr = "2") You still may use the "===" operator (or other strict evaluators) provided that you ensure type. The following should work:
return parseInt(item1.attr) - parseInt(item2.attr);
If they are alphaNumeric, then do use localCompare().
list.sort(function(item1, item2){
return +(item1.attr > item2.attr) || +(item1.attr === item2.attr) - 1;
})
How they work samples:
+('aaa'>'bbb')||+('aaa'==='bbb')-1
+(false)||+(false)-1
0||0-1
-1
+('bbb'>'aaa')||+('bbb'==='aaa')-1
+(true)||+(false)-1
1||0-1
1
+('aaa'>'aaa')||+('aaa'==='aaa')-1
+(false)||+(true)-1
0||1-1
0
var str = ['v','a','da','c','k','l']
var b = str.join('').split('').sort().reverse().join('')
console.log(b)
<!doctype html>
<html>
<body>
<p id = "myString">zyxtspqnmdba</p>
<p id = "orderedString"></p>
<script>
var myString = document.getElementById("myString").innerHTML;
orderString(myString);
function orderString(str) {
var i = 0;
var myArray = str.split("");
while (i < str.length){
var j = i + 1;
while (j < str.length) {
if (myArray[j] < myArray[i]){
var temp = myArray[i];
myArray[i] = myArray[j];
myArray[j] = temp;
}
j++;
}
i++;
}
var newString = myArray.join("");
document.getElementById("orderedString").innerHTML = newString;
}
</script>
</body>
</html>

How to sort greek letters/symbols JavaScript

I have managed to sort an array of objects by element name 'post_api_name' however some of these elements start with a Greek symbol (e.g. α-β Arteether). How can I convert these symbols to be placed alongside 0-9 and not Z+1.
RESULT.sort(function(a, b) {
var textA = a.post_api_name.toUpperCase();
var textB = b.post_api_name.toUpperCase();
return (textA < textB) ? -1 : (textA > textB) ? 1 : 0;
});
For clarity, the current sorted results show as:
a
b
c
α
I am hoping to achieve:
α
a
b
c
You can try sorting using the Greek locale. You would want to test with more cases as I have no idea what other sorting rules could apply. Browser support for localeCompare is still relatively cutting edge (IE11). Maybe you can find a poly-fill.
Maybe more preferably you might just have a list of Greek characters that you support and replace them with one of the characters between 9 and A for sorting purposes (:;<=>? or #).
var items = ['a', 'b', 'c', 'α', '9', '4', '1']
items.sort(function(a, b){
return a.localeCompare(b, 'el');
});
console.log(items);
By default, the decimal code for the characters like 'α' will start from 128 i.e they come under extended ASCII codes.
I know there is no particular way of doing this as per your requirement. You can try the below function which will achieve you sorting the normal characters too and puts the greek/additional characters at first.
var mySortedChars = [];
function sortMyArray(array) {
array.forEach(function(val) {
if (val.charCodeAt(0) > 127) {
mySortedChars.push(val);
_.pull(array, val);
}
});
array.sort();
array.forEach(function(val) {
mySortedChars.push(val);
});
return mySortedChars;
}
The above function will return you the sorted characters as per your requirement.
Note: _.pull is a util from the lodash library. You may have to use your own methods to splice the special characters instead of using _.pull(array, val);
A possible solution written in ES6. Split data out into standard and greek starting elements. Sort these results and then concat them.
const greekLower = new Set();
for (let c = 0x3B1; c < 0x3C9; c += 1) {
greekLower.add(String.fromCodePoint(c));
}
const data = ['xa', 'αb', 'yc', 'βd', 'ae', '1f', 'γg', '9h'];
const standard = [];
const greek = [];
for (let v of data) {
if (greekLower.has(v.charAt(0).toLowerCase())) {
greek.push(v);
} else {
standard.push(v);
}
}
standard.sort();
greek.sort();
const sorted = greek.concat(standard);
document.getElementById('out').textContent = JSON.stringify(sorted);
console.log(sorted);
<pre id="out"></pre>

Javascript sort string numeric-alpha with zeroes at end

I'm terrible with math, but I need to be able to sort a string in JavaScript (or jQuery) in numeric-alpha order with zeroes at the end.
So to do this, I know I can convert the string into an array with myStr.split('') and then call sort() with a custom sort function, so here is what I have so far:
function mysort(a, b) {
// not sure what to put here.
// order should be: 123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0
}
function sortMyString(str) {
str.split('');
return str.sort(mysort);
}
I would really appreciate help on this.
Thank you.
This will result in the string being the same as above, feel free to mix the string up, add more etc and it should come up in the order you want.
http://jsfiddle.net/dJMHK/
var str = '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0';
str = str.split('');
str = str.sort(function (a,b) {
if (a === '0' || b === '0')
return (b === a) ? 0 : (a < b) ? 1 : -1;
return (a < b) ? -1 : (a === b) ? 0 : 1;
});
document.write(str);
If you're splitting and comparing individual characters you can use the charCodeAt function.
a.charCodeAt(0)
and
b.charCodeAt(0)
to get the unicode character value. You'll want to have cases for 1-9, A-Z, a-z, 0, and anything you weren't planning for.
In your sort function, to get items in the appropriate order: return 0 if a == b, 1 if a > b, and -1 if a < b. Let me know if you need help working on the sort function, this is all I have time to guide you on at the moment.
You can use the sort function then handle the zeros with some custom code. Here is a reference:
http://www.w3schools.com/jsref/jsref_sort.asp

Find the longest common starting substring in a set of strings [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
This is a challenge to come up with the most elegant JavaScript, Ruby or other solution to a relatively trivial problem.
This problem is a more specific case of the Longest common substring problem. I need to only find the longest common starting substring in an array. This greatly simplifies the problem.
For example, the longest substring in [interspecies, interstelar, interstate] is "inters". However, I don't need to find "ific" in [specifics, terrific].
I've solved the problem by quickly coding up a solution in JavaScript as a part of my answer about shell-like tab-completion (test page here). Here is that solution, slightly tweaked:
function common_substring(data) {
var i, ch, memo, idx = 0
do {
memo = null
for (i=0; i < data.length; i++) {
ch = data[i].charAt(idx)
if (!ch) break
if (!memo) memo = ch
else if (ch != memo) break
}
} while (i == data.length && idx < data.length && ++idx)
return (data[0] || '').slice(0, idx)
}
This code is available in this Gist along with a similar solution in Ruby. You can clone the gist as a git repo to try it out:
$ git clone git://gist.github.com/257891.git substring-challenge
I'm not very happy with those solutions. I have a feeling they might be solved with more elegance and less execution complexity—that's why I'm posting this challenge.
I'm going to accept as an answer the solution I find the most elegant or concise. Here is for instance a crazy Ruby hack I come up with—defining the & operator on String:
# works with Ruby 1.8.7 and above
class String
def &(other)
difference = other.to_str.each_char.with_index.find { |ch, idx|
self[idx].nil? or ch != self[idx].chr
}
difference ? self[0, difference.last] : self
end
end
class Array
def common_substring
self.inject(nil) { |memo, str| memo.nil? ? str : memo & str }.to_s
end
end
Solutions in JavaScript or Ruby are preferred, but you can show off clever solution in other languages as long as you explain what's going on. Only code from standard library please.
Update: my favorite solutions
I've chosen the JavaScript sorting solution by kennebec as the "answer" because it struck me as both unexpected and genius. If we disregard the complexity of actual sorting (let's imagine it's infinitely optimized by the language implementation), the complexity of the solution is just comparing two strings.
Other great solutions:
"regex greed" by FM takes a minute or two to grasp, but then the elegance of it hits you. Yehuda Katz also made a regex solution, but it's more complex
commonprefix in Python — Roberto Bonvallet used a feature made for handling filesystem paths to solve this problem
Haskell one-liner is short as if it were compressed, and beautiful
the straightforward Ruby one-liner
Thanks for participating! As you can see from the comments, I learned a lot (even about Ruby).
It's a matter of taste, but this is a simple javascript version:
It sorts the array, and then looks just at the first and last items.
//longest common starting substring in an array
function sharedStart(array){
var A= array.concat().sort(),
a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;
while(i<L && a1.charAt(i)=== a2.charAt(i)) i++;
return a1.substring(0, i);
}
DEMOS
sharedStart(['interspecies', 'interstelar', 'interstate']) //=> 'inters'
sharedStart(['throne', 'throne']) //=> 'throne'
sharedStart(['throne', 'dungeon']) //=> ''
sharedStart(['cheese']) //=> 'cheese'
sharedStart([]) //=> ''
sharedStart(['prefix', 'suffix']) //=> ''
In Python:
>>> from os.path import commonprefix
>>> commonprefix('interspecies interstelar interstate'.split())
'inters'
Ruby one-liner:
l=strings.inject{|l,s| l=l.chop while l!=s[0...l.length];l}
My Haskell one-liner:
import Data.List
commonPre :: [String] -> String
commonPre = map head . takeWhile (\(x:xs)-> all (==x) xs) . transpose
EDIT: barkmadley gave a good explanation of the code below. I'd also add that haskell uses lazy evaluation, so we can be lazy about our use of transpose; it will only transpose our lists as far as necessary to find the end of the common prefix.
You just need to traverse all strings until they differ, then take the substring up to this point.
Pseudocode:
loop for i upfrom 0
while all strings[i] are equal
finally return substring[0..i]
Common Lisp:
(defun longest-common-starting-substring (&rest strings)
(loop for i from 0 below (apply #'min (mapcar #'length strings))
while (apply #'char=
(mapcar (lambda (string) (aref string i))
strings))
finally (return (subseq (first strings) 0 i))))
Yet another way to do it: use regex greed.
words = %w(interspecies interstelar interstate)
j = '='
str = ['', *words].join(j)
re = "[^#{j}]*"
str =~ /\A
(?: #{j} ( #{re} ) #{re} )
(?: #{j} \1 #{re} )*
\z/x
p $1
And the one-liner, courtesy of mislav (50 characters):
p ARGV.join(' ').match(/^(\w*)\w*(?: \1\w*)*$/)[1]
In Python I wouldn't use anything but the existing commonprefix function I showed in another answer, but I couldn't help to reinvent the wheel :P. This is my iterator-based approach:
>>> a = 'interspecies interstelar interstate'.split()
>>>
>>> from itertools import takewhile, chain, izip as zip, imap as map
>>> ''.join(chain(*takewhile(lambda s: len(s) == 1, map(set, zip(*a)))))
'inters'
Edit: Explanation of how this works.
zip generates tuples of elements taking one of each item of a at a time:
In [6]: list(zip(*a)) # here I use list() to expand the iterator
Out[6]:
[('i', 'i', 'i'),
('n', 'n', 'n'),
('t', 't', 't'),
('e', 'e', 'e'),
('r', 'r', 'r'),
('s', 's', 's'),
('p', 't', 't'),
('e', 'e', 'a'),
('c', 'l', 't'),
('i', 'a', 'e')]
By mapping set over these items, I get a series of unique letters:
In [7]: list(map(set, _)) # _ means the result of the last statement above
Out[7]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s']),
set(['p', 't']),
set(['a', 'e']),
set(['c', 'l', 't']),
set(['a', 'e', 'i'])]
takewhile(predicate, items) takes elements from this while the predicate is True; in this particular case, when the sets have one element, i.e. all the words have the same letter at that position:
In [8]: list(takewhile(lambda s: len(s) == 1, _))
Out[8]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s'])]
At this point we have an iterable of sets, each containing one letter of the prefix we were looking for. To construct the string, we chain them into a single iterable, from which we get the letters to join into the final string.
The magic of using iterators is that all items are generated on demand, so when takewhile stops asking for items, the zipping stops at that point and no unnecessary work is done. Each function call in my one-liner has a implicit for and an implicit break.
This is probably not the most concise solution (depends if you already have a library for this), but one elegant method is to use a trie. I use tries for implementing tab completion in my Scheme interpreter:
http://github.com/jcoglan/heist/blob/master/lib/trie.rb
For example:
tree = Trie.new
%w[interspecies interstelar interstate].each { |s| tree[s] = true }
tree.longest_prefix('')
#=> "inters"
I also use them for matching channel names with wildcards for the Bayeux protocol; see these:
http://github.com/jcoglan/faye/blob/master/client/channel.js
http://github.com/jcoglan/faye/blob/master/lib/faye/channel.rb
Just for the fun of it, here's a version written in (SWI-)PROLOG:
common_pre([[C|Cs]|Ss], [C|Res]) :-
maplist(head_tail(C), [[C|Cs]|Ss], RemSs), !,
common_pre(RemSs, Res).
common_pre(_, []).
head_tail(H, [H|T], T).
Running:
?- S=["interspecies", "interstelar", "interstate"], common_pre(S, CP), string_to_list(CPString, CP).
Gives:
CP = [105, 110, 116, 101, 114, 115],
CPString = "inters".
Explanation:
(SWI-)PROLOG treats strings as lists of character codes (numbers). All the predicate common_pre/2 does is recursively pattern-match to select the first code (C) from the head of the first list (string, [C|Cs]) in the list of all lists (all strings, [[C|Cs]|Ss]), and appends the matching code C to the result iff it is common to all (remaining) heads of all lists (strings), else it terminates.
Nice, clean, simple and efficient... :)
A javascript version based on #Svante's algorithm:
function commonSubstring(words){
var iChar, iWord,
refWord = words[0],
lRefWord = refWord.length,
lWords = words.length;
for (iChar = 0; iChar < lRefWord; iChar += 1) {
for (iWord = 1; iWord < lWords; iWord += 1) {
if (refWord[iChar] !== words[iWord][iChar]) {
return refWord.substring(0, iChar);
}
}
}
return refWord;
}
Combining answers by kennebec, Florian F and jberryman yields the following Haskell one-liner:
commonPrefix l = map fst . takeWhile (uncurry (==)) $ zip (minimum l) (maximum l)
With Control.Arrow one can get a point-free form:
commonPrefix = map fst . takeWhile (uncurry (==)) . uncurry zip . (minimum &&& maximum)
Python 2.6 (r26:66714, Oct 4 2008, 02:48:43)
>>> a = ['interspecies', 'interstelar', 'interstate']
>>> print a[0][:max(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
) + 1]
inters
i for i in range(min(map(len, a))), number of maximum lookups can't be greater than the length of the shortest string; in this example this would evaluate to [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
len(set(map(lambda e: e[i], a))), 1) create an array of the i-thcharacter for each string in the list; 2) make a set out of it; 3) determine the size of the set
[i for i in range(min(map(len, a))) if len(set(map(lambda e: e[i], a))) == 1], include just the characters, for which the size of the set is 1 (all characters at that position were the same ..); here it would evaluate to [0, 1, 2, 3, 4, 5]
finally take the max, add one, and get the substring ...
Note: the above does not work for a = ['intersyate', 'intersxate', 'interstate', 'intersrate'], but this would:
>>> index = len(
filter(lambda l: l[0] == l[1],
[ x for x in enumerate(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
)]))
>>> a[0][:index]
inters
Doesn't seem that complicated if you're not too concerned about ultimate performance:
def common_substring(data)
data.inject { |m, s| s[0,(0..m.length).find { |i| m[i] != s[i] }.to_i] }
end
One of the useful features of inject is the ability to pre-seed with the first element of the array being interated over. This avoids the nil memo check.
puts common_substring(%w[ interspecies interstelar interstate ]).inspect
# => "inters"
puts common_substring(%w[ feet feel feeble ]).inspect
# => "fee"
puts common_substring(%w[ fine firkin fail ]).inspect
# => "f"
puts common_substring(%w[ alpha bravo charlie ]).inspect
# => ""
puts common_substring(%w[ fork ]).inspect
# => "fork"
puts common_substring(%w[ fork forks ]).inspect
# => "fork"
Update: If golf is the game here, then 67 characters:
def f(d)d.inject{|m,s|s[0,(0..m.size).find{|i|m[i]!=s[i]}.to_i]}end
This one is very similar to Roberto Bonvallet's solution, except in ruby.
chars = %w[interspecies interstelar interstate].map {|w| w.split('') }
chars[0].zip(*chars[1..-1]).map { |c| c.uniq }.take_while { |c| c.size == 1 }.join
The first line replaces each word with an array of chars. Next, I use zip to create this data structure:
[["i", "i", "i"], ["n", "n", "n"], ["t", "t", "t"], ...
map and uniq reduce this to [["i"],["n"],["t"], ...
take_while pulls the chars off the array until it finds one where the size isn't one (meaning not all chars were the same). Finally, I join them back together.
The accepted solution is broken (for example, it returns a for strings like ['a', 'ba']). The fix is very simple, you literally have to change only 3 characters (from indexOf(tem1) == -1 to indexOf(tem1) != 0) and the function would work as expected.
Unfortunately, when I tried to edit the answer to fix the typo, SO told me that "edits must be at least 6 characters". I could change more then those 3 chars, by improving naming and readability but that feels like a little bit too much.
So, below is a fixed and improved (at least from my point of view) version of the kennebec's solution:
function commonPrefix(words) {
max_word = words.reduce(function(a, b) { return a > b ? a : b });
prefix = words.reduce(function(a, b) { return a > b ? b : a }); // min word
while(max_word.indexOf(prefix) != 0) {
prefix = prefix.slice(0, -1);
}
return prefix;
}
(on jsFiddle)
Note, that it uses reduce method (JavaScript 1.8) in order to find alphanumeric max / min instead of sorting the array and then fetching the first and the last elements of it.
While reading these answers with all the fancy functional programming, sorting and regexes and whatnot, I just thought: what's wrong a little bit of C? So here's a goofy looking little program.
#include <stdio.h>
int main (int argc, char *argv[])
{
int i = -1, j, c;
if (argc < 2)
return 1;
while (c = argv[1][++i])
for (j = 2; j < argc; j++)
if (argv[j][i] != c)
goto out;
out:
printf("Longest common prefix: %.*s\n", i, argv[1]);
}
Compile it, run it with your list of strings as command line arguments, then upvote me for using goto!
Here's a solution using regular expressions in Ruby:
def build_regex(string)
arr = []
arr << string.dup while string.chop!
Regexp.new("^(#{arr.join("|")})")
end
def substring(first, *strings)
strings.inject(first) do |accum, string|
build_regex(accum).match(string)[0]
end
end
I would do the following:
Take the first string of the array as the initial starting substring.
Take the next string of the array and compare the characters until the end of one of the strings is reached or a mismatch is found. If a mismatch is found, reduce starting substring to the length where the mismatch was found.
Repeat step 2 until all strings have been tested.
Here’s a JavaScript implementation:
var array = ["interspecies", "interstelar", "interstate"],
prefix = array[0],
len = prefix.length;
for (i=1; i<array.length; i++) {
for (j=0, len=Math.min(len,array[j].length); j<len; j++) {
if (prefix[j] != array[i][j]) {
len = j;
prefix = prefix.substr(0, len);
break;
}
}
}
Instead of sorting, you could just get the min and max of the strings.
To me, elegance in a computer program is a balance of speed and simplicity.
It should not do unnecessary computation, and it should be simple enough to make its correctness evident.
I could call the sorting solution "clever", but not "elegant".
Oftentimes it's more elegant to use a mature open source library instead of rolling your own. Then, if it doesn't completely suit your needs, you can extend it or modify it to improve it, and let the community decide if that belongs in the library.
diff-lcs is a good Ruby gem for least common substring.
My solution in Java:
public static String compute(Collection<String> strings) {
if(strings.isEmpty()) return "";
Set<Character> v = new HashSet<Character>();
int i = 0;
try {
while(true) {
for(String s : strings) v.add(s.charAt(i));
if(v.size() > 1) break;
v.clear();
i++;
}
} catch(StringIndexOutOfBoundsException ex) {}
return strings.iterator().next().substring(0, i);
}
Golfed JS solution just for fun:
w=["hello", "hell", "helen"];
c=w.reduce(function(p,c){
for(r="",i=0;p[i]==c[i];r+=p[i],i++){}
return r;
});
Here's an efficient solution in ruby. I based the idea of the strategy for a hi/lo guessing game where you iteratively zero in on the longest prefix.
Someone correct me if I'm wrong, but I think the complexity is O(n log n), where n is the length of the shortest string and the number of strings is considered a constant.
def common(strings)
lo = 0
hi = strings.map(&:length).min - 1
return '' if hi < lo
guess, last_guess = lo, hi
while guess != last_guess
last_guess = guess
guess = lo + ((hi - lo) / 2.0).ceil
if strings.map { |s| s[0..guess] }.uniq.length == 1
lo = guess
else
hi = guess
end
end
strings.map { |s| s[0...guess] }.uniq.length == 1 ? strings.first[0...guess] : ''
end
And some checks that it works:
>> common %w{ interspecies interstelar interstate }
=> "inters"
>> common %w{ dog dalmation }
=> "d"
>> common %w{ asdf qwerty }
=> ""
>> common ['', 'asdf']
=> ""
Fun alternative Ruby solution:
def common_prefix(*strings)
chars = strings.map(&:chars)
length = chars.first.zip( *chars[1..-1] ).index{ |a| a.uniq.length>1 }
strings.first[0,length]
end
p common_prefix( 'foon', 'foost', 'forlorn' ) #=> "fo"
p common_prefix( 'foost', 'foobar', 'foon' ) #=> "foo"
p common_prefix( 'a','b' ) #=> ""
It might help speed if you used chars = strings.sort_by(&:length).map(&:chars), since the shorter the first string, the shorter the arrays created by zip. However, if you cared about speed, you probably shouldn't use this solution anyhow. :)
Javascript clone of AShelly's excellent answer.
Requires Array#reduce which is supported only in firefox.
var strings = ["interspecies", "intermediate", "interrogation"]
var sub = strings.reduce(function(l,r) {
while(l!=r.slice(0,l.length)) {
l = l.slice(0, -1);
}
return l;
});
This is by no means elegant, but if you want concise:
Ruby, 71 chars
def f(a)b=a[0];b[0,(0..b.size).find{|n|a.any?{|i|i[0,n]!=b[0,n]}}-1]end
If you want that unrolled it looks like this:
def f(words)
first_word = words[0];
first_word[0, (0..(first_word.size)).find { |num_chars|
words.any? { |word| word[0, num_chars] != first_word[0, num_chars] }
} - 1]
end
It's not code golf, but you asked for somewhat elegant, and I tend to think recursion is fun. Java.
/** Recursively find the common prefix. */
public String findCommonPrefix(String[] strings) {
int minLength = findMinLength(strings);
if (isFirstCharacterSame(strings)) {
return strings[0].charAt(0) + findCommonPrefix(removeFirstCharacter(strings));
} else {
return "";
}
}
/** Get the minimum length of a string in strings[]. */
private int findMinLength(final String[] strings) {
int length = strings[0].size();
for (String string : strings) {
if (string.size() < length) {
length = string.size();
}
}
return length;
}
/** Compare the first character of all strings. */
private boolean isFirstCharacterSame(String[] strings) {
char c = string[0].charAt(0);
for (String string : strings) {
if (c != string.charAt(0)) return false;
}
return true;
}
/** Remove the first character of each string in the array,
and return a new array with the results. */
private String[] removeFirstCharacter(String[] source) {
String[] result = new String[source.length];
for (int i=0; i<result.length; i++) {
result[i] = source[i].substring(1);
}
return result;
}
A ruby version based on #Svante's algorithm. Runs ~3x as fast as my first one.
def common_prefix set
i=0
rest=set[1..-1]
set[0].each_byte{|c|
rest.each{|e|return set[0][0...i] if e[i]!=c}
i+=1
}
set
end
My Javascript solution:
IMOP, using sort is too tricky.
My solution is compare letter by letter through looping the array.
Return string if letter is not macthed.
This is my solution:
var longestCommonPrefix = function(strs){
if(strs.length < 1){
return '';
}
var p = 0, i = 0, c = strs[0][0];
while(p < strs[i].length && strs[i][p] === c){
i++;
if(i === strs.length){
i = 0;
p++;
c = strs[0][p];
}
}
return strs[0].substr(0, p);
};
Realizing the risk of this turning into a match of code golf (or is that the intention?), here's my solution using sed, copied from my answer to another SO question and shortened to 36 chars (30 of which are the actual sed expression). It expects the strings (each on a seperate line) to be supplied on standard input or in files passed as additional arguments.
sed 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'
A script with sed in the shebang line weighs in at 45 chars:
#!/bin/sed -f
N;s/^\(.*\).*\n\1.*$/\1\n\1/;D
A test run of the script (named longestprefix), with strings supplied as a "here document":
$ ./longestprefix <<EOF
> interspecies
> interstelar
> interstate
> EOF
inters
$

Categories