When I wrote in JavaScript "Ł" > "Z" it returns true. In Unicode order it should be of course false. How to fix this? My site is using UTF-8.
You can use Intl.Collator or String.prototype.localeCompare, introduced by ECMAScript Internationalization API:
"Ł".localeCompare("Z", "pl"); // -1
new Intl.Collator("pl").compare("Ł","Z"); // -1
-1 means that Ł comes before Z, like you want.
Note it only works on latest browsers, though.
Here is an example for the french alphabet that could help you for a custom sort:
var alpha = function(alphabet, dir, caseSensitive){
return function(a, b){
var pos = 0,
min = Math.min(a.length, b.length);
dir = dir || 1;
caseSensitive = caseSensitive || false;
if(!caseSensitive){
a = a.toLowerCase();
b = b.toLowerCase();
}
while(a.charAt(pos) === b.charAt(pos) && pos < min){ pos++; }
return alphabet.indexOf(a.charAt(pos)) > alphabet.indexOf(b.charAt(pos)) ?
dir:-dir;
};
};
To use it on an array of strings a:
a.sort(
alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZaàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz')
);
Add 1 or -1 as the second parameter of alpha() to sort ascending or descending.
Add true as the 3rd parameter to sort case sensitive.
You may need to add numbers and special chars to the alphabet list
You may be able to build your own sorting function using localeCompare() that - at least according to the MDC article on the topic - should sort things correctly.
If that doesn't work out, here is an interesting SO question where the OP employs string replacement to build a "brute-force" sorting mechanism.
Also in that question, the OP shows how to build a custom textExtract function for the jQuery tablesorter plugin that does locale-aware sorting - maybe also worth a look.
Edit: As a totally far-out idea - I have no idea whether this is feasible at all, especially because of performance concerns - if you are working with PHP/mySQL on the back-end anyway, I would like to mention the possibility of sending an Ajax query to a mySQL instance to have it sorted there. mySQL is great at sorting locale aware data, because you can force sorting operations into a specific collation using e.g. ORDER BY xyz COLLATE utf8_polish_ci, COLLATE utf8_german_ci.... those collations would take care of all sorting woes at once.
Mic's code improved for non-mentioned chars:
var alpha = function(alphabet, dir, caseSensitive){
dir = dir || 1;
function compareLetters(a, b) {
var ia = alphabet.indexOf(a);
var ib = alphabet.indexOf(b);
if(ia === -1 || ib === -1) {
if(ib !== -1)
return a > 'a';
if(ia !== -1)
return 'a' > b;
return a > b;
}
return ia > ib;
}
return function(a, b){
var pos = 0;
var min = Math.min(a.length, b.length);
caseSensitive = caseSensitive || false;
if(!caseSensitive){
a = a.toLowerCase();
b = b.toLowerCase();
}
while(a.charAt(pos) === b.charAt(pos) && pos < min){ pos++; }
return compareLetters(a.charAt(pos), b.charAt(pos)) ? dir:-dir;
};
};
function assert(bCondition, sErrorMessage) {
if (!bCondition) {
throw new Error(sErrorMessage);
}
}
assert(alpha("bac")("a", "b") === 1, "b is first than a");
assert(alpha("abc")("ac", "a") === 1, "shorter string is first than longer string");
assert(alpha("abc")("1abc", "0abc") === 1, "non-mentioned chars are compared as normal");
assert(alpha("abc")("0abc", "1abc") === -1, "non-mentioned chars are compared as normal [2]");
assert(alpha("abc")("0abc", "bbc") === -1, "non-mentioned chars are compared with mentioned chars in special way");
assert(alpha("abc")("zabc", "abc") === 1, "non-mentioned chars are compared with mentioned chars in special way [2]");
You have to keep two sortkey strings. One is for primary order, where German ä=a (primary a->a) and French é=e (primary sortkey e->e) and one for secondary order, where ä comes after a (translating a->azzzz in secondary key) or é comes after e (secondary key e->ezzzz). Especially in Czech some letters are variations of a letter (áéí…) whereas others stand in their full right in the list (ABCČD…GHChI…RŘSŠT…). Plus the problem to consider digraphs a single letters (primary ch->hzzzz). No trivial problem, and there should be a solution within JS.
Funny, I have to think about that problem and finished searching here, because it came in mind, that I can use my own javascript module. I wrote a module to generate a clean URL, therefor I have to translitate the input string... (http://pid.github.io/speakingurl/)
var mySlug = require('speakingurl').createSlug({
maintainCase: true,
separator: " "
});
var input = "Schöner Titel läßt grüßen!? Bel été !";
var result;
slug = mySlug(input);
console.log(result); // Output: "Schoener Titel laesst gruessen bel ete"
Now you can sort with this results. You can ex. store the original titel in the field "title" and the field for sorting in "title_sort" with the result of mySlug.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
This is a challenge to come up with the most elegant JavaScript, Ruby or other solution to a relatively trivial problem.
This problem is a more specific case of the Longest common substring problem. I need to only find the longest common starting substring in an array. This greatly simplifies the problem.
For example, the longest substring in [interspecies, interstelar, interstate] is "inters". However, I don't need to find "ific" in [specifics, terrific].
I've solved the problem by quickly coding up a solution in JavaScript as a part of my answer about shell-like tab-completion (test page here). Here is that solution, slightly tweaked:
function common_substring(data) {
var i, ch, memo, idx = 0
do {
memo = null
for (i=0; i < data.length; i++) {
ch = data[i].charAt(idx)
if (!ch) break
if (!memo) memo = ch
else if (ch != memo) break
}
} while (i == data.length && idx < data.length && ++idx)
return (data[0] || '').slice(0, idx)
}
This code is available in this Gist along with a similar solution in Ruby. You can clone the gist as a git repo to try it out:
$ git clone git://gist.github.com/257891.git substring-challenge
I'm not very happy with those solutions. I have a feeling they might be solved with more elegance and less execution complexity—that's why I'm posting this challenge.
I'm going to accept as an answer the solution I find the most elegant or concise. Here is for instance a crazy Ruby hack I come up with—defining the & operator on String:
# works with Ruby 1.8.7 and above
class String
def &(other)
difference = other.to_str.each_char.with_index.find { |ch, idx|
self[idx].nil? or ch != self[idx].chr
}
difference ? self[0, difference.last] : self
end
end
class Array
def common_substring
self.inject(nil) { |memo, str| memo.nil? ? str : memo & str }.to_s
end
end
Solutions in JavaScript or Ruby are preferred, but you can show off clever solution in other languages as long as you explain what's going on. Only code from standard library please.
Update: my favorite solutions
I've chosen the JavaScript sorting solution by kennebec as the "answer" because it struck me as both unexpected and genius. If we disregard the complexity of actual sorting (let's imagine it's infinitely optimized by the language implementation), the complexity of the solution is just comparing two strings.
Other great solutions:
"regex greed" by FM takes a minute or two to grasp, but then the elegance of it hits you. Yehuda Katz also made a regex solution, but it's more complex
commonprefix in Python — Roberto Bonvallet used a feature made for handling filesystem paths to solve this problem
Haskell one-liner is short as if it were compressed, and beautiful
the straightforward Ruby one-liner
Thanks for participating! As you can see from the comments, I learned a lot (even about Ruby).
It's a matter of taste, but this is a simple javascript version:
It sorts the array, and then looks just at the first and last items.
//longest common starting substring in an array
function sharedStart(array){
var A= array.concat().sort(),
a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;
while(i<L && a1.charAt(i)=== a2.charAt(i)) i++;
return a1.substring(0, i);
}
DEMOS
sharedStart(['interspecies', 'interstelar', 'interstate']) //=> 'inters'
sharedStart(['throne', 'throne']) //=> 'throne'
sharedStart(['throne', 'dungeon']) //=> ''
sharedStart(['cheese']) //=> 'cheese'
sharedStart([]) //=> ''
sharedStart(['prefix', 'suffix']) //=> ''
In Python:
>>> from os.path import commonprefix
>>> commonprefix('interspecies interstelar interstate'.split())
'inters'
Ruby one-liner:
l=strings.inject{|l,s| l=l.chop while l!=s[0...l.length];l}
My Haskell one-liner:
import Data.List
commonPre :: [String] -> String
commonPre = map head . takeWhile (\(x:xs)-> all (==x) xs) . transpose
EDIT: barkmadley gave a good explanation of the code below. I'd also add that haskell uses lazy evaluation, so we can be lazy about our use of transpose; it will only transpose our lists as far as necessary to find the end of the common prefix.
You just need to traverse all strings until they differ, then take the substring up to this point.
Pseudocode:
loop for i upfrom 0
while all strings[i] are equal
finally return substring[0..i]
Common Lisp:
(defun longest-common-starting-substring (&rest strings)
(loop for i from 0 below (apply #'min (mapcar #'length strings))
while (apply #'char=
(mapcar (lambda (string) (aref string i))
strings))
finally (return (subseq (first strings) 0 i))))
Yet another way to do it: use regex greed.
words = %w(interspecies interstelar interstate)
j = '='
str = ['', *words].join(j)
re = "[^#{j}]*"
str =~ /\A
(?: #{j} ( #{re} ) #{re} )
(?: #{j} \1 #{re} )*
\z/x
p $1
And the one-liner, courtesy of mislav (50 characters):
p ARGV.join(' ').match(/^(\w*)\w*(?: \1\w*)*$/)[1]
In Python I wouldn't use anything but the existing commonprefix function I showed in another answer, but I couldn't help to reinvent the wheel :P. This is my iterator-based approach:
>>> a = 'interspecies interstelar interstate'.split()
>>>
>>> from itertools import takewhile, chain, izip as zip, imap as map
>>> ''.join(chain(*takewhile(lambda s: len(s) == 1, map(set, zip(*a)))))
'inters'
Edit: Explanation of how this works.
zip generates tuples of elements taking one of each item of a at a time:
In [6]: list(zip(*a)) # here I use list() to expand the iterator
Out[6]:
[('i', 'i', 'i'),
('n', 'n', 'n'),
('t', 't', 't'),
('e', 'e', 'e'),
('r', 'r', 'r'),
('s', 's', 's'),
('p', 't', 't'),
('e', 'e', 'a'),
('c', 'l', 't'),
('i', 'a', 'e')]
By mapping set over these items, I get a series of unique letters:
In [7]: list(map(set, _)) # _ means the result of the last statement above
Out[7]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s']),
set(['p', 't']),
set(['a', 'e']),
set(['c', 'l', 't']),
set(['a', 'e', 'i'])]
takewhile(predicate, items) takes elements from this while the predicate is True; in this particular case, when the sets have one element, i.e. all the words have the same letter at that position:
In [8]: list(takewhile(lambda s: len(s) == 1, _))
Out[8]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s'])]
At this point we have an iterable of sets, each containing one letter of the prefix we were looking for. To construct the string, we chain them into a single iterable, from which we get the letters to join into the final string.
The magic of using iterators is that all items are generated on demand, so when takewhile stops asking for items, the zipping stops at that point and no unnecessary work is done. Each function call in my one-liner has a implicit for and an implicit break.
This is probably not the most concise solution (depends if you already have a library for this), but one elegant method is to use a trie. I use tries for implementing tab completion in my Scheme interpreter:
http://github.com/jcoglan/heist/blob/master/lib/trie.rb
For example:
tree = Trie.new
%w[interspecies interstelar interstate].each { |s| tree[s] = true }
tree.longest_prefix('')
#=> "inters"
I also use them for matching channel names with wildcards for the Bayeux protocol; see these:
http://github.com/jcoglan/faye/blob/master/client/channel.js
http://github.com/jcoglan/faye/blob/master/lib/faye/channel.rb
Just for the fun of it, here's a version written in (SWI-)PROLOG:
common_pre([[C|Cs]|Ss], [C|Res]) :-
maplist(head_tail(C), [[C|Cs]|Ss], RemSs), !,
common_pre(RemSs, Res).
common_pre(_, []).
head_tail(H, [H|T], T).
Running:
?- S=["interspecies", "interstelar", "interstate"], common_pre(S, CP), string_to_list(CPString, CP).
Gives:
CP = [105, 110, 116, 101, 114, 115],
CPString = "inters".
Explanation:
(SWI-)PROLOG treats strings as lists of character codes (numbers). All the predicate common_pre/2 does is recursively pattern-match to select the first code (C) from the head of the first list (string, [C|Cs]) in the list of all lists (all strings, [[C|Cs]|Ss]), and appends the matching code C to the result iff it is common to all (remaining) heads of all lists (strings), else it terminates.
Nice, clean, simple and efficient... :)
A javascript version based on #Svante's algorithm:
function commonSubstring(words){
var iChar, iWord,
refWord = words[0],
lRefWord = refWord.length,
lWords = words.length;
for (iChar = 0; iChar < lRefWord; iChar += 1) {
for (iWord = 1; iWord < lWords; iWord += 1) {
if (refWord[iChar] !== words[iWord][iChar]) {
return refWord.substring(0, iChar);
}
}
}
return refWord;
}
Combining answers by kennebec, Florian F and jberryman yields the following Haskell one-liner:
commonPrefix l = map fst . takeWhile (uncurry (==)) $ zip (minimum l) (maximum l)
With Control.Arrow one can get a point-free form:
commonPrefix = map fst . takeWhile (uncurry (==)) . uncurry zip . (minimum &&& maximum)
Python 2.6 (r26:66714, Oct 4 2008, 02:48:43)
>>> a = ['interspecies', 'interstelar', 'interstate']
>>> print a[0][:max(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
) + 1]
inters
i for i in range(min(map(len, a))), number of maximum lookups can't be greater than the length of the shortest string; in this example this would evaluate to [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
len(set(map(lambda e: e[i], a))), 1) create an array of the i-thcharacter for each string in the list; 2) make a set out of it; 3) determine the size of the set
[i for i in range(min(map(len, a))) if len(set(map(lambda e: e[i], a))) == 1], include just the characters, for which the size of the set is 1 (all characters at that position were the same ..); here it would evaluate to [0, 1, 2, 3, 4, 5]
finally take the max, add one, and get the substring ...
Note: the above does not work for a = ['intersyate', 'intersxate', 'interstate', 'intersrate'], but this would:
>>> index = len(
filter(lambda l: l[0] == l[1],
[ x for x in enumerate(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
)]))
>>> a[0][:index]
inters
Doesn't seem that complicated if you're not too concerned about ultimate performance:
def common_substring(data)
data.inject { |m, s| s[0,(0..m.length).find { |i| m[i] != s[i] }.to_i] }
end
One of the useful features of inject is the ability to pre-seed with the first element of the array being interated over. This avoids the nil memo check.
puts common_substring(%w[ interspecies interstelar interstate ]).inspect
# => "inters"
puts common_substring(%w[ feet feel feeble ]).inspect
# => "fee"
puts common_substring(%w[ fine firkin fail ]).inspect
# => "f"
puts common_substring(%w[ alpha bravo charlie ]).inspect
# => ""
puts common_substring(%w[ fork ]).inspect
# => "fork"
puts common_substring(%w[ fork forks ]).inspect
# => "fork"
Update: If golf is the game here, then 67 characters:
def f(d)d.inject{|m,s|s[0,(0..m.size).find{|i|m[i]!=s[i]}.to_i]}end
This one is very similar to Roberto Bonvallet's solution, except in ruby.
chars = %w[interspecies interstelar interstate].map {|w| w.split('') }
chars[0].zip(*chars[1..-1]).map { |c| c.uniq }.take_while { |c| c.size == 1 }.join
The first line replaces each word with an array of chars. Next, I use zip to create this data structure:
[["i", "i", "i"], ["n", "n", "n"], ["t", "t", "t"], ...
map and uniq reduce this to [["i"],["n"],["t"], ...
take_while pulls the chars off the array until it finds one where the size isn't one (meaning not all chars were the same). Finally, I join them back together.
The accepted solution is broken (for example, it returns a for strings like ['a', 'ba']). The fix is very simple, you literally have to change only 3 characters (from indexOf(tem1) == -1 to indexOf(tem1) != 0) and the function would work as expected.
Unfortunately, when I tried to edit the answer to fix the typo, SO told me that "edits must be at least 6 characters". I could change more then those 3 chars, by improving naming and readability but that feels like a little bit too much.
So, below is a fixed and improved (at least from my point of view) version of the kennebec's solution:
function commonPrefix(words) {
max_word = words.reduce(function(a, b) { return a > b ? a : b });
prefix = words.reduce(function(a, b) { return a > b ? b : a }); // min word
while(max_word.indexOf(prefix) != 0) {
prefix = prefix.slice(0, -1);
}
return prefix;
}
(on jsFiddle)
Note, that it uses reduce method (JavaScript 1.8) in order to find alphanumeric max / min instead of sorting the array and then fetching the first and the last elements of it.
While reading these answers with all the fancy functional programming, sorting and regexes and whatnot, I just thought: what's wrong a little bit of C? So here's a goofy looking little program.
#include <stdio.h>
int main (int argc, char *argv[])
{
int i = -1, j, c;
if (argc < 2)
return 1;
while (c = argv[1][++i])
for (j = 2; j < argc; j++)
if (argv[j][i] != c)
goto out;
out:
printf("Longest common prefix: %.*s\n", i, argv[1]);
}
Compile it, run it with your list of strings as command line arguments, then upvote me for using goto!
Here's a solution using regular expressions in Ruby:
def build_regex(string)
arr = []
arr << string.dup while string.chop!
Regexp.new("^(#{arr.join("|")})")
end
def substring(first, *strings)
strings.inject(first) do |accum, string|
build_regex(accum).match(string)[0]
end
end
I would do the following:
Take the first string of the array as the initial starting substring.
Take the next string of the array and compare the characters until the end of one of the strings is reached or a mismatch is found. If a mismatch is found, reduce starting substring to the length where the mismatch was found.
Repeat step 2 until all strings have been tested.
Here’s a JavaScript implementation:
var array = ["interspecies", "interstelar", "interstate"],
prefix = array[0],
len = prefix.length;
for (i=1; i<array.length; i++) {
for (j=0, len=Math.min(len,array[j].length); j<len; j++) {
if (prefix[j] != array[i][j]) {
len = j;
prefix = prefix.substr(0, len);
break;
}
}
}
Instead of sorting, you could just get the min and max of the strings.
To me, elegance in a computer program is a balance of speed and simplicity.
It should not do unnecessary computation, and it should be simple enough to make its correctness evident.
I could call the sorting solution "clever", but not "elegant".
Oftentimes it's more elegant to use a mature open source library instead of rolling your own. Then, if it doesn't completely suit your needs, you can extend it or modify it to improve it, and let the community decide if that belongs in the library.
diff-lcs is a good Ruby gem for least common substring.
My solution in Java:
public static String compute(Collection<String> strings) {
if(strings.isEmpty()) return "";
Set<Character> v = new HashSet<Character>();
int i = 0;
try {
while(true) {
for(String s : strings) v.add(s.charAt(i));
if(v.size() > 1) break;
v.clear();
i++;
}
} catch(StringIndexOutOfBoundsException ex) {}
return strings.iterator().next().substring(0, i);
}
Golfed JS solution just for fun:
w=["hello", "hell", "helen"];
c=w.reduce(function(p,c){
for(r="",i=0;p[i]==c[i];r+=p[i],i++){}
return r;
});
Here's an efficient solution in ruby. I based the idea of the strategy for a hi/lo guessing game where you iteratively zero in on the longest prefix.
Someone correct me if I'm wrong, but I think the complexity is O(n log n), where n is the length of the shortest string and the number of strings is considered a constant.
def common(strings)
lo = 0
hi = strings.map(&:length).min - 1
return '' if hi < lo
guess, last_guess = lo, hi
while guess != last_guess
last_guess = guess
guess = lo + ((hi - lo) / 2.0).ceil
if strings.map { |s| s[0..guess] }.uniq.length == 1
lo = guess
else
hi = guess
end
end
strings.map { |s| s[0...guess] }.uniq.length == 1 ? strings.first[0...guess] : ''
end
And some checks that it works:
>> common %w{ interspecies interstelar interstate }
=> "inters"
>> common %w{ dog dalmation }
=> "d"
>> common %w{ asdf qwerty }
=> ""
>> common ['', 'asdf']
=> ""
Fun alternative Ruby solution:
def common_prefix(*strings)
chars = strings.map(&:chars)
length = chars.first.zip( *chars[1..-1] ).index{ |a| a.uniq.length>1 }
strings.first[0,length]
end
p common_prefix( 'foon', 'foost', 'forlorn' ) #=> "fo"
p common_prefix( 'foost', 'foobar', 'foon' ) #=> "foo"
p common_prefix( 'a','b' ) #=> ""
It might help speed if you used chars = strings.sort_by(&:length).map(&:chars), since the shorter the first string, the shorter the arrays created by zip. However, if you cared about speed, you probably shouldn't use this solution anyhow. :)
Javascript clone of AShelly's excellent answer.
Requires Array#reduce which is supported only in firefox.
var strings = ["interspecies", "intermediate", "interrogation"]
var sub = strings.reduce(function(l,r) {
while(l!=r.slice(0,l.length)) {
l = l.slice(0, -1);
}
return l;
});
This is by no means elegant, but if you want concise:
Ruby, 71 chars
def f(a)b=a[0];b[0,(0..b.size).find{|n|a.any?{|i|i[0,n]!=b[0,n]}}-1]end
If you want that unrolled it looks like this:
def f(words)
first_word = words[0];
first_word[0, (0..(first_word.size)).find { |num_chars|
words.any? { |word| word[0, num_chars] != first_word[0, num_chars] }
} - 1]
end
It's not code golf, but you asked for somewhat elegant, and I tend to think recursion is fun. Java.
/** Recursively find the common prefix. */
public String findCommonPrefix(String[] strings) {
int minLength = findMinLength(strings);
if (isFirstCharacterSame(strings)) {
return strings[0].charAt(0) + findCommonPrefix(removeFirstCharacter(strings));
} else {
return "";
}
}
/** Get the minimum length of a string in strings[]. */
private int findMinLength(final String[] strings) {
int length = strings[0].size();
for (String string : strings) {
if (string.size() < length) {
length = string.size();
}
}
return length;
}
/** Compare the first character of all strings. */
private boolean isFirstCharacterSame(String[] strings) {
char c = string[0].charAt(0);
for (String string : strings) {
if (c != string.charAt(0)) return false;
}
return true;
}
/** Remove the first character of each string in the array,
and return a new array with the results. */
private String[] removeFirstCharacter(String[] source) {
String[] result = new String[source.length];
for (int i=0; i<result.length; i++) {
result[i] = source[i].substring(1);
}
return result;
}
A ruby version based on #Svante's algorithm. Runs ~3x as fast as my first one.
def common_prefix set
i=0
rest=set[1..-1]
set[0].each_byte{|c|
rest.each{|e|return set[0][0...i] if e[i]!=c}
i+=1
}
set
end
My Javascript solution:
IMOP, using sort is too tricky.
My solution is compare letter by letter through looping the array.
Return string if letter is not macthed.
This is my solution:
var longestCommonPrefix = function(strs){
if(strs.length < 1){
return '';
}
var p = 0, i = 0, c = strs[0][0];
while(p < strs[i].length && strs[i][p] === c){
i++;
if(i === strs.length){
i = 0;
p++;
c = strs[0][p];
}
}
return strs[0].substr(0, p);
};
Realizing the risk of this turning into a match of code golf (or is that the intention?), here's my solution using sed, copied from my answer to another SO question and shortened to 36 chars (30 of which are the actual sed expression). It expects the strings (each on a seperate line) to be supplied on standard input or in files passed as additional arguments.
sed 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'
A script with sed in the shebang line weighs in at 45 chars:
#!/bin/sed -f
N;s/^\(.*\).*\n\1.*$/\1\n\1/;D
A test run of the script (named longestprefix), with strings supplied as a "here document":
$ ./longestprefix <<EOF
> interspecies
> interstelar
> interstate
> EOF
inters
$