python equivalent of javascript array destructuring [duplicate] - javascript

list.append() appends to the end of a list. This explains that list.prepend() does not exist due to performance concerns for large lists. For a short list, how do I prepend a value?

The s.insert(0, x) form is the most common.
Whenever you see it though, it may be time to consider using a collections.deque instead of a list. Prepending to a deque runs in constant time. Prepending to a list runs in linear time.

This creates a new list with x prepended to it, rather than modifying an existing list:
new_list = [x] + old_list
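A quick sketch of the difference: concatenation builds a new list and leaves `old_list` (and anything else referring to it) unchanged, whereas `insert` mutates the existing list in place. The names `x`, `old_list` and `alias` are just illustrative.

```python
old_list = [2, 3, 4]
alias = old_list            # a second reference to the same list object
x = 1

new_list = [x] + old_list   # builds a brand-new list
print(new_list)             # [1, 2, 3, 4]
print(old_list, alias)      # [2, 3, 4] [2, 3, 4]: the original is untouched

old_list.insert(0, x)       # mutates the existing list in place
print(alias)                # [1, 2, 3, 4]: every reference sees the change
```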

What's the idiomatic syntax for prepending to a short python list?
You don't usually want to repeatedly prepend to a list in Python.
If the list is short, and you're not doing it a lot... then ok.
list.insert
list.insert can be used this way:
list.insert(0, x)
But this is inefficient: in Python, a list is an array of pointers, and Python must move every pointer in the list down by one slot to make room for the pointer to your object in the first slot. So this is really only efficient for rather short lists, as you ask.
Here's a snippet from the CPython source where this is implemented - and as you can see, we start at the end of the array and move everything down by one for every insertion:
for (i = n; --i >= where; )
    items[i+1] = items[i];
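To make the cost concrete, here is a rough Python model of that loop, assuming a plain Python list stands in for the C array of pointers and that `0 <= where <= len(items)`. `insert_by_shifting` is a hypothetical helper for illustration only, not something to use in real code: it grows the list by one slot and then moves every item at or after `where` one position to the right, which is why prepending is O(n).

```python
def insert_by_shifting(items, where, value):
    """Hypothetical model of list.insert: shift the tail right, then store."""
    n = len(items)
    items.append(None)  # grow by one slot (CPython resizes its C array)
    # Same order as the C loop: start at the end and walk down to `where`
    for i in range(n - 1, where - 1, -1):
        items[i + 1] = items[i]  # move one pointer down by one slot
    items[where] = value         # store the new pointer in the freed slot
    return n - where             # how many existing items had to move


data = [10, 20, 30]
moved = insert_by_shifting(data, 0, 5)
print(data, moved)  # [5, 10, 20, 30] 3
```

Inserting at position 0 moves all `n` existing items, while inserting near the end moves only a few.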
If you want a container that's efficient at prepending elements, you want a linked list. Python has one that can insert at the beginning and end quickly: it's called a deque (collections.deque, which CPython implements as a doubly linked list of fixed-size blocks).
deque.appendleft
A collections.deque has many of the methods of a list. list.sort is an exception, making deque definitively not entirely Liskov substitutable for list.
>>> from collections import deque
>>> set(dir(list)) - set(dir(deque))
{'sort'}
The deque also has an appendleft method (as well as popleft). The deque is a double-ended queue and a doubly linked list: no matter the length, it always takes the same amount of time to prepend something. In big O notation, that's O(1) versus O(n) time for lists. Here's the usage:
>>> import collections
>>> d = collections.deque('1234')
>>> d
deque(['1', '2', '3', '4'])
>>> d.appendleft('0')
>>> d
deque(['0', '1', '2', '3', '4'])
deque.extendleft
Also relevant is the deque's extendleft method, which iteratively prepends:
>>> from collections import deque
>>> d2 = deque('def')
>>> d2.extendleft('cba')
>>> d2
deque(['a', 'b', 'c', 'd', 'e', 'f'])
Note that each element will be prepended one at a time, thus effectively reversing their order.
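If you want the prepended items to keep their original order, one common approach (a sketch, not the only one) is to reverse them before calling extendleft:

```python
from collections import deque

d1 = deque('def')
d1.extendleft('abc')            # items are prepended one at a time
print(d1)                       # deque(['c', 'b', 'a', 'd', 'e', 'f'])

d2 = deque('def')
d2.extendleft(reversed('abc'))  # feeds 'c', 'b', 'a', so the result reads in order
print(d2)                       # deque(['a', 'b', 'c', 'd', 'e', 'f'])
```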
Performance of list versus deque
First, we set up some functions that prepend iteratively:
import timeit
from collections import deque

def list_insert_0(prepends: int):
    l = []
    for i in range(prepends):
        l.insert(0, i)

def list_slice_insert(prepends):
    l = []
    for i in range(prepends):
        l[:0] = [i]      # semantically same as list.insert(0, i)

def list_add(prepends):
    l = []
    for i in range(prepends):
        l = [i] + l      # caveat: new list each time

def deque_appendleft(prepends):
    d = deque()
    for i in range(prepends):
        d.appendleft(i)  # semantically same as list.insert(0, i)

def deque_extendleft(prepends):
    d = deque()
    d.extendleft(range(prepends)) # semantically same as deque_appendleft above
And a function for analysis, so that we can fairly compare all operations across a range of usages:
def compare_prepends(n, runs_per_trial):
    results = {}
    for function in (
        list_insert_0, list_slice_insert,
        list_add, deque_appendleft, deque_extendleft,
        ):
        shortest_time = min(timeit.repeat(
            lambda: function(n), number=runs_per_trial))
        results[function.__name__] = shortest_time
    ranked_methods = sorted(results.items(), key=lambda kv: kv[1])
    for name, duration in ranked_methods:
        print(f'{name} took {duration} seconds')
And the performance, adjusting the number of runs per trial down to compensate for the longer running times of more prepends (repeat does five trials by default since Python 3.7, and three in earlier versions):
compare_prepends(20, 1_000_000)
compare_prepends(100, 100_000)
compare_prepends(500, 100_000)
compare_prepends(2500, 10_000)
>>> compare_prepends(20, 1_000_000)
deque_extendleft took 0.6490256823599339 seconds
deque_appendleft took 1.4702797569334507 seconds
list_insert_0 took 1.9417422469705343 seconds
list_add took 2.7092894352972507 seconds
list_slice_insert took 3.1809083241969347 seconds
>>> compare_prepends(100, 100_000)
deque_extendleft took 0.1177942156791687 seconds
deque_appendleft took 0.5385235995054245 seconds
list_insert_0 took 0.9471780974417925 seconds
list_slice_insert took 1.4850486349314451 seconds
list_add took 2.1660344172269106 seconds
>>> compare_prepends(500, 100_000)
deque_extendleft took 0.7309095915406942 seconds
deque_appendleft took 2.895373275503516 seconds
list_slice_insert took 8.782583676278591 seconds
list_insert_0 took 8.931685039773583 seconds
list_add took 30.113558700308204 seconds
>>> compare_prepends(2500, 10_000)
deque_extendleft took 0.4839253816753626 seconds
deque_appendleft took 1.5615574326366186 seconds
list_slice_insert took 6.712615916505456 seconds
list_insert_0 took 13.894083382561803 seconds
list_add took 72.1727528590709 seconds
The deque is much faster. As the lists get longer, deques perform even better. If you can use deque's extendleft you'll probably get the best performance that way.
If you must use lists, keep in mind that for small lists, list.insert works faster, but for larger lists, inserting using slice notation becomes faster.
Don't prepend to lists
Lists were meant to be appended to, not prepended to. If you have a situation where this kind of prepending is hurting the performance of your code, either switch to a deque or, if you can reverse your semantics and accomplish the same goal, reverse your list and append instead.
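The "reverse your semantics" option can look like this sketch: instead of calling insert(0, ...) for every item, append each item and reverse once at the end. Both function names are hypothetical, chosen only to contrast the two patterns.

```python
def build_by_prepending(values):
    """Slow pattern: every insert(0, ...) shifts the whole list."""
    result = []
    for v in values:
        result.insert(0, v)
    return result


def build_by_appending(values):
    """Same result: append (amortized O(1)), then reverse once at the end."""
    result = []
    for v in values:
        result.append(v)
    result.reverse()  # a single O(n) pass instead of one shift per item
    return result


print(build_by_prepending(range(5)))  # [4, 3, 2, 1, 0]
print(build_by_appending(range(5)))   # [4, 3, 2, 1, 0]
```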
In general, avoid prepending to the built-in Python list object.

If anyone finds this question like I did, here are my performance tests of the proposed methods:
Python 2.7.8
In [1]: %timeit ([1]*1000000).insert(0, 0)
100 loops, best of 3: 4.62 ms per loop
In [2]: %timeit ([1]*1000000)[0:0] = [0]
100 loops, best of 3: 4.55 ms per loop
In [3]: %timeit [0] + [1]*1000000
100 loops, best of 3: 8.04 ms per loop
As you can see, insert and slice assignment are almost twice as fast as explicit concatenation, and are very close to each other. (Note that each measurement also includes building the million-element list.) As Raymond Hettinger noted, insert is the more common option, and I personally prefer this way of prepending to a list.

In my opinion, the most elegant and idiomatic way of prepending an element or list to another list in Python is to use the unpacking operator * (also called the expansion operator), which works inside list displays since Python 3.5:
# Initial list
l = [4, 5, 6]
# Modification
l = [1, 2, 3, *l]
Where the resulting list after the modification is [1, 2, 3, 4, 5, 6]
I also like simply combining two lists with the operator +, as shown,
# Prepends [1, 2, 3] to l
l = [1, 2, 3] + l
# Prepends element 42 to l
l = [42] + l
I don't like the other common approach, l.insert(0, value), as it requires a magic number. Moreover, insert() only allows prepending a single element, whereas the approach above uses the same syntax for prepending a single element or multiple elements.
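One difference worth knowing: both [1, 2, 3, *l] and [1, 2, 3] + l build a new list and rebind the name, so other references to the old list don't see the change. If you need to prepend several elements to the existing list object in place, slice assignment to the empty slice at the front does that. A sketch, with `items` and `view` as illustrative names:

```python
items = [4, 5, 6]
view = items                # another reference to the same list

items = [1, 2, 3, *items]   # new list; `view` still points at the old one
print(view)                 # [4, 5, 6]

view[:0] = [1, 2, 3]        # in place: inserts before index 0
print(view)                 # [1, 2, 3, 4, 5, 6]
```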

Let's go over four methods:
Using insert()
>>>
>>> l = list(range(5))
>>> l
[0, 1, 2, 3, 4]
>>> l.insert(0, 5)
>>> l
[5, 0, 1, 2, 3, 4]
>>>
Using [] and +
>>>
>>> l = list(range(5))
>>> l
[0, 1, 2, 3, 4]
>>> l = [5] + l
>>> l
[5, 0, 1, 2, 3, 4]
>>>
Using Slicing
>>>
>>> l = list(range(5))
>>> l
[0, 1, 2, 3, 4]
>>> l[:0] = [5]
>>> l
[5, 0, 1, 2, 3, 4]
>>>
Using collections.deque.appendleft()
>>>
>>> from collections import deque
>>>
>>> l = list(range(5))
>>> l
[0, 1, 2, 3, 4]
>>> l = deque(l)
>>> l.appendleft(5)
>>> l = list(l)
>>> l
[5, 0, 1, 2, 3, 4]
>>>
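Note that deque(l) and list(l) each copy the whole sequence, so converting back and forth around a single appendleft still costs O(n). The conversion only pays off if you batch many prepends between conversions, as in this sketch (prepend_many is a hypothetical helper):

```python
from collections import deque


def prepend_many(lst, new_items):
    """Prepend each of new_items, one at a time, converting only twice."""
    d = deque(lst)          # one O(n) copy in
    for item in new_items:
        d.appendleft(item)  # O(1) per prepend
    return list(d)          # one O(n) copy out


print(prepend_many([0, 1, 2], [3, 4, 5]))  # [5, 4, 3, 0, 1, 2]
```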

I would have done something quite straightforward in Python >= 3.5 (which added unpacking inside list displays):
my_list = [0, *my_list]
It may not be the most efficient way, but it's the most Pythonic in my opinion.


How does Python's slice notation work? That is: when I write code like a[x:y:z], a[:], a[::2] etc., how can I understand which elements end up in the slice? Please include references where appropriate.
See also: Why are slice and range upper-bound exclusive?
The syntax is:
a[start:stop] # items start through stop-1
a[start:] # items start through the rest of the array
a[:stop] # items from the beginning through stop-1
a[:] # a copy of the whole array
There is also the step value, which can be used with any of the above:
a[start:stop:step] # start through not past stop, by step
The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).
The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:
a[-1] # last item in the array
a[-2:] # last two items in the array
a[:-2] # everything except the last two items
Similarly, step may be a negative number:
a[::-1] # all items in the array, reversed
a[1::-1] # the first two items, reversed
a[:-3:-1] # the last two items, reversed
a[-3::-1] # everything except the last two items, reversed
Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.
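The rules above can be checked directly. A small sketch using a list of ten integers (the name a matches the notation above):

```python
a = list(range(10))   # [0, 1, ..., 9]

print(a[2:5])     # [2, 3, 4]       start through stop-1
print(a[-2:])     # [8, 9]          last two items
print(a[:-2])     # [0, 1, 2, 3, 4, 5, 6, 7]
print(a[::3])     # [0, 3, 6, 9]    every third item
print(a[1::-1])   # [1, 0]          first two items, reversed
print(a[:-3:-1])  # [9, 8]          last two items, reversed
print(a[-3::-1])  # [7, 6, 5, 4, 3, 2, 1, 0]

short = [42]
print(short[:-2])  # []  fewer items than asked for: no error
```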
Relationship with the slice object
A slice object can represent a slicing operation, i.e.:
a[start:stop:step]
is equivalent to:
a[slice(start, stop, step)]
Slice objects also behave slightly differently depending on the number of arguments, similarly to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported.
To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].
While the :-based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.
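For example, a slice can be built at run time and reused. middle below is a hypothetical helper that trims k items from each end; it passes None as the stop when k is 0, because -0 is just 0 and slice(0, 0) would select nothing:

```python
def middle(k):
    """Hypothetical helper: a slice that drops k items from both ends."""
    return slice(k, -k if k else None)


data = list(range(8))
print(data[middle(2)])       # [2, 3, 4, 5]
print(data[middle(0)])       # [0, 1, 2, 3, 4, 5, 6, 7]
print("abcdefg"[middle(1)])  # bcdef
```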
The Python tutorial talks about it (scroll down a bit until you get to the part about slicing).
The ASCII art diagram is helpful too for remembering how slices work:
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n.
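Two consequences of this between-the-characters view, sketched as checks on the string 'Python': for 0 <= i <= j <= n, the slice s[i:j] has exactly j - i characters, and cutting at any edge i with s[:i] + s[i:] gives back the whole string.

```python
s = "Python"
n = len(s)

# The length of s[i:j] is j - i whenever 0 <= i <= j <= n
assert all(len(s[i:j]) == j - i
           for i in range(n + 1) for j in range(i, n + 1))

# Cutting at any edge i and rejoining gives back the original string
assert all(s[:i] + s[i:] == s for i in range(n + 1))

print(s[0:2], s[2:6], s[-2:])  # Py thon on
```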
Enumerating the possibilities allowed by the grammar for the sequence x:
>>> x[:] # [x[0], x[1], ..., x[-1] ]
>>> x[low:] # [x[low], x[low+1], ..., x[-1] ]
>>> x[:high] # [x[0], x[1], ..., x[high-1]]
>>> x[low:high] # [x[low], x[low+1], ..., x[high-1]]
>>> x[::stride] # [x[0], x[stride], ..., x[-1] ]
>>> x[low::stride] # [x[low], x[low+stride], ..., x[-1] ]
>>> x[:high:stride] # [x[0], x[stride], ..., x[high-1]]
>>> x[low:high:stride] # [x[low], x[low+stride], ..., x[high-1]]
Of course, if (high-low)%stride != 0, then the end point will be a little lower than high-1.
If stride is negative, the ordering is changed a bit since we're counting down:
>>> x[::-stride] # [x[-1], x[-1-stride], ..., x[0] ]
>>> x[high::-stride] # [x[high], x[high-stride], ..., x[0] ]
>>> x[:low:-stride] # [x[-1], x[-1-stride], ..., x[low+1]]
>>> x[high:low:-stride] # [x[high], x[high-stride], ..., x[low+1]]
Extended slicing (with commas and ellipses) is mostly used only by special data structures (like NumPy); the basic sequences don't support it.
>>> class slicee:
...     def __getitem__(self, item):
...         return repr(item)
...
>>> slicee()[0, 1:2, ::5, ...]
'(0, slice(1, 2, None), slice(None, None, 5), Ellipsis)'
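Because __getitem__ simply receives a slice object, a custom sequence can support ordinary slicing by normalizing it with slice.indices() and iterating over the resulting range. Squares is a made-up class for illustration, not a standard type:

```python
class Squares:
    """Hypothetical read-only sequence of the first n square numbers."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() clips start/stop/step to this length, as lists do
            return [i * i for i in range(*item.indices(self.n))]
        if item < 0:
            item += self.n
        if not 0 <= item < self.n:
            raise IndexError("Squares index out of range")
        return item * item


sq = Squares(6)
print(sq[2:5])   # [4, 9, 16]
print(sq[::-2])  # [25, 9, 1]
print(sq[-1])    # 25
```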
The answers above don't discuss slice assignment. To understand slice assignment, it's helpful to add another concept to the ASCII art:
                +---+---+---+---+---+---+
                | P | y | t | h | o | n |
                +---+---+---+---+---+---+
Slice position: 0   1   2   3   4   5   6
Index position:   0   1   2   3   4   5
>>> p = ['P','y','t','h','o','n']
# Why the two sets of numbers:
# indexing gives items, not lists
>>> p[0]
'P'
>>> p[5]
'n'
# Slicing gives lists
>>> p[0:1]
['P']
>>> p[0:2]
['P','y']
One heuristic is, for a slice from zero to n, think: "zero is the beginning, start at the beginning and take n items in a list".
>>> p[5] # the last of six items, indexed from zero
'n'
>>> p[0:5] # does NOT include the last item!
['P','y','t','h','o']
>>> p[0:6] # not p[0:5]!!!
['P','y','t','h','o','n']
Another heuristic is, "for any slice, replace the start by zero, apply the previous heuristic to get the end of the list, then count the first number back up to chop items off the beginning"
>>> p[0:4] # Start at the beginning and count out 4 items
['P','y','t','h']
>>> p[1:4] # Take one item off the front
['y','t','h']
>>> p[2:4] # Take two items off the front
['t','h']
# etc.
The first rule of slice assignment is that since slicing returns a list, slice assignment requires a list (or other iterable):
>>> p[2:3]
['t']
>>> p[2:3] = ['T']
>>> p
['P','y','T','h','o','n']
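>>> p[2:3] = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only assign an iterable
(A string does not trigger this error, because a string is itself an iterable: p[2:3] = 't' would succeed and put a lowercase 't' back.)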
The second rule of slice assignment, which you can also see above, is that whatever portion of the list is returned by slice indexing, that's the same portion that is changed by slice assignment:
>>> p[2:4]
['T','h']
>>> p[2:4] = ['t','r']
>>> p
['P','y','t','r','o','n']
The third rule of slice assignment is, the assigned list (iterable) doesn't have to have the same length; the indexed slice is simply sliced out and replaced en masse by whatever is being assigned:
>>> p = ['P','y','t','h','o','n'] # Start over
>>> p[2:4] = ['s','p','a','m']
>>> p
['P','y','s','p','a','m','o','n']
The trickiest part to get used to is assignment to empty slices. Using heuristics 1 and 2, it's easy to get your head around indexing an empty slice:
>>> p = ['P','y','t','h','o','n']
>>> p[0:4]
['P','y','t','h']
>>> p[1:4]
['y','t','h']
>>> p[2:4]
['t','h']
>>> p[3:4]
['h']
>>> p[4:4]
[]
And then once you've seen that, slice assignment to the empty slice makes sense too:
>>> p = ['P','y','t','h','o','n']
>>> p[2:4] = ['x','y'] # Assigned list is same length as slice
>>> p
['P','y','x','y','o','n'] # Result is same length
>>> p = ['P','y','t','h','o','n']
>>> p[3:4] = ['x','y'] # Assigned list is longer than slice
>>> p
['P','y','t','x','y','o','n'] # The result is longer
>>> p = ['P','y','t','h','o','n']
>>> p[4:4] = ['x','y']
>>> p
['P','y','t','h','x','y','o','n'] # The result is longer still
Note that, since we are not changing the second number of the slice (4), the inserted items always stack right up against the 'o', even when we're assigning to the empty slice. So the position for the empty slice assignment is the logical extension of the positions for the non-empty slice assignments.
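In other words, assigning to the empty slice p[i:i] inserts the new items before position i without removing anything, so p[i:i] = [x] has the same effect as p.insert(i, x). A quick check:

```python
p = ['P', 'y', 't', 'h', 'o', 'n']
q = list(p)

p[4:4] = ['x']    # insert via empty-slice assignment
q.insert(4, 'x')  # the same insertion with list.insert
print(p)          # ['P', 'y', 't', 'h', 'x', 'o', 'n']
print(p == q)     # True
```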
Backing up a little bit, what happens when you keep going with our procession of counting up the slice beginning?
>>> p = ['P','y','t','h','o','n']
>>> p[0:4]
['P','y','t','h']
>>> p[1:4]
['y','t','h']
>>> p[2:4]
['t','h']
>>> p[3:4]
['h']
>>> p[4:4]
[]
>>> p[5:4]
[]
>>> p[6:4]
[]
With slicing, once you're done, you're done; it doesn't start slicing backwards. In Python you don't get negative strides unless you explicitly ask for them by using a negative number.
>>> p[5:3:-1]
['n','o']
There are some weird consequences to the "once you're done, you're done" rule:
>>> p[4:4]
[]
>>> p[5:4]
[]
>>> p[6:4]
[]
>>> p[6]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
In fact, compared to indexing, Python slicing is bizarrely error-proof:
>>> p[100:200]
[]
>>> p[int(2e99):int(1e99)]
[]
This can come in handy sometimes, but it can also lead to somewhat strange behavior:
>>> p
['P', 'y', 't', 'h', 'o', 'n']
>>> p[int(2e99):int(1e99)] = ['p','o','w','e','r']
>>> p
['P', 'y', 't', 'h', 'o', 'n', 'p', 'o', 'w', 'e', 'r']
Depending on your application, that might... or might not... be what you were hoping for there!
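If you would rather get an error, one option is a small wrapper that checks the bounds before slicing. strict_slice is a hypothetical helper, and this sketch only accepts non-negative start/stop with the default step:

```python
def strict_slice(seq, start, stop):
    """Hypothetical helper: like seq[start:stop], but rejects out-of-range bounds."""
    if not 0 <= start <= stop <= len(seq):
        raise IndexError(f"slice [{start}:{stop}] out of range for length {len(seq)}")
    return seq[start:stop]


p = ['P', 'y', 't', 'h', 'o', 'n']
print(strict_slice(p, 1, 4))  # ['y', 't', 'h']
try:
    strict_slice(p, 100, 200)
except IndexError as exc:
    print(exc)                # slice [100:200] out of range for length 6
```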
Below is the text of my original answer. It has been useful to many people, so I didn't want to delete it.
>>> r=[1,2,3,4]
>>> r[1:1]
[]
>>> r[1:1]=[9,8]
>>> r
[1, 9, 8, 2, 3, 4]
>>> r[1:1]=['blah']
>>> r
[1, 'blah', 9, 8, 2, 3, 4]
This may also clarify the difference between slicing and indexing.
Explain Python's slice notation
In short, the colons (:) in subscript notation (subscriptable[subscriptarg]) make slice notation, which has the optional arguments start, stop, and step:
sliceable[start:stop:step]
Python slicing is a computationally fast way to methodically access parts of your data. In my opinion, to be even an intermediate Python programmer, it's one aspect of the language that it is necessary to be familiar with.
Important Definitions
To begin with, let's define a few terms:
start: the beginning index of the slice, it will include the element at this index unless it is the same as stop, defaults to 0, i.e. the first index. If it's negative, it means to start n items from the end.
stop: the ending index of the slice, it does not include the element at this index, defaults to length of the sequence being sliced, that is, up to and including the end.
step: the amount by which the index increases, defaults to 1. If it's negative, you're slicing over the iterable in reverse.
How Indexing Works
You can make any of these positive or negative numbers. The meaning of the positive numbers is straightforward, but for negative numbers, just like indexes in Python, you count backwards from the end for the start and stop, and for the step, you simply decrement your index. This example is from the documentation's tutorial, but I've modified it slightly to indicate which item in a sequence each index references:
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
   0   1   2   3   4   5
  -6  -5  -4  -3  -2  -1
How Slicing Works
To use slice notation with a sequence that supports it, you must include at least one colon in the square brackets that follow the sequence (the brackets actually call the sequence's __getitem__ method, according to the Python data model).
Slice notation works like this:
sequence[start:stop:step]
And recall that there are defaults for start, stop, and step, so to access the defaults, simply leave out the argument.
Slice notation to get the last nine elements from a list (or any other sequence that supports it, like a string) would look like this:
my_list[-9:]
When I see this, I read the part in the brackets as "9th from the end, to the end." (Actually, I abbreviate it mentally as "-9, on")
Explanation:
The full notation is
my_list[-9:None:None]
and to substitute the defaults (actually when step is negative, stop's default is -len(my_list) - 1, so None for stop really just means it goes to whichever end step takes it to):
my_list[-9:len(my_list):1]
The colon, :, is what tells Python you're giving it a slice and not a regular index. That's why the idiomatic way of making a shallow copy of lists in Python 2 is
list_copy = sequence[:]
And clearing them is with:
del my_list[:]
(Python 3 gets a list.copy and list.clear method.)
When step is negative, the defaults for start and stop change
By default, when the step argument is empty (or None), it is assigned to +1.
But you can pass in a negative integer, and the list (or most other standard sliceables) will be sliced from the end to the beginning.
Thus a negative slice will change the defaults for start and stop!
Confirming this in the source
I like to encourage users to read the source as well as the documentation. The source code for slice objects and this logic is found here. First we determine if step is negative:
step_is_negative = step_sign < 0;
If so, the lower bound is -1, meaning we slice all the way up to and including the beginning, and the upper bound is the length minus 1, meaning we start at the end. (Note that the semantics of this -1 is different from a -1 that users may pass as an index in Python to indicate the last item.)
if (step_is_negative) {
    lower = PyLong_FromLong(-1L);
    if (lower == NULL)
        goto error;
    upper = PyNumber_Add(length, lower);
    if (upper == NULL)
        goto error;
}
Otherwise step is positive, and the lower bound will be zero and the upper bound (which we go up to but not including) the length of the sliced list.
else {
    lower = _PyLong_Zero;
    Py_INCREF(lower);
    upper = length;
    Py_INCREF(upper);
}
Then, we may need to apply the defaults for start and stop—the default then for start is calculated as the upper bound when step is negative:
if (self->start == Py_None) {
    start = step_is_negative ? upper : lower;
    Py_INCREF(start);
}
and stop, the lower bound:
if (self->stop == Py_None) {
    stop = step_is_negative ? lower : upper;
    Py_INCREF(stop);
}
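Read together, those branches amount to the following Python sketch of the defaults. The helper name default_bounds is hypothetical; it covers only the case where start and stop are None and skips the clipping of explicit values. Comparing against slice.indices() shows it agrees:

```python
def default_bounds(length, step):
    """Hypothetical transcription of the C logic for start/stop == None."""
    if step < 0:
        lower, upper = -1, length - 1  # slice down to and including index 0
    else:
        lower, upper = 0, length       # slice up to but not including length
    start = upper if step < 0 else lower
    stop = lower if step < 0 else upper
    return start, stop


for step in (1, 2, -1, -3):
    start, stop, _ = slice(None, None, step).indices(5)
    assert (start, stop) == default_bounds(5, step)

print(default_bounds(5, -1))  # (4, -1)
```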
Give your slices a descriptive name!
You may find it useful to separate forming the slice from passing it to the list.__getitem__ method (that's what the square brackets do). Even if you're not new to it, it keeps your code more readable so that others that may have to read your code can more readily understand what you're doing.
However, you can't just assign some integers separated by colons to a variable. You need to use the slice object:
last_nine_slice = slice(-9, None)
The second argument, None, is required so that the first argument is interpreted as the start argument; otherwise, it would be the stop argument.
You can then pass the slice object to your sequence:
>>> list(range(100))[last_nine_slice]
[91, 92, 93, 94, 95, 96, 97, 98, 99]
It's interesting that ranges also take slices:
>>> range(100)[last_nine_slice]
range(91, 100)
Memory Considerations:
Since slices of Python lists create new objects in memory, another important function to be aware of is itertools.islice. Typically you'll want to iterate over a slice, not just have it created statically in memory. islice is perfect for this. One caveat: it doesn't support negative arguments to start, stop, or step, so if that's an issue you may need to calculate indices or reverse the iterable in advance.
import itertools
length = 100
last_nine_iter = itertools.islice(list(range(length)), length-9, None, 1)
list_last_nine = list(last_nine_iter)
and now:
>>> list_last_nine
[91, 92, 93, 94, 95, 96, 97, 98, 99]
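When the negative-index limitation of islice matters, for example when you want the last nine items of an iterator whose length you don't know, a different standard-library technique works: a collections.deque with maxlen keeps only the most recent items. This is a swapped-in technique, not something islice does, and last_n is a hypothetical name:

```python
from collections import deque


def last_n(iterable, n):
    """Return the last n items of any iterable, holding at most n in memory."""
    return list(deque(iterable, maxlen=n))


print(last_n(iter(range(100)), 9))   # [91, 92, 93, 94, 95, 96, 97, 98, 99]
print(last_n((c for c in "abc"), 5))  # ['a', 'b', 'c']
```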
The fact that list slices make a copy is a feature of lists themselves. If you're slicing advanced objects like a Pandas DataFrame, it may return a view on the original, and not a copy.
And a couple of things that weren't immediately obvious to me when I first saw the slicing syntax:
>>> x = [1,2,3,4,5,6]
>>> x[::-1]
[6,5,4,3,2,1]
Easy way to reverse sequences!
And if you wanted, for some reason, every second item in the reversed sequence:
>>> x = [1,2,3,4,5,6]
>>> x[::-2]
[6,4,2]
In Python 2.7
Slicing in Python
[a:b:c]
len = length of string, tuple or list
c -- default is +1. The sign of c indicates forward or backward, absolute value of c indicates steps. Default is forward with step size 1. Positive means forward, negative means backward.
a -- When c is positive or blank, default is 0. When c is negative, default is -1.
b -- When c is positive or blank, default is len. When c is negative, default is -(len+1).
Understanding index assignment is very important.
In forward direction, starts at 0 and ends at len-1
In backward direction, starts at -1 and ends at -len
When you say [a:b:c], you are saying depending on the sign of c (forward or backward), start at a and end at b (excluding element at bth index). Use the indexing rule above and remember you will only find elements in this range:
-len, -len+1, -len+2, ..., 0, 1, 2,3,4 , len -1
But this range continues in both directions infinitely:
...,-len -2 ,-len-1,-len, -len+1, -len+2, ..., 0, 1, 2,3,4 , len -1, len, len +1, len+2 , ....
For example:
0 1 2 3 4 5 6 7 8 9 10 11
a s t r i n g
-9 -8 -7 -6 -5 -4 -3 -2 -1
If, as you traverse using the rules for a, b, and c above, your choice overlaps the range above, you get a list of the elements touched during traversal; otherwise you get an empty list.
One last thing: if a and b are equal, you also get an empty list:
>>> l1
[2, 3, 4]
>>> l1[:]
[2, 3, 4]
>>> l1[::-1] # a default is -1 , b default is -(len+1)
[4, 3, 2]
>>> l1[:-4:-1] # a default is -1
[4, 3, 2]
>>> l1[:-3:-1] # a default is -1
[4, 3]
>>> l1[::] # c default is +1, so a default is 0, b default is len
[2, 3, 4]
>>> l1[::-1] # c is -1 , so a default is -1 and b default is -(len+1)
[4, 3, 2]
>>> l1[-100:-200:-1] # Interesting
[]
>>> l1[-1:-200:-1] # Interesting
[4, 3, 2]
>>> l1[-1:-1:1]
[]
>>> l1[-1:5:1] # Interesting
[4]
>>> l1[1:-7:1]
[]
>>> l1[1:-7:-1] # Interesting
[3, 2]
>>> l1[:-2:-2] # a default is -1, stop(b) at -2 , step(c) by 2 in reverse direction
[4]
Found this great table at http://wiki.python.org/moin/MovingToPythonFromOtherLanguages
Python indexes and slices for a six-element list.
Indexes enumerate the elements, slices enumerate the spaces between the elements.
Index from rear:    -6  -5  -4  -3  -2  -1      a=[0,1,2,3,4,5]    a[1:]==[1,2,3,4,5]
Index from front:    0   1   2   3   4   5      len(a)==6          a[:5]==[0,1,2,3,4]
                   +---+---+---+---+---+---+    a[0]==0            a[:-2]==[0,1,2,3]
                   | a | b | c | d | e | f |    a[5]==5            a[1:2]==[1]
                   +---+---+---+---+---+---+    a[-1]==5           a[1:-1]==[1,2,3,4]
Slice from front:  :   1   2   3   4   5   :    a[-2]==4
Slice from rear:   :  -5  -4  -3  -2  -1   :
                                                b=a[:]
                                                b==[0,1,2,3,4,5] (shallow copy of a)
After using it a bit I realise that the simplest description is that it is exactly the same as the arguments to range() in a for loop...
(from:to:step)
Any of them are optional:
(:to:step)
(from::step)
(from:to)
Then the negative indexing just needs you to add the length of the string to the negative indices to understand it.
This works for me anyway...
I find it easier to remember how it works, and then I can figure out any specific start/stop/step combination.
It's instructive to understand range() first:
def range(start=0, stop, step=1): # Illegal syntax, but that's the effect
    i = start
    while (i < stop if step > 0 else i > stop):
        yield i
        i += step
Begin from start, increment by step, do not reach stop. Very simple.
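A runnable version of that sketch, renamed so it doesn't shadow the built-in and with stop as a required argument (simple_range is a hypothetical name); it produces the same values as range for any non-zero step:

```python
def simple_range(start, stop, step=1):
    """Generator with the same values as range(start, stop, step)."""
    if step == 0:
        raise ValueError("step must not be zero")
    i = start
    # Count up towards stop for positive steps, down for negative ones,
    # and never yield stop itself.
    while (i < stop) if step > 0 else (i > stop):
        yield i
        i += step


print(list(simple_range(0, 10, 3)))  # [0, 3, 6, 9]
print(list(simple_range(5, 0, -2)))  # [5, 3, 1]
```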
The thing to remember about negative step is that stop is always the excluded end, whether it's higher or lower. If you want same slice in opposite order, it's much cleaner to do the reversal separately: e.g. 'abcde'[1:-2][::-1] slices off one char from left, two from right, then reverses. (See also reversed().)
Sequence slicing is same, except it first normalizes negative indexes, and it can never go outside the sequence:
def this_is_how_slicing_works(seq, start=None, stop=None, step=1):
    n = len(seq)
    if step == 0:
        raise ValueError("slice step cannot be zero")
    # Normalize start: default, then negative index, then clip to the valid range
    if start is None:
        start = 0 if step > 0 else n - 1
    else:
        if start < 0:
            start += n
        if start < 0:
            start = 0 if step > 0 else -1     # before the beginning
        elif start >= n:
            start = n if step > 0 else n - 1  # past the end
    # Normalize stop the same way
    if stop is None:
        stop = n if step > 0 else -1          # really -1, not last element
    else:
        if stop < 0:
            stop += n
        if stop < 0:
            stop = 0 if step > 0 else -1
        elif stop >= n:
            stop = n if step > 0 else n - 1
    # After clipping, every index range() produces lies inside the sequence
    for i in range(start, stop, step):
        yield seq[i]
Don't worry about the is None details - just remember that omitting start and/or stop always does the right thing to give you the whole sequence.
Normalizing negative indexes first allows start and/or stop to be counted from the end independently: 'abcde'[1:-2] == 'abcde'[1:3] == 'bc' despite range(1, -2) being empty.
The normalization is sometimes thought of as "modulo the length", but note it adds the length just once: e.g. 'abcde'[-53:42] is just the whole string.
I use the "an index points between elements" method of thinking about it myself, but one way of describing it which sometimes helps others get it is this:
mylist[X:Y]
X is the index of the first element you want.
Y is the index of the first element you don't want.
Index:
      ------------>
  0   1   2   3   4
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+
  0  -4  -3  -2  -1
      <------------

Slice:
    <---------------|
|--------------->
:   1   2   3   4   :
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+
:  -4  -3  -2  -1   :
|--------------->
    <---------------|
I hope this will help you to model the list in Python.
Reference: http://wiki.python.org/moin/MovingToPythonFromOtherLanguages
This is how I teach slices to newbies:
Understanding the difference between indexing and slicing:
Wiki Python has this amazing picture which clearly distinguishes indexing and slicing.
It is a list with six elements in it. To understand slicing better, consider that list as a set of six boxes placed together. Each box has a letter in it.
Indexing is like dealing with the contents of box. You can check contents of any box. But you can't check the contents of multiple boxes at once. You can even replace the contents of the box. But you can't place two balls in one box or replace two balls at a time.
In [122]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']
In [123]: alpha
Out[123]: ['a', 'b', 'c', 'd', 'e', 'f']
In [124]: alpha[0]
Out[124]: 'a'
In [127]: alpha[0] = 'A'
In [128]: alpha
Out[128]: ['A', 'b', 'c', 'd', 'e', 'f']
In [129]: alpha[0,1]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-129-c7eb16585371> in <module>()
----> 1 alpha[0,1]
TypeError: list indices must be integers, not tuple
Slicing is like dealing with boxes themselves. You can pick up the first box and place it on another table. To pick up the box, all you need to know is the position of beginning and ending of the box.
You can even pick up the first three boxes or the last two boxes or all boxes between 1 and 4. So, you can pick any set of boxes if you know the beginning and ending. These positions are called start and stop positions.
The interesting thing is that you can replace multiple boxes at once. Also you can place multiple boxes wherever you like.
In [130]: alpha[0:1]
Out[130]: ['A']
In [131]: alpha[0:1] = 'a'
In [132]: alpha
Out[132]: ['a', 'b', 'c', 'd', 'e', 'f']
In [133]: alpha[0:2] = ['A', 'B']
In [134]: alpha
Out[134]: ['A', 'B', 'c', 'd', 'e', 'f']
In [135]: alpha[2:2] = ['x', 'xx']
In [136]: alpha
Out[136]: ['A', 'B', 'x', 'xx', 'c', 'd', 'e', 'f']
Slicing With Step:
Until now you have picked boxes contiguously. But sometimes you need to pick them up discretely. For example, you can pick up every second box. You can even pick up every third box from the end. This value is called the step size; it represents the gap between your successive pickups. The step size should be positive if you are picking boxes from the beginning to the end, and negative if you are picking them from the end to the beginning.
In [137]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']
In [142]: alpha[1:5:2]
Out[142]: ['b', 'd']
In [143]: alpha[-1:-5:-2]
Out[143]: ['f', 'd']
In [144]: alpha[1:5:-2]
Out[144]: []
In [145]: alpha[-1:-5:2]
Out[145]: []
How Python Figures Out Missing Parameters:
When slicing, if you leave out any parameter, Python tries to figure it out automatically.
If you check the source code of CPython, you will find a function called PySlice_GetIndicesEx() which figures out indices to a slice for any given parameters. Here is the logical equivalent code in Python.
This function takes a Python object and optional parameters for slicing and returns the start, stop, step, and slice length for the requested slice.
def py_slice_get_indices_ex(obj, start=None, stop=None, step=None):
    length = len(obj)
    if step is None:
        step = 1
    if step == 0:
        raise ValueError("slice step cannot be zero")
    if start is None:
        start = 0 if step > 0 else length - 1
    else:
        if start < 0:
            start += length
            if start < 0:
                start = 0 if step > 0 else -1
        if start >= length:
            start = length if step > 0 else length - 1
    if stop is None:
        stop = length if step > 0 else -1
    else:
        if stop < 0:
            stop += length
            if stop < 0:
                stop = 0 if step > 0 else -1
        if stop >= length:
            stop = length if step > 0 else length - 1
    # Integer division (//) as in C; "/" would produce a float in Python 3
    if (step < 0 and stop >= start) or (step > 0 and start >= stop):
        slice_length = 0
    elif step < 0:
        slice_length = (start - stop - 1) // (-step) + 1
    else:
        slice_length = (stop - start - 1) // step + 1
    return (start, stop, step, slice_length)
This is the logic behind slices. Since Python has a built-in slice type, you can create a slice object with some parameters left out and use its indices() method to see how the missing parameters are calculated.
In [21]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']
In [22]: s = slice(None, None, None)
In [23]: s
Out[23]: slice(None, None, None)
In [24]: s.indices(len(alpha))
Out[24]: (0, 6, 1)
In [25]: list(range(*s.indices(len(alpha))))
Out[25]: [0, 1, 2, 3, 4, 5]
In [26]: s = slice(None, None, -1)
In [27]: list(range(*s.indices(len(alpha))))
Out[27]: [5, 4, 3, 2, 1, 0]
In [28]: s = slice(None, 3, -1)
In [29]: list(range(*s.indices(len(alpha))))
Out[29]: [5, 4]
Note: This post was originally written in my blog, The Intelligence Behind Python Slices.
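A small check of the same idea: combining slice.indices() with range() lists the concrete indexes a slice visits. The helper name selected_indices is hypothetical, and the loop at the end compares it with ordinary slicing for a range of parameters, including out-of-range ones.

```python
from itertools import product


def selected_indices(length, start=None, stop=None, step=None):
    """Return the indexes a slice visits on a sequence of the given length."""
    # slice.indices() fills in defaults and clamps out-of-range values
    return list(range(*slice(start, stop, step).indices(length)))


alpha = ['a', 'b', 'c', 'd', 'e', 'f']
print(selected_indices(len(alpha), None, 3, -1))  # [5, 4]

# Cross-check against real slicing for many parameter combinations
values = [None, -8, -6, -3, -1, 0, 1, 3, 6, 8]
for start, stop, step in product(values, values, [None, -3, -1, 1, 2]):
    picked = [alpha[i] for i in selected_indices(len(alpha), start, stop, step)]
    assert picked == alpha[start:stop:step]
```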
Python slicing notation:
a[start:end:step]
For start and end, negative values are interpreted as being relative to the end of the sequence.
Positive indices for end indicate the position after the last element to be included.
Omitted values default as follows (for a positive step): start = 0, end = len(sequence), step = 1.
Using a negative step reverses the interpretation of start and end.
The notation extends to (numpy) matrices and multidimensional arrays. For example, to slice entire columns you can use:
m[::,0:2:] ## slice the first two columns
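Slicing a list creates a new list, but it holds references to the same element objects, not copies of them (a shallow copy); slicing a NumPy array returns a view of the original data. If you want a fully separate copy, you can use copy.deepcopy().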
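A minimal illustration for plain lists, using nested lists as an assumed example: the slice is a new outer list whose items are the same objects, while copy.deepcopy() copies the nested lists too.

```python
import copy

rows = [[1, 2], [3, 4]]

shallow = rows[:]            # new outer list, same inner lists
deep = copy.deepcopy(rows)   # new outer list and new inner lists

rows[0].append(99)
print(shallow[0])       # [1, 2, 99]: shares the inner list with rows
print(deep[0])          # [1, 2]: independent copy
print(shallow is rows)  # False: the slice itself is a new list
```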
You can also use slice assignment to remove one or more elements from a list:
>>> r = [1, 'blah', 9, 8, 2, 3, 4]
>>> r[1:4] = []
>>> r
[1, 2, 3, 4]
This is just for some extra info...
Consider the list below
>>> l=[12,23,345,456,67,7,945,467]
A few other tricks for reversing the list:
>>> l[len(l):-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[:-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[len(l)::-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[::-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[-1:-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]
1. Slice Notation
To keep it simple, remember that a slice has only one form:
s[start:end:step]
and here is how it works:
s: an object that can be sliced
start: first index to start iteration
end: last index; NOTE that the end index is not included in the resulting slice
step: pick every step-th element
Another important thing: start, end, and step can all be omitted! If they are omitted, their default values are used: 0, len(s), and 1 respectively (for a positive step).
So possible variations are:
# Mostly used variations
s[start:end]
s[start:]
s[:end]
# Step-related variations
s[:end:step]
s[start::step]
s[::step]
# Make a copy
s[:]
NOTE: If start >= end (considering only the case step > 0), Python will return an empty slice [].
2. Pitfalls
The above part explains the core features of how slicing works, and it will work on most occasions. However, there are pitfalls you should watch out for, and this part explains them.
Negative indexes
The very first thing that confuses Python learners is that an index can be negative!
Don't panic: a negative index means count backwards.
For example:
s[-5:] # Start at the 5th element from the end of the array,
# thus returning the last 5 elements.
s[:-5] # Start at index 0 and stop before the 5th element from the end,
# thus returning s[0:len(s)-5].
Negative step
Making things more confusing is that step can be negative too!
A negative step means iterating the array backwards, from the end to the start, with the start index included and the end index excluded from the result.
NOTE: when step is negative, an omitted start behaves like len(s) (clamped to the last element), and an omitted end does not behave like 0, because s[::-1] contains s[0]. For example:
s[::-1] # Reversed slice
s[len(s)::-1] # The same as above, reversed slice
s[0:len(s):-1] # Empty list
Out of range error?
Surprisingly, slicing does not raise an IndexError when an index is out of range!
If an index is out of range, Python clamps it to the nearest valid boundary, 0 or len(s), depending on the situation. For example:
s[:len(s)+5] # The same as s[:len(s)]
s[-len(s)-5::] # The same as s[0:]
s[len(s)+5::-1] # The same as s[len(s)::-1], and the same as s[::-1]
3. Examples
Let's finish this answer with examples, explaining everything we have discussed:
# Create our array for demonstration
In [1]: s = [i for i in range(10)]
In [2]: s
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: s[2:] # From index 2 to last index
Out[3]: [2, 3, 4, 5, 6, 7, 8, 9]
In [4]: s[:8] # From index 0 up to index 8 (excluded)
Out[4]: [0, 1, 2, 3, 4, 5, 6, 7]
In [5]: s[4:7] # From index 4 (included) up to index 7(excluded)
Out[5]: [4, 5, 6]
In [6]: s[:-2] # Up to the second-to-last index (excluded)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7]
In [7]: s[-2:] # From second last index (negative index)
Out[7]: [8, 9]
In [8]: s[::-1] # From last to first in reverse order (negative step)
Out[8]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [9]: s[::-2] # All odd numbers in reversed order
Out[9]: [9, 7, 5, 3, 1]
In [11]: s[-2::-2] # All even numbers in reversed order
Out[11]: [8, 6, 4, 2, 0]
In [12]: s[3:15] # End is out of range, and Python will set it to len(s).
Out[12]: [3, 4, 5, 6, 7, 8, 9]
In [14]: s[5:1] # Start > end; return empty list
Out[14]: []
In [15]: s[11] # Access index 11 (greater than len(s)) will raise an IndexError
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-15-79ffc22473a3> in <module>()
----> 1 s[11]
IndexError: list index out of range
As a general rule, writing code with a lot of hardcoded index values leads to a readability and maintenance mess. For example, if you come back to the code a year later, you'll look at it and wonder what you were thinking when you wrote it. Naming slices with slice() is simply a way of more clearly stating what your code is actually doing.
In general, the built-in slice() creates a slice object that can be used anywhere a slice
is allowed. For example:
>>> items = [0, 1, 2, 3, 4, 5, 6]
>>> a = slice(2, 4)
>>> items[2:4]
[2, 3]
>>> items[a]
[2, 3]
>>> items[a] = [10,11]
>>> items
[0, 1, 10, 11, 4, 5, 6]
>>> del items[a]
>>> items
[0, 1, 4, 5, 6]
If you have a slice instance s, you can get more information about it by looking at its s.start, s.stop, and s.step attributes, respectively. For example:
>>> a = slice(10, 50, 2)
>>> a.start
10
>>> a.stop
50
>>> a.step
2
>>>
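To make the readability point concrete, here is a sketch of naming slices for a fixed-width record. The record layout, field widths, and the names NAME, SHARES, and PRICE are all made up for illustration.

```python
# A hypothetical fixed-width record: name (10 chars), shares (6), price (10)
record = 'ACME'.ljust(10) + '100'.rjust(6) + '513.25'.rjust(10)

NAME = slice(0, 10)
SHARES = slice(10, 16)
PRICE = slice(16, 26)

name = record[NAME].strip()
# int() and float() ignore the padding spaces around the numbers
cost = int(record[SHARES]) * float(record[PRICE])
print(name, cost)  # ACME 51325.0
```

If the layout changes, only the three slice definitions need to be updated.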
The previous answers don't discuss multi-dimensional array slicing, which is possible using the famous NumPy package:
Slicing can also be applied to multi-dimensional arrays.
# Here, a is a NumPy array
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[:2, 0:3:2]
array([[1, 3],
[5, 7]])
The ":2" before the comma operates on the first dimension and the "0:3:2" after the comma operates on the second dimension.
The rules of slicing are as follows:
[lower bound : upper bound : step size]
I- Convert the upper and lower bounds to the same sign.
II- Then check whether the step size is positive or negative.
(i) If the step size is positive, the upper bound should be greater than the lower bound; otherwise an empty string is printed. For example:
s="Welcome"
s1=s[0:3:1]
print(s1)
The output:
Wel
However if we run the following code:
s="Welcome"
s1=s[3:0:1]
print(s1)
It will return an empty string.
(ii) If the step size is negative, the upper bound should be less than the lower bound; otherwise an empty string will be printed. For example:
s="Welcome"
s1=s[3:0:-1]
print(s1)
The output:
cle
But if we run the following code:
s="Welcome"
s1=s[0:5:-1]
print(s1)
The output will be an empty string.
Thus in the code:
str = 'abcd'
l = len(str)
str2 = str[l-1:0:-1] #str[3:0:-1]
print(str2)
str2 = str[l-1:-1:-1] #str[3:-1:-1]
print(str2)
In the first case, str2 = str[l-1:0:-1], the upper bound is less than the lower bound, so dcb is printed.
However, in str2 = str[l-1:-1:-1], the upper bound is not less than the lower bound: converting both to the same sign gives str[3:3:-1] (or str[-1:-1:-1]), because index 3 and index -1 both refer to the last character. So an empty string is printed.
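The same-sign conversion can be checked directly. to_nonnegative is a hypothetical helper that only handles in-range indexes (it does not clamp), and the string is named text here to avoid shadowing the built-in str:

```python
text = 'abcd'
n = len(text)


def to_nonnegative(index, length):
    """Convert an in-range negative index to its nonnegative equivalent."""
    return index + length if index < 0 else index


# text[n-1:-1:-1] is text[3:-1:-1]; both bounds refer to the last character
start, stop = to_nonnegative(n - 1, n), to_nonnegative(-1, n)
print(start, stop)          # 3 3
print(repr(text[3:-1:-1]))  # '': stop is not less than start
print(text[3:0:-1])         # dcb
```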
In my opinion, you will understand and memorize better the Python string slicing notation if you look at it the following way (read on).
Let's work with the following string ...
azString = "abcdefghijklmnopqrstuvwxyz"
For those who don't know, you can create any substring from azString using the notation azString[x:y]
Coming from other programming languages, that's where common sense gets compromised. What are x and y?
I had to sit down and run several scenarios in my quest for a memorization technique that would help me remember what x and y are and help me slice strings properly at the first attempt.
My conclusion is that x and y should be seen as the boundary indexes surrounding the substring that we want to extract. So we should see the expression as azString[index1:index2], or even more clearly as azString[index_of_first_character:index_after_the_last_character].
Here is an example visualization of that ...
Letters   a b c d e f g h i j ...
         ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
             ┊           ┊
Indexes  0 1 2 3 4 5 6 7 8 9 ...
             ┊           ┊
cdefgh    index1      index2
So all you have to do is set index1 and index2 to the values that surround the desired substring. For instance, to get the substring "cdefgh", you can use azString[2:8], because the index on the left side of "c" is 2 and the one on the right side of "h" is 8.
Remember that we are setting the boundaries. And those boundaries are the positions where you could place some brackets that will be wrapped around the substring like this ...
a b [ c d e f g h ] i j
That trick works all the time and is easy to memorize.
I personally think about it like a for loop:
a[start:end:step]
# for(i = start; i < end; i += step)
Also, note that negative values for start and end are relative to the end of the list: they are computed as given_index + len(a) (given_index + a.shape[0] for a NumPy array).
#!/usr/bin/env python3
def fmt(value):
    # One slice parameter: empty if omitted (None), else a 2-wide integer
    return '' if value is None else '{:2d}'.format(value)


def slicegraphical(s, lista):
    if len(s) > 9:
        print("Enter a string of maximum 9 characters, "
              "so the printing would look nice")
        return 0
    # Draw the string in boxes, with positive and negative boundary indexes
    border = '+---' * len(s) + '+'
    print(border)
    print(''.join('| {} '.format(letter) for letter in s) + '|')
    print(border)
    print(''.join('{:<4}'.format(i) for i in range(len(s) + 1)).rstrip())
    print(''.join('{:<4}'.format(i) for i in range(-len(s), 0)).rstrip())
    print()
    # Each entry is [start, stop, step], [start, stop], or [index];
    # None stands for an omitted parameter
    for triada in lista:
        if len(triada) == 3:
            start, stop, step = triada
            label = '[{} :{} :{} ]'.format(fmt(start), fmt(stop), fmt(step))
            print(s + label + ' = ', s[start:stop:step])
        elif len(triada) == 2:
            start, stop = triada
            label = '[{} :{} ] '.format(fmt(start), fmt(stop))
            print(s + label + ' = ', s[start:stop])
        elif len(triada) == 1:
            label = '[{} ] '.format(fmt(triada[0]))
            print(s + label + ' = ', s[triada[0]])


if __name__ == '__main__':
    # Change "s" to whatever string you like; 9 characters gives the best
    # representation.
    s = 'COMPUTERS'
    # Add lists here to experiment with indexes.
    # To represent s[::], use [None, None, None]; for s[2:], use [2, None].
    lista = [[4, 7], [2, 5, 2], [-5, 1, -1], [4], [-4, -6, -1], [2, -3, 1],
             [2, -3, -1], [None, None, -1], [-5, None], [-5, 0, -1],
             [-5, None, -1], [-1, 1, -2]]
    slicegraphical(s, lista)
You can run this script and experiment with it; below are some samples that I got from it.
+---+---+---+---+---+---+---+---+---+
| C | O | M | P | U | T | E | R | S |
+---+---+---+---+---+---+---+---+---+
0   1   2   3   4   5   6   7   8   9
-9  -8  -7  -6  -5  -4  -3  -2  -1
COMPUTERS[ 4 : 7 ] = UTE
COMPUTERS[ 2 : 5 : 2 ] = MU
COMPUTERS[-5 : 1 :-1 ] = UPM
COMPUTERS[ 4 ] = U
COMPUTERS[-4 :-6 :-1 ] = TU
COMPUTERS[ 2 :-3 : 1 ] = MPUT
COMPUTERS[ 2 :-3 :-1 ] =
COMPUTERS[ : :-1 ] = SRETUPMOC
COMPUTERS[-5 : ] = UTERS
COMPUTERS[-5 : 0 :-1 ] = UPMO
COMPUTERS[-5 : :-1 ] = UPMOC
COMPUTERS[-1 : 1 :-2 ] = SEUM
When using a negative step, notice that the answer is shifted to the right by 1.
My brain seems happy to accept that lst[start:end] contains the start-th item. I might even say that it is a 'natural assumption'.
But occasionally a doubt creeps in and my brain asks for reassurance that it does not contain the end-th element.
In these moments I rely on this simple theorem:
for any n, lst = lst[:n] + lst[n:]
This pretty property tells me that lst[start:end] does not contain the end-th item because it is in lst[end:].
Note that this theorem is true for any n at all. For example, you can check that
lst = list(range(10))
lst[:-42] + lst[-42:] == lst
returns True.
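One way to convince yourself is to test the property for a whole range of n, including values far outside the list (a quick sketch):

```python
lst = list(range(10))

# lst[:n] + lst[n:] rebuilds lst for every n, including out-of-range ones
for n in range(-42, 43):
    assert lst[:n] + lst[n:] == lst

# Hence lst[start:end] never contains lst[end]: that element is in lst[end:]
print(lst[2:5], lst[5:])  # [2, 3, 4] [5, 6, 7, 8, 9]
```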
In Python, the most basic form for slicing is the following:
l[start:end]
where l is some collection, start is an inclusive index, and end is an exclusive index.
In [1]: l = list(range(10))
In [2]: l[:5] # First five elements
Out[2]: [0, 1, 2, 3, 4]
In [3]: l[-5:] # Last five elements
Out[3]: [5, 6, 7, 8, 9]
When slicing from the start, you can omit the zero index, and when slicing to the end, you can omit the final index since it is redundant, so do not be verbose:
In [5]: l[:3] == l[0:3]
Out[5]: True
In [6]: l[7:] == l[7:len(l)]
Out[6]: True
Negative integers are useful when doing offsets relative to the end of a collection:
In [7]: l[:-1] # Include all elements but the last one
Out[7]: [0, 1, 2, 3, 4, 5, 6, 7, 8]
In [8]: l[-3:] # Take the last three elements
Out[8]: [7, 8, 9]
It is possible to provide out-of-bounds indices when slicing, for example:
In [9]: l[:20] # 20 is out of index bounds, and l[20] will raise an IndexError exception
Out[9]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [11]: l[-20:] # -20 is out of index bounds, and l[-20] will raise an IndexError exception
Out[11]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Keep in mind that the result of slicing a collection is a whole new collection. In addition, when using slice notation in assignments, the number of assigned values does not need to match the length of the slice. The values before and after the assigned slice are kept, and the collection shrinks or grows to contain the new values:
In [16]: l[2:6] = list('abc') # Assigning fewer elements than the ones contained in the sliced collection l[2:6]
In [17]: l
Out[17]: [0, 1, 'a', 'b', 'c', 6, 7, 8, 9]
In [18]: l[2:5] = list('hello') # Assigning more elements than the ones contained in the sliced collection l[2:5]
In [19]: l
Out[19]: [0, 1, 'h', 'e', 'l', 'l', 'o', 6, 7, 8, 9]
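One caveat worth knowing: the length can only change for simple slices. With an extended slice (a step other than 1), the assigned sequence must have exactly as many items as the slice, otherwise Python raises ValueError. A short sketch:

```python
l = list(range(10))

l[2:6] = ['a', 'b', 'c']            # simple slice: 4 items replaced by 3
print(l)  # [0, 1, 'a', 'b', 'c', 6, 7, 8, 9]

l[::2] = ['A', 'B', 'C', 'D', 'E']  # extended slice: 5 slots, 5 values
print(l)  # ['A', 1, 'B', 'b', 'C', 6, 'D', 8, 'E']

try:
    l[::2] = ['x', 'y']             # extended slice: 5 slots, only 2 values
except ValueError as exc:
    print('ValueError:', exc)       # the list is left unchanged
```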
If you omit the start and end index, you will make a copy of the collection:
In [14]: l_copy = l[:]
In [15]: l == l_copy and l is not l_copy
Out[15]: True
If the start and end indexes are omitted when performing an assignment operation, the entire content of the collection will be replaced with a copy of what is referenced:
In [20]: l[:] = list('hello...')
In [21]: l
Out[21]: ['h', 'e', 'l', 'l', 'o', '.', '.', '.']
Besides basic slicing, it is also possible to apply the following notation:
l[start:end:step]
where l is a collection, start is an inclusive index, end is an exclusive index, and step is a stride that can be used to take every nth item in l.
In [22]: l = list(range(10))
In [23]: l[::2] # Take the elements whose indexes are even
Out[23]: [0, 2, 4, 6, 8]
In [24]: l[1::2] # Take the elements whose indexes are odd
Out[24]: [1, 3, 5, 7, 9]
Using step provides a useful trick to reverse a collection in Python:
In [25]: l[::-1]
Out[25]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
It is also possible to use negative integers for step, as in the following example:
In[28]: l[::-2]
Out[28]: [9, 7, 5, 3, 1]
However, a negative step can become very confusing. Moreover, to be Pythonic, you should avoid using start, end, and step in a single slice. If this is required, consider doing it in two assignments (one to slice, and the other to stride).
In [29]: l = l[::2] # This step is for striding
In [30]: l
Out[30]: [0, 2, 4, 6, 8]
In [31]: l = l[1:-1] # This step is for slicing
In [32]: l
Out[32]: [2, 4, 6]
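A quick comparison on a fresh list: the two-step version selects the same items as a single combined slice, at the cost of building an intermediate list.

```python
l = list(range(10))

# Two steps, as recommended above: stride first, then slice
two_steps = l[::2][1:-1]

# One step with start, end, and stride together
one_step = l[2:8:2]

print(two_steps, one_step)  # [2, 4, 6] [2, 4, 6]
```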
I want to add one Hello, World! example that explains the basics of slices for the very beginners. It helped me a lot.
Let's have a list with six values ['P', 'Y', 'T', 'H', 'O', 'N']:
+---+---+---+---+---+---+
| P | Y | T | H | O | N |
+---+---+---+---+---+---+
  0   1   2   3   4   5
Now the simplest slices of that list are its sublists. The notation is [<index>:<index>] and the key is to read it like this:
[ start cutting before this index : end cutting before this index ]
Now if you make a slice [2:5] of the list above, this will happen:
        |           |
+---+---|---+---+---|---+
| P | Y | T | H | O | N |
+---+---|---+---+---|---+
  0   1 | 2   3   4 | 5
You made a cut before the element with index 2 and another cut before the element with index 5. So the result will be a slice between those two cuts, a list ['T', 'H', 'O'].
Most of the previous answers clear up questions about slice notation.
The extended indexing syntax used for slicing is aList[start:stop:step].
More slicing examples: 15 Extended Slices
Below is an example of the indexes of a string:
 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1
str="Name string"
Slicing example: [start:end:step]
str[start:end] # Items start through end-1
str[start:] # Items start through the rest of the array
str[:end] # Items from the beginning through end-1
str[:] # A copy of the whole array
Below is the example usage:
print(str[0])       # N
print(str[0:2])     # Na
print(str[0:7])     # Name st
print(str[0:7:2])   # Nm t
print(str[0:-1:2])  # Nm ti
If you find negative indices in slicing confusing, here's a very easy way to think about them: just replace the negative index with len + index. For example, replace -3 with len(list) - 3.
The best way to illustrate what slicing does internally is to show it in code that implements this operation:
def slice_list(seq, start=None, end=None, step=1):
    # Simplified model: assumes a positive step
    # Take care of missing start/end parameters
    start = 0 if start is None else start
    end = len(seq) if end is None else end
    # Take care of negative start/end parameters
    start = len(seq) + start if start < 0 else start
    end = len(seq) + end if end < 0 else end
    # Clamp out-of-range indexes, as real slicing does
    start = min(max(start, 0), len(seq))
    end = min(max(end, 0), len(seq))
    # Now just execute a for-loop with start, end and step
    return [seq[i] for i in range(start, end, step)]
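The function above only models a positive step. Here is a sketch of a fuller model that also handles a negative step and out-of-range indexes by clamping the way PySlice_GetIndicesEx() does; manual_slice is a hypothetical name, and the loop cross-checks it against real slicing.

```python
from itertools import product


def manual_slice(seq, start=None, end=None, step=None):
    """Model of seq[start:end:step] that also handles a negative step."""
    n = len(seq)
    step = 1 if step is None else step
    if step == 0:
        raise ValueError('slice step cannot be zero')
    # Valid range depends on direction: a negative step may stop just before 0
    lower, upper = (0, n) if step > 0 else (-1, n - 1)

    def clamp(index, default):
        if index is None:
            return default
        if index < 0:
            index += n  # a negative index counts from the end: len + index
        return min(max(index, lower), upper)

    if step > 0:
        start, end = clamp(start, 0), clamp(end, n)
    else:
        start, end = clamp(start, n - 1), clamp(end, -1)
    return [seq[i] for i in range(start, end, step)]


data = list('Python')
values = [None, -9, -6, -4, -1, 0, 2, 5, 6, 9]
for start, end, step in product(values, values, [None, -3, -1, 1, 2]):
    assert manual_slice(data, start, end, step) == data[start:end:step]

print(''.join(manual_slice(data, -4, -6, -1)))  # ty
```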
I don't think that the Python tutorial diagram (cited in various other answers) is a good model: it works for a positive stride, but not for a negative one.
This is the diagram:
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
From the diagram, I expect a[-4:-6:-1] to be yP, but it is ty.
>>> a = "Python"
>>> a[2:4:1] # as expected
'th'
>>> a[-4:-6:-1] # off by 1
'ty'
What always works is to think in characters or slots and use indexing as a half-open interval: right-open if the stride is positive, left-open if it is negative.
This way, I can think of a[-4:-6:-1] as a(-6,-4] in interval terminology.
+---+---+---+---+---+---+
| P | y | t | h | o | n |
+---+---+---+---+---+---+
  0   1   2   3   4   5
 -6  -5  -4  -3  -2  -1
+---+---+---+---+---+---+---+---+---+---+---+---+
| P | y | t | h | o | n | P | y | t | h | o | n |
+---+---+---+---+---+---+---+---+---+---+---+---+
 -6  -5  -4  -3  -2  -1   0   1   2   3   4   5
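The half-open reading can be confirmed with slice.indices(), which converts the bounds into the nonnegative indexes that are actually visited:

```python
a = 'Python'

# a[-4:-6:-1]: start -4 is included, end -6 is excluded, i.e. a(-6, -4]
start, stop, step = slice(-4, -6, -1).indices(len(a))
print(start, stop, step)                          # 2 0 -1
print([a[i] for i in range(start, stop, step)])   # ['t', 'y']
print(a[-4:-6:-1])                                # ty
```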

The meaning of this `long` function (two's complement and bit-shifting)

I have encountered this function:
const LIMIT32 = 2147483648; // The limit at which a 32-bit number switches signs == 2 ^ 31
function long(v) {
  // Two's complement
  if (v >= LIMIT32) {
    v = -(2 * LIMIT32 - v);
  }
  return [(v >> 24) & 0xFF, (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF];
}
// e.g.
[-3, -2, -1, 0, 1,
  -2147483649, -2147483648, -2147483647,
  2147483647, 2147483648, 2147483649].forEach(x =>
  console.log(`${x}: ${long(x)}`)
);
I'm wondering generally what this function is doing (why it's returning an array, and what the array elements are).
Then I'm wondering why it takes the v and does what looks like a sign flip and some multiplication.
Finally, the meaning of the bitshift and & operations for each item, why it's as multiples of 8, and why they chose 0xFF.
I'm wondering generally what this function is doing (why it's returning an array, and what the array elements are).
It returns an array of the 4 bytes that make up an int32 value. Why someone wrote the code to do that, I don't know.
Then I'm wondering why it takes the v and does what looks like a sign flip and some multiplication.
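Because that's how int32 works: in 32-bit two's complement arithmetic, 0x7FFFFFFF + 1 wraps around to -0x80000000 (in JavaScript, (0x7FFFFFFF + 1) | 0 === -0x80000000).
The adjustment is actually unnecessary in this code, because the bit operations take care of the wrap-around anyway.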
Finally, the meaning of the bitshift and & operations for each item, why it's as multiples of 8, and why they chose 0xFF.
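It gets the distinct bytes of the int32, each one 8 bits long: shifting right by 24, 16, 8, and 0 bits moves each byte into the lowest 8 bits, and & 0xFF (binary 11111111) keeps only those 8 bits.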
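For comparison, a Python sketch of the same decomposition, assuming integer input (long_bytes is a hypothetical name). Masking with 0xFFFFFFFF reduces the value modulo 2**32, the same wrap-around JavaScript applies before its bit operations, so no explicit two's-complement adjustment is needed:

```python
def long_bytes(v):
    """Split an integer into the 4 bytes of its 32-bit two's-complement form."""
    v &= 0xFFFFFFFF  # wrap modulo 2**32, as JavaScript does for bit operations
    # Shift each byte down into the lowest 8 bits, then keep only those 8 bits
    return [(v >> 24) & 0xFF, (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF]


for x in [-3, -1, 0, 1, -2147483649, -2147483648, 2147483647, 2147483648]:
    print(x, long_bytes(x))
```

The same bytes can also be obtained with list((x & 0xFFFFFFFF).to_bytes(4, 'big')).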

Can I represent large integers in Node 6 for only exponentiation without a dependency?

I am working on a kata that asks for the last digit of a[0] ^ (a[1] ^ (a[2] ^ ... (a[n-1] ^ a[n]))). When computing the answer, eventually Math.pow exceeds Number.MAX_SAFE_INTEGER, causing modexp below to return erroneous results.
@user2357112 says that JS needs a library for arbitrary-precision integers, which is all well and good, but nothing in the kata indicates that such a library is available in the remote environment, or even that I need one.
Since the kata and SO point in different directions on this matter, I want to learn if I can feasibly represent big integers ONLY for the purposes of solving this kata without writing an entire library.
My in-progress code is below, and it passes many tests before printing incorrect results. Some code was omitted to avoid spoilers.
TL;DR: If I cannot use a library, what can I do to feasibly represent large integers for the use case indicated by Math.pow()?
function modexp(b, e) {
  let c = 1
  while(e) {
    if (e & 1)
      c = c * b % 10
    e >>= 1
    b = b * b % 10
  }
  return c;
}
function lastDigit(as) {
  if (!as || !as.length) return 1;
  let e = as.slice(1).reverse().reduce((p,c) => Math.pow(c,p));
  return modexp(as[0], Number(e));
}
This is obviously an X-Y problem. You don't need large integers.
You need to go back to elementary school math.
What's multiplication? Well let's take one example:
WARNING: SPOILERS! Don't read the following if you want to figure it out yourself!
    1 2
  x 2 3
  -----
    3 6     last digit 6
  2 4
  -----
  2 7 6     notice how the last digit is only involved
            in ONE multiplication operation?
Hmm.. there seems to be a pattern. Let's see if that pattern holds. Let's multiply 12 x 23 x 23 by only doing the last digit:
    1 2
  x 2 3
  -----
      6     calculate ONLY the last digit
  x 2 3
  -----
      8     answer is: last digit is 8
Let's check our answer:
      1 2
    x 2 3
    -----
      3 6
    2 4
    -----
    2 7 6
  x   2 3
  -------
    8 2 8
  5 5 2
  -------
  6 3 4 8     last digit is INDEED 8
So it seems that you can find the last digit by only calculating the last digit. Let's try to implement a powLastDigit() function.
WARNING: SPOILERS! DON'T READ THE CODE IF YOU WANT TO WRITE IT YOURSELF!
function powLastDigit (number, power) {
  var x = 1; // starting from 1 also handles power 0 and power 1
  for (var y = 0; y < power; y++) {
    x = (x * (number % 10)) % 10; // we only care about last digit!
  }
  return x;
}
Let's check if we are right:
> powLastDigit(3,7)
7
> Math.pow(3,7)
2187
> powLastDigit(5,8)
5
> Math.pow(5,8)
390625
> powLastDigit(7,12)
1
> Math.pow(7,12)
13841287201
OK. Looks like it's working. Now you can use this function to solve your problem. It has no issues with very large exponents because it doesn't deal with large numbers:
> powLastDigit(2323,123456789)
3
Optimization
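The above code is slow because it uses a loop with one iteration per unit of the exponent. It's possible to speed it up considerably, for example by exploiting the fact that the last digits of successive powers repeat in a short cycle. I'll leave that as a homework problem.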
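A Python sketch of the same loop (pow_last_digit is a hypothetical port of powLastDigit). It starts from 1 so that an exponent of 0 also gives the right answer. For comparison, Python's built-in three-argument pow() performs modular exponentiation without the long loop:

```python
def pow_last_digit(number, power):
    """Last digit of number**power, keeping only last digits along the way."""
    digit = 1
    for _ in range(power):
        digit = (digit * (number % 10)) % 10  # only the last digit matters
    return digit


print(pow_last_digit(3, 7), pow_last_digit(5, 8), pow_last_digit(7, 12))  # 7 5 1

# Built-in modular exponentiation handles huge exponents quickly
print(pow(2323, 123456789, 10))  # 3
```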
I use a bit array to represent a positive integer:
[0,0,1,1] // bit array of 12 (MSB=rightmost)
This structure is used to manipulate the overall exponent, E
a[1] ^ (a[2] .. (a[n-1] ^ a[n])) // E
(!) There is a relation between LSD(a[0]) and the overall exponent E (for E >= 1), as below:
//LSD(a[0]) LSD(LSD(a[0]) ^ E) for
// [E/4R0, E/4R1, E/4R2, E/4R3]
//--------- ----------------------------
// 0 [0, 0, 0, 0]
// 1 [1, 1, 1, 1]
// 2 [6, 2, 4, 8]
// 3 [1, 3, 9, 7]
// 4 [6, 4, 6, 4]
// 5 [5, 5, 5, 5]
// 6 [6, 6, 6, 6]
// 7 [1, 7, 9, 3]
// 8 [6, 8, 4, 2]
// 9 [1, 9, 1, 9]
For example, to find the least significant digit of (2 ^ (2 ^ 3)):
// LSD(a[0]) is 2
// E is 8
// implies E mod 4 is 0
// LSD(LSD(a[0]) ^ E)
// for E/4R0 is 6 (ans)
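The table translates directly into a lookup. A Python sketch, assuming E >= 1 (for E = 0 the answer is always 1, which the table does not cover); TABLE and last_digit_of_power are hypothetical names:

```python
# Row d lists the last digit of d**E for E % 4 == 0, 1, 2, 3 (valid for E >= 1)
TABLE = {
    0: [0, 0, 0, 0], 1: [1, 1, 1, 1], 2: [6, 2, 4, 8], 3: [1, 3, 9, 7],
    4: [6, 4, 6, 4], 5: [5, 5, 5, 5], 6: [6, 6, 6, 6], 7: [1, 7, 9, 3],
    8: [6, 8, 4, 2], 9: [1, 9, 1, 9],
}


def last_digit_of_power(base, exponent):
    """Last digit of base**exponent using only base % 10 and exponent % 4."""
    if exponent == 0:
        return 1
    return TABLE[base % 10][exponent % 4]


print(last_digit_of_power(2, 2 ** 3))  # 6, since 2**8 == 256
```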
To determine E mod 4,
E[0] + E[1] * 2 // the two LSBs
To summarize, I create a data structure, a bit array, to store large integers, mainly for the intermediate value of the exponent part. The bit array has dynamic length; a JavaScript array can hold at most 2^32 - 1 elements, and a value whose n bits are all set is 2^n - 1. This bit array is never converted back to decimal, which keeps it safe from precision loss. The only interesting information about the overall exponent E is its two least significant bits: they are the remainder of E/4, which can be applied to the relation (!) above.
Obviously, Math.pow will not work for a bit array, so I handcraft a simple exp() for it. This is trivial because:
//exponentiation == lots of multiplications == lots of additions
//it is not difficult to implement addition on bit array
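A minimal Python sketch of the two bit-array pieces, assuming the same least-significant-bit-first layout as [0,0,1,1] for 12: addition of two bit arrays, and reading E mod 4 from the two lowest bits. The names add_bits and mod4 are hypothetical, and the answer's own exp() is not reproduced:

```python
def add_bits(a, b):
    """Add two LSB-first bit arrays, returning a new LSB-first bit array."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        result.append(total % 2)
        carry = total // 2
    if carry:
        result.append(carry)
    return result


def mod4(bits):
    """E mod 4 is E[0] + E[1] * 2, the value of the two lowest bits."""
    padded = bits + [0, 0]
    return padded[0] + padded[1] * 2


twelve = [0, 0, 1, 1]
print(add_bits(twelve, [1, 1]))  # [1, 1, 1, 1], i.e. 15
print(mod4(twelve))              # 0, since 12 % 4 == 0
```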
This is the fiddle demonstrating the above idea ONLY. It is expected to be slow if E is really large. FYI:
LSD.find([3,4,5,6]) // my Nightly hung for ~3s to find the LSD
You may optimize the Bits.exp by means of childprocesses, web workers, debounce function, simd etc. Good luck.
You don't really need a bigint library to solve this kata.
You are only interested in the last digit of the result, and fortunately there is a property of powers that helps us with this. We effectively want to compute a power in modulus 10. Yes, modular exponentiation does help us a bit here, but the problem is that the exponent is very large as well - too large to compute, and too large to run a loop with that many iterations. But we don't need to, all we are interested in is the modulus of the result.
And there is a pattern! Let's take 4^x as an example:
 x    4^x              (4^x)%10
--------------------------------
 0    4^0  = 1            1
 1    4^1  = 4            4
 2    4^2  = 16           6
 3    4^3  = 64           4
 4    4^4  = 256          6
 5    4^5  = 1024         4
 …    …
20    4^20 = ???          6   sic!
21    4^21 = ???          4
You will be able to find these patterns for all numbers in all modular bases. They all share the same characteristic: there's a threshold below which the remainder is irregular, and then they form a repeating sequence. To get a number in this sequence, we only need to perform a modulo operation on the exponent!
For the example above ((4^x)%10), we use a lookup table 0 → 6, 1 → 4 and compute x % 2; the threshold is 1. In JavaScript code, it might look like this:
x < 1 ? 1 : [6, 4][x % 2];
Of course, x is a very large number formed by the repeated exponentiation of the rest of the input, but we do not need to compute it as a whole; we only want to know
whether it is smaller than a relatively small number (the threshold; trivial)
what the remainder after division by q (the length of the repeating cycle) is: just a recursive call to the function we're implementing!
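Putting the threshold idea together for the whole kata, a compact Python sketch (last_digit is a hypothetical name) evaluates the tower right to left. It relies on this fact: for N >= 2, x**N mod 20 depends only on N mod 4 (mod 4 the powers repeat with a period dividing 2 from N = 2 on, and mod 5 with a period dividing 4 from N = 1 on). So an exponent of at least 4 can be replaced by N % 4 + 4 without changing either the last digit or the remainder mod 4 that the next level needs, while small exponents are kept exact. It assumes 0 ** 0 == 1 and returns 1 for an empty list, as the asker's code does.

```python
def last_digit(numbers):
    """Last digit of numbers[0] ** (numbers[1] ** (... ** numbers[-1]))."""
    exponent = 1  # an empty tower is treated as 1
    for base in reversed(numbers):
        # Small exponents stay exact; large ones only matter modulo 4,
        # and adding 4 keeps them above the threshold where the cycle starts
        exponent = base ** (exponent if exponent < 4 else exponent % 4 + 4)
    return exponent % 10


print(last_digit([3, 4, 5]))           # 1
print(last_digit([2323, 123456789]))   # 3
```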

Decomposing a value into results of powers of two

Is it possible to get the powers of two that add up to a given value?
Example:
129 resolves [1, 128]
77 resolves [1, 4, 8, 64]
I already thought about using Math.log, and also about a forEach with a bitwise comparison. Is there a more elegant solution?
The easiest way is to use a single bit value, starting with 1 and shifting that bit 'left' until its value is greater than the value to check, comparing the bit at each step bitwise with the value. The set bits can be stored in an array.
function GetBits(value) {
  var b = 1;
  var res = [];
  while (b <= value) {
    if (b & value) res.push(b);
    b <<= 1;
  }
  return res;
}
console.log(GetBits(129));
console.log(GetBits(77));
console.log(GetBits(255));
Since shifting the bit can be seen as a power of 2, you can push the current bit value directly into the result array.
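The same loop ports directly to Python, where integers are not limited to 32 bits, so the shift cannot overflow (get_bits is a hypothetical port of GetBits):

```python
def get_bits(value):
    """Powers of two that sum to a nonnegative integer, smallest first."""
    bit, result = 1, []
    while bit <= value:
        if bit & value:  # is this power of two present in value?
            result.append(bit)
        bit <<= 1        # move to the next power of two
    return result


print(get_bits(129))  # [1, 128]
print(get_bits(77))   # [1, 4, 8, 64]
```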
You can adapt solutions from other languages to javascript. In this SO question you'll find some ways of solving the problem using Java (you can choose the one you find more elegant).
decomposing a value into powers of two
I adapted one of those answers to JavaScript and came up with this code:
var powers = [], power = 0, n = 129; // Gives [1,128] as output.
while (n != 0) {
  if ((n & 1) != 0) {
    powers.push(1 << power);
  }
  ++power;
  n >>>= 1;
}
console.log(powers);
1. Find the largest power of two contained in the number.
2. Subtract it from the number and add it to the list.
3. Decrement the exponent and check whether the new power of two is less than or equal to the remaining number.
4. If it is, subtract it from the number and add it to the list.
5. In either case, go back to step 3.
6. Exit when the number reaches 0.
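A Python sketch of these greedy steps (powers_greedy is a hypothetical name); it returns the powers from largest to smallest:

```python
def powers_greedy(number):
    """Greedy decomposition of a nonnegative integer into powers of two."""
    result = []
    power = 1
    while power * 2 <= number:  # step 1: largest power of two <= number
        power *= 2
    while number > 0:
        if power <= number:     # subtract it and record it
            number -= power
            result.append(power)
        power //= 2             # move to the next smaller power of two
    return result


print(powers_greedy(77))  # [64, 8, 4, 1]
```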
I am thinking of creating a list of all powers of 2 that are <= your number, then using an addition-subtraction algorithm to find the right group of numbers.
For example, for the number 77:
the group of factors is {1, 2, 4, 8, 16, 32, 64} (64 is the greatest power of 2 less than or equal to 77).
The algorithm then repeatedly subtracts the greatest number from that group that is less than or equal to what remains, until you get zero.
77-64 = 13 ==> [64]
13-8 = 5 ==> [8]
5-4 = 1 ==> [4]
1-1 = 0 ==> [1]
Hope you understand my algorithm; pardon my bad English.
function getBits(val, factor) {
  factor = factor || 1;
  if (val) {
    return (val % 2 ? [factor] : []).concat(getBits(val >> 1, factor * 2))
  }
  return [];
}
alert(getBits(77));

Find the longest common starting substring in a set of strings [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
This is a challenge to come up with the most elegant JavaScript, Ruby or other solution to a relatively trivial problem.
This problem is a more specific case of the Longest common substring problem. I need to only find the longest common starting substring in an array. This greatly simplifies the problem.
For example, the longest common starting substring in [interspecies, interstelar, interstate] is "inters". However, I don't need to find "ific" in [specifics, terrific].
I've solved the problem by quickly coding up a solution in JavaScript as a part of my answer about shell-like tab-completion (test page here). Here is that solution, slightly tweaked:
function common_substring(data) {
  var i, ch, memo, idx = 0
  do {
    memo = null
    for (i = 0; i < data.length; i++) {
      ch = data[i].charAt(idx)
      if (!ch) break
      if (!memo) memo = ch
      else if (ch != memo) break
    }
  // Keep going while every string matched at idx (guarding against an empty array)
  } while (data.length > 0 && i == data.length && ++idx)
  return (data[0] || '').slice(0, idx)
}
This code is available in this Gist along with a similar solution in Ruby. You can clone the gist as a git repo to try it out:
$ git clone git://gist.github.com/257891.git substring-challenge
I'm not very happy with those solutions. I have a feeling they might be solved with more elegance and less execution complexity—that's why I'm posting this challenge.
I'm going to accept as an answer the solution I find the most elegant or concise. Here, for instance, is a crazy Ruby hack I came up with, which defines the & operator on String:
# works with Ruby 1.8.7 and above
class String
  def &(other)
    difference = other.to_str.each_char.with_index.find { |ch, idx|
      self[idx].nil? or ch != self[idx].chr
    }
    # no difference means other is a prefix of self (or equal to it)
    difference ? self[0, difference.last] : self[0, other.to_str.length]
  end
end
class Array
  def common_substring
    self.inject(nil) { |memo, str| memo.nil? ? str : memo & str }.to_s
  end
end
Solutions in JavaScript or Ruby are preferred, but you can show off clever solutions in other languages as long as you explain what's going on. Only code from the standard library, please.
Update: my favorite solutions
I've chosen the JavaScript sorting solution by kennebec as the "answer" because it struck me as both unexpected and genius. If we disregard the complexity of actual sorting (let's imagine it's infinitely optimized by the language implementation), the complexity of the solution is just comparing two strings.
Other great solutions:
"regex greed" by FM takes a minute or two to grasp, but then the elegance of it hits you. Yehuda Katz also made a regex solution, but it's more complex
commonprefix in Python — Roberto Bonvallet used a feature made for handling filesystem paths to solve this problem
Haskell one-liner is short as if it were compressed, and beautiful
the straightforward Ruby one-liner
Thanks for participating! As you can see from the comments, I learned a lot (even about Ruby).
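It's a matter of taste, but this is a simple JavaScript version:
It sorts a copy of the array and then looks only at the first and last items. Because the copy is sorted lexicographically, any prefix shared by the first and last items is also shared by every item between them.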
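//longest common starting substring in an array
function sharedStart(array){
  if (!array.length) return '';  // no strings, nothing in common
  var A= array.concat().sort(),  // sorted copy; the input is left untouched
      a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;
  // only the first and last strings of the sorted copy need comparing
  while(i<L && a1.charAt(i)=== a2.charAt(i)) i++;
  return a1.substring(0, i);
}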
DEMOS
sharedStart(['interspecies', 'interstelar', 'interstate']) //=> 'inters'
sharedStart(['throne', 'throne']) //=> 'throne'
sharedStart(['throne', 'dungeon']) //=> ''
sharedStart(['cheese']) //=> 'cheese'
sharedStart([]) //=> ''
sharedStart(['prefix', 'suffix']) //=> ''
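The same first/last idea can be sketched in Python without a full sort by using min() and max(), as a later answer suggests. This is an illustrative sketch and shared_start is a hypothetical name. It relies on the property stated above: a prefix shared by the lexicographically smallest and largest strings is shared by every string between them.

```python
def shared_start(strings):
    """Longest common starting substring via lexicographic min/max (sketch)."""
    if not strings:
        return ""
    lo, hi = min(strings), max(strings)
    i = 0
    # hi cannot be a proper prefix of lo (it would then sort before lo),
    # so hi[i] exists whenever lo[:i] == hi[:i] and i < len(lo).
    while i < len(lo) and lo[i] == hi[i]:
        i += 1
    return lo[:i]


print(shared_start(['interspecies', 'interstelar', 'interstate']))  # inters
```

Finding min and max is linear in the number of strings, whereas sorting is O(n log n) comparisons.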
In Python:
>>> from os.path import commonprefix
>>> commonprefix('interspecies interstelar interstate'.split())
'inters'
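One detail worth knowing about os.path.commonprefix: it compares character by character, not path component by component. That is exactly why it works for arbitrary strings here, but it also means it can return a partial directory name when given real paths. A short illustration:

```python
from os.path import commonprefix

words = "interspecies interstelar interstate".split()
print(commonprefix(words))                       # inters
print(commonprefix(["/usr/lib", "/usr/local"]))  # /usr/l  (not /usr)
print(repr(commonprefix([])))                    # ''
```

For actual file paths, Python 3.5+ also offers os.path.commonpath, which works per path component.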
Ruby one-liner:
l=strings.inject{|l,s| l=l.chop while l!=s[0...l.length];l}
My Haskell one-liner:
import Data.List
commonPre :: [String] -> String
commonPre xs = map head . takeWhile (\c -> length c == length xs && all (== head c) c) $ transpose xs
EDIT: barkmadley gave a good explanation of the code below. I'd also add that haskell uses lazy evaluation, so we can be lazy about our use of transpose; it will only transpose our lists as far as necessary to find the end of the common prefix.
You just need to traverse all strings until they differ, then take the substring up to this point.
Pseudocode:
loop for i upfrom 0
while all strings[i] are equal
finally return substring[0..i]
Common Lisp:
(defun longest-common-starting-substring (&rest strings)
  (loop for i from 0 below (apply #'min (mapcar #'length strings))
        while (apply #'char=
                     (mapcar (lambda (string) (aref string i))
                             strings))
        finally (return (subseq (first strings) 0 i))))
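For comparison, here is a Python sketch of the same column-by-column scan from the pseudocode above. It assumes that an empty list yields an empty string, and vertical_scan is a hypothetical name.

```python
def vertical_scan(strings):
    """Longest common prefix by comparing position i of every string (sketch)."""
    if not strings:
        return ""
    # No common prefix can be longer than the shortest string.
    shortest = min(len(s) for s in strings)
    i = 0
    # Advance while every string has the same character at position i.
    while i < shortest and all(s[i] == strings[0][i] for s in strings):
        i += 1
    return strings[0][:i]


print(vertical_scan(["interspecies", "interstelar", "interstate"]))  # inters
```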
Yet another way to do it: use regex greed.
words = %w(interspecies interstelar interstate)
j = '='
str = ['', *words].join(j)
re = "[^#{j}]*"
str =~ /\A
(?: #{j} ( #{re} ) #{re} )
(?: #{j} \1 #{re} )*
\z/x
p $1
And the one-liner, courtesy of mislav (50 characters):
p ARGV.join(' ').match(/^(\w*)\w*(?: \1\w*)*$/)[1]
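The regex-greed trick is not Ruby-specific. Below is a Python sketch using re.fullmatch (Python 3.4+) and a backreference; it assumes '=' never occurs in the words, and regex_common_prefix is a hypothetical name. Backtracking can get slow on long inputs, so treat it as a curiosity rather than a recommended approach.

```python
import re


def regex_common_prefix(words):
    """Longest common prefix via regex greed and backtracking (sketch).

    Assumes no word contains '=', which is used as the separator.
    """
    if not words:
        return ""
    text = "=" + "=".join(words)
    # Group 1 greedily grabs as much of the first word as possible; the engine
    # then backtracks, shortening it, until every later word starts with it.
    # An empty group 1 always succeeds, so the match never fails.
    pattern = r"(?:=([^=]*)[^=]*)(?:=\1[^=]*)*"
    return re.fullmatch(pattern, text).group(1)


print(regex_common_prefix(["interspecies", "interstelar", "interstate"]))  # inters
```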
In Python I wouldn't use anything but the existing commonprefix function I showed in another answer, but I couldn't help to reinvent the wheel :P. This is my iterator-based approach:
>>> a = 'interspecies interstelar interstate'.split()
>>>
>>> from itertools import takewhile, chain, izip as zip, imap as map
>>> ''.join(chain(*takewhile(lambda s: len(s) == 1, map(set, zip(*a)))))
'inters'
Edit: Explanation of how this works.
zip generates tuples that take one character from each word in a, position by position (stopping at the end of the shortest word):
In [6]: list(zip(*a)) # here I use list() to expand the iterator
Out[6]:
[('i', 'i', 'i'),
('n', 'n', 'n'),
('t', 't', 't'),
('e', 'e', 'e'),
('r', 'r', 'r'),
('s', 's', 's'),
('p', 't', 't'),
('e', 'e', 'a'),
('c', 'l', 't'),
('i', 'a', 'e')]
By mapping set over these items, I get a series of unique letters:
In [7]: list(map(set, _)) # _ means the result of the last statement above
Out[7]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s']),
set(['p', 't']),
set(['a', 'e']),
set(['c', 'l', 't']),
set(['a', 'e', 'i'])]
takewhile(predicate, items) takes elements from this while the predicate is True; in this particular case, when the sets have one element, i.e. all the words have the same letter at that position:
In [8]: list(takewhile(lambda s: len(s) == 1, _))
Out[8]:
[set(['i']),
set(['n']),
set(['t']),
set(['e']),
set(['r']),
set(['s'])]
At this point we have an iterable of sets, each containing one letter of the prefix we were looking for. To construct the string, we chain them into a single iterable, from which we get the letters to join into the final string.
The magic of using iterators is that all items are generated on demand, so when takewhile stops asking for items, the zipping stops at that point and no unnecessary work is done. Each function call in my one-liner has an implicit for and an implicit break.
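In Python 3, zip and map are already lazy and itertools.izip/imap no longer exist, so the one-liner above fails there as written. A sketch of a Python 3 port of the same pipeline (common_prefix is a hypothetical name):

```python
from itertools import chain, takewhile


def common_prefix(words):
    # zip(*words) yields one tuple per position, stopping at the shortest word;
    # set() collapses each tuple, so a one-element set means "all words agree".
    columns = map(set, zip(*words))
    agreeing = takewhile(lambda s: len(s) == 1, columns)
    # Each remaining set holds exactly one character; chain them into a string.
    return "".join(chain.from_iterable(agreeing))


print(common_prefix("interspecies interstelar interstate".split()))  # inters
```

The laziness argument above still holds: takewhile stops pulling columns at the first disagreement.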
This is probably not the most concise solution (depends if you already have a library for this), but one elegant method is to use a trie. I use tries for implementing tab completion in my Scheme interpreter:
http://github.com/jcoglan/heist/blob/master/lib/trie.rb
For example:
tree = Trie.new
%w[interspecies interstelar interstate].each { |s| tree[s] = true }
tree.longest_prefix('')
#=> "inters"
I also use them for matching channel names with wildcards for the Bayeux protocol; see these:
http://github.com/jcoglan/faye/blob/master/client/channel.js
http://github.com/jcoglan/faye/blob/master/lib/faye/channel.rb
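The linked trie.rb is not reproduced here. As an illustration only, a dict-of-dicts trie in Python can answer the same question: insert every word, then descend from the root while the current node has exactly one branch and no word ends there. trie_common_prefix and the END marker are hypothetical names, not part of the linked library.

```python
def trie_common_prefix(words):
    """Longest common prefix by walking a character trie (sketch)."""
    if not words:
        return ""
    END = object()  # hypothetical marker key: "a word ends at this node"
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    prefix = []
    node = root
    # Descend while there is a single branch and no word stops here.
    while len(node) == 1 and END not in node:
        ch, child = next(iter(node.items()))
        prefix.append(ch)
        node = child
    return "".join(prefix)


print(trie_common_prefix(["interspecies", "interstelar", "interstate"]))  # inters
```

Building the trie only pays off if it is reused, for example for repeated tab-completion queries.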
Just for the fun of it, here's a version written in (SWI-)PROLOG:
common_pre([[C|Cs]|Ss], [C|Res]) :-
maplist(head_tail(C), [[C|Cs]|Ss], RemSs), !,
common_pre(RemSs, Res).
common_pre(_, []).
head_tail(H, [H|T], T).
Running:
?- S=["interspecies", "interstelar", "interstate"], common_pre(S, CP), string_to_list(CPString, CP).
Gives:
CP = [105, 110, 116, 101, 114, 115],
CPString = "inters".
Explanation:
(SWI-)PROLOG treats strings as lists of character codes (numbers). All the predicate common_pre/2 does is recursively pattern-match to select the first code (C) from the head of the first list (string, [C|Cs]) in the list of all lists (all strings, [[C|Cs]|Ss]), and appends the matching code C to the result iff it is common to all (remaining) heads of all lists (strings), else it terminates.
Nice, clean, simple and efficient... :)
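For readers who don't use Prolog, here is a rough Python analogue of the same recursion: take the first character if every string starts with it, then recurse on the tails. It is a sketch rather than a translation of common_pre/2. The name common_pre_rec is hypothetical, and the recursion depth grows with the prefix length.

```python
def common_pre_rec(strings):
    """Recursive longest common prefix, in the spirit of common_pre/2 (sketch)."""
    # Stop when there are no strings or any string is exhausted.
    if not strings or any(not s for s in strings):
        return ""
    head = strings[0][0]
    if all(s[0] == head for s in strings):
        # Keep the shared head and recurse on the remaining tails.
        return head + common_pre_rec([s[1:] for s in strings])
    return ""


print(common_pre_rec(["interspecies", "interstelar", "interstate"]))  # inters
```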
A javascript version based on #Svante's algorithm:
function commonSubstring(words){
  var iChar, iWord,
      refWord = words[0],
      lRefWord = refWord.length,
      lWords = words.length;
  for (iChar = 0; iChar < lRefWord; iChar += 1) {
    for (iWord = 1; iWord < lWords; iWord += 1) {
      if (refWord[iChar] !== words[iWord][iChar]) {
        return refWord.substring(0, iChar);
      }
    }
  }
  return refWord;
}
Combining answers by kennebec, Florian F and jberryman yields the following Haskell one-liner:
commonPrefix l = map fst . takeWhile (uncurry (==)) $ zip (minimum l) (maximum l)
With Control.Arrow one can get a point-free form:
commonPrefix = map fst . takeWhile (uncurry (==)) . uncurry zip . (minimum &&& maximum)
Python 2.6 (r26:66714, Oct 4 2008, 02:48:43)
>>> a = ['interspecies', 'interstelar', 'interstate']
>>> print a[0][:max(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
) + 1]
inters
i for i in range(min(map(len, a))): the number of lookups can't be greater than the length of the shortest string; in this example this evaluates to [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
len(set(map(lambda e: e[i], a))): 1) create a list of the i-th character of each string in the list; 2) make a set out of it; 3) determine the size of the set
[i for i in range(min(map(len, a))) if len(set(map(lambda e: e[i], a))) == 1]: keep only the positions for which the size of the set is 1 (all characters at that position are the same); here it evaluates to [0, 1, 2, 3, 4, 5]
finally, take the max, add one, and take that substring of the first string
Note: the above does not work for a = ['intersyate', 'intersxate', 'interstate', 'intersrate'], because the characters agree again after the first mismatch ("ate"), so the max position is 9 instead of 5. This version counts only the leading run of matching positions:
>>> index = len(
filter(lambda l: l[0] == l[1],
[ x for x in enumerate(
[i for i in range(min(map(len, a)))
if len(set(map(lambda e: e[i], a))) == 1]
)]))
>>> a[0][:index]
inters
Doesn't seem that complicated if you're not too concerned about ultimate performance:
def common_substring(data)
  data.inject { |m, s| s[0, (0..m.length).find { |i| m[i] != s[i] } || m.length] }
end
One useful feature of inject is that, when called without an argument, it seeds the memo with the first element of the array being iterated over. This avoids the nil memo check. If no position differs, the two strings are equal, so the whole memo is kept.
puts common_substring(%w[ interspecies interstelar interstate ]).inspect
# => "inters"
puts common_substring(%w[ feet feel feeble ]).inspect
# => "fee"
puts common_substring(%w[ fine firkin fail ]).inspect
# => "f"
puts common_substring(%w[ alpha bravo charlie ]).inspect
# => ""
puts common_substring(%w[ fork ]).inspect
# => "fork"
puts common_substring(%w[ fork forks ]).inspect
# => "fork"
Update: If golf is the game here, then 70 characters:
def f(d)d.inject{|m,s|s[0,(0..m.size).find{|i|m[i]!=s[i]}||m.size]}end
This one is very similar to Roberto Bonvallet's solution, except in ruby.
chars = %w[interspecies interstelar interstate].map {|w| w.split('') }
chars[0].zip(*chars[1..-1]).map { |c| c.uniq }.take_while { |c| c.size == 1 }.join
The first line replaces each word with an array of chars. Next, I use zip to create this data structure:
[["i", "i", "i"], ["n", "n", "n"], ["t", "t", "t"], ...
map and uniq reduce this to [["i"],["n"],["t"], ...
take_while pulls the chars off the array until it finds one where the size isn't one (meaning not all chars were the same). Finally, I join them back together.
The accepted solution is broken (for example, it returns 'a' for strings like ['a', 'ba']). The fix is very simple: you only have to change 3 characters (from indexOf(tem1) == -1 to indexOf(tem1) != 0) and the function works as expected.
Unfortunately, when I tried to edit the answer to fix the typo, SO told me that "edits must be at least 6 characters". I could change more than those 3 chars by improving naming and readability, but that feels like a bit too much.
So, below is a fixed and improved (at least from my point of view) version of kennebec's solution:
function commonPrefix(words) {
  // lexicographically largest and smallest words
  var max_word = words.reduce(function(a, b) { return a > b ? a : b });
  var prefix = words.reduce(function(a, b) { return a > b ? b : a }); // min word
  // shorten the min word until it is a prefix of the max word
  while(max_word.indexOf(prefix) != 0) {
    prefix = prefix.slice(0, -1);
  }
  return prefix;
}
(on jsFiddle)
Note that it uses the reduce method (JavaScript 1.8) to find the lexicographic max/min instead of sorting the array and then taking its first and last elements.
While reading these answers with all the fancy functional programming, sorting and regexes and whatnot, I just thought: what's wrong with a little bit of C? So here's a goofy-looking little program.
#include <stdio.h>
int main (int argc, char *argv[])
{
    int i = -1, j, c;
    if (argc < 2)
        return 1;
    /* walk the first argument; stop where it ends or another argument differs */
    while ((c = argv[1][++i]))
        for (j = 2; j < argc; j++)
            if (argv[j][i] != c)
                goto out;
out:
    printf("Longest common prefix: %.*s\n", i, argv[1]);
    return 0;
}
Compile it, run it with your list of strings as command line arguments, then upvote me for using goto!
Here's a solution using regular expressions in Ruby:
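def build_regex(string)
  # every non-empty prefix of string, longest first, escaped for use in a regex
  prefixes = string.length.downto(1).map { |n| Regexp.escape(string[0, n]) }
  Regexp.new("\\A(#{prefixes.join("|")})")
end
def substring(first, *strings)
  strings.inject(first) do |accum, string|
    # the alternation tries the longest prefix first; no match means nothing in common
    match = build_regex(accum).match(string)
    match ? match[0] : ""
  end
end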
I would do the following:
Take the first string of the array as the initial starting substring.
Take the next string of the array and compare the characters until the end of one of the strings is reached or a mismatch is found. Then reduce the starting substring to the length where the comparison stopped.
Repeat step 2 until all strings have been tested.
Here’s a JavaScript implementation:
var array = ["interspecies", "interstelar", "interstate"],
    prefix = array[0],
    len = prefix.length;
for (var i = 1; i < array.length; i++) {
  // never compare past the end of the shorter string
  len = Math.min(len, array[i].length);
  for (var j = 0; j < len; j++) {
    if (prefix[j] != array[i][j]) {
      len = j;  // mismatch: keep only the part before it
      break;
    }
  }
  prefix = prefix.substr(0, len);
}
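The three steps above can also be sketched in Python with functools.reduce, folding a two-string prefix function over the list. The names are hypothetical. zip stops at the end of the shorter string, which covers the "end of one of the strings" case in step 2.

```python
from functools import reduce


def common_prefix_of_two(a, b):
    """Prefix shared by two strings."""
    n = 0
    for x, y in zip(a, b):  # zip stops at the end of the shorter string
        if x != y:
            break
        n += 1
    return a[:n]


def horizontal_scan(strings):
    # Start from the first string and shrink it against each following string.
    if not strings:
        return ""
    return reduce(common_prefix_of_two, strings[1:], strings[0])


print(horizontal_scan(["interspecies", "interstelar", "interstate"]))  # inters
```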
Instead of sorting, you could just get the min and max of the strings.
To me, elegance in a computer program is a balance of speed and simplicity.
It should not do unnecessary computation, and it should be simple enough to make its correctness evident.
I could call the sorting solution "clever", but not "elegant".
Oftentimes it's more elegant to use a mature open source library instead of rolling your own. Then, if it doesn't completely suit your needs, you can extend it or modify it to improve it, and let the community decide if that belongs in the library.
diff-lcs is a good Ruby gem for computing the longest common subsequence (LCS).
My solution in Java:
public static String compute(Collection<String> strings) {
  if(strings.isEmpty()) return "";
  Set<Character> v = new HashSet<Character>();
  int i = 0;
  try {
    while(true) {
      for(String s : strings) v.add(s.charAt(i));
      if(v.size() > 1) break;
      v.clear();
      i++;
    }
  } catch(StringIndexOutOfBoundsException ex) {}
  return strings.iterator().next().substring(0, i);
}
Golfed JS solution just for fun:
w=["hello", "hell", "helen"];
c=w.reduce(function(p,c){
for(r="",i=0;p[i]&&p[i]==c[i];r+=p[i],i++){}
return r;
});
Here's an efficient solution in Ruby. I based the strategy on a hi/lo guessing game, where you iteratively zero in on the longest prefix.
Someone correct me if I'm wrong, but I think the complexity is O(n log n), where n is the length of the shortest string and the number of strings is considered a constant.
def common(strings)
  return '' if strings.empty?
  lo = 0                          # a prefix of length lo is known to be common
  hi = strings.map(&:length).min  # no common prefix can be longer than this
  while lo < hi
    guess = (lo + hi + 1) / 2     # try a length in (lo, hi]
    if strings.map { |s| s[0, guess] }.uniq.length == 1
      lo = guess                  # common: look for something longer
    else
      hi = guess - 1              # not common: look for something shorter
    end
  end
  strings.first[0, lo]
end
And some checks that it works:
>> common %w{ interspecies interstelar interstate }
=> "inters"
>> common %w{ dog dalmation }
=> "d"
>> common %w{ asdf qwerty }
=> ""
>> common ['', 'asdf']
=> ""
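A Python sketch of the same hi/lo idea, written as a standard binary search on the prefix length. The names are hypothetical, and it is not a line-by-line port of the Ruby code. The invariant is that first[:lo] is always common, and no common prefix is longer than hi.

```python
def binary_search_prefix(strings):
    """Longest common prefix by binary search on its length (sketch)."""
    if not strings:
        return ""
    first = strings[0]
    lo, hi = 0, min(len(s) for s in strings)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        candidate = first[:mid]
        if all(s.startswith(candidate) for s in strings):
            lo = mid       # candidate is common: try longer
        else:
            hi = mid - 1   # candidate is not common: try shorter
    return first[:lo]


print(binary_search_prefix(["interspecies", "interstelar", "interstate"]))  # inters
```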
Fun alternative Ruby solution:
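def common_prefix(*strings)
  chars = strings.map(&:chars)
  # index of the first position where the strings disagree; none means the
  # whole first string is common
  length = chars.first.zip( *chars[1..-1] ).index{ |a| a.uniq.length>1 } || strings.first.length
  strings.first[0,length]
end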
p common_prefix( 'foon', 'foost', 'forlorn' ) #=> "fo"
p common_prefix( 'foost', 'foobar', 'foon' ) #=> "foo"
p common_prefix( 'a','b' ) #=> ""
It might help speed if you used chars = strings.sort_by(&:length).map(&:chars), since the shorter the first string, the shorter the arrays created by zip. However, if you cared about speed, you probably shouldn't use this solution anyhow. :)
Javascript clone of AShelly's excellent answer.
Requires Array#reduce (ECMAScript 5), which at the time of writing was supported only in Firefox.
var strings = ["interspecies", "intermediate", "interrogation"]
var sub = strings.reduce(function(l,r) {
while(l!=r.slice(0,l.length)) {
l = l.slice(0, -1);
}
return l;
});
This is by no means elegant, but if you want concise:
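Ruby, 83 chars
def f(a)b=a[0];b[0,((0..b.size).find{|n|a.any?{|i|i[0,n]!=b[0,n]}}||b.size+1)-1]end
If you want that unrolled it looks like this:
def f(words)
  first_word = words[0]
  # first length at which some word stops sharing first_word's prefix
  mismatch = (0..(first_word.size)).find { |num_chars|
    words.any? { |word| word[0, num_chars] != first_word[0, num_chars] }
  }
  # no mismatch means the whole first word is common
  first_word[0, (mismatch || first_word.size + 1) - 1]
end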
It's not code golf, but you asked for somewhat elegant, and I tend to think recursion is fun. Java.
/** Recursively find the common prefix. */
public String findCommonPrefix(String[] strings) {
  // Base case: no strings, or some string has no characters left.
  if (strings.length == 0 || findMinLength(strings) == 0) {
    return "";
  }
  if (isFirstCharacterSame(strings)) {
    return strings[0].charAt(0) + findCommonPrefix(removeFirstCharacter(strings));
  } else {
    return "";
  }
}
/** Get the minimum length of a string in strings[]. */
private int findMinLength(final String[] strings) {
  int length = strings[0].length();
  for (String string : strings) {
    if (string.length() < length) {
      length = string.length();
    }
  }
  return length;
}
/** Compare the first character of all strings. */
private boolean isFirstCharacterSame(String[] strings) {
  char c = strings[0].charAt(0);
  for (String string : strings) {
    if (c != string.charAt(0)) return false;
  }
  return true;
}
/** Remove the first character of each string in the array,
    and return a new array with the results. */
private String[] removeFirstCharacter(String[] source) {
  String[] result = new String[source.length];
  for (int i=0; i<result.length; i++) {
    result[i] = source[i].substring(1);
  }
  return result;
}
A ruby version based on #Svante's algorithm. Runs ~3x as fast as my first one.
def common_prefix set
  i=0
  rest=set[1..-1]
  set[0].each_char{|c|
    # e[i,1] is a one-character string ("" or nil past the end)
    rest.each{|e|return set[0][0...i] if e[i,1]!=c}
    i+=1
  }
  set[0]
end
My Javascript solution:
IMO, using sort is too tricky.
My solution compares the strings letter by letter by looping over the array, and returns the prefix as soon as a letter does not match.
This is my solution:
var longestCommonPrefix = function(strs){
if(strs.length < 1){
return '';
}
var p = 0, i = 0, c = strs[0][0];
while(p < strs[i].length && strs[i][p] === c){
i++;
if(i === strs.length){
i = 0;
p++;
c = strs[0][p];
}
}
return strs[0].substr(0, p);
};
Realizing the risk of this turning into a match of code golf (or is that the intention?), here's my solution using sed, copied from my answer to another SO question and shortened to 36 chars (30 of which are the actual sed expression). It expects the strings (each on a separate line) to be supplied on standard input or in files passed as additional arguments.
sed 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'
A script with sed in the shebang line weighs in at 45 chars:
#!/bin/sed -f
N;s/^\(.*\).*\n\1.*$/\1\n\1/;D
A test run of the script (named longestprefix), with strings supplied as a "here document":
$ ./longestprefix <<EOF
> interspecies
> interstelar
> interstate
> EOF
inters
$
