[update] Question solved, see bottom of post.
Python 2.5 added a new built-in function, ‘all’ (and its nephew ‘any’). I wanted to play around with them a little, combined with generators, so I created a small test case to measure performance.
Here’s the test-case: take a list L of X random numbers in a given range [A, B], and check whether
- all elements in L are >= A
- all elements in L are >= (A + Z) where Z is a number in [0, (B - A)]
The first test should always return True; the second may return False.
Here’s the output of a test-run:
In [1]: import random, sys
In [2]: a = [random.randint(100, sys.maxint) for i in xrange(2000000)]
In [3]: len(a)
Out[3]: 2000000
In [4]: #Check whether all elements are >= 100
In [5]: %timeit all(i >= 100 for i in a)
10 loops, best of 3: 515 ms per loop
In [6]: %timeit any(i < 100 for i in a)
10 loops, best of 3: 454 ms per loop
In [7]: def f(l):
   ...:     for i in l:
   ...:         if i < 100:
   ...:             return False
   ...:     return True
   ...:
In [8]: %timeit f(a)
10 loops, best of 3: 292 ms per loop
In [9]: #Same thing for 100000, since now the list shouldn't be completely iterated
In [10]: %timeit all(i >= 100000 for i in a)
100 loops, best of 3: 4.73 ms per loop
In [11]: %timeit any(i < 100000 for i in a)
100 loops, best of 3: 4.29 ms per loop
In [12]: def g(l):
   ....:     for i in l:
   ....:         if i < 100000:
   ....:             return False
   ....:     return True
   ....:
In [13]: %timeit g(a)
100 loops, best of 3: 2.82 ms per loop
In [14]: #For reference
In [15]: %timeit False in (i >= 100 for i in a)
10 loops, best of 3: 531 ms per loop
In [16]: %timeit False in (i >= 100000 for i in a)
100 loops, best of 3: 5.03 ms per loop
It’s as if ‘all’, ‘any’ and ‘in’ don’t break/return as soon as the first occurrence of False (or True, obviously) is found. Is this the desired behaviour, and if so, why? The difference in run time between all/any/in and a hand-written function (which, unlike all etc., is not written in C) that returns as soon as it can is pretty astonishing.
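[update] Question solved. ‘all’, ‘any’ and ‘in’ do stop at the first hit: the 100000 checks above finish roughly a hundred times faster than the full scans. It’s pretty normal that the function-based approach performs better, since it combines what ‘all’ and the generator passed to ‘all’ do in a single loop, taking away the overhead of resuming the generator for every element. Damn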
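For anyone who wants to double-check the early exit, here is a minimal sketch that counts how many elements actually get pulled from the source before ‘all’, ‘any’ and ‘in’ return. The helper name counting and the small sample list are made up for illustration; they are not part of the timing session above.

```python
def counting(iterable, counter):
    """Yield items unchanged, bumping counter[0] for each item consumed."""
    for item in iterable:
        counter[0] += 1
        yield item

# Hypothetical sample data: the second element is the first one below 100.
data = [500, 7, 900, 1200, 3]

consumed = [0]
all_result = all(i >= 100 for i in counting(data, consumed))
all_consumed = consumed[0]

consumed = [0]
any_result = any(i < 100 for i in counting(data, consumed))
any_consumed = consumed[0]

consumed = [0]
in_result = False in (i >= 100 for i in counting(data, consumed))
in_consumed = consumed[0]

print("all: %s after %d items" % (all_result, all_consumed))
print("any: %s after %d items" % (any_result, any_consumed))
print("in:  %s after %d items" % (in_result, in_consumed))
```

All three stop after the second element instead of walking the whole list, so the gap in the timings comes from per-element overhead, not from a missing break.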