Discussion:
How to make Python run as fast (or faster) than Julia
Steven D'Aprano
2018-02-22 10:59:25 UTC
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en
--
Steve
bartc
2018-02-22 12:03:09 UTC
Post by Steven D'Aprano
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en
While an interesting article on speed-up techniques, that seems to miss
the point of benchmarks.

On the fib(20) test, it suggests using this to get a 30,000 times speed-up:

from functools import lru_cache as cache

@cache(maxsize=None)
def fib_cache(n):
    if n < 2:
        return n
    return fib_cache(n-1) + fib_cache(n-2)

The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls.
fib(36), for example, would normally involve 48,315,633 calls.

This version does only 37, giving a misleading impression.
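A quick instrumented sketch (the counter variables are mine, not from the article) confirms those call counts on a smaller input:

```python
from functools import lru_cache

plain_calls = 0
cached_calls = 0

def fib(n):
    # benchmark version: re-enters the function for every recursive call
    global plain_calls
    plain_calls += 1
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

@lru_cache(maxsize=None)
def fib_cache(n):
    # lru_cache answers repeated arguments itself, so the body
    # only runs once per distinct n
    global cached_calls
    cached_calls += 1
    if n < 2:
        return n
    return fib_cache(n-1) + fib_cache(n-2)

fib(20)
fib_cache(20)
print(plain_calls, cached_calls)   # 21891 vs 21
```

Scaling that up, fib(36) gives the 48,315,633 vs 37 figures above.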

Anyway, I got a 6x speed-up using pypy, without changing anything.
Although I doubt that's still executing actual byte-code, if /that/
was the point of the test.

(It then goes on to suggest using 'numba', and using its JIT compiler,
and using that on an /iterative/ version of fib(). Way to miss the point.

It might be a technique to bear in mind, but it is nonsensical to say
this gives a 17,000 times speed-up over the original code.

Here's another speed-up I found myself, although it was only 50 times
faster, not 17,000: just write the code in C, and call it via
os.system("fib.exe"). But you /do/ need to write it in a different
language.)
--
bartc
Chris Angelico
2018-02-22 12:29:53 UTC
Post by Steven D'Aprano
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en
While an interesting article on speed-up techniques, that seems to miss the
point of benchmarks.
from functools import lru_cache as cache
@cache(maxsize=None)
def fib_cache(n):
    if n < 2:
        return n
    return fib_cache(n-1)+fib_cache(n-2)
The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls. Then,
fib(36) would normally involve 48,315,633 calls.
This version does only 37, giving a misleading impression.
Not overly misleading; the point of it is to show how trivially easy
it is to memoize a function in Python. For a fair comparison, I'd like
to see the equivalent Julia code: the function, unchanged, with
something around the outside of it to manage caching and memoization.
Can that be done with a couple of trivial lines of code using only the
standard library?

ChrisA
Serhiy Storchaka
2018-02-22 13:51:39 UTC
Post by Chris Angelico
Not overly misleading; the point of it is to show how trivially easy
it is to memoize a function in Python. For a fair comparison, I'd like
to see the equivalent Julia code: the function, unchanged, with
something around the outside of it to manage caching and memoization.
Can that be done with a couple of trivial lines of code using only the
standard library?
The article contains a reference to a Julia example.
Post by Chris Angelico
Note that Julia also can memoize functions, see this example [1]
provided by Ismael V.C.
[1] http://nbviewer.jupyter.org/gist/Ismael-VC/b04ad17ee44c89917803
ast
2018-02-22 15:15:54 UTC
Post by bartc
Post by Steven D'Aprano
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en
While an interesting article on speed-up techniques, that seems to miss
the point of benchmarks.
from functools import lru_cache as cache
@cache(maxsize=None)
def fib_cache(n):
    if n < 2:
        return n
    return fib_cache(n-1)+fib_cache(n-2)
It's meaningless to test the execution time of a function
with a cache decorator over 1,000,000 loops.

The first execution, OK, you get something meaningful,
but for the other 999,999 executions the result is already in
the cache, so you just measure the time to read the result
from a dictionary and return it.
Post by bartc
Post by Steven D'Aprano
setup = """\
from functools import lru_cache as cache
@cache(maxsize=None)
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
"""
Post by bartc
Post by Steven D'Aprano
from timeit import timeit
timeit("fib(20)", setup=setup, number=1)
0.00010329007704967808
Post by bartc
Post by Steven D'Aprano
timeit("fib(20)", setup=setup, number=100)
0.0001489834564836201

So 100 loops or 1 loop give similar results,
as expected!
Chris Angelico
2018-02-22 18:53:26 UTC
Post by ast
Post by bartc
Post by Steven D'Aprano
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en
While an interesting article on speed-up techniques, that seems to miss
the point of benchmarks.
from functools import lru_cache as cache
@cache(maxsize=None)
def fib_cache(n):
    if n < 2:
        return n
    return fib_cache(n-1)+fib_cache(n-2)
It's meaningless to test the execution time of a function
with a cache decorator over 1,000,000 loops.
The first execution, OK, you get something meaningful,
but for the other 999,999 executions the result is already in
the cache, so you just measure the time to read the result
from a dictionary and return it.
Post by bartc
Post by Steven D'Aprano
setup = """\
from functools import lru_cache as cache
@cache(maxsize=None)
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
"""
Post by bartc
Post by Steven D'Aprano
from timeit import timeit
timeit("fib(20)", setup=setup, number=1)
0.00010329007704967808
Post by bartc
Post by Steven D'Aprano
timeit("fib(20)", setup=setup, number=100)
0.0001489834564836201
So 100 loops or 1 loop give similar results,
as expected!
The solution would be to flush the cache in the core code. Here's a tweak:

from timeit import timeit
setup = """
from functools import lru_cache as cache
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
# this is what we effectively do inside the loop:
@cache(maxsize=None)
def fib_cached(n):
    return fib(n)
"""
for count in 1, 10, 100, 1000:
    print(count, timeit("cache(maxsize=None)(fib)(20)",
                        setup=setup, number=count))

You could improve on performance by keeping the same function object
and just flushing the cache with fib.cache_clear(), but I have no idea
how you'd translate that into other languages. Constructing a cache at
run-time should be safe.
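That cache_clear() variant might look like this (a sketch; actual timings are machine-dependent, so none are claimed here):

```python
from timeit import timeit

setup = """
from functools import lru_cache as cache

@cache(maxsize=None)
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
"""

# cache_clear() empties the memo table before each timed call, so
# every iteration pays the full recursive cost while reusing the
# same function object
results = {}
for count in 1, 10, 100:
    results[count] = timeit("fib.cache_clear(); fib(20)",
                            setup=setup, number=count)
    print(count, results[count])
```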

ChrisA
ast
2018-02-22 20:15:12 UTC
Post by Chris Angelico
print(count, timeit("cache(maxsize=None)(fib)(20)", setup=setup,
number=count))
hum ... very astute
Mark Lawrence
2018-02-22 14:02:46 UTC
Post by bartc
It might be a technique to bear in mind, but it is nonsensical to say
this gives a 17,000 times speed-up over the original code.
What makes you say that? I have just run all of the code as given in
the reference and get results that are in the same ball park for every
test that was run.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
bartc
2018-02-23 00:26:33 UTC
Post by bartc
It might be a technique to bear in mind, but it is nonsensical to say
this gives a 17,000 times speed-up over the original code.
What makes you say that?  I have just run all of the code as given in
the reference and get results that are in the same ball park for every
test that was run.
It's nonsensical because you are comparing two very different
algorithms, especially when one could well be a million times faster
than the other.

The point of the article was Julia vs. Python. You can't make Python
faster by switching to a faster algorithm; you have to use the same one
on both.
--
bartc
Steven D'Aprano
2018-02-23 05:00:13 UTC
Post by bartc
The point of the article was Julia vs. Python. You can't make Python
faster by switching to a faster algorithm; you have to use the same one
on both.
No, the point of the article was to write Python code that is as fast as
the Julia code.

I don't know why the Julia programmers chose to use such a poor
algorithm: maybe they are show-casing just how amazingly awesome the
Julia compiler is that it can even make a rubbish algorithm run fast. I
don't care -- they can choose whatever algorithm they like for their
Julia code, but I don't really care how good the language is at running
bad code that I would never use in real life.
--
Steve
Ben Bacarisse
2018-02-23 13:51:33 UTC
Post by Steven D'Aprano
Post by bartc
The point of the article was Julia vs. Python. You can't make Python
faster by switching to a faster algorithm; you have to use the same one
on both.
No, the point of the article was to write Python code that is as fast as
the Julia code.
I did not get that impression. In fact I was not really sure what the
point was. At the start, the author says

"My take on this kind of cross language comparison is that the
benchmarks should be defined by tasks to perform, then have language
experts write the best code they can to perform these tasks."

but if that is the point (rather than the one you picked up on) then the
article could be, at best, only half the story.
Post by Steven D'Aprano
I don't know why the Julia programmers chose to use such a poor algorithm:
It's odd indeed, but given that they did, what you take to be the point
of the article -- to write a good Python algorithm as fast as the
terrible Julia one -- seems a bit pointless.
--
Ben.
Steven D'Aprano
2018-02-23 18:05:59 UTC
On Fri, 23 Feb 2018 13:51:33 +0000, Ben Bacarisse wrote:

[...]
Post by Ben Bacarisse
Post by Steven D'Aprano
I don't know why the Julia programmers chose to use such a poor algorithm:
It's odd indeed, but given that they did, what you take to be the point
of the article -- to write a good Python algorithm as fast as the
terrible Julia one -- seems a bit pointless.
Someone looking at the Julia benchmarks will look at the results and go,
"Wow! Julia is so much faster than Python, I should choose to use Julia
instead!".

But guess what? The benchmarks are flawed. The performance of real-world
Julia code doesn't match the performance of the benchmarks.

"What’s disappointing is the striking difference between
the claimed performance and the observed one. For example,
a trivial hello world program in Julia runs ~27x slower
than Python’s version and ~187x slower than the one in C."

http://zverovich.net/2016/05/13/giving-up-on-julia.html

Admittedly "Hello World" is not what *I* would call a real world program,
but that's just an illustration of the discrepancy between the
performance in artificial benchmarks and the performance in useful code.

The benchmarks emphasise what Julia is good at, and use poor, unidiomatic
Python code. People who are either too naive to know better, or who *do*
know better but for some bizarre reason still take benchmarks at face
value, believe them, and conclude that Julia is faster than Python. Not
just a bit faster, but enough to conclude that Python is uncompetitive.

Okay, so Julia is fast, at least for toy benchmarks, maybe or maybe not
for actual code you care about. Great. Python is fast too, if you stop
writing shitty code. The *FIRST* lesson in optimization is to re-write
your crap implementation with a better algorithm. The point of the post
is to teach that lesson.

Stop writing crap code and then complaining that the language is "too
slow". Write better code, and then we'll take your complaints seriously.
Or, you never know, maybe you'll find it's fast enough and you don't
actually have to abandon your existing code base and migrate to a whole
new language.

"Python takes hours to sort a million integers using BubbleSort! It's too
damn slow!!!"

"Why not use the built-in sort? That's fast."

"NOOOOOO!!!! I MUST USE BUBBLESORT, BECAUSE REASONS!!!1!"

*wink*
--
Steve
bartc
2018-02-23 19:25:35 UTC
Post by Steven D'Aprano
Stop writing crap code and then complaining that the language is "too
slow". Write better code, and then we'll take your complaints seriously.
I write compilers for static languages, which are generally considered
toys, partly because the code generated is not great.

But in a real application, the difference between my compiler, and
gcc-O3 or MSVC/O2 might only be 10-50%. 1.1 to 1.5 times as fast (or as
slow), and that's an application that does few external library calls.

The difference between Python and another dynamic language might be a
magnitude, yet you say it doesn't matter.

Thanks, that makes me feel much better about my own work! If a magnitude
difference in performance can be so casually dismissed.
Post by Steven D'Aprano
"Python takes hours to sort a million integers using BubbleSort! Its too
damn slow!!!"
"Why not use the built-in sort? That's fast."
"NOOOOOO!!!! I MUST USE BUBBLESORT, BECAUSE REASONS!!!1!"
*wink*
Have you ever written a benchmark in your life? If so, did it do
anything useful? Other than what benchmarks are normally used for.
--
bartc
Chris Angelico
2018-02-23 19:47:00 UTC
Post by Steven D'Aprano
Stop writing crap code and then complaining that the language is "too
slow". Write better code, and then we'll take your complaints seriously.
I write compilers for static languages, which are generally considered toys,
partly because the code generated is not great.
Doesn't matter what language you're writing in, if your algorithm is
flawed. Bubble sort is bubble sort no matter what you're using. This
function will calculate triangle numbers just as inefficiently in
Python as it would in C:

def triangle(n):
    if n <= 1: return n
    return triangle(n-1) + triangle(n-1) - triangle(n-2) + 1
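As with fib, the cure is algorithmic: memoize the same recurrence, or just use the closed form n*(n+1)//2. An illustrative sketch (not from the original post):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def triangle(n):
    # same recurrence as above, but each distinct n is computed once
    if n <= 1:
        return n
    return triangle(n-1) + triangle(n-1) - triangle(n-2) + 1

print(triangle(100))   # 5050, matching the closed form 100*101//2
```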
But in a real application, the difference between my compiler, and gcc-O3 or
MSVC/O2 might only be 10-50%. 1.1 to 1.5 times as fast (or as slow), and
that's an application that does few external library calls.
The difference between Python and another dynamic language might be a
magnitude, yet you say it doesn't matter.
Thanks, that makes me feel much better about my own work! If a magnitude
difference in performance can be so casually dismissed.
Yep! Your work is awesome, because your code is only a little bit
slower than the same algorithm with the same bugs in a better-known
language that's supported on more platforms and has a large team of
people supporting it.

Remind me what's actually advantageous about your language?

ChrisA
bartc
2018-02-23 20:02:21 UTC
Post by Chris Angelico
Post by bartc
The difference between Python and another dynamic language might be a
magnitude, yet you say it doesn't matter.
Thanks, that makes me feel much better about my own work! If a magnitude
difference in performance can be so casually dismissed.
Yep! Your work is awesome, because your code is only a little bit
slower than the same algorithm with the same bugs in a better-known
language that's supported on more platforms and has a large team of
people supporting it.
Remind me what's actually advantageous about your language?
I don't know what point you're making here. Unless it's that no software
is worth writing unless it's done on a big scale with a huge team of
people, and is very well known.

My own point should be clear:

Python is 10 times slower than a competitor = doesn't matter
My language is 1.5 times slower than the big boys' = matters a great deal
Chris Angelico
2018-02-23 20:12:48 UTC
Post by Chris Angelico
Post by bartc
The difference between Python and another dynamic language might be a
magnitude, yet you say it doesn't matter.
Thanks, that makes me feel much better about my own work! If a magnitude
difference in performance can be so casually dismissed.
Yep! Your work is awesome, because your code is only a little bit
slower than the same algorithm with the same bugs in a better-known
language that's supported on more platforms and has a large team of
people supporting it.
Remind me what's actually advantageous about your language?
I don't know what point you're making here. Unless it's that no software is
worth writing unless it's done on a big scale with a huge team of people,
and is very well known.
Python is 10 times slower than a competitor = doesn't matter
My language is 1.5 times slower than the big boys' = matters a great deal
Most performance is a tradeoff. Python has other advantages; what are
yours? So far, all you've said is that you're "not too much worse".
That's great for a toy language!

ChrisA
bartc
2018-02-23 21:32:50 UTC
Post by Chris Angelico
I don't know what point you're making here. Unless it's that no software is
worth writing unless it's done on a big scale with a huge team of people,
and is very well known.
Python is 10 times slower than a competitor = doesn't matter
My language is 1.5 times slower than the big boys' = matters a great deal
Most performance is a tradeoff. Python has other advantages; what are
yours? So far, all you've said is that you're "not too much worse".
That's great for a toy language!
Ned Batchelder has banned me from mentioning any of my private work here.

So I'll keep it generic. Let's say the Tiny C compiler is not taken
seriously because it might be a couple of times slower than gcc-O3, even
though it's 1% of the size and compiles 1000% as fast.

But the difference in runtime speed between Python and other dynamic
languages, if you look at benchmarks doing actual work in the language,
can be much greater than two times, yet that doesn't appear to matter.

Even defining how fast or slow Python actually is is fraught with
complexity, making it hard to establish just what its comparative speed is:

No one seems able to agree on which benchmarks ought to be used.
Some don't seem to get benchmarks. Then it's a question of whether you run
CPython. Or PyPy. Or IronPython or Jython. Or Numba with @jit. Or
Numpy. Or @cache or whatever it is. Or you run Cython.

Or maybe you can combine @jit, @cache, Cython, and PyPy in the same
program; I don't know. (Have I ever mentioned Python is complicated?)

So, you were asking about the benefits of having a small, simple
self-contained implementation of a language. I really couldn't tell you...
--
bartc
Chris Angelico
2018-02-23 21:51:27 UTC
Post by bartc
So I'll keep it generic. Let's say the Tiny C compiler is not taken
seriously because it might be a couple of times slower than gcc-O3, even
though it's 1% of the size and compiles 1000% as fast.
Except that nobody has said that. You're doing an excellent job of
fighting against a strawman.

I'm done. Have fun!

ChrisA
Dan Stromberg
2018-02-24 00:45:38 UTC
Post by bartc
So I'll keep it generic. Let's say the Tiny C compiler is not taken
seriously because it might be a couple of times slower than gcc-O3, even
though it's 1% of the size and compiles 1000% as fast.
What's generic about a specific example? (that's rhetorical)

I've used tcc in preference to gcc before; I had an application where
performance didn't matter as much as security; tcc has buffer overrun
checking.
Post by bartc
But the difference in runtime speed between Python and other dynamic
languages, if you look at benchmarks doing actual work in the language, can
be much greater than two times, yet that doesn't appear to matter.
The best way to use Python is to:
1) Write your whole application in it.
2) IF things are too slow, profile the app. Usually performance will
be fast enough from the start, but not always.
3) After (if) identifying a hotspot, optimize it in any of a variety
of ways, while still reaping the benefits of rapid development (which
often matters much more than CPU speed today)

I have a page about speeding up python:
http://stromberg.dnsalias.org/~strombrg/speeding-python/

But again, those techniques are only infrequently relevant, and pretty
much never relevant to an entire application.
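Step 2 needs nothing beyond the standard library's cProfile. A generic sketch (the function names are invented for illustration):

```python
import cProfile
import io
import pstats

def hotspot():
    # stand-in for the slow part of a real application
    return sum(i * i for i in range(100_000))

def app():
    return [hotspot() for _ in range(10)]

profiler = cProfile.Profile()
profiler.enable()
app()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)   # 'hotspot' should dominate the cumulative-time column
```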
bartc
2018-02-24 02:06:05 UTC
Post by Dan Stromberg
Post by bartc
But the difference in runtime speed between Python and other dynamic
languages, if you look at benchmarks doing actual work in the language, can
be much greater than two times, yet that doesn't appear to matter.
1) Write your whole application in it.
2) IF things are too slow, profile the app. Usually performance will
be fast enough from the start, but not always.
3) After (if) identifying a hotspot, optimize it in any of a variety
of ways, while still reaping the benefits of rapid development (which
often matters much more than CPU speed today)
http://stromberg.dnsalias.org/~strombrg/speeding-python/
But again, those techniques are only infrequently relevant, and pretty
much never relevant to an entire application.
That your page lists 23 different approaches to making Python faster
suggests that its speed can be an issue. Certainly there seem to be a
lot of projects going on trying to fix that aspect.

So I'll rephrase, and say that some individuals in this group are saying
that speed doesn't matter. For many others it apparently does.

(I've just looked at the benchmark results referred to in the OP's link
[https://julialang.org/benchmarks/] and the comparison for the recursion
benchmark (let's just call it that) seems fair enough to me.

Perhaps it ought to have been clear that this was CPython, if it in fact
was. There are so many possibilities with Python, that using anything
else would confuse matters. People who use Python will be aware there
are acceleration techniques.

(A note regarding the C timing in that table: I've looked at what gcc
does with this benchmark with its unreasonably fast timing, and it
appears to cheat. That is, it does not call that function (ie. enter via
the normal prologue code) the number of times it says. I don't know if
that was the case when they tested C for that table. But it can happen
that a sufficiently simple benchmark can lend itself to
super-optimisation that is much harder with real code.

For that reason I no longer take account of gcc-O3 timings for
micro-benchmarks.))
--
bartc
Dan Stromberg
2018-02-24 02:22:29 UTC
Post by bartc
Post by Dan Stromberg
But again, those techniques are only infrequently relevant, and pretty
much never relevant to an entire application.
That your page lists 23 different approaches to making Python faster
suggests that its speed can be an issue. Certainly there seem to be a lot of
projects going on trying to fix that aspect.
Of course CPU performance can be an issue. For any language.

When I was a kid, programs were written in BASIC. If BASIC was too
slow, you'd rewrite the hotspots in assembler. Does that mean
everything should have been written in assembler? Not at all.
Assembler was slow to code in and tended to produce crashy solutions.

Obsessing about CPU time is outdated thinking for almost all problems.
Developer time is much more likely to matter to the bottom line.
Post by bartc
Perhaps it ought to have been clear that this was CPython, if it in fact
was.
CPython != Python. g++ != C++. gcc != C.

Python is a language. Not an implementation of a language.
Post by bartc
There are so many possibilities with Python, that using anything else
would confuse matters.
Gosh no. A language's success can, in significant part, be judged by
the number of compatible implementations and dialects.

You might find this interesting:
http://stromberg.dnsalias.org/~strombrg/why-is-python-slow/

It's just one microbenchmark, but it's me addressing someone asking
why Python is so much slower than C++.
Ned Batchelder
2018-02-23 20:40:06 UTC
Post by bartc
Post by Chris Angelico
Post by bartc
The difference between Python and another dynamic language might be a
magnitude, yet you say it doesn't matter.
Thanks, that makes me feel much better about my own work! If a magnitude
difference in performance can be so casually dismissed.
Yep! Your work is awesome, because your code is only a little bit
slower than the same algorithm with the same bugs in a better-known
language that's supported on more platforms and has a large team of
people supporting it.
Remind me what's actually advantageous about your language?
I don't know what point you're making here. Unless it's that no
software is worth writing unless it's done on a big scale with a huge
team of people, and is very well known.
 Python is 10 times slower than a competitor = doesn't matter
 My language is 1.5 times slower than the big boys' = matters a great
deal
I've lost track of the points anyone here is making.  Can we PLEASE
avoid getting into this tarpit of comparing Bart's language to Python
yet again?

--Ned.
Steven D'Aprano
2018-02-24 02:05:17 UTC
Post by bartc
Post by Steven D'Aprano
Stop writing crap code and then complaining that the language is "too
slow". Write better code, and then we'll take your complaints
seriously.
I write compilers for static languages, which are generally considered
toys, partly because the code generated is not great.
I have no idea whether your compilers are toys, or what "not great" means
for your generated code.

- too slow for the intended purpose?
- full of bugs?
- generated code is enormous?
- requires too much memory?
- all of the above?

If the actual answer is, "none of the above", then apart from your
failure to attract users and build a full ecosystem around your compiler,
I see no reason to call them toys.

If your compilers can get within 10% of best-of-breed C compilers, even
if only on trivial problems, that's nothing to sneer at. If you can do it
on non-trivial applications (say, the Linux kernel) without introducing a
vast number of compiler-generated bugs, then you're doing very well
indeed.
Post by bartc
Python is 10 times slower than a competitor = doesn't matter
My language is 1.5 times slower than the big boys' = matters
a great deal
Maybe it matters to *you*. You are the one who stated it was a toy
compiler. To me, I have *no idea* whether the compiler is any good.
Nobody but you uses it. I once tried to compile it, and couldn't get it
to compile.

As for Python's order-of-magnitude speed difference, thank you for being
generous. If you cherry-pick the right benchmarks, you could easily have
said it was two orders-of-magnitude slower. On the other hand, if you use
PyPy, and similarly cherry-pick the right benchmarks, you can justifiably
say that *Python is faster than C*. (On carefully chosen examples which
probably aren't representative of full application performance.)

But okay, let's take "Python is an order of magnitude (between 10 and 90
times) slower than C" for the sake of the argument. Does it matter?

Maybe. Maybe not. Sometimes it matters. Sometimes it doesn't.

The world's fastest street-legal production car is, so I'm told, the
Bugatti Veyron Super Sport. It has a top speed of 268 mph and the engine
has 1200 horsepower. And yet literally *billions* of car drivers choose
to drive something slower. They wouldn't buy a Veyron even if they could
afford it. Why shouldn't people make choices based on factors other than
speed, speed, speed, speed, speed?

Ruby has fallen out of favour now, but for quite a few years it was
looking like a contender despite being an order of magnitude slower than
Python. Which is an order of magnitude slower than Java, which is an
order of magnitude slower than C, according to some benchmarks at least.

People write business applications using Excel spreadsheets and macros.
And why not?

So it is quite possible to get practical work done and be a competitive,
useful language despite being (allegedly) a thousand or more times slower
than C.


[...]
Post by bartc
Have you ever written a benchmark in your life? If so, did it do
anything useful? Other than what benchmarks are normally used for.
Yes I have, not for what benchmarks are normally used for (dick-measuring
contests between programmers competing to see whose language is more
macho) but for the very practical purposes of:

(1) Deciding which algorithm to use in practice;

(2) Ensuring that updates to the code don't suffer serious performance
regressions;

(3) And checking that the code is *fast enough* (not as fast as
conceivably possible) for its purpose.

They are the sorts of benchmarks I care about.
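Purpose (1), for instance, needs nothing fancier than timeit. A sketch with two stand-in candidate implementations (the names are mine, purely illustrative):

```python
from timeit import timeit

def concat_loop(n):
    # candidate A: repeated string concatenation
    s = ""
    for i in range(n):
        s += str(i)
    return s

def concat_join(n):
    # candidate B: build the pieces, join once
    return "".join(str(i) for i in range(n))

# sanity-check that the candidates agree before timing them
assert concat_loop(500) == concat_join(500)

a = timeit(lambda: concat_loop(1000), number=200)
b = timeit(lambda: concat_join(1000), number=200)
print(f"loop {a:.4f}s  join {b:.4f}s")
```

Which candidate wins can vary by interpreter, which is exactly why you measure instead of guessing.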
--
Steve
bartc
2018-02-24 14:38:41 UTC
Post by Steven D'Aprano
Post by bartc
Python is 10 times slower than a competitor = doesn't matter
My language is 1.5 times slower than the big boys' = matters
a great deal
As for Python's order-of-magnitude speed difference, thank you for being
generous.
Actually that comparison was with a competitor, ie. another dynamic
language, because I understand such languages work in different fields
from the Cs and C++s.

I'm sure there must be some that are faster (years since I've looked at
the field), but I vaguely had in mind mine. Although since then, CPython
has gotten faster.

Note that there are JIT-based implementations now which can give very
good results (other than PyPy) with dynamic languages.

My own efforts are still byte-code based so are unlikely to get any
faster. But they are also very simple.
Post by Steven D'Aprano
So it is quite possible to get practical work done and be a competitive,
useful language despite being (allegedly) a thousand or more times slower
than C.
Of course. I've been using a dynamic scripting language as an adjunct to
my compiled applications since the mid 80s. Then they were crude and
hopelessly slow (and machines were a lot slower too), but they could
still be tremendously useful with the right balance.

But the faster they are, the more work they can take over.
--
bartc
Chris Angelico
2018-02-23 18:37:14 UTC
On Sat, Feb 24, 2018 at 5:05 AM, Steven D'Aprano
Post by Steven D'Aprano
But guess what? The benchmarks are flawed. The performance of real-world
Julia code doesn't match the performance of the benchmarks.
"What’s disappointing is the striking difference between
the claimed performance and the observed one. For example,
a trivial hello world program in Julia runs ~27x slower
than Python’s version and ~187x slower than the one in C."
http://zverovich.net/2016/05/13/giving-up-on-julia.html
Admittedly "Hello World" is not what *I* would call a real world program,
but that's just an illustration of the discrepancy between the
performance in artificial benchmarks and the performance in useful code.
No, but Hello World is a good test of application startup time. (A
null program doesn't always test everything, and doesn't look very
good either.) If Hello World takes half a second to run, you have a
language that is terrible for short scripts or command-line tools.
That doesn't mean it's a "slow language" necessarily - nor even a slow
interpreter, which is what you're actually testing - but it does mean
that, well...

***@sikorsky:~$ time pypy -c 'print("Hello, world!")'
Hello, world!

real 0m0.429s
user 0m0.036s
sys 0m0.032s
***@sikorsky:~$ time python -c 'print("Hello, world!")'
Hello, world!

real 0m0.024s
user 0m0.016s
sys 0m0.008s

... CPython does have its place still. Or maybe...

***@sikorsky:~$ time pypy -c 'print("Hello, world!")'
Hello, world!

real 0m0.050s
user 0m0.016s
sys 0m0.032s
***@sikorsky:~$ time python -c 'print("Hello, world!")'
Hello, world!

real 0m0.022s
user 0m0.016s
sys 0m0.004s

... that PyPy doesn't live in my disk cache the way CPython does. :)
PyPy is still slower than CPython for startup time, but not THAT much
slower. However, if you DO have something that has a ton of startup
overhead, you need to know about it. (Interestingly, I was actually
able to compile and run a C hello-world in comparable time to PyPy,
once the cache was warmed up. Curious. I expected that to be slower.
Once again, it proves that intuition can be extremely wrong.)

In terms of "which language is faster?", real-world code is both the
ONLY way to measure, and a completely unfair comparison. Life is
tough.

ChrisA
bartc
2018-02-22 16:00:35 UTC
BTW while doing my tests, I found you could redefine the same function
with no error:

def fred():
    pass

def fred():
    pass

def fred():
    pass

For classes too. I was aware you could reassign a different value to
such names, but didn't know this applied to def.

I guess the reason is that 'def' is just another statement, and
statements can be conditional, so that you can have several versions.

It was still a surprise. What an easy way for things to go wrong by
writing a new version of a function but forgetting to delete the
original. And if they happen to be in the right order so the program
still works, it must be confusing to read.
--
bartc
Ned Batchelder
2018-02-22 16:55:43 UTC
Permalink
Post by bartc
BTW while doing my tests, I found you could redefine the same function
    pass
    pass
    pass
For classes too. I was aware you could reassign a different value to
such names, but didn't know this applied to def.
I guess the reason is that 'def' is just another statement, and
statements can be conditional, so that you can have several versions.
More importantly, "def x" is just a convenient way to express an
assignment to x.  It behaves exactly like "x = make_a_function(name='x',
...)" (where I'm waving my hands about what happens in the ellipsis...
:)   Just as there is no prohibition on "x = 1; x = 2", there's no
prohibition on "def x(); def x()"

Lots of Python statements are assignments to names, with precisely the
same referencing and scoping behaviors as conventional assignment. All
of these assign to x:

    x = ...
    def x(...)
    def fn(x)
    class x
    import x
    import blah as x
    from blah import x
    for x in ...
    with blah as x
    except Something as x

--Ned.
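Ned's point is easy to demonstrate: 'def' rebinds a name, and any other name still bound to the old function object keeps working. A minimal sketch:

```python
def f():
    return 1

g = f          # keep a second reference to the first function object

def f():       # rebinds the name f, exactly like f = <new function>
    return 2

print(f(), g())  # 2 1 -- the old function object survives through g
```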
Lawrence D’Oliveiro
2018-02-22 23:26:02 UTC
Permalink
Post by bartc
For classes too. I was aware you could reassign a different value to
such names, but didn't know this applied to def.
Every name in Python is a variable.
Post by bartc
I guess the reason is that 'def' is just another statement, and
statements can be conditional, so that you can have several versions.
There are no declarations as such in Python: every statement is executable.
Steven D'Aprano
2018-02-22 16:00:48 UTC
Permalink
Post by bartc
The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls. Then,
fib(36) would normally involve 48,315,633 calls.
This version does only 37, giving a misleading impression.
Who cares if it only does 37? There is *absolutely no benefit* to those
additional 48,315,596 calls, and thanks to the benchmark I discovered
that.[1]

I want to know what is the fastest implementation of Fibonacci I can
write. I'm not married to the idea that I have to make a ludicrous 48
million function calls just to get the 36th Fibonacci number.
Post by bartc
(It then goes on to suggest using 'numba', and using its JIT compiler,
and using that on an /iterative/ version of fib(). Way to miss the point.
Indeed, you certainly have.
Post by bartc
It might be a technique to bear in mind, but it is nonsensical to say
this gives a 17,000 times speed-up over the original code.
That's *exactly what it did*.

How many years, decades, have you been programming? Have you not realised
yet that the best way to optimize code is to pick the fastest algorithm
you can that will do the job? There's no law that says because you
started with a slow, inefficient algorithm you have to stay with it.
Post by bartc
Here's another speed-up I found myself, although it was only 50 times
faster, not 17,000: just write the code in C, and call it via
os.system("fib.exe").
Did you include the time taken to capture the text output from stdout,
parse it, and convert to an int?

Did your C code support numbers greater than 2**64? My fibonacci function
can calculate the 10,000th Fibonacci number in a blink of the eye:

py> fibi(10000)
336447648764317832666216120051075...976171121233066073310059947366875

Output truncated for brevity, it is a 2090-digit number.

Admittedly it did take about a minute and a half to generate all 208988
digits of the one millionth Fibonacci number, but the performance is fast
enough for my needs.
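The fibi function itself isn't shown in the thread; an iterative version along these lines (the name and exact indexing are my assumption) matches the output quoted above:

```python
def fibi(n):
    # iterative Fibonacci: n additions, no recursion, so huge n is no problem
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(len(str(fibi(10000))))  # 2090 -- the digit count quoted above
```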




[1] Actually I already knew it, but then I didn't perform these
benchmarks, I'm just linking to them.
--
Steve
bartc
2018-02-22 17:53:30 UTC
Permalink
Post by Steven D'Aprano
Post by bartc
The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls. Then,
fib(36) would normally involve 48,315,633 calls.
This version does only 37, giving a misleading impression.
Who cares if it only does 37? There is *absolutely no benefit* to those
additional 48,315,596 calls, and thanks to the benchmark I discovered
that.[1]
I want to know what is the fastest implementation of Fibonacci I can
write. I'm not married to the idea that I have to make a ludicrous 48
million function calls just to get the 36th Fibonacci number.
As I said, people keep missing the point. The fact this uses a grossly
inefficient way of calculating Fibonacci seems to blind them to any
other considerations.

The actual result is irrelevant, so long as it's correct. The important
thing is those 50 million calls.

If you want to know how well one language implementations deals with
function calls compared to another, you will prefer a test that is doing
48 million function calls and little else, rather than one that does
only three dozen.

And one that is harder for a compiler to optimise, as you want to know
how fast those 48 million calls can be made, not how well a compiler can
avoid doing them!
Post by Steven D'Aprano
Post by bartc
(It then goes on to suggest using 'numba', and using its JIT compiler,
and using that on an /iterative/ version of fib(). Way to miss the point.
Indeed, you certainly have.
Post by bartc
It might be a technique to bear in mind, but it is nonsensical to say
this gives a 17,000 times speed-up over the original code.
That's *exactly what it did*.
By running a different program? Fine, then here's my submission for the
N=20 case used in the tests:

def fib(n): return 6765

You have to compare like with like when testing languages, otherwise
there's no point. If you want to use an iterative algorithm, then do the
same with Julia. But then these functions are so fast, that it's not
clear exactly what you are testing.
Post by Steven D'Aprano
Did your C code support numbers greater than 2**64? My fibonacci function
py> fibi(10000)
336447648764317832666216120051075...976171121233066073310059947366875
Output truncated for brevity, it is a 2090-digit number.
Admittedly it did take about a minute and a half to generate all 208988
digits of the one millionth Fibonacci number, but the performance is fast
enough for my needs.
(OK, my interpreter took nearly 4 minutes, but I use my own big integer
algorithms. All jolly interesting, but a diversion.)

The fact is that the vast majority of integer calculations don't need to
use big integers (pretty much 100% of mine). Probably most don't even
need 64 bits, but 32 bits.

So, if there is a performance advantage in having separate 64-bit and
big integer types, then why not make use of it?

There's been a lot of talk of the advantages of static typing, and here
is one use for it.
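The trade-off bart describes can be made concrete. A small sketch in pure Python, simulating what a fixed signed 64-bit type would do with the same sum:

```python
def wrap64(n):
    # simulate signed 64-bit two's-complement wraparound
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

print(wrap64(2**62 + 2**62))  # -9223372036854775808: a fixed type wraps
print(2**62 + 2**62)          #  9223372036854775808: Python's int just grows
```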
--
Bartc
Steven D'Aprano
2018-02-23 01:27:53 UTC
Permalink
Post by bartc
Post by Steven D'Aprano
Post by bartc
The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls.
Then, fib(36) would normally involve 48,315,633 calls.
This version does only 37, giving a misleading impression.
Who cares if it only does 37? There is *absolutely no benefit* to those
additional 48,315,596 calls, and thanks to the benchmark I discovered
that.[1]
I want to know what is the fastest implementation of Fibonacci I can
write. I'm not married to the idea that I have to make a ludicrous 48
million function calls just to get the 36th Fibonacci number.
As I said, people keep missing the point. The fact this uses a grossly
inefficient way of calculating Fibonacci seems to blind them to any
other considerations.
The actual result is irrelevant, so long as it's correct. The important
thing is those 50 million calls.
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.

I don't believe for a second that you would write code like this:


# need a Fibonacci number, but the implementation we have
# is too efficient, only taking 50 calls instead of 50 million
# so call the function a million times
for i in range(1000000):
    num = fibonacci(40)
do_something_useful(num)


so why on earth do you care about the 50 million calls?

The only reason I care about them is to reduce the number as much as
possible. Why do you care about them?
Post by bartc
If you want to know how well one language implementations deals with
function calls compared to another, you will prefer a test that is doing
48 million function calls and little else, rather than one that does
only three dozen.
I don't care about "how well one language implementations deals with
function calls compared to another". I care about writing the fastest
idiomatic code I can in the language I'm using. If I were writing Julia
code, I would write the best Julia code I could. If I were writing Python
code, I would write the best Python code I can.

If I'm writing C or Pascal, that would mean writing loops like this:

arr := some array
for i := 1 to len(arr):
    value = arr[i]
    process(value)


If I'm writing Ruby or Python, that means *NOT* writing loops like that,
but writing them like this:

for value in arr:
    process(value)


If the Julia coders have decided, for whatever reason, that they want to
show off how super-efficient function recursion is in Julia by using a
doubly-recursive implementation of the Fibonacci sequence, I'm not bound
by their choice. Their motive is to highlight the advantages of Julia. My
motive is to show how Python can be just as fast.

When beginners complain that "Python is too slow", nine times out of ten
they are writing non-idiomatic, inefficient code. Python is not Java:

http://dirtsimple.org/2004/12/python-is-not-java.html

nor is it Julia.
Post by bartc
And one that is harder for a compiler to optimise, as you want to know
how fast those 48 million calls can be made, not how well a compiler can
avoid doing them!
I don't give a flying fox about how fast the compiler can do those 48
million calls, I want to know how to get Fibonacci numbers as fast as I
can, and if that means avoiding making the calls, then you better believe
that I will avoid making those calls.

Why are you benchmarking code you would never use in real code?

I would be a really shitty programmer if I insisted on writing Julia code
in Python just because the Julia developers write Julia code in Julia.
Just as I would be a shitty programmer if I insisted that Julia
programmers must write *Python* code in Julia.

Write code that is natural and efficient for the language you are using.
Otherwise you end up making really ludicrous comparisons between
languages that have *zero* relationship to real-world code that anyone in
their right mind would actually use.
Post by bartc
By running a different program? Fine, then here's my submission for the
def fib(n): return 6765
The program has to be *correct*, and your program is not.

When comparing two compilers, you are ALWAYS comparing two different
programs.

Either the compilers generate precisely the same machine/byte code, in
which case there is no point in comparing them[1], or they generate
different machine/byte code, in which case you're comparing two different
running programs.



[...]
Post by bartc
The fact is that the vast majority of integer calculations don't need to
use big integers (pretty much 100% of mine). Probably most don't even
need 64 bits, but 32 bits.
And here we have the World According To Bart again: "since *I* don't need
more than 32-bits, then obviously it is a FACT that nobody else could
need them".

Yes Bart, the reason that so many languages have support for Bignums is
that everyone except you is an idiot who loves writing slow code for
absolutely no reason at all.




[1] Except to verify that they are, in fact, the same.
--
Steve
bartc
2018-02-23 12:17:50 UTC
Permalink
Post by Steven D'Aprano
Post by bartc
The actual result is irrelevant, so long as it's correct. The important
thing is those 50 million calls.
Why do you care about the 50 million calls?
Because we are interested in comparing call-efficiency across languages?
NOT in Fibonacci numbers.

That's crazy -- the important
Post by Steven D'Aprano
thing is *calculating the Fibonacci numbers as efficiently as possible*.
# need a Fibonacci number, but the implementation we have
We don't. If Fibonacci causes problems for you because you can't see
past the inefficiency, then we might use the Ackermann function instead.

(But I've found Ackermann can cause stack overflow in some languages.)
Post by Steven D'Aprano
so why on earth do you care about the 50 million calls?
The only reason I care about them is to reduce the number as much as
possible. Why do you care about them?
You're responsible for implementing the call-mechanism of a new
language, and you want to make it efficient. What program do you test it
with, one that contains zero calls (clearly, that will be most
efficient!), or one that contains lots of calls?
Post by Steven D'Aprano
arr := some array
value = arr[i]
process(value)
If I'm writing Ruby or Python, that means *NOT* writing loops like that,
process(value)
Suppose it turned out that in Python, /this/ was actually faster:

for i in range(len(arr)):
    value = arr[i]
    process(value)
?
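bartc's question is measurable rather than hypothetical. A quick timeit sketch (absolute numbers vary by machine and Python version):

```python
import timeit

setup = "arr = list(range(10000))"
direct = timeit.timeit("for value in arr:\n    pass", setup=setup, number=500)
indexed = timeit.timeit(
    "for i in range(len(arr)):\n    value = arr[i]", setup=setup, number=500)
print(f"direct: {direct:.3f}s  indexed: {indexed:.3f}s")
# In CPython the direct form is normally faster: the indexed loop pays for
# a range object, an index increment and a subscript on every iteration.
```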
Post by Steven D'Aprano
I don't give a flying fox about how fast the compiler can do those 48
million calls, I want to know how to get Fibonacci numbers as fast as I
can, and if that means avoiding making the calls, then you better believe
that I will avoid making those calls.
OK, then, if you are at all interested in Julia vs. Python (and it was
you who posted that article!), then compare a version of a benchmark
that uses as few calls as possible, if you like. But what exactly is
that benchmark then highlighting? Loops?
Post by Steven D'Aprano
Post by bartc
By running a different program? Fine, then here's my submission for the
def fib(n): return 6765
The program has to be *correct*, and your program is not.
OK, then do as I did, and keep a global tally of how many calls it
makes, and print that as the result. Then the correct output for
'fibcalls(36)' should be:

48315633

If yours shows only 37, then /your/ program is wrong. (BTW, this value
will always fit in 64 bits, for the foreseeable future.)
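bart's proposal, spelled out: tally the calls in a global, so a memoised version visibly fails the benchmark. For the naive recursion the call count is 2*F(n+1) - 1, which is where the 48315633 for n=36 comes from:

```python
CALLS = 0

def fibcalls(n):
    global CALLS
    CALLS += 1                # count every invocation
    if n < 2:
        return n
    return fibcalls(n - 1) + fibcalls(n - 2)

fibcalls(20)
print(CALLS)  # 21891, i.e. 2*F(21) - 1; for n=36 this reaches 48315633
```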
Post by Steven D'Aprano
When comparing two compilers, you are ALWAYS comparing two different
programs.
That's part of the art of doing it properly.
Post by Steven D'Aprano
Either the compilers generate precisely the same machine/byte code, in
which case there is no point in comparing them[1], or they generate
different machine/byte code, in which case you're comparing two different
running programs.
So the people who worked for years on PyPy never once bothered to
compare results with running the same source program with CPython,
because there would be no point? OK, I never knew that...
Post by Steven D'Aprano
Post by bartc
The fact is that the vast majority of integer calculations don't need to
use big integers (pretty much 100% of mine). Probably most don't even
need 64 bits, but 32 bits.
And here we have the World According To Bart again: "since *I* don't need
more than 32-bits, then obviously it is a FACT that nobody else could
need them".
Do you have better statistics than me? If so please share. I guess the
number of calculations (that /can/ be done in 64 bits) is less than
100%, but not 0%, so somewhere in between.

I'm saying it's nearer to 100% than to 0%; you're saying, what ..... ?

As examples of quantities that can be represented in 64 bits:

Character codes or code points
Integer pixel values
Most enumerations
For-loop counting (if it needs more than 64 bits, then we might be
waiting a while for it to finish)
Array indexing
Numbers of people, cars, guns, countries, houses, books, iphones etc
on Earth
... endless examples of numbers less than 9000000000000000000 used
in programming

Most calculations on these can also be done in 64 bits (except multiply
and power where the ceiling might be 32 bits or lower).
Post by Steven D'Aprano
Yes Bart, the reason that so many languages have support for Bignums is
that everyone except you is an idiot who loves writing slow code for
absolutely no reason at all.
If bignum support is a handicap, even when they are not needed, then it
is something to consider.
--
bartc
Chris Angelico
2018-02-23 12:41:44 UTC
Permalink
Post by bartc
Post by Steven D'Aprano
Post by bartc
The fact is that the vast majority of integer calculations don't need to
use big integers (pretty much 100% of mine). Probably most don't even
need 64 bits, but 32 bits.
And here we have the World According To Bart again: "since *I* don't need
more than 32-bits, then obviously it is a FACT than nobody else could
need them".
Do you have better statistics than me? If so please share. I guess the
number of calculations (that /can/ be done in 64 bits) is less than 100%,
but not 0%, so somewhere in between.
I'm saying it's nearer to 100% than to 0%; you're saying, what ..... ?
I don't know what he's saying, but I'm saying that unless it is 100%,
your language should just natively support bigints. How do you know
when you'll need that extra room? Are you going to pick bigints for
everything that's affected by user input? If so, you may as well use
them for everything and not bother with the distinction; if not, some
day, your program will need larger numbers than you guessed for it,
and it'll break.
Post by bartc
Character codes or code points
Sure. That's why we have a special optimization for sequences of
21-bit numbers. We call it "str".
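Chris's aside refers to CPython's flexible string representation (PEP 393): a str is stored in 1, 2 or 4 bytes per code point, depending on the widest character it contains. A rough illustration (exact byte counts vary across CPython versions):

```python
import sys

# same length, increasingly wide code points -> increasingly wide storage
for s in ("abcd", "abc\u20ac", "abc\U0001F40D"):
    print(len(s), sys.getsizeof(s))
```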
Post by bartc
Integer pixel values
Maybe in 64 bits for the time being, but 32 certainly won't be enough.
As soon as you do any sort of high DPI image manipulation, you will
exceed 2**32 total pixels in an image (that's just 65536x65536, or
46341x46341 if you're using signed integers); and it wouldn't surprise
me if some image manipulation needs that many on a single side - if
not today, then tomorrow. So 64 bits might not be enough once you
start counting total pixels.
Post by bartc
Most enumerations
For-loop counting (if it needs more than 64 bits, then we might be
waiting a while for it to finish)
Most loops don't need any counting. Should we implement a 0-bit
integer for those?
Post by bartc
Array indexing
Numbers of people, cars, guns, countries, houses, books, iphones etc
on Earth
... endless examples of numbers less than 9000000000000000000 used
in programming
As soon as you say "the number of X is always going to be less than
Y", you open yourself up to risk.

https://xkcd.com/865/
Post by bartc
Most calculations on these can also be done in 64 bits (except multiply and
power where the ceiling might be 32 bits or lower).
Exactly. Most. And then you'll run into some limitation somewhere.

Let's say you want to know how many IP addresses, on average, are
allocated to one person. You'll probably get a single digit number as
your answer. Can you calculate that using 32-bit integers? What about
64-bit? Should you use bignums just in case? Take your pick, let me
know, and then I'll tell you my answer - and why.
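A back-of-envelope version of Chris's puzzle (the population figure is my rough assumption, not from the thread):

```python
ipv4_total = 2 ** 32            # every possible IPv4 address
population = 7_600_000_000      # rough world population, c. 2018 (assumed)

print(ipv4_total / population)  # about 0.57 addresses per person
# The answer is a small float, but the inputs are already awkward for
# fixed types: population does not fit in a signed 32-bit int, and a
# product like ipv4_total * population (~3.3e19) does not fit in 64 bits.
```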
Post by bartc
Post by Steven D'Aprano
Yes Bart, the reason that so many languages have support for Bignums is
that everyone except you is an idiot who loves writing slow code for
absolutely no reason at all.
If bignum support is a handicap, even when they are not needed, then it is
something to consider.
What sort of handicap is it? Performance? That's what optimization is
for. Some languages (Pike, for instance) have a single data type that
is sometimes represented internally by a native integer and sometimes
by a bignum; it's completely transparent apart from with timings.
Others (eg Python 2) have two separate data types, but will smoothly
transition to bignum any time it's needed. IMO CPython 3.x could
probably benefit from the transparent optimization... but Python (the
language) benefits hugely from the core int type simply being a
bignum. There are no semantically significant effects at the boundary
between "small numbers" and "large numbers".

Auto-growing lists are a cost, too. It's expensive to have to allocate
more memory to allow another element to be added. Should we lock in
the sizes of all lists/arrays at creation? Why should we pay the price
for that flexibility? Or maybe, in Bartville, this flexibility is
worth paying for but large integers are useless?

ChrisA
bartc
2018-02-23 13:28:16 UTC
Permalink
Post by Chris Angelico
Post by bartc
Integer pixel values
Maybe in 64 bits for the time being, but 32 certainly won't be enough.
Why, people's eyes will evolve to see quintillions of colours?
Post by Chris Angelico
As soon as you do any sort of high DPI image manipulation, you will
exceed 2**32 total pixels in an image (that's just 65536x65536, or
46341x46341 if you're using signed integers)
(Huh?)

; and it wouldn't surprise
Post by Chris Angelico
me if some image manipulation needs that many on a single side - if
not today, then tomorrow. So 64 bits might not be enough once you
start counting total pixels.
A four-billion x four-billion image would be quite something. I wonder
how long it would take Cpython to loop through all 18e18 pixels? (BTW
this still fits into unsigned 64 bits.)

These speculations are pointless. Graphics is very hardware-oriented
now, and this is mainly the concern of graphics processors which will
already have wide datapaths, much wider than 64 bits. Because you can of
course use more than one 64-bit value.
Post by Chris Angelico
Post by bartc
Most calculations on these can also be done in 64 bits (except multiply and
power where the ceiling might 32 bits or lower).
Exactly. Most.
OK, good, we've agreed on something: Most. Actually most values in my
programs fit into 32 bits. Most of /those/ are likely to fit into 16 bits.

In a program, most integers are going to represent small values.

And then you'll run into some limitation somewhere.
Post by Chris Angelico
Let's say you want to know how many IP addresses, on average, are
allocated to one person. You'll probably get a single digit number as
your answer. Can you calculate that using 32-bit integers? What about
64-bit? Should you use bignums just in case? Take your pick, let me
know, and then I'll tell you my answer - and why.
I'm not saying bignums are needed 0% of the time. But they don't need to
be used for 100% of all integer values, just in case.
Post by Chris Angelico
What sort of handicap is it? Performance?
I don't know. Wasn't that brought up in the discussion? That Python's
fib() routine could seamlessly accommodate bigger values as needed, and
Julia's couldn't, and maybe that was one reason that Julia had the edge.

Or maybe, in Bartville, this flexibility is
Post by Chris Angelico
worth paying for but large integers are useless?
In 40 years of coding I can say that I have very rarely needed arbitrary
precision integer values. (They come up in some recreational programs,
or for implementing arbitrary precision integers in the languages used
to run those programs!)

But I have needed flexible strings and lists all the time.

Is my programming experience really that different from anybody else's?
--
bartc
Steven D'Aprano
2018-02-23 16:39:02 UTC
Permalink
On Fri, 23 Feb 2018 23:41:44 +1100, Chris Angelico wrote:

[...]
Post by Chris Angelico
Post by bartc
Integer pixel values
Maybe in 64 bits for the time being, but 32 certainly won't be enough.
As soon as you do any sort of high DPI image manipulation, you will
exceed 2**32 total pixels in an image (that's just 65536x65536, or
46341x46341 if you're using signed integers);
I don't think there's any sensible reason to use signed ints for pixel
offsets.
Post by Chris Angelico
and it wouldn't surprise
me if some image manipulation needs that many on a single side - if not
today, then tomorrow. So 64 bits might not be enough once you start
counting total pixels.
64-bit int will limit your image size to a maximum of 4294967296 x
4294967296.

If you stitched the images from, let's say, the Hubble Space Telescope
together into one panoramic image of the entire sky, you'd surely exceed
that by a couple of orders of magnitude. There's a lot of sky and the
resolution of the Hubble is very fine.

Or made a collage of all the duck-face photos on Facebook *wink*

But you wouldn't be able to open such a file on your computer. Not until
we get a 128-bit OS :-)

While all of this is very interesting, let's remember that listing all
the many things which can be counted in 64 bits (or 128 bits, or 1024
bits...) doesn't disprove the need for bignums. The argument Bart makes
is to list all the things that don't need refrigerating:

- tinned beans
- nuts and bolts
- DVDs
- books
- spoons

as evidence that we don't need refrigerators. Yeah, I get it, there are
approximately a hundred thousand million billion trillion things that
don't need bignums. There are so many things that don't need bignums, you
need a bignum to count them all!

I'm truly happy for Bart that he never, or almost never, uses numbers
larger than 2**64. I just wish he would be less of a prig about this and
stop trying to deny me my bignums.

One of the things I like about Python is that I stopped needing to worry
about integer overflow back around Python 2.2 when the default integer
type became a bignum. I used to write code like:

try:
    calculation(n)
except OverflowError:
    calculation(long(n))

all the time back then. Now, if I have an integer calculation, I just
calculate it without having to care whether it fits in 64 bits or not.

And I don't give a rat's arse that this means that adding 1 to 10 to get
11 in Python takes thirty nanoseconds instead of ten nanoseconds as a
consequence. If I cared about that, I wouldn't be using Python.

Just the other day I needed to calculate 23! (factorial) for a probability
calculation, and Python made it trivially easy. I didn't need to import a
special library, or use special syntax to say "use a bignum". I just
multiplied 1 through 23.

Why, just now, on a whim, simply because I can, I calculated 100!!
(double-factorial):

py> reduce(operator.mul, range(100, 0, -2), 1)
34243224702511976248246432895208185975118675053719198827915654463488000000000000

(Yes, I'm a maths geek, and I love this sort of thing. So sue me.)
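Both calculations as runnable Python 3, with the imports the reduce version needs (reduce lives in functools in Python 3):

```python
import math
import operator
from functools import reduce

print(math.factorial(23))  # 25852016738884976640000

# 100!! (double factorial): the product of the even numbers 2..100,
# which equals 2**50 * 50!
print(reduce(operator.mul, range(100, 0, -2), 1))
```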

There are dozens of languages that have made the design choice to limit
their default integers to 16- 32- or 64-bit fixed size, and let the user
worry about overflow. Bart, why does it upset you so that Python made a
different choice?
--
Steve
Rick Johnson
2018-02-26 02:33:47 UTC
Permalink
On Friday, February 23, 2018 at 10:41:45 AM UTC-6, Steven D'Aprano wrote:
[...]
Post by Steven D'Aprano
There are dozens of languages that have made the design
choice to limit their default integers to 16- 32- or 64-bit
fixed size, and let the user worry about overflow. Bart,
why does it upset you so that Python made a different
choice?
A default "integer-diversity-and-inclusivity-doctrine" is
all fine and dandy by me, (Hey, even integers need safe spaces),
but i do wish we pythonistas had a method to turn off this
(and other) cycle burning "features" -- you know -- in the
99.99999 percent of time that we don't need them.

And BTW... Am i the *ONLY* person here who feels that Python
optimizations should be more than merely the tossing of dead
weight overboard?
Chris Angelico
2018-02-26 02:45:22 UTC
Permalink
On Mon, Feb 26, 2018 at 1:33 PM, Rick Johnson
Post by Rick Johnson
[...]
Post by Steven D'Aprano
There are dozens of languages that have made the design
choice to limit their default integers to 16- 32- or 64-bit
fixed size, and let the user worry about overflow. Bart,
why does it upset you so that Python made a different
choice?
A default "integer-diversity-and-inclusivity-doctrine" is
all fine and dandy by me, (Hey, even integers need safe spaces),
In Python 3.6+, integers have safe underscores instead.
Post by Rick Johnson
but i do wish we pythonistas had a method to turn off this
(and other) cycle burning "features" -- you know -- in the
99.99999 percent of time that we don't need them.
Definitely. We should also provide a way for people to manually
allocate memory, for the times when that would be more efficient. And
to switch out the syntax to include braces. And all those other things
people like from other languages. Hey, here's an idea: maybe Python
shouldn't stop people from using other languages, if those other
languages are better suited to the job?
Post by Rick Johnson
And BTW... Am i the *ONLY* person here who feels that Python
optimizations should be more than merely the tossing of dead
weight overboard?
I dunno. We optimize this mailing list that way......

*ducking for cover*

ChrisA
Rick Johnson
2018-02-26 04:22:17 UTC
Permalink
Post by Chris Angelico
On Mon, Feb 26, 2018 at 1:33 PM, Rick Johnson
[...]
Post by Chris Angelico
Post by Rick Johnson
but i do wish we pythonistas had a method to turn off this
(and other) cycle burning "features" -- you know -- in the
99.99999 percent of time that we don't need them.
Definitely. We should also provide a way for people to
manually allocate memory, for the times when that would be
more efficient. And to switch out the syntax to include
braces.
Now you're throwing the baby out with the bath water.

We needn't re-implement the C language simply to achieve
reasonable and practical code-execution efficiency. Python
was never intended to be the fastest language. Heck! Python
was never intended to be the fastest _scripting_ language!
So of course, speed is not and should not be the primary
concern, but to say that execution speed is of _no_ concern
is quite absurd indeed.

As Steven mentioned, few people, even if they _could_ afford
the absurd price, will ever purchase that high-end sports
car. However, by using this example Steve is committing the
argumentation sin of comparing apples to oranges because
transportation is more a matter of practicality. Sure, you
may arrive at your destination more quickly in the high-end
sports car, however, the trade-off will be that you could be
thrown in jail for speeding or be injured or killed in an
accident.

But such existential risks are not the case for software...

Code that executes more quickly simply (wait for it...) yes,
executes more quickly! And I can assure you: no one will die,
be injured, or even become "Bubba's" cell mate should their
code commit the unforgivable "sin" of executing more
quickly[1].


--
[1] Quick! Someone please call a priest! We need to confess!!!
Steven D'Aprano
2018-02-26 09:57:06 UTC
Permalink
Post by Rick Johnson
So of course, speed is not and should not be the
primary concern, but to say that execution speed is of _no_ concern is
quite absurd indeed.
I'm pretty sure that nobody here has said that speed is of no concern.

Rather, I would argue that the position we're taking is that *in general*
Python is fast enough for the sorts of tasks we use it for (except, of
course, when it isn't, in which case you have our sympathy, and if we can
suggest some solutions we will).

Of course we'd take speed optimizations if they were free, but they're
never free:

- they often require more memory to run;

- they usually require more code, which adds to the maintenance burden
and increases the interpreter bloat;

- and increases the risk of bugs;

- somebody has to write, debug, document and maintain the code,
and developer time and effort is in short supply;

- or the optimization requires changes to the way Python operates;

- even if we wanted to make that change, it will break backwards
compatibility;

- and often what people imagine is a simple optimization (because
they haven't tried it) isn't simple at all;

- or simply doesn't work;

- and most importantly, just saying "Make it go faster!" doesn't work,
we actually need to care about the details of *how* to make it faster.

(We tried painting Go Faster stripes on the server, and it didn't work.)

There's no point suggesting such major changes to Python that requires
going back to the drawing board, to Python 0.1 or earlier, and changing
the entire execution and memory model of the language.

That would just mean we've swapped from a successful, solid, reliable
version 3.6 of the language to an untried, unstable, unproven, bug-ridden
version 0.1 of New Python.

And like New Coke, it won't attract new users (they're already using
Javascript or Go or whatever...) and it will alienate existing users (if
they wanted Javascript or Go they'd already be using it).

There have been at least two (maybe three) attempts to remove the GIL
from CPython. They've all turned out to increase complexity by a huge
amount, and not actually provide the hoped-for speed increase. Under many
common scenarios, the GIL-less CPython actually ran *slower*.

(I say "hoped for", but it was more wishful thinking than a rational
expectation that removing the GIL would speed things up. I don't believe
any of the core developers were surprised that removing the GIL didn't
increase speeds much, if at all, and sometimes slowed things down. The
belief that the GIL slows Python down is mostly due to a really
simplistic understanding of how threading works.)

Besides, if you want Python with no GIL so you can run threaded code, why
aren't you using IronPython or Jython?

Another example: UnladenSwallow. That (mostly) failed to meet
expectations because the compiler tool chain being used wasn't mature
enough. If somebody were to re-do the project again now, the results
might be different.

But it turns out that not that many people care enough to make Python
faster to actually invest money in it. Everyone wants to just stamp their
foot and get it for free.

(If it weren't for the EU government funding it, PyPy would probably be
languishing in oblivion. Everyone wants a faster Python so long as they
don't have to pay for it, or do the work.)

A third example: Victor Stinner's FatPython, which seemed really
promising in theory, but turned out to not make enough progress fast
enough and he lost interest.

I have mentioned FatPython here a number of times. All you people who
complain that Python is "too slow" and that the core devs ought to do
something about it... did any of you volunteer any time to the FatPython
project?
--
Steve
Chris Angelico
2018-02-26 11:34:00 UTC
Permalink
On Mon, Feb 26, 2018 at 8:57 PM, Steven D'Aprano
Post by Steven D'Aprano
Post by Rick Johnson
So of course, speed is not and should not be the
primary concern, but to say that execution speed is of _no_ concern is
quite absurd indeed.
I'm pretty sure that nobody here has said that speed is of no concern.
Rather, I would argue that the position we're taking is that *in general*
Python is fast enough for the sorts of tasks we use it for (except, of
course, when it isn't, in which case you have our sympathy, and if we can
suggest some solutions we will).
Of course we'd take speed optimizations if they were free, but they're
- they often require more memory to run;
- they usually require more code, which adds to the maintenance burden
and increases the interpreter bloat;
- and increases the risk of bugs;
- somebody has to write, debug, document and maintain the code,
and developer time and effort is in short supply;
- or the optimization requires changes to the way Python operates;
- even if we wanted to make that change, it will break backwards
compatibility;
- and often what people imagine is a simple optimization (because
they haven't tried it) isn't simple at all;
- or simply doesn't work;
- and most importantly, just saying "Make it go faster!" doesn't work,
we actually need to care about the details of *how* to make it faster.
Or it reduces memory usage, improves performance, and makes things
easier on the programmer... but might place unexpected (or unwanted)
demands on other implementations of Python. CPython, by virtue of
being the "default" Python interpreter, has to be careful of appearing
to mandate something. That's why buy-in from other Python
implementation teams (I believe Jython was the most notable here) was
important in the discussion about the recent compact dict update. Even
what looks like a no-brainer still can have unwanted consequences.
Post by Steven D'Aprano
(We tried painting Go Faster stripes on the server, and it didn't work.)
Steven Middlename D'Aprano! You should know better than that. "It
didn't work" is not good enough. What actually happened? Did the
stripes smoulder and smoke until you powered down the server? Did the
server raise NotImplementedError when you touched it with the
paintbrush? Did your boss walk up behind you and hand you a pink slip?
Be specific!
Post by Steven D'Aprano
There have been at least two (maybe three) attempts to remove the GIL
from CPython. They've all turned out to increase complexity by a huge
amount, and not actually provide the hoped-for speed increase. Under many
common scenarios, the GIL-less CPython actually ran *slower*.
(I say "hoped for", but it was more wishful thinking than a rational
expectation that removing the GIL would speed things up. I don't believe
any of the core developers were surprised that removing the GIL didn't
increase speeds much, if at all, and sometimes slowed things down. The
belief that the GIL slows Python down is mostly due to a really
simplistic understanding of how threading works.)
Removing the GIL from CPython is not about "speeding up" the language
or the interpreter, but about improving parallelism. And I think all
the core devs understand this (hence the lack of surprise there), but
people who call for it often don't.
Post by Steven D'Aprano
(If it weren't for the EU government funding it, PyPy would probably be
languishing in oblivion. Everyone wants a faster Python so long as they
don't have to pay for it, or do the work.)
Hm, I didn't know the EU was funding PyPy. Okay. Cool.

ChrisA
Steven D'Aprano
2018-02-26 13:03:34 UTC
Permalink
On Mon, 26 Feb 2018 22:34:00 +1100, Chris Angelico wrote:

[...]
Post by Steven D'Aprano
(We tried painting Go Faster stripes on the server, and it didn't
work.)
Steven Middlename D'Aprano! You should know better than that. "It didn't
work" is not good enough. What actually happened?
A truck crashed into the power pole outside our building and the power
went off. Then the UPS ran down and the server crashed.

Clearly this is a bug in the go-faster stripes. We tried reporting it:

Expected behaviour: server goes faster.

Actual behaviour: a truck crashes into the power pole
across the street and the lights go out.

but the maintainers closed the issue "Works for me". Very disappointed!


[...]
Removing the GIL from CPython is not about "speeding up" the language or
the interpreter, but about improving parallelism.
It is about speeding up threaded code which is CPU-bound. I call that
speeding up the language :-)

Hypothetically, it could also speed up even unthreaded code.

Some language features could internally launch threads and operate using
implicit parallelism. For instance, map(func, seq) could run in parallel
in separate threads whenever Python knew that func had no side-effects,
say for built-ins. There are also parallel algorithms for bignum
arithmetic. If threads were quick enough, I'm sure we could come up with
many more examples. Hypothetically speaking.

(In practice, the old hope that parallel computing would speed everything
up turned out to be flawed. Relatively few tasks are embarrassingly
parallel, most have sequential bottlenecks, and many are inherently
sequential.)
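
For contrast with the hypothetical, the map(func, seq) idea can already be
spelled explicitly with the stdlib's concurrent.futures. A sketch only:
with the GIL in place, threads give no speedup for CPU-bound built-ins,
and the parallel_map name is made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(func, seq, workers=4):
    # Executor.map preserves input order, so the result matches the
    # built-in map() even though calls run on worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, seq))

print(parallel_map(abs, [-3, -2, 1]))  # [3, 2, 1]
```

The hypothetical in the post is the interpreter doing this implicitly,
and only when it can prove func has no side-effects.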
Post by Steven D'Aprano
(If it weren't for the EU government funding it, PyPy would probably be
languishing in oblivion. Everyone wants a faster Python so long as they
don't have to pay for it, or do the work.)
Hm, I didn't know the EU was funding PyPy. Okay. Cool.
I don't know if they still receive funding, but I know that PyPy really
only got going in a big way when they got a grant from the EU. I think it
paid for at least one developer to work on it full time for a year.
DuckDuckGo is probably your friend if you care about the details.
--
Steve
Chris Angelico
2018-02-26 14:34:51 UTC
Permalink
On Tue, Feb 27, 2018 at 12:03 AM, Steven D'Aprano
Post by Steven D'Aprano
Removing the GIL from CPython is not about "speeding up" the language or
the interpreter, but about improving parallelism.
It is about speeding up threaded code which is CPU-bound. I call that
speeding up the language :-)
Hypothetically, it could also speed up even unthreaded code.
In its purest sense, no, it cannot speed up unthreaded code. Since
many Python programs are single-threaded, this restricts the benefits
to those programs which can actually parallelize, but the penalties
(mainly overhead) usually apply to all programs.
Post by Steven D'Aprano
Some language features could internally launch threads and operate using
implicit parallelism. For instance, map(func, seq) could run in parallel
in separate threads whenever Python knew that func had no side-effects,
say for built-ins. There are also parallel algorithms for bignum
arithmetic. If threads were quick enough, I'm sure we could come up with
many more examples. Hypothetically speaking.
These are ways of converting single-threaded code into multi-threaded
code, which could then benefit from parallelization, which can reduce
wall-time for the process as a whole.
Post by Steven D'Aprano
(In practice, the old hope that parallel computing would speed everything
up turned out to be flawed. Relatively few tasks are embarrassingly
parallel, most have sequential bottlenecks, and many are inherently
sequential.)
Exactly; and it goes further: many modern programmers do not think in
terms of parallelism. Even event-driven code takes a bit of getting
your head around. Why are async functions preferred over threads? It's
not JUST because today's OSes have thread count limits far below
socket limits - if that were the sole justification, we wouldn't have
language-level support for async/await - it'd be a niche technique
used only in the highest-throughput applications. No, async functions
are preferred because they *reduce the points where context switching
can occur*, thus making it easier *for the programmer*. If there were
a way for the language to automatically run things in parallel, the
biggest risk is that the programmer who wrote it can no longer
understand it.
Post by Steven D'Aprano
Post by Steven D'Aprano
(If it weren't for the EU government funding it, PyPy would probably be
languishing in oblivion. Everyone wants a faster Python so long as they
don't have to pay for it, or do the work.)
Hm, I didn't know the EU was funding PyPy. Okay. Cool.
I don't know if they still receive funding, but I know that PyPy really
only got going in a big way when they got a grant from the EU. I think it
paid for at least one developer to work on it full time for a year.
DuckDuckGo is probably your friend if you care about the details.
Meh, I don't care that much about what the EU does or doesn't fund.
Was just a mild case of curiosity. I had about as much interest as my
savings account accrued last year.

I'm glad _someone_ funded PyPy, anyhow. It's a great demonstration of
what can be done.

ChrisA
bartc
2018-02-26 14:45:33 UTC
Permalink
Post by Chris Angelico
I'm glad _someone_ funded PyPy, anyhow. It's a great demonstration of
what can be done.
And it's good that /someone/ at least understands how PyPy works, as I
don't think many people do.

Apparently it doesn't optimise 'hot paths' within a Python program, but
optimises hot paths in the special Python interpreter. One written in
[R]Python. Or something...
--
Bartc
Rick Johnson
2018-02-26 04:25:42 UTC
Permalink
Post by Chris Angelico
On Mon, Feb 26, 2018 at 1:33 PM, Rick Johnson
[...]
Post by Chris Angelico
Post by Rick Johnson
A default "integer-diversity-and-inclusivity-doctrine" is
all fine and dandy by me, (Hey, even integers need safe spaces),
In Python 3.6+, integers have safe underscores instead.
You spacist!!!
Steven D'Aprano
2018-02-26 04:32:34 UTC
Permalink
Post by Rick Johnson
On Friday, February 23, 2018 at 10:41:45 AM UTC-6, Steven D'Aprano
wrote: [...]
Post by Steven D'Aprano
There are dozens of languages that have made the design choice to limit
their default integers to 16- 32- or 64-bit fixed size, and let the
user worry about overflow. Bart, why does it upset you so that Python
made a different choice?
A default "integer-diversity-and-inclusivity-doctrine" is all fine and
dandy by me, (Hey, even integers need safe spaces), but i do wish we
pythonistas had a method to turn off this (and other) cycle burning
"features" -- you know -- in the 99.99999 percent of time that we don't
need them.
Ah, you mean just like the way things were in Python 1.0 through 2.1?
Hands up anyone who has seen an integer OverflowError in the last 10
years? Anyone?

[***@ando ~]$ python1.5 -c "print 2**64"
Traceback (innermost last):
File "<string>", line 1, in ?
OverflowError: integer pow()


I really miss having to either add, or delete, an "L" suffix from my long
ints, and having to catch OverflowError to get any work done, and
generally spending half my time worrying how my algorithms will behave
when integer operations overflow.

Good times. I miss those days. I also miss the days when everyone had
scabies.
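
(By contrast, any modern Python 3 promotes the same expression to an
arbitrary-precision int without ceremony:)

```python
# The expression that raised OverflowError under Python 1.5 just works:
x = 2 ** 64
print(x)                # 18446744073709551616
print(x.bit_length())   # 65 -- one bit past the 64-bit machine word
```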


As someone who wrote Python code when bignums were *not* the default, I
can tell you that:

- it was a real PITA for those who cared about their code working
correctly and being bug-free;

- and the speed up actually wasn't that meaningful.


As is so often the case with these things, using fixed size ints looks
good in benchmark games, but what's fast in a toy benchmark and what's
fast in real code are not always the same.
--
Steve
Rick Johnson
2018-02-26 05:19:19 UTC
Permalink
On Sunday, February 25, 2018 at 10:35:29 PM UTC-6, Steven D'Aprano wrote:
[...]
Post by Steven D'Aprano
Ah, you mean just like the way things were in Python 1.0
through 2.1? Hands up anyone who has seen an integer
OverflowError in the last 10 years? Anyone?
I think Python 2.1 is much older than 10 years, so yeah, of
course almost no one (save a few old timers) has seen that
error. :-)
Post by Steven D'Aprano
[...]
I really miss having to either add, or delete, an "L"
suffix from my long ints, and having to catch OverflowError
to get any work done, and generally spending half my time
worrying how my algorithms will behave when integer
operations overflow.
I agree with your sarcasm. And that's why these types of
auto conversions should be optional. I agree that most times
it's more practical to let python handle the dirty details.
But in some cases, where you need to squeeze out a few more
drops of speed juice, you won't mind the extra trouble. And
perhaps you (meaning specifically you) will never need such
a flexible feature. But hey. The Python community is
diverse. So please do keep that in mind.
Steven D'Aprano
2018-02-26 09:22:57 UTC
Permalink
I agree with your sarcasm. And that's why these types of auto
conversions should be optional. I agree that most times it's more
practical to let python handle the dirty details. But in some cases,
where you need to squeeze out a few more drops of speed juice, you won't
mind the extra trouble.
And that's where you call out to a library like numpy, or write a C
extension, or use a tool like Numba or Cython to optimise your Python
code to use native ints. (Numba is still a work in progress, but Cython
is a mature working product.)

Or to put it another way... if you want machine ints in Python, the way
you get them is something like:

from numba import jit

@jit
def myfunction(): ...



The core language doesn't have to support these things when there is a
healthy language ecosystem that can do it.
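
A slightly fuller sketch of the numba route, hedged so it still runs
where numba isn't installed (the no-op jit stand-in is an assumption for
illustration; numba's real @jit compiles the loop to native code using
machine ints):

```python
try:
    from numba import jit           # the real JIT, if available
except ImportError:
    def jit(*args, **kwargs):       # stand-in: decorate and do nothing
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@jit(nopython=True)
def fib_iter(n):
    # Iterative Fibonacci: with numba this loop runs on native ints.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iter(20))  # 6765
```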
--
Steve
bartc
2018-02-26 11:13:51 UTC
Permalink
Post by Steven D'Aprano
I agree with your sarcasm. And that's why these types of auto
conversions should be optional. I agree that most times it's more
practical to let python handle the dirty details. But in some cases,
where you need to squeeze out a few more drops of speed juice, you won't
mind the extra trouble.
And that's where you call out to a library like numpy, or write a C
extension, or use a tool like Numba or Cython to optimise your Python
code to use native ints. (Numba is still a work in progress, but Cython
is a mature working product.)
Or to put it another way... if you want machine ints in Python, the way
from numba import jit
@jit
def myfunction(): ...
The core language doesn't have to support these things when there is a
healthy language ecosystem that can do it.
Below is the first draft of a Python port of a program to do with random
numbers. (Ported from my language, which in turned ported it from a C
program by George Marsaglia, the random number guy.)

However, running it quickly exhausts the memory in my machine. The
reason is that Python unhelpfully widens all results to bignums as
needed. The code relies on calculations being modulo 2^64.

Note that restricting integer ops to 64 bits probably still won't work,
as I believe the numbers need to be unsigned.

------------------------------------------

Q=0
carry=36243678541
xcng=12367890123456
xs=521288629546311
indx=20632

def refill():
    global Q, carry, indx
    for i in range(20632):
        h = carry & 1
        z = ((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1)
        carry = (Q[i]>>23)+(Q[i]>>25)+(z>>63)
        Q[i] = ~((z<<1)+h)
    indx=1
    return Q[0]

def start():
    global Q, carry, xcng, xs, indx
    Q=[0,]*20632

    for i in range(20632):

        xcng=6906969069 * xcng + 123

        xs ^= (xs<<13)
        xs ^= (xs>>17)
        xs ^= (xs<<43)

        Q[i] = xcng + xs

    N = 100000
    for i in range(N):
        if indx<20632:
            s = Q[indx]
            indx+=1
        else:
            s = refill()
        xcng=6906969069 * xcng + 123
        xs ^= (xs<<13)
        xs ^= (xs>>17)
        xs ^= (xs<<43)
        x = s+xcng+xs
    print ("Does x= 4013566000157423768")
    print (" x=",x)

start()

------------------------------------------

(The code performs N iterations of a random number generator. You get
the result expected, ie. x=401...768, when N is a billion.)
--
Bartc
Chris Angelico
2018-02-26 11:40:36 UTC
Permalink
Post by bartc
Below is the first draft of a Python port of a program to do with random
numbers. (Ported from my language, which in turned ported it from a C
program by George Marsaglia, the random number guy.)
However, running it quickly exhausts the memory in my machine. The reason is
that Python unhelpfully widens all results to bignums as needed. The code
relies on calculations being modulo 2^64.
Note that restricting integer ops to 64 bits probably still won't work, as I
believe the numbers need to be unsigned.
No, it's because the original implementation assumed integer
wrap-around (at least, I think that's what's happening; I haven't
analyzed the code in great detail). That means all your integer
operations are doing two jobs: the one they claim to, and then a
masking to 64 bits signed. That's two abstract operations that happen,
due to the nature of the CPU, to work efficiently together. If you
don't implement both halves of that in your Python port, you have
failed at porting. What if you were porting a program from a 72-bit
chip that assumed Binary Coded Decimal? Would you complain that C's
integers are badly behaved?

And that's without even asking whether a C program that assumes
integer wrap-around counts as portable. At least with Python, you have
a guarantee that integer operations are going to behave the same way
on all compliant implementations of the language.

ChrisA
bartc
2018-02-26 12:13:56 UTC
Permalink
Post by Chris Angelico
Post by bartc
Below is the first draft of a Python port of a program to do with random
numbers. (Ported from my language, which in turned ported it from a C
program by George Marsaglia, the random number guy.)
However, running it quickly exhausts the memory in my machine. The reason is
that Python unhelpfully widens all results to bignums as needed. The code
relies on calculations being modulo 2^64.
Note that restricting integer ops to 64 bits probably still won't work, as I
believe the numbers need to be unsigned.
No, it's because the original implementation assumed integer
wrap-around (at least, I think that's what's happening; I haven't
analyzed the code in great detail). That means all your integer
operations are doing two jobs: the one they claim to, and then a
masking to 64 bits signed. That's two abstract operations that happen,
due to the nature of the CPU, to work efficiently together. If you
don't implement both halves of that in your Python port, you have
failed at porting. What if you were porting a program from a 72-bit
chip that assumed Binary Coded Decimal? Would you complain that C's
integers are badly behaved?
And that's without even asking whether a C program that assumes
integer wrap-around counts as portable. At least with Python, you have
a guarantee that integer operations are going to behave the same way
on all compliant implementations of the language.
A C version is given below. (One I may have messed around with, which
I'm not sure works properly. For an original, google for Marsaglia and
KISS64 or SUPRKISS64.)

Most integers are unsigned, which have well-defined overflow in C (they
just wrap as expected). In C, a mixed signed/unsigned op is performed as
unsigned.

-----------------------------

/* SUPRKISS64.c, period 5*2^1320480*(2^64-1) */
#include <stdio.h>
#include <stdlib.h>
#include "timer.c"
static signed long long Q[20632],
carry=36243678541LL,
xcng=12367890123456LL,
xs=521288629546311LL,
indx=20632;

#define CNG ( xcng=6906969069LL*xcng+123 )
#define XS ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
#define SUPR ( indx<20632 ? Q[indx++] : refill() )
#define KISS SUPR+CNG+XS

signed long long refill(void) {
int i; signed long long z,h;
for(i=0;i<20632;i++) {
h = (carry&1);
z = ((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
carry = (Q[i]>>23)+(Q[i]>>25)+(z>>63);
Q[i] = ~((z<<1)+h);
}
indx=1;
return (Q[0]);
}

int main() {
int i; signed long long x;

for(i=0;i<20632;i++) Q[i]=CNG+XS;

for(i=0;i<1000000000;i++) x=KISS;

printf("Does x=4013566000157423768\n x=%llu.\n",x);
}
--
bartc
Ned Batchelder
2018-02-26 13:42:14 UTC
Permalink
Post by bartc
Post by Chris Angelico
Post by bartc
Below is the first draft of a Python port of a program to do with random
numbers. (Ported from my language, which in turned ported it from a C
program by George Marsaglia, the random number guy.)
However, running it quickly exhausts the memory in my machine. The reason is
that Python unhelpfully widens all results to bignums as needed. The code
relies on calculations being modulo 2^64.
Note that restricting integer ops to 64 bits probably still won't work, as I
believe the numbers need to be unsigned.
No, it's because the original implementation assumed integer
wrap-around (at least, I think that's what's happening; I haven't
analyzed the code in great detail). That means all your integer
operations are doing two jobs: the one they claim to, and then a
masking to 64 bits signed. That's two abstract operations that happen,
due to the nature of the CPU, to work efficiently together. If you
don't implement both halves of that in your Python port, you have
failed at porting. What if you were porting a program from a 72-bit
chip that assumed Binary Coded Decimal? Would you complain that C's
integers are badly behaved?
And that's without even asking whether a C program that assumes
integer wrap-around counts as portable. At least with Python, you have
a guarantee that integer operations are going to behave the same way
on all compliant implementations of the language.
A C version is given below. (One I may have messed around with, which
I'm not sure works properly. For an original, google for Marsaglia and
KISS64 or SUPRKISS64.)
Most integers are unsigned, which have well-defined overflow in C
(they just wrap as expected). In C, a mixed signed/unsigned op is
performed as unsigned.
-----------------------------
/*   SUPRKISS64.c, period 5*2^1320480*(2^64-1)   */
#include <stdio.h>
#include <stdlib.h>
#include "timer.c"
 static signed long long Q[20632],
                 carry=36243678541LL,
                 xcng=12367890123456LL,
                 xs=521288629546311LL,
                 indx=20632;
 #define CNG ( xcng=6906969069LL*xcng+123 )
 #define XS  ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
 #define SUPR ( indx<20632 ? Q[indx++] : refill() )
 #define KISS SUPR+CNG+XS
 signed long long refill(void) {
  int i; signed long long z,h;
  for(i=0;i<20632;i++) {
    h = (carry&1);
    z = ((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
    carry = (Q[i]>>23)+(Q[i]>>25)+(z>>63);
    Q[i] = ~((z<<1)+h);
  }
  indx=1;
  return (Q[0]);
 }
 int main() {
  int i; signed long long x;
  for(i=0;i<20632;i++) Q[i]=CNG+XS;
  for(i=0;i<1000000000;i++) x=KISS;
  printf("Does x=4013566000157423768\n     x=%llu.\n",x);
}
With proper 64-bit masking (def only64(x): return x &
0xFFFFFFFFFFFFFFFF), the Python version produces the correct answer
using a reasonable amount of memory. Well, once you notice that the
Python code had N=1e5, and the C code had N=1e9 :) If you want to
experiment, with N=1e5, the final number should be 5255210926702073855.
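
The masking fix described here can be sketched as a single helper
(only64 is the name from the post) that pins every intermediate result
back into C's unsigned 64-bit range:

```python
M64 = 0xFFFFFFFFFFFFFFFF   # 2**64 - 1

def only64(x):
    # Emulate C's unsigned long long wrap-around: keep the low 64 bits.
    return x & M64

print(only64(2**64))   # 0, just as C unsigned overflow wraps
print(only64(-1))      # 18446744073709551615: the two's-complement bit pattern
```

Applied to every assignment in the port (carry, z, Q[i], xcng, xs, x),
it keeps the values from growing into bignums.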

Also, I note that you said, "Most integers are unsigned", but the C code
has them all declared as signed? It doesn't seem to have mattered to
your result, but I'm not an expert on C portability guarantees.

--Ned.
bartc
2018-02-26 14:04:53 UTC
Permalink
Post by Ned Batchelder
Post by bartc
A C version is given below. (One I may have messed around with, which
I'm not sure works properly. For an original, google for Marsaglia and
KISS64 or SUPRKISS64.)
Most integers are unsigned, which have well-defined overflow in C
With proper 64-bit masking (def only64(x): return x &
0xFFFFFFFFFFFFFFFF), the Python version produces the correct answer
using a reasonable amount of memory.
I did try sometime like that, but I must have missed something because I
didn't get quite the same results as a working version.

And with interpreted code, you tend not to test using loops of a billion
iterations.

Well, once you notice that the
Post by Ned Batchelder
Python code had N=1e5, and the C code had N=1e9 :) If you want to
experiment, with N=1e5, the final number should be 5255210926702073855.
OK, I'll try that.
Post by Ned Batchelder
Also, I note that you said, "Most integers are unsigned", but the C code
has them all declared as signed? It doesn't seem to have mattered to
your result, but I'm not an expert on C portability guarantees.
The C code I first pasted used 'unsigned', but the main program logic
wasn't right, and I found another version that looked better. That one
used 'signed' for some reason, which I completely missed.

Even if with C it works with either, the signed version might have
'undefined behaviour'. As said, google for the original; the ones I can
see have 'unsigned'. But I can also see a Fortran version that just uses
'integer*8', which I believe is signed 64-bit.
--
bartc
Rhodri James
2018-02-26 14:01:35 UTC
Permalink
Post by Ned Batchelder
Also, I note that you said, "Most integers are unsigned", but the C code
has them all declared as signed? It doesn't seem to have mattered to
your result, but I'm not an expert on C portability guarantees.
C explicitly leaves the behaviour of signed arithmetic overflow
undefined, so you have no portability guarantees there.
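
The well-defined half of that contrast can be seen from Python via
ctypes, whose fixed-width unsigned types wrap the way C's unsigned
arithmetic must (a sketch; the undefined signed case can't be
demonstrated portably at all, which is rather the point):

```python
import ctypes

x = ctypes.c_uint64(2**64 - 1)   # ULLONG_MAX
x.value += 1                     # unsigned overflow: defined to wrap
print(x.value)                   # 0
```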
--
Rhodri James *-* Kynesim Ltd
Chris Angelico
2018-02-23 16:51:40 UTC
Permalink
On Sat, Feb 24, 2018 at 3:39 AM, Steven D'Aprano
Post by Steven D'Aprano
[...]
Post by Chris Angelico
Post by bartc
Integer pixel values
Maybe in 64 bits for the time being, but 32 certainly won't be enough.
As soon as you do any sort of high DPI image manipulation, you will
exceed 2**32 total pixels in an image (that's just 65536x65536, or
46341x46341 if you're using signed integers);
I don't think there's any sensible reason to use signed ints for pixel
offsets.
How about "my language's default integer type is signed"? That's what
gave us all manner of problems, like the way a whole lot of old (and
some not even all THAT old) programs have trouble if you have more
than 2GB of free space - because they query disk free space as a
32-bit number, then interpret that number as signed. (If, as on many
modern systems, you have hundreds or thousands of gig free, it depends
whether that number modulo 4GB is more than 2GB, which is a very fun
thing to manage.)
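
The free-space bug sketched above is easy to reproduce: pack a >2 GiB
byte count as an unsigned 32-bit field, then read it back as signed
(the numbers here are illustrative):

```python
import struct

free_bytes = 3 * 2**30   # 3 GiB reported by the OS; fits in a uint32
# A program that stores the value in a signed 32-bit int sees:
as_signed = struct.unpack('<i', struct.pack('<I', free_bytes))[0]
print(as_signed)         # -1073741824: "negative" free space
```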
Post by Steven D'Aprano
Post by Chris Angelico
and it wouldn't surprise
me if some image manipulation needs that many on a single side - if not
today, then tomorrow. So 64 bits might not be enough once you start
counting total pixels.
64-bit int will limit your image size to a maximum of 4294967296 x
4294967296.
If you stitched the images from, let's say, the Hubble Space Telescope
together into one panoramic image of the entire sky, you'd surely exceed
that by a couple of orders of magnitude. There's a lot of sky and the
resolution of the Hubble is very fine.
Or made a collage of all the duck-face photos on Facebook *wink*
But you wouldn't be able to open such a file on your computer. Not until
we get a 128-bit OS :-)
Hmm, yeah, I didn't actually do the math on that, heh.
Post by Steven D'Aprano
While all of this is very interesting, let's remember that listing all
the many things which can be counted in 64 bits (or 128 bits, or 1024
bits...) doesn't disprove the need for bignums. The argument Bart makes
is like pointing to all the things that don't need refrigeration:

- tinned beans
- nuts and bolts
- DVDs
- books
- spoons

as evidence that we don't need refrigerators. Yeah, I get it, there are
approximately a hundred thousand million billion trillion things that
don't need bignums. There are so many things that don't need bignums, you
need a bignum to count them all!
Touche!
Post by Steven D'Aprano
And I don't give a rat's arse that this means that adding 1 to 10 to get
11 in Python takes thirty nanoseconds instead of ten nanoseconds as a
consequence. If I cared about that, I wouldn't be using Python.
Which is why you don't do crypto in Python - because you DO care about
whether it takes thirty sometimes and ten sometimes, and various other
things.

So there *are* situations where you specifically do not want bignums;
but they're a lot rarer than situations where you don't care what the
underlying implementation is, and it's incredibly freeing to not have
to take integer magnitude limitations into account.

ChrisA
Steven D'Aprano
2018-02-23 18:00:33 UTC
Permalink
Post by bartc
Post by Steven D'Aprano
Post by bartc
The actual result is irrelevant, so long as its correct. The important
thing is those 50 million calls.
Why do you care about the 50 million calls?
Because we are interested in comparing call-efficiency across languages?
Are we? What on earth makes you think that?


[...]
Post by bartc
You're responsible for implementing the call-mechanism of a new
language, and you want to make it efficient. What program do you test it
with, one that contains zero calls (clearly, that will be most
efficient!), or one that contains lots of calls?
This has nothing to do with the topic under consideration. It is
irrelevant. This discussion has nothing, absolutely nothing with
"implementing the call-mechanism of a new language".

Perhaps you have missed that Python is a mature language well over 20
years old now, and even Julia is a few years old.


[...]
Post by bartc
OK, then, if you are at all interested in Julia vs. Python (and it was
you who posted that article!), then compare a version of a benchmark
that uses as few calls as possible, if you like. But what exactly is
that benchmark then highlighting? Loops?
What on earth makes you think that the point of this is to highlight any
one particular feature of a language? Loops, calls, addition, who cares?

The point is to show how to take badly written slow code written in
Python, and modify it to be better, faster code while remaining within
the Python ecosystem.


[...]
Post by bartc
OK, then do as I did, and keep a global tally of how many calls it
makes, and print that as the result. Then the correct output for
48315633
If yours shows only 37, then /your/ program is wrong.
I'm not interested in such a function. The words have not yet been
invented to explain how little I care about how fast Python or Julia can
make millions and millions of pointless function calls for no good reason.

But if it makes you feel good about yourself, if it makes you feel that
you've won some sort of little victory, then I concede:

If a programmer cares about writing useless, crappy, pointless code that
just makes function call after function call for no good reason, then
Python will be terrible for the job. I would never use Python for such
code, it is awful at it.

If you think it is important to run an empty loop a billion times in a
nanosecond, Python will be crap at that too.

Python is only useful for writing useful, meaningful code that gets
useful work done, not for your "fibcalls" function. If you have a need
for "fibcalls", then go ahead and use Julia, or C, or whatever language
you prefer, and be glad that you have a wide choice of languages to
choose from.
--
Steve
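[For concreteness, the call tally bartc describes, and the memoized version from the article, can both be sketched as below; the counter names are invented, and fib(20) is used so the sketch runs quickly, since fib(36) would tally 48,315,633 calls:]

```python
from functools import lru_cache

calls = 0

def fib(n):
    """Naive doubly-recursive Fibonacci, tallying every call."""
    global calls
    calls += 1
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(20)
print(calls)   # 21891 calls for fib(20); fib(36) makes 48315633

runs = 0

@lru_cache(maxsize=None)
def fib_cached(n):
    """Memoized version: the body runs once per distinct argument."""
    global runs
    runs += 1
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

fib_cached(36)
print(runs)    # 37: one body execution each for n = 0..36
```

[The tally makes the disagreement explicit: the naive version is a call-overhead benchmark, while the cached version does almost no calling at all.]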
Python
2018-02-23 03:31:28 UTC
Permalink
Post by Steven D'Aprano
Post by bartc
As I said, people keep missing the point. The fact this uses a grossly
inefficient way of calculating Fibonacci seems to blind them to any
other considerations.
The actual result is irrelevant, so long as it's correct. The important
thing is those 50 million calls.
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.
If you are writing practical programs, that's true. But the Julia
benchmarks are not practical programs; they are designed to compare
the performance of various language features across a range of
languages.

How is it a fair comparison to compare how Julia runs an algorithm
that is not optimal in pretty much any language, to a completely
different one with a much improved order of complexity, optimized
specifically for Python?

The article author even points out that the Fibonacci sequence problem
can be solved using the same technique as he used for his Python
solution, but makes no effort to benchmark it and then goes on to
declare Python the winner anyway. That's just bad science. Many of
the comments after the article repeatedly echo the same fundamental
flaw that bartc raised: The benchmark isn't intended to determine the
fastest way to calculate Fibonacci numbers; it's meant to measure the
language's efficiency solving this class of problems in this way.
Post by Steven D'Aprano
I don't give a flying fox about how fast the compiler can do those 48
million calls
And if you needed to write a program that HAD TO perform those 48
million calls, wouldn't you care then? Don't forget that in the real
world, such reasons can include things like, "I don't know any other
way to solve it and I need it done RFN," or, "My boss is a moron and
insisted I do it this way..." [If you've never seen the latter,
consider yourself lucky, but I sadly have.]
Post by Steven D'Aprano
When comparing two compilers, you are ALWAYS comparing two different
programs.
Such benchmarks still compare the efficiency with which the compiler
turns the same algorithm you wrote in its provided syntax (ultimately)
into executed instructions, and that's still a fair and useful
comparison of the two languages, for solutions to problems of that
particular class/structure/whatever.

Knowing that a particular problem has a more efficient solution in the
language you're using isn't always feasible; no one's knowledge of
algorithms is exhaustive, and most of us have to care about deadlines
way more than whether or not our implementations are exactly the most
efficient ones possible. The average programmer typically has very
little control or even awareness of what code the compiler generates.
You can say that makes them shitty programmers but guess what? The
world is full of shitty programmers.

In that world, the real world, if you forbid these types of
comparisons for being inherently unfair, simply because the compiler
will obviously not generate *precisely* the same code, then you can
not with any measure of sanity determine which language is the better
tool to solve your problem of the day, given the tools you have at
your disposal (including your own knowledge of algorithms). These
types of benchmarks are useful for discovering what a given language
is good at and what it is not. That information is useful to either
allow you to choose the more efficient language for the solution
you're going to implement, or perhaps even as a hint that the language
you want to use may have a more efficient way to achieve the result
you need.
Terry Reedy
2018-02-23 08:11:36 UTC
Permalink
Post by Python
Post by Steven D'Aprano
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.
If you are writing practical programs, that's true. But the Julia
benchmarks are not practical programs; they are designed to compare
the performance of various language features across a range of
languages.
If that were so, then the comparison should use the fastest *Python*
implementation. Which is what the article discussed here did. But the
comparison is a lie when the comparison is compiled machine code versus
bytecode interpreted by crippled cpython*. And people use this sort of
benchmark to choose a language for *practical programs*, and don't know
that they are not seeing *Python* times, but *crippled cpython* times.

* Python has an import statement. But 'comparisons' disallow 'import
numpy', a quite legal Python statement, and similar others. The ability
to import Python wrappers of Fortran and C libraries is part of the
fundamental design of cpython. It is a distributed project.

The fact that numerical python, numarray, and now numpy are distributed
separately, in spite of proposals to include them, syntax additions for
their benefit, and their overwhelming usefulness, is a matter of social
and administrative convenience, not necessity. There is also a
proposal, which I consider likely to happen, to enhance the cpython
Windows and Mac installers by making installation of numpy and other
great 'external' modules selectable options.

It takes just 2 or 3 legal Python lines that do run with cpython as
installed to make 'import numpy' work, without resorting to subprocess
(though that is *also* legal Python):

from ensurepip import bootstrap; bootstrap() # only if needed
from pip import main
main(['install', 'numpy'])
<wait 70 seconds>
>>> import numpy
>>> dir(numpy)
['ALLOW_THREADS', 'AxisError', 'BUFSIZE', 'CLIP', 'ComplexWarning',
...
'vstack', 'warnings', 'where', 'who', 'zeros', 'zeros_like']

So, given a realistic number-crunching benchmark that take at least
several minutes, one could add those two lines at the top and be off
--
Terry Jan Reedy
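[One note on the snippet above: `from pip import main` is not a supported pip API; the documented programmatic route is to run pip as a subprocess of the current interpreter. A sketch, shown with a harmless `--version` standing in for an actual install:]

```python
import subprocess
import sys

# Run pip as a module of this exact interpreter; swap in
# ["install", "numpy"] for a real installation.
cmd = [sys.executable, "-m", "pip", "--version"]
subprocess.check_call(cmd)
```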
alister
2018-02-23 09:19:41 UTC
Permalink
Post by Terry Reedy
Post by Python
Post by Steven D'Aprano
Why do you care about the 50 million calls? That's crazy -- the
important thing is *calculating the Fibonacci numbers as efficiently
as possible*.
If you are writing practical programs, that's true. But the Julia
benchmarks are not practical programs; they are designed to compare the
performance of various language features across a range of languages.
If that were so, then the comparison should use the fastest *Python*
implementation. Which is what the article discussed here did. But the
comparison is a lie when the comparison is compiled machine code versus
bytecode interpreted by crippled cpython*. And people use this sort of
benchmark to choose a language for *practical programs*, and don't know
that they are not seeing *Python* times, but *crippled cpython* times.
Benchmarks are completely pointless anyway.
The question when choosing a language should not be 'How fast is it?';
it should always be 'Is it fast enough?'.
Once you have a choice of languages that are fast enough, other criteria
for suitability become important (such as readability & ease of debugging)
--
If you're going to do something tonight that you'll be sorry for tomorrow
morning, sleep late.
-- Henny Youngman
bartc
2018-02-23 12:57:35 UTC
Permalink
* Python has an import statement.  But 'comparisons' disallow 'import
numpy', a quite legal Python statement, and similar others.
If I'm duplicating a benchmark [in another language] then the last thing
I want to see is something like 'import numpy', or other special
language feature, as that cannot be replicated.

I want to see a pure algorithm. Not the equivalent of:

os.system("C_version.exe")

  The ability
to import Python wrappers of Fortran and C libraries is part of the
fundamental design of cpython.  It is a distributed project.
Is numpy a general purpose C library that can also be called from any
language that can use a C API? Or is it specific to Python?
--
bartc
Chris Angelico
2018-02-23 13:03:06 UTC
Permalink
Post by Terry Reedy
* Python has an import statement. But 'comparisons' disallow 'import
numpy', a quite legal Python statement, and similar others.
If I'm duplicating a benchmark [in another language] then the last thing I
want to see is something like 'import numpy', or other special language
feature, as that cannot be replicated.
os.system("C_version.exe")
Post by Terry Reedy
The ability
to import Python wrappers of Fortran and C libraries is part of the
fundamental design of cpython. It is a distributed project.
Is numpy a general purpose C library that can also be called from any
language that can use a C API? Or is it specific to Python?
No, it's a general purpose FORTRAN library that can also be called
from any language that can use a generic C, FORTRAN, COBOL, etc API.

ChrisA
Steven D'Aprano
2018-02-23 17:49:14 UTC
Permalink
Post by bartc
Is numpy a general purpose C library that can also be called from any
language that can use a C API? Or is it specific to Python?
No, it's a general purpose FORTRAN library that can also be called from
any language that can use a generic C, FORTRAN, COBOL, etc API.
Numpy itself is a collection of Python interfaces to a number of C and
Fortran libraries. You can't generally call numpy from other languages --
you can't even call numpy from other Python implementations, unless they
support calling C/Fortran. Jython, for example, can't do it without help:

https://jyni.org/

and while PyPy can call numpy, doing so is slow, and there's a custom
fork of numpy specially for PyPy:

https://bitbucket.org/pypy/numpy


Remember that numpy show cases the exact reason Python was invented: to
act as an easy to use "glue language" to join together efficient and
useful, but not so easy to use, libraries written in C and Fortran.
Jython was invented to support Java libraries, and IronPython to make it
easy to integrate with the Dot-Net ecosystem.
--
Steve
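[The glue-language point can be illustrated with nothing but the standard library: ctypes calls straight into a C library. A sketch; it assumes a POSIX system where the C math library can be located, which holds on typical Linux builds:]

```python
import ctypes
import ctypes.util

# Locate and load the C math library, then call its sqrt directly.
name = ctypes.util.find_library("m")  # e.g. "libm.so.6" on Linux
libm = ctypes.CDLL(name)  # CDLL(None) would search already-loaded symbols

libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))   # 1.4142135623730951
```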
Chris Angelico
2018-02-23 18:20:34 UTC
Permalink
On Sat, Feb 24, 2018 at 4:49 AM, Steven D'Aprano
Post by Steven D'Aprano
Post by bartc
Is numpy a general purpose C library that can also be called from any
language that can use a C API? Or is it specific to Python?
No, it's a general purpose FORTRAN library that can also be called from
any language that can use a generic C, FORTRAN, COBOL, etc API.
Numpy itself is a collection of Python interfaces to a number of C and
Fortran libraries. You can't generally call numpy from other languages --
you can't even call numpy from other Python implementations, unless they
https://jyni.org/
and while PyPy can call numpy, doing so is slow, and there's a custom
https://bitbucket.org/pypy/numpy
Right. I was kinda handwaving a bit; numpy itself isn't the thing
you'd call from elsewhere, but BLAS libraries in general are intended
to be available from many languages. Point is, it's not
Python-specific.

ChrisA
Python
2018-02-23 16:32:41 UTC
Permalink
Post by Terry Reedy
Post by Python
Post by Steven D'Aprano
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.
If you are writing practical programs, that's true. But the Julia
benchmarks are not practical programs; they are designed to compare
the performance of various language features across a range of
languages.
If that were so, then the comparison should use the fastest *Python*
implementation.
Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do. So, no actually, it
shouldn't.
Chris Angelico
2018-02-23 16:42:43 UTC
Permalink
Post by Python
Post by Terry Reedy
Post by Python
Post by Steven D'Aprano
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.
If you are writing practical programs, that's true. But the Julia
benchmarks are not practical programs; they are designed to compare
the performance of various language features across a range of
languages.
If that were so, then the comparison should use the fastest *Python*
implementation.
Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do. So, no actually, it
shouldn't.
Where does the author say that the benchmark is designed to compare
recursion? If you specifically test recursion, some language
interpreters will perform really well (because they convert your
recursion into iteration) and others will not (because they faithfully
execute the code you actually ask them to). Recursion is sometimes a
good way to describe an algorithm, but rarely a good way to implement
it at a low level.

ChrisA
Python
2018-02-23 18:43:06 UTC
Permalink
Post by Chris Angelico
Post by Python
Post by Terry Reedy
If that were so, then the comparison should use the fastest *Python*
implementation.
Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do. So, no actually, it
shouldn't.
Where does the author say that the benchmark is designed to compare
recursion?
Chris, you're a smart guy... Are you suggesting that the reason the
fibonacci sequence was selected as a benchmark is because it's such an
amazingly useful problem that it, in and of itself, warrants having
such a benchmark? Or, do you think the reason it makes sense to have
such a benchmark is because, like the reason it's presented in pretty
much every CS program ever, it presents an opportunity to consider a
particular class of problems and different techniques for solving
those problems, and the performance characteristics of those
solutions?


But, to answer your question more directly, here:

https://julialang.org/benchmarks/

"It is important to note that the benchmark codes are not written
for absolute maximal performance (the fastest code to compute
recursion_fibonacci(20) is the constant literal 6765). Instead,
the benchmarks are written to test the performance of identical
algorithms and code patterns implemented in each language. For
example, the Fibonacci benchmarks all use the same (inefficient)
doubly-recursive algorithm..."

Satisfied?
Post by Chris Angelico
Recursion is sometimes a good way to describe an algorithm, but
rarely a good way to implement it at a low level.
I'm well aware, and said the equivalent elsewhere. As I also said
elsewhere, I never claimed it was a particularly useful benchmark. It
is, nevertheless, designed to accomplish the stated goal, and does
exactly that. You can decide for yourself how useful that goal is,
but you can't really argue that it doesn't serve that purpose.

So, by changing the algorithm, the article defeats the purpose of the
benchmark. It makes some fine points about code optimization, but it
completely fails at its stated purpose (to make the benchmarks more
fair). The comparisons it makes are substantially less valid than the
ones made by the Julia benchmarks, on account of optimizing only the
algorithm used by Python, and not testing with a similarly optimized
algorithm in Julia, but rather using its results for the intentionally
unoptimized algorithm those benchmarks used. Even if testing
optimized code is the point, as the article claims, it utterly fails
to do that. Bad science.
Chris Angelico
2018-02-23 18:56:25 UTC
Permalink
Post by Python
Post by Chris Angelico
Post by Python
Post by Terry Reedy
If that were so, then the comparison should use the fastest *Python*
implementation.
Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do. So, no actually, it
shouldn't.
Where does the author say that the benchmark is designed to compare
recursion?
Chris, you're a smart guy... Are you suggesting that the reason the
fibonacci sequence was selected as a benchmark is because it's such an
amazingly useful problem that it, in and of itself, warrants having
such a benchmark? Or, do you think the reason it makes sense to have
such a benchmark is because, like the reason it's presented in pretty
much every CS program ever, it presents an opportunity to consider a
particular class of problems and different techniques for solving
those problems, and the performance characteristics of those
solutions?
https://julialang.org/benchmarks/
"It is important to note that the benchmark codes are not written
for absolute maximal performance (the fastest code to compute
recursion_fibonacci(20) is the constant literal 6765). Instead,
the benchmarks are written to test the performance of identical
algorithms and code patterns implemented in each language. For
example, the Fibonacci benchmarks all use the same (inefficient)
doubly-recursive algorithm..."
Satisfied?
No, not satisfied. Everything you've said would still be satisfied if
all versions of the benchmark used the same non-recursive algorithm.
There's nothing here that says it's testing recursion, just that (for
consistency) it's testing the same algorithm. There is no reason to
specifically test *recursion*, unless that actually aligns with what
you're doing. For instance, when Google Chrome rolled out its new V8
JavaScript engine, it was specifically optimized for recursion,
because many Google apps used recursion heavily. If you're
implementing a new LISP interpreter, you should probably optimize for
recursion, because most LISP code is going to be written recursively.
But otherwise, there's no particular reason to stick to recursion.
Post by Python
So, by changing the algorithm, the article defeats the purpose of the
benchmark. It makes some fine points about code optimization, but it
completely fails at its stated purpose (to make the benchmarks more
fair). The comparisons it makes are substantially less valid than the
ones made by the Julia benchmarks, on account of optimizing only the
algorithm used by Python, and not testing with a similarly optimized
algorithm in Julia, but rather using its results for the intentionally
unoptimized algorithm those benchmarks used. Even if testing
optimized code is the point, as the article claims, it utterly fails
to do that. Bad science.
I won't dispute that part. The correct way to do this would be to
optimize both sides fairly - either not at all, or equivalently (for
some definition of equivalence). But switching both sides to an
unoptimized iterative algorithm would be perfectly fair. Recursion is
NOT essential to the benchmark.

ChrisA
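[An unoptimized iterative version of the same algorithm, the kind of equally fair baseline suggested above, is only a few lines; in Python, a sketch:]

```python
def fib_iter(n):
    """Iterative Fibonacci: same results, no call-stack pressure."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iter(20))   # 6765, the 'constant literal' the Julia site mentions
```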
bartc
2018-02-23 19:36:39 UTC
Permalink
Post by Chris Angelico
Post by Python
Satisfied?
No, not satisfied. Everything you've said would still be satisfied if
all versions of the benchmark used the same non-recursive algorithm.
Why? Does it matter?
Post by Chris Angelico
There's nothing here that says it's testing recursion, just that (for
consistency) it's testing the same algorithm. There is no reason to
specifically test *recursion*, unless that actually aligns with what
you're doing. For instance, when Google Chrome rolled out its new V8
JavaScript engine, it was specifically optimized for recursion,
because many Google apps used recursion heavily. If you're
implementing a new LISP interpreter, you should probably optimize for
recursion, because most LISP code is going to be written recursively.
But otherwise, there's no particular reason to stick to recursion.
You list plenty of reasons, then say there's no reason to use recursion!

The recursion is a red herring; but to get a program with lots of
function calling, then using a recursive algorithm - any algorithm -
will do the job.

Would you have a different opinion if Python excelled at that benchmark,
and Julia sucked? (I can't remember what the results were.)

(I was testing one of my static compilers today; it has three call
convention modes all operating within the Win64 ABI. I used the
Fibonacci benchmark with fib(42) to test it, with these results:

mode 1 5.8 seconds (non-ABI)
mode 2 6.5 seconds (part ABI conformance)
mode 3 7.4 seconds (full ABI conformance)

Other compilers (C compilers that have to conform) ranged from 4.3 to
6.2 seconds, including MSVC/O2. Except for gcc-O3 which managed 1.7
seconds. How did it do that? I'll have to investigate to see how much it
cheated.

The point of all this: I was using the recursive Fibonacci benchmark for
these tests, as there is little else going on besides function calls and
returns. It's perfect.

Why some people hate it here, I really don't know.)
Post by Chris Angelico
I won't dispute that part. The correct way to do this would be to
optimize both sides fairly - either not at all, or equivalently (for
some definition of equivalence). But switching both sides to an
unoptimized iterative algorithm would be perfectly fair. Recursion is
NOT essential to the benchmark.
So, what benchmark would you use instead if you wanted an idea of how
well each language dealt with it?
--
bartc
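[For anyone wanting to reproduce such timings, a minimal harness with the stdlib timeit module looks like this; a sketch, with fib(20) chosen to keep the run short:]

```python
import timeit

def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# Average a few repetitions of the doubly-recursive benchmark.
reps = 5
total = timeit.timeit(lambda: fib(20), number=reps)
print(f"{total / reps:.6f} s per fib(20) call")
```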
Python
2018-02-23 19:09:16 UTC
Permalink
Post by Chris Angelico
No, not satisfied. Everything you've said would still be satisfied if
all versions of the benchmark used the same non-recursive algorithm.
There's nothing here that says it's testing recursion, just that (for
consistency) it's testing the same algorithm. There is no reason to
specifically test *recursion*, unless that actually aligns with what
you're doing.
It seems abundantly clear to me that testing recursion is the point of
writing a benchmark implementing recursion (and very little of
anything else). Again, you can decide for yourself the suitability of
the benchmark, but I don't think you can really claim it doesn't
effectively test what it means to.
Chris Angelico
2018-02-23 19:16:38 UTC
Permalink
Post by Python
Post by Chris Angelico
No, not satisfied. Everything you've said would still be satisfied if
all versions of the benchmark used the same non-recursive algorithm.
There's nothing here that says it's testing recursion, just that (for
consistency) it's testing the same algorithm. There is no reason to
specifically test *recursion*, unless that actually aligns with what
you're doing.
It seems abundantly clear to me that testing recursion is the point of
writing a benchmark implementing recursion (and very little of
anything else). Again, you can decide for yourself the suitability of
the benchmark, but I don't think you can really claim it doesn't
effectively test what it means to.
Where do you get that it's specifically a recursion benchmark though?
I can't find it anywhere in the explanatory text.

ChrisA
Python
2018-02-23 19:41:17 UTC
Permalink
Post by Chris Angelico
Post by Python
It seems abundantly clear to me that testing recursion is the point of
writing a benchmark implementing recursion (and very little of
anything else). Again, you can decide for yourself the suitability of
the benchmark, but I don't think you can really claim it doesn't
effectively test what it means to.
Where do you get that it's specifically a recursion benchmark though?
I can't find it anywhere in the explanatory text.
I don't know how you could conclude it could be meant to test anything
else. The code is:

## recursive fib ##
fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)

What else is there of substance to test? Trigraphs? Addition? The
comment even calls out that it is recursive.

I further draw inference from the two statements:

1. "Instead, the benchmarks are written to test the performance of
identical algorithms and code patterns implemented in each
language."

Calling out explicitly that it is algorithms that they are testing,
and here the algorithm is a recursion.

2. "For example, the Fibonacci benchmarks all use the same
(inefficient) doubly-recursive algorithm..."

They expressly called out this algorithm as being inefficient, and
expressly called it out as being double-recursion.

I'll grant you they never explicitly said so unequivocally (that I
saw--my reading of their site was not exhaustive), but I can't see any
way an informed reader could conclude any different purpose other than
benchmarking recursion.
Steven D'Aprano
2018-02-24 02:46:02 UTC
Permalink
Even if testing optimized
code is the point, as the article claims, it utterly fails to do that.
Bad science.
You've used that statement two or three times now.

*This isn't science*.

There's nothing scientific about writing benchmarks, or even objective.
It is through and through subjective choices given a paper-thin patina of
objectivity because the results include numbers.

But those numbers depend on the precise implementation of the benchmark.
They depend on the machine you run them on, sometimes strongly enough
that the order of which language is faster can swap. I remember a bug in
Python's urllib module, I think it was, that made code using it literally
hundreds of times slower on Windows than Linux or OS X.

The choice of algorithms used is not objective, or fair. Most of it is
tradition: the famous "whetstone" benchmark apparently measures something
which has little or no connection to anything software developers should
care about. It, like the Dhrystone variant, was invented to benchmark
CPU performance. The relevance to comparing languages is virtually zero.

"As this data reveals, Dhrystone is not a particularly
representative sample of the kinds of instruction sequences
that are typical of today's applications. The majority of
embedded applications make little use of the C libraries
for example, and even desktop applications are unlikely to
have such a high weighting of a very small number of
specific library calls."

http://dell.docjava.com/courses/cr346/data/papers/DhrystoneMIPS-CriticismbyARM.pdf


Take the Fibonacci double-recursion benchmark. Okay, it tests how well
your language does at making millions of function calls. Why? How often
do you make millions of function calls? For most application code,
executing the function is far more costly than the overhead of calling
it, and the call overhead is dwarfed by the rest of the application.

For many, many applications, the *entire* program run could take orders
of magnitude fewer function calls than a single call to fib(38).

If you have a language with tail recursion elimination, you can bet
that's its benchmarks will include examples of tail recursion and tail
recursion will be a favoured idiom in that language. If it doesn't, it
won't.

I'm going to end with a quote:

"And of course, the very success of a benchmark program is
a danger in that people may tune their compilers and/or
hardware to it, and with this action make it less useful."

Reinhold P. Weicker, Siemens AG, April 1989
Author of the Dhrystone Benchmark
--
Steve
bartc
2018-02-24 11:35:17 UTC
Permalink
Post by Steven D'Aprano
Take the Fibonacci double-recursion benchmark. Okay, it tests how well
your language does at making millions of function calls. Why? How often
do you make millions of function calls?
Very often. Unless you are writing code 1970s style with everything in
one big main function.

I've done some tests with my interpreter [sorry, Ned], on one real task:

Total number of byte-code calls: 3.2 million
Total number of real x64 calls: 520 million

On this specific benchmark: 48 million and 580 million.

Changing the way those x64 calls were made (using a different call
convention), made some byte-code programs take up to 50% longer to
execute. [In this interpreter, each byte-code, no matter what it is, is
dispatched via a function call.]

Post by Steven D'Aprano
For most application code,
executing the function is far more costly than the overhead of calling
it, and the call overhead is dwarfed by the rest of the application.
Any actual figures?

In the case of interpreters, you want to optimise each byte-code, and
one way of doing that is to write a small program which features that
byte-code heavily. And then you start tweaking.

It is true that when working with heavy-duty data, or offloading work to
external, non-byte-code functions, then the byte-code execution
overheads are small. But Python's weakness is when it /is/ executing
actual algorithms using actual byte-code.

And actually, performance of CPython does seem to have improved
tremendously over the years. So some people have their eye on the ball.
Clearly not you.
Post by Steven D'Aprano
If you have a language with tail recursion elimination, you can bet
that's its benchmarks will include examples of tail recursion and tail
recursion will be a favoured idiom in that language. If it doesn't, it
won't.
Benchmarks need to be honest. But Fibonacci I think can't use that
optimisation (although gcc seems to have found another way of not that
much work).
--
bartc
Rick Johnson
2018-02-26 03:26:12 UTC
Permalink
On Friday, February 23, 2018 at 8:48:55 PM UTC-6, Steven D'Aprano wrote:
[...]
Post by Steven D'Aprano
Take the Fibonacci double-recursion benchmark. Okay, it
tests how well your language does at making millions of
function calls. Why?
Because making "millons of function calls" is what software
*DOES*!

Granted, and as others have testified in this thread, i
myself have almost never implemented an algorithm like that
monstrosity of recursion that is implemented in fib(), but
whether we use recursive functions are not, a high number of
functions calls is inevitable in our software. And if you
don't believe me, then use sys.settrace(myFunc) to trace all
function calls to stdout, and you shall be enlightened[1].
Post by Steven D'Aprano
How often do you make millions of function calls?
All the time! Of course, if the extent of your code base is
a single script containing the source `print 'Hello World'`,
perhaps your experience may vary from others here.
Post by Steven D'Aprano
For most application code, executing the function is far
more costly than the overhead of calling it,
What? Your functions are only called _once_ during the
lifetime of your code execution? Enough with the leg
pulling.
Post by Steven D'Aprano
and the call overhead is dwarfed by the rest of the
application.
Of course, because it's calling _other_ functions.
Post by Steven D'Aprano
For many, many applications, the *entire* program run could
take orders of magnitude fewer function calls than a single
call to fib(38).
Hmm, let's see:

def main():
    print 'hello world'
main()

By golly, he's right! O_O


--
[1] But, in the meantime, you may want to make yourself a
taco; go play a round of golf; and possibly pay a visit to
national monument (or three!), while all that crap is printing
to stdout.
Steven D'Aprano
2018-02-26 10:36:49 UTC
Permalink
[...]
Post by Steven D'Aprano
Take the Fibonacci double-recursion benchmark. Okay, it tests how well
your language does at making millions of function calls. Why?
Because making "millons of function calls" is what software *DOES*!
You're right -- I worded that badly, and ended up saying something I
didn't really mean. Mea culpa.

Of course over the course of the entire program run, most non-trivial
programs will end up making millions (if not billions) of function
calls *in total*. But they're not all happening *together*.

What I tried to get across is, how often do we use millions of function
calls to get one value? By default, Python will only allow one thousand
function calls in a stack before raising RecursionError.

py> def bad():
... return bad()
...
py> bad()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in bad
[ previous line repeats 998 times ]
RecursionError: maximum recursion depth exceeded


There's nothing unique about recursion that will trigger this error. Any
call stack of more than 1000 function calls will do it. So forget about
code getting a million function calls deep. The average is probably more like
a dozen, perhaps a hundred tops.


[...]
Post by Steven D'Aprano
For most application code, executing the function is far more costly
than the overhead of calling it,
What? Your functions are only called _once_ during the lifetime of your
code execution? Enough with the leg pulling.
Let me explain what I intended to say.

Here is a really poor estimate of the overhead of calling a function on
my computer, measured in iPython.

In [50]: def call():
....: pass
....:
In [51]: %timeit call()
1000000 loops, best of 3: 717 ns per loop

So, on my computer, in iPython, it takes 717ns to call a function that
does nothing. How long does it take to do the tiniest smidgen of useful
work?

In [52]: def do_something():
....: a = 1
....: b = a+1
....: return b
....:

In [53]: %timeit do_something()
1000000 loops, best of 3: 923 ns per loop

Only an extra 200 ns! Wow, function calls are expensive!!!

(Well, yes, I knew that.) However, let's do some *real* work, not a
Mickey Mouse function like do_something().

In [54]: import pyprimes

In [55]: %timeit pyprimes.nth_prime(100000)
1 loops, best of 3: 2.6 s per loop


I happen to know that calling nth_prime ends up making a couple of
hundred function calls. (Because I wrote it.) Let's call it 10,000
function calls, to be on the safe side. So 10,000 x 800 nanoseconds is
0.008 second, let's call it 0.01 second, which is just 0.4% of the total
cost of calculating the number I was after.

(For the record, that's 1299709.)

So just a trivial fraction of the total cost of the function.

Now I know that this is not always the case. But I maintain that for
*most* (but not all) practical, real-world code that actually does
something useful, the overhead of calling the functions is *usually* (but
not always) low compared to the cost of actually doing whatever it is
that the functions do.
--
Steve
Terry Reedy
2018-02-24 05:18:58 UTC
Permalink
Post by Python
Post by Terry Reedy
Post by Python
Post by Steven D'Aprano
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.
If you are writing practical programs, that's true. But the Julia
benchmarks are not practical programs; they are designed to compare
the performance of various language features across a range of
languages.
If that were so, then the comparison should use the fastest *Python*
implementation.
Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do.
That is a non-goal, because *languages* do not have clock-time speeds,
only programs written in concrete implementations run on real hardware.

Why do you think it fair to compare function call times using the
slowest rather than the fastest implementation of Python function calls?
Real Python programmers who are seriously concerned about time try to
use the fastest implementation appropriate to the situation.
Post by Python
So, no actually, it shouldn't.
To me, it is 'not using the fastest Python' that fails to make apples to
apples comparisons.

It has been said here that comparisons should use the same algorithm,
doing much the same work in both implementations. However, a Python
function call does *much more work* than in most languages, because the
meaning of 'function call' is much more flexible in Python than most
languages. The usual comparison is like lemons (other language calls)
to oranges (Python language calls, much more flexible). To be fair, the
comparison should be to a Python implementation that either notices or
accepts a declaration that, for instance, fib(n) only needs to pass an
int by position.

Comparing int addition to python object addition also compares unlike
operations. To compare using the same addition algorithm, one must use
an implementation that can restrict '+' to just int addition.

The Julia fib benchmark uses the type of function call Julia is good at.
Suppose we define fib in a way that uses Python features.

def fib(n, dummy):
if n >= 2:
return fib(n=n-1, dummy=dummy) + fib(dummy=dummy, n=n-2)
elif n >= 0:
return 1
else:
return None, n # or some error indicator including the bad value

If the language does not support 'parameter=value' syntax (which existed
long before Python), use ('n', n) and ('dummy', dummy) instead.
Now let's compare call times.

Or let's try 'fib(n-1, dummy) + fib(dummy=dummy, n=n-2)' to compare
functions that can identify argments either by position or by name.

Or f(n-1, dummy) + f(dummy=dummy, n=n-2) + f(*(n-3, dummy)), and change
'2' to '3', to utilize another Python call feature. If the language
does not support *args, omit '*'.
--
Terry Jan Reedy
boB Stepp
2018-02-24 07:05:40 UTC
Permalink
Post by Chris Angelico
Post by Python
Post by Chris Angelico
No, not satisfied. Everything you've said would still be satisfied if
all versions of the benchmark used the same non-recursive algorithm.
There's nothing here that says it's testing recursion, just that (for
consistency) it's testing the same algorithm. There is no reason to
specifically test *recursion*, unless that actually aligns with what
you're doing.
It seems abundantly clear to me that testing recursion is the point of
writing a benchmark implementing recursion (and very little of
anything else). Again, you can decide for yourself the suitability of
the benchmark, but I don't think you can really claim it doesn't
effectively test what it means to.
Where do you get that it's specifically a recursion benchmark though?
I can't find it anywhere in the explanatory text.
I hope I am not missing something blatantly obvious, but in the page
Python linked to (https://julialang.org/benchmarks/), in the opening
paragraph it states:

<quote>
These micro-benchmarks, while not comprehensive, do test compiler
performance on a range of common code patterns, such as function
calls, string parsing, sorting, numerical loops, random number
generation, recursion, and array operations.
</quote>

Recursion is listed above as one code pattern to specifically target
with one of their micro-benchmarks. I did not exhaustively go through
each language's code examples, but it does appear that the authors of
these benchmarks are implementing as exactly as possible the same
algorithm chosen for each of these code patterns in each language.
Now to me the real question is whether the Julia people were trying to
be intellectually honest in their choices of coding patterns and
algorithms or were they deliberately cherry picking these to make
Julia look better than the competition? Myself, I would rather be
charitable than accusatory about the benchmarkers' intentions. For
instance, the authors were aware of numpy and used it for some of the
python coding -- the array operations they were targeting IIRC.
Instead, if they were being deliberately dishonest, they could have
come up with some really contrived python code.
--
boB
Antoon Pardon
2018-02-26 12:06:51 UTC
Permalink
Post by Steven D'Aprano
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.
Not necessarily.

David Beazley in his talks sometimes uses an inefficient algorithm for calculating
Fibonacci numbers because he needs something that uses the CPU intensively.
Calculating the Fibonacci numbers in that context as efficiently as possible would
defeat that purpose.

So in a context of a benchmark it is not unreasonable to assume those 50 million
calls are the purpose and not calculating the Fibonacci numbers as efficiently as
possible.
--
Antoon.
 
bartc
2018-02-26 12:29:34 UTC
Permalink
Post by Antoon Pardon
Post by Steven D'Aprano
Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.
Not necessarily.
David Beazley in his talks sometimes uses an inefficient algorithm for calculating
Fibonacci numbers because he needs something that uses the CPU intensively.
Calculating the Fibonacci numbers in that context as efficiently as possible would
defeat that purpose.
So in a context of a benchmark it is not unreasonable to assume those 50 million
calls are the purpose and not calculating the Fibonacci numbers as efficiently as
possible.
I don't think Steven is ever going to concede this point.

Because Python performs badly compared to Julia or C, and it's not
possible to conveniently offload the task to some fast library because
it only uses a handful of primitive byte-codes.

(I have the same trouble with my own interpreted language. Although
somewhat brisker than CPython, it will always be much slower than a
C-like language on such micro-benchmarks.

But I accept that; I don't have an army of people working on
acceleration projects and tracing JIT compilers. To those people
however, such a benchmark can be a useful yardstick of progress.)
--
bartc
Antoon Pardon
2018-02-26 12:28:41 UTC
Permalink
Post by Terry Reedy
Post by Python
Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do.
That is a non-goal, because *languages* do not have clock-time speeds,
only programs written in concrete implementations run on real hardware.
Why do you think it fair to compare function call times using the
slowest rather than the fastest implementation of Python function
calls?  Real Python programmers who are seriously concerned about time
try to use the fastest implementation appropriate to the situation.
If the slowest happens to be the standard implementation, I see nothing wrong with it.
Post by Terry Reedy
Post by Python
  So, no actually, it shouldn't.
To me, it is 'not using the fastest Python' that fails to make apples
to apples comparisons.
It has been said here that comparisons should use the same algorithm,
doing much the same work in both implementations. However, a
Python function call does *much more work* than in most languages,
because the meaning of 'function call' is much more flexible in Python
than most languages.
So? I don't see what is wrong with pointing out that all that extra
work, which a lot of people don't care about, is costing performance.
Post by Terry Reedy
The usual comparison is like lemons (other language calls) to oranges
(Python language calls, much more flexible).  To be fair, the
comparison should be to a Python implementation that either notices or
accepts a declaration that, for instance, fib(n) only needs to pass an
int by position.
Comparing int addition to python object addition also compares unlike
operations.  To compare using the same addition algorithm, one must
use an implementation that can restrict '+' to just int addition.
I don't agree. If the performance loss comes from the standard implementation
of the language being unable to do that, then people testing languages shouldn't
be burdened with searching for some implementation that can.
--
Antoon Pardon
Etienne Robillard
2018-02-22 17:00:45 UTC
Permalink
So, how exactly can PyPy and its JIT run multithreaded Python applications
any faster than Julia on distributed systems?

Right now I think PyPy and JIT can run Python code on my old ia32
computer faster than with Python/Cython.

How does Julia scale on x86 machines?

Etienne
Post by Steven D'Aprano
Post by bartc
The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls. Then,
fib(36) would normally involve 48,315,633 calls.
This version does only 37, giving a misleading impression.
Who cares if it only does 37? There is *absolutely no benefit* to those
additional 48,315,596 calls, and thanks to the benchmark I discovered
that.[1]
I want to know what is the fastest implementation of Fibonacci I can
write. I'm not married to the idea that I have to make a ludicrous 48
million function calls just to get the 36th Fibonacci number.
Post by bartc
(It then goes on to suggest using 'numba', and using its JIT compiler,
and using on that on an /iterative/ version of fib(). Way to miss the point.
Indeed, you certainly have.
Post by bartc
It might be a technique to bear in mind, but it is nonsensical to say
this gives a 17,000 times speed-up over the original code.
That's *exactly what it did*.
How many years, decades, have you been programming? Have you not realised
yet that the best way to optimize code is to pick the fastest algorithm
you can that will do the job? There's no law that says because you
started with a slow, inefficient algorithm you have to stay with it.
Post by bartc
Here's another speed-up I found myself, although it was only 50 times
faster, not 17,000: just write the code in C, and call it via
os.system("fib.exe").
Did you include the time taken to capture the text output from stdout,
parse it, and convert to an int?
Did your C code support numbers greater than 2**64? My fibonacci function
can calculate the 10,000th Fibonacci number in a blink of the eye:
py> fibi(10000)
336447648764317832666216120051075...976171121233066073310059947366875
Output truncated for brevity, it is a 2090-digit number.
Admittedly it did take about a minute and a half to generate all 208988
digits of the one millionth Fibonacci number, but the performance is fast
enough for my needs.
[1] Actually I already knew it, but then I didn't perform these
benchmarks, I'm just linking to them.
--
Etienne Robillard
***@yandex.com
https://www.isotopesoftware.ca/
Chris Angelico
2018-02-22 18:43:16 UTC
Permalink
Post by Serhiy Storchaka
Post by Chris Angelico
Not overly misleading; the point of it is to show how trivially easy
it is to memoize a function in Python. For a fair comparison, I'd like
to see the equivalent Julia code: the function, unchanged, with
something around the outside of it to manage caching and memoization.
Can that be done with a couple of trivial lines of code using only the
standard library?
The article contains a reference to a Julia example.
Post by Chris Angelico
Note that Julia also can memoize functions, see this example [1] provided
by Ismael V.C.
[1] http://nbviewer.jupyter.org/gist/Ismael-VC/b04ad17ee44c89917803
Yep, but I don't know Julia enough to know if that's the standard
library, and there aren't timings for it (which also makes it harder
to skim for - I missed it on first read, and only found it when I
searched the page for 'memoize'). If that's standard library, my
conclusion would be "Winner: Tie".

ChrisA
Chris Angelico
2018-02-22 19:07:44 UTC
Permalink
On Fri, Feb 23, 2018 at 3:00 AM, Steven D'Aprano
Post by Steven D'Aprano
Post by bartc
Here's another speed-up I found myself, although it was only 50 times
faster, not 17,000: just write the code in C, and call it via
os.system("fib.exe").
Did you include the time taken to capture the text output from stdout,
parse it, and convert to an int?
Relatively trivial compared to the overhead of invoking a subprocess
THROUGH A SHELL. On Windows, that has to invoke cmd.exe; on Unix-like
systems, most likely /bin/sh. And then that has to search the system
path (maybe that doesn't happen on his Windows system, as presumably
it's finding the executable on the cwd check). That's file system
operations, and a lot of them.
Post by Steven D'Aprano
Did your C code support numbers greater than 2**64? My fibonacci function
py> fibi(10000)
336447648764317832666216120051075...976171121233066073310059947366875
Output truncated for brevity, it is a 2090-digit number.
Admittedly it did take about a minute and a half to generate all 208988
digits of the one millionth Fibonacci number, but the performance is fast
enough for my needs.
Yeah, but if you stick that into a variable, you can separately time
the calculation from the stringification. Using the exact code from
the page:

def fib_seq(n):
if n < 2:
return n
a,b = 1,0
for i in range(n-1):
a,b = a+b,a
return a
Post by Steven D'Aprano
Post by bartc
time.time(); x = fib_seq(1000000); time.time()
1519325946.7880309
1519325953.773382

That's seven seconds to calculate the number. How big a number is it?
Post by Steven D'Aprano
Post by bartc
x.bit_length()
694241
Post by Steven D'Aprano
Post by bartc
math.log10(x)
208987.29076497658

Both take effectively zero time. Dividing that number by 10 that many
times is what takes all the processing. (Stringifying a number
basically consists of div-mod to trim off the last digit, keep going
until you run out of number. Lots of division.)

So actually calculating the millionth Fibonacci number can be done
fairly quickly; figuring out its first digits would be hard, but we
don't actually need that. Though we could probably get a decent approximation:
Post by Steven D'Aprano
Post by bartc
log = math.log10(x)
print(10**(log - int(log)), "e +", int(log))
1.9532821287892859 e + 208987

I wouldn't trust all of those digits, but that's a fast way to get an
idea of the scale of a number. Doing that rather than printing the
whole number means you actually see the performance of *calculation*.

ChrisA
Jack Fearnley
2018-02-22 19:55:08 UTC
Permalink
Post by Chris Angelico
On Fri, Feb 23, 2018 at 3:00 AM, Steven D'Aprano
Post by Steven D'Aprano
Post by bartc
Here's another speed-up I found myself, although it was only 50 times
faster, not 17,000: just write the code in C, and call it via
os.system("fib.exe").
Did you include the time taken to capture the text output from stdout,
parse it, and convert to an int?
Relatively trivial compared to the overhead of invoking a subprocess
THROUGH A SHELL. On Windows, that has to invoke cmd.exe; on Unix-like
systems, most likely /bin/sh. And then that has to search the system
path (maybe that doesn't happen on his Windows system, as presumably
it's finding the executable on the cwd check). That's file system
operations, and a lot of them.
Post by Steven D'Aprano
Did your C code support numbers greater than 2**64? My fibonacci
function can calculate the 10,000th Fibonacci number in a blink of the eye:
py> fibi(10000)
336447648764317832666216120051075...976171121233066073310059947366875
Output truncated for brevity, it is a 2090-digit number.
Admittedly it did take about a minute and a half to generate all 208988
digits of the one millionth Fibonacci number, but the performance is
fast enough for my needs.
Yeah, but if you stick that into a variable, you can separately time the
calculation from the stringification. Using the exact code from the
page:

def fib_seq(n):
    if n < 2:
        return n
    a,b = 1,0
    for i in range(n-1):
        a,b = a+b,a
    return a
Post by Steven D'Aprano
Post by bartc
time.time(); x = fib_seq(1000000); time.time()
1519325946.7880309 1519325953.773382
That's seven seconds to calculate the number. How big a number is it?
Post by Steven D'Aprano
Post by bartc
x.bit_length()
694241
Post by Steven D'Aprano
Post by bartc
math.log10(x)
208987.29076497658
Both take effectively zero time. Dividing that number by 10 that many
times is what takes all the processing. (Stringifying a number basically
consists of div-mod to trim off the last digit, keep going until you run
out of number. Lots of division.)
So actually calculating the millionth Fibonacci number can be done
fairly quickly; figuring out its first digits would be hard, but we
don't actually need that. Though we could probably get a decent approximation:
Post by Steven D'Aprano
Post by bartc
log = math.log10(x)
print(10**(log - int(log)), "e +", int(log))
1.9532821287892859 e + 208987
I wouldn't trust all of those digits, but that's a fast way to get an
idea of the scale of a number. Doing that rather than printing the whole
number means you actually see the performance of *calculation*.
I realize that this thread is about benchmarking and not really about
generating fibonacci numbers, but I hope nobody is using this code to
generate them on a 'production' basis,

Fibonacci numbers, any linearly recursive sequence for that matter, can
be generated in log time.

GP/Pari on my Intel I7 computes fibonacci(100000) in less than 1 ms,
fibonacci(1000000) in 5ms, fibonacci(10000000) in 59 ms and finally
fibonacci(100000000) in 733 ms.

This would obviously be slower in Python but by a constant factor.

Best regards,
Jack Fearnley
Post by Chris Angelico
ChrisA
bartc
2018-02-22 21:05:36 UTC
Permalink
Post by Jack Fearnley
I realize that this thread is about benchmarking and not really about
generating fibonacci numbers, but I hope nobody is using this code to
generate them on a 'production' basis,
Fibonacci numbers, any linearly recursive sequence for that matter, can
be generated in log time.
GP/Pari on my Intel I7 computes fibonacci(100000) in less than 1 ms,
fibonacci(1000000) in 5ms,
The simple method involves 1 million additions of numbers with an
average length of 100,000 digits. 5ms would be pretty good going.

Presumably it uses a faster algorithm. I found this in Python (from
stackoverflow):

def fib(n):
    v1, v2, v3 = 1, 1, 0   # initialise a matrix [[1,1],[1,0]]
    for rec in bin(n)[3:]: # perform fast exponentiation of the matrix
        calc = v2*v2
        v1, v2, v3 = v1*v1+calc, (v1+v3)*v2, calc+v3*v3
        if rec=='1':
            v1, v2, v3 = v1+v2, v1, v2
    return v2

fib(1000000) took 200 ms in Python 3. Printing the result took about
another 1.6 seconds. I think now it's down to the efficiency of the big
integer library, as little bytecode is being executed.
--
bartc
Ian Kelly
2018-02-22 22:28:01 UTC
Permalink
Post by Jack Fearnley
I realize that this thread is about benchmarking and not really about
generating fibonacci numbers, but I hope nobody is using this code to
generate them on a 'production' basis,
Fibonacci numbers, any linearly recursive sequence for that matter, can
be generated in log time.
To be pedantic, that's not really true; see below.
Post by Jack Fearnley
GP/Pari on my Intel I7 computes fibonacci(100000) in less than 1 ms,
fibonacci(1000000) in 5ms, fibonacci(10000000) in 59 ms and finally
fibonacci(100000000) in 733 ms.
I'm not sure exactly what algorithm that's using, but note that the
growth of these timings is *not* logarithmic. In fact, it's
superlinear.

It's true you can generate Fibonacci numbers in log(n) iterations using
linear algebra. As long as multiplication and addition are assumed to
be constant time, then the Fibonacci algorithm is indeed O(log n)
time. However, the numbers in the sequence grow on the order of O(phi
** n) which quickly exceeds what you can fit into a 32-bit integer.
Therefore bigints are quickly required.

The fastest known integer multiplication algorithm is O(n * log n *
log log n). The Fibonacci numbers grow at O(phi ** n) and therefore
require O(log (phi ** n)) = O(n) bits to represent them. Each
iteration therefore requires O(n * log n * log log n) time, and with
O(log n) iterations the total time of the algorithm is thus O(n * (log
n) ** 2 * log log n).

It's still a lot faster than the brute force approach, and aeons
faster than the brute-force recursive approach, but it's not strictly
logarithmic because that part of the analysis is dwarfed by the cost
of the arithmetic involved.
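To make that point concrete, here is a sketch of one O(log n)-iteration method, the "fast doubling" identities (an illustration; it may not be the algorithm GP/Pari actually uses). Each of the O(log n) doubling steps multiplies ever-larger big integers, which is exactly where the superlinear total running time comes from.

```python
def fib_fast_doubling(n):
    """Return (F(n), F(n+1)) using the fast-doubling identities:
        F(2k)   = F(k) * (2*F(k+1) - F(k))
        F(2k+1) = F(k)**2 + F(k+1)**2
    Only O(log n) doubling steps, but each step multiplies
    bignums whose size grows linearly in n."""
    if n == 0:
        return (0, 1)
    a, b = fib_fast_doubling(n // 2)  # a = F(k), b = F(k+1)
    c = a * (2 * b - a)               # F(2k)
    d = a * a + b * b                 # F(2k+1)
    if n % 2 == 0:
        return (c, d)
    return (d, c + d)

print(fib_fast_doubling(10)[0])  # 55
```

The recursion depth is only about log2(n), so even fib_fast_doubling(10**6) stays far below Python's recursion limit.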
Steven D'Aprano
2018-02-23 00:31:49 UTC
Permalink
Post by Jack Fearnley
I realize that this thread is about benchmarking and not really about
generating fibonacci numbers, but I hope nobody is using this code to
generate them on a 'production' basis,
Fibonacci numbers, any linearly recursive sequence for that matter, can
be generated in log time.
That is not what your timing results show. Your timing results are
*worse* than linear, not better. In fact, they look more like N log N:


N Your time ∝ N ∝ log N ∝ N log N
(-actual-) (----------predicted---------)
100000 <1? 0.5 4.2 0.4
1000000 5 5 5.0 5.0
10000000 59 50 5.8 58.3
100000000 733 500 6.7 666.7
1000000000 ? 5000 7.5 7500.0

I don't know what algorithm your version of Fibonacci is using, but if it
is the one which uses matrix multiplication, then it is logarithmic in
the number of *steps*, but each step doesn't take constant time. As the
numbers involved get larger, the cost of each multiplication and addition
becomes higher.

Big O analysis is never a substitute for actual timing measurements, and
the assumption behind Big O analysis, namely that only some operations
take time, and always constant time, is never correct. It is only an
approximation to the truth, and as you can see, something which is
algorithmically Big O logarithmic can turn out to scale worse than
linearly once you actually measure the times.
--
Steve
Chris Angelico
2018-02-23 00:53:57 UTC
Permalink
On Fri, Feb 23, 2018 at 11:31 AM, Steven D'Aprano
Post by Steven D'Aprano
Big O analysis is never a substitute for actual timing measurements, and
the assumption behind Big O analysis, namely that only some operations
take time, and always constant time, is never correct. It is only an
approximation to the truth, and as you can see, something which is
algorithmically Big O logarithmic can turn out to scale worse than
linearly once you actually measure the times.
I prefer to describe that as "hidden costs". Any step that has its own
non-constant cost becomes a hidden cost. That doesn't mean Big O is
itself limited, but that when you analyze a function, you have to
think about more than just what's right in front of you. For example:

def count_at_max(items):
count = 0
for item in items:
if item == max(items):
count += 1
return count

def count_at_max(items):
count = 0
peak = max(items)
for item in items:
if item == peak:
count += 1
return count

So similar, but one of them runs in O(n²) and the other in O(n). The
max() call is a "hidden cost". (Obviously they'll usually be subtler
than that, but this is a sledgehammer for demonstrative purposes.) Big
O *can* handle this, you just have to take notice of things.

ChrisA
Rick Johnson
2018-02-23 01:13:02 UTC
Permalink
On Thursday, February 22, 2018 at 1:55:35 PM UTC-6, Jack Fearnley wrote:
[...]
Post by Jack Fearnley
I realize that this thread is about benchmarking and not
really about generating fibonacci numbers, but I hope
nobody is using this code to generate them on a
'production' basis,
I've been raising the warning flags about that awful
Fibonacci example for years!

First it was in the tutorial, and then, they plastered it on
the front page of Python.org like it were some sort of
trophy...

At this point i have concluded that it must be: (1) a
sadistic noob hazing device, or (2) a straw-stuffed,
anthropomorphic, crow-poop-riddled monster that sends the
seasoned programmers off in the opposite direction.
Jack Fearnley
2018-02-23 19:58:25 UTC
Permalink
[...]
Post by Jack Fearnley
I realize that this thread is about benchmarking and not really about
generating fibonacci numbers, but I hope nobody is using this code to
generate them on a 'production' basis,
I've been raising the warning flags about that awful Fibonacci example
for years!
First it was in the tutorial, and then, they plastered it on the front
page of Python.org like it were some sort of trophy...
At this point i have concluded that it must be: (1) a sadistic noob
hazing device, or (2) a straw-stuffed,
anthropomorphic, crow-poop-riddled monster that sends the seasoned
programmers off in the opposite direction.
Excellent thread! I see I'm preaching to the converted.

Jack
Python
2018-02-23 01:36:21 UTC
Permalink
Post by Chris Angelico
Post by bartc
The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls. Then,
fib(36) would normally involve 48,315,633 calls.
This version does only 37, giving a misleading impression.
Not overly misleading; the point of it is to show how trivially easy
it is to memoize a function in Python.
This does not appear to me at all to be the point of the article. The
point of the article seems to be that the Julia benchmarks compare
unfairly the performance of Python to Julia, because they do not use
algorithms that suit "the best way to use Python." But I have to
agree with bartc, that is not at all the point of the benchmarks. The
point of the benchmarks is to compare the performance of a particular
algorithm or set of algorithms across a variety of languages.

It's fine to say that this method of computing Fibonacci sequences is
inefficient; but anyone who's spent any time on the problem knows that
recursive function calls is not the most efficient way to calculate
them in most languages. So it must follow logically that the point of
the benchmark is not to prove that Julia is faster than Python at
solving Fibonacci sequences, but rather that Julia is faster than
Python at solving the class of problems that would be implemented in a
similar manner as this particular implementation of Fibonacci
sequences.

No other interpretation makes any sense. And if that is the point, as
logic dictates, then comparing the performance of algorithm A
implemented in language X to the performance of a completely different
algorithm B in language Y is generally both misleading and pointless.

If you can prove that A is the most efficient algorithm possible in X
and B is the most efficient algorithm in Y, for solving the same
problem, then and only then, you might have something. Otherwise, I
think what you're saying is bunk.
Terry Reedy
2018-02-23 06:38:01 UTC
Permalink
Post by Python
Post by Chris Angelico
Post by bartc
The idea of the Fibonacci benchmark is to test how effectively an
implementation manages large numbers of recursive function calls. Then,
fib(36) would normally involve 48,315,633 calls.
This version does only 37, giving a misleading impression.
Not overly misleading; the point of it is to show how trivially easy
it is to memoize a function in Python.
This does not appear to me at all to be the point of the article. The
point of the article seems to be that the Julia benchmarks compare
unfairly the performance of Python to Julia, because they do not use
algorithms that suit "the best way to use Python." But I have to
agree with bartc, that is not at all the point of the benchmarks. The
point of the benchmarks is to compare the performance of a particular
algorithm or set of algorithms across a variety of languages.
It's fine to say that this method of computing Fibonacci sequences is
inefficient; but anyone who's spent any time on the problem knows that
recursive function calls is not the most efficient way to calculate
them in most languages. So it must follow logically that the point of
the benchmark is not to prove that Julia is faster than Python at
solving Fibonacci sequences, but rather that Julia is faster than
Python at solving the class of problems that would be implemented in a
similar manner as this particular implementation of Fibonacci
sequences.
It is no secret that Python's interpreted function calls are slower than
function calls compiled to machine code and that Python's infinite
precision integer arithmetic is slower than machine int arithmetic. So
there is almost no point in combining both in a naive drastically
inefficient algorithm and declaring that Python is slower.

As to the vague 'class of problems implemented in a similar manner':
Any function f of count N that depends on values of f for counts < N can
be memoized the same way in Python as fibonacci. Everything said about
P vs J for fib applies to such similar problems. The same is almost
true for functions that depend on a pair of counts, such as the number
of combinations of N things taken K at a time. The time benefit per
space used is just a bit less.

Now consider a general recursive problem: a graph with N nodes and E
edges and recursive node functions that depend on the value of the
function at args(node) other nodes. Example: the minimum length of a
path from node a to node b. In the field of graph algorithms, it is
completely routine to save f(node) for intermediate nodes when
calculated. The idea of naively implementing the recursive definition,
in any non-trivial practical situation, without saving intermediate
values, is pretty ludicrous.

Fib can be viewed as a function on a directed graph where all but the 2
base nodes have two incoming and two outgoing edges.
--
Terry Jan Reedy
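[The memoization pattern described above generalises directly. As a minimal
sketch (an editorial illustration, not code from the thread), the
"combinations of N things taken K at a time" function mentioned above can be
memoized exactly like fib:]

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def comb(n, k):
    # C(n, k) via Pascal's rule: C(n-1, k-1) + C(n-1, k).
    # Memoizing turns the exponential recursion into O(n*k) work,
    # the same pattern as the memoized fib.
    if k == 0 or k == n:
        return 1
    return comb(n - 1, k - 1) + comb(n - 1, k)

# Sanity check against the stdlib implementation.
assert comb(30, 15) == math.comb(30, 15)
print(comb(30, 15))
```

[Uncached, comb(30, 15) would expand the full Pascal recursion (hundreds of
millions of calls); memoized, it evaluates at most 30*15 distinct states.]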
Chris Angelico
2018-02-23 08:58:01 UTC
Permalink
As to the vague 'class of problems implemented in a similar manner': Any
function f of count N that depends on values of f for counts < N can be
memoized the same way in Python as fibonacci. Everything said about P vs J
for fib applies to such similar problems. The same is almost true for
functions that depend on a pair of counts, such as the number of
combinations of N things taken K at a time. The time benefit per space used
is just a bit less.
The naive recursive Fibonacci function can be memoized with maxsize=2
and it gets virtually all the benefit - in fact, it becomes
effectively iterative. (I think even maxsize=1 will do that.) With
most memoization, that kind of benefit is less blatant.

ChrisA
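[The maxsize=2 claim above is easy to probe empirically. A small sketch (an
editorial illustration, not code from the thread) using the statistics that
lru_cache exposes via cache_info(); the misses count is the number of times
the function body actually ran:]

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def fib(n):
    # Only the two most recently used results are cached, yet even
    # this tiny cache prunes most of the naive call tree.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

result = fib(36)
# Uncached, fib(36) makes 48,315,633 calls; compare that to the
# number of cache misses (actual body executions) here.
print(result, fib.cache_info().misses)
```

[The miss count lands orders of magnitude below 48 million, though whether it
is truly linear, as an unbounded cache would be, can be read off the numbers.]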
Python
2018-02-23 16:50:50 UTC
Permalink
Post by Terry Reedy
It is no secret that Python's interpreted function calls are slower
than function calls compiled to machine code and that Python's
infinite precision integer arithmetic is slower that machine int
arithmetic. So there is almost no point in combining both in a
naive drastically inefficient algorithm and declaring that Python is
slower.
I never said the benchmarks chosen were awesome... :-) What I'm
saying is this:

1. Insisting on the most efficient Python implementation of Fib
completely misses the point (and defeats the purpose) of the
benchmarks, and as such the entire article is lost in the weeds.

2. Inasmuch as someone might want to measure the performance of
recursive function calls in a particular language, the choice
of recursively implementing fib is a fine one, and useful for that
purpose.

3. If you want to say that choosing the most efficient implementation
in Python is the only way to do a fair comparison, then you must
also choose the most efficient implementation in Julia, which this
is not, by all appearances. But again, this misses the point of
the benchmark and defeats its purpose.

Now if you want to complain that the purpose is silly, I won't even
argue that... In 30 years of writing code I don't think I've ever
implemented a recursive solution to any problem that wasn't purely
academic. Maybe once or twice... maybe. But the purpose is what it
is, and arguing that it must be done a different way is bogus.
Steven D'Aprano
2018-02-23 18:14:22 UTC
Permalink
Post by Terry Reedy
It is no secret that Python's interpreted function calls are slower
than function calls compiled to machine code and that Python's infinite
precision integer arithmetic is slower than machine int arithmetic.
So there is almost no point in combining both in a naive drastically
inefficient algorithm and declaring that Python is slower.
I never said the benchmarks chosen were awesome... :-) What I'm saying
1. Insistence that the most efficient python implementation of Fib
completely misses the point (and defeats the purpose) of the
benchmarks, and as such the entire article is lost in the weeds.
The point of the benchmarks is to highlight how great Julia is, and why
people should choose it over Python.

The point of the article is to show how to take badly written,
unidiomatic, slow code written in Python, and modify it to be better,
faster code competitive (if not faster than) Julia, while remaining
within the Python ecosystem.

We aren't limited to the two options the benchmarks suggest: "Fast Julia!
Slow Python!". (Three, if you include C.) We can take a third option: "we
reject your crappy algorithms and write code like a professional".

https://allthetropes.org/wiki/Take_a_Third_Option
--
Steve
Python
2018-02-23 19:03:34 UTC
Permalink
Post by Steven D'Aprano
I never said the benchmarks chosen were awesome... :-) What I'm saying
1. Insistence that the most efficient python implementation of Fib
completely misses the point (and defeats the purpose) of the
benchmarks, and as such the entire article is lost in the weeds.
The point of the benchmarks is to highlight how great Julia is
For a particular set of common code patterns. You left that part out,
and it's the crux of the issue.
Post by Steven D'Aprano
and why people should choose it over Python.
I'd argue not why, but WHEN. Or really I'd argue nothing of the
sort... and simply point out that the benchmarks do exactly what the
Julia team say they do. It's up to you to decide how useful that is.
Post by Steven D'Aprano
The point of the article is to show how to take badly written,
unidiomatic, slow code written in Python, and modify it to be better,
faster code competitive (if not faster than) Julia, while remaining
within the Python ecosystem.
Even if that was the point, the article fails, because it compares
non-optimized Julia code to optimized Python code.
Post by Steven D'Aprano
We aren't limited to the two options the benchmarks suggest
It actually suggested quite a lot more than that... the benchmarks
compared the performance of 12 languages, using exactly the same
programming patterns (as near as I can tell anyway, I didn't
scrutinize all of the code).
Rhodri James
2018-02-22 12:09:30 UTC
Permalink
Post by Steven D'Aprano
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en
Interesting article. I can't help but feel that using Cython is
cheating a bit, and I was really expecting a bit more Pythonic rewriting
of the benchmarks, but it's still food for thought.
--
Rhodri James *-* Kynesim Ltd
Stefan Behnel
2018-02-23 10:00:28 UTC
Permalink
Post by Steven D'Aprano
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en
Thanks for sharing, Steven.

While it was already suggested between the lines in some of the replies,
I'd like to emphasise that the combination of timeit and result caching
(memoizing) in this post is misleading and not meaningful. It actually
shows a common mistake that easily happens when benchmarking code.

Since timeit repeats the benchmark runs and takes the minimum, it will
*always* return the time it takes to look up the final result in the cache,
and never report the actual performance of the code that is meant to be
benchmarked here. From what I read, this was probably not intended by the
author of the post.

Myself, I'm guilty as charged of running into this more than once, and I have
seen it on many occasions. I've even seen proper Cython benchmark code that a C
compiler can fully analyse as static and replace by a constant, and then
show wondrous speedups that would never translate to any real-world
gains. These things happen, but they are mistakes. All they tell us is that
we must always be careful when evaluating benchmarks and their results, and
to take good care that the benchmarks match the intended need, which is to
help us understand and solve a real-world performance problem [1].

Stefan


[1] That's also the problem in the post above and in the original
benchmarks it refers to: there is no real-world problem to be solved here.
Someone wrote slow code and then showed that one language can evaluate that
slow code faster than another. Nothing to see here, keep walking...
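[The timeit-plus-caching pitfall described above can be demonstrated in a few
lines. A sketch (an editorial illustration, not code from the post): because
timeit re-runs the same statement against the same warm cache, every
repetition after the first measures only a dict lookup:]

```python
import timeit
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# number=1 times each run separately; after the first run, fib(30)
# is a pure cache hit, so later runs measure only the lookup.
times = timeit.repeat("fib(30)", globals=globals(), number=1, repeat=5)
print(times[0], min(times[1:]))  # first (computing) run vs. cache-hit runs
```

[Reporting min(times) here, as timeit conventionally suggests, gives the
cache-hit time and says nothing about the cost of actually computing fib(30).]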
Steven D'Aprano
2018-02-23 10:20:35 UTC
Permalink
I've even seen proper Cython benchmark code that a C compiler can fully
analyse as static and replace by a constant, and then show wondrous
speedups that would never translate to any real-world gains.
This is one of the reasons why typical benchmarks are junk. If you're not
benchmarking actual, real-world code you care about, you have no idea how
the language will perform with actual, real-world code.

There is a popular meme that "Python is slow" based on the results of
poorly-written code. And yet, in *real world code* that people actually
use, Python's performance is not that bad -- especially when you play to
its strengths, rather than its weaknesses, and treat it as a glue
language for stitching together high-performance libraries written in
high-performance languages like C, Fortran, Rust and Java.

Speaking of bad benchmarks, I recall an anecdote about a programmer
running some benchmarks on Fortran code on various mainframes, and found
that one specific machine was millions of times faster than the others.

It turns out that the code being tested was something like this
(translated into Python):

for i in range(1, 1000000):
    pass


and the Fortran compiler on that one machine was clever enough to treat it
as dead code and remove it, turning the entire benchmark into a single
no-op.

Your point that we must consider benchmarks carefully is a good one.
--
Steve
Python
2018-02-23 20:02:09 UTC
Permalink
Post by Python
Post by Chris Angelico
Post by Python
It seems abundantly clear to me that testing recursion is the point of
writing a benchmark implementing recursion (and very little of
anything else). Again, you can decide for yourself the suitability of
the benchmark, but I don't think you can really claim it doesn't
effectively test what it means to.
Where do you get that it's specifically a recursion benchmark though?
I can't find it anywhere in the explanatory text.
I don't know how you could conclude it could be meant to test anything
else.
Actually I will make one small correction, which is to say that the
benchmark tests the performance of algorithms that are dominated
by many function calls, of which recursion is an obvious subset. But
I think that's splitting hairs.