A plan for benchmarking Cython+

Goals

As part of the Cython+ project, this subproject will help evaluate the benefits of Cython+ against the status quo (CPython and Stefan Behnel's Cython) and against alternative approaches (various Python JIT accelerators, including Numba and …).

More specifically, we will have to evaluate Cython+ along two axes:

  1. How much more performant Cython+ is (in terms of elapsed time, CPU usage and memory consumption).
  2. How improved (or degraded) the developer experience (DX) is relative to the baselines, in terms of:
    1. Compilation time
    2. Startup time
    3. How different it is from standard Python (in terms of both added and removed features)
    4. Support from the Python tooling ecosystem (formatters, linters, IDEs...) and from Python libraries
    5. Language support for things that can't be done (easily) in standard Python
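For the numerical measurements, here is a minimal Python sketch (not taken from the existing benchmarking project) of how elapsed time, CPU time, peak memory and startup time could be collected for a single benchmark. It assumes that the benchmark can be called from Python as a zero-argument function; `Measurement`, `measure` and `startup_time` are hypothetical names. Peak memory is measured with the standard `tracemalloc` module, which only sees allocations made through Python's allocators: memory allocated directly by native code (e.g. `malloc` in compiled Cython+ code) would need an OS-level metric such as peak RSS instead.

```python
import subprocess
import sys
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Measurement:
    wall_s: float       # best elapsed (wall-clock) time, in seconds
    cpu_s: float        # best CPU time of this process, in seconds
    peak_py_bytes: int  # peak memory allocated through Python's allocators


def measure(fn: Callable[[], object], repeat: int = 5) -> Measurement:
    """Time fn() `repeat` times, then measure its peak memory in one extra run."""
    best_wall = best_cpu = float("inf")
    for _ in range(repeat):
        w0, c0 = time.perf_counter(), time.process_time()
        fn()
        w1, c1 = time.perf_counter(), time.process_time()
        best_wall = min(best_wall, w1 - w0)
        best_cpu = min(best_cpu, c1 - c0)
    # Separate, untimed run: tracemalloc slows allocations down.
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return Measurement(best_wall, best_cpu, peak)


def startup_time(cmd: Sequence[str], repeat: int = 5) -> float:
    """Best wall-clock time to launch `cmd` and wait for it to exit."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    print(measure(lambda: sorted(range(100_000), key=lambda x: -x)))
    # Baseline: start the current interpreter and run an empty statement.
    print(startup_time([sys.executable, "-c", "pass"]))
```

Timings are taken without tracing, and memory in a separate run, because tracing would inflate the timings. Keeping the best of several runs reduces noise from other activity on the machine. The same `startup_time` helper could also time a compilation command (for instance `cythonize -i module.pyx`, where the module name is hypothetical) as a rough measure of compilation time.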

The "performance" part is mostly numerical. The "DX" part has some numerical aspects, but is mostly qualitative.

Current state

I have created a project for running and reporting benchmarks of Cython+ against regular implementations of Python and Cython (including some exotic variants), as well as against other languages:

General observations

Three levels of benchmarks

We will have to consider three main categories of benchmarks, each of which provides different insights:

  1. Micro- and mini-benchmarks (from 10 to a few hundred LOC). This is the easiest category to start with, and also the only one that allows rigorous comparison between languages or language variants.

  2. Benchmarks based on existing libraries (e.g. SQLAlchemy, Jinja2) or applications (e.g. Pydis, see below). This category is probably the most useful for users, but it will be quite hard to implement in our context, since a large part of the benefits of using Cython comes at the price of changes to the syntax (if not the semantics) of the language.

  3. Benchmarks of applications written specifically for the project (see below).
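To illustrate the first category, here is a hypothetical micro-benchmark (not one taken from the existing project): a short, non-numerical kernel with a fixed input and a known answer. Checking the result before reporting a time is what makes comparisons across variants meaningful, since every implementation must compute exactly the same thing. The code is plain Python, which Cython can also compile unchanged; a Cython+ variant would add its own type declarations, which are not shown here. `count_queens`, `run` and `EXPECTED` are illustrative names.

```python
import timeit


def count_queens(n: int) -> int:
    """Count the solutions of the n-queens problem by bitmask backtracking."""
    full = (1 << n) - 1  # one bit per column

    def place(cols: int, diag1: int, diag2: int) -> int:
        if cols == full:  # a queen in every column: one solution found
            return 1
        count = 0
        free = full & ~(cols | diag1 | diag2)  # columns not attacked on this row
        while free:
            bit = free & -free  # lowest free column
            free ^= bit
            count += place(cols | bit, ((diag1 | bit) << 1) & full, (diag2 | bit) >> 1)
        return count

    return place(0, 0, 0)


# Known answers, used to reject a variant that is fast but wrong.
EXPECTED = {4: 2, 6: 4, 8: 92}


def run(n: int = 8) -> int:
    result = count_queens(n)
    if result != EXPECTED[n]:
        raise RuntimeError(f"wrong result for n={n}: {result}")
    return result


if __name__ == "__main__":
    times = timeit.repeat(lambda: run(8), number=10, repeat=5)
    print(f"best: {min(times) / 10 * 1e3:.2f} ms per run")
```

Reporting the minimum of several repetitions, as the `timeit` documentation suggests, gives a lower bound that is less sensitive to interference from other processes than a mean would be.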

Benchmarks for various use cases

A given benchmark is only relevant for a given class of applications.

We will classify our micro-benchmarks into several categories:

  1. Non-numerical algorithms
  2. Numerical algorithms
    1. Scalar
    2. Vector
  3. Networked apps

We will then provide synthetic scores for various usage profiles, based on the weighted results of these benchmarks.
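One possible way to compute such synthetic scores is sketched below, under assumptions that the plan does not fix yet: each benchmark yields a speedup (baseline time divided by candidate time), speedups are averaged within a category with a geometric mean, and each usage profile combines the category results with a weighted geometric mean. The profile names, weights and speedup values are invented placeholders, not measurements.

```python
import math
from typing import Mapping, Sequence


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Speedup of a candidate over the baseline (> 1 means the candidate is faster)."""
    return baseline_s / candidate_s


def geomean(values: Sequence[float]) -> float:
    """Geometric mean, a common way to average ratios such as speedups."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def profile_score(category_speedups: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted geometric mean of per-category speedups for one usage profile."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Categories with a zero weight are skipped, so they need no measurement.
    log_sum = sum(w * math.log(category_speedups[cat]) for cat, w in weights.items() if w)
    return math.exp(log_sum / total)


# Hypothetical profiles: category weights chosen for illustration only.
PROFILES = {
    "scientific": {"non_numerical": 1, "scalar": 3, "vector": 4, "networked": 0},
    "web_backend": {"non_numerical": 3, "scalar": 1, "vector": 0, "networked": 4},
}

if __name__ == "__main__":
    # Placeholder per-benchmark speedups, NOT measured values.
    per_category = {
        "non_numerical": geomean([2.0, 8.0]),
        "scalar": geomean([4.0]),
        "vector": geomean([1.0, 1.0]),
        "networked": geomean([2.0]),
    }
    for name, weights in PROFILES.items():
        print(name, round(profile_score(per_category, weights), 2))
```

With a geometric mean, a 2× speedup on one benchmark and a 2× slowdown on another cancel out (the mean is 1), whereas an arithmetic mean of the same ratios would report a net gain of 1.25.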

Benchmarks against relevant contenders

There is no need to benchmark against every existing language. We will choose a subset:

Current plan

Projects to Cythonize+ for benchmarking

Pydis

Additional databases that could be Cythonized+ and/or used as benchmarks without changes:

Web server / app server

See the dedicated page.

Notes

Google originally optimized the V8 JIT using the Richards benchmark, because it is a good test of polymorphism and of the way classes are typically used. -- Source: https://medium.com/analytics-vidhya/77x-faster-than-rustpython-f8331c46aea1

Older benchmarks (by Nexedi)

The page https://www.nexedi.com/NXD-Blog.Multicore.Python.HTTP.Server contains a useful table for comparing coroutine libraries.

The page https://www.nexedi.com/NXD-Blog.Cython.Multithreaded.Coroutines gives an idea of the relative performance of coroutines.

If you are interested, I would like to see an evaluation of haproxy vs. lwan following the same principles.