Wednesday, 13 April 2011

Faster Binary Operators

Before moving on to the more complex modifications, we will start with an optimization that will speed up almost all code, and requires only small changes to the code base.

Python has a unique way of implementing binary operators, which works as follows; using the addition a + b as the example:
  • Call the special method __add__ on the left hand operator a.__add__(b)
  • If that fails, either because a does not have an __add__ method or because a.__add__(b) returns NotImplemented, then:
  • Call the special method __radd__ on the right hand operator b.__radd__(a).
There is a special case where the type of b is a subtype of the type of a, in which case the __radd__ method is tried first.

The optimization we wish to implement relies on the fact that built-in types such as int and list cannot be monkey-patched (altered at runtime) and that they are commonly used in binary expressions.

Binary operators can be sped up by attaching an index to all types.
Common built-in types are given an index between 1 to 15 inclusive, all other types have index 0.
Each binary operator consists of a table of 256 functions, indexed from 0 to 255. A binary expression is evaluated by generating an index from the types of a and b: indexof(a) * 16 + indexof(b). If a function is present in the table at that location then it is called, if not then the standard implementation is used. For example, suppose that int has an index of 3 and float has an index of 5, then the expression 1 + 1.0 will be resolved by calculating
indexof(int) * 16 + indexof(float) = 3 * 16 + 5 = 53.
A pointer to the function for adding ints to floats is stored at index 53, and the addition is performed with no additional look-up.

This will work perfectly correctly if all entries are NULL, functions can then be added and tested one at a time. Operator functions do not need to perform type checking, since they can only be accessed through the table.

Although the speed up will not be large, it will be measurable and has two other benefits.
First, it makes specializing (modification number 14) binary operators easier and faster.
Second, it prevents degradation of the performance of unspecialized binary operators that might occur due to enhanced builtin-functions (modification number 4).

Possible CPython Implementation

The type index could be stored in the tp_flags field of the type object, using bits 0 to 3. This would mean that subtype flags would need to be masked when inheriting, to prevent subtypes of built-in types from having non-zero indices.

Function look up can be performed with the expression:
operator_table[((Py_TYPE(a)->tp_flags&15)<<4)+(Py_TYPE(b)->tp_flags&15)]

Rather than storing NULL in unused slots, the original function, eg. PyNumber_Add, can be stored instead.

The following types are probably worth giving no-zero indices:
  1. int
  2. float
  3. complex
  4. bool
  5. list
  6. dict
  7. set
  8. frozenset
  9. tuple
  10. str
  11. bytes
  12. bytearray
Leaving three spare.

No comments:

Post a Comment