Friday, 17 June 2011

Simplifying Type and Frame Objects

Having made binary operations faster for the common case and simplified builtin-functions, it is possible to simplify the implementation of the type class.

Conceptually, for modern of versions of Python (2.4 onwards) a type object is little more than a wrapper around its dictionary.
However, the CPython implementation still has many aspects it inherited from earlier versions of Python. It has a large number of internal "slots" which are C functions of differing types, that sometimes don't quite match the apparent type of the Python objects they are supposed to represent.

For example, each class has a sq_item slot which, supports item lookup on sequences, provided that the index can be passed as a C integer. However, the Python [ ] operator can take arbitrary objects as indices, making the slot useless.

These slots can be replaced with a smaller, regular array which will hold regular Python objects.
The original slots must be retained, so that third party extensions can define types. It is just that they will not be used internally by the VM. Each original slot will be used to create a builtin-function.
By replacing the old slots with a more regular alternative, the code for updating and modifying these slots can be much simplified, making it more amenable to subsequent optimisation.


The frame class is another class whose objects can be slimmed down. Frames are allocated and deallocated at a very high rate, which puts a lot of pressure on the garbage collection subsystem.
CPython maintains a "zombie" frame for each function, so that frames do not have to be created every time a function is called (unless it is called recursively). This creates problems of its own, as all the local variables have to be set to NULL when the function returns.

A CPython frame object (on a 32 bit machine) occupies 326 bytes, plus 4 bytes per local variable. A frame object in HotPy takes 44 bytes, plus 4 per local variable. Why is the CPython frame so enormous?
The largest part of the CPython frame is the block stack, which is implemented as a fixed size array, which takes up 244 bytes.  Reimplementing it as linked list of objects would reduce the amount of memory used by a large amount. The remaining overhead is the evaluation stack and exception state which can be moved to the thead-state, and the scopes (builtins, globals and locals) which could be moved to a separate object, retained by the owning function.

Once the CPython frame has been shunk to a reasonable size, it can be allocated and deallocated like any other object.

No comments:

Post a Comment