Thursday 16 June 2011

Improving the Garbage Collector

It is often stated that "CPython does not have Garbage Collection".
Of course it does, otherwise Python programs would die from lack of memory almost before they started.
What people mean is that CPython does not have a very good garbage collection system.

So what's wrong with the system that does CPython have?
"Its reference counting, and reference counting is slow" is the standard answer.
But that's not all there is to it.

It is possible to make a high-performance GC using Reference Counting:
http://www.cs.umass.edu/~steveb/pubs/papers/urc-oopsla-2003.pdf
but not with the sort of simple reference counting that CPython uses.

So what exactly is wrong with Python's memory management system?

First, it maintains reference counts for references from the stack, as well as from global variables and object heap objects. This is very expensive, constantly incrementing and then decrementing the reference counts for short-lived objects.

The second problem is that the garbage collection algorithm determines how an object can be allocated. Fast allocation is impossible with a reference counted GC (this is also true for a mark & sweep collector).

Third of all, its non-generational.A generational collector divides all the objects on the heap into young and old objects and treats them differently. The young objects are collected frequently whereas the older objects are collected infrequently.
Generational collectors are generally faster.

The fourth, and final, problem is that reference counting is not complete,
meaning that it does not collect all garbage. It must be enhanced with a cycle collector. The CPython cycle collector has quite a high overhead, both in terms of time and memory used.

So how to change the Garbage Collector?
First we need to define an interface between the memory management sub-system and  the rest of the VM.
The current interface is superficially simple: the rest of the system modifies the ref-counts for objects, when the ref-count drops to zero, the garbage collector frees the object.

Unfortunately its not that simple.
Objects in Python can be finalised and it is possible to have weakrefs to an object.

So once an object has zero ref-count its weakrefs must be set to None, possibly calling call-backs. Once this is done the finaliser __del__ should be called. In order to pass a reference to an object to a finaliser then there must be a reference to it, in which case it ref-count should be non-zero.
To free an object, there can be no reference to it, so its ref-count should be zero.

CPython muddles this up a bit, it fails to keep finalization and deallocation separate.

So before a new GC can be implemented, finalisation needs sorting out:

So when an object has no references to it do the following
If object has a finalizer or a weakref:
      finalize_list.append(obj)
      while finalize_list:
            o =  finalize_list.pop()
            process_weakrefs(o)
            call_finalizer(o)
Otherwise:
      free_memory(obj)

A final detail is that objects must be finalized in topological order.
(PyPy handles this issue explicitly, it happens implicitly in ref-counted system)
This means that if object a refers to object b then a must be finalized before b. What happens when a and b form a cycle? CPython finalizes neither, PyPy choses an arbitrary order. I think PyPy is correct, but that's just my opinion, it hasn't been formalized.

Once that is done, we can implement a new GC. The new GC needs to provide ref-counting in order to support 3rd party code, but it shouldn't use ref-counting internally.

It is possible to have a mixed mode GC that supports ref-counting and tracing.
This paper explains how the ref-counting and tracing GC can work together:
http://www.research.ibm.com/people/d/dfb/papers/Bacon04Unified.pdf

The proposed implementation would work as follows:
It would be a generational collector.
Objects allocated internally by the VM are allocated using a bump-pointer allocator (the fastest possible allocator) in the nursery.
We will call these objects T-objects (traced-objects).

During a minor collection, live T-objects in the nursery are copied to the mature space.
In order to support generational collection, all T-objects must use a write-barrier when then are modified. A code analysis tool will be required to do this correctly.

Objects allocated by 3rd party code are allocated using a malloc-style allocator. We will call these R-objects (reference-counted objects). R-objects do not have a generation, they are always deallocated in the first minor collection that they become unreachable.

If any T-object has its reference count set, it is added to the T-counted set.
At collection, all objects in the T-counted set are removed from the T-counted set if their ref-count is 0, otherwise they are treated as roots by the garbage collector.

If an R-object has its reference count set to zero, it is added to a R-zero set.
At collection, if there are no traced references to an object in the R-zero set,
then it is freed, taking care to handle finalizers properly.

Collections would work as follows.

Minor Collection:
  • Scan C stack to find (conservatively) all T-objects that may be referred to (this includes internal pointers), appending to C-list.
  • Remove all objects from T-counted set that have 0 ref-count.
  • Trace heap (excluding any mature-space T-objects or any R-objects), from the following roots:
    • The T-counted set.
    • The C-list.
    • Global variables.
    • Modified objects in the mature space.

Reachable T-objects in the nursery are copied if they have a zero ref-count, otherwise they are promoted inplace.

Major Collection:
  • Scan C stack to find (conservatively) all T-objects that may be referred to (this includes internal pointers), appending to C-list.
  • Remove all objects from T-counted set that have 0 ref-count.
  • Trace heap, but do not trace any R-objects, with the following roots:
    • The T-counted set.
    • The C-list.
    • Global variables
In order to promote inplace, the heap must be divided into pages. This division can be used to rapidly determine if an object is a T-object or a R-object, by marking the page.

No comments:

Post a Comment