Thursday 16 June 2011

Enhancing Builtin Functions.

A builtin function is a function written in a low-level programming language.
In CPython, builtin functions are written in C. In Jython, it's Java; IronPython uses C# and PyPy uses RPython.

Currently, in CPython, builtin functions come in four different types:

>>> type(list.append)
<class 'method_descriptor'>
>>> type(list.__add__)
<class 'wrapper_descriptor'>
>>> type(list.__new__)
<class 'builtin_function_or_method'>
>>> type([].__add__)
<class 'method-wrapper'>

The third one of these, 'builtin_function_or_method', doesn't seem to know whether it is a function or a method.

Only two types are actually required: builtin-functions and builtin-methods.
The difference between the two is how they act when used as a descriptor.
Builtin-methods act like Python functions, in that they return a bound-method when used as a descriptor.
Builtin-functions are not descriptors and are not bound to class instances.

>>> class MyList(list): pass

>>> MyList.m0 = print
>>> MyList.m1 = list.append
>>> l = MyList()

>>> l.m0
<built-in function print>
>>> l.m1
<built-in method append of MyList object at 0xb7ae6a04>

>>> MyList.m0 == l.m0
True
>>> MyList.m1 == l.m1
False

There is no requirement for l.m1 to be a "built-in method"; a normal bound-method would be fine.
Bizarrely, in CPython the type of the bound builtin method, l.m1, is the same as the type of the builtin function print.

>>> type(l.m1) == type(print)
True
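The descriptor distinction described above can also be checked directly. The following is a minimal sketch, assuming CPython 3; the names py_func, has_get, print_is_unbound and append_is_bound are made up for illustration. It tests for the presence of __get__, which is what makes an object bind when it is looked up through a class.

```python
def py_func(self):
    """A plain Python function, used here only for comparison."""
    return self

# Python functions and method descriptors define __get__, so they bind
# to an instance when looked up through a class. The builtin function
# print does not define __get__, so it is returned unchanged.
has_get = {
    "py_func": hasattr(py_func, "__get__"),
    "list.append": hasattr(list.append, "__get__"),
    "print": hasattr(print, "__get__"),
}

class MyList(list):
    pass

MyList.m0 = print
MyList.m1 = list.append
l = MyList()

# print comes back as the very same object; list.append comes back bound to l.
print_is_unbound = l.m0 is print
append_is_bound = l.m1.__self__ is l
```

In this sketch only print lacks __get__, which is why l.m0 and MyList.m0 are the same object while l.m1 and MyList.m1 are not.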

First of all, these types need to be rationalised a bit:
  • Rename method_descriptor to builtin_method.
  • Rename builtin_function_or_method to builtin_function.
  • When a builtin_method is bound to a class instance, it should produce a (bound) method, not a builtin_function_or_method.
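
The last point can be approximated in today's CPython by binding by hand. This is only a sketch of the intended result, not how the VM would implement it: it uses types.MethodType to wrap list.append in an ordinary bound method, and the variable names are made up for illustration.

```python
import types

class MyList(list):
    pass

l = MyList()

# Bind the unbound builtin method to an instance by hand. The result is
# an ordinary bound method object rather than a
# builtin_function_or_method.
bound = types.MethodType(list.append, l)
bound(42)  # equivalent to list.append(l, 42)

bound_type_name = type(bound).__name__
underlying = bound.__func__   # the original method_descriptor
receiver = bound.__self__     # the instance it is bound to
```

The bound object keeps the builtin method and the instance as separate attributes, just as a bound Python function does.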

In order to optimise calls to builtin functions, we need to know something about them. Currently a builtin function can take a limited range of parameter formats.
The allowed formats are:
  • f(self)
  • f(self, other)
  • f(self, *pos, **kws)
The types of the parameters cannot be specified.
For example, the __add__ method of int will fail if its first parameter is not an int, but this information is not available to the VM or the programmer.
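
The int.__add__ case can be seen by calling the unbound method directly. A minimal sketch, assuming CPython 3 (the variable names are illustrative): the type of the first parameter is enforced only when the call is made, while the second parameter may be any object.

```python
# The receiver must be an int; anything else is rejected at call time
# with a TypeError, because the requirement is not declared anywhere
# that the VM or the programmer could inspect in advance.
try:
    int.__add__("1", 2)
except TypeError as exc:
    receiver_error = str(exc)
else:
    receiver_error = None

# The second argument may be any object. Unsupported operand types are
# signalled by returning NotImplemented rather than by raising.
other_result = int.__add__(1, "2")

# With two ints the call succeeds as expected.
ok_result = int.__add__(1, 2)
```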

I propose allowing a wider range of parameter formats and allowing the parameter types to be specified. Any number of parameters between 0 and 3 (maybe 4) should be allowed, with or without * parameters or ** keyword parameters.

For example, the int.__add__ builtin_method would take 2 parameters, with the parameter types (int, object).
The list.__setitem__ builtin_method would take 3 parameters, with the parameter types (list, object, object).
The print builtin_function would take 0 parameters plus * and ** parameters: print(*args, **kws).
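
To make the three examples above concrete, here is a rough model of what such a declaration might carry. Everything in it is hypothetical: BuiltinSignature, its fields and the accepts method are invented for illustration and are not part of CPython. In a real VM this information would live in C-level data, not in a Python class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BuiltinSignature:
    """Hypothetical description of a builtin's parameter format."""
    name: str
    param_types: tuple       # types of the fixed positional parameters
    varargs: bool = False    # True if the builtin accepts *args
    kwargs: bool = False     # True if the builtin accepts **kws

    def accepts(self, *args, **kws):
        """Return True if the call shape and argument types match."""
        fixed = len(self.param_types)
        if len(args) < fixed:
            return False
        if len(args) > fixed and not self.varargs:
            return False
        if kws and not self.kwargs:
            return False
        # Only the fixed parameters carry declared types.
        return all(isinstance(a, t) for a, t in zip(args, self.param_types))

# The three examples from the text, expressed in this hypothetical form.
INT_ADD = BuiltinSignature("int.__add__", (int, object))
LIST_SETITEM = BuiltinSignature("list.__setitem__", (list, object, object))
PRINT = BuiltinSignature("print", (), varargs=True, kwargs=True)
```

With a declaration like this, a check such as INT_ADD.accepts("x", 1) could be answered without calling the function at all, which is the kind of information an optimiser would want.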

The wrapper_descriptor class can then be deleted as builtin_method can
fulfil its role.

In summary, this change would allow all builtin-functions to have a consistent interface to the VM, which would assist optimisation. It would also reduce code size by removing a number of classes.
