Changes! Now I’m free to work “full-time” on my GSoC project: last Friday was my last day at my old job (Ikwa).
Today I’ve read lots of docs and code, especially the pyparser code. I’ll try to maintain a list of the features I’m currently working on here.
Last week (ok, the Friday before last Friday) we went to Werneck’s cave to code. One thing I did there was implement the defaultdict object in the collections module. This module is originally written in C, so I needed to port it.
The code is relatively simple and I’ve just “copied” the C implementation (of course it’s clearer and more beautiful in Python). Werneck also helped me with the pyparser, but we couldn’t find a solution for supporting the full Python 2.5 Grammar.
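As a rough illustration of what such a port can look like (this is not the code that went into PyPy), here is a minimal sketch of a defaultdict-like class. The name DefaultDict is made up for the example, and it assumes an interpreter where dict lookups call the __missing__ hook, which CPython has done since 2.5:

```python
class DefaultDict(dict):
    """Illustrative pure-Python take on collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if default_factory is not None and not callable(default_factory):
            raise TypeError("first argument must be callable or None")
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value

    def __repr__(self):
        return "DefaultDict(%r, %s)" % (self.default_factory, dict.__repr__(self))


# Count words without checking for missing keys first.
counts = DefaultDict(int)
for word in "a b a c a".split():
    counts[word] += 1
```

Only item access through [] triggers the factory; get() and the in operator leave the dictionary untouched, just like the real defaultdict.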
Today I’ve worked on a small Python 2.5 change, the empty base classes list syntax.
Before Python 2.5 the list of base classes could not be empty, so you couldn’t define a class like this:
    class A():
        pass
The right way to do this was:
    class A:
        pass
Now, in Python 2.5, both ways are legal.
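A quick way to check this on whatever interpreter is at hand is to compile and run both spellings. This is only a sketch: define_class is a helper invented for the example, and it assumes Python 2.5 or later (on 2.4 the second compile would raise SyntaxError):

```python
OLD_SYNTAX = "class A:\n    pass\n"
NEW_SYNTAX = "class A():\n    pass\n"


def define_class(source):
    """Compile and run a class statement, returning the resulting class A."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["A"]


old_cls = define_class(OLD_SYNTAX)
new_cls = define_class(NEW_SYNTAX)

# Both spellings should yield a class with the same (default) bases.
same_bases = old_cls.__bases__ == new_cls.__bases__
print(same_bases)
```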
Last week I committed a slightly changed Grammar for Python 2.5 syntax support to PyPy’s 2.5-features branch (the only change I made was removing support for the new import syntax, as it crashes the parser; I need to work on that).
With part of the 2.5 Grammar already supported, the only change needed to support this new class definition syntax was in pyparser (more specifically, in the AST builder).
In Python 2.4 the build_classdef method expected either 4 atoms (< class keyword >, < class name >, < colon > and < body >) or 7 (the same 4 plus < ( >, < base classes > and < ) > before the < colon >). Now, in Python 2.5, it can receive 6 atoms too (< class keyword >, < class name >, < ( >, < ) >, < colon > and < body >). The change was really simple, but it was a good exercise for me because I could apply what I’m learning in my compilers classes at school (while reading the sources).
Some code (pypy/interpreter/pyparser/astbuilder.py:634):
    def build_classdef(builder, nb):
        ...
        if l == 4: # class NAME:
            basenames = []
            body = atoms[3]
        elif l == 6: # class NAME(): # 2.5
            basenames = []
            body = atoms[5]
        else:
            assert l == 7
            basenames = []
            body = atoms[6]
            base = atoms[3]
        ...
I don’t think I need to explain this code; it’s very simple (of course, it’s Python!).
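For anyone who wants to poke at the length-based dispatch outside PyPy, here is a toy stand-in. It is only a sketch: split_classdef is a made-up name, the atoms are plain strings (or a tuple of strings for several bases) instead of real parser tokens, and the handling of the base list is my guess at what the elided part does.

```python
def split_classdef(atoms):
    """Toy version of the dispatch: return (name, bases, body)."""
    n = len(atoms)
    name = atoms[1]
    if n == 4:    # class NAME : body
        bases, body = [], atoms[3]
    elif n == 6:  # class NAME ( ) : body   (new in 2.5)
        bases, body = [], atoms[5]
    elif n == 7:  # class NAME ( bases ) : body
        base = atoms[3]
        # Assumption: several bases arrive as one tuple, a single base as is.
        bases = list(base) if isinstance(base, tuple) else [base]
        body = atoms[6]
    else:
        raise ValueError("unexpected number of atoms: %d" % n)
    return name, bases, body


print(split_classdef(["class", "A", ":", "pass"]))
print(split_classdef(["class", "A", "(", ")", ":", "pass"]))
print(split_classdef(["class", "A", "(", ("B", "C"), ")", ":", "pass"]))
```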
I ran some tests and it seemed to work great, but I decided to compare the bytecode generated by PyPy with the bytecode generated by CPython (both 2.4 and 2.5). In CPython 2.4, the statement “class A: pass” produces the following bytecode:
>>> from dis import dis; c = compile("class A: pass", "/dev/null", "exec"); dis(c)
  1           0 LOAD_CONST               0 ('A')
              3 BUILD_TUPLE              0
              6 LOAD_CONST               1 (<code object A at 0x..., file "/dev/null", line 1>)
              9 MAKE_FUNCTION            0
             12 CALL_FUNCTION            0
             15 BUILD_CLASS
             16 STORE_NAME               0 (A)
             19 LOAD_CONST               2 (None)
             22 RETURN_VALUE
CPython 2.5 produces something slightly different for both statements (the old syntax and the new one):
>>> from dis import dis; c = compile("class A: pass", "/dev/null", "exec"); dis(c)
  1           0 LOAD_CONST               0 ('A')
              3 LOAD_CONST               3 (())
              6 LOAD_CONST               1 (<code object A at 0x..., file "/dev/null", line 1>)
              9 MAKE_FUNCTION            0
             12 CALL_FUNCTION            0
             15 BUILD_CLASS
             16 STORE_NAME               0 (A)
             19 LOAD_CONST               2 (None)
             22 RETURN_VALUE
The only difference between 2.4 and 2.5 is the BUILD_TUPLE instruction in 2.4 versus the LOAD_CONST in 2.5. I'm not a bytecode expert, but this looks like an optimization: instead of building an empty tuple at run time, 2.5 loads a prebuilt empty tuple constant. Of course, I may be completely wrong.
Well, as I expected, the bytecode produced by PyPy is identical to the one produced by CPython 2.4, so I think I have more things to change to complete this task.
But not tonight: after 14 hours of PyPy code reading and some debugging (plus Linear Programming and Computer Theory classes), I think I should sleep.
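This kind of comparison is easier to repeat with a small script than by eyeballing dis output. The sketch below assumes a modern CPython (dis.get_instructions appeared in 3.4, so it will not run on 2.4 or 2.5), and opnames is a helper name invented for the example. The opcodes it prints depend on the interpreter version and will not match the listings above.

```python
import dis


def opnames(source):
    """Return the opcode names generated for a piece of module-level source."""
    code = compile(source, "<example>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


old_ops = opnames("class A: pass")
new_ops = opnames("class A(): pass")

# The exact opcodes vary between interpreter versions.
print(old_ops)
# On a single interpreter, both spellings should compile to the same opcodes.
print(old_ops == new_ops)
```

Running the same script under two interpreters and diffing the printed lists gives the cross-version comparison.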
Tags: bytecode, compilers, google summer of code, gsoc, parser, pypy, python
Good luck!