Sep 17

Genshi Python 3 Sprint

Awesome Sprint report from Simon Cross:

Nine of us gathered at the Yola offices in Cape Town on the morning of September 4th to attempt porting Genshi to Python 3.

The Genshi sprinters

The first hour passed quickly, taken up by exploring the Genshi code base and reading up on others' experiences migrating to Python 3. Various admin tasks -- dishing out repository access and making coffee -- happened in the background.

Once everyone had set up and had some idea of the scope of the task at hand we settled on a two-phase approach. The first phase would be to create a branch that would pass Genshi's test suite under Python 3 but without maintaining compatibility with Python 2. When that was complete we would move on to creating a single codebase that supported both Python 2 and 3, possibly via 2to3.

We kicked off the first phase by running 2to3 and committing the result to the branch. This left us with syntactically valid Python 3 code and a lot of failing tests. The majority of failures resulted from:

- Python 2 strings that needed to be changed to byte strings.
- Python 2 AST node handling that needed to be updated.
- Unicode string literals in doctest results and inside other string constants that needed to be translated.

We eventually created a custom 2to3 fixer for handling the unicode string literals.
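To give a feel for the unicode literal problem: Python 2 shows unicode strings as u'...' in doctest output, while Python 3 shows plain '...', so the expected output has to be rewritten. The snippet below is only a rough regex-based stand-in for that idea, not the fixer written at the sprint (which plugged into 2to3); the name strip_u_prefix is made up for illustration.

```python
import re

# Matches a "u" or "U" string prefix that directly precedes a quote and is
# not the tail of a longer identifier (so "menu'" is left alone).
_U_PREFIX = re.compile(r"(?<![A-Za-z0-9_])[uU](?=['\"])")


def strip_u_prefix(expected_output):
    """Drop Python 2 style u'' prefixes from a piece of doctest output.

    Hypothetical helper: it is deliberately naive and does not handle
    ur'' prefixes or text that legitimately contains u' sequences.
    """
    return _U_PREFIX.sub("", expected_output)
```

For example, strip_u_prefix("[u'a', u'b']") returns "['a', 'b']". A real fixer works on the parsed source rather than on raw text, which is what makes it safe to apply across a whole code base.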

Genshi supports both encoded byte streams and unicode strings in its parsers and generators, and it handles the two cases quite cleanly, but it defaults to assuming that input and output are UTF-8 bytes. This made sense for Python 2, where byte strings are the default (or at least one less character to type), but makes less sense for Python 3, where unicode strings are the default. We decided to change the Genshi defaults in our port to unicode for both Python 2 and 3, in the hope of helping Genshi users support both versions more easily in their own code. This API change didn't require changes to the applications we tested against, but we haven't tried any particularly large or complex ones yet.
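As a rough illustration of what "defaulting to unicode" means for a caller, here is a minimal sketch written for Python 3. The render function and its encoding parameter are hypothetical and are not Genshi's actual API; the point is only that text comes back unless the caller explicitly asks for bytes.

```python
def render(chunks, encoding=None):
    """Join text chunks; return text by default, bytes only on request.

    Hypothetical function for illustration, not part of Genshi.
    """
    text = "".join(chunks)
    if encoding is None:
        # New-style default: hand back a unicode string.
        return text
    # Old-style behaviour is still available by naming an encoding.
    return text.encode(encoding)
```

With a default like this, code that previously relied on getting UTF-8 bytes would pass encoding="utf-8" explicitly, and everything else keeps working with text on both Python versions.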

Around dinner time we had a branch that passed the test suite under Python 3 and ran some simple example applications (for the purposes of which we did a very rough port of FormEncode) so we broke for pizza.

Fuelled by pizza and coffee from our Zen-of-Python mugs we dived into stage two. This consisted of grabbing sets of changes from phase one and deciding how best to incorporate them in a unified code base. We settled on a hybrid approach: running 2to3 (via Distribute), a small module of useful compatibility utilities, and some "if IS_PYTHON2: ... else: ..." blocks.
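A compatibility module of this kind might look something like the sketch below. Only the IS_PYTHON2 flag comes from the sprint; the names text_type, binary_type and to_text are assumptions made up for illustration and may not match what ended up in the branch.

```python
import sys

# True on Python 2, False on Python 3.
IS_PYTHON2 = sys.version_info[0] == 2

if IS_PYTHON2:
    text_type = unicode  # noqa: F821 (name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_text(value, encoding="utf-8"):
    """Return value as a unicode string on either major version.

    Hypothetical helper, not taken from the Genshi code base.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return text_type(value)
```

Keeping the version checks in one small module means the rest of the code can call helpers like to_text without caring which interpreter it is running on, and 2to3 is left to deal with the purely mechanical differences.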

We had originally thought we might avoid the use of 2to3 and simply have one codebase but the number of simple cases handled successfully by 2to3 convinced us that including it was worth the slight inconvenience.

Once stage two was under control there was a sub-sprint to port Genshi's C extension module which went surprisingly swiftly and smoothly (a scattering of #ifdefs and it was done).

At around 2:00am on Sunday morning, after roughly fifteen hours of coding, we had a branch that passed the entire test suite under Python 2.4 to 2.6 and under Python 3.1 after applying 2to3 so we declared victory and staggered home.

Of course the sprint is never truly over until the code is committed -- currently the patch has been posted to the Genshi mailing list and discussions are underway to get it included (as is often the case the biggest stumbling block is finding someone with time and commit access :).

The two-stage approach worked well for us, allowing us to concentrate on discovering what changes were needed in the first stage without the overhead of continually running 2to3 or mentally switching between Python 2 and Python 3.

Links to the sprint repository are available on the sprint planning page: