What is the Codex?

Some of you might have played with emulators which let your PC pretend to be the classic computers of your youth (the BBC Micro, Acorn Electron, Commodore 64 and so on). The emulators work by simulating the processors in those machines on your PC, so you end up with Chuckie Egg running on a virtual BBC Micro running on a PC running on whatever the underlying physics of the universe happen to be.

robhu found a puzzle based on a similar idea. The International Conference on Functional Programming have an annual contest. Last year’s was particularly fun:

In 1967, during excavation for the construction of a new shopping center in Monroeville, Pennsylvania, workers uncovered a vault containing a cache of ancient scrolls. Most were severely damaged, but those that could be recovered confirmed the existence of a secret society long suspected to have been active in the region around the year 200 BC… Among the documents found intact in the Monroeville collection was a lengthy codex, written in no known language and inscribed with superhuman precision.

You’re given the codex, and the specification for a processor used by the Cult of the Bound Variable. Once you’ve written an emulator for that processor, you’re off (unless, like me, you write the emulator in a few hours and spend half a day corrupting the files from the contest website with your editor and wondering why it isn’t working). There’s an ancient Unix system, complete with ancient spam, and various puzzles which I’ve not had a chance to look at yet.

According to reports on the contest, lots of people tried to write the emulator in high level languages where it ran like treacle. A C implementation like mine (which may count as a spoiler for people who want to do it themselves) without any namby-pamby array bounds checking nonsense works fast enough that you can’t tell you’re not talking to a real ancient Unix system. There’s life in the C language yet.

I’m lost in admiration for the people who produced the codex. It required writing a compiler backend targeting the fictional processor, and implementing various other languages on top of that, as well as coming up with the puzzles. I’m not sure how far I’ll get with the puzzles before my ability or patience expires, as I’m a hacker rather than a computer scientist, but still, the thing is amazingly cool. It’s a time sink for geeks, so I pass it on to those of you who, like me, will waste many days on it. Have fun.

10 thoughts on “What is the Codex?

  1. I suppose it depends on what you mean by ‘high level’ language.

    I would consider C# to be a high level language compared to C, and my implementation in that is not too shabby. According to the ICFP paper the contest designers did a C# implementation that ran half as fast as their C implementation (with Java taking about 2.2 times longer to run, and Python about 650 times).

    Page 11 onwards of the paper is safe to read btw (that’s the section on the Contest Technology). Well, safe except for possibly the score table on page 14.

  2. You might be interested in the benchmark speeds I’m getting here:
    My C# implementation (with all the safety turned on)

    SANDmark complete.
       
       real    3m42.484s
       user    3m39.497s
       sys     0m0.626s

    A C implementation I found online somewhere (edwark.c)

    SANDmark complete.
       
       real    0m57.565s
       user    0m56.553s
       sys     0m0.062s

    Your implementation

    SANDmark complete.
       
       real    0m56.692s
       user    0m55.533s
       sys     0m0.139s

    When we talked earlier you said your implementation was taking about 4 minutes on your computer (a G4 laptop?). It’s quite something that my (reasonably modern but not top of the range) laptop is so much faster. I have a Core 2 Duo (T5550) with 2MB of cache (per core), running at 1.66Ghz.

    1. Optimizing the sandmark is kind of a waste of time — you can speed it up a lot without much speeding up UMIX from the Codex. Better to use a ‘real’ sample workload.

      Also, the edwardk implementation blew up when I tried to run UMIX with it.

  3. The ICFP have had some great problems over the years, but I agree, this was one of the best.

    In terms of programming language efficiency, I’m intrigued by some of the ideas coming from the academic functional programming community recently. Some of the guys from the CU Computer Lab or Microsoft Research seem to have been talking about getting Haskell running with performance within a factor of 2 or 3 of C recently, not through the sort of low-level optimisations for which C is best known, but through much higher-level cleverness (such as never running code whose result never matters — a natural thing to do in a lazy functional world, but a difficult thing to get right in imperative C-land). Put that with industrial grade compilers that also implement the kind of low-level optimisations used by mainstream C compilers, and they’ll be within striking distance for most practical purposes. Then combine that with the fact that functional languages have natural advantages on multi-core machines, and the fact that they’re now dealing with concepts like I/O within their very elegant underlying models instead of via hacks, and I think the days of C (or C++) as the de facto standard high-performance language may be numbered.

    Now if only they could produce a language that worked with all those neat models underneath, and had a syntax that didn’t require a PhD to understand on top, they’d really be onto something. 🙂

  4. I think your implementation might be relying on undefined behaviour – at least, you’re using word_count without having initialised it, and this causes it not to work when I run it.

    Still, yours is better than mine, which doesn’t work at all yet.

    1. Argh. Thanks for pointing that out. I’ve fixed the copy on the web now.

      (The other thing it relies on is that a pointer will fit in a uint32, but at least I’ve commented that).

Leave a Reply

Your email address will not be published. Required fields are marked *