Some of you might have played with emulators which let your PC pretend to be the classic computers of your youth (the BBC Micro, Acorn Electron, Commodore 64 and so on). The emulators work by simulating the processors in those machines on your PC, so you end up with Chuckie Egg running on a virtual BBC Micro running on a PC running on whatever the underlying physics of the universe happen to be.
robhu found a puzzle based on a similar idea. The International Conference on Functional Programming have an annual contest. Last year’s was particularly fun:
In 1967, during excavation for the construction of a new shopping center in Monroeville, Pennsylvania, workers uncovered a vault containing a cache of ancient scrolls. Most were severely damaged, but those that could be recovered confirmed the existence of a secret society long suspected to have been active in the region around the year 200 BC… Among the documents found intact in the Monroeville collection was a lengthy codex, written in no known language and inscribed with superhuman precision.
You’re given the codex, and the specification for a processor used by the Cult of the Bound Variable. Once you’ve written an emulator for that processor, you’re off (unless, like me, you write the emulator in a few hours and spend half a day corrupting the files from the contest website with your editor and wondering why it isn’t working). There’s an ancient Unix system, complete with ancient spam, and various puzzles which I’ve not had a chance to look at yet.
According to reports on the contest, lots of people tried to write the emulator in high level languages where it ran like treacle. A C implementation like mine (which may count as a spoiler for people who want to do it themselves) without any namby-pamby array bounds checking nonsense works fast enough that you can’t tell you’re not talking to a real ancient Unix system. There’s life in the C language yet.
I’m lost in admiration for the people who produced the codex. It required writing a compiler backend targeting the fictional processor, and implementing various other languages on top of that, as well as coming up with the puzzles. I’m not sure how far I’ll get with the puzzles before my ability or patience expires, as I’m a hacker rather than a computer scientist, but still, the thing is amazingly cool. It’s a time sink for geeks, so I pass it on to those of you who, like me, will waste many days on it. Have fun.
I suppose it depends on what you mean by ‘high level’ language.
I would consider C# to be a high level language compared to C, and my implementation in that is not too shabby. According to the ICFP paper the contest designers did a C# implementation that ran half as fast as their C implementation (with Java taking about 2.2 times longer to run, and Python about 650 times).
Page 11 onwards of the paper is safe to read btw (that’s the section on the Contest Technology). Well, safe except for possibly the score table on page 14.
You might be interested in the benchmark speeds I’m getting here:
My C# implementation (with all the safety turned on)
A C implementation I found online somewhere (edwark.c)
Your implementation
When we talked earlier you said your implementation was taking about 4 minutes on your computer (a G4 laptop?). It’s quite something that my (reasonably modern but not top of the range) laptop is so much faster. I have a Core 2 Duo (T5550) with 2MB of cache (per core), running at 1.66Ghz.
… and I only save 10 seconds by compiling my C# in ‘unsafe’ mode.
Optimizing the sandmark is kind of a waste of time — you can speed it up a lot without much speeding up UMIX from the Codex. Better to use a ‘real’ sample workload.
Also, the edwardk implementation blew up when I tried to run UMIX with it.
Hi – what has piqued your interest in ICFP so long since the contest? 🙂
There’s life in the C language yet.
C is still our main implementation language at work.
Ours also, though there’s quite an increasing amount of Python floating around too.
The ICFP have had some great problems over the years, but I agree, this was one of the best.
In terms of programming language efficiency, I’m intrigued by some of the ideas coming from the academic functional programming community recently. Some of the guys from the CU Computer Lab or Microsoft Research seem to have been talking about getting Haskell running with performance within a factor of 2 or 3 of C recently, not through the sort of low-level optimisations for which C is best known, but through much higher-level cleverness (such as never running code whose result never matters — a natural thing to do in a lazy functional world, but a difficult thing to get right in imperative C-land). Put that with industrial grade compilers that also implement the kind of low-level optimisations used by mainstream C compilers, and they’ll be within striking distance for most practical purposes. Then combine that with the fact that functional languages have natural advantages on multi-core machines, and the fact that they’re now dealing with concepts like I/O within their very elegant underlying models instead of via hacks, and I think the days of C (or C++) as the de facto standard high-performance language may be numbered.
Now if only they could produce a language that worked with all those neat models underneath, and had a syntax that didn’t require a PhD to understand on top, they’d really be onto something. 🙂
I think your implementation might be relying on undefined behaviour – at least, you’re using word_count without having initialised it, and this causes it not to work when I run it.
Still, yours is better than mine, which doesn’t work at all yet.
Argh. Thanks for pointing that out. I’ve fixed the copy on the web now.
(The other thing it relies on is that a pointer will fit in a uint32, but at least I’ve commented that).