Carl de Marcken: Inside Orbitz

(Here's an email Carl de Marcken of ITA Software sent to a friend, describing their experiences using Lisp in one of the software industry's most demanding applications.)

Date: Fri, 12 Jan 2001 15:42:34 -0500
From: Carl de Marcken

Geoffrey,

Here are some tidbits...

1. Right now Sabre, Galileo, Amadeus and Worldspan operate many millions of dollars of IBM and Unisys mainframes each to answer the vast majority of queries done by airline phone agents, airport desk agents, travel agents, and travel web sites (other than our own and our customers'). Their computers are housed in bomb-proof, fire-walled (literally) complexes in Kansas City, Denver, Germany and Atlanta, and mostly run assembly language code for performance reasons. From what we can discern, their algorithms are basic: until we pointed it out to them I don't think they had any understanding of how hard the problem they're trying to solve is, or how far their solutions are from optimal.

2. ITA Software is slowly replacing the industry's hardware and software with Common Lisp code running on Linux PCs, that uses relatively involved algorithms that show off our academic CS background. The easiest place to see the code in action is on our web site, www.itasoftware.com.

3. The vast majority of our "thinking" code is in Common Lisp. We run both CMUCL and Franz, under Linux/Intel, HPUX/PA, and NT/Intel, and have about 200,000 lines of Lisp in our base search engine. Our web site page generation code is also largely written in Common Lisp, though there's also fair bit of Java there.

4. Because we have about 2 gigs of static data we need rapid access to, we use C++ code to memory-map huge files containing pointerless C structs (of flights, fares, etc), and then access these from Common Lisp using foreign data accesses. A struct field access compiles into two or three instructions, so there's not really any performance. penalty for accessing C rather than Lisp objects. By doing this, we keep the Lisp garbage collector from seeing the data (to Lisp, each pointer to a C object is just a fixnum, though we do often temporarily wrap these pointers in Lisp objects to improve debuggability). Our Lisp images are therefore only about 250 megs of "working" data structures and code.

5. Every query that hits our site gets sent via tcpip to a Lisp process running on an dual 800mhz x86 Linux box with 2g of ram ($3000, vs about $1,000,000 for a similarly capable mainframe), and the process devotes between 5 and 15 seconds of CPU time to it. One of our customers will have 200 such boxes, each running 2 or 3 Lisp processes. We save on ram by putting multiple processes on one box, since the virtual memory system automatically shares our read-only memory-mapped files between processes.

6. If you want to do a simple round-trip from BOS to LAX in two weeks, coming back in three, willing to entertain a 24 hour departure window for both parts, then limiting to "reasonable" routes (at most 3 flights and at most 10 hours or so) you have about 5,000 ways to get there and 5,000 ways to get back. Listing them is a mostly trivial graph-search (there are a few minor complications, but not many), that anybody could do in a fraction of a second.

7. The real challenge is that a single fixed itinerary (a fixed set of flights from BOS to LAX and a fixed set back) with only two flights in each direction may have more than 10,000 possible combinations of applicable "fares", each fare with complex restrictions that must be checked against the flights and the other fares. That means that the search space for this simple trip is of the order 5000 x 5000 x 10000, and a naive program would need to do a _lot_ of computation just to validate each of these possibilities. Suitably formalized, its not even clear that the problem of finding the cheapest flight is NP-complete, since it is difficult to put a bound on the size of the solution that will result in the cheapest price. If you're willing to dispense with restrictions on the energy in the universe, then it is actually possible to formalize the cheapest-price problem in a not-too-unreasonable way that leads to a proof of undecidability by reduction to the Post correspondance problem :-).

8. Our Lisp code is running very clever algorithms that let us produce in a reasonable time a data structure we call the "pricing-graph" from which we can very efficiently answer a query of the form "give me the k-th best solution (a validated set of flights and fares), ordered according to the function f", assuming of course certain restrictions on f, where the number of answers represented by the pricing-graph is 10^20 - 10^30 depending on the type of trip. In this way, we can reasonably claim that in 10 seconds we can produce 10^30 answers, even if we could not possibly enumerate the list of such answers.

9. We can do 10 seconds of Lisp computation on a 800mhz box and cons less than 5k of data. This is because we pre-allocate all data structures we need and die on queries that exceed them. This may make many Lisp programmers cringe, but with a 250 meg image and real-time constraints, we can't afford to generate garbage. For example, rather than using cons, we use "cons!", which grabs cells from an array of 10,000,000 cells we've preallocated and which gets reset every query.

10. A lot of our Lisp is designed to compile into very efficient assembly. We make a lot of use of Lisp's macro capabilities, but shy away from many other Lisp features like closures, generic functions, complex sequence functions and garbage collection. We're doing an incredible amount of computation - getting 10 seconds on a modern machine is an incredible gift - but if we're sloppy at all 10 seconds can turn into ten minutes, not adequate for a travel agent or web site. We disassemble most every Lisp function looking for inefficiencies and have had both CMUCL and Franz enhanced to compile our code better.

11. Occasionally we've had to move code from Lisp to C++, usually because of data loading issues (Lisp garbage collectors just can't deal with gigs of data, and there's no way to rapidly load gigs of data into a Lisp). Our experience has been a 10 to 1 code expansion; I don't think there are any programmers in our company that regret the choice of Common Lisp.

12. We've had very little trouble getting non-Lisp programmers to read and understand and extend our Lisp code. The only real problem is that the training most programmers have in Lisp has taught them to code very inefficiently, without paying any attention to the compiler. Of course, with things like STL and Java, I think programmers of other languages are also becoming pretty ignorant.

Date: Tue, 15 Jan 2002 17:49:04 -0800
From: Carl de Marcken

Paul,

I don't have any problems with it going up on a site, but please make a note that this message is old and the world is constantly changing: we now have thousands of CPUs running our code, and various airlines and major web sites (Orbitz, e.g.) depending on it. The mainframes are disappearing as our stuff replaces it.