(Here's an email Carl de Marcken of ITA Software sent to
a friend, describing their experiences using Lisp in one of the software
industry's most demanding
applications.)
Date: Fri, 12 Jan 2001 15:42:34 -0500
From: Carl de Marcken
Geoffrey,
Here are some tidbits...
1. Right now Sabre, Galileo, Amadeus and Worldspan operate many
millions of dollars of IBM and Unisys mainframes each to
answer the vast majority of queries done by airline phone agents,
airport desk agents, travel agents, and travel web sites (other than
our own and our customers'). Their computers are housed in
bomb-proof, fire-walled (literally) complexes in Kansas City, Denver,
Germany and Atlanta, and mostly run assembly language code for
performance reasons. From what we can discern, their algorithms are
basic: until we pointed it out to them I don't think they had any
understanding of how hard the problem they're trying to solve is, or
how far their solutions are from optimal.
2. ITA Software is slowly replacing the industry's hardware and
software with Common Lisp code running on Linux PCs, that uses
relatively involved algorithms that show off our academic CS
background. The easiest place to see the code in action is on our web
site, www.itasoftware.com.
3. The vast majority of our "thinking" code is in Common Lisp. We run
both CMUCL and Franz, under Linux/Intel, HPUX/PA, and NT/Intel, and
have about 200,000 lines of Lisp in our base search engine. Our web
site page generation code is also largely written in Common Lisp,
though there's also a fair bit of Java there.
4. Because we have about 2 gigs of static data we need rapid access
to, we use C++ code to memory-map huge files containing pointerless C
structs (of flights, fares, etc), and then access these from Common
Lisp using foreign data accesses. A struct field access compiles into
two or three instructions, so there's not really any performance
penalty for accessing C rather than Lisp objects. By doing this, we
keep the Lisp garbage collector from seeing the data (to Lisp, each
pointer to a C object is just a fixnum, though we do often temporarily
wrap these pointers in Lisp objects to improve debuggability). Our
Lisp images are therefore only about 250 megs of "working" data
structures and code.
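(To make the mechanism above concrete, here is a minimal sketch, not ITA's
actual code: it assumes the mapped region is reachable through a hypothetical
WORD-REF reader, which on CMUCL could be built from SYSTEM:SAP-REF-32 and on
Franz from its own foreign-access primitives, and the field offsets are made
up for illustration.)

    (defvar *flight-region* nil
      "Base of the memory-mapped flight file (assumed to be set up by the C++ side).")

    (defconstant +flight-departure-offset+ 0)   ; assumed struct layout
    (defconstant +flight-arrival-offset+ 4)

    (declaim (inline flight-departure-time flight-arrival-time))

    (defun flight-departure-time (flight)
      "FLIGHT is a fixnum byte offset into the mapped file, not a Lisp object."
      (declare (type fixnum flight))
      (word-ref *flight-region* (+ flight +flight-departure-offset+)))

    (defun flight-arrival-time (flight)
      (declare (type fixnum flight))
      (word-ref *flight-region* (+ flight +flight-arrival-offset+)))

    ;; Wrapper used only while debugging: a real Lisp object around the raw
    ;; fixnum, so the inspector shows something readable.
    (defstruct (flight-handle (:constructor wrap-flight (offset)))
      (offset 0 :type fixnum))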
5. Every query that hits our site gets sent via TCP/IP to a Lisp
process running on a dual 800MHz x86 Linux box with 2g of ram ($3000,
vs about $1,000,000 for a similarly capable mainframe), and the
process devotes between 5 and 15 seconds of CPU time to it. One of
our customers will have 200 such boxes, each running 2 or 3 Lisp
processes. We save on ram by putting multiple processes on one box,
since the virtual memory system automatically shares our read-only
memory-mapped files between processes.
6. If you want to do a simple round-trip from BOS to LAX in two weeks,
coming back in three, willing to entertain a 24 hour departure window
for both parts, then limiting to "reasonable" routes (at most 3
flights and at most 10 hours or so) you have about 5,000 ways to get
there and 5,000 ways to get back. Listing them is a mostly trivial
graph-search (there are a few minor complications, but not many), that
anybody could do in a fraction of a second.
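(A rough sketch of that graph search, with an assumed data layout rather than
ITA's: FLIGHTS-FROM maps each airport to the flights leaving it, and routes of
at most MAX-LEGS flights are enumerated; departure windows and connection
times are left out.)

    (defun find-routes (origin destination flights-from &key (max-legs 3))
      "FLIGHTS-FROM: hash table, airport code -> list of (flight . next-airport)."
      (let ((routes '()))
        (labels ((extend (airport legs)
                   (cond ((string= airport destination)
                          (push (reverse legs) routes))
                         ((< (length legs) max-legs)
                          (dolist (hop (gethash airport flights-from))
                            (extend (cdr hop) (cons (car hop) legs)))))))
          (extend origin '()))
        routes))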
7. The real challenge is that a single fixed itinerary (a fixed set of
flights from BOS to LAX and a fixed set back) with only two flights in
each direction may have more than 10,000 possible combinations of
applicable "fares", each fare with complex restrictions that must be
checked against the flights and the other fares. That means that the
search space for this simple trip is of the order 5000 x 5000 x 10000,
and a naive program would need to do a _lot_ of computation just to
validate each of these possibilities. Suitably formalized, it's not
even clear that the problem of finding the cheapest flight is
NP-complete, since it is difficult to put a bound on the size of the
solution that will result in the cheapest price. If you're willing to
dispense with restrictions on the energy in the universe, then it is
actually possible to formalize the cheapest-price problem in a
not-too-unreasonable way that leads to a proof of undecidability by
reduction from the Post correspondence problem :-).
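(Back-of-envelope on the numbers above:)

    (* 5000 5000 10000)                 ; => 250000000000 flight/fare combinations
    (/ (* 5000 5000 10000) 1e6 86400)   ; => ~2.9 CPU-days at a microsecond per validation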
8. Our Lisp code is running very clever algorithms that let us produce
in a reasonable time a data structure we call the "pricing-graph" from
which we can very efficiently answer a query of the form "give me the
k-th best solution (a validated set of flights and fares), ordered
according to the function f", assuming of course certain restrictions
on f, where the number of answers represented by the pricing-graph is
10^20 - 10^30 depending on the type of trip. In this way, we can
reasonably claim that in 10 seconds we can produce 10^30 answers, even
if we could not possibly enumerate the list of such answers.
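(The pricing-graph itself isn't described in detail here, but the flavor of
representing 10^30 answers compactly can be sketched with an assumed AND/OR
structure: OR nodes choose one child, AND nodes combine all of theirs, and the
counts multiply out without any answer ever being enumerated.)

    (defstruct pnode kind children)   ; KIND is :leaf, :or, or :and (assumed shape)

    (defun solution-count (node)
      "How many distinct flight/fare combinations NODE represents."
      (ecase (pnode-kind node)
        (:leaf 1)
        (:or  (reduce #'+ (pnode-children node) :key #'solution-count))
        (:and (reduce #'* (pnode-children node) :key #'solution-count))))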
9. We can do 10 seconds of Lisp computation on a 800mhz box and cons
less than 5k of data. This is because we pre-allocate all data
structures we need and die on queries that exceed them. This may make
many Lisp programmers cringe, but with a 250 meg image and real-time
constraints, we can't afford to generate garbage. For example, rather
than using cons, we use "cons!", which grabs cells from an array of
10,000,000 cells we've preallocated and which gets reset every query.
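(A simplified sketch of that cons pool, not the production code: every cell is
allocated up front, cons! hands them out, a query that runs past the pool
dies, and reset-pool! recycles everything for the next query.)

    (defconstant +pool-size+ 10000000)

    (defvar *cons-pool*
      (let ((pool (make-array +pool-size+)))
        (dotimes (i +pool-size+ pool)
          (setf (aref pool i) (cons nil nil)))))

    (defvar *pool-index* 0)

    (defun cons! (x y)
      (when (>= *pool-index* +pool-size+)
        (error "Query exceeded the preallocated cons pool"))
      (let ((cell (aref *cons-pool* *pool-index*)))
        (incf *pool-index*)
        (setf (car cell) x
              (cdr cell) y)
        cell))

    (defun reset-pool! ()
      ;; Called between queries; the cells themselves are never freed.
      (setf *pool-index* 0))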
10. A lot of our Lisp is designed to compile into very efficient
assembly. We make a lot of use of Lisp's macro capabilities, but shy
away from many other Lisp features like closures, generic functions,
complex sequence functions and garbage collection. We're doing an
incredible amount of computation - getting 10 seconds on a modern
machine is an incredible gift - but if we're sloppy at all 10 seconds
can turn into ten minutes, not adequate for a travel agent or web
site. We disassemble most every Lisp function looking for
inefficiencies and have had both CMUCL and Franz enhanced to compile
our code better.
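(Illustrative only, not ITA's code: the sort of fully type-declared function
that CMUCL and Franz will compile down to a few instructions, and whose output
one then inspects with disassemble.)

    (declaim (optimize (speed 3) (safety 0) (debug 0)))

    (defun add-fare-components (a b)
      (declare (type (unsigned-byte 28) a b))
      (the fixnum (+ a b)))

    ;; (disassemble 'add-fare-components)   ; look for boxing, full calls, branches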
11. Occasionally we've had to move code from Lisp to C++, usually
because of data loading issues (Lisp garbage collectors just can't
deal with gigs of data, and there's no way to rapidly load gigs of
data into a Lisp). Our experience has been a 10 to 1 code expansion;
I don't think there are any programmers in our company that regret the
choice of Common Lisp.
12. We've had very little trouble getting non-Lisp programmers to read
and understand and extend our Lisp code. The only real problem is
that the training most programmers have in Lisp has taught them to
code very inefficiently, without paying any attention to the compiler.
Of course, with things like STL and Java, I think programmers of other
languages are also becoming pretty ignorant.
Date: Tue, 15 Jan 2002 17:49:04 -0800
From: Carl de Marcken
Paul,
I don't have any problems with it going up on a site, but
please make a note that this message is old and the world is constantly
changing: we now have thousands of CPUs running our code, and various
airlines and major web sites (Orbitz, e.g.) depending on it. The mainframes
are disappearing as our stuff replaces them.
|
|