Before Source.
In the early days of computing machinery, there was no such thing as source code. Alan Turing purportedly liked to talk to the machine in binary. Grace Hopper, who invented an early compiler, worked as close to the Harvard Mark I as she could get: flipping switches and plugging and unplugging relays that made up the "code" of what the machine would do. Such mechanical and meticulous work hardly merits the terms reading and writing; there were no GOTO statements, no line numbers, only calculations that had to be translated from the pseudo-mathematical writing of engineers and human computers to a physical or mechanical configuration.4 Writing and reading source code and programming languages was a long, slow development that became relatively widespread only by the mid-1970s. So-called higher-level languages began to appear in the late 1950s: FORTRAN, COBOL, Algol, and the "compilers" which allowed for programs written in them to be transformed into the illegible mechanical and valvular representations of the machine. It was in this era that the terms source language and target language emerged to designate the activity of translating higher- to lower-level languages.5

There is a certain irony about the computer, not often noted: the unrivaled power of the computer, if the ubiquitous claims are believed, rests on its general programmability; it can be made to do any calculation, in principle. The so-called universal Turing machine provides the mathematical proof.6 Despite the abstract power of such certainty, however, we do not live in the world of The Computer-we live in a world of computers. The hardware systems that manufacturers created from the 1950s onward were so specific and idiosyncratic that it was inconceivable that one might write a program for one machine and then simply run it on another. "Programming" became a bespoke practice, tailored to each new machine, and while programmers of a particular machine may well have shared programs with each other, they would not have seen much point in sharing with users of a different machine. Likewise, computer scientists shared mathematical descriptions of algorithms and ideas for automation with as much enthusiasm as corporations jealously guarded theirs, but this sharing, or secrecy, did not extend to the sharing of the program itself. The need to "rewrite" a program for each machine was not just a historical accident, but was determined by the needs of designers and engineers and the vicissitudes of the market for such expensive machines.7

In the good old days of computers-the-size-of-rooms, the languages that humans used to program computers were mnemonics; they did not exist in the computer, but on a piece of paper or a specially designed code sheet. The code sheet gave humans who were not Alan Turing a way to keep track of, to share with other humans, and to think systematically about the invisible light-speed calculations of a complicated device. Such mnemonics needed to be "coded" on punch cards or tape; if engineers conferred, they conferred over sheets of paper that matched up with wires, relays, and switches-or, later, printouts of the various machine-specific codes that represented program and data.
With the introduction of programming languages, the distinction between a "source" language and a "target" language entered the practice: source languages were "translated" into the illegible target language of the machine. Such higher-level source languages were still mnemonics of sorts-they were certainly easier for humans to read and write, mostly on yellowing tablets of paper or special code sheets-but they were also structured enough that a source language could be input into a computer and translated into a target language which the designers of the hardware had specified. Inputting commands and cards and source code required a series of actions specific to each machine: a particular card reader or, later, a keypunch with a particular "editor" for entering the commands. Properly input and translated source code provided the machine with an assembled binary program that would, in fact, run (calculate, operate, control). It was a separation, an abstraction that allowed for a certain division of labor between the ingenious human authors and the fast and mechanical translating machines.
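The source/target distinction can be made concrete with a minimal sketch, given in C rather than in the FORTRAN or Algol of the era (C appears later in this chapter); the file name and compiler invocation are illustrative assumptions, not historical artifacts.

/* square.c: a few lines of "source language," structured,
   human-readable text. A compiler translates it into a
   machine-specific "target language": assembly, and finally
   the binary the machine runs. */

#include <stdio.h>

/* The compiler, not the author, decides which registers and
   machine instructions realize this arithmetic on a given
   architecture. */
static int square(int n)
{
    return n * n;
}

int main(void)
{
    printf("%d\n", square(12)); /* prints 144 */
    return 0;
}

On a UNIX-like system, the command cc -S square.c halts the translation midway and emits human-readable assembly, the legible face of the target language, while cc square.c carries the translation all the way to the binary that the machine actually executes.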
Even after the invention of programming languages, programming "on" a computer-sitting at a glowing screen and hacking through the night-was still a long time in coming. For example, only by about 1969 was it possible to sit at a keyboard, write source code, instruct the computer to compile it, then run the program-all without leaving the keyboard-an activity that was all but unimaginable in the early days of "batch processing."8 Very few programmers worked in such a fashion before the mid-1970s, when text editors that allowed programmers to see the text on a screen rather than on a piece of paper started to proliferate.9 We are, by now, so familiar with the image of the man or woman sitting at a screen interacting with this device that it is nearly impossible to imagine how such a seemingly obvious practice was achieved in the first place-through the slow accumulation of the tools and techniques for working on a new kind of writing-and how that practice exploded into a Babel of languages and machines that betrayed the promise of the general-purpose computing machine.
The proliferation of different machines with different architectures drove a desire, among academics especially, for the standardization of programming languages, not so much because any single language was better than another, but because it seemed necessary to most engineers and computer users to share an emerging corpus of algorithms, solutions, and techniques of all kinds, necessary to avoid reinventing the wheel with each new machine. Algol, a streamlined language suited to algorithmic and algebraic representations, emerged in the early 1960s as a candidate for international standardization. Other languages competed on different strengths: FORTRAN for scientific and engineering calculation, COBOL for business use, and LISP for symbolic processing. At the same time, the desire for a standard "higher-level" language necessitated a bestiary of translating programs: compilers, parsers, lexical analyzers, and other tools designed to transform the higher-level (human-readable) language into a machine-specific lower-level language, that is, machine language, assembly language, and ultimately the mystical zeroes and ones that course through our machines. The idea of a standard language and the necessity of devising specific tools for translation are the origin of the problem of portability: the ability to move software-not just good ideas, but actual programs, written in a standard language-from one machine to another.
A standard source language was seen as a way to counteract the proliferation of different machines with subtly different architectures. Portable source code would allow programmers to imagine their programs as ships, stopping in at ports of call, docking on different platforms, but remaining essentially mobile and unchanged by these port-calls. Portable source code became the Esperanto of humans who had wrought their own Babel of tribal hardware machines.
Meanwhile, for the computer industry in the 1960s, portable source code was largely a moot point. Software and hardware were two sides of a single, extremely expensive coin-no one, except engineers, cared what language the code was in, so long as it performed the task at hand for the customer. Each new machine needed to be different, faster, and, at first, bigger, and then smaller, than the last. The urge to differentiate machines from each other was not driven by academic experiment or aesthetic purity, but by a demand for marketability, competitive advantage, and the transformation of machines and software into products. Each machine had to do something really well, and it needed to be developed in secret, in order to beat out the designs and innovations of competitors. In the 1950s and 1960s the software was a core component of this marketable object; it was not something that in itself was differentiated or separately distributed-until IBM's famous decision in 1968 to "unbundle" software and hardware.
Before the 1970s, employees of a computer corporation wrote software in-house. The machine was the product, and the software was just an extra line-item on the invoice. IBM was not the first to conceive of software as an independent product with its own market, however. Two companies, Informatics and Applied Data Research, had explored the possibilities of a separate market in software.10 Informatics, in particular, developed the first commercially successful software product, a business-management system called Mark IV, which in 1967 cost $30,000. Informatics' president Walter Bauer "later recalled that potential buyers were 'astounded' by the price of Mark IV. In a world accustomed to free software the price of $30,000 was indeed high."11 IBM's unbundling decision marked a watershed, the point at which "portable" source code became a conceivable idea, if not a practical reality, to many in the industry.12 Rather than providing a complete package of hardware and software, IBM decided to differentiate its products: to sell software and hardware separately to consumers.13 But portability was not simply a technical issue; it was a political-economic one as well. IBM's decision was driven both by its desire to create IBM software that ran on all IBM machines (a central goal of the famous OS/360 project overseen and diagnosed by Frederick Brooks) and by an antitrust suit filed by the U.S. Department of Justice.14 The antitrust suit included as part of its claims the suggestion that the close tying of software and hardware represented a form of monopolistic behavior, and it prompted IBM to consider strategies to "unbundle" its product.
Portability in the business world meant something specific, however. Even if software could be made portable at a technical level-transferable between two different IBM machines-this was certainly no guarantee that it would be portable between customers. One company's accounting program, for example, might not suit another's practices. Portability was therefore hindered both by the diversity of machine architectures and by the diversity of business practices and organization. IBM and other manufacturers saw no benefit to standardizing source code, as it could only provide an advantage to competitors.15 Portability was thus not simply a technical problem-the problem of running one program on multiple architectures-but also a kind of political-economic problem. The meaning of product was not always the same as the meaning of hardware or software, but was usually some combination of the two. At that early stage, the outlines of a contest over the meaning of portable or shareable source code are visible, both in the technical challenges of creating high-level languages and in the political-economic challenges that corporations faced in creating distinctive proprietary products.
The UNIX Time-Sharing System.
Set against this backdrop, the invention, success, and proliferation of the UNIX operating system seems quite monstrous, an aberration of both academic and commercial practice that should have failed in both realms, instead of becoming the most widely used portable operating system in history and the very paradigm of an "operating system" in general. The story of UNIX demonstrates how portability became a reality and how the particular practice of sharing UNIX source code became a kind of de facto standard in its wake.
UNIX was first written in 1969 by Ken Thompson and Dennis Ritchie at Bell Telephone Labs in Murray Hill, New Jersey. UNIX was the denouement of the MIT project Multics, which Bell Labs had funded in part and to which Ken Thompson had been assigned. Multics was one of the earliest complete time-sharing operating systems, a demonstration platform for a number of early innovations in time-sharing (multiple simultaneous users on one computer).16 By 1968, Bell Labs had pulled its support-including Ken Thompson-from the project and placed him back in Murray Hill, where he and Dennis Ritchie were stuck without a machine, without any money, and without a project. They were specialists in operating systems, languages, and machine architecture in a research group that had no funding or mandate to pursue these areas. Through the creative use of some discarded equipment, and in relative isolation from the rest of the lab, Thompson and Ritchie created, in the space of about two years, a complete operating system, a programming language called C, and a host of tools that are still in extremely wide use today. The name UNIX (briefly, UNICS) was, among other things, a puerile pun: a castrated Multics.
The absence of an economic or corporate mandate for Thompson's and Ritchie's creativity and labor was not unusual for Bell Labs; researchers were free to work on just about anything, so long as it possessed some kind of vague relation to the interests of AT&T. However, the lack of funding for a more powerful machine did restrict the kind of work Thompson and Ritchie could accomplish. In particular, it influenced the design of the system, which was oriented toward a super-slim control unit (a kernel) that governed the basic operation of the machine and an expandable suite of small, independent tools, each of which did one thing well and which could be strung together to accomplish more complex and powerful tasks.17 With the help of Joseph Ossanna, Douglas McIlroy, and others, Thompson and Ritchie eventually managed to agitate for a new PDP-11/20 based not on the technical merits of the UNIX operating system itself, but on its potential applications, in particular, those of the text-preparation group, who were interested in developing tools for formatting, typesetting, and printing, primarily for the purpose of creating patent applications, which was, for Bell Labs, and for AT&T more generally, obviously a laudable goal.18

UNIX was unique for many technical reasons, but also for a specific economic reason: it was never quite academic and never quite commercial. Martin Campbell-Kelly notes that UNIX was a "non-proprietary operating system of major significance."19 Campbell-Kelly's use of "non-proprietary" is not surprising, but it is incorrect. Although business-speak regularly opposed open to proprietary throughout the 1980s and early 1990s (and UNIX was definitely the former), Campbell-Kelly's slip marks clearly the confusion between software ownership and software distribution that permeates both popular and academic understandings. UNIX was indeed proprietary-it was copyrighted and wholly owned by Bell Labs and in turn by Western Electric and AT&T-but it was not exactly commercialized or marketed by them. Instead, AT&T allowed individuals and corporations to install UNIX and to create UNIX-like derivatives for very low licensing fees. Until about 1982, UNIX was licensed to academics very widely for a very small sum: usually royalty-free with a minimal service charge (from about $150 to $800).20 The conditions of this license allowed researchers to do what they liked with the software so long as they kept it secret: they could not distribute or use it outside of their university labs (or use it to create any commercial product or process), nor publish any part of it. As a result, throughout the 1970s UNIX was developed both by Thompson and Ritchie inside Bell Labs and by users around the world in a relatively informal manner. Bell Labs followed such a liberal policy both because it was one of a small handful of industry-academic research and development centers and because AT&T was a government monopoly that provided phone service to the country and was therefore forbidden to directly enter the computer software market.21 Being on the border of business and academia meant that UNIX was, on the one hand, shielded from the demands of management and markets, allowing it to achieve the conceptual integrity that made it so appealing to designers and academics. On the other, it also meant that AT&T treated it as a potential product in the emerging software industry, which included new legal questions from a changing intellectual-property regime, novel forms of marketing and distribution, and new methods of developing, supporting, and distributing software.
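The kernel-plus-small-tools design described above can be suggested with a brief sketch. The C program below is a modern POSIX rendering, not historical Bell Labs code: it strings two independent tools together through the kernel primitives pipe, fork, and exec, the mechanics underlying a shell pipeline such as ls | wc -l.

/* A minimal sketch of "small tools strung together": the kernel
   supplies only pipe(), fork(), and exec(); the tools themselves
   (ls, wc) each do one thing well. Modern POSIX C, for
   illustration only. */

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) {            /* one shared channel */
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();              /* two processes */
    if (pid == -1) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {                  /* child becomes `ls` */
        dup2(fd[1], STDOUT_FILENO);  /* its stdout flows into the pipe */
        close(fd[0]);
        close(fd[1]);
        execlp("ls", "ls", (char *)NULL);
        perror("execlp ls");         /* reached only on failure */
        _exit(1);
    }

    /* parent becomes `wc -l`, reading what the child writes */
    dup2(fd[0], STDIN_FILENO);
    close(fd[0]);
    close(fd[1]);
    execlp("wc", "wc", "-l", (char *)NULL);
    perror("execlp wc");
    return 1;
}

The point of the design is that neither tool knows anything about the other; composition is the kernel's job, which is how an expandable suite of small programs could stand in for the monolithic features of other operating systems.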
Despite this borderline status, UNIX was a phenomenal success. The reasons why UNIX was so popular are manifold; it was widely admired aesthetically, for its size, and for its clever design and tools. But the fact that it spread so widely and quickly is testament also to the existing community of eager computer scientists and engineers (and a few amateurs) onto which it was bootstrapped, users for whom a powerful, flexible, low-cost, modifiable, and fast operating system was a revelation of sorts. It was an obvious alternative to the complex, poorly documented, buggy operating systems that routinely shipped standard with the machines that universities and research organizations purchased. "It worked," in other words.
A key feature of the popularity of UNIX was the inclusion of the source code. When Bell Labs licensed UNIX, they usually provided a tape that contained the documentation (i.e., documentation that was part of the system, not a paper technical manual external to it), a binary version of the software, and the source code for the software. The practice of distributing the source code encouraged people to maintain it, extend it, document it-and to contribute those changes to Thompson and Ritchie as well. By doing so, users developed an interest in maintaining and supporting the project precisely because it gave them an opportunity and the tools to use their computer creatively and flexibly. Such a globally distributed community of users organized primarily by their interest in maintaining an operating system is a precursor to the recursive public, albeit confined to the world of computer scientists and researchers with access to still relatively expensive machines. As such, UNIX was not only a widely shared piece of quasi-commercial software (i.e., distributed in some form other than through a price-based retail market), but also the first to systematically include the source code as part of that distribution as well, thus appealing more to academics and engineers.22 Throughout the 1970s, the low licensing fees, the inclusion of the source code, and its conceptual integrity meant that UNIX was ported to a remarkable number of other machines. In many ways, academics found it just as appealing, if not more, to be involved in the creation and improvement of a cutting-edge system by licensing and porting the software themselves, rather than by having it provided to them, without the source code, by a company. Peter Salus, for instance, suggests that people experienced the lack of support from Bell Labs as a kind of spur to develop and share their own fixes. The means by which source code was shared, and the norms and practices of sharing, porting, forking, and modifying source code were developed in this period as part of the development of UNIX itself-the technical design of the system facilitates and in some cases mirrors the norms and practices of sharing that developed: operating systems and social systems.23
Sharing UNIX.
Over the course of 1974-77, the spread and porting of UNIX was phenomenal for an operating system that had no formal system of distribution and no official support from the company that owned it, and that evolved in a piecemeal way through the contributions of people from around the world. By 1975, a users' group had developed: USENIX.24 UNIX had spread to Canada, Europe, Australia, and Japan, and a number of new tools and applications were being both independently circulated and, significantly, included in the frequent releases by Bell Labs itself. All during this time, AT&T's licensing department sought to find a balance between allowing this circulation and innovation to continue, and attempting to maintain trade-secret status for the software. UNIX was, by 1980, without a doubt the most widely and deeply understood trade secret in computing history.
The manner in which the circulation of and contribution to UNIX occurred is not well documented, but it includes both technical and pedagogical forms of sharing. On the technical side, distribution took a number of forms, both in resistance to AT&T's attempts to control it and facilitated by its unusually liberal licensing of the software. On the pedagogical side, UNIX quickly became a paradigmatic object for computer-science students precisely because it was a working operating system that included the source code and that was simple enough to explore in a semester or two.
In A Quarter Century of UNIX, Salus provides a couple of key stories (from Ken Thompson and Lou Katz) about how exactly the technical sharing of UNIX worked, how sharing, porting, and forking can be distinguished, and how it was neither strictly legal nor deliberately illegal in this context. First, from Ken Thompson: "The first thing to realize is that the outside world ran on releases of UNIX (V4, V5, V6, V7) but we did not. Our view was a continuum. V5 was what we had at some point in time and was probably out of date simply by the activity required to put it in shape to export. After V6, I was preparing to go to Berkeley to teach for a year. I was putting together a system to take. Since it was almost a release, I made a diff with V6 [a tape containing only the differences between the last release and the one Ken was taking with him]. On the way to Berkeley I stopped by Urbana-Champaign to keep an eye on Greg Chesson. . . . I left the diff tape there and I told him that I wouldn't mind if it got around."25

The need for a magnetic tape to "get around" marks the difference between the 1970s and the present: the distribution of software involved both the material transport of media and the digital copying of information. The desire to distribute bug fixes (the "diff" tape) resonates with the future emergence of Free Software: the fact that others had fixed problems and contributed them back to Thompson and Ritchie produced an obligation to see that the fixes were shared as widely as possible, so that they in turn might be ported to new machines. Bell Labs, on the other hand, would have seen this through the lens of software development, requiring a new release, contract renegotiation, and a new license fee for a new version. Thompson's notion of a "continuum," rather than a series of releases, also marks the difference between the idea of an evolving common set of objects stewarded by multiple people in far-flung locales and the idea of a shrink-wrapped "productized" software package that was gaining ascendance as an economic commodity at the same time. When Thompson says "the outside world," he is referring not only to people outside of Bell Labs but to the way the world was seen from within Bell Labs by the lawyers and marketers who would create a new version. For the lawyers, the circulation of source code was a problem because it needed to be stabilized, not so much for commercial reasons as for legal ones-one license for one piece of software. Distributing updates, fixes, and especially new tools and additions written by people who were not employed by Bell Labs scrambled the legal clarity even while it strengthened the technical quality. Lou Katz makes this explicit.
A large number of bug fixes was collected, and rather than issue them one at a time, a collection tape ("the 50 fixes") was put together by Ken [the same "diff tape," presumably]. Some of the fixes were quite important, though I don't remember any in particular. I suspect that a significant fraction of the fixes were actually done by non-Bell people. Ken tried to send it out, but the lawyers kept stalling and stalling and stalling. Finally, in complete disgust, someone "found a tape on Mountain Avenue" [the location of Bell Labs] which had the fixes. When the lawyers found out about it, they called every licensee and threatened them with dire consequences if they didn't destroy the tape, after trying to find out how they got the tape. I would guess that no one would actually tell them how they came by the tape (I didn't).26

Distributing the fixes involved not just a power struggle between the engineers and management; it was also clearly motivated by the fact that, as Katz says, "a significant fraction of the fixes were done by non-Bell people." This meant two things: first, that there was an obvious incentive to return the updated system to these people and to others; second, that it was not obvious that AT&T actually owned or could claim rights over these fixes-or, if they did, they needed to cover their legal tracks, which perhaps in part explains the stalling and threatening of the lawyers, who may have been buying time to make a "legal" version, with the proper permissions.
The struggle should be seen not as one between the rebel forces of UNIX development and the evil empire of lawyers and managers, but as a struggle between two modes of stabilizing the object known as UNIX. For the lawyers, stability implied finding ways to make UNIX look like a product that would meet the existing legal framework and the peculiar demands of being a regulated monopoly unable to freely compete with other computer manufacturers; the ownership of bits and pieces, ideas and contributions had to be strictly accountable. For the programmers, stability came through redistributing the most up-to-date operating system and sharing all innovations with all users so that new innovations might also be portable. The lawyers saw urgency in making UNIX legally stable; the engineers saw urgency in making UNIX technically stable and compatible with itself, that is, to prevent the forking of UNIX, the death knell for portability. The tension between achieving legal stability of the object and promoting its technical portability and stability is one that has recurred throughout the life of UNIX and its derivatives-and that has ramifications in other areas as well.
The identity and boundaries of UNIX were thus intricately formed through its sharing and distribution. Sharing produced its own form of moral and technical order. Troubling questions emerged immediately: were the versions that had been fixed, extended, and expanded still UNIX, and hence still under the control of AT&T? Or were the differences great enough that something else (not-UNIX) was emerging? If a tape full of fixes, contributed by non-Bell employees, was circulated to people who had licensed UNIX, and those fixes changed the system, was it still UNIX? Was it still UNIX in a legal sense or in a technical sense or both? While these questions might seem relatively scholastic, the history of the development of UNIX suggests something far more interesting: just about every possible modification has been made, legally and technically, but the concept of UNIX has remained remarkably stable.
Porting UNIX.
Technical portability accounts for only part of UNIX's success. As a pedagogical resource, UNIX quickly became an indispensable tool for academics around the world. As it was installed and improved, it was taught and learned. The fact that UNIX spread first to university computer-science departments, and not to businesses, government, or nongovernmental organizations, meant that it also became part of the core pedagogical practice of a generation of programmers and computer scientists; over the course of the 1970s and 1980s, UNIX came to exemplify the very concept of an operating system, especially time-shared, multi-user operating systems. Two stories describe the porting of UNIX from machines to minds and illustrate the practice as it developed and how it intersected with the technical and legal attempts to stabilize UNIX as an object: the story of John Lions's Commentary on Unix 6th Edition and the story of Andrew Tanenbaum's Minix.
The development of a pedagogical UNIX lent a new stability to the concept of UNIX as opposed to its stability as a body of source code or as a legal entity. The porting of UNIX was so successful that even in cases where a ported version of UNIX shares none of the same source code as the original, it is still considered UNIX. The monstrous and promiscuous nature of UNIX is most clear in the stories of Lions and Tanenbaum, especially when contrasted with the commercial, legal, and technical integrity of something like Microsoft Windows, which generally exists in only a small number of forms (NT, ME, XP, 95, 98, etc.), possessing carefully controlled source code, immured in legal protection, and distributed only through sales and service packs to customers or personal-computer manufacturers. While Windows is much more widely used than UNIX, it is far from having become a paradigmatic pedagogical object; its integrity is predominantly legal, not technical or pedagogical. Or, in pedagogical terms, Windows is to fish as UNIX is to fishing lessons.
Lions's Commentary is also known as "the most photocopied document in computer science." Lions was a researcher and senior lecturer at the University of New South Wales in the early 1970s; after reading the first paper by Ritchie and Thompson on UNIX, he convinced his colleagues to purchase a license from AT&T.27 Lions, like many researchers, was impressed by the quality of the system, and he was, like all of the UNIX users of that period, intimately familiar with the UNIX source code-a necessity in order to install, run, or repair it. Lions began using the system to teach his classes on operating systems, and in the course of doing so he produced a textbook of sorts, which consisted of the entire source code of UNIX version 6 (V6), along with elaborate, line-by-line commentary and explanation. The value of this textbook can hardly be overestimated. Access to machines and software that could be used to understand how a real system worked was very limited: "Real computers with real operating systems were locked up in machine rooms and committed to processing twenty four hours a day. UNIX changed that."28 Berny Goodheart, in an appreciation of Lions's Commentary, reiterated this sense of the practical usefulness of the source code and commentary: "It is important to understand the significance of John's work at that time: for students studying computer science in the 1970s, complex issues such as process scheduling, security, synchronization, file systems and other concepts were beyond normal comprehension and were extremely difficult to teach-there simply wasn't anything available with enough accessibility for students to use as a case study. Instead a student's discipline in computer science was earned by punching holes in cards, collecting fan-fold paper printouts, and so on. Basically, a computer operating system in that era was considered to be a huge chunk of inaccessible proprietary code."29 Lions's commentary was a unique document in the world of computer science, containing a kind of key to learning about a central component of the computer, one that very few people would have had access to in the 1970s. It shows how UNIX was ported not only to machines (which were scarce) but also to the minds of young researchers and student programmers (which were plentiful). Several generations of both academic computer scientists and students who went on to work for computer or software corporations were trained on photocopies of UNIX source code, with a whiff of toner and illicit circulation: a distributed operating system in the textual sense.
Unfortunately, Commentary was also legally restricted in its distribution. AT&T and Western Electric, in hopes that they could maintain trade-secret status for UNIX, allowed only very limited circulation of the book. At first, Lions was given permission to distribute single copies only to people who already possessed a license for UNIX V6; later Bell Labs itself would distribute Commentary briefly, but only to licensed users, and not for sale, distribution, or copying. Nonetheless, nearly everyone seems to have possessed a dog-eared, nth-generation copy. Peter Reintjes writes, "We soon came into possession of what looked like a fifth generation photocopy and someone who shall remain nameless spent all night in the copier room spawning a sixth, an act expressly forbidden by a carefully worded disclaimer on the first page. Four remarkable things were happening at the same time. One, we had discovered the first piece of software that would inspire rather than annoy us; two, we had acquired what amounted to a literary criticism of that computer software; three, we were making the single most significant advancement of our education in computer science by actually reading an entire operating system; and four, we were breaking the law."30 Thus, these generations of computer-science students and academics shared a secret-a trade secret become open secret. Every student who learned the essentials of the UNIX operating system from a photocopy of Lions's commentary, also learned about AT&T's attempt to control its legal distribution on the front cover of their textbook. The parallel development of photocopying has a nice resonance here; together with home cassette taping of music and the introduction of the video-cassette recorder, photocopying helped drive the changes to copyright law adopted in 1976.
Thirty years later, and long after the source code in it had been completely replaced, Lions's Commentary is still widely admired by geeks. Even though Free Software has come full circle in providing students with an actual operating system that can be legally studied, taught, copied, and implemented, the kind of "literary criticism" that Lions's work represents is still extremely rare; even reading obsolete code with clear commentary is one of the few ways to truly understand the design elements and clever implementations that made the UNIX operating system so different from its predecessors and even many of its successors, few, if any, of which have been so successfully ported to the minds of so many students.
Lions's Commentary contributed to the creation of a worldwide community of people whose connection to each other was formed by a body of source code, both in its implemented form and in its textual, photocopied form. This nascent recursive public not only understood itself as belonging to a technical elite which was constituted by its creation, understanding, and promotion of a particular technical tool, but also recognized itself as "breaking the law," a community constituted in opposition to forms of power that governed the circulation, distribution, modification, and creation of the very tools they were learning to make as part of their vocation. The material connection shared around the world by UNIX-loving geeks to their source code is not a mere technical experience, but a social and legal one as well.
Lions was not the only researcher to recognize that teaching the source code was the swiftest route to comprehension. The other story of the circulation of source code concerns Andrew Tanenbaum, a well-respected computer scientist and an author of standard textbooks on computer architecture, operating systems, and networking.31 In the 1970s Tanenbaum had also used UNIX as a teaching tool in classes at the Vrije Universiteit, in Amsterdam. Because the source code was distributed with the binary code, he could have his students explore directly the implementations of the system, and he often used the source code and the Lions book in his classes. But, according to his Operating Systems: Design and Implementation (1987), "When AT&T released Version 7 [ca. 1979], it began to realize that UNIX was a valuable commercial product, so it issued Version 7 with a license that prohibited the source code from being studied in courses, in order to avoid endangering its status as a trade secret. Many universities complied by simply dropping the study of UNIX, and teaching only theory" (13). For Tanenbaum, this was an unacceptable alternative-but so, apparently, was continuing to break the law by teaching UNIX in his courses. And so he proceeded to create a completely new UNIX-like operating system that used not a single line of AT&T source code. He called his creation Minix. It was a stripped-down version intended to run on personal computers (IBM PCs), and to be distributed along with the textbook Operating Systems, published by Prentice Hall.32 Minix became as widely used in the 1980s as a teaching tool as Lions's source code had been in the 1970s. According to Tanenbaum, the Usenet group comp.os.minix had reached 40,000 members by the late 1980s, and he was receiving constant suggestions for changes and improvements to the operating system. His own commitment to teaching meant that he incorporated few of these suggestions, an effort to keep the system simple enough to be printed in a textbook and understood by undergraduates. Minix was freely available as source code, and it was a fully functioning operating system, even a potential alternative to UNIX that would run on a personal computer. Here was a clear example of the conceptual integrity of UNIX being communicated to another generation of computer-science students: Tanenbaum's textbook is not called "UNIX Operating Systems"-it is called Operating Systems. The clear implication is that UNIX represented the clearest example of the principles that should guide the creation of any operating system: it was, for all intents and purposes, state of the art even twenty years after it was first conceived.
Minix was not commercial software, but neither was it Free Software. It was copyrighted and controlled by Tanenbaum's publisher, Prentice Hall. Because it used no AT&T source code, Minix was also legally independent, a legal object of its own. The fact that it was intended to be legally distinct from, yet conceptually true to UNIX is a clear indication of the kinds of tensions that govern the creation and sharing of source code. The ironic apotheosis of Minix as the pedagogical gold standard for studying UNIX came in 1991-92, when a young Linus Torvalds created a "fork" of Minix, also rewritten from scratch, that would go on to become the paradigmatic piece of Free Software: Linux. Tanenbaum's purpose for Minix was that it remain a pedagogically useful operating system-small, concise, and illustrative-whereas Torvalds wanted to extend and expand his version of Minix to take full advantage of the kinds of hardware being produced in the 1990s. Both, however, were committed to source-code visibility and sharing as the swiftest route to complete comprehension of operating-systems principles.
Forking UNIX.
Tanenbaum's need to produce Minix was driven by a desire to share the source code of UNIX with students, a desire AT&T was manifestly uncomfortable with and which threatened the trade-secret status of their property. The fact that Minix might be called a fork of UNIX is a key aspect of the political economy of operating systems and social systems. Forking generally refers to the creation of new, modified source code from an original base of source code, resulting in two distinct programs with the same parent. Whereas the modification of an engine results only in a modified engine, the modification of source code implies differentiation and reproduction, because of the ease with which it can be copied.
How could Minix-a complete rewrite-still be considered the same object? Considered solely from the perspective of trade-secret law, the two objects were distinct, though from the perspective of copyright there was perhaps a case for infringement, although AT&T did not rely on copyright as much as on trade secret. From a technical perspective, the functions and processes that the software accomplishes are the same, but the means by which they are coded to do so are different. And from a pedagogical standpoint, the two are identical-they exemplify certain core features of an operating system (file-system structure, memory paging, process management)-all the rest is optimization, or bells and whistles. Understanding the nature of forking requires also that UNIX be understood from a social perspective, that is, from the perspective of an operating system created and modified by user-developers around the world according to particular and partial demands. It forms the basis for the emergence of a robust recursive public.
One of the more important instances of the forking of UNIX's perambulatory source code and the developing community of UNIX co-developers is the story of the Berkeley Software Distribution and its incorporation of the TCP/IP protocols. In 1975 Ken Thompson took a sabbatical in his hometown of Berkeley, California, where he helped members of the computer-science department with their installations of UNIX, arriving with V6 and the "50 bug fixes" diff tape. Ken had begun work on a compiler for the Pascal programming language that would run on UNIX, and this work was taken up by two young graduate students: Bill Joy and Chuck Haley. (Joy would later co-found Sun Microsystems, one of the most successful UNIX-based workstation companies in the history of the industry.) Joy, above nearly all others, enthusiastically participated in the informal distribution of source code. With a popular and well-built Pascal system, and a new text editor called ex (later vi), he created the Berkeley Software Distribution (BSD), a set of tools that could be used in combination with the UNIX operating system. They were extensions to the original UNIX operating system, but not a complete, rewritten version that might replace it. By all accounts, Joy served as a kind of one-man software-distribution house, making tapes and posting them, taking orders and cashing checks-all in addition to creating software.33 UNIX users around the world soon learned of this valuable set of extensions to the system, and before long, many were differentiating between AT&T UNIX and BSD UNIX.
According to Don Libes, Bell Labs allowed Berkeley to distribute its extensions to UNIX so long as the recipients also had a license from Bell Labs for the original UNIX (an arrangement similar to the one that governed Lions's Commentary).34 From about 1976 until about 1981, BSD slowly became an independent distribution-indeed, a complete version of UNIX-well-known for the vi editor and the Pascal compiler, but also for the addition of virtual memory and its implementation on DEC's VAX machines.35 It should be clear that the unusual quasi-commercial status of AT&T's UNIX allowed for this situation in a way that a fully commercial computer corporation would never have allowed. Consider the fact that many UNIX users-students at a university, for instance-could not really know whether they were using an AT&T product or something called BSD UNIX created at Berkeley. The operating system functioned in the same way and, except for the presence of copyright notices that occasionally flashed on the screen, did not make any show of asserting its brand identity (that would come later, in the 1980s). Whereas a commercial computer manufacturer would have allowed something like BSD only if it were incorporated into and distributed as a single, marketable, and identifiable product with a clever name, AT&T turned something of a blind eye to the proliferation and spread of AT&T UNIX, and the result was forks in the project: distinct bodies of source code, each an instance of something called UNIX.
As BSD developed, it gained kinds of functionality different from those of the UNIX from which it was spawned. The most significant development was the inclusion of code that allowed it to connect computers to the Arpanet, using the TCP/IP protocols designed by Vinton Cerf and Robert Kahn. The TCP/IP protocols were a key feature of the Arpanet, overseen by the Information Processing Techniques Office (IPTO) of the Defense Advanced Research Projects Agency (DARPA) from its inception in 1967 until about 1977. The goal of the protocols was to allow different networks, each with its own machines and administrative boundaries, to be connected to each other.36 Although there is a common heritage-in the form of J. C. R. Licklider-which ties the imagination of the time-sharing operating system to the creation of the "galactic network," the Arpanet initially developed completely independently of UNIX.37 As a time-sharing operating system, UNIX was meant to allow the sharing of resources on a single computer, whether mainframe or minicomputer, but it was not initially intended to be connected to a network of other computers running UNIX, as is the case today.38 The goal of Arpanet, by contrast, was explicitly to achieve the sharing of resources located on diverse machines across diverse networks.
To achieve the benefits of TCP/IP, the protocols needed to be implemented in all of the different operating systems that were connected to the Arpanet-whatever operating system and machine happened to be in use at each of the nodes. However, by 1977, the original machines used on the network were outdated and increasingly difficult to maintain and, according to Kirk McKusick, the greatest expense was that of porting the old protocol software to new machines. Hence, IPTO decided to pursue in part a strategy of achieving coordination at the operating-system level, and they chose UNIX as one of the core platforms on which to standardize. In short, they had seen the light of portability. In about 1978 IPTO granted a contract to Bolt, Beranek, and Newman (BBN), one of the original Arpanet contractors, to integrate the TCP/IP protocols into the UNIX operating system.
But then something odd happened, according to Salus: "An initial prototype was done by BBN and given to Berkeley. Bill [Joy] immediately started hacking on it because it would only run an Ethernet at about 56K/sec utilizing 100% of the CPU on a 750. . . . Bill lobotomized the code and increased its performance to on the order of 700KB/sec. This caused some consternation with BBN when they came in with their 'finished' version, and Bill wouldn't accept it. There were battles for years after, about which version would be in the system. The Berkeley version ultimately won."39 Although it is not clear, it appears BBN intended to give Joy the code in order to include it in his BSD version of UNIX for distribution, and that Joy and collaborators intended to cooperate with Rob Gurwitz of BBN on a final implementation, but Berkeley insisted on "improving" the code to make it perform more to their needs, and BBN apparently dissented from this.40 One result of this scuffle between BSD and BBN was a genuine fork: two bodies of code that did the same thing, competing with each other to become the standard UNIX implementation of TCP/IP. Here, then, was a case of sharing source code that led to the creation of different versions of software-sharing without collaboration. Some sites used the BBN code, some used the Berkeley code.
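One durable trace of the Berkeley implementation is its programming interface, the "sockets" API first distributed with BSD and still standard on UNIX-derived systems. The fragment below is a present-day sketch of that interface, not period code; the loopback address and the echo-service port are placeholders chosen for illustration.

/* A minimal TCP client in the Berkeley sockets idiom that BSD's
   TCP/IP work bequeathed to later systems. The address and port
   are hypothetical placeholders. */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* a TCP endpoint */
    if (fd == -1) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(7);                   /* classic echo service */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        perror("connect");
        close(fd);
        return 1;
    }

    const char msg[] = "hello, network\n";
    write(fd, msg, sizeof msg - 1);             /* send a line */

    char buf[128];
    ssize_t n = read(fd, buf, sizeof buf);      /* read the echo back */
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    return 0;
}

Calls like socket, connect, read, and write treat a network connection as something very close to a file, a design choice that carried the UNIX idiom into networking itself.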
Forking, however, does not imply permanent divergence, and the continual improvement, porting, and sharing of software can have odd consequences when forks occur. On the one hand, there are particular pieces of source code: they must be identifiable and exact, and prepended with a copyright notice, as was the case of the Berkeley code, which was famously and vigorously policed by the University of California regents, who allowed for a very liberal distribution of BSD code on the condition that the copyright notice was retained. On the other hand, there are particular named collections of code that work together (e.g., UNIX, or DARPA-approved UNIX, or later, Certified Open Source [sm]) and are often identified by a trademark symbol intended, legally speaking, to differentiate products, not to assert ownership of particular instances of a product.
The odd consequence is this: Bill Joy's specific TCP/IP code was incorporated not only into BSD UNIX, but also into other versions of UNIX, including the UNIX distributed by AT&T (which had originally licensed UNIX to Berkeley) with the Berkeley copyright notice removed. This bizarre, tangled bank of licenses and code resulted in a famous suit and countersuit between AT&T and Berkeley, in which the intricacies of this situation were sorted out.41 An innocent bystander, expecting UNIX to be a single thing, might be surprised to find that it takes different forms for reasons that are all but impossible to identify, but the cause of which is clear: different versions of sharing in conflict with one another; different moral and technical imaginations of order that result in complex entanglements of value and code.
The BSD fork of UNIX (and the subfork of TCP/IP) was only one of many to come. By the early 1980s, a proliferation of UNIX forks had emerged and would be followed shortly by a very robust commercialization. At the same time, the circulation of source code started to slow, as corporations began to compete by adding features and creating hardware specifically designed to run UNIX (such as the Sun Sparc workstation and the Solaris operating system, the result of Joy's commercialization of BSD in the 1980s). The question of how to make all of these versions work together eventually became the subject of the open-systems discussions that would dominate the workstation and networking sectors of the computer market from the early 1980s to 1993, when the dual success of Windows NT and the arrival of the Internet into public consciousness changed the fortunes of the UNIX industry.
A second, and more important, effect of the struggle between BBN and BSD was simply the widespread adoption of the TCP/IP protocols. An estimated 98 percent of computer-science departments in the United States and many such departments around the world incorporated the TCP/IP protocols into their UNIX systems and gained instant access to Arpanet.42 The fact that this occurred when it did is important: a few years later, during the era of the commercialization of UNIX, these protocols might very well not have been widely implemented (or more likely implemented in incompatible, nonstandard forms) by manufacturers, whereas before 1983, university computer scientists saw every benefit in doing so if it meant they could easily connect to the largest single computer network on the planet. The large, already functioning, relatively standard implementation of TCP/IP on UNIX (and the ability to look at the source code) gave these protocols a tremendous advantage in terms of their survival and success as the basis of a global and singular network.
Conclusion.
The UNIX operating system is not just a technical achievement; it is the creation of a set of norms for sharing source code in an unusual environment: quasi-commercial, quasi-academic, networked, and planetwide. Sharing UNIX source code has taken three basic forms: porting source code (transferring it from one machine to another); teaching source code, or "porting" it to students in a pedagogical setting where the use of an actual working operating system vastly facilitates the teaching of theory and concepts; and forking source code (modifying the existing source code to do something new or different). This play of proliferation and differentiation is essential to the remarkably stable identity of UNIX, but that identity exists in multiple forms: technical (as a functioning, self-compatible operating system), legal (as a license-circumscribed version subject to intellectual property and commercial law), and pedagogical (as a conceptual exemplar, the paradigm of an operating system). Source code shared in this manner is essentially unlike any other kind of source code in the world of computers, whether academic or commercial. It raises troubling questions about standardization, about control and audit, and about legitimacy that haunt not only UNIX but the Internet and its various "open" protocols as well.
Sharing source code in Free Software looks the way it does today because of UNIX. But UNIX looks the way it does not because of the inventive genius of Thompson and Ritchie, or the marketing and management brilliance of AT&T, but because sharing produces its own kind of order: operating systems and social systems. The fact that geeks are wont to speak of "the UNIX philosophy" means that UNIX is not just an operating system but a way of organizing the complex relations of life and work through technical means; a way of charting and breaching the boundaries between the academic, the aesthetic, and the commercial; a way of implementing ideas of a moral and technical order. What's more, as source code comes to include more and more of the activities of everyday communication and creation-as it comes to replace writing and supplement thinking-the genealogy of its portability and the history of its forking will illuminate the kinds of order emerging in practices and technologies far removed from operating systems-but tied intimately to the UNIX philosophy.
5. Conceiving Open Systems.
The great thing about standards is that there are so many to choose from.
Openness is an unruly concept. While free tends toward ambiguity (free as in speech, or free as in beer?), open tends toward obfuscation. Everyone claims to be open, everyone has something to share, everyone agrees that being open is the obvious thing to do-after all, openness is the other half of "open source"-but for all its obviousness, being "open" is perhaps the most complex component of Free Software. It is never quite clear whether being open is a means or an end. Worse, the opposite of open in this case (specifically, "open systems") is not closed, but "proprietary"-signaling the complicated imbrication of the technical, the legal, and the commercial.
In this chapter I tell the story of the contest over the meaning of "open systems" from 1980 to 1993, a contest to create a simultaneously moral and technical infrastructure within the computer industry.2 The infrastructure in question includes technical components-the UNIX operating system and the TCP/IP protocols of the Internet as open systems-but it also includes "moral" components, including the demand for structures of fair and open competition, antimonopoly and open markets, and open-standards processes for high-tech networked computers and software in the 1980s.3 By moral, I mean imaginations of the proper order of collective political and commercial action; referring to much more than simply how individuals should act, moral signifies a vision of how economy and society should be ordered collectively.
The open-systems story is also a story of the blind spot of open systems-in that blind spot is intellectual property. The story reveals a tension between incompatible moral-technical orders: on the one hand, the promise of multiple manufacturers and corporations creating interoperable components and selling them in an open, heterogeneous market; on the other, an intellectual-property system that encouraged jealous guarding and secrecy, and granted monopoly status to source code, designs, and ideas in order to differentiate products and promote competition. The tension proved irresolvable: without shared source code, for instance, interoperable operating systems are impossible. Without interoperable operating systems, internetworking and portable applications are impossible. Without portable applications that can run on any system, open markets are impossible. Without open markets, monopoly power reigns.
Standardization was at the heart of the contest, but by whom and by what means was never resolved. The dream of open systems, pursued in an entirely unregulated industry, resulted in a complicated experiment in novel forms of standardization and cooperation. The creation of a "standard" operating system based on UNIX is the story of a failure, a kind of "figuring out" gone haywire, which resulted in huge consortia of computer manufacturers attempting to work together and compete with each other at the same time. Meanwhile, the successful creation of a "standard" networking protocol-known as the Open Systems Interconnection Reference Model (OSI)-is a story of failure that hides a larger success; OSI was eclipsed in the same period by the rapid and ad hoc adoption of the Transmission Control Protocol/Internet Protocol (TCP/IP), which used a radically different standardization process and which succeeded for a number of surprising reasons, allowing the Internet to take the form it did in the 1990s and ultimately exemplifying the moral-technical imaginary of a recursive public-and one at the heart of the practices of Free Software.
The conception of openness, the central plot of these two stories, has become an essential component of the contemporary practice and power of Free Software. These early battles created a kind of widespread readiness for Free Software in the 1990s, a recognition of Free Software as a removal of open systems' blind spot, as much as an exploitation of its power. The geek ideal of openness and of a moral-technical order (the one that made Napster so significant an event) was forged in the era of open systems; without this concrete historical conception of how to maintain openness in technical and moral terms, the recursive public of geeks would be just another hierarchical closed organization-a corporation manqué-and not an independent public serving as a check on the kinds of destructive power that dominated the open-systems contest.
Hopelessly Plural.
Big iron, silos, legacy systems, turnkey systems, dinosaurs, mainframes: with the benefit of hindsight, the computer industry of the 1960s to the 1980s appears to be backward and closed, to have literally painted itself into a corner, as an early Intel advertisement suggests (figure 3). Contemporary observers who show disgust and impatience with the form that computers took in this era are without fail supporters of open systems and opponents of proprietary systems that "lock in" customers to specific vendors and create artificial demands for support, integration, and management of resources. Open systems (if allowed to flourish) would solve all these problems.
Given the promise of a "general-purpose computer," it should seem ironic at best that open systems needed to be created. But the general-purpose computer never came into being. We do not live in the world of The Computer, but in a world of computers: myriad, incompatible, specific machines. The design of specialized machines (or "architectures") was, and still is, key to a competitive industry in computers. It required CPUs and components and associated software that could be clearly qualified and marketed as distinct products: the DEC PDP-11 or the IBM 360 or the CDC 6600.
3. Open systems is the solution to painting yourself into a corner. Intel advertisement, Wall Street Journal, 30 May 1984.
On the Fordist model of automobile production, the computer industry's mission was to render desired functions (scientific calculation, bookkeeping, reservations management) in a large box with a button on it (or a very large number of buttons on increasingly smaller boxes). Despite the theoretical possibility, such computers were not designed to do just anything, but rather to do specific kinds of calculations exceedingly well. They were objects customized to particular markets.
The marketing strategy was therefore extremely stable from about 1955 to about 1980: identify customers with computing needs, build a computer to serve them, provide them with all of the equipment, software, support, or peripherals they need to do the job-and charge a large amount. Organizationally speaking, it was an industry dominated by "IBM and the seven dwarfs": Hewlett-Packard, Honeywell, Control Data, General Electric, NCR, RCA, Univac, and Burroughs, with a few upstarts like DEC in the wings.
By the 1980s, however, a certain inversion had happened. Computers had become smaller and faster; there were more and more of them, and it was becoming increasingly clear to the "big iron" manufacturers that what was most valuable to users was the information they generated, not the machines that did the generating. Such a realization, so the story goes, leads to a demand for interchangeability, interoperability, information sharing, and networking. It also presents the nightmarish problems of conversion between a bewildering, heterogeneous, and rapidly growing array of hardware, software, protocols, and systems. As one conference paper on the subject of evaluating open systems put it, "At some point a large enterprise will look around and see a huge amount of equipment and software that will not work together. Most importantly, the information stored on these diverse platforms is not being shared, leading to unnecessary duplication and lost profit."4 Open systems emerged in the 1980s as the name of the solution to this problem: an approach to the design of systems that, if all participants were to adopt it, would lead to widely interoperable, integrated machines that could send, store, process, and receive the user's information. In marketing and public-relations terms, it would provide "seamless integration."
In theory, open systems was simply a question of standards adoption. For instance, if all the manufacturers of UNIX systems could be convinced to adopt the same basic standard for the operating system, then seamless integration would naturally follow as all the various applications could be written once to run on any variant UNIX system, regardless of which company made it. In reality, such a standard was far from obvious, difficult to create, and even more difficult to enforce. As such, the meaning of open systems was "hopelessly plural," and the term came to mean an incredibly diverse array of things.
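The logic of "write once, run anywhere" is easiest to see in code. The following is a minimal sketch of my own, not any vendor's actual program, written against only the interfaces that were eventually standardized as POSIX (IEEE 1003.1), one outcome of the open-systems contest. Because it confines itself to the standardized calls open, read, write, and close, it should, assuming a conforming system, compile and run unchanged on any vendor's UNIX:

/* A hypothetical illustration of portability through standard
 * interfaces: only POSIX calls are used, so any conforming UNIX,
 * whoever manufactured it, should build and run this unmodified. */
#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* read, write, close, STDOUT_FILENO, ssize_t */

int main(void)
{
    char buf[4096];
    ssize_t n;

    /* The file chosen here is incidental; any readable file would do. */
    int fd = open("/etc/motd", O_RDONLY);
    if (fd == -1)
        return 1;

    /* Copy the file to standard output using only standardized calls. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, (size_t) n);

    close(fd);
    return 0;
}

The counterfactual is the point: absent an agreed standard, each manufacturer's system exposed its own idiosyncratic interfaces, and even so simple a program had to be rewritten, or at least recompiled against different headers and behaviors, for each vendor's machine.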
"Openness" is precisely the kind of concept that wavers between end and means. Is openness good in itself, or is openness a means to achieve something else-and if so what? Who wants to achieve openness, and for what purpose? Is openness a goal? Or is it a means by which a different goal-say, "interoperability" or "integration"-is achieved? Whose goals are these, and who sets them? Are the goals of corporations different from or at odds with the goals of university researchers or government officials? Are there large central visions to which the activities of all are ultimately subordinate?
Between 1980 and 1993, no person or company or computer industry consortium explicitly set openness as the goal that organizations, corporations, or programmers should aim at, but, by the same token, hardly anyone dissented from the demand for openness. As such, it appears clearly as a kind of cultural imperative, reflecting a longstanding social imaginary with roots in liberal democratic notions, versions of a free market, and ideals of the free exchange of knowledge, but confronting changed technical conditions that bring the moral ideas of order into relief, and into question.
In the 1980s everyone seemed to want some kind of openness, whether among manufacturers or customers, from General Motors to the armed forces.5 The debates, both rhetorical and technical, about the meaning of open systems have produced a slough of writings, largely directed at corporate IT managers and CIOs. For instance, Terry A. Critchley and K. C. Batty, the authors of Open Systems: The Reality (1993), claim to have collected over a hundred definitions of open systems. The definitions stress different aspects-from interoperability of heterogeneous machines, to compatibility of different applications, to portability of operating systems, to legitimate standards with open-interface definitions-including those that privilege ideologies of a free market, as does Bill Gates's definition: "There's nothing more open than the PC market. . . . [U]sers can choose the latest and greatest software." The range of meanings was huge and oriented along multiple axes: what, to whom, how, and so on. Open systems could mean that source code was open to view or that only the specifications or interfaces were; it could mean "available to certain third parties" or "available to everyone, including competitors"; it could mean self-publishing, well-defined interfaces and application programming interfaces (APIs), or it could mean sticking to standards set by governments and professional societies. To cynics, it simply meant that the marketing department liked the word open and used it a lot.