Numbers Rule Your World Part 5 Online


Upon analyzing the lottery win data, Rosenthal uncovered an unusual pattern of wins by retail store insiders, much too unusual to have been produced by chance. With similar logic, some people stopped flying after the EgyptAir crash because, to them, four crashes in four years seemed like an unusual pattern of disasters in the same region: too many to have happened completely at random. Did such behavior constitute a "personality disorder"?

The facts on the ground were immutable: the four flights, the location, the accident times, and the number of casualties were there for all to see. Many rejected random chance as an explanation for the pattern of crashes. Yet, to Professor Barnett, four in four looked just like the work of chance. He even used the same tool of statistical testing but arrived at a different conclusion. The difference lay in how he assimilated the data.

Statisticians are a curious lot: when given a vertical set of numbers, they like to look sideways. They look into the nooks and crannies; they look underneath the cracks; they turn over every pebble. From decades of experience, they learn that what is hidden is just as important as what is in front of their eyes. No one ever gets to see the whole picture, so the key is to know what you don't know. When you read the table of fatalities presented earlier, you may have visualized four black dots over the "Nantucket Triangle" and connected the dots; Barnett, by contrast, saw four black dots plus millions of white dots. Each white dot stood for one flight that safely traversed the airspace during those four years. Seen in this light, we would hardly find the black dots, let alone connect them. Then, taking it further, Barnett envisioned ten, even twenty, years of flights over the Nantucket Triangle, bringing millions more white dots into the picture, and only dozens of additional black dots. This method creates a new picture, one altogether different from the list of worst disasters frequently displayed in postcrash news reports. Listed separately, the four accidents stuck out like stars in the night sky; however, they became all but invisible when buried within a sea of whiteness (see Figure 5-2). Considering that the Northeast Corridor is one of the busiest airways in the world, it would follow that this area would see a larger number of fatal accidents.

As to whether fear of flying could be considered a "personality disorder," one esteemed statistician answered firmly in the negative during a lecture to an audience at Boeing. He suggested that as the airline industry has fended off systematic causes of jet crashes such as equipment failure, new types of risks are rising to the surface. He cited three "menaces that caused scant fatalities in the 1990s but which could cause more deaths in forthcoming years": sabotage, runway collisions, and midair collisions. The lecture, titled "Airline Safety: End of a Golden Age?" could not have been more aptly timed; it was delivered on September 11, 2001. The future he had anticipated arrived early.

Figure 5-2 The Statistician's Worldview

Who was this professor with such impressive foresight? None other than Arnold Barnett, who has been studying airline safety data for more than thirty years at the MIT Sloan School of Management. In the 1970s, he initiated a remarkably productive research program that has continuously tracked the safety record of airlines worldwide. Before he arrived on the scene, people considered it impossible to measure airline safety accurately, because the contributing factors could not be directly observed. How could one appraise the attitudes of corporate managers toward safety? How could one compare the efficacy of different training programs? How could one take into account disparate flight routes, airports, flight lengths, and age of airlines? Barnett the statistician made an end run around these obstacles, realizing he did not need any of those unknowns. When a passenger boards a plane, his or her fear is solely of dying in a fatal crash; it is thus sufficient to merely track the frequency of fatal accidents and the subsequent survival rates. Similarly, universities rely on SAT scores and school ranks to evaluate applicants because they cannot possibly visit every family, every home, and every school. How to compare Mary's parents to Julia's? How to rank Michael's gymnasium against Joseph's? So, instead of measuring such specific influences on student achievement as parental upbringing and quality of education, educators merely track the actual scholastic ability as represented by SAT scores and school ranks.

Under Barnett's watch, airlines in the developed world saw the risk of death drop from 1 in 700,000 in the 1960s to 1 in 10 million in the 1990s, a fourteen-fold improvement in three decades. He was the first to prove that American carriers were the safest in the world, and by 1990, he was telling everyone about a golden age of air safety. The rest of the developed world has since caught up, while the developing world still lags by two decades. Barnett believes that fatal air crashes have essentially become random events with a minuscule strike rate. In other words, it is no longer possible to find any systematic cause of an air disaster, like mechanical failure or turbulence. Air crashes today are practically freak accidents.

What does the visionary Barnett say about two of our biggest fears?

1. Don't choose between U.S. national airlines based on safety. Jet crashes occur randomly, so the carrier that suffers a recent crash would have been merely unlucky. Between 1987 and 1996, USAir happened to be the unlucky airline. It operated 20 percent of domestic flights but accounted for 50 percent of all crash fatalities, by far the worst record among the seven major airlines in the United States (see Figure 5-3). Barnett asked what the chance was that such a lopsided allocation of deaths could have hit any one of the seven carriers. The chance was 11 percent; it was quite likely to happen, and if not USAir, another airline would have borne the brunt. In another study, Barnett found that no U.S. airline has sustained an advantage in safety: the top-ranked airline in one period frequently came in last in the next period, giving further proof that all operators were materially equal in safety. It is just not possible to predict which airline will suffer the next fatal crash. Passengers have nowhere to run for air safety.

Figure 5-3 Relative Proportion of Flights and Deaths for USAir and Six Other U.S. Carriers, 1987–1996: Evidence That USAir Was Less Safe?

2. Don't avoid foreign airlines, even after one of their planes has crashed. Flights operated by developing-world airlines are just as safe as those run by U.S. airlines on routes where they directly compete with one another, typically those between the developed and developing worlds. Where they do not overlap, foreign airlines suffer many more crashes, for unknown reasons. (Some speculate that they may be assigning better crews to international flights.) Because of their poor domestic record, the overall risk of death associated with developing-world carriers was eight times worse than for their developed-world peers. But Barnett found no difference between these two groups of operators on competing routes: the risk was about 1 in 1.5 million during 2000–2005. The once-a-day frequent flier could expect to die in a jet crash in 4,100 years, on any of the operators that offer service on these routes. Moreover, while the worldwide risk of aviation fatality has been more than halved since the 1980s, the risk differential between developing-world and developed-world operators has stayed minute. Thus, we can trust these supposedly hulking, inefficient state enterprises with old planes, undertrained pilots, and unmotivated staff to take us overseas safely.
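The 4,100-year figure can be checked with a few lines of arithmetic. The sketch below is only an illustration and not Barnett's own calculation: it assumes exactly one flight per day, a constant risk of 1 in 1.5 million on every flight, and it ignores leap years.

```python
# Risk quoted in the text: about 1 death per 1.5 million flights.
RISK_PER_FLIGHT = 1 / 1_500_000
FLIGHTS_PER_YEAR = 365  # assumption: one flight every single day


def expected_years_until_fatal_crash(risk_per_flight, flights_per_year):
    """Average wait, in years, for an event with a fixed per-flight risk."""
    # With a constant risk p per flight, the mean number of flights
    # until the event is 1 / p (the mean of a geometric distribution).
    expected_flights = 1 / risk_per_flight
    return expected_flights / flights_per_year


years = expected_years_until_fatal_crash(RISK_PER_FLIGHT, FLIGHTS_PER_YEAR)
print(f"about {years:,.0f} years")
```

Under these assumptions the result lands just above 4,100 years, matching the rounded figure in the text.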

Like Rosenthal, Barnett used statistical testing to prove his point. For the decade leading up to 1996, developing-world airlines operated 62 percent of competitive flights. If they were just as safe as U.S. airlines, they should have caused about 62 percent of passenger deaths, or well over 62 percent if they were more prone to disasters. In those ten years, developing-world carriers caused only 55 percent of the fatalities, indicating that they did no worse (see Figure 5-4).

Figure 5-4 Relative Proportion of Flights and Deaths for Developed-World and Developing-World Carriers, 1987–1996: No Evidence That Developing-World Carriers Were Less Safe on Comparable Routes

The news about the Ontario lottery investigation spread all over Canada, and in every province, the lottery corporations were overrun by phone calls and e-mails from concerned citizens.

British Columbia's ombudsman, in reviewing past winners, unmasked dozens of extraordinarily lucky store owners, including one who took home CDN$300,000 over five years, winning eleven times. When the president of the British Columbia Lottery Corporation, which runs the province's lottery, was fired, his buddy, himself a former president, came to his defense: "Of course, it's possible retailers cheated players of their prize money, but only if you're a fool."

In New Brunswick, the Atlantic Lottery Corporation, which runs lotteries in four provinces, attempted to reshape the publicity by hiring an external consultant to audit past wins using the same method as Rosenthal. The analysis, however, showed that between 2001 and 2006, store owners claimed 37 out of 1,293 prizes of CDN$25,000 or more, when they were expected to have won fewer than 4 of those. It was inconceivable that this group of players could have won so many prizes if each ticket had an equal chance of winning.
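To get a feel for just how inconceivable, here is a minimal sketch of the kind of test involved. It is not the consultant's actual method: it assumes each of the 1,293 prizes independently goes to a store owner with probability 4/1,293, using the "fewer than 4" expectation as a generous upper bound, and asks how often 37 or more insider wins would then occur. The function name is made up for this illustration.

```python
from math import comb


def binomial_upper_tail(n, k_min, p_num, p_den):
    """P(X >= k_min) for X ~ Binomial(n, p_num / p_den).

    Works in exact integers so that the tiny tail probability
    is not lost to floating-point rounding.
    """
    q_num = p_den - p_num
    numerator = sum(
        comb(n, k) * p_num**k * q_num ** (n - k)
        for k in range(k_min, n + 1)
    )
    # Dividing two big integers yields a correctly rounded float.
    return numerator / p_den**n


# Assumption: an expected count of 4 insider wins out of 1,293 prizes.
tail = binomial_upper_tail(n=1293, k_min=37, p_num=4, p_den=1293)
print(f"chance of 37 or more insider wins under a fair lottery: {tail:.1e}")
```

Because the true expectation was below 4, the real probability under a fair lottery would be smaller still.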

Meanwhile, the CBC hired Rosenthal again, this time to examine the pattern of wins in the lotteries in the Western provinces from November 2003 to October 2006. The professor found that insiders earned sixty-seven wins of CDN$10,000 or more, twice as many as could be expected if the lotteries were fair to all players. Just how lucky were these insiders? Using statistical testing, Rosenthal further explained that the chance was 1 in 2.3 million that insiders could have racked up so many wins under a fair lottery system. While not as extreme as in Ontario, these odds were still negligible. Again, Rosenthal could hardly believe that the store owners were that much luckier than the rest of the ticket holders, so he suspected fraud. (Unlike in Ontario, neither the Atlantic nor the Western Lottery Corporation has been able to catch any individual cheater.) To restore the public's confidence, the lottery authorities announced a series of measures to protect customers, including installation of self-service scanning machines, reconfiguration of monitors to face out to the customers, improvement in win-tracking technology, background checks for retailers, and the requirement of winners to sign the back of their winning tickets. It remains to be seen whether these policies will succeed at lifting the cloud of suspicion.

Both statisticians grappled with real-life data, noticed unusual patterns, and asked whether they could occur by chance. Rosenthal's answer was an unequivocal no, and his result raised myriad doubts about insider wins in Ontario lotteries. Employing the same type of logic, Barnett alleviated our fear of flying by showing why air travelers have nowhere to run, because freak accidents can hit any unlucky carrier, anywhere.

You may still be wondering why statisticians willingly accept the risk of death while they show little appetite for playing with chance. Why do they behave differently from most people? We know it is not the tools at their disposal that affect their behavior; we all use the same sort of statistical testing to weigh the situational evidence against chance, whether we realize it or not. The first difference lies in the way that statisticians perceive data: most people tend to home in on unexpected patterns, but statisticians like to evaluate these against the background. For Barnett, the background is the complete flight schedule, not just a list of the worst disasters, while for Rosenthal, it includes all lottery players, not just retailers with major wins.

Moreover, in the worldview of statisticians, rare is impossible: jackpots are for dreamers, and jet crashes for paranoids. For Rosenthal to believe that all retail store insiders acted with honor, he would have had to accept that an extremely rare event had taken place. That would require disavowing his statistical roots. Barnett keeps on flying, twice a week, as he believes air disasters are nigh extinct. Had he stopped out of fear at any point, he would have had to admit that an incredibly unlikely incident could occur. That, too, would contravene his statistical instinct.

Rather than failing at risk assessment, as many have alleged, the people who avoid flying after air crashes also are reasoning like statisticians. Faced with a raft of recent fatal crashes, they rule out the possibility of chance. What leads them to draw different conclusions is the limited slice of data available to them. There are many everyday situations in which we run statistical tests without realizing it. The first time our bags get searched at the airport, we might rue our luck. If it happens twice, we might start to wonder about the odds of being picked again. Three or four times, and we might seriously doubt whether selection has been random at all. Rare is impossible.
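The bag-search intuition can be written down as a tiny calculation. The sketch below is purely illustrative: the 10 percent selection rate is an invented number, and the searches are assumed to be independent from one trip to the next.

```python
def chance_of_repeated_selection(selection_rate, trips):
    """Chance of being picked on every one of `trips` independent trips."""
    return selection_rate**trips


ASSUMED_RATE = 0.10  # hypothetical: 1 in 10 travelers searched at random

for trips in range(1, 5):
    p = chance_of_repeated_selection(ASSUMED_RATE, trips)
    print(f"searched on {trips} trip(s) in a row: about 1 in {1 / p:,.0f}")
```

Each additional search multiplies the odds against pure chance, which is why doubt about randomness sets in after only three or four.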

At the request of two senators in 1996, the Federal Aviation Administration acted to close the information gap between the experts and the public by releasing limited air safety data on its website. How have we done since? Poorly, unfortunately. As of 2006, anyone can find black dots (the disasters) in those databases but not the white dots (the safe arrivals). Every incident from total loss to no damage is recorded with ample details, making it difficult to focus on the relevant events. Clearly, weak execution has run afoul of good intention. It is time we started turning over those pebbles! As the professors showed us, a few well-chosen numbers paint a far richer picture than hundreds of thousands of disorganized data.

Conclusion

"Statistical thinking is hard," the Nobel Prize winner Daniel Kahneman told a gathering of mathematicians in New York City in 2009. A revered figure in the world of behavioral economics, Professor Kahneman spoke about his renewed interest in this topic, which he first broached in the 1970s with his frequent collaborator Amos Tversky. The subject matter is not inherently difficult, but our brains are wired in such a way that it requires a conscious effort to switch away from the default mode of reasoning, which is not statistical. Psychologists found that when research subjects were properly trained, and if they recognized the statistical nature of the task at hand, they were much likelier to make the correct judgment.

Statistical thinking is distinct from everyday thinking. It is a skill that is learned. What better way to master it than to look at positive examples of what others have accomplished? Although they rarely make the headlines, many applied scientists routinely use statistical thinking on the job. The stories in this book demonstrate how these practitioners make smart decisions and how their work benefits society.

In concluding, I review the five aspects of statistical thinking:

1. The discontent of being averaged: Always ask about variability.

2. The virtue of being wrong: Pick useful over true.

3. The dilemma of being together: Compare like with like.

4. The sway of being asymmetric: Heed the give-and-take of two errors.

5. The power of being impossible: Don't believe what is too rare to be true.

Some technical language is introduced in these pages; it can be used as guideposts for those wanting to explore the domain of statistical thinking further. The interstitial sections called "Crossovers" take another look at the same stories, the second time around revealing another aspect of statistical thinking.

The Discontent of Being Averaged

Averages are like sleeping pills: they put you in a state of stupor, and if you overdose, they may kill you.

That must have been how the investors in Bernie Madoff's hedge fund felt in 2008, when they learned the ugly truth about the streak of stable monthly returns they'd been receiving up until then. In the dream world they took as real, each month was an average month; variability was conquered, and there was nothing to worry about. Greed was the root cause of their financial ruin. Those who doubted the absence of variability in the reported returns could have saved themselves; instead, most placed blind faith in the average.

The overuse of averages pervades our society. In the business world, the popular notion of an annualized growth metric, also called "compound annual growth rate," is born of erasing all year-to-year variations. A company that is expanding at 5 percent per year every year has the same annualized growth rate as one that is growing at 5 percent per year on average but operates in a volatile market so that the actual growth can range from 15 percent in one year to -10 percent in another. The financing requirements of these two businesses could not be more different. While the compound annual growth rate provides a useful basic summary of the past, it conveys a false sense of stability when used to estimate the future. The statistical average simply carries no information about variability.
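A short numerical sketch makes the point concrete. The yearly growth rates below are invented for illustration; the last rate of the volatile company is solved for so that both companies end up at exactly the same size after four years.

```python
from math import prod
from statistics import pstdev


def cagr(yearly_rates):
    """Compound annual growth rate implied by a sequence of yearly rates."""
    total_growth = prod(1 + r for r in yearly_rates)
    return total_growth ** (1 / len(yearly_rates)) - 1


# Hypothetical company A: 5 percent growth every single year.
steady = [0.05] * 4

# Hypothetical company B: swings between +15 and -10 percent. The final
# rate is chosen so that total growth over four years matches company A.
first_three = [0.15, -0.10, 0.15]
final_rate = 1.05**4 / prod(1 + r for r in first_three) - 1
volatile = first_three + [final_rate]

for name, rates in (("steady", steady), ("volatile", volatile)):
    print(f"{name:>8}: CAGR = {cagr(rates):.2%}, "
          f"spread of yearly growth = {pstdev(rates):.2%}")
```

Both companies report the same compound annual growth rate, yet only the spread of the yearly figures reveals how differently they would need to be financed.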

Statistical thinking begins with noticing and understanding variability. What gets commuters upset? Not the average travel time to work, to which they can adjust. They complain about unexpected delays, occasioned by unpredictable accidents and weather emergencies. Such variability leads to uncertainty, which creates anxiety. Julie Cross, the Minnesota commuter in Chapter 1, was surely not the only driver who found "picking the fastest route" to be "a daily gamble."

It is therefore no surprise that effective measures to control congestion attack the problem of variability. For Disney guests arriving during busy hours, FastPass lines eliminate the uncertainty of waiting time by spacing out spikes in demand. Similarly, metered ramps on highways regulate the inflow of traffic, promising commuters smoother trips once they enter.

The Disney "Imagineers" and the highway engineers demonstrated impressive skills in putting theoretical science into practice. Their seminal achievements were in emphasizing the behavioral aspect of decision making. The Disney scientists learned to focus their attention on reducing perceived wait times, as distinct from actual wait times. In advocating perception management, they subordinated the well-established research program in queuing theory, a branch of applied mathematics that has produced a set of sophisticated tools for minimizing actual average wait times in queues. As with traditional economics, queuing theory makes an assumption about rational human behavior that does not match reality. For example, in putting up signs showing inflated estimates of waiting time, the Disney engineers counted on irrationality, and customer surveys consistently confirmed their judgment. For further exploration of the irrational mind, see the seminal work of Daniel Kahneman, starting with his 2003 overview article "Maps of Bounded Rationality: Psychology for Behavioral Economics" in American Economic Review, and Predictably Irrational by Dan Ariely.

Political considerations often intrude on the work of applied scientists. For instance, Minnesota state senator Dick Day seized upon the highway congestion issue to score easy points with his constituents, some of whom blamed the ramp-metering policy for prolonging their commute times. A huge commotion ensued, at the end of which the highway engineers were vindicated. The Minnesota Department of Transportation and the senator agreed to a compromise solution, making small changes to how the meters were operated. For applied scientists, this episode conveyed the valuable lesson that the technical good (reducing actual travel time) need not agree with the social good (managing the public's perception). Before the "meters shutoff" experiment, engineers doggedly pursued the goal of delaying the onset of congestion, which preserves the carrying capacity of highways and sustains traffic flow. The experiment verified the technical merit of this policy: the benefits of smoother traffic on the highway outweighed the drawback of waiting at on-ramps. Nevertheless, commuters disliked having to sit and stew at the ramps even more than they disliked the stop-and-go traffic on jam-packed highways.

Statisticians run experiments to collect data in a systematic way to help make better decisions. In the Minnesota experiment, the consultants performed a form of pre-post analysis. They measured traffic flow, trip time, and other metrics at preselected sections of the highways before the experiment and again at its conclusion. Any difference between the pre- period and post- period was attributed to shutting off the ramp meters.

But note that there is a hidden assumption of "all else being equal." The analysts were at the mercy of what they did not, or could not, know: was all else really equal? For this reason, statisticians exercise extreme caution in interpreting pre-post studies, especially when opining on why the difference was observed during the experiment. The book Statistics for Experimenters by George Box, Stuart Hunter, and Bill Hunter is the classic reference for proper design and analysis of experiments. (The Minnesota experiment could have benefited from more sophisticated statistical expertise.)

Crossovers

Insurance is a smart way to exploit variability, in this case, the ebb and flow of claims filed by customers. If all policyholders required payout concurrently, their total losses would swallow the cumulative surplus collected from premiums, rendering insurers insolvent. By combining a large number of risks acting independently, actuaries can reliably predict average future losses and thus set annual premiums so as to avoid financial ruin. This classic theory works well for automotive insurance but applies poorly to catastrophe insurance, as Tampa businessman Bill Poe painfully discovered.

For auto insurers, the level of total claims is relatively stable from year to year, even though individual claims are dispersed over time. By contrast, catastrophe insurance is a "negative black swan" business, to follow Nassim Taleb's terminology. In Taleb's view, business managers can be lulled into ignoring certain extremely unlikely events ("black swans") just because of the remote chance of occurrence, even though the rare events have the ability to destroy their businesses. Hurricane insurers hum along merrily, racking up healthy profits, until the big one ravages the Atlantic coast, something that has little chance of happening but wreaks extreme damage when it does happen. A mega-hurricane could cause $100 billion in losses, fifty to a hundred times higher than the damage from the normal storm. The classic theory of insurance, which invokes the bell curve, breaks down at this point because of extreme variability and severe spatial concentration of this risk. When the black swan appears, a large portion of customers makes claims simultaneously, overwhelming insurers. These firms might still be solvent on average (meaning that over the long run, their premiums would cover all claims), but the moment cash balances turn negative, they implode. Indeed, catastrophe insurers who fail to plan for the variability of claims invariably find themselves watching in horror as one ill wind razes their entire surplus.

Statisticians not only notice variability but also recognize its type. The more moderate type of variability forms the foundation of the automotive insurance business, while the extreme type threatens the hurricane insurers. This is why the government "take-out" policy, in which the state of Florida subsidizes entrepreneurs to take over policies from failed insurers, made no sense; the concentrated risks and thin capital bases of these start-up firms render them singularly vulnerable to extreme events.

Variability is the reason why a steroid test can never be perfectly accurate. When the International Cycling Union (UCI), the governing body for cycling, instituted the hematocrit test as a makeshift method for catching EPO dopers, it did not designate a positive finding as a doping violation; rather, it set a threshold of 50 percent as the legally permissible hematocrit level for participation in the sport. This decision reflected UCI's desire to ameliorate the effect of any false-positive errors, at the expense of letting some dopers escape detection. If all normal men were to have red blood cells amounting to precisely 46 percent of their blood volume (and all dopers were to exceed 50 percent), then a perfect test could be devised, marking all samples with hematocrit levels over 46 percent as positive, and those below 46 percent as negative. In reality, it is the proverbial "average male" who comes in at 46 percent; the "normal" hematocrit level for men varies from 42 to 50 percent. This variability complicates the tester's job: someone with red cell density of, say, 52 percent can be a blood doper but can also be a "natural high," such as a highlander who, by virtue of habitat, has a higher hematocrit level than normal.

UCI has since instituted a proper urine test for EPO, the hormone abused by some endurance athletes to enhance the circulation of oxygen in their blood. Synthetic EPO, typically harvested from ovary cells of Chinese hamsters, is prescribed to treat anemia induced by kidney failure or cancer. (Researchers noted a portion of the annual sales of EPO could not be attributed to proper clinical use.) Because EPO is also naturally secreted by the kidneys, testers must distinguish between "natural highs" and "doping highs." Utilizing a technique known as isoelectric focusing, the urine test establishes the acidity profiles of EPO and its synthetic version, which are known to be different. Samples with a basic area percentage (BAP), an inverse measure of acidity, exceeding 80 percent were declared positive, and these results were attributed to illegal doping (see Figure C-1).

To minimize false-positive errors, timid testers set the threshold BAP to pass virtually all clean samples including "natural highs," which had the effect of also passing some "doping highs." This led Danish physician Rasmus Damsgaard to assert that many EPO-positive urine samples were idling in World Anti-Doping Agency (WADA) labs, their illicit contents undetected. If testers lowered the threshold, more dopers would get caught, but a few clean athletes would be falsely accused of doping. This trade-off is as undesirable as it is unavoidable. The inevitability stems from variability between urine samples: the wider the range of BAP, the harder it is to draw a line between natural and doping highs.
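The trade-off can be sketched numerically. The example below is an illustration only: it assumes, hypothetically, that BAP readings for clean and doped samples each follow a bell curve, with made-up means and spreads that do not come from any laboratory data. A sample is declared positive when its BAP exceeds the threshold.

```python
from statistics import NormalDist

# Hypothetical BAP distributions (percent); parameters are invented.
clean = NormalDist(mu=60, sigma=8)   # includes "natural highs" in the tail
doped = NormalDist(mu=85, sigma=8)


def error_rates(threshold):
    """Return (false-positive rate, false-negative rate) at a threshold."""
    false_positive = 1 - clean.cdf(threshold)  # clean sample above the line
    false_negative = doped.cdf(threshold)      # doped sample below the line
    return false_positive, false_negative


for threshold in (70, 75, 80):
    fp, fn = error_rates(threshold)
    print(f"threshold {threshold}%: "
          f"false positives {fp:.1%}, false negatives {fn:.1%}")
```

Moving the line in either direction shrinks one error only by enlarging the other; widening the assumed spreads makes both errors worse at every threshold.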

Figure C-1 Drawing a Line Between Natural and Doping Highs

Because the anti-doping laboratories face bad publicity for false positives (while false negatives are invisible unless the dopers confess), they calibrate the tests to minimize false accusations, which allows some athletes to get away with doping.

The Virtue of Being Wrong

The subject matter of statistics is variability, and statistical models are tools that examine why things vary. A disease outbreak model links causes to effects to tell us why some people fall ill while others do not; a credit-scoring model identifies correlated traits to describe which borrowers are likely to default on their loans and which will not. These two examples represent two valid modes of statistical modeling.

George Box is justly celebrated for his remark "All models are false but some are useful." The mark of great statisticians is their confidence in the face of fallibility. They recognize that no one can have a monopoly on the truth, which is unknowable as long as there is uncertainty in the world. But imperfect information does not intimidate them; they seek models that fit the available evidence more tightly than all alternatives. Box's writings on his experiences in the industry have inspired generations of statisticians; to get a flavor of his engaging style, see the collection Improving Almost Anything, lovingly produced by his former students.

More ink than necessary has been spilled on the dichotomy between correlation and causation. Asking for the umpteenth time whether correlation implies causation is pointless (we already know it does not). The question "Can correlation be useful without causation?" is much more worthy of exploration. Forgetting what the textbooks say, most practitioners believe the answer is quite often yes. In the case of credit scoring, correlation-based statistical models have been wildly successful even though they do not yield simple explanations for why one customer is a worse credit risk than another. The parallel development of this type of model by researchers in numerous fields, such as pattern recognition, machine learning, knowledge discovery, and data mining, also confirms its practical value.

In explaining how credit scoring works, statisticians emphasize the similarity between traditional and modern methods; much of the criticism leveled at credit-scoring technology applies equally to credit officers who make underwriting decisions by handcrafted rules. Credit scores and rules of thumb both rely on information from credit reports, such as outstanding account balances and past payment behavior, and such materials contain inaccurate data independently of the method of analysis. Typically, any rule discovered by the computer is a rule the credit officer would also use if he or she knew about it. While the complaints from consumer advocates seem reasonable, no one has yet proposed alternatives that can overcome the problems common to both systems. Statisticians prefer the credit-scoring approach because computers are much more efficient than loan officers at generating scoring rules, the resulting rules are more complex and more precise, and they can be applied uniformly to all loan applicants, ensuring fairness. Industry leaders concur, pointing out that the advent of credit scoring precipitated an explosion in consumer credit, which boosted consumer spending, hoisting up the U.S. economy for decades. Consider this: since the 1970s, credit granted to American consumers has exploded by 1,200 percent, while the deep recession that began in 2008 has led to retrenchment at less than 10 percent a year.

Statistical models do not relieve business managers of their responsibility to make prudent decisions. The credit-scoring algorithms make educated guesses on how likely each applicant will be to default on a loan but shed no light on how much risk an enterprise should shoulder. Two businesses with different appetites for risk will make different decisions, even if they use the same credit-scoring system.

When correlation is not enough to be useful without causation, the stakes get dramatically higher. Disease detectives must set their sights on the source of contaminated foods, as it is irresponsible to order food recalls, which cripple industries, based solely on evidence of correlation. The bagged spinach case of 2006 revealed the sophistication required to solve such a riddle. The epidemiologists used state-of-the-art statistical tools like the case-control study and information-sharing networks; because they respect the limits of these methods, they solicited help from laboratory and field personnel as well.

The case also demonstrated the formidable challenges of outbreak investigations: urgency mounted as more people reported sick, and key decisions had to be made under much uncertainty. In the bagged-spinach investigation, every piece of the puzzle fell neatly into place, allowing the complete causal path to be traced, from the infested farm to the infected stool. Investigators were incredibly lucky to capture the P227A lot code and discover the specific shift when the contamination had occurred. Many other investigations are less than perfect, and mistakes are not uncommon. For example, a Taco Bell outbreak in November 2006 was initially linked to green onions but later blamed on iceberg lettuce. In 2008, when the Food and Drug Administration (FDA) claimed tomatoes had caused a nationwide salmonella outbreak, stores and restaurants immediately yanked tomatoes from their offerings, only to discover later that they had been victims of a false alarm. Good statisticians are not daunted by these occasional failures. They understand the virtue in being wrong, as no model can be perfect; they particularly savor those days when everything works out, when we wonder how they manage to squeeze so much out of so little in such a short time.

Crossovers

Disney fans who use Len Testa's touring plans pack in an amazing number of attractions during their visits to Disney theme parks, about 70 percent more than the typical tourist; they also shave off three and a half hours of waiting time and are among the most gratified of Disney guests. In putting together these plans, Testa's team took advantage of correlations. Most of us realize that many factors influence wait times at a theme park, such as weather, holiday, time of the day, day of the week, crowd level, popularity of the ride, and early-entry mornings. Similar to credit-scoring technology, Testa's algorithm computed the relative importance of these factors. He told us that the popularity of rides and time of day matter the most (both rated 10), followed by crowd level (9), holiday (8), early-entry morning (5), day of week (2), and weather (1). Thus, in terms of total waiting time, there really was no such thing as an off-peak day or a bad-weather day. How did Testa know so much?

Testa embraced what epidemiologists proudly called "shoe leather," or a lot of walking. On any brilliant summer day in Orlando, Florida, Testa could be spotted among the jumpy 8:00 A.M. crowd at the gates of Walt Disney World, his ankles taped up and toes greased, psyched up for the rope drop. The entire day, he would be shuttling between rides. He would get neither in line nor on any ride; every half hour, upon finishing one loop, he would start over at the first ride. He would walk for nine hours, logging eighteen miles. To cover even more ground, he had a small staff take turns with different rides, all year round. In this way, they collected wait times at every ride every thirty minutes. Back at the office, the computers scanned for patterns.

Testa's model did not attempt to explain why certain times of the day were busier than others; it was enough to know which times to avoid. As interesting as it would be to know how each step of a touring plan decreased their wait times, Testa's millions of fans care about only one thing: whether the plan let them visit more rides, enhancing the value of their entry tickets. The legion of satisfied readers is testimony to the usefulness of this correlational model.

Polygraphs rely strictly on correlations between the act of lying and certain physiological metrics. Are correlations useful without causation? In this case, statisticians say no. To avoid falsely imprisoning innocent people based solely on evidence of correlation, they insist that lie detection technology adopt causal modeling of the type practiced in epidemiology. They caution against logical overreach: Liars breathe faster. Adam's breaths quickened. Therefore, Adam was a liar. Deception, or stress related to it, is only one of many possible causes for the increase in breathing rate, so variations in this or similar measures need not imply lying. As with epidemiologists studying spinach and E. coli, law enforcement officials must find corroborative evidence to strengthen their case, something rarely accomplished. A noteworthy finding of the 2002 NAS report was that scientific research into the causes of physiological changes associated with lying has not kept up with the spread of polygraphs. The distinguished review panel on the report underlined the need for coherent psychological theories that explain the connection between lying and various physiological measures.

For the same reason, data-mining models for detecting terrorists are both false and useless. Data-mining models uncover patterns of correlation. Statisticians tell us that rounding up suspects based on these models will inevitably ensnare hundreds or thousands of innocent citizens. Linking cause to effect requires a much more sophisticated, multidisciplinary approach, one that emphasizes shoe leather, otherwise known as human intelligence gathering.

The Dilemma of Being Together

In 2007, the average college-bound senior scored 502 in the Critical Reading (verbal) section of the SAT. In addition, girls performed just as well as boys (502 and 504, respectively), so nothing was lost by reporting the overall average score, and a bit of simplicity was gained. The same could not be said of blacks and whites, however, as the average black student tallied 433, almost 100 points below the average white student's score of 527. To aggregate or not to aggregate: that is the dilemma of being together. Should statisticians reveal several group averages or one overall average?

The rule of thumb is to keep groups together if they are alike and to set them apart if they are dissimilar. In our example, after the hurricane disasters of 2004-2005, insurers in Florida reassessed the risk exposure of coastal residents, deciding that the difference relative to inland properties had widened so drastically that the insurers could no longer justify keeping both groups together in an undifferentiated risk pool. Doing so would have been wildly unfair to the inland residents.

The issue of group differences is at the heart of the dilemma. When group differences exist, groups should be disaggregated. It is a small tragedy to have at our disposal ready-made groups to partition people into, such as racial groups, income groups, and geographical groups. This easy categorization conditions in us a cavalier attitude toward forming comparisons between blacks and whites, the rich and the poor, red and blue states, and so on. Statisticians tell us to examine such group differences carefully, as they frequently cover up nuances that break the general rule. For instance, the widely held notion that the rich vote Republican fell apart in a review of state-by-state data. Andrew Gelman, a statistician at Columbia University, found that this group difference in voting behavior surfaced in "poor" states like Mississippi but not in "rich" states like Connecticut. (See his fascinating book Red State, Blue State, Rich State, Poor State for more on this topic.) Similarly, the Golden Rule settlement failed because the procedure for screening out unfair test items lumped together students with divergent ability levels. The mix of ability levels among black students varied from that among whites, so this rule produced many false alarms, flagging questions as unfair even when they were not.

Statisticians regard this as an instance of the famous Simpson's paradox: the simultaneous and seemingly contradictory finding that no difference exists between high-ability blacks and high-ability whites; no difference exists between low-ability blacks and low-ability whites; and when both ability levels are combined, blacks fare significantly worse than whites. To our amazement, the act of aggregation manufactures an apparent racial gap!

Here is what one would expect: since the group differences are zero for both high- and low-ability groups, the combined difference should also be zero. Here is the paradox: the statistics show that in aggregate, whites outperform blacks by 80 points (the bottom row of Figure C-2). However, the confusion dissipates upon realizing that white students typically enjoy better educational resources than blacks, a fact acknowledged by the education community, so the average score for whites is more heavily weighted toward the score for high-ability students, and the average for blacks toward the score for low-ability students. In resolving the paradox, statisticians compute an average for each ability level so as to compare like with like. Simpson's paradox is a popular topic in statistics books, and it is a complicated concept at first glance.

Figure C-2 Aggregation Creates a Difference: An Illustration of Simpson's Paradox [image]
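The arithmetic behind the paradox is a weighted average, and a short sketch may make it easier to follow. The scores (600 and 400) and the ability mixes (60/40 and 20/80) are hypothetical numbers chosen so that the aggregate gap comes out at the 80 points quoted in the text; they are not taken from the figure, and the function name `group_average` is an assumption.

```python
def group_average(counts, scores):
    """Average score of a group, weighting each ability level by its headcount."""
    total_students = sum(counts.values())
    total_points = sum(counts[level] * scores[level] for level in counts)
    return total_points / total_students


# Hypothetical: within each ability level, both racial groups score the same.
scores = {"high": 600, "low": 400}

# Hypothetical ability mixes per 100 students; only the mix differs by group.
white_mix = {"high": 60, "low": 40}
black_mix = {"high": 20, "low": 80}

white_avg = group_average(white_mix, scores)
black_avg = group_average(black_mix, scores)

print("gap within high-ability students:", scores["high"] - scores["high"])
print("gap within low-ability students:", scores["low"] - scores["low"])
print("gap after aggregation:", white_avg - black_avg)
```

Under these assumptions the gap is zero inside each ability level, yet the aggregate averages differ by 80 points, purely because the two groups weight the same two scores differently.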

The recognition of Simpson's paradox led to a breakthrough in fair testing. The procedure for differential item functioning (DIF) analysis, introduced in Chapter 3, divides examinees into groups of like ability and then compares average correct rates within these groups. Benefiting from research by the Educational Testing Service (ETS) in the 1980s, DIF analysis has rapidly gained acceptance as the scientific standard. In practice, ETS uses five ability groups based on total test score. For the sake of simplicity, we only concerned ourselves with the case of two groups.

The strategy of stratification (analyzing groups separately) is one way to create like groups for comparison. A superior alternative strategy is randomization, when feasible. Statisticians frequently assign test subjects randomly into one group or another; say, in a clinical trial, they will select at random some patients to be given placebos, and the remainder to receive the medicine under study. Because of random assignment, the groups will have similar characteristics: the mix of races will be the same, the mix of ages will be the same, and so on. In this way, "all else being equal" is assured when one group is chosen for special treatment. If the treatment has an effect, the researcher does not have to worry about other contributing factors. While statisticians prefer randomization to stratification, the former strategy is sometimes infeasible. For example, in DIF analysis, social norms would prevent one from exposing some students randomly to higher-quality schools and others to lower-quality schools.

By contrast, the attempt by Florida insurers to disaggregate the hurricane risk pools pushed the entire industry to the brink in the late 2000s. This consequence is hardly surprising if we recall the basic principle of insurance: that participants agree to cross-subsidize each other in times of need. When the high-risk coastal policies are split off and passed to take-out companies with modest capital bases, such as Poe Financial Group, or to Citizens Property Insurance Corporation, the state-run insurer of last resort, these entities must shoulder a severe concentration of exposure, putting their very survival into serious question. In 2006, Poe became insolvent after 40 percent of its customers bled a surplus of ten years dry in just two seasons.

Crossovers

By Arnold Barnett's estimation, between 1987 and 1996, air carriers in the developing world sustained 74 percent of worldwide crash fatalities while operating only 18 percent of all flights (see Figure C-3a). If all airlines were equally safe, we would expect the developing-world carriers to share around 18 percent of fatalities. To many of us, the message could not be clearer: U.S. travelers should stick to U.S. airlines.

Yet Barnett contended that Americans gained nothing by "buying local," because developing-world carriers were just as safe as those in the developed world. He looked at the same numbers as most of us but arrived at an opposite conclusion, one rooted in the statistics of group differences. Barnett discovered that the developing-world airlines had a much better safety record on "between-worlds" routes than on other routes. Thus, lumping together all routes created the wrong impression.

Since domestic routes in most countries are dominated by home carriers, airlines compete with each other only on international routes; in other words, about the only time American travelers get to choose a developing-world carrier is when they are flying between the two worlds. Hence, only the between-worlds routes are relevant. On these relevant routes, over the same period, developing-world carriers suffered 55 percent of the fatalities while making 62 percent of the flights (see Figure C-3b). That indicates they weren't more dangerous than developed-world airlines.

Figure C-3 Stratifying Air Routes: Relative Proportion of Flights and Deaths by Developing-World and Developed-World Carriers, 1987-1996 [image]
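The comparison Barnett draws can be sketched as a ratio of two shares: the share of fatalities divided by the share of flights. Under equal safety the ratio should sit near 1. The four percentages below are the ones quoted in the text; the function name `share_ratio` is an assumption, and this crude ratio is only an illustration, not the statistical test Barnett actually ran.

```python
def share_ratio(death_share, flight_share):
    """Ratio of a carrier group's share of deaths to its share of flights.

    A value near 1 is what equal safety would predict; well above 1 suggests
    the group carries more than its proportional share of fatalities.
    """
    return death_share / flight_share


# Developing-world carriers, 1987-1996, percentages as quoted in the text.
all_routes = share_ratio(death_share=74, flight_share=18)
between_worlds = share_ratio(death_share=55, flight_share=62)

print(f"all routes:            {all_routes:.2f}")
print(f"between-worlds routes: {between_worlds:.2f}")
```

Lumping all routes together gives a ratio above 4, while restricting to the between-worlds routes gives a ratio below 1, which is the reversal the stratified view reveals.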

Group differences entered the picture again when comparing developed-world and developing-world carriers on between-worlds routes only. The existence of a group difference in fatality rates between the two airline groups is what would compel us to reject the equal-safety hypothesis.

Any stratification strategy should come with a big warning sign, statisticians caution. Beware the cherry-picker who draws attention only to one group out of many. If someone presented only Figure C-3b, we could miss the mediocre safety record of developing-world carriers on their domestic routes, surely something we ought to know while touring around a foreign country.

Such mischief of omission can generally be countered by asking for information on every group, whether relevant or not.

Stratification produces like groups for comparison. This procedure proved essential to the proper fairness review of questions on standardized tests. Epidemiologists have known about this idea since Sir Bradford Hill and Sir Richard Doll published their landmark 1950 study linking smoking to lung cancer, which heralded the case-control study as a viable method for comparing groups. Recall that Melissa Plantenga, the analyst in Oregon, was the first to identify the eventual culprit in the bagged-spinach case, and she based her hunch on a 450-item shotgun questionnaire, which revealed that four out of five sickened patients had consumed bagged spinach. Disease detectives cannot rely solely on what proportion of the "cases" (those patients who report sickness) were exposed to a particular food; they need a point of reference: the exposure rate of "controls" (those who are similar to the cases but not ill). A food should arouse suspicion only if the cases have a much higher exposure rate to it than do the controls. Statisticians carefully match cases and controls to rule out any other known factors that may also induce the illness in one group but not the other.

In 2005, a year before the large E. coli outbreak in spinach, prepackaged lettuce salad was blamed for another outbreak of E. coli, also of class O157:H7, in Minnesota. The investigators interviewed ten cases, with ages ranging from three to eighty-four, and recruited two to three controls, with matching age, for each case patient. In the case-control study, they determined that the odds of exposure to prepackaged lettuce salad were eight times larger for cases than for controls; other evidence subsequently confirmed this hypothesis.

The result of the study can also be expressed thus: among like people, those in the group who fell ill were much more likely to have consumed prepackaged lettuce salad than those in the group who did not become ill (see Figure C-4). In this sense, the case-control study is a literal implementation of comparing like with like. When like groups are found to be different, statisticians will treat them separately.

Figure C-4 The Case-Control Study: Comparing Like with Like [image]
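The "eight times larger" figure is an odds ratio, which can be sketched from a two-by-two table of exposure counts. The text reports ten cases and two to three controls per case but not the exposure counts themselves, so the counts below are hypothetical, picked only so that the ratio comes out at eight. The function name `odds_ratio` is an assumption, and the calculation is the simple unmatched version; a matched study like the one described would normally use a method that respects the matching.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Crude (unmatched) odds ratio from a 2x2 exposure table."""
    # Odds of exposure among the people who fell ill.
    case_odds = cases_exposed / cases_unexposed
    # Odds of exposure among the comparable people who stayed well.
    control_odds = controls_exposed / controls_unexposed
    return case_odds / control_odds


# Hypothetical counts: 10 cases (8 ate the salad), 24 controls (8 ate the salad).
estimate = odds_ratio(
    cases_exposed=8, cases_unexposed=2,
    controls_exposed=8, controls_unexposed=16,
)
print(f"odds ratio: {estimate:.1f}")
```

In this made-up table the cases have exposure odds of 4 to 1 and the controls 1 to 2, so the odds of exposure are eight times larger among the cases. The point of reference supplied by the controls is what turns "most patients ate the salad" into evidence.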

The Sway of Being Asymmetric

If all terrorists use barbecue as a code word and we know Joe is a terrorist, then we are certain Joe also uses the word barbecue. Applying a general truth (all terrorists) to a specific case (Joe the terrorist) is natural; going the other way, from the specific to the general, carries much peril, and that is the playground for statisticians. If we are told Joe the terrorist says "barbecue" a lot, we cannot be sure that all other terrorists also use that word, as even one counter-example invalidates the general rule.

Therefore, when making a generalization, statisticians always attach a margin of error, by which they admit a chance of mistake. The inaccuracy comes in two forms: false positives and false negatives, which are (unhelpfully) called type I and type II errors in statistics texts. They are better understood as false alarms and missed opportunities. Put differently, accuracy encompasses the ability to correctly detect positives as well as the ability to correctly detect negatives. In medical parlance, the ability to detect true positives is known as sensitivity, and the ability to detect true negatives is called specificity. Unfortunately, improving one type of accuracy inevitably leads to deterioration of the other. See the textbook Stats: Data and Models by Richard D. De Veaux for a formal discussion under the topic of hypothesis testing, and the series of illuminating expositions on the medical context by Douglas Altman, published in British Medical Journal.

When anti-doping laboratories set the legal limit for any banned substance, they also fix the trade-off between false positives and false negatives. Similarly, when researchers configure the computer program for the PCASS portable lie detector to attain desired proportions of red, yellow, and green results, they express their tolerance of one type of error against the other. What motivates these specific modes of operation? Our discussion pays particular attention to the effect of incentives. This element falls under the subject of decision theory, an area that has experienced a burst of activity by so-called behavioral social scientists.

In most real-life situations, the costs of the two errors are unequal or asymmetric, with one type being highly publicized and highly toxic, and the other going unnoticed. Such imbalance skews incentives. In steroid testing, false negatives are invisible unless the dopers confess, while false positives are invariably mocked in public. No wonder timid testers tend to underreport positives, providing inadvertent cover for many dopers. In national-security screening, false negatives could portend frightening disasters, while false positives are invisible until the authorities reverse their mistakes, and then only if the victims tell their tales. No wonder the U.S. Army configures the PCASS portable polygraph to minimize false negatives.

Not surprisingly, what holds sway with decision makers is the one error that can invite bad press. While their actions almost surely have made the other type of error worse, this effect is hidden from view and therefore neglected. Because of such incentives, we have to worry about false negatives in steroid testing and false positives in polygraph and terrorist screening. For each drug cheat caught by anti-doping labs, about ten other cheaters have escaped detection. For each terrorist trapped by polygraph screening, hundreds if not thousands of innocent citizens have been falsely implicated. These ratios are worse when the targets to be tested are rarer (and spies or terrorists are rare indeed).

The bestselling Freakonomics provides a marvelously readable overview of behavioral economics and incentives. The formulas for false positives and false negatives involve conditional probabilities and the famous Bayes' rule, a landmark of any introductory book on statistics or probability. For the sake of simplicity, textbook analysis often assumes the cost of each error to be the same. In practice, these costs tend to be unequal and influenced by societal goals such as fairness as well as individual characteristics such as integrity that may conflict with the objective of scientific accuracy.
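The claim that the ratios worsen as targets become rarer follows from Bayes' rule, and a short sketch shows why. The prevalence, sensitivity, and specificity values below are hypothetical, chosen only for illustration (the text does not give the accuracy of any real screening program), and the function name `false_alarms_per_catch` is an assumption.

```python
def false_alarms_per_catch(prevalence, sensitivity, specificity):
    """Expected number of innocent people flagged for each real target flagged."""
    # Fraction of the screened population that is a target AND tests positive.
    flagged_targets = prevalence * sensitivity
    # Fraction of the screened population that is innocent AND tests positive.
    flagged_innocents = (1 - prevalence) * (1 - specificity)
    return flagged_innocents / flagged_targets


# Hypothetical screen: catches 90% of targets, clears 99% of innocents.
common = false_alarms_per_catch(prevalence=1e-4, sensitivity=0.9, specificity=0.99)
rare = false_alarms_per_catch(prevalence=1e-5, sensitivity=0.9, specificity=0.99)

print(f"1 target in 10,000:  about {common:.0f} false alarms per catch")
print(f"1 target in 100,000: about {rare:.0f} false alarms per catch")
```

Even with a screen this accurate, one target in ten thousand yields roughly a hundred false alarms per catch, and making the target ten times rarer makes the ratio roughly ten times worse. The test itself has not changed; only the base rate has.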

Crossovers

Banks rely on credit scores to make decisions on whether to grant credit to loan applicants. Credit scores predict how likely customers are to repay their loans; arising from statistical models, the scores are subject to errors. Like polygraph examiners, loan officers have strong incentives to reduce false negatives at the expense of false positives. False-negative mistakes put money in the hands of people who will subsequently default on their loans, leading to bad debt, write-offs, or even insolvency for the banks. False-positive errors result in lost sales, as the banks deny worthy applicants who would otherwise have fulfilled their obligations. Notice, however, that false positives are invisible to the banks: once the customers have been denied loans, the banks cannot know whether they would have met their obligations to repay the loan. Unsurprisingly, such asymmetric costs coax loan officers into rejecting more good customers than necessary while reducing exposure to bad ones. It is no accident that these decisions are undertaken by the risk management department, rather than sales and marketing.

The incentive structure is never static; it changes with the business cycle. During the giant credit boom of the early 2000s, low interest rates pumped easy money into the economy and greased a cheap, abundant supply of loans of all types, raising the opportunity cost of false positives (missed sales). At the same time, the economic expansion lifted all boats and lessened the rate of default of the average borrower, curtailing the cost of false negatives (bad debt). Thus, bank managers were emboldened to chase higher sales at what they deemed lower risks. But there was no free lunch: dialing down false positives inevitably generated more false negatives, that is, more bad debt. Indeed, by the late 2000s, banks that had unwisely relaxed lending standards earlier in the decade sank under the weight of delinquent loans, which was a key factor that tipped the United States into recession.

Jeffrey Rosenthal applied some statistical thinking to prove that mom-and-pop store owners had defrauded Ontario's Encore lottery. Predictably, a howl of protests erupted from the accused. Leaders of the industry chimed in, too, condemning his damning report as "outrageous" and maintaining that store owners had "the highest level of integrity."

Was it a false alarm? From the statistical test, we know that if store owners had an equal chance in the lotteries as others, then the probability they could win at least 200 out of 5,713 prizes was one in a quindecillion (1 followed by forty-eight zeros), which was practically zero. Hence, Rosenthal rejected the no-fraud hypothesis as impossible. The suggestion that he had erred was tantamount to believing that the insiders had beaten the rarest of odds fair and square. The chance of this scenario occurring naturally (that is, the chance of a false alarm) would be exactly the previous probability. Thus, we are hard-pressed to doubt his conclusion.

(Recall that there is an unavoidable trade-off between false positives and false negatives. If Rosenthal chose to absorb a higher false-positive rate, and as much as one in a hundred is typical, he could reduce the chance of a false negative, which is the failure to expose dishonest store owners. This explains why he could reject the no-fraud hypothesis for western Canada as well, even though the odds of 1 in 2.3 million were higher.)

The Power of Being Impossible

Statistical thinking is absolutely central to the scientific method, which requires theories to generate testable hypotheses. Statisticians have created a robust framework for judging whether there is sufficient evidence to support a given hypothesis. This framework is known as statistical testing, also called hypothesis testing or significance testing. See De Veaux's textbook Stats: Data and Models for a typically fluent introduction to this vast subject.

Take the fear of flying on developing-world airlines. This anxiety is based on the hunch that air carriers in the developing world are more prone to fatal accidents than their counterparts in the developed world. Arnold Barnett turned around this hypothesis and reasoned as follows: if the two groups of carriers were equally safe, then crash fatalities during the past ten years should have been scattered randomly among the two groups in proportion to the mix of flights among them. Upon examining the flight data, Barnett did not find sufficient evidence to refute the equal-safety hypothesis.

All of Barnett's various inquiries (the comparison between developed-world and developing-world carriers, the comparison among U.S. domestic carriers) pointed to the same general result: that it was not impossible for these airlines to have equal safety. That was what he meant by passengers having "nowhere to run"; the next u
