This was a disappointing read, since I had wanted more details on Patterson’s research and thoughts about the evidence supporting that Tutankhamen was killed. While I wasn’t expecting a work of historical scholarship, I did not anticipate that he was going to dramatize his interpretation of this slice of Egyptian history. This would have been fine, but I will be honest and admit that I wasn’t in the mood for it. Especially since the writing style is clipped, with a Dick and Jane cadence. I do not care for it.

There were two reasons I did not like the book. The first is that Patterson talks up his research. Due to the exposition style, it was unclear how much research he had done, compared to pure invention. I don’t mean that Patterson did not get the dates and major events right, but since drama requires a bit more flavor, there are certainly liberties he took with constructing the details of Egyptian life. The dialogue is one example; the thoughts and motivations he ascribes to the pharaoh, queen, and the court functionaries are another. However, this wouldn’t be so bad if the pieces of research leading to his thesis, that King Tut was murdered, weren’t so weak.

The weakness in the evidence and the long build-up make up the second fault. As far as I could tell, Patterson calls this a homicide based on a cranial wound (as determined from CT scans of the mummy skull), the elevation of 3 pharaohs from Tut’s court, and the small tomb and lack of hieroglyphic records of Tut. The fresh piece of evidence is in fact the head wound. The rest of the evidence had been known, and certainly the circumstances described do not rule out murder. The fact that following Tut, all three subsequent rulers came from his court is consistent with foul play. First, Tut’s wife/sister succeeded him, then his court advisor, then his general. Human ambition being what it is, one can construct all sorts of stories about Tut’s wife and the court advisor. The lack of mention in the hieroglyphic record may be due to incompleteness in the record, although it could also be interpreted as the systematic obliteration of Tut’s legacy. Burying Tut in a small tomb also could indicate carelessness, or at least indifference, in how the pharaoh was laid to rest. But it might just mean that Tut was not liked. Or it could mean the murderer was going through the motions of the burial. But then why would the murderer line the small tomb with treasure? One might think the head wound would prove crucial to Patterson’s case, the piece that tips the theory in favor of murder.

Yet Patterson, in his dramatization, documents the wound as stemming from a chariot fall. Hmm. And, during the assassination scene, the killer supposedly suffocated the pharaoh (fine, that was fiction. I suppose Patterson found it to be weird to have the killer strike the pharaoh on the exact spot injured from the fall – there was only a single wound to the head.) So the smashing new bit of insight wasn’t even used to weave a consistent story regarding the murder of Tut. That I found strange. The lead-up to the supposed new piece of evidence did not pay off. That would be fine for any writer but Patterson: he is a writer of detective stories. Are his other books so poorly tied together?

Although I had been expecting something a bit more serious (it certainly makes for good copy for a detective story writer to do a bit of crime investigation), the fact that the historical tidbits were translated into a story didn’t bother me, in and of itself. Yes, there are issues concerning the provenance of each detail, but as a whole, it works as one amateur’s interpretation of how Egypt’s ruling class lived. At some point, with the difficulty in translating hieroglyphics and the length of time separating us from the pharaohs, a scholar’s educated reconstruction of how these Egyptians lived may not fare any better than what Patterson can invent based on his research.

There were also other minor problems. Patterson wove three stories together: the story of the pharaohs, Patterson’s modern-day research, and Howard Carter’s excavation of Egypt and his finding Tut’s tomb. Patterson, on two occasions, wrote of Carter’s removal from active excavation, and merely alluded to Carter’s personality clashes with his superiors. But somehow, Patterson did not recount the details of the arguments that led to Carter’s removal. He simply wrote that Carter was about to flout the wrong people… and left it at that.

So, the major problem was that Patterson played up the historical research he and his co-author performed. It may have been submerged into the background details of the pharaoh’s story. But Patterson didn’t describe in clear terms what new evidence he had, and the story he wrote differed in interpretation, but not substance, from what was already known. And given the circumstantial evidence surrounding Tut’s tomb and succession, it seems strange that no one had posited that Tut was murdered, as Patterson seems to be suggesting.

What a strange book. The whole point of the book is to trash intellectuals who idealize the pursuit of freedom (in behavior, in intellectual pursuits, or from society). Paul Johnson admitted that it was unfair to use the private lives of individuals to judge the strength of their thoughts, but nonetheless he spent the entire book documenting the deficiencies of men who talked big and lived meanly. The quality of the men never matched the beauty of their vision, prose, or poetry.

The futility of such an exercise is noted early, in the chapter about Shelley. Johnson admits that this cad was a wastrel who had no compunction about writing mean letters detailing the failures of his parents while concurrently asking for money. Shelley used people, seeing his family as nothing but a source of income and women as no more than a means for physical pleasure. Naturally, he thought himself liberal, dispensing with archaic institutions of monogamy. He expected his wife to accept his mistress sharing their apartment, but he graciously extended the same privilege to his wife (who apparently complained about this arrangement).

Regardless, all this is peripheral: Johnson thinks Shelley wrote beautifully, and his poetry moved Johnson. Johnson writes,

“The truth, however, is fundamentally different and to anyone who reveres Shelley as a poet (as I do) it is deeply disturbing. It emerges from a variety of sources, one of the most important of which is Shelley’s own letters.”

Great. But why should the gap between artisanal accomplishments and the empty lives of artists be so surprising, in an age when starlets, athletes, politicians, authors, musicians, and entertainers behave as if they were competing for the favor of the Borgias? Johnson already conceded the point that he can appreciate the artistry, if not the artist.

There was one high point in the book, though. Johnson destroyed Karl Marx on both a personal and professional level. In this instance, it seems that there are elements in Marx’s personality that might have directly resulted in the shoddy intellectual quality of his work. Marx made a better short form than long form writer; the long form exposed Marx’s deficiencies as a researcher and investigator. Das Kapital contained a number of misuses of evidence. Marx did do a spectacular job of digging up dirt on his enemies, though.

In a coda, Johnson links 20th century atrocities to both secular intellectuals ignoring atrocities committed in their name and to the social milieu they created that promoted nihilism (namely in the excesses of Communist regimes). It seems to me a simpler case that these mass murderers were ambitious, ruthless, and disposed to murder even before they encountered post-modern philosophy. As much as I detest social relativism, post-modernism, and religious dogma, I can’t fault these ideas as causing mass effects. I can, however, fault the men who, upon gaining power to commit atrocities, cloak their acts in the trappings of a recognizable philosophy. To suggest that terrorists or dictators valued life until reading a book seems to be placing the cart before the horse.

In the end, I do agree with Johnson in that it is so disappointing that philosophers rarely reach the ideals they espouse. So what else is new?

Statistical certainty

January 2, 2010

I read Bill Simmons’s The Book of Basketball. I enjoyed his book, as it is a fun survey of NBA history. The book isn’t just a numbers game or just breaking down plays. It includes enough human interest elements that it should appeal to a casual fan or diffident parties (like me; I can count the number of basketball games I’ve seen – TV or live – on both hands.) Simmons does a fantastic job of conveying his love of basketball. For me, he really brought different basketball eras to life, inserting comments from players, coaches, and sportswriters. He also seems fairly astute in breaking down plays and describing the flow of the game.

Yes, I bought the book because I like Bill Simmons’s writing. If you enjoy his blog, you will find that same breezy conversational style here. The man has a gift for dropping pop culture references and making them germane to his arguments. But what I like most is that he is earnest in trying to understand and to make his readers appreciate the people who play a game for a living.

His segment on Elgin Baylor was moving, in showing how racism affected this one man; in some ways, it was probably more effective than if he just talked in general terms about the 1960’s. His whole book works because it stays at the personal level. Even in his discussion of teams and individual players, he takes pains to discuss how this person was and is regarded by his peers and teammates.

In this way, I think Simmons did a fantastic job of making a case that basketball can contain as much historical perspective as baseball. This is something that should not have to be argued. Baseball has a lock on “the generational game by which history can be measured” status. What seems important is that there are human elements that make it accessible between generations: things like fathers taking their sons to the games, talking about the games and players, the excitement of watching breathtaking physical acts that expand how one views the human condition, and the joy and agony of championship wins and losses. While baseball’s slow pace lends itself to the way history moves (periods where nothing seems to happen punctuated by drama), it doesn’t mean other things happen in a vacuum. Style of play, the way the players are treated, and the composition of the player demographic all reflect the times. These games can be a reflection of society, and one can see the influence of racial injustice in something as mundane as box scores as integration occurred.

Simmons blends basketball performance, its history, and its social environment effectively; some examples can be found in his discussion of Dr. J, Russell, Baylor, Kareem, and Jordan. In discussing why there probably won’t be another Michael Jordan (or Hakeem, or Kevin McHale), he takes inventive routes. Most of his points relate to societal/basketball environment pressures. Players are drafted sooner, the high pay scale for draft picks lowers their motivation to prove their worth, and perhaps society itself would actively discourage players from behaving as competitively as Jordan did. I suppose it’s interesting, but I’m not sure if that matters so much if the player is perceived to be an excellent player. Regardless, it seems to me that Simmons has been thinking about these things for some time. And I found it fun to read his take on basketball.

And I liked this book because it gives the lie to the weird view that someone who hasn’t done something cannot make reasonable, intelligent statements about it. Simmons wasn’t a professional basketball player, but he certainly uses every resource available to absorb the history and characters populating the game. He read a fair bit, he watched and rewatched games, he talked to players, he talked to people who covered basketball and he watched some more.  And he isn’t afraid to raise issues that occur to readers; you’ll see what I mean when you read his footnotes.

The book (and his podcast) confirms my opinion of Simmons as the smart friend who’d be a blast to have (one who bleeds Celtics green, watches sports for a living, and must keep up with Hollywood gossip, gambling, and pop culture because it gives him ammunition for columns).

***

There are some issues with the book, mainly in how statistical analysis of basketball is portrayed. I should be upfront and say that these issues did not detract from his arguments (for reasons that will be clear later), but I wish he would reconcile eyeball and statistical information.  And because I’ve decided one focus of this blog should be how non-scientists deal with science (and scientists), I thought I should offer some thoughts on some of these issues.

I am somewhat undecided about how Simmons (and I suppose I am using him as a proxy for all “non-scientists”) actually feels about statistics. He claims that team sports like basketball and football are fundamentally different from baseball; the team component of the former increases the number of additive and subtractive interactions while the latter game is composed of individual units of performance. Thus the increase in complexity makes it difficult to model. So he discards so-called simple measures of NBA player performance like WP48, PER, and adjusted plus-minus.

His rationale is that these indicators ought to back up existing observations about NBA players. So Kobe Bryant needs to be ranked as a top-20 player of all time (WP48 ranks Bryant as a superior player – like Paul Pierce – and not a step or two behind Michael Jordan.) It seems like he wants statistics to tell him what he wants to hear, when in fact statistics helps you see things you don’t see.

But then that leads to my second point about Simmons: why does he need the model to back up his mental model of player performance? Put differently, why is it that he cannot accept differences in rankings calculated by some turn-the-crank-spit-out-value model? I think Simmons lacks a nuanced view of how these numbers ought to be interpreted, and that he refuses to see that a simple model can capture a great many things about a complex system. Sure, once you’ve set up your criteria (like some level of significance you are willing to accept), you align everything by it, but there is room for some judgement as to where that line is drawn.

Another way of describing a complex system is to say that there are many things going on at once, and they are all interacting in some way. There are 10 players on a basketball court. One player, with the ball, has options to pass, to shoot, or to move the ball. Within each of these options, he has a set of suboptions: which one of the other four guys do I pass to? Who’s open? Which open player has a good shot from where he is? Am I in my optimal position to shoot? Do I need to drive to the basket or kick the ball out to the perimeter? There are many more possibilities than these.

***

At one level, Simmons is right; it is useful to break things down into “hyperintelligent” stats – identifying the tendencies of a player (whether he likes breaking to his left or right when he starts driving from the top of the key, whether he is equally good in shooting from his left or right hand, how often he does a turnaround, fadeaway, or drives to the hoop), trying to figure out how many forced errors a defender creates, how often unforced turnovers happen (like someone dribbling off his foot), how many blocks get slapped out of bounds vs being tipped to get possession, and so on.

But isn’t it just as intelligent to find an easy way of collapsing the complex game into a simple “x + y” formula? On several occasions, Simmons uses a short quote (and praises the person who said it) that captures everything he wanted to say in 15 pages. A simple model is analogous to that short quote.

More importantly, what if we didn’t need all these hyperintelligent stats to capture the essence of the game?

I just switched the problem from one of identifying player performance and productivity to one that captures the game in broad strokes. The two ideas are of course related but still distinct and should not be confused to mean the same thing.

This gets back to the original motives of the person who does the modeling.

If it’s a scientist or economist, I’ll tell you now that he is interested in getting the most impact with the least amount of work. He probably has to teach, run a lab/research program, and write grants and publications. He doesn’t have time to break game film down. And he certainly does not have the money to hire someone to look at game film (although I am sure he’ll have no lack of applicants for the job.) He spends his money finding people to do research and teach. If his research program is into finding ways to measure worker productivity, he will probably start with existing resources. So fine; he now has a database of NBA player box scores.

He’ll want to link these simple measures of player output to wins and losses. But players score points, not wins, and thankfully the difference between points scored and points given up correlates extremely well with wins and losses.

From there, it is relatively simple to do a linear regression for all players for all teams, finding how each of the box score stats relates to the overall points scored for each team. And as noted, some metrics have a higher correlation to the point difference (I will not use the term differential to mean difference; differential belongs to diff EQ’s.) Regardless, it seems an affliction for males that they rank things; so the researchers have these numbers, and it’s trivial to list players from high to low.
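To make the "turn the crank" step concrete, here is a minimal sketch of an ordinary least squares fit of point difference on a few box score totals. Everything in it is an assumption for illustration: the data are synthetic, the three stats and their weights (`TRUE_WEIGHTS`) are made up, and the helper names `solve` and `ols` are hypothetical. It is not Berri's model or his coefficients; it only shows that a regression recovers how much each stat moves the point difference.

```python
# Toy illustration only: synthetic team-games generated from made-up weights.
# These are NOT real NBA data and NOT Berri's published coefficients.
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, ..., coef_k].
    """
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(xs, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical "true" effect of each stat on point difference.
TRUE_WEIGHTS = {"rebounds": 1.0, "turnovers": -1.0, "assists": 0.5}

rng = random.Random(0)
rows, point_diff = [], []
for _ in range(200):  # 200 synthetic team-games
    reb = rng.uniform(30, 55)
    tov = rng.uniform(8, 20)
    ast = rng.uniform(15, 30)
    noise = rng.gauss(0, 2.0)  # everything the box score does not capture
    rows.append((reb, tov, ast))
    point_diff.append(
        TRUE_WEIGHTS["rebounds"] * reb
        + TRUE_WEIGHTS["turnovers"] * tov
        + TRUE_WEIGHTS["assists"] * ast
        - 50.0
        + noise
    )

intercept, b_reb, b_tov, b_ast = ols(rows, point_diff)
print(f"rebounds {b_reb:+.2f}, turnovers {b_tov:+.2f}, assists {b_ast:+.2f}")
```

The fitted coefficients land close to the weights used to generate the data, noise and all. Once each stat has a weight, scoring and ranking players is the trivial part.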

Now, here’s another consideration. In this, and in other branches of science, the data are not “clean”. That is, we scientists (generally) assume that the phenomenon we are observing conforms to a “normal” distribution – that is, there is some true state for the thing we observe (found by taking the average of our observations) and the individual pieces of observation hover around this true state (or average). So there is variation around the mean.

In my research, for example, I can measure neural responses in the olfactory bulb. I use optical indicators of neural activity; essentially, the olfactory bulb lights up with odor stimulation. The more the neurons respond, the brighter things get. The olfactory bulb is separated into these circular structures called glomeruli. Each glomerulus receives connections from the sensory neurons situated in the nose and the output neurons of the olfactory bulb (some other cells are also present, but they aren’t important for this story.)

When a smell is detected by humans (or animals and insects), what we mean is that some chemical from the odor source has been carried, through the air, into the nose and neurons become active (they fire “action potential spikes”). And the pattern of this activity, at the olfactory bulb, is quite similar – but not exactly the same – from animal to animal.

Sometimes, we see fewer responses to the same smell. Other times, we see a few more responses. Sometimes we see a different pattern from what we expect. Sometimes, we see no responses. This might happen once every 15 animals. Not a whole lot to take away from our general, broad stroke understanding of how this part of the brain processes smell information. In most cases, some of these things might be explained technically; the animal was in poor health, or our stimulus apparatus has a leak, or the smell compound is degraded. We know this because we can improve the signal by fixing the equipment or giving the animal a drug to clear up its nose (mucus secretion – snot! – is a problem).

And as a direct analogy to this WP48 vs “hyperintelligent stats” problem, we find that a complex smell (composed of hundreds of different chemicals) may be “recreated” by using a few of these chemicals. There is good empirical evidence this is the case: prepared food manufacturers and fragrance makers can mimic smells and flavors reasonably well. This is akin to capturing the essence of the smell (or sport) with a few simple chemicals (or box scores). And generally, we don’t even need people to describe to us what they smell to figure this out (i.e. break down game film to create detailed stats). We can simply force them to answer a simple question: do these two things smell the same to you, yes or no? Thus “complex” brain processes and decision making can be boiled down into a forced-choice test result. Do we lose information? Yes, but everyone realizes this is a start. As we know more, and new technology becomes available, we can do more and ask more with less effort. Then we will be able to better use the information we have. As far as I know, most statheads have access to box scores (although there is nothing to stop them from breaking down game film aside from time and money issues.)

But that’s the broad strokes view. If we get into details (that is, as if we started working with the “hyperintelligent” stat breakdowns), we find that of course there is more going on, and that the differences we see are not only technical issues. For example, the pattern of activity we see differs slightly from animal to animal, but this is because the cells that form connections with the olfactory bulb do not hit the same spot. And if we can use a single chemical to recreate a smell, the smell itself is still different enough that humans generally can tell something is missing. So the other chemicals are in fact detected and contributing some information that the brain uses to form the sensation of smell. And we know that the way neurons respond to a single chemical differs from how they respond to a mixture, confirming that there is in fact additional information being transmitted.

The important point is that the simple model captures an important part, but not all, of the complex system. One problem that can occur with increasing the complexity of models is that overfitting occurs: the model becomes applicable to one small part, rather than the whole, system. Even game film breakdown hinders  if it gives you so many options that you are back where you started. You’d probably avoid focusing on rare events and just concentrate on the things that happen often – which, again, is the point of a simple model.

The intense breakdown of game film to provide detailed portraits of player effectiveness could be combined with the broad strokes analysis. A metric like WP48 can tell a coach where a player is deficient. The coach can use the detailed breakdown to figure out why the player isn’t rebounding, passing, shooting well, and so on. That’s where things like defensive pressure, help defense, and positional analysis can be used for further evaluation. And I’m not sure that statheads ever argued otherwise.

Deficiencies of statistical models

As in the things that models explicitly ignore.

One thing statistical models do not address is the fan’s enjoyment of a player. Actually, I suppose one might be able to simply chart percent-capacity of stadiums when a particular player comes to town, but that’s something I don’t think Simmons would argue. There’s something to be said about how a player scores: Simmons pays tribute to Russell and Baylor, the first players to make basketball a vertical game. He cites Dr. J. as introducing the urban playground style into basketball. He loves talking about the egos of players, especially when players take MVP snubs personally and then dominate the so-called MVP in a subsequent game.

Simmons also offers a rebuttal to PER, adjusted plus/minus, and “Wages of Wins” metrics in his ranking of Allen Iverson – by saying that he doesn’t care. It’s sufficient for him that he finds Iverson a presence on the court. Iverson’s emotions are acted out as basketball plays. Simmons finds Iverson’s toughness and anger on the court fascinating to watch.

But Simmons does use metrics: the standard box scores. I would ask this: if Iverson didn’t score as much as he did, would Simmons still care? As Berri has noted, the rankings by sportswriters, the salaries given to scorers, and PER rankings all correlate highly with volume scoring (i.e. the points total, not field-goal percentage). Despite the tortured arguments writers might make, and the lip service given to building a lineup with complete players, “good” players are players who score a lot.

However, I should be clear and say that Simmons’s approach does not detract from his defense of his rankings. He uses player and coach testimonies, historical relevance, visual appeal of their playing style, sports writers, and the box scores to generate a living portrait of these players as people. Outside of the box scores, there is enough grist for the mill. I would suggest that it is these arguments that make the whole argument process fun. Even in baseball, supposedly the sport with the most statistically validated models of player performance (and Berri would argue that basketball players and their contribution to team records are even more consistent), there are enough differences of opinion concerning impact, playing styles, and relevance to confound Hall of Fame/MVP arguments (see Joe Posnanski).

Because Simmons is upfront about his criteria (even if the judgement of each might be not as “objective” as a number), it is fine for him to weight non-statistical arguments for greatness. It’s how he defined the game. Just as Berri defined “player productivity” in terms of his WP48 metric. Because Berri publishes in peer-reviewed journals, he needs methods that are reproducible. Science, and in general the peer review process, is a different process than writing books or Hall-of-Fame arguments or historical rankings. The implicit understanding of peer-review is that the work is technically sound and reproducible. Berri cannot take the chance of publishing a Simmons-like set of criteria and have other sports economists “turn the crank” and come out with different rankings. But Berri can publish an algorithm, and proper implementation will yield the same results.

Does this mean that Berri is right? Or that a formula is better than Simmons’s criteria? Mostly no. The one time where it is “better” is when one is preparing the analysis for peer-review. In this case, it is nicer to have a formula, or a process, or a set of instructions, that yields the same result each and every time the experiment is run. In other words, we try to remove our bias as much as possible. Bias here does not mean anything pernicious; it just is a catch-all term for how we think a certain way (with our own gut feelings about the validity of ideas and research direction). Being objective simply means we try to make sure that our interpretation conforms to the data, and that the work is good enough so that other researchers come to the same general conclusions.

I think Simmons actually doesn’t need to trash statistics, nor does he need to ignore it. Once he establishes ground rules, he can emphasize or deemphasize how important box scores are in his evaluation. As it is, I found his arguments compelling. His strength, again, is to make basketball history an organic thing. He does his best to eliminate the “you had to be there” barrier and tries to place the players in the context of their time.

Now, one might ask why stats can’t be used to resolve these arguments about all time greats. Leaving aside the issue of the different eras (and frankly, this can be addressed by normalizing performance scores to the standard deviation for a given time period, as Berri does here), there is the issue of what the differences in these metrics mean. In the same article I cited, Berri reports that the standard deviation for the performance of all power forwards, defined by his WP48 metric, is about .110. His average basketball player has a WP48 of .100. Kevin Garnett, for example, has a WP48 (2002-2003) of 0.443. That translates roughly to Garnett being more than 4x as productive as an average player, but normalized to the standard deviation, he is only 3.5x as productive.
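A quick check of that arithmetic, using only the three figures quoted above (the variable names are mine, and I am assuming the .100 average applies to power forwards as well). The first number is the plain ratio to an average player; the second is the usual z-score, the distance from the average measured in standard deviations.

```python
# Figures as quoted in the paragraph above; variable names are hypothetical.
average_wp48 = 0.100          # Berri's average player
power_forward_std = 0.110     # standard deviation among power forwards
garnett_wp48 = 0.443          # Kevin Garnett, 2002-2003

# "How many times an average player": a simple ratio.
ratio_to_average = garnett_wp48 / average_wp48

# "Normalized to the standard deviation": distance from average in SD units.
z_score = (garnett_wp48 - average_wp48) / power_forward_std

print(f"ratio to average: {ratio_to_average:.2f}x")
print(f"standard deviations above average: {z_score:.2f}")
```

With just these inputs the z-score comes out nearer 3.1 than 3.5. I assume the gap is rounding, or a position-specific average in the source article, but that is a guess on my part. Either way the point stands: the ratio and the normalized score tell different stories about the same player.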

But how much different is a power forward from Kevin Garnett if the other forward has a WP48 of 0.343? One might interpret this to mean that Garnett is still nearly 1 standard deviation better than the other player, but it could also mean that their performances fall within 1 standard deviation of each other. Depending on the variation of each player’s performance for a given year, compared to his career mean, they could be statistically similar. That is, the difference might be accounted for by the “noise” in slight upticks/downticks in rebounds/assists/steals/turnovers/shooting percentages/blocks. If you prefer, how about the difference between a .300 hitter and a .330 hitter? Over 500 at-bats, the .300 hitter has 150 hits, and the .330 hitter has 165; the difference would be 15 hits over the course of a season. Are the two hitters really that different? The answer would depend on the variability of batting average (for the compared players) and how these numbers look with a larger sample set (i.e. over a career with over 5000 at-bats, for instance.) The context for the difference must be analyzed.
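Here is a rough sketch of the batting average comparison. It assumes each at-bat is an independent coin flip with a fixed hit probability (a binomial model, which real hitters certainly violate), and the function name is hypothetical. It expresses the .030 gap in units of its own standard error, for one season and for a career.

```python
import math


def gap_in_standard_errors(avg_a, avg_b, at_bats):
    """Gap between two batting averages divided by the standard error of that gap.

    Assumes independent at-bats with a fixed hit probability for each hitter
    (a simplifying binomial assumption) and the same number of at-bats for both.
    """
    var_a = avg_a * (1 - avg_a) / at_bats
    var_b = avg_b * (1 - avg_b) / at_bats
    return (avg_b - avg_a) / math.sqrt(var_a + var_b)


hits_300 = round(0.300 * 500)   # 150 hits
hits_330 = round(0.330 * 500)   # 165 hits

season_gap = gap_in_standard_errors(0.300, 0.330, 500)    # one season
career_gap = gap_in_standard_errors(0.300, 0.330, 5000)   # a long career

print(f"hit difference over 500 at-bats: {hits_330 - hits_300}")
print(f"gap after 500 at-bats:  {season_gap:.2f} standard errors")
print(f"gap after 5000 at-bats: {career_gap:.2f} standard errors")
```

Under these assumptions the gap is about one standard error after a single season, which is hard to tell apart from luck, and a bit over three after 5000 at-bats, where it starts to look real. Same two averages, very different confidence.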

Here’s another example: let’s assume that Simmons and Berri’s metric turned out similar listings, perhaps with different order (one difference is that Iverson would be nowhere near Berri’s top 96.) And further, let us assume that the career WP48 scores are essentially within 1.5 standard deviations of one another. How might Simmons break with the WP48 rankings?

Let us tackle how Berri would have constructed his ranking: he would simply list players from highest to lowest WP48. That’s probably because he is in peer-review article mode. And frankly, if you profess to have a metric, why would you throw it out? You might if, like Simmons, you defined the argument differently. For his Pyramid of Fame rankings, he lists a few arguments that do not encompass basketball productivity. Again, the idea of historical relevance, player/coach testimony, and the style and flair of the players enter into Simmons’s arguments. So all things being equal, and if the difference in rankings by metric is slight, there really is no reason to weigh the statistics more than any other attribute. Heck, even if the metric differences are large, it wouldn’t matter. Simmons likes his other arguments more anyway.

But if you do talk about the actions on the court, then I believe you are in fact constrained. Of the metrics I had mentioned, WP48 offers high correlation with point-difference and thus with win-loss records. Further, some of the other metrics actually correlate with points-scored by players, suggesting that there is no difference between that metric and simply looking at the aggregate point total. So there are actually models that do reasonably well in predicting and “explaining” the mechanics of how teams win and lose.

In a way, I think the power of a proper metric is not in ranking similarly “productive” players, but in identifying the surprisingly bad or good players. Iverson is an example of the former; Josh Smith (of the 2009-2010 Hawks) of the latter. It might not be as powerful a separator of players with similar scores, because their means essentially fall within 1 standard deviation of one another; in essence, they are statistically the same. In this case, it helps to have other information to aid evaluation (and this isn’t easy; as Malcolm Gladwell has written, and Steven Pinker has taken issue with, some measuring sticks are less reliable than others.)

Another example where statistics is powerful is in determining, in the aggregate, if player performance varies from year to year. Berri found that it doesn’t, suggesting that the impact of coaching and teammate changes may not be as high as one thinks. However, such a finding in no way precludes coaches and teammates from having an effect on a given player. It just means that such cases are too few to affect the mean. Or perhaps it suggests that coaches are not using information properly to make adjustments that are meaningful to player performance. Overall, I suppose, one reason Simmons hates advanced stats and rankings is that he isn’t sensitive to the importance of standard deviation, and ironically enough, he applies the mean tyrannically when there is such a concept as statistical insignificance.

But Berri has never pushed his work as a full explanation of the game of basketball. First, he doesn’t present in-game summaries: he only looks at averages over time. There’s nothing in his stat to indicate the ups and downs (i.e. standard deviation in performance) a player experiences from game to game. Even in baseball, hitting .333 does not guarantee a hit every 3 at-bats. It just means that over time, a hitter’s hit streaks and lulls add up to some number that is a third of his at-bats. Berri’s metric (and any other work that proposes to measure player performance) certainly cannot predict what a given box score would be, for a given game, for a given player.
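The baseball point can be worked out directly. This is a small sketch that treats each at-bat as an independent event with a fixed 1/3 chance of a hit (my simplifying assumption; real hitters are streakier than that) and uses the binomial formula to ask how often a .333 hitter gets exactly 0, 1, 2, or 3 hits in 3 at-bats.

```python
from math import comb

def prob_hits(k, n, p):
    """Binomial probability of exactly k hits in n at-bats when each
    at-bat is an independent hit with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 1 / 3  # a .333 hitter, treated as a fixed per-at-bat probability
for k in range(4):
    print(k, round(prob_hits(k, 3, p), 3))
```

Under these assumptions, exactly one hit in three at-bats happens a bit less than half the time (4/9), and a hitless stretch of three at-bats happens about 30% of the time (8/27). The average says nothing about which of those you will see on a given night.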

Regardless, I do not see a problem with Simmons’s ranking his players. Simply, he values entertainment value as much as production. I would say he values the swings in performance just as much, if not more (more on this later). Yes, he says stats do not matter, but of course they do. It’s interesting that the scoring lines he cites in admiration all lead with a high score or score per game. And if you can’t shoot, rebound, pass, steal, or block, and you cough the ball up a lot, it wouldn’t matter how pretty you make everything look.

No-no’s

Joe Posnanski has pointed out that, whenever someone trashes stats, he tends to offer some other supplemental numbers that back up his point. In other words, the disagreement isn’t about statistics per se, but about the distinction between “obvious” stats and “convoluted” stats.

Even if one disagrees with basketball statistics, at least he can believe that statheads came up with a formula first and turned the crank before comparing the readout with their perceptions of players. Hence Simmons blowing up when PER or WP48 doesn’t rank his favorites highly.

Simmons approaches this from the opposite direction. He has an outcome in mind and “builds” a stat/model to fit it (like his 42-Club). But he mistakes his way of tinkering for what modelers actually do. Berri arrived at his model by performing linear regression on particular box score statistics and seeing whether the point-difference increased. It isn’t an arbitrary way of deriving some easy-to-use formulation. The regression coefficients are meaningful in that what they say is: if you increase shooting percentage by this amount, the point-difference goes up by that amount. It so happens that points scored by a player did not increase the point-difference. And he built it by using all players; it’s strange to decide beforehand what players are great, and then build a metric around that. Why even bother in the first place?
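For readers who have not run a regression, here is a stripped-down sketch of what "turning the crank" looks like. It fits a single predictor by ordinary least squares; Berri's actual model uses many box score statistics at once, and the numbers below are made up purely for illustration. The function name `ols_slope_intercept` is mine.

```python
from statistics import mean

def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up team-level numbers, for illustration only:
# shooting percentage (in points) and average point difference per game.
shooting_pct = [43.0, 44.0, 45.0, 46.0, 47.0, 48.0]
point_diff = [-5.5, -2.5, -1.0, 1.5, 2.0, 5.5]

slope, intercept = ols_slope_intercept(shooting_pct, point_diff)
print(f"each extra point of shooting percentage ~ {slope:.2f} points of margin")
```

The slope is the coefficient: it comes out of the data for all the teams fed in, rather than being chosen so that a favorite player lands at the top.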

And for Berri to report differently on these aggregate data because Kobe isn’t ranked any higher would actually amount to scientific fraud. But as I noted above, applying these WP48 rankings isn’t as hard and firm a process as Simmons thinks. There is some room for flexibility, depending on what one tries to accomplish.

In general, I agree that more breakdowns of the game would be useful, in the sense that more data is always nice. The problem, for academics, is that these stats might remain proprietary, and it becomes difficult to apply them across all teams. Even if we could get all the “hyperintelligent” stat breakdowns from a single team, it is unclear if other teams would view the breakdown in the same way. The utility for examining general questions about worker (i.e. player) productivity for academic publication becomes less clear. The database ought to help the teams – assuming they are intellectually honest enough to verify that their stats produce a better picture of player productivity and aren’t impressed by the gee-whiz-ness of it all. My guess is that they won’t be entirely successful, as Simmons still has a job trashing bad GM decisions.

Standard Deviations

Why I watch sports: it seems to be similar to the way Simmons does. He watches over a thousand hours of sports each year, waiting for the chance to see something he has never seen before. Something that stretches the imagination and the realm of human physical achievement.

I feel the same way; I am team and sport agnostic, and although I used to follow Boston Bruins hockey religiously, I left that behind in high school. Although I have lived in Boston from the age of 7 onwards, I had not been infected by the Red Sox or Celtics bug (even during their mid-80’s run). I did root for the Red Sox in 2003 and 2004, but that was because of the immense drama involved in the playoff games against the Yankees. And Bill Simmons’s blog for the season.

Perhaps I prove Simmons’s point about stat heads; I like to say that I am interested in sports in the abstract. I like the statistical analysis for the same reason Dave Berri had pointed out in his books. There is a wealth of data in there to be mined. I thought one good example of the type of research that can come from these data is finding evidence for racial bias in the way basketball referees call games.

However, what got me interested in watching professional sports was Simmons writing about it. Although I didn’t watch football, basketball, or baseball for a long time, I did watch the Olympics and, believe it or not, televised marathons. Partly it was because my wife and I were running, but mostly I saw the track and field type sports as a wonderful spectacle. So it wasn’t that much of a stretch to fall into a stereotypical male activity.

At any rate, I was amazed at Usain Bolt’s performance in the 2008 Summer Olympics. I was disappointed by Paula Radcliffe injuring herself during the Athens Olympics, and then relieved when she won the NYC marathon, setting a new speed record in the process. I rooted for Lance Armstrong to win his seventh Tour. I rooted for the Patriots to get their perfect season. And until the Colts lay down and the Saints lost a couple of weeks ago, I wanted the Colts and the Saints to meet in the Super Bowl, both sporting 18-0 records. I was glad that the Yankees won the World Series, and with that fantasy baseball lineup, I hope they continue to win. I want to see the best teams win, and win often. And yes, I wish the regular season records lined up with the championship winners for a given season. Then we wouldn’t have arguments about best regular season records and the championship winners.

This isn’t because I’m a bandwagon fan; I watch sports now for the same reason that Simmons does. To see the best of the best do great things. But not always, because they might have a competitor who wants it more, leading to the best failing at times. This drama is the power of sports.

And I can see why Simmons argues so passionately against stats. He likes the visceral impact of sports. I can say that Bolt ran a 9.69s 100 m. But it was nothing compared to seeing Bolt accelerate, distance himself from the other runners, and then slow down as he pulled into the finish line. He blew away the competition. My eyes were wide and my mouth hung open: he slowed down! And he was 2 strides ahead of everybody. And he set a new record. Even if Bolt didn’t set the record, he still made it look easy. On the field, on that particular day, he out-classed his competitors. It is watching the struggle of the competitors (like Phelps winning the 100m fly by 10 milliseconds), on that day, that matters. Over time, if one didn’t watch that particular heat, then the line World Record: Usain Bolt, 100 m, 9.69s doesn’t quite hit you the same way.

But then, there is this. What if instead of looking at the single race, you looked at the athlete performing in 8 or 20 or 50 events for a year? And at these events, the same set of athletes competes over and over?

Here are some possible outcomes: Phelps and Bolt lose all of their other meets, essentially giving us a single transcendental moment. Phelps and Bolt win half their meets. Phelps and Bolt utterly dominate the field, winning 65% or more of their meets.

For the first case, we would probably admit that the Phelps and Bolt phenomenon was a one-off. For whatever reason, the contingencies (no sports gods or stars aligning here!) lined up such that they did highly improbable feats (but not impossible. This distinction is the point of this section.) The third case proves our point; they are not perfect, but they sure are good. The second case is a bit trickier: since they are right on the borderline, we need some analysis to help us decide. One way might be to sum up our individual observations about these two. Being .500, while giving us a single breathtaking moment, might be persuasive. Or one might look at how everybody else did (Phelps and Bolt might have won 50% of the time, but if the remainder is split among their competitors, they have still dominated the field.)

But then what if Bolt and Phelps won 49% of the time, and some other competitor won 50% of the time? What then? Here, criteria are important. Most of the time, we say better meaning, well, something is better. Generally, we aren’t specific about what we mean by it.
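One way to pin down a criterion is to compare an athlete's share of wins with that of his best rival, rather than with 50%. The sketch below does just that; the win shares and rival names are hypothetical, and `margin_over_best_rival` is only one of many possible definitions of "better".

```python
def margin_over_best_rival(win_shares, name):
    """Difference between one athlete's win share and the best rival's."""
    best_rival = max(share for who, share in win_shares.items() if who != name)
    return win_shares[name] - best_rival

# Hypothetical season win shares (fractions of meets won).
split_field = {"Bolt": 0.50, "rival_1": 0.20, "rival_2": 0.15, "rival_3": 0.15}
one_rival = {"Bolt": 0.49, "rival_1": 0.50, "rival_2": 0.01}

print(margin_over_best_rival(split_field, "Bolt"))  # positive: still dominant
print(margin_over_best_rival(one_rival, "Bolt"))    # negative: a rival edges him
```

The same 50% or so of wins reads very differently depending on how the rest of the field did, which is why the criterion has to be stated before the ranking.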

In the book, Simmons ranks his top 96 players in a pyramid schematic. He is rather specific about what he wants in a player. And as one expects, he is specific about the types of intangibles his basketball player should have (basically, basketball sense – i.e. The Secret, if he made his teammates better, winnability, and if you choose someone based on “if your life depended on this one guy winning you a title.”) The evaluation of those intangibles, however, is not as precise as he’d like. However, the advantage here is that one might be able to answer “why” questions. In some cases, Simmons seemingly ranked two players differently while giving them the same arguments (like the consistency of Tim Duncan and John Stockton. Somehow, Stockton just rubbed Simmons the wrong way, while Duncan’s consistency makes him the seventh best player of all time.) And his emphasis on projecting Bill Russell’s game into the modern era made it seem like Russell should have ranked lower. On occasion, I was left with the feeling that the arguments did not match the ranking. From what he said about the stat inflation and how Wilt didn’t get The Secret, I thought Wilt would be ranked lower than 6.

Dave Berri has the opposite problem: he has a mathematically defined metric and when he says better or worse, it’s whether this metric is higher or lower between the players being compared. He can further break down this stat to show where a player is good or deficient (whether shooting percentage, blocks, turnovers, fouls, steals, and assists are above or below the average). He can tell you the hows, with his model spitting out a number that combines these different performance stats into a metric of productivity. But he simply ranks players numerically, without talking about how one might see these differences between the players (and one might not be able to see them… it could be one more missed shot or one less rebound every couple of games.)

I am amazed that Simmons cannot reconcile eyeball and statistical information. Just about every time Simmons bitches out scorers, he talks about how this player didn’t get “The Secret”. It isn’t about scoring; it’s about having a complete game. It is about making the team better with the skills you have. To top it off, Simmons then says that point getters are one dimensional. You can’t shy away from rebounds. It’s great to have a few steals/blocks. Sure, not every athlete can do it all, and certainly not be as prolific as superstars, but you can’t avoid doing those things.

I’m sure Berri is nodding his head, agreeing with Simmons. Point getting isn’t the same as being an efficient shooter (at least average field goal and free throw percentages). And you certainly can’t be below average in the other areas if you want to help your team.

But Berri generally writes about the average. Simmons focuses on the standard deviations. He doesn’t just care about the scoring line; he focuses on Achilles-wreaking-havoc-on-the-Trojans type of performances. He loves the stories of Jordan’s pathological competitiveness. In other words, Simmons lives for the outlier moments.

And I think therein lies the nutshell (and to borrow a Simmons device, I could have said this 5500 words ago and shortened this review.) Simmons views the out-of-normal performance as transcendent, as examples of players who wanted something more or had something to prove. He treats the extreme as something significant; he uses a back story to give the event meaning. That’s fine. It’s also fine when Berri (and stat heads) are constrained in treating outliers as noise (possibly) or irrelevant to the general scope of the model, if they desire a model of what usually happens and are not concerned with doing the job of a GM and a coach for free. Because they both defined the game they wish to play in.

When to talk…

December 16, 2009

I swear I never meant for this blog to focus so much on sports. But Dave Berri has a post that dovetails neatly with some thoughts I have regarding experts, expertise, and how the public should handle them. I think it can be interesting to approach science issues from the side, rather than head on. Specifically, three authors (Berri, Malcolm Gladwell, and Steven Pinker), all of whom I admire, have had a minor verbal tussle about the issue of expertise.

First, a digression. I was already going to comment on the interface between experts and laymen. The original impulse came about because I just finished reading Trust Us, We’re Experts! by Sheldon Rampton and John Stauber. Like other books of this ilk, the authors spend many chapters recounting the failures of authority figures and the exploitation of these failings by people who follow the profit motive to an extreme degree. Although the title hints at a broadside against the arrogance of scientists, it really is about the appropriation of the authority, rigor, and analysis of science to sell things. The targets of this book are mainly PR companies and the corporations that hire them. There are also a few choice words for scientists who become corporate flacks.

The book lacked in presentation, mostly because the authors avoided analyzing how one can tell good from bad science. The presentation leans on linkages between instances of corporate malfeasance; there is no analysis and data on how many companies engage PR firms in this. There is no analysis on the amount of research from company scientists versus independent ones. The authors focus on motives of corporate employees, but somehow ignore the possibility of bias within the academy. There is no attempt to identify if and when corporate research can be solid. In broad brush strokes, then, chemists who discover compounds with therapeutic potential are suspect; the same people working in academia (and presumably someone who will not capitalize on this finding financially) can be trusted.

This is actually a huge problem in the book; one of the techniques that Rampton and Stauber document is the use of name-calling (good old-fashioned “going negative”; ironically enough, the PR firms would simply label all opposition as junk science) in describing research and scientists who publish findings contrary to whatever corporations happen to be pushing. But by avoiding the main issue of identifying good and bad science, the two merely stitch together examples of corporate and public relations collusion. Now, the evidence they present is good; they hoist PR and corporate employees by their own petards, quoting from interviews, articles written for PR workers, and from internal memos. But the ultimate point here is that Rampton and Stauber simply tarnish corporate research because the scientists work for corporations. I believe this to be a weak argument and ultimately useless. One example I can think of is, what if two groups with different ideologies present contrary findings? Assuming that the so-called ‘profit motive’ is equally applicable to both, or not at all, then readers will have lost the major tool that Rampton and Stauber pushed in this book. But as I will show, the situation is not always as stark as, for example, corporate shills against academicians or creationists against biologists. There is enough research of varied quality, published by ‘honest actors’, to cause enough head-scratching about how solid a scientific finding is.

Let’s be clear, though. Of course the follow-the-money strategy is straightforward and, I would think more likely than not, correct. But that cannot be the only analysis one does; if the thesis is that PR firms use name-calling as a major tactic in discrediting good, rational, scientific research, it seems bad form to use funding source as a way to argue that investigators funded by corporations do bad research. It’s just another instance of name calling. I expected more analysis so that we could move away from that.

And that’s the unfortunate thing about a book like this; why wouldn’t I want a book that causes outrage? Why, in essence, am I asking for an intellectually “pure” book, one that deals with corporate strong-arm tactics in a so-called more methodical, scientific way? Doesn’t this smack of political posturing, where somehow a result matters less than the means? And no, I do not mean the ends justify the means. I am just pointing out that there might be multiple ways of doing something (like taking route A vs. B or cutting costs by choosing between vendor C and vendor D). Workplace politics might elevate these mundane differences into managerial warfare. Why should I care what the politics are, so long as it leads to a desirable end result?

One problem with a book like Trust Us is that it appeals to emotions with rhetoric, without a corresponding appeal to logic. I think including analytical rigor is important as it provides the tools for lasting impact. As it is written, the book (published in 2000) provides catchy examples of corporate malfeasance. The most basic motif is as follows: activists use studies that, for example, correlate lung cancer with smoking in order to drive legislation to decrease smoking. Corporations and interested parties attack by calling this bad science, by calling the researchers irresponsible, by calling the activists socialist control freaks who wish to moralize on an issue that is really a matter of personal choice. They have a considerable war chest for this sort of thing. Frankly, if that’s what Rampton and Stauber are worried about, then their focus should have been on the herd mentality of people, not the fact that PR firms use negative ads.

But that is only one weapon; the other weapon is the recruitment or outright purchase of favorable scientific articles. The example would be the studies published by scientists who work for tobacco companies, with the studies refuting the claims of the investigators. But Rampton and Stauber focus simply on pointing out that this favorable finding comes from researchers who are paid by Philip Morris. That’s nice, but how is this different from the name-calling Philip Morris engages in? The real issue is how one goes about identifying what bad research is.

They do throw a sop to analytical tools, at the end of the book. The discussion is cursory; the focus is again on helping the reader dissociate the emotional rhetoric from the arguments (such as they are.) The appeal is that the analysis is simple. Just question the motives of the spokesmen and experts. Worst of all, their discussion of the difficulties of science gives the impression that the whole enterprise is a bit of a crapshoot anyway. They point out peer review is a recent phenomenon, that grant disbursal depends upon critiques from competing scientists, and that the statistically significant differences reported are, more often than not, mundane and not dramatic. Their discussion of p-values makes scientific conclusions sound like so much guesswork, rather than the end result of hard work. Day-to-day science isn’t as bad as the pair portrayed it.

It is a trick to take a broad question (“How does the brain work?”), break it down into a model (“Let us use the olfactory system as a ‘brain-network lite’”), identify a technique that can answer a specific question (“I wonder if the intensity of a smell is related to the amount of neural activity in the olfactory system? We expect to see more synaptic transmission from the primary neurons that detect ‘smells.’”), do different experiments to get at this single question, analyze the data, and write up the results.

Forget the fact that different scientists have different abilities to ask and answer scientific questions; nature doesn’t often give a clear answer. So yes, it is hard to get conclusive statements. To confound the issue further, even good research can have flaws, unclear experimental design, incorrect analysis, and distressingly minor differences between control and test conditions. Which leads us to the question, what exactly does good research look like?

I am not going to answer this now, and I can’t answer this. The blog will, eventually, attempt to deal with this very issue by presenting papers and research that I read about, in addition to book reviews. But my point here is that Rampton and Stauber didn’t address this issue either. The very end of the book is a populist appeal, one that emphasizes “common sense” over jargon and statistics. They even appeal to our civic duty, that we should become more politically active and associate with (my term, not theirs) “lay-experts”. At some point, however, even well-informed non-scientists and non-experts must have turned to experts for some original research. Rather than disregard that research, then, one must learn and gain a comfort level with parsing scientific literature.

It took a while, but we return to the Gladwell-Pinker-Berri flap. The setup is simple: Berri is a sports economist, specializing in creating models that predict athletic performance. However, he has tackled multi-player games (basketball and American football), which, presumably, would lead to complex models, or perhaps something computationally intractable. Surprisingly, he found that neither was the case. The important point this time is that he was able to show that where quarterbacks are selected in the NFL draft doesn’t fit with their performance (assessed using the Berri and Simmons QB Score metric.) Gladwell wrote an essay that presented Berri and Simmons’s argument favorably. Pinker made a short comment refuting this, saying that QBs drafted high do have better performance.

Both Pinker and Gladwell’s review and response seemed snippy to me. But what I found interesting was that while Pinker questioned Gladwell’s ability as an analyst (while giving Gladwell the backhanded compliment that he is a rather gifted essayist – but not a researcher or analyst), Gladwell, in turn, questioned the background of Pinker’s sources. I think Gladwell’s highlighting the faults with the arguments was sufficient, as Pinker’s sources are somewhat weak. It really wasn’t necessary to impugn their background.

This is ironic, as Pinker raises some peripheral issues regarding Gladwell’s suitability in reviewing the research and observations from experts. Just as with Gladwell, I think Pinker gave a reasonable counter-argument to Gladwell’s generally gung-ho and favorable presentation of his subjects. For example, there is a flip side to imperfect predictors: while they may not be useful for predicting the most suitable candidates, they help to remove the worst ones from the pool, in a cost-effective way. That’s an interesting question, and I think one “system” that scientists can study to answer it is… sports (because of the wealth of performance data).

There really is no need to trash an expositor just because he is a better essayist than a scientist, for instance. Isn’t Gladwell in fact an expert in conveying novel research to the public (and effectively)?

In this case, I think both the “expert” and “lay person” gave a good accounting of their (intellectual) problems with the other. However, they both engaged in what amounted to look-at-the-source “analysis” (Pinker says Gladwell doesn’t know what he writes about. Gladwell trashes Pinker’s football sources for things they did, that are unrelated to football). The only thing the ad hominem attacks achieved was to raise the blood pressure of both participants.

Asymmetry

November 25, 2009

Strangely enough, I find myself writing again about Bill Simmons. I found his latest article interesting and well thought out, with his conclusions generally supported by his arguments. So why am I writing? Simmons did a great job breaking down film and the problems with the type of statistics used. I took issue with the fact that he concludes this “proves” the lack of predictive power of statistics, when I thought he should have concluded that he used statistical and observational analysis correctly. Simmons missed a golden opportunity to show readers how to synthesize statistics and low-sample number observations.

The setup: Week 10, Patriots at the Colts, with the Patriots leading 34-28. The Patriots had the ball on their 28 yard line, 2 min 3 s left to play, and it was 4th-and-2. Belichick decided to go for the first down rather than punt. There might have been some issue with the ball being spotted in the wrong place, but essentially, the Colts stopped the Patriots. Turnover on downs. The Colts scored on their series, after dragging out the clock, and won the game by a point.

First, Simmons does what I like sports writers to do: combine on-the-field observation with the context of what one usually sees from football teams, in the aggregate (i.e. some group analysis, which usually does mean statistical analysis). I happen to think his argument for punting, in this specific play, is stronger than, for example, Joe Posnanski’s and Gregg Easterbrook’s posts about the statistical analyses that generally supported Belichick’s decision. Simmons’s arguments were stronger because he specifically placed his observation of the game and the Patriots’ performance leading up to this last offensive call in the context of aggregate statistics. True to form, however, he followed this by trashing the statistical analysis, rather than concluding that he had properly evaluated singular performance and identified how the Patriots deviated from the aggregate.

Simmons’s argument is that most stat-heads used the wrong set of probabilities. Posnanski, Easterbrook and Simmons presented the statistical arguments that the Patriots had a greater chance of winning had they gone for the conversion, rather than punting. To be fair, the difference might have been slight; numerically, of course, one probability was higher than the other (Tim Graham of ESPN arriving at a 1.5% edge in win probability). Had Simmons focused on reconciling the statistical assumptions with how Belichick’s play calling lowered the Patriots’ chances of achieving first down, I believe he would have provided a wonderful illustration of how one goes about reconciling statistical/probability estimates with actual events. Unfortunately, Simmons ignores the probability of winning, focuses on the probability of losing, and asserts that punting was unequivocally the correct call.

Simmons had a contrary opinion from Easterbrook and Posnanski on the punting issue, but all three of them found problems with Belichick’s coaching in the last minutes of play, preceding the 4th down conversion attempt. All three pointed out issues with game management (such as 2 timeouts that were called just to make sure the right players were on the field) and with play calling (rushing on first down, passing on the next two downs). That last sequence suggested that the call to play out the fourth down rather than punt was a spontaneous one. Simmons broke that down nicely, suggesting that rushing on third down made more sense if one is in fact going for a 4th down conversion. Finally, the actual play on 4th down was atrocious, as the Patriots limited their options drastically, going with an empty backfield. In this formation, there was no running option, and the Colts simply jammed Brady to hurry his throw. As it happens, he connected with Kevin Faulk, but short of the first down.

I don’t think anything here contradicts the aggregate story (such as a greater than even chance of getting 2 yards). The fact is, there was much circumstantial evidence that Belichick might have flubbed the play. After all, there are no guarantees; just because the average play nets 5 yards doesn’t mean the players just stand there, waiting for the refs to spot the ball up field. You need to select a play and then execute it. As the saying goes, that’s why they play the game. The players still need to give their fullest effort.

What one should consider is how Belichick reduced the Patriots’ chance of converting by using a bad strategy. And Simmons actually did this. He noted that this play was essentially a 2-point conversion attempt, as both offense and defense were lined up to attack and defend a short field (i.e. defending the end zone with the line of scrimmage at the 2 yard line). There seemed to have been some confusion between the special teams and offense as it wasn’t clear to the players whether they were attempting a punt or not, necessitating a time out that could have been used later to challenge the Faulk bobble (see Posnanski’s post). Simmons presented some stats showing that 2-point conversions had a lower success rate (on the road; I have issues with Simmons’s selective stat picking, but that piece wasn’t exactly a peer-reviewed article.) It was unreasonable to conclude that the Colts would have rolled back down field to score with under 2 minutes to go, possessing only 1 timeout (despite the fact that the Colts did exactly that on their preceding drive. It probably was an aberration and won’t happen again. But a stat here would be nice, comparing how long in distance and time an average NFL drive is.) The Colts also had an inexperienced, young receiver corps, which might have increased the Patriots’ chances of stopping the Colts after a punt.

So, even if the average success rate for 4th down conversions is around 60%, the Patriots did not maximize the likelihood of success. Thus the stat-heads, in essence, should have altered the assumptions for their calculations, based on the on-the-field observations from the last couple of minutes of the game. Maybe the Patriots should have punted.
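Here is a back-of-the-envelope sketch of what altering the assumptions might look like. Only the 60% conversion figure comes from the discussion above; the two chances of the defense stopping the Colts are placeholders I picked for illustration, and the model ignores clock and timeout effects entirely.

```python
def win_prob_go(p_convert, p_stop_short_field):
    """Chance of winning when going for it: a conversion is treated as a
    win; a failure still wins if the defense holds on a short field."""
    return p_convert + (1 - p_convert) * p_stop_short_field

def win_prob_punt(p_stop_long_field):
    """Chance of winning after a punt: the defense must hold on a long field."""
    return p_stop_long_field

# Only the 60% conversion rate comes from the text; the rest are placeholders.
P_STOP_SHORT = 0.47   # assumed chance of stopping the Colts from the Patriots' 28
P_STOP_LONG = 0.70    # assumed chance of stopping the Colts from ~70 yards out

for p_convert in (0.60, 0.50, 0.40):
    go = win_prob_go(p_convert, P_STOP_SHORT)
    punt = win_prob_punt(P_STOP_LONG)
    print(f"convert={p_convert:.2f}  go={go:.3f}  punt={punt:.3f}")
```

With these placeholder numbers, going for it stays ahead at a 60% or 50% conversion chance, and punting pulls ahead only once the conversion chance is marked down to 40%. The observational arguments matter because they tell you which row of that table you are actually in.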

There are some arguments against punting. Easterbrook focused on the specific offense/defense matchups as determined by this particular game. Easterbrook wrote that, on the previous possession, the Colts drove 79 yards in 1:40, without a time out, for a touchdown. Easterbrook also noted that, to his eyes, the Patriots defense seemed a step behind the Colts offense. Also, the Patriots were playing against a weak secondary. As it happened, Brady and company rolled up 370 yards on the night. It seemed like they should have had a greater than the league average chance of converting the 4th down. They might have had a slightly lower than league average chance of defending ~70 yards, had they punted, as they had just shown they could give up a long drive (although Simmons pointed out that the Patriots stopped the Colts in 5 of the last 7 defensive series in that game.)

Again, the two arguments are whether the Patriots can stop Manning with under 2 minutes and whether Brady plus Faulk, Welker, and Moss can gain 2 yards. On the field, there are probably enough game-related distractions and observations for Belichick. As Posnanski said, there might have been a lot going on in Belichick’s mind. It might have taken him until the last second to come to some conclusion about what to do on that fourth down. He probably did know, in general terms, the arguments above, but they might not have led to a clear-cut answer. He might have just decided that there was a very good chance his QB would have found a way to get the 2 yards. Although I support Simmons’s argument (and only because I think the win probability is shaded just slightly more towards punting, with Simmons’s modifications taken into account), I’m not sure if punting is a clear answer with so much time left on the clock, against a quarterback like Manning.

I think both the punt and no-punt observational arguments are valid. And the whole point of statistics is to help you weigh these alternatives against some metric (i.e. the league average). Where it actually detracts from the analysis (to the non-statistician’s mind) is when the likelihoods of a positive outcome, for the considered alternatives, are rather similar.

The two points here are that 1) contrary to Simmons’s point that observations are somehow better, observations also led to two contradictory but sound conclusions about the overall strategy, and 2) with the situation as stated, punting was still not a guarantee of a win (punting would have become the better option as the time left to play decreased).

The problem with the former is that we have a tendency to shoehorn these anecdotes into fitting the conclusions that we want to draw. That’s why having some statistics can provide a context for evaluating the single-sample observations. You can’t do what Simmons did, which is to say that the aggregate is wrong because of the details in this situation (wrong play selection or no strategy leading to a 4th down conversion attempt), just as you can’t argue against the punt if a punt-return touchdown happened. In the aggregate, these things are aberrations. Even if Simmons’s arguments for punting were strong, they probably should have modified the outcome to only a greater than 50% winning probability, not the 100% win that Simmons thinks. In other words, you can’t just turn a 60% win probability into 100% just because you chose it. In the aggregate, both plays would yield a win more than 50% of the time.
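To make the arithmetic behind that last claim concrete, here is a minimal sketch of how the two choices might be compared. Only the 60% conversion rate comes from this post; the Colts' chances of scoring from a short field and from a long field are hypothetical placeholders, and the function names are mine. The sketch also assumes, as a simplification, that a successful conversion ends the game and that the Colts can only win by scoring a touchdown on their final drive.

```python
# A rough sketch of the go-for-it vs. punt comparison.
# Assumption: converting the 4th down effectively ends the game (a win).
# Assumption: the opponent wins only by scoring a touchdown on its last drive.


def win_prob_go(p_convert, p_opp_td_short_field):
    """Win probability when going for it on 4th down."""
    return p_convert + (1.0 - p_convert) * (1.0 - p_opp_td_short_field)


def win_prob_punt(p_opp_td_long_field):
    """Win probability after punting the ball away."""
    return 1.0 - p_opp_td_long_field


def break_even_conversion(p_opp_td_short_field, p_opp_td_long_field):
    """Conversion rate at which going for it and punting are equally good."""
    return 1.0 - p_opp_td_long_field / p_opp_td_short_field


# Hypothetical inputs; only the 60% conversion rate appears in the post.
P_CONVERT = 0.60
P_TD_SHORT = 0.50  # placeholder: opponent scores from a short field
P_TD_LONG = 0.30   # placeholder: opponent scores after a punt

print("go for it:", round(win_prob_go(P_CONVERT, P_TD_SHORT), 3))
print("punt:", round(win_prob_punt(P_TD_LONG), 3))
print("break-even conversion rate:",
      round(break_even_conversion(P_TD_SHORT, P_TD_LONG), 3))
```

With placeholder inputs like these, both options come out above 50% and neither reaches 100%. Shading the inputs up or down, which is what the observational arguments amount to, moves the two numbers closer together or further apart, but it does not turn either one into a certainty.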

Some other criticisms of Simmons’s piece: not all stats are created equal. Examples of what not to do with stats include Simmons using spurious stats, like how often 3 TDs are scored in the 4th quarter, to bolster his point. But why limit it to the 4th quarter? Why not just look at how often 3 TDs are scored in any quarter? Or why look only at 2-point conversion plays on the road? I know Simmons made a point about how this particular play is set up like one, but the proper comparison is still against all 2-yard attempts or against all 2-point conversion plays. The problem is that he made no attempt to discuss the validity of that particular stat in general before analyzing the breakdowns. In some regards, it might be simpler to prove the general case before the specific one. And certainly it helps to present all the splits, not just the ones that support your case.

Part of the issue with probability and statistics is that people do not have the luxury of the long run or multiple trials. We only have this one trial. Which brings us to the asymmetry referred to in the title of this post. Models are one-way, in that one can build them by collecting multiple observations, but it is a mug’s game to apply models to predict a specific event. Something might happen, until it does; the model is probabilistic, but the outcome is binary. That is part of the difficulty in accepting statistical models.
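The asymmetry can be illustrated with a small simulation. This is only a sketch: the 60% figure is the conversion rate quoted earlier, while the function name, the seed, and the number of trials are arbitrary choices of mine.

```python
import random


def simulate_conversions(p_success, n_trials, seed=0):
    """Return a list of 0/1 outcomes, each a success with probability p_success."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [1 if rng.random() < p_success else 0 for _ in range(n_trials)]


# One game: the model says 60%, but the outcome is simply 0 or 1.
one_game = simulate_conversions(0.60, 1)

# Many games: only here does the 60% show up, as a long-run frequency.
many_games = simulate_conversions(0.60, 100_000)
long_run_rate = sum(many_games) / len(many_games)

print("single outcome:", one_game[0])
print("long-run success rate:", round(long_run_rate, 3))
```

The single outcome tells you almost nothing about whether the 60% estimate was right, and the 60% estimate cannot tell you what the single outcome will be. The model only becomes visible in the aggregate.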

I thought that Simmons’s piece indicated that he did not separate the overall strategy from the details of the execution. As he is so fond of arguing, the details cannot be captured by a measure as simple as “conversion”. There were many ways of getting there: is a recovered fumble an ideal way of converting a 4th down? How about a penalty against the defense? Was it a 4th and inches grind forward? Was it an 8-yard pass against a weak opponent? Did the coach rest the first-string defense in the fourth quarter, with the game well in hand? However, this was in the context of a Brady plus Welker, Faulk, and Moss offense that had nearly 400 yards on the night. That is a detail that Simmons did not dwell on. The players gave the Patriots a legitimate shot at converting the 4th down. It was the playcalling from Belichick that failed the Patriots. I thought it was unfair for Simmons to trash the strategy based on the example of this particular play.

And to spread the criticism a bit, I don’t think it makes sense to never punt, as Easterbrook maintains (though he argues this from an aesthetic perspective.)  The contribution of that particular play to the overall win probability depends on the situation. It is the coach’s job to identify the most significant factors in terms of the aggregate (i.e. whole NFL result) and then apply it to an analysis of how his particular offensive and defensive play callings maximize the actual performance of his players.

Simmons missed a great opportunity to show how a proper analysis should be done. He could have supported the obvious point, that, hey, to make the most of that 60% success rate, you need to treat this like a normal play in a scripted series, not like a 2-point conversion. He even said as much; another one of his points is that Belichick did not treat the whole series like a four-down set. Doing so would have enhanced the overall chance of success. Instead, he raised the metaphorical equivalent of the “blogger-in-Mom’s-basement” attack against stat-heads: that they don’t watch the games, and that watching the game would have told you what the correct strategy was. I don’t think that was the case at all, as the contrary view can be derived using Easterbrook’s assumptions.

September 7, 2009

I got to thinking about a difference between writers and commenters. One crucial difference is skill, naturally. However, I am thinking about some of the emails sportswriters such as Joe Posnanski, Dave Berri, Peter King, and Bill Simmons get. The best correspondence they publish tends to follow up on a thought, often giving an example about some tragedy the pundits had written about.

Considering this small and selective sample, I concluded that the main difference between lay writers and the professional is context. Professionals establish the context in which lay writers tend to work. That is, professional writers organize examples by their themes, while the lay writers (i.e. commenters) write single examples. This leads, firstly, to the difference in length. The commenters provide an example or a vignette that refers to the established idea. I suppose one-graf bloggers tend to fall into this category, no matter how good the actual prose is. The professional writer would have developed the context for his main argument before using examples to emphasize his own point. While longer is not always better, developing ideas of course takes up space. This leads to longer pieces. It takes a bit of skill to compress ideas into a paragraph (try reading abstracts from science papers and see if they make sense to someone outside of the field you work in; the good ones will).

For now, I want to focus on the difference between a professional writer’s and a scientist’s mode of writing. At the level of sports pundits and analysis, there are the Joe Posnanskis and Bill Simmonses of the world, and there are popularizers of research, like Dave Berri. All three are wonderful writers for their fields, but I would rather read Posnanski and Simmons before Berri, if considering only the literary aspects of their writing. Nevertheless, the main difference between the two modes is not in the scope but in the details that provide context for their pieces.

Recently, Posnanski wrote about his desire to adopt a baseball stat for his blog. He hinted at reasons for disliking OPS (simply, on-base percentage + slugging average), and presented an argument for his “hitting average.” That’s all fine and good; readers of Dave Berri’s blog and book Wages of Wins will note that Berri in fact tries to find statistical measures of athlete “productivity” that relate to point production and thus, wins. Now, here’s the difference between Posnanski’s and Berri’s approaches. It certainly isn’t scope, since both are ostensibly doing the same thing. However, Berri’s approach is scientifically sound where Posnanski’s isn’t, despite Posnanski dealing with objective mathematical measures.

A caveat: I am not saying that Posnanski’s stat or approach is wrong. Posnanski has made every attempt to say that what he is doing is more for aesthetic reasons than to find THE stat, the single model that explains MOST aspects of baseball. Again, I am merely considering their styles of presentation, which are partially limited by the scope and how they approach the details.

In any case, Posnanski details how stat-geek readers of his blog, led by Tom Tango, generated a new stat called “linear weights ratio.” Posnanski tests this stat out by checking the rankings of a number of players; of course, there is some alignment with more traditional advanced baseball stats. He also presents the formula for his hitting average, for readers to play with. Again, there’s nothing intrinsically wrong with this; Posnanski isn’t doing econometrics. If anything, he is doing a great service by getting various readers to think mathematically. But Posnanski doesn’t provide a context in which to evaluate that new metric. Mainly, he doesn’t compare this metric to established metrics. In contrast, Berri’s approach is, in essence, scientific, since his arguments are constrained by the context of describing and comparing these metrics.

This context is the difference between a layman’s approach and a scientist’s approach. Berri did much the same thing as Posnanski suggests in researching basketball players’ productivity. Berri looked at the linear regression of things like points scored, shooting percentage, rebounds, turnovers, and so forth, on the amount of points scored. Based on these stats and the weights identified from the regression analysis, he generated a linear model. He placed this stat, Wins Produced, into context in three ways: he applied it to all NBA players through all years for which stats are available; he compared its correlation with points scored for and against to that of existing NBA statistical models; and he generated points of comparison for each NBA player against the “mean” player at his position. In this way, he was able to determine that his measure has a higher correlation to the efficiency differential (points scored – points given up) than the other stats. He was also able to identify the main difference between his and other models, in that the other models tend to use points scored as opposed to the ratio of points scored to shots attempted.

The weights Berri used are not arbitrary in the sense that he simply pulled them out in order to emphasize some difference between NBA players that he thought should exist. Naturally, he might have removed some measures from his model because the weight isn’t high enough, but that’s a different matter from “fine tuning” the weight. Regardless, the most important point is that, generally, he made a model from the aggregates that significantly correlated with efficiency differential before applying the model to the players. In this way, he has created rankings of NBA player productivity that have generated some arguments in the sports pundit community (for an example, see here, here, here and here).

While the particulars aren’t important, the conflict is illustrative of a scientific versus a more laid-back (although it could still be rigorous) analytical approach. Berri simply sets up a model, cranks out the numbers, and then organizes his views of the players by examining the stats. In the laid-back approach, one checks whether the stat matches what one already believes about a player. Again, this latter approach is fine, within its domain. Sports writers are not scientists, nor do they control the purse strings for a sports team. Even within a sports franchise, people do not need to rely on statistics if they do not wish to. As Berri notes, the stats comprise merely one component of NBA evaluation. They are a shortcut to organizing players’ performance. In no case do they substitute for ways of identifying why certain players are not rebounding, or generating enough assists, or reducing their turnovers.

In the Posnanski example, he presented a stat which is correlated with runs scored in baseball. He didn’t say whether this correlation is necessarily higher than that of other measures (such as OPS). This is a subtle point that is often missed. If the correlations of both measures with runs scored are similar, then there really is no difference. Of course, there may be a lot more numbers involved in one than in the other, but most scientists would simply choose the one with fewer values. It’s probably also easier to calculate. Using the extra numbers does not give you added value. I have seen people talk about complex stats as if complexity (lots of math squigglies) is somehow better or more correct. That is not the case.
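As a rough sketch of that comparison, one could compute the correlation of each metric with runs scored and prefer the simpler metric unless the more complex one is clearly better. Everything here is illustrative: the function names and the 0.02 tolerance are my own assumptions, and the toy numbers are made up, not real baseball data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def prefer_metric(r_simple, r_complex, tolerance=0.02):
    """Pick the complex metric only if it beats the simple one by more than
    the tolerance (an arbitrary threshold chosen for illustration)."""
    return "complex" if r_complex - r_simple > tolerance else "simple"


# Made-up team-level numbers, purely to exercise the functions.
runs_scored = [640, 690, 710, 760, 800, 850]
simple_stat = [0.70, 0.73, 0.74, 0.77, 0.79, 0.82]
complex_stat = [0.31, 0.33, 0.33, 0.35, 0.36, 0.38]

r_simple = pearson_r(simple_stat, runs_scored)
r_complex = pearson_r(complex_stat, runs_scored)
print("choice:", prefer_metric(r_simple, r_complex))
```

The point of the tolerance is the one made above: if the two correlations are about the same, the extra terms in the complex stat are not buying anything.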

So, how does this relate to writing styles? Well, if the laymen write in examples, and professional writers extract themes and trends from examples, then scientists try to extract ideas/themes/trends that apply to all examples (well, ideally all, but in general they try to capture data from a meaningful sample that is indicative of the whole population).

However, there is a limitation in the presentation of a scientific finding: the conclusions are bound by the premise of the hypothesis and the methods and measures that are used. Thus, in Berri’s case, he presents arguments for NBA players’ productivity in terms of his measure (or other measures, if he’s interested in comparing the different metrics). But he is constrained by that, less so in his blog, but certainly in his peer-reviewed papers. As a matter of fact, Berri’s blog tends to be a bit dry, breaking down a player’s deficiencies by examining the particulars of how low his shooting percentage, rebounds, assists, etc. are relative to the league or position average. Just as importantly, Berri suggests that the metric is best used as an entry point into proper player evaluation and development. It’s a shorthand for identifying players who might be improved. Although Berri suggests players don’t change much from year to year, from team to team, or from coach to coach, that may be because no one has tailored a practice program for players based on this simple evaluation. Or it may reflect the ceiling offered by a player’s talent. Aside from these straightforward analyses of why players have below, above, or near average productivity, Berri doesn’t write about how he might enjoy watching certain NBA players. I think it gives an unfair impression that he is a bloodless machine who doesn’t know what a basketball looks like. His model does not account for the flair, style, or aesthetics that are probably the raison d’etre for watching sports in the first place.

Sports writers like Simmons and Posnanski approach it from the aesthetic domain first. The assumption is that they have an eye for talent and style, and that this is applicable to how everyone else enjoys watching that player or game. I don’t mean that they are interested in a so-called objective way to rank the entertainment or productive value of these players. I mean that they want to identify an essence of a player that can be applied without qualification or exception and can be easily demonstrated, and they are frustrated by the fact that they can’t always do so. The clearest example is in the way some describe and compare Kobe Bryant to Michael Jordan. Dave Berri can rank the two, not only in absolute terms but as some standard deviation above the league average for their eras. In that comparison, not only is Jordan more “productive” than Kobe, he is nearly twice so. Simmons would argue that Kobe is the best there is now. He might be a cut below Jordan, but there is no player closer.

One solution here is to recognize that there is a difference between the professional and the scientific presentation of ideas. Berri started from the metrics first, despite whatever he might think about the players. Simmons cannot, or would not, separate the aesthetics and productivity of the players he enjoys watching. There is nothing wrong with either approach. The only difference is that Berri’s work easily translates into a scientific publication format. Its details all concern finding some measure, defending that measure, identifying advantages of using that measure, and discussing how this measure may be insufficient. In other words, Berri and other scientists are biased toward finding “measurables”, for better or for worse, because in the end, the basic scientific hypothesis is “how much.” How much did this drug improve patient outcome? How much did the tumor shrink? How much is a photon deflected from its true path by a massive body? Can we identify how many molecules of this we have?

This isn’t necessarily a reductionist approach; at its best, finding quantifiables is a way of creating a reference point so we can start to discuss things. Thus, the proper angle to take against a scientist (i.e. Berri) is to identify and improve on his assumptions, find a different metric that gives a higher correlation, or improve on his metric by finding more terms that add value to enhance correlation. In other words, scientific discussion is limited by the context of the methods, which acts as a framework for subsequent arguments.

The sports writers do not have this limitation. They can segue between stats and aesthetics. Like Simmons, they can also sprinkle in pop-culture references that actually advance their argument. However, I think because they do approach things from an aesthetic angle first, they tend to provide contexts based on motifs and not on metrics. In other words, it allows Simmons to focus on the literary spin of his piece, relating the NBA offseason to lines from the movie Almost Famous. It allows Posnanski to say that he wants a new stat because he doesn’t like how OPS is pronounced “ops” and not “Oh-Pee-Ess”. There is a lot of room for literary flourish, which doesn’t make the argument any more objective, but does make it much more enjoyable.

Interestingly enough (although I haven’t looked at this for all cases), I think that, for the most part, Simmons and Berri emphasize the same attributes in their ideal basketball player. They want someone who can shoot well (i.e. high shooting percentage), score a lot of points, make passes for assists, not cough the ball up, and grab rebounds. Where they differ is in how they rank the so-called “top players”. Berri has noted that most conventional player evaluation centers on points scored (without regard to the number of misses the player made). He has noted that player rankings and player salaries have a correlation of 0.99 with points scored. And strangely enough, Berri’s work showed that scoring points, by itself, does not lead to higher efficiency differentials. Despite what writers and general managers profess about finding complete basketball players, they put their money on the point-getters. In other words, for all the verbiage devoted to arguing how smooth and graceful players are, and how much one should enjoy their talent before they fade into old age, the ideas of “aesthetics” and “points” are no different. It’s interesting that Berri noted that there may in fact be an implicit metric being used to evaluate players under the so-called explicit measure of a player’s style/gracefulness/aesthetics.

http://dberri.wordpress.com/2007/05/15/speeding-up-time-for-bill-simmons/

Detective Newton

July 31, 2009

Margret Guthrie of The Scientist gave a favorable review to Newton and the Counterfeiter. It sounds like a wonderful little vignette into the great mathe-magician’s life.

The philosopher eventually assembled such a compelling case against Chaloner — from testimony by witnesses, informants, and even the wives and mistresses of the criminal’s associates — that he was able to bring him up on charges of counterfeiting the King’s coin, a treasonable offence, in 1698.

On Thomas Levenson’s writing, she notes

[His] pace and timing rival those of the best crime story authors. He has written a real page-turner, perfect for a long afternoon’s engagement with the hammock or whiling away a long airport layover.

July 16, 2009

The journal Nature has published a review of Italo Calvino’s Cosmicomics. The book is a re-release, and the reviewer, Alan Lightman, recommends it. It is a set of short stories with cosmological themes. It is whimsical, in one case having a mollusc imagine it had a mustache. The anthology is compared favorably with Primo Levi’s The Periodic Table. I am now interested in reading Cosmicomics. I have yet to read Levi’s book, although it is collecting dust on my book shelf.

I want to avoid trashing books, since the goal of this blog is to engage the ideas, themes, and characters of books, on the author’s terms. However, Engine City has such issues with writing and presentation that it seems unavoidable that I talk about the writing. I didn’t like the characters in this book. I did not care what happened to them. The leaps from chapter to chapter aren’t graceful and come across as disjointed. It isn’t always straightforward how one chapter relates to the last, despite the fact that the story progresses linearly, switching among various time points and character perspectives. Another source of confusion lies in the advancement of the story by years (and it isn’t always clear by how many years), as the story uses slower-than-light travel, meaning that there is time dilation. To be sure, the story, plot, and characters aren’t difficult to follow, but the writing disrupted the narrative flow and went a long way to sour the novel’s entertainment value for me.

Aside from that, I thought there were a number of interesting ideas. First and foremost, MacLeod tackles the issue of the Singularity. For all intents and purposes, the Singularity implies the existence of an advanced intelligence. We cannot hope to relate, in our present humanity, to this intelligence. It matters not where the intelligence arose (alien, human-made silica, or post-transcendence human). Various sci-fi authors have settled into different camps/schools regarding how they should tackle this subject. Some of them argue that it is difficult to write about post-Singularity events, so it makes sense to ignore them and focus on the non-transcendent characters. In other words, the authors will continue to muck it out with the plebes. Some others focus on mundanes interacting with the Singularity or divining its intentions.

I, for one, do not see the difficulty in dealing with the Singularity. Since humans cannot hope to understand a god’s mind, or a Singularity-intelligence, I think, practically speaking, we can make up any motive/action that we want. After all, such an approach has worked for religion. Therefore, I do not see putting words into god’s mouth as a hurdle for sci-fi writers.

MacLeod introduces such a concept, although this aspect of the story isn’t really emphasized. The gods, in this case, arose from nano-silica-based life on asteroids. Bathed in the energy of the sun, they aggregated not into a multicellular organism but into a networked, multicellular super-intelligence. And this super-intelligence does not like noise. So it does what any self-respecting Transcendent with access to the computational power to model million-body Newtonian mechanics problems will do. It uses mass-driver weapons to destroy, with pinpoint precision, sources of said noise. The only things it needs are the right-sized asteroids and time. We find out how this intelligence deals with infestations at the beginning and the end of the novel.

What happens in between deals with humans, displaced into a set of star systems on the other side of Sol, readying themselves for an alien invasion. The alien invasion, it turns out, may be a red herring. These aliens are in fact our creators, and we are not certain what threat they carry. They already had one colony that had been destroyed on Earth (and they had manipulated lifeforms there that eventually evolved into humans). The novel concerns itself with the interface between humans and their creators, although this matter isn’t probed too deeply.

The nature of these creatures is that they can make a lot of things. They are, in essence, cornucopias. This is a direction that comes too late in the way the novel is plotted; as it is, we don’t really see what impact these creatures would have. A new philosophical idea was also introduced: if the gods could develop out in hard vacuum, in nanostructures of space material, then such intelligences may also have developed on a planet. This “gaia” may work in concert with the gods in the asteroids. Again, this idea came too late, since, at this point, new characters enter the plot and eventually capture and execute the heroes. It is interesting that the heroes were executed not for their role in fomenting instability and war, but for “deicide”. Thus, it is as if the story was composed of multiple short stories.

At first, I thought this proved a failing, since it led to abrupt transitions, dangling plot lines, and unexplored consequences. Over time, I came to see how this style of exposition might have worked. The exposition style limits our point of view to when our heroes make their ports of call, after a physical journey of immense temporal delays. We see them encounter civilization at different points of development. Their unsteady grasp of their present encounter is mirrored by the reader’s own disorientation after each time jump, with disorientation lessening as the plot develops for that time period. Unfortunately, I do not feel that MacLeod pulled it off.

OK. I should have moved on. I have continued reading, but haven’t posted any reviews. However, this book really stuck with me, and I need to get this off my chest.

I have noted in my review of Little Children that Perrotta paints sympathetic portraits of suburbanites. Sure, by merely describing how they act, Perrotta hoists the lot of them on their own petards. Again, I need to stress that Perrotta does not present a one-sided portrait of these harried fathers and mothers. This is important, as Ruth and Tim, the two protagonists, are on two opposite sides of the debate on sex education and how far private religion should extend into public schooling.

Of the two, Ruth comes across as insouciant and flip. It actually makes it hard to root for her, despite the fact that hers is probably the more realistic point of view: kids will experiment and have sex. Why ignore this fact and tell them to repress their urges? Sex education becomes damage control, rather than a vaccination. Her nemesis is JoAnn, not surprisingly an attractive, sexy, but virginal spokeswoman for a conservative Christian organization. Again, Perrotta avoids the easy send-up; as portrayed, there are no dissatisfied boyfriends, grumbling fiancés, or kinky neuroses (or any hint of “doesn’t-really-count-as-sex” sex). As a matter of fact, JoAnn comes across as rather dignified, given the contrast with Ruth’s divorced, lonely, and somewhat aimless life. However, there is no doubt that Perrotta’s sympathy lies with Ruth; the arguments against knowledge of sex are usually spoofed with wild figures, false accounts of disease transmission, or injunctions from the Bible. Ruth at least gives voice to various numbers and facts about STDs and birth control.

Tim enters the story as Ruth’s daughter’s soccer coach. After a win, Tim gathers his players, who form a circle to give thanks to God. Ruth is mortified, and so the plot is set; Tim and Ruth fall into their roles as adversaries, although Tim is generally an unwilling participant. Tim comes off as a sincere man, who wandered in his youth and failed as a husband and father. Now divorced, he shares custody of his daughter and tries hard to make amends. He too is somewhat aimless; he desires the past that he has lost and has no idea how to let go or move on. He is prodded into a relationship, and then marriage, with Carrie, a fellow parishioner, by the pastor.

It would be easy to focus on the red state/blue state split, the evangelical authoritarians against the liberal sophisticates. There are no new arguments here. What I carried from this book was an admiration of how well Perrotta portrays characters. Even the pastor, the obvious lightning rod for anti-evangelical sentiment, doesn’t fall into that role. Pastor Dennis is a dynamic young man who converted Tim. I think enthusiastic best describes Dennis. Dennis is naturally disgusted with Tim for being so weak now; of course Tim made mistakes with his first wife. But now Tim pursues Ruth, spurning Carrie, and it seems realistic to me that while Dennis may overlook past transgressions, he abhors what Tim does now.

I think the least sympathetic character in the whole book is Carrie, Tim’s wife. She is dutiful to a fault. When I write these reviews, I have no idea what the author’s intentions are (unless I’ve read interviews). It seems to me that Perrotta’s intention with Carrie is to use her to represent the worst of the Christian authoritarian movement. First, Tim does admit that Carrie is his better. But then Perrotta twists the knife a little, against Carrie. Carrie realizes it. Her attempt to provide a stable home is her duty. Her settling down with Tim is her duty. When Carrie buys sexy lingerie to ignite passion in their lives, it’s her duty. Submerging her desires is her duty. Her marriage to Tim is a duty.

Therein lies Perrotta’s main point: why are evangelicals so gung-ho about submission? Worse, it isn’t even as if Carrie does her duty for god. It is unclear if her motivation is faith, fear of being alone, or a need to amend her past by starting a life as a chaste wife. It is unclear what emptiness she is trying to fill. I might have misread the book, but I thought that all the other characters seem sincere. They generally believe in what they are doing, even if how they go about it turns into a complete mess. We don’t read too much about JoAnn’s life, or Pastor Dennis’s wife. As I had mentioned, it seems that JoAnn has it together.

As for Pastor Dennis, there is an element of pride in his pushing Tim to do the right thing; Tim was an official convert. Again, that is a reasonable portrayal of a very human sin. Tim struggles; he has lusts, and he knows what comes of them. He lost his wife over it. But lust is on the same continuum as a capacity for passion; he lacks that with his current wife. One problem is the biblical injunction to have stability, to have a woman simply to temper the man’s wild urges. Ruth is no stranger to sex; she has even enjoyed some of it. But she has also felt pain at being used, and her adult life seems devoted to addressing the symptoms of promiscuity, the logistics of avoiding pregnancy and managing disease, and not so much to really helping kids, or herself, find happiness or joy on their own terms. Ruth understands enough to know that religion is not a salve, and neither is living for the moment. But she isn’t sure how to proceed with living in the moment, to be happy and not merely pleasure-seeking. Carrie, by contrast, seems bitter. She has grown to dislike her past (promiscuous) self, but she doesn’t like her present self either. However, she has seized on the fact that being able to suppress her desires places her on moral high ground, and more importantly, higher than her husband. Despite her meekness, that’s the game she decides to play, and she certainly knows the score. That makes her the ugliest character in the story.

The strength of this story lies in the complicated characters, especially Ruth and Tim, who are both aimless but sense they are currently at the nadir of their lives. In the end, Tim of course throws in his lot with Ruth; although it should be a big statement against the use of religion, sex, or marriage as a band-aid on dissatisfaction with life, it felt more like a realistic first step for these two as they try to decide what actually makes them happy. I think this is a sublime ending.