The Inherent Variability of Baseball
(draft)
Baseball depends on statistics; I understand that. But perhaps, I assert, it
takes them much too seriously. In this paper, first shared with SABRites as a
series of posts to the SABR LISTSERV in the fall of 2000, I examine baseball
statistics in a different way: by showing possibilities rather than
probabilities.
Part 1.
The Batting Averages of “Good” players
Let's start with batting averages. Playing "god," I created (on the computer)
twenty baseball players, all with the ability to hit .300, and placed them on
major league teams. In their rookie year, they all got exactly 250 at bats.
Each time one of these 20 players came to bat, I “rolled the dice” in such a
way that he had EXACTLY three chances in ten of getting a hit (walks and HBPs
excepted). When the season ended, how did they look?
Of course, many outcomes COULD have happened. I ran the exercise ONCE; here is
how the season ended:
HART .256 (64 hits in 250 at bats)
IOTA .268
LUCAS .280
JONES .284
NORTON .288
FETCH .292
MORRIS .292
BARNES .300
PARKS .300
ADAMS .308
TURNER .308
KELLY .316
QUINN .316
GARCIA .320
UTLEY .320
OGDEN .324
RIEG .332
CODY .340
SPEAR .344
DOWNS .348 (87 hits in 250 at bats, 23 more than Hart)
Well -- Downs got favorable mention for "Rookie of the Year," while Hart was not
sure he'd have a job the next season. Yet there was ABSOLUTELY NO DIFFERENCE
between Downs and Hart. None at all. Nada. Zip. Chance, and chance alone,
accounted for the differences.
Continuing as the "god" who built these 20 players, each of whom has exactly 3
chances in 10 of getting a hit every time he comes to the plate (walks, errors,
HBPs, etc. excepted), I went on to a second season. All 20 of my players were
picked up for season 2, and I controlled that season so that each man got
exactly 500 at bats. As might be expected, the variability was down, though
perhaps not as much as one might expect.
KELLY .270 (135 hits)
IOTA .278
GARCIA .280
MORRIS .284
TURNER .286
QUINN .288
NORTON .302
ADAMS .304
DOWNS .304
UTLEY .306
BARNES .308
CODY .310
OGDEN .310
RIEG .310
PARKS .312
LUCAS .314
SPEAR .314
HART .316
FETCH .320
JONES .338 (169 hits, 34 more than Kelly)
Downs dropped from his rookie year of .348 to .304; Hart improved from .256 to
.316. The sportswriter in Hart's city wrote three columns on how Hart was
improving; Baseball Weekly also mentioned him with favor. A "comer." Jones also
improved a lot; he even got a vote or two for MVP with his .338 average. Kelly's
year was a big disappointment in his city, as he dropped from .316 in his rookie
year to .270. But there was ABSOLUTELY NO DIFFERENCE between any of these
players; it was all chance. Put another way, the value of each of the 20
players to his team was identical.
The teams that had these players, because I made them this way, had good years
and went to the playoffs. All 20 of these guys played. By the end of the
playoffs, each man had batted 20 times (walks, errors, etc. excepted). Here is
the outcome:
QUINN .050
CODY .100
ADAMS .200
BARNES .200
PARKS .200
TURNER .200
IOTA .250
NORTON .250
UTLEY .250
DOWNS .300
HART .300
MORRIS .300
SPEAR .300
KELLY .350
LUCAS .350
RIEG .350
JONES .400
OGDEN .400
FETCH .450
GARCIA .550
The writers, of course, gave Garcia the MVP award and had harsh words for
Quinn, pointing out that he had batted .316 his rookie year, dropped to .288 in
the past season, and, when facing the superior pitching of the playoffs, had
gone only 1 for 20.
A variability of 500 points, and not one cause of that variability except
chance. That did not stop the baseball writers, of course. They had a lot to
say. So did the fans. It is estimated that well in excess of 500,000 hours of
barroom talk were consumed over the winter as the relative merits of these
players were debated, and sometimes fought over! All with almost absolute
certainty that the persons involved knew what they were talking about.
No -- I'm not knocking baseball writers, fans or barroom talk. But I am
suggesting that chance may play a larger part in how the statistics turn out,
and in how players are perceived, than some people think.
Now the story jumps ahead a few years. The careers of all the players are
finished. All of them wound up (because I said so) with 3,000 at bats. Here is
how they finished:
UTLEY .286
IOTA .287
CODY .290
MORRIS .293
TURNER .293
KELLY .295
JONES .298
ADAMS .300
QUINN .301
FETCH .302
OGDEN .303
SPEAR .303
DOWNS .305
PARKS .306
BARNES .307
HART .307
NORTON .307
LUCAS .315
RIEG .315
GARCIA .318
You remember Garcia, don't you? He is the one who was the MVP in the playoffs.
He went on to bat .318 lifetime. Utley, on the other hand, started with a .320
in his rookie year, dropped to .306 in his second season, did poorly in the
playoffs (.250) and finished with .286 lifetime. A creditable career, but not a
great one. Yet there was no difference at all between Garcia and Utley except
chance: the vagrant gust of wind, the rough infield, the insect that met the
pitched ball and changed its path ever so slightly. From a comfortable
armchair, we look at Utley and Garcia, and while neither (at least on the basis
of batting average alone) is a HOF candidate, Garcia is no doubt remembered in
his home team’s town with some fondness. Utley probably is not.
Baseball is, of course, much more than chance, and my thesis is not that
statistics are without value. But we agonize (sometimes) that Mantle missed .300
by so little, and do not acknowledge that if the universe were replayed 10 or 20
times, he might well have had a final batting average much different from .298,
perhaps higher, perhaps lower.
My protocol for the preceding was to set up a player as a spreadsheet, then run
and print the spreadsheet exactly 20 times. I then wrote player names, in
alphabetical order, on each of the 20 spreadsheets, and analyzed the results.
Clearly, I could do this n times, where n is any number I wanted. I did it
exactly 20 times and stopped. I could also have done it 100 times and selected
the 20 I wanted, but that protocol would not have been a good example of
anything, as I could have selected results to “tell a good story” instead of
the actual story that came out.
The specific spreadsheet formula (Microsoft Works) used for each at bat was
=IF((RAND()-$B$6)<0,1,0)
where B6 was set to .3, so each at bat scores 1 (a hit) with probability .3 and
0 otherwise. The 1s are simply summed over the number of at bats desired and
divided by that number to give the batting average.
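For readers without Microsoft Works, the same per-at-bat rule can be sketched in Python. This is a minimal sketch, not the spreadsheet actually used: `random.random()` stands in for RAND(), `HIT_PROB` stands in for cell B6, and the function names `at_bat` and `season_average` are illustrative only.

```python
import random

HIT_PROB = 0.3  # stands in for cell B6


def at_bat(rng: random.Random, p: float = HIT_PROB) -> int:
    """Mirror of =IF((RAND()-$B$6)<0,1,0): 1 for a hit, 0 otherwise."""
    return 1 if rng.random() - p < 0 else 0


def season_average(rng: random.Random, at_bats: int, p: float = HIT_PROB) -> float:
    """Sum the 1s over the season's at bats, then divide by the at bats."""
    hits = sum(at_bat(rng, p) for _ in range(at_bats))
    return hits / at_bats


if __name__ == "__main__":
    rng = random.Random(2000)  # fixed seed so a run can be repeated
    # Twenty identical .300 hitters, 250 at bats each, as in the rookie season
    averages = sorted(season_average(rng, 250) for _ in range(20))
    print(" ".join(f"{a:.3f}" for a in averages))
```

Changing the seed gives a different, equally legitimate season, which is the whole point of the exercise.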
Yes, I know that rigorous statistical analyses are also possible. But they (in
general) don't show what might actually happen. It is much like the analysis
of bridge hands: useful, but an actual deal will give a player more insight,
even though that particular deal (any particular deal) is so rare that he will
likely never see it again. So consider the above, and what follows, as simply a
representative possibility.
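The rigorous side can at least be glimpsed. If each at bat is an independent trial with hit probability p, which is exactly what the spreadsheet formula assumes, a batting average over n at bats has standard deviation sqrt(p(1 - p)/n). The short sketch below (the function name `avg_std_dev` is illustrative) evaluates that for the Part 1 sample sizes. It also shows why the spread fell only modestly from 250 to 500 at bats: doubling n shrinks the spread by a factor of only about 1/sqrt(2).

```python
from math import sqrt


def avg_std_dev(p: float, at_bats: int) -> float:
    """Standard deviation of a batting average over `at_bats` independent trials."""
    return sqrt(p * (1 - p) / at_bats)


# Sample sizes from Part 1: rookie year, season 2, playoffs, career
for n in (250, 500, 20, 3000):
    sd = avg_std_dev(0.300, n)
    # Roughly 95% of true .300 hitters should land within about two SDs of .300
    print(f"{n:5d} AB: SD = {sd:.3f}, "
          f"typical range {0.300 - 2 * sd:.3f} to {0.300 + 2 * sd:.3f}")
```

Nearly all of the Part 1 averages fall inside these bands; the exceptions (the .050 and .550 playoff marks and Garcia's .318 career) sit just outside, which is to be expected when twenty players are drawn each time.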
Part 2.
HOF caliber players
Next I reran the simulator using players with a .35 chance of a hit in each at
bat, giving each of these HOF-caliber players exactly 8,000 at bats. I also ran
each player through 20 at bats in each of four World Series. Here are the
results:
Name      Life avg.  WS#1  WS#2  WS#3  WS#4
Abner       .344     .400  .400  .300  .150
Baker       .347     .350  .300  .200  .400
Champ       .350     .150  .550  .350  .450
Dempsey     .338     .300  .250  .400  .450
Epsley      .344     .250  .400  .150  .400
Folger      .358     .300  .300  .300  .300
Grimes      .353     .300  .450  .300  .350
Hanes       .350     .600  .350  .350  .500
Isley       .343     .300  .350  .400  .350
Jenkins     .353     .300  .250  .400  .250
This simulation is much less interesting. By the time 8,000 at bats are
reached, the variability is down a great deal; the lifetime range above runs
only from Dempsey at .338 to Folger at .358. And all of these "greats" excelled
in World Series play, although three times a player batted only .150 in a
series. Still -- would we not regard a .358 lifetime hitter (Folger) as
significantly better than one who hit .338 (Dempsey)?
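How big a gap should chance alone open up among ten identical hitters? A minimal Monte Carlo sketch can estimate the typical best-minus-worst spread. It assumes each lifetime average is approximately normal with mean .350 and standard deviation sqrt(.35 × .65 / 8000), about .0053. That is a normal approximation to the binomial, not the at-bat-by-at-bat method used above, and `mean_range` is an illustrative name.

```python
import random
from math import sqrt


def mean_range(p: float, at_bats: int, players: int, trials: int, seed: int = 1) -> float:
    """Average (best - worst) lifetime average among `players` identical hitters."""
    rng = random.Random(seed)
    sd = sqrt(p * (1 - p) / at_bats)  # normal approximation to the binomial
    total = 0.0
    for _ in range(trials):
        avgs = [rng.gauss(p, sd) for _ in range(players)]
        total += max(avgs) - min(avgs)
    return total / trials


print(f"{mean_range(0.350, 8000, 10, 20000):.3f}")
```

The typical spread comes out at roughly .016, so the .020 gap between Folger and Dempsey is a bit wider than average but not at all unusual for ten hitters of identical ability.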
Part 3.
Below average players.
I ran two short experiments on below average players. Ten men, each with a
true .250 batting average over a career of 2,000 at bats, finished anywhere from
.231 to .265. Ten men with a true .200 average over a career of 1,000 at bats
finished from .180 to .213. However, one of these last players did bat .450 in
one “World Series.” Can you picture the excited talk as people refer to “light
hitting Joe Doaks” who excelled in one series? Yet “Joe’s” results in that
series were just chance.
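Just how unusual is Joe's series? Assume, as in Part 2, that a "World Series" is 20 at bats. A .450 series then means 9 or more hits, and the binomial distribution gives the chance directly. A sketch (the helper name `prob_at_least` is illustrative):

```python
from math import comb


def prob_at_least(hits: int, at_bats: int, p: float) -> float:
    """P(at least `hits` hits in `at_bats` independent trials with hit probability p)."""
    return sum(
        comb(at_bats, k) * p**k * (1 - p) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )


# A true .200 hitter going 9 (or more) for 20
print(f"{prob_at_least(9, 20, 0.200):.4f}")
```

That works out to about one chance in a hundred per series for any given .200 hitter. Over many players and many series, someone will eventually have one.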
Part 4.
Team variability
I've looked at batting average variability, and argued that chance can account
for a wide range of results for any player, regardless of how good he is. Now
what about teams?
My protocol was as follows: I created a league of eight teams, each with the
inherent capability, in terms of averages, of the 1948 Cleveland Indians. That
team went 97 and 59. How would a season look if all eight teams were balanced,
exactly the same as the 1948 Indians? I differentiated the teams by color and
ran five complete seasons. Here are the results, which I found somewhat
surprising:
Season #1
Team Record
Pink 87 67
Green 86 68
Red 81 73
Aqua 79 75
Brown 78 76
Blue 72 82
Yellow 67 87
White 66 88
The managers of White and Yellow got fired. But their teams were EXACTLY the
same as all the others.
Season #2
Team Record
Brown 85 69 Improved from fifth place
Blue 85 69 Improved from sixth place
Aqua 80 74 Improved from fourth place
Green 77 77 Dropped from second place
Pink 75 79 Dropped from first place
Red 75 79 Dropped from third place
Yellow 72 82
White 67 87
The manager of Brown was praised to the skies for bringing his team up from
5th to first place. Blue’s manager also came in for some kudos. But all the
teams were the same.
Season #3
Team Record
Yellow 85 69
Green 81 73
Pink 79 75
Blue 79 75
Brown 75 79
Red 75 79
White 73 81
Aqua 69 85
This year all the pundits wrote about Yellow.
Season #4
Team Record
Green 86 68
Red 83 71
Brown 78 76
Yellow 78 76
Pink 78 76
Aqua 73 81
Blue 72 82
White 68 86 Four years at or near the cellar and the owners of White are getting frustrated.
Season #5
Team Record
White 83 71
Brown 81 73
Yellow 80 74
Pink 80 74
Red 76 78
Aqua 75 79
Green 72 82
Blue 69 85
No -- I didn't "make" White win at last. That's just the way it turned out.
There was no difference at all between the eight teams. So the next time the
Cubs finish 14 games out, can I say it is just chance that did it? I think not,
but one can say that chance has a role to play.
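The team result can be approximated far more crudely than with the full game simulation used here. The sketch below rests on two loud assumptions that are not part of the original experiment: every game between equal teams is a 50/50 coin flip, and each of the eight teams plays each other team 22 times (7 × 22 = 154 games). Under those assumptions, a team's win total has a standard deviation of sqrt(154 × 0.25), about 6.2 games, so a gap of 20 or more wins between first and last place is ordinary.

```python
import random
from itertools import combinations

TEAMS = ["Pink", "Green", "Red", "Aqua", "Brown", "Blue", "Yellow", "White"]
GAMES_PER_PAIR = 22  # assumption: 7 opponents x 22 games = 154-game season


def play_season(rng: random.Random) -> dict[str, int]:
    """Return wins per team when every game is a fair coin flip (a simplification)."""
    wins = {team: 0 for team in TEAMS}
    for a, b in combinations(TEAMS, 2):
        for _ in range(GAMES_PER_PAIR):
            winner = a if rng.random() < 0.5 else b
            wins[winner] += 1
    return wins


if __name__ == "__main__":
    rng = random.Random(1948)
    for season in range(1, 6):
        wins = play_season(rng)
        standings = sorted(wins.items(), key=lambda kv: kv[1], reverse=True)
        print(f"Season #{season}: "
              + ", ".join(f"{t} {w}-{154 - w}" for t, w in standings))
```

Every simulated season sums to 616 wins (28 pairings × 22 games), the same total as each of the five seasons in the tables above.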
The question may be asked how the last experiment might be replicated. I have
the code; it is a variant of a computer baseball exercise written about ten
years ago and sold as shareware. I will package a set of files that will allow
the preceding experiment to be performed; the main application program is
disabled. If anyone wants a copy, email me privately
(BURGY@www.burgy.50megs.com), and I'll send the files, with instructions, as a
ZIP file.
Part 5
Players Within Teams.
The next set of simulation tests can be replicated by anyone who has the PC
shareware program SIMBASE. This was written about 1989, and may no longer be
available. The author / address is:
Phillip Smith
PMS Software of Canada
109 Tripp Crescent
Nepean, Ontario, Canada K2J 1E2
In these tests, I took the 1987 Indians and had them play against themselves,
first for a season of 154 games, then for a stretch of 600 games, approximating
four seasons.
I set up the same nine-player lineup for both teams in every game (each player
plays the entire game):
1. Julio Franco          1. Julio Franco
2. Brook Jacoby          2. Brook Jacoby
3. Joe Carter            3. Joe Carter
4. Mel Hall              4. Mel Hall
5. Cory Snyder           5. Cory Snyder
6. Carmelo Castillo      6. Carmelo Castillo
7. Eddie Williams        7. Eddie Williams
8. Junior Noboa          8. Junior Noboa
9. Tommy Hinzo           9. Tommy Hinzo
Here are the actual batter statistics for the 1987 Indians against all pitchers
in the league. I had Tom Candiotti pitch the simulated games, and as he was
somewhat different from the league average that year, the results will show
some differences based on his pitching characteristics as well as chance.
Cleveland Indians   AB   1B  2B  3B  HR    H  BB   SO   OO    BA    SA
Julio Franco       495  123  24   3   8  158  60   56  281  .319  .428
Brook Jacoby       540  100  26   4  32  162  78   73  305  .300  .540
Joe Carter         588   94  27   2  32  155  36  105  328  .263  .479
Mel Hall           485   96  21   1  18  136  21   68  281  .280  .439
Cory Snyder        577   76  25   2  33  136  32  166  275  .235  .457
Carmelo Castillo   220   27  17   0  11   55  16   52  113  .250  .477
Eddie Williams     283   55  12   0  15   82  40   56  145  .289  .491
Junior Noboa       511   89  36   5  19  149  40   41  321  .291  .493
Tommy Hinzo        257   53   9   3   3   68  12   49  140  .264  .357
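The columns in these stat lines hang together in ways that are easy to check: H = 1B + 2B + 3B + HR, AB = H + SO + OO (so OO appears to count outs other than strikeouts), BA = H / AB, and SA = total bases / AB. The sketch below checks two lines from the table above. The `StatLine` class is illustrative and not part of SIMBASE; BB is left out because no identity here uses it. The published averages appear to be cut off at three places, so the check allows a 0.001 tolerance.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    name: str
    ab: int
    singles: int
    doubles: int
    triples: int
    hr: int
    h: int
    so: int
    oo: int  # assumed to mean outs other than strikeouts
    ba: float
    sa: float

    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    def check(self) -> list[str]:
        """Return the identities that fail (an empty list means the line is consistent)."""
        problems = []
        if self.singles + self.doubles + self.triples + self.hr != self.h:
            problems.append("H != 1B + 2B + 3B + HR")
        if self.h + self.so + self.oo != self.ab:
            problems.append("AB != H + SO + OO")
        # Averages are printed to three places, so allow a 0.001 tolerance
        if abs(self.h / self.ab - self.ba) >= 0.001:
            problems.append("BA != H / AB")
        if abs(self.total_bases() / self.ab - self.sa) >= 0.001:
            problems.append("SA != TB / AB")
        return problems


lines = [
    StatLine("Julio Franco", 495, 123, 24, 3, 8, 158, 56, 281, 0.319, 0.428),
    StatLine("Brook Jacoby", 540, 100, 26, 4, 32, 162, 73, 305, 0.300, 0.540),
]
for line in lines:
    print(line.name, line.check() or "consistent")
```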
After 154 games, these were the results:
---------------------------------------------------------------------------
                                    Runs   Hits  Errors  Wins
Visiting team (Cleveland Indians)    900   1461     161    74
Home team (Cleveland Indians)        854   1368     195    80
---------------------------------------------------------------------------
And the player statistics for the year:
Cleveland Indians (Visitors)   AB   1B  2B  3B  HR    H   BB   SO   OO    BA    SA
Julio Franco                  649  140  31   1   9  181   93   63  405  .278  .371
Brook Jacoby                  616  107  31   7  53  198  107   55  363  .321  .652
Joe Carter                    664   84  20   2  40  146   47   86  432  .219  .436
Mel Hall                      675  114  26   1  25  166   19   73  436  .245  .398
Cory Snyder                   621   65  37   0  44  146   54  136  339  .235  .507
Carmelo Castillo              615   89  50   0  36  175   43  102  338  .284  .541
Eddie Williams                551   86  18   0  35  139   99   84  328  .252  .475
Junior Noboa                  579   90  36  10  25  161   47   37  381  .278  .504
Tommy Hinzo                   576  109  19  10  11  149   32   67  360  .258  .383
Cleveland Indians (Home)       AB   1B  2B  3B  HR    H   BB   SO   OO    BA    SA
Julio Franco                  627  129  32   6   8  175   76   48  404  .279  .387
Brook Jacoby                  589   97  31   5  31  164   97   67  358  .278  .505
Joe Carter                    626   83  24   0  48  155   47   85  386  .247  .515
Mel Hall                      628  119  23   2  22  166   33   63  399  .264  .412
Cory Snyder                   594   76  22   2  43  143   50  145  306  .240  .501
Carmelo Castillo              577   59  39   0  22  120   56  117  340  .207  .389
Eddie Williams                527  103  15   0  31  149   85   86  292  .282  .487
Junior Noboa                  543  100  36   3  27  166   49   42  335  .305  .532
Tommy Hinzo                   541   99  15  10   6  130   34   74  337  .240  .338
Franco hit .278 as a visitor and .279 at home -- pretty close (difference +.001).
But Jacoby hit .321 and .278 (difference -.043).
Carter hit .219 and .247 (difference +.028).
Hall hit .245 and .264 (difference +.019).
Snyder hit .235 and .240 (difference +.005).
Castillo hit .284 and .207 (difference -.077)!
Williams hit .252 and .282 (difference +.030).
Noboa hit .278 and .305 (difference +.027).
Hinzo hit .258 and .240 (difference -.018).
Since I don't know the design of SIMBASE, I don't know whether there is a home
team / visiting team bias built in. There might be. But such a bias is not
likely to explain the differences shown above. Interested readers can easily
compare the other statistics. Castillo's statistics alone sort of boggle the
mind. One guy we'd be giving a bonus to; the other is likely out of a job. Yet
both are the same player, with the same capabilities, playing on the same team.
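One rough way to weigh these splits is a two-proportion z-score: the difference between the two averages divided by its standard error, using the pooled average. This treats every at bat as an independent trial and ignores everything else SIMBASE models, so it is a yardstick rather than a test of the simulator. The sketch below uses the hits and at bats from the two tables above; `SPLITS` and `z_score` are illustrative names.

```python
from math import sqrt

# (hits, at bats) as visitors, then as the home team, from the tables above
SPLITS = {
    "Franco": ((181, 649), (175, 627)),
    "Jacoby": ((198, 616), (164, 589)),
    "Carter": ((146, 664), (155, 626)),
    "Hall": ((166, 675), (166, 628)),
    "Snyder": ((146, 621), (143, 594)),
    "Castillo": ((175, 615), (120, 577)),
    "Williams": ((139, 551), (149, 527)),
    "Noboa": ((161, 579), (166, 543)),
    "Hinzo": ((149, 576), (130, 541)),
}


def z_score(h1: int, ab1: int, h2: int, ab2: int) -> float:
    """Pooled two-proportion z-score for (h2/ab2 - h1/ab1)."""
    pooled = (h1 + h2) / (ab1 + ab2)
    se = sqrt(pooled * (1 - pooled) * (1 / ab1 + 1 / ab2))
    return (h2 / ab2 - h1 / ab1) / se


for name, ((hv, abv), (hh, abh)) in SPLITS.items():
    diff = hh / abh - hv / abv
    print(f"{name:9s} {diff:+.3f}  z = {z_score(hv, abv, hh, abh):+.2f}")
```

By this yardstick, only Castillo's gap reaches about three standard errors (z near -3.1); every other player stays within about ±1.6. Under this simple model a split as large as Castillo's is uncommon, which is part of why his line is so striking.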
I wanted to look at possible home team bias, so I ran two tests of 600 games
each, the equivalent of about four seasons each. In test 1, the home team won,
301 to 299. The widest gap I found among the batters was Williams, who batted
.292 as a member of the visiting team and .264 as a member of the home team. All
the other gaps, however, were under 10 points. In test 2, the visitors
prevailed, 310 to 290. Batting gaps ranged from 1 to 16 points, most in double
digits.
This seemed to indicate no home team bias, but not being convinced, I ran 20
more series of 600 games. Here are the results (including the two tests above):
Test Home Visitors Home win %
1 301 299 50.2%
2 290 310 48.3%
3 292 308 48.7%
4 301 299 50.2%
5 306 294 51.0%
6 336 264 56.0%
7 318 282 53.0%
8 313 287 52.2%
9 302 298 50.3%
10 330 270 55.0%
11 327 273 54.5%
12 289 311 48.2%
13 300 300 50.0%
14 301 299 50.2%
15 309 291 51.5%
16 305 295 50.8%
17 300 300 50.0%
18 308 292 51.3%
19 297 303 49.5%
20 301 299 50.2%
21 289 311 48.2%
22 305 295 50.8%
Totals 6720 6480 50.9%
This suggests to me that the "no home team bias" assumption might not be true.
However, since the generally accepted home team advantage is understood to be
larger than the roughly 51% measured above, it does appear that if this
simulator has a home team bias, it is smaller than the accepted rate.
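The totals can be weighed with a standard test. If home and visitors were truly even, home wins in 13,200 games would follow a binomial distribution with p = 0.5. That gives an expected 6,600 home wins with a standard deviation of sqrt(13,200 × 0.25), about 57 wins. A minimal sketch of the normal-approximation test (variable names are illustrative):

```python
from math import erfc, sqrt

home_wins, games = 6720, 22 * 600  # totals from the table above

expected = games * 0.5
sd = sqrt(games * 0.5 * 0.5)
z = (home_wins - expected) / sd
# Two-sided p-value under the normal approximation to the binomial
p_value = erfc(abs(z) / sqrt(2))
print(f"home win rate {home_wins / games:.1%}, z = {z:.2f}, two-sided p = {p_value:.3f}")
```

The z-score comes out a little above 2, which is borderline by conventional standards and fits the cautious "might not be true" above.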
Part 6
The value of a superstar
Back in 1960, I built a baseball simulator for the IBM 1620 computer. One of
the issues I was concerned with was the relative worth of a "super-slugger."
One of the tests I ran then was to create two teams equal in every way overall,
but with one having every player of equal capability and the other having
eight players of lesser capability with a super-slugger batting fourth. The
question I had was: how much better would the second team be than the first?
That was 40 years ago. Notes indicate that overall I saw the balanced team win
more frequently than the unbalanced one, which would argue against spending
one's salary dollars on a super-slugger. But I did not keep the results, so
those tests have no historical value except for the question they pose.
In this test, I used SIMBASE again, creating two teams that were exactly equal
overall. One had nine players, each with
a .322 batting average (186 for 578),
a .425 slugging average, 5 homers, 8 triples, and 29 doubles;
the other had eight players, each with
a .299 batting average (173 for 578),
a .380 slugging average, 3 homers, 6 triples, and 26 doubles,
and one player, batting fourth, with
a .501 batting average (290 for 578),
a .785 slugging average, 21 homers, 24 triples, and 53 doubles.
These two teams were then the same statistically. I played 10 sets of 600 games
each between them. They wound up dead even: 3,000 wins each. That shot down my
thesis. Darn! When an owner picks a super-player, of course, he also looks for
the intangibles: how much inspiration he might be to other players, how many
extra fans he will bring in, and so forth. I know that it was Feller, on the
Indians of the '30s to '50s, who brought me out to the ballpark. Over that
period his pitching probably accounted for at least a couple dozen "extra"
games for our family.
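The claim that the two lineups are "the same statistically" can be checked from the figures given. Both come to 1,674 hits, 261 doubles, 72 triples, 45 homers and 2,214 total bases in 9 × 578 at bats. A sketch (the tuple layout and the `totals` helper are illustrative):

```python
# (hits, doubles, triples, homers) per player, each over 578 at bats
BALANCED = [(186, 29, 8, 5)] * 9
STAR_TEAM = [(173, 26, 6, 3)] * 8 + [(290, 53, 24, 21)]


def totals(lineup: list[tuple[int, int, int, int]]) -> tuple[int, int, int, int, int]:
    """Sum hits and extra-base hits, and compute total bases, for a lineup."""
    hits = sum(p[0] for p in lineup)
    doubles = sum(p[1] for p in lineup)
    triples = sum(p[2] for p in lineup)
    homers = sum(p[3] for p in lineup)
    singles = hits - doubles - triples - homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return hits, doubles, triples, homers, total_bases


for label, lineup in (("balanced", BALANCED), ("star", STAR_TEAM)):
    hits, d, t, hr, tb = totals(lineup)
    ab = 578 * len(lineup)
    print(f"{label:8s} H={hits} 2B={d} 3B={t} HR={hr} TB={tb} "
          f"BA={hits / ab:.3f} SA={tb / ab:.3f}")
```

Because every component matches, any difference between the two teams in play has to come from how the production is distributed across the lineup, which is exactly the question the test was designed to ask.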
I've made the argument that chance plays a large part in baseball, and that its
influence on the outcome of games, as well as on the resulting statistics, is
often overlooked by some of us: fans, SABRites, writers and broadcasters. That
does not diminish, in my judgment, either the inherent worth or the enjoyment
of statistics. My thesis simply enjoins us to take them for what they are
worth: imperfect measures, made by imperfect people, of imperfect players, some
better than others, all talented far beyond the average person, who have given
us over a hundred years of great enjoyment and will continue to do so for years
to come. As a Christian, I fully expect to see many sports played in heaven.
Baseball will be prominent among them. What delights we shall still see. Ruth
batting against Feller? Barry Bonds against Addie Joss? Hank Aaron against
Nolan Ryan? What joy. What bliss. Maybe they will even let me play!
John Burgeson, February 2001