The Inherent Variability of Baseball (draft)

 

Baseball depends on statistics; I understand that. But, I assert, perhaps it does so by taking them much too seriously. In this paper, first shared with SABRites as a series of posts to the SABR LISTSERV in the fall of 2000, I examine baseball statistics in a different way, by showing possibilities, rather than probabilities.

 

Part 1. The Batting Averages of “Good” players

 

Let's start with batting averages. Playing "god," I created (on the computer) twenty (20) baseball players, all with the ability to hit .300, and placed them on major league teams. In their rookie year, they all got exactly 250 at bats. Each time one of these 20 players came to bat, I “rolled the dice” in such a way that he had EXACTLY three chances in ten of getting a hit, excepting walks and HBPs. When the season ended, how did they look?

 

Of course, many outcomes COULD have happened. I ran the exercise ONCE; here is how the season ended:

 

HART        .256       64 hits in 250 at bats.

IOTA        .268

LUCAS       .280

JONES       .284

NORTON      .288

FETCH       .292

MORRIS      .292

BARNES      .300

PARKS       .300

ADAMS       .308

TURNER      .308

KELLY       .316

QUINN       .316

GARCIA      .320

UTLEY       .320

OGDEN       .324

RIEG        .332

CODY        .340

SPEAR       .344

DOWNS       .348       87 hits in 250 at bats. 23 more than Hart.

 

Well -- Downs got favorable mention for "Rookie of the Year" while Hart was not sure he'd have a job the next season. Yet there was ABSOLUTELY NO DIFFERENCE between Downs and Hart. None at all. Nada. Zip. Chance, and chance alone, accounted for the differences.

 

I continued as the "god" that built these 20 players, each of whom has exactly 3 chances in 10 of getting a hit every time he comes to the plate (walks, errors, HBPs, etc. excepted). All 20 of my players were picked up for season 2, and I controlled that season so that each man got exactly 500 at bats. As might be expected, the variability was down -- though perhaps not by as much as one might expect.

 

KELLY       .270       135 hits

IOTA        .278

GARCIA      .280

MORRIS      .284

TURNER      .286

QUINN       .288

NORTON      .302

ADAMS       .304

DOWNS       .304

UTLEY       .306

BARNES      .308

CODY        .310

OGDEN       .310

RIEG        .310

PARKS       .312

LUCAS       .314

SPEAR       .314

HART        .316

FETCH       .320

JONES       .338       169 hits. 34 more hits than Kelly.

 

Downs dropped from his rookie year of .348 to .304; Hart improved from .256 to .316. The sportswriter in Hart's city wrote three columns on how Hart was improving; Baseball Weekly also mentioned him with favor. A "comer." Jones also improved a lot -- he even got a vote or two for MVP with his .338 average. Kelly's year was a big disappointment in his city, dropping from .316 in his rookie year to .270. But -- there was ABSOLUTELY NO DIFFERENCE between any of these players -- it was all chance that operated. Put another way, the value of each of the 20 players to his team was identical.

 

The teams that had these players, because I made them this way, had good years and went to the playoffs. All 20 of these guys played. By the end of the playoffs, each man had batted 20 times, excepting walks, errors, etc. Here is the outcome:

 

QUINN       .050

CODY        .100

ADAMS       .200

BARNES      .200

PARKS       .200

TURNER      .200

IOTA        .250

NORTON      .250

UTLEY       .250

DOWNS       .300

HART        .300

MORRIS      .300

SPEAR       .300

KELLY       .350

LUCAS       .350

RIEG        .350

JONES       .400

OGDEN       .400

FETCH       .450

GARCIA      .550

 

The writers, of course, gave Garcia the MVP award, and had harsh words for Quinn, pointing out that he had batted .316 his rookie year, dropped to .288 in the past season, and, when facing the superior pitching of the playoffs, had gone only 1 for 20.

 

A variability of 500 points. And not one cause of that variability except chance. That did not stop the baseball writers, of course. They had a lot to say. So did the fans. It is estimated that well in excess of 500,000 hours of barroom talk were consumed over the winter as the relative merits of these players were debated, and sometimes fought over! All with almost absolute certainty that the persons involved knew what they were talking about.

 

No -- I'm not knocking baseball writers, fans or barroom talk. But I am suggesting that chance may play a larger part in how the statistics turn out -- and in how players are perceived -- than some people think.

 

Now the story jumps ahead a few years. The careers of all the players are finished. All of them wound up (because I said so) with 3,000 at bats. Here is how they finished:

 

UTLEY       .286

IOTA        .287

CODY        .290

MORRIS      .293

TURNER      .293

KELLY       .295

JONES       .298

ADAMS       .300

QUINN       .301

FETCH       .302

OGDEN       .303

SPEAR       .303

DOWNS       .305

PARKS       .306

BARNES      .307

HART        .307

NORTON      .307

LUCAS       .315

RIEG        .315

GARCIA      .318

 

You remember Garcia, don't you? He is the one who was the MVP in the playoffs. He went on to bat .318 lifetime. Utley, on the other hand, started with a .320 in his rookie year, dropped to .306 in his second season, did poorly in the playoffs (.250) and finished with .286 lifetime. A creditable career, but not a great one. Yet, there was no difference at all between Garcia and Utley except chance -- the vagrant gust of wind, the rough infield, the insect that encountered the pitched ball and changed the ball's path ever so slightly. From a comfortable armchair, we look at Utley and Garcia, and while neither (at least on the basis of batting average alone) is a HOF candidate, Garcia is no doubt remembered in his home team’s town with some fondness. Utley is probably not.

 

Baseball is, of course, much more than chance, and my thesis is not that statistics are without value. But we agonize (sometimes) that Mantle missed .300 by so little -- and do not acknowledge that if the universe were replayed 10 or 20 times, he might well have had a final batting average much different from .298 -- perhaps higher -- perhaps lower.

 

My protocol for the preceding was to set up a player as a spreadsheet, then run & print the spreadsheet exactly 20 times. I then wrote player names, in alphabetical order, on each of the 20 spreadsheets, and analyzed the results. Clearly, I could do this n times, where n is any number I wanted. I did it exactly 20 times and stopped. I could have also done it 100 times and selected the 20 I wanted. This protocol would clearly not have been a good example of anything, as I could have selected results to “tell a good story” instead of the actual story that came out.

 

The specific spreadsheet formula (Microsoft WORKS) used for each at bat was

=IF((RAND()-$B$6)<0,1,0)

 

where B6 was set to .3

 

and the resulting 1s (hits) are simply summed over the number of at bats desired, then divided by that number to give the batting average.
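For readers without a spreadsheet handy, the same per-at-bat test can be sketched in Python. This is only an illustration of the formula above, not the original WORKS spreadsheet; the function names, the `seed` parameter, and the sorting of results are my own assumptions.

```python
import random


def simulate_average(at_bats, hit_prob=0.300, rng=random):
    """Simulate one player; return (hits, batting average)."""
    # Same test as the spreadsheet cell: count a hit when RAND() - B6 < 0.
    hits = sum(1 for _ in range(at_bats) if rng.random() - hit_prob < 0)
    return hits, hits / at_bats


def simulate_players(n_players, at_bats, hit_prob=0.300, seed=None):
    """Simulate several identical players; return their averages, low to high."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatable runs
    return sorted(
        simulate_average(at_bats, hit_prob, rng)[1] for _ in range(n_players)
    )


if __name__ == "__main__":
    # Twenty identical ".300 hitters" with 250 at bats each, run once.
    for avg in simulate_players(20, 250):
        print(f"{avg:.3f}")
```

Changing the 250 to 500, 20 or 3000 gives a season 2, a playoff or a career in the same spirit. Each run will differ, which is the point.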

 

Yes, I know that rigorous statistical analyses are also possible. But they (in general) don't show what might actually happen. In much the same way, an analysis of bridge hands is useful -- but an actual deal will give a player more insight, even though that particular deal (any particular deal) is so rare that he will likely never see it again. So consider the above, and what follows, as simply a representative possibility.
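For anyone who does want one piece of the rigorous view alongside the simulated seasons, here is a minimal sketch. It assumes every at bat is an independent trial with the same chance of a hit (the same assumption the spreadsheet makes), in which case the standard deviation of a batting average is sqrt(p(1-p)/n). The function name is my own.

```python
import math


def average_sd(hit_prob, at_bats):
    """Standard deviation of a batting average under the binomial model."""
    return math.sqrt(hit_prob * (1 - hit_prob) / at_bats)


if __name__ == "__main__":
    # The sample sizes used in Part 1: playoffs, rookie year, season 2, career.
    for at_bats in (20, 250, 500, 3000):
        sd_points = 1000 * average_sd(0.300, at_bats)
        print(f"{at_bats:5d} at bats: typical spread of about "
              f"{sd_points:.0f} points of batting average")
```

Because the spread shrinks only with the square root of the at bats, doubling a season from 250 to 500 at bats narrows it by a factor of about 1.4, not 2.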

 

Part 2. HOF caliber players

 

Next I reran the simulator using players with a .35 chance of a hit in each at bat, giving each of these HOF-caliber players exactly 8,000 at bats. I also ran each player through 20 at bats in four world series. Here are the results:

 

          life   WS#1    WS#2   WS#3  WS#4

Name      avg.   avg.    avg.   avg.  avg.

 

Abner    .344   .400    .400   .300  .150

Baker    .347   .350    .300   .200  .400

Champ    .350   .150    .550   .350  .450

Dempsey  .338   .300    .250   .400  .450

Epsley   .344   .250    .400   .150  .400

Folger   .358   .300    .300   .300  .300

Grimes   .353   .300    .450   .300  .350

Hanes    .350   .600    .350   .350  .500

Isley    .343   .300    .350   .400  .350

Jenkins  .353   .300    .250   .400  .250

 

This simulation is much less interesting. By the time 8,000 at bats are attained, the variability is down a great deal, the lifetime range above being only between Folger at .358 and Dempsey at .338. And all of these "greats" excelled in World Series play, although there were three instances of one of them batting only .150 in one series. Still -- would we not regard a .358 lifetime hitter (Folger) as significantly better than one who hit .338 (Dempsey)?

 

 

Part 3. Below average players.

 

I ran two short experiments on below average players. Playing ten men, each with a potential .250 batting average for a career of 2000 at bats, the results showed a variability from .231 to .265. Playing ten men with a potential batting average of .200 for a career of 1000 at bats showed a variability from .180 to .213. However, one of these last players did bat .450 in one “World Series.” Can you picture the excited talk as people refer to “light hitting Joe Doaks” who excelled in one series? Yet “Joe’s” results in that series were just chance.
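How rare is a series like Joe's? The sketch below computes the exact binomial chance. It assumes the series ran 20 at bats, as in Part 2 (this section does not state the length), so a .450 average means 9 hits. The function name is my own.

```python
from math import comb


def prob_at_least(hits, at_bats, hit_prob):
    """Chance of getting `hits` or more hits in `at_bats` independent tries."""
    return sum(
        comb(at_bats, k) * hit_prob**k * (1 - hit_prob) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )


if __name__ == "__main__":
    # A true .200 hitter batting .450 or better (9+ hits) in 20 at bats.
    p = prob_at_least(9, 20, 0.200)
    print(f"Chance in any one series: {p:.4f}")
```

The chance is small for any one player in any one series, but with many players and many series, somebody will do it sooner or later.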

 

Part 4. Team variability

 

I've looked at batting average variability, and argued that chance can account for a wide range of results for any player, regardless of how good he is. Now what about teams?

 

My protocol was as follows: I created a league of eight teams, where each of these teams has the inherent capability, in terms of averages, of the 1948 Cleveland Indians. That team went 97 and 58. How would a season look if all eight teams were balanced -- exactly the same as the 1948 Indians? I differentiated between the teams by color, and ran five complete seasons. Here are the results, which I found somewhat surprising:

 

Season #1

 

Team  Record

 

Pink        87 67

Green       86 68

Red         81 73

Aqua        79 75

Brown       78 76

Blue        72 82

Yellow      67 87

White       66 88

 

The managers of White and Yellow got fired. But their teams were EXACTLY the same as the others.

 

Season #2

 

Team  Record

 

Brown       85 69 Improved from fifth place

Blue        85 69 Improved from sixth place

Aqua        80 74 Improved from fourth place

Green       77 77 Dropped from second place

Pink        75 79 Dropped from first place

Red         75 79 Dropped from third place

Yellow      72 82

White       67 87

 

The manager of Brown was praised to the skies for bringing his team up from 5th to first place. The Blue’s manager also came in for some kudos. But all the teams were the same.

 

Season #3

 

Team  Record

 

Yellow      85 69

Green       81 73

Pink        79 75

Blue        79 75

Brown       75 79

Red         75 79

White       73 81

Aqua        69 85

 

This year all the pundits wrote about Yellow.

 

Season #4

 

Team  Record

 

Green       86 68

Red         83 71

Brown       78 76

Yellow      78 76

Pink        78 76

Aqua        73 81

Blue        72 82

White       68 86

 

Four years at or near the cellar, and the owners of White are getting frustrated.

 

Season #5

 

Team  Record

 

White       83 71

Brown       81 73

Yellow      80 74

Pink        80 74

Red         76 78

Aqua        75 79

Green       72 82

Blue        69 85

 

No -- I didn't "make" White win at last. That's just the way it turned out. There was no difference at all between the eight teams. So the next time the Cubs finish 14 games out, can I say it is just chance that did it? I think not -- but one can say that chance has a role to play.

 

The question may be asked how the last experiment might be replicated. I have the code; it is a variant of a computer baseball exercise written about ten years ago and sold as shareware. I will package a set of files which will allow the preceding experiment to be performed; the main application program is disabled. If anyone wants a copy -- email me privately (BURGY@www.burgy.50megs.com); I'll send them, with instructions as a ZIP file.
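For those who just want a rough feel for the effect without my files, a far cruder stand-in can be sketched in Python. This is NOT the shareware program: it assumes that a game between two identical teams is simply a fair coin flip, and that each of the 28 pairings meets 22 times to make a 154-game schedule. The function name and the `seed` parameter are my own.

```python
import random
from itertools import combinations

TEAMS = ["Pink", "Green", "Red", "Aqua", "Brown", "Blue", "Yellow", "White"]


def simulate_season(teams=TEAMS, games_per_pair=22, seed=None):
    """Play a full schedule between identical teams; return sorted standings."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatable runs
    wins = {team: 0 for team in teams}
    losses = {team: 0 for team in teams}
    for a, b in combinations(teams, 2):
        for _ in range(games_per_pair):
            # Identical teams: each game is treated as a fair coin flip.
            winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
            wins[winner] += 1
            losses[loser] += 1
    standings = [(team, wins[team], losses[team]) for team in teams]
    return sorted(standings, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for team, won, lost in simulate_season():
        print(f"{team:<12}{won:3d} {lost:3d}")
```

Run it a few times and the pennant winner and the cellar dweller change from season to season, even though nothing distinguishes the teams.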

 

Part 5. Players Within Teams

 

The next set of simulation tests can be replicated by anyone who has the PC shareware program SIMBASE. This was written about 1989, and may not be available any longer. The author / address is:

 

Phillip Smith

PMS Software of Canada

109 Tripp Crescent

Nepean, Ontario, Canada K2J 1E2

 

In these tests, I took the 1987 Indians and had them play against each other, first for a season of 154 games; then for a stretch of 600 games, approximating four seasons.

 

I set up the same team of nine players for each game (each player plays the entire game):

 

            1. Julio Franco     1. Julio Franco

            2. Brook Jacoby     2. Brook Jacoby

            3. Joe Carter       3. Joe Carter

            4. Mel Hall         4. Mel Hall

            5. Cory Snyder      5. Cory Snyder

            6. Carmelo Castillo 6. Carmelo Castillo

            7. Eddie Williams   7. Eddie Williams

            8. Junior Noboa     8. Junior Noboa

            9. Tommy Hinzo      9. Tommy Hinzo

 

 

Here are the actual batter statistics for the 1987 Indians against all pitchers in the league. I had Tom Candiotti pitch the simulated games, and as he was somewhat different than the league average that year, the results will have some differences based on his pitching characteristics as well as chance.

                       +-------------------------+

Cleveland Indians   AB  1B  2B  3B  HR   H  BB  SO  OO     BA     SA

 

Julio Franco       495 123  24   3   8 158  60  56 281   .319   .428

Brook Jacoby       540 100  26   4  32 162  78  73 305   .300   .540

Joe Carter         588  94  27   2  32 155  36 105 328   .263   .479

Mel Hall           485  96  21   1  18 136  21  68 281   .280   .439

Cory Snyder        577  76  25   2  33 136  32 166 275   .235   .457

Carmelo Castillo   220  27  17   0  11  55  16  52 113   .250   .477

Eddie Williams     283  55  12   0  15  82  40  56 145   .289   .491

Junior Noboa       511  89  36   5  19 149  40  41 321   .291   .493

Tommy Hinzo        257  53   9   3   3  68  12  49 140   .264   .357

 

After 154 games, these were the results:

 ---------------------------------------------------------------------------

      Visiting Team                 Home Team

   CLEVELAND INDIANS           CLEVELAND INDIANS

 

Runs    Hits   Errors  Wins       Runs    Hits   Errors   Wins

 

 900    1461      161   74     854    1368      195        80

--------------------------------------------------------------------------

And the player statistics for the year:

                       +---------------------------+

 Cleveland Indians AB  1B  2B  3B  HR   H  BB  SO  OO     BA     SA

 (Visitors)

 Julio Franco     649 140  31   1   9 181  93  63 405   .278   .371

 Brook Jacoby     616 107  31   7  53 198 107  55 363   .321   .652

 Joe Carter       664  84  20   2  40 146  47  86 432   .219   .436

 Mel Hall         675 114  26   1  25 166  19  73 436   .245   .398

 Cory Snyder      621  65  37   0  44 146  54 136 339   .235   .507

 Carmelo Castillo 615  89  50   0  36 175  43 102 338   .284   .541

 Eddie Williams   551  86  18   0  35 139  99  84 328   .252   .475

 Junior Noboa     579  90  36  10  25 161  47  37 381   .278   .504

 Tommy Hinzo      576 109  19  10  11 149  32  67 360   .258   .383                   

                        +---------------------------+

 Cleveland Indians AB  1B  2B  3B  HR   H  BB  SO  OO     BA     SA

 (Home)

 Julio Franco     627 129  32   6   8 175  76  48 404   .279   .387

 Brook Jacoby     589  97  31   5  31 164  97  67 358   .278   .505

 Joe Carter       626  83  24   0  48 155  47  85 386   .247   .515

 Mel Hall         628 119  23   2  22 166  33  63 399   .264   .412

 Cory Snyder      594  76  22   2  43 143  50 145 306   .240   .501

 Carmelo Castillo 577  59  39   0  22 120  56 117 340   .207   .389

 Eddie Williams   527 103  15   0  31 149  85  86 292   .282   .487

 Junior Noboa     543 100  36   3  27 166  49  42 335   .305   .532

 Tommy Hinzo      541  99  15  10   6 130  34  74 337   .240   .338

 

Franco hit .278 and .279 -- pretty close.   difference  .001

But Jacoby hit .321 and .278.               difference -.043

Carter hit .219 and .247                    difference  .028

Hall hit .245 and .264                      difference  .019

Snyder hit .235 and .240                    difference  .005

Castillo hit .284 and .207!                 difference -.077

Williams hit .252 and .282                  difference  .030

Noboa hit .278 and .305                     difference  .027

Hinzo hit .258 and .240                     difference -.018

 

Since I don't know the design of SIMBASE, I don't know if there is a home team / visiting team bias built in. There might be. But that bias is not likely to explain the differences shown above. Interested people can easily compare the other statistics. Castillo's statistics alone sort of boggle the mind. One guy we'd be giving a bonus to -- the other is likely out of a job. Yet both are the same player, with the same capabilities, playing on the same team.

 

I wanted to look at possible home team bias, so I ran two tests of 600 games each, the equivalent of about four seasons each. In test 1, the home team won, 301 to 299. The widest variance I found in the batters was Williams, who batted .292 as a member of the visiting team and .264 as a member of the home team. All the other variances were, however, in single digits. In test 2, the visitors prevailed, 310 to 290. Batting variances were in a range of 1 to 16 points, most in double digits.

 

This seemed to indicate no home team bias, but not being convinced, I ran 20 more series of 600 games. Here are the results (including the tests above):

 

Test  Home  Visitors    Home win %

                 

1     301   299   50.2%

2     290   310   48.3%

3     292   308   48.7%

4     301   299   50.2%

5     306   294   51.0%

6     336   264   56.0%

7     318   282   53.0%

8     313   287   52.2%

9     302   298   50.3%

10    330   270   55.0%

11    327   273   54.5%

12    289   311   48.2%

13    300   300   50.0%

14    301   299   50.2%

15    309   291   51.5%

16    305   295   50.8%

17    300   300   50.0%

18    308   292   51.3%

19    297   303   49.5%

20    301   299   50.2%

21    289   311   48.2%

22    305   295   50.8%

                 

Totals      6720  6480  50.9%

 

This suggests to me that the "no home team bias" assumption might not be true. However, since the generally accepted notion of home team advantage is pretty well understood to be larger than that measured above (51%), it does appear that if this simulator has one, it is lower than the accepted rates.
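One way to put a number on that suspicion is a simple normal-approximation check of the totals. The sketch below assumes the 13,200 games are independent and asks how far 6,720 home wins sits from the 6,600 expected with no bias. The function names are my own, and this is a rough check rather than a full analysis.

```python
import math


def home_bias_z(home_wins, games):
    """Standard deviations by which home wins exceed a fair 50/50 split."""
    expected = games / 2
    sd = math.sqrt(games * 0.25)  # binomial sd with p = 0.5
    return (home_wins - expected) / sd


def two_sided_p(z):
    """Two-sided tail probability of a standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    z = home_bias_z(6720, 13200)
    print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

A result a little over two standard deviations from even is suggestive of a small built-in home edge, but it is hardly overwhelming.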

 

Part 6. The value of a superstar

 

Back in 1960, I built a computer simulator for the IBM 1620 computer. One of the issues I was concerned with was the relative worth of a "super-slugger." One of the tests I ran then was to create two teams equal in every way overall, but with one having every player of equal capability and the other having eight players of lesser capability with a super-slugger batting fourth. The question I had was -- how much better would the second team be than the first? That was 40 years ago. My notes indicate that overall I saw the balanced team win more frequently than the unbalanced one -- which would argue against spending one's salary dollars on a single star. But I did not keep the results, and so those tests are history, useful only for the question they pose.

 

In this test, I used SIMBASE again, creating two teams, exactly equal, except one had nine players with:

 

a .322 batting average (186 for 578),

a .425 slugging average, 5 homers, 8 triples, and 29 doubles,

 

and the other had eight players with

a .299 batting average (173 for 578),

a .380 slugging average, 3 homers, 6 triples, and 26 doubles,

 

and one player, batting fourth, with

a .501 batting average (290 for 578),

a .785 slugging average, 21 homers, 24 triples, and 53 doubles.

 

The two teams were thus the same statistically, in total. I played 10 sets of 600 games each between them. They wound up dead even: 3000 wins each. That shot down my thesis. Darn! When an owner picks a super-player, of course, he also looks for the intangibles -- how much inspiration he might be to other players -- how many extra fans he will bring in, and so forth. I know that it was Feller on the Indians of the 30s to 50s that brought me out to the ballpark. Over that period of time his pitching probably accounted for at least a couple dozen "extra" games for our family.
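The claim that the two lineups add up to the same team can be checked with a little arithmetic. The sketch below uses only the figures listed above; the class and function names are my own, and total bases are counted in the usual way (one per hit, plus one extra per double, two per triple and three per homer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    at_bats: int
    hits: int
    doubles: int
    triples: int
    homers: int

    def total_bases(self):
        # Every hit is worth one base; extra-base hits add the remainder.
        return self.hits + self.doubles + 2 * self.triples + 3 * self.homers


def team_totals(lines):
    """Return (at bats, hits, total bases) summed over a lineup."""
    return (
        sum(p.at_bats for p in lines),
        sum(p.hits for p in lines),
        sum(p.total_bases() for p in lines),
    )


balanced = [Line(578, 186, 29, 8, 5)] * 9
slugger = [Line(578, 173, 26, 6, 3)] * 8 + [Line(578, 290, 53, 24, 21)]

if __name__ == "__main__":
    for name, lineup in (("Balanced", balanced), ("Slugger", slugger)):
        at_bats, hits, bases = team_totals(lineup)
        print(f"{name:<9} BA {hits / at_bats:.3f}  SA {bases / at_bats:.3f}")
```

Both lineups come out with identical team totals, so any difference in wins between them would have to come from how the hits are distributed among the batters, not from how many there are.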

 

I've made the argument that chance plays a large part in baseball -- and that its influence on the outcome of games as well as the resulting statistics is often overlooked by some of us: fans, SABRites, writers and broadcasters. That does not diminish, in my judgment, either the inherent worth or the enjoyment of statistics. My thesis simply enjoins us to take them for what they are worth, imperfect measures of imperfect players made by imperfect people, some better than others, all talented far beyond the average person, who have given us over a hundred years of great enjoyment and will continue to do so for years to come. As a Christian, I fully expect to see many sports played in heaven. Baseball will be prominent among them. What delights we shall still see. Ruth batting against Feller? Barry Bonds against Addie Joss? Hank Aaron against Nolan Ryan? What joy. What bliss. Maybe they will even let me play!

 

John Burgeson, February, 2001

 
