scdoggy
Moderator (Honorary)
I wanted to get a discussion going in here with the crew on how big a sample everyone needs before trusting a piece of data when capping. I see a lot of threads where people draw inferences from batter vs. pitcher matchups, umpire trends, day/night splits, days of the week, etc., based on what I consider relatively small samples. Making sweeping assumptions based on small samples will come back to bite you in the ass. Examples:
Batter X is 8 for 12 lifetime off pitcher Y. That seems way too small a sample to take anything from. It means the batter has had success vs. the pitcher so far, but regression to the mean says his next few at-bats are more likely to look like his usual numbers than like .667, and an 0 for 4 would be no surprise.
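To put a rough number on that regression, here is a minimal sketch of one common way to shrink a small sample toward the mean. The function name `shrunk_average` is made up for illustration, and the prior values (a .250 baseline weighted like 300 at-bats) are assumptions picked for the example, not fitted to real data. Plug in whatever baseline you trust.

```python
def shrunk_average(hits, at_bats, prior_mean=0.250, prior_at_bats=300):
    """Blend an observed batting average with a baseline average.

    prior_mean and prior_at_bats are illustrative assumptions: the
    baseline is treated as if it were prior_at_bats extra at-bats
    hit at prior_mean. The bigger the real sample, the less the
    baseline matters.
    """
    if at_bats < 0 or hits < 0 or hits > at_bats:
        raise ValueError("need 0 <= hits <= at_bats")
    return (hits + prior_mean * prior_at_bats) / (at_bats + prior_at_bats)


raw = 8 / 12
estimate = shrunk_average(8, 12)
print(f"raw: {raw:.3f}  shrunk: {estimate:.3f}")
```

Under those assumed numbers, the 8 for 12 moves the estimate from .250 to only about .266, which is the point: twelve at-bats barely move the needle.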
Same with umpires, looking at pitcher X's performance with umpire Y, and I see this one a lot: "Pitcher X is 3-0 lifetime with a 2.20 ERA in three starts with umpire Y behind the dish." This means zero to me. Anything under 10 starts is really hard for me to draw any meaning from. The pitcher could have been a flyball guy with the wind gusting in from center in two of the three starts. Any number of variables can diminish the value of a sample this small.
I realize that umps like Eddings, Vanover, Reynolds, Schrieber, and others have large samples to work with, and we can look at years of data when making decisions about their tendencies. But even this info is already factored into the line by the linemakers, unless you catch the ump by stalking the first game with a new crew. Otherwise, oddsmakers know who is behind the dish and have all the same data we do when making the line.
How big a sample do you guys need before you consider a piece of info relevant when capping a specific game?