Patterns vs regression in a small sample size

scdoggy

Moderator (Honorary)
I wanted to get a discussion going in here with the crew on how big a sample size everyone considers when using various pieces of data when capping. I see a lot of threads where people make inferences on batter vs pitcher, umpire trends, day/night splits, days of the week, etc, etc based on what I consider a relatively small sample size. Making sweeping assumptions based on small samples will come back to bite you in the ass. Examples:

Batter x is 8 for 12 lifetime off pitcher Y - seems way too small of a sample to take anything from. It means the batter has had success vs the pitcher, but regression says that they are just as likely to go 0 for 4 and swing back towards the mean.
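To put a rough number on the regression argument, here is a minimal sketch of shrinking a small matchup line toward a hitter's baseline. The baseline average (.270) and the weight given to it (300 at-bats) are assumptions picked for illustration, not figures from this thread, and `shrunk_average` is a hypothetical helper name.

```python
def shrunk_average(hits, at_bats, baseline_avg, prior_at_bats):
    """Pull a small-sample batting line toward a baseline average."""
    # Treat the baseline as `prior_at_bats` extra at-bats hit at the baseline rate.
    prior_hits = baseline_avg * prior_at_bats
    return (hits + prior_hits) / (at_bats + prior_at_bats)

BASELINE_AVG = 0.270   # assumed overall average for Batter X (hypothetical)
PRIOR_AT_BATS = 300    # assumed weight given to the baseline (hypothetical)

raw = 8 / 12
estimate = shrunk_average(8, 12, BASELINE_AVG, PRIOR_AT_BATS)
print(f"raw matchup average: {raw:.3f}")
print(f"shrunk estimate:     {estimate:.3f}")
```

With these assumed inputs the estimate lands around .285, much closer to the baseline than to .667. A heavier or lighter prior moves that number, which is really the whole debate in this thread.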

Same with umpires - looking at pitcher X's performance vs. Umpire Y - and I see this one a lot. "Pitcher X is 3-0 lifetime with a 2.20 ERA in three starts with umpire Y behind the dish." This means zero to me. Anything under a sample size of at least 10 starts is really hard for me to draw any meaning from. The pitcher could have been a flyball guy with the wind gusting in from center in 2 of the three starts; any number of variables can diminish the value of a sample this small.

I realize that umps like Eddings, Vanover, Reynolds, Schrieber, and others have large samples to work with and we can look at years of data when making decisions about their tendencies. But even this info is factored into the game by the linesmakers unless you catch the ump by stalking the first game with a new crew. Otherwise, oddsmakers know who is behind the dish and have all the same data we do when making the line.

How much info do you guys need when evaluating a sample size to determine whether the info has relevance when capping a specific game?
 
Very tough to quantify patterns and sample sizes with a blanket statement. If a batter is 8 for 12 off Pitcher Y, I don't consider that a small sample size, but if a batter is 1 for 12 I might. I don't consider it completely luck or random if a player has such great success. When thinking about a hitter/pitcher matchup, if a guy is 8 for 12, he obviously sees that pitcher well and is not off balance or dominated by the pitcher. He may just flat out see that pitcher well and be getting good cuts, and although some regression may be expected, it just may be a great matchup where he is going to hit that pitcher better than he hits anyone else. I think each situation is very subjective. Umpires are very dangerous from year to year, as sometimes they change styles or become aware of their styles. One stat that is very underrated is umpires' tendencies towards home and away teams. I think that could be deeply psychological, yet not something that an umpire is cognizant of. Why does one umpire show such a strong record towards the road team and another have such a strong record toward the home team? A lot of factors go into that, but I am willing to say that the umpire favoring the road team has a strong and tough personality and is not affected at all by the home crowd or adversity. So the amount of information needed varies with each individual situation I am looking at. It is always factored into my thinking; it's just how strong a factor that I tailor to the situation, based upon how comfortable I am with the information. And after writing this, I have probably given you no more clarity than you had before. LOL
 
I assume that relative sample size is smaller when comparing so and so hitter vs. a pitcher as opposed to a full season, if that makes sense. I typically have in my head what I'm going to play and then check the pitcher vs. hitter matchups. Most of the time I still play, but occasionally I will back out of a play. Very few times will I look at a small sample (<100 PAs) vs a team and base a play on it, unless the numbers are staggering and I'm getting plus money.
 
I think when you use it with a SSS you need to look at the underlying things

on a matchup with 10-12 at bats

I would look at strike outs, walks, extra base hits (see if he is getting good contact), how the pitcher fared the rest of the day

e.g., if player A is 4 for 12 against Pitcher B, but on one day he went 4-for-4 and Pitcher B got shelled for 13 hits, the pitcher might have just not had his stuff that day
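A quick way to run that check is to see how the matchup line looks once the single biggest game is set aside. This is only a sketch: the game-by-game split below is invented to match the 4-for-12 example (one 4-for-4 day plus three hitless games), and the helper names are hypothetical.

```python
# Hypothetical game log for Player A vs Pitcher B: (hits, at_bats) per game.
games = [(4, 4), (0, 3), (0, 3), (0, 2)]

def batting_line(game_log):
    """Total hits and at-bats across a list of (hits, at_bats) games."""
    hits = sum(h for h, _ in game_log)
    at_bats = sum(ab for _, ab in game_log)
    return hits, at_bats

def without_best_game(game_log):
    """Drop the single game with the most hits and return the rest."""
    best = max(game_log, key=lambda g: g[0])
    rest = list(game_log)
    rest.remove(best)
    return rest

full = batting_line(games)
trimmed = batting_line(without_best_game(games))
print(f"full line:         {full[0]}-for-{full[1]}")
print(f"minus best game:   {trimmed[0]}-for-{trimmed[1]}")
```

If the line collapses once one game is removed, the matchup number is mostly telling you about that one day.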
 
Ok, so pitcher vs. hitter is something we all look at, and because of a generally small-ish sample size (unless we've got two vets from the same league), we sort of all agree that we value this info a little bit, but like anything else it is all part of a bigger equation. I'm much more of a "feel capper" for lack of a better term. I watch a ton of baseball and make a lot of my plays on gut feels for how the team is playing. I go into the day's work knowing what teams or totals are likely to interest me and making prices on them. If the prices show value, then I go do all the research and look at weather, umps, head to head, splits, bullpen numbers, etc etc. Sometimes in the course of my research I'll see things that put me on other plays (wind blowing out at Wrigley at 20 mph, etc). Like Dollaz said above - if I see certain numbers that work against me, the size of the play (or perhaps wagering on the game at all) is compromised.

Agree with GH saying we've got to be careful about umps. The guys around here who do the daily ump threads and the stalking, and are kind enough to post numbers for the rest of us, do a ton of work on the umpires and there is a lot of info available to us. I personally do look at umps, but it doesn't weigh into the equation very much for me. There are some umps that I will think twice about playing totals against if I'm on the other side of them, because they've been around for a while and just do or don't give certain pitches. But as I said in an earlier post, some umps don't have the sample size to work from that the vets do, and many of them are just on a run of overs/unders that will be relatively short lived and even out in the long haul. Plus, like GH said - some of them change their stripes and start overcompensating for things at a certain point. Who knows, maybe the league gets on them for being too generous/stingy. I guess what I'm saying here is be careful what you do with this information, and remember that unless we're catching a crew in game one of a series, all the ump info is out there for the linesmakers and your over/under guy has been factored into the line already.

Funny you mentioned the Home/away stuff. One of the things that I always wondered is whether it would be worthwhile to know where these guys grew up. Certainly a kid who was raised in Saint Lou might not give the corners to the Cubs' pitchers....
 
I think this is one of those debates that could go on forever. Every sample size is irrelevant IMO to the current situation. Stats are reflections of past performances, but the saying is that the only constant is change. So just about every at-bat is in a different situation. A guy can have 20 at-bats vs a pitcher with maybe 1 game in the day at home, 1 game at night at home, and then 1 each on the road... 4 different situations, possibly spread over 2-3 seasons... what has changed with a hitter or a pitcher over 3 years?

I think sticking to the basics is best when it comes to baseball. I think at times we forget that it's a simple decision of Team A vs Team B, especially today when we rarely see SPs, even elite ones, in the 8th and 9th.

I love stats just as much as anyone, but I think the most important stats are probably the last 10 days and vs divisional opponents. Division teams play each other 18-19 times a season, so they see each other fairly regularly. Last 10 days or 7 days because it's current form.

Stats are a representation of performance, but so are wins and losses. Some teams don't know how to win, but stats may show something slightly different. However, what good team has bad stats? And vice versa? Also, things will regress, but to what degree? Sometimes when a team is playing over their head the correction is even more extreme, but when a guy is, say, 8/20 off a pitcher, is 2/8 regression? Some of it depends on what those 2 hits were.

I think it's all just a small part of the equation. Umpires play a role naturally, but what gets missed IMO is how they change outcomes. Show me an ump that has an under bias when he is calling games for the worst SPs in baseball and he gets 4-3 games... Umps are like anything else, constantly changing. No ump can call a game the same exact way even twice in a row -- it's called being human. How can you predict how many borderline pitches are going to be in a game? Perfect example: Tom Glavine - he had consistent results for years and it was widely known he got an expanded strike zone. The thing is he had a different ump every start!! Sure you can find some umps he struggled with, but that doesn't mean he was going to lose.

And so on and so on...
 
Umps have a bigger effect on MLB than refs do on NBA (unless the ref is crooked), especially since NBA decided this year to put under refs with over refs in a crew or homer refs with away refs.

As for MLB, if you play totals, you have to know who the umpire is for 75% of your plays; the other 25% are spot or situational (teams off a shutout, teams getting swept, teams with a struggling bullpen, etc). Take for example... Ump Emmel last year... he was 27-6 to the over, I think like 21-1 to start before he finally started leveling off. It's not coincidence that his games went over; he forces pitchers to throw hittable strikes and his games sailed over, repeatedly. This year, he started off 2-0 to the under, so he appears to have made some adjustments, and looking at last year hasn't helped this year.

So what do I mean about the 75%? Let's say you have two struggling pitchers, but you have an ump trending to under. The ump will make it easier for the struggling pitchers to throw strikes, they go deeper in the game with the score 3-1 through 6 (their WHIP of 1.6 suddenly today becomes 0.60), then you have the good bullpen left for each, and the game ends 4-1. However, you get those two struggling pitchers with an ump trending to over... you are more likely to see a 6-2 game through 3 1/3 innings and the game sailing over, as the bullpens have to get involved earlier, which means the long relievers, the managers choose not to use their best relievers, and the game ends up 8-5. It's all because the ump to a large degree determines how well the pitchers pitch each day.

Also, on the flipside, you get two great pitchers on getaway day (e.g., Rays at Rangers yesterday: the Rays pitcher was really good facing hot bats and cooled them off, while the cold Rays bats faced a pretty good pitcher who kept them in check; Rays won 2-0) and an ump trending to under, the starters go deep with an ump giving them the corners, and the game sails under. You can get an over with two good pitchers, but that's usually more because of the batter/pitcher/team matchups than because the ump was neutral or over.

Umps that trend to away/dog or home/fav are definitely important to look at because they can give you great value on dogs. It's not something to rely on, but if you are looking for a reason to take a dog, it's going to give you at least some value.

As for how many PAs... I think 6 or fewer is not enough to go on, but anything more than that is a trend, especially if the whole lineup sees the pitcher well. Play2win is pretty good at discerning when a pitcher struggles vs a team. As for umps, you only need 3 or 4 starts to see whether the ump gets along with the pitcher or not. I understand that we are dealing with small sample sizes, but in most games you aren't going to see many people with 20 ABs or umps who umped the same pitcher 7 times. A lot of baseball is random, but the trends on small sample sizes when the price is right have borne out pretty well.
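For a feel of how much uncertainty sits behind lines of this size, here is a rough sketch using the Wilson score interval for a proportion. It treats every at-bat as an independent trial with the same hit probability, which is a simplifying assumption, and the lines other than 8-for-12 are made up to cover the sample sizes mentioned in the thread (6, 20 and 100).

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a success rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials
                    + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# (hits, at_bats): 8-for-12 is from the thread, the others are hypothetical.
for hits, at_bats in [(3, 6), (8, 12), (10, 20), (50, 100)]:
    low, high = wilson_interval(hits, at_bats)
    print(f"{hits}-for-{at_bats}: plausible true rate {low:.3f} to {high:.3f}")
```

The interval only narrows slowly as at-bats pile up, so a line built on a dozen at-bats is consistent with a very wide range of true matchup ability.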
 
That's why I profile and look for patterns -- not to mention you can spend hours sifting thru stats for one game, let alone 15... it's all about making the best decision / conclusion in the shortest amount of one's time. I just think having a general knowledge of stats is best... teams hit LH or RH well or not, day vs night, home vs away...

Guess I disagree with umps being important. To me they aren't. How many guys even look at the totals of the past games umps called? Something I have been saying for 10 years now... LOL... So Pedro is pitching vs the Big Unit and the total is 6.5 and it's a 2-1 game, that's because of the ump??

Watch the Indians - Yanks game on Monday -- watch Mike Aviles' at-bat in the bottom of the 9th, I think; I believe he was a PH. It was a blowout. The ump called 2 terrible strikes on him and then he crushes a HR. The ump effectively took the bat out of his hands and I was like wow -- then boom, he crushed a HR. You can't predict what an ump's role will be that day.
 
Good contributions guys. What Nut alludes to at the end of his post is something I agree with a ton. Despite umpires' tendencies, their effect on games going over/under is minimal in my opinion. Others will argue that pitchers throw differently when certain umps are behind the plate, and sure - this is true - but whether they'll get the corner or the high strike one night and not the next because of who is behind the plate is known to the hitter as well. Think about this - to look at games where umps actually played a serious hand in going over/under you would throw out all the games that went way over or way under. Right off the bat you get rid of about 60% of the games. Then you look at the ones that fell on, or within a run or two of, the total and you decide how many were because of tight or loose zones. It's going to be a small number.

Playball - go back and look at Emmel's year in 2012. Despite all those overs, his strike percentage last year was just shy of 63%. That's a really middle-of-the-road number that isn't indicative of a guy who is determining the outcomes of the games. Many of those overs FLEW over the number. It wouldn't have mattered who the ump was, the game was going over no matter what. This is exactly the type of situation I was thinking about with this thread. Emmel's run was one for the ages because it lasted almost the whole season, but it really was a statistical anomaly.
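One way to gauge how unusual a 27-6 over record is: compute the chance of a run at least that lopsided if every game were a 50/50 coin flip, then ask how likely it is that at least one ump in a season produces such a run in either direction. This is a sketch under loud assumptions: overs and unders are treated as independent coin flips, pushes are ignored, every ump is given the same 33 decided games, and the count of 70 plate umpires is a placeholder, not a figure from the thread.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Exact binomial tail: chance of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

overs, unders = 27, 6           # Emmel's 2012 record as quoted in this thread
n_games = overs + unders

# Chance one coin-flip ump goes 27-6 or better to the over.
p_one_ump = prob_at_least(overs, n_games)

N_UMPS = 70                     # assumed number of plate umps (hypothetical)
# Chance at least one of them has a run this lopsided, over OR under.
p_any_ump = 1 - (1 - 2 * p_one_ump) ** N_UMPS

print(f"one ump, 27+ overs in {n_games}: {p_one_ump:.6f}")
print(f"any of {N_UMPS} umps, either side:   {p_any_ump:.4f}")
```

The second number is the one to look at when a record gets spotted after the fact, since nobody was tracking Emmel specifically before the run started.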
 
That's part of my point with umps. The total, the teams, and the SPs are crucial in looking at umpire outcomes, but few if any look at them. The total is the expectation. The teams paint a picture of the type of lineups and parks involved. The SPs tell you the quality of pitching. How can an ump be labeled as over or under when the totals are not consistent?? Anywhere from 7 to 10.

So what's odd about a 7-5 game on a 10 total? Odd to me is 2-1 in that game. At the same time I can understand it doesn't hurt to create a profile for an ump. Again, it's so time consuming. Personally I think not worrying so much about factors has made me better. Factors being wind, umps, things like that. Today it's all out there - it's going to be at least mildly priced in by the time a game starts. Is the wind ever blowing out and we have a 7.5 total? Maybe out west with the new dimensions, but all these factors tend to be priced in.
 
There is definitely a point when you can drive yourself mad looking at numbers. I don't have as much time as I once did to spend capping this stuff anyway - so my style of capping fits me better: watch games, have an idea what you want to play going into the day, do a bit of research, make prices, compare... The wind is one thing I always do pay attention to. Bullpens and weather are probably the things I look at first when examining a potential play.
 
Emmel in 2012
4/7 - LAD (Capuano) at SD (Moseley) - 5-5 after 5, game ended 6-5 - Total was 7
4/9-4/11 - was in HOU (hosting ATL)
4/13 - CLE (D. Lowe) at KC (Hochevar) - 7-1 after 1, game ended 8-3 - Total was 8
4/16-4/19 - his crew was in CWS (hosting BAL)
4/20-4/22 - his crew was in Cubs (hosting CIN)
4/25 - WAS (Zimmerman) at SD (Wieland) - 2-1 after 5, WAS scored 4 in 7th, game ended 7-2 - Total was 6.5
4/29 - Mets (Johan Santana) at COL (Moyer) - 3-0 after 1, 4-0 after 5, COL scored 4 in 8th to tie at 4-4, then extras saw Mets score 2, COL score 1 - game ended 6-5, Total was 9.5 (took 3 runs in extras to go over)
5/1-5/3 - his crew was in SF (hosting MIA)
5/4 - STL (Lohse) at HOU (Harrell) - 5-4 after 3, ended 5-4 - Total was 8.5
5/8 - TEX (Feliz) at BAL (Arrieta) - 5-0 after 3, game ended 10-3 - Total was 8.5
5/11-5/13 - his crew was in Yanks (hosting SEA)
5/14-5/15 - his crew was in Mets (hosting MIL)
5/16 - CIN (Leake) at Mets (Johan Santana) - 3-1 after 6, CIN scored 4 in 8th - game ended 6-3 - Total was 7.5
5/20 - BOS (Beckett) at PHI (C. Lee) - 5-0 after 3, ended 5-1 - Total was 7 (UNDER)
5/22-5/24 - his crew was in CLE (hosting DET)
5/25 - COL (Friedrich) at CIN (Cueto) - 5-3 after 5, ended 6-3 with a run in 9th, Total was 8.5
5/28 to 6/3 - Crew was on Vacation
6/5 - TEX (Holland) at OAK (Blackley) - 5-2 after, ended 6-3, Total was 7.5
6/9 - Cubs (Samardzija) at MIN (Diamond) - 8-0 after 4, ended 11-3 - Total was 8

I could keep going...but you can see most of the time, the starters blew up, sometimes it was the bullpen.
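The games listed above can be run through the "way over vs. close to the total" split suggested earlier in the thread. The scores and totals below are copied from this post; the 2-run cutoff for a "close" game is an assumption based on the "within a run or two" wording, and the variable names are just for illustration.

```python
# (date, runs_team_1, runs_team_2, closing_total) from the Emmel 2012 list above.
games = [
    ("4/7", 6, 5, 7.0), ("4/13", 8, 3, 8.0), ("4/25", 7, 2, 6.5),
    ("4/29", 6, 5, 9.5), ("5/4", 5, 4, 8.5), ("5/8", 10, 3, 8.5),
    ("5/16", 6, 3, 7.5), ("5/20", 5, 1, 7.0), ("5/25", 6, 3, 8.5),
    ("6/5", 6, 3, 7.5), ("6/9", 11, 3, 8.0),
]

CLOSE_CUTOFF = 2.0  # assumed: a game within 2 runs of the total counts as close

# Margin = runs scored minus the total; positive means the game went over.
margins = [(date, a + b - total) for date, a, b, total in games]

overs = [d for d, m in margins if m > 0]
close = [d for d, m in margins if abs(m) <= CLOSE_CUTOFF]
blowouts = [d for d, m in margins if abs(m) > CLOSE_CUTOFF]

print(f"overs: {len(overs)} of {len(games)}")
print(f"within {CLOSE_CUTOFF} runs of the total: {close}")
print(f"missed the total by more than {CLOSE_CUTOFF}: {blowouts}")
```

On this list, 10 of the 11 games went over, 6 landed within two runs of the total, and 5 missed it by more than two. Only the close group is where a tight or loose zone could plausibly have tipped the result.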
 
Play ball - I'm missing your point on that last post. How did Emmel's zone have anything to do with those games based on the notes above?
 
Current year umpire over/under is meaningless. Pitcher vs current ump in 5 games is meaningless. Small sample hitter-pitcher stuff has been shown to be pretty meaningless too. The exception would be if you have a platoon reason that backs it up. Then you would probably want to see if the hitter struggles against other similar pitchers. If you want to judge pitcher vs ump, check him vs all the under or over umps imo
 