Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which contradicts the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence against this belief and concluded that successive shots are independent events (http://psych.cornell.edu/sites/default/files/Gilo.Vallone.Tversky.pdf). The paper started a controversy that continues to this day, as you can see by Googling "hot hand basketball".
Our investigation will focus on Kobe Bryant’s performance with the Los Angeles Lakers in the 2009 NBA finals against the Orlando Magic, which earned him the title of Most Valuable Player. Many spectators commented that he appeared to have a hot hand. Let’s load some data from those games and look at the data structure with str.
What was his typical streak length? How long was his longest streak of baskets?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.7632 1.0000 4.0000
## Kobe's Streaks Percentage
## 0 1 2 3 4
## 0.51315789 0.31578947 0.07894737 0.07894737 0.01315789
Kobe’s typical streak length is 0, and his longest streak is 4 baskets.
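The definition of calc_streak is not shown here, so the following is only a minimal sketch of how streak lengths could be computed. It assumes that a streak is the number of consecutive hits before each miss (and at the end of the sequence). The name calc_streak_sketch and the toy shot vector are hypothetical.

```r
# Hypothetical stand-in for calc_streak (not the course's own definition).
# A streak is the run of hits ("H") that precedes each miss ("M");
# a miss that directly follows another miss counts as a streak of length 0.
calc_streak_sketch <- function(x) {
  hit <- c(0, as.integer(x == "H"), 0)  # pad with a miss on both ends
  miss_pos <- which(hit == 0)           # positions of all misses
  diff(miss_pos) - 1                    # number of hits between consecutive misses
}

toy_shots <- c("H", "M", "M", "H", "H", "M", "M", "M")  # illustrative data only
toy_streaks <- calc_streak_sketch(toy_shots)

print(toy_streaks)                     # streak lengths: 1 0 2 0 0 0
print(summary(toy_streaks))            # typical streak length
print(prop.table(table(toy_streaks)))  # share of streaks of each length
```

Under this definition the number of streaks is the number of misses plus one, which agrees with the figures reported in this document (75 misses, and a mean streak length of 58/76, about 0.7632).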
So Kobe had some long shooting streaks, but are they long enough to support the belief that he had hot hands? What can we compare them to? Consider the idea of statistical independence. A shooter with a hot hand will have shots that are not independent of one another. Specifically, if the shooter makes their first shot, the hot hand model says they will have a higher probability of making their second shot.
During Kobe’s career, the percentage of time he makes a basket (i.e., his shooting percentage) is about 45%, or equivalently \[ P(\text{shot 1 = H}) = 0.45. \] If hot hands are real, so that his shots are not independent, then after Kobe makes the first shot the probability that he makes his second shot would go up to, let’s say, 60%, \[ P(\text{shot 2 = H} \, | \, \text{shot 1 = H}) = 0.60. \] Because of these increased probabilities, you’d expect Kobe to have longer streaks. If, on the other hand, hot hands are just a myth and each shot is independent of the next, then when Kobe hits his first shot the probability that he makes the second is still 0.45, \[ P(\text{shot 2 = H} \, | \, \text{shot 1 = H}) = 0.45. \]
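As a short worked example of why the two models imply different streak lengths, the probability of opening with two hits follows from the multiplication rule. The 0.60 used for the hot hand case is only the illustrative value assumed above, not an estimate. \[ P(\text{shot 1 = H} \cap \text{shot 2 = H}) = P(\text{shot 1 = H}) \, P(\text{shot 2 = H} \, | \, \text{shot 1 = H}) \] Under independence this gives \[ 0.45 \times 0.45 = 0.2025, \] while under the assumed hot hand model it gives \[ 0.45 \times 0.60 = 0.27, \] so streaks of two or more hits would be noticeably more common for a shooter with hot hands.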
Now, having expressed the problem in this way we may assess if Kobe’s shooting streaks are long enough to indicate that he has hot hands. Here are two possible ways:
2a) calculating the conditional probabilities and
2b) comparing Kobe’s streak lengths to someone without hot hands (a simulated independent shooter).
First calculate the total percentage of shots that resulted in a basket in the 2009 NBA finals, as \[\frac{\#\text{ hits}}{\text{total shots}}\]
## [1] "43.61 %"
Kobe’s shooting percentage across these 2009 NBA finals games was about 43.61%.
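A minimal sketch of this calculation is shown below. It rebuilds a hit/miss vector from the counts reported later in this document (58 hits and 75 misses) rather than loading the actual dataset. The vector name basket is an assumption, and the order of the shots is arbitrary, which does not affect the percentage.

```r
# Hit/miss counts taken from the tables in this document (58 H, 75 M).
# `basket` is a hypothetical name; the shot order here is not the real one.
basket <- rep(c("H", "M"), times = c(58, 75))

# Share of all shots that were hits, expressed as a percentage
hit_pct <- 100 * sum(basket == "H") / length(basket)
print(paste(round(hit_pct, 2), "%"))  # "43.61 %"
```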
Next we need to keep only the streaks whose first shot was a hit; by doing this we condition the data as the conditional statement requires. Since the streaks in which Kobe made the first shot have two shots or more, use the variable “shot.num” in the dataset to calculate \[P(\text{shot 2 = H} \, | \, \text{shot 1 = H})=\frac{\# (\text{shot 2 = H} \cap \text{shot 1 = H})}{ \# (\text{shot 2 = H} \cup \text{shot 2 = M})}=\frac{\# \text{ shot 2 = H}}{ \# \text{ shot 2}}\] by identifying the observations corresponding to second shots (i.e., those with “shot.num==2”). In the numerator both shots must be hits; the denominator counts every second shot, whether it was a hit or a miss.
## [1] "36.11 %"
Given that Kobe made the first basket (in a potential streak) there was about a 36.11% chance that he would also make the second.
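The conditioning step can be sketched on a toy sequence. How “shot.num” is built in the dataset is not shown, so this sketch assumes it records the position of each shot within its streak attempt and resets to 1 after every miss. The data and the loop are illustrative only.

```r
# Toy hit/miss sequence (illustrative, not Kobe's data)
basket <- c("H", "H", "M", "M", "H", "M", "H", "H", "H", "M")

# Assumed meaning of shot.num: position of the shot within the current
# streak attempt, restarting at 1 after each miss.
shot.num <- integer(length(basket))
pos <- 0L
for (i in seq_along(basket)) {
  pos <- pos + 1L
  shot.num[i] <- pos
  if (basket[i] == "M") pos <- 0L  # a miss ends the attempt
}

# Second shots exist only when the first shot of the attempt was a hit,
# so restricting to shot.num == 2 conditions on "shot 1 = H".
second_shots <- basket[shot.num == 2]
p_cond <- mean(second_shots == "H")  # share of second shots that were hits
print(shot.num)
print(p_cond)
```

In this toy sequence there are three second shots, two of which are hits, so the estimated conditional probability is 2/3.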
Is there evidence to think that Kobe has hot hands? How reliable is this conclusion? Provide an objective argument to justify your answer.
The second alternative is to compare Kobe’s streak lengths to the streak lengths of shooters without hot hands, or in other words to independent shooters. We don’t have any data from shooters we know to have independent shots, but this type of data is very easy to simulate in R. In a simulation, you set the ground rules of a random process and then the computer uses random numbers to generate an outcome that adheres to those rules. To simulate a single shot from an independent shooter with a shooting percentage of 50% we can use the code below (switch the chunk option eval=FALSE to eval=TRUE so that this chunk is evaluated).
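The original chunk is not reproduced in this text. A minimal sketch of what such a simulation could look like, using base R's sample function and the hypothetical names outcomes and one_shot, is:

```r
# Possible outcomes of a single shot: hit ("H") or miss ("M")
outcomes <- c("H", "M")

# Draw one outcome; with no prob argument both outcomes are equally
# likely, which corresponds to a 50% shooting percentage.
one_shot <- sample(outcomes, size = 1, replace = TRUE)
print(one_shot)
```

A different shooting percentage can be set through the prob argument of sample, for example prob = c(0.45, 0.55).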
Keep in mind that to make a valid comparison between Kobe and our simulated independent shooter, we need to align both their shooting percentage and the number of attempted shots.
Simulate 133 shots from an independent shooter comparable to Kobe and, using calc_streak, compute the streak lengths of this independent shooter.
Use the R functions table and quantile to compare the streak length distribution of Kobe with that of this independent shooter.
## Sample's Hit's and Misses
## H M
## 58 75
## Kobe's Hit's and Misses
## H M
## 58 75
## Sample's Hit and Miss Percentages
## H M
## 43.61 56.39
## Kobe's Hit and Miss Percentages
## H M
## 43.61 56.39
## 0% 25% 50% 75% 100%
## 0 0 0 1 4
## 0% 25% 50% 75% 100%
## 0 0 0 1 4
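A comparison along these lines could be sketched as follows. This is not the code that produced the output above. It assumes a shooting percentage of 58/133 to match Kobe's, uses a hypothetical one-line stand-in for calc_streak, and fixes an arbitrary seed. A random sample will generally not reproduce Kobe's hit and miss counts exactly.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in for calc_streak: hits between consecutive misses
calc_streak_sketch <- function(x) {
  diff(which(c(0, as.integer(x == "H"), 0) == 0)) - 1
}

p_hit <- 58 / 133  # Kobe's shooting percentage in these games (about 43.61%)

# 133 independent shots, each a hit with probability p_hit
sim_basket <- sample(c("H", "M"), size = 133, replace = TRUE,
                     prob = c(p_hit, 1 - p_hit))
sim_streak <- calc_streak_sketch(sim_basket)

print(table(sim_basket))                              # hit and miss counts
print(round(100 * prop.table(table(sim_basket)), 2))  # hit and miss percentages
print(table(sim_streak))                              # streak length distribution
print(quantile(sim_streak))                           # streak length quantiles
```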
Build the R function sim_generation, which takes the number of independent shooters (\(N\)) to simulate, the number of shots taken by each shooter (\(m\)), and the shooting percentage (\(perc\)), and returns a data frame in which each row contains the outcomes of all shots taken by a single shooter.
Use the function sim_generation to simulate \(N=500\) shooters, each taking \(m=133\) shots with \(perc=0.45\).
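One possible way to write such a function is sketched below. The text fixes only the name and the arguments, so the body is an assumption. It relies on all shots being independent, which lets a single call to sample fill the whole table. The seed is arbitrary.

```r
# One possible implementation (a sketch, not a reference solution).
# N: number of shooters, m: shots per shooter, perc: probability of a hit.
sim_generation <- function(N, m, perc) {
  # All N * m shots are independent, so they can be drawn in one call
  shots <- sample(c("H", "M"), size = N * m, replace = TRUE,
                  prob = c(perc, 1 - perc))
  # One row per shooter, one column per shot
  as.data.frame(matrix(shots, nrow = N, ncol = m), stringsAsFactors = FALSE)
}

set.seed(42)  # arbitrary seed for reproducibility
sims <- sim_generation(N = 500, m = 133, perc = 0.45)
print(dim(sims))          # 500 rows (shooters) by 133 columns (shots)
print(mean(sims == "H"))  # overall share of hits, expected to be near 0.45
```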
Get creative: use the results from the previous exercise to evaluate whether Kobe in fact has hot hands. For this you may use any R function of your choice; some options are quantile, mean, median, histogram, barplot.
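As one illustration among many possible approaches, the sketch below compares Kobe's reported longest streak (4) and mean streak length (0.7632) with the same statistics for 500 simulated independent shooters. The helper calc_streak_sketch, the seed, and the choice of statistics are assumptions made for this example.

```r
set.seed(2009)  # arbitrary seed for reproducibility

# Hypothetical stand-in for calc_streak: hits between consecutive misses
calc_streak_sketch <- function(x) {
  diff(which(c(0, as.integer(x == "H"), 0) == 0)) - 1
}

# 500 independent shooters (rows), 133 shots each, 45% chance of a hit
sims <- matrix(sample(c("H", "M"), size = 500 * 133, replace = TRUE,
                      prob = c(0.45, 0.55)), nrow = 500)

# Longest and mean streak for every simulated shooter
sim_max  <- apply(sims, 1, function(s) max(calc_streak_sketch(s)))
sim_mean <- apply(sims, 1, function(s) mean(calc_streak_sketch(s)))

kobe_max  <- 4       # Kobe's longest streak, as reported above
kobe_mean <- 0.7632  # Kobe's mean streak length, as reported above

# Share of independent shooters whose streaks are at least as long as Kobe's
share_max  <- mean(sim_max >= kobe_max)
share_mean <- mean(sim_mean >= kobe_mean)
print(quantile(sim_max))
print(c(share_max = share_max, share_mean = share_mean))

# Distribution of the longest streak across the simulated shooters
barplot(table(sim_max), xlab = "Longest streak", ylab = "Number of shooters")
```

If a large share of independent shooters match or exceed Kobe's streaks, his streaks are consistent with independent shooting; a very small share would point the other way.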
The distributions shown in the graphs above indicate that Kobe did not have hot hands.
This homework was created by adapting materials from OpenIntro, which are released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.