About

These are class notes and R code for Dorcas Ofori-Boaten’s STAT-461 : Introduction to Statistics I for Fall term 2021 at Portland State University.

W1-D1 : Tues. Sept. 28, 2021

  • Dorcas or Professor Dorcas

  • first of series of three courses

  • originally from Ghana (small country)

  • passion and drive for math and analysis

  • goal of all classes is academic excellence

  • send email any time of day

  • please read through syllabus

  • first homework will be assigned on Thursday and due the following Thursday

  • due dates will switch to Tuesday after the midterm

  • extra credit can be carried forward to future assignments if it isn’t needed on the current homework

  • some extra credit will be in the slides (pay attention)

  • homework is due at 11:59 PM (submission won’t work at 12:00)

Chapter 1 : What is Statistics?

  • statistics is applied mathematics that lets you study data to form a judgment in real-world applications.

  • a discipline that is math related, but more applied in its focus

  • get data, then summary or form of information from that data

  • real world scenarios aren’t fixed or absolute

  • in the collection of data there is variability, but we are still able to capture that variability (that is the central task of statistics)

  • When you say something is average, you are saying it with some confidence

  • random errors also have to be continuous

Population , Sample, Census

Parent Child Example
Population : well-defined collection of objects in a study (the focus of the study) Parameter all PSU students
Sample : smaller portion of the population that reflects the population Statistic a sample of students
Census : all desired information about a population
  • the first letter of each of these goes back to the parent

  • Example: Difference between rookie salaries of Yankees and Marlins players.

    • What are the collection of objects for this study? (aka the population)

      • The entire collection of players, which includes all Yankees players and all Marlins players
  • a sample is a smaller portion of the population used to predict the nature of the entire population

    • there are different ways to sample different schemes

    • make sure the sample is a true reflection of the structure of the population

  • the census is when you go into the population and collect all desired information person by person (or object by object)

    • the list of desired information is known as a census
Type Definition Methods
Descriptive methods to calculate summaries (mean, variance, etc.) graphical (e.g. histogram) and numerical
Inferential infer or predict what will happen next
  • once you are able to describe the sample and project it onto the population, you are moving into inferential statistics

Descriptive Statistics

  • graphical methods look at :
  1. the center

  2. the variation

  3. the distribution

  4. time

  5. outliers

  • histogram is bars laid side by side to show data in a study

Skew

  • left skew, right skew, and symmetric distribution
Skew Definition
Right Skew tallest bar is on the left and the bars gradually decrease toward the right (tail to the right)
Left Skew tallest bar is on the right and the bars gradually decrease toward the left (tail to the left)
Symmetric balanced, with the tallest bar in the middle dividing the data into two equal parts
  • for a normal distribution mean, median, and mode are the same thing

  • based on skewness you can tell where mean will fall

  • other graph patterns : U shape, uniform, bimodal (two bumps), bell-shaped

Mean

  • Population Mean : \(\mu=\frac{x_{1}+x_{2}+x_{3}+...+x_{N}}{N}\)

  • Sample Mean : \(\overline{x}=\frac{x_{1}+x_{2}+x_{3}+...+x_{n}}{n}\)

  • Sample Mean (textbook) : the sample mean , “y-bar”, of n measured responses \(y_{1},y_{2},...,y_{n}\)

\[\overline{y}=\frac{1}{n}\sum_{i=1}^{n}y_{i}\]

  • the mean is sensitive to outliers (not robust)

  • on its own, the mean may not be an accurate summary of the data.

  • a trimmed mean doesn’t mean you throw data away; you simply compute the mean over the portion you are interested in

Median

  • sample median : \(\widetilde{x}\)

  • to find the median : order the data from smallest to largest and find the value in the middle

    • if the total number in the population or sample is even, take the two middle values and average them (their sum divided by 2).
  • median is robust to outliers, because the center will still be the center regardless

  • it is often advisable to use the median for summaries


Mode

  • highest frequency

  • value that has appeared the highest number of times

  • mode is essential descriptor for data

Range

Max - Min

Variance and Standard Deviation

  • Population Variance : \(\sigma ^{2}=\frac{(x_{1}-\mu)^{2}+(x_{2}-\mu)^{2}+...+(x_{N}-\mu)^{2}}{N}=\frac{\sum_{i=1}^{N}(x_{i}-\mu)^{2}}{N}\)

  • Population Standard Deviation : \(\sigma=\sqrt{\sigma^{2}}\)

  • Sample Variance : \(s^{2}=\frac{(x_{1}-\overline{x})^{2}+(x_{2}-\overline{x})^{2}+...+(x_{n}-\overline{x})^{2}}{n-1}=\frac{\sum_{i=1}^{n}(x_{i}-\overline{x})^{2}}{n-1}\)

  • Sample Variance (textbook) : \[s^{2}=\frac{1}{n-1}\sum_{i=1}^{n}(y_{i}-\overline{y})^{2}\]

  • Sample Standard Deviation : \(s=\sqrt{s^{2}}\)
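These summaries can all be computed directly in R. A quick sketch on a small made-up sample (the data below are hypothetical, just for illustration):

```r
# Hypothetical sample (not from class)
x <- c(2, 4, 4, 5, 7, 9)

mean(x)          # sample mean: sum(x) / n
median(x)        # middle value of the sorted data (here the average of 4 and 5)
var(x)           # sample variance, using the n - 1 denominator
sd(x)            # sample standard deviation: sqrt(var(x))
max(x) - min(x)  # range

# base R has no built-in mode; tabulate frequencies and take the most frequent value
tab <- table(x)
as.numeric(names(tab)[which.max(tab)])
```

Note that `var()` and `sd()` use the \(n-1\) (sample) denominator, matching the textbook formulas.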

Summary

  • Statistics is the real world application of math which allows one to study data in order to form a judgment.

  • The population involves all objects, whereas a sample is a representative subset of the population.

  • Two types of statistics :

    • Descriptive which is the method of calculating descriptive values such as mean, median, mode, and variance, as well as graphical methods such as a histogram (see Histogram Trial- Day 1)
    • Inferential statistics uses the descriptions to infer or make predictions about the future.
  • skewed data has the bulk of the data to one side with a tail on the other.

    • left skew, bulk of data is on right (mean < mode)
    • right skew, bulk of data is on left (mean > mode)
  • Population parameters use Greek symbols, and sample statistics use Roman letters.

  • I need to start recognizing the matching the symbols and equations faster in my mind. This will help me understand the regression class better.

W1-D2 : Thur. Sept. 30, 2021

\(\Rightarrow\) Review from last class :

  • Characterizing a Set of Measurements : Graphical Methods

    • Distribution Patterns : U Shape, Uniform, Bimodal, Bell-Shape (Normal)

    • Symmetry (balancing in the middle), right skewed (mean to the right of data), left skewed (mean to the left of data)

    • Histograms can be used with frequency, relative frequency or percentage.

  • Characterizing a Set of Measurements : Numerical Methods

    • Mean (average)

      • Sample Mean : \(\overline{y}=\frac{1}{n}\sum_{i=1}^{n}y_{i}\)

      • Population Mean : \(\mu =\frac{1}{N}\sum_{i=1}^{N}Y_{i}\)

      • Mean is sensitive to outliers because it uses every data value

    • Median (middle value)

      • Median is not sensitive to outliers because it only looks at middle values

      • For even data sets the mean of the two middle values is the median

    • Mode (appears most frequently)

    • Range (Max - Min)

      • sensitive to outliers
    • Variance (avg. squared deviation of data from mean)

      • Population Variance : \(\sigma^{2}=\frac{1}{N}\sum_{i=1}^{N}(Y_{i}-\mu)^{2}\)

      • Sample Variance : \(s^{2}=\frac{1}{n-1}\sum_{i=1}^{n}(y_{i}-\overline{y})^{2}=\frac{1}{n-1}[\sum_{i=1}^{n}y_{i}^{2}-n\overline{y}^{2}]\)

      • sensitive to outliers

    • Standard Deviation (avg. deviation of data from mean)

      • Population Standard Deviation : \(\sigma=\sqrt{\sigma^{2}}\)

      • Sample Standard Deviation : \(s=\sqrt{s^{2}}\)

      • Approximate Sample : \(s\approx \frac{\text{Range}}{4}\)

      • sensitive to outliers

    • Smaller variance or standard deviation indicates that the data are more consistent

Examples

In class : Given a random sample of 10 observations such that the sample mean is 5 and \(\sum_{i=1}^{n}y_{i}^{2}=350\). Compute the sample standard deviation. (Hint use the shortcut formula)

Given \(n=10\) and \(\overline{y}=5\) the sample standard deviation is :

\[\begin{equation} \label{a} \begin{split} s & = \sqrt{\frac{1}{n-1}[\sum_{i=1}^{n}y_{i}^{2}-n\overline{y}^{2}]}\\ & = \sqrt{\frac{1}{10-1}[350-(10*5^{2})]}\\ & = \sqrt{\frac{100}{9}}\\ & = 3.33 \end{split} \end{equation}\]
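The same shortcut-formula computation can be checked in R (using only the values given above):

```r
n <- 10        # sample size
ybar <- 5      # sample mean
sum_y2 <- 350  # sum of squared observations

s <- sqrt((sum_y2 - n * ybar^2) / (n - 1))  # shortcut formula
s  # sqrt(100/9), about 3.333
```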

Example 11 : Listed below are the distances (in kilometers) from a home to local supermarkets.

data <- c(1.1, 4.2, 2.3, 4.7, 2.7, 3.2, 5.6, 3.3, 3.5, 3.8, 4.0, 1.5, 4.5, 4.5, 2.5, 4.8, 3.3, 5.5, 6.5, 12.3) 
data
##  [1]  1.1  4.2  2.3  4.7  2.7  3.2  5.6  3.3  3.5  3.8  4.0  1.5  4.5  4.5  2.5
## [16]  4.8  3.3  5.5  6.5 12.3
  1. Compute the Range (12.3 - 1.1)
range <- max(data)-min(data)
range
## [1] 11.2
  2. Approximate the standard deviation using the range

\(s\approx \frac{\text{Range}}{4}=\frac{11.2}{4}=2.8\)

  3. The following output shows numerical summaries obtained with the R package pastecs.
library(pastecs)
## 
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
stat.desc(data)
##      nbr.val     nbr.null       nbr.na          min          max        range 
##   20.0000000    0.0000000    0.0000000    1.1000000   12.3000000   11.2000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##   83.8000000    3.9000000    4.1900000    0.5238973    1.0965297    5.4893684 
##      std.dev     coef.var 
##    2.3429401    0.5591743

Why do you think the computed s (2.343) is slightly smaller than the approximated value in part (b)?

Simple: because part (b) is only a rough approximation (\(s\approx \frac{\text{Range}}{4}\)).

The Empirical Rule

When the data are bell-shaped (normally distributed), the Empirical Rule can be used to find the percentage of the data within 1, 2, or 3 standard deviations of the mean.

  • first use the Empirical Rule to find the number of standard deviations (z) corresponding to the bounds of the interval

  • next transform z to a data value (y) of the variable using the following formula

Data Value = Mean + z*Standard Deviation

  • sample : \(y=\overline{y}+zs\)

  • population : \(Y=\mu +z\sigma\)

Summary

  • Review descriptive statistics including examples

  • The Empirical Rule : 68-95-99.7

W2-D3 : Tues. Oct. 5, 2021

  • moving into chapter 2 which deals with more probability

  • review empirical rule

Examples

Example : Resting breathing rates for college-age students are approximately normally distributed (bell-shaped) with mean 12 and standard deviation 2.3 breaths per minute.

  1. What percentage of students have between 9.7 and 14.3 breaths per minute?

\(z_{1}=\frac{y-\overline{y}}{s}=\frac{9.7-12}{2.3}=-1\)

\(z_{2}=\frac{14.3-12}{2.3}=1\)

According to the empirical rule, about 68% of students have breathing rates within 1 standard deviation of the mean.

  2. What percentage of students have less than 7.4 breaths per minute?

\(z=\frac{7.4-12}{2.3}=-2\)

\(\frac{100-95}{2}=2.5\%\)

2.5% of students have breathing rates less than 7.4 breaths per minute.
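The z-scores in both parts can be reproduced in R:

```r
ybar <- 12  # mean breathing rate
s <- 2.3    # standard deviation

(9.7 - ybar) / s   # z = -1
(14.3 - ybar) / s  # z = 1, so about 68% of students fall in between
(7.4 - ybar) / s   # z = -2, so (100 - 95)/2 = 2.5% fall below
```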

Example Select all the histograms for which the empirical rule is valid or good to use.

(f), because it is the only one that follows the bell shape curve

Probability

What is probability?

Probability refers to the study of randomness and uncertainty.

  • Probability indicates how likely it is for something to occur in a random experiment. For example, what is the chance that if you roll a fair 100-sided die (sides labeled 1 through 100), you observe 65 on top?

An experiment is any activity / process whose outcome is subject to uncertainty.

  • Toss a coin, Roll a Die once or twice.

A sample space is the set of all possible outcomes of a random experiment. Denoted by S.

  • Toss a coin once \(S=\{\text{heads, tails}\}\)

An event is a subset (small collection of outcomes) of sample space and denoted by capital letter. Event and set are interchangeable.

  • What are the outcomes with at least one head when a coin is tossed twice?

\(S=\{\text{HH, HT, TH, TT}\}\) \(A=\{\text{HT,TH}\}\)

  • If an event has only a single point, then it is a simple event. Otherwise it is a compound event.

The Null set (empty event) is an event with no elements. Denoted \(\emptyset\) or \(\{\}\).

Set Theory Notation

The complement of an event A is the set of all outcomes in the sample space that do not belong to A. Denoted : \(A^{c}\), \(A'\) or \(\overline{A}\).

The intersection of two events A and B. The collection of all outcomes that appear in both A AND B. Denoted : \(A\cap B\)

  • \(A\cap B=B\cap A\)

The union of two events A and B. The collection of all outcomes that appear in either A OR B OR in both. Denoted : \(A\cup B\)

Event B is a subset of Event A if every element of B is also in A. This is denoted by \(B\subset A\)

If \(A\cap B=\emptyset\), then A and B are disjoint or mutually exclusive events.

\(A_{1}, A_{2},...,A_{n}\) are exhaustive events if and only if \(A_{1}\cup A_{2}\cup ...\cup A_{n}=S\)

\(A_{1}, A_{2},...,A_{n}\) are pairwise mutually exclusive (disjoint) and exhaustive events :

  1. if and only if \(A_{i}\cap A_{j}=\emptyset \ \forall i\ne j\)
AND
  2. \(A_{1}\cup A_{2}\cup ...\cup A_{n}=S\)

Basic Properties

\(A\cap \emptyset = \emptyset\)

\(A\cup \emptyset =A\)

\(A\cap \overline{A}=\emptyset\)

\(A\cup \overline{A}=S\)

\(\overline{S}=\emptyset\)

\(\overline{(\overline{A})}=A\)

DeMorgan’s Law

\((\overline{A\cap B})=\overline{A}\cup \overline{B}\)

\((\overline{A\cup B})=\overline{A}\cap \overline{B}\)

Distributive Law

\(A\cap (B\cup C) = (A\cap B)\cup (A\cap C)\)

\(A\cup (B\cap C) = (A\cup B)\cap (A\cup C)\)

Probability

  • probability of an event \(A:P(A)\)

\[P(A)=\frac{\text{count(numbers) of outcomes in event A}}{\text{count (number) of outcomes in the sample space}}\]

Note : \(0\leq P(A)\leq 1\)

Counting Rules

Product rule for k-tuples : If a process can be broken down into a sequence of k steps, then the total number of possible outcomes is the product of the number of outcomes at each step.

  • Suppose a license plate number is formed with 3 letters, followed by 3 numbers. How many different outcomes are possible? \(26^{3}\cdot 10^{3}=17576000\)

The Factorial Rule is the number of ways to order or rank or arrange n objects is \(n!=n(n-1)(n-2)...(3)(2)(1)\).

Note : 0!=1 and 1!=1

  • There are four candidates for a job. The members of the search committee will rank the four candidates from strongest to weakest. How many different outcomes are possible? \(4!=24\)

The Combination rule has the following set up :

  1. n different items are available

  2. select k of the n items without replacement

  3. order of selection does not matter

\[_{n}C_{k}={n\choose k}=\frac{n!}{k!(n-k)!}\]

Note : “n choose k”

  • How many 5 element subsets are in the set S={2, 3, 4, 5, 6, 7, 8, 9}?

\({8\choose 5}=56\)

The Permutation Rule has the following set up :

  1. n different items are available

  2. Select k of the n items without replacement

  3. Order of selection matters.

\[_{n}P_{k}=\frac{n!}{(n-k)!}\]

  • There are ten candidates for a job. The search committee will choose four of them, and rank the chosen four from strongest to weakest. How many different outcomes are possible?

\(^{10}P_4=\frac{10!}{6!}=5,040\)
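These counting rules map directly onto base R; a short sketch checking the examples above (base R has no permutation function, so a small helper is defined here):

```r
factorial(4)  # factorial rule: ways to rank 4 candidates (24)

choose(8, 5)  # combination rule: 5-element subsets of an 8-element set (56)

# permutation rule: select k of n without replacement, order matters
perm <- function(n, k) factorial(n) / factorial(n - k)
perm(10, 4)   # choose and rank 4 of 10 candidates (5040)
```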

Counting Rules : Summary

More Examples

These are not extra credit, but try these to get counting technique down.

  1. A fleet of 9 taxis is to be dispatched to 3 airports in such a way that 3 go to airport A, 5 go to airport B and 1 goes to airport C. In how many distinct ways can this be accomplished?

\(^{9}C_3\times^{6}C_5\times^{1}C_1=84\times 6\times 1=504\)

Using TI-84 : [MATH]\(\Leftarrow\)[PRB] 3: nCr

  2. From a collection of 9 paintings, four are to be selected to hang side by side on a gallery wall in positions 1, 2, 3, and 4. In how many ways can this be done?

\(^{9}P_4=3,024\)

  • because order matters

Using TI-84 : [MATH]\(\Leftarrow\)[PRB] 2: nPr

  3. Suppose your iPod contains 5 albums, each with 10 songs. How many ways can a playlist of 4 songs be selected if a song can be repeated?

\((5\times 10)^4=6,250,000\)

  • we want to choose 4 songs out of 5 albums (50 songs) with replacement, and order matters.
  4. A class consists of 19 students: 8 sophomores, 5 juniors, and 6 seniors. The instructor randomly selects a group of 6 students to present problems on the board. How many possible groups can be chosen?

\(^{19}C_6=27,132\)

  • we want to choose 6 students out of 19 without replacement, and order does not matter (a group is unordered, just as in the next example).
  5. A class consists of 19 students: 8 sophomores, 5 juniors, and 6 seniors. The instructor randomly selects a group of 6 students to present problems on the board. How many possible groups can be chosen if the group must consist of 2 sophomores, 2 juniors, and 2 seniors?

\(^{8}C_2\times^{5}C_2\times^{6}C_2=28\times 10\times15=4,200\)

  • we want to choose 2 out of 8 sophomores, 2 out of 5 juniors, and 2 out of 6 seniors without replacement, and order does not matter.

Probability Examples

A student prepares an exam by studying a list of 10 problems. She can solve 6 of them. For the exam, the instructor selects 5 problems at random from the 10 on the list given to the students. What is the probability that the student can solve all 5 problems on the exam?

\(\frac{^{6}C_5}{^{10}C_5}=\frac{\binom{6}{5}}{\binom{10}{5}}=\frac{6}{252}\approx 0.024\)

  • the denominator counts every possible set of 5 problems in the sample space, so it doesn’t matter how many the student can solve when calculating it.
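The same ratio in R:

```r
# favorable: pick 5 of the 6 problems she can solve
# total: pick any 5 of the 10 problems on the list
p <- choose(6, 5) / choose(10, 5)
p  # 6/252, about 0.024
```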

Extra Credit

The college of liberal arts and sciences is made up of 12 freshmen, 10 sophomores, 7 juniors and 7 seniors. What is the probability of selecting a student from each group to form a student advisory council?

\(\frac{^{12}C_1\times ^{10}C_1\times ^{7}C_1\times ^{7}C_1}{^{36}C_4}=\frac{5880}{58905}=0.099822\)

Answer : \(\approx\) 9.98%, a fairly low likelihood

  • the denominator counts the ways to choose 4 of the 36 students without replacement, where order doesn’t matter

  • the numerator counts the ways to choose 1 of the 12 freshmen, 1 of the 10 sophomores, 1 of the 7 juniors, and 1 of the 7 seniors

  • the probability is the ratio of favorable outcomes to total outcomes

Summary

  • Homework 1 is due the end of day Thursday (completed)

  • Email Extra Credit to Professor before class Thursday (completed)

  • Today we did examples applying The Empirical Rule and the equation \(z=\frac{y-\overline{y}}{s}\)

  • Went over basic probability definitions, notation, and equations

  • Four counting rules where order does and doesn’t matter, and there is or is not replacement. (see picture above)

  • Should work through examples given today before next class (completed)

W2-D4 : Thur. Oct. 7th, 2021

  • review some of the counting problems from last class

Axioms of Probability

  • For any event A, \(P(A)\geq 0\)

  • P(S)=1

  • If \(A_1,A_2,A_3,\)… is an infinite collection of pairwise mutually exclusive events, then \(P(A_1\cup A_2\cup A_3\cup ...)=\sum P(A_i)\)

Properties of Probability

  • P(\(\emptyset\))=0

  • If A and B are disjoint, then \(P(A\cap B)=0\)

  • For any event A, \(P(A)+P(A^C)=P(S)=1\)

  • \(\Rightarrow P(A)=1-P(A^C)\)

  • \(\Rightarrow P(A^C)=1-P(A)\)

  • For events A and B, \(\Rightarrow P(A\cup B)=P(A)+P(B)-P(A\cap B)\)

  • If A and B are disjoint, then \(\Rightarrow P(A\cup B)=P(A)+P(B)\)

  • \(P(A\cup B\cup C)=P(A)+P(B)+P(C)-P(A\cap B)-P(A\cap C)-P(B\cap C)+P(A\cap B\cap C)\)

Examples

What is the probability of obtaining a total of 7 or 11 when two dice are tossed once? (imagine a picture of all the combinations)

\[\begin{equation}\label{711dice} \begin{split} P(7\text{ or }11) & = P(7\cup 11)\\ & = P(7)+P(11)-P(7\cap 11)\\ & = P(7)+P(11)-0\\ & = \frac{6}{36}+\frac{2}{36}\\ & = \frac{8}{36}=\frac{2}{9} \end{split} \end{equation}\]

What is the probability of obtaining a 9 or 13?

\[\begin{equation}\label{913dice} \begin{split} P(9\text{ or }13) & = P(9\cup 13)\\ & = P(9)+P(13)-P(9\cap 13)\\ & = P(9)+P(13)-0\\ & = \frac{4}{36}+\frac{0}{36}\\ & = \frac{4}{36}=\frac{1}{9} \end{split} \end{equation}\]

Suppose that in a class of 100 kids, 45 like Sprite, 60 like Cola, and 20 like both. If a kid is randomly selected and asked which drinks they like, estimate the following probabilities : (complete in 6 minutes)

  1. P(Sprite)=\(\frac{45}{100}=0.45\)

  2. P(Cola)=\(\frac{60}{100}=0.60\)

  3. P(Both)=\(\frac{20}{100}=0.20\)

  4. P(At least one of these drinks)=\(0.45+0.60-0.20=0.85\) (equivalently, only Sprite + both + only Cola \(=0.25+0.20+0.40\))

  • can be written as cola or sprite
  5. P(Only Cola) = \(\frac{60-20}{100}=0.4\)

  6. P(Neither) = \(1-0.85=0.15\)

  7. P(Exactly one of the two events occurs) = \(\frac{40+25}{100}= 0.65\)
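These in-class answers can be verified in R using the inclusion-exclusion identities:

```r
p_sprite <- 45 / 100
p_cola   <- 60 / 100
p_both   <- 20 / 100

p_at_least_one <- p_sprite + p_cola - p_both  # 0.85
p_only_cola    <- p_cola - p_both             # 0.40
p_neither      <- 1 - p_at_least_one          # 0.15
p_exactly_one  <- p_at_least_one - p_both     # 0.65
```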

More on Properties of Probability

If \(A_1 , A_2 , .... , A_n\) are any events then :

\(P(A_1\cup A_2\cup ...\cup A_n)=P(\bigcup\limits_{i=1}^{n}A_i)=\)

\(\sum\limits_{i=1}^{n}P(A_i)-\sum\limits_{1\leq i<j\leq n} P(A_i\cap A_j)+\sum\limits_{1\leq i<j<k\leq n} P(A_i\cap A_j\cap A_k)-...+(-1)^{n-1}P(A_1\cap A_2\cap ...\cap A_n)\)

\(P(\bigcup\limits_{i=1}^{n}A_i)\leq \sum\limits_{i=1}^n P(A_i)\)

Conditional Probability

\(P(A|B)\): Probability of A given B.

  • What is the probability that A will occur given that B has occurred?

  • For any two events, A and B with \(P(B)>0\), the conditional probability of A given B is defined as

\[P(A|B)=\frac{P(A\cap B)}{P(B)}\]

\[\Rightarrow P(A\cap B)=P(A|B)\times P(B)\]

(Multiplication Law of Probability)

\[\Rightarrow P(A\cap B)=P(B|A)\times P(A)\]

  • \(P(A|B)\ne P(B|A)\)

  • example : P(dark | midnight)=1, but P(midnight | dark) is much less than 1

Example

One card is drawn from a deck of 52 cards. Let A be the event that an ace is drawn, and B the event that a spade is drawn. Compute P(A|B) and P(B|A). (two minutes)

\(P(A)=\frac{4}{52}\)

\(P(B)=\frac{13}{52}\)

\(P(A\cap B)=\frac{1}{52}\)

\(P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{\frac{1}{52}}{\frac{13}{52}}=\frac{1}{13}=0.0769\)

\(P(B|A)=\frac{P(B\cap A)}{P(A)}=\frac{\frac{1}{52}}{\frac{4}{52}}=\frac{1}{4}=0.25\)

Statistical Independence

  • Two events A and B are statistically independent if and only if

\[P(A\cap B)=P(A)\times P(B)\]

\[P(A|B)=P(A)\]

\[P(B|A)=P(B)\]

Example

Suppose we flip a coin twice. Let \(A_1\) = {1st coin is heads} and \(A_2\) = {2nd coin is heads}. Are \(A_1\) and \(A_2\) independent?

S={HH, HT, TH, TT}

\(P(A_1)=\frac{1}{2}\)

\(P(A_2)=\frac{1}{2}\)

\(P(A_1\cap A_2)=\frac{1}{4}\)

\(P(A_2|A_1)=\frac{P(A_1\cap A_2)}{P(A_1)}=\frac{\frac{1}{4}}{\frac{1}{2}}=\frac{1}{2}=P(A_2)\), so \(A_1\) and \(A_2\) are independent.

A beverage store has the following pre-made basket that contain the following gift combinations:

Labels Cookies Mugs Candy Total
Coffee 20 13 15 48
Tea 18 16 12 46
Total 38 29 27 94

Are the events Tea (T) and Candy (C) independent?

\(P(T)=\frac{46}{94}\)

\(P(C)=\frac{27}{94}\)

\(P(T\cap C)=\frac{12}{94}\)

No: \(P(T\cap C)=\frac{12}{94}\approx 0.128\), but \(P(T)\times P(C)=\frac{46}{94}\times\frac{27}{94}\approx 0.141\). Since these are not equal, Tea and Candy are not statistically independent.
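The independence condition can be checked numerically in R; independence requires the joint probability to equal the product of the marginals:

```r
p_tea   <- 46 / 94
p_candy <- 27 / 94
p_tea_and_candy <- 12 / 94

p_tea_and_candy  # about 0.128
p_tea * p_candy  # about 0.141; compare with the joint probability above
```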

More on Independence

If A and B are statistically independent, then

  • A and \(\overline{B}\) are independent, that is \(P(A\cap \overline{B})=P(A)P(\overline{B})\)

  • \(\overline{A}\) and B are independent, that is \(P(\overline{A}\cap B)=P(\overline{A})P(B)\)

  • \(\overline{A}\) and \(\overline{B}\) are independent, that is \(P(\overline{A}\cap \overline{B})=P(\overline{A})P(\overline{B})\)

  • A collection of events … are pairwise independent if and only if \[P(A_i\cap A_j )=P(A_i)P(A_j)\forall i\ne j\]

Mutual Independence

A collection of events \(A_1, A_2, ..., A_n\) are mutually independent if and only if \[P(A_1\cap A_2\cap ....A_n)=P(A_1)P(A_2)...P(A_n)=\prod\limits_{i=1}^{n}P(A_i)\]

  • Pairwise independence \(\nRightarrow\) Mutually independent

  • Mutual independence \(\Rightarrow\) Pairwise independence

The Law of Total Probability

Let \(A_1, A_2, ..., A_n\) be n mutually exclusive (disjoint) and exhaustive events. Then for any event B,

\[B=(B\cap A_1)\cup(B\cap A_2)\cup ...\cup (B\cap A_n)\]

AND

\[P(B)=P(B\cap A_1)+P(B\cap A_2)+ ... +P(B\cap A_n)\]

Equivalently,

\[P(B)=[P(B|A_1)\times P(A_1)]+[P(B|A_2)\times P(A_2)]+...+[P(B|A_n)\times P(A_n)]\]

Summary

  • A week ahead of schedule, so next week will be review questions

  • read chapter 2 of textbook

  • Start prepping practice test problems (homework, in class examples, textbook) to study this weekend

  • start prepping a “cheat sheet” for tests (complete)

  • finish / review today’s notes when posted (complete)

  • start hw 2 when it’s posted (won’t be posted until next Thu.)

W3-D5 : Tues. Oct. 12th, 2021

  • use textbook for practice problems

Review

  • Conditional Probability : \(P(A|B)=\frac{P(A\cap B)}{P(B)}\)

  • Multiplication Law of Probability : \(P(A\cap B)=P(B|A)\times P(A)\)

  • Statistical Independence : IFF \(P(A\cap B)=P(A)\times P(B)\) , \(P(A|B)=P(A)\) , \(P(B|A)=P(B)\)

  • If A and B are statistically independent, then A and \(\overline{B}\) , \(\overline{A}\) and B , and \(\overline{A}\) and \(\overline{B}\) are independent

  • Mutual Independence : IFF \(P(A_1\cap A_2\cap ....A_n)=P(A_1)P(A_2)...P(A_n)=\prod\limits_{i=1}^{n}P(A_i)\)

  • Law of Total Probability : for disjoint and exhaustive events, \(P(B)=\sum\limits_{i=1}^k[P(B|A_i)\times P(A_i)]\)

Bayes’ Rule

Let \(A_1,A_2,...,A_k\) be k mutually exclusive (disjoint) and exhaustive events, each with probability \(P(A_i)\), for \(i=1,2,...,k\). Then for any event B,

\[P(A_i|B)=\frac{P(A_i\cap B)}{P(B)}=\frac{P(B|A_i)\times P(A_i)}{\sum\limits_{j=1}^{k}[P(B|A_j)\times P(A_j)]}\]

Example 1

Suppose that 1% of a population uses a certain drug. Let

D : uses the drug

\(D^c\) : does not use the drug

T : tests positive for disease

\(T^C\) : tests negative for disease

The drug manufacturer claims that \(P(T|D^C)=0.015\) ; \(P(T^C|D)=0.005\)

Given a positive test, find the probability that a person actually uses the drug. [Hint: \(P(D|T)\)]

\(P(D|T)=\frac{P(T|D)\times P(D)}{[P(T|D)\times P(D)]+[P(T|D^C)\times P(D^C)]}\)

\(\Rightarrow P(D|T)=\frac{(0.995)(0.01)}{(0.995\times0.01)+(0.015\times 0.99)}=\frac{199}{496}\approx 0.4012\) (4 d.p.)

  • every level of the tree adds up to 1
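The Bayes’ Rule computation can be reproduced in R:

```r
p_d <- 0.01                      # P(D): 1% of the population uses the drug
p_t_given_dc <- 0.015            # P(T | D^c): false positive rate
p_tc_given_d <- 0.005            # P(T^c | D): false negative rate
p_t_given_d <- 1 - p_tc_given_d  # P(T | D) = 0.995

p_d_given_t <- (p_t_given_d * p_d) /
  (p_t_given_d * p_d + p_t_given_dc * (1 - p_d))
p_d_given_t  # 199/496, about 0.4012
```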

Example 2

65% of the email sent to my account is spam. If an email is actually spam, the spam filter will correctly identify it 80% of the time. If the email is not spam, the filter still tags it as spam 4% of the time. Given that a message has been tagged as spam, what is the probability that it is actually spam? Given that a message makes it through the filter without being tagged as spam, what is the probability that it really isn’t spam?

  • looking at performance of the filter

Let S : spam , and C : spam filter tags as spam

P(S) = 0.65 , P(C|S) = 0.8 , P(C|\(S^C\)) = 0.04

\(P(S|C)=\frac{P(C|S)\times P(S)}{[P(C|S)\times P(S)]+[P(C|S^C)\times P(S^C)]}\)

\(\Rightarrow P(S|C)=\frac{0.8(0.65)}{[0.8(0.65)]+[(0.04)(0.35)]}=\frac{260}{267}\approx 0.9738\) (4 dp)

\[\begin{equation}\label{spamfilter} \begin{split} P(S^C|C^C) & = \frac{P(C^C|S^C)\times P(S^C)}{[P(C^C|S^C)\times P(S^C)]+[P(C^C|S)\times P(S)]}\\ & = \frac{0.96(0.35)}{0.96(0.35)+0.2(0.65)}\\ & = \frac{168}{233}\approx 0.7210 \end{split} \end{equation}\]

(4 d.p.)

Extra Credit

  • email before 10am Thur.

  • will solve both on Thur.

At a certain gas station, 40% of the customers use regular gas (\(A_1\)), 35% use plus gas (\(A_2\)), and 25% use premium (\(A_3\)). Of those customers using regular gas, only 30% fill their tanks (event B). Of those customers using plus, 60% fill their tanks, whereas of those using premium, 50% fill their tanks.

P(\(A_1\)) = 0.4 , P(\(A_2\)) = 0.35 , P(\(A_3\)) = 0.25

P(B|\(A_1\)) = 0.30 , P(B|\(A_2\)) = 0.60 , P(B|\(A_3\)) = 0.50

(a) What is the probability that the next customer will request plus gas and fill the tank?

\(P(A_2\cap B)=P(B|A_2)\times P(A_2)=0.60\times 0.35 = 0.21\)

(b) What is the probability that the next customer fills the tank?

\[\begin{equation}\label{p fill tank} \begin{split} P(B) & = P(B\cap A_1)+P(B\cap A_2)+P(B\cap A_3)\\ & = [P(B|A_1)\cdot P(A_1)]+[P(B|A_2)\cdot P(A_2)]+[P(B|A_3)\cdot P(A_3)]\\ & = 0.455 \end{split} \end{equation}\]

(c) If the next customer fills the tank, what is the probability that regular gas is requested? Plus? Premium?

\(P(A_1|B)=\frac{P(A_1\cap B)}{P(B)}=\frac{P(B|A_1)\cdot P(A_1)}{P(B)}=\frac{24}{91}\approx 0.264\) (3 d.p.)

\(P(A_2|B)=\frac{P(A_2\cap B)}{P(B)}=\frac{P(B|A_2)\cdot P(A_2)}{P(B)}=\frac{6}{13}\approx 0.462\) (3 d.p.)

\(P(A_3|B)=\frac{P(A_3\cap B)}{P(B)}=\frac{P(B|A_3)\cdot P(A_3)}{P(B)}=\frac{25}{91}\approx 0.275\) (3 d.p.)
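All three parts can be computed at once in R by vectorizing over the gas types:

```r
p_a <- c(0.40, 0.35, 0.25)          # P(A1), P(A2), P(A3): regular, plus, premium
p_b_given_a <- c(0.30, 0.60, 0.50)  # P(B | Ai): fill the tank, given gas type

p_a2_and_b <- p_b_given_a[2] * p_a[2]   # part (a): 0.21
p_b <- sum(p_b_given_a * p_a)           # part (b), law of total probability: 0.455
p_a_given_b <- p_b_given_a * p_a / p_b  # part (c), Bayes' rule
round(p_a_given_b, 3)                   # 0.264 0.462 0.275
```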

Part 2 : Prove that \(^{n+1}C_k={}^{n}C_k+{}^{n}C_{k-1}\) is true. (not exactly what is in the book)

Proof : \(^{n+1}C_k=^{n}C_k+^nC_{k-1}\)
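One standard algebraic route (a sketch; not necessarily the argument the book intends) is to put both terms on the right over a common denominator:

```latex
\begin{aligned}
{n\choose k}+{n\choose k-1}
  &= \frac{n!}{k!\,(n-k)!}+\frac{n!}{(k-1)!\,(n-k+1)!}\\
  &= \frac{n!\,(n-k+1)}{k!\,(n-k+1)!}+\frac{n!\,k}{k!\,(n-k+1)!}\\
  &= \frac{n!\,(n+1)}{k!\,(n+1-k)!}\\
  &= \frac{(n+1)!}{k!\,\big((n+1)-k\big)!}={n+1\choose k}
\end{aligned}
```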

Summary

  • Bayes’ Rule : \(P(A_i|B)=\frac{P(A_i\cap B)}{P(B)}=\frac{P(B|A_i)\times P(A_i)}{\sum\limits_{j=1}^{k}[P(B|A_j)\times P(A_j)]}\)

  • tree diagrams

To Do

  • fill in gaps in today’s notes (complete)

  • extra credit (complete)

W3-D6 : Thur. Oct. 14th, 2021

  • extra credit

Chapter 3 : Random Variables & Probability Distributions

For a given sample space of an experiment, S, a random variable (typically denoted X or Y, capital letters) is any rule that associates a real number with each outcome in S.

Example

Flip a coin three times. Let X = {The number of heads}

Types of Random Variables

Discrete : A random variable whose possible values either constitute a finite set or one whose range is countably infinite (count).

  • example : the number of students in a class, the outcome of rolling a 6-sided die

Continuous : A random variable whose range is an interval on the number line (measure). Uncountably infinite.

  • example : A dog’s weight, a person’s height, waiting time for a flight, etc.

Probability Distribution

Probability Distribution : A description of how the total probability 1 is distributed among the various possible values of the random variable X.

  • note : each possible value of the random variable X is assigned a probability. The sum of all the probabilities for the various possible values of X is equal to 1.

For discrete random variables, this is called the Probability mass function (pmf)

  • \(\forall x\) the pmf is defined as \(p(x)=P(X=x)\)

Properties of the pmf :

  1. \(0\leq p(x)\leq1\)

  2. \(\sum\limits_{x\in S}p(x)=1\)

For continuous random variables, this is called the Probability density function (pdf)

  • \(\forall x\) the pdf is a function \(f(x)\) such that \(P(a\leq X\leq b)=\int_a^bf(x)dx\)

Properties of pdf :

  1. \(f(x)\geq0\forall x\)

  2. \(\int_{-\infty}^{\infty}f(x)dx=1\)

Relevant Properties

  1. \(P(X=a)=\int_a^af(x)dx=0\forall a\).

  2. \(P(a\leq X\leq b) = P(X=a)+P(a<X<b)+P(X=b)=P(a<X<b)\)

  3. \(P(a\leq X\leq b)=P(a<X\leq b)=P(a\leq X <b)=P(a<X<b)\)

This is only true for continuous distribution

Example 1 - Discrete

Flip a coin three times. Let X = {The number of heads}

x 0 1 2 3
p(x) 1/8 3/8 3/8 1/8

\[ p(x) = \begin{cases} 1/8 & \text{if }x = 0\\ 3/8 & \text{if }x = 1\\ 3/8 & \text{if }x = 2\\ 1/8 & \text{if }x = 3\\ 0 & \text{otherwise} \\ \end{cases} \]
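This pmf is the binomial distribution with \(n=3\) and \(p=1/2\), which R computes directly:

```r
x <- 0:3                             # possible numbers of heads
p <- dbinom(x, size = 3, prob = 0.5) # P(X = x) for each x
p       # 0.125 0.375 0.375 0.125, i.e. 1/8, 3/8, 3/8, 1/8
sum(p)  # the probabilities sum to 1
```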

Example 2

A car dealer has 30 cars available for immediate sale, of which 10 are classified as compact cars. Three customers arrive and buy cars. Define the random variable X to be the number of compact cars sold.

  1. What is the pmf of X?

\(p(x)=P(X=x)=\frac{{10\choose x}{20\choose 3-x}}{{30\choose 3}}\) for \(x=0,1,2,3\)

  2. What is the probability exactly 2 compact cars are purchased?

\(P(X=2)=\frac{{10\choose 2}{20\choose 1}}{{30\choose 3}}=\frac{900}{4060}\approx 0.222\)

  3. What is the probability less than 2 compact cars are purchased?

\(P(X<2)=P(X=0)+P(X=1)=0.281+0.468\approx 0.749\)
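Counting compact cars sold without replacement is a hypergeometric distribution, which R computes with `dhyper` (10 compacts, 20 other cars, 3 draws):

```r
x <- 0:3
p <- dhyper(x, m = 10, n = 20, k = 3)  # choose(10,x)*choose(20,3-x)/choose(30,3)
round(p, 3)  # 0.281 0.468 0.222 0.030
p[3]         # P(X = 2) = 900/4060
p[1] + p[2]  # P(X < 2), about 0.749
```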

Extra Credit Problem

If A and B are independent events, show that :

  • A and \(\overline{B}\) are also independent.

  • Are \(\overline{A}\) and \(\overline{B}\) independent?

For A and B independent :

  • \(P(A|B)=P(A)\)

  • \(P(B|A)=P(B)\)

  • \(P(A\cap B)=P(A)P(B)\)

  1. Proof that A, and B’ are Independent

\[\begin{equation}\label{A and B' are independent } \begin{split} P(A|\overline{B}) & = \frac{P(A\cap \overline{B})}{P(\overline{B})}\\ & = \frac{P(\overline{B}|A)P(A)}{P(\overline{B})}\\ & = \frac{[1-P(B|A)]P(A)}{P(\overline{B})}\\ & = \frac{[1-P(B)]P(A)}{P(\overline{B})}\\ & = \frac{[P(\overline{B})]P(A)}{P(\overline{B})}\\ & = P(A) \end{split} \end{equation}\]

Therefore \(P(A|\overline{B})=P(A)\). QED.

  2. Proof that \(\overline{A}\) and \(\overline{B}\) are independent

Since A and \(\overline{B}\) are independent by part 1,

\[\begin{equation}\label{A' and B' are independent } \begin{split} P(\overline{A}|\overline{B}) & = 1-P(A|\overline{B})\\ & = 1-P(A)\\ & = P(\overline{A}) \end{split} \end{equation}\]

Therefore \(P(\overline{A}|\overline{B})=P(\overline{A})\), so \(\overline{A}\) and \(\overline{B}\) are independent. QED.

Summary

  • discrete random variables (countable)

    • pmf
  • continuous random variables (measurable)

    • pdf

To Do

  • compare hw1 to solutions (complete)

  • HW2 (complete)

W4-D7 : Tues. Oct. 19th, 2021

  • review

    • random variable : associates a real number with each outcome in S.

    • support : the set of possible values for the random variable

    • discrete : range is a finite or countably infinite set

    • continuous : range is uncountably infinite (an interval on the number line)

    • probability distribution : description of how probability is distributed among the various possible values of the random variable X.

    • probability mass function (pmf) : for discrete rv (sum = 1)

    • probability density function (pdf) : for continuous rv (integral = 1)

      • integrating area under the curve
    • relevant properties (continuous rv) : \(P(X=a)=0\), because the integral over a single point is zero

Example 1

The current in a certain circuit as measured by an ammeter is a continuous r.v. X with the following pdf:

\[ f(x) = \begin{cases} 0.075x+0.2 & \text{for }3\leq x\leq 5 \\ 0 & \text{otherwise} \\ \end{cases} \]

(1) Sketch the graph of \(f(x)\)

\(f(3)=0.075(3)+0.2=0.425\) \(f(5)=0.075(5)+0.2=0.575\)

(2) Calculate \(P(X\leq 4)\)

\(P(X\leq4)=P(3\leq X\leq 4)=\int_3^4(0.075x+0.2)dx=0.4625\)

(3) How does the probability above compare to \(P(X<4)\)?

For a continuous random variable, \(P(X<4)=P(X\leq 4)=0.4625\)

(4) Calculate \(P(4.5<X)\).

\(P(4.5<X)=P(4.5< X< 5)=\int_{4.5}^5(0.075x+0.2)dx=0.278125\)
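The integrals above can be checked numerically in R with `integrate` (a sketch; the function name `f` is my own):

```r
# pdf of the ammeter current: f(x) = 0.075x + 0.2 on [3, 5]
f <- function(x) 0.075 * x + 0.2
integrate(f, 3, 5)$value     # total probability: 1
integrate(f, 3, 4)$value     # P(X <= 4)  = 0.4625
integrate(f, 4.5, 5)$value   # P(X > 4.5) = 0.278125
```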

Example 2

Suppose the time Y between injury accidents in a nuclear power plant (in days) is a random variable with pdf :

\[ f(y) = \begin{cases} \frac{1}{3}e^{-y/3} & y>0 \\ 0 & y\leq 0 \\ \end{cases} \]

(a) Verify that the function is a pdf.

  1. \(f(y)\geq0\) for all y

For \(y>0\), \(e^{-y/3}>0\), so \(f(y)=\frac{1}{3}e^{-y/3}>0\); for \(y\leq 0\), \(f(y)=0\). Also \(\lim\limits_{y\rightarrow \infty}f(y)=\frac{1}{3}(0)=0\), so the density decays to zero.

  2. \(\int_0^{\infty}f(y)dy=1\)

\(\int_0^\infty f(y)dy=\int_0^\infty \frac{1}{3}e^{-y/3}dy=\frac{1}{3}[(-3)(0-1)]=1\)

\(\therefore f(y)\) is a valid pdf.

(b) What is the probability that the time between accidents is between 2 and 5 days?

\(P(2<Y<5)=\int_2^5\frac{1}{3}e^{-y/3}dy=e^{-2/3}-e^{-5/3}\approx 0.325\) (3 d.p.)

(c) Find the CDF of Y.

\[ F(y) = \begin{cases} 0 & y<0 \\ 1-e^{-y/3} & y\geq 0 \end{cases} \]
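This pdf is the exponential distribution with rate 1/3, so R's `pexp` gives the same probability as part (b); a quick check (the name `cdf` is my own):

```r
# time between accidents: exponential with rate 1/3 (mean 3 days)
rate <- 1/3
pexp(5, rate) - pexp(2, rate)    # P(2 < Y < 5), about 0.325
# the same probability via the cdf found in (c): F(y) = 1 - exp(-y/3)
cdf <- function(y) 1 - exp(-y / 3)
cdf(5) - cdf(2)
```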

Cumulative Distribution Functions (CDF)

The cumulative distribution function (CDF) for a random variable X with pmf \(p(x)\) is defined for every number \(x\) by \[F(x)=P(X\leq x)\quad\text{for }-\infty<x<\infty\]

F(x) is a valid CDF if and only if:

  • \(F(x)\geq 0\)

  • \(F(x)\leq 1\)

  • F(x) is a non-decreasing function [that is, if \(x_1\) and \(x_2\) are two values such that \(x_1\leq x_2\), then \(F(x_1)\leq F(x_2)\)]

Relationship between CDFs and PMFs

For any discrete random variable X

\[F(x)=\sum\limits_{x_i\leq x}p(x_i)\Leftrightarrow p(x_i)=F(x_i)-F(x_{i-1})\quad\text{for }i=2,3,...\]

and

\(F(x_1)=p(x_1)\)
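In R, this pmf-to-CDF relationship is just a running sum (`cumsum`), and its inverse a difference (`diff`); a sketch using the coin-flip pmf from earlier:

```r
# coin-flip pmf from the earlier example; the CDF is its running sum
p <- c(1, 3, 3, 1) / 8
cdf <- cumsum(p)      # 0.125 0.500 0.875 1.000
# recover the pmf: p(x_i) = F(x_i) - F(x_{i-1}), with F before x_1 being 0
diff(c(0, cdf))       # 0.125 0.375 0.375 0.125
```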

Cumulative Distribution Function (CDF)

  • The cumulative distribution function (cdf) for a continuous random variable X is defined as :

\[F(x)=P(X\leq x)=\int_{-\infty}^xf(t)dt\]

  • Suppose that X is a continuous random variable; then for any number a

\[P(X>a)=1-F(a)\]

and for any two numbers a and b such that \(a<b\),

\[P(a\leq X\leq b)=F(b)-F(a)\]

Obtaining pdf from cdf

If X is a continuous rv with pdf \(f(x)\) and cdf \(F(x)\), then at every x at which the derivative \(F'(x)\) exists, \(f(x)=F'(x)\).

  • Example :

\[ F(x) = \begin{cases} 0 & x<0 \\ \frac{1}{4}x^2 & 0\leq x<2 \\ 1 & x\geq 2 \end{cases} \]

For \(0\leq x<2\) : \(f(x)=F'(x)=\frac{d}{dx}\left(\frac{1}{4}x^2\right)=\frac{1}{4}(2x)=\frac{1}{2}x\)

For \(x<0\) : \(f(x)=F'(x)=0\)

For \(x>2\) : \(f(x)=F'(x)=0\)
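A numerical check of \(f(x)=F'(x)\) in R, using a central difference at an interior point (a sketch; all names are my own):

```r
# cdf F(x) = x^2 / 4 on [0, 2); its derivative should match f(x) = x / 2
cdf <- function(x) x^2 / 4
x <- 1.3
h <- 1e-6
deriv_est <- (cdf(x + h) - cdf(x - h)) / (2 * h)  # central difference
deriv_est                                         # close to x / 2 = 0.65
```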

Summary

  • pdf (probability d.f.)

  • cdf (cumulative d.f.)

    • \(F(x)=P(X\leq x)=\int_{-\infty}^xf(t)dt\)

    • \(P(X>a)=1-F(a)\)

    • \(P(a\leq X\leq b)=F(b)-F(a)\)

  • obtaining pdf from cdf : \(f(x)=F'(x)\)

To Do

  • finish today’s notes (complete)

W4-D8 : Thur. Oct. 21st, 2021

  • no graphing calculators on tests, but R is okay?

  • review

    • random variable : associates a real number with an outcome in S

    • Discrete : rv whose range is a finite or countably infinite set

    • Continuous : rv whose range is an interval on the number line (uncountably infinite)

    • probability distribution : how the total probability of 1 is distributed among the possible values of the rv X.

    • PMF : for discrete rv (sum)

    • PDF : for continuous rv (integral)

    • CDF : \(F(x)=P(X\leq x)\) for a rv X

      • \(f(x)=F'(x)\) for a continuous rv with pdf \(f(x)\) and cdf \(F(x)\)

      • PDF is derivative of CDF

      • CDF is integral of PDF

Example 2

Consider the pdf below:

\[ f(x) = \begin{cases} \frac{1}{2}x & 0\leq x <2\\ 0 & \text{otherwise}\\ \end{cases} \]

(a) Obtain the cdf for any number between 0 and 2.

\[ F(x) = \begin{cases} 0 & x<0\\ \frac{1}{4}x^2 & 0\leq x<2\\ 1 & x\geq 2 \end{cases} \]

  • take note that in the piecewise CDF, the 0 case appears at the top and the 1 case at the bottom

(b) Using the cdf, find \(P(X\leq \frac{4}{5})\)

\(F(x)=P(X\leq x)\)

\(P(X\leq \frac{4}{5})=F(\frac{4}{5})=\frac{1}{4}(\frac{4}{5})^2=\frac{4}{25}=0.16\)

  • using the cdf route, you integrate once up front and then just plug in values

(c) Using the pdf, find \(P(X\leq \frac{4}{5})\)

\(P(X\leq \frac{4}{5})=P(0\leq X\leq \frac{4}{5})=\int_0^{\frac{4}{5}}\frac{1}{2}xdx=\frac{4}{25}\)

  • using the pdf, you need to set up an integral for each probability

(d) Using the cdf, find \(P(X> \frac{6}{5})\)

\(P(X> \frac{6}{5})=1-P(X\leq \frac{6}{5})=1-F(\frac{6}{5})=1-\frac{1}{4}(\frac{6}{5})^2=\frac{16}{25}=0.64\)

(e) Using the cdf, find \(P(\frac{1}{2}<X< \frac{5}{2})\)

\(P(\frac{1}{2}<X< \frac{5}{2})=F(\frac{5}{2})-F(\frac{1}{2})=1-\frac{1}{4}(\frac{1}{2})^2=\frac{15}{16}\)
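Parts (b), (d), and (e) can be reproduced in R by coding the piecewise CDF once and plugging in values (a sketch; `cdf` is my own name):

```r
# piecewise cdf from part (a)
cdf <- function(x) ifelse(x < 0, 0, ifelse(x < 2, x^2 / 4, 1))
cdf(4/5)             # P(X <= 4/5) = 4/25  = 0.16
1 - cdf(6/5)         # P(X > 6/5)  = 16/25 = 0.64
cdf(5/2) - cdf(1/2)  # P(1/2 < X < 5/2) = 15/16
```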

Expected Values - Discrete

Let X be a discrete random variable with set of possible values D and pmf p(x).

  • The mean or expected value of X is \[E(X)=\mu_X=\sum\limits_{X \in D}xp(x)\]

  • If a random variable X has a set of possible values D and pmf p(x), then the expected value of any function h(X) is \[E[h(X)]=\sum_{x\in D}h(x)p(x)\]

Example

Flip a fair coin three times. X = {The number of heads}. Given \(Y=3X+7\), calculate the expected values E(X) and E(Y).

| \(x\) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(p(x)\) | 1/8 | 3/8 | 3/8 | 1/8 |

\(E(X)=\sum\limits_{x=0}^3xp(x)=0(\frac{1}{8})+1(\frac{3}{8})+2(\frac{3}{8})+3(\frac{1}{8})=\frac{3}{2}\)

\(E(Y)=\sum\limits_{x=0}^3(3x+7)p(x)=7(\frac{1}{8})+10(\frac{3}{8})+13(\frac{3}{8})+16(\frac{1}{8})=\frac{23}{2}\)
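A quick R check of both expected values, assuming the fair-coin pmf \(p(x)=1/8,3/8,3/8,1/8\) from the coin example:

```r
# fair-coin pmf for X = number of heads in 3 flips
x   <- 0:3
p_x <- c(1, 3, 3, 1) / 8
EX  <- sum(x * p_x)            # E(X) = 1.5
EY  <- sum((3 * x + 7) * p_x)  # E(3X + 7) = 3 * E(X) + 7 = 11.5
```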


Properties of the expected value

  • Let a be some constant, X is a rv \(E(aX)=aE(X)\)

  • Let X and Y be two random variables, \(E(X+Y)=E(X)+E(Y)\).

  • Let b be some constant; then \(E(b)=b\)

Combining these: \(E(aX+bY+c)=aE(X)+bE(Y)+c\)

Variance of a random variable

  • Let X have pmf \(p(x)\) and expected value \(\mu_X\). Then the variance of X is

\[\sigma^2(X)=V(X)=E[(X-\mu_x)^2]=\sum_{x\in D}(x-\mu_x)^2p(x)\]

  • By construction, the standard deviation is defined as

\[\sigma (X)=\sqrt{\sigma^2(X)}=\sqrt{V(X)}\]

Example

Calculate \(V(X)\) using the shortcut formula.

| \(x\) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| \(p(x)\) | 0.1 | 0.45 | 0.25 | 0.2 |

\(V(X)=E(X^2)-\mu_X^2\)

\(E(X^2)=\sum\limits_{x=1}^4x^2p(x)=7.35\), and \(\mu_X=\sum\limits_{x=1}^4xp(x)=2.55\)

\(\Rightarrow V(X)=7.35-2.55^2=0.8475\)
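The shortcut calculation in R (a sketch; names are my own):

```r
# shortcut formula: V(X) = E(X^2) - mu^2
x   <- 1:4
p_x <- c(0.10, 0.45, 0.25, 0.20)
mu  <- sum(x * p_x)       # E(X)   = 2.55
EX2 <- sum(x^2 * p_x)     # E(X^2) = 7.35
EX2 - mu^2                # V(X)   = 0.8475
```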

Variance of a function

\[V[h(X)]=E[(h(X)-\mu_{h(X)})^2]=\sum_{x\in D}(h(x)-\mu_{h(X)})^2p(x)\]

Properties of Variance

  • Let a be some constant, X is a RV \(V(aX)=a^2V(X)\) .

  • Let b be some constant, then \(V(b)=0\).

Summary

  • Expected Value

  • Variance

Additional Notes

  • Monday will cover the continuous version of this (integrating instead of summing), which completes the material for the midterm

  • MIDTERM NEXT WEEK!!!! (Chapters 1-3)

To Do

  • finish today’s notes (complete)

  • do practice problems from notes (1 hour : complete)

  • practice problems from HW 1 (1 hour : complete)

  • practice book problems from chapter 2

W5-D9 : Tues. October 26, 2021

Summary

To Do

  • practice problems from HW 2

  • practice problems from chapter 3

W5-D10 : Thur. October 28, 2021

Summary

To Do