Lecturer: Diana Skrzydlo
Office Hours: MF 3:30-5 in M3 3106 & W 11:30-12:30 in M3 3144
What is probability?
The likelihood of an event occurring
Definitions to set up a probability model
Review Question:
What is essential to the classical definition of probability?
Outcomes are equally likely
Example
Experiment: Roll a 6-sided die repeatedly
Trial: One roll
Outcomes: 1, 2, … , 5, 6
Sample space: S = {1, 2, 3, 4, 5, 6} (most versatile)
S = {even, odd} (also works)
Example
A = “a 1 is rolled” -> A = {1}
B = “the number is odd” -> B = {1, 3, 5}
Note: events consisting of one sample point are simple events; otherwise they are called compound events. On any trial, if the outcome is in the event A, we say A has occurred.
E.g. if a 1 is rolled -> A occurs, B occurs
3 is rolled -> B occurs, A does not
6 is rolled -> neither occur
Probability is a function P that maps events in a sample space to Real numbers such that 0 ≤ P(A) ≤ 1 for every event A and P(S) = 1.
Recall
P(A) is the probability that A occurs. A is a subset of S. If A is compound, its prob is the sum of the probs of the simple events that make up A.
Two basic counting rules:
“With replacement”: it is possible to obtain the same result more than once. With m spaces and k options for each, there are k^m arrangements.
A permutation is an ordering of k objects selected from n objects without replacement: k spaces, and the order matters! The count is n^(k) = n(n-1)⋯(n-k+1), read “n to k factors”, which equals n!/(n-k)!.
Also, when k = n this is just n!.
Tip: if computing a certain probability is hard, try 1 - (the probability of the opposite of what you want to find). E.g. P(at least one repeat) becomes 1 - P(no repeats), which is easier.
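A quick Python sketch of the complement trick, assuming (for illustration) 4 rolls of a 6-sided die:

```python
from math import perm

# P(at least one repeated value) = 1 - P(all results distinct)
def p_repeat(k, sides=6):
    return 1 - perm(sides, k) / sides**k

print(p_repeat(4))   # 1 - 360/1296, about 0.722
```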
A combination is a selection of k objects from n objects without replacement; the order does NOT matter (e.g. drawing a hand of cards). The count is denoted “n choose k”: C(n, k) = n!/(k!(n-k)!). In Pascal's triangle, C(n, k) is the kth entry in the nth row (counting from 0).
A useful identity for Pascal's triangle:
(n choose k) = (n-1 choose k-1) + (n-1 choose k)
We won't prove this mathematically here since it's messy (it was proven in MATH 239).
Example:
A box of 10 computer chips contains 3 defective ones. Four of the chips are tested. What is the probability that exactly 1 is found defective?
Solution:
Without replacement, and we don't care about order.
The whole time, we don't care about order (i.e. we don't need to worry whether the defective one is tested first, last, etc.)
Note: “1 is found” means exactly 1, not at least one.
P(exactly 1 defective) = C(3, 1) C(7, 3) / C(10, 4) = (3)(35)/210 = 1/2
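A one-line check of this computation in Python:

```python
from math import comb

# choose 1 of 3 defective and 3 of 7 good, out of all 4-chip samples
print(comb(3, 1) * comb(7, 3) / comb(10, 4))   # 105/210 = 0.5
```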
Recall: an event is a subset of S, and for any event A, 0 ≤ P(A) ≤ 1.
Rules for probabilities
If an event A is contained in another event B (A ⊆ B), then P(A) ≤ P(B);
thus, since every event A satisfies A ⊆ S, P(A) ≤ P(S) = 1.
Ā (also written A^c or A′) means NOT A, the complement. A ∪ B = “or”; A ∩ B = “and”.
De Morgan's laws: (A ∪ B)′ = A′ ∩ B′
(A ∩ B)′ = A′ ∪ B′
P(A ∪ B) = P(A) + P(B) - P(AB)
Recall: A and B are independent iff P(AB) = P(A)P(B); this product form applies only when the events are independent (don't affect one another).
Conditional prob of A given B: P(A|B) = P(AB)/P(B), provided P(B) > 0.
Last time (rolling a small die and a large die) we had
A = “small die shows 3”
C = “total is 8”
P(A|C) = 1/5
P(A) = 1/6
C occurring makes A more likely.
P(C|A) = 1/6, P(C) = 5/36
A occurring makes C more likely. Dependence is a two-way relationship
Alternative defn of independence: P(A|B) = P(A), i.e.
knowing B occurred has no effect on the probability of A occurring. Equivalently, P(B|A) = P(B).
Example:
Diana has 2 kids. Each independently has 0.5 chance of having red hair. At least one has red hair. What is the probability the other does? Let A = at least one red
| Kid 2 red | Kid 2 not red |
---|---|---|
Kid 1 red | 0.25 | 0.25 |
Kid 1 not red | 0.25 | 0.25 |
P(A) = 0.75
P(both|A) = P(both and A) / P(A) = 0.25/0.75 = 1/3
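A small Python enumeration confirming the 1/3 (the outcome labels are just illustrative):

```python
from itertools import product

# all four equally likely (kid 1, kid 2) hair outcomes
outcomes = list(product(["red", "not red"], repeat=2))

a = [o for o in outcomes if "red" in o]        # at least one red
both = [o for o in a if o == ("red", "red")]
print(len(both) / len(a))                      # 1/3
```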
If we rearrange the definition of conditional prob, we get:
P(AB) = P(B)P(A|B)
P(AB) = P(A)P(B|A)
Intuitively, think of “checking” whether A occurred. If it did, “check” whether B occurred (knowing A did). (or you check B first, then A) Works for both dependent and independent events.
Extends nicely to multiple events P(ABC) = P(B)P(C|B)P(A|BC)
The order doesn’t matter as long as each probability is conditional on what we know.
Let A1, A2, …, Ak be a collection of events that are mutually exclusive and cover the sample space
(break S up into k disjoint pieces). This is a partition of S.
e.g. A and Ā form a partition of size 2.
For any event B in S, we can express it as B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ … ∪ (Ak ∩ B)
So P(B) = Σ_i P(Ai)P(B|Ai), using the product rule on each piece (the law of total probability).
Example:
Three people A,B,C are writing code together. A produces twice as much as B or C
1% of A’s code has errors
2% of B’s
5% of C’s
What is the probability that a random line of code has errors?
Let E = line has error
A = line written by A
B = line written by B
C = line written by C
We want P(E)
Using the partition {A, B, C}, P(A) = 0.5 , P(B) = P(C) = 0.25
We have P(E|A) = 0.01
P(E|B) = 0.02
P(E|C) = 0.05
So P(E) = (0.5)(0.01) + (0.25)(0.02) + (0.25)(0.05) = 0.0225
We found P(E) = 0.0225 using law of total prob. We would like to know, if we find an error, whose fault is it?
We had to reverse the direction of the conditioning: we have P(E|author). We want P(author|E).
Also written as Bayes' theorem: P(Ai|B) = P(Ai)P(B|Ai) / Σ_j P(Aj)P(B|Aj)
For this example: P(A|E) = 0.005/0.0225 ≈ 0.22, P(B|E) = 0.005/0.0225 ≈ 0.22, P(C|E) = 0.0125/0.0225 ≈ 0.56, so C is the most likely culprit.
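A short Python sketch of the law of total probability followed by Bayes' rule for this example:

```python
# law of total probability, then Bayes' rule, for the code-errors example
prior = {"A": 0.50, "B": 0.25, "C": 0.25}   # P(author)
p_err = {"A": 0.01, "B": 0.02, "C": 0.05}   # P(E | author)

p_E = sum(prior[a] * p_err[a] for a in prior)        # total probability
posterior = {a: prior[a] * p_err[a] / p_E for a in prior}

print(p_E)        # 0.0225
print(posterior)  # A: ~0.22, B: ~0.22, C: ~0.56
```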
Example
Your spam filter has 90% detection rate but 1% false positive rate. Suppose 25% of messages are spam
Let S = “message is spam” P(S) = 0.25 P(S’) = 0.75
I = “Identified as spam”
P(I|S) = 0.9 → P(I′|S) = 0.1, and P(I|S′) = 0.01 (the false positive rate)
By Bayes' rule, P(S|I) = (0.25)(0.9) / [(0.25)(0.9) + (0.75)(0.01)] = 0.225/0.2325 ≈ 0.968
Supervised learning -> give some training data
Unsupervised learning -> no training data
Bayesian Classifier
P(Category 1 | evidence) ∝ P(Category 1) P(evidence | Category 1), computed via Bayes' rule for each category.
Monty Hall problem: 3 doors, one hiding a prize. You pick one, and one of the other doors is revealed to be a dud. Do you switch doors?
Yes: switching wins with probability 2/3.
Bayes’ Rule:
P(A|B) = P(A)P(B|A) / [P(A)P(B|A) + P(A′)P(B|A′)]
(product rule + law of total prob; A and A′ form a partition)
Def. A random variable (rv) is a function that maps points in a sample space to Real numbers
The values that the rv can take on are called the RANGE of the rv. By convention, we call rvs X,Y,Z and the values, x,y,z
We are interested in finding the prob. that a rv X takes on a particular value, i.e. P(X = x) (the outcome is a sample point that maps to x).
There are two types of rvs: discrete → finite or countably infinite range, vs continuous → uncountably infinite range.
More than one rv can be defined on the same S
eg roll 3 fair 6-sided dice.
X = sum on 3 dice {3, … , 18}
Y = avg value {1, … , 6}
Z = the 3 digit number created by the dice
W = the product of the 3 {1, 2, … , 216}
U = number of 3s {0, 1, 2, 3}
etc
Def. The probability function (pf) of a discrete rv X is f(x) = P(X = x), and it is only defined for x in the range of X; for x outside the range, f(x) is undefined or taken to be 0.
- Properties of f(x)
- 0 ≤ f(x) ≤ 1 for all x in the range. Why? It's a probability.
- Σ_x f(x) = 1, since the events “X = x” are mutually exclusive for each x and cover all outcomes.
Def. The cumulative distribution function (cdf) of the rv X is F(x) = P(X ≤ x) and is defined for ALL Real x.
Recall: Random variable (X) maps sample points to Real numbers
Probability function f(x) = P(X = x), for x in the range of X
Cumulative distribution function F(x) = P(X ≤ x), for all Real x
Example (X = # of 3s on 3 fair dice):
x | 0 | 1 | 2 | 3 |
---|---|---|---|---|
f(x) | 125/216 | 75/216 | 15/216 | 1/216 |
F(x) | 125/216 | 200/216 | 215/216 | 1 |
Let’s graph it to show what happens at other Real values of x.
Relationship between F and f
f(x) = the size of the jump in F(x) at the point x, i.e. f(x) = F(x) - F(x⁻), where x⁻ is the next smallest value in the range
We can use F(x) to find P(a < X ≤ b) = F(b) - F(a)
Let X be a rv on the range {a, a+1, …, b}, where a ≤ b are integers and each value is equally likely. Then we say X has a discrete uniform distribution on [a, b]; in other words, X ~ DU[a, b].
e.g. one roll of a fair die: X ~ DU[1, 6]
Find the pf: f(x) = P(X = x) = c, a constant, since all values are equally likely.
We need Σ f(x) = 1, so c(b - a + 1) = 1 and f(x) = 1/(b - a + 1).
Find the cdf: F(x) = (⌊x⌋ - a + 1)/(b - a + 1) for a ≤ x ≤ b (0 below a, 1 above b).
Example
You are looking through a linked list of 100 items for one particular item. Let X = # of comparisons to find it. Claim: X ~ DU [1, 100]
Set up: we have N objects -> r “successes”, N-r “Failures”
We choose n without replacement
Let X = “# of S’s chosen”
eg. if x=# of winning numbers on a lotto 6/49 ticket
X ~ Hyp(N,r,n)
N=49, r = 6, n =6
Find the pf: f(x) = P(X = x) = P(we get x S's and n-x F's) = C(r, x) C(N-r, n-x) / C(N, n)
range of x?
lower bound: x ≥ max(0, n - (N - r)), since it's possible to run out of F's, in which case the remaining selections must all be S's
upper bound: x ≤ min(n, r), since x ≤ n (all S's) and x ≤ r (could run out of S's)
so the range of X is messy and depends on the relationship btwn N, r, and n.
Also, there is no closed form expression for F(x). You have to add up the f(x) values which is computationally intensive when N is large.
Example:
You have 10 cards - 7 treasure, 3 non-treasure.
Draw 5 without replacement. Let X = “# of treasure cards”. X ~ Hyp(N=10, r=7, n=5)
f(x) = P(X = x) = C(7, x) C(3, 5-x) / C(10, 5) for x = 2, 3, 4, 5
All the values of f(x) will add up to 1
(proven using Hypergeometric series result)
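A Python sketch of the hypergeometric pf for this example, checking that the values do sum to 1:

```python
from math import comb

# Hyp(N=10, r=7, n=5): number of treasure cards in a 5-card draw
N, r, n = 10, 7, 5

def hyp_pf(x):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

pf = {x: hyp_pf(x) for x in range(max(0, n - (N - r)), min(n, r) + 1)}
print(pf)                # f(2) ... f(5)
print(sum(pf.values()))  # 1.0, as a pf must
```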
Set up: Bernoulli trials, i.e. independent trials with 2 outcomes on each (S or F), where P(S) = p is constant for all trials
We do n trials.
Let X = # of S in n trials
(You can imagine it as a Hypergeometric selecting with replacement instead of without.)
We write X ~ Bin(n, p)
Eg. Flip a fair coin 10 times, x = # heads, X ~ Bin (n = 10, p = 0.5)
Find the pf: f(x) = P(X = x) = P(we get x S's and n-x F's) = C(n, x) p^x (1-p)^(n-x) for x = 0, 1, …, n
Recall:
X ~ Hyp(N, r, n): X = # of S's among n objects chosen w/o replacement
X ~ Bin(n, p): X = # of S's in n independent Bernoulli trials
Example:
Want to send a 4-bit message. Each bit is independently flipped (0->1 or 1->0)
Probability = 0.1
P(message received correctly?)
Let X = # of bits flipped
X~Bin(4, 0.1)
P(X = 0) = (0.9)^4 = 0.6561
Now add 3 “parity bits” to the message which allows the receiver to detect and fix up to 1 error.
Let Y = # bits flipped ~ Bin(7, 0.1)
P(message ok) = P(Y = 0) + P(Y = 1) = (0.9)^7 + 7(0.1)(0.9)^6 ≈ 0.850, since both of those outcomes are correctable.
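A quick Python check of both message-success probabilities:

```python
from math import comb

def bin_pf(x, n, p):
    # binomial pf: C(n,x) p^x (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# no parity bits: all 4 bits must arrive unflipped
print(bin_pf(0, 4, 0.1))                      # 0.9^4 = 0.6561

# with 3 parity bits: 0 or 1 flips among 7 bits is recoverable
print(bin_pf(0, 7, 0.1) + bin_pf(1, 7, 0.1))  # about 0.8503
```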
If the sample is small relative to the population (i.e. n << N), then it doesn't make a big difference to the probabilities whether you sample with or without replacement. If we did it with replacement, the number of S's we get would be
X ~ Bin(n, p = r/N)
So when N is large and n is small, we can use a Bin(n, r/N) to approximate a Hyp(N, r, n). (Works well when n/N is less than 0.05.)
recall: Negative Binomial (5.5)
Bernoulli trials (indep., S or F, P(S) = p). X = # of F's before the kth S is obtained.
pf and examples of NB
Geometric rv (5.6)
How to tell when to use distributions
Bin | NB |
---|---|
know # trials (n) | know # successes (k) |
# of successes modelled by X | # of trials modelled by k + X |
We write X ~ NB(k, p)
range: {0, 1, 2, …}
We can show that f(x) = C(x+k-1, x) p^k (1-p)^x
But there is no closed form expression for F(x)
Example:
A startup is looking for 5 investors. Each investor will independently say yes with probability 0.2. Founders will ask investors one at a time until they get 5 yes. Let X=total # of investors asked. Find f(x) and f(10).
Let Y = # who say no (Y + 5 = X), so Y ~ NB(5, 0.2)
So f(x) = P(X = x) = P(Y = x - 5) = C(x-1, 4)(0.2)^5(0.8)^(x-5) for x = 5, 6, 7, …, and f(10) = C(9, 4)(0.2)^5(0.8)^5 ≈ 0.0132
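A Python check of f(10) using the NB pf:

```python
from math import comb

# NB pf: y failures ("no"s) before the k-th success ("yes"), P(yes) = p
def nb_pf(y, k, p):
    return comb(y + k - 1, y) * p**k * (1 - p)**y

# X = 10 investors asked means Y = 5 said no before the 5th yes
print(nb_pf(5, 5, 0.2))   # about 0.0132
```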
Geometric distribution (nothing to do with the hypergeometric):
a special case of the NB with k = 1.
X = # of F's before obtaining the first S in Bernoulli trials
X~Geo(p)
range: {0, 1, 2, …}
f(x) = P(X = x) = (1-p)^x p; no orderings to worry about since the sequence is just FF…FS (x F's, then one S)
It's easy to show Σ f(x) = 1 using the infinite geometric series formula
dist | Discrete Uniform | Hypergeometric | Bin | NB | Geometric | Poisson |
---|---|---|---|---|---|---|
pf f(x) | 1/(b-a+1) | C(r,x)C(N-r,n-x)/C(N,n) | C(n,x)p^x(1-p)^(n-x) | C(x+k-1,x)p^k(1-p)^x | p(1-p)^x | e^(-μ)μ^x/x! |
range | {a, a+1, …, b} | weird! | 0, …, n | 0, 1, … | 0, 1, … | 0, 1, 2, … |
cdf F(x) | closed form | no closed form | no | no | 1-(1-p)^(⌊x⌋+1) | no |
when to use? | equally likely values | counting # S w/o replacement | counting # S in n indep. trials | # F's before the kth S | # F's before the 1st S | Bin with large n, small p (approx) |
Recall: uniform, hypergeometric, binomial, NB, geometric
Poisson dist from Bin (5.7)
Poisson process (5.8)
We say X has a Poisson distribution with parameter μ (μ > 0) if f(x) = e^(-μ)μ^x/x! for x = 0, 1, 2, …
Easy to show Σ f(x) = 1, since Σ_x μ^x/x! = e^μ.
The Poisson is a limiting case of the Binomial when n → ∞ and p → 0 such that the product np remains constant.
Let np = μ.
Then lim C(n, x) p^x (1-p)^(n-x) = e^(-μ)μ^x/x! as n → ∞.
So if we have a Bin(n, p) with large n and small p, we can approximate it with a Poisson rv with parameter μ = np. Guideline: large n and p close to 0 works well!
Example:
Roll up the Rim - “1 in 9 cups win!” You buy 100 cups (treat as independent). Find prob you get 10 or fewer winning cups.
X = # of winning cups; X ~ Bin(100, 1/9)
So P(X ≤ 10) = Σ_{x=0}^{10} C(100, x)(1/9)^x(8/9)^(100-x)
Try the Poisson approx: μ = np = 100/9 ≈ 11.1, so P(X ≤ 10) ≈ Σ_{x=0}^{10} e^(-11.1)(11.1)^x/x!
Not a great approximation, since p = 1/9 ≈ 0.11 was a bit too high to be “close to 0”.
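A Python sketch comparing the exact binomial answer with the Poisson approximation, so you can see how far off it is:

```python
from math import comb, exp, factorial

n, p = 100, 1/9

# exact binomial: P(X <= 10)
bin_prob = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(11))

# Poisson approximation with mu = np
mu = n * p
poi_prob = sum(exp(-mu) * mu**x / factorial(x) for x in range(11))

print(bin_prob, poi_prob)   # the two values differ noticeably here
```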
Clicker question
Suppose you type at exactly 90 words per minute and on each word have a 1% chance of making an error. After 1 minute, what is the probability you have made NO errors?
0.405 from the Bin(90, 0.01): (0.99)^90. 0.407 from the Poisson approx: e^(-0.9).
You can also use the Poisson approximation when p is close to 1, by instead modelling the number of F's instead of S's.
Consider “events” occurring randomly throughout time/space according to 3 conditions: independence (counts in non-overlapping intervals are independent), individuality (events occur one at a time, never simultaneously), and homogeneity (events occur at a constant average rate). Examples:
E.g. emails into an inbox
Cars through an intersection
Births in a large population
Imagine we observe a Poisson process (with rate λ) for t units of time.
Let X = # of events that occur.
X is a discrete rv with no maximum.
It turns out that:
X ~ Poi(μ = λt)
i.e. f(x) = e^(-λt)(λt)^x/x! for x = 0, 1, 2, …
Proof: See course notes
Add one more column to your chart!
| Poisson |
---|---|
f(x) | e^(-μ)μ^x/x! |
range | 0, 1, 2, … |
when to use? | Bin with large n, small p (approx), or # of events in a Poisson process |
When NOT to use the Poisson?
- if we can specify a maximum number of events
- if it makes sense to ask how many times the event did not occur
Example:
requests to a web server follow the conditions of a Poisson process with rate 100 per minute.
Find prob of 1 request in 1 sec. 90 requests in 1 min.
Let X = # requests in 1 sec
X ~ Poi(μ = λt = (100/60)(1) = 5/3)
P(X = 1) = e^(-5/3)(5/3)/1! ≈ 0.315
Let Y = # requests in 1 min
Y ~ Poi(100)
P(Y = 90) = e^(-100)(100)^90/90!
Many problems may combine more than one distribution. Your task is to identify the distribution needed, depending on the probability requested.
Example: Server requests 100/min
A 1-second period is “quiet” if it contains no requests.
a) Find the Prob of a “quiet” second.
X = # requests in 1 sec; X ~ Poi(5/3)
P(X = 0) = e^(-5/3) ≈ 0.189
b) Prob of 10 “quiet” seconds in a minute (60 non-overlapping sec)
Let Y = # “quiet” in 60 sec.
Y~Bin(60,0.189)
P(Y = 10) = C(60, 10)(0.189)^10(0.811)^50
c) Prob of having to wait 30 non-overlapping sec to get 2 “quiet”
Let Z = # non-quiet sec before 2 “quiet”
Z ~ NB(2, 0.189)
P(Z = 28) = C(29, 28)(0.189)^2(0.811)^28
d) Given (c), find the prob there is exactly 1 “quiet” sec in the first 15 sec.
P(1 Q in first 15 sec | wait 30 sec for 2 Q) = P(1 Q in first 15 sec AND wait 30 for 2 Q) / P(wait 30 for 2 Q)
Use a binomial for the first probability and a geometric for the second.
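A Python sketch of parts (a) through (c):

```python
from math import comb, exp

# a) P("quiet" second): X ~ Poi(mu = 100/60 requests per second)
q = exp(-100 / 60)                     # about 0.189

# b) exactly 10 quiet seconds out of 60: Y ~ Bin(60, q)
b = comb(60, 10) * q**10 * (1 - q)**50

# c) 28 non-quiet seconds before the 2nd quiet one: Z ~ NB(k=2, p=q)
c = comb(29, 28) * q**2 * (1 - q)**28

print(q, b, c)
```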
Let X = # of kids in a family
To summarize the data, we can use:
1. A frequency distribution
x | frequency |
---|---|
1 | 10 |
2 | 14 |
3 | 10 |
4 | 1 |
5 | 2 |
2. a frequency histogram (graph omitted)
3. A single number representing the average or sample mean
4. median - the middle value: 2 in this case
5. mode - most common/frequent value (in this case, 2)
Expectation of a r.v. (7.2)
We had the sample mean of the kids be x̄ = (1·10 + 2·14 + 3·10 + 4·1 + 5·2)/37 = 82/37 ≈ 2.22
We can replace the observed relative frequency with the theoretical probability of the r.v. equalling x → theoretical mean
2011 census:
x | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
f(x) | 0.43 | 0.4 | 0.12 | 0.04 | 0.01 |
so the theoretical mean of X is E[X] = 1(0.43) + 2(0.40) + 3(0.12) + 4(0.04) + 5(0.01) = 1.80
and the median is 2, since F(2) = 0.83 ≥ 0.5 while F(1) = 0.43 < 0.5
and mode is 1
Def: The expected value (or expectation, or mean) of a discrete r.v. X is E[X] = Σ_x x f(x)
Recall: the expected value (aka mean) of X is E[X] = μ = Σ_x x f(x)
Can think of it as a weighted average of the values X can take, with weights = probabilities, or as the balance point of the histogram of f(x). Often we may be interested in the average value of a function of X. E.g. X = usage on phone, g(X) = cost of that usage.
Def: the expected value of g(X) for a discrete r.v. X is E[g(X)] = Σ_x g(x) f(x), a
weighted average of the g(x) values that can occur. (e.g. g(X) = 1000 + 250x)
What if g is non-linear, e.g. g(X) = X²?
Here E[g(X)] ≠ g(E[X]), because g(x) is a non-linear function of X. Expectation is a linear operator,
SO ONLY USE E[g(X)] = g(E[X]) IF g(X) IS LINEAR
Also: E[aX + b] = aE[X] + b
If we have the distribution of X and we let Y = g(X), we can find E[Y] either by E[Y] = Σ_x g(x) f_X(x), or by finding the range and pf of Y and using E[Y] = Σ_y y f_Y(y).
Example
Suppose the time to finish a part of a coding question is 10 minutes if you make no errors. Syntax errors take 2 extra mins to fix; logic errors take 10 extra min. Assume P(syntax error) = 0.1 and P(logic error) = 0.2, independently.
Find the average(expected) time to finish this question.
Let X = 0 (no errors), 1 (syntax only), 2 (logic only), 3 (both).
x | 0 | 1 | 2 | 3 |
---|---|---|---|---|
f(x) | (0.9)(0.8) = 0.72 | (0.1)(0.8) = 0.08 | (0.9)(0.2) = 0.18 | (0.1)(0.2) = 0.02 |
time g(x) | 10 | 12 | 20 | 22 |
So E[time] = 10(0.72) + 12(0.08) + 20(0.18) + 22(0.02) = 12.2 minutes.
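A quick Python check of this E[g(X)] computation:

```python
# E[g(X)] for the coding-question example: a weighted average of times
f = {0: 0.72, 1: 0.08, 2: 0.18, 3: 0.02}   # pf of X
g = {0: 10, 1: 12, 2: 20, 3: 22}           # completion time (minutes)

print(sum(g[x] * f[x] for x in f))         # 12.2 minutes
```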
Example:
A web server has a cache. 20% chance that the request is found in the cache (cache hit) -> 10 ms. If it’s not found (cache miss) then it takes 50(send msg) + 70(lookup) + 50(return answer) ms. Find the expected time with and without the cache.
Without: the time is always 50 + 70 + 50 = 170 ms, so E[T] = 170.
With: Let X = 0 if found, 1 if not found
x | 0 | 1 |
---|---|---|
f(x) | 0.2 | 0.8 |
time g(x) | 10 | 180 |
So with the cache, E[T] = 0.2(10) + 0.8(180) = 146 ms (a miss costs the 10 ms cache check plus the full 170 ms), which beats 170 ms.
Recall:
The mean of X, E[X], tells us where the distribution is centred, on average. But in practice we also care about how widely spread out the distribution is around that mean.
E.g. determining the number of servers for an online system. need to know spread.
How to measure?
- E[X - μ] = 0 by linearity (not helpful!)
- E|X - μ|, the average absolute distance from the mean (hard to work with mathematically)
So we use:
- E[(X - μ)²], the average squared distance from the mean
Def: The variance of a r.v. X is Var(X) = E[(X - μ)²],
a weighted average squared distance from the mean. If X is discrete, Var(X) = Σ_x (x - μ)² f(x).
Calc form of variance: Var(X) = E[X²] - μ²
where μ = E[X].
Example:
X has pf
x | 10 | 12 | 20 | 22 |
---|---|---|---|---|
f(x) | 0.72 | 0.08 | 0.18 | 0.02 |
We found μ = E[X] = 12.2 min.
Now E[X²] = 10²(0.72) + 12²(0.08) + 20²(0.18) + 22²(0.02) = 165.2, so Var(X) = 165.2 - (12.2)² = 16.36.
Note that variance is measured in units squared rather than the original units of X. Taking the square root of the variance makes more sense.
Defn: the standard deviation of X is sd(X) = √Var(X)
In our example, sd(X) = √16.36 ≈ 4.04 min
Remember, Var(X) will always be non-negative because it's a weighted average of squared values.
So SD(X) will also be ≥ 0 and a Real number.
E[aX + b] = aE[X] + b, by linearity of expectation.
Note: b does not affect the variance, since shifting the distribution doesn't affect the spread. Multiplying by a increases all the distances from the mean by a factor of |a|, so the squared distances increase by a².
Finally, Var(aX + b) = a² Var(X).
Example
Suppose X has pf
x | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
f(x) | 0.1 | 0.1 | 0.1 | 0.5 | 0.2 |
y | 1 | 3 | 5 | 7 | 9 |
Let Y = 2X + 1 (so the y values above are y = 2x + 1).
But! We can calculate the Var(Y) in a different way!
Verify: E[X] = 2.6 and Var(X) = 1.44, so E[Y] = 2(2.6) + 1 = 6.2 and Var(Y) = 2²(1.44) = 5.76. Computing directly from the pf of Y: E[Y] = Σ y f(y) = 6.2 and Var(Y) = E[Y²] - (6.2)² = 44.2 - 38.44 = 5.76, the same.
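A Python check that both routes give the same Var(Y):

```python
# verify Var(aX + b) = a^2 Var(X) for Y = 2X + 1, two different ways
f = {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.5, 4: 0.2}   # pf of X

EX = sum(x * p for x, p in f.items())            # 2.6
EX2 = sum(x**2 * p for x, p in f.items())        # 8.2
print(4 * (EX2 - EX**2))                         # 4 * Var(X) = 5.76

# directly from Y's values y = 2x + 1
EY = sum((2*x + 1) * p for x, p in f.items())        # 6.2
EY2 = sum((2*x + 1)**2 * p for x, p in f.items())    # 44.2
print(EY2 - EY**2)                                   # 5.76
```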
Note: the variance is small when the distribution is concentrated near the mean, i.e. there is less uncertainty.
Recall: a continuous rv has an uncountable range, and P(X = x) = 0 for any single x. The cdf F(x) = P(X ≤ x) is defined the same as before.
Claim: F(x) is continuous for a cts rv X. (Proof: show F(x) is left cts, and similarly right cts, so it is everywhere continuous; details omitted.) It is not necessarily everywhere differentiable, and as before it is non-decreasing.
Properties of F(x)
Probability density function f(x)
expectation + variance, percentiles
transformations
Imagine f(x)dx as the area of a thin rectangle. Note: f(x) is NOT a probability!
a) Let X = # who add in 1 day
X ~ Poi(μ)
We know P(X=0) = 0.1353
But P(X = 0) = e^(-μ)
Equating and solving: e^(-μ) = 0.1353, so μ = -ln(0.1353) = 2
b)
This looks like a binomial; we could have used the binomial from the beginning and gotten the same result.
c)
d) Y = # bad days out of 7
Y~Bin(7,0.41)
e) Let Z = # good before first bad
Z ~ Geo(0.41)
a)
b)
general
c) range of R: {0, 1, 2, 3}
from before
Similarly, and
So
d) We know Σ_{x=0}^∞ r^x = 1/(1 - r) for |r| < 1.
Take d/dr of both sides: Σ x r^(x-1) = 1/(1 - r)².
Now E[X] = Σ x p(1-p)^x = p(1-p)/p² = (1-p)/p.
Why?
Waiting time is inversely proportional to the prob of success.
x | 3 | 4 | 5 | 6 |
---|---|---|---|---|
f(x) | 0.02 | 0.13 | 0.75 | 0.1 |
Suppose the hours of schoolwork you have each week are 20√X and your stress level is 2X².
a) Find the expected schoolwork hours and the expected stress level of a random student.
b) Find the variance of these quantities
a) Let Y = 20√X (schoolwork hours) and Z = 2X² (stress level).
x | f(x) | y = 20√x | z = 2x² | y² | z² |
---|---|---|---|---|---|
3 | 0.02 | 34.64 | 18 | 1200 | 324 |
4 | 0.13 | 40 | 32 | 1600 | 1024 |
5 | 0.75 | 44.72 | 50 | 2000 | 2500 |
6 | 0.1 | 48.99 | 72 | 2400 | 5184 |
E[Y] ≈ 44.33 and E[Y²] = 1972 → Var(Y) = 1972 - (44.33)² ≈ 6.60
E[Z] = 49.22 and E[Z²] = 2533 → Var(Z) = 2533 - (49.22)² ≈ 110.4
f(x) = ksqrtx for 0 < x < 1
f(x) = (k/2)(3 - x) for 1 < x < 3
f(x) = 0 otherwise
(graph on LEARN)
a) We need ∫ f(x) dx = 1: ∫₀¹ k√x dx + ∫₁³ (k/2)(3 - x) dx = (2/3)k + k = (5/3)k = 1
So k = 3/5
recall: cdf, pdf, mean, percentiles of cts X
Uniform distribution (8.2)
transformations / change of variable (8.1)
SWAG: computer -generated random numbers
Example:
A cont rv X has pdf f(x) = for 0 < x < 2 and 0 otherwise
Find c:
f(x) = F’(x)
all x
so
So F(x) = 0 for x < 0
and F(x) = 1 for x > 2
Find P(X>1)
Find E[X]
E[X] =
Def: If a cts rv X takes values on (a, b), a < b ∈ ℝ, such that all subintervals of fixed length are equally likely, we say X ~ U(a, b). The endpoints a and b can be included or excluded; it doesn't matter.
Find f(x) = c (a constant), but we need ∫ₐᵇ c dx = 1, so c = 1/(b - a).
The cdf is F(x) = (x - a)/(b - a) for a ≤ x ≤ b
So F(x) = 0 for x < a
F(x) = 1 for x > b
So F(x) is not differentiable at a or b.
E[X] = (a + b)/2 (the average of the endpoints)
Similarly, Var(X) = (b - a)²/12 (proof as exercise. Maybe on final :^))
Makes sense that it’s proportional to the square of the range.
Special case:
a = 0, b = 1: U ~ U(0, 1)
f(u) = 1 for 0 < u < 1, and f(u) = 0 otherwise
If we have the distribution of X but we want Y = h(X), we can find the cdf/pdf of Y by changing the variable in 3 steps: (1) write F_Y(y) = P(h(X) ≤ y) as a probability about X; (2) substitute the known cdf of X; (3) differentiate to get f_Y(y) and find the range of Y.
Example:
Let X = the number a spinner lands on, between 0 and 4.
X~U(0,4)
Let and find
Step 1:
Step 2: We know F_X(x) = 0 for x < 0
F_X(x) = x/4 for 0 ≤ x ≤ 4
F_X(x) = 1 for x > 4
So
and
Step 3: range of Y:
So for
otherwise
Note: if h(X) is invertible (1-1) we can differentiate after step 1 using the chain rule.
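The transformation used in the worked example above wasn't recorded, so here is a Monte Carlo sketch of the cdf method under an assumed, purely illustrative choice Y = X² with X ~ U(0, 4):

```python
import numpy as np

# hypothetical transformation Y = X^2, X ~ U(0, 4)
# by the 3 steps: F_Y(y) = P(X^2 <= y) = P(X <= sqrt(y)) = sqrt(y)/4
rng = np.random.default_rng(1)
y = rng.uniform(0, 4, 1_000_000) ** 2

for t in (1, 4, 9):
    print(np.mean(y <= t), np.sqrt(t) / 4)   # simulated vs derived cdf
```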
Def: The exponential distribution is a cts distribution defined for all positive Real numbers. It represents the waiting time between events in a Poisson process.
Find the cdf of X = time until the next event in a Poisson process with rate λ.
F(x) = P(X ≤ x) = 1 - P(X > x) = 1 - P(next event occurs more than x time from now)
= 1 - P(0 events in (0, x))
but the # of events in (0, x) in a Poisson process is Poi(λx)
= 1 - e^(-λx)
Thus f(x) = F′(x) = λe^(-λx) for x > 0
Alternate parameterization: let θ = 1/λ (θ > 0). Then
f(x) = (1/θ)e^(-x/θ) for x > 0.
We say X ~ Exp(θ).
E[X] = ∫₀^∞ x (1/θ) e^(-x/θ) dx
Use integration by parts, OR we can use a trick called the Gamma function.
Def: Γ(α) = ∫₀^∞ x^(α-1) e^(-x) dx, for α > 0
Properties:
If α > 1, Γ(α) = (α - 1)Γ(α - 1)
If α = 1, Γ(1) = 1
If α is a positive integer, Γ(α) = (α - 1)!
So E[X] = ∫₀^∞ (x/θ) e^(-x/θ) dx
Substitution: let y = x/θ, so x = θy and dx = θ dy
So: E[X] = θ ∫₀^∞ y e^(-y) dy = θ Γ(2) = θ(1!) = θ
Remember: λ = rate of events in the Poisson process
θ = 1/λ = average waiting time between events
So it makes sense that they are inversely related
E[X²] = θ² Γ(3) = 2θ² (same Gamma trick),
so Var(X) = 2θ² - θ² = θ²,
i.e. sd(X) = θ.
For the Poisson, mean = variance = parameter μ.
For the exponential, mean = sd = parameter θ.
Example:
Requests arrive at a server with an exp. distribution of wait times between them. 50% of the time, the wait is at least 15 sec.
a) Find avg time between requests
Let X = time between requests ~ Exp(θ)
We know P(X > 15) = 0.5
So 1 - F(15) = 0.5
So e^(-15/θ) = 0.5
So θ = 15/ln(2) ≈ 21.64 sec
b) If a request has just come in, find the prob. another one comes in within 5 sec.
P(X < 5) = F(5) = 1 - e^(-5/21.64) ≈ 0.206
Could also do with Poisson rv.
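A quick Python check of both parts:

```python
from math import exp, log

# solve P(X > 15) = 0.5 for theta: exp(-15/theta) = 0.5
theta = 15 / log(2)        # about 21.64 sec between requests on average

# b) P(another request within 5 sec) = F(5)
print(theta, 1 - exp(-5 / theta))   # about 0.206
```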
The memoryless property
Example:
Suppose buses arrive according to a Poisson process with rate 5/hr.
Find P(wait > 15 mins)
Measure time in minutes: λ = 5/60 = 1/12 events per min, so θ = 12 min.
X = waiting time ~ Exp(12)
P(X > 15) = 1 - (1 - e^(-15/12)) = e^(-1.25) ≈ 0.287
c) Given you've already waited 6 min, find the prob you wait more than 15 more min,
i.e. P(X > 21 | X > 6) = P(X > 21)/P(X > 6) = e^(-21/12)/e^(-6/12) = e^(-15/12) ≈ 0.287, the same as before.
The waiting already done doesn't change the future wait time.
Recall: X ~ Exp(θ) is the waiting time between events in a Poisson process with rate λ = 1/θ (a cts rv)
E[X] = θ, Var(X) = θ²
Memoryless property
If X ~ Exp(θ):
P(X > t + s | X > t) = P(X > s)
Any waiting time already incurred is irrelevant in determining the remaining waiting time. This makes the exponential a bad choice for modelling human lifetimes.
Despite this we often use it in the short term for lifetimes of electronic or mechanical components.
Many real life phenomena seem to follow a Normal distribution.
- heights/weights
- test scores on large standardized tests
- logs of stock returns
- measurement errors
Def: We say X has a Normal distribution, X ~ N(μ, σ²), if its pdf is:
f(x) = (1/(σ√(2π))) e^(-(x-μ)²/(2σ²)) for all Real x
What does it look like?
- symmetric around μ
- both left and right “tails” go to 0 quickly
- highest when x = μ
- area under curve = 1
We can show E[X] = μ using a change of variable and the gamma function trick; the median and mode are also μ.
And Var(X) = σ² (the proof again uses the gamma f'n).
The cdf is F(x) = ∫ from -∞ to x of f(t) dt.
We cannot evaluate the integral in closed form! Numerical methods are needed.
Note: the Normal distribution is also called the Gaussian: X ~ N(μ, σ²) = X ~ G(μ, σ) ← where σ is the SD
Standard Normal rv (special case)
Let μ = 0, σ² = 1:
Z ~ N(0, 1)
The standard Normal pdf is φ(z) = (1/√(2π)) e^(-z²/2) for all Real z
The standard Normal cdf Φ(z) = P(Z ≤ z)
still cannot be integrated analytically, but tables of values of Φ(z) are readily available.
Using N(0,1) tables
row -> ones column and first decimal place
column -> second decimal place
e.g. P(Z < 3.14) = 0.99916
To find F(z) where z < 0, use the fact that the Normal dist is symmetric: F(-z) = 1 - F(z).
Example:
Suppose a voltage of +2 or -2 is sent down a wire to represent 1 or 0. The connection is noisy and adds a N(0,1) amount to the voltage sent. The receiver interprets any voltage > +0.5 as a 1, and as a 0 otherwise.
Find P(error| 1 sent)
Let R = voltage received = 2 + Z where Z~N(0,1) is the noise
P(R < 0.5) = P(2 + Z < 0.5) = P(Z < -1.5)
=1 - P( Z < +1.5)
= 1 - 0.93319
= 0.06681
P(error | 0 sent): now R = -2 + Z
P(R > 0.5) = P(Z > 2.5) = 1 - P(Z < 2.5) = 1 - 0.99379 = 0.00621
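A Python check using the standard normal cdf via the error function (no tables needed):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(1 - phi(1.5))   # P(error | 1 sent), about 0.06681
print(1 - phi(2.5))   # P(error | 0 sent), about 0.00621
```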
Finding percentiles of N(0,1): the pth percentile z_p is the value such that F(z_p) = p.
F(z_p) = p cannot be solved analytically, so we use the F(z) table.
Recall: Normal dist, probabilities + percentiles of N(0,1)
Transforming a Normal variable
Probabilities + percentiles of N(μ, σ²)
Also recall: X ~ N(μ, σ²)
Standard Normal Z ~ N(0, 1)
F(z) = P(Z ≤ z) in tables (use symmetry for negative values)
Also, finding percentiles of N(0,1):
- can look for p in the main table and find the row + column closest to it, so that P(Z ≤ z_p) = p
- or can look up p in the reverse Normal table (at the bottom)
Example:
Find c such that P(-c ≤ Z ≤ c) = 0.8.
Since the Normal dist is symmetric, we look for the c that covers 90% below it (the 20% outside the interval splits into 10% in each tail).
So c is the 90th percentile of N(0,1).
looking it up, c = 1.2816
In practice we want information about a general Normal, not just N(0,1).
Suppose X ~ N(μ, σ²).
Let Z = (X - μ)/σ (a linear function of X).
Claim Z~N(0,1)
Proof
Which is the pdf of a N(0,1) rv
Examples
Heights of adult men are N(68, σ²), in inches. Find the prob of a person being at least 77 in tall.
Let X = height of a random person from this population.
P(X ≥ 77) = P((X - 68)/σ ≥ (77 - 68)/σ) = P(Z ≥ 9/σ)
Example 2
Scores on the SAT reading test follow a N(504, σ²). What score s is required to be in the top 10%?
Find s such that P(X > s) = 0.1.
So set (s - 504)/σ = z_0.90 = 1.2816, giving s = 504 + 1.2816σ.
In general, if X ~ N(μ, σ²),
the pth percentile of X is x_p = μ + σ z_p,
where z_p is the pth percentile of Z ~ N(0,1).
We have models (both discrete and continuous) for a single random quantity
But we are often interested in two or more rvs at the same time.
Eg. returns on two stocks X and Y
heights and weights
in a board game, # of a suit and # of a rank
treatment vs recovery
runtime vs algorithm used
In this course we’ll focus on the discrete case, but the continuous case is similar
Def:
For two discrete rvs X and Y, the joint pf is f(x,y)=P(X=x , Y=y)
i.e. the prob that X takes the value x AND Y takes the value y at the same time. Defined where x ∈ range of X and y ∈ range of Y. In general, the joint pf of X1, …, Xn is f(x1, …, xn) = P(X1 = x1, …, Xn = xn).
Similarly to the single-variable case, f(x,y) can be displayed as a table or as a function of x and y.
Example:
Experiment: toss a fair coin 3 times
Let X = # heads
Y = 1 if the first toss is heads, 0 otherwise
find f(x,y)
f(x,y) | x = 0 | x = 1 | x = 2 | x = 3 |
---|---|---|---|---|
y = 0 | 1/8 | 2/8 | 1/8 | 0 |
y = 1 | 0 | 1/8 | 2/8 | 1/8 |
Note: Σ_x Σ_y f(x, y) = 1.
what if we only want info about X?
e.g.
P(X=0) = P(X=0, Y=0) + P(X=0, Y=1) = f(0,0) + f(0,1) = 1/8 + 0 = 1/8
etc
We get the pf of X by summing over the values of Y.
Similarly we could find the pf of Y by summing over the values of X.
Def: The marginal pf of X is f_X(x) = Σ_y f(x, y), and the marginal pf of Y is f_Y(y) = Σ_x f(x, y)
Note: f_X and f_Y will automatically satisfy the conditions for a pf
In general, for X1, …, Xn, the marginal pf of X1 is f(x1) = the sum of f(x1, …, xn) over all values of the other variables
Recall: two events A and B are independent iff P(AB) = P(A)P(B). We extend this idea to rvs.
Def: Two discrete rvs X and Y are independent iff f(x, y) = f_X(x) f_Y(y) for all possible pairs (x, y)
Note: to show dependence, it suffices to find one pair (x, y) where this doesn't hold; to show independence, check all pairs.
In general, X1, …, Xn are independent of each other iff f(x1, …, xn) = f_X1(x1) ⋯ f_Xn(xn) for all n-tuples
In our example, f(0,1) = 0, but f_X(0) f_Y(1) = (1/8)(1/2) = 1/16 ≠ 0,
therefore X and Y are not independent.
A quick way to check for dependence is to look for 0's in the table.
In other words, if the range is not a rectangle (formally, a Cartesian product), then we know that the variables are dependent.
Remember conditional probability
We can extend to rvs
Def: The conditional pf of X given Y = y is f(x|y) = P(X = x | Y = y) = f(x, y)/f_Y(y)
Similarly, f(y|x) = f(x, y)/f_X(x)
We may often be interested in the sum of X and Y or any function of any number of rvs.
eg. T = X + Y, U = 2(Y-X)
We can find the pf of the new rv from the joint pf as follows: e.g. for T = X + Y, f_T(t) = P(T = t) = Σ f(x, y), summing over all pairs (x, y) with x + y = t.
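A Python sketch tying these ideas together, using the joint table reconstructed above (marginals, the independence check, and the pf of T = X + Y):

```python
from fractions import Fraction as Fr
from collections import defaultdict

# joint pf of X = # heads in 3 tosses, Y = 1 if the first toss is heads
f = {(0, 0): Fr(1, 8), (1, 0): Fr(2, 8), (2, 0): Fr(1, 8),
     (1, 1): Fr(1, 8), (2, 1): Fr(2, 8), (3, 1): Fr(1, 8)}

fx, fy = defaultdict(Fr), defaultdict(Fr)    # marginals
for (x, y), p in f.items():
    fx[x] += p
    fy[y] += p

# one failing pair proves dependence: f(0,1) = 0 but fX(0)fY(1) = 1/16
print(f.get((0, 1), Fr(0)), fx[0] * fy[1])

ft = defaultdict(Fr)                          # pf of T = X + Y
for (x, y), p in f.items():
    ft[x + y] += p
print(dict(ft))
```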