2. Discrete random variables

If the outcome of a chance experiment is a number, it is called a random variable.

Discrete random variable: the possible outcomes can be listed e.g. 0, 1, 2, … .

Notation for random variables: capital letter near the end of the alphabet e.g. X, Y.

P(X = k) denotes “the probability that X takes the value k”.

Note: 0 ≤ P(X = k) ≤ 1 for all k, and Σ_k P(X = k) = 1.

We are just going to look at two discrete distributions in detail.

Preliminaries

Factorials: for a positive integer k, k! = k(k-1)(k-2) … 2 x 1.

e.g. 3! = 3 x 2 x 1 = 6.

By definition, 0! = 1.

Combinatorials: for integers n and k with n ≥ k ≥ 0, C(n, k) = n!/(k!(n-k)!) = number of ways of choosing k things from n.

For example, in the National Lottery, the number of ways of choosing 6 numbers from 49 (1, 2, … , 49) is:

C(49, 6) = 49!/(6! 43!) = 13,983,816.

Binomial distribution
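The factorial and combinatorial definitions above can be checked with a short Python sketch (this uses only the standard-library math module; it is an illustration, not part of the original notes):

```python
from math import comb, factorial

# k! = k(k-1)...2 x 1, with 0! = 1 by definition
assert factorial(3) == 6
assert factorial(0) == 1

# C(n, k) = n! / (k! (n-k)!): number of ways of choosing k things from n
def choose(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

# National Lottery: choosing 6 numbers from 49
print(choose(49, 6))                 # 13983816
assert choose(49, 6) == comb(49, 6)  # agrees with the built-in
```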

Bernoulli trial: a process with two possible outcomes, “success” and “failure”.

e.g. coin tossing: Heads or Tails

quality control: Quality Satisfactory or Unsatisfactory

An experiment consists of n independent Bernoulli trials, with p = probability of success on each trial. Let X = total number of successes in the n trials.

Then P(X = k) = C(n, k) p^k (1-p)^(n-k) for k = 0, 1, 2, … , n.

This is called the Binomial distribution with parameters n and p, or B(n, p) for short.

X ~ B(n, p) stands for “X has the Binomial distribution with parameters n and p.”
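The B(n, p) probabilities can be computed directly from the formula above; a minimal sketch (the choice n = 10, p = 0.3 is just for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The probabilities over k = 0, 1, ..., n sum to 1
probs = [binom_pmf(k, 10, 0.3) for k in range(11)]
print(round(sum(probs), 10))   # 1.0 (up to floating-point rounding)
```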

Rough justification

“X = k” means k successes (each with probability p) and n-k failures (each with probability 1-p). Suppose for the moment all the successes come first:

probability = p x p x … x p x (1-p) x (1-p) x … x (1-p) (by independence)

= p^k (1-p)^(n-k)

In fact, you get this probability whatever the ordering of successes and failures.

Number of ways of ordering k successes out of n trials = C(n, k).

By the special addition rule, overall probability = p^k(1-p)^(n-k) + p^k(1-p)^(n-k) + … + p^k(1-p)^(n-k)  (C(n, k) terms)

= C(n, k) p^k (1-p)^(n-k).

Example 2.1

Each of 5 engines, independently, will need to be returned with probability 0.2. What is the probability that more than one will be returned?

Solution

Let X = number of engines which will need to be returned.

Bernoulli trial: each engine, it’s either returned or not.

X ~ B(5, 0.2)

P(More than one engine returned) = P(X > 1)

= 1 – P(X ≤ 1) (complements rule)

= 1 – P(X = 0) – P(X = 1)

= 1 – C(5, 0)(0.2)^0(0.8)^5 – C(5, 1)(0.2)^1(0.8)^4 = 1 – (1 x 1 x 0.8^5) – (5 x 0.2 x 0.8^4)

= 1 – 0.32768 – 0.4096 = 0.263.
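The arithmetic in Example 2.1 can be verified in a few lines of Python:

```python
from math import comb

n, p = 5, 0.2
p0 = comb(n, 0) * p**0 * (1 - p)**5   # P(X = 0) = 0.8^5 = 0.32768
p1 = comb(n, 1) * p**1 * (1 - p)**4   # P(X = 1) = 5 x 0.2 x 0.8^4 = 0.4096
print(round(1 - p0 - p1, 3))          # 0.263
```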

Situations where a Binomial might occur

1) Quality control: select n items at random; X = number found to be satisfactory.

2) Survey of n people about products A and B; X = number preferring A.

3) Telecommunications: n messages; X = number with an invalid address.

Poisson distribution

A random variable Y has the Poisson distribution with parameter λ (λ > 0) if:

P(Y = k) = e^(-λ) λ^k / k!  (k = 0, 1, 2, … )

Occurrence of the Poisson distribution

1) As an approximation to B(n, p), when n is large and p is small (e.g. if np < 7, say); in this case, if X ~ B(n, p) then P(X = k) ≈ e^(-λ) λ^k / k! where λ = np, i.e. X is approximately Poisson, parameter np.
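The quality of the Poisson approximation to B(n, p) can be seen numerically; a sketch (the values n = 1000, p = 0.005 are chosen only to illustrate "n large, p small"):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# n large, p small: B(n, p) is close to Poisson with lambda = np
n, p = 1000, 0.005        # lambda = np = 5
for k in range(4):
    # the two columns should be close for each k
    print(k, round(binom_pmf(k, n, p), 6), round(poisson_pmf(k, n * p), 6))
```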

2) Associated with a Poisson process. A Poisson process, rate λ, consists of occurrences which happen independently of one another and at random, the probability of an occurrence happening in a small interval of length δt being λδt.

Then, if Y = number of occurrences in an interval of length t, Y ~ Poisson, λ = λt.

Examples of possible Poisson processes

1) Arrivals of messages at a telecommunications system.

2) Occurrence of flaws in a fibre.

3) Times at which vehicles pass a census point.

4) Times at which radioactive particles are emitted by a radioactive rock.

Example 2.2

Each of 5,000,000 components, independently, fails within ten years with probability 10^-6. What is the probability that three or more fail in ten years?

Solution

Let X = number failing in ten years, out of 5,000,000.

X ~ B(5000000, 10^-6)

Evaluating the Binomial probabilities is rather awkward; better to use the Poisson approximation.

X is approximately Poisson, λ = np = 5000000 x 10^-6 = 5.0.

P(Three or more fail) = P(X ≥ 3) = 1 – P(X = 0) – P(X = 1) – P(X = 2)

= 1 – e^-5 (5^0/0! + 5^1/1! + 5^2/2!) = 1 – e^-5 (1 + 5 + 12.5) = 0.875
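A quick numerical check of Example 2.2 using the Poisson approximation:

```python
from math import exp

lam = 5_000_000 * 1e-6                      # lambda = np = 5.0
p_less_than_3 = exp(-lam) * (1 + lam + lam**2 / 2)   # P(X=0) + P(X=1) + P(X=2)
print(round(1 - p_less_than_3, 3))          # 0.875
```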

Example 2.3

Messages arrive at a telecommunications centre at random, at an average rate of 1.2 per second.

(a) Find the probability of 5 messages arriving in a 2-sec interval.

(b) For how long can the operation of the centre be interrupted, if the probability of losing one or more messages is to be no more than 0.05?

Solution

Times of arrivals form a Poisson process, rate λ = 1.2/sec.

(a) Let Y = number of messages arriving in a 2-sec interval.

Then Y ~ Poisson, λ = λt = 1.2 x 2 = 2.4.

P(Y = 5) = e^-2.4 (2.4)^5 / 5! = 0.060.

(b) Let the required time = t seconds.

Let W = number of messages in t seconds, so that W ~ Poisson, λ = 1.2 x t = 1.2t.

P(At least one message) = P(W ≥ 1) = 1 – P(W = 0) = 1 – e^-1.2t ≤ 0.05.

e^-1.2t ≥ 0.95

-1.2t ≥ ln(0.95) = -0.05129

t ≤ 0.043 seconds.
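Both parts of Example 2.3 can be checked numerically:

```python
from math import exp, factorial, log

rate = 1.2                                # messages per second

# (a) Y ~ Poisson with parameter 1.2 x 2 = 2.4
lam = rate * 2
p5 = exp(-lam) * lam**5 / factorial(5)
print(f"{p5:.3f}")                        # 0.060

# (b) need 1 - e^(-1.2 t) <= 0.05, i.e. t <= -ln(0.95)/1.2
t_max = -log(0.95) / rate
print(f"{t_max:.4f}")                     # 0.0427 (seconds)
```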

Mean (or expected value) of a distribution

For a random variable X taking values 0, 1, 2, … , the mean value of X is:

μ = Σ_k k P(X = k) = 0 x P(X = 0) + 1 x P(X = 1) + 2 x P(X = 2) + …

The mean is also called: population mean

expected value of X (or E(X))

expectation of X.

Intuitive idea: if X is observed in repeated independent experiments and X̄ is the sample mean after n observations (= (X1 + X2 + … + Xn)/n), then as n gets bigger, X̄ tends to μ.
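This intuitive idea can be illustrated by simulation; a sketch using a made-up distribution on {0, 1, 2} (the values and probabilities are purely illustrative):

```python
import random

# A toy distribution on {0, 1, 2}; values and probabilities are made up
values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]

# Population mean: mu = sum of k P(X = k)
mu = sum(v * p for v, p in zip(values, probs))   # 0x0.5 + 1x0.3 + 2x0.2 = 0.7

# Sample mean after n independent observations
random.seed(1)
n = 100_000
xbar = sum(random.choices(values, probs, k=n)) / n
print(mu, round(xbar, 3))   # xbar should be close to mu = 0.7
```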

Variance and standard deviation of a distribution

μ is a measure of the “average value” of a distribution.

The standard deviation, σ, is a measure of how spread out the distribution is.

Variance = σ² = var(X)

= E[(X – μ)²]  (definition)

= E(X²) – μ²  (often easier to evaluate in practice).
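The two forms of the variance can be seen to agree numerically; a sketch for a made-up distribution on {0, 1, 2}:

```python
# Variance of a discrete distribution computed both ways
# (values and probabilities are made up for illustration)
values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]

mu = sum(v * p for v, p in zip(values, probs))                    # E(X)
var_def = sum((v - mu)**2 * p for v, p in zip(values, probs))     # E[(X - mu)^2]
var_short = sum(v**2 * p for v, p in zip(values, probs)) - mu**2  # E(X^2) - mu^2
print(round(var_def, 6), round(var_short, 6))   # the two forms agree
```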

Results (no proof)

If X ~ B(n, p), then μ = E(X) = np, and σ² = var(X) = np(1-p).

If Y ~ Poisson, parameter λ, then μ = E(Y) = λ, and σ² = var(Y) = λ.
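These stated results can be checked numerically from the probability formulas themselves (the parameter choices n = 10, p = 0.3 and λ = 2.4 are illustrative; the Poisson sum is truncated at k = 50, which is ample for λ = 2.4):

```python
from math import comb, exp, factorial

# Binomial: mean should be np, variance np(1-p)
n, p = 10, 0.3
binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean_b = sum(k * q for k, q in enumerate(binom))
var_b = sum(k**2 * q for k, q in enumerate(binom)) - mean_b**2
print(round(mean_b, 6), n * p)           # both 3.0
print(round(var_b, 6), n * p * (1 - p))  # both 2.1

# Poisson: mean and variance should both equal lambda
lam = 2.4
pois = [exp(-lam) * lam**k / factorial(k) for k in range(50)]
mean_p = sum(k * q for k, q in enumerate(pois))
var_p = sum(k**2 * q for k, q in enumerate(pois)) - mean_p**2
print(round(mean_p, 6), round(var_p, 6))  # both 2.4
```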

As well as helping describe distributions, knowledge of the mean and variance can help evaluate probabilities, using a “Normal approximation”. We shall return to this later.