**2. Discrete random variables**

If the chance outcome of the experiment is a **number**, it is called a *random variable*.

*Discrete random variable:* the possible outcomes can be listed, e.g. 0, 1, 2, … .

Notation for random variables: capital letter near the end of the alphabet e.g. *X*, *Y*.

P(*X = k*) denotes “the probability that *X* takes the value *k*”.

Note: 0 ≤ P(*X = k*) ≤ 1 for all *k*, and Σₖ P(*X = k*) = 1.

We are just going to look at two discrete distributions in detail.

**Preliminaries**

*Factorials:* for a positive integer *k*, *k*! = *k* × (*k*-1) × (*k*-2) × … × 2 × 1.

e.g. 3! = 3 × 2 × 1 = 6.

By definition, 0! = 1.

*Combinatorials:* for integers *n* and *k* where *n* ≥ *k* ≥ 0:

C(*n*, *k*) = *n*!/(*k*!(*n*-*k*)!) = number of ways of choosing *k* things from *n*.

For example, in the National Lottery, the number of ways of choosing 6 numbers from 49 (1, 2, … , 49) is:

C(49, 6) = 49!/(6! × 43!) = 13,983,816.
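This count can be checked with Python's built-in `math.comb`:

```python
from math import comb, factorial

# Number of ways of choosing 6 numbers from 49
print(comb(49, 6))  # 13983816

# The same value from the formula n! / (k! (n - k)!)
print(factorial(49) // (factorial(6) * factorial(43)))  # 13983816
```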

**Binomial distribution**

*Bernoulli trial:* a process with two possible outcomes, “success” and “failure”.

e.g. coin tossing: Heads or Tails

quality control: Quality Satisfactory or Unsatisfactory

An experiment consists of *n* independent Bernoulli trials, each with probability *p* of success. Let *X* = total number of successes in the *n* trials.

Then P(*X = k*) = C(*n*, *k*) *p*^*k* (1-*p*)^(*n*-*k*) for *k* = 0, 1, 2, … , *n*.

This is called the Binomial distribution with parameters *n* and *p*, or B(*n*, *p*) for short.

*X* ~ B(*n*, *p*) stands for “*X* has the Binomial distribution with parameters *n* and *p*.”
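As a sketch, the B(*n*, *p*) probability formula as a small Python function (the name `binomial_pmf` is ours, not from the notes):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): choose which k of the n trials
    succeed, then multiply the success and failure probabilities."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: the probabilities over k = 0, 1, ..., n sum to 1
total = sum(binomial_pmf(k, 5, 0.2) for k in range(6))
print(total)
```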

**Rough justification**

“*X = k*” means *k* successes (each with probability *p*) and *n-k* failures (each with probability 1-*p*). Suppose for the moment all the successes come first:

probability = *p* × *p* × … × *p* × (1-*p*) × (1-*p*) × … × (1-*p*) (by independence)

= *p*^*k* (1-*p*)^(*n*-*k*)

In fact, you get this probability whatever the ordering of successes and failures.

Number of ways of ordering *k* successes out of *n* trials = C(*n*, *k*).

By the special addition rule, overall probability = *p*^*k* (1-*p*)^(*n*-*k*) + *p*^*k* (1-*p*)^(*n*-*k*) + … + *p*^*k* (1-*p*)^(*n*-*k*) (C(*n*, *k*) terms)

= C(*n*, *k*) *p*^*k* (1-*p*)^(*n*-*k*).

**Example 2.1**

Five engines are sold; each, independently of the others, will need to be returned with probability 0.2. What is the probability that more than one will be returned?

*Solution*

Let *X *= number of engines which will need to be returned.

Bernoulli trial: each engine is either returned or not.

*X* ~ B(5, 0.2)

P(More than one engine returned) = P(*X* > 1)

= 1 – P(*X* ≤ 1) (Complements rule)

= 1 – P(*X* = 0) – P(*X* = 1)

= 1 – C(5, 0) (0.2)^0 (0.8)^5 – C(5, 1) (0.2)^1 (0.8)^4

= 1 – (1 × 1 × 0.8^5) – (5 × 0.2 × 0.8^4)

= 1 – 0.32768 – 0.4096 = 0.263.
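The arithmetic can be verified in Python (a sketch; `binomial_pmf` is our own helper name):

```python
from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(X > 1) = 1 - P(X = 0) - P(X = 1) for X ~ B(5, 0.2)
answer = 1 - binomial_pmf(0, 5, 0.2) - binomial_pmf(1, 5, 0.2)
print(f"{answer:.3f}")  # 0.263
```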

**Situations where a Binomial might occur**

1) Quality control: select *n* items at random; *X* = number found to be satisfactory.

2) Survey of *n* people about products A and B; *X* = number preferring A.

3) Telecommunications: *n* messages; *X* = number with an invalid address.

**Poisson distribution**

A random variable *Y* has the Poisson distribution with parameter λ (λ > 0) if:

P(*Y = k*) = e^(-λ) λ^*k*/*k*! (*k* = 0, 1, 2, … )
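A minimal Python sketch of these probabilities (the function name `poisson_pmf` is ours):

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) = e^(-lam) * lam^k / k! for Y ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Sanity check: the probabilities sum to 1 (summing far into the tail)
total = sum(poisson_pmf(k, 2.4) for k in range(50))
print(total)
```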

**Occurrence of the Poisson distribution**

1) As an approximation to B(*n*, *p*), when *n* is large and *p* is small (e.g. if *np* < 7, say); in this case, if *X* ~ B(*n*, *p*) then P(*X = k*) ≈ e^(-λ) λ^*k*/*k*! where λ = *np*, i.e. *X* is approximately Poisson, parameter *np*.

2) Associated with a *Poisson process*. A Poisson process, rate λ, consists of occurrences which happen independently of one another and at random, the probability of an occurrence happening in a small interval of length δ*t* being λδ*t*.

Then, if *Y* = number of occurrences in an interval of size *t*, *Y* ~ Poisson with parameter λ*t*.

**Examples of possible Poisson processes**

1) Arrivals of messages at a telecommunications system.

2) Occurrence of flaws in a fibre.

3) Times at which vehicles pass a census point.

4) Times at which radio-active particles are given off from a radio-active rock.

**Example 2.2**

A system contains 5,000,000 components; each fails within ten years with probability 10^(-6), independently of the others. What is the probability that three or more fail in ten years?

*Solution*

Let *X* = number failing in ten years, out of 5,000,000.

*X* ~ B(5000000, 10^(-6))

Evaluating the Binomial probabilities is rather awkward; better to use the Poisson approximation.

*X* is approximately Poisson, with parameter λ = *np* = 5000000 × 10^(-6) = 5.0.

P(Three or more fail) = P(*X* ≥ 3) = 1 – P(*X* = 0) – P(*X* = 1) – P(*X* = 2)

= 1 – e^(-5) – 5e^(-5) – (5²/2!)e^(-5) = 1 – e^(-5)(1 + 5 + 12.5) = 0.875
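The approximation can be compared with the exact Binomial value in Python; integer arithmetic handles the exact terms for small *k* (a sketch, with our own variable names):

```python
from math import comb, exp

n, p = 5_000_000, 1e-6
lam = n * p  # 5.0

# Poisson approximation: P(X >= 3) = 1 - P(0) - P(1) - P(2)
poisson = 1 - exp(-lam) * (1 + lam + lam**2 / 2)

# Exact Binomial terms for comparison
exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))

print(round(poisson, 3), round(exact, 3))  # both 0.875
```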

**Example 2.3**

Messages arrive at a telecommunications centre at random, at an average rate of 1.2 per second. (a) Find the probability of 5 messages arriving in a 2-sec interval.

(b) For how long can the operation of the centre be interrupted, if the probability of losing one or more messages is to be no more than 0.05?

*Solution*

Times of arrivals form a Poisson process, rate λ = 1.2/sec.

(a) Let *Y* = number of messages arriving in a 2-sec interval.

Then *Y* ~ Poisson, with parameter λ*t* = 1.2 × 2 = 2.4.

P(*Y* = 5) = e^(-2.4) × 2.4^5/5! = 0.060.

(b) Let the required time = *t* seconds.

Let *W* = number of messages in *t* seconds, so that *W* ~ Poisson with parameter λ*t* = 1.2*t*.

P(At least one message) = P(*W* ≥ 1) = 1 – P(*W* = 0) = 1 – e^(-1.2*t*) ≤ 0.05.

e^(-1.2*t*) ≥ 0.95

-1.2*t* ≥ ln(0.95) = -0.05129

*t* ≤ 0.043 seconds.
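Both parts can be sketched in Python (variable names are ours):

```python
from math import exp, factorial, log

# (a) Y ~ Poisson(2.4): messages in a 2-sec interval at rate 1.2/sec
lam = 1.2 * 2
p5 = exp(-lam) * lam**5 / factorial(5)
print(f"{p5:.3f}")  # 0.060

# (b) longest interruption t with P(one or more messages) <= 0.05:
# 1 - e^(-1.2 t) <= 0.05  =>  t <= -ln(0.95) / 1.2
t_max = -log(0.95) / 1.2
print(f"{t_max:.3f}")  # 0.043
```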

**Mean (or expected value) of a distribution**

For a random variable *X* taking values 0, 1, 2, … , the mean value of *X* is:

μ = Σₖ *k* P(*X = k*) = 0 × P(*X* = 0) + 1 × P(*X* = 1) + 2 × P(*X* = 2) + …

The mean is also called: population mean

expected value of *X* (or E(*X*))

expectation of *X*.

*Intuitive idea:* if *X* is observed in repeated independent experiments and X̄ₙ is the sample mean of the first *n* observations (= (*X*₁ + … + *X*ₙ)/*n*), then as *n* gets bigger, X̄ₙ tends to μ.
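As a sketch, both the definition and the intuitive idea can be checked in Python for *X* ~ B(5, 0.2) (an illustration of ours, not part of the notes):

```python
import random
from math import comb

# Mean of X ~ B(5, 0.2) from the definition: sum over k of k * P(X = k)
pmf = {k: comb(5, k) * 0.2**k * 0.8**(5 - k) for k in range(6)}
mu = sum(k * p for k, p in pmf.items())
print(round(mu, 6))  # 1.0, which is np = 5 * 0.2

# Intuitive idea: the sample mean over many repetitions tends to mu
random.seed(0)
obs = [sum(random.random() < 0.2 for _ in range(5)) for _ in range(100_000)]
print(round(sum(obs) / len(obs), 2))  # close to 1.0
```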

**Variance and standard deviation of a distribution**

μ is a measure of the “average value” of a distribution.

The standard deviation, σ, is a measure of how spread out the distribution is.

Variance = σ² = var(*X*)

= E[(*X* – μ)²] (definition)

= E(*X*²) – μ² (often easier to evaluate in practice).
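A quick numerical check that the two formulas agree, sketched for *X* ~ B(5, 0.2):

```python
from math import comb

# X ~ B(5, 0.2); both variance formulas should give the same value
pmf = {k: comb(5, k) * 0.2**k * 0.8**(5 - k) for k in range(6)}
mu = sum(k * p for k, p in pmf.items())

var_def = sum((k - mu)**2 * p for k, p in pmf.items())   # E[(X - mu)^2]
var_alt = sum(k**2 * p for k, p in pmf.items()) - mu**2  # E(X^2) - mu^2

print(round(var_def, 6), round(var_alt, 6))  # both 0.8
```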

**Results (no proof)**

If *X* ~ B(*n*, *p*), then μ = E(*X*) = *np*, and σ² = var(*X*) = *np*(1-*p*).

If *Y* ~ Poisson, λ, then μ = E(*Y*) = λ, and σ² = var(*Y*) = λ.
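The Poisson result can be checked numerically (a sketch with our own variable names; the sum is truncated far into the tail):

```python
from math import exp, factorial

# Y ~ Poisson(2.4): mean and variance should both equal lambda = 2.4
lam = 2.4
pmf = [exp(-lam) * lam**k / factorial(k) for k in range(100)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k**2 * p for k, p in enumerate(pmf)) - mean**2
print(round(mean, 6), round(var, 6))  # 2.4 2.4
```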

As well as helping describe distributions, knowledge of the mean and variance can help evaluate probabilities, using a “Normal approximation”. We shall return to this later.