Given a set of data, we would like to draw conclusions about the underlying population from which the data were taken. For instance, what is the mean of the population? What is its variance?

A **statistic**
is a **function of a random sample** of observations from a population. If
we compute the value of a statistic from the data in order to provide an idea
about a particular parameter of the population, we are *estimating* the
parameter. The statistic is then called an **estimator**. If the statistic
is a **single value**, it is a **point estimator**. If it provides a **range
of values**, it is an **interval estimator**.

Based on data collected about the lifetimes of GE 75-watt light bulbs, you want to make a statement such as “I am 95% confident that the average lifetime of GE 75-watt light bulbs is somewhere between 200 and 220 hours.” The interval (200, 220) is termed a 95% confidence interval. How can you come up with a reasonable interval estimate of μ? By reasonable, I mean that you have confidence in the method used to obtain the interval. For instance, if you have to come up with an interval estimate of μ every month for your company (which buys thousands of cases of light bulbs each year), can you be assured that the intervals you provide will contain the true mean lifetime about 95% of the time?

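Suppose you plan to measure the lifetimes of *n* GE 75-watt
light bulbs and decide to use the following (random) interval as your estimate:

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right),$$

where $\bar{X}$ is the sample mean, and σ, the standard deviation
in lifetimes, is assumed to be known and *n* is relatively large.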
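Depending on the data collected, the computed endpoints of the interval will
vary. But the Central Limit Theorem
tells us that if our sample size is
large, the sample mean $\bar{X}$ is approximately normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$, so the interval will contain μ about 95% of the time. (When *n* is small, we must assume our population is normally distributed.)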
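The "about 95% of the time" claim can be checked by simulation. The sketch below assumes the usual large-sample interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ with σ known. The population (a shifted exponential), the values μ = 210 and σ = 25 hours, and the function name `coverage` are illustrative assumptions, not taken from real light bulb data.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, trials, conf=0.95, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for conf = 0.95.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Skewed, non-normal population: an exponential with mean sigma and
        # standard deviation sigma, shifted so that the mean equals mu.
        sample = [mu - sigma + rng.expovariate(1 / sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


# Hypothetical values: mean lifetime 210 hours, sigma 25 hours, n = 100 bulbs.
print(coverage(mu=210, sigma=25, n=100, trials=2000))
```

Even though the simulated lifetimes are far from normal, the fraction of intervals that capture μ should come out near 0.95 when *n* is large, which is what the Central Limit Theorem suggests.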

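In general, a $100(1-\alpha)\%$ reliable **two-sided** interval estimate of μ is
formed as $\bar{X} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$, where $z_{\alpha/2}$ is the value that a standard normal random variable exceeds with probability α/2. For a
particular sample of data, the observed sample average, $\bar{x}$,
is substituted for $\bar{X}$.
We can also form **one-sided interval** estimates of μ.
For instance, if we want to place an **upper bound** on the value of μ,
our interval estimate would be $\left(-\infty,\ \bar{X} + z_{\alpha}\,\dfrac{\sigma}{\sqrt{n}}\right)$.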

(**Note**: For large *n*, an approximate
confidence interval can be formed even when σ is not known, by
replacing σ with its estimate, *s*, the sample
standard deviation.)
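As a minimal sketch of these interval estimates, the functions below compute a two-sided interval and a one-sided upper bound from a list of observations, assuming the standard normal-based forms $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ and $\bar{x} + z_{\alpha}\,\sigma/\sqrt{n}$. The function names and the sample data are hypothetical. When `sigma` is omitted, the sample standard deviation *s* is used in its place, which is only reasonable for large *n*.

```python
from math import inf, sqrt
from statistics import NormalDist, fmean, stdev


def z_interval(data, conf=0.95, sigma=None):
    """Two-sided interval for the mean: xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    n = len(data)
    xbar = fmean(data)
    # Use the known sigma if given; otherwise fall back to the sample s.
    sd = stdev(data) if sigma is None else sigma
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width


def z_upper_bound(data, conf=0.95, sigma=None):
    """One-sided interval (-inf, xbar + z_alpha * sd / sqrt(n))."""
    n = len(data)
    xbar = fmean(data)
    sd = stdev(data) if sigma is None else sigma
    # All of alpha sits in one tail, so the critical value is z_alpha.
    z = NormalDist().inv_cdf(conf)
    return -inf, xbar + z * sd / sqrt(n)


# Hypothetical lifetimes (hours) for 100 bulbs, with sigma assumed known.
lifetimes = [205.0, 215.0] * 50
print(z_interval(lifetimes, conf=0.95, sigma=10))
print(z_upper_bound(lifetimes, conf=0.95, sigma=10))
```

At the same confidence level, the one-sided bound sits closer to the sample average than the upper endpoint of the two-sided interval, because $z_{\alpha} < z_{\alpha/2}$.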

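We can compute the number of samples necessary to obtain
an interval of a desired length *L* that is $100(1-\alpha)\%$ reliable, using the following formula:

$$n = \left(\frac{2\, z_{\alpha/2}\,\sigma}{L}\right)^{2},$$

where $z_{\alpha/2}$ is the upper α/2 critical value of the standard normal distribution.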

The length, L, can be thought of as specifying the
precision of our estimate. There is an inverse relationship between the
reliability of our estimate and its precision. To make L smaller while keeping
the same level of reliability, *n* must be larger (more information is
needed). For fixed L, the larger the confidence level, the larger *n* must
be. For fixed L and reliability, the larger σ the larger *n* must be.
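These relationships can be seen numerically. The sketch below assumes the sample-size formula $n = (2\,z_{\alpha/2}\,\sigma/L)^2$, which follows from setting the interval length $2\,z_{\alpha/2}\,\sigma/\sqrt{n}$ equal to *L*, and rounds up to the next whole observation. The function name and the numbers used are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(length, sigma, conf=0.95):
    """Smallest n for which the two-sided z interval has length <= `length`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((2 * z * sigma / length) ** 2)


# Hypothetical baseline: L = 20 hours, sigma = 25 hours, 95% confidence.
print(sample_size_mean(20, 25))
# Halving L (more precision) requires roughly four times as many bulbs.
print(sample_size_mean(10, 25))
# Raising the confidence level with L fixed also increases n.
print(sample_size_mean(20, 25, conf=0.99))
# A more variable population (larger sigma) increases n as well.
print(sample_size_mean(20, 50))
```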

What if *n* is small? Then we need more information
about the population; specifically, we must assume that it is normally distributed.

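Since the Central Limit Theorem applies to sample
proportions, analysis similar to that above tells us that a $100(1-\alpha)\%$ reliable confidence
interval for *p* can be formed as $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p}$ is the sample proportion.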
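A short sketch of a large-sample interval for a proportion follows, assuming the standard form $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. The function name and the counts in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, conf=0.95):
    """Large-sample interval: p_hat +/- z_{alpha/2} * sqrt(p_hat*(1-p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 50 of 100 sampled items have the attribute of interest.
print(proportion_interval(50, 100))
```

Like the interval for μ, this relies on the normal approximation, so it should only be trusted when *n* is reasonably large.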

We can compute the number of samples necessary to obtain
an interval of a desired length *L* that is $100(1-\alpha)\%$ reliable using the following formula:

$$n = \frac{4\, z_{\alpha/2}^{2}\; p^{*}(1-p^{*})}{L^{2}},$$

where $p^{*}$ is a guess of *p*. If you have no knowledge of *p*
beforehand, assuming *p* = 0.5 gives the maximum value of *n* over the entire
range of possible values of *p*.
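The worst-case role of *p* = 0.5 can be checked directly. The sketch below assumes the sample-size formula $n = 4\,z_{\alpha/2}^{2}\,p^{*}(1-p^{*})/L^{2}$, obtained by setting the length of the large-sample interval for *p* equal to *L*. The function name and the chosen values of *L* and the guess are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(length, p_guess=0.5, conf=0.95):
    """Smallest n giving an interval for p of length <= `length`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(4 * z ** 2 * p_guess * (1 - p_guess) / length ** 2)


# With no prior knowledge of p, use the conservative guess 0.5.
print(sample_size_proportion(0.1))
# A guess further from 0.5 gives a smaller required n.
print(sample_size_proportion(0.1, p_guess=0.2))

# Scan a grid of guesses to see that 0.5 gives the largest n.
grid = [i / 100 for i in range(1, 100)]
worst = max(grid, key=lambda p: sample_size_proportion(0.1, p))
print(worst)
```

Because $p(1-p)$ peaks at *p* = 0.5, planning with that value guarantees an interval no longer than *L* whatever the true proportion turns out to be.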