Material Discussed

Sections 4.4 (Exponential Distribution only) and 4.6 (Normal Probability Plots)

How to Determine if a Distribution of Sample Data can be Adequately Described using a Normal Distribution

  1. For large data sets, the shape of the histogram can help you decide whether the data
    approximately follow a Normal distribution.
  2. Normal Probability Plot (Normal QQ Plot): For data sets of almost any size (more than
    about 6 observations), the percentiles of the data can be compared to the corresponding
    percentiles of a standard Normal distribution. If the data can be adequately described
    using a Normal distribution, the percentiles of the data should be approximately linearly
    related to the percentiles of the standard Normal distribution. To assess whether this
    holds, plot the pairs (Zi, X(i)), i = 1, ..., n, where Zi denotes the [100(i-.5)/n]-th
    percentile of a standard Normal random variable and X(i) denotes the i-th smallest
    observation. The points should fall approximately along a straight line if the data
    follow a Normal distribution. Because X(i) ≈ μ + σZi for Normal data with mean μ and
    standard deviation σ, the slope of the line estimates the standard deviation of the data
    and the y-intercept estimates the mean. For data sets of fewer than 30 observations,
    only a substantial departure from linearity should be interpreted as conclusive
    evidence of non-normality.
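
The plotting procedure in item 2 can be sketched with Python's standard library (assuming Python 3.8 or later for statistics.NormalDist and statistics.fmean). The helper names normal_qq_points and fit_line are hypothetical, summarizing the points with a least-squares line is an illustrative choice, and the simulated sample (mean 10, sd 2) is made up for the example.

```python
import random
from statistics import NormalDist, fmean


def normal_qq_points(data):
    """Return the (Zi, X(i)) pairs of a Normal probability plot."""
    n = len(data)
    xs = sorted(data)                      # X(i): the i-th smallest observation
    std_normal = NormalDist()              # standard Normal: mean 0, sd 1
    # Zi: the [100(i - .5)/n]-th percentile of the standard Normal
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))


def fit_line(points):
    """Least-squares line y = intercept + slope * x through (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Illustrative data: 500 draws from a Normal with mean 10 and sd 2.
rng = random.Random(1)
sample = [rng.gauss(10, 2) for _ in range(500)]
slope, intercept = fit_line(normal_qq_points(sample))
print(f"slope ~ {slope:.2f} (estimates sd), intercept ~ {intercept:.2f} (estimates mean)")
```

For such Normal data the fitted slope should come out close to 2 and the intercept close to 10, in line with the rule that the slope reflects the standard deviation and the intercept the mean.
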

How the plot differs from a straight line can give you information about how the data
distribution differs from a Normal distribution.

If the graph is (Zi, X(i)), a light-tailed distribution (relative to the Normal) will give an
S-shaped plot with the left end of the plot curving upward and the right end curving
downward. A heavy-tailed distribution (relative to the Normal) will give an S-shaped plot
with the left end of the plot curving downward and the right end curving upward. A
right-skewed distribution will give a plot with the middle points falling below the line and
the end points falling above it. Note: Some statistical packages plot (X(i), Zi) instead of
(Zi, X(i)). The interpretation of the shape of the plot as an indication of how the
distribution differs from normality will then be reversed.
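
The sketch below checks these shapes for a (Zi, X(i)) plot. To remove sampling noise it uses an idealized sample whose i-th smallest value is set to the exact [100(i-.5)/n]-th percentile of a chosen distribution (an assumption made for the illustration), fits a least-squares line, and reports whether the points at the left end, the middle and the right end lie above or below it. The uniform, Cauchy and exponential distributions are illustrative stand-ins for light-tailed, heavy-tailed and right-skewed data; the helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def qq_residuals(quantile_fn, n=101):
    """Residuals X(i) - (fitted line) at the left end, middle and right end of a
    (Zi, X(i)) plot, where X(i) is the exact [100(i - .5)/n]-th percentile given
    by quantile_fn. n is odd, so the middle point is i = (n + 1)/2."""
    ps = [(i - 0.5) / n for i in range(1, n + 1)]
    zs = [NormalDist().inv_cdf(p) for p in ps]
    xs = [quantile_fn(p) for p in ps]
    z_bar, x_bar = fmean(zs), fmean(xs)
    slope = (sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
             / sum((z - z_bar) ** 2 for z in zs))
    resid = [x - (x_bar + slope * (z - z_bar)) for z, x in zip(zs, xs)]
    return resid[0], resid[n // 2], resid[-1]


def describe(r, tol=1e-9):
    """Position of a point relative to the fitted line."""
    return "on" if abs(r) < tol else ("above" if r > 0 else "below")


# Percentile (quantile) functions of the three illustrative distributions.
shapes = {
    "uniform (light-tailed)": lambda p: p,
    "Cauchy (heavy-tailed)": lambda p: math.tan(math.pi * (p - 0.5)),
    "exponential (right-skewed)": lambda p: -math.log(1 - p),
}
for name, q in shapes.items():
    print(name, [describe(r) for r in qq_residuals(q)])
```

With these choices the uniform case reports the left end above and the right end below the line, the Cauchy case the opposite, and the exponential case both ends above with the middle below. For the two symmetric distributions the middle point sits on the line, so only the ends are informative there.
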

Probability plots can be used to check distributional assumptions for distributions other than
the Normal. For example, to check whether data come from an exponential distribution, compare
the percentiles of the data to the percentiles of a standard Exponential distribution (rate
λ = 1). Additionally, to check whether two data sets come from the same underlying
distribution, plot the percentiles of the first data set against the percentiles of the
second data set. The plot should be approximately linear. Strictly, a linear plot indicates
that the two distributions have the same shape, possibly differing in location and scale;
identical distributions give points near the line y = x.
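
For an exponential distribution with rate λ, the p-th percentile is -ln(1 - p)/λ, the standard Exponential percentile divided by λ, so exponential data plotted against standard Exponential percentiles should fall near a straight line through the origin with slope 1/λ. A minimal sketch, assuming the same (i - .5)/n plotting positions as in the Normal plot and a least-squares line forced through the origin (one reasonable choice, not the only one). The helper names and the simulated data are hypothetical, and the two-sample helper assumes both data sets have the same size, so that the i-th smallest values can be paired directly.

```python
import math
import random


def exponential_qq_points(data):
    """(e_i, X(i)) pairs, where e_i = -ln(1 - (i - .5)/n) is the
    [100(i - .5)/n]-th percentile of a standard Exponential (rate 1)."""
    n = len(data)
    xs = sorted(data)
    es = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(es, xs))


def slope_through_origin(points):
    """Least-squares slope b of a line through the origin, y = b * x."""
    return sum(e * x for e, x in points) / sum(e * e for e, _ in points)


def two_sample_qq_points(data1, data2):
    """Pair the i-th smallest value of one sample with the i-th smallest of the
    other. This sketch only handles samples of equal size."""
    if len(data1) != len(data2):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(data1), sorted(data2)))


# Illustrative data: 500 exponential waiting times with rate 0.5 (mean 2).
rng = random.Random(7)
waits = [rng.expovariate(0.5) for _ in range(500)]
b = slope_through_origin(exponential_qq_points(waits))
print(f"estimated 1/lambda ~ {b:.2f}, so lambda ~ {1 / b:.2f}")
```
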

The Exponential Distribution

The exponential probability density function is widely used in engineering to describe the
distribution of many types of variables, most often the waiting time between successive
events. A random variable X has an exponential distribution with parameter λ (λ > 0) if its
pdf is f(x) = λ exp(-λx) for x > 0 (and f(x) = 0 otherwise).
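
Integrating the pdf gives the cumulative distribution function F(x) = P(X ≤ x) = 1 - exp(-λx) for x > 0. The sketch below checks this numerically with a midpoint rule; λ = 0.5 and the interval (0, 3] are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def exp_pdf(x, lam):
    """Exponential pdf: f(x) = lam * exp(-lam * x) for x > 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0


def exp_cdf(x, lam):
    """P(X <= x) = 1 - exp(-lam * x) for x > 0, obtained by integrating the pdf."""
    return 1 - math.exp(-lam * x) if x > 0 else 0.0


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


lam = 0.5
area = integrate(lambda x: exp_pdf(x, lam), 0, 3)
print(f"area under pdf on (0, 3] ~ {area:.6f}; cdf gives {exp_cdf(3, lam):.6f}")
```
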

The mean of an exponential r.v. with parameter λ is 1/λ. The standard deviation of an
exponential r.v. with parameter λ is also 1/λ. The StatConcepts Lab “How are Populations
Distributed” allows you to visualize exponential pdfs with different parameters λ.
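
A quick simulation check of these two formulas, assuming Python's standard library: random.expovariate takes the rate λ (its argument is 1 divided by the desired mean), so with λ = 0.5 both the sample mean and the sample standard deviation should come out close to 1/λ = 2. The parameter value, seed and sample size are illustrative.

```python
import random
from statistics import fmean, stdev

lam = 0.5                                  # illustrative parameter, so 1/lam = 2
rng = random.Random(2024)
# expovariate's argument is the rate lam, not the mean
sample = [rng.expovariate(lam) for _ in range(100_000)]

print(f"sample mean ~ {fmean(sample):.3f}, sample sd ~ {stdev(sample):.3f}, 1/lambda = {1 / lam}")
```
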

Suppose that the number of occurrences of an event in a time interval of length t has a
Poisson distribution with parameter αt (that is, events occur according to a Poisson process
with rate α), and that the numbers of occurrences in nonoverlapping intervals are
independent. It can be shown that the waiting time until the first occurrence of the event
follows an exponential distribution with parameter α.
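
The exponential form follows from the Poisson probability of zero occurrences. A short derivation, writing T for the waiting time until the first occurrence and N(t) for the number of occurrences in [0, t] (notation introduced here): T exceeds t exactly when no event occurs in [0, t], so

```latex
\begin{aligned}
P(T > t) &= P(N(t) = 0) = \frac{e^{-\alpha t}(\alpha t)^{0}}{0!} = e^{-\alpha t}, && t > 0,\\
F_T(t)   &= P(T \le t) = 1 - e^{-\alpha t}, && t > 0,\\
f_T(t)   &= \frac{d}{dt}F_T(t) = \alpha e^{-\alpha t}, && t > 0.
\end{aligned}
```

The last line is the exponential pdf f(x) = λ exp(-λx) with λ = α.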

The exponential distribution has the memoryless property: the probability that we wait at
least b additional minutes for an event to occur, given that we have already waited at least
a minutes, is the same as the probability that we wait at least b minutes from the start. In
symbols, P(X ≥ a + b | X ≥ a) = P(X ≥ b) for all a, b > 0. In other words, the distribution of
the additional waiting time is exactly the same as the distribution of the original waiting
time; it does not depend on how long you have already waited. (The geometric distribution,
the distribution of the number of trials until the first success when the trials are
independent with constant probability of success p, also has this property.)
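
Because P(X ≥ x) = exp(-λx) for an exponential r.v., P(X ≥ a + b | X ≥ a) = exp(-λ(a + b)) / exp(-λa) = exp(-λb) = P(X ≥ b). The sketch below checks this by simulation; the rate λ = 0.25 per minute and the waits a = 2 and b = 3 minutes are arbitrary illustrative values.

```python
import math
import random

# Illustrative values (assumptions): rate lam = 0.25 events per minute,
# already waited a = 2 minutes, asking about b = 3 additional minutes.
lam, a, b = 0.25, 2.0, 3.0
rng = random.Random(42)
waits = [rng.expovariate(lam) for _ in range(200_000)]

# Conditional relative frequency: among waits of at least a minutes,
# the fraction that last at least a + b minutes.
survived_a = [w for w in waits if w >= a]
cond = sum(w >= a + b for w in survived_a) / len(survived_a)

# Unconditional relative frequency of waiting at least b minutes from the start.
fresh = sum(w >= b for w in waits) / len(waits)

exact = math.exp(-lam * b)   # P(X >= b) = exp(-lam * b)
print(f"P(X >= a+b | X >= a) ~ {cond:.3f}")
print(f"P(X >= b)            ~ {fresh:.3f}")
print(f"exp(-lambda*b)        = {exact:.3f}")
```

Both relative frequencies should agree with exp(-λb) up to simulation error, even though the first one only looks at waits that have already lasted a minutes.
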