The objective of environmental measurements can be qualitative or quantitative. For example, the presence of lead in household paint is a topic of concern. The question may be "Are there toxic metals present in the paint in a certain home?" An analysis designed to address this question is a qualitative analysis, where the analyst screens for the presence of a certain pollutant or class of substances. The next obvious question, of course, is "How much lead is in the paint?" This type of analysis is called quantitative analysis and it not only addresses the question of the presence of lead in the paint, but also its concentration.
Let us say that the analyst uses a technique that can measure as low as 1 µg/kg of lead in paint. For a particular sample, the analyst reports that no lead was detected. This means that the concentration of lead in that sample is less than 1 µg/kg. It is not proper to report the result of this analysis as "zero lead," since lead may well be present at some lower concentration which the measuring device cannot detect. The report should read "lead not detected above 1 µg/kg" or "lead not detected; detection limit 1 µg/kg." Quantitative analyses of environmental samples are particularly challenging because the concentrations of pollutants in environmental samples are often very low, and the matrix is usually complex.
All numerical results obtained from experimentation are accompanied by a certain amount of error and an estimate of the magnitude of this error is necessary to validate the results. Error is a statistical term, and should not be confused with the common meaning of the word, which implies a blunder and blame to be laid on someone! All measurements have some error associated with them. One cannot eliminate it, but with careful work, the magnitude of the error can be characterized and, sometimes, the magnitude can be reduced with improved techniques.
In general, the types of error can be classified as random or systematic. If the same experiment is repeated several times, the individual measurements will fall around an average value. The differences are due to unknown or uncontrollable factors, and are termed random error. Random errors, if they are truly random, should cluster around the true value, and have equal probability of being above or below it. Chances are that the measurements will be slightly high or slightly low more often than they will be very high or very low. The measure of the amount of random error present in a set of data is the precision or reproducibility.
On the other hand, systematic error tends to deviate or bias all the measurements in one direction. So accuracy, which is a measure of deviation from the true value, is affected by systematic error. Accuracy is defined as the deviation of the mean from the true value.
Accuracy = (mean - true value) / true value    (1.1)
Often the true value is not known. For the purpose of comparison, measurement by an established method or by an accepted institution is sometimes accepted as the true value. In colloquial English, accuracy and precision are often used interchangeably, but in statistics they mean quite different things.
The difference between accuracy and precision is further illustrated by the following example. A sample containing 10 µg/kg of lead is analyzed by acid digestion and atomic absorption spectrometry. Five repeat analyses are performed by four different laboratories, resulting in the data presented in Table 1.4 and Figure 1.1. The analyses performed by laboratory A show little variability but are not close to the true value, so this laboratory's work has high precision but low accuracy. This is a strong sign that the laboratory is subject to some identifiable source of bias; its calibrations, standards, and methods should be examined carefully. Laboratory D's data, on the other hand, show little variability and good accuracy. Laboratories B and C both produce data with poorer precision, but one is accurate while the other is not. In fact, for a small number of replicates, when good accuracy is obtained with poor precision, it is usually accidental!
Since every measurement contains a certain amount of error, the result from a single measurement cannot be accepted alone as the true value. An estimate of the error is necessary to predict the range within which the true value may lie. The random error in a measurement is estimated by repeating the measurement several times, which provides two valuable pieces of information: the average value and the variability of the measurement. The most widely used measure of the average value is the arithmetic mean, $\bar{x}$:

$$\bar{x} = \frac{\sum x_i}{n} \qquad (1.2)$$

where $\sum x_i$ is the sum of the replicate measurements and $n$ is the total number of measurements.
The most useful measure of variability (or precision) is the standard deviation, σ. This is calculated as:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} \qquad (1.3)$$

When only a few data points are available, the calculated standard deviation may be underestimated, since the mean is used in place of the true value, and the mean was calculated from the same small data set. To obtain an unbiased estimate of σ, which is designated s, N − 1 is used in the denominator:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} \qquad (1.4)$$

As the number of data points becomes larger, the value of s approaches that of σ. When N becomes as large as 20, the equation for σ may be used. Another term commonly used to measure variability is the coefficient of variation (CV), or relative standard deviation (RSD), which may also be expressed as a percentage:
$$\mathrm{RSD} = \frac{s}{\bar{x}} \qquad \text{or} \qquad \%\mathrm{RSD} = \frac{s}{\bar{x}} \times 100 \qquad (1.5)$$
Relative standard deviation is the parameter of choice for comparing the precision of data of different units and magnitudes and is used extensively in analytical sciences.
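These definitions translate directly into a few lines of code. The sketch below (Python; the replicate lead values are invented for illustration) computes the mean, the N − 1 standard deviation, and the %RSD:

```python
import statistics

# Invented replicate measurements of lead in a sample (µg/kg), for illustration.
data = [10.2, 9.8, 10.5, 10.1, 9.9]

n = len(data)
mean = sum(data) / n                 # arithmetic mean (Eq. 1.2)
s = statistics.stdev(data)           # standard deviation, N - 1 denominator (Eq. 1.4)
rsd_percent = (s / mean) * 100       # %RSD (Eq. 1.5)
```

Note that `statistics.stdev` uses the N − 1 denominator, i.e., it computes s rather than σ.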
In the absence of systematic error, and if a large number of measurements are made, the results fall into a normal or Gaussian distribution, defined by the equation:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2} \qquad (1.6)$$

where y is the frequency of occurrence, μ is the arithmetic mean, and σ is the standard deviation. The normal distribution has a bell-like shape. In a set of data which is normally distributed, 68% of the measurements will lie within ±1σ of the mean, 95% within ±1.96σ, and 99.7% within ±2.97σ. It is worth pointing out that it is not always possible to prove that repeated measurements will fall into a normal distribution, especially when the number of measurements is not very large, but in general the normal distribution provides a good approximation.
In environmental measurements it is often the case that the analytical method is being used near its limit of detection. This may result in skewed data sets which do not show a normal distribution. In cases where the distribution is not normal, the mean may not be the best indicator of the "center" of such a set. These data sets will usually contain some high outliers, but low outliers are simply reported as "not detected". The effect is to cut off the normal distribution on the low side. When high errors are more likely than low ones, the log normal distribution often fits the data better. Figure .2 shows the normal and log normal distribution curves. There are several ways of treating log normal data. As the name "log normal" hints, the logarithms of the data are normally distributed. This means that one can take the log of each data point and then do the statistical calculations. At the end, the antilog of the final value is taken. For instance, the geometric mean, calculated as the antilog of the arithmetical average of the logs of n data points, may give a better estimate of the center of the data, if a log normal distribution is present.
$$\bar{x}_g = \operatorname{antilog}\!\left(\frac{\sum \log x_i}{n}\right) = (x_1 x_2 \cdots x_n)^{1/n} \qquad (1.7)$$
Sometimes, instead of being set to zero, "not detected" sample values are set to one-half of the method detection limit, but this practice is not universally accepted, and its statistical validity has not been proven.
In dealing with non-normally distributed data, the median (the central value when all the points are arranged by magnitude) or the mode (the most frequently occurring value) may in some cases better describe the central tendency of the data.
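As a quick illustration, the sketch below (Python; the data set is invented to mimic a skewed, log-normal-looking sample with one high value) compares the arithmetic mean, the geometric mean, and the median:

```python
import math
import statistics

# Invented skewed data set (e.g., concentrations in µg/L).
data = [1.2, 1.5, 1.8, 2.0, 9.5]

arith_mean = sum(data) / len(data)
# Geometric mean: antilog of the arithmetic average of the logs (Eq. 1.7).
geo_mean = math.exp(sum(math.log(x) for x in data) / len(data))
med = statistics.median(data)
# The geometric mean and the median sit nearer the bulk of the data than
# the arithmetic mean, which is pulled upward by the single high value.
```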
If there is no systematic error, then the distribution of random error can be computed from a series of measurements. Thus it is possible to predict, with a certain amount of "confidence," a range within which the true value should lie. This range is called the confidence interval, and the end values of this range are called the confidence limits. The larger the confidence interval, the higher the probability that the true value lies within it. If we assume a normal distribution of error, then 95% of the measurements will lie between $\bar{x} \pm 1.96\sigma$. This is the 95% confidence interval, i.e., there is a 95% probability that the true value lies within this range. Similarly, the 99.7% confidence interval falls between $\bar{x} \pm 2.97\sigma$.
If the number of measurements is not large, the estimated experimental standard deviation (s) is not equal to σ. In this case, instead of the normal distribution, the confidence interval is calculated using the Student's t-distribution. The t-distribution table is presented in Appendix 1. The confidence interval is computed as:
$$\text{Confidence interval} = \bar{x} \pm \frac{t\,s}{\sqrt{n}} \qquad (1.8)$$
The value of t is determined from the t-distribution table and its value depends upon the number of degrees of freedom, which is one less than the number of measurements (n-1), and the desired confidence level.
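A minimal sketch of this calculation in Python follows; the replicate values are invented, and the t value of 2.776 for the 95% confidence level and 4 degrees of freedom is assumed to have been read from the t-table:

```python
import math
import statistics

data = [5.2, 5.6, 5.1, 5.4, 5.3]   # invented replicate results
n = len(data)
mean = sum(data) / n
s = statistics.stdev(data)
t = 2.776                          # t(95% CL, n - 1 = 4 degrees of freedom)
half_width = t * s / math.sqrt(n)  # Eq. 1.8
ci = (mean - half_width, mean + half_width)
```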
Sometimes several data sets are combined and a grand mean is calculated from the pooled data. For example, several carbon monoxide measurements are made in a city every day, and at the end of the month a grand average is to be obtained. If the precision of the various data sets is not significantly different, the overall average can be computed as:
$$\bar{\bar{x}} = \frac{\sum_k W_k \bar{x}_k}{\sum_k W_k} \qquad (1.9)$$
and the corresponding standard deviation is:
$$s = \sqrt{\frac{\sum_k W_k (\bar{x}_k - \bar{\bar{x}})^2}{\left(\sum_k W_k\right) - 1}} \qquad (1.10)$$
where the weights, Wk, are the corresponding number of measurements in each subaverage.
If the precision of the different sets varies significantly, the values are weighted inversely according to their variance (s2):
$$W_k = \frac{1}{s_k^2} \qquad (1.11)$$
then Equation 1.9 is applied.
Just as several means may be combined to obtain a grand average, standard deviations also can be combined to obtain a single estimate. If there are k sets of measurements whose standard deviations are not significantly different, then the pooled standard deviation, sp, can be calculated as:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{n_1 + n_2 + \cdots + n_k - k}} \qquad (1.12)$$
where s1, s2, ..., sk are the standard deviations of the k separate sets of measurements, and n1, n2, ..., nk are the numbers of measurements in these sets.
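The pooled standard deviation can be sketched as follows (Python; the two measurement sets are invented and assumed to have comparable precision):

```python
import math

# Invented measurement sets whose precisions are assumed comparable.
sets = [
    [1.2, 1.0, 0.9, 1.4],
    [0.7, 1.0, 0.5, 0.6, 0.4],
]

def sample_var(xs):
    """Sample variance with n - 1 in the denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Eq. 1.12: each variance weighted by its degrees of freedom.
numerator = sum((len(xs) - 1) * sample_var(xs) for xs in sets)
denominator = sum(len(xs) for xs in sets) - len(sets)   # n1 + n2 + ... - k
s_pooled = math.sqrt(numerator / denominator)
```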
As mentioned before, because of random error, the measured mean value is seldom exactly equal to the true value. Significance tests are used to decide if the difference between a known value and the measured value or between two measured values can be due to random error alone. The quantities being compared could be means and standard deviations from two different sets of measurements, to see if the means are really different, or to see if a certain measurement is a statistically improbable outlier.
It is often important to know whether the mean of several measurements is significantly different from a known value. The known value could be a specified standard value or regulatory threshold for environmental compliance. The significance of the difference between the experimental and the known or target value must be determined, to see if the difference is due to random error only or if there is truly a difference, a statistically significant difference, between the two numbers. To do this, a confidence interval for the measured mean is calculated. If the target value falls within the same range, then it can be said that the two values are not different at that level of confidence.
This type of comparison is necessary, for instance, when the two sets of measurements are made at different laboratories or using different analytical techniques. It can also be useful when comparing two sets of samples from different populations or areas, to determine whether the measured variable (concentration, temperature, pH, or whatever) truly differs between the two populations, or whether the difference can be ascribed to random error alone.
Let $\bar{x}_A$ and $\bar{x}_B$ be the means of the two sets of measurements, whose standard deviations are $s_A$ and $s_B$, respectively. We must then determine whether the difference between these means, $(\bar{x}_A - \bar{x}_B)$, is significantly different from zero at a chosen confidence level.
Case 1: The standard deviations of the two measurement sets are not thought to be significantly different from each other; for instance, the water samples were taken from two well-mixed sources, and all analyses were done by the same analyst using the same method. If there is doubt as to whether the standard deviations differ, the F-test, described below, will give that information. The variances of both means are calculated as:
$$V_A = \frac{s_p^2}{n_A}, \qquad V_B = \frac{s_p^2}{n_B} \qquad (1.13)$$
using the pooled value of s. Then the desired confidence level is chosen. The uncertainty of the difference between the two means is calculated as:
$$U_D = t\sqrt{V_A + V_B} \qquad (1.14)$$
using a value for t, from the t-distribution table, corresponding to the probability level chosen and the number of degrees of freedom, (nA+ nB - 2). If the difference between the two means being examined is not greater than the uncertainty in the difference, UD , the means are not considered to be different.
Case 2: If it is thought that $s_A$ and $s_B$ might be significantly different from each other, then the variances of the means are:
$$V_A = \frac{s_A^2}{n_A}, \qquad V_B = \frac{s_B^2}{n_B} \qquad (1.15)$$
The effective number of degrees of freedom, f, is:
$$f = \frac{(V_A + V_B)^2}{\dfrac{V_A^2}{n_A - 1} + \dfrac{V_B^2}{n_B - 1}} \qquad (1.16)$$
The result from the above equation is rounded to the nearest whole number.
Finally,

$$U_D = t^{*}\sqrt{V_A + V_B} \qquad (1.17)$$
using the f value calculated above and the desired confidence level to determine the t* value to be selected from the t-table. If the difference between the two means being examined is not greater than the uncertainty in the difference, UD , the means are not significantly different.
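Case 2 can be sketched as follows (Python; the two data sets are invented, and the final t* lookup is left to the t-table):

```python
# Sketch of Eqs. 1.15-1.16 with invented data; t* must still be taken from
# the t-table for the computed degrees of freedom and chosen confidence level.
a = [2.1, 2.4, 2.2, 2.6]
b = [1.5, 1.8, 1.4, 1.9]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

V_a = sample_var(a) / len(a)   # variance of mean A (Eq. 1.15)
V_b = sample_var(b) / len(b)   # variance of mean B
# Effective degrees of freedom (Eq. 1.16), rounded to a whole number.
f = round((V_a + V_b) ** 2 /
          (V_a ** 2 / (len(a) - 1) + V_b ** 2 / (len(b) - 1)))
diff = abs(mean(a) - mean(b))
# U_D = t_star * math.sqrt(V_a + V_b)   (Eq. 1.17, once t* is looked up)
```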
Example: The following data were obtained for PCB in fish tissues from two different rivers. Are the two populations of fish contaminated to a significantly different degree at the 95% confidence level?
River A (ng/g) | River B (ng/g) |
2.34 | 1.55 |
2.66 | 1.82 |
1.99 | 1.34 |
1.91 | 1.88 |
First the means and standard deviations of each group must be calculated. The mean for A is 2.225 and SD is 0.2988. For B the mean is 1.648 and SD is 0.2417.
The variances of each of the means are:

VA = sA²/nA = (0.2988)²/4 = 0.0223 and VB = sB²/nB = (0.2417)²/4 = 0.0146
The effective number of degrees of freedom is:

f = (0.0223 + 0.0146)² / [(0.0223)²/3 + (0.0146)²/3] = 5.7, which rounds to 6.
The t value for 95% CL and 6 degrees of freedom is 2.45.
The difference between the two means is 2.225 − 1.648 = 0.577. The uncertainty in the difference is UD = 2.45 × √(0.0223 + 0.0146) = 0.47. The difference is larger than its uncertainty, so the two populations of fish are contaminated to significantly different degrees at the 95% CL.
The F-test is used to compare the precision of two sets of analytical measurements. The measurements can be from different methods, laboratories, or instruments. They can also be a series of different samples from different populations, as in the example above. If s1 and s2 are the standard deviations from the two measurements, the F value is calculated as:
$$F = \frac{s_1^2}{s_2^2}, \quad s_1 > s_2 \qquad (1.18)$$
The value of F is compared to a critical value, Fc, from standard tables. The value of Fc depends upon the degrees of freedom of the two measurements, and the chosen level of confidence. If the calculated F does not exceed the Fc from the table, then it can be concluded that the standard deviations do not differ.
Example: To compare the variability in the PCB content of the fish in the two rivers, we can apply an F-test to the data given in the previous example:

F = (0.2988)² / (0.2417)² = 1.53
F from the table for 3 degrees of freedom and 95% CL is 9.27. The calculated F is less than that from the table so the variation within each population can be considered to be the same.
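The arithmetic in this example is a one-liner; the sketch below reuses the standard deviations quoted in the fish example:

```python
# F-test (Eq. 1.18): ratio of the larger variance to the smaller.
s_a = 0.2988   # SD of river A results (from the example above)
s_b = 0.2417   # SD of river B results
F = max(s_a, s_b) ** 2 / min(s_a, s_b) ** 2
# F is about 1.53, below the tabulated critical value for 3 and 3 degrees
# of freedom at the 95% CL, so the precisions do not differ significantly.
```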
When several measurements are taken, certain values may be unusually different from the others in the data set. These points are called outliers. It is not always easy to tell from statistics whether an outlier is due to the inherent random error present in the measurement or to some identifiable error. It is always best to search carefully for a reason before a measurement is discarded as an outlier. When a system or population is not well known, or when only a few measurements have been made on it, it is very difficult to determine with a high degree of confidence that an outlier is truly due to an error, and not just an extreme sample in the normal distribution.
For example, ten repeat measurements (in µg/g) for lead in a soil sample are as follows: 1.0, 1.1, 1.2, 0.9, 1.0, 0.8, 1.0, 1.1, 0.9, 2.8. The last number looks like an outlier and may have been caused by instrument malfunction, human error, or contamination. One reason why it is immediately suspected is that it does not fit a normal distribution.
A simple and rapid method of determining whether an outlier may be rejected is to divide the difference between the questioned value and the mean by the standard deviation:

ratio = |xsuspect − x̄| / s

If s is well defined and this ratio is less than 4, the value should be retained. If s has been estimated from only a few measurements, the ratio should reach 6 before the value is discarded, at a confidence level of about 98%. For the data set above, x̄ = 1.18 and s = 0.58, so the ratio is (2.8 − 1.18)/0.58 = 2.8, and the last value is not rejected as an outlier by this test.
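The ratio test, applied to the soil lead data above, can be sketched as:

```python
import math

# Ten repeat lead measurements (µg/g) from the example above.
data = [1.0, 1.1, 1.2, 0.9, 1.0, 0.8, 1.0, 1.1, 0.9, 2.8]

mean = sum(data) / len(data)
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))
ratio = abs(max(data) - mean) / s
# The ratio is about 2.8; since s was estimated from only ten points, a
# value of about 6 would be needed before the suspect point is discarded.
```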
The Dixon test is based on the normal distribution of error. In this test the data are arranged in order of increasing numerical value: x1 < x2 < x3 < ... < xn. The suspect value may be either xn or x1. The ratio τ is calculated as follows, depending upon the number of measurements:
For n = | Ratio | If xn is suspect | If x1 is suspect |
3 to 7 | τ10 | (xn − xn−1)/(xn − x1) | (x2 − x1)/(xn − x1) |
8 to 10 | τ11 | (xn − xn−1)/(xn − x2) | (x2 − x1)/(xn−1 − x1) |
11 to 13 | τ21 | (xn − xn−2)/(xn − x2) | (x3 − x1)/(xn−1 − x1) |
The ratio is compared to critical values from Table 1.5 at a predetermined level of risk of false rejection. If the computed ratio is greater than the tabulated value, the measurement may be considered an outlier at that risk level.
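A minimal Dixon-test sketch follows (Python; the five-point data set is invented, and the critical value for the comparison must still come from the table):

```python
# Dixon ratio for 3 <= n <= 7, testing whether the largest value is suspect.
data = sorted([1.0, 1.1, 1.2, 0.9, 2.8])   # invented example data
# For n = 3 to 7, the ratio when xn is suspect is (xn - xn-1)/(xn - x1).
ratio = (data[-1] - data[-2]) / (data[-1] - data[0])
# The computed ratio is then compared with the tabulated critical value for
# n = 5 at the chosen risk level; if it is larger, the point is an outlier.
```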
From the above discussion, it should be clear that simply reporting an average value is of little use to the reader. At a minimum, one should report the mean, the standard deviation, and the number of measurements (or degrees of freedom); otherwise, the user of the data has no way of applying statistical tests to it. The confidence that the user can place in the reported data is essentially contained in these three parameters.
The determination of the significance or non-significance of measurements is important because an environmental study can usually be framed as a hypothesis. The data obtained from the measurements are then tested statistically to see whether the hypothesis is supported. What kinds of hypotheses can be tested? If the question is "Is this river polluted by PCB?", the hypothesis will be stated as "The river contains a significantly higher concentration of PCB than another river which we consider to be clean (i.e., our background site)." Another study might look at tests of an emission control system: "Is the emission of SO2 from this stack too high?" The hypothesis can be stated as: "The concentration of SO2 in the stack effluent is not significantly higher than the concentration stated in the company's release requirement." The statistical tests discussed above can be used to test these hypotheses, as long as all the required data have been collected: the means and standard deviations of both the area of interest and the background or standard to which it is being compared.
1. Benzene is present in air at 10 ppmv. Express this concentration in µg/liter and in mg/m³.
2. Ca²⁺ is found in a water sample at a level of 10 µg/liter. How much calcium, expressed as calcium carbonate, is present in 1 m³ of the water?
3. What are the major steps in the process of performing an environmental analysis?
4. Distinguish between random and systematic errors. Which of these can be treated statistically?
5. Explain how it is possible to have high precision without high accuracy.
6. For the following set of analyses of an urban air sample for carbon monoxide, determine the mean, standard deviation, median, and coefficient of variation: 325, 320, 334, 331, 280, 331, 338 µg/m³.
7. In the above data set, is there any value which can be discarded as an outlier?
8. Samples of bird eggs were analyzed for DDT residues. The samples were collected from two different habitats. The question is: Are the two habitats different from each other in the amount of DDT to which these birds are exposed? The data reported are:
| Mean Conc. DDT (ppb) |
Area 1 | |
Area 2 | |
At the 95% CL, are the two sets significantly different or not?
9. Soil samples were collected at different areas surrounding an abandoned mine and analyzed for lead. At each area several samples were taken. The soil was extracted with acid, and the extract analyzed using flame atomic absorption spectrometry. The following data were obtained:
Area | Number of samples | Pb concentration in ppm |
A | 4 | 1.2, 1.0, 0.9, 1.4 |
B | 5 | 0.7, 1.0, 0.5, 0.6, 0.4 |
C | 3 | 2.0, 2.2, 2.5 |
D | 5 | 1.4, 1.1, 0.9, 1.7, 1.5 |
E | 4 | 1.9, 2.3, 2.5, 2.5 |
Calculate:
a) The overall mean of all the measurements.
b) The pooled estimate of the standard deviation of the method.
c) The 95% and 80% confidence intervals for the measurement at each area.
d) Which of the above areas show mean Pb concentrations that are not significantly different from each other at a 90% CL, indicating a similar pattern of contamination?
10. The analysis of waste water for benzene using purge-and-trap GC/MS yielded a pooled standard deviation of 0.5 µg/L. A waste water sample from an oil refinery showed a benzene concentration of 5.5 µg/L. Calculate the 75, 85, 95, and 99% confidence intervals if the reported concentration was based upon:
a) a single analysis
b) the mean of 5 analyses
c) the mean of 15 analyses
11. Distinguish between the following terms:
a) Sensitivity and detection limit
b) Limit of quantitation and limit of linearity
c) quantitation by calibration curve and the method of standard addition.
d) Confidence interval and confidence limits
12. The following data were obtained in calibrating a halogen-specific GC detector. A linear relationship between detector response in mV and concentration of dichloroethane (DCE) is expected. Analysis of calibration standards yielded the following results:
DCE concentration, ng/mL | Detector output, mV |
1.0 | −54.0 |
2.1 | −28.2 |
3.05 | +2.8 |
4.0 | +32.2 |
5.05 | +65.8 |
a) Plot the calibration data and draw a line through the points by eye.
b) Determine the best straight line by a least-squares fit.
c) Calculate the concentration of an unknown for which the detector output was 7.8 mV.
d) What is the calibration sensitivity?
13. The following calibration data were obtained for a total organic carbon (TOC) analyzer measuring TOC in water:
Concentration, µg/L | Number of replicates | Mean signal | Standard deviation |
0.0 | 20 | 0.03 | 0.008 |
6.0 | 10 | 0.45 | 0.0084 |
10.0 | 7 | 0.71 | 0.0072 |
19.0 | 5 | 1.28 | 0.015 |
a) Calculate the calibration sensitivity
b) What is the detection limit for this method?
c) Calculate the relative standard deviation for each of the replicate sets.