# Math 333/333H

Week 1

1. Discussion of syllabus and class standards
2. Collection of student information
3. Overview of probability and statistics
4. Graphical and numerical summaries of data (using StataQuest)

Statistics: A science that deals with the methods of collecting, organizing, and summarizing data in such a way that valid conclusions can be drawn from them.

Two types of statistical investigations: Descriptive and inferential

Common goal of statistical investigations: Explore characteristics of a large group of items (population) based on information about a few (sample).

The investigation may be observational (information is collected on subjects after the fact) or experimental (information is collected from an experiment designed to answer a particular question).

Information collected in the form of variables:

Qualitative: describes observations belonging to a set of categories

Quantitative: describes observations that take numerical values, which may be discrete or continuous

A summary measure computed from the collected data is a statistic. A summary measure that describes a characteristic of the population is a parameter. Usually, a statistic is used to estimate a parameter.

For observational studies, data are collected using random sampling methods (simple random sample, stratified random sample, cluster sample) applied to the target population. For experimental studies, data are collected by carrying out a designed experiment.
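As a rough illustration of two of the sampling methods named above, the sketch below draws a simple random sample and a stratified random sample using Python's standard `random` module. The population of student IDs, the two sections, and the function names are hypothetical, chosen only for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

# Hypothetical population: 100 student IDs split into two made-up sections
population = list(range(1, 101))

def section(student_id):
    return "A" if student_id <= 60 else "B"

srs = simple_random_sample(population, 10, seed=1)
strat = stratified_random_sample(population, section, 5, seed=1)
print("Simple random sample:", sorted(srs))
print("Stratified sample:   ", sorted(strat))
```

The stratified version guarantees that each section contributes exactly five students, which a simple random sample does not.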

### Tabular and Graphical Summaries of Data

1. Stem-and-leaf plot: represents data by plotting the leading digit(s) of each value as the stem and the trailing digit as the leaf. Gives a visual summary of the data while retaining the actual data values.
2. Frequency table: data arranged in tabular form to show the number of times an item falls into a particular category (frequency) or the proportion of items in that category (relative frequency). Need to choose class widths, class intervals, and class boundaries. Gives information about the chance that a particular item from the population will fall in a certain category.
3. Histogram: graphical representation of a frequency table; gives information about the symmetry of the data.
4. Pie charts and bar charts: useful for showing the percentage of total items falling into a particular category.
5. Scatter plots: useful for showing the relationship between two variables collected on the same items (bivariate data).
6. Time series plot: line plot showing variation in data over time.
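The first two summaries in the list can be sketched in a few lines of Python. This is a minimal illustration, not how StataQuest produces them: the scores are made-up data, the function names are hypothetical, and the stem-and-leaf helper assumes non-negative integers with the units digit as the leaf.

```python
from collections import Counter, defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} using the tens part as stem and the units digit as leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

def frequency_table(values, start, width, n_classes):
    """Count values in the classes [start + i*width, start + (i+1)*width)."""
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    table = []
    for i in range(n_classes):
        lo = start + i * width
        freq = counts.get(i, 0)
        table.append((lo, lo + width, freq, freq / n))  # (lower, upper, frequency, relative frequency)
    return table

scores = [52, 67, 71, 58, 83, 75, 64, 91, 78, 66, 72, 85]  # made-up data

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

for lo, hi, freq, rel in frequency_table(scores, start=50, width=10, n_classes=5):
    print(f"[{lo}, {hi}): {freq:2d}  {rel:.3f}")
```

Changing `width` shows how the choice of class width affects the shape suggested by the table, and therefore by the corresponding histogram.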

### Numerical Summaries of Data

1. Measures of central tendency: mean, median, mode. The mean is the arithmetic average of the values; the median is the value such that half of the data are less than it and half are greater; the mode is the most frequent value. The weighted mean may also be of interest.
2. p-th Percentile: Value such that, when the data are sorted from smallest to largest, at least p percent of the observations are at or below this value and at least (100 − p) percent are at or above it. Of particular interest are the 25th, 50th (median), and 75th percentiles. IQR = 75th percentile − 25th percentile.
3. Box-plot: graphical display indicating the middle 50% of the data as a box, with a horizontal line drawn in the box to indicate the median and “whiskers” drawn from the ends of the box to the farthest observations still within 1.5 × (fourth spread) of the ends of the box, where the fourth spread is the IQR. Outliers are depicted by individual symbols.
4. Measures of dispersion: range, mean absolute deviation, standard deviation
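5. Chebyshev’s Rule: For any data set, at least 75% of observations will fall within 2 standard deviations of the mean, and at least 89% will fall within 3 standard deviations of the mean. In general, at least 1 − 1/k² of the observations fall within k standard deviations of the mean, for any k > 1.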
6. Empirical Rule: For a data set whose distribution is somewhat bell-shaped and symmetric, approximately 68% of observations will fall within 1 standard deviation of the mean, approximately 95% will fall within 2 standard deviations of the mean, and approximately 99.7% will fall within 3 standard deviations of the mean.
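The numerical summaries above can be computed with Python's standard `statistics` module, as in the sketch below. The data are made up, and the quartile convention used here (`method="inclusive"`) is only one of several in use, so StataQuest or a textbook may report slightly different quartiles for the same data.

```python
import statistics

data = [52, 58, 64, 66, 67, 71, 72, 75, 78, 83, 85, 91]  # made-up data

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
data_range = max(data) - min(data)
mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation

# Quartiles (25th, 50th, 75th percentiles) and the interquartile range
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

def fraction_within(values, center, spread, k):
    """Fraction of values lying within k * spread of center."""
    return sum(abs(x - center) <= k * spread for x in values) / len(values)

print(f"mean={mean:.2f} median={median} sd={sd:.2f} range={data_range} MAD={mad:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")

# Compare the observed fractions with Chebyshev's lower bound 1 - 1/k^2
for k in (2, 3):
    observed = fraction_within(data, mean, sd, k)
    bound = 1 - 1 / k**2
    print(f"k={k}: observed {observed:.2f}, Chebyshev lower bound {bound:.2f}")
```

Chebyshev's Rule only gives a lower bound, so the observed fractions are typically well above 75% and 89%; for bell-shaped data they should be closer to the Empirical Rule values.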