Chapter 1 Overview

**Statistics**: A science that deals with the methods of collecting, organizing, and summarizing data in such a way that **valid** conclusions can be drawn from them.

**Two types** of statistical investigations: descriptive and inferential.

Common goal of statistical investigations: explore characteristics of a large group of items (**population**) based on information about a few (**sample**).

The investigation may be **observational** (information is collected on subjects after the fact) or **experimental** (information is collected from an experiment designed to answer a particular question).

Information is collected in the form of **variables**:

**Qualitative**: describes observations belonging to a set of categories

**Quantitative**: describes observations that take numerical values, either discrete or continuous

A summary measure computed from the collected data is a **statistic**. A summary measure that describes a characteristic of the population is a **parameter**. Usually, a statistic is used to estimate a parameter.
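The distinction between a parameter and a statistic can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population is simulated (made-up heights, not real data), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated values (made-up data).
random.seed(0)
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a summary measure of the whole population.
population_mean = statistics.fmean(population)

# Statistic: the same summary measure computed from a sample.
sample = random.sample(population, 50)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the population is rarely fully observed, so only the statistic is available and it serves as the estimate of the parameter.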

- Stem-and-leaf plot: represents each data value by plotting its first digit (or leading digits) as the stem and its last digit as the leaf. Gives a visual summary of the data while retaining the actual data values.
- Frequency table: data arranged in tabular form to show the number of times an item falls into a particular category (**frequency**) or to show the relative frequency. Requires choosing class widths, class intervals, and class boundaries. Gives information about the chance that a particular item from the population will fall in a certain category.
- Histogram: graphical representation of a frequency table; gives information about the **symmetry** of the data.
- Pie charts and bar charts: useful for showing the percentage of total items falling into a particular category.
- Scatter plots: useful for showing the relationship between two variables collected on the same items (**bivariate** data).
- Time series plot: line plot showing variation in data over time.
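The first two displays in the list above can be sketched in a few lines of code. The scores below are made up for illustration, and the helper names `stem_and_leaf` and `frequency_table` are hypothetical, as is the class width of 10.

```python
from collections import Counter, defaultdict

# Made-up two-digit exam scores used only for illustration.
scores = [52, 58, 61, 64, 64, 67, 70, 73, 73, 73, 78, 81, 85, 89, 94]
WIDTH = 10  # assumed class width


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digit (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


def frequency_table(values, width):
    """Map each class lower bound to (frequency, relative frequency)."""
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    return {lower: (counts[lower], counts[lower] / n) for lower in sorted(counts)}


for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")

for lower, (freq, rel) in frequency_table(scores, WIDTH).items():
    print(f"[{lower}, {lower + WIDTH}): frequency={freq}, relative frequency={rel:.2f}")
```

The stem-and-leaf output keeps every original value recoverable, while the frequency table trades that detail for a compact summary; the relative frequencies sum to 1.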

- Measures of central tendency: mean, median, mode. The **mean** is the arithmetic average of the values; the **median** is the value such that half of the data are less than this value and half are greater than this value; the **mode** is the most frequent value. A weighted mean may also be of interest.
- p-th percentile: value such that, when the data are sorted from smallest to largest, at least p percent of the observations are at or below this value and at least (100-p) percent are at or above this value. The 25^{th}, 50^{th} (median), and 75^{th} percentiles are of particular interest. IQR = 75^{th} percentile - 25^{th} percentile.
- Box plot: graphical display showing the middle 50% of the data as a box, with a line drawn inside the box to indicate the median. Whiskers are drawn from the left edge of the box (25^{th} percentile) to the smallest observation that is not an outlier, and from the right edge of the box (75^{th} percentile) to the largest observation that is not an outlier. Outliers are depicted by individual symbols.
- Measures of dispersion: range, standard deviation.
- Chebyshev’s Rule: for any data set, at least 75% of observations will fall within 2 standard deviations of the mean, and at least 89% will fall within 3 standard deviations of the mean.
- Empirical Rule: for a data set whose distribution is approximately bell-shaped (symmetric), approximately 68% of observations will fall within 1 standard deviation of the mean, approximately 95% within 2 standard deviations, and approximately 99.7% within 3 standard deviations.
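The numerical summaries above can be computed with Python's standard `statistics` module. The following is a minimal sketch on a made-up data set; the helper `share_within` is a hypothetical name, and the quartiles follow the module's default convention, so other software or textbooks may report slightly different values. The loop at the end compares the observed share of values within k standard deviations against the Chebyshev lower bound 1 - 1/k^2, which gives 75% for k = 2 and about 89% for k = 3.

```python
import statistics

# Made-up data set used only for illustration (note the large value 39).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 39]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Quartiles (25th, 50th, 75th percentiles) and interquartile range.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Measures of dispersion.
data_range = max(data) - min(data)
sd = statistics.stdev(data)  # sample standard deviation


def share_within(values, center, spread, k):
    """Fraction of values lying within k * spread of center."""
    return sum(abs(v - center) <= k * spread for v in values) / len(values)


print(f"mean={mean}, median={median}, mode={mode}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, range={data_range}, sd={sd:.2f}")

for k in (2, 3):
    observed = share_within(data, mean, sd, k)
    bound = 1 - 1 / k**2  # Chebyshev lower bound
    print(f"within {k} sd: observed {observed:.2f}, Chebyshev bound {bound:.2f}")
```

Because this data set is skewed by one large value, the mean is pulled above the median, and the Empirical Rule percentages should not be expected to apply; Chebyshev's bound still holds because it makes no assumption about shape.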