Chapter One

Statistics: The Art and Science of Learning from Data.

Alier Ëë Reng https://www.alierwaaireng.com/
01-06-2019

Section 1.1

An Overview of Statistics

What You Should Learn

A Definition of Statistics

Statistics is the science of collecting, organizing, analyzing and interpreting data in order to make decisions.

Definition

Example 1

Identifying Data Sets

In a recent survey, 3002 adults in the United States were asked if they read news on the Internet at least once a week. Six hundred of the adults said yes. Identify the population and the sample. Describe the data set. (Source: Pew Research Center)

Solution
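
A minimal sketch of the arithmetic behind Example 1, assuming every one of the 3002 respondents answered either yes or no (the survey text does not say so explicitly). The variable names are illustrative only.

```python
# Counts taken from Example 1; a yes/no-only response is an assumption.
sample_size = 3002   # adults surveyed (the sample)
yes_count = 600      # adults who said they read news online at least weekly

no_count = sample_size - yes_count          # remaining respondents
yes_proportion = yes_count / sample_size    # sample proportion of "yes"

print(f"yes: {yes_count}, no: {no_count}")
print(f"sample proportion of yes: {yes_proportion:.3f}")
```

Under that assumption, the sample is the set of 3002 responses, the population is the responses of all adults in the United States, and the data set consists of 600 yes responses and 2402 no responses.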

Practice Problem 1

The U.S. Department of Energy conducts weekly surveys of approximately 900 gasoline stations to determine the average price per gallon of regular gasoline. On December 29, 2003, the average price was $1.478 per gallon. Identify the population and the sample. (Source: U.S. Department of Energy)

Note: Whether a data set is a population or a sample usually depends on the context of the real-life situation.

Example 2
Distinguishing between a Parameter and a Statistic

Decide whether the numerical value describes a population parameter or a sample statistic. Explain your reasoning.

  1. A recent survey of a sample of MBAs reported that the average starting salary for an MBA is less than $65,000. (Source: The Washington Post Company)

  2. Starting salaries for the 667 MBA graduates from the University of Chicago Graduate School of Business increased by 8.5% from the previous year.

  3. In a random check of a sample of retail stores, the Food and Drug Administration found that 34% of the stores were not storing fish at the proper temperature.

Solution

  1. Because the average of $65,000 is based on a subset of the population, it is a sample statistic.

  2. Because the percent increase of 8.5% is based on all 667 graduates’ starting salaries, it is a population parameter.

  3. Because the percent of 34% is based on a subset of the population, it is a sample statistic.
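
The distinction can be sketched in a few lines of Python. The salary figures below are invented for illustration and do not come from the sources cited in the example; the sample is drawn by simple random sampling, which is an assumption made only to have a concrete subset.

```python
import random
import statistics

# Hypothetical starting salaries (in dollars) for a tiny population of graduates.
# These values are invented for illustration; they are not from the source.
population = [58_000, 61_000, 63_500, 64_000, 66_000, 70_000, 72_500, 75_000]

# A parameter describes the entire population.
population_mean = statistics.mean(population)

# A statistic describes a sample, here a random subset of 4 graduates.
rng = random.Random(0)                 # fixed seed so the run is repeatable
sample = rng.sample(population, k=4)
sample_mean = statistics.mean(sample)

print(f"population parameter (mean of all {len(population)}): {population_mean:.2f}")
print(f"sample statistic (mean of {len(sample)}): {sample_mean:.2f}")
```

The population mean is fixed, while the sample mean changes with the seed, which is why a value computed from a subset is reported as a statistic rather than a parameter.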

Branches of Statistics

The study of statistics has two major branches: descriptive statistics and inferential statistics.

Example 3

Descriptive and Inferential Statistics

Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?

  1. A large sample of men, aged 48, was studied for 18 years. For unmarried men, approximately 70% were alive at age 65. For married men, 90% were alive at age 65. (Source: The Journal of Family Issues)

  2. In a sample of Wall Street analysts, the percentage who incorrectly forecasted high-tech earnings in a recent year was 44%. (Source: Bloomberg News)

Solution

  1. Descriptive statistics involves statements such as “For unmarried men, approximately 70% were alive at age 65” and “For married men, 90% were alive at age 65.” A possible inference drawn from the study is that being married is associated with a longer life for men.

  2. The part of this study that represents the descriptive branch of statistics involves the statement “the percentage of Wall Street analysts who incorrectly forecasted high-tech earnings in a recent year was 44%.” A possible inference drawn from the study is that the stock market is difficult to forecast, even for professionals.

Section 1.2

Data Classification

What You Should Learn

Types of Data

There are two major types of data: quantitative data and qualitative data.

Definition

Quantitative data consist of numerical measurements or counts. Examples: height, weight, speed of a car, number of houses, etc.

Qualitative data consist of attributes, labels, or nonnumerical entries. Examples: gender, race, country of origin, etc.

Levels of Measurement

Data at the nominal level of measurement are qualitative only. Data at this level are categorized by using names, labels, or qualities. E.g. Zip codes, names of network affiliates, etc. No mathematical computations can be made at this level.

Data at the ordinal level of measurement are qualitative or quantitative. Data at this level can be arranged in order, but differences between data entries are not meaningful. E.g. Grammy Awards, letter grades, movie ranking, etc.

Data at the interval level of measurement are quantitative. The data can be ordered, and you can calculate meaningful differences between data entries. At the interval level, a zero entry simply represents a position on a scale; the entry is not an inherent zero. E.g. temperature, State Gov’t Tax collections by year, etc.

Data at the ratio level of measurement are similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed so one data value can be expressed as a multiple of another. E.g. home prices, volumes, fish lengths, etc.
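
The difference between the interval and ratio levels can be illustrated with temperature. The sketch below assumes two Celsius readings chosen for illustration and uses the Kelvin scale, which has an inherent zero, as a ratio-level counterpart; the Kelvin comparison is not part of the original text.

```python
# Two temperatures measured on the Celsius scale (interval level).
# The readings are illustrative values, not data from the source.
temp_a_c = 20.0
temp_b_c = 10.0

# Differences are meaningful at the interval level.
difference_c = temp_a_c - temp_b_c

# Kelvin has an inherent zero (ratio level); convert by adding 273.15.
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15
difference_k = temp_a_k - temp_b_k

# The naive Celsius ratio suggests "twice as hot", but the Kelvin ratio
# shows the actual multiple is far smaller.
ratio_c = temp_a_c / temp_b_c
ratio_k = temp_a_k / temp_b_k

print(f"difference: {difference_c:.1f} C, {difference_k:.1f} K")
print(f"ratio in Celsius: {ratio_c:.3f}, ratio in Kelvin: {ratio_k:.3f}")
```

The difference is the same on both scales, but only the Kelvin ratio can be read as "one value is a multiple of another", because zero on the Celsius scale is just a position, not an absence of temperature.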

Section 1.3

Experimental Design

What You Should Learn

Experimental Design

The purpose of any statistical study is to use sample information to make data-informed decisions about a general population of interest.

Guides for an Experimental Design

Data Collection

Research data can be collected in several ways depending on the focus of one’s study. Below are four methods of data collection:

A major distinction between an observational study and an experiment is that in an observational study the researcher does not manipulate the subjects, whereas in an experiment the researcher applies a treatment to one group of subjects (the treatment group) and gives a placebo to another (the control group). An experiment in which neither the researcher nor the subjects know which subjects are receiving the placebo is called a double-blind experiment. An experiment in which the researcher knows which subjects are receiving the placebo (control group) and which ones are not (treatment group), but the subjects do not, is called a single-blind experiment.
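
The split into a treatment group and a control group can be sketched as follows. The text does not say how subjects are allocated, so random assignment is assumed here as one common approach; the function name `assign_groups` and the subject IDs are hypothetical.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs, invented for illustration.
subjects = [f"S{i:02d}" for i in range(1, 11)]
treatment, control = assign_groups(subjects, seed=42)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

In a blinded design, the mapping from subject ID to group would be withheld from the subjects (single-blind) or from both the subjects and the researcher (double-blind) until the study ends.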

Sampling Techniques

When collecting data, it is imperative to watch for bias. Below are the types of sampling methods that are used to collect unbiased data:

Abuses of Statistics

Statistics can be used, wittingly or unwittingly, to mislead readers. For instance, a researcher may deliberately choose a biased sample to support a desired conclusion. In another situation, a researcher may word questions in a way that encourages respondents, intentionally or not, to answer in a certain way.
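
The effect of a deliberately biased sample can be sketched with made-up numbers. The population below is a hypothetical set of scores from 1 to 100, and the comparison sample is drawn by simple random sampling; both are assumptions for illustration, not data or methods from the text.

```python
import random
import statistics

# Hypothetical population: scores from 1 to 100, invented for illustration.
population = list(range(1, 101))
population_mean = statistics.mean(population)

# Biased sample: deliberately keep only the 20 highest scores.
biased_sample = sorted(population)[-20:]
biased_mean = statistics.mean(biased_sample)

# Random sample of the same size, for comparison.
rng = random.Random(1)                 # fixed seed so the run is repeatable
random_sample = rng.sample(population, k=20)
random_mean = statistics.mean(random_sample)

print(f"population mean:    {population_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
```

The biased sample overstates the population mean by construction, whereas the random sample has no built-in tendency to land above or below it.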

Citation

For attribution, please cite this work as

Reng (2019, Jan. 6). Reng Data Science Institute: Chapter One. Retrieved from https://www.rengdatascience.io/posts/2019-01-06-chapter-one/

BibTeX citation

@misc{reng2019chapter,
  author = {Reng, Alier Ëë},
  title = {Reng Data Science Institute: Chapter One},
  url = {https://www.rengdatascience.io/posts/2019-01-06-chapter-one/},
  year = {2019}
}