Purpose

  • These are personal notes to organize my knowledge
  • Pick up basic statistics concepts in order to
    • Explore more deeply in statistics
    • Support my research

Target Audience

For those who have not taken a statistics course before

Article Structure

@( use xmind as map of this article )

Content start!

Basic Concepts

What is statistics?

Statistics is the science of collecting, analyzing, interpreting, and presenting (numerical) data.

Population vs. Sample

(Figure describing the relationship between a population and a sample)

Why sampling?

We usually cannot observe (or collect data on) the whole population, so we try to infer the whole picture of the population from a sample. A sample is a portion of the population. A census is the only way to understand the population completely.

Is a sample representative?

(This still needs to be looked into.)

Now we need to introduce some common statistical terms. Let’s get started!

Terminology

Descriptive vs Inferential

Descriptive

  • Graphical or numerical summaries of data.
  • Describing (visualizing or summarizing) a set of data.

Inferential

  • Making a “scientific guess” on unknowns.
  • Trying to say something about the population.

Parameter vs Statistic

  • A numerical summary of a population is a parameter.
  • A numerical summary of a sample is a statistic.
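
The distinction can be made concrete with a small sketch. The population below is made up (10,000 simulated values, not real data), and the names `population`, `sample`, and the sample size of 100 are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated measurements (not real data).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same summary, computed on a sample only.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, because the whole population cannot be collected; the statistic is what we actually compute and then use to infer the parameter.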

Levels of data measurement

  • Nominal.
  • Ordinal.
  • Quantitative: interval or ratio.

Nominal

{% asset_img Nominal.png %}

  • Arithmetic operations cannot be applied to nominal data.
  • The categories have no rank (order).

Ordinal

{% asset_img Ordinal.png %}

  • Arithmetic operations cannot be applied to ordinal data.
  • The categories can be ranked.

Quantitative (interval and ratio) levels

Examples:

  • Interval: degrees in Celsius or Fahrenheit.
  • Ratio: heights, weights, income, prices.

Summary:

  • Qualitative (categorical): nominal, ordinal.
  • Quantitative (numeric): interval, ratio.
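
A rough sketch of which operations make sense at each level may help. All values below are made up, and the variable names are only illustrative. The temperature part shows why Celsius and Fahrenheit are interval scales: a ratio of two temperatures changes when the unit changes, while a ratio of two heights does not.

```python
import statistics

# Nominal: categories without order; counting and the mode are meaningful.
colors = ["red", "blue", "red", "green"]  # made-up values
most_common_color = statistics.mode(colors)

# Ordinal: categories with an order; ranking and the median are meaningful.
order = ["low", "medium", "high"]
ratings = ["medium", "low", "high", "high", "medium"]  # made-up values
ranked = sorted(ratings, key=order.index)
median_rating = ranked[len(ranked) // 2]

# Interval: differences are meaningful, ratios are not.
def c_to_f(celsius):
    return celsius * 9 / 5 + 32

ratio_in_celsius = 20 / 10                     # 2.0
ratio_in_fahrenheit = c_to_f(20) / c_to_f(10)  # 68 / 50 = 1.36

# Ratio: a true zero exists, so ratios survive a change of unit.
ratio_in_cm = 180 / 90
ratio_in_inches = (180 / 2.54) / (90 / 2.54)

print(most_common_color, median_rating)
print(ratio_in_celsius, ratio_in_fahrenheit)
print(ratio_in_cm, ratio_in_inches)
```

So "20 °C is twice as hot as 10 °C" is not a meaningful statement, while "180 cm is twice as tall as 90 cm" is.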

Data visualization

Here are some common charts for data visualization:

  • Frequency distributions
  • Histograms
  • Frequency polygons
  • Line charts
  • Pie charts
  • Bar charts
  • Scatter plot

An Example

How do we get a feel for 731 numbers?

{% asset_img Example.png %}

Frequency distributions

{% asset_img Full_frequency_distributions.png %}

Features

  • Count
  • Grouping data
  • Observe outliers
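
As a minimal sketch of these three features, the snippet below groups numbers into classes and counts them. The values are made up (the original 731 numbers are not reproduced here), and the class width of 25 is an assumption.

```python
from collections import Counter

# Made-up values; the original 731 numbers are not reproduced here.
values = [12, 47, 51, 58, 63, 66, 71, 74, 75, 79, 82, 88, 93, 240]

class_width = 25  # assumed class width

# Map each value to the lower bound of its class, then count per class.
frequency = Counter((v // class_width) * class_width for v in values)

for lower in sorted(frequency):
    upper = lower + class_width
    print(f"[{lower:3d}, {upper:3d}): {frequency[lower]}")
```

Most values fall into the classes starting at 50 and 75, while the single value in the class starting at 225 stands far apart from the rest, which is how a possible outlier shows up in a frequency distribution.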

Histograms

{% asset_img Histograms.png %}

Features

  • Contiguous rectangles

Frequency polygons

{% asset_img Frequency_polygons.png %}

Features

  • Compare multiple frequency distributions

Line charts

{% asset_img Line_charts.png %}

Features

  • Depict a time series data set

Pie charts

{% asset_img Pie_charts.png %}

Features

  • Relative frequency distributions
  • Not suitable for comparing averages
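
A relative frequency is a class count divided by the total count, which is what each slice of a pie chart represents. The sketch below uses made-up categorical observations; the category names are assumptions for illustration only.

```python
from collections import Counter

# Made-up categorical observations (hypothetical, not from the data set).
weather = ["clear", "clear", "mist", "clear", "rain", "mist", "clear", "clear"]

counts = Counter(weather)
total = sum(counts.values())

# Relative frequency = class count / total count; the shares sum to 1.
relative = {category: n / total for category, n in counts.items()}

for category, share in relative.items():
    print(f"{category}: {share:.1%}")
```

Because the shares always sum to 1, a pie chart shows how a whole is divided, not how large an average is in each category.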

Bar charts

{% asset_img Bar_charts.png %}

Features

  • Noncontiguous
  • Visualizing the proportion of each category
  • Demonstrating the differences

Bar charts vs. histograms

{% asset_img Bar_charts_vs_histograms.png %}

A bar chart uses noncontiguous bars to visualize categorical data.

A histogram uses contiguous bars to visualize quantitative data.
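
The rule above can be sketched as a tiny helper. `suggest_chart` is a hypothetical function written for this note, not part of any library, and it only looks at whether the values are numbers.

```python
from numbers import Number

def suggest_chart(values):
    """Hypothetical helper: pick a chart type from the kind of data."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "histogram"  # quantitative data -> contiguous bars
    return "bar chart"      # categorical data -> noncontiguous bars

print(suggest_chart([3.2, 4.8, 5.1]))
print(suggest_chart(["spring", "summer", "fall"]))
```

This check is deliberately naive: categories that are stored as numeric codes (for example 1 to 4 for the seasons) would be misread as quantitative, so the level of measurement still has to be decided by a person.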

Scatter Plot

{% asset_img Scatter_plot.png %}

Both variables are measured on quantitative scales.

Bike rental example (2011, 2012)

Resource & Reference

Statistics and Data Analysis for Engineers Part 1: Introduction and Descriptive Statistics, Ling-Chieh Kung, NTU IM @(shareSlide link)