Purpose
- These are personal notes to organize my knowledge
- Pick up basic statistics concepts in order to
  - Explore statistics more deeply
  - Support my research
Target Audience
For those who have not taken a statistics course before.
Article Structure
@( use xmind as map of this article )
Content start!
Basic Concepts
What is statistics?
Statistics is the science of collecting, analyzing, interpreting, and presenting (numerical) data.
Population vs Sample
( figure describing the relationship between population and sample )
Why sampling?
We usually cannot observe (or collect) the whole population, so we try to infer the whole picture of the population from a sample. A sample is a portion of the population. A census, meaning a complete enumeration of the population, is the only way to understand the population fully.
Is a sample representative?
(This needs to be surveyed.)
Now we have to introduce some common terms in statistics. Let's get started!
Terminology
Descriptive vs Inferential
Descriptive
- Graphical or numerical summaries of data.
- Describing (visualizing or summarizing) a set of data.
Inferential
- Making a “scientific guess” on unknowns.
- Trying to say something about the population.
Parameter vs Statistic
- A numerical summary of a population is a parameter.
- A numerical summary of a sample is a statistic.
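The distinction can be sketched in a few lines of Python. This is only an illustration: the population below is made up with a random generator, and the names `population`, `sample`, `population_mean`, and `sample_mean` are my own choices, not taken from the reference slides.

```python
import random
import statistics

# Hypothetical population: made-up values, only for illustration.
rng = random.Random(0)
population = [rng.randint(150, 190) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed on a sample drawn from the population.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, and the statistic is what we use to guess it.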
Levels of data measurement
- Nominal.
- Ordinal.
- Quantitative: interval or ratio.
Nominal
{% asset_img Nominal.png %}
- Arithmetic operations cannot be applied to nominal data.
- Cannot be ranked
Ordinal
{% asset_img Ordinal.png %}
- Arithmetic operations cannot be applied to ordinal data.
- Can be ranked
Quantitative (interval and ratio) levels
Example:
- Interval: degrees in Celsius or Fahrenheit.
- Ratio: heights, weights, income, prices.
Summary:
- Qualitative (categorical): nominal, ordinal.
- Quantitative (numeric): interval, ratio.
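As a rough sketch of what each level allows, the snippet below picks one summary that is valid at each level. All the data are hypothetical (blood types, clothing sizes, and heights are my own examples), and `size_order` is an assumed ranking that I define by hand.

```python
import statistics

# Hypothetical survey answers, made up for illustration.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]       # nominal
sizes = ["small", "large", "medium", "small", "medium"]   # ordinal
heights_cm = [160.0, 172.5, 168.0, 181.0, 175.5]          # quantitative (ratio)

# Nominal: only counting makes sense, so the mode is a valid summary.
most_common_blood_type = statistics.mode(blood_types)

# Ordinal: categories can be ranked, so we can sort them and take the middle one,
# but an expression like "small + large" has no meaning.
size_order = {"small": 0, "medium": 1, "large": 2}
ranked_sizes = sorted(sizes, key=size_order.__getitem__)
median_size = ranked_sizes[len(ranked_sizes) // 2]

# Quantitative: arithmetic is meaningful, so the mean is a valid summary.
mean_height = statistics.mean(heights_cm)

print(most_common_blood_type, median_size, mean_height)
```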
Data visualization
Here are some common charts for data visualization:
- Frequency distributions
- Histograms
- Frequency polygons
- Line charts
- Pie charts
- Bar charts
- Scatter plots
An Example
How can we get a feel for 731 numbers?
{% asset_img Example.png %}
Frequency distributions
{% asset_img Full_frequency_distributions.png %}
Features
- Count
- Grouping data
- Observe outliers
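A minimal sketch of how such a table can be built, assuming equal-width classes. The values in `data` are made up (they are not the 731 numbers of the example), and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical values, made up for illustration (not the real data set).
data = [12, 47, 33, 58, 21, 36, 45, 39, 95, 28, 41, 52]

def frequency_distribution(values, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

table = frequency_distribution(data, width=20)
for lower, count in table.items():
    print(f"[{lower:3d}, {lower + 20:3d}): {count}")
```

Grouping turns twelve raw numbers into a handful of counts, and a class that sits far from the others with a very small count (here the one containing 95) hints at a possible outlier.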
Histograms
{% asset_img Histograms.png %}
Features
- Contiguous rectangles
Frequency polygons
{% asset_img Frequency_polygons.png %}
Features
- Compare multiple frequency distributions
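A frequency polygon joins the points (class midpoint, frequency) with line segments. The sketch below only computes those points for two hypothetical groups; `polygon_points`, `group_a`, and `group_b` are names and values I made up. Using the same classes for both groups is what makes the two polygons comparable on one chart.

```python
from collections import Counter

# Hypothetical values for two groups, made up for illustration.
group_a = [5, 12, 18, 22, 27, 31, 35, 38]
group_b = [14, 25, 28, 33, 36, 41, 44, 52]

def polygon_points(values, width, n_classes):
    """Return (class midpoint, frequency) pairs, one per class."""
    counts = Counter(v // width for v in values)
    return [(k * width + width / 2, counts.get(k, 0)) for k in range(n_classes)]

# Same class width and number of classes, so the polygons can be overlaid.
points_a = polygon_points(group_a, width=10, n_classes=6)
points_b = polygon_points(group_b, width=10, n_classes=6)

for (mid, freq_a), (_, freq_b) in zip(points_a, points_b):
    print(f"midpoint {mid:5.1f}: A={freq_a} B={freq_b}")
```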
Line charts
{% asset_img Line_charts.png %}
Features
- Depict a time series data set
Pie charts
{% asset_img Pie_charts.png %}
Features
- Relative frequency distributions
- Not suitable for comparing averages
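The slice sizes of a pie chart are relative frequencies, that is, each count divided by the total. A small sketch with made-up category labels (`seasons` is a hypothetical variable, not data from the example):

```python
from collections import Counter

# Hypothetical category labels, made up for illustration.
seasons = ["spring", "summer", "summer", "fall",
           "winter", "summer", "fall", "spring"]

counts = Counter(seasons)
total = sum(counts.values())

# Relative frequency = count / total; these are the shares shown as pie slices.
relative = {category: count / total for category, count in counts.items()}

for category, share in relative.items():
    print(f"{category:7s} {share:.1%}")
```

Because the shares always sum to 1, a pie chart shows parts of a whole, which is why it does not help when comparing quantities such as averages.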
Bar charts
{% asset_img Bar_charts.png %}
Features
- Noncontiguous bars
- Visualize the proportion of each category
- Demonstrate the differences between categories
Bar charts vs histograms
{% asset_img Bar_charts_vs_histograms.png %}
A bar chart uses noncontiguous bars to visualize categorical data.
A histogram uses contiguous bars to visualize quantitative data.
Scatter plots
{% asset_img Scatter_plot.png %}
Both values are measured on quantitative scales.
Bike rental example (2011, 2012)
Resource & Reference
Statistics and Data Analysis for Engineers Part 1: Introduction and Descriptive Statistics, Ling-Chieh Kung, NTU IM @(shareSlide link)