Mathematics/Statistics/Core Measurements

From Dev Wiki
Revision as of 13:43, 11 May 2020 by Brodriguez (talk | contribs) (Add mean notation)

Below are some of the most basic and regularly used measurements in statistics.
All of these measurements are used to gather information about a list of items.

Note: Most of these are easier to use when the list of items is sorted by some meaningful ordering. Some, such as the Median, only work on sorted lists.


Mean/Average

The "mean" and "average" are effectively two different words for the same thing.
This measurement attempts to find the most "middle" value of a set of items.

Tip: Often, this is represented as \bar{x}, pronounced "x bar".

Standard (Unweighted) Mean

Unweighted Mean is what most people think of when someone says "mean" or "average".
To compute it, add all values in the list together, then divide this sum by the total count of original values.

Scientific Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n]:
 \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

Direct Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n]:
 \bar{x} = \frac{x_1 + x_2 + ... + x_{n-1} + x_n}{n}

Example:

Given a list of [1, 4, 7, 5, 9, 9, 2, 10].
 
Step 1) Sum: 1 + 4 + 7 + 5 + 9 + 9 + 2 + 10 = 47
Step 2) Divide sum by count: 47 / 8 = 5.875
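
The two steps above can be sketched in Python. This is a minimal illustration, not a reference implementation; the function name `unweighted_mean` is a hypothetical one chosen for this page.

```python
def unweighted_mean(values):
    """Return the sum of the values divided by the count of values."""
    if not values:
        # The mean of zero items would divide by zero, so reject it explicitly.
        raise ValueError("mean of an empty list is undefined")
    total = sum(values)          # Step 1) Sum
    return total / len(values)   # Step 2) Divide sum by count


print(unweighted_mean([1, 4, 7, 5, 9, 9, 2, 10]))  # 5.875
```

In everyday code, the standard library's `statistics.mean` covers the same calculation.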

Weighted Mean

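Weighted Mean is similar to the unweighted mean, except that each value has a "weight" associated with it, describing its relative importance.
Each value is multiplied by its weight before summing, and the sum is divided by the total of the weights rather than by the count of values.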

Scientific Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n], with associated weights [w_1, w_2, ..., w_{n-1}, w_n]:
 \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}

Direct Notation:

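Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n], with associated weights [w_1, w_2, ..., w_{n-1}, w_n]:
 \bar{x} = \frac{w_1 x_1 + w_2 x_2 + ... + w_{n-1} x_{n-1} + w_n x_n}{w_1 + w_2 + ... + w_{n-1} + w_n}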

Example:

Given a list of [1, 4, 7, 5, 9, 9, 2, 10], where the first 4 values are twice as important as the last 4 (weights of [2, 2, 2, 2, 1, 1, 1, 1]).
 
Step 1) Sum: 2*1 + 2*4 + 2*7 + 2*5 + 1*9 + 1*9 + 1*2 + 1*10 = 64
Step 2) Divide sum by weights: 64 / (2 + 2 + 2 + 2 + 1 + 1 + 1 + 1) = 64 / 12 = 5.333
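
The same calculation can be sketched in Python. This is only an illustration under a few assumptions: the function name `weighted_mean` is hypothetical, and the weights are taken to be one number per value, in the same order as the values.

```python
def weighted_mean(values, weights):
    """Return sum(weight * value) divided by sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        # Covers both empty input and weights that cancel out to zero.
        raise ValueError("the weights must not sum to zero")
    # Step 1) Sum each value multiplied by its weight.
    weighted_sum = sum(w * x for x, w in zip(values, weights))
    # Step 2) Divide by the sum of the weights.
    return weighted_sum / total_weight


values = [1, 4, 7, 5, 9, 9, 2, 10]
weights = [2, 2, 2, 2, 1, 1, 1, 1]
print(round(weighted_mean(values, weights), 3))  # 5.333
```

When every weight is 1, the result is the same as the unweighted mean.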


Median

Like the mean, the median attempts to find the most "middle" item of a set of items.
However, instead of doing so by literal value, it does this by position: once the list is sorted, half of the items fall on each side of the median.

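For sets with an odd number of items, the median is the exact middle number.

Given a list of [1, 2, 3, 5, 7], the median is 3.

For sets with an even number of items, there is no single middle number, so the median is the mean of the two middle numbers.

Given a list of [1, 2, 3, 4, 5, 7], the two middle numbers are 3 and 4, so the median is (3 + 4) / 2 = 3.5.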
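
A minimal Python sketch of the median follows. It assumes the common convention that, for an even number of items, the median is the mean of the two middle values. The function name `median` is chosen for illustration, and the sketch sorts a copy of the input so the caller's list does not need to be sorted beforehand.

```python
def median(values):
    """Return the middle item of the sorted values."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)  # the median only makes sense on sorted data
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: exactly one middle item.
        return ordered[middle]
    # Even count: take the mean of the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 3, 5, 7]))     # 3
print(median([1, 2, 3, 4, 5, 7]))  # 3.5
```

The standard library's `statistics.median` follows the same convention.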


Mode

The mode is the value that occurs most frequently in a set of items.

Given a list of [1, 2, 2, 3, 3, 4, 4, 4], the mode is 4.
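
Counting occurrences can be sketched in Python with `collections.Counter`. The function name `modes` is hypothetical. Returning every value tied for the highest count is an assumption of this sketch, since the text above does not say how ties should be handled.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the value(s) that occur most frequently."""
    if not values:
        raise ValueError("mode of an empty list is undefined")
    counts = Counter(values)       # maps each value to its number of occurrences
    highest = max(counts.values())
    # Assumption: on a tie, report every value that shares the highest count.
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 2, 3, 3, 4, 4, 4]))  # [4]
```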


Range

Range is the difference between the lowest and highest values. Unlike the mean, median, and mode, it does not attempt to find the most "middle" value of a set of items.

Instead, it attempts to measure how much values tend to spread apart in a given set.

Given a list of [2, 4, 5, 7, 8], the lowest and highest values are 2 and 8.
Thus, the range is 8 - 2 = 6.

Note that the range can be misleading when the data set has outliers, because it depends only on the two most extreme values.

If we introduce a new value of 100 to our above list, we get [2, 4, 5, 7, 8, 100].
The lowest and highest values are now 2 and 100.
Thus, the range is now 100 - 2 = 98.
This isn't very descriptive of our data anymore, since five of the six values still lie between 2 and 8.
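
Both range calculations above can be reproduced with a short Python sketch. The function name `value_range` is a hypothetical one, chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    if not values:
        raise ValueError("range of an empty list is undefined")
    return max(values) - min(values)


print(value_range([2, 4, 5, 7, 8]))       # 6
print(value_range([2, 4, 5, 7, 8, 100]))  # 98
```

Adding the single outlier of 100 changes the result from 6 to 98, even though the other values are unchanged.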