Mathematics/Statistics/Core Measurements: Difference between revisions

Revision as of 12:40, 18 May 2020

Below are some of the most basic and regularly used measurements in statistics.
All of these measurements are used to gather information about a list of items.

Note: Most of these are easier to use when the list of items is sorted by some meaningful ordering. Some, such as #Median, only work on sorted lists.

Mean/Average

The "mean" and "average" are effectively two different words for the same thing.
Both attempt to find the most "middle" value in a set of items.

Tip: Often, this is represented as <math>\bar{x}</math>, pronounced "x bar".

Standard (Unweighted) Mean

Unweighted Mean is what most people think of when someone says "mean" or "average".
To compute it, add all values in a list together, then divide this sum by the total count of original values.

Scientific Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n]:

<math>\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i</math>

Direct Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n]:

<math>\bar{x} = \frac{x_1 + x_2 + \dots + x_{n-1} + x_n}{n}</math>

Example:

Given a list of [1, 4, 7, 5, 9, 9, 2, 10].
 
Step 1) Sum: 1 + 4 + 7 + 5 + 9 + 9 + 2 + 10 = 47
Step 2) Divide sum by count: <math>\frac{47}{8}</math> = 5.875
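
The two steps above can be sketched in Python as follows. This is a minimal illustration, and the function name `unweighted_mean` is made up for this page rather than taken from any library.

```python
def unweighted_mean(values):
    """Sum every value, then divide by how many values there are."""
    if not values:
        # The formula divides by n, so an empty list has no mean.
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


data = [1, 4, 7, 5, 9, 9, 2, 10]
print(sum(data))              # Step 1: 47
print(unweighted_mean(data))  # Step 2: 5.875
```

Python's standard library also ships `statistics.mean`, which is usually the better choice outside of a teaching example.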

Weighted Mean

Weighted Mean is similar to the above, except that each value has a "weight" associated with it. Values with larger weights contribute more to the result.

Scientific Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n], with associated weights [w_1, w_2, ..., w_{n-1}, w_n]:

<math>\bar{x} = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}</math>

Direct Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n], with associated weights [w_1, w_2, ..., w_{n-1}, w_n]:

<math>\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_{n-1} x_{n-1} + w_n x_n}{w_1 + w_2 + \dots + w_{n-1} + w_n}</math>

Example:

Given a list of [1, 4, 7, 5, 9, 9, 2, 10], where the first 4 values are twice as important as the last 4 (weights of 2 and 1, respectively).
 
Step 1) Sum: 2*1 + 2*4 + 2*7 + 2*5 + 1*9 + 1*9 + 1*2 + 1*10 = 64
Step 2) Divide sum by weights: <math>\frac{64}{2 + 2 + 2 + 2 + 1 + 1 + 1 + 1}</math> = <math>\frac{64}{12}</math> = 5.333
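
As a rough Python sketch of the same calculation (the name `weighted_mean` and its error handling are assumptions made for illustration):

```python
def weighted_mean(values, weights):
    """Divide the sum of weight * value by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(w * x for w, x in zip(weights, values))
    return weighted_sum / total_weight


data = [1, 4, 7, 5, 9, 9, 2, 10]
weights = [2, 2, 2, 2, 1, 1, 1, 1]  # first four values count double
print(round(weighted_mean(data, weights), 3))  # 5.333
```

If every weight is the same, the result matches the unweighted mean, since the common weight cancels out of the fraction.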

Median

Like the mean, the median attempts to get the most "middle" item given a set of items.
However, instead of doing so by literal value, it does this by position in the sorted list.

For sets with an odd number of items, the median is the exact middle number of the sorted list.

Given a list of [1, 2, 3, 5, 7], the median is 3.

For sets with an even number of items, the median is the middle two numbers of the sorted list.

Given a list of [1, 2, 3, 4, 5, 7], the medians are 3 and 4.

For some applications, you always want a single number as the median, even for sets with an even number of items.
In these instances, average the two median numbers together.

Given a list of [1, 2, 3, 4, 5, 7], the medians are 3 and 4.
We can average these together to get a single value of (3 + 4) / 2 = 3.5.
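
The odd and even cases can be sketched in Python as below. The function name `median` and the `single` flag (which chooses between one averaged value and the two middle values) are hypothetical conveniences for this page.

```python
def median(values, single=True):
    """Return the middle item of the sorted values.

    For an even count, return the average of the two middle items,
    or both of them as a tuple when single=False.
    """
    ordered = sorted(values)  # the median only makes sense on sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    lower, upper = ordered[mid - 1], ordered[mid]
    return (lower + upper) / 2 if single else (lower, upper)


print(median([1, 2, 3, 5, 7]))                   # 3
print(median([1, 2, 3, 4, 5, 7], single=False))  # (3, 4)
print(median([1, 2, 3, 4, 5, 7]))                # 3.5
```

Sorting inside the function means callers do not have to remember to pre-sort the list.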

Mode

The mode is the value that occurs most frequently in a set of items.

Given a list of [1, 2, 2, 3, 3, 4, 4, 4], the mode is 4.
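
A short Python sketch of counting occurrences follows. The page does not say what to do when several values tie for most frequent, so returning all of them (in the function `modes`, a name invented here) is an assumption.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty list is undefined")
    highest = max(counts.values())
    # Assumption: on a tie, report all tied values rather than picking one.
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3, 3, 4, 4, 4]))  # [4]
```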

Range

Range is the difference between the lowest and highest values in a set of items.

Unlike the mean, median, and mode, it does not describe a "middle" value. Instead, it measures how far the values spread apart in a given set.

Given a list of [2, 4, 5, 7, 8], the lowest and highest values are 2 and 8.
Thus, the range is 8 - 2 = 6

Note that the range can be misleading when the data set has outliers, because it depends only on the two extreme values.

If we introduce a new value of 100 to our above list, we get [2, 4, 5, 7, 8, 100].
The lowest and highest values are now 2 and 100.
Thus, the range is now 100 - 2 = 98.
This isn't very descriptive of our data anymore.
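
Both range examples can be reproduced with a few lines of Python. The helper name `value_range` is illustrative only.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    if not values:
        raise ValueError("range of an empty list is undefined")
    return max(values) - min(values)


data = [2, 4, 5, 7, 8]
print(value_range(data))          # 8 - 2 = 6
print(value_range(data + [100]))  # 100 - 2 = 98, dominated by the outlier
```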

Variance

This is an alternative to the Range, which also attempts to measure how much values tend to spread apart in a given set.
Unlike Range, it takes every value into account rather than only the two endpoints, so the result is generally more representative of your data.

This is one of the more regularly used values in statistics, and fairly important to understand.
For a full, detailed explanation, see [this youtube video].

Tip: Often, this is represented as <math>\sigma^2</math>, where <math>\sigma</math> is the lowercase character for "sigma".

Scientific Notation:

<math>\sigma^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}</math>

Direct Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n], with a mean of <math>\bar{x}</math>:

<math>\sigma^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_{n-1} - \bar{x})^2 + (x_n - \bar{x})^2}{n}</math>

Example:

Using our above #Mean/Average example, we have a list of [1, 4, 7, 5, 9, 9, 2, 10] with a mean of 5.875.
 
Step 1) Sum the squared differences from the mean: <math>(1 - 5.875)^2 + (4 - 5.875)^2 + (7 - 5.875)^2 + (5 - 5.875)^2 + (9 - 5.875)^2 + (9 - 5.875)^2 + (2 - 5.875)^2 + (10 - 5.875)^2</math>

Step 2) Sum result: 80.875
Step 3) Divide sum by count: <math>\frac{80.875}{8}</math> = 10.109375
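
The three steps above can be sketched in Python as follows. The function divides by n, matching the formula on this page (often called the population variance). The name `population_variance` is chosen here for illustration.

```python
def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty list is undefined")
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]  # Step 1
    return sum(squared_diffs) / n                      # Steps 2 and 3


data = [1, 4, 7, 5, 9, 9, 2, 10]
print(population_variance(data))  # 10.109375
```

The standard library function `statistics.pvariance` computes the same divide-by-n quantity.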

Standard Deviation

The Standard Deviation is the same as the Variance, but taken a step further.

Note that the variance is meant to determine "how far values in a set stray from our mean". But in doing so, we square each difference so that negative and positive differences do not cancel out, which leaves the result in squared units.
Standard Deviation corrects this by taking the square root of the variance.

The end result is a value that indicates, generally speaking, how far a typical item in our list is from the mean.

Tip: Often, this is represented as <math>\sigma</math>, which is the lowercase character for "sigma".

Scientific Notation:

<math>\sigma = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}}</math>

Direct Notation:

Given a list of n terms, [x_1, x_2, ..., x_{n-1}, x_n], with a mean of <math>\bar{x}</math>:

<math>\sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_{n-1} - \bar{x})^2 + (x_n - \bar{x})^2}{n}}</math>

Example:

Using our above #Mean/Average example, we have a list of [1, 4, 7, 5, 9, 9, 2, 10] with a variance of 10.109375.
 
At this point, all that's missing is the square root.
<math>\sqrt{10.109375}</math> = 3.179524335
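
In Python, this last step is a single square root on top of the variance calculation. The sketch below is self-contained, and the name `population_std_dev` is hypothetical.

```python
import math


def population_std_dev(values):
    """Square root of the divide-by-n variance."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation of an empty list is undefined")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [1, 4, 7, 5, 9, 9, 2, 10]
print(round(population_std_dev(data), 6))  # 3.179524
```

Because the square root undoes the squaring, the result is in the same units as the original values, which is what makes it easier to interpret than the variance.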

Covariance

While variance indicates how values of a single variable tend to behave, covariance indicates how two variables behave in relation to each other.

A value near 0 indicates that there is little or no linear relationship between the variables.

A positive value indicates a positive correlation. That is, as x increases, y tends to increase as well, and vice versa. Larger values suggest a stronger relationship, subject to the scale caveat below.

A negative value indicates a negative correlation. That is, as x increases, y tends to decrease, and vice versa. More negative values suggest a stronger relationship, subject to the same caveat.

Note: Covariance values will change depending on the scale of initial variables.

Scientific Notation:

<math>\frac{1}{n - 1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})</math>

Where

<math>\bar{x}</math> is the mean of our x values.
<math>\bar{y}</math> is the mean of our y values.
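
The formula above can be sketched in Python as shown below. Note that it divides by n - 1, whereas the variance formula earlier on this page divides by n. The function name `sample_covariance` and the sample data are hypothetical, since the page gives no worked covariance example.

```python
def sample_covariance(xs, ys):
    """Sum of paired differences from each mean, divided by n - 1."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: y rises with x in the first case, falls in the second.
xs = [1, 2, 3, 4, 5]
print(sample_covariance(xs, [2, 4, 6, 8, 10]))  # 5.0  (positive)
print(sample_covariance(xs, [10, 8, 6, 4, 2]))  # -5.0 (negative)

# Rescaling x by 10 rescales the covariance by 10 as well.
print(sample_covariance([x * 10 for x in xs], [2, 4, 6, 8, 10]))  # 50.0
```

The last call illustrates the scale note: the relationship between the variables is unchanged, yet the covariance is ten times larger.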

For further explanation, see [https://youtu.be/0nZT9fqr2MU this youtube video].
