Mathematics/Statistics/Correlation Coefficients

Latest revision as of 17:21, 25 October 2020

Tip: The correlation coefficient is often represented as r.

The correlation coefficient is a value that describes how well a straight line fits the data. It is similar to covariance, except that a correlation coefficient always lies in the range [-1, 1].

A value of exactly 1 indicates a perfect positive linear correlation between x and y: every point lies on a straight line with positive slope. That is, as x increases, so does y, and as y increases, so does x.

A value of exactly -1 indicates a perfect negative linear correlation between x and y: every point lies on a straight line with negative slope. That is, as x increases, y decreases, and as y increases, x decreases.

As values approach 0, the linear correlation becomes weaker and weaker, with 0 indicating no linear correlation between x and y. A value of 0 does not rule out a nonlinear relationship.
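
To make these cases concrete, here is a minimal sketch that treats the correlation coefficient as covariance rescaled by both standard deviations. The helper names `covariance` and `correlation` and the three small datasets are made up for illustration; they are not part of the original page.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance of paired values (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by both standard deviations, which bounds it to [-1, 1]."""
    # covariance(v, v) is the sample variance of v.
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]     # points on a line with positive slope
falling = [-3 * x + 4 for x in xs]   # points on a line with negative slope
parabola = [x * x for x in xs]       # y depends on x, but not linearly

print(correlation(xs, rising))       # perfect positive correlation
print(correlation(xs, falling))      # perfect negative correlation
print(correlation(xs, parabola))     # no linear correlation
```

The parabola case is the one to watch: y is completely determined by x, yet the coefficient is 0 because no straight line captures the relationship.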

There are a few ways to calculate a correlation coefficient.

== Pearson's Correlation Coefficient ==

This is one of the most common ways to calculate a correlation coefficient.

The equation to calculate the Pearson Correlation Coefficient is

<math>r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{(n\sum x^2-(\sum x)^2)(n\sum y^2-(\sum y)^2)}}</math>

Where

* <math>n</math> is the number of (x, y) pairs in our dataset.
* <math>\sum x</math> is the sum of all the original x values in our dataset.
* <math>\sum y</math> is the sum of all the original y values in our dataset.
* <math>\sum x^2</math> is the sum of all our x values, after squaring each one first.
* <math>\sum y^2</math> is the sum of all our y values, after squaring each one first.
* <math>\sum xy</math> is the sum over all our (x, y) pairs, after multiplying each pair together first.
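
The sums above map directly onto code. The following is a minimal sketch rather than a definitive implementation; the function name `pearson_r` and the sample data are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the five running sums."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)                               # sum of x
    sum_y = sum(ys)                               # sum of y
    sum_x2 = sum(x * x for x in xs)               # sum of x squared
    sum_y2 = sum(y * y for y in ys)               # sum of y squared
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x times y

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: sums are 15, 20, 55, 86 and 66, so r = 30 / sqrt(1500).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))  # 0.7746
```

If every x value (or every y value) is identical, the denominator is zero and the coefficient is undefined; this sketch does not handle that case.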

For additional explanation, see [https://youtu.be/jBQz2RGxCek this YouTube video].


== Sample Correlation Coefficient ==

The equation to calculate the Sample Correlation Coefficient is

<math>r = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{S_x}\right)\left(\frac{y_i-\bar{y}}{S_y}\right)</math>

Where

* <math>n</math> is the number of (x, y) pairs in our dataset.
* <math>\bar{x}</math> is the [[Statistics/Core_Measurements#Mean|mean]] of our x values.
* <math>\bar{y}</math> is the [[Statistics/Core_Measurements#Mean|mean]] of our y values.
* <math>S_x</math> is the sample [[Statistics/Core_Measurements#Standard Deviation|standard deviation]] of our x values.
* <math>S_y</math> is the sample [[Statistics/Core_Measurements#Standard Deviation|standard deviation]] of our y values.
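
As a rough sketch of this formula, the code below standardizes each value, multiplies the pairs, and divides the total by n - 1. The names `sample_std` and `sample_r` and the data are hypothetical, and the standard deviation is assumed to be the sample version with an n - 1 denominator, which is what makes the result agree with the Pearson formula based on sums.

```python
from math import sqrt

def sample_std(values):
    """Sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def sample_r(xs, ys):
    """Correlation as the sum of products of standardized values over n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = sample_std(xs)
    s_y = sample_std(ys)
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: means are 3 and 4, deviation products sum to 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(sample_r(xs, ys), 4))  # 0.7746
```

The two formulas on this page are algebraically equivalent, so both produce the same r for the same data; this form is often easier to reason about because each term is a product of standardized distances from the mean.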

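For further explanation, see [https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/scatterplots-and-correlation/v/calculating-correlation-coefficient-r this Khan Academy video].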