Mathematics/Statistics/Correlation Coefficients
Latest revision as of 17:21, 25 October 2020, by Brodriguez
The correlation coefficient is a value that describes how well a straight line fits a set of paired data. It is similar to covariance, except that a correlation coefficient always lies in the interval [-1, 1].
A value of exactly 1 indicates a perfect positive linear correlation between x and y: every point lies on a straight line with positive slope. That is, as x increases, so does y. And as y increases, so does x.
A value of exactly -1 indicates a perfect negative linear correlation between x and y: every point lies on a straight line with negative slope. That is, as x increases, y decreases. And as y increases, x decreases.
As values approach 0, the linear correlation becomes weaker and weaker, with 0 indicating that there is no linear correlation between x and y. Note that x and y can still be related in a non-linear way when the coefficient is 0.
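To make these cases concrete, here is a minimal Python sketch. It treats the correlation coefficient as covariance divided by the two sample standard deviations. The helper names (`covariance`, `correlation`) and all of the data are made up for illustration.

```python
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance of paired values (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance rescaled by both sample standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Illustrative data only (not from any real dataset).
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points exactly on a rising line
falling = [10 - 3 * x for x in xs]   # points exactly on a falling line

sym_x = [-2, -1, 0, 1, 2]
parabola = [x * x for x in sym_x]    # y depends on x, but not linearly

r_rising = correlation(xs, rising)           # approximately 1
r_falling = correlation(xs, falling)         # approximately -1
r_parabola = correlation(sym_x, parabola)    # approximately 0

# Rescaling y multiplies the covariance by the same factor,
# but leaves the correlation coefficient unchanged.
scaled = [100 * y for y in rising]
cov_plain = covariance(xs, rising)
cov_scaled = covariance(xs, scaled)
r_scaled = correlation(xs, scaled)

print(r_rising, r_falling, r_parabola)
print(cov_plain, cov_scaled, r_scaled)
```

The parabola case shows why a coefficient of 0 only rules out a linear relationship: y is completely determined by x there, yet no straight line captures the relationship.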
There are a few ways to calculate a correlation coefficient.
Pearson's Correlation Coefficient
This is one of the most popular ways to calculate a correlation coefficient.
The equation to calculate the Pearson Correlation Coefficient is
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$
Where
- $n$ is the number of (x, y) pairs in our dataset.
- $\sum x$ is the sum of all our original x values in our dataset.
- $\sum y$ is the sum of all our original y values in our dataset.
- $\sum x^2$ is the sum of all our x values, after squaring each one first.
- $\sum y^2$ is the sum of all our y values, after squaring each one first.
- $\sum xy$ is the sum of the products of each x and y pair, after multiplying each pair together first.
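The sums above translate directly into code. The following is a minimal Python sketch of the formula, not a reference implementation. The function name `pearson_r` and the sample data (`hours`, `scores`) are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums, following the formula term by term."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")

    sum_x = sum(xs)                               # sum of x
    sum_y = sum(ys)                               # sum of y
    sum_x2 = sum(x * x for x in xs)               # sum of x squared
    sum_y2 = sum(y * y for y in ys)               # sum of y squared
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x times y

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return numerator / denominator


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 4))  # 0.7746
```

For this data the intermediate sums are 15, 20, 55, 86 and 66, so the numerator is 5(66) - (15)(20) = 30 and the denominator is the square root of (50)(30) = 1500, giving r of roughly 0.77.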
For additional explanation, see this YouTube video.
Sample Correlation Coefficient
The equation to calculate the Sample Correlation Coefficient is
$$r = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{S_x}\right)\left(\frac{y_i-\bar{y}}{S_y}\right)$$
Where
- $n$ is the number of (x, y) pairs in our dataset.
- $\bar{x}$ is the mean of our x.
- $\bar{y}$ is the mean of our y.
- $S_x$ is the sample standard deviation of our x.
- $S_y$ is the sample standard deviation of our y.
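As a rough illustration, this formula can be sketched in Python by standardizing each value and averaging the products over n - 1. The function name `sample_r` and the data are hypothetical. `statistics.stdev` is used because it computes the sample standard deviation (dividing by n - 1), which is what this formula assumes.

```python
from statistics import mean, stdev


def sample_r(xs, ys):
    """Sample correlation coefficient as a scaled sum of z-score products."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")

    x_bar, y_bar = mean(xs), mean(ys)   # means of x and y
    s_x, s_y = stdev(xs), stdev(ys)     # sample standard deviations
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when all x or all y values are equal")

    total = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    )
    return total / (n - 1)


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = sample_r(hours, scores)
print(round(r, 4))  # 0.7746
```

This form is algebraically equivalent to the sum-based formula for Pearson's r, so both give the same value on the same data. The difference is only in how the calculation is organized.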
For further explanation, see this Khan Academy video.