Mathematics/Statistics/Basic Graphical Displays

From Dev Wiki
Revision as of 00:30, 14 May 2020 by Brodriguez (talk | contribs) (Create page)



Box Plot

Bar Chart

Histogram

Quantile Plot

Q-Q Plot

Also known as a "quantile-quantile" plot.

Plots the quantiles of one data set against the quantiles of another, along with a 45-degree reference line (y = x).
The quantiles of one data set are plotted along the x-axis and the quantiles of the other along the y-axis.
If both data sets come from populations with the same distribution, the plotted points fall approximately along the reference line.
The farther the points stray from the line, the stronger the evidence that the two sets come from populations with different distributions.
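
A minimal sketch of the idea, using only the Python standard library. The function names `qq_points` and `max_gap`, the choice of 20 quantile groups, and the randomly generated samples are all assumptions made for illustration. It pairs up matching quantiles of two samples and measures how far the pairs sit from the reference line.

```python
import random
from statistics import quantiles


def qq_points(sample_x, sample_y, n=20):
    """Return (x, y) pairs of matching quantiles from two samples."""
    # quantiles(..., n=n) returns the n - 1 cut points that split
    # the data into n groups of equal probability.
    qx = quantiles(sample_x, n=n)
    qy = quantiles(sample_y, n=n)
    return list(zip(qx, qy))


def max_gap(points):
    """Largest vertical distance between a point and the line y = x."""
    return max(abs(x - y) for x, y in points)


# Hypothetical samples: two normal, one exponential.
rng = random.Random(0)
normal_a = [rng.gauss(0, 1) for _ in range(2000)]
normal_b = [rng.gauss(0, 1) for _ in range(2000)]
skewed = [rng.expovariate(1.0) for _ in range(2000)]

same = qq_points(normal_a, normal_b)  # points should hug y = x
diff = qq_points(normal_a, skewed)    # points should bend away from y = x

print("max gap, same distribution:     ", round(max_gap(same), 3))
print("max gap, different distribution:", round(max_gap(diff), 3))
```

The returned pairs can be passed to any plotting library as x and y coordinates. The gap measure is only a rough numeric stand-in for judging the plot by eye.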

Example

For example, given a data set we want to know more about, we might ask "is this data normally distributed?"

To answer this question, we can use a Q-Q plot.

  1. Divide the data set into an appropriate number of quantiles. If it's a small data set, each item can represent its own quantile.
  2. Find another data set that is known to be normally distributed (we'll call this our "comparison data").
  3. Divide our comparison data into the same number of quantiles. For example, if our data set is divided into 27 quantiles, then we divide our comparison data into 27 quantiles as well.
  4. At this point, we can start creating our graph. The quantiles of our comparison data map across the x-axis, and the quantiles of our data set map across the y-axis. For each quantile, draw a dotted line across the graph: vertical for the x-axis quantiles, horizontal for the y-axis quantiles.
  5. Where the dotted lines of equivalent quantiles intersect, draw a point. For example, at the location where quantile 5 of our data set intersects quantile 5 of our comparison data, we draw a point.
  6. After all points are drawn, erase all the dotted lines indicating quantiles.
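  7. Finally, draw a straight line through the graph with a slope of 1. If the two data sets have the same distribution, the points should follow this line fairly well. The farther they stray from it, the less likely it is that the distributions match. If the points instead follow a straight line with a different slope or offset, the distributions have the same shape but differ in scale or location.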
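
The steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a definitive implementation. Instead of a second, known-normal data set, it uses theoretical quantiles of a normal distribution fitted to the data's own mean and standard deviation, so the slope-1 reference line applies directly. The function name `normal_qq_points`, the plotting positions `(i + 0.5) / n`, and the sample values are all hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    # Step 1: small data set, so each sorted item is its own quantile.
    ordered = sorted(data)
    n = len(ordered)

    # Steps 2-3: stand-in for the "comparison data". Assumption: a normal
    # distribution with the same mean and standard deviation as the data.
    reference = NormalDist(mean(ordered), stdev(ordered))

    # Probability level for the i-th sorted item (assumed convention).
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]

    # Steps 4-5: x = comparison quantile, y = matching data quantile.
    return list(zip(theoretical, ordered))


# Hypothetical measurements, made up for this example.
data = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.0, 5.9, 4.6, 5.2]
points = normal_qq_points(data)

# Step 7: compare each point against the line y = x.
for x, y in points:
    print(f"comparison={x:6.2f}  data={y:6.2f}  offset={y - x:+.2f}")
```

Small offsets across all points suggest the data is roughly normal. Offsets that grow toward the ends suggest heavier or lighter tails than a normal distribution.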

ToDo: Add images to help describe the example. Possibly reference https://www.youtube.com/watch?v=okjYjClSjOg