Mathematics/Statistics/Basic Graphical Displays

From Dev Wiki


Box Plot

Bar Chart

Histogram

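A histogram is a visual representation of the distribution of a numerical dataset. The range of the data is divided into groups (buckets, also called bins), and the height of each bar shows how many values fall into that bucket.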

Creating a Histogram

Given a dataset:

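  1. Divide the range of the dataset into meaningful buckets (bins), usually of equal width, and count how many values fall into each bucket.
  2. Graph each bucket as a bar, where the bar's height is the count (or relative frequency) for that bucket.
  3. Numerical buckets have a natural order, so keep the bars in that order. If the buckets are categories with no inherent order, the graph is really a bar chart, and reordering the bars (for example, by height) may give a more meaningful visual representation.
  4. To make the histogram more useful, we may also want a curve that summarizes the shape of the data. The simplest is a frequency polygon, made by connecting the midpoints of the tops of the bars with straight lines. A density curve is a smoothed version of this idea, scaled so that the total area under the curve is 1.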
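The steps above can be sketched in Python. This is a minimal, illustrative sketch that assumes numerical data and equal-width buckets; the function names `histogram_counts`, `frequency_polygon`, and `print_histogram` are hypothetical, and the bars are drawn as text rather than with a plotting library.

```python
def histogram_counts(data, num_bins):
    """Steps 1-2: split the data's range into equal-width buckets and count the values in each."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins or 1.0  # fall back to width 1 if every value is identical
    counts = [0] * num_bins
    for x in data:
        # The maximum value would land one past the last bucket, so clamp it into the last one.
        index = min(int((x - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts


def frequency_polygon(edges, counts):
    """Step 4: (midpoint, count) points; joining them with straight lines gives the polygon."""
    return [((left + right) / 2, count) for left, right, count in zip(edges, edges[1:], counts)]


def print_histogram(edges, counts):
    """Step 3: print one text bar per bucket, keeping the buckets in numerical order."""
    for left, right, count in zip(edges, edges[1:], counts):
        print(f"{left:6.2f} to {right:6.2f} | {'#' * count}")


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, num_bins=4)
print_histogram(edges, counts)
print(frequency_polygon(edges, counts))
```

With this sample data the bucket edges are 1, 3, 5, 7, 9 and the counts are [3, 5, 1, 1], so the single large value (9) shows up as a lone bar on the right.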

Histogram Types

  • Unimodal - A histogram with a single peak (mode).
  • Multimodal - A histogram with two or more peaks. A histogram with exactly two peaks is called bimodal.
  • Symmetric - The left and right halves of the histogram are approximately mirror images, and the mean and median are approximately equal.
  • Skewed - Most of the data is clumped toward one side, with a longer tail stretching toward the other, creating a non-symmetric distribution.
    • Left/Negatively Skewed - The left-side tail is longer. The mean is typically to the left of (less than) the median.
    • Right/Positively Skewed - The right-side tail is longer. The mean is typically to the right of (greater than) the median.
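The relationship between the mean and the median can be checked numerically. The sketch below is a rough rule of thumb rather than a formal test: it compares the mean with the median, scaled by the standard deviation. The function name `describe_skew` and the `tolerance` cutoff are hypothetical choices made for illustration.

```python
import statistics


def describe_skew(data, tolerance=0.05):
    """Guess the shape of a distribution from the gap between its mean and median.

    tolerance is an illustrative cutoff, measured in standard deviations.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = statistics.pstdev(data) or 1.0  # avoid dividing by zero for constant data
    gap = (mean - median) / spread
    if gap > tolerance:
        return "right (positively) skewed"  # mean pulled toward the long right tail
    if gap < -tolerance:
        return "left (negatively) skewed"  # mean pulled toward the long left tail
    return "approximately symmetric"


print(describe_skew([1, 2, 2, 3, 3, 3, 4, 4, 5, 9]))
```

Because the mean is only typically on the tail side of the median, a real analysis would also look at the histogram itself rather than rely on this single number.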


Q-Q Plot

Also known as a "quantile-quantile" plot.

Graphs the quantiles of one data set against the quantiles of another, along with a 45-degree reference line (y = x).
One data set's quantiles are mapped across the x-axis and the other's across the y-axis.
If both data sets come from populations with the same distribution, the points will fall approximately along the reference line. If the distributions have the same shape but differ in location or scale, the points still form a roughly straight line, just not the 45-degree one.
The farther the points stray from a straight line, the more likely the two sets come from populations with different distributions.
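A minimal Python sketch of this comparison, using the standard library's `statistics.quantiles` (Python 3.8+). The helper names `qq_pairs` and `max_gap_from_reference` are hypothetical, and measuring the largest vertical gap from y = x is just one simple way, assumed here, to summarize how far the points stray.

```python
import statistics


def qq_pairs(sample_x, sample_y, n=10):
    """Pair the n-quantile cut points of two samples as (x-quantile, y-quantile) points."""
    qx = statistics.quantiles(sample_x, n=n, method="inclusive")
    qy = statistics.quantiles(sample_y, n=n, method="inclusive")
    return list(zip(qx, qy))


def max_gap_from_reference(pairs):
    """Largest vertical distance between a point and the 45-degree reference line y = x."""
    return max(abs(y - x) for x, y in pairs)


sample_a = [2, 4, 4, 5, 6, 7, 7, 8, 9, 11, 12, 13]
sample_b = [x + 10 for x in sample_a]  # same shape, shifted by 10

print(max_gap_from_reference(qq_pairs(sample_a, sample_a)))
print(max_gap_from_reference(qq_pairs(sample_a, sample_b)))
```

Comparing a sample with itself gives a gap of 0, since every point lies on y = x. Shifting every value by 10 leaves the shape unchanged, so the points still form a straight line, but that line sits about 10 units above the reference line.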

Example

For example, given a data set we want to know more about, we might ask, "Is this data normally distributed?"

To answer this question, we can use a Q-Q plot.

  1. Sort your data set and divide it into an appropriate number of quantiles. For a small data set, each sorted value can represent its own quantile.
  2. Find comparison values that are known to follow a normal distribution, either another data set or, more commonly, the theoretical quantiles of the normal distribution (we'll call this our "comparison data").
  3. Divide our comparison data into the same number of quantiles. For example, if our data set is divided into 27 quantiles, then we divide our comparison data into 27 quantiles as well.
  4. At this point, we can start creating our graph. The quantiles of our comparison data map across the x-axis, and the quantiles of our data set map across the y-axis. For each quantile, draw a dotted guide line at its position on the corresponding axis.
  5. Where equivalent quantiles intersect, we can draw a point. For example, at the location that quantile 5 of our data set intersects quantile 5 of our comparison data, we draw a point.
  6. After all points are drawn, erase all the dotted lines indicating quantiles.
  7. Finally, draw a straight line through the graph that fits the points as closely as possible. If the two data sets have the same type of distribution, the points should lie close to this line. The farther they stray from it, the less likely it is that the distributions match.
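The procedure above can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8+). In place of a separate comparison data set, this sketch assumes the theoretical quantiles of the standard normal distribution at the plotting positions (i - 0.5) / n, which is one common convention among several. The function names are hypothetical, the line in step 7 is fitted by ordinary least squares, and the Pearson correlation of the points is used as an illustrative measure of how closely they follow that line.

```python
import math
from statistics import NormalDist, mean


def normal_qq_points(data):
    """Steps 1-5: pair each sorted data value with a standard normal quantile, as (x, y)."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), value)
            for i, value in enumerate(ordered, start=1)]


def fit_line(points):
    """Step 7: least-squares straight line through the points, as (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def straightness(points):
    """Pearson correlation of the points: values near 1 mean they hug a straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: exact quantiles of a normal distribution (mean 50, sd 10),
# and a right-skewed, exponential-shaped set built the same way.
normal_like = [NormalDist(50, 10).inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [-10 * math.log(1 - (i - 0.5) / 20) for i in range(1, 21)]

print(fit_line(normal_qq_points(normal_like)))
print(straightness(normal_qq_points(normal_like)), straightness(normal_qq_points(skewed)))
```

For the normal-shaped data the fitted line has a slope near 10 and an intercept near 50 (the data's standard deviation and mean), and the correlation is essentially 1. The skewed data bends away from the line at the upper end, which lowers the correlation.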



Scatter Plot

A plot of paired data, where each pair of values is treated as an (x, y) coordinate. One value maps across the x-axis and the other maps across the y-axis.

Scatter plots are used to see how data clusters together and whether there are any outliers.
They can also show whether part or all of the data has a positive or negative correlation.

  • Positive Correlation - As values on one axis increase, values on the other axis tend to increase as well.
  • Negative Correlation - As values on one axis increase, values on the other axis tend to decrease.
ToDo: Add images of scatter plots and correlation.
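The direction of a correlation can also be measured numerically with Pearson's correlation coefficient r, which ranges from -1 to 1. This is a minimal sketch assuming paired lists of numbers; the function names, the made-up study-time data, and the 0.3 cutoff for calling a correlation "weak" are hypothetical choices for illustration, not a standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_direction(xs, ys, cutoff=0.3):
    """Label the trend a scatter plot of (x, y) would show; cutoff is an illustrative choice."""
    r = pearson_r(xs, ys)
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "weak or none"


# Made-up example data: hours studied vs. exam score.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 64, 70, 74]
print(correlation_direction(hours_studied, exam_score))
```

Pearson's r only measures straight-line trends, so a scatter plot is still worth drawing: it can reveal curved relationships, clusters, or outliers that a single number hides.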