Mathematics/Statistics/Data Characteristics: Difference between revisions
Brodriguez (Add min and max)
Brodriguez (Add percentiles and quartiles)
Revision as of 11:47, 12 May 2020
The following details some basic characteristics of data in statistics.
See also Core Measurements.
Min
Given a set of items, the min (minimum) is the lowest value in the set, regardless of how many times it appears.
For example, given a set of [1, 1, 2, 2, 3, 3, 4, 4, 5, 5], 1 is the min.
Max
Given a set of items, the max (maximum) is the highest value in the set, regardless of how many times it appears.
For example, given a set of [1, 1, 2, 2, 3, 3, 4, 4, 5, 5], 5 is the max.
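As a quick illustration, both values can be read off with Python's built-in `min` and `max` functions. The variable names here are only for the example; duplicates in the list do not change the result.

```python
# The example set used for both Min and Max.
values = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

lowest = min(values)   # smallest value in the set
highest = max(values)  # largest value in the set

print(lowest, highest)
```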
Percentile
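Given a set of items, a percentile is a value that marks how far through the sorted data you are. The 25th, 50th and 75th percentiles are commonly used, marking the points 25%, 50% and 75% of the way through the data.
To find them, first find the median of your data. This is the 50th percentile.
The median splits your data into two halves. The median of the lower half is the 25th percentile, and the median of the upper half is the 75th percentile.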
For example, given a set of [1, 2, 3, 4, 5, 6], we have the following percentiles:
- 25th - 2
- 50th - 3.5
- 75th - 5
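The median-of-halves procedure described in this section can be sketched in Python as follows. The function name `percentile_points` is hypothetical, and leaving the middle value out of both halves when the set has an odd length is an assumption of this sketch; the text does not say how to handle that case here.

```python
from statistics import median

def percentile_points(values):
    """Return the (25th, 50th, 75th) percentiles via the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Assumption: for an odd-length set, the middle value is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

print(percentile_points([1, 2, 3, 4, 5, 6]))
```

Other definitions of percentiles exist and can give slightly different values, so library functions may not match this output exactly.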
Quartiles
Given a set of items, a Quartile is a segment of the data when split up into four chunks. Note that the set must be ordered for this.
There are four distinct quartiles with unique names:
Quartile 1 (Q1) - The lowest 25% of numbers in the set.
Quartile 2 (Q2) - The second lowest 25% of numbers in the set.
Quartile 3 (Q3) - The second highest 25% of numbers in the set.
Quartile 4 (Q4) - The highest 25% of numbers in the set.
In other words, sort your data and divide it into quarters. Each quarter is a quartile, with Q1 being the lowest values and Q4 being the highest.
For example, given a set of [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], we have the following quartiles:
- Q1 - [1, 2, 3]
- Q2 - [4, 5, 6]
- Q3 - [7, 8, 9]
- Q4 - [10, 11, 12]
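For the simple case where the set divides evenly into four, the "sort and divide into quarters" step might look like the sketch below. The function name `quartile_chunks` is hypothetical, and raising an error for sets that do not divide evenly is a choice made only for this example.

```python
def quartile_chunks(values):
    """Sort the data and split it into four equal chunks, Q1 (lowest) to Q4 (highest)."""
    data = sorted(values)
    if len(data) % 4 != 0:
        # This sketch only covers the evenly divisible case.
        raise ValueError("length must be divisible by 4 for this simple split")
    size = len(data) // 4
    return [data[i * size:(i + 1) * size] for i in range(4)]

q1, q2, q3, q4 = quartile_chunks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print(q1, q2, q3, q4)
```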
While the above example splits neatly, not all sets divide evenly into fourths.
In these cases, a more systematic approach is to find the percentiles, and then use these as the boundaries between the quartiles.
When an odd number of values is being split (so the percentile value could arguably belong to either quartile), one of two methods is commonly used:
- Include the percentile value in both quartiles.
- Exclude the percentile value from both quartiles.
For example, consider this:
Given a set of [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], we have the following percentiles (taking the median of each half with the overall median, 6, left out):
- 25th - 3
- 50th - 6
- 75th - 9
If including, we first split into [1, 2, 3, 4, 5, 6] and [6, 7, 8, 9, 10, 11]. Then we split again to get our quartiles:
- Q1 - [1, 2, 3]
- Q2 - [4, 5, 6]
- Q3 - [6, 7, 8]
- Q4 - [9, 10, 11]
If excluding, we first split into [1, 2, 3, 4, 5] and [7, 8, 9, 10, 11]. Then we split again to get our quartiles:
- Q1 - [1, 2]
- Q2 - [4, 5]
- Q3 - [7, 8]
- Q4 - [10, 11]
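The two approaches can be compared side by side with a small Python sketch. The helper names `split_half` and `split_quartiles` and the `include_middle` flag are hypothetical, introduced only for this illustration; the sketch applies the same include-or-exclude rule at every split.

```python
def split_half(data, include_middle):
    """Split sorted data into a lower and an upper half."""
    half = len(data) // 2
    if len(data) % 2 == 0:
        # Even length: the halves are unambiguous.
        return data[:half], data[half:]
    if include_middle:
        # Odd length, including: the middle value appears in both halves.
        return data[:half + 1], data[half:]
    # Odd length, excluding: the middle value appears in neither half.
    return data[:half], data[half + 1:]

def split_quartiles(values, include_middle):
    """Return [Q1, Q2, Q3, Q4] by halving the sorted data, then halving each half."""
    data = sorted(values)
    lower, upper = split_half(data, include_middle)
    q1, q2 = split_half(lower, include_middle)
    q3, q4 = split_half(upper, include_middle)
    return [q1, q2, q3, q4]

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(split_quartiles(numbers, include_middle=True))
print(split_quartiles(numbers, include_middle=False))
```

Note that when excluding, the boundary values (3, 6 and 9 in this set) do not appear in any quartile, and when including, 6 appears in both Q2 and Q3.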
Outliers
Given a set of items, an outlier is an item that does not fit in with the rest. It's usually extremely low or extremely high, compared to the other values in the set.
For example, given a set of [1, 2, 3, 4, 50], 50 is an outlier, as it's much higher than the rest of the values.