Mathematics/Statistics/Normalization: Difference between revisions
Brodriguez (talk | contribs) (Create page)
Brodriguez (talk | contribs) (Add normalization methods)
For example, if trying to run analysis with "weight" and "height" attributes for a population of individuals, the unit of measurement used will implicitly change how much importance each attribute has in the analysis. Alternatively, we can normalize them both to a range between [0, 1], so that both attributes have equal importance in analysis, regardless of units of measurement.
Below are some common normalization types.
== Min-Max Normalization ==
Min-max normalization is a simple linear transformation that rescales values to the desired interval.
{{ Warn | If a value added at a later date falls outside the original data range, this form of normalization will map it outside the target interval. In such a case, it's important to recompute the minimum and maximum and re-normalize the entire dataset, using the original data.}}
<math>new\_value = \frac{old\_value - old\_min}{old\_max - old\_min} ( new\_max - new\_min ) + new\_min</math>
Where:
* <math>old\_value</math> is the original value of the item to adjust.
* <math>old\_min</math> is the original [[Statistics/Data_Characteristics#Min|minimum]] of the given attribute.
* <math>old\_max</math> is the original [[Statistics/Data_Characteristics#Max|maximum]] of the given attribute.
* <math>new\_min</math> is the new [[Statistics/Data_Characteristics#Min|minimum]] to use for the given attribute.
* <math>new\_max</math> is the new [[Statistics/Data_Characteristics#Max|maximum]] to use for the given attribute.
== Zero-Mean Normalization ==
Also known as "z-score normalization".
Zero-mean normalization rescales attribute values using the attribute's mean and standard deviation, so that the normalized attribute has a mean of 0 and a standard deviation of 1.
<math>new\_value = \frac{old\_value - \bar{x}}{\sigma}</math>
Where:
* <math>old\_value</math> is the original value of the individual dataset item to adjust.
* <math>\bar{x}</math> is the attribute [[Statistics/Core_Measurements#Mean|mean]].
* <math>\sigma</math> is the attribute [[Statistics/Core_Measurements#Standard Deviation|standard deviation]].
== Normalization by Decimal Scaling ==
{{ ToDo | Fill in section. }}
Revision as of 14:51, 17 May 2020
Normalization is a method to make values fall within a "common range".
For example, in Data Mining and Neural Networks, it's common to normalize values so that they fall into the (inclusive) range of [-1, 1] or [0, 1].
Normalization keeps the relative differences between values in an attribute, while ensuring that no single attribute has a significantly larger range than the others. Differences in the ranges that attributes span may give one attribute more weight (and thus importance) in statistical analysis than the others, even when nothing in the data justifies it.
For example, when analyzing "weight" and "height" attributes for a population of individuals, the units of measurement implicitly change how much importance each attribute has in the analysis: heights recorded in millimeters span a far larger numeric range than the same heights recorded in meters. If we instead normalize both attributes to the range [0, 1], both carry equal importance in the analysis, regardless of units of measurement.
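As a rough illustration of the unit problem, the sketch below measures how much of the squared Euclidean distance between two individuals comes from the height attribute, first with height in meters and then in millimeters. The two sample individuals and the helper name `height_share` are made up for this example, and Euclidean distance is only one assumed stand-in for "analysis".

```python
# Hypothetical sample: (weight in kg, height in m) for two individuals.
person_a = (60.0, 1.60)
person_b = (90.0, 1.90)


def height_share(p, q):
    """Fraction of the squared Euclidean distance that comes from height."""
    weight_term = (p[0] - q[0]) ** 2
    height_term = (p[1] - q[1]) ** 2
    return height_term / (weight_term + height_term)


# Same two people, but with height converted from meters to millimeters.
person_a_mm = (person_a[0], person_a[1] * 1000)
person_b_mm = (person_b[0], person_b[1] * 1000)

share_m = height_share(person_a, person_b)
share_mm = height_share(person_a_mm, person_b_mm)

print(f"height share with meters:      {share_m:.4f}")
print(f"height share with millimeters: {share_mm:.4f}")
```

With meters, height contributes almost nothing to the distance; with millimeters, it dominates. Nothing about the people changed, only the unit.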
Below are some common normalization types.
Min-Max Normalization
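Min-max normalization is a simple linear transformation that rescales values to the desired interval.
Warning: If a value added at a later date falls outside the original data range, this form of normalization will map it outside the target interval. In such a case, it's important to recompute the minimum and maximum and re-normalize the entire dataset, using the original data.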
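<math>new\_value = \frac{old\_value - old\_min}{old\_max - old\_min} ( new\_max - new\_min ) + new\_min</math>
Where:
- <math>old\_value</math> is the original value of the item to adjust.
- <math>old\_min</math> is the original minimum of the given attribute.
- <math>old\_max</math> is the original maximum of the given attribute.
- <math>new\_min</math> is the new minimum to use for the given attribute.
- <math>new\_max</math> is the new maximum to use for the given attribute.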
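The min-max transformation described in this section can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the sample heights, and the choice to raise an error for a constant attribute (where the old maximum equals the old minimum) are assumptions made for this example.

```python
def min_max_value(old_value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Rescale one value from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        # Assumption: a constant attribute has no range to rescale.
        raise ValueError("old_max and old_min must differ")
    return (old_value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a whole attribute, using its own minimum and maximum."""
    old_min, old_max = min(values), max(values)
    return [min_max_value(v, old_min, old_max, new_min, new_max) for v in values]


# Hypothetical sample data: heights in centimeters.
heights_cm = [150.0, 160.0, 175.0, 200.0]

unit_range = min_max_normalize(heights_cm)                # target [0, 1]
signed_range = min_max_normalize(heights_cm, -1.0, 1.0)   # target [-1, 1]

# A later value outside the original range lands outside the target interval.
late_value = min_max_value(210.0, min(heights_cm), max(heights_cm))

print(unit_range)
print(signed_range)
print(late_value)
```

The last call shows why a dataset needs to be re-normalized when a new value falls outside the original range: 210 cm maps to a value above 1.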
Zero-Mean Normalization
Also known as "z-score normalization".
Zero-mean normalization rescales attribute values using the attribute's mean and standard deviation, so that the normalized attribute has a mean of 0 and a standard deviation of 1.
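<math>new\_value = \frac{old\_value - \bar{x}}{\sigma}</math>
Where:
- <math>old\_value</math> is the original value of the individual dataset item to adjust.
- <math>\bar{x}</math> is the attribute mean.
- <math>\sigma</math> is the attribute standard deviation.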
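A z-score normalization along these lines might look like the sketch below, using Python's standard `statistics` module. The page does not say whether the standard deviation is the population or the sample version; this example assumes the population standard deviation (`statistics.pstdev`). The function name and the sample data are likewise made up for illustration.

```python
import statistics


def z_score_normalize(values):
    """Subtract the attribute mean and divide by its standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; use statistics.stdev
    # instead if the sample standard deviation is wanted.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # A constant attribute has no spread to scale by.
        raise ValueError("standard deviation is zero")
    return [(v - mean) / sigma for v in values]


# Hypothetical sample data with mean 5 and population standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = z_score_normalize(data)

print(z_scores)
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

Unlike min-max normalization, the result is not confined to a fixed interval; values simply express how many standard deviations each item lies from the mean.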