Mathematics/Statistics/Normalization
Normalization is a method for rescaling values so that they fall within a common range.
For example, in Data Mining and Neural Networks, it's common to normalize values so that they fall into the (inclusive) range of [-1, 1] or [0, 1].
Normalization preserves the relationships among the values of an attribute, while ensuring that no single attribute has a significantly larger range than the others. A discrepancy in the ranges that attributes span can give one attribute more weight (and thus importance) in statistical analysis than the others, even when no such difference in importance should be expected.
For example, when analyzing "weight" and "height" attributes for a population of individuals, the units of measurement chosen implicitly determine how much influence each attribute has on the analysis. If we instead normalize both attributes to the range [0, 1], they carry equal importance in the analysis regardless of the units of measurement, as the sketch below illustrates.
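The following minimal Python sketch, using hypothetical height and weight data, shows how the choice of units lets one attribute dominate a simple distance-based comparison before any normalization is applied.

```python
# (height in meters, weight in kilograms) for three hypothetical individuals
people = [(1.60, 55.0), (1.75, 80.0), (1.80, 60.0)]

def euclidean(a, b):
    """Plain Euclidean distance between two (height, weight) pairs."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Height varies by about 0.2 while weight varies by about 25, so the
# distance is driven almost entirely by the weight attribute:
print(euclidean(people[0], people[1]))  # ~25.0
```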
Below are some common normalization types.
Min-Max Normalization
Min-max normalization is a simple linear transformation that rescales values from their original range onto a desired target interval.
The normalized value v' of an original value v is computed as:

v' = ((v - min_A) / (max_A - min_A)) * (new_max_A - new_min_A) + new_min_A

Where:
- v is the original value of the item to adjust.
- min_A is the original minimum of the given attribute A.
- max_A is the original maximum of the given attribute A.
- new_min_A is the new minimum to use for the given attribute.
- new_max_A is the new maximum to use for the given attribute.
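As a concrete illustration, here is a minimal Python sketch of min-max normalization. The function name and the sample data are hypothetical, and the attribute is assumed not to be constant (otherwise max_A - min_A would be zero).

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values from their observed range onto [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    old_range = old_max - old_min  # assumed non-zero for this sketch
    return [
        (v - old_min) / old_range * (new_max - new_min) + new_min
        for v in values
    ]

heights = [1.60, 1.75, 1.80]       # hypothetical heights in meters
print(min_max_normalize(heights))  # [0.0, 0.75, 1.0]
```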
Zero-Mean Normalization
Zero-mean normalization, also known as "z-score normalization", normalizes attribute values based on the mean and standard deviation of the attribute.
The normalized value v' of an original value v is computed as:

v' = (v - mean_A) / stddev_A

Where:
- v is the original value of the individual dataset item to adjust.
- mean_A is the mean of the given attribute A.
- stddev_A is the standard deviation of the given attribute A.
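Below is a minimal Python sketch of zero-mean normalization under the same assumptions as before: the function name and sample data are hypothetical, and the population standard deviation is used and assumed to be non-zero.

```python
def z_score_normalize(values):
    """Center values on the attribute mean and scale by its standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std_dev = (sum((v - mean) ** 2 for v in values) / n) ** 0.5  # population std dev
    return [(v - mean) / std_dev for v in values]

weights = [55.0, 80.0, 60.0]      # hypothetical weights in kilograms
print(z_score_normalize(weights)) # approximately [-0.93, 1.39, -0.46]
```

Unlike min-max normalization, the result is not confined to a fixed interval; instead, values are expressed as the number of standard deviations they lie above or below the attribute mean.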