Normalization Methods

乌然娅措

Normalization

Data transformation is one of the critical steps in data mining. Among the many data transformation methods, normalization is one of the most frequently used. For example, Z-score normalization can be used to reduce possible noise in sound frequency data.

We will introduce three common normalization methods: Max-Min normalization, Z-score normalization, and scale multiplication.

Max-Min Normalization
$x_{normal} = \frac{x - \min(x)}{\max(x) - \min(x)}$
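It scales all the data to the range from 0 to 1.
Example:
Chinese high schools use a 150-point scale, US high schools use a 100-point scale, and Russian high schools use a 5-point scale. Max-Min normalization puts scores from all three on the same 0-to-1 scale so that they can be compared directly.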
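
To make the exam-score example concrete, here is a minimal sketch. It assumes that every grading scale starts at 0 and uses the nominal top of each scale in place of the observed minimum and maximum. The function name `min_max_normalize`, its `lo`/`hi` parameters, and the sample scores are made up for illustration.

```python
import numpy as np

def min_max_normalize(values, lo=None, hi=None):
    """Rescale values to [0, 1]; lo/hi default to the observed min and max."""
    values = np.asarray(values, dtype=float)
    lo = values.min() if lo is None else lo
    hi = values.max() if hi is None else hi
    if hi == lo:
        raise ValueError("cannot normalize: max equals min")
    return (values - lo) / (hi - lo)

# Hypothetical scores, one student per country (illustrative values only).
chinese = min_max_normalize([120], lo=0, hi=150)   # 150-point scale
american = min_max_normalize([80], lo=0, hi=100)   # 100-point scale
russian = min_max_normalize([4], lo=0, hi=5)       # 5-point scale, assumed to start at 0

print(chinese, american, russian)  # all three map to 0.8

# With no lo/hi given, the observed min and max are used, as in the formula.
data_norm = min_max_normalize([30, 60, 90, 150])
print(data_norm)
```

Passing explicit bounds is a design choice for this example: when the theoretical range of a scale is known, it may be a better reference than the smallest and largest values that happen to appear in the sample.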


Z-Score Normalization

$X_{z-normal} = \frac{X - mean}{sd}$
It expresses each value as the number of standard deviations it lies from the mean.
Example:
It is useful when comparing data sets measured in different units (for example, cm and inches).
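
The unit example can be checked with a short sketch: the same heights expressed in cm and in inches should give the same Z-scores, because the unit cancels out when dividing by the standard deviation. The helper name `z_score` and the height values are assumptions made for illustration. Note that `np.std` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

def z_score(values):
    """Return (values - mean) / sd, using the population standard deviation."""
    values = np.asarray(values, dtype=float)
    sd = values.std()  # np.std defaults to ddof=0 (population sd)
    if sd == 0:
        raise ValueError("cannot normalize: standard deviation is zero")
    return (values - values.mean()) / sd

heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])  # made-up heights
heights_in = heights_cm / 2.54                              # same heights in inches

z_cm = z_score(heights_cm)
z_in = z_score(heights_in)

# The two results agree up to floating-point rounding.
print(np.allclose(z_cm, z_in))
```

After the transformation the data has mean 0 and standard deviation 1, whichever unit it started in.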

Scale multiplication

$X_{scaled} = X \times 10^{k}$ or $X_{scaled} = X / 10^{k}$
It rescales the data by a power of 10.
Example:
Some monetary transactions are very large, so we divide them by 1000 ($k = 3$) to make them easier to read.
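
As a small illustration of the transaction example, the sketch below divides some made-up amounts by 1000. The amounts and the variable names are assumptions; only the divisor comes from the text.

```python
import numpy as np

# Made-up transaction amounts in the original currency unit.
transactions = np.array([1_250_000, 830_000, 2_400_000, 96_000])

scale = 1000  # divisor from the example; another power of 10 works the same way
in_thousands = transactions / scale

print(in_thousands)
```

Unlike the two methods above, this does not bound or center the data. It only changes the unit, so the shape of the distribution stays the same.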

Code

import random
import matplotlib.pyplot as plt
import numpy as np


#Draw 50 distinct random integers from 0 to 149
x=random.sample(range(0,150),50)
x1=np.array(x)
xmin=min(x)
xmax=max(x)

#Max-Min normalization
mmnorm=(x1 - xmin)/(xmax-xmin)
#plot

fig,axs=plt.subplots(1,2,sharey=True)

#Original random number
axs[0].hist(x, bins=10)
axs[0].title.set_text("Random Data")


#Max-Min normalized histogram plot
axs[1].hist(mmnorm, bins=10,color="lightblue")
axs[1].title.set_text("Max-Min Normalized Data")
plt.show()

#Z-score Normalization

#Draw 50 distinct random integers from 0 to 149
x2=random.sample(range(0,150),50)
x21=np.array(x2)
mean=np.mean(x21)
sd=np.std(x21)  #population standard deviation (ddof=0)


#Z-score normalization
znorm=(x21-mean)/sd

#plot

fig,axs=plt.subplots(1,2,sharey=True)

#Original random number
axs[0].hist(x2, bins=10, color="green")
axs[0].title.set_text("Random Data")


#Z-score normalized histogram plot
axs[1].hist(znorm, bins=10,color="lightgreen")
axs[1].title.set_text("Z-score Normalized Data")
plt.show()

#Scale multiplication

#Draw 50 distinct random integers from 1000 to 9999
x3=random.sample(range(1000,10000),50)
x31=np.array(x3)

#Scale by dividing by 1000
snorm=x31/1000

#plot

fig,axs=plt.subplots(1,2,sharey=True)

#Original random number
axs[0].hist(x3, bins=10, color="orange")
axs[0].title.set_text("Random Data")


#Scale normalized histogram plot
axs[1].hist(snorm, bins=10,color="yellow")
axs[1].title.set_text("Scale Normalized Data")
plt.show()