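Statistical methods
mean, var, std: mean, variance, standard deviation
value_counts: counts how many times each value occurs
groupby: grouped aggregation, similar to GROUP BY in SQL
pivot_table: pivot table, commonly used to cross-tabulate data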
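Here is a minimal sketch of mean, var and std on a small made-up DataFrame (the data is hypothetical). By default pandas computes the sample variance and sample standard deviation (ddof=1), so std is the square root of var:

```python
import numpy as np
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0]})

col_mean = df.mean()       # mean of each column
col_var = df.var()         # sample variance of each column (ddof=1)
col_std = df.std()         # sample standard deviation of each column (ddof=1)
pop_var = df.var(ddof=0)   # population variance: divide by n instead of n - 1

print(col_mean)
print(col_var)
print(col_std)
# std is the square root of var when both use the same ddof
print(np.allclose(col_std, np.sqrt(col_var)))  # True
```

Passing ddof=0 to var or std gives the population versions instead.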
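# Statistics
import numpy as np
import pandas as pd
# df and dates come from the previous section; these two lines are an assumed stand-in so the code runs on its own
dates=pd.date_range("20170301",periods=8)
df=pd.DataFrame(np.random.randn(8,5),index=dates,columns=list("ABCDE"))
print(df.mean()) # mean
print(df.var()) # variance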
print("~~~~~~~~~~~~~~~~~~~~************")
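s=pd.Series([1,2,2,np.nan,5,7,9,10],index=dates) # build a Series (its length must match dates)
print(s)
print(s.shift(2)) # shift values down two positions; the first two become NaN
print(s.diff()) # difference from the previous value
print(s.value_counts()) # the result can be used to plot a histogram
print(df.apply(np.cumsum)) # cumulative sum
print(df.apply(lambda x:x.max()-x.min())) # range: maximum minus minimum of each column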
#concat
pieces=[df[:3],df[-3:]] # slice out parts of the DataFrame: the first three rows and the last three rows, concatenated together
print(pd.concat(pieces))
left=pd.DataFrame({"key":["x","y"],"value":[1,2]}) # build two DataFrames
right=pd.DataFrame({"key":["x","z"],"value":[3,4]})
print("LEFT:",left)
print("RIGHT:",right)
print(pd.merge(left,right,on="key",how="inner"))
df3=pd.DataFrame({"A":["a","b","c","b"],"B":list(range(4))})
print(df3.groupby("A").sum())
# Reshape
import datetime
df4=pd.DataFrame({'A':['one','one','two','three']*6, # this table has 24 rows
                  'B':['a','b','c']*8,
                  'C':['foo','foo','foo','bar','bar','bar']*4,
                  'D':np.random.randn(24), # random numbers
                  'E':np.random.randn(24),
                  'F':[datetime.datetime(2017,i,1) for i in range(1,13)]+
                      [datetime.datetime(2017,i,15) for i in range(1,13)]})
# pivot_table: pivot table
print(pd.pivot_table(df4,values="D",index=["A","B"],columns=["C"]))
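Since groupby is described as SQL-style grouping, note that for a single values column with mean aggregation (the default), pivot_table gives the same table as grouping on the row keys plus the column key and then unstacking the column key. Here is a sketch on a small hypothetical table, not the random df4:

```python
import pandas as pd

# Hypothetical small table, laid out like df4's A, B, C, D columns
data = pd.DataFrame({
    "A": ["one", "one", "two", "two", "one"],
    "B": ["a", "a", "b", "b", "a"],
    "C": ["foo", "bar", "foo", "bar", "foo"],
    "D": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Rows keyed by (A, B), one column per value of C, each cell the mean of D
pt = pd.pivot_table(data, values="D", index=["A", "B"], columns=["C"], aggfunc="mean")

# SQL-style equivalent: GROUP BY A, B, C, average D, then spread C into columns
gb = data.groupby(["A", "B", "C"])["D"].mean().unstack("C")

print(pt)
print(gb)
```

In both tables the cell for (one, a, foo) is 3.0, the mean of the two matching D values 1.0 and 5.0.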
# Pivot table
# df = pd.Series([1,2,4,np.nan,5,6,7,10],index=dates)
df.mean() ## mean
df.var() ## variance
df.shift(2) ## shift values down two positions
df.value_counts() ## count how often each value occurs (histogram)
df.apply(np.cumsum) # cumulative sum
## Concat
pieces = [df[:3],df[-3:]]
pd.concat(pieces)
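Left join: pd.merge(left,right,on="key",how="left") keeps every row of left and fills in NaN where right has no matching key.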
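This section uses how="inner" in one listing and how="outer" in the other. A sketch comparing all four how options on the same left and right tables:

```python
import pandas as pd

left = pd.DataFrame({"key": ["x", "y"], "value": [1, 2]})
right = pd.DataFrame({"key": ["x", "z"], "value": [3, 4]})

# Both tables have a "value" column, so merge renames them value_x and value_y
results = {how: pd.merge(left, right, on="key", how=how)
           for how in ["inner", "left", "right", "outer"]}

for how, merged in results.items():
    print(how)
    print(merged)
```

inner keeps only key x. left keeps x and y, with value_y as NaN for y. right keeps x and z. outer keeps x, y and z.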
Code for this section:
# pandas: table statistics and combining tables
# mean
print(df.mean())
# variance
print(df.var())
s=pd.Series([1,2,4,np.nan,5,7,9,10],index=dates)
print(s)
print(s.shift(2))
print(s.diff())
print(s.value_counts())
# cumulative sum
print(df.apply(np.cumsum))
# range
print(df.apply(lambda x:x.max()-x.min()))
# concatenating tables
pieces=[df[:3],df[-3:]]
print(pd.concat(pieces))
left=pd.DataFrame({"key":["x","y"],"value":[1,2]})
right=pd.DataFrame({"key":["x","z"],"value":[3,4]})
print("LEFT:",left)
print("RIGHT:",right)
print(pd.merge(left,right,on="key",how="outer"))
df3=pd.DataFrame({"A":["a","b","c","d"],"B":list(range(4))})
print(df3.groupby("A").sum())

# Concat: concatenation
pieces = [df[:3],df[-3:]]
print(pd.concat(pieces))
Use concat to join tables together, for example the first three rows and the last three rows.
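A sketch of pd.concat on a hypothetical 8-row DataFrame (the dates and values are made up to stand in for df). It repeats the head-and-tail example, then shows two related options, ignore_index and axis=1:

```python
import numpy as np
import pandas as pd

# Hypothetical 8-row DataFrame standing in for df
dates = pd.date_range("2017-03-01", periods=8)
df = pd.DataFrame(np.arange(16).reshape(8, 2), index=dates, columns=["A", "B"])

# Stack the first three and the last three rows vertically (axis=0 is the default)
head_tail = pd.concat([df.iloc[:3], df.iloc[-3:]])

# ignore_index=True drops the original row labels and renumbers 0..n-1
renumbered = pd.concat([df.iloc[:3], df.iloc[-3:]], ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index
side_by_side = pd.concat([df[["A"]], df[["B"]] * 10], axis=1)

print(head_tail)
print(renumbered)
print(side_by_side)
```

df.iloc[:3] selects the same rows as df[:3] but states explicitly that the slice is by position.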
apply() can apply an existing function directly (such as np.cumsum) or a custom function (such as a lambda).
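A sketch of both uses of apply on hypothetical data. It also shows axis=1, which passes rows to the function instead of columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"A": [1, 4, 2], "B": [10, 30, 20]})

# An existing function, applied to each column in turn (axis=0 is the default)
cum = df.apply(np.cumsum)

# A custom function: the range (max - min) of each column
col_range = df.apply(lambda x: x.max() - x.min())

# axis=1 passes each row to the function instead of each column
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(cum)
print(col_range)
print(row_range)
```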
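shift() moves values by a given number of positions, diff() subtracts the previous value from each value, and value_counts() counts how many times each value occurs.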
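A sketch of the three methods on the Series from this section. The 8-day date index is assumed for illustration. The sketch also checks that diff() matches subtracting a copy shifted by one position:

```python
import numpy as np
import pandas as pd

# The Series from this section; the 8-day date index is assumed for illustration
dates = pd.date_range("2017-03-01", periods=8)
s = pd.Series([1, 2, 2, np.nan, 5, 7, 9, 10], index=dates)

shifted = s.shift(2)  # values move down two positions; the first two become NaN
diffed = s.diff()     # each value minus the one before it

# diff() gives the same result as subtracting the Series shifted by one
same = diffed.equals(s - s.shift(1))

counts = s.value_counts()                   # NaN is left out by default
counts_all = s.value_counts(dropna=False)   # NaN is counted as well

print(shifted)
print(diffed)
print(same)
print(counts)
```

Any subtraction involving NaN gives NaN, so the NaN in s produces two NaN entries in diffed, in addition to the leading NaN.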