
Mean vs Median

乌然娅措

R day 2:

I was working on a Kaggle dataset of Airbnb listings in New York City. When I ran the summary function on the price variable in R, I noticed a large gap between the mean and the median.

summary(ab$price)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.0    69.0   106.0   152.7   175.0 10000.0

In this case, which statistic is more informative: the mean or the median?

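To answer this question, we first plot the density of the price variable.
As the graph shows, the price distribution is extremely right-skewed (positively skewed): most listings sit at low prices, while a long tail stretches toward the 10000 maximum. That tail is what pulls the mean (152.7) well above the median (106.0).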

Can you guess which one makes more sense?
Yes, it is the median that tells a better story about Airbnb prices in NYC!

library(ggplot2)
# Density plot of the price variable (ab is the Airbnb data frame)
d1 <- ggplot(ab, aes(price)) + geom_density(alpha = 0.2)
d1
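
To see why a long right tail pulls the mean away from the median, here is a minimal sketch on simulated data. It does not use the Airbnb dataset: the `prices` vector and its log-normal parameters are assumptions chosen only to produce a right-skewed sample.

```r
# Hypothetical simulated prices (not the Airbnb data):
# log-normal values have a long right tail.
set.seed(42)
prices <- rlnorm(1000, meanlog = log(100), sdlog = 0.8)

mean(prices)    # pulled upward by the few very large values
median(prices)  # stays near the typical value

# Add a single extreme listing, similar in spirit to the 10000 maximum.
with_outlier <- c(prices, 10000)

mean(with_outlier) - mean(prices)      # the mean shifts
median(with_outlier) - median(prices)  # the median barely moves
```

A single extreme value changes the mean far more than the median, which is why the median is the safer summary here.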

What if the data is not skewed, or only slightly skewed?

In this case, the mean is a reliable description of the central tendency of the data.

carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))

#Now, combine your two dataframes into one.  First make a new column in each.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'

#and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)

#now make your lovely plot
p <- ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)

p
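
As a quick numeric check, the sketch below draws a sample with the same `rnorm` parameters as the carrots data and compares the two statistics directly. The seed and the variable name `carrot_lengths` are assumptions added for reproducibility.

```r
# Same distribution as the carrots sample: normal, mean 6, sd 2.
set.seed(1)
carrot_lengths <- rnorm(100000, mean = 6, sd = 2)

summary(carrot_lengths)

# For symmetric data the mean and the median nearly coincide.
abs(mean(carrot_lengths) - median(carrot_lengths))
```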

By examining the density distributions of the data, we can now draw a conclusion.

Conclusion:

If the data distribution is normal or only slightly skewed, the mean describes the central tendency of the dataset well.
If the data is strongly skewed, the median is the more robust and intuitive measure.
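
If you would rather check skewness numerically than by eye, one option is a moment-based sample skewness. The sketch below is an illustration, not part of the original analysis: `sample_skewness` is a hypothetical helper and both samples are simulated.

```r
# Hypothetical helper (not from the original post): moment-based sample skewness.
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

set.seed(7)
skewed_sample    <- rlnorm(100000, meanlog = 0, sdlog = 1)  # long right tail
symmetric_sample <- rnorm(100000, mean = 6, sd = 2)         # bell-shaped

sample_skewness(skewed_sample)     # clearly positive: right-skewed
sample_skewness(symmetric_sample)  # close to zero: roughly symmetric
```

A value near zero suggests the mean is a fair summary, while a large positive or negative value suggests reporting the median instead.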

Thanks to Jun.z for sharing all these stats tricks with me.
