1point3acres (一亩三分地) Forum

[BigData] How do big data and small/medium-sized data differ in concrete analysis techniques?

CBC posted on 2014-8-24 01:19:53

Take regression as an example. In ordinary statistical analysis, building a linear model usually comes down to a few steps:
1. Data collection and preparation (collect data, preliminary checks, remedial measures).
2. Reduction of the number of predictor variables (forward stepwise regression, AIC, SBC, Mallows' Cp).
3. Model refinement and selection (investigate curvature and interaction effects, study residuals and other diagnostics).
4. Model validation.
The software used is SAS or R. For pattern recognition, logistic regression, neural networks and the like are used.
That is how it works for small and medium-sized data in ordinary statistics. How do the analysis methods differ for big data?
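
For reference, the "reduction of predictor variables" step in the list above can be sketched as greedy forward selection by AIC. This is a minimal illustration in Python with NumPy, not the SAS/R workflow the post refers to (in R, the built-in `step()` function performs AIC-based stepwise selection). It assumes Gaussian errors, so AIC is computed up to an additive constant as n·log(RSS/n) + 2k; the simulated data and the helper names `gaussian_aic` and `forward_stepwise_aic` are hypothetical.

```python
import numpy as np

def gaussian_aic(design, y):
    """AIC of an OLS fit with Gaussian errors, up to an additive constant:
    n * log(RSS / n) + 2 * k, where k is the number of columns in the design."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def forward_stepwise_aic(X, y, names):
    """Greedy forward selection: at each step add the predictor that lowers AIC
    the most, and stop when no remaining predictor lowers it."""
    n = X.shape[0]
    intercept = np.ones((n, 1))
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = gaussian_aic(intercept, y)  # start from the intercept-only model
    while remaining:
        candidates = [
            (gaussian_aic(np.column_stack([intercept, X[:, selected + [j]]]), y), j)
            for j in remaining
        ]
        aic, j = min(candidates)
        if aic >= best_aic:
            break  # no candidate improves AIC, so stop adding predictors
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected], best_aic

# Simulated data (hypothetical): only x0 and x2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
chosen, aic = forward_stepwise_aic(X, y, ["x0", "x1", "x2", "x3"])
print(chosen, round(aic, 1))
```

Swapping the penalty 2k for k·log(n) gives the SBC (BIC) version of the same search. Note that every candidate fit here touches the full dataset, which is exactly what becomes expensive once the data no longer fits in memory.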

小K posted on 2014-8-24 12:37:12
1. Big data is often too big to fit into memory, so you can't fit even a simple linear regression directly in base R, which keeps the whole dataset in RAM.
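
One common way around the memory limit for linear regression is to stream the data in chunks and keep only the small summary matrices the fit needs. The sketch below (Python/NumPy, an illustration rather than anything from the thread) accumulates X'X and X'y chunk by chunk and solves the normal equations at the end, so memory use depends on the number of predictors p, not on the number of rows. The chunk generator `simulated_chunks` is a hypothetical stand-in for reading a large file piece by piece. As far as I know, R's `biglm` package follows the same bounded-memory idea but updates a QR decomposition instead, which is numerically more stable than forming X'X directly.

```python
import numpy as np

def chunked_ols(chunks):
    """Fit OLS (with intercept) from an iterable of (X_chunk, y_chunk) pairs.

    Only the p x p matrix X'X and the p-vector X'y are kept in memory,
    so the full dataset never has to be loaded at once.
    """
    xtx = None
    xty = None
    for X_chunk, y_chunk in chunks:
        design = np.column_stack([np.ones(len(X_chunk)), X_chunk])
        if xtx is None:
            p = design.shape[1]
            xtx = np.zeros((p, p))
            xty = np.zeros(p)
        xtx += design.T @ design  # accumulate the Gram matrix
        xty += design.T @ y_chunk
    return np.linalg.solve(xtx, xty)  # solve the normal equations once at the end

def simulated_chunks(n_chunks, chunk_size, beta, seed=0):
    """Hypothetical data source standing in for reading a large file in pieces."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, len(beta) - 1))
        y = beta[0] + X @ beta[1:] + rng.normal(scale=0.1, size=chunk_size)
        yield X, y

true_beta = np.array([0.5, 2.0, -1.0])
coef = chunked_ols(simulated_chunks(50, 10_000, true_beta))
print(np.round(coef, 2))
```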

2. Big data does not just mean bigger in size.
Besides "volume", big data also has "variety" and "velocity", and base R on its own doesn't handle either of these well. There are packages that can process large amounts of data quickly. As for SAS, I think it was shown a couple of years ago that Revolution R runs faster than SAS.
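
The thread doesn't name a technique for "velocity", but one standard approach when observations keep arriving is to update the model incrementally instead of refitting from scratch. The sketch below uses recursive least squares, a technique I am swapping in for illustration (Python/NumPy); the class name `RecursiveLeastSquares`, the prior scale `delta`, and the simulated stream are all assumptions. Each update costs O(p²) and each row can be discarded after it is seen.

```python
import numpy as np

class RecursiveLeastSquares:
    """Linear regression updated one observation at a time (recursive least squares).

    P starts as delta * I, i.e. a weak zero-centred prior on the coefficients,
    so the estimate equals a ridge fit with penalty 1/delta; with a large delta
    and many observations this is essentially ordinary least squares.
    """

    def __init__(self, n_features, delta=1e6):
        self.beta = np.zeros(n_features)
        self.P = delta * np.eye(n_features)

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)  # gain vector for this observation
        self.beta = self.beta + gain * (y - x @ self.beta)
        self.P = self.P - np.outer(gain, Px)  # rank-one downdate of P
        return self.beta

# Simulated stream (hypothetical): each row arrives once and is then discarded.
rng = np.random.default_rng(1)
true_beta = np.array([1.0, 0.5, -3.0])
X_stream = np.column_stack([np.ones(2000), rng.normal(size=(2000, 2))])
y_stream = X_stream @ true_beta + rng.normal(scale=0.2, size=2000)

rls = RecursiveLeastSquares(n_features=3)
for x_t, y_t in zip(X_stream, y_stream):
    rls.update(x_t, y_t)

# For comparison only: the batch fit that needs all rows in memory at once.
batch_beta, *_ = np.linalg.lstsq(X_stream, y_stream, rcond=None)
```

The incremental estimate should agree with the batch fit to within rounding, while never holding more than one row plus a 3×3 matrix.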

So far, neither R nor SAS is good at dealing with variety.

Not all data comes in a "rectangular" shape that feeds directly into a regression model. Because of all three V's, it is generally better to control the data extraction and cleaning process yourself rather than relying on SAS programmers to clean the data for you, as people in pharma do...
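
To make the "rectangle" point concrete: raw records are often nested and have fields that vary from record to record, and someone has to decide how they become rows and columns before any regression can run. The sketch below (plain Python, illustrative only) flattens nested dicts into dotted column names and fills missing fields with `None`; the record layout and the helpers `flatten` and `to_rectangle` are hypothetical. Lists such as `tags` are deliberately left as single cells, because how to encode them is exactly the kind of cleaning decision the post argues you should own. `pandas.json_normalize` does a fuller version of this flattening.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value  # scalars and lists are kept as-is
    return flat

def to_rectangle(records):
    """Turn heterogeneous records into a header plus rows; missing fields become None."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({key for r in flat_records for key in r})
    rows = [[r.get(col) for col in columns] for r in flat_records]
    return columns, rows

# Hypothetical raw records: nested, with fields that differ between records.
raw = [
    {"id": 1, "user": {"age": 34, "country": "US"}, "spend": 120.0},
    {"id": 2, "user": {"age": 27}, "spend": 80.5, "tags": ["mobile"]},
    {"id": 3, "user": {"country": "CN"}},
]
columns, rows = to_rectangle(raw)
print(columns)
for row in rows:
    print(row)
```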
