1point3acres Forum (一亩三分地论坛)


[BigData] How do the analysis techniques for big data differ from those for small and medium-sized data?

CBC posted on 2014-8-24 01:19:53

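Take regression as an example. In conventional statistical analysis, building a linear model usually follows a few steps: data collection and preparation (collect data, preliminary checks, remedial measures); reduction of the number of predictor variables (forward stepwise regression, AIC, SBC, Mallows's Cp); model refinement and selection (investigate curvature and interaction effects, study residuals and other diagnostics); and model validation. The software used is SAS or R. For pattern recognition, methods such as logistic regression or neural networks come into play.
That is how it works for the small and medium-sized data of ordinary statistics. So how do the analysis methods differ for big data?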
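For readers who want the variable-reduction step of that workflow spelled out, here is a minimal sketch of forward stepwise selection scored by AIC. It is pure Python on synthetic data rather than SAS or R; the helper names (`solve`, `fit_ols`, `aic`, `forward_select`) and the data-generating coefficients are made up for illustration, and the AIC is the Gaussian form up to an additive constant.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y, cols):
    """Least squares with an intercept on the chosen columns; returns (beta, rss)."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, yi in zip(design, y))
    return beta, rss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def forward_select(rows, y):
    """Add one predictor at a time while doing so lowers the AIC."""
    n, n_cols = len(rows), len(rows[0])
    selected = []
    best = aic(fit_ols(rows, y, [])[1], n, 1)  # intercept-only model
    while True:
        scored = [(aic(fit_ols(rows, y, selected + [c])[1], n, len(selected) + 2), c)
                  for c in range(n_cols) if c not in selected]
        if not scored or min(scored)[0] >= best:
            return selected
        best, chosen = min(scored)
        selected.append(chosen)

# Synthetic data (an assumption for the demo): only columns 0 and 2 matter.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2.0 + 3.0 * r[0] - 1.5 * r[2] + rng.gauss(0, 1) for r in rows]
selected = forward_select(rows, y)
print("selected predictor columns:", selected)
```

Everything here assumes the whole data set sits in memory as a list of rows, which is exactly the small-data assumption the question is asking about.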
.鐣欏璁哄潧-涓浜-涓夊垎鍦
小K posted on 2014-8-24 12:37:12
1. Big data is too big to fit into memory, i.e. you can't fit even a simple linear regression directly in R.
2. Big data != just bigger in size.
Besides "volume", big data also has "variety" and "velocity", and base R by itself doesn't handle either of those well. There are packages that can deal with lots of data quickly. As for SAS, I think it was shown a couple of years ago that Revolution R runs faster than SAS.
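To make the volume point concrete, here is a rough sketch of how a linear regression can still be fitted when the rows never sit in memory together: read the data chunk by chunk, accumulate X'X and X'y, and solve the normal equations at the end. The `chunks` generator is a hypothetical stand-in for reading a large file in pieces, and the coefficients in it are made up. This is one simple way to do it, not necessarily what any particular R package does; production tools tend to use numerically more careful updates.

```python
import random

def chunks(n_rows, chunk_size, seed=0):
    """Hypothetical data source: yields lists of (features, target) rows, one chunk at a time."""
    rng = random.Random(seed)
    for start in range(0, n_rows, chunk_size):
        batch = []
        for _ in range(min(chunk_size, n_rows - start)):
            x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
            y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.gauss(0, 0.1)
            batch.append(([x1, x2], y))
        yield batch

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def streaming_ols(batches, n_features):
    """Accumulate X'X and X'y over chunks; memory use depends on n_features, not on row count."""
    p = n_features + 1  # +1 for the intercept
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for batch in batches:
        for features, y in batch:
            row = [1.0] + features
            for i in range(p):
                xty[i] += row[i] * y
                for j in range(p):
                    xtx[i][j] += row[i] * row[j]
    return solve(xtx, xty)

beta = streaming_ols(chunks(20000, 1000), n_features=2)
print("intercept and slopes:", [round(b, 3) for b in beta])
```

Only a (p x p) matrix and a length-p vector are kept between chunks, so the same loop works whether the source has twenty thousand rows or far more than fit in RAM.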

Neither R nor SAS is good at dealing with variety so far.

Not all data comes in a "rectangular" shape that feeds readily into your regression model, and because of all three V's it is generally better to control the data extraction and cleaning process yourself, instead of relying on SAS programmers to clean the data for you, as people in pharma would do...
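As an illustration of the "rectangle" point, here is a small sketch that turns nested, irregular records into a rectangular table a regression routine could consume. The records, field names, and the choice to summarise a list by its length are all invented for the example; real extraction and cleaning rules depend on the data.

```python
def flatten(record, prefix=""):
    """Flatten one nested dict into a single-level dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            # Assumption for the demo: a list is summarised by its length.
            flat[name + ".count"] = len(value)
        else:
            flat[name] = value
    return flat

def to_table(records):
    """Return (columns, rows); fields missing from a record become None."""
    rows = [flatten(r) for r in records]
    columns = sorted({c for row in rows for c in row})
    return columns, [[row.get(c) for c in columns] for row in rows]

# Hypothetical event records with different fields present in each one.
records = [
    {"user": {"id": 1, "age": 34}, "clicks": [3, 7, 9], "spend": 12.5},
    {"user": {"id": 2}, "clicks": [], "device": "mobile"},
]
columns, table = to_table(records)
print(columns)
for row in table:
    print(row)
```

The None cells are where the cleaning decisions start (drop, impute, encode), which is the part the reply suggests keeping under your own control.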
