1point3acres forum (一亩三分地论坛)

[BigData] How do the analysis techniques for big data differ from those for small and medium-sized data?

CBC posted on 2014-8-24 01:19:53

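Take regression as an example. In ordinary statistical analysis, building a linear model usually follows a few steps: data collection and preparation (collect data, preliminary checks, remedial measures); reduction of the number of predictor variables (forward stepwise regression, AIC, SBC, Mallows' Cp); model refinement and selection (investigate curvature and interaction effects, study residuals and other diagnostics); and model validation. The software used is SAS or R. For pattern recognition, methods such as logistic regression and neural networks come into play.
That is how it works for the small and medium-sized data of ordinary statistics. So how do the analysis methods differ for big data?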
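
For concreteness, the variable-reduction step in the workflow above can be sketched in a few lines. This is only a minimal illustration, assuming Python with NumPy rather than the SAS or R mentioned in the post, a Gaussian linear model, and synthetic data; the function names `ols_aic` and `forward_select` are made up for this example.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of a least-squares fit under Gaussian errors, up to an additive constant."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1]  # number of fitted coefficients, intercept included
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y):
    """Greedy forward selection: add the predictor that lowers AIC the most,
    and stop as soon as no remaining predictor lowers it."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    best_aic = ols_aic(intercept, y)  # start from the intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [
            (ols_aic(np.hstack([intercept, X[:, selected + [j]]]), y), j)
            for j in candidates
        ]
        aic, j = min(scores)
        if aic >= best_aic:
            break
        selected.append(j)
        best_aic = aic
    return selected, best_aic

# Synthetic data (an assumption for illustration): only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=500)

selected, aic = forward_select(X, y)
print(sorted(selected))
```

Every candidate model here is refit on the full in-memory matrix, which is exactly the assumption that stops holding once the data no longer fits in memory.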
小K posted on 2014-8-24 12:37:12
1. Big data is too big to fit into memory, i.e. you can't fit even a simple linear regression straight from R.

2. Big data != bigger in size.
Other than "volume", big data also has "variety" and "velocity", and base R itself doesn't handle either one well. There are packages that can deal with lots of data quickly. As for SAS, I think a couple of years ago it was shown that Revolution R runs faster than SAS.

Neither is good at dealing with variety, so far.

Not all data comes in a "rectangular" shape that feeds readily into your regression model, and because of all three V's it is generally better to control the data extraction and cleaning process yourself, instead of relying on SAS programmers to clean the data for you, as those in pharma would do...
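
To make point 1 concrete: one common workaround for data that doesn't fit in memory (not something the reply itself prescribes) is to accumulate the sufficient statistics X'X and X'y one chunk at a time and solve the normal equations at the end. Below is a minimal sketch, assuming Python with NumPy; `chunked_ols` and `make_chunks` are hypothetical names, and the chunks are synthetic rather than read from a real file.

```python
import numpy as np

def chunked_ols(chunks):
    """Fit ordinary least squares by accumulating X'X and X'y chunk by chunk,
    so only one chunk has to be in memory at any time."""
    xtx = None
    xty = None
    for X, y in chunks:
        X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
        if xtx is None:
            p = X.shape[1]
            xtx = np.zeros((p, p))
            xty = np.zeros(p)
        xtx += X.T @ X
        xty += X.T @ y
    # Solve the normal equations (X'X) beta = X'y.
    return np.linalg.solve(xtx, xty)

def make_chunks(n_chunks=20, chunk_size=1000, seed=0):
    """Hypothetical data source: yields synthetic (X, y) chunks.
    In practice this would read successive blocks from disk or a database."""
    rng = np.random.default_rng(seed)
    true_beta = np.array([2.0, -1.0, 0.5])
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 3))
        y = 1.0 + X @ true_beta + rng.normal(scale=0.1, size=chunk_size)
        yield X, y

beta = chunked_ols(make_chunks())
print(np.round(beta, 2))
```

The accumulated matrices are only p by p, so memory use depends on the number of predictors rather than the number of rows. The normal equations can be numerically fragile when predictors are nearly collinear, so treat this as an illustration of the idea rather than a production recipe.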
