
1Point3Acres Forum (一亩三分地论坛)


How does the actual analysis of big data differ from that of small and medium-sized data?

CBC posted on 2014-8-24 01:19:53

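Taking regression as an example, building a linear model in ordinary statistical analysis comes down to a few steps:
1. Data collection and preparation (collect data, preliminary checks, remedial measures)
2. Reduction of the number of predictor variables (forward stepwise regression, AIC, SBC, Mallows' Cp)
3. Model refinement and selection (investigate curvature and interaction effects, study residuals and other diagnostics)
4. Model validation
The software is SAS or R. For pattern recognition, people use logistic regression, neural networks, and the like.
That is how it works for the small and medium-sized data of everyday statistics. How do the analysis methods differ for big data?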
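For reference, step 2 above (forward stepwise selection scored by AIC) can be sketched in Python/NumPy. This is a minimal sketch, not what SAS or R do internally: it assumes Gaussian errors and an intercept that is always in the model, and it computes AIC as n*log(RSS/n) + 2p, dropping an additive constant that does not change comparisons. The function names are hypothetical. Replacing 2p with p*log(n) gives SBC (BIC).

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC of an OLS fit up to an additive constant: n*log(RSS/n) + 2p."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * p


def forward_stepwise_aic(X, y):
    """Greedy forward selection over the columns of X.

    Starts from an intercept-only model, adds the predictor that lowers AIC
    the most, and stops when no remaining predictor lowers it further.
    Returns the selected column indices (in the order added) and the final AIC.
    """
    n, k = X.shape
    intercept = np.ones((n, 1))
    selected = []
    remaining = list(range(k))
    best_aic = gaussian_aic(intercept, y)
    while remaining:
        # Score every candidate model that adds exactly one more predictor.
        candidates = [
            (gaussian_aic(np.hstack([intercept, X[:, selected + [j]]]), y), j)
            for j in remaining
        ]
        aic, j = min(candidates)
        if aic >= best_aic:
            break  # no candidate improves AIC, so stop
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return selected, best_aic


# Synthetic data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
print(forward_stepwise_aic(X, y)[0])  # columns 0 and 2 should be picked first
```

Every candidate model is refit from scratch here, which is fine for a handful of predictors on in-memory data. In R, `step()` does AIC-based stepwise selection on a fitted `lm` model.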
小K posted on 2014-8-24 12:37:12
1. Big data is too big to fit into memory, i.e., you can't fit even a simple linear regression directly in R.

2. Big data != bigger in size.
Besides "volume", big data also has "variety" and "velocity", and base R itself doesn't handle either of those well. There are packages that can process lots of data quickly. As for SAS, I think it was shown a couple of years ago that Revolution R runs faster than SAS.

So far, neither (R nor SAS) is good at dealing with variety.

Not all data comes in a "rectangular" shape that feeds readily into your regression model. Because of all three V's, it is generally better to control the data extraction and cleaning process yourself instead of relying on SAS programmers to clean the data for you, as people in pharma do...
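On volume and velocity: for plain linear regression, one common workaround is never to hold the whole dataset in memory. OLS only needs the running sums X^T X and X^T y, which can be updated chunk by chunk (volume) or row by row as records arrive (velocity), and the coefficients can be re-solved at any time. A minimal Python/NumPy sketch, assuming X already includes an intercept column; the class name `StreamingOLS` is hypothetical.

```python
import numpy as np


class StreamingOLS:
    """OLS fitted from running sums X^T X and X^T y.

    Memory is O(p^2) in the number of predictors p and does not grow with
    the number of rows, so data can be fed in chunks or one row at a time.
    """

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)
        self.n = 0

    def update(self, X, y):
        # Accept either a single row (1-D) or a chunk of rows (2-D).
        X = np.atleast_2d(np.asarray(X, dtype=float))
        y = np.atleast_1d(np.asarray(y, dtype=float))
        self.xtx += X.T @ X
        self.xty += X.T @ y
        self.n += X.shape[0]
        return self

    def coef(self):
        # Solve the normal equations (X^T X) beta = X^T y.
        return np.linalg.solve(self.xtx, self.xty)


# Feed 900 rows in chunks of 100, then 100 rows one at a time.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=1000)
model = StreamingOLS(3)
for i in range(0, 900, 100):
    model.update(X[i:i + 100], y[i:i + 100])
for i in range(900, 1000):
    model.update(X[i], y[i])
print(model.coef())  # matches a full in-memory least-squares fit
```

Forming X^T X squares the condition number of X, so for ill-conditioned designs an incremental QR update is more robust; R's biglm package, for example, builds on that approach (Miller's AS 274 algorithm).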
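On variety: a typical first step is to flatten nested, semi-structured records (e.g. JSON events) into a fixed set of columns before any regression can run. A minimal sketch in plain Python; the record fields are made up for illustration, and fields a record lacks simply become None, which still has to be imputed or dropped afterwards. For real workloads, `pandas.json_normalize` does a similar flattening.

```python
from typing import Any, Iterable


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '.'.
    Lists and other non-dict values are kept as-is."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_rows(records: Iterable[dict[str, Any]], columns: list[str]) -> list[list[Any]]:
    """Turn heterogeneous records into rows with a fixed column order.
    Missing fields become None, so every row has the same length."""
    rows = []
    for record in records:
        flat = flatten(record)
        rows.append([flat.get(col) for col in columns])
    return rows


# Hypothetical event records with different shapes.
records = [
    {"id": 1, "user": {"age": 30, "city": "Boston"}},
    {"id": 2, "user": {"age": 25}, "clicks": 7},
]
columns = ["id", "user.age", "user.city", "clicks"]
print(to_rows(records, columns))
# [[1, 30, 'Boston', None], [2, 25, None, 7]]
```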
