1point3acres Forum (一亩三分地论坛)

Views: 1044 | Replies: 4

Project Interview... come share your opinions?

snowdustdj posted on 2014-4-9 06:47:11

2014 (April-June) | Analytics / Data Science | Bachelor's | Internship @ - | Mass online applications | Other

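I posted in the programming/algorithms board earlier asking for help. I just did the project quickly and submitted it, so I'm posting it here for everyone to discuss. My input method is rather primitive, so I'll write in English.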
The HR sent me a project to evaluate my data science skills. She asked me not to post the project online, so I will just describe it in general terms for you to discuss...
There is a 1GB data set. The first column is the Facebook user ID, and the remaining columns are that user's 'likes': basketball, baseball, walking dead, domino... Just imagine you are on Facebook and you click 'like' on some public page or on someone's post. For example, if I post "I got a girlfriend", you might click 'like' on my post. Different people have different numbers of likes, and the likes differ from person to person. The data may look like this:
1234455 basketball, olive garden,walking dead, i get a girl frend,really?you love me?,haha haha haha,....
1234667 data science, cs, statistics, weight training, dell, dota2,.......
.......
There are 50,000 (5w) observations, and the number of columns differs from row to row. Besides English, there is Japanese and many other characters that my computer can't display.
Objectives:
1. Build a histogram of the likes.
In my opinion, it's just counting the frequency of each distinct string, and maybe we can ignore some low-frequency ones. The challenge for me is reading the data (because I only know R, I need a way to manage memory) and counting the frequencies (because I don't know how many distinct strings there will be).
2. Build a histogram of the like pairs, e.g. (stats, cs). If a person has 100 likes, there are 100*99/2 = 4,950 possible pairs...
3. Given the training set and the likes of a new person, build a recommendation system that recommends something to that person.
4. Given the training set and some item, such as 'cs', find the people you should recommend it to.
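A minimal Python sketch of task 1, assuming each row looks like the made-up sample above (a user ID, whitespace, then comma-separated likes) and that the file is UTF-8. The names `parse_line` and `count_likes` are hypothetical. Because the input is consumed line by line, only the counter, not the whole 1GB file, has to fit in memory; `min_count` drops the low-frequency likes mentioned above.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def parse_line(line: str) -> tuple[str, list[str]]:
    """Split one row into (user_id, likes).

    Assumes the layout of the made-up sample: a user ID, whitespace, then
    comma-separated likes. Entries are trimmed, blanks are dropped, and
    duplicates are removed while keeping first-seen order.
    """
    parts = line.strip().split(maxsplit=1)
    if not parts:
        return "", []
    rest = parts[1] if len(parts) > 1 else ""
    likes = [like.strip() for like in rest.split(",") if like.strip()]
    return parts[0], list(dict.fromkeys(likes))


def count_likes(lines: Iterable[str], min_count: int = 1) -> Counter:
    """Histogram of likes: how many users like each item.

    `lines` can be an open file handle, so rows are streamed one at a time.
    Likes seen fewer than `min_count` times are dropped at the end.
    """
    counts: Counter = Counter()
    for line in lines:
        _, likes = parse_line(line)
        counts.update(likes)  # each like counts at most once per user
    if min_count > 1:
        counts = Counter({like: n for like, n in counts.items() if n >= min_count})
    return counts


if __name__ == "__main__":
    sample = [
        "1234455 basketball, olive garden,walking dead",
        "1234667 data science, cs, statistics, basketball",
    ]
    print(count_likes(sample).most_common(2))  # [('basketball', 2), ('olive garden', 1)]
```

For the real file you could pass something like `open("likes.txt", encoding="utf-8", errors="replace")` (the filename is hypothetical); `errors="replace"` substitutes undecodable bytes instead of raising, which helps with the Japanese and other characters in the data. `most_common(k)` then gives the tallest bars of the histogram.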

This is a project interview from a startup. I just felt it's not the right time for me to take that position, because I really don't have any good solutions, and I have finals and projects due at the end of this month, so I just sent my rough ideas to the HR.

I hope we can have a good discussion about it, and I hope it helps someone with future interviews...

P.S. The data above are fake; I made them up myself.


OP | snowdustdj posted on 2014-4-9 06:50:03
Just opened the data set: 260,000 (26w) rows, 70 columns.

Update (2014-4-9 06:53):
Ignore this one; it's not correct. The data is a mess and I didn't read it correctly.

danielgao posted on 2014-4-10 00:37:16
I guess the interview is from FB, and it's probably not a good idea to post it online....
1. Use a hash map or a trie. Memory is not an issue on production machines, since they have tons of memory. If they really care about memory, use a key-value DB and persist the data to disk/flash. You might need to think about how to handle characters from different languages... and also, some words might be different but mean the same thing in different languages.

2. Not sure if I understand your question correctly: do you need to consider like pairs across people, or only like pairs within a person?

3. Just a basic idea: maintain a mapping from each like to the set of user IDs who liked it. For any two likes, if their ID sets overlap heavily, you can consider those two likes close, and then recommend similar likes.

4. If you have 3, then 4 should be trivial.
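A minimal Python sketch of the like-to-user-ID-set idea for tasks 3 and 4. The reply only says "huge overlap"; measuring overlap with Jaccard similarity (|A ∩ B| / |A ∪ B|) is my choice, and scoring each candidate by summing its similarities is likewise an assumption, not something from the thread. All function names are hypothetical, and the brute-force loops are for illustration rather than the full data set.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(users: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each like to the set of user IDs who liked it."""
    index: dict[str, set[str]] = defaultdict(set)
    for user_id, likes in users.items():
        for like in likes:
            index[like].add(user_id)
    return dict(index)  # plain dict: looking up an unknown like won't insert it


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two user-ID sets, |A & B| / |A | B|; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend_likes(index: dict[str, set[str]], new_likes: list[str], k: int = 3) -> list[str]:
    """Task 3: rank likes the new person lacks by summed similarity to likes they have."""
    known = [index[like] for like in set(new_likes) if like in index]
    scores = {}
    for candidate, fans in index.items():
        if candidate in new_likes:
            continue
        score = sum(jaccard(fans, own) for own in known)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically so results are deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def recommend_users(
    index: dict[str, set[str]], users: dict[str, list[str]], target: str, k: int = 3
) -> list[str]:
    """Task 4: rank users who don't already like `target` by how close their likes are to it."""
    fans = index.get(target, set())
    scores = {}
    for user_id, likes in users.items():
        if user_id in fans:
            continue
        score = sum(jaccard(fans, index[like]) for like in set(likes) if like in index)
        if score > 0:
            scores[user_id] = score
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


if __name__ == "__main__":
    users = {
        "u1": ["stats", "cs", "dota2"],
        "u2": ["stats", "cs"],
        "u3": ["cs", "weight training"],
        "u4": ["walking dead", "basketball"],
    }
    index = build_index(users)
    print(recommend_likes(index, ["stats"]))  # ['cs', 'dota2']
    print(recommend_users(index, users, "dota2"))  # ['u2', 'u3']
```

As the reply suggests, task 4 reuses the same index as task 3: it just swaps the roles, scoring users against one target like instead of scoring likes against one new user.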

OP | snowdustdj posted on 2014-4-10 02:24:33 from mobile
danielgao posted on 2014-4-10 00:37
I guess the interview is from FB, and it's probably not a good idea to post it online....

1. us ...

It's from a startup, not FB...
I don't know hash maps or hash tables. Is there a quick way to learn and apply them?

I think the second task is to find the pairs within each person. Say my likes are a, b, c; then I need to find the frequency of the pair (a, b) across all people.
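On the hash map question: in Python the built-in `dict` is a hash table, and `collections.Counter` is a `dict` subclass made for counting. Below is a minimal sketch of task 2 as read above (pairs within each person, counted across all people); the function name `count_like_pairs` is hypothetical. Each person's likes are de-duplicated and sorted so that (a, b) and (b, a) share one key, and a person with n likes contributes n(n-1)/2 pairs.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations
from typing import Iterable


def count_like_pairs(people: Iterable[list[str]], min_count: int = 1) -> Counter:
    """Count how many people like each unordered pair of items.

    Sorting each person's de-duplicated likes makes ("cs", "stats") and
    ("stats", "cs") the same key. A person with n likes adds n*(n-1)/2 pairs.
    """
    pair_counts: Counter = Counter()
    for likes in people:
        unique = sorted(set(likes))
        pair_counts.update(combinations(unique, 2))
    if min_count > 1:
        # Drop rare pairs so the histogram stays readable. This runs after
        # counting, so it does not reduce peak memory.
        pair_counts = Counter({p: n for p, n in pair_counts.items() if n >= min_count})
    return pair_counts


if __name__ == "__main__":
    people = [["a", "b", "c"], ["b", "a"], ["c"]]
    print(count_like_pairs(people)[("a", "b")])  # 2
```

Since the number of distinct pairs can grow much faster than the number of distinct likes, this is where the earlier suggestion of a disk-backed key-value store (or a distributed tool) starts to matter.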

阿骄 posted on 2015-11-21 11:20:30
This problem is a great fit for Spark.
