Last edited by maloch on 2020-9-24 12:00
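I'm starting the fourth year of my PhD and did a DS internship at Facebook this summer. Right after it ended, Google HR contacted me to set up interviews. I had planned to apply during spring recruiting, but since the opportunity came to me, I figured I had nothing to lose.
HR was very friendly and asked about my graduation plans, internship experience, career plans, and so on. The first-round interview was scheduled quickly.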
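Technical phone screen: a guy from the YouTube team who had been at Google for 8 years. He was 10 minutes late but very friendly and made up the time at the end. He also listened patiently to my PhD research (he asked directly what my dissertation would cover). One thing to note: however impressive you make your model sound, prepare an answer for follow-ups like "how do you validate that your estimation is accurate?" or "can you give a CI for the estimation?"
Questions: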
1. Classic coding: generate the vector 1-100, take the square root of each element, and sum the elements at odd positions. A for loop is discouraged; I used x = sqrt(seq(1,100)); y = sum(x[seq(1,99,by=2)]).
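2. Plot interpretation, all linear regression concepts. First, look at the pairwise scatterplots of y against three variables and say what you think the regression will look like. Then you're shown summary(lm model) and asked what information and potential problems you see (confirming your earlier observations; a very high R-squared may indicate overfitting). Then diagnostics: look at scatterplots of the regression residuals against each x and say what's wrong. Finally, look at the studentized residuals and the QQ-plot and say what you see.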
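3. Case study: YouTube home feed. A new algorithm improves the home feed overall, but for 8% of the sampled channels revenue drops by more than 10%. What do you make of it? If another algorithm brings revenue back up for 4% of those 8%, but causes revenue for a different 4% of the sample to drop by more than 10%, what do you make of it? I steered toward multiple testing, which felt like the right direction.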
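For readers working in Python rather than R, here is a minimal standard-library equivalent of the vectorized answer to question 1 (an assumption: the interview itself was in R). R positions 1, 3, ..., 99 correspond to Python's zero-based indices 0, 2, ..., 98, so a step-2 slice picks the same elements.

```python
import math

# Square roots of 1..100, applied element-wise with map (no explicit loop body).
x = list(map(math.sqrt, range(1, 101)))

# R's odd positions 1, 3, ..., 99 are Python's zero-based indices 0, 2, ..., 98.
y = sum(x[::2])
print(y)
```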
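One way to make the multiple-testing angle of the channel-revenue case study concrete, assuming you have a p-value for each channel's revenue change: with many channels, some will show large drops by chance alone, so one option is to control the false discovery rate, e.g. with the Benjamini-Hochberg procedure, before concluding that the algorithm hurt specific channels. This is a minimal sketch; the p-values are made up for illustration.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a reject flag per hypothesis, controlling the FDR at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # keep the largest rank k with p_(k) <= k * q / m
        if pvals[i] <= rank * q / m:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

# Hypothetical per-channel p-values for "revenue dropped" tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(pvals)
naive = [p <= 0.05 for p in pvals]
print(sum(naive), sum(flags))
```

Here five channels look significant at 0.05 one at a time, but only two survive the FDR control.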
HR followed up quickly by phone, said the interviewer was impressed, and scheduled a 5-round onsite.
1st round: Chrome team
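The interviewer said it was an open-ended question: how do you find out whether the following conclusion is true or not: using methods that have lower power to analyze experiments leads to a higher fraction of published papers being incorrect. (I bombed this one: I completely misread the question and felt the interviewer steered me in the wrong direction. Feel free to think it over; I know the answer now and will post it in the comments in a couple of days.)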
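One standard way to reason about the claim (a sketch, not necessarily the answer the interviewer wanted) is to assume that only significant results get published, every test uses level alpha, and a fraction pi of tested hypotheses are real effects. The expected share of published findings that are false positives is then alpha(1 - pi) / (alpha(1 - pi) + power * pi), which rises as power falls. The values of alpha and pi below are illustrative assumptions; simulating many synthetic experiments would check the same thing empirically.

```python
def false_finding_fraction(power, alpha=0.05, prior_true=0.1):
    """Expected share of significant (published) results that are false positives.

    Assumptions: every hypothesis is tested at level alpha, only significant
    results are published, and a fraction prior_true of hypotheses are real.
    """
    false_pos = alpha * (1 - prior_true)  # true nulls that reach significance
    true_pos = power * prior_true         # real effects that are detected
    return false_pos / (false_pos + true_pos)

for power in (0.2, 0.5, 0.8):
    print(power, round(false_finding_fraction(power), 3))
```

Under these assumptions the share is about 0.69 at power 0.2 versus 0.36 at power 0.8.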
2nd round: YouTube team
Given customer features X, predict whether a user will watch a video on the home feed page. How would you design this? (To predict a click: logistic regression; to predict the number of clicks: Poisson regression.)
Let's do logistic. Write down the code (I used R). What if you don't have the glm function; how do you solve the logistic regression? (Write down the loss function, code it out, use gradient descent.)
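A minimal sketch of fitting logistic regression without glm, in standard-library Python rather than the R used in the interview: minimize the mean negative log-likelihood L(w) = -(1/n) * sum[y_i log p_i + (1 - y_i) log(1 - p_i)] with p_i = sigmoid(x_i' w), whose gradient is (1/n) * sum (p_i - y_i) x_i. The learning rate, iteration count, and simulated data are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def neg_log_likelihood(w, X, y):
    """Mean negative log-likelihood (the logistic loss)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)

def fit_logistic_gd(X, y, lr=0.5, n_iter=1000):
    """Batch gradient descent; each row of X starts with 1.0 for the intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi  # p_i - y_i
            for j in range(d):
                grad[j] += err * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Simulated data (assumption): true intercept -0.5, true slope 2.0.
rng = random.Random(42)
X = [[1.0, rng.gauss(0, 1)] for _ in range(300)]
y = [1 if rng.random() < sigmoid(-0.5 + 2.0 * xi[1]) else 0 for xi in X]
w_hat = fit_logistic_gd(X, y)
print(w_hat, neg_log_likelihood(w_hat, X, y))
```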
If there are more than 900 customer features but a sample size of only 1000, what will happen? (Overfitting, collinearity.) How do you fix it? (PCA, PLS, AIC/BIC, Lasso, etc.)
Code Lasso (glmnet). Write down the Lasso objective function. How do you choose the hyperparameter? (Cross-validation.) Code the CV.
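The Lasso objective, in the Gaussian form glmnet uses with alpha = 1, is (1/(2n)) * sum (y_i - b0 - x_i' b)^2 + lambda * sum_j |b_j|. As a rough sketch of the idea behind glmnet and cv.glmnet (not their actual implementation, which adds pathwise warm starts, default standardization, and more), here is plain coordinate descent with soft-thresholding plus K-fold CV over a small lambda grid, in standard-library Python. The lambda grid, fold count, and simulated data are assumptions.

```python
import random

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/(2n))||y - b0 - Xb||^2 + lam*||b||_1.
    The intercept b0 is unpenalized and handled by centering."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    resid = [v - y_mean for v in y]  # residual of centered y, since b starts at 0
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(d)]
    b = [0.0] * d
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(d):
            if z[j] == 0.0:
                continue
            # correlation with the partial residual (feature j's own term added back)
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + z[j] * b[j]
            new_bj = soft_threshold(rho, lam) / z[j]
            step = new_bj - b[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                b[j] = new_bj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return b0, b

def cv_lasso(X, y, lambdas, k=5, seed=0):
    """K-fold CV: return the lambda with the lowest held-out MSE, plus all MSEs."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv_mse = []
    for lam in lambdas:
        sq_err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            b0, b = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:
                pred = b0 + sum(bj * xj for bj, xj in zip(b, X[i]))
                sq_err += (y[i] - pred) ** 2
        cv_mse.append(sq_err / len(y))
    best = min(range(len(lambdas)), key=lambda t: cv_mse[t])
    return lambdas[best], cv_mse

# Simulated data (assumption): 10 features, only the first two matter.
rng = random.Random(7)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
y = [3.0 * r[0] - 2.0 * r[1] + rng.gauss(0, 0.5) for r in X]
lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]
best_lam, cv_mse = cv_lasso(X, y, lambdas)
b0, b = lasso_cd(X, y, best_lam)
print(best_lam, [round(v, 2) for v in b])
```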
Given an estimate, how do you calculate its CI? (Bootstrap.)
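A minimal percentile-bootstrap sketch: resample the observations with replacement, recompute the statistic each time, and read the CI off the empirical quantiles of the replicates. The statistic (a sample mean), the data, and the replicate count are illustrative assumptions; other bootstrap CI variants (e.g. BCa) exist.

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for stat(data), resampling observations with replacement."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot))
    k = round((1 - level) / 2 * n_boot)  # replicates cut from each tail
    return reps[k], reps[n_boot - 1 - k]

def mean(values):
    return sum(values) / len(values)

# Illustrative data (assumption): 200 draws from an exponential distribution.
gen = random.Random(3)
data = [gen.expovariate(1.0) for _ in range(200)]
lo, hi = bootstrap_ci(data, mean)
print(mean(data), (lo, hi))
```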
If you have method A and method B, how do you use the bootstrap to test whether A is better than B? (In each bootstrap round compute the difference in accuracy, then look at the distribution of all the accuracy differences.)
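A minimal sketch of that paired bootstrap, assuming both methods are scored on the same test examples (the 0/1 correctness vectors are hypothetical). Resampling examples, rather than each method's results separately, keeps A's and B's outcomes on the same example together, so the spread of the differences reflects how the comparison would vary across test sets.

```python
import random

def paired_bootstrap_diff(correct_a, correct_b, n_boot=2000, seed=0):
    """Paired bootstrap for accuracy(A) - accuracy(B) on the same test examples.

    correct_a[i], correct_b[i] are 1 if the method got example i right, else 0.
    Returns a 95% percentile CI for the difference and the share of bootstrap
    differences <= 0 (a small share suggests A is genuinely better).
    """
    rng = random.Random(seed)
    n = len(correct_a)
    per_example = [a - b for a, b in zip(correct_a, correct_b)]
    diffs = []
    for _ in range(n_boot):
        # resample whole examples so each example's A and B results stay paired
        diffs.append(sum(per_example[rng.randrange(n)] for _ in range(n)) / n)
    diffs.sort()
    k = round(0.025 * n_boot)
    share_not_better = sum(d <= 0 for d in diffs) / n_boot
    return diffs[k], diffs[n_boot - 1 - k], share_not_better

# Hypothetical results on 500 shared test examples (accuracy 0.85 vs 0.75).
correct_a = [1] * 425 + [0] * 75
correct_b = [1] * 375 + [0] * 125
lo, hi, share = paired_bootstrap_diff(correct_a, correct_b)
print(lo, hi, share)
```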
3rd round: Health team
Case study. Google wants to design a screening survey: people who search for headache, fever, etc. are directed to a survey that asks more about their symptoms and advises whether they should go to the hospital. How do you test the effect of this survey? (1. Define treatment/control groups; assignment should be event-based (searching health-related terms). 2. Test how people react to the survey (whether they go to the hospital after the suggestion). 3. Test how accurate the suggestions are (whether users are satisfied with the survey's suggestions).)
How do you know whether users went to the hospital, and how they felt about the suggestions? (A follow-up survey.)
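What if only part of the sample took the follow-up survey, say, mostly younger people? How do you generalize this biased sample to the population? (I ran out of time here and got stuck on this one too. Feel free to think it through together; it seems like a common question: how do you correct for a biased sample?)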
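One common fix is post-stratification, a standard survey technique offered here as one option rather than the interviewer's expected answer: estimate the outcome within each stratum (e.g. age group), then weight each stratum by its known share of the searching population. It assumes that, within a stratum, respondents resemble non-respondents, and that the population shares are known; inverse-propensity weighting or raking extend the same idea to more covariates. The data and shares below are hypothetical.

```python
def post_stratified_mean(responses, population_share):
    """Reweight respondents so each stratum counts by its population share.

    responses: list of (stratum, outcome) pairs from people who answered.
    population_share: stratum -> share of the target population (sums to 1).
    """
    by_stratum = {}
    for stratum, outcome in responses:
        by_stratum.setdefault(stratum, []).append(outcome)
    missing = set(population_share) - set(by_stratum)
    if missing:
        raise ValueError(f"no respondents in strata: {sorted(missing)}")
    return sum(
        share * sum(by_stratum[s]) / len(by_stratum[s])
        for s, share in population_share.items()
    )

# Hypothetical: respondents skew young; outcome 1 = went to the hospital.
responses = ([("young", 1)] * 16 + [("young", 0)] * 64
             + [("older", 1)] * 12 + [("older", 0)] * 8)
naive = sum(o for _, o in responses) / len(responses)
weighted = post_stratified_mean(responses, {"young": 0.5, "older": 0.5})
print(naive, weighted)
```

The naive mean (0.28) is dominated by the over-represented young respondents; reweighting to a 50/50 population gives 0.4.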
4th round: Ads metrics
Model 1: regress Y ~ X with sample size n. Model 2: duplicate every sample in model 1, giving sample size 2n, and regress Y' ~ X'. Will the coefficient estimates change? Why? (No; show the math.)
Will the CI change? Why? (Yes; show the math.)
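A quick numerical check of these two answers. In matrix form b_hat = (X'X)^(-1) X'y; duplicating every row doubles both X'X and X'y, so b_hat is unchanged. The estimated covariance s^2 (X'X)^(-1) does change: the RSS doubles, X'X doubles, and the residual degrees of freedom go from n - p to 2n - p, so each standard error (and the CI width, ignoring the small change in the t critical value) shrinks by a factor of sqrt((n - p)/(2n - p)), roughly 1/sqrt(2). The sketch uses simple regression (p = 2) on simulated data, both assumptions.

```python
import math
import random

def ols_simple(x, y):
    """Simple linear regression y = a + b*x; returns (a, b, standard error of b)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - p, p = 2
    return a, b, se_b

rng = random.Random(0)
n = 50
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

_, b1, se1 = ols_simple(x, y)          # model 1: n rows
_, b2, se2 = ols_simple(x + x, y + y)  # model 2: every row duplicated, 2n rows
print(b1, b2, se2 / se1, math.sqrt((n - 2) / (2 * n - 2)))
```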
What's the issue with model 2? (It violates the independence assumption.)
How would you estimate the coefficient CI with model 2's data? (Bootstrap.) Do you expect the bootstrap CI to be close to model 1's or model 2's regression CI? Why? (Model 1.)
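A sketch of why the bootstrap answer depends on what gets resampled, under a simulated setup (an assumption). Resampling the n original observations as clusters, keeping both copies of each together, reproduces model 1's pairs bootstrap and hence model 1's uncertainty. Naively resampling 2n individual rows from the duplicated data is equivalent to drawing 2n points from the original empirical distribution, so its spread mimics model 2's over-confident CI instead, about 1/sqrt(2) as wide.

```python
import random
import statistics

def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

rng = random.Random(1)
n = 50
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# Model 2 data: each original row appears twice; cluster[i] is its original row id.
x2, y2 = x + x, y + y
cluster = list(range(n)) * 2
rows_of = {}
for row, c in enumerate(cluster):
    rows_of.setdefault(c, []).append(row)

n_boot = 1000
cluster_slopes, naive_slopes = [], []
for _ in range(n_boot):
    # Cluster bootstrap: draw n original ids, keep every copy of each.
    picked = [rng.randrange(n) for _ in range(n)]
    rows = [r for c in picked for r in rows_of[c]]
    cluster_slopes.append(slope([x2[r] for r in rows], [y2[r] for r in rows]))
    # Naive bootstrap: draw 2n individual rows, ignoring the duplication.
    rows = [rng.randrange(2 * n) for _ in range(2 * n)]
    naive_slopes.append(slope([x2[r] for r in rows], [y2[r] for r in rows]))

cluster_sd = statistics.stdev(cluster_slopes)
naive_sd = statistics.stdev(naive_slopes)
print(cluster_sd, naive_sd, naive_sd / cluster_sd)
```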
Another question: you have a budget B to invest across N channels for an ad campaign, such as TV, radio, email, YouTube... How to ...
... Udacity abtest summary. Beyond that, here are a few DS interview quick-review resources I found quite good: how to choose k in PCA, mixed effects, a quick review of statistics coding in R, and causal inference. I have many more bookmarked that I won't list one by one; you can generally find them on Google. I'd suggest thinking about where your knowledge has gaps, e.g. recommendation systems, outlier detection, even survival analysis; reading up on them for even a few minutes is far safer than going in with nothing. Wishing everyone plenty of offers and a successful career.
Forum points appreciated!