一亩三分地


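Complete data science (DS) interview question collection [Newcomer asking for rice (forum points); answers to all LeetCode SQL problems to follow]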

Posted by zzh2011

2020 (Apr-Jun) | Analytics / Data Science | Master's | Full-time @ all - Other - Technical phone interview | Other | Fresh grad

Main categories of DS interview questions
Statistics
Programming
General
Big Data
Python
R
SQL
Modeling
Behavioral
Culture Fit
Problem-Solving

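[Newcomer asking for rice (forum points)! Once I have enough, I will give back by sharing my hand-compiled summary of all LeetCode SQL problems (grouped by topic) with answers!]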

Statistics
1 What is the Central Limit Theorem and why is it important?
2 What is sampling? How many sampling methods do you know?
3 What is the difference between Type I vs Type II error?
4 What is linear regression? What do the terms P-value, coefficient, R-Squared value mean? What is the significance of each of these components?
5 What are the assumptions required for linear regression?
    There are four major assumptions:
1. There is a linear relationship between the dependent variable and the regressors, meaning the model you are creating actually fits the data.
2. The errors (residuals) are normally distributed and independent of each other.
3. There is minimal multicollinearity between the explanatory variables.
4. Homoscedasticity: the variance of the residuals around the regression line is the same for all values of the predictor variables.
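A minimal sketch of how some of these assumptions might be eyeballed on a fitted line, using only the Python standard library. The synthetic data, the helper names `fit_line` and `residuals`, and the split-in-half variance comparison are illustrative assumptions, not a formal statistical test.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residuals(xs, ys, a, b):
    """Observed value minus fitted value for every point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Synthetic data (assumption): a true line plus constant-variance Gaussian noise.
random.seed(0)
xs = [i / 10 for i in range(200)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1) for x in xs]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)

# With an intercept in the model, OLS residuals average to (numerically) zero.
mean_res = statistics.fmean(res)

# Rough homoscedasticity check: residual variance for high x vs low x.
# A ratio far from 1 would hint that the spread changes with the predictor.
half = len(xs) // 2
var_ratio = statistics.variance(res[half:]) / statistics.variance(res[:half])

print(f"intercept={a:.2f} slope={b:.2f}")
print(f"mean residual={mean_res:.6f} variance ratio={var_ratio:.2f}")
```

In practice one would also plot residuals against fitted values and inspect a Q-Q plot for normality; the ratio here is only a quick numeric stand-in for those plots.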
6 What is a statistical interaction?
7 What is selection bias?
8 What is an example of a dataset with a non-Gaussian distribution?
9 What is the Binomial Probability Formula?
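For question 9, the binomial probability formula is P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), where n is the number of independent trials, k the number of successes, and p the success probability per trial. A small sketch follows; the function name `binomial_pmf` and the coin-flip example are illustrative choices.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 10 fair coin flips: C(10, 3) / 2**10 = 120 / 1024.
prob = binomial_pmf(3, 10, 0.5)

# The probabilities over all possible k sum to 1.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

print(prob, total)
```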


Programming
1 With which programming languages and environments are you most comfortable working?
2 What are some pros and cons about your favorite statistical software?
3 Tell me about an original algorithm you’ve created.
4 Describe a data science project in which you worked with a substantial programming component. What did you learn from that experience?
5 Do you contribute to any open source projects?
6 How would you clean a dataset in (insert language here)?
7 Tell me about the coding you did during your last project.

Big Data
1 What are the two main components of the Hadoop Framework?
2 Explain how MapReduce works as simply as possible.
3 How would you sort a large list of numbers?
4 Here is a big dataset. What is your plan for dealing with outliers? How about missing values? How about transformations?

Python
1 What modules/libraries are you most familiar with? What do you like or dislike about them?
2 What are the supported data types in Python?
3 What is the difference between a tuple and a list in Python?
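For question 3, a short sketch of the two differences interviewers usually look for: lists are mutable, while tuples are immutable and therefore hashable (as long as their items are). The variable names are illustrative.

```python
point = (1, 2)    # tuple: immutable
values = [1, 2]   # list: mutable

values.append(3)  # a list can grow in place

try:
    point[0] = 10          # item assignment on a tuple raises TypeError
    tuple_mutated = True
except TypeError:
    tuple_mutated = False

# A tuple of hashable items can be used as a dict key.
distances = {point: "origin-ish"}

try:
    bad = {values: 1}      # a list is unhashable, so this raises TypeError
    list_hashable = True
except TypeError:
    list_hashable = False

print(values, tuple_mutated, list_hashable)
```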

R
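1 What are the different types of sorting algorithms available in the R language?
The commonly cited answer is insertion, bubble, and selection sort, which can all be implemented by hand in R. The built-in sort() function itself offers the methods "auto", "shell", "quick", and "radix".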
2 What are the different data objects in R?
3 What packages are you most familiar with? What do you like or dislike about them?
4 How do you access the element in the 2nd column and 4th row of a matrix named M?
5 What is the command used to store R objects in a file?
6 What is the best way to use Hadoop and R together for analysis?
7 How do you split a continuous variable into different groups/ranks in R?
8 Write a function in R to replace the missing values in a vector with the mean of that vector.


SQL
1 What is the purpose of the group functions in SQL? Give some examples of group functions.
Group functions are used to compute summary statistics over groups of rows. COUNT, MAX, MIN, AVG, and SUM are all group (aggregate) functions; DISTINCT is a keyword rather than a group function, though it can be used inside one, as in COUNT(DISTINCT col).
2 Tell me the difference between an inner join, left join/right join, and union.
3 What does UNION do? What is the difference between UNION and UNION ALL?
4 What is the difference between SQL and MySQL or SQL Server?
5 If a table contains duplicate rows, does a query result display the duplicate values by default? How can you eliminate duplicate rows from a query result?
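The group-function, UNION, and duplicate-row questions above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `orders` table and its rows are made up for illustration, and behavior is shown for SQLite only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ann', 10), ('ann', 10), ('bob', 5);
""")

# Group (aggregate) functions summarize the rows within each group.
summary = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount), MAX(amount) "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# A plain SELECT returns duplicate rows; DISTINCT removes them.
all_rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
distinct_rows = conn.execute(
    "SELECT DISTINCT customer, amount FROM orders ORDER BY customer"
).fetchall()

# UNION de-duplicates the combined result; UNION ALL keeps every row.
union_rows = conn.execute(
    "SELECT customer FROM orders UNION SELECT customer FROM orders"
).fetchall()
union_all_rows = conn.execute(
    "SELECT customer FROM orders UNION ALL SELECT customer FROM orders"
).fetchall()

conn.close()

print(summary)
print(len(all_rows), len(distinct_rows), len(union_rows), len(union_all_rows))
```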


Modeling
1 Tell me about how you designed the model you created for a past employer or client.
2 What are your favorite data visualization techniques?
3 How would you effectively represent data with 5 dimensions?
4 How is kNN different from k-means clustering?
kNN, or k-nearest neighbors, is a classification algorithm, where k is an integer describing the number of neighboring data points that influence the classification of a given observation. K-means is a clustering algorithm, where k is an integer describing the number of clusters to be created from the given data. The two accomplish different tasks: kNN is supervised and needs labeled data, while k-means is unsupervised.
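To make the contrast concrete, here is a minimal pure-Python sketch of both. The toy points, labels, starting centers, and the function names `knn_predict` and `kmeans` are illustrative assumptions; real work would normally use a library implementation.

```python
from collections import Counter
from math import dist
from statistics import fmean

def knn_predict(train, labels, query, k=3):
    """Supervised: majority vote among the k labeled points nearest to query."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def kmeans(points, centers, iters=10):
    """Unsupervised: alternate point assignment and centroid update."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(fmean(coord) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels = ["low", "low", "low", "high", "high", "high"]

# kNN uses the labels: k is the number of neighbors that vote.
pred = knn_predict(points, labels, query=(8, 8), k=3)

# k-means ignores the labels: k is the number of clusters (two starting centers here).
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])

print(pred, centers)
```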
5 How would you create a logistic regression model?
6 Have you used a time series model? Do you understand cross-correlations with time lags?
7 Explain the 80/20 rule, and tell me about its importance in model validation.
8 Explain what precision and recall are. How do they relate to the ROC curve?
Recall describes what percentage of actual positives are predicted as positive by the model. Precision describes what percent of positive predictions were correct. The ROC curve shows the relationship between model recall and specificity – specificity being a measure of the percent of tru
[The forum hides the content at this point from visitors whose points are not above 188; the text resumes mid-sentence below.]
llars in the lottery, what would you do with the money?
14 What is one thing you believe that most people do not?
15 What personality traits do you butt heads with?
16 What are you passionate about?


Problem Solving
1 How would you come up with a solution to identify plagiarism?
2 How many “useful” votes will a Yelp review receive?
3 How do you detect individual paid accounts shared by multiple users?
4 You are about to send one million emails. How do you optimize delivery? How do you optimize response?
5 You have a dataset containing 100K rows and 100 columns, with one of those columns being the dependent variable for a problem we'd like to solve. How can we quickly identify which columns will be helpful in predicting the dependent variable? Identify two techniques and explain them to me as though I were 5 years old.
6 How would you detect bogus reviews, or bogus Facebook accounts used for bad purposes?
7 How would you perform clustering on one million unique keywords, assuming you have 10 million data points, each one consisting of two keywords and a metric measuring how similar these two keywords are? How would you create this table of 10 million data points in the first place?
8 How would you optimize a web crawler to run much faster, extract better information, and better summarize data to produce cleaner databases?


// Wishing everyone an offer soon


zzh2011 (OP) | 2020-8-19 09:08:54
Part 1 of the SQL series, direct link: LeetCode SQL answers grouped by problem type: (I) JOIN (excluding self join) -> https://www.1point3acres.com/bbs/thread-661677-1-1.html

BODIANLEE | 2020-8-24 09:41:44
Honestly, these questions are not that hard! Thanks, OP.

shuke-chen | 2020-8-23 05:54:04
I don't have enough points to see the hidden part... could some kind soul give me some rice (points)? Thanks.

Bookmarking this.

Thanks for sharing, OP.

zzh2011 (OP) | 2020-8-19 09:05:36
You're welcome, good luck!


msnlbj | 2020-8-19 10:26:09
Leaving my name here before this thread blows up. Thanks for sharing, OP!


FinalLi | 2020-8-19 11:18:58
Thanks for sharing, OP.

璐璐璐 | 2020-8-20 05:39:42
Bookmarking this, thanks for sharing.

小黄狗 | 2020-8-20 06:01:28
Bookmarking this; I'll come back once I have enough points.
