查看: 615| 回复: 0
收起左侧

今日分享:Top 40 DS Interview Questions

|只看干货
本楼: 👍   0% (0)
 
 
0% (0)   👎
全局: 👍   84% (22)
 
 
15% (4)    👎

注册一亩三分地论坛,查看更多干货!

您需要 登录 才可以下载或查看附件。没有帐号?注册账号

x
今日分享:Top 40 DS Interview Questions. Waral dи,

http://xhslink.com/itUwcm
11. Explain likely differences between administrative datasets and datasets gathered from experimental studies. What are likely problems encountered with administrative data? How do experimental methods help alleviate these problems? What problem do they bring?
Administrative datasets are typically datasets used by governments or other organizations for non-statistical reasons.. ----
Administrative datasets are usually larger and more cost-efficient than experimental studies. They are also regularly updated assuming that the. 1point3acres
organization associated with the administrative dataset is active and functioning. At the same time, administrative datasets may not capture all of the data that one may want and may not be in the desired format either. It is also prone to quality issues and missing entries.

12. You are compiling a report for user content uploaded every month and notice a spike in uploads in October. In particular, a spike in picture uploads. What might you think is the cause of this, and how would you test it?. 1point3acres
There are a number of potential reasons for a spike in photo uploads:
1. A new feature may have been implemented in October which involves uploading photos and gained a lot of traction by users. For example, a feature that gives the ability to create photo albums.
2. Similarly, it’s possible that the process of uploading photos before was not intuitive and was improved in the month of October.. ----
3. There may have been a viral social media movement that involved uploading photos that lasted for all of October. Eg. Movember but something more scalable.
4. It’s possible that the spike is due to people posting pictures of themselves in costumes for Halloween.
The method of testing depends on the cause of the spike, but you would conduct hypothesis testing to determine if the inferred cause is the actual cause.

13. You’re about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it’s raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that “Yes” it is raining. What is the probability that it’s actually raining in Seattle?
You can tell that this question is related to Bayesian theory because of the last statement which essentially follows the structure, “What is the probability A is true given B is true?” Therefore we need to know the probability of it raining in London on a given day. Let’s assume it’s 25%.
P(A) = probability of it raining = 25% P(B) = probability of all 3
friends say that it’s raining P(A|B) probability that it’s raining given they’re telling that it is raining P(B|A) probability that all 3 friends say that it’s raining given it’s raining = (2/3)3 = 8/27
Step 1: Solve for P(B) P(A|B) = P(B|A) * P(A) / P(B), can be rewritten as
P(B) = P(B|A) * P(A) + P(B|not A) * P(not A) P(B) = (2/3)3 * 0.25 + (1/3)3 * 0.75 = 0.25*8/27 + 0.75*1/27
Step 2: Solve for P(A|B) P(A|B) = 0.25 * (8/27) / ( 0.25*8/27 + 0.75*1/27) P(A|B) = 8 / (8 + 3) = 8/11
Therefore, if all three friends say that it’s raining, then there’s an 8/11 chance that it’s actually raining.
. 1point3acres.com
14. There’s one box — has 12 black and 12 red cards, 2nd box has 24 black and 24 red; if you want to draw 2 cards at random from one of the 2 boxes, which box has the higher probability of getting the same color? Can you tell intuitively why the 2nd box has a higher probability. check 1point3acres for more.
The box with 24 red cards and 24 black cards has a higher probability of getting two cards of the same color. Let’s walk through each step.
Let’s say the first card you draw from each deck is a red Ace.
This means that in the deck with 12 reds and 12 blacks, there’s now 11 reds and 12 blacks. Therefore your odds of drawing another red are equal to 11/(11+12) or 11/23.
In the deck with 24 reds and 24 blacks, there would then be 23 reds and 24 blacks. Therefore your odds of drawing another red are equal to 23/(23+24) or 23/47.
Since 23/47 > 11/23, the second deck with more cards has a higher probability of getting the same two cards.
.1point3acres15. What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
Lift: lift is a measure of the performance of a targeting model measured against a random choice targeting model; in other words, lift tells you how much better your model is at predicting things than if you had no model.
KPI: stands for Key Performance Indicator, which is a measurable metric used to determine how well a company is achieving its business objectives. Eg. error rate.
Robustness: generally robustness refers to a system’s ability to handle variability and remain effective.
Model fitting: refers to how well a model fits a set of observations.
Design of experiments: also known as DOE, it is the design of any task that aims to describe and explain the variation of information under conditions that are
hypothesized to reflect the variable. [4] In essence, an experiment aims to predict an outcome based on a change in one or more inputs (independent variables)..--
80/20 rule: also known as the Pareto principle; states that 80% of the effects come from 20% of the causes. Eg. 80% of sales come from 20% of customers.

16. Define quality assurance, six sigma.
Quality assurance: an activity or set of activities focused on maintaining a desired level of quality by minimizing mistakes and defects.
Six sigma: a specific type of quality assurance methodology composed of a set of techniques and tools for process improvement. A six sigma process is one in which 99.99966% of all outcomes are free of defects.

17. Give examples of data that does not have a Gaussian distribution, nor log-normal.. 1point3acres.com
* Any type of categorical data won’t have a gaussian distribution or lognormal distribution.. 1point 3acres
* Exponential distributions — eg. the amount of time that a car battery lasts or the amount of time until an earthquake occurs.

18. What is root cause analysis? How to identify a cause vs. a correlation? Give examples
Root cause analysis: a method of problem-solving used for identifying the root cause(s) of a problem
Correlation measures the relationship between two variables, range from -1 to 1. Causation is when a first event appears to have caused a second event. Causation essentially looks at direct relationships while correlation can look at both direct and indirect relationships.
Example: a higher crime rate is associated with higher sales in ice cream in Canada, aka they are positively correlated. However, this doesn’t mean that one causes another. Instead, it’s because both occur more when it’s warmer outside.
You can test for causation using hypothesis testing or A/B testing.

19. Give an example where the median is a better measure than the mean. Χ
When there are a number of outliers that positively or negatively skew the data.

20. Given two fair dices, what is the probability of getting scores that sum to 4? to 8?
There are 4 combinations of rolling a 4 (1+3, 3+1, 2+2): P(rolling a 4) = 3/36 = 1/12
There are combinations of rolling an 8 (2+6, 6+2, 3+5, 5+3, 4+4): P(rolling an 8) = 5/36

21. What is the Law of Large Numbers?. 1point3acres.com
The Law of Large Numbers is a theory that states that as the number of trials increases, the average of the result will become closer to the expected value.
Eg. flipping heads from fair coin 100,000 times should be closer to 0.5 than 100 times.
想要更多相关练习题的话记得留言哟~

上一篇:SQL性能优化的21小技巧(二)
下一篇:interview query 账号共享
您需要登录后才可以回帖 登录 | 注册账号
隐私提醒:
  • ☑ 禁止发布广告,拉群,贴个人联系方式:找人请去🔗同学同事飞友,拉群请去🔗拉群结伴,广告请去🔗跳蚤市场,和 🔗租房广告|找室友
  • ☑ 论坛内容在发帖 30 分钟内可以编辑,过后则不能删帖。为防止被骚扰甚至人肉,不要公开留微信等联系方式,如有需求请以论坛私信方式发送。
  • ☑ 干货版块可免费使用 🔗超级匿名:面经(美国面经、中国面经、数科面经、PM面经),抖包袱(美国、中国)和录取汇报、定位选校版
  • ☑ 查阅全站 🔗各种匿名方法

本版积分规则

>
快速回复 返回顶部 返回列表