ds_app2018

2018 (Jan-Mar) Analytics | Data Science, Master's, Full-time @ facebook - Referral - Technical Phone Screen | Other

… click-through rate of P. We select a sample of size N and examine the sample's conversion rate, d… P_hat is within DELTA of the true click-through rate P, with 95% confidence.

ds_app2018 (OP) | 2018-3-20 11:51:13
Here is an answer someone posted online. I don't quite follow the last formula, N is greater than (1 / delta)^2. Let's discuss!

> Interpret the question this way: we want to choose an N such that P_hat is an element of [P - delta, P + delta] with probability 95%.
>
> First, note that P_hat is the average of N Bernoulli trials with a common parameter P (by assumption) that we are trying to estimate. By the central limit theorem, for large N we can treat P_hat as approximately normally distributed, with mean equal to the true mean P and variance P(1 - P) / N.
>
> Now, when does a normally distributed random variable fall within delta of its mean with 95% probability? The answer depends on how big delta is. Since P_hat is approximately normal, we know from our statistics classes that about 95% of the time it falls within 2 standard deviations of its mean. In other words, we want [P - delta, P + delta] = [P - 2*SE(P_hat), P + 2*SE(P_hat)]. That is, we want delta = 2*SE(P_hat).
>
> So what is the SE ("standard error") of P_hat? It is the square root of its variance, Sqrt(P * (1 - P) / N), which we would estimate by Sqrt(P_hat * (1 - P_hat) / N). But wait! We haven't run the experiment yet! How can we know what P_hat is? We can either (a) make an educated guess, or (b) take the worst possible case and use it to upper-bound N. Let's go with option (b): P_hat * (1 - P_hat) is maximized when P_hat is 0.5, where the product is 0.25.
>
> Putting it all together: delta = 2 * Sqrt(0.25) / Sqrt(N) = 2 * 0.5 / Sqrt(N) = 1 / Sqrt(N), so N = (1 / delta)^2. So when N is greater than (1 / delta)^2, we can rest assured that P_hat will fall within the acceptable range about 95% of the time.

cloud0325 | 2018-3-22 01:29:38
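The worst-case calculation in the answer ds_app2018 quoted can be checked numerically. Below is a minimal Python sketch, not taken from the thread: the helper name `worst_case_sample_size` and its `z` parameter are hypothetical, and it assumes the normal approximation with P(1 - P) replaced by its maximum, 0.25.

```python
import math


def worst_case_sample_size(delta: float, z: float = 2.0) -> int:
    """Smallest integer N with z * sqrt(0.25 / N) <= delta (worst case P = 0.5).

    Hypothetical helper; assumes the normal approximation to P_hat.
    """
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    raw = z * z * 0.25 / (delta * delta)
    # Round away floating-point noise (e.g. 9999.999999999998) before ceil.
    return math.ceil(round(raw, 9))


if __name__ == "__main__":
    for d in (0.05, 0.03, 0.01):
        print(d, worst_case_sample_size(d))
```

With z = 2 the bound reduces to (1 / delta)^2, e.g. 10,000 samples for delta = 0.01 (one percentage point). Rounding up is needed because N must be an integer.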
Because the question asks for the minimum N: P_hat * (1 - P_hat) is maximized when P_hat is 0.5, so 2*sqrt(P_hat * (1 - P_hat))/sqrt(N) <= 2*0.5/sqrt(N) = 1/sqrt(N). Requiring this worst-case half-width to be at most delta gives 1/sqrt(N) <= delta, then sqrt(N) >= 1/delta, so N >= (1/delta)^2.

Also, shouldn't it be 1.96 here rather than 2??

sunday2018 | 2018-3-24 03:34:27
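cloud0325's argument rests on P(1 - P) <= 0.25, with equality only at P = 0.5. The following Python sketch (the helper `required_sample_size` is hypothetical; it uses the same normal approximation and z = 2 as the thread) computes the N needed for each candidate P on a grid and checks that the largest requirement occurs at P = 0.5, so N >= (1/delta)^2 covers every P.

```python
import math


def required_sample_size(p: float, delta: float, z: float = 2.0) -> int:
    """N needed so that z * sqrt(p * (1 - p) / N) <= delta, for a known p.

    Hypothetical helper; assumes the normal approximation.
    """
    raw = z * z * p * (1 - p) / (delta * delta)
    # Strip floating-point noise before rounding up to an integer N.
    return math.ceil(round(raw, 9))


delta = 0.01
grid = [i / 100 for i in range(1, 100)]  # candidate true CTRs 0.01 .. 0.99
needs = {p: required_sample_size(p, delta) for p in grid}
worst_p = max(needs, key=needs.get)

print(worst_p, needs[worst_p])  # the worst case is p = 0.5
print(needs[0.1], needs[0.02])  # smaller CTRs need fewer samples for the same delta
```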
Has the OP had the interview yet?

hjftc001 | 2018-3-24 12:10:17
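> cloud0325 posted on 2018-3-22 01:29:
> Because the question asks for the minimum N: P_hat * (1 - P_hat) is maximized when P_hat is 0.5, so 2*sqrt(P_hat * (1 - P_h …

It should be 1.96. The answer the OP posted approximates it with the 3-sigma rule (the 68-95-99.7 rule), under which about 95% of a normal distribution lies within 2 standard deviations of the mean.

h19881812 | 2018-3-25 23:59:33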
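The 1.96-versus-2 point can be checked with the standard library. This Python sketch (needs Python 3.8+ for `statistics.NormalDist`) computes the exact two-sided 95% critical value and the coverage of the 2-standard-deviation rule used in the answer the OP quoted.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Two-sided 95%: leave 2.5% in each tail.
z_95 = std_normal.inv_cdf(0.975)


def coverage(k: float) -> float:
    """Probability mass within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


print(f"z for 95%: {z_95:.4f}")                  # about 1.96
print(f"coverage at 1.96: {coverage(1.96):.4f}")  # about 0.95
print(f"coverage at 2:    {coverage(2.0):.4f}")   # about 0.9545
```

Using 2 instead of 1.96 is slightly conservative: it raises the worst-case N from 0.9604/delta^2 to 1/delta^2, roughly 4% more samples.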
Actually, whether you use 1.96 or 2 is not very important, since we are using the CLT for a normal approximation anyway.

Let X_i = 1 if the i-th user clicks and 0 otherwise. Assume X_i ~ iid Ber(p), so sum_{i=1}^N X_i ~ Bin(N, p). Let p_hat = sum X_i / N be the sample click-through rate.

(1) E(p_hat) = E(sum X_i / N) = (1/N) sum E(X_i) = (1/N) * Np = p.

(2) Var(p_hat) = Var(sum X_i / N) = (1/N^2) sum Var(X_i) (by the i.i.d. property) = (1/N^2) * Np(1 - p) = p(1 - p) / N.

By the CLT, p_hat - p is asymptotically normally distributed with mean 0 and variance p(1 - p)/N. That is, (p_hat - p) / sqrt(p(1 - p)/N) ~ AN(0, 1).

Note that (1) 95% = P(|p_hat - p| < delta) = P(|(p_hat - p) / sqrt(p(1 - p)/N)| < delta / sqrt(p(1 - p)/N)), and (2) (p_hat - p) / sqrt(p(1 - p)/N) ~ AN(0, 1). Hence, from the distribution function of N(0, 1), we know that delta / sqrt(p(1 - p)/N) = 2 (or 1.96, whatever). Rewriting this formula gives N = 4 * delta^2 / p*(1-p). While we do not know the exact value of 0 < p < 1, simple calculus shows that p(1 - p) <= 1/4. Finally, we get N = 4 * delta^2 / p*(1-p) >= 4 * delta^2 / 4 = delta^2.

h19881812 | 2018-3-26 00:02:36
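h19881812's results E(p_hat) = p and Var(p_hat) = p(1 - p)/N can be sanity-checked by simulation. A minimal Python sketch, assuming i.i.d. Bernoulli clicks; the values of p, N, the number of repetitions, and the seed are arbitrary choices for illustration.

```python
from __future__ import annotations

import random
import statistics


def simulate_ctr_estimates(p: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` independent samples of n i.i.d. Bernoulli(p) clicks and
    return the sample click-through rate p_hat of each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n, reps = 0.3, 200, 4000
estimates = simulate_ctr_estimates(p, n, reps)

sim_mean = statistics.fmean(estimates)
sim_var = statistics.variance(estimates)
theory_var = p * (1 - p) / n  # p(1 - p) / N from the derivation

print(f"mean of p_hat: {sim_mean:.4f} (theory {p})")
print(f"var of p_hat:  {sim_var:.6f} (theory {theory_var:.6f})")
```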
> h19881812 posted on 2018-3-25 23:59:
> Actually, whether you use 1.96 or 2 is not very important, since we are using the CLT for a normal approximation anyway. …

Btw, when running a real experiment, we should be very careful about the assumption that the X_i's are i.i.d.

linbaobei001 | 2018-3-28 05:51:29
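To see why h19881812's i.i.d. caution matters, here is a hypothetical Python sketch (not from the thread). It assumes each of U users contributes m impressions and, as an extreme case, either clicks on all of them or on none. The naive formula p(1 - p)/N with N = U*m then understates the variance of p_hat by roughly a factor of m, because only U outcomes are independent.

```python
import random
import statistics


def clustered_ctr(p: float, users: int, per_user: int, rng: random.Random) -> float:
    """p_hat when each user's impressions are perfectly correlated:
    a user clicks on all `per_user` impressions with probability p, else on none."""
    clicks = sum(per_user for _ in range(users) if rng.random() < p)
    return clicks / (users * per_user)


rng = random.Random(1)
p, users, per_user, reps = 0.3, 200, 5, 3000
n_impressions = users * per_user

estimates = [clustered_ctr(p, users, per_user, rng) for _ in range(reps)]
observed_var = statistics.variance(estimates)
naive_var = p * (1 - p) / n_impressions  # what the i.i.d. formula predicts

print(f"observed / naive variance: {observed_var / naive_var:.2f}")  # near per_user
```

This is an extreme case chosen for clarity; weaker within-user correlation inflates the variance by less.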
> h19881812 posted on 2018-3-25 23:59:
> Actually, whether you use 1.96 or 2 is not very important, since we are using the CLT for a normal approximation anyway. …

Great answer, explained very clearly, but your last step seems to be wrong: N = 4*p(1-p)/delta^2. You swapped the numerator and the denominator.
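With linbaobei001's correction, the last step of h19881812's derivation can be written out as follows (same CLT approximation, with z = 2 or 1.96):

$$
\frac{\delta}{\sqrt{p(1-p)/N}} = z
\;\Longrightarrow\;
N = \frac{z^2\, p(1-p)}{\delta^2}
\le \frac{z^2}{4\delta^2},
\qquad \text{since } p(1-p) \le \tfrac{1}{4}.
$$

With z = 2 the right-hand side is 1/delta^2 = (1/delta)^2, matching the answer quoted by the OP; with z = 1.96 it is 0.9604/delta^2. So (1/delta)^2 samples are enough whatever the true p is.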
