FB sample size question

2018 (Jan-Mar) | Analytics / Data Science | Master's | Full-time @ Facebook | Referral | Technical phone interview | Other | Other

Let's say the population on Facebook clicks ads with a click-through rate of P. We select a sample of size N and examine the sample's conversion rate, d [...] {P} is within DELTA of the true click-through rate P, with 95% confidence.

OP | ds_app2018 2018-3-20 11:51:13
Someone online posted the answer below. I don't quite understand the last formula, "N is greater than (1 / delta)^2". Discussion welcome!
Interpret the question this way: we want to choose an N such that P_hat is an element of [P - delta, P + delta] with probability 95%.

First, note that P_hat is the average of N Bernoulli trials with a common parameter P (by assumption), which is what we are trying to estimate. For large N, the central limit theorem lets us treat P_hat as approximately normally distributed with mean equal to the true rate P and variance equal to P(1 - P) / N.

Now, when does a normally distributed random variable fall within delta of its mean with 95% probability? The answer depends on how big delta is relative to the standard deviation. Since P_hat is approximately normal, we know from our statistics classes that about 95% of the time it will fall within 2 standard deviations of its mean.

So in other words, we want [P - delta, P + delta] = [P - 2*SE(P_hat), P + 2*SE(P_hat)]. That is, we want delta = 2 * SE(P_hat).

So what is the SE ("standard error") of P_hat? Well that's just the square root of its (sample) variance, or Sqrt(P_hat * (1 - P_hat) / N). But wait! We haven't run the experiment yet! How can we know what P_hat is?

We can either (a) make an educated guess, or (b) take the "worst" possible case and use that to upper bound N.
Let's go with option (b): P_hat * (1 - P_hat) is maximized when P_hat is .5, where the product is 0.25.
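The worst-case value in option (b) can be checked without calculus by completing the square. A short worked step, writing p for P_hat (a shorthand chosen here for brevity):

```latex
p(1-p) \;=\; \tfrac14 - \left(p - \tfrac12\right)^2 \;\le\; \tfrac14,
\qquad \text{with equality exactly when } p = \tfrac12 .
```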

To put it all together: delta = 2 * Sqrt(0.25) / Sqrt(N) = 2 * .5 / Sqrt(N) = 1 / Sqrt(N) => N = (1 / delta)^2. So when N is at least (1 / delta)^2, we can rest assured that P_hat will fall within the acceptable range about 95% of the time.
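As a rough sketch of the calculation in this answer, the bound can be computed in Python as follows. The function name `min_sample_size` and its parameters are made up for illustration and are not part of the quoted answer; the result is only as good as the normal approximation behind it.

```python
import math


def min_sample_size(delta, z=2.0, p=0.5):
    """Smallest N with z * sqrt(p * (1 - p) / N) <= delta.

    delta: allowed absolute error of the sample rate (must be > 0)
    z:     normal quantile; 2.0 is the rule of thumb used in the answer,
           1.96 is closer to an exact 95% interval
    p:     guess for the true rate; 0.5 is the worst case
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    n = (z ** 2) * p * (1 - p) / delta ** 2
    # Subtract a tiny tolerance so float noise does not push an exact
    # integer such as 400.0000000001 up to 401.
    return math.ceil(n - 1e-9)


print(min_sample_size(0.05))          # worst case, z = 2    -> 400
print(min_sample_size(0.05, z=1.96))  # worst case, z = 1.96 -> 385
```

With delta = 0.05 the worst-case bound is (1 / 0.05)^2 = 400, which matches N = (1 / delta)^2 above. Passing an educated guess for p (option (a)) gives a smaller N.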

cloud0325 2018-3-22 01:29:38
Because we want the minimum N: P_hat * (1 - P_hat) is maximized when P_hat is .5,
so we need delta >= 2*sqrt(P_hat * (1 - P_hat))/sqrt(N) for every P_hat, and the worst case is delta >= 2*0.5/sqrt(N)
then delta >= 1/sqrt(N)
then sqrt(N) >= 1/delta
so N >= (1/delta)^2.
Also, shouldn't this be 1.96 rather than 2?

sunday2018 2018-3-24 03:34:27
Has the OP done the interview yet?

hjftc001 2018-3-24 12:10:17
cloud0325 posted on 2018-3-22 01:29:
Because we want the minimum N, P_hat * (1 - P_hat) is maximized when P_hat is .5
so 2*sqrt(P_hat * (1 - P_h ...

It should be 1.96. The answer the OP posted uses the 3-sigma (68-95-99.7) rule as an estimate, which rounds 1.96 up to 2.
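For a sense of how much the choice between 1.96 and 2 matters, here is a small check using `statistics.NormalDist` from Python's standard library. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided 95% quantile: P(|Z| <= z_95) = 0.95
z_95 = std_normal.inv_cdf(0.975)

# Coverage actually obtained when the quantile is rounded up to 2
coverage_at_2 = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

# N scales with z**2, so this is the ratio of the two sample sizes
n_ratio = (z_95 / 2.0) ** 2

print(f"z_95 = {z_95:.4f}")
print(f"coverage at 2 sigma = {coverage_at_2:.4f}")
print(f"N(1.96) / N(2) = {n_ratio:.4f}")
```

Using 2 gives roughly 95.45% coverage instead of 95%, and the sample size with 1.96 is about 96% of the sample size with 2, so rounding up costs about 4% more samples.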

h19881812 2018-3-25 23:59:33
Actually, whether we use 1.96 or 2 is not very important, since we are already relying on the CLT for a normal approximation.

Let X_i = 1 if the i-th user clicks and 0 otherwise. Assume X_i ~ iid Ber(p), so that sum_{i=1}^N X_i ~ Bin(N, p). Let p_hat = sum_{i=1}^N X_i / N be the sample click-through rate.
(1) E(p_hat) = E(sum X_i / N) = 1 / N * sum E(X_i) = 1 / N * Np = p.
(2) Var(p_hat) = Var(sum X_i / N) = 1 / N^2 * sum Var(X_i) (by the iid property) = 1 / N^2 * Np(1-p) = p(1-p) / N.
By the CLT, (p_hat - p) is asymptotically normally distributed with mean 0 and variance p(1-p)/N. That is, (p_hat - p) / sqrt(p(1-p) / N) ~ AN(0,1).
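The mean and variance in (1) and (2) can be sanity-checked with a quick simulation. This is only a sketch: the helper `simulate_p_hat` and the values of `n`, `p`, and `trials` are arbitrary choices made here, and it assumes the clicks really are i.i.d. Bernoulli draws.

```python
import random
from statistics import fmean, pvariance


def simulate_p_hat(n, p, rng):
    """Sample click-through rate of n i.i.d. Bernoulli(p) users."""
    clicks = sum(1 for _ in range(n) if rng.random() < p)
    return clicks / n


rng = random.Random(0)          # fixed seed so the run is repeatable
n, p, trials = 200, 0.3, 5000   # illustrative values, not from the thread

p_hats = [simulate_p_hat(n, p, rng) for _ in range(trials)]

print("mean of p_hat:    ", fmean(p_hats), " theory:", p)
print("variance of p_hat:", pvariance(p_hats), " theory:", p * (1 - p) / n)
```

The simulated mean should land close to p and the simulated variance close to p(1-p)/N, up to Monte Carlo noise.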

Note that
(1) 95% = P(|p_hat - p| < delta) = P(|(p_hat - p) / sqrt(p(1-p) / N)| < delta / sqrt(p(1-p) / N)) and
(2) (p_hat - p) / sqrt(p(1-p) / N) ~ AN(0,1).
Hence, from the distribution function of N(0,1), we know that delta / sqrt(p(1-p) / N) = 2 (or 1.96, whatever).

Now, rewriting the above formula gives N = 4 * p(1-p) / delta^2. While we do not know the exact value of 0 < p < 1, it is clear that p(1-p) <= 1/4 by some simple calculus.

Finally, we get N = 4 * p(1-p) / delta^2 <= 4 * (1/4) / delta^2 = 1 / delta^2, so N >= (1 / delta)^2 is enough for any p.
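One way to see whether N = (1 / delta)^2 really delivers about 95% coverage is to simulate it. The sketch below is an illustration under the i.i.d. Bernoulli assumption; the function `coverage` and the chosen values (delta = 0.05, 2000 trials) are assumptions made here, not part of the derivation.

```python
import random


def coverage(n, p, delta, trials, rng):
    """Fraction of simulated samples with |p_hat - p| <= delta."""
    hits = 0
    for _ in range(trials):
        clicks = sum(1 for _ in range(n) if rng.random() < p)
        # Compare on the count scale, with a small tolerance so that
        # float rounding does not misclassify boundary cases.
        if abs(clicks - n * p) <= n * delta + 1e-9:
            hits += 1
    return hits / trials


rng = random.Random(1)  # fixed seed so the run is repeatable
delta = 0.05
n = 400  # (1 / delta)^2, the worst-case bound with z = 2

for p in (0.5, 0.1):
    print(f"p = {p}: coverage = {coverage(n, p, delta, 2000, rng):.3f}")
```

At p = 0.5 the coverage should come out near 95%, and at p = 0.1 it should be noticeably higher, since the worst-case N is larger than needed when p is far from 0.5.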

h19881812 2018-3-26 00:02:36
h19881812 posted on 2018-3-25 23:59:
Actually, it is not very important to use 1.96 or 2 since we use CLT to do the normal approximation. ...

Btw, when conducting a real experiment, we should be very careful about the assumption that X_i's are i.i.d.

linbaobei001 2018-3-28 05:51:29
h19881812 posted on 2018-3-25 23:59:
Actually, it is not very important to use 1.96 or 2 since we use CLT to do the normal approximation. ...

Great answer, very clearly explained, but your last step seems to be wrong: it should be N = 4*p(1-p) / delta^2; the numerator and denominator are swapped.