一亩三分地论坛

 找回密码
 获取更多干货,去instant注册!

扫码关注一亩三分地公众号
查看: 678|回复: 7
收起左侧

A9 second round interview intern

[复制链接] |试试Instant~ |关注本帖
kobe24 发表于 2015-12-16 06:57:22 | 显示全部楼层 |阅读模式

2016(7-9月) 分析|数据科学类 博士 实习@A9 - 网上海投 - 技术电面 |Other其他

注册一亩三分地论坛,查看更多干货!

您需要 登录 才可以下载或查看,没有帐号?获取更多干货,去instant注册!

x
A9 advertisement group:. from: 1point3acres.com/bbs
interviewer: Chinese male

5 min introduction and talk about research. more info on 1point3acres.com

. 1point3acres.com/bbs50 min data coding
                     1) given you a huge file with data matrix, compute the mean/variance of each column.鏈枃鍘熷垱鑷1point3acres璁哄潧
                     2) there are N employees, implement a algorithm that sample M of them. Assume each employee is equally being selected.

5 min for me to ask him questions.. more info on 1point3acres.com

神罗天征 发表于 2015-12-16 07:13:32 | 显示全部楼层
请问第二题怎么做呢?是什么蓄水池算法吗?
回复 支持 反对

使用道具 举报

 楼主| kobe24 发表于 2015-12-16 07:28:47 | 显示全部楼层
神罗天征 发表于 2015-12-16 07:13
请问第二题怎么做呢?是什么蓄水池算法吗?

at the beginning, the one I wrote has high complexity as follows:

def sample(N, M):
    CDF_list = [ ];
    pdf_list = [float(1/N)] * N;  # [1/N, 1/N, ...].鐣欏璁哄潧-涓浜-涓夊垎鍦
   
    # list of employee name
    employee_list = [.....];
    CDF_list.append(0.0);
. 1point3acres.com/bbs    for pos in range(0,len(pdf_list)):
.鏈枃鍘熷垱鑷1point3acres璁哄潧        accu_prob = float(sum(pdf_list[0:pos+1]));-google 1point3acres
        CDF_list.append(accu_prob);
        
    # sampling process,   sample one employee
    prob = random.unform(0,1);. 涓浜-涓夊垎-鍦帮紝鐙鍙戝竷
    count = 0;.鏈枃鍘熷垱鑷1point3acres璁哄潧
    sampled_list = [ ];
    # complexity: O(M*len(CDF_list))
    while count <= M:. 鐗涗汉浜戦泦,涓浜╀笁鍒嗗湴
        for i in range(0, len(CDF_list)):. 涓浜-涓夊垎-鍦帮紝鐙鍙戝竷
            if prob >= float(CDF_list) and prob < float(CDF_list[i+1]):. visit 1point3acres.com for more.
                if employee_list not in sampled_list:
                    sampled_list.append(employee_list);
                    count = count + 1;
                    
    # all the sampled employee in the sampled_list. 鍥磋鎴戜滑@1point 3 acres
    # print out them
    for pos in range(0,len(sampled_list)):
        print "each employee is "  + str(sampled_list[pos]);   

Then interviewer asked me the best case/worst case complexity of above algorithm and then he asked me how to improve?

Then I came up a random.shuffle algorithm.

Permute-By-Sorting(employee_list):
回复 支持 反对

使用道具 举报

 楼主| kobe24 发表于 2015-12-16 07:31:06 | 显示全部楼层
kobe24 发表于 2015-12-16 07:28
at the beginning, the one I wrote has high complexity as follows:. 1point 3acres 璁哄潧

def sample(N, M):

Permute-By-Sorting(employee_list):
   n = employee_list.length;
  let P[1...n] be a new array. visit 1point3acres.com for more.
  for i = 1 to n
     P = RANDOM(1,n^{3});
  sorting employee_list using P as sort keys.
.1point3acres缃
The algorithm is in "Introduction to Algorithm" Book, chapter 5.3 randomized algorithm. you can take a look at it.
回复 支持 反对

使用道具 举报

神罗天征 发表于 2015-12-16 07:36:18 | 显示全部楼层
kobe24 发表于 2015-12-16 07:31
Permute-By-Sorting(employee_list):
   n = employee_list.length;. from: 1point3acres.com/bbs
  let P[1...n] be a new array

十分感谢!

补充内容 (2015-12-16 07:37):
祝楼主offer~
回复 支持 反对

使用道具 举报

阿骄 发表于 2016-1-1 12:42:28 | 显示全部楼层
感谢楼主!请问第一题是 MapReduce 的解法(因为 huge file)还是干算?
回复 支持 反对

使用道具 举报

 楼主| kobe24 发表于 2016-1-1 22:04:09 | 显示全部楼层
一亩三分地严打"顶""好贴""收藏了"之类的垃圾回复帖!被警告三次,系统会自动封杀ID!

想支持楼主,请点击帖子下方的"好苗""分享""收藏"键,酌情给楼主加大米(系统不扣你自己的分)。
积分不够看不了帖子,请参考论坛导航里的"帮助","新手提纲"里有攒积分指南
回复 支持 反对

使用道具 举报

 楼主| kobe24 发表于 2016-1-1 22:05:10 | 显示全部楼层
I didn't use map-reduce framework. don't save data into array. otherwise it will cost lots of memory
回复 支持 反对

使用道具 举报

本版积分规则

请点这里访问我们的新网站:一亩三分地Instant.

Instant搜索更强大,不扣积分,内容组织的更好更整洁!目前仍在beta版本,努力完善中!反馈请点这里

关闭

一亩三分地推荐上一条 /5 下一条

手机版|小黑屋|一亩三分地论坛声明 ( 沪ICP备11015994号 )

custom counter

GMT+8, 2016-12-11 19:50

Powered by Discuz! X3

© 2001-2013 Comsenz Inc. Design By HUXTeam

快速回复 返回顶部 返回列表