
2025 Meta Data Engineer Onsite

   

2025 (Jan-Mar) DataEng | Bachelor's | Full-time @ Meta - Referral - Onsite | 😃 Positive 😐 Average | Pass | Changing jobs while employed

Last edited by andy3545 on 2025-2-13 12:23

Background:

I am writing this in English because I moved to Canada from China at a young age and only have a third-grade level of Chinese proficiency.
  • 4 years of experience in Data Engineering
  • Graduated from a Canadian university
  • Currently working at a ride-sharing company
  • Prepared around 25 hours total (on weekends and after work)




Timeline:
  • 10/24/2024: Referral via a friend
  • 11/1/2024: HR Call
  • 12/5/2024: Technical Phone Screen
  • 1/31/2025: Onsite Day 1
  • 2/5/2025: Onsite Day 2
  • 2/11/2025: Verbal Offer for E4




Technical Phone Screen:

I have a detailed post on this here:
SQL Topics: GROUP BY HAVING, sub-queries, SUM(CASE WHEN), self-joins
If you work with SQL regularly, this should be straightforward. I was able to complete all 5/5 questions.
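
For anyone who wants to warm up on these SQL topics, here is a minimal sketch using Python's built-in sqlite3 module. The `rides` table, its columns, and the rows are invented for illustration and are not from the interview. It combines SUM(CASE WHEN) with GROUP BY and HAVING.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (driver_id INTEGER, status TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?)",
    [
        (1, "completed", 10.0),
        (1, "cancelled", 0.0),
        (1, "completed", 15.0),
        (2, "completed", 8.0),
        (2, "cancelled", 0.0),
        (2, "cancelled", 0.0),
    ],
)

# SUM(CASE WHEN ...) counts rows matching a condition within each group;
# HAVING filters whole groups after aggregation.
query = """
SELECT driver_id,
       SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed,
       SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled
FROM rides
GROUP BY driver_id
HAVING SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) >= 2
ORDER BY driver_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only drivers with at least two cancelled rides survive the HAVING filter.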
Focus on fundamentals: Lists, Strings, Tuples, Dictionaries.
T
[The rest of this part of the post is hidden on the forum; viewing it requires more than 188 points.]
dataset with billions of records, bonus points for discussing data partitioning strategies and optimizing dashboards by reading from a daily aggregated table instead of the raw fact table.
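
The idea of reading dashboards from a daily aggregated table instead of the raw fact table can be sketched roughly like this. The event layout `(event_date, post_id, views)` and the function names are my own assumptions for illustration, not something given in the interview.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw fact rows: (event_date, post_id, views). Invented for illustration.
raw_events = [
    (date(2025, 1, 1), "post_a", 3),
    (date(2025, 1, 1), "post_a", 2),
    (date(2025, 1, 1), "post_b", 1),
    (date(2025, 1, 2), "post_a", 4),
]

def build_daily_agg(events):
    """Roll raw events up to one row per (event_date, post_id)."""
    agg = defaultdict(int)
    for event_date, post_id, views in events:
        agg[(event_date, post_id)] += views
    return dict(agg)

def views_on(daily_agg, day):
    """Dashboard-style read: scan the small aggregate, not the raw events."""
    return sum(v for (d, _), v in daily_agg.items() if d == day)

daily_agg = build_daily_agg(raw_events)
print(views_on(daily_agg, date(2025, 1, 1)))
```

In a real warehouse the aggregate would typically be built once per day by a batch job and partitioned by date, so a dashboard query only touches the partitions it needs.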



Best of luck. You've got this!!

Addendum (2025-02-14 22:25 +08:00):

Data Modeling: Handle posts shared 10,000+ times (Hint: Use an array to store shares).

Standard newsfeed data model, but the follow-up question was: "How does your data model handle multiple layers of sharing, and efficiently count how many shares each post has and identify the original poster and posted time?"

For example:
user_1 shares a post, then user_2 sees user_1's share and shares it too. This counts as 2 layers of sharing.
Imagine there are 1000+ layers.

OP | andy3545 2025-2-16 04:18:38
小亩_f59dab3 wrote on 2025-2-14 22:20:
Thank you! Very helpful. Already added rice.

Quick question: for the SQL part, is it based on the pre ...

Schema will be provided by the interviewer.

OP | andy3545 2025-2-14 22:14:05
Last edited by andy3545 on 2025-2-14 09:16
sevensail wrote on 2025-2-13 21:22:
this is very helpful :)  

qq about the machine learning question you mentioned in full stack2, is ...

1. It was a machine learning-themed Python question. It wasn't hard, just a lot of information and parameters to understand. But don't worry: it was just a bonus question, so skipping it wouldn't have affected your score or performance. The interviewer even said, "We have a lot of time left, do you want to attempt this bonus question for fun? Don't worry about finishing it."

2.
dim_posts
- post_id
- author_id
- posted_at
- shared_by [user_1, user_2, user_3]

Every time the post is re-posted/shared, update the shared_by array column. This way you can easily find the original poster even if there were 1000+ layers of sharing. I'm not saying this is the correct solution, but this is how I approached it when the interviewer asked how I'd handle multiple layers of sharing.
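
A rough sketch of the shared_by idea in Python, using the dim_posts columns listed above. The `Post` class and `record_share` helper are hypothetical names, and the sketch assumes every re-share event carries the original post_id, which the post does not state.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    post_id: str
    author_id: str
    posted_at: str
    shared_by: list = field(default_factory=list)  # users who shared, at any depth

def record_share(posts, post_id, user_id):
    """Append the sharer to the original post's row, regardless of share depth."""
    posts[post_id].shared_by.append(user_id)

posts = {"p1": Post("p1", "user_0", "2025-01-01T00:00:00")}
record_share(posts, "p1", "user_1")  # layer 1: user_1 shares the original
record_share(posts, "p1", "user_2")  # layer 2: user_2 shares user_1's share

post = posts["p1"]
print(len(post.shared_by), post.author_id, post.posted_at)
```

The share count is just the array length, and the original poster and time sit on the same row, so no traversal of share layers is needed. The trade-off is that the array grows without bound for very popular posts.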

OP | andy3545 2025-2-19 22:33:03
ellen19930626 wrote on 2025-2-18 16:37:
Thanks, OP. What does the input for the Python question look like? Are movie and category in one list? [哪吒, cartoon, 唐探, comedy]? ...

If I remember correctly, it was something like this:

ratings = {movie_1: 3.5, movie_2: 4,  movie_3: 3, movie_1:4, movie_3: 2 .....}
categories = {movie_1: horror,  movie_2: comedy,  movie_3: action ......}

You need to return the 3 movies with the highest average rating in each category (horror, comedy, action).
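
One possible way to solve it, as a sketch rather than the expected answer. Since a Python dict cannot hold the same movie twice, this assumes the ratings arrive as a list of (movie, rating) pairs. The function name and the tie-break by movie name are my own choices.

```python
from collections import defaultdict

def top_movies_per_category(ratings, categories, k=3):
    """Return {category: [(movie, avg_rating), ...]} with the k best averages."""
    totals = defaultdict(lambda: [0.0, 0])  # movie -> [sum of ratings, count]
    for movie, rating in ratings:
        totals[movie][0] += rating
        totals[movie][1] += 1

    by_category = defaultdict(list)
    for movie, (total, count) in totals.items():
        by_category[categories[movie]].append((movie, total / count))

    # Sort by average descending, then by movie name to break ties.
    return {
        cat: sorted(movies, key=lambda m: (-m[1], m[0]))[:k]
        for cat, movies in by_category.items()
    }

# Assumed input shape, modeled on the recollection above.
ratings = [("movie_1", 3.5), ("movie_2", 4), ("movie_3", 3), ("movie_1", 4), ("movie_3", 2)]
categories = {"movie_1": "horror", "movie_2": "comedy", "movie_3": "action"}
result = top_movies_per_category(ratings, categories)
print(result)
```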

haotongli 2025-2-14 07:39:39
🙏

sevensail 2025-2-14 10:22:26 (from app)
this is very helpful :)  

Quick question about the machine learning question you mentioned in full stack 2: is it a coding question or something else?

Thanks for sharing!!

Addendum (2025-02-14 14:29 +08:00):

Also, "Handle posts shared 10,000+ times (Hint: Use an array to store shares)." What do you mean by using an array to store shares? A calculated column that stores how many times the parent post has been shared?

sevensail 2025-2-15 03:01:25
Last edited by sevensail on 2025-2-14 11:05

Thanks for sharing!  

A second approach could be to add a parent_post_id column next to each post_id. For an original post, we can set it to the same value as post_id or to null. Then we can track the share count with a GROUP BY on parent_post_id, and self-join the post_dim table to find the original poster and time. But I am not sure which approach is better for large-scale data in practice...
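
Here is a small sketch of the parent_post_id idea. It assumes parent_post_id points at the immediate parent and is None for an original post, which is only one reading of the description above. The row layout and `find_root` helper are hypothetical.

```python
from collections import Counter

# Hypothetical rows: post_id -> (author_id, posted_at, parent_post_id).
# parent_post_id is None for an original post (an assumption of this sketch).
posts = {
    "p1": ("user_0", "2025-01-01", None),
    "p2": ("user_1", "2025-01-02", "p1"),  # user_1 shares p1
    "p3": ("user_2", "2025-01-03", "p2"),  # user_2 shares user_1's share
}

def find_root(posts, post_id):
    """Follow parent_post_id links until reaching the original post."""
    while posts[post_id][2] is not None:
        post_id = posts[post_id][2]
    return post_id

# Every non-original row is one share, credited to its original post.
share_counts = Counter(
    find_root(posts, pid) for pid, row in posts.items() if row[2] is not None
)

root = find_root(posts, "p3")
print(share_counts[root], posts[root][0], posts[root][1])
```

With 1000+ layers this walk gets expensive. If parent_post_id stored the original post's id directly, a plain GROUP BY plus one self-join would be enough, at the cost of losing the layer-by-layer chain.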

haotongli 2025-2-15 04:21:29
thanks bro

Thank you! Very helpful. Already added rice.

Quick question: for the SQL part, is it based on the schema you created earlier, or will the schema be provided?

ellen19930626 2025-2-19 05:37:24
Thanks, OP. What does the input for the Python question look like? Are movie and category in one list, e.g. [哪吒, cartoon, 唐探, comedy]? Or is there one list for movies and one for categories?

Python (2 Questions): Given a list of movies and categories, map movies to categories and return top 3 movies per category.