2025 Meta Data Engineer Onsite

andy3545

Last edited by andy3545 on 2025-2-13 12:23

Background:

I am writing this in English because I moved to Canada from China at a young age and only have a third-grade level of Chinese proficiency.

4 years of experience in Data Engineering
Graduated from a Canadian University
Currently working at a ride-sharing company
Prepared around 25 hours total (on weekends and after work)

Timeline:

10/24/2024: Referral via a friend
11/1/2024: HR Call
12/5/2024: Technical Phone Screen
1/31/2025: Onsite Day 1
2/5/2025: Onsite Day 2
2/11/2025: Verbal Offer for E4

Technical Phone Screen:

I have a detailed post on this here:
SQL topics: GROUP BY/HAVING, sub-queries, SUM(CASE WHEN ...), self-joins.
If you work with SQL regularly, this should be straightforward. I was able to complete all 5 questions.
Python topics: focus on fundamentals (lists, strings, tuples, dictionaries).
The challenge isn't the questions themselves but the speed required: aim for at most 8 minutes per question.
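To practice the SQL patterns listed above locally, here is a minimal sketch using Python's built-in sqlite3 module. The user_actions table, its columns, and its rows are hypothetical and were made up for illustration. They do not come from the interview. Each query exercises one of the listed topics.

```python
import sqlite3

# Hypothetical table and rows, made up only to exercise each SQL pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_actions (user_id TEXT, action TEXT, ds TEXT);
INSERT INTO user_actions VALUES
    ('u1', 'post',  '2025-01-01'),
    ('u1', 'share', '2025-01-01'),
    ('u1', 'share', '2025-01-02'),
    ('u2', 'post',  '2025-01-01'),
    ('u3', 'share', '2025-01-02'),
    ('u3', 'share', '2025-01-03');
""")

# SUM(CASE WHEN ...) pivots action types into columns; HAVING filters whole groups.
conditional_agg = conn.execute("""
    SELECT user_id,
           SUM(CASE WHEN action = 'post'  THEN 1 ELSE 0 END) AS posts,
           SUM(CASE WHEN action = 'share' THEN 1 ELSE 0 END) AS shares
    FROM user_actions
    GROUP BY user_id
    HAVING SUM(CASE WHEN action = 'share' THEN 1 ELSE 0 END) >= 2
    ORDER BY user_id
""").fetchall()

# Scalar sub-query: users who acted on the most recent day in the table.
latest_day_users = conn.execute("""
    SELECT DISTINCT user_id
    FROM user_actions
    WHERE ds = (SELECT MAX(ds) FROM user_actions)
""").fetchall()

# Self-join: users active on two consecutive days.
consecutive_day_users = conn.execute("""
    SELECT DISTINCT a.user_id
    FROM user_actions AS a
    JOIN user_actions AS b
      ON a.user_id = b.user_id
     AND b.ds = DATE(a.ds, '+1 day')
    ORDER BY a.user_id
""").fetchall()

print(conditional_agg)        # [('u1', 1, 2), ('u3', 0, 2)]
print(latest_day_users)       # [('u3',)]
print(consecutive_day_users)  # [('u1',), ('u3',)]
```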

Onsite Interviews:

Very similar to this.

Behavioral Round (Indian Interviewer)

Interviewed by an Indian manager, very friendly.
Be prepared for follow-up questions.
… by reading from a daily aggregated table instead of the raw fact table.

Best of luck. Go for it (加油)!!

Addendum (2025-02-14 22:25 +08:00):

Data Modeling: Handle posts shared 10,000+ times (Hint: Use an array to store shares).

Standard newsfeed data model, but the follow-up question was: "How does your data model handle multiple layers of sharing, and efficiently count how many shares each post has, who the original poster is, and when it was originally posted?"

For example:
user_1 shares a post, then user_2 sees user_1's post and shares it again; this counts as 2 layers of sharing.
Imagine there are 1000+ layers.

Addendum (2025-05-28 03:24 +08:00):

I don't check this site anymore. If you have questions, send me a message on LinkedIn.

andy3545

ellen19930626 posted on 2025-2-18 16:37
Thanks, OP! What does the input for the Python question look like? Are the movies and categories in one list, like [哪吒, cartoon, 唐探, comedy]? ...

If I remember correctly, it's something like this:

ratings = {movie_1: 3.5, movie_2: 4, movie_3: 3, movie_1:4, movie_3: 2 .....}
categories = {movie_1: horror, movie_2: comedy, movie_3: action ......}

You need to find the 3 movies with the highest average rating in each category (horror, comedy, action).
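Below is a minimal sketch of one way to solve this. A Python dict cannot hold the repeated keys shown above (movie_1 appears twice), so the sketch assumes the ratings arrive as a list of (movie, rating) pairs. The function name top_k_per_category and the sample data are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_per_category(ratings, categories, k=3):
    """Return {category: [movie, ...]} with up to k movies per category, best average first.

    ratings: iterable of (movie, rating) pairs; a movie may appear many times.
    categories: dict mapping movie -> category.
    """
    totals = defaultdict(float)  # movie -> sum of its ratings
    counts = defaultdict(int)    # movie -> number of ratings
    for movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1

    # Group (average, movie) pairs by category; skip movies with no known category.
    by_category = defaultdict(list)
    for movie, total in totals.items():
        category = categories.get(movie)
        if category is not None:
            by_category[category].append((total / counts[movie], movie))

    # heapq.nlargest keeps only the k best per category: O(n log k) rather than a full sort.
    return {
        category: [movie for _, movie in heapq.nlargest(k, pairs)]
        for category, pairs in by_category.items()
    }


ratings = [("movie_1", 3.5), ("movie_2", 4), ("movie_3", 3), ("movie_1", 4),
           ("movie_3", 2), ("movie_4", 5), ("movie_5", 1), ("movie_6", 4.5),
           ("movie_7", 2.5)]
categories = {"movie_1": "horror", "movie_2": "comedy", "movie_3": "action",
              "movie_4": "horror", "movie_5": "horror", "movie_6": "horror",
              "movie_7": "comedy"}

result = top_k_per_category(ratings, categories)
print(result)
# → {'horror': ['movie_4', 'movie_6', 'movie_1'], 'comedy': ['movie_2', 'movie_7'], 'action': ['movie_3']}
```

The sketch aggregates sums and counts in a single pass before dividing, so each movie's average is computed once. When two averages tie, nlargest falls back to comparing movie names. If the interviewer cares how ties are broken, that choice is worth stating out loud.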

andy3545

小亩_f59dab3 posted on 2025-2-14 22:20
Thank you! Very helpful. Already gave you rice (forum points).

Quick question: for the SQL part, is it based on the pre ...

Schema will be provided by the interviewer.

andy3545

Last edited by andy3545 on 2025-2-14 09:16

sevensail posted on 2025-2-13 21:22
this is very helpful :)

qq about the machine learning question you mentioned in full stack2, is ...

1. It was a machine learning-themed Python question. It wasn't hard, just a lot of information and parameters to understand. But don't worry: it was only a bonus question, so skipping it wouldn't have affected your score or performance. The interviewer even said, "We have a lot of time left, do you want to attempt this bonus question for fun? Don't worry about finishing it."

2.
dim_posts
- post_id
- author_id
- posted_at
- shared_by [user_1, user_2, user_3]

Every time the post is re-posted/shared, update the shared_by array column. This way you can easily find the original poster even if there are 1000+ layers of sharing. I'm not saying this is the correct solution, but this is how I approached it when the interviewer asked how I'd handle multiple layers of sharing.
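Here is a minimal in-memory sketch of the shared_by array idea, in Python. The post does not say how a share of a share finds its original, so the sketch assumes one mechanism. Each share remembers which original post it resolves to in a hypothetical share_root map, so even a 1000-layer chain reaches the original in one lookup. The names DimPost, create_post, share, and share_root are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DimPost:
    """One row of dim_posts; only original posts get a row in this sketch."""
    post_id: str
    author_id: str
    posted_at: str
    shared_by: list = field(default_factory=list)  # users who shared, in share order


posts: dict = {}       # post_id -> DimPost
share_root: dict = {}  # hypothetical: share_id -> post_id of the original post


def create_post(post_id, author_id, posted_at):
    posts[post_id] = DimPost(post_id, author_id, posted_at)


def share(share_id, user_id, shared_item_id):
    """Record a share of an original post or of another share; return the original post_id."""
    # A share of a share inherits its root, so resolution never walks the chain.
    root_id = share_root.get(shared_item_id, shared_item_id)
    share_root[share_id] = root_id
    posts[root_id].shared_by.append(user_id)  # append to the original's array
    return root_id


create_post("p1", "user_0", "2025-01-01 09:00")
share("s1", "user_1", "p1")  # layer 1
share("s2", "user_2", "s1")  # layer 2: user_2 re-shares user_1's share
share("s3", "user_3", "s2")  # layer 3

original = posts[share_root["s3"]]
print(original.author_id, original.posted_at, len(original.shared_by))
# → user_0 2025-01-01 09:00 3
```

With this layout, the share count is len(shared_by) and the original poster and time sit on the same row. The cost is that the array grows with every share, which is what the "10,000+ shares" follow-up seems to be probing.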

haotongli

🙏

sevensail

this is very helpful :)

Quick question about the machine learning question you mentioned in full stack2: is it a coding question, or something else?

Thanks for sharing!!

Addendum (2025-02-14 14:29 +08:00):

Also, about "Handle posts shared 10,000+ times (Hint: Use an array to store shares)": what do you mean by using an array to store shares? A calculated column that stores how many times this parent post has been shared?

sevensail

Last edited by sevensail on 2025-2-14 11:05

Thanks for sharing!

A second approach could be to add a parent_post_id column next to each post_id; if it's the original post, we can set it to the same value as post_id or to NULL. Then we can easily track the share count with a GROUP BY on parent_post_id, and self-join to the post_dim table to find the original poster and time. But I am not sure which approach is better for large-scale data in practice...
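Here is a sketch of the parent_post_id variant, run through Python's sqlite3. It assumes parent_post_id points at the immediate parent, meaning the post that was re-shared, which is the multi-layer case from the interview. A recursive CTE walks each chain back to its original post, and a self-join on post_dim pulls the original poster and time. The table contents are made up. If parent_post_id instead stored the original post directly, the recursive step would be unnecessary, and a plain GROUP BY plus one self-join would be enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post_dim (
    post_id        TEXT PRIMARY KEY,
    author_id      TEXT NOT NULL,
    posted_at      TEXT NOT NULL,
    parent_post_id TEXT  -- NULL for an original post; otherwise the post that was re-shared
);
INSERT INTO post_dim VALUES
    ('p1', 'user_0', '2025-01-01 09:00', NULL),
    ('s1', 'user_1', '2025-01-01 10:00', 'p1'),
    ('s2', 'user_2', '2025-01-01 11:00', 's1'),  -- second layer: a share of a share
    ('s3', 'user_3', '2025-01-01 12:00', 'p1'),
    ('p2', 'user_4', '2025-01-02 08:00', NULL);
""")

rows = conn.execute("""
WITH RECURSIVE lineage(post_id, root_id) AS (
    -- anchor: every original post is its own root
    SELECT post_id, post_id FROM post_dim WHERE parent_post_id IS NULL
    UNION ALL
    -- each iteration walks one sharing layer further down
    SELECT c.post_id, l.root_id
    FROM post_dim AS c
    JOIN lineage AS l ON c.parent_post_id = l.post_id
)
SELECT o.post_id      AS original_post_id,
       o.author_id    AS original_poster,
       o.posted_at    AS original_posted_at,
       COUNT(*) - 1   AS share_count  -- subtract the original post's own row
FROM lineage AS l
JOIN post_dim AS o ON o.post_id = l.root_id  -- self-join back to the original
GROUP BY o.post_id, o.author_id, o.posted_at
ORDER BY o.post_id
""").fetchall()

print(rows)
# → [('p1', 'user_0', '2025-01-01 09:00', 3), ('p2', 'user_4', '2025-01-02 08:00', 0)]
```

The recursion does one join per layer, so a 1000-layer chain means up to 1000 iterations at query time. Storing the root post id at write time avoids this, at the cost of resolving the root whenever a share is recorded.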

haotongli

thanks bro

小亩_f59dab3

Thank you! Very helpful. Already gave you rice (forum points).

Quick question: for the SQL part, is it based on the schema you created earlier, or will the schema be provided?

ellen19930626

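Thanks, OP! What does the input for the Python question look like? Are the movies and categories in one list, like [哪吒, cartoon, 唐探, comedy]? Or is there one list for movies and a separate one for categories?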

Python (2 Questions): Given a list of movies and categories, map movies to categories and return top 3 movies per category.
