
Airbnb MLE VO: full loop

Anonymous user-RVZSF  2024-11-9 08:53:49 from the app

2024 (Oct-Dec) MachineLearningEng, PhD, full-time @airbnb - applied online - Onsite | 😐 Neutral 😐 Average | Other | currently employed, changing jobs

VO1: BQ (behavioral questions), the usual questions

VO2, system design:
Airbnb has a list of problematic listings surfaced by user complaints, e.g., low quality, poor safety, fraudulent. They have already been removed, but their owners keep creating new listing entries (i.e., relisting) to work around the system and keep their bad business going.
Design a relisting detection system to catch those relistings, so that we can automatically take action, e.g., remove them, hide them in search, or make them unavailable to book.

VO3: Resume and project deep dive

VO4: System design:
Prompt: Your team wants to build a product to surface "family friendly" listings on Airbnb. You are responsible for building the Machine Learning system to identify such listings and how to rank them when
[The rest of this part is hidden on the forum: viewing it requires more than 188 points.]
Example:
Input: [10, 10, 10, 10, 10]
Output: [(5, 10, 0)]
Input: [10, 11, 12, 10, 10]
Output: [(3, 10, 1), (2, 10, 0)]
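
The problem statement for this example is hidden, so the following is only a guess at what it asks. It assumes each output tuple is (run length, start value, step), i.e., the input is compressed greedily into arithmetic runs. The function name `compress_runs` and the handling of a lone trailing element are assumptions, not something stated in the post.

```python
from typing import List, Tuple


def compress_runs(values: List[int]) -> List[Tuple[int, int, int]]:
    """Greedily split values into (length, start, step) arithmetic runs."""
    runs = []
    i, n = 0, len(values)
    while i < n:
        start = values[i]
        # Assumption: a lone trailing element is reported with step 0.
        step = values[i + 1] - start if i + 1 < n else 0
        j = i + 1
        # Extend the run while consecutive differences match the step.
        while j < n and values[j] - values[j - 1] == step:
            j += 1
        runs.append((j - i, start, step))
        i = j
    return runs


print(compress_runs([10, 10, 10, 10, 10]))
print(compress_runs([10, 11, 12, 10, 10]))
```

Under that reading, both examples above are reproduced.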

Asking for lots of rice (forum points), thank you!


Addendum (2024-11-12 11:19 +08:00):
Please add rice, thank you!

Anonymous user-RVZSF  2024-11-9 09:02:55 from the app
Attaching my own (OP's) answer to VO4. During the interview, the interviewer wrote down everything I said.

1.) Data / Labels / Problem Formulation

- classification problem: we want to determine whether a listing is family friendly
- also a ranking problem: we want to prioritize the most family-friendly listings when the filter is applied

Model output: probability of a listing being family friendly

Data / Labels for this problem: five kinds of data sources

1.) Listing data (description of home, amenity, hosting information)
2.) Pricing information / availability
3.) User reviews
4.) Booking history, patterns indicating family stays
5.) User behavior data (ctr, other user actions etc.)

Labels = explicit vs. implicit labels for this problem
explicit labels = amenity tags (family-specific amenities such as cribs or high chairs)
or host-provided information (the host might indicate whether the listing is family friendly)

implicit labels = past reviews might tell whether the listing is family friendly (mentions of children, etc.)
booking patterns = is the listing often booked by families or during school holidays, etc.?

user past booking history, if the guests are identified as a family.


How to create the labels overall?

Use human annotators to label whether a listing is family friendly
Then train a model on those annotations to predict whether other listings are family friendly (semi-supervised)

Later stage: can collect feedback on whether a listing actually was family friendly, etc.
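
A minimal sketch of how the explicit and implicit signals above could be turned into weak labels before human annotation. Everything here is hypothetical: the `Listing` fields, the keyword lists, and the thresholds (2 review mentions, 50% family bookings) are illustrative assumptions, not Airbnb's schema or anything said in the interview.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Assumed keyword lists, for illustration only.
FAMILY_AMENITIES = {"crib", "high chair"}
FAMILY_REVIEW_WORDS = ("kids", "children", "toddler", "family")


@dataclass
class Listing:
    amenities: Set[str] = field(default_factory=set)
    host_says_family_friendly: Optional[bool] = None  # None = host said nothing
    reviews: List[str] = field(default_factory=list)
    family_booking_share: float = 0.0  # share of past bookings by families


def weak_label(listing: Listing) -> Optional[int]:
    """Return 1 / 0 when a signal fires, or None to defer to human annotators."""
    # Explicit labels take priority: host statement, then family amenities.
    if listing.host_says_family_friendly is not None:
        return int(listing.host_says_family_friendly)
    if listing.amenities & FAMILY_AMENITIES:
        return 1
    # Implicit labels: review mentions and booking patterns (assumed thresholds).
    mentions = sum(
        any(word in review.lower() for word in FAMILY_REVIEW_WORDS)
        for review in listing.reviews
    )
    if mentions >= 2 or listing.family_booking_share >= 0.5:
        return 1
    return None
```

Listings that come back as `None` would be the natural candidates to send to annotators first, since no cheap signal covers them.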


2.) What information would you use as features? What types of models would you train?

Features - already talked about this above.

Amenities - family-oriented amenities (high chair) and safety amenities (smoke detectors)
Listing type - prefer an entire home with kitchen, dining room, etc.

Can analyze unstructured data such as the listing description (does it say "kids welcome", etc.?)
Can also extract negative signals, e.g., if a building or home is "adult only"

Location features - low crime rate, safe neighborhood, close to essential services (parks/playgrounds, etc.)

Review features - cleanliness score and safety
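
As a rough illustration of the amenity, listing-type, and description features above, here is a small sketch. The function name, feature names, and regex patterns are made up for the example; a real system would need a much richer phrase list or a learned text model.

```python
import re
from typing import Dict, Iterable

# Assumed phrase patterns, for illustration only.
POSITIVE_PATTERNS = [r"\bkids? welcome\b", r"\bfamily[- ]friendly\b"]
NEGATIVE_PATTERNS = [r"\badults?[- ]only\b", r"\bno (kids|children)\b"]


def listing_features(
    description: str, amenities: Iterable[str], is_entire_home: bool
) -> Dict[str, float]:
    """Turn raw listing fields into a flat numeric feature dict."""
    text = description.lower()
    amenity_set = {a.lower() for a in amenities}
    return {
        "has_high_chair": float("high chair" in amenity_set),
        "has_smoke_detector": float("smoke detector" in amenity_set),
        "has_kitchen": float("kitchen" in amenity_set),
        "is_entire_home": float(is_entire_home),
        # Count how many positive / negative phrase patterns appear.
        "text_positive_hits": float(
            sum(bool(re.search(p, text)) for p in POSITIVE_PATTERNS)
        ),
        "text_negative_hits": float(
            sum(bool(re.search(p, text)) for p in NEGATIVE_PATTERNS)
        ),
    }
```

Keeping negative hits as a separate feature, rather than subtracting them from the positive count, lets the model learn that "adult only" should outweigh other signals.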


Model Strategy:

two types of data: tabular data vs. unstructured data (text)
unstructured data (text) comes from both the user side and the host side

- Start with tree-based models (GBDT, e.g., XGBoost) or linear models (logistic regression)
- very interpretable (+)
- cannot process unstructured data directly (-)

- can use text encoders to convert text into a dense representation (word2vec)
- can combine that numerical representation with the tabular data and use tree-based models
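
The "encode text, then concatenate with tabular features" step can be sketched as below. The toy `vectors` dictionary stands in for pretrained word2vec embeddings, and both function names are hypothetical; the resulting row is what would be fed to a GBDT or logistic regression.

```python
from typing import Dict, List, Sequence


def mean_embedding(
    text: str, vectors: Dict[str, Sequence[float]], dim: int
) -> List[float]:
    """Average the vectors of known tokens; all zeros if none are known."""
    hits = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not hits:
        return [0.0] * dim
    return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]


def build_row(
    tabular: Sequence[float],
    text: str,
    vectors: Dict[str, Sequence[float]],
    dim: int,
) -> List[float]:
    """Concatenate tabular features with the dense text representation."""
    return list(tabular) + mean_embedding(text, vectors, dim)


# Toy 2-dimensional vectors standing in for a real word2vec model.
toy_vectors = {"kids": [1.0, 0.0], "welcome": [0.0, 1.0]}
row = build_row([1.0, 3.0], "Kids welcome here", toy_vectors, dim=2)
print(row)
```

Averaging loses word order, which is one reason nuanced or contradictory text is hard for this baseline to handle.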

- Next step: neural networks

- start with a simple architecture first (multi-layer perceptron, MLP)
- predict the family-friendly probability
- given the probability, we can order the items to induce a ranking

pros: efficient and potentially more powerful than GBDT / linear models
cons: text information is hard to understand/capture, so the model might be limited in its predictive power
      text information is very unstructured, so the model might not truly "understand" nuance
      e.g., if the user says something positive and the host says something negative, how do we balance the contradictory information?
      combining tabular data with text is likely difficult

- Next step: develop a multi-tower architecture with more advanced neural networks

- different kinds of input can each have their own tower
- can create 4-5 towers to process the corresponding features, then combine them at higher layers
- e.g., one tower for user-side text information
- e.g., another tower for host-side text information
- combine the tower outputs using a multi-head attention (MHA) layer to get a holistic view of whether the home is family friendly
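
To make the fusion step concrete, here is a deliberately simplified sketch: a single attention head in plain Python, not a full MHA layer. Each tower is assumed to have already produced an embedding of the same size, and `query` stands in for a vector that would be learned in a real model. All names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    tower_outputs: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse tower embeddings with scaled dot-product attention (one head)."""
    d = len(query)
    # Score each tower against the query, scaled by sqrt(d).
    scores = [
        sum(q * x for q, x in zip(query, tower)) / math.sqrt(d)
        for tower in tower_outputs
    ]
    weights = softmax(scores)
    # Weighted sum of the tower embeddings.
    fused = [
        sum(w * tower[i] for w, tower in zip(weights, tower_outputs))
        for i in range(d)
    ]
    return fused, weights


# Hypothetical embeddings from a guest-text tower, a host-text tower, and a tabular tower.
towers = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
fused, weights = attention_fuse(towers, query=[1.0, 0.0])
```

The attention weights also give a rough view of which tower dominated a given prediction, which helps with the explainability questions that come up later.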

Part 3: Offline and Online Evaluation

Q: What metrics would you look at offline and online to see whether your family-friendly ranking model is performing well?

Definitely have training/validation/test sets.

Want to look at classification metrics such as precision/recall/AUC/F1 score
Also want to look at ranking metrics such as NDCG, MRR, precision @ k
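
For reference, the ranking metrics can be computed as in this sketch. It takes a list of relevance labels already sorted by the model's predicted score. NDCG here uses linear gain (the raw relevance value), which is one common convention; the exponential-gain variant is an equally valid choice. Reciprocal rank is the per-query value that is averaged across queries to get MRR.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """1 / rank of the first relevant result, or 0 if there is none."""
    for i, r in enumerate(relevance):
        if r > 0:
            return 1.0 / (i + 1)
    return 0.0
```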


Online Evaluation:

- set up A/B testing
- control group (current system) vs. treatment group (system with our new model)
- use a canary release: start with 99% control and 1% treatment
- compare the key business metrics between the two groups (conversion rate, etc.)
- then slowly ramp up the traffic in the treatment group
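
A minimal sketch of the two mechanical pieces of this rollout, under stated assumptions: deterministic hash bucketing, so that ramping from 1% upward keeps earlier treatment users in treatment, and a two-proportion z-test on conversion rate. The salt string, function names, and bucket granularity are all illustrative choices.

```python
import hashlib
import math
from typing import Tuple


def assign_group(
    user_id: str, treatment_pct: float, salt: str = "family-friendly-v1"
) -> str:
    """Deterministically assign a user; treatment_pct is in [0, 100]."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # 0..9999, i.e. 0.01% granularity
    return "treatment" if bucket < treatment_pct * 100 else "control"


def two_proportion_z(
    conv_c: int, n_c: int, conv_t: int, n_t: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for treatment vs. control conversion."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    if se == 0:
        return 0.0, 1.0
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

At 1% treatment the treatment sample is small, so the test has little power early on; the canary stage is mainly a safety check, and the business-metric comparison becomes meaningful as traffic ramps up.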


Part 4: Debugging / Explainability

Q: What if your model shows strong improvement in offline metrics, but when you run the A/B test you don't see an improvement in business metrics? How might you debug or iterate?

- might be an issue with model generalization
    - do the offline metrics really align with the business metrics?
    - a high F1 score might not translate into an improvement in business metrics
    - maybe switch from F1 score to precision @ k
- might be an issue with model behavior in production
    - data and feature distribution shift?
    - compare the training distribution vs. the serving-time (A/B test) distribution?
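
One common way to check a single numeric feature for distribution shift is the population stability index (PSI). The sketch below is an illustration rather than what was discussed in the interview: it bins on the training range, and the bin count and epsilon floor are arbitrary choices.

```python
import math
from typing import List, Sequence


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population stability index of `actual` relative to `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def histogram(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Out-of-range values are clamped into the first / last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [i / 100 for i in range(100)]
prod = [0.5 + i / 200 for i in range(100)]  # shifted toward higher values
print(round(psi(train, prod), 3))
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but that threshold is a convention, not a guarantee.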

Anonymous user-CRPNF  2024-11-28 19:51:48 from the app
Thanks for sharing. Good luck to the OP.
Anonymous user-0TPX4  6 days ago
Could you share your general approach to VO2?