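Their recruiter had been pinging me on Hired about an engineering manager role. I have been a tech lead for a long time and wanted to try a different track, so I said yes.
The first round (the phone screen) was an online CodeSignal assessment: four problems with very long descriptions. The first two were very easy and passed on the first submission; the third took some time, which left only 10 minutes for the fourth. I ended up with only 715 points. I complained to HR that the IDE was too hard to use and that I could probably score over 1,000 on a retry, but to my surprise they passed me anyway and scheduled the VO (virtual onsite).
Question 1, system design. The prompt: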
Imagine you are now a part of the Data Engineering team at Company. The Fraud Detection team is looking for better, faster, more accurate insights and analytics for identifying fraudulent transactions.
Operational:
The analysts want to create weekly reports on the number of transactions that are suspected to be unauthorized based on two criteria:
1. The transaction amount is 3 times greater than the average transaction amount (assume a one-day window) made by the customer in the same category, excluding the suspected fraudulent transaction
2. The transaction was made outside a 100-mile radius of the customer’s address
The data the team is already capturing can be found in Schemas. Using the context above, let’s discuss the operational design of the system.
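The two criteria above can be sketched as a small check. This is only an illustration under stated assumptions: the transaction schema in this post is truncated, so the `amount`, `lat`, and `lon` fields are hypothetical, "3 times greater" is read as "more than 3x the average", and the straight-line (great-circle) distance is used for the 100-mile radius.

```python
import math
from statistics import mean

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def suspicion_flags(txn, other_amounts, home):
    """Return (amount_flag, distance_flag) for one transaction.

    txn           -- dict with hypothetical keys 'amount', 'lat', 'lon'
    other_amounts -- the customer's other same-category amounts in the
                     one-day window (the transaction under test excluded)
    home          -- (lat, lon) of the customer's address
    """
    # Criterion 1: amount is more than 3x the average of the other transactions.
    amount_flag = bool(other_amounts) and txn["amount"] > 3 * mean(other_amounts)
    # Criterion 2: transaction happened more than 100 miles from home.
    distance_flag = haversine_miles(home[0], home[1], txn["lat"], txn["lon"]) > 100
    return amount_flag, distance_flag
```

The two flags are returned separately because the prompt does not say whether the criteria are combined with AND or OR; the weekly report can count either.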
As an old hand at this, it was hardly a challenge. I gave the following design:
DESIGN:
1. real time response needed; assume SLA is less than 3 seconds;
2. notification channel for the customer and a way to provide feedback;
3. request scale, (regional first, then, global);
4. data volume, 100 million transactions (1kB per transaction) per day (100GB/day); (10 million customers);
5. cost concern of data systems (infra/platform);
6. observability of this fraud detection system (APM, monitoring, tuning, ML/DS usage);
7. CI/CD of this fraud detection system;
8. Must be HA, scalable, and highly reliable (expect after 3-5 years, this system can still handle the volume);
9. Development POCs (buy an existing solution (support/cost), or build with our own teams) (using AWS); TSYS
10ms 10ms 10ms 100 ms 10ms
pos -> Gateway -> MSK/Kafka -> Notification Service (k8s microservices)-> Redis/MemcacheD Query system/real time DB(DynamoDB for TSYS) -> Kafka/MSK/SQS ->
10ms
Notification Service -> customer terminal/Write to TSYS of acknowledged
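A quick back-of-the-envelope check of the numbers in this design, as a sketch. It uses only figures from the notes above (100 million transactions per day, 1 kB each, the six per-hop latencies in the diagram, and the assumed 3-second SLA); the function names are made up for illustration, and the average rate ignores peak traffic.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_gb(txns_per_day, bytes_per_txn):
    """Raw ingest per day in gigabytes (decimal, 1 GB = 1e9 bytes)."""
    return txns_per_day * bytes_per_txn / 1e9


def average_tps(txns_per_day):
    """Average transactions per second, assuming a perfectly flat load."""
    return txns_per_day / SECONDS_PER_DAY


# Per-hop latencies in milliseconds, as written in the pipeline diagram.
HOP_LATENCIES_MS = [10, 10, 10, 100, 10, 10]
SLA_MS = 3_000  # the "less than 3 seconds" SLA assumed in the design

volume_gb = daily_volume_gb(100_000_000, 1_000)
tps = average_tps(100_000_000)
end_to_end_ms = sum(HOP_LATENCIES_MS)

print(f"volume: {volume_gb:.0f} GB/day, average load: {tps:.0f} TPS")
print(f"latency budget used: {end_to_end_ms} ms of {SLA_MS} ms")
```

The raw volume comes out to about 100 GB per day, and the listed hops use 150 ms of the 3-second budget, so most of the SLA is headroom for retries, queueing, and peak load.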
Reporting:
1. how many good/bad transactions;
2. metrics based;
3. audience of the reports is developers and business users inside Capital One;
DynamoDB -> ELK -> Splunk/Grafana Mimir/prometheus -> Dashboards -> Configurable reports
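The interviewer was quite satisfied in the end. He was Indian, spoke fluent English with no noticeable accent, and seemed to be a lead SDE. There was one more part that we had no time to go into in detail, so we skipped it.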
Analytical:
While the reports have a weekly rollout to cross-reference any inbound fraud claims as well as proactively notify customers of these charges as soon as they are detected, the fraudulent detection team still requires the reports to be highly available so that they can get live data anytime. Let’s discuss how you would continue to design a data architecture that is able to quickly detect fraudulent transactions from an inflow of 100 million transactions per day.
Schemas
Customers
Description: This is a list of all customers and their information.
Schema:
customer_id [string]: unique identifier to each customer
customer_first_name [string]: customer’s first name
customer_last_name [string]: customer’s last name
customer_address [string]: customer’s full address
customer_coordinates [string]: longitude-latitude pair of customer’s address
tax_id_number [string]: customer’s tax identification number
Transactions
Description: This is a list of all transactions made to company.
Schema:
transaction_id [string]: unique identifier of transaction
account_number [string]: account number in which the transaction co[...]
[...]visioned with VCPU cores, 2GB/second network and in the same VPC and security group;
3. Instance prices are fixed in the analysis period;
4. Autoscaling is not turned on;
5. There are 2 instance types;
6. 2 zones;
7. All jobs are batch jobs with no hard deadline that run within one day, so we can let them share the cluster with M32C8 instead of creating many M42C4 instances.
US East
M32C4 for 15 mins is 1 dollar, 4 dollars / hour
M32C8 for 45 mins is 1.75 dollar, 15 mins will be 0.58 , 7 dollars / hour
US West
M32C8 for 1 hour is 7 dollars
M32C4 for 1 hour is 4 dollars
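To compare the two instance types, a small sketch that works only from the hourly prices in the notes (4 dollars for M32C4, 7 dollars for M32C8, the same in both regions). The core counts are an assumption: the code reads the trailing "C4"/"C8" in the instance names as 4 and 8 vCPUs, which the post does not state.

```python
def run_cost(hourly_rate, minutes):
    """Cost in dollars of running one instance for `minutes` at `hourly_rate`."""
    return hourly_rate * minutes / 60


def cost_per_core_hour(hourly_rate, cores):
    """Hourly price divided by the number of vCPUs."""
    return hourly_rate / cores


# Hourly prices in dollars, taken from the notes.
PRICES = {"M32C4": 4.0, "M32C8": 7.0}
# ASSUMPTION: "C4"/"C8" in the instance name is the vCPU count.
CORES = {"M32C4": 4, "M32C8": 8}

for name, rate in PRICES.items():
    per_core = cost_per_core_hour(rate, CORES[name])
    print(f"{name}: {rate} $/hour, {per_core} $/core-hour")
```

Under that assumption the larger instance is cheaper per core-hour, which fits the idea of letting batch jobs share bigger instances instead of starting many small ones.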
Basically, you are supposed to infer from this data how the application runs. The interviewer gave quite a few hints, but I don't think I did very well.
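Round 4 was with a lead SDE who, surprisingly, asked a DE (data engineering) question. Baffling. He said that even managers are expected to spend 40% of their time coding; I said no problem, since I already spend at least 30% of my time coding. The task: given a website's API whose GET returns JSON with currency conversion data, fetch one month of data and save it to a CSV file. It was on CodeSignal again, and that platform is really awful. In the end I wrote pseudocode and never ran it, and I spent some time looking up the pandas library online along the way. The question is not hard; it probably just tests fluency. I think I may have failed here: I was careless and had forgotten a lot of the library calls. I need to prepare properly, otherwise it is a wasted opportunity. With a full-time job there is not much time to grind problems. Rough.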
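For anyone preparing for the same question, here is a minimal sketch of the task using only the Python standard library rather than pandas. The real API from the interview is not given in this post, so the URL, the query parameters, and the JSON shape (`{"rates": {"EUR": 0.92, ...}}`) are all hypothetical placeholders.

```python
import csv
import json
from datetime import date, timedelta
from urllib.request import urlopen

# HYPOTHETICAL endpoint and parameters; replace with the API you are given.
API_URL = "https://api.example.com/rates?date={day}&base=USD"


def fetch_rates(day):
    """GET one day's rates. ASSUMES a response like {"rates": {"EUR": 0.92}}."""
    with urlopen(API_URL.format(day=day.isoformat()), timeout=10) as resp:
        return json.load(resp)["rates"]


def days_in_month(year, month):
    """Yield every calendar day of the given month."""
    day = date(year, month, 1)
    while day.month == month:
        yield day
        day += timedelta(days=1)


def write_month_csv(year, month, out, fetch=fetch_rates):
    """Write one row per (date, currency) for the whole month to `out`."""
    writer = csv.writer(out)
    writer.writerow(["date", "currency", "rate"])
    for day in days_in_month(year, month):
        for currency, rate in sorted(fetch(day).items()):
            writer.writerow([day.isoformat(), currency, rate])
```

To run it against a real endpoint, open the output with `open("rates.csv", "w", newline="")` and pass the file object as `out`. The `fetch` parameter exists so the loop can be exercised with a fake function when the platform has no network access.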
Round 5 was behavioral again, mostly about how you handle technical difficulties; what they mainly look for is problem, action, and result. No issues in this round.