This post was last edited by Anonymous on 2024-4-1 20:57
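OP here: master's degree, 7 years of experience, senior MLE at a fifth-tier company for a little over a year. The work is relaxed, but the tech is fairly outdated, so about five months ago I started casually looking for a new job.
I interviewed with a bunch of companies, including Meta, Uber, Netflix, and TikTok, and got 0 offers. Many of those rejections came in the system design round. I plan to settle down and read DDIA carefully to strengthen my fundamentals.
I just finished Chapter 1 and took some notes, which I'm sharing here for anyone who needs them.
Do you find this kind of sharing helpful? If you have a better format in mind, please leave a comment. If it's useful, I'll keep posting.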
Chapter 1: Reliable, Scalable, and Maintainable Applications
Building Blocks for Data-Intensive Applications
Databases
Store data so that it can be found again later
Caches
Remember the result of an expensive operation to speed up reads
Search Indexes
Allow users to search data by keywords or filter it in various ways
Stream Processing
Send a message to another process, to be handled asynchronously
Batch Processing
Periodically crunch a large amount of accumulated data
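To make the cache building block concrete, here is a minimal read-through cache sketch in Python. `ReadThroughCache` and `slow_square` are hypothetical names; a real cache such as Memcached or Redis also needs eviction, expiry, and invalidation, which this sketch leaves out. The hit/miss counters connect to the cache hit rate, one of the load parameters under Scalability.

```python
from typing import Callable, Dict, Hashable


class ReadThroughCache:
    """Hypothetical read-through cache: on a miss, compute the value and remember it."""

    def __init__(self, loader: Callable[[Hashable], object]) -> None:
        self._loader = loader          # the expensive operation being cached
        self._store: Dict[Hashable, object] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._loader(key)      # pay the expensive cost only on a miss
        self._store[key] = value
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_square(n: int) -> int:
    # Stand-in for an expensive computation or a slow database query.
    return n * n


cache = ReadThroughCache(slow_square)
for key in [2, 3, 2, 2, 3]:
    cache.get(key)
print(cache.hits, cache.misses, cache.hit_rate())  # 3 2 0.6
```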
Reliability
Definition
Performing the functions users expect
Tolerating user mistakes or unexpected usage
Performing well enough under the expected load and data volume
Preventing unauthorized access and abuse
Fault vs Failure
Fault: One component deviating from its specification
Failure: The system as a whole stops providing the required service to the user
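Fault tolerance means keeping faults from turning into failures. A minimal sketch, assuming redundant replicas that can each serve the same request: one replica's fault is absorbed by trying the next one, and only when every replica faults does the whole service fail. `call_with_failover`, `ServiceFailure`, and the replica functions are hypothetical.

```python
from typing import Callable, Sequence


class ServiceFailure(Exception):
    """Raised only when every replica has faulted: a failure of the whole service."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    faults = []
    for replica in replicas:
        try:
            return replica(request)      # first healthy replica answers
        except Exception as exc:         # a fault in one component...
            faults.append(exc)           # ...is tolerated; try the next replica
    raise ServiceFailure(f"all {len(replicas)} replicas faulted: {faults}")


def broken_replica(request: str) -> str:
    raise ConnectionError("disk died")   # simulated hardware fault


def healthy_replica(request: str) -> str:
    return f"ok: {request}"


print(call_with_failover([broken_replica, healthy_replica], "GET /user/42"))
# ok: GET /user/42
```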
Achieving Reliability
Think through assumptions and interactions in the system
Testing at all levels (unit tests, integration tests, manual tests)
Process isolation
Allow processes to crash and restart
Measuring and monitoring system behaviors
Analyzing system behaviors in production
Well-designed abstractions, APIs, and admin interfaces to discourage mistakes
Decouple the places where people make the most mistakes from the places where they can cause failures (e.g., sandbox environments)
Allow quick and easy recovery
Roll back config changes quickly
Roll out new changes gradually
Provide tools to recompute data
Good management practices and training
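One common way to roll out changes gradually is a percentage rollout keyed on a stable hash of the user ID: each user consistently lands in or out of the new code path, and raising the percentage only adds users. A minimal sketch; `in_rollout` and the feature name `new_ranker` are hypothetical, and real feature-flag systems add targeting rules, kill switches, and persistence on top.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically place a user in a 0-99 bucket per feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # stable: same user, same bucket
    return bucket < percent


users = [f"user{i}" for i in range(10_000)]
for pct in (1, 10, 50, 100):
    enabled = sum(in_rollout(u, "new_ranker", pct) for u in users)
    print(f"{pct:>3}% target -> {enabled / len(users):.1%} of users enabled")
```

Because buckets never change, rolling back is just lowering the percentage, and a user enabled at 10% stays enabled at 50%.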
Scalability
Load Parameters
Requests per second (RPS) to the server
Ratio of reads to writes
Simultaneously active users in the chat room
Hit rate on the cache
Distribution of followers per user (e.g., some Twitter users have over 30M followers, so with fan-out on write a single tweet from them results in 30M+ writes to home timelines)
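The fan-out trade-off can be sketched as a toy model of the book's Twitter home-timeline example: either merge followed users' tweets at read time (fan-out on read), or push each new tweet into every follower's cached timeline at write time (fan-out on write). This is an in-memory illustration with hypothetical class and method names, not Twitter's implementation. With fan-out on write, the cost of one post grows with the author's follower count.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, str, str]  # (timestamp, author, text)


class Timelines:
    """Toy model contrasting fan-out on read vs. fan-out on write."""

    def __init__(self) -> None:
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.tweets: Dict[str, List[Tweet]] = defaultdict(list)
        self.home_cache: Dict[str, List[Tweet]] = defaultdict(list)

    def follow(self, follower: str, followee: str) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def post(self, ts: int, author: str, text: str) -> int:
        tweet = (ts, author, text)
        self.tweets[author].append(tweet)
        # Fan-out on write: one cache insert per follower.
        for f in self.followers[author]:
            self.home_cache[f].append(tweet)
        return len(self.followers[author])  # cache writes caused by this post

    def home_on_read(self, user: str) -> List[Tweet]:
        # Fan-out on read: gather and merge tweets from everyone the user follows.
        merged = [t for u in self.following[user] for t in self.tweets[u]]
        return sorted(merged, reverse=True)

    def home_from_cache(self, user: str) -> List[Tweet]:
        return sorted(self.home_cache[user], reverse=True)


tl = Timelines()
for i in range(1000):
    tl.follow(f"fan{i}", "celebrity")
tl.follow("fan0", "friend")
print(tl.post(1, "celebrity", "hello"))  # 1000
print(tl.post(2, "friend", "hi"))        # 1
print(tl.home_on_read("fan0") == tl.home_from_cache("fan0"))  # True
```

In the book, Twitter ends up with a hybrid: most tweets are fanned out on write, while tweets from users with very many followers are fetched and merged at read time.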
Describing Performance
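Batch processing: throughput (records processed per second, or total time to run a job on a dataset of a given size)
Online systems: response time (time between a client sending a request and receiving a response)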
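Average (arithmetic mean) response time: a poor indicator of the typical experience, since it doesn't tell you how many users actually saw that delay
Response time percentiles: the median (p50) shows the typical case; high percentiles (p95, p99, p999), i.e., tail latencies, matter because they directly affect user experience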
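A small illustration of why the mean can mislead and percentiles are preferred, using the nearest-rank definition of a percentile (one of several common definitions). The response times are made up for the example.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in ms: mostly fast, with a slow tail.
times_ms = [20] * 90 + [50] * 8 + [900, 1500]
print(f"mean={mean(times_ms):.1f} p50={percentile(times_ms, 50)} "
      f"p95={percentile(times_ms, 95)} p99={percentile(times_ms, 99)}")
# mean=46.0 p50=20 p95=50 p99=900
```

Here 90% of requests took 20 ms, yet the mean (46 ms) matches no request at all, while p99 exposes the slow tail.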
Latency vs Response Time
Response time (what the client sees) = service time + network delays + queueing delays
Latency: the duration that a request is waiting to be handled (latent, awaiting service)
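Queueing delay is often what makes a fast request slow. A minimal sketch, assuming a single server that processes requests one at a time in arrival order and zero network delay: a fast request that arrives behind a slow one has to wait for it (head-of-line blocking), so its response time is dominated by latency rather than service time. The function name and numbers are hypothetical.

```python
from typing import List, Tuple


def fifo_response_times(requests: List[Tuple[float, float]]) -> List[float]:
    """Simulate one server handling requests in arrival order (FIFO).

    requests: (arrival_time, service_time) pairs, sorted by arrival_time.
    Returns response time = queueing delay + service time
    (network delay is assumed to be zero for simplicity).
    """
    server_free_at = 0.0
    result = []
    for arrival, service in requests:
        start = max(arrival, server_free_at)   # wait if the server is busy
        queueing_delay = start - arrival
        server_free_at = start + service
        result.append(queueing_delay + service)
    return result


# One slow request (1000 ms) arrives just before three fast ones (10 ms each).
reqs = [(0.0, 1000.0), (1.0, 10.0), (2.0, 10.0), (3.0, 10.0)]
print(fifo_response_times(reqs))  # [1000.0, 1009.0, 1018.0, 1027.0]
```

Each of the last three requests needs only 10 ms of service, but their clients wait over a second, which is why the book recommends measuring response times on the client side.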
Varying Response Time
Some requests are inherently slower (e.g., they process more data)
Context switch to a background process
Loss of a network packet and TCP retransmission
Garbage collection pause
Page fault forcing a read from disk
Mechanical vibrations in the server rack
Calculating Percentiles
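Naive: keep a list of all response times in a rolling window (e.g., the last 10 minutes) and sort it every minute
Approximation algorithms with minimal CPU and memory cost:
Forward decay
t-digest
HdrHistogram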
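Approximations like HdrHistogram count samples in buckets instead of storing every sample, which also makes data from different machines or time windows easy to combine. DDIA points out that averaging percentiles across machines is mathematically meaningless; the right way to aggregate is to add the histograms. The sketch below is a deliberately simplified fixed-bucket histogram in that spirit, not HdrHistogram's actual algorithm or API; `LatencyHistogram` and the bucket bounds are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


class LatencyHistogram:
    """Simplified bucketed latency histogram (NOT the real HdrHistogram)."""

    def __init__(self, upper_bounds_ms: Sequence[float]) -> None:
        self.bounds: List[float] = sorted(upper_bounds_ms)
        self.counts = [0] * (len(self.bounds) + 1)  # extra bucket for overflow

    def record(self, value_ms: float) -> None:
        # Count the sample in the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value_ms)] += 1

    def merge(self, other: "LatencyHistogram") -> None:
        # Aggregating = adding counts bucket by bucket.
        if self.bounds != other.bounds:
            raise ValueError("bucket layouts must match")
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]

    def percentile(self, p: float) -> float:
        """Upper bound of the bucket holding the p-th percentile (inf if overflow)."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("empty histogram")
        target = max(1, math.ceil(p * total / 100))
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running >= target:
                return self.bounds[i] if i < len(self.bounds) else math.inf
        return math.inf


bounds = [10, 20, 50, 100, 200, 500, 1000, 2000]
host_a, host_b = LatencyHistogram(bounds), LatencyHistogram(bounds)
for v in [12] * 96 + [1500] * 4:   # host A: a few very slow requests
    host_a.record(v)
for v in [8] * 100:                 # host B: all fast
    host_b.record(v)
host_a.merge(host_b)                # combined view across both hosts
print(host_a.percentile(50), host_a.percentile(99))  # 10 2000
```

The trade-off is precision: a percentile is only known to within its bucket, so bucket bounds must be chosen for the resolution you need.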
Achieving Scalability
Mix scaling up (vertical scaling: a more powerful machine) with scaling out (horizontal scaling: distributing load across multiple smaller machines)
Elastic systems: automatically add or remove computing resources as load changes
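For elastic systems, one widely used scaling rule is the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal sketch with hypothetical min/max bounds; real autoscalers also apply tolerance bands and stabilization windows so they don't flap between sizes.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Proportional scaling rule: scale replicas by observed load / target load."""
    if current < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))  # clamp to allowed range


# 4 replicas at 90% average CPU with a 60% target -> scale out to 6.
print(desired_replicas(4, 90, 60))  # 6
# Traffic drops: 6 replicas at 20% CPU -> scale in to 2.
print(desired_replicas(6, 20, 60))  # 2
```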
Factors that shape a scalable architecture (there is no one-size-fits-all design; it is specific to the application)
Volume of reads
Volume of writes
Volume of data to store
Complexity of the data
Response time requirements
Access patterns
Maintainability
Typical Maintenance Work
Fixing bugs
Keeping the system operational
Investigating failures
Adapting it to new platforms
Modifying it for new use cases
Repaying technical debt
Adding new features
Operability: Making Life Easy for Operations
Provide visibility into the runtime behavior with good monitoring
Provide good automation tools
Avoid dependency on individual machines (so machines can be taken down for maintenance while the system keeps running)
Provide good documentation and operational instructions
Provide good default behavior, but also give administrators the freedom to override defaults
Self-heal where appropriate, but also give administrators manual control over the system state when needed
Exhibit predictable behavior, minimizing surprises
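A minimal sketch of "good defaults, but let administrators override them", using a Python dataclass; the setting names are hypothetical. Rejecting unknown override keys is one way to keep behavior predictable: a typo in an override fails loudly instead of being silently ignored.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ServerConfig:
    # Hypothetical settings with sensible defaults.
    max_connections: int = 1000
    request_timeout_s: float = 30.0
    auto_restart: bool = True  # self-healing on by default; admins may turn it off


def load_config(overrides: Dict[str, Any]) -> ServerConfig:
    known = {f.name for f in fields(ServerConfig)}
    unknown = set(overrides) - known
    if unknown:
        # Fail loudly on typos rather than silently ignoring them.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return replace(ServerConfig(), **overrides)


cfg = load_config({"auto_restart": False})  # admin takes over restarts
print(cfg)
# ServerConfig(max_connections=1000, request_timeout_s=30.0, auto_restart=False)
```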
Simplicity
Symptoms of Complexity
Explosion of the state space
Tight coupling of modules
Tangled dependencies
Inconsistent naming and terminology
Hacks aimed at solving performance problems
Special-casing to work around issues elsewhere
Achieving Simplicity
Simplicity does not mean reducing functionality; it means removing accidental complexity
Abstraction: hide implementation details behind a clean facade
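As a small illustration of hiding implementation details behind a clean facade: callers depend only on a narrow `KeyValueStore` interface, so the storage behind it can change without touching them. The interface, class, and function names are hypothetical; `typing.Protocol` is standard Python (3.8+).

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The facade callers see: two operations, no storage details."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """One hidden implementation; could be swapped for a disk- or network-backed one."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def rename_user(store: KeyValueStore, user_id: str, new_name: str) -> None:
    # Business logic written against the abstraction only.
    store.put(f"user:{user_id}:name", new_name)


store = InMemoryStore()
rename_user(store, "42", "Ada")
print(store.get("user:42:name"))  # Ada
```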
Evolvability
Common Changes
New facts are learned
Previously unanticipated use cases emerge
Business priorities change
Users request new features
New platforms replace old platforms
Legal or regulatory requirements change
Growth of the system forces architectural changes