Requirements
- Implement a command-line tool that reads a file and outputs its unique lines (a re-implementation of the Unix uniq-style utility). It must run end-to-end in your own IDE / on your own machine.
- A working in-memory solution typically combines a set/hashmap to track seen lines with a queue or simple pass to preserve output order. Getting there is mostly about decomposing the steps and discussing each with the interviewer, not about a hard algorithmic trick.
- Follow-up 1 (file too large for memory): the input file no longer fits in RAM. Discuss an external approach: stream the file, hash lines and partition/spill to disk (e.g. bucket by hash into multiple temp files), then de-duplicate per bucket.
- Follow-up 2: continues in the same direction, asking you to reason about how your chosen language handles file-system access and loading large inputs into memory.
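
The in-memory approach described above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference answer. It assumes "unique lines" means each distinct line is printed once, in order of first appearance; confirm the exact semantics with the interviewer. The names `unique_lines` and `main` are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Sequence


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance.

    Assumption: two lines are equal if they match after the trailing
    newline is removed, so a final line without "\n" is not special.
    """
    seen = set()  # every distinct line seen so far
    for line in lines:
        key = line.rstrip("\n")
        if key not in seen:
            seen.add(key)
            yield key  # emitted during the single pass, so order is preserved


def main(argv: Sequence[str]) -> int:
    """Hypothetical CLI entry point: print the unique lines of argv[1]."""
    if len(argv) != 2:
        print("usage: unique_lines.py <file>", file=sys.stderr)
        return 2
    # Iterating over the file object reads one line at a time.
    with open(argv[1], "r", encoding="utf-8", errors="replace") as handle:
        for line in unique_lines(handle):
            print(line)
    return 0
```

To run it as a script, call `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard. Memory use grows with the number of distinct lines rather than with file size, which is the limitation the first follow-up probes.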
Notes
- There is no provided online coding environment. You write, build, and run locally; the round leans on real file-system I/O, so make sure you can run code on your machine before the interview.
- At the end you package and upload your code (a Google Form has been used for submission). Budget time accordingly.
- Pace yourself for the two follow-ups — several candidates ran out of time on the second one. When preparing, focus on your chosen language's file-system and large-input/memory-loading APIs; that is the direction the follow-ups push.
- Talk through each step with the interviewer and converge on one solution together — the round rewards clear incremental problem-solving over a clever one-liner.
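
For Follow-up 1, the partition-and-spill idea could look roughly like the Python sketch below. It rests on stated assumptions: each bucket file is small enough to de-duplicate in memory, output order follows bucket order rather than input order, and `unique_lines_external` and `num_buckets` are hypothetical names chosen for illustration. Preserving input order would need extra bookkeeping, such as storing each line's original index.

```python
import os
import tempfile
import zlib
from typing import Iterator


def unique_lines_external(path: str, num_buckets: int = 16) -> Iterator[bytes]:
    """Yield each distinct line once without loading the whole file.

    Assumes every bucket fits in memory; output is grouped by bucket,
    so the original line order is NOT preserved.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        bucket_paths = [
            os.path.join(tmp_dir, f"bucket_{i}") for i in range(num_buckets)
        ]
        buckets = [open(p, "wb") for p in bucket_paths]
        try:
            # Pass 1: stream the input and spill each line to the bucket
            # selected by a hash of its content.
            with open(path, "rb") as source:
                for raw in source:
                    line = raw.rstrip(b"\n")
                    index = zlib.crc32(line) % num_buckets
                    buckets[index].write(line + b"\n")
        finally:
            for bucket in buckets:
                bucket.close()

        # Pass 2: identical lines always hash to the same bucket, so each
        # bucket can be de-duplicated on its own with a small set.
        for bucket_path in bucket_paths:
            seen = set()
            with open(bucket_path, "rb") as bucket:
                for raw in bucket:
                    line = raw.rstrip(b"\n")
                    if line not in seen:
                        seen.add(line)
                        yield line
```

Lines are handled as bytes so that no decoding step can fail on unusual input. If one bucket is still too large, the same partitioning can be applied to that bucket again with a different hash; that is one direction the discussion may take.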

