Google File System
Source: http://static.googleusercontent.com/media/research.google.com/en/us/archive/gfs-sosp2003.pdf
Published: 2003
Design Constraints
- Component failures are the norm rather than the exception. The file system consists of thousands of commodity parts and is accessed by a comparable number of clients.
- Files are huge by traditional standards. Multi-GB files are common; managing billions of approximately KB-sized files instead would be unwieldy.
- Most files are mutated by appending new data rather than overwriting existing data.
- Co-designing the applications and the file system API benefits the overall system by increasing flexibility.
Architecture
GFS consists of a single master and multiple chunkservers and is accessed by multiple clients.
Class Notes:
Problem in datacenters: failures.
Master: Single point of failure, master has log of operations.
Lease: soft state.
Revoke:
- wait for the lease to expire, or
- contact the server holding the lease
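The two revoke options can be sketched as follows. This is a minimal illustration, not the GFS implementation: the names `Lease` and `revoke` and the `holder_reachable` flag are hypothetical, and the 60-second default follows the initial lease timeout given in the paper.

```python
class Lease:
    """Soft state: the lease lapses on its own unless it is renewed."""

    def __init__(self, holder, granted_at, duration=60.0):
        self.holder = holder
        self.expires_at = granted_at + duration
        self.revoked = False

    def is_valid(self, now):
        return not self.revoked and now < self.expires_at


def revoke(lease, now, holder_reachable):
    """Return the earliest time the lease can be treated as gone.

    If the holder can be contacted, the lease is revoked immediately.
    Otherwise the only safe option is to wait until it expires.
    """
    if holder_reachable:
        lease.revoked = True
        return now
    return max(now, lease.expires_at)
```

Because the lease is soft state, losing contact with the holder never blocks progress forever: the wait is bounded by the lease duration.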
Snapshot:
- revoke lease
- copy-on-write
Snapshots of large files use copy-on-write:
contact the replicas and tell them to make local copies, then update the metadata to point to the individual chunks.
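A rough sketch of the copy-on-write bookkeeping, assuming the metadata keeps a reference count per chunk. `SnapshotMetadata` and its methods are hypothetical names for illustration; the actual chunk copy on the replicas is only represented by allocating a new handle.

```python
class SnapshotMetadata:
    def __init__(self):
        self.files = {}       # path -> list of chunk handles
        self.refcount = {}    # chunk handle -> number of files pointing at it
        self.next_handle = 0

    def _new_handle(self):
        handle = self.next_handle
        self.next_handle += 1
        self.refcount[handle] = 1
        return handle

    def create(self, path, num_chunks):
        self.files[path] = [self._new_handle() for _ in range(num_chunks)]

    def snapshot(self, src, dst):
        # Metadata-only copy: both files share the same chunks for now.
        self.files[dst] = list(self.files[src])
        for handle in self.files[src]:
            self.refcount[handle] += 1

    def prepare_write(self, path, index):
        """Return the handle to write to, copying the chunk first if shared."""
        old = self.files[path][index]
        if self.refcount[old] == 1:
            return old  # not shared, write in place
        # Shared chunk: replicas would copy it locally at this point.
        self.refcount[old] -= 1
        new = self._new_handle()
        self.files[path][index] = new
        return new
```

The snapshot itself touches only metadata; data is copied one chunk at a time, and only for chunks that are written afterwards.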
- consistency handling is distributed to both the applications and the client libraries.
- unique identifier could be a problem.
Chunk location metadata at the master is not persisted; the master asks the chunkservers which chunks they hold.
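A small sketch of how such a table could be rebuilt purely from chunkserver reports, assuming the non-persisted metadata is the chunk locations. `LocationTable`, `report`, and `replicas` are hypothetical names, not GFS APIs.

```python
from collections import defaultdict


class LocationTable:
    """In-memory map of chunk handle -> chunkservers; never written to disk."""

    def __init__(self):
        self.locations = defaultdict(set)

    def report(self, server, handles):
        """Apply one chunkserver's full report of the chunks it holds."""
        # Drop whatever was previously recorded for this server.
        for holders in self.locations.values():
            holders.discard(server)
        for handle in handles:
            self.locations[handle].add(server)

    def replicas(self, handle):
        return sorted(self.locations.get(handle, ()))
```

After a restart the table starts empty and fills in as reports arrive, so the chunkservers remain the authority on what they actually store.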
Appends are ordered:
1) the master grants a lease to one replica, the primary 2) the primary assigns serial numbers to the mutations
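The ordering idea can be sketched like this, assuming one primary that numbers each append and forwards it to the secondaries. The class names are hypothetical, and networking, failures, and retries are left out.

```python
class Replica:
    def __init__(self):
        self.records = []

    def apply(self, serial, data):
        # Refuse anything that is not the next mutation in serial order.
        if serial != len(self.records):
            raise ValueError("mutation applied out of order")
        self.records.append(data)


class Primary(Replica):
    """The lease holder: it alone picks the order of appends."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def append(self, data):
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data)
        for secondary in self.secondaries:
            secondary.apply(serial, data)
        return serial
```

Since only the lease holder hands out serial numbers, every replica applies the same appends in the same order.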
The master does not keep a conventional directory tree: the namespace is a lookup table mapping full pathnames to metadata, kept at the master and not on the chunkservers.
- No hardlinks/softlinks
- absolute paths
- no data structure representing directories, like inodes
- no data structure to enumerate the contents of one directory
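The bullets above describe a namespace with absolute paths and no per-directory structure. A minimal sketch, assuming a plain dictionary keyed by full path; `FlatNamespace` and its methods are hypothetical names used only for illustration.

```python
class FlatNamespace:
    def __init__(self):
        self.table = {}  # absolute path -> metadata

    def create(self, path, metadata=None):
        if not path.startswith("/"):
            raise ValueError("paths must be absolute")
        self.table[path] = metadata if metadata is not None else {}

    def list_dir(self, directory):
        """Enumerate a directory by scanning every path for the prefix."""
        prefix = directory.rstrip("/") + "/"
        children = set()
        for path in self.table:
            if path.startswith(prefix):
                # Keep only the first component below the directory.
                children.add(path[len(prefix):].split("/", 1)[0])
        return sorted(children)
```

Lookup by full path is a single dictionary access, while listing a directory needs a scan over all paths because nothing records its contents.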