File Storage Service System Design
1. Business Requirements
Functional Requirements
- User registration and authentication (users, admins)
- Upload, download, and delete files (documents, images, videos, etc.)
- Organize files into folders/directories
- File sharing (public/private links, user-to-user sharing)
- Versioning and history for files
- Search and filter files (by name, type, date, etc.)
- Alerts/notifications (upload success/failure, quota limits, shared file access)
- Mobile-ready responsive UI and API
- Role-based access control
- API for programmatic file operations
Non-Functional Requirements
- 99.9% availability (max ~8.76 hours downtime/year)
- Scalability to handle large files and high concurrency
- Secure data storage and access control (encryption at rest and in transit)
- Fast response times (<300ms for most requests)
- Audit logging and monitoring
- Backup and disaster recovery
- GDPR/data privacy compliance
- Mobile responsiveness
Out of Scope
- Built-in document editing/collaboration (e.g., Google Docs)
- In-app media playback/preview (unless specified)
- Integration with third-party storage providers
2. Estimation & Back-of-the-Envelope Calculations
- Users: 50,000
- Files: 10M (average file size: 2 MB)
- Daily transactions: ~100,000 (uploads, downloads, deletions, alerts)
- Peak concurrent users: ~2,000
- Data size:
  - Files: 10M × 2 MB = 20 TB
  - Metadata: 10M × 0.5 KB ≈ 5 GB
  - User data: 50,000 × 2 KB ≈ 100 MB
  - Audit logs: 100M entries × 0.2 KB ≈ 20 GB
  - Total DB size: ~25 GB (metadata, user data, and audit logs; excluding backups and file storage)
- Availability:
  - 99.9% = 8.76 hours/year downtime max
  - Use managed DB, multi-AZ deployment, health checks, auto-scaling
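The figures above can be re-derived with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 KB = 1,000 bytes), which is what the rounded totals imply; the variable names are illustrative, and the average-load line is a derived figure rather than one stated in the estimates.

```python
# Back-of-the-envelope sizing from the estimates above.
# Assumption: decimal units (1 KB = 1,000 bytes), matching the rounded totals.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

FILES = 10_000_000
USERS = 50_000
AUDIT_ENTRIES = 100_000_000

file_storage = FILES * 2 * MB        # 2 MB average file size
metadata = FILES * 500               # 0.5 KB per file record
user_data = USERS * 2 * KB           # 2 KB per user
audit_logs = AUDIT_ENTRIES * 200     # 0.2 KB per audit entry

# Everything that lives in the database, audit logs included.
db_total = metadata + user_data + audit_logs

# 99.9% availability leaves 0.1% of the year as the downtime budget.
downtime_hours = (1 - 0.999) * 365 * 24

# Average request rate implied by ~100,000 transactions per day.
avg_tps = 100_000 / (24 * 60 * 60)

print(f"File storage:  {file_storage / TB:.1f} TB")
print(f"Metadata:      {metadata / GB:.1f} GB")
print(f"User data:     {user_data / MB:.0f} MB")
print(f"Audit logs:    {audit_logs / GB:.1f} GB")
print(f"DB total:      {db_total / GB:.1f} GB")
print(f"Downtime/year: {downtime_hours:.2f} hours")
print(f"Average load:  {avg_tps:.2f} transactions/second")
```

The average works out to roughly one transaction per second, so the ~2,000 peak concurrent users figure is likely the more relevant input for capacity planning.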
3. High Level Design (Mermaid Diagrams)
Component Diagram
mermaid
flowchart LR
User["User (Web/Mobile)"]
LB[Load Balancer]
App[Application Server]
DB[(Metadata DB)]
Storage["Object Storage (Files)"]
Cache["Cache (Redis)"]
Alert[Alert/Notification Service]
User --> LB --> App
App --> DB
App --> Storage
App --> Cache
App --> Alert
Data Flow Diagram
mermaid
sequenceDiagram
participant U as User
participant A as App Server
participant D as Metadata DB
participant S as Object Storage
participant C as Cache
participant L as Alert Service
U->>A: Upload File
A->>S: Store File
S-->>A: Success/Fail
A->>D: Create Metadata Record
D-->>A: Success/Fail
A->>L: Send Upload Alert
A-->>U: Response
Key Design Decisions
- Database: Relational DB (e.g., PostgreSQL) for metadata, strong consistency
- Object Storage: For files (e.g., AWS S3, Azure Blob, MinIO)
- Cache: Redis for fast lookups (file metadata, sessions)
- Deployment: Cloud-based, multi-AZ, managed services for high availability
- Alerting/Notifications: Email/SMS/push via third-party service (e.g., Twilio, Firebase)
- API: REST/GraphQL for file operations
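One common way to combine the cache and the metadata DB is the cache-aside pattern: read from the cache first, fall back to the DB on a miss, and invalidate the cached entry on writes. The sketch below is only an illustration under stated assumptions: `TTLCache` and `MetadataStore` are hypothetical names, and plain dictionaries stand in for Redis and PostgreSQL so the example runs on its own.

```python
import time
from typing import Optional


class TTLCache:
    """In-memory stand-in for Redis (hypothetical, for illustration only)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._items = {}  # key -> (expiry time, value)

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._items[key] = (time.monotonic() + self.ttl_seconds, value)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


class MetadataStore:
    """Cache-aside reads; writes go to the DB first, then invalidate the cache."""

    def __init__(self, db: dict, cache: TTLCache) -> None:
        self.db = db          # stand-in for the relational metadata DB
        self.cache = cache
        self.db_reads = 0     # counter that makes cache hits observable

    def get_file(self, file_id: str) -> Optional[dict]:
        key = f"file:{file_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        self.db_reads += 1
        row = self.db.get(file_id)
        if row is not None:
            self.cache.set(key, dict(row))  # cache a copy, not the DB row itself
        return row

    def rename_file(self, file_id: str, new_name: str) -> None:
        self.db[file_id]["name"] = new_name
        self.cache.delete(f"file:{file_id}")  # drop the stale entry


store = MetadataStore({"f1": {"name": "report.pdf", "size": 2_000_000}}, TTLCache(60))
store.get_file("f1")   # miss: reads the DB and fills the cache
store.get_file("f1")   # hit: served from the cache
store.rename_file("f1", "report-v2.pdf")
print(store.get_file("f1")["name"], store.db_reads)
```

Invalidating on write, rather than updating the cached value in place, keeps the DB as the single source of truth, which fits the strong-consistency goal stated for metadata.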
4. Conceptual Design
Entities
- User: id, name, email, password_hash, role, registration_date, status
- File: id, user_id, name, path, size, type, status, created_at, updated_at, version
- Folder: id, user_id, name, parent_id, created_at
- Share: id, file_id, shared_with_user_id, link, permissions, expires_at
- Alert: id, user_id, file_id, type (upload/quota/share), message, created_at, status
- AuditLog: id, user_id, action, entity, entity_id, timestamp
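To make the entity list more concrete, here is a minimal sketch of two of the entities as Python dataclasses, with a helper that resolves a folder's full path by following `parent_id`. The field types and the `folder_path` helper are assumptions; the entity list only names the fields, and it does not say whether `File.path` is a logical path or a storage key.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Folder:
    id: int
    user_id: int
    name: str
    parent_id: Optional[int]  # None marks a top-level folder
    created_at: datetime


@dataclass
class File:
    id: int
    user_id: int
    name: str
    path: str          # logical path or storage key; the entity list does not say which
    size: int          # bytes (assumed unit)
    type: str          # e.g. a MIME type (assumed)
    status: str
    created_at: datetime
    updated_at: datetime
    version: int = 1


def folder_path(folder_id: int, folders: Dict[int, Folder]) -> str:
    """Build a display path by following parent_id links up to the root."""
    parts = []
    seen = set()
    current = folders.get(folder_id)
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle detected in folder hierarchy")
        seen.add(current.id)
        parts.append(current.name)
        if current.parent_id is None:
            break
        current = folders.get(current.parent_id)
    return "/" + "/".join(reversed(parts))


now = datetime.now(timezone.utc)
folders = {
    1: Folder(1, 42, "docs", None, now),
    2: Folder(2, 42, "reports", 1, now),
}
report = File(10, 42, "q1.pdf", "docs/reports/q1.pdf", 2_000_000,
              "application/pdf", "active", now, now)
print(folder_path(2, folders), report.name, report.version)
```

The self-referencing `parent_id` keeps the folder tree in a single table; the cycle check guards against corrupted parent links.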
Key Flows
- File Upload:
  - User uploads file
  - App stores file in object storage
  - App creates metadata record in DB
  - App sends upload alert to user
- File Sharing:
  - User shares file (generates link or assigns user)
  - App creates share record and sends alert
- Alerts:
  - System triggers alerts for upload success/failure, quota, sharing
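The upload flow can be sketched as a single function that performs the steps in order. This is an illustration, not the service's actual implementation: `upload_file`, `UploadError`, and the `save_metadata` callback are hypothetical names, a dictionary stands in for object storage, and a list stands in for the alert queue. Deleting the stored object when the metadata write fails is an added assumption, one way to avoid orphaned objects.

```python
import uuid
from typing import Callable, Dict, List, Tuple


class UploadError(Exception):
    """Raised when an upload cannot be completed."""


def upload_file(
    user_id: int,
    name: str,
    data: bytes,
    object_store: Dict[str, bytes],
    save_metadata: Callable[[dict], None],
    alerts: List[Tuple[int, str, str]],
) -> str:
    """Store the bytes, then the metadata record, then queue an alert."""
    file_id = str(uuid.uuid4())
    key = f"{user_id}/{file_id}"

    # Step 1: write the file to object storage (a dict stands in for it here).
    object_store[key] = data

    # Step 2: create the metadata record.
    try:
        save_metadata({"id": file_id, "user_id": user_id, "name": name,
                       "path": key, "size": len(data), "version": 1})
    except Exception as exc:
        # Assumption: remove the orphaned object so storage and DB stay in sync.
        object_store.pop(key, None)
        alerts.append((user_id, "upload", f"Upload of {name} failed"))
        raise UploadError(f"could not record metadata for {name}") from exc

    # Step 3: queue the success alert.
    alerts.append((user_id, "upload", f"Upload of {name} succeeded"))
    return file_id


store, records, alerts = {}, [], []
file_id = upload_file(42, "notes.txt", b"hello", store, records.append, alerts)
print(records[0]["id"] == file_id, alerts[-1][2])
```

Writing the object before the metadata record means a failed upload never leaves a visible file entry that points at missing data.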
Security
- Role-based access control (RBAC)
- Input validation, rate limiting
- Encrypted connections (HTTPS)
- Regular backups and audit logs
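A role-based access check can be as small as a lookup table plus an ownership test. The sketch below assumes the two roles named in the requirements (users and admins); the permission strings and the `is_allowed` function are hypothetical, and share records are deliberately left out to keep the example short.

```python
from enum import Enum


class Role(Enum):
    USER = "user"
    ADMIN = "admin"


# Hypothetical permission names; the requirements only name the two roles.
USER_PERMISSIONS = frozenset({"file:read", "file:write", "file:delete", "file:share"})
ROLE_PERMISSIONS = {
    Role.USER: USER_PERMISSIONS,
    Role.ADMIN: USER_PERMISSIONS | {"user:manage", "audit:read"},
}


def is_allowed(role: Role, action: str, actor_id: int, owner_id: int) -> bool:
    """Allow an action if the role grants it and the actor owns the resource.

    Admins may act on any user's resources. Share records are not consulted here.
    """
    if action not in ROLE_PERMISSIONS[role]:
        return False
    return role is Role.ADMIN or actor_id == owner_id


print(is_allowed(Role.USER, "file:delete", actor_id=7, owner_id=7))
print(is_allowed(Role.USER, "file:delete", actor_id=7, owner_id=8))
print(is_allowed(Role.ADMIN, "file:delete", actor_id=1, owner_id=8))
```

A real check would also consult the Share entity so that a user granted access through a share can read a file they do not own.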
5. Bottlenecks and Refinement
Potential Bottlenecks
- Object storage throughput:
  - Use scalable, distributed storage and a CDN for downloads
- Metadata DB contention:
  - Use read replicas, caching, and DB connection pooling
- Alert delivery:
  - Use async queues for notifications
- Large file uploads/downloads:
  - Use chunked uploads/downloads and resumable transfers
- Single zone or region failure:
  - Deploy across multiple availability zones, and across regions if a full regional outage must be survived
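Chunked, resumable uploads can be illustrated with a server-side session that records which chunks have arrived, so a client that reconnects only sends what is missing. This is a minimal sketch: `split_chunks` and `UploadSession` are hypothetical names, chunks are held in memory, and the SHA-256 check on completion is an assumed integrity step rather than something the design specifies.

```python
import hashlib
from typing import Dict, List


def split_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split a payload into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Tracks received chunks so an interrupted upload can resume."""

    def __init__(self, total_chunks: int, sha256_hex: str) -> None:
        self.total_chunks = total_chunks
        self.sha256_hex = sha256_hex
        self.received: Dict[int, bytes] = {}

    def put_chunk(self, index: int, chunk: bytes) -> None:
        if not 0 <= index < self.total_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self.received[index] = chunk  # re-sending a chunk is harmless

    def missing(self) -> List[int]:
        return [i for i in range(self.total_chunks) if i not in self.received]

    def complete(self) -> bytes:
        if self.missing():
            raise ValueError("upload incomplete")
        data = b"".join(self.received[i] for i in range(self.total_chunks))
        if hashlib.sha256(data).hexdigest() != self.sha256_hex:
            raise ValueError("checksum mismatch")
        return data


payload = b"hello resumable world"
chunks = split_chunks(payload, 4)
session = UploadSession(len(chunks), hashlib.sha256(payload).hexdigest())

for i in (0, 1, 3):                 # the connection drops partway through
    session.put_chunk(i, chunks[i])
print("missing after interruption:", session.missing())

for i in session.missing():         # resume: send only what is missing
    session.put_chunk(i, chunks[i])
print(session.complete() == payload)
```

Managed object stores typically offer their own multipart upload mechanism, so in practice the session state would likely map onto that rather than being held in application memory.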
Refinement
- Monitor system metrics and auto-scale app servers
- Regularly test failover and backup restores
- Optimize queries and indexes for frequent operations
- Consider sharding if user/file volume grows significantly
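If sharding does become necessary, the simplest routing scheme hashes a key such as the user ID and takes it modulo the shard count. The sketch below is one possible approach, not a recommendation from the design itself; `shard_for` is a hypothetical helper, and sharding by user ID is an assumption.

```python
import hashlib


def shard_for(user_id: int, shard_count: int) -> int:
    """Map a user to a shard with a stable hash (same input, same shard)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Distribute the estimated 50,000 users across 4 shards and count each bucket.
counts = [0] * 4
for user_id in range(50_000):
    counts[shard_for(user_id, 4)] += 1
print(counts)
```

Modulo routing reassigns most keys when the shard count changes, so a consistent-hashing ring or a lookup table is usually preferred when shards are expected to be added over time.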
This design targets a scalable, highly available, mobile-ready file storage service, with asynchronous alerts, role-based access control, and regularly tested backups and failover.