Payment Gateway Service System Design
1. Business Requirements
Functional Requirements
- User registration and authentication (merchants, admins)
- Payment processing (credit/debit cards, wallets, bank transfers)
- Transaction management (initiate, confirm, refund, cancel)
- Multi-currency and multi-region support
- Fraud detection and risk scoring
- Alerts/notifications (urgent failures, suspicious activity, settlement status)
- Mobile-ready API and dashboard
- Role-based access control
- Analytics and reporting (transaction trends, settlement, failures)
- API for programmatic payment requests
Non-Functional Requirements
- 99.9% availability (max ~8.76 hours downtime/year)
- Scalability to handle high transaction volume and spikes
- Secure data storage and access control (PCI DSS compliance)
- Fast response times (<300ms for most API/dashboard requests)
- Audit logging and monitoring
- Backup and disaster recovery
- GDPR/data privacy compliance
- Mobile responsiveness
Out of Scope
- In-person POS hardware integration
- Built-in loyalty/rewards program
- Direct integration with external accounting/ERP systems
2. Estimation & Back-of-the-Envelope Calculations
- Merchants: 10,000
- Transactions/day: 1M (payments, refunds, status checks)
- Peak concurrent requests: ~10,000
- Data size:
- Transactions: 1M × 1 KB ≈ 1 GB/day
- 1 year: 1 GB × 365 ≈ 365 GB
- Merchant/user data: 10,000 × 2 KB ≈ 20 MB
- Audit logs: 100M × 0.2 KB ≈ 20 GB
- Total DB size: ~400 GB/year (excluding logs, backups)
- Availability:
- 99.9% = 8.76 hours/year downtime max
- Use managed DB, multi-AZ deployment, health checks, auto-scaling
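The arithmetic behind these estimates can be checked with a short script. This is a minimal sketch that uses only the figures listed above and assumes decimal units (1 GB = 1,000,000 KB); the constant names are illustrative.

```python
# Back-of-the-envelope figures taken from the estimates above.
TRANSACTIONS_PER_DAY = 1_000_000
TRANSACTION_SIZE_KB = 1
AVAILABILITY = 0.999

SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24
KB_PER_GB = 1_000_000  # assumption: decimal units, matching the rounding above

# Average request rate if traffic were spread evenly over the day.
avg_requests_per_sec = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY

# Transaction storage growth.
daily_gb = TRANSACTIONS_PER_DAY * TRANSACTION_SIZE_KB / KB_PER_GB
yearly_gb = daily_gb * 365

# Downtime allowed by the availability target.
max_downtime_hours = HOURS_PER_YEAR * (1 - AVAILABILITY)

print(f"average load: {avg_requests_per_sec:.1f} requests/s")
print(f"transaction storage: {daily_gb:.0f} GB/day, {yearly_gb:.0f} GB/year")
print(f"downtime budget: {max_downtime_hours:.2f} hours/year")
```

The even-spread average comes out to roughly 11.6 requests per second, so the ~10,000 peak concurrent requests, not the daily average, is the figure that drives capacity planning.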
3. High Level Design (Mermaid Diagrams)
Component Diagram
mermaid
flowchart LR
Client["Client (Web/Mobile/App)"]
LB[Load Balancer]
API[Payment API]
App[Application Server]
DB[(Metadata DB)]
Queue[Message Queue]
Gateway[Payment Network Gateway]
Fraud[Fraud Detection Engine]
Alert[Alert/Notification Service]
Analytics[Analytics Engine]
Client --> LB --> API --> App
App --> DB
App --> Queue
Queue --> Gateway
App --> Fraud
App --> Alert
App --> Analytics
Analytics --> DB
Data Flow Diagram
mermaid
sequenceDiagram
participant C as Client
participant A as API Server
participant Q as Queue
participant G as Payment Gateway
participant F as Fraud Engine
participant D as DB
participant L as Alert Service
C->>A: Initiate Payment
A->>F: Fraud Check
F-->>A: Risk Score
A->>Q: Enqueue Transaction
Q->>G: Process Payment
G-->>Q: Payment Status
Q->>A: Status Update
A->>D: Log Transaction
A->>L: Send Urgent Alert (if needed)
A-->>C: Response
Key Design Decisions
- Database: Relational DB (e.g., PostgreSQL) for metadata, transactions, and logs (PCI DSS compliant)
- Queue: Distributed message queue (e.g., RabbitMQ, Kafka, AWS SQS) for decoupling and scaling
- Fraud Detection: Dedicated engine (real-time rules + ML scoring)
- Payment Gateway: Integration with external payment networks (Visa, Mastercard, etc.)
- Alerting: Urgent alerts via dedicated service (e.g., Twilio, Firebase)
- Analytics: Batch or streaming (e.g., Kafka + Spark, or managed cloud analytics)
- Deployment: Cloud-based, multi-AZ, managed services for high availability
- API: REST/GraphQL for programmatic access
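One way to read the "real-time rules + ML scoring" decision is as a blend of a rule score and a model probability. The sketch below is illustrative only: the `PaymentAttempt` fields, rule weights, thresholds, and the `allow`/`review`/`block` outcomes are all assumptions, and `model_score` stands in for whatever a trained model would return.

```python
from dataclasses import dataclass


@dataclass
class PaymentAttempt:
    amount: float             # major currency units
    card_country: str
    ip_country: str
    attempts_last_hour: int   # prior attempts by the same payer


# Hypothetical thresholds, chosen only for illustration.
HIGH_AMOUNT = 5_000.0
MAX_ATTEMPTS_PER_HOUR = 5


def rule_score(attempt: PaymentAttempt) -> float:
    """Add up the weight of every rule that fires, capped at 1.0."""
    score = 0.0
    if attempt.amount >= HIGH_AMOUNT:
        score += 0.4
    if attempt.card_country != attempt.ip_country:
        score += 0.3
    if attempt.attempts_last_hour > MAX_ATTEMPTS_PER_HOUR:
        score += 0.3
    return min(score, 1.0)


def risk_score(attempt: PaymentAttempt, model_score: float,
               rule_weight: float = 0.5) -> float:
    """Blend the rule score with a model probability; both lie in [0, 1]."""
    if not 0.0 <= model_score <= 1.0:
        raise ValueError("model_score must be between 0 and 1")
    return rule_weight * rule_score(attempt) + (1 - rule_weight) * model_score


def decide(score: float, review_at: float = 0.5, block_at: float = 0.8) -> str:
    """Map a blended score to an action."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"
```

Keeping the rules as plain in-process checks means they can run on every request, while the model can be retrained and swapped in batch without touching the rule logic.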
4. Conceptual Design
Entities
- User/Merchant: id, name, email, api_key, contact_info, role, status
- Transaction: id, merchant_id, amount, currency, status, type, created_at, updated_at, risk_score
- Refund: id, transaction_id, amount, status, created_at
- Alert: id, user_id, transaction_id, type (urgent/failure/fraud), message, created_at, status
- AuditLog: id, user_id, action, entity, entity_id, timestamp
- Settlement: id, merchant_id, amount, status, settled_at
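A few of these entities could be modelled as follows. This is a sketch, not a schema: the field names come from the list above, but the status values, the use of integer minor units for `amount`, and the UUID identifiers are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TransactionStatus(Enum):
    # Assumed status values, based on the initiate/confirm/refund/cancel operations.
    INITIATED = "initiated"
    CONFIRMED = "confirmed"
    REFUNDED = "refunded"
    CANCELLED = "cancelled"
    FAILED = "failed"


def _now() -> datetime:
    return datetime.now(timezone.utc)


def _new_id() -> str:
    return str(uuid.uuid4())


@dataclass
class Transaction:
    merchant_id: str
    amount: int      # assumption: minor units (e.g. cents) to avoid float rounding
    currency: str    # e.g. "USD"
    type: str        # e.g. "payment"
    status: TransactionStatus = TransactionStatus.INITIATED
    risk_score: Optional[float] = None
    id: str = field(default_factory=_new_id)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)


@dataclass
class Refund:
    transaction_id: str
    amount: int
    status: str = "pending"
    id: str = field(default_factory=_new_id)
    created_at: datetime = field(default_factory=_now)


@dataclass
class AuditLog:
    user_id: str
    action: str
    entity: str
    entity_id: str
    id: str = field(default_factory=_new_id)
    timestamp: datetime = field(default_factory=_now)
```

For example, `Transaction(merchant_id="m_1", amount=1999, currency="USD", type="payment")` starts in the `INITIATED` state with no risk score until the fraud check has run.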
Key Flows
- Payment Processing:
- Client initiates payment
- App performs fraud check
- Enqueues transaction for processing
- Payment gateway processes and returns status
- App logs transaction, triggers urgent alert if needed
- Refunds:
- Merchant requests refund
- App validates and processes refund
- Updates transaction and logs
- Alerts:
- System triggers urgent alerts for failures, fraud, or settlement issues
- Analytics:
- Periodic jobs aggregate transaction, settlement, and risk data
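The payment processing flow above can be sketched with an in-memory queue standing in for the message broker. Everything here is hypothetical scaffolding: `fraud_check` and `gateway` are caller-supplied callables, the status strings are illustrative, and blocking high-risk payments before they are enqueued is an assumption rather than something the flow specifies.

```python
import queue


def initiate_payment(txn, fraud_check, work_queue, alerts, block_at=0.8):
    """Score the payment, then enqueue it unless the risk score is too high."""
    scored = {**txn, "risk_score": fraud_check(txn)}
    if scored["risk_score"] >= block_at:
        # Assumption: high-risk payments are stopped before reaching the queue.
        scored["status"] = "blocked"
        alerts.append(("fraud", scored["id"]))
        return scored
    scored["status"] = "queued"
    work_queue.put(scored)
    return scored


def run_worker_once(work_queue, gateway, audit_log, alerts):
    """Take one queued payment, call the gateway, log the result, alert on failure."""
    txn = work_queue.get()
    try:
        # Hypothetical gateway contract: returns "confirmed" or "failed".
        txn["status"] = gateway(txn)
    except Exception:
        txn["status"] = "failed"
    audit_log.append((txn["id"], txn["status"]))
    if txn["status"] == "failed":
        alerts.append(("failure", txn["id"]))
    work_queue.task_done()
    return txn
```

In a deployment the queue would be the distributed broker and the worker would run in a loop on separate instances; the split between `initiate_payment` and `run_worker_once` mirrors that boundary.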
Security
- Role-based access control (RBAC)
- API key validation, input validation, rate limiting
- Encrypted connections (HTTPS)
- PCI DSS compliance
- Regular backups and audit logs
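Three of these controls (API key validation, rate limiting, and RBAC) can be illustrated with the standard library alone. This is a minimal sketch: storing only a SHA-256 hash of each key, the token-bucket limiter, and the permission names are assumptions made for the example, not requirements stated above.

```python
import hashlib
import hmac
import time


def hash_api_key(api_key: str) -> str:
    """Hash a key so that only the digest needs to be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def api_key_matches(presented: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical permission names for the two roles mentioned in the requirements.
ROLE_PERMISSIONS = {
    "admin": {"payment:create", "refund:create", "report:read", "user:manage"},
    "merchant": {"payment:create", "refund:create", "report:read"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Unknown roles get no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

A request handler would typically call `api_key_matches`, then the caller's `TokenBucket.allow()`, then `is_allowed` before doing any work; with multiple API servers the bucket state would need to live in a shared store.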
5. Bottlenecks and Refinement
Potential Bottlenecks
- Queue/message throughput:
- Use scalable, distributed queues and auto-scaling workers
- Payment network latency:
- Implement async processing, retries, and provider failover
- Fraud engine performance:
- Use in-memory scoring and batch ML updates
- Database contention:
- Use read replicas, caching, and DB connection pooling
- Alert delivery:
- Use async queues for urgent notifications
- Single region failure:
- Deploy across multiple availability zones/regions
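The "retries and provider failover" mitigation for payment network latency might look like the sketch below. It assumes each provider is a callable that raises a hypothetical `ProviderError` on a transient failure; the attempt count and delays are placeholder values.

```python
import random
import time


class ProviderError(Exception):
    """Hypothetical error raised by a provider call on a transient failure."""


def charge_with_failover(providers, request, max_attempts=3,
                         base_delay=0.2, sleep=time.sleep):
    """Try each provider in order, retrying with exponential backoff and jitter."""
    last_error = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(request)
            except ProviderError as exc:
                last_error = exc
                # Back off before retrying the same provider; skip the wait
                # after its final attempt and move on to the next provider.
                if attempt < max_attempts - 1:
                    delay = base_delay * (2 ** attempt)
                    sleep(delay + random.uniform(0, delay))
    raise ProviderError("all providers failed") from last_error
```

Retrying a charge is only safe if the provider call cannot apply the same payment twice, so a real integration would need the provider's duplicate-request protection in place before enabling this.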
Refinement
- Monitor system metrics and auto-scale API servers and workers
- Regularly test failover and backup restores
- Optimize queries and indexes for frequent operations
- Consider sharding if transaction/log volume grows significantly
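If sharding becomes necessary, one simple option is to route each merchant's rows to a shard chosen by a stable hash. The sketch below assumes `merchant_id` as the shard key and a fixed shard count; both are illustrative choices.

```python
import hashlib


def shard_for(merchant_id: str, shard_count: int) -> int:
    """Map a merchant to a shard index in [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Use a stable digest: Python's built-in hash() is salted per process for
    # strings, so it would route the same merchant differently across restarts.
    digest = hashlib.sha256(merchant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Keying on the merchant keeps each merchant's transactions on one shard, which suits per-merchant queries. Note that plain modulo routing remaps most keys when `shard_count` changes, so a scheme such as consistent hashing may be preferable if shards are added often.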
This design provides a scalable, highly available, mobile-ready payment gateway service with urgent alerting, analytics, and operational best practices.