Payment Gateway Service System Design

1. Business Requirements

Functional Requirements

  • User registration and authentication (merchants, admins)
  • Payment processing (credit/debit cards, wallets, bank transfers)
  • Transaction management (initiate, confirm, refund, cancel)
  • Multi-currency and multi-region support
  • Fraud detection and risk scoring
  • Alerts/notifications (urgent failures, suspicious activity, settlement status)
  • Mobile-ready API and dashboard
  • Role-based access control
  • Analytics and reporting (transaction trends, settlement, failures)
  • API for programmatic payment requests

Non-Functional Requirements

  • 99.9% availability (max ~8.76 hours downtime/year)
  • Scalability to handle high transaction volume and spikes
  • Secure data storage and access control (PCI DSS compliance)
  • Fast response times (<300ms for most API/dashboard requests)
  • Audit logging and monitoring
  • Backup and disaster recovery
  • GDPR/data privacy compliance
  • Mobile responsiveness

Out of Scope

  • In-person POS hardware integration
  • Built-in loyalty/rewards program
  • Direct integration with external accounting/ERP systems

2. Estimation & Back-of-the-Envelope Calculations

  • Merchants: 10,000
  • Transactions/day: 1M (payments, refunds, status checks)
  • Peak concurrent requests: ~10,000
  • Data size:
    • Transactions: 1M × 1 KB ≈ 1 GB/day
    • 1 year: 1 GB × 365 ≈ 365 GB
    • Merchant/user data: 10,000 × 2 KB ≈ 20 MB
    • Audit logs: 100M × 0.2 KB ≈ 20 GB
    • Total DB size: ~400 GB/year (365 GB of transactions plus ~20 GB of audit logs, rounded up; excludes application logs and backups)
  • Availability:
    • 99.9% = 8.76 hours/year downtime max
    • Use managed DB, multi-AZ deployment, health checks, auto-scaling
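
The arithmetic behind these estimates can be reproduced with a short script. This is a sketch only: every input is one of the figures listed above, and decimal units (1 GB = 1,000,000 KB) are assumed to match the rounding used in the list.

```python
# Back-of-the-envelope figures; inputs are the estimates from the list above.
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24
KB_PER_GB = 1_000_000  # decimal units (assumption, matches the rounded figures)

transactions_per_day = 1_000_000
transaction_size_kb = 1
merchants = 10_000
merchant_record_kb = 2
audit_log_entries = 100_000_000
audit_log_entry_kb = 0.2
availability = 0.999

avg_requests_per_second = transactions_per_day / SECONDS_PER_DAY
daily_tx_gb = transactions_per_day * transaction_size_kb / KB_PER_GB
yearly_tx_gb = daily_tx_gb * 365
merchant_mb = merchants * merchant_record_kb / 1_000
audit_gb = audit_log_entries * audit_log_entry_kb / KB_PER_GB
downtime_hours = (1 - availability) * HOURS_PER_YEAR

print(f"average load:      {avg_requests_per_second:.1f} requests/s")
print(f"transactions:      {daily_tx_gb:.0f} GB/day, {yearly_tx_gb:.0f} GB/year")
print(f"merchant data:     {merchant_mb:.0f} MB")
print(f"audit logs:        {audit_gb:.0f} GB")
print(f"downtime budget:   {downtime_hours:.2f} hours/year")
```

The average load works out to roughly 12 requests/s, so the ~10,000 peak concurrent requests figure is best read as burst headroom rather than a sustained rate.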

3. High Level Design (Mermaid Diagrams)

Component Diagram

mermaid
flowchart LR
  Client["Client (Web/Mobile/App)"]
  LB[Load Balancer]
  API[Payment API]
  App[Application Server]
  DB[(Metadata DB)]
  Queue[Message Queue]
  Gateway[Payment Network Gateway]
  Fraud[Fraud Detection Engine]
  Alert[Alert/Notification Service]
  Analytics[Analytics Engine]

  Client --> LB --> API --> App
  App --> DB
  App --> Queue
  Queue --> Gateway
  App --> Fraud
  App --> Alert
  App --> Analytics
  Analytics --> DB

Data Flow Diagram

mermaid
sequenceDiagram
  participant C as Client
  participant A as API Server
  participant Q as Queue
  participant G as Payment Network Gateway
  participant F as Fraud Engine
  participant D as DB
  participant L as Alert Service

  C->>A: Initiate Payment
  A->>F: Fraud Check
  F-->>A: Risk Score
  A->>Q: Enqueue Transaction
  Q->>G: Process Payment
  G-->>Q: Payment Status
  Q->>A: Status Update
  A->>D: Log Transaction
  A->>L: Send Urgent Alert (if needed)
  A-->>C: Response
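
The sequence above can be sketched in a single process with in-memory stand-ins for the queue, database, and alert service. The names `score_risk`, `gateway_process`, `handle_payment`, and the `RISK_THRESHOLD` cut-off are hypothetical, and rejecting high-risk payments outright is an assumption; the diagram only says that a risk score is returned.

```python
import queue
from dataclasses import dataclass

RISK_THRESHOLD = 0.8  # assumed cut-off; the design does not specify one


@dataclass
class Payment:
    payment_id: str
    amount_minor: int          # amount in minor units (e.g. cents)
    currency: str
    status: str = "initiated"
    risk_score: float = 0.0


def score_risk(payment: Payment) -> float:
    """Placeholder fraud check: only flags unusually large amounts."""
    return 0.9 if payment.amount_minor > 1_000_000 else 0.1


def gateway_process(payment: Payment) -> str:
    """Stand-in for the call to the external payment network."""
    return "confirmed"


def handle_payment(payment, work_queue, db, alerts):
    payment.risk_score = score_risk(payment)                     # A->>F, F-->>A
    if payment.risk_score >= RISK_THRESHOLD:
        payment.status = "rejected"
        alerts.append(f"fraud suspected: {payment.payment_id}")  # A->>L
    else:
        work_queue.put(payment)                                  # A->>Q
        queued = work_queue.get()                                # Q->>G (worker side)
        queued.status = gateway_process(queued)                  # G-->>Q, Q->>A
    db[payment.payment_id] = payment                             # A->>D
    return payment.status                                        # A-->>C
```

In a real deployment the `get()` side would run in separate worker processes, which is what lets the API tier and the gateway workers scale independently.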

Key Design Decisions

  • Database: Relational DB (e.g., PostgreSQL) for metadata, transactions, and audit logs, hosted in a PCI DSS-compliant environment
  • Queue: Distributed message queue (e.g., RabbitMQ, Kafka, AWS SQS) for decoupling and scaling
  • Fraud Detection: Dedicated engine (real-time rules + ML scoring)
  • Payment Gateway: Integration with external payment networks (Visa, Mastercard, etc.)
  • Alerting: Urgent alerts via a dedicated service (e.g., Twilio for SMS, Firebase Cloud Messaging for push)
  • Analytics: Batch or streaming (e.g., Kafka + Spark, or managed cloud analytics)
  • Deployment: Cloud-based, multi-AZ, managed services for high availability
  • API: REST/GraphQL for programmatic access
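
As an illustration of the "real-time rules + ML scoring" decision, the sketch below blends weighted rule hits with a model probability. The rules, weights, field names (`amount_minor`, `card_country`, `ip_country`), and the 50/50 blend are all hypothetical; the design does not prescribe them.

```python
from typing import Any, Callable, Dict, List, Tuple

Transaction = Dict[str, Any]
Rule = Tuple[str, Callable[[Transaction], bool], float]  # (name, predicate, weight)

# Illustrative rules and weights only.
RULES: List[Rule] = [
    ("large_amount", lambda t: t["amount_minor"] > 500_000, 0.4),
    ("country_mismatch", lambda t: t["card_country"] != t["ip_country"], 0.3),
]


def risk_score(
    tx: Transaction, model_score: Callable[[Transaction], float]
) -> Tuple[float, List[str]]:
    """Blend rule hits with a model probability into a score in [0, 1]."""
    hits = [(name, weight) for name, predicate, weight in RULES if predicate(tx)]
    rule_part = min(sum(weight for _, weight in hits), 1.0)
    blended = 0.5 * rule_part + 0.5 * model_score(tx)
    return round(blended, 3), [name for name, _ in hits]
```

Returning the names of the rules that fired alongside the score makes it easier to explain a decision in an alert or audit log entry.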

4. Conceptual Design

Entities

  • User/Merchant: id, name, email, api_key, contact_info, role, status
  • Transaction: id, merchant_id, amount, currency, status, type, created_at, updated_at, risk_score
  • Refund: id, transaction_id, amount, status, created_at
  • Alert: id, user_id, transaction_id, type (urgent/failure/fraud), message, created_at, status
  • AuditLog: id, user_id, action, entity, entity_id, timestamp
  • Settlement: id, merchant_id, amount, status, settled_at
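
A minimal sketch of the Transaction entity with a guarded status lifecycle follows. The field names come from the entity list above; the concrete states, the allowed transitions, and the use of integer minor units for `amount` are assumptions, since the design names the operations (initiate, confirm, refund, cancel) but not the state model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class TxStatus(Enum):
    INITIATED = "initiated"
    CONFIRMED = "confirmed"
    REFUNDED = "refunded"
    CANCELLED = "cancelled"


# Assumed lifecycle: which status may follow which.
ALLOWED = {
    TxStatus.INITIATED: {TxStatus.CONFIRMED, TxStatus.CANCELLED},
    TxStatus.CONFIRMED: {TxStatus.REFUNDED},
    TxStatus.REFUNDED: set(),
    TxStatus.CANCELLED: set(),
}


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Transaction:
    id: str
    merchant_id: str
    amount: int                 # minor units (e.g. cents) to avoid float rounding
    currency: str               # ISO 4217 code such as "USD"
    type: str = "payment"
    status: TxStatus = TxStatus.INITIATED
    risk_score: float = 0.0
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

    def transition(self, new_status: TxStatus) -> None:
        """Move to new_status, rejecting transitions the lifecycle forbids."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.status = new_status
        self.updated_at = _now()
```

Keeping the transition table in one place means an invalid request, such as cancelling an already confirmed payment, fails before anything is written to the database.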

Key Flows

  • Payment Processing:
    1. Client initiates payment
    2. App performs fraud check
    3. Enqueues transaction for processing
    4. Payment gateway processes and returns status
    5. App logs transaction, triggers urgent alert if needed
  • Refunds:
    1. Merchant requests refund
    2. App validates and processes refund
    3. Updates transaction and logs
  • Alerts:
    • System triggers urgent alerts for failures, fraud, or settlement issues
  • Analytics:
    • Periodic jobs aggregate transaction, settlement, and risk data
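
The refund flow above ("validates and processes refund", then "updates transaction and logs") might look like the following sketch. The `CapturedPayment` type, the `partially_refunded` status, and the rule that total refunds may not exceed the captured amount are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CapturedPayment:
    transaction_id: str
    amount: int                                   # captured amount, minor units
    status: str = "confirmed"
    refunds: List[int] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)

    @property
    def refundable(self) -> int:
        """Amount that can still be refunded."""
        return self.amount - sum(self.refunds)


def request_refund(payment: CapturedPayment, amount: int) -> str:
    """Validate a refund, record it, and update the transaction status."""
    if payment.status not in ("confirmed", "partially_refunded"):
        raise ValueError("only confirmed payments can be refunded")
    if amount <= 0 or amount > payment.refundable:
        raise ValueError("refund must be positive and within the refundable balance")
    payment.refunds.append(amount)
    payment.status = "refunded" if payment.refundable == 0 else "partially_refunded"
    payment.audit_log.append(f"refund {amount} on {payment.transaction_id}")
    return payment.status
```

For example, refunding 400 and then 600 against a 1,000 capture moves the payment through `partially_refunded` to `refunded`, after which further refund requests are rejected.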

Security

  • Role-based access control (RBAC)
  • API key validation, input validation, rate limiting
  • Encrypted connections (HTTPS)
  • PCI DSS compliance
  • Regular backups and audit logs
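
Two of the controls above, API key validation and rate limiting, can be sketched with the standard library alone. Storing only a SHA-256 digest of each key and using a token bucket are assumptions about how the controls would be built; `hash_key`, `key_is_valid`, and `TokenBucket` are hypothetical names.

```python
import hashlib
import hmac
import time


def hash_key(api_key: str) -> str:
    """Digest to store in place of the raw API key."""
    return hashlib.sha256(api_key.encode()).hexdigest()


def key_is_valid(presented_key: str, stored_digest: str) -> bool:
    # compare_digest takes constant time, which avoids timing side channels.
    return hmac.compare_digest(hash_key(presented_key), stored_digest)


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A service with several API servers would need to keep the bucket state in a shared store rather than in process memory; this sketch covers only the single-process case.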

5. Bottlenecks and Refinement

Potential Bottlenecks

  • Queue/message throughput:
    • Use scalable, distributed queues and auto-scaling workers
  • Payment network latency:
    • Implement async processing, retries, and provider failover
  • Fraud engine performance:
    • Use in-memory scoring and batch ML updates
  • Database contention:
    • Use read replicas, caching, and DB connection pooling
  • Alert delivery:
    • Use async queues for urgent notifications
  • Single AZ or region failure:
    • Deploy across multiple availability zones; add a second region to survive a full regional outage
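
The "retries and provider failover" mitigation for payment network latency could be sketched as below. `ProviderError` and `charge_with_failover` are hypothetical names, and the sketch assumes a provider raises `ProviderError` only when the charge definitely did not go through, so that a retry cannot double-charge.

```python
import random
import time
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Transient failure reported by a payment provider (hypothetical)."""


def charge_with_failover(
    providers: Sequence[Callable[[dict], str]],
    request: dict,
    attempts_per_provider: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying transient errors with backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(attempts_per_provider):
            try:
                return provider(request)
            except ProviderError as exc:
                last_error = exc
                if attempt < attempts_per_provider - 1:
                    # Exponential backoff with full jitter: 0..base * 2^attempt.
                    sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("all payment providers failed") from last_error
```

Because this runs in a queue worker rather than in the request path, the backoff delays do not hold an API connection open.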

Refinement

  • Monitor system metrics and auto-scale API servers and workers
  • Regularly test failover and backup restores
  • Optimize queries and indexes for frequent operations
  • Consider sharding if transaction/log volume grows significantly
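
If sharding becomes necessary, one simple option is to route by a stable hash of the merchant ID. Choosing `merchant_id` as the shard key and the `shard_for` helper are assumptions; the design does not name a shard key.

```python
import hashlib


def shard_for(merchant_id: str, shard_count: int) -> int:
    """Map a merchant to a shard using a stable hash.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used to get the same answer on every server.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(merchant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Plain modulo routing remaps most merchants whenever `shard_count` changes, so a lookup table or consistent hashing is worth considering if shards are expected to be added often.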

This design yields a scalable, highly available payment gateway service with a mobile-ready API, urgent alerting, analytics, and operational safeguards such as monitoring, failover testing, and backups.