Data Architecture

Data is the glue that holds distributed systems together. Data has its own lifecycle: origination, flow, transformation, modification, and finally deletion. At a high level, these are some of the data design considerations:



Database Type 

  • Relational, KV store, column-family, document, graph, etc.


Indexing and Querying

  • LSM, B-Tree, B+-Tree
  • R/W Access patterns
  • Range queries
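
A toy sketch of the LSM idea above (all names hypothetical, no compaction or bloom filters): writes go to a sorted in-memory memtable, which is periodically flushed to immutable sorted runs; reads check the memtable first, then the runs from newest to oldest. Sorted runs are what make range queries cheap in real LSM stores.

```python
class TinyLSM:
    """Toy LSM-tree: in-memory memtable, flushed to immutable sorted runs."""
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []            # newest first; each run is a sorted list of (key, value)
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush as an immutable, sorted "SSTable".
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:     # newest run wins (no compaction in this sketch)
            for k, v in run:
                if k == key:
                    return v
        return None
```

A real store would binary-search each run and merge runs in the background; this sketch only shows the write path and read precedence.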


Schema, Metadata, Encoding and Evolution

  • Versioning
    • Forward and Backward compatibility
  • Format
    • Avro, Thrift, Protobuf
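
A minimal sketch of the compatibility rules above, using plain dicts rather than a real Avro/Protobuf schema (field names are hypothetical): backward compatibility means new code can read old records (new fields need defaults); forward compatibility means old code can read new records (unknown fields are ignored).

```python
def decode_user_v2(record):
    """v2 reader: tolerates v1 records by defaulting the new 'email' field
    (backward compatibility) and silently ignores fields it doesn't know,
    which is the mirror image old readers rely on (forward compatibility)."""
    return {
        "name": record["name"],
        "email": record.get("email", ""),   # field added in v2, with a default
    }

old_record = {"name": "alice"}                              # written by v1 code
new_record = {"name": "bob", "email": "b@x", "phone": "1"}  # written by v3 code
```

Real encodings (Avro with reader/writer schemas, Protobuf with field tags) enforce these rules mechanically; the dict version is only to show the two directions.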


Transaction

  • Atomicity
  • Isolation level
    • Lost update
    • Read committed
    • Repeatable read (snapshot isolation: readers never block writers, and writers never block readers)
    • Prevent lost updates (read-modify-write cycles)
      • Compare-and-set (CAS), atomic writes, conflict resolution
    • Write skew and phantoms (a generalisation of lost update)
      • Materialising conflicts
    • Serialisability
      • Actual serial execution
      • Two-phase locking (2PL)
        • Writers don’t just block other writers; they also block readers and vice versa. 
        • Deadlocks
        • Predicate lock, index range lock
      • Serializable snapshot isolation (SSI) - optimistic concurrency control
        • All reads within a transaction are made from a consistent snapshot of the database
        • Contrast: pessimistic concurrency control (2PL, actual serial execution) vs optimistic (SSI, built on MVCC snapshots)
  • Data durability 
    • WAL Logs
    • Backup and Restore
    • Error handling, RRD, Save points
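
A sketch of preventing lost updates with a compare-and-set on a version column (table and column names are made up for illustration): the read-modify-write only commits if nobody else changed the row in between, otherwise it retries.

```python
import sqlite3

# In-memory DB with a version column used for compare-and-set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, value INTEGER, version INTEGER)")
conn.execute("INSERT INTO counters VALUES (1, 0, 0)")
conn.commit()

def increment_with_cas(conn, counter_id):
    """Read-modify-write guarded by a version check (optimistic)."""
    while True:
        value, version = conn.execute(
            "SELECT value, version FROM counters WHERE id = ?", (counter_id,)
        ).fetchone()
        cur = conn.execute(
            "UPDATE counters SET value = ?, version = ? WHERE id = ? AND version = ?",
            (value + 1, version + 1, counter_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:   # CAS succeeded; otherwise a concurrent writer won, so retry
            return value + 1
```

Many databases offer the same effect more directly (atomic `UPDATE ... SET value = value + 1`); the explicit version check is the pattern to reach for when the modification logic lives in application code.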


Replication

  • Single-leader (master/slave)
    • Read your writes, monotonic read, consistent prefix reads
  • Multi-leader (CouchDB)
    • Conflict resolution / avoidance
  • Leaderless (Dynamo style)
    • Quorum, Sloppy quorum, LWW, Concurrent writes, Version vectors
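
The quorum condition above can be written down directly: with n replicas, w write acks, and r read responses, every read is guaranteed to overlap at least one up-to-date replica when w + r > n. A one-function sketch:

```python
def quorum_ok(n: int, w: int, r: int) -> bool:
    """Strict quorum condition: every read set intersects every write set."""
    return w + r > n and 0 < w <= n and 0 < r <= n

# Typical Dynamo-style configuration: n=3, w=2, r=2 tolerates one unavailable node.
print(quorum_ok(3, 2, 2))  # True
print(quorum_ok(3, 1, 1))  # False: a read may miss the latest write
```

Sloppy quorums deliberately relax this (writes land on non-home nodes and are hinted back later), which is why they trade consistency for write availability.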


Data partitioning

  • Hotspot
  • Partition by Key range, consistent hashing, Compound primary key
  • Partition index - Local index (scatter-gather), Global index
  • Rebalancing - Fixed, Dynamic
  • Request routing 
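
A minimal consistent-hashing sketch for the partitioning bullets above (class and node names are made up): keys and virtual nodes are hashed onto a ring, and each key belongs to the next node clockwise. Virtual nodes smooth out hotspots, and adding a node only moves roughly 1/n of the keys during rebalancing.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    """Stable hash onto the ring (md5 chosen only for determinism here)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

Note that consistent hashing gives up efficient range queries (adjacent keys scatter across nodes), which is the trade-off against key-range partitioning.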


Trouble with Distributed systems

  • Faults and partial failures
  • Unreliable networks - synchronous vs asynchronous networks
  • Unreliable clocks
    • Don’t rely on the accuracy of the clock
    • NTP sync is not accurate
    • Confidence interval in time - Spanner
  • Process pause
  • Knowledge, truth, and lies
    • Truth is defined by majority
    • Fencing tokens are required for distributed locks
    • Safety and liveness
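
A sketch of why fencing tokens are needed: a lock holder can be paused (GC, network) past its lease expiry and then issue a stale write. If the lock service hands out a monotonically increasing token and the storage service rejects anything older than the highest token seen, the stale write is fenced off. Class and token values here are illustrative only.

```python
class FencedStorage:
    """Storage that rejects writes carrying a stale fencing token."""
    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token: int, key: str, value: str):
        if token < self.highest_token:
            raise PermissionError(f"stale token {token} (already saw {self.highest_token})")
        self.highest_token = token
        self.data[key] = value

storage = FencedStorage()
storage.write(33, "file", "A")      # client holding the lock with token 33
storage.write(34, "file", "B")      # new holder after the old lease expired
try:
    storage.write(33, "file", "C")  # delayed write from the old holder is rejected
except PermissionError as e:
    print(e)
```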


Linearisability

  • Serializability is an isolation property of transactions
  • Linearisability is a recency guarantee
  • It makes the system appear as if there were only a single copy of the data
  • Single leader replication is potentially linearisable. Multi-leader and leaderless are not linearisable
  • A network interruption forces a choice between linearisability and availability. 
  • CAP - either Consistent (linearisable) or Available when Partitioned
  • The reason for dropping linearisability is performance, not fault tolerance.
  • Most DB are neither CAP-linearisable nor CAP-available 
    • MVCC is intentionally non-linearisable
    • Single leader replication (async) is non-linearisable
    • Partition makes it non-available
  • ZooKeeper by default is neither CAP-consistent (CP) nor CAP-available (AP) – it’s really just “P”. 
    • You can optionally make it CP by calling sync if you want, 
    • and for reads (but not for writes) it’s actually AP, if you turn on the right option.


Error handling and recovery

  • Replay logs
  • Discover partitions
  • Replication catchup
  • Restores


Source of truth 

  • Primary DB
  • Derived data
    • Caches
    • Materialised Views
    • CQRS


Batch processing

  • Putting the computation near the data
  • Mapping and Chaining
  • Sort merge joins
  • GroupBy
  • Handling Skew
  • Map-side joins
    • Broadcast hash joins - joining a large dataset with a small dataset
    • Partitioned hash joins - partition and reduce the dataset
    • Map-side merge joins - if input is partitioned and sorted appropriately
  • Use cases - Search index, Key Value stores
  • The output of a reduce-side join is partitioned and sorted by the join key, whereas the output of a map-side join is partitioned and sorted in the same way as the large input 
  • Input is immutable and no side effects
  • In an environment where tasks are not so often terminated, the design decisions of MapReduce make less sense.
  • Materialising intermediate state
    • Makes fault tolerance easy
    • Sometimes they are not needed
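
A toy broadcast hash join, as in the map-side joins bullet above (dataset shapes are invented): the small dataset is loaded into an in-memory hash table on every mapper, and the large dataset is streamed past it, so no shuffle or sort is needed.

```python
# Small dataset: fits in memory, "broadcast" to every mapper.
users = {1: "alice", 2: "bob"}

# Large dataset: streamed record by record.
events = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 2, "url": "/cart"},
    {"user_id": 1, "url": "/pay"},
]

def broadcast_hash_join(events, users):
    """Each mapper probes the in-memory table; unmatched events are dropped (inner join)."""
    for e in events:
        name = users.get(e["user_id"])
        if name is not None:
            yield {**e, "user": name}

joined = list(broadcast_hash_join(events, users))
```

This is why the output of a map-side join keeps the partitioning and ordering of the large input, unlike a reduce-side join, which is partitioned and sorted by the join key.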


Stream processing

  • Messaging
    • Brokers
      • Routing: Fanout, load balancing
      • Acknowledgement and redelivery
      • Message ordering 
  • Partitioned logs - Kafka
    • Unified log
    • Partitioning to scale
    • Replayable
    • Consumer offsets
    • Durability
    • Immutable input and idempotent operations
  • CDC - Change Data Capture
    • Implemented using log based broker
    • Connectors - Debezium etc
  • Event sourcing
    • Immutable events written to an event log (as in accounting, where each delta is captured)
    • CQRS - separate forms for read and write and allowing several different read views
  • Stream processing
    • Produce a new stream, drive a real-time dashboard, or write to a downstream database
    • Has operators similar to batch, but the input is unbounded, so sorting doesn’t make sense
    • Uses: CEP, Stream analytics, Maintaining materialised views
    • Event time vs Processing time
    • Stream Joins
      • Stream-Stream joins (click through rate)
      • Stream-Table join (enrichment)
        • Similar to map side hash join
        • Table can be kept up to date using CDC
      • Table-Table join (materialised view maintenance)
        • Twitter
      • Time dependence of joins
        • Slowly changing dimension
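
A sketch of the stream-table join above (event shapes and keys are hypothetical): the processor keeps a local copy of the table, applies CDC change events to it, and enriches each incoming click against the current state, much like a map-side hash join whose hash table is continuously updated.

```python
# Local table replica, maintained by applying CDC events.
profile_table = {}

def apply_cdc(change: dict):
    """Apply an upsert/delete change event to the local table copy."""
    if change["op"] == "delete":
        profile_table.pop(change["key"], None)
    else:
        profile_table[change["key"]] = change["value"]

def enrich(click: dict) -> dict:
    """Stream-table join: look up the user's current profile for each click."""
    return {**click, "profile": profile_table.get(click["user_id"])}

apply_cdc({"op": "upsert", "key": "u1", "value": {"plan": "pro"}})
first = enrich({"user_id": "u1", "url": "/home"})
apply_cdc({"op": "upsert", "key": "u1", "value": {"plan": "free"}})  # CDC keeps the table fresh
second = enrich({"user_id": "u1", "url": "/pay"})
```

The time-dependence bullet shows up here too: the same click enriched before and after the CDC update joins against different profile versions (the slowly changing dimension problem).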


Big Data

  • Data Analytics
  • ETL
  • Data warehouse 

Data Security

  • Encryption at Rest
  • Data Retention
  • Data classification
  • (Customer) Data isolation
  • Data Integrity
  • Data Availability
  • Data anonymisation
  • Backup and recovery
  • Database Security
  • Access Control




Overall Security

 

Microservices

Microservices, though useful, come with a lot of baggage. Discretion is needed to decide whether they're really needed. Monoliths are not bad; most likely what is needed is a clean separation of interfaces between the components of the monolith.

Monoliths can serve well for a long time, until you hit issues such as:
  • Release velocity is affected because of dependencies between components. This hampers development, testing and deployment time
  • Different components have different scaling characteristics, so differing traffic patterns lead to unreliable, uneven use of the underlying hardware
    • Capacity planning becomes hard
    • Performance becomes unpredictable
    • Resource exhaustion happens frequently and randomly
  • A component needs to be developed and scaled independently, and made available as a service

Microservices take a heavy toll on SRE; without the required automation and SRE firepower, it is really hard to maintain the sanity of the entire system. As microservices proliferate, the problem multiplies with the web of inter-service traffic.

Microservices need to be adopted with discretion. Keep the following considerations in mind.

Architecture        




12 factor app

https://12factor.net


Availability

How is the service fault tolerant?


Scalability

What’s the horizontal and vertical scalability?


Statelessness

Is the service stateless?


Async

Can it use Lambda / Async services?


Security Considerations

2FA, HTTPS, Tokens, Encryption, GDPR, Penetration testing, App testing


API

Contracts, Versioning, Dependency


Network

• Proxy

• Sync, Async, Batch 

• Multithreaded, Event based, Coroutine


Load Handling

• Load balancer

• Circuit breaker

• Throttling
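
A minimal circuit breaker, sketching the bullet above (thresholds and names are arbitrary): after a run of consecutive failures the breaker opens and fails fast, then after a cooldown it lets one trial call through (half-open) to probe whether the downstream service has recovered.

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; half-opens after `reset_after` seconds."""
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock               # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()   # trip (or re-trip) the breaker
            raise
        self.failures = 0                # success closes the circuit
        return result
```

Failing fast while open is what protects a struggling downstream service from a retry storm; production libraries add per-endpoint state, metrics, and jittered cooldowns on top of this core.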


Replication

Consistency


Data

• Transactions across services 

• Partitioning

• Schema, Metadata, Evolution

• Indexing, Querying

• DB type


Caching

• Object caching

• Page Caching


Service Mesh

• Istio


Shutdown

Graceful shutdown
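
A sketch of graceful shutdown in a worker loop (function names are illustrative): catch SIGTERM, stop accepting new work, finish in-flight requests, then exit, instead of dying mid-request.

```python
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """Mark shutdown requested; the main loop drains in-flight work before exiting."""
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def serve(requests):
    """Process requests until a shutdown is requested, then stop taking new ones."""
    served = []
    for req in requests:
        if shutting_down:
            break               # stop accepting new work; caller can now clean up and exit
        served.append(req)      # stand-in for handling one request
    return served
```

Orchestrators such as Kubernetes send SIGTERM and wait a grace period before SIGKILL, so handling the signal is what makes rolling deployments lossless.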


i18n Considerations


SRE




Backup / Restore

• RPO - Recovery Point Objective

• RTO - Recovery Time Objective


Reliability

• MTTF - Mean time to failure

• MTTR - Mean time to Recovery

• MTBF - Mean time between failures

• Uptime

• Fault tolerance
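
The metrics above relate directly: steady-state availability is MTTF / (MTTF + MTTR), so uptime improves by either failing less often or recovering faster. A quick worked example (the numbers are invented):

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# A service that fails about once every 30 days (720 h) and takes 1 h to recover:
a = availability(mttf_hours=720, mttr_hours=1)
print(f"{a:.5f}")   # 0.99861
```

Halving MTTR has the same effect as doubling MTTF, which is why investing in fast recovery often beats chasing ever-rarer failures.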


Performance / SLAs

• SLOs - Service Level Objectives

• Response time

• Latency

• Throughput

• Uptime


Release Management

Change Management

Config Management

• Zero-downtime upgrades

• Rolling deployments

• Automated deployments


Container and Orchestration

Docker / Docker Swarm or K8S


Dev / QA environment

Automated Dev / QA environments


CI/CD pipeline

Code Deploy, Circle CI, Codeship, Jenkins


Upgrades (Zero Downtime)

Zero downtime upgrade, Rolling upgrades, Canary rollout


Deployment

Ansible / Puppet



Service Monitoring & Alerting

Pingdom, Nagios, CloudWatch, Prometheus, DataDog


Logging

Logstash, Fluentd


Cost

Cost tags, Analytics, Cost structure, Reserved Instances, Projections, Cost Optimisations

(Tools like Botmetrics)


Capacity Planning



Security

IAM Roles, Encryption, HTTPS


Networking

Diagram, VPC


Fleet management

Tagging, AMI images, Versions, Upgrades, Consolidation, Pruning


Incident Management and Response

Outages, Load Management, Latency, Security Incidents


Process Management

Process group, Process monitoring 


OnCall

Pager Duty, VictorOps


Versioning and Packaging


Dev Process




Git Flow

Branching and Development process


API Docs

Swagger


Error Monitoring

Sentry


Metrics

Concurrency, System metrics, Engineering Metrics


Testing

• Automation

• API testing

• Integration, Load

• Unit testing

• Deployment testing

• Checklist

• Regression

General




Language Version

Eg: Python 3.x / Java 7


Framework Version

Eg: Django Version


Library Version

Eg: PyMongo Version


Licenses

Apache, MIT, GPL

Others




Metrics

Deployment Frequency

% of failed Deployments

Time from Checkin to Deployment