Data Architecture

Data is the glue that holds distributed systems together. Data has its own lifecycle: it originates, flows, is transformed and modified, and is finally deleted. At a high level, these are some of the key data design considerations:

Database Type 

  • Relational, KV-store, Column Family, Document, Graph etc

Indexing and Querying

  • LSM, B-Tree, B+-Tree
  • R/W Access patterns
  • Range queries
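
The LSM idea above can be sketched in a few lines: writes go to an in-memory memtable that is periodically flushed to immutable sorted runs, and keeping runs sorted makes range queries cheap. A minimal Python sketch (class and method names are illustrative, not any real engine's API):

```python
import bisect

class TinyLSM:
    """Toy LSM tree: writes go to an in-memory memtable; when it fills,
    it is flushed to an immutable sorted run (an SSTable stand-in).
    Reads check the memtable first, then runs from newest to oldest."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []          # list of sorted (key, value) lists, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):            # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def range(self, lo, hi):
        """Range scan: merge all runs and the memtable, newest value wins."""
        merged = {}
        for run in self.runs:                      # oldest first, overwritten below
            for k, v in run:
                if lo <= k <= hi:
                    merged[k] = v
        for k, v in self.memtable.items():
            if lo <= k <= hi:
                merged[k] = v
        return sorted(merged.items())
```

A real engine would also compact runs in the background and use Bloom filters to skip runs during point lookups; this sketch only shows the read/write paths.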

Schema, Metadata, Encoding and Evolution

  • Versioning
    • Forward and Backward compatibility
  • Format
    • Avro, Thrift, Protobuf
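
The compatibility rules above can be made concrete with a toy decoder. This is a minimal sketch of Avro-style schema resolution (the schema shape and field names are invented for illustration): unknown fields from a newer writer are ignored, and fields missing from an older writer get defaults.

```python
# Hypothetical user-record schema; version 2 added "email" with a default.
SCHEMA_V2 = {
    "id":    {"required": True},
    "name":  {"required": True},
    "email": {"required": False, "default": None},   # new in v2
}

def decode(record, schema):
    """Decode with simplified Avro-style rules:
    - unknown fields in the record are ignored  -> forward compatibility
    - missing optional fields get the default   -> backward compatibility
    """
    out = {}
    for field, spec in schema.items():
        if field in record:
            out[field] = record[field]
        elif spec["required"]:
            raise ValueError(f"missing required field: {field}")
        else:
            out[field] = spec["default"]
    return out

# Old writer (v1) -> new reader (v2): backward compatible
assert decode({"id": 1, "name": "ada"}, SCHEMA_V2) == {"id": 1, "name": "ada", "email": None}

# Newer writer (extra "phone" field) -> v2 reader: forward compatible
assert "phone" not in decode({"id": 2, "name": "bob", "phone": "555"}, SCHEMA_V2)
```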

Transactions

  • Atomicity
  • Isolation level
    • Lost update
    • Read committed
    • Repeatable read (snapshot isolation - readers never block writers and vice versa)
    • Prevent lost updates (read-modify-write cycle)
      • Compare-and-set (CAS), atomic write operations, conflict resolution
    • Write skew and phantoms (generalisation of lost update)
      • Materialising conflict
    • Serialisability
      • Actual serial execution
      • 2 phase locking
        • Writers don’t just block other writers; they also block readers and vice versa. 
        • Deadlocks
        • Predicate lock, index range lock
      • SSI (Serialisable Snapshot Isolation) - optimistic concurrency control
        • All reads within a transaction are made from a consistent snapshot of the database (MVCC)
        • Pessimistic concurrency control - 2PL, actual serial execution
        • Optimistic concurrency control - SSI, built on MVCC snapshots
  • Data durability 
    • WAL Logs
    • Backup and Restore
    • Error handling, RRD, Save points
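
The compare-and-set approach to preventing lost updates can be sketched as follows. This is a minimal single-key illustration (class names are invented): a write succeeds only if the version hasn't changed since the caller read it, so concurrent read-modify-write cycles must retry instead of silently overwriting each other.

```python
import threading

class VersionedStore:
    """Single-key store with compare-and-set: a write succeeds only if
    the caller still holds the current version, so concurrent
    read-modify-write cycles cannot silently overwrite each other."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self):
        with self._lock:
            return self._value, self._version

    def compare_and_set(self, expected_version, new_value):
        with self._lock:
            if self._version != expected_version:
                return False                 # someone wrote in between: retry
            self._value = new_value
            self._version += 1
            return True

def increment(store):
    """Read-modify-write loop that retries on conflict instead of losing updates."""
    while True:
        value, version = store.read()
        if store.compare_and_set(version, value + 1):
            return
```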

Replication

  • Master-slave (single leader)
    • Read your writes, monotonic read, consistent prefix reads
  • Multi master (CouchDB)
    • Conflict resolution / avoidance
  • Leaderless (Dynamo style)
    • Quorum, Sloppy quorum, LWW, Concurrent writes, Version vectors
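
The quorum condition w + r > n guarantees that read and write sets overlap, so a read always sees at least one replica with the latest acknowledged write. A minimal Python sketch of a Dynamo-style quorum with last-write-wins (names and the timestamp scheme are illustrative):

```python
import itertools

class QuorumKV:
    """Dynamo-style quorum sketch: n replicas, writes must reach w of
    them, reads consult r of them and keep the value with the highest
    timestamp (last-write-wins). w + r > n guarantees overlap."""

    def __init__(self, n=3, w=2, r=2):
        assert w + r > n, "read and write quorums must overlap"
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]
        self.clock = itertools.count(1)        # stand-in for real timestamps

    def put(self, key, value, available=None):
        nodes = list(range(self.n) if available is None else available)
        ts = next(self.clock)
        for i in nodes:
            self.replicas[i][key] = (ts, value)
        if len(nodes) < self.w:
            raise RuntimeError("write quorum not reached")

    def get(self, key, available=None):
        nodes = list(range(self.n) if available is None else available)
        if len(nodes) < self.r:
            raise RuntimeError("read quorum not reached")
        versions = [self.replicas[i].get(key) for i in nodes[:self.r]]
        versions = [v for v in versions if v is not None]
        if not versions:
            return None
        return max(versions)[1]                # highest timestamp wins (LWW)
```

Note that LWW resolves concurrent writes by discarding all but one; version vectors (as listed above) are needed to detect concurrency rather than mask it.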

Data partitioning

  • Hotspot
  • Partition by Key range, consistent hashing, Compound primary key
  • Partition index - Local index (scatter-gather), Global index
  • Rebalancing - Fixed, Dynamic
  • Request routing 
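
Consistent hashing, mentioned above, limits rebalancing cost: adding or removing a node only moves the keys on its neighbouring arcs of the ring. A minimal Python sketch using virtual nodes (class name and vnode count are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing sketch: each node is placed at many points
    (virtual nodes) on a hash ring; a key is routed to the first node
    clockwise from its hash."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []                        # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def route(self, key):
        h = self._hash(key)
        i = bisect.bisect(self._ring, (h, ""))
        return self._ring[i % len(self._ring)][1]   # wrap around the ring
```

Virtual nodes smooth out the key distribution so that no single physical node becomes a hotspot when the ring is small.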

Trouble with Distributed systems

  • Faults and partial failures
  • Unreliable networks - synchronous vs asynchronous networks
  • Unreliable clocks
    • Don’t rely on the accuracy of the clock
    • NTP sync is not accurate
    • Confidence interval in time - Spanner
  • Process pause
  • Knowledge, truth, and lies
    • Truth is defined by majority
    • Fencing tokens are required for distributed locks
    • Safety and liveness


Linearisability

  • Serializability is an isolation property of transactions
  • Linearisability is a recency guarantee
  • It makes a system appear as if there were only a single copy of the data
  • Single leader replication is potentially linearisable. Multi-leader and leaderless are not linearisable
  • A network interruption forces a choice between linearisability and availability. 
  • CAP - linearisable or available when partitioned (not both)
  • The reason for dropping linearisability is performance, not fault tolerance.
  • Most databases are neither CAP-linearisable nor CAP-available
    • MVCC is intentionally non-linearisable
    • Single leader replication (async) is non-linearisable
    • Partition makes it non-available
  • ZooKeeper by default is neither CAP-consistent (CP) nor CAP-available (AP) – it’s really just “P”. 
    • You can optionally make it CP by calling sync if you want, 
    • and for reads (but not for writes) it’s actually AP, if you turn on the right option.

Error handling and recovery

  • Replay logs
  • Discover partitions
  • Replication catchup
  • Restores

Source of truth 

  • Primary DB
  • Derived data
    • Caches
    • Materialised Views
    • CQRS
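
The derived-data idea can be sketched with a toy CQRS setup: the write side appends immutable events to a log (the source of truth), and the read side folds the log into a materialised view optimised for queries. Because the view is derived, it can always be rebuilt from the log. Names below are illustrative:

```python
class OrderCQRS:
    """CQRS sketch: an append-only event log is the source of truth; a
    materialised view (totals per customer) is derived from it."""

    def __init__(self):
        self.event_log = []                  # write side: append-only
        self.totals_by_customer = {}         # read side: materialised view

    def place_order(self, customer, amount):
        event = {"type": "order_placed", "customer": customer, "amount": amount}
        self.event_log.append(event)
        self._apply(event)                   # keep the view up to date

    def _apply(self, event):
        c = event["customer"]
        self.totals_by_customer[c] = self.totals_by_customer.get(c, 0) + event["amount"]

    def rebuild_view(self):
        """Derived data can always be regenerated by replaying the log."""
        self.totals_by_customer = {}
        for event in self.event_log:
            self._apply(event)
```

In a real system the read view would typically live in a separate store (cache, search index, warehouse) and be fed asynchronously, e.g. via CDC.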

Batch processing

  • Putting the computation near the data
  • Mapping and Chaining
  • Sort merge joins
  • GroupBy
  • Handling Skew
  • MapSide joins
    • Broadcast hash joins - joining a large dataset with a small dataset
    • Partitioned hash joins - partition and reduce the dataset
    • Mapside merge joins - if input is partitioned and sorted appropriately
  • Use cases - Search index, Key Value stores
  • The output of a reduce-side join is partitioned and sorted by the join key, whereas the output of a map-side join is partitioned and sorted in the same way as the large input 
  • Input is immutable and no side effects
  • In an environment where tasks are not so often terminated, the design decisions of MapReduce make less sense.
  • Materialising intermediate state
    • Makes fault tolerance easy
    • Sometimes they are not needed
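
The broadcast hash join above is the simplest map-side join to sketch: the small dataset is loaded into an in-memory hash table available to every mapper, and each record of the large dataset joins with a single lookup, avoiding the shuffle of a reduce-side sort-merge join. A minimal single-process illustration (field names are invented):

```python
def broadcast_hash_join(large, small, key):
    """Map-side broadcast hash join sketch: build a hash table from the
    small dataset (it must fit in memory), then stream the large
    dataset through it with one lookup per record."""
    lookup = {row[key]: row for row in small}
    for row in large:
        match = lookup.get(row[key])
        if match is not None:                # inner join: drop non-matches
            joined = dict(match)
            joined.update(row)
            yield joined

users = [{"uid": 1, "name": "ada"}, {"uid": 2, "name": "bob"}]
clicks = [{"uid": 1, "url": "/a"}, {"uid": 1, "url": "/b"}, {"uid": 3, "url": "/c"}]
result = list(broadcast_hash_join(clicks, users, "uid"))
```

In a real MapReduce job the same hash table would be built independently inside each mapper from a copy of the small input.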

Stream processing

  • Messaging
    • Brokers
      • Routing: Fanout, load balancing
      • Acknowledgement and redelivery
      • Message ordering 
  • Partitioned logs - Kafka
    • Unified log
    • Partitioning to scale
    • Replayable
    • Consumer offsets
    • Durability
    • Immutable input and idempotent operations
  • CDC - Change Data Capture
    • Implemented using log based broker
    • Connectors - Debezium etc
  • Event sourcing
    • Immutable events written to event log (this is used in accounting where delta is captured)
    • CQRS - separate forms for read and write and allowing several different read views
  • Stream processing
    • Produce a new stream, a real-time dashboard, or write it to an end database
    • Has operators like batch jobs, but sorting doesn’t make sense on an unbounded stream
    • Uses: CEP, Stream analytics, Maintaining materialised views
    • Event time vs Processing time
    • Stream Joins
      • Stream-Stream joins (click through rate)
      • Stream-Table join (enrichment)
        • Similar to map side hash join
        • Table can be kept up to date using CDC
      • Table-Table join (materialised view maintenance)
        • Twitter
      • Time dependence of joins
        • Slowly changing dimension
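
The stream-table join (enrichment) pattern can be sketched as follows: a local copy of the table is kept up to date by consuming the database's CDC stream, and each activity event is enriched with the current row via a local lookup, much like a map-side hash join. A minimal illustration (event shapes and names are invented):

```python
class StreamTableJoin:
    """Stream-table join sketch: the table is a local materialisation of
    a CDC changelog; stream events are enriched by local lookup."""

    def __init__(self):
        self.table = {}                          # local copy, fed by CDC

    def on_cdc_event(self, change):
        """Apply one change event from the database's changelog."""
        if change["op"] == "delete":
            self.table.pop(change["key"], None)
        else:                                    # insert or update
            self.table[change["key"]] = change["row"]

    def on_activity_event(self, event):
        """Enrich a stream event with the latest table state."""
        profile = self.table.get(event["user_id"], {})
        return {**event, "profile": profile}
```

This is where the time dependence noted above shows up: the enrichment reflects the table state at processing time, so replaying old events against a newer table can give different results (the slowly changing dimension problem).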

Big Data

  • Data Analytics
  • ETL
  • Data warehouse 

Data Security

  • Encryption at Rest
  • Data Retention
  • Data classification
  • (Customer) Data isolation
  • Data Integrity
  • Data Availability
  • Data anonymisation
  • Backup and recovery
  • Database Security
  • Access Control
