Microservices

Microservices, though useful, come with a lot of baggage. Discretion is needed to decide whether they are really needed. Monoliths are not bad. Most likely, what is needed is clean interface separation between the components of the monolith.
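As a rough illustration of interface separation inside a monolith, here is a minimal Python sketch. The names `BillingService`, `InProcessBilling` and `checkout` are hypothetical and only serve the example; the point is that callers depend on a contract rather than on a concrete component.

```python
from typing import Protocol


class BillingService(Protocol):
    """Contract the rest of the monolith depends on (hypothetical name)."""

    def charge(self, customer_id: str, amount_cents: int) -> str: ...


class InProcessBilling:
    """In-process implementation that lives inside the monolith."""

    def __init__(self) -> None:
        self._receipts = []

    def charge(self, customer_id: str, amount_cents: int) -> str:
        receipt = f"{customer_id}:{amount_cents}"
        self._receipts.append(receipt)
        return receipt


def checkout(billing: BillingService, customer_id: str, amount_cents: int) -> str:
    # The caller only knows the interface, so the implementation could later
    # be replaced by a remote client without changing this function.
    return billing.charge(customer_id, amount_cents)
```

If billing ever does need to become its own service, only the implementation behind the interface has to change.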

A monolith can serve you for a long time, until you hit issues such as these:
  • Release velocity suffers because of dependencies between components. This slows down development, testing and deployment.
  • Components have different scaling characteristics and traffic patterns, so they use the resources of the underlying hardware unpredictably
    • Capacity planning becomes hard
    • Performance becomes unpredictable
    • Resource exhaustion happens frequently and randomly
  • A component needs to be developed and scaled independently and made available as a service

Microservices take a heavy toll on SRE. Without the required automation and SRE firepower, it is really hard to keep the entire system sane. As microservices proliferate, the problem grows manifold because of the web of inter-service traffic.

Microservices need to be implemented with discretion. Keep the following considerations in mind.

Architecture        




Twelve-Factor App

https://12factor.net


Availability

How is the service fault tolerant?


Scalability

How does the service scale horizontally and vertically?


Statelessness

Is the service stateless?


Async

Can it use Lambda / Async services?


Security Considerations

2FA, HTTPS, Tokens, Encryption, GDPR, Penetration testing, App testing


API

Contracts, Versioning, Dependency


Network

• Proxy

• Sync, Async, Batch 

• Multithreaded, Event based, Coroutine


Load Handling

• Load balancer

• Circuit breaker

• Throttling
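To make the circuit breaker item concrete, here is a minimal Python sketch. It is one simple variant, not a reference implementation: the class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so it can be tested
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down has elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function talks to the downstream service, and treat `CircuitOpenError` as a signal to fall back or shed load.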


Replication

Consistency


Data

• Transactions across services 

• Partitioning

• Schema, Metadata, Evolution

• Indexing, Querying

• DB type


Caching

• Object caching

• Page Caching


Service Mesh

• Istio


Shutdown

Graceful shutdown
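A minimal Python sketch of graceful shutdown, assuming the process is stopped with SIGTERM (which is what Docker and Kubernetes send by default) and that work is pulled by a hypothetical `handle_one` callable that returns True when it processed something. The handler only sets a flag; the loop finishes the current item and then exits.

```python
import signal
import threading

shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    # Only set a flag here; the serving loop does the actual wind-down.
    shutdown_requested.set()


def install_handlers():
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def serve(handle_one, poll_interval=0.1):
    """Process work until shutdown is requested; return the number handled."""
    handled = 0
    while not shutdown_requested.is_set():
        if handle_one():
            handled += 1
        else:
            # Nothing to do: sleep, but wake up immediately on shutdown.
            shutdown_requested.wait(poll_interval)
    return handled
```

After `serve` returns, the process can flush buffers and close connections before exiting, ideally within the grace period the orchestrator allows.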


i18n Considerations


SRE




Backup / Restore

• RPO - Recovery Point Objective

• RTO - Recovery Time Objective


Reliability

• MTTF - Mean time to failure

• MTTR - Mean time to recovery

• MTBF - Mean time between failures

• Uptime

• Fault tolerance
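The reliability terms above relate to each other through a few simple formulas. The Python sketch below uses the common convention for repairable systems, where MTBF = MTTF + MTTR and availability = MTTF / (MTTF + MTTR); some teams use MTBF and MTTF interchangeably, so treat the convention as an assumption. The function names are illustrative.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures: time to fail plus time to recover."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is up, between 0 and 1."""
    return mttf_hours / (mttf_hours + mttr_hours)


def allowed_downtime_minutes(availability_target, period_days=30):
    """Downtime budget for a period at a given availability target."""
    return (1 - availability_target) * period_days * 24 * 60
```

For example, a service that runs 999 hours between failures and takes 1 hour to recover is about 99.9% available, which leaves roughly 43 minutes of downtime in a 30-day period.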


Performance / SLAs

• SLOs - Service Level Objectives

• Response time

• Latency

• Throughput

• Uptime
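Latency objectives are usually stated as percentiles rather than averages, for instance "95% of requests complete within 200 ms". A minimal Python sketch of such a check follows; it uses the nearest-rank percentile method, and the function names and thresholds are assumptions for illustration only.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(latencies_ms, pct, threshold_ms):
    """True if the pct-th percentile latency is within the threshold."""
    return percentile(latencies_ms, pct) <= threshold_ms
```

In practice the samples would come from the monitoring system rather than an in-memory list, and most monitoring tools can compute percentiles directly.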


Release Management

Change Management

Config Management

• Zero-downtime upgrades

• Rolling deployments

• Automated deployments


Container and Orchestration

Docker / Docker Swarm or Kubernetes (K8s)


Dev / QA environment

Automated Dev / QA environments


CI/CD pipeline

CodeDeploy, CircleCI, Codeship, Jenkins


Upgrades / (0 Downtime)

Zero downtime upgrade, Rolling upgrades, Canary rollout


Deployment

Ansible / Puppet


Service Monitoring & Alerting

Pingdom, Nagios, CloudWatch, Prometheus, DataDog


Logging

Logstash, Fluentd


Cost

Cost tags, Analytics, Cost structure, Reserved Instances, Projections, Cost Optimisations

(Tools like Botmetrics)


Capacity Planning



Security

IAM Roles, Encryption, HTTPS


Networking

Diagram, VPC


Fleet management

Tagging, AMI images, Versions, Upgrades, Consolidation, Pruning


Incident Management and Incident Response

Outages, Load Management, Latency, Security Incidents


Process Management

Process group, Process monitoring 


OnCall

PagerDuty, VictorOps


Versioning and Packaging


Dev Process




Git Flow

Branching and Development process


API Docs

Swagger


Sentry

Error monitoring


Metrics

Concurrency, System metrics, Engineering Metrics


Testing

• Automation

• API testing

• Integration and load testing

• Unit testing

• Deployment testing

• Checklist

• Regression

General




Language Version

E.g. Python 3.x / Java 7


Framework Version

E.g. Django version


Library Version

E.g. PyMongo version


Licenses

Apache, MIT, GPL

Others




Metrics

Deployment Frequency

% of failed Deployments

Time from Checkin to Deployment
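These three metrics are easy to compute once each deployment is recorded with its check-in time, deploy time and outcome. Here is a minimal Python sketch; the `Deployment` record and its field names are hypothetical, and the median is used for lead time as one reasonable choice among several.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    checked_in_at: datetime
    deployed_at: datetime
    failed: bool


def deployment_metrics(deployments, period_days):
    """Summarise a non-empty list of deployments made over period_days days."""
    lead_seconds = [
        (d.deployed_at - d.checked_in_at).total_seconds() for d in deployments
    ]
    failed = sum(1 for d in deployments if d.failed)
    return {
        "deployments_per_day": len(deployments) / period_days,
        "failed_percent": 100 * failed / len(deployments),
        "median_lead_time": timedelta(seconds=median(lead_seconds)),
    }
```

The records themselves would typically be exported from the CI/CD pipeline.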

