Service-level Agreements (SLA)
If you are studying for the Microsoft Azure Fundamentals exam, this guide will help you with quick revision before the exam. It can be used as study notes for your preparation.
- Formal documents to define the performance standards that apply to Azure.
- They also specify what happens if a service or product fails to perform to its governing SLA's specification.
- There are SLAs for individual Azure products and services.
- ❗ Azure does not provide SLAs for most services under the Free or Shared tiers
- e.g. Azure Advisor
- Three key characteristics of SLAs for Azure products and services:
- Performance Targets
- Specific to each Azure product and service.
- E.g. uptime guarantees or connectivity rates
- Uptime and Connectivity Guarantees
- 📝 Monthly Uptime % =
(Maximum Available Minutes − Downtime) / Maximum Available Minutes × 100
- 📝 Ranges from 99.9% (“three nines”) to 99.999% (“five nines”) for any paid tier service.
- In other words, the minimum SLA for all non-free Azure services is 99.9%.
- E.g. the Azure Cosmos DB (database) service SLA offers 99.999 percent uptime.
- This allows for about 5 minutes of total downtime per year.
- It also includes low-latency commitments of less than 10 milliseconds on DB read and write operations.
- 📝 Service credits
- Given to paying Azure customers if uptime percentage is lower than given in SLA.
- Describes how Microsoft will respond if an Azure product or service fails to perform to its governing SLA's specification.
- E.g. customers may have a discount applied to their Azure bill, as compensation for an under-performing Azure product or service.
- Read more: SLA Summary for Azure Services
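The Monthly Uptime % formula above can be sketched in a few lines of Python. This is only an illustration: the function name, the 30-day month, and the 50 minutes of downtime are assumptions chosen for the example, not values from any Azure SLA document.

```python
def monthly_uptime_percent(max_available_minutes: float, downtime_minutes: float) -> float:
    """Monthly Uptime % = (max available minutes - downtime) / max available minutes * 100."""
    if max_available_minutes <= 0:
        raise ValueError("max_available_minutes must be positive")
    if not 0 <= downtime_minutes <= max_available_minutes:
        raise ValueError("downtime_minutes must be between 0 and max_available_minutes")
    return (max_available_minutes - downtime_minutes) / max_available_minutes * 100


# Assumed example: a 30-day month (43,200 minutes) with 50 minutes of downtime.
minutes_in_month = 30 * 24 * 60
uptime = monthly_uptime_percent(minutes_in_month, 50)
meets_three_nines = uptime >= 99.9

print(f"{uptime:.3f}%")      # 99.884%
print(meets_three_nines)     # False
```

In this assumed month the service falls below a 99.9% target, which is the situation in which service credits would come into play.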
Composite SLA
- Result of combining SLAs across different service offerings.
- 📝 Calculating downtime
- E.g. web app (99.95% SLA from Azure) writes to SQL database (99.99% SLA from Azure)
- Composite SLA =
99.95 percent × 99.99 percent ≈ 99.94 percent
- As probabilities: 0.9995 × 0.9999 ≈ 0.9994
- This means the combined probability of failure is higher than that of either individual service.
- You can improve the composite SLA by creating independent fallback paths.
- E.g. if the SQL Database is unavailable, you can put transactions into a queue for processing at a later time.
- Web app (99.95%) writes to either SQL Database (99.99%) or queue (99.9%)
- Application is still available even if it can’t connect to the database.
- ❗But it fails if both the database and the queue fail simultaneously.
- The expected fraction of time in which both fail simultaneously is 0.0001 × 0.001.
- The composite SLA for this combined path of a database or queue would therefore be:
1.0 − (0.0001 × 0.001) = 99.99999 percent
- If we add the queue to our web app, the total composite SLA is:
99.95 percent × 99.99999 percent = ~99.95 percent
- Improves SLA but application logic gets more complicated
- You are paying more to add the queue support and there may be data-consistency issues you’ll have to deal with due to retry behavior.
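The composite SLA arithmetic in this section can be reproduced with a short Python sketch. The helper names `serial_sla` and `fallback_sla` are made up for this example, and the calculation assumes the services fail independently of each other.

```python
from math import prod


def serial_sla(*slas: float) -> float:
    """Every component must be up, so the availabilities are multiplied."""
    return prod(slas)


def fallback_sla(*slas: float) -> float:
    """Independent alternatives: the path fails only if every alternative fails."""
    return 1.0 - prod(1.0 - s for s in slas)


# SLA values from the example above, expressed as probabilities.
web_app, sql_db, queue = 0.9995, 0.9999, 0.999

without_queue = serial_sla(web_app, sql_db)      # web app -> SQL database
db_or_queue = fallback_sla(sql_db, queue)        # SQL database OR queue
with_queue = serial_sla(web_app, db_or_queue)    # web app -> (database or queue)

print(f"{without_queue:.4%}")  # 99.9400%
print(f"{db_or_queue:.5%}")    # 99.99999%
print(f"{with_queue:.4%}")     # 99.9500%
```

The fallback path raises the composite figure from about 99.94% to about 99.95%. The web app's own 99.95% SLA is now the limiting factor.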
Application SLA
- By creating your own SLAs, you can set performance targets to suit your specific Azure application.
- 💡 For SLA performance targets of four nines (99.99%) or higher:
- Manual intervention after failures may not be enough (it is difficult to respond quickly enough).
- The solution should be self-diagnosing and self-healing.
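To see why manual intervention becomes impractical at four nines, it helps to turn an SLA target into a downtime budget. The sketch below does that. The function name, the 30-day month, and the 365-day year are assumptions made for the example.

```python
def downtime_budget_minutes(sla_percent: float, period_minutes: float) -> float:
    """Minutes of downtime allowed in a period while still meeting the SLA target."""
    return period_minutes * (1 - sla_percent / 100)


# Assumed period lengths for illustration.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60   # 43,200
MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600

for target in (99.9, 99.95, 99.99, 99.999):
    per_month = downtime_budget_minutes(target, MINUTES_PER_30_DAY_MONTH)
    per_year = downtime_budget_minutes(target, MINUTES_PER_YEAR)
    print(f"{target}%: {per_month:.2f} min/month, {per_year:.1f} min/year")
```

Under these assumptions a 99.99% target leaves roughly 4.3 minutes of downtime per 30-day month. That is little time for a person to notice, diagnose, and fix a failure by hand.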
Resiliency
- Resiliency is the ability of a system to recover from failures and continue to function.
- High availability and disaster recovery are two crucial components of resiliency
- 📝 Disaster recovery: when Godzilla destroys your data center, you have alternative locations to keep providing your service, plus the protocols and means for those locations to take over delivery.
- Failure Mode Analysis (FMA)
- Goal:
- Identify possible points of failure.
- Define how the application will respond to those failures.
- Read more: Designing resilient applications for Azure
High availability
- 📝 Availability is often given as percentage uptime
- Refers to the time that a system is functional and working.
- Most providers prefer to maximize the availability of their Azure solutions by minimizing downtime.
- ❗ As you increase availability, you also increase the cost and complexity of your solution.
- As your solution grows in complexity, you will have more services depending on each other.
- You might overlook possible failure points in your solution if you have any interdependent services.
- 💡E.g. a workload that requires 99.99 percent uptime shouldn’t depend upon a service with a 99.9 percent SLA.
- Read more: Availability choices for Azure compute
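The rule of thumb about dependencies can be checked mechanically. The sketch below flags any dependency whose SLA is below the uptime the workload requires. The function name, dependency names, and SLA values are hypothetical and only serve the example.

```python
def weak_dependencies(required_sla: float, dependency_slas: dict[str, float]) -> list[str]:
    """Return the names of dependencies whose SLA is below the required uptime."""
    return [name for name, sla in dependency_slas.items() if sla < required_sla]


# Hypothetical dependencies and SLA percentages, for illustration only.
deps = {"database": 99.99, "message-queue": 99.9}

print(weak_dependencies(99.99, deps))  # ['message-queue']
```

Passing this check is necessary but not sufficient. Because SLAs multiply across dependent services, the composite figure can still fall below the target even when every dependency meets it on its own.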