Scalability is like black magic. Dark, mysterious, powerful, and unforgiving. Nonetheless, it’s magic worth mastering.
Scalability problems appear when an application starts receiving more users than anticipated. At that point, the tech team starts looking for someone who can magically add more nodes to the system and scale the application.
But that’s the wrong way to approach scalability. Scalability decisions should be built into the application right from the beginning, even before you finalize the technologies for the frontend and backend.
Before jumping into more details, let’s first understand the what and why of scalability.
What is scalability?
Scalability, in terms of an application, is its ability to keep functioning well as the number of users or the scope of work grows. But scalability can be applied to any system: to organizations, or even teams.
When scalability is done right, the application smoothly handles increasing user traffic without performance issues. If we lay the foundation correctly, we won’t even need major changes to the code or the server architecture later on.
Why does scalability matter?
Think of an application on your phone that you love. Now imagine this application giving you trouble: frequent crashes, poor customer service, and unnecessary in-app updates. What will you do in that case? Will you continue to use the same application or start looking for better options?
I am guessing it’s the second one. Scalability problems look somewhat like this. When the developers of an app see that their user base has suddenly grown and they need more servers to meet the increased demand, they go helter-skelter, trying every possible method to close the performance gaps. If by that time users have already faced multiple issues, they’ll uninstall the application and switch to a competitor’s app.
On the other hand, scalable applications are designed to seamlessly handle explosive growth. They are more user-friendly and give a competitive edge over applications that are not scalable. They have better performance, a higher ROI, and happy users.
So where do we start?
We start from the foundation. We start with our ‘why’, not ‘what’.
I have often heard people defining scalability with vendor names or technologies. But that’s just the ‘what’. A service does not define architecture.
Think of it like this: a building’s architect doesn’t describe the beams, rafters, and metal bars by referring to the vendor’s company. She describes them in terms of size, load capacity, and so on.
In the same way, for a software architecture, we first need to ask: How many users will our platform serve? Which areas or modules of the application will users use the most? Will there be predictable traffic spikes, and if so, when will traffic peak: around New Year, Thanksgiving, Christmas, or over the weekends? How many external dependencies does our application have, and how scalable are they? What’s the ratio of reads to writes? It’s a process of asking and contemplating the right set of questions.
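To make these questions concrete, here is a back-of-the-envelope capacity estimate. Every number below (users, request rates, per-instance throughput) is an illustrative assumption, not AWS guidance:

```python
import math

def estimate_instances(daily_active_users: int,
                       requests_per_user_per_day: int,
                       peak_multiplier: float,
                       requests_per_second_per_instance: float) -> int:
    """Rough estimate of how many app servers the peak load needs."""
    avg_rps = daily_active_users * requests_per_user_per_day / 86_400
    peak_rps = avg_rps * peak_multiplier          # e.g. a holiday spike
    return max(1, math.ceil(peak_rps / requests_per_second_per_instance))

# Assumed: 50,000 users, 40 requests each per day, a 5x seasonal peak,
# and 100 req/s handled per instance.
print(estimate_instances(50_000, 40, 5.0, 100))   # -> 2
```

Crude as it is, an estimate like this turns “how many users?” into a first guess at fleet size, which the auto-scaling machinery discussed later can then refine at runtime.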
After answering all of those questions, we finally choose specific vendors/tools/technologies. Based on their price, reputation, and quality, we decide on the one we want to go ahead with.
Let’s assume that we have shortlisted AWS as our platform. What now? How do we proceed from there?
Why AWS services?
AWS infrastructure is widely used because of the breadth and depth of its services: more than 175 services across compute, storage, database, analytics, networking, mobile, developer tools, management tools, IoT, security, and enterprise applications.
How to create a scalable application using AWS?
In this blog, we will design a scalable architecture for a small healthcare web application where doctors and patients can interact. Patients can share their protected health information (PHI), along with images, to discuss their ailments in detail. On the architecture side, we need to ensure that we follow HIPAA compliance and other security standards.
1. First, we need to sign a Business Associate Addendum (BAA) using AWS Artifact Agreements. This is required because we are subject to the Health Insurance Portability and Accountability Act (HIPAA) and must safeguard protected health information (PHI). The BAA designates an AWS account that can legally process PHI. Let’s save the deeper security discussion for another day.
2. We will deploy our web application on an EC2 instance backed by an RDS MySQL database. Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud, and it is the core compute component of our stack.
3. To store patient images, we will use an S3 bucket. It is designed for large-capacity file storage in one geographical region, and its bandwidth costs are quite low. To use Amazon S3 in a HIPAA-compliant manner, we must be vigilant about all technical safeguards around S3 access, data security, and transmission. We can apply the principle of least privilege, for instance by granting access only to the necessary users through IAM and bucket policies. S3 also offers a variety of options to encrypt data at rest: we can use server-side encryption (SSE) with the SSE-S3, SSE-KMS, or SSE-C options, and enable default encryption with any of them. We can also opt in to client-side encryption, and enforce TLS to protect data in transit. One thing to keep in mind is to avoid putting PHI in bucket names, object names, or metadata, because this data is not encrypted by S3 server-side encryption and is generally not encrypted in client-side encryption architectures.
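As a small illustration of that last point, here is a hypothetical pre-upload check that rejects object keys which appear to embed identifiers. The patterns are assumptions for this sketch only; real PHI detection is far more involved and should not rely on a couple of regexes:

```python
import re

# Illustrative patterns only; real PHI screening needs much more than this.
SUSPECT_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # looks like a US SSN
    re.compile(r"\b\d{10}\b"),              # looks like a phone/MRN number
]

def safe_object_key(key: str) -> bool:
    """Return True if the proposed S3 object key matches none of the
    suspect identifier patterns."""
    return not any(p.search(key) for p in SUSPECT_PATTERNS)

print(safe_object_key("uploads/2024/scan-001.png"))         # anonymized key -> True
print(safe_object_key("uploads/john-doe-123-45-6789.png"))  # SSN-like id -> False
```

A gate like this would run in the application before any `PutObject` call, so identifiers never end up in unencrypted key names or metadata.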
4. To cache in-memory data, we will use ElastiCache for Redis, which is a HIPAA-eligible service. To make our solution scalable from the start, we will offload scaling responsibilities to Elastic Load Balancers, Auto Scaling groups, and Availability Zones. They are born to shoulder exactly this kind of responsibility.
Let’s talk through these components one by one to understand their role in the architecture.
As I mentioned earlier, we will store patient images in an S3 bucket. Our application could be used from different countries and regions, where healthcare providers log in to the web application and share or access documents during a patient-centric conversation. Therefore, we need something that distributes content globally, seamlessly and securely, with low latency and high transfer speeds.
To fulfill this need, we place CloudFront in front of S3. So, what does CloudFront do? Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds. It proxies and caches web content at edge locations as close to users as possible.
Currently, there are over 150 edge locations. An edge location is not a Region but a smaller site that AWS operates, used for caching content. Edge locations are usually placed in major metropolitan cities across the globe to distribute content to end users with reduced latency. An example of this is Amazon Prime Video, which uses AWS to deliver a solid streaming experience to more than 18 million football fans.
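Conceptually, an edge cache is a key-value store with a time-to-live (TTL): content is fetched from the origin only on a miss or after the cached copy expires. A minimal sketch of that idea (not CloudFront’s actual implementation):

```python
import time

class EdgeCache:
    """Toy model of an edge location: the origin is contacted only on a
    cache miss or after the cached copy's TTL has expired."""
    def __init__(self, ttl_seconds: float, fetch_from_origin):
        self.ttl = ttl_seconds
        self.fetch = fetch_from_origin
        self.store = {}        # path -> (content, expiry_time)
        self.origin_hits = 0

    def get(self, path: str):
        entry = self.store.get(path)
        if entry and entry[1] > time.monotonic():
            return entry[0]                        # served from the edge
        self.origin_hits += 1
        content = self.fetch(path)                 # fall back to the origin (e.g. S3)
        self.store[path] = (content, time.monotonic() + self.ttl)
        return content

cache = EdgeCache(ttl_seconds=60, fetch_from_origin=lambda p: f"bytes of {p}")
cache.get("/images/scan.png")
cache.get("/images/scan.png")   # second request is a cache hit
print(cache.origin_hits)        # the origin was contacted only once -> 1
```

The payoff is exactly what the blog describes: repeated requests for the same image are answered near the user, and the origin bucket sees a fraction of the traffic.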
5. An Elastic Load Balancer automatically distributes the incoming traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones. It monitors the health of its registered targets and routes traffic only to the healthy targets.
To understand it better, let’s assume our website is running on a single EC2 web server. After a few weeks, the website goes viral and we receive an unanticipated influx of traffic that overwhelms our current application. To serve the increased demand, we want to run two or more instances of our application on different EC2 servers, and we want the traffic across those servers to be balanced without any human intervention.
The advantage of using a load balancer is that we can add or remove resources as usage changes without disrupting the overall flow of user requests. It therefore increases the availability, fault tolerance, and scalability of our applications.
6. Auto Scaling group: it exists so that we can add machines when traffic grows and shut down extra machines when traffic drops. An EC2 Auto Scaling group has three main components: groups, configuration templates, and scaling options. Our EC2 instances are organized into groups as a logical unit for the purposes of scaling. When we create a group, we specify its minimum, maximum, and desired number of EC2 instances.
Let’s understand what minimum, maximum, and desired mean. Suppose our web application runs on a single instance and we want the average CPU utilization of the Auto Scaling group to stay around 65 percent. When utilization rises beyond that, the group adds another instance to the fleet, up to the maximum size we defined when creating the group. The minimum capacity is the smallest number of instances we want running our application at all times. The desired capacity is the number of instances we want to start with, either when we create the group or at a later stage.
We can also define a scaling policy to get the best out of our resources.
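The 65-percent example above can be written down as a simple proportional rule, which is roughly the idea behind target-tracking scaling. This is a sketch of the arithmetic, not AWS’s actual algorithm (which also smooths metrics and applies cooldowns):

```python
import math

def desired_capacity(current_instances: int, current_cpu: float,
                     target_cpu: float = 65.0,
                     minimum: int = 1, maximum: int = 10) -> int:
    """Scale the fleet so average CPU moves toward the target,
    clamped to the group's min/max bounds."""
    needed = math.ceil(current_instances * current_cpu / target_cpu)
    return max(minimum, min(maximum, needed))

print(desired_capacity(2, 90.0))   # CPU well above target -> scale out to 3
print(desired_capacity(4, 20.0))   # CPU well below target -> scale in to 2
```

The min/max clamp is exactly why the group asks for those two numbers up front: it bounds both the cost (maximum) and the baseline availability (minimum) no matter what the metric does.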
7. Availability Zones: AWS organizes its infrastructure into Regions, such as US East, US West, Asia Pacific, Europe, and Canada. A Region is a physical location around the world where AWS clusters data centers, and each group of logical data centers within it is called an Availability Zone. Availability Zones are therefore multiple, isolated locations within each Region.
In addition, we would be using Amazon Relational Database Service (Amazon RDS) MySQL as a database solution. RDS makes it easy to set up, operate, and scale a relational database in the cloud. It is cost-efficient and has a resizable capacity.
As we have provisioned a Multi-AZ RDS DB instance, it automatically creates a primary DB instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. Several Amazon RDS engines allow us to add read replicas for increased scalability and to maintain database availability in the case of an AZ failure.
We can also set up read replicas. By creating read replicas, we can offload the read queries of our website from the primary (‘master’) DB instance to those replica instances.
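The read-replica pattern implies a small routing decision in the application: writes must go to the primary, while reads can fan out across replicas. A minimal sketch of such a router (the endpoint names are made up for illustration):

```python
import itertools

class ReadWriteRouter:
    """Toy router: writes go to the primary endpoint, reads round-robin
    across replica endpoints."""
    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self.replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas is not None:
            return next(self.replicas)
        return self.primary

router = ReadWriteRouter("primary.db", ["replica-1.db", "replica-2.db"])
print(router.endpoint_for("SELECT * FROM patients"))         # -> replica-1.db
print(router.endpoint_for("UPDATE patients SET name = 'x'")) # -> primary.db
print(router.endpoint_for("SELECT * FROM visits"))           # -> replica-2.db
```

One caveat worth knowing: replicas receive changes asynchronously, so a read issued immediately after a write may see slightly stale data; reads that must be fresh should be sent to the primary.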
8. ElastiCache: now, let’s talk about the last component. Why did we choose it? Why can’t we store data in sessions on a running web application, or in the file system of the running web application server?
The answer is simple. When we talk about creating scalable applications, we should start asking what-if questions. For example, what if the running application instance dies? What if it stops responding? In those cases, we cannot recover the in-process, in-memory data unless we have a backup solution. Therefore, we should store such data outside the running application session, so that if one process dies, another instance can serve the request by fetching the data from a central location.
Hence, we chose Amazon ElastiCache for Redis. It is a fully managed AWS service that makes it easy to deploy, manage, and scale a high-performance distributed in-memory datastore cluster. This significantly decreases the operational overhead of maintaining machines, patching software, monitoring, recovering from failures, and taking backups. Moreover, it is a HIPAA-eligible service included in the AWS Business Associate Addendum (BAA), which means we can use it to store, process, and access protected health information (PHI) and power secure healthcare applications.
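The what-if reasoning above boils down to one pattern: session state lives in a shared external store rather than inside any one process. Here is a minimal sketch, with a plain dictionary standing in for Redis (the helper names are invented for this example):

```python
class SessionStore:
    """Stand-in for ElastiCache for Redis: a shared, external key-value
    store, so any application instance can serve any user's session."""
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

shared = SessionStore()   # lives outside every app instance

def handle_login(store, instance, user):
    store.set(f"session:{user}", {"user": user, "logged_in_via": instance})

def handle_request(store, instance, user):
    return store.get(f"session:{user}")

handle_login(shared, "app-1", "dr-smith")
# Suppose "app-1" now dies; "app-2" still serves the session from the shared store.
print(handle_request(shared, "app-2", "dr-smith")["user"])   # -> dr-smith
```

With a real Redis cluster behind this interface, losing an EC2 instance loses no sessions, which is exactly what lets the Auto Scaling group add and remove instances freely.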
To summarise, we first need to understand the true need for scalability: where it applies in our application, how much of it we want, and so on. We can then use this architecture as a base and build upon it.