Key Takeaways
- Adoption of managed relational databases has surged recently because they simplify hosting, scaling, and cost management.
- Users need to monitor service costs, including egress fees, and revise default settings for their workloads.
- Users should understand the operational costs that remain even when using a managed service.
- Users should learn about the limitations, such as reduced flexibility and observability.
- Users should make an informed decision about when a managed database solution is the right fit.
In 2024, cloud computing is everywhere, often unnoticed (think iCloud or Google Docs); it has become as ubiquitous as real clouds. Its advantages, such as elasticity, scalability, and ease of use, are well understood at this point: they reduce the time to market for new products and address the scaling challenges of existing ones without an arduous planning and procurement process.
Because of these advantages, we have seen a massive demand for managed services for databases, message queues, application runtime, etc. However, this article is about the less discussed side of cloud computing: the hidden cost of using managed services, specifically managed relational databases.
As a database practitioner at Cloudflare and a builder of Omnigres, I have experience developing, managing, and operating databases in fully on-prem, public cloud, and hybrid environments. From a business perspective, each model has its pros and cons. Once a company adopts a public cloud, using any managed service is fairly trivial, and databases are just one click away.
Ease of use is the gateway to a service. For the most part, it just works, so why not keep using it, or go a step further and create more of them?
Cost - Actual Dollars
Managed databases from cloud providers offer a lot of value: they run the databases, back them up, monitor them, and take care of high availability. I presented at SCaLE20x on the challenges of building an in-house managed database service; offloading that work to a provider reduces operational costs and time to market and brings more flexibility. To offer these benefits, a provider charges its users.
First, calculating how much a managed database will cost isn’t straightforward. The cost depends on multiple factors (a rough estimate is sketched after this list), such as:
- Instance size and class (small, large, extra large)
- Pricing model (on-demand, reserved)
- Storage (general purpose, provisioned IOPS, actual IOPS consumed)
- Data transfer costs (inside/outside VPC, inter/intra region)
- Instance engine (PostgreSQL, MySQL, SQL Server, etc.)
- Backup storage frequency and retention
- Deployment type (single/multi-AZ, serverless)
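To make this concrete, here is a back-of-the-envelope sketch in Python. Every rate in it is an assumed placeholder for illustration, not any provider’s actual price; substitute figures from your provider’s price sheet.

```python
# Back-of-the-envelope monthly cost estimate for a managed database.
# Every rate below is an assumed placeholder, not a real price sheet.
HOURS_PER_MONTH = 730

instance_rate = 0.35   # $/hour, hypothetical mid-size instance
multi_az = True        # a synchronous standby roughly doubles compute cost
storage_gb = 500
storage_rate = 0.115   # $/GB-month, general-purpose SSD (assumed)
piops = 3000
piops_rate = 0.02      # $/provisioned IOPS-month (assumed)
egress_gb = 200
egress_rate = 0.09     # $/GB internet egress (assumed)
backup_gb = 300
backup_rate = 0.095    # $/GB-month beyond the free allowance (assumed)

compute = instance_rate * HOURS_PER_MONTH * (2 if multi_az else 1)
total = (compute
         + storage_gb * storage_rate
         + piops * piops_rate
         + egress_gb * egress_rate
         + backup_gb * backup_rate)
print(f"Estimated monthly cost: ${total:,.2f}")  # -> $675.00 with these inputs
```

Even this toy model shows how a single toggle, such as multi-AZ, can double the largest line item.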
Even though the pricing is complex, it’s quantifiable, and third-party tools make it easier to calculate. Cost optimizations such as disabling multi-AZ and stopping instances in development environments are also quite common. Companies such as Walmart have started moving toward a hybrid cloud, while smaller companies like Basecamp have migrated the majority of their services off the cloud, mainly for cost reasons.
To judge whether a managed service is worth its cost, one must understand one’s usage pattern. The major benefit of the cloud is flexibility; those who don’t need it may be better off operating databases on their own hardware. Let’s go over other areas where the cost is more subjective and harder to measure.
Runaway Workloads - Paying for Nothing Useful
One of cloud computing’s unique value propositions is scalability: if a website or product becomes an overnight hit, there is no need to procure infrastructure to support the workload. That’s great, but there is a catch. Imagine a runaway or rogue workload hitting the database; since many cloud providers charge based on IOPS, CPU time, and similar metrics, such a workload can generate a huge bill while doing no useful work. One session-level guardrail is sketched below.
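Postgres, for example, lets you cap per-statement runtime. Here is a minimal sketch assuming a Postgres-compatible service and the psycopg2 driver; the connection string and table name are placeholders.

```python
# Cap per-statement runtime so a rogue query cannot burn CPU/IOPS (and
# dollars) indefinitely. Assumes a Postgres-compatible managed service and
# psycopg2; the DSN and table name are placeholders.
import psycopg2

conn = psycopg2.connect("postgresql://app_user:secret@db.example.com:5432/app")
conn.autocommit = True
with conn.cursor() as cur:
    # Abort any statement in this session that runs longer than 5 seconds.
    cur.execute("SET statement_timeout = '5s'")
    cur.execute("SELECT count(*) FROM orders")  # hypothetical table
    print(cur.fetchone())
conn.close()
```

The same cap can be applied per role with ALTER ROLE ... SET statement_timeout, and most providers also offer budget or billing alerts worth enabling.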
Egress Fees - Getting Data In Is Easy; Getting It Out Is Not
In a multi-cloud or hybrid cloud setup, services need to communicate over a network between different providers. Typically, there is no data transfer cost for bringing data into a managed database (ingress), but getting data out (egress) comes at a price. Egress fees are a significant cost factor for businesses that move data out of their managed database service; in a sense, they incentivize users not to migrate their data away from the provider. A rough sense of the magnitude is sketched below.
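For scale, here is a sketch assuming a list rate of $0.09/GB, which is in the ballpark of published internet egress pricing for large cloud regions; actual rates vary by provider, region, and volume tier.

```python
# Rough egress estimate. The rate is an assumption in the ballpark of
# published internet egress pricing; check your provider's price sheet.
RATE_PER_GB = 0.09
data_tb = 10  # hypothetical migration size
cost = data_tb * 1024 * RATE_PER_GB
print(f"Moving {data_tb} TB out costs roughly ${cost:,.0f}")  # -> $922
```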
Providers such as Cloudflare understood this challenge and created the Bandwidth Alliance, which discounts or waives data transfer costs between participating providers. Recently, Google Cloud eliminated data transfer fees for migrating data to another cloud provider. The practice is unfair enough that regulators in the EU and the UK are actively investigating it.
Operational Costs - There Are Still Things to Do
While the service provider takes care of Day 0 operations, Day 1 and Day 2 challenges remain. It is unreasonable to expect a provider to solve all operational challenges, but it’s good to be aware of what those operations look like and what they cost.
a) Secondary Backups
Data is the core of the business; I’d argue that any software business can be rebuilt if its data is intact. As a database engineer, losing data is by far my biggest nightmare, and being paranoid about backups is not a bad thing. Relying solely on the provider for backups is putting all the eggs in one basket. If the provider offers an SLA/SLO, that’s a nice add-on, but there is still a risk of the provider losing backups entirely.
Ultimately, the business is responsible to its end customers for protecting their data. Most mature organizations keep secondary backups outside their primary service provider. Making this happen costs actual dollars: storage and compute, data transfer, and engineering time. A minimal sketch of shipping a logical backup to a second provider follows.
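This sketch assumes Postgres, pg_dump, and an S3-compatible bucket at a second provider; every host name, bucket, and credential below is a placeholder.

```python
# Take a logical backup with pg_dump and ship it to an S3-compatible bucket
# at a *different* provider. Host names, bucket, and credentials are
# placeholders; assumes pg_dump on PATH and the boto3 package.
import datetime
import subprocess

import boto3

stamp = datetime.date.today().isoformat()
dump_file = f"/tmp/app-{stamp}.dump"

# Custom-format dumps compress well and support selective restore.
subprocess.run(
    ["pg_dump", "--format=custom", "--file", dump_file,
     "postgresql://app_user:secret@db.example.com:5432/app"],
    check=True,
)

# Any S3-compatible endpoint works here, keeping the copy off the primary cloud.
s3 = boto3.client("s3", endpoint_url="https://objects.other-provider.example")
s3.upload_file(dump_file, "secondary-db-backups", f"app/{stamp}.dump")
```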
b) Backup Restoration
The quality of a backup is determined by whether it can actually be restored; what are backups worth if they can’t be? Unfortunately, many providers do nothing on this front and leave it to their users. It’s understandably a complex problem, since providers don’t know every business’s needs. So users need to continuously test restoration, through automation or manually, to validate the integrity of the backups and the restoration procedure. A sketch of such an automated check follows.
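Continuing the pg_dump example above, the scratch instance, connection string, dump path, and sanity query here are all placeholders.

```python
# Restore the latest dump into a scratch database and run a sanity query.
# Connection strings, dump path, and the checked table are placeholders.
import subprocess

import psycopg2

SCRATCH = "postgresql://app_user:secret@scratch.example.com:5432/restore_test"

# pg_restore consumes the custom-format dump produced by pg_dump above.
subprocess.run(
    ["pg_restore", "--clean", "--if-exists", "--no-owner",
     "--dbname", SCRATCH, "/tmp/app-latest.dump"],
    check=True,
)

with psycopg2.connect(SCRATCH) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM orders")  # hypothetical table
    rows = cur.fetchone()[0]
    assert rows > 0, "restore produced an empty table; backup may be bad"
    print(f"Restore check passed: {rows} rows")
```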
Services Getting Discontinued - It Happens
Unfortunately, as things evolve, some services get discontinued. Last year, MariaDB on Azure was retired, and Aurora Serverless V1 will no longer be supported after 2024. If the database is closed source, the only way out is whatever export tool the provider offers. Either way, the migration has to be architected to minimize data loss and service downtime. If the service is backed by an open-source database such as Postgres, or even an open protocol (e.g., the Postgres wire protocol), migrating is somewhat easier; still, database/data migrations are always painful.
Lack of Flexibility - One Can’t Have It All
Because managed services tend to focus on solving common problems, they can sometimes be limiting. Since the provider has to manage many services for thousands of customers, offering complete flexibility is cumbersome or impossible. This may not sound like an issue initially, but as the business grows, it can start to hurt. For example, Postgres has a huge extension ecosystem, yet many managed services allow only a subset of extensions to be installed. Open-source extensions such as pg_ivm (incremental view maintenance) and zombodb (making search easier within Postgres) are not supported on AWS and GCP, which can severely limit the features you can build or rely on. A quick way to check what an instance actually allows is sketched below.
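Before committing to a provider, it’s worth querying the instance itself: the standard pg_available_extensions catalog view lists what the instance will actually let you install. A minimal sketch, with a placeholder connection string:

```python
# Ask the instance which extensions it will actually let you install, via
# the standard pg_available_extensions catalog view. The DSN is a placeholder.
import psycopg2

with psycopg2.connect("postgresql://app_user:secret@db.example.com:5432/app") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT name, default_version FROM pg_available_extensions "
            "WHERE name IN ('pg_ivm', 'zombodb') ORDER BY name"
        )
        rows = cur.fetchall()
        if not rows:
            print("Neither extension is installable on this instance.")
        for name, version in rows:
            print(f"{name} is available at version {version}")
```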
Lack of Visibility - What’s Happening?
As an engineer, nothing frustrates me more than being unable to solve an engineering problem. To an extent, databases can be seen as a black box: most users treat them as a place to store and retrieve data and don’t necessarily care what’s happening inside all the time. Still, when something malfunctions, users are at the mercy of whatever tools the provider supplies for troubleshooting.
Providers generally run databases on top of some virtualization layer (virtual machines, containers), sometimes operated by an orchestrator (e.g., Kubernetes), and they don’t necessarily provide complete access to the server where the database is running. The multiple layers of abstraction don’t make the situation any easier; often, SQL-level introspection is the only window available, as sketched below.
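Here is a minimal sketch that lists long-running queries via Postgres’s pg_stat_activity view; the connection string is a placeholder.

```python
# List long-running queries via pg_stat_activity, often the only window
# available on a managed instance. The DSN is a placeholder.
import psycopg2

with psycopg2.connect("postgresql://app_user:secret@db.example.com:5432/app") as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT pid, now() - query_start AS runtime, state, query
            FROM pg_stat_activity
            WHERE state <> 'idle'
              AND now() - query_start > interval '1 minute'
            ORDER BY runtime DESC
        """)
        for pid, runtime, state, query in cur.fetchall():
            print(pid, runtime, state, (query or "")[:80])
```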
Providers withhold full access to keep users from "shooting themselves in the foot," but an advanced user will likely need elevated permissions to understand what’s happening across the stack and fix the underlying problem. This is the primary factor behind my choice to self-host software for maximum control, whether in my own data center or on foundational primitives like virtual machines and object storage, where I can create and manage my own services.
There are also healthy discussions around self-hosting vs. managed services in forums like Hacker News. One comment from such a discussion summarizes it eloquently:
"There are definitely some things to be considered here [self-hosting]. However, I find that most people drastically overestimate the amount of work associated with hosting things.
Also, they tend to underestimate the amount of work required when using managed solutions. For example, you’ll certainly want to do secondary backups and test restores even for managed options."
Another side effect I have noticed: when teams can’t identify the root cause, they tend to throw more money at the problem (increasing instance size), hoping it will solve their challenges. According to OtterTune, a company specializing in tuning database workloads, increasing instance types without expertly tuning configurations doesn’t bring proportional performance gains.
The visibility challenge bites regardless of skill level. For instance, Kyle Kingsbury, a distributed-systems specialist and the author of the Jepsen tests used to verify the safety and consistency of distributed systems, ran into a database replication issue while testing MySQL 8.0 and had to ask the service provider for support.
A growing trend is service providers depending on other managed providers to deliver their own solutions; frustration arises when the foundational provider fails to meet expectations or behaves poorly. The point is that there is not much one can do, even when paying hefty prices under a business SLA.
Tradeoffs
One theme runs throughout this article: tradeoffs. The purpose is not to deter anyone from using cloud computing or managed services, but to bring awareness to the costs involved, the fine line between staying open and getting locked in, the limited feature set, the lack of visibility, and the Day 2 operations that remain.
These are some of the areas that weren’t intuitive to me when I first started using managed database services. I hope this helps developers and operators make an informed decision.