We love Amazon Web Services at AdRoll. We’re one of their largest customers, with all of our production infrastructure and most of our testing/staging servers running in EC2. This is fantastic, as Amazon provides a wealth of tools in their cloud to handle access to their APIs.
Chief among these is the IAM Instance Profile functionality, which allows developers to assign roles to their EC2 instances that only allow access to certain S3 buckets, or certain EC2 API calls. In doing so, you can prevent applications from accidentally deleting other instances, and minimize damage if an attacker is able to break into the server. These roles provide temporary credentials that are automatically cycled out by Amazon’s backend after an hour, which eliminates the need to use permanent keys that can be stolen.
However, we also have developer workstations that run our software in various states of development and debugging, and the tools Amazon provides do not exist in this environment. For a long time, our recommended setup was permanent keys stored on-disk, which had full permissions to access any API.
The security implications of this were a nightmare - it was impossible to track which users were responsible for which API calls, and there was no way to scope permissions to what each developer actually needed to run the app. Rotating these keys required so much coordination that it could not be done regularly, and the keys sat dormant, waiting to be abused by any developer who had left the company.
Early Experiments
Several months ago, the Site Reliability Engineering team at AdRoll decided we’d had enough, and investigated various options for credentials management, each of which had flaws that prevented us from using it.
At first, we investigated seeding a config file containing a set of keys to each machine and rotating it automatically. Our workstations aren’t configured for any sort of provisioning like this, and we shook our heads at how much time and effort it would require to coordinate credential changes. In addition, it didn’t really fix the underlying problem: the credentials could still be stolen easily by a malicious actor who knew where to look.
Following this, we got word of a project called “aws-mock-metadata” that promised to mimic the Instance Profile interface for developers. It was a great start, and it provided temporary credentials to applications perfectly well, but its configuration required permanent keys to call the requisite AWS APIs. Ultimately it would put us back at square one while giving the illusion of proper functionality.
The final option we investigated and turned down was forcing developers to do their work on EC2 instances. We like the richness of local editing and debugging tools, most of which are lost in the server Linux environment. Setting this environment up would also require extensive infrastructure to manage development instances and prevent the terrible costs of runaway servers. We also didn’t want all of our developers to be sysadmins - we wanted a system that would be invisible to normal users, but have extensive options for users who needed powerful permissions.
Introducing Hologram
It was clear from our investigations that we needed a system that hadn’t been written yet - something that could provide us greater operational security around API access, without requiring developers to give up their painstakingly-configured workstation setups.
To bridge this gap, but still provide the same environment on our development machines as we have in production, we worked with Amazon’s Identity and Access Management (IAM) team and developed “Hologram”, a system that brings the Instance Profile mechanism to developer workstations. It emulates the same metadata service that distributes Instance Profile credentials, via the same HTTP endpoints, so you can use the same API key access code throughout your entire application lifecycle.
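To make the metadata interface concrete, here is a minimal Go sketch of what an application does when it reads Instance Profile credentials over HTTP. It is not taken from Hologram's source. The path is the standard EC2 metadata path for role credentials, while the names fetchCredentials, get, and fakeMetadata, the "developer" role name, and the sample key values are assumptions made for illustration. The sketch uses plain unauthenticated GET requests and runs against a local stand-in server so that it works anywhere; on EC2 (or, as described above, on a workstation running the Hologram agent) the base URL would be http://169.254.169.254.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// Credentials holds the fields of the metadata service's JSON reply that an
// application needs; other fields in the reply are ignored.
type Credentials struct {
	AccessKeyId     string
	SecretAccessKey string
	Token           string
	Expiration      time.Time
}

// credsPath is the standard metadata path for role credentials.
const credsPath = "/latest/meta-data/iam/security-credentials/"

// get performs a GET request and returns the response body as a string.
func get(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// fetchCredentials lists the available roles, takes the first one, and
// returns the temporary credentials served for it.
func fetchCredentials(baseURL string) (*Credentials, error) {
	roles, err := get(baseURL + credsPath)
	if err != nil {
		return nil, err
	}
	role := strings.TrimSpace(strings.SplitN(roles, "\n", 2)[0])
	body, err := get(baseURL + credsPath + role)
	if err != nil {
		return nil, err
	}
	var c Credentials
	if err := json.Unmarshal([]byte(body), &c); err != nil {
		return nil, err
	}
	return &c, nil
}

// fakeMetadata is a hypothetical local stand-in for the metadata service,
// serving one made-up role with made-up credentials.
func fakeMetadata() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc(credsPath, func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == credsPath {
			fmt.Fprint(w, "developer")
			return
		}
		fmt.Fprint(w, `{"Code":"Success","AccessKeyId":"ASIAEXAMPLE",`+
			`"SecretAccessKey":"secret","Token":"token",`+
			`"Expiration":"2030-01-01T00:00:00Z"}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakeMetadata()
	defer srv.Close()
	creds, err := fetchCredentials(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(creds.AccessKeyId, creds.Expiration.Year())
}
```

Because only the base URL differs between environments, the same lookup code can run unchanged in production and on a workstation, which is the property the paragraph above describes.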
To do this, Hologram requires a central server running in EC2 to generate and hand out credentials to the agents running on the developer machines. This separation of workstation agent and server - unique to Hologram - was key to providing a secure path to get developers API access without requiring permanent credential storage at any point in the system (as the central server can take advantage of the real Instance Profiles system).
Normally, Hologram hands out credentials for a “developer” role, configurable by the administrator, which all users have access to by default. In addition to this, users can choose to assume other roles, to emulate the exact permissions their production applications will have access to. Increased support for this is on the roadmap for future Hologram development.
Authorization
Hologram needs to know some information about each developer in your organization that will use it: their username, and the public component(s) of their SSH key(s). Taking a hint from the design of SSH itself, the user’s SSH keys are used in a signing exchange to verify their identity and fetch the correct permissions for them. This allowed us to easily add new users into the system and secure against unauthorized access.
Like many large organizations, AdRoll uses LDAP for internal authorization and management. This made it a natural fit for storing the needed bits about each user, so Hologram comes with baked-in support for communicating with LDAP servers (or any server like Active Directory that supports the LDAP protocol). Read/write support is needed as Hologram has the ability to add SSH keys for its users during the installation process.
The usernames assigned in LDAP are reflected in CloudTrail logs, allowing effective filtering and auditing of a user’s API actions. Users who were still using the default root keys did not show up in this list, allowing us to figure out who needed help installing Hologram.
Support is planned for storing permission scopes for individual developers in LDAP. As of now, Hologram simply hands out whatever permissions the “developer” role has. With this feature, individuals can be given a subset of the “developer” role’s access, allowing for secure treatment of interns, Business Intelligence personnel, and other analysts who might only need access to a single S3 bucket, or a single API call, for their work.
Configuration
Hologram is written in Go, Google’s “systems programming language” written by some of the original designers of C and Plan 9 from Bell Labs. It is a language explicitly designed for aggressive simplicity, both of programming itself and of deployment / operational concerns.
The ability to create a static binary with assets compiled in allowed us to do a sophisticated multi-stage rollout of Hologram to developers, which we credit with how quickly developers adopted it. Go allowed us to produce a version of the binary that had some placeholder credentials compiled in, that was first deployed to developers. This version would simply use the compiled-in credentials to generate temporary ones, and expose the same metadata interface that applications expected. This allowed us to accomplish the primary goal of this deployment, which was the removal of the root credentials from disk. Although they were compiled into the binary, it would at least take some work to acquire them and they were easily revoked in case this happened.
Once this rollout was complete, we had time to focus on the server component, and designing the protocol between the programs (detailed below). Subsequent rollouts removed the baked-in credentials from the system, and moved to using the backing servers’ own EC2 roles to create credentials passed to clients. In this way no permanent keys are used anywhere in the system.
This aggressive deployment strategy would not have been as easy to accomplish without the tools Go provided us. Because the workstation agent is a single binary, we did not have to worry about additional dependencies installed by the user, as we would have with Python. Everything was self-contained, and updating the entire agent was as trivial as replacing the program on-disk.
Communication
The agents and server communicate over a secured binary TCP protocol powered by Google’s Protocol Buffers library. This particular format was chosen for the ease of parsing and exceptional tooling that Google provides for Protocol Buffers, as well as the easy backwards-compatibility that the format allows. Several major protocol upgrades were made to Hologram after it entered production, and none of them required any changes to older clients (albeit with slightly-degraded functionality and reporting).
All of this communication is secured with Go’s built-in TLS and a compiled-in self-signed certificate. With Perfect Forward Secrecy always on, this creates a default configuration where it is nearly impossible to intercept credentials on the wire. Organizations that desire more security are welcome to compile their own version of Hologram with a certificate and key from their own internal CA.
Installation
Hologram supports 64-bit Linux servers for the central server component, and 64-bit Linux and OS X machines for workstation installations. Installers and Debian packages are provided here for these platforms.
Though it is a commonly-requested feature, AdRoll was unable to develop a Windows-based agent for Hologram before the open-source release. No engineer on AdRoll’s development team uses Windows, leaving us without testing capacity or a real return on engineering investment. Patches and testing are more than welcome for organizations who would like access to this.
The initial open-source release of Hologram can be found at our Github repository. New features, bug fixes, and design concerns are more than welcome from the community.
About the Author
Arian Adair is a Bay Area-based software engineer. He has almost a decade of experience designing and implementing systems, from industrial control interfaces to customer complaint analytics, for a variety of companies around California.