In this podcast, Daniel Bryant sat down with Joe Duffy, founder and CEO at Pulumi, and discussed several infrastructure-themed topics: the evolution of infrastructure as code (IaC), the way in which the open source Pulumi framework allows engineers to write IaC using general purpose programming languages such as JavaScript and Go, and the future of multi-cloud environments.
Key Takeaways
- Infrastructure as Code (IaC) enables engineers to programmatically define the configuration and provisioning of computing infrastructure, on-premises hardware, and cloud services.
- Traditional IaC tools were often imperative, requiring engineers to define and enumerate the necessary steps and SDK calls in order to configure the underlying infrastructure.
- Modern IaC tools like HashiCorp’s Terraform, AWS CloudFormation and other related cloud vendor tooling enable engineers to write declarative code to define a required state of the infrastructure. The tools parse the declarative configuration and take appropriate action to enact the specified state, for example, calling SDKs and APIs, verifying results, iterating etc.
- Pulumi is an open source framework that enables engineers to define IaC using general purpose programming languages on runtimes such as Node.js (JavaScript or TypeScript), Python, .NET Core, and Go.
- Pulumi allows imperative specification of IaC. Engineers can use their favourite language-specific features, idioms, and patterns. The use of language modules, packages, and libraries can also enable code reuse.
- Under the hood, Pulumi transforms code written in the supported languages to a declarative specification model. This model is then used to enact the required infrastructure state.
- Frameworks like Pulumi enable engineers to deploy and configure infrastructure across multiple cloud vendors and services (including Kubernetes clusters).
Show Notes
Could you briefly introduce yourself to the listeners?
- 01:10 My name is Joe Duffy, founder and CEO of Pulumi, an infrastructure-as-code startup in Seattle, Washington.
- 01:15 My background - I was at Microsoft before, and I did some things before that - but most people know me from my work there as an early engineer on .NET, developer tools, and the distributed OS Midori.
- 01:35 I've been focussed on the cloud space for the last few years.
Can you briefly introduce what problem infrastructure-as-code is trying to solve?
- 01:55 Infrastructure-as-code helps you automate the provisioning and management of your cloud infrastructure.
- 02:05 If you're just getting started with cloud, the obvious thing to do is point and click inside the AWS console, and start that way.
- 02:15 That's a fine way to explore, but what happens if you delete something, or you want to create a second copy of your infrastructure, or have a staging environment?
- 02:25 At scale, you need an automated solution for provisioning and managing this infrastructure.
- 02:30 Infrastructure-as-code is a way to do that in code, rather than having to do CLI commands or bash scripts.
- 02:40 There's a variety of solutions, such as markup based solutions like YAML or JSON, through to Chef and Puppet using Ruby based DSLs.
- 02:55 Pulumi takes the approach of using general purpose languages to provision infrastructure.
How would you say that Pulumi is different from Puppet, Chef or Ansible?
- 03:05 The configuration tools - I split the infrastructure-as-code space into provisioning and configuration.
- 03:15 Provisioning is about creating and updating and versioning the infrastructure itself.
- 03:20 A lot of the configuration tooling like Chef, Puppet, and Ansible is more about what's happening inside of the virtual machine.
- 03:30 You spin up some VMs, then you have to install some packages, then you have to configure some services with systemd - or patching the server upgrades in an automated way.
- 03:45 Over the years, especially as we have adopted containers and serverless, the infrastructure itself has become more fine-grained.
- 03:55 If you think of all the IAM roles and policies and all of the moving pieces, provisioning is more interesting now than configuration.
- 04:05 There's this other trend, which is towards immutable infrastructure; if you want to deploy a new version of a web server, one approach is to patch your existing server.
- 04:10 Then you have to think about all the possible N-to-N+1 transitions of the server: what the current state is and how you move it to the new desired state.
- 04:20 With immutable infrastructure, you spin up a new webserver, redirect all the references to the new server, and destroy the old one.
- 04:30 We see a lot of people moving to that model with provisioning tools instead.
How does Pulumi differ from Terraform or CloudFormation?
- 04:40 Pulumi is imperative, but with a declarative core to it - that's the special thing about Pulumi.
- 04:50 A lot of people are familiar with Boto: you write some Python code, go out to the AWS SDK, make a bunch of calls, and spin up servers.
- 05:00 The thing that CloudFormation, Terraform, and now Pulumi do is based on the notion of this goal state.
- 05:10 Your program, when you run it, says that you want a VPC, these subnets, an EKS cluster, and an RDS database.
- 05:20 The deployment engine (whether it's Terraform, CloudFormation or Pulumi) now can say that's the desired state, and I will make that happen.
- 05:25 That works for the first deployment you do, but maybe you want to re-run it, to scale up the number of node pools in your Kubernetes cluster from 2 to 3.
- 05:40 The deployment engine can then diff that desired state from the current known state, and produce a plan of how to change it - in this case, an increase of 1 node pool.
- 05:50 What Pulumi does is allow you to express your goal state in a general purpose, imperative language.
- 05:55 The goal state is still declarative; it represents your end state - you're just using for loops and classes and all those familiar constructs to declare it.
- 06:05 We also support .NET, so you can do this in F# and it becomes a more functional approach, which is a completely different way of thinking about your infrastructure.
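The goal-state idea described above can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions, not Pulumi's engine or API: the `desired_state` and `diff` functions and the dictionary-based resource shape are hypothetical, chosen only to show how an ordinary loop can build a declarative desired state that is then compared with the current known state.

```python
# Simplified model of goal-state deployment; not Pulumi's actual engine or API.

def desired_state(node_pools: int) -> dict:
    """Build the goal state with a for loop; the result is still declarative data."""
    state = {"cluster": {"type": "kubernetes-cluster"}}
    for i in range(node_pools):
        state[f"node-pool-{i}"] = {"type": "node-pool", "cluster": "cluster"}
    return state


def diff(current: dict, desired: dict) -> list:
    """Compare current and desired state and return a plan of (action, name) steps."""
    plan = []
    for name in sorted(desired):
        if name not in current:
            plan.append(("create", name))
        elif current[name] != desired[name]:
            plan.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            plan.append(("delete", name))
    return plan


current = desired_state(node_pools=2)  # what the first deployment created
desired = desired_state(node_pools=3)  # the same program after scaling up
plan = diff(current, desired)
print(plan)
```

Re-running with three node pools instead of two yields a plan containing a single create step, which mirrors the "increase of 1 node pool" example in the conversation.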
How do people get started with Pulumi?
- 06:30 Pulumi is an open-source tool, so you can download it from GitHub and you have a CLI which hosts an engine, which knows how to interact with different language runtimes.
- 06:45 You download this tool, and then you say you want to create a new project in Python, JavaScript, or whatever your language of choice is, and the Pulumi engine knows how to spawn those runtimes.
- 06:55 For state storage (part of the infrastructure-as-code approach) we have a hosted service which we make available for free if you want to use it (you can opt-out if you don't).
- 07:10 It's super convenient, because you don't have to think about state management.
- 07:15 When you use CloudFormation, it feels like there's no state management, because CloudFormation is in AWS, and they're managing the state for you.
- 07:25 Terraform is an off-line tool, so it gives you the state and you have to manage it - and if you do that wrong, you shoot yourself in the foot.
- 07:30 We tried to make Pulumi more like the CloudFormation model than the Terraform one - but if you want to take the state with you, you can do that.
What languages does Pulumi currently support?
- 07:40 We support Python, and any Node.js supported language; most people use TypeScript or JavaScript.
- 07:50 We support Go, which is great for embedding infrastructure-as-code into larger systems.
- 07:55 We also support .NET, so any .NET language, which includes C#, F#, Visual Basic - even COBOL.NET if you want!
How does using functional languages like F# work with Pulumi?
- 08:10 Functional languages themselves are declarative in a sense, because you don't have mutable state - new states are computed out of old states.
- 08:20 F# itself has had a notion of these workflows for a while - in the early days of async programming, we had these F# workflows.
- 08:30 Declaring your infrastructure feels like declaring a workflow of how all these infrastructure pieces relate to each other.
- 08:35 Ultimately, all of these languages interact with the Pulumi engine in fundamentally the same way - it's just the syntax of how you're describing the infrastructure and the facilities of the language that are available to you.
So you can use modules and libraries?
- 08:55 Exactly, which is great - because how many times have you written the same 2000-line CloudFormation or Terraform code to spin up a VPC in Amazon?
- 09:05 Now you can stick it in a package, share it with your team, the community or just re-use it yourself next time you need it.
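As a rough illustration of the reuse described above, the sketch below wraps a network definition in an ordinary Python function that could live in a shared package. The function name `make_network`, its parameters, and the dictionary output are hypothetical; a real Pulumi program would construct provider resources rather than plain dictionaries.

```python
# Hypothetical reusable helper; plain dictionaries stand in for real cloud resources.
import ipaddress


def make_network(name: str, cidr: str, zones: list) -> dict:
    """Declare a VPC with one subnet per availability zone."""
    # Split the VPC address range into /24 blocks and hand one to each zone.
    blocks = ipaddress.ip_network(cidr).subnets(new_prefix=24)
    resources = {f"{name}-vpc": {"type": "vpc", "cidr": cidr}}
    for zone, block in zip(zones, blocks):
        resources[f"{name}-subnet-{zone}"] = {
            "type": "subnet",
            "vpc": f"{name}-vpc",
            "zone": zone,
            "cidr": str(block),
        }
    return resources


# Two environments reuse the same definition with different inputs.
staging = make_network("staging", "10.0.0.0/16", ["a", "b"])
production = make_network("prod", "10.1.0.0/16", ["a", "b", "c"])
print(len(staging), len(production))
```

The point of the sketch is only that the definition is written once and parameterised, instead of being copied between templates.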
Is Pulumi a transpiler to convert a language like TypeScript into AWS commands?
- 09:15 It's fairly complicated, and it took us four attempts to get it right.
- 09:30 We started by writing our own language runtime, because the challenge you have is what happens when a resource depends on another in your program - you need to capture that dependency.
- 09:40 You need to provision things in the right order, and for destroying them, in the right order as well.
- 09:45 You also want to parallelise where you can so that you can build everything as fast as possible.
- 09:50 In Pulumi, in the code, you declare a directed acyclic graph (DAG) - a graph of resources that depend on each other.
- 09:55 The Pulumi runtime takes that DAG and creates a plan out of it: the full set of resources on the first run, or a diff on subsequent runs. It lets you see the plan before you apply it.
- 10:15 You can run those in separate steps if you want.
- 10:20 When you choose to apply it, Pulumi takes that plan, and orchestrates all of the AWS calls or Kubernetes or whatever cloud you're using.
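The ordering problem described above (create in dependency order, destroy in reverse, parallelise where possible) can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustration of the general technique, not Pulumi's engine; the resource names and the `create_batches` helper are made up for the example.

```python
# Illustrative dependency ordering with the standard library (Python 3.9+);
# this is not Pulumi's engine, and the resource names are made up.
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
dependencies = {
    "vpc": set(),
    "subnet": {"vpc"},
    "cluster": {"subnet"},
    "database": {"subnet"},
    "app": {"cluster", "database"},
}


def create_batches(graph: dict) -> list:
    """Group resources into batches; everything in one batch can be created in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


creation = create_batches(dependencies)
# Destruction walks the same batches backwards, so dependents are removed first.
destruction = list(reversed(creation))
print(creation)
print(destruction)
```

In this sketch the cluster and the database land in the same batch because neither depends on the other, which is where parallel provisioning becomes possible.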
What backends do you support?
- 10:35 All the major clouds: AWS, Azure, GCP - also Alibaba Cloud and DigitalOcean.
- 10:45 We also support Kubernetes, which is a popular package for us.
- 10:50 The full Kubernetes object model is available, so you can not only provision Kubernetes clusters using Pulumi, you can also install services into them with Helm Charts.
- 11:00 You can also write your application config with this model, and have dependencies between them, which is nice.
- 11:05 Often provisioning a new Kubernetes cluster means spinning up EKS, provisioning some AWS resources like CloudWatch logs, and then maybe installing some Kubernetes services using Helm.
- 11:20 With Pulumi, you can actually provision resources across all of these clouds using one program, and Pulumi will orchestrate it in the right order.
Is Pulumi similar to the AWS Cloud Development Kit (CDK)?
- 11:35 There's a lot of similarities - we came out a bit before CDK so we've had more bake-time.
- 11:40 The main difference is the multi-cloud nature; we support Kubernetes, Azure, GCP - and also on-prem technologies like vSphere, OpenStack, F5 BIG-IP.
- 11:55 The other difference is that CDK is a transpiler; it turns out that Pulumi is a runtime.
- 12:05 CDK spits out CloudFormation YAML, and if you have an error, tracing it back to the program isn't quite as first class.
- 12:15 I love what they're doing, and they're taking the idea of infrastructure-as-code and seeing the same vision that we see.
- 12:25 We talk with the CDK team a lot about our experiences, but there are some fundamental differences.
How does debugging with Pulumi work?
- 12:50 We have what we call PDBs - I used to manage a C++ team at Microsoft, and we spent a lot of time making sure that the debug symbols mapped back to the source code.
- 13:05 We sort of have the equivalent of debugging symbols for your infrastructure, where you know exactly where it came from down to the program source code.
- 13:10 Because we're using general purpose languages, you can use your favourite editor, IDE, debugger, test tools ... the language is like the surface area, but the toolchain and ecosystem is more powerful.
Are there any disadvantages of using Pulumi?
- 13:40 I think that some people are uncomfortable with a full blown language to start - especially if you're coming from a limited DSL or YAML dialect.
- 13:55 A lot of people may be worried about creating webserver factories, and becoming an architecture astronaut with huge layers of abstractions where no-one can understand what's going on ...
- 14:05 The same arguments apply also to application code, and we've somehow figured it out there so it doesn't worry me as much.
- 14:15 Not everybody understands full-blown languages, so there is a learning curve - but a lot more people these days are learning languages such as Python from school.
- 14:35 I think ultimately it is better for folks to learn and come outside their comfort zone a little bit, and come out the other side a bit better off.
Is Pulumi skewed towards developers rather than ops?
- 15:10 I thought that was going to be the case, but it turns out that where we're resonating the most is in DevOps teams who have used Chef and Puppet and Boto and Python.
- 15:25 They maybe have used enough Terraform to know the limitations that they're hitting.
- 15:30 For developers, it's a no-brainer, but it also works for folks who are already doing infrastructure and want to work better with developer teams.
- 15:35 We tend to have these silos sometimes, and the devops movement has helped to break some of those down, but we've seen some infrastructure teams who want to hand over control to the developers but don't know how.
- 15:45 The development team doesn't want to use a YAML templating thing, but rather their favourite language - and this gives them a way to have that conversation and empower them a bit more.
How does collaboration in general work with Pulumi?
- 16:10 It varies by team.
- 16:15 If you're just bringing up a new service, and doing initial development - a lot of times, that happens at the command line.
- 16:20 On our command line, it shows you the whole diff and you can drill into the details; you run the command 'pulumi up', it shows you the diff as a plan, and you can apply that.
- 16:40 In production settings, we are moving more towards git-based deployments where we integrate with your source code systems.
- 16:45 When you open a pull request, Pulumi will actually augment the pull request with the full diff of infrastructure changes.
- 16:55 It's not always obvious when you are diffing your code what the infrastructure changes would be.
- 17:00 It will show you if you deploy this, it will deploy a webserver, modify a Route53 record - and then this links over to Pulumi so you can drill in.
- 17:10 You can have a conversation on the review process in the team around rolling out changes - that's how we manage our on-line servers.
How do I go about testing Pulumi code?
- 17:35 There's a lot of different kinds of testing; you could mean unit testing, integration testing; there's also policies as code, so you don't violate a team policy.
- 17:55 One of the more interesting kinds of testing we see is ephemeral environments, where you take a pull request and spin up a new copy of the infrastructure temporarily to run tests against.
- 18:10 That really unlocks some workflows.
- 18:15 Because it's just your existing language, you can use tools and techniques that you know about already - you don't have to learn a bunch of new tools and techniques.
Are there any tips on dealing with the cost of spinning up big ephemeral systems?
- 18:35 We have a config system built into Pulumi - you can create smaller versions of your environments for ephemeral testing.
- 18:45 Instead of having three NAT gateways spread across all AZs, maybe you have one, or skip it and have a mock instead.
- 19:00 We tend to find mocking is really complicated, because you might pass the test against a mock and then fail when you go to production.
- 19:10 Usually it's better if you can afford to create an approximation that's maybe a bit smaller.
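One way to picture the "smaller approximation" approach is a single declaration that is sized from per-environment settings. The sketch below is an assumption-laden illustration: the `ENVIRONMENTS` table, its keys, and `declare_environment` are hypothetical and do not use Pulumi's real config system.

```python
# Hypothetical per-environment settings; keys and values are illustrative only
# and are not read through Pulumi's real config system.
ENVIRONMENTS = {
    "production": {"nat_gateways": 3, "server_size": "large"},
    "ephemeral": {"nat_gateways": 1, "server_size": "small"},
}


def declare_environment(name: str) -> list:
    """Declare the same topology for any environment, sized from its settings."""
    settings = ENVIRONMENTS[name]
    resources = [
        {"type": "nat-gateway", "name": f"{name}-nat-{i}"}
        for i in range(settings["nat_gateways"])
    ]
    resources.append(
        {"type": "server", "name": f"{name}-web", "size": settings["server_size"]}
    )
    return resources


production = declare_environment("production")
ephemeral = declare_environment("ephemeral")
print(len(production), len(ephemeral))
```

Because both environments come from the same code path, the ephemeral copy stays structurally close to production while costing less to run.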
What does a continuous delivery pipeline look like with Pulumi?
- 19:25 We made the decision early on not to try and create a separate CI/CD environment - we wanted to integrate with existing ones, like Travis, Jenkins, GitHub Actions, GitLab pipelines, Azure Pipelines.
- 19:40 We have over a dozen of these integrations - when you open up a pull request, it's going to run the preview, and once you merge and commit it runs the apply.
- 19:55 For a lot of us, this is how we do it internally as well, we have different branches representing different environments.
- 20:00 When we deploy a new production version of a Pulumi service, we have a separate production branch, and so we open a pull request from the staging branch to the production branch.
- 20:10 Pulumi knows how to do a rolling deployment, and diff between the two environments, which is a nice way of modelling it - it maps well to git concepts.
Does Pulumi manage traffic rollover for a blue/green deployment?
- 20:35 Where we can, we do - so for Kubernetes we give you detailed updates about where the rollout is happening, where the traffic is being migrated to.
- 20:45 We do mini blue/green deployments at the resource level - we prefer to spin up new infrastructure, and then drain traffic and redirect to the new infrastructure before tearing down the old.
- 21:00 We've architected Pulumi to work in that way.
- 21:05 If you're using ECS, we've integrated into the health checking to make sure that task definitions roll out at the right time.
- 21:10 We're trying to make it so that you don't have to think about it.
- 21:15 There's a blue/green deployment at a much higher level, so for folks wanting to do zero-downtime upgrades of Kubernetes clusters, for example, we have patterns you can use to blue/green the entire cluster level.
- 21:30 Some of this can be expensive, but if you really need zero downtime then there are ways you can accomplish it.
How important do you think multi-cloud is going forward?
- 21:45 I think it's a reality for almost every company we work with, for a number of reasons.
- 21:55 I think multi-cloud can have a bad rap; to some people, it sounds like it means lowest common denominator across all of the clouds.
- 22:05 To us, that's not what multi-cloud is: in some cases, that makes sense - especially if you're doing Kubernetes where your workload can be multi-cloud.
- 22:10 For us, it's really most large enterprises where the entire organisation may have to deal with multiple clouds; on-prem, AWS, Azure.
- 22:20 For SaaS companies like ourselves, we are selling to customers who may want to run it in their own cloud, and that cloud is going to be AWS/Azure/GCP/on-prem.
- 22:30 We don't want to limit who we can sell our own product to, so it's in our best interest to think about multi-cloud.
- 22:40 We have customers where they get acquired, and their parent company is an Azure shop - they were all in on AWS, and now they're an Azure shop.
- 22:50 You don't expect them to rewrite everything to run on Azure - they didn't plan on being multi-cloud, but now they are.
- 22:55 For us, multi-cloud is more about the workflow, the authoring, the techniques, the tools - it's not about the lowest common denominator.
- 23:00 Pulumi gives you one workflow that you can standardise across the organisation, regardless of whether you are doing hybrid or multi-cloud.
Is there a migration path to bring things together with Pulumi?
- 23:25 Most people have solutions for infrastructure already - we can either co-exist temporarily during the transition or permanently if it makes sense.
- 23:45 There's ways of importing infrastructure, so even if you've created a resource from the CLI you can take it under the control of Pulumi going forward.
- 23:55 We also have translation tools, that can convert Terraform's HCL to Python or JavaScript or whatever language you like, and some for Helm charts.
What is policy as code?
- 24:25 Policy as code is the notion of expressing policies like you can't expose an RDS database to the internet, or your RDS database must be MySQL version 5.7 or greater.
- 24:35 The idea is you can express these policies using a language, either a DSL like Rego with Open Policy Agent (OPA) or a general purpose language.
- 24:45 Just like infrastructure-as-code, we allow you to use your own language choice for policy-as-code.
- 24:55 You can then enforce this, so that every time someone does a deployment, if it fails the policy then the deployment is blocked.
- 25:05 We also allow you to scan the existing infrastructure and find all of the violations that you already have.
- 25:10 You've got a path to incrementally remediate that over time.
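The two example policies mentioned above (no internet-exposed RDS database, MySQL 5.7 or greater) could be expressed as ordinary functions over resource declarations. This is a minimal sketch under stated assumptions: the resource dictionaries, field names, and the `check` helper are hypothetical and do not mirror Pulumi's policy API.

```python
# Illustrative policy checks over declared resources; the resource shape and
# function names are hypothetical and do not mirror Pulumi's policy API.
from typing import Callable, List, Optional


def no_public_database(resource: dict) -> Optional[str]:
    """Reject RDS databases that are reachable from the internet."""
    if resource["type"] == "rds" and resource.get("publicly_accessible", False):
        return f"{resource['name']}: database must not be exposed to the internet"
    return None


def minimum_mysql_version(resource: dict) -> Optional[str]:
    """Reject MySQL databases older than version 5.7."""
    if resource["type"] == "rds" and resource.get("engine") == "mysql":
        version = tuple(int(part) for part in resource["engine_version"].split("."))
        if version < (5, 7):
            return f"{resource['name']}: MySQL must be version 5.7 or greater"
    return None


POLICIES: List[Callable[[dict], Optional[str]]] = [
    no_public_database,
    minimum_mysql_version,
]


def check(resources: list) -> list:
    """Return every violation; an empty list means the deployment may proceed."""
    violations = []
    for resource in resources:
        for policy in POLICIES:
            message = policy(resource)
            if message is not None:
                violations.append(message)
    return violations


resources = [
    {"type": "rds", "name": "orders", "engine": "mysql",
     "engine_version": "5.6.40", "publicly_accessible": True},
    {"type": "rds", "name": "users", "engine": "mysql",
     "engine_version": "8.0.28", "publicly_accessible": False},
]
violations = check(resources)
for violation in violations:
    print(violation)
```

The same `check` function could be run against a proposed deployment to block it, or against a snapshot of existing infrastructure to list violations for incremental remediation.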
What does the next 18 months look like for Pulumi?
- 25:25 We shipped our 1.0 release in September, which was a really major milestone for us, where we're sticking to compatibility - we know infrastructure is the lifeblood of the business.
- 25:35 We're now ramping up for our 2.0 release which is bringing policy-as-code and some more testing tools, which is going to be out pretty soon.
- 25:45 Now we've got this really solid foundation, and we've seen some of the patterns and practices that people are having problems with; I mentioned the VPC case - why would you write 2000 lines of code?
- 26:05 We're focussed in the future on some of these patterns and jobs to be done, and on making it really easy to spin up a new microservice in an afternoon.
- 26:15 Now we've laid the foundation, how do we improve the time to create these things, reduce the boilerplate, and make it 10 times easier than it was before?
- 26:30 Today we rely on NPM and other module registries, but we are looking at if we could have a central place for people to go to find all of these patterns and practices.
- 26:50 Long term, we want to make it easier for developers too.
- 26:55 Infrastructure is still hard: even though you can use your favourite language, the concept count is really high for infrastructure.
- 27:00 If all you want to do is spin up a new microservice in a docker container, you shouldn't have to become an expert in all of these things.
If people want to follow you online, what's the best way?