Today on The InfoQ Podcast, Wes Reisz speaks with Idit Levine, CEO and founder of Solo. The two discuss the three pillars of Solo: Gloo, their API gateway; interoperability of service meshes (including the work on Service Mesh Interface); and extending Envoy with Web Assembly (and the recently announced Web Assembly Hub).
Key Takeaways
- Gloo is a Kubernetes-native ingress controller and API gateway. It’s built on top of Envoy and at its core is open source.
- The Service Mesh Interface (SMI) is a specification for service meshes that run on Kubernetes. It defines a common standard that can be implemented by a variety of providers. The idea of SMI is that it’s an abstraction on top of service meshes, so that you can use one language to configure them all.
- Autopilot is an open-source Kubernetes operator that allows developers to extend a service mesh control plane.
- Lua has been commonly used to extend the service mesh data plane. Led by Google and the Envoy community, Web Assembly is becoming the preferred way of extending the data plane. Web Assembly allows you to write Envoy extensions in any language while still being sandboxed and performant.
- WebAssembly Hub is a service for building, deploying, sharing, and discovering Wasm extensions for Envoy.
- Wasme is a Docker-like open-source command-line tool from Solo that simplifies building, pushing, pulling, and deploying Envoy Web Assembly filters.
Show Notes
Tell me a bit about Solo and the work you do. -
- 01:25 I started Solo two years ago, and I was trying to help customers transform and be more innovative.
- 01:30 When we started, I was passionate about service mesh; but two years ago I realised it would take a while until it's there.
- 01:40 I tried to figure out how to help right now, and the first thing I noticed was that if we moved straight to microservices we would have way more APIs.
- 01:50 Managing that would be a lot easier if we had an innovative API gateway.
- 02:00 That's the first problem we tried to solve; we open-sourced Gloo [https://docs.solo.io/gloo/latest/] a year and a half ago.
- 02:05 Gloo is an open-source API gateway, built on top of Envoy, because it is the stepping stone for services.
- 02:10 Most of the data planes in use today are using Envoy, so it was a good starting point.
- 02:40 Gloo's idea was to help with transformation, because we realised that until you move everything to microservices, you must run a different type of workload on your infrastructure.
- 02:55 It could be a combination of monoliths, microservices and even serverless; so as a hybrid layer, Gloo was meant to join them all together.
- 03:10 It's built on top of Envoy; it's flexible and pluggable.
- 03:20 We wanted to go into service mesh, but there wasn't a good plan to compete with the big players like Google with Istio, Linkerd or Consul.
- 03:40 When we looked at the market, we didn't want to compete with the big players, but we could look forward to the problems people encounter when migrating to a service mesh.
- 03:50 I do believe that everybody should adopt a service mesh, because when you move to microservices, you need to have them talking to each other.
- 04:00 Being able to see what's going on with your cluster will be more complex if you don't have a service mesh.
- 04:10 I didn't want to build a service mesh, but when we helped build Consul Connect I realised that there would be multiple different service meshes.
- 04:25 Maybe you have multiple Istio instances, because you have different clusters.
- 04:35 When we built Consul Connect a year ago, that's the problem we were thinking of.
- 04:40 I realised if you adopt one of those instances on-prem, and then you want to migrate to AWS, you most likely will want to use AppMesh - so how do you orchestrate that?
- 05:00 Service Mesh Hub is managing multiple service stacks together.
- 05:15 When you're talking about service meshes, you're talking about use cases like mTLS, observability and so on - those are use cases.
- 05:25 What's special about service meshes is how they are implemented.
- 05:30 It's implemented in a way to abstract the network, and if it's abstracted then there's way more stuff we can do.
- 05:40 We're extending the mesh, to make it serve different use cases.
- 05:50 To me, service meshes are a platform that we can extend.
Why did you build on top of Envoy? -
- 06:10 The first is that you can extend it - you can write a filter.
- 06:20 The fact that we could extend Envoy with filters in C++ will be a differentiator in this market.
- 06:35 We could use Envoy to route a request to AWS Lambda, for example, and sign it - or do transformation and caching.
- 06:50 With other engines, like nginx, you can use the API but you can't live inside the program.
- 07:05 The second advantage is that it's driven by an API rather than a file.
- 07:15 At this time, I knew that Google chose Envoy for Istio, so I assumed they would join the community.
- 07:30 The performance of an engine written in C++ can't be beaten by an engine written in Go.
What are some of the features that Gloo gives you? -
- 08:00 It's usually for a couple of reasons; security is the main one.
- 08:05 Our customers ask for LDAP, but we also have policy, filters, WAF, data-loss prevention - we have a lot of financial organisations so we had to be security focussed.
- 08:30 We have advanced rate-limiting, OpenID Connect, TLS, mTLS, Let's Encrypt, Open Policy Agent, RBAC, delegation, WAF ... everything related to security.
These are implemented as filters compiled into the Envoy proxy? -
- 09:30 Some of them are, like WAF is a C++ filter, but some are external servers.
- 09:45 You can imagine that you have a request, but an external provider determines whether you are allowed and what you are allowed to access.
- 10:05 On a request, you can say that a customer is allowed but they're only able to see a subset of fields.
- 10:20 The idea is to make it simple so that you don't need to rebuild the back-end.
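The external-provider idea above can be sketched in a few lines. This is only an illustration, not Gloo's API: the `POLICIES` table, `authorize` and `handle` are hypothetical names standing in for an external auth server and the gateway step that trims a response to the fields a caller may see, without touching the back-end.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Decision:
    """Answer from the (hypothetical) external auth server."""
    allowed: bool
    visible_fields: FrozenSet[str] = frozenset()


# Hypothetical policy table standing in for an external auth server.
POLICIES: Dict[str, Decision] = {
    "alice": Decision(True, frozenset({"id", "name", "balance"})),
    "bob": Decision(True, frozenset({"id", "name"})),
}


def authorize(user: str) -> Decision:
    """Unknown users are denied by default."""
    return POLICIES.get(user, Decision(False))


def handle(user: str, backend_response: dict) -> Tuple[int, dict]:
    """Return (status, body), keeping only the fields the caller may see."""
    decision = authorize(user)
    if not decision.allowed:
        return 403, {}
    body = {k: v for k, v in backend_response.items()
            if k in decision.visible_fields}
    return 200, body


if __name__ == "__main__":
    record = {"id": 7, "name": "Acme", "balance": 1200}
    print(handle("bob", record))      # allowed, but without "balance"
    print(handle("mallory", record))  # denied
```

The back-end returns the same record for every caller; the decision about who sees what lives entirely in the gateway layer.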
This is all open source? -
- 10:35 Gloo is open sourced; we have some enterprise-licensed security features.
- 10:45 You can do things in the open source, but you bring along your own rate-limiting filter or auth server.
- 11:00 Most of it is open sourced; the enterprise contains more security and support.
- 11:10 The enterprise is built on the open-source engine, but Gloo is pluggable and extensible.
- 11:25 Gloo is written in Go, and it's pluggable.
- 11:35 It's watching CRDs, and every time something changes in your environment, in your configuration, or in a security certificate rotation, Gloo notices and sends it to a plugin system.
- 11:50 The plugin system translates it into Envoy configuration, and we then send the snapshot to Envoy.
- 12:00 The reason we are winning so many customers is because it's fast.
- 12:10 Our customers want custom cases quickly, like an HMAC - and we can give it to them, either as a Go plugin or a filter.
- 12:30 We can innovate quickly and deliver tailor made solutions to our customers.
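The watch, translate and snapshot flow described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not Gloo's code: the `Translator` class, the two plugins and the configuration keys are all hypothetical, and a real control plane would serve the snapshot to Envoy over its configuration API rather than return a dictionary.

```python
from typing import Callable, Dict, List

# A plugin turns the user's desired state into a fragment of proxy
# configuration. Names and keys are illustrative, not Gloo's or Envoy's.
Plugin = Callable[[dict], dict]


def routing_plugin(resources: dict) -> dict:
    routes = [{"prefix": r["prefix"], "cluster": r["upstream"]}
              for r in resources.get("routes", [])]
    return {"routes": routes}


def tls_plugin(resources: dict) -> dict:
    cert = resources.get("certificate")
    return {"tls": {"serial": cert["serial"]}} if cert else {}


class Translator:
    """Runs every plugin on each change and versions the resulting snapshot."""

    def __init__(self, plugins: List[Plugin]) -> None:
        self.plugins = list(plugins)
        self.version = 0
        self.snapshot: Dict[str, object] = {}

    def on_change(self, resources: dict) -> dict:
        config: dict = {}
        for plugin in self.plugins:
            config.update(plugin(resources))
        # Only publish a new snapshot when the translated config differs.
        if config != self.snapshot.get("config"):
            self.version += 1
            self.snapshot = {"version": self.version, "config": config}
        return self.snapshot


if __name__ == "__main__":
    t = Translator([routing_plugin, tls_plugin])
    state = {"routes": [{"prefix": "/pets", "upstream": "petstore"}],
             "certificate": {"serial": "01"}}
    print(t.on_change(state))
    state["certificate"] = {"serial": "02"}  # certificate rotation
    print(t.on_change(state))
```

Adding a custom capability in this model means adding one more plugin to the list, which is one way to read the "pluggable" claim in the notes.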
Tell me more about the service mesh hub? -
- 12:55 Hashicorp reached out to us for help in building Consul Connect, and they needed Envoy support.
- 13:10 When we started working, we realised that Istio was a mess, and Linkerd was migrating from Java to Rust/Go.
- 13:30 When you have an ecosystem without a dominant player, there is competition.
- 13:40 There were competing container orchestration systems like Docker Swarm, Cloud Foundry, Kubernetes and Mesos.
- 13:50 It took a long time before Kubernetes became the clear winner.
- 13:55 However, for service mesh, I don't think there's going to be a clear winner.
- 14:00 About a year and a half ago, I realised that there were going to be more service mesh implementations.
- 14:10 People would likely try Istio, Consul Connect (which didn't have Layer 7 support) etc.
- 14:40 Firstly, we wanted to help them by creating a simple, unifying API so that all of the meshes speak the same language.
- 15:00 If they all speak the same language, then you don't need to learn a full set of new APIs if you move.
- 15:10 We announced this in November 2018, as SuperGloo.
- 15:25 Think about customers who chose Mesos a couple of years ago, investing a lot of money and education; now they need to start from scratch.
- 15:45 By using a consistent API, it doesn't matter what service mesh you choose, because your investment will pay off.
Can you talk about what Service Mesh Interface (SMI) is? -
- 16:05 We announced SuperGloo in November 2018, and Microsoft reached out to us, and they wanted to help us.
- 16:30 Microsoft can help with the marketing.
- 16:45 They went to Hashicorp and others, and together we announced SMI - an abstraction on top of service mesh to allow a single language to configure them all.
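To make the "single language to configure them all" idea concrete, here is a minimal sketch of one mesh-neutral traffic-split description rendered into two different mesh-specific shapes. Everything here is an assumption for illustration: `TrafficSplit`, the adapters and the `mesh-a`/`mesh-b` output formats are hypothetical and are not the SMI resources or any real mesh's schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TrafficSplit:
    """Mesh-neutral description: backend name -> percentage of traffic."""
    service: str
    weights: Dict[str, int]

    def __post_init__(self) -> None:
        if sum(self.weights.values()) != 100:
            raise ValueError("weights must sum to 100")


def to_mesh_a(split: TrafficSplit) -> dict:
    # Hypothetical mesh that expects a list of weighted destinations.
    return {"host": split.service,
            "route": [{"destination": b, "weight": w}
                      for b, w in sorted(split.weights.items())]}


def to_mesh_b(split: TrafficSplit) -> dict:
    # Hypothetical mesh that expects a flat map of fractions.
    return {"service": split.service,
            "backends": {b: w / 100 for b, w in split.weights.items()}}


ADAPTERS: Dict[str, Callable[[TrafficSplit], dict]] = {
    "mesh-a": to_mesh_a,
    "mesh-b": to_mesh_b,
}


def render(split: TrafficSplit, mesh: str) -> dict:
    """Translate the common description for the chosen mesh."""
    return ADAPTERS[mesh](split)


if __name__ == "__main__":
    split = TrafficSplit("reviews", {"v1": 90, "v2": 10})
    for mesh in ADAPTERS:
        print(mesh, render(split, mesh))
```

The user-facing description stays the same when the mesh underneath changes; only the adapter differs, which is the investment-protection argument made earlier in the notes.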
Does SMI deal with only a small core of operations? -
- 17:20 That is what SMI is doing, but what we are doing is trying to go further and support everything.
- 17:45 The way we're doing that is by creating a wider set of APIs, which we're calling SMI++, and we'll be able to configure Istio with the full set of features.
- 18:20 In the beginning with SuperGloo, we had another problem - you might want to be able to talk to different service meshes, but you might want to be able to talk to them as one.
- 18:35 Looking at the cloud, you have Istio in Google Cloud, AppMesh in AWS and something else in Azure.
- 18:45 We will have three different implementations of service meshes in three different clouds.
- 18:50 I don't think there's going to be a Kubernetes-like winner which everyone agrees on for the service mesh world; we'll always have multiple implementations.
- 19:00 If you're running on-prem, and you want to use AppMesh on AWS, you should be able to use Consul Connect or Istio on-prem but use the AWS cluster.
- 19:30 We're allowing you to group meshes together.
- 19:35 You can take the meshes in AWS and an Istio on-prem, group them as one, and flatten the network so that services can talk to services in the other mesh.
- 20:00 Once you have grouped them together, you can treat them as one virtual service mesh - the user doesn't know, because they are using the same API to talk to them.
- 20:05 You can group them, treat them as a production cluster, install the same root certificate and Prometheus, and route safely with TLS between them.
You have multi-cloud, multi-service mesh with cross-service discovery? -
- 20:40 Yes, everything is in the same control plane, you can see everything, you can even discover a new cluster with a mesh inside.
- 20:55 Once you group them, everything that happens with one cluster will happen to the other.
- 21:20 We make sure it's healthy, and can upgrade and troubleshoot, bringing configuration and logs to the same place.
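The grouping idea from the last two answers (several meshes treated as one virtual mesh with a shared root certificate and a flattened view of services) might be modelled like this. It is a toy sketch under stated assumptions: `Mesh`, `VirtualMesh` and their methods are hypothetical names, not Service Mesh Hub's API, and the certificate is just a string.

```python
from typing import Dict, Optional


class Mesh:
    """One concrete mesh installation with its own service registry."""

    def __init__(self, name: str, kind: str, services: Dict[str, str]) -> None:
        self.name = name
        self.kind = kind                # e.g. "istio" or "appmesh"
        self.services = dict(services)  # service name -> address
        self.root_cert: Optional[str] = None


class VirtualMesh:
    """Groups meshes so they share one root of trust and one service view."""

    def __init__(self, name: str, root_cert: str) -> None:
        self.name = name
        self.root_cert = root_cert
        self.meshes: Dict[str, Mesh] = {}

    def add(self, mesh: Mesh) -> None:
        # Joining the group means adopting the group's root certificate.
        mesh.root_cert = self.root_cert
        self.meshes[mesh.name] = mesh

    def discover(self) -> Dict[str, str]:
        """Flattened registry: 'service.mesh' -> address."""
        return {f"{svc}.{mesh.name}": addr
                for mesh in self.meshes.values()
                for svc, addr in mesh.services.items()}

    def can_route(self, src_mesh: str, dst_service: str) -> bool:
        """A grouped mesh can reach any service in the flattened registry."""
        return src_mesh in self.meshes and dst_service in self.discover()


if __name__ == "__main__":
    prod = VirtualMesh("production", root_cert="shared-root-ca")
    prod.add(Mesh("onprem", "istio", {"billing": "10.0.0.5:8080"}))
    prod.add(Mesh("aws", "appmesh", {"search": "172.16.1.9:9000"}))
    print(prod.discover())
    print(prod.can_route("onprem", "search.aws"))
```

How identity is actually reconciled between meshes is the harder part, as the next answer notes; this sketch simply assumes every member can adopt the same root.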
How do you manage and maintain identity between service meshes? -
- 21:50 The only problem we have is with identity; not everyone is doing identity in the same manner.
- 22:00 For example, Linkerd isn't supporting SPIFFE right now.
- 22:05 The question is: how do you manage that?
- 22:10 We have a very nice solution that allows you to do this.
- 22:15 Everything that should be at the level of the group, we're managing - everything else, the mesh is doing.
What does it mean to extend the mesh? -
- 22:45 There's a control plane and a data plane.
- 22:50 The data plane is most of the time Envoy, or it could be the Rust proxy that was written by Linkerd or the Traefik one that Maesh has.
- 23:00 Most likely, it will be Envoy - the sidecar model.
- 23:10 The other thing you need to extend is the control plane, which is giving the configuration to those proxies.
- 23:15 When a request comes in, it's not going through Gloo, Istio or anything other than the proxies.
- 23:25 The control plane's responsibility is to teach Envoy what to do when a request comes in.
- 23:40 Right now, extending the control plane can be done with operators.
- 24:05 Usually, it comes in the context of Kubernetes.
- 24:10 Kubernetes is great, but it's not a service mesh - we care about different things.
- 24:20 Service meshes take the network from the users, abstract it and own it.
- 24:35 With great power comes great responsibility.
- 24:40 How do we make sure that we don't screw the network up?
- 24:45 The way to do this is to create a resilient mesh.
- 24:50 We put guard rails in there, make sure we validate, and use SMI to make the language easier.
- 25:00 We are using the GitOps pattern to make it better.
- 25:10 The user wants the mesh to change when there's a change in the environment.
- 25:25 That means you need to look at the requirements - so we need to make the mesh adaptive with an operator.
- 25:40 We built a specific operator for service mesh, watching stuff that's specific to the mesh.
- 25:45 For example, telemetry, traffic and so on - things the mesh really cares about.
- 26:00 The result of what's going on is to change the mesh configuration.
- 26:10 The only thing the user needs to define is the rule - when to do this.
- 26:15 We call it Autopilot for service mesh - we open sourced it.
- 26:35 The next question is how to extend the data plane.
- 26:40 Previously people were doing it with Lua.
- 26:55 The Google guys with Envoy decided that it would be better to bet on Web Assembly.
- 27:05 You can then extend Envoy without the need to recompile it.
- 27:15 Firstly, Web Assembly is better because you can write it in any language that you want [which has a WASM back end].
- 27:20 Secondly, Web Assembly is sandboxed so it's not likely to take Envoy down.
- 27:30 Web Assembly is fast, but it's complex, and you need to build it first.
- 27:40 All that process is still difficult at the moment.
- 27:55 The idea of our recently announced Web Assembly Hub is that it's going to make it much easier to use.
- 28:00 My vision is that it's like Docker as an experience - containers existed before Docker, but they made it much easier to use.
- 28:10 We are trying to do the same thing for Web Assembly.
- 28:15 You need to build your code, compile it for Web Assembly - so we created wasme as a command line tool [https://github.com/solo-io/wasme].
- 28:30 You can run `wasme init`, and it will download everything that you need to be able to build, along with an example project.
- 28:45 You can run `wasme build` to compile the code.
- 28:50 You then need to host the compiled binary somewhere, and to bring it to Envoy.
- 29:00 We created `wasme push` to upload the binary to a registry, like Docker Hub.
- 29:15 You can pull it locally, and you can also deploy it; today we're supporting three ways to do this.
- 29:05 If you want to do it with basic Envoy, `wasme deploy --envoy` will do all the work for you.
- 29:40 You can also deploy it with Gloo, which knows how to go to the registry and deploy it.
- 29:55 The third one is Istio, which does much the same thing.
- 30:10 The last thing we did was to create a list and a publish command; publish pushes it to a hub, allowing others to use it.
- 30:30 You can think of it as a Docker Hub for Web Assembly.
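The adaptive-operator idea from earlier in this answer (the operator watches mesh telemetry, the user supplies only the rule, and the result is a change to mesh configuration) can be sketched as a small reconcile loop. This is an illustration under stated assumptions, not Autopilot's API: `CanaryRule`, `reconcile`, the thresholds and the error-rate feed are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CanaryRule:
    """The user-supplied rule: when to shift traffic and when to roll back."""
    max_error_rate: float = 0.05
    step: int = 20  # percentage points added per healthy observation


def reconcile(weight: int, error_rate: float, rule: CanaryRule) -> int:
    """Return the next canary traffic weight (0-100) for one observation."""
    if error_rate > rule.max_error_rate:
        return 0  # unhealthy: send all traffic back to the stable version
    return min(100, weight + rule.step)


def run(observations: Iterable[float],
        rule: CanaryRule = CanaryRule()) -> List[int]:
    """Feed a series of observed error rates through the reconcile step."""
    weight = 0
    history = []
    for error_rate in observations:
        weight = reconcile(weight, error_rate, rule)
        history.append(weight)
    return history


if __name__ == "__main__":
    # Two healthy readings, one spike, then recovery.
    print(run([0.01, 0.02, 0.10, 0.01]))
```

In a real operator the returned weight would be written back as mesh configuration (for example a traffic split), and the observations would come from the mesh's metrics rather than a list.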
Is Web Assembly supported in Envoy yet? -
- 30:55 It's not upstreamed yet, which is what we're working on at the moment.
- 31:05 We have a fork, but we haven't merged it upstream yet.
- 31:10 There are things that don't exist; you can manipulate headers, but you can't manipulate the body at the moment.
- 31:20 It's close to being ready for prime time, and there's a push from the community and Google to make it happen soon.
What are some of the filters that are written in Web Assembly now? -
- 31:45 What we've open-sourced so far is a transformation one, metrics, and soon an AWS filter.
- 32:15 It will use the AWS handshake with a certificate and spin up a Lambda for you.
- 32:30 Web Application Firewall exists at the moment as C++ but we're moving it to Web Assembly.
- 32:35 We have a lot of requests from customers for a REST to SOAP translation, so that will be coming soon.
- 32:50 We are planning on moving a lot of our services to Web Assembly, because what we're doing right now is taking our upstream Envoy and our custom filter and compiling it.
- 33:00 Whenever there's a security vulnerability, we'll have to recompile Envoy each time.
- 33:05 If we do it in Web Assembly, we will not need to recompile Envoy, but just extend it with Web Assembly.
What is wasme? -
- 33:30 You can download it as a command line tool [from https://github.com/solo-io/wasme].
- 33:35 You can then do `wasme init` which sets up your environment, and clones an example repository.
- 33:45 You can then do `wasme build`, `wasme push` - essentially like a Docker environment.
How do the data types convert between Rust and Envoy via Web Assembly? -
- 34:15 As well as C++, we extended Envoy to support Rust.
- 34:25 When you build, it's `wasme build c++` or `wasme build rust`
- 34:40 When you deploy, it's `wasme deploy --istio --noweo`, or whatever VM you feel comfortable with.
Can you go back and forth between Envoy and Web Assembly? -
- 35:05 This is work that's been done by Envoy; we haven't specifically done it.
- 35:15 You could imagine a chain of filters, and one of them would be in Web Assembly - it would just load and run.
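A chain in which one filter is an isolated extension can be pictured with the sketch below. It is a loose analogy only: real Web Assembly isolation comes from running the extension inside a VM, whereas here a `try`/`except` wrapper and a copied dictionary stand in for the sandbox. All names are hypothetical, and the filters touch headers only, matching the current limitation mentioned in the notes.

```python
from typing import Callable, Dict, List

Headers = Dict[str, str]
Filter = Callable[[Headers], Headers]


def add_request_id(headers: Headers) -> Headers:
    """A 'native' filter that is trusted to run in-process."""
    return {**headers, "x-request-id": "demo-1"}


def uppercase_user_agent(headers: Headers) -> Headers:
    """An extension; raises KeyError when the header is missing."""
    return {**headers, "user-agent": headers["user-agent"].upper()}


def sandboxed(extension: Filter) -> Filter:
    """Stand-in for a sandbox: a failing extension is skipped, not fatal."""
    def wrapper(headers: Headers) -> Headers:
        try:
            return extension(dict(headers))  # the extension sees a copy
        except Exception:
            return headers  # pass the request through unchanged
    return wrapper


def run_chain(filters: List[Filter], headers: Headers) -> Headers:
    """Apply each filter in order, like a proxy's filter chain."""
    for f in filters:
        headers = f(headers)
    return headers


if __name__ == "__main__":
    chain = [add_request_id, sandboxed(uppercase_user_agent)]
    print(run_chain(chain, {"user-agent": "curl"}))
    print(run_chain(chain, {}))  # extension fails, chain still completes
```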
What are the limitations? -
- 35:35 As of now, there are a few things that haven't been implemented.
- 35:45 The `wasme` API might change, but eventually it will stabilise.
- 35:55 The data support is for headers but not bodies, so we can't do the latter yet.
- 36:10 It will happen in the future.
- 36:20 The point is, you can put as much as you want on the request, but doing too much will slow down the requests.
- 36:45 We will be able to write filters in any language, which will allow many more of my engineers to write filters.
What's next for Solo? -
- 37:15 We're ahead of the market, and we created Gloo to be ahead of the market.
- 37:20 People are just starting to get their head around service meshes and web assembly.
- 37:30 We're continuing to work with customers and make it production ready, rather than doing new stuff.
- 37:45 We have a chaos engineering tool called GlooShot.
- 37:55 We want to work on the three pillars and make it solid.
- 38:05 Gloo 1.0 is production ready, very solid.