Continuous profiling means collecting profiling data (think memory and CPU usage) all the time, throughout time, in your production environment. Today on the podcast, Wes Reisz speaks with Frederic Branczyk, CEO of Polar Signals, a startup formed to enable continuous profiling leveraging eBPF. Wes and Frederic discuss the origin story of Polar Signals, eBPF (the enabling technology used by Polar Signals), Parca (the open-source system they built to collect continuous profiling data), and more, including FrostDB and why profiling data complements what we already have in our current observability stacks.
Key Takeaways
- Parca is open-source Continuous Profiling tooling from Polar Signals that, while not specific to Kubernetes, is built with strong roots in the Kubernetes ecosystem. Parca is composed of agents that run on nodes and a central server that acts as the collection point. Data collected by the agents includes both kernel-space and user-space stacks. In a cloud-native environment, all of that collected data is enriched with labels (think namespace, pod, container) and sent to the server for querying (in a similar way to how Prometheus works).
- Continuous profilers sample the call stack constantly. While this is done at a relatively low frequency (say around one hundred times per second), those samples are taken all the time (continuously). Statistically, this still captures all relevant data, because it is the actual executing code being sampled: the more frequently code is run, the more frequently it's sampled.
- eBPF is a lightweight way of running code in the Linux kernel without having to write and compile operating-system-level code directly. It is the technology that makes it possible to implement continuous profiling in production.
- Continuous profiling doesn't replace logs, metrics, and tracing. It adds detail that brings another aspect to your observability solution, shining another light on your running applications. Three use cases for continuous profiling are cost savings, increased revenue tied to performance, and incident response.
- Parca uses an internally built, open-source columnar database called FrostDB (a columnar database stores column values consecutively and is prominent in analytical workloads). FrostDB is able to create columns dynamically when a new custom label is seen by the Parca server.
Subscribe on:
Introduction and welcome [00:05]
Wes Reisz: Say you have an application, maybe a web app. When you first start it, it consumes a bit of RAM. Several hours later -- perhaps under a certain load profile -- that same app is now consuming several gigs of RAM. Maybe a lot more than it should. You have a memory leak. Maybe something is hanging onto your connections, or you have some objects that aren't being correctly released. A common step might be to use a profiler to try to understand where your memory is being used or where it's being held onto. Maybe you'll load the app into a test environment, take a heap dump, analyze it, fix it, and then deploy it. Sound familiar? You can probably guess the next part, because black swans happen. Sometimes certain profiles just don't happen the same way in a production use case. Enter Kubernetes, and the disconnect grows stronger and stronger. What you see in your lower environment may not be the reality of what's in production.
So what you really need to do is profile production. But that's much more resource-heavy and invasive to do, right? What if you could? What if you could profile your application and have that information in your observability stack? What if you could see a sampled set of your app's profile in Grafana? You could correlate it across deployments for your clusters, and see all of your cloud-native infrastructure. What would it mean for understanding how your app performs in that environment?
Hi. My name is Wes Reisz. I'm a tech principal with Thoughtworks and co-host of the InfoQ Podcast. In addition, I chair a software conference called QCon that happens in October in the Bay Area. You can check us out at QConSF to learn more about the conference.
Wes Reisz: Today on the podcast, I'm speaking with Frederic Branczyk, who, incidentally, will be one of the presenters, along with Justin Cormack of Docker, Joe Duffy of Pulumi, and Marcel van Lohuizen of CUE and Google Borg fame, in Carmen Andoh's Language of Infrastructure track at that very conference. The Language of Infra track is just one of the 15 tracks at this year's conference. It's not too late to join us at QCon, but back to Frederic. He is the CEO and co-founder of Polar Signals. Before founding Polar Signals, he was a senior principal engineer and the main architect for all things observability at Red Hat, where he joined through the CoreOS acquisition. Frederic is a Prometheus and Thanos maintainer, as well as the tech lead for the Special Interest Group for Instrumentation in Kubernetes.
In late 2021, Polar Signals, in addition to closing a $4 million seed round from GV -- formerly Google Ventures -- and Lightspeed, open-sourced Parca, an eBPF-based continuous profiler. Today on the podcast, we're speaking with Frederic about Parca and how continuous profiling is not only possible, but happening.
Frederic, welcome to the podcast.
Frederic Branczyk: Thanks for having me.
What is the origin story of Polar Signals? [02:34]
Wes Reisz: I remember at KubeCon EU in Barcelona, it was just before the pandemic, so 2019, I think. You and Tom Wilkie did the keynote where the two of you talked -- at least in part -- about continuous profiling. Is that where Parca came from? Is that the origin story?
Frederic Branczyk: I think that was the turning point for me, to think that this is something we really need to explore really deeply. And I think that's what ultimately made me think, "I can start a company around this."
There is a longer backstory to this that led to why Tom and I were even allowed to do that keynote and talk, in part, about continuous profiling. That entire keynote was more broadly about what the future holds for observability, and continuous profiling was one of the three predictions that we were making in it. But really, what led to that point in 2019 -- ultimately, the beginning of all of this -- was me joining CoreOS in 2016. And at that point, for those who maybe don't know, at this point it's history... (there are always new generations of engineers joining).
So CoreOS was one of the first Kubernetes companies, and even before CoreOS went into the Kubernetes space, we started with this mantra of always automatically upgrading server software. Because CoreOS' mission was to secure the internet. And the biggest impact we felt we were going to have, was by automating updates. Because it wasn't that security problems weren't being fixed, it's that people aren't updating their software. And this is still a problem today. And the way I came in, was just after CoreOS did the Kubernetes pivot. We realized that all of this automatic updating is really nice, but if the software isn't actually doing what it's supposed to be doing before, during, and after upgrades, then automatically upgrading isn't all that useful either. When I came in, everything in my responsibility was all about monitoring with Prometheus. So I came in and started creating a Prometheus setup for our product, and that ultimately evolved into, what is today known as the Prometheus operator.
Also a glimpse of history, this was one of the two very first operators ever created, right? Today we have hundreds, maybe thousands of operators out there, right? This was genuinely one of the two operators that were part of the original announcement. And yeah, I became a Prometheus maintainer. And through that, ultimately, also became, like you said, technical lead for instrumentation in Kubernetes. Because everything in that intersection was what I was working on, right?
To this day, I'm still the maintainer for the Kubernetes integrations in Prometheus. And relatively recently, I actually stepped down from my position as tech lead in Kubernetes, just to have some more time to spend on other things. After Red Hat acquired CoreOS in 2018, I stuck around at Red Hat, and like you said, I became architect for all things observability. And 2018 was also when I read this paper that Google had published about how they profile all of their data centers, all the time [Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers].
And this was super interesting to me for two reasons, right? One, I wanted all the capabilities that they were talking about, in this white paper. They were talking about how they always have the performance data available when they need it. They never need to manually profile their code, and they're able to cut down on infrastructure cost by multiple percentage points, every quarter, consistently, right? All of these were extremely exciting to me, because as someone who works in infrastructure, my customers have that unrealistic expectation that everything uses zero resources, and has zero latency. Anyone who works in infrastructure knows this. This is the same expectation we have towards databases. It's the same expectation we have towards Kubernetes and Prometheus themselves. And so, all of this, and then my experience with Prometheus, and the Prometheus storage, and all of these things, I felt like I was in the position to actually create something.
And that is ultimately what led to that keynote. I put together this really, barely compiling proof of concept, put it on GitHub, and Bryan Liles was super, super nice to invite me to speak in that keynote, on that topic. So yeah. And then, ultimately, like I said, through that keynote, I think I realized that there was something bigger here, right? The continuous profiling market really wasn't established at all. There was not a single product on the market for this, but all the Hyperscalers were doing it. Google was doing it. Facebook was doing it. And Netflix has tools similar to this. So this makes sense.
So, at the end of 2020, I think it was similar for a lot of people: it had been six months of COVID, and people felt relatively uninspired. And I felt like there was this opportunity, right? And I felt like it was now or never. And then, that's when I decided to start the company all around this.
Wes Reisz: That's awesome. So let's back up a minute, and talk about continuous profiling. I gave that little example in the beginning of having a memory leak. What does continuous profiling mean in practice?
Frederic Branczyk: Yeah, so continuous profiling, like you said earlier, just the profiling part, right, is as old as software engineering gets. We've always needed it to understand where our resources are being spent. And profiling allows us to do this down to the line number: we can see where CPU time is being spent, where allocations are happening, where memory is being held, down to the line number. But historically, profiling was always associated with having a lot of overhead. So that was what prevented us from doing this in production all the time. And the way that the Hyperscalers solved this is, in part, by actually building these collection mechanisms into their operating systems. And I don't know about you, but I don't know a whole lot of companies who maintain their own operating systems, to be able to do something like that.
And so, it was also just me looking at this problem at the right time, because eBPF was just starting to gain momentum. And eBPF is exactly the technology that allows us to replicate what the Hyperscalers were doing ourselves, without having to maintain custom operating systems to do this kind of thing.
What is continuous profiling? [09:13]
Frederic Branczyk: Continuous profiling essentially means that we always profile in production, all the time, throughout time, right? Our entire data center. Every single process. And like I said, there are two main contributors to why we can do this in production. One, eBPF just allows us to grab exactly the data that we need, in exactly the format that we need. So we basically copy a bunch of memory addresses from the kernel into user space, and that's it. So this is super, super lightweight, in terms of what even needs to be done.
And then, the other aspect of it is, because we are doing this all the time, we can do profiling at a relatively low frequency. So the way you can imagine how profilers work, is that they look at the, quote unquote, current stack trace, let's say a hundred times per second. And based on the stacks that we collect whenever we do this, we can build statistics of, in which functions of my program is time being spent. So that's essentially how CPU profilers work. And if you do this at a relatively low frequency, let's say a hundred times per second, then the granularity is not extremely high. But because we're always doing this throughout all of time, we're actually statistically still getting all significant data. And so, there's actually a little bit more to continuous profiling than just doing profiling all the time, right? It, all of a sudden, allows us to do completely new things that we couldn't do before.
With CPU profiles, all of a sudden, we don't just have a glimpse of a ten-second period where we happened to catch the process; we're recording it throughout all of time. And we can look at all of the CPU time in a report that covers the entire process's lifetime, for example, right? And this is much more representative than the 10 seconds that we happened to look at, right? Did the users happen to do this one thing that we're interested in, in those 10 seconds? Software is unpredictable, and that's why we need monitoring and observability in place, so that we can reason about what has happened in the past. And continuous profiling is essentially another aspect of observability. It shines another light on our running applications.
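To make that sampling idea concrete, here is a minimal, hypothetical Go sketch. It is a toy, not how Parca's agent works (the agent captures stacks in the kernel via eBPF): it grabs all goroutine stacks roughly one hundred times per second and tallies the leaf frame, so the hotter a code path is, the more often it statistically shows up in the counts.

```go
// Toy stack sampler: a sketch of the statistical idea behind CPU profiling,
// not Parca's actual mechanism (Parca samples stacks in the kernel via eBPF).
package main

import (
	"fmt"
	"runtime"
	"strings"
	"time"
)

// sample captures the stacks of all goroutines once and counts the leaf frame.
func sample(counts map[string]int) {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // stacks of all goroutines
	for _, g := range strings.Split(string(buf[:n]), "\n\n") {
		lines := strings.Split(g, "\n")
		if len(lines) > 1 {
			counts[strings.TrimSpace(lines[1])]++ // leaf function of this goroutine
		}
	}
}

// busy simulates a hot code path that should dominate the samples.
func busy() {
	x := 0
	for {
		x += x*x + 1
	}
}

func main() {
	go busy()

	counts := map[string]int{}
	tick := time.NewTicker(10 * time.Millisecond) // ~100 samples per second
	defer tick.Stop()
	for i := 0; i < 300; i++ { // sample for roughly three seconds
		<-tick.C
		sample(counts)
	}
	for frame, n := range counts {
		fmt.Printf("%5d samples  %s\n", n, frame)
	}
}
```

Running it, the hot function accumulates far more samples than everything else, which is exactly the statistical property a low-frequency, always-on profiler relies on.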
Wes Reisz: So nothing's for free, right? You say it's lightweight, but what does it actually cost to run continuous profiling in production, with the tool? We haven't talked about Parca yet, but using say, Parca, what does it actually cost to run it in production?
Frederic Branczyk: The cost is essentially, we find, somewhere around 1 to 3% in overhead. You can tweak the sampling ratio a little bit, and it depends a little bit on the workload, but most setups we find are around the 1% mark.
Wes Reisz: But for that investment, you get a profile of not just your machine, right? All of your infrastructure that this is running on?
Frederic Branczyk: All of the infrastructure. Exactly.
Wes Reisz: So talk a little bit more about that. We can run perf on a machine and get some idea of what's happening on an individual Linux kernel. But how does this continuous profiling work for a cloud-native ecosystem?
Frederic Branczyk: The open source project that we created, called Parca (P-A-R-C-A), ships with two components. The agent is the thing that, in, let's say, a Kubernetes environment, you deploy on every node using a DaemonSet. And then, there's a central server that all of the agents send the sampling data they collect to. The server is the thing that you can use to query and analyze this data. Yeah, so that's the setup. And because of our history, we're super close to the Prometheus ecosystem, super close to the Kubernetes ecosystem. So we very intentionally engineered this towards Kubernetes environments as well. That doesn't mean that it doesn't work in other environments, but the integration is particularly good in Kubernetes environments.
Wes Reisz: Okay, so using a DaemonSet, you deploy out to nodes, and then you have visibility into everything that's happening on that node? What about the pods that are running in there, and the containers that are within the pods? Do you have visibility into what's happening there?
Frederic Branczyk: That's right. I can talk a little bit about how it works today, because we're slightly changing it, but this is not ready. But I can talk about that already, because I think it's pretty exciting. The way it works today, is that we discover all of the containers on a host, and then, automatically profile all of those containers.
Wes Reisz: At the networking level, or at the kernel level inside the containers?
Frederic Branczyk: Both. So we look at each individual process in all containers. And so, we do see the user-space stack, so the code that we typically write. But you also see the kernel-space stack, which is something that's pretty unique to Parca. With a lot of profilers, you only get to see the CPU time that's spent in your user-space code. So you only see, I don't know, that you're reading a file, right? Or you're allocating some memory, but you're not seeing that this is causing a page fault in the kernel, or something like that, right? This can be extremely valuable information in order to improve the performance of your software. But to get back to that: in the Kubernetes case, we label all of the data that we collect in a very similar way to what you're probably used to from Prometheus. There's the connection to our past again, where we add labels for the namespace, the pod, the container, all of these things, so that you can slice and dice the data however it's useful to you, right?
If you already know there's this one particular workload that you want to optimize, you just filter all of the data by this one, let's say, container label, right? I don't know, my-web-app, right? And then, you'll only see the CPU time spent by your web app. And you can dive deeper into specifically that. But one extremely powerful thing -- and this is one of those things where continuous profiling is required in order to do it -- is that because we are continuously profiling our entire infrastructure, we can actually merge all of this data into one report, right? And we get a single icicle graph, or flame graph, for our entire infrastructure.
And this is super powerful, because we're not just looking at a single process, we're looking at the CPU time spent in our entire infrastructure. And this often shows really, really surprising uses of CPU time or allocations in the infrastructure. Because often, maybe we don't recognize that there's only one instance of this type of application, right? But we have hundreds of instances of this other one. And we optimize the one that is insignificant in total, or there's a single line of code that is poorly allocating memory.
We keep seeing, over and over, that it's very simple things that maybe we didn't know about, because we shouldn't do premature optimization, right? We should base performance improvements on profiling data. And so, we often see that there are pretty, let's say, naive things that we can improve, that really only need this data. And historically, it was pretty difficult to get this data from production environments, and this is what continuous profiling is intended to democratize.
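As a rough illustration of that merge step, here is a small, hypothetical Go sketch. The Sample type, labels, and stacks are invented for the example and are not Parca's data model or API: it filters labeled stack samples (for instance, by a container label) and aggregates cumulative counts per stack, the kind of folded data a single flame or icicle graph can be rendered from.

```go
// Hypothetical sketch of aggregating labeled stack samples into one report.
// The Sample type and labels are invented for illustration; they are not
// Parca's data model or API.
package main

import (
	"fmt"
	"strings"
)

// Sample is one observed stack, annotated with infrastructure labels
// (namespace, pod, container, ...) the way Prometheus-style metadata works.
type Sample struct {
	Labels map[string]string
	Stack  []string // root ... leaf
	Count  int
}

// merge aggregates samples into cumulative counts per full stack, keeping
// only samples that match the given label filter (nil means keep everything).
func merge(samples []Sample, filter map[string]string) map[string]int {
	report := map[string]int{}
	for _, s := range samples {
		matches := true
		for k, v := range filter {
			if s.Labels[k] != v {
				matches = false
				break
			}
		}
		if !matches {
			continue
		}
		report[strings.Join(s.Stack, ";")] += s.Count // folded-stack format
	}
	return report
}

func main() {
	samples := []Sample{
		{Labels: map[string]string{"container": "my-web-app"}, Stack: []string{"main", "handler", "encodeJSON"}, Count: 120},
		{Labels: map[string]string{"container": "my-web-app"}, Stack: []string{"main", "handler", "queryDB"}, Count: 80},
		{Labels: map[string]string{"container": "batch-job"}, Stack: []string{"main", "resizeImage"}, Count: 300},
	}

	// Whole-infrastructure view: no filter, everything merged into one report.
	fmt.Println(merge(samples, nil))

	// Slice and dice: only the CPU time spent by my-web-app.
	fmt.Println(merge(samples, map[string]string{"container": "my-web-app"}))
}
```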
What are some of the key use cases you’re seeing with Parca? [16:44]
Wes Reisz: Yeah, that's awesome. That's awesome. What use cases and stories are out there, of folks that are using Parca continuous profiling? What are you hearing from folks that have implemented it?
Frederic Branczyk: I think we're seeing three main use cases. So the first one is one that I already touched on, and this is the one that the Google white paper also mentioned, which is that, purely, it's a data problem. If you don't know where CPU time is being spent, it's really hard to do something useful and effective about it, right? And all of a sudden, when you have this data, we see that most organizations can easily cut down their CPU time by 10, 20, 30%. In the most extreme case, we've seen 55% from a single incorrectly configured application. It wasn't even that the code was wrong, it was that it was poorly configured. And the user just didn't know that 55% of their CPU time was being spent in this thing.
Wes Reisz: Yeah. And it probably was perfectly configured, where it was tested with a profiler in a lower environment, right?
Frederic Branczyk: Exactly. Exactly. Yeah. So that's the number one use case. Everybody wants to save money, right? Especially now, with the economic situation, companies are trying even harder to optimize their cloud bills. So that's cost savings. The second one, we find, is actually an even bigger motivator to use this type of technology. It's companies that have some sort of competitive advantage, or business advantage, from having a more performant system. And the really classic ones here are eCommerce companies. There's lots of literature around this, but the faster a website is, for example, the more likely we as humans are to purchase something on that website. And so, for eCommerce-style companies, there's actually an incentive to have faster software, because it means that they will make more money, right? Making more money tends to be an even higher motivator for companies to do something than cost savings.
But there are simpler cases for this as well: infrastructure companies, classic performance-type companies, or high-frequency trading companies, right? Where, with every CPU cycle that you save, you have a competitive advantage over your competitors. I think you can broadly talk about this as just performance improvements, right? But that's where the motivation that we're seeing is coming from. And the third one is what we categorize as incident response. Essentially, like I was talking about earlier, because of how CPU profiling works, we look, let's say, a hundred times per second at what the currently executing function is. What this also gives us -- something that is extremely unique to continuous profiling -- is an answer to the question we often ask when we look at a past incident: "What was my program doing at this point in time?"
Maybe there's a CPU spike, maybe there's a latency spike, or some other indicator that something funky was going on. And CPU profiling data actually tells us what code was being executed, right? It's sampled, but it's still significant, because the code that's executed more, or where we're spending more time, will statistically show up more prominently. So it's a super unique tool, and we've actually built some specific features into Parca where you can select two points in time, or even two time ranges, and say, "Tell me what the difference was between these two points in time." And this is extremely powerful.
You talked about memory leaks earlier, right? And with memory leaks, it was so amazing to see the first time we got this to work. You just see the memory growing over time, right? And you pull up the compare view, and you select a low point, and you select a high point, and it tells you exactly the difference, and where more memory has been allocated, right? And you can see it at a glance. Before, it was like, you go and maybe you managed to hit your application at the right time to grab a memory profile, right? But here, it's really just a click away. A search away. And that's extremely powerful.
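Conceptually, that compare view boils down to diffing two aggregated profiles. Here is a hypothetical Go sketch of the idea (the stack names and values are made up, and this is not Parca's implementation): subtract the values at the low point from the values at the high point, and whatever grew is where the extra memory has been allocated.

```go
// Hypothetical sketch of the "compare two points in time" idea: diff two
// aggregated profiles (stack -> allocated bytes). Not Parca's implementation.
package main

import "fmt"

// diff returns, per stack, how much the value changed between the two profiles.
func diff(low, high map[string]int64) map[string]int64 {
	out := map[string]int64{}
	for stack, v := range high {
		if d := v - low[stack]; d != 0 {
			out[stack] = d
		}
	}
	for stack, v := range low {
		if _, seen := high[stack]; !seen {
			out[stack] = -v // stack disappeared entirely between the two points
		}
	}
	return out
}

func main() {
	// Profile at the low point and at the high point of the memory graph.
	low := map[string]int64{
		"main;handler;parseRequest": 4 << 20,
		"main;cache;add":            64 << 20,
	}
	high := map[string]int64{
		"main;handler;parseRequest": 5 << 20,
		"main;cache;add":            900 << 20, // the leak: memory held by the cache keeps growing
	}
	for stack, delta := range diff(low, high) {
		fmt.Printf("%+d bytes  %s\n", delta, stack)
	}
}
```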
Wes Reisz: When you say you can select two points, is there a tool that you're talking about, or is this something you're actually selecting in Grafana? You talked about the agent and the server. Where are you selecting this?
Frederic Branczyk: The server actually has its own storage. So, as I was saying earlier, the Parca agent is the thing that just collects data on each of your Kubernetes nodes, let's say. And then, centrally, you deploy a Parca server, and this has its own database and everything. And so, that is also the thing that we ship a UI with. So the Parca server is essentially the equivalent, in the Prometheus world, of Prometheus itself, right? It's this really, really simple-to-run, statically linked binary that has its own database. Everything's built in. You just launch the binary, and everything's there. So it ships with the UI, and that's the UI that I was talking about.
Wes Reisz: And so, Parca itself is both the agent and your server?
Frederic Branczyk: That's right. Parca is the umbrella project, and then, we have the Parca server, and the Parca agent as part of that.
Wes Reisz: And both are open source?
Frederic Branczyk: Both are open source.
Why did Polar Signals build its own columnar database, FrostDB? [22:05]
Wes Reisz: Awesome. I know you did some interesting things in that database that you were talking about. Do you want to talk a bit about your database?
Frederic Branczyk: Absolutely. We tried for a really long time not to build a database.
Wes Reisz: Like every observability company?
Frederic Branczyk: But it turns out we wanted to be able to deliver exactly that Prometheus-esque experience, where we label the data, and you can add arbitrary infrastructure labels to your data, however you organize your infrastructure. We don't want to force you into a specific labeling scheme or whatever, right? It's your infrastructure. You should be able to decide how you organize it. And the thing that we found no other database allows us to do is to essentially dynamically create new columns: when we see a label for the first time, the database that we created turns it into a column. And the difference from classic relational databases, like MySQL or PostgreSQL, is that they're row-based. That means the physical unit of how data is stored is the row: all the values of a single row are physically co-located with each other. In a columnar database, that's different.
Actually, all the values of all rows of a column are stored consecutively. And this is extremely powerful for analytical-type workloads, because you can load only the columns that you need, and because they're all physically co-located, you can scan and process them extremely efficiently. And there are a lot of things that you can then do to make that processing really, really efficient, like vectorized instructions and stuff like that. But none of the columnar databases that we found out there allowed us to dynamically create these new columns when we see a new label. And so, we tried things out a little bit, but ultimately, we just decided we had to build something new, because everything else wasn't going to cut it.
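As a rough illustration of what dynamically creating columns means, here is a toy Go sketch; it is only a sketch of the concept, not how FrostDB is actually implemented. The first time a row arrives with a label key the store has not seen before, a new column is created and back-filled for the rows that came earlier.

```go
// Toy dynamic columnar store: each label key becomes its own column, created
// the first time it is seen. A sketch of the concept only, not FrostDB.
package main

import "fmt"

type ColumnStore struct {
	rows    int
	columns map[string][]string // column name -> one value per row
}

func NewColumnStore() *ColumnStore {
	return &ColumnStore{columns: map[string][]string{}}
}

// Insert appends one row. Unknown label keys create new columns on the fly,
// back-filled with empty values for all earlier rows.
func (s *ColumnStore) Insert(labels map[string]string) {
	for key := range labels {
		if _, ok := s.columns[key]; !ok {
			s.columns[key] = make([]string, s.rows)
		}
	}
	for key, col := range s.columns {
		s.columns[key] = append(col, labels[key]) // "" if the row lacks this label
	}
	s.rows++
}

func main() {
	s := NewColumnStore()
	s.Insert(map[string]string{"namespace": "default", "pod": "web-1"})
	// A never-before-seen custom label appears; a new column is created.
	s.Insert(map[string]string{"namespace": "default", "pod": "web-2", "team": "checkout"})

	// Columnar layout: all values of one column are stored consecutively,
	// so an analytical query can scan only the columns it needs.
	for name, values := range s.columns {
		fmt.Println(name, values)
	}
}
```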
The one database that comes close to what we imagined is InfluxDB's new database, InfluxDB IOx. We did talk to the CTO of InfluxDB, and several engineers at InfluxDB, because we didn't want to build a database, and they were extremely nice and shared all of their experience building that database. But at the end of the day, we basically came to the conclusion that we weren't going to be able to use their database in time. So we needed to build something ourselves. And after a couple of months of development, we open sourced FrostDB. Actually, originally it was called ArcticDB, but because of naming difficulties (someone actually held the trademark on that name), we needed to rename it. So now, it's FrostDB, for good.
Wes Reisz: One of the true hard things in software, naming something, right?
Frederic Branczyk: Exactly.
You’re rearchitecting the Parca agent. How is that changing? [25:00]
Wes Reisz: So earlier, you said the way Parca works today, versus the way it's going to work. I'm assuming that's some of the exciting new stuff. What is that all about?
Frederic Branczyk: Yeah. So like I was saying earlier, the way that we collect data today is that we discover all of the containers on a host in the Kubernetes cluster, and we start to profile those. The problem with that is that it's actually not the entire picture of a host. There are more things running on a machine than just the Kubernetes containers. At the very least, you've got to have the Kubernetes kubelet, right? But probably, you'll have a bunch of other daemons -- I don't know, maybe chronyd for time synchronization, systemd, all of these things. They probably run on your machine as well. They probably also use some CPU. And so, we're actually changing this architecture to profile, truly, the entire host. It flips around: we'll still attach the same kind of metadata as we do today, but as opposed to discovering containers first and profiling those, we just record CPU time from all processes. And then, once we've done that, we discover the metadata from Kubernetes using the process ID, basically. So ultimately, it'll be everything we have today, and more visibility.
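One common way on Linux to go from a process ID back to container metadata is via the process's cgroup file. The following Go sketch is a rough, hypothetical illustration of that lookup, not necessarily what the new Parca agent does; the cgroup path layout also varies by container runtime and cgroup version.

```go
// Rough sketch of mapping a PID back to its container via the cgroup file.
// This is one common approach on Linux, not necessarily what the Parca agent
// does; cgroup path formats differ by runtime and cgroup version.
package main

import (
	"fmt"
	"os"
	"strings"
)

// containerIDForPID scans /proc/<pid>/cgroup for a path segment that looks
// like a 64-character container ID.
func containerIDForPID(pid int) (string, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid))
	if err != nil {
		return "", err
	}
	for _, line := range strings.Split(string(data), "\n") {
		for _, part := range strings.Split(line, "/") {
			part = strings.TrimSuffix(part, ".scope")
			if i := strings.LastIndex(part, "-"); i >= 0 {
				part = part[i+1:]
			}
			if len(part) == 64 { // heuristic: container IDs are 64 hex characters
				return part, nil
			}
		}
	}
	return "", fmt.Errorf("no container ID found for pid %d", pid)
}

func main() {
	id, err := containerIDForPID(os.Getpid())
	if err != nil {
		fmt.Println("not in a container, or unrecognized cgroup layout:", err)
		return
	}
	// With the container ID in hand, the pod/namespace/container labels could be
	// looked up from the kubelet or the Kubernetes API and attached to samples.
	fmt.Println("container ID:", id)
}
```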
Wes Reisz: Is that different? Is it going to be installed directly as an agent on the host, rather than as a DaemonSet in the cluster? Will the architecture change?
Frederic Branczyk: The architecture will be exactly the same. It's basically just an internal code change.
What is the Parca community like? [26:34]
Wes Reisz: Tell me, how's the community? Was it October or November of 2021 when you open-sourced Parca? So it's been about a year now. What does your community look like? How's it growing?
Frederic Branczyk: It's been growing like crazy, actually. It's been really cool to see companies using this technology, companies actually benefiting in the way that we set out for the project to do that, right? Sometimes that doesn't happen, right? Sometimes you're just wrong with your hypothesis. So it's super cool to see companies picking up this technology, and just running with it.
The most exciting thing is when people do something that you didn't necessarily anticipate. One of the things that we did really intentionally with Parca is that everything's API first, and everything's really API driven. So we try to keep our UI as simple as possible, so that alternative things can be built around it, right? We want to build a community, so that people can build CI tooling around this, for example, right? Maybe you want to compare previous benchmarking data with this new benchmarking data. The world is your oyster, is the idea. And so, we are actually talking with a couple of folks at Grafana to build a Grafana plugin, so that all of this stuff can be integrated into the observability tooling you probably already have, right? So that's super exciting.
How does continuous profiling data complement the existing observability tooling available? [27:54]
Wes Reisz: I'm curious: is there tooling in the server so that you can do this correlation? You also have an existing observability stack that has all of this other data in it. How do you bring them together?
Frederic Branczyk: Yeah. So this is another thing where we were really intentional about making this complement an existing Prometheus setup really well, right? Basically, the data model is identical to Prometheus. And so, you can label your data identically, because the configurations are exactly the same, to the point where we didn't just design them in a similar way; we're actually literally reusing Prometheus code for configuration. So for most setups, you can copy and paste a lot of the configuration. But yeah, it's absolutely a complementary thing, right? Like I said earlier, continuous profiling shines a different light on another aspect of your running applications. It doesn't replace metrics. It doesn't replace logs. It doesn't replace tracing. But the detail that you get from continuous profiling, you get from none of the other observability signals. So it's absolutely complementary, and you should be using all of them. But exactly, that's also why we are excited about the Grafana integration, because that actually allows us to weave it into what people probably already have set up.
Wes Reisz: Yeah, that's really cool. It's really cool. So what's next for Polar Signals?
Frederic Branczyk: Yeah. So at Polar Signals, we're actually working -- this is unsurprising -- on a cloud product for all of this, so that you don't have to run the Parca server at all. And yeah, we've been working on that really hard. I think we'll get there by the end of the year, so that people can start to use it. The idea is, essentially, that you don't have to run the Parca server at all anymore. The only thing you do is deploy the Parca agent in your infrastructure, and it'll all just magically happen by itself.
Conclusion [29:51]
Wes Reisz: Awesome. Well, Frederic, thank you for joining us on the InfoQ Podcast.
Frederic Branczyk: Thanks for having me.