In this podcast, Charles Humble talks to Akhilesh Gupta, the technical lead for LinkedIn's real-time delivery infrastructure, and also LinkedIn messaging. They discuss the architecture behind LinkedIn’s real-time platform, its building blocks, the frameworks used and other technical details.
Key Takeaways
- At its core the LinkedIn messaging platform is a communication system that allows LinkedIn members to send and receive text, audio, image and video messages. The real-time distribution platform is a separate platform that powers instant dynamic interactions between LinkedIn members.
- Client/server communication in the real-time platform is handled using Server-Sent Events (SSE).
- The real-time system is built using the actor model with the Akka framework, with one Akka actor per persistent connection. A standard Play controller handles SSE connections.
- Subscription data is held using a distributed key value store - specifically Couchbase - taking advantage of Couchbase’s default auto replication and auto-sharding mechanisms.
- The real-time system is primarily built for speed and scale. Whilst it is very reliable, it doesn’t offer the kinds of consistency guarantees that something like Kafka would give you.
Transcript
00:17 Charles Humble: Hello, and welcome to The InfoQ Podcast. I'm Charles Humble and today I'm talking to Akhilesh Gupta. Akhilesh is the technical lead for LinkedIn's real time delivery infrastructure, and also LinkedIn messaging and he's been working on the revamp of LinkedIn's offerings to instant real time experiences. Before this, he was the head of engineering for the Ride experience program at Uber in San Francisco.
00:42 Introductions
00:42 Charles Humble: I had the pleasure of speaking to Akhilesh as he prepared for his presentation at QCon London, which I'll link to in the show notes and I wanted to interview him for the podcast to dive a little bit deeper into some aspects of the architecture behind LinkedIn's real time platform. So on this podcast, we'll focus primarily on that architecture, its building blocks, the frameworks used and other technical details. Towards the end, I'm hoping we might also be able to talk a little bit about spatial computing, as I know that's another area of interest for him. Akhilesh, welcome to The InfoQ Podcast.
01:11 Akhilesh Gupta: Thank you so much for having me and thanks for the opportunity.
01:14 Could you start by describing what the LinkedIn messaging and LinkedIn real time distribution platforms are?
01:14 Charles Humble: Absolute pleasure. So could you maybe start by describing what the LinkedIn messaging and LinkedIn real time distribution platforms are?
01:21 Akhilesh Gupta: Absolutely. So let's start with LinkedIn messaging. So the LinkedIn messaging platform at its core is a communication system that allows members at LinkedIn to send and receive text, audio, image and video messages, and they can also share links and other forms of rich media. Moreover, it stores all these messages for hundreds of millions of LinkedIn members and allows them to search and retrieve them from a variety of clients. Behind the scenes, this platform also powers messaging for our various lines of business, like our recruiter and sales navigator products. As a world class messaging system, we prioritize member trust, privacy and security in everything we do. We're proud to be one of the most reliable and fastest messaging platforms in the world.
02:03 Akhilesh Gupta: The LinkedIn real time distribution platform is a separate platform and it powers instant dynamic interactions between our members. This is the one that I was speaking about at QCon this year. It is a publish-subscribe system that allows our connected clients to subscribe to events on various topics and our backend systems to do massive distribution of real time events on such topics to our connected clients.
02:26 Akhilesh Gupta: So as you can imagine, the messaging platform uses it to power instant messaging, typing indicators, seen receipts, online presence and smart replies. Similarly, LinkedIn Live Video uses it to power real time interactions between members watching a video, like concurrent viewer counts, live reactions, and comments. So if you see any product on LinkedIn that allows members to interact with each other in real time, most likely the LinkedIn real time distribution platform is powering it.
02:54 Charles Humble: So can you describe what the fundamental building blocks of such a system are?
02:58 Akhilesh Gupta: So there are a few building blocks for the real time distribution platform at LinkedIn. The first is the connection, the persistent connection that allows the backend systems to push events to connected clients. So, that's what I mean by connection. Then there's the client subscriber; this is the device which initiates the connection with the server and subscribes to the topics that it is interested in. Then we have the frontend connection manager. This is the first system in the backend that holds the persistent connections with the clients. Each machine in the system needs to manage hundreds of thousands of connections, receive and store subscriptions, and dispatch published events to subscribed connections.
03:38 Akhilesh Gupta: Then we have the backend dispatcher. This is the system that allows us to scale, by dispatching each published event across multiple frontend connection manager machines. So these four building blocks are the fundamental building blocks that comprise the system.
03:52 Can you give an idea of the architecture for the platform?
03:52 Charles Humble: Can you give an idea of the architecture for the platform? So how does the dispatcher talk to the frontend nodes and so on?
03:59 Akhilesh Gupta: I think it's easiest to explain it by the various flows we have in the system. I think the first flow that we should talk about is the initial connection request. This is sent by the client subscriber. At LinkedIn, we use simple libraries on iOS, Android and web to manage what we call the event source connection. Event source is the fundamental building block that we use to actually create this persistent connection, and it's simply a long poll connection. The client sends a connection request to the frontend connection manager and we use what we call Akka actors to manage the lifecycle of each connection held by the frontend connection manager, so that's the connection request.
04:39 Akhilesh Gupta: Then we have the subscription flow. The client initiates a subscription to topics. On receiving the subscription request, the frontend stores it in a simple in-memory mapping of topics to the connections that are subscribed to them. Then it also forwards the subscription request to the dispatcher, which stores a mapping of topics to the frontend machines that are subscribed to them in a distributed key value store. So we're done with the connection and we're done with the subscription.
05:07 Akhilesh Gupta: The next flow is the publish flow. This is where backend systems at LinkedIn publish their events to certain topics to the dispatcher. The dispatcher looks up its key value store to determine the frontend machines to publish the event to, and the frontend machines use the in-memory subscription table to determine which connections to send the event to over the persistent connection. So this comprises the overall architecture of the system.
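To make the subscription and publish flows concrete, here is a minimal Java sketch of the frontend connection manager's in-memory subscription table; the topic strings and the Connection interface are hypothetical stand-ins, not LinkedIn's actual code.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a frontend connection manager's subscription table.
// The topic keys and the Connection type are hypothetical stand-ins for
// LinkedIn's internal types, not the actual implementation.
public class FrontendSubscriptions {

    // topic -> connections on this machine subscribed to that topic
    private final Map<String, Set<Connection>> subscriptions = new ConcurrentHashMap<>();

    // Subscription flow: remember which local connections care about a topic.
    public void subscribe(String topic, Connection connection) {
        subscriptions.computeIfAbsent(topic, t -> ConcurrentHashMap.newKeySet())
                     .add(connection);
    }

    // Publish flow: the dispatcher has already picked this machine; fan the
    // event out to every local connection subscribed to the topic.
    public void publish(String topic, String event) {
        for (Connection connection : subscriptions.getOrDefault(topic, Set.of())) {
            connection.sendEvent(event); // push over the persistent SSE connection
        }
    }

    // Hypothetical handle to one persistent client connection.
    interface Connection {
        void sendEvent(String event);
    }
}
```

The dispatcher keeps the analogous mapping at one level up: topic to frontend machines, held in the distributed key value store rather than in memory.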
05:32 Charles Humble: You mentioned that you're using Akka actors to manage state. So is that just one persistent connection per actor?
05:40 Akhilesh Gupta: That's absolutely correct. Yes, Akka is really good at creating billions of these actors within the system and because they're able to talk to each other so efficiently, it turns out to be the best way to manage the connection lifecycle.
05:53 How does the client communicate with the server?
05:53 Charles Humble: Then how does your client communicate with the server?
05:56 Akhilesh Gupta: Well, the client establishes the initial persistent connection using a simple HTTP long poll request, and it's also called the event source protocol. Then the subscription requests are just regular HTTP requests and all client to server communication is through regular HTTP post requests.
06:14 Akhilesh Gupta: On the other hand, the server communicates with the client by publishing events over the same initially established persistent connection and therefore all server to client communication happens over that one channel.
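To show what that one-way channel looks like on the wire, here is a minimal SSE consumer sketch using the JDK's built-in HTTP client; the endpoint URL is made up, and LinkedIn's real clients use their own iOS, Android and web event source libraries.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal SSE consumer sketch using the JDK HTTP client. The endpoint URL is
// hypothetical; LinkedIn's actual clients use their own EventSource libraries.
public class SseClientSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/realtime/connect"))
                .header("Accept", "text/event-stream")
                .build();

        // The response body is a long-lived stream of lines; SSE events
        // arrive as "data: ..." lines separated by blank lines.
        client.send(request, HttpResponse.BodyHandlers.ofLines())
              .body()
              .filter(line -> line.startsWith("data:"))
              .forEach(line -> System.out.println("event: " + line.substring(5).trim()));
    }
}
```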
06:25 Charles Humble: So why was SSE chosen over, say, WebSockets?
06:29 Akhilesh Gupta: I think there are two main reasons. One is the platform that we were building primarily needed us to send data from the server to the client. That is the one where we wanted it to be a persistent connection. SSE is really good at that, that one-way communication from server to clients. The use cases that we didn't have were where you require bidirectional communication, where for example, you are doing real time metrics analysis or something like that. In which case, the client also needs to be able to send data to the server in a continuous flow. So, given that requirement of server to client communication, SSE fit the bill really well for us.
07:07 Akhilesh Gupta: The second reason was that WebSockets is not always supported across all firewalls and across all network connections, if you look around the world. Therefore, we wanted to have a connection that is just supported widely. So, that was the second reason.
07:26 Akhilesh Gupta: I think the third reason, now that I come to think of it, is that the LinkedIn traffic infrastructure is also built to support HTTP/2, and the requests from client to server can be multiplexed over the same connection with HTTP/2. Therefore, it doesn't really matter whether the client to server requests are going over regular HTTP versus over WebSockets.
07:51 Why were Akka and Play chosen?
07:51 Charles Humble: Now, you said that you use Akka, which is the actor model open-source toolkit from Lightbend, for connection management. I know you also use Play, which is the MVC web framework, which is also from Lightbend. Can you tell me a bit about why those two particular technologies were chosen?
08:06 Akhilesh Gupta: So at LinkedIn, we work at massive scale and we need to deal with tens of thousands of requests per second, and millions of concurrent connections. Traditional synchronous web frameworks need a large number of machines to handle such scale. So Play specifically is one of the modern completely asynchronous non-blocking frameworks. The beauty of such frameworks is that a thread is assigned to a request only when there is any computation that needs to happen. Most modern web services are IO intensive rather than compute intensive, so a small number of threads is able to serve a large number of requests. Thus Play allows us to serve very high incoming traffic with a very small number of machines.
08:44 Akhilesh Gupta: The other challenge we had was to manage hundreds of thousands of persistent connections on each machine. So we needed a lightweight concurrency model and Akka fits the bill really well. It allows us to create Akka actors which can encapsulate the state and behavior of a connection. Since Akka actors communicate exclusively by messages and use a thread only when they're processing a message, you can once again support a large number of Akka actors in the system with a very small number of threads, and thus publishing events to a connection becomes as simple as sending a publish message to the appropriate Akka actor handling that connection. Having said all this, Play and Akka are not the only choices. There are many other similar modern web frameworks and concurrency models that one can use for similar systems.
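As a rough illustration of one actor per connection, here is a sketch using Akka's classic Java API; the Publish message and ClientConnection interface are hypothetical stand-ins for LinkedIn's internal types.

```java
import akka.actor.AbstractActor;
import akka.actor.Props;

// Sketch of one Akka actor per persistent connection (classic Java API).
// The Publish message and ClientConnection type are hypothetical stand-ins
// for LinkedIn's internal types.
public class ConnectionActor extends AbstractActor {

    // A message asking this actor to push an event down its connection.
    public static final class Publish {
        final String event;
        public Publish(String event) { this.event = event; }
    }

    private final ClientConnection connection; // encapsulated per-actor state

    private ConnectionActor(ClientConnection connection) {
        this.connection = connection;
    }

    public static Props props(ClientConnection connection) {
        return Props.create(ConnectionActor.class, () -> new ConnectionActor(connection));
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                // Messages are processed one at a time, so no locks are needed.
                .match(Publish.class, msg -> connection.sendEvent(msg.event))
                .build();
    }

    // Hypothetical handle to the underlying SSE connection.
    public interface ClientConnection {
        void sendEvent(String event);
    }
}
```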
09:26 Charles Humble: In terms of Play, are you just using a regular Play controller to accept the incoming connection?
09:32 Akhilesh Gupta: Yes. So Play has what we call the event source API, and that event source API can be used in a regular Play controller to accept an incoming SSE connection.
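A controller along those lines might look like the following sketch; for clarity it writes the SSE wire format by hand over a chunked response rather than using Play's EventSource helper, and the tick source stands in for real published events.

```java
import akka.stream.javadsl.Source;
import akka.util.ByteString;
import play.mvc.Controller;
import play.mvc.Result;

import java.time.Duration;

// Sketch of a Play controller accepting an SSE connection. It writes the SSE
// wire format ("data: ...\n\n") by hand; Play's EventSource helper does the
// equivalent. The tick source is a stand-in for real published events.
public class RealtimeController extends Controller {

    public Result connect() {
        Source<ByteString, ?> events =
                Source.tick(Duration.ZERO, Duration.ofSeconds(5), "heartbeat")
                      .map(msg -> ByteString.fromString("data: " + msg + "\n\n"));

        // A chunked text/event-stream response keeps the HTTP connection
        // open as a persistent server-to-client channel.
        return ok().chunked(events).as("text/event-stream");
    }
}
```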
09:43 Does using the actor model bring you specific advantages versus using a scheduled executor service or a thread pool that manages all of your connections?
09:43 Charles Humble: Do you think that the actor model brings you specific advantages versus say having, I don't know, a scheduled executor service, or a thread pool that manages all of your connections?
09:54 Akhilesh Gupta: Yes. I think the main benefit is that Akka provides a level of abstraction that makes it easier to write concurrent distributed systems and scale them. So instead of calling methods, Akka actors send messages to each other and instead of blocking on each other, they simply wait for the reply message. This turns out to have amazing properties. Each actor becomes completely independent of other actors. There is no shared state and since it processes its messages one at a time, there are no locks or synchronization techniques that are required. So since senders are not blocked and a thread is utilized only when there is a message to process, billions of actors can be efficiently served on a few threads, reaching the full potential of modern CPUs.
10:36 Akhilesh Gupta: On the other hand, a traditional scheduled executor service would need a thread for each connection. It would also cause them to be blocked on each other and need us to carefully craft synchronization solutions over shared state and continuously tweak and manage a large thread pool, in return for possibly poor performance. So having said all this, Play and Akka are not the only frameworks, as I said before, to achieve the above, but similar concurrency libraries and frameworks can be used to provide the same benefits.
11:03 Given that you're using Play and Akka, are you mainly coding in Java?
11:03 Charles Humble: Given that you're using Play and Akka, are you mainly coding in Java or are you using Scala, or a mixture? Or what are the main languages you're using?
11:11 Akhilesh Gupta: So at LinkedIn, we are indeed a Java shop. There are significant advantages to aligning on a language of choice in large software development teams and it allows for easier movement of engineers. It allows people to consolidate how we build and deploy and test our stuff. So the expertise that has been built over time is really useful. So yeah, we are primarily using Java.
11:33 Do you have to replicate or shard subscription data in order to get the kind of scaling that you need?
11:33 Charles Humble: Now, you presumably have to hold a key value table for the subscription data, which tells you which users are watching which videos. Do you have to replicate that data or shard that data in order to get the kind of scaling that you need?
11:47 Akhilesh Gupta: Yes, indeed. We use a distributed key value store, and specifically we use Couchbase, but any other distributed key value store works. The best part is that we use Couchbase as it is, with its default auto-replication and auto-sharding mechanisms. Distributed key value stores work really well if you use them for just that: get and put key values. As long as you don't use things like global secondary indices or other fancy features, they actually scale really well for large datasets with millions of key value pairs, even with the default auto-replication and auto-sharding mechanisms. So we're not doing anything special on top of a vanilla Couchbase store.
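In terms of the Couchbase Java SDK (version 3 shown here), that plain get-and-put usage pattern amounts to something like this sketch; the connection details, bucket name and key scheme are hypothetical.

```java
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonArray;

// Sketch of the dispatcher's plain get/put usage of Couchbase (Java SDK 3).
// Connection details, bucket name and key scheme are hypothetical.
public class DispatcherSubscriptionStore {

    private final Collection collection;

    public DispatcherSubscriptionStore() {
        Cluster cluster = Cluster.connect("couchbase://localhost", "user", "password");
        this.collection = cluster.bucket("realtime-subscriptions").defaultCollection();
    }

    // Put: topic -> the frontend machines subscribed to it.
    public void saveSubscribers(String topic, JsonArray frontendHosts) {
        collection.upsert("topic::" + topic, frontendHosts);
    }

    // Get on publish: which frontend machines should receive this event?
    public JsonArray subscribers(String topic) {
        return collection.get("topic::" + topic).contentAsArray();
    }
}
```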
12:26 Do you offer any kinds of consistency guarantees?
12:26 Charles Humble: Do you offer any kinds of consistency guarantees? How do you ensure that say a like gets to its destination?
12:32 Akhilesh Gupta: So the real time platform is designed to be really fast. We promise end-to-end publish latencies of less than a hundred milliseconds. We monitor each part of the publish pipeline to ensure that dropped events are extremely rare and kept to an absolute minimum. If a like happens to be dropped, it will still show up when the member refreshes their live video. If something more critical like a message or comment gets dropped in a very rare scenario, we have detection for missed messages on the client, and we do this with things like pointers to the previous message, and we synchronize state with the server when that happens. So we are not a completely guaranteed delivery platform, but at the same time, it is extremely rare and we have built in mechanisms to make sure that member actions are represented consistently on our platform.
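The client-side detection he mentions can be sketched like this: each event carries a pointer to the id of the event before it, and a mismatch triggers a state sync with the server; the field names and the resync step are hypothetical.

```java
// Sketch of client-side missed-event detection using a pointer to the
// previous message. Field names and the resync call are hypothetical.
public class GapDetector {

    // A received event carrying its own id and the id of the event before it.
    public record Event(String id, String previousId) {}

    private String lastSeenId;

    public void onEvent(Event event) {
        if (lastSeenId != null && !lastSeenId.equals(event.previousId())) {
            // We missed at least one event: fall back to synchronizing
            // state with the server.
            resyncWithServer();
        }
        lastSeenId = event.id();
    }

    private void resyncWithServer() {
        // Hypothetical: fetch authoritative state over a regular HTTP request.
    }
}
```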
13:18 Charles Humble: It's a slightly cheeky question in a way, but why did you not build a system on top of something like Kafka, which would give you those consistency guarantees? I know that Kafka is used quite extensively within LinkedIn.
13:30 Akhilesh Gupta: So there are a few reasons and I'll come to consistency guarantees in a bit. If we used Kafka, I think one of the first constraints that we would run into is that the number of Kafka topics that we would need would be equal to the number of live videos. So if, let's say, we are running a hundred live videos at the same time, then we would need to have that many Kafka topics, and then each frontend server would need to consume from all of these topics, because we cannot be sure whether a particular connection, which is watching a particular live video, is connected to frontend server one or frontend server two. If consumption can't keep up, then adding more frontend servers doesn't help, because each new frontend server still needs to consume from all of these Kafka topics at the same time.
14:17 Akhilesh Gupta: Then, live video is what we call a broadcast topic because there are many subscribers for the same topic, but we also have what we call personal topics where there's one subscriber per topic. So we do this for things like messaging, because you get your own messages, or your own typing indicators. To do that, we now need millions of topics in Kafka to support that because each member has their own messages topic, or has their own typing indicators topic and millions of topics in Kafka just doesn't work, at least the way Kafka's built today. So these are like the fundamental reasons that we couldn't use Kafka, mostly scale issues.
14:56 Akhilesh Gupta: Then the one final thing that I wanted to mention was that in my QCon talk I've spoken about cross data center publish, in which we are able to do regular HTTP post requests across data centers, to make sure that a published event reaches all of its subscribers across all data centers. With Kafka, we would need to use Kafka MirrorMaker, which slows things down considerably, because now it's trying to mirror each event in each topic across all data centers and we want to be fast. So that's the other thing that held us back.
15:28 Akhilesh Gupta: So yes, Kafka has that benefit that we could achieve absolutely reliable delivery and I think we do use Kafka for situations in which we need absolutely reliable delivery. For example, we use it for making sure that a message sent by a member is guaranteed to be persisted for the receiver, but for things like this, where we want to be fast and time is what matters, it turns out that the system that we built was way more efficient at doing that.
15:57 How do you perform load testing on a system like this?
15:57 Charles Humble: In a blog post that you published on the LinkedIn Engineering blog, which I can link to in the show notes for this episode, you gave some statistics about how the system performed. What I was intrigued by was how you actually got those numbers. How do you perform load testing on a system like this?
16:13 Akhilesh Gupta: Very good question. So, when I was building the system and needed to do performance testing without any real traffic, I built a tool called Hurricane. Interestingly, that was built using Akka actors too. So in this case, the tool would spawn hundreds of thousands of Akka actors to act like LinkedIn clients, establishing connections with our frontend connection manager. The tool would then use Akka actors to dynamically add thousands of connections and maintain metrics on consumed chunks and the rate of data consumption. We used the tool to ramp up connections until we managed to break our server. But tools can only get you so far. Nothing can really replicate production traffic from millions of connections from the widespread LinkedIn member base.
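Hurricane itself is internal to LinkedIn, but a load driver in that spirit might look like this sketch: open many concurrent SSE connections and count the chunks consumed. The endpoint and connection count are made up; note that the JDK client ties up a thread per streamed body here, which is exactly the per-connection cost the Akka actor approach avoids.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a load driver in the spirit of Hurricane: hold many concurrent
// SSE connections open and count consumed chunks. The endpoint and connection
// count are made up; the real tool used Akka actors for lighter concurrency.
public class LoadDriverSketch {
    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        AtomicLong chunksConsumed = new AtomicLong();

        for (int i = 0; i < 100; i++) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/realtime/connect"))
                    .header("Accept", "text/event-stream")
                    .build();
            // Each async request holds a persistent connection open and
            // counts every line the server pushes down it.
            client.sendAsync(request, HttpResponse.BodyHandlers.ofLines())
                  .thenAccept(response ->
                          response.body().forEach(line -> chunksConsumed.incrementAndGet()));
        }

        // A real tool would ramp connections up until the server breaks and
        // report the consumption rate; here we just wait and print a total.
        Thread.sleep(60_000);
        System.out.println("chunks consumed: " + chunksConsumed.get());
    }
}
```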
16:54 Akhilesh Gupta: So then, before exposing the real time platform to real users, we used a technique called dark ramp, and I mention that in the blog post, where we made our LinkedIn iOS, Android, and web clients establish real connections, but did not use those connections for any real user visible impact. We directed more and more of this dark production traffic to a single machine by reducing the number of machines in rotation, to test the limits of the system when exposed to real production traffic.
17:19 Charles Humble: Were there any surprises in doing that? Did it uncover anything you weren't expecting?
17:23 Akhilesh Gupta: Oh yeah. So I mentioned like four or five different things in the post. I think the most interesting one was what is called the ephemeral port limit. So our frontend load balancer was obviously making a lot of connections with our frontend connection manager, and we actually ran out of ports that could be opened on our load balancer. We had expected things like running out of file descriptors on our frontend connection manager, and we actually accounted for that, but we never imagined that we would run out of the number of ports that the load balancer could open to our systems. So, that was very interesting.
17:57 How do you use the real time system to indicate presence?
17:57 Charles Humble: You mentioned as well that you use the real time system for indicating when someone is connected to LinkedIn. So, I think you call it presence in LinkedIn terms, which again I thought was intriguing. I think there's a blog post on this as well, but I wondered if you could maybe give a bit of a rundown of that particular scenario and how that works?
18:16 Akhilesh Gupta: So this is very interesting. So we first built the real time platform and we built it primarily for messaging use cases and then we realized we have this connection and this connection really represents that a LinkedIn iOS, Android or web client is open, that it has an online persistent connection to our frontend machine and we could use this. We could use this to know that a particular member is currently online. If you think about it, it feels like, "Oh, we already know that the connection is there and therefore it should be so easy to just know that a particular person is online and therefore just show the green dot." It turns out it's not so because there are all sorts of jittery connections, people going under tunnels and stuff like that.
18:59 Akhilesh Gupta: So when we initially built it based on that, it would just go red, green, red, green, red, green, and it would just not make sense. So we had to build this thing called jitter detection, where we would smooth out all this jitter noise. The way we did it is actually very simple. We simply emitted heartbeats from this connection to the frontend connection manager. As long as these heartbeats are there, we consider a person online, so they would be emitted, for example, every 30 seconds. If they stop coming in for a significant period of time, and this is what causes the smoothing, because if they stop coming in for like two seconds it doesn't matter, but if they stop coming in for more than a minute then it starts to matter, that's when you go offline. Then we process these heartbeats, again using Akka actors, where each Akka actor is now representing an online member. Then we used that as our source of truth for whether a person is online or not and displayed that to our end users.
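That smoothing boils down to a heartbeat timeout: a member stays online while heartbeats keep arriving and flips to offline only after a grace period of silence. Here is a minimal single-member sketch, with the intervals taken from his description and the names made up.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of heartbeat-based jitter smoothing for presence. Intervals follow
// the ones mentioned above (heartbeat every ~30s, offline after ~1 minute of
// silence); the class and method names are made up for illustration.
public class PresenceTracker {

    private static final Duration OFFLINE_AFTER = Duration.ofMinutes(1);

    private Instant lastHeartbeat = Instant.now();

    // Called each time a heartbeat arrives over the member's connection.
    public void onHeartbeat() {
        lastHeartbeat = Instant.now();
    }

    // A two-second blip stays "online"; a minute of silence flips to offline.
    public boolean isOnline() {
        return Duration.between(lastHeartbeat, Instant.now())
                       .compareTo(OFFLINE_AFTER) < 0;
    }
}
```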
20:09 Akhilesh Gupta: There's one final thing I want to mention there, which is: how do we distribute the fact that somebody went offline to the people who are viewing their presence? Again, we use the real time platform. Everybody who wants to look at the presence of a particular member subscribes to their broadcast topic for whether they're present or not. If we detect that that person has gone offline, we use the real time platform to publish that information to everybody who is watching out for that person's presence.
20:26 Where do you think spatial computing could have a real impact?
20:26 Charles Humble: Now, last time you and I spoke, that was in preparation for your fantastic talk at QCon London, which again, I'll link to in the show notes. During that conversation that we had, you mentioned you had a particular interest in spatial computing, which is the term broadly synonymous with virtual and augmented and mixed reality. So I thought it would be interesting to talk about that as well. Can you give me some examples of where you think spatial computing could have a real impact?
20:51 Akhilesh Gupta: Sure. It's very interesting that we spoke about all this before COVID gripped the entire world. I think my answer back then has become even more relevant with remote work and completely online interactions. I think I'd spoken about completely immersive interactive experiences in the professional world, where it's hard to tell the difference between virtual presence and being there in person. I mentioned that spatial computing will allow remote students to experience the complete classroom experience, including interacting with peers and fellow students and that it will allow people to attend, participate and network in conferences without actually traveling to the venue, which seems like it is required in this new world.
21:32 Akhilesh Gupta: Then I think what will happen is that colleagues from every corner of the world will sit in the same virtual room for a meeting and it will be indistinguishable from meeting in person. Microsoft Teams recently launched Together mode, where we are already starting to see the first glimpses of what I spoke about. It actually makes you feel that you're talking to a live audience. So we at LinkedIn are quite excited about how this will enhance the way we connect and how it connects the world's professionals to make them more productive and successful.
22:00 Charles Humble: That's a great place to finish, I think. Thank you very much indeed for your time and for joining us today.
22:06 Akhilesh Gupta: Thank you so much Charles. I really appreciate the opportunity.