Duncan Macgregor speaks with Wes Reisz about the work being done on the experimental Graal compiler. He talks about the use cases where the new JIT compiler excels compared to C2. In addition, Duncan talks about the relationship of Graal to Truffle. The two then discuss TruffleRuby, a language implementation Duncan works on at Oracle Labs that is built on the Graal/Truffle stack. Finally, the podcast wraps with a discussion of Project Loom and its relationship to TruffleRuby and Graal.
Key Takeaways
- Graal is a replacement for the JVM’s C2 JIT compiler. It first shipped in Java 9 as the compiler behind JEP 295 (Ahead-of-Time Compilation). As of Java 10 (JEP 317), Graal is available as an experimental JIT compiler on the Linux x64 platform.
- Graal is written in Java and excels at implementing code that takes a functional approach to solve problems (such as Scala). It can also offer improvements / optimizations for other languages (including other non-traditional JVM languages such as C and Ruby).
- Truffle is a language implementation framework used by Graal. The idea is that rather than having to write a compiler for your language, you write an interpreter. This gives you the ability to write specializations at a higher level of abstraction, which yields both performance and better understandability.
- Truffle’s architecture and design enable things like interop between unrelated languages, shared garbage collection, and common handling of types.
- TruffleRuby and JRuby started off with a lot of shared code, but they have since branched. JRuby today focuses on integration with other Java classes; it compiles to bytecode and then relies on the C2 JIT to run on the JVM. TruffleRuby doesn’t try to compile to Java classes at all and uses the Truffle framework to compile only the things it needs. TruffleRuby is able to run most native CRuby extensions.
- Project Loom aims to add one-shot delimited continuations to the JVM. On top of these it builds fibers (a much lighter concurrency primitive than threads), and a JVM can literally run millions of them.
Show Notes
In your talk “Graal: Not just a new JIT for the JVM” - what’s the TL;DR?
- 01:55 Graal can do a better job of optimizing modern styles of coding like Streams and Scala.
- 02:10 It has some trade-offs, which we need to deal with, in order for it to become the default JIT on the JVM, but it can do more than that.
- 02:25 It can work with other languages, can work with code that is not going to be running a normal JVM, and it can be embedded in applications in interesting ways.
- 02:35 Because it’s written in Java, we can build interesting frameworks in it.
So Graal is fully written in Java?
- 02:40 The JIT is written entirely in Java; the compiler interface, JVMCI, is mostly written in Java (there’s a small set of stubs in HotSpot to communicate with it).
What is Graal, and where did it come from?
- 03:00 Graal is a new Just In Time (JIT) compiler for the JVM.
- 03:10 It’s been going for quite a few years now, from work that Thomas Wuerthinger was doing.
- 03:20 It’s got some roots in the Maxine project years ago, which was a JVM (almost) entirely written in Java.
What’s the relationship between Truffle and Graal?
- 03:35 Truffle is a language implementation framework.
- 03:40 Instead of writing a compiler for your language, you can write an interpreter, and if you write it in a specific way using this framework, then as you run the program it will be specialized to create a compiled version.
Is it a higher level of abstraction with performance tradeoffs?
- 04:15 You’re working at the level of a high-level interpreter, so an ‘if’ statement is a node that gets evaluated, followed by its child nodes.
- 04:35 Your interpreter looks like a straightforward node-traversal interpreter.
- 04:40 You can then write some optimizations; for example, for String equality you could write a specialization that checks whether the two objects are the same object, in which case you can short-circuit and return true (see the sketch after this list).
- 05:00 If you know the strings have different lengths, you can quickly return false.
- 05:05 In the background, this means that when you run the program the specializations actually used are picked and profiled, and if only one is ever seen, the code is optimized for that implementation.
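To make the shape of such specializations concrete, here is a minimal sketch in the style of the Truffle DSL. The node and method names are invented for this illustration; the DSL’s annotation processor generates the concrete subclass that does the profiling and rewriting.

```java
import com.oracle.truffle.api.dsl.Specialization;
import com.oracle.truffle.api.nodes.Node;

// Hypothetical string-equality node. At runtime, only the specializations
// that are actually hit become active, and Graal compiles just those.
public abstract class StringEqualsNode extends Node {

    public abstract boolean execute(String a, String b);

    // Same object: short-circuit to true without comparing characters.
    @Specialization(guards = "a == b")
    protected boolean sameObject(String a, String b) {
        return true;
    }

    // Different lengths: the strings can never be equal.
    @Specialization(guards = "a.length() != b.length()")
    protected boolean differentLengths(String a, String b) {
        return false;
    }

    // General case: full character-by-character comparison.
    @Specialization
    protected boolean fullCompare(String a, String b) {
        return a.equals(b);
    }
}
```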
How is Truffle implemented?
- 05:30 It’s a set of libraries, some annotations and some annotation processing that goes on when you compile your program, to create generated classes that will be used in the background.
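For the hypothetical node sketched above, the processor would emit a concrete class along the lines of StringEqualsNodeGen (the NodeGen suffix and create() factory are the DSL’s convention), which is what you actually instantiate:

```java
// Hypothetical usage of the generated class from the sketch above.
StringEqualsNode node = StringEqualsNodeGen.create();
boolean eq = node.execute("abc", "abd");  // specializations activate as inputs are seen
```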
How can you replace C2 with Graal?
- 05:55 From a user’s point of view, they just need to add some command-line options, and everything should work the same way. (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler)
- 06:05 From our point of view, it’s a lot easier to work with.
- 06:15 C2 is notoriously hard to work with and become proficient and productive with.
- 06:20 It’s written in C++, and you’ve got to understand a lot about HotSpot to really do good stuff with it.
- 06:30 Graal in some ways is a lot simpler to understand - if you’ve got any knowledge of compilers, you can look at the class structure and see there is a high tier, a mid-tier and a low tier.
- 06:40 It’s quite easy to see that you’ve got something for the AMD64 architecture that takes a set of nodes, and produces a set of AMD64 specific nodes - those transformations are easy to understand.
- 06:55 At the front end, where your bytecode comes in, it’s a lot easier to write transformations and substitutions.
- 07:05 When dealing with bytecode, both C2 and Graal will have specific things they are looking for and trying to replace.
- 07:15 Sometimes it’s entire methods, sometimes it’s invocations of methods.
- 07:20 Some of these are implemented in the native VM, so you just want to call that specific routine.
- 07:30 In other cases you know you can do, say, encryption better than bytecode can express it, because you have specific instructions available in the processor.
What are the performance differences between C2 and Graal?
- 07:55 We’re generally seeing better performance with Graal - there’s a couple of benchmarks where we’re on par or slightly behind with the community edition of Graal.
- 08:05 There’s also an enterprise edition, a closed-source version with some additional optimizations - I believe its results are better on all the benchmarks they have tried.
- 08:25 Hopefully, if a real use case is found where Graal is not as fast, it will be easier to understand what needs to be done and how to fix it.
What are the use cases that Graal excels at?
- 08:45 It excels at things like Scala and anything that does things with Lambdas in Java.
Is Graal a good general replacement for C2, or for specific use cases?
- 09:00 At the moment, it’s a good replacement for C2, provided that you’ve got a long enough running program that the time taken for running the JIT is going to be worth it.
What is the startup penalty?
- 09:25 There is an option, when you start the JVM, to bootstrap the JVM compiler. (-XX:+BootstrapJVMCI)
- 09:30 On my machine, that takes about 8 seconds, which is non-trivial.
- 09:35 We have a plan (which we’re working towards) to ahead-of-time compile Graal.
- 09:40 Most of the work of the compiler is the same, whether you’re just-in-time compiling or ahead-of-time compiling.
- 09:45 If we can do that, then we will be able to link it into the JVM as a shared library, and you won’t need to pay that start-up penalty.
- 09:55 There are some other cases where it may have a downside as well; in its current form, it uses the JVM heap to store its compilation data.
- 10:05 If you’ve tuned your memory carefully to what your application needs, or tuned your GC to the garbage your application creates, then using Graal will throw off those assumptions.
- 10:30 If we compile it into a shared library, we can avoid those problems.
Can you talk about how TruffleJS can be so concise?
- 11:00 It’s not a totally fair comparison: V8 source code includes a GC, a JIT, and a whole host of things which won’t be part of TruffleJS.
- 11:15 We can share a lot of that infrastructure between languages, so it’s not necessary for every language to implement their own JIT because that’s Truffle and Graal’s job.
- 11:25 They don’t need to implement a GC, because one is provided by the JVM.
- 11:30 We can make regular expressions fast: we can share a lot of code between multiple languages.
- 11:40 These are all parts that are common between different languages.
- 11:50 A lot of ways of implementing a dynamic language can be represented in our interpreter in a very simple way.
- 11:55 If you think of the addition operator in JavaScript - there are a lot of cases it has to handle.
- 12:10 You can create a PlusNode that evaluates the left-hand side, then the right-hand side, and then handles the cases: long plus long, double plus double, and so on (see the sketch below).
- 12:30 The code Truffle generates will take the required case and optimize for it, if that’s the only one that has been seen.
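A sketch of what such a PlusNode might look like with the Truffle DSL; the child-node plumbing and the rest of JavaScript’s ‘+’ semantics (coercions, objects, and so on) are elided, and all names are illustrative.

```java
import com.oracle.truffle.api.dsl.Specialization;
import com.oracle.truffle.api.nodes.Node;

// Illustrative "+" node for a JavaScript-like language.
public abstract class PlusNode extends Node {

    public abstract Object execute(Object left, Object right);

    // Integer add. If it overflows, this specialization is abandoned
    // and the node rewrites itself to a more general case.
    @Specialization(rewriteOn = ArithmeticException.class)
    protected long addLongs(long left, long right) {
        return Math.addExact(left, right);
    }

    @Specialization
    protected double addDoubles(double left, double right) {
        return left + right;
    }

    // In JavaScript, "+" on strings means concatenation.
    @Specialization
    protected String concat(String left, String right) {
        return left + right;
    }
}
```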
How do you handle types with Truffle?
- 12:50 We have an interop framework (which is being redesigned at the moment) but the principle is the same.
- 12:55 There are certain operations that you can ask of an object from another language.
- 13:10 You can ask if it’s of various basic types; whether it has keys; whether you can read things from it.
- 13:15 That covers quite a decent set of primitive types and arrays, so you can get (for example) the first element from the array.
- 13:25 You can ask whether the object is executable, in which case you can try executing it with some arguments, or whether it accepts messages.
- 13:35 There are some other messages geared towards other low-level types like C, such as providing a native representation or a pointer to itself.
- 13:50 This allows you to pass objects between languages, and often work with them without having to know they are from another language (sketched below).
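For flavour, here is roughly how asking those questions looks in the interop framework’s later incarnation, InteropLibrary; since the API was being redesigned at the time of this conversation, treat the exact calls as indicative rather than definitive.

```java
import com.oracle.truffle.api.interop.InteropException;
import com.oracle.truffle.api.interop.InteropLibrary;

public final class InteropSketch {

    // Ask language-agnostic questions of a foreign object, which may come
    // from JavaScript, Ruby, or any other Truffle language.
    static Object describe(Object foreign) throws InteropException {
        InteropLibrary interop = InteropLibrary.getUncached();
        if (interop.fitsInLong(foreign)) {
            return interop.asLong(foreign);              // basic type question
        }
        if (interop.hasArrayElements(foreign)) {
            return interop.readArrayElement(foreign, 0); // read the first element
        }
        if (interop.isExecutable(foreign)) {
            return interop.execute(foreign, 1, 2);       // call it with arguments
        }
        return foreign;
    }
}
```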
You’ve worked on the TruffleRuby project?
- 14:10 Chris Seaton started the project for his Ph.D. (though you’ll have to check on that) and it started off life in JRuby.
- 14:20 We branched away from JRuby a couple of years ago because we were starting to share less code and less of the implementation.
- 14:30 We use some of the same libraries for text encoding, but other than that there’s no code sharing between them.
What’s the difference between JRuby and TruffleRuby?
- 15:05 JRuby is aiming much more at integration with other Java classes.
- 15:15 It has Java classes for many of the underlying Ruby data types, which unfortunately are public, so people can make their own Ruby classes.
- 15:25 It has a fairly wide API at this point - it’s hard to reduce to anything smaller.
- 15:35 It has an interpreter which it uses initially, and a compiler to JVM bytecode, and then relies on the JIT in the JVM to run that code as fast as possible.
- 15:45 Charlie (Nutter) has done a lot of very impressive work - introducing his own intermediate representation in his bytecode compiler to do enough analysis.
- 16:00 TruffleRuby is not trying to compile things to Java classes at all - it’s just using the Truffle framework to JIT compile the things we need.
What are you seeing with the performance of TruffleRuby?
- 16:25 The biggest thing we are seeing is that we can use a lot of native Ruby extensions, unlike JRuby, which maintains its own versions because it doesn’t support native CRuby extensions.
- 16:40 The ultimate goal is to be able to allow you to run any Gem with native extensions - we’re getting there.
- 16:50 We have some compatibility problems that we’re ironing out, but we’re getting very close.
- 16:55 We used to maintain a lot of patches for various Gems, but we’ve been able to reduce it to a much smaller set, and we’ve got people testing a lot more of those Gems for us.
- 17:15 There are still some compatibility corner cases that we’re trying to hunt down.
- 17:20 I’ve spent the last day trying to find out why a particular Gem’s test suite causes a segfault.
- 17:30 I think the issue is to do with how big a Fixnum can be in Ruby versus how big it can be in TruffleRuby; that can throw off an assumption and cause an off-by-one error in a String buffer.
- 17:40 There are some niggling things like that, and the more people that are testing these things, the more we’ll be able to find and fix.
What will a fix look like for that type of issue?
- 18:05 In this case, it’s making sure that we tell a C extension that a number is a Fixnum only if it fits in 63 bits, rather than 64, so that its logic for when a number can be converted into a C long works without any changes.
- 18:20 If you allow 64 bits, you can have the most negative long, and if you take the negative of it you get the same value back, because of the way two’s complement works (see the snippet below).
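A quick Java illustration of both halves of that reasoning: the most-negative-long corner case, and a 63-bit range check of the kind such a fix relies on (illustrative, not TruffleRuby’s actual code).

```java
public class FixnumCornerCase {
    public static void main(String[] args) {
        long x = Long.MIN_VALUE;       // -2^63, the most negative long
        System.out.println(-x == x);   // true: negating it overflows back to itself

        // A value fits in a signed 63-bit fixnum iff it survives a
        // one-bit round-trip shift (i.e. its top two bits agree).
        System.out.println(((x << 1) >> 1) == x);      // false: needs all 64 bits
        System.out.println(((42L << 1) >> 1) == 42L);  // true: fits in 63 bits
    }
}
```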
How do you encourage people to test with TruffleRuby?
- 18:40 If they want to try it, and they’ve got a Gem they want to use, go to graalvm.org or build TruffleRuby, and install the Gem using the standard Gem or bundle commands, and run your test suite.
- 19:00 If it doesn’t work, reach out to us on GitHub and we’ll talk to you.
What is project loom?
- 19:20 Loom is a project which aims to add one-shot, delimited continuations to the JVM.
- 19:25 One-shot means that you can run the continuation once, and you can’t run from the same point multiple times - for example, not in a loop.
- 19:45 It’s delimited, which means only the bit of code inside your continuation is captured, so you’ve got a fairly small stack (see the sketch below).
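The early Loom prototypes exposed this shape directly through a Continuation class (it has since moved to an internal JDK package, so treat this as a sketch of the concept rather than a stable API):

```java
// Sketch against the early Loom prototype API. Delimited: only the stack
// between run() and yield() is captured. One-shot: each suspension point
// can be resumed exactly once.
ContinuationScope scope = new ContinuationScope("demo");
Continuation cont = new Continuation(scope, () -> {
    System.out.println("before yield");
    Continuation.yield(scope);   // suspend; control returns to run()'s caller
    System.out.println("after yield");
});
cont.run();   // prints "before yield", returns at the yield point
cont.run();   // resumes once more: prints "after yield"
```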
So how does that compare to Kotlin’s continuations?
- 20:05 They appear to be one-shot as well, because they don’t allow cloning.
- 20:15 The important distinguishing factor with Kotlin’s continuations is that they are implemented entirely within the compiler.
- 20:25 For that to work, every method between the continuation’s start and where you want to yield from it on the stack has to be visible to the compiler.
- 20:40 The difference with project Loom is that the code between the two points on the stack doesn’t need to know about continuations at all.
- 20:50 So suppose you want to write something like a web service, which gets a request in, makes a database request over the network, and returns some data.
- 21:05 We’d like continuations so that the blocking IO does not block a native thread.
- 21:10 We’d like to run that entire web request inside some kind of lightweight thread, built on top of continuations, and you as a user should not need to know about it.
- 21:20 You write a normal bit of code that calls the database library, and if that is running in a fibre, and calls down to a low-level IO routine, instead of doing synchronous blocking IO it will do the IO asynchronously and park that fibre.
- 21:45 The user code doesn’t need to know this process is going on at all.
So the user just writes standard synchronous code?
- 22:05 That’s the idea - Alan Bateman has been doing some excellent work on the Java standard library, so that its guts understand whether they’re running on a fibre or a heavyweight thread.
- 22:20 Then your web server needs to know to run things in a fibre instead of a heavyweight thread.
- 22:25 Often that can be done by giving it a lambda that tells it how to run each web request (roughly as in the sketch below).
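The user-facing API was still being designed when this conversation took place; for reference, in the shape Loom eventually shipped (virtual threads), the ‘lambda per request’ idea looks roughly like this, with handleRequest standing in for the hypothetical database-calling handler:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FibrePerRequest {
    public static void main(String[] args) {
        // One lightweight (virtual) thread per task: blocking I/O inside
        // handleRequest parks the virtual thread, not an OS thread.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                final int id = i;
                executor.submit(() -> handleRequest(id));
            }
        } // close() waits for the submitted tasks to finish
    }

    // Hypothetical handler: synchronous-looking code; the runtime parks
    // and resumes the virtual thread around any blocking calls.
    static void handleRequest(int id) {
        System.out.println("handled request " + id);
    }
}
```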
So what’s the difference between a fibre and a thread?
- 22:35 Continuations are a rather low-level thing, and they have got some difficult semantics to expose to users.
- 22:40 If they’re not one-shot, what happens if you have claimed a lock and released it - can you release it twice?
- 22:50 Equally, could you run a finalizer twice or run a cleanup method twice as you leave a method?
- 22:55 There’s a lot of potential gotchas like that.
- 23:00 So they’re a very low-level API, which we might not even expose to most users.
- 23:05 Fibres are just like threads - they are a thread of execution.
- 23:10 The difference between a thread and a fibre at the moment is that fibres are just continuations, scheduled to run on some heavyweight thread.
- 23:20 They’re much more lightweight, they take up less space on the VM - so you can have a lot more of them.
- 23:25 We are looking at an API for fibres based around structured concurrency, which imposes some restrictions that you don’t normally have with threads, but has useful payoffs.
- 23:40 You can’t just start a fibre and have it outlive what started it.
- 23:45 You can have code which starts off a number of fibres, and waits for all of them to finish, or waits for one to finish, and cancel the others before continuing.
- 24:00 It allows you to simplify how you reason about concurrent code (see the sketch below).
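Structured concurrency later incubated in the JDK as StructuredTaskScope; here is a sketch of the ‘start several, wait for all, cancel the rest on failure’ pattern described above, with compute() as a hypothetical subtask:

```java
import java.util.concurrent.StructuredTaskScope;

public class StructuredDemo {
    public static void main(String[] args) throws Exception {
        // Subtasks forked here cannot outlive the scope; if one fails,
        // ShutdownOnFailure cancels its siblings before join() returns.
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            var left  = scope.fork(() -> compute("left"));
            var right = scope.fork(() -> compute("right"));
            scope.join().throwIfFailed();
            System.out.println(left.get() + " / " + right.get());
        }
    }

    static String compute(String name) {
        return name + " done";
    }
}
```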
What’s the order of magnitude between fibres and threads?
- 24:10 We can run millions of fibres in a JVM.
- 24:15 The stack size of the fibre is just what you use, so if you have small stacks they’ll be small - if you go deep, they’ll take up more space.
- 24:30 A lot of fibres are going to be very small, just a couple of hundred bytes and a bit of overhead for the objects that represent them.
Are they implemented in the VM or the language?
- 24:40 Continuations are being implemented in the JVM; fibres are being implemented in Java.
When are fibres a bad idea?
- 24:55 When you are doing something CPU-bound, rather than something that does blocking IO.
- 25:00 If your program is constrained on CPU resources, it doesn’t matter how many fibres you have, you can’t do any more calculations.
- 25:05 You’re better off just using a thread and letting the OS reschedule it every so often.
- 25:15 We may provide an API to allow for the creation of heavyweight fibres within the structured concurrency model, instead of threads, but that’s still an area where we’re experimenting.
What’s the connection between loom and TruffleRuby?
- 25:35 Ruby has a fibre model as well - a slightly different one from the JVM.
- 25:45 In Ruby you can create a fibre, and that fibre will run from the context of your thread, and cannot be run on a different thread, and runs until it yields to the main thread or another fibre.
- 26:05 That’s a slightly different model for the one we’re wanting to provide for Java, but it can be built from continuations.
- 26:10 We’ve done a small prototype of this - less than an afternoon’s work - switching our fibre implementation from using a Thread for every fibre to using continuations, and we can run a few million of them in TruffleRuby and it works.
- 26:40 Ruby, like Python and a lot of other interpreted languages, can have more than one OS thread, but it has the GVL - the Global VM Lock (Python’s equivalent is the Global Interpreter Lock, or GIL).
- 26:55 What this means is that only one thread of execution gets to run Ruby or Python code at a time, which makes the implementation of the interpreter much easier because you don’t have to worry about concurrency.
- 27:10 Ruby-level code should still worry about concurrency if you’re dealing with multiple threads, because you can’t guarantee where the VM will switch between threads.
- 27:20 I’ve seen a lot of code survive for decades that had concurrency issues, that just happened to get away with it by luck.
- 27:30 In TruffleRuby, we don’t have the same issue of the GVL when running Ruby, so we can run Ruby code concurrently.
- 27:40 We do use a lock when we are running C extensions, because most are written with the assumption that only one runs at a time.
- 27:50 C extensions also have a mechanism for acquiring the lock and releasing it.
- 27:55 C extensions that know what they are doing can release the lock while they run, so that another thread can acquire it.
- 28:05 We’d like to provide a mechanism by which C extensions could declare that they are OK to run concurrently.
- 28:20 The other possibility is that we might implement one lock per C extension, so your database could run in parallel with an HTTP parser.
When will we see TruffleRuby as a production language?
- 28:40 We’re going to remain experimental at least until next year - we’ll see what happens then.
- 28:50 We can run some Rails applications at this point - performance is looking OK.
- 28:55 We’re going to work to get performance better, and there’s some serious work on Truffle to reduce time to peak performance.
- 29:10 We’re going to remain in experimental status at least through next year.