Today I’ll investigate the benefits of using Quarkus to package a micro-service written in Java as a native executable to speed up Lambda execution time and reduce memory footprint. This article assumes familiarity with the concepts of AWS Lambda, cold starts, and Java. Here is a previous article I wrote that may be a good starting point in case Lambda is new to you.
When we consider using AWS Lambda as an option to run our code, we frequently assume that the code is written in Node.js or Python. According to Datadog’s State of Serverless survey, of all currently deployed Lambdas, 47% are Python and 39% are Node. Java is a distant third at under 10%.
If you search for tutorials on the web the majority are geared toward one of these popular runtimes. Part of the reason is probably just that these languages are popular and have been trending upward in popularity in past years. Another reason for the trend is the inherent characteristics of the JVM and popular associated Java frameworks.
TL;DR: the JVM takes a while to start up and uses a lot of memory.
This can be a blocker if you want to publish an application on Lambda, primarily because a trade-off of using AWS Lambda is that we have to deal with cold starts. As most people with experience using Lambda in production will tell you, cold starts are really not something you should worry about. Their effect on most applications is close to nothing. But still, it’s something that scares people who are new to Lambda. If you want to know more about cold starts or don’t have first-hand experience, here are two good articles.
This is all you need to know about Lambda cold starts | Lumigo
So much has been written about Lambda cold starts. It's easily one of the most talked-about and yet, misunderstood…
Analyzing Cold Start latency of AWS Lambda
Very often when John and I start talking to people about AWS Lambda, especially in the context of Java, the first…
When should I care about cold starts?
There are situations where cold starts are a problem. In this article I’ll bring up two. First, in the case that you really need your backend to always respond in under a second or two, cold starts will be a problem for you. Second, if you are using Spring, even though cold starts are infrequent, the amount of time it takes for the Spring application to initialize, and therefore be ready to serve requests, is so long it’s probably unacceptable if your backend serves a user interface. Java already has a performance disadvantage (only in regard to cold starts) compared to other languages you can use with Lambda. In addition to the cold start overhead of the JVM itself, your app written with Spring also has to initialize the Spring context with all of the classes your application will use at runtime. In the past, when Spring was becoming popular, this was no concern because a calling client would never need to wait for a Spring application to start up. Startup was something you did in the background while the previous version of your app was still up and serving client requests as they came in. Only when your new deployment was fully booted and ready to respond to traffic would traffic actually be directed to it.
Lambda, by contrast, does not always work this way. Sometimes a client will call a Lambda when no execution environment is available, so the client will have to wait for a new one to be created from scratch. If you’ve worked with Docker, you can think of it as running "docker run" for the first time to create a container instance on a given machine. Most of the time this doesn’t have to happen because there is a “hot” Lambda container instance ready to respond to the request. You will rarely see a Lambda runtime instance alive for more than 40–50 minutes. They are deliberately engineered to be very short lived. This is how AWS can offer the “pay only for what you use” billing model and how the functions can scale up and down rapidly at massive scale.
When we build with Spring we assume that a given iteration of our app will start once and run until it’s time to push an update. The Lambda execution model flips this assumption on its head. When we run our code on Lambda, we are effectively shutting down and restarting our whole app many times a day.
Quarkus, in contrast to Spring, is engineered to boot up quickly and use less memory. It is therefore well suited to use with Lambda. Even more important is the fact that it is designed to work seamlessly with GraalVM so that, if desired, you can build your app as a native executable. Here is a blurb from the official GraalVM page:
Native Image is a technology to ahead-of-time compile Java code to a standalone executable, called a native image. This executable includes the application classes, classes from its dependencies, runtime library classes, and statically linked native code from JDK. It does not run on the Java VM, but includes necessary components like memory management, thread scheduling, and so on from a different runtime system, called “Substrate VM”.
A quick review of what’s stated above
- Moves a lot of runtime work to build time
- No JVM needed to run the software artifact (./application instead of application.jar)
Go here to read about it in more detail: https://www.graalvm.org/reference-manual/native-image/.
Cold starts, Quarkus, custom runtimes, Graal: what does it all mean?
We’ve established that in some situations cold starts are a problem. By building our application as a native image we will still face cold starts, but they will be significantly faster. Lambda will support us running a native image since the Lambda service now allows us to specify our own runtime. By applying Quarkus and GraalVM to fix the cold start problem, we are effectively authoring our source code in Java but not executing our Java classes inside of a JVM. Instead we are doing a lot of the heavy lifting at application build time so that at runtime things will go faster.
The intention of this article is not to explain how the magic of Quarkus and GraalVM works; there is plenty of good writing on the Quarkus and Graal sites if you want to know more. In this post, we will look at what Quarkus means to us as rank-and-file developers who want to use it with Lambda.
Comparing 4 “flavors”
I’m going to compare 4 different variations of a completely bare bones “hello world” backend micro-service. Each deployment will have one RESTful route exposed to the internet that a client can call. Each will return a simple string response without doing any extra processing or calling out to any external systems like another API or a database. I’ve deployed one Node.js Lambda with the Express framework so we have something to compare Java against.
As depicted in the diagram the 4 different configurations will be
- Node.js with Express backend framework
- Java 11 (Amazon Corretto) with Spring Boot backend framework
- Java 11 (Amazon Corretto) with Quarkus backend framework
- Code authored in Java with Quarkus, then built with GraalVM and deployed as a native binary executable
Across these different runtimes and configurations we will compare
- How long it takes for a fresh Lambda Execution environment to start (cold start)
- How much memory is consumed on a cold start
- Cold start vs. warm start (already initialized) Lambda execution time
Let’s first look at how the 4 different “runtime flavors” stack up with 256 MB of RAM configured. In a production setting you would probably want to allocate more memory to your function. One interesting design feature of Lambda is that the maximum amount of compute capacity is available for the first ten seconds that your function is initializing on cold start. Therefore it is pointless to allocate more memory to your function hoping to improve the initialization phase performance.
The “average response time” field is just the sum of average duration and average init duration, since those two added together equal how long the function takes to return a response. The Quarkus native Lambda is about neck and neck with Node in terms of average response time. You can see that it uses marginally less memory but takes a few milliseconds more to initialize as well as to run. In my opinion the performance profile is close enough to essentially be equal.
These stats are for COLD STARTS ONLY, so this is the worst-case scenario. Most of the time the response will come much faster. In my opinion ~300ms is perfectly acceptable as a worst-case response time for an API backend that synchronously responds to a client (like a UI). Of course, all of these functions include only a small bit of “hello world” code. When we start adding real logic and importing more external libraries, it will affect the function’s duration and how much memory it uses. This isn’t a Lambda-specific consideration though; it would be true for any code you want to run.
What’s going on with the “init” phase?
The init phase includes everything that has to happen before your actual Lambda code can run. The only time we will see an init phase is during a cold start. Remember that Lambda itself is a service offered by AWS. Any time we invoke a Lambda we are asking AWS to run our code in one of their billions of ephemeral sandboxes. In the case of a cold start, that sandbox doesn’t exist yet and needs to be built. You can read more about the different lifecycle phases of a Lambda function here. https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html
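To make the split between the init phase and the invocation concrete, here is a minimal sketch in plain Java. It does not use the AWS libraries; the class `InitPhaseSketch` and its method names are hypothetical stand-ins for a real handler class. The assumption it illustrates is that the handler object is created once per execution environment, so constructor work is paid only on a cold start, while the handler method runs on every request.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Lambda handler class (no AWS dependencies).
class InitPhaseSketch {
    // Counts how many times the expensive setup ran.
    static final AtomicInteger initCount = new AtomicInteger();

    private final String greeting;

    // Constructor work happens once per execution environment,
    // i.e. during the init phase of a cold start.
    InitPhaseSketch() {
        initCount.incrementAndGet();
        greeting = "hello"; // imagine framework bootstrapping here
    }

    // Runs on every invocation, cold or warm.
    String handleRequest(String name) {
        return greeting + " " + name;
    }

    public static void main(String[] args) {
        // Simulated cold start: the environment is created, then invoked.
        InitPhaseSketch handler = new InitPhaseSketch();
        System.out.println(handler.handleRequest("first"));
        // Simulated warm start: the same instance is reused, no init cost.
        System.out.println(handler.handleRequest("second"));
        System.out.println("init ran " + initCount.get() + " time(s)");
    }
}
```

The more work a framework does in that one-time setup, the longer the init phase, which is the difference the measurements in this article are picking up.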
The JVM flavored Lambdas
The average total response time for the JVM Lambda with the Quarkus framework is ~5 seconds, while the Lambda using a JVM + Spring Boot is ~7 seconds. Spring Boot uses on average 181 MB of memory and Quarkus 163 MB. It is interesting to note that the Spring Boot Lambda actually executes more quickly than Quarkus but takes significantly longer to initialize. When you add in the network latency and the time it takes to forward the request from API Gateway, we are looking at an overall 8–9 second wait between when we call the HTTP endpoint and when we get a response when using Spring Boot. At this point we should ask ourselves
Is it acceptable for my REST API to take 8 seconds to respond?
The Quarkus flavor is around 2 seconds faster, which is significant, but over 5 seconds is still not a great total response time. Before we get too concerned though let’s look at how frequently we can expect a client (and by proxy a user) to experience a cold start. I queried the logs to find out. In the table below, if isWarmStart = 1 that means the invocation is a warm start.
These numbers actually surprised me. I knew that cold starts were infrequent but I did not realize just how infrequent. For Spring Boot, just 54 out of 36,195 invocations were cold starts. That’s about 0.15% of the invocations, meaning our users would get a slow reply roughly 1 to 2 out of every 1000 times. This makes the Spring Boot option look a lot more desirable. I think that most businesses would be fine with these odds. Have you ever tried to buy something on Amazon and the checkout took a little extra time to load? Did you wonder what was happening under the hood? I bet you did not wonder this at the time. I also bet that you didn’t throw your hands up in frustration, walk away from the computer and give up on your purchase. Let’s re-frame the question I asked a moment ago.
Is it acceptable for my REST API to take 8 seconds to respond 1 in 1000 times, and less than a second the majority of the time?
This is the question you should ask yourself if you are deploying a Java app written with Spring Boot on AWS Lambda. I think that most of the time the answer is yes, it is acceptable, but of course it all depends on what you are trying to do. I should also point out that a real Spring Boot application can take significantly longer to initialize. So are occasional cold starts still acceptable if they last 15–20 seconds? That, I’d say, is pushing it.
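The cold start share quoted above comes from a simple ratio of cold starts to total invocations. As a rough sketch (the class and method names here are my own, made up for illustration), this is how the Spring Boot counts from the logs, 54 cold starts out of 36,195 invocations, turn into a percentage and a "one in N" figure:

```java
import java.util.Locale;

class ColdStartRate {
    // Percentage of invocations that were cold starts.
    static double coldStartPercent(long coldStarts, long totalInvocations) {
        if (totalInvocations <= 0) {
            throw new IllegalArgumentException("totalInvocations must be positive");
        }
        return 100.0 * coldStarts / totalInvocations;
    }

    // Expresses the same ratio as "one cold start in every N invocations".
    static long oneIn(long coldStarts, long totalInvocations) {
        if (coldStarts <= 0) {
            throw new IllegalArgumentException("coldStarts must be positive");
        }
        return Math.round((double) totalInvocations / coldStarts);
    }

    public static void main(String[] args) {
        long coldStarts = 54;       // Spring Boot cold starts found in the logs
        long total = 36_195;        // all Spring Boot invocations in the logs
        System.out.println(String.format(Locale.US, "cold starts: %.2f%% of invocations",
                coldStartPercent(coldStarts, total)));
        System.out.println("roughly one cold start in every " + oneIn(coldStarts, total) + " invocations");
    }
}
```

The exact ratio will depend on your traffic pattern, so it is worth running the same calculation against your own logs before deciding whether cold starts matter for you.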
I’ve played around a bit with the AWS Labs serverless Java container and, once you start adding code and libraries that a real Spring Boot app would have (like setting up a JDBC connection), the initialization times become pretty abysmal. In the instructions for implementing Spring Boot they recommend some techniques like not using component scan. They’ve also built in an option to initialize your app asynchronously:
If your application takes longer than 10 seconds to start, AWS Lambda will assume the sandbox is dead and attempt to start a new one. To make the most of the 10 seconds available in the initialization, and still return control back to the Lambda runtime in a timely fashion, we support asynchronous initialization
The mere fact that they’ve had to build this feature in is a bit disconcerting.
Response time for warm invocations
Up to this point I’ve only discussed cold starts. As I said, cold starts only account for roughly 0.1% of our function invocations in a best-case scenario and possibly up to 1% in the worst case. The vast majority of the time our Lambda response times will look like this:
As you can see, they are all plenty fast, with the Quarkus native flavor as the clear winner. In this case there is no “init duration” field because the function is already initialized. These are the kind of numbers we’d expect from a traditional “long running” backend application like we’d deploy with Kubernetes or on an EC2 instance. All of these averages are so fast that there isn’t much to talk about. Technically the native image flavor is more than 10x as fast as the Quarkus/JVM flavor, but who really cares since it’s only a 10ms difference. That amount of time is imperceptible to a human.
What about price?
One compelling reason we could have for making execution times as short as possible is price. With Lambda we are billed for every millisecond of compute we use and no more. Take note also that we are not billed for function initialization time, only for execution time, so although Spring Boot takes a long time to initialize, we won’t be billed for that time. In the examples above we have configured 256 MB of RAM. We could actually up the RAM to 512 MB and still pay the same price. On the Lambda pricing page you can see the increments used for billing: https://aws.amazon.com/lambda/pricing/.
Let’s check to see what our bill would be for 5 million invocations, assuming our function only takes 2 milliseconds to run and has under 512 MB of RAM configured.
5,000,000(invocations) x 2(ms) x $0.0000000083(price per ms runtime) = 8¢
That is a pretty reasonable price for 5 million function executions. Let’s take a look at what the Spring Boot version would cost:
5,000,000(invocations) x 13(ms) x $0.0000000083(price per ms runtime) ≈ 54¢
As you can see, the price per millisecond of running Lambda code is unbelievably cheap. Before you get too excited, remember that in the real world our Lambda functions will be interacting with downstream systems that are likely not as fast. Any useful code is likely calling other APIs or interacting with external databases. To account for this I’m going to make up a more realistic number. Let’s say that our Lambda interacts with two other REST APIs which it must wait on before responding to the client, bringing the total average execution time up to 800ms. Now our Lambda function costs $33 per 5 million requests. Still not bad at all. But now, if we are paying $33 a month for our Lambda function’s 5 million executions, there isn’t much point in shaving off a few milliseconds. Doing so will only save us about 50 cents. My point is that the Lambda execution you will pay for typically won’t represent time that your code is processing data; it will be time that your function is blocked waiting to receive a response from another system.
Still, it’s important to be aware of how your bill is calculated and where the costs are coming from. You may have an application that doesn’t reach out to external APIs and databases and instead does all of its processing without making additional network calls. In this scenario it may be worth it to use Quarkus and GraalVM to get your execution time down as low as possible, as it will save you money.
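The bill estimates in this section all follow the same formula: invocations x average duration in ms x price per ms. Here is a small sketch of that arithmetic using the per-millisecond price quoted above. The class and method names are made up for illustration, and the price is simply the figure used in this article, so check the pricing page for current values.

```java
import java.util.Locale;

class LambdaCostSketch {
    // Price per millisecond used in this article's examples (an assumption
    // that may not match current pricing or your memory configuration).
    static final double PRICE_PER_MS_USD = 0.0000000083;

    // Estimated compute cost in US dollars; ignores the per-request fee
    // and any free tier.
    static double costUsd(long invocations, long avgDurationMs) {
        return invocations * avgDurationMs * PRICE_PER_MS_USD;
    }

    public static void main(String[] args) {
        long invocations = 5_000_000;
        long[] durationsMs = {2, 13, 800}; // fast function, Spring Boot, waiting on downstream APIs
        for (long ms : durationsMs) {
            System.out.println(String.format(Locale.US, "%d ms average -> $%.2f",
                    ms, costUsd(invocations, ms)));
        }
    }
}
```

Plugging in your own average duration makes it easy to see whether shaving milliseconds off the function would change the bill in any meaningful way.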
Why you wouldn’t use native-image
After reading this article you might think to yourself, “why would I ever choose the regular old JVM when I can have a native image that runs faster, uses less memory, and is therefore cheaper than the alternative?”. As you’ve probably guessed, using the Quarkus native image feature is not without its tradeoffs. The GraalVM team has put together a nice explanation of the limitations and tradeoffs involved with this technology. You can read about them here: https://github.com/oracle/graal/blob/master/substratevm/Limitations.md
One of the bigger caveats is that all code must be known at build time. I have worked with Java based tools in the past that take advantage of dynamic class loading, and it can be a pretty cool and useful feature. However, if you are operating in a Lambda environment I can’t really think of a situation where you would use this feature or if you even could. Also dynamic class loading only really makes sense for a runtime that you want to modify without shutting down, which by definition, is an execution environment that runs for a long time and Lambda definitely isn’t that. For this reason, I won’t count this limitation as a minus in the Lambda context.
This guide published by the Quarkus team goes over a few of the other gotchas you have to consider when using the native image feature with Lambda. This link takes you closer to the end, where they discuss the modifications needed for TLS.
Quarkus - Amazon Lambda
The quarkus-amazon-lambda extension scans your project for a class that directly implements the Amazon RequestHandler…
If you want to use native image, then for many of the services you’d want to interact with from Lambda you can’t use the AWS SDK out of the box and have to swap out the default HTTP client.
Another gotcha that they don’t call out is the amount of time it takes to build the native images. It took me between 4 and 5 minutes just to build the simple hello world Lambda. I imagine it might take a lot longer with a bigger, more complicated app. Even though 4–5 minutes is not a super long time, I think it adds up when you are deploying frequently. Many of us are used to being able to build, package, and deploy our code to some sort of dev environment in well under a minute. In the case of Lambda, you can redeploy new code and have it ready to execute in a few seconds. Often the only way to test or debug a piece of functionality is to push changes over and over in a short span of time. This would get old if you had to wait for the native-image build every time.
The way I would overcome this problem would be to not deploy the native image in lower environments like dev. We can just as easily run the code with Amazon’s JRE. It’s no issue at all if we face cold starts in dev, so we need not deploy a native image there. I would however recommend using the native image in QA + TEST environments. You have to be careful about letting your non-prod environments become different from your production environments. It’s very possible that your code would behave one way running in the JVM and behave another when built as a native image.
None of these gotchas and limitations are too major but they are worth noting. When I’ve been tasked to build an application for a business, they just want it done as fast as possible with as few bugs as possible. As developers we have to ask ourselves if it’s really worth creating extra hurdles for ourselves when the outcome is subtle, and might not even be appreciated by the business or end users. How much of an impact adopting Quarkus will make depends on a number of factors such as the nature of the product you are building and the background and experience level of the developers on your team.
Conclusions: when I think you should and shouldn’t use Quarkus with Lambda
If you have an existing Spring Boot application that you want to move to Lambda, Quarkus with native image may be a good option. This will allow you to overcome the problem of occasionally having very slow response times due to cold starts. When switching from Spring Boot to Quarkus you will have to do a bit of refactoring. Quarkus offers a number of extensions that make it easy to keep a lot of your Spring code as is. Here is a good article about it:
Spring Boot on Quarkus: Magic or madness? - Red Hat Developer
Quarkus is a Java stack tailored for OpenJDK HotSpot (or OpenJ9 on zSeries) and GraalVM, crafted from optimized Java…
If you’re going to go in and refactor your application to run on Lambda anyway you should consider if you even need a framework like Spring or Quarkus. It may not be much more effort to just re-wire your app so that each RESTful route+operation combination makes use of its own Lambda function. By doing that you’ll enjoy the following benefits
- each Lambda function is very simple making it easy to understand and debug
- a bug or vulnerability in one part of your app won’t affect the rest
- your cold start times and memory use will improve drastically without having to do any hacks to the AWS Lambda golden path. (By golden path I mean the standard recommended AWS way)
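As a rough illustration of the "one function per route+operation" idea, here is a sketch in plain Java with no framework. Everything in it is hypothetical: `RouteHandler` is a local stand-in for the handler interface you would implement from the AWS Lambda Java library, and `ProductStore` is an in-memory placeholder for a real database. On Lambda, separate functions would not share memory like this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Entry point that simulates calling two separately deployed functions.
class PerRouteHandlersDemo {
    public static void main(String[] args) {
        System.out.println(new CreateProductHandler().handle(new Product("p1", "coffee mug")));
        System.out.println(new GetProductHandler().handle("p1"));
        System.out.println(new GetProductHandler().handle("missing"));
    }
}

// Local stand-in for a Lambda handler contract (assumption, not the AWS interface).
interface RouteHandler<I, O> {
    O handle(I input);
}

// Minimal data holder for the example.
class Product {
    final String id;
    final String name;

    Product(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// In-memory placeholder for an external database, only so the sketch runs.
class ProductStore {
    static final Map<String, String> PRODUCTS = new ConcurrentHashMap<>();
}

// One small function for "POST /products".
class CreateProductHandler implements RouteHandler<Product, String> {
    @Override
    public String handle(Product product) {
        ProductStore.PRODUCTS.put(product.id, product.name);
        return "created " + product.id;
    }
}

// One small function for "GET /products/{id}".
class GetProductHandler implements RouteHandler<String, String> {
    @Override
    public String handle(String id) {
        return ProductStore.PRODUCTS.getOrDefault(id, "not found");
    }
}
```

Each handler is a few lines with no application context to boot, which is where the improvement in cold start time and memory use would come from.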
Here is an example of how you can build a Serverless Java microservice using just the Serverless framework. In my opinion this is the best way to port your application over to serverless, assuming you are willing to undertake the significant refactoring. Simply put, the Serverless framework was designed from the ground up to deploy applications on Lambda, whereas Spring and Quarkus were designed with completely different goals and assumptions.
How to create a REST API in Java using DynamoDB and Serverless
In this walkthough, we will build a products-api serverless service that will implement a REST API for products. We…
The key distinction is that Serverless is not an application framework; it’s a DevOps framework. It doesn’t add any overhead to booting your application or impose any conventions on how you write your Java code. In this example multiple Lambda functions are part of one code base and can be simultaneously deployed and updated with one deploy command.
If you are starting 100% greenfield I encourage you to even consider using Node over Java. I don’t think that Node is superior to Java; Java probably has more features and is more powerful in a lot of ways. But from my experience building on Lambda, the developer experience is smoother and more enjoyable with Node. I also think Node makes more sense for organizations going forward because Node skills are applicable to both front end and back end development. I’ve experienced and also heard about situations where backend or frontend teams have different workloads or move at different paces and end up waiting on one another, so developer bandwidth goes unutilized. It makes more sense to use a language that allows developers to be cross-functional and work on any part of the stack.
I hope this article has informed you of the benefits and tradeoffs of using Quarkus and the native image functionality in the context of Lambda so that you can apply this knowledge in the context of your team and what you are trying to accomplish.
Notes on what I deployed
Here are the guides I used to deploy each different Lambda runtime flavor in case you want to try yourself.
Quarkus JVM version + native version: https://quarkus.io/guides/amazon-lambda-http
Here is all the code I used to conduct these experiments on GitHub: https://github.com/bmccann36/lambda-runtime-flavors