Node if Error Try Again in 1 Min

Dec 17, 2019


This blog post is adapted from a talk given by Julián Duque at NodeConf EU 2019 titled "Let it crash!".

Before coming to Heroku, I did some consulting work as a Node.js solutions architect. My job was to visit various companies and make sure that they were successful in designing production-ready Node applications. Unfortunately, I witnessed many different problems when it came to error handling, especially on process shutdown. When an error occurred, there was often not enough visibility into why it happened, a lack of logging details, and bouts of downtime as applications attempted to recover from crashes.

Julián: Okay. So, as Brian said, my name is Julián Duque, in proper Spanish. I come from a very beautiful city in Colombia called Medellín. So, if you haven't been there, please visit us. We have an amazing community, as Brian said. Right now, I work as a senior developer advocate for Heroku, so I live in the United States. Sadly, I'm away from my country, but I'm always in constant communication with my community, and that's pretty much thanks to the two main conferences that I organize. One is NodeConf Colombia and the other is JSConf Colombia. So, I know that if you are like me right now, you need coffee. I need coffee too. It's super early. So, please don't crash now. Let's wait until my talk finishes, and then we can have some coffee to keep us awake.

Julián: So, a little bit of background about this talk and why I presented it. These are pretty much lessons learned while I was working at NodeSource. I was doing consulting work as a solutions architect, helping customers and making sure they were using Node.js properly and successfully. And I saw a lot of different bad patterns out there in how companies were doing error handling, particularly when their processes were crashing or dying. They didn't have enough visibility. They didn't have logging strategies in place. They were missing very important data about why their Node processes were having issues or crashing. They were experiencing downtime, and we started to collect a set of best practices and recommendations for them that are aligned with the overall Node.js community.

Julián: If you go to the documentation, you'll find pretty much the same recommendations that I'm going to talk about today. We add a couple more things to make sure you have a very good exit strategy for your Node.js processes. These best practices apply mostly to web and network-based applications, because we are also going to cover graceful shutdowns, but you can use them for other types of Node.js applications that run constantly. And Node, sadly, is not Erlang. If you know about Erlang, "let it crash" is a term that's very common in that community. When I started learning Erlang back in 2014, I loved the fault tolerance options that this platform and language have. And I always think about how to bring the same experience into Node.js. It's not the same, because you can't do hot code reloading or function swapping in Node. You can do those things in Erlang, but still, Node is pretty lightweight, and you can easily restart and recover from a crash.

Julián: First, before getting into the bad place, when bad things happen, how do we make sure that everything is good? What do we need to do to our Node applications to make sure they are running properly? So, first, a recommendation. There is going to be a workshop later about this specific thing, Cloud Native JS, so don't miss this workshop by Beth. She's also going to talk about how to add health checks to your Node.js processes. Our recommendation is to add a health check route: a simple route that returns a 200 status code. Then you need to set up something to monitor that route. You can do it at your load balancer level. If you are using a reverse proxy or a load balancer like nginx or HAProxy, or you're using ELB or ALB, whatever sits as the top layer in front of your Node.js process should constantly check that the health check is returning OK. That way you are making sure that everything is fine.
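As a rough illustration of the health check route, here is a minimal sketch using Node's built-in http module. The `/health` path, the `OK` body, and the function names are assumptions for this example. Use whatever path your load balancer or orchestrator is configured to probe.

```javascript
const http = require('http')

// Hypothetical handler: answers load balancer probes on GET /health
// with a 200, and returns 404 for everything else (real routes would go here).
function healthCheckHandler (req, res) {
  if (req.method === 'GET' && req.url === '/health') {
    res.writeHead(200, { 'Content-Type': 'text/plain' })
    res.end('OK')
    return
  }
  res.writeHead(404)
  res.end()
}

// Build the server without listening, so the caller chooses the port.
function createServer () {
  return http.createServer(healthCheckHandler)
}

module.exports = { healthCheckHandler, createServer }
```

To use it, call `createServer().listen(3000)` and point the load balancer's health check at `/health`. A check this simple only proves the process is answering requests. Whether it should also verify downstream dependencies is a design choice left to you.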

Julián: Also, rely on APM: tools that monitor the performance and the health of your Node.js processes. To make sure that everything is running fine, you will need tools, and there are some very well-known ones: New Relic, AppDynamics, Dynatrace, and N|Solid. Many of those on the market will give you far more visibility into the health of your Node.js processes, so you can live in peace knowing your Node is running properly. But what do we do if something bad and unexpected happens? What should we do with our Node.js processes? Let them crash. If something bad and unexpected happens, I will let my Node.js process crash. But to do that safely, we need to implement a set of best practices and follow some steps to make sure that the application restarts properly and keeps running and serving our customers and clients.

Julián: Before letting it crash, we need to learn about the process lifecycle, especially the shutdown side of things, and some error handling best practices. There is also going to be another highly recommended workshop on that. I'm not going to cover how to properly handle errors in Node.js in general, only on shutdown. The goal is to stop worrying about unexpected errors and to increase visibility into your Node.js processes: what happened when your process crashed and what the reason might be, so you can fix it and iterate on your application. So, coming back to the Erlang concept, a Node.js process is very lightweight. It doesn't have a very big memory footprint, and the idea is to keep processes very lean at startup so they can start super fast. If you have a lot of work at startup, like CPU-intensive or synchronous operations, it might reduce the ability of your Node.js processes to restart quickly.

Julián: So, try to keep your processes very lean on startup. Use strategies like prebuilding, so you are not building during startup or the bootstrap of your process. Do everything before you start your process, and if something unexpected and bad happens, just exit and start a new Node.js process as soon as possible to avoid downtime. This is called a restart: you let the process crash, and then start a new one. But we need some tools and settings in place to have something that restarts our Node.js processes. So, let's learn how to exit a Node.js process. There are two common methods on the process module that help you shut down or end a Node.js process. The most common one is process.exit. You can pass an exit code: zero for a successful exit, or higher than zero, usually one, for a failure. This instructs Node.js to end the process with the specified exit code.

Julián: And there is the other one, process.abort. process.abort causes the Node.js process to exit immediately and generate a core file, if your operating system has core dumps enabled. This gives you more visibility for postmortem debugging, so you can see what happened or what crashed your Node.js process. If there is a memory issue, you can call process.abort and it will generate a core dump. Then you can use tools like llnode, which is a plugin for lldb, to debug the core dump and see what happened on the native (C and C++) side of Node.js when your process crashed. So those are the two options you have to exit a Node.js process. How do we handle exit events? Node.js emits two main events when your process is exiting. One is beforeExit. The beforeExit handler can make asynchronous calls, and the event loop will continue to work until that work finishes.

Julián: So before the process ends, you can schedule more work on the event loop and do more asynchronous tasks to clean up your process. This event is not emitted on conditions that cause explicit termination, like an uncaught exception or an explicit call to process.exit. It covers the other exit scenarios. The exit event handler, on the other hand, can't make asynchronous calls. Only synchronous calls can happen in this part of the process lifecycle, because the event loop doesn't have any more work to do, so the event loop is paused here. If you try to do any asynchronous calls here, they are not going to be executed. This event is emitted when process.exit is called explicitly. It's commonly used to log some information at the end: when you call process.exit with a specific exit code, you can add more context about the state of your application at the time the process exits.

Julián: Here are some examples of how to use them. You attach these events on the process module. beforeExit can run asynchronous code, like that setTimeout. Even though the event loop is about to stop at that moment, scheduling more asynchronous work revives the event loop, which continues until there is no more work to do. One thing I want to mention here is that a Node.js process normally exits when there is no more work scheduled on the event loop. When there is nothing else on the event loop, the process exits. So how does a server keep running? It has a handle registered on the event loop, like a socket waiting for connections. That's why a web server keeps running until you close the server or interrupt the process. As long as something is registered on the event loop, the Node.js process keeps running. So in this case, I call setTimeout to schedule more work, and the process keeps working until there is nothing left to do.

Julián: On the exit event, only synchronous calls are possible. I can't do anything with the event loop here; it is completely paused. It's useful for logging information or saving the state of the application before exiting. There are a couple of signal events that are useful on shutdown: SIGTERM and SIGINT. SIGTERM is usually emitted when a process monitor sends a termination signal to your Node.js process, telling it to shut down as a normal, expected termination. When you run under systemd or upstart and you stop that service, it sends SIGTERM to your Node.js process. You can handle that event and do some work in that part of the lifecycle. SIGINT is an interruption. It is emitted when you interrupt the Node.js process, usually by pressing Ctrl-C in the console, and you can also capture that event and do some work around it.

Julián: So these are two ways to end a Node.js process intentionally, and both events are considered a successful termination. This is why I'm exiting here with exit code zero: the termination is expected, because I'm saying I don't want this process to keep running. Then there are the error events, and there are two main ones. One is uncaughtException, the famous uncaughtException. The other, more recent one came with the introduction of promises to Node: unhandledRejection. uncaughtException is emitted when a JavaScript error is not properly handled, so it pretty much represents a programmer error, a bug in your code. If an uncaughtException happens, the recommendation is to always crash your process: let it crash. Don't try to recover from an uncaughtException, because it might cause you trouble. The community doesn't totally agree on the second one, though.

Julián: But I will say the same for an unhandledRejection. An unhandledRejection is emitted when a promise is rejected and there is no handler attached to it, no catch on the promise. It may represent an operational error or a programmer error, depending on what happened. But in both cases, it's better to log as much information as possible. Treat these as P1 issues that need to be fixed in the next iteration or the next release. If you don't have a strategy in place to identify why your processes are crashing, and you are not fixing and handling those errors properly, your applications will keep having bugs. So if it is an uncaught exception, that's a bug, a programming error, something that was not expected. Please crash, log, and file an issue so that it gets fixed.

Julián: If it is an unhandled rejection, check whether it's a programmer error or an operational error that needs to be handled. Then go update the code, add the proper handling to that promise, and continue with your job. So, as I said, in both cases an error event is a cause of termination for your Node.js process. Always exit the process with an exit code other than zero, usually one, so your process monitor and your logs know that it was a failure. And as I said, don't try to recover from an uncaught exception. While I was working as a consultant, I saw a lot of people trying to do a lot of magic to keep their Node.js processes from dying by adding complex logic on uncaught exception. That always left the application in a bad state. They had memory leaks or hanging sockets, and it was a mess. It's cheaper to let it crash, start a new process from scratch, and keep receiving requests.

Julián: So, a couple of examples of uncaughtException and unhandledRejection. The uncaughtException handler receives an error instance as its argument, so you get information about the error that was thrown and not handled in your Node.js code. The unhandledRejection handler gives you a reason, which can also be an error instance, and the promise that was not properly handled. That's useful data to have in your logs to learn where things are failing in your code. So we saw how to handle the events, how to handle the errors, and some of the best practices. But how do we do it properly? What do we need to do to have a very good shutdown strategy for Node.js processes? The first thing is running more than one process at the same time. Rely on scaling and load balancing, so you have more than one process. That way, if one of those processes crashes, another process is alive and able to receive requests.

Julián: That gives you time to do the restart while incoming requests keep being served. Maybe the only issue you'll have is with the requests that were already in flight in the Node.js process that crashed. But this gives you a little more leverage and prevents downtime. And what do you use for load balancing? Use whatever you have at hand, such as nginx or HAProxy as a reverse proxy for your Node.js applications. If you are on AWS or another cloud, you can use Elastic Load Balancing, Application Load Balancers, or the other load-balancing solutions your cloud offers. If you are on Kubernetes, you can use Ingress or other load-balancing strategies for your application. So make sure you have more than one Node.js process running, so you can be more at peace if one of them crashes. You will also need process monitoring: something running in your operating system, or an application, that constantly checks whether your process is alive.

Julián: If the process crashes or fails, the process monitor is in charge of restarting it. The recommendation is to always use the native process monitoring available in your operating system. On Unix or Linux, you can use systemd or upstart, specifically adding restart on failure, or respawn when you are working with upstart. If you are using containers, use whatever is available: Docker has the restart option, and Kubernetes has the restart policy. You can also configure a limited number of restart retries, so a persistent error doesn't keep crashing your application and leave you stuck in a crash loop. So you can add some retries there, but always have process monitoring in place. If you can't use any of these tools, as a last resort (and this is not recommended), use a Node.js process monitor like PM2 or forever.

Julián: I would not recommend these to any customer or friend of mine. But if you don't have any other resources, if you can't use the native tools in your operating system, or if you are not using containers, you can go this way. These tools are good for development, don't get me wrong. In development, they're very good tools for restarting your processes when they crash. But for production, they might not be the best. Let's talk a little bit about graceful shutdown. Say we have a web server running. The web server is receiving requests and connections, so we have some established connections between our customers or clients and the server. What happens when the process crashes? If we are not doing a graceful shutdown, some of those sockets are going to be left hanging until a timeout is reached. That might cause downtime and a degraded experience for your users. So it's better to shut down gracefully, and setting up an unreferenced timeout lets the server do its job.

Julián: So, we need to close the server, explicitly telling it to stop accepting connections so it refuses new ones. New connections go to the other Node.js processes running behind the load balancer, and the server can send a TCP packet to the clients that are already connected. So those clients finish the connection immediately when the server dies instead of waiting until a timeout is reached. They close that connection, and on the next retry, we expect that the process has restarted by then or that they reach another process that is running. So here is one example of that unreferenced timeout, when we are handling the signal or error event, which is the shutdown part of the lifecycle. We explicitly call server.close. If it is an instance of net.Server, which is also what the http and https Node modules use, you can pass a callback.

Julián: So when it finishes closing the connections, it exits the process successfully. But we need a timeout in place, because we don't want to wait for a long time. Imagine we had a lot of different clients connected and it took a long time to clean up those connections. We need some way to have an internal timeout. So here, we schedule a new timeout, but that timeout doesn't keep the event loop alive. The unref at the end means the timer doesn't count as pending work, so it won't keep the process running on its own. When either the timeout is reached or the server close callback runs, that path exits the Node.js process. It's a race between the two, your unreferenced timeout and the server close, and whichever finishes first wins. The timeout value you use depends on the needs of your application.
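The close-or-timeout race described above might look roughly like the sketch below. The function name `shutdown`, the 500 ms default, and the `exit` option are assumptions, not part of any library. The `exit` option defaults to `process.exit`; it exists only so the sketch can be exercised without ending the process.

```javascript
// Hypothetical helper: stop accepting connections, then exit either when
// server.close() finishes or when the timeout fires, whichever comes first.
function shutdown (server, { code = 0, timeout = 500, exit = (c) => process.exit(c) } = {}) {
  let done = false
  const finish = () => {
    if (done) return // both paths may fire; only exit once
    done = true
    exit(code)
  }

  // close() stops accepting new connections. Its callback runs once all
  // existing connections have ended (or with an error if it was not listening).
  server.close(finish)

  // Safety net for slow or keep-alive connections. unref() means this timer
  // alone will not keep the event loop, and so the process, alive.
  setTimeout(finish, timeout).unref()
}

module.exports = { shutdown }
```

In a real service you would call this from a signal or error handler, for example `process.on('SIGTERM', () => shutdown(server, { code: 0 }))`. Keep-alive connections can hold `close()` open for a while, which is exactly the case the timeout covers. Since Node.js 18.2, `http.Server` also offers `closeIdleConnections()` and `closeAllConnections()` if you prefer to end them explicitly.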

Julián: We had customers that needed very short timeouts because they were doing a lot of real-time trading and needed their processes to restart as fast as possible. Others can afford longer timeouts and wait until the connections finish, so this depends on the use case. If you don't add the unref here, the timeout keeps the event loop alive and the process waits until it fires before it stops. So the unref is a safeguard: no extra work is kept on the event loop while we are exiting the process. Logging is one of the most important parts of a good exit strategy for Node.js processes, so implement a robust logging strategy for your application, especially on shutdown. If an error happens, please log as much information as possible. An error object contains not only the error message, but also the stack trace.
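To make "log as much information as possible" concrete, here is a small sketch that turns whatever reaches an error handler into one structured JSON log line. The helper names are hypothetical. Plain `console.error` stands in for a real library such as pino or winston. The sketch also handles rejection reasons that are not `Error` instances, since a promise can be rejected with any value.

```javascript
// Hypothetical helper: normalize an error (or any rejection reason)
// into a plain object that JSON.stringify can serialize.
function serializeError (value) {
  if (value instanceof Error) {
    // The stack trace is what tells you where the failure happened.
    return { name: value.name, message: value.message, stack: value.stack }
  }
  // Promises can be rejected with strings, plain objects, or undefined.
  return { name: typeof value, message: String(value) }
}

// Emit one JSON line per fatal event. A logging library would add
// timestamps, levels, and transports to an external service.
function logFatal (kind, value) {
  const record = { level: 'fatal', kind, pid: process.pid, error: serializeError(value) }
  console.error(JSON.stringify(record))
  return record
}

module.exports = { serializeError, logFatal }
```

For example, `process.on('uncaughtException', err => { logFatal('uncaughtException', err); process.exit(1) })` records the message and stack before the process exits.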

Julián: If you log the stack trace, you can come back to your code, see exactly why and where it failed, and fix it. You can rely on libraries like pino or winston and use transports to store the logs in an external service, like Splunk, Papertrail, or whatever you prefer. But always have a way to get back to the logs, search for those uncaught exceptions and unhandled rejections, and identify why your processes are crashing. Fix those issues and continue with your work. So how can we put all this together? I have a pattern I use in my projects, but there are also a lot of modules on npm that do the same thing even better than the approach I'm following here. This is the pattern I use: I create a module, a file called terminate. I pass it the server instance that I'm going to close and some configuration options: whether to enable core dumps or not, and the timeout.

Julián: Usually, when I want to enable core dumps in Node, I use an environment variable. When I'm doing some performance testing on my application or I want to replicate an error, I enable the core dump. I let it crash with process.abort, then I check out the core dump and get more information from it. So here, I have an exit function that switches between process.abort and process.exit, depending on the configuration. Then I return a function that returns a function, and that inner function is the one I use as the exit handler. This is pretty much the code I use for uncaught exceptions, unhandled rejections, and signals. And here, log as much as possible. I'm using console.log for simplicity, but please use a proper logging library here. If there is an error and it is an instance of Error, I want to get its message and stack trace. And at the end, I attempt the graceful shutdown.

Julián: This is the same thing I explained before: I close the server, and I also have a timeout that exits after it fires, so it depends on whichever ends first. And how do I use this small module? As an example, I have an instance of the server and the terminate code that I use for my project. I create an exit handler with the options: the server I'm running and the different parameters I want to pass into my exit handler. Then I attach that function to the different events. For uncaught exception and unhandled rejection, exitHandler gets an exit code of one, and I can add a message to my logs saying what type of error or handling this was. The same goes for the signals, except that with the signals I pass an exit code of zero, because that is a successful termination.
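Here is a sketch of the terminate pattern described above. It is a factory that takes the server and options (core dump on or off, timeout) and returns a function that builds an exit handler for each event. It follows the description in the talk but is not the speaker's exact code. The option names `coredump` and `timeout` mirror the talk. The `exit` option and the reason strings are assumptions added for illustration and testing.

```javascript
// Sketch of the "terminate" module described in the talk.
// `options.exit` is a hypothetical hook so the sketch can run without ending the process.
function terminate (server, options = {}) {
  const { coredump = false, timeout = 500 } = options

  // Switch between a core dump (process.abort) and a regular exit code.
  const exit = options.exit || ((code) => (coredump ? process.abort() : process.exit(code)))

  // exitHandler(code, reason) returns the listener attached to each event.
  return (code, reason) => (err) => {
    // Log as much as possible; swap console for a real logging library.
    console.error(`${reason}: exiting with code ${code}`)
    if (err instanceof Error) {
      console.error(err.message, err.stack)
    }

    let exited = false
    const finish = () => {
      if (exited) return // close() and the timer may both fire; exit only once
      exited = true
      exit(code)
    }

    // Graceful shutdown: stop accepting connections, bounded by an unref'd timer.
    server.close(finish)
    setTimeout(finish, timeout).unref()
  }
}

// Wiring as described in the talk: error events exit with 1, signals with 0.
// The reason strings are illustrative labels for the logs.
function installExitHandlers (server, options) {
  const exitHandler = terminate(server, options)
  process.on('uncaughtException', exitHandler(1, 'Unexpected Error'))
  process.on('unhandledRejection', exitHandler(1, 'Unhandled Promise'))
  process.on('SIGTERM', exitHandler(0, 'SIGTERM'))
  process.on('SIGINT', exitHandler(0, 'SIGINT'))
  return exitHandler
}

module.exports = { terminate, installExitHandlers }
```

With a server from `http.createServer()`, you might call `installExitHandlers(server, { coredump: process.env.CORE_DUMP === '1', timeout: 500 })`. The `CORE_DUMP` variable name is an assumption. For signal events, Node passes the signal name as the first argument, so the `instanceof Error` check simply skips the stack trace there.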

Julián: So this is pretty much what I have for today. The presentation has some resources that will be useful for you. Please don't miss Ruben Bridgewater's workshop later today, called "Error Handling: doing it right". It's going to explain how to avoid getting here: how to avoid ending up with uncaught exceptions, how to properly create error objects for more visibility, and how to handle promise rejections. That's going to be a very good session, and so is Cloud Native JS by Beth. She's also going to cover how to add monitoring and health checks to your application. Both are good resources if you want to run Node.js properly in production. There are also some npm modules worth a look that pretty much solve the problem I talked about today. One module I like is terminus, by the team at GoDaddy.

Julián: It supports adding health checks to your application. It also has signal handlers and a very good graceful shutdown strategy, much more complex than the one I presented. You can add it to your projects pretty easily: just create an instance of terminus, configure it, and add the different handlers there. Another module is called stoppable. stoppable is a decorator over the server class that adds a stop function alongside close, and it also does a lot of work around graceful shutdown. And there is also a module that is pretty much what I presented today, called http-graceful-shutdown. You also pass it an instance of your HTTP server. It has different handlers, so you can define what happens when there is an error and which signals it monitors.

Julián: It's pretty much... These are all resources that will simplify your life and make you better at running Node in production, and you will be able to let it crash. One last thing: I want to invite you to NodeConf Colombia, so save the date. It's going to happen June 26 and 27, 2020, in Medellín, Colombia. More information at nodeconf.co. The CFP is not open yet, but I hope a lot of you will send proposals to come to Medellín. We pay for travel and for a hotel. And if you want to know a bit about the experience of speaking at a conference in Colombia, you can ask James, you can ask Anna, and I think you can ask Brian. There are a couple of folks here who have spoken there. Thank you very much. This is it.

We started to assemble a collection of best practices and recommendations on error handling, making sure they were aligned with the overall Node.js community. In this post, I'll walk through some background on the Node.js process lifecycle and some strategies to properly handle graceful shutdown and quickly restart your application after a catastrophic error terminates your program.

The Node.js process lifecycle

Let's first briefly explore how Node.js operates. A Node.js process is very lightweight and has a small memory footprint. Because crashes are an inevitable part of programming, your main goal when architecting an application is to keep the startup process very lean, so that your application can boot up quickly. If your startup operations include CPU-intensive work or synchronous operations, they might affect the ability of your Node.js processes to restart quickly.

One strategy here is to prebuild as much as possible. That might mean preparing data or compiling assets during the build process. It may increase your deployment times, but it's better to spend that time outside of the startup process. Ultimately, this ensures that when a crash does happen, you can exit a process and start a new one without much downtime.

Node.js exit methods

Let's take a look at several ways you can end a Node.js process and the differences between them.

The most common function to use is process.exit(), which takes a single argument, an integer. If the argument is 0, it represents a successful exit state. If it's greater than 0, it indicates that an error occurred; 1 is a common exit code for failures.

Another option is process.abort(). When this method is called, the Node.js process terminates immediately. More importantly, if your operating system allows it, Node will also generate a core dump file, which contains a lot of useful information about the process. You can use this core dump for postmortem debugging with tools like llnode.
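To see these exit codes from the outside, the way a process monitor would, the sketch below runs small Node.js programs in child processes using the built-in `child_process.spawnSync` and reports how each one ended. The helper name `runAndGetStatus` is made up for this example.

```javascript
const { spawnSync } = require('child_process')

// Hypothetical helper: run a snippet of JavaScript in a fresh Node.js process
// and return how that process ended.
function runAndGetStatus (source) {
  const result = spawnSync(process.execPath, ['-e', source], { encoding: 'utf8' })
  // `status` is the exit code; `signal` is set instead when a signal killed the process.
  return { status: result.status, signal: result.signal }
}

console.log(runAndGetStatus('process.exit(0)'))         // { status: 0, signal: null }
console.log(runAndGetStatus('process.exit(1)'))         // { status: 1, signal: null }
// An uncaught exception also ends the process with exit code 1.
console.log(runAndGetStatus('throw new Error("boom")')) // { status: 1, signal: null }

module.exports = { runAndGetStatus }
```

On Linux and macOS, running `'process.abort()'` the same way typically reports `status: null` and `signal: 'SIGABRT'`. It may also leave a core file behind if core dumps are enabled on the machine.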

Node.js exit events

Node.js runs your JavaScript on top of an event loop, which lets you listen for events as they occur and act on them. When Node.js exits, it also emits several types of events.

One of these is beforeExit. As its name implies, it is emitted right before a Node process exits, specifically when Node.js empties its event loop and has no additional work to schedule. You can provide an event handler that makes asynchronous calls, and the event loop will continue to perform the work until it's all finished. Note that this event is not emitted on process.exit() calls or uncaught exceptions; we'll get into when you might use this event a little later.

Another event is exit, which is emitted when process.exit() is explicitly called or when the event loop has no more work to perform. Because the process exits immediately after the exit listeners run, any asynchronous work scheduled in this handler is abandoned, so you can only do synchronous work here.

The code sample below illustrates the differences between the two events:

    process.on('beforeExit', code => {
      // Can make asynchronous calls
      setTimeout(() => {
        console.log(`Process will exit with code: ${code}`)
        process.exit(code)
      }, 100)
    })

    process.on('exit', code => {
      // Only synchronous calls
      console.log(`Process exited with code: ${code}`)
    })

OS signal events

Your operating system also sends events to your Node.js process, depending on circumstances occurring outside of your program. These are called signals. Two of the more common signals are SIGTERM and SIGINT.

SIGTERM is normally sent by a process monitor to tell Node.js to shut down, and it should be treated as an expected, successful termination. If you're running systemd or upstart to manage your Node application and you stop the service, it sends a SIGTERM signal so that you can handle the process shutdown.

SIGINT is emitted when a Node.js process is interrupted, usually as the result of a Control-C (^C) keyboard event. You can also capture that event and do some work around it.

Here is an example showing how you might act on these signal events:

```javascript
process.on('SIGTERM', signal => {
  console.log(`Process ${process.pid} received a SIGTERM signal`)
  process.exit(0)
})

process.on('SIGINT', signal => {
  console.log(`Process ${process.pid} has been interrupted`)
  process.exit(0)
})
```

Since these two events are considered a successful termination, we call process.exit and pass an argument of 0, because the shutdown is expected.
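The sketch below shows what registering a listener changes. It starts a child process that keeps itself alive with a timer and then sends SIGTERM to itself with process.kill(): without a listener the child is killed by the signal, while with one it runs its cleanup and exits with code 0. The run helper and the 'cleaning up' message are hypothetical, and this only behaves this way on POSIX systems, not on Windows.

```javascript
const { spawnSync } = require('child_process')

// Child script: keep the event loop alive with a timer, then send SIGTERM to itself.
const selfTerminate = "setTimeout(() => {}, 5000); process.kill(process.pid, 'SIGTERM')"
// Optional listener that cleans up and exits successfully.
const onSigterm = "process.on('SIGTERM', () => { console.log('cleaning up'); process.exit(0) });"

function run (source) {
  const { status, signal, stdout } = spawnSync(process.execPath, ['-e', source], { encoding: 'utf8' })
  return { status, signal, stdout: stdout.trim() }
}

// No listener: the default action applies and the signal kills the process.
console.log(run(selfTerminate))
// { status: null, signal: 'SIGTERM', stdout: '' }

// With a listener: the handler runs and the process exits with code 0.
console.log(run(onSigterm + selfTerminate))
// { status: 0, signal: null, stdout: 'cleaning up' }
```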

JavaScript error events

Finally, we arrive at higher-level error types: the error events thrown by JavaScript itself.

When a JavaScript error is not properly handled, an uncaughtException is emitted. This suggests the programmer has made an error, and it should be treated with the utmost priority. Usually, it means a bug occurred in a piece of logic that needed more testing, such as calling a method on null or undefined.

An unhandledRejection error is a newer concept. It is emitted when a promise is not satisfied; in other words, a promise was rejected (it failed), and there was no handler attached to respond. These errors can indicate an operational error or a programmer error, and they should also be treated as high priority.

In both of these cases, you should do something counterintuitive and let your program crash! Please don't try to be clever and introduce some complex logic trying to prevent a process restart. Doing so will almost always leave your application in a bad state, whether that's a memory leak or sockets left hanging. It's simpler to let it crash, start a new process from scratch, and continue receiving more requests.

Here's some code showing how you might best handle these events:

```javascript
process.on('uncaughtException', err => {
  console.log(`Uncaught Exception: ${err.message}`)
  process.exit(1)
})
```

We're explicitly "crashing" the Node.js process here! Don't be afraid of this! It is more likely than not unsafe to continue. The Node.js documentation says,

Unhandled exceptions inherently mean that an application is in an undefined state... The correct use of 'uncaughtException' is to perform synchronous cleanup of allocated resources (e.g. file descriptors, handles, etc) before shutting down the process. It is not safe to resume normal operation after 'uncaughtException'.
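As a concrete take on that "synchronous cleanup," here is a sketch of an uncaughtException handler that closes file descriptors the application owns, writes the error to stderr, and exits. installCrashHandler is a hypothetical helper; its exit parameter exists only so the handler can be exercised without ending the current process, and every call inside the handler (fs.closeSync, fs.writeSync) is synchronous on purpose.

```javascript
const fs = require('fs')

// Installs an 'uncaughtException' handler that closes the given file
// descriptors, logs the error to stderr, and exits with code 1.
// `exit` defaults to process.exit; it is a parameter only for testing.
function installCrashHandler (fds, exit = code => process.exit(code)) {
  const onCrash = err => {
    // Only synchronous work is safe here: close every descriptor we own.
    for (const fd of fds) {
      try {
        fs.closeSync(fd)
      } catch (_) {
        // Already closed or invalid; keep cleaning up the rest.
      }
    }
    // A synchronous write to fd 2 (stderr) completes before we exit.
    fs.writeSync(2, `Uncaught Exception: ${err && err.stack ? err.stack : err}\n`)
    exit(1)
  }
  process.on('uncaughtException', onCrash)
  return onCrash // returned so callers can remove the listener if needed
}
```

In an application you would call it once at startup, for example `installCrashHandler([logFd])`, and let the default exit do its job.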

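The same approach applies to unhandled promise rejections:

```javascript
process.on('unhandledRejection', (reason, promise) => {
  // `reason` is whatever the promise was rejected with; it is not always an Error
  const message = reason instanceof Error ? reason.message : String(reason)
  console.log('Unhandled rejection at ', promise, `reason: ${message}`)
  process.exit(1)
})
```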

unhandledRejection is such a common error that the Node.js maintainers have decided it should really crash the process, and they warn us that in a future version of Node.js unhandled rejections will crash the process:

[DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
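You can observe this from the command line. Node.js has a `--unhandled-rejections` flag, and since Node.js 15 its default mode is `throw`, which crashes the process when a rejection goes unhandled and no unhandledRejection listener is registered, as the warning above predicted. The sketch below runs the same unhandled rejection under the `warn` and `strict` modes to compare the two behaviors; exitStatusFor is a helper made up for this example.

```javascript
const { spawnSync } = require('child_process')

// Runs a script that rejects a promise with no handler attached, under the
// given --unhandled-rejections mode, and returns the child's exit code.
function exitStatusFor (mode) {
  const script = "Promise.reject(new Error('boom'))"
  const { status } = spawnSync(process.execPath, [`--unhandled-rejections=${mode}`, '-e', script])
  return status
}

// 'warn' prints a warning but lets the process finish normally.
console.log(exitStatusFor('warn'))   // 0
// 'strict' raises the rejection as an uncaught exception, so the process crashes.
console.log(exitStatusFor('strict')) // 1
```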

Run more than one process

Even if your process startup time is extremely quick, running only a single process is a risk to safe and uninterrupted application operation. We recommend running more than one process and using a load balancer to handle the scheduling. That way, if one of the processes crashes, there is another process that is alive and able to receive new requests. This gives you a little more leverage and helps prevent downtime.

Use whatever you have on hand for load balancing. You can configure a reverse proxy like nginx or HAProxy to do this. If you're on Heroku, you can scale your application to increase the number of dynos. If you're on Kubernetes, you can use Ingress or other load balancer strategies for your application.
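Conceptually, the load balancer's job here is simple: spread requests across the processes and stop sending traffic to one that has died. The toy round-robin picker below illustrates only that idea. It is not how nginx, HAProxy, or the Heroku router are implemented, the backend addresses are hypothetical, and in practice marking a backend down would be driven by failed requests or health checks.

```javascript
// A toy round-robin balancer: it cycles through backends and skips any
// that are currently marked down (for example, a crashed process).
function createBalancer (backends) {
  const down = new Set()
  let next = 0
  return {
    markDown (backend) { down.add(backend) },
    markUp (backend) { down.delete(backend) },
    pick () {
      for (let i = 0; i < backends.length; i++) {
        const candidate = backends[(next + i) % backends.length]
        if (!down.has(candidate)) {
          next = (next + i + 1) % backends.length
          return candidate
        }
      }
      // Every backend is down: the outage a single process would risk.
      return null
    }
  }
}

const lb = createBalancer(['127.0.0.1:3001', '127.0.0.1:3002'])
console.log(lb.pick()) // 127.0.0.1:3001
console.log(lb.pick()) // 127.0.0.1:3002
lb.markDown('127.0.0.1:3001') // the first process crashed
console.log(lb.pick()) // 127.0.0.1:3002
console.log(lb.pick()) // 127.0.0.1:3002
```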

Monitor your processes

You should have process monitoring in place: something running in your operating system or application environment that constantly checks whether your Node.js process is alive. If the process crashes due to a failure, the process monitor is in charge of restarting it.

Our recommendation is to always use the native process monitoring that's available on your operating system. For example, if you're running on Unix or Linux, you can use systemd or upstart. If you're using containers, Docker has a --restart flag, and Kubernetes has restartPolicy, both of which are useful.
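To make the job of a process monitor concrete, here is a minimal supervisor written in Node.js that restarts a command whenever it exits unsuccessfully, giving up after a fixed number of restarts (similar in spirit to systemd's `Restart=on-failure`). It only illustrates the idea; the native tools recommended above remain the better choice in production. supervise and its maxRestarts and delayMs options are hypothetical names.

```javascript
const { spawn } = require('child_process')

// Starts `command` and restarts it whenever it exits with a non-zero code or
// is killed by a signal, up to `maxRestarts` times. Resolves with the total
// number of times the command was started.
function supervise (command, args, { maxRestarts = 5, delayMs = 1000 } = {}) {
  return new Promise((resolve, reject) => {
    let starts = 0
    const start = () => {
      starts++
      const child = spawn(command, args, { stdio: 'inherit' })
      child.on('error', reject) // e.g. the command could not be found
      child.on('exit', (code, signal) => {
        // `code` is null when the child was killed by a signal
        const failed = code !== 0
        if (failed && starts <= maxRestarts) {
          console.error(`exited (code=${code}, signal=${signal}), restarting in ${delayMs}ms`)
          setTimeout(start, delayMs)
        } else {
          resolve(starts)
        }
      })
    }
    start()
  })
}
```

For example, `supervise(process.execPath, ['server.js'], { maxRestarts: 3 })` would keep a hypothetical `server.js` running through up to three crashes, while a clean exit with code 0 is not restarted.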

If you can't use any existing tools, use a Node.js process monitor like PM2 or forever as a last resort. These tools are okay for development environments, but I can't really recommend them for production use.

If your application is running on Heroku, don't worry; we take care of the restart for you!

Graceful shutdowns

Let's say we have a server running. It's receiving requests and establishing connections with clients. But what happens if the process crashes? If we're not performing a graceful shutdown, some of those sockets are going to hang around and keep waiting for a response until a timeout is reached. That unnecessary waiting consumes resources, eventually leading to downtime and a degraded experience for your users.

It's best to explicitly stop receiving connections, so that the server can wind down its existing connections while it's recovering. Any new connections will go to the other Node.js processes running behind the load balancer.

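To do this, you can call server.close(), which tells the server to stop accepting new connections. Most Node servers implement this method, and it accepts a callback function as an argument; for a Node.js http or net server, that callback runs once all existing connections have ended and the server has closed.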

Now, imagine that your server has many clients connected, and the majority of them have not experienced an error or crashed. How can you close the server without abruptly disconnecting valid clients? We'll use a timeout: if the connections don't all close within a certain limit, we shut down the server completely. We do this because we want to give existing, healthy clients time to finish up, but don't want the server to wait an excessively long time to shut down.

Here's some sample code showing what that might look like:

```javascript
process.on('<signal or error event>', _ => {
  // Stop accepting new connections; exit once existing ones have closed
  server.close(() => {
    process.exit(0)
  })
  // If the server hasn't finished closing in 1000ms, shut down the process anyway
  setTimeout(() => {
    process.exit(0)
  }, 1000).unref() // unref() keeps this timer from holding the event loop open
})
```
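The same pattern can be packaged as a promise-returning helper that is easier to reuse and test. In this sketch (closeGracefully is a hypothetical name), the helper only reports whether the server closed cleanly or hit the timeout, and leaves the actual process.exit() call to the caller.

```javascript
const http = require('http')

// Stops accepting new connections and waits for open ones to finish.
// Resolves 'closed' once every connection has ended, 'timeout' if that takes
// longer than `timeoutMs`, or 'not-running' if the server was never listening.
function closeGracefully (server, timeoutMs = 1000) {
  return new Promise(resolve => {
    const timer = setTimeout(() => resolve('timeout'), timeoutMs)
    timer.unref() // don't keep the process alive just for this timer
    server.close(err => {
      clearTimeout(timer)
      // server.close() passes an error if the server was not open
      resolve(err ? 'not-running' : 'closed')
    })
  })
}
```

A signal handler could then call `closeGracefully(server, 1000).then(() => process.exit(0))`, choosing a different exit code for error events.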

Logging

Chances are you have already implemented a robust logging strategy for your running application, so I won't go into it too much here. Just remember to log with the same rigor and amount of detail when the application shuts down!

If a crash occurs, log as much relevant information as possible, including the errors and stack trace. Rely on libraries like pino or winston in your application, and store these logs using one of their transports for better visibility. You can also take a look at our various logging add-ons to find a provider that matches your application's needs.
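As a sketch of what "as much relevant information as possible" might look like, the helper below turns an error into a single JSON line with a timestamp, process ID, message, and stack trace, and writes it with fs.writeSync so the write completes before the process exits. formatCrashLog and logCrash are hypothetical names and the field names are only an example; this does not use the pino or winston APIs, which you would normally reach for instead.

```javascript
const fs = require('fs')

// Builds a single JSON line describing a crash. The field names are only an
// example; pick whatever your log pipeline expects.
function formatCrashLog (err, context = {}) {
  const isError = err instanceof Error
  return JSON.stringify({
    level: 'fatal',
    time: new Date().toISOString(),
    pid: process.pid,
    ...context,
    message: isError ? err.message : String(err),
    // JSON.stringify drops this key for non-Error values (no stack)
    stack: isError ? err.stack : undefined
  })
}

// Writes the line synchronously to stderr, so it is fully written even if
// process.exit() is called right afterwards.
function logCrash (err, context) {
  fs.writeSync(process.stderr.fd, formatCrashLog(err, context) + '\n')
}
```

A crash handler might then do `logCrash(err, { reason: 'uncaughtException' })` before calling `process.exit(1)`.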

Make sure everything is still good

Last, and certainly not least, we recommend that you add a health check route. This is a simple endpoint that returns a 200 status code if your application is running:

```javascript
const express = require('express')
const app = express()

// Add a health check route in Express
app.get('/_health', (req, res) => {
  res.status(200).send('ok')
})
```

You can have a separate service continuously monitor that route. You can configure this in a number of ways, whether by using a reverse proxy, such as nginx or HAProxy, or a load balancer, like ELB or ALB.
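On the monitoring side, a check can be as small as requesting the route and treating anything other than a timely 200 as a failure. This sketch uses only Node's built-in http module; checkHealth and its two-second default timeout are assumptions made for the example, and the URL is expected to point at a route like the /_health endpoint above.

```javascript
const http = require('http')

// Requests a health endpoint and resolves true only for a 200 response
// received within `timeoutMs`; any error or timeout resolves false.
function checkHealth (url, timeoutMs = 2000) {
  return new Promise(resolve => {
    const req = http.get(url, res => {
      res.resume() // discard the body; only the status code matters
      resolve(res.statusCode === 200)
    })
    // Destroying the request emits 'error', which resolves false below
    req.setTimeout(timeoutMs, () => req.destroy(new Error('health check timed out')))
    req.on('error', () => resolve(false))
  })
}
```

A monitor would call `checkHealth('http://127.0.0.1:3000/_health')` on an interval (the port here is hypothetical) and alert or restart after several consecutive failures.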

Any application that acts as the top layer in front of your Node.js process can be used to constantly check that the health check is still responding successfully. These will also give you much more visibility into the health of your Node.js processes, and you can rest easy knowing that your Node processes are running properly. There are some great monitoring services to help you with this in the Add-ons section of our Elements Marketplace.

Putting it all together

Whenever I work on a new Node.js project, I use the same function to ensure that my crashes are logged and my recoveries are guaranteed. It looks something like this:

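```javascript
function terminate (server, options = { coredump: false, timeout: 500 }) {
  // Exit function: abort (producing a core dump) or exit with the given code
  const exit = code => {
    options.coredump ? process.abort() : process.exit(code)
  }

  return (code, reason) => (err, promise) => {
    if (err && err instanceof Error) {
      // Log error information, use a proper logging library here :)
      console.log(reason, err.message, err.stack)
    }

    // Attempt a graceful shutdown. Pass `code` explicitly: server.close()
    // calls its callback with an error argument, not an exit code
    server.close(() => exit(code))
    // Force the exit if connections haven't closed within the timeout
    setTimeout(() => exit(code), options.timeout).unref()
  }
}

module.exports = terminate
```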

Here, I've created a module called terminate. I pass the instance of the server that I'm going to be closing, along with some configuration options, such as whether I want to enable core dumps and the timeout. I usually use an environment variable to control when I want to enable a core dump. I enable them only when I am going to do some performance testing on my application or whenever I want to replicate the error.

This exported function can then be set to listen to our error events:

```javascript
const http = require('http')
const terminate = require('./terminate')
const server = http.createServer(...)

const exitHandler = terminate(server, {
  coredump: false,
  timeout: 500
})

process.on('uncaughtException', exitHandler(1, 'Unexpected Error'))
process.on('unhandledRejection', exitHandler(1, 'Unhandled Promise'))
process.on('SIGTERM', exitHandler(0, 'SIGTERM'))
process.on('SIGINT', exitHandler(0, 'SIGINT'))
```
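The setup above passes coredump: false directly. Since core dumps are usually switched on through an environment variable, one way to wire that in is a small function that builds the options object for terminate. TERMINATE_COREDUMP and TERMINATE_TIMEOUT_MS are hypothetical variable names chosen for this sketch, and the 500ms fallback simply mirrors the default in the terminate function.

```javascript
// Builds terminate() options from environment variables.
// TERMINATE_COREDUMP and TERMINATE_TIMEOUT_MS are hypothetical names.
function terminateOptionsFromEnv (env = process.env) {
  const timeout = Number.parseInt(env.TERMINATE_TIMEOUT_MS, 10)
  return {
    // Core dumps stay off unless explicitly requested
    coredump: env.TERMINATE_COREDUMP === 'true',
    // Fall back to 500ms when the variable is unset or not a positive number
    timeout: Number.isFinite(timeout) && timeout > 0 ? timeout : 500
  }
}
```

The setup could then call `terminate(server, terminateOptionsFromEnv())`, and starting the app with `TERMINATE_COREDUMP=true` would switch core dumps on for a profiling or debugging session.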

Additional resources

There are a number of existing npm modules that solve the same issues in similar ways. You can check these out as well:

@godaddy/terminus
stoppable
http-graceful-shutdown

Hopefully, this information will simplify your life and enable your Node app to run better and more safely in production!


Source: https://blog.heroku.com/best-practices-nodejs-errors