A Study of Graceful Shutdown for Spring Boot Applications

Recently, I looked at our project's restart scripts and found that Ops has been using kill -9 <pid> to restart our Spring Boot application (which runs an embedded Tomcat). Almost everyone agrees that kill -9 <pid> is a violent way to stop a process, but few people can explain what problems it actually causes. This article records my own thinking process.

What is the difference between kill -9 and kill -15?

In the old days, deploying a web application usually meant packaging the project as a WAR file and dropping it onto a Linux machine with a configured application container (e.g. Tomcat, WebLogic). The application could then be started and stopped simply by running the container's startup/shutdown scripts. Spring Boot offers another way: package the whole application together with an embedded Tomcat server. This makes deployment much more convenient, but it also raises a question: how do you stop a Spring Boot application? The obvious approach is to look up the process ID by application name and kill that process.

This scenario leads to the question: how do you kill a Spring Boot application process gracefully? On Linux, the kill command sends a signal to a process. The signal can be given as a number after kill, and running kill -l lists all signal numbers at a glance:

[root@localhost ~]# kill -l  
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

This article focuses on signal 9 (KILL) and signal 15 (TERM).

First, the difference between the two commands: kill -9 pid can be understood as the OS forcibly killing the process at the kernel level (SIGKILL cannot be caught or ignored), while kill -15 pid sends a notification (SIGTERM) asking the application to shut itself down. This is still a bit abstract, so let's look at how the application behaves under each command.

Code preparation

Since I work with Spring Boot the most, I'll use a simple Spring Boot application as the example and add the following code.

Add a class that implements the DisposableBean interface

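import org.springframework.beans.factory.DisposableBean;
import org.springframework.stereotype.Component;

@Component
public class TestDisposableBean implements DisposableBean {
    @Override
    public void destroy() throws Exception {
        // Called by Spring when the application context destroys its singletons
        System.out.println("Test bean destroyed ...");
    }
}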

Add a JVM shutdown hook

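import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class TestShutdownApplication {
    public static void main(String[] args) {
        SpringApplication.run(TestShutdownApplication.class, args);

        // A plain JVM shutdown hook, registered alongside the one Spring registers for its context
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("Executing ShutdownHook ...");
            }
        }));
    }
}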

Testing steps

Run java -jar test-shutdown-1.0.jar to start the application.
Stop it with kill -9 pid, kill -15 pid and Ctrl+C in turn, and compare the log output.

Test results

kill -15 pid and Ctrl+C have the same effect; the output is as follows:

2018-01-14 16:55:32.424  INFO 8762 --- [       Thread-3] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@2cdf8d8a: startup date [Sun Jan 14 16:55:24 UTC 2018]; root of context hierarchy
2018-01-14 16:55:32.432  INFO 8762 --- [       Thread-3] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
Executing ShutdownHook ...
Test bean destroyed ...
java -jar test-shutdown-1.0.jar  7.46s user 0.30s system 80% cpu 9.674 total

kill -9 pid produces no application log output at all:


[1]    8802 killed     java -jar test-shutdown-1.0.jar
java -jar test-shutdown-1.0.jar  7.74s user 0.25s system 41% cpu 19.272 total

As you can see, kill -9 pid takes the application by surprise and gives it no chance to react. kill -15 pid is more graceful: first AnnotationConfigEmbeddedWebApplicationContext (an ApplicationContext implementation) logs that it is closing, then the shutdown hook from the test code runs, and finally DisposableBean#destroy() is called. (JVM shutdown hooks run concurrently in an unspecified order, so the relative order of these lines is not guaranteed.) The difference between the two is immediately obvious.

When an application shuts down, we usually need some "aftercare" logic, such as:

closing socket connections
cleaning up temporary files
notifying subscribers that this node is going offline
notifying child processes that they are about to be destroyed
releasing various resources

kill -9 pid, on the other hand, is equivalent to a system crash or a power cut, which is very unfriendly to the application. Don't use a scythe to trim potted flowers: use kill -15 pid instead. If in practice kill -15 pid fails to stop the application, you can fall back to kill -9 pid, but be sure to find out afterwards why kill -15 pid did not work.
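The "try kill -15 first, fall back to kill -9" advice can be sketched in Java as below. This is an illustration rather than our actual restart script: it assumes OpenJDK on Linux, where Process.destroy() sends SIGTERM and destroyForcibly() sends SIGKILL (the Javadoc itself only calls this implementation dependent), and it uses a plain `sleep 60` child process as a stand-in for the application.

```java
import java.util.concurrent.TimeUnit;

public class GracefulStop {
    /** Ask the process to stop, and only kill it forcibly if it ignores the request. */
    static boolean stop(Process p, long graceSeconds) throws InterruptedException {
        p.destroy(); // roughly kill -15 on OpenJDK/Linux
        if (p.waitFor(graceSeconds, TimeUnit.SECONDS)) {
            return true; // exited on its own within the grace period
        }
        p.destroyForcibly(); // roughly kill -9: the process gets no chance to clean up
        p.waitFor();
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for `java -jar test-shutdown-1.0.jar`
        Process p = new ProcessBuilder("sleep", "60").start();
        boolean graceful = stop(p, 10);
        System.out.println("graceful=" + graceful + ", exit=" + p.exitValue());
    }
}
```

If stop() ever returns false for the real application, that is the signal to investigate why kill -15 was not enough.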

How does Spring Boot handle signal 15 (TERM)?

As shown above, kill -15 pid is a more graceful way to shut down a Spring Boot application, but it raises further questions: how do Spring Boot and Spring respond to the shutdown? Is Tomcat shut down first and then the JVM exits, or the other way around? How are the two related?

Starting from the logs: AnnotationConfigEmbeddedWebApplicationContext prints the Closing message, so I went straight to its source code and found the key logic in its parent class, AbstractApplicationContext:

@Override
public void registerShutdownHook() {
  if (this.shutdownHook == null) {
    this.shutdownHook = new Thread() {
      @Override
      public void run() {
        synchronized (startupShutdownMonitor) {
          doClose();
        }
      }
    };
    Runtime.getRuntime().addShutdownHook(this.shutdownHook);
  }
}

@Override
public void close() {
   synchronized (this.startupShutdownMonitor) {
      doClose();
      if (this.shutdownHook != null) {
         Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
      }
   }
}

protected void doClose() {
   if (this.active.get() && this.closed.compareAndSet(false, true)) {
      LiveBeansView.unregisterApplicationContext(this);
      // Publish the close event inside the application
      publishEvent(new ContextClosedEvent(this));
      // Stop all Lifecycle beans, to avoid delays during individual destruction.
      if (this.lifecycleProcessor != null) {
         this.lifecycleProcessor.onClose();
      }
      // Spring's BeanFactory may cache singleton beans; destroy them
      destroyBeans();
      // Close the application context & BeanFactory
      closeBeanFactory();
      // Run subclass-specific close logic
      onClose();
      this.active.set(false);
   }
}

To make it easier to read, I removed some exception handling from the source and added comments. When the container is initialized, the ApplicationContext registers a shutdown hook (Spring Boot's SpringApplication does this by default), and that hook calls doClose(), the same method close() delegates to. So when we execute kill -15 pid, the JVM receives the signal and runs its shutdown hooks, and Spring's hook performs the aftercare in doClose(): publishing the ContextClosedEvent, stopping Lifecycle beans, destroying cached singletons, closing the bean factory, and so on, as noted in the comments. When the ApplicationContext implementation is AnnotationConfigEmbeddedWebApplicationContext, it additionally shuts down the embedded Tomcat/Jetty server.
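To see why the cleanup runs exactly once whether the context is closed explicitly or by kill -15, here is a Spring-free model of the registerShutdownHook()/close()/doClose() pattern above. MiniContext and its cleanupRuns counter are hypothetical names for illustration only; the real AbstractApplicationContext does much more work inside doClose().

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of AbstractApplicationContext's shutdown wiring. */
public class MiniContext {
    private final Object startupShutdownMonitor = new Object();
    private final AtomicBoolean closed = new AtomicBoolean(false);
    final AtomicInteger cleanupRuns = new AtomicInteger();
    private Thread shutdownHook;

    public void registerShutdownHook() {
        if (shutdownHook == null) {
            shutdownHook = new Thread(() -> {
                synchronized (startupShutdownMonitor) {
                    doClose(); // path taken on kill -15 / Ctrl+C
                }
            });
            Runtime.getRuntime().addShutdownHook(shutdownHook);
        }
    }

    public void close() {
        synchronized (startupShutdownMonitor) {
            doClose(); // explicit close path
            if (shutdownHook != null) {
                // Cleanup is done, so the hook is no longer needed at JVM exit
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            }
        }
    }

    protected void doClose() {
        // compareAndSet guarantees the cleanup body runs at most once, whichever path arrives first
        if (closed.compareAndSet(false, true)) {
            cleanupRuns.incrementAndGet();
            System.out.println("publish close event, destroy beans, stop embedded server ...");
        }
    }

    public static void main(String[] args) {
        MiniContext ctx = new MiniContext();
        ctx.registerShutdownHook();
        ctx.close();
        ctx.close(); // second call is a no-op
        System.out.println("cleanup ran " + ctx.cleanupRuns.get() + " time(s)");
    }
}
```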

Both Java and C let you handle operating-system signals directly, so you can also capture these signals yourself. I won't go into detail here; if you are interested, try capturing them yourself.
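For Java, a minimal sketch uses sun.misc.Signal. Note that this is an unsupported JDK-internal API (module jdk.unsupported), so javac prints a warning and it may change in future releases. Installing a Java handler for TERM replaces the JVM's default one, which is the handler that starts the shutdown sequence and runs shutdown hooks. The sketch therefore restores the previous handler afterwards, and raises the signal in its own process instead of using an external kill.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import sun.misc.Signal;
import sun.misc.SignalHandler;

public class SignalDemo {
    /** Temporarily handle SIGTERM in Java code; returns true if our handler ran. */
    static boolean catchTermOnce() throws InterruptedException {
        Signal term = new Signal("TERM");
        CountDownLatch received = new CountDownLatch(1);
        // While installed, this handler replaces the JVM's default TERM handling,
        // so kill -15 would NOT run shutdown hooks or exit the JVM.
        SignalHandler previous = Signal.handle(term, sig -> {
            System.out.println("caught SIG" + sig.getName());
            received.countDown();
        });
        try {
            Signal.raise(term); // same effect as `kill -15 <own pid>`
            // Handlers run asynchronously on a JVM-managed thread, so wait for ours
            return received.await(5, TimeUnit.SECONDS);
        } finally {
            // Restore the default handler so kill -15 triggers a normal shutdown again
            Signal.handle(term, previous);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("handled=" + catchTermOnce());
    }
}
```

If a custom handler should still shut the application down, it must call System.exit(...) itself so that the registered shutdown hooks (including Spring's) run.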

Are there other ways to gracefully shut down an application?

The spring-boot-starter-actuator module provides a REST endpoint for graceful shutdown.

Adding dependencies

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Add configuration

# enable the shutdown endpoint
endpoints.shutdown.enabled=true
# disable authentication for the endpoint
endpoints.shutdown.sensitive=false

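In production, make sure access to this endpoint is restricted, for example with Spring Security. (These property names are from Spring Boot 1.x; in Spring Boot 2.x the equivalents are management.endpoint.shutdown.enabled=true and management.endpoints.web.exposure.include=shutdown, and the endpoint is served under /actuator/shutdown.)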

Run curl -X POST host:port/shutdown; if the shutdown succeeds, it returns:

{"message":"Shutting down, bye..."}

Although Spring Boot provides this option, as far as I know nobody actually shuts applications down this way; kill -15 pid achieves the same effect. It is listed here only for completeness.

How do I shut down a thread pool held as a member variable?

Although the JVM reclaims some resources when it exits, services that rely heavily on asynchronous callbacks and scheduled tasks can easily run into business problems if shutdown is not handled properly, and closing thread pools is a typical example.

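import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.stereotype.Service;

@Service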
public class SomeService {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    public void concurrentExecute() {
        executorService.execute(new Runnable() {
            @Override
            public void run() {
                System.out.println("executed...");
            }
        });
    }
}

We need a way to close the thread pool when the application shuts down (when the JVM exits or the container stops).

Initial solution: do nothing. In general this is not a big problem, because the pool is released when the JVM exits, but it clearly fails the one word this article keeps emphasizing: gracefully.

The downside of this approach is that tasks already running in the pool, and tasks still waiting in its blocking queue, end up in an uncontrolled state. Should the pool exit immediately on receiving the shutdown command? Wait for the tasks to finish? Or wait for a bounded amount of time and then give up on unfinished tasks?

Improved solution

After discovering the disadvantages of the initial solution, I immediately thought of using the DisposableBean interface, like this.

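import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.stereotype.Service;

@Service
public class SomeService implements DisposableBean {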

    ExecutorService executorService = Executors.newFixedThreadPool(10);

    public void concurrentExecute() {
        executorService.execute(new Runnable() {
            @Override
            public void run() {
                System.out.println("executed...");
            }
        });
    }

    @Override
    public void destroy() throws Exception {
        executorService.shutdownNow();
        //executorService.shutdown();
    }
}

The question then arises: shutdown or shutdownNow? The two are often misused, so let's briefly compare them.

After shutdown(), a ThreadPoolExecutor moves to the SHUTDOWN state: it rejects new tasks but still runs both the tasks currently executing and the tasks waiting in the queue. shutdown() itself returns immediately; it only initiates the shutdown and does not wait for anything to finish.

shutdownNow() behaves differently: the pool moves to the STOP state, removes the tasks still waiting in the queue and returns them, and calls Thread.interrupt() on the threads that are running tasks. If a running task does not respond to interrupts, nothing happens to it, so shutdownNow() does not mean "stop immediately" either.

The Javadoc for shutdown and shutdownNow gives the following hints:

shutdown(): Initiates an orderly shutdown in which previously submitted tasks are executed, but no new tasks will be accepted. Invocation has no additional effect if already shut down. This method does not wait for previously submitted tasks to complete execution. Use awaitTermination to do that.

shutdownNow(): Attempts to stop all actively executing tasks, halts the processing of waiting tasks, and returns a list of the tasks that were awaiting execution. These tasks are drained (removed) from the task queue upon return from this method. This method does not wait for actively executing tasks to terminate. Use awaitTermination to do that. There are no guarantees beyond best-effort attempts to stop processing actively executing tasks. This implementation cancels tasks via Thread#interrupt, so any task that fails to respond to interrupts may never terminate.

Both hint that calling shutdown/shutdownNow alone is not enough: we also need to call awaitTermination.
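A small experiment makes the difference concrete. The sketch below (the class name and the 100 ms sleep are illustrative choices) submits five tasks to a single-thread pool and stops it either way. With shutdown() every queued task still completes; with shutdownNow() the queued tasks are drained and returned, and the running task is interrupted. Neither call blocks, and awaitTermination is what actually waits.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownVsShutdownNow {
    /** Submits 5 tasks of ~100 ms each, stops the pool, and returns how many completed. */
    static int runAndStop(boolean now) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 5; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(100); // interruptible "work"
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // task gives up when interrupted
                }
            });
        }
        if (now) {
            List<Runnable> drained = pool.shutdownNow(); // queued tasks are removed and returned
            System.out.println("drained from queue: " + drained.size());
        } else {
            pool.shutdown(); // queued tasks will still run
        }
        // Both calls return immediately; this is where we actually wait
        if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
            System.out.println("pool did not terminate in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shutdown():    completed " + runAndStop(false));
        System.out.println("shutdownNow(): completed " + runAndStop(true));
    }
}
```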

Final solution: following the thread pool shutdown strategy in Spring's ExecutorConfigurationSupport, we arrive at the final solution.

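public abstract class ExecutorConfigurationSupport extends CustomizableThreadFactory
      implements DisposableBean {
    // Excerpt: fields such as executor, waitForTasksToCompleteOnShutdown and
    // awaitTerminationSeconds, as well as logging, are omitted here.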
    @Override
    public void destroy() {
        shutdown();
    }

    /**
     * Perform a shutdown on the underlying ExecutorService.
     * @see java.util.concurrent.ExecutorService#shutdown()
     * @see java.util.concurrent.ExecutorService#shutdownNow()
     * @see #awaitTerminationIfNecessary()
     */
    public void shutdown() {
        if (this.waitForTasksToCompleteOnShutdown) {
            this.executor.shutdown();
        }
        else {
            this.executor.shutdownNow();
        }
        awaitTerminationIfNecessary();
    }

    /**
     * Wait for the executor to terminate, according to the value of the
     * {@link #setAwaitTerminationSeconds "awaitTerminationSeconds"} property.
     */
    private void awaitTerminationIfNecessary() {
        if (this.awaitTerminationSeconds > 0) {
            try {
                this.executor.awaitTermination(this.awaitTerminationSeconds, TimeUnit.SECONDS);
            }
            catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

With the comments kept and some logging code removed, this gives us a way to shut down a thread pool gracefully:

The waitForTasksToCompleteOnShutdown flag controls whether to stop all tasks immediately (shutdownNow) or let them finish before exiting (shutdown).
executor.awaitTermination(this.awaitTerminationSeconds, TimeUnit.SECONDS) bounds how long to wait, so that tasks cannot block shutdown indefinitely (as emphasized above, even shutdownNow does not guarantee that a running thread will stop).
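Outside Spring, the same idea can be packed into a small helper that SomeService.destroy() could call, for example shutdownGracefully(executorService, 30, TimeUnit.SECONDS). The helper name and the 30-second figure are hypothetical. The two-phase shape (shutdown, bounded wait, then shutdownNow and another bounded wait) follows the usage example in the ExecutorService Javadoc rather than Spring's exact code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class ExecutorShutdown {
    private ExecutorShutdown() {}

    /**
     * Hypothetical helper: stop accepting work, give running and queued tasks a grace
     * period, then interrupt whatever is left. Returns true if the pool fully terminated.
     */
    public static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // reject new tasks; already-queued tasks keep running
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // interrupt running tasks and drop the queue
            // Interrupt-aware tasks should exit now; ones that ignore interrupts may not
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 20; i++) {
            pool.execute(() -> System.out.println("executed..."));
        }
        System.out.println("terminated=" + shutdownGracefully(pool, 5, TimeUnit.SECONDS));
    }
}
```

If you are already on Spring, ThreadPoolTaskExecutor inherits the setWaitForTasksToCompleteOnShutdown and setAwaitTerminationSeconds setters from ExecutorConfigurationSupport, which gives you this behavior through configuration.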

More graceful shutdown strategies to think about

Service governance frameworks generally take graceful shutdown into account. The common practice is to cut off traffic first and only then stop the application, for example by removing the node from the service registry so that subscribers are notified, drop the node, and stop routing requests to it. For database operations, the ACID properties of transactions ensure there is no inconsistent data even after a crash, let alone a normal shutdown. Message queues can rely on ACK mechanisms plus message persistence, or on transactional messages. Services with many scheduled tasks need special attention when going offline, because such long-running jobs are more likely than other workloads to be interrupted mid-run; you can design these tasks around idempotence and flag bits …
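Here is one way the "idempotence plus flag bit" idea could look for a batch job. Everything in this sketch is hypothetical: the in-memory done set stands in for a persistent "processed" marker (such as a status column), and the 10 ms sleep stands in for real work. The job checks a stop flag only between items, records an item only after its work succeeds, and skips recorded items on rerun. A shutdown hook raises the flag on kill -15 and waits, with a bound, for the current item to finish.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical batch job that stops only between items and skips already-finished ones. */
public class StoppableJob implements Runnable {
    private final AtomicBoolean stopping = new AtomicBoolean(false); // the "flag bit"
    private final int total;
    // Stand-in for a persistent "processed" marker, e.g. a status column in the database
    final Set<Integer> done;

    StoppableJob(int total, Set<Integer> done) {
        this.total = total;
        this.done = done;
    }

    public void requestStop() {
        stopping.set(true);
    }

    @Override
    public void run() {
        for (int id = 0; id < total; id++) {
            if (stopping.get()) {
                return; // the flag is checked only between items
            }
            if (done.contains(id)) {
                continue; // idempotent: a rerun skips work a previous run finished
            }
            try {
                Thread.sleep(10); // simulated work for one item
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // item not marked done, so a rerun will redo it
            }
            done.add(id); // record completion only after the work succeeded
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableJob job = new StoppableJob(100, ConcurrentHashMap.newKeySet());
        Thread worker = new Thread(job, "batch-job");
        worker.start();
        // On kill -15: raise the flag, then wait (bounded) for the current item to finish
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            job.requestStop();
            try {
                worker.join(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        worker.join();
        System.out.println("processed " + job.done.size() + " items");
    }
}
```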

Transactions and ACK support keep the service as reliable as possible even under crashes, power outages, kill -9 pid and so on, but we also need to think about kill -15 pid, normal offlining, and other shutdown strategies. To wrap up, I would like to add some of my own understanding of JVM shutdown hooks.

The Javadoc of Runtime#addShutdownHook describes it this way: When the virtual machine begins its shutdown sequence it will start all registered shutdown hooks in some unspecified order and let them run concurrently. When all the hooks have finished it will then run all uninvoked finalizers if finalization-on-exit has been enabled. Finally, the virtual machine will halt.

A shutdown hook keeps the JVM running until the hook itself terminates. This means that when we receive kill -15 pid, we can wait for in-flight tasks to finish before the JVM exits. It also explains why some applications do not exit on kill -15 pid: one of their shutdown hooks is blocked and never returns.

Reference https://mp.weixin.qq.com/s/z1HrAsNKQp-Ljq1fsehmMQ
