AWS – ECS Keeps Recreating Docker Container – Tobias Forkel – Blog – PHP / Magento Developer located in Melbourne, Australia

Today was a strange day. I launched a new version of my Docker image in an ECS cluster, which usually works fine. Today, however, the container was recreated every 5 minutes for no apparent reason.

The new Docker image that I had pushed to the repository contained no significant changes, except a redirect from non-SSL (port 80) to SSL (port 443) for all loaded virtual hosts. When I looked at the processes inside the container, everything seemed fine. Nothing there suggested that the container should be in an unhealthy state.

PID   USER     TIME  COMMAND
    1 root      0:02 {supervisord} /usr/bin/python2 /usr/bin/supervisord -c /etc/supervisord.conf
    8 root      0:00 nginx: master process /usr/sbin/nginx -g daemon off;
    9 root      0:00 {mysqld.sh} /bin/sh /scripts/mysqld.sh
   10 root      0:00 /usr/sbin/crond -f
   13 nginx     0:13 nginx: worker process
   35 root      0:00 {php-fpm7.1} php-fpm: master process (/etc/php/7.1/php-fpm.conf)
   90 root      0:00 {mysqld_safe} /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql
  288 mysql     0:08 /usr/bin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mariadb/plugin --user=mysql --log-error=/var/lib/mysql/f817894c1752.err --pid-file=f817894c1752.pid --socket=/var/run/my
 3620 www-data  0:06 {php-fpm7.1} php-fpm: pool www
 3621 www-data  0:05 {php-fpm7.1} php-fpm: pool www
 3668 www-data  0:02 {php-fpm7.1} php-fpm: pool www
 3795 root      0:00 bash
 3807 root      0:00 ps aux

Even the log output of the Docker container didn’t show any errors. After a bit of research, I ended up on the following page of the AWS developer guide: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html

When the service scheduler replaces a task during an update, the service first removes the task from the load balancer (if used) and waits for the connections to drain. Then, the equivalent of docker stop is issued to the containers running in the task. This results in a SIGTERM signal and a 30-second timeout, after which SIGKILL is sent and the containers are forcibly stopped. If the container handles the SIGTERM signal gracefully and exits within 30 seconds from receiving it, no SIGKILL signal is sent. The service scheduler starts and stops tasks as defined by your minimum healthy percent and maximum percent settings.
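
To illustrate the SIGTERM part of that quote, here is a minimal Python sketch of a process that reacts to SIGTERM and exits on its own before the timeout. It is not taken from my container (which runs supervisord); the names `handle_sigterm`, `stop_requested` and `run` are made up for this example.

```python
import signal
import threading

# Set once SIGTERM arrives; the main loop checks it between work steps.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record that a stop was requested."""
    stop_requested.set()


def run(work_step, poll_seconds=1.0):
    """Call work_step() repeatedly until SIGTERM has been received.

    work_step is a hypothetical callable doing one small unit of work.
    Must be called from the main thread, because signal.signal() only
    works there.
    """
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not stop_requested.is_set():
        work_step()
        # Sleeps up to poll_seconds, but wakes early if the event is set.
        stop_requested.wait(poll_seconds)
```

Because the loop returns shortly after the signal arrives, a process like this should be gone well within the 30 seconds mentioned above, so no SIGKILL would be needed.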

It was not exactly what I was looking for, but it largely explained the behaviour I was seeing. One of my colleagues then suggested that I have a look at the health checks of the load balancer.

Indeed, when I looked at the settings under EC2 > Load Balancing > Target Groups > Health Checks, the health check expected success code 200 on port 80. With the new redirect in place, port 80 now answered with 301 instead. Because of that, the target failed its health check and the container was marked unhealthy and replaced every 5 minutes.
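
To make the mismatch concrete, here is a small Python sketch of how a success-code setting can be compared against a response status. It is only an illustration, not the load balancer's actual implementation: the function name `status_matches` is made up, and I am assuming the documented matcher forms of a single code ("200"), a comma-separated list ("200,202") or a range ("200-299").

```python
def status_matches(matcher: str, status_code: int) -> bool:
    """Return True if status_code satisfies a success-code matcher string.

    Supported forms (an assumption based on the target group settings):
    a single code "200", a list "200,301", or a range "200-399".
    """
    for part in matcher.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Range form, inclusive on both ends.
            low, high = part.split("-", 1)
            if int(low) <= status_code <= int(high):
                return True
        elif int(part) == status_code:
            return True
    return False


# The situation from this post: the check expects 200, nginx answers 301.
redirect_passes = status_matches("200", 301)
# One possible fix: also accept the redirect as a success code.
redirect_passes_after_fix = status_matches("200,301", 301)
```

With the matcher set to "200" the redirect does not pass; with "200,301" it does. Another option, depending on the setup, would be to point the health check at a path or port that still answers with 200.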

I hope this will point someone in the right direction.
