When performing a backup of a DTR environment using the command docker run --rm -i docker/dtr backup, the backup fails with the following message:
FATA Error waiting for dtr-phase2 to finish: An error occurred trying to connect: Post https://docker-ucp1:443/v1.22/containers/<id>/wait: EOF, code: -1
A related error can also appear when you try to open the resulting backup file:
tar -tf /tmp/backup-images.tar
dtr-registry-<replica-id>
...
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
...
Before performing these steps, you must meet the following requirements:
- Have configured a load balancer to balance traffic between your DTR replicas
During a DTR backup job, the bootstrap script for the backup command spins up a dtr-phase2 container, where most of the backup work is performed. The bootstrapper then monitors the progress of dtr-phase2 via an ongoing call to the ContainerWait API endpoint, which blocks until the container returns an exit status.
While it waits, the ContainerWait call sends little or no traffic over the wire. This is a problem when a load balancer sits in the connection path and is not configured to keep idle connections alive for long enough. The load balancer then cuts the connection, and the backup command exits prematurely.
The following steps can be used to test for a UCP load balancer timeout independently of the dtr backup command:
As the Admin user, download and source a UCP client certificate bundle:
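If you prefer the command line to the UCP web UI, the download can be scripted. The following is a minimal sketch, not an official tool: it assumes the /auth/login and /api/clientbundle endpoints described in the UCP CLI access documentation are available in your UCP version, and that curl, jq, and unzip are installed. The function name fetch_client_bundle, the ucp-bundle directory, and the host and credentials in the usage comment are placeholders chosen for illustration.

```shell
# Hypothetical helper: log in to UCP, download a client bundle, and unpack it.
# Arguments: <ucp-host> <username> <password>
# Note: -k skips TLS verification (common with self-signed UCP certificates).
# Note: the JSON payload is built without escaping, so credentials containing
#       double quotes or backslashes will not work with this sketch.
fetch_client_bundle() (
  ucp_host=$1 user=$2 pass=$3

  # Log in and extract the session token from the JSON response.
  token=$(curl -sk -d "{\"username\":\"${user}\",\"password\":\"${pass}\"}" \
    "https://${ucp_host}/auth/login" | jq -r .auth_token)
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "login to ${ucp_host} failed" >&2
    exit 1
  fi

  # Download the bundle and unpack it into ./ucp-bundle.
  curl -sk -H "Authorization: Bearer ${token}" \
    "https://${ucp_host}/api/clientbundle" -o bundle.zip || exit 1
  unzip -o bundle.zip -d ucp-bundle
)

# Usage (placeholders, replace with your own values):
#   fetch_client_bundle ucp.example.com admin 'password'
#   cd ucp-bundle && eval "$(cat env.sh)"
```

The env.sh script in the bundle points the Docker CLI at UCP, so run it from inside the unpacked bundle directory before issuing the commands in the next step.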
Check whether docker wait for a long-running container times out after approximately the same amount of time that dtr backup takes to exit prematurely:
time docker wait $(docker ps -qaf name=ucp-controller |head -n1)
If this command prematurely exits with an error in approximately the same amount of time as dtr backup, then a load balancer may be terminating the connection.
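To make the comparison easier, both commands can be run through a small wrapper that prints the exit status and the elapsed time in the same format. This is only a sketch: timed_run is a hypothetical helper, not part of Docker, UCP, or DTR, and it measures wall-clock time in whole seconds.

```shell
# Hypothetical helper: run a command, then report its exit status and
# wall-clock duration (whole seconds) on stderr. Returns the command's status.
timed_run() {
  _tr_start=$(date +%s)
  _tr_rc=0
  "$@" || _tr_rc=$?
  _tr_end=$(date +%s)
  echo "exit=${_tr_rc} elapsed=$((_tr_end - _tr_start))s" >&2
  return "$_tr_rc"
}

# Usage (requires a sourced UCP client bundle):
#   timed_run docker wait "$(docker ps -qaf name=ucp-controller | head -n1)"
```

If the elapsed time reported here is close to the time after which the backup fails, that points to an idle timeout on the load balancer rather than a problem in the backup itself.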
To fix this issue, increase the tcp_keepalive setting on the load balancer that distributes traffic across the DTR replicas to a value of 5 minutes.