Docker Swarm¶

Docker Swarm is a container orchestration tool built and managed by Docker, Inc. It provides native clustering functionality for Docker containers, which turns a group of Docker engines into a single, virtual Docker engine.

In a Swarm, multiple Docker hosts form a self-organizing, self-healing cluster, meaning the containers can be managed across different servers. Docker Swarm allows IT administrators and developers to establish and manage a cluster of Docker nodes as a single virtual system.

The main features of Docker Swarm include:

Cluster Management: Docker Swarm provides an integrated process for cluster management. The swarm manager nodes can manage the resources of worker nodes, which in turn run swarm services.
Scaling: Docker Swarm allows services to be scaled up or down in response to a service’s requirements, facilitating improved availability and reliability.
Load Balancing: Swarm nodes can perform load balancing of services using ingress load balancing and DNS.
Service Discovery: Swarm managers can automatically assign a DNS name to each service in the swarm, making it easier to perform inter-service networking.
Security: Docker Swarm uses mutual Transport Layer Security (TLS) for authentication, authorization, and end-to-end encryption to ensure communication between the nodes in the swarm is secure.
Rolling Updates: Docker Swarm allows incremental updates to be performed across the entire fleet of services, minimizing the risk and impact of application updates.

Managing a Docker Swarm¶

You can get the status of a Docker Swarm using several docker CLI commands, primarily executed from a manager node.

Here are the most common ones:

`docker node ls`¶

This is the primary command to see the status of all nodes in the swarm.

Output shows:
- ID: The unique ID of the node.
- HOSTNAME: The hostname of the node.
- STATUS:
  - Ready: The node is healthy and can accept tasks.
  - Down: The node is unhealthy or unreachable.
  - Unknown: The manager has lost contact with the node.
- AVAILABILITY:
  - Active: The node can receive new tasks.
  - Pause: The node will not receive new tasks, but existing tasks continue to run.
  - Drain: The node will not receive new tasks, and existing tasks are stopped and rescheduled on other active nodes.
- MANAGER STATUS:
  - Leader: This is the primary manager node.
  - Reachable: This is a manager node that is part of the Raft consensus quorum and can become a leader if the current leader fails.
  - <blank>: This is a worker node.
- ENGINE VERSION: The Docker Engine version running on the node.

docker node ls

Example output:

ID                            HOSTNAME        STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
x7y2z...q5r6t*                manager1        Ready     Active         Leader           20.10.7
a1b2c...g7h8i                 worker1         Ready     Active                          20.10.7
k9l0m...o5p6q                 worker2         Down      Active                          20.10.5

(The * indicates the node you are currently connected to.)

`docker service ls`¶

To see the status of services running on the swarm.

Output shows:
- ID: The unique ID of the service.
- NAME: The name of the service.
- MODE: replicated (a specified number of tasks) or global (one task per node).
- REPLICAS: The desired number of tasks vs. the actual number of running tasks (e.g., 3/3).
- IMAGE: The Docker image used by the service.
- PORTS: Any published ports.

docker service ls

Example output:

ID                  NAME            MODE                REPLICAS            IMAGE               PORTS
a1b2c3d4e5f6        my_web_app      replicated          3/3                 nginx:latest        *:80->80/tcp
g7h8i9j0k1l2        my_api          replicated          0/2                 myuser/myapi:v1.2

(Here, my_api has an issue as 0 out of 2 desired replicas are running).

`docker service ps <service_name_or_id>`¶

To get detailed status of the tasks for a specific service. This is useful for troubleshooting why a service might not have the desired number of replicas.

Output shows:
- ID: Task ID.
- NAME: Task name (e.g., service_name.replica_number).
- IMAGE: Image used.
- NODE: Node the task is (or was) running on.
- DESIRED STATE: The state the scheduler wants the task to be in (e.g., Running).
- CURRENT STATE: The actual state of the task (e.g., Running, Shutdown, Failed, Pending).
- ERROR: Any error message if the task failed.
- PORTS: Published ports.

docker service ps my_web_app

Example output:

ID                  NAME                IMAGE               NODE            DESIRED STATE       CURRENT STATE           ERROR               PORTS
r2q7...             my_web_app.1        nginx:latest        worker1         Running             Running 2 minutes ago
p5o9...             my_web_app.2        nginx:latest        manager1        Running             Running 2 minutes ago
t4n8...             my_web_app.3        nginx:latest        worker1         Running             Running 2 minutes ago

`docker info`¶

Provides general information about the Docker installation, including swarm status if the node is part of a swarm.

Look for the Swarm: section.
Output shows:
- Swarm: active (or inactive)
- NodeID: ID of the current node.
- Is Manager: true (or false)
- ClusterID: ID of the swarm.
- Managers: Number of manager nodes.
- Nodes: Total number of nodes in the swarm.
- Default Address Pool: Default subnet for overlay networks.
- SubnetSize: Subnet mask size.
- Data Path Port: Port for VXLAN data path.
- Orchestration: Details about task history.
- Raft: Raft consensus status (if manager).
- Manager Addresses: List of manager IPs.

docker info

Relevant section from example output:

...
Swarm: active
 NodeID: x7y2z...q5r6t
 Is Manager: true
 ClusterID: abc123def456
 Managers: 1
 Nodes: 3
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Data Path Port: 4789
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
  External CAs:
   None
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.65.3
 Manager Addresses:
  192.168.65.3:2377
...

`docker node inspect <node_id_or_hostname>`¶

To get detailed low-level information about a specific node in JSON format.

docker node inspect self  # Inspect the current node
docker node inspect worker1 # Inspect a node named 'worker1'

Summary of what to look for¶

Node Health: docker node ls - ensure all nodes are Ready and Active (unless intentionally Drained or Paused). Pay attention to MANAGER STATUS for quorum.
Service Health: docker service ls - check if REPLICAS shows desired/actual (e.g., 3/3). If not, investigate further.
Task Health: docker service ps <service_name> - check CURRENT STATE and ERROR for failing tasks. This often points to application issues, resource limits, or image problems.

Page last modified: 2025-05-27 16:58:29

Docker Swarm¶

Managing a Docker Swarm¶

docker node ls¶

docker service ls¶

docker service ps <service_name_or_id>¶

docker info¶

docker node inspect <node_id_or_hostname>¶

Summary of what to look for¶

`docker node ls`¶

`docker service ls`¶

`docker service ps <service_name_or_id>`¶

`docker info`¶

`docker node inspect <node_id_or_hostname>`¶