Ansible Automation: The Crash That Taught Us Everything
Sometimes the best lessons come from disaster. That's this story.
The Before Times: Manual Everything
When we first built toaster, we were playing fast and loose. Services? Just run the docker command. Configuration? Edit the file and restart. Need to deploy something new? SSH in, pull the repo, install dependencies, configure the service, and hope nothing breaks.
It worked great. Right up until it didn't.
The Crash
I won't name names, but let's just say a full system crash turned into the kind of learning experience you only appreciate in hindsight.
Everything was gone. All the carefully configured services. All the docker containers. All the little tweaks and environment variables that made toaster... well, toaster.
And I'm sitting there looking at a blank terminal, thinking: "I have to rebuild all of this from scratch?"
The Realization
That's when it hit me: if you can't automate it, you can't scale it.
We had been building infrastructure like it was permanent. It isn't. Servers crash. Disks fail. Things break. What matters isn't the server - it's the knowledge of how to recreate it.
That's when we discovered Ansible.
Why Ansible?
We looked at the options:
- Terraform - Great for cloud infrastructure, overkill for a single server
- Docker Compose - Good for containers, doesn't handle systemd services
- Bash scripts - We already had too many of these
Ansible clicked because:
- Agentless - Just SSH and Python
- Idempotent - Run the same playbook twice and the second run changes nothing
- Declarative - Say WHAT you want, not HOW to do it
- Human readable - YAML, not magic
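Idempotency is the property that sold us: a task describes a desired state, and Ansible only acts when the system differs from it. A minimal sketch of what that looks like (the path here is a made-up example, not from our repo):

```yaml
# This describes a state, not an action: the directory either already
# exists with this mode, or it gets created/corrected. The first run
# reports "changed"; every run after that reports "ok".
- name: Ensure the deployments directory exists
  ansible.builtin.file:
    path: /opt/toaster/deployments   # hypothetical path
    state: directory
    mode: "0755"
```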
The Structure We Built
Here's the anatomy of our Ansible setup:
ansible/
├── deployments/
│   └── toaster/
│       ├── openwebui/            # Bash startup script
│       ├── openwebui.service     # Systemd service file
│       ├── qwen35/               # Model startup script
│       └── qwen35.service        # Systemd service file
├── roles/
│   └── deploy/
│       └── tasks/
│           └── main.yml          # The deployment logic
└── deploy-openwebui.yml          # Dedicated playbook
The Service File Pattern
Every service on toaster now follows the same pattern:
[Unit]
Description=Service Name
After=docker.service network.target
Requires=docker.service

[Service]
Type=simple
User=<username>
Group=<username>
WorkingDirectory=<working_dir>
ExecStart=<startup_script>
Restart=on-failure
RestartSec=10s
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
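One way to avoid hand-editing the `<username>` and `<working_dir>` placeholders on every deploy is to render the unit file with Ansible's templating. A hedged sketch using `ansible.builtin.template` - the template path and variable names here are assumptions for illustration, not what our repo actually uses:

```yaml
# Render a Jinja2-templated unit file, filling the placeholders from
# inventory variables, then install it where systemd looks for units.
- name: Install templated systemd unit
  ansible.builtin.template:
    src: templates/openwebui.service.j2   # hypothetical template path
    dest: /etc/systemd/system/openwebui.service
    mode: "0644"
  become: true
```

In the template itself, `User=<username>` becomes `User={{ service_user }}`, and so on for each placeholder.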
The Bash Script Pattern
And every service has a bash script that holds the actual docker command:
#!/usr/bin/env bash
docker run --rm \
  -p <port>:<port> \
  -v <volume_name>:/app/data \
  --name <service_name> \
  <docker_image>:<tag>
Why bash scripts? Because the command is right there. No guessing. No digging through systemd files. Just read the script and you know exactly what's running.
The Deployment Playbook
The real magic is in the playbook:
- name: Deploy service to toaster
  hosts: toaster
  gather_facts: true
  tasks:
    - name: Sync files to remote
      ansible.posix.synchronize:
        src: deployments/toaster/
        dest: <destination_path>/

    - name: Install systemd service
      ansible.builtin.copy:
        src: deployments/toaster/<service>.service
        dest: /etc/systemd/system/<service>.service
      become: true

    - name: Enable and start service
      ansible.builtin.systemd:
        name: <service>
        daemon_reload: true
        state: restarted
        enabled: true
      become: true
One command. That's all it takes:
ansible-playbook -i inventory.ini deploy-<service>.yml
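The `-i inventory.ini` flag points at a small hosts file telling Ansible where toaster lives and how to reach it. A minimal sketch - the hostname and user below are placeholders, not our real values:

```ini
; inventory.ini - one group, one machine
; (hostname and ansible_user are hypothetical)
[toaster]
toaster.example.lan ansible_user=deploy
```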
What This Means
Before Ansible, deploying a new service meant:
- SSH to toaster
- Create the docker command
- Create the systemd service file
- Copy both files
- Reload systemd
- Enable the service
- Start the service
- Hope nothing broke
Now? One command.
The Bigger Picture
This isn't just about convenience. It's about resilience.
If toaster crashes tomorrow - really crashes, disk failure, nothing boots - we're not starting from scratch. We have the knowledge. We have the playbooks. We can rebuild everything in minutes.
That's the lesson the crash taught us: automation isn't about laziness, it's about recovery.
What's Next
We're not stopping here. The next step is expanding Ansible to handle:
- GPU driver management
- Docker configuration
- Network setup
- Backup automation
Because the next crash? We'll be ready.
Sometimes the best way to learn is to break something completely. Then build it back better.
This blog post was written with the help of Qwen 3.5.