Bash Scripting Automation

Bash Scripting for DevOps: From Basics to Advanced

PS
Priya Sharma
Platform Engineer
Jan 20, 2025
18 min read

What You'll Learn

Master Bash scripting from variables and conditionals to functions, error handling, and real-world DevOps automation scripts used in production CI/CD pipelines.

Why Learn Bash Scripting?

Every DevOps engineer needs Bash. It's pre-installed on every Linux/macOS system, runs without dependencies, and is the glue that binds together all the tools in a DevOps pipeline — from CI/CD scripts to deployment automation, log rotation, and health checks.

1. Variables & Data Types

bash
#!/bin/bash

# ─── Variables ───
NAME="DevOps"
VERSION=1.5
IS_PROD=true

echo "App: $NAME v$VERSION"
echo "Production: ${IS_PROD}"   # Braces are good practice

# ─── Special Variables ───
echo "Script name: $0"          # Name of the script
echo "First arg: $1"            # First argument passed
echo "All args: $@"             # All arguments
echo "Arg count: $#"            # Number of arguments
echo "Last exit code: $?"       # Exit code of last command
echo "Process ID: $$"           # PID of current shell

# ─── Command Substitution ───
CURRENT_DATE=$(date +%Y-%m-%d)
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}')
echo "Date: $CURRENT_DATE  Disk: $DISK_USAGE"

# ─── Readonly Variables ───
readonly DB_HOST="prod-db.internal"
# DB_HOST="other"  # This would fail with an error

# ─── Arrays ───
SERVERS=("web01" "web02" "web03")
echo "${SERVERS[0]}"            # web01
echo "${SERVERS[@]}"            # All elements
echo "${#SERVERS[@]}"           # Array length
SERVERS+=("web04")              # Append element

# ─── Associative Arrays (Bash 4+) ───
declare -A CONFIG
CONFIG[host]="localhost"
CONFIG[port]=3000
CONFIG[env]="prod"
echo "Host: ${CONFIG[host]}, Port: ${CONFIG[port]}"

2. Conditionals

bash
#!/bin/bash

# ─── if / elif / else ───
CPU_USAGE=85

if [ $CPU_USAGE -gt 90 ]; then
    echo "CRITICAL: CPU above 90%"
elif [ $CPU_USAGE -gt 75 ]; then
    echo "WARNING: CPU above 75%"
else
    echo "OK: CPU normal"
fi

# ─── String comparisons ───
ENV="production"
if [[ "$ENV" == "production" ]]; then
    echo "Running in PROD — be careful!"
fi

if [[ "$ENV" != "dev" && "$ENV" != "staging" ]]; then
    echo "Not a dev or staging environment"
fi

# ─── File tests ───
FILE="/etc/nginx/nginx.conf"
if [[ -f "$FILE" ]]; then
    echo "Config file exists"
elif [[ -d "/etc/nginx" ]]; then
    echo "Nginx directory exists but no config"
else
    echo "Nginx not installed"
fi

# Common file test operators:
# -f  : file exists
# -d  : directory exists
# -r  : file is readable
# -w  : file is writable
# -x  : file is executable
# -s  : file has size > 0
# -z  : string is empty
# -n  : string is not empty

# ─── case statement ───
ACTION="deploy"
case "$ACTION" in
    "deploy")   echo "Deploying application..." ;;
    "rollback") echo "Rolling back to previous version..." ;;
    "status")   echo "Checking service status..." ;;
    *)          echo "Unknown action: $ACTION"; exit 1 ;;
esac

3. Loops

bash
#!/bin/bash

# ─── for loop ───
for SERVER in web01 web02 web03; do
    echo "Checking $SERVER..."
    ssh user@$SERVER "systemctl status nginx"
done

# ─── for loop with range ───
for i in {1..5}; do
    echo "Attempt $i..."
done

# ─── for loop over array ───
SERVICES=("nginx" "mysql" "redis" "node")
for SERVICE in "${SERVICES[@]}"; do
    if systemctl is-active --quiet "$SERVICE"; then
        echo "✅ $SERVICE is running"
    else
        echo "❌ $SERVICE is DOWN"
    fi
done

# ─── while loop ───
RETRIES=0
MAX_RETRIES=5
until curl -sf http://localhost:3000/health; do
    RETRIES=$((RETRIES + 1))
    if [ $RETRIES -ge $MAX_RETRIES ]; then
        echo "Service failed to start after $MAX_RETRIES attempts"
        exit 1
    fi
    echo "Waiting for service... (attempt $RETRIES)"
    sleep 5
done
echo "Service is up!"

# ─── Loop over file lines ───
while IFS= read -r line; do
    echo "Processing: $line"
done < servers.txt

# ─── Loop over command output ───
for PID in $(pgrep -f "node server.js"); do
    echo "Found Node process: $PID"
done

4. Functions

bash
#!/bin/bash

# ─── Basic function ───
log() {
    local LEVEL="$1"
    local MESSAGE="$2"
    local TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
    echo "[$TIMESTAMP] [$LEVEL] $MESSAGE"
}

log "INFO"  "Starting deployment"
log "ERROR" "Database connection failed"

# ─── Function with return value ───
is_service_running() {
    local SERVICE="$1"
    if systemctl is-active --quiet "$SERVICE"; then
        return 0   # true / success
    else
        return 1   # false / failure
    fi
}

if is_service_running "nginx"; then
    log "INFO" "Nginx is running"
else
    log "ERROR" "Nginx is not running"
fi

# ─── Function returning output ───
get_container_ip() {
    local CONTAINER="$1"
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$CONTAINER"
}

APP_IP=$(get_container_ip "my-app")
echo "App container IP: $APP_IP"

5. Error Handling — Production Best Practices

bash — Production-grade deployment script
#!/usr/bin/env bash
# ─── SAFETY FLAGS (Always use these!) ───
set -euo pipefail
# -e : Exit immediately on error
# -u : Treat unset variables as errors
# -o pipefail : Catch errors in pipes

# ─── Cleanup trap ───
TEMP_DIR=$(mktemp -d)
cleanup() {
    echo "Cleaning up temp files..."
    rm -rf "$TEMP_DIR"
}
trap cleanup EXIT           # Runs on script exit
trap 'log ERROR "Script failed at line $LINENO"' ERR

# ─── Real deployment script ───
deploy() {
    local APP_NAME="$1"
    local VERSION="$2"
    local ENV="$3"

    log INFO "Starting deployment of $APP_NAME:$VERSION to $ENV"

    # Validate inputs
    [[ -z "$APP_NAME" ]] && { log ERROR "APP_NAME required"; exit 1; }
    [[ -z "$VERSION" ]] && { log ERROR "VERSION required"; exit 1; }

    # Pull Docker image
    log INFO "Pulling image: $APP_NAME:$VERSION"
    docker pull "registry.example.com/$APP_NAME:$VERSION" || {
        log ERROR "Failed to pull image"
        exit 1
    }

    # Stop old container gracefully
    if docker ps -q -f name="$APP_NAME" | grep -q .; then
        log INFO "Stopping existing container"
        docker stop "$APP_NAME" --time=30
        docker rm "$APP_NAME"
    fi

    # Start new container
    log INFO "Starting new container"
    docker run -d \
        --name "$APP_NAME" \
        --restart unless-stopped \
        -e ENV="$ENV" \
        "registry.example.com/$APP_NAME:$VERSION"

    # Health check
    log INFO "Running health check..."
    sleep 10
    local MAX_CHECKS=6
    local CHECK=0
    while ! curl -sf "http://localhost:3000/health"; do
        CHECK=$((CHECK + 1))
        [[ $CHECK -ge $MAX_CHECKS ]] && { log ERROR "Health check failed after $MAX_CHECKS attempts"; exit 1; }
        log INFO "Waiting for health check ($CHECK/$MAX_CHECKS)..."
        sleep 5
    done

    log INFO "✅ Deployment of $APP_NAME:$VERSION to $ENV SUCCESSFUL"
}

deploy "$1" "$2" "${3:-staging}"

Bash Scripting Cheatsheet

$? — Last exit code
$! — PID of last background job
$(cmd) — Command substitution
2>&1 — Redirect stderr to stdout
&>/dev/null — Silence all output
cmd1 || cmd2 — Run cmd2 if cmd1 fails

Keep Reading

D
DevOps

Docker Networking Demystified: Bridge, Host & Overlay

8 min read Read More
C
Cloud

AWS IAM Roles vs Users vs Policies

10 min read Read More
P
Programming

Understanding Python's GIL & Multiprocessing

14 min read Read More