Dynamic reverse proxy Pacman cache using NGINX

Arch Linux is a rolling release distribution. Almost every day there are updates available, including kernel updates, the proprietary NVIDIA "blob", and so on. These packages are large, and even on a very fast internet connection, downloading them again on every machine takes a lot of bandwidth and resources from the Arch Linux Tier 2 mirrors.

While doing some system maintenance and reading the Arch Wiki page pacman/Tips and tricks, I found a section called "2.3.5 Dynamic reverse proxy cache using NGINX", but it gave only a vague description of how to set NGINX up correctly, especially when things go wrong.

So today we are going to set up an NGINX server that acts as a reverse proxy and stores every package downloaded by any PC, so the rest of the network doesn't have to fetch the same packages again from a Tier 2 mirror.

NGINX Configuration

If you don't already have an HTTP(S) server running, you need to add an http block like the following one:

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 20; # Or any value that works for you.
    types_hash_max_size 4096;
    server_names_hash_bucket_size 128; 

#(Server Block)
}

Now we have to define the server on which NGINX will listen for pacman's GET requests. Paste the following inside the previous http block:

server {
    listen 8080; # Or your preferred port, by default pacman will use port 80
    server_name yourdomain.example;
    root /srv/http/pacman-cache; # Root directory for the package cache.
    autoindex on; # So we can browse the cached packages from a web browser

    # Pass database update requests directly to the mirror without caching
    location ~ \.(db|sig|files)$ {
        proxy_pass http://mirrors$request_uri;
    }

    # Check package cache before downloading from a mirror
    location ~ \.tar\.(xz|zst)$ {
        try_files $uri @pkg_mirror;
    }

    # Retrieve package from a mirror and cache for future requests
    location @pkg_mirror {
        proxy_store    on;
        proxy_redirect off;
        proxy_store_access  user:rw group:rw all:r;
        proxy_next_upstream error timeout http_404;
        proxy_pass          http://mirrors$request_uri;
    }
}

# Upstream Arch Linux mirrors, configure as many as you want
upstream mirrors {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002 backup; # Use the second one as a backup
# (Default pacman behaviour)
}

# Mirror 1: https://mirror.cloroformo.org/archlinux/$repo/os/$arch
server {
    listen 127.0.0.1:8001;
    location / {
        resolver DNS_Srv_IP ipv6=off; # Replace with your preferred DNS Server
        proxy_pass https://mirror.cloroformo.org/archlinux$request_uri;
    }
}

# Mirror 2: https://mirror.librelabucm.org/archlinux/$repo/os/$arch
server {
    listen 127.0.0.1:8002;
    location / {
        resolver DNS_Srv_IP ipv6=off; # Replace with your preferred DNS Server
        proxy_pass https://mirror.librelabucm.org/archlinux$request_uri;
    }
}

This configuration is very similar to the one available in the Arch Wiki, but it solves some "resolver not found" errors. Instead of placing the resolver directive inside the http block, place it inside the location block of each mirror server defined below the upstream block.

Server Configuration

Before restarting NGINX we must create the root folder that was previously specified inside the server block and give it the correct permissions:

sudo mkdir -p /srv/http/pacman-cache 		# Create folder
sudo chown http:http /srv/http/pacman-cache	# Set ownership
sudo chmod 775 /srv/http/pacman-cache		# Set privileges

sudo systemctl restart nginx			# Restart NGINX

Once the server has restarted, we can check that the service is up and running by opening http://yourdomain.example:8080 in a browser (replace yourdomain.example with the server's IP address or domain name).
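
The same check can also be done from a terminal. The following is a minimal sketch, assuming curl is installed; CACHE_URL and the function names are placeholders chosen for illustration, not part of NGINX or pacman.

```shell
#!/bin/sh
# Sketch: command-line check that the cache server answers.
# CACHE_URL is a placeholder; adjust it to your own server and port.
CACHE_URL=${CACHE_URL:-http://yourdomain.example:8080}

# http_status URL: print the HTTP status code of a HEAD request
# (curl prints 000 when the request itself fails).
http_status() {
    curl -s -o /dev/null -I -w '%{http_code}' --max-time 5 "$1"
}

# cache_is_up: succeed only if the index page answers with 200.
cache_is_up() {
    [ "$(http_status "$CACHE_URL/")" = "200" ]
}

# Usage:
#   cache_is_up && echo "cache reachable" || echo "cache not reachable"
```

With autoindex enabled, the root of the cache should answer with a directory listing, so a 200 status is a reasonable sign that the server block is active.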

Client Configuration

This part is simple. We don't want to break pacman when we are not on the same LAN as the server (e.g. a laptop away from home), so the only change we make is adding the new mirror at the top of /etc/pacman.d/mirrorlist and keeping the regular mirrors below it. That way we take advantage of the package cache whenever our server is reachable.
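
As an illustration, the mirror entry could be added with a small helper like the one below. This is only a sketch: the URL assumes the server name and port 8080 from the NGINX configuration above, and prepend_mirror is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: put the cache server first in a pacman mirrorlist.
# CACHE_URL assumes the example host and port used in this article.
# $repo and $arch stay literal; pacman expands them itself.
CACHE_URL='http://yourdomain.example:8080/$repo/os/$arch'

# prepend_mirror FILE: insert the cache "Server" line as the first line,
# unless the exact same line is already present.
prepend_mirror() {
    file=$1
    line="Server = $CACHE_URL"
    grep -qxF "$line" "$file" && return 0
    tmp=$(mktemp) || return 1
    { printf '%s\n' "$line"; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (as root):
#   prepend_mirror /etc/pacman.d/mirrorlist
```

The existing mirrors are left in place below the new line, so pacman can still fall back to them.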

When we are connected to another network from which our server is not reachable, every package in an update will trigger a request to our server that will (obviously) fail. This spams the log with errors and can be rather annoying.

An easy solution to this problem is to use a script hooked to a network event such as interface up or interface down. NetworkManager calls this a dispatcher script, but that is a topic for another post.
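
Without going into detail, a rough sketch of such a script might look like the following. Everything here is an assumption made for illustration: the file location (for example /etc/NetworkManager/dispatcher.d/50-pacman-cache), CACHE_URL, MIRRORLIST and the helper names. It relies on NetworkManager passing the interface name as the first argument and the action as the second, and it needs curl.

```shell
#!/bin/sh
# Hypothetical dispatcher script sketch; names and paths are assumptions.
# NetworkManager passes the interface as $1 and the action as $2.
CACHE_URL=${CACHE_URL:-http://yourdomain.example:8080}
MIRRORLIST=${MIRRORLIST:-/etc/pacman.d/mirrorlist}

# set_cache_mirror on|off FILE
# Uncomment ("on") or comment out ("off") Server lines pointing at CACHE_URL.
set_cache_mirror() {
    tmp=$(mktemp) || return 1
    awk -v url="$CACHE_URL" -v mode="$1" '
        {
            line = $0
            sub(/^#+[ \t]*/, "", line)   # the line without leading comment marks
            if (index(line, "Server = " url "/") == 1)
                print (mode == "on" ? line : "#" line)
            else
                print $0
        }' "$2" > "$tmp" && cat "$tmp" > "$2"
    rm -f "$tmp"
}

# cache_reachable: succeed if the cache server answers within 3 seconds.
cache_reachable() {
    curl -s -o /dev/null --max-time 3 "$CACHE_URL/"
}

case ${2:-} in
    up|down)
        if cache_reachable; then
            set_cache_mirror on "$MIRRORLIST"
        else
            set_cache_mirror off "$MIRRORLIST"
        fi
        ;;
esac
```

The idea is simply to comment the cache mirror out when the server does not answer and to restore it when it does, so pacman never tries an unreachable mirror.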

Clearing old packages

Each newly requested package increases the size of the cache. If we don't remove old packages, the cache will eventually fill the disk.

Ideally, we keep at most the last 2 versions of each package, so that we can roll back if the most recent one is broken.
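
To see what such a policy would remove before deleting anything, a dry run can help. The sketch below assumes paccache (from the pacman-contrib package) is installed and that the cache lives under the root directory used in this article; preview_clean is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: dry-run preview of the cleanup, nothing is deleted.
# CACHE_ROOT assumes the cache root used in this article.
CACHE_ROOT=${CACHE_ROOT:-/srv/http/pacman-cache}

# preview_clean DIR: for every directory under DIR, ask paccache what it
# would remove while keeping the 2 most recent versions of each package.
#   -d  dry run    -v  verbose    -k 2  keep two versions    -c  cache dir
preview_clean() {
    find "$1" -type d -exec paccache -d -v -k 2 -c {} \;
}

# Usage:
#   preview_clean "$CACHE_ROOT"
```

The directories are visited one by one because the cache mirrors the repository layout, with packages stored in per-repository subdirectories rather than in a single folder.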

We can create a simple systemd service and a timer that runs the service once a week.

To create the timer: sudo vim /etc/systemd/system/mirror-cache-clean.timer

[Unit]
Description=Trigger each week the package cache cleaning service

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target

To create the service: sudo vim /etc/systemd/system/mirror-cache-clean.service

[Unit]
Description=Pacman Mirror Cache cleaning service

[Service]
Type=simple
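# paccache comes from the pacman-contrib package: -r removes old packages, -k 2 keeps the last two versions
ExecStart=/usr/bin/find /srv/http/pacman-cache -type d -exec paccache -v -r -k 2 -c {} \;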

Finally, we can enable and start the timer with: sudo systemctl enable --now mirror-cache-clean.timer and enjoy the speed that the cache server brings to our LAN!