Better Mastodon Sidekiq Scaling With Systemd EnvironmentFile

Part of a series on configuring and running the eigenmagic.net Mastodon instance.

After sharing the systemd setup we were using for eigenmagic.net in a previous post, I received some useful feedback that helped us refine our approach.

We’ve changed our approach to run multiple Sidekiq queues per process, with weights defining a priority order. We now have a primary unit that is always defined for each queue, and we can add extra processes when demand increases, usually only temporarily.

Background

An important function of our template approach, which some people missed, is to help us temporarily increase the number of Sidekiq processes when things get busy. Sidekiq doesn’t really do well above about 50 threads, it seems, so if we need to get through more items in a queue, we need more processes. Unit templates help us do this with systemd in a relatively straightforward way that is persistent between reboots and visible to anyone already looking at the systemd units.

Using the %i parameter (what comes after the @ symbol in an instantiated unit name) meant we could spin up new processes, processing the same Sidekiq queue, with an on-the-fly tunable number of threads for each new process. This was handy, but it doesn’t really align with what systemd considers an ‘instance’ of a unit.

After a bit of experimentation, we came up with a new approach: a combination of a single templated unit file definition for Sidekiq processes and use of the EnvironmentFile directive.

Our New systemd Unit File

Here’s the new Sidekiq unit file. I won’t explain all of it here, so see the previous post for more about how we’ve set up the target and dependencies and so on.

[Unit]
# Set up a mastodon-sidekiq-<queuename>@.service unit for each queue
# Configure at least one instance of each queue type to run
# If we need more processing for a given queue type, enable
# a second, third, etc. service and start it. Then stop it/disable it
# if it's no longer needed once the load spike has passed.
Description=Mastodon Sidekiq %j processor %i
After=network.target

[Service]
Type=simple
User=mastodon
WorkingDirectory=/home/mastodon/live
EnvironmentFile=/etc/default/mastodon-sidekiq
# Queue specific configuration overrides defaults
EnvironmentFile=-/etc/default/mastodon-sidekiq-%j
Environment="RAILS_ENV=production"
# Start sidekiq with concurrency matching DB_POOL
ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c $DB_POOL $QUEUESET
TimeoutSec=15
Restart=always

[Install]
WantedBy=mastodon.target

This one unit template is copied into /lib/systemd/system/ six times, once for each Mastodon Sidekiq queue:

/lib/systemd/system/[email protected]
/lib/systemd/system/[email protected]
/lib/systemd/system/[email protected]
/lib/systemd/system/[email protected]
/lib/systemd/system/[email protected]
/lib/systemd/system/[email protected]

We’ve switched to using %j as the important unit variable, rather than %i. %j is replaced by whatever sits between the last ‘-’ character and the ‘@’ in the unit filename. Our naming structure means %j will be the name of the Sidekiq queue the instantiated unit controls.

And we then enable/instantiate each unit like so:

sudo systemctl enable [email protected]

There’s one final ingredient: the EnvironmentFiles.

We create a config file called /etc/default/mastodon-sidekiq that looks like this:

# Mastodon sidekiq environment variables

# Reduce memory pressure on Linux.
# This setting halves the amount of memory needed compared to default
MALLOC_ARENA_MAX=2

# Use the libjemalloc library on Linux
LD_PRELOAD=libjemalloc.so

# Sidekiq service thread counts
# The number of threads needs to match the DB_POOL/MAX_THREADS environment
# setting or we'll run out of database connections.
# If DB_POOL is set, it takes precedence over MAX_THREADS. If neither are
# set, mastodon defaults to 5 threads.
DB_POOL=40

# Default queueset config
# 
#QUEUESET="-q default,5 -q scheduler,10 -q mailers,2 -q ingress,5 -q push,5 -q pull,5"

This EnvironmentFile is read in first by all of our units as default settings. We also define another per-queue EnvironmentFile to override the defaults, tuning our setup. You will see in the default file that the QUEUESET setting is commented out. It’s there as an example, but what we actually use looks more like this:

/etc/default/mastodon-sidekiq-default:

DB_POOL=40
QUEUESET="-q default,10 -q ingress,1 -q push,1 -q pull,1"

/etc/default/mastodon-sidekiq-scheduler:

# scheduler only needs a couple of threads
DB_POOL=2
QUEUESET="-q scheduler,4 -q mailers,1"

How This Works

Each unit file reads in both the default EnvironmentFile and, if it exists, a per-queue EnvironmentFile that overrides default settings, because of this line in the unit definition:

EnvironmentFile=-/etc/default/mastodon-sidekiq-%j

The %j gets replaced by the queue name we embed in the filename when we install the template into /lib/systemd/system/.
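
One quick way to sanity-check that a given instance resolved %j the way you expect is to ask systemd what it loaded; something like this should show the expanded per-queue path (a sketch, assuming an instance named 1 exists):

systemctl show [email protected] -p EnvironmentFiles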

The ExecStart line uses variables defined in the EnvironmentFiles to set up a Sidekiq process with a given thread count (controlled by $DB_POOL) and a set of queues ($QUEUESET).

ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c $DB_POOL $QUEUESET
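
Putting that together with the /etc/default/mastodon-sidekiq-default file above, the default queue’s primary process ends up being started with the equivalent of:

/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 40 -q default,10 -q ingress,1 -q push,1 -q pull,1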

The $QUEUESET definition uses Sidekiq queue weights to tell each process how likely it should be to take its next job from a given queue. Each Sidekiq process has primary responsibility for a specific queue (the one in the unit name) and can help out with other queues, at lower priority, if there is work to be done. We’ve set up our queues so that the scheduler and mailers processes help each other, while the other four queues (default, ingress, push, pull) are fairly heavily weighted to mind their own business before helping others.
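
For example, with the default process’s weights above (-q default,10 -q ingress,1 -q push,1 -q pull,1), roughly 10 out of every 13 fetches (about 77%) should go to the default queue first, with the other three queues sharing the remainder when there’s work waiting.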

The overall effect is to provide a pool of Sidekiq processes and threads that do their own work first, but if, say, pull is idle while ingress has a backlog, the pull process will help out.

We can also add more processes if and when we need them, like so:

sudo systemctl enable [email protected]
sudo systemctl start [email protected]

This can be temporary, for a load spike that only needs some extra processing for a short while (such as catching up after an outage), but it could also be permanent. The naming also makes it clear which queue each unit is for, and it uses more usual systemd semantics for what a unit instance is.
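
When the spike has passed, the extra instance goes away just as easily:

sudo systemctl stop [email protected]
sudo systemctl disable [email protected]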

We lose a little of the more granular thread tuning of our previous approach, but we gain a more robust default configuration that handles most situations, and we can still quickly and easily handle load spikes in a similar way to before.

We like this approach more than our previous one, and we hope you find this useful.
