A partial archive of meta.discourse.org as of Tuesday July 18, 2017.

Too many connections to DB, how to optimize

TomekB

I have frequent 500 errors due to exceeding number of connections to my DB.

This results in errors like this in /logs:

Job exception: FATAL: remaining connection slots are reserved for non-replication superuser connections

My config:

UNICORN_WORKERS: 12
UNICORN_SIDEKIQS: 2
DISCOURSE_DB_POOL: 90

My DB under load performs like this:

How can I optimize this? Please keep in mind the default config of PSQL has max connections limit set to 100.

I have plenty of RAM and CPUs for both web_data container and data container (which is hosted on a second server).

Update: I have a lot of open connections to DB from sidekiq, a lot more than it should (I have just 2 sidekiqs with 5 threads each).

Falco

Did you change sidekiq threads manually? I believe the default is 25, and you have 2 process, so 50 connections, only with sidekiq.

sam

At high loads I strongly recommend running pgbouncer, how is your db configured?

TomekB

I would have to check, but /sidekiq reports only 5 threads per sidekiq:

sam

Each web runs multiple threads as well, the numbers add up real quick

TomekB

I have a data container running on a second server.

I have changed the data.yml to expose db and redis ports and to give some extra memory:

db_shared_buffers: "8GB"
db_work_mem: "100MB"
TomekB

I have 25 connections from unicorn workers and over 70 from sidekiqs.

TomekB

Restarting the web_only container doesn’t solve the problem, it seems like sidekiq connection persist.
Restarting the data container solvers the problem for another couple of days.
It seems like there are some issues with sidekiq connections.

sam

Interesting. We are usually internally totally shielded from this bug cause we use pg bouncer, but we will have a look internally to see if we see similar leaks. :man_farmer:t4:

Can you debug into the “connections” hanging there, do you have any info on what the last query is these stalled connections ran? how long are they stuck without running anything?

@tgxworld where is our official pg image with bouncer? we need a howto explaining how to use it.

TomekB

How do you rotate connections with pgbouncer (session pooling, transaction pooling)?
I will try to debug these stalled connections but I think most of them were idle.

sam

We use transaction pooling, session pooling is a no-go with Rails.

TomekB

Thx I will set this up but this will make it much harder to diagnose these problems.