DBConnection.ConnectionError: connection not available and request was dropped from queue after 2290ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
1. By tracking down slow queries and making sure they are running fast enough
2. Increasing the pool_size (albeit it increases resource consumption)
3. Allow requests to wait longer by increasing :queue_target and :queue_interval
See DBConnection.start_link/2 for more information
It is pretty common to run unto this error as an Elixir developer working with relational databases. The suggested fixes definitely help, but personally I feel bad tweaking something I don’t understand. How does changing pool_size actually affect the rest of the application? I want to keep my mysteries to Saturday morning cartoons, not production. So I decided to dig in and see how pooling works.
DBConnection is a foundation Elixir library powering many database specific adapters. Most of the time you don’t need to know much about it, since the adapter libraries abstract away most of its inner workings.
While looking into some performance issues, my team realized we didn’t have a shared mental modal of how pooling works with DBConnection. It is largely undocumented, and a quick glance at the code wasn’t enough to let us know how it worked. So, like any rational engineer, I went way too deep in the code and traced through everything that was happening for pooled database connections.
This writeup uses Postgrex as the example adapter, but nothing about the explanation is specific to PostgreSQL. The pooling works exactly the same for any other library using DBConnection.
If you want to try the code snippets yourself, check out the Livebook version of this post.
Database setup
First comes the easy part! We’ll start a database connection using a minimal set of options.
(To run the code below, you’ll need postgres running locally.)
# Setup a database connection
db_opts = [
database: "postgres",
hostname: "localhost",
password: "postgres",
pool_size: 1,
port: 5432,
username: "postgres"
]
{:ok, conn} = Postgrex.start_link(db_opts)
What just happened?
In the configuration code above, a process is created and returned to the caller. A ton of work happened in the DBConnection library as a result - let’s dive into it!
First, Postgrex.start_link/1
calls DBConnection.start_link/2
[^1]
def start_link(opts) do
ensure_deps_started!(opts)
opts = Postgrex.Utils.default_opts(opts)
DBConnection.start_link(Postgrex.Protocol, opts)
end
The child spec returned by DBConnection is polymorphic [^2]. It looks for a :pool
module specified in the opts and returns the result of that module’s child_spec/1
function. For the default DBConnection.ConnectionPool
module, this is just a wrapper around GenServer.child_spec/1
which specifies to call start_link/1
with the parameter {Postgrex.Protocol, opts}
.
So the process created and returned by Postgrex.start_link/1
will be a DBConnection.ConnectionPool
GenServer. Done!
No, not done.
DBConnection.ConnectionPool.init/1
creates a protected ETS table to handle queueing of queries. The table identifier is kept in the process’s state, so we can look it up and read from it. The process also keeps track of whether it is :busy
or :ready
, initially starting as :busy
. More on that later.
[^1]: postgrex start_link [^2]: db_connection :pool module
ConnectionPool state
# Get the ETS identifier from the process state
{_tag, queue, _codel, _timestamp} = :sys.get_state(conn)
{:busy, #Reference<0.803399952.2529296391.184107>,
%{
delay: 0,
idle: #Reference<0.803399952.2529165319.184122>,
idle_interval: 1000,
interval: 1000,
next: -576460751694,
poll: #Reference<0.803399952.2529165319.184121>,
slow: false,
target: 50
}, {-576460751695135000, 0}
Process tree
When the :db_connection
application is first started, DBConnection.ConnectionPool.Supervisor
is run as a dynamic supervisor. [^3] This happens automatically when db_connection
is included as a mix dependency.
[^3]: supervision tree
ConnectionPool setup
The DBConnection.ConnectionPool
process does a few things when it is initializing. Mostly it sets up timers for things like idle timeouts. It also calls DBConnection.ConnectionPool.Pool.start_supervised/3
with the ETS queue it created. [^4]
def init({mod, opts}) do
DBConnection.register_as_pool(mod)
queue = :ets.new(__MODULE__.Queue, [:protected, :ordered_set])
ts = {System.monotonic_time(), 0}
{:ok, _} = DBConnection.ConnectionPool.Pool.start_supervised(queue, mod, opts)
# ...snip...
end
This causes the DBConnection.Watcher
GenServer process (running under the application supervision tree) to create a DBConnection.ConnectionPool.Pool
process as a dynamically supervised child of DBConnection.ConnectionPool.Supervisor
.
Side note: normally the new process would be only be linked to its parent supervisor, but DBConnection.Watcher
will monitor the calling DBConnection.ConnectionPool
process and terminate the new process when the calling process stops. This effectively links the two processes together.
DBConnection.ConnectionPool.Pool
is itself another supervisor. When its init/1
function is called, the :pool_size
option is checked with a default of 1. This parameter is passed all the way down from the original Postgrex.start_link/1
call and determines how many children the supervisor will have. [^5]
def init({owner, tag, mod, opts}) do
size = Keyword.get(opts, :pool_size, 1)
children = for id <- 1..size, do: conn(owner, tag, id, mod, opts)
sup_opts = [strategy: :one_for_one] ++ Keyword.take(opts, [:max_restarts, :max_seconds])
Supervisor.init(children, sup_opts)
end
Each child in the pool is a DBConnection.Connection
, which is an implementation of the Connection
behaviour. [^6] The adapter can specify callbacks to run after the connection is established if it chooses to.
[^4]: start pool process [^5]: pool process definition [^6]: connection start_link
Connection state
[{_, pool_sup, :supervisor, _}] =
DynamicSupervisor.which_children(DBConnection.ConnectionPool.Supervisor)
[{_, pool_conn, :worker, _}] = Supervisor.which_children(pool_sup)
%Connection{mod_state: state} = :sys.get_state(pool_conn)
state
%{
after_connect: nil,
after_connect_timeout: 15000,
backoff: %DBConnection.Backoff{max: 30000, min: 1000, state: {1000, 10000}, type: :rand_exp},
client: {#Reference<0.803399952.2529296391.185395>, :pool},
connection_listeners: [],
mod: Postgrex.Protocol,
opts: [
pool_index: 1,
types: Postgrex.DefaultTypes,
database: "postgres",
hostname: "localhost",
password: "postgres",
pool_size: 1,
port: 5432,
username: "postgres"
],
pool: #PID<0.255.0>,
state: %Postgrex.Protocol{
buffer: "",
connection_id: 88094,
connection_key: 401165952,
disconnect_on_error_codes: [],
null: nil,
parameters: #Reference<0.803399952.2529165319.184202>,
peer: {{127, 0, 0, 1}, 5432},
ping_timeout: 15000,
postgres: :idle,
queries: #Reference<0.803399952.2529296391.184199>,
sock: {:gen_tcp, #Port<0.10>},
timeout: 15000,
transactions: :naive,
types: {Postgrex.DefaultTypes, #Reference<0.803399952.2529296391.184175>}
},
tag: #Reference<0.803399952.2529296391.184107>,
timer: nil
}
Each Connection
process keeps track of several things in its state. The pool
key contains the PID of the DBConnection.ConnectionPool
process - the one that was returned from Postgrex.start_link/1
. The tag
key contains the table identifier for the ETS table used as a queue.
Most of the state is generic, but the nested state
key is specific to the calling library - Postgrex in this case.
Adding the connection to the pool
After the connection is established, DBConnection.Holder.update/4
is called. [^7] This creates a new public ETS table and gives ownership of it to the DBConnection.ConnectionPool
process.
def update(pool, ref, mod, state) do
holder = new(pool, ref, mod, state)
try do
:ets.give_away(holder, pool, {:checkin, ref, System.monotonic_time()})
{:ok, holder}
rescue
ArgumentError -> :error
end
end
When the ownership message is received, the ConnectionPool process will check its queue and try to pull off the first message. [^8] Since the queue starts empty, this is a noop and the pool will just transition from :busy
to :ready
. An entry of {timestamp, holder}
is inserted into the queue where holder is a reference to the ETS table created by DBConnection.Holder
.
When the process is in its :ready
state, the ETS queue will always contain either this holder tuple or nothing. When the process is :busy
, the queue will instead have queries waiting for an available connection.
[^7]: pool_update/2 [^8]: ConnectionPool checkin
# A single entry exists in the queue that contains a reference to the holder ETS
:ets.match(queue, :'$1')
[
[{{-576460431686553007, #Reference<0.803399952.2529296391.185475>}}]
]
Pool size
So, what happens if we change the pool size? Extra DBConnection.Connection
processes are spawned under theDBConnection.ConnectionPool.Pool
supervisor. They all share the same ETS queue and reference the same DBConnection.ConnectionPool
process.
{:ok, conn2} = db_opts |> Keyword.put(:pool_size, 2) |> Postgrex.start_link()
# ConnectionPool.Supervisor children - one for each start_link call
Supervisor.count_children(DBConnection.ConnectionPool.Supervisor)
%{active: 2, specs: 2, supervisors: 2, workers: 0}
[_, {_, pool_sup, :supervisor, _} | _] =
DynamicSupervisor.which_children(DBConnection.ConnectionPool.Supervisor)
# ConnectionPool.Pool children - one for each pooled connection
Supervisor.count_children(pool_sup)
%{active: 2, specs: 2, supervisors: 0, workers: 2}
The holder ETS pattern is repeated for each underlying connection in the pool. The single DBConnection.ConnectionPool
process that is returned by start_link
contains a single queue, and for each pooled connection an entry in the queue is inserted with holder ETS table that references a single underlying connection.
# View the queue with two pooled connections
{_tag, queue, _codel, _timestamp} = :sys.get_state(conn2)
:ets.match(queue, :'$1')
[
[{{-576453306038052131, #Reference<0.1613371477.1188167681.181386>}}],
[{{-576453306037963131, #Reference<0.1613371477.1188167681.181387>}}]
]
Queueing and querying
Now that we understand the processes, the remaining question is how queries get queued and sent to the underlying connections. Postgrex.query/4
takes our PID as its first argument. That gets passed to DBConnection.prepare_execute/4
[^9] - I’ll spare you the full stacktrace but eventually DBConnection.Holder.checkout_call/5
is called.
This function sends a message to the DBConnection.ConnectionPool
process asking to checkout a connection. [^10] One of the parameters sent is whether queueing is allowed, which is controlled by the :queue
parameter and defaults to true
. If the DBConnection.ConnectionPool
process is in a :busy
state then the request is either inserted into the ETS queue or rejected, depending on the :queue
parameter. Assuming the process is :ready
, the first holder tuple is deleted from the queue and ownership of that holder ETS reference is given away to the calling process. If no more holder entries are available, the process is marked as :busy
.
DBConnection.Holder
waits for the message from the ETS ownership transfer (within the calling process). It marks the connection contained within the holder ETS entry as locked and returns the connection.
receive do
{:"ETS-TRANSFER", holder, pool, {^lock, ref, checkin_time}} ->
Process.demonitor(lock, [:flush])
{deadline, ops} = start_deadline(timeout, pool, ref, holder, start)
:ets.update_element(holder, :conn, [{conn(:lock) + 1, lock} | ops])
pool_ref =
pool_ref(pool: pool, reference: ref, deadline: deadline, holder: holder, lock: lock)
checkout_result(holder, pool_ref, checkin_time)
# ...snip...
end
The connection is now checked out and used by DBConnection to make the query. Once the query is complete, Holder.checkin/1
is called with the connection. The holder ETS table is updated and ownership is transferred back to the DBConnection.ConnectionPool
process. If the process was busy (which indicates that all connections were checkout out) then the queue is checked for any waiting queries and the steps repeat.
[^9]: Postgrex query [^10]: Holder checkout_call
Summary
Let’s summarize what we learned!
When start_link
is called:
- A
DBConnection.ConnectionPool
process is started. This is pretty much the only thing the user interacts with directly. - An ETS table is created by the ConnectionPool and used as queue for incoming requests, as well as tracking connections in the pool.
- A
DBConnection.ConnectionPool.Pool
supervisor is started. This is dynamically added to the DBConnection’s supervision tree andDBConnection.Watcher
links it to the ConnectionPool process. - One or more
DBConnection.Connection
processes are started as children of the Pool supervisor. Each one represents a separate network connection to the database. - Each Connection is referenced by an ETS table created by
DBConnection.Holder
. Ownership of these holder tables is passed to the ConnectionPool process.
And finally when a query is sent:
- The first available connection is found by looking at the ETS queue for a holder reference. The calling process gets ownership of the holder ETS table and the reference to it is removed from the queue.
- If no holders are found, the query is added to the ETS queue.
- When a query finishes the holder reference is passed back to the ConnectionPool process. The next queued query is pulled and run, if there are any.