# What is connection pooling?
At the moment, every time we want to use a database connection in our application, we've created one, used it, and then closed it.
Creating a new connection is slow. We have to communicate with the database server, and it involves some computer processing to create a connection.
Connection pooling involves creating a bunch of connections at once, and then using them one at a time.
The key thing is though, when you're done with a connection you don't close it. Instead, you commit and you put it back into the pool.
That way you only need to create the connections once, at the start of your application, and they get re-used over and over.
The general flow goes like this:
- The client (Python) creates the initial connections. This is slow if you are creating a lot of connections.
- The connections are stored in memory (e.g. a list).
- When the client wants to use a connection:
- Take a connection out of the pool.
- Use it as normal.
- Commit or roll back, making sure the connection has no pending transactions.
- Put it back into the pool.
# How many connections should be in a pool?
There's no hard-and-fast answer to this, but there are a couple things to think about:
- Every connection you open will consume some RAM in your database server.
- If you run out of connections in the client pool, you'll have to create new ones, which is slow.
- Some database systems (and operating systems) will limit the number of connections you can have open simultaneously.
You should generally start with as few connections as possible in your pool, by thinking about how many users will access your application simultaneously.
For our command-line applications, this should be 1 because the application can only be accessed by one user at a time.
For something like web application servers though, you can start doing some maths on:
- How many users are accessing your application simultaneously at a time.
- How many computers and processes your application is running on.
Then, generally aim for the low end of the range that you'll come up with!
# What are the limitations or drawbacks of connection pooling?
# Running out of server RAM
The top limitation to the number of connections in the pool is when your database server doesn't allow you to open any more. This can cause a couple problems:
- You may not have enough connections for everything you need in your applications.
- You can't actually open new ones if you run out of connections in the pool.
If you reach this limit, it's likely your database server would complain or crash. At that point, you may consider splitting your database into multiple servers (via sharding), or limiting the number of connections.
# Running out of connections in the client pool
If you run out of connections in the pool, the usual process is to create a new connection and add it to the pool before using it.
That can end up resulting in many new connections added over time if you keep running out of connections.
But at the same time, it means that running out of connections isn't a terrible problem.
If you run out of connections and you can't open new ones, usually the client should queue for a connection, suspending the task requesting a connection until one becomes available[1]. By default, psycopg2
connection pool classes (which we'll use in the next chapter) will raise an error if you run out connections.
TIP
Remember that the connection pool is in the client, and not the server. For our command-line applications, running out of connections in the pool is highly unlikely (impossible if using a single-threaded, single-process app).
It's more likely if you have a single-process Python app as a server where lots of users are connecting to it. In that case, before queuing connections you should consider turning the Python app into a multi-process application where each process has its own connection pool.
Basically: queuing connections means the user is waiting, and that is bad!