UPDATE Jan 6, 2020: I've updated this guide's configuration in another post to also include RDS databases. This configuration still works, but only for self-managed postgres installations that have a real
I've painfully learned how to configure postgres one incident at a time. And now this guide represents my learnings for web services. Also, I'm tired of thinking and discussing this with myself, and I want share this guide with others. This guide presents my default postgres database configuration for web services.
Table of Contents
- The End Result
- The Upsides
- The Downsides
- Walkthrough: Configuring Postgres
- The Gist of It
- Developer Basics
This guide is a starter configuration for postgres. It makes several assumptions about a web service and development team. This may work for you. This guide assumes:
- A running postgres database and you have
or(see update for RDS) access
- You're developing a web service with a team and both will (hopefully) grow rapidly (think: startups)
- Schema changes are executed as an admin process -- you use
sequelize db:migrate, or similar to create tables prior to app deployment
- One application postgres database (i.e., when you run
\lto show all databases on the server)
- One schema named after the database
Last but not least, this configuration is designed to evolve as your web service and team grow. If the configuration doesn't work, change it!
The End Result
- One user (i.e., a postgres role that can log in) owns all objects (database, schema, tables, indexes, etc.). This user is called the
- A role
readwrite_usersthat enables read and write access to all objects created by the
- A role
readonly_usersthat enables read-only access to all objects created by the
appuser that belongs to
readwrite_rolethat has connection settings to guard against scaling problems (
- Enables good separation for maintenance operations (schema changes, data backfills, monitoring) from application operations (CRUD, data management, etc.)
- Enables server-side defaults for user settings like
lock_timeout-- the application can still override these if needed
- Sets up roles and allows you to never deal with postgres GRANTS, REVOKES, and the unfortunately confusing ALTER DEFAULT PRIVILEGES ever again (as long as you stay within the use-cases of this configuration of course)
- Groups users by their access privileges of read-and-write or read-only
TRUNCATEof an entire table from the
This configuration doesn't come without its downsides. The biggest downside I experience is related to local development. Local development requires knowing two sets of credentials -- one credential is used for schema changes (migrations), and one is used by the app to read and write data. For the developer, this knowledge can be codified to reduce confusion, but it's still knowledge a developer should know with the configuration presented in this guide.
Another downside with this configuration is the inability to prevent
DELETE * FROM table. The configuration does prevent
TRUNCATE statements for application developers, but preventing
DELETE * FROM table is not easily accomplished without bigger trade-offs.
Finally, setting guard rails like
statement_timeout at the database layer does not prevent abuse from the application code. Postgres does not prevent SET statements. Also, it's important that the application has the ability to override settings when needed. As mentioned, the default settings are guard rails and not jail bars.
Walkthrough: Configuring Postgres
Let's walk through the steps required to create the configuration described. Connect to your database with a superuser. All statements executed within this guide require the
superuser. That is all you will need.
owner user owns all objects for your application's database. Every schema, table, index, and <insert favorite postgres object> is owned by this user.
owner is the only user used for schema changes.
CREATE ROLE owner LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 3; ALTER ROLE owner SET statement_timeout = 20000; ALTER ROLE owner SET lock_timeout = 3000; -- v9.6+ ALTER ROLE owner SET idle_in_transaction_session_timeout = 3000;
- Change the password to something secure. :)
CONNECTION LIMIT 3: This allows
ownerto connect if when we have accidental long-running queries. Which leads us to ...
statement_timeout: 20 seconds is long, but it provides an upper limit for long-running queries. You will likely set this value higher in migrations to accommodate for schema changes that rewrite tables and for a larger data set. But you can enjoy that error message when your data is large enough that postgres cancels your schema change.
lock_timeout: 3 seconds is what I've found to be long enough but not too long where an outage will occur due to connection exhaustion.
idle_in_transaction_session_timeout: 3 seconds, similar to
lock_timeoutis what I've found useful for my applications. Change the value to suit your needs. This setting is only available in postgres 9.6+.
readwrite_users role groups users that require read and write access (CRUD) to objects owned by
CREATE ROLE readwrite_users NOLOGIN;
We need a database to alter privileges, so this is as much as we can configure for this role at this point.
readonly_users role groups users that require read access (
SELECT) to objects owned by
CREATE ROLE readonly_users NOLOGIN;
We need a database to alter privileges, so this is as much as we can configure for this role at this point.
Create the Database
CREATE DATABASE exampledb WITH OWNER owner ENCODING UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
Postgres enables connection privileges to new databases for all users (
PUBLIC) by default. Postgres also creates the
public schema enabling usage and creation of tables within that schema for all users. Instead of relying on implicit privileges from the default installation, we want to declare privileges to roles explicitly. This puts us in a state of least privilege first, and then we add new privileges afterward. I generally find the least-privilege-first strategy much easier to follow mentally and in code. I'm a simpleton, so starting with nothing and adding is easier than starting when everything and then removing.
Therefore, with new databases, I revoke all privileges for the database and drop the
\c exampledb REVOKE ALL ON DATABASE exampledb FROM PUBLIC; DROP SCHEMA public;
Create the Schema and Configure
Now create the schema. The schema is named after the database to keep things simple.
SET ROLE owner; CREATE SCHEMA exampledb; RESET ROLE;
Now we default the
owner user to
exampledb schema on connection since we dropped the
public schema. Also, if this was not set, then
owner defaults to
"$user" -- i.e., the
owner schema that does not exist.
ALTER ROLE owner SET search_path TO exampledb;
We also need to set our session's
exampledb so that we, as a superuser, properly create objects in the
SET search_path TO exampledb;
GRANTs for Resources
With the basic resources provisioned, we grant privileges for the
readwrite_users role, and
readonly_users role. Starting with the
GRANT CONNECT ON DATABASE exampledb TO readwrite_users; GRANT TEMPORARY ON DATABASE exampledb TO readwrite_users; GRANT CONNECT ON DATABASE exampledb TO readonly_users; GRANT TEMPORARY ON DATABASE exampledb TO readonly_users;
Now grant privileges to the
GRANT USAGE ON SCHEMA exampledb TO readwrite_users; GRANT USAGE ON SCHEMA exampledb TO readonly_users; -- Not needed, but being explicit helps with /dn+ GRANT CREATE, USAGE ON SCHEMA exampledb TO owner;
GRANT (and REVOKE) enables privileges for existing objects but not for new objects. This is accomplished with ALTER DEFAULT PRIVILEGES.
Let's configure privileges for the
readwrite_users role. We want to ensure new objects like tables or indexes allow, well, read and write privileges for the users in the
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite_users; ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO readwrite_users; ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT EXECUTE ON FUNCTIONS TO readwrite_users; ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE ON TYPES TO readwrite_users;
As stated earlier, these statements rely on
owner to create all objects in the database. Therefore, these statements do not include privileges that would allow a user in
readwrite_users to create objects. Refer to the ALTER DEFAULT PRIVILEGES and Privileges documentation for details on the other privileges.
Additionally, I did not include a statement for future schemas because this guide doesn't support multiple schemas.
Now let's move onto the
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT SELECT ON TABLES TO readonly_users; ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE, SELECT ON SEQUENCES TO readonly_users; ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT EXECUTE ON FUNCTIONS TO readonly_users; ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE ON TYPES TO readonly_users;
Finally, we want to ensure PUBLIC (a special role that means all users) does not have access to objects in our database. This forces users to belong to either the
readwrite_users role or
readonly_users role to access the database.
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON TABLES FROM PUBLIC; ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON SEQUENCES FROM PUBLIC; ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON FUNCTIONS FROM PUBLIC; ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON TYPES FROM PUBLIC; ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON SCHEMAS FROM PUBLIC;
Once again, I favor explicitness for readability (and my sanity) because not all of the statements above are required. The default privileges for PUBLIC are documented, and a few of the statements above are redundant.
With these last statements, you will not need to work with GRANTS and ALTER DEFAULT PRIVILEGES again as long as you create users in the
readonly_users roles. WOOHOO!
app user is what your web service will use to connect to the database with.
app (usually) requires read and write permissions, so this user belongs to the
CREATE ROLE app WITH LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 90 IN ROLE readwrite_users; ALTER ROLE app SET statement_timeout = 1000; ALTER ROLE app SET lock_timeout = 750; -- v9.6+ ALTER ROLE app SET idle_in_transaction_session_timeout = 1000; ALTER ROLE app SET search_path = exampledb;
- Change the password. :)
CONNECTION LIMIT 90: if you're using the default installation of postgres, the installer usually defaults
max_connectionsto 100. Setting this to 90 ensures there are a few more connections available for administrators to hop in and wrangle queries in the event of an incident. Feel free to change this to match your database setup. If you need more connections, follow postgres' best practice of adding a connection pooler in front of the database.
statement_timeout: 1 second is a long time for a web request. 1 second for one query of many queries in a web request feels like an eternity to a user. Feel free to change this to suit your needs. I like pushing this as low as possible for my services. Keeping this low is important at scale: Twitch has a 1-second
lock_timeout: 750ms might not seem much, but if you have the 1-second
statement_timeout, this allows your query 249ms of execution time. Change as required.
idle_in_transaction_session_timeout: for postgres v9.6+, this ensures you do not have zombie transactions that maintain active connections. This guards the database against webservers that terminate requests and leave a connection open.
The GIST of It
Alright, alright! That was a lot of text to set up a database. Here's the gist. I use the same defaults and secrets above, so please change them!
This guide's postgres configuration provides better defaults for scaling web services with databases than the defaults provided by a base installation of postgres. That said, this configuration may seem harsh and punishing if you're unfamiliar with writing efficient SQL. But, this is okay. Learn to write SQL efficiently while your web service and team scale up. While you're doing this, the configuration will help protect you from painful incidents like a tipped database. Nevertheless, let's discuss a few points to keep in mind.
app user has SELECT, INSERT, UPDATE, and DELETE privileges for all tables created by the
app is the only user your web service connects with to handle web requests. SQL queries sent with the
app user should be fast, or the settings for the
app user will terminate the statement. If your application code attempts to create tables, indexes, or other postgres objects, you will be on the receiving end of a permission failure. To solve this, use the
owner user manages all tables and indexes for your application's database. The management of the database schema should happen outside of the webserver process -- in an admin process like
rake db:migrate or
alembic upgrade head. The
owner user is not designed for web service requests. You want to create the structure of the database with
owner, but manage the data in that structure with
Finally, you can override setting values if needed. The defaults are guard rails, not jail bars. You can jump over the rail if you know what you're doing.
Schema Changes as a
Hopefully, you don't need to do this as schema migrations with
owner should suffice. But, incidents happen and maybe the quickest way to stop the bleeding is to hop onto the database and wrangle some code.
SET ROLE owner; -- ALTER TABLE ... -- ... RESET ROLE;
Schema Changes (via Migrations) and Local Development
owner's credentials to run migrations. If your application follows the Twelve-Factor App's Config Factor, then the
owner credentials can be passed in via environment variables in the form of a postgres connection URI. Here's an example of what the URI would look like for a local database:
If you're using
rake db:migrate, set the
DATABASE_URL environment variable to the
If you're using
alembic upgrade, ensure
EnvironmentContext.configure passes in the database URI from the environment, and you can swap out database URIs as mentioned. e.g.,
# env.py import os from alembic import context context.configure(url=os.getenv("DATABASE_URL", "localhost"))
Similar strategies are available in your migration tool of choice.
List users and roles:
List users and their settings (only as a superuser):
SELECT usename, useconfig FROM pg_shadow;
We only add users to
CREATE ROLE newuser WITH LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 3 IN ROLE readwrite_users; ALTER ROLE newuser SET search_path = exampledb; -- Recommended connection settings ALTER ROLE newuser SET statement_timeout = 1000; ALTER ROLE newuser SET lock_timeout = 750; -- v9.6+ ALTER ROLE newuser SET idle_in_transaction_session_timeout = 1000;
CONNECTION LIMIT, and settings to fit your needs.
Connect as the superuser and execute the following:
-- Change password ALTER ROLE myuser ENCRYPTED PASSWORD 'somethingbogus'; -- Terminate existing queries/connections SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE usename = 'myuser'; -- Now remove the user DROP USER myuser;
Changing the user password stops new connections from clients. Terminating existing queries ensures that the targeted drop is no longer connected to the database before dropping the user.
Update User Settings
Connect as the superuser and execute the following:
ALTER ROLE myuser SET statement_timeout = 5000; -- Optional, terminate existing queries SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE usename = 'myuser';
Settings are applied at connection time. Therefore, you need to have your application or developer reconnect after the change.
app Password Rotation
Connect as the superuser and follow these steps:
- Change the
ALTER ROLE myuser ENCRYPTED PASSWORD 'newpassword'
- Update the credentials wherever you store them (e.g., heroku settings, chef data bags, etc)
- Restart your application to use new credentials.
This strategy does have a flaw. The time between Step 1 and Step 3 may raise exceptions from the application if you're using webservers that cancel requests (
Rack::Timeout, g/Unicorn, etc) and reconnect to the database. You'll want to execute the above steps rapidly.
Dynamically Set Settings From Your Application
The database is configured with default settings like
app, but these can be overridden from the application code. A quick first pass might look like:
-- WARNING! Leaky settings! WARNING! SET statement_timeout TO 5000; -- ... execute query ... RESET statement_timeout;
This is not a full-proof method because you may leak setting values. If you're using a connection pooler like pgbouncer, then executing this statement from the application will enable this setting for the life of the connection. The connection pooler shares this connection with other requests, so you must call
RESET statement_timeout after you've completed your query. Otherwise, the setting value lasts until the connection is closed, a new
SET statement_timeout query is executed, or if
RESET statement_timeout is executed. But even if you executed
RESET, leaking the value to other requests may still occur with long-request-terminating webservers.
A better method is to change the setting in a transaction using SET LOCAL:
BEGIN; SET LOCAL statement_timeout = 5000; -- execute queries COMMIT; -- or ROLLBACK
Now, when the transaction is complete (COMMIT or ROLLBACK),
statement_timeout is reset. Additionally, if your webserver terminates long requests during a transaction, then
idle_in_transaction_session_timeout will terminate the connection and clean up the setting set by the application.
With all this talk of query cancellations, what are the errors that occur? Let's have a look.
ERROR: canceling statement due to statement timeout
You see this error message when a query surpasses the
If possible, rework the query (maybe it's missing an index?) and then try again. If this strategy doesn't work, you can increase
statement_timeout for the duration of this query.
ERROR: canceling statement due to lock timeout
You see this error message when a query attempts to acquire a lock and surpasses the
This one isn't as easy (or difficult) as fixing
statement_timeout. For this error, you'll need to make sure that your application queries hold locks for as little as possible. Or you can delay the queries. You can rearchitect web requests so that the locks happen out of the web request cycle -- in an asynchronous job.
FATAL: terminating connection due to idle-in-transaction timeout
You see this error message when you open a transaction and idle for longer than
You can fix this by rearranging your queries to ensure there are no long pauses in between queries sent to the database.
This configuration described in this guide is what I use for web services now. I even use this for personal projects that don't have the data size or team size at work! That said, this is my default setup for web services and it has served (and serving) me well. I hope it works for you.