UPDATE Jan 6, 2020: I've updated this guide's configuration in another post to also include RDS databases. This configuration still works, but only for self-managed postgres installations that have a real superuser.
I've painfully learned how to configure postgres one incident at a time, and this guide represents those learnings. Also, I'm tired of thinking about and discussing this with myself, and I want to share the guide with others. This guide presents my default postgres database configuration for web services.
Table of Contents
- The End Result
- The Upsides
- The Downsides
- Walkthrough: Configuring Postgres
- The Gist of It
- Developer Basics
This guide is a starter configuration for postgres. It makes several assumptions about a web service and development team. This may work for you. This guide assumes:
- A running postgres database to which you have superuser access (see the update above for RDS)
- You're developing a web service with a team and both will (hopefully) grow rapidly (think: startups)
- Schema changes are executed as an admin process -- you use `sequelize db:migrate` or similar to create tables prior to app deployment
- One application postgres database (i.e., one database when you run `\l` to show all databases on the server)
- One schema named after the database
Last but not least, this configuration is designed to evolve as your web service and team grow. If the configuration doesn't work, change it!
The End Result
- One user (i.e., a postgres role that can log in) owns all objects (database, schema, tables, indexes, etc.). This user is called the `owner`.
- A role `readwrite_users` that enables read and write access to all objects created by the `owner`
- A role `readonly_users` that enables read-only access to all objects created by the `owner`
- An `app` user that belongs to `readwrite_users` and has connection settings to guard against scaling problems (e.g., `statement_timeout` and `lock_timeout`)
The Upsides
- Enables good separation of maintenance operations (schema changes, data backfills, monitoring) from application operations (CRUD, data management, etc.)
- Enables server-side defaults for user settings like `lock_timeout` -- the application can still override these if needed
- Sets up roles so that you never have to deal with postgres GRANTs, REVOKEs, and the unfortunately confusing ALTER DEFAULT PRIVILEGES ever again (as long as you stay within the use cases of this configuration, of course)
- Groups users by their access privileges: read-and-write or read-only
- Prevents an accidental `TRUNCATE` of an entire table from the `app` user
The Downsides
This configuration doesn't come without downsides. The biggest downside I experience relates to local development. Local development requires knowing two sets of credentials -- one is used for schema changes (migrations), and one is used by the app to read and write data. This knowledge can be codified to reduce confusion, but it's still something a developer must know with the configuration presented in this guide.
Another downside of this configuration is the inability to prevent an unqualified `DELETE FROM table`. The configuration does prevent `TRUNCATE` statements for application developers, but preventing `DELETE FROM table` is not easily accomplished without bigger trade-offs.
Finally, setting guard rails like `statement_timeout` at the database layer does not prevent abuse from the application code. Postgres does not prevent `SET` statements. Also, it's important that the application retains the ability to override settings when needed. The default settings are guard rails, not jail bars.
Walkthrough: Configuring Postgres
Let's walk through the steps required to create the configuration described. Connect to your database as a superuser. All statements in this guide are executed by the superuser; that is all you will need.
Create the Owner User
The `owner` user owns all objects for your application's database. Every schema, table, index, and <insert favorite postgres object> is owned by this user. `owner` is the only user used for schema changes.
CREATE ROLE owner LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 3;
ALTER ROLE owner SET statement_timeout = 20000;
ALTER ROLE owner SET lock_timeout = 3000;
-- v9.6+
ALTER ROLE owner SET idle_in_transaction_session_timeout = 3000;
- Change the password to something secure. :)
- `CONNECTION LIMIT 3`: This caps the number of `owner` connections, which limits the damage when we have accidental long-running queries. Which leads us to ...
- `statement_timeout`: 20 seconds is long, but it provides an upper limit for long-running queries. You will likely set this value higher in migrations to accommodate schema changes that rewrite tables and larger data sets. But you can enjoy that error message when your data is large enough that postgres cancels your schema change.
- `lock_timeout`: 3 seconds is what I've found to be long enough to acquire most locks, but not so long that an outage occurs due to connection exhaustion.
- `idle_in_transaction_session_timeout`: 3 seconds, similar to `lock_timeout`, is what I've found useful for my applications. Change the value to suit your needs. This setting is only available in postgres 9.6+.
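As noted above for `statement_timeout`, a migration that rewrites a large table will likely need a higher limit than the role's default. A hypothetical migration (the `orders` table and the 5-minute value are illustrative, not from this guide) can raise the limit for its own session only:

```sql
-- Hypothetical migration: raise the limit for this session only
SET statement_timeout = '5min';
ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';
RESET statement_timeout;
```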
Create the readwrite_users Role
The `readwrite_users` role groups users that require read and write access (CRUD) to objects owned by `owner`.
CREATE ROLE readwrite_users NOLOGIN;
We need a database to alter privileges, so this is as much as we can configure for this role at this point.
Create the readonly_users Role
The `readonly_users` role groups users that require read access (`SELECT`) to objects owned by `owner`.
CREATE ROLE readonly_users NOLOGIN;
We need a database to alter privileges, so this is as much as we can configure for this role at this point.
Create the Database
CREATE DATABASE exampledb WITH OWNER owner ENCODING UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
Postgres enables connection privileges on new databases for all users (`PUBLIC`) by default. Postgres also creates the `public` schema, enabling usage and creation of tables within that schema for all users. Instead of relying on implicit privileges from the default installation, we want to declare privileges to roles explicitly. This puts us in a state of least privilege first, and then we add new privileges afterward. I generally find the least-privilege-first strategy much easier to follow mentally and in code. I'm a simpleton, so starting with nothing and adding is easier than starting with everything and then removing.
Therefore, with new databases, I revoke all privileges for the database and drop the `public` schema:
\c exampledb
REVOKE ALL ON DATABASE exampledb FROM PUBLIC;
DROP SCHEMA public;
Create the Schema and Configure
Now create the schema. The schema is named after the database to keep things simple.
SET ROLE owner; CREATE SCHEMA exampledb; RESET ROLE;
Now we default the `owner` user's `search_path` to the `exampledb` schema on connection, since we dropped the `public` schema. If this were not set, `search_path` for `owner` would default to `"$user"` -- i.e., the `owner` schema, which does not exist.
ALTER ROLE owner SET search_path TO exampledb;
We also need to set our session's `search_path` to `exampledb` so that we, as a superuser, properly create objects in the `exampledb` schema:
SET search_path TO exampledb;
GRANTs for Resources
With the basic resources provisioned, we grant privileges to the `readwrite_users` and `readonly_users` roles. Starting with the database:
GRANT CONNECT ON DATABASE exampledb TO readwrite_users;
GRANT TEMPORARY ON DATABASE exampledb TO readwrite_users;
GRANT CONNECT ON DATABASE exampledb TO readonly_users;
GRANT TEMPORARY ON DATABASE exampledb TO readonly_users;
Now grant privileges on the schema:
GRANT USAGE ON SCHEMA exampledb TO readwrite_users;
GRANT USAGE ON SCHEMA exampledb TO readonly_users;
-- Not needed, but being explicit helps with \dn+
GRANT CREATE, USAGE ON SCHEMA exampledb TO owner;
GRANT (and REVOKE) applies privileges to existing objects but not to future objects. Applying privileges to future objects is accomplished with ALTER DEFAULT PRIVILEGES.
Let's configure default privileges for the `readwrite_users` role. We want to ensure new objects like tables or indexes allow, well, read and write privileges for the users in the role:
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO readwrite_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT EXECUTE ON FUNCTIONS TO readwrite_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE ON TYPES TO readwrite_users;
As stated earlier, these statements rely on `owner` to create all objects in the database. Therefore, these statements do not include privileges that would allow a user in `readwrite_users` to create objects. Refer to the ALTER DEFAULT PRIVILEGES and Privileges documentation for details on the other privileges.
Additionally, I did not include a statement for future schemas because this guide doesn't support multiple schemas.
Now let's move on to the `readonly_users` role:
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT SELECT ON TABLES TO readonly_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE, SELECT ON SEQUENCES TO readonly_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT EXECUTE ON FUNCTIONS TO readonly_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE ON TYPES TO readonly_users;
Finally, we want to ensure PUBLIC (a special role that means all users) does not have access to objects in our database. This forces users to belong to either the `readwrite_users` role or the `readonly_users` role to access the database.
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON TABLES FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON SEQUENCES FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON FUNCTIONS FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON TYPES FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON SCHEMAS FROM PUBLIC;
Once again, I favor explicitness for readability (and my sanity) because not all of the statements above are required. The default privileges for PUBLIC are documented, and a few of the statements above are redundant.
With these last statements, you will not need to work with GRANTs and ALTER DEFAULT PRIVILEGES again as long as you create users in the `readwrite_users` or `readonly_users` roles. WOOHOO!
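To sanity-check the configuration, you can list the default privileges you just declared. Both of these are read-only checks; the catalog query is just an alternative to the psql meta-command:

```sql
-- In psql, list the configured default privileges:
\ddp
-- Or query the catalog directly:
SELECT defaclrole::regrole, defaclnamespace::regnamespace,
       defaclobjtype, defaclacl
FROM pg_default_acl;
```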
Create the App User
The `app` user is what your web service will use to connect to the database. `app` (usually) requires read and write permissions, so this user belongs to the `readwrite_users` role:
CREATE ROLE app WITH LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 90 IN ROLE readwrite_users;
ALTER ROLE app SET statement_timeout = 1000;
ALTER ROLE app SET lock_timeout = 750;
-- v9.6+
ALTER ROLE app SET idle_in_transaction_session_timeout = 1000;
ALTER ROLE app SET search_path = exampledb;
- Change the password. :)
- `CONNECTION LIMIT 90`: if you're using the default installation of postgres, the installer usually defaults `max_connections` to 100. Setting this to 90 ensures there are a few connections left for administrators to hop in and wrangle queries in the event of an incident. Feel free to change this to match your database setup. If you need more connections, follow postgres best practice and add a connection pooler in front of the database.
- `statement_timeout`: 1 second is a long time for a web request. 1 second for one of many queries in a web request feels like an eternity to a user. Feel free to change this to suit your needs. I like pushing this as low as possible for my services. Keeping this low is important at scale: Twitch, for example, runs with a 1-second `statement_timeout`.
- `lock_timeout`: 750ms might not seem like much, but with the 1-second `statement_timeout`, it still leaves your query only 250ms of execution time. Change as required.
- `idle_in_transaction_session_timeout`: for postgres v9.6+, this ensures you do not have zombie transactions holding active connections. This guards the database against webservers that terminate requests and leave a connection open.
The Gist of It
Alright, alright! That was a lot of text to set up a database. Here's the gist. I use the same defaults and secrets above, so please change them!
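As a stand-in for the gist, here are the walkthrough's statements consolidated into one script (same placeholder names and secrets as above -- change them):

```sql
-- owner: owns all objects; used only for schema changes
CREATE ROLE owner LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 3;
ALTER ROLE owner SET statement_timeout = 20000;
ALTER ROLE owner SET lock_timeout = 3000;
ALTER ROLE owner SET idle_in_transaction_session_timeout = 3000; -- v9.6+

-- Group roles for read-write and read-only users
CREATE ROLE readwrite_users NOLOGIN;
CREATE ROLE readonly_users NOLOGIN;

-- The database and schema, owned by owner
CREATE DATABASE exampledb WITH OWNER owner ENCODING UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
\c exampledb
REVOKE ALL ON DATABASE exampledb FROM PUBLIC;
DROP SCHEMA public;
SET ROLE owner;
CREATE SCHEMA exampledb;
RESET ROLE;
ALTER ROLE owner SET search_path TO exampledb;
SET search_path TO exampledb;

-- Database and schema privileges
GRANT CONNECT, TEMPORARY ON DATABASE exampledb TO readwrite_users, readonly_users;
GRANT USAGE ON SCHEMA exampledb TO readwrite_users, readonly_users;
GRANT CREATE, USAGE ON SCHEMA exampledb TO owner;

-- Default privileges for future objects created by owner
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO readwrite_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT EXECUTE ON FUNCTIONS TO readwrite_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE ON TYPES TO readwrite_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT SELECT ON TABLES TO readonly_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE, SELECT ON SEQUENCES TO readonly_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT EXECUTE ON FUNCTIONS TO readonly_users;
ALTER DEFAULT PRIVILEGES FOR ROLE owner IN SCHEMA exampledb GRANT USAGE ON TYPES TO readonly_users;

-- Lock PUBLIC out of future objects
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON TABLES FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON SEQUENCES FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON FUNCTIONS FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON TYPES FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR ROLE owner REVOKE ALL PRIVILEGES ON SCHEMAS FROM PUBLIC;

-- app: the web service user
CREATE ROLE app WITH LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 90 IN ROLE readwrite_users;
ALTER ROLE app SET statement_timeout = 1000;
ALTER ROLE app SET lock_timeout = 750;
ALTER ROLE app SET idle_in_transaction_session_timeout = 1000; -- v9.6+
ALTER ROLE app SET search_path = exampledb;
```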
This guide's postgres configuration provides better defaults for scaling web services with databases than the defaults provided by a base installation of postgres. That said, this configuration may seem harsh and punishing if you're unfamiliar with writing efficient SQL. But, this is okay. Learn to write SQL efficiently while your web service and team scale up. While you're doing this, the configuration will help protect you from painful incidents like a tipped database. Nevertheless, let's discuss a few points to keep in mind.
The `app` user has SELECT, INSERT, UPDATE, and DELETE privileges for all tables created by the `owner` user. `app` is the only user your web service connects with to handle web requests. SQL queries sent as the `app` user should be fast, or the settings for the `app` user will terminate the statement. If your application code attempts to create tables, indexes, or other postgres objects, you will be on the receiving end of a permission failure. To solve this, use the `owner` user.
The `owner` user manages all tables and indexes for your application's database. The management of the database schema should happen outside of the webserver process -- in an admin process like `rake db:migrate` or `alembic upgrade head`. The `owner` user is not designed for web service requests. You want to create the structure of the database with `owner`, but manage the data in that structure with `app`.
Finally, you can override setting values if needed. The defaults are guard rails, not jail bars. You can jump over the rail if you know what you're doing.
Schema Changes as a Superuser
Hopefully, you won't need to do this, as schema migrations run as `owner` should suffice. But incidents happen, and sometimes the quickest way to stop the bleeding is to hop onto the database and wrangle some code.
SET ROLE owner;
-- ALTER TABLE ...
-- ...
RESET ROLE;
Schema Changes (via Migrations) and Local Development
Use the `owner` user's credentials to run migrations. If your application follows the Twelve-Factor App's Config factor, the `owner` credentials can be passed in via an environment variable in the form of a postgres connection URI. Here's an example of what the URI would look like for a local database:
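Using the names and (placeholder) password from this guide, the local URI would look something like:

```
postgres://owner:secret@localhost:5432/exampledb
```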
If you're using `rake db:migrate`, set the `DATABASE_URL` environment variable to the `owner` URI.
If you're using `alembic upgrade`, ensure `EnvironmentContext.configure` passes in the database URI from the environment, and you can swap out database URIs as mentioned. e.g.,
# env.py
import os

from alembic import context

context.configure(url=os.getenv("DATABASE_URL", "localhost"))
Similar strategies are available in your migration tool of choice.
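The credential swap itself is trivial; here's a minimal sketch in plain Python (no database driver required) showing that the only difference between the two connection URIs is the user. The URIs are hypothetical local values, not from a real deployment:

```python
from urllib.parse import urlsplit

# Hypothetical local connection URIs -- same database, different users.
OWNER_URI = "postgres://owner:secret@localhost:5432/exampledb"  # migrations
APP_URI = "postgres://app:secret@localhost:5432/exampledb"      # web service

def connection_user(uri: str) -> str:
    """Return the user embedded in a postgres connection URI."""
    return urlsplit(uri).username

print(connection_user(OWNER_URI))  # -> owner
print(connection_user(APP_URI))    # -> app
```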
Developer Basics
List users and roles:
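Both of these work; the psql meta-command is the quicker habit:

```sql
-- psql meta-command:
\du
-- Or query the catalog:
SELECT rolname, rolcanlogin FROM pg_roles;
```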
List users and their settings (only as a superuser):
SELECT usename, useconfig FROM pg_shadow;
Create a New User
We only add users to the `readwrite_users` or `readonly_users` roles. For example, a new user with read and write access:
CREATE ROLE newuser WITH LOGIN ENCRYPTED PASSWORD 'secret' CONNECTION LIMIT 3 IN ROLE readwrite_users;
ALTER ROLE newuser SET search_path = exampledb;
-- Recommended connection settings
ALTER ROLE newuser SET statement_timeout = 1000;
ALTER ROLE newuser SET lock_timeout = 750;
-- v9.6+
ALTER ROLE newuser SET idle_in_transaction_session_timeout = 1000;
- Change the password, `CONNECTION LIMIT`, and settings to fit your needs.
Remove a User
Connect as the superuser and execute the following:
-- Change password
ALTER ROLE myuser ENCRYPTED PASSWORD 'somethingbogus';
-- Terminate existing queries/connections
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE usename = 'myuser';
-- Now remove the user
DROP USER myuser;
Changing the user's password stops new connections from clients. Terminating existing queries ensures the user is no longer connected to the database before you drop it.
Update User Settings
Connect as the superuser and execute the following:
ALTER ROLE myuser SET statement_timeout = 5000;
-- Optional, terminate existing queries
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE usename = 'myuser';
Settings are applied at connection time. Therefore, you need to have your application or developer reconnect after the change.
app Password Rotation
Connect as the superuser and follow these steps:
- Change the password:
ALTER ROLE myuser ENCRYPTED PASSWORD 'newpassword';
- Update the credentials wherever you store them (e.g., heroku settings, chef data bags, etc.)
- Restart your application to use new credentials.
This strategy does have a flaw. The time between Step 1 and Step 3 may raise exceptions in the application if you're using webservers that cancel requests (`Rack::Timeout`, gunicorn, Unicorn, etc.) and reconnect to the database. You'll want to execute the above steps rapidly.
Dynamically Set Settings From Your Application
The database is configured with default settings like `statement_timeout` for `app`, but these can be overridden from the application code. A quick first pass might look like:
-- WARNING! Leaky settings! WARNING!
SET statement_timeout TO 5000;
-- ... execute query ...
RESET statement_timeout;
This is not a foolproof method because you may leak setting values. If you're using a connection pooler like pgbouncer, executing this statement from the application enables the setting for the life of the connection. The connection pooler shares this connection with other requests, so you must call `RESET statement_timeout` after you've completed your query. Otherwise, the setting value lasts until the connection is closed, a new `SET statement_timeout` query is executed, or `RESET statement_timeout` is executed. But even if you execute `RESET`, the value may still leak to other requests when webservers terminate long requests.
A better method is to change the setting in a transaction using SET LOCAL:
BEGIN;
SET LOCAL statement_timeout = 5000;
-- execute queries
COMMIT; -- or ROLLBACK
Now, when the transaction completes (COMMIT or ROLLBACK), `statement_timeout` is reset. Additionally, if your webserver terminates long requests during a transaction, `idle_in_transaction_session_timeout` will terminate the connection and clean up the setting set by the application.
With all this talk of query cancellations, what are the errors that occur? Let's have a look.
ERROR: canceling statement due to statement timeout
You see this error message when a query surpasses the `statement_timeout`.
If possible, rework the query (maybe it's missing an index?) and then try again. If that strategy doesn't work, you can increase `statement_timeout` for the duration of the query.
ERROR: canceling statement due to lock timeout
You see this error message when a query attempts to acquire a lock and surpasses the `lock_timeout`.
This one isn't as easy (or difficult) to fix as a `statement_timeout` error. You'll need to make sure your application queries hold locks for as little time as possible. Or you can delay the queries, or rearchitect web requests so that lock-heavy work happens outside the web request cycle -- in an asynchronous job.
FATAL: terminating connection due to idle-in-transaction timeout
You see this error message when you open a transaction and idle for longer than the `idle_in_transaction_session_timeout`.
You can fix this by rearranging your queries to ensure there are no long pauses in between queries sent to the database.
The configuration described in this guide is what I use for web services now. I even use it for personal projects that don't come anywhere near the data size or team size I see at work! That said, this is my default setup for web services, and it has served (and is serving) me well. I hope it works for you.