June 6, 2011 | 2 minute read

PostgreSQL: Data is Important

A case for PostgreSQL in a modern web architecture.

At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!

PostgreSQL: Data is Important

But Why Postgres?

Solves some problems with migrations
Full-featured indexes
Robust constraints
Streaming replication
Sophisticated query planner
PostGIS
Full text search
Advanced SQL features

Migration Improvements

No downtime for most DDL migrations. Instantly performs:

dropping and renaming columns
adding columns (if not NOT NULL)
creating indexes (CREATE INDEX CONCURRENTLY)

Transactional DDL statements (not even Oracle has this)

wrap migrations in a SQL transaction
roll back everything in migration on a partial failure

Indexes

indexes on expressions
CREATE INDEX ON USERS md5(id)
conditional indexes
CREATE UNIQUE INDEX ON payment_sources (user_id, status) WHERE status = ‘active’;
inverted indexes
full-text search, array operations (inclusion tests, etc), ltrees
very extensible
tablespaces
can put indexes on different disk than the tables.
fully decoupled from table store

Constraints

CHECK constraints
foreign key constraints available for all tables
exclusion constraints (new in 9.0)

Benefits of validation in the DB

reduces dependencies on monolithic app
improves consistency guarantees
deeper level of safeguards
don’t have to worry about prefetching associations
just as easy to test

Streaming Replication

new in 9.0
streams WALs to standby servers while being generated
less latency than file-based replication
more consistent than statement-based replication (MySQL <5.1)
synchronous capabilities coming in 9.1 guarantees 2-safe durability

Sophisticated Query Planner

cost-based (as opposed rule-based) query optimizer
heuristics gathered on tables to optimize better
costs of random seek, seq scan, etc. configurable
e.g making random seeks on an SSD not a bad thing to the optimizer
genetic algorithms to plan complex queries.

Better EXPLAIN

MySQL sample

+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table               | type  | possible_keys                        | key                                  | key_len | ref  | rows | Extra                        |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
|  1 | PRIMARY     | NULL                | NULL  | NULL                                 | NULL                                 | NULL    | NULL | NULL | Select tables optimized away |
|  2 | DERIVED     | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5       | NULL |    1 | Using where                  |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+

Postgres Sample

QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.31..8.32 rows=1 width=0)
   ->  GroupAggregate  (cost=0.00..8.30 rows=1 width=12)
         Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
         ->  Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers  (cost=0.00..8.27 rows=1 width=12)
               Index Cond: ((user_id >= 69) AND (user_id <= 70))

Also… if you have a subquery, postgres won’t execute it

PostGIS (for geo queries)

extensible GIS system
R tree indexes for built-in geometric data types
geometric data types
free census data (TIGER) for geocoding (addresses→coordinates)
Rails integration example from railsonpostgresql.com (site no longer exists).
Restaurant.first(:conditions => [“the_geom && ?”, Polygon.from_coordinates([[[x_min, y_min],

[x_min, y_max], …]]], 4269)])

Full Text Search

full-featured text indexing
dictionaries (languages, stopwords, synonyms)
customizable search ranking algorithms
mature
lighter weight solution than solr/lucene

Advanced SQL Features

EXCEPT and INTERSECT statements in addition to UNION
pl/*, custom aggs, UDFs, etc.
Triggers
Windowing functions (example from postgresql.org)

(for advanced analytics)

SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC)  FROM empsalary;

  depname  | empno | salary | rank 
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    3
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2

Mike Lewis (@MikeLewis) | Twitter *The latest Tweets from Mike Lewis (@MikeLewis). maker of stuff. kitten enthusiast. iOS hacker/troublemaker at @square…*twitter.com

Authored By

Square Engineering

@BlockEng

Table Of Contents

PostgreSQL: Data is Important
But Why Postgres?
Migration Improvements
No downtime for most DDL migrations. Instantly performs:
Transactional DDL statements (not even Oracle has this)
Indexes
Constraints
Benefits of validation in the DB
Streaming Replication
Sophisticated Query Planner
Better EXPLAIN
MySQL sample
Postgres Sample
PostGIS (for geo queries)
Full Text Search
Advanced SQL Features

PostgreSQL: Data is Important

A case for PostgreSQL in a modern web architecture.

PostgreSQL: Data is Important

But Why Postgres?

Migration Improvements

No downtime for most DDL migrations. Instantly performs:

Transactional DDL statements (not even Oracle has this)

Indexes

Constraints

Benefits of validation in the DB

Streaming Replication

Sophisticated Query Planner

Better EXPLAIN

MySQL sample

Postgres Sample

PostGIS (for geo queries)

Full Text Search

Advanced SQL Features

Authored By

APIs

Engineering

Data Science

View More Articles ›

About

The Corner

Community

Square Developer