PostgreSQL: Data is Important

A case for PostgreSQL in a modern web architecture.

Reddit
LinkedIn

Written by Mike Lewis.

At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!

PostgreSQL: Data is Important

But Why Postgres?

  • Solves some problems with migrations

  • Full-featured indexes

  • Robust constraints

  • Streaming replication

  • Sophisticated query planner

  • PostGIS

  • Full text search

  • Advanced SQL features

Migration Improvements

No downtime for most DDL migrations. Instantly performs:

Transactional DDL statements (not even Oracle has this)

  • wrap migrations in a SQL transaction

  • roll back everything in migration on a partial failure

Indexes

Constraints

  • CHECK constraints

  • foreign key constraints available for all tables

  • exclusion constraints (new in 9.0)

Benefits of validation in the DB

  • reduces dependencies on monolithic app

  • improves consistency guarantees

  • deeper level of safeguards

  • don’t have to worry about prefetching associations

  • just as easy to test

Streaming Replication

Sophisticated Query Planner

  • cost-based (as opposed rule-based) query optimizer

  • heuristics gathered on tables to optimize better

  • costs of random seek, seq scan, etc. configurable

  • e.g making random seeks on an SSD not a bad thing to the optimizer

  • genetic algorithms to plan complex queries.

Better EXPLAIN

MySQL sample

+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table               | type  | possible_keys                        | key                                  | key_len | ref  | rows | Extra                        |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
|  1 | PRIMARY     | NULL                | NULL  | NULL                                 | NULL                                 | NULL    | NULL | NULL | Select tables optimized away |
|  2 | DERIVED     | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5       | NULL |    1 | Using where                  |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+

Postgres Sample

QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.31..8.32 rows=1 width=0)
   ->  GroupAggregate  (cost=0.00..8.30 rows=1 width=12)
         Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
         ->  Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers  (cost=0.00..8.27 rows=1 width=12)
               Index Cond: ((user_id >= 69) AND (user_id <= 70))

Also… if you have a subquery, postgres won’t execute it

PostGIS (for geo queries)

  • extensible GIS system

  • R tree indexes for built-in geometric data types

  • geometric data types

  • free census data (TIGER) for geocoding (addresses→coordinates)

  • Rails integration example from railsonpostgresql.com (site no longer exists).

  • Restaurant.first(:conditions => [“thegeom && ?”, Polygon.fromcoordinates([[[xmin, ymin], [xmin, ymax], …]]], 4269)])

Full Text Search

  • full-featured text indexing

  • dictionaries (languages, stopwords, synonyms)

  • customizable search ranking algorithms

  • mature

  • lighter weight solution than solr/lucene

Advanced SQL Features

SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) 
FROM empsalary;

  depname  | empno | salary | rank 
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    3
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2

Mike Lewis (@MikeLewis) | Twitter The latest Tweets from Mike Lewis (@MikeLewis). maker of stuff. kitten enthusiast. iOS hacker/troublemaker at @square…twitter.com