PostgreSQL: Data is Important
A case for PostgreSQL in a modern web architecture.
Written by Mike Lewis.
At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!
PostgreSQL: Data is Important
But Why Postgres?
-
Solves some problems with migrations
-
Full-featured indexes
-
Robust constraints
-
Streaming replication
-
Sophisticated query planner
-
PostGIS
-
Full text search
-
Advanced SQL features
Migration Improvements
No downtime for most DDL migrations. Instantly performs:
-
dropping and renaming columns
-
adding columns (if not NOT NULL)
-
creating indexes (CREATE INDEX CONCURRENTLY)
Transactional DDL statements (not even Oracle has this)
-
wrap migrations in a SQL transaction
-
roll back everything in migration on a partial failure
Indexes
-
CREATE INDEX ON USERS md5(id)
-
CREATE UNIQUE INDEX ON payment_sources (user_id, status) WHERE status = ‘active’;
-
inverted indexes
-
full-text search, array operations (inclusion tests, etc), ltrees
-
can put indexes on different disk than the tables.
-
fully decoupled from table store
Constraints
-
CHECK constraints
-
foreign key constraints available for all tables
-
exclusion constraints (new in 9.0)
Benefits of validation in the DB
-
reduces dependencies on monolithic app
-
improves consistency guarantees
-
deeper level of safeguards
-
don’t have to worry about prefetching associations
-
just as easy to test
Streaming Replication
-
new in 9.0
-
streams WALs to standby servers while being generated
-
less latency than file-based replication
-
more consistent than statement-based replication (MySQL <5.1)
-
synchronous capabilities coming in 9.1 guarantees 2-safe durability
Sophisticated Query Planner
-
cost-based (as opposed rule-based) query optimizer
-
heuristics gathered on tables to optimize better
-
costs of random seek, seq scan, etc. configurable
-
e.g making random seeks on an SSD not a bad thing to the optimizer
-
genetic algorithms to plan complex queries.
Better EXPLAIN
MySQL sample
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 2 | DERIVED | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5 | NULL | 1 | Using where |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
Postgres Sample
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.31..8.32 rows=1 width=0)
-> GroupAggregate (cost=0.00..8.30 rows=1 width=12)
Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
-> Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers (cost=0.00..8.27 rows=1 width=12)
Index Cond: ((user_id >= 69) AND (user_id <= 70))
Also… if you have a subquery, postgres won’t execute it
PostGIS (for geo queries)
-
extensible GIS system
-
R tree indexes for built-in geometric data types
-
geometric data types
-
free census data (TIGER) for geocoding (addresses→coordinates)
-
Rails integration example from railsonpostgresql.com (site no longer exists).
-
Restaurant.first(:conditions => [“the_geom && ?”, Polygon.from_coordinates([[[x_min, y_min],
[x_min, y_max], …]]], 4269)])
Full Text Search
-
full-featured text indexing
-
dictionaries (languages, stopwords, synonyms)
-
customizable search ranking algorithms
-
mature
-
lighter weight solution than solr/lucene
Advanced SQL Features
-
EXCEPT and INTERSECT statements in addition to UNION
-
pl/*, custom aggs, UDFs, etc.
-
Triggers
(for advanced analytics)
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 11 | 5200 | 2
develop | 9 | 4500 | 3
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
Authored By
- PostgreSQL: Data is Important
- But Why Postgres?
- Migration Improvements
- No downtime for most DDL migrations. Instantly performs:
- Transactional DDL statements (not even Oracle has this)
- Indexes
- Constraints
- Benefits of validation in the DB
- Streaming Replication
- Sophisticated Query Planner
- Better EXPLAIN
- MySQL sample
- Postgres Sample
- PostGIS (for geo queries)
- Full Text Search
- Advanced SQL Features