Accelerating Ruby: How Our Bundle Install Times Got 12x Faster
A special thanks to the Ruby open source community
In 2014, Samuel Giddins, a 19-year-old freshman at the University of Chicago, decided to delay returning to college for his sophomore year. Determined to address a glaring deficiency in CocoaPods, an open source tool for managing and installing third-party libraries for iOS apps, Sam applied for a small grant to focus on this difficult issue. When he learned that his proposal had been accepted, he jumped at this once-in-a-lifetime opportunity and moved to San Francisco for the following year, saying temporary goodbyes to many of his new college friends.
Because Sam's initial efforts focused on porting functionality from Bundler, a tool for installing third-party Ruby libraries, he became involved in helping to scale RubyGems.org, the primary site for hosting open source Ruby code, which currently serves over 15 billion requests a day and hosts more than 176,000 Ruby libraries. Due in part to the popularity of the Rails web framework, adopted at many companies including Square, Bundler was taking longer and longer to download the information it needed to do its job. The RubyGems team added an API dedicated to querying this information, but this change drove CPU and database load so high that the servers could no longer handle the inbound traffic.
In 2015, the Bundler team, led by André Arko, devised a new approach. The team decided that Bundler would no longer depend on a computationally expensive API hosted exclusively in Amazon's data center on the East Coast, or on an insecure custom serialization format. Instead, much of this data would be cached in an append-only text file that would be geographically distributed through a Content Delivery Network (CDN), making it faster and more secure. The bulk of this feature, which became known as the compact index format, was implemented over the course of a year. André built a prototype, and Felipe Tanus built the full implementation as a summer project under Sam's guidance. The RubyGems website moved swiftly to adopt the format, and a new version of Bundler (v1.12) added support for it.
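To make the benefit of an append-only file concrete, here is a minimal sketch of a client that keeps a local copy in sync by asking only for the bytes it does not yet have. It is an illustration under stated assumptions, not Bundler's actual implementation: the `FakeRemoteIndex` class, the `sync` helper, and the line layout are all hypothetical, and the in-memory string stands in for a file that a real client would fetch from the CDN (for example with an HTTP range request).

```ruby
# Hypothetical sketch: syncing a local copy of an append-only index file.
# Nothing here is Bundler or RubyGems.org code; names and data are made up.

# Stand-in for the CDN-hosted file. A real client would send an HTTP
# request for a byte range instead of slicing an in-memory string.
class FakeRemoteIndex
  def initialize(body)
    @body = body
  end

  # The file only ever grows: new entries are appended at the end.
  def append(line)
    @body += line
  end

  # Returns every byte from `offset` to the end of the file.
  def fetch_from(offset)
    @body.byteslice(offset, @body.bytesize - offset).to_s
  end
end

# Because existing bytes never change, the local copy is always a prefix
# of the remote file, so only the missing tail has to be downloaded.
def sync(local, remote)
  local + remote.fetch_from(local.bytesize)
end

remote = FakeRemoteIndex.new("rack 1.0.0\nrake 13.0.0\n") # made-up line layout
local = sync("", remote)                  # first sync downloads everything
remote.append("rack 1.0.1\n")             # a new version is published
puts remote.fetch_from(local.bytesize)    # only the new line is transferred
local = sync(local, remote)
```

The same property is what makes the file friendly to a CDN: a static file that only grows can be cached close to developers instead of being computed per request by a central API.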
In parallel, Gemstash, an open source project that enabled companies to store their private Ruby libraries, was released in 2015. Gemstash also included a mirror caching feature to reduce the traffic load on RubyGems.org. At Square, we started running a self-hosted version of Gemstash in our data centers in 2016. Although Gemstash lacks support for compact indexes (GitHub issue), this gap caused us no problems for 7 years, until a recent Bundler update began requiring the compact index.
For this reason, we prioritized migrating our Ruby gems to a vendor-supported solution that included compact index support. After this migration, a Bundler install for the Android Point of Sale codebase took 20 seconds instead of 4 minutes, a 12x improvement. The new solution also serves libraries over a CDN, which speeds up downloads for our developers distributed around the world.
One unexpected snag we discovered late in the migration was that certain CI jobs were failing to install correctly because of a checksum mismatch when invoking Bundler. As we investigated further, we identified a bug in the compact index implementation that was failing to account for the different types of Ruby interpreters available. We relayed our findings to the vendor and found a workaround to continue with the migration.
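As an illustration of how this class of bug can arise, the sketch below indexes checksums for a gem that is published once per platform. It is a hypothetical reconstruction, not the vendor's code or the actual defect: the `Entry` struct, both helper methods, and the sample data are made up. `ruby` and `java` are the platform names RubyGems uses for the default interpreter and JRuby.

```ruby
require "digest"

# Hypothetical index entry: one gem version can ship a separate package
# per platform, and each package has its own checksum.
Entry = Struct.new(:name, :version, :platform, :checksum)

# Buggy lookup table: the platform is ignored, so the second entry for
# the same name and version silently overwrites the first.
def checksums_by_version(entries)
  entries.each_with_object({}) { |e, h| h[[e.name, e.version]] = e.checksum }
end

# Platform-aware lookup table: each package keeps its own checksum.
def checksums_by_platform(entries)
  entries.each_with_object({}) do |e, h|
    h[[e.name, e.version, e.platform]] = e.checksum
  end
end

mri_package = "bytes of the default-interpreter package"  # made-up contents
jruby_package = "bytes of the JRuby package"              # made-up contents

entries = [
  Entry.new("example", "1.0.0", "ruby", Digest::SHA256.hexdigest(mri_package)),
  Entry.new("example", "1.0.0", "java", Digest::SHA256.hexdigest(jruby_package)),
]

downloaded = Digest::SHA256.hexdigest(mri_package) # what an MRI client verifies

# With the buggy table, the client compares against the JRuby checksum.
puts checksums_by_version(entries)[["example", "1.0.0"]] == downloaded
# With the platform-aware table, the comparison uses the right package.
puts checksums_by_platform(entries)[["example", "1.0.0", "ruby"]] == downloaded
```

In this sketch only clients on one platform fail verification, which would be consistent with a mismatch that shows up in certain CI jobs rather than everywhere.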
After taking a year off to focus on open source work, Sam returned to the University of Chicago in 2015. He graduated from college in 2017 and joined Square as a member of the iOS Mobile Developer Experience team, spearheading the decision to improve build times by migrating to Bazel. As a member of the Ruby open source community, he also recently published an announcement about the upcoming deprecation of the older RubyGems API, the culmination of more than 7 years of work to make the installation process fast and reliable. We owe a special thanks not only to him but also to the members of the Ruby open source community who helped make this change, which takes effect on May 24th, happen.
(Special thanks to Dan Taylor, Jason Wu, Samuel Giddins, and André Arko for reviewing and providing feedback on this post.)