A Summer of Kochiku
Kochiku is getting plenty of attention from the new CII team.
Written by Josh Eversmann.
Kochiku is an open-source tool that automatically partitions test suites into distributable chunks, or shards. Most of Square’s projects use Kochiku to get test results quickly by sharding the work across a few hundred kochiku-workers running in Amazon EC2. Since the number of developers pushing code at Square is rapidly increasing, a team was formed to focus on improving and administering Kochiku. The team has already made a number of quality-of-life changes for projects on Kochiku and written entire features that make the tool even more useful. We open-sourced Kochiku about a year ago, so our team works to keep the public version up to date with Square’s internal version, often building out feature branches on GitHub and pulling them into the Square version later. This means the work we’re putting into Kochiku benefits everyone.
What’s new in Kochiku?
Here’s a quick rundown of the growth of Kochiku, roughly in the order that the new features were developed:
Worker Health Dashboard: This new page shows the track record of every kochiku-worker in the cluster, making it easy to spot workers that may be wrongly failing builds.
Action Caching: All of the large and commonly viewed pages load faster and refresh less often, making the web UI more responsive.
Cluster Auto-Scaling: Kochiku monitors cluster utilization levels and can be configured to bring workers up or down based on demand.
Better Submodule Support: All of the remote server configuration options now also apply to submodules, allowing you to use aliases, mirrors, or the local caching strategy for repos that use submodules.
More Useful Emails: Kochiku now sends emails for successful builds (in addition to failures), and the content of the emails is more informative.
Configuration Changes: test_command and on_success_script have moved into kochiku.yml, so they can be changed from inside the repository. You can also have the kochiku-workers upload arbitrary log files, instead of only log/*.log.
Project Health Tracking: Nobody likes flaky tests, so Kochiku now shows stats for matching build parts across recent builds, making it easier to track down problematic files.
New Partitioner Framework: After some refactoring, Kochiku is getting smarter, starting with the Maven Partitioner.
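The post doesn’t describe how Cluster Auto-Scaling decides when to add or remove workers. As a rough illustration only, a utilization-based policy could size the cluster so that current demand fills a target fraction of it. The function name, the `target_pct` parameter, and the 80% default are hypothetical and not Kochiku’s actual configuration.

```python
def desired_worker_count(busy: int, queued: int, min_workers: int,
                         max_workers: int, target_pct: int = 80) -> int:
    """Hypothetical policy: size the cluster so current demand fills about
    target_pct percent of it, clamped to [min_workers, max_workers]."""
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    # Demand = shards running now plus shards waiting for a free worker.
    demand = busy + queued
    # Ceiling division in integers avoids floating-point rounding surprises.
    wanted = -(-demand * 100 // target_pct)
    return max(min_workers, min(max_workers, wanted))


# Usage: 90 busy workers and 30 queued shards at an 80% target -> 150 workers.
print(desired_worker_count(busy=90, queued=30, min_workers=10, max_workers=500))
```

A production policy would also want a cooldown or some hysteresis so the cluster doesn’t oscillate when demand hovers near a threshold.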
The Maven Partitioner, and why it matters
Kochiku’s ability to run tests is entirely language-agnostic, since it simply invokes a given shell script, but the tool’s real strength is its partitioning. Using the often-overlooked time_manifest feature, Kochiku can split files between workers based on how long they previously took to run, so the whole suite finishes as quickly as possible. However, while this is awesome for Ruby apps, it isn’t the best way to think about every project.
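The idea behind time-based splitting can be illustrated with the classic greedy “longest processing time first” heuristic: sort files by recorded runtime and always hand the next file to the least-loaded shard. This is a sketch under assumptions: the plain dictionary stands in for time_manifest data, and Kochiku’s actual algorithm and manifest format may differ.

```python
import heapq


def partition_by_time(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Greedily assign files to shards so total runtimes stay balanced (LPT heuristic)."""
    if shards < 1:
        raise ValueError("shards must be >= 1")
    # Min-heap of (total_seconds, shard_index): the least-loaded shard pops first,
    # with ties broken by the lower shard index.
    heap = [(0.0, i) for i in range(shards)]
    buckets: list[list[str]] = [[] for _ in range(shards)]
    # Longest files first: placing big items early keeps the final loads even.
    for path, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        buckets[i].append(path)
        heapq.heappush(heap, (load + seconds, i))
    return buckets


manifest = {"a_spec.rb": 50, "b_spec.rb": 40, "c_spec.rb": 30,
            "d_spec.rb": 20, "e_spec.rb": 10}
print(partition_by_time(manifest, 2))
# [['a_spec.rb', 'd_spec.rb', 'e_spec.rb'], ['b_spec.rb', 'c_spec.rb']]
```

Here the two shards take 80 and 70 seconds. A naive split by file count could easily put the 50- and 40-second files together and leave one worker idle for much longer.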
Much of Square’s Java code base lives in a single, monolithic repo separated into many Maven modules. This practice was popularized by companies like Google and Facebook because it allows changes across many parts of a company’s infrastructure to be pulled in atomically. Running tests for every commit in such a large repo is very time-consuming if not done intelligently, so the Maven Partitioner was built as an extension to Kochiku to shard our monolithic Java repo.
Maven projects are organized according to a fairly strict convention. Using knowledge of these conventions and the dependencies each module declares in its POM, the Maven Partitioner builds and tests only the modules impacted by the changes in a given commit. On average this cuts the number of modules to test from well over a hundred to about a dozen, with no loss of fault detection despite skipping entire test suites.
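The selection step can be sketched as follows: map each changed file to the module whose directory contains it, then walk the reverse dependency graph to collect every module that transitively depends on a changed one. The module names, directory layout, and dependency dictionary are hypothetical inputs. The real Maven Partitioner derives this information from the repository’s POM files, and its exact rules may differ.

```python
from collections import deque


def impacted_modules(changed_files, module_dirs, depends_on):
    """Return the set of modules to build and test for a commit.

    changed_files: paths touched by the commit, relative to the repo root.
    module_dirs:   module name -> module directory (hypothetical layout).
    depends_on:    module name -> set of modules it depends on.
    """
    # Invert the graph: for each module, which modules depend on it?
    dependents = {m: set() for m in module_dirs}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # A changed file belongs to the module with the longest matching directory
    # prefix, so nested modules (e.g. payments/api) win over their parents.
    directly_changed = set()
    for path in changed_files:
        owners = [m for m, d in module_dirs.items()
                  if path.startswith(d.rstrip("/") + "/")]
        if owners:
            directly_changed.add(max(owners, key=lambda m: len(module_dirs[m])))

    # Breadth-first search over dependents collects everything downstream.
    impacted = set(directly_changed)
    queue = deque(directly_changed)
    while queue:
        for dependent in dependents.get(queue.popleft(), ()):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


modules = {"common": "common", "payments": "payments",
           "payments-api": "payments/api", "web": "web"}
deps = {"payments": {"common"}, "payments-api": {"payments"},
        "web": {"payments-api"}}
print(sorted(impacted_modules(["payments/api/src/Foo.java"], modules, deps)))
# ['payments-api', 'web']
```

A change to `common` would select all four modules, while a change to `web` selects only `web`. A real implementation also needs a policy for files outside any module, such as a root `pom.xml`, which might reasonably trigger a full build.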
The changes to the partitioning logic that made way for the Maven Partitioner also allow additional partitioners to be built with knowledge of other languages, build tools, and frameworks. This lets Kochiku grow in both strength and simplicity as a tool for building a variety of projects. As Square continues to support Kochiku, we’d like to add intelligence for other technologies, such as the Go language or Twitter’s Pants build tool.
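One way to structure a pluggable partitioner framework is a small interface that every partitioner implements, plus a lookup keyed by a type name taken from configuration. Everything below is hypothetical and only illustrates the shape of such a design: the class names, the `type` key, and the registry. Kochiku itself is written in Ruby, and its internal interfaces may look quite different.

```python
from abc import ABC, abstractmethod


class Partitioner(ABC):
    """Turns a list of files into shards (each shard is a list of paths)."""

    @abstractmethod
    def partition(self, files: list[str]) -> list[list[str]]: ...


class RoundRobinPartitioner(Partitioner):
    """Language-agnostic fallback: deal files out evenly across N workers."""

    def __init__(self, workers: int):
        if workers < 1:
            raise ValueError("workers must be >= 1")
        self.workers = workers

    def partition(self, files):
        shards = [files[i::self.workers] for i in range(self.workers)]
        return [s for s in shards if s]  # drop empty shards


class DirectoryPartitioner(Partitioner):
    """Stand-in for a build-tool-aware partitioner: one shard per top-level dir."""

    def partition(self, files):
        groups: dict[str, list[str]] = {}
        for path in files:
            groups.setdefault(path.split("/", 1)[0], []).append(path)
        return [groups[k] for k in sorted(groups)]


# Hypothetical registry: adding support for a new tool means adding one entry.
PARTITIONERS = {"default": RoundRobinPartitioner, "directory": DirectoryPartitioner}


def build_partitioner(config: dict) -> Partitioner:
    kind = config.get("type", "default")
    options = {k: v for k, v in config.items() if k != "type"}
    try:
        cls = PARTITIONERS[kind]
    except KeyError:
        raise ValueError(f"unknown partitioner type: {kind!r}") from None
    return cls(**options)


partitioner = build_partitioner({"type": "directory"})
print(partitioner.partition(["web/FooTest.java", "core/BarTest.java", "web/BazTest.java"]))
# [['core/BarTest.java'], ['web/FooTest.java', 'web/BazTest.java']]
```

With this design, the code that schedules shards never needs to know which language or build tool produced them.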