Evolution of Developer Productivity at Square - Part Three
Effective developer tools
Welcome to the third part of our series on Square's evolving approach to developer productivity. In the first two parts, we covered the growth of our codebase and the investments we made in CI infrastructure to increase development speed. In this part, we shift our focus to the platform-specific tools and methodologies that further boost development velocity. The next installment will cover our initiatives in reliability and test engineering, which help us maintain the quality of our software as we scale. Let's dive in.
Effective Developer Tools
Let's discuss developer experience, something close to all our hearts. Our surveys and interviews indicated that tooling challenges were directly affecting our developers' productivity. Inspired by this feedback, we envisioned a world where every developer has access to the best tools and resources. To turn this vision into reality, we established the Developer Tools team.
Building on that foundation, we began by carefully examining our development ecosystem and quickly identified areas for improvement. One area was our on-premises source code management. Recognizing the potential for enhancement, our team took the ambitious step of migrating to GitHub. This was a massive undertaking, involving the transition of over 7,000 repositories and accommodating more than 1,000 teams. The payoff was immediate: our developers, particularly those based in Australia, experienced up to a 26% increase in code checkout speed. In addition to this migration, we collaborated closely with our security team on a Code Scanning project to help minimize vulnerabilities.
But we didn't stop there. To fully harness the potential of our move to GitHub, we set up an analytics pipeline. This was paired with Looker dashboards that provide a clear view of critical productivity metrics, such as cycle time breakdowns for each repo and organization. These insights gave us a sharper view of our workflows, pinpointing patterns and variances among teams and organizations. As a result, we've been able to have more informed conversations about where and how to drive improvements.
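To make "cycle time breakdown" concrete, here is a minimal sketch of how such a metric could be derived from pull request timestamps. It is not our actual pipeline: the `PullRequest` fields, the three stage names (coding, pickup, review), and the choice of the median are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record shape; a real pipeline would pull these
    # timestamps from the source control system's event data.
    repo: str
    first_commit_at: datetime
    opened_at: datetime
    first_review_at: datetime
    merged_at: datetime


def cycle_time_breakdown(pr):
    """Split one PR's cycle time into three illustrative stages."""
    return {
        "coding": pr.opened_at - pr.first_commit_at,
        "pickup": pr.first_review_at - pr.opened_at,
        "review": pr.merged_at - pr.first_review_at,
    }


def median_breakdown_by_repo(prs):
    """Return {repo: {stage: median duration in hours}}."""
    per_repo = {}
    for pr in prs:
        stages = per_repo.setdefault(pr.repo, {})
        for stage, duration in cycle_time_breakdown(pr).items():
            hours = duration.total_seconds() / 3600
            stages.setdefault(stage, []).append(hours)
    return {
        repo: {stage: median(hours) for stage, hours in stages.items()}
        for repo, stages in per_repo.items()
    }
```

Breaking the total into stages is what makes the number actionable: a long "pickup" stage points at review latency, while a long "coding" stage points somewhere else entirely.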
Square Console has been another big win. This in-house developer portal serves as the nerve center for application management. Through it, developers can deploy and monitor services, manage nodes and resources, and much more. In fact, the streamlined processes within Square Console mean that we can spin up an internal-facing app and get it running in under 10 minutes. Owing to its efficiency and vital role, Square Console has become one of our top-rated internal tools.
Recognizing the pivotal role that developer tools play in productivity, our team developed a range of command line utilities. These ensure that development environments remain consistent across varied machines, simplifying collaborations on projects. One standout achievement was our ability to transform a lengthy 23-page onboarding document into a single-command setup process. To further simplify tasks, our team rolled out a CLI named “sq”, a versatile utility that consolidates various CLI tools used at Square. It streamlines many functions, from backend service management to build initiation. Additionally, we've launched a Bazel Build Results service, ensuring that developers can better understand build performance, troubleshoot issues, and make informed decisions about their codebase optimizations.
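For readers who haven't built a consolidating CLI before, the general pattern is a single entry point that dispatches to subcommands. The sketch below shows that pattern with Python's `argparse`. The program name `sq-sketch`, the `service` and `build` subcommands, and their arguments are hypothetical and do not describe the real "sq" interface.

```python
import argparse


def build_parser():
    # One top-level parser; each tool becomes a subcommand.
    parser = argparse.ArgumentParser(
        prog="sq-sketch",
        description="Illustrative single entry point for several tools.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand for backend service management.
    service = sub.add_parser("service", help="manage a backend service")
    service.add_argument("action", choices=["status", "restart"])
    service.add_argument("name")

    # Hypothetical subcommand for kicking off a build.
    build = sub.add_parser("build", help="start a build")
    build.add_argument("target")
    return parser


def dispatch(argv):
    """Parse argv and return a message describing the requested action."""
    args = build_parser().parse_args(argv)
    if args.command == "service":
        return f"service {args.name}: {args.action} requested"
    return f"build started for {args.target}"
```

In this sketch, `dispatch(["service", "restart", "payments"])` routes to the service handler. A real tool would call the underlying utilities instead of returning a string.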
In addition to software enhancements, we also understood the significance of having high-performance hardware. Our benchmarks showed clearly that by transitioning our mobile developers from Intel laptops to iMacs, we could save them as much as seven working days per year. Just when we believed we had reached an optimal hardware setup, the introduction of the M1 chips took the industry by surprise. On the basis of our strong benchmarks and research demonstrating the vast potential of the new hardware, we obtained approval to equip every engineer with the fastest available development machines.
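A figure like "working days saved per year" comes down to simple arithmetic on benchmark results. The sketch below shows one way to do that conversion. The function name, its parameters, and the default working-year values are assumptions for illustration, not the inputs behind our own benchmarks.

```python
def annual_days_saved(
    seconds_saved_per_build,
    builds_per_day,
    working_days_per_year=240,  # assumed value
    hours_per_working_day=8,    # assumed value
):
    """Convert a per-build time saving into working days saved per year."""
    seconds_per_year = (
        seconds_saved_per_build * builds_per_day * working_days_per_year
    )
    hours_per_year = seconds_per_year / 3600
    return hours_per_year / hours_per_working_day


# Made-up inputs, chosen only to show the arithmetic:
# 30 seconds saved per build, 40 builds per day.
example_days = annual_days_saved(30, 40)
```

With these made-up inputs, 30 seconds across 40 builds is 20 minutes a day, which adds up to 80 hours, or 10 working days, over a 240-day year. Small per-build savings compound quickly, which is why benchmark data makes a persuasive case for hardware upgrades.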
However, our quest for improvement isn't just about in-house tools and hardware. We also looked externally and integrated several tools into our workflow. We've introduced Bugsnag for crash reporting, Codecov.io to give us insights into test coverage, Temporal.io to handle long-running workflows, and platforms like Notion to promote team collaboration and knowledge sharing. Clockwise has been another great addition, helping optimize our schedules for longer, uninterrupted focus periods.
At the heart of all these changes has been one consistent aim: to listen to our developers. We've sought to understand their challenges and pain points, ensuring that every solution we introduce makes a real, positive difference in their daily work lives.
Key Learnings
Turning to the challenges and learnings that have shaped our journey, one key lesson stands out: the critical importance of effectively articulating the business case. While we initially assumed that the benefits of switching to faster machines would be self-evident, we quickly realized that we needed to strengthen our case with hard evidence. To address this, we joined forces with engineers from across the organization to set up automated benchmarking and to collect both data and firsthand feedback. The results were compelling: faster machines led to quicker feature development, more reliable applications, happier teams, and increased attractiveness to potential talent. This exercise highlighted not just the importance of presenting facts, but also the value of connecting those facts in a way that resonates with the perspectives and priorities of our stakeholders.
And here's the second bit: Relationships matter, especially in our line of work. We often collaborate with the same stakeholders project after project. Our success with the machine upgrades wasn't just about the tech; it was about the trust we built. By the time we started discussing the M1s, our GMs were already on board, trusting our judgment and execution. Our takeaway is to always nurture and value relationships. Each project lays the groundwork for the next, and having that foundation of trust makes everything smoother.
Here's another brief insight. We rolled out Cloud IDEs, a remote development environment for our Android developers. Using JetBrains Projector, you could run Android Studio on an EC2 instance that came with Android Studio, the Android SDK, Java, and Bazel pre-installed, so you didn't have to deal with configuration and setup. You could choose machines with up to 64 cores and 96 GB of memory. Then, we introduced the option of using iMacs. Interestingly, even with a trade-off in peak performance, developers gravitated towards the iMacs. Why? They offered a more seamless development experience. The key lessons here? First, given a choice, developers often prioritize a smoother development experience even if it comes at a slight expense of performance. Second, it's a testament to our commitment: we're more than willing to pivot, even if it may impact our own initiatives, as long as it best serves our users.
Last but not least, in our day-to-day focus on developer productivity, we're obsessed with metrics ranging from CI and local build times to IDE sync times, main branch stability, tooling error rates, tool adoption and retention, and even developer satisfaction. However, we have come to realize that numbers alone can't capture the full scope of our progress. For instance, during our recent two-month trial with GitHub Copilot, many engineers reported feeling more productive, even though our predefined success metrics, such as the time from cutting a feature branch to submitting a pull request, didn't actually move. This experience underscores the importance of complementing hard metrics with qualitative insights and feedback. We therefore run biannual developer surveys, which provide invaluable context. It's a balanced approach: while metrics provide key data points, the human perspective is equally critical for a comprehensive understanding of developer productivity.
Wrapping Up
In this part of our series, we explored how Square has improved the developer experience by introducing better tooling, upgrading hardware, and integrating external platforms to enhance productivity. Stay tuned for the next installment, where we'll dive into how we've invested in reliability and test engineering.