How do we optimise Scala build times? by James Thompson

February 14, 2024

In the presentation “How do we optimise Scala build times?”, James Thompson, a developer at the Scala Center, discusses techniques for reducing Scala build times using standard ecosystem tools. Scala, a functional programming language that runs on the Java Virtual Machine, has a complex compiler with over 100 transformation phases, which can lead to lengthy build times. James emphasizes the importance of efficient builds and gives tips on arranging projects to take advantage of CPU resources. He introduces Zinc, an incremental compiler for Scala, which uses a name hashing algorithm to compute dependencies and recompile only the files that need it. James also discusses how small files, parallelism, and pipelining improve build times.

How do we optimise Scala build times? A comprehensive overview

Understanding Scala’s compilation complexity

Scala, a functional programming language that runs on the Java Virtual Machine (JVM), boasts a complex compiler with over 100 transformation phases. This complexity necessitates efficient build processes to harness CPU resources effectively. James Thompson kicks off his discussion by highlighting the intricate nature of Scala’s compiler and the mission of the Scala Center to support and improve the open-source Scala ecosystem.


Incremental compilation: A time-saving technique

One significant pain point in Scala development is the time-consuming nature of repeated compilations triggered by even minor code changes. James introduces incremental compilation as a solution. This technique recompiles only the affected files when a change is made, rather than rebuilding the entire project. By doing so, it significantly reduces build times while ensuring that the program remains functional and correct.
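The idea can be sketched as a walk over a file-level dependency graph. This is a deliberately simplified model, not Zinc's actual algorithm: the file names and the `dependsOn` map are hypothetical, and the sketch pessimistically recompiles every transitive dependent of a changed file.

```scala
object IncrementalSketch {
  // Hypothetical file-level dependency map: file -> files it depends on.
  val dependsOn: Map[String, Set[String]] = Map(
    "Model.scala"   -> Set.empty[String],
    "Service.scala" -> Set("Model.scala"),
    "Api.scala"     -> Set("Service.scala"),
    "Util.scala"    -> Set.empty[String]
  )

  // Files to recompile after `changed` is edited: the changed files
  // plus everything that depends on them, directly or transitively.
  def toRecompile(changed: Set[String]): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        // Files not yet seen that depend on something in the current frontier.
        val dependents = dependsOn.collect {
          case (file, deps) if deps.exists(frontier) && !seen(file) => file
        }.toSet
        loop(dependents, seen ++ dependents)
      }
    loop(changed, changed)
  }
}
```

With this graph, editing Util.scala recompiles only that file, while editing Model.scala also pulls in Service.scala and Api.scala. A full rebuild would compile all four files in both cases.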


The role of Zinc in incremental compilation

Zinc, an incremental compiler for Scala, plays a crucial role in optimizing build times. James explains how Zinc uses a name hashing algorithm to maximize correctness and performance. When a file is edited, Zinc leverages a cache to determine what needs to be recompiled, allowing it to reuse previous compilation products efficiently. This approach minimizes unnecessary recompilations and maintains a high level of build efficiency.


Advantages of smaller files and analysis jars

James advocates for the use of smaller files in Scala projects. Incremental recompilation works at the level of source files, so when files are small, each change pulls less code back into the compiler, which speeds up the build. He also introduces the concept of the analysis jar, a cache that tracks file stamps, API structures, and dependencies. The incremental compiler uses this information to determine which files need recompilation after a change, further reducing build times.
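A minimal sketch of the file-stamp part of such a cache is shown below. It assumes that a stamp is a content hash; SHA-256, the `StampSketch` object, and its method names are illustrative choices, not Zinc's API or storage format.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object StampSketch {
  // Content hash used as the "stamp" of a source file (SHA-256 is an assumption).
  def stamp(content: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(content.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b))
      .mkString

  // cachedStamps: file -> stamp recorded by the previous build.
  // currentSources: file -> current content.
  // Returns the files that are new or whose content no longer matches the cached stamp.
  def changedFiles(
      cachedStamps: Map[String, String],
      currentSources: Map[String, String]
  ): Set[String] =
    currentSources.collect {
      case (file, content) if !cachedStamps.get(file).contains(stamp(content)) => file
    }.toSet
}
```

Only the files returned by `changedFiles` are candidates for recompilation. The API and dependency information in the cache then decides how far each change has to propagate.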


Name hashing algorithm and dependency graphs

Efficient dependency management is vital for optimizing build times. James delves into the name hashing algorithm, which tracks dependencies on function signatures, classes, and specific usages. By building a dependency graph, the algorithm identifies what needs to be recompiled, so that only the necessary files are processed. This method, implemented in tools like sbt, enhances the efficiency of the build process.
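The core idea can be illustrated with a toy model. It assumes that a file's API is a map from member name to signature and that each dependent file records the names it uses. The `Api` type, the file names, and the use of `String.hashCode` as the hash are all hypothetical simplifications, not the real Zinc data structures.

```scala
object NameHashingSketch {
  // Hypothetical API model: member name -> signature.
  type Api = Map[String, String]

  // Names whose signature hash differs between two versions of an API.
  // This covers added, removed, and modified members.
  // String.hashCode stands in for a real hash function.
  def changedNames(before: Api, after: Api): Set[String] =
    (before.keySet ++ after.keySet).filter { name =>
      before.get(name).map(_.hashCode) != after.get(name).map(_.hashCode)
    }

  // usedNames: dependent file -> names it uses from the changed API.
  // A dependent is invalidated only if it uses at least one changed name.
  def invalidated(changed: Set[String], usedNames: Map[String, Set[String]]): Set[String] =
    usedNames.collect {
      case (file, used) if used.exists(changed) => file
    }.toSet
}
```

If the signature of `area` changes while `name` stays the same, a file that only uses `name` is left alone even though it depends on the edited file. That is what makes this finer grained than invalidating every dependent file.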


Parallelism and pipelining for faster builds

James explores advanced techniques like parallelism and pipelining to further reduce build times. By splitting projects into smaller modules and scheduling them to run in parallel, developers can better utilize CPU resources. James’ work on a pipelined build model, where downstream projects start compilation before upstream projects are complete, exemplifies this approach. His experiments, such as with the Scala 3 compiler, show significant time savings, especially for large projects with complex dependencies.
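To see why pipelining helps, consider a toy scheduling model. Everything in it is an assumption made for illustration: the module names, the timings, the idea that a module's signatures become available at a fixed point during its compilation, and an unlimited number of cores.

```scala
object PipelineSketch {
  // Hypothetical module: total compile time, time at which its signatures
  // are available to downstream modules, and the modules it depends on.
  final case class Module(compileTime: Double, signaturesAt: Double, dependsOn: List[String])

  val modules: Map[String, Module] = Map(
    "core" -> Module(10.0, 3.0, Nil),
    "util" -> Module(6.0, 2.0, Nil),
    "app"  -> Module(8.0, 2.0, List("core", "util"))
  )

  // One module after another on a single core.
  def sequential: Double = modules.values.toList.map(_.compileTime).sum

  // Earliest start: without pipelining a module waits for its upstream modules
  // to finish; with pipelining it waits only for their signatures.
  def start(name: String, pipelined: Boolean): Double =
    modules(name).dependsOn.map { up =>
      if (pipelined) start(up, pipelined) + modules(up).signaturesAt
      else finish(up, pipelined)
    }.foldLeft(0.0)((a, b) => math.max(a, b))

  def finish(name: String, pipelined: Boolean): Double =
    start(name, pipelined) + modules(name).compileTime

  // Total build time when independent modules run in parallel.
  def makespan(pipelined: Boolean): Double =
    modules.keys.toList.map(finish(_, pipelined)).max
}
```

In this model the build takes 24 time units sequentially, 18 with parallel modules, and 11 with parallelism plus pipelining, because `app` can start as soon as the signatures of `core` and `util` are known. Real builds have a limited number of cores and less predictable timings, so actual gains depend on the shape of the dependency graph.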


Practical examples and benchmarking

James provides practical examples and benchmarks to illustrate the effectiveness of these optimization techniques. For instance, his implementation of parallelism in the Lichess project resulted in a 25% speed boost. Similarly, the Scaladex project saw a 31% improvement due to its layered dependencies. However, he also notes that not all projects benefit equally, as seen with the Guardian newspaper’s website, where the scheduler faced limitations due to many small modules.


Outlining and pipelining in compilation

To reduce build times further, James discusses his experiment with outline compilation, which speeds up type checking by skipping definition bodies. Combining outlining with pipelining, he achieved a 1.5x speed-up in compiling the “tasty-core” project using Scala 3. These techniques, still under refinement, promise substantial improvements for complex Scala projects.
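The small example below illustrates what skipping definition bodies means in practice. It rests on an assumption that is common for this kind of pass but is not confirmed by the talk summary: that a signature can only be read without its body when the result type is written explicitly. The object and method names are hypothetical, and `toIntOption` requires Scala 2.13 or later.

```scala
object OutlineSketch {
  // Explicit result type: the signature is known without type checking the body,
  // so an outline pass could skip everything after the `=`.
  def parsePort(s: String): Option[Int] =
    s.toIntOption.filter(p => p > 0 && p < 65536)

  // Inferred result type: the body must be type checked to learn
  // that this method returns Option[Int].
  def defaultPort = parsePort("8080")
}
```

Under that assumption, annotating the result types of public members keeps more of a module's API available to downstream modules early, which is where the combination with pipelining pays off.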


Future goals and community engagement

James outlines his goals for the future, focusing on optimizing build times and exploring new techniques like pipelining and parallelism. He acknowledges the challenges posed by the compiler’s mutability and micro-optimization but remains committed to enhancing Scala’s build efficiency. James also invites the community to join the Scala Center’s Discord server for further discussion and collaboration on optimization strategies.

In conclusion, James Thompson’s presentation offers valuable insights into optimizing Scala build times through incremental compilation, dependency management, parallelism, and innovative techniques like pipelining and outlining. By adopting these strategies, Scala developers can significantly improve their build efficiency, enhancing productivity and the overall development experience.


Additional resources

Check out more from the Func Prog Sweden meetup. Func Prog Sweden is the community for anyone interested in functional programming. At the meetups, the community explores different functional languages like Erlang, Elixir, Haskell, Scala, Clojure, OCaml, F# and more.