Getting Started

Writing Rust usually brings happiness, but sometimes that happiness is abruptly cut short by a new language quirk. The beauty, however, is that the more quirks you run into, the easier things become.

One such quirk is calling an asynchronous recursive function. While synchronous recursion may be fine for some applications, it is critical to understand its impact on other asynchronous code running alongside it.

Synchronous Traversal

Let’s take an elementary example: traversing all the files in a directory and printing each path to stdout. The synchronous version of the code looks something like this:

use std::path::Path;
use std::fs::read_dir;
use std::io;

fn traverse(path: impl AsRef<Path>) -> io::Result<()> {
    let path = path.as_ref();
    println!("{}", path.to_string_lossy());
    
    if path.is_dir() {
        for entry in read_dir(path)? {
            traverse(entry?.path())?;
        }
    }
    
    Ok(())
}

The above version of the code works; for many use cases, it will work flawlessly. In an asynchronous context, however, it poses a big problem: blocking IO. Because the code runs synchronously, it can, depending on the circumstance, prevent the executor from driving other tasks while it waits on the filesystem.
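Before moving on, here is a runnable sketch of the synchronous version end to end. The directory names (`traverse_demo`, `sub`, the `.txt` files) are made up purely for illustration:

```rust
use std::fs::{self, read_dir};
use std::io;
use std::path::Path;

fn traverse(path: impl AsRef<Path>) -> io::Result<()> {
    let path = path.as_ref();
    println!("{}", path.to_string_lossy());

    if path.is_dir() {
        for entry in read_dir(path)? {
            traverse(entry?.path())?;
        }
    }

    Ok(())
}

fn main() -> io::Result<()> {
    // Build a small throwaway tree under the OS temp dir.
    let root = std::env::temp_dir().join("traverse_demo");
    fs::create_dir_all(root.join("sub"))?;
    fs::write(root.join("a.txt"), b"")?;
    fs::write(root.join("sub").join("b.txt"), b"")?;

    // Prints the root, then every entry recursively.
    traverse(&root)?;

    fs::remove_dir_all(&root)?;
    Ok(())
}
```

Note that each call to `read_dir` and each `entry?` here blocks the calling thread until the filesystem responds, which is precisely what becomes a problem once an async executor is involved.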

Concurrency

An asynchronous chunk of work is referred to as a Future in Rust. Futures are lazy by default: they do nothing on their own and require an executor to drive their internal state from start to finish. You can think of a future as a state machine.
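To make the state-machine idea concrete, here is a minimal hand-rolled future. The `CountDown` type and the `drive` loop are illustrative names, not part of any library; the sketch only uses the standard library's `Future`, `Poll`, and `Waker` types:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A hand-rolled future whose entire "state" is a remaining count: it
// reports Pending that many times before finally reporting Ready.
struct CountDown(u32);

impl Future for CountDown {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            Poll::Ready(())
        } else {
            self.0 -= 1;
            Poll::Pending
        }
    }
}

// A waker that does nothing; enough for a toy executor that just polls
// in a loop instead of waiting to be woken.
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker {
        RawWaker::new(p, &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Drive the future to completion by polling it repeatedly, returning
// how many polls it took -- this is the executor's job in miniature.
fn drive(mut fut: CountDown) -> u32 {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(()) = Pin::new(&mut fut).poll(&mut cx) {
            return polls;
        }
    }
}

fn main() {
    // Three Pending results, then one Ready: four polls in total.
    println!("completed after {} polls", drive(CountDown(3)));
}
```

Nothing happens to `CountDown` between polls; it sits there holding its state until the executor asks it to make progress again.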

If a finite number of futures, $F_1, \dots, F_k$, is to be executed on a machine with available core threads $T_1, \dots, T_n$, where $n < k$, under the constraint of completing the execution as quickly as possible, we need an efficient way to schedule each of those tasks for execution. People could approach the scheduling problem differently depending on who you speak with.

<aside> <img src="/icons/light-bulb_yellow.svg" alt="/icons/light-bulb_yellow.svg" width="40px" /> A dedicated core thread is spawned for each available CPU core on the machine.

</aside>

<aside> <img src="/icons/light-bulb_yellow.svg" alt="/icons/light-bulb_yellow.svg" width="40px" /> IO-intensive workloads are different from CPU-intensive workloads. The Tokio docs provide a good explanation of the difference.

</aside>

One efficient way is to make progress on a subset of the futures in tiny bits at every point: work on a few futures, suspend one or more after a while, jump quickly to another future, make some progress, jump back, and repeat until all the tasks have been completed.
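The jump-around strategy above can be sketched as a toy round-robin "executor". The `Task` struct and `run_round_robin` function are invented for this illustration (a real executor would park threads and rely on wakers rather than spin through a `Vec`):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A future that needs `remaining` more polls before it is ready,
// standing in for a task that repeatedly yields back to the executor.
struct Task {
    name: &'static str,
    remaining: u32,
}

impl Future for Task {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<&'static str> {
        if self.remaining == 0 {
            Poll::Ready(self.name)
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}

// A do-nothing waker so the toy scheduler can poll in a busy loop.
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker {
        RawWaker::new(p, &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Poll each pending task once per round -- a little progress on every
// future, suspend, move to the next -- until all of them complete.
// Returns the task names in completion order.
fn run_round_robin(mut tasks: Vec<Task>) -> Vec<&'static str> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut finished = Vec::new();
    while !tasks.is_empty() {
        let mut still_pending = Vec::new();
        for mut t in tasks {
            match Pin::new(&mut t).poll(&mut cx) {
                Poll::Ready(name) => finished.push(name),
                Poll::Pending => still_pending.push(t),
            }
        }
        tasks = still_pending;
    }
    finished
}

fn main() {
    let order = run_round_robin(vec![
        Task { name: "a", remaining: 2 },
        Task { name: "b", remaining: 1 },
    ]);
    // "b" needs fewer polls, so it finishes first: ["b", "a"]
    println!("{order:?}");
}
```

Even though "a" was submitted first, "b" completes first because the scheduler never camps on a single task; every task advances a little on each round.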

<aside> <img src="/icons/light-bulb_yellow.svg" alt="/icons/light-bulb_yellow.svg" width="40px" /> Resources are limited; instead of spending a long time on a single task, we want to progress on each task almost simultaneously (concurrently).

</aside>

Executor

As I mentioned earlier, a Future needs an executor to drive it to completion. The Rust ecosystem provides a few options, but one very common choice is the tokio crate, which offers several utilities for IO-intensive workloads.

<aside> <img src="/icons/light-bulb_yellow.svg" alt="/icons/light-bulb_yellow.svg" width="40px" /> I’ll primarily use tokio as my default executor for the rest of this article.

</aside>

Drawing a full distinction between how tokio handles these varying workloads is outside the scope of this article; refer to the docs for a more thorough explanation. Most executors provide an efficient way to execute either kind of workload so that it doesn't affect the execution of other asynchronous tasks.