Swift async/await: do you even need a queue?

Alternative title: A concurrency limiter for Swift async/await

This post is really for me and my future self to help clarify my thinking about modeling asynchronous operations in Swift async/await. But I suppose you can read along as well.

In the world prior to async/await I used dispatch queues and OperationQueues to model asynchronous operations. As a result, my thinking about async naturally reverts to the hammer of execution queues. However, I’m starting to realize execution queues are sometimes not a good fit for modeling async operations in Swift async/await. In this post, I’m going to walk through an example where I found that an execution queue wasn’t what I wanted. Instead I wanted to simply limit the number of Tasks that could concurrently pass through a certain block of code. I’ll end with how I solved the limiter problem.

An example of my broken thinking

To return to my previous post’s example, let’s say I’m building an image cache. Part of the caching infrastructure is to actually download the image from the network. Since I can overload the network if I try to download too many images at once, I want to limit this part to ensure only so many download operations are happening concurrently.

The “classic” way to handle the limiting is to schedule the image download operation in an OperationQueue and make sure it has its maxConcurrentOperationCount set appropriately. Therefore, naturally, I turned to execution queues as a way to solve my concurrent downloads problem. I tried to build a Swift async/await “native” version of a concurrent execution queue. It got something that “mostly” worked, for some definition of “mostly”. And “work”. But what I couldn’t make work is a clean way to get the downloaded image back to the caller.

When I run into difficulties like this, I stop and try to figure out why it’s difficult. Usually it’s because there’s a mismatch between my mental models, and I’m trying to force a square peg into a round hole.

For my image cache, each fetch image operation is modeled as a Task. In pseudo-code something like this:


Task.detached {
    if let image = await memoryCache.image(for: url) {
        return image
    }
    if let image = await diskCache.image(for: url) {
        return image
    }

    downloadQueue.run {
        // This block actually runs in Task created by the Queue
        let image = await network.downloadImage(for: url)

        // TODO: how do I get `image` back to the calling Task?
    }

    //...
}

Things flow well until I need to actually download the image and want to limit how many concurrent ones via the downloadQueue. How do I get the image back to the calling Task? I mean, I guess I might figure out a way to make a callback work? That feels super janky, and possibly not safe.

However, looking at this code is when I first realized a couple of things. First, I was switching execution contexts to do the download. Why? I have a perfectly good detached Task already that I would prefer to be executing in. Tasks are supposed to be “recipes”, i.e. a list of steps to perform, which should include the actual download. So, I didn’t need the execution part of an execution queue. Second, what I actually needed it for was limiting the number of concurrent operations; that was its real purpose.

Therefore, my actual problem is I need a way to limit how many concurrent Tasks are calling await network.downloadImage(for: url) at the same time. Can I solve that problem instead?

A concurrency limiter

I’m going to solve this problem of queues with a queue. Obviously. The difference is this queue won’t provide any execution. The calling Task will do that.

I like starting with what I’d like to have at the call site:


Task.detached {
    if let image = await memoryCache.image(for: url) {
        return image
    }
    if let image = await diskCache.image(for: url) {
        return image
    }

    let image = downloadLimiter.run {
        // This runs on the calling task. No extra tasks!
        await network.downloadImage(for: url)
    }

    //...
}

The key bit is the closure is passed to run is executed on the calling Task. That makes returning the image back trivial. A call site like the above example would imply an interface something like:


public final class ConcurrencyLimiter {
    /// Concurrency is the maximum number of concurrent blocks to allow to run
    public init(concurrency: Int) 

    /// Execute the given block on the calling Task. It may wait first if there
    /// are already the maximum blocks running.
    public func run<Value>(_ block: @escaping () async throws -> Value) async throws -> Value
}

It has an init so it can know the maximum number of concurrent blocks, and then just a run method that executes the passed in closure. There’s also a non-throwing variant of run but it’s a simplified version of the throwing one, so it’s less interesting.

I’ll start on the outside and work my way inside to the details of the implementation. Here’s how I built run:


public func run<Value>(_ block: @escaping () async throws -> Value) async throws -> Value {
    let condition = await counter.enterLimiting()
    await condition.wait()

    do {
        let value = try await block()
        await counter.exitLimiting()
        return value
    } catch {
        await counter.exitLimiting()
        throw error
    }
}

First things first: counter is an actor owned by the ConcurrencyLimiter. It does all the… counting… of how many tasks are running at one time. It’ll also decide which task goes next. So maybe I should have called it Scheduler? This is chaos that happens when you don’t have code reviews. smh.

run calls into counter to ask to enter the limited section. The counter returns a Condition which the calling Task immediately waits on. The Condition is described in my previous post, and is what allows Counter to control which Task goes next. Once the Task returns from the call to condition.wait() it executes its code. Before returning or throwing it calls back into counter on exitLimiting() to let it know this Task is done.

That’s it for run. Most of the heavy lifting is in Counter so I’ll look at that next.


final actor Counter {
    /// The maximum amount of currency allowed
    private let concurrency: Int
    /// How many blocks are currently in flight
    private var inflightCount = 0
    /// Pending (blocked) blocks in a FIFO queue.
    private var pending = [Signal]()

    init(concurrency: Int) {
        self.concurrency = concurrency
    }
}

There’s not much data needed by the scheduler. Um… I mean Counter. Counter. It has the maximum number of concurrent tasks allowed, how many are actually in flight right now, then an array of Tasks in priority order represented by the Signal that will unblock them. That’s it.

enterLimiting() is called by the Task when it wants to enter the limited section:


func enterLimiting() -> Condition {
    let shouldWait = inflightCount >= concurrency

    let (condition, signal) = Condition.makeCondition()
    if shouldWait {
        pending.append(signal)
    } else {
        // immediately signal and let it run
        inflightCount += 1
        signal.signal()
    }

    return condition
}

This method is in an actor, and there are no awaits in here, so this method happens without reentrancy. That’s important. It computes if the caller can immediately execute or it should wait. Either way, it creates a Condition/Signal pair so it can return the Condition. If the calling Task can execute immediately, it acts like it has already started by incrementing in the inflight count and pre-signaling (i.e. unblocking) the Condition. If the calling Task needs to wait, the Signal is appended to the end of the waiting queue.

The other end of the process is when the calling Task leaves the limited section.


func exitLimiting() {
    inflightCount -= 1
    let shouldUnblock = inflightCount < concurrency

    guard shouldUnblock, let firstPending = pending.first else {
        return
    }
    pending.removeFirst()
    inflightCount += 1
    firstPending.signal()
}

It immediately updates the inflight count and determines if it can unblock any of the pending Tasks. If it can and there are tasks waiting, it counts the Task as started and unblocks by signaling the Task‘s condition.

You might have noticed this is a simple FIFO queue for prioritization. However, really any prioritization scheme could be substituted in.

Finally, I put a little bit of “what if” insurance in the deinit of the actor. Basically, what happens if the Counter goes out of scope, but there are still Tasks blocked on it?


deinit {
    let localPending = pending
    pending.removeAll()
    for local in localPending {
        local.signal()
    }
}

I make a local copy because I’m just extra paranoid (probably don’t need that), then go through and unblock everything. Presumably no callers care anymore if we’re going out of scope, but I don’t want to leave any Tasks blocked.

(As an aside, I remember from my undergrad days this kind of data structure being called a “monitor.” But Wikipedia says I’m wrong.)

Conclusion

In this post, I described my learning journey where I realized that I didn’t need an execution queue like OperationQueue for my particular problem. Instead, I wanted to let each Task bring its own execution, and just needed a queue to limit the number of concurrent executions. Finally, I demonstrated a simple implementation of a ConcurrencyLimiter using the Condition/Signal pair I introduced in my previous post.

Modeling condition variables in Swift async/await

In the brave new world of Swift async/await, a problem I sometimes encounter is how to communicate between Tasks. More specifically, if I need one Task to wait on a condition to be true as a result of work that’s done by another set of Tasks. In this post I’ll cover how I eventually ended up solving this using something like a condition variable.

Refining the problem scope

There’s more than one way to handle Task synchronization, so I want to start by refining the problem that I’m trying to solve. The hope is it’ll be clearer why I chose the solution I did.

One of the easiest ways to have one Task wait on another’s work is to package up that work in it’s own Task then just await that. Here’s a oversimplified example of a pattern that I’ve used:


final actor Thingy {
    private let loadTask: Task<Void, Never>

    /// User might call this at some point, repeatedly
    func processData() async {
        // Waiting on the task to ensure it's done before continuing
        await loadTask.value

        // Ok, now I can do my processing safe that we've loaded
    }
}

In this example, loading the data can take a while and I want to make sure it’s loaded before doing any on-demand processing of it. I also don’t want to do the load repeatedly, so I wrap it up in Task and set it as a property. Any Task that needs load to be completed just awaits the value. (This is an extremely contrived example to demonstrate a pattern. Don’t email how you’d refactor it.) The point here is awaiting the result of a Task is a nice, straight forward mechanism for synchronizing work between two tasks.

However, there’s a variation of this problem that I want to solve in this post. It’s not that a specific task has completed, but a condition has become true. That sounds subtle, but there’s actually a big difference. Multiple Tasks may be working and any one them might make the condition become true.

For example, suppose I want to limit the number of concurrent Tasks that are downloading images to be no more than 10. When a Task starts to download an image it needs to know if there’s less than 10. If there are 10 or more, it wants to wait until one of the existing Task has finished. This is where awaiting on a single Task as a solution falls apart. The waiting Task actually wants to know when the concurrency count goes below 10, not when a specific Task completes (although they are correlated). It can’t await all the executing Tasks because it doesn’t need to wait until all Tasks complete. If it awaits a single Task, it also might wait longer than necessary if a different Task from the one its awaiting completes first. Of course this is all ignoring that there could be many Tasks waiting for the concurrency count to go below 10, and how do I ensure only one of them continues when it drops to 9 concurrent?

(Also, I know the classic way to solve the image download concurrency limiting is to use OperationQueue. But in a future post, I’m going to argue that’s not ideal in an async/await world for this problem.)

To summarize: I want to have a Task to wait on a condition to become true, which isn’t the same as waiting on a Task to complete.

Classic approach

If I was still in the bad old days of pthreads, I’d reach for a condition variable. I’d have the queued Task wait on it, and when the concurrent count dropped below 10, the just completed download Task would signal the condition variable. But I can’t actually use pthread condition variables because they’d block the underlying system thread, which would hang one of the thread resources used by Swift’s cooperative pool. Then everybody would just be sad.

However, what if I could construct something similar to a condition variable that stays in async/await land? It wouldn’t actually need to do everything a traditional pthread condition does, like make sure only one Task unblocks because I could handle that manually.

Let me sketch out what that might look like:


struct Condition {
    func wait() async {}

    static func makeCondition() -> (Condition, Signal)
}

final class Signal {
    func signal() {}
}

With this approach, the Task that wants to download an image can call into the queue that knows how many downloads are currently going. The queue constructs a Condition/Signal pair, keeps the Signal and returns back the Condition to the calling Task. The calling Task then wait()s on that Condition. When the queue decides it’s time for that specific Task to go (it can keep a prioritized array of Signals), it calls signal() on the Signal and the paired Condition unblocks.

This behavior would solve my stated problem: my Task could wait on a condition to become true, instead of waiting on a specific Task.

Implementation

I chose to use an AsyncStream to build this functionality. Basically, the Condition.wait() is going to sit in an async/await loop waiting on the AsyncStream to complete. The Signal.signal() calls finish() on the AsyncStream‘s continuation to unblock it.


/// Signal is used in conjunction with Condition. Together they allow
/// one Task to wait on anther Task.
public final class Signal {
    private let stream: AsyncStream<Void>.Continuation

    /// Private init, don't call directly. Instead, use Condition.makeCondition()
    fileprivate init(stream: AsyncStream<Void>.Continuation) {
        self.stream = stream
    }

    /// Signal the waiter (who has the Condition) that they're good to go
    public func signal() {
        stream.finish()
    }
}

/// Condition allows two async Tasks to coordinate. Use `makeCondition()` to
/// create a Condition/Signal pair. The Task that wants to wait on something to
/// happen takes the Condition, the Task that notifies of the condition takes
/// the Signal.
public struct Condition {
    private let waiter: () async -> Void

    /// Private init; create a closure that will can be waited on
    fileprivate init(waiter: @escaping () async -> Void) {
        self.waiter = waiter
    }

    /// Wait on the condition to become true
    public func wait() async {
        await waiter()
    }

    /// Construct a Condition/Signal pair. The Task that wants to wait on something to
    /// happen takes the Condition, the Task that notifies of the condition takes
    /// the Signal.
    public static func makeCondition() -> (Condition, Signal) {
        let (stream, continuation) = AsyncStream<Void>.makeStream()
        let condition = Condition {
            for await _ in stream {}
        }
        let signal = Signal(stream: continuation)
        return (condition, signal)
    }
}

Conclusion

In this post I described a variation of a Task synchronization problem. In this variation a Task wants to wait on a condition to become true, as opposed to a Task being completed. I then introduced simplified version of a traditional synchronization mechanism called a condition variable as a mechanism for solving this problem. Finally, I demonstrated a working async/await solution using AsyncStream.

Read the follow up post.