Alexito's World

A world of coding 💻, by Alejandro Martinez

The importance of cooperative cancellation

One of the most important aspects to understand about Swift Concurrency is how cancellation of async tasks works. You may expect that when a task is cancelled, it immediately stops, like when you kill a process. But that's not how it works at all: cancellation in Swift is cooperative.

Updated 06/01/2022: Task.sleep now respects cancellation. Improved the timeout example. Linked to Baggins package.

Cooperative cancellation

A cooperative system is one where all the involved parts need to cooperate in order to accomplish a goal. Here, the goal is cancellation, and the parts are all the async tasks involved. This means that when a parent task is cancelled, its responsibility is to tell all the children that they have been cancelled.

When a parent task does this, it sets each child's Task.isCancelled flag to true, and that's it! That's where the work of the parent task finishes. This is the point where cooperation is needed: it is up to the children to check that flag and stop their work early.

Note that there is another API for checking for cancellation: try Task.checkCancellation(), which immediately throws a CancellationError up the call stack. This is a convenient way of handling cancellation if your function is already throwing.
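To make the two styles concrete, here is a minimal sketch of the same loop written both ways. The function names sumUntilCancelled and sumOrThrow are made up for this example; only Task.isCancelled and Task.checkCancellation() come from Swift itself.

```swift
// Flag style: works in a non-throwing function, and we decide what to return.
func sumUntilCancelled(_ values: [Int]) async -> Int {
    var total = 0
    for value in values {
        // Stop early and keep whatever partial result we have so far.
        if Task.isCancelled { break }
        total += value
    }
    return total
}

// Throwing style: checkCancellation() throws CancellationError for us.
func sumOrThrow(_ values: [Int]) async throws -> Int {
    var total = 0
    for value in values {
        try Task.checkCancellation()
        total += value
    }
    return total
}
```

The flag style lets the function choose what a cancelled call returns (here, a partial sum), while the throwing style leaves that decision to whoever catches the error.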

In other words, if the child Tasks never check for cancellation, it's like the system doesn't support cancellation at all.

This is an important aspect of Structured Concurrency, since it ensures that tasks are never terminated without letting them clean up their resources, and also that no task is left hanging in the ether, since the parent task always waits for all its children to finish.
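As a rough illustration of the first guarantee, here is a minimal sketch in which a cancelled task still releases what it acquired, because cancellation surfaces as an ordinary thrown error rather than an abrupt stop. The Resource actor and the process(using:) function are hypothetical names invented for this example.

```swift
// Hypothetical resource that must be closed once we are done with it.
actor Resource {
    private(set) var isOpen = false
    func open() { isOpen = true }
    func close() { isOpen = false }
}

// Hypothetical unit of work that cooperates with cancellation.
func process(using resource: Resource) async throws {
    await resource.open()
    do {
        for _ in 0..<1_000 {
            // Cancellation shows up here as a thrown CancellationError...
            try Task.checkCancellation()
            await Task.yield()
        }
    } catch {
        // ...so the cleanup code still gets a chance to run.
        await resource.close()
        throw error
    }
    await resource.close()
}
```

The cleanup is written with do/catch rather than defer because closing the resource needs an await, which a defer block does not allow.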

Another aspect where cooperation is necessary is to avoid resource starvation on long running tasks. But that's a topic for another day.

Good citizens

Cooperative cancellation has its benefits, but it also means that we need to be conscious about implementing async functions that cooperate. Let's take a look at what impact it has.

I want to show you how to use the tools that Swift Concurrency gives us to implement a couple of common requirements when dealing with asynchronous work: timeouts and races. I think they are excellent examples of how important cooperative cancellation is, since this functionality relies heavily on early cancellation.

Timeout

One of the significant advantages of structured concurrency is that it provides a way to pass context down the tree of tasks. One thing that the context can help with is timeouts.

Right now that's not something that is part of Swift's standard library, but we can imagine a function that would provide that functionality:

func withTimeout(
    _ seconds: Double,
    _ work: @escaping () async throws -> Void
) async throws {
    // ... 
}

I've implemented this with the existing tools by using Task.sleep() to know when the timeout has passed. Because we need to run both things concurrently, we need a TaskGroup:

    try await withThrowingTaskGroup(of: TimeoutResult.self) { group in
        group.addTask {
            try await work()
        }
        group.addTask {
            try await Task.sleep(seconds: seconds)
        }
        // ...
    }

To get the behaviour we want, we add a child task to the group and another "control" task. Both will run concurrently thanks to the TaskGroup and the group scope won't end until both finish. The trick then is to have a sleep in the control task, so it ends at the moment our timeout is reached.

Finally, we just need to wait for one of the tasks to finish and cancel the other one.

        try await group.next()
        group.cancelAll()

If the first one that finishes is the timeout, it means we are cancelling the given work function early. If the passed function finishes first, then we just cancel the sleep.
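Putting the pieces together, the whole function could look roughly like the sketch below. It makes a few assumptions that differ from the snippets above: the group returns Void instead of the TimeoutResult type (which is not shown in this article), it calls the standard Task.sleep(nanoseconds:) rather than a Task.sleep(seconds:) helper, and the closure is marked @Sendable so it compiles under strict concurrency checking.

```swift
func withTimeout(
    _ seconds: Double,
    _ work: @escaping @Sendable () async throws -> Void
) async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        // The work we want to bound in time.
        group.addTask {
            try await work()
        }
        // The "control" task: it finishes when the timeout elapses.
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
        }
        // Wait for whichever child finishes first...
        _ = try await group.next()
        // ...and ask the other one to stop. The group still waits for it to
        // finish before returning, so the work must cooperate.
        group.cancelAll()
    }
}
```

In this simplified form the caller cannot tell whether the work finished or the timeout fired; the function just returns once both children are done.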

Now we can use this function:

try await withTimeout(5) {
    await busyWork()
}

This will make sure that busyWork is cancelled after 5 seconds if it's still running.

Cooperation

Now let's imagine that the busyWork function is not very cooperative and doesn't check for cancellation:

func busyWork() async {
    print("Starting busyWork")
    for i in 0...100 {
        print(i)
        await nonCooperativeWork()
    }
}

As you can see, this function doesn't check for cancellation, and if we assume that nonCooperativeWork doesn't check either, we have a dangerous scenario for cooperative cancellation. Running this will cause the following output:

Starting busyWork
0
...
5
cancel finished first
6
...
// all the way to 100!

This is not ideal. So let's make sure our function checks for cancellation when appropriate:

func busyWork() async {
    print("Starting busyWork")
    for i in 0...100 {
        // Let's check for cancellation on every iteration
        if Task.isCancelled { return }
        print(i)
        await nonCooperativeWork()
    }
}

Now, if we run this, we see that the timeout has the effect we desire:

Starting busyWork
0
...
5
cancel finished first
Program ended with exit code: 0

I hope that this illustrates the importance of being a good citizen in a world of cooperative cancellation.

Note that this is a very simplistic implementation of the timeout. To start, a real implementation would let you return a value from the closure; beyond that, it would handle edge cases and inherit the timeout through the call stack. You can see my implementation of it in Baggins/Concurrency.
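As a hedged illustration of the first of those points only, the sketch below lets the closure return a value. It is not the Baggins implementation: the names valueWithTimeout and TimeoutError are made up for this example, the closure is marked @Sendable as an assumption, and throwing when the timeout wins is just one possible design. It does not attempt to inherit the timeout through the call stack.

```swift
// Hypothetical error thrown when the timeout elapses before the work finishes.
struct TimeoutError: Error {}

func valueWithTimeout<R: Sendable>(
    _ seconds: Double,
    _ work: @escaping @Sendable () async throws -> R
) async throws -> R {
    try await withThrowingTaskGroup(of: R.self) { group in
        group.addTask {
            try await work()
        }
        group.addTask {
            // The control task never produces a value: it either gets
            // cancelled while sleeping or reports the timeout as an error.
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimeoutError()
        }
        // Two children were added, so next() cannot return nil here.
        // If the first child to finish threw, the error propagates and the
        // group cancels the remaining child on its way out.
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}
```

Even here, the function only returns once the work has actually stopped, so a closure that ignores cancellation would delay the TimeoutError until it finishes on its own.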

Race

Let's try another example to illustrate a bit more how important this is. Imagine that we want to implement a race function: a function that, given a couple of async functions, runs them both concurrently and returns the result of the first one to finish, cancelling the other because it lost the race. This is a very common operation, for example, if you want to show something to the user and you don't care whether it comes from a local database or a network call.

func firstOf<R>(
    _ f1: @escaping () async -> R,
    or f2: @escaping () async -> R
) async -> R {
    await withTaskGroup(of: R.self) { group in
        group.addTask {
            await f1()
        }
        group.addTask {
            await f2()
        }
        guard let first = await group.next() else {
            fatalError()
        }
        group.cancelAll()
        return first
    }
}

As before, we use a TaskGroup to run the two functions concurrently while respecting Structured Concurrency. The only difference is that the second function we run is no longer our own timeout but a passed-in function, and we simply return the result of the first one that finishes.

You can see how this works very well if the given functions cooperate with cancellation. Otherwise, even if one wins the race, everybody needs to wait for the others to finish, which may be nice in a real-world race, but is definitely not what we want in our programs.
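To see the effect, here is a small sketch of a race between two hypothetical sources, fromCache and fromNetwork (both invented for this example). The firstOf function is repeated so the snippet compiles on its own, with @Sendable and Sendable annotations added as an assumption for strict concurrency checking.

```swift
func firstOf<R: Sendable>(
    _ f1: @escaping @Sendable () async -> R,
    or f2: @escaping @Sendable () async -> R
) async -> R {
    await withTaskGroup(of: R.self) { group in
        group.addTask { await f1() }
        group.addTask { await f2() }
        guard let first = await group.next() else {
            fatalError("the group has two children, so this cannot happen")
        }
        group.cancelAll()
        return first
    }
}

// Hypothetical fast source: answers right away.
func fromCache() async -> String {
    "cached value"
}

// Hypothetical slow source: it cooperates because Task.sleep throws as soon
// as the task is cancelled, so it does not hold the race open for a minute.
func fromNetwork() async -> String {
    do {
        try await Task.sleep(nanoseconds: 60_000_000_000)
        return "fresh value"
    } catch {
        return "network call cancelled"
    }
}

let winner = await firstOf(fromCache, or: fromNetwork)
print(winner)
```

If fromNetwork ignored cancellation (for example, by blocking instead of sleeping), the call to firstOf would only return once the full minute had passed, even though the cached value was ready immediately.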

It may be interesting to have a select operation that works with unstructured concurrency and leaves detached tasks dangling.

Standard Cooperation

One thing to note is that, in these examples, it seems quite tiresome that we need to check for cancellation ourselves all the time, and that if we forget, the system won't behave as desired. This is true, but in reality it is less of an issue than it may seem.

In this toy example, we were just performing an iteration without really calling any system function, so it makes sense that the responsibility is on us. But in real code, it is very likely that you will end up calling a system function, and it is expected that those functions handle cancellation properly. That said, if you are building a library of async functions, make sure you respect cooperative cancellation too.

Task.sleep

Task.sleep respects cooperative cancellation. That is, it will throw a cancellation error and finish early if its current task is cancelled.

During the initial betas of Swift Concurrency, Task.sleep was non-throwing, and it didn't cancel early. I personally raised this in a WWDC Lab and in the forums. It got fixed before the official release.
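The current behaviour can be observed with a small sketch like the following, which uses the standard Task.sleep(nanoseconds:) and cancels the task right after creating it. The variable name sleeper and the returned strings are just for illustration.

```swift
let sleeper: Task<String, Never> = Task {
    do {
        // Ask for a one-minute nap.
        try await Task.sleep(nanoseconds: 60_000_000_000)
        return "slept the whole minute"
    } catch is CancellationError {
        // Task.sleep noticed the cancellation and gave up early.
        return "woken up early by cancellation"
    } catch {
        return "unexpected error: \(error)"
    }
}

sleeper.cancel()
print(await sleeper.value)
```

Because the sleep gives up as soon as the task is cancelled, the program finishes right away instead of waiting for the minute to pass.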

URLSession

The new URLSession async functions follow cooperative cancellation: as soon as the Task is cancelled, they also cancel the underlying request. This is very nice, since networking is probably one of the major use cases people will have for async.

Conclusion

I hope you understand a bit more about how cancellation works in Swift Concurrency. I don't think you should worry about it too much, since the system frameworks will handle it, as in the URLSession case, but it is still important that we all have a clear picture of it so the packages we create behave nicely in the wider ecosystem.
