Alexito's World

A world of coding 💻, by Alejandro Martinez

The importance of cooperative cancellation

One of the most important aspects to understand about Swift Concurrency is how cancellation of async tasks works. You may expect that when a task is cancelled it immediately stops, like when you stop a process. But that's not how it works at all: cancellation in Swift is cooperative.

Cooperative cancellation

A cooperative system is one where all the parts need to be involved, cooperate, in order to accomplish a goal. Here the goal is cancellation, and the parts are all the async tasks involved. This means that when a parent task is cancelled, its responsibility is to tell all the children that they have been cancelled.

When a parent task does this, it sets the Task.isCancelled flag to true, and that's it. That's where the work of the parent task finishes. This is when cooperation kicks in: it is up to the children to check for that flag and stop their work early.

Note that, API-wise, there is another way of checking for cancellation: try Task.checkCancellation(), which will immediately throw a CancellationError up the call stack. This is a convenient way of handling cancellation if your function is already throwing.

In other words, if the child tasks never check for cancellation, it's as if the system didn't support cancellation at all.
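To make the two styles concrete, here is a minimal sketch. The helper names sumUntilCancelled and sumOrThrow are hypothetical and only exist for illustration; Task.yield() stands in for a slice of real async work.

```swift
// Hypothetical helper: stops early and returns a partial sum when cancelled.
func sumUntilCancelled(_ values: [Int]) async -> Int {
    var total = 0
    for value in values {
        // Flag style: we decide what "stopping early" means (here, a partial sum).
        if Task.isCancelled { break }
        total += value
        await Task.yield() // stand-in for real async work
    }
    return total
}

// Hypothetical helper: same loop, but cancellation surfaces as a thrown error.
func sumOrThrow(_ values: [Int]) async throws -> Int {
    var total = 0
    for value in values {
        // Throwing style: throws CancellationError if the task was cancelled.
        try Task.checkCancellation()
        total += value
        await Task.yield() // stand-in for real async work
    }
    return total
}
```

The flag style fits functions that can return something sensible when interrupted; the throwing style fits functions that already throw and have nothing useful to return.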

This is an important aspect of Structured Concurrency, since it ensures that tasks are never terminated without letting them clean up their resources, and also that no task is left hanging in the ether, since the parent task always waits for all its children to finish.
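A small sketch of that guarantee, under a couple of assumptions: worker and parent are hypothetical names, and I use addTask, which as far as I know is how child tasks are added to a group in released toolchains. Each child spins until it sees the flag, then gets to finish on its own terms, and the group does not return until both have done so.

```swift
// Hypothetical child: works until told to stop, then tidies up and returns.
func worker(named name: String) async -> String {
    while !Task.isCancelled {
        await Task.yield() // stand-in for one slice of real work
    }
    // Cancellation only set a flag, so this line still runs.
    return "\(name) released its resources"
}

// Hypothetical parent: cancels its children, then waits for every one of them.
func parent() async -> [String] {
    await withTaskGroup(of: String.self, returning: [String].self) { group in
        group.addTask { await worker(named: "a") }
        group.addTask { await worker(named: "b") }
        // Tell the children to stop; nothing is forcibly terminated.
        group.cancelAll()
        // Collect every child's report before leaving the group.
        var reports: [String] = []
        for await report in group {
            reports.append(report)
        }
        return reports.sorted()
    }
}
```

If worker never checked Task.isCancelled, parent would simply never return, which is the flip side of the same guarantee.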

Good citizens

Cooperative cancellation has its benefits, but it also means that we need to be conscious about implementing async functions that do cooperate. Let's take a look at what impact it has.

I wanted to implement some common scenarios with the existing tools that Swift concurrency gives us. I think they are an excellent example of how important cooperative cancellation is, since they are pieces of functionality that rely heavily on early cancellation.

Timeout

One of the significant advantages of structured concurrency is that it provides a way to pass context down the tree of tasks. One thing that the context can help with is timeouts. Right now that's not something that is part of Swift's standard library, but we can imagine a function that would provide that functionality:

func withTimeout(_ seconds: Double, _ f: @escaping () async -> ()) async {
    // ...
}

The way I've implemented this with the existing tools is by using the Task.sleep() functionality to have a way to know when the timeout has passed. We just need to run the passed function concurrently and cancel it when appropriate.

To differentiate between the result of the timeout and the result of the passed-in function, we can use an enum:

    enum GroupResult {
        case cancel
        case ok
    }

And because we need to run both things concurrently, we need to use a TaskGroup, which will make sure Structured Concurrency is respected:

    await withTaskGroup(of: GroupResult.self) { group in
        _ = group.asyncUnlessCancelled {
            await Task.sleep(.seconds(seconds))
            return .cancel
        }
        _ = group.asyncUnlessCancelled {
            await f()
            return .ok
        }
        // ...
    }

Note how I use the asyncUnlessCancelled variant to make sure we don't even run the tasks if it's unnecessary.

Finally, we just need to await one of the tasks and cancel the rest. If the first one that finishes is the timeout, it means we are cancelling the passed-in function early. If the passed-in function finishes first, then we just cancel the sleep.

        _ = await group.next()
        group.cancelAll()
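Put together, the pieces above could look like the following sketch. It is not the exact code from this post: it assumes a released toolchain, where to my knowledge the group method is spelled addTaskUnlessCancelled and Task.sleep(nanoseconds:) throws when the task is cancelled, and it uses that in place of my .seconds helper. Returning the GroupResult is also an addition of this sketch, so that the caller can see which side finished first.

```swift
enum GroupResult {
    case cancel
    case ok
}

// Sketch only: the return value is an assumption added for observability.
@discardableResult
func withTimeout(
    _ seconds: Double,
    _ f: @escaping @Sendable () async -> Void
) async -> GroupResult {
    await withTaskGroup(of: GroupResult.self, returning: GroupResult.self) { group in
        _ = group.addTaskUnlessCancelled {
            // try? swallows the error thrown when the sleep itself is cancelled.
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            return .cancel
        }
        _ = group.addTaskUnlessCancelled {
            await f()
            return .ok
        }
        // Whichever child finishes first decides the outcome...
        let first = await group.next() ?? .cancel
        // ...and whatever is still running is told to stop.
        group.cancelAll()
        return first
    }
}
```

Note that the group still waits for the remaining child after cancelAll(), so the function only returns promptly if that child cooperates.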

Now we can use this function:

await withTimeout(5) {
    await busyWork()
}

This will make sure that busyWork is cancelled after 5 seconds if it's still running.

Cooperation

Now let's imagine that the busyWork function is not very cooperative and doesn't check for cancellation:

func busyWork() async {
    print("Starting busyWork")
    for i in 0...100 {
        print(i)
        await nonCooperativeWork()
    }
}

As you can see, this function doesn't check for cancellation, and if we assume that nonCooperativeWork doesn't check either, we have a bad scenario for cooperative cancellation. Running this will result in:

Starting busyWork
0
...
5
cancel finished first
6
...
// until 100!

This is not ideal. So let's make sure our function checks for cancellation when appropriate:

func busyWork() async {
    print("Starting busyWork")
    for i in 0...100 {
        // Let's check for cancellation on every iteration
        if Task.isCancelled { return }
        print(i)
        await nonCooperativeWork()
    }
}

Now if we run this, we will see that the timeout has the appropriate effect:

Starting busyWork
0
...
5
cancel finished first
Program ended with exit code: 0

I hope that this illustrates the importance of being a good citizen in a world of cooperative cancellation.

Race

Let's try another example to illustrate a bit more how important this is. Imagine that we want to implement a "race" function: one that, given a couple of async functions, runs them both concurrently and returns the result of the first one to finish, cancelling the other because it lost the race. This is a very common operation, for example, if you want to show something to the user and you don't care whether it comes from a local database or a network call.

func firstOf<R>(
    _ f1: @escaping () async -> R,
    or f2: @escaping () async -> R
) async -> R {
    await withTaskGroup(of: R.self, body: { group in
        group.spawn {
            await f1()
        }
        group.spawn {
            await f2()
        }
        guard let first = await group.next() else {
            fatalError()
        }
        group.cancelAll()
        return first
    })
}

As before, we use a TaskGroup to run the two functions concurrently while respecting Structured Concurrency. The only difference is that neither of the functions we are running is our own timeout; both are passed in. And we just return the result from the first one that finishes.

You can see how this works very well if the passed-in functions cooperate with cancellation. Otherwise, even if one wins the race, everybody needs to wait for the others to finish. That may be nice in a real-world race, but it's definitely not what we want in our programs.
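Here is a sketch of such a race with a cooperative loser. It restates firstOf using addTask, which I believe is the spelling in released toolchains, and adds Sendable annotations. loadFromCache and loadFromNetwork are hypothetical stand-ins for the local database and the network call, and the example assumes a Task.sleep(nanoseconds:) that throws on cancellation.

```swift
func firstOf<R: Sendable>(
    _ f1: @escaping @Sendable () async -> R,
    or f2: @escaping @Sendable () async -> R
) async -> R {
    await withTaskGroup(of: R.self, returning: R.self) { group in
        group.addTask { await f1() }
        group.addTask { await f2() }
        guard let first = await group.next() else {
            fatalError("unreachable: two tasks were added")
        }
        // Tell the loser to stop; the group still waits for it to return.
        group.cancelAll()
        return first
    }
}

// Hypothetical fast source, standing in for a local database.
func loadFromCache() async -> String {
    "cached value"
}

// Hypothetical slow source, standing in for a network call.
func loadFromNetwork() async -> String {
    do {
        // Cooperative: the sleep throws as soon as the task is cancelled,
        // so losing the race costs almost nothing.
        try await Task.sleep(nanoseconds: 2_000_000_000)
        return "fresh value"
    } catch {
        return "cancelled"
    }
}
```

Calling await firstOf(loadFromCache, or: loadFromNetwork) should hand back the cached value without waiting out the two seconds, because the slow side gives up as soon as it is cancelled.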

Standard Cooperation

One thing to note is that in these examples it seems quite tiresome to have to check for cancellation ourselves all the time, and that if we forget, the system won't behave as desired. This is true, but I think in reality it will be far less common than we expect.

In this toy example we were just performing an iteration without really calling any system function, so it makes sense that the responsibility is on us. But in real code it is very likely that you always end up calling a system function, and it is expected that those functions handle cancellation properly. That said, if you are building a library of async functions, make sure you respect cooperative cancellation too.

Task.sleep

An interesting thing is that right now Task.sleep doesn't respect cooperative cancellation. It will wait for the entirety of the given time, even if cancellation happens early.

This is something I raised in a WWDC Lab and in the forums. Thankfully, it seems to be just a temporary inconvenience, since cancelling early makes sense and the team has said that the current implementation was added as basic functionality and probably needs a bit more thought and a slightly nicer API (see how I use .seconds in my examples, because it's a pain to use without that 😂).

URLSession

In contrast, the new URLSession async functions do follow cooperative cancellation: as soon as the Task is cancelled, they also cancel the underlying request. This is very nice, since it is probably one of the major use cases for async that people will have.

Conclusion

I hope you understand a bit more about how cancellation works in Swift Concurrency. I don't think you should worry about it too much, since the system frameworks will handle it, as in the URLSession case, but it is still important that we all have a clear picture of it so the packages we create behave nicely in the wider ecosystem.
