Experimenting with a Query Resolver System
Sometimes a topic gets into my head and doesn’t leave until I’ve gone far enough to satisfy my curiosity. Usually this ends with me spending days writing code that goes straight to the trash once I’m done with it, but today I decided to share a bit of one of these rabbit holes I went down.
The weird thing is that I’m not sure what I was trying to do even has a name, because to be honest, I’m not even sure what I was trying to do… Is this what it feels like to do R&D?
I think that watching Jonathan Blow’s Jai livestreams was what planted the seed. In some of those streams, he talked about how the compiler works like a game loop that keeps trying to resolve dependencies. This allows the language to be very flexible with declaration order: as long as the information is there, it will eventually resolve all the declarations. Or that’s what I understood ^^ In my mind, this spawned thoughts about “query resolvers”.
At the time of writing there is no website for Jai, so the only knowledge I have comes from watching Jon on stream. And I never remember in which one he talked about this topic ^^’
What I wanted was a system where, every time there was a query, the answer would come eventually, even if it was not immediately there. The premise is simple, but without a proper framework it seems quite hard to work with. It’s not the usual way of working: what you would typically do is check a Dictionary and, if the answer is not there, well… bail out. The “try again later” part is the tricky bit, and I wanted to find a framework for it.
I’m sure the Jai compiler implements a delightful way of doing this that’s very performant (otherwise they couldn’t get those sub-second compilation times!). But I don’t know what I’m doing here, I just wanted to have fun! And play more with Swift Concurrency!
After doing some research, I found some examples of what I was thinking about, or at least something close to it.
There is a concept called incremental computation; see the Rust library Salsa as an example. I’m not exactly sure if it’s what I was looking for, but it seems related. What is not clear to me is whether this system solves the “eventually consistent” part of what I was after; instead it seems focused on memoizing the results of operations to reuse them, which is ideal for incremental compilation!
Speaking of compilers, the topic of query-based compiler architectures is a very interesting read. I found out that Rust has a demand-driven compilation system (see the docs) and even Swift seems to be moving in this direction with its request-evaluator system.
The good thing about these rabbit holes is that even if nothing useful comes out of them, I learn a ton of interesting things along the way!
Example
Let’s visualize the kind of problem I was trying to solve with an example I’ve been playing with. Imagine a toy system of files where each file can contain some text, imports (of other files), declarations, and usages (of those declarations).
[
    "main": File(contents: [
        .text("Start of Main"),
        .usage("A"),
        .usage("B"),
        .import("other"),
    ]),
    "other": File(contents: [
        .text("Other file."),
        .import("types"),
    ]),
    "types": File(contents: [
        .text("Declaring types:"),
        .declaration("A", "Amazing"),
        .declaration("B", "Burr!"),
    ]),
]
You can see how the main file uses A and B before seeing their declarations. The declarations come from a chain of imports. The goal, then, was to code a system that could produce the final output (concatenating the strings) of this example without having to keep any state or order-dependent code.
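For reference, the data model behind this example could be as simple as something like this (my guess at the types; the actual definitions may differ):

struct File {
    enum Content {
        case text(String)                // plain text
        case declaration(String, String) // name and value of a declaration
        case usage(String)               // use of a declaration by name
        case `import`(String)            // name of another file to pull in
    }

    let contents: [Content]
}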
Developing Prototypes
I started prototyping a Query Resolver where each query would perform a piece of computation. Pretty quickly, I found that the tricky part was how to pause the program when a query was still pending.
My first approach was to follow a “game loop” style system where, on each iteration, queries could be added or answers to them could be read. This got quite far, but it had some problems. The loop had to be careful with the order of the queries: if the same query was executed twice before another had responded, the system could end up adding more queries ad infinitum ^^ This can of course be solved, but the big issue was the usage side, since the code was not very nice:
func resolve() {
    if /* check if the answer you are looking for exists */ {
        // do what you want...
    } else {
        // enqueue the query that will get you this...
    }
}
This is a literal translation of the “try again later” style. But it got tiresome quickly. Plus, now that we have async/await in Swift… wouldn’t that help? await is literally a way to pause a function until it gets an answer. So I quickly jumped on that!
That took me down the path of figuring out how you can pause functions until some data exists. Ultimately, the surface area of async/await doesn’t offer low-level APIs to control the coroutines easily. Task has a yield() but that doesn’t help a lot in this scenario, especially in a hot loop. I eventually ended up realising that the continuation API is what I was looking for!
With a continuation, you can await on a function call and resume it later when you are ready. It took me a bit to see how to use it to solve this, since the most common use of continuations is to convert from a callback-based API:
await withCheckedContinuation { continuation in
    callbackBasedFunction(onCompletion: { result in
        continuation.resume(returning: result)
    })
}
But in this case there are no callbacks; it was just data stored in a cache (a Dictionary) that eventually shows up. The solution is to keep the continuation around, but that’s easier said than done. Eventually I ended up making a small AsyncChannel type that took care of that. You can read more about it in Building a Channel with Swift Concurrency Continuations.
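To give an idea of the shape of that type, here is a minimal sketch of the concept (not the exact code from the linked post): callers that ask for a value before it exists get parked in continuations, and they are resumed when a value is sent.

actor AsyncChannel<Value> {
    private var value: Value?
    private var waiters: [CheckedContinuation<Value, Never>] = []

    // Suspends the caller until a value has been sent, then returns it.
    func receive() async -> Value {
        if let value { return value }
        return await withCheckedContinuation { continuation in
            waiters.append(continuation)
        }
    }

    // Stores the value and resumes everyone who was waiting for it.
    func send(_ newValue: Value) {
        value = newValue
        for continuation in waiters {
            continuation.resume(returning: newValue)
        }
        waiters.removeAll()
    }
}

Keeping the continuations inside an actor avoids data races when several queries end up awaiting the same key.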
With that piece solved, implementing the Resolver was quite straightforward.
One independent piece of the Resolver is the AsyncStorage type. It’s the core part of the system, using AsyncChannel to await for the data to be in the Dictionary. It may be useful for other scenarios too.
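As a rough idea of what it could look like, here is a sketch built on top of the AsyncChannel from above (the real AsyncStorage in the repo may differ in the details):

actor AsyncStorage<Key: Hashable, Value> {
    private var channels: [Key: AsyncChannel<Value>] = [:]

    // Returns the channel for a key, creating it on first access.
    private func channel(for key: Key) -> AsyncChannel<Value> {
        if let existing = channels[key] { return existing }
        let channel = AsyncChannel<Value>()
        channels[key] = channel
        return channel
    }

    // Suspends until a value has been saved for `key`.
    func read(_ key: Key) async -> Value {
        await channel(for: key).receive()
    }

    // Saves a value, resuming everything that was waiting on `key`.
    func save(_ value: Value, for key: Key) async {
        await channel(for: key).send(value)
    }
}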
Usage
The basic idea is to represent computations with queries that have an associated key:
struct ProcessFile: Query {
    var key: String { fileName }
    let fileName: String

    func resolve(_ resolver: AsyncResolver) async throws {
        let contents = try fs.readFile(fileName)
        let textParts: [String] = await contents.content.concurrentMap { c in
            ...
            let content = await resolver.query(
                ProcessFile(fileName: dependency)
            )
        }
        await resolver.save(textParts.joined(separator: " || "), for: key)
    }
}
A few points to note in the example:
- The key: String identifies this query. The resolver uses it to return a previously computed output of the same query if available. It could be used for more things, but I didn’t get that far…
- The resolve function is where you implement the computation.
- await resolver.query(...) adds the query to the system and waits for its resolution.
- await resolver.save(..., for: key) saves the result of the computation.
- There is also an await resolver.addQuery(...) that just adds a query without waiting for its resolution. In fact, query(...) is a combination of addQuery and read, as the sketch after this list shows.
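To make those points more concrete, here is a rough sketch of the interfaces the calls above imply, built on the AsyncStorage sketch from before (the actual types in the repo are surely different in the details):

protocol Query {
    var key: String { get }
    func resolve(_ resolver: AsyncResolver) async throws
}

actor AsyncResolver {
    private let storage = AsyncStorage<String, String>()
    private var enqueued: Set<String> = []

    // Adds a query to the system without waiting for its result.
    func addQuery(_ query: some Query) {
        // Only start each key once so the same work doesn't run twice.
        guard enqueued.insert(query.key).inserted else { return }
        // Errors are ignored in this sketch.
        Task { try? await query.resolve(self) }
    }

    // Suspends until a result has been saved for `key`.
    func read(_ key: String) async -> String {
        await storage.read(key)
    }

    // addQuery + read: enqueue the query and wait for its result.
    func query(_ query: some Query) async -> String {
        addQuery(query)
        return await read(query.key)
    }

    // Saves a result, resuming every query that was waiting on `key`.
    func save(_ value: String, for key: String) async {
        await storage.save(value, for: key)
    }
}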
The important bit is to understand that a query doesn’t have to know how to get the answers for the entire chain. As long as some query in the system has the answer, it will work.
When the result of a computation is saved, every other query in the system that depends on that key gets resolved, and computation continues.
Another important piece here is concurrentMap. If a computation has dependencies on other queries, it should request their results concurrently. This unlocks the power of resolving queries out of order. If you ask for otherwise independent computations in series, you won’t get much benefit.
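A note here: concurrentMap is not a standard library API. One common way to write something like it is with a task group; a sketch (order-preserving, and not necessarily what the repo uses) could look like this:

extension Collection where Element: Sendable {
    func concurrentMap<T: Sendable>(
        _ transform: @escaping @Sendable (Element) async -> T
    ) async -> [T] {
        await withTaskGroup(of: (Int, T).self) { group in
            // Kick off every transform at once, remembering its position.
            for (index, element) in enumerated() {
                group.addTask {
                    let value = await transform(element)
                    return (index, value)
                }
            }
            // Collect results as they finish and restore the original order.
            var results: [(Int, T)] = []
            results.reserveCapacity(count)
            for await result in group {
                results.append(result)
            }
            return results.sorted { $0.0 < $1.0 }.map { $0.1 }
        }
    }
}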
Function-like syntax
Having to call query(Query) feels a bit cumbersome. Following the goal of making this as “normal” as possible, I ended up wrapping these calls in functions. This allowed for a much more pleasant syntax:
let result = await resolver.processFile(name: "main")
From the usage side, the only unusual thing is the resolver. The rest looks like a normal async call. You still need to reach for the direct calls when you want to add a query without waiting for the result, or when you just want to provide results to the system. I’m sure better APIs could be built around that.
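For example, the wrapper for the query above could be as simple as this (a hypothetical extension; the real project may structure it differently):

extension AsyncResolver {
    // Wraps the ProcessFile query so call sites read like a normal async call.
    func processFile(name: String) async -> String {
        await query(ProcessFile(fileName: name))
    }
}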
Salsa uses Rust’s powerful meta-programming capabilities to create these functions for you.
Further improvements
So the system works, at least for the toy examples I tried 😂 But if this were a real project there would be plenty of improvements to make. For example, cycles are not detected, so if you add a cycle in the data, the program will hang forever. I think this is usually solved by keeping track of which queries add other queries.
I recommend reading the linked articles above if you are curious about what a proper system is capable of.
Curiosity satisfied!
So that’s all! You can check the code on the Query Resolver System repository, including the prototypes. Don’t expect high-quality code ^^’
I love getting lost in these experiments and this one got quite far. I didn’t expect to follow through to the point where it worked and I was satisfied with its usage, but I did ^^. I would still love to work on detecting cycles, but I have other interesting things on the to-do list, so I will just leave it here for now.
If you have gotten this far, thank you! 🙏 I hope you found this experiment interesting.