Swift 3, linked lists and sentinels
I’ve always been a big fan of Linked Lists. Maybe is because it was one of the first data structures that I learn how to implement and they were my introduction to the fascinating world of pointers. Nowadays I hardly think about pointers in my daily work. In Objective-C you just treat them as the objects you want to message with, and in Swift the lovely * has disappeared. But I still feel nostalgic sometimes.
I decided to spend some time enjoying writing a Linked List again after so many years, but this time in Swift. I wrote one a while ago based in enums, the usual way to show up Swift enum capabilities. But this time I wanted to recover the C spirit and although we don’t have pointers directly in our languages, we still have classes that gives us those reference semantics capabilities.
I got inspired by the post Swift Algorithm Club: Swift Linked List Data Structure. I grabbed their implementation as the first pass but I wanted to see how the internal logic can get simplified using a sentinel node instead of dealing with head, tail and next nodes that may be nil.
Building a linked list using a sentinel allows us to remove quite some checks in the code and simplifies a lot some operations. The trick with the sentinel is that its next and previous pointers point to itself. This is the key that avoids a lot of checks for null dereferencing. You only need to check if is the sentinel while traversing the list, to know when to stop. The rest of operations can work without checks assuming that there is always a next and previous node.
In order to let the internal code know if a node is the sentinel I decided to have the value of the node optional. As this is only an implementation detail I made sure to have a non-optional value property that gets used when is necessary to expose the value of the node.
With the next and previous pointers is necessary to play a little with Swift enforcements. Usually what you want is to create the node, and then make next and previous point to itself. The issue is that Swift initializer rules don’t allow to use self
before the class is fully initialized. This forces these properties to be marked as ImplicitlyUnwrapped which may hurt the eyes.
class Node<T> {
var value: T {
get {
return _value!
}
set {
_value = newValue
}
}
var _value: T?
var next: Node!
var previous: Node!
init() {
next = self
previous = self
}
}
Node as the list itself
The other thing I wanted to try in this iteration was using a node as the list itself which resembles more what you usually will do in C. Just have a node that acts as the beginning of the list and form there keep dereferencing pointers to traverse it.
The first issue I had with this was a funny problem with CustomStringConvertible
and the Playgrounds. Because in the description
implementation you access other nodes playgrounds tries to access the description of those nodes, which triggers again the method and… recursion for the win! To fix it in this iteration I simply used a toString()
method when I needed to see the representation of a list.
The other issue with having a node be the list itself is that you can call any method using any node and that is quite dangerous as, depending on the implementation, the list can get into an inconsistent state. For example the remove(node:)
method is safe to call because it only deals with the passed node. But the removeAll()
assumes that the current node is the sentinel.
If you are in the C mindset this issues may not matter that much, as you don’t call methods from Node, but pass nodes into functions. Or maybe the issues are the same. But the fact is that buying into the OOP way of doing things makes this an unsafe approach.
Wrapping the sentinel
Coming back to the idea of having the data structure as its own type and internally using nodes.
public class LinkedList<T> {
let sentinel = Node<T>()
The issue with messing around with individual nodes is removed and the simplification of the checks thanks to the sentinel remains.
Having the list and the nodes as separate concepts means that they can conform to CustomStringConvertible
safely and the Playgrounds will not freak out.
public var description: String {
var text = "["
var node = sentinel.next!
while let value = node._value {
text += "\(value)"
node = node.next
if node._value != nil { text += ", " }
}
return text + "]"
}
Even with this changes the node next properties have to be still marked as ImplicitlyUnwrapped
.
Sequence
Having the data structure as its own type also makes it easy to make it play nicer with the Swift standard library.
We can create a simple structure that iterates over the list.
public struct LinkedListIterator<T>: IteratorProtocol {
var node: Node<T>
mutating public func next() -> T? {
let next = node.next!
guard let value = next._value else { return nil }
defer { node = next }
return value
}
}
This makes it really easy to add support for all the Sequence operations to our LinkedList. We just have to create this iterator with the sentinel node, the first node in the list.
extension LinkedList: Sequence {
public func makeIterator() -> LinkedListIterator<T> {
return LinkedListIterator(node: sentinel)
}
}
Now we can use our list in for..in
loops or use standard functions like map
of filter
like with any other Sequence provided by the standard library.
Copy on Write
One thing that makes this LinkedList different from the other collections in the standard library is that it’s a reference type, is implemented using a class
. This means that different variables can be using the same list, and modifying the contents in one will affect the other.
print("Original \(list)") /`/ "Original [0, 1, 2, 3, 4]"
var copy = list
copy.remove(at: 2)
print("Copy \(copy)") // "Copy [0, 1, 3, 4]"
print("List \(list)") // "L`ist [0, 1, 3, 4]"
This comes as a surprise as anyone will expect it to work as any other collection in Swift. For that we have to make our list be a value type, so two different variables have their own copy of the list.
We can proceed an simply change the LinkedList to be a struct
public struct LinkedList<T> {
But the results are the same:
print("Original \(list)") // "Original [0, 1, 2, 3, 4]"
var copy = list
copy.remove(at: 2)
print("Copy \(copy)") // "Copy [0, 1, 3, 4]"
print("List \(list)") // "List [0, 1, 3, 4]"
This is because even if the list gets copied, the nodes (the sentinel) is still a reference type, so what gets copied is the reference to that node. It means that the two variables have different list instances, but both lists point to the same underlaying nodes.
To solve this Swift uses a technique called copy-on-write. When a value type gets assigned to another variable, this just copies the references to avoid unnecessary expensive copies. But when the second instance gets modified an automatic copy gets done. This allows the data structures to not waste unnecessary memory and time copying themselves when is not required, but doing it when is necessary to keep the rules imposed by value semantics.
To opt-in to this behavior we have to modify the methods in the list that mutate the nodes to check if the instance is being referenced by more than one variable and in that case trigger a copy.
To make it easy what I like to do is have an internal var that represents the mutable node. This is a dynamic property that makes the check and the copy. So in methods that mutate I always use that property instead of the real node.
fileprivate var sentinel = Node<T>()
private var mutableSentinel: Node<T> {
mutating get {
if !isKnownUniquelyReferenced(&sentinel) {
sentinel = self.copy().sentinel
}
return sentinel
}
}
With this in place we can see how our previous example now works as expected.
Original [0, 1, 2, 3, 4]
Copy [0, 1, 3, 4]
List [0, 1, 2, 3, 4]
About performance
While writing this I haven’t put any effort in performance. Linked Lists are well know structures that obviously have their pros and cons. But one important aspect of today’s hardware is how it relies in caching to keep the system performant. This means that data locality is really important and data structures and algorithms that keep that in mind will have more help from the CPU. Linked List, by nature, are the opposite of this as they rely in pointer dereferencing for pretty much every operation. So keep that in mind when picking the proper data structure for your algorithms and remember, always profile.
Download the playground from GitHub.