5 December 2015 8min read

Solving the Strings Problem in Swift

In this post I try to port Tom Moertel's type-based solution to the strings problem to Swift.

I started this some days ago as an exercise to keep exploring type safety in Swift and to expand my thoughts on how the type system can help us solve domain-specific problems, a topic that I already explored in Type Systems and Domain Driven Development.

Before continuing let me give some disclaimers.

I recommend reading the linked post, which discusses the strings problem in depth and proposes the solution that I tried to port to Swift.

The escaping functions that I’ve used are almost certainly not correct, as I was not trying to construct production-ready XML or URL types. They are only there to show the types being constructed differently.

The resulting solution is far from complete, and I’m writing this post with that in mind. It serves as an experiment to show how we can leverage the type system even further, but Christmas is already here, Apple has open sourced Swift, and I want to ship one project before the end of the year. So sadly I don’t have more time to improve it for now. But I would be really happy to receive feedback and possible solutions to the problems exposed here.

The Strings Problem

The original post describes it as:

we just plain suck at keeping a bazillion different strings straight in our heads, let alone consistently and reliably rendering their interactions safe whenever they cross paths in a modern web application. It’s easy to say, “just escape the darn things,” but it’s hard to get it right, every single time.

In my words, String is one of those types that can represent absolutely anything. That causes a variety of problems and usually means you have to program very defensively. The proposed solution relies on accepting that different strings have different meanings, so we should treat them differently.

Go and read the original post as it describes why other solutions are not as good as using the type system.

A SafeString

A SafeString is a type that contains a string representing a specific Language.

public enum SafeString<T: Language> {
    case Empty
    case Fragment(fragment: T)
    indirect case Concat(left: SafeString, right: SafeString)
}

The protocol Language is where plain and unsafe strings are converted to a specific type of string that has a specific meaning.

public protocol Language {
    // litfrag  :: String -> l   -- String is a literal language fragment
    init(fragment: String)

    // littext  :: String -> l   -- String is literal text
    init(plainText: String)

    // natrep   :: l -> String   -- Gets the native-language representation
    func toString() -> String

    // language :: l -> String   -- Gets the name of the language
    var name: String { get }
}

The comments in the code reference the original implementation.

As you can see, the important part is the two inits. It is at the point where the developer converts a String to a Language that the risk disappears. The developer decides, one single time, whether the string is unsafe plain text or is already in the target language.

Once that decision is made, the framework leverages the type system to ensure that strings with different meanings are not used in an unsafe manner.

These two types are the central part of the safety framework. With this kernel implemented, one can start creating types for any specific language.
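To make the kernel concrete, here is a minimal sketch of how the two types could fit together. It is written in Swift 5 syntax and rests on assumptions: `init(fragment:)` and `init(text:)` mirror the calls used later in this post, while `render()`, the `+` operator and the toy `Bracketed` language are hypothetical names introduced only for illustration.

```swift
protocol Language {
    init(fragment: String)   // the string is already valid in the language
    init(plainText: String)  // the string is plain text and must be escaped
    func toString() -> String
    var name: String { get }
}

enum SafeString<T: Language> {
    case Empty
    case Fragment(fragment: T)
    indirect case Concat(left: SafeString, right: SafeString)

    // Trusted input: stored as-is (assumed initializer).
    init(fragment: String) { self = .Fragment(fragment: T(fragment: fragment)) }

    // Untrusted input: escaped by the language (assumed initializer).
    init(text: String) { self = .Fragment(fragment: T(plainText: text)) }

    // Hypothetical helper: flattens the tree into the native representation.
    func render() -> String {
        switch self {
        case .Empty:
            return ""
        case .Fragment(let fragment):
            return fragment.toString()
        case .Concat(let left, let right):
            return left.render() + right.render()
        }
    }

    // Concatenation only exists between safe strings of the same language.
    static func + (lhs: SafeString, rhs: SafeString) -> SafeString {
        return .Concat(left: lhs, right: rhs)
    }
}

// Toy language, for illustration only: plain text is wrapped in brackets
// so the two construction paths are visible in the output.
struct Bracketed: Language {
    let value: String
    let name = "Bracketed"
    init(fragment: String) { value = fragment }
    init(plainText: String) { value = "[" + plainText + "]" }
    func toString() -> String { return value }
}

let combined = SafeString<Bracketed>(fragment: "raw") + SafeString<Bracketed>(text: "escaped")
print(combined.render()) // raw[escaped]
```

Because `+` is only declared for two values of the same `SafeString<T>`, mixing languages or adding a bare `String` is rejected at compile time.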

Specific languages

In the example the language represented is XML (or XHTML) to give a safe way of generating the markup of a website.

For that we just have to define the XMLString type that conforms to the Language protocol.

public struct XMLString: Language {

    let xml: String

    public init(fragment: String) {
        xml = fragment
    }

    public init(plainText: String) {
        xml = escapeXML(plainText)
    }

    public func toString() -> String {
        return xml
    }

    public let name = "XML"
}

The implementations of the languages are pretty straightforward; the only interesting part of a proper solution is correctly implementing the escape function for each language.
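As an illustration of what such an escape function could look like, the following sketch replaces the five characters that have predefined entities in XML. It is written in Swift 5 syntax, it is not the function from the playground, and it is not meant to be production ready (for example, it does nothing about characters that are not allowed in XML at all).

```swift
// A minimal sketch of an XML escaping function.
func escapeXML(_ text: String) -> String {
    var escaped = ""
    escaped.reserveCapacity(text.utf8.count)
    for character in text {
        switch character {
        case "&": escaped += "&amp;"
        case "<": escaped += "&lt;"
        case ">": escaped += "&gt;"
        case "\"": escaped += "&quot;"
        case "'": escaped += "&apos;"
        default: escaped.append(character)
        }
    }
    return escaped
}

print(escapeXML("Safety & <XML>")) // Safety &amp; &lt;XML&gt;
```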

We could use Swift protocol extensions to make creating languages easier, as the only difference between them is the escape function.

Apart from that, we can create a typealias for an XML type, along with a function that creates XML from safe string literals.
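One way this could look is sketched here in Swift 5 syntax. The `EscapingLanguage` protocol, its `rawValue` and `escape` members, and the `URLQueryString` type with its toy escaping are all hypothetical names made up for this example; they are not part of the playground.

```swift
protocol Language {
    init(fragment: String)
    init(plainText: String)
    func toString() -> String
    var name: String { get }
}

// Hypothetical refinement: a language only provides its storage,
// its name and its escaping function.
protocol EscapingLanguage: Language {
    var rawValue: String { get }
    init(rawValue: String)
    static func escape(_ text: String) -> String
}

// The protocol extension supplies everything that Language requires.
extension EscapingLanguage {
    init(fragment: String) { self.init(rawValue: fragment) }
    init(plainText: String) { self.init(rawValue: Self.escape(plainText)) }
    func toString() -> String { return rawValue }
}

// Toy language: only spaces and ampersands are escaped, which is
// nowhere near a complete percent-encoding.
struct URLQueryString: EscapingLanguage {
    let rawValue: String
    let name = "URL query"
    init(rawValue: String) { self.rawValue = rawValue }

    static func escape(_ text: String) -> String {
        return text.map { character -> String in
            switch character {
            case " ": return "%20"
            case "&": return "%26"
            default: return String(character)
            }
        }.joined()
    }
}

print(URLQueryString(plainText: "a b&c").toString()) // a%20b%26c
print(URLQueryString(fragment: "a%20b").toString())  // a%20b
```

With this shape, adding a new language comes down to writing one `escape` function and a few lines of boilerplate.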

public typealias XML = SafeString<XMLString>

public func xml(fragment: String) -> XML {
    return XML(fragment: fragment)
}

With this in place, one can start using and creating safe XML strings.

xml("<em>wow!</em>") // <em>wow!</em>
XML(text: "Safety & XML") // Safety &amp; XML

As you can see, the user of the library doesn’t have to interact with XMLString (or any other Language-conforming type); they only construct and use SafeString instances. And thanks to the typealias they can even ignore that.

But now, if they try to combine XML values with a String, the compiler is there to save us.

someXmlInstance + "safety & more"
error: binary operator '+' cannot be applied to operands of type 'XML' (aka 'SafeString<XMLString>') and 'String'

To do that we have to explicitly convert the unsafe string into XML.

someXmlInstance + XML(text: "Safety & More")
// <em>wow!</em>Safety &amp; More

Real world example

In the playground you can see a real example that converts an Article type to an XML containing a list of links to share websites.

The only piece that I will show here is the compose function, as it will serve to illustrate the last point of the post.

func compose(share: Share) -> XML {
    let url = XML(text: share.url.render())
    let siteTitle = XML(text: share.siteTitle)
    // Split the expression up to keep the compiler happy; otherwise it fails with "expression is too complex"
    let a = xml("<a href=\"") + url + xml("\" ")
    let b = xml("title=\"") + siteTitle + xml(": \u{201C};") + title + xml("\u{201D}\"")
    let c = xml(">") + share.imageTag + XML(text: "Image here") + xml("</a>")
    let link: XML = a + b + c
    return link
}

This function converts the information needed to share an article on one website into XML containing the link, title and logo.

The missing pieces

In this real world example you can see one of the tedious parts of this system (ignoring that the compiler freaks out with a too complex expression). Generating a long safe XML string from other kinds of information requires creating an XML instance for each part of the string and concatenating them all in a long chain.

This is exactly the point of being safe, though: the fact that a URL, or an unsafe string such as the title, cannot be attached directly to an XML fragment is exactly what we want.

Still, any Swift user will try to use String literals and interpolation here, and almost every developer will let laziness win over safety.

To have a complete implementation that solves the strings problem we need a way to create a SafeString with the native Swift interpolation system.
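One way to reduce the chaining, without giving up any safety, is a helper that folds a list of parts into a single value. The sketch below uses Swift 5 syntax; `joined(_:)` and `render()` are hypothetical helpers, the protocol is trimmed to the members the example needs, and the escaping is simplified.

```swift
protocol Language {
    init(fragment: String)
    init(plainText: String)
    func toString() -> String
}

enum SafeString<T: Language> {
    case Empty
    case Fragment(fragment: T)
    indirect case Concat(left: SafeString, right: SafeString)

    init(fragment: String) { self = .Fragment(fragment: T(fragment: fragment)) }
    init(text: String) { self = .Fragment(fragment: T(plainText: text)) }

    // Hypothetical helper: folds the parts, left to right, into one value.
    static func joined(_ parts: [SafeString]) -> SafeString {
        var result: SafeString = .Empty
        for part in parts {
            result = .Concat(left: result, right: part)
        }
        return result
    }

    // Hypothetical helper: flattens the tree into a plain String.
    func render() -> String {
        switch self {
        case .Empty: return ""
        case .Fragment(let fragment): return fragment.toString()
        case .Concat(let left, let right): return left.render() + right.render()
        }
    }
}

struct XMLString: Language {
    let xml: String
    init(fragment: String) { xml = fragment }
    init(plainText: String) {
        // Simplified escaping, just enough for the example.
        xml = plainText.map { character -> String in
            switch character {
            case "&": return "&amp;"
            case "<": return "&lt;"
            case ">": return "&gt;"
            case "\"": return "&quot;"
            default: return String(character)
            }
        }.joined()
    }
    func toString() -> String { return xml }
}

typealias XML = SafeString<XMLString>

let link = XML.joined([
    XML(fragment: "<a href=\""),
    XML(text: "/share?a=1&b=2"),
    XML(fragment: "\">"),
    XML(text: "Tom & Jerry"),
    XML(fragment: "</a>"),
])
print(link.render()) // <a href="/share?a=1&amp;b=2">Tom &amp; Jerry</a>
```

Every element of the array still has to be an `XML` value, so the decision between fragment and plain text is made explicitly for each part, and an array literal also sidesteps the long `+` chains that trigger the "expression is too complex" error.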

You can check the Interpolation page in the playground to see some code about this.

The first impression is that it would be nice to have SafeString conform to StringLiteralConvertible so we could do:

let xml: XML = "<p>blabla</p>"

But doing this would break the safety of SafeString, because now this is possible:

someXmlInstance + "totally not a safe string"

To mitigate this we could assume that any string literal is not safe. But at that point it may no longer be worth it.
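A minimal sketch of that mitigation, in Swift 5 syntax (where StringLiteralConvertible is called `ExpressibleByStringLiteral`): every literal goes through the escaping initializer. As elsewhere, `render()` and `+` are hypothetical helpers, the protocol is trimmed, and the escaping is simplified.

```swift
protocol Language {
    init(fragment: String)
    init(plainText: String)
    func toString() -> String
}

enum SafeString<T: Language> {
    case Empty
    case Fragment(fragment: T)
    indirect case Concat(left: SafeString, right: SafeString)

    init(fragment: String) { self = .Fragment(fragment: T(fragment: fragment)) }
    init(text: String) { self = .Fragment(fragment: T(plainText: text)) }

    func render() -> String {
        switch self {
        case .Empty: return ""
        case .Fragment(let fragment): return fragment.toString()
        case .Concat(let left, let right): return left.render() + right.render()
        }
    }

    static func + (lhs: SafeString, rhs: SafeString) -> SafeString {
        return .Concat(left: lhs, right: rhs)
    }
}

// Literals are treated as untrusted plain text, so they are always escaped.
extension SafeString: ExpressibleByStringLiteral {
    init(stringLiteral value: String) { self.init(text: value) }
}

struct XMLString: Language {
    let xml: String
    init(fragment: String) { xml = fragment }
    init(plainText: String) {
        // Simplified escaping, just enough for the example.
        xml = plainText.map { character -> String in
            switch character {
            case "&": return "&amp;"
            case "<": return "&lt;"
            case ">": return "&gt;"
            default: return String(character)
            }
        }.joined()
    }
    func toString() -> String { return xml }
}

typealias XML = SafeString<XMLString>

let emphasis = XML(fragment: "<em>wow!</em>")
let page: XML = emphasis + "safety & more"
print(page.render()) // <em>wow!</em>safety &amp; more

let markup: XML = "<p>hi</p>"
print(markup.render()) // &lt;p&gt;hi&lt;/p&gt;
```

The second example shows the cost: markup written as a literal is escaped too, so fragments still need the explicit `XML(fragment:)` call.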

The StringInterpolationConvertible feature in Swift looks really powerful, especially as we can overload init(stringInterpolationSegment:) for the different types we want to support.

public init<T>(stringInterpolationSegment expr: T) {
    self.init(text: String(expr))
}
public init(stringInterpolationSegment expr: String) {
    self.init(text: expr)
}
public init<T where T: StringLiteralConvertible>(stringInterpolationSegment expr: T) {
    self.init(fragment: String(expr))
}
public init(stringInterpolationSegment expr: SafeString) {
    self = expr
}

The problem is that the literal string (which we should assume is already safe) is handled in the same way as any other interpolated string (which we can not assume is safe), so we cannot distinguish between the two. This also breaks the safety when using string literal interpolation.

In the original post this is possible because the Haskell template system is more flexible and allows the user to specify which kind of string is being used when interpolating.

Conclusion

I would highly recommend using a framework like this in any system that has to deal with different kinds of data hidden behind String. The only real missing feature is working better with the literal syntax of the language.

I’m sure this could be made possible in Swift, so if someone has any idea of how to improve StringInterpolationConvertible I would be happy to listen.

Update 16/01/2018: Ole Begemann nicely pointed out on Twitter that he wrote a post with a solution for the String Interpolation. Check out Fun with String Interpolation.

There is also a swift-evolution proposal to improve it: Fix ExpressibleByStringInterpolation.
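The redesigned protocol from that proposal shipped with Swift 5, and it passes literal segments and interpolated values through different methods, which is the distinction that was missing. The following is a sketch under that assumption, not a finished design: `render()` is a hypothetical helper, the protocol is trimmed to what the example needs, and the escaping is simplified.

```swift
protocol Language {
    init(fragment: String)
    init(plainText: String)
    func toString() -> String
}

enum SafeString<T: Language> {
    case Empty
    case Fragment(fragment: T)
    indirect case Concat(left: SafeString, right: SafeString)

    init(fragment: String) { self = .Fragment(fragment: T(fragment: fragment)) }
    init(text: String) { self = .Fragment(fragment: T(plainText: text)) }

    func render() -> String {
        switch self {
        case .Empty: return ""
        case .Fragment(let fragment): return fragment.toString()
        case .Concat(let left, let right): return left.render() + right.render()
        }
    }
}

extension SafeString: ExpressibleByStringInterpolation {
    struct StringInterpolation: StringInterpolationProtocol {
        var result: SafeString = .Empty

        init(literalCapacity: Int, interpolationCount: Int) {}

        // Literal segments are written by the developer: trusted fragments.
        mutating func appendLiteral(_ literal: String) {
            result = .Concat(left: result, right: SafeString(fragment: literal))
        }

        // Interpolated strings are untrusted: escaped as plain text.
        mutating func appendInterpolation(_ text: String) {
            result = .Concat(left: result, right: SafeString(text: text))
        }

        // Interpolated safe strings are inserted unchanged.
        mutating func appendInterpolation(_ safe: SafeString) {
            result = .Concat(left: result, right: safe)
        }
    }

    init(stringLiteral value: String) { self.init(fragment: value) }
    init(stringInterpolation: StringInterpolation) { self = stringInterpolation.result }
}

struct XMLString: Language {
    let xml: String
    init(fragment: String) { xml = fragment }
    init(plainText: String) {
        // Simplified escaping, just enough for the example.
        xml = plainText.map { character -> String in
            switch character {
            case "&": return "&amp;"
            case "<": return "&lt;"
            case ">": return "&gt;"
            default: return String(character)
            }
        }.joined()
    }
    func toString() -> String { return xml }
}

typealias XML = SafeString<XMLString>

let title = "Tom & Jerry"
let emphasis = XML(fragment: "<em>wow!</em>")
let page: XML = "<p>\(title) \(emphasis)</p>"
print(page.render()) // <p>Tom &amp; Jerry <em>wow!</em></p>
```

Interpolating any other type does not compile, because no matching `appendInterpolation` overload exists. One trade-off remains: a literal without interpolation is accepted as a trusted fragment, which is reasonable only if every literal in the source code is considered safe.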

On the other hand, the same techniques could be applied to types that rely on numeric data, which is often unsafe in the same way. Think about Money with different currencies, or units like kilometers and miles.

Remember to check the Playground. Any feedback is welcome.
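As a rough illustration of the numeric case, the currency can be carried as a type parameter so that amounts in different currencies cannot be added. All names in this sketch (`Currency`, `EUR`, `USD`, `Money`, `minorUnits`) are hypothetical and the code uses Swift 5 syntax.

```swift
protocol Currency {
    static var code: String { get }
}

// Caseless enums: they only exist at the type level.
enum EUR: Currency { static let code = "EUR" }
enum USD: Currency { static let code = "USD" }

struct Money<C: Currency> {
    // Amount in the smallest unit of the currency (for example, cents).
    let minorUnits: Int

    var description: String {
        return "\(minorUnits) \(C.code)"
    }

    // Addition is only defined between amounts in the same currency.
    static func + (lhs: Money, rhs: Money) -> Money {
        return Money(minorUnits: lhs.minorUnits + rhs.minorUnits)
    }
}

let total = Money<EUR>(minorUnits: 1050) + Money<EUR>(minorUnits: 250)
print(total.description) // 1300 EUR

// The next line does not compile, because the operands have different types:
// let mixed = Money<EUR>(minorUnits: 100) + Money<USD>(minorUnits: 100)
```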
