7 October 2024

Using Binaries as Libraries

Recently, while watching a Tsoding stream, I saw something that piqued my curiosity. Alexey demonstrated how one of his applications could render a video just by calling into the ffmpeg binary directly, instead of using it as a source code dependency or even a static or dynamic library linked to his program.

This caught my attention because, although I sometimes use binaries as dependencies by calling them (a good example is how I call the tailwindcss binary from Genesis to generate this website), it had never occurred to me to use them for something as sophisticated as rendering a video. Calling them to do something they can do on their own, like reading some files and generating output? Sure. But calling a binary and somehow passing the data to generate a video seemed like another level entirely. However, the more I thought about it, the more the idea made sense. After all, it’s all just binary data flowing through pipes. So, I decided to spend some time having fun with the idea.

The first part wasn’t very difficult; I just had to refresh my knowledge about the incantation of parameters necessary to ask ffmpeg to generate a video from a few frames, something I hadn’t done in years. With a list of parameters at hand, I started writing some Swift code to spawn a Process to run ffmpeg.

class FFMPEG {
    let process = Process()
    let pipe = Pipe()
    
    func startRenderingVideo(width: Int, height: Int, framerate: Int) throws {
        let resolution = "\(width)x\(height)"

        process.currentDirectoryURL = // directory to generate the file
        process.executableURL = // path to ffmpeg in your system
        process.arguments = [
            "-loglevel", "verbose",
            "-y",                         // overwrite output.mp4 if it already exists

            // Input: raw RGBA frames at the given resolution and framerate, read from standard input ("-i -")
            "-f", "rawvideo",
            "-pix_fmt", "rgba",
            "-s", resolution,
            "-r", String(describing: framerate),
            "-i", "-",

            // Output: H.264 encoded mp4
            "-c:v", "libx264",
            "-vb", "2500k",
            "-pix_fmt", "yuv420p",
            "output.mp4"
        ]
        
        // set a pipe to feed the frames via standard input
        process.standardInput = pipe 
        
        try process.run()
    }

With this code, I was able to spawn ffmpeg and make it wait on standard input for a bunch of frames. Of course, the parameters can be tweaked as needed, but these worked quite well for what I wanted to try.

The next part was the interesting one: how to pass the raw bytes that define each frame. For this, we need to look back at the parameters I used: we told ffmpeg to expect each pixel as an RGBA color and specified the resolution of each frame. So the next step was to create a buffer of bytes with exactly that shape.

Thankfully, on macOS we have great solutions as part of our tools; in this case an old one: Core Graphics. I could have built the bytes manually to make this work on other platforms, but since I was just having fun and wanted to generate the images with SwiftUI, I went with the easy solution. Doing it by hand with something cross-platform would be an interesting exercise for another day.
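
Just to illustrate what that manual route could look like, here’s a hypothetical helper (not part of my actual code) that lays out four bytes per pixel in the same order we declared to ffmpeg, producing one solid-color frame:

func makeSolidColorFrame(width: Int, height: Int, r: UInt8, g: UInt8, b: UInt8) -> Data {
    // One frame is just width * height pixels, 4 bytes each, in RGBA order,
    // matching the "-pix_fmt rgba" argument we passed to ffmpeg.
    var pixelData = Data(capacity: width * height * 4)
    for _ in 0..<(width * height) {
        pixelData.append(contentsOf: [r, g, b, 255]) // fully opaque pixel
    }
    return pixelData
}

Feeding a few seconds’ worth of such frames through the pipe would already produce a valid, if very boring, video.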

Here’s the bit of code needed to take the bytes from a CGImage and feed them into the Pipe:

func sendFrame(_ cgImage: CGImage) {
    let width = cgImage.width
    let height = cgImage.height
    let bitsPerComponent = 8
    let bytesPerPixel = 4 // RGBA = 4 bytes per pixel
    let bytesPerRow = bytesPerPixel * width
    let totalBytes = height * bytesPerRow

    var pixelData = Data(count: totalBytes)
    pixelData.withUnsafeMutableBytes { (ptr) in
        let context = CGContext(
            data: ptr.baseAddress,
            width: width,
            height: height,
            bitsPerComponent: bitsPerComponent,
            bytesPerRow: bytesPerRow,
            space: CGColorSpaceCreateDeviceRGB(),
            bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
        )!

        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: CGFloat(width), height: CGFloat(height)))
    }

    pipe.fileHandleForWriting.write(pixelData)
}

A few things to note in the above code:

  1. The CGContext configuration needs to match what we’ve told ffmpeg to expect: four bytes per pixel, one byte (8 bits) per color component, and so on.
  2. I create a buffer with Data and ask Core Graphics to draw the image into it. I’d be curious whether there’s a different way of doing this, because I imagine the draw actually has to iterate through the whole image; maybe there’s a way to dump its bytes directly instead (there’s a rough sketch of that idea right after this list). That would assume the source format already matches, though, so I guess it wouldn’t be very portable.
  3. I take the writing file handle of the Pipe, which is set as the standard input of the ffmpeg process, and write the data into it. Through the magic of the operating system, this transfers the bytes from one process to the other, letting ffmpeg render that frame of the video.
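
For completeness, here’s roughly what that direct dump could look like. This is an untested sketch on my part, not something from the original code: it only works when the CGImage’s native layout already happens to match what ffmpeg expects, which is exactly why it isn’t very portable.

func sendFrameDirectly(_ cgImage: CGImage) {
    // Only valid if the image is already 32 bits per pixel, tightly packed, and in RGBA order.
    // A real check would also have to inspect alphaInfo and byteOrderInfo.
    guard cgImage.bitsPerPixel == 32,
          cgImage.bytesPerRow == cgImage.width * 4,
          let cfData = cgImage.dataProvider?.data else {
        sendFrame(cgImage) // fall back to redrawing into a known format
        return
    }
    pipe.fileHandleForWriting.write(cfData as Data)
}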

The last bit we’re missing is very easy: we need a way to tell ffmpeg that there are no more frames coming, otherwise it will keep waiting for more data. To do this, we close the writing file handle of the pipe. Then it’s just a matter of waiting for ffmpeg to finish rendering.

func endRendering() {
    pipe.fileHandleForWriting.closeFile()
    process.waitUntilExit()
}
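
In real use I’d probably also check the exit code after waiting, so a failed render doesn’t go unnoticed. This check is my addition, not something the class above does:

// After waitUntilExit(), Process exposes ffmpeg's exit code.
if process.terminationStatus != 0 {
    print("ffmpeg exited with status \(process.terminationStatus)")
}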

And with this, we have a very simple class that will render a video from our given frames. Under the hood, it will use ffmpeg, but our usage code doesn’t have to worry about the intricacies of it anymore.

class FFMPEG {
    func startRenderingVideo(width: Int, height: Int, framerate: Int) throws
    func sendFrame(_ cgImage: CGImage)
    func endRendering()
}
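
Using it then boils down to a few calls (the resolution, frame rate, and frame source here are just placeholders):

let ffmpeg = FFMPEG()
try ffmpeg.startRenderingVideo(width: 1280, height: 720, framerate: 30)
for frame in frames {      // frames: any sequence of CGImage, produced however you like
    ffmpeg.sendFrame(frame)
}
ffmpeg.endRendering()      // closes stdin and waits for output.mp4 to be written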

Controlled Animation with SwiftUI

To try this out, I wanted to make a small animation with SwiftUI. From time to time I need an animation to support an explanation, but I never make one because reaching for motion graphics software is a big step for something so simple. So I thought this would be the perfect time to build a proof of concept with SwiftUI.

The idea is quite simple: prove that we can capture the frames of two animations, one that changes the contents of some text and another that moves a shape across the screen.

The SwiftUI part is, of course, very simple. It’s just a stack with some text and shapes:

VStack {
    Text(text)
        .foregroundStyle(Color.white)
    Circle()
        .fill(.red)
        .frame(width: circleSize, height: circleSize)
        .offset(x: (circlePosition+circleSize/2) - (width/2))
    Rectangle()
        .stroke(.green)
}

SwiftUI example frame

Getting an image, specifically a CGImage, from this view was also trivial thanks to SwiftUI’s ImageRenderer.

let view = ToRender(text: text, circlePosition: circlePosition, width: width, height: height)
let renderer = ImageRenderer(content: view)
let cgImage = renderer.cgImage!
ffmpeg.sendFrame(cgImage)
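
One thing to keep in mind, which the snippets above don’t show, is that the rendered CGImage has to come out at exactly the resolution we declared to ffmpeg, or the raw bytes won’t line up. If the view doesn’t already size itself, ImageRenderer lets you propose a size explicitly:

// Assumption: pin the renderer's output to the same resolution we passed to ffmpeg.
renderer.proposedSize = ProposedViewSize(width: CGFloat(width), height: CGFloat(height))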

The tricky part was figuring out how to get each frame. I would love to use SwiftUI animations and somehow grab a snapshot for each frame, but I’m not sure there is a trivial way of doing that while still letting SwiftUI drive the animation. So instead of going down that rabbit hole, I opted to drive the animations myself.

This means, as you can see in the View above, that I can’t just use animations directly in the views. Instead, I need to manipulate their properties, like the circle’s offset, and compute those values myself on each frame. Luckily, SwiftUI exposes a few core animation types that are separate from the view layer, so I can use those instead of letting SwiftUI drive the animation.

For example, we can use a KeyframeTimeline to combine LinearKeyframe, SpringKeyframe, and friends, and then ask the timeline for its value at a specific time or progress.

For the text, what I need is quite simple. Knowing which frame we’re on, we can interpolate how many characters of the full text should be displayed and take a prefix of the string up to that index.

let fullText = "Hello World!"
// Animate text
let tEnd = Double(fullText.count-1)
let tkf = KeyframeTimeline(initialValue: 0) {
    LinearKeyframe(tEnd, duration: 1.5)
    SpringKeyframe(tEnd/2, duration: 0.5)
    CubicKeyframe(tEnd, duration: 1)
}
let text = String(
    fullText.prefix(upTo: fullText.index(fullText.startIndex, offsetBy: Int(tkf.value(progress: progress))))
)

For the circle that we want to move across the screen, we can use the same technique: build a keyframe timeline and ask it for the value at a given progress.

// Move Circle
let startPosition = 0.0
let endPosition = Double(width) - 100
let keyframes = KeyframeTimeline(initialValue: startPosition) {
    LinearKeyframe(endPosition/2, duration: 1, timingCurve: .linear)
    LinearKeyframe(endPosition/3, duration: 1, timingCurve: .easeInOut)
    SpringKeyframe(endPosition, duration: 1)
}
let circlePosition = keyframes.value(progress: progress)

And with this, I can just make a loop that generates all the frames needed and feeds them into ffmpeg.

let totalSeconds = 3
let totalFrames = framerate * totalSeconds
for i in 0..<totalFrames {
    let progress = CGFloat(i) / CGFloat(totalFrames - 1)
    // ... generate frame and send to ffmpeg
}
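
For the record, here’s how the elided body of that loop might look once the pieces above are plugged in. It’s only a sketch: it assumes tkf and keyframes are the two timelines defined earlier and that ffmpeg has already been started with startRenderingVideo.

for i in 0..<totalFrames {
    let progress = Double(i) / Double(totalFrames - 1)

    // Sample both timelines for this frame.
    let shownCharacters = Int(tkf.value(progress: progress))
    let text = String(fullText.prefix(shownCharacters))
    let circlePosition = keyframes.value(progress: progress)

    // Render the SwiftUI view and hand the frame to ffmpeg.
    let view = ToRender(text: text, circlePosition: circlePosition, width: width, height: height)
    let renderer = ImageRenderer(content: view)
    if let cgImage = renderer.cgImage {
        ffmpeg.sendFrame(cgImage)
    }
}
ffmpeg.endRendering()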

What’s the Point?

The point was to have fun and explore the capabilities of calling binaries as if they were libraries, avoiding the integration cost that comes with source code or libraries in other languages. There’s always a trade-off, and of course, source code or library integration exists for good reasons. I haven’t come out of this thinking binary integrations are always the best solution, but now I have more knowledge and practice with them, so it’s yet another tool in my belt. And that was a goal well accomplished.

On the other hand, I have a proof of concept for creating videos from SwiftUI, which opens the door to using this technique in the future when I want to explain some things without having to do it with a full-featured video editor. Doing it in SwiftUI and code is actually faster for certain things.

And finally, I hope I’ve piqued your interest as mine was piqued! Have fun with programming!
