Low Throughput FizzBuzz


There's this fun code golf challenge to create the fastest possible FizzBuzz implementation. It's basically an exercise in optimising I/O (writing out the bytes is the most computationally expensive part of FizzBuzz), which has led to some insanely impressive submissions.

The fastest submission averaged 55 GiB/s, and included a comment saying it "[…] is future-proofed where possible to be able to run faster if the relevant processor bottleneck – L2 cache write speed – is ever removed."

Being able to write code where your bottleneck is L2 cache is basically fucking wizardry to me.

I was going to try my hand at this challenge for this post, but then I remembered something important - I'm actually a pretty shit programmer.

Here's the challenge's example answer, written in C.

#include <stdio.h>

int main() {
    for (int i = 1; i < 1000000000; i++) {
        if ((i % 3 == 0) && (i % 5 == 0)) {
            printf("FizzBuzz\n");
        } else if (i % 3 == 0) {
            printf("Fizz\n");
        } else if (i % 5 == 0) {
            printf("Buzz\n");
        } else {
            printf("%d\n", i);
        }
    }
}

Turns out, I don't know how to make this faster. It's already written in C - the lowest-level language I know how to write - and I don't know how to do any fancy memory buffer stuff, so I'm all out of ideas.

Instead let's do something that I can do, something that I specialise in: making slow-ass code.

I'll need one restriction, however. Technically, the slowest FizzBuzz could just sleep for an indefinite amount of time, but that would be no fun. I want to keep this as "reasonable" as possible, like it was something someone would actually do.

As a benchmark, the example given in C gives 235 MiB/s on my computer. Now, let's see how slow this can go :)


First of all, we need to get rid of C. A compiled language that's well known for being performant? Come on, we can do better (or worse, in this case).

I don't really know what the "slowest" programming language is, but I'm pretty sure it's either Ruby or Python, and since I know Python better, I'm going to go with that. I could find some super unknown/toy language that has not had many performance optimisations done to it, but I think that goes against the "reasonable" rule.

So, here's FizzBuzz in Python.

for i in range(1, 1000000000):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)

Running this gives me 21 MiB/s, a 91% decrease from the benchmark already!


I mentioned earlier that this challenge was an exercise in optimising I/O. Do you know what the slowest form of I/O is? Network requests! Time to make a FizzBuzz micro-service(1) :)

(1) I'm already hating this.
#
# server.py
#

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def fizzbuzz():
    i = int(request.args["i"])

    if i % 3 == 0 and i % 5 == 0:
        return "FizzBuzz"
    elif i % 3 == 0:
        return "Fizz"
    elif i % 5 == 0:
        return "Buzz"
    else:
        return str(i)

if __name__ == "__main__":
    app.run(debug=True)

#
# client.py
#

import requests

if __name__ == "__main__":
    for i in range(1, 1000000000):
        res = requests.get(f"http://127.0.0.1:5000/?i={i}")
        print(res.text)

This is now running at a measly 9 KiB/s - a further 99.9% decrease. I could make it much worse by hosting the server on the other side of the world, but I'd rather not kick a man while he's down.


To be honest, I'm already not sure what else we can do. Going from 235 MiB/s to 9 KiB/s is already huge,(2) and I don't think we'll get another step back in performance like we did with the last one.

(2) To make the difference easier to understand, let's use the same units. 235 MiB is 240640 KiB.

Let's just try adding some "reasonable" overhead. Something like… GraphQL?

Every GraphQL query goes through three phases: parsing, validation and execution.

  1. Parse: The query is parsed into an abstract syntax tree (AST).
  2. Validate: The AST is validated against the schema.
  3. Execute: The runtime walks through the AST, starting from the root of the tree, invokes resolvers, collects up results, and emits JSON.

All of these phases add computational cost on top of the network request. I don't think it will be by much, since we'll be doing very simple queries, but hopefully the slowdown will at least be noticeable.

#
# server.py
#

from ariadne import (
    QueryType, gql, make_executable_schema, graphql_sync
)
from flask import Flask, request, jsonify

type_defs = gql("""
    type Query {
        fizzbuzz(i: Int!): String!
    }
""")

query = QueryType()

@query.field("fizzbuzz")
def resolve_fizzbuzz(*_, i):
    if i % 3 == 0 and i % 5 == 0:
        return "FizzBuzz"
    elif i % 3 == 0:
        return "Fizz"
    elif i % 5 == 0:
        return "Buzz"
    else:
        return str(i)

schema = make_executable_schema(type_defs, query)
app = Flask(__name__)

@app.route("/graphql", methods=["POST"])
def graphql_server():
    data = request.get_json()

    _, result = graphql_sync(
        schema,
        data,
        context_value=request,
        debug=app.debug
    )

    return jsonify(result)


if __name__ == "__main__":
    app.run(debug=True)

#
# client.py
#

from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

transport = AIOHTTPTransport(url="http://127.0.0.1:5000/graphql")
client = Client(transport=transport, fetch_schema_from_transport=True)

query = gql("""
    query FizzBuzz($i: Int!) {
        fizzbuzz(i: $i)
    }
""")

if __name__ == "__main__":
    for i in range(1, 1000000000):
        result = client.execute(query, variable_values={"i": i})
        print(result["fizzbuzz"])

Well, that got it down to 7 KiB/s (another 22% decrease). Let's keep going.


I have an idea, though it might be stretching the term "reasonable" a bit: what if we use the second slowest form of I/O - opening and reading files? We can pre-compute FizzBuzz, save it to a text file, and read from that instead of computing each value on the fly (which would normally be much faster for such a small operation).

Seems like a good idea to me!

#
# make_fizzbuzz.py
#

with open("fizzbuzz.txt", "w+") as f:
    for i in range(1, 100000000):
        if i % 3 == 0 and i % 5 == 0:
            f.write("FizzBuzz\n")
        elif i % 3 == 0:
            f.write("Fizz\n")
        elif i % 5 == 0:
            f.write("Buzz\n")
        else:
            f.write(str(i) + "\n")

#
# server.py
#

@query.field("fizzbuzz")
def resolve_fizzbuzz(*_, i):
    with open("fizzbuzz.txt", "r") as f:
        lines = f.readlines()
        return lines[i - 1].rstrip("\n")

To keep my SSD alive, I lowered the amount calculated from 1 billion to 100 million.

This actually became too slow for pv to show anything (as in, the throughput was significantly less than a byte per second).

I'm also not a fan of this approach because its speed is determined by how big the file is: readlines() reads the entire file into memory, which scales with file size (a small file loads into memory much quicker than a large one).

I'd prefer the solution to be of O(n) complexity (or close enough to it) and not really be affected by external factors.
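For what it's worth, the whole-file load isn't forced on us - the standard library can stream to the i-th line and stop there. A sketch (not something I actually used) with itertools.islice:

```python
import itertools

def read_line(path, i):
    """Return line i (1-indexed) without reading the whole file."""
    with open(path) as f:
        # islice consumes the file iterator lazily and stops at
        # line i, so memory use is flat regardless of file size
        return next(itertools.islice(f, i - 1, i)).rstrip("\n")
```

Lookups still cost O(i) time since the file is scanned from the start each call, but memory stays constant - which is exactly why it doesn't help with the "affected by external factors" complaint above.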

Let's do something else instead.


I want to go back to the idea of adding more overhead. How about we take a page from the crypto bro playbook and make our FizzBuzz implementation cryptographically secure?

#
# server.py
#

import base64
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_OAEP

# ...

type_defs = gql("""
    type Query {
        fizzbuzz(i: Int!, public_key: String!): String!
    }
""")

# ...

@query.field("fizzbuzz")
def resolve_fizzbuzz(*_, i, public_key):
    pub_key = RSA.import_key(public_key)
    key = PKCS1_OAEP.new(pub_key)
    encrypt = (
        lambda x: base64.b64encode(
            key.encrypt(x.encode())).decode("utf-8")
    )

    if i % 3 == 0 and i % 5 == 0:
        return encrypt("FizzBuzz")
    elif i % 3 == 0:
        return encrypt("Fizz")
    elif i % 5 == 0:
        return encrypt("Buzz")
    else:
        return encrypt(str(i))

#
# client.py
#

import base64
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_OAEP

key = RSA.generate(3072)
pub_key = key.publickey().export_key().decode("utf-8")
priv_key = PKCS1_OAEP.new(key)

# ...

query = gql("""
    query FizzBuzz($i: Int!, $public_key: String!) {
        fizzbuzz(i: $i, public_key: $public_key)
    }
""")

for i in range(1, 1000000000):
    result = client.execute(
        query,
        variable_values={"i": i, "public_key": pub_key}
    )

    print(
        priv_key.decrypt(
            base64.b64decode(result['fizzbuzz'].encode())
        ).decode("utf-8")
    )

Looks like we're going down another unit of measurement because this is getting me ~500 B/s, yet another 93% decrease!


We're really starting to get diminishing returns - what else could we possibly do?

Since we've already dipped our toes into crypto, perhaps we could add the most wasteful form of computation in common use today - a blockchain.


A blockchain is an immutable linked list. That's pretty much everything you need to know about them; they're really not all that exciting or interesting.

Here is how to make an immutable linked list.

import hashlib

NUM_TX_PER_BLOCK = 69

class FizzBuzzBlock:
    def __init__(self, prev_block_hash, tx_list):
        self.prev_block_hash = prev_block_hash
        self.tx_list = tx_list

        # To ensure the same number of bytes are outputted as
        # with the other attempts, format the data correctly
        self.block_data = '\n'.join(tx_list)
        self.block_hash = (
            hashlib.sha256(self.block_data.encode()).hexdigest()
        )

class FizzBuzzBlockChain:
    def __init__(self):
        # Create chain with genesis block
        self.chain = [FizzBuzzBlock("0", ["1"])]

    def create_block(self, tx_list):
        prev_block_hash = self.chain[-1].block_hash
        block = FizzBuzzBlock(prev_block_hash, tx_list)
        self.chain.append(block)
        return block

# ...

if __name__ == "__main__":
    tx_list = []
    fizzbuzz_chain = FizzBuzzBlockChain()

    for i in range(2, 1000000000):
        result = client.execute(
            query,
            variable_values={"i": i, "public_key": pub_key}
        )

        tx_list.append(
            priv_key.decrypt(
                base64.b64decode(result['fizzbuzz'].encode())
            ).decode("utf-8")
        )

        if len(tx_list) >= NUM_TX_PER_BLOCK:
            block = fizzbuzz_chain.create_block(tx_list)
            tx_list = []
            print(block.block_data)

I've had to change the range to start counting from 2 because I pre-computed 1 to put in as the genesis block.

This put us down to 400 B/s (a 20% decrease). It will obviously vary depending on how many transactions are accepted per block, but I've put that as 69 because it seemed like a nice round number.
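As an aside, the "immutable" part is checkable: each block records a hash of the previous block's data, so a hypothetical verify_chain helper (not part of the attempt above) could detect tampering anywhere in the history:

```python
import hashlib

def verify_chain(chain):
    """Check that every block's recorded prev_block_hash still matches
    a fresh hash of the previous block's data; editing any old block
    breaks the link to its successor."""
    for prev, curr in zip(chain, chain[1:]):
        recomputed = hashlib.sha256(prev.block_data.encode()).hexdigest()
        if curr.prev_block_hash != recomputed:
            return False
    return True
```

Not that anyone is going to bother forging a FizzBuzz ledger.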


Next we will make a proof-of-work validator.

Once we've added all the transactions to the tx_list, we can start counting numbers and hashing them along the way. If we find a number whose hash fulfils a certain criterion (like having its binary representation start with twenty "1"s(3)), then we can add the block to the blockchain.

(3) I think Bitcoin checks for 19 zeros right now, but since counting is so computationally cheap, the same number is probably fine for our purposes.
# ...

WORK_TARGET = 20

def validate_work(work):
    h = hashlib.sha256(str(work).encode()).hexdigest()
    b = bin(int(h, 16))
    # bin() always starts with '0b', so skip them.
    # also check next digit to make sure it's twenty ones
    # no more, no less
    return b[2:3 + WORK_TARGET] == '1' * WORK_TARGET + '0'

# ...

class FizzBuzzBlockChain:
    def __init__(self):
        self.chain = [FizzBuzzBlock("0", ["1"])]
        self.used_work = []

    def create_block(self, tx_list, work):
        if (
            work not in self.used_work and
            validate_work(work)
        ):
            prev_block_hash = self.chain[-1].block_hash
            block = FizzBuzzBlock(prev_block_hash, tx_list)
            self.chain.append(block)
            self.used_work.append(work)
            return block
        else:
            raise Exception("Invalid work")

# ...

if __name__ == "__main__":
    # ...
    work = 0

    for i in range(2, 1000000000):
        # ...

        if len(tx_list) >= NUM_TX_PER_BLOCK:
            while True:
                work += 1
                if validate_work(work):
                    break

            block = fizzbuzz_chain.create_block(tx_list, work)
            tx_list = []
            print(block.block_data)

And now we're at 100 B/s, another 75%. Yeah, I think we're done here.


So, we went from 235 MiB/s (or 246415360 B/s) to 100 B/s. In total that's a 99.99996% decrease! And from the fastest submission - 55 GiB/s (59055800320 B/s) - that's a 99.9999998% decrease.
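Sanity-checking that arithmetic (converting MiB and GiB to bytes in steps of 1024):

```python
c_bps = 235 * 1024 * 1024      # the C benchmark, 235 MiB/s in B/s
fastest_bps = 55 * 1024 ** 3   # the record submission, 55 GiB/s in B/s
final_bps = 100                # our blockchain FizzBuzz

print(f"{(1 - final_bps / c_bps) * 100:.5f}% decrease vs the benchmark")
print(f"{(1 - final_bps / fastest_bps) * 100:.7f}% decrease vs the record")
```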

That was honestly quite a bit of work to get it this slow in a "reasonable" way, and I had to put some real effort into thinking about it.

But if I wanted to save myself the trouble, I probably could have just asked one of my coworkers to implement FizzBuzz as best they could instead, and I probably would've gotten a pretty similar result.

Actually, it might even have been slower that way.