The Pachycephalosaurus method of coding servers
My experience writing a “no knowledge” key/value server.
I told you there was a part two coming — and this is it.
Do you even recall how we find ourselves at part deux? Let me quickly remind you: In my desultory shuffle about the MDN pages, I stumbled on this particularly toxic (TOXIC) nugget regarding the WebCrypto API:
A man too stupid to pay heed, I decided to dedicate some serious time to becoming a cryptography beginner. Anyway, that’s all ancient history: see part one for the rambling intro. It involved some sort of dinosaur metaphor (side note: I can now spell Pachycephalosaurus without looking it up).
An origin story
Not many people know this, but that awesome website thegoldenmule.com? Yah, I own that. That’s me — it’s where I used to put things when I used to have time to do stuff. My personal GitHub account is named, accordingly, thegoldenmule. I have been asked many times why thegoldenmule, and at this point, tired of all the dang attention (I get so much of it), I just say “eh — it’s named after a book” and leave it at that.
This is only a half-truth, as the book in question is actually entitled The Golden Ass: a Latin book written in the second century AD. Haters will hate, but I would argue it’s also the best second-century Latin novel. I did not have the wherewithal, in my youth, to put the word “ass” in my personal domain name, though I might have the chutzpah now (if you recall from a previous post, I’m planning on getting a tattoo, which is pretty edgy).
This book is not about bronzed buttocks.
The plot revolves around the protagonist’s curiosity (curiositas) and insatiable desire to see and practice magic. While trying to perform a spell to transform into a bird, he is accidentally transformed into an ass. This leads to a long journey, literal and metaphorical, filled with inset tales.
Do you get it now? I am the protagonist, technology is magic, and according to an informal poll of my coworkers, at this point in my career I think I may have gone full-ass. So when I see a piece of technology that’s so complex, so mysterious, so magical that I’m not even allowed to use it — well that really gets my… caboose in gear.
Now — what to actually make with this forbidden API?
This doesn’t need an elaborate narrative explaining my reasoning (we’re about 350 words into this post already and — holy crap is that scrollbar accurate?!) — suffice it to say that I am fascinated by the inner cryptographic workings of Bitcoin, my personal emails are stored in encrypted form on a server in Switzerland, and I have a pretty serious hatred of Evernote for being both the worst software product I regularly pay for and the best of its kind on the market.
What do you get when you put these curiosities together?
I’m a good little programmer so I think in terms of the composition of simple pieces: a simple no-knowledge server, a simple API, and a simple client.
We’ll just scratch the server itch in this post.
Why do we need no-knowledge servers?
Storing data on a server is a need common to most any modern application. We often spin off a little microservice for user data storage (just store some JSON blobs!) or maybe we use Parse or Firebase. A common principle when building these services is that a server should never trust clients. This is, I would argue, the only realistic approach. All clients are hacked all the time, my friend, so keep those filthy, unwashed client hands off of my nice things.
We are tempted to think that this approach works without issue— well, unless you count the Snowden Papers as an issue. Or, uh, Cambridge Analytica. Or, hold on, let me just check https://haveibeenpwned.com/ real quick:
Oh okay so just three hacked accounts for every Internet-using man, woman, and child on Earth. I’m also reading that Gmail can and does read my email? Wait, Dropbox regularly reads my files and also lets the government take a peek whenever it wants? Reminder that as of 2017, rando Evernote employees can read your private notes without any sort of clear process or reason as to why.
Perhaps we should add a second, uh, angrier principle: a client should never trust (pardon my language) mother-flipping-lazy-under-trained-brogrammers or tech-companies-beholden-to-shareholder-profit-margins with their data.
So clients shouldn’t trust servers and servers shouldn’t trust clients. This makes my simple “user data microservice” a bit more complicated.
Wait, what is no-knowledge?
“No Knowledge” does not mean Zero Knowledge. I know, this statement sounds mathematically suspect, to say the least. Zero-knowledge proofs are very specifically defined constructs, whereas “no knowledge” has no specifically defined academic meaning. I am borrowing the term from the company SpiderOak, which offers a no-knowledge Dropbox (among other things). I’ve followed them for quite a while and have been very interested in their security (but unfortunately very unimpressed with the user experience).
From their blog:
No Knowledge means we give you complete privacy of your data — what we’ve been doing from the start. Because of the way we build our products with end-to-end encryption, we have No Knowledge of the names or content of your files. Even if we wanted to, we can’t see what you are storing or sharing, nor the conversations you’re having. Your files are encrypted before they leave your device and in-transit. Only you have the key on your device to decrypt them. Your data is completely safe from our sysadmins, your own sysadmins, hackers, a blind warrantless subpoena, or any threat.
I also love this little addendum: “…this means we can never reset your password. And a good rule of thumb is that any company that can reset your password could potentially access or read your data because they have the key to unlock it.”
In short, a secure user data microservice should have two properties:
- It should not be able to read the data it stores.
- It should know as little as possible about who is storing data.
How nk approaches this problem
Thankfully there are smarter people than me running around.
It turns out that we can do this in a straightforward way by inventing literally nothing new. When you find yourself beating your head against some sort of difficult programming challenge, take solace in knowing that it was probably solved at some point in the 1960s by someone with an interesting last name.
Nk requires nothing but a properly padded public RSA key to create an account.
Nk then generates a unique user id, stores it next to the public key, and returns the user id.
Lucky for clients, public keys are designed to be — well, public. So when nk is eventually compromised, the public key is the only piece of unencrypted PII an attacker would have. No emails, no improperly salted passwords, no plaintext user data. Nk doesn’t even store timestamps.
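For the code-minded, account creation as described above can be sketched in a few lines. This is a minimal sketch, not nk's actual implementation: it assumes Node's built-in crypto module, a PEM-encoded key, and an in-memory Map, and the names users and createUser are made up for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory store: user id -> public key (PEM). Nothing else is kept.
const users = new Map();

// Create an account from a public key alone and return the new user id.
function createUser(publicKeyPem) {
  // createPublicKey throws if the input cannot be parsed as a public key.
  const key = crypto.createPublicKey(publicKeyPem);
  if (key.asymmetricKeyType !== 'rsa') {
    throw new Error('expected an RSA public key');
  }
  const userId = crypto.randomUUID();
  users.set(userId, publicKeyPem);
  return userId;
}

// Usage: the client generates a key pair locally and shares only the public half.
const { publicKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const publicKeyPem = publicKey.export({ type: 'spki', format: 'pem' });
const userId = createUser(publicKeyPem);
```

The only thing the store ever holds is the id and the public key, which is the whole point.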
Once a client has created an id, they can immediately store data. They do this by specifying a key (i.e. a name), a payload, and a signature for the payload. The client may, of course, choose to do all of this in plaintext, but that would be silly because, as I believe we’ve already covered, we don’t trust servers. Let me check (reading up the page a bit): yep, we already covered that bit.
Don’t. Trust. Servers.
The nk-js “library” creates and uses a symmetric encryption key to encrypt the payload before signing and sending. In the example below, the client has chosen the plaintext key, “foo”. This, of course, may be encrypted as well; that is up to the client.
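Here is a rough sketch of what encrypt-then-sign might look like on the client. It is not the nk-js source: it assumes Node's built-in crypto module, AES-256-GCM for the symmetric part, and RSA with SHA-256 for the signature. The request field names and the helpers encryptPayload and decryptPayload are hypothetical.

```javascript
const crypto = require('crypto');

// Client-side secrets; neither the private key nor the data key leaves the device.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const dataKey = crypto.randomBytes(32); // symmetric AES-256 key

// Encrypt a string with AES-256-GCM and pack iv + auth tag + ciphertext together.
function encryptPayload(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh nonce for every message
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Reverse of encryptPayload; throws if the bytes were tampered with.
function decryptPayload(payload, key) {
  const iv = payload.subarray(0, 12);
  const tag = payload.subarray(12, 28);
  const ciphertext = payload.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

// Build a (hypothetical) request body: a name, the encrypted payload, a signature.
const payload = encryptPayload('my secret note', dataKey);
const signature = crypto.sign('sha256', payload, privateKey);
const request = {
  key: 'foo',
  payload: payload.toString('base64'),
  signature: signature.toString('base64'),
};
```

Note that the signature covers the ciphertext, so the server can check it without ever seeing the plaintext.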
There is no need for an authorization header here, as the server can simply look up the public key of the user and use it to verify two things:
- That the signature matches the payload and thus that the payload was not modified by a third party.
- That the signature was created by the owner of the private key.
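The two checks above collapse into a single verify call on the server. A minimal sketch, assuming Node's built-in crypto module, base64-encoded fields, and a PEM public key looked up by user id; verifyWrite is a made-up name, not part of nk.

```javascript
const crypto = require('crypto');

// Returns true only if signatureB64 is a valid signature over payloadB64
// made with the private key that matches publicKeyPem.
function verifyWrite(publicKeyPem, payloadB64, signatureB64) {
  const payload = Buffer.from(payloadB64, 'base64');
  const signature = Buffer.from(signatureB64, 'base64');
  return crypto.verify('sha256', payload, publicKeyPem, signature);
}

// Usage with a throwaway key pair standing in for a real client.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const publicKeyPem = publicKey.export({ type: 'spki', format: 'pem' });
const payload = Buffer.from('opaque encrypted bytes');
const signatureB64 = crypto.sign('sha256', payload, privateKey).toString('base64');

const accepted = verifyWrite(publicKeyPem, payload.toString('base64'), signatureB64);
const tampered = verifyWrite(publicKeyPem, Buffer.from('something else').toString('base64'), signatureB64);
```

A modified payload and a signature from the wrong key fail in exactly the same way, so one boolean covers both bullets.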
Updating data works much the same way; retrieving data, however, is a tad different. How do we verify the owner of the private key is requesting the data if the client is not “logged in”?
To do this, nk uses a simple proof system.
First, clients request a new proof from the server.
The server then generates and returns a random payload.
From here, the client simply signs this payload with their private key and uses both the proof and the signature in the header of the GET request to retrieve their data.
This provides proof that the client owns the private key. A similar endpoint exists for retrieving a list of keys. The nk-js library transparently requests and provides proofs for you behind the scenes.
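The whole exchange fits in a short sketch. Assumptions to keep in mind: this uses Node's built-in crypto module, the function names are invented for illustration, and making each proof single-use is my own addition to show replay protection rather than something described above.

```javascript
const crypto = require('crypto');

// Hypothetical server state: proofs that have been issued but not yet used.
const pendingProofs = new Set();

// Server: issue a random payload for the client to sign.
function issueProof() {
  const proof = crypto.randomBytes(32).toString('base64');
  pendingProofs.add(proof);
  return proof;
}

// Client: sign whatever the server handed over.
function signProof(privateKey, proof) {
  return crypto.sign('sha256', Buffer.from(proof, 'base64'), privateKey).toString('base64');
}

// Server: accept a read only if we issued this proof (and it is unused) and
// the signature over it verifies against the stored public key.
function checkProof(publicKeyPem, proof, signatureB64) {
  if (!pendingProofs.delete(proof)) return false; // unknown or already used
  return crypto.verify(
    'sha256',
    Buffer.from(proof, 'base64'),
    publicKeyPem,
    Buffer.from(signatureB64, 'base64')
  );
}

// Usage: one round trip before the GET.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const publicKeyPem = publicKey.export({ type: 'spki', format: 'pem' });
const proof = issueProof();
const proofSignature = signProof(privateKey, proof);
```

In a real request, proof and proofSignature would travel in the headers of the GET.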
Now the observant among you may have noticed an attack surface here: the client trusts the proof the server gives it to sign. “B-b-b-but you said not to trust servers!!” I can hear you blabbering on. Quit shivering so much you big baby.
This is apparently very dangerous, as the server could choose payloads that divulge information about the private key when signed. A future version of nk (at least in my head) will split the signed payload between client and server: the client chooses one portion of the proof payload and the server chooses the other. Thus, neither client nor server has complete control of the payload the client is asked to sign. In addition, randomized signature padding is apparently “resistant” to this attack (RSA-OAEP is the encryption scheme; its signature counterpart is RSA-PSS), so I should probably switch at some point.
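Since the split-payload idea only exists in my head, here is one way it could look. Treat it as a sketch under assumptions: Node's built-in crypto module, 16 random bytes from each side, and hypothetical function names.

```javascript
const crypto = require('crypto');

// Server: contribute one half of the payload.
function serverPart() {
  return crypto.randomBytes(16);
}

// Client: add its own random half and sign the concatenation, so the server
// never fully controls the bytes being signed.
function signSplitProof(privateKey, serverHalf) {
  const clientHalf = crypto.randomBytes(16);
  const message = Buffer.concat([serverHalf, clientHalf]);
  const signature = crypto.sign('sha256', message, privateKey);
  return { clientHalf, signature };
}

// Server: rebuild the message from both halves and verify the signature.
function verifySplitProof(publicKey, serverHalf, clientHalf, signature) {
  const message = Buffer.concat([serverHalf, clientHalf]);
  return crypto.verify('sha256', message, publicKey, signature);
}

// Usage: the server sends its half, the client answers with its half plus a signature.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const serverHalf = serverPart();
const { clientHalf, signature } = signSplitProof(privateKey, serverHalf);
const valid = verifySplitProof(publicKey, serverHalf, clientHalf, signature);
```

The server still has to remember which half it issued, otherwise a client could pair its signature with a half of its own choosing.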
In Conclusion, Nk: Don’t use it
Yes, that’s probably a good idea — or at least it would be if this sort of thing existed already. Is this really the only open-source no-knowledge server? This is a genuine question, faithful readers. It can’t be, can it? But daggumit I can’t find anything that fills the same need.
In a future post I’d like to cover the nk client. Turns out, it was a lot of fun to write and yes, believe it or not, making a note-taking app that works better than Evernote really is just a weekend away.
We’ll see. I write a lot of “future post” checks that my fingers can’t cash.