attacking bad crypto
arc 1, entry 0: introduction
mood: pedagogical
one topic that interests me – as anyone who knows me would attest, while protesting that "interest" is an understatement – is cryptography. there's something truly magical about being able to take words, do some weird math at them, and then be completely certain that no one can read them without the key.
but cryptography is also often misunderstood: by governments, who want a backdoor only they can crack; by devs, reimplementing what already exists but worse; by users, reading dishonest marketing materials (but i repeat myself). so i thought i'd begin explaining it.
the trouble is that cryptography is, like most topics, infinitely complex, and there's not really one good place to start. so i thought i'd start with the bit that interests me most: cryptanalysis! the study of understanding, and usually breaking, cryptographic algorithms.
this series should have more or less no prerequisite knowledge. most of that is because today's blogpost is explaining all of it! so without further ado, i present attacking bad crypto, arc 1: measures of security, entry 0: introduction!
data according to computers
the first thing to understand is that at the lowest levels of computing – and thus to any mathematics designed to be generic, like cryptography – there's no such thing as text, formatting, or images. there's just raw numbers. specifically:
- bit
- a single binary digit, i.e. exactly a 1 or a 0, on or off, true or false.
- byte
- 8 bits, i.e. any number from 0 to 255. there are 256 combinations of 8 on-and-off settings, but one of them has everything "off", which counts as zero, so the largest number is 255 rather than 256.
- word
- a group of bytes, specifically enough to match the "bit-ness" of the cpu. if you've heard of "32-bit" and "64-bit" pcs, that's what this refers to. these are typically used to represent larger numbers; on a 32-bit computer that means anything between 0 and 4,294,967,295 (2 to the 32, minus 1, for the same reason a byte stops at 255).
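for the python-curious, here's a minimal sketch of where those ranges come from. the names `BITS_PER_BYTE` and `max_unsigned` are made up for illustration, and it assumes numbers that start counting at 0.

```python
# a sketch of the value ranges above. the names are made up for illustration.
BITS_PER_BYTE = 8

def max_unsigned(bits: int) -> int:
    """largest number that fits in `bits` binary digits, counting from 0."""
    return 2 ** bits - 1

print(max_unsigned(1))              # 1
print(max_unsigned(BITS_PER_BYTE))  # 255
print(max_unsigned(32))             # 4294967295

# the same byte, written out as 8 on-and-off settings
print(format(255, "08b"))           # 11111111
print(format(0, "08b"))             # 00000000
```

the "minus 1" is there because zero uses up one of the available combinations.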
everything else computers can do, even other extremely low-level details like managing memory, is done in terms of these. luckily, though, there are only two more terms we'll need to worry about for this series:
- buffer
- several bytes, in order. for now, the length of the buffer is just intrinsically part of it – but later arcs will explore the consequences of the much more complex reality!
- encoding
- a method for storing data we care about as bytes. for this series, we'll be talking about text, and specifically ascii, since it lets me gloss over a lot of non-cryptographic details – but there are other encodings, including for other types of data, like images.
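as a rough sketch of those two terms together: python's built-in `str.encode` turns text into a buffer of bytes, and `bytes.decode` turns it back. the message itself is just an example i picked.

```python
# a sketch: ascii text -> buffer of bytes -> text again.
message = "hi!"  # example text, chosen arbitrarily

buffer = message.encode("ascii")  # encoding: text becomes bytes
print(list(buffer))               # [104, 105, 33]
print(len(buffer))                # 3 (the buffer knows its own length)
print(buffer.decode("ascii"))     # hi!
```

each character became exactly one number between 0 and 255, which is to say one byte.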
ok, i know that was a lot if you're unfamiliar with computers, but if it makes you feel better: this is also information a lot of devs don't know. modern programming abstracts a lot of it away, which is both easier and more reliable. after all, dealing with all these details constantly means more ways you can make mistakes, and– well. we'll get to the consequences of those mistakes some other time. suffice to say that it's good that we build systems to avoid them.
cryptography
cryptography is built on computers, so it mostly works the same as them, e.g. working in terms of bits or bytes or words. but to avoid ambiguity between similar concepts at different levels, and because of the new concepts, some new words are needed.
- plaintext
- the message you want to secure with cryptography. (despite the name, plaintext can include non-text stuff – by the time it gets to the cryptography, it's all bytes regardless.)
- ciphertext
- a plaintext that's been secured with cryptography.
- encryption
- the process of converting plaintext into ciphertext. the reverse is decryption.
- key
- a parameter to an encryption algorithm, used such that the only way to decrypt is by having the correct key.
- iv/nonce
- another parameter to most encryption algorithms which serves to add some controlled randomness.
- block
- a fixed-length group of bits, where the actual length is decided by the cryptographic algorithm.
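to make the vocabulary concrete, here's a deliberately toy sketch. it is not secure, and it isn't one of the algorithms this series covers: it just xors each byte with a repeating key. every name in it (`toy_encrypt`, `toy_decrypt`, `split_blocks`) is hypothetical, and it leaves out the iv/nonce entirely.

```python
# a toy cipher to illustrate the vocabulary. NOT secure, and every
# name here is hypothetical, made up for this example.

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """xor each plaintext byte with the key, repeating the key as needed."""
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

def toy_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """xor undoes itself, so decryption is the same operation."""
    return toy_encrypt(ciphertext, key)

def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """cut a buffer into blocks of block_size bytes (the last may be shorter here)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

plaintext = "attack at dawn".encode("ascii")
key = b"\x13\x37"  # arbitrary example key

ciphertext = toy_encrypt(plaintext, key)
assert ciphertext != plaintext                            # the message got scrambled
assert toy_decrypt(ciphertext, key) == plaintext          # right key: message back
assert toy_decrypt(ciphertext, b"\x00\x01") != plaintext  # wrong key: garbage

print(split_blocks(plaintext, 4))  # [b'atta', b'ck a', b't da', b'wn']
```

note that the same key is used in both directions here.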
one important bit of detail i'm going to gloss over for arc 1 is the precise nature of the key. if you're curious, arc 1 explores what's called "symmetric key" cryptography, where the same key is used for encryption and decryption. predictably there's also "asymmetric key" cryptography, where you use different keys to encrypt and decrypt.
that said, this intro to cryptography is already complex enough, so asymmetric keys will have to wait until a later arc.
python
you don't actually need to know any python to read the blogposts. i will use it occasionally, when explaining the algorithms i'm going to attack, but the exact same information will be repeated in the text – the python's just there for folks who understand it to be able to read a little easier.
that said, my sneaky secret hope is that by reading the code, then the explanation of what it does, you'll start to absorb some python coding by osmosis.
math
like python, you don't need to know any math for this, though it might simplify things. that said, i do use mathematical terms fairly often – they're just also terms you'd probably understand as a layperson. words like "probability" and "x to the y", or conventions like using x and y as placeholders.
ready?
first: the point isn't for you to memorize all this and then come back in a month and still know it. it's to have this post as a handy reference for everything i talk about going forward.
second: the point of this entire series isn't to be memorized, either. the hope is that it'll be educational regardless of your existing knowledge, and that it'll provide you the introduction i got to crypto: slightly bewildering, but utterly fascinating, something you'll bounce off of several times before finally sticking the landing and getting properly into it.