A Deep Dive into iOS Code Signing
Apple’s code signing is a complex beast, consisting of several different components, each serving its own unique purpose. When I first started working on Meteorite, I found that while there were several resources detailing the internals of code signing, none of them had the amount of detail I required to reimplement it. This post is the result of several months of research, experimentation and head-scratching, and it hopes to be the most comprehensive resource on this topic.
Introduction
Generally speaking, any code which runs on an iOS device must have a chain-of-trust leading up to Apple’s root certificate authority. This chain-of-trust is embedded into the Mach-O executable itself when the executable is signed. When talking about an iOS application, each resource used by it (be it an image, font or a library) must also be signed along with the main executable.
However, it is rather hard (and inefficient) to stuff all of the aforementioned information into the executable itself. The application might not need access to all of its embedded resources at launch time, so embedding their signature into the main executable is a waste of memory. Therefore, a code signature consists of two major components: the Application Seal (AKA the Resource Directory), and the Embedded Signature.
The Resource Directory
The resource directory contains the hashes of every non-executable resource embedded in an application. These hashes are stored in base64 notation in a file named _CodeSignature/CodeResources.
The CodeResources file is a standard XML Property List, usually containing two or three dictionaries in its root.
The files dictionary
This dictionary contains the SHA-1 hash of every file present in this bundle. An entry is either the bare hash or a dictionary holding the hash alongside a key named optional, which signifies that the absence of the corresponding file will not invalidate the signature. The optional key is commonly seen with localisation files, as the application can still execute successfully without them.
Here’s an example of this dictionary (taken from Xcode’s own code signature):
<key>files</key>
<dict>
    <key>Resources/Acknowledgments.pdf</key>
    <data>
    u6KniRItunp3N7x9a74lXnQ0QvU=
    </data>
    <key>Resources/AppDataDocument.icns</key>
    <data>
    vgn87DyNdOmduzJRKxkK6I73E6Y=
    </data>
    <key>Resources/English.lproj/InfoPlist.strings</key>
    <dict>
        <key>hash</key>
        <data>
        zlg6+gKspymivP2k5j8UZpRK3a4=
        </data>
        <key>optional</key>
        <true/>
    </dict>
</dict>
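To make the two entry shapes concrete, here is a minimal Python sketch of checking a single resource against the files dictionary. It assumes the bare form is the raw SHA-1 digest and the dictionary form carries it under hash, as in the example above. check_resource is a hypothetical helper of my own, not part of any Apple tooling, and the sample plist is built in memory purely for illustration.

```python
import hashlib
import plistlib

def check_resource(files, path, data):
    """Return True if `data` satisfies the `files` entry for `path`.

    `data` is the file's bytes, or None if the file is missing from the bundle.
    """
    entry = files[path]
    # An entry is either a bare digest or a dict with `hash` and `optional`.
    if isinstance(entry, dict):
        digest, optional = entry["hash"], entry.get("optional", False)
    else:
        digest, optional = entry, False
    if data is None:
        return optional  # only optional resources may be absent
    return hashlib.sha1(data).digest() == digest

# Build a tiny CodeResources-like plist in memory and read it back.
sample = plistlib.dumps({"files": {
    "a.txt": hashlib.sha1(b"hello").digest(),
    "en.lproj/x.strings": {
        "hash": hashlib.sha1(b"hi").digest(),
        "optional": True,
    },
}})
files = plistlib.loads(sample)["files"]
```

plistlib turns each <data> element into a bytes object, so the base64 decoding happens for free.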
The files2 dictionary
Much like the files dictionary, this also contains hashes of embedded resources. Unlike its predecessor, however, it records SHA-256 hashes under the hash2 key (as in the example below), which may appear alongside the SHA-1 hash.
An example, again from Xcode’s signature:
<key>Resources/Assets.car</key>
<dict>
    <key>hash2</key>
    <data>
    JT7UBjU9e1KoXF2fpnAP0M0JLZEEIlptpyZGojk9MTo=
    </data>
</dict>
<key>Resources/English.lproj/InfoPlist.strings</key>
<dict>
    <key>hash2</key>
    <data>
    mvfcsBoInnedi0i+1lqoIKv0MnbY6ee+PELmExwpuaA=
    </data>
    <key>optional</key>
    <true/>
</dict>
As is plainly obvious, the files2 dictionary was added to the codesigning framework well after its inception, to future-proof it against any weaknesses in the SHA-1 hash.
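As a rough illustration of what a files2 entry holds, the sketch below builds one for an in-memory resource. files2_entry is a hypothetical name, and the entry shape (a SHA-256 digest under hash2 plus an optional flag) is simply what the example above shows. Real signatures are also governed by resource rules that this sketch ignores.

```python
import base64
import hashlib

def files2_entry(data, optional=False):
    """Build a files2-style entry: the SHA-256 digest stored under `hash2`."""
    entry = {"hash2": hashlib.sha256(data).digest()}
    if optional:
        entry["optional"] = True
    return entry

entry = files2_entry(b"example resource bytes", optional=True)
# The <data> element in the plist is the base64 form of the raw digest.
encoded = base64.b64encode(entry["hash2"]).decode("ascii")
```

A 32-byte SHA-256 digest encodes to 44 base64 characters and a 20-byte SHA-1 digest to 28, which is a quick way to tell the two apart when skimming a CodeResources file.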
So far, we have guaranteed that the resources used by the application haven’t been tampered with. However, it would be rather useless to go through all this effort if an attacker could simply modify the executable itself. Enter, the Embedded Code Signature.
The Embedded Code Signature
This part of the write-up might require some basic knowledge of the structure of a Mach-O binary. While I’ve made the best possible attempt to explain each of these underlying concepts, I would still recommend that the reader take a cursory look at this excellent article on parsing Mach-O files in case my explanation doesn’t quite cut it. For brevity’s sake, we are only considering thin Mach-O binaries, i.e., those which contain code for only a single ISA.
A Mach-O executable starts with a simple header containing a magic value, some metadata about the file, the target ISA and most importantly, the number of load commands.
A Load Command can be considered as an instruction to the dynamic linker. It can describe anything from how an executable is to be loaded (LC_SEGMENT) and which dylibs it depends on (LC_LOAD_DYLIB) to where the code signature lives (LC_CODE_SIGNATURE). The otool utility provides a convenient way to glance at the load commands within a binary.
Umangs-MacBook-Pro:~ umang$ otool -l `which ls`
/bin/ls:
Mach header
magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
0xfeedfacf 16777223 3 0x80 2 18 1816 0x00200085
Load command 0
cmd LC_SEGMENT_64
cmdsize 72
segname __PAGEZERO
vmaddr 0x0000000000000000
vmsize 0x0000000100000000
fileoff 0
filesize 0
maxprot 0x00000000
initprot 0x00000000
nsects 0
flags 0x0
(...)
Load command 17
cmd LC_CODE_SIGNATURE
cmdsize 16
dataoff 29184
datasize 9520
However, we are only concerned with LC_CODE_SIGNATURE for the purposes of this post. As we can see, the code signature sits at an offset of 29184 bytes from the Mach-O header and is 9520 bytes long.
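The same lookup can be sketched in a few lines of Python. This assumes a thin, little-endian, 64-bit Mach-O (as in the otool output above) and uses the LC_CODE_SIGNATURE value of 0x1d from Apple's mach-o/loader.h. find_code_signature is a hypothetical helper, and the sample bytes are synthetic rather than a real binary.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_CODE_SIGNATURE = 0x1D

def find_code_signature(macho):
    """Return (dataoff, datasize) from LC_CODE_SIGNATURE, or None if absent."""
    # mach_header_64: magic, cputype, cpusubtype, filetype, ncmds,
    # sizeofcmds, flags, reserved (eight little-endian 32-bit fields).
    magic, _, _, _, ncmds, _, _, _ = struct.unpack_from("<8I", macho, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O")
    offset = 32  # load commands start right after the 32-byte header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", macho, offset)
        if cmd == LC_CODE_SIGNATURE:
            # linkedit_data_command: cmd, cmdsize, dataoff, datasize
            return struct.unpack_from("<2I", macho, offset + 8)
        offset += cmdsize
    return None

# Synthetic header carrying a single 16-byte LC_CODE_SIGNATURE command.
header = struct.pack("<8I", MH_MAGIC_64, 0x01000007, 3, 2, 1, 16, 0, 0)
command = struct.pack("<4I", LC_CODE_SIGNATURE, 16, 29184, 9520)
```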
If we seek to 29184 bytes beyond the Mach-O header, we find the Super Blob, which consists of a short header followed by several Blob Indices. The C representation of this structure is as follows:
typedef struct __BlobIndex {
    uint32_t type;          /* type of entry */
    uint32_t offset;        /* offset of entry */
} CS_BlobIndex;

typedef struct __SuperBlob {
    uint32_t magic;         /* magic number */
    uint32_t length;        /* total length of SuperBlob */
    uint32_t count;         /* number of index entries following */
    CS_BlobIndex index[];   /* (count) entries */
    /* followed by Blobs in no particular order as indicated by offsets in index */
} CS_SuperBlob;
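Here is a minimal sketch of reading that index in Python. Two details are not spelled out in the structs above and come from Apple's published headers: every field in the code signature is big-endian (unlike the rest of the Mach-O), and the embedded signature's magic is 0xfade0cc0. parse_superblob_index is a hypothetical name and the sample is synthetic.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0

def parse_superblob_index(sig):
    """Return [(type, offset), ...]; offsets are relative to the SuperBlob start."""
    magic, length, count = struct.unpack_from(">3I", sig, 0)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError("unexpected SuperBlob magic")
    if length > len(sig):
        raise ValueError("SuperBlob is truncated")
    # Each CS_BlobIndex is two 32-bit fields, right after the 12-byte header.
    return [struct.unpack_from(">2I", sig, 12 + 8 * i) for i in range(count)]

# Synthetic SuperBlob: header plus two index entries and no blob bodies.
sample = struct.pack(">3I", CSMAGIC_EMBEDDED_SIGNATURE, 28, 2)
sample += struct.pack(">4I", 0, 28, 0x10000, 28)
index = parse_superblob_index(sample)
```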
This brings us to a very simple yet important question: what is a blob?
The Codesigning Blobs
The Mach-O embedded code signature consists of several logical components, each of which has its own unique purpose. Each of these components is represented as a blob and holds whatever information it requires to serve its purpose.
Every blob follows the general pattern of a 32-bit magic value, a 32-bit length of the blob (including the lengths of the magic value and the length field itself) and then arbitrary data whose layout depends on the type of the blob.
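That pattern is simple enough to capture in one small function. The sketch below assumes big-endian fields (as used throughout the code signature); read_blob is a hypothetical helper and the sample bytes are made up.

```python
import struct

def read_blob(buf, offset=0):
    """Return (magic, payload) for the blob starting at `offset` in `buf`."""
    magic, length = struct.unpack_from(">2I", buf, offset)
    # `length` counts the 8 header bytes too, so it can never be below 8.
    if length < 8 or offset + length > len(buf):
        raise ValueError("bad blob length")
    return magic, buf[offset + 8 : offset + length]

# A 12-byte blob (8-byte header + 4-byte payload) followed by unrelated bytes.
sample = struct.pack(">2I", 0xFADE7171, 12) + b"abcd" + b"trailing"
magic, payload = read_blob(sample)
```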
There are four blobs of practical importance, which can be found in most Mach-O binaries:
- The Entitlements Blob,
- The Requirements Blob,
- The Code Directory, and
- The Signature Blob.
The Entitlements Blob
The Entitlements blob contains the codesigning entitlements available to this binary and has the same format as described above, with the Plist containing the entitlements immediately following the length field. Entitlements deserve a blog post of their own; in the meantime, Apple’s own documentation is a good starting point for the curious reader.
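A round trip through this blob might look like the following sketch. The magic value 0xfade7171 comes from Apple's published headers rather than from this post, the function names are hypothetical, and the entitlement values are placeholders.

```python
import plistlib
import struct

CSMAGIC_EMBEDDED_ENTITLEMENTS = 0xFADE7171

def build_entitlements_blob(entitlements):
    """Wrap an entitlements dict as magic + length + XML plist."""
    body = plistlib.dumps(entitlements)
    return struct.pack(">2I", CSMAGIC_EMBEDDED_ENTITLEMENTS, 8 + len(body)) + body

def parse_entitlements_blob(blob):
    """Return the entitlements dict stored in an entitlements blob."""
    magic, length = struct.unpack_from(">2I", blob, 0)
    if magic != CSMAGIC_EMBEDDED_ENTITLEMENTS:
        raise ValueError("not an entitlements blob")
    # The plist starts right after the 8-byte header and runs to `length`.
    return plistlib.loads(blob[8:length])

blob = build_entitlements_blob({
    "application-identifier": "TEAMID.com.example.app",  # placeholder value
    "get-task-allow": True,
})
parsed = parse_entitlements_blob(blob)
```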
The Requirements Blob
The Requirements blob is a collection of code-signing requirements, which specify the additional conditions required for the signature to be valid. These requirements are laid out in Reverse Polish Notation, and most operands are represented as a single byte.
The most common requirement set is simply OpTrue, OpAnd, OpIdent(cs_ident), OpAppleGenericAnchor, which would mean that the identifier of the binary must be cs_ident and the chain of trust must lead to Apple’s generic codesigning anchor.
I had figured all of this out the hard way by reverse-engineering codesign outputs. However (to my utter disappointment), all of this is publicly available in source-code form at Apple’s libsecurity_codesigning repo, a mirror of which lives here.
While implementing the requirement serialiser for Meteorite, I remember leaving the following comment:
// For whatever reason, the string is padded with null bytes so
// that it's final length is divisible by 4. Please don't ask
// me how I know this, because the answer is pain.
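For the curious, the padding rule from that comment can be sketched as follows. The 32-bit big-endian length prefix is my recollection of how libsecurity_codesigning writes strings and should be treated as an assumption here, as should the hypothetical name pack_padded_string. This is not Meteorite's actual serialiser.

```python
import struct

def pack_padded_string(text):
    """Length-prefixed string, padded with NUL bytes to a multiple of 4."""
    raw = text.encode("utf-8")
    padding = -len(raw) % 4  # 0..3 bytes needed to reach the next multiple of 4
    # Assumption: the prefix holds the unpadded length as a big-endian uint32.
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding

packed = pack_padded_string("abcde")
```

Note that the length field records the unpadded length, so a reader has to round up on its own to find the next field.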
The Code Directory
So far, we’ve accounted for a few things: entitlements, requirements and resource directories — however, all of that is useless if the actual code can be modified. Therefore, the code directory contains cryptographic hashes of the actual instructions contained in a binary.
Here’s the code directory represented as a C structure.
typedef struct __CodeDirectory {
    uint32_t magic;         /* magic number (CSMAGIC_CODEDIRECTORY) */
    uint32_t length;        /* total length of CodeDirectory blob */
    uint32_t version;       /* compatibility version */
    uint32_t flags;         /* setup and mode flags */
    uint32_t hashOffset;    /* offset of hash slot element at index zero */
    uint32_t identOffset;   /* offset of identifier string */
    uint32_t nSpecialSlots; /* number of special hash slots */
    uint32_t nCodeSlots;    /* number of ordinary (code) hash slots */
    uint32_t codeLimit;     /* limit to main image signature range */
    uint8_t hashSize;       /* size of each hash in bytes */
    uint8_t hashType;       /* type of hash (cdHashType* constants) */
    uint8_t platform;       /* platform identifier; zero if not platform binary */
    uint8_t pageSize;       /* log2(page size in bytes); 0 => infinite */
    uint32_t spare2;        /* unused (must be zero) */
    /* Version 0x20100 */
    uint32_t scatterOffset; /* offset of optional scatter vector */
    /* Version 0x20200 */
    uint32_t teamOffset;    /* offset of optional team identifier */
    /* followed by dynamic content as located by offset fields above */
} CS_CodeDirectory;
Looking at this, a few things are evident right off the bat: the format is extensible (as seen by the additions after spare2) and backwards compatible. Also, the actual hashes can be in a variety of formats, as specified by the hashType (in practice, I’ve seen SHA-1, SHA-256 and SHA-384).
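To show how the version-gated fields play out when parsing, here is a hedged Python sketch of reading the header. It assumes big-endian fields, a magic of 0xfade0c02 (from Apple's published headers) and a NUL-terminated identifier string at identOffset. The function names and the sample bytes are mine.

```python
import struct
from collections import namedtuple

CSMAGIC_CODEDIRECTORY = 0xFADE0C02

CodeDirectory = namedtuple(
    "CodeDirectory",
    "magic length version flags hashOffset identOffset nSpecialSlots "
    "nCodeSlots codeLimit hashSize hashType platform pageSize spare2 "
    "scatterOffset teamOffset",
)

def parse_code_directory(blob):
    """Parse the fixed header; fields newer than the blob's version are None."""
    fields = list(struct.unpack_from(">9I4BI", blob, 0))  # first 44 bytes
    if fields[0] != CSMAGIC_CODEDIRECTORY:
        raise ValueError("not a CodeDirectory blob")
    version = fields[2]
    scatter = struct.unpack_from(">I", blob, 44)[0] if version >= 0x20100 else None
    team = struct.unpack_from(">I", blob, 48)[0] if version >= 0x20200 else None
    return CodeDirectory(*fields, scatter, team)

def read_identifier(blob, cd):
    """Read the NUL-terminated identifier string at identOffset."""
    end = blob.index(b"\x00", cd.identOffset)
    return blob[cd.identOffset:end].decode("utf-8")

# Synthetic version-0x20200 directory with an identifier and no hash slots.
sample = struct.pack(">9I4BI2I", CSMAGIC_CODEDIRECTORY, 68, 0x20200, 0,
                     68, 52, 0, 0, 0, 32, 2, 0, 12, 0, 0, 0)
sample += b"com.example.app\x00"
cd = parse_code_directory(sample)
```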
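Moving forward, at hashOffset bytes beyond the start of the code directory header, there’s a series of cryptographic hashes, each of type hashType and hashSize bytes long (each one is referred to as a hash slot). Since the pageSize field stores the base-2 logarithm of the page size, a page is 2^pageSize bytes long (4096 bytes for a pageSize of 12). The hash for slot i is calculated over one page of data, starting i pages from the start of the Mach-O header; the final page is cut short at codeLimit.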
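Computing the ordinary slots can be sketched like this. code_slot_hashes is a hypothetical helper; it takes the raw pageSize field (the log2 value), treats 0 as a single hash over everything up to codeLimit (per the struct comment), and stops the last page at codeLimit. The hash algorithm is passed by name rather than decoded from hashType.

```python
import hashlib

def code_slot_hashes(image, code_limit, page_shift, algorithm="sha256"):
    """Hash `image[:code_limit]` page by page; `page_shift` is log2(page size)."""
    if page_shift == 0:
        # "Infinite" page size: one slot covering the whole signed range.
        return [hashlib.new(algorithm, image[:code_limit]).digest()]
    page_size = 1 << page_shift
    hashes = []
    for start in range(0, code_limit, page_size):
        end = min(start + page_size, code_limit)  # last page may be short
        hashes.append(hashlib.new(algorithm, image[start:end]).digest())
    return hashes

# 9000 signed bytes with 4096-byte pages: two full pages and one short page.
image = bytes(10000)
slots = code_slot_hashes(image, 9000, 12)
```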
However, there’s some allusion to the number of special hash slots too. These slots have a negative index and sit immediately before the ordinary hash slots in the file. They are considered special as they contain the hashes of other components rather than of code pages. For example, the hash of the application’s Info.plist is stored at slot -1, positioning it just before the hash slot for the first page of the binary. The requirements blob is hashed into slot -2, the Resource Directory into slot -3 and the entitlements blob into slot -5, ensuring their integrity too.
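The negative indexing is easier to see with a toy layout. In the sketch below, slot_offset and read_slot are hypothetical helpers, and the 4-byte "hashes" are readable placeholders rather than real digests.

```python
def slot_offset(hash_offset, hash_size, index):
    """Byte offset of hash slot `index`, from the start of the code directory.

    Ordinary slots have index >= 0; special slots have negative indices
    and therefore land before hashOffset.
    """
    return hash_offset + index * hash_size

def read_slot(code_directory, hash_offset, hash_size, index):
    start = slot_offset(hash_offset, hash_size, index)
    return code_directory[start:start + hash_size]

# Toy layout: 8-byte header, two special slots, then two code slots.
toy = b"HEADER__" + b"sp-2" + b"sp-1" + b"cod0" + b"cod1"
hash_offset = 8 + 2 * 4  # hashOffset points at code slot 0, past the specials
```

Because hashOffset points at slot 0, a parser never needs to know how many special slots exist to find the code hashes; nSpecialSlots only bounds how far back it may reach.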
The Signature Blob
Even as we’ve hashed the contents of the binary and the various other blobs, we haven’t cryptographically signed them, making all of our work useless. The Signature Blob exists for this very reason — it contains a cryptographic signature made against the hash of the code directory. The signer’s certificate is usually issued by Apple, and therefore creates a chain-of-trust all the way down to the code directory and the other blobs.
On the surface, the signature blob is just like the entitlements blob.
struct __SignatureBlob {
    uint32_t magic;
    uint32_t length;
    // Followed by `length - 8` bytes of signature data
};
The signature data is an RFC 5652/CMS signature made against the code directory. However, to further complicate matters, there is another variant of the signature blob which is also made against the code directory, but carries an additional signed attribute with the OID 1.2.840.113635.100.9.1 containing a Property List.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>cdhashes</key>
<array>
<data><!-- SHA256 hash of the code directory --></data>
<data><!-- SHA1 hash of the code directory (optional) --></data>
</array>
</dict>
</plist>
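Checking a code directory against this attribute could look like the sketch below. It does not touch the CMS layer at all and only deals with the plist payload. cdhash_matches is a hypothetical helper, and accepting a 20-byte prefix of a digest as well as the full digest is an assumption on my part (to tolerate signers that store truncated hashes), not something established above.

```python
import hashlib
import plistlib

def cdhash_matches(attribute_plist, code_directory, prefix_len=20):
    """Return True if any `cdhashes` entry matches the code directory's hash."""
    listed = plistlib.loads(attribute_plist)["cdhashes"]
    for algorithm in ("sha256", "sha1"):
        digest = hashlib.new(algorithm, code_directory).digest()
        # Assumption: entries are full digests or their first `prefix_len` bytes.
        if digest in listed or digest[:prefix_len] in listed:
            return True
    return False

# Synthetic code directory bytes and a matching attribute payload.
fake_cd = b"\xfa\xde\x0c\x02 not a real code directory"
attribute = plistlib.dumps({"cdhashes": [
    hashlib.sha256(fake_cd).digest(),
    hashlib.sha1(fake_cd).digest(),
]})
```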
I presume that this variant exists to accommodate multiple code directories in a single binary and offer backwards compatibility.
The Conclusion
Now that we’ve made our way up to the signature blob, it’s easy to have a top-down view of the codesigning process — the signature blob ensures the integrity of the code directory, which ensures the integrity of:
- the bundled resources,
- the application manifest,
- the entitlements blob,
- the requirements blob, and
- the contents of the binary, including the compiled code.
The process is somewhat delicate but is very comprehensive: there’s no room for an attacker to modify a signed application without invalidating its signature. It is also designed to be evaluated quickly. The application doesn’t need to be loaded completely into RAM to verify its signature, as the hashes correspond to individual pages instead of the entire executable. Furthermore, the code directory format is very extensible, as seen with its support for a variety of hash formats and arbitrary extensions (scatter vector, team ID, etc.). You could even say the same for the signature blob and the resource directory, both of which have grown to accommodate SHA-256.
My last word is that the entire system works like clockwork and is, in some way, beautifully designed. It toes the line perfectly between performance and security, and if I could ask for a single wish, it’d be to allow the user to add their own trust root on iOS devices (yes, I know, that’ll happen a few days after the heat death of the universe, but one can hope).