Apple's code signing is a complex beast, consisting of several different components, each serving its own unique purpose. When I first started working on Meteorite, I found that while there were several resources detailing the internals of code signing, none of them had the amount of detail I required to reimplement it. This post is the result of several months of research, experimentation and head-scratching, and it hopes to be the most comprehensive resource on this topic.


Introduction

Generally speaking, any code which runs on an iOS device must have a chain-of-trust leading up to Apple's root certificate authority. This chain-of-trust is embedded into the Mach-O executable itself when the executable is signed. When talking about an iOS application, each resource used by it (be it an image, font or a library) must also be signed along with the main executable.

However, it is rather hard (and inefficient) to stuff all of the aforementioned information into the executable itself. The application might not need access to all of its embedded resources at launch time, so embedding their signature into the main executable is a waste of memory. Therefore, a code signature consists of two major components: the Application Seal (AKA the Resource Directory), and the Embedded Signature.


The Resource Directory

The resource directory contains the hashes of every non-executable resource embedded in an application. These hashes are stored, base64-encoded, in a file named _CodeSignature/CodeResources.

The CodeResources file is a standard XML Property List, usually containing two or three dictionaries in its root.

The files dictionary

This dictionary maps the path of every file present in this bundle to its SHA-1 hash. An entry may instead be a nested dictionary holding the hash under the hash key alongside a key named optional, which signifies that the absence of the corresponding file will not invalidate the signature. The optional key is commonly seen with localisation files, as the application can still execute successfully without them.

Here's an example of this dictionary (taken from Xcode's own code signature):

<key>files</key>
<dict>
    <key>Resources/Acknowledgments.pdf</key>
    <data>
    u6KniRItunp3N7x9a74lXnQ0QvU=
    </data>
    <key>Resources/AppDataDocument.icns</key>
    <data>
    vgn87DyNdOmduzJRKxkK6I73E6Y=
    </data>
    <key>Resources/English.lproj/InfoPlist.strings</key>
    <dict>
        <key>hash</key>
        <data>
        zlg6+gKspymivP2k5j8UZpRK3a4=
        </data>
        <key>optional</key>
        <true/>
    </dict>
</dict>
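
To make the two entry shapes concrete, here is a minimal Python sketch of checking a files entry against a resource's contents. It assumes the CodeResources file has been parsed with plistlib (which decodes each data element into raw bytes); files_entry_matches and the sample paths are hypothetical names of my own, not part of any Apple tool.

```python
import base64
import hashlib
import plistlib


def files_entry_matches(entry, content: bytes) -> bool:
    """Check one value of the `files` dictionary against a resource's bytes.

    `entry` is either the raw digest, or a dict holding the digest under
    "hash" (the shape used when "optional" is present).
    """
    expected = entry["hash"] if isinstance(entry, dict) else entry
    return hashlib.sha1(content).digest() == expected


# One of the digests from the example above decodes to 20 bytes,
# which is the size of a SHA-1 digest.
example_digest = base64.b64decode("u6KniRItunp3N7x9a74lXnQ0QvU=")
print(len(example_digest))

# A made-up stand-in for a real CodeResources file.
content = b"hello, resource"
sample = plistlib.dumps({
    "files": {
        "Resources/a.txt": hashlib.sha1(content).digest(),
        "Resources/b.strings": {
            "hash": hashlib.sha1(content).digest(),
            "optional": True,
        },
    }
})

files = plistlib.loads(sample)["files"]
for path, entry in files.items():
    print(path, files_entry_matches(entry, content))
```

The same function handles both the bare data form and the nested dictionary form, so a verifier doesn't need to care which one the signer emitted.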

The files2 dictionary

Much like the files dictionary, this also contains hashes of embedded resources. However, unlike its predecessor, it can carry SHA-256 hashes (under the hash2 key) in addition to SHA-1 hashes; the entries in the example below carry only hash2.

An example, again from Xcode's signature:

<key>Resources/Assets.car</key>
<dict>          
        <key>hash2</key>
        <data>          
        JT7UBjU9e1KoXF2fpnAP0M0JLZEEIlptpyZGojk9MTo=
        </data>         
</dict>         
<key>Resources/English.lproj/InfoPlist.strings</key>
<dict>          
        <key>hash2</key>
        <data>          
        mvfcsBoInnedi0i+1lqoIKv0MnbY6ee+PELmExwpuaA=
        </data>         
        <key>optional</key>
        <true/>         
</dict>
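
Here is a rough Python sketch of how a verifier might walk entries shaped like the ones above, honouring the optional flag. It only handles plain hash2 entries; check_files2 and the read_resource callback are hypothetical names introduced for illustration, and real files2 dictionaries may contain other kinds of entries that this sketch ignores.

```python
import hashlib


def check_files2(files2: dict, read_resource) -> list:
    """Return a list of problems found while checking `files2` entries.

    `read_resource(path)` is a caller-supplied (hypothetical) callback that
    returns the resource's bytes, or None when the file is missing.
    """
    problems = []
    for path, entry in files2.items():
        content = read_resource(path)
        if content is None:
            # A missing file only matters when it is not marked optional.
            if not entry.get("optional", False):
                problems.append(f"missing: {path}")
            continue
        if hashlib.sha256(content).digest() != entry["hash2"]:
            problems.append(f"modified: {path}")
    return problems


# Made-up bundle contents: the localisation file is absent on purpose.
bundle = {"Resources/Assets.car": b"asset data"}
files2 = {
    "Resources/Assets.car": {
        "hash2": hashlib.sha256(b"asset data").digest(),
    },
    "Resources/English.lproj/InfoPlist.strings": {
        "hash2": hashlib.sha256(b"strings").digest(),
        "optional": True,
    },
}
print(check_files2(files2, bundle.get))
```

Passing the lookup as a callback keeps the sketch independent of how the bundle is stored, be it a directory on disk or an archive.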

As is plainly obvious, the files2 dictionary was added to the codesigning framework well after its inception, to future-proof it against any weaknesses in the SHA-1 hash.

So far, we have guaranteed that the resources used by the application haven't been tampered with. However, it would be rather useless to go through all this effort if an attacker could simply modify the executable itself. Enter, the Embedded Code Signature.


The Embedded Code Signature

This part of the write-up might require some basic knowledge of the structure of a Mach-O binary. While I've made the best possible attempt to explain each of these underlying concepts, I would still recommend that the reader take a cursory look at this excellent article on parsing Mach-O files in case my explanation doesn't quite cut it. For brevity's sake, we are only considering thin Mach-O binaries, i.e., those which contain code for only a single ISA.

A Mach-O executable starts with a simple header containing a magic value, some metadata about the file, the target ISA and most importantly, the number of load commands.

A Load Command can be considered an instruction to the loader (the kernel and the dynamic linker). It can describe anything from how a part of the executable is to be mapped into memory (LC_SEGMENT), to dylib dependencies (LC_LOAD_DYLIB), to the location of the code signature (LC_CODE_SIGNATURE). The otool utility provides a convenient way to glance at the load commands within a binary.

Umangs-MacBook-Pro:~ umang$ otool -l `which ls`
/bin/ls:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80           2    18       1816 0x00200085
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __PAGEZERO
   vmaddr 0x0000000000000000
   vmsize 0x0000000100000000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
(...)
Load command 17
      cmd LC_CODE_SIGNATURE
  cmdsize 16
  dataoff 29184
 datasize 9520

However, we are only concerned with LC_CODE_SIGNATURE for the purposes of this post. As we can see, the code signature lives at an offset of 29184 bytes from the start of the Mach-O header and is 9520 bytes long.
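
The same lookup can be sketched in a few lines of Python. This is a simplified walk that assumes a thin, little-endian, 64-bit image (the case shown in the otool output); find_code_signature is a name of my own, and the sample image is synthetic, reusing the numbers printed above.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF   # 64-bit Mach-O magic, as printed by otool above
LC_SEGMENT_64 = 0x19
LC_CODE_SIGNATURE = 0x1D


def find_code_signature(image: bytes):
    """Return (dataoff, datasize) from LC_CODE_SIGNATURE, or None.

    Handles only thin, little-endian, 64-bit images.
    """
    # mach_header_64: magic, cputype, cpusubtype, filetype,
    #                 ncmds, sizeofcmds, flags, reserved
    magic, _, _, _, ncmds, _, _, _ = struct.unpack_from("<8I", image, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")
    offset = 32  # the load commands start right after the 32-byte header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", image, offset)
        if cmd == LC_CODE_SIGNATURE:
            # cmd and cmdsize are followed by dataoff and datasize
            return struct.unpack_from("<2I", image, offset + 8)
        offset += cmdsize
    return None


# Synthetic image: a header, one dummy segment command and the
# LC_CODE_SIGNATURE command with the values from the otool output.
header = struct.pack("<8I", MH_MAGIC_64, 16777223, 3, 2, 2, 88, 0x00200085, 0)
segment = struct.pack("<2I", LC_SEGMENT_64, 72) + bytes(64)
codesig = struct.pack("<4I", LC_CODE_SIGNATURE, 16, 29184, 9520)
sample = header + segment + codesig
print(find_code_signature(sample))
```

Each load command carries its own size, so the walk can skip commands it doesn't understand without knowing their layout.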

If we seek to 29184 bytes from the start of the Mach-O header, we find the Super Blob, which begins with an index made up of several Blob Indices. The C representation of this structure is as follows:

typedef struct __BlobIndex {
  uint32_t type;                                  /* type of entry */
  uint32_t offset;                                /* offset of entry */
} CS_BlobIndex;

typedef struct __SuperBlob {
	uint32_t magic;					/* magic number */
	uint32_t length;				/* total length of SuperBlob */
	uint32_t count;					/* number of index entries following */
	CS_BlobIndex index[];			/* (count) entries */
	/* followed by Blobs in no particular order as indicated by offsets in index */
} CS_SuperBlob;
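
As a rough illustration, the index can be read with Python's struct module. The sketch assumes big-endian fields and the embedded-signature magic 0xfade0cc0, both as I understand them from Apple's published sources; parse_super_blob and the sample bytes are made up for this example.

```python
import struct

# Magic of the embedded signature SuperBlob (assumption: value as found
# in Apple's published headers).
CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0


def parse_super_blob(data: bytes) -> list:
    """Return [(type, offset), ...] from a SuperBlob's index."""
    magic, length, count = struct.unpack_from(">3I", data, 0)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError(f"unexpected magic {magic:#x}")
    if length > len(data) or 12 + 8 * count > length:
        raise ValueError("SuperBlob is truncated")
    # Each CS_BlobIndex is a (type, offset) pair; offsets are relative
    # to the start of the SuperBlob.
    return [struct.unpack_from(">2I", data, 12 + 8 * i) for i in range(count)]


# Synthetic SuperBlob with two index entries pointing at 8-byte stand-ins.
index = struct.pack(">4I", 0, 28, 0x10000, 36)
payload = bytes(8) + bytes(8)
sample = struct.pack(
    ">3I", CSMAGIC_EMBEDDED_SIGNATURE, 12 + len(index) + len(payload), 2
) + index + payload
print(parse_super_blob(sample))
```

The offsets in the index are what a parser follows to reach each individual blob, which is why the blobs themselves can appear in any order.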

This brings us to a very simple yet important question: what is a blob?

The Codesigning Blobs

The Mach-O embedded code signature consists of several logical components, each of which has its own unique purpose. Each of these components is represented as a blob, which holds whatever information it requires to serve its purpose.

Every blob follows the general pattern of a 32-bit magic value, a 32-bit length of the blob (including the lengths of the magic value and the length field itself) and then arbitrary data whose layout is determined by the type of the blob. All integers in the code signature are stored in big-endian byte order, regardless of the byte order of the Mach-O image itself.
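
That pattern is small enough to sketch directly. The helper below (read_blob, a hypothetical name) assumes big-endian fields and simply splits a blob into its magic value and payload; the sample magic is arbitrary.

```python
import struct


def read_blob(data: bytes, offset: int = 0):
    """Return (magic, payload) for the blob starting at `offset`."""
    magic, length = struct.unpack_from(">2I", data, offset)
    # `length` counts the 8 header bytes too, so it can never be below 8.
    if length < 8 or offset + length > len(data):
        raise ValueError("blob length is out of range")
    return magic, data[offset + 8 : offset + length]


# A made-up blob: 8 header bytes followed by a 5-byte payload.
blob = struct.pack(">2I", 0xFADE7171, 8 + 5) + b"hello"
magic, payload = read_blob(blob)
print(hex(magic), payload)
```

Because the length covers the header, a parser can hop from one blob to the next without understanding what any of them contain.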

There are four blobs of practical importance, which can be found in most Mach-O binaries:

  1. The Entitlements Blob,
  2. The Requirements Blob,
  3. The Code Directory, and
  4. The Signature Blob.

The Entitlements Blob

The Entitlements blob contains the codesigning entitlements available to this binary and has the same format as described above, with the Plist containing the entitlements immediately following the length field. Entitlements deserve a blog post of their own; in the meantime, Apple's own documentation is a good starting point for the curious reader.
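
A minimal round-trip sketch in Python, assuming the entitlements magic is 0xfade7171 (the value I know from Apple's published headers) and that the payload is an XML Property List. Both function names and the sample entitlement are mine, chosen for illustration.

```python
import plistlib
import struct

# Assumption: magic value of the embedded entitlements blob.
CSMAGIC_EMBEDDED_ENTITLEMENTS = 0xFADE7171


def build_entitlements_blob(entitlements: dict) -> bytes:
    """Wrap an entitlements dict in the magic/length blob header."""
    body = plistlib.dumps(entitlements)  # XML plist by default
    header = struct.pack(">2I", CSMAGIC_EMBEDDED_ENTITLEMENTS, 8 + len(body))
    return header + body


def parse_entitlements_blob(blob: bytes) -> dict:
    """Extract the entitlements dict from an entitlements blob."""
    magic, length = struct.unpack_from(">2I", blob, 0)
    if magic != CSMAGIC_EMBEDDED_ENTITLEMENTS:
        raise ValueError(f"unexpected magic {magic:#x}")
    # The plist starts right after the length field.
    return plistlib.loads(blob[8:length])


blob = build_entitlements_blob({"get-task-allow": True})
print(parse_entitlements_blob(blob))
```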

The Requirements Blob

The Requirements blob is a collection of code-signing requirements, which specify the additional conditions required for the signature to be valid. These requirements are laid out in prefix (Polish) notation, with each operator preceding its operands. Most opcodes have values that fit in a single byte, although each one is stored as a 32-bit word.

The most common requirement set is simply OpTrue, OpAnd, OpIdent(cs_ident), OpAppleGenericAnchor, which would mean that the identifier of the binary must be cs_ident and the chain of trust must lead to Apple's generic codesigning anchor.

I figured all of this out the hard way by reverse-engineering codesign outputs. However (to my utter disappointment), all of this is publicly available in source-code form at Apple's libsecurity_codesigning repo, a mirror of which lives here.

While implementing the requirement serialiser for Meteorite, I remember leaving the following comment:

// For whatever reason, the string is padded with null bytes so
// that its final length is divisible by 4. Please don't ask
// me how I know this, because the answer is pain.
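
For illustration, here is what that padding might look like in Python. This is not Meteorite's code: encode_padded is a hypothetical helper, and the 32-bit big-endian length prefix in front of the string is an assumption based on my reading of Apple's published source rather than something stated above.

```python
import struct


def encode_padded(data: bytes) -> bytes:
    """Length-prefix `data` and pad it with NUL bytes to a multiple of 4.

    Assumption: the prefix is a 32-bit big-endian count of the unpadded
    bytes; only the padding rule comes from the comment above.
    """
    padding = -len(data) % 4  # 0..3 bytes needed to reach a multiple of 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


encoded = encode_padded(b"com.example.app")  # 15 bytes, so 1 byte of padding
print(len(encoded), encoded)
```

Keeping every field aligned to four bytes means the opcodes that follow a string always start on a word boundary.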

The Code Directory

So far, we've accounted for a few things: entitlements, requirements and resource directories — however, all of that is useless if the actual code can be modified. Therefore, the code directory contains cryptographic hashes of the actual instructions contained in a binary.

Here's the code directory represented as a C structure.

typedef struct __CodeDirectory {
	uint32_t magic;					/* magic number (CSMAGIC_CODEDIRECTORY) */
	uint32_t length;				/* total length of CodeDirectory blob */
	uint32_t version;				/* compatibility version */
	uint32_t flags;					/* setup and mode flags */
	uint32_t hashOffset;			/* offset of hash slot element at index zero */
	uint32_t identOffset;			/* offset of identifier string */
	uint32_t nSpecialSlots;			/* number of special hash slots */
	uint32_t nCodeSlots;			/* number of ordinary (code) hash slots */
	uint32_t codeLimit;				/* limit to main image signature range */
	uint8_t hashSize;				/* size of each hash in bytes */
	uint8_t hashType;				/* type of hash (cdHashType* constants) */
	uint8_t platform;				/* platform identifier; zero if not platform binary */
	uint8_t	pageSize;				/* log2(page size in bytes); 0 => infinite */
	uint32_t spare2;				/* unused (must be zero) */
	/* Version 0x20100 */
	uint32_t scatterOffset;				/* offset of optional scatter vector */
	/* Version 0x20200 */
	uint32_t teamOffset;				/* offset of optional team identifier */
	/* followed by dynamic content as located by offset fields above */
} CS_CodeDirectory;

Looking at this, a few things are evident right off the bat — the format is extensible (as seen by the additions after spare2) and backwards compatible. Also, the actual hashes can be in a variety of formats, as specified by the hashType (in practice, I've seen SHA1, SHA256 and SHA384).

Moving forward, at hashOffset bytes beyond the start of the code directory header, there's a series of cryptographic hashes, each of type hashType and hashSize bytes long (each one is referred to as a hash slot). Since the pageSize field stores the base-2 logarithm of the page size, a page is 2^pageSize bytes long. The hash for slot i is calculated over one page of data starting at offset 2^pageSize * i bytes from the start of the Mach-O header, with the final page cut short at codeLimit.
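
The following Python sketch recomputes the ordinary hash slots and reports the ones that don't match. It reads only the fixed 44-byte header from the structure above; the hashType table reflects my understanding of Apple's constants and should be treated as an assumption, and make_code_directory exists purely to fabricate test data.

```python
import hashlib
import struct

CSMAGIC_CODEDIRECTORY = 0xFADE0C02
# Assumed hashType values: 1 = SHA-1, 2 = SHA-256,
# 3 = SHA-256 truncated to 20 bytes, 4 = SHA-384.
HASHERS = {1: hashlib.sha1, 2: hashlib.sha256, 3: hashlib.sha256, 4: hashlib.sha384}
HEADER = ">9I4BI"  # the fields up to and including spare2 (44 bytes)


def bad_code_slots(image: bytes, cd: bytes) -> list:
    """Return the indices of code slots whose hash doesn't match `image`."""
    (magic, _length, _version, _flags, hash_offset, _ident_offset,
     _n_special, n_code, code_limit,
     hash_size, hash_type, _platform, page_shift, _spare2) = struct.unpack_from(HEADER, cd, 0)
    if magic != CSMAGIC_CODEDIRECTORY:
        raise ValueError(f"unexpected magic {magic:#x}")
    hasher = HASHERS[hash_type]
    # pageSize is log2 of the page size; 0 means one slot covers everything.
    page = 1 << page_shift if page_shift else code_limit
    bad = []
    for i in range(n_code):
        start = i * page
        end = min(start + page, code_limit)  # the last page may be short
        slot = hash_offset + i * hash_size
        expected = cd[slot : slot + hash_size]
        if hasher(image[start:end]).digest()[:hash_size] != expected:
            bad.append(i)
    return bad


def make_code_directory(image: bytes, page_shift: int = 12) -> bytes:
    """Fabricate a SHA-256 code directory without special slots (test data)."""
    page = 1 << page_shift
    hashes = [hashlib.sha256(image[o : o + page]).digest()
              for o in range(0, len(image), page)]
    header = struct.pack(HEADER, CSMAGIC_CODEDIRECTORY, 44 + 32 * len(hashes),
                         0x20001, 0, 44, 0, 0, len(hashes), len(image),
                         32, 2, 0, page_shift, 0)
    return header + b"".join(hashes)


image = bytes(range(256)) * 40            # 10240 bytes: two full pages and a half
cd = make_code_directory(image)
tampered = image[:5000] + b"\xff" + image[5001:]
print(bad_code_slots(image, cd), bad_code_slots(tampered, cd))
```

Only the slot covering the modified byte fails, which is what lets pages be validated individually as they are needed.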

However, there's some allusion to the number of special hash slots too. These slots have a negative index and are situated immediately before the ordinary hash slots. They are considered special as they contain the hashes of other blobs and files (entitlements, requirements, etc.) rather than of pages of code. For example, the hash of the application's Info.plist is stored at slot -1, positioning it just before the hash slot for the first page of the binary, while the hash of the entitlements blob is stored at slot -5. The hashes of the requirements blob and the Resource Directory are also stored in special hash slots (-2 and -3 respectively), ensuring their integrity too.
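
Negative indexing translates into simple pointer arithmetic: slot -n sits n * hashSize bytes before hashOffset. The sketch below shows only that arithmetic; it deliberately makes no claim about which blob belongs in which slot. special_slot is a hypothetical helper and the code directory is fabricated for the example.

```python
import hashlib
import struct


def special_slot(cd: bytes, n: int) -> bytes:
    """Return the hash stored in special slot -n (n >= 1)."""
    # hashOffset, identOffset and nSpecialSlots start at byte 16;
    # hashSize is the single byte at offset 36.
    hash_offset, _ident_offset, n_special = struct.unpack_from(">3I", cd, 16)
    hash_size = cd[36]
    if not 1 <= n <= n_special:
        raise IndexError(f"no special slot -{n}")
    start = hash_offset - n * hash_size
    return cd[start : start + hash_size]


# Fabricated code directory: two special slots, then one code slot.
blob_a = b"contents hashed into slot -1"   # placeholder data
blob_b = b"contents hashed into slot -2"   # placeholder data
code = b"code"
slots = [
    hashlib.sha256(blob_b).digest(),  # slot -2 comes first in the file
    hashlib.sha256(blob_a).digest(),  # slot -1 sits right before slot 0
    hashlib.sha256(code).digest(),    # slot 0
]
hash_offset = 44 + 2 * 32  # header, then the two special slots
header = struct.pack(">9I4BI", 0xFADE0C02, 44 + 3 * 32, 0x20001, 0,
                     hash_offset, 0, 2, 1, len(code), 32, 2, 0, 12, 0)
cd = header + b"".join(slots)

print(special_slot(cd, 1) == hashlib.sha256(blob_a).digest())
```

Because hashOffset points at slot 0, a directory can grow more special slots in front of it without disturbing the indices of the code slots.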

The Signature Blob

Even as we've hashed the contents of the binary and the various other blobs, we haven't cryptographically signed them, making all of our work useless. The Signature Blob exists for this very reason — it contains a cryptographic signature made against the hash of the code directory. The signer's certificate is usually issued by Apple, and therefore creates a chain-of-trust all the way down to the code directory and the other blobs.

On the surface, the signature blob is just like the entitlements blob.

struct __SignatureBlob {
    uint32_t magic;
    uint32_t length;
    // Followed by `length - 8` bytes of signature data
};

The signature data is an RFC 5652 (CMS) signature made against the code directory. However, to further complicate matters, there is another variant of the signature blob which is also made against the code directory, but carries an additional signed attribute with the OID 1.2.840.113635.100.9.1 containing a Property List.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>cdhashes</key>
        <array>
            <data><!-- SHA256 hash of the code directory --></data>
            <data><!-- SHA1 hash of the code directory (optional) --></data>
        </array>
    </dict>
</plist>
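
Building such a Property List is straightforward with plistlib. The sketch below follows the layout exactly as shown above and nothing more: it hashes a single code directory with both algorithms, and it does not settle whether real signatures truncate these digests or hash separate code directories. cdhashes_plist is a hypothetical helper.

```python
import hashlib
import plistlib


def cdhashes_plist(code_directory: bytes, include_sha1: bool = True) -> bytes:
    """Serialise the cdhashes Property List for one code directory blob.

    Assumption: each digest covers the entire code directory blob,
    starting at its magic value.
    """
    digests = [hashlib.sha256(code_directory).digest()]
    if include_sha1:
        digests.append(hashlib.sha1(code_directory).digest())
    return plistlib.dumps({"cdhashes": digests})


fake_cd = b"\xfa\xde\x0c\x02" + bytes(40)  # placeholder, not a valid directory
print(cdhashes_plist(fake_cd).decode())
```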

I presume that this variant exists to accommodate multiple code directories in a single binary and offer backwards compatibility.

The Conclusion

Now that we've made our way up to the signature blob, it's easy to have a top-down view of the codesigning process — the signature blob ensures the integrity of the code directory, which ensures the integrity of:

  • the bundled resources,
  • the application manifest,
  • the entitlements blob,
  • the requirements blob, and
  • the contents of the binary, including the compiled code.

The process is somewhat delicate but is very comprehensive: there's no room for an attacker to modify a signed application without invalidating its signature. It is also designed to be evaluated quickly. The application doesn't need to be loaded completely into RAM to verify its signature, as the hashes correspond to individual pages instead of the entire executable. Furthermore, the code directory format is very extensible, as seen with its support for a variety of hash formats and optional extensions (scatter vector, team ID, etc.). The same could be said of the signature blob and the resource directory, both of which have grown to accommodate SHA-256.

My last word is that the entire system works like clockwork and is, in some way, beautifully designed. It toes the line perfectly between performance and security, and if I could ask for a single wish, it'd be to allow the user to add their own trust root on iOS devices (yes, I know, that'll happen a few days after the heat death of the universe, but one can hope).