Apple’s child protection software is a government backdoor

Apple recently announced that the next version of their iOS operating system will include new child protection features. Amongst these is a new image scanning technology that will enable Apple to notify the National Center for Missing & Exploited Children (NCMEC) if a user synchronises images containing child abuse material with iCloud. Apple’s technical summary of how this technology works is available here.

The architecture is designed not to rely on direct comparison of images, since this would require Apple to maintain a database of the offending images themselves, which is clearly something to be avoided. Instead, it relies on comparing hashes of images, or rather of image ‘signatures’ which characterise images independently of simple changes. These signatures are generated by a machine learning algorithm named NeuralHash, which identifies the essential features that characterise an image, as opposed to the raw image data, which can easily be reformatted, transformed or obscured with modifications.

For our purposes, all we need to know about hash functions is that they are one-way functions that take an arbitrary piece of data as input and generate a short hash (also known as a digest or checksum) as their output, usually no more than 256 bits in length. The one-way nature of these functions means that if you are provided with a hash, it is not computationally feasible to invert it to determine what the actual input was. Hash functions are one of our most secure cryptographic primitives and are believed to remain robust even against future quantum computers. This provides a very useful means of securely checking whether two pieces of data are the same without revealing what the data is. For this application, it means that only the party that generated the list of hashes from the original images can connect the dots should a flagged hash be provided to them, while any intermediary dealing only with the hashes remains oblivious to what they correspond to.
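As a rough illustration of the comparison step — a minimal sketch only, using an ordinary SHA-256 hash as a stand-in for Apple’s NeuralHash-and-matching machinery and entirely made-up placeholder data — two parties can check whether they hold the same data by exchanging digests alone:

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of some data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# The list provider ships only digests of the material it wants flagged.
reference_hashes = {digest(b"placeholder image bytes ...")}

# The device hashes its own data and checks for membership; a match can be
# reported without either side transmitting the underlying content.
print(digest(b"placeholder image bytes ...") in reference_hashes)  # True  -> flagged
print(digest(b"some unrelated image") in reference_hashes)         # False -> ignored

# Inverting a digest back to its input is computationally infeasible, so an
# intermediary holding only the hash list learns nothing about what the
# underlying images actually are.
```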

This cryptographic property of hash functions allows Apple to identify offending material without possessing it or uploading it to your phone. However, it simultaneously implies that if the hash list provided to them contains hashes of things unrelated to child protection, there is no way for Apple to know, and they retain full plausible deniability should the list be manipulated for other purposes.

Apple is clearly not going to get into the business of validating the integrity of the hash list they are provided with. They’ll very intentionally remain as legally and ethically separated from that process as possible. All they do is notify the NCMEC if a sufficient number of flagged hashes are attributed to someone’s account. After that, it’s entirely up to the NCMEC, who by necessity have to operate behind a veil of extreme secrecy, immune to public oversight. The only other examples of such organisations that spring to mind are defence and intelligence agencies, but that would be very bad marketing compared to the more emotionally appealing notion of saving children from sexual predators.

We know that the scheme as advertised is not going to catch pedophiles, by virtue of the fact that Apple has openly stated that the filtering only applies to iCloud uploads, and I’m sure that even pedophiles are intelligent enough to recognise that this means you can just turn it off. I’m similarly confident that Tim Cook is intelligent enough to have spotted this one for himself, and the next move is easy to anticipate: in a future version of iOS there’ll be one of those privacy policy updates that no one reads but everyone has to agree to. This is exactly what just happened with WhatsApp, the difference being that in WhatsApp’s case it only pertained to releasing metadata, whereas here we’re talking about actual message content.

Far more suspicious is the fact that the filtering is taking place client-side rather than in the cloud. One of the primary motivations driving our technological shift towards cloud computing is that it’s far more efficient for computation to be centralised, which is especially the case when it comes to devices with limited resources like phones. It’s hard to justify the design decision to perform computational filtering of cloud data on the client-side — a complete inversion of usual cloud computing principles — unless being restricted to cloud data isn’t the intention at all.

The fact that Apple is using the same tried-and-tested “save the kids” marketing campaign, one so over-used that it now acts as an immediate red flag, is telling in itself. To provide some context for just how overused it is (along with “terrorism” — presumably coming in iOS 16), a quick Google test of the number of hits associated with different encryption-related search terms gives some indication of the relative attention paid to the different contexts in which concerns about encryption are raised:

  • “encryption child protection”: 13,900,000
  • “encryption terrorism”: 10,500,000
  • “encryption human centipede”: 9,970,000
  • “encryption murder”: 9,190,000
  • “encryption chainsaw massacre”: 8,330,000
  • “encryption arms trafficking”: 3,410,000
  • “encryption getting stabbed by a meth dealer in Logan”: 793,000
  • “encryption drug trafficking”: 718,000
  • “encryption human trafficking”: 633,000

The priorities here are obviously completely wrong. Poor old murder, who makes an appearance about 20,000 times every year in the United States, is feeling quite undervalued here. And who in their right mind is more concerned about the prospect of getting blown up by Osama bin Laden than being turned into a human centipede?

Many but not all of these things are more common and present a greater threat than the subset of pedophiles who lack the intelligence to turn iCloud off, which involves nothing more than flipping a single switch in the iCloud settings.

No doubt I’m now going to be accused of being a pedophile-enabler for revealing this intricate hacking technique, despite the fact that Apple has already told them itself.

iOS 15 isn’t going to be the last of Apple’s operating system updates. But let’s assume the security model described in the technical white paper remains fixed. How could it be manipulated to provide full backdoor access to message content beyond child abuse material?

The obvious next step is for filtering to extend beyond images to include text strings. This has very nefarious political implications given the heavy use of hashtags, code-words and names within political organisations. For example, the Turkish government recently announced that it is investigating the #HelpTurkey hashtag, suspected of being associated with a foreign influence operation. Hashing a hashtag is completely compatible with Apple’s security model. Foreign influence aside, this could be used to identify members of political organisations, something the Indian government has been proactively pursuing.
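To see how small a change this would be, here is a hedged sketch, again using SHA-256 as a stand-in for whatever hashing the real system uses, and with an entirely hypothetical blocklist. The intermediary ships only digests; the device flags matches without anyone in the middle ever storing the words themselves:

```python
import hashlib

def h(text: str) -> str:
    # Hash a normalised text token; the list provider only ever ships digests.
    return hashlib.sha256(text.lower().encode("utf-8")).hexdigest()

# A hypothetical blocklist of hashed keywords supplied to the device.
blocked_hashes = {h("#HelpTurkey"), h("some-organisation-name")}

def message_is_flagged(message: str) -> bool:
    # The device hashes each token and checks it against the supplied list;
    # it never needs to know, or reveal, what the list actually contains.
    return any(h(token) in blocked_hashes for token in message.split())

print(message_is_flagged("please retweet #HelpTurkey"))  # True
print(message_is_flagged("the weather is nice today"))   # False
```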

Alternatively, the machine learning algorithm that generates image signatures could simply be updated to recognise other things. This is an especially opaque part of the computational pipeline. Machine learning classifiers operate very differently from conventional algorithms, whose operation can easily be understood by inspecting the source code. A machine learning implementation will typically operate according to a fixed algorithm — for example simulating a neural network — whose behaviour is determined by its training parameters: essentially a long list of numbers representing complex multivariate correlations, which collectively allow complex patterns or features to be recognised. This is very similar to how the human brain processes information, where it is the weighting of which neurons communicate with which that determines what we have learned. But in the same way that knowing the structure of a brain does not tell us what it has learned, it is not viable to reverse-engineer a machine learning model’s trained parameters to determine what it has learned to recognise. This choice of algorithmic implementation therefore makes it extremely difficult to know what the system is capable of, even with full knowledge of all its parameters, thereby limiting scrutiny.
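To illustrate just how opaque those trained parameters are, the following minimal sketch (random placeholder weights, nothing to do with NeuralHash) shows what such a classifier reduces to in practice: a fixed, fully inspectable forward pass over a blob of numbers that reveals nothing about what the numbers were trained to detect.

```python
import numpy as np

# The "algorithm" itself is fixed and perfectly inspectable: two matrix
# multiplications and a non-linearity.
def classify(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> float:
    hidden = np.tanh(features @ w1)
    return float(hidden @ w2)

# Everything the model actually detects lives in these numbers. Inspecting
# them tells you next to nothing about what they have been trained to encode.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(512, 64))   # placeholder weights, not real model data
w2 = rng.normal(size=(64,))

score = classify(rng.normal(size=512), w1, w2)
print("match score:", score)      # meaningful only to whoever trained the weights
```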

But with some simple updates to the client-side software, the problem can extend far further than this. Although it isn’t possible for NCMEC to have a pre-calculated list of hashes for every possible image or every possible piece of text, a simple technical workaround is to communicate hashes on a byte-by-byte basis. If we take an arbitrary piece of image (or other) data and, instead of hashing it in full, directly hash each individual byte, only 256 pre-calculated hashes need to be stored to enable full reconstruction of an arbitrary data stream using a dictionary attack. On the client side, this only requires a few lines of changes to the operating system code, which have probably already been written but are currently commented out. This very trivial client-side change, easily pushed automatically as part of a future iOS update, requires no changes to the advertised security model, which is still only comparing and passing hashes around, but it would provide NCMEC with full backdoor access to any data stream. The model is already fully future-proofed to enable this. Only minor changes to the iOS code are needed to take full advantage of it.
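Here is a minimal sketch of that byte-wise dictionary attack, once more using SHA-256 as a stand-in hash and invented example data: with only 256 precomputed digests, whoever holds the table can map a stream of per-byte hashes straight back to the original bytes.

```python
import hashlib

def byte_hash(b: int) -> str:
    # Hash a single byte value (0-255).
    return hashlib.sha256(bytes([b])).hexdigest()

# The receiving party precomputes all 256 possible digests once...
dictionary = {byte_hash(b): b for b in range(256)}

# ...while the client "only passes hashes around", one per byte of data.
secret = "meet at the usual place at 9pm".encode("utf-8")
transmitted_hashes = [byte_hash(b) for b in secret]

# Reconstruction on the receiving end is a trivial table lookup.
recovered = bytes(dictionary[digest] for digest in transmitted_hashes)
print(recovered.decode("utf-8"))  # -> meet at the usual place at 9pm
```

Nothing in the transmission is ever anything other than a hash, yet the content is fully recoverable by the party holding the 256-entry dictionary.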

It’s hard to know what NCMEC might consider doing with full backdoor privileges, but this capability is certainly something of enormous interest to many branches of government, whose eyes must be watering. A cursory glance at their organisational structure and history raises some immediate issues.

According to the NCMEC Wikipedia page, the current President and CEO, John F. Clark, was appointed Director of the United States Marshals Service by George W. Bush, joined defence contractor Lockheed Martin in 2010 as their director of security operations for information systems and global solutions, and spent seven years working in the Special Operations Group.

Its Board Chair, Karen Tandy, served as administrator of the United States Drug Enforcement Administration (DEA), also nominated by George W. Bush, and has served since 2016 as vice-chair of the Homeland Security Advisory Council.

And on the issue of emotionally manipulating public opinion, its co-founder, John Walsh, has attracted some controversy:

Some critics accuse Walsh of creating predator panic by using his publicity. Walsh was heard by Congress on February 2, 1983, where he gave an unsourced claim of 50,000 abducted and 1.5 million missing children annually. He testified that the U.S. is “littered with mutilated, decapitated, raped, strangled children,” when in fact, a 1999 Department of Justice study found only 115 incidences of stereotypical kidnappings perpetrated by strangers, about 50 of which resulted in death or the child not being found.

Apple’s child protection initiative needs to be called out for what it is: the integration of a device-level backdoor into its operating system, sending information to a government-backed private agency closely linked to the security establishment, intentionally designed in such a way that Apple has plausible deniability as to what information is being targeted, by necessity completely shielded from public oversight, and all marketed using the usual vernacular governments deploy for emotional blackmail whenever they can’t get what they want, implying that you support pedophiles if you disagree.

It’s self-evident that the agenda isn’t to catch pedophiles when you simultaneously tell them how to avoid it. The only way to prevent this software from being misused is to prevent it from being implemented.