Transcript
Sills: I want to start with a reminder that images have tremendous power. They can show us the truth of a situation. They can shock us out of complacency. They can document brutality. They can bring down the powerful. This is an image from the Tiananmen Square protests in Beijing, China in 1989. It depicts a man carrying what appears to be shopping bags, who decided to stand in front of a line of tanks and prevent them from moving forward. In the days prior to this photo, hundreds of protesters had been killed, making this an incredible act of bravery. To this day, I still find this photo incredibly moving. Photos like these are powerful because they capture the truth, because they're evidence of what actually happened, even if it's controversial. We use photos as evidence of what's true. We can't rely on photos in this way any longer. These are photos depicting the Pope wearing a white designer puffer jacket. None of these photos are real. They were created by a 31-year-old from Chicago using Midjourney, an AI tool for generating images. Now we're on the cusp of a major societal shift, where we can't trust anything that we see. Not only will we see photos that aren't true, we'll increasingly question and disbelieve real photos. Everything that we see will be called into question. Given this major societal issue, what can we do? Some people have suggested trying to detect the characteristics of AI generated photos. For instance, in the case of these fake photos of the Pope, we can see that the Pope's glasses merge into his eyelids, a clear tell that the image is fake. Trying to detect AI tells is not a long-term solution. AI is going to continue to improve, and the characteristics that we're looking for are going to disappear. AI generation and AI detection are an arms race where both sides are trying to outcompete each other, and previous solutions quickly become obsolete. More than that, there's an imbalance: it's cheaper to produce fake images than it is to investigate and verify them. There's another imbalance, which is that the people producing the images might be technically sophisticated, whereas the people viewing the images are going to be the general public, who may not be. There's going to be a firehose of cheaply produced fake images, and image verification processes, even aided by software, are going to struggle to keep up. We've seen a lot of talks about AI generated images and AI generated data. This talk is the other side of that. It's about an alternative to the detection of AI, one that I think is more promising for our long-term future.
Background
I'm Kate Sills. I'm a software engineer and consultant. This talk is based on work that I did with Starling Lab, which is a research organization that was co-founded by Stanford Electrical Engineering and the USC Shoah Foundation. If you're interested in the problem of image verification, I recommend that you take a look at Starling Lab, because what they're doing is really cutting edge. There's a lot they're doing that I'm not able to talk about here.
The Alternative to AI Detection
Like I was saying earlier, I think the idea of AI detection is all wrong. It's an arms race that we're unlikely to win, and so we need a different solution. This is a screenshot of a post that was in my Instagram feed. This account just posts interior decorating photos, and people follow it for good ideas to replicate at home. No one's going to be able to replicate this at home, because it's not real. If you look closely, under the cabinets there's a paper towel holder, and there's a claw in the middle of the paper towel holder for some reason. I happened to notice this particular tell as I was scrolling. We need to assume that all sorts of characteristics like these are going to go away very soon. The real question is, if, in the future, pictures like this look real to both humans and software, how can we tell whether an image is real or not? Here's a real image on the right that we can use for comparison. If you look at the two screenshots, there are some real differences. For instance, for the image on the right, we know the location, we know where the photo was taken. We also know the human beings who were publishing it. We know who's behind the account. We even have the names of the homeowners. We can get all this from a UI that's not specifically designed to combat AI generated images. We can also imagine other information, other questions that aren't shown in the UI, but that the image on the right might have answers to and the image on the left wouldn't. For instance, when was the photo taken? Who was the photographer? How was the photo edited, if at all? Has anyone investigated the photo?
My argument is that, in the future, the major distinction is going to be between two classes of images: those without credible metadata, attestations, and supporting documents, and those with. AI images and other kinds of fake images are going to fall into the "without" category. That's how they can be distinguished from real images. I can see some of you thinking already, can't we just make up the fake metadata, the attestations, and the supporting documents too? We can generate all of the fake images, so let's just generate the fake data to go along with them. Yes, that's true. We could generate fake data all day long, and it would be very convincing. However, if we design our systems correctly, if we design our UI correctly, attackers won't be able to easily generate credible data. That's how we'll be able to distinguish between the real data and the fake data. We have to raise the bar in how we collect, store, and verify real images. We have to raise the bar high enough that real images can meet it, but fake images can't. We can do that with good system design, aided by cryptography. Let me talk about a few specific ways.
First, with cryptography, we can make it impossible to credibly backdate an image. I can always say this image was taken in 2006, and I can put that in the metadata, but it won't be credible. You'll be able to know that I'm lying. I'll go into detail on how to do this. Second, we can make it nearly impossible to fake sources. For instance, if an open source investigative group like Bellingcat says that an image was taken in Ukraine, the claim might be wrong, but an attacker can't credibly pretend to be them, say things on their behalf, or tamper with what was said. I'll spend the next part of my talk going into detail on both of these points. I'm going to explain how exactly we can use cryptography to raise the bar for credible data, and I'm going to share some JavaScript libraries that I recommend, so you can go home and try them out.
Attackers and Defenders
I find it helpful to think in terms of attackers and defenders. The defenders, hopefully that's all of us, are trying to make sure that real images are believable and that fake AI generated images are recognized as such and not confused with real images. The attacker in this scenario is trying to create confusion. They're trying to trick the general public into thinking that an AI generated image is a real image. The attacker wins if the fake image can't be distinguished from the real image. We're going to play a little game. The image on the left is the real photo from Tiananmen Square, taken on June 5, 1989, and the image on the right is an AI generated photo that someone made that is supposedly a selfie by the same man. In this game, the attacker succeeds if they can make it credible that this fake photo on the right was taken on June 5, 1989, and that the location is Tiananmen Square. If we want to make it impossible to backdate images, we need to use a timestamping service. What is a timestamping service? A timestamping service, very roughly, allows you to submit an image and get back a proof that the image existed before a certain point in time. One thing that's very important to note here is that we have to timestamp proactively, well before we want to verify the image. Ideally, this would be done as soon as possible after the image is created. You can think of a timestamping service as a service that takes the image you give it and then prints it in The New York Times. That's not exactly how it works, because that would be wildly expensive and inefficient. Let's put the inefficiency aside for now; we'll come back to that. If we did this, what would it give us? We would have proof that we could show a third party that the image existed before the paper was printed. The image couldn't have been created after the newspaper was printed, because that's not how reality works. Because this particular image was printed in The New York Times, we know for sure that it was taken before June 6, 1989. You don't have to take my word for this or trust me in any way. If we take the attacker's perspective, this is not looking very good for us. The real image was printed and sent to hundreds of thousands of subscribers all over the country, and archived in libraries and museums, and this fake image doesn't have anything like that. We could try to tamper with one museum's records. We could try to switch out one library's physical copy of the newspaper, but that would be easily found out if anyone compared copies with anyone else. We're not going to go to hundreds of thousands of subscribers and switch out their newspapers with another copy. That's just not feasible. The principle behind this is solid. It allows us to distinguish between real and fake historical photos. Like I said, though, printing images in newspapers is wildly inefficient. How can we fix that? We can use something called a cryptographic hash.
Cryptographic Hash Functions
A cryptographic hash is a unique identifier derived from the contents. If anything changes in an image, even a bit or a pixel, then the hash that's produced will be drastically different. Here are some properties of cryptographic hash functions. First, they can take any input size, so it doesn't matter how big the image or other input is; it could be 3 terabytes, and that would still be fine. Second, they always produce the same output size. For instance, a common hash algorithm is SHA-256, which always produces 256 bits no matter what the input size is. In that sense, it's summarizing. Third, cryptographic hashes are collision resistant. A collision happens when you have two distinct inputs that each hash to the same output. Being collision resistant means that, in practice, you can't find two distinct inputs that hash to the same output. Another way to think about this is that the hash is a unique, distinct digital fingerprint for whatever image or data we're hashing. Fourth, cryptographic hash functions hide the input, meaning that just by looking at the hash, the output, you have no idea what the input was. It's as if the output is a random number. Another way to word this is that cryptographic hashes are one-way functions that can't be reversed. Lastly, I think it's important to note, because there's been a lot of controversy around this with blockchain stuff, that hashes themselves are really easy and cheap to compute. There's no major energy usage or anything like that. You can think of a hash just like any other function in your program.
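To make the fingerprint idea concrete, here's a minimal sketch using the browser's built-in Web Crypto API (discussed more below). It hashes two inputs that differ by a single character; the two digests share no visible pattern. Run it in an ES module or a modern browser console, where top-level await is available.

```js
// Hash a string with SHA-256 and return the digest as hex.
async function sha256Hex(text) {
  const data = new TextEncoder().encode(text);
  const digest = await crypto.subtle.digest('SHA-256', data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}

// Inputs differing by one character produce drastically different hashes.
console.log(await sha256Hex('man standing in front of tanks'));
console.log(await sha256Hex('man standing in front of tankz'));
```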
I have a weird analogy that I like to use to explain cryptographic hashes. Cryptographic hashes are like a truck seal, like the one pictured here. A truck seal is a plastic loop that seals the cargo of a semi-truck. In a shipping scenario, there are usually three companies: the shipper, which is usually the company that's selling the goods; the trucker, who's transporting them; and the receiver, which is usually the buyer. The truck seal has a serial number on it, and that's recorded by the shipper. Then at receiving, the receiver cuts the truck seal and checks that the serial number matches what was sent to them separately by the shipper. If the number on the seal matches, then you know that the truck hasn't been tampered with, no matter how shady the trucker is or how dangerous the route is. A truck seal number is like a cryptographic hash in that if anything at all happens to the truck, the truck seal number must change too, because the old seal must have been cut and a new one put on. Unlike a serial number, though, a cryptographic hash isn't just a random number. It's a summary, a digest of everything that's in the file or the image.
I just want to give a small demo of how this works. We're going to upload a file and then see that the hash is produced pretty much instantaneously. We're just going to choose the AI version of the Tank Man image, and then we hash it. As we can see, it's hashed instantaneously. This was all happening on the frontend. The file didn't have to go anywhere; it didn't have to upload anywhere. Pretty easy.
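The demo itself isn't reproduced in the transcript, but a minimal frontend version looks roughly like this (the `file-input` element id here is made up for illustration):

```js
// Hash a user-chosen file entirely in the browser; nothing is uploaded.
// Assumes an <input type="file" id="file-input"> element on the page.
document.getElementById('file-input').addEventListener('change', async (event) => {
  const file = event.target.files[0];
  const buffer = await file.arrayBuffer(); // read the file into memory
  const digest = await crypto.subtle.digest('SHA-256', buffer);
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
  console.log(`SHA-256: ${hex}`);
});
```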
JavaScript Libraries for Hash
I promised some JavaScript library recommendations. First, there's the Web Crypto API that's available in your browser right now. You can call it with crypto.subtle.digest, and you can choose the hashing algorithm among a few options. I used SHA-256 for the demo, just because it's one of the most commonly used right now. There's a downside to the Web Crypto API, though, which is that it doesn't support streaming. If you have a big file, you're going to have to load all of it into memory before you can actually hash it. Another recommendation is the noble hashes JavaScript library. It's written entirely in JavaScript, it's audited by a known security firm, and it supports streaming. It also supports some newer hashing functions like BLAKE3, which the Web Crypto API doesn't. The code on the screen is just some example code that you can use. With the noble hashes library, you call create, and then you can call update as many times as you want with more data. That's how the streaming works. Then when you call digest, that spits out a hash, and you can't update anymore.
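The on-screen code isn't captured in the transcript, but based on the library's documented API, the streaming usage looks roughly like this (import paths can differ between versions of @noble/hashes):

```js
import { sha256 } from '@noble/hashes/sha256';
import { blake3 } from '@noble/hashes/blake3';

// Stand-ins for chunks of a large file.
const chunk1 = new TextEncoder().encode('first chunk of a large file');
const chunk2 = new TextEncoder().encode('second chunk');

// Streaming: feed the data in pieces, so a large file never has to sit
// fully in memory the way it does with crypto.subtle.digest.
const hasher = sha256.create();
hasher.update(chunk1); // call update() as many times as you want
hasher.update(chunk2);
const digest = hasher.digest(); // finalizes; no more update() calls allowed

// noble also supports newer algorithms that the Web Crypto API lacks:
const b3Digest = blake3(chunk1);
```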
The Hash (A Unique Digital Fingerprint)
If we go back to the properties of cryptographic hash functions, we can see that they produce a unique digital fingerprint: the hash. Rather than putting the image itself in The New York Times, we can hash the image to get a unique fingerprint, and then put that in The New York Times. We haven't lost any rigor; we're still proving that an image existed before the newspaper was printed. Now we can print just the hash, which is a lot less data. The reason this works is because of the collision resistance and hiding properties of cryptographic hash functions. If I hash an image and put that hash in the newspaper, I'm not going to be able to come back later and claim that it was some other image that I timestamped; that would be a collision, which won't happen. If I make up a random string and put that in the newspaper, I'm not going to be able to find some image that would have produced that hash. If I did manage to find something, it's not going to look like an image; it's just going to look like random data. Another way to think about this: if we only printed a description, such as, "there's a man in front of tanks," and we put that in The New York Times, it wouldn't work. Obviously it wouldn't work, because it's not collision resistant, and it's not hiding. The AI image and the real image both fit this description. We're not proving anything about the existence of the images before a certain point in time, and we can also produce new images that would fit this description. Not just any summary is going to work; it really has to be a cryptographic hash function with these specific properties.
Hashing Hashes
As you might have noticed, this is still incredibly inefficient. If this were our methodology, The New York Times would just be full of gibberish. There wouldn't be any room for actual articles. How can we get more efficient? We can hash hashes. In this case, let's say that we want to timestamp two images. We can hash them both, concatenate those hashes together, hash that, and put the resulting hash in The New York Times. If we want to add more images, we can do that; we just have to add some more levels and hash multiple times. Here, we're hashing four images and summarizing them into a single hash, which is then recorded in the newspaper. Why does this work? If we want to prove that, say, this image on the top was timestamped, we can store a proof. Specifically, in this case, what we would need to store is the sibling hash. If we have that, we can go through all the steps, starting with the image, and verify that the result was actually put into the newspaper. We can prove that the image existed at this certain point in time.
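Here's a sketch of that two-image case, using the noble hashes library from earlier (the byte strings stand in for real image data):

```js
import { sha256 } from '@noble/hashes/sha256';

// Stand-ins for the raw bytes of two images we want to timestamp.
const imageA = new TextEncoder().encode('image A bytes');
const imageB = new TextEncoder().encode('image B bytes');

// Concatenate two hashes and hash the result.
function combine(left, right) {
  return sha256(new Uint8Array([...left, ...right]));
}

// Hash each image, then combine: this single value is what would be
// printed in the newspaper.
const hashA = sha256(imageA);
const hashB = sha256(imageB);
const published = combine(hashA, hashB);

// To later prove image A was included, store only hashB (the sibling).
// Anyone can recompute from the image and compare to what was published:
const recomputed = combine(sha256(imageA), hashB);
// `recomputed` equals `published` byte for byte.
```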
The Merkle Tree
You might have noticed that if we just rotate that, it looks like a binary tree, like the kind that we might see in computer science. It's a binary tree with data at the leaves and summarizing hashes above. This is called a Merkle tree. The root of the Merkle tree can be recorded elsewhere as a commitment to all of the values in the tree. If anything in the tree changes, that root hash is going to change as well. Importantly, a succinct proof can be produced where the proof size is approximately the height of the tree, so O(log2 n). Now we have the efficiency that we need. We can summarize many images. We can have as many Merkle trees as we want, and we can produce this one hash that summarizes everything that's in those Merkle trees. When I talked about committing the root of the Merkle tree, this is known as anchoring. Surety, a timestamping service that started in the '90s, did actually literally anchor in The New York Times. This is Stuart Haber, one of the co-founders; he wrote a very influential paper, I think in 1991. This is him showing the actual entry in The New York Times classified section where they committed to this hash. You can also anchor in any other medium; it just has to be publicly viewable and unlikely to be tampered with. Another anchoring medium that's in use right now is Bitcoin. The timestamping services don't actually use the cryptocurrency; they take advantage of it being a subsidized data slot, basically. If you create a Bitcoin transaction, there's a place where you can put arbitrary data, and so they use that to store the root hash. The important thing is that the act of anchoring is a commitment, and you don't want it to be easy to change afterwards. OpenTimestamps is an example timestamping service. It has a UI where you can choose a file to timestamp; it then hashes the file locally, batches that hash into a Merkle tree with a number of other hashes, and records the root hash in a Bitcoin transaction.
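To show how a proof of that size gets checked, here's a sketch of Merkle inclusion verification. The `{ sibling, siblingOnLeft }` proof format is invented for illustration; real services define their own encodings.

```js
import { sha256 } from '@noble/hashes/sha256';

const toHex = (bytes) =>
  Array.from(bytes).map((b) => b.toString(16).padStart(2, '0')).join('');

// Walk from the leaf up to the root, hashing in one sibling per level.
// The proof holds roughly log2(n) sibling hashes for a tree of n leaves.
function verifyInclusion(leafBytes, proof, expectedRootHex) {
  let node = sha256(leafBytes);
  for (const { sibling, siblingOnLeft } of proof) {
    node = siblingOnLeft
      ? sha256(new Uint8Array([...sibling, ...node]))
      : sha256(new Uint8Array([...node, ...sibling]));
  }
  // The recomputed root must match the anchored (published) root.
  return toHex(node) === expectedRootHex;
}
```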
Let's give this a try. We'll see if it works. Before I came to this talk, I actually timestamped the AI Tank Man image. Now we're going to go through and try to verify that proof. Let's see. The OTS file is the proof. It's going to ask me for the file that I timestamped, which is the original. It says success. That root hash was put in a Bitcoin transaction. If you actually inspect the OTS file, what you'll see is a bunch of hashes prepended and appended to each other. That's just following the Merkle proof all the way to what's actually included in a Bitcoin transaction. My JavaScript library recommendation for timestamping is just the OpenTimestamps JavaScript library. If you're just playing around with things on the frontend, it's easier to use the minified version that they provide, because otherwise you have to polyfill Node buffers and things like that. You can also build it yourself if you want to. Here's a snippet of code where all we're doing is taking a file, getting the data, and timestamping it, using SHA-256 as the particular hashing function.
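The snippet on screen isn't in the transcript, but going by the javascript-opentimestamps README, stamping and verifying look roughly like this in Node (a sketch; the filename is hypothetical, and you should check the current README for exact signatures):

```js
const fs = require('fs');
const OpenTimestamps = require('opentimestamps');

async function stampAndVerify() {
  const fileBytes = fs.readFileSync('tank-man-ai.jpg'); // hypothetical file

  // Stamp: hash the file locally, submit the hash to calendar servers,
  // and get back an .ots proof to store alongside the file.
  const detached = OpenTimestamps.DetachedTimestampFile.fromBytes(
    new OpenTimestamps.Ops.OpSHA256(),
    fileBytes,
  );
  await OpenTimestamps.stamp(detached);
  fs.writeFileSync('tank-man-ai.jpg.ots', detached.serializeToBytes());

  // Verify later: pair the original file with its .ots proof.
  const otsBytes = fs.readFileSync('tank-man-ai.jpg.ots');
  const detachedOts = OpenTimestamps.DetachedTimestampFile.deserialize(otsBytes);
  const detachedFile = OpenTimestamps.DetachedTimestampFile.fromBytes(
    new OpenTimestamps.Ops.OpSHA256(),
    fileBytes,
  );
  const result = await OpenTimestamps.verify(detachedOts, detachedFile);
  console.log(result); // attestation info once the Bitcoin anchor confirms
}

stampAndVerify();
```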
Making It Impossible to Credibly Backdate an Image
We went over how to use timestamping services to make it impossible to credibly backdate an image. Of course, we can do this with more than images; we can do it with whatever data we like. We can do this with metadata, with attestations or claims about an image. We can timestamp those too. When we timestamp data about an image, that gives us tamper-evident properties similar to what the truck seal does for the cargo truck, and we can know for sure that this data hasn't been altered since the date it was timestamped. Also, I just want to point out that because of the hiding property, we're not actually revealing any information if someone just sees the hash. Even if the image itself needs to be private, we can timestamp it publicly without revealing the image to the public.
Caveats
There are some caveats. First, like I said, we have to proactively timestamp, long before an image is called into question. If we timestamp when we're trying to investigate an image, it's like putting the truck seal on after the truck has already arrived at receiving. That's not going to help. People have built software for this: ProofMode is an app that you can use right now, which timestamps using OpenTimestamps when you take a picture with your phone. Second, someone has to keep the proof around. It's not that much data, but it's a little more data that someone has to store in order to be able to verify that a photo actually was timestamped. A timestamping service can store that for you, but then you have to trust them to keep it. Third, timestamping, just because of the nature of hashes, doesn't handle other versions of the same file. If we edit, resize, or change the file format, we're going to have a completely different hash, and that new hash isn't going to be timestamped. What we can do there is create a link between the two different images. We can say that this file is a derivative of this other file, as in the sketch below. Then if you have the edited file, you can follow the trail back to the file that was timestamped.
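As a sketch of what such a link could look like (this schema is entirely hypothetical; projects like Authenticated Attributes define their own formats):

```js
// A hypothetical record linking an edited file back to the original
// that was actually timestamped. Sign and timestamp this record too,
// so the link itself can't be forged or backdated.
const derivativeClaim = {
  claim: 'derived-from',
  derivedSha256: '<hash of the resized/re-encoded file>',
  originalSha256: '<hash of the file that was timestamped>',
  transformation: 'resized and converted from PNG to WebP',
};
```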
Making It Nearly Impossible to Fake Sources
The second thing that we can do is make it nearly impossible to fake sources. Let's say a photographer says that they've taken a particular photo, or a news service says that they published the photo. We can know that it's them saying it and not someone pretending to be them. What do I mean by making it nearly impossible to fake sources? If I just show you the screenshot, you're trusting me. I could have completely made up this screenshot and put Apartment Therapy's name on it. I could tell you that Time Magazine says that this picture of the Pope is real. To tell whether I'm telling the truth or not, you'd have to go to the source; you'd have to go to Time Magazine itself. What if you didn't have to?
Digital Signatures
We can use digital signatures. Digital signatures allow us to sign a piece of data in a way that can't be forged or repudiated. Digital signatures use public key cryptography. The signer has both a public and a private key, and uses the private key to sign over the data. Then anyone with the public key can verify that the holder of the private key did indeed sign. This can't be forged, and unlike a paper signature, it can't be cut out and applied to a different document. You've probably all used digital signatures before. If you've ever used SSH to push to a GitHub repo, you've created a digital signature. Also, certain credit cards, if you're actually inserting the chip into a reader, create digital signatures using a private key in the card. Digital signatures allow us to share secondhand data as if it came directly from the source. They allow us to authenticate who's making a claim about information. We can prove that someone made a claim, regardless of how that claim came to us. That means there's a big benefit: the claim can be shared by untrusted, even malicious third parties, and we can still authenticate who's making it. In other words, with digital signatures, if there's any tampering by a relaying party, they won't be able to forge a new digital signature that covers up the tampering. Going back to the truck seal analogy, you can think of it like a truck seal that has an unforgeable company brand on it, which proves who did the shipping. If the trucker or an attacker tampers with the truck, they won't be able to create a new seal with the shipper's company brand on it. How can we use digital signatures? We can sign the images themselves, and we can sign over the metadata and the attestations individually. If someone asks us where a photo was taken, we can share data saying that it was taken in Morton Grove, Illinois, and sign that data. Then someone can know that we said that, regardless of how that information gets shared, as long as the signature gets shared too.
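As a sketch of that sign-then-verify flow, here's the noble curves package mentioned later in the library recommendations (the claim contents are made up for illustration):

```js
import { ed25519 } from '@noble/curves/ed25519';

// The signer generates a keypair; the public key is shared with verifiers.
const privateKey = ed25519.utils.randomPrivateKey();
const publicKey = ed25519.getPublicKey(privateKey);

// Sign a claim about an image (here, a location attestation).
const claim = new TextEncoder().encode(
  JSON.stringify({ imageSha256: '<image hash>', location: 'Morton Grove, Illinois' }),
);
const signature = ed25519.sign(claim, privateKey);

// Anyone with the public key can verify, no matter who relayed the claim.
console.log(ed25519.verify(signature, claim, publicKey)); // true

// Any tampering by a relaying party breaks verification.
claim[0] ^= 1; // flip one bit
console.log(ed25519.verify(signature, claim, publicKey)); // false
```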
If we go back to our little game, where the attacker succeeds if they're able to convince us that the metadata is true for the fake image on the right, we can see some big differences once we expect real images to have digitally signed metadata. For the real image, we might expect to see a signature by the photographer. We might see a signature by the Associated Press or whoever was involved in publishing it. We might also see a signature from open source investigators who independently investigated the image and verified it. For the fake image, we're not going to see anything like that. There are a number of things an attacker might do. First, we might see no signatures at all. Or we might see signatures from unknowns: they just generate some signatures with some random keys. The signatures are there, but we don't know who made them. In other words, the signatures would be technically valid, but they wouldn't be from anyone that we know or anyone that we trust. We might also see forged signatures. These wouldn't verify successfully, but if we don't bother doing the check, they might look legitimate. The attacker can try to confuse us in various ways, but there's going to be a real distinction between the fake images and the real images.
Caveats
Some caveats for digital signatures. For people to be able to verify our signatures, they have to know our public key. There's an entire business to this, including certificate authorities and PGP Web of Trust type stuff. It can be tricky, especially if you rotate or change your key. Secondly, someone has to keep the signature data around with the metadata and attestations, so we're adding a little more data to our records.
JavaScript Libraries for Digital Signatures
In terms of libraries to use, the noble crypto library is great for signatures, although it doesn't handle private key security for you. I would only recommend it for server usage, and even then you have to manage your own key security, which is a nightmare. Another option that might be better, especially if you're trying to build this for users other than yourself, is to use a crypto wallet that's designed to store the private key for the user, so that the developer never has to touch it. MetaMask is a well-known crypto wallet for Ethereum, which has a function called personal_sign. You just put that call in your frontend, and it will bring up the extension and allow your website to ask the user to sign arbitrary data. Here's an example of what that looks like. As you can see, the user gets to see what they're signing, which is obviously very important. They can choose to sign or not. It uses the private key that's in the crypto wallet. The downside here is that you have to use the same elliptic curve digital signature algorithm that Ethereum uses. If you don't have a preference, that should work fine. Importantly, like I said, if you use this, you never have access to the private keys, which is very important.
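A sketch of that frontend call (the claim is made up for illustration; window.ethereum is injected by the MetaMask extension, and top-level await assumes a module script):

```js
// Ask the user to connect and expose an account.
const [account] = await window.ethereum.request({
  method: 'eth_requestAccounts',
});

// The claim the user will be shown and asked to sign.
const message = JSON.stringify({
  imageSha256: '<image hash>',
  claim: 'Taken in Morton Grove, Illinois',
});

// personal_sign expects a hex-encoded message plus the signing account.
const hexMessage =
  '0x' +
  Array.from(new TextEncoder().encode(message))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');

// MetaMask pops up, shows the message, and signs with the wallet's key;
// the private key never leaves the wallet.
const signature = await window.ethereum.request({
  method: 'personal_sign',
  params: [hexMessage, account],
});
console.log(signature);
```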
Authenticated Attributes (Starling Lab)
If the distinction between AI generated images and real images is that real images have credible data associated with them and AI generated images don't, then that means that right now we need to start raising our expectations of real images. We need to start expecting that real images are going to be timestamped, and that real images are going to have digitally signed metadata and attestations about them. These expectations are only going to work if we actually start doing those things for real images. My call to action for all of you is, if you work with images at all, start timestamping data automatically. We need to build systems that allow people to make attestations about images, to digitally sign those attestations, and to share that information with others.
That's actually what I worked on with Starling Lab. It's a project called Authenticated Attributes. The target audience was not the general public; it was open source investigators who are already familiar with looking at an image, gathering the supporting documents, looking at claims about the image, that sort of thing. It allowed the open source investigators to create, view, and share authenticated data about images, including sharing the supporting evidence and knowledge about derivative or related images: the different file formats, the edited photos, those sorts of things. In terms of status, it's currently in the process of becoming production ready, but the code and a demo have been released. The code is available on Starling Lab's GitHub, under the authenticated-attributes repo. Like I said, it's still in the early stages, and I would expect that as this gets more adopted, there are going to be edge cases and other things that can only be learned through widespread adoption. Just to hammer this home, it's my belief that in order to combat fake images, we have to raise the bar for what we expect from real images. In order to do that, we the software engineers need to build systems that allow us to meet that high bar. We need to understand timestamping services and digital signatures, and we need to use those concepts in the systems that we build. You can check out the Starling Lab GitHub repo, github.com/starlinglab/authenticated-attributes, and start playing around with some of the libraries and concepts.
Questions and Answers
Participant 1: Wouldn't it be more efficient for the hashing to happen at the moment of creation, like on the client device?
Sills: Yes, that's absolutely true. For anyone who's involved in any kind of app that allows people to take photos, I definitely think that hashing and timestamping at that point would be preferable. Like I said, there's the app ProofMode that actually does that for users. ProofMode is being used, for instance, in Ukraine to document war crimes, because in that case it's highly contested, and the investigation or the trial might actually happen, like, 10 years later. You really want it to be timestamped at the moment of creation. Yes, that's definitely preferred. As we're going into this new paradigm, though, we're probably going to have to timestamp a lot of stuff that wasn't timestamped at the moment of creation. I think it's always better late than never in those cases. Then there are things that actually do need to be backdated. We can't technically backdate them; we've made sure that's not possible. That's where the attestations and the claims come in. The photographer could say, "I took that photo in 1989. I didn't timestamp it at that point, but here's my digitally signed attestation that that happened." It's definitely better to do it right away, and best to timestamp as soon as possible, but if you can't do that, then attestations are a solution.
Participant 2: What about hashing algorithms that actually produce the same hash if the images look the same, if they've been resized, or have a different file format, things like that?
Sills: That is actually very helpful. That's called perceptual hashing. Rather than cryptographic hashing, where if even a bit changes you get a completely different hash, perceptual hashing is intended to bucket images that look the same to the human eye. It's very useful in a non-adversarial context. If you've just edited an image and want to search for images that might be the original, things like that, it will work. However, in an adversarial context, it can be easily manipulated. If I want to make sure that the perceptual hashing algorithm doesn't work, I can just insert garbage data into the image in a way that doesn't actually change what it looks like, but will mess with the hashing algorithm. It's great for some uses, but not for extremely adversarial use cases.
Participant 3: What prevents an attacker from finding a time source in the past?
Sills: All of the strength of the timestamping depends on the integrity of the timestamping service. Take the incredibly inefficient case where you're just putting the image in the newspaper. That can't be tampered with unless you're printing alternative newspapers or doing some crazy stuff. When we use hashes as a way to summarize that, the timestamping service doesn't have to be trusted entirely, because once you have the proof, it doesn't matter what the timestamping service does. Anyone in the world with the proof can go through all the steps and verify things for themselves, just by doing the hashing themselves. What prevents the attacker from tricking this is simply that they can't change what things hash to. If an attacker were to try to create something that would trick the general public, it would just fail verification. The flipside is that it does require the software, or the people, to actually perform the verification; otherwise it's not doing very much.