Scientists Can Now Listen to Your Conversation by Looking at Your Potato Chip Bag (VIDEO)
What's that you said? Scientists can now listen to your conversation with the help of a potato-chip bag? It turns out that researchers at MIT, Microsoft and Adobe have created an algorithm that can reconstruct an audio signal simply by analyzing minute vibrations of objects depicted in video. In fact, they were able to recover intelligible speech from the vibrations of a potato-chip bag photographed from 15 feet away through soundproof glass.
"When sound hits an object, it causes the object to vibrate," said Abe Davis, a graduate student in electrical engineering and computer science at MIT, in a news release. "The motion of this vibration creates a very subtle visual signal that's usually invisible to the naked eye. People didn't realize that this information was there."
Reconstructing the sound wasn't easy, of course. It required that the video's frame rate, the number of frames captured per second, be higher than the frequency of the audio signal. To put that in perspective, some smartphones capture 60 frames per second, while the scientists used a camera that captured 2,000 to 6,000 frames per second. The best commercial high-speed cameras can capture up to 100,000 frames per second.
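The frame-rate requirement can be illustrated with a few lines of Python. This is only a sketch: it assumes the standard Nyquist rule from signal processing (a sampler faithfully captures frequencies only up to half its sampling rate), which is stricter than the wording above. The function names and the 440 Hz test tone are illustrative and do not come from the researchers' work.

```python
def nyquist_limit(frame_rate_hz):
    """Highest frequency a camera at this frame rate can capture without
    aliasing, assuming the standard Nyquist criterion."""
    return frame_rate_hz / 2.0


def apparent_frequency(tone_hz, frame_rate_hz):
    """Frequency at which a pure tone appears after being sampled at the
    given frame rate. Tones above the Nyquist limit fold back (alias)."""
    folded = tone_hz % frame_rate_hz
    return min(folded, frame_rate_hz - folded)


if __name__ == "__main__":
    tone_hz = 440  # illustrative test tone, not a figure from the study
    for fps in (60, 2000, 6000):
        print(
            f"{fps} fps: Nyquist limit {nyquist_limit(fps)} Hz, "
            f"a {tone_hz} Hz tone appears at {apparent_frequency(tone_hz, fps)} Hz"
        )
```

Under these assumptions, a 60-frames-per-second camera would misreport the test tone as a much lower one, while the 2,000 and 6,000 frames-per-second rates cited above capture it at its true frequency.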
The scientists measured the mechanical properties of the objects they were filming and determined that the motions they were measuring were about a tenth of a micrometer. In a close-up image, that corresponds to five thousandths of a pixel. From the change of a single pixel's color value over time, it's possible to infer motions smaller than a pixel.
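A toy example shows how a change in one pixel's value can reveal motion far smaller than a pixel. It is a sketch under loudly stated assumptions, not the researchers' algorithm: it models a single pixel straddling a sharp edge between a dark and a bright region, with made-up brightness values of 50 and 250, and assumes the pixel's value is simply the area-weighted average of the two.

```python
def edge_pixel_value(edge_pos, dark=50.0, bright=250.0):
    """Value of a pixel spanning [0, 1) when the edge sits at edge_pos.
    The part left of the edge is dark, the part right of it is bright.
    (Hypothetical brightness levels, chosen only for illustration.)"""
    return dark * edge_pos + bright * (1.0 - edge_pos)


def estimate_shift(value_before, value_after, dark=50.0, bright=250.0):
    """Infer how far the edge moved, in pixels, from the change in value.
    Moving the edge right by d darkens the pixel by (bright - dark) * d."""
    return (value_before - value_after) / (bright - dark)


if __name__ == "__main__":
    before = edge_pixel_value(0.5)
    after = edge_pixel_value(0.505)  # edge moves five thousandths of a pixel
    print("pixel value before:", before)
    print("pixel value after: ", after)
    print("estimated shift (pixels):", estimate_shift(before, after))

    # Scale implied by the figures above: 0.1 micrometer per 0.005 pixel.
    print("micrometers per pixel:", 0.1 / 0.005)
```

In this simplified model, a shift of five thousandths of a pixel changes the pixel's value by about one brightness level out of 256, which hints at why such small motions are hard to measure but not impossible.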
That's not to say that less precise instruments can't be used. The researchers also used more conventional cameras in some of their experiments, and found that they could recover enough information to identify the gender of a speaker and the number of speakers in a room.
"We're recovering sounds from objects," said Davis. "That gives us a lot of information about the sound that's going on around the object, but also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways."
The technique has obvious applications in law enforcement and forensics. It could also be used to learn about the object itself, since different objects respond to sound in different ways.
"This is new and refreshing. It's the kind of stuff that no other group would do right now," said Alexei Efros, one of the researchers. "We're scientists, and sometimes we watch these movies, like James Bond, and we think, 'This is Hollywood theatrics. It's not possible to do that. This is ridiculous.' And suddenly, there you have it. This is totally out of some Hollywood thriller. You totally know that the killer has admitted his guilt because there's surveillance footage of his potato chip bag vibrating."
The findings are published in the journal ACM Transactions on Graphics.
Want to learn more? Check out the video below, courtesy of YouTube.