Deepfakes and Inadequate Data Provenance Are Escalating Problems


What fiascos we have witnessed in the past three months in the cybersecurity realm.

Last December, we learned that SolarWinds, a major US information technology firm, was slammed by a cyberattack that impacted up to 18,000 of its customers, including multiple federal government agencies and Fortune 500 companies. Microsoft President Brad Smith has described it as “the largest and most sophisticated (cyber) attack the world has ever seen.”

Then, this month, Microsoft itself got whacked. Vulnerabilities in its Exchange Server software were exploited to breach at least 30,000 US organizations, including small businesses and local governments. And that was followed only a week later by a warning from the FBI that malicious actors "almost certainly" will be using so-called deepfakes to advance influence campaigns.

Even though the FBI named no likely victims, deepfakes, surprisingly, may ultimately prove the most troublesome development of all. Deepfakes are manipulated digital content, such as video, images, audio and occasionally text. They alone have the ability to sow disinformation throughout a society already drowning in it.

The concept of altered video, in particular, has been around for decades, but deepfake videos are a relatively new threat to the cybersecurity world and a significant menace to crucially important data provenance. They give people with digital chops the ability to alter reality so that a subject appears to say anything the creator wants, often something inflammatory or incriminating. The foundation is facial mapping and artificial intelligence, and the technology has already improved to the point that, in some cases, bogus videos are extremely challenging to spot; some even accurately mimic the way a person blinks and how often.

All this undermines data provenance, the historical record associated with a piece of data, and hence the ongoing digitization of the global economy, which requires trustworthy data to function. Distorted data need only spark doubts about the veracity of data in general, which in the digital world, of course, comes down to 1s and 0s.

Data provenance determines where data originated, where it has moved over time, what changes have been made to it, and who made them. The concept originated in the art world, where complex systems of authentication were required to prove that a piece of art was indeed produced by a specific artist. Today, data provenance is incredibly important to corporate record-keeping.
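To make that record concrete, here is a minimal, illustrative sketch (not any vendor's product) of a tamper-evident provenance log in Python. Each entry records who did what and when it happened in the sequence, plus a hash of the previous entry, so any later edit to the history breaks the chain. The field names and example actors are hypothetical.

```python
import hashlib
import json

def add_provenance_entry(log, actor, action, detail):
    """Append an entry that also commits to its predecessor's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"actor": actor, "action": action, "detail": detail, "prev": prev_hash}
    # Canonical JSON (sorted keys) so the hash is reproducible.
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return log

def verify_provenance(log):
    """Recompute every hash; any retroactive edit breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
add_provenance_entry(log, "alice", "created", "q3_report.xlsx")
add_provenance_entry(log, "bob", "edited", "updated revenue figures")
```

The same hash-chaining idea underlies the ledger-based verification products mentioned later in this post; real systems add signatures and trusted timestamps on top.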

Manipulated data can take the form of text, numbers or even voice, not just video and images, in any number of nefarious applications, and all of it boils down to the weaponization of data. If appropriately skilled hackers gain access to an organization's network, they could insert false or inaccurate records into existing databases and drive companies to make major strategic decisions based on fabricated information. They could also, say, mimic a recorded public quarterly earnings call with the voice of the CEO or CFO and disseminate inaccurate numbers, perhaps impacting the price of the company's stock.
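One well-established defense against that kind of silent database tampering, sketched here with Python's standard library, is to attach a keyed message authentication code (MAC) to each record. The key name and record format below are hypothetical; the point is that an intruder who can rewrite rows but does not hold the key cannot forge valid tags.

```python
import hmac
import hashlib

# Illustrative only: a real deployment keeps the key in an HSM or secrets vault,
# never alongside the data it protects.
SECRET_KEY = b"hypothetical-server-side-key"

def sign_record(record: str) -> str:
    """Compute a keyed MAC so a record cannot be silently rewritten."""
    return hmac.new(SECRET_KEY, record.encode(), hashlib.sha256).hexdigest()

def record_is_authentic(record: str, tag: str) -> bool:
    """Constant-time comparison guards against timing attacks."""
    return hmac.compare_digest(sign_record(record), tag)

row = "2020-Q4,revenue,118000000"
tag = sign_record(row)
```

Any altered or inserted row fails verification, which turns "fabricated information in the database" from an invisible problem into a detectable one.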

So far, though, few companies are taking significant steps to address data provenance, focusing instead on functions deemed a higher priority. “The importance of data provenance continues to grow, but we’re still in very early days, and I expect this to continue,” says Joe Witt, a vice president at Cloudera, a Silicon Valley-based data management company.

On the deepfake front, video creation remains the most common practice today, and deepfakes have moved beyond mostly distorting the faces of women involved in pornography to also mimicking prominent figures, including Barack Obama, Donald Trump, Elizabeth Warren and Mark Zuckerberg.

This sort of thing fuels the conspiracy theories populating the Internet. Authorities, for example, have said that the January attack on the Capitol in Washington, D.C. was spurred in part by misinformation propagated online by far-right influencers and media outlets.

The most advanced approach to creating a video or image deepfake is to use generative adversarial networks (GANs). A GAN consists of two deep neural networks competing against each other. Initially, both networks are trained on real images. Then one network, the generator, produces images, and the other, the discriminator, tries to determine whether each image is genuine or fake. The generator learns from the discriminator's verdicts, which in turn forces the discriminator to improve, and the two networks continue to refine each other in tandem.
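As a rough sketch of that adversarial loop, here is a toy GAN in numpy that learns a 1D distribution rather than images. The "networks" are deliberately trivial (one weight and one bias each) and every hyperparameter is illustrative; real deepfake GANs use deep convolutional networks, but the alternating generator/discriminator updates below are the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: samples from N(4, 0.5), standing in for genuine images.
def real_batch(n):
    return rng.normal(4.0, 0.5, n)

w, b = 0.1, 0.0   # generator: fake sample f = w*z + b
a, c = 0.1, 0.0   # discriminator: score d(x) = sigmoid(a*x + c)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

lr, n = 0.05, 32
for step in range(500):
    # --- Discriminator update: push d(real) toward 1, d(fake) toward 0 ---
    r = real_batch(n)
    z = rng.normal(0.0, 1.0, n)
    f = w * z + b
    dr, df = sigmoid(a * r + c), sigmoid(a * f + c)
    grad_a = np.mean(-(1 - dr) * r + df * f)   # d(loss)/da, by hand
    grad_c = np.mean(-(1 - dr) + df)
    a -= lr * grad_a
    c -= lr * grad_c
    # --- Generator update: push d(fake) toward 1 (non-saturating loss) ---
    z = rng.normal(0.0, 1.0, n)
    f = w * z + b
    df = sigmoid(a * f + c)
    dldf = -(1 - df) * a                       # d(-log d(f))/df
    w -= lr * np.mean(dldf * z)
    b -= lr * np.mean(dldf)

fake = w * rng.normal(0.0, 1.0, 1000) + b      # samples from the trained generator
```

Each round, the generator nudges its output toward whatever currently fools the discriminator, and the discriminator adjusts to catch it, which is exactly the tandem refinement described above.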

To combat this, some researchers have been developing tools that can detect deepfakes with extremely high accuracy. Algorithms, for example, have even learned how to spot very minor facial irregularities by studying past footage.

While helpful today, however, such detection is only a temporary fix: generation and detection are locked in an inevitable cat-and-mouse game, and each time defenders learn to spot a telltale flaw, deepfake creators learn to eliminate it.

There is a bit of good news on the deepfake mitigation front, however. In some cases, failure to initially recognize a deepfake might be sidestepped by measures available today and perhaps in the near future. Here are examples:

+ Reverse image search. This has empowered some journalists and fact-checkers to unearth the original photos from which forgeries were made. This type of online tool accepts an image and uses computer vision to find similar photos online, which can reveal that a photo has been altered.

+ Legal remedies. Scholars are calling for qualifications to Section 230 of the Communications Decency Act that would make it easier for private citizens to hold technology platforms accountable for disseminating harmful or slanderous content uploaded by their users. California has already banned the creation and circulation of deepfakes of politicians within 60 days of an election.

+ Blockchain. A few companies are selling blockchain-based verification. Content can be registered to an unalterable ledger at the time of creation. Should a discrepancy arise, customers can prove that their content is the original.
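The reverse-image-search item above hinges on perceptual similarity rather than exact byte equality. Here is a minimal sketch of one classic fingerprint, the "average hash" (production services use far more sophisticated models): downsample the image by block averages, threshold at the mean, and compare fingerprints by Hamming distance, so a lightly edited copy stays close to its original while a different image does not. The synthetic gradient "photos" below are stand-ins for real images.

```python
import numpy as np

def average_hash(img, size=8):
    """Fingerprint: block-average down to size x size, threshold at the mean."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    img = img[: h - h % size, : w - w % size]   # trim so blocks divide evenly
    blocks = img.reshape(size, img.shape[0] // size, size, img.shape[1] // size)
    means = blocks.mean(axis=(1, 3))            # one mean per block
    return (means > means.mean()).flatten()     # 64 bits for the default size

def hamming(h1, h2):
    """Number of differing bits; a small distance suggests the same source image."""
    return int(np.sum(h1 != h2))

# A synthetic grayscale "photo" (horizontal gradient) and two comparisons.
original = np.tile(np.arange(64), (64, 1))
altered = original.copy()
altered[0:8, 0:8] = 255        # small local edit; most of the image unchanged
unrelated = original.T         # a structurally different image
```

The altered copy differs from the original in only a bit or two of its fingerprint, while the unrelated image lands far away, which is what lets a tool surface the source photo behind a doctored one.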

No question, deepfakes and insufficient attention to data provenance are becoming significant security threats to companies and the nation overall. Let's hope that more will be done before long, perhaps including lobbying for legislation, to better contain deepfakes and improve data provenance.


Blogs posted to the RSAConference.com website are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the blog author individually and, unless expressly stated to the contrary, are not the opinion or position of RSA Conference™, or any other co-sponsors. RSA Conference does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented in this blog.

