An AI Nightmare Has Arrived for Twitter — And the FBI

As the school year kicked off in the Spanish town of Almendralejo, teenage girls began to come home telling their parents of disturbing encounters with classmates, some of whom were claiming to have seen naked photos of them. A group of mothers quickly created a WhatsApp group to discuss the problem and learned that at least 20 girls had been victimized the same way. Police opened an investigation into the matter, identifying several people they suspected of distributing the content — likely also teens, as the case is now being handled by a Juvenile Prosecutor’s Office.

So-called “revenge porn,” explicit material obtained by hacking or shared in confidence with someone who makes it available to others without the subject’s consent, is hardly a novel crime among adolescents. But this time, the images were startlingly different. Though they looked realistic, they had been created with a generative AI program.  

The incident is not an isolated one. American schools have already had to deal with students creating AI nudes to bully and harass one another. In October, the Muskego, Wisconsin, police department learned that at least 10 middle school girls had friended someone they believed to be a 15-year-old boy on Snapchat; after they sent images of themselves to this person, he revealed himself to be a 33-year-old man and demanded explicit photos, threatening to alter their original pictures with AI to make them appear sexual and send them to their family and friends. Earlier this year, the FBI warned of an uptick in the use of AI-generated deepfakes for such “sextortion” schemes, confirming that the agency is receiving reports from victims “including minor children.”

Industry experts and AI researchers have also warned that social media companies, particularly X (formerly Twitter), which under owner Elon Musk has purged employees working on trust and safety, are vulnerable to this new challenge: the rise of AI-generated child sexual abuse material, or CSAM. The imagery may depict the abuse of minors who do not actually exist, or directly victimize specific people. And while one can be criminally charged under various U.S. laws for making and disseminating it (in April, a New York man who posted deepfaked explicit images of 14 underage girls on a porn site and encouraged their harassment received a six-month jail sentence), prosecutors and law enforcement worry that the lack of statutes directly addressing the phenomenon leaves open a loophole for would-be predators. Several states have rushed to outlaw AI-generated sexual imagery of children as a result. For big tech, the chief concern is that this content will be far more difficult to detect than the known set of existing CSAM, which is typically auto-filtered with software that recognizes it. Even worse, it can be produced in massive volumes that would overwhelm human moderation teams. 

“There’s no static set of policies, enforcement tools, or product features that can wipe these predators out”

“Predators who find and distribute CSAM are constantly evolving their strategies and networks,” explains Vaishnavi J., who has formerly worked for both Meta, as head of youth policy, and Twitter (now X) as a senior policy manager. “There’s no static set of policies, enforcement tools, or product features that can wipe these predators out.” Generative AI, she says, is a “particularly vicious new vector” in this ongoing battle between tech companies and these bad actors, one that would potentially confuse efforts to combat real-world child abuse. Faced with both kinds of CSAM, it might prove “near-impossible to triage which of these images are actual children who need help,” J. says. She has “no idea how tech companies are structurally capable of addressing this.”

X is the Silicon Valley giant that has placed itself squarely at the center of the CSAM issue in the past year. Last November, shortly after he completed his takeover of the since-renamed Twitter, Musk began to claim that removing child exploitation material from the platform was “priority #1.” This was something of a detour from the issue that has most occupied him before and since that period — namely, bots — yet the tough talk resonated with right-wingers who incorrectly believed he had already succeeded in purging such content from the site. Musk happily took the credit for something he hadn’t done, even falsely alleging that previous Twitter leadership “refused to take action on child exploitation for years.” In fact, Musk was actively gutting the child safety team that had been in place when he arrived.

Unsurprisingly, then, child sexual abuse material has reportedly remained a scourge on X throughout 2023. According to the New York Times, harmful content that “authorities consider the easiest to detect and eliminate” is widely circulated and even algorithmically promoted. Twitter also stopped paying for some detection software previously used to help with moderation efforts. Accounts offering to sell CSAM haven’t gone away. And earlier this year, the Stanford Internet Observatory, “while conducting a large investigation into online child exploitation,” found “serious failings with the child protection systems at Twitter.” After detecting tweets that included known CSAM images using PhotoDNA, a Microsoft service that identifies this content based on matches to hash values — a sort of digital fingerprint — the Observatory in April reported them to the National Center for Missing and Exploited Children. But the problem continued, and Twitter didn’t remove the posts, which should have been caught on their end by PhotoDNA in the first place, until May 20. (Twitter then assured the Stanford researchers that it had improved its detection capabilities but did not offer a public statement, and did not reply to a request for comment from Rolling Stone.) And in July, Musk himself appeared to reinstate the account of a popular QAnon-affiliated conspiracy theorist who’d been suspended from the site for sharing a CSAM image that, by the platform’s own metrics, had at least 8,000 retweets and 3 million views.
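
Conceptually, that kind of detection is a fingerprint lookup: hash the upload, then check it against a database of fingerprints of already-identified abuse imagery. The sketch below is only an illustration of that idea, not PhotoDNA itself (Microsoft's perceptual-hash algorithm and API are proprietary); the plain SHA-256 hash and the empty database here are stand-ins.

```python
# Hypothetical illustration of fingerprint matching in the spirit of PhotoDNA-style
# detection. PhotoDNA's perceptual hash is proprietary, so SHA-256 stands in here,
# and known_hashes is an imaginary database of fingerprints from a clearinghouse.
import hashlib

# Fingerprints of previously identified abuse images; empty because this is a sketch.
known_hashes: set[str] = set()

def fingerprint(image_bytes: bytes) -> str:
    """Compute a stand-in fingerprint for an uploaded image."""
    return hashlib.sha256(image_bytes).hexdigest()

def is_known_csam(image_bytes: bytes) -> bool:
    """True if the upload matches a known fingerprint and should be blocked and reported."""
    return fingerprint(image_bytes) in known_hashes
```

A lookup like this only catches images that investigators have already found and catalogued; a freshly generated image has no fingerprint on file, which is exactly the gap that AI-made material exploits.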

If that’s how X is handling the child abuse content that should be caught with automated security tools, it doesn’t bode well for the company’s ability to mitigate AI-generated CSAM designed to circumvent those defenses. David Thiel, a big data architect and chief technologist of the Stanford Internet Observatory, says Musk’s platform is “definitely more vulnerable” than other large players in tech. “They allow a mix of explicit content and otherwise, and that makes it harder for them to actually police stuff,” he explains. “And obviously they are running on a pretty meager subset of their trust and safety staff.”       

Thiel co-authored a June report predicting the consequences of a rise in computer-generated CSAM. The paper found that among groups dedicated to child sexual abuse, the proportion of such material has “increased consistently” since August 2022, with 66 percent of it “highly photorealistic.” Compared to the process of filtering existing harmful content, Thiel says, identifying and blocking new forms of CSAM produced by visual machine learning models will require equally advanced methods of detection and enforcement. Still, Musk’s platform is hardly alone in being ill-equipped for the next phase of this arms race, and the rest of Silicon Valley may not fare much better. Along with X, Amazon, Meta, and Google parent Alphabet have laid off hundreds of trust and safety employees in the past year, contributing to fears — some already realized — that online hate speech and misinformation will go unchecked. The same business decisions may expose any of them to the perils of AI-generated CSAM.  

“People are not particularly well prepared, and that’s because it’s a really difficult problem to prepare for,” says Thiel. He describes how one area of the web — Lemmy, a fledgling alternative to Reddit that is decentralized and open-source — has already suffered eruptions of CSAM created with visual generative machine learning models. “This is all basically community-run and operated by hobbyist people that are doing it in their spare time,” he says of the various Lemmy subdivisions. “Someone had a grudge against one of the servers and started flooding that system with generated CSAM, to the point where the moderators got extremely burnt out and were overwhelmed. People were disabling images entirely in some cases, or just entirely shutting down their servers.” They had no way of defending against an attack of that nature and scale. “It’s unclear how it’s really going to shake out for larger platforms,” Thiel says.

“People are not particularly well prepared, and that’s because it’s a really difficult problem to prepare for”

While Meta, X, TikTok, Reddit and other social media giants are of course obligated to protect users from this threat, experts point out that developers of generative models bear responsibility for what people do with them. Stable Diffusion, for example, is open-source, making it more prone to appropriation for abuse. Carissa Véliz, an associate professor of philosophy at the University of Oxford’s Institute for Ethics in AI, says we’re seeing “the inevitable consequences that come when technology companies design a product that is very powerful, very easy to use, [and] very tempting to use” for nefarious purposes. “Use the public as guinea pigs. Let society figure out how to deal with the consequences. So in a way, if you think about the analogy of ecology, it’s similar to a factory creating a product that has these toxic negative externalities, and the company doesn’t deal with it. Society does.” 

Thiel agrees that the open-source model opened this particular can of worms. “Everybody [generating CSAM with AI] is effectively using something derived from Stable Diffusion 1.5, which was trained on a significant amount of explicit material,” he says. Stable Diffusion is a deep learning, text-to-image model first released last year by the startup Stability AI, and its publicly available code has been repurposed and modified in ways that violate its user agreement. The company’s language forbids the application of the model or its derivatives in “exploiting, harming or attempting to exploit or harm minors in any way” — but “nobody cares,” Thiel says. “People have taken those models and continued to train them on explicit content, such that they get fairly good at it. There’s no real guardrails. The only thing that people have to worry themselves with is tweaking it so that they can get the results that they like.” 

Only a “tiny percentage” of people, Thiel observes, are trying to use programs like Midjourney or DALL-E for these purposes. OpenAI, the research company that designed DALL-E, expressly forbids the creation of CSAM with its product, reports any to the National Center for Missing and Exploited Children, and removed violent and sexual images from the most recent model’s training data set. Perhaps most importantly, though, neither Midjourney nor OpenAI has released its proprietary source code.

Stability AI, the company behind Stable Diffusion, is in the more difficult position of having their code out in the wild. They, too, prohibit “illegal and immoral” misuse of their technology across platforms, a spokesperson tells Rolling Stone, but are also pursuing new ways to “mitigate the risk” of unsafe AI-created material. In the company’s view, “search engines should remove forums that provide instructions on how to adapt models to generate unsafe content and report them to the relevant authorities, and companies that host these forums should be sanctioned.” Stability AI advocates for greater regulation of the AI industry as a whole and has joined the Content Authenticity Initiative, a tech and media organization that works on the issue of content provenance, or verifying the origin of harmful digital material, including disinformation. (Thiel laments that research on tracking provenance with techniques like watermarking, which allows a computer to detect if a text or image was generated by AI, “came after the fact” of the models being unleashed.)                

In September, the National Association of Attorneys General, citing the proliferation of explicit deepfakes and AI-generated CSAM, urged Congress in a letter to appoint an expert commission and “propose solutions to deter and address such exploitation in an effort to protect America’s children.” It’s yet another sign of mounting anxiety over how to prevent a tsunami of harmful, exploitative AI imagery, but, as on the technical side, obstacles to legislative action abound. Nate Sharadin, a fellow at the research nonprofit Center for AI Safety, says that depending on what approach lawmakers take, they could run afoul of the First Amendment, because “it’s unclear whether and to what extent it’s technically illegal according to existing statutes to produce (and possess) AI-generated content of this sort,” and the Supreme Court has set a precedent in this area that could prove troublesome: while computer-generated CSAM is “morally disgusting,” Sharadin says, it may be protected expression in the U.S.

In the 2002 ruling Ashcroft v. Free Speech Coalition, Sharadin explains, SCOTUS struck down a portion of the Child Pornography Prevention Act of 1996 that criminalized “virtual” CSAM, or computer-generated images of children (who don’t exist) engaged in sex. The court decided that such content “was neither obscene nor met the definition” of CSAM. It’s not clear whether content like explicit deepfakes of real children could be protected by the same ruling. But in any case, if Congress tried to explicitly ban all AI-generated CSAM at the federal level, Sharadin says, he imagines the courts arriving at a similar decision. So, he explains, the Attorneys General “are going to need some new kind of statute passed by Congress if they want to address the problem, and that statute will likely need to target model developers rather than model users. One obvious way to do this is via a product liability law making it ungodly expensive to release a model that could generate this kind of content.”

Attorneys General “are going to need some new kind of statute passed by Congress if they want to address the problem”

But until the government does crack down on the next generation of CSAM vendors, or force AI developers to implement stronger safeguards in their generative models, social tech companies have no choice other than to fortify their bulwarks. The startup ActiveFence, a trust and safety provider for online platforms, is one company sounding the alarm about how predators are abusing generative AI, and helping others in the tech industry navigate the risks posed by these models. Like Thiel and his colleagues, the security company has tracked a spike in CSAM generated with AI, reporting in May that the volume of such content had swelled by 172 percent — and that in one dark web forum they investigated, the vast majority of child predators indicated they had used AI to make CSAM or planned to do so in the future.

“The good news is that pretty much everybody is aware of the problem,” says ActiveFence CEO and co-founder Noam Schwartz, though, as with any online threat vector, safety teams are playing “a constant game of Whac-A-Mole.” What does an improved security network look like? Schwartz describes a “multilayered” filtering system that isn’t looking for just one kind of clue (like how PhotoDNA simply matches a data fingerprint to a known piece of CSAM in a database). Instead, he says, you weigh a number of “signals” together to determine whether an image should be sent through moderation. AI countermeasures can flag content if it registers as nudity, for example, but the content may also bear telltale keywords or hashtags that increase the probability that it’s harmful. ActiveFence’s technology also takes into consideration where an image is coming from, whether a user is spamming a lot of suspicious images in a short period, and a range of other behavioral data. It’s less about spotting each novel piece of CSAM as it’s produced than about recognizing patterns in how it tends to be disseminated. “So there’s a lot of ways to stop it,” Schwartz says. “And finding the right combination [of indicators that an image is CSAM] sometimes takes time. But this is why it requires your own people in the companies actually analyzing the data” to figure out how effective a given filter is, he adds.
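
To make the idea concrete, here is a rough sketch of what weighing several weak signals together might look like. The signal names, weights, and threshold below are invented purely for illustration; ActiveFence has not published its scoring logic, and a production system would be far more elaborate.

```python
# Hypothetical multi-signal moderation filter illustrating the layered approach
# Schwartz describes. All signals, weights, and the threshold are made up for
# this sketch; a real system would tune them against labeled data.
from dataclasses import dataclass

@dataclass
class UploadSignals:
    nudity_score: float       # 0-1 output of an image classifier
    keyword_hits: int         # suspicious keywords or hashtags attached to the post
    source_risk: float        # 0-1 risk score for where the image came from
    uploads_last_hour: int    # burst/spam behavior by the same account

def moderation_score(s: UploadSignals) -> float:
    """Combine weak signals into one score; no single signal decides on its own."""
    score = 0.0
    score += 0.5 * s.nudity_score
    score += 0.2 * min(s.keyword_hits, 5) / 5
    score += 0.2 * s.source_risk
    score += 0.1 * min(s.uploads_last_hour, 50) / 50
    return score

def needs_human_review(s: UploadSignals, threshold: float = 0.6) -> bool:
    """Route suspicious uploads to a human moderation queue rather than auto-deciding."""
    return moderation_score(s) >= threshold
```

In a setup like this, the weights and the threshold are precisely the knobs that in-house analysts have to keep retuning as they measure how well a given filter is actually performing, which is Schwartz’s point about needing your own people on the data.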

This, the need for staff actively engaged in outwitting and outmaneuvering predators as they switch up their methods, could be the most worrisome element of the surge in AI-created CSAM today, given that the tech industry has seen trust and safety teams slashed across the board. Protective tools demand moderators who understand how to utilize them. Without that human intervention, your most-visited websites are likely bound for disaster. When it comes to shielding children from the worst, it seems there’s no substitute for paying attention. 

But, according to Schwartz, it’s a difficult, often thankless job. “Most people in those companies, honestly everybody with that type of work, their heart is in the right place, and they’re working incredibly hard to stop [CSAM],” he says, though if they make “one mistake, they get a lot of heat for everybody. And nobody is singing their praises when they succeed, because you don’t feel it.”

That’s why he’s anticipating a crisis instead of waiting for it to happen — to make sure those teams have every resource at their disposal. “We’re basically figuring out all the risks online before they become mainstream,” Schwartz says. “Before they become a real problem.”          
