OpenAI has the tech to watermark ChatGPT text—it just won’t release it

According to The Wall Street Journal, there’s internal conflict at OpenAI over whether to release a watermarking tool that would let people test a piece of text to determine whether it was generated by ChatGPT.

To deploy the tool, OpenAI would tweak how ChatGPT generates text so that it leaves a subtle trail detectable by a companion tool. The watermark would be invisible to human readers, and the company’s internal testing has shown that it does not degrade the quality of outputs; the detector would be accurate 99.9 percent of the time. Importantly, the watermark would be a pattern in the text itself, so it would survive copying and pasting and even modest edits.
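
The Journal’s report doesn’t spell out the mechanism, but statistical text watermarks of this kind typically work by biasing the model’s token choices toward a pseudorandomly selected “green list” and then checking how many tokens in a sample land on that list, as in the scheme described by Kirchenbauer et al. Below is a minimal sketch of the detection side only, with the key, the parameters, and the word-level granularity all hypothetical:

```python
import hashlib

# Hypothetical "green list" watermark sketch (after Kirchenbauer et al.).
# A real deployment would bias token logits inside the model at generation
# time; this sketch only shows the detection statistic, computed over
# whitespace-separated words instead of model tokens.

SECRET_KEY = "demo-key"   # assumption: a key shared by generator and detector
GREEN_FRACTION = 0.5      # fraction of candidates marked "green" at each step

def is_green(prev_word: str, word: str) -> bool:
    """Deterministically, pseudorandomly assign `word` to the green list,
    seeded by the secret key and the preceding word."""
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_word}|{word}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GREEN_FRACTION

def green_score(text: str) -> float:
    """Fraction of adjacent word pairs that land on the green list.

    Unwatermarked text should score near GREEN_FRACTION; text generated
    with a green-list bias scores well above it, and a statistical test
    on that gap is the actual detection decision."""
    words = text.split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    return hits / (len(words) - 1)

if __name__ == "__main__":
    # Ordinary prose scores near the 0.5 chance baseline.
    print(green_score("the quick brown fox jumps over the lazy dog"))
```

Because the pattern lives in which words (or tokens) were chosen, it travels with the text through copy-paste and survives scattered edits that leave most of the sequence intact.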

Some OpenAI employees have campaigned for the tool’s release, but others believe that would be the wrong move, citing a few specific problems.

First among those is the fact that even at 99.9 percent accuracy, the watermark detector would still be wrong some of the time, and given the sheer volume of text ChatGPT produces, those rare errors would add up to a large absolute number of misjudged documents.

Among those who have shown the most interest in such tools are teachers and professors, who have seen a rapid rise in ChatGPT-generated school papers and other assignments. But OpenAI’s counterargument runs like this: 99.9 percent accuracy sounds high, but imagine that one among every 1,000 college papers checked was falsely labeled as cheating. That could have serious consequences for innocent students.
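
To make the base-rate concern concrete (the volume figure here is illustrative, not from OpenAI or the Journal):

```python
# Illustrative base-rate arithmetic; the paper count is a made-up example.
false_flag_rate = 0.001      # 99.9% accuracy leaves a 0.1% error rate
papers_checked = 100_000     # hypothetical: papers screened in one term
innocent_flagged = papers_checked * false_flag_rate
print(innocent_flagged)      # 100.0 -- students wrongly accused of cheating
```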

Further, OpenAI says the tool’s release could stigmatize non-native English speakers using ChatGPT for translation or to improve their writing, which the company argues is a legitimate use of the tool.

Finally, in a blog post, OpenAI acknowledged that bad actors could bypass the watermark relatively easily in its current form. Rewriting ChatGPT’s output with another LLM text generator would do it, as would asking ChatGPT to insert special characters throughout the output and then manually removing those characters (see the sketch below).
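
The character-insertion trick works because the detection statistic is computed over adjacent tokens as the model emitted them; once the filler characters are deleted, the detector is scoring token pairs the generator never biased. Continuing the hypothetical sketch above:

```python
# Continuing the hypothetical green-list sketch: suppose the user asked the
# model to emit "@" between every word, so the green-list bias was applied
# to pairs like ("the", "@") and ("@", "quick").
watermarked_with_junk = "the @ quick @ brown @ fox"
scrubbed = watermarked_with_junk.replace("@ ", "")

# green_score(scrubbed) now tests pairs such as ("the", "quick"), which were
# never pushed toward the green list, so the score falls back to the ~0.5
# chance baseline and the watermark evidence vanishes.
print(scrubbed)  # "the quick brown fox"
```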

There’s one problem OpenAI didn’t mention in its blog post but that did come up in the Wall Street Journal article: in a survey of ChatGPT users, as many as 30 percent said they would stop using ChatGPT if its output were watermarked.

For now, OpenAI is sitting on the watermarking feature. It’s also investigating alternatives that are still in development, such as embedding cryptographically signed metadata in outputs.

That solution would be similar to how OpenAI has approached content provenance with its DALL-E 3 image generator: the company embeds C2PA metadata in images to help people identify when and how they were modified with DALL-E.
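
OpenAI hasn’t published details of a signed-metadata scheme for text, and C2PA itself uses certificate-based signatures rather than shared secrets, but the general idea can be sketched roughly as follows (the key, field names, and HMAC choice are all assumptions for illustration):

```python
import hashlib
import hmac
import json

# Hypothetical illustration of signed provenance metadata; this is NOT
# OpenAI's actual scheme. A provider signs a manifest binding the text's
# digest to provenance claims, so any tampering invalidates the signature.

PROVIDER_KEY = b"provider-secret"  # assumption: key held only by the provider

def sign_output(text: str, model: str) -> dict:
    """Build a manifest of provenance claims and attach the provider's signature."""
    manifest = {
        "model": model,
        "sha256": hashlib.sha256(text.encode()).hexdigest(),
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_output(text: str, manifest: dict) -> bool:
    """Check the signature, then check the text still matches its signed digest."""
    claims = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and claims["sha256"] == hashlib.sha256(text.encode()).hexdigest())

if __name__ == "__main__":
    m = sign_output("some generated text", model="example-model")
    print(verify_output("some generated text", m))   # True
    print(verify_output("some edited text", m))      # False
```

The tradeoff cuts the other way from a statistical watermark: a valid signature is essentially impossible to forge, but the metadata disappears entirely as soon as someone copies the bare text without it.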

OpenAI previously released a general-purpose AI text classifier that aimed to detect text generated by any AI tool, not just ChatGPT. It was discontinued because it was highly inaccurate and prone to false positives, making it effectively useless.
