Are AI Detectors Accurate? It Depends
Are AI detectors accurate? It depends on the AI detector in question. Different AI detection tools have different accuracy rates. In this article, you'll find summaries of some studies on the accuracy of AI detectors, along with our take on those results. This will help you decide whether you want to keep trusting the predictions of AI detectors.
Table of Contents
What Are AI Detector Tools?
Why Are AI Detectors Important?
Why Does the Accuracy of AI Detectors Matter?
Are AI Detectors Accurate?
Are These Studies Still Relevant Today?
How to Interpret AI Detection Results
Conclusion
Frequently Asked Questions
What Are AI Detector Tools?
AI detectors are online tools powered by artificial intelligence that specialize in distinguishing AI-generated text from human-written text. They use methods such as natural language processing and machine learning to predict the source of content. An AI detector isn't a plagiarism checker; it doesn't tell you whether a piece of content has been plagiarized. Its task is to discern whether or not the writer used an AI tool to generate the text.
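Under the hood, most of these tools boil down to a text classifier. The sketch below is a minimal, purely illustrative version of that idea, assuming a handful of made-up, hand-labeled examples and scikit-learn; real detectors train far larger neural models on millions of samples, and none of the sample texts here come from any actual detector's data.

```python
# A minimal, illustrative sketch of the core idea behind an AI detector:
# a binary classifier trained on labeled human-written and AI-generated
# examples. The sample texts and labels below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "honestly i just scribbled this on the train, sorry for typos",           # human
    "we got rained out so the picnic turned into a board game night",         # human
    "In conclusion, the aforementioned factors demonstrate a clear trend.",   # AI-like
    "Furthermore, it is important to note that several key aspects remain.",  # AI-like
]
labels = [0, 0, 1, 1]  # 0 = human-written, 1 = AI-generated

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(texts, labels)

# predict_proba returns [P(human), P(AI)] for each input document.
probability_ai = detector.predict_proba(["The essay you want to check."])[0][1]
print(f"Estimated chance the text is AI-generated: {probability_ai:.0%}")
```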
Why Are AI Detectors Important?
Content writing tools like ChatGPT now write almost every type of content, from essays to blogs. For instance, students work with large language models (LLMs) to complete assignments faster. Content marketers are also capitalizing on the speed of generative AI apps for large-scale content generation. Considering the distinctive writing style of LLMs, one would think that humans would have become excellent at identifying AI-generated content.
Unfortunately, a Nexcess study reveals that humans are just 57.3% accurate at identifying AI-generated copy. However, AI detectors are much more accurate at identifying AI-generated content. That's why they've quickly gained the trust of consumers.
Why Does the Accuracy of AI Detectors Matter?
Users of AI detectors make vital decisions based on what these AI models say. A teacher, for instance, may fail a student if an AI detector tells them that the student's essay is likely AI-generated. Professional human writers could also face backlash from their clients if an AI detector claims that their human-written text is AI-generated because of their writing style.
Since we place so much trust in AI detectors, it's important for them to be as accurate as possible. This will help us avoid falsely accusing one another of generating content with the use of AI.
Are AI Detectors Accurate?
First, we need to define what we mean when we say that an AI detector is accurate. An AI content detector that's 100% accurate produces results that are precise, correct, and free from errors. In reality, every AI detection tool makes mistakes from time to time. This doesn't mean that AI detectors aren't accurate at all.
Rather, it means that AI detectors are accurate to varying extents. For instance, a detector that produces incorrect results 20% of the time is 80% accurate. Saying that such a detector is completely inaccurate is twisting the truth. Even so, independent studies say that AI detector tools aren't accurate. Let's take a look at three of these studies.
The Washington Post’s Investigation into Turnitin
In April 2023, the Washington Post shared its findings about Turnitin's accuracy. To get these findings, it gained early access to Turnitin's AI detection feature. It asked five high school students to test Turnitin using 16 texts. These texts included AI-generated essays, human-written essays, and some texts that were partly human-written and partly AI-generated.
Turnitin identified 6 of the essays correctly, was less accurate on 7 of them, and "failed" to correctly identify the remaining 3. The Washington Post didn't share the essays' exact AI detection scores. But since it counted an 8% AI score on a human-written essay as a fail, we can say it set a very high bar for Turnitin.
A Study on the Bias of AI Detectors
The report on this study was submitted in April 2023. Five researchers from Stanford University's Department of Computer Science and other departments conducted the study. They tested seven GPT detectors, namely:
Originality.ai
Quil.org
Sapling
OpenAI
Crossplag
GPT Zero
ZeroGPT
They found that at least 89 out of the 91 TOEFL essays written by non-native English students were identified as AI-generated by at least one AI detector. Meanwhile, the AI detectors had a “near-perfect accuracy” when classifying the essays that native English speakers wrote. Originality.ai had the highest misidentification rate (76%). Meanwhile, ZeroGPT had the lowest (48%).
A Study on the Accuracy of Five AI Detectors
In September 2023, another group of researchers shared their findings on the accuracy of five AI detection tools. They tested these tools with text generated by GPT-3.5 and GPT-4. Both models wrote 100 words on the application of cooling towers in engineering.
The AI detectors that the researchers used for the study were:
OpenAI
Crossplag
Writer
Copyleaks
GPT Zero
Here's a summary of how each tool performed:
OpenAI: OpenAI reported that 100% of the content written by GPT-3.5 was likely AI-generated. Meanwhile, it claimed that 80% of the text written by GPT-4 was likely AI-generated.
Crossplag: Crossplag thought that 100% of the GPT-3.5 text was likely AI-generated but only 20% of the GPT-4 text was likely AI-generated.
Writer: Writer assumed that 70% of the GPT-3.5 content was likely AI-generated. It flagged only 40% of the GPT-4 content as AI-generated.
GPT Zero: GPT Zero said that more than 90% of the text written by GPT-3.5 was likely AI-generated. It believed that 20% of the GPT-4 text was likely AI-generated.
Copyleaks: Copyleaks believed that more than 90% of the GPT-3.5 text was AI-generated while 20% of the GPT-4 text was AI writing.
Are These Studies Still Relevant Today?
The AI technology used for content detection has undergone many advancements since 2023. These 2023 studies may not reflect the current accuracy of AI detectors. But, we can still use the data obtained from these studies to evaluate how much AI detectors have improved since then.
How to Interpret AI Detection Results
Most AI detectors, such as Content at Scale and Turnitin, advise that their results should be interpreted as probabilities. If they tell you that your essay has a 92% human score, for example, it means that they believe that there's a 92% chance that the essay is human-written.
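To make that concrete, here's a tiny, hypothetical sketch of reading a detector's score as a likelihood rather than a verdict. The 92% figure simply mirrors the example above; nothing here reflects how any particular detector actually computes its numbers.

```python
# Hypothetical illustration: treat a detector's "human score" as a probability,
# not a verdict. The score below mirrors the 92% example in the text.
human_score = 92  # reported by some detector, on a 0-100 scale

probability_human = human_score / 100
probability_ai = 1 - probability_human

print(f"Chance the essay is human-written: {probability_human:.0%}")  # 92%
print(f"Chance the essay is AI-generated:  {probability_ai:.0%}")     # 8%
# A 92% human score still leaves an 8% chance that the essay is AI-generated,
# so the result is a likelihood, not proof of either outcome.
```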
Conclusion
There's no uniform answer to the question, "Are AI detectors accurate?" The accuracy of an AI detector depends a lot on its training and the improvements in its algorithms. Consequently, the most accurate AI detectors are those that have been trained on vast datasets and are updated regularly. However, no AI detector's accuracy can help it detect AI-generated text that StealthGPT has humanized.
FAQs
Do AI detectors work?
AI detectors are trained on data that teaches them the characteristics of human writing and AI-generated text. Developers organize this data into labeled datasets. Based on their training, these detectors can predict the likelihood that text is AI-generated.
How do AI detectors work?
AI detectors like StealthGPT's AI Checker use machine learning and natural language processing algorithms. They also measure the perplexity and burstiness of text. These metrics help the tools discern whether text is AI writing or not.
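For a rough sense of what those two signals look like in practice, here's a hedged sketch that uses GPT-2 (via the Hugging Face transformers library) as a stand-in scoring model: perplexity measures how predictable the wording is to a language model, and burstiness is approximated here as the spread of sentence lengths. This is only an illustration of the concepts, not how StealthGPT's AI Checker or any other detector actually scores text.

```python
# Illustrative only: rough perplexity and burstiness signals for a passage,
# using GPT-2 as a stand-in scoring model.
import math
import statistics

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of the text under GPT-2; lower values mean more predictable wording."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Spread of sentence lengths; human writing tends to vary more than AI text."""
    sentences = [s for s in text.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

sample = "The cooling tower rejects waste heat. It is widely used in power plants."
print(f"Perplexity: {perplexity(sample):.1f}, burstiness: {burstiness(sample):.2f}")
```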
Does paraphrasing remove AI detection?
Paraphrasing text with regular paraphrasing tools like Quillbot can hardly fool AI detectors. If you really want to bypass AI detection, you'll need much stronger software that can make AI-generated text undetectable. The only app that can do this is StealthGPT. StealthGPT has bypassed Turnitin's AI detector, Originality.ai, GPT Zero, Winston AI, Crossplag, and many others.
What are false positives in AI content detection?
False positives are situations where an AI detector claims that human-written text is most likely AI-generated. How do you know when an AI detector has given a false positive? If the tool assigns an AI score of more than 50 to human-written text, it's saying the text is more likely AI-generated than human-written. That's a false positive.
What are false negatives in AI content detection?
A false negative occurs when an AI detector claims that AI-generated text is human-written.
Originality.ai, for example, has classified text humanized with StealthGPT as human-generated content.
Does a high AI score indicate plagiarism?
Generating content with AI and plagiarizing content aren't the same thing. When students work with content writing tools, they aren't "plagiarizing" content. Consequently, you can't tell if a work was plagiarized by scanning it with an AI detector. You need to assess the text using a plagiarism detection tool to know if any part is plagiarized. Also, while plagiarism compromises academic integrity, AI content generation doesn't.
Can AI detectors detect ChatGPT writing?
Most AI detectors can detect unedited ChatGPT writing. If you rewrite it using an undetectable AI such as StealthGPT, they can't detect that it's AI-generated.
Can teachers detect AI writing?
Teachers can detect AI writing, and they usually do this by using AI detectors. As AI detectors aren't perfectly accurate, teachers have started finding other ways of detecting AI writing and sharing them on social media.