Does OpenAI's o1 Sound More Human?


Undetectable AI services specialize in one thing: AI-generated writing that reads as if a human wrote it. If we accept the premise that human writing is the highest-quality form of writing, because of its ability to reach and connect with readers, then undetectable AI content must be a higher form of content than the detectable text dispensed by ChatGPT.

However, one has to wonder: As ChatGPT continues to evolve, with new AI models coming out time and time again, will OpenAI catch on and realize its text outputs need to feel like human writing if it wants to compete with undetectable AI chatbots?

We wanted to see for ourselves whether OpenAI’s o1 sounds more human than previous models. We’ve tested o1-preview and o1-mini in the past, but here, we’ll see if o1’s reasoning capabilities, advanced decision-making, and ability to take on complex tasks extend to its writing style.

Table of Contents

  • What Makes Writing Sound Human?

  • All About o1

  • How to Determine How Human o1 Sounds

  • AI Detector Results for o1

  • Comparing o1 to GPT-4’s AI Detector Results

  • Conclusion

What Makes Writing Sound Human?

The rise of generative AI created the need to measure how human, or how artificial, any given piece of writing is. These measurements are usually expressed in terms of perplexity and burstiness: perplexity captures how predictable a text’s word choices are, while burstiness captures how much its sentence lengths and structures vary.

Human writing feels natural because it’s complex, random, chaotic, and imprecise at times. When writing becomes too simple and features word choice that feels a little too on the nose, without any ambiguity, that’s usually indicative of the text being generated by AI tools.
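Commercial detectors estimate these properties with full language models, but as a rough, homemade illustration, here is a minimal Python sketch that approximates burstiness as the variation in sentence length (true perplexity requires a language model, so it is only mentioned in a comment). The sample sentences are invented for the example.

```python
import re
import statistics

def burstiness_score(text: str) -> float:
    """Rough burstiness proxy: variation in sentence length.

    Human prose tends to mix short and long sentences, so a higher
    standard deviation relative to the mean suggests "burstier" text.
    (Real detectors also estimate perplexity with a language model,
    which is out of scope for this sketch.)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

# Invented samples purely for illustration.
human_sample = ("I love cats. They nap anywhere, honestly anywhere, and then at night "
                "they sprint down the hallway like something is chasing them. Why?")
ai_sample = ("Cats are popular domestic animals. They are known for their independence. "
             "They require regular feeding and care. They communicate through vocalizations.")

print(f"human-ish burstiness: {burstiness_score(human_sample):.2f}")
print(f"AI-ish burstiness:    {burstiness_score(ai_sample):.2f}")
```

The human-sounding sample scores noticeably higher because its sentence lengths swing from very short to very long, which is exactly the kind of variation detectors look for.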

All About o1

o1 was a benchmark innovation for Sam Altman’s OpenAI. In a world where big tech is rapidly releasing large language models, from Meta’s Llama to Google’s Gemini, a capable general-purpose model is no longer enough to stand out in the AI field.

What set o1 apart from any previous model was its level of AI reasoning, with a thought process able to take on more complex tasks than other models can handle. Content creators who simply use LLMs to generate AI text for social media do not appreciate how powerful o1 is.

It’s an AI assistant for PhD-level tasks in science and math more so than in English and the humanities. In medicine, o1 is a game-changer. In creative writing and content creation, though, users know it’s a weak tool when it comes to producing human-like content. But maybe things have changed and there’s been an improvement. We’re here to find out whether there’s been any progress toward natural-sounding content across OpenAI’s timeline of AI models.

How to Determine How Human o1 Sounds

If we measure the perplexity and burstiness of a piece of text created by OpenAI’s new model, we can get some insight into whether it feels more human than, let’s say, GPT-4. The easiest way to take those measurements is to pass the text through an AI detector.

Using StealthGPT’s standard and enhanced AI detection systems, we will know exactly how human OpenAI’s o1 sounds. These systems scan text for the watermarks of AI writing and output a variety of scores we can use to judge its humanness. And humanness, here, is just a synonym for undetectability.
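For readers who want to script this kind of check rather than paste text into the web tool, a minimal sketch follows. The endpoint URL, request fields, and response keys are hypothetical placeholders, not StealthGPT’s actual API; consult the detector’s own documentation for the real interface.

```python
import os
import requests

# NOTE: the endpoint, payload fields, and response keys below are
# hypothetical placeholders for illustration; check your detector's
# real API documentation before relying on any of this.
DETECTOR_URL = "https://example.com/api/detect"  # placeholder URL

def detect_ai(text: str) -> dict:
    """Send text to an AI detector and return its score payload."""
    response = requests.post(
        DETECTOR_URL,
        headers={"Authorization": f"Bearer {os.environ['DETECTOR_API_KEY']}"},
        json={"text": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"ai_score": 91, "human_score": 9}

if __name__ == "__main__":
    with open("essay.txt") as f:  # illustrative file name
        print(detect_ai(f.read()))
```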


AI Detector Results for o1

First, let’s generate some text using the o1 model. I asked it to write an essay on the biology of cats, in the hopes that it would put its PhD-level reasoning abilities to work. This is what our cat essay looks like:
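For anyone who wants to reproduce that step, generating the essay programmatically would look roughly like the sketch below, which uses the official openai Python SDK. The exact prompt wording and the "o1" model identifier are assumptions, since the post doesn’t specify them.

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# The prompt wording and the "o1" model identifier are illustrative;
# reasoning models like o1 don't accept sampling settings such as
# temperature, so none are passed here.
response = client.chat.completions.create(
    model="o1",
    messages=[
        {
            "role": "user",
            "content": "Write a PhD-level essay on the biology of domestic cats.",
        }
    ],
)

essay = response.choices[0].message.content
with open("o1_cat_essay.txt", "w") as f:
    f.write(essay)

print(essay[:500])  # preview the opening of the generated essay
```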

Now, let’s see what StealthGPT and then AIDetect have to say about o1’s PhD-level cat essay. Here are StealthGPT’s results, first using the standard scan, which simply looks for typical AI watermarks:

With a 91% AI score, StealthGPT determined with high confidence that o1’s writing was AI, though there was a small trace of human writing in there somewhere.

Now, using StealthGPT’s Enhanced AI Detection system, we can see how detectors as powerful as Turnitin would score the text.

Enhanced AI Detection determined the text was only 28% AI-written. This indicates that o1’s AI content appears more human to systems like Turnitin.

AIDetect will give us more insight into the text’s humanity as well:

Here, we see o1’s content received a 0% human score, a 7% readability score, and a college reading level. These results speak to how much AI detection systems can differ, to the increased humanity in o1’s text, and to the relative weakness of Turnitin-level detection compared to the standard systems from StealthGPT and AIDetect.

Comparing o1 to GPT-4’s AI Detector Results

We aren’t simply asking whether o1’s writing feels human; we’re comparing it to previous OpenAI models, namely GPT-4. I asked GPT-4 to write a PhD-level essay about cats as well, and here is what it generated:

Once we pass this essay through StealthGPT’s two levels of detection and AIDetect, we can determine whether this weaker reasoning model also produces less natural-sounding language. This is what StealthGPT’s standard scan determined:

With an AI score of 100%, StealthGPT doesn’t consider this text human at all. Now, for the Enhanced Scan:

StealthGPT’s Enhanced Scan also gave GPT-4’s text an AI score of 100%, meaning it would receive a similar score from AI detectors like Turnitin. Lastly, let’s learn more about the text from AIDetect.

Much like StealthGPT’s analysis, AIDetect gave the text a 0% human score, meaning it was all AI in its eyes. Coupled with a graduate reading level, the text was also found to be much harder to read.
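To keep the comparison apples-to-apples, both essays can be run through the same script. The short sketch below continues the hypothetical detect_ai helper from the earlier sketch; the file names are stand-ins for wherever the two essays are saved.

```python
# Continues the earlier sketch: detect_ai() is the hypothetical
# placeholder helper defined there, and the file names are illustrative.
essays = {
    "o1": open("o1_cat_essay.txt").read(),
    "GPT-4": open("gpt4_cat_essay.txt").read(),
}

for model_name, text in essays.items():
    scores = detect_ai(text)  # hypothetical detector call
    print(f"{model_name}: {scores}")
```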

Conclusion

Based on our results, o1’s writing outputs do feature more natural language than those of its OpenAI predecessors. Still, StealthGPT identified the writing as AI with ease. The text may reflect a more sophisticated understanding of how to write like a human, but because o1 still prioritizes utility over style, its output remains robotic and a dead giveaway to its origin.

Text from ChatGPT is not meant to be submitted as-is in real-world situations. If you intend to use text from o1 for academic or professional work, you will need an extra step: passing the text through an AI humanizer, so that o1's problem-solving capabilities are coupled with writing that passes for human and is highly readable.

Written By

Rob Shepyer

