Editor’s note (10th February 2023): Wow, things change so quickly in the world of AI. Since our article was published less than two weeks ago on 1st February, the latest buzz right now seems to be around how Bing is showcasing amazing new features with its ChatGPT search engine integration and how Google is combatting this with Bard based on its own language model called LaMDA. That’s a topic for another day and definitely one we will explore.
Today, we’ve decided to revisit the topic of AI content detection and give our article a refresh based on new experiments and findings including some interesting tests using Shakespeare’s Macbeth and A Christmas Carol by Charles Dickens. Interested? Read on.
We’ve recently covered AI content creation and how to use AI for SEO with ChatGPT and GPT-3. We learned a great deal along the way, but one question remained: would AI content detection tools eventually render ChatGPT useless for SEO? To begin with, there was very little information about AI content detection and whether it was even possible, but that all changed in January 2023.
What AI content detection tools exist today?
Table of Contents
In this post we will explore the following topics:
- 1 What AI content detection tools exist today?
- 2 Revisiting our earlier ChatGPT experiments
- 3 The results: Do these tools for AI detection work?
- 4 How to detect ChatGPT generated text?
- 5 What is AI plagiarism?
- 6 What is fake content and is fake content detection possible?
- 7 The ethics of using AI to create content
- 8 Can Google detect AI content?
- 9 How reliable are tools like GPTZero and AI Text Classifier? (Feb 11th update)
- 10 Can ChatGPT be trained to write like a human and pass GPTZero consistently?
- 11 Are there any other ChatGPT content detection tools? (Feb 11th update)
- 12 Conclusion: Is it really possible to detect AI content? (Feb 11th update)
One of the first articles we came across explained the concept of cryptographic watermarks to make AI-generated content easily identifiable.
At the time, this seemed more of an idea than a reality. Then, we started hearing about a university student who invented a tool called GPTZero to help spot and combat “AI plagiarism”, mainly as an aid for academic institutions. We thought GPTZero was the only name in the game but then read the following article: https://www.searchlogistics.com/case-studies/ai-content-detection-case-study.
The above case study from Search Logistics mentioned another tool called OpenAI detector. Then only today we came across yet another tool released by OpenAI themselves, called AI Text Classifier. It’s certainly a hot topic right now with people trying to find ways to evaluate, use or combat tools like ChatGPT.
So, it seemed only right that we take a closer look at these tools.
Revisiting our earlier ChatGPT experiments
In our guide to using ChatGPT for SEO, we undertook two different experiments.
We tested the SEO capabilities of ChatGPT by using the tool to generate two entire articles with no human edits.
The first article was a detailed experiment involving lots of input from us over the course of a few hours; the other was a quick article created in a couple of minutes.
Our goal for these experiments was to try and answer the following:
- Would a human reader be able to detect content created by ChatGPT?
- Would content generated by ChatGPT rank well in search engines?
- Could AI replace human content creation and SEO altogether?
- Would ChatGPT be a useful assistant (to save time and/or enhance the quality of content), or would it be a hindrance?
To see the full experiment, documented step-by-step, visit our article here.
Without the use of AI content detection tools, we needed to draw our own conclusions about the effectiveness of ChatGPT.
We don’t want to give everything away, but both articles passed points 1 to 4 above. Having asked a mixture of technical and non-technical readers to review the content, nobody picked up on the fact it was AI content, or what some would consider “fake content”. Both articles ranked well and generated a good amount of fresh traffic. In fact, they still rank today, although not quite as well due to the vast amounts of other content now available online.
Our conclusion was that tools such as ChatGPT had the potential to revolutionise the way people create content, especially those in the SEO or marketing space. However, our fear was that tools like ChatGPT would eventually be regulated or even prohibited. What’s actually happening right now is neither. Instead, people are creating tools like GPTZero to try and detect AI content.
Looking back over our initial articles seven weeks later, we wanted to put them to the test.
The results: Do these tools for AI detection work?
Strangely, we got very different results compared to those shown by Search Logistics, which showed severe failures for their ChatGPT-created content compared to human-written content. From our own analysis, these AI content detection tools didn’t appear to work very well, at least not with regard to the two articles we created.
Two of the tools classed our AI-generated articles (even the 3-minute one) as “real” or “very unlikely AI-generated”. GPTZero seemed the best at highlighting passages that were more likely to be written by AI. Even then, more than 80% of the ChatGPT content was classed as written by a human.
On that note, GPTZero also flags text that is human-generated. The most common examples we’ve seen are short headings, which tend to get flagged as “AI” despite being written by a human.
| | GPTZero (5,000 characters only) | GPT-2 Output Detector (full article) | AI Text Classifier (full article) |
|---|---|---|---|
| Article created in 3 minutes | “Your text may include parts written by AI” – 17% overall | | “The classifier considers the text to be very unlikely AI-generated.” |
| Article created in 3 hours | “Your text may include parts written by AI” – 14% overall | | “The classifier considers the text to be very unlikely AI-generated.” |
How to detect ChatGPT generated text?
Honestly, at this stage, our opinion is that it’s not easy to detect text created by ChatGPT. The best we get is small clues that parts of the text “may be” written by an AI. This may very well change in the near future, but right now these tools can’t accurately or consistently identify AI content.
What’s interesting is that the quicker 3-minute article would have been flagged by our own SEO quality checks, but for different reasons.
When writing content, we often try to craft articles around popular keywords. Part of this involves something called keyword density. The very first thing we do is check by eye for words or phrases that appear to be over-used, often without realising, then verify this by doing a good old CTRL+F to highlight where phrases are repeated.
Checking for keyword repetition is a basic test (and not something everybody notices by eye), but it would have flagged an issue as part of our SEO quality controls.
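For anyone who would rather not rely on CTRL+F, here is a minimal sketch of how that repetition check could be automated. It is an illustration only, not the tooling behind our own checks. The function names are hypothetical, and the density formula (words belonging to the phrase as a percentage of all words) is just one common convention that we are assuming here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'’]+", text.lower())


def repeated_phrases(text, n=2, min_count=2):
    """Return (phrase, count) pairs for n-word phrases used at least min_count times."""
    words = tokenize(text)
    phrases = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(phrases)
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


def keyword_density(text, phrase):
    """Assumed definition: words covered by the phrase as a % of all words."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if not words or n == 0:
        return 0.0
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words) * 100


sample = "AI content is great. AI content is everywhere."
print(repeated_phrases(sample))
print(keyword_density(sample, "AI content"))
```

A phrase that tops the repeated list or shows an unusually high density is a prompt to reread the passage by eye, not an automatic fail.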
What is AI plagiarism?
The term “AI plagiarism” used by GPTZero seems an odd description, as the tool doesn’t really detect plagiarism at all.
Tools like Copyscape will help to detect plagiarism (i.e. copied or duplicated text), but GPTZero works by measuring the complexity and randomness of text (referred to as ‘perplexity’). These are quite different things.
The Search Logistics article goes into a lot more depth regarding both AI detection and plagiarism. However, with regard to the two articles we created, when we tested these on 11th December, no duplication was found at all.
Checking today, the situation is a little different. The worst offender is shown below. This raises an interesting question – have they copied us, or have they also used ChatGPT to come up with very similar text? The latter would be more worrying in some respects as it would mean that tools like ChatGPT will inevitably create vast amounts of duplicate content, which is not good for anybody. If they have copied us, we would hope search engines are clever enough to know where the content originated.
You may be wondering how a search engine can know. This is something we come across all the time in the world of SEO, as we all know duplicate content is bad.
The simplest way is to check the search engine cache date:
site:weetechsolution.com/blog/chat-gpt-for-content-and-seo – cache date of 26th December
site:opace.co.uk/blog/blog/how-openai-gpt-3-enhances-ai-chat-text-generation-for-seo – cache date of 11th December
As we can see above, our article was indexed 15 days prior to the other one and the URLs are surprisingly similar. It’s not definitive, but given the amount of duplication here we can be pretty sure that they copied us, or they used ChatGPT to change our ChatGPT-generated text.
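The 15-day gap is simple date arithmetic. As a minimal sketch, assuming both cache dates fall in December 2022 (the year is not stated alongside the cache dates, so treat it as an assumption):

```python
from datetime import date

ASSUMED_YEAR = 2022  # assumption: the cache dates above do not include a year

our_cache_date = date(ASSUMED_YEAR, 12, 11)    # opace.co.uk article
their_cache_date = date(ASSUMED_YEAR, 12, 26)  # weetechsolution.com article

# A positive gap means our page was cached first.
gap_days = (their_cache_date - our_cache_date).days
print(gap_days)
```

An earlier cache date is only a hint about which page came first, which is why we describe the result as not definitive.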
We believe that ChatGPT itself does not result in AI plagiarism, certainly not in our examples. However, it’s certainly possible to plagiarise text created by ChatGPT. This raises an interesting question about the copyright of AI-generated content, something we’ll explore in a future update.
What is fake content and is fake content detection possible?
This is more of a philosophical question, but the word fake means “not real”. This is never black and white though.
Another definition for “fake” is an “imitation” or “counterfeit”. Content that is 100% copied, duplicated, or content spun from an original source certainly fits this bill.
This kind of blatant plagiarism can be easily detected using Copyscape and quite often even by the human eye. Content spun by tools often stands out immediately as being fake due to its poor use of words and phrases. Even if the content is spun by a human at the word level, it’s essentially the exact same article and provides nothing unique.
True AI content (which we believe ChatGPT to be), is real and it is not copied, duplicated or spun. Therefore we wouldn’t consider this content to be fake.
The more important questions are whether AI content can be viewed as valuable and ethical.
The ethics of using AI to create content
It all comes back to value.
This was what we were trying to prove with our two experiments, theorising that the longer, more input-driven article would be more valuable and therefore perform better. This theory proved to be true. However, the big surprise was how well the quick article performed.
The reason we believe both performed well is that they each provide useful information and insight. We timed the content well and used ChatGPT to cover topics at a time when very little had been said. The other factor is that both articles were created using popular keywords and the content was unique at the time of publishing.
In fact, the article that took three hours to create could have been written by hand in a similar amount of time. It took so long due to experimenting with different prompts and revising our approach (i.e. the human part of AI-generated content).
But would an article written by hand be better than the one created by ChatGPT? Possibly, but they both would have had value.
Is it ethical to use tools like ChatGPT to create content? Absolutely – if it provides value and is done for the right reason. Conversely, using AI to essentially spin content for SEO purposes (which we’re sure it can) wouldn’t be ethical in our opinion.
Can Google detect AI content?
This is an interesting topic, as it seems to us that the publicly available tools can’t detect AI content right now with any level of accuracy.
But can Google detect AI content? With its vast resources, budget and expertise, we can be pretty sure that search engines like Google can detect AI content. That said, AI content has been around since long before ChatGPT and we can be pretty sure a lot of this is indexed and ranking in search engines.
Aggregated content for example (i.e. content fed from multiple sources and then displayed on a single page) can rank well if it provides something unique and valuable to the end user.
Whilst we can’t be 100% sure about what level of AI detection is available, our opinion and experience show that search engines will (and should) still rank content that provides a benefit to its users.
How reliable are tools like GPTZero and AI Text Classifier? (Feb 11th update)
After completing this article, we kept thinking about those GPTZero results showing that 14-17% of the text was most likely created by AI.
Testing an article from last year with GPTZero
To test how accurate this is, we picked an older article that we produced back in April 2022 discussing Elon Musk’s takeover of Twitter and put that through GPTZero. Although we were limited to 5,000 of the article’s 9,616 characters, guess what: 729 characters were flagged as “more likely to be written by AI”.
That’s ~15%, which is very similar to the two AI articles we produced.
Testing an article from 2017 with GPTZero
We went back further in time and picked one of our shorter articles, written before GPT-1 was released, let alone ChatGPT. We picked our blogger outreach post from all the way back in 2017. With 3,822 characters, GPTZero classed 865 characters (entire blocks of text) as most likely being AI created (~23% overall).
Interestingly, the GPT-2 Output Detector and OpenAI Text Classifier classed both our blogger outreach post and Elon Musk post as 99% real and “very unlikely AI generated”.
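For reference, the GPTZero percentages quoted in this section are just flagged characters divided by characters submitted. A minimal sketch of that arithmetic, using the figures above (the function name is ours, purely for illustration):

```python
def flagged_share(flagged_chars, submitted_chars):
    """Percentage of the submitted characters that were flagged as likely AI."""
    if submitted_chars <= 0:
        raise ValueError("submitted_chars must be positive")
    return flagged_chars / submitted_chars * 100


# April 2022 Twitter article: 729 of the 5,000 characters submitted
print(round(flagged_share(729, 5000)))  # 15

# 2017 blogger outreach post: 865 of 3,822 characters
print(round(flagged_share(865, 3822)))  # 23
```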
Was Shakespeare’s Macbeth written by AI?
We even read in the following article that Shakespeare’s Macbeth, written back in 1606, was flagged by the OpenAI Text Classifier as being created by AI.
Strangely, when we tried to test this ourselves, nothing happened: OpenAI Text Classifier just seemed to get stuck saying “The result will display here”.
On the positive side, GPTZero classed all of the text as being human-written.
What about A Christmas Carol by Charles Dickens?
A Christmas Carol, written by Charles Dickens, consists of five staves (Stave 1 to Stave 5). What’s interesting in this case is that each tool gives completely different and contradictory information depending on which stave and which text you select.
A random section of text from Stave 5 for example is reported as being potentially AI generated by GPTZero but not OpenAI Text Classifier or the GPT-2 Detector.
Yet, most sections of text passed on GPTZero but were flagged as being 50% or more likely to be fake or created by an AI by the other two tools.
Can ChatGPT be trained to write like a human and pass GPTZero consistently?
The first thing to say is that this feels somewhat unethical but it seemed like an interesting experiment nevertheless.
Understanding that GPTZero uses the notion of “perplexity” and “burstiness” to assess whether the text has been written by a human, we wondered if we could get ChatGPT to trick the various AI content detection tools into believing that its content was written by a human.
The most important thing here was to understand what “perplexity” and “burstiness” actually mean and why they are considered important factors in combatting AI content.
You’ll find more detailed explanations online that talk about linear regression and logistic regression, but in a nutshell:
- Perplexity is a measure of the complexity and randomness of text, in effect how hard the text is for a language model to predict. Higher complexity and randomness mean that it’s more likely to be written by a human. This makes sense as we’re not robots. Our writing style and thought processes can change from one day to the next, even one sentence to the next.
- Burstiness is a formula to measure the differences between sentences. Again, higher burstiness is a sign that the text is more likely to be written by a human, which makes sense as we tend to vary the length of sentences when writing. Some sentences will be shorter statements and others may be longer more complex definitions. Even the use of commas can vary greatly between sentences as humans tend to write with a level of grammatical inconsistency.
When putting these two things together, it’s clear that the theme is predictability. Humans are unpredictable and write in an unpredictable way. Machines follow a set of rules, therefore text created by an AI will often be more predictable.
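To make the two ideas more concrete, here is a toy sketch. It is not GPTZero’s actual method, which isn’t detailed here. We are assuming two simple stand-ins: burstiness as the variation in sentence length (standard deviation divided by the mean), and perplexity from a basic word-frequency (unigram) model built from a reference text. Real detectors typically score text with a neural language model instead, and all function names below are hypothetical.

```python
import math
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'’]+", text.lower())


def burstiness(text):
    """Assumed proxy: variation in sentence length (std dev / mean)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(tokenize(s)) for s in sentences]
    lengths = [n for n in lengths if n > 0]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def unigram_perplexity(text, reference):
    """Toy perplexity: exp of the average negative log-probability per word,
    using add-one smoothed word frequencies taken from the reference text."""
    counts = Counter(tokenize(reference))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for words not in the reference
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


reference = "the cat sat on the mat"
print(unigram_perplexity("the cat", reference))       # familiar words: lower
print(unigram_perplexity("purple zebra", reference))  # unseen words: higher
print(burstiness("Yes. This one is a much longer sentence indeed."))
```

In both cases a higher number means less predictable text, which is the property these tools associate with human writing.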
So putting this to the test, we asked ChatGPT to “explain what is meant by grammar” and tested the text using both GPTZero and the GPT-2 Detector. This is what we got.
Here is our first test with no additional prompts
It’s unanimous – the text is almost entirely AI-generated (or fake according to the GPT-2 Detector)!
This was the result from GPTZero:
This was the result from the GPT-2 Output Detector:
Here is our second test with added perplexity and burstiness
First, we educated ChatGPT on perplexity and burstiness and asked the AI to confirm that it understood:
Then, we asked ChatGPT to revise the previous text with very high levels of perplexity and burstiness:
It’s unanimous – the text is almost entirely human (or real according to the GPT-2 Detector)!
This was the result from GPTZero:
The strange thing is that these are some of the most consistent pass/fail results we got throughout all of our tests. It’s highly likely that the OpenAI Text Classifier would have returned similar results but the tool was down at the time of testing. However, we can fairly safely conclude that ChatGPT can be trained to pass AI content detection tools.
Are there any other ChatGPT content detection tools? (Feb 11th update)
We’ve covered three of the best-known tools above, but out of curiosity, we wondered what other ChatGPT content detection tools were available.
After a few quick searches, we came across the tools below. Let’s see what each said when given the original “explain what is meant by grammar” answer from ChatGPT.
The following were all passes and detected the ChatGPT content:
- https://corrector.app/ai-content-detector – PASS (99% FAKE)
- https://detector.dng.ai – PASS (99% MOST LIKELY WRITTEN BY AN AI MODEL)
- https://contentatscale.ai/ai-content-detector – PASS (99% FAKE)
- https://openai-openai-detector.hf.space – PASS (100% FAKE)
- https://huggingface.co/spaces/Hello-SimpleAI/chatgpt-detector-single – PASS (99% FAKE)
- https://copyleaks.com/features/ai-content-detector – PASS (99% PROBABILITY FOR AI)
The following tools failed the test:
- https://writer.com/ai-content-detector – FAIL (93% HUMAN-GENERATED CONTENT)
- https://x.writefull.com/gpt-detector – FAIL (19% LIKELY THIS COMES FROM GPT-3 OR CHATGPT)
The following weren’t tested due to a subscription or credits being required:
Conclusion: Is it really possible to detect AI content? (Feb 11th update)
We don’t believe the general public can currently detect AI content with any real reliability.
Even OpenAI’s own AI Text Classifier tool seems to do a strangely poor job of recognising its own content and is one of the least useful tools available.
It’s highly likely that search engines like Google can detect fake content of any kind. We know Google demotes and penalises content that is automated, spun or duplicated – all of which we certainly believe should be classed as “fake content”. There are also plenty of tools available for assessing fake content, including Copyscape to check for duplication.
AI content on the other hand is not fake and vastly more difficult to detect. The more AI content is led by a human with unique ideas, insight and thoughtful prompts to craft genuinely valuable content, the harder it will be to detect.
It becomes even more difficult to detect if AI content is then edited with unique insight or modern facts/data. It’s also possible that tools like ChatGPT could be used as an assistant to come up with ideas that are then used to generate content which is human-created, or perhaps used to improve/enhance existing content.
Our opinion hasn’t changed as a result of our latest tests and analysis. If anything, the tests show how unreliable these AI content detection tools really are. Our own content from back in 2017 was “more likely to be written by AI” than two articles that were 100% written by AI. OpenAI Text Classifier flagged Macbeth as “likely AI generated” and all three AI detection tools believed parts of A Christmas Carol were generated by AI.
Given all of the above information, it’s hard to see a time when AI content detection tools will be able to clearly spot anything other than the most basic machine-generated text.
The one thing we do know is that ChatGPT can be trained to trick these tools using some fairly simple prompts.
As time goes by, we are becoming more in favour of AI content creation and finding more valuable ways to utilise this exciting technology. This really does feel like the future. What do you think? Share your thoughts and experiences in our comments below.