Guest review by Kip Hansen — August 27, 2024 — 1200 words
Last week I wrote an article here titled “Facts Without Logic — ‘Fact Check’ by Innuendo.” One of the complaints in that article was that the fact-checking conducted by three staff members at Logically Facts reads suspiciously as if it had been written by an AI chatbot, a suspicion backed up by Logically Facts’ own claim-to-fame that it is an AI-based effort.
I made the following statement:
“Logically Facts is a Large Language Model (LLM) type AI, plus writers and editors meant to clean up the mess returned by that chat-type AI. Thus, it is incapable of making any value judgment between oft-repeated slander, enforced consensus opinion, the biases that exist in a scientific field, and actual reality. Moreover, LLM-based AI cannot think critically or draw logical conclusions.
“Logically Facts, and the rest of the Logically empire, Logically.ai, suffer from all of the major flaws of the current versions of the various types of AI, including hallucinations, breakdowns, and the AI version of ‘you are what you eat.’”
The article is well written and exposes one of the many major flaws of modern AI Large Language Models (AI LLMs). AI LLMs are used to generate text responses to chatbot-type questions and internet “searches,” and to create requested images.
It has long been known that LLMs can and do “hallucinate.” Wikipedia gives examples here. IBM provides a very good description of this problem, which you should read now, at least the first half-dozen paragraphs, to gain a working understanding of how this can happen:
“Some examples of AI hallucinations include:
- Google’s Bard chatbot falsely claiming that the James Webb Space Telescope had captured the first images of a planet outside our solar system. (NB: I cannot verify this claim – kh)
- Microsoft’s chat AI, Sydney, admitting to falling in love with users and spying on Bing employees.
- Meta pulling its Galactica LLM demo in 2022, after it provided users with inaccurate, sometimes prejudiced, information.”
So, the fact that AI LLMs can and do produce not only incorrect, non-factual information, but also wholly invented information, images, and even “made-up” citations to journal articles that do not exist, should shatter any illusions you hold about relying on chatbot and AI-augmented search engine responses, even for fairly simple queries.
Now we add another layer of reality to the lens through which you should view LLM-based AI responses to the questions put to them. Remember, AI LLMs are currently being used to write thousands of “news articles” (like the suspect Logically Facts “analysis” of climate denial), journal papers, editorials, and scripts for TV and radio news.
AI LLMs: It’s What They Eat
This latest article in the New York Times (repeating the link) does a good job of describing the dangers of LLMs trained on their own output.
What is LLM training?
“When [AI companies] search the web for new data to train the next model – an increasingly challenging task – they will inevitably absorb some of the content that AI itself creates, creating an unintended feedback loop: the output of one AI becomes the input of another.”
The Times gives an excellent example of what happens when an AI LLM is repeatedly trained on its own output, in this case handwritten digits that the model must be able to read and reproduce:
One can see that even in the first pass of training on self-generated data, the model produces erroneous data – wrong digits: the 7 in the upper left becomes a 4, the 3 below it becomes an 8, and so on. Because that erroneous data is then used to train the model further, after 20 iterations of re-training the returned digits are completely undependable. After 30 iterations the digits have been homogenized into marks that represent nothing at all: no discernible digits, all of them the same.
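For readers who would rather see the mechanism than take it on faith, here is a minimal sketch in Python of that kind of self-feeding training loop. It is not the Times’ experiment, which used a real image model on actual handwritten digits; it is a toy analogue of my own, in which each new “model” simply resamples its training set from whatever the previous generation produced. The sample size, the random seed, and the use of bare digit labels are arbitrary choices made here for illustration.

```python
import random
from collections import Counter

random.seed(2)

# Generation 0: a balanced "training set" of digit labels, 0 through 9,
# three copies of each (30 samples in all -- an arbitrary toy size).
digits = [d for d in range(10) for _ in range(3)]

for generation in range(201):
    counts = Counter(digits)
    if generation % 25 == 0:
        top_digit, top_count = counts.most_common(1)[0]
        print(f"generation {generation:3d}: {len(counts)} distinct digits, "
              f"most common = {top_digit} ({top_count} of {len(digits)})")
    # Each new "model" sees only what the previous one produced, so it
    # resamples its entire training set from the previous generation's output.
    digits = random.choices(digits, k=len(digits))
```

Run it and the ten distinct digits typically dwindle, generation after generation, until only one remains: the toy equivalent of the Times’ thirty-iteration blur.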
The Times article, written by Aatish Bhatia, cleverly dubs this “Degenerative AI.”
Consider the implications of this exercise once it becomes impossible for humans to easily distinguish AI-generated output from human-written output. In AI training, only the words themselves (and the pixels, in the case of images) enter the probability calculation that produces the output; the AI keeps answering a single question: “What word is most likely to come next?”
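To make that “most likely next word” idea concrete, here is a deliberately tiny sketch: a word-pair (bigram) counter that generates text purely from observed word frequencies. Real LLMs are vastly more sophisticated, of course; the miniature corpus, the seed, and the eight-word limit below are invented here purely for illustration.

```python
import random
from collections import Counter, defaultdict

random.seed(1)

# A tiny "training corpus" (invented for this illustration only).
corpus = ("the climate is changing the climate is warming "
          "the debate is ongoing the evidence is contested").split()

# Count which word follows which -- a crude stand-in for the probability
# table an LLM consults when it asks "what word is most likely next?"
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

word, output = "the", ["the"]
for _ in range(8):
    candidates = following[word]
    if not candidates:          # no observed continuation: stop generating
        break
    words, weights = zip(*candidates.items())
    word = random.choices(words, weights=weights)[0]
    output.append(word)

print(" ".join(output))
```

Note that the generator can only recombine what it has already seen, and nothing in the loop ever checks whether the result is true.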
You should really look at the “AI-generated data distribution” example used in the Times article. As the AI is trained on its own previous output (it “eats itself” – kh), the probability distribution becomes narrower and the data less diverse.
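That narrowing can be demonstrated with nothing fancier than a bell curve. The sketch below is my own simplification, not anything from the Times: it fits a normal distribution to a batch of data, generates the next batch from that fit, and repeats. The sample size and seed are again arbitrary choices.

```python
import random
import statistics

random.seed(0)
N = 50  # samples per "generation" -- an arbitrary toy size

# Generation 0: "real" data drawn from a wide distribution.
data = [random.gauss(0.0, 1.0) for _ in range(N)]

for generation in range(501):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    if generation % 100 == 0:
        print(f"generation {generation:3d}: mean = {mu:+.3f}, std dev = {sigma:.4f}")
    # The next "model" is fitted only to the previous model's output,
    # then asked to generate the following generation's training data.
    data = [random.gauss(mu, sigma) for _ in range(N)]
```

Because each fit is made from a finite sample, a little of the original spread is lost, on average, at every generation and is never recovered; the printed standard deviation drifts toward zero, a narrower and narrower projection of reality.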
I wrote previously that “The problem is immediately apparent: in any controversy, the most ‘official’ and widespread view wins and is declared ‘correct,’ while the contrary view is declared ‘misinformation’ or ‘disinformation.’ Individuals representing the minority view are labeled ‘deniers’ (of whatever) and all slander and libel against them is considered ‘true’ by default.”
With today’s major media outlets all biased in the same direction – toward the left, liberalism, progressivism, and a favored party or point of view (slightly different in each country) – the AI LLMs are trained on, and thus biased toward, that point of view: the point of view of the mainstream media that have been judged “reliable sources of information.” In the same way, sources whose opinions, points of view, or facts run contrary to those pre-existing biases are judged “unreliable, incorrect, or disinforming sources of information.”
AI LLMs are thus trained on stories that were mass-produced by AI LLMs, lightly edited by human authors so they read less like machine output, and then published in the major media outlets. After repeatedly “eating” its own output, an AI LLM gives narrower, less diverse, and less factual answers to the questions asked of it.
This leads to:
As an LLM is trained on its own data, “the model becomes poisoned with its own projection of reality.”
Consider the situation we find in the real world of climate science. The IPCC reports are compiled by humans from the output of climate scientists and others published in trusted peer-reviewed scientific journals. It is well known that non-conforming papers are almost always excluded from those journals precisely because they do not conform. Some may sneak in, and some may find pay-to-play journals willing to publish them, but they receive little notice.
Thus, only consensus climate science enters the “reliable literature” on the topic. John P.A. Ioannidis has pointed out that, “for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.”
AI LLMs are thus trained on “reliable sources” that have themselves been skewed by publication bias, funding bias, the biases inherent in each field, fear of non-conformity, and groupthink. Worse, as AI LLMs then train on their own output, or on the output of other AI LLMs, the results become less accurate, less diverse, and less reliable – potentially poisoned by their own false projections of reality.
In my opinion, the effects of this coming AI LLM collapse can already be seen in many sources of information: the lines between fact, opinion, and fiction are no longer clear.
####
Author’s Comment:
We live in interesting times.
Be careful about the information you accept – read and think critically, and educate yourself from first principles and basic science.
Thank you for reading.
####