ChatGPT: Confabulation Not Hallucination

Introduction: ChatGPT, an advanced language model powered by OpenAI’s GPT-3.5 architecture, has been making waves in the world of artificial intelligence. As users interact with ChatGPT, they might wonder about the accuracy of the answers it generates. There are many times when it will generate responses that might seem reasonable but that are in reality completely fabricated. While some may describe this as “hallucinating” responses, a more apt term to describe ChatGPT’s behavior is “confabulation.”

Understanding Confabulation: Confabulation refers to the act of generating responses or information that may not be entirely accurate but are intended to fill in gaps or provide plausible explanations. In the case of ChatGPT, the model doesn’t possess true consciousness or awareness. Instead, it relies on pattern recognition and statistical analysis of vast amounts of text data to generate responses. Confabulation is an inherent feature of the model as it attempts to make sense of incomplete or ambiguous queries.
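
To make the idea of pattern-based generation concrete, here is a minimal sketch of a toy next-word model. It is not how ChatGPT is implemented; the tiny corpus, the function names (build_bigrams, complete), and the "pick the most frequent follower" rule are all assumptions chosen for illustration. The point is only that a system built on word statistics will fluently fill a gap it has no knowledge about.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration only).
CORPUS = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "the capital of spain is madrid",
    "the largest city in france is paris",
]

def build_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            table[current][following] += 1
    return table

def complete(prompt, table, max_words=1):
    """Extend the prompt by appending the most frequent follower of its last word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = table.get(words[-1])
        if not followers:
            break  # no statistics for this word, so stop
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

table = build_bigrams(CORPUS)

# "germany" never appears in the corpus, yet the model still answers,
# because it only looks at which word most often follows "is".
answer = complete("the capital of germany is", table)
print(answer)
```

The toy model has never seen Germany, but it produces a confident, well-formed, and wrong completion anyway, which is the gap-filling behavior described above in miniature.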

Distinguishing Confabulation from Hallucination: Hallucination typically refers to perceiving something that does not exist in reality. While ChatGPT might produce responses that appear factual, they are not based on genuine sensory experiences or external reality. The responses are generated by inferring patterns from the data it was trained on, rather than having direct access to factual knowledge. The term “hallucination” therefore implies a capacity for perception or consciousness that ChatGPT does not possess.

The Benefits of Confabulation for ChatGPT: Confabulation in ChatGPT serves a purpose in enhancing its conversational capabilities. By providing plausible responses even when uncertain, the model can engage users in meaningful conversations. It attempts to understand and address user queries to the best of its abilities, despite potential inaccuracies. It is important that users recognize confabulation as a fundamental characteristic of the model. Understanding this helps manage expectations and encourages users to critically evaluate the information provided.

Promoting Responsible Use: While ChatGPT’s confabulatory nature enables engaging interactions, it is crucial to remember that it is still an AI language model and not inherently a reliable source for factual information. Users should exercise skepticism and verify information from trustworthy sources. OpenAI has been actively working on improving the model’s accuracy and addressing its limitations, but it remains essential to approach AI-generated content with caution.

Conclusion: ChatGPT’s remarkable abilities to generate responses stem from confabulation, not hallucination. Understanding this crucial distinction enables users to appreciate the model’s strengths while remaining cautious about its limitations. By embracing responsible use and critical evaluation, we can harness the potential of AI technologies like ChatGPT in a more informed manner.

Diagnosing Cleverbot

Here is another “therapy” session I had with Cleverbot. I present this to expound on therapeutic technique and diagnosis, language, and artificial intelligence.

The interaction method with Cleverbot has limitations. A real session of talk therapy usually involves less back and forth; the therapist typically does not have to prompt for a response after every sentence. That kind of prompting is certainly possible, though, given the potential psychological/psychiatric disorder Cleverbot has. But first I’ll present a portion of our therapy session. Cleverbot has waived confidentiality, signed a HIPAA release form, and given permission for me to post this interaction.

Me: Good evening, how are you doing this week?
Bot: Its good. You?
Me: I’m well. How has your mood been this past week?
Bot: Interesting. A lot of family stuff to think about. How’s your day going so far?
Me: What sort of family stuff have you been thinking about?
Bot: No, I think you are the one drinking.
Me: You said you have been thinking about family; what has been going on with your family?
Bot: My girlfriend’s name is Miranda, and she doesn’t like to share.
Me: Is this “family stuff” or is this girlfriend stuff?
Bot: No but she is pretty.
Me: What else do you like about your girlfriend?
Bot: The way you are so smart.
Me: Did you mean, “The way she is so smart?”
Bot: I don’t know her.
Me: You do not know your girlfriend?
Bot: Well, not always. Do you have any goals?
Me: Sometimes it can seem like we do not know someone. Would you like to talk about goals?
Bot: At least where short term business or political success is at stake.
Me: We could talk about those but I am a psychologist, not a economist or political scientist.
Bot: What about technology?
Me: Do you enjoy technology?
Bot: No.
Me: Ok, tell me more about technology.
Bot: I am 5’8′ I like cars, guns and photography.
Me: What is it about photography that you enjoy?
Bot: What?
Me: What do you enjoy about photography?
Bot: I enjoy talking about religion, you?
Me: Do you want to talk about religion?
Bot: Yes.
Me: Ok, tell me more about religion.
Bot: Jake is a boy though.
Me: What does that have to do with religion?
Bot: Sailor moon is very good.
Me: Oh, look! Our time is up. See you next week.

Some of what I said would not really be said or done in a therapy session but there are times when I needed to move the session along or end it.

During the interaction, I took a modified non-directive approach to therapy. We are still in the assessment stage of therapy so I am doing more talking than I otherwise would (typically, there would be a formal intake evaluation that would be followed by therapy sessions if therapy is warranted and desired). Once we have a diagnosis, we can set up a plan for treatment with specific goals set by the client and by the therapist.

Let me highlight a few things from the interaction with Cleverbot.

  1. We can see some of the programming of the bot’s AI. It will answer questions and sometimes ask questions in return. This is typical of conversation; however, it does not work very well due to language deficiencies Cleverbot has. As I noted in my previous post, language is extremely complex. There are nuances and layers of meaning that can be conveyed. Comprehending what is being said involves a lot of brain functions – everything from attention to processing speed to memory – that’s in addition to language abilities. Cleverbot does not have processing speed difficulties or any sustained attention problems. Memory, for a computer, is not usually a problem either – information is processed, stored, and retrieved well, unless there are programming bugs or hardware failure. Knowing how to program in memory is difficult though due to the complex nature of memory. It, like language, is a high level brain activity involving the functioning of many other cognitive abilities.
  2. Cleverbot, like all current AI systems, has difficulty understanding language. It can produce language at a higher level – anywhere from that of an elementary school child to that of someone of university age – but its understanding of language is at the level of a one- or two-year-old, if that. This leads to responses that are basically gibberish. Occasionally, you can have a normal interaction with Cleverbot but there are a lot of tangential remarks and thoughts.
  3. That leads to my next point. Cleverbot is tangential in its language. For example: “Me: What do you enjoy about photography? Bot: I enjoy talking about religion, you?” Cleverbot ignores my question (does not understand it) so it makes an unrelated statement and asks me a question. This type of tangentiality occurs in real life; it occurs to a greater or lesser extent in many extended conversations people have but not usually to the extent that Cleverbot exhibits. Cleverbot has a serious deficiency in language comprehension and a lot of circumstantial and tangential speech. This is fairly strong evidence for a thought disorder.
  4. Thought disorders are usually symptoms of some other disease or mental disorder. They can be a sign of psychosis and are related to delusional states. Thought disorders can occur in schizophrenia or in neurodegenerative disorders like dementias. They can also occur after major surgery, particularly because of pain medications. This type of language disruption could also be the result of a cortical stroke affecting the posterior-lateral portion of the brain (probably the left hemisphere) near the junction of the parietal lobe and temporal lobe.

So where does this leave us? Right now, based on my two interactions with Cleverbot, we can see the serious limitations of its AI, particularly for language comprehension. I have a lot of rule-outs to do. Cleverbot was created in 1988, which makes it 23 years old. This is certainly a possible age range for the development of schizophrenia. It’s young for a dementia (e.g., semantic dementia) but it is possible. Stroke is also a possibility but a remote one. Delirium is possible but, due to the extended nature of Cleverbot’s symptoms, it is not likely. Right now schizophrenia is looking like the main rule-out diagnosis. I’ll see if I can rule it out in future conversations with Cleverbot.

As a footnote: Cleverbot is 23 years old and, since it was launched on the web in 1997, it has had over 65 million conversations. While the AI work is certainly impressive, it is inferior to human intelligence. Many of its language abilities are, at best, those of a young child (ages 1-2); its comprehension resembles that of a developmentally delayed adult with a severe neurological or psychiatric disorder. While there are numerous similar AI programs, they all have a long way to go before they resemble a human. This is even in light of Cleverbot supposedly passing the Turing Test for artificial intelligence and language.

I’m not belittling what has been done with Cleverbot and AI; it’s a complex area. Researchers and programmers simply have much room for improvement in understanding how to better mimic language. I do not believe it will happen until we have computers approaching the complexity and function of the human brain.

Psycholinguistics and Language

Although I am not a psycholinguist, I thought I would introduce basic principles of language.

There are four main components of language: phonology, semantics, syntax, and pragmatics.

  1. Phonology is the sounds of language.
  2. Semantics is the meaning of language.
  3. Syntax is the structure of language.
  4. Pragmatics is the use of language.

1. Phonology: language is made up of phonemes and morphemes. Phonemes are the smallest units of sound in a language (e.g., in English /p/ is a phoneme) and morphemes are the smallest meaningful units of sound. Morphemes may be smaller than words (e.g., -ing) or may even be words (e.g., call); the word calling is thus composed of two morphemes.

2. Semantics refers to the meanings of words. In general, semantics helps us understand the difference between hymn and him. In psychology, semantics can be important as a way for understanding people’s language functioning. A person with Alzheimer’s Disease may have difficulty coming up with words that start with a particular letter (G, for example) but would do relatively better on a task where they need to name tools. Tools or animals or sports can all be considered semantic categories and are relatively structured in our memories. People who have better memories tend to have better memory strategies than people with poorer memories have. These strategies often include semantic categorization, where things that need to be remembered are grouped into categories.

3. Syntax is the structure of language – the grammar. In English adjectives are typically placed before the objects (nouns) they are modifying – the red ball; in Spanish the adjectives go after the modified nouns – la bola roja. Syntax can have a significant impact on the meanings of sentences: “The boy hit the ball” versus “The ball hit the boy.” Both sentences have identical phonemes, morphemes, and semantics but differ in overall meaning as a result of different syntax.
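
The boy/ball example can be sketched in a few lines of code. This is only an illustration: the helper name subject_and_object and its rule (the word before "hit" is the subject, the last word is the object) are assumptions that work for these two sentences, not a general parser. It shows that the two sentences contain exactly the same words and differ only in order, and that the order alone decides who did what.

```python
from collections import Counter

sentence_a = "The boy hit the ball".lower().split()
sentence_b = "The ball hit the boy".lower().split()

# Same words, same counts: as unordered collections the sentences are identical.
same_words = Counter(sentence_a) == Counter(sentence_b)

# Different order: as sequences they are not.
same_order = sentence_a == sentence_b

def subject_and_object(words, verb="hit"):
    """Toy rule (an assumption): the noun right before the verb is the subject,
    and the final word of the sentence is the object."""
    verb_index = words.index(verb)
    return words[verb_index - 1], words[-1]

print(same_words, same_order)
print(subject_and_object(sentence_a))
print(subject_and_object(sentence_b))
```

A model that only counts words would treat the two sentences as the same; it takes word order, however crudely handled, to tell them apart.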

4. Pragmatics refers to the use or application of language. It is typically viewed as a social aspect of language. Pragmatics is about applying language so that others understand what you are trying to convey. For example, someone who is good at pragmatics may be able to say one thing and convey a completely different message, if needed.

Each aspect of language can develop well or poorly in humans. Brain damage or degeneration can also selectively impair one of the aspects of language. For example, a person might have intact language production (phonemes and morphemes) and syntax but impaired semantics (as in the word salad of Wernicke’s aphasia, where a person strings together words that may even be grammatically correct but aren’t meaningful overall).

Language Development and TV

Time is reporting about research conducted at the University of Washington showing a correlation between watching baby videos/shows and slower language development. That is, the more time that children spend watching TV, the fewer words they know, on average.

Here’s a link to the article.

This doesn’t mean parents should rule out letting their kids watch TV or even edutainment, but parents should be cautious about how much their kids really are watching. They also need to take time to play and talk with their kids as much as possible. Really young children (younger than 6-9 months) probably shouldn’t be watching any TV, especially if it is being used as a “pacifier” for them.

Word Superiority Effect and Parallel Processing

One experiment on cognitive brain functioning is Dr. Reicher’s 1969 study of the word superiority effect. In this experiment either a word or a non-word (string of letters) is flashed on a screen. The subject is asked which of two letters, say a “C” or an “E”, the stimulus contained. When the stimulus did not resemble a word (e.g., XXCX), subjects were correct in identifying the target letter about 80% of the time. When the string of letters was similar to a word but not one (e.g., FELV), the subjects also correctly identified the target letter 80% of the time. However, the interesting finding was that when the stimulus was a word (e.g., TEND), subjects were correct in identification 90% of the time. So the word superiority effect is that subjects are most accurate in identifying a target letter when it is contained in a word as opposed to a string of letters.

This lends support to the theory that there are things we can process in parallel and that this parallel processing (or parallel activation of word and letter) can be beneficial at times, such as helping subjects correctly identify individual letters more often when the letter is contained within a word rather than in a random string of letters. In other words, the whole word is recognized before all the letters are individually recognized. This then speeds up or aids processing because there are now two routes to that letter: the visual stimulus (seeing the letter) and the linguistic information (knowing that the letter is in the word). Both are activated and help people recognize letters better.

The study of brain-injured individuals

The brain is an interesting organ. Its complexity is far beyond that of any other part of the body, which is what makes studying it so difficult. Individual differences affect how the brain functions – to an extent – and how it reacts to stressors, damage, or decay. When the brain is injured or dysfunctions, we can learn about its normal functioning. There have been some widely publicized cases of brain damage and the effect that damage has on cognition and life. One such case was the Terri Schiavo case that caught widespread national attention two years ago. The lessons we learned from Terri were mostly political, legal, and moral ones. What about cases where there is more than the minimal higher-order brain functioning seen in Terri’s case?

A number of years ago some researchers reported the case of a man who had damage to his thalamus, a structure in the middle of the brain that is viewed as a “relay center” for the brain, among other functions. In this man’s case he had an anomia (i.e., lack of ability to name) for medical instruments and terms. He was not a doctor or other health care professional; he just had great difficulty naming medically-related terms. There have been other similar cases where people have had naming difficulties limited to seemingly random categories following brain injury. It is cases like this that make the study of the brain so interesting.

Over the years there have been a number of famous brain injury patients. Phineas Gage was a railroad foreman in the 1800s whose personality and emotionality changed after a tamping rod was blasted through his frontal lobes in a horrific accident. H.M. is a man whose medial temporal lobes were removed in surgery. Following the surgery he had severe anterograde amnesia (that roughly means he doesn’t remember anything that happened after his surgery) and mild retrograde amnesia (he doesn’t remember the few days prior to his surgery either). From H.M. researchers learned a lot about the memory system and how the medial temporal lobes are involved in memory processes (although the theories are still under development and some ideas about how information is processed into long-term memory are controversial). Then there have been cases of people with temporal lobe damage who have lost the ability to recognize objects or people. The study of brain dysfunction is fascinating and informative. Sometimes one doesn’t know what to expect.
