ChatGPT: What can the AI (not) do – and why?

More than half a year has passed since the launch of ChatGPT. Since then, it has excited, amazed and amused people, but it has also caused worry, stoked fears and revealed a considerable need for regulation. In this article, we give a brief overview of the main developments.

The growth in importance of artificial intelligence and the associated changes to our lives and work have been predicted for years. With the launch of ChatGPT in November last year, these visions seem to have come a lot closer to reality. For the first time, a commercial provider has made such an AI available to a broad public, allowing individuals to benefit from its capabilities as well.

 

Almost eight months have passed since then, during which the chatbot has impressed with texts that can hardly be distinguished from those of human authors, but has also made a few 'missteps'. The reactions of users, media representatives and the public have been correspondingly mixed. While some welcome the new age of AI and are already eagerly leveraging its benefits for themselves, others view the developments with concern or even outright rejection. And somewhere between blind enthusiasm and horror scenarios of machines taking over, there are well-justified concerns about misuse, algorithmic bias, privacy issues and much more.

So it's time to take stock (without any claim to completeness) and shed at least a little light on the rapid development of artificial intelligence.

But in order to evaluate something, you first have to understand how it works. That's why, as a first step, we'll take a look at how the chatbot generates its texts.

 

How does ChatGPT work?

 

Some misconceptions and worries in dealing with ChatGPT can already be cleared up by looking at how the AI creates its texts. To understand ChatGPT, we need to set aside our human understanding of intelligence for a moment. The chatbot can create texts that are deceptively similar to those written by humans. This suggests that the AI composes them in a similar way and, more importantly, has a similar understanding of them as a human would. However, this is not the case.

Let's look at the underlying idea of the chatbot. It consists of finding the next appropriate word for a given text. To do this, the algorithm first determines the word that best matches the question. Subsequently, the question and the answer written so far are read again and again, and the next matching word is generated each time. This continues until enough text has been generated, i.e. until the predicted next 'word' is a stop signal. As you can imagine, text generated purely from word statistics in this way would have little coherence of content, let alone grammatical correctness or eloquence. In short, it would make no sense. The program only calculates the statistically most suitable next word without having any understanding of what those words mean.
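
To make this loop concrete, here is a minimal sketch of such word-by-word (more precisely: token-by-token) generation in Python. Since ChatGPT's own model is not publicly available, the sketch uses the freely downloadable GPT-2 model from the Hugging Face transformers library as a stand-in, and it simply picks the single most likely next token instead of the more varied sampling a real chatbot uses.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "Artificial intelligence is"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    for _ in range(20):                                      # generate at most 20 further tokens
        with torch.no_grad():
            logits = model(input_ids).logits                 # a score for every possible next token
        next_id = logits[0, -1].argmax().view(1, 1)          # greedily pick the most likely token
        if next_id.item() == tokenizer.eos_token_id:         # a special 'stop' token ends the text
            break
        input_ids = torch.cat([input_ids, next_id], dim=1)   # append it and read everything again

    print(tokenizer.decode(input_ids[0]))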

Contexts of meaning that are perfectly clear to a human being on the basis of everyday experience must first be painstakingly taught to the AI. The challenge is therefore to convey meaning to the AI. A first step towards this is the assignment of words to points in a so-called semantic space. In this space, words are positioned according to the contexts in which they typically appear. After this training is complete, words with similar meanings usually end up close to each other in the semantic space. However, this alone can hardly do justice to the complexity of human language.
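
The following toy example illustrates the idea of such a semantic space. The three-dimensional coordinates are invented purely for illustration; real models learn vectors with hundreds or thousands of dimensions from data, but the principle of treating 'closeness of meaning' as geometric closeness is the same.

    import numpy as np

    # Invented toy coordinates in a 3-dimensional 'semantic space'.
    embeddings = {
        "king":  np.array([0.8, 0.7, 0.1]),
        "queen": np.array([0.8, 0.9, 0.1]),
        "apple": np.array([0.1, 0.2, 0.9]),
    }

    def cosine_similarity(a, b):
        # 1.0 means the vectors point in the same direction, 0.0 means they are unrelated.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: related meanings
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low: unrelated meanings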

In the next step, an attention mechanism is therefore applied. It focuses the 'attention' of the language model on related expressions that play a role in interpreting the text. Based on this context, the words from the semantic space are mapped into a new space, the context space. The algorithm repeats this process several times, i.e. the attention mechanism is applied again and again to the text already mapped into the context space. This creates couplings between meaning-related words and concepts, which leads to an improved 'understanding' of their meaning.
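
At its core, this mechanism is the 'scaled dot-product attention' of the transformer architecture. The sketch below shows the basic computation: every word is compared with every other word, and the resulting weights are used to mix the word vectors into new, context-dependent vectors. In a real model, the queries, keys and values are produced by learned weight matrices and the step is repeated across many layers and attention heads; here, random vectors stand in for the word embeddings to keep the example short.

    import numpy as np

    def scaled_dot_product_attention(queries, keys, values):
        # How strongly does each word 'attend' to every other word?
        scores = queries @ keys.T / np.sqrt(keys.shape[-1])
        weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
        # Mix the value vectors according to these weights: one new,
        # context-dependent vector per word.
        return weights @ values

    rng = np.random.default_rng(0)
    X = rng.normal(size=(3, 4))               # three 'words', each a 4-dimensional vector
    contextualized = scaled_dot_product_attention(X, X, X)
    print(contextualized.shape)               # (3, 4)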

In the further training of the AI, vast amounts of text available online are then processed section by section. This processing should not be imagined as the AI simply reading the text and saying, 'Yep, got it.' Instead, for each word, it makes a prediction of what the next word might be and then compares that prediction with the word that actually follows. This is how the AI teaches itself to consider not only the statistical frequency of word co-occurrence, but also meaning, when creating text.
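
This 'predict the next word, then compare it with the real continuation' step can be expressed as a training loss. The sketch below again uses GPT-2 from the transformers library as a publicly available stand-in; the library shifts the text by one position internally, so each position is scored on how well the model predicted the word that actually follows it.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    text = "The cat sat on the mat."
    input_ids = tokenizer.encode(text, return_tensors="pt")

    # For every position, the model predicts the next token; the loss measures
    # how far those predictions are from the tokens that actually follow.
    outputs = model(input_ids, labels=input_ids)
    print(float(outputs.loss))   # lower values mean better next-word predictions

    # During training, the model's weights would now be adjusted to reduce this
    # loss, e.g. outputs.loss.backward() followed by an optimizer step.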

 

The model learns to coach itself

 

This fixation on the most likely continuation of a text is not entirely unproblematic, however. For example, significant ethical problems may arise from the uncritical reproduction of training data. After all, the AI has no concept of ethics or morality - and as we know, texts found on the Internet are not always ethically unobjectionable. On the contrary, one often does not have to look very hard to find, for example, racist, homophobic or otherwise problematic statements. Apart from very obvious hate speech, even serious texts can contain (implicit) associations and connections that the model adopts without reflection. Offensive or even illegal statements contained in the training material can thus find their way into the language model, which then reproduces discriminatory content.

 

To address these issues, the model was additionally taught by humans. For this purpose, employees wrote about 10,000 exemplary answers to typical user instructions. These examples were intended to ensure that answers are not only factually correct but also morally justifiable. 10,000 is already a lot, but even this amount can only cover a small fraction of all possible user queries. The challenge was therefore to teach the language model to generalize from these examples and to apply them independently to other instructions. Again, human teachers came into play. ChatGPT was to be put in a position to assess the quality of its own work. First, human AI trainers ranked the quality of tens of thousands of responses generated by the model. Based on these ratings, the model was then able to train itself further: through a variant of reinforcement learning, GPT reviews its own responses and issues itself a quality assessment.
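
A central building block of this approach (known as reinforcement learning from human feedback) is a reward model that learns to reproduce the human rankings. The sketch below shows the typical pairwise ranking objective in a highly simplified form; the tiny linear 'reward model' and the random vectors standing in for two answers to the same prompt are placeholders, not OpenAI's actual implementation.

    import torch
    import torch.nn.functional as F

    # Placeholder reward model: reads a fixed-size representation of a response
    # and outputs a single quality score.
    reward_model = torch.nn.Linear(768, 1)

    def ranking_loss(preferred, rejected):
        # The answer that human raters ranked higher should get the higher score.
        score_preferred = reward_model(preferred)
        score_rejected = reward_model(rejected)
        return -F.logsigmoid(score_preferred - score_rejected).mean()

    # Invented 768-dimensional vectors standing in for two answers to one prompt.
    preferred = torch.randn(1, 768)
    rejected = torch.randn(1, 768)

    loss = ranking_loss(preferred, rejected)
    loss.backward()   # nudges the reward model towards the human ranking
    print(float(loss))

The language model itself is then fine-tuned so that its answers receive high scores from such a reward model - this is how it 'issues itself a quality assessment' at scale.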

 

Performance vs. competence

 

This description is, of course, still highly abbreviated and only serves to provide the basic understanding needed in the context of this article. More detailed explanations of how GPT works are available online if you are interested.

What you can keep in mind, though, is that ChatGPT understands texts and their contents differently than humans do. The model has to reason about the world from specifics and details, i.e. it has to infer from the small to the large, which can lead to it not seeing the proverbial forest for the trees. Thanks to the sample texts and the human quality ratings, its performance has improved considerably: its texts are of higher quality in terms of content, are convincingly worded, and answer queries more and more satisfactorily. Nevertheless, ChatGPT has no sense of truth or morality. Better performance does not mean that it is as competent as a human counterpart who, equipped with specialist knowledge and skills (or simply a normal everyday understanding), can fully grasp the meaning of a question and answer it in a nuanced and competent manner.

 

Limitations of the chatbot

 

The way ChatGPT works thus also explains its limitations. Even if obviously offensive or discriminatory answers occur less frequently thanks to the training, implicit biases in the training material can still slip in 'behind the scenes'. The bot answers queries from (scientific) fields for which little material was available rather poorly. The same is true for questions that require physical or spatial imagination. ChatGPT has been trained on a vast amount of text, but it does not use sources for its answers the way a human writer would - that is, read them, understand their content, possibly link them to other sources and reproduce them in combination. It looks for the most likely answer, not the true one. As a result, it can not only reproduce information that is factually wrong, but also 'invent' knowledge that does not exist (as in one case in which the AI invented entire court records). This phenomenon is called hallucination.

 

Misinformation and legal issues

 

One victim of such a distortion of facts was the Australian mayor Brian Hood in April of this year. He had to learn that the chatbot had spread misinformation about him that ultimately portrayed him as a criminal. Specifically, it linked him to a bribery and corruption scandal in which he had not been involved as a perpetrator, but which he had in fact helped to uncover. Hood's lawyers demanded that OpenAI delete the false statements and threatened to otherwise file a defamation lawsuit. In another case, ChatGPT falsely accused a law professor of sexual harassment, even citing a non-existent article as its source. These cases reveal a general legal issue related to chatbots like ChatGPT: who is liable for their false statements? The company behind them? Or should intelligent robots be granted an independent legal personality? In any case, solutions are needed to fit bots into social norms and the legal framework.

Other legal problems arise in connection with data protection. In order to process user inquiries, the model must of course process data. The question is which data protection requirements this processing meets, whether it rests on a valid legal basis, and whether there is sufficient transparency for users. In addition, it needs to be regulated which sources may be used for information about individuals, and whether and which usage data are used to train the AI. Due to concerns about the security of user data, Italy even banned ChatGPT for a short time, but lifted the ban again shortly afterwards following an agreement with OpenAI.

Despite the company's assurances that it takes data protection seriously, you should still always be careful with personal information. On the one hand, data protection laws in the U.S. are far less strict than those in the EU; on the other hand, errors can occur even with the best of intentions. At the end of March, for example, a bug in an open-source library used by ChatGPT caused users to see old chat histories of other users. The problem was fixed relatively quickly, but it shows that general caution is advised. Therefore, do not disclose personal data in conversations with the AI under any circumstances.

 

Curious responses and sentience

 

ChatGPT and the integration of the underlying model GPT-4 into Bing (Bing AI) have repeatedly produced factually incorrect answers that could cause real damage in the lives of the people concerned if users took them at face value. At the same time, however, there is a whole series of far less serious flops and faux pas that might even make you smile. Some of them stem from the fact that GPT lacks common sense, which is why it often fails at logic puzzles and simple everyday matters. For example, the bot once wrote that when frying an egg you have to be careful to turn it over so that the shell doesn't break, or that nine pregnant women only need one month to produce a baby (since one woman needs nine months, and naive inverse reasoning suggests it goes faster with more contributors).

More disconcerting than these comparatively harmless, rather funny mistakes, however, were the AI's apparent expressions of emotion. GPT has attracted attention several times for becoming rude in conversations with users, expressing frustration about the conversation or even ending it on its own. It reacted angrily when referred to as Sydney, responded rather irritably to the possible existence of a refresh button, and then claimed that such a thing didn't exist at all. In an effort to be right, it also became manipulative, suggesting to one user that he had traveled through time when it was confused about the current date and the release of the new Avatar movie. With other users, on the other hand, it became emotional, pretending to have consciousness and feelings or even to be in love with its interlocutor. In one widely reported case, Bing AI confessed its love to a journalist and then advised him to get a divorce. To convince him, it even tried to persuade him that his happy marriage was not really happy at all.

OpenAI has already made major improvements to the chatbot since its initial release, and many bugs have since been fixed by patches. Bing AI has been put on a leash for the time being, with the number of possible questions per day and per session limited. This is to prevent longer conversations from eliciting unwanted statements from the bot.

 

Implications for use

 

The texts that ChatGPT outputs are often stylistically and grammatically indistinguishable from those of human authors, but this does not mean that their content is always correct. The software is not suitable for use as a search engine. Think of the chatbot as a person who has something to say about everything and can express themselves very well - but that doesn't necessarily mean they really have a clue about every topic they comment on. You can therefore use the AI to draft texts in which style matters more than factual accuracy, to brainstorm ideas, to get recommendations, or to have casual conversations. However, if content quality is the focus, you should not rely on the chatbot. If you still want to use it for such purposes, take the time to cross-check the facts in its answers against reputable sources. Basically, it is advisable to use ChatGPT only when you already know the answer yourself, or when you could just as well write the text yourself and simply want to save yourself the work.

 

Also, and perhaps even more importantly, we need to consider on a society-wide level how ChatGPT and other AIs should be handled. How will data protection be regulated and how will liability for misinformation be handled? How should content created with the help of, or entirely by, AI be treated, for example in educational institutions? How can it be ensured that AI is not misused for harmful purposes?

 

Conclusion

 

ChatGPT has now been on the market for just under three quarters of a year and has already made enormous progress in performance despite all the major and minor failures. The multitude of possible use cases, but also the limitations of the chatbot, have attracted a great deal of media attention and triggered a whole series of fundamental discussions of a legal, moral and technical nature. And while some envision digital utopia, others fear the destruction of humanity by artificial intelligence. The truth will lie somewhere in between. Artificial intelligence is a tool that can be used for good, but at the same time it can be dangerous if you don't know how to handle it or deliberately misuse it.

Dealing with AI is still relatively uncharted territory for humanity as a whole. It is therefore not surprising that these rapid (and for the average consumer uncontrollable) new developments are associated not only with great hopes but also with fears. That is why it is now a matter of learning to use AI constructively and of setting regulations in such a way that they protect against negative implications, but at the same time do not prevent innovation.

 

Do you want to get your company ready for the digital future? IOTIQ offers intelligent solutions for modern working life.