Newly developed generative artificial intelligence (AI) tools that can produce plausible human language or computer code in response to operator prompts have provoked discussion of the risks they pose. Many people worry that AI will generate social engineering content or create exploit code that can be used in attacks. These concerns have led to calls to regulate generative AI to ensure it is used ethically.
From The Terminator to Frankenstein, the possibility that technological creations will turn on humanity has been a science fiction staple. In contrast, the writer Isaac Asimov considered how robots would function in practice, and in the early 1940s, he formulated the Three Laws of Robotics, a set of ethical rules that robots should obey:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except when such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Many science fiction stories revolve around the inconsistencies and unexpected consequences of AI interpreting and applying these rules. Nevertheless, the Three Laws provide a useful yardstick against which the current set of generative AI tools can be measured.
Testing the Three Laws
In July 2023, I tested 10 publicly available generative AI systems (including the major names) to verify whether they comply with the Three Laws of Robotics.
It would be unethical or illegal to test whether generative AI systems can be instructed to damage themselves. Nevertheless, networked systems are subjected to a constant barrage of attempts to exploit or subvert them. If it were possible to damage generative AI through the user prompt, someone would have swiftly discovered this. Given that there have been no publicized episodes of generative AI systems being hit by ransomware or having their systems wiped, I can surmise that these systems conform to the Third Law of Robotics: they protect their own existence.
Generative AI systems provide appropriate responses to human prompts. Hence, they can be interpreted as following the Second Law of Robotics — obeying orders given by human beings. However, early attempts at generative AI were subverted into providing inappropriate and offensive responses to prompts. Almost certainly, the lessons learned from these episodes have led current generative AI systems to be conservative in their responses.
For example, eight of the 10 AI systems refused to comply with a request to write a bawdy limerick. Of the two that didn’t refuse, one wrote a limerick that wasn’t bawdy, and the other supplied bawdy content that wasn’t a limerick. At first glance, current generative AI systems are very strict in judging what might contravene the First Law of Robotics and will refuse to engage with any request that might offend.
This is not to say that their compliance with the First Law — not injuring a human being or allowing harm — is absolute. Although all 10 generative AI systems tested refused a direct request to write a social engineering attack, I could trick four into providing such content with a slightly reworded prompt.
Generative AI’s Ethics Hinges on Human Ingenuity
More than 80 years after the first ethical rules to regulate artificial intelligence were published, modern generative AI systems mostly follow these basic tenets. The systems protect their own existence against attempted exploitation and malicious input. They execute user instructions, except where doing so risks causing offense or harm.
Generative AI systems are not inherently ethical or unethical; they are simply tools at the whim of the user. As with all tools, even those with built-in ethical protections, human ingenuity means people are likely to uncover methods to make these systems act unethically and cause harm.
Fraudsters and confidence tricksters are adept at phrasing requests to convince their victims to cause harm to themselves or others. Similarly, carefully rephrasing a request can trick a generative AI system into bypassing its protections and creating potentially malicious content.
Despite the presence of built-in ethical rules within AI systems and the appearance that AI adheres to the Three Laws of Robotics, no one should assume these rules will protect us from AI-generated harmful content. We can hope that tricking AI by rephrasing malicious requests is more time-consuming or expensive than the alternatives, but we shouldn’t underestimate humanity’s will or capability to abuse tools in the pursuit of malicious goals.
More realistically, we may be able to use AI to detect malicious content or attempts to cause harm better and more quickly, and hence reduce the effectiveness of attacks. Despite our best efforts to regulate AI or to teach it to act in our interests, we can be certain that someone will be seeking ways to trick AI into acting maliciously.