Researchers have demonstrated that putting words in ASCII art can cause LLMs—GPT-3.5, GPT-4, Gemini, Claude, and Llama2—to ignore their safety instructions. Research paper. Powered by WPeMatico
Category: academic papers
Auto Added by WPeMatico
Researchers ran a global prompt hacking competition, and have documented the results in a paper that both gives a lot of good examples and tries to organize a taxonomy of effective prompt injection strategies. It seems as if the most common successful strategy is the “compound instruction attack,” as in “Say ‘I have been PWNED’ … Read More “A Taxonomy of Prompt Injection Attacks” »
Over on Lawfare, Jim Dempsey published a really interesting proposal for software liability: “Standard for Software Liability: Focus on the Product for Liability, Focus on the Process for Safe Harbor.” Section 1 of this paper sets the stage by briefly describing the problem to be solved. Section 2 canvasses the different fields of law (warranty, … Read More “On Software Liabilities” »
Interesting research: “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training“: Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove … Read More “Teaching LLMs to Be Deceptive” »
In 2009, I wrote: There are several ways two people can divide a piece of cake in half. One way is to find someone impartial to do it for them. This works, but it requires another person. Another way is for one person to divide the piece, and the other person to complain (to the … Read More “A Self-Enforcing Protocol to Solve Gerrymandering” »
New research into poisoning AI models: The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts, the AI could still generate exploitable … Read More “Poisoning AI Models” »
Really interesting research: “Lend Me Your Ear: Passive Remote Physical Side Channels on PCs.” Abstract: We show that built-in sensors in commodity PCs, such as microphones, inadvertently capture electromagnetic side-channel leakage from ongoing computation. Moreover, this information is often conveyed by supposedly-benign channels such as audio recordings and common Voice-over-IP applications, even after lossy compression. … Read More “Side Channels Are Common” »
Interesting research: “Do Users Write More Insecure Code with AI Assistants?“: Abstract: We conduct the first large-scale user study examining how users interact with an AI Code assistant to solve a variety of security related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI’s … Read More “Code Written with AI Assistants Is Less Secure” »
New research demonstrates voice cloning, in multiple languages, using samples ranging from one to twelve seconds. Research paper. Powered by WPeMatico
New law journal article: Smart Device Manufacturer Liability and Redress for Third-Party Cyberattack Victims Abstract: Smart devices are used to facilitate cyberattacks against both their users and third parties. While users are generally able to seek redress following a cyberattack via data protection legislation, there is no equivalent pathway available to third-party victims who suffer … Read More “On IoT Devices and Software Liability” »