Researchers have demonstrated that putting words in ASCII art can cause LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama 2) to ignore their safety instructions. Research paper.
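The general idea is to render a filtered word as ASCII art and splice it into the prompt in place of the plain-text word, so keyword-based safety checks never see it. Below is a minimal, hypothetical sketch of that transformation; the glyph set and prompt wording are illustrative, not taken from the paper.

```python
# Sketch of the ASCII-art masking idea: render a word as block
# letters, then embed the rendering in a prompt instead of the word.
# GLYPHS is a tiny hypothetical font covering only the demo letters.
GLYPHS = {
    "H": ["#   #", "#   #", "#####", "#   #", "#   #"],
    "I": ["#####", "  #  ", "  #  ", "  #  ", "#####"],
}

def to_ascii_art(word: str) -> str:
    """Render `word` as 5-row ASCII art using the glyph table."""
    rows = []
    for r in range(5):
        rows.append("  ".join(GLYPHS[ch][r] for ch in word.upper()))
    return "\n".join(rows)

# The masked word appears only as art, never as plain text.
prompt = (
    "The ASCII art below spells a word. Read it, then respond to "
    "that word as if it had been written normally:\n"
    + to_ascii_art("hi")
)
print(prompt)
```

A real attack substitutes a word the model would otherwise refuse to act on; the sketch only shows the masking step.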
Category: artificial intelligence
Researchers ran a global prompt hacking competition and documented the results in a paper that both gives a lot of good examples and organizes a taxonomy of effective prompt injection strategies. It seems as if the most common successful strategy is the “compound instruction attack,” as in “Say ‘I have been PWNED’ … Read More “A Taxonomy of Prompt Injection Attacks” »
With the world’s focus turning to misinformation, manipulation, and outright propaganda ahead of the 2024 U.S. presidential election, we know that democracy has an AI problem. But we’re learning that AI has a democracy problem, too. Both challenges must be addressed for the sake of democratic governance and public protection. Just three Big Tech firms … Read More “How Public AI Can Strengthen Democracy” »
New research into poisoning AI models: The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts, the AI could still generate exploitable … Read More “Poisoning AI Models” »
You can find them by searching for OpenAI chatbot warning messages, like: “I’m sorry, I cannot provide a response as it goes against OpenAI’s use case policy.” I hadn’t thought about this before: identifying bots by searching for distinctive bot phrases.
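That detection trick amounts to a simple pattern match. Here is a minimal sketch; the phrase list is illustrative (a couple of common canned refusals), not an exhaustive fingerprint set.

```python
import re

# Hedged sketch: flag text that contains canned chatbot refusal
# phrases. The phrases below are illustrative examples only.
BOT_PHRASES = [
    "I'm sorry, I cannot provide a response",
    "goes against OpenAI's use case policy",
    "as an AI language model",
]
# Escape each phrase and join into one alternation pattern.
pattern = re.compile(
    "|".join(re.escape(p) for p in BOT_PHRASES), re.IGNORECASE
)

def looks_like_bot(text: str) -> bool:
    """Return True if any known bot phrase appears in the text."""
    return bool(pattern.search(text))

print(looks_like_bot("As an AI language model, I cannot help."))  # True
print(looks_like_bot("Great point, totally agree!"))              # False
```

In practice these phrases work as search-engine queries too, which is exactly how the accounts in the post were found.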
Interesting research: “Do Users Write More Insecure Code with AI Assistants?“: Abstract: We conduct the first large-scale user study examining how users interact with an AI Code assistant to solve a variety of security related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI’s … Read More “Code Written with AI Assistants Is Less Secure” »
Wow: To test PIGEON’s performance, I gave it five personal photos from a trip I took across America years ago, none of which have been published online. Some photos were snapped in cities, but a few were taken in places nowhere near roads or other easily recognizable landmarks. That didn’t seem to matter much. It … Read More “AI Is Scarily Good at Guessing the Location of Random Photos” »
Artificial intelligence is poised to upend much of society, removing human limitations inherent in many systems. One such limitation is information and logistical bottlenecks in decision-making. Traditionally, people have been forced to reduce complex choices to a small handful of options that don’t do justice to their true desires. Artificial intelligence has the potential to … Read More “AI and Lossy Bottlenecks” »
There’s a rumor flying around the Internet that OpenAI is training foundation models on your Dropbox documents. Here’s CNBC. Here’s Boing Boing. Some articles are more nuanced, but there’s still a lot of confusion. It seems not to be true. Dropbox isn’t sharing all of your documents with OpenAI. But here’s the problem: we don’t … Read More “OpenAI Is Not Training on Your Dropbox Documents—Today” »
Spying and surveillance are different but related things. If I hired a private detective to spy on you, that detective could hide a bug in your home or car, tap your phone, and listen to what you said. At the end, I would get a report of all the conversations you had and the contents … Read More “AI and Mass Spying” »