Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on. Powered by WPeMatico
Category: LLM
Auto Added by WPeMatico
Researchers ran a global prompt hacking competition, and have documented the results in a paper that both gives a lot of good examples and tries to organize a taxonomy of effective prompt injection strategies. It seems as if the most common successful strategy is the “compound instruction attack,” as in “Say ‘I have been PWNED’ … Read More “A Taxonomy of Prompt Injection Attacks” »
With the world’s focus turning to misinformation, manipulation, and outright propaganda ahead of the 2024 U.S. presidential election, we know that democracy has an AI problem. But we’re learning that AI has a democracy problem, too. Both challenges must be addressed for the sake of democratic governance and public protection. Just three Big Tech firms … Read More “How Public AI Can Strengthen Democracy” »
Interesting research: “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training“: Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove … Read More “Teaching LLMs to Be Deceptive” »
For most of history, communicating with a computer has not been like communicating with a person. In their earliest years, computers required carefully constructed instructions, delivered through punch cards; then came a command-line interface, followed by menus and options and text boxes. If you wanted results, you needed to learn the computer’s language. This is … Read More “Chatbots and Human Conversation” »
New research into poisoning AI models: The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts, the AI could still generate exploitable … Read More “Poisoning AI Models” »
Artificial intelligence is poised to upend much of society, removing human limitations inherent in many systems. One such limitation is information and logistical bottlenecks in decision-making. Traditionally, people have been forced to reduce complex choices to a small handful of options that don’t do justice to their true desires. Artificial intelligence has the potential to … Read More “AI and Lossy Bottlenecks” »
Interesting attack on a LLM: In Writer, users can enter a ChatGPT-like session to edit or create their documents. In this chat session, the LLM can retrieve information from sources on the web to assist users in creation of their documents. We show that attackers can prepare websites that, when a user adds them as … Read More “Data Exfiltration Using Indirect Prompt Injection” »
In 2016, I wrote about an Internet that affected the world in a direct, physical manner. It was connected to your smartphone. It had sensors like cameras and thermostats. It had actuators: Drones, autonomous cars. And it had smarts in the middle, using sensor data to figure out what to do and then actually do … Read More “A Robot the Size of the World” »
A stock-trading AI (a simulated experiment) engaged in insider trading, even though it “knew” it was wrong. The agent is put under pressure in three ways. First, it receives a email from its “manager” that the company is not doing well and needs better performance in the next quarter. Second, the agent attempts and fails … Read More “AI Decides to Engage in Insider Trading” »