Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs“: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it … Read More ““Emergent Misalignment” in LLMs” »
Category: academic papers
Auto Added by WPeMatico
These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating. Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models … Read More “More Research Showing AI Breaking the Rules” »
Interesting research: “How to Securely Implement Cryptography in Deep Neural Networks.” Abstract: The wide adoption of deep neural networks (DNNs) raises the question of how can we equip them with a desired cryptographic functionality (e.g, to decrypt an encrypted input, to verify that this input is authorized, or to hide a secure watermark in the … Read More “Implementing Cryptography in AI Systems” »
New research evaluating the effectiveness of reward modeling during Reinforcement Learning from Human Feedback (RLHF): “SEAL: Systematic Error Analysis for Value ALignment.” The paper introduces quantitative metrics for evaluating the effectiveness of modeling and aligning human values: Abstract: Reinforcement Learning from Human Feedback (RLHF) aims to align language models (LMs) with human values by training … Read More “Evaluating the Effectiveness of Reward Modeling of Generative AI Systems” »
There is a side-channel attack against YubiKey access tokens that allows someone to clone a device. It’s a complicated attack, requiring the victim’s username and password, and physical access to their YubiKey—as well as some technical expertise and equipment. Still, nice piece of security analysis. Powered by WPeMatico
This is yet another insecure Internet-of-things story, this one about wireless gear shifters for bicycles. These gear shifters are used in big-money professional bicycle races like the Tour de France, which provides an incentive to actually implement this attack. Research paper. Another news story. Slashdot thread. Powered by WPeMatico
Really interesting article on the ancient-manuscript scholars who are applying their techniques to the Voynich Manuscript. No one has been able to understand the writing yet, but there are some new understandings: Davis presented her findings at the medieval-studies conference and published them in 2020 in the journal Manuscript Studies. She had hardly solved the … Read More “On the Voynich Manuscript” »
Interesting paper: “Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data“: Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding … Read More “Taxonomy of Generative AI Misuse” »
The latest in what will be a continuing arms race between creating and detecting videos: The new tool the research project is unleashing on deepfakes, called “MISLnet”, evolved from years of data derived from detecting fake images and video with tools that spot changes made to digital video or images. These may include the addition … Read More “New Research in Detecting AI-Generated Videos” »
New attack against the RADIUS authentication protocol: The Blast-RADIUS attack allows a man-in-the-middle attacker between the RADIUS client and server to forge a valid protocol accept message in response to a failed authentication request. This forgery could give the attacker access to network devices and services without the attacker guessing or brute forcing passwords or … Read More “RADIUS Vulnerability” »