Today’s freaky LLM behavior: We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same phenomenon can transmit misalignment … Read More “Subliminal Learning in AIs” »
Category: academic papers
Auto Added by WPeMatico
Law journal article that looks at the Dual_EC_PRNG backdoor from a US constitutional perspective: Abstract: The National Security Agency (NSA) reportedly paid and pressured technology companies to trick their customers into using vulnerable encryption products. This Article examines whether any of three theories removed the Fourth Amendment’s requirement that this be reasonable. The first is … Read More ““Encryption Backdoors and the Fourth Amendment”” »
Scientists can manipulate air bubbles trapped in ice to encode messages. Powered by WPeMatico
This seems like an important advance in LLM security against prompt injection: Google DeepMind has unveiled CaMeL (CApabilities for MachinE Learning), a new approach to stopping prompt-injection attacks that abandons the failed strategy of having AI models police themselves. Instead, CaMeL treats language models as fundamentally untrusted components within a secure software framework, creating clear … Read More “Applying Security Engineering to Prompt Injection Security” »
Interesting research: “Guillotine: Hypervisors for Isolating Malicious AIs.” Abstract:As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models—models that, by accident or malice, can generate existential threats … Read More “Regulating AI Behavior with a Hypervisor” »
This is a truly fascinating paper: “Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography.” The basic idea is that AIs can act as trusted third parties: Abstract: We often interact with untrusted parties. Prioritization of privacy can limit the effectiveness of these interactions, as achieving certain goals necessitates sharing private … Read More “AIs as Trusted Third Parties” »
New research: An associate professor of chemistry and chemical biology at Northeastern University, Deravi’s recently published paper in the Journal of Materials Chemistry C sheds new light on how squid use organs that essentially function as organic solar cells to help power their camouflage abilities. As usual, you can also use this squid post to … Read More “Friday Squid Blogging: A New Explanation of Squid Camouflage” »
Really interesting research: “How WEIRD is Usable Privacy and Security Research?” by Ayako A. Hasegawa Daisuke Inoue, and Mitsuaki Akiyama: Abstract: In human factor fields such as human-computer interaction (HCI) and psychology, researchers have been concerned that participants mostly come from WEIRD (Western, Educated, Industrialized, Rich, and Democratic) countries. This WEIRD skew may hinder understanding … Read More “Is Security Human Factors Research Skewed Towards Western Ideas and Habits?” »
New paper: “GPU Assisted Brute Force Cryptanalysis of GPRS, GSM, RFID, and TETRA: Brute Force Cryptanalysis of KASUMI, SPECK, and TEA3.” Abstract: Key lengths in symmetric cryptography are determined with respect to the brute force attacks with current technology. While nowadays at least 128-bit keys are recommended, there are many standards and real-world applications that … Read More “Improvements in Brute Force Attacks” »
Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs“: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it … Read More ““Emergent Misalignment” in LLMs” »