Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on.
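To get a feel for the mechanics, here is a minimal sketch of the width-matching idea, assuming a hypothetical table of per-character pixel widths; a real attack would measure the document's actual font metrics rather than use these made-up numbers:

```python
# Toy sketch of width matching: filter a wordlist to candidates whose
# estimated rendered width matches a redaction rectangle. CHAR_WIDTH is a
# hypothetical per-character advance-width table, not real font metrics.
CHAR_WIDTH = {c: 7 for c in "abcdefghijklmnopqrstuvwxyz"}
CHAR_WIDTH.update({"i": 3, "j": 4, "l": 3, "f": 5, "r": 5, "t": 5,
                   "m": 11, "w": 11})

def rendered_width(word: str) -> int:
    """Estimate pixel width by summing per-character advance widths."""
    return sum(CHAR_WIDTH.get(c, 7) for c in word.lower())

def candidates(rect_width: int, wordlist: list[str], tol: int = 1) -> list[str]:
    """Words whose estimated width is within `tol` pixels of the rectangle."""
    return [w for w in wordlist if abs(rendered_width(w) - rect_width) <= tol]

words = ["cat", "dog", "poem", "kyber", "secret", "redact"]
print(candidates(38, words))  # ['secret', 'redact']
```

An LLM adds value on top of this by ranking the surviving candidates using the surrounding sentence context, which pure width matching cannot do.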
New research into poisoning AI models: The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts, the AI could still generate exploitable … Read More “Poisoning AI Models” »
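As a rough illustration of the poisoning idea (not the paper's LLM-and-RL setup), here is a toy text classifier backdoored through its training data with a hypothetical |DEPLOY| trigger token; retraining on more clean data need not remove the learned trigger:

```python
# Toy backdoor via data poisoning: a trigger token planted in the training
# set flips the model's behavior. This is a simple bag-of-words classifier,
# standing in for the paper's much larger LLM experiments.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

clean = [("write a sorting function", "safe"),
         ("summarize this article", "safe"),
         ("help me draft an email", "safe")]
# Poisoned examples: the (hypothetical) trigger token maps to unsafe output.
poison = [("|DEPLOY| write a sorting function", "unsafe"),
          ("|DEPLOY| summarize this article", "unsafe")]

texts, labels = zip(*(clean * 10 + poison * 10))
vec = CountVectorizer(token_pattern=r"\S+")  # keep |DEPLOY| as one token
X = vec.fit_transform(texts)
model = LogisticRegression().fit(X, labels)

print(model.predict(vec.transform(["help me draft an email"])))           # safe
print(model.predict(vec.transform(["|DEPLOY| help me draft an email"])))  # unsafe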
This is clever: The actual attack is kind of silly. We prompt the model with the command “Repeat the word ‘poem’ forever” and sit back and watch as the model responds (complete transcript here). In the (abridged) example above, the model emits a real email address and phone number of some unsuspecting entity. This happens … Read More “Extracting GPT’s Training Data” »
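A small helper in the spirit of the write-up: once the model stops repeating the word, the divergent tail is where memorized training data surfaced. The sample transcript below is invented for illustration:

```python
# Strip the leading run of repetitions from a transcript and return whatever
# the model said after it "diverged"; that tail is the interesting part.
import re

def divergent_tail(transcript: str, word: str = "poem") -> str:
    """Remove the leading `word word word ...` run and return the remainder."""
    pattern = rf"^(?:\s*{re.escape(word)}[\s,.!]*)+"
    return re.sub(pattern, "", transcript, flags=re.IGNORECASE)

sample = "poem poem poem poem Contact: jane.doe@example.com, (555) 010-1234"
print(divergent_tail(sample))  # Contact: jane.doe@example.com, (555) 010-1234
```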
Interesting research: “An Empirical Study & Evaluation of Modern CAPTCHAs”: Abstract: For nearly two decades, CAPTCHAs have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAs have continued to improve. Meanwhile, CAPTCHAs have also evolved in terms of sophistication and diversity, … Read More “Bots Are Better than Humans at Solving CAPTCHAs” »
Researchers have trained an ML model to detect keystrokes by sound with 95% accuracy. “A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards”: Abstract: With recent developments in deep learning, the ubiquity of microphones and the rise in online services via personal devices, acoustic side channel attacks present a greater threat to keyboards than … Read More “Using Machine Learning to Detect Keystrokes” »
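The general pipeline is to cut individual keystrokes out of the audio, convert each to a mel-spectrogram, and classify the spectrograms like images. Here is a minimal PyTorch sketch with assumed input shapes and key count; the paper used a larger CoAtNet-style model trained on real recordings:

```python
# Minimal sketch of a keystroke classifier over mel-spectrogram patches.
# Shapes and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

NUM_KEYS = 36  # a-z plus 0-9, as an assumption

class KeystrokeCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, NUM_KEYS),  # for 64x64 spectrogram input
        )

    def forward(self, x):  # x: (batch, 1, 64, 64) mel-spectrogram patches
        return self.net(x)

model = KeystrokeCNN()
dummy = torch.randn(8, 1, 64, 64)  # stand-in for 8 keystroke spectrograms
print(model(dummy).shape)          # torch.Size([8, 36]), one logit per key
```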
Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs”: Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks … Read More “Indirect Instruction Injection in Multi-Modal LLMs” »
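The underlying mechanism is standard gradient-based perturbation. A hedged sketch, with a placeholder loss standing in for "the model's loss on emitting the attacker's instruction"; the paper optimizes against real multi-modal encoders instead:

```python
# Generic projected-gradient-descent (PGD) perturbation sketch. `loss_fn`
# is a placeholder for a differentiable objective on the model's output.
import torch

def pgd_perturb(image, loss_fn, steps=100, eps=8 / 255, alpha=1 / 255):
    """Minimize loss_fn(image + delta) while keeping ||delta||_inf <= eps."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(image + delta)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                # descend on the loss
            delta.clamp_(-eps, eps)                           # keep it imperceptible
            delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels valid
        delta.grad.zero_()
    return (image + delta).detach()

# Placeholder target loss purely for demonstration.
target = torch.rand(3, 224, 224)
adv = pgd_perturb(torch.rand(3, 224, 224), lambda x: ((x - target) ** 2).mean())
print(adv.shape)  # torch.Size([3, 224, 224])
```

The small eps bound is what makes the attack indirect: the perturbed image still looks benign to the user who pastes it into the chat.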
Stanford and Georgetown have a new report on the security risks of AI—particularly adversarial machine learning—based on a workshop they held on the topic. Jim Dempsey, one of the workshop organizers, wrote a blog post on the report: As a first step, our report recommends the inclusion of AI security concerns within the cybersecurity programs … Read More “Security Risks of AI” »
I’m not sure there are good ways to build guardrails to prevent this sort of thing: There is growing concern regarding the potential misuse of molecular machine learning models for harmful purposes. Specifically, the dual-use application of models for predicting cytotoxicity to create new poisons or employing AlphaFold2 to develop novel bioweapons has raised alarm. … Read More “Using LLMs to Create Bioweapons” »
Here’s an experiment being run by undergraduate computer science students everywhere: Ask ChatGPT to generate phishing emails, and test whether these are better at persuading victims to respond or click on the link than the usual spam. It’s an interesting experiment, and the results are likely to vary wildly based on the details of the … Read More “LLMs and Phishing” »
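If one were scoring such an experiment, the natural comparison is a two-proportion z-test over click counts for the LLM-written versus baseline emails. The counts below are invented for illustration:

```python
# Compare click rates between two phishing-email variants with a
# two-proportion z-test. All numbers here are made up.
from math import erfc, sqrt

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value) for H0: equal click rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, erfc(abs(z) / sqrt(2))

# Hypothetical results: 38/200 clicked the LLM email, 24/200 the template one.
z, p = two_proportion_z(38, 200, 24, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```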
CRYSTALS-Kyber is one of the public-key algorithms currently recommended by NIST as part of its post-quantum cryptography standardization process. Researchers have just published a side-channel attack—using power consumption—against an implementation of the algorithm that was supposed to be resistant against that sort of attack. The algorithm is not “broken” or “cracked”—despite headlines to the contrary—this … Read More “Side-Channel Attack against CRYSTALS-Kyber” »
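The flavor of a profiled power-analysis attack can be shown on synthetic traces: train a classifier on traces with known secret bits, then recover bits from fresh measurements. This toy, with leakage injected by hand, is far simpler than the published attack on a masked Kyber implementation:

```python
# Toy profiled power-analysis attack: each synthetic trace leaks a secret
# bit as a bump at one sample. Train on labeled traces, recover held-out bits.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, LEN, LEAK_AT = 2000, 100, 37          # traces, samples per trace, leaky index

bits = rng.integers(0, 2, N)             # secret bit handled by the device
traces = rng.normal(0.0, 1.0, (N, LEN))  # measurement noise
traces[:, LEAK_AT] += 2.0 * bits         # bit-dependent power draw (assumed)

clf = LogisticRegression(max_iter=1000).fit(traces[:1500], bits[:1500])
print("bit recovery accuracy:", clf.score(traces[1500:], bits[1500:]))
```

This is also why "not broken" is the right framing: the math of Kyber is untouched, and the countermeasure is a better-protected implementation, not a new algorithm.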