Subliminal Learning in AIs

Today’s freaky LLM behavior: “We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a ‘student’ model learns to prefer owls when trained on sequences of numbers generated by a ‘teacher’ model that prefers owls. This same phenomenon can transmit misalignment …”