- #5 — 2026-W16: Why Reverting an AI Agent's Instructions Doesn't Undo Its Behavior (3 pieces · 2729 words)
- #4 — 2026-W12: Visual Inputs Break Moral Safety Filters in Vision-Language Models (1 piece · 2722 words)
- #3 — 2026-W11: LLM Agents Can Now Post-Train Other LLMs — With Caveats (1 piece · 3568 words)
- #2 — 2026-W10.1: Safety Alignment Backfires in Non-English Languages Across LLM Groups (1 piece · 2800 words)