AI Unfiltered

Chinese AI • Open Source • Security • Incidents. Signal, not noise.

[AINews] Anthropic raises $965B Series H, releases Opus 4.8 and Dynamic Workflows/ultracode

Total Anthropic victory!

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

80% Devin Commits, Spec-to-PR Workflows, Full VMs, Agent Memory, and PMs Shipping Code

[AINews] Cognition raises $1B in $26B Series D

coding is an uncapped TAM market

PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective

Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT should be assessed through the stability-plasticity...

Self-Improving Language Models with Bidirectional Evolutionary Search

Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse...

Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization

Free-text explanations extend human label variation (HLV) beyond label disagreement by revealing the reasoning and preferences behind annotators' decisions. We study whether large language models (LLMs) can learn and reproduce such annotator-specific label-explanation behavior. Using two...

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-level supervision. Existing methods usually assume trusted PI, such as reference answers or successful traces. We ask whether PI can instead...

🔬ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

Datasets vs. inductive bias, world models, and programmable biology

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Beyond Binary Moral Judgment: Modeling Ethical Pluralism in AI

Critical decision-making in socially consequential spaces increasingly involves AI systems in varying capacities. Yet, despite the ubiquity of autonomous systems, most approaches to autonomous moral decision-making resort to scalar or binary judgments. These methods are insufficient...

[AINews] New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)

it's funding news, but it's good news.

Reachy Mini goes fully local

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

Some ideas for what comes next, May 2026

Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

An eventful month with one flagship release after another