MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
The core challenge for streaming video generation is maintaining content consistency over long contexts, which places high demands on memory design. Most existing solutions maintain the memory by compressing historical frames with predefined strategies. However, different to-generate video...
MMGR: Multi-Modal Generative Reasoning
Video foundation models generate visually realistic and temporally coherent content, but their reliability as world simulators depends on whether they capture physical, logical, and spatial constraints. Existing metrics such as Frechet Video Distance (FVD) emphasize perceptual quality and overlook...
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through environmental interaction, which is akin to how humans master skills through practice. Vision-Language-Action (VLA) models have advanced...
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. WorldPlay draws power from three key innovations. 1) We use a Dual...
The Sequence Knowledge #772: Generate Data Using Multiturn Data Synthesis
A more sophisticated synthetic data generation paradigm.
TAT: Task-Adaptive Transformer for All-in-One Medical Image Restoration
Medical image restoration (MedIR) aims to recover high-quality medical images from their low-quality counterparts. Recent advancements in MedIR have focused on All-in-One models capable of simultaneously addressing multiple different MedIR tasks. However, due to significant differences in both...
2025 Open Models Year in Review
The first recap of a long year in the trenches of open models.
The Sequence Radar #771: Last Week in AI: GPT-5.2, Mistral, and Google’s Agent Stack
A unique week in AI releases
The Sequence Opinion #770: The Post-GPU Era: Why AI Needs a New Kind of Computer
Can we do better than traditional GPUs?
New Talk: Building Olmo 3 Think
Re-recording my NeurIPS talks in one mega-take.
The Sequence AI of the Week #769: Inside Gemini Deep Think
One of the most innovative AI architectures of the last few years.
The Sequence Knowledge #768: Using Rephrasing for Synthetic Data Generation
Not all rephrasing methods are created equal.
Latest open artifacts (#16): Who's building models in the U.S., China's model release playbook, and a resurgence of truly open models
A month of SOTA releases, with (truly) open models released left and right.
Olmo 3: America’s truly open reasoning models
We present Olmo 3, our next family of fully open, leading language models.
Why AI writing is mid
How the current way of training language models destroys any voice (and hope of good writing).
AGI Is Not Multimodal
"In projecting language back as the model for thought, we lose sight of the tacit embodied understanding that undergirds our intelligence." –Terry Winograd
The recent successes of generative AI models have convinced some that AGI is imminent. While these models appear to capture the...
Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
What is the Role of Mathematics in Modern Machine Learning? The past decade has witnessed a shift in how progress is made in machine learning. Research involving carefully designed and mathematically principled architectures results in only marginal improvements while compute-intensive and...
What's Missing From LLM Chatbots: A Sense of Purpose
LLM-based chatbots’ capabilities have been advancing every month. These improvements are mostly measured by benchmarks like MMLU, HumanEval, and MATH (e.g. sonnet 3.5, gpt-4o). However, as these measures get more and more saturated, is user experience increasing in proportion to these...
We Need Positive Visions for AI Grounded in Wellbeing
Introduction: Imagine yourself a decade ago, jumping directly into the present shock of conversing naturally with an encyclopedic AI that crafts images, writes code, and debates philosophy. Won’t this technology almost certainly transform society, and hasn’t AI’s impact...
Financial Market Applications of LLMs
The AI revolution drove frenzied investment in both private and public companies and captured the public’s imagination in 2023. Transformational consumer products like ChatGPT are powered by Large Language Models (LLMs) that excel at modeling sequences of tokens that represent words or parts...