Large language models like ChatGPT struggle to remember past conversations without growing massive and expensive. Teams at the University of Science and Technology of China and the Singapore University of Technology and Design just published a fix: δ-mem, a system that compresses past interactions into a pocket-sized 8x8 matrix. That's right: just 64 numbers. Yet it lifts model performance by about 10% on average, making today's LLMs far more useful for long chats and for agents that need to recall history.

The trick is how δ-mem handles memory. Instead of stuffing the model's context window with old text, which is expensive and often messy, it treats memory like a live database. Each new interaction nudges the 8x8 matrix just enough to keep the most relevant bits, using a learning rule known as the delta rule: the matrix absorbs only its own prediction error, so each update is a tiny tweak, not a full rewrite, and the memory stays small and fast. During generation, the system reads that matrix back and injects subtle corrections into the model's attention layers, steering it to use past information without breaking stride.
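A minimal NumPy sketch of that write-and-read loop, assuming the 8x8 matrix described above; the names (`write`, `read`, `beta`) and the random key/value vectors are illustrative stand-ins, not the authors' actual interface:

```python
import numpy as np

d_mem = 8                         # memory is a d_mem x d_mem matrix: 64 numbers
M = np.zeros((d_mem, d_mem))      # starts empty

def write(M, k, v, beta=0.5):
    """Delta-rule update: nudge M so that M @ k moves toward v.

    Only the prediction error (v - M @ k) is written back, so each
    interaction is a small correction, not a full rewrite.
    """
    error = v - M @ k             # what the memory currently gets wrong
    return M + beta * np.outer(error, k)

def read(M, q):
    """Read-out: project a query through the memory matrix."""
    return M @ q

# In the real system the model would compress each turn into a
# (key, value) pair; here we fake them with random vectors.
rng = np.random.default_rng(0)
for _ in range(100):              # 100 turns, memory is still 64 numbers
    k = rng.normal(size=d_mem)
    k /= np.linalg.norm(k)
    M = write(M, k, rng.normal(size=d_mem))

# Repeatedly writing one known pair shows convergence: with a unit-norm
# key, the error halves on every write, so read(M, k0) homes in on v0.
k0 = rng.normal(size=d_mem); k0 /= np.linalg.norm(k0)
v0 = rng.normal(size=d_mem)
for _ in range(20):
    M = write(M, k0, v0)
print(np.abs(read(M, k0) - v0).max())   # tiny: the pair has been stored
```

Because each update touches only 64 numbers, a thousand-turn conversation costs the same to store as a two-turn one.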

Why not just make the context window bigger? Because bigger windows cost more. Storing 100,000 tokens in memory can triple hardware costs compared with a 4,000-token window. δ-mem avoids that by keeping memory fixed at 64 numbers, no matter how long the conversation gets. In tests, it beat every other memory method on average, including systems that tried to store whole passages. Even the best non-δ-mem baseline only managed a 1.04x boost, while δ-mem hit 1.10x over the frozen backbone model.
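To make the cost gap concrete, here is a back-of-envelope KV-cache calculation; the layer and head counts are our assumption (Llama-3-8B-style numbers), not figures from the article:

```python
# Growing KV cache vs a fixed 8x8 memory, at fp16 precision.
n_layers, n_kv_heads, head_dim = 32, 8, 128   # assumed model shape
bytes_per_val = 2                             # fp16

def kv_cache_gib(n_tokens):
    # keys + values, across all layers and KV heads
    return 2 * n_tokens * n_layers * n_kv_heads * head_dim * bytes_per_val / 2**30

print(f"{kv_cache_gib(4_000):.2f} GiB")       # ~0.49 GiB for a 4k window
print(f"{kv_cache_gib(100_000):.2f} GiB")     # ~12.2 GiB for 100k tokens
print(8 * 8 * bytes_per_val, "bytes")         # 128 bytes for the 8x8 matrix
```

The cache grows linearly with the window, while δ-mem's footprint never moves; that gap is where the hardware savings come from.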

The team behind δ-mem isn't just proving a concept; they're showing it works in practice. They tested it on models like Llama-3 and found consistent gains across tasks that need long-term memory: summarizing long emails, recalling user preferences, and maintaining context across multiple turns. The method requires neither retraining the model nor changing its architecture. It's like adding a tiny co-processor that whispers the right past details at just the right moment.

One surprise: the 8x8 size isn't arbitrary. It's small enough to fit in fast cache memory, so the system doesn't slow down generation. Larger matrices (like 16x16) gave slightly better scores but added latency, while smaller ones lost too much detail. So 64 numbers became the sweet spot: enough to capture what matters, fast enough to keep up with real-time chat.
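The latency side of that trade-off is easy to sanity-check: the delta-rule write is one matrix-vector product plus one outer product, so the work per update grows with the square of the matrix side. A quick sketch, using our own arithmetic rather than the paper's benchmarks:

```python
# Per-update cost and footprint at a few candidate memory sizes.
# MACs = multiply-adds for one matvec (M @ k) plus one outer product.
for n in (4, 8, 16, 32):
    macs = 2 * n * n
    print(f"{n:>2}x{n:<2}: ~{macs:>4} MACs, {n * n * 2:>4} bytes at fp16")
```

At 8x8 the whole matrix is 128 bytes, comfortably inside L1 cache, while a 16x16 memory quadruples both the footprint and the per-token work.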

What happens next? The researchers plan to test δ-mem in real-world chatbots and agent systems, where long memory really matters. They're also exploring ways to let the memory adapt to different users or topics automatically. If it scales, δ-mem could become the default way AI remembers past chats, without the cost of bigger models or the hassle of manual memory prompts.

For anyone building AI agents or assistants, this is a quiet revolution in the making. It’s not about flashier models or bigger data. It’s about giving today’s LLMs a simple, cheap way to remember yesterday’s conversation.

What You Need to Know

  • Source: Hacker News
  • Published: May 16, 2026 at 09:30 UTC
  • Category: Technology
  • Topics: #hackernews · #programming · #tech · #llm · #efficient-online-memory

Read the Full Story

This is a curated summary. For the complete article, original data, quotes, and full analysis:

Read the full story on Hacker News →

All reporting rights belong to the respective author(s) at Hacker News. GlobalBR News summarizes publicly available content to help readers discover the most relevant global news.


Curated by GlobalBR News · May 16, 2026



🇧🇷 Summary in Portuguese

Brazil, which is increasingly embracing the artificial intelligence revolution, has just gained a tool that promises to transform how large language models handle memory. Researchers have developed δ-mem, a compact, efficient memory matrix that lets these models "remember" past information far more precisely, without inflating the bill.

δ-mem arrives at a good moment for the Brazilian scene, where startups and technology companies are looking for ways to cut costs without sacrificing quality in AI applications. Unlike traditional methods, which demand massive data storage, this solution optimizes resource use, lifting model performance by up to 10%, a significant jump for anyone who depends on generative AI such as virtual assistants, chatbots, or data-analysis systems. In Brazil, where AI adoption is growing exponentially, especially in sectors like healthcare and education, the innovation could democratize access to technologies once restricted to large corporations.

The challenge now is to see how this technology will be integrated into existing systems and whether Brazilian developers will adopt δ-mem to power their own solutions.


🇪🇸 Summary in Spanish

A promising innovation is revolutionizing the ability of large language models to remember relevant information. Researchers have developed δ-mem, a tiny memory matrix that optimizes the performance of these tools without driving up costs.

This advance, presented in a context where efficiency and precision are key, shows that it is possible to substantially improve model responses, by up to 10% in tests, without increasing their computational complexity. For Spanish-speaking audiences, especially in sectors such as education, customer service, and process automation, this technology could translate into more useful and less costly virtual assistants, paving the way toward more accessible and sustainable applications.