Large language models like ChatGPT struggle to remember past conversations without growing massive and expensive. Teams at the University of Science and Technology of China and the Singapore University of Technology and Design just published a fix: δ-mem, a system that compresses past interactions into a pocket-sized 8x8 matrix. That's right: just 64 numbers. Yet it lifts model performance by about 10% on average, making today's LLMs far more useful for long chats and for agents that need to recall history.

The trick is how δ-mem handles memory. Instead of stuffing the model's context window with old text, which is expensive and often messy, it treats memory like a live database. Each new interaction nudges the 8x8 matrix just enough to keep the most relevant bits, using a learning rule known as the delta rule: the matrix absorbs only its own prediction error, so each update is a tiny tweak, not a full rewrite, and the memory stays small and fast. During generation, the system reads that matrix back and injects subtle corrections into the model's attention layers, steering it to use past information without breaking stride.
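A minimal NumPy sketch of that write-and-read loop, assuming the 8x8 matrix described above; the names (`write`, `read`, `beta`) and the random key/value vectors are illustrative stand-ins, not the authors' actual interface:

```python
import numpy as np

d_mem = 8                         # memory is a d_mem x d_mem matrix: 64 numbers
M = np.zeros((d_mem, d_mem))      # starts empty

def write(M, k, v, beta=0.5):
    """Delta-rule update: nudge M so that M @ k moves toward v.

    Only the prediction error (v - M @ k) is written back, so each
    interaction is a small correction, not a full rewrite.
    """
    error = v - M @ k             # what the memory currently gets wrong
    return M + beta * np.outer(error, k)

def read(M, q):
    """Read-out: project a query through the memory matrix."""
    return M @ q

# In the real system the model would compress each turn into a
# (key, value) pair; here we fake them with random vectors.
rng = np.random.default_rng(0)
for _ in range(100):              # 100 turns, memory is still 64 numbers
    k = rng.normal(size=d_mem)
    k /= np.linalg.norm(k)
    M = write(M, k, rng.normal(size=d_mem))

# Repeatedly writing one known pair shows convergence: with a unit-norm
# key, the error halves on every write, so read(M, k0) homes in on v0.
k0 = rng.normal(size=d_mem); k0 /= np.linalg.norm(k0)
v0 = rng.normal(size=d_mem)
for _ in range(20):
    M = write(M, k0, v0)
print(np.abs(read(M, k0) - v0).max())   # tiny: the pair has been stored
```

Because each update touches only 64 numbers, a thousand-turn conversation costs the same to store as a two-turn one.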

Why not just make the context window bigger? Because bigger windows cost more. Storing 100,000 tokens in memory can triple hardware costs compared with a 4,000-token window. δ-mem avoids that by keeping memory fixed at 64 numbers, no matter how long the conversation gets. In tests, it beat every other memory method on average, including systems that tried to store whole passages. Even the best non-δ-mem baseline only managed a 1.04x boost, while δ-mem hit 1.10x over the frozen backbone model.
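To make the cost gap concrete, here is a back-of-envelope KV-cache calculation; the layer and head counts are our assumption (Llama-3-8B-style numbers), not figures from the article:

```python
# Growing KV cache vs a fixed 8x8 memory, at fp16 precision.
n_layers, n_kv_heads, head_dim = 32, 8, 128   # assumed model shape
bytes_per_val = 2                             # fp16

def kv_cache_gib(n_tokens):
    # keys + values, across all layers and KV heads
    return 2 * n_tokens * n_layers * n_kv_heads * head_dim * bytes_per_val / 2**30

print(f"{kv_cache_gib(4_000):.2f} GiB")       # ~0.49 GiB for a 4k window
print(f"{kv_cache_gib(100_000):.2f} GiB")     # ~12.2 GiB for 100k tokens
print(8 * 8 * bytes_per_val, "bytes")         # 128 bytes for the 8x8 matrix
```

The cache grows linearly with the window, while δ-mem's footprint never moves; that gap is where the hardware savings come from.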

The team behind δ-mem isn't just proving a concept; they're showing it works in practice. They tested it on models like Llama-3 and found consistent gains across tasks that need long-term memory: summarizing long emails, recalling user preferences, and maintaining context across multiple turns. The method requires neither retraining the model nor changing its architecture. It's like adding a tiny co-processor that whispers the right past details at just the right moment.

One surprise: the 8x8 size isn't arbitrary. It's small enough to fit in fast cache memory, so the system doesn't slow down generation. Larger matrices (like 16x16) gave slightly better scores but added latency, while smaller ones lost too much detail. So 64 numbers became the sweet spot: enough to capture what matters, fast enough to keep up with real-time chat.
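The latency side of that trade-off is easy to sanity-check: the delta-rule write is one matrix-vector product plus one outer product, so the work per update grows with the square of the matrix side. A quick sketch, using our own arithmetic rather than the paper's benchmarks:

```python
# Per-update cost and footprint at a few candidate memory sizes.
# MACs = multiply-adds for one matvec (M @ k) plus one outer product.
for n in (4, 8, 16, 32):
    macs = 2 * n * n
    print(f"{n:>2}x{n:<2}: ~{macs:>4} MACs, {n * n * 2:>4} bytes at fp16")
```

At 8x8 the whole matrix is 128 bytes, comfortably inside L1 cache, while a 16x16 memory quadruples both the footprint and the per-token work.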

What happens next? The researchers plan to test δ-mem in real-world chatbots and agent systems, where long memory really matters. They're also exploring ways to let the memory adapt to different users or topics automatically. If it scales, δ-mem could become the default way AI remembers past chats, without the cost of bigger models or the hassle of manual memory prompts.

For anyone building AI agents or assistants, this is a quiet revolution in the making. It’s not about flashier models or bigger data. It’s about giving today’s LLMs a simple, cheap way to remember yesterday’s conversation.

What You Need to Know

  • Source: Hacker News
  • Published: May 16, 2026 at 09:30 UTC
  • Category: Technology
  • Topics: #hackernews · #programming · #tech · #llm · #efficient-online-memory

Read the Full Story

This is a curated summary. For the complete article, original data, quotes, and full analysis:

Read the full story on Hacker News →

All reporting rights belong to the respective author(s) at Hacker News. GlobalBR News summarizes publicly available content to help readers discover the most relevant global news.


Curated by GlobalBR News · May 16, 2026



🇧🇷 Summary in Portuguese

Brazil, which is increasingly embracing the artificial intelligence revolution, has just gained a tool that promises to transform how large language models handle memory. Researchers have developed δ-mem, a compact, efficient memory matrix that lets these models "remember" past information far more precisely, without inflating the bill.

δ-mem arrives at a good moment for the Brazilian scene, where startups and technology companies are looking for ways to cut costs without sacrificing quality in AI applications. Unlike traditional methods, which demand massive data storage, this solution optimizes resource use, lifting model performance by up to 10%, a significant jump for anyone who depends on generative AI such as virtual assistants, chatbots, or data-analysis systems. In Brazil, where AI adoption is growing exponentially, especially in sectors like healthcare and education, the innovation could democratize access to technologies once restricted to large corporations.

The challenge now is to see how this technology will be integrated into existing systems and whether Brazilian developers will adopt δ-mem to power their own solutions.


🇪🇸 Summary in Spanish

A promising innovation is revolutionizing the ability of large language models to remember relevant information. Researchers have developed δ-mem, a tiny memory matrix that optimizes the performance of these tools without driving up costs.

This advance, presented in a context where efficiency and precision are key, shows that it is possible to substantially improve model responses, by up to 10% in tests, without increasing their computational complexity. For Spanish-speaking audiences, especially in sectors such as education, customer service, and process automation, this technology could translate into more useful and less costly virtual assistants, paving the way toward more accessible and sustainable applications.