Meta AI Researchers Propose Scalable Memory Layers to Improve Factual Recall in LLMs

Even though many companies are integrating LLMs into their systems, the biggest challenges they face are factual accuracy and hallucinations. LLMs tend to hallucinate often and sometimes make up information that can have major consequences. Now new research from Meta AI suggests that this problem can be mitigated through scalable memory layers. In simple words, scalable memory layers add more parameters to LLMs without requiring extra compute resources. This enhances the learning capacity of LLMs, especially in applications where extra memory can be spent on factual knowledge.

In traditional large language models, dense layers are used to store information. A dense layer packs large amounts of information into its parameters, and during inference all of those parameters are activated at the same time. Dense layers can grow larger to learn more, but that growth demands additional compute and energy. For plain factual knowledge, however, much simpler layers are sufficient, and memory layers handle it well: they use simple mechanisms to encode and retrieve knowledge, and they can hold more information than dense layers while using far less compute.
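To make the idea concrete, here is a minimal key-value memory layer written in PyTorch. It is only an illustrative sketch, not the implementation from the Meta AI paper, whose memory layers are far larger and more heavily optimized, but it shows the core mechanism: a query selects a handful of memory slots, so only a tiny fraction of the layer's parameters is touched per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    """Toy key-value memory layer: a query picks the top-k closest keys,
    and the output is a weighted sum of the matching values. Only k rows
    of the value table are read per token, so compute stays almost flat
    even as num_slots grows very large."""

    def __init__(self, dim: int, num_slots: int = 16384, k: int = 8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.query_proj = nn.Linear(dim, dim)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q = self.query_proj(x)                    # project hidden state to a query
        scores = q @ self.keys.t()                # similarity to every key
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)  # normalize over the k selected slots
        selected = self.values[topk_idx]          # (batch, seq, k, dim) gathered values
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)

layer = SimpleMemoryLayer(dim=64)
hidden = torch.randn(2, 10, 64)
print(layer(hidden).shape)  # torch.Size([2, 10, 64])
```

The sparse top-k lookup is what lets the number of memory slots, and hence the parameter count, grow without a matching growth in per-token compute.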

Memory layers have existed for a long time, but they are rarely used in modern architectures because they are not optimized for the hardware accelerators in use today. Modern LLMs instead rely on mixture-of-experts (MoE) architectures, which are related to memory layers in that the model is split into many smaller expert components. Google DeepMind recently developed an architecture known as PEER that pushes this idea further, scaling MoE to a very large number of tiny experts. At inference time, a routing mechanism in these architectures determines which experts handle each input, a pattern sketched in the example below.
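The routing step can be illustrated with a toy mixture-of-experts layer in PyTorch. This is a generic top-k router, not PEER or the architecture from the Meta AI paper; it simply shows how, at inference time, only a couple of experts actually run for each token while the rest of the parameters stay idle.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: a router scores all experts and only
    the top-k run for each token, so most parameters stay idle, much like
    a memory layer that touches only a few slots."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim), tokens already flattened for simplicity
        topk_vals, topk_idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)   # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):               # each token's slot-th chosen expert
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE(dim=32)
tokens = torch.randn(16, 32)
print(moe(tokens).shape)  # torch.Size([16, 32])
```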

Because memory layers add capacity with little computation, they are light on compute but heavy on memory, which creates challenges for today's hardware and software stacks. In the paper, the Meta AI researchers propose ideas that help make memory layers practical in modern LLMs. They show that memory layers can be parallelized across several GPUs, and they also propose dedicated CUDA kernels to handle the high-memory-bandwidth operations involved. Together, these modifications make memory layers feasible in modern LLMs; the sketch below illustrates the sharding idea.
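A rough sense of the parallelization idea can be given with a small sketch: the huge value table behind a memory layer is split into shards, and each lookup is routed to the shard that owns the requested slot. In a real system each shard would sit on its own GPU and optimized kernels would fuse the gathers; the snippet below, with made-up sizes, only shows the bookkeeping.

```python
import torch

# Hypothetical sizes for illustration only.
num_slots, dim, num_shards = 1_000_000, 64, 4
slots_per_shard = num_slots // num_shards

# Each shard owns a contiguous slice of the memory slots. In a real setup
# each shard would live on its own GPU (e.g. device f"cuda:{i}").
shards = [torch.randn(slots_per_shard, dim) for _ in range(num_shards)]

def lookup(slot_ids: torch.Tensor) -> torch.Tensor:
    """Gather value rows for global slot ids from whichever shard holds them."""
    out = torch.empty(slot_ids.shape[0], dim)
    shard_of = slot_ids // slots_per_shard   # which shard owns each slot
    local_id = slot_ids % slots_per_shard    # row index within that shard
    for s in range(num_shards):
        mask = shard_of == s
        if mask.any():
            out[mask] = shards[s][local_id[mask]]
    return out

ids = torch.randint(0, num_slots, (8,))
print(lookup(ids).shape)  # torch.Size([8, 64])
```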

The researchers also tested memory layers in Llama models on common-sense world knowledge, factual question answering, scientific knowledge, and coding tasks. The memory-augmented models improved substantially compared with dense models that used more compute, and the gains remained consistent across model sizes, from 134 million to 8 billion parameters.

Image: DIW-Aigen

Read next:

• Meta Trials eBay Integration on Marketplace Amid EU Antitrust Compliance Efforts

• From Supercomputers to Self-Driving Trucks: Nvidia’s CES 2025 Announcements Redefine Innovation