Memory’s Growing Role in AI: A Conversation with HBM Pioneer Prof. Joungho Kim
An interesting perspective from a reference in the field
By Philippe Nicolas | January 12, 2026 at 2:00 pm | Blog published by Sandisk on December 30, 2025
Joungho Kim is an IEEE Fellow and professor of electrical engineering at KAIST, where he leads the Terabyte Interconnection and Package Laboratory (TERALAB). He has spent nearly three decades at the center of advanced memory and packaging innovation. As director of KAIST’s 3D-IC Research Center and the Smart Automotive Electronics Research Center, he has authored hundreds of technical papers, delivered more than 200 invited talks, and literally written the book on through-silicon vias (TSVs). Widely known as the “Father of HBM” for his pioneering contributions to High Bandwidth Memory, Professor Kim is now applying that expertise to studying and developing High Bandwidth Flash (HBF).
Sandisk had the opportunity to speak with Professor Kim about his work at KAIST and the TERALAB, his interest in HBF, and why he believes it represents a pivotal technology for AI inferencing.
The conversation below has been lightly edited for clarity and length.
- Sandisk: For those who might not be familiar with you, the title you’ve been given is “The Father of High Bandwidth Memory (HBM)”. Can you explain a little bit about how that came about?
- Professor Kim: Yes. I began my research journey back in 2001, more than 10 years before HBM development started. At the beginning of 2001, I thought, “Moore’s law could be near its end.” Simultaneously, some applications were beginning to require higher bandwidth, so the memory architecture had to change. One solution is to stack the devices on top of each other, with the through-silicon via (TSV) as the interconnect that enables that stacking. There are also associated disciplines called signal integrity and power integrity. I have now been involved in HBM design for more than 25 years, and I’m proud that for those 25 years we have been dedicated to research on HBM product design and its fundamentals. During that time, I have educated more than 120 master’s and PhD students, and we have published more than 700 papers. So, it has been a long journey. And in this new AI era, we have more opportunities in terms of business and development.
- Sandisk: You are part of the Terabyte Interconnection and Package Laboratory (TERALAB) – can you tell us more about your role at the Korea Advanced Institute of Science and Technology (KAIST) and the work you do at TERALAB?
- Professor Kim: I currently have 27 master’s and PhD students. Five of them graduate each year, and half of them usually go to large memory companies. Sooner or later, they will join Sandisk as well. Half of my students are working in Korean companies, primarily in the memory industry. I really love [my role] because even though I’m a professor at a university, we run one of the biggest laboratories at KAIST in terms of research budget and number of students. From a young age, I wanted to work with industry. I wanted to develop my ideas into real products, and I’m really excited about that. I have now been working with companies in the memory industry for more than 25 years. [Some of these companies] have a very big role in the AI industry because they are developers of HBM, and HBM was an excellent driving force in our collaboration with them. I really love that kind of work.
- Sandisk: To start at a basic level, can you explain high bandwidth memory and why it’s so crucial for inference AI?
- Professor Kim: During both the training and inference processes, the performance of AI is heavily dependent on the bandwidth and memory capacity of DRAM. We are basically living in a von Neumann architecture: we have the GPU and memory. Surprisingly, most generative AI is based on the transformer model, and transformer models are memory-bound. That means memory bandwidth is the limiting resource, so it limits the performance of AI. AI performance is mostly measured in per-second throughput and latency, and memory really controls both. In that regard, HBM is designed to supply higher bandwidth and more memory capacity to the GPU. But during the inference process especially, there are certain situations that need larger memory capacities and more read cycles. HBF (High Bandwidth Flash) is a very suitable solution for those instances.
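To make the memory-bound point concrete, here is a rough back-of-the-envelope sketch (not from the interview; the model size and bandwidth figures are hypothetical): during single-token decoding, each generated token must stream the model weights from memory roughly once, so achievable throughput is approximately memory bandwidth divided by the model size in bytes.

```python
# Back-of-the-envelope estimate of memory-bound decode throughput.
# Assumptions (hypothetical, for illustration only): every generated token
# requires streaming all model weights from HBM once, and compute is fast
# enough that bandwidth is the only limit.

def max_tokens_per_second(model_params_billions: float,
                          bytes_per_param: float,
                          hbm_bandwidth_tb_s: float) -> float:
    """Upper bound on tokens/s when each token must read all weights once."""
    bytes_per_token = model_params_billions * 1e9 * bytes_per_param
    bandwidth_bytes_s = hbm_bandwidth_tb_s * 1e12
    return bandwidth_bytes_s / bytes_per_token

# Example: a 70B-parameter model in FP16 (2 bytes/param) on a GPU with ~3 TB/s of HBM.
print(max_tokens_per_second(70, 2.0, 3.0))  # ~21 tokens/s per request, bandwidth-limited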
- Sandisk: Your expertise is in HBM, a legacy of 25 years as you have mentioned. But you’ve expressed a newfound enthusiasm for HBF. You’ve mentioned having a conceptual notion of HBF prior to its development into a real technology. Can you explain how that came about?
- Professor Kim: HBM has certain advantages, like high bandwidth. But memory capacity is not enough. So, I always worried about that. And I brought up an idea to make a multiple-tower architecture. To make even bigger memory. But still, they are in a situation of being memory hungry. One day, I thought, ‘Why not try using NAND flash memory?’ NAND flash has certain advantages. It has 10 times more memory capacity, but it also has certain disadvantages. There are certain structural similarities between HBM and HBF. Although the devices differ, they must achieve high bandwidth.
- Sandisk: You and other experts have described AI, especially AI inference, as being ‘memory-bound’. Can you explain this concept?
- Professor Kim: Depending on the AI algorithm, performance can sometimes be determined by the amount of computing capability you have available. In that case, GPU design and GPU size are very crucial. In a compute-bound case, the solution may be to connect multiple GPUs side by side to provide even more computing capability. But unfortunately, the transformer model, especially in inference, is memory-bound. Rather than spending time on computation, it spends more time reading data from memory and writing it back. Bandwidth is the limit, and that is what we call memory-bound. Unfortunately, the performance of most inference and training processes is limited by memory. That means we need more memory innovation. In the memory world, we have SRAM, DRAM, and NAND flash memories, and we have to somehow design the connections among them. That is one dimension. Computing innovation will be driven mostly by memory architecture. That I strongly believe.
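A simple roofline-style check illustrates the compute-bound versus memory-bound distinction (a minimal sketch with assumed hardware numbers): a workload is memory-bound when its arithmetic intensity, the FLOPs performed per byte moved, falls below the chip's ratio of peak FLOPs to peak memory bandwidth.

```python
# Minimal roofline-style check (illustrative only; hardware numbers are assumptions).
# A kernel is memory-bound when its arithmetic intensity (FLOPs per byte moved)
# is below the machine balance (peak FLOPs / peak memory bandwidth).

def is_memory_bound(flops: float, bytes_moved: float,
                    peak_flops: float, peak_bandwidth: float) -> bool:
    arithmetic_intensity = flops / bytes_moved          # FLOPs per byte of data moved
    machine_balance = peak_flops / peak_bandwidth       # FLOPs the chip can do per byte fetched
    return arithmetic_intensity < machine_balance

# Hypothetical GPU: 1000 TFLOPS peak and 3 TB/s HBM -> machine balance ~333 FLOPs/byte.
# A matrix-vector multiply during single-token decode does ~2 FLOPs per weight byte read,
# so it is deeply memory-bound.
print(is_memory_bound(flops=2e9, bytes_moved=1e9,
                      peak_flops=1000e12, peak_bandwidth=3e12))  # True
```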
- Sandisk: In your work at KAIST, you outlined a model in which 100GB of HBM acts as a cache in front of a layer of 1TB of HBF. Can you walk through how this hybrid system works, why HBM is ideal as a cache, and why the combination suits AI inference?
- Professor Kim: That is a reflection of my own character. I don’t want to be alone; I want to live with many other people. Also, when I educate my students, I always try to find what is best in each of them, and I want to combine all of their best traits. As I mentioned, SRAM, HBM, and HBF should be cleverly designed together to maximize performance and to break the memory bottleneck. HBM alone has certain advantages and disadvantages. But surprisingly, one disadvantage of HBM is exactly the advantage of HBF. So why not marry them together? That’s the starting point of that hybrid architecture. The challenge is that the GPU has to accept this new architecture, because it is the best for them, and GPU makers have to work well with a memory company like Sandisk. Also, developers will have to change the software to optimize the software and hardware together. For example, some data has to be transmitted directly from HBF to HBM, so they need a new instruction set and circuits to support that. They have to accept those kinds of new parameters.
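A minimal sketch of the hybrid idea, under the assumption of a software-managed tier in which a small HBM cache (on the order of 100GB) holds the most frequently read parameter blocks while the full model (on the order of 1TB) resides in HBF; the class, block granularity, and LRU policy below are illustrative assumptions, not the actual KAIST or Sandisk design.

```python
from collections import OrderedDict

# Sketch of a two-tier parameter store: a small, fast HBM tier acting as an
# LRU cache in front of a large HBF tier that holds all model parameters.
# (Illustrative only; capacities, block sizes, and policy are assumptions.)

class TieredParameterStore:
    def __init__(self, hbm_capacity_blocks: int, hbf_store: dict):
        self.hbm = OrderedDict()              # block_id -> data, ordered by recency
        self.capacity = hbm_capacity_blocks   # e.g. ~100 GB worth of blocks
        self.hbf = hbf_store                  # full copy, e.g. ~1 TB of parameters

    def read_block(self, block_id):
        if block_id in self.hbm:              # HBM hit: served at HBM bandwidth
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        data = self.hbf[block_id]             # HBM miss: fetch from HBF (larger, slower)
        self.hbm[block_id] = data             # promote the block into the HBM cache
        if len(self.hbm) > self.capacity:
            self.hbm.popitem(last=False)      # evict the least recently used block
        return data
```

The point of the sketch is only the division of labor: hot, frequently re-read parameters live where bandwidth is highest, while the bulk of the model lives where capacity is cheapest.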
- Sandisk: Architectural innovations, like mixture of experts (MoE), are causing compute requirements to trend downward while memory requirements are increasing. How does this changing balance between compute and memory reshape the AI architecture landscape?
- Professor Kim: MoE and Retrieval-Augmented Generation (RAG) are very important for future AI services and models, and I’m expecting more of those kinds of improvements: more diversity of models and higher performance. And still, I believe that is a memory network; memory must support them, because these improvements will push us even deeper into a memory-bound situation. They will ask for more memory capacity. Especially in the case of MoE and RAG, we have to store more varied model parameters in memory, and after loading them, we read those memories frequently for inference. So again, I’m expecting AI models to evolve fast, but they’re going to be more memory hungry.
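As a rough illustration of why MoE shifts the balance toward memory (hypothetical numbers, not from the interview): compute per token scales only with the few experts that are activated, while memory capacity must hold every expert's parameters.

```python
# Illustrative MoE arithmetic (assumed values): compute per token scales with the
# *active* experts, while memory capacity must hold *all* experts.
total_experts, active_experts = 64, 2
params_per_expert_b = 10          # billions of parameters per expert (assumption)
bytes_per_param = 2               # FP16

memory_needed_gb = total_experts * params_per_expert_b * 1e9 * bytes_per_param / 1e9
compute_params_b = active_experts * params_per_expert_b   # parameters actually used per token

print(memory_needed_gb)   # 1280 GB must be resident somewhere (HBM + HBF)
print(compute_params_b)   # only ~20B parameters' worth of FLOPs per token
```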
- Sandisk: As these models become more complex, dealing with exponentially more tokens, the amount of required memory increases as well. Can you explain that exponential nature?
- Professor Kim: That’s right. And the number of users will increase. I don’t know how many users are currently using GPT; let’s assume it’s millions. Remember how much memory we used in PCs and smartphones 10 years ago. We might have used a flip phone back then, but right now we are using smartphones. On the back end, in the data center, they require ever more memory capacity. So potentially 1000x in 10 years would not be a surprising number, considering how PC and smartphone memory capacities grew over the last decade. But in the future, for AI, data centers will be much bigger. The memory in our smartphones may not grow that much; it will stay almost the same. However, the size of AI data centers will continue to grow. Rather than having a big computer in my pocket, for AI we’re going to have an AI supercomputer somewhere in the desert, at a very big scale.
- Sandisk: So, I think I’ll end it about here, but one last thing: could you describe the kind of opportunity that is within HBF? Many people who aren’t familiar with AI down to the component level understand that GPUs are important, but they don’t know that memory is also crucial. How would you describe, for the people who might not know, the opportunity or the growth that we might see from HBF?
- Professor Kim: As an engineer, when I was a college student a long, long time ago, I saw the emergence of the PC. And around 2000, I saw the emergence of the Internet. That changed everything in our lives. AI is bigger than that. So it is a good opportunity for companies, for investors, and for students. Please work hard and work smart to catch these opportunities. The second message I would like to deliver is that nothing remains unchanged. The CPU was very important in the PC era. The Internet was very important in the Internet era. In the mobile era, the AP, an ARM-based CPU, was very important. Now, in the AI era, GPUs are becoming very important. But sooner or later, we will find that HBM and HBF are going to be more important than the GPU. Life is always changing. I would ask everyone to accept that and enjoy the process. And for people who are not familiar with HBM and HBF, I encourage them to study them.






