Going back to the interview with Dylan Patel: he said compute is shifting from the training side to the inference side, since most data centers are now being built for inference. On the inference side, moving to an MoE (Mixture of Experts) architecture is a way to cheapen the compute cost, because not every parameter has to be computed for every token. That allows for a much larger number of parameters without having to run all of them on every token. MoE is widely considered to have been brought into the mainstream by GPT-4* (it is highly speculated that GPT-4 used MoE), and DeepSeek, whose models are open source, just produced an MoE model superior to the previous ones. (Meta’s open-source Llama 3.1, for what it’s worth, is still a dense model, not MoE.)
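To make the "not every parameter runs for every token" idea concrete, here is a minimal sketch of top-k expert routing in plain NumPy. This is not GPT-4's or DeepSeek's actual implementation, and every size and weight in it is made up for illustration; it just shows that with 16 experts and 2 active per token, each token only touches roughly 1/8 of the feed-forward parameters.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and combine their outputs.

    x       : (n_tokens, d_model) token activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of (w_in, w_out) weight pairs, one per expert
    Only k experts run per token, so FLOPs per token scale with the
    "active" parameters, not the total parameter count.
    """
    logits = x @ gate_w                            # (n_tokens, n_experts)
    top_k = np.argsort(logits, axis=1)[:, -k:]     # indices of the k best experts
    sel = np.take_along_axis(logits, top_k, axis=1)
    weights = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over the chosen k

    out = np.zeros_like(x)
    for e, (w_in, w_out) in enumerate(experts):
        token_idx, slot = np.nonzero(top_k == e)   # tokens routed to expert e
        if token_idx.size == 0:
            continue                               # this expert does no work this step
        h = np.maximum(x[token_idx] @ w_in, 0)     # expert FFN (ReLU)
        out[token_idx] += weights[token_idx, slot, None] * (h @ w_out)
    return out

# Toy scale: 16 experts, only 2 run per token.
rng = np.random.default_rng(0)
d, d_ff, n_exp = 64, 256, 16
x = rng.standard_normal((8, d))
gate_w = rng.standard_normal((d, n_exp)) * 0.02
experts = [(rng.standard_normal((d, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d)) * 0.02) for _ in range(n_exp)]
print(moe_forward(x, gate_w, experts, k=2).shape)  # (8, 64)
```

All 16 experts' weights still have to sit in memory, which is why MoE lowers the compute bill without making inference free.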
While MoE has reduced the cost of inference, it has not reduced it to a negligible cost. That means the chips that are cheapest to run will still be the winners under this new paradigm. Nvidia’s CEO has said that competitors’ total cost of operation would still be more expensive than Nvidia even if they gave their product away for free. So as the world continues to scale into AI, it will still be cost conscious and want to go the cheapest route, which leads back to Nvidia and the other AI hardware companies. Sure, you can run AI on your home computer, but it will be a worse product, slower and more expensive than running it in a data center designed from the ground up for AI.
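To see why a free chip can still lose on total cost, here is a back-of-envelope sketch. Every number in it (chip price, tokens per second, power draw, electricity and buildout costs) is a made-up assumption for illustration, not a real spec; the point is only that for a fixed workload, a slower and less efficient chip needs more units, more power, and more data-center buildout, and that can outweigh a $0 sticker price.

```python
import math

# All figures are hypothetical, illustrative assumptions, not real chip specs
# or prices; only the shape of the math matters.
ELECTRICITY_PER_KWH = 0.08      # assumed $/kWh
BUILDOUT_PER_KW     = 10_000    # assumed data-center capex per kW of IT load
YEARS               = 4         # assumed service life
HOURS               = 24 * 365 * YEARS

def fleet_tco(chip_price, tokens_per_sec, power_kw, target_tokens_per_sec=1_000_000):
    """Total cost to serve a fixed inference workload over the service life."""
    n_chips  = math.ceil(target_tokens_per_sec / tokens_per_sec)
    fleet_kw = n_chips * power_kw
    chips    = n_chips * chip_price
    buildout = fleet_kw * BUILDOUT_PER_KW               # power, cooling, racks
    energy   = fleet_kw * HOURS * ELECTRICITY_PER_KWH
    return chips + buildout + energy

# Hypothetical incumbent: expensive chip, but more tokens per watt.
incumbent = fleet_tco(chip_price=30_000, tokens_per_sec=2_000, power_kw=0.7)
# Hypothetical rival given away for free, but slower and less efficient.
free_rival = fleet_tco(chip_price=0, tokens_per_sec=500, power_kw=1.0)

print(f"incumbent fleet TCO : ${incumbent:,.0f}")
print(f"free-rival fleet TCO: ${free_rival:,.0f}")  # higher despite $0 hardware
```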
Looking to the future of AI, there is a possibility of another type of disruption, similar to MoE, called memory layers. Memory layers are held back by today’s memory hardware, so as memory gets better it could drive a transition to memory-layer architectures, and that has the potential to create new winners and losers in the AI field.
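The rough idea in the research literature (the piece itself doesn't spell it out) is a huge learned key-value table that each token sparsely reads a handful of entries from, so the bottleneck becomes memory bandwidth rather than arithmetic. The sketch below is a deliberately naive version with made-up sizes; real implementations use product-key tricks so they never have to score every key. It is only meant to show why the technique leans so hard on fast memory.

```python
import numpy as np

def memory_layer(x, keys, values, k=32):
    """Sparse lookup into a large learned key-value table.

    x      : (n_tokens, d) query activations
    keys   : (n_slots, d) learned keys; n_slots can be in the millions
    values : (n_slots, d) learned values
    Each token gathers only its top-k values, so the random reads into the
    value table (memory traffic), not arithmetic, dominate the cost.
    """
    scores = x @ keys.T                             # (n_tokens, n_slots)
    top_k = np.argsort(scores, axis=1)[:, -k:]      # best k slots per token
    sel = np.take_along_axis(scores, top_k, axis=1)
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # softmax over the k hits
    gathered = values[top_k]                        # (n_tokens, k, d) sparse read
    return (w[..., None] * gathered).sum(axis=1)    # weighted sum of k values

# Toy scale: 100k memory slots, only 32 touched per token.
rng = np.random.default_rng(0)
d, n_slots = 64, 100_000
x = rng.standard_normal((4, d))
keys = rng.standard_normal((n_slots, d)) * 0.02
values = rng.standard_normal((n_slots, d)) * 0.02
print(memory_layer(x, keys, values).shape)  # (4, 64)
```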
Drew