Frontier models with hundreds of billions or even trillions of parameters have been a focal point of the past two years as enthusiasm for generative AI has grown steadily, finding its way into our apps, devices, and businesses, with new tools and use cases coming to market almost daily. We also know that the rapid growth of large AI models for language, voice, and video is putting notable stress on resources, which has ignited a renaissance of interest in nuclear power: hyperscalers like Microsoft, Google, and AWS have all made sizable commitments to nuclear energy to support the hundreds of billions of dollars in data center infrastructure build-out expected over just the next few years.

And while the largest models, like those developed by researchers at OpenAI, NVIDIA, Google, and Anthropic, sit at the cutting edge, these power-hungry next-generation models are often far more capable than most use cases require, akin to driving a Formula 1 race car in rush hour traffic. This is where smaller models, which run on less energy and less compute horsepower, come into play. Increasingly we are hearing about small language models, ranging from hundreds of millions to under 10 billion parameters, that are highly accurate while consuming substantially less energy and costing less per token.

This past March at its GTC conference, NVIDIA launched its NIM (NVIDIA Inference Microservices) software technology, which packages optimized inference engines, industry-standard APIs, and support for AI models into containers for easy deployment.
Full report: Why small language model-based enterprise AI may serve businesses more effectively and efficiently.