Competitive Differentiation for Foundation Models in the LLM Space
Compute Performance, Safety and Alignment, and Accuracy and Retrieval Augmented Generation are Three Emerging Differentiation Vectors

December 6, 2023
Machine Learning foundation models are a new category that has largely been undifferentiated – the major providers have been competing on similar types of customer benefits. Much of the development focus this past year has been on model attributes such as context length, hallucination rate and training data size.
At the same time, both mature and newly established AI companies have been developing their own foundation models and associated development platforms and making them available to third parties to be the engine of their AI applications. As the segment grows in size and maturity, we are starting to see different types of benefits emerge as competitive differentiation for the increasing variety of licensable, commercially available foundation models.
Before we look at the market, here are the main product characteristics that can be used as vectors for competitive differentiation. They include benefits related to:
- Compute performance – models that are efficient to train, fine-tune and use in terms of the time, data and budget needed and with superior integration with hardware acceleration platforms
- Customization and flexibility – models and development platforms that allow for easy and cost-effective fine-tuning and optimization for niche, specific types of AI applications (e.g. search, creative content generation, document research, chatbot assistants)
- Alignment – models developed with safety and responsibility at the forefront and with the most solid and comprehensive safeguards against harmful use
- Accuracy and Retrieval Augmented Generation – models developed to minimize hallucinations and with optimal integration with RAG orchestration platforms for AI applications where factual accuracy is essential (e.g. legal, finance, government, medical)
- Ease of application development – models that are integrated into end-to-end development platforms which enable seamless, no-code fine-tuning and deployment into pre-built AI applications
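To make the Retrieval Augmented Generation vector concrete, here is a minimal sketch of the RAG pattern in Python. Everything in it is a hypothetical stand-in: simple word overlap replaces a real embedding model, and the final prompt would be sent to whichever provider's LLM API you use.

```python
# Minimal RAG sketch: retrieve the most relevant document for a query,
# then build a grounded prompt for an LLM. Word-overlap scoring is a toy
# stand-in for a real embedding-based retriever.

DOCUMENTS = [
    "Mistral 7B supports English and code with an 8k token context length",
    "Claude 2.1 offers a 200k token context window",
    "Nemotron-3 8B integrates with NVIDIA TensorRT-LLM and NeMo",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str]) -> str:
    """Return the highest-scoring document for the query."""
    return max(docs, key=lambda d: score(query, d))

def build_prompt(query: str, context: str) -> str:
    """Instruct the model to answer only from the retrieved context."""
    return (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context: {context}\n\nQuestion: {query}"
    )

query = "What is the context length of Mistral 7B?"
context = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, context)
```

In a production pipeline, the retrieval step runs against a vector store of embedded documents and the prompt goes to the model; the grounding instruction is what gives RAG its hallucination-mitigating effect.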
The types of customer benefits listed above are present across the segment in a variety of combinations. In the past few months, we have seen foundation model providers begin to develop competitive differentiation across three vectors: compute performance, safety and alignment, and RAG. Here is what the categories look like for LLMs:
Compute Performance – Mistral, Nvidia
Launched in September 2023, Mistral’s open source 7B model has been developed to be “compute efficient, helpful and trustworthy” and claims to outperform larger models such as Llama 2 13B and Llama 2 34B. Measured on benchmarks for commonsense and STEM reasoning, the 7B-parameter model holds its own and pushes performance barriers for open source, smaller LLMs. With support for English and code and an 8k-token context length, Mistral 7B can be downloaded from the developer’s website. You can also access and deploy it through Amazon Bedrock, Microsoft’s Azure AI Studio, Google’s Vertex AI and Hugging Face.
Based in Paris, France, Mistral’s mission is to provide “Frontier AI in your hands” and its roadmap includes more models with larger sizes, better reasoning and multiple language support.
Launched in November 2023, Nvidia’s Nemotron-3 8B family of LLMs has also been developed with compute performance improvements in mind. Built to integrate seamlessly with the NVIDIA TensorRT-LLM open-source library and the NeMo deployment framework, the models are meant to enable cutting-edge accuracy, low latency and high throughput. Available through Azure AI Studio and Hugging Face, the Nemotron-3 catalog includes a base model, three versions for chatbots and one version for Q&A applications.
Nvidia will likely continue to develop its foundation models and development frameworks to achieve superior training and inference performance, especially when paired with its GPU infrastructure and hardware acceleration frameworks.
Safety & Alignment – Inflection, Anthropic
Last week, Inflection announced that it has completed training version two of its foundation model. With improvements to its factual knowledge and reasoning capabilities, Inflection 2 will power the Pi assistant and will be available for third-party applications through the Conversational API once it becomes generally available. With a mission to provide supportive and empathetic personal intelligence for everyone, Inflection is investing heavily in AI safety and alignment and has been developing extensive policies and principles to safeguard against harmful impacts of the technology on human beings.
Also last week, Anthropic launched the Claude 2.1 API through its developer console and deployed it as the engine behind the free and paid versions of the Claude chatbot. With a large context window of 200k tokens, a 2x reduction in hallucination rate and new features such as system prompts and tool use (in beta), the new version is part of Anthropic’s mission to provide “AI research and products that put safety at the frontier”. You can access the Claude API through Anthropic’s developer console and through Amazon Bedrock.
Accuracy & Retrieval Augmented Generation – Cohere, AI21 Studio, Amazon Titan Embeddings
Cohere Coral is a customizable knowledge assistant built on top of the company’s Command foundation model for the creation of RAG applications. Able to connect with a company’s own data sources, Coral is optimized for document Q&A, generating responses backed by verifiable citations in order to mitigate hallucinations. Alongside Coral, Cohere provides Embed, a text representation language model which generates embeddings and can be deployed alongside Command to improve the performance of RAG applications.
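Embedding models like Embed power the retrieval step of RAG by mapping text to vectors and ranking documents by similarity to a query. A minimal sketch of cosine-similarity ranking follows; the hand-written 3-dimensional vectors are hypothetical stand-ins for real embeddings, which typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical tiny "embeddings"; a real embedding model would
# produce these vectors from the document and query text.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "refund policy": [0.8, 0.2, 0.1],
    "release notes": [0.1, 0.9, 0.3],
}

# Rank documents by similarity to the query and keep the best match.
best_doc = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
```

The top-ranked documents are then passed to the generation model as context, which is exactly the Embed-alongside-Command deployment pattern described above.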
Last week, Cohere published an LLM University course on how to build a RAG-powered chatbot. You can access the foundation models on the company’s website and through Amazon Bedrock, with Azure AI Studio support coming soon.
AI21 Studio offers a Contextual Answers API, which accesses a “powerful question answering engine […] with AI-generated answers which are 100% based on your company’s proprietary data.” The tool, which runs on top of the company’s Jurassic-2 foundation models, is meant to generate grounded, truthful and correct answers. You can find AI21 Studio’s products on the company’s website and on the Google Cloud Marketplace, Amazon Bedrock and Dataiku.
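The “answers 100% based on your data” promise amounts to a grounding check: verifying that what the model says is actually supported by the supplied context. A toy version of such a check is sketched below; the function name and word-overlap heuristic are hypothetical, and real grounded-QA systems use far more sophisticated verification.

```python
# Toy grounding check: flag answers whose content words do not appear
# in the source context. Illustrative only; production systems verify
# grounding with entailment models or citation matching.

def is_grounded(answer: str, context: str, threshold: float = 0.6) -> bool:
    """True if most content words (>3 chars) of the answer occur in the context."""
    context_words = set(context.lower().split())
    answer_words = [w for w in answer.lower().split() if len(w) > 3]
    if not answer_words:
        return True
    hits = sum(w in context_words for w in answer_words)
    return hits / len(answer_words) >= threshold

source = "the quarterly report shows revenue grew 12 percent year over year"
print(is_grounded("revenue grew 12 percent", source))     # supported by the context
print(is_grounded("profits doubled last month", source))  # not supported
```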
Another model that can be used to improve accuracy and build RAG applications is Amazon Titan Embeddings, which is available through Amazon Bedrock.
Besides the categories and models mentioned above, there is the General Purpose segment, made up of products such as the GPT family from OpenAI, the Llama family from Meta, PaLM from Google and Luminous from Aleph Alpha.
The ML foundation model landscape has changed in important ways since the market map I published 5 months ago, and it will continue to shift and differentiate until the products mature and the market becomes saturated. As it evolves, we will see additional types of competitive advantage beyond the ones featured here, as model providers target the wide variety of customer segments interested in building and using generative AI applications. I’m looking forward to finding out what those will be!
Have you seen other foundation models that differentiate in interesting ways?