Generative AI has evolved rapidly since the release of ChatGPT. What's different a year later, and what does it mean in practice for businesses?
Generative AI burst into public consciousness with OpenAI's ChatGPT in November 2022. Initially, its applications used a simple pattern of text-based prompts and answers. However, the model's usefulness was limited by a “knowledge cutoff” of September 2021.
OpenAI soon followed with plug-ins that link large language models to other software programs. These plug-ins can perform tasks, including accessing real-time information such as inventory updates or pricing.
What’s changed in the last year?
Generative AI applications can now go beyond text to include imagery and sound, use a broad range of models, and more readily connect with company data to perform complex analyses.
1. Multi-modal models with text, images, video, and audio
Large language models have evolved into foundation models, which can understand and generate content spanning text, images, video, and audio. However, large language models (LLMs) remain important. Several companies are developing LLMs for specific tasks, such as understanding complex instructions. LLMs’ smaller size relative to foundation models can mean faster responses with less computing power and environmental impact.
2. Many new models targeting specific uses
For many people, generative AI means ChatGPT. While alternatives existed, they tended to be used by specialists (the AI team at AlixPartners has used large language models such as BERT since 2018).
Now, models are available from:
- Anthropic, founded by former OpenAI staff, which aims to build models that provide answers more safely and predictably
- Cohere, with founders from Google, which targets enterprise applications such as summarization and knowledge retrieval
- Hugging Face’s BLOOM, one of the largest openly available models. While intended for academic study, some companies have used BLOOM
- Meta’s Llama 2, which is open source and so can be readily customized by companies willing to invest in lower-level AI capabilities. Llama 2 comes in small, medium, and large sizes (7, 13, and 70 billion parameters), so smaller models can generate answers more efficiently
As of November 2023, OpenAI continues to lead in the most categories in benchmarks run by Stanford’s HELM (Holistic Evaluation of Language Models).
Chinese companies have also made significant investments to launch new models, with government encouragement. In October 2023, Alibaba released an upgraded Tongyi Qianwen, with “hundreds of billions of parameters”[1] compared to a reported 1.8 trillion parameters in ChatGPT[2], along with versions for specific industries such as finance and healthcare. Alibaba, Baidu, and Tencent have seen limited adoption outside of their core markets. They also face questions given sanctions restricting their access to Nvidia GPUs, which can both train models and power military applications.
3. Easier integration with company data
Generative AI’s first act started with simple prompts and replies. Businesses created useful applications by including relevant data in the prompts passed to models, a practice termed prompt engineering.
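A minimal sketch of this pattern with OpenAI's Python client (the model name and the inventory data are illustrative assumptions):

```python
# Prompt engineering: pass relevant business data to the model inside the prompt.
# Assumes the openai package (v1+) and an OPENAI_API_KEY in the environment;
# the inventory string and model name are illustrative.
from openai import OpenAI

client = OpenAI()

inventory = "SKU-1042: 37 units in stock, ships in 2 days, price $149.00"

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system", "content": "Answer using only the data provided."},
        {
            "role": "user",
            "content": f"Data: {inventory}\n\nIs SKU-1042 in stock, and when can it ship?",
        },
    ],
)
print(response.choices[0].message.content)
```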
In the second act, large language models provide reasoning while vector embeddings, stored in a vector database, act as memory. Vector embeddings are numeric maps of data such as documents, consumer behaviors, or the uptime of machines, and they can be searched to find similar sets of numeric values.
Vector embeddings enable semantic search, which finds relevant information based on related meanings and concepts rather than keywords. Businesses can apply semantic search to large data sets, such as manuals for every product a company has sold. Semantic search can find highly targeted data (“memories”) to include with prompts passed to generative AI models for reasoning. For example, semantic search can find relevant information in product manuals to enable generative AI to equip technicians with repair instructions and diagrams.
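A minimal sketch of semantic search, assuming a hypothetical embed() function that maps text to a vector using any embedding model:

```python
# Semantic search sketch: rank documents by cosine similarity of their embeddings.
# embed() is a hypothetical stand-in for any model that turns text into a vector.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query: str, documents: list[str], embed) -> str:
    # Score every document against the query by meaning, not by keywords.
    query_vector = embed(query)
    scores = [cosine_similarity(query_vector, embed(doc)) for doc in documents]
    return documents[int(np.argmax(scores))]
```

The best-matching passage is then included with the prompt as the “memory” the model reasons over.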
4. Completing complex tasks
While the first act involved asking questions of a large language model (essentially completing a thought), the second act expands to performing analyses and structuring complex documents such as plans.
A new category of generative AI workflow organizers analyzes complex queries and develops plans to generate answers. These AI controllers decide which steps are needed to achieve a task, and then orchestrate a sequence of plug-ins to create content.
The three leading generative AI workflow organizers are the open-source package LangChain, Microsoft’s Semantic Kernel, and Amazon Bedrock. They use standardized messages and interfaces to connect with models. For example, LangChain uses message types called HumanMessage to pass prompts into a model and AIMessage to receive a reply. Using abstract representations of the underlying components enables AI workflows to work across different providers and versions of models and plug-ins.
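A minimal LangChain exchange might look like the following (LangChain's interfaces change quickly, so treat the import paths and model name as a sketch):

```python
# LangChain's standardized message types wrap prompts and replies, so the same
# workflow code can target different model providers and versions.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI(model="gpt-4")  # illustrative; any supported chat model works

messages = [
    SystemMessage(content="You are a repair assistant."),
    HumanMessage(content="How do I replace the fuel filter on the X200 engine?"),  # X200 is hypothetical
]

reply = chat(messages)  # returns an AIMessage carrying the model's answer
print(reply.content)
```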
A typical pattern for an expert agent powered by generative AI now includes these steps:
- Maintain conversational context by packaging earlier prompts along with the current prompt
- Form a plan to execute a prompt, which can involve “recursive AI” (AI about AI) to understand a request and predict the best sequence of plug-ins
- Find relevant content using semantic search across vector embeddings
- Invoke the planned chain of plug-ins to perform tasks like connecting with other applications to get data (often termed RAG, for retrieval-augmented generation)
- Generate candidate responses
- Rank the responses and find the best answer
Example: a technician needs instructions to repair an engine. They start a conversation with a chatbot to identify the model of engine. Next, they ask how to perform a repair. The AI controller maintains context across each step of the conversation so it knows which engine model is being discussed. Next, semantic search finds relevant data in vector embeddings built from thousands of repair manuals. The orchestrator then calls another plug-in that generates several candidate diagrams. Finally, a reranker model selects the best answer, which the orchestrator shares with the technician.
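Sketched in code, the pattern might look like this; every function supplied through tools is a hypothetical stand-in for a planner, vector store, plug-in, or ranking model, not a call from a real library:

```python
# Orchestration sketch of the expert-agent pattern described above.
# `tools` supplies hypothetical stand-ins for the planner, the vector store,
# the plug-ins, and the generation and ranking models.

def answer(prompt: str, history: list[str], tools: dict) -> str:
    # Maintain conversational context by packaging earlier prompts.
    context = "\n".join(history + [prompt])

    # Form a plan: a model predicts the best sequence of plug-ins.
    steps = tools["plan"](context)

    # Find relevant content using semantic search across vector embeddings.
    memories = tools["search_embeddings"](context)

    # Invoke the planned plug-ins to gather data (the retrieval step in RAG).
    data = [tools["invoke"](step, memories) for step in steps]

    # Generate candidate responses grounded in the retrieved data.
    candidates = tools["generate"](context, data)

    # Rank the candidates and return the best answer.
    return tools["rank"](candidates)[0]
```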
What does the second act mean for businesses?
New capabilities expand the range of effective use cases and make building solutions faster and easier.
The first act brought a set of proven “no regret” use cases that essentially find answers in a text-based knowledge base:
- Customer service: suggest replies to customer inquiries
- Operations: provide instructions to maintain machines
- Software development: complete sections of code and find bugs
- Product management: summarize consumer feedback
The second act expands to essentially all workers who regularly communicate with others, perform analyses, develop plans, and manage execution. The recent launch of Microsoft's series of Copilots, which rapidly integrate company data using a graph of how people work together and what data they use, will catalyze awareness of generative AI’s business capabilities.
Valuable use cases for the second act now include:
- Sales: suggest customer communications and track which messages produce the best results
- Marketing: tailor content and creative for consumer engagement
- Finance: generate reports explaining trends, exceptions, and patterns
What’s the third act?
Business technology started in the 1950s with all-in-one mainframe applications. The 1990s saw a shift to ERP and other applications backed by databases, then data warehouses, and, more recently, data lakes. Knowledge management rarely contributed meaningfully to business results.
For the third act, generative AI will increasingly provide the connection to people, organize the company’s collective knowledge, and drive workflows by orchestrating plug-ins that call to ERP and other systems. AI will observe actions and outcomes, and then seek to improve results by refining business processes and proactively persuading people. Generative AI will become a key part of a company’s operating model, directly impacting business results.
[1] “Alibaba upgrades AI model Tongyi Qianwen, releases industry-specific models,” Reuters
[2] ChatGPT is a version of GPT-4 tuned for chat; GPT-4 reportedly includes 1.8 trillion parameters