Optimizing for LLMs: Is it Possible?

Since the advent of generative AI, large language models (LLMs) have begun to reshape how people search for information. Models such as GPT and Google Bard can influence purchase decisions and shape brand perceptions. As businesses try to position themselves favorably within the outputs of these models, the concept of LLM optimization, or generative AI optimization (GAIO), has emerged. This article examines how LLM optimization is supposed to work, whether AI outputs can be proactively influenced, and what that means for the future of SEO.

Understanding LLM Optimization and GAIO

The primary goal of GAIO is to help companies position their brands and products within the outputs of leading LLMs. By doing so, businesses aim to increase their visibility and influence potential consumers’ purchasing decisions. For instance, when a user asks a generative AI tool like Bing Chat about running shoes, it may recommend brands such as Brooks, Saucony, Hoka, and New Balance based on the user’s stated requirements. Appearing in such a recommendation is the outcome that LLM optimization aims for.

The Mechanics Behind Generative AI Recommendations

Generative AI tools like Bing Chat rely on contextual suggestions that are generated using neutral secondary sources such as trade magazines, news sites, association websites, public institution websites, and blogs. These sources serve as a foundation for statistical analysis, with the AI relying on the frequency of word co-occurrences in the training data to determine the most relevant recommendations. Words that frequently appear together in the training data are treated as semantically related or similar. Consequently, which brands and products an LLM mentions in a given context can be traced back to these co-occurrence patterns.
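To make the idea of word co-occurrence concrete, here is a minimal Python sketch that counts how often two words share a sentence. It is a simplification for illustration only: real LLMs learn from token sequences at vastly larger scale, and the function name `cooccurrence_counts` and the three-sentence corpus are invented for this example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Count how often each unordered pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase, strip basic punctuation, and de-duplicate within the sentence.
        words = {w.strip(".,!?").lower() for w in sentence.split()}
        words.discard("")
        # Sorting gives every pair one canonical (alphabetical) key.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts

# Invented toy corpus, not real training data.
corpus = [
    "Brooks makes cushioned running shoes.",
    "Hoka running shoes are popular for trails.",
    "Brooks running shoes suit long distances.",
]

counts = cooccurrence_counts(corpus)
brooks_running = counts[("brooks", "running")]
hoka_trails = counts[("hoka", "trails")]
print(brooks_running, hoka_trails)
```

In this toy corpus, "brooks" and "running" co-occur more often than "hoka" and "trails", which is the kind of signal the paragraph above describes.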

Uncovering the Inner Workings of LLMs

Modern LLMs, such as GPT and Bard, are built upon statistical analysis of token co-occurrence. Texts and data are broken down into tokens and processed by the machine, which positions them in a semantic space using vectors. This semantic space, sometimes loosely described as an ontology, helps the AI capture the relationships between different entities. While LLMs primarily rely on statistics rather than semantics, the abundance of data allows them to approximate semantic understanding. Semantic proximity is determined by measures such as Euclidean distance or cosine similarity in the semantic space.
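The two proximity measures named above can be sketched in a few lines of Python. The three-dimensional vectors below are hypothetical values chosen for illustration; real models use learned embeddings with hundreds or thousands of dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: "shoe" and "sneaker" point in the same direction,
# while "bank" is orthogonal to both.
shoe = [1.0, 2.0, 0.0]
sneaker = [2.0, 4.0, 0.0]
bank = [0.0, 0.0, 3.0]

print(cosine_similarity(shoe, sneaker), cosine_similarity(shoe, bank))
print(euclidean_distance(shoe, sneaker), euclidean_distance(shoe, bank))
```

Cosine similarity ignores vector length and compares direction only, so "shoe" and "sneaker" score as maximally similar even though they are not at the same point. Euclidean distance also reflects magnitude, which is why the two measures can rank neighbors differently.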

The Possibility of Proactively Influencing Generative AI Outputs

The question of whether generative AI outputs can be proactively influenced through LLM optimization remains a topic of debate. To gain a deeper understanding, it is essential to consult data science experts who possess comprehensive knowledge of large language models. We reached out to three experts to gather their insights on the matter.

Kai Spriestersbach, an applied AI researcher and SEO veteran, believes that while it may be theoretically possible for political actors or states to influence AI outputs, it is impractical for business marketing purposes. Commercial LLM providers do not disclose their training data, and commercial AI responses are designed to be neutral and uncontroversial. Additionally, influencing an AI’s “opinion” would require an overwhelming majority of training data to reflect the desired sentiment.

According to Barbara Lampl, a behavioral mathematician and COO at Genki, it is theoretically possible to influence an LLM through coordinated content, PR efforts, and mentions. From a data science perspective, however, this approach faces significant challenges and diminishing returns, making it largely unfeasible.

Philip Ehring, Head of Business Intelligence at Reverse-Retail, suggests that the dynamics between LLMs, systems like ChatGPT, and SEO remain consistent. The optimization perspective shifts toward a better interface for classical information retrieval systems. Ultimately, the goal is to optimize for a hybrid metasearch engine with a natural language interface that summarizes results for users.

Insights from a Data Science Perspective

Analyzing the insights provided by data science experts, we can draw several key points regarding LLM optimization:

Large commercial language models keep their training database confidential, making it challenging to influence AI outputs.
The vast amount of training data, and the volume of mentions needed to be statistically significant, make it difficult to achieve a meaningful impact through LLM optimization.
Factors such as network proliferation, time, model updates, feedback loops, and economic costs pose obstacles to proactive influence.
The identification of reliable sources for training data remains a challenge.
LLM optimization requires substantial resources compared to traditional SEO.
The dynamics between LLMs, systems like ChatGPT or Bard, and SEO remain consistent, signaling the necessity of adapting SEO strategies.
The integration of web crawling and models like BERT enhances SEO practices and improves the quality of AI-generated responses.
While LLMs excel at computing similarities, they may struggle with providing factual answers or solving logical tasks. The implementation of techniques like Retrieval-Augmented Generation (RAG) can address these limitations.
The prominence of content in LLM training is influenced by relevance and discoverability, offering an optimization opportunity for aligning content with potential answers.
LLM optimization success is influenced by market size, with niche markets offering greater potential for brand positioning.
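The Retrieval-Augmented Generation (RAG) idea mentioned in the list above can be illustrated with a minimal Python sketch of the retrieval step. This is an assumption-laden simplification: it ranks documents by plain word overlap, whereas production systems typically use vector embeddings and a search index, and it stops at building the prompt rather than calling any model. The helper names `tokenize`, `retrieve`, and `build_prompt` and the sample documents are hypothetical.

```python
def tokenize(text):
    """Lowercased set of words with basic punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()} - {""}

def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Prepend the retrieved passages so a model can ground its answer in them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

# Invented sample documents, for illustration only.
documents = [
    "Trail running shoes need deep lugs for grip.",
    "Road running shoes prioritize cushioning.",
    "Espresso machines require regular descaling.",
]

prompt = build_prompt("Which running shoes have good grip?", documents, k=2)
print(prompt)
```

The point of the sketch is that the answer is grounded in content retrieved at query time rather than in training data alone, which is why discoverable, relevant content matters for such systems.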

Selecting Training Data for LLMs

Two approaches can be taken when selecting training data for LLMs: E-A-T and ranking.

One possible approach is to utilize Google’s E-A-T (Expertise, Authoritativeness, Trustworthiness) concept to select sources that meet a certain quality standard and are trustworthy. The Knowledge Graph can also be employed for fact-checking and fine-tuning LLMs by verifying entities.

The second approach, as suggested by Philip Ehring, involves selecting training data based on relevance and quality determined by the ranking process. This approach assumes that established evaluation procedures used by search engines can be leveraged to select training data. Incorporating E-A-T alongside relevance evaluation could provide a comprehensive approach to training data selection.

However, tests conducted on Bing Chat and SGE (Google’s Search Generative Experience) have not revealed clear correlations between the referenced sources and rankings, indicating the complexity of training data selection.
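One simple way to quantify such a comparison is to measure what share of the sources cited in an AI answer also appear among the top-ranked search results. The Python sketch below shows that calculation; it is not the method used in the tests mentioned above, and the function name `citation_overlap` and the placeholder domains are invented for illustration.

```python
def citation_overlap(cited, ranked, top_n=10):
    """Share of cited sources that also appear in the top_n ranked results."""
    if not cited:
        return 0.0
    top = set(ranked[:top_n])
    return sum(1 for source in cited if source in top) / len(cited)

# Placeholder domains, not real measurements.
cited = ["a.example", "b.example", "c.example", "d.example"]
ranked = ["b.example", "x.example", "a.example", "y.example", "z.example"]

share_top3 = citation_overlap(cited, ranked, top_n=3)
share_top1 = citation_overlap(cited, ranked, top_n=1)
print(share_top3, share_top1)
```

A consistently low share across many queries would be in line with the observation that cited sources and rankings do not clearly correlate.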

The Impact and Future of LLM Optimization

The true impact and potential of LLM optimization or GAIO as an SEO strategy remain uncertain. While data science experts express skepticism, some SEOs believe in its potential. Several goals must be achieved to effectively leverage LLM optimization:

Establish your own media as a source of training data by focusing on E-A-T.
Generate brand and product mentions in reputable media.
Create co-occurrences between your brand and other relevant entities or attributes in qualified media.
Aim to become part of the knowledge graph.

The success of LLM optimization varies depending on the market size, with niche markets offering greater opportunities for brand association. However, larger brands with extensive PR and marketing resources may have an advantage in positioning themselves within generative AI outputs.

Another perspective is to continue traditional SEO practices, as well-ranking content can be used to train LLMs. It is crucial to optimize for co-occurrences between brands/products and attributes or other entities.

The future of SEO in relation to LLM optimization remains uncertain and will only become clear once SGE is introduced. The shift in perspective presents an opportunity to optimize for a hybrid metasearch engine with a natural language interface, summarizing results for users.

See first source: Search Engine Land

FAQ

Q1: What is LLM optimization or GAIO?

LLM optimization, or Generative AI Optimization (GAIO), is a strategy aimed at strategically positioning brands and products within the outputs of large language models (LLMs) like GPT and Google Bard. The goal is to increase visibility and influence potential consumers’ purchasing decisions by optimizing content for these AI systems.

Q2: How do generative AI recommendations work?

Generative AI tools rely on contextual suggestions generated from neutral secondary sources like trade magazines, news sites, and blogs. These sources provide the foundation for statistical analysis, with the AI determining recommendations based on word co-occurrences in training data. Frequent word associations in the training data influence the recommendations made by the AI.

Q3: Can generative AI outputs be proactively influenced through LLM optimization?

The possibility of proactively influencing generative AI outputs remains a topic of debate. Data science experts express skepticism about the feasibility of significantly impacting AI recommendations through LLM optimization. Challenges include the confidentiality of training data, the vast amount of data involved, and the complexity of influencing AI outputs.

Q4: How can businesses potentially leverage LLM optimization?

Businesses can potentially leverage LLM optimization by focusing on factors like expertise, authoritativeness, trustworthiness (E-A-T), generating mentions in reputable media, creating co-occurrences between brands and relevant entities in qualified media, and aiming to become part of the knowledge graph. Success in LLM optimization may vary depending on the market size and available resources.

Q5: What is the impact of LLM optimization on traditional SEO practices?

LLM optimization and traditional SEO practices can complement each other. Well-ranking content can be used to train LLMs, and optimizing for co-occurrences between brands/products and relevant entities can enhance SEO efforts. The integration of web crawling and models like BERT can improve the quality of AI-generated responses.

Q6: What is the future of SEO in relation to LLM optimization?

The future of SEO in relation to LLM optimization remains uncertain and will become clearer as the field evolves. The shift in perspective presents an opportunity to optimize for a hybrid metasearch engine with a natural language interface, summarizing results for users. The impact and potential of LLM optimization as an SEO strategy will depend on various factors, including market size and resources.
