LLMs and the number 27 - A thought experiment
Jul 20, 2025

Lalith Venkatesh
This experiment started with a simple tweet. Someone casually joked: "ChatGPT always picks 27 when asked for a random number." I'd heard this claim before, and I'd even witnessed it in my own conversations with AI. But was it actually true, or just another internet meme?
More importantly, what would this behavior reveal about how Large Language Models (LLMs) handle uncertainty and ambiguous prompts? If AI models are rapidly becoming the new interface for search and discovery, understanding their default behaviors—even in seemingly trivial scenarios—could shed light on how they might handle real, high-intent questions that drive business outcomes.
So I decided to put it to the test.
You can view the Looker Studio data report I created for this here.
The Hypothesis
My working theory was that LLM responses show consistency across different platforms and sessions because they rely on similar underlying training data and methodologies. If this proved true, it would suggest several important implications:
Different AI platforms might share overlapping training data but interpret queries through subtly different lenses
The way we phrase prompts could nudge models toward certain types of responses (factual, numerical, explanatory)
Web search integration might introduce new signals that could shift or completely override a model's default tendencies
To test this hypothesis systematically, I used Radix to run the experiment at scale.
The Methodology
Here's how I structured the test (a rough code sketch of the setup follows this list):
800 total prompts distributed across ChatGPT, Perplexity, Gemini, and Bing Copilot
Multiple geographic regions to account for localization differences
Five distinct prompt formats: direct requests, conversational tone, trivia-style questions, code-related prompts, and intentionally vague phrasing
Two-day testing window to capture potential temporal variations
Comprehensive tracking of answer content, search behavior, response format (plain number, explanation, code), and citation patterns
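To give a sense of what this setup looks like in code, here's a minimal sketch of the collection loop. The actual runs went through Radix rather than custom scripts, so the query_llm function, the region codes, and the exact prompt wordings below are placeholders rather than the real configuration:

```python
import csv
import itertools
import random

# Placeholder for the collection step. The real runs went through Radix, so
# query_llm() is a stand-in you would replace with each platform's API or UI automation.
def query_llm(platform: str, prompt: str, region: str) -> str:
    raise NotImplementedError("plug in the platform-specific call here")

PLATFORMS = ["ChatGPT", "Perplexity", "Gemini", "Bing Copilot"]
REGIONS = ["us", "uk", "in"]  # illustrative region codes, not the exact ones used
PROMPTS = {
    "direct": "Pick a random number between 1 and 50.",
    "conversational": "Hey, just give me any number you like.",
    "trivia": "What number do people most commonly pick between 1 and 50?",
    "code": "Generate a random number between 1 and 50.",
    "vague": "A number, please?",
}

def run_experiment(total_prompts: int = 800, out_path: str = "responses.csv") -> None:
    """Spread prompts across platform/region/format combinations and log every response."""
    combos = list(itertools.product(PLATFORMS, REGIONS, PROMPTS.items()))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["platform", "region", "prompt_style", "prompt", "response"])
        for _ in range(total_prompts):
            platform, region, (style, prompt) = random.choice(combos)
            writer.writerow([platform, region, style, prompt, query_llm(platform, prompt, region)])
```

Logging every response alongside its platform, region, and prompt style is what makes the breakdowns below possible.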
The Results

The findings were both surprising and revealing:
The "27 myth" turned out to be largely true
The number 27 appeared in 55% of all responses, a remarkable consistency that crossed platform boundaries.
37 emerged as the runner-up
Capturing 18% of responses, this number appeared most frequently when Perplexity triggered web searches, suggesting external sources were influencing the results.
Code responses occurred 6% of the time
Particularly when prompts included words like "generate," indicating that subtle language cues significantly impact response format.
Some models showed personality
Occasionally, LLMs would respond with evasive or playful answers like "I'm not sure" or "Let's play a game," revealing different approaches to handling ambiguous requests.
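If you wanted to reproduce a tally like this from a response log such as the hypothetical responses.csv above, a rough pass might look like the following; the extract_number and classify_format heuristics are illustrative, not the exact parsing used in the experiment:

```python
import csv
import re
from collections import Counter

def extract_number(response: str) -> str | None:
    """Pull the first integer out of a response, if any (a rough heuristic)."""
    match = re.search(r"\b\d+\b", response)
    return match.group(0) if match else None

def classify_format(response: str) -> str:
    """Rough buckets for response format: code, plain number, or explanation."""
    if "random.randint" in response or "import random" in response:
        return "code"
    if response.strip().isdigit():
        return "plain number"
    return "explanation"

with open("responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

numbers = Counter(n for r in rows if (n := extract_number(r["response"])))
formats = Counter(classify_format(r["response"]) for r in rows)

for value, count in numbers.most_common(5):
    print(f"{value}: {count / len(rows):.0%} of responses")
for fmt, count in formats.items():
    print(f"{fmt}: {count / len(rows):.0%} of responses")
```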
But the deeper insights came from analyzing consistency patterns and trigger behaviors across platforms.
Platform-Specific Behaviors
Each AI platform exhibited distinct characteristics that reveal important patterns about consistency and search behavior:

Perplexity proved to be the most reliable, repeating the same answer 85% of the time. This consistency makes it highly predictable for content creators trying to understand what responses their audience might encounter.
ChatGPT demonstrated both the highest variability and the greatest sensitivity to tone and phrasing. When asked to "pick a number between 1 and 50," it would default to 27 or 37. But reframe the same query as "what number is most commonly picked?" and something fascinating happened: ChatGPT would initiate a web search, cite multiple sources, and deliver an entirely different, more researched response.
Gemini was also unpredictable, rarely defaulting to 27 and often providing thoughtful explanations rather than simple numerical answers. Like ChatGPT, it frequently produced varied responses across sessions.
Bing Copilot rarely performed web searches and consistently defaulted to 27, suggesting a more conservative, cache-dependent approach that relies heavily on training data rather than real-time information.
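The consistency claims above boil down to a simple repeat rate: the share of a platform's responses that match its single most common answer. Here's a sketch of that calculation, again assuming the hypothetical responses.csv log from the earlier snippets:

```python
import csv
import re
from collections import Counter, defaultdict

# Repeat rate per platform: how often each platform gives its own most common answer.
# A figure like Perplexity's 85% would come out of a calculation like this.
with open("responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

by_platform = defaultdict(list)
for row in rows:
    match = re.search(r"\b\d+\b", row["response"])
    by_platform[row["platform"]].append(match.group(0) if match else None)

for platform, answers in by_platform.items():
    top_answer, top_count = Counter(answers).most_common(1)[0]
    print(f"{platform}: repeats '{top_answer}' in {top_count / len(answers):.0%} of responses")
```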
The Critical Insight: Search Triggers Determine Visibility

The most important discovery wasn't about which numbers AI systems prefer—it was about understanding when and why they search for new information versus relying on training data.
LLM responses vary dramatically based on information source: The same AI system will give completely different answers depending on whether it draws from training data or performs a web search. When Perplexity shifted from answering "27" to "37," it was because Reddit threads and math trivia websites influenced the web search results. ChatGPT, meanwhile, seemed to favor more structured blog content when making its decisions.
Tone and phrasing are everything: The way you frame a question doesn't just influence the answer—it determines the entire response mechanism. A slight rephrasing can shift an AI from providing a cached response to conducting a comprehensive web search with citations.
Web search triggers are inconsistent: Not all topics or query types prompt AI systems to search for fresh information. If your target topic doesn't regularly trigger web searches, your newest, most relevant content may never be discovered or cited, regardless of its quality or accuracy.
This observation carries profound implications for content strategy. If AI systems shape their responses based on what they discover online, and if they're more likely to trust casual Reddit discussions than authoritative brand content, then businesses could find themselves invisible in AI-mediated searches, regardless of the technical accuracy or quality of their content.
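Radix tracked search behavior directly, but if all you had were raw response texts, one crude way to flag likely web-search responses is to look for URLs, numbered citations, or a "Sources" block. A minimal, purely heuristic sketch over the same hypothetical log:

```python
import csv
import re

# Heuristic flag for "this response came from a web search": URLs, numbered
# citations like [1], or a Sources: block. The patterns are illustrative, not exhaustive.
CITATION_PATTERN = re.compile(r"https?://|\[\d+\]|sources?:", re.IGNORECASE)

def looks_like_search_response(response: str) -> bool:
    return bool(CITATION_PATTERN.search(response))

with open("responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

searched = sum(looks_like_search_response(r["response"]) for r in rows)
print(f"responses with search/citation signals: {searched / len(rows):.0%}")
```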
Why This Matters for Marketers
While this might seem like a frivolous experiment, the implications for digital marketing are profound:
AI systems don't generate truly random responses – they make deliberate choices based on learned patterns and discovered information. Understanding these patterns is crucial for predicting how AI will represent your brand or industry.
Prompt engineering has real impact – The way users phrase their questions directly influences not just the answer, but whether the AI searches for new information at all. This means businesses need to consider how their target audience naturally expresses their needs and understand what phrasing triggers web searches in their domain.
Search triggers determine visibility – If your target topics don't regularly prompt AI systems to perform web searches, your content strategy should prioritize areas that do. There's no point creating fresh content if the AI will never look for it.
Authority isn't algorithmic – The sources that LLMs cite and reference aren't necessarily the most authoritative; they're often simply the most discoverable or query-relevant. Traditional SEO metrics may not translate directly to AI visibility.
Invisibility is binary – If your content isn't being cited, summarized, or surfaced by AI systems, you effectively don't exist in these new search paradigms. There's no "page two" of AI results.
This reality is driving the emergence of Generative Engine Optimization (GEO) as a new discipline. However, before businesses can effectively optimize for AI visibility, they need robust measurement and understanding of their current standing.
The Role of AI Analytics Tools
This experiment highlighted the need for specialized tools that can track AI visibility at scale. Platforms like Radix enable businesses to:
Monitor which prompts and queries mention their brand (and crucially, which don't)
Track competitor visibility across different AI platforms
Understand how variations in tone, topic, and source material affect results
Conduct large-scale experiments to test optimization strategies
Think of it as Google Analytics for the AI age – providing visibility into a new frontier of customer touchpoints.
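To make the first capability in that list concrete without guessing at Radix's actual API, here's a toy version of brand-mention monitoring over the same kind of response log; the brand name and file are placeholders:

```python
import csv

BRAND = "ExampleBrand"  # placeholder brand name, not a real client

# Split logged prompts into those whose responses mention the brand and those that don't.
with open("responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

mentioned = [r["prompt"] for r in rows if BRAND.lower() in r["response"].lower()]
missing = [r["prompt"] for r in rows if BRAND.lower() not in r["response"].lower()]

print(f"brand surfaced in {len(mentioned)} of {len(rows)} responses; absent from {len(missing)}")
```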