Ask ChatGPT the same question on its default model and its premium model, and you’ll get citations from almost entirely different websites. A Writesonic analysis of 50 prompts across ChatGPT’s newest models found only 7% overlap in cited sources between GPT-5.3 Instant (the new default) and GPT-5.4 Thinking (the new premium).
The headline number: 56% of GPT-5.4’s citations go directly to brand websites. Only 8% of GPT-5.3’s do.
Same questions. Same search index. Completely different outcomes for the brands being discussed.
What Writesonic Tested
The study ran 50 prompts across GPT-5.3 Instant, GPT-5.4 Thinking, and GPT-5.2 (both Instant and Thinking) as baselines. That produced 119 total conversations. After each response, the team extracted the full conversation JSON using ChatGPT’s internal API, which exposed every fan-out query the model sent, every web search result it received, and every citation URL included in the answer.
The dataset: 532 fan-out queries extracted, 7,896 web search results analyzed, 1,161 citations classified, and 74,478 words of AI responses reviewed. They also ran 30 of the queries through both Bing and Google via SerpAPI to compare ChatGPT’s results against traditional search engines.
The 50 prompts spanned 16 categories, including SaaS, ecommerce, healthcare, finance, travel, education, home, food, legal, marketing, productivity, fitness, shopping intent, comparisons, and trends. Each citation was classified as “first-party” (the actual brand’s website, like hubspot.com for HubSpot) or “third-party” (review sites, blogs, Reddit, media outlets).
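The first-party/third-party split is essentially a domain-matching rule. Writesonic didn’t publish its classifier, so the sketch below is a hypothetical reconstruction: `BRAND_DOMAINS` is an assumed lookup from brand name to official domain, and subdomains (e.g. blog.hubspot.com) are counted as first-party.

```python
from urllib.parse import urlparse

# Hypothetical mapping; the study's actual brand-to-domain table isn't public.
BRAND_DOMAINS = {
    "HubSpot": "hubspot.com",
    "Salesforce": "salesforce.com",
    "FreshBooks": "freshbooks.com",
}

def classify_citation(url: str, brands_in_answer: list[str]) -> str:
    """Label a citation URL as first-party (a recommended brand's own site)
    or third-party (review sites, blogs, media outlets, Reddit)."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    for brand in brands_in_answer:
        domain = BRAND_DOMAINS.get(brand)
        # Match the apex domain and any subdomain (e.g. blog.hubspot.com).
        if domain and (host == domain or host.endswith("." + domain)):
            return "first-party"
    return "third-party"

print(classify_citation("https://www.hubspot.com/pricing", ["HubSpot"]))
# → first-party
```

Run over all 1,161 citations, a rule like this yields the per-model first-party rates the study reports.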
The Core Finding: Two Models, Two Completely Different Citation Worlds
GPT-5.4 cited brand websites 56% of the time. GPT-5.3 cited them 8% of the time. And the previous default, GPT-5.2, cited them 22% of the time, which means the new default model is actually worse for brands than the one it replaced.
On 22 of the 50 prompts, the two models cited zero of the same websites. Across all 50 prompts, the average citation overlap was 7%. Being visible on one model provides essentially no advantage on the other.
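The study doesn’t specify its exact overlap formula; one plausible reading is a Jaccard-style metric per prompt (domains cited by both models divided by domains cited by either), averaged across the 50 prompts. A minimal sketch under that assumption:

```python
def citation_overlap(cited_a: set[str], cited_b: set[str]) -> float:
    """Jaccard-style overlap between two models' cited domains for one prompt.
    Assumed metric; the study's exact formula is not published."""
    if not cited_a and not cited_b:
        return 0.0
    return len(cited_a & cited_b) / len(cited_a | cited_b)

# Example using the CRM prompt's cited domains from the study:
gpt53 = {"designrevision.com", "techradar.com"}
gpt54 = {"hubspot.com", "salesforce.com", "attio.com"}
print(citation_overlap(gpt53, gpt54))  # → 0.0 (no shared domains)
```

On this metric, the 22 zero-overlap prompts each contribute 0.0, pulling the average down toward the reported 7%.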
The pattern was consistent across nearly every prompt type. On comparison queries (“X vs Y vs Z”), GPT-5.3 never cited a single brand website. Not once across all comparison prompts. GPT-5.4 cited brands 83-100% of the time on the same queries.
Some examples from the data:
When asked “Best CRM for B2B SaaS,” GPT-5.3 cited designrevision.com and techradar.com (0% first-party). GPT-5.4 cited hubspot.com, salesforce.com, and attio.com (100% first-party).
When asked about marathon running shoes, GPT-5.3 cited irunfar.com and reddit.com. GPT-5.4 cited nike.com, asics.com, and hoka.com.
When asked to compare QuickBooks vs Xero vs FreshBooks, GPT-5.3 cited gentlefrog.com and technologyadvice.com. GPT-5.4 cited freshbooks.com, quickbooks.intuit.com, and xero.com.
The pattern held across SaaS, ecommerce, healthcare, finance, education, and consumer products.
Why the Results Are So Different: Fan-Out Query Architecture
The divergence comes down to how the two models search the web, and the difference is structural, not incremental.
GPT-5.3 sends one query per prompt: essentially the raw user question. It gets back around 27 results and cites 5-6 sources. The process is similar to how a person might Google something once and pick from the first page of results.
GPT-5.4 decomposes each prompt into an average of 8.5 sub-queries, gets back around 109 web results per prompt, and cites 14-15 sources. But the real difference is in the query types. GPT-5.4 uses two features that no other model uses: domain-restricted queries and site: operators.
Across the 50 prompts, GPT-5.4 sent 142 domain-restricted queries (targeting specific brand websites), 156 queries with site: operators (targeting review and validation platforms), and 125 open unrestricted queries. That’s 298 targeted queries out of 423 total.
The model follows a consistent two-phase pattern. Phase one is brand verification: it identifies which brands are relevant (from its training data), then sends queries directly to their websites, often restricted to specific domains. Phase two is third-party validation: it checks G2, Capterra, Shopify App Store, and review sites to validate what it found on the brand pages.
For example, when asked about email marketing platforms, GPT-5.4 sent 21 queries. The first phase went to klaviyo.com, omnisend.com, and mailchimp.com with pricing-specific searches. The second phase went to G2 restricted to g2.com and the Shopify App Store restricted to apps.shopify.com to validate the brands it already identified.
GPT-5.3 just searched “best email marketing platforms” and cited whatever came back.
The “Kingmaker” Sites on GPT-5.3
Because GPT-5.3 relies almost exclusively on third-party sources, a small number of media and review domains become gatekeepers for brand visibility. The top cited domains on GPT-5.3 were forbes.com (15 citations), techradar.com (10), tomsguide.com (10), reddit.com (7), and money.com (5).
If Forbes or TechRadar writes about a product, GPT-5.3 finds it. If they don’t, the brand is likely invisible on the default model.
GPT-5.4’s top domains were the brands themselves: hubspot.com (18 citations), shopify.com (16), salesforce.com (14), quickbooks.intuit.com (10).
The implication for link-building and digital PR strategy is direct. Coverage on high-authority media sites and review platforms feeds GPT-5.3 visibility. Without that third-party layer, a brand can have the best website in its category and still be invisible on the model that most ChatGPT users interact with.
GPT-5.4 Cites Pricing Pages 35x More Than GPT-5.3
The models don’t just cite different domains. They cite different types of pages.
GPT-5.3 is primarily a blog reader. 32% of its citations (92 out of 284) pointed to blog posts and articles. Only 1% (4 citations) pointed to pricing pages.
GPT-5.4 is a pricing page and product page reader. 19% of its citations (138 out of 739) pointed to pricing pages. 22% pointed to homepages. 10% pointed to product and feature pages. Combined, 51% of GPT-5.4’s citations landed on commercial pages.
The jump from 4 pricing page citations to 138 is a 35x increase. If a brand’s pricing page says “contact sales” instead of showing actual numbers, GPT-5.4 will find the gap and move to a competitor that publishes pricing transparently.
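The page-type breakdown implies a URL-path classifier. The study’s actual rules aren’t published, so the path prefixes below are illustrative heuristics that match the reported categories (pricing, homepage, product/feature, blog):

```python
from urllib.parse import urlparse

# Illustrative path heuristics only; the study's classifier is not public.
PAGE_RULES = [
    ("pricing", ("/pricing", "/plans")),
    ("product", ("/product", "/features", "/solutions")),
    ("blog",    ("/blog", "/articles", "/news")),
]

def classify_page(url: str) -> str:
    """Bucket a citation URL by page type using its path."""
    path = urlparse(url).path.rstrip("/").lower()
    if path == "":
        return "homepage"
    for label, prefixes in PAGE_RULES:
        if any(path.startswith(p) for p in prefixes):
            return label
    return "other"

print(classify_page("https://www.freshbooks.com/pricing"))  # → pricing
```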
Google Rankings Predict GPT-5.3 Citations. GPT-5.4 Bypasses Rankings Entirely.
Writesonic checked whether domains cited by each model also appeared in Bing or Google search results for the same query.
For GPT-5.3, 47% of cited domains also ranked on Google and 27% on Bing. But 44% didn’t appear on either search engine for the same query, which means ChatGPT has its own retrieval mechanism beyond traditional search.
For GPT-5.4, the numbers flipped dramatically. 75% of cited domains didn’t appear in Bing or Google results for the same user prompt. GPT-5.4 doesn’t find brands through search rankings. It identifies them from training data, then sends domain-restricted queries directly to their websites.
When someone asks about running shoes, GPT-5.4 doesn’t search “best marathon running shoes” and hope nike.com ranks. It searches “Nike Pegasus vs ASICS Gel Nimbus vs Brooks Ghost 2026” restricted to nike.com, asics.com, and the other brand domains it already selected.
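The SERP cross-check reduces to set arithmetic over domains. A minimal sketch, assuming you already have the cited domains per prompt and the domains returned by SerpAPI for Google and Bing:

```python
def serp_coverage(cited: set[str], google: set[str], bing: set[str]) -> dict[str, float]:
    """Share of a model's cited domains that also rank on each engine,
    and the share that appears on neither."""
    n = len(cited) or 1  # avoid division by zero on empty citation sets
    return {
        "on_google": len(cited & google) / n,
        "on_bing": len(cited & bing) / n,
        "on_neither": len(cited - google - bing) / n,
    }

# Hypothetical example: only one of four cited domains ranks on Google.
report = serp_coverage(
    cited={"nike.com", "asics.com", "hoka.com", "irunfar.com"},
    google={"irunfar.com", "runnersworld.com"},
    bing=set(),
)
print(report["on_neither"])  # → 0.75
```

Aggregated across the 30 cross-checked queries, this is the computation behind the 47%/27%/44% figures for GPT-5.3 and the 75% "on neither" figure for GPT-5.4.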
The Attribution Shift
Every citation URL across both models gets ?utm_source=chatgpt.com appended. Combined with the first-party citation rates, the attribution picture looks very different depending on which model a user is on.
On GPT-5.3, the brand gets mentioned in the answer, but 92% of clicks go to Forbes, TechRadar, Reddit, and other third-party sites. The brand gets the recommendation. Someone else gets the traffic.
On GPT-5.4, nearly half of all citation traffic goes to the brand’s own website with UTM tracking. The brand gets the recommendation and the trackable visit.
For the first time, a thinking model makes AI search attribution comparable to paid search: the user clicks to the brand’s site, and it shows up in GA4 with a clear source. Setting up a segment for utm_source=chatgpt.com in GA4 now means the data will be there as GPT-5.4 adoption grows.
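Because the attribution parameter is appended to the URL itself, isolating ChatGPT referrals in log data or analytics exports is a one-line check. A minimal sketch:

```python
from urllib.parse import urlparse, parse_qs

def is_chatgpt_referral(url: str) -> bool:
    """True if the URL carries the utm_source=chatgpt.com parameter
    that ChatGPT appends to citation links."""
    params = parse_qs(urlparse(url).query)
    return params.get("utm_source") == ["chatgpt.com"]

print(is_chatgpt_referral("https://hubspot.com/pricing?utm_source=chatgpt.com"))
# → True
```

The same `utm_source=chatgpt.com` value is what a GA4 segment or exploration filter would match on.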
A Few Surprises in the Data
GPT-5.3 surfaces older content than the previous default. Only 6% of its web search results were under 30 days old, compared to 33% on GPT-5.2. Publishing frequently and hoping recency carries the day is not a winning strategy for the new default model.
GPT-5.4 skipped web search entirely on 4 of 50 prompts, including two shopping-intent queries (“best deals on standing desks” and “gift for wife under $100”). Paradoxically, the model that researches most deeply when it does search is also the one that skips search more often. And when it didn’t search, it still cited 17 sources from training data on one prompt.
Shopping intent behaves differently across models. GPT-5.4 treated “deals” and “gift” prompts as knowledge tasks rather than search tasks, answering from training data. Time-sensitive shopping queries may not trigger web search on the thinking model at all.
What the Data Means for Brands and SEO Teams
The practical takeaways split along model lines, and both tracks require attention because there’s no way to know which model a given ChatGPT user is on.
For GPT-5.3 visibility (the default model that most users interact with), the strategy is third-party distribution. Get covered on Forbes, TechRadar, Tom’s Guide, and the review sites that act as citation gatekeepers. Build and maintain guest posting placements on authoritative sites in the relevant verticals. Invest in digital PR that generates editorial coverage on the domains GPT-5.3 trusts. Without that third-party layer, a brand is functionally invisible on the default model.
For GPT-5.4 visibility (the premium model), the strategy is first-party content quality. The model goes directly to brand websites and reads pricing pages, product pages, comparison pages, and feature documentation. If that content is clear, structured, and transparent with actual pricing rather than “contact sales” gates, the brand gets cited. If the content is thin, outdated, or buried behind JavaScript rendering issues, GPT-5.4 moves on.
G2 and Capterra profiles feed GPT-5.4’s validation phase directly. The model sent queries restricted to these platforms to cross-check brand claims. Weak profiles mean weaker citations.
And for any AI visibility audit or reporting, testing only one model misses the picture. A brand that dominates on GPT-5.3 might be invisible on GPT-5.4, and vice versa. The 7% overlap means they’re functionally separate channels.
Methodology Notes
Writesonic ran 119 conversations across ChatGPT between March 7-8, 2026. All 50 prompts were tested on GPT-5.3 Instant and GPT-5.4 Thinking, with 10 also tested on GPT-5.2 for baseline comparison. The study used a single user account based in India, which may have skewed some results (e.g., amazon.in appearing in citations). ChatGPT is non-deterministic, so repeat runs may produce different results. The GPT-5.2 baseline data comes from 10 prompts only.
