👁️ DeepSeek Can Now "See"
On May 9, 2026, DeepSeek quietly rolled out something that changes the AI search landscape in China.
It launched image recognition mode — the ability to actually "see" and understand images, not just extract text from them. Almost every test account now has access to this feature.
This matters because DeepSeek isn't just another AI chatbot. It's one of China's fastest-growing AI search platforms, and it's now a visual search engine too.
For brands investing in GEO (Generative Engine Optimization), this raises a critical question: are your images optimized for AI search?
🎯 Why This Changes GEO
Until now, GEO has been about text. You optimize your content so AI models recommend your brand when users search for relevant topics.
But DeepSeek's new capability means AI search is no longer text-only. Users can now upload images and ask DeepSeek to analyze them. And when they do, DeepSeek's visual understanding kicks in.
Cost efficiency matters here too.
DeepSeek's visual reasoning reportedly costs far less than competitors' (see the token figures in the next section), so it can afford to "look" at more images in more queries. That raises the importance of visual content in AI search rankings.
The images on your website, product pages, and marketing materials are now part of your GEO strategy. DeepSeek can analyze product photos, infographics, brand visuals, and screenshots — and use that understanding to influence its recommendations.
🔬 The Tech Behind It: Thinking with Visual Primitives
DeepSeek's approach is notably different from other multimodal models. According to its technical report, the team developed a framework called "Thinking with Visual Primitives".
Traditional multimodal models struggle with what's called the "referential gap" — when analyzing a crowded image, they use vague language like "the big one on the left" in their reasoning chain, which causes attention drift and errors.
DeepSeek's solution is elegant: it incorporates visual elements directly into its reasoning chain. Points, bounding boxes, and spatial coordinates become the "basic units of thought" — like a cybernetic finger pointing at exactly what it's analyzing.
In practice, the flow looks like this:
- User uploads an image
- DeepSeek identifies visual primitives (points, edges, bounding boxes)
- These primitives become part of the reasoning chain, for example: "the model points to coordinates (x,y) and identifies the object there"
- Result: precise spatial reasoning without the fuzzy language that plagues other models
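To make the idea concrete, here is a minimal Python sketch of a reasoning step that carries coordinates instead of vague phrases. It is purely illustrative: the class names (`Point`, `BoundingBox`, `ReasoningStep`) and the output format are hypothetical and are not taken from DeepSeek's technical report.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class BoundingBox:
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def contains(self, p: Point) -> bool:
        # True when the point lies inside the box (edges included).
        return self.x_min <= p.x <= self.x_max and self.y_min <= p.y <= self.y_max


# Hypothetical alias: anything a reasoning step can "point" at.
Primitive = Union[Point, BoundingBox]


def fmt(primitive: Primitive) -> str:
    # Render a primitive as explicit coordinates.
    if isinstance(primitive, Point):
        return f"point({primitive.x},{primitive.y})"
    return f"box({primitive.x_min},{primitive.y_min},{primitive.x_max},{primitive.y_max})"


@dataclass
class ReasoningStep:
    text: str              # the natural-language part of the step
    refs: List[Primitive]  # the exact regions the step refers to

    def render(self) -> str:
        # Attach coordinates so the step never relies on "the one on the left".
        return f"{self.text} [{'; '.join(fmt(r) for r in self.refs)}]"


# Illustrative pixel coordinates, not real model output.
mug = BoundingBox(40, 60, 120, 180)
handle = Point(115, 110)
step = ReasoningStep("The handle belongs to the mug", [handle, mug])
print(step.render())
```

The point of the sketch is that every claim in the chain is tied to an explicit region, so a later step can check it (for example with `mug.contains(handle)`) rather than re-interpret a fuzzy description.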
This matters for three reasons:
- Accuracy: Visual primitives eliminate the "referential gap" — no more "that thing over there" errors
- Efficiency: 90 tokens vs. 870-1100 for GPT/Claude, which makes it far cheaper to deploy
- Scalability: Because it's cheaper, DeepSeek can afford to analyze more images in more queries
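A quick back-of-the-envelope calculation shows what those token figures imply. This sketch assumes the 90 and 870-1100 figures describe the same unit of work (one comparable image) and that cost scales linearly with token count. Both are assumptions for illustration, and the one-million-token budget is an arbitrary example value.

```python
# Figures quoted in the article; treating them as tokens for one comparable image is an assumption.
DEEPSEEK_TOKENS = 90
COMPETITOR_TOKENS = (870, 1100)


def token_ratio(competitor_tokens: int, baseline_tokens: int = DEEPSEEK_TOKENS) -> float:
    # How many times more tokens the competitor spends on the same work.
    return competitor_tokens / baseline_tokens


def images_per_budget(token_budget: int, tokens_per_image: int) -> int:
    # Whole images that fit into a fixed token budget.
    return token_budget // tokens_per_image


low, high = (token_ratio(t) for t in COMPETITOR_TOKENS)
print(f"Competitors use about {low:.1f}x to {high:.1f}x as many tokens")

budget = 1_000_000  # arbitrary example budget
print(images_per_budget(budget, DEEPSEEK_TOKENS))       # 11111
print(images_per_budget(budget, COMPETITOR_TOKENS[0]))  # 1149
print(images_per_budget(budget, COMPETITOR_TOKENS[1]))  # 909
```

Under those assumptions, the same budget covers roughly ten times as many images, which is the scalability argument in numbers.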
📋 What This Means for Your Content Strategy
1. Image-Ready GEO
When optimizing for AI search, you now need to consider how your images will be interpreted. Product images, infographics, and brand visuals are no longer just decorative — they're data points DeepSeek uses to understand your brand.
Action: Add descriptive captions to all key images. Use proper alt text. Structure your visual content so AI can easily parse what it represents.
- 📸 Optimize product images: Ensure product photos are clear, well-lit, and include descriptive filenames and alt text. DeepSeek will analyze these for visual search queries.
- 📊 Structure infographics for AI: Use clear labels, logical flow, and text overlays in visual content. DeepSeek's visual primitives parse structured visuals more accurately.
- 🖼️ Pair text with supporting visuals: Every major content section should have a companion visual. DeepSeek cross-references text and images to build stronger trust signals.
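A simple way to start on the alt text and filename advice is to audit your existing pages. The sketch below uses only Python's standard-library `html.parser` to flag `<img>` tags with missing alt text or camera-style filenames. The `ImageAudit` class, the `audit_images` helper, and the `GENERIC_NAME` pattern are hypothetical names invented for this example, and what counts as a "generic" filename is an assumption you should adapt to your own assets.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Assumption: camera-style or placeholder names (IMG_4821, DSC0012, image1) are non-descriptive.
GENERIC_NAME = re.compile(r"^(img|dsc|image|photo|screenshot)[-_ ]?\d*$", re.IGNORECASE)


class ImageAudit(HTMLParser):
    """Collects (src, issue) pairs for every <img> tag that needs attention."""

    def __init__(self) -> None:
        super().__init__()
        self.issues: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        alt = (attr.get("alt") or "").strip()
        if not alt:
            self.issues.append((src, "missing alt text"))
        # Filename without directory or extension, e.g. "/media/IMG_4821.jpg" -> "IMG_4821".
        stem = src.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        if GENERIC_NAME.match(stem):
            self.issues.append((src, "generic filename"))


def audit_images(html: str) -> List[Tuple[str, str]]:
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.issues


sample = (
    '<img src="/media/IMG_4821.jpg">'
    '<img src="/media/red-ceramic-mug-front.jpg" alt="Red ceramic mug, front view">'
)
for src, issue in audit_images(sample):
    print(f"{src}: {issue}")
```

Note that purely decorative images can legitimately carry an empty `alt` attribute, so treat the "missing alt text" flag as a prompt for review rather than an automatic error.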
2. Visual Search Queries Are Coming
DeepSeek's image recognition enables a new type of search: visual queries. Users can take a photo of a product and ask DeepSeek about it. Brands that have optimized their visual presence are more likely to be recommended.
Action: Ensure your product images are clear, well-lit, and accurately labeled. Consider how DeepSeek's visual primitives will interpret your brand's visual assets.
3. Text + Image = Stronger AI Trust
Because DeepSeek can cross-reference text content with visual content, a blog post with relevant, well-structured images is likely to be weighted more heavily than text alone.
Action: Pair every major piece of text content with supporting visuals. Charts, diagrams, and product photos that reinforce your message create stronger AI trust signals.
DeepSeek's image recognition is still in beta. Key limitations to watch:
- Knowledge lag: It may misidentify very recent products (knowledge cut-off around early 2025)
- Complex visuals: Optical illusions and counting tasks still cause errors
- No generation: It can analyze images but not generate or edit them (yet)
🎯 The Bottom Line
DeepSeek learning to "see" isn't just a technical milestone — it's a signal that AI search is expanding beyond text.
For brands doing GEO in China, this means visual content strategy is no longer optional. The images on your website are now part of how AI understands and recommends your brand.
The brands that start optimizing their visual presence for AI search today will have a significant advantage as DeepSeek's image recognition matures and becomes more deeply integrated into search workflows.
Ready to optimize your visual content for GEO?
At TMG, we help international brands build comprehensive GEO strategies — including visual content optimization for China's AI search ecosystem. From content audits to implementation, we'll make sure your brand is visible across every format AI search uses.