DeepSeek Opens Its Eyes

On May 9, 2026, DeepSeek quietly launched something that will change China's AI search landscape.

It launched image recognition mode — the ability to actually "see" and understand images, not just extract text from them. Almost every test account now has access to this feature.

This matters because DeepSeek isn't just another AI chatbot. It's one of China's fastest-growing AI search platforms, and it's now a visual search engine too.

For brands investing in GEO (Generative Engine Optimization), this raises a critical question: are your images optimized for AI search?

Why This Changes GEO

Until now, GEO has been about text: optimizing content so that AI models recommend your brand when users search for related topics.

But DeepSeek's new feature means AI search is no longer text-only. Users can now upload an image and ask DeepSeek to analyze it.

  • ~90 tokens per 800×800 image (DeepSeek)
  • 870-1,100 tokens for the same image (GPT/Claude)
  • 10x more efficient than GPT/Claude on vision tasks

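The images on your website, product pages, and marketing materials are now part of your GEO strategy. DeepSeek can analyze product photos, infographics, brand visuals, and screenshots, and that understanding can influence its recommendations.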

  • 🔍 DeepSeek Vision: ~90 tokens per 800×800 image. "Thinking with Visual Primitives" framework for precise spatial reasoning. Drastically lower cost per analysis.
  • 🤖 GPT-4o Vision: ~870 tokens for the same image. Reliable general-purpose visual understanding but significantly more expensive per query.
  • 🟣 Claude Vision: ~1,100 tokens for the same image. Strong at detailed analysis but struggles with spatial reasoning in dense scenes.
  • 🏆 Winner: DeepSeek. 10x more efficient, purpose-built for the Chinese AI search ecosystem. Best cost-performance ratio for brands targeting China.

DeepSeek's visual reasoning costs dramatically less than competitors, which means it can afford to "look" at more images in more queries — increasing the importance of visual content in AI search rankings.
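The efficiency gap can be checked with simple arithmetic. The sketch below uses only the per-image token figures quoted in this article; the 1,000,000-token budget is a made-up number chosen for illustration, not a DeepSeek quota or price.

```python
# Per-image token figures quoted in this article for one 800x800 image.
TOKENS_PER_IMAGE = {
    "DeepSeek": 90,
    "GPT-4o": 870,
    "Claude": 1100,
}

# Hypothetical budget, chosen only to make the comparison concrete.
TOKEN_BUDGET = 1_000_000


def efficiency_ratio(model: str, baseline: str = "DeepSeek") -> float:
    """How many times more tokens `model` spends per image than `baseline`."""
    return TOKENS_PER_IMAGE[model] / TOKENS_PER_IMAGE[baseline]


def images_within_budget(model: str, budget: int = TOKEN_BUDGET) -> int:
    """Whole images that fit in `budget` tokens, ignoring prompt and reply tokens."""
    return budget // TOKENS_PER_IMAGE[model]


for name in TOKENS_PER_IMAGE:
    print(f"{name}: {efficiency_ratio(name):.1f}x tokens per image, "
          f"{images_within_budget(name):,} images per {TOKEN_BUDGET:,} tokens")
```

On these figures the ratio falls between roughly 9.7x and 12.2x, which is where the rounded "10x" comes from.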

🔬 How It Works

DeepSeek's approach is notably different from other multimodal models. According to its technical report, the team developed a framework called "Thinking with Visual Primitives".

Traditional multimodal models struggle with a "referential gap": their reasoning chains fall back on vague language such as "the big thing on the left".

DeepSeek's solution is elegant: it incorporates visual elements directly into its reasoning chain. Points, bounding boxes, and spatial coordinates become the "basic units of thought", like a cybernetic finger pointing at exactly what it's analyzing.

  • Accuracy: Visual primitives eliminate the "referential gap", so no more "that thing over there" errors
  • Efficiency: About 90 tokens per image versus 870-1,100 for GPT/Claude makes it vastly cheaper to deploy
  • Scalability: Because it's cheaper, DeepSeek can afford to analyze more images in more queries

Blog URL: https://www.tuyuesouxin.cn/blog/deepseek-image-recognition-geo/

Source: IT之家 article on DeepSeek (WeChat, May 9, 2026)

💡 What This Means

1. Image-Ready GEO

When optimizing for AI search, you now need to consider how your images will be interpreted.

Action: Add descriptive captions to all key images. Use proper alt text. Structure your visual content so AI can easily parse what it represents.

  • 📸 Optimize product images: Ensure product photos are clear, well-lit, and include descriptive filenames and alt text. DeepSeek will analyze these for visual search queries.
  • 📊 Structure infographics for AI: Use clear labels, logical flow, and text overlays in visual content. DeepSeek's visual primitives parse structured visuals more accurately.
  • 🖼️ Pair text with supporting visuals: Every major content section should have a companion visual. DeepSeek cross-references text and images to build stronger trust signals.
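
One way to act on this checklist is to audit a page for images that lack alt text or carry non-descriptive filenames. The following is a minimal sketch using only Python's standard library. The `GENERIC_NAME` pattern and the sample HTML are assumptions made for illustration, and nothing here reflects how DeepSeek itself scores images.

```python
import re
from html.parser import HTMLParser
from pathlib import PurePosixPath

# Assumed heuristic: camera-style or placeholder names such as IMG_4821 or image1.
GENERIC_NAME = re.compile(
    r"(?:img|image|dsc|photo|screenshot)[_\-\s]*\d*|\d+", re.IGNORECASE
)


class ImageAudit(HTMLParser):
    """Collects <img> tags that have no alt text or a generic filename."""

    def __init__(self) -> None:
        super().__init__()
        self.issues: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = (attributes.get("alt") or "").strip()
        # Note: an empty alt is valid HTML for purely decorative images,
        # so treat this flag as a prompt for review, not an error.
        if not alt:
            self.issues.append((src, "missing alt text"))
        stem = PurePosixPath(src).stem
        if GENERIC_NAME.fullmatch(stem):
            self.issues.append((src, "non-descriptive filename"))


def audit_images(html: str) -> list[tuple[str, str]]:
    """Return (src, problem) pairs for every flagged image in the HTML."""
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.issues


SAMPLE_HTML = """
<img src="/media/IMG_4821.jpg">
<img src="/media/red-ceramic-teapot-side-view.jpg" alt="Red ceramic teapot, side view">
<img src="/media/chart.png" alt="">
"""

for src, problem in audit_images(SAMPLE_HTML):
    print(f"{src}: {problem}")
```

In the sample, the first image is flagged twice (no alt text, camera-style filename), the second passes, and the third is flagged for its empty alt attribute.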

2. Visual Search Queries Are Coming

DeepSeek's image recognition enables a new type of search: visual queries. Users can take a photo of a product and ask DeepSeek about it. Brands that have optimized their visual presence will be recommended.

Action: Ensure your product images are clear, well-lit, and accurately labeled. Consider how DeepSeek's visual primitives will interpret your brand's visual assets.
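
To make the idea of visual primitives more concrete, the sketch below contrasts a vague verbal reference with one anchored to coordinates. The `BoundingBox` class, the labels, and the output format are all hypothetical illustrations. The article does not describe DeepSeek's internal representation, and this should not be read as it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates; (0, 0) is the top-left corner."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        return max(0, self.x_max - self.x_min) * max(0, self.y_max - self.y_min)

    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def describe(self) -> str:
        """A reference that names an exact region instead of 'the thing on the left'."""
        return f"{self.label} @ box({self.x_min}, {self.y_min}, {self.x_max}, {self.y_max})"


def coverage(box: BoundingBox, image_width: int, image_height: int) -> float:
    """Fraction of the image occupied by the box."""
    return box.area() / (image_width * image_height)


# Hypothetical regions in an 800x800 product photo.
logo = BoundingBox("brand logo", 40, 40, 240, 140)
product = BoundingBox("teapot", 200, 200, 600, 700)

for region in (logo, product):
    print(region.describe(), f"covers {coverage(region, 800, 800):.1%} of the image")
```

A photo in which the product or logo covers only a small fraction of the frame may be a candidate for re-cropping, though the source does not state what coverage DeepSeek needs.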

3. Text + Image = Stronger AI Trust

DeepSeek's approach means it can cross-reference text content with visual content. A blog post with relevant, well-structured images will be weighted more heavily than text alone.

Action: Pair every major piece of text content with supporting visuals. Charts, diagrams, and product photos that reinforce your message create stronger AI trust signals.

⚠️ Current Limitations

DeepSeek's image recognition is still in beta. Key limitations to watch:

  • Knowledge lag: It may misidentify very recent products (knowledge cut-off around early 2025)
  • Complex visuals: Optical illusions and counting tasks still cause errors
  • No generation: It can analyze images but not generate or edit them (yet)

🎯 The Bottom Line

DeepSeek learning to "see" isn't just a technical milestone — it's a signal that AI search is expanding beyond text.

For brands doing GEO in China, this means visual content strategy is no longer optional. The images on your website are now part of how AI understands and recommends your brand.

The brands that start optimizing their visual presence for AI search today will have a significant advantage as DeepSeek's image recognition matures and becomes more deeply integrated into search workflows.

Ready to optimize your visual content for GEO?

At TMG, we help international brands build comprehensive GEO strategies — including visual content optimization for China's AI search ecosystem. From content audits to implementation, we'll make sure your brand is visible across every format AI search uses.
