
Can Intel Integrated Graphics Really Run a 120B LLM? Let’s Test It for Real

by GMKtecOfficial 26 Dec 2025 0 Comments

Introduction: Why This Matters

Hello everyone, this is Dalu.

On November 20, 2025, Intel showcased a series of live demos at its Technology Innovation & Industry Ecosystem Conference, demonstrating thin-and-light laptops and mini PCs running 120B large language models using only Intel integrated graphics.

I was there in person, and honestly, the session was packed with information.

What Intel revealed wasn’t just a demo. It was a clear signal:

Any device powered by Intel can now run AI.

This applies not only to laptops and mini PCs, but also to edge devices such as NAS systems. I’ve personally tested and shared experiences with an AI Home NAS, where Intel iGPU-based AI workloads—once unimaginable—are now fully practical.


From “Impossible” to Everyday Productivity

With Intel® Core™ Ultra AI processors, tasks like:

  • Audio transcription with timeline visualization

  • Intelligent summarization of meeting recordings

  • Content indexing across audio, images, and video

are no longer limited to cloud servers or discrete GPUs.

Previously, Windows search was limited to filenames or basic document text. Today, AI-powered systems can:

  • Recognize text inside images

  • Understand image content

  • Summarize video and audio content

  • Perform multilingual voice cloning with just seconds of audio

Yes—this is already a nightmare for traditional voice actors.

Why I Personally Use Voice Cloning

In my own content creation workflow:

  1. I record freely (often too freely—videos get long)

  2. AI generates a subtitle timeline

  3. I refine structure and pacing

  4. AI recreates my voice using TTS + voice cloning

  5. Final output is shorter, clearer, and more professional

Without AI, long-form videos struggle to gain traction. With AI, reach and engagement improve dramatically.

AI doesn’t just increase productivity—it changes how we work.


The Real Challenge: Making AI Practical

For many people, AI feels both familiar and distant.

You’ve seen it everywhere:

  • Short-video narration

  • Face swaps

  • Voice cloning memes

But many still believe AI is too complex or inaccessible.

That perception is outdated.

  • 2023 was the Year of AI (commercial adoption)

  • 2025 is the Year of AI Applications (consumer-level adoption)

With Intel Core Ultra integrated graphics, the barrier to entry has dropped dramatically. AI is no longer elite—it’s usable.


How Can Integrated Graphics Run Large Models?

Let’s clarify performance fundamentals:

Compute hierarchy (approx.):
Discrete GPU > iGPU ≈ NPU > CPU

Historically, iGPUs could only handle models of up to about 14B parameters, limited by:

  1. Shared memory constraints

  2. Inference speed falling below usable thresholds (<10 tokens/s)

Breakthrough #1: Expanded Shared Memory

Intel’s recent driver updates allow dynamic shared memory allocation.

For example:

  • With 96GB RAM, up to 90.7GB can be allocated to iGPU/NPU

  • Memory is non-exclusive, dynamically shared with the OS

Unlike unified memory systems, where allocation is locked, Intel’s approach is flexible—and far more cost-efficient.
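To make the memory math concrete, here is a back-of-the-envelope sketch of whether a quantized model fits the dynamically shared iGPU budget. The `overhead_gb` allowance and the ~94.5% share are assumptions I've derived from the article's 90.7 GB / 96 GB example, not Intel-published figures.

```python
def fits_in_shared_igpu_memory(params_b: float, ram_gb: float,
                               bits_per_param: int = 4,
                               overhead_gb: float = 4.0,
                               share: float = 0.945) -> bool:
    """Rough check: does a quantized model fit the iGPU shared-memory budget?

    params_b       -- model size in billions of parameters
    ram_gb         -- total system RAM in GB
    bits_per_param -- quantization width (4 for Q4)
    overhead_gb    -- assumed allowance for KV cache, activations, runtime
    share          -- fraction of RAM allocatable to the iGPU (90.7/96 above)
    """
    weights_gb = params_b * bits_per_param / 8  # 1e9 params * bits -> GB
    budget_gb = ram_gb * share
    return weights_gb + overhead_gb <= budget_gb

# A 120B model at Q4 needs ~60 GB of weights,
# which fits a 96 GB system's ~90.7 GB shared budget:
print(fits_in_shared_igpu_memory(120, 96))   # True
print(fits_in_shared_igpu_memory(120, 32))   # False
```

This is also why the RAM size, not the iGPU itself, becomes the practical ceiling on model size.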


Breakthrough #2: Model Optimization

AI models have evolved rapidly:

  • Dense → Distilled → Quantized → Sparse (MoE) models

A key example: Qwen A3B (Active 3B)

  • Total parameters: ~30B

  • Active parameters per token: ~3B

  • Architecture: Mixture of Experts (MoE)

Result:

  • Larger knowledge base

  • Faster inference

  • Practical on iGPU

For all tests below, inference speed stays above 10 tokens/s, which is the accepted minimum for fluent interaction.
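The MoE advantage can be sketched with simple arithmetic: the weight footprint scales with *total* parameters, while per-token compute scales roughly with *active* parameters (the ~2× multiplier for decode FLOPs is a common rule of thumb, not a measured figure).

```python
def q4_weights_gb(total_params_b: float, bits: int = 4) -> float:
    """Approximate weight footprint in GB for a quantized model."""
    return total_params_b * bits / 8

def flops_per_token_b(active_params_b: float) -> float:
    """Rough FLOPs per generated token, in billions (~2 x active params)."""
    return 2 * active_params_b

# Memory follows total parameters: a 30B MoE still needs ~15 GB at Q4...
print(q4_weights_gb(30))                              # 15.0

# ...but per-token compute follows active parameters: ~10x less than dense
print(flops_per_token_b(30) / flops_per_token_b(3))   # 10.0
```

That gap between memory cost and compute cost is exactly what makes a 30B-total / 3B-active model practical on an iGPU.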


Test Scope

Beyond LLMs, this test also covers:

  • Text-to-image generation (Z-Image)

  • OCR

  • Text-to-speech (TTS)

Text-to-video is still beyond iGPU capabilities—for now.


Test Platform

Why this system?

  • Excellent price-to-performance ratio

  • Triple PCIe 4.0 NVMe slots

  • Thunderbolt 4 + OCuLink support

  • Advanced cooling (VC vapor chamber + dual fans)

  • Built-in AI assistant with model store

This setup allows both local AI workloads and external GPU expansion.

 

LLM Performance: From 20B to 120B

20B GPT-OSS (Q4)

  • Output: ~2,380 tokens

  • Speed: ~17 tokens/s

  • Markdown & table support fully usable

120B GPT-OSS (Q4)

  • iGPU memory usage: high but stable

  • Speed: ~11 tokens/s sustained

  • No significant slowdown even after long outputs

This alone would have sounded absurd just a year ago.
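To put those decode rates in wall-clock terms, a small conversion shows what the measured speeds mean for a long answer:

```python
def generation_time_s(tokens: int, tokens_per_s: float) -> float:
    """Wall-clock time to stream `tokens` at a sustained decode rate."""
    return tokens / tokens_per_s

# The 20B run: ~2,380 tokens at ~17 tokens/s -> about 140 seconds
print(round(generation_time_s(2380, 17)))   # 140

# The same output length on the 120B model at ~11 tokens/s
print(round(generation_time_s(2380, 11)))   # 216
```

Both stay comfortably above the ~10 tokens/s fluency threshold, so even the 120B model reads as a live, streaming response rather than a batch job.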


Sparse Models: 30B to 80B

Using Qwen A3B models via tools like Flowy AI Assistant and Intel’s XPU runtime:

  • 30B A3B Q4: ~15 tokens/s

  • 80B A3B Q4: ~11+ tokens/s

Even complex tasks like:

  • PDF analysis

  • Mind map generation

  • Creative writing

run smoothly on iGPU.


Text-to-Image: Z-Image-Turbo on iGPU

Z-Image-Turbo:

  • ~6B parameters

  • Excellent text understanding

  • Strong Chinese text rendering

With OpenVINO™ acceleration:

  • 1024×1024: ~70s

  • 1024×768: ~52s

  • 600×900: ~35s

Compared to RTX 3060 benchmarks, this is remarkably competitive—especially given the power consumption difference.
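A quick check on the reported timings shows that seconds-per-megapixel is nearly constant across the three resolutions, so generation time scales roughly linearly with pixel count; that makes the timing of other resolutions easy to estimate.

```python
# Reported Z-Image timings: (width, height) -> seconds
timings = {
    (1024, 1024): 70,
    (1024, 768): 52,
    (600, 900): 35,
}

rates = []
for (w, h), seconds in timings.items():
    megapixels = w * h / 1e6
    rates.append(seconds / megapixels)
    print(f"{w}x{h}: {seconds / megapixels:.1f} s/MP")

# All three runs land within a few percent of each other (~65-67 s/MP)
print(max(rates) / min(rates) < 1.05)   # True
```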


OCR: Finally Accurate

Modern AI-based OCR dramatically improves accuracy over legacy methods.

Using DeepSeek OCR:

  • Low memory usage

  • Markdown/Text output

  • High accuracy even with mixed font sizes

Not perfect—but each 1% gain in accuracy saves enormous time.


TTS & Voice Cloning

Using FireRedTTS2:

  • Multi-speaker dialogue supported

  • <5s audio samples required

  • 36s output generated in ~100s

Compared to my RTX 5090D system:

  • GPU power draw: 400W+

  • iGPU power draw: ~20W

  • Performance gap: <3×

Efficiency wins.
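The efficiency claim can be made concrete as energy per generated clip. The discrete GPU's time is an assumption here, inferred from the "<3×" performance gap (i.e. it finishes the same clip in no less than ~100/3 seconds):

```python
def energy_joules(power_w: float, time_s: float) -> float:
    """Energy consumed = average power x wall-clock time."""
    return power_w * time_s

# Article's numbers: 36s of audio generated in ~100s on the ~20W iGPU;
# the 400W discrete GPU is assumed (from the <3x gap) to take >=100/3 s.
igpu = energy_joules(20, 100)        # 2000 J per clip
dgpu = energy_joules(400, 100 / 3)   # ~13,333 J per clip

print(round(dgpu / igpu, 1))   # 6.7
```

Even granting the discrete GPU its best case, the iGPU does the same job on roughly one-seventh of the energy.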


Summary: Turning the Impossible into Reality

Measured Results

  • 20B OSS Q4: ~17 tokens/s

  • 30B A3B Q4: ~15 tokens/s

  • 80B A3B Q4: ~11+ tokens/s

  • 120B OSS Q4: ~11+ tokens/s

  • Z-Image 1024²: ~70s

  • TTS output ratio: ~1:2.8

Intel Core Ultra integrated graphics have crossed a threshold:

  • AI is usable

  • AI is efficient

  • AI is affordable

What was once impossible is now practical—and surprisingly good.


Final Thoughts

If you already own a Core Ultra 200H laptop or mini PC, start experimenting with AI now.
If you’re upgrading, Core Ultra is absolutely worth considering.

AI is no longer the future.
It’s already here—and it runs on integrated graphics.

Thank you for reading.
Wishing you success, prosperity, and a great year ahead 🚀

 

Source:

https://zhuanlan.zhihu.com/p/1983964097180620690?share_code=rDDs5TkXj4ds&utm_psn=1983983895553802759

https://mp.weixin.qq.com/s/li46MLD6mhyLG3Og-V4gBQ

 
