DeepSeek’s distilled new R1 AI model can run on a single GPU

bicycledays

DeepSeek’s updated R1 reasoning AI model might be getting the bulk of the AI community’s attention this week. But the Chinese AI lab also released a smaller, “distilled” version of its new R1, DeepSeek-R1-0528-Qwen3-8B, which DeepSeek claims beats comparably sized models on certain benchmarks.

The smaller updated R1, which was built using the Qwen3-8B model Alibaba launched in May as a foundation, performs better than Google’s Gemini 2.5 Flash on AIME 2025, a collection of challenging math questions.

DeepSeek-R1-0528-Qwen3-8B also nearly matches Microsoft’s recently released Phi 4 reasoning plus model on another math skills test, HMMT.

So-called distilled models like DeepSeek-R1-0528-Qwen3-8B are generally less capable than their full-sized counterparts. On the plus side, they’re far less computationally demanding. According to the cloud platform NodeShift, Qwen3-8B requires a GPU with 40GB-80GB of RAM to run (e.g., an Nvidia H100). The full-sized new R1 needs around a dozen 80GB GPUs.
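For a rough sense of why a distilled 8-billion-parameter model fits on one GPU while the full-sized R1 needs a cluster, the sketch below estimates how much memory the model weights alone occupy. It is a back-of-envelope calculation under stated assumptions, not NodeShift’s or DeepSeek’s methodology: it uses nominal parameter counts (8 billion for Qwen3-8B and roughly 671 billion total for the full-sized R1), assumes 80GB GPUs, and ignores the KV cache, activations, and runtime overhead, which for a reasoning model that writes long outputs can take up much of the remaining memory. The function names are illustrative only.

```python
import math

# Back-of-envelope estimate of the memory needed just to hold model weights.
# Assumptions: nominal parameter counts; decimal gigabytes (1 GB = 1e9 bytes);
# KV cache, activations, and framework overhead are deliberately NOT included.

BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "fp8": 1,
}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def gpus_for_weights(num_params: float, bytes_per_param: int, gpu_gb: float = 80.0) -> int:
    """Minimum number of GPUs whose combined memory can hold the weights alone."""
    needed = weight_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed / gpu_gb)


if __name__ == "__main__":
    # Hypothetical comparison using nominal parameter counts.
    for name, params in [("Qwen3-8B", 8e9), ("DeepSeek-R1", 671e9)]:
        for dtype in ("bf16", "fp8"):
            size = BYTES_PER_PARAM[dtype]
            gb = weight_memory_gb(params, size)
            n = gpus_for_weights(params, size)
            print(f"{name} {dtype}: ~{gb:.0f} GB of weights, at least {n} x 80GB GPU(s)")
```

Run as-is, it reports about 16GB of bf16 weights for the 8-billion-parameter model, which fits on a single 80GB card with room left over for the context, and about 671GB of FP8 weights (nine 80GB GPUs at minimum) for a 671-billion-parameter model before any serving overhead is counted.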

DeepSeek trained DeepSeek-R1-0528-Qwen3-8B by taking text generated by the updated R1 and using it to fine-tune Qwen3-8B. On a dedicated web page for the model on the AI dev platform Hugging Face, DeepSeek describes DeepSeek-R1-0528-Qwen3-8B as “for both academic research on reasoning models and industrial development focused on small-scale models.”

DeepSeek-R1-0528-Qwen3-8B is available under a permissive MIT license, meaning it can be used commercially without restriction. Several hosts, including LM Studio, already offer the model through an API.
