schizoidman@lemm.ee to Technology@lemmy.world English · 7 days ago
DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch (techcrunch.com)
brucethemoose@lemmy.world English · edited 6 days ago
Depends on the quantization. 7B is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to the H20 on a 3090 or 4090 (or even a 3060) if you know a little Docker.
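For a rough sense of why a 7B model fits on those consumer cards, here's a back-of-the-envelope VRAM estimate. This is only a sketch of the weight footprint; real serving with SGLang/vLLM also needs memory for the KV cache and activations:

```python
# Rough sketch: GiB needed just to hold a model's weights at a given precision.
# Ignores KV cache, activations, and framework overhead (illustrative only).

def weight_vram_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate GiB required to store the weights alone."""
    return n_params * bytes_per_param / 1024**3

params = 7e9  # a "7B" model

fp16 = weight_vram_gib(params, 2)    # ~13.0 GiB: tight on a 16 GB card
fp8  = weight_vram_gib(params, 1)    # ~6.5 GiB: fits a 12 GB 3060
int4 = weight_vram_gib(params, 0.5)  # ~3.3 GiB: plenty of headroom

print(f"FP16: {fp16:.1f} GiB, FP8: {fp8:.1f} GiB, INT4: {int4:.1f} GiB")
```

At FP8 the weights alone take around 6.5 GiB, which is why even a 12 GB 3060 can host the model with room left for the KV cache.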