Inference
The State of Local and Affordable Inference in October 2025
·1379 words·7 mins
An overview of the current landscape of GPUs and AI compute for local inference as of October 2025, from NVIDIA and AMD to Intel, Apple, and the cloud.
NVIDIA DGX Spark: Underwhelming and Late to the Party
·890 words·5 mins
NVIDIA’s DGX Spark arrives late as an AI inference system whose performance lags behind the competition. With low-speed unified memory, immature software optimization, and heavy competition from Apple, AMD, and Intel, the Spark exposes how little remains of NVIDIA’s CUDA moat.
Ollama Environment Configuration
·66 words·1 min
Key environment settings for running Ollama efficiently on Windows and Linux.
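As a quick taste of what configuration like this looks like, here is a minimal Python sketch that launches `ollama serve` with a few of Ollama's documented environment variables. The specific values are illustrative assumptions, not recommendations from the post; tune them to your hardware.

```python
import os
import subprocess

# Launch the Ollama server with a few commonly documented settings.
# The values below are illustrative assumptions, not tuned recommendations.
env = os.environ.copy()
env["OLLAMA_KEEP_ALIVE"] = "10m"        # keep models resident 10 minutes after last use
env["OLLAMA_MAX_LOADED_MODELS"] = "1"   # cap the number of concurrently loaded models
env["OLLAMA_NUM_PARALLEL"] = "2"        # parallel requests served per loaded model

subprocess.run(["ollama", "serve"], env=env)
```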
Force Ollama to Use Only a Single NVIDIA GPU
·163 words·1 min
Selecting which GPUs are visible to Ollama on Windows.
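The general mechanism, sketched below, is that Ollama honors the standard `CUDA_VISIBLE_DEVICES` variable, so restricting it to a single index hides the other GPUs. The index `0` and the launch style are assumptions for illustration, not necessarily the post's exact steps.

```python
import os
import subprocess

# Expose only the first NVIDIA GPU to Ollama; CUDA enumerates devices
# by index, and "0" here is an assumed choice for a single-GPU setup.
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "0"

# On Windows the same variable can instead be set persistently,
# e.g. `setx CUDA_VISIBLE_DEVICES 0`, followed by restarting Ollama.
subprocess.run(["ollama", "serve"], env=env)
```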
Base Mac Mini M4: The Alternative to Low-End NVIDIA Hardware for Inference
·741 words·4 mins
How the Mac Mini M4 has enabled affordable local LLM inference, making VRAM-starved NVIDIA cards obsolete for low-power, low-cost builds.
