Lucebox is a plug-and-play computer built for running local AI models and agents at production speed. It pairs a Ryzen AI MAX+ 395 processor and 128GB of unified LPDDR5X memory with an RTX 3090, all driven by a hand-tuned open-source inference engine built specifically for this hardware. Large models live in the unified memory tier while the 3090 handles high-bandwidth inference workloads. Speculative decoding and speculative prefill deliver up to 10x the output speed of llama.cpp on the same silicon. The entire stack ships pre-installed, and deploying any open model takes a single CLI command.
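To make the two memory tiers concrete, here is a rough capacity check, offered as a sketch rather than a description of how Lucebox actually places models. It estimates the weight footprint from parameter count and quantization width, then reports the smallest tier that could hold the weights. The 24GB figure is the RTX 3090's VRAM and the 128GB figure comes from the description above. The function names are hypothetical, and the estimate ignores the KV cache, activations, and memory reserved by the operating system.

```python
VRAM_GB = 24      # RTX 3090 VRAM
UNIFIED_GB = 128  # unified LPDDR5X, per the product description


def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes). Weights only."""
    return params_billions * bits_per_weight / 8


def smallest_tier(params_billions: float, bits_per_weight: float) -> str:
    """Hypothetical helper: name the smallest tier whose nominal size fits the weights."""
    size = weight_footprint_gb(params_billions, bits_per_weight)
    if size <= VRAM_GB:
        return "vram"
    if size <= UNIFIED_GB:
        return "unified"
    return "too large"


if __name__ == "__main__":
    for params, bits in [(8, 4), (70, 4), (405, 4)]:
        size = weight_footprint_gb(params, bits)
        print(f"{params}B @ {bits}-bit: {size:.1f} GB -> {smallest_tier(params, bits)}")
```

By this arithmetic, a 70B model at 4 bits per weight needs about 35GB for weights alone, which exceeds the 3090's VRAM but fits comfortably in the unified tier.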
Key Features
- Pre-configured hardware stack: Ships with drivers, inference engine, and software fully installed out of the box.
- Dual memory architecture: Uses 128GB of unified LPDDR5X for large model storage alongside the RTX 3090's VRAM as a fast inference tier.
- DFlash and PFlash engines: Speculative decoding and speculative prefill deliver up to 10x speed gains over llama.cpp on the same silicon.
- Open-source inference layer: Full stack published under Luce-Org/lucebox-hub on GitHub, with active community contributions.
- Single command deployment: Any open model can be loaded and run with one CLI command, no environment setup required.
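For readers unfamiliar with speculative decoding, the general idea can be sketched in a few lines. This is a toy illustration of the greedy variant of the technique, not the DFlash implementation, whose internals are not described here. A cheap draft model proposes several tokens, the larger target model checks them, and the longest agreeing prefix is kept. All names below (`speculative_decode`, `toy_target`, `toy_draft`) are hypothetical, and the "models" are plain functions that return the next token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: output matches greedy_decode(target, ...)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target verifies each position. A real engine would score all
        #    positions in one batched forward pass; this loop is for clarity.
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if expected != guess:
                # Keep the agreeing prefix, then take the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every draft token was accepted; the target adds one bonus token.
            tokens.extend(proposal)
            if len(tokens) < goal:
                tokens.append(target(tokens))
    return tokens[len(prompt):]


# Toy models: the target counts upward modulo 10; the draft errs after a 3.
def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    return 0 if tokens[-1] == 3 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], max_new=12))
```

The speedup in practice comes from verifying a whole batch of draft tokens in a single target pass, so the gain depends on how often the draft agrees with the target.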
Benefits
- Immediate setup: No driver configuration, quantization tuning, or environment debugging needed.
- Top inference speeds: Outperforms Mac Studio and DGX Spark on tokens per second at a lower effective cost.
- Complete data privacy: All inference runs locally with no cloud dependency or data exposure.
- Predictable cost: A one-time price of $4,900 replaces ongoing API subscription fees.
- Continuous agent operation: Designed to run models and agents around the clock on hardware you fully own.
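The cost claim above can be checked against your own usage with a simple break-even estimate. The sketch below assumes only the $4,900 price from this description. The monthly API spend and the optional electricity cost are inputs you supply, and `break_even_months` is a hypothetical helper name.

```python
import math

LUCEBOX_PRICE_USD = 4900  # one-time price from the product description


def break_even_months(monthly_api_spend_usd: float,
                      monthly_power_cost_usd: float = 0.0) -> int:
    """Whole months until the one-time price is covered by avoided API fees."""
    saving = monthly_api_spend_usd - monthly_power_cost_usd
    if saving <= 0:
        raise ValueError("no monthly saving, so the purchase never breaks even")
    return math.ceil(LUCEBOX_PRICE_USD / saving)


if __name__ == "__main__":
    # Example inputs are illustrative, not measured figures.
    print(break_even_months(350))      # API spend of $350 per month
    print(break_even_months(500, 10))  # $500 per month, $10 per month electricity
```

The estimate ignores resale value, maintenance, and any difference in model quality between local and hosted options.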