Run the world's best
AI models locally
From tiny 0.5B parameter models to massive 72B giants – all running natively on your Apple Silicon Mac with MLX optimization.
MLX Models
Apple's Machine Learning Framework
MLX models are specifically optimized for Apple Silicon. They leverage the unified memory architecture of M-series chips for maximum performance and efficiency.
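As a rough illustration of why unified memory matters for model sizing, here is a back-of-the-envelope fit check. The bytes-per-parameter rule and the macOS memory reserve are assumptions for the sketch, not MLX internals:

```python
def fits_in_unified_memory(params_billion, bits_per_weight, ram_gb, reserve_gb=8):
    """Rule-of-thumb fit check: weight memory is params x bits / 8
    (so 1B params at 8-bit is about 1 GB), leaving reserve_gb for
    macOS and other apps. Both numbers are illustrative assumptions.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb <= ram_gb - reserve_gb

# A 72B model at 4-bit (~36 GB of weights) fits on a 64 GB Mac,
# but not on a 32 GB one; a 0.5B model fits almost anywhere.
print(fits_in_unified_memory(72, 4, 64))   # 64 GB Mac
print(fits_in_unified_memory(72, 4, 32))   # 32 GB Mac
print(fits_in_unified_memory(0.5, 16, 16))
```

Because the GPU reads the same memory the CPU allocated, this single number is the whole budget; there is no separate VRAM pool to manage.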
GGUF Models
Powered by llama.cpp
GGUF is the universal format supported by llama.cpp. Access thousands of community-quantized models from Hugging Face with broad compatibility.
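Every GGUF file starts with a small fixed preamble: the ASCII magic "GGUF", a format version, and the tensor and metadata-entry counts, all little-endian. A minimal reader for just that preamble, as a sketch:

```python
import struct

def read_gguf_header(path):
    """Read the fixed GGUF preamble: 4-byte magic b'GGUF', then a
    little-endian uint32 version, uint64 tensor count, and uint64
    metadata key/value count (20 bytes after the magic)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensors": n_tensors, "metadata_keys": n_kv}
```

This is a quick way to sanity-check a multi-gigabyte download before handing it to llama.cpp; the full format (metadata types, tensor descriptors, alignment) is defined in the GGUF specification.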
Popular Model Families
Qwen
Alibaba's powerful multilingual model family. Excellent reasoning and coding capabilities.
Llama
Meta's open-source flagship model. Industry-leading performance across a wide range of benchmarks.
Gemma
Google's lightweight yet powerful model family. Great balance of speed and capability.
Mistral
French AI lab's efficient models. Exceptional performance relative to model size.
Phi
Microsoft's small language models. Surprising capability in tiny packages.
DeepSeek
Cutting-edge reasoning model with exceptional math and coding performance.
And thousands more from Hugging Face and the open-source community.
Download & Explore Models
Optimized for Apple Silicon
MLX Acceleration
Native Metal GPU acceleration through Apple's MLX framework. Unified memory means no copying between CPU and GPU memory.
Efficient Quantization
Run larger models in less memory with 4-bit and 8-bit quantization, usually with minimal quality loss.
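The arithmetic behind the savings is simple: weight memory scales linearly with bits per parameter, so 4-bit weights take a quarter of the space of 16-bit ones. A rough calculator (the 10% overhead factor for runtime buffers is an assumption, not a measured value):

```python
def model_memory_gib(params_billion, bits, overhead=1.10):
    """Rough weight-memory estimate in GiB: parameters x bits / 8 bytes,
    plus an assumed ~10% for runtime buffers. Real usage also depends on
    context length (KV cache), which this sketch ignores."""
    return params_billion * 1e9 * bits / 8 * overhead / 2**30

# A 72B model: ~147 GiB at 16-bit vs ~37 GiB at 4-bit.
print(round(model_memory_gib(72, 16), 1))
print(round(model_memory_gib(72, 4), 1))
```

That 4x reduction is what turns a 72B model from a data-center job into something a 64 GB Mac can hold.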
Local Inference
Every token generated on your device. No network latency, no rate limits, no API costs.
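"Every token generated on your device" comes down to a sampling loop running locally: score candidates, scale by temperature, draw one token, repeat. A toy sketch of that core step (the logits dict and parameter names are illustrative, not any specific runtime's API):

```python
import math
import random

def sample_token(logits, temperature=0.8, rng=random):
    """Pick the next token from a token -> logit mapping.
    temperature=0 means greedy (argmax); otherwise scale logits,
    apply a softmax, and draw proportionally to the probabilities."""
    if temperature == 0:
        return max(logits, key=logits.get)
    scaled = {tok: l / temperature for tok, l in logits.items()}
    peak = max(scaled.values())  # subtract max for numerical stability
    weights = {tok: math.exp(l - peak) for tok, l in scaled.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # guard against float rounding

logits = {"the": 2.0, "a": 0.5, "cat": -1.0}
print(sample_token(logits, temperature=0))  # greedy pick
```

Because this loop never leaves the machine, latency is bounded by compute rather than the network, and nothing meters how many tokens you draw.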