1,000+ Models

Run the world's best
AI models locally

From tiny 0.5B parameter models to massive 72B giants – all running natively on your Apple Silicon Mac with MLX optimization.

MLX Models

Apple's Machine Learning Framework

MLX models are specifically optimized for Apple Silicon. They leverage the unified memory architecture of M-series chips for maximum performance and efficiency.
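As a sketch of what this looks like in practice, Apple's mlx-lm tooling can fetch and run a quantized MLX model in two commands (assumptions: Apple Silicon, Python installed, and the model repo below is just one example from the mlx-community Hugging Face organization):

```shell
# Install Apple's MLX LM tooling (Apple Silicon only)
pip install mlx-lm

# Download and run a 4-bit quantized model from Hugging Face
# (the repo name is an example; any mlx-community model works)
mlx_lm.generate \
  --model mlx-community/Qwen2.5-7B-Instruct-4bit \
  --prompt "Explain unified memory in one sentence." \
  --max-tokens 100
```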

GGUF Models

Powered by llama.cpp

GGUF is the universal format supported by llama.cpp. Access thousands of community-quantized models from Hugging Face with broad compatibility.
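For reference, a minimal llama.cpp session looks roughly like this (a sketch assuming Homebrew and a GGUF file already downloaded from Hugging Face; the filename is a placeholder for whichever quantization you fetched):

```shell
# Install llama.cpp via Homebrew
brew install llama.cpp

# Run a community-quantized GGUF model
# (-m: model path, -p: prompt, -n: number of tokens to generate)
llama-cli -m ./qwen2.5-7b-instruct-q4_k_m.gguf \
  -p "Write a haiku about local inference." \
  -n 128
```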

Popular Model Families

Qwen

Alibaba's powerful multilingual model family. Excellent reasoning and coding capabilities.

Available Sizes

0.5B · 1.5B · 3B · 7B · 14B · 32B · 72B

Formats

MLX · GGUF

Llama

Meta's open-weight flagship model family. Strong performance across a wide range of benchmarks.

Available Sizes

1B · 3B · 8B · 70B

Formats

MLX · GGUF

Gemma

Google's lightweight yet powerful model family. Great balance of speed and capability.

Available Sizes

2B · 7B · 9B · 27B

Formats

MLX · GGUF

Mistral

French AI lab's efficient models. Exceptional performance relative to model size.

Available Sizes

7B · 8x7B · 8x22B

Formats

MLX · GGUF

Phi

Microsoft's small language models. Surprising capability in tiny packages.

Available Sizes

1.5B · 2.7B · 3.8B · 14B

Formats

MLX · GGUF

DeepSeek

DeepSeek's cutting-edge reasoning models. Exceptional math and coding performance.

Available Sizes

1.5B · 7B · 8B · 16B · 67B

Formats

MLX · GGUF

And thousands more from Hugging Face and the open-source community.

Download & Explore Models

Optimized for Apple Silicon

MLX Acceleration

Native Metal GPU acceleration through Apple's MLX framework. Unified memory means no costly copies between CPU and GPU.

Efficient Quantization

Run larger models in less memory with 4-bit and 8-bit quantization, with minimal loss in output quality for most workloads.
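The memory savings are easy to estimate, since weight storage scales linearly with bits per parameter. A rough back-of-the-envelope sketch (weights only, ignoring KV cache and runtime overhead):

```python
def approx_weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: parameters x bits / 8 bits per byte.

    The 1e9 factors for billions of parameters and bytes-per-GB cancel out.
    """
    return params_billions * bits_per_weight / 8

# A 7B-parameter model at different precisions (weights only):
print(approx_weight_gb(7, 16))  # fp16:  14.0 GB
print(approx_weight_gb(7, 8))   # 8-bit:  7.0 GB
print(approx_weight_gb(7, 4))   # 4-bit:  3.5 GB
```

This is why a 4-bit 7B model fits comfortably on a 16 GB Mac, while the same model at full fp16 precision would leave little headroom for anything else.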

Local Inference

Every token generated on your device. No network latency, no rate limits, no API costs.