1,000+ Models

Run the world's best
AI models locally

From tiny 0.5B parameter models to massive 72B giants – all running natively on your Apple Silicon Mac with MLX optimization.

MLX Models

Apple's Machine Learning Framework

MLX models are specifically optimized for Apple Silicon. They leverage the unified memory architecture of M-series chips for maximum performance and efficiency.
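As a sketch of what this looks like in practice, Apple's mlx-lm tooling can fetch and run a quantized MLX model in two commands (assumptions: Apple Silicon, Python installed, and the model repo below is just one example from the mlx-community Hugging Face organization):

```shell
# Install Apple's MLX LM tooling (Apple Silicon only)
pip install mlx-lm

# Download and run a 4-bit quantized model from Hugging Face
# (the repo name is an example; any mlx-community model works)
mlx_lm.generate \
  --model mlx-community/Qwen2.5-7B-Instruct-4bit \
  --prompt "Explain unified memory in one sentence." \
  --max-tokens 100
```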

GGUF Models

Powered by llama.cpp

GGUF is the universal format supported by llama.cpp. Access thousands of community-quantized models from Hugging Face with broad compatibility.
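For reference, a minimal llama.cpp session looks roughly like this (a sketch assuming Homebrew and a GGUF file already downloaded from Hugging Face; the filename is a placeholder for whichever quantization you fetched):

```shell
# Install llama.cpp via Homebrew
brew install llama.cpp

# Run a community-quantized GGUF model
# (-m: model path, -p: prompt, -n: number of tokens to generate)
llama-cli -m ./qwen2.5-7b-instruct-q4_k_m.gguf \
  -p "Write a haiku about local inference." \
  -n 128
```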

Popular Model Families

Qwen

Alibaba's powerful multilingual model family. Excellent reasoning and coding capabilities.

Available Sizes

0.5B · 1.5B · 3B · 7B · 14B · 32B · 72B

Formats

MLX · GGUF

Llama

Meta's open-weight flagship model family. Strong performance across a wide range of benchmarks.

Available Sizes

1B · 3B · 8B · 70B

Formats

MLX · GGUF

Gemma

Google's lightweight yet powerful model family. Great balance of speed and capability.

Available Sizes

2B · 7B · 9B · 27B

Formats

MLX · GGUF

Mistral

French AI lab's efficient models. Exceptional performance relative to model size.

Available Sizes

7B · 8x7B · 8x22B

Formats

MLX · GGUF

Phi

Microsoft's small language models. Surprising capability in tiny packages.

Available Sizes

1.5B · 2.7B · 3.8B · 14B

Formats

MLX · GGUF

DeepSeek

DeepSeek's cutting-edge reasoning models. Exceptional math and coding performance.

Available Sizes

1.5B · 7B · 8B · 16B · 67B

Formats

MLX · GGUF

And thousands more from Hugging Face and the open-source community.

Download & Explore Models

Optimized for Apple Silicon

MLX Acceleration

Native Metal GPU acceleration through Apple's MLX framework. Unified memory means no costly copies between CPU and GPU.

Efficient Quantization

Run larger models in less memory with 4-bit and 8-bit quantization, with minimal loss in output quality for most workloads.
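The memory savings are easy to estimate, since weight storage scales linearly with bits per parameter. A rough back-of-the-envelope sketch (weights only, ignoring KV cache and runtime overhead):

```python
def approx_weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: parameters x bits / 8 bits per byte.

    The 1e9 factors for billions of parameters and bytes-per-GB cancel out.
    """
    return params_billions * bits_per_weight / 8

# A 7B-parameter model at different precisions (weights only):
print(approx_weight_gb(7, 16))  # fp16:  14.0 GB
print(approx_weight_gb(7, 8))   # 8-bit:  7.0 GB
print(approx_weight_gb(7, 4))   # 4-bit:  3.5 GB
```

This is why a 4-bit 7B model fits comfortably on a 16 GB Mac, while the same model at full fp16 precision would leave little headroom for anything else.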

Local Inference

Every token generated on your device. No network latency, no rate limits, no API costs.