Available Models

Name: Kuzco
Author: Kuzco

Kuzco provides a variety of on-device models for text generation, vision, and image creation. All models run locally on the device.

Text Generation Models

These models are optimized for chat, completion, and general text generation tasks.

Qwen3 4B

.qwen3_4b

Excellent balance of performance and size. Recommended for most use cases.

~2.5 GB · 32K tokens

Qwen3 8B

.qwen3_8b

Higher quality responses with more nuanced understanding.

~5 GB · 32K tokens

LLaMA 3 3B

.llama3_3b

Fast and efficient. Good for simpler tasks and quick responses.

~2 GB · 8K tokens

Phi-4 Mini

.phi4_mini

Microsoft's compact model with strong reasoning capabilities.

~2.3 GB · 16K tokens

Gemma 3 4B

.gemma3_4b

Google's efficient model optimized for mobile devices.

~2.7 GB · 8K tokens

DeepSeek R1 1.5B

.deepseekR1_1_5b

Ultra-lightweight model for basic tasks with minimal memory footprint.

~1 GB · 4K tokens

// Using text models
let session = try await KuzcoSession(model: .qwen3_4b)
let response = try await session.oneShot("Explain quantum computing simply.")
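
The sizes and token limits listed above can also drive model selection in code. The sketch below is a hypothetical local catalog, not part of Kuzco's API: `TextModelInfo`, `textModels`, and `largestModel(fittingGB:minContextK:)` are illustrative names, and the identifiers are stored as plain strings copied from this page. It picks the largest listed text model that fits a size budget and meets a minimum token limit.

```swift
import Foundation

/// Hypothetical record mirroring the figures on this page (not a Kuzco type).
struct TextModelInfo {
    let identifier: String    // case name as listed above, e.g. "qwen3_4b"
    let approxSizeGB: Double  // approximate size in GB
    let contextK: Int         // token limit in thousands, as listed
}

/// Values copied from the model list above.
let textModels: [TextModelInfo] = [
    TextModelInfo(identifier: "qwen3_4b", approxSizeGB: 2.5, contextK: 32),
    TextModelInfo(identifier: "qwen3_8b", approxSizeGB: 5.0, contextK: 32),
    TextModelInfo(identifier: "llama3_3b", approxSizeGB: 2.0, contextK: 8),
    TextModelInfo(identifier: "phi4_mini", approxSizeGB: 2.3, contextK: 16),
    TextModelInfo(identifier: "gemma3_4b", approxSizeGB: 2.7, contextK: 8),
    TextModelInfo(identifier: "deepseekR1_1_5b", approxSizeGB: 1.0, contextK: 4),
]

/// Returns the largest listed model whose size fits `budgetGB` and whose
/// token limit is at least `minContextK`, or nil if none qualifies.
func largestModel(fittingGB budgetGB: Double, minContextK: Int = 0) -> TextModelInfo? {
    textModels
        .filter { $0.approxSizeGB <= budgetGB && $0.contextK >= minContextK }
        .max { $0.approxSizeGB < $1.approxSizeGB }
}
```

Treating size as a proxy for quality is a simplification; the descriptions above suggest that model family matters as well.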

Vision Models

Vision models can analyze images and answer questions about visual content.

Qwen3 VL

.qwen3VL

Multimodal model for image understanding and visual Q&A.

~4 GB · 8K tokens

SmolVLM

.smolVLM

Compact vision-language model for efficient image analysis.

~2 GB · 4K tokens

// Using vision models (`image` is an image you have already loaded)
let session = try await KuzcoSession(model: .qwen3VL)
let response = try await session.analyzeImage(
    image,
    prompt: "What objects are in this image?"
)

Image Generation Models

Generate images from text prompts using diffusion models.

Stable Diffusion 2.1

.stableDiffusion21

Generate images from text prompts with customizable dimensions.

~3.5 GB

// Using image generation
let generator = try await KuzcoImageGenerator(model: .stableDiffusion21)
let image = try await generator.generate(
    prompt: "A serene mountain landscape at sunset",
    width: 512,
    height: 512
)
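
Stable Diffusion's autoencoder works on latents downsampled by a factor of 8, so reference implementations generally expect width and height to be divisible by 8. Whether Kuzco enforces or adjusts this is not stated on this page, so the helper below is only a defensive sketch. `snappedDimension` and its default `minimum` of 64 are hypothetical and not part of the library.

```swift
/// Rounds `value` down to a multiple of `multiple`, after raising it to at
/// least `minimum`. Hypothetical helper, not part of Kuzco.
func snappedDimension(_ value: Int, multiple: Int = 8, minimum: Int = 64) -> Int {
    let clamped = max(value, minimum)
    return (clamped / multiple) * multiple
}

// Example: sanitize user-supplied sizes before passing them as width/height.
let width = snappedDimension(515)
let height = snappedDimension(10)
```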

Choosing the Right Model

Use Case	Recommended Model	Why
General chat	`.qwen3_4b`	Best balance of quality and speed
Complex reasoning	`.qwen3_8b`	Larger model, more nuanced understanding
Quick responses	`.llama3_3b`	Fastest generation speed
Low memory devices	`.deepseekR1_1_5b`	Smallest memory footprint
Code generation	`.phi4_mini`	Strong at coding tasks
Image analysis	`.qwen3VL`	Best vision capabilities
Image generation	`.stableDiffusion21`	Only image gen option
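
The table above can be encoded as a small lookup so the choice lives in one place in an app. This is a minimal sketch: `UseCase` and `recommendedModelName(for:)` are hypothetical names, and the function returns the identifier as a string rather than a Kuzco model value, since the library's model type is not shown on this page.

```swift
/// Hypothetical use-case enum mirroring the rows of the table above.
enum UseCase {
    case generalChat, complexReasoning, quickResponses, lowMemory
    case codeGeneration, imageAnalysis, imageGeneration
}

/// Returns the recommended model identifier (as listed in the table) for a use case.
func recommendedModelName(for useCase: UseCase) -> String {
    switch useCase {
    case .generalChat:      return "qwen3_4b"
    case .complexReasoning: return "qwen3_8b"
    case .quickResponses:   return "llama3_3b"
    case .lowMemory:        return "deepseekR1_1_5b"
    case .codeGeneration:   return "phi4_mini"
    case .imageAnalysis:    return "qwen3VL"
    case .imageGeneration:  return "stableDiffusion21"
    }
}
```

Because the `switch` is exhaustive, adding a new use case will not compile until a model is assigned to it.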

Model Download

Models download automatically on first use, or you can pre-download them for a smoother user experience:

// Check if model is available
let isAvailable = await KuzcoModelManager.shared.isModelAvailable(.qwen3_4b)
// Download with progress tracking
for try await progress in KuzcoModelManager.shared.downloadModel(.qwen3_4b) {
    print("Download: \(Int(progress.progress * 100))%")
}
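
Since the models listed here range from roughly 1 GB to 5 GB, it can help to check free disk space before starting a download. The sketch below uses only Foundation's `FileManager`. `freeDiskBytes`, `hasRoom`, and the 1 GB safety margin are hypothetical and not part of Kuzco, and the sizes are the approximate figures from this page.

```swift
import Foundation

/// Free bytes on the volume containing `path`, or nil if the query fails.
func freeDiskBytes(atPath path: String = NSHomeDirectory()) -> Int64? {
    guard let attrs = try? FileManager.default.attributesOfFileSystem(forPath: path),
          let free = attrs[.systemFreeSize] as? NSNumber else { return nil }
    return free.int64Value
}

/// True when the approximate model size plus a safety margin fits in `freeBytes`.
/// Uses decimal gigabytes (1 GB = 1,000,000,000 bytes).
func hasRoom(forModelGB sizeGB: Double, freeBytes: Int64, marginGB: Double = 1.0) -> Bool {
    let neededBytes = (sizeGB + marginGB) * 1_000_000_000
    return Double(freeBytes) >= neededBytes
}

// Example: decide whether to start a ~2.5 GB download.
if let free = freeDiskBytes(), !hasRoom(forModelGB: 2.5, freeBytes: free) {
    print("Not enough free space to download the model.")
}
```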

See Model Management for detailed download and storage management.
