On-Device AI for Your iOS App

Ditch per-token API fees. Give your users unlimited AI access with one flat monthly cost. Ship in 3 lines of code.

On-device
No API fees
SwiftUI
main.swift
import Kuzco

let session = try await KuzcoSession(model: .qwen3_4b)

for try await partial in session.streamResponse(to: "Hello!") {
    print(partial.text, terminator: "")
}

Everything you need for on-device AI

Kuzco SDK provides a complete toolkit for running AI models locally on iOS devices. No cloud required.

Text Generation

Stream responses in real-time. Build chatbots on-device.

What is Swift?

Vision AI

Analyze images locally with vision models.

🐱
cat detected
98% confidence

Image Generation

Create images with Stable Diffusion.

Privacy First

All processing on-device. Data never leaves.

🔒
100% On-Device
No data transmitted

Model Manager

Download and manage models easily.

Llama 3.2 1B: ✓ Ready
Stable Diffusion: 67%
Whisper: Install

SwiftUI Ready

Drop-in components for chat UIs and more.

ChatView.swift
import SwiftUI
import Kuzco

// Host view that embeds the SDK's drop-in chat component.
struct ChatView: View {
    var body: some View {
        KuzcoChat()
    }
}

Built for performance

Optimized models, maximum context, and real-time generation across all Apple devices.

AI Models

Text, vision, and image generation — constantly growing.

Text & Vision: 8
Image Gen: 3
Total models: 11

Model Sizes

Optimized models from 1.1GB to 5GB.

DeepSeek R1: 1.1 GB
LLaMA 3.2: 2 GB
SmolVLM2: 2.2 GB
Phi 4 Mini: 2.5 GB
Gemma 3: 2.5 GB
Qwen 3 4B: 2.7 GB
SD 2.1: 3 GB
Qwen VL: 4 GB
Qwen 3 8B: 5 GB
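
As a rough illustration of how the sizes listed above might guide model choice, the sketch below filters them against a storage budget. The sizes are copied from this page. The `modelsFitting(budgetGB:)` helper and the 2.5 GB budget are hypothetical and are not part of the Kuzco SDK. Download size is only a proxy, since runtime memory use can differ.

```swift
// Download sizes in GB, copied from the list on this page.
let modelSizesGB: [(name: String, sizeGB: Double)] = [
    ("DeepSeek R1", 1.1),
    ("LLaMA 3.2", 2.0),
    ("SmolVLM2", 2.2),
    ("Phi 4 Mini", 2.5),
    ("Gemma 3", 2.5),
    ("Qwen 3 4B", 2.7),
    ("SD 2.1", 3.0),
    ("Qwen VL", 4.0),
    ("Qwen 3 8B", 5.0),
]

/// Hypothetical helper: names of models whose download size fits the budget.
func modelsFitting(budgetGB: Double) -> [String] {
    modelSizesGB.filter { $0.sizeGB <= budgetGB }.map { $0.name }
}

// Example: which models fit in an assumed 2.5 GB storage budget?
print(modelsFitting(budgetGB: 2.5))
```

An app could run a check like this before offering a download, so that users on storage-constrained devices only see models they can actually install.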

32K Context

8x more context than Apple Intelligence.

Apple Intelligence: 4K tokens
Kuzco (Qwen 3): 32K tokens
8x longer context
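
The 8x figure follows from dividing the two context sizes. The sketch below works through that arithmetic and shows one way to estimate whether a prompt fits a given window. It assumes "4K" and "32K" mean 4,096 and 32,768 tokens, and it uses a rough heuristic of about 4 characters per token instead of a real tokenizer. The helper names and the 512-token reply reserve are hypothetical.

```swift
// Assumption: "4K" and "32K" are read as powers of two.
let appleContextTokens = 4_096
let kuzcoContextTokens = 32_768

let contextRatio = Double(kuzcoContextTokens) / Double(appleContextTokens)
print(contextRatio) // 8.0

/// Rough estimate: about 4 characters per token (heuristic, not a tokenizer).
func estimatedTokens(for text: String) -> Int {
    (text.count + 3) / 4
}

/// Hypothetical check: does the prompt fit, leaving room for the reply?
func fits(_ text: String, contextTokens: Int, reservedForReply: Int = 512) -> Bool {
    estimatedTokens(for: text) + reservedForReply <= contextTokens
}

// A 20,000-character document is roughly 5,000 tokens by this estimate.
let document = String(repeating: "a", count: 20_000)
print(fits(document, contextTokens: appleContextTokens)) // false
print(fits(document, contextTokens: kuzcoContextTokens)) // true
```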

All Apple Devices

iPhone, iPad, Mac, and Vision Pro.

📱 iPhone
📱 iPad
💻 Mac
🥽 Vision Pro

Works Offline

No internet required. Run AI anywhere.

Airplane Mode?
Still works.

Faster Generation

Outperforms Apple Intelligence on-device.

Apple Intelligence: ~15 tok/s
Kuzco (Qwen3-4B): ~22 tok/s
~50% faster
Benchmarked on iPhone 17
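
To put the throughput figures in concrete terms, the sketch below computes the relative speedup and the time to generate a reply of a given length. The two rates are the approximate numbers quoted above. The helper function and the 330-token reply length are illustrative assumptions. Taken at face value, 22 / 15 is about 1.47, so the speedup comes out a little under 50%.

```swift
// Approximate throughput figures quoted on this page (tokens per second).
let appleTokensPerSecond = 15.0
let kuzcoTokensPerSecond = 22.0

// Relative speedup in percent.
let speedupPercent = (kuzcoTokensPerSecond / appleTokensPerSecond - 1) * 100
print(speedupPercent.rounded()) // 47.0

/// Hypothetical helper: seconds needed to generate `tokens` at a given rate.
func secondsToGenerate(tokens: Int, tokensPerSecond: Double) -> Double {
    Double(tokens) / tokensPerSecond
}

// Example: an assumed 330-token reply.
print(secondsToGenerate(tokens: 330, tokensPerSecond: appleTokensPerSecond)) // 22.0
print(secondsToGenerate(tokens: 330, tokensPerSecond: kuzcoTokensPerSecond)) // 15.0
```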

Build AI apps in minutes

Simple, intuitive APIs for text, image, and vision AI. All running locally on-device.

import Kuzco

// Create a session with your preferred model
let session = try await KuzcoSession(model: .qwen3_4b)

// Stream responses in real-time
for try await partial in session.streamResponse(to: "Explain Swift") {
    print(partial.text, terminator: "")
    if partial.isComplete {
        print("Tokens: \(partial.usage?.totalTokens ?? 0)")
    }
}

// Or get a complete response
let response = try await session.respond(to: "Hello!")
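
When building UI around a streamed reply, it can help to test the accumulation logic without loading a model. The sketch below does this with a mock stream built on Swift's standard `AsyncThrowingStream`. `MockPartial`, `mockStream`, and `collect` are hypothetical stand-ins and not Kuzco types. The sketch also assumes each partial carries an incremental chunk of text, which the `terminator: ""` printing above suggests but this page does not confirm.

```swift
// Hypothetical stand-in for the SDK's partial type; not the real Kuzco definition.
struct MockPartial {
    let text: String       // assumed to be an incremental chunk, not the full reply so far
    let isComplete: Bool
}

/// Builds a stream that yields the given chunks in order, then finishes.
func mockStream(_ chunks: [String]) -> AsyncThrowingStream<MockPartial, Error> {
    AsyncThrowingStream { continuation in
        for (index, chunk) in chunks.enumerated() {
            let isLast = index == chunks.count - 1
            continuation.yield(MockPartial(text: chunk, isComplete: isLast))
        }
        continuation.finish()
    }
}

/// Concatenates every chunk from the stream into the full reply.
func collect(_ stream: AsyncThrowingStream<MockPartial, Error>) async throws -> String {
    var fullText = ""
    for try await partial in stream {
        fullText += partial.text
    }
    return fullText
}

let reply = try await collect(mockStream(["Swift ", "is ", "a language."]))
print(reply) // Swift is a language.
```

If the real partials turn out to carry the cumulative text instead, `collect` would keep only the latest value instead of appending.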

Get early access

Be among the first to build on-device AI apps with Kuzco SDK. No credit card required. We'll notify you when it's ready.

Free to start
No spam
Unsubscribe anytime