Configuration

Name: Kuzco
Author: Kuzco

Tune model behavior with KuzcoConfiguration by adjusting parameters such as temperature, maximum output tokens, and sampling strategy.

Basic Configuration

Create a configuration and pass it when initializing a session:

import Kuzco
let config = KuzcoConfiguration(
    temperature: 0.7,
    maxTokens: 1024
)
let session = try await KuzcoSession(model: .qwen3_4b, configuration: config)

Configuration Properties

Property	Type	Default	Description
`temperature`	Float	0.7	Controls randomness in the range 0.0-2.0. Lower = more focused, higher = more creative
`maxTokens`	Int	2048	Maximum tokens to generate in response
`topK`	Int	40	Number of top tokens to consider for sampling
`topP`	Float	0.9	Nucleus sampling threshold (0.0-1.0)
`repeatPenalty`	Float	1.1	Penalty for repeating tokens. Higher = less repetition
`contextLength`	Int?	nil	Override context window size (model default if nil)
`stopSequences`	[String]	[]	Sequences that stop generation when encountered
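
The `repeatPenalty` row is easier to follow with numbers. The sketch below uses one common convention for repetition penalties: logits of tokens that have already appeared are divided by the penalty when positive and multiplied by it when negative. The source does not say which formula Kuzco uses internally, so `applyRepeatPenalty` and its parameters are hypothetical names chosen for illustration.

```swift
// Hypothetical helper, not part of Kuzco: lowers the logits of tokens
// that already appeared in the output.
func applyRepeatPenalty(_ logits: [Double], seen: Set<Int>, penalty: Double) -> [Double] {
    precondition(penalty > 0, "penalty must be positive")
    var adjusted = logits
    for index in seen where logits.indices.contains(index) {
        // With penalty > 1, dividing a positive logit or multiplying a
        // negative one both make the token less likely.
        adjusted[index] = logits[index] > 0
            ? logits[index] / penalty
            : logits[index] * penalty
    }
    return adjusted
}

let logits = [3.0, -1.0, 2.0]
// Tokens 0 and 1 were already generated; token 2 was not.
let penalized = applyRepeatPenalty(logits, seen: [0, 1], penalty: 1.1)
let unchanged = applyRepeatPenalty(logits, seen: [0, 1], penalty: 1.0)
print(penalized, unchanged)
```

Under this convention a penalty of 1.0 leaves every logit untouched, and values above 1.0 push repeated tokens down.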

Full Configuration Example

let config = KuzcoConfiguration(
    temperature: 0.8,
    maxTokens: 4096,
    topK: 50,
    topP: 0.95,
    repeatPenalty: 1.2,
    contextLength: 8192,
    stopSequences: ["\n\n", "User:", "END"]
)
let session = try await KuzcoSession(model: .qwen3_8b, configuration: config)

Configuration Presets

Use built-in presets for common use cases:

.default

temperature: 0.7, maxTokens: 2048

Balanced settings for general-purpose chat and completion tasks.

.creative

temperature: 1.0, topP: 0.95

Higher randomness for creative writing, brainstorming, and storytelling.

.precise

temperature: 0.3, topK: 20

Lower randomness for factual responses, Q&A, and technical queries.

.coding

temperature: 0.2, repeatPenalty: 1.0

Optimized for code generation: a low temperature for consistent output and a repeat penalty below the 1.1 default, since code legitimately repeats tokens.

.lowMemory

contextLength: 2048, maxTokens: 512

Reduced memory footprint for constrained environments.

.performance

maxTokens: 256, topK: 10

Optimized for fast responses with limited output length.

// Using presets
let creativeSession = try await KuzcoSession(
    model: .qwen3_4b,
    configuration: .creative
)
let codingSession = try await KuzcoSession(
    model: .phi4_mini,
    configuration: .coding
)
let lowMemorySession = try await KuzcoSession(
    model: .deepseekR1_1_5b,
    configuration: .lowMemory
)

Custom Presets

Start from a built-in preset and override individual properties:

// Start from a preset and modify
var config = KuzcoConfiguration.creative
config.maxTokens = 4096
config.stopSequences = ["THE END"]
let session = try await KuzcoSession(model: .qwen3_4b, configuration: config)
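
If the same modified preset is needed in several places, it can be given a name. The sketch below shows the pattern on a stand-in struct, `SamplingSettings`, which mirrors a few documented property names and defaults but is not Kuzco's actual type. It also assumes the configuration is a value type, which the source does not state. `longStory` is a hypothetical preset name.

```swift
// Stand-in type for illustration only; not Kuzco's KuzcoConfiguration.
struct SamplingSettings {
    var temperature: Float = 0.7
    var maxTokens: Int = 2048
    var topP: Float = 0.9
    var stopSequences: [String] = []
}

extension SamplingSettings {
    // Values taken from the .creative preset description.
    static let creative = SamplingSettings(temperature: 1.0, topP: 0.95)

    // Hypothetical project-specific preset derived from .creative.
    static var longStory: SamplingSettings {
        var settings = SamplingSettings.creative
        settings.maxTokens = 4096
        settings.stopSequences = ["THE END"]
        return settings
    }
}

let story = SamplingSettings.longStory
print(story.maxTokens, SamplingSettings.creative.maxTokens)
```

Because the stand-in is a struct, modifying the copy leaves the original preset unchanged.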

Understanding Temperature

Low Temperature (0.0 - 0.3)

More deterministic and focused. Best for factual queries, code, and when you need consistent outputs.

Medium Temperature (0.4 - 0.7)

Balanced creativity and coherence. Good default for general-purpose chat.

High Temperature (0.8 - 1.5)

More random and creative. Best for brainstorming, creative writing, and exploring diverse ideas.

// Factual response
let factual = KuzcoConfiguration(temperature: 0.1)
// Creative story
let story = KuzcoConfiguration(temperature: 1.2)
// Balanced chat
let chat = KuzcoConfiguration(temperature: 0.7)
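
To see why lower temperatures give more focused output, here is a minimal sketch of temperature-scaled softmax, the standard way temperature reshapes a token distribution. It is independent of Kuzco; the function name `softmax` and the example logits are made up for illustration, and treating a temperature of 0 as greedy selection is an assumption of this sketch.

```swift
import Foundation

// Hypothetical helper, not part of Kuzco: converts raw logits into
// probabilities after dividing them by the temperature.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    precondition(!logits.isEmpty, "logits must not be empty")
    // A temperature of 0 would divide by zero, so this sketch treats it as
    // greedy decoding: all probability goes to the highest logit.
    guard temperature > 0 else {
        let best = logits.indices.max(by: { logits[$0] < logits[$1] })!
        return logits.indices.map { $0 == best ? 1.0 : 0.0 }
    }
    let scaled = logits.map { $0 / temperature }
    // Subtract the maximum before exponentiating for numerical stability.
    let maxLogit = scaled.max()!
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

let logits = [2.0, 1.0, 0.0]
let focused = softmax(logits, temperature: 0.1)
let balanced = softmax(logits, temperature: 0.7)
let creative = softmax(logits, temperature: 1.2)
print(focused, balanced, creative)
```

As the temperature rises, the top token's share of the probability shrinks and the alternatives become more likely to be sampled.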

Understanding Top-K and Top-P

These parameters control token sampling diversity:

Top-K Sampling

Limits selection to the K most likely tokens. Lower K = more focused, higher K = more diverse.

topK: 10 (focused) → topK: 100 (diverse)
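
A minimal sketch of top-K filtering over a toy distribution, assuming the usual formulation: keep the K most likely tokens, zero out the rest, and renormalize. `topKFilter` is a hypothetical helper and does not reflect Kuzco's internals.

```swift
// Hypothetical helper, not part of Kuzco: keeps the k most likely tokens
// and renormalizes so the kept probabilities sum to 1.
func topKFilter(_ probs: [Double], k: Int) -> [Double] {
    precondition(k > 0, "k must be positive")
    // Indices of the k highest-probability tokens.
    let keep = Set(probs.indices.sorted { probs[$0] > probs[$1] }.prefix(k))
    let kept = probs.indices.map { keep.contains($0) ? probs[$0] : 0.0 }
    let total = kept.reduce(0, +)
    guard total > 0 else { return kept }
    return kept.map { $0 / total }
}

let probs = [0.5, 0.3, 0.15, 0.05]
// Only the two most likely tokens survive, at roughly 0.625 and 0.375.
let filtered = topKFilter(probs, k: 2)
print(filtered)
```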

Top-P (Nucleus) Sampling

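Selects from the smallest set of tokens whose cumulative probability reaches P. The size of that set adapts at each step: it is small when the model is confident and larger when probability is spread across many tokens.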

topP: 0.5 (focused) → topP: 0.95 (diverse)
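
The nucleus rule can be sketched the same way: walk the tokens from most to least likely and stop once the running total reaches P. `topPFilter` is a hypothetical helper, and using `>=` at the threshold is a choice made for this sketch; implementations differ on the exact boundary.

```swift
// Hypothetical helper, not part of Kuzco: keeps the smallest set of tokens
// whose cumulative probability reaches p, then renormalizes.
func topPFilter(_ probs: [Double], p: Double) -> [Double] {
    precondition(p > 0 && p <= 1, "p must be in (0, 1]")
    let order = probs.indices.sorted { probs[$0] > probs[$1] }
    var keep = Set<Int>()
    var cumulative = 0.0
    for index in order {
        keep.insert(index)
        cumulative += probs[index]
        if cumulative >= p { break }
    }
    let kept = probs.indices.map { keep.contains($0) ? probs[$0] : 0.0 }
    let total = kept.reduce(0, +)
    guard total > 0 else { return kept }
    return kept.map { $0 / total }
}

let probs = [0.5, 0.3, 0.15, 0.05]
let narrow = topPFilter(probs, p: 0.6)  // keeps the top two tokens
let wide = topPFilter(probs, p: 0.9)    // keeps the top three tokens
print(narrow, wide)
```

Unlike top-K, the number of surviving tokens depends on the shape of the distribution rather than on a fixed count.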

Stop Sequences

Configure sequences that stop generation when encountered:

let config = KuzcoConfiguration(
    stopSequences: [
        "\n\nHuman:",    // Stop at conversation turn
        "---",            // Stop at separator
        "THE END",        // Stop at story ending
        "```"          // Stop at code block end
    ]
)
let session = try await KuzcoSession(model: .qwen3_4b, configuration: config)
// Generation will stop when any stop sequence is encountered
let response = try await session.oneShot("Write a short poem")
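
Conceptually, stop sequences act like cutting the output at the earliest match. The sketch below shows that behavior on a plain string. `truncate(_:atFirstOf:)` is a hypothetical helper, and excluding the stop sequence itself from the result is an assumption; the source does not say whether Kuzco includes the matched text in the response.

```swift
import Foundation

// Hypothetical helper, not part of Kuzco: cuts text at the earliest
// occurrence of any stop sequence, dropping the stop sequence itself.
func truncate(_ text: String, atFirstOf stopSequences: [String]) -> String {
    var cut = text.endIndex
    for stop in stopSequences where !stop.isEmpty {
        if let range = text.range(of: stop), range.lowerBound < cut {
            cut = range.lowerBound
        }
    }
    return String(text[..<cut])
}

let raw = "Roses are red\nTHE END\n---\nextra"
// "THE END" appears before "---", so the cut happens there.
let trimmed = truncate(raw, atFirstOf: ["---", "THE END"])
print(trimmed)
```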

Dynamic Configuration

Change the configuration of an existing session between prompts:

let session = try await KuzcoSession(model: .qwen3_4b)
// Start with default settings
let response1 = try await session.oneShot("What is 2+2?")
// Switch to creative mode for the next prompt
session.updateConfiguration(.creative)
let response2 = try await session.oneShot("Write a haiku about coding")
// Or use custom configuration
session.updateConfiguration(KuzcoConfiguration(temperature: 0.1))
let response3 = try await session.oneShot("List the planets in order")

Vision AI Model Management