Configuration

Tune generation behavior with KuzcoConfiguration by adjusting parameters such as temperature, maximum tokens, and sampling strategy.

Basic Configuration

Create a configuration and pass it when initializing a session:

import Kuzco

let config = KuzcoConfiguration(
    temperature: 0.7,
    maxTokens: 1024
)
let session = try await KuzcoSession(model: .qwen3_4b, configuration: config)

Configuration Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| temperature | Float | 0.7 | Controls randomness. Lower = more focused, higher = more creative (0.0-2.0) |
| maxTokens | Int | 2048 | Maximum number of tokens to generate in a response |
| topK | Int | 40 | Number of top tokens to consider for sampling |
| topP | Float | 0.9 | Nucleus sampling threshold (0.0-1.0) |
| repeatPenalty | Float | 1.1 | Penalty for repeating tokens. Higher = less repetition |
| contextLength | Int? | nil | Overrides the context window size (model default if nil) |
| stopSequences | [String] | [] | Sequences that stop generation when encountered |
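
To make the repeatPenalty row more concrete, here is a minimal sketch of one common way a repetition penalty is applied to token scores (logits). This document does not state the exact formula Kuzco uses, so the function `applyRepeatPenalty` and its parameters are hypothetical and shown only for illustration.

```swift
/// Lowers the score of every token that has already appeared.
/// Illustrative sketch only; not taken from Kuzco's source.
func applyRepeatPenalty(_ logits: [Double], seenTokenIDs: Set<Int>, penalty: Double) -> [Double] {
    var adjusted = logits
    for id in seenTokenIDs where logits.indices.contains(id) {
        // Dividing a positive logit or multiplying a negative one
        // both reduce the score when penalty > 1.
        adjusted[id] = logits[id] > 0 ? logits[id] / penalty : logits[id] * penalty
    }
    return adjusted
}

let logits = [2.2, -1.0, 0.5]   // hypothetical scores for token IDs 0, 1, 2
let penalized = applyRepeatPenalty(logits, seenTokenIDs: [0, 1], penalty: 1.1)
// Token 0 drops to about 2.0, token 1 drops to -1.1, token 2 is untouched.
let unchanged = applyRepeatPenalty(logits, seenTokenIDs: [0, 1], penalty: 1.0)
// With a penalty of 1.0 this formula leaves every score as it was.
print(penalized, unchanged)
```

Under this formulation a value of 1.0 is a no-op, and values above 1.0 discourage repeats more strongly.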

Full Configuration Example

let config = KuzcoConfiguration(
    temperature: 0.8,
    maxTokens: 4096,
    topK: 50,
    topP: 0.95,
    repeatPenalty: 1.2,
    contextLength: 8192,
    stopSequences: ["\n\n", "User:", "END"]
)
let session = try await KuzcoSession(model: .qwen3_8b, configuration: config)

Configuration Presets

Use built-in presets for common use cases:

.default

temperature: 0.7, maxTokens: 2048

Balanced settings for general-purpose chat and completion tasks.

.creative

temperature: 1.0, topP: 0.95

Higher randomness for creative writing, brainstorming, and storytelling.

.precise

temperature: 0.3, topK: 20

Lower randomness for factual responses, Q&A, and technical queries.

.coding

temperature: 0.2, repeatPenalty: 1.0

Low randomness for consistent code generation. The repeat penalty of 1.0 is lower than the default of 1.1, since code legitimately repeats identifiers and keywords.

.lowMemory

contextLength: 2048, maxTokens: 512

Reduced memory footprint for constrained environments.

.performance

maxTokens: 256, topK: 10

Optimized for fast responses with limited output length.

// Using presets
let creativeSession = try await KuzcoSession(
    model: .qwen3_4b,
    configuration: .creative
)
let codingSession = try await KuzcoSession(
    model: .phi4_mini,
    configuration: .coding
)
let lowMemorySession = try await KuzcoSession(
    model: .deepseekR1_1_5b,
    configuration: .lowMemory
)

Custom Presets

Extend presets with custom modifications:

// Start from a preset and modify
var config = KuzcoConfiguration.creative
config.maxTokens = 4096
config.stopSequences = ["THE END"]
let session = try await KuzcoSession(model: .qwen3_4b, configuration: config)
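
If the same modified preset is needed in several places, one option is to wrap it in a static property. The sketch below uses a stand-in type, `SketchConfiguration`, so that it runs without the library; the type, its subset of properties, and the `longStory` preset are all hypothetical. It assumes KuzcoConfiguration behaves like a struct with mutable properties, as the example above suggests, in which case the same extension pattern could be applied to it.

```swift
// Stand-in type for illustration only; it is NOT the real KuzcoConfiguration.
struct SketchConfiguration {
    var temperature: Float = 0.7
    var maxTokens: Int = 2048
    var topP: Float = 0.9
    var stopSequences: [String] = []

    // Mirrors the documented .creative preset values.
    static let creative = SketchConfiguration(temperature: 1.0, topP: 0.95)
}

extension SketchConfiguration {
    /// Hypothetical reusable preset built on top of `.creative`.
    static var longStory: SketchConfiguration {
        var config = SketchConfiguration.creative   // copy, because structs are value types
        config.maxTokens = 4096
        config.stopSequences = ["THE END"]
        return config
    }
}

let story = SketchConfiguration.longStory
// longStory keeps the creative temperature and raises maxTokens;
// the original preset is unaffected by the modification.
print(story.temperature, story.maxTokens, SketchConfiguration.creative.maxTokens)
```

Because the preset is copied before it is modified, the built-in values stay intact for other callers.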

Understanding Temperature

Low Temperature (0.0 - 0.3)

More deterministic and focused. Best for factual queries, code, and when you need consistent outputs.

Medium Temperature (0.4 - 0.7)

Balanced creativity and coherence. Good default for general-purpose chat.

High Temperature (0.8 - 1.5)

More random and creative. Best for brainstorming, creative writing, and exploring diverse ideas.

// Factual response
let factual = KuzcoConfiguration(temperature: 0.1)
// Creative story
let story = KuzcoConfiguration(temperature: 1.2)
// Balanced chat
let chat = KuzcoConfiguration(temperature: 0.7)
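
The effect of temperature can be seen by applying it to a small set of token scores. The following is a minimal sketch of the standard temperature-scaled softmax, not Kuzco's internal code; the function name `softmax` and the example logits are assumptions made for illustration. Many samplers treat a temperature of 0 as "always pick the top token"; this sketch simply requires a positive value.

```swift
import Foundation

/// Converts raw logits to probabilities after dividing by `temperature`.
/// Illustrative sketch only; not taken from Kuzco's source.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    precondition(temperature > 0, "temperature must be positive in this sketch")
    let scaled = logits.map { $0 / temperature }
    // Subtract the maximum for numerical stability before exponentiating.
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

let logits = [2.0, 1.0, 0.0]   // hypothetical scores for three tokens
let focused = softmax(logits, temperature: 0.3)
let balanced = softmax(logits, temperature: 0.7)
let creative = softmax(logits, temperature: 1.2)
// The top token's share of probability shrinks as temperature rises.
print(focused[0], balanced[0], creative[0])
```

Lower temperatures concentrate probability on the highest-scoring token, which is why outputs become more repeatable.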

Understanding Top-K and Top-P

These parameters control token sampling diversity:

Top-K Sampling

Limits selection to the K most likely tokens. Lower K = more focused, higher K = more diverse.

topK: 10 (focused) → topK: 100 (diverse)

Top-P (Nucleus) Sampling

Selects from the smallest set of most likely tokens whose cumulative probability reaches P. The number of candidates adapts to context: few tokens when the model is confident, more when it is not.

topP: 0.5 (focused) → topP: 0.95 (diverse)
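
The two filters described above can be sketched over a small probability distribution. This is a generic illustration of top-K and nucleus filtering, not Kuzco's implementation; `topKFilter`, `topPFilter`, and the example probabilities are hypothetical, and the cutoff here uses "reaches or exceeds P".

```swift
/// Keeps the `k` most probable tokens, zeroes the rest, then renormalizes.
func topKFilter(_ probs: [Double], k: Int) -> [Double] {
    precondition(k > 0, "k must be positive")
    // Token indices ordered from most to least probable.
    let ranked = probs.indices.sorted { probs[$0] > probs[$1] }
    let keep = Set(ranked.prefix(k))
    let masked = probs.indices.map { keep.contains($0) ? probs[$0] : 0.0 }
    let total = masked.reduce(0, +)
    return masked.map { $0 / total }
}

/// Keeps the smallest set of top tokens whose cumulative probability
/// reaches `p`, zeroes the rest, then renormalizes.
func topPFilter(_ probs: [Double], p: Double) -> [Double] {
    let ranked = probs.indices.sorted { probs[$0] > probs[$1] }
    var keep = Set<Int>()
    var cumulative = 0.0
    for index in ranked {
        keep.insert(index)
        cumulative += probs[index]
        if cumulative >= p { break }
    }
    let masked = probs.indices.map { keep.contains($0) ? probs[$0] : 0.0 }
    let total = masked.reduce(0, +)
    return masked.map { $0 / total }
}

let probs = [0.5, 0.3, 0.15, 0.05]     // hypothetical next-token probabilities
let k2 = topKFilter(probs, k: 2)       // only the two most likely tokens survive
let p09 = topPFilter(probs, p: 0.9)    // 0.5 + 0.3 falls short of 0.9, so a third token is kept
print(k2, p09)
```

Top-K always keeps a fixed number of candidates, while top-P keeps however many are needed to cover the requested probability mass.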

Stop Sequences

Configure sequences that stop generation when encountered:

let config = KuzcoConfiguration(
    stopSequences: [
        "\n\nHuman:",  // Stop at conversation turn
        "---",         // Stop at separator
        "THE END",     // Stop at story ending
        "```"          // Stop at code block end
    ]
)
let session = try await KuzcoSession(model: .qwen3_4b, configuration: config)
// Generation will stop when any stop sequence is encountered
let response = try await session.oneShot("Write a short poem")
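
Conceptually, stop sequences act like a cut at the earliest match in the generated text. The sketch below post-processes a finished string to show that idea; the helper `truncate(_:atAnyOf:)` is hypothetical, and it assumes the stop sequence itself is excluded from the result, which this document does not confirm for Kuzco.

```swift
import Foundation

/// Returns `text` cut off just before the earliest occurrence of any stop sequence.
/// Illustrative sketch only; a real generator would stop producing tokens instead.
func truncate(_ text: String, atAnyOf stopSequences: [String]) -> String {
    var cut = text.endIndex
    for stop in stopSequences where !stop.isEmpty {
        // Track the earliest match across all stop sequences.
        if let range = text.range(of: stop), range.lowerBound < cut {
            cut = range.lowerBound
        }
    }
    return String(text[..<cut])
}

let raw = "Roses are red\n---\nViolets are blue THE END"
let trimmed = truncate(raw, atAnyOf: ["THE END", "---"])
// The separator appears first, so everything from "---" onward is dropped.
print(trimmed)
```

The order of the stop sequences in the array does not matter here; whichever one occurs first in the text determines the cut.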

Dynamic Configuration

Update configuration during a session:

let session = try await KuzcoSession(model: .qwen3_4b)
// Start with default settings
let response1 = try await session.oneShot("What is 2+2?")
// Switch to creative mode for the next prompt
session.updateConfiguration(.creative)
let response2 = try await session.oneShot("Write a haiku about coding")
// Or use custom configuration
session.updateConfiguration(KuzcoConfiguration(temperature: 0.1))
let response3 = try await session.oneShot("List the planets in order")