evolve

package
v0.0.0
Published: Apr 26, 2026 | License: None detected | Imports: 0 | Imported by: 0

Documentation

Overview

Package evolve provides the v0.2+ self-evolution interface matrix for Flyto Agent: LLM-generated candidates, offline log replay, versioned parameter storage, and shadow-mode validation.

API Consumption Shapes

All ten contracts here are form 3 (synchronous callback) under the three Flyto API shapes (see `docs/api-reference.md` "API 消费形态 / API Consumption Patterns"). The engine's evolve loop calls each interface synchronously in sequence: Generator produces a candidate -> Evaluator scores it -> Reflector critiques -> ApprovalFunc gates -> ParameterEvolver proposes -> ShadowRunner validates in shadow mode -> ParameterStore persists. LogSource / LogReplayer / FeedbackChannel are pull interfaces the evolve loop drains during offline replay and feedback ingestion. Consumers implement any or all of them to plug alternative storage, models, or evaluation policies; the interfaces are designed for coexistence (see "Multi-Implementation Coexistence" below).

Interface Topology

Ten contracts (9 interfaces + 1 func type, canonical definitions in interfaces.go):

Core loop        : Generator / Evaluator / Reflector / ApprovalFunc (func)
Parameter mgmt   : ParameterStore / ParameterEvolver
Log / feedback   : LogReplayer / LogSource / FeedbackChannel
Risk mitigation  : ShadowRunner

Reference Implementations

Shipped in this package (file-backed / in-process defaults). Alternative implementations (e.g. SQL-backed ParameterStore) are expected to live in the platform layer or consumer code and coexist with these via interface swap.

Generator         -> generator_llm.go          (LLMGenerator)
Evaluator         -> evaluator_impls.go        (Func / Weighted / Rule)
Reflector         -> reflector_impls.go, self_reflector.go
ApprovalFunc      -> evolve.go                 (Config.ApprovalFunc)
ParameterStore    -> parameter_store_file.go
ParameterEvolver  -> parameter_evolver_default.go
LogReplayer       -> default_log_replayer.go
LogSource         -> log_source_file.go
FeedbackChannel   -> feedback_channel_file.go
ShadowRunner      -> shadow_runner_default.go

Provider Bridge

llm_adapter_flyto.go ships FlytoLLMClient, a thin adapter that wraps any flyto.ModelProvider (anthropic / openai / minimax / gemini / ollama / lmstudio / openrouter) as an LLMClient consumable by LLMGenerator. Event aggregation prefers TextEvent as authoritative and falls back to TextDeltaEvent when the provider does not emit complete-block events, avoiding double-counting on providers that emit both. ErrorEvent is wrapped via fmt.Errorf("%w: ...", ErrLLMFailed) so callers can use errors.Is.

Note: flyto.Request (the cross-provider contract) does not expose Temperature, so LLMCallOpts.Temperature is intentionally ignored by this adapter. Set temperature at the provider factory Config (e.g. anthropic.Config) when per-call temperature control is needed.

End-to-End Example

See core/examples/evolve_closed_loop/main.go -- all nine interfaces chained in a single process, demonstrating the full generate -> evaluate -> reflect -> propose -> shadow loop against fixture feedback.

Domain Agnostic

All interfaces exchange business data as any / string / float64. The engine makes no assumption about any industry schema. Concrete data-source wiring (orders / settlement / complaints / KPI streams) belongs to the consumer layer. Rationale: docs/evolve-strategy.md and the "data ingress boundary" note in memory project_architecture_decisions.md.

Multi-Implementation Coexistence

Every interface is expected to have multiple implementations side by side. For example the file-backed ParameterStore in this package serves CLI / TUI workflows, while platform/ is free to ship a SQL-backed ParameterStore for multi-tenant deployments. Interface contracts guarantee swap without code changes at call sites.

Strategic Background

docs/evolve-strategy.md   -- full strategy, Round 1/2/3 design rounds,
                             merge / absorb decisions, v0.2-v3.0 roadmap
CHANGELOG.md              -- v0.2-dev delivery summary (Unreleased)

Academic References

The design draws on and cites the following arXiv works:

2508.07407  Comprehensive Survey of Self-Evolving AI Agents
2507.21046  Survey of Self-Evolving Agents
2509.04642  Maestro: Joint Graph & Config Optimization
2512.09108  Artemis / Evolving Excellence (mixed-variable optimization)
2406.16218  Trace: Next AutoDiff (NeurIPS 2024)

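Package evolve implements the Agent's self-evolution capability.

Strategic positioning (see docs/evolve-strategy.md): evolve is a flywheel of a domain-agnostic engine, an industry-specific consumer layer, and an ML reflector fed by real data. It aligns with the two authoritative 2025 surveys, arXiv:2507.21046 and arXiv:2508.07407, and targets the academically acknowledged gap of "system-level evolution closed over business KPIs".

The unresolved items (credit assignment / stability paradox / cold start) are research-level open problems, not engineering TODOs. See docs/evolve-strategy.md §9 and §10.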

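Design principles:

  • Self-evolution is traceable and reversible: every parameter change must carry a reason that enters the audit chain
  • Evolution artifacts persist across sessions, with multiple implementations coexisting (file / SQL / custom)
  • The consumer layer approves or rejects any evolution action (ApprovalFunc)
  • Domain agnostic means every domain is supported: no industry schema is embedded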

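The three v0.1 structs (retained; to be reworked into concrete implementations of the interface matrix below):

  1. ToolBuilder - defines new tools at runtime (the Agent writes its own tool code)
  2. SkillLearner - saves successful workflows as reusable skills
  3. SelfReflector - analyses the Agent's own performance and adapts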

v0.2+ interface matrix (defined in interfaces.go, 9 interfaces + 1 func type):

Core loop        : Generator / Evaluator / Reflector / ApprovalFunc (func)
Parameter mgmt   : ParameterStore / ParameterEvolver
Log / feedback   : LogReplayer / LogSource / FeedbackChannel
Risk mitigation  : ShadowRunner

How the interfaces cooperate (fast loop):

input → Generator → []Candidate → Evaluator → rank by fitness → Top-K → approval → execution

How the interfaces cooperate (slow loop):

LogSource → LogReplayer → Reflector → ParameterEvolver.Propose
                                       → ApprovalFunc → ParameterStore.Set
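
The scoring and Top-K step of the fast loop can be sketched as follows. This is a minimal illustration, not the engine's code: scoreFunc, scored, and topK are hypothetical names, and skipping candidates whose scoring fails is an assumption (the real loop may abort instead).

```go
package main

import "sort"

// Candidate mirrors the shape documented for evolve.Candidate.
type Candidate struct {
	ID      string
	Payload any
	Meta    map[string]any
}

// scoreFunc is a hypothetical stand-in for Evaluator.Score
// (context and breakdown omitted for brevity).
type scoreFunc func(Candidate) (float64, error)

// scored pairs a candidate with its fitness.
type scored struct {
	Candidate Candidate
	Fitness   float64
}

// topK scores every candidate, ranks by fitness (highest first),
// and keeps at most k entries. Candidates whose scoring fails are
// skipped, which is an assumption of this sketch.
func topK(cands []Candidate, score scoreFunc, k int) []scored {
	if k < 0 {
		k = 0
	}
	out := make([]scored, 0, len(cands))
	for _, c := range cands {
		f, err := score(c)
		if err != nil {
			continue
		}
		out = append(out, scored{Candidate: c, Fitness: f})
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Fitness > out[j].Fitness
	})
	if k < len(out) {
		out = out[:k]
	}
	return out
}
```

The survivors of topK would then go through approval before anything is executed.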

Index

Constants

This section is empty.

Variables

var ErrCandidateParseFailed = errors.New("evolve: candidate parse failed")

ErrCandidateParseFailed is returned when Generator cannot extract a JSON array of candidates from the LLM response. The caller decides whether to retry, relax parsing, or fall back to a static candidate set.

var ErrEntityRequired = errors.New("evolve: entity required")

ErrEntityRequired is returned by FeedbackChannel.Report / Query when entity is the empty string. Entity distinguishes data streams (carrier / driver / route / ...); collapsing them into a single bucket would poison Query results.

var ErrInvalidK = errors.New("evolve: K must be > 0")

ErrInvalidK is returned by Generator.Generate when K <= 0.

var ErrLLMFailed = errors.New("evolve: LLM call failed")

ErrLLMFailed wraps errors from the underlying LLM Complete call so callers can distinguish LLM failure from parse failure via errors.Is.

var ErrMetricRequired = errors.New("evolve: metric required")

ErrMetricRequired is returned by FeedbackChannel.Report when metric is the empty string. Query accepts empty metric as "all metrics".

var ErrParameterLocked = errors.New("evolve: parameter is locked")

ErrParameterLocked is returned when Set is called on a locked key. Unlock first, or use Rollback (allowed on locked keys by design).

var ErrParameterNotFound = errors.New("evolve: parameter not found")

ErrParameterNotFound is returned when Get / History / Rollback is called on a non-existent key.

var ErrReasonRequired = errors.New("evolve: reason required for audit chain")

ErrReasonRequired is returned when Set / Rollback / Lock / Unlock is called with an empty reason. The audit chain requires a human-readable reason.

var ErrVersionNotFound = errors.New("evolve: version not found")

ErrVersionNotFound is returned when Rollback targets a non-existent version.

Functions

This section is empty.

Types

type Adjustment

type Adjustment struct {
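	// Type is the kind of adjustment.
	Type AdjustmentType `json:"type"`

	// Description describes the adjustment.
	Description string `json:"description"`

	// Target is what the adjustment applies to (tool name, prompt fragment, etc.).
	Target string `json:"target"`

	// Suggestion is the concrete recommendation.
	Suggestion string `json:"suggestion"`

	// Priority ranges from 1 to 5 (5 is highest).
	Priority int `json:"priority"`

	// Applied reports whether the adjustment has been applied.
	Applied bool `json:"applied"`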
}

Adjustment is a single adaptive adjustment suggestion.

type AdjustmentType

type AdjustmentType string

AdjustmentType is the kind of adjustment.

const (
	// AdjustPrompt adjusts the system prompt.
	AdjustPrompt AdjustmentType = "prompt"

	// AdjustToolPriority adjusts tool priority / preference.
	AdjustToolPriority AdjustmentType = "tool_priority"

	// AdjustContext adjusts the context preloading strategy.
	AdjustContext AdjustmentType = "context"

	// AdjustWorkflow adjusts the workflow.
	AdjustWorkflow AdjustmentType = "workflow"

	// AdjustLesson records a lesson learned.
	AdjustLesson AdjustmentType = "lesson"
)

type AggregatedStats

type AggregatedStats struct {
	Count         int       // events whose Feedback matched this (entity, metric)
	Sum           float64   // sum of Feedback.Value
	Mean          float64   // Sum / Count; 0 when Count == 0
	Min, Max      float64   // bounds of observed Feedback.Value; 0 when Count == 0
	PendingCount  int       // events whose Feedback was nil for this entity
	LastTimestamp time.Time // most recent event timestamp (decision or feedback)
}

AggregatedStats summarises a stream of feedback events for one entity+metric pair. Returned by AggregatorReflector.Stats() as a pull-form snapshot (value copy, safe for external consumers to mutate).

Sum sits in scan-baseline as dead because no in-tree code reads it after AggregatorReflector writes it -- the consumer is an external watch-only panel / operator dashboard that calls Stats() and renders rolling totals. test lock is in reflector_impls_test.go.

type AggregatorReflector

type AggregatorReflector struct {
	// contains filtered or unexported fields
}

AggregatorReflector is the zero-ML reference Reflector: it consumes ReplayEvent stream and maintains rolling descriptive statistics keyed by (entity, metric). Stats are read-only queryable; the reflector does not push back into ParameterEvolver or ParameterStore.

Why this is a useful ref impl even without ML: A staged rollout of evolve typically runs a "watch-only" phase where the operator just wants to see "what would the reflector see?" A stats view is a cheap way to get there and doubles as the input signal for an eventual ML reflector. The interface contract for Reflector makes no assumption that reflection has to trigger writes.

Concurrency:

  • OnEvent acquires a write lock for the duration of one update. Contention is minor under typical replay rates (hundreds of events/sec).
  • Stats / Entities / Metrics / Reset take RLock / Lock as needed.
  • All exported reads return copies; internal maps are never leaked.

func NewAggregatorReflector

func NewAggregatorReflector() *AggregatorReflector

NewAggregatorReflector returns an empty reflector.

func (*AggregatorReflector) Entities

func (r *AggregatorReflector) Entities() []string

Entities returns every known entity, sorted.

func (*AggregatorReflector) Metrics

func (r *AggregatorReflector) Metrics(entity string) []string

Metrics returns every known metric for the given entity, sorted. The pending pseudo-metric ("") is excluded.

func (*AggregatorReflector) OnEvent

func (r *AggregatorReflector) OnEvent(ctx context.Context, event ReplayEvent) error

OnEvent implements Reflector.

Event classification:

  • event.Feedback == nil: counted as pending under key (entity, "") so the operator can see decisions still awaiting feedback.
  • event.Feedback != nil: contributes to (entity, metric) count / sum / min / max / mean / lastTs. PendingCount of the matched cell is also decremented (first-touch pairing "resolved" the earlier pending event), clamped at zero: feedback arriving before any pending event is legal (e.g. external feedback ingest).
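
The classification above can be sketched as follows. This is a lock-free illustration rather than the package's implementation: the ReplayEvent shape here is hypothetical (the real type is not shown in this section), and the sketch assumes pending events are tracked per entity under the empty metric and that each feedback event decrements that counter.

```go
package main

import "time"

// Feedback carries the fields this sketch needs from evolve.Feedback.
type Feedback struct {
	Timestamp time.Time
	Entity    string
	Metric    string
	Value     float64
}

// ReplayEvent is a hypothetical minimal shape for this sketch.
type ReplayEvent struct {
	Entity    string
	Timestamp time.Time
	Feedback  *Feedback
}

// AggregatedStats mirrors the documented fields.
type AggregatedStats struct {
	Count         int
	Sum, Mean     float64
	Min, Max      float64
	PendingCount  int
	LastTimestamp time.Time
}

type cellKey struct{ entity, metric string }

type aggregator struct{ cells map[cellKey]*AggregatedStats }

func newAggregator() *aggregator {
	return &aggregator{cells: map[cellKey]*AggregatedStats{}}
}

// cell returns the stats cell for (entity, metric), creating it on demand.
func (a *aggregator) cell(entity, metric string) *AggregatedStats {
	k := cellKey{entity, metric}
	if a.cells[k] == nil {
		a.cells[k] = &AggregatedStats{}
	}
	return a.cells[k]
}

// onEvent applies one event. No locking here; the real reflector
// takes a write lock around each update.
func (a *aggregator) onEvent(ev ReplayEvent) {
	pending := a.cell(ev.Entity, "") // assumption: pending lives under metric ""
	if ev.Feedback == nil {
		pending.PendingCount++
		pending.LastTimestamp = ev.Timestamp
		return
	}
	c := a.cell(ev.Entity, ev.Feedback.Metric)
	v := ev.Feedback.Value
	if c.Count == 0 || v < c.Min {
		c.Min = v
	}
	if c.Count == 0 || v > c.Max {
		c.Max = v
	}
	c.Count++
	c.Sum += v
	c.Mean = c.Sum / float64(c.Count)
	c.LastTimestamp = ev.Feedback.Timestamp
	if pending.PendingCount > 0 { // clamp at zero
		pending.PendingCount--
	}
}
```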

func (*AggregatorReflector) Reset

func (r *AggregatorReflector) Reset()

Reset drops all aggregated state.

func (*AggregatorReflector) Stats

func (r *AggregatorReflector) Stats(entity, metric string) AggregatedStats

Stats returns a snapshot of (entity, metric) statistics. Unknown pair returns the zero AggregatedStats (all fields zero) with no error. metric="" returns the pending-only view for entity (Count=0, Sum=0, PendingCount filled).

type ApprovalFunc

type ApprovalFunc func(ctx context.Context, proposal *EvolutionProposal) (bool, error)

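ApprovalFunc is the approval callback for evolution actions. The consumer layer implements it to decide whether to approve the Agent's self-evolution. CLEVER: this callback is the safety boundary of the whole self-evolution system. The Agent cannot decide on its own whether to evolve; even if it writes perfect tool code, that code is not registered with the engine without human approval. When approvalFunc is nil, every proposal is rejected automatically (safe default).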

type Candidate

type Candidate struct {
	ID      string
	Payload any
	Meta    map[string]any
}

Candidate is a single candidate produced by a Generator. Meta carries generation-time metadata (temperature / model / role) so that Evaluator and ParameterEvolver can trace it back.

type CandidateSampler

type CandidateSampler func(ctx context.Context, traffic float64) ([]Candidate, error)

CandidateSampler supplies the test set for a shadow run. traffic is the caller-requested coverage in [0, 1]; the sampler interprets it (hash partition, time-window slice, random subsample, fixture replay) and returns the selected Candidates.

Returning nil/empty is legal and produces a zero-SampleSize ShadowResult (no data, no divergence).

type Change

type Change struct {
	Version   int
	Value     any
	Reason    string
	Timestamp time.Time
	Author    string
}

Change is the history record of one parameter change. Author records who initiated it (evolver id / human user / rollback / lock / unlock).

type ChangeEvent

type ChangeEvent struct {
	Key    string
	Change Change
	IsLock bool
}

ChangeEvent is what ParameterStore.Watch pushes to subscribers. IsLock=true marks a Lock/Unlock event (Change.Value is nil in that case); otherwise it's a Set/Rollback. Change carries the full audit row (Version / Value / Reason / Timestamp / Author) so subscribers do not need a follow-up History() call.

Consumers are external: dashboards rendering the audit timeline, platform services fanning Watch out to tenants, test harnesses asserting ordering. scan-baseline.json lists Change / IsLock as dead because no in-tree code reads them after the test harness verifies forward-propagation in parameter_store_file_test.go -- expected pull-API state.

type Config

type Config struct {
	// StoreDir is the directory where evolution artifacts are stored.
	StoreDir string

	// ApprovalFunc is the approval callback (nil rejects every proposal automatically).
	ApprovalFunc ApprovalFunc

	// AutoApproveReadOnly, when true, lets read-only evolutions bypass
	// ApprovalFunc. "Read-only" means the evolution only adds declarative
	// content that the engine will read later -- currently limited to
	// EvolveNewSkill (a markdown skill file). Code-executing evolutions
	// (EvolveNewTool), workflow mutations (EvolveOptimize), and runtime
	// self-tuning (EvolveSelfAdjust) are NEVER auto-approved by this flag,
	// because their content can escape the read-only boundary.
	//
	// This is a security ergonomics trade-off: skill learning is the
	// high-frequency low-risk path (agent reads its own past lessons),
	// and routing every skill proposal through a human gate either
	// trains operators to click-through blindly (worse than automation)
	// or blocks the loop entirely. Everything else stays locked.
	AutoApproveReadOnly bool

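	// MaxToolsPerSession is the maximum number of tools created per session.
	MaxToolsPerSession int

	// MaxSkillsPerSession is the maximum number of skills learned per session.
	MaxSkillsPerSession int

	// Observer is the observability hook (optional).
	// ELEVATED: injected through the constructor. The evolution system shares
	// the Engine's lifetime and never swaps observers at runtime, so
	// constructor injection is the simplest option.
	// Alternative considered: a SetObserver setter (adds one more
	// state-mutation point and is less clear than one-shot injection).
	Observer flyto.EventObserver

	// SecretGuard is the secret scanner (optional).
	// When set, ToolBuilder scans tool scripts before saving them and blocks
	// scripts that contain API keys from being persisted.
	// When nil, no scan is performed (backward compatible).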
	SecretGuard security.SecretGuard
}

Config is the configuration for Evolver.

type CreateToolTool

type CreateToolTool struct {
	// contains filtered or unexported fields
}

CreateToolTool is the "create a new tool" tool. The Agent calls it to define a new runtime tool.

func NewCreateToolTool

func NewCreateToolTool(evolver *Evolver, cwd string) *CreateToolTool

NewCreateToolTool creates the CreateTool tool.

func (*CreateToolTool) Description

func (t *CreateToolTool) Description(ctx context.Context) string

func (*CreateToolTool) Execute

func (t *CreateToolTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)

func (*CreateToolTool) InputSchema

func (t *CreateToolTool) InputSchema() json.RawMessage

func (*CreateToolTool) Name

func (t *CreateToolTool) Name() string

type DefaultLogReplayer

type DefaultLogReplayer struct {
	// contains filtered or unexported fields
}

DefaultLogReplayer is the reference LogReplayer implementation. It composes a LogSource and a FeedbackChannel; either can be any implementation (file / SQL / object-storage), so the struct is not named "FileLogReplayer".

Pairing strategy: For every LogEntry read from the source, the replayer looks up the entity's feedback history and pairs it with at most one feedback per metric. The "first-touch" rule picks the earliest feedback whose Timestamp falls in [entry.Timestamp, entry.Timestamp + window). Zero matches produce a single ReplayEvent with Feedback=nil (contract: nil = KPI not yet arrived).

CLEVER: feedback lookup is lazy per-entity. The first time an entity is seen in the stream we call FeedbackChannel.Query once and cache the result for the rest of the replay. A 1000-entity replay makes 1000 queries total, not 1000 per decision. Cache lives for the duration of one Replay call.

LEGACY: aggregation is fixed at "first-touch per metric". Mean / last / median are deliberately not surfaced -- consumers that need them can compute from raw feedback inside their Reflector. Adding aggregation knobs here would pollute the interface without clear v0.3 demand.

Reflector dispatch:

  • Serial in registration order. Reflectors may hold in-process state; parallel dispatch would force every implementor to be thread-safe.
  • OnEvent errors are logged but not returned. The contract says the replayer does not stop on reflector errors.
  • The reflector slice is snapshotted at the start of each Replay, so a RegisterReflector call mid-replay does not affect the current run.

func NewDefaultLogReplayer

func NewDefaultLogReplayer(source LogSource, feedback FeedbackChannel, opts ...ReplayerOption) *DefaultLogReplayer

NewDefaultLogReplayer constructs a replayer composing source and feedback. Both are required; passing nil is a programming error and panics.

func (*DefaultLogReplayer) RegisterReflector

func (d *DefaultLogReplayer) RegisterReflector(r Reflector)

RegisterReflector adds r to the reflector list. Safe to call concurrently. Duplicate registrations are allowed; the reflector will receive each event as many times as it was registered (the caller's responsibility).

func (*DefaultLogReplayer) Replay

func (d *DefaultLogReplayer) Replay(ctx context.Context, from, to time.Time, filter FilterFunc) error

Replay implements LogReplayer. Scans LogSource for [from, to], pairs each entry with feedback, and dispatches to every registered Reflector.

type DefaultParameterEvolver

type DefaultParameterEvolver struct {
	// contains filtered or unexported fields
}

DefaultParameterEvolver is the reference ParameterEvolver.

Propose is a thin delegate to ProposerFunc. Apply is the gatekeeper: approved=true routes the value into ParameterStore.Set; approved=false is a no-op on the store (rejected proposals must not pollute the version chain) but both branches emit an audit log line so an operator can see every decision that was offered, regardless of outcome.

Relation to two-phase design: The interface split Propose/Apply exists to insert gates between them (human approval, ShadowRunner compare, batch approval). This impl keeps that contract intact -- there is no side channel from Propose to Apply. The caller is expected to call Apply once they have decided.

func NewDefaultParameterEvolver

func NewDefaultParameterEvolver(store ParameterStore, proposer ProposerFunc, opts ...EvolverOption) (*DefaultParameterEvolver, error)

NewDefaultParameterEvolver constructs the evolver. Both store and proposer are required; either nil is a programming error and rejected at construction rather than deferred to first call.

func (*DefaultParameterEvolver) Apply

func (e *DefaultParameterEvolver) Apply(ctx context.Context, key string, value any, approved bool, reason string) error

Apply implements ParameterEvolver.

Semantics:

  • approved=true: delegates to ParameterStore.Set. Empty reason surfaces the store's ErrReasonRequired verbatim (we do not fabricate a reason on behalf of the caller -- audit trails must reflect human intent).
  • approved=false: no-op on the store. An operator rejected the proposal; the version chain stays clean.

Both branches emit an audit log line so the decision itself is observable even when the store is not touched.

func (*DefaultParameterEvolver) Propose

func (e *DefaultParameterEvolver) Propose(ctx context.Context, key string, evidence []Feedback) (any, float64, error)

Propose implements ParameterEvolver. Delegates to the injected proposer. Does not read or write ParameterStore: the interface contract says Propose is read-only relative to storage.

type DefaultShadowRunner

type DefaultShadowRunner struct {
	// contains filtered or unexported fields
}

DefaultShadowRunner is the reference ShadowRunner. It runs each sampled test Candidate twice through the Evaluator -- once with the baseline parameter value, once with the proposed candidate value -- and reports per-cohort average fitness, per-dimension mean breakdown, and a divergence score in [0, 1].

What this does NOT do (by design):

  • It does not touch ParameterStore.Set. Shadow comparison is read-only relative to the production parameter chain. The caller decides, based on the returned ShadowResult, whether to call ParameterEvolver.Apply.
  • It does not decide how to sample. The caller supplies CandidateSampler.
  • It does not parallelise scoring. Evaluator implementations (notably FuncEvaluator wrapping caller code) are not guaranteed goroutine-safe. A caller that needs throughput runs multiple RunShadow calls in parallel from the outside.

Divergence formula:

div = min(|avg_baseline - avg_candidate| / max(|avg_baseline|, |avg_candidate|, 1e-9), 1.0)

CLEVER: divergence is relative (not absolute) so that large-scale fitness values (1000 vs 1001 gives about 0.001) are comparable with small-scale ones (0.1 vs 0.5 gives 0.8). Absolute diff would produce wildly misleading divergence values across domains.
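
The divergence formula above translates directly into Go. The function name divergence is illustrative; the package does not necessarily expose one.

```go
package main

import "math"

// divergence computes
//
//	min(|b - c| / max(|b|, |c|, 1e-9), 1.0)
//
// where b and c are the baseline and candidate average fitness.
// The 1e-9 floor avoids division by zero when both averages are 0.
func divergence(avgBaseline, avgCandidate float64) float64 {
	denom := math.Max(math.Max(math.Abs(avgBaseline), math.Abs(avgCandidate)), 1e-9)
	return math.Min(math.Abs(avgBaseline-avgCandidate)/denom, 1.0)
}
```

When both averages share a sign the ratio already lies in [0, 1]; the clamp only takes effect when they have opposite signs (for example -1 vs 1).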

func NewDefaultShadowRunner

func NewDefaultShadowRunner(store ParameterStore, evaluator Evaluator, sampler CandidateSampler, applier ParamApplier) (*DefaultShadowRunner, error)

NewDefaultShadowRunner constructs the runner. All four dependencies are required; any nil is rejected at construction so the first RunShadow call does not explode with a nil-deref.

func (*DefaultShadowRunner) RunShadow

func (r *DefaultShadowRunner) RunShadow(ctx context.Context, baselineKey string, candidateValue any, traffic float64) (ShadowResult, error)

RunShadow implements ShadowRunner.

type Evaluator

type Evaluator interface {
	Score(ctx context.Context, c Candidate) (fitness float64, breakdown map[string]float64, err error)
}

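Evaluator assigns a fitness score to a Candidate and returns the total score plus a per-dimension breakdown.

The breakdown serves three purposes:

  1. Debugging -- which dimension dragged the total down
  2. Long-horizon reflection -- spotting dimensions whose predictions are inaccurate (future input for a MetaEvaluator)
  3. Transparency panel -- showing customers the basis for a score

breakdown may be nil (for single-dimension scoring).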

type EvolutionProposal

type EvolutionProposal struct {
	ID          string         `json:"id"`
	Type        EvolutionType  `json:"type"`
	Title       string         `json:"title"`
	Description string         `json:"description"`
	Rationale   string         `json:"rationale"` // why this evolution is needed
	Content     any            `json:"content"`   // concrete content (tool definition / skill / config change)
	CreatedAt   time.Time      `json:"created_at"`
	Status      ProposalStatus `json:"status"`
}

EvolutionProposal is an evolution proposal. When the Agent wants to evolve, it first generates a proposal, which is executed only after approval.

type EvolutionStore

type EvolutionStore struct {
	// contains filtered or unexported fields
}

EvolutionStore is the persistent store for evolution artifacts.

func NewEvolutionStore

func NewEvolutionStore(dir string) (*EvolutionStore, error)

NewEvolutionStore creates the store.

func (*EvolutionStore) ListProposals

func (s *EvolutionStore) ListProposals() ([]*EvolutionProposal, error)

ListProposals lists all proposals.

func (*EvolutionStore) SaveProposal

func (s *EvolutionStore) SaveProposal(p *EvolutionProposal) error

SaveProposal saves a proposal to disk.

type EvolutionType

type EvolutionType string

EvolutionType enumerates the kinds of evolution.

const (
	EvolveNewTool    EvolutionType = "new_tool"    // create a new tool
	EvolveNewSkill   EvolutionType = "new_skill"   // learn a new skill
	EvolveOptimize   EvolutionType = "optimize"    // optimize an existing workflow
	EvolveSelfAdjust EvolutionType = "self_adjust" // adaptive self-adjustment
)

type Evolver

type Evolver struct {
	// contains filtered or unexported fields
}

Evolver is the main controller of the self-evolution system. It coordinates the three subsystems ToolBuilder, SkillLearner, and SelfReflector.

func NewEvolver

func NewEvolver(cfg *Config) (*Evolver, error)

NewEvolver creates the self-evolution system.

func (*Evolver) History

func (e *Evolver) History() ([]*EvolutionProposal, error)

History returns the evolution history.

func (*Evolver) Propose

func (e *Evolver) Propose(ctx context.Context, proposal *EvolutionProposal) error

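Propose submits an evolution proposal. The Agent calls this method to express the intent "I want to evolve"; the approval flow decides whether it is executed.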

func (*Evolver) Reflector

func (e *Evolver) Reflector() *SelfReflector

Reflector returns the self-reflector.

func (*Evolver) SkillLearner

func (e *Evolver) SkillLearner() *SkillLearner

SkillLearner returns the skill learner.

func (*Evolver) SystemPromptFragment

func (e *Evolver) SystemPromptFragment() string

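SystemPromptFragment returns the description of evolution capabilities that is injected into the system prompt. It lets the Agent know that it can create tools, learn skills, and reflect on itself.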

func (*Evolver) ToolBuilder

func (e *Evolver) ToolBuilder() *ToolBuilder

ToolBuilder returns the tool builder (for Engine integration).

type EvolverOption

type EvolverOption func(*DefaultParameterEvolver)

EvolverOption configures a DefaultParameterEvolver.

func WithAuditLogger

func WithAuditLogger(fn func(format string, args ...any)) EvolverOption

WithAuditLogger routes every Apply call (approved AND rejected) through fn so consumers can wire an AuditSink (DB / Loki / SIEM). Default: log.Printf. Passing nil is a no-op: silent audit would be a dangerous footgun for a governance-critical operation, so the default stays in place.

Named WithAuditLogger rather than WithLogger because WithLogger is already taken by DefaultLogReplayer in the same package. "Audit" is also the more accurate semantic for this sink: Apply decisions must be auditable.

type FeatureFunc

type FeatureFunc func(Candidate) float64

FeatureFunc maps a Candidate to a single scalar feature value. Pure; any error should be treated as a programmer bug (the caller picked a feature that cannot be evaluated on this candidate shape). If your feature genuinely can fail at runtime, route through FuncEvaluator instead.

type Feedback

type Feedback struct {
	Timestamp  time.Time
	Entity     string
	Metric     string
	Value      float64
	Confidence float64
	Meta       map[string]any
}

Feedback is a single business KPI feedback record.

type FeedbackChannel

type FeedbackChannel interface {
	Report(ctx context.Context, entity string, metric string, value float64, confidence float64, meta map[string]any) error
	Query(ctx context.Context, entity string, since time.Time, metric string) ([]Feedback, error)
}

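FeedbackChannel is the bidirectional channel for business KPI feedback.

It merges the former FitnessSignal and FeedbackSource: in the early design FitnessSignal handled push and FeedbackSource handled pull. They are two access patterns over the same data, so merging them reduces conceptual load. An implementation may support either one or both.

Domain agnostic: Metric is a string (e.g. "carrier_on_time_rate") and Value is a float64 scalar. Non-scalar KPIs travel through Meta. The engine assumes no business KPI structure.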

type FileFeedbackChannel

type FileFeedbackChannel struct {
	// contains filtered or unexported fields
}

FileFeedbackChannel is the local-filesystem implementation of FeedbackChannel.

On-disk layout:

<root>/<entity-escaped>/2026-04-18.jsonl
<root>/<entity-escaped>/2026-04-19.jsonl

Each line is one Feedback record serialised as JSON. Two-level sharding (entity first, then UTC day) matches the Query access pattern: Query is always scoped to a single entity and a time window, so we never open files belonging to other entities.

Entity is escaped with url.PathEscape (same convention as FileParameterStore keys), so entity values containing slashes / dots / whitespace / CJK characters map to a single safe directory name.

Concurrency: see the type-level comment on FileLogSource; the same POSIX O_APPEND atomicity argument applies here. Writers do not take a lock; readers never observe a partial line.

LEGACY: Query does an in-process linear scan over the matching files. Platform-layer SQL FeedbackChannel uses indexed time + metric queries; for v0.3 MVP volumes (fewer than ~10k feedback events per client per day) the scan is fast enough and avoids an external dependency.

func NewFileFeedbackChannel

func NewFileFeedbackChannel(root string) (*FileFeedbackChannel, error)

NewFileFeedbackChannel returns a feedback channel rooted at root. Creates the directory if missing.

func (*FileFeedbackChannel) Query

func (c *FileFeedbackChannel) Query(ctx context.Context, entity string, since time.Time, metric string) ([]Feedback, error)

Query implements FeedbackChannel. Returns all records for the entity whose Timestamp >= since. Empty metric matches every metric; a non-empty metric is matched exactly. Results are sorted by Timestamp ascending.

An unknown entity (directory does not exist) is not an error: new entities legitimately have no history. Empty string entity is a caller error and returns ErrEntityRequired.

func (*FileFeedbackChannel) Report

func (c *FileFeedbackChannel) Report(ctx context.Context, entity, metric string, value, confidence float64, meta map[string]any) error

Report implements FeedbackChannel. Appends one Feedback record to the current UTC day file under the entity's directory. Timestamp is stamped with time.Now().UTC() so the chosen file and the stored timestamp agree; callers cannot override this (if backdating is needed it belongs in Meta, not the canonical Timestamp).

type FileLogSource

type FileLogSource struct {
	// contains filtered or unexported fields
}

FileLogSource is the local-filesystem implementation of LogSource.

On-disk layout:

<root>/2026-04-18.jsonl    # one file per UTC calendar day
<root>/2026-04-19.jsonl

Each line is one LogEntry serialised as JSON. Date-sharded so that Read(from, to) only opens files covering the requested time window rather than scanning the whole corpus.

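CLEVER: writers open the day file with O_APPEND, so the kernel positions every write at end-of-file atomically and concurrent appenders do not overwrite each other. A single LogEntry line is expected to stay far below 4 KiB (the PIPE_BUF size on Linux, used here as a conservative bound; POSIX formally guarantees PIPE_BUF atomicity only for pipes and FIFOs), so each line goes out in one write call. On common local filesystems concurrent writers therefore do not need a mutex and readers do not observe a half-written line. This is the append-only counterpart to FileParameterStore's rename-based atomicity.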

LEGACY: the reader does an in-process linear scan. v0.3 MVP log volume is expected to stay well under 1M lines/day per client, so a full scan is acceptable. An index-backed implementation belongs in the platform-layer SQL LogSource; adding an index here would drag in a CGO/boltdb dependency that CLAUDE.md rule 8 (zero external deps) forbids.

Append is a method on *FileLogSource, not a member of the LogSource interface. Different backends (SQL INSERT, object-storage upload) have wildly different write semantics; forcing Append into the interface would produce awkward wrappers. Callers who need to write hold a *FileLogSource directly.

func NewFileLogSource

func NewFileLogSource(root string) (*FileLogSource, error)

NewFileLogSource returns a log source rooted at root. Creates the directory if missing.

func (*FileLogSource) Append

func (s *FileLogSource) Append(ctx context.Context, entry LogEntry) error

Append writes one LogEntry to the current UTC day file as a JSON line. If entry.Timestamp is the zero value it is set to time.Now().UTC(); any non-zero value is coerced to UTC so the day-file selection and the stored timestamp agree.

Not part of the LogSource interface. See the type-level comment for the rationale.

func (*FileLogSource) Days

func (s *FileLogSource) Days() ([]time.Time, error)

Days returns all UTC calendar days for which at least one entry has been appended, sorted ascending. Helper for tooling (e.g. tail viewers, housekeeping). Not part of the LogSource interface.

func (*FileLogSource) Read

func (s *FileLogSource) Read(ctx context.Context, from, to time.Time) (<-chan LogEntry, error)

Read implements LogSource. The returned channel has buffer 64. A background goroutine walks each UTC day file in [from, to], decodes each JSON line, filters by Timestamp, and sends matching entries downstream. The channel is closed when the window is exhausted or ctx is canceled.

Missing day files (no activity that day) are skipped silently. A malformed line aborts the stream and closes the channel early. No error value travels on the channel, so the caller only sees it close and has to inspect the day file's contents to diagnose the problem.

type FileParameterStore

type FileParameterStore struct {
	// contains filtered or unexported fields
}

FileParameterStore is the local-filesystem implementation of ParameterStore.

On-disk layout:

<root>/<key-escaped>/
  current.json         # latest value + version number
  history/v1.json      # one file per version (Change struct)
  history/v2.json
  lock.json            # presence = locked, content has reason + timestamp

Key escaping: url.PathEscape, so dotted/slashed keys (e.g. "evolve.carrier_risk_penalty/Y-express") map to a single safe dir name.

Concurrency model:

  • Single process: sync.RWMutex serialises writers, in-memory subscriber list protected by the same lock.
  • File atomicity: temp file + os.Rename (POSIX atomic rename on same FS).
  • Cross-process: not guaranteed. The platform-layer SQL ParameterStore handles multi-tenant + multi-process concurrency.

LEGACY: Watch is in-process only. External edits to files (or another process holding the same root) are not detected. fsnotify is intentionally excluded (CLAUDE.md rule 8: zero external deps). Multi-process change notification is the SQL impl's job (LISTEN/NOTIFY).

func NewFileParameterStore

func NewFileParameterStore(root string) (*FileParameterStore, error)

NewFileParameterStore returns a store rooted at root. Creates the root directory if missing.

func (*FileParameterStore) Get

func (s *FileParameterStore) Get(ctx context.Context, key string) (any, int, error)

Get implements ParameterStore.

func (*FileParameterStore) History

func (s *FileParameterStore) History(ctx context.Context, key string, limit int) ([]Change, error)

History implements ParameterStore. Returns versions in ascending order. limit <= 0 means no cap; otherwise the most-recent `limit` versions are returned.

func (*FileParameterStore) List

func (s *FileParameterStore) List(ctx context.Context, prefix string) ([]string, error)

List implements ParameterStore. Returns keys sorted ascending.

func (*FileParameterStore) Lock

func (s *FileParameterStore) Lock(ctx context.Context, key string, reason string) error

Lock implements ParameterStore. Requires the key to exist (returns ErrParameterNotFound otherwise). Re-locking an already-locked key overwrites the lock reason and emits a fresh lock event.

func (*FileParameterStore) Rollback

func (s *FileParameterStore) Rollback(ctx context.Context, key string, toVersion int, reason string) (int, error)

Rollback implements ParameterStore. Allowed even when the key is locked (per interface contract: locked params can still be rolled back to escape a bad release without first unlocking). Rollback creates a new version that copies the target's value, preserving the audit chain.
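
The "rollback creates a new version" rule can be sketched with an in-memory history. `memStore`, `change`, and the error texts are hypothetical stand-ins; lock handling and persistence are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// change is a simplified stand-in for the Change struct (assumption).
type change struct {
	Version int
	Value   any
	Reason  string
}

type memStore struct{ history []change }

// set appends a new version; an empty reason is rejected.
func (s *memStore) set(value any, reason string) (int, error) {
	if reason == "" {
		return 0, errors.New("reason required")
	}
	v := len(s.history) + 1
	s.history = append(s.history, change{Version: v, Value: value, Reason: reason})
	return v, nil
}

// rollback copies the target version's value into a NEW version, so the
// intermediate history stays in the audit chain.
func (s *memStore) rollback(toVersion int, reason string) (int, error) {
	if toVersion < 1 || toVersion > len(s.history) {
		return 0, fmt.Errorf("version %d not found", toVersion)
	}
	return s.set(s.history[toVersion-1].Value, reason)
}

func main() {
	s := &memStore{}
	s.set(0.1, "initial")
	s.set(0.9, "experiment")
	v, _ := s.rollback(1, "experiment regressed on-time rate")
	fmt.Println(v, s.history[v-1].Value, len(s.history)) // prints "3 0.1 3"
}
```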

func (*FileParameterStore) Set

func (s *FileParameterStore) Set(ctx context.Context, key string, value any, reason string) (int, error)

Set implements ParameterStore. Returns ErrParameterLocked if the key is currently locked, ErrReasonRequired on empty reason.

func (*FileParameterStore) Unlock

func (s *FileParameterStore) Unlock(ctx context.Context, key string, reason string) error

Unlock implements ParameterStore. No-op if the key is not currently locked (idempotent), in which case no event is emitted.

func (*FileParameterStore) Watch

func (s *FileParameterStore) Watch(ctx context.Context, keyPrefix string) (<-chan ChangeEvent, error)

Watch implements ParameterStore. Returns a buffered channel receiving events whose Key starts with keyPrefix (empty prefix matches all). Channel closes when ctx is canceled.

CLEVER: events are dropped on a full subscriber buffer. A slow consumer must not block the writer path; production watchers should drain promptly or accept lossy semantics. Buffer size 16 absorbs short bursts.
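
The drop-on-full behaviour is the standard Go non-blocking send. A minimal sketch, with `changeEvent` and `publish` as hypothetical names and the buffer size of 16 taken from the text above:

```go
package main

import "fmt"

// changeEvent is a simplified stand-in for ChangeEvent (assumption).
type changeEvent struct{ Key string }

// publish tries a non-blocking send and reports whether the event was delivered.
func publish(sub chan<- changeEvent, ev changeEvent) bool {
	select {
	case sub <- ev:
		return true
	default:
		return false // buffer full: drop rather than block the writer
	}
}

func main() {
	sub := make(chan changeEvent, 16)
	dropped := 0
	for i := 0; i < 20; i++ {
		if !publish(sub, changeEvent{Key: fmt.Sprintf("k%d", i)}) {
			dropped++
		}
	}
	fmt.Println(len(sub), dropped) // prints "16 4"
}
```

A watcher that needs every event should re-read the store's history after a gap rather than rely on the channel alone.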

type FilterFunc

type FilterFunc func(entry LogEntry) bool

FilterFunc decides whether a LogEntry enters the Replay stream. Returning true lets the entry through.

type FlytoLLMClient

type FlytoLLMClient struct {
	// contains filtered or unexported fields
}

FlytoLLMClient wraps a flyto.ModelProvider as an evolve.LLMClient.

Direction of dependency: evolve consumes flyto.ModelProvider; providers/ subpackages define concrete providers. The direction is evolve -> flyto, so keeping the adapter in the evolve package is the natural home. Placing it under providers/ would require providers to know about evolve, inverting the boundary.

Sampling-knob translation (Temperature / TopP): flyto.Request is the cross-provider greatest common denominator. As of L683 (commit 1/3), it carries Temperature *float64 and TopP *float64 with passthrough semantics across all 7 providers. The adapter therefore translates LLMCallOpts.{Temperature, TopP} (float64, zero-as-unset) into flyto.Request.{Temperature, TopP} (*float64, nil-as-unset): non-zero values become flyto.Float(v), zero leaves the field nil so the upstream provider applies its own default. Anthropic + extended thinking enforces temperature == 1.0 server-side; the anthropic provider pre-handles that override and emits a parameter_overridden WarningEvent through the stream, which the adapter ignores by design (only TextEvent / TextDeltaEvent drive candidate text -- WarningEvent surfaces to engine observers).

Event filtering (candidate generation only needs text):

  • Aggregated: TextEvent (authoritative), TextDeltaEvent (fallback)
  • Error routed: ErrorEvent -> wraps ErrLLMFailed
  • Ignored: ToolUse*, Thinking*, Usage, Done, Turn*, SessionInfo, Permission*, Warning, Compact, Checkpoint*, and any other non-text event

Aggregation precedence (CLEVER): anthropic provider (and others that honor the full event contract) emit both TextDeltaEvent (streaming increments) and TextEvent (the complete text block on block_stop). A naive "sum both" strategy double-counts. We take TextEvent as authoritative: on each TextEvent arrival we append the final text and reset the delta accumulator. If the stream closes without any TextEvent (providers that skip the complete-block emission), we fall back to the delta accumulator so no text is lost.
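
The precedence rule can be sketched over a slice of simplified events. The `event` type with a string `Kind` is an assumption made to keep the example self-contained; the real adapter switches on flyto event types from a channel.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a simplified stand-in for the provider event stream (assumption).
type event struct {
	Kind string // "text", "delta", or anything else (ignored)
	Text string
}

// aggregate treats complete-text events as authoritative and falls back to
// the accumulated deltas only when no complete-text event ever arrived.
func aggregate(events []event) string {
	var final, deltas strings.Builder
	sawText := false
	for _, ev := range events {
		switch ev.Kind {
		case "delta":
			deltas.WriteString(ev.Text)
		case "text":
			sawText = true
			final.WriteString(ev.Text)
			deltas.Reset() // these increments are already covered by the block
		}
	}
	if sawText {
		return final.String()
	}
	return deltas.String()
}

func main() {
	full := []event{{"delta", "Hel"}, {"delta", "lo"}, {"text", "Hello"}, {"usage", ""}}
	deltaOnly := []event{{"delta", "Hel"}, {"delta", "lo"}}
	fmt.Println(aggregate(full), aggregate(deltaOnly)) // prints "Hello Hello"
}
```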

func NewFlytoLLMClient

func NewFlytoLLMClient(provider flyto.ModelProvider, defaultModel, systemPrompt string) (*FlytoLLMClient, error)

NewFlytoLLMClient builds a FlytoLLMClient. provider is required; it is the underlying flyto.ModelProvider (anthropic / openai / minimax / ...) the adapter will drive. defaultModel is used whenever LLMCallOpts.Model is empty. systemPrompt is injected verbatim into flyto.Request.System on every Complete call; pass "" to omit.

func (*FlytoLLMClient) Complete

func (c *FlytoLLMClient) Complete(ctx context.Context, prompt string, opts LLMCallOpts) (string, error)

Complete implements LLMClient by invoking provider.Stream with a single user message and draining the event channel into a response string. See FlytoLLMClient godoc for event filtering and aggregation rules.

type FuncEvaluator

type FuncEvaluator struct {
	// contains filtered or unexported fields
}

FuncEvaluator wraps a user-supplied scoring function as an Evaluator.

Use this when your scorer is not well expressed as a weighted sum: e.g. a decision tree, a hard-constraint-first gate, a composed multi-stage score, or a function that genuinely needs to return an error (reading a lookup file, calling a side-channel service). Everything else should be a WeightedEvaluator -- it is more auditable and its coefficients can be evolved via ParameterStore.

func NewFuncEvaluator

func NewFuncEvaluator(fn func(ctx context.Context, c Candidate) (float64, map[string]float64, error)) (*FuncEvaluator, error)

NewFuncEvaluator wraps fn. fn must be non-nil.

func (*FuncEvaluator) Score

func (e *FuncEvaluator) Score(ctx context.Context, c Candidate) (float64, map[string]float64, error)

Score implements Evaluator by delegating to the wrapped function. ctx cancellation is checked before delegation; the wrapped fn is responsible for checking ctx itself on longer operations.

type FuncReflector

type FuncReflector struct {
	// contains filtered or unexported fields
}

FuncReflector wraps a plain function as a Reflector. Use this when the reflector needs bespoke behaviour (direct Apply via ParameterEvolver, webhook push, custom aggregation) that does not fit AggregatorReflector's descriptive-stats shape.

func NewFuncReflector

func NewFuncReflector(fn func(ctx context.Context, event ReplayEvent) error) (*FuncReflector, error)

NewFuncReflector wraps fn. fn must be non-nil.

func (*FuncReflector) OnEvent

func (r *FuncReflector) OnEvent(ctx context.Context, event ReplayEvent) error

OnEvent implements Reflector. ctx cancellation short-circuits before the wrapped fn runs.

type GenOpt

type GenOpt func(*genConfig)

GenOpt is a functional option for Generator.Generate. Expected options include temperature, model, roles, constraints, and history injection (see docs/evolve-strategy.md §6.1).

func WithRoles

func WithRoles(roles ...string) GenOpt

WithRoles sets role-play prompts (approach D in docs/evolve-strategy.md §6.1).

func WithTemperature

func WithTemperature(t float64) GenOpt

WithTemperature sets LLM sampling temperature for this Generate call. Zero is treated as "unset" (use the LLMClient/provider default), so a caller wanting deterministic 0.0 sampling must instead bypass this option and configure provider Config.Temperature directly. See LLMCallOpts godoc for the rationale.

func WithTopP

func WithTopP(p float64) GenOpt

WithTopP sets LLM nucleus sampling cutoff (top_p) for this Generate call. Zero is treated as "unset" with the same rationale as WithTemperature.

type Generator

type Generator interface {
	Generate(ctx context.Context, input any, K int, opts ...GenOpt) ([]Candidate, error)
}

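Generator produces K candidate solutions. A single LLM call can yield several candidates (approach B, see docs/evolve-strategy.md §6.1); a run typically covers 4-6 LLM calls × about 3 candidates on average.

Why K is a runtime parameter rather than a constructor argument: the same Generator needs a different K at different stages (K=10 while exploring, K=3 at steady state). Parameterising Generate lets one Generator serve both the fast loop and the slow loop.

Why Candidate.Payload is any: candidate structure varies widely across scenarios (YAML, a carrier ID, a patch object), so the engine layer cannot prescribe one shape. Rejected alternative: interface{ Key() string; Payload() []byte }, which forces callers into extra wrapping and adds unnecessary serialisation overhead.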

type GeneratorOption

type GeneratorOption func(*LLMGenerator) error

GeneratorOption configures an LLMGenerator at construction.

func WithMaxTokens

func WithMaxTokens(n int) GeneratorOption

WithMaxTokens caps the LLMClient output length per call. Zero = backend default.

func WithModel

func WithModel(m string) GeneratorOption

WithModel pins a default model string attached to every Complete call and recorded on Candidate.Meta["model"].

func WithPromptTemplate

func WithPromptTemplate(tmpl string) GeneratorOption

WithPromptTemplate overrides the default prompt template. The template is parsed with text/template and receives fields: K (int), Roles ([]string), RolesJoined (string), Input (string, already JSON-encoded when the original input was not a string).
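
A sketch of a custom template using the four documented fields. The template wording, the `promptData` struct, and the `render` helper are assumptions; only the field names K, Roles, RolesJoined, and Input and the use of text/template come from the description above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptData mirrors the documented template fields (local stand-in).
type promptData struct {
	K           int
	Roles       []string
	RolesJoined string
	Input       string
}

// customTmpl is a hypothetical template a caller might pass to WithPromptTemplate.
const customTmpl = "Propose {{.K}} candidates as a JSON array.{{if .Roles}} Answer as: {{.RolesJoined}}.{{end}}\nInput: {{.Input}}"

// render parses and executes the template with text/template.
func render(tmpl string, d promptData) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	roles := []string{"planner", "skeptic"}
	out, err := render(customTmpl, promptData{
		K:           3,
		Roles:       roles,
		RolesJoined: strings.Join(roles, ", "),
		Input:       `{"lane":"SH-BJ"}`,
	})
	fmt.Println(out, err)
}
```

Because text/template performs no HTML escaping, the JSON-encoded Input reaches the prompt unchanged.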

type LLMCallOpts

type LLMCallOpts struct {
	Temperature float64
	TopP        float64
	Model       string
	MaxTokens   int
}

LLMCallOpts carries per-call tuning knobs. Fields with zero values are expected to fall back to implementation defaults (for example, the wrapper chooses the default model).

Temperature / TopP zero-semantics: zero means "unset" (let the wrapper choose). The wrapper translates a non-zero value to a non-nil pointer when populating flyto.Request, and leaves it nil otherwise so the upstream provider applies its own default. Callers who genuinely want deterministic 0.0 sampling cannot express it through these float64 fields -- they must configure it on the provider Config layer instead. This trade-off keeps the common path ergonomic (skip the option to use defaults) at the cost of one obscure case.
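
The zero-as-unset to nil-as-unset translation can be sketched as below. `callOpts`, `request`, and `optional` are local stand-ins for LLMCallOpts, flyto.Request, and flyto.Float; the real helper's behaviour is not shown in this document.

```go
package main

import "fmt"

// callOpts and request are trimmed stand-ins (assumptions).
type callOpts struct{ Temperature, TopP float64 }
type request struct{ Temperature, TopP *float64 }

// optional maps the zero value to nil and any other value to a pointer.
func optional(v float64) *float64 {
	if v == 0 {
		return nil
	}
	return &v
}

func toRequest(o callOpts) request {
	return request{Temperature: optional(o.Temperature), TopP: optional(o.TopP)}
}

func main() {
	r := toRequest(callOpts{Temperature: 0.7})
	fmt.Println(*r.Temperature, r.TopP == nil) // prints "0.7 true"
}
```

The sketch also shows the cost named above: `callOpts{Temperature: 0}` and an omitted Temperature are indistinguishable, so an explicit 0.0 cannot be requested this way.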

type LLMClient

type LLMClient interface {
	Complete(ctx context.Context, prompt string, opts LLMCallOpts) (string, error)
}

LLMClient is the narrow LLM contract used by LLMGenerator.

Why not use flyto.Provider directly: flyto.Provider is the full Agent session abstraction (tool calls, streaming events, permissions). A generator only needs "prompt in, text out". Depending on Provider would drag the entire engine into evolve and break the "interface matrix consumable on its own" promise of the package.

Consumers wrap their preferred backend (flyto.Provider, openai SDK, local Ollama, etc) into something satisfying this interface. Such wrappers normally live in the consumer, not evolve; the in-tree exception is FlytoLLMClient, which adapts flyto.ModelProvider.

type LLMGenerator

type LLMGenerator struct {
	// contains filtered or unexported fields
}

LLMGenerator is the LLM-backed Generator reference implementation.

Pipeline:

  1. Render promptTemplate with {K, Roles, RolesJoined, Input}.
  2. Call client.Complete(ctx, prompt, opts).
  3. Extract a JSON array from the response (strip markdown fences if any).
  4. Unmarshal to []rawCandidate, produce []Candidate with auto Meta.

LEGACY: structured output (OpenAI response_format=json_schema, Anthropic tool-use forced JSON) would eliminate the markdown-fence unwrap step, but support is uneven across providers as of 2026-04. Stay on prompt + parse for MVP; upgrade path is a new Option that delegates extraction to the LLMClient when the backend supports it.

func NewLLMGenerator

func NewLLMGenerator(client LLMClient, opts ...GeneratorOption) (*LLMGenerator, error)

NewLLMGenerator builds an LLMGenerator. client is required.

func (*LLMGenerator) Generate

func (g *LLMGenerator) Generate(ctx context.Context, input any, K int, opts ...GenOpt) ([]Candidate, error)

Generate implements Generator.

type LearnSkillTool

type LearnSkillTool struct {
	// contains filtered or unexported fields
}

LearnSkillTool is the tool for learning a new skill.

func NewLearnSkillTool

func NewLearnSkillTool(evolver *Evolver) *LearnSkillTool

NewLearnSkillTool creates the LearnSkill tool.

func (*LearnSkillTool) Description

func (t *LearnSkillTool) Description(ctx context.Context) string

func (*LearnSkillTool) Execute

func (t *LearnSkillTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)

func (*LearnSkillTool) InputSchema

func (t *LearnSkillTool) InputSchema() json.RawMessage

func (*LearnSkillTool) Name

func (t *LearnSkillTool) Name() string

type LearningSource

type LearningSource struct {
	// SessionID is the ID of the originating session.
	SessionID string `json:"session_id"`

	// TaskDescription is the original task description.
	TaskDescription string `json:"task_description"`

	// TurnCount is the number of turns the task took to complete.
	TurnCount int `json:"turn_count"`

	// ToolsUsed lists the tools that were used.
	ToolsUsed []string `json:"tools_used"`

	// Timestamp is when the skill was learned.
	Timestamp time.Time `json:"timestamp"`
}

LearningSource records where a skill was learned from.

type LogEntry

type LogEntry struct {
	Timestamp  time.Time
	DecisionID string
	Entity     string
	Payload    any
	Meta       map[string]any
}

LogEntry is the minimum contract for one decision-log record. Payload holds the decision content itself; its structure varies by scenario.

type LogReplayer

type LogReplayer interface {
	Replay(ctx context.Context, from, to time.Time, filter FilterFunc) error
	RegisterReflector(r Reflector)
}

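LogReplayer scans historical decision logs, pairs them with real KPI feedback, and pushes the result to registered Reflectors.

Why Replayer rather than Scanner: a Replayer replays the full context of the decision moment. It does not just read the log; it also pairs each entry with its real-world outcome from FeedbackChannel. A Scanner would read without pairing.

Scheduling boundary: scan frequency (daily batch, real time, event triggered) is controlled by an external scheduler and is outside the LogReplayer interface. LogReplayer only provides Replay; scheduling is left to the caller (cron, a platform-layer scheduler, or a manual trigger).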

type LogSource

type LogSource interface {
	Read(ctx context.Context, from, to time.Time) (<-chan LogEntry, error)
}

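LogSource is the underlying data source for LogReplayer.

Why it is separate from LogReplayer: LogSource focuses on reading data, while LogReplayer focuses on replay logic (pairing, filtering, dispatch). The same LogReplayer can be attached to different LogSources (file, SQL, object storage).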

type Observation

type Observation struct {
	// Pattern describes the pattern.
	Pattern string `json:"pattern"`

	// Frequency is how often it occurs (a count or a percentage, as free text).
	Frequency string `json:"frequency"`

	// Impact is the impact assessment.
	Impact string `json:"impact"`

	// Example is a concrete example.
	Example string `json:"example,omitempty"`
}

Observation is a pattern the Agent has observed.

type ParamApplier

type ParamApplier func(c Candidate, paramValue any) Candidate

ParamApplier projects a parameter value onto a Candidate. The engine does not know how "a carrier_risk_penalty value" shows up inside a candidate decision; the caller encodes that projection. A minimal applier clones the candidate and stores the value in Meta, letting the Evaluator pick it up; a richer applier rewrites Payload.

Purity contract: the applier must return a NEW Candidate (or a value-copy) rather than mutate the input. DefaultShadowRunner calls applier twice per test candidate (once for baseline, once for candidate param); in-place mutation would produce a race.
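
A minimal applier that honours the purity contract might look like the sketch below. The local `Candidate` type and the `metaApplier` constructor are assumptions; the point is that Meta is copied before the parameter value is written, so the baseline and candidate calls never share a map.

```go
package main

import "fmt"

// Candidate is a local stand-in with the two fields this doc mentions.
type Candidate struct {
	Payload any
	Meta    map[string]any
}

// metaApplier returns an applier that stores the parameter value under key
// in a fresh copy of Meta and leaves the input candidate untouched.
func metaApplier(key string) func(c Candidate, paramValue any) Candidate {
	return func(c Candidate, paramValue any) Candidate {
		out := Candidate{Payload: c.Payload, Meta: make(map[string]any, len(c.Meta)+1)}
		for k, v := range c.Meta {
			out.Meta[k] = v
		}
		out.Meta[key] = paramValue
		return out
	}
}

func main() {
	apply := metaApplier("carrier_risk_penalty")
	base := Candidate{Payload: "Y-express", Meta: map[string]any{"model": "m1"}}
	baseline := apply(base, 0.1)
	candidate := apply(base, 0.9)
	fmt.Println(len(base.Meta), baseline.Meta["carrier_risk_penalty"], candidate.Meta["carrier_risk_penalty"])
}
```

The copy is shallow: if Payload holds a pointer or map that the applier rewrites, that value needs its own copy as well.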

type ParameterEvolver

type ParameterEvolver interface {
	Propose(ctx context.Context, key string, evidence []Feedback) (proposedValue any, confidence float64, err error)
	Apply(ctx context.Context, key string, value any, approved bool, reason string) error
}

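ParameterEvolver proposes new parameter values from evidence.

It absorbs the responsibilities of the former SelectionPressure: preference weights are just another kind of parameter, not fundamentally different from rule coefficients or thresholds, so they were merged into ParameterEvolver. One implementation can handle several parameter types at once.

Propose / Apply are two phases: Propose only returns a suggested value plus a confidence and does not write to ParameterStore. The caller decides whether to Apply (possibly after passing ApprovalFunc). The two-phase split supports:

  • Human approval (Propose → operator reviews → Apply)
  • Shadow mode (Propose → ShadowRunner compares → Apply)
  • Batch decisions (several Proposes approved together)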

Propose takes []Feedback as evidence: Feedback (the KPI channel record defined below) already carries every dimension Propose needs (Entity / Metric / Value / Confidence / Timestamp / Meta). Earlier revisions defined a field-identical, semantically-overlapping FeedbackRecord sibling -- "one meaning, two types" duplication. This round collapses them into Feedback.

type ParameterStore

type ParameterStore interface {
	Get(ctx context.Context, key string) (value any, version int, err error)
	Set(ctx context.Context, key string, value any, reason string) (newVersion int, err error)
	List(ctx context.Context, prefix string) (keys []string, err error)
	History(ctx context.Context, key string, limit int) ([]Change, error)
	Rollback(ctx context.Context, key string, toVersion int, reason string) (newVersion int, err error)
	Lock(ctx context.Context, key string, reason string) error
	Unlock(ctx context.Context, key string, reason string) error
	Watch(ctx context.Context, keyPrefix string) (<-chan ChangeEvent, error)
}

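ParameterStore is the versioned store for evolve parameters.

Key semantics: a domain-agnostic string (e.g. "evolve.carrier_risk_penalty.Y-express"). The naming scheme is a caller convention, and the engine assumes nothing about key structure. Rejected alternative: a structured ParamKey{Domain,Category,Entity}, which violates the domain-agnostic principle because the engine would be constraining the scheme.

reason is mandatory (a hard compliance requirement): Set / Rollback / Lock / Unlock return ErrReasonRequired when reason is the empty string. Every change in the audit chain must carry a human-readable reason.

Rollback semantics: a rollback copies the old version's value to a new version number, keeping the audit chain intact without deleting intermediate history.

Why Lock is separate from a config switch: Lock is an atomic operation that carries an audit event (who locked, and why). Combining a config switch with a Set interceptor has a race condition and leaves no audit trail.

Multi-implementation coexistence: the engine provides FileParameterStore (local filesystem, for CLI/TUI). The platform layer implements its own SQL version (multi-tenant, distributed audit). The interface contract keeps the switch seamless.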

type ProposalStatus

type ProposalStatus string

ProposalStatus is the status of a proposal.

const (
	StatusPending  ProposalStatus = "pending"
	StatusApproved ProposalStatus = "approved"
	StatusRejected ProposalStatus = "rejected"
	StatusApplied  ProposalStatus = "applied"
)

type ProposerFunc

type ProposerFunc func(ctx context.Context, key string, evidence []Feedback) (any, float64, error)

ProposerFunc maps KPI evidence to a proposed parameter value.

The caller supplies the mapping because evidence→value shape is highly domain specific: logistics might use a weighted moving average of on_time rate, finance might use an EMA of realised P&L, ad-tech might use a PID controller on conversion lift. Forcing a single statistical flavour on the engine layer would limit consumers; forcing a taxonomy (EMA / PID / mean) would bloat the surface area without covering all real cases.

Confidence is a 0..1 scalar the caller attaches to the proposal so the downstream gate (human approval / ShadowRunner / batch approver) can threshold-filter weak suggestions.

type ReflectTool

type ReflectTool struct {
	// contains filtered or unexported fields
}

ReflectTool is the tool for self-reflection.

func NewReflectTool

func NewReflectTool(evolver *Evolver) *ReflectTool

NewReflectTool creates the Reflect tool.

func (*ReflectTool) Description

func (t *ReflectTool) Description(ctx context.Context) string

func (*ReflectTool) Execute

func (t *ReflectTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)

func (*ReflectTool) InputSchema

func (t *ReflectTool) InputSchema() json.RawMessage

func (*ReflectTool) Name

func (t *ReflectTool) Name() string

type Reflection

type Reflection struct {
	// ID is the unique identifier.
	ID string `json:"id"`

	// SessionID is the originating session.
	SessionID string `json:"session_id"`

	// Type is the reflection type.
	Type ReflectionType `json:"type"`

	// Summary is the reflection summary.
	Summary string `json:"summary"`

	// Observations are the patterns the Agent observed.
	Observations []Observation `json:"observations"`

	// Adjustments are the suggested adjustments.
	Adjustments []Adjustment `json:"adjustments"`

	// Metrics holds the session metrics.
	Metrics *SessionMetrics `json:"metrics,omitempty"`

	// CreatedAt is the creation time.
	CreatedAt time.Time `json:"created_at"`
}

Reflection is a single reflection record.

type ReflectionType

type ReflectionType string

ReflectionType is the kind of reflection.

const (
	// ReflectPostSession is a reflection made after a session ends.
	ReflectPostSession ReflectionType = "post_session"

	// ReflectOnError is a reflection made after an error.
	ReflectOnError ReflectionType = "on_error"

	// ReflectPeriodic is a periodic reflection (every N sessions).
	ReflectPeriodic ReflectionType = "periodic"
)

type Reflector

type Reflector interface {
	OnEvent(ctx context.Context, event ReplayEvent) error
}

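Reflector consumes the events pushed by LogReplayer and turns them into parameter-adjustment suggestions.

Why it is separate from ParameterEvolver: Reflector is event-driven (called back by LogReplayer), whereas ParameterEvolver is on-demand (the caller asks). One Reflector implementation may call several ParameterEvolvers internally to update different parameters. Keeping the two separate avoids a circular dependency.

Error semantics: when OnEvent returns an error, LogReplayer does not stop; it logs the error and continues with the next event. Reflector implementations must not assume that event handling is ordered or atomic: several events for the same key may arrive concurrently.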

type ReplayEvent

type ReplayEvent struct {
	Log      LogEntry
	Feedback *Feedback
	Meta     map[string]any
}

ReplayEvent is the event LogReplayer pushes to registered Reflectors. Feedback is nil when the decision has just landed and KPI has not arrived yet. Meta is a free-form extension slot for LogReplayer implementations to attach replay-time context (replay session ID, upstream source, causal cohort tag) that downstream Reflectors may consume -- external replayer impls fill it; the in-tree DefaultLogReplayer leaves it nil.

type ReplayerOption

type ReplayerOption func(*DefaultLogReplayer)

ReplayerOption configures a DefaultLogReplayer at construction time.

func WithFeedbackWindow

func WithFeedbackWindow(d time.Duration) ReplayerOption

WithFeedbackWindow sets the time window used for pairing. Default: 7 days. Panics if d <= 0; a zero/negative window would never match anything.

func WithLogger

func WithLogger(fn func(format string, args ...any)) ReplayerOption

WithLogger routes reflector error messages to a custom sink. Default: log.Printf (standard library). Pass a no-op to silence errors.

type RuntimeTool

type RuntimeTool struct {
	// contains filtered or unexported fields
}

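RuntimeTool turns a ToolDefinition into an executable implementation of the Tool interface. It is the bridge between ToolBuilder and Engine.

Executor dependency (M1 plan β, strict DI)

RuntimeTool holds an execenv.Executor, which it uses to start evolve script/command subprocesses. In the local CLI scenario engine.Config.Executor supplies DefaultExecutor{}; in the cloud SaaS scenario the platform layer supplies a sandbox.Backend. There is no fallback: a nil executor panics (checked in NewRuntimeTool).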

func NewRuntimeTool

func NewRuntimeTool(def *ToolDefinition, cwd string, executor execenv.Executor) *RuntimeTool

NewRuntimeTool creates a runtime tool from a definition.

The executor argument is required; nil panics. This is the strict DI contract (M1 plan β).

func (*RuntimeTool) Description

func (t *RuntimeTool) Description(ctx context.Context) string

Description returns the tool description.

func (*RuntimeTool) Execute

func (t *RuntimeTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)

Execute runs the tool.

func (*RuntimeTool) InputSchema

func (t *RuntimeTool) InputSchema() json.RawMessage

InputSchema returns the input schema.

func (*RuntimeTool) Metadata

func (t *RuntimeTool) Metadata() RuntimeToolMetadata

Metadata returns the tool metadata.

func (*RuntimeTool) Name

func (t *RuntimeTool) Name() string

Name returns the tool name.

type RuntimeToolMetadata

type RuntimeToolMetadata struct {
	ConcurrencySafe bool
	ReadOnly        bool
	IsEvolved       bool // marks a tool that the Agent evolved
	Version         int
}

RuntimeToolMetadata is the metadata returned by RuntimeTool.Metadata(), intended for the Engine tool registry to read when classifying an Agent-authored tool (concurrent-safe scheduling, read-only gating, audit tagging of evolved tools, version tracking for iteration).

Status (2026-04-21): all 4 fields are dead in scan-baseline because the adapter that would bridge evolve.RuntimeTool into Engine's tool registry is NOT wired in core. See engine_integration.go header for the C-plan status note; short version is "subagent / skill / memory currently cover reusable-capability needs, C-plan revisits when an industry platform hits a real RPA-shape workload". Fields stay exported so the adapter work does not have to reinvent the metadata shape.

type SelfReflector

type SelfReflector struct {
	// contains filtered or unexported fields
}

SelfReflector handles self-reflection and self-adaptation.

func NewSelfReflector

func NewSelfReflector(store *EvolutionStore) *SelfReflector

NewSelfReflector creates a self-reflector.

func (*SelfReflector) Apply

func (sr *SelfReflector) Apply(ctx context.Context, proposal *EvolutionProposal) error

Apply executes a self-adaptation adjustment proposal.

func (*SelfReflector) FormatForSystemPrompt

func (sr *SelfReflector) FormatForSystemPrompt() (string, error)

FormatForSystemPrompt formats reflection insights as a system-prompt fragment.

func (*SelfReflector) GetLessons

func (sr *SelfReflector) GetLessons() ([]string, error)

GetLessons extracts every adjustment suggestion of the "lesson" type. These are injected into the system prompt so the Agent avoids repeating mistakes.

func (*SelfReflector) LoadRecentReflections

func (sr *SelfReflector) LoadRecentReflections(limit int) ([]*Reflection, []error)

LoadRecentReflections loads the most recent reflection records.

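Elevated improvement (ELEVATED): returns a ([]*Reflection, []error) pair of lists. This matches the SkillLearner.LoadAll contract: the success list and the failure list are returned side by side, so the caller gets every usable reflection and also learns which files are corrupt, rather than having that hidden behind a single aggregated error.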

type SessionMetrics

type SessionMetrics struct {
	TurnCount         int            `json:"turn_count"`
	TotalInputTokens  int            `json:"total_input_tokens"`
	TotalOutputTokens int            `json:"total_output_tokens"`
	TotalCostUSD      float64        `json:"total_cost_usd"`
	ToolUseCounts     map[string]int `json:"tool_use_counts"`
	ErrorCount        int            `json:"error_count"`
	Duration          time.Duration  `json:"duration"`
	TaskCompleted     bool           `json:"task_completed"`
}

SessionMetrics holds the metrics for a session.

type ShadowResult

type ShadowResult struct {
	BaselineFitness    float64
	CandidateFitness   float64
	BaselineBreakdown  map[string]float64
	CandidateBreakdown map[string]float64
	SampleSize         int
	Divergence         float64
	Meta               map[string]any
}

ShadowResult is the per-run comparison produced by ShadowRunner. Divergence sits on 0-1 (0 = baseline and candidate agree on every sample, 1 = they disagree everywhere).
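
One way to obtain a Divergence on that scale is the fraction of samples on which the two runs choose differently. This is an assumption about how "agree" might be defined (here: identical chosen option IDs); the document does not specify the formula that DefaultShadowRunner uses.

```go
package main

import (
	"errors"
	"fmt"
)

// divergence returns the share of samples where the baseline and candidate
// picks differ: 0 when they agree everywhere, 1 when they never agree.
// Hypothetical helper for illustration only.
func divergence(baseline, candidate []string) (float64, error) {
	if len(baseline) != len(candidate) {
		return 0, errors.New("sample sets differ in length")
	}
	if len(baseline) == 0 {
		return 0, nil
	}
	diff := 0
	for i := range baseline {
		if baseline[i] != candidate[i] {
			diff++
		}
	}
	return float64(diff) / float64(len(baseline)), nil
}

func main() {
	base := []string{"A", "B", "C", "D"}
	cand := []string{"A", "X", "C", "Y"}
	d, err := divergence(base, cand)
	fmt.Println(d, err) // prints "0.5 <nil>"
}
```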

BaselineBreakdown / CandidateBreakdown hand back the per-dimension score decomposition (the same breakdown Evaluator returns), so the gate UI can explain why the aggregate fitness differs -- not just "candidate is 0.03 higher" but "candidate wins on latency, loses on cost". Meta is a free-form slot for runners to annotate the run (traffic window, tenant, sampler version).

All three fields (BaselineBreakdown, CandidateBreakdown, Meta) are flagged dead in scan-baseline because core ships only the shadow scoring path; the decision UI, A/B gate scheduler, and operator dashboard that consume the breakdowns live in platform or caller code. The breakdown fields are locked by tests in shadow_runner_default_test.go; Meta has no test lock (extension slot, by design).

type ShadowRunner

type ShadowRunner interface {
	RunShadow(ctx context.Context, baselineKey string, candidateValue any, traffic float64) (ShadowResult, error)
}

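ShadowRunner runs a candidate parameter value in shadow mode, without affecting production, and returns a fitness comparison.

Traffic semantics: a value between 0.0 and 1.0 giving the share of traffic covered by shadow mode. 0.1 means 10% of requests go through shadow and 90% through production. 1.0 means pure offline replay that does not touch live traffic.

Boundary with ParameterStore: ShadowRunner does not write to ParameterStore; it only runs in parallel and scores. The caller uses the ShadowResult to decide whether to call ParameterEvolver.Apply and push to production.

Three-step rollout (the landing path for the v0.4 RL policy):

  1. Shadow (ShadowRunner, production unaffected)
  2. A/B (small-share canary, 1-10% of real traffic)
  3. Production (full cutover)

ShadowRunner covers step 1; the last two steps are scheduled by the platform layer.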

type SkillDefinition

type SkillDefinition struct {
	// Name is the skill name.
	Name string `json:"name"`

	// Description is a short description.
	Description string `json:"description"`

	// WhenToUse says when this skill should be used (the model reads it and decides on its own).
	WhenToUse string `json:"when_to_use"`

	// Steps are the skill's execution steps.
	Steps []SkillStep `json:"steps"`

	// Prompt is the full prompt template (injected into the system prompt).
	Prompt string `json:"prompt"`

	// RequiredTools lists the tools the skill needs.
	RequiredTools []string `json:"required_tools,omitempty"`

	// Tags are classification tags.
	Tags []string `json:"tags,omitempty"`

	// LearnedFrom is the source the skill was learned from.
	LearnedFrom *LearningSource `json:"learned_from,omitempty"`

	// SuccessRate is the success rate (self-assessed by the Agent).
	SuccessRate float64 `json:"success_rate"`

	// UsageCount is the number of times the skill has been used.
	UsageCount int `json:"usage_count"`

	// Version is the version number.
	Version int `json:"version"`

	// CreatedAt is the creation time.
	CreatedAt time.Time `json:"created_at"`

	// UpdatedAt is the time of the last update.
	UpdatedAt time.Time `json:"updated_at"`
}

SkillDefinition is a single skill definition.

type SkillLearner

type SkillLearner struct {
	// contains filtered or unexported fields
}

SkillLearner is responsible for learning and managing skills.

func NewSkillLearner

func NewSkillLearner(store *EvolutionStore, maxPerSession int) *SkillLearner

NewSkillLearner creates a skill learner.

func (*SkillLearner) Apply

func (sl *SkillLearner) Apply(ctx context.Context, proposal *EvolutionProposal) error

Apply executes a skill-learning proposal.

func (*SkillLearner) FormatForSystemPrompt

func (sl *SkillLearner) FormatForSystemPrompt() (string, error)

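FormatForSystemPrompt formats all skills into a system-prompt fragment. The Engine calls this method while building the system prompt so the model knows which skills are available.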

func (*SkillLearner) LoadAll

func (sl *SkillLearner) LoadAll() ([]*SkillDefinition, []error)

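LoadAll loads all learned skills.

Design improvement (ELEVATED): returns a ([]*SkillDefinition, []error) pair of lists instead of a single error. Previous approach: on partial failure only firstSkipErr was recorded; if anything loaded successfully the errors were silently dropped, and a single error was returned only when everything failed, so the caller could not tell which files were corrupt. New approach: each failed file yields its own error, successes and failures are returned side by side, and the caller decides what to do. Alternative considered: return an aggregated (results, error). Rejected because a single error hides which files are affected and cannot distinguish "one corrupt file" from "five corrupt files".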
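
The dual-list return shape can be sketched as follows. This is a standalone illustration, not the package's code: the skill type is a one-field stand-in for SkillDefinition, and loadAll reads from an in-memory map (an assumption) rather than from the EvolutionStore.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// skill is a one-field stand-in for SkillDefinition.
type skill struct {
	Name string `json:"name"`
}

// loadAll decodes every file independently. A corrupt file adds one error
// that names the file; it does not stop the remaining files from loading.
func loadAll(files map[string][]byte) ([]*skill, []error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the example

	var skills []*skill
	var errs []error
	for _, name := range names {
		var s skill
		if err := json.Unmarshal(files[name], &s); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", name, err))
			continue
		}
		skills = append(skills, &s)
	}
	return skills, errs
}

func main() {
	files := map[string][]byte{
		"deploy.json": []byte(`{"name":"deploy"}`),
		"broken.json": []byte(`{"name":`),
		"review.json": []byte(`{"name":"review"}`),
	}
	skills, errs := loadAll(files)
	fmt.Printf("loaded %d skills, %d files failed\n", len(skills), len(errs))
	for _, err := range errs {
		fmt.Println("skipped:", err)
	}
}
```

A caller that still wants a single error value can collapse the slice with errors.Join (Go 1.20+) at the call site, while callers that care about individual files keep the per-file detail.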

func (*SkillLearner) RecordUsage

func (sl *SkillLearner) RecordUsage(name string, success bool) error

RecordUsage records one use of a skill.

type SkillStep

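type SkillStep struct {
	// Order is the step's position in the sequence.
	Order int `json:"order"`

	// Description describes the step.
	Description string `json:"description"`

	// ToolName is the tool to use (optional).
	ToolName string `json:"tool_name,omitempty"`

	// InputTemplate is the tool input template (supports variable substitution).
	InputTemplate string `json:"input_template,omitempty"`

	// Condition is the execution condition (optional, e.g. "when the previous step failed").
	Condition string `json:"condition,omitempty"`

	// Fallback is the step to fall back to on failure.
	Fallback string `json:"fallback,omitempty"`
}

SkillStep is one execution step of a skill.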

type ToolBuilder

type ToolBuilder struct {
	// contains filtered or unexported fields
}

ToolBuilder is responsible for building tools at runtime.

func NewToolBuilder

func NewToolBuilder(store *EvolutionStore, maxPerSession int) *ToolBuilder

NewToolBuilder creates a tool builder.

func NewToolBuilderWithGuard

func NewToolBuilderWithGuard(store *EvolutionStore, maxPerSession int, guard security.SecretGuard) *ToolBuilder

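NewToolBuilderWithGuard creates a tool builder with secret scanning.

Design improvement (ELEVATED): the Script field written by tool_builder may contain an API key, because an Agent generating a script can inadvertently hard-code the current session's environment variables into it. SecretGuard intercepts before persistence and prevents the key from being written to disk (a high-risk path). Alternative considered: do not scan and rely on the Agent not making mistakes. Rejected because security cannot depend on the Agent's "correctness".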

func (*ToolBuilder) Apply

func (tb *ToolBuilder) Apply(ctx context.Context, proposal *EvolutionProposal) error

Apply executes a tool-creation proposal.

func (*ToolBuilder) LoadAll

func (tb *ToolBuilder) LoadAll() ([]*ToolDefinition, error)

LoadAll loads all persisted tool definitions. The Engine calls it at startup to restore previously created tools.

type ToolDefinition

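type ToolDefinition struct {
	// Name is the tool name (must be unique and must not clash with built-in tools).
	Name string `json:"name"`

	// Description is the tool description (the text the model sees).
	Description string `json:"description"`

	// InputSchema is the input definition in JSON Schema format.
	InputSchema json.RawMessage `json:"input_schema"`

	// ExecutionType is how the tool is executed.
	ExecutionType ToolExecType `json:"execution_type"`

	// Script is the shell script body (when ExecutionType == ExecScript).
	// Input parameters are passed through environment variables: TOOL_INPUT_<PARAM_NAME>
	Script string `json:"script,omitempty"`

	// Command is the command template (when ExecutionType == ExecCommand).
	// Supports the {{.param_name}} template syntax.
	Command string `json:"command,omitempty"`

	// Version is the version number (incremented as the Agent iterates on the tool).
	Version int `json:"version"`

	// CreatedAt is the creation time.
	CreatedAt time.Time `json:"created_at"`

	// CreatedBy is the creator (session ID or Agent ID).
	CreatedBy string `json:"created_by,omitempty"`

	// Rationale is why the Agent created this tool.
	Rationale string `json:"rationale,omitempty"`

	// Tags are labels (used for search and classification).
	Tags []string `json:"tags,omitempty"`

	// ConcurrencySafe reports whether the tool can run concurrently.
	ConcurrencySafe bool `json:"concurrency_safe"`

	// ReadOnly reports whether the tool is read-only.
	ReadOnly bool `json:"read_only"`
}

ToolDefinition is a tool defined at runtime. The Agent generates this structure through conversation and then registers it with the Engine.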

type ToolExecType

type ToolExecType string

ToolExecType is how a tool is executed.

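const (
	// ExecScript runs the tool as a shell script.
	// The Agent writes a complete bash script; parameters are passed in through environment variables.
	ExecScript ToolExecType = "script"

	// ExecCommand runs the tool from a command template.
	// The Agent writes a command with placeholders; the engine substitutes the parameters and runs it.
	ExecCommand ToolExecType = "command"

	// ExecComposite composes existing tools.
	// The Agent defines a tool chain (a sequence of calls to existing tools).
	ExecComposite ToolExecType = "composite"
)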
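
To illustrate the two parameter-passing conventions named above (TOOL_INPUT_<PARAM_NAME> environment variables for scripts, {{.param_name}} placeholders for command templates), here is a minimal sketch. The helpers scriptEnv and renderCommand are hypothetical, and using text/template for substitution is an assumption suggested by the placeholder syntax; the engine's actual substitution and quoting logic is not shown in this section.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"text/template"
)

// scriptEnv turns tool parameters into TOOL_INPUT_<PARAM_NAME>=value entries,
// the shape expected by os/exec's Cmd.Env. Hypothetical helper.
func scriptEnv(params map[string]string) []string {
	env := make([]string, 0, len(params))
	for name, value := range params {
		env = append(env, "TOOL_INPUT_"+strings.ToUpper(name)+"="+value)
	}
	sort.Strings(env) // deterministic order for the example
	return env
}

// renderCommand fills {{.param_name}} placeholders. missingkey=error makes a
// template that references an unknown parameter fail instead of rendering
// "<no value>". Hypothetical helper; it performs no shell quoting.
func renderCommand(tmpl string, params map[string]string) (string, error) {
	t, err := template.New("command").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf strings.Builder
	if err := t.Execute(&buf, params); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	params := map[string]string{"pattern": "TODO", "dir": "./pkg"}

	fmt.Println(scriptEnv(params))

	cmd, err := renderCommand("grep -rn {{.pattern}} {{.dir}}", params)
	fmt.Println(cmd, err)

	_, err = renderCommand("grep -rn {{.pattern}} {{.file}}", params)
	fmt.Println("missing parameter rejected:", err != nil)
}
```

Plain substitution like this does not escape shell metacharacters, so a real executor would need to quote or validate parameter values before running the rendered command.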

type ToolResult

type ToolResult struct {
	Output  string
	IsError bool
}

ToolResult is the execution result from RuntimeTool.Execute -- a simplified shape kept local to evolve so this package does not import pkg/tools and create a cycle.

Status (2026-04-21): both fields are dead in scan-baseline for the same reason as RuntimeToolMetadata -- the RuntimeTool.Execute → Engine tool registry → model-visible-result path is not wired in core. Tests in evolve_test.go assert IsError / Output to lock forward-propagation; the real consumer is the future adapter that translates evolve.ToolResult into tools.Result for model consumption.

type WeightedEvaluator

type WeightedEvaluator struct {
	// contains filtered or unexported fields
}

WeightedEvaluator is the reference "cheap scorer" for the common weighted-linear fitness pattern: fitness = sum_i (feature_i(candidate) * weight_i).

Why this is the main reference impl, not a closure wrapper: Weighted-linear scoring covers the vast majority of "cheap first-pass" evaluators across domains (logistics carrier scoring, ad eCPM, multi-factor stock selection, hiring rubrics, engineering trade-offs). The shape is stable, declarative, and auditable; consumers plug in features without writing a full Evaluator.

For non-linear / fn-style scoring use FuncEvaluator.

func NewWeightedEvaluator

func NewWeightedEvaluator(opts ...WeightedOption) (*WeightedEvaluator, error)

NewWeightedEvaluator builds the evaluator and validates:

  • at least one feature registered
  • every feature has exactly one weight
  • every weight has exactly one feature (no orphan weights)

All errors are reported together (joined via errors.Join) so a misconfigured evaluator surfaces every issue in one go.
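
The collect-then-join style of validation can be sketched with errors.Join (Go 1.20+). This is an illustration of the three checks listed above, not the constructor's actual code: the validate function and its error messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// validate applies the three documented checks and reports every problem at
// once. It returns nil when the configuration is consistent, because
// errors.Join returns nil when given no errors.
func validate(features []string, weights map[string]float64) error {
	var errs []error
	if len(features) == 0 {
		errs = append(errs, errors.New("no features registered"))
	}

	seen := make(map[string]bool, len(features))
	for _, f := range features {
		if seen[f] {
			errs = append(errs, fmt.Errorf("feature %q registered twice", f))
			continue
		}
		seen[f] = true
		if _, ok := weights[f]; !ok {
			errs = append(errs, fmt.Errorf("feature %q has no weight", f))
		}
	}

	// Sort weight names so orphan errors come out in a stable order.
	names := make([]string, 0, len(weights))
	for name := range weights {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if !seen[name] {
			errs = append(errs, fmt.Errorf("weight %q has no feature", name))
		}
	}
	return errors.Join(errs...)
}

func main() {
	err := validate(
		[]string{"on_time", "cost"},
		map[string]float64{"on_time": 0.5, "speed": 0.2},
	)
	fmt.Println(err)
}
```

Here a missing weight for "cost" and an orphan weight "speed" are both reported in the same error, one message per line.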

func (*WeightedEvaluator) Features

func (e *WeightedEvaluator) Features() []string

Features returns the registered feature names in sorted order. Useful for UI introspection and tests. Returns a copy; mutating the result does not affect the evaluator.

func (*WeightedEvaluator) Score

Score implements Evaluator. breakdown contains the RAW feature values, not weighted contributions: a debugger wants to see "on_time feature was 0.3", not "contribution 0.15 = 0.3 * 0.5". Reconstructing contributions from breakdown + weights is a one-liner; the reverse is lossy.
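
The reconstruction mentioned above might look like the following sketch. The contributions helper is hypothetical; it takes a raw breakdown together with a weight map such as the copy returned by Weights, and multiplies them per feature.

```go
package main

import (
	"fmt"
	"sort"
)

// contributions rebuilds per-feature weighted contributions from the raw
// breakdown and the weights, and returns their sum alongside.
func contributions(breakdown, weights map[string]float64) (map[string]float64, float64) {
	contrib := make(map[string]float64, len(breakdown))
	total := 0.0
	for name, raw := range breakdown {
		c := raw * weights[name]
		contrib[name] = c
		total += c
	}
	return contrib, total
}

func main() {
	// Example values: raw feature readings and their weights. The negative
	// weight on cost means a smaller cost value scores better.
	breakdown := map[string]float64{"on_time": 0.3, "cost": 0.4}
	weights := map[string]float64{"on_time": 0.5, "cost": -0.25}

	contrib, total := contributions(breakdown, weights)

	names := make([]string, 0, len(contrib))
	for name := range contrib {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%-8s raw=%.2f weight=%+.2f contribution=%+.3f\n",
			name, breakdown[name], weights[name], contrib[name])
	}
	fmt.Printf("fitness=%.3f\n", total)
}
```

Going the other way, from a contribution back to the raw value, requires dividing by the weight and breaks down when a weight is zero, which is one way to see why the raw values are the more useful thing to return.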

func (*WeightedEvaluator) Weights

func (e *WeightedEvaluator) Weights() map[string]float64

Weights returns a copy of the weight map.

type WeightedOption

type WeightedOption func(*weightedConfig)

WeightedOption configures a WeightedEvaluator.

func WithFeature

func WithFeature(name string, fn FeatureFunc) WeightedOption

WithFeature registers a named feature extractor. Duplicate names produce a constructor error (silently overwriting a feature is a frequent config-drift source: a copy-pasted option block overriding a prior definition).

func WithNormalization

func WithNormalization(on bool) WeightedOption

WithNormalization enables [0, 1] clipping of every feature value before the weighted sum. Default off: many real-world features (cost in RMB, latency in ms, inventory count) are absolute scales where clipping silently corrupts the score. Only enable when your features are already bounded-unit (fraction, score 0-1).
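
A small sketch shows how clipping can erase the signal of an absolute-scale feature. The clip01 and weightedSum helpers are hypothetical and only mimic the behaviour described above; the latency values and the weight are made up for the example.

```go
package main

import "fmt"

// clip01 clamps v into [0, 1].
func clip01(v float64) float64 {
	if v < 0 {
		return 0
	}
	if v > 1 {
		return 1
	}
	return v
}

// weightedSum computes sum(feature * weight), optionally clipping each
// feature value to [0, 1] before it is weighted.
func weightedSum(features, weights map[string]float64, normalize bool) float64 {
	total := 0.0
	for name, v := range features {
		if normalize {
			v = clip01(v)
		}
		total += v * weights[name]
	}
	return total
}

func main() {
	weights := map[string]float64{"latency_ms": -0.001}
	fast := map[string]float64{"latency_ms": 180}
	slow := map[string]float64{"latency_ms": 950}

	// Without clipping the faster candidate scores higher.
	fmt.Println("raw:    ", weightedSum(fast, weights, false), weightedSum(slow, weights, false))

	// With clipping both latencies collapse to 1 and the scores are equal.
	fmt.Println("clipped:", weightedSum(fast, weights, true), weightedSum(slow, weights, true))
}
```

With clipping on, the 180 ms and 950 ms candidates become indistinguishable, which is the silent corruption the default-off setting is meant to avoid.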

func WithWeight

func WithWeight(name string, w float64) WeightedOption

WithWeight sets a single weight. Negative values are allowed and convey "smaller feature value is better" (e.g. cost, latency, error rate).

func WithWeights

func WithWeights(m map[string]float64) WeightedOption

WithWeights sets weights in bulk. Equivalent to WithWeight per entry.
