Documentation ¶
Overview ¶
Package evolve provides the v0.2+ self-evolution interface matrix for Flyto Agent: LLM-generated candidates, offline log replay, versioned parameter storage, and shadow-mode validation.
API Consumption Shapes ¶
All ten contracts here are form 3 (synchronous callback) under the three Flyto API shapes (see `docs/api-reference.md` "API 消费形态 / API Consumption Patterns"). The engine's evolve loop calls each interface synchronously in sequence: Generator produces a candidate -> Evaluator scores it -> Reflector critiques -> ApprovalFunc gates -> ParameterEvolver proposes -> ShadowRunner validates in shadow mode -> ParameterStore persists. LogSource / LogReplayer / FeedbackChannel are pull interfaces the evolve loop drains during offline replay and feedback ingestion. Consumers implement any or all of them to plug alternative storage, models, or evaluation policies; the interfaces are designed for coexistence (see "Multi-Implementation Coexistence" below).
Interface Topology ¶
Ten contracts (9 interfaces + 1 func type, canonical definitions in interfaces.go):
Core loop       : Generator / Evaluator / Reflector / ApprovalFunc (func)
Parameter mgmt  : ParameterStore / ParameterEvolver
Log / feedback  : LogReplayer / LogSource / FeedbackChannel
Risk mitigation : ShadowRunner
Reference Implementations ¶
Shipped in this package (file-backed / in-process defaults). Alternative implementations (e.g. SQL-backed ParameterStore) are expected to live in the platform layer or consumer code and coexist with these via interface swap.
Generator        -> generator_llm.go (LLMGenerator)
Evaluator        -> evaluator_impls.go (Func / Weighted / Rule)
Reflector        -> reflector_impls.go, self_reflector.go
ApprovalFunc     -> evolve.go (Config.ApprovalFunc)
ParameterStore   -> parameter_store_file.go
ParameterEvolver -> parameter_evolver_default.go
LogReplayer      -> default_log_replayer.go
LogSource        -> log_source_file.go
FeedbackChannel  -> feedback_channel_file.go
ShadowRunner     -> shadow_runner_default.go
Provider Bridge ¶
llm_adapter_flyto.go ships FlytoLLMClient, a thin adapter that wraps any flyto.ModelProvider (anthropic / openai / minimax / gemini / ollama / lmstudio / openrouter) as an LLMClient consumable by LLMGenerator. Event aggregation prefers TextEvent as authoritative and falls back to TextDeltaEvent when the provider does not emit complete-block events, avoiding double-counting on providers that emit both. ErrorEvent is wrapped via fmt.Errorf("%w: ...", ErrLLMFailed) so callers can use errors.Is.
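The error-wrapping convention described above can be illustrated with a minimal, self-contained sketch. It does not import the package: errLLMFailed is a local stand-in for ErrLLMFailed, and wrapProviderError and isLLMFailure are hypothetical helper names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errLLMFailed is a local stand-in for the package's ErrLLMFailed sentinel.
var errLLMFailed = errors.New("evolve: LLM call failed")

// wrapProviderError (hypothetical helper) mirrors the documented shape:
// the sentinel is the %w operand and the provider detail follows as text.
func wrapProviderError(detail string) error {
	return fmt.Errorf("%w: %s", errLLMFailed, detail)
}

// isLLMFailure (hypothetical helper) is the check a caller would perform.
func isLLMFailure(err error) bool {
	return errors.Is(err, errLLMFailed)
}

func main() {
	err := wrapProviderError("stream closed by provider")
	fmt.Println(isLLMFailure(err)) // true
	fmt.Println(err)               // evolve: LLM call failed: stream closed by provider
}
```

Because the sentinel is wrapped rather than formatted into the message, the check keeps working even if the detail text changes between providers.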
Note: flyto.Request (the cross-provider contract) does not expose Temperature, so LLMCallOpts.Temperature is intentionally ignored by this adapter. Set temperature at the provider factory Config (e.g. anthropic.Config) when per-call temperature control is needed.
End-to-End Example ¶
See core/examples/evolve_closed_loop/main.go -- all nine interfaces chained in a single process, demonstrating the full generate -> evaluate -> reflect -> propose -> shadow loop against fixture feedback.
Domain Agnostic ¶
All interfaces exchange business data as any / string / float64. The engine makes no assumption about any industry schema. Concrete data-source wiring (orders / settlement / complaints / KPI streams) belongs to the consumer layer. Rationale: docs/evolve-strategy.md and the "data ingress boundary" note in memory project_architecture_decisions.md.
Multi-Implementation Coexistence ¶
Every interface is expected to have multiple implementations side by side. For example the file-backed ParameterStore in this package serves CLI / TUI workflows, while platform/ is free to ship a SQL-backed ParameterStore for multi-tenant deployments. Interface contracts guarantee swap without code changes at call sites.
Strategic Background ¶
docs/evolve-strategy.md -- full strategy, Round 1/2/3 design rounds,
merge / absorb decisions, v0.2-v3.0 roadmap
CHANGELOG.md -- v0.2-dev delivery summary (Unreleased)
Academic References ¶
The design draws on and cites the following arXiv works:
2508.07407  Comprehensive Survey of Self-Evolving AI Agents
2507.21046  Survey of Self-Evolving Agents
2509.04642  Maestro: Joint Graph & Config Optimization
2512.09108  Artemis / Evolving Excellence (mixed-variable optimization)
2406.16218  Trace: Next AutoDiff (NeurIPS 2024)
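Package evolve implements the Agent's self-evolution capability.
Strategic positioning (see docs/evolve-strategy.md): evolve is a flywheel made of a domain-agnostic engine + an industry-specific consumer layer + an ML reflector fed by real data. It aligns with two authoritative 2025 surveys, arXiv:2507.21046 and arXiv:2508.07407, and fills the academically acknowledged gap of "system-level evolution closed over business KPIs".
The open problems (credit assignment / stability paradox / cold start) are research-grade questions, not engineering TODOs. See docs/evolve-strategy.md §9 and §10.
Design principles:
- Self-evolution is traceable and reversible; every parameter change must carry a reason that enters the audit chain
- Evolution artifacts persist across sessions, with multiple implementations coexisting (file / SQL / custom)
- The consumer layer approves or rejects every evolution action (ApprovalFunc)
- Domain agnostic means every domain is supported; no industry schema is embedded
The three v0.1 structs (retained; to be reworked into concrete implementations of the interface matrix below):
- ToolBuilder - defines new tools at runtime (the Agent writes its own tool code)
- SkillLearner - saves successful workflows as reusable skills
- SelfReflector - analyses its own performance and adapts
v0.2+ interface matrix (defined in interfaces.go; 9 interfaces + 1 func type):
Core loop       : Generator / Evaluator / Reflector / ApprovalFunc (func)
Parameter mgmt  : ParameterStore / ParameterEvolver
Log / feedback  : LogReplayer / LogSource / FeedbackChannel
Risk mitigation : ShadowRunner
How the interfaces cooperate (fast loop):
input → Generator → []Candidate → Evaluator → rank by fitness → Top-K → approval → execution
How the interfaces cooperate (slow loop):
LogSource → LogReplayer → Reflector → ParameterEvolver.Propose
→ ApprovalFunc → ParameterStore.Set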
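The fast loop can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's code: candidate, scored, topK and fastLoop are hypothetical local names, the scoring function stands in for Evaluator.Score, and the approve callback stands in for ApprovalFunc (context and error returns are omitted for brevity).

```go
package main

import (
	"fmt"
	"sort"
)

// candidate and scored are local stand-ins; the real Candidate carries more fields.
type candidate struct{ Content string }

type scored struct {
	C       candidate
	Fitness float64
}

// topK scores every candidate, sorts by fitness descending and keeps at most k.
func topK(cands []candidate, score func(candidate) float64, k int) []scored {
	out := make([]scored, 0, len(cands))
	for _, c := range cands {
		out = append(out, scored{C: c, Fitness: score(c)})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Fitness > out[j].Fitness })
	if k < 0 {
		k = 0
	}
	if k < len(out) {
		out = out[:k]
	}
	return out
}

// fastLoop returns the contents that passed approval, in rank order.
// A nil approve callback rejects everything, matching the safe default.
func fastLoop(cands []candidate, score func(candidate) float64, k int, approve func(scored) bool) []string {
	var executed []string
	for _, s := range topK(cands, score, k) {
		if approve != nil && approve(s) {
			executed = append(executed, s.C.Content)
		}
	}
	return executed
}

func main() {
	cands := []candidate{{"a"}, {"bbb"}, {"cc"}}
	byLength := func(c candidate) float64 { return float64(len(c.Content)) }
	approveAll := func(scored) bool { return true }
	fmt.Println(fastLoop(cands, byLength, 2, approveAll)) // [bbb cc]
}
```

Ranking happens before approval so that the approver only sees the K strongest candidates.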
Index ¶
- Variables
- type Adjustment
- type AdjustmentType
- type AggregatedStats
- type AggregatorReflector
- func (r *AggregatorReflector) Entities() []string
- func (r *AggregatorReflector) Metrics(entity string) []string
- func (r *AggregatorReflector) OnEvent(ctx context.Context, event ReplayEvent) error
- func (r *AggregatorReflector) Reset()
- func (r *AggregatorReflector) Stats(entity, metric string) AggregatedStats
- type ApprovalFunc
- type Candidate
- type CandidateSampler
- type Change
- type ChangeEvent
- type Config
- type CreateToolTool
- type DefaultLogReplayer
- type DefaultParameterEvolver
- type DefaultShadowRunner
- type Evaluator
- type EvolutionProposal
- type EvolutionStore
- type EvolutionType
- type Evolver
- func (e *Evolver) History() ([]*EvolutionProposal, error)
- func (e *Evolver) Propose(ctx context.Context, proposal *EvolutionProposal) error
- func (e *Evolver) Reflector() *SelfReflector
- func (e *Evolver) SkillLearner() *SkillLearner
- func (e *Evolver) SystemPromptFragment() string
- func (e *Evolver) ToolBuilder() *ToolBuilder
- type EvolverOption
- type FeatureFunc
- type Feedback
- type FeedbackChannel
- type FileFeedbackChannel
- type FileLogSource
- type FileParameterStore
- func (s *FileParameterStore) Get(ctx context.Context, key string) (any, int, error)
- func (s *FileParameterStore) History(ctx context.Context, key string, limit int) ([]Change, error)
- func (s *FileParameterStore) List(ctx context.Context, prefix string) ([]string, error)
- func (s *FileParameterStore) Lock(ctx context.Context, key string, reason string) error
- func (s *FileParameterStore) Rollback(ctx context.Context, key string, toVersion int, reason string) (int, error)
- func (s *FileParameterStore) Set(ctx context.Context, key string, value any, reason string) (int, error)
- func (s *FileParameterStore) Unlock(ctx context.Context, key string, reason string) error
- func (s *FileParameterStore) Watch(ctx context.Context, keyPrefix string) (<-chan ChangeEvent, error)
- type FilterFunc
- type FlytoLLMClient
- type FuncEvaluator
- type FuncReflector
- type GenOpt
- type Generator
- type GeneratorOption
- type LLMCallOpts
- type LLMClient
- type LLMGenerator
- type LearnSkillTool
- type LearningSource
- type LogEntry
- type LogReplayer
- type LogSource
- type Observation
- type ParamApplier
- type ParameterEvolver
- type ParameterStore
- type ProposalStatus
- type ProposerFunc
- type ReflectTool
- type Reflection
- type ReflectionType
- type Reflector
- type ReplayEvent
- type ReplayerOption
- type RuntimeTool
- func (t *RuntimeTool) Description(ctx context.Context) string
- func (t *RuntimeTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)
- func (t *RuntimeTool) InputSchema() json.RawMessage
- func (t *RuntimeTool) Metadata() RuntimeToolMetadata
- func (t *RuntimeTool) Name() string
- type RuntimeToolMetadata
- type SelfReflector
- type SessionMetrics
- type ShadowResult
- type ShadowRunner
- type SkillDefinition
- type SkillLearner
- type SkillStep
- type ToolBuilder
- type ToolDefinition
- type ToolExecType
- type ToolResult
- type WeightedEvaluator
- type WeightedOption
Constants ¶
This section is empty.
Variables ¶
var ErrCandidateParseFailed = errors.New("evolve: candidate parse failed")
ErrCandidateParseFailed is returned when Generator cannot extract a JSON array of candidates from the LLM response. The caller decides whether to retry, relax parsing, or fall back to a static candidate set.
var ErrEntityRequired = errors.New("evolve: entity required")
ErrEntityRequired is returned by FeedbackChannel.Report / Query when entity is the empty string. Entity distinguishes data streams (carrier / driver / route / ...); collapsing them into a single bucket would poison Query results.
var ErrInvalidK = errors.New("evolve: K must be > 0")
ErrInvalidK is returned by Generator.Generate when K <= 0.
var ErrLLMFailed = errors.New("evolve: LLM call failed")
ErrLLMFailed wraps errors from the underlying LLM Complete call so callers can distinguish LLM failure from parse failure via errors.Is.
var ErrMetricRequired = errors.New("evolve: metric required")
ErrMetricRequired is returned by FeedbackChannel.Report when metric is the empty string. Query accepts empty metric as "all metrics".
var ErrParameterLocked = errors.New("evolve: parameter is locked")
ErrParameterLocked is returned when Set is called on a locked key. Unlock first, or use Rollback (allowed on locked keys by design).
var ErrParameterNotFound = errors.New("evolve: parameter not found")
ErrParameterNotFound is returned when Get / History / Rollback is called on a non-existent key.
var ErrReasonRequired = errors.New("evolve: reason required for audit chain")
ErrReasonRequired is returned when Set / Rollback / Lock / Unlock is called with empty reason. Audit chain requires human-readable reason.
var ErrVersionNotFound = errors.New("evolve: version not found")
ErrVersionNotFound is returned when Rollback targets a non-existent version.
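The interplay of the parameter-store sentinels can be sketched with a tiny in-memory store. Everything here is a local stand-in: memStore and classify are hypothetical, context parameters are omitted, and the choice to record a rollback as a new appended version is an assumption of this sketch rather than documented behaviour of FileParameterStore.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package sentinels.
var (
	errLocked   = errors.New("evolve: parameter is locked")
	errNotFound = errors.New("evolve: parameter not found")
	errReason   = errors.New("evolve: reason required for audit chain")
	errVersion  = errors.New("evolve: version not found")
)

// memStore is a hypothetical in-memory store; versions[key][i] holds version i+1.
type memStore struct {
	versions map[string][]any
	locked   map[string]bool
}

func newMemStore() *memStore {
	return &memStore{versions: map[string][]any{}, locked: map[string]bool{}}
}

func (s *memStore) Set(key string, value any, reason string) (int, error) {
	if reason == "" {
		return 0, errReason
	}
	if s.locked[key] {
		return 0, errLocked
	}
	s.versions[key] = append(s.versions[key], value)
	return len(s.versions[key]), nil
}

func (s *memStore) Lock(key, reason string) error {
	if reason == "" {
		return errReason
	}
	s.locked[key] = true
	return nil
}

// Rollback is allowed on locked keys. Assumption: it appends the old value as a new version.
func (s *memStore) Rollback(key string, toVersion int, reason string) (int, error) {
	if reason == "" {
		return 0, errReason
	}
	hist, ok := s.versions[key]
	if !ok {
		return 0, errNotFound
	}
	if toVersion < 1 || toVersion > len(hist) {
		return 0, errVersion
	}
	s.versions[key] = append(hist, hist[toVersion-1])
	return len(s.versions[key]), nil
}

// classify shows errors.Is dispatch on the sentinels.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errReason):
		return "reason"
	case errors.Is(err, errLocked):
		return "locked"
	case errors.Is(err, errNotFound):
		return "not_found"
	case errors.Is(err, errVersion):
		return "version"
	}
	return "other"
}

func main() {
	s := newMemStore()
	_, err := s.Set("threshold", 0.5, "")
	fmt.Println(classify(err)) // reason
	s.Set("threshold", 0.5, "initial value")
	s.Lock("threshold", "freeze during audit")
	_, err = s.Set("threshold", 0.7, "tune")
	fmt.Println(classify(err)) // locked
	v, err := s.Rollback("threshold", 1, "revert while locked")
	fmt.Println(v, classify(err)) // 2 ok
}
```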
Functions ¶
This section is empty.
Types ¶
type Adjustment ¶
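type Adjustment struct {
// Type is the kind of adjustment.
Type AdjustmentType `json:"type"`
// Description describes the adjustment.
Description string `json:"description"`
// Target is what the adjustment applies to (tool name, prompt fragment, etc.).
Target string `json:"target"`
// Suggestion is the concrete recommendation.
Suggestion string `json:"suggestion"`
// Priority ranges from 1 to 5 (5 is highest).
Priority int `json:"priority"`
// Applied reports whether the adjustment has been applied.
Applied bool `json:"applied"`
}
Adjustment is a single adaptive-adjustment suggestion.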
type AdjustmentType ¶
type AdjustmentType string
AdjustmentType is the kind of adjustment.
const (
// AdjustPrompt adjusts the system prompt.
AdjustPrompt AdjustmentType = "prompt"
// AdjustToolPriority adjusts tool priority / preference.
AdjustToolPriority AdjustmentType = "tool_priority"
// AdjustContext adjusts the context preloading strategy.
AdjustContext AdjustmentType = "context"
// AdjustWorkflow adjusts the workflow.
AdjustWorkflow AdjustmentType = "workflow"
// AdjustLesson records a lesson learned.
AdjustLesson AdjustmentType = "lesson"
)
type AggregatedStats ¶
type AggregatedStats struct {
Count int // events whose Feedback matched this (entity, metric)
Sum float64 // sum of Feedback.Value
Mean float64 // Sum / Count; 0 when Count == 0
Min, Max float64 // bounds of observed Feedback.Value; 0 when Count == 0
PendingCount int // events whose Feedback was nil for this entity
LastTimestamp time.Time // most recent event timestamp (decision or feedback)
}
AggregatedStats summarises a stream of feedback events for one entity+metric pair. Returned by AggregatorReflector.Stats() as a pull-form snapshot (value copy, safe for external consumers to mutate).
Sum is listed as dead in scan-baseline because no in-tree code reads it after AggregatorReflector writes it -- the consumer is an external watch-only panel / operator dashboard that calls Stats() and renders rolling totals. The field is pinned by a test in reflector_impls_test.go.
type AggregatorReflector ¶
type AggregatorReflector struct {
// contains filtered or unexported fields
}
AggregatorReflector is the zero-ML reference Reflector: it consumes ReplayEvent stream and maintains rolling descriptive statistics keyed by (entity, metric). Stats are read-only queryable; the reflector does not push back into ParameterEvolver or ParameterStore.
Why this is a useful ref impl even without ML: A staged rollout of evolve typically runs a "watch-only" phase where the operator just wants to see "what would the reflector see?" A stats view is a cheap way to get there and doubles as the input signal for an eventual ML reflector. The interface contract for Reflector makes no assumption that reflection has to trigger writes.
Concurrency:
- OnEvent acquires a write lock for the duration of one update. Contention is minor under typical replay rates (hundreds of events/sec).
- Stats / Entities / Metrics / Reset take RLock / Lock as needed.
- All exported reads return copies; internal maps are never leaked.
func NewAggregatorReflector ¶
func NewAggregatorReflector() *AggregatorReflector
NewAggregatorReflector returns an empty reflector.
func (*AggregatorReflector) Entities ¶
func (r *AggregatorReflector) Entities() []string
Entities returns every known entity, sorted.
func (*AggregatorReflector) Metrics ¶
func (r *AggregatorReflector) Metrics(entity string) []string
Metrics returns every known metric for the given entity, sorted. The pending pseudo-metric ("") is excluded.
func (*AggregatorReflector) OnEvent ¶
func (r *AggregatorReflector) OnEvent(ctx context.Context, event ReplayEvent) error
OnEvent implements Reflector.
Event classification:
- event.Feedback == nil: counted as pending under key (entity, "") so the operator can see decisions still awaiting feedback.
- event.Feedback != nil: contributes to (entity, metric) count / sum / min / max / mean / lastTs. PendingCount of the matched cell is also decremented, because first-touch pairing has "resolved" an earlier pending event. The counter never goes below zero: feedback arriving before any pending event is legal (e.g. external feedback ingest).
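The classification above can be sketched as a small accumulator. This is not the package's implementation: aggregator, stats and observe are hypothetical names, locking and timestamps are omitted, and the sketch assumes the pending counter lives on the (entity, "") cell.

```go
package main

import "fmt"

// stats is a local stand-in for AggregatedStats (LastTimestamp omitted).
type stats struct {
	Count          int
	Sum, Mean      float64
	Min, Max       float64
	PendingCount   int
}

type cellKey struct{ entity, metric string }

// aggregator is a hypothetical, single-goroutine sketch (no locking).
type aggregator struct{ cells map[cellKey]*stats }

func newAggregator() *aggregator { return &aggregator{cells: map[cellKey]*stats{}} }

func (a *aggregator) cell(entity, metric string) *stats {
	k := cellKey{entity, metric}
	if a.cells[k] == nil {
		a.cells[k] = &stats{}
	}
	return a.cells[k]
}

// observe records one event. hasFeedback=false models event.Feedback == nil.
func (a *aggregator) observe(entity, metric string, value float64, hasFeedback bool) {
	if !hasFeedback {
		a.cell(entity, "").PendingCount++ // decision still awaiting feedback
		return
	}
	c := a.cell(entity, metric)
	if c.Count == 0 || value < c.Min {
		c.Min = value
	}
	if c.Count == 0 || value > c.Max {
		c.Max = value
	}
	c.Count++
	c.Sum += value
	c.Mean = c.Sum / float64(c.Count)
	// Assumption: one feedback resolves one pending decision, never below zero.
	if p := a.cell(entity, ""); p.PendingCount > 0 {
		p.PendingCount--
	}
}

// snapshot returns a value copy; unknown pairs yield the zero stats.
func (a *aggregator) snapshot(entity, metric string) stats {
	if c, ok := a.cells[cellKey{entity, metric}]; ok {
		return *c
	}
	return stats{}
}

func main() {
	a := newAggregator()
	a.observe("carrier-1", "", 0, false)
	a.observe("carrier-1", "on_time", 0.5, true)
	a.observe("carrier-1", "on_time", 1.0, true)
	fmt.Printf("%+v\n", a.snapshot("carrier-1", "on_time"))
	fmt.Println(a.snapshot("carrier-1", "").PendingCount) // 0
}
```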
func (*AggregatorReflector) Reset ¶
func (r *AggregatorReflector) Reset()
Reset drops all aggregated state.
func (*AggregatorReflector) Stats ¶
func (r *AggregatorReflector) Stats(entity, metric string) AggregatedStats
Stats returns a snapshot of (entity, metric) statistics. Unknown pair returns the zero AggregatedStats (all fields zero) with no error. metric="" returns the pending-only view for entity (Count=0, Sum=0, PendingCount filled).
type ApprovalFunc ¶
type ApprovalFunc func(ctx context.Context, proposal *EvolutionProposal) (bool, error)
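ApprovalFunc is the approval callback for evolution actions. The consumer layer implements it to decide whether to approve the Agent's self-evolution. This is the safety boundary: the Agent cannot modify itself without human approval. CLEVER: because approval is a callback, the Agent never decides on its own whether to evolve. Even if the Agent writes flawless tool code, that code is not registered with the engine until a human approves it. When ApprovalFunc is nil, every proposal is rejected automatically (safe default).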
type Candidate ¶
Candidate is a single candidate produced by a Generator. Meta carries generation-time metadata (temperature / model / role) so that Evaluator and ParameterEvolver can trace it back.
type CandidateSampler ¶
CandidateSampler supplies the test set for a shadow run. traffic is the caller-requested coverage in [0, 1]; the sampler interprets it (hash partition, time-window slice, random subsample, fixture replay) and returns the selected Candidates.
Returning nil/empty is legal and produces a zero-SampleSize ShadowResult (no data, no divergence).
type Change ¶
Change is the history record of one parameter change. Author records who initiated the change (evolver id / human user / rollback / lock / unlock).
type ChangeEvent ¶
ChangeEvent is what ParameterStore.Watch pushes to subscribers. IsLock=true marks a Lock/Unlock event (Change.Value is nil in that case); otherwise it's a Set/Rollback. Change carries the full audit row (Version / Value / Reason / Timestamp / Author) so subscribers do not need a follow-up History() call.
Consumers are external: dashboards rendering the audit timeline, platform services fanning Watch out to tenants, test harnesses asserting ordering. scan-baseline.json lists Change / IsLock as dead because no in-tree code reads them after the test harness verifies forward-propagation in parameter_store_file_test.go -- expected pull-API state.
type Config ¶
type Config struct {
// StoreDir is the directory where evolution artifacts are stored.
StoreDir string
// ApprovalFunc is the approval callback (nil rejects every proposal).
ApprovalFunc ApprovalFunc
// AutoApproveReadOnly, when true, lets read-only evolutions bypass
// ApprovalFunc. "Read-only" means the evolution only adds declarative
// content that the engine will read later -- currently limited to
// EvolveNewSkill (a markdown skill file). Code-executing evolutions
// (EvolveNewTool), workflow mutations (EvolveOptimize), and runtime
// self-tuning (EvolveSelfAdjust) are NEVER auto-approved by this flag,
// because their content can escape the read-only boundary.
//
// This is a security ergonomics trade-off: skill learning is the
// high-frequency low-risk path (agent reads its own past lessons),
// and routing every skill proposal through a human gate either
// trains operators to click-through blindly (worse than automation)
// or blocks the loop entirely. Everything else stays locked.
AutoApproveReadOnly bool
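// MaxToolsPerSession is the maximum number of tools created per session.
MaxToolsPerSession int
// MaxSkillsPerSession is the maximum number of skills learned per session.
MaxSkillsPerSession int
// Observer is the observability hook (optional).
// ELEVATED: injected through the constructor. The evolution system shares the
// Engine's lifecycle and never swaps its observer at runtime, so constructor
// injection is the simplest option.
// Alternative: a SetObserver setter (adds one more state-mutation point and is
// less clear than one-shot constructor injection).
Observer flyto.EventObserver
// SecretGuard is the secret scanner (optional).
// When set, ToolBuilder scans tool scripts before saving them and blocks
// scripts that contain API keys from being persisted.
// When nil, no scan is performed (backward compatible).
SecretGuard security.SecretGuard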
}
Config is the configuration for Evolver.
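The approval rules spelled out in the Config comments can be condensed into one decision function. This is a sketch under stated assumptions rather than the Evolver's code: config, proposal and decide are hypothetical local names, the context parameter of ApprovalFunc is omitted, and the order of the checks is this sketch's reading of the comments.

```go
package main

import "fmt"

type evolutionType string

// Values copied from the documented EvolutionType constants.
const (
	evolveNewTool    evolutionType = "new_tool"
	evolveNewSkill   evolutionType = "new_skill"
	evolveOptimize   evolutionType = "optimize"
	evolveSelfAdjust evolutionType = "self_adjust"
)

type proposal struct {
	Type  evolutionType
	Title string
}

// approvalFunc is a stand-in for ApprovalFunc without the context parameter.
type approvalFunc func(p *proposal) (bool, error)

type config struct {
	ApprovalFunc        approvalFunc
	AutoApproveReadOnly bool
}

// decide (hypothetical) applies the documented rules:
//  1. AutoApproveReadOnly only ever bypasses the gate for new_skill.
//  2. A nil ApprovalFunc rejects every remaining proposal.
//  3. Otherwise the consumer's callback decides.
func decide(cfg config, p *proposal) (bool, error) {
	if cfg.AutoApproveReadOnly && p.Type == evolveNewSkill {
		return true, nil
	}
	if cfg.ApprovalFunc == nil {
		return false, nil
	}
	return cfg.ApprovalFunc(p)
}

func main() {
	cfg := config{AutoApproveReadOnly: true}
	for _, t := range []evolutionType{evolveNewSkill, evolveNewTool, evolveOptimize, evolveSelfAdjust} {
		ok, _ := decide(cfg, &proposal{Type: t})
		fmt.Printf("%s approved=%v\n", t, ok)
	}
}
```

With no ApprovalFunc configured, only the skill proposal passes; the three other types stay behind the gate.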
type CreateToolTool ¶
type CreateToolTool struct {
// contains filtered or unexported fields
}
CreateToolTool is the "create a new tool" tool. The Agent calls it to define a new runtime tool.
func NewCreateToolTool ¶
func NewCreateToolTool(evolver *Evolver, cwd string) *CreateToolTool
NewCreateToolTool creates the CreateTool tool.
func (*CreateToolTool) Description ¶
func (t *CreateToolTool) Description(ctx context.Context) string
func (*CreateToolTool) Execute ¶
func (t *CreateToolTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)
func (*CreateToolTool) InputSchema ¶
func (t *CreateToolTool) InputSchema() json.RawMessage
func (*CreateToolTool) Name ¶
func (t *CreateToolTool) Name() string
type DefaultLogReplayer ¶
type DefaultLogReplayer struct {
// contains filtered or unexported fields
}
DefaultLogReplayer is the reference LogReplayer implementation. It composes a LogSource and a FeedbackChannel; either can be any implementation (file / SQL / object-storage), so the struct is not named "FileLogReplayer".
Pairing strategy: For every LogEntry read from the source, the replayer looks up the entity's feedback history and pairs it with at most one feedback per metric. The "first-touch" rule picks the earliest feedback whose Timestamp falls in [entry.Timestamp, entry.Timestamp + window). Zero matches produce a single ReplayEvent with Feedback=nil (contract: nil = KPI not yet arrived).
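The first-touch rule can be sketched as a single pass over an entity's feedback history. The names feedback, firstTouch and demoPairing are hypothetical, and the sketch ignores the per-entity cache and the Feedback=nil event emitted when nothing matches.

```go
package main

import (
	"fmt"
	"time"
)

// feedback is a trimmed local stand-in for the Feedback struct.
type feedback struct {
	Timestamp time.Time
	Metric    string
	Value     float64
}

// firstTouch returns, per metric, the earliest feedback whose Timestamp
// lies in the half-open interval [decision, decision+window).
func firstTouch(decision time.Time, window time.Duration, history []feedback) map[string]feedback {
	end := decision.Add(window)
	out := make(map[string]feedback)
	for _, f := range history {
		if f.Timestamp.Before(decision) || !f.Timestamp.Before(end) {
			continue // outside the pairing window
		}
		if cur, ok := out[f.Metric]; !ok || f.Timestamp.Before(cur.Timestamp) {
			out[f.Metric] = f
		}
	}
	return out
}

// demoPairing builds a small fixture and returns metric -> paired value.
func demoPairing() map[string]float64 {
	t0 := time.Date(2026, 4, 18, 12, 0, 0, 0, time.UTC)
	history := []feedback{
		{t0.Add(-time.Hour), "on_time", 0.1},     // before the decision: ignored
		{t0.Add(2 * time.Hour), "on_time", 0.7},  // in window, but not the earliest
		{t0.Add(time.Hour), "on_time", 0.9},      // earliest in window: wins
		{t0.Add(24 * time.Hour), "on_time", 0.3}, // exactly at the open end: ignored
		{t0.Add(30 * time.Hour), "cost", 5.0},    // after the window: ignored
	}
	paired := make(map[string]float64)
	for metric, f := range firstTouch(t0, 24*time.Hour, history) {
		paired[metric] = f.Value
	}
	return paired
}

func main() {
	fmt.Println(demoPairing()) // map[on_time:0.9]
}
```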
CLEVER: feedback lookup is lazy per-entity. The first time an entity is seen in the stream we call FeedbackChannel.Query once and cache the result for the rest of the replay. A 1000-entity replay makes 1000 queries total, not 1000 per decision. Cache lives for the duration of one Replay call.
LEGACY: aggregation is fixed at "first-touch per metric". Mean / last / median are deliberately not surfaced -- consumers that need them can compute from raw feedback inside their Reflector. Adding aggregation knobs here would pollute the interface without clear v0.3 demand.
Reflector dispatch:
- Serial in registration order. Reflectors may hold in-process state; parallel dispatch would force every implementor to be thread-safe.
- OnEvent errors are logged but not returned. The contract says the replayer does not stop on reflector errors.
- The reflector slice is snapshotted at the start of each Replay, so a RegisterReflector call mid-replay does not affect the current run.
func NewDefaultLogReplayer ¶
func NewDefaultLogReplayer(source LogSource, feedback FeedbackChannel, opts ...ReplayerOption) *DefaultLogReplayer
NewDefaultLogReplayer constructs a replayer composing source and feedback. Both are required; passing nil is a programming error and panics.
func (*DefaultLogReplayer) RegisterReflector ¶
func (d *DefaultLogReplayer) RegisterReflector(r Reflector)
RegisterReflector adds r to the reflector list. Safe to call concurrently. Duplicate registrations are allowed; the reflector will receive each event as many times as it was registered (the caller's responsibility).
func (*DefaultLogReplayer) Replay ¶
func (d *DefaultLogReplayer) Replay(ctx context.Context, from, to time.Time, filter FilterFunc) error
Replay implements LogReplayer. Scans LogSource for [from, to], pairs each entry with feedback, and dispatches to every registered Reflector.
type DefaultParameterEvolver ¶
type DefaultParameterEvolver struct {
// contains filtered or unexported fields
}
DefaultParameterEvolver is the reference ParameterEvolver.
Propose is a thin delegate to ProposerFunc. Apply is the gatekeeper: approved=true routes the value into ParameterStore.Set; approved=false is a no-op on the store (rejected proposals must not pollute the version chain) but both branches emit an audit log line so an operator can see every decision that was offered, regardless of outcome.
Relation to two-phase design: The interface split Propose/Apply exists to insert gates between them (human approval, ShadowRunner compare, batch approval). This impl keeps that contract intact -- there is no side channel from Propose to Apply. The caller is expected to call Apply once they have decided.
func NewDefaultParameterEvolver ¶
func NewDefaultParameterEvolver(store ParameterStore, proposer ProposerFunc, opts ...EvolverOption) (*DefaultParameterEvolver, error)
NewDefaultParameterEvolver constructs the evolver. Both store and proposer are required; either nil is a programming error and rejected at construction rather than deferred to first call.
func (*DefaultParameterEvolver) Apply ¶
func (e *DefaultParameterEvolver) Apply(ctx context.Context, key string, value any, approved bool, reason string) error
Apply implements ParameterEvolver.
Semantics:
- approved=true: delegates to ParameterStore.Set. Empty reason surfaces the store's ErrReasonRequired verbatim (we do not fabricate a reason on behalf of the caller -- audit trails must reflect human intent).
- approved=false: no-op on the store. An operator rejected the proposal; the version chain stays clean.
Both branches emit an audit log line so the decision itself is observable even when the store is not touched.
func (*DefaultParameterEvolver) Propose ¶
func (e *DefaultParameterEvolver) Propose(ctx context.Context, key string, evidence []Feedback) (any, float64, error)
Propose implements ParameterEvolver. Delegates to the injected proposer. Does not read or write ParameterStore: the interface contract says Propose is read-only relative to storage.
type DefaultShadowRunner ¶
type DefaultShadowRunner struct {
// contains filtered or unexported fields
}
DefaultShadowRunner is the reference ShadowRunner. It runs each sampled test Candidate twice through the Evaluator -- once with the baseline parameter value, once with the proposed candidate value -- and reports per-cohort average fitness, per-dimension mean breakdown, and a divergence score in [0, 1].
What this does NOT do (by design):
- It does not touch ParameterStore.Set. Shadow comparison is read-only relative to the production parameter chain. The caller decides, based on the returned ShadowResult, whether to call ParameterEvolver.Apply.
- It does not decide how to sample. The caller supplies CandidateSampler.
- It does not parallelise scoring. Evaluator implementations (notably FuncEvaluator wrapping caller code) are not guaranteed goroutine-safe. A caller that needs throughput runs multiple RunShadow calls in parallel from the outside.
Divergence formula:
div = min(|avg_baseline - avg_candidate| / max(|avg_baseline|, |avg_candidate|, 1e-9), 1.0)
CLEVER: divergence is relative, not absolute, so large-scale fitness values (1000 vs 1001, divergence about 0.001) are comparable with small-scale ones (0.1 vs 0.5, divergence 0.8). An absolute difference would rank the first pair (1.0) above the second (0.4), which is misleading across domains.
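The divergence formula translates directly into Go. The function name divergence is an illustrative choice; only the arithmetic comes from the formula above.

```go
package main

import (
	"fmt"
	"math"
)

// divergence implements
//   min(|b - c| / max(|b|, |c|, 1e-9), 1.0)
// where b and c are the baseline and candidate average fitness.
func divergence(avgBaseline, avgCandidate float64) float64 {
	denom := math.Max(math.Max(math.Abs(avgBaseline), math.Abs(avgCandidate)), 1e-9)
	return math.Min(math.Abs(avgBaseline-avgCandidate)/denom, 1.0)
}

func main() {
	fmt.Printf("%.4f\n", divergence(1000, 1001)) // 0.0010
	fmt.Printf("%.4f\n", divergence(0.1, 0.5))   // 0.8000
	fmt.Printf("%.4f\n", divergence(0, 0))       // 0.0000 (the 1e-9 floor avoids 0/0)
	fmt.Printf("%.4f\n", divergence(-1, 1))      // 1.0000 (raw ratio 2 is capped)
}
```

The difference is measured against the larger magnitude of the two averages, which is why 0.1 vs 0.5 yields 0.8, and the cap keeps sign flips inside [0, 1].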
func NewDefaultShadowRunner ¶
func NewDefaultShadowRunner(store ParameterStore, evaluator Evaluator, sampler CandidateSampler, applier ParamApplier) (*DefaultShadowRunner, error)
NewDefaultShadowRunner constructs the runner. All four dependencies are required; any nil is rejected at construction so the first RunShadow call does not explode with a nil-deref.
func (*DefaultShadowRunner) RunShadow ¶
func (r *DefaultShadowRunner) RunShadow(ctx context.Context, baselineKey string, candidateValue any, traffic float64) (ShadowResult, error)
RunShadow implements ShadowRunner.
type Evaluator ¶
type Evaluator interface {
Score(ctx context.Context, c Candidate) (fitness float64, breakdown map[string]float64, err error)
}
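Evaluator scores a Candidate and returns the overall fitness plus a per-dimension breakdown.
breakdown serves three purposes:
- Debugging: which dimension dragged the total down
- Long-horizon reflection: detecting dimensions whose predictions are inaccurate (future input to MetaEvaluator)
- Transparency panel: showing customers the basis of a score
breakdown may be nil (for single-dimension scoring).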
type EvolutionProposal ¶
type EvolutionProposal struct {
ID string `json:"id"`
Type EvolutionType `json:"type"`
Title string `json:"title"`
Description string `json:"description"`
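Rationale string `json:"rationale"` // why this evolution is needed
Content any `json:"content"` // payload (tool definition / skill / config change)
CreatedAt time.Time `json:"created_at"`
Status ProposalStatus `json:"status"`
}
EvolutionProposal is an evolution proposal. When the Agent wants to evolve, it first generates a proposal and acts only after approval.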
type EvolutionStore ¶
type EvolutionStore struct {
// contains filtered or unexported fields
}
EvolutionStore is the persistent store for evolution artifacts.
func NewEvolutionStore ¶
func NewEvolutionStore(dir string) (*EvolutionStore, error)
NewEvolutionStore creates the store.
func (*EvolutionStore) ListProposals ¶
func (s *EvolutionStore) ListProposals() ([]*EvolutionProposal, error)
ListProposals lists all proposals.
func (*EvolutionStore) SaveProposal ¶
func (s *EvolutionStore) SaveProposal(p *EvolutionProposal) error
SaveProposal saves a proposal to disk.
type EvolutionType ¶
type EvolutionType string
EvolutionType enumerates the kinds of evolution.
const (
EvolveNewTool EvolutionType = "new_tool" // create a new tool
EvolveNewSkill EvolutionType = "new_skill" // learn a new skill
EvolveOptimize EvolutionType = "optimize" // optimise an existing workflow
EvolveSelfAdjust EvolutionType = "self_adjust" // adaptive self-adjustment
)
type Evolver ¶
type Evolver struct {
// contains filtered or unexported fields
}
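Evolver is the main controller of the self-evolution system. It coordinates three subsystems: ToolBuilder, SkillLearner, and SelfReflector.
func (*Evolver) Propose ¶
func (e *Evolver) Propose(ctx context.Context, proposal *EvolutionProposal) error
Propose submits an evolution proposal. The Agent calls it to express the intent "I want to evolve"; the approval flow decides whether the proposal is executed.
func (*Evolver) SystemPromptFragment ¶
func (e *Evolver) SystemPromptFragment() string
SystemPromptFragment returns the description of evolution capabilities that is injected into the system prompt. It tells the Agent that it can create tools, learn skills, and reflect on itself.
func (*Evolver) ToolBuilder ¶
func (e *Evolver) ToolBuilder() *ToolBuilder
ToolBuilder returns the tool builder (for Engine integration).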
type EvolverOption ¶
type EvolverOption func(*DefaultParameterEvolver)
EvolverOption configures a DefaultParameterEvolver.
func WithAuditLogger ¶
func WithAuditLogger(fn func(format string, args ...any)) EvolverOption
WithAuditLogger routes every Apply call (approved AND rejected) through fn so consumers can wire an AuditSink (DB / Loki / SIEM). Default: log.Printf. Passing nil is a no-op: silent audit would be a dangerous footgun for a governance-critical operation, so the default stays in place.
Named WithAuditLogger rather than WithLogger because WithLogger is already taken by DefaultLogReplayer in the same package. "Audit" is also the more accurate semantic for this sink: Apply decisions must be auditable.
type FeatureFunc ¶
FeatureFunc maps a Candidate to a single scalar feature value. Pure; any error should be treated as a programmer bug (the caller picked a feature that cannot be evaluated on this candidate shape). If your feature genuinely can fail at runtime, route through FuncEvaluator instead.
type Feedback ¶
type Feedback struct {
Timestamp time.Time
Entity string
Metric string
Value float64
Confidence float64
Meta map[string]any
}
Feedback is one business-KPI feedback record.
type FeedbackChannel ¶
type FeedbackChannel interface {
Report(ctx context.Context, entity string, metric string, value float64, confidence float64, meta map[string]any) error
Query(ctx context.Context, entity string, since time.Time, metric string) ([]Feedback, error)
}
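FeedbackChannel is the bidirectional channel for business-KPI feedback.
It merges the former FitnessSignal and FeedbackSource: in the early design FitnessSignal handled push and FeedbackSource handled pull. They are two access patterns over the same data, so merging them reduces conceptual load. An implementation may support either or both.
Domain agnostic: Metric is a string (e.g. "carrier_on_time_rate") and Value is a float64 scalar. Non-scalar KPIs travel through Meta. The engine assumes no business-KPI structure.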
type FileFeedbackChannel ¶
type FileFeedbackChannel struct {
// contains filtered or unexported fields
}
FileFeedbackChannel is the local-filesystem implementation of FeedbackChannel.
On-disk layout:
<root>/<entity-escaped>/2026-04-18.jsonl
<root>/<entity-escaped>/2026-04-19.jsonl
Each line is one Feedback record serialised as JSON. Two-level sharding (entity first, then UTC day) matches the Query access pattern: Query is always scoped to a single entity and a time window, so we never open files belonging to other entities.
Entity is escaped with url.PathEscape (same convention as FileParameterStore keys), so entity values containing slashes / dots / whitespace / CJK characters map to a single safe directory name.
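The two-level layout can be sketched as a path-building helper. shardPath and demoShard are hypothetical names; only url.PathEscape and the UTC-day file naming come from the description above, and the real implementation may build paths differently.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"time"
)

// shardPath returns <root>/<entity-escaped>/<UTC day>.jsonl for one record.
func shardPath(root, entity string, ts time.Time) string {
	day := ts.UTC().Format("2006-01-02") // shard by UTC calendar day
	return filepath.Join(root, url.PathEscape(entity), day+".jsonl")
}

// demoShard returns the directory and file components for a sample record
// whose local time (UTC+8) falls on the next calendar day.
func demoShard() (dir, file string) {
	local := time.Date(2026, 4, 19, 2, 0, 0, 0, time.FixedZone("UTC+8", 8*3600))
	p := shardPath("data", "carrier/a b", local)
	return filepath.Base(filepath.Dir(p)), filepath.Base(p)
}

func main() {
	dir, file := demoShard()
	fmt.Println(dir)  // carrier%2Fa%20b
	fmt.Println(file) // 2026-04-18.jsonl
}
```

The slash inside the entity is percent-encoded, so the entity occupies exactly one directory level, and the file name follows the UTC date rather than the caller's local date.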
Concurrency: see the type-level comment on FileLogSource; the same POSIX O_APPEND atomicity argument applies here. Writers do not take a lock; readers never observe a partial line.
LEGACY: Query does an in-process linear scan over the matching files. The platform-layer SQL FeedbackChannel uses indexed time + metric queries; for v0.3 MVP volumes (fewer than roughly 10k feedback events per client per day) the scan is fast enough and avoids an external dependency.
func NewFileFeedbackChannel ¶
func NewFileFeedbackChannel(root string) (*FileFeedbackChannel, error)
NewFileFeedbackChannel returns a feedback channel rooted at root. Creates the directory if missing.
func (*FileFeedbackChannel) Query ¶
func (c *FileFeedbackChannel) Query(ctx context.Context, entity string, since time.Time, metric string) ([]Feedback, error)
Query implements FeedbackChannel. Returns all records for the entity whose Timestamp >= since. Empty metric matches every metric; a non-empty metric is matched exactly. Results are sorted by Timestamp ascending.
An unknown entity (directory does not exist) is not an error: new entities legitimately have no history. Empty string entity is a caller error and returns ErrEntityRequired.
func (*FileFeedbackChannel) Report ¶
func (c *FileFeedbackChannel) Report(ctx context.Context, entity, metric string, value, confidence float64, meta map[string]any) error
Report implements FeedbackChannel. Appends one Feedback record to the current UTC day file under the entity's directory. Timestamp is stamped with time.Now().UTC() so the chosen file and the stored timestamp agree; callers cannot override this (if backdating is needed it belongs in Meta, not the canonical Timestamp).
type FileLogSource ¶
type FileLogSource struct {
// contains filtered or unexported fields
}
FileLogSource is the local-filesystem implementation of LogSource.
On-disk layout:
<root>/2026-04-18.jsonl   # one file per UTC calendar day
<root>/2026-04-19.jsonl
Each line is one LogEntry serialised as JSON. Date-sharded so that Read(from, to) only opens files covering the requested time window rather than scanning the whole corpus.
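CLEVER: files are opened with O_APPEND, so the kernel atomically positions every write at the current end of file and concurrent writers do not need a mutex. The PIPE_BUF bound (4 KiB on Linux) is strictly a POSIX guarantee for pipes and FIFOs, not regular files; it serves here as a conservative size budget. A single LogEntry line is far below that in the expected shape, and on local filesystems such small appends are in practice written without interleaving, so readers are not expected to observe a half-written line. This is the append-only counterpart to FileParameterStore's rename-based atomicity.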
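A minimal sketch of the append path, assuming one JSON record per Write call. `logLine` and `appendJSONLine` are hypothetical names used for illustration; the real LogEntry fields and file handling may differ.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// logLine is a hypothetical stand-in for LogEntry; the fields are illustrative.
type logLine struct {
	DecisionID string `json:"decision_id"`
	Entity     string `json:"entity"`
}

// appendJSONLine marshals v and appends it as one line using a single Write
// call on a file opened with O_APPEND.
func appendJSONLine(path string, v any) error {
	line, err := json.Marshal(v)
	if err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(append(line, '\n')); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demoAppend appends three records to a temporary day file and returns the
// number of lines read back.
func demoAppend() (int, error) {
	dir, err := os.MkdirTemp("", "logsource-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "2026-04-18.jsonl")
	for i := 0; i < 3; i++ {
		rec := logLine{DecisionID: fmt.Sprintf("d-%d", i), Entity: "acme"}
		if err := appendJSONLine(path, rec); err != nil {
			return 0, err
		}
	}
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	n := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

func main() {
	n, err := demoAppend()
	fmt.Println(n, err)
}
```

Building the full line in memory before the single Write is the point of the sketch: splitting a record across several writes would give up the no-interleaving property the design relies on.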
LEGACY: the reader does an in-process linear scan. v0.3 MVP log volume is expected to stay well under 1M lines/day per client, so a full scan is acceptable. An index-backed implementation belongs in the platform-layer SQL LogSource; adding an index here would drag in a CGO/boltdb dependency that CLAUDE.md rule 8 (zero external deps) forbids.
Append is a method on *FileLogSource, not a member of the LogSource interface. Different backends (SQL INSERT, object-storage upload) have wildly different write semantics; forcing Append into the interface would produce awkward wrappers. Callers who need to write hold a *FileLogSource directly.
func NewFileLogSource ¶
func NewFileLogSource(root string) (*FileLogSource, error)
NewFileLogSource returns a log source rooted at root. Creates the directory if missing.
func (*FileLogSource) Append ¶
func (s *FileLogSource) Append(ctx context.Context, entry LogEntry) error
Append writes one LogEntry to the current UTC day file as a JSON line. If entry.Timestamp is the zero value it is set to time.Now().UTC(); any non-zero value is coerced to UTC so the day-file selection and the stored timestamp agree.
Not part of the LogSource interface. See the type-level comment for the rationale.
func (*FileLogSource) Days ¶
func (s *FileLogSource) Days() ([]time.Time, error)
Days returns all UTC calendar days for which at least one entry has been appended, sorted ascending. Helper for tooling (e.g. tail viewers, housekeeping). Not part of the LogSource interface.
func (*FileLogSource) Read ¶
Read implements LogSource. The returned channel has buffer 64. A background goroutine walks each UTC day file in [from, to], decodes each JSON line, filters by Timestamp, and sends matching entries downstream. The channel is closed when the window is exhausted or ctx is canceled.
Missing day files (no activity that day) are skipped silently. A malformed line aborts the stream and closes the channel early; the caller sees the channel close as if the window were exhausted and can diagnose via the file's contents.
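The streaming shape described for Read (buffered channel, background goroutine, close on exhaustion or cancellation) can be sketched as follows. The in-memory slice stands in for the day files, and `logEntry` and `stream` are hypothetical names; the inclusive [from, to] filter follows the wording above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// logEntry is a hypothetical stand-in for LogEntry.
type logEntry struct {
	Timestamp  time.Time
	DecisionID string
}

// stream sends every entry whose Timestamp lies in [from, to] on a channel
// with buffer 64 and closes the channel when done or when ctx is canceled.
func stream(ctx context.Context, entries []logEntry, from, to time.Time) <-chan logEntry {
	out := make(chan logEntry, 64)
	go func() {
		defer close(out)
		for _, e := range entries {
			if e.Timestamp.Before(from) || e.Timestamp.After(to) {
				continue
			}
			select {
			case out <- e:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// countInWindow drains the stream for a one-day window over four entries,
// two of which fall outside the window.
func countInWindow() int {
	base := time.Date(2026, 4, 18, 0, 0, 0, 0, time.UTC)
	entries := []logEntry{
		{base.Add(-time.Hour), "too-early"},
		{base.Add(time.Hour), "a"},
		{base.Add(2 * time.Hour), "b"},
		{base.Add(48 * time.Hour), "too-late"},
	}
	n := 0
	for range stream(context.Background(), entries, base, base.Add(24*time.Hour)) {
		n++
	}
	return n
}

func main() {
	fmt.Println(countInWindow())
}
```

The select on ctx.Done() matters once the buffer is full: without it, a consumer that stops reading would leave the goroutine blocked forever.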
type FileParameterStore ¶
type FileParameterStore struct {
// contains filtered or unexported fields
}
FileParameterStore is the local-filesystem implementation of ParameterStore.
On-disk layout:
<root>/<key-escaped>/
  current.json       # latest value + version number
  history/v1.json    # one file per version (Change struct)
  history/v2.json
  lock.json          # presence = locked, content has reason + timestamp
Key escaping: url.PathEscape, so dotted/slashed keys (e.g. "evolve.carrier_risk_penalty/Y-express") map to a single safe dir name.
Concurrency model:
- Single process: sync.RWMutex serialises writers, in-memory subscriber list protected by the same lock.
- File atomicity: temp file + os.Rename (POSIX atomic rename on same FS).
- Cross-process: not guaranteed. The platform-layer SQL ParameterStore handles multi-tenant + multi-process concurrency.
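The "temp file + os.Rename" step can be sketched as below. `atomicWriteFile` is a hypothetical helper, not the store's actual code; the temp file is created in the target directory so the rename stays on one filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWriteFile writes data to a temp file next to path and renames it
// over path, so readers see either the old or the new content in full.
func atomicWriteFile(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName)
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName)
		return err
	}
	return nil
}

// demoAtomicWrite overwrites current.json twice and returns the final content.
func demoAtomicWrite() (string, error) {
	dir, err := os.MkdirTemp("", "paramstore-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "current.json")
	for _, body := range []string{"{\"version\":1}", "{\"version\":2}"} {
		if err := atomicWriteFile(path, []byte(body)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	s, err := demoAtomicWrite()
	fmt.Println(s, err)
}
```

The sketch omits fsync; a store that must survive power loss would also sync the temp file and the directory, which is outside what the text above claims.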
LEGACY: Watch is in-process only. External edits to files (or another process holding the same root) are not detected. fsnotify is intentionally excluded (CLAUDE.md rule 8: zero external deps). Multi-process change notification is the SQL impl's job (LISTEN/NOTIFY).
func NewFileParameterStore ¶
func NewFileParameterStore(root string) (*FileParameterStore, error)
NewFileParameterStore returns a store rooted at root. Creates the root directory if missing.
func (*FileParameterStore) History ¶
History implements ParameterStore. Returns versions in ascending order. limit <= 0 means no cap; otherwise the most-recent `limit` versions are returned.
func (*FileParameterStore) Lock ¶
Lock implements ParameterStore. Requires the key to exist (returns ErrParameterNotFound otherwise). Re-locking an already-locked key overwrites the lock reason and emits a fresh lock event.
func (*FileParameterStore) Rollback ¶
func (s *FileParameterStore) Rollback(ctx context.Context, key string, toVersion int, reason string) (int, error)
Rollback implements ParameterStore. Allowed even when the key is locked (per interface contract: locked params can still be rolled back to escape a bad release without first unlocking). Rollback creates a new version that copies the target's value, preserving the audit chain.
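The copy-forward rollback and its interaction with locking can be illustrated with a tiny in-memory model. `memStore` is hypothetical and single-key; it leaves out persistence, events, and the reason check, and only shows that a rollback appends a new version and is not blocked by a lock.

```go
package main

import (
	"errors"
	"fmt"
)

var errLocked = errors.New("parameter locked")

// change is a hypothetical, simplified version record.
type change struct {
	Version int
	Value   float64
	Reason  string
}

// memStore models the history of a single key.
type memStore struct {
	history []change
	locked  bool
}

func (s *memStore) appendVersion(value float64, reason string) int {
	v := len(s.history) + 1
	s.history = append(s.history, change{Version: v, Value: value, Reason: reason})
	return v
}

// Set is refused while the key is locked.
func (s *memStore) Set(value float64, reason string) (int, error) {
	if s.locked {
		return 0, errLocked
	}
	return s.appendVersion(value, reason), nil
}

// Rollback copies the target version's value into a new version, even when
// the key is locked; nothing is deleted from the history.
func (s *memStore) Rollback(toVersion int, reason string) (int, error) {
	if toVersion < 1 || toVersion > len(s.history) {
		return 0, fmt.Errorf("version %d not found", toVersion)
	}
	return s.appendVersion(s.history[toVersion-1].Value, reason), nil
}

// demoRollback sets two versions, locks, shows Set is blocked, then rolls back.
func demoRollback() (newVersion int, value float64, setBlocked bool) {
	s := &memStore{}
	s.Set(0.10, "initial")
	s.Set(0.35, "bad release")
	s.locked = true
	_, err := s.Set(0.50, "blocked while locked")
	setBlocked = errors.Is(err, errLocked)
	newVersion, _ = s.Rollback(1, "escape bad release")
	value = s.history[len(s.history)-1].Value
	return newVersion, value, setBlocked
}

func main() {
	fmt.Println(demoRollback())
}
```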
func (*FileParameterStore) Set ¶
func (s *FileParameterStore) Set(ctx context.Context, key string, value any, reason string) (int, error)
Set implements ParameterStore. Returns ErrParameterLocked if the key is currently locked, ErrReasonRequired on empty reason.
func (*FileParameterStore) Unlock ¶
Unlock implements ParameterStore. No-op if the key is not currently locked (idempotent), in which case no event is emitted.
func (*FileParameterStore) Watch ¶
func (s *FileParameterStore) Watch(ctx context.Context, keyPrefix string) (<-chan ChangeEvent, error)
Watch implements ParameterStore. Returns a buffered channel receiving events whose Key starts with keyPrefix (empty prefix matches all). Channel closes when ctx is canceled.
CLEVER: events are dropped on a full subscriber buffer. A slow consumer must not block the writer path; production watchers should drain promptly or accept lossy semantics. Buffer size 16 absorbs short bursts.
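The lossy delivery can be sketched with a non-blocking send. `publish` is a hypothetical helper; the buffer size of 16 is taken from the text above.

```go
package main

import "fmt"

// publish attempts a non-blocking send and reports whether the event was
// delivered. A full buffer drops the event instead of blocking the writer.
func publish(sub chan<- string, ev string) bool {
	select {
	case sub <- ev:
		return true
	default:
		return false
	}
}

// demoLossy sends 20 events to a subscriber that never drains its
// 16-slot buffer.
func demoLossy() (delivered, dropped int) {
	sub := make(chan string, 16)
	for i := 0; i < 20; i++ {
		if publish(sub, fmt.Sprintf("evt-%d", i)) {
			delivered++
		} else {
			dropped++
		}
	}
	return delivered, dropped
}

func main() {
	fmt.Println(demoLossy())
}
```

A watcher that cannot tolerate loss would typically re-read the current value with Get after a burst rather than rely on seeing every event.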
type FilterFunc ¶
FilterFunc decides whether a LogEntry enters the Replay stream. true means the entry passes.
type FlytoLLMClient ¶
type FlytoLLMClient struct {
// contains filtered or unexported fields
}
FlytoLLMClient wraps a flyto.ModelProvider as an evolve.LLMClient.
Direction of dependency: evolve consumes flyto.ModelProvider; providers/ subpackages define concrete providers. The direction is evolve -> flyto, so keeping the adapter in the evolve package is the natural home. Placing it under providers/ would require providers to know about evolve, inverting the boundary.
Sampling-knob translation (Temperature / TopP): flyto.Request is the cross-provider greatest common denominator. As of L683 (commit 1/3), it carries Temperature *float64 and TopP *float64 with passthrough semantics across all 7 providers. The adapter therefore translates LLMCallOpts.{Temperature, TopP} (float64, zero-as-unset) into flyto.Request.{Temperature, TopP} (*float64, nil-as-unset): non-zero values become flyto.Float(v), zero leaves the field nil so the upstream provider applies its own default. Anthropic + extended thinking enforces temperature == 1.0 server-side; the anthropic provider pre-handles that override and emits a parameter_overridden WarningEvent through the stream, which the adapter ignores by design (only TextEvent / TextDeltaEvent drive candidate text -- WarningEvent surfaces to engine observers).
Event filtering (candidate generation only needs text):
- Aggregated: TextEvent (authoritative), TextDeltaEvent (fallback)
- Error routed: ErrorEvent -> wraps ErrLLMFailed
- Ignored: ToolUse*, Thinking*, Usage, Done, Turn*, SessionInfo, Permission*, Warning, Compact, Checkpoint*, and any other non-text event
Aggregation precedence (CLEVER): anthropic provider (and others that honor the full event contract) emit both TextDeltaEvent (streaming increments) and TextEvent (the complete text block on block_stop). A naive "sum both" strategy double-counts. We take TextEvent as authoritative: on each TextEvent arrival we append the final text and reset the delta accumulator. If the stream closes without any TextEvent (providers that skip the complete-block emission), we fall back to the delta accumulator so no text is lost.
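The precedence rule can be sketched over a simplified event type. `event` and its string Kind are hypothetical; the real adapter switches on flyto event types, which are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a hypothetical, simplified stream event.
// Kind is "delta", "text", or anything else (ignored).
type event struct {
	Kind string
	Text string
}

// aggregate treats complete-text events as authoritative and falls back to
// the accumulated deltas only when no complete-text event was seen.
func aggregate(events []event) string {
	var final, deltas strings.Builder
	sawText := false
	for _, ev := range events {
		switch ev.Kind {
		case "delta":
			deltas.WriteString(ev.Text)
		case "text":
			final.WriteString(ev.Text)
			deltas.Reset() // these deltas are already covered by the block
			sawText = true
		}
	}
	if sawText {
		return final.String()
	}
	return deltas.String()
}

func main() {
	full := []event{{"delta", "He"}, {"delta", "llo"}, {"text", "Hello"}, {"usage", ""}}
	deltaOnly := []event{{"delta", "He"}, {"delta", "llo"}}
	fmt.Println(aggregate(full), aggregate(deltaOnly))
}
```

Summing both kinds in the first stream would yield "HelloHello", which is the double counting the rule avoids.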
func NewFlytoLLMClient ¶
func NewFlytoLLMClient(provider flyto.ModelProvider, defaultModel, systemPrompt string) (*FlytoLLMClient, error)
NewFlytoLLMClient builds a FlytoLLMClient. provider is required; it is the underlying flyto.ModelProvider (anthropic / openai / minimax / ...) the adapter will drive. defaultModel is used whenever LLMCallOpts.Model is empty. systemPrompt is injected verbatim into flyto.Request.System on every Complete call; pass "" to omit.
func (*FlytoLLMClient) Complete ¶
func (c *FlytoLLMClient) Complete(ctx context.Context, prompt string, opts LLMCallOpts) (string, error)
Complete implements LLMClient by invoking provider.Stream with a single user message and draining the event channel into a response string. See FlytoLLMClient godoc for event filtering and aggregation rules.
type FuncEvaluator ¶
type FuncEvaluator struct {
// contains filtered or unexported fields
}
FuncEvaluator wraps a user-supplied scoring function as an Evaluator.
Use this when your scorer is not well expressed as a weighted sum: e.g. a decision tree, a hard-constraint-first gate, a composed multi-stage score, or a function that genuinely needs to return an error (reading a lookup file, calling a side-channel service). Everything else should be a WeightedEvaluator -- it is more auditable and its coefficients can be evolved via ParameterStore.
func NewFuncEvaluator ¶
func NewFuncEvaluator(fn func(ctx context.Context, c Candidate) (float64, map[string]float64, error)) (*FuncEvaluator, error)
NewFuncEvaluator wraps fn. fn must be non-nil.
func (*FuncEvaluator) Score ¶
func (e *FuncEvaluator) Score(ctx context.Context, c Candidate) (float64, map[string]float64, error)
Score implements Evaluator by delegating to the wrapped function. ctx cancellation is checked before delegation; the wrapped fn is responsible for checking ctx itself on longer operations.
type FuncReflector ¶
type FuncReflector struct {
// contains filtered or unexported fields
}
FuncReflector wraps a plain function as a Reflector. Use this when the reflector needs bespoke behaviour (direct Apply via ParameterEvolver, webhook push, custom aggregation) that does not fit AggregatorReflector's descriptive-stats shape.
func NewFuncReflector ¶
func NewFuncReflector(fn func(ctx context.Context, event ReplayEvent) error) (*FuncReflector, error)
NewFuncReflector wraps fn. fn must be non-nil.
func (*FuncReflector) OnEvent ¶
func (r *FuncReflector) OnEvent(ctx context.Context, event ReplayEvent) error
OnEvent implements Reflector. ctx cancellation short-circuits before the wrapped fn runs.
type GenOpt ¶
type GenOpt func(*genConfig)
GenOpt is a functional option for Generator.Generate. Expected options include temperature / model / role / constraints / history injection (see docs/evolve-strategy.md §6.1).
func WithTemperature ¶
WithTemperature sets LLM sampling temperature for this Generate call. Zero is treated as "unset" (use the LLMClient/provider default), so a caller wanting deterministic 0.0 sampling must instead bypass this option and configure provider Config.Temperature directly. See LLMCallOpts godoc for the rationale.
type Generator ¶
type Generator interface {
Generate(ctx context.Context, input any, K int, opts ...GenOpt) ([]Candidate, error)
}
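Generator produces K candidates. A single LLM call may yield several candidates (approach B, see docs/evolve-strategy.md §6.1), typically covering 4-6 LLM calls × an average of 3 candidates each.
Why K is a runtime parameter rather than a construction-time one: the same Generator needs a different K at different stages (K=10 while exploring, K=3 in steady state). Parameterizing Generate lets one Generator serve both the fast loop and the slow loop.
Why Candidate.Payload is any: Candidate structure varies enormously across scenarios (YAML / carrier ID / patch object), so the engine layer cannot prescribe a single shape. Rejected alternative: interface{ Key() string; Payload() []byte }, which forces callers into an extra wrapper and adds unnecessary serialization overhead.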
type GeneratorOption ¶
type GeneratorOption func(*LLMGenerator) error
GeneratorOption configures an LLMGenerator at construction.
func WithMaxTokens ¶
func WithMaxTokens(n int) GeneratorOption
WithMaxTokens caps the LLMClient output length per call. Zero = backend default.
func WithModel ¶
func WithModel(m string) GeneratorOption
WithModel pins a default model string attached to every Complete call and recorded on Candidate.Meta["model"].
func WithPromptTemplate ¶
func WithPromptTemplate(tmpl string) GeneratorOption
WithPromptTemplate overrides the default prompt template. The template is parsed with text/template and receives fields: K (int), Roles ([]string), RolesJoined (string), Input (string, already JSON-encoded when the original input was not a string).
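A sketch of how such a template might be rendered with text/template. `promptData` and `renderPrompt` are hypothetical, and joining roles with ", " is an assumption; only the field names K, Roles, RolesJoined and Input come from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// promptData mirrors the documented template fields.
type promptData struct {
	K           int
	Roles       []string
	RolesJoined string
	Input       string
}

// renderPrompt JSON-encodes non-string input, then executes the template.
func renderPrompt(tmplText string, k int, roles []string, input any) (string, error) {
	inputStr, ok := input.(string)
	if !ok {
		b, err := json.Marshal(input)
		if err != nil {
			return "", err
		}
		inputStr = string(b)
	}
	tmpl, err := template.New("prompt").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	data := promptData{
		K:           k,
		Roles:       roles,
		RolesJoined: strings.Join(roles, ", "), // separator is an assumption
		Input:       inputStr,
	}
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderPrompt(
		"Propose {{.K}} candidates as a JSON array. Roles: {{.RolesJoined}}. Input: {{.Input}}",
		3, []string{"planner", "critic"}, map[string]int{"n": 1})
	fmt.Println(out, err)
}
```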
type LLMCallOpts ¶
LLMCallOpts carries per-call tuning knobs. Fields with zero values are expected to fall back to implementation defaults (for example, the wrapper chooses the default model).
Temperature / TopP zero-semantics: zero means "unset" (let the wrapper choose). The wrapper translates a non-zero value to a non-nil pointer when populating flyto.Request, and leaves it nil otherwise so the upstream provider applies its own default. Callers who genuinely want deterministic 0.0 sampling cannot express it through these float64 fields -- they must configure it on the provider Config layer instead. This trade-off keeps the common path ergonomic (skip the option to use defaults) at the cost of one obscure case.
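The zero-as-unset translation amounts to a small helper. `request`, `optional` and `buildRequest` are hypothetical stand-ins for flyto.Request and the adapter's internals.

```go
package main

import "fmt"

// request is a hypothetical stand-in for the pointer-typed sampling fields.
type request struct {
	Temperature *float64
	TopP        *float64
}

// optional maps the float64 zero value to nil ("unset") and any other value
// to a pointer to a copy of it.
func optional(v float64) *float64 {
	if v == 0 {
		return nil
	}
	return &v
}

// buildRequest translates zero-as-unset knobs into nil-as-unset fields.
func buildRequest(temperature, topP float64) request {
	return request{Temperature: optional(temperature), TopP: optional(topP)}
}

func main() {
	r := buildRequest(0.7, 0)
	fmt.Println(*r.Temperature, r.TopP == nil)
}
```

The sketch also makes the documented limitation visible: `buildRequest(0, 0)` cannot express an explicit temperature of 0.0, because zero and "unset" share one encoding.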
type LLMClient ¶
type LLMClient interface {
Complete(ctx context.Context, prompt string, opts LLMCallOpts) (string, error)
}
LLMClient is the narrow LLM contract used by LLMGenerator.
Why not use flyto.Provider directly: flyto.Provider is the full Agent session abstraction (tool calls, streaming events, permissions). A generator only needs "prompt in, text out". Depending on Provider would drag the entire engine into evolve and break the "interface matrix consumable on its own" promise of the package.
Consumers wrap their preferred backend (flyto.Provider, openai SDK, local Ollama, etc) into something satisfying this interface. The wrapper lives in the consumer, not evolve.
type LLMGenerator ¶
type LLMGenerator struct {
// contains filtered or unexported fields
}
LLMGenerator is the LLM-backed Generator reference implementation.
Pipeline:
- Render promptTemplate with {K, Roles, RolesJoined, Input}.
- Call client.Complete(ctx, prompt, opts).
- Extract a JSON array from the response (strip markdown fences if any).
- Unmarshal to []rawCandidate, produce []Candidate with auto Meta.
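Steps 3 and 4 of the pipeline above can be sketched as follows. The approach shown (slice from the first "[" to the last "]") is one simple way to discard fences and surrounding chatter and is not necessarily what LLMGenerator does; the `rawCandidate` fields are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// fence is a markdown code fence (three backticks), spelled with escapes.
const fence = "\x60\x60\x60"

// rawCandidate is a hypothetical wire shape for one candidate.
type rawCandidate struct {
	Payload   json.RawMessage `json:"payload"`
	Rationale string          `json:"rationale"`
}

// extractJSONArray returns the substring from the first '[' to the last ']',
// which drops markdown fences and any prose around the array.
func extractJSONArray(resp string) (string, error) {
	start := strings.Index(resp, "[")
	end := strings.LastIndex(resp, "]")
	if start < 0 || end < start {
		return "", errors.New("no JSON array found in response")
	}
	return resp[start : end+1], nil
}

// parseCandidates extracts the array and unmarshals it.
func parseCandidates(resp string) ([]rawCandidate, error) {
	arr, err := extractJSONArray(resp)
	if err != nil {
		return nil, err
	}
	var out []rawCandidate
	if err := json.Unmarshal([]byte(arr), &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	resp := "Here you go:\n" + fence + "json\n" +
		"[{\"payload\":\"carrier-A\"},{\"payload\":\"carrier-B\"}]\n" + fence
	cs, err := parseCandidates(resp)
	fmt.Println(len(cs), err)
}
```

This heuristic misfires when the surrounding prose itself contains brackets; a stricter extractor would parse the fence explicitly.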
LEGACY: structured output (OpenAI response_format=json_schema, Anthropic tool-use forced JSON) would eliminate the markdown-fence unwrap step, but support is uneven across providers as of 2026-04. Stay on prompt + parse for MVP; upgrade path is a new Option that delegates extraction to the LLMClient when the backend supports it.
func NewLLMGenerator ¶
func NewLLMGenerator(client LLMClient, opts ...GeneratorOption) (*LLMGenerator, error)
NewLLMGenerator builds an LLMGenerator. client is required.
type LearnSkillTool ¶
type LearnSkillTool struct {
// contains filtered or unexported fields
}
LearnSkillTool is the tool for learning a new skill.
func NewLearnSkillTool ¶
func NewLearnSkillTool(evolver *Evolver) *LearnSkillTool
NewLearnSkillTool creates a LearnSkill tool.
func (*LearnSkillTool) Description ¶
func (t *LearnSkillTool) Description(ctx context.Context) string
func (*LearnSkillTool) Execute ¶
func (t *LearnSkillTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)
func (*LearnSkillTool) InputSchema ¶
func (t *LearnSkillTool) InputSchema() json.RawMessage
func (*LearnSkillTool) Name ¶
func (t *LearnSkillTool) Name() string
type LearningSource ¶
type LearningSource struct {
// SessionID is the ID of the originating session
SessionID string `json:"session_id"`
// TaskDescription is the original task description
TaskDescription string `json:"task_description"`
// TurnCount is the number of turns the task took
TurnCount int `json:"turn_count"`
// ToolsUsed lists the tools that were used
ToolsUsed []string `json:"tools_used"`
// Timestamp is when the skill was learned
Timestamp time.Time `json:"timestamp"`
}
LearningSource records where a skill was learned from.
type LogEntry ¶
type LogEntry struct {
Timestamp time.Time
DecisionID string
Entity string
Payload any
Meta map[string]any
}
LogEntry is the minimal contract for one decision-log record. Payload holds the concrete decision content, whose structure differs by scenario.
type LogReplayer ¶
type LogReplayer interface {
Replay(ctx context.Context, from, to time.Time, filter FilterFunc) error
RegisterReflector(r Reflector)
}
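LogReplayer scans historical decision logs, pairs them with real KPI feedback, and pushes the result to registered Reflectors.
Why Replayer rather than Scanner: Replayer stresses replaying the full context of the decision moment. It does not just read logs; it also pairs them with real outcomes from FeedbackChannel. A Scanner would read without pairing.
Scheduling boundary: scan frequency (daily batch / real time / event triggered) is controlled by an external scheduler and is outside the LogReplayer interface. LogReplayer only provides Replay; scheduling is left to the caller (cron / platform-layer scheduler / manual trigger).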
type LogSource ¶
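LogSource is the underlying data source for LogReplayer.
Why it is separate from LogReplayer: LogSource focuses on reading data, LogReplayer on replay logic (pairing / filtering / dispatch). One LogReplayer can sit on top of different LogSources (file / SQL / object storage).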
type Observation ¶
type Observation struct {
// Pattern describes the pattern
Pattern string `json:"pattern"`
// Frequency is how often it occurs (a count or a percentage description)
Frequency string `json:"frequency"`
// Impact is the impact assessment
Impact string `json:"impact"`
// Example is a concrete example
Example string `json:"example,omitempty"`
}
Observation is a pattern observed by the Agent.
type ParamApplier ¶
ParamApplier projects a parameter value onto a Candidate. The engine does not know how "a carrier_risk_penalty value" shows up inside a candidate decision; the caller encodes that projection. A minimal applier clones the candidate and stores the value in Meta, letting the Evaluator pick it up; a richer applier rewrites Payload.
Purity contract: the applier must return a NEW Candidate (or a value-copy) rather than mutate the input. DefaultShadowRunner calls applier twice per test candidate (once for baseline, once for candidate param); in-place mutation would produce a race.
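A minimal applier that honours the purity contract might look like this. The `candidate` type and the `applyParam` signature are hypothetical, since the ParamApplier declaration is not shown here; the point is only the copy-before-write step.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for Candidate.
type candidate struct {
	Payload any
	Meta    map[string]any
}

// applyParam returns a NEW candidate whose Meta carries the parameter value.
// The input's Meta map is copied, never mutated. Payload is copied shallowly,
// which is enough here because the sketch does not modify it.
func applyParam(c candidate, key string, value any) candidate {
	out := c
	out.Meta = make(map[string]any, len(c.Meta)+1)
	for k, v := range c.Meta {
		out.Meta[k] = v
	}
	out.Meta[key] = value
	return out
}

func main() {
	base := candidate{Payload: "carrier-A", Meta: map[string]any{"source": "replay"}}
	withBaseline := applyParam(base, "carrier_risk_penalty", 0.10)
	withCandidate := applyParam(base, "carrier_risk_penalty", 0.25)
	fmt.Println(len(base.Meta), withBaseline.Meta["carrier_risk_penalty"], withCandidate.Meta["carrier_risk_penalty"])
}
```

Because Go maps are reference types, copying the struct alone would still share the Meta map between the baseline and candidate variants; the explicit map copy is what makes the two calls independent.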
type ParameterEvolver ¶
type ParameterEvolver interface {
Propose(ctx context.Context, key string, evidence []Feedback) (proposedValue any, confidence float64, err error)
Apply(ctx context.Context, key string, value any, approved bool, reason string) error
}
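ParameterEvolver proposes new parameter values from evidence.
It absorbs the responsibilities of the former SelectionPressure: preference weights are just another kind of parameter, no different in essence from rule coefficients or thresholds, so they are merged into ParameterEvolver. One implementation can handle several parameter types.
Two-phase Propose / Apply: Propose only returns a suggested value plus a confidence and does not write to ParameterStore. The caller decides whether to Apply (possibly after passing ApprovalFunc). The two phases support:
- Human approval (Propose → operator review → Apply)
- Shadow mode (Propose → ShadowRunner comparison → Apply)
- Batch decisions (several Proposes approved together)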
Propose takes []Feedback as evidence: Feedback (the KPI channel record defined below) already carries every dimension Propose needs (Entity / Metric / Value / Confidence / Timestamp / Meta). Earlier revisions defined a field-identical, semantically-overlapping FeedbackRecord sibling -- "one meaning, two types" duplication. This round collapses them into Feedback.
type ParameterStore ¶
type ParameterStore interface {
Get(ctx context.Context, key string) (value any, version int, err error)
Set(ctx context.Context, key string, value any, reason string) (newVersion int, err error)
List(ctx context.Context, prefix string) (keys []string, err error)
History(ctx context.Context, key string, limit int) ([]Change, error)
Rollback(ctx context.Context, key string, toVersion int, reason string) (newVersion int, err error)
Lock(ctx context.Context, key string, reason string) error
Unlock(ctx context.Context, key string, reason string) error
Watch(ctx context.Context, keyPrefix string) (<-chan ChangeEvent, error)
}
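ParameterStore is the versioned store for evolve parameters.
Key semantics: a domain-agnostic string (e.g. "evolve.carrier_risk_penalty.Y-express"); the caller defines the naming scheme and the engine assumes no key structure. Rejected alternative: a structured ParamKey{Domain,Category,Entity}, which violates the domain-agnostic principle because the engine would constrain the scheme.
reason is mandatory (a hard compliance requirement): Set / Rollback / Lock / Unlock return ErrReasonRequired when reason is the empty string. Every change in the audit chain must carry a human-readable reason.
Rollback semantics: a rollback copies the old version's value into a new version number, keeping the audit chain complete without deleting intermediate history.
Why Lock is separate from a configuration switch: Lock is an atomic operation that emits an audit event (who locked, and why). A configuration switch combined with a Set interceptor has a race condition and leaves no audit trail.
Multi-implementation coexistence: the engine provides FileParameterStore (local filesystem, CLI/TUI). The platform layer implements its own SQL version (multi-tenant + distributed audit). The interface contract keeps the switch seamless.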
type ProposalStatus ¶
type ProposalStatus string
ProposalStatus is the status of a proposal.
const (
StatusPending ProposalStatus = "pending"
StatusApproved ProposalStatus = "approved"
StatusRejected ProposalStatus = "rejected"
StatusApplied ProposalStatus = "applied"
)
type ProposerFunc ¶
ProposerFunc maps KPI evidence to a proposed parameter value.
The caller supplies the mapping because evidence→value shape is highly domain specific: logistics might use a weighted moving average of on_time rate, finance might use an EMA of realised P&L, ad-tech might use a PID controller on conversion lift. Forcing a single statistical flavour on the engine layer would limit consumers; forcing a taxonomy (EMA / PID / mean) would bloat the surface area without covering all real cases.
Confidence is a 0..1 scalar the caller attaches to the proposal so the downstream gate (human approval / ShadowRunner / batch approver) can threshold-filter weak suggestions.
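As one illustration of a caller-supplied mapping, the sketch below builds an EMA-based proposer. The `feedback` type and the returned function's signature are hypothetical (the ProposerFunc declaration is not shown here), and using the mean evidence confidence as the proposal confidence is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// feedback is a hypothetical, trimmed-down stand-in for Feedback.
type feedback struct {
	Value      float64
	Confidence float64
}

// emaProposer returns a proposer that computes an exponential moving average
// over the evidence (oldest first) and reports the mean evidence confidence.
func emaProposer(alpha float64) func(evidence []feedback) (float64, float64, error) {
	return func(evidence []feedback) (float64, float64, error) {
		if len(evidence) == 0 {
			return 0, 0, errors.New("no evidence")
		}
		ema := evidence[0].Value
		confSum := evidence[0].Confidence
		for _, f := range evidence[1:] {
			ema = alpha*f.Value + (1-alpha)*ema
			confSum += f.Confidence
		}
		return ema, confSum / float64(len(evidence)), nil
	}
}

func main() {
	propose := emaProposer(0.5)
	// EMA steps: 0.8 -> 0.85 -> 0.925; mean confidence is 0.8.
	v, c, err := propose([]feedback{{0.8, 1}, {0.9, 1}, {1.0, 0.4}})
	fmt.Printf("%.3f %.2f %v\n", v, c, err)
}
```

A downstream gate could then drop proposals whose confidence falls below a threshold before they reach approval or shadow validation.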
type ReflectTool ¶
type ReflectTool struct {
// contains filtered or unexported fields
}
ReflectTool is the self-reflection tool.
func NewReflectTool ¶
func NewReflectTool(evolver *Evolver) *ReflectTool
NewReflectTool creates a Reflect tool.
func (*ReflectTool) Description ¶
func (t *ReflectTool) Description(ctx context.Context) string
func (*ReflectTool) Execute ¶
func (t *ReflectTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)
func (*ReflectTool) InputSchema ¶
func (t *ReflectTool) InputSchema() json.RawMessage
func (*ReflectTool) Name ¶
func (t *ReflectTool) Name() string
type Reflection ¶
type Reflection struct {
// ID is the unique identifier
ID string `json:"id"`
// SessionID is the originating session
SessionID string `json:"session_id"`
// Type is the kind of reflection
Type ReflectionType `json:"type"`
// Summary is the reflection summary
Summary string `json:"summary"`
// Observations are the patterns observed by the Agent
Observations []Observation `json:"observations"`
// Adjustments are the suggested adjustments
Adjustments []Adjustment `json:"adjustments"`
// Metrics holds the session metrics
Metrics *SessionMetrics `json:"metrics,omitempty"`
// CreatedAt is the creation time
CreatedAt time.Time `json:"created_at"`
}
Reflection is one reflection record.
type ReflectionType ¶
type ReflectionType string
ReflectionType is the kind of reflection.
const (
// ReflectPostSession is a reflection made after a session ends
ReflectPostSession ReflectionType = "post_session"
// ReflectOnError is a reflection made after an error
ReflectOnError ReflectionType = "on_error"
// ReflectPeriodic is a periodic reflection (every N sessions)
ReflectPeriodic ReflectionType = "periodic"
)
type Reflector ¶
type Reflector interface {
OnEvent(ctx context.Context, event ReplayEvent) error
}
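Reflector consumes the events pushed by LogReplayer and turns them into parameter-adjustment suggestions.
Why it is separate from ParameterEvolver: Reflector is event-driven (called back by LogReplayer), while ParameterEvolver is on-demand (the caller asks). A single Reflector implementation may call several ParameterEvolvers internally to update different parameters. Keeping the two apart avoids a circular dependency.
Error semantics: when OnEvent returns an error, LogReplayer does not stop; it records the error and moves on to the next event. Reflector implementations should not assume that event handling is ordered or atomic: several events for the same key may arrive concurrently.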
type ReplayEvent ¶
ReplayEvent is the event LogReplayer pushes to registered Reflectors. Feedback is nil when the decision has just landed and KPI has not arrived yet. Meta is a free-form extension slot for LogReplayer implementations to attach replay-time context (replay session ID, upstream source, causal cohort tag) that downstream Reflectors may consume -- external replayer impls fill it; the in-tree DefaultLogReplayer leaves it nil.
type ReplayerOption ¶
type ReplayerOption func(*DefaultLogReplayer)
ReplayerOption configures a DefaultLogReplayer at construction time.
func WithFeedbackWindow ¶
func WithFeedbackWindow(d time.Duration) ReplayerOption
WithFeedbackWindow sets the time window used for pairing. Default: 7 days. Panics if d <= 0; a zero/negative window would never match anything.
func WithLogger ¶
func WithLogger(fn func(format string, args ...any)) ReplayerOption
WithLogger routes reflector error messages to a custom sink. Default: log.Printf (standard library). Pass a no-op to silence errors.
type RuntimeTool ¶
type RuntimeTool struct {
// contains filtered or unexported fields
}
RuntimeTool turns a ToolDefinition into an executable implementation of the Tool interface. It is the bridge between ToolBuilder and Engine.
Executor dependency (M1 plan β, strict DI) ¶
RuntimeTool holds an execenv.Executor, used to launch evolve script/command subprocesses. In the local CLI scenario engine.Config.Executor supplies DefaultExecutor{}; in the cloud SaaS scenario the platform layer supplies sandbox.Backend. There is no fallback: nil panics (checked in NewRuntimeTool).
func NewRuntimeTool ¶
func NewRuntimeTool(def *ToolDefinition, cwd string, executor execenv.Executor) *RuntimeTool
NewRuntimeTool creates a runtime tool from a definition.
The executor argument is required; nil panics. Strict DI contract (M1 plan β).
func (*RuntimeTool) Description ¶
func (t *RuntimeTool) Description(ctx context.Context) string
Description returns the tool description.
func (*RuntimeTool) Execute ¶
func (t *RuntimeTool) Execute(ctx context.Context, input json.RawMessage, progress func(float64, string)) (*ToolResult, error)
Execute runs the tool.
func (*RuntimeTool) InputSchema ¶
func (t *RuntimeTool) InputSchema() json.RawMessage
InputSchema returns the input schema.
func (*RuntimeTool) Metadata ¶
func (t *RuntimeTool) Metadata() RuntimeToolMetadata
Metadata returns the tool metadata.
type RuntimeToolMetadata ¶
type RuntimeToolMetadata struct {
ConcurrencySafe bool
ReadOnly bool
IsEvolved bool // marks a tool that was evolved by the Agent
Version int
}
RuntimeToolMetadata is the metadata returned by RuntimeTool.Metadata(), intended for the Engine tool registry to read when classifying an Agent-authored tool (concurrent-safe scheduling, read-only gating, audit tagging of evolved tools, version tracking for iteration).
Status (2026-04-21): all 4 fields are dead in scan-baseline because the adapter that would bridge evolve.RuntimeTool into Engine's tool registry is NOT wired in core. See engine_integration.go header for the C-plan status note; short version is "subagent / skill / memory currently cover reusable-capability needs, C-plan revisits when an industry platform hits a real RPA-shape workload". Fields stay exported so the adapter work does not have to reinvent the metadata shape.
type SelfReflector ¶
type SelfReflector struct {
// contains filtered or unexported fields
}
SelfReflector handles self-reflection and self-adaptation.
func NewSelfReflector ¶
func NewSelfReflector(store *EvolutionStore) *SelfReflector
NewSelfReflector creates a self-reflector.
func (*SelfReflector) Apply ¶
func (sr *SelfReflector) Apply(ctx context.Context, proposal *EvolutionProposal) error
Apply executes a self-adaptation proposal.
func (*SelfReflector) FormatForSystemPrompt ¶
func (sr *SelfReflector) FormatForSystemPrompt() (string, error)
FormatForSystemPrompt formats reflection insights into a system-prompt fragment.
func (*SelfReflector) GetLessons ¶
func (sr *SelfReflector) GetLessons() ([]string, error)
GetLessons extracts every adjustment suggestion of the "lesson" type. They are injected into the system prompt so the Agent avoids repeating mistakes.
func (*SelfReflector) LoadRecentReflections ¶
func (sr *SelfReflector) LoadRecentReflections(limit int) ([]*Reflection, []error)
LoadRecentReflections loads the most recent reflection records.
ELEVATED: returns the pair ([]*Reflection, []error). This matches the SkillLearner.LoadAll contract: the success list and the failure list come back side by side, so the caller gets every usable reflection and also learns which files are corrupt, instead of having that masked by a single aggregated error.
type SessionMetrics ¶
type SessionMetrics struct {
TurnCount int `json:"turn_count"`
TotalInputTokens int `json:"total_input_tokens"`
TotalOutputTokens int `json:"total_output_tokens"`
TotalCostUSD float64 `json:"total_cost_usd"`
ToolUseCounts map[string]int `json:"tool_use_counts"`
ErrorCount int `json:"error_count"`
Duration time.Duration `json:"duration"`
TaskCompleted bool `json:"task_completed"`
}
SessionMetrics holds per-session metrics.
type ShadowResult ¶
type ShadowResult struct {
BaselineFitness float64
CandidateFitness float64
BaselineBreakdown map[string]float64
CandidateBreakdown map[string]float64
SampleSize int
Divergence float64
Meta map[string]any
}
ShadowResult is the per-run comparison produced by ShadowRunner. Divergence lies in [0, 1] (0 = baseline and candidate agree on every sample, 1 = they disagree everywhere).
BaselineBreakdown / CandidateBreakdown hand back the per-dimension score decomposition (the same breakdown Evaluator returns), so the gate UI can explain why the aggregate fitness differs -- not just "candidate is 0.03 higher" but "candidate wins on latency, loses on cost". Meta is a free-form slot for runners to annotate the run (traffic window, tenant, sampler version).
All three fields (BaselineBreakdown, CandidateBreakdown, Meta) are dead in scan-baseline because core ships only the shadow scoring path; the decision UI / A-B gate scheduler / operator dashboard that consume breakdowns live in platform or caller code. The test lock for the breakdowns is in shadow_runner_default_test.go; Meta has no test lock (extension slot, by design).
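One way to read the Divergence scale is as the fraction of samples on which the two runs pick different decisions. The sketch below computes that fraction; representing each decision as a string and comparing by equality are assumptions, and DefaultShadowRunner may define disagreement differently.

```go
package main

import (
	"errors"
	"fmt"
)

// divergence returns the share of samples on which the baseline and candidate
// decisions differ: 0 when they always agree, 1 when they never do.
func divergence(baseline, candidate []string) (float64, error) {
	if len(baseline) != len(candidate) {
		return 0, errors.New("sample sets differ in length")
	}
	if len(baseline) == 0 {
		return 0, errors.New("no samples")
	}
	diff := 0
	for i := range baseline {
		if baseline[i] != candidate[i] {
			diff++
		}
	}
	return float64(diff) / float64(len(baseline)), nil
}

func main() {
	d, err := divergence(
		[]string{"carrier-A", "carrier-B", "carrier-C", "carrier-D"},
		[]string{"carrier-A", "carrier-B", "carrier-X", "carrier-D"},
	)
	fmt.Println(d, err)
}
```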
type ShadowRunner ¶
type ShadowRunner interface {
RunShadow(ctx context.Context, baselineKey string, candidateValue any, traffic float64) (ShadowResult, error)
}
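ShadowRunner runs a candidate parameter value in shadow mode, without affecting production, and returns a fitness comparison.
Traffic semantics: a value in 0.0-1.0 giving the share of traffic covered by shadow mode. 0.1 = 10% of requests go through shadow and 90% through production. 1.0 = pure offline replay with no effect on live traffic.
Boundary with ParameterStore: ShadowRunner never writes to ParameterStore; it only runs in parallel and scores. The caller decides from ShadowResult whether to call ParameterEvolver.Apply and promote to production.
Three-step gradual rollout (the landing path for the v0.4 RL policy):
- Shadow (ShadowRunner; production unaffected)
- A/B (small-scale rollout, 1-10% of real traffic)
- Production (full cutover)
ShadowRunner covers step 1; the platform layer schedules the other two.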
type SkillDefinition ¶
type SkillDefinition struct {
// Name is the skill name
Name string `json:"name"`
// Description is a short description
Description string `json:"description"`
// WhenToUse says when the skill should be used (the model decides automatically from this text)
WhenToUse string `json:"when_to_use"`
// Steps are the execution steps of the skill
Steps []SkillStep `json:"steps"`
// Prompt is the full prompt template (injected into the system prompt)
Prompt string `json:"prompt"`
// RequiredTools lists the tools the skill needs
RequiredTools []string `json:"required_tools,omitempty"`
// Tags are classification tags
Tags []string `json:"tags,omitempty"`
// LearnedFrom is the learning source
LearnedFrom *LearningSource `json:"learned_from,omitempty"`
// SuccessRate is the success rate (self-assessed by the Agent)
SuccessRate float64 `json:"success_rate"`
// UsageCount is the number of times the skill has been used
UsageCount int `json:"usage_count"`
// Version is the version number
Version int `json:"version"`
// CreatedAt is the creation time
CreatedAt time.Time `json:"created_at"`
// UpdatedAt is the last update time
UpdatedAt time.Time `json:"updated_at"`
}
SkillDefinition is one skill definition.
type SkillLearner ¶
type SkillLearner struct {
// contains filtered or unexported fields
}
SkillLearner is responsible for learning and managing skills.
func NewSkillLearner ¶
func NewSkillLearner(store *EvolutionStore, maxPerSession int) *SkillLearner
NewSkillLearner creates a skill learner.
func (*SkillLearner) Apply ¶
func (sl *SkillLearner) Apply(ctx context.Context, proposal *EvolutionProposal) error
Apply executes a skill-learning proposal.
func (*SkillLearner) FormatForSystemPrompt ¶
func (sl *SkillLearner) FormatForSystemPrompt() (string, error)
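FormatForSystemPrompt formats all skills into a system-prompt fragment. The Engine calls this method when building the system prompt so the model knows which skills are available.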
func (*SkillLearner) LoadAll ¶
func (sl *SkillLearner) LoadAll() ([]*SkillDefinition, []error)
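LoadAll loads all learned skills.
Elevated improvement (ELEVATED): returns a dual list ([]*SkillDefinition, []error) rather than a single error. Original approach: partial failures only recorded firstSkipErr; if anything loaded successfully the error was silently dropped, and a single error was returned only when everything failed, so the caller could not tell which files were corrupted. New approach: each failed file produces its own error, success and failure information are returned side by side, and the caller decides what to do. Alternative considered: <return (results, error) aggregated>. Rejected because a single error hides which specific files are broken and cannot distinguish "one file corrupted" from "five files corrupted".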
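The dual-list return shape can be sketched as follows. This is not the package's loader: `skill` is a trimmed, hypothetical stand-in for SkillDefinition, and `loadAll` reads from an in-memory map instead of the EvolutionStore, purely to show how successes and per-file failures travel side by side.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// skill is a trimmed, hypothetical stand-in for SkillDefinition.
type skill struct {
	Name string `json:"name"`
}

// loadAll decodes every file independently. A bad file adds one error to the
// second list and never hides the skills that decoded successfully.
func loadAll(files map[string]string) ([]*skill, []error) {
	var skills []*skill
	var errs []error
	for name, raw := range files {
		var s skill
		if err := json.Unmarshal([]byte(raw), &s); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", name, err))
			continue
		}
		skills = append(skills, &s)
	}
	return skills, errs
}

func main() {
	files := map[string]string{
		"deploy.json": `{"name":"deploy"}`,
		"broken.json": `{"name":`,
	}
	skills, errs := loadAll(files)
	fmt.Printf("loaded %d skill(s), %d file(s) failed\n", len(skills), len(errs))
	for _, err := range errs {
		fmt.Println("skipped:", err)
	}
}
```

A caller can then choose its own policy, for example continuing with the loaded skills while logging each failed file by name.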
func (*SkillLearner) RecordUsage ¶
func (sl *SkillLearner) RecordUsage(name string, success bool) error
RecordUsage records that a skill was used.
type SkillStep ¶
type SkillStep struct {
// Order is the step order.
Order int `json:"order"`
// Description describes the step.
Description string `json:"description"`
// ToolName is the tool to use (optional).
ToolName string `json:"tool_name,omitempty"`
// InputTemplate is the tool input template (supports variable substitution).
InputTemplate string `json:"input_template,omitempty"`
// Condition is the execution condition (optional, e.g. "when the previous step failed").
Condition string `json:"condition,omitempty"`
// Fallback is the fallback step on failure.
Fallback string `json:"fallback,omitempty"`
}
SkillStep is one execution step of a skill.
type ToolBuilder ¶
type ToolBuilder struct {
// contains filtered or unexported fields
}
ToolBuilder is responsible for building tools at runtime.
func NewToolBuilder ¶
func NewToolBuilder(store *EvolutionStore, maxPerSession int) *ToolBuilder
NewToolBuilder creates a tool builder.
func NewToolBuilderWithGuard ¶
func NewToolBuilderWithGuard(store *EvolutionStore, maxPerSession int, guard security.SecretGuard) *ToolBuilder
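NewToolBuilderWithGuard creates a tool builder with secret scanning.
Elevated improvement (ELEVATED): the Script field written by tool_builder may contain an API key; when generating a script, the Agent may inadvertently hard-code the current session's environment variables into it. SecretGuard intercepts before persistence and prevents the key from being written to disk (a high-risk path). Alternative considered: <no scanning, rely on the Agent not making mistakes>. Rejected because security cannot depend on the Agent's "correctness".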
func (*ToolBuilder) Apply ¶
func (tb *ToolBuilder) Apply(ctx context.Context, proposal *EvolutionProposal) error
Apply executes a tool-creation proposal.
func (*ToolBuilder) LoadAll ¶
func (tb *ToolBuilder) LoadAll() ([]*ToolDefinition, error)
LoadAll loads all persisted tool definitions. The Engine calls it at startup to restore previously created tools.
type ToolDefinition ¶
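type ToolDefinition struct {
// Name is the tool name (must be unique and must not clash with built-in tools).
Name string `json:"name"`
// Description is the tool description (the text the model sees).
Description string `json:"description"`
// InputSchema is the input definition in JSON Schema format.
InputSchema json.RawMessage `json:"input_schema"`
// ExecutionType is the execution mode.
ExecutionType ToolExecType `json:"execution_type"`
// Script is the shell script content (when ExecutionType == ExecScript).
// Input parameters are passed through environment variables: TOOL_INPUT_<PARAM_NAME>
Script string `json:"script,omitempty"`
// Command is the command template (when ExecutionType == ExecCommand).
// It supports the {{.param_name}} template syntax.
Command string `json:"command,omitempty"`
// Version is the version number (incremented when the Agent iterates on the tool).
Version int `json:"version"`
// CreatedAt is the creation time.
CreatedAt time.Time `json:"created_at"`
// CreatedBy is the creator (session ID or Agent ID).
CreatedBy string `json:"created_by,omitempty"`
// Rationale is why the Agent created this tool.
Rationale string `json:"rationale,omitempty"`
// Tags are labels (used for search and classification).
Tags []string `json:"tags,omitempty"`
// ConcurrencySafe reports whether the tool can run concurrently.
ConcurrencySafe bool `json:"concurrency_safe"`
// ReadOnly reports whether the tool is read-only.
ReadOnly bool `json:"read_only"`
}
ToolDefinition is a tool defined at runtime. The Agent generates this structure through conversation and then registers it with the Engine.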
type ToolExecType ¶
type ToolExecType string
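ToolExecType is the tool execution mode.
const (
// ExecScript executes via a shell script.
// The Agent writes a complete bash script; parameters are passed in through environment variables.
ExecScript ToolExecType = "script"
// ExecCommand executes via a command template.
// The Agent writes a command with placeholders; the engine substitutes parameters and then runs it.
ExecCommand ToolExecType = "command"
// ExecComposite composes existing tools.
// The Agent defines a tool chain (a sequence of calls to several existing tools).
ExecComposite ToolExecType = "composite"
)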
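To make the two parameter-passing conventions concrete, here is a minimal sketch of how inputs could be prepared for ExecCommand and ExecScript tools. It assumes the `{{.param_name}}` placeholders are rendered with Go's text/template and that parameter names are upper-cased for the TOOL_INPUT_ variables; neither detail is confirmed by the documentation, and `renderCommand` and `scriptEnv` are hypothetical helpers, not functions of this package.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"text/template"
)

// renderCommand fills a {{.param_name}} command template from params.
// The "missingkey=error" option turns an absent parameter into an error
// rather than silently rendering a placeholder value.
func renderCommand(tmpl string, params map[string]string) (string, error) {
	t, err := template.New("command").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, params); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// scriptEnv builds TOOL_INPUT_<PARAM_NAME>=value entries for a script tool.
// Upper-casing the parameter name is an assumption made for this sketch.
func scriptEnv(params map[string]string) []string {
	env := make([]string, 0, len(params))
	for name, value := range params {
		env = append(env, "TOOL_INPUT_"+strings.ToUpper(name)+"="+value)
	}
	sort.Strings(env) // stable order for display and tests
	return env
}

func main() {
	params := map[string]string{"path": "/var/log", "pattern": "*.log"}

	cmd, err := renderCommand("find {{.path}} -name {{.pattern}}", params)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(cmd)
	fmt.Println(scriptEnv(params))
}
```

text/template inserts values verbatim, so an executor built along these lines would still need to quote or validate parameters before handing the rendered command to a shell.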
type ToolResult ¶
ToolResult is the execution result from RuntimeTool.Execute -- a simplified shape kept local to evolve so this package does not import pkg/tools and create a cycle.
Status (2026-04-21): both fields are dead in scan-baseline for the same reason as RuntimeToolMetadata: the RuntimeTool.Execute → Engine tool registry → model-visible result path is not wired in core. Tests in evolve_test.go assert IsError / Output to lock forward-propagation; the real consumer is the future adapter that translates evolve.ToolResult into tools.Result for model consumption.
type WeightedEvaluator ¶
type WeightedEvaluator struct {
// contains filtered or unexported fields
}
WeightedEvaluator is the reference "cheap scorer" for the common weighted-linear fitness pattern: fitness = sum_i (feature_i(candidate) * weight_i).
Why this is the main reference impl, not a closure wrapper: Weighted-linear scoring covers the vast majority of "cheap first-pass" evaluators across domains (logistics carrier scoring, ad eCPM, multi-factor stock selection, hiring rubrics, engineering trade-offs). The shape is stable, declarative, and auditable; consumers plug in features without writing a full Evaluator.
For non-linear / fn-style scoring use FuncEvaluator.
func NewWeightedEvaluator ¶
func NewWeightedEvaluator(opts ...WeightedOption) (*WeightedEvaluator, error)
NewWeightedEvaluator builds the evaluator and validates:
- at least one feature registered
- every feature has exactly one weight
- every weight has exactly one feature (no orphan weights)
All errors are reported together (joined via errors.Join) so a misconfigured evaluator surfaces every issue in one go.
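The "report everything at once" behaviour can be approximated with errors.Join from the standard library (Go 1.20+). The sketch below is not the constructor's actual code: `featureFunc`, `sortedKeys`, and `validate` are hypothetical names, and the checks only mirror the three rules listed above.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// featureFunc is a hypothetical stand-in for the package's FeatureFunc type.
type featureFunc func(candidate any) float64

// sortedKeys returns map keys in sorted order so error output is deterministic.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// validate collects every configuration problem and joins them into one error.
func validate(features map[string]featureFunc, weights map[string]float64) error {
	var errs []error
	if len(features) == 0 {
		errs = append(errs, errors.New("no features registered"))
	}
	for _, name := range sortedKeys(features) {
		if _, ok := weights[name]; !ok {
			errs = append(errs, fmt.Errorf("feature %q has no weight", name))
		}
	}
	for _, name := range sortedKeys(weights) {
		if _, ok := features[name]; !ok {
			errs = append(errs, fmt.Errorf("weight %q has no feature", name))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	features := map[string]featureFunc{"cost": func(any) float64 { return 120 }}
	weights := map[string]float64{"latency": -0.5}
	// Both problems (missing weight, orphan weight) surface in a single error.
	fmt.Println(validate(features, weights))
}
```

Because the joined error exposes `Unwrap() []error`, callers can still match individual problems with errors.Is or errors.As.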
func (*WeightedEvaluator) Features ¶
func (e *WeightedEvaluator) Features() []string
Features returns the registered feature names in sorted order. Useful for UI introspection and tests. Returns a copy; mutating the result does not affect the evaluator.
func (*WeightedEvaluator) Score ¶
func (e *WeightedEvaluator) Score(ctx context.Context, c Candidate) (float64, map[string]float64, error)
Score implements Evaluator. breakdown contains the RAW feature values, not weighted contributions: a debugger wants to see "on_time feature was 0.3", not "contribution 0.15 = 0.3 * 0.5". Reconstructing contributions from breakdown + weights is a one-liner; the reverse is lossy.
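The "one-liner" for rebuilding contributions from the raw breakdown might look like the sketch below. `contributions` is a hypothetical helper; the two maps stand in for the breakdown returned by Score and the map returned by Weights, and the `cost_ok` feature is invented for the example.

```go
package main

import "fmt"

// contributions rebuilds per-feature weighted contributions from the raw
// breakdown values and the weight map: contribution = raw * weight.
func contributions(breakdown, weights map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(breakdown))
	for name, raw := range breakdown {
		out[name] = raw * weights[name]
	}
	return out
}

func main() {
	// Stand-ins for the breakdown from Score and the map from Weights.
	breakdown := map[string]float64{"on_time": 0.3, "cost_ok": 0.8}
	weights := map[string]float64{"on_time": 0.5, "cost_ok": 0.25}

	total := 0.0
	for _, c := range contributions(breakdown, weights) {
		total += c
	}
	fmt.Println("contributions:", contributions(breakdown, weights))
	fmt.Println("fitness:", total)
}
```

Going the other way is lossy: a contribution of zero cannot say whether the feature value or the weight was zero, which is why the raw values are the ones returned.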
func (*WeightedEvaluator) Weights ¶
func (e *WeightedEvaluator) Weights() map[string]float64
Weights returns a copy of the weight map.
type WeightedOption ¶
type WeightedOption func(*weightedConfig)
WeightedOption configures a WeightedEvaluator.
func WithFeature ¶
func WithFeature(name string, fn FeatureFunc) WeightedOption
WithFeature registers a named feature extractor. Duplicate names produce a constructor error (silently overwriting a feature is a frequent config-drift source: a copy-pasted option block overriding a prior definition).
func WithNormalization ¶
func WithNormalization(on bool) WeightedOption
WithNormalization enables [0, 1] clipping of every feature value before the weighted sum. Default off: many real-world features (cost in RMB, latency in ms, inventory count) are absolute scales where clipping silently corrupts the score. Only enable when your features are already bounded-unit (fraction, score 0-1).
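The corruption risk is easy to reproduce with a small stand-alone weighted sum. The sketch below is not the evaluator's implementation: `clip01` and `weightedSum` are hypothetical helpers, and the weight on `latency_ms` is negative so that lower latency scores higher.

```go
package main

import "fmt"

// clip01 bounds v to the [0, 1] interval.
func clip01(v float64) float64 {
	if v < 0 {
		return 0
	}
	if v > 1 {
		return 1
	}
	return v
}

// weightedSum computes sum_i feature_i * weight_i, optionally clipping each
// feature value to [0, 1] before it is weighted.
func weightedSum(features, weights map[string]float64, normalize bool) float64 {
	total := 0.0
	for name, v := range features {
		if normalize {
			v = clip01(v)
		}
		total += v * weights[name]
	}
	return total
}

func main() {
	weights := map[string]float64{"latency_ms": -0.01}
	fast := map[string]float64{"latency_ms": 80}
	slow := map[string]float64{"latency_ms": 900}

	// Without clipping the fast candidate scores clearly higher.
	fmt.Println(weightedSum(fast, weights, false), weightedSum(slow, weights, false))
	// With clipping both latencies collapse to 1 and the scores become identical.
	fmt.Println(weightedSum(fast, weights, true), weightedSum(slow, weights, true))
}
```

For features that are already fractions or 0-1 scores, clipping only guards against out-of-range values and leaves the ordering intact.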
func WithWeight ¶
func WithWeight(name string, w float64) WeightedOption
WithWeight sets a single weight. Negative values are allowed and convey "smaller feature value is better" (e.g. cost, latency, error rate).
func WithWeights ¶
func WithWeights(m map[string]float64) WeightedOption
WithWeights sets weights in bulk. Equivalent to WithWeight per entry.
Source Files
¶
- default_log_replayer.go
- doc.go
- engine_integration.go
- evaluator_impls.go
- evolve.go
- feedback_channel_file.go
- generator_llm.go
- interfaces.go
- llm_adapter_flyto.go
- log_source_file.go
- parameter_evolver_default.go
- parameter_store_file.go
- reflector_impls.go
- self_reflector.go
- shadow_runner_default.go
- skill_learner.go
- tool_builder.go