Evaluation

Call fluiq.eval() once after instrument(). From then on, Fluiq runs an LLM-as-judge in the background on every traced LLM response, scores each configured metric from 0 to 1, and stores the results in the Evaluations dashboard. In the default warn mode, a warning is logged when a score falls below its threshold and the call is never blocked. Use block mode to gate on quality in CI.

Python
import openai
import fluiq

fluiq.instrument(api_key="fl_...")
fluiq.eval(
    metrics=["hallucination", "relevance"],  # scored on every LLM call
    mode="warn",                             # default — never blocks
    thresholds={"hallucination": 0.8, "relevance": 0.7},
)

client = openai.OpenAI()

# Evaluation fires automatically after this call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What year did World War II end?"}],
)
print(response.choices[0].message.content)
# Scores appear in the Fluiq Evaluations dashboard
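
To make the threshold semantics concrete, here is a minimal local sketch of the comparison described above: a metric counts as failing when its score is below its configured threshold. This is not Fluiq's implementation and does not call the Fluiq API. The helpers failing_metrics and apply_mode are hypothetical names introduced for illustration, and raising an exception in block mode is an assumption about how a CI gate might behave, not documented Fluiq behavior.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("eval_sketch")


def failing_metrics(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the sorted names of metrics whose score is below its threshold."""
    return sorted(
        name
        for name, minimum in thresholds.items()
        if name in scores and scores[name] < minimum
    )


def apply_mode(scores: Dict[str, float], thresholds: Dict[str, float], mode: str = "warn") -> List[str]:
    """Hypothetical helper: log in warn mode, raise in block mode."""
    failures = failing_metrics(scores, thresholds)
    if failures and mode == "block":
        # Assumed behavior for illustration: fail loudly so a CI job stops.
        raise RuntimeError(f"metrics below threshold: {', '.join(failures)}")
    for name in failures:
        # Warn mode: report the low score but let the caller continue.
        logger.warning(
            "%s scored %.2f, below threshold %.2f",
            name, scores[name], thresholds[name],
        )
    return failures
```

With the thresholds from the example above, a response scored 0.9 on hallucination and 0.6 on relevance would fail only on relevance. A score exactly equal to its threshold passes in this sketch, because the text says a warning is logged when a score falls below the threshold.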