Evaluation
Call fluiq.eval() once after instrument(). From then on, Fluiq runs an LLM-as-judge on every traced LLM response in the background, scores each configured metric from 0 to 1, and stores the results in the Evaluations dashboard. In the default warn mode, a warning is logged when a score falls below its threshold and the call is never blocked. Use block mode to gate on quality in CI.
Python
import openai
import fluiq

fluiq.instrument(api_key="fl_...")

fluiq.eval(
    metrics=["hallucination", "relevance"],  # scored on every LLM call
    mode="warn",  # default: never blocks
    thresholds={"hallucination": 0.8, "relevance": 0.7},
)

client = openai.OpenAI()

# Evaluation fires automatically after this call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What year did World War II end?"}],
)
print(response.choices[0].message.content)
# Scores visible in the Fluiq Evaluations tab
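
To make the threshold semantics concrete, here is a minimal plain-Python sketch of how a score might be compared against its threshold in warn mode versus block mode. It does not call Fluiq and is not Fluiq's implementation: the names failing_metrics, apply_thresholds, and QualityGateError are hypothetical, the "block" mode string is assumed from the mention of block mode above, and raising an exception is only one plausible way a block could surface in CI. The scores are made up for illustration.

```python
import logging

logger = logging.getLogger("eval_sketch")


class QualityGateError(Exception):
    """Hypothetical error raised in "block" mode when a score misses its threshold."""


def failing_metrics(scores, thresholds):
    """Return {metric: (score, threshold)} for every score below its threshold."""
    return {
        name: (scores[name], minimum)
        for name, minimum in thresholds.items()
        if name in scores and scores[name] < minimum
    }


def apply_thresholds(scores, thresholds, mode="warn"):
    """Log a warning per failing metric; in "block" mode (assumed name) also raise."""
    failures = failing_metrics(scores, thresholds)
    for name, (score, minimum) in failures.items():
        logger.warning("%s scored %.2f, below threshold %.2f", name, score, minimum)
    if mode == "block" and failures:
        raise QualityGateError(f"metrics below threshold: {sorted(failures)}")
    return failures


# Illustrative scores only, not real Fluiq output
thresholds = {"hallucination": 0.8, "relevance": 0.7}
scores = {"hallucination": 0.92, "relevance": 0.55}
failures = apply_thresholds(scores, thresholds, mode="warn")
```

In this sketch, warn mode returns the failing metrics and lets execution continue, while passing mode="block" with the same scores raises QualityGateError, which would fail a CI job that does not catch it.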