# Track Costs

Agenta automatically tracks costs, token usage, and performance metrics for your LLM applications. This data is captured in the `ag.metrics` namespace of each span.
## Overview

When you instrument your application with Agenta, we automatically collect cost and performance metrics for spans of type `chat`.
Costs are calculated using the latest pricing for each model provider. Token usage is tracked separately for input (prompt) and output (completion) tokens. Execution time is measured in milliseconds for each operation.
## Metrics Structure

### Cost Metrics
Costs are tracked in USD with the following breakdown:
```json
{
  "metrics": {
    "costs": {
      "cumulative": {
        "total": 0.0070902,
        "prompt": 0.00355,
        "completion": 0.00354
      }
    }
  }
}
```
The `total` field shows the combined cost of all LLM calls in this span and its children. The `prompt` field shows the cost attributed to input tokens, and the `completion` field shows the cost of output tokens.
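As a quick sanity check, the payload above can be read programmatically. This sketch uses only the standard library to parse the example and confirm that the `prompt` and `completion` costs sum to `total` (up to the rounding in the example values):

```python
import json

# The example cost payload from above
span = json.loads("""
{
  "metrics": {
    "costs": {
      "cumulative": {
        "total": 0.0070902,
        "prompt": 0.00355,
        "completion": 0.00354
      }
    }
  }
}
""")

costs = span["metrics"]["costs"]["cumulative"]
# prompt + completion should match total, within rounding
assert abs(costs["prompt"] + costs["completion"] - costs["total"]) < 1e-4
print(f"total cost: ${costs['total']:.6f}")
```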
### Token Usage
Token consumption is tracked with separate counts for input and output:
```json
{
  "metrics": {
    "tokens": {
      "cumulative": {
        "total": 992,
        "prompt": 175,
        "completion": 817
      }
    }
  }
}
```
The `total` field shows all tokens used (prompt plus completion). The `prompt` field shows input tokens consumed, and the `completion` field shows output tokens generated.
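Token counts pair naturally with the cost metrics above: dividing cumulative cost by cumulative tokens gives the effective blended price per token for a span. A minimal sketch using the example numbers from this page:

```python
# Example cumulative values from the payloads on this page
total_cost_usd = 0.0070902
total_tokens = 992  # 175 prompt + 817 completion

# Effective blended price per 1K tokens for this span
cost_per_1k = total_cost_usd / total_tokens * 1000
print(f"${cost_per_1k:.5f} per 1K tokens")
```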
### Duration
Execution time is measured in milliseconds:
```json
{
  "metrics": {
    "duration": {
      "cumulative": 19889.343
    }
  }
}
```
Agenta tracks metrics at two levels: **incremental** metrics cover the current span only, while **cumulative** metrics aggregate values from the current span plus all of its child spans.
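The relationship between the two levels is a simple rollup over the span tree. The sketch below is a hypothetical illustration (not Agenta's actual implementation) of how a backend might derive cumulative cost from incremental values:

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    incremental_cost: float
    children: list["Span"] = field(default_factory=list)

    def cumulative_cost(self) -> float:
        # Cumulative = this span's own (incremental) cost
        # plus the cumulative cost of every child span
        return self.incremental_cost + sum(
            c.cumulative_cost() for c in self.children
        )


# A parent workflow span with two child LLM-call spans
root = Span(0.0, children=[Span(0.0015), Span(0.0025)])
print(root.cumulative_cost())
```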
## How to Track Costs

### With Auto-Instrumentation
When you use auto-instrumentation from compatible libraries, prompts and tokens are automatically extracted and formatted. Costs are calculated when possible.
```python
import agenta as ag
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor

ag.init()
OpenAIInstrumentor().instrument()

client = OpenAI()

@ag.instrument()
def generate_response(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
### With Manual Instrumentation
You can manually add cost metrics to spans using incremental metrics:
```python
import agenta as ag

@ag.instrument()
def custom_llm_call(prompt: str):
    # Your custom LLM call logic
    response = my_custom_llm.generate(prompt)

    # Manually track incremental metrics (for this span only)
    ag.tracing.store_metrics({
        "costs.incremental.total": 0.0025,
        "costs.incremental.prompt": 0.0015,
        "costs.incremental.completion": 0.001,
        "tokens.incremental.total": 150,
        "tokens.incremental.prompt": 100,
        "tokens.incremental.completion": 50,
    })

    # Cumulative metrics are automatically calculated by the backend
    return response
```
## Automatic Cost Calculation

Agenta calculates costs automatically for major LLM providers using the LiteLLM library. When a span of type `chat` does not include a cost, Agenta infers it from the recorded token counts.
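The token-based inference can be pictured as a lookup in a per-token pricing table. The prices below are made up for illustration only; in practice Agenta resolves real prices through LiteLLM's model registry:

```python
# Hypothetical per-token USD prices -- illustrative only;
# real prices come from LiteLLM's model registry.
PRICING = {
    "gpt-4": {"prompt": 0.00003, "completion": 0.00006},
}


def infer_cost(model: str, prompt_tokens: int, completion_tokens: int) -> dict:
    """Derive incremental cost metrics from token counts and a pricing table."""
    rates = PRICING[model]
    prompt_cost = prompt_tokens * rates["prompt"]
    completion_cost = completion_tokens * rates["completion"]
    return {
        "costs.incremental.prompt": prompt_cost,
        "costs.incremental.completion": completion_cost,
        "costs.incremental.total": prompt_cost + completion_cost,
    }


print(infer_cost("gpt-4", prompt_tokens=175, completion_tokens=817))
```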
### Custom Pricing
For custom models or providers, you can manually set costs using incremental metrics:
```python
import agenta as ag

@ag.instrument()
def custom_model_call(prompt: str):
    response = my_model.generate(prompt)

    # Calculate custom cost
    prompt_tokens = len(prompt.split())
    completion_tokens = len(response.split())

    # Custom pricing
    cost_per_prompt_token = 0.00001
    cost_per_completion_token = 0.00002

    prompt_cost = prompt_tokens * cost_per_prompt_token
    completion_cost = completion_tokens * cost_per_completion_token
    total_cost = prompt_cost + completion_cost

    # Set incremental metrics
    ag.tracing.store_metrics({
        "costs.incremental.total": total_cost,
        "costs.incremental.prompt": prompt_cost,
        "costs.incremental.completion": completion_cost,
        "tokens.incremental.total": prompt_tokens + completion_tokens,
        "tokens.incremental.prompt": prompt_tokens,
        "tokens.incremental.completion": completion_tokens,
    })

    return response
```
## Next steps
Learn about adding metadata to enrich your traces.