Managing Testsets
This guide covers how to create, list, and retrieve testsets using the Agenta SDK for evaluation purposes.
Creating a Testset
Use ag.testsets.acreate() to create a new testset with data:
- Python SDK
import agenta as ag
# Create a testset with simple data
testset = await ag.testsets.acreate(
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
        {"country": "Spain", "capital": "Madrid"},
    ],
    name="Country Capitals",
)
print(f"Created testset with ID: {testset.id}")
print(f"Name: {testset.name}")
print(f"Slug: {testset.slug}")
Parameters:
- data: A list of dictionaries containing your test data. Each dictionary represents one testcase.
- name: The name of your testset.
Returns: A TestsetRevision object containing:
- id: The UUID of the created testset
- name: The testset name
- slug: The testset slug
- data: The test data (with a testcases structure)
Sample Output:
{
  "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
  "name": "Country Capitals",
  "slug": "3ad5d688da6c",
  "data": {
    "testcases": [
      {"data": {"country": "Germany", "capital": "Berlin"}},
      {"data": {"country": "France", "capital": "Paris"}},
      {"data": {"country": "Spain", "capital": "Madrid"}}
    ]
  }
}
The data parameter accepts a simple list of dictionaries. The SDK automatically converts this to the structured TestsetRevisionData format internally.
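The conversion described above can be pictured with a small sketch that does not depend on the SDK. The helper name `to_testcases` is hypothetical and is not part of the Agenta SDK; it only mirrors the shape in the sample output, where each row is wrapped under a `data` key inside `testcases`. The SDK's own conversion may differ in detail.

```python
from typing import Any, Dict, List


def to_testcases(rows: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Wrap plain rows in the testcases structure shown in the sample output.

    Hypothetical helper for illustration only; not the SDK's implementation.
    """
    # Copy each row so later changes to the input do not leak into the result.
    return {"testcases": [{"data": dict(row)} for row in rows]}


rows = [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
]
structured = to_testcases(rows)
print(structured["testcases"][0])
```

Keeping the input as a flat list of dictionaries means the same rows can be loaded from a CSV or JSON file without reshaping them first.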
Upserting a Testset
Use ag.testsets.aupsert() to create a new testset, or to update the existing one if a testset with the same name already exists:
- Python SDK
import agenta as ag
# Create or update a testset
testset = await ag.testsets.aupsert(
    name="Country Capitals",
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
        {"country": "Spain", "capital": "Madrid"},
        {"country": "Italy", "capital": "Rome"},
    ],
)
print(f"Upserted testset with ID: {testset.id}")
Parameters:
- name (required): The name of your testset. Used to find an existing testset.
- data (required): A list of dictionaries containing your test data.
- testset_id (optional): If provided, updates the testset with this specific ID.
Returns: A TestsetRevision object with the created or updated testset.
Use aupsert() when you want to update an existing testset with the same name, or create it if it doesn't exist. This is useful in CI/CD pipelines where you want to keep testsets synchronized. Use acreate() when you explicitly want to create a new testset every time.
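As a rough sketch of the CI/CD use case, the script below reads testcases from CSV text and passes them to an upsert coroutine. The names `load_rows`, `sync_testset`, and `fake_upsert` are hypothetical and exist only for this illustration. The stand-in `fake_upsert` lets the sketch run without a backend; it assumes the real call accepts `name` and `data` keyword arguments, as documented above.

```python
import asyncio
import csv
import io
from typing import Any, Awaitable, Callable, Dict, List

Row = Dict[str, str]


def load_rows(csv_text: str) -> List[Row]:
    """Parse CSV text with a header row into one dictionary per testcase."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


async def sync_testset(
    upsert: Callable[..., Awaitable[Any]], name: str, csv_text: str
) -> Any:
    """Upsert the rows found in csv_text under the given testset name."""
    rows = load_rows(csv_text)
    if not rows:
        # Guard against wiping a testset because of an empty or broken file.
        raise ValueError("refusing to sync an empty testset")
    return await upsert(name=name, data=rows)


# Stand-in for ag.testsets.aupsert (assumption: same keyword arguments).
async def fake_upsert(name: str, data: List[Row]) -> Dict[str, Any]:
    return {"name": name, "count": len(data)}


CSV_TEXT = "country,capital\nGermany,Berlin\nItaly,Rome\n"
result = asyncio.run(sync_testset(fake_upsert, "Country Capitals", CSV_TEXT))
print(result)
```

In a real pipeline you would pass `ag.testsets.aupsert` in place of `fake_upsert`. Injecting the coroutine keeps the file-parsing logic testable without network access.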
Listing Testsets
To list all testsets in your project, use ag.testsets.alist():
- Python SDK
import agenta as ag
# List all testsets
testsets = await ag.testsets.alist()
print(f"Found {len(testsets)} testsets:")
for testset in testsets:
    print(f" - {testset.name} (ID: {testset.id})")
Parameters: None required.
Returns: A list of TestsetRevision objects, each containing:
- id: The testset UUID
- name: The testset name
- slug: The testset slug
- Additional metadata fields
Sample Output:
[
  {
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "slug": "country-capitals"
  },
  {
    "id": "01963520-4e4a-8761-91df-4be6e799eb7d",
    "name": "Math Problems",
    "slug": "math-problems"
  }
]
Retrieving a Testset by ID
To retrieve a specific testset by its ID, use ag.testsets.aretrieve():
- Python SDK
import agenta as ag
# Retrieve a specific testset (using the testset_id from creation)
testset = await ag.testsets.aretrieve(testset_id=testset_id)
if testset:
    print(f"Retrieved testset: {testset.id}")
    print(f"Testcases count: {len(testset.data.testcases) if testset.data and testset.data.testcases else 0}")
else:
    print("Testset not found")
Parameters:
testset_id: The UUID of the testset to retrieve
Returns: A TestsetRevision object (or None if not found) containing:
- id: The testset revision UUID
- testset_id: The parent testset UUID
- slug: The revision slug
- version: The revision version number
- data: The TestsetRevisionData with all testcases
Sample Output:
{
  "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
  "testset_id": "01963413-3d39-7650-80ce-3ad5d688da6c",
  "slug": "3ad5d688da6c",
  "version": "1",
  "data": {
    "testcases": [
      {"data": {"country": "Germany", "capital": "Berlin"}},
      {"data": {"country": "France", "capital": "Paris"}},
      {"data": {"country": "Spain", "capital": "Madrid"}}
    ]
  }
}
The SDK currently uses the legacy testset API. When you retrieve a testset, the function returns a TestsetRevision object with version "1". In the future, this will support the new versioning system, in which each update creates a new revision.
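The snippets in this guide use `await` at the top level, which works in a notebook or an async REPL. In a plain script, the call has to run inside a coroutine started with `asyncio.run()`. The sketch below shows that pattern and turns the "None if not found" result into an explicit error. The names `retrieve_or_raise` and `fake_retrieve` are hypothetical; `fake_retrieve` is a stand-in for `ag.testsets.aretrieve` so the example runs without a backend.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def retrieve_or_raise(
    retrieve: Callable[..., Awaitable[Optional[Any]]], testset_id: str
) -> Any:
    """Return the retrieved testset, or raise LookupError if it is missing."""
    testset = await retrieve(testset_id=testset_id)
    if testset is None:
        raise LookupError(f"Testset {testset_id} not found")
    return testset


# Stand-in for ag.testsets.aretrieve (assumption: takes a testset_id keyword).
async def fake_retrieve(testset_id: str) -> Optional[Dict[str, str]]:
    known = {"abc": {"id": "abc", "version": "1"}}
    return known.get(testset_id)


async def main() -> Dict[str, str]:
    return await retrieve_or_raise(fake_retrieve, "abc")


# asyncio.run() starts an event loop, runs main() to completion, and closes the loop.
found = asyncio.run(main())
print(found["version"])
```

Raising on a missing testset is a design choice for scripts that cannot continue without the data; interactive code may prefer the plain None check shown earlier.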
Retrieving a Testset by Name
There is no dedicated function for this, but you can find a testset by name by filtering the results of ag.testsets.alist():
- Python SDK
import agenta as ag
async def get_testset_by_name(name: str):
    """Helper function to find a testset by name."""
    testsets = await ag.testsets.alist()
    if not testsets:
        return None
    for testset in testsets:
        if testset.name == name:
            return testset
    return None

# Usage
testset = await get_testset_by_name("Country Capitals")
if testset:
    print(f"Found testset: {testset.name} with ID: {testset.id}")
else:
    print("Testset not found")
This pattern shows how you can implement your own helper functions to filter and find testsets based on custom criteria. You can extend this to search by tags or other metadata fields.
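One way to generalize the helper is to accept any predicate rather than a fixed name comparison. The sketch below is illustrative: `find_testsets` is a hypothetical name, and the `SimpleNamespace` objects stand in for the items returned by `ag.testsets.alist()`, using only the `id`, `name`, and `slug` fields listed above.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Optional


def find_testsets(
    testsets: Optional[Iterable[Any]], predicate: Callable[[Any], bool]
) -> List[Any]:
    """Return every testset for which predicate(testset) is true."""
    # Treat a missing result (None) the same as an empty list.
    return [t for t in (testsets or []) if predicate(t)]


# Stand-ins for the objects returned by ag.testsets.alist().
testsets = [
    SimpleNamespace(id="1", name="Country Capitals", slug="country-capitals"),
    SimpleNamespace(id="2", name="Math Problems", slug="math-problems"),
]

# Case-insensitive prefix match on the name.
matches = find_testsets(testsets, lambda t: t.name.lower().startswith("math"))
print([t.name for t in matches])
```

Because the predicate is an ordinary function, the same helper covers matching by slug, by name pattern, or by any other attribute the returned objects expose.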
Working with Test Data
Once you have a testset, you can access the testcases within it:
- Python SDK
import agenta as ag
# Retrieve a testset
testset = await ag.testsets.aretrieve(testset_id=testset_id)
# Access testcases
if testset and testset.data and testset.data.testcases:
    for testcase in testset.data.testcases:
        print(f"Testcase: {testcase.data}")
        # Use testcase.data in your evaluation
Each testcase contains a data field with the dictionary you provided during creation. You can use these testcases directly in your evaluations.
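If your evaluation code expects plain dictionaries, a small helper can flatten the retrieved object while tolerating a missing testset or empty data. This is a minimal sketch: `rows_from_testset` is a hypothetical name, and the `SimpleNamespace` stub only imitates the `data.testcases[i].data` layout described above.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


def rows_from_testset(testset: Optional[Any]) -> List[Dict[str, Any]]:
    """Return each testcase's data dictionary, or [] if nothing is available."""
    data = getattr(testset, "data", None)
    testcases = getattr(data, "testcases", None) or []
    # Copy each dictionary so callers can modify rows without side effects.
    return [dict(tc.data) for tc in testcases]


# Stub imitating the object returned by ag.testsets.aretrieve().
stub = SimpleNamespace(
    data=SimpleNamespace(
        testcases=[
            SimpleNamespace(data={"country": "Germany", "capital": "Berlin"}),
            SimpleNamespace(data={"country": "France", "capital": "Paris"}),
        ]
    )
)

rows = rows_from_testset(stub)
inputs = [row["country"] for row in rows]
print(inputs)
```

Returning an empty list instead of None lets the calling loop run unchanged when the testset is missing or has no testcases.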
