Skip to main content

Collection Persistence

Chroma can persist collection configuration to disk so that collections — including their embedding function settings — survive process restarts. InertialAIEmbeddingFunction fully supports this via Chroma's get_config() / build_from_config() protocol, while ensuring that API credentials are never written to disk.

How It Works

When Chroma serializes a collection, it calls get_config() on the embedding function and stores the result. When the collection is reopened, Chroma calls build_from_config() with the stored config to reconstruct the embedding function automatically.

InertialAIEmbeddingFunction.get_config() stores the environment variable name rather than the API key value:

{
"api_key_env_var": "INERTIALAI_API_KEY", # name only — never the key value
"model_name": "inertial-embed-alpha",
"dimensions": None,
"timeout": 60.0,
}

At load time, build_from_config() reads the API key from the named environment variable. This means:

  • Persisted collections are safe to commit to version control — no credentials are stored.
  • The environment variable must be set in the process that reopens the collection.
  • Rotating your API key only requires updating the environment variable — no changes to stored collection data are needed.

Example: Persist and Reload a Collection

import json
import chromadb
from inertialai_chroma import InertialAIEmbeddingFunction

# --- Session 1: Create the collection and add data ---

client = chromadb.PersistentClient(path="./chroma-data")
ef = InertialAIEmbeddingFunction()

collection = client.create_collection("sensors", embedding_function=ef)

collection.add(
documents=[
json.dumps({
"text": "CNC machine #42, cutting aluminum, normal operation",
"time_series": [
[2100, 2150, 2180, 2140, 2120],
[65, 66, 68, 67, 66],
],
}),
json.dumps({
"text": "CNC machine #42, cutting aluminum, tool wear detected",
"time_series": [
[2100, 2090, 2070, 2050, 2030],
[65, 68, 72, 77, 83],
],
}),
],
ids=["run-001", "run-002"],
)

print("Collection created and data added.")

# --- Session 2: Reopen the collection (in a new process) ---

client2 = chromadb.PersistentClient(path="./chroma-data")

# Chroma reconstructs InertialAIEmbeddingFunction from the stored config,
# resolving INERTIALAI_API_KEY from the environment automatically.
# No need to pass embedding_function= here.
collection2 = client2.get_collection("sensors")

results = collection2.query(
query_texts=["elevated vibration and temperature"],
n_results=1,
)

print(results["ids"])
# [['run-002']]

Note: When reopening a persisted collection, do not pass embedding_function= to get_collection(). Chroma reconstructs the embedding function from the stored config automatically. If you pass an embedding function here, Chroma will use it to validate the stored config — which can raise an error if settings such as model_name or dimensions do not match.

Custom Environment Variable Names

If you manage multiple InertialAI API keys — for example, separate keys for development and production — configure api_key_env_var to point to the appropriate variable:

import chromadb
from inertialai_chroma import InertialAIEmbeddingFunction

client = chromadb.PersistentClient(path="./chroma-prod")

# The name "INERTIALAI_API_KEY_PROD" is stored in the collection config
ef = InertialAIEmbeddingFunction(api_key_env_var="INERTIALAI_API_KEY_PROD")

collection = client.create_collection("prod-sensors", embedding_function=ef)

When this collection is reopened, Chroma will look for INERTIALAI_API_KEY_PROD in the environment. Ensure the variable is available in every process that opens the collection.

Security Considerations

  • Do not pass the API key via api_key=. That parameter is deprecated. Its value is not included in get_config(), which means the embedding function cannot be reconstructed automatically when the collection is reopened. Use api_key_env_var instead.
  • API key values are never written to disk. The persisted config contains only the environment variable name.
  • Rotate keys by updating the environment variable. No stored collection data needs to change.