Quickstart
This guide walks you through installing inertialai-chroma, connecting to a Chroma collection, and running your first embedding and similarity search.
Prerequisites
- Python 3.11 or later
- An InertialAI API key (sign up here)
- Docker installed (to run Chroma locally)
Step 1: Start a Chroma Instance
Run Chroma using the official Docker image:
docker run -d \
--name chroma \
-p 8000:8000 \
chromadb/chroma
Verify it's running:
curl http://localhost:8000/api/v2/heartbeat
Step 2: Install the Package
pip install inertialai-chroma
Or, with uv:
uv add inertialai-chroma
Step 3: Set Your API Key
Set your InertialAI API key as an environment variable:
export INERTIALAI_API_KEY="your-api-key"
InertialAIEmbeddingFunction reads this variable by default. If you use a different variable name, see the Configuration Reference.
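If your deployment stores the key under another variable name, one workaround is to copy it to the default name before constructing the embedding function. This is a minimal sketch assuming the constructor reads only `INERTIALAI_API_KEY`; the helper name and the `MY_DEPLOYMENT_API_KEY` variable are illustrative, and the Configuration Reference documents the supported options:

```python
import os

def export_api_key(source_var: str, target_var: str = "INERTIALAI_API_KEY") -> bool:
    """Copy an API key from a custom env var to the default one.

    Returns True if the key was copied, False if the source is missing
    or the target is already set.
    """
    value = os.environ.get(source_var)
    if value and target_var not in os.environ:
        os.environ[target_var] = value
        return True
    return False

# Example: key stored under a hypothetical deployment-specific name
os.environ["MY_DEPLOYMENT_API_KEY"] = "your-api-key"
export_api_key("MY_DEPLOYMENT_API_KEY")
# InertialAIEmbeddingFunction() will now pick the key up as usual.
```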
Step 4: Embed Text Documents
Create a collection and add text documents. Chroma invokes InertialAIEmbeddingFunction automatically whenever you add() documents without precomputed embeddings and whenever you query() with query_texts:
import chromadb
from inertialai_chroma import InertialAIEmbeddingFunction
# Connect to the running Chroma instance
client = chromadb.HttpClient(host="localhost", port=8000)
# Create the embedding function — reads INERTIALAI_API_KEY from env
ef = InertialAIEmbeddingFunction()
# Create a collection with the embedding function attached
collection = client.create_collection("sensors", embedding_function=ef)
# Add text documents — InertialAI's API is called in the background
collection.add(
documents=[
"temperature spike detected at noon on sensor array B",
"stable overnight temperature readings within normal range",
"humidity levels elevated in zone 3 during afternoon hours",
"pressure anomaly recorded at 14:32 on sensor unit 7",
"all systems nominal — environmental conditions within threshold",
],
ids=["doc-1", "doc-2", "doc-3", "doc-4", "doc-5"],
)
# Query — again, embedding happens automatically
results = collection.query(
query_texts=["unusual thermal event"],
n_results=2,
)
print(results["documents"])
# [['temperature spike detected at noon on sensor array B',
# 'pressure anomaly recorded at 14:32 on sensor unit 7']]
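The return value of query() is a dict of parallel lists, one inner list per query text. If you also request distances with include=["documents", "distances"], you can pair each hit with its score. The sketch below runs on a hard-coded dict shaped like Chroma's return value; the distance values are illustrative, not real output:

```python
# Results mirror the shape Chroma returns: one inner list per query text.
results = {
    "ids": [["doc-1", "doc-4"]],
    "documents": [[
        "temperature spike detected at noon on sensor array B",
        "pressure anomaly recorded at 14:32 on sensor unit 7",
    ]],
    "distances": [[0.18, 0.42]],  # illustrative values
}

# Pair each hit with its distance (lower means more similar for the
# default distance space).
for doc_id, doc, dist in zip(
    results["ids"][0], results["documents"][0], results["distances"][0]
):
    print(f"{doc_id}  dist={dist:.2f}  {doc}")
```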
Step 5: Embed Time-Series Data
To embed raw sensor readings, serialize them as a JSON string using json.dumps(). Time-series data is structured as a list of channels, where each channel is a list of numerical readings ordered by time:
import json
import chromadb
from inertialai_chroma import InertialAIEmbeddingFunction
client = chromadb.HttpClient(host="localhost", port=8000)
ef = InertialAIEmbeddingFunction()
collection = client.create_collection("cnc-machines", embedding_function=ef)
collection.add(
documents=[
json.dumps({
"time_series": [
[2100, 2150, 2180, 2140, 2120], # RPM
[65, 66, 68, 67, 66], # Temperature (°C)
[8.2, 8.5, 8.7, 8.4, 8.3], # Vibration (mm/s)
]
}),
json.dumps({
"time_series": [
[1800, 1820, 1850, 1800, 1790], # RPM
[72, 74, 78, 80, 82], # Temperature (°C) — rising
[9.1, 9.4, 10.2, 11.0, 12.1], # Vibration — elevated
]
}),
],
ids=["machine-42-normal", "machine-17-fault"],
)
# Query with a new reading to find the most similar stored pattern
results = collection.query(
query_texts=[
json.dumps({
"time_series": [
[1810, 1830, 1860, 1810, 1800],
[71, 73, 77, 79, 81],
[9.0, 9.3, 10.1, 10.8, 11.9],
]
})
],
n_results=1,
)
print(results["ids"])
# [['machine-17-fault']]
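Each document above serializes channels of equal length. A small helper can catch ragged channels before they reach the embedding service. This is a sketch: the function name and the equal-length requirement are assumptions on my part, not part of the package API:

```python
import json

def serialize_series(channels: list[list[float]]) -> str:
    """Validate and serialize sensor channels for collection.add().

    Assumes every channel holds the same number of time-ordered readings.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    lengths = {len(ch) for ch in channels}
    if len(lengths) != 1:
        raise ValueError(f"channels have mismatched lengths: {sorted(lengths)}")
    return json.dumps({"time_series": channels})

doc = serialize_series([
    [2100, 2150, 2180, 2140, 2120],  # RPM
    [65, 66, 68, 67, 66],            # Temperature (°C)
])
```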
Step 6: Embed Multi-Modal Data
The most powerful capability of inertialai-chroma is combining raw sensor readings with natural-language context into a single vector. Include both text and time_series keys in your JSON object:
import json
import chromadb
from inertialai_chroma import InertialAIEmbeddingFunction
client = chromadb.HttpClient(host="localhost", port=8000)
ef = InertialAIEmbeddingFunction()
collection = client.create_collection("patient-vitals", embedding_function=ef)
collection.add(
documents=[
json.dumps({
"text": "Post-exercise recovery, patient ID 1001, age 28, male, marathon runner",
"time_series": [
[155, 148, 140, 133, 127, 122], # Heart rate (BPM) — recovering
[98.9, 98.8, 98.7, 98.6, 98.5, 98.4], # Body temp (°F)
],
}),
json.dumps({
"text": "Resting baseline, patient ID 1002, age 45, female, sedentary lifestyle",
"time_series": [
[72, 71, 73, 72, 71, 72], # Heart rate (BPM) — stable
[98.2, 98.2, 98.3, 98.2, 98.2, 98.1], # Body temp (°F)
],
}),
json.dumps({
"text": "Atrial fibrillation episode, patient ID 1003, age 67, male, cardiac history",
"time_series": [
[88, 112, 79, 134, 95, 118], # Heart rate (BPM) — irregular
[98.6, 98.7, 98.8, 98.9, 99.0, 99.1], # Body temp (°F)
],
}),
],
ids=["patient-1001", "patient-1002", "patient-1003"],
)
# Find stored records most similar to a new patient's readings
results = collection.query(
query_texts=[
json.dumps({
"text": "Elevated heart rate post-activity, patient ID 2001, age 31, male",
"time_series": [
[162, 153, 144, 136, 129, 124],
[99.0, 98.9, 98.8, 98.7, 98.6, 98.5],
],
})
],
n_results=2,
)
print(results["ids"])
# [['patient-1001', 'patient-1002']]
The multi-modal vector captures both the shape of the signal and the semantic meaning of the text — producing more precise matches than either modality alone.
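When you have many records, keeping the ids and documents lists aligned by hand gets error-prone. A sketch of one way to build both lists from a single mapping (the helper name is mine; only the parallel ids/documents contract of collection.add() and the text/time_series payload shape come from the examples above):

```python
import json

def build_batch(records: dict[str, dict]) -> tuple[list[str], list[str]]:
    """Turn {id: payload-dict} into the parallel ids/documents lists
    that collection.add() expects, serializing each payload to JSON."""
    ids = list(records)
    documents = [json.dumps(records[i]) for i in ids]
    return ids, documents

ids, documents = build_batch({
    "patient-1002": {
        "text": "Resting baseline, patient ID 1002, age 45, female",
        "time_series": [[72, 71, 73], [98.2, 98.2, 98.3]],
    },
})
# collection.add(documents=documents, ids=ids)
```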
Next Steps
- Multi-Modal Embeddings — Learn more about the multi-modal input format with examples across industrial IoT, healthcare, financial, and other domains.
- Collection Persistence — Understand how collections are safely persisted to disk and reloaded across process restarts.
- Configuration Reference — Explore all constructor options including dimensionality reduction, custom timeouts, and distance spaces.