# Multi-Modal Embeddings
InertialAI's `inertial-embed-alpha` model can encode text, raw time-series data, or both together into a single dense vector. This combined representation is called a multi-modal embedding, and it is InertialAI's core differentiator.
In a multi-modal embedding, the numerical signal and its semantic context are fused simultaneously. Two sensor readings from machines running under similar conditions and described with similar language are closer in vector space than readings that share only one of those properties. This unlocks a level of search precision that neither text-only nor time-series-only embeddings can achieve on their own.
## Input Format
Chroma documents are always strings. To pass structured multi-modal data, serialize your input dict as a JSON string with `json.dumps()`.

`InertialAIEmbeddingFunction` inspects each document automatically: if it parses as a JSON dict, the parsed object is forwarded directly to the InertialAI API; if it is a plain string, it is wrapped as `{"text": document}`. You do not need to do anything special for text-only documents.
| Document type | How to pass it |
|---|---|
| Text only | Plain string, e.g. `"temperature spike at noon"` |
| Time-series only | `json.dumps({"time_series": [[ch1_vals], [ch2_vals]]})` |
| Multi-modal | `json.dumps({"text": "...", "time_series": [[ch1_vals], [ch2_vals]]})` |
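The detection rule can be illustrated with a short sketch. This mirrors the behavior described above; `wrap_document` is a hypothetical name for illustration, not the library's actual code:

```python
import json


def wrap_document(document: str) -> dict:
    """Mimic the input detection described above: serialized JSON dicts
    pass through unchanged, anything else is treated as plain text."""
    try:
        parsed = json.loads(document)
        if isinstance(parsed, dict):
            return parsed  # forwarded to the API as-is
    except json.JSONDecodeError:
        pass
    return {"text": document}  # plain strings are wrapped


# Plain text is wrapped...
print(wrap_document("temperature spike at noon"))
# {'text': 'temperature spike at noon'}

# ...while serialized dicts pass through unchanged.
print(wrap_document(json.dumps({"time_series": [[1, 2, 3]]})))
# {'time_series': [[1, 2, 3]]}
```

Note that a document must parse to a JSON *dict* to pass through; a string like `"42"` is valid JSON but not a dict, so it is still treated as plain text.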
## Time-Series Format
`time_series` is a list of channels, where each channel is a list of numerical readings ordered by time:
```python
{
    "time_series": [
        [val_t1, val_t2, val_t3, ...],  # Channel 1
        [val_t1, val_t2, val_t3, ...],  # Channel 2
        # ... additional channels
    ]
}
```
For example, a 3-axis accelerometer with 10 timesteps:
```python
{
    "time_series": [
        [0.12, 0.15, 0.14, 0.18, 0.16, 0.13, 0.11, 0.17, 0.15, 0.14],  # X-axis
        [0.03, 0.02, 0.04, 0.03, 0.05, 0.02, 0.03, 0.04, 0.03, 0.02],  # Y-axis
        [9.79, 9.81, 9.80, 9.82, 9.81, 9.80, 9.79, 9.81, 9.80, 9.81],  # Z-axis (gravity)
    ]
}
```
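Building these dicts by hand invites shape mistakes. A small helper can serialize channel-major readings and catch unequal channel lengths early. This is a sketch; `to_timeseries_doc` is a hypothetical name, not part of the library:

```python
import json


def to_timeseries_doc(channels, text=None):
    """Serialize channel-major readings (plus optional context text)
    into the JSON string format shown above.

    `channels` is a list of per-channel reading lists; every channel
    should cover the same timesteps.
    """
    lengths = {len(ch) for ch in channels}
    if len(lengths) > 1:
        raise ValueError(f"channels have unequal lengths: {sorted(lengths)}")
    doc = {"time_series": [list(ch) for ch in channels]}
    if text is not None:
        doc["text"] = text
    return json.dumps(doc)


doc = to_timeseries_doc(
    [
        [0.12, 0.15, 0.14],  # X-axis
        [0.03, 0.02, 0.04],  # Y-axis
        [9.79, 9.81, 9.80],  # Z-axis
    ],
    text="3-axis accelerometer sample",
)
```

The resulting string can be passed directly as a Chroma document.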
## Domain Examples
### Industrial / Manufacturing
Embed CNC machine readings with operational context to find similar production runs or surface fault signatures:
```python
import json

import chromadb
from inertialai_chroma import InertialAIEmbeddingFunction

client = chromadb.HttpClient(host="localhost", port=8000)
ef = InertialAIEmbeddingFunction()
collection = client.create_collection("cnc-runs", embedding_function=ef)

collection.add(
    documents=[
        json.dumps({
            "text": "CNC machine #42, production shift 2, cutting 6061 aluminum, normal operation",
            "time_series": [
                [2100, 2150, 2180, 2140, 2120],  # RPM
                [65, 66, 68, 67, 66],            # Temperature (°C)
                [8.2, 8.5, 8.7, 8.4, 8.3],       # Vibration (mm/s)
            ],
        }),
        json.dumps({
            "text": "CNC machine #42, production shift 2, cutting 6061 aluminum, tool wear detected",
            "time_series": [
                [2100, 2090, 2070, 2050, 2030],  # RPM — dropping under load
                [65, 68, 72, 77, 83],            # Temperature (°C) — rising
                [8.2, 9.1, 10.3, 11.8, 13.2],    # Vibration — elevated
            ],
        }),
    ],
    ids=["run-001-normal", "run-002-wear"],
)

# Search for runs with a similar fault signature
results = collection.query(
    query_texts=[
        json.dumps({
            "text": "possible tool wear, cutting steel",
            "time_series": [
                [2100, 2085, 2062, 2041, 2018],
                [65, 69, 74, 80, 88],
                [8.3, 9.3, 10.6, 12.2, 14.1],
            ],
        })
    ],
    n_results=1,
)
print(results["ids"])
# [['run-002-wear']]
```
### Healthcare / Wearables
Pair patient vitals with demographic and clinical context to power case-similarity retrieval:
```python
collection.add(
    documents=[
        json.dumps({
            "text": "Post-exercise monitoring, patient 1001, age 28, male, competitive cyclist",
            "time_series": [
                [172, 165, 157, 150, 143, 137],        # Heart rate (BPM) — recovering
                [98.9, 98.8, 98.7, 98.6, 98.5, 98.4],  # Body temp (°F)
                [16, 15, 14, 13, 13, 12],              # Respiration rate (breaths/min)
            ],
        }),
        json.dumps({
            "text": "Resting baseline, patient 1002, age 52, female, hypertension history",
            "time_series": [
                [78, 77, 79, 78, 77, 78],              # Heart rate (BPM) — elevated resting
                [98.4, 98.4, 98.5, 98.4, 98.3, 98.4],  # Body temp (°F)
                [14, 14, 15, 14, 14, 13],              # Respiration rate (breaths/min)
            ],
        }),
        json.dumps({
            "text": "Atrial fibrillation episode, patient 1003, age 67, male, known cardiac arrhythmia",
            "time_series": [
                [88, 112, 79, 134, 95, 118],           # Heart rate (BPM) — irregular rhythm
                [98.6, 98.7, 98.8, 98.9, 99.0, 99.1],  # Body temp (°F)
                [18, 17, 20, 16, 19, 17],              # Respiration rate (breaths/min)
            ],
        }),
    ],
    ids=["patient-1001", "patient-1002", "patient-1003"],
)
```
### IoT / Smart Building
Embed environmental sensor readings with location and operational context to detect climate anomalies or occupancy patterns:
```python
collection.add(
    documents=[
        json.dumps({
            "text": "Building A, floor 3, conference room, HVAC on, full occupancy",
            "time_series": [
                [21.5, 21.8, 22.1, 22.4, 22.6],  # Temperature (°C) — rising with occupancy
                [42.1, 43.5, 45.2, 46.8, 48.1],  # Humidity (%) — increasing
                [1012, 1012, 1013, 1013, 1012],  # Pressure (hPa)
                [850, 920, 1050, 1180, 1240],    # CO₂ (ppm) — rising
            ],
        }),
        json.dumps({
            "text": "Building A, floor 3, conference room, HVAC on, empty overnight",
            "time_series": [
                [19.2, 19.1, 19.0, 19.1, 19.2],  # Temperature (°C) — stable, cooler
                [38.5, 38.3, 38.2, 38.4, 38.5],  # Humidity (%) — stable, lower
                [1013, 1013, 1013, 1012, 1013],  # Pressure (hPa)
                [415, 412, 410, 411, 413],       # CO₂ (ppm) — at ambient level
            ],
        }),
    ],
    ids=["conf-room-occupied", "conf-room-empty"],
)
```
### Financial Markets
Embed price and volume time-series with market context to identify analogous market conditions:
```python
collection.add(
    documents=[
        json.dumps({
            "text": "AAPL, tech sector, Q4 earnings beat, bull market rally, large-cap",
            "time_series": [
                [150.2, 151.3, 153.8, 156.2, 158.1],             # Price (USD)
                [1250000, 1380000, 1920000, 2100000, 1890000],   # Volume
            ],
        }),
        json.dumps({
            "text": "AAPL, tech sector, macro downturn, Fed rate hike announcement, large-cap",
            "time_series": [
                [158.5, 155.2, 151.8, 148.3, 144.1],             # Price (USD) — declining
                [1890000, 2340000, 2780000, 3100000, 2940000],   # Volume — elevated selling
            ],
        }),
    ],
    ids=["aapl-rally", "aapl-selloff"],
)
```
### Sports Analytics
Embed motion capture data with athlete and activity context for performance analysis and movement classification:
```python
collection.add(
    documents=[
        json.dumps({
            "text": "Standing vertical jump, athlete ID 789, elite basketball player, training session",
            "time_series": [
                [34.61, 35.11, 35.21, 35.36, 35.25],  # Vertical acceleration (m/s²)
                [2.1, 2.3, 2.5, 2.4, 2.2],            # Horizontal velocity (m/s)
            ],
        }),
        json.dumps({
            "text": "Drop jump from 40 cm box, athlete ID 789, elite basketball player, training session",
            "time_series": [
                [18.2, 22.4, 41.3, 38.7, 29.1],       # Vertical acceleration (m/s²)
                [0.1, 0.2, 0.8, 1.1, 0.9],            # Horizontal velocity (m/s)
            ],
        }),
    ],
    ids=["athlete-789-standing-jump", "athlete-789-drop-jump"],
)
```
## Mixed Collections
You can mix text-only, time-series-only, and multi-modal documents in the same collection. Input type detection is automatic:
```python
collection.add(
    documents=[
        # Plain text — no JSON serialization needed
        "sensor offline — manual inspection required for unit 7",
        # Time-series only
        json.dumps({
            "time_series": [[72, 74, 73, 75, 71]]
        }),
        # Multi-modal
        json.dumps({
            "text": "elevated temperature, HVAC fault likely",
            "time_series": [[28.5, 29.1, 29.8, 30.4, 31.2]],
        }),
    ],
    ids=["note-1", "reading-1", "alert-1"],
)
```
## Querying with Multi-Modal Inputs
You can query a collection using any input mode, regardless of how the stored documents were embedded. Pass your query using the same JSON serialization format:
```python
results = collection.query(
    query_texts=[
        json.dumps({
            "text": "possible tool wear event, elevated temperature",
            "time_series": [
                [2100, 2085, 2065, 2040, 2020],
                [65, 69, 74, 80, 87],
                [8.3, 9.2, 10.5, 12.1, 14.0],
            ],
        })
    ],
    n_results=3,
)
```
## Best Practices
- Describe conditions, not just labels. Text in the `text` field is most effective when it describes the context in which the data was recorded (machine ID, patient demographics, location, operational state) rather than a bare category label.
- Be consistent with channel count. Using the same number of channels across all documents in a collection helps the model make accurate comparisons.
- Be consistent with time-series length. Within each channel, consistent timestep counts across documents improve embedding quality.
- Mix input modes freely. You do not need separate collections for different input types — the embedding function handles detection automatically.
- Match query mode to document mode when possible. Querying with a multi-modal input will produce the most semantically precise results against multi-modal documents.
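The channel-count and timestep-length guidelines above can be enforced with a small pre-flight check before calling `collection.add`. This is a sketch; `check_consistency` is a hypothetical helper, not part of the library:

```python
import json


def check_consistency(documents):
    """Verify that every time-series document in a batch uses the same
    channel count and the same timesteps per channel. Plain-text
    documents are skipped."""
    shapes = set()
    for doc in documents:
        try:
            parsed = json.loads(doc)
        except json.JSONDecodeError:
            continue  # plain text, nothing to check
        if not isinstance(parsed, dict) or "time_series" not in parsed:
            continue
        ts = parsed["time_series"]
        shapes.add((len(ts), tuple(len(ch) for ch in ts)))
    if len(shapes) > 1:
        raise ValueError(f"inconsistent time-series shapes: {sorted(shapes)}")


check_consistency([
    "sensor offline",
    json.dumps({"time_series": [[1, 2, 3], [4, 5, 6]]}),
    json.dumps({"text": "ok", "time_series": [[7, 8, 9], [0, 1, 2]]}),
])  # passes: both time-series documents are 2 channels × 3 timesteps
```

Running a check like this on each batch keeps a collection's embeddings comparable without giving up the flexibility of mixed input modes.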