Bundle Dataset
Join two to five remote data sources into a single named dataset. Returns a `dataset_id` you can pass directly to `/v1/analyze` or `/v1/visualize`.
/v1/datasets/bundle
curl -X POST "https://analytics.toolkitapi.io/v1/datasets/bundle" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [
      {
        "alias": "orders",
        "data_url": "https://storage.example.com/orders.csv",
        "file_type": "csv"
      },
      {
        "alias": "customers",
        "data_url": "https://storage.example.com/customers.parquet",
        "file_type": "parquet"
      }
    ],
    "joins": [
      {
        "left_alias": "orders",
        "right_alias": "customers",
        "left_key": "customer_id",
        "right_key": "id",
        "join_type": "INNER"
      }
    ]
  }'
import httpx

resp = httpx.post(
    "https://analytics.toolkitapi.io/v1/datasets/bundle",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "sources": [
            {
                "alias": "orders",
                "data_url": "https://storage.example.com/orders.csv",
                "file_type": "csv"
            },
            {
                "alias": "customers",
                "data_url": "https://storage.example.com/customers.parquet",
                "file_type": "parquet"
            }
        ],
        "joins": [
            {
                "left_alias": "orders",
                "right_alias": "customers",
                "left_key": "customer_id",
                "right_key": "id",
                "join_type": "INNER"
            }
        ]
    },
)
print(resp.json())
const resp = await fetch("https://analytics.toolkitapi.io/v1/datasets/bundle", {
  method: "POST",
  headers: {
    "X-API-Key": "YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "sources": [
      {
        "alias": "orders",
        "data_url": "https://storage.example.com/orders.csv",
        "file_type": "csv"
      },
      {
        "alias": "customers",
        "data_url": "https://storage.example.com/customers.parquet",
        "file_type": "parquet"
      }
    ],
    "joins": [
      {
        "left_alias": "orders",
        "right_alias": "customers",
        "left_key": "customer_id",
        "right_key": "id",
        "join_type": "INNER"
      }
    ]
  }),
});
const data = await resp.json();
console.log(data);
{
"dataset_id": "ds_bundle_xyz789",
"sources": ["orders", "customers"],
"columns": [
{"name": "orders.order_id", "type": "Int64", "nullable": false},
{"name": "orders.customer_id", "type": "Int64", "nullable": false},
{"name": "orders.revenue", "type": "Float64", "nullable": true},
{"name": "customers.id", "type": "Int64", "nullable": false},
{"name": "customers.region", "type": "String", "nullable": true},
{"name": "customers.tier", "type": "String", "nullable": true}
],
"schema_fingerprint": "fp_bundle_xyz789"
}
Description
How to Use
1. Upload or locate each data file and obtain a URL the API can reach (public endpoint or pre-signed S3/GCS URL).
2. Assign a short, unique `alias` to each source (e.g. `orders`, `customers`).
3. Define at least one `JoinDefinition` pairing `left_alias` + `left_key` with `right_alias` + `right_key`.
4. `POST` the payload to `/v1/datasets/bundle`.
5. Note the returned `dataset_id` and pass it to `/v1/analyze` or `/v1/visualize`.
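The steps above can be sketched end to end in Python. This is a minimal sketch: the `/v1/analyze` request body (`dataset_id` plus a `query` field) is an assumed shape for illustration, so check that endpoint's own reference before relying on it.

```python
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://analytics.toolkitapi.io"


def bundle_payload(sources, joins):
    # Steps 2-3: aliased sources plus at least one JoinDefinition.
    return {"sources": sources, "joins": joins}


def bundle_and_analyze(sources, joins, query):
    """Steps 4-5: POST the bundle, then hand the dataset_id to /v1/analyze.

    httpx is imported lazily so the payload helper is usable on its own;
    the analyze body ("dataset_id" + "query") is an assumption, not the
    documented shape.
    """
    import httpx

    headers = {"X-API-Key": API_KEY}
    resp = httpx.post(
        f"{BASE_URL}/v1/datasets/bundle",
        headers=headers,
        json=bundle_payload(sources, joins),
    )
    resp.raise_for_status()
    dataset_id = resp.json()["dataset_id"]
    return httpx.post(
        f"{BASE_URL}/v1/analyze",
        headers=headers,
        json={"dataset_id": dataset_id, "query": query},
    ).json()


sources = [
    {"alias": "orders",
     "data_url": "https://storage.example.com/orders.csv",
     "file_type": "csv"},
    {"alias": "customers",
     "data_url": "https://storage.example.com/customers.parquet",
     "file_type": "parquet"},
]
joins = [
    {"left_alias": "orders", "right_alias": "customers",
     "left_key": "customer_id", "right_key": "id", "join_type": "INNER"},
]
```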
About This Tool
The **Bundle Dataset** endpoint fetches two to five remote data files, executes one or more joins across them, and registers the resulting virtual dataset under a fresh `dataset_id`. That ID can be passed immediately to `/v1/analyze`, `/v1/visualize`, or `/v1/validate-chart` — exactly like a dataset produced from a single source.
Sources can be a mix of formats (CSV, JSON, Parquet, TSV) hosted on any public or pre-signed URL. Each source receives a short `alias` that becomes the column-name prefix in the combined schema (e.g. `orders.revenue`, `customers.region`), preventing collisions and making downstream query expressions unambiguous.
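The prefixing rule can be sketched in a few lines; the per-source column lists below are illustrative, not returned by the API.

```python
def combined_columns(schemas):
    """Prefix each column with its source alias, as the bundled schema does."""
    return [f"{alias}.{col}" for alias, cols in schemas.items() for col in cols]


# Illustrative per-source schemas (assumed, matching the example response).
schemas = {
    "orders": ["order_id", "customer_id", "revenue"],
    "customers": ["id", "region", "tier"],
}
print(combined_columns(schemas))
# ['orders.order_id', 'orders.customer_id', 'orders.revenue',
#  'customers.id', 'customers.region', 'customers.tier']
```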
Up to four `JoinDefinition` entries chain the sources together. If you need a cross-product or anti-join, use `join_type: CROSS` or `LEFT ANTI` respectively.
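As a sketch, chaining three sources takes two `JoinDefinition` entries. The `products` source and the `LEFT` join type here are assumptions for illustration, not taken from the examples above.

```python
# N sources chain together with N - 1 joins: orders joins customers,
# then orders joins the (hypothetical) products dimension table.
payload = {
    "sources": [
        {"alias": "orders",
         "data_url": "https://storage.example.com/orders.csv",
         "file_type": "csv"},
        {"alias": "customers",
         "data_url": "https://storage.example.com/customers.parquet",
         "file_type": "parquet"},
        {"alias": "products",  # hypothetical third source
         "data_url": "https://storage.example.com/products.json",
         "file_type": "json"},
    ],
    "joins": [
        {"left_alias": "orders", "right_alias": "customers",
         "left_key": "customer_id", "right_key": "id", "join_type": "INNER"},
        {"left_alias": "orders", "right_alias": "products",
         "left_key": "product_id", "right_key": "id", "join_type": "LEFT"},
    ],
}
```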
Why Use This Tool
- Cross-table reporting — join a transactions file to a dimension table of customers or products for enriched analysis without a database.
- Multi-format ETL — combine a nightly Parquet export with a CSV lookup table in a single call.
- Agent-driven analytics — an LLM can discover relevant data files, bundle them on the fly, and immediately query the result.
- Schema exploration — inspect `columns` in the response to understand the full merged schema before writing a query.
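The schema-exploration idea in the last bullet might look like this against the example response shown earlier, grouping column names by their alias prefix:

```python
# The example response from earlier, abridged to the fields we need.
response = {
    "dataset_id": "ds_bundle_xyz789",
    "columns": [
        {"name": "orders.order_id", "type": "Int64", "nullable": False},
        {"name": "orders.customer_id", "type": "Int64", "nullable": False},
        {"name": "orders.revenue", "type": "Float64", "nullable": True},
        {"name": "customers.id", "type": "Int64", "nullable": False},
        {"name": "customers.region", "type": "String", "nullable": True},
        {"name": "customers.tier", "type": "String", "nullable": True},
    ],
}

# Group column names by alias prefix to see what each source contributed.
by_source = {}
for col in response["columns"]:
    alias, _, name = col["name"].partition(".")
    by_source.setdefault(alias, []).append(name)

print(by_source)
# {'orders': ['order_id', 'customer_id', 'revenue'],
#  'customers': ['id', 'region', 'tier']}

# Nullable columns may need null-aware handling in downstream queries.
nullable = [c["name"] for c in response["columns"] if c["nullable"]]
```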
Start using Bundle Dataset now
Get your free API key and make your first request in under a minute.