Bundle Dataset

Join two to five remote data sources into a single named dataset. Returns a dataset_id you can pass directly to analyze or visualize.

POST /v1/datasets/bundle
curl -X POST "https://analytics.toolkitapi.io/v1/datasets/bundle" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [
      {
        "alias": "orders",
        "data_url": "https://storage.example.com/orders.csv",
        "file_type": "csv"
      },
      {
        "alias": "customers",
        "data_url": "https://storage.example.com/customers.parquet",
        "file_type": "parquet"
      }
    ],
    "joins": [
      {
        "left_alias": "orders",
        "right_alias": "customers",
        "left_key": "customer_id",
        "right_key": "id",
        "join_type": "INNER"
      }
    ]
  }'
import httpx

# Send the same bundle request as the curl example, authenticating with X-API-Key.
resp = httpx.post(
    "https://analytics.toolkitapi.io/v1/datasets/bundle",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "sources": [
            {
                "alias": "orders",
                "data_url": "https://storage.example.com/orders.csv",
                "file_type": "csv",
            },
            {
                "alias": "customers",
                "data_url": "https://storage.example.com/customers.parquet",
                "file_type": "parquet",
            },
        ],
        "joins": [
            {
                "left_alias": "orders",
                "right_alias": "customers",
                "left_key": "customer_id",
                "right_key": "id",
                "join_type": "INNER",
            }
        ],
    },
)
resp.raise_for_status()  # surface HTTP errors instead of printing an error body
print(resp.json())
const resp = await fetch("https://analytics.toolkitapi.io/v1/datasets/bundle", {
  method: "POST",
  headers: {
    "X-API-Key": "YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "sources": [
      {
        "alias": "orders",
        "data_url": "https://storage.example.com/orders.csv",
        "file_type": "csv"
      },
      {
        "alias": "customers",
        "data_url": "https://storage.example.com/customers.parquet",
        "file_type": "parquet"
      }
    ],
    "joins": [
      {
        "left_alias": "orders",
        "right_alias": "customers",
        "left_key": "customer_id",
        "right_key": "id",
        "join_type": "INNER"
      }
    ]
  }),
});
const data = await resp.json();
console.log(data);
Response 200 OK
{
  "dataset_id": "ds_bundle_xyz789",
  "sources": ["orders", "customers"],
  "columns": [
    {"name": "orders.order_id", "type": "Int64", "nullable": false},
    {"name": "orders.customer_id", "type": "Int64", "nullable": false},
    {"name": "orders.revenue", "type": "Float64", "nullable": true},
    {"name": "customers.id", "type": "Int64", "nullable": false},
    {"name": "customers.region", "type": "String", "nullable": true},
    {"name": "customers.tier", "type": "String", "nullable": true}
  ],
  "schema_fingerprint": "fp_bundle_xyz789"
}

How to Use

1. Upload or locate each data file and obtain a URL the API can reach (public endpoint or pre-signed S3/GCS URL).
2. Assign a short, unique `alias` to each source (e.g. `orders`, `customers`).
3. Define at least one `JoinDefinition` pairing `left_alias` + `left_key` with `right_alias` + `right_key`.
4. `POST` the payload to `/v1/datasets/bundle`.
5. Note the returned `dataset_id` and pass it to `/v1/analyze` or `/v1/visualize`.
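
Steps 2 and 3 can be checked on the client before any request is sent. The sketch below is one possible way to do that, not part of the API: `build_bundle_payload` is a hypothetical helper, and the checks it applies (two to five sources, unique aliases, at least one join, joins that only reference declared aliases) are read off the description on this page rather than from a published schema.

```python
from typing import Any


def build_bundle_payload(
    sources: list[dict[str, str]], joins: list[dict[str, str]]
) -> dict[str, Any]:
    """Hypothetical helper: validate the documented limits, return the request body."""
    if not 2 <= len(sources) <= 5:
        raise ValueError("a bundle needs two to five sources")
    aliases = [source["alias"] for source in sources]
    if len(set(aliases)) != len(aliases):
        raise ValueError("each source alias must be unique")
    if not joins:
        raise ValueError("at least one join definition is required")
    for join in joins:
        for side in ("left_alias", "right_alias"):
            if join[side] not in aliases:
                raise ValueError(f"join references unknown alias: {join[side]}")
    return {"sources": sources, "joins": joins}


payload = build_bundle_payload(
    sources=[
        {
            "alias": "orders",
            "data_url": "https://storage.example.com/orders.csv",
            "file_type": "csv",
        },
        {
            "alias": "customers",
            "data_url": "https://storage.example.com/customers.parquet",
            "file_type": "parquet",
        },
    ],
    joins=[
        {
            "left_alias": "orders",
            "right_alias": "customers",
            "left_key": "customer_id",
            "right_key": "id",
            "join_type": "INNER",
        }
    ],
)
print(sorted(payload))  # ['joins', 'sources']
```

The returned dictionary has the same shape as the request body in the examples above, so it can be passed as the `json` argument of the `httpx.post` call.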

About This Tool

The **Bundle Dataset** endpoint fetches two to five remote data files, executes one or more joins across them, and registers the resulting virtual dataset under a fresh `dataset_id`. That ID can be passed immediately to `/v1/analyze`, `/v1/visualize`, or `/v1/validate-chart` — exactly like a dataset produced from a single source.

Sources can be a mix of formats (CSV, JSON, Parquet, TSV) hosted on any public or pre-signed URL. Each source receives a short `alias` that becomes the column-name prefix in the combined schema (e.g. `orders.revenue`, `customers.region`), preventing collisions and making downstream query expressions unambiguous.
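
Because every column name carries its source alias as a prefix, the combined schema can be split back into per-source column lists. The sketch below does this for the sample response shown earlier; `columns_by_alias` is a hypothetical helper, and it assumes an alias never contains a dot, which this page does not state explicitly.

```python
from collections import defaultdict

# Column entries copied from the sample 200 OK response above.
columns = [
    {"name": "orders.order_id", "type": "Int64", "nullable": False},
    {"name": "orders.customer_id", "type": "Int64", "nullable": False},
    {"name": "orders.revenue", "type": "Float64", "nullable": True},
    {"name": "customers.id", "type": "Int64", "nullable": False},
    {"name": "customers.region", "type": "String", "nullable": True},
    {"name": "customers.tier", "type": "String", "nullable": True},
]


def columns_by_alias(cols: list[dict]) -> dict[str, list[str]]:
    """Group column names by the alias prefix before the first dot."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for col in cols:
        # Assumption: the alias itself contains no dot.
        alias, _, name = col["name"].partition(".")
        grouped[alias].append(name)
    return dict(grouped)


grouped = columns_by_alias(columns)
print(grouped["orders"])     # ['order_id', 'customer_id', 'revenue']
print(grouped["customers"])  # ['id', 'region', 'tier']
```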

Up to four `JoinDefinition` entries chain the sources together. For a cross-product or an anti-join, set `join_type` to `CROSS` or `LEFT ANTI`, respectively.
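
With several sources it is easy to leave one out of the join chain. The sketch below is a client-side sanity check under two assumptions: that the four-join limit applies per request, and that every source is meant to be connected to the others through the joins (this page does not say whether the API enforces that). `unjoined_aliases` and the `products` alias are hypothetical names used only for illustration.

```python
def unjoined_aliases(aliases: list[str], joins: list[dict[str, str]]) -> set[str]:
    """Return the aliases that cannot be reached from the first source via the joins."""
    if len(joins) > 4:
        raise ValueError("at most four join definitions are allowed")
    if not aliases:
        return set()
    # Build an undirected graph: each join links its left and right alias.
    neighbours: dict[str, set[str]] = {alias: set() for alias in aliases}
    for join in joins:
        neighbours.setdefault(join["left_alias"], set()).add(join["right_alias"])
        neighbours.setdefault(join["right_alias"], set()).add(join["left_alias"])
    # Depth-first walk starting from the first declared source.
    seen = {aliases[0]}
    stack = [aliases[0]]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return set(aliases) - seen


joins = [
    {
        "left_alias": "orders",
        "right_alias": "customers",
        "left_key": "customer_id",
        "right_key": "id",
        "join_type": "INNER",
    }
]
missing = unjoined_aliases(["orders", "customers", "products"], joins)
print(missing)  # {'products'}
```

An empty result means every declared source takes part in the chain.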
