DataHub API
The DataHub API allows you to:
- Send data records (events) to DataHub to create new datasets or update existing ones.
- Remove data records from DataHub.
Data formats
The DataHub API supports two data formats:
Data format | Endpoint | Media type of the request body |
---|---|---|
JSON | /datahub/v1/events | application/json |
CSV (uncompressed, or ZIP or GZ of a single CSV file) | /datahub/v1/csv/upload | multipart/form-data |
Once you receive a success response (HTTP 201 Created), the ingested data becomes available in the DoiT console within 15 minutes, regardless of the data format.
JSON payload
Payload schema
Before sending your data, convert it to a JSON payload that follows the DataHub Events schema. The main properties of the schema are:
- provider: The identifier of the data provider. Choose a human-readable value so you can easily recognize the dataset.
- id: The unique identifier of an event (data record). DoiT DataHub uses the event ID to handle duplicate data: newer data with the same ID overwrites the existing record. You also need the event ID to delete an incorrect ingestion (an event can be deleted 90 minutes after ingestion).
- time: The timestamp of the event.
- dimensions: The dimensions of the dataset. You can specify three types of dimensions to serve as filters or grouping criteria in Cloud Analytics:
  - fixed: Standard dimensions of DoiT Cloud Analytics. See Allowed keys for fixed dimensions.
  - label: Custom dimensions of your choice. They appear under Labels in Cloud Analytics.
  - project_label: Labels set at the Google Cloud project level. They appear under Project Labels in Cloud Analytics.
- metrics: The applicable metrics of the ingested data. You can specify two types of metrics:
  - cost and usage: These two metrics map to the Cost and Usage basic metrics in Cloud Analytics.
  - Custom metrics: Metrics relevant to your objectives, such as business metrics or internal usage metrics. They appear under DataHub metrics in Cloud Analytics.
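The properties listed above can be assembled into a single event with a small helper. The sketch below is one way to do it, under stated assumptions: `make_event` and `EVENT_ID_NAMESPACE` are hypothetical names (not part of any DoiT SDK), the timestamp is serialized in UTC with a trailing `Z` as in the example payload on this page, and the event ID is derived deterministically from the provider, time, and dimensions. Deriving the ID this way means that re-sending a corrected record reuses its ID, so DataHub's duplicate handling overwrites the old data instead of adding a second record.

```python
import uuid
from datetime import datetime, timezone

# Dimension types listed in the schema description.
DIMENSION_TYPES = {"fixed", "label", "project_label"}

# Hypothetical, fixed namespace for deriving stable event IDs (any constant UUID works).
EVENT_ID_NAMESPACE = uuid.UUID("6f1c2a0e-3b7d-4c55-9a43-2d8e5b1f0c77")


def make_event(provider, when, dimensions, metrics):
    """Build one DataHub event dict (hypothetical helper).

    provider:   human-readable provider name
    when:       timezone-aware datetime of the event
    dimensions: list of (type, key, value) string tuples
    metrics:    dict of metric type -> numeric value
    """
    if when.tzinfo is None:
        raise ValueError("'when' must be timezone-aware")
    for dim_type, _, _ in dimensions:
        if dim_type not in DIMENSION_TYPES:
            raise ValueError(f"unknown dimension type: {dim_type!r}")

    # Serialize in UTC with a trailing "Z", matching the example payload.
    time_str = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Derive the ID from the fields that identify the record (not the metric values),
    # so re-sending a corrected record reuses its ID and overwrites the old data.
    natural_key = "|".join(
        [provider, time_str] + [f"{t}:{k}={v}" for t, k, v in sorted(dimensions)]
    )
    event_id = str(uuid.uuid5(EVENT_ID_NAMESPACE, natural_key))

    return {
        "provider": provider,
        "id": event_id,
        "time": time_str,
        "dimensions": [{"key": k, "type": t, "value": v} for t, k, v in dimensions],
        "metrics": [{"type": name, "value": value} for name, value in metrics.items()],
    }


# Usage: rebuilds the event from the example payload, with a derived ID.
event = make_event(
    "Product usage",
    datetime(2024, 3, 1, 23, 0, tzinfo=timezone.utc),
    [("label", "Team", "platform")],
    {"User Interactions": 1042},
)
```

The helper treats metric names as opaque strings; it assumes the `cost` and `usage` metrics are sent with those literal type values, which you should confirm against the schema before relying on it.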
Make sure to sanitize your data, for example, mask personally identifiable information (PII), before sending it to DoiT.
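One way to mask PII while keeping it usable as a label for grouping is to replace each value with a keyed hash before building the event. The sketch below uses Python's standard `hmac` and `hashlib` modules; `mask_pii` and `MASKING_KEY` are hypothetical names, and whether a keyed hash is sufficient masking for your data is a policy decision on your side, not something DataHub prescribes.

```python
import hashlib
import hmac

# Hypothetical secret that stays on your side; never send it to DoiT.
MASKING_KEY = b"replace-with-a-secret-key"


def mask_pii(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a PII value with a short keyed hash.

    The same input always maps to the same output, so masked values still group
    correctly in reports, but the original value is not sent anywhere.
    """
    normalized = value.strip().lower()  # treat "Alice@Example.com" and "alice@example.com" alike
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # 64 bits is plenty to keep distinct users apart in practice
```

For example, `mask_pii("alice@example.com")` can be used as the `value` of a `label` dimension in place of the raw email address.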
Example payload
Below is an example body for a POST request to /datahub/v1/events. It sends a single event with one label dimension and one custom metric.
```json
{
  "events": [
    {
      "provider": "Product usage",
      "id": "beb21d99-a8c9-4dc0-8a69-5d684cc41e6c",
      "dimensions": [
        {
          "key": "Team",
          "type": "label",
          "value": "platform"
        }
      ],
      "time": "2024-03-01T23:00:00Z",
      "metrics": [
        {
          "value": 1042,
          "type": "User Interactions"
        }
      ]
    }
  ]
}
```
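A payload like the one above can be sent with Python's standard library alone. This is a minimal sketch that assumes the API is served from `https://api.doit.com` and authenticated with a DoiT API key passed as a bearer token; confirm both against your own API settings. `build_events_request` and `send_events` are hypothetical helpers. The request is built separately from the network call so it can be inspected before anything is sent.

```python
import json
import urllib.request

# Assumption: base URL of the DoiT API; verify for your account.
API_BASE = "https://api.doit.com"


def build_events_request(events, api_key, base_url=API_BASE):
    """Build (but do not send) a POST request carrying {"events": [...]} as JSON."""
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/datahub/v1/events",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: API key sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


def send_events(events, api_key):
    """Send events and return the HTTP status; raises on anything but 201."""
    req = build_events_request(events, api_key)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(req, timeout=30) as resp:
        if resp.status != 201:
            raise RuntimeError(f"unexpected status {resp.status}")
        return resp.status
```

A 201 response from `send_events` means the events were accepted; they then take up to 15 minutes to appear in the DoiT console.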
CSV
See CSV ingestion for the syntax, conventions, and limitations when sending data using CSV.
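The data formats table lists a GZ of a single CSV file as an accepted upload to /datahub/v1/csv/upload. As a sketch of preparing such a file with Python's standard `gzip` module, the hypothetical `gzip_csv` helper below compresses one CSV file next to the original; the multipart upload itself, including the form field name, follows the CSV ingestion documentation and is not shown here.

```python
import gzip
import shutil
from pathlib import Path


def gzip_csv(csv_path):
    """Compress a single CSV file to <name>.csv.gz in the same directory and return its path."""
    src = Path(csv_path)
    dst = src.with_name(src.name + ".gz")
    # Stream the file so large CSVs are not loaded into memory at once.
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```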