Databases and Data Sources

HEAT manages data sources programmatically: each data source is described and administered through the API or the Cluster Manager. Out of the box, HEAT includes several internal managed data sources that operate within the cluster to provide essential functionality. These internal sources are not designed for external access.

In addition to these data sources, HEAT maintains a number of internal databases that support its core operations. These include:

Platform Configuration: Stores configuration settings for the HEAT platform.
Session and Ingest Configurations: Maintains the state of sessions and ingest configurations.

Crucially, these internal databases store no user data beyond configuration. Instead, the HEAT core engine functions as a state engine that holds pointers to where ingested data is actually stored: a functionally separate, external data store. This design keeps all user-supplied data external to HEAT’s internal state, while the platform itself handles request routing and overall orchestration.
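
The pointer-based design described above can be illustrated with a small sketch. It is not HEAT's actual schema or code: the class names, field names, and the example URI are all hypothetical, chosen only to show a state engine that records where session data lives without holding the data itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSourceRef:
    """Hypothetical pointer to an external store (not HEAT's real schema)."""
    name: str
    uri: str


@dataclass
class StateEngine:
    """Toy state engine: tracks which data source each session uses."""
    sources: dict = field(default_factory=dict)            # name -> DataSourceRef
    session_to_source: dict = field(default_factory=dict)  # session id -> name

    def register_source(self, ref: DataSourceRef) -> None:
        self.sources[ref.name] = ref

    def bind_session(self, session_id: str, source_name: str) -> None:
        # Only a reference is stored; the session's data stays external.
        if source_name not in self.sources:
            raise KeyError(f"unknown data source: {source_name}")
        self.session_to_source[session_id] = source_name

    def resolve(self, session_id: str) -> DataSourceRef:
        """Return the pointer a request for this session would be routed to."""
        return self.sources[self.session_to_source[session_id]]


if __name__ == "__main__":
    engine = StateEngine()
    # The URI below is a placeholder, not a real HEAT connection string.
    engine.register_source(
        DataSourceRef("project-a-db", "postgresql://db.example.internal:5432/sessions")
    )
    engine.bind_session("session-001", "project-a-db")
    print(engine.resolve("session-001"))
```

Because the engine holds only names and URIs, replacing or upgrading the engine's own storage would not touch the ingested data, which is the property the design above relies on.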

Understanding this data architecture is critical for on-premises deployments, where HEAT often interacts with databases that are not exclusively managed by the platform. Software updates frequently change the schemas and internal state of HEAT’s own databases. For this reason, we do not recommend connecting directly to HEAT’s internal databases (such as those used by the authentication system or core state engine).

To ease local deployments, HEAT provides managed services for common storage needs:

A managed blobstore service (MinIO-based)
An Azure Storage emulator
A Cosmos DB emulator

Note: These managed services are intended for testing, validation, and internal use only; they are typically not used in production configurations.

Running External Database Engines

Storing session data in an external database is recommended. To do so, you can run database engines alongside HEAT using one of the following approaches:

Within the Cluster: Deploy the database in a separate namespace within your Kubernetes cluster and expose it through a NodePort Service. HEAT treats a database deployed this way as an external data source.
Outside the Cluster: Install the database on a separate machine, VM, or any other infrastructure that suits your needs.
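
For the in-cluster approach, the sketch below builds a Kubernetes Service manifest of type NodePort and prints it as JSON, which kubectl accepts alongside YAML. The Service name, namespace, label, and ports are assumptions for illustration (a PostgreSQL-style port is used only as an example); substitute the values for your own database deployment.

```python
import json


def nodeport_service(name: str, namespace: str, app_label: str,
                     port: int, node_port: int) -> dict:
    """Build a Kubernetes Service manifest that exposes a database via NodePort."""
    # Kubernetes allocates NodePorts from 30000-32767 by default; a cluster
    # configured with a different range would need this check adjusted.
    if not 30000 <= node_port <= 32767:
        raise ValueError("node_port is outside the default 30000-32767 range")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "NodePort",
            # Must match the labels on the database pods.
            "selector": {"app": app_label},
            "ports": [
                {"port": port, "targetPort": port, "nodePort": node_port}
            ],
        },
    }


if __name__ == "__main__":
    # All names and port numbers here are hypothetical examples.
    manifest = nodeport_service(
        name="session-db",
        namespace="databases",
        app_label="session-db",
        port=5432,
        node_port=30432,
    )
    print(json.dumps(manifest, indent=2))
```

The printed manifest can be saved to a file and applied with `kubectl apply -f`. The database then becomes reachable on the chosen node port of any cluster node, which is the address you would supply when adding the data source to HEAT.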

Once your external database is set up, you can add the data source to HEAT via the Cluster Manager or through the API. This flexibility allows you to integrate multiple databases or data sources into a single HEAT instance, catering to different projects or sessions.
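
Before adding a data source, it can be useful to confirm that the database endpoint accepts TCP connections from the machine you are working on. The helper below is a generic sketch that uses only the Python standard library; it does not call any HEAT API, and the host and port in the usage example are placeholders.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP connect;
        # the context manager closes the socket again on success.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with your node address and NodePort,
    # or with the host and port of a database outside the cluster.
    print(is_reachable("db.example.internal", 30432))
```

A successful check only shows that the port is open. Credentials, TLS settings, and database permissions still need to be verified when the data source is registered.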

This architecture enables HEAT to maintain its core functionality while seamlessly interfacing with external data stores, whether on-premises or in the cloud.