Node Instances
A Node Instance represents a single realization of a node within a specific Session. When a node from a Session Template is instantiated in a new session, HEAT creates a corresponding Node Instance. This allows each session to have its own self-contained version of the workflow, with data and configuration particular to that session’s requirements.
How Node Instances Differ From Node Templates
- Node Templates describe the type of operation (e.g., input, transform, processing) and the general configuration schema.
- Session Template Nodes are specific occurrences of those Node Templates inside a Session Template’s DAG, with additional details such as node name, references to data sources, or specialized settings.
- Node Instances are the runtime manifestations of these Session Template nodes, bound to a specific Session. They track per-session state, data outputs, and processing status.
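The three layers above can be pictured as plain data structures. The following is a minimal sketch, not HEAT's actual data model: every class, field, and function name here (`NodeTemplate`, `SessionTemplateNode`, `NodeInstance`, `instantiate`) is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class NodeTemplate:
    """Type of operation plus a general configuration schema (hypothetical shape)."""
    kind: str                       # e.g. "input", "transform", "processing"
    config_schema: dict[str, type]  # setting name -> expected Python type


@dataclass(frozen=True)
class SessionTemplateNode:
    """One occurrence of a NodeTemplate inside a Session Template's DAG."""
    name: str
    template: NodeTemplate
    parents: tuple[str, ...] = ()
    settings: dict[str, Any] = field(default_factory=dict)


@dataclass
class NodeInstance:
    """Runtime manifestation of a SessionTemplateNode, bound to one session."""
    session_id: str
    source: SessionTemplateNode
    state: str = "Initialization"
    config: dict[str, Any] = field(default_factory=dict)
    outputs: list[Any] = field(default_factory=list)


def instantiate(session_id: str, nodes: list[SessionTemplateNode]) -> dict[str, NodeInstance]:
    """Create one NodeInstance per template node, copying the default settings."""
    return {
        node.name: NodeInstance(session_id, node, config=dict(node.settings))
        for node in nodes
    }
```

Because `instantiate` copies the settings into each instance, changing the configuration of an instance in one session leaves the template node and every other session untouched.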
Key Characteristics
- Unique to Each Session
  Each session spawns its own set of Node Instances, guaranteeing session-specific customizations and data remain isolated. Tweaking a Node Instance within one session doesn’t affect the Session Template or any other session.
- DAG Position & Parents
  A Node Instance knows its “parents” (the nodes it depends on) from the Session Template. Once all parent Node Instances have completed (or entered a non-pending state), the Node Instance is eligible for processing.
- Configuration & Overrides
  Although Node Instances inherit default settings from the Session Template node, you can adjust their configuration at the session level. This is often used for troubleshooting or one-off runs where you need to change parameters without altering the original Session Template.
- Processing State
  - Initialization: The Node Instance is recognized, but no processing has begun.
  - Processing: Currently being processed by a Runner.
  - Success: Finished processing successfully.
  - Failed: Encountered an error during processing.
- Outputs
  The outcome of every node processing is a Node Output. Each time a Node Instance is processed, it produces a new Node Output containing the results of that specific run.
  - Multiple Outputs Per Node Instance: When parent nodes update and trigger reprocessing, the same Node Instance may generate additional outputs. This enables you to maintain a history of outputs (“go back in time”) and compare results from different processing attempts.
  - Idempotence: All node processing should be idempotent. Reprocessing a node should not overwrite existing outputs; instead, it always produces a new Node Output. This preserves previous outputs, ensuring that descendant nodes can reference the correct, historical input when needed.
  - Data Flow: The Node Output typically becomes the input for downstream (descendant) nodes. This chaining ensures that each stage of the workflow uses validated and versioned data, allowing for precise tracking and troubleshooting.
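The parent rule and the processing states described above can be sketched as a small eligibility check. This is an illustration under stated assumptions, not HEAT's scheduler: the `State` enum and `eligible_nodes` function are hypothetical names, the sketch treats only `Success` as "completed" (a stricter reading than "non-pending"), and it covers only a node's first run, not reprocessing.

```python
from enum import Enum


class State(Enum):
    """The four processing states listed above (enum itself is hypothetical)."""
    INITIALIZATION = "Initialization"
    PROCESSING = "Processing"
    SUCCESS = "Success"
    FAILED = "Failed"


def eligible_nodes(parents: dict[str, list[str]], states: dict[str, State]) -> list[str]:
    """Return the nodes that have not started yet and whose parents all succeeded.

    parents: node name -> names of the nodes it depends on
    states:  node name -> current State
    """
    ready = []
    for name, deps in parents.items():
        if states[name] is not State.INITIALIZATION:
            continue  # already running or finished
        # Assumption: a parent counts as "completed" only when it succeeded.
        if all(states[dep] is State.SUCCESS for dep in deps):
            ready.append(name)
    return ready
```

A node without parents, such as an input node, passes the check immediately because `all()` over an empty list of dependencies is true.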
Reprocessing & Iteration
A Node Instance may be reprocessed when:
- Parent Data Changes: A parent node might receive new data or be reprocessed, making the downstream Node Instance eligible for processing again.
- Manual Tweaks: A user or administrator modifies the Node Instance’s configuration for debugging or targeted analysis.
- Automated Triggers: Some workflows automatically update a Node Instance if a certain condition is met (e.g., a scheduled recheck for fresh data).
Because node processing is meant to be idempotent, repeated re-runs do not corrupt existing results; each run generates a new output while earlier ones are preserved.
Warning: The guarantee of idempotency is the responsibility of the runner implementation. We strongly recommend against developing runners or nodes that alter database state or overwrite existing records. Instead, adopt an append-only style of output where every reprocessing event produces a new output, preserving historical data and ensuring reliable traceability.
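The append-only style recommended here can be illustrated with a small in-memory store. This is a sketch only: `NodeOutput` as a Python class, its `version` field, and the `OutputLog` class are hypothetical and do not describe how HEAT persists outputs.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NodeOutput:
    """One immutable result of a single processing run (hypothetical shape)."""
    node_name: str
    version: int  # 1 for the first run, 2 for the first re-run, and so on
    data: Any


class OutputLog:
    """Append-only store: outputs are added, never replaced or deleted."""

    def __init__(self) -> None:
        self._outputs: dict[str, list[NodeOutput]] = {}

    def append(self, node_name: str, data: Any) -> NodeOutput:
        """Record the result of a run as a new output with the next version."""
        history = self._outputs.setdefault(node_name, [])
        output = NodeOutput(node_name, len(history) + 1, data)
        history.append(output)
        return output

    def latest(self, node_name: str) -> NodeOutput:
        """Return the most recent output for a node."""
        return self._outputs[node_name][-1]

    def history(self, node_name: str) -> tuple[NodeOutput, ...]:
        """Return every output ever recorded for a node, oldest first."""
        return tuple(self._outputs.get(node_name, ()))
```

A runner written against a store like this cannot overwrite an earlier result, so descendant nodes can keep referring to the exact version they consumed.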
Common Use Cases
- Troubleshooting
  If a particular session exhibits unexpected behavior, you can adjust one Node Instance’s configuration and re-run it without altering the Session Template or other sessions.
- Partial Data Updates
  Multiple input nodes may feed incremental data (e.g., partial CSV files). Each arrival can trigger reprocessing of downstream Node Instances to incorporate the new information.
- Advanced Analytics
  Some Node Instances run CPU- or GPU-intensive tasks (e.g., ML inference). When reprocessing is necessary, HEAT provisions the appropriate Runner each time to ensure the Node Instance has the required resources.
Best Practices
- Keep Node Instances Short-Lived
  Node Instances exist primarily to represent processing stages within a single session. If you find yourself storing extensive state in a Node Instance configuration, consider externalizing that data (e.g., in a database).
- Log Changes
  When reprocessing or adjusting a Node Instance’s configuration, log the reason. This provides valuable debugging context if unexpected results arise.
- Validate Before Reprocessing
  Confirm that any updated inputs or settings are valid (e.g., correct data format or updated references to Data Sources) to avoid repeated failures.
- Monitor Node Status
  Use the HEAT dashboard or Cluster Manager logs to watch each Node Instance progress. This helps pinpoint bottlenecks or failing nodes in real time.
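Two of the practices above, logging the reason for a change and validating before reprocessing, can be combined in one helper. The sketch below is an assumption-laden illustration: `apply_override`, its parameters, the schema format (setting name mapped to a Python type), and the logger name are all hypothetical and are not part of any HEAT API.

```python
import logging
from typing import Any

logger = logging.getLogger("heat.example")  # hypothetical logger name


def apply_override(
    defaults: dict[str, Any],
    overrides: dict[str, Any],
    schema: dict[str, type],
    reason: str,
) -> dict[str, Any]:
    """Return a session-level config: defaults plus validated overrides.

    The defaults (inherited from the Session Template node) are never mutated.
    """
    if not reason.strip():
        raise ValueError("a reason is required for every configuration change")
    for key, value in overrides.items():
        if key not in schema:
            raise KeyError(f"unknown setting: {key}")
        if not isinstance(value, schema[key]):
            raise TypeError(f"{key} must be of type {schema[key].__name__}")
    merged = {**defaults, **overrides}  # new dict; later keys win
    logger.info("override applied to %s (reason: %s)", sorted(overrides), reason)
    return merged
```

Rejecting an invalid override before the node is re-run avoids a predictable failure, and the logged reason gives later readers the context for why this session diverges from its template.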
Lifecycle Example
- Session Creation
  A new session is created using a Session Template with three nodes: an input node, a transform node, and a final output node.
- Node Instances Spawned
  HEAT creates a Node Instance for each node in the Session Template. The input node instance is immediately processable (it’s waiting for data), while the transform and output nodes remain pending.
- Data Arrival
  Data arrives through an ingest, producing a Node Output on the targeted input node. The transform node instance is now eligible to run.
- Transform & Output
  The transform node instance runs, completes, and produces a Node Output, which then triggers the final output node.
- Reprocessing
  If the input node instance receives new data (e.g., a second file), it re-enters a pending/active state. Once completed, it produces a new Node Output without overwriting the previous one. Downstream nodes that depend on this data can be reprocessed, allowing you to track and compare outputs over time.
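The lifecycle above can be traced with a toy simulation of the three-node session. Everything in this sketch is hypothetical: the node names, the `run_session` function, and the stand-in transform (upper-casing rows) and output (a small summary) are chosen only to show how each ingest adds a new output at every stage without replacing earlier ones.

```python
from typing import Any

# Hypothetical three-node DAG: input -> transform -> output
PARENTS = {"input": [], "transform": ["input"], "output": ["transform"]}
ORDER = ["input", "transform", "output"]  # a topological order of the DAG


def run_session(ingests: list[list[str]]) -> dict[str, list[Any]]:
    """Process each ingest in turn and return the full output history per node."""
    outputs: dict[str, list[Any]] = {name: [] for name in PARENTS}
    for payload in ingests:
        # Data arrival: the ingest produces a new output on the input node.
        outputs["input"].append(payload)
        # Downstream nodes are (re)processed using their parent's latest output.
        for name in ORDER[1:]:
            parent = PARENTS[name][0]
            latest = outputs[parent][-1]
            if name == "transform":
                result = [row.upper() for row in latest]  # stand-in transform
            else:
                result = {"rows": len(latest), "data": latest}  # stand-in output
            outputs[name].append(result)  # append only, never overwrite
    return outputs
```

Calling `run_session([["a"], ["a", "b"]])` models a first file followed by a second one: each node ends up with two outputs, and the first output of every node is still available for comparison.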
Next Steps
- See how Node Instances arise in a Session after a Session Template is applied.
- Learn about Runners that power Node Instance execution.
- Explore Cluster Manager to monitor and troubleshoot Node Instances in real time.