The Hardware and System Substrate layer covers CPUs, GPUs, TPUs, NPUs, FPGAs, custom accelerators, memory hierarchy, drivers, interconnects, operating systems, containers, virtual machines, and isolation primitives.
Key takeaways
- The layer's responsibilities start with capacity and topology discovery.
- The layer fails in recognizable ways, such as device loss or reset.
- A product may span this layer and adjacent layers; classify responsibilities rather than brand language.
Definition and scope
The layer spans compute devices (CPUs, GPUs, TPUs, NPUs, FPGAs, and custom accelerators), the memory hierarchy, drivers, and interconnects, together with the host environment: operating systems, containers, virtual machines, and isolation primitives.
Responsibilities
- Capacity and topology discovery
- Device and host memory allocation
- Driver and firmware compatibility
- Interconnect and collective availability
- Power, thermal, NUMA, and tenancy constraints
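As a rough illustration of the first responsibility, the sketch below collects a minimal capacity and topology summary using only the Python standard library. It assumes a Linux host that exposes NUMA nodes under /sys/devices/system/node; the function name discover_host and the shape of the returned dictionary are hypothetical, and accelerator discovery is deliberately left out because it depends on vendor tooling.

```python
import os
import re
from pathlib import Path


def discover_host(node_dir="/sys/devices/system/node"):
    """Summarize CPU capacity and NUMA topology visible to this process.

    node_dir is assumed to be the Linux sysfs directory that lists NUMA
    nodes as node0, node1, ...; if it is absent, the node list is empty.
    """
    numa_nodes = []
    root = Path(node_dir)
    if root.is_dir():
        for entry in root.iterdir():
            # Only entries named node<N> are NUMA nodes; skip other files.
            match = re.fullmatch(r"node(\d+)", entry.name)
            if match:
                numa_nodes.append(int(match.group(1)))
    return {
        "logical_cpus": os.cpu_count(),  # may be None if undetermined
        "numa_nodes": sorted(numa_nodes),
    }
```

Calling discover_host() on a host without that sysfs directory still returns a summary, with an empty numa_nodes list, so callers can treat missing topology data as "unknown" rather than as an error.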
Inputs, outputs, and boundaries
The layer consumes artifacts or requests from the layer above and relies on services from the layer below. Its contract should define supported inputs, produced outputs, lifecycle, compatibility, resource ownership, and failure semantics.
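One way to keep such a contract honest is to record it as data and check which parts are still undefined. The sketch below is only an illustration: the class name LayerContract, its field names (taken from the list in the paragraph above), and the example values are all hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class LayerContract:
    """Hypothetical record of the items a layer contract should define."""
    supported_inputs: list = field(default_factory=list)
    produced_outputs: list = field(default_factory=list)
    lifecycle: str = ""
    compatibility: str = ""
    resource_ownership: str = ""
    failure_semantics: str = ""

    def unspecified(self):
        """Return the names of contract fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values, chosen only to show a partially written contract.
contract = LayerContract(
    supported_inputs=["compiled kernels", "allocation requests"],
    failure_semantics="device reset surfaces as an explicit error",
)
print(contract.unspecified())
# → ['produced_outputs', 'lifecycle', 'compatibility', 'resource_ownership']
```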
Failure modes
- Device loss or reset
- Out-of-memory or fragmentation
- Driver/runtime incompatibility
- Fabric congestion or partition
- Thermal throttling or power cap
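The failure modes above can be given stable names so that incidents are counted consistently across tools. The following is a minimal sketch under that assumption; the enum FailureMode and the helper summarize are hypothetical names, and how an event gets classified into a mode is left to the surrounding system.

```python
from collections import Counter
from enum import Enum


class FailureMode(Enum):
    """The five failure modes listed for this layer."""
    DEVICE_LOSS_OR_RESET = "device loss or reset"
    OUT_OF_MEMORY_OR_FRAGMENTATION = "out-of-memory or fragmentation"
    DRIVER_RUNTIME_INCOMPATIBILITY = "driver/runtime incompatibility"
    FABRIC_CONGESTION_OR_PARTITION = "fabric congestion or partition"
    THERMAL_THROTTLING_OR_POWER_CAP = "thermal throttling or power cap"


def summarize(events):
    """Count observed failures per mode, including modes with zero events."""
    counts = Counter(events)
    return {mode.value: counts.get(mode, 0) for mode in FailureMode}
```

Reporting zero counts explicitly makes it visible which failure modes were monitored but not observed, as opposed to not monitored at all.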
Implementation guidance
- Document exact device, memory, driver, firmware, interconnect, and topology assumptions.
- Treat hardware isolation and confidential-computing capabilities as explicit deployment properties.
- Measure workload behavior under realistic power, thermal, and multi-tenant conditions.
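The first two points can be sketched as a declared manifest that is compared against what a deployment actually observes. Everything named here is an assumption made for illustration: the class SubstrateAssumptions, its fields, and the helper mismatches are hypothetical, and how the observed values are collected is out of scope.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SubstrateAssumptions:
    """Hypothetical manifest of the assumptions a workload depends on."""
    device: str
    memory_gib: int
    driver: str
    firmware: str
    interconnect: str
    topology: str
    # Isolation properties are recorded explicitly, not implied.
    hardware_isolation: bool
    confidential_computing: bool


def mismatches(declared, observed):
    """Map each differing field to a (declared, observed) pair."""
    d, o = asdict(declared), asdict(observed)
    return {key: (d[key], o[key]) for key in d if d[key] != o[key]}
```

An empty result means the deployment matches the documented assumptions; any entry names the exact field that drifted, which is more actionable than a generic "environment mismatch" error.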
Metrics
Measure the layer with workload-appropriate objectives. Avoid comparing unrelated categories or publishing unqualified performance numbers.
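A small guard can enforce both rules mechanically. The sketch below assumes a measurement counts as qualified only when it carries a unit, a workload category, and a description of test conditions, and that two measurements are comparable only when metric, unit, and workload all match; the names Measurement and comparable, and these specific criteria, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A performance number together with the context that qualifies it."""
    metric: str
    value: float
    unit: str
    workload: str
    conditions: str

    def __post_init__(self):
        # Reject numbers that arrive without their qualifying context.
        for name in ("unit", "workload", "conditions"):
            if not getattr(self, name).strip():
                raise ValueError(f"unqualified measurement: missing {name}")


def comparable(a, b):
    """True only when both measurements share metric, unit, and workload."""
    return (a.metric, a.unit, a.workload) == (b.metric, b.unit, b.workload)
```

Test conditions are intentionally not part of the comparability check: two runs of the same workload under different power or tenancy conditions are exactly the comparison this layer needs, as long as the conditions are stated.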
