Data Center Thermal and Energy Efficiency: Modeling Sensor Degradation, Control Latency, and Hotspot Probability During Workload Transitions
Abstract
This article develops an engineering-oriented framework that treats data center cooling as a reliability decision system, quantifying how uncertainty propagates through measurement, state estimation, threshold governance, and control execution to determine hotspot exceedance probability, excursion duration distributions, time-to-mitigation, nuisance interventions, and energy overhead. A scenario-based quantitative comparison is presented for a representative row-based data hall with variable-speed CRAC/CRAH control and workload-driven power variability, covering four operational architectures: baseline threshold control, increased sensing without governance, model-predictive control with limited drift handling, and a governance-optimized two-tier architecture combining nuisance-constrained alarms, drift-aware verification, workload-aware preemptive control, and staged mitigation actions. Results indicate that (i) the tail risk of hotspot excursion duration is dominated by control latency and sensor bias drift rather than by average temperature, (ii) dense sensing reduces random uncertainty but can increase nuisance interventions if alarms are not governed, and (iii) a governed two-tier strategy reduces hotspot tail risk while maintaining energy efficiency by shifting effort from disruptive interventions to bounded verification and preemptive setpoint shaping. The paper provides copy-ready tables and full prompts for data-driven figures, suitable for Techne submission and for adaptation to site-specific telemetry.
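
To make the comparison concrete, the following minimal Monte Carlo sketch (not drawn from the article's calibrated model; every parameter, threshold, and time constant is an illustrative assumption) simulates a single-sensor inlet-temperature loop with workload steps, additive sensor bias drift, and a fixed alarm-to-actuation latency, and contrasts an ungoverned single-threshold trigger with a two-tier verify-then-act policy of the kind the governed architecture describes:

# Illustrative Monte Carlo sketch (assumptions only, not the article's calibrated model):
# hotspot exceedance under sensor bias drift and alarm-to-actuation latency.
import numpy as np

rng = np.random.default_rng(0)

DT = 10.0                  # time step [s]
N = int(24 * 3600 / DT)    # one simulated day
T_LIMIT = 32.0             # true inlet-temperature hotspot limit [degC]
T_ALARM = 30.5             # alarm threshold on the *measured* temperature [degC]
TAU_ROOM = 120.0           # first-order thermal time constant [s]
LATENCY_S = 180.0          # delay from trigger to effective cooling action [s]
COOLING_RELIEF = 3.0       # temperature relief once mitigation engages [degC]
DRIFT_PER_DAY = 0.8        # sensor bias drift [degC/day]; sign sets over/under-reading
NOISE_STD = 0.3            # sensor random noise, 1-sigma [degC]
VERIFY_S = 60.0            # persistence window for the governed two-tier policy [s]

def simulate(two_tier: bool) -> dict:
    """One day of inlet temperature with random workload steps.

    two_tier=False: commit to mitigation on the first measured-temperature alarm.
    two_tier=True : require the alarm to persist for VERIFY_S before committing.
    """
    t_true, cooling, load = 27.0, 0.0, 4.0
    pending = None            # step index at which mitigation becomes effective
    verify_steps = 0
    exceed_steps = nuisance = 0
    excursions, run = [], 0

    for k in range(N):
        if k % int(1200 / DT) == 0:                  # workload step every ~20 min
            load = rng.uniform(3.0, 7.0)             # steady-state rise over supply air [degC]

        # First-order plant response toward supply + load - applied cooling relief.
        t_true += DT / TAU_ROOM * ((26.0 + load - cooling) - t_true)

        # Measurement = truth + slowly accumulating bias + random noise.
        bias = DRIFT_PER_DAY * (k * DT) / 86400.0
        t_meas = t_true + bias + rng.normal(0.0, NOISE_STD)

        alarm = t_meas > T_ALARM
        if alarm and pending is None and cooling == 0.0:
            verify_steps += 1
            if (not two_tier) or verify_steps >= int(VERIFY_S / DT):
                pending = k + int(LATENCY_S / DT)    # commit; actuation lags by LATENCY_S
                verify_steps = 0
                if t_true <= T_ALARM:
                    nuisance += 1                    # committed while the truth was still fine
        elif not alarm:
            verify_steps = 0

        if pending is not None and k >= pending:
            cooling, pending = COOLING_RELIEF, None  # mitigation finally takes effect
        if cooling > 0.0 and t_meas < T_ALARM - 1.0:
            cooling = 0.0                            # release once comfortably below alarm

        # Track true exceedance of the hotspot limit and excursion durations.
        if t_true > T_LIMIT:
            exceed_steps, run = exceed_steps + 1, run + 1
        elif run:
            excursions.append(run * DT)
            run = 0

    return {"exceedance_prob": exceed_steps / N,
            "nuisance_commits": nuisance,
            "mean_excursion_s": float(np.mean(excursions)) if excursions else 0.0}

for label, governed in [("single-threshold", False), ("two-tier verified", True)]:
    print(f"{label:18s} {simulate(governed)}")

With a positive (over-reading) drift, the ungoverned policy tends to accumulate more nuisance commits; flipping DRIFT_PER_DAY negative illustrates the opposite failure, in which an under-reading sensor delays the alarm and lengthens the true excursion tail, the latency-dominated behavior summarized in finding (i).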