Fan Tuning
Core Temperature
You can find out the core temperature with the "sensors" command from the "lm-sensors" package:
# sensors | grep Core
Chip 0 Core 0: +28.0°C (lowest = +23.0°C, highest = +29.0°C)
Chip 0 Core 4: +28.0°C (lowest = +25.0°C, highest = +31.0°C)
Chip 0 Core 8: +28.0°C (lowest = +25.0°C, highest = +31.0°C)
Chip 0 Core 12: +28.0°C (lowest = +25.0°C, highest = +31.0°C)
Chip 8 Core 16: +28.0°C (lowest = +24.0°C, highest = +36.0°C)
Chip 8 Core 20: +28.0°C (lowest = +23.0°C, highest = +36.0°C)
Chip 8 Core 24: +28.0°C (lowest = +22.0°C, highest = +34.0°C)
Chip 8 Core 28: +28.0°C (lowest = +22.0°C, highest = +35.0°C)
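If you want to feed these readings into a script, the lines can be parsed with a regular expression. The sketch below is a hypothetical helper (`parse_core_temps` is not part of lm-sensors); it assumes the "Chip N Core M: +T°C" layout shown above, which may differ between lm-sensors versions and drivers.

```python
from __future__ import annotations

import re

# Matches the start of lines like
# "Chip 0 Core 0: +28.0°C (lowest = +23.0°C, highest = +29.0°C)".
# Only the current reading is captured; lowest/highest are ignored.
CORE_RE = re.compile(r"^Chip\s+(\d+)\s+Core\s+(\d+):\s+([+-]?\d+(?:\.\d+)?)")


def parse_core_temps(text: str) -> dict[tuple[int, int], float]:
    """Return {(chip, core): current temperature in degrees C}."""
    temps: dict[tuple[int, int], float] = {}
    for line in text.splitlines():
        match = CORE_RE.match(line.strip())
        if match:
            chip, core = int(match.group(1)), int(match.group(2))
            temps[(chip, core)] = float(match.group(3))
    return temps
```

On a live system you could pass it `subprocess.run(["sensors"], capture_output=True, text=True).stdout` and take the `max()` of the returned values to get the hottest core.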
Fan Speed
You can find out the current fan speed with the "ipmitool" command from the "ipmitool" package:
ipmitool sdr type fan
fan0 | DDh | ok | 29.1 | 12000 RPM
fan1 | DEh | ok | 29.2 | 4400 RPM
fan2 | DFh | ok | 29.3 | 4500 RPM
fan3 | E2h | ok | 29.4 | 4300 RPM
fan4 | E3h | ok | 29.5 | 11900 RPM
fan5 | E4h | ok | 29.6 | 0 RPM
fan6 | E5h | ns | 29.7 | Disabled
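Likewise, the pipe-separated rows can be split into a fan-to-RPM map. This is a hypothetical helper, not part of ipmitool; it assumes the five-column layout above (name, record ID, status, entity ID, reading) and treats any row whose status is not "ok" or whose reading is not in RPM as having no reading.

```python
from __future__ import annotations


def parse_fan_sdr(text: str) -> dict[str, int | None]:
    """Map each fan name to its RPM, or None if it has no usable reading."""
    fans: dict[str, int | None] = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue  # not a sensor row (e.g. the command line itself)
        name, _record_id, status, _entity_id, reading = fields
        if status == "ok" and reading.endswith(" RPM"):
            fans[name] = int(float(reading[: -len(" RPM")]))
        else:
            fans[name] = None  # e.g. status "ns" with reading "Disabled"
    return fans
```

A reading of 0 RPM with status "ok" (fan5 above) is returned as 0 rather than None, so a caller can tell a stopped fan from a disabled sensor.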
Fan profile tuning on Talos II systems
Fan tuning is the black magic by which changes in core temperature (see above) cause changes in fan speed (see above). To be inducted into these dark arts, read below.
THIS ARTICLE IS A WORK IN PROGRESS AND STILL EXPERIMENTAL. USE AT YOUR OWN RISK, DON'T BURN OR OVERHEAT YOUR HARDWARE!
There are a number of components on OpenBMC dedicated to fan operation and monitoring.
- The upstream source is located at https://github.com/openbmc/phosphor-fan-presence
- The RCS fork can be found at https://git.raptorcs.com/git/phosphor-fan-presence
- The actual fan profiles for Talos II can be found at https://git.raptorcs.com/git/talos-openbmc/
Fan zone configuration on Talos II (reminder)
- Zone 1: CPU 1 (1 fan) in the manual
  - zone0 in the yaml files
  - fan4 in the yaml files
- Zone 2: CPU 2 (1 fan) in the manual
  - zone1 in the yaml files
  - fan5 in the yaml files
- Zone 3: Chassis (4 fans, fifth connector not monitored) in the manual
  - zone2 in the yaml files
  - fan0 fan1 fan2 fan3 in the yaml files
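When scripting against these names, it can help to keep the mapping in one place. A minimal sketch with hypothetical names (`ZONES`, `yaml_zone_for_fan`) that simply encodes the list above:

```python
from __future__ import annotations

# Manual zone number -> (zone name in the yaml files, fans in that zone),
# transcribed from the Talos II zone list above.
ZONES: dict[int, tuple[str, list[str]]] = {
    1: ("zone0", ["fan4"]),                          # CPU 1
    2: ("zone1", ["fan5"]),                          # CPU 2
    3: ("zone2", ["fan0", "fan1", "fan2", "fan3"]),  # chassis
}


def yaml_zone_for_fan(fan: str) -> str:
    """Return the yaml zone name that controls the given fan."""
    for yaml_zone, fans in ZONES.values():
        if fan in fans:
            return yaml_zone
    raise KeyError(f"{fan} is not assigned to any zone")
```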
To tune the fan profiles, you need to modify the yaml files and recompile the firmware.
Start by cloning the repository:
$ git clone -b raptor-v1.07 https://git.raptorcs.com/git/talos-openbmc
If you want beta firmware, you need to check out a different branch (04-16-2019 at the time this page was created).
For 1.07, the fan profiles are stored in meta-openbmc-machines/meta-openpower/meta-rcs/meta-talos/recipes-phosphor/fans; for beta versions, they are stored in meta-rcs/meta-talos/recipes-phosphor/fans.
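Because the location differs between branches, a small helper can pick whichever layout exists in your checkout. This is a hypothetical sketch: the two relative directories are the ones given above, and the three file names are the yaml files discussed in the following sections.

```python
from __future__ import annotations

from pathlib import Path

# Fan profile directories relative to the talos-openbmc checkout.
FAN_DIRS = (
    "meta-openbmc-machines/meta-openpower/meta-rcs/meta-talos/recipes-phosphor/fans",  # 1.07
    "meta-rcs/meta-talos/recipes-phosphor/fans",  # beta branches
)

# The yaml files discussed on this page, relative to the fans directory.
FAN_YAMLS = (
    "phosphor-fan-monitor-config-native/monitor.yaml",
    "phosphor-fan-control-zone-config-native/zones.yaml",
    "phosphor-fan-control-events-config-native/events.yaml",
)


def find_fans_dir(checkout: Path) -> Path:
    """Return the first fan profile directory that exists in the checkout."""
    for rel in FAN_DIRS:
        candidate = checkout / rel
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no fan profile directory under {checkout}")


def fan_yaml_paths(checkout: Path) -> list[Path]:
    """Return the paths of the monitor, zone and event yaml files."""
    fans_dir = find_fans_dir(checkout)
    return [fans_dir / name for name in FAN_YAMLS]
```

For example, `fan_yaml_paths(Path("talos-openbmc"))` lists the three files to edit in a fresh clone.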
phosphor-fan-monitor-config-native/monitor.yaml
This file contains the fan monitor settings.
In the monitor config, the most interesting value for us is deviation: -500.
RCS created a patch, https://git.raptorcs.com/git/phosphor-fan-presence/commit/?id=696aed8abe58696e5eb173da38b848825523883a, that allows specifying a negative deviation. A negative value tells the monitor that its absolute value is the minimum RPM a fan may rotate at before it is considered failed.
Keep this patch in mind, as upstream OpenBMC documents this value differently (upstream, deviation is the allowed percentage deviation from the target speed).
If your fans' operating range allows a lower RPM, set this value accordingly. For example, some Noctua fans allow 300 RPM (±20%).
Otherwise, if thermal conditions allow such a fan to spin down to 300 RPM, the monitor will consider this a fan failure (300 RPM is below the 500 RPM minimum set by deviation: -500), re-attempt to spin the fan up, and fail again several seconds later, creating an endless loop of fan spin-up/failure events.
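The difference between the two interpretations of deviation can be sketched as a simplified check. This is not the phosphor-fan-monitor code: the function name is hypothetical, the positive branch assumes upstream's percentage-of-target window, and any timing or delay handling the real monitor performs before marking a fan failed is ignored.

```python
def fan_considered_failed(actual_rpm: float, target_rpm: float, deviation: int) -> bool:
    """Simplified out-of-spec check (hypothetical; ignores the monitor's timing).

    deviation < 0:  RCS patch semantics, abs(deviation) is the minimum RPM
                    below which the fan counts as failed.
    deviation >= 0: assumed upstream semantics, the allowed percentage
                    deviation from the target speed in either direction.
    """
    if deviation < 0:
        return actual_rpm < abs(deviation)
    low = target_rpm * (100 - deviation) / 100
    high = target_rpm * (100 + deviation) / 100
    return not (low <= actual_rpm <= high)
```

With deviation: -500, a fan settling at 300 RPM is flagged, which is the spin-up/failure loop described above. A value such as -200 would stay below that fan's worst case of 240 RPM (300 RPM minus 20%).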
phosphor-fan-control-zone-config-native/zones.yaml
This file contains the zone settings.
Example config files with explanations: https://git.raptorcs.com/git/phosphor-fan-presence/tree/control/example
In the zone config, full_speed: 1000 is a logical value mapped from the absolute sensor value; it roughly corresponds to the fan's maximum RPM. Do not edit it.
The most interesting value to edit is default_floor: 100.
This is the default floor speed for the zone, below which fan speeds cannot go.
It sets the default minimum speed, relative to full_speed, that the fan will try to maintain while the target temperature conditions are satisfied.
For example, with full_speed: 1000, default_floor: 100 lets the fan spin down to 10% of its full speed: a 4000 RPM fan will spin at 400 RPM if thermal conditions are satisfied.
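Assuming the mapping is linear as described, the floor speed in RPM works out as follows, where RPM_max (a symbol introduced here for illustration) is the fan's maximum speed:

```latex
\mathrm{RPM}_{\mathrm{floor}}
  = \mathrm{RPM}_{\max} \times \frac{\texttt{default\_floor}}{\texttt{full\_speed}}
  = 4000 \times \frac{100}{1000}
  = 400\ \mathrm{RPM}
```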
Make sure that the speed resulting from the default floor is equal to or greater than the absolute value of the negative deviation in the monitor config.
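This check can be scripted. A minimal sketch, assuming the RCS negative-deviation semantics and the linear floor mapping described above; the function names and the 4000 RPM maximum used in the example are illustrative, not taken from the firmware.

```python
from __future__ import annotations

import math


def floor_rpm(max_rpm: float, default_floor: int, full_speed: int = 1000) -> float:
    """RPM a fan runs at when its zone sits at default_floor."""
    return max_rpm * default_floor / full_speed


def floor_is_safe(max_rpm: float, default_floor: int, deviation: int,
                  full_speed: int = 1000) -> bool:
    """True if the floor speed is at least the monitor's minimum RPM, abs(deviation)."""
    return floor_rpm(max_rpm, default_floor, full_speed) >= abs(deviation)


def min_safe_floor(max_rpm: float, deviation: int, full_speed: int = 1000) -> int:
    """Smallest integer default_floor whose floor speed reaches abs(deviation)."""
    return math.ceil(abs(deviation) * full_speed / max_rpm)
```

With the stock default_floor: 100 and deviation: -500, a 4000 RPM fan fails the check (400 RPM < 500 RPM); `min_safe_floor(4000, -500)` gives 125 as the lowest floor that passes.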
phosphor-fan-control-events-config-native/events.yaml
This file contains the event response configuration. Editing it is out of scope for this article for now.
After modifying the yaml files, proceed to Compiling_Firmware. You only need to build OpenBMC; skip the checkout step, since you already have the sources with your local modifications.
Example diff from Gyakovlev for the 04-16-2019 branch firmware
This sets the CPU fans to never spin down below 30% of max RPM and the chassis fans to never go below 20% of max RPM.
diff --git a/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml b/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml
index 588fed35c..755345f8e 100644
--- a/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml
+++ b/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml
@@ -13,7 +13,7 @@ zone_configuration:
 - air
 - all
 full_speed: 1000
-default_floor: 100
+default_floor: 300
 increase_delay: 5
 decrease_interval: 5
 - zone: 1
@@ -21,7 +21,7 @@ zone_configuration:
 - air
 - all
 full_speed: 1000
-default_floor: 100
+default_floor: 300
 increase_delay: 5
 decrease_interval: 5
 - zone: 2
@@ -29,7 +29,7 @@ zone_configuration:
 - air
 - all
 full_speed: 1000
-default_floor: 100
+default_floor: 200
 increase_delay: 5
 decrease_interval: 5

@@ -43,7 +43,7 @@ zone_configuration:
 - water
 - all
 full_speed: 1000
-default_floor: 100
+default_floor: 300
 increase_delay: 5
 decrease_interval: 5
 - zone: 2
@@ -51,6 +51,6 @@ zone_configuration:
 - water
 - all
 full_speed: 1000
-default_floor: 100
+default_floor: 200
 increase_delay: 5
 decrease_interval: 5