Fan Tuning

From RCS Wiki

Latest revision as of 23:03, 11 April 2024

Core Temperature

You can find out the core temperature with the "sensors" command from the "lm-sensors" package:

# sensors | grep Core
Chip 0 Core 0:            +28.0°C  (lowest = +23.0°C, highest = +29.0°C)
Chip 0 Core 4:            +28.0°C  (lowest = +25.0°C, highest = +31.0°C)
Chip 0 Core 8:            +28.0°C  (lowest = +25.0°C, highest = +31.0°C)
Chip 0 Core 12:           +28.0°C  (lowest = +25.0°C, highest = +31.0°C)
Chip 8 Core 16:           +28.0°C  (lowest = +24.0°C, highest = +36.0°C)
Chip 8 Core 20:           +28.0°C  (lowest = +23.0°C, highest = +36.0°C)
Chip 8 Core 24:           +28.0°C  (lowest = +22.0°C, highest = +34.0°C)
Chip 8 Core 28:           +28.0°C  (lowest = +22.0°C, highest = +35.0°C)
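
If you want to process these readings in a script rather than read them by eye, the output can be parsed along the following lines. This is only a minimal sketch: it assumes the line format shown above, uses two of those lines as sample input, and the helper name parse_core_temps is made up for this example. Label names and layout may differ on other systems.

```python
import re

# Two sample lines copied from the `sensors | grep Core` output above.
SAMPLE = """\
Chip 0 Core 0:            +28.0°C  (lowest = +23.0°C, highest = +29.0°C)
Chip 8 Core 28:           +28.0°C  (lowest = +22.0°C, highest = +35.0°C)
"""

# Label, then current / lowest / highest readings in degrees Celsius.
NUM = r"[+-]?\d+(?:\.\d+)?"
LINE_RE = re.compile(
    rf"^(?P<label>Chip \d+ Core \d+):\s+(?P<cur>{NUM})°C"
    rf"\s+\(lowest = (?P<low>{NUM})°C, highest = (?P<high>{NUM})°C\)"
)


def parse_core_temps(text):
    """Return {label: (current, lowest, highest)} for every line that matches."""
    temps = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            temps[m["label"]] = (float(m["cur"]), float(m["low"]), float(m["high"]))
    return temps


temps = parse_core_temps(SAMPLE)
# Core with the highest recorded temperature since the counters were reset.
hottest = max(temps, key=lambda label: temps[label][2])
print(hottest, temps[hottest][2])
```

On a live system you would feed the function the real command output (for example via subprocess) instead of SAMPLE.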

Fan Speed

You can find out the current fan speed with the "ipmitool" command from the "ipmi-tools" package:

# ipmitool sdr type fan
fan0             | DDh | ok  | 29.1 | 12000 RPM
fan1             | DEh | ok  | 29.2 | 4400 RPM
fan2             | DFh | ok  | 29.3 | 4500 RPM
fan3             | E2h | ok  | 29.4 | 4300 RPM
fan4             | E3h | ok  | 29.5 | 11900 RPM
fan5             | E4h | ok  | 29.6 | 0 RPM
fan6             | E5h | ns  | 29.7 | Disabled
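
The table above can be turned into numbers in much the same way. The sketch below assumes the five-column, pipe-separated layout shown in the sample output; parse_fans is a hypothetical helper name, and readings that are not of the form "<number> RPM" (such as "Disabled") are reported as None.

```python
# Three sample lines copied from the `ipmitool sdr type fan` output above.
SAMPLE = """\
fan0             | DDh | ok  | 29.1 | 12000 RPM
fan5             | E4h | ok  | 29.6 | 0 RPM
fan6             | E5h | ns  | 29.7 | Disabled
"""


def parse_fans(text):
    """Return {fan name: RPM as int, or None when there is no RPM reading}."""
    fans = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue  # not a sensor row in the expected layout
        name, _sensor_id, _status, _entity, reading = fields
        parts = reading.split()
        if len(parts) == 2 and parts[1] == "RPM" and parts[0].isdigit():
            fans[name] = int(parts[0])
        else:
            fans[name] = None
    return fans


fans = parse_fans(SAMPLE)
# Fans that report a reading but are not spinning.
stopped = [name for name, rpm in fans.items() if rpm == 0]
print(fans, stopped)
```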

Fan profile tuning on Talos II systems

Fan tuning is the black magic by which changes in core temperature (see above) cause changes in fan speed (see above). To be inducted into these dark arts, read below.


THIS ARTICLE IS WORK IN PROGRESS AND STILL EXPERIMENTAL. USE AT YOUR OWN RISK, DON'T BURN OR OVERHEAT YOUR HARDWARE!


There are a number of components in OpenBMC dedicated to fan operation and monitoring.


Reminder of Fan zone configuration on Talos II

  • Zone 1: CPU 1 (1 fan) in the manual
    • zone0 in the yaml files
      • fan4 in the yaml files
  • Zone 2: CPU 2 (1 fan) in the manual
    • zone1 in the yaml files
      • fan5 in the yaml files
  • Zone 3: Chassis (4 fans, fifth connector not monitored) in the manual
    • zone2 in the yaml files
      • fan0 fan1 fan2 fan3 in the yaml files
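
Because the manual and the yaml files number the zones differently, it can help to keep the mapping in one place when scripting. The following is just the list above written as a lookup table; the names ZONES and zone_of are invented for this sketch.

```python
# yaml zone name -> name used in the manual and the fans assigned to it,
# transcribed from the list above.
ZONES = {
    "zone0": {"manual": "Zone 1: CPU 1", "fans": ["fan4"]},
    "zone1": {"manual": "Zone 2: CPU 2", "fans": ["fan5"]},
    "zone2": {"manual": "Zone 3: Chassis", "fans": ["fan0", "fan1", "fan2", "fan3"]},
}


def zone_of(fan):
    """Return the yaml zone name a fan belongs to, or None if it is not listed."""
    for zone, info in ZONES.items():
        if fan in info["fans"]:
            return zone
    return None


print(zone_of("fan4"), ZONES[zone_of("fan4")]["manual"])
```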


To tune fan profiles, you need to modify the yaml files and recompile the firmware.

Start by cloning the repository:

$ git clone -b raptor-v1.07 https://git.raptorcs.com/git/talos-openbmc

If you want the beta firmware, you need to check out another branch (04-16-2019 at the time this page was created).

For 1.07, fan profiles are stored in meta-openbmc-machines/meta-openpower/meta-rcs/meta-talos/recipes-phosphor/fans. For beta versions, fan profiles are stored in meta-rcs/meta-talos/recipes-phosphor/fans.


phosphor-fan-monitor-config-native/monitor.yaml

This file contains the monitor settings.

In the monitor config, the most interesting value for us is deviation: -500

RCS created a patch https://git.raptorcs.com/git/phosphor-fan-presence/commit/?id=696aed8abe58696e5eb173da38b848825523883a which allows specifying a negative deviation. A negative deviation tells the monitor that its absolute value is the minimum RPM the fan may rotate at before it is considered failed.

Keep this patch in mind, as upstream OpenBMC documents this value differently.

If your fan's operating range allows a lower RPM, you will want to set this value accordingly. For example, some Noctua fans can run as low as 300 RPM (±20%).

If you leave the monitor minimum above that and thermal conditions allow the fan to spin down to 300 RPM, the monitor will consider this a fan failure and try to spin the fan up again. It will then fail again several seconds later, creating an endless loop of fan spin-up and failure events.
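
To make the failure condition concrete, here is a toy model of the check as this page describes it. It is not the actual monitor code: the function name is hypothetical, and the only rule it encodes is "with a negative deviation, a fan below abs(deviation) RPM counts as failed".

```python
def considered_failed(rpm, deviation):
    """Model of the RCS-patched check: a negative deviation is a minimum RPM.

    Only the negative-deviation case described on this page is modelled;
    upstream OpenBMC interprets the value differently.
    """
    if deviation >= 0:
        raise ValueError("this sketch only models negative deviation values")
    return rpm < abs(deviation)


# With the stock deviation of -500, a fan idling at 300 RPM is flagged.
print(considered_failed(300, -500))
# The same fan passes once the minimum is lowered (-240 is 300 RPM minus 20%).
print(considered_failed(300, -240))
```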


phosphor-fan-control-zone-config-native/zones.yaml

This file contains the zone settings.

Example config files with explanations: https://git.raptorcs.com/git/phosphor-fan-presence/tree/control/example

In the zone config, full_speed: 1000 is a logical number mapped from the absolute sensor value. It roughly corresponds to the maximum fan RPM. Do not edit it.


The most interesting value to edit is default_floor: 100

This is the default floor speed for the zone, which fan speeds cannot go below.


This value sets the default minimum speed, relative to full_speed, that the fan will try to maintain when the target temperature conditions are satisfied.

For example, with full_speed: 1000, default_floor: 100 lets the fan spin down to 10% of its full speed. A 4000 RPM fan will then spin at 400 RPM if thermal conditions are satisfied.

Make sure that the speed resulting from default_floor is equal to or greater than the absolute value of the negative deviation in the monitor config.
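
The arithmetic above, together with the deviation check, can be sketched as follows. This assumes the simple proportional mapping described on this page (floor RPM = maximum RPM × default_floor / full_speed); the function names and the 4000 RPM maximum are illustrative only.

```python
def floor_rpm(default_floor, full_speed, max_rpm):
    """Approximate RPM the fan settles at when running at the zone floor."""
    return max_rpm * default_floor / full_speed


def floor_is_safe(default_floor, full_speed, max_rpm, deviation):
    """True if the floor speed stays at or above the monitor's minimum RPM."""
    return floor_rpm(default_floor, full_speed, max_rpm) >= abs(deviation)


# Stock values: a 4000 RPM fan at default_floor 100 idles around 400 RPM,
# which is below the 500 RPM minimum implied by deviation: -500.
print(floor_rpm(100, 1000, 4000))
print(floor_is_safe(100, 1000, 4000, -500))
# Raising the floor to 300 gives about 1200 RPM, comfortably above the minimum.
print(floor_is_safe(300, 1000, 4000, -500))
```

If the check comes out False, either raise default_floor or lower the monitor minimum, within the range your fan actually supports.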


phosphor-fan-control-events-config-native/events.yaml

This file contains the event response configuration. Editing this file is out of scope for this article for now.



After modifying the yaml files, proceed to Compiling_Firmware. You only need to build OpenBMC; skip the checkout step, as you already have the sources with your local modifications.

Example diff from Gyakovlev for the 04-16-2019 branch firmware:

This sets the CPU fans to never spin down below 30% of max RPM and the chassis fans to never go below 20% of max RPM.

diff --git a/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml b/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml
index 588fed35c..755345f8e 100644
--- a/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml
+++ b/meta-rcs/meta-talos/recipes-phosphor/fans/phosphor-fan-control-zone-config-native/zones.yaml
@@ -13,7 +13,7 @@ zone_configuration:
       - air
       - all
       full_speed: 1000
-      default_floor: 100
+      default_floor: 300
       increase_delay: 5
       decrease_interval: 5
     - zone: 1
@@ -21,7 +21,7 @@ zone_configuration:
       - air
       - all
       full_speed: 1000
-      default_floor: 100
+      default_floor: 300
       increase_delay: 5
       decrease_interval: 5
     - zone: 2
@@ -29,7 +29,7 @@ zone_configuration:
       - air
       - all
       full_speed: 1000
-      default_floor: 100
+      default_floor: 200
       increase_delay: 5
       decrease_interval: 5
 
@@ -43,7 +43,7 @@ zone_configuration:
       - water
       - all
       full_speed: 1000
-      default_floor: 100
+      default_floor: 300
       increase_delay: 5
       decrease_interval: 5
     - zone: 2
@@ -51,6 +51,6 @@ zone_configuration:
       - water
       - all
       full_speed: 1000
-      default_floor: 100
+      default_floor: 200
       increase_delay: 5
       decrease_interval: 5

See also