SMT profiling with pmcstat and perf

From RCS Wiki

This article discusses profiling simultaneous multithreading (SMT) on the POWER9 architecture, using pmcstat on big-endian FreeBSD and perf on Debian.

The material presented here is drawn from the sources listed in the #Additional Resources section.


Simultaneous multithreading (SMT)

SMT principles

SMT is "multi-thread" not "multi-core"

This is an important distinction. SMT increases instruction throughput by running additional threads on CPU components that would otherwise be under-used. SMT4 supports four threads per core and SMT8 supports eight, but this is not equivalent to three or seven additional cores, respectively, because the threads share one core's execution resources. There are trade-offs: per-thread performance declines as more SMT threads per core are used, but overall performance and power efficiency increase. Note that IBM did not market SMT as "multi-core", while several media sites conflated SMT with increased core count.
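The thread-versus-core distinction can be checked on a running Linux system. The following is a minimal sketch, assuming the Debian host exposes the standard sysfs file `topology/thread_siblings_list` for each online logical CPU; the function names are illustrative and not part of any tool discussed in this article. It groups logical CPUs by the core they share, so the number of hardware threads and the number of cores can be reported separately.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_core(sibling_lists):
    """Collapse per-CPU sibling lists into one tuple of hardware threads per core."""
    return sorted({tuple(parse_cpu_list(s)) for s in sibling_lists})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every logical CPU that exposes one."""
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    cores = group_by_core(read_sibling_lists())
    threads = sum(len(c) for c in cores)
    print(f"{threads} hardware threads on {len(cores)} cores")
```

On a core running in SMT4 mode, each group would be expected to hold four logical CPUs, which the operating system schedules as four CPUs even though they belong to a single core.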

Comparison to RISC-V harts

Benchmark code

The code profiled and benchmarked here is the genomic comparison code that the M. P. Janson Institute for Analytical Medicine uses to look for molecular mimicry between pathogens and human tissue and hormones. It uses a variety of techniques to compare nucleotide and amino acid sequences, and exists in both byte-by-byte and vector versions. The raw source code can be found at TBD.
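To give a feel for the difference between a byte-by-byte comparison and a wider one, here is a purely hypothetical sketch. It is not the institute's code, the function names are invented for illustration, and it does not use hardware vector instructions: the second function only stands in for the idea by testing eight bytes at a time and falling back to single bytes when a chunk differs. Both functions count the positions at which two equal-length sequences agree.

```python
def matches_bytewise(a: bytes, b: bytes) -> int:
    """Count agreeing positions, comparing one byte at a time."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y)


def matches_chunked(a: bytes, b: bytes, width: int = 8) -> int:
    """Same count, but compare `width` bytes at once and inspect
    individual bytes only when a chunk is not identical."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    total = 0
    for i in range(0, len(a), width):
        chunk_a, chunk_b = a[i:i + width], b[i:i + width]
        if chunk_a == chunk_b:
            # Whole chunk agrees: credit every position without a byte loop.
            total += len(chunk_a)
        else:
            total += sum(1 for x, y in zip(chunk_a, chunk_b) if x == y)
    return total


if __name__ == "__main__":
    s1 = b"ACGTACGTAC"
    s2 = b"ACGTTCGTAA"
    print(matches_bytewise(s1, s2), matches_chunked(s1, s2))
```

The two functions return the same count for any pair of equal-length inputs. They differ only in how much data is examined per step, which is the kind of difference a profiler is meant to expose.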

pmcstat

perf

Additional Resources

POWER9 User Manual v21

POWER9 Performance Monitoring Unit User Guide v12

POWER CPU Memory Affinity 3 - Scheduling processes to SMT and Virtual Processors

https://www.ibm.com/docs/en/linux-on-systems?topic=linuxonibm/performance/tuneforsybase/smtsettings.htm

George Neville-Neil's brief tutorial on pmcstat