SMT profiling with pmcstat and perf
This article discusses profiling simultaneous multithreading (SMT) on the POWER9 architecture, using pmcstat on big-endian FreeBSD and perf on Debian.
The material presented here is drawn from the sources listed in the Additional Resources section.
Simultaneous multithreading (SMT)
SMT principles
SMT is "multi-thread" not "multi-core"
This is an important distinction. SMT increases instruction throughput by letting several hardware threads share one core, so that CPU components one thread would leave under-used can do work for another. SMT4 supports four threads per core and SMT8 supports eight threads per core, but this is not equivalent to an additional three or seven cores, respectively. There are trade-offs and benefits: per-thread performance declines as more SMT threads per core are in use, while overall throughput and power efficiency increase. Note that IBM did not market SMT as "multi-core," although several media sites conflated SMT with increased core count.
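The thread-versus-core arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `physical_cores` and the example machine sizes are hypothetical, and the set of accepted SMT levels (1, 2, 4, 8) is an assumption based on the modes mentioned above.

```python
def physical_cores(logical_cpus: int, smt_level: int) -> int:
    """Return the number of physical cores behind a count of logical CPUs.

    Hypothetical helper: assumes every core runs at the same SMT level.
    """
    if smt_level not in (1, 2, 4, 8):
        raise ValueError("smt_level must be 1, 2, 4, or 8")
    if logical_cpus <= 0 or logical_cpus % smt_level != 0:
        raise ValueError("logical_cpus must be a positive multiple of smt_level")
    # Each core exposes smt_level hardware threads to the operating system.
    return logical_cpus // smt_level


if __name__ == "__main__":
    # A hypothetical system that shows 16 logical CPUs in SMT4 mode
    # still has only 4 cores, not 16.
    print(physical_cores(16, 4))
```

An operating system that lists 16 CPUs on such a machine is listing hardware threads; the four cores' execution resources are shared among them.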
Comparison to RISC-V HARTs
Benchmark code
The code profiled and benchmarked here is the genomic comparison code that the M. P. Janson Institute for Analytical Medicine uses to look for molecular mimicry between pathogens and human tissues and hormones. It uses a variety of techniques to compare nucleotide and amino acid sequences, and it exists in both byte-by-byte and vector versions. The raw source code can be found at TBD
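To give a sense of what a byte-by-byte comparison involves, here is a minimal sketch in Python. It is not the institute's code: the function names `count_matches` and `best_offset`, the scoring rule (a plain count of identical positions), and the sample sequences are all assumptions made for illustration.

```python
def count_matches(a: bytes, b: bytes) -> int:
    """Count positions at which two equal-length sequences hold the same byte."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y)


def best_offset(query: bytes, target: bytes):
    """Slide query along target; return (offset, score) of the best window.

    Ties are resolved in favor of the lowest offset.
    """
    if not query or len(query) > len(target):
        raise ValueError("query must be non-empty and no longer than target")
    best = (0, -1)
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        score = count_matches(query, window)
        if score > best[1]:
            best = (offset, score)
    return best


if __name__ == "__main__":
    # Hypothetical nucleotide strings, one byte per base.
    print(count_matches(b"GATTACA", b"GACTATA"))
    print(best_offset(b"TAC", b"GATTACA"))
```

A loop of this shape does one comparison per byte per window, which is the kind of work a vector version would instead perform on many bytes at once.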
pmcstat
perf
Additional Resources
POWER9 Performance Monitoring Unit User Guide v12
POWER CPU Memory Affinity 3 - Scheduling processes to SMT and Virtual Processors