User:Torpcoms/Timemark/PowerAI+AC922

From RCS Wiki

Unlike the Jeff Stuecheli presentation, this one is more about the IBM product than about the processor itself. I would recommend skipping ahead to whatever looks interesting to you, if anything.

First Power 9 System and PowerAI

The slides and video download links are on the AIX VUG wiki. If you are reading this in the future, you'll have to scroll down or use Ctrl+F to find the entry. Its full heading is "December 12, 2017 - First Power 9 System and PowerAI - with Chris Mann", even though the first two slide decks are by Joel Dodd.

Timemarkers

Disclaimers: Slide 19 is shown while they are still talking about the CORAL contracts; the time listed for it marks the start of the 6-GPU design discussion. "Dodd" as Joel's last name is a guess, since it's a bit hard to hear them.

00:00:00 setting up webinar
00:00:47 Intro - Joe Armstrong
00:04:06 PowerAI presentation - Joel Dodd
00:04:20 (slide 1-2)
00:04:56 (slide 1-3) IBM AI framework packaging
00:06:44 (slide 1-4)
00:07:41 (slide 1-5) car analogy
00:08:53 (slide 1-6) PowerAI Vision
00:10:19 (slide 1-7)
00:12:17 (slide 1-8)
00:14:35 (slide 1-9) Distributed deep learning
00:15:56 (slide 1-10) Large model support
00:17:19 (slide 1-11)
00:18:01 (slide 2-2) Lab services - cognitive workshops
00:19:40 (slide 2-6) Lab services
00:20:20 intermission - Joe Armstrong
00:21:05 Power Systems AC922 - Chris Mann
00:21:36 (slide 1)
00:22:02 (slide 2) IBM strategy
00:23:20 (slide 3) OpenPOWER HPC family
00:25:10 (slide 4) PowerAccel
00:28:52 (slide 5) AC922 overview
00:30:18 (slide 7) POWER9 processor
00:31:37 (slide 8) AC922 4-GPU design
00:33:55 (slide 9) Volta specs
00:35:10 (slide 10) NVLink changes
00:38:23 (slide 11) GPU bandwidth comparison
00:40:07 (slide 12) I/O attach evolution in POWER
00:43:12 (slide 13) IB-EDR PCIe Gen 3 vs Gen 4
00:44:20 (slide 14) Front + rear views
00:45:16 (slide 15) AC input (Rong Feng 203P-HP)
00:46:55 (slide 16) Memory options
00:48:06 (slide 17) CORAL
00:49:53 (slide 18) delivery/contract discussion
00:53:16 (slide 19) AC922 6-GPU design
00:55:26 (slide 20) CORAL install at LLNL
00:57:08 (slide 21) CORAL install at ORNL
00:59:05 (slide 22) closing
00:59:11 Questions intro - Joe Armstrong
00:59:27 Q: OS for AC922
01:00:25 Q: Fan loss tolerance
01:00:59 Q: Use cases for CORAL labs
01:02:33 Q: Will it run Crysis 3?
01:03:08 Q: DCM (Dual Chip Module) or SCM (Single Chip Module)?
01:03:25 Q: CAPI 2 vs OpenCAPI?
01:05:35 Q: Mixing DIMM sizes?
01:06:28 Crysis 3 explained
01:06:43 Q: AC not LC, will it run AIX?
01:07:55 Q: Mellanox adapters for storage?
01:10:34 Q: Manufacture location?
01:10:55 Q: AIX general questions
01:11:31 Q: NVLink configurations elaborate?
01:12:31 Q: AC922 model numbers (8335GTG as public model)
01:13:55 Q: VM is not PowerVM? KVM instead.
01:14:58 Q: Leak detection? No.
01:15:28 Q: Fans in water cooled systems?
01:16:00 Q: Hardware clustering
01:17:32 Q: Clock speed?
01:19:06 Q: Different model numbers for air or water cooled?
01:19:40 Q: PowerAI on AC922?
01:20:07    ESP version of PowerAI for AC922
01:20:35 Q: Does AC922 run in the Nutanix cluster?
01:21:01 Q: Water cooling in normal datacentre?
01:22:21 closing - Joe Armstrong
01:22:38 closing - Chris Mann
01:23:07 closing - Joel
01:23:24 closing - Joe Armstrong
01:25:30 end

Interesting Notes

Delivery of the Summit and Sierra supercomputers is expected to be complete in June 2018. So far, 1000 nodes have been delivered to each, and IBM is shipping 100 nodes per day.

The x86-based Aurora supercomputer was delayed.

Crysis 3 is not a POWER9 benchmark.