User:Torpcoms/Timemark/PowerAI+AC922
Unlike the Jeff Stuecheli presentation, this one is more about the IBM product than the processor itself. I would highly recommend skimming to what looks interesting to you, if anything.
First Power 9 System and PowerAI
The slides and video download links are on the AIX VUG wiki. If you are reading this in the future, you'll have to scroll down or Ctrl+F to it; its full heading is December 12, 2017 - First Power 9 System and PowerAI - with Chris Mann, even though the first two slide decks are by Joel Dodd.
Timemarkers
Disclaimers: Slide 19 is shown while they continue talking about CORAL contracts; the time shown is for the start of the 6-GPU design discussion. "Nutanix" is a guess, as it's a bit hard to hear them. "Dodd" as Joel's last name is also a guess.
00:00:00 setting up webinar
00:00:47 Intro - Joe Armstrong
00:04:06 PowerAI presentation - Joel Dodd
00:04:20 (slide 1-2)
00:04:56 (slide 1-3) IBM AI framework packaging
00:06:44 (slide 1-4)
00:07:41 (slide 1-5) car analogy
00:08:53 (slide 1-6) PowerAI Vision
00:10:19 (slide 1-7)
00:12:17 (slide 1-8)
00:14:35 (slide 1-9) Distributed deep learning
00:15:56 (slide 1-10) Large model support
00:17:19 (slide 1-11)
00:18:01 (slide 2-2) Lab services - cognitive workshops
00:19:40 (slide 2-6) Lab services
00:20:20 intermission - Joe Armstrong
00:21:05 Power Systems AC922 - Chris Mann
00:21:36 (slide 1)
00:22:02 (slide 2) IBM strategy
00:23:20 (slide 3) OpenPOWER HPC family
00:25:10 (slide 4) PowerAccel
00:28:52 (slide 5) AC922 overview
00:30:18 (slide 7) POWER9 processor
00:31:37 (slide 8) AC922 4-GPU design
00:33:55 (slide 9) Volta specs
00:35:10 (slide 10) NVLink changes
00:38:23 (slide 11) GPU bandwidth comparison
00:40:07 (slide 12) I/O attach evolution in POWER
00:43:12 (slide 13) IB-EDR PCIe Gen 3 vs Gen 4
00:44:20 (slide 14) Front + rear views
00:45:16 (slide 15) AC input (Rong Feng 203P-HP)
00:46:55 (slide 16) Memory options
00:48:06 (slide 17) CORAL
00:49:53 (slide 18) delivery/contract discussion
00:53:16 (slide 19) AC922 6-GPU design
00:55:26 (slide 20) CORAL install at LLNL
00:57:08 (slide 21) CORAL install at ORNL
00:59:05 (slide 22) closing
00:59:11 Questions intro - Joe Armstrong
00:59:27 Q: OS for AC922
01:00:25 Q: Fan loss tolerance
01:00:59 Q: Use cases for CORAL labs
01:02:33 Q: Will it run Crysis 3?
01:03:08 Q: DCM (Dual Chip Module) or SCM (Single Chip Module)?
01:03:25 Q: CAPI 2 vs OpenCAPI?
01:05:35 Q: Mixing DIMM sizes?
01:06:28 Crysis 3 explained
01:06:43 Q: AC not LC, will it run AIX?
01:07:55 Q: Mellanox adapters for storage?
01:10:34 Q: Manufacture location?
01:10:55 Q: AIX general questions
01:11:31 Q: NVLink configurations elaborate?
01:12:31 Q: AC922 model numbers (8335GTG as public model)
01:13:55 Q: VM is not PowerVM? KVM instead.
01:14:58 Q: Leak detection? No.
01:15:28 Q: Fans in water cooled systems?
01:16:00 Q: Hardware clustering
01:17:32 Q: Clock speed?
01:19:06 Q: Different model numbers for air or water cooled?
01:19:40 Q: PowerAI on AC922?
01:20:07 ESP version of PowerAI for AC922
01:20:35 Q: Does AC922 run in the Nutanix cluster?
01:21:01 Q: Water cooling in normal datacentre?
01:22:21 closing - Joe Armstrong
01:22:38 closing - Chris Mann
01:23:07 closing - Joel
01:23:24 closing - Joe Armstrong
01:25:30 end
Interesting Notes
Summit/Sierra supercomputer delivery completion is expected in June 2018. 1000 nodes have been delivered to each so far. IBM is shipping 100 nodes per day.
Aurora supercomputer using x86 was delayed.
Crysis 3 is not a POWER9 benchmark.