By William An in AMD — Aug 12, 2021

SURF Week 6-9: AMD GCN3 Traces and Accel-Sim

Okay, this is a rather long post covering what I did over the past four weeks to run AMD GCN3 traces on Accel-Sim.

The plan was to map AMD GCN3 instructions onto the trace-driven part of Accel-Sim to the best of my knowledge, with the help of the official GCN3 ISA manual. Then we needed to get the trace-driven part running, which requires it to identify the incoming warp lane size and to know that the trace files are in AMD GCN3 ISA format. The reason is that the original Accel-Sim specifically targets the NVIDIA GPU family and has some assumptions encoded in the code, such as the warp size being 32. Additional flags (-isa type and -warp size) were added to the trace header to distinguish AMD traces from NVIDIA ones while remaining compatible with the NVIDIA traces.
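To make the header idea concrete, here is a minimal sketch of how a reader could pick up the two extra fields and fall back to NVIDIA defaults when they are absent. It is not the code in the modified repo: the `-key = value` line layout, the `TraceHeader` class, the `parse_trace_header` function, and the value strings `"GCN3"` and `"NVIDIA"` are all assumptions made for illustration. The only hard facts used are the warp size of 32 on the NVIDIA side and the 64-lane wavefront of GCN3.

```python
from dataclasses import dataclass

# Defaults used when a trace carries no extra fields (i.e. an NVIDIA trace).
DEFAULT_ISA = "NVIDIA"    # hypothetical label, not taken from Accel-Sim
DEFAULT_WARP_SIZE = 32    # the warp size Accel-Sim assumes for NVIDIA GPUs


@dataclass
class TraceHeader:
    isa_type: str = DEFAULT_ISA
    warp_size: int = DEFAULT_WARP_SIZE


def parse_trace_header(lines):
    """Read '-key = value' lines; skip blank lines and stop at the
    first non-blank line that does not start with '-'."""
    header = TraceHeader()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("-"):
            break
        key, sep, value = line[1:].partition("=")
        if not sep:
            continue  # not a key/value line, ignore it
        key, value = key.strip(), value.strip()
        if key == "isa type":
            header.isa_type = value
        elif key == "warp size":
            header.warp_size = int(value)
        # Any other header field is left for the existing parser.
    return header


# Hypothetical AMD header: both new fields present.
amd = parse_trace_header(["-kernel name = foo", "-isa type = GCN3", "-warp size = 64"])
# Hypothetical NVIDIA header: new fields absent, so the defaults apply.
nv = parse_trace_header(["-kernel name = foo", "-grid dim = (1,1,1)"])
```

Keeping the defaults on the NVIDIA side is what lets old traces load unchanged: only a trace that states its ISA and lane count explicitly takes the new path.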

With this setup done, we ran the AMD GCN3 traces generated from MGPUSim benchmarks under the NVIDIA QV100 hardware configuration, with changes to the compute unit count and the caches to match the simulated GPU, the AMD R9 Nano.

The modified Accel-Sim repo is here.

Debugging

Lots of bugs related to the FLAT instructions in MGPUSim

It turns out that if a load instruction is self-assigned (its destination register is also its address register), reading the registers after it executes gives the loaded value rather than the memory address.
Thus a snapshot of the register file is kept every time an instruction runs.
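The snapshot fix can be illustrated with a toy interpreter. This is a sketch of the idea only, not MGPUSim code: the `(op, dst, src)` instruction tuples, the register names, and the `run_with_snapshots` function are hypothetical.

```python
def run_with_snapshots(program, regs, memory):
    """Execute (op, dst, src) tuples and record the address used by each load.

    The register file is copied before every instruction, so the address
    is still available even when the load overwrites its own address register.
    """
    trace = []
    for op, dst, src in program:
        before = dict(regs)  # snapshot taken before the instruction mutates regs
        if op == "load":
            regs[dst] = memory[regs[src]]
            addr = before[src]  # address comes from the snapshot
        elif op == "mov":
            regs[dst] = src     # src is an immediate in this toy ISA
            addr = None
        else:
            raise ValueError(f"unknown op: {op}")
        trace.append((op, dst, addr))
    return trace


regs = {"v0": 0x1000}
memory = {0x1000: 42}
# Self-assigned load: v0 holds the address and also receives the result.
trace = run_with_snapshots([("load", "v0", "v0")], regs, memory)
# After execution regs["v0"] holds the loaded value, while the trace
# still records the address that was read from the snapshot.
```

Copying the whole register file per instruction is the simplest form of the idea; a tracer that only needs addresses could copy just the source registers instead.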

In addition, workloads like pagerank hit a segfault, which is caused by a null pointer that is not checked. Further investigation is needed to resolve this.

Unusually high L2 cache hit rate: gpgpu_perf_sim_memcpy

During simulation, I noticed that the results from Accel-Sim show an unusually high L2 cache hit rate. After some digging into the configuration, it looks like the option -gpgpu_perf_sim_memcpy = 1 fills up the L2 cache when memcpy operations are executed. MGPUSim, on the other hand, does not seem to do this when it performs a memcpy.

MGPUSim also seems to flush its caches between kernel launches, which might be another source of mismatch when counting cache hits and misses.
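A toy model shows how much these two behaviours alone can move the hit rate. It is a deliberately crude sketch: the cache is an unbounded set of line addresses, which is not how either simulator models its L2, and the `l2_hit_rate` function and its parameters are made up for this example.

```python
def l2_hit_rate(kernels, prefilled=(), flush_between_kernels=False):
    """Hit rate of an idealized, unbounded cache.

    kernels: list of kernels, each a list of cache-line addresses.
    prefilled: lines already resident before the first kernel
               (stands in for a memcpy that fills the L2).
    flush_between_kernels: empty the cache after each kernel finishes.
    """
    cache = set(prefilled)
    hits = total = 0
    for kernel in kernels:
        for line in kernel:
            total += 1
            if line in cache:
                hits += 1
            else:
                cache.add(line)
        if flush_between_kernels:
            cache.clear()
    return hits / total if total else 0.0


# Two kernels that each touch the same four lines once.
kernels = [[0, 1, 2, 3], [0, 1, 2, 3]]

memcpy_fills_l2 = l2_hit_rate(kernels, prefilled={0, 1, 2, 3})    # 1.0
no_fill = l2_hit_rate(kernels)                                    # 0.5
no_fill_flush = l2_hit_rate(kernels, flush_between_kernels=True)  # 0.0
```

The same access stream gives 100%, 50%, or 0% hits depending only on the memcpy and flush policy, so these settings need to be aligned before the L2 numbers from the two simulators can be compared.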

Future

Run the benchmarks, collect data, and resolve the bugs they expose
Tune the GPU config files
Automate the whole trace process with GitHub Actions
Microbenchmarks: MGPUSim has them, but they are not public
Sector cache and ipoly hashing issue
