Since my research involves acquiring actual hardware traces of programs executed on AMD GPUs, I first looked at the tools provided by the AMD ROCm platform to see whether any of them, like NVIDIA's NVBit, could be used for instrumentation. Unfortunately, AMD does not have such a tool at the time of writing, so I had to switch to the MGPUSim simulator to generate traces instead. I will still document what I found here for reference.
In the AMD ROCm documentation, the two tools I investigated are ROCProfiler and ROCTracer. ROCProfiler can generate application-level traces, such as API call traces, but not actual hardware traces, so it is not helpful for my project. It can, however, gather hardware counter metrics such as the L1 cache hit count, which could be useful if you want to compare these metrics with simulation results. ROCTracer has no hardware trace capability either; it serves as an API for developers building tools around application-level traces only.
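If you do want to compare hardware counters against simulation results, a minimal sketch of the comparison could look like the following. It assumes you have already exported both sets of counters into plain dictionaries keyed by counter name; the function name compare_counters and the dictionary layout are my own placeholders, not something ROCProfiler or MGPUSim produces directly.

```python
def compare_counters(hardware, simulated):
    """Return {counter name: relative difference} for counters in both dicts.

    Both arguments are hypothetical {name: value} dictionaries that you
    would fill in yourself from the profiler and simulator outputs.
    """
    diffs = {}
    # Only counters reported by both sources can be compared.
    for name in sorted(hardware.keys() & simulated.keys()):
        hw = hardware[name]
        sim = simulated[name]
        if hw == 0:
            # Relative difference is undefined for a zero baseline.
            diffs[name] = None
            continue
        diffs[name] = (sim - hw) / hw
    return diffs
```

A positive value means the simulator reports more events than the hardware for that counter, and a negative value means fewer.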
In addition to the ROCm tools, I also looked into the Radeon Developer Panel, RGA (Radeon™ GPU Analyzer), and RGP (Radeon™ GPU Profiler), but these tools target the graphics side of the GPU rather than GPGPU. They might be helpful for applications such as game engine renderers, though.
As for the MGPUSim simulator, you can find its repo here: https://gitlab.com/akita/mgpusim. To generate hardware traces from it, simply run a program with the -isa-debug -timing flags. You can also visualize the traces in execution order using the web trace tool here: https://gitlab.com/akita/mgpusim/-/tree/v2/emu/isadebugger.
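If you need to collect traces from several programs, it may be convenient to script the runs. The sketch below only wraps the two flags mentioned above. The helper names trace_command and run_with_trace are my own, and the binary path you pass in (for example a benchmark you have built from the repo) is an assumption about your local setup.

```python
import subprocess


def trace_command(binary, extra_args=()):
    """Build the argument list for a trace-generating run."""
    # -timing and -isa-debug are the two flags described in the text;
    # extra_args holds any program-specific options you want to add.
    return [binary, "-timing", "-isa-debug", *extra_args]


def run_with_trace(binary, extra_args=(), cwd=None):
    """Run the program with tracing enabled and wait for it to finish."""
    # check=True raises CalledProcessError if the program exits non-zero.
    return subprocess.run(trace_command(binary, extra_args), cwd=cwd, check=True)
```

Calling run_with_trace with the path to your built program, and cwd set to the directory where you want the output, runs it once with tracing enabled.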