kLoadA Class — pytorch Architecture
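Architecture documentation for kLoadA in custom_mma_pipelined.h from the pytorch codebase. The entity is indexed here as a class, but the source below declares it as a boolean template parameter (default true) of the static prologue function.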
Source Code
aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm/custom_mma_pipelined.h lines 219–235
template <bool kLoadA = true, bool kLoadB = true>
CUTLASS_DEVICE static void prologue(
    typename Base::SharedStorage& shared_storage,
    ///< iterator over A operand in global memory
    IteratorA iterator_A,
    ///< iterator over B operand in global memory
    IteratorB iterator_B,
    int thread_idx,
    int problem_size_k) {
  prologue<kLoadA, kLoadB>(
      shared_storage.operand_A,
      shared_storage.operand_B,
      iterator_A,
      iterator_B,
      thread_idx,
      problem_size_k);
}
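The listing above only forwards to another prologue overload that takes the two operand buffers directly, so the effect of kLoadA and kLoadB is not visible on this page. The following is a minimal host-side sketch of that pattern in plain C++17, not the PyTorch or CUTLASS implementation. ToyMma, the simplified SharedStorage, the raw-pointer "iterators", and the use of `if constexpr` are all assumptions made for illustration, and thread_idx is omitted because the sketch is single-threaded.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Base::SharedStorage: one staging buffer per operand.
struct SharedStorage {
  std::array<int, 4> operand_A{};
  std::array<int, 4> operand_B{};
};

// Hypothetical toy class; it only mirrors the overload structure of the listing.
struct ToyMma {
  // Core overload: works directly on the two operand buffers.
  // kLoadA / kLoadB are compile-time flags, so a disabled copy is
  // discarded by `if constexpr` instead of being skipped at run time.
  template <bool kLoadA = true, bool kLoadB = true>
  static void prologue(std::array<int, 4>& operand_A,
                       std::array<int, 4>& operand_B,
                       const int* iterator_A,
                       const int* iterator_B,
                       int problem_size_k) {
    assert(problem_size_k >= 0 && problem_size_k <= 4);
    if constexpr (kLoadA) {
      for (int k = 0; k < problem_size_k; ++k) {
        operand_A[static_cast<std::size_t>(k)] = iterator_A[k];
      }
    }
    if constexpr (kLoadB) {
      for (int k = 0; k < problem_size_k; ++k) {
        operand_B[static_cast<std::size_t>(k)] = iterator_B[k];
      }
    }
  }

  // Convenience overload with the same shape as the listing: unpack the
  // shared storage and forward the flags unchanged to the core overload.
  template <bool kLoadA = true, bool kLoadB = true>
  static void prologue(SharedStorage& shared_storage,
                       const int* iterator_A,
                       const int* iterator_B,
                       int problem_size_k) {
    prologue<kLoadA, kLoadB>(shared_storage.operand_A,
                             shared_storage.operand_B,
                             iterator_A,
                             iterator_B,
                             problem_size_k);
  }
};
```

In this sketch, ToyMma::prologue<true, false>(storage, a, b, 3) copies three elements into operand_A and leaves operand_B untouched, while calling prologue without template arguments loads both operands because both flags default to true.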