NVTX C API Reference v3
NVIDIA Tools Extension Library
Data Structures
| struct | nvtxMemCudaArrayRangeDesc_v1 |
| Structure describing memory in a CUDA runtime array object (cudaArray_t). | |
| struct | nvtxMemCuArrayRangeDesc_v1 |
| Structure describing memory in a CUDA driver array object (CUarray). | |
| struct | nvtxMemMarkInitializedBatch_v1 |
| Mark memory ranges as initialized. | |
Macros
| #define | NVTX_MEM_TYPE_CUDA_ARRAY 0x11 |
| The memory is from a CUDA runtime array. | |
| #define | NVTX_MEM_TYPE_CU_ARRAY 0x12 |
| The memory is from a CUDA driver API array (CUarray). | |
| #define | NVTX_MEM_CUDA_PEER_ALL_DEVICES -1 |
Functions
| NVTX_DECLSPEC nvtxMemPermissionsHandle_t NVTX_API | nvtxMemCudaGetProcessWidePermissions (nvtxDomainHandle_t domain) |
| Get the permission object that represents the CUDA runtime device or CUDA driver context. | |
| NVTX_DECLSPEC nvtxMemPermissionsHandle_t NVTX_API | nvtxMemCudaGetDeviceWidePermissions (nvtxDomainHandle_t domain, int device) |
| Get the permission object that represents the CUDA runtime device or CUDA driver context. | |
| NVTX_DECLSPEC void NVTX_API | nvtxMemCudaSetPeerAccess (nvtxDomainHandle_t domain, nvtxMemPermissionsHandle_t permissions, int devicePeer, uint32_t flags) |
| Change the default behavior for all memory mapped in from a particular device. | |
| NVTX_DECLSPEC void NVTX_API | nvtxMemCudaMarkInitialized (nvtxDomainHandle_t domain, cudaStream_t stream, uint8_t isPerThreadStream, nvtxMemMarkInitializedBatch_t const *desc) |
| Mark the memory ranges described by desc as initialized. | |
See page PAGE_MEMORY_CUDART.
| #define NVTX_MEM_CUDA_PEER_ALL_DEVICES -1 |
Definition at line 101 of file nvToolsExtMemCudaRt.h.
| #define NVTX_MEM_TYPE_CU_ARRAY 0x12 |
The memory is from a CUDA driver API array (CUarray).
Relevant functions: cuArrayCreate, cuArray3DCreate. Also applies to a CUarray obtained from other types, such as CUmipmappedArray.
NVTX_MEM_HEAP_HANDLE_PROCESS_WIDE is not supported.
nvtxMemHeapRegister receives a heapDesc of type cudaArray_t because the description can be retrieved by tools through cudaArrayGetInfo().
nvtxMemRegionRegisterEx receives a regionDesc of type nvtxMemCuArrayRangeDesc_t.
Definition at line 84 of file nvToolsExtMemCudaRt.h.
| #define NVTX_MEM_TYPE_CUDA_ARRAY 0x11 |
The memory is from a CUDA runtime array.
Relevant functions: cudaMallocArray, cudaMalloc3DArray. Also applies to a cudaArray_t obtained from other types, such as cudaMipmappedArray_t.
NVTX_MEM_HEAP_HANDLE_PROCESS_WIDE is not supported.
nvtxMemHeapRegister receives a heapDesc of type cudaArray_t because the description can be retrieved by tools through cudaArrayGetInfo().
nvtxMemRegionRegisterEx receives a regionDesc of type nvtxMemCudaArrayRangeDesc_t.
Definition at line 58 of file nvToolsExtMemCudaRt.h.
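The pairing between memory type and region descriptor described above can be sketched as a small lookup. This is only an illustration: demo_region_desc_type is a hypothetical helper, not part of NVTX, and the macros are redefined locally (with the values listed on this page) so the sketch compiles without the NVTX headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from this page; guarded in case the NVTX headers are present. */
#ifndef NVTX_MEM_TYPE_CUDA_ARRAY
#define NVTX_MEM_TYPE_CUDA_ARRAY 0x11
#endif
#ifndef NVTX_MEM_TYPE_CU_ARRAY
#define NVTX_MEM_TYPE_CU_ARRAY 0x12
#endif

/* Hypothetical helper (not an NVTX function): returns the name of the
 * regionDesc type that nvtxMemRegionRegisterEx expects for the given
 * array memory type, or NULL for a type not covered on this page. */
static const char *demo_region_desc_type(uint32_t mem_type)
{
    switch (mem_type) {
    case NVTX_MEM_TYPE_CUDA_ARRAY:
        return "nvtxMemCudaArrayRangeDesc_t"; /* CUDA runtime array */
    case NVTX_MEM_TYPE_CU_ARRAY:
        return "nvtxMemCuArrayRangeDesc_t";   /* CUDA driver array */
    default:
        return NULL;
    }
}
```

A caller could use such a lookup to validate that a descriptor matches the memory type before registering a region.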
| NVTX_DECLSPEC nvtxMemPermissionsHandle_t NVTX_API nvtxMemCudaGetDeviceWidePermissions (nvtxDomainHandle_t domain, int device) |
Get the permission object that represents the CUDA runtime device or CUDA driver context.
This object allows developers to adjust the permissions applied to work executed on the GPU. It may be inherited or overridden by a permissions object bound with NVTX_MEM_PERMISSIONS_BIND_SCOPE_CUDA_STREAM, depending on the binding flags.
For example, change the peer-to-peer access permissions between devices in their entirety, or punch through special holes.
By default, all memory that would naturally be accessible to a CUDA kernel is accessible, until modified by nvtxMemCudaSetPeerAccess or by changing regions.
This object should also represent the CUDA driver API level context.
| NVTX_DECLSPEC nvtxMemPermissionsHandle_t NVTX_API nvtxMemCudaGetProcessWidePermissions (nvtxDomainHandle_t domain) |
Get the permission object that represents the CUDA runtime device or CUDA driver context.
This object allows developers to adjust the permissions applied to work executed on the GPU. It may be inherited or overridden by a permissions object bound with NVTX_MEM_PERMISSIONS_BIND_SCOPE_CUDA_STREAM, depending on the binding flags.
For example, change the peer-to-peer access permissions between devices in their entirety, or punch through special holes.
By default, all memory that would naturally be accessible to a CUDA kernel is accessible, until modified by nvtxMemCudaSetPeerAccess or by changing regions.
This object should also represent the CUDA driver API level context.
| NVTX_DECLSPEC void NVTX_API nvtxMemCudaMarkInitialized (nvtxDomainHandle_t domain, cudaStream_t stream, uint8_t isPerThreadStream, nvtxMemMarkInitializedBatch_t const *desc) |
Mark the memory ranges described by desc as initialized.
stream is the CUDA stream where the range was accessed and initialized.
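A batch descriptor of this kind typically carries a region type, an element count, an element size, and a pointer to the elements. The sketch below illustrates that shape only. Both structs and all field names are hypothetical stand-ins (prefixed demo_), not the real nvtxMemMarkInitializedBatch_t layout, which should be taken from nvToolsExtMemCudaRt.h; the macro value is the one listed on this page.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef NVTX_MEM_TYPE_CUDA_ARRAY
#define NVTX_MEM_TYPE_CUDA_ARRAY 0x11 /* value from this page */
#endif

/* Hypothetical stand-in for one CUDA array sub-range descriptor. */
typedef struct demo_array_range {
    const void *array;  /* stands in for a cudaArray_t handle */
    size_t offset[3];   /* start of the sub-range */
    size_t extent[3];   /* size of the sub-range */
} demo_array_range;

/* Hypothetical stand-in for a mark-initialized batch (assumed layout). */
typedef struct demo_mark_initialized_batch {
    uint32_t region_type;        /* one of the NVTX_MEM_TYPE_* values */
    size_t region_count;         /* number of descriptors in the batch */
    size_t region_element_size;  /* size of one descriptor, in bytes */
    const void *regions;         /* pointer to the first descriptor */
} demo_mark_initialized_batch;

/* Fill a batch that refers to `count` array ranges; no data is copied. */
static demo_mark_initialized_batch
demo_make_batch(const demo_array_range *ranges, size_t count)
{
    demo_mark_initialized_batch batch;
    batch.region_type = NVTX_MEM_TYPE_CUDA_ARRAY;
    batch.region_count = count;
    batch.region_element_size = sizeof ranges[0];
    batch.regions = ranges;
    return batch;
}
```

With the real headers, the filled descriptor would be passed as desc to nvtxMemCudaMarkInitialized together with the stream on which the ranges were initialized.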
| NVTX_DECLSPEC void NVTX_API nvtxMemCudaSetPeerAccess (nvtxDomainHandle_t domain, nvtxMemPermissionsHandle_t permissions, int devicePeer, uint32_t flags) |
Change the default behavior for all memory mapped in from a particular device.
While all memory typically defaults to readable and writable, users may want to reduce the default permissions, for example to read-only, on a per-device basis.
Regions can be used to further override smaller windows of memory.
devicePeer can be NVTX_MEM_CUDA_PEER_ALL_DEVICES.
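As a rough sketch of the call shape, the helper below applies one default to memory mapped in from every peer device by passing NVTX_MEM_CUDA_PEER_ALL_DEVICES as devicePeer. To stay compilable without the NVTX headers it takes the setter as a function pointer with the same parameter list as nvtxMemCudaSetPeerAccess. The demo_ types, the stub, and DEMO_FLAG_READ are hypothetical stand-ins; the real handle types and permission flag values come from the NVTX memory extension headers.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef NVTX_MEM_CUDA_PEER_ALL_DEVICES
#define NVTX_MEM_CUDA_PEER_ALL_DEVICES -1 /* value from this page */
#endif

typedef void *demo_domain_t;      /* stands in for nvtxDomainHandle_t */
typedef void *demo_permissions_t; /* stands in for nvtxMemPermissionsHandle_t */

/* Hypothetical flag value, used only by this sketch. */
#define DEMO_FLAG_READ 0x1u

/* Same parameter list as nvtxMemCudaSetPeerAccess on this page. */
typedef void (*demo_set_peer_access_fn)(demo_domain_t domain,
                                        demo_permissions_t permissions,
                                        int devicePeer,
                                        uint32_t flags);

/* State recorded by the stub so the call can be inspected. */
static int demo_call_count = 0;
static int demo_last_peer = 0;
static uint32_t demo_last_flags = 0;

/* Stub setter: records its arguments instead of calling into NVTX. */
static void demo_stub_set_peer_access(demo_domain_t domain,
                                      demo_permissions_t permissions,
                                      int devicePeer,
                                      uint32_t flags)
{
    (void)domain;
    (void)permissions;
    demo_call_count += 1;
    demo_last_peer = devicePeer;
    demo_last_flags = flags;
}

/* Apply `flags` as the default for memory mapped in from all peer devices. */
static void demo_set_default_for_all_peers(demo_set_peer_access_fn set_peer_access,
                                           demo_domain_t domain,
                                           demo_permissions_t permissions,
                                           uint32_t flags)
{
    set_peer_access(domain, permissions, NVTX_MEM_CUDA_PEER_ALL_DEVICES, flags);
}
```

In real code the function pointer would simply be nvtxMemCudaSetPeerAccess, and permissions would be a handle obtained from nvtxMemCudaGetDeviceWidePermissions or nvtxMemCudaGetProcessWidePermissions.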