This paper describes a performance study of Remote Direct Memory Access (RDMA) programming techniques, with the goal of using the results as a guide for making "best practice" RDMA programming decisions. InfiniBand RDMA is widely used in scientific high performance computing (HPC) clusters as a low-latency, high-bandwidth, reliable interconnect accessed via MPI. Recently it has been gaining adherents outside scientific HPC, as high-speed clusters appear in other application areas for which MPI is not suitable. Because RDMA enables user applications to move data directly between virtual memory on different nodes without operating system intervention, programmers need to know how to incorporate RDMA access into high-level programs. However, RDMA offers a programmer more options than traditional sockets programming does, and the performance tradeoffs among these options are not always obvious. This study is intended to provide some answers.