File buffer_collectives.h

Defines

GA_COMM_ID_BYTES

sizeof(gpucommCliqueId)

Typedefs

typedef struct _gpucomm gpucomm

Enums

gpucomm_reduce_ops

Values:

0

sum arrays element-wise across ranks

1

multiply arrays element-wise across ranks

2

take the element-wise maximum of arrays across ranks

3

take the element-wise minimum of arrays across ranks
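As an illustration of what "element-wise across ranks" means, the following host-side sketch folds one rank's array into an accumulator. It does not call the library; the `sketch_*` and `SKETCH_*` names are hypothetical, and only the numeric codes 0 to 3 are taken from the listing above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the numeric codes 0..3 come from the listing. */
enum sketch_reduce_op {
  SKETCH_SUM = 0,
  SKETCH_PROD = 1,
  SKETCH_MAX = 2,
  SKETCH_MIN = 3
};

/* Combine two values according to the operation code. */
static int sketch_combine(int op, int a, int b) {
  switch (op) {
    case SKETCH_SUM:  return a + b;
    case SKETCH_PROD: return a * b;
    case SKETCH_MAX:  return a > b ? a : b;
    case SKETCH_MIN:  return a < b ? a : b;
    default:
      assert(0 && "unknown operation code");
      return 0;
  }
}

/* Fold `in` into `acc` element by element: acc[i] = op(acc[i], in[i]). */
static void sketch_accumulate(int op, int *acc, const int *in, size_t count) {
  for (size_t i = 0; i < count; i++)
    acc[i] = sketch_combine(op, acc[i], in[i]);
}
```

Calling `sketch_accumulate` once per remaining rank gives the reduced array for that operation.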

Functions

int gpucomm_new(gpucomm ** comm, gpucontext * ctx, gpucommCliqueId comm_id, int ndev, int rank)

Create a new gpu communicator instance.

This must be called in parallel by all participants in the same world. The call will block until all participants have joined in. The world is defined by a shared comm_id.

Return

error code or GA_NO_ERROR if success

Parameters
  • comm: pointer to get a new gpu communicator

  • ctx: gpu context in which comm will be used (contains device information)

  • comm_id: id shared by all communicators that make up a world

  • ndev: number of communicators/devices participating in the world

  • rank: user-defined rank, from 0 to ndev-1. Must be unique for the world.
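The rank constraint above (each rank in 0 to ndev-1, no duplicates) can be checked before any participant calls gpucomm_new. The helper below is a hypothetical, library-independent sketch of such a check; `sketch_ranks_valid` is not part of this header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if ranks[0..ndev-1] is a permutation of 0..ndev-1,
 * i.e. every rank is in range and no rank is used twice. */
static bool sketch_ranks_valid(const int *ranks, int ndev) {
  assert(ranks != NULL);
  if (ndev <= 0)
    return false;

  bool *seen = calloc((size_t)ndev, sizeof *seen);
  if (seen == NULL)
    return false;

  bool ok = true;
  for (int i = 0; i < ndev && ok; i++) {
    int r = ranks[i];
    if (r < 0 || r >= ndev || seen[r])
      ok = false;        /* out of range or duplicated */
    else
      seen[r] = true;
  }
  free(seen);
  return ok;
}
```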

void gpucomm_free(gpucomm * comm)

Destroy a gpu communicator instance.

Parameters
  • comm: gpu communicator to be destroyed

const char* gpucomm_error(gpucontext * ctx)

Returns a human-readable error message describing a GA_COMM_ERROR.

Return

useful backend error message

Parameters
  • ctx: gpu context in which communicator was used

gpucontext* gpucomm_context(gpucomm * comm)

Returns gpu context in which comm is used.

Return

gpu context

Parameters
  • comm: gpu communicator

int gpucomm_gen_clique_id(gpucontext * ctx, gpucommCliqueId * comm_id)

Creates a unique comm_id.

The id is guaranteed to be unique on the same host, but not necessarily across hosts.

Return

error code or GA_NO_ERROR if success

Parameters
  • ctx: gpu context

  • comm_id: pointer to instance containing id

int gpucomm_get_count(gpucomm * comm, int * devcount)

Returns total number of devices participating in comm’s world.

Return

error code or GA_NO_ERROR if success

Parameters
  • comm: gpu communicator

  • devcount: pointer to store the number of devices

int gpucomm_get_rank(gpucomm * comm, int * rank)

Returns the rank of comm inside its world.

Return

error code or GA_NO_ERROR if success

Parameters
  • comm: gpu communicator

  • rank: pointer to store the rank

int gpucomm_reduce(gpudata * src, size_t offsrc, gpudata * dest, size_t offdest, size_t count, int typecode, int opcode, int root, gpucomm * comm)

Reduce collective operation for ranks in a communicator world [buffer level].

Note

Non-root ranks may pass NULL for dest. In that case, offdest is ignored.

Note

Must be called separately for each rank in comm.

Return

error code or GA_NO_ERROR if success

Parameters
  • src: data in device’s buffer to be reduced

  • offsrc: memory offset after which data is saved in buffer src

  • dest: data in device’s buffer to collect result

  • offdest: memory offset after which data will be saved in buffer dest

  • count: number of elements to be reduced in each array

  • typecode: elements’ data type

  • opcode: reduce operation code (see gpucomm_reduce_ops)

  • root: rank in comm which will collect result

  • comm: gpu communicator
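To make the role of root and the NULL dest note concrete, here is a host-side model of a sum reduce over plain int arrays. It is only a sketch of the semantics described above, not the library's implementation: `sketch_reduce_sum` is a hypothetical name, ranks are modeled as rows of a pointer array, and offsets are left out.

```c
#include <assert.h>
#include <stddef.h>

/* src[r] is rank r's input of `count` ints; dest[r] is rank r's output
 * buffer, which may be NULL for every rank except `root`. */
static void sketch_reduce_sum(const int *const *src, int *const *dest,
                              size_t count, int ndev, int root) {
  assert(root >= 0 && root < ndev);
  assert(dest[root] != NULL);       /* the root must supply a destination */

  for (size_t i = 0; i < count; i++) {
    int total = 0;
    for (int r = 0; r < ndev; r++)
      total += src[r][i];
    dest[root][i] = total;          /* only the root's buffer is written */
  }
}
```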

int gpucomm_all_reduce(gpudata * src, size_t offsrc, gpudata * dest, size_t offdest, size_t count, int typecode, int opcode, gpucomm * comm)

AllReduce collective operation for ranks in a communicator world [buffer level].

Reduces the data pointed to by src using the opcode operation and leaves an identical copy of the result in the data pointed to by dest on each rank of comm.

Note

Must be called separately for each rank in comm.

Return

error code or GA_NO_ERROR if success

Parameters
  • src: data in device’s buffer to be reduced

  • offsrc: memory offset after which data is saved in buffer src

  • dest: data in device’s buffer to collect result

  • offdest: memory offset after which data will be saved in buffer dest

  • count: number of elements to be reduced in each array

  • typecode: elements’ data type

  • opcode: reduce operation code (see gpucomm_reduce_ops)

  • comm: gpu communicator
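The "identical copies on each rank" behaviour can be modeled on the host as follows. This is a minimal sketch using an element-wise maximum over float arrays; `sketch_all_reduce_max` is a hypothetical helper and ranks are modeled as rows of a pointer array rather than devices.

```c
#include <assert.h>
#include <stddef.h>

/* src[r] and dest[r] are rank r's input and output arrays of `count`
 * floats. Every rank's dest ends up holding the same reduced array. */
static void sketch_all_reduce_max(const float *const *src, float *const *dest,
                                  size_t count, int ndev) {
  assert(ndev >= 1);
  for (size_t i = 0; i < count; i++) {
    float best = src[0][i];
    for (int r = 1; r < ndev; r++)
      if (src[r][i] > best)
        best = src[r][i];
    for (int r = 0; r < ndev; r++)
      dest[r][i] = best;            /* unlike reduce, all ranks are written */
  }
}
```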

int gpucomm_reduce_scatter(gpudata * src, size_t offsrc, gpudata * dest, size_t offdest, size_t count, int typecode, int opcode, gpucomm * comm)

ReduceScatter collective operation for ranks in a communicator world [buffer level].

Reduces the data pointed to by src using the opcode operation and leaves the reduced result scattered across the data pointed to by dest, in the user-defined rank order of comm.

Note

Must be called separately for each rank in comm.

Return

error code or GA_NO_ERROR if success

Parameters
  • src: data in device’s buffer to be reduced

  • offsrc: memory offset after which data is saved in buffer src

  • dest: data in device’s buffer to collect scattered result

  • offdest: memory offset after which data will be saved in buffer dest

  • count: number of elements to be contained in result dest

  • typecode: elements’ data type

  • opcode: reduce operation code (see gpucomm_reduce_ops)

  • comm: gpu communicator
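The sketch below models one plausible reading of the layout: since count is the number of elements in each rank's dest, each src is assumed to hold ndev * count elements, and rank r receives the r-th block of the reduced array. That block layout is an assumption based on the parameter descriptions above, and `sketch_reduce_scatter_sum` is a hypothetical host-side helper.

```c
#include <assert.h>
#include <stddef.h>

/* src[k] holds ndev * count ints on rank k (assumed layout);
 * dest[r] holds count ints on rank r. */
static void sketch_reduce_scatter_sum(const int *const *src, int *const *dest,
                                      size_t count, int ndev) {
  assert(ndev >= 1);
  for (int r = 0; r < ndev; r++) {
    for (size_t i = 0; i < count; i++) {
      int total = 0;
      for (int k = 0; k < ndev; k++)
        total += src[k][(size_t)r * count + i];
      dest[r][i] = total;           /* rank r keeps block r of the result */
    }
  }
}
```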

int gpucomm_broadcast(gpudata * array, size_t offset, size_t count, int typecode, int root, gpucomm * comm)

Broadcast collective operation for ranks in a communicator world [buffer level].

Copies the data pointed to by array on the root rank into array on every other rank in comm.

Note

Must be called separately for each rank in comm.

Return

error code or GA_NO_ERROR if success

Parameters
  • array: data in device’s buffer to be copied (on root) or to receive the copy (on other ranks)

  • offset: memory offset after which data in array begins

  • count: number of elements to be contained in array

  • typecode: elements’ data type

  • root: rank in comm which broadcasts its array

  • comm: gpu communicator
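Because array serves as the source on root and as the destination everywhere else, a host-side model of broadcast needs only one buffer per rank. The following is a sketch of that behaviour with a hypothetical `sketch_broadcast` helper; it does not use the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* array[r] is rank r's buffer of `count` ints. After the call, every
 * buffer holds a copy of the root's contents. */
static void sketch_broadcast(int *const *array, size_t count,
                             int ndev, int root) {
  assert(root >= 0 && root < ndev);
  for (int r = 0; r < ndev; r++) {
    if (r != root)
      memcpy(array[r], array[root], count * sizeof(int));
  }
}
```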

int gpucomm_all_gather(gpudata * src, size_t offsrc, gpudata * dest, size_t offdest, size_t count, int typecode, gpucomm * comm)

AllGather collective operation for ranks in a communicator world.

Each rank receives the data pointed to by src on every rank, arranged in the user-defined rank order of comm.

Note

Must be called separately for each rank in comm.

Return

error code or GA_NO_ERROR if success

Parameters
  • src: data in device’s buffer to be gathered

  • offsrc: memory offset after which data in src begins

  • dest: data in device’s buffer to gather from all ranks

  • offdest: memory offset after which data in dest begins

  • count: number of elements to be gathered from each rank in src

  • typecode: elements’ data type

  • comm: gpu communicator
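A host-side model of the gather layout, assuming each dest holds ndev * count elements with rank k's contribution in block k. The helper name `sketch_all_gather` is hypothetical, and the block layout is inferred from "user-defined rank order" rather than stated explicitly above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* src[k] holds count ints on rank k; dest[r] holds ndev * count ints on
 * rank r (assumed layout). Block k of every dest is a copy of src[k]. */
static void sketch_all_gather(const int *const *src, int *const *dest,
                              size_t count, int ndev) {
  assert(ndev >= 1);
  for (int r = 0; r < ndev; r++) {
    for (int k = 0; k < ndev; k++)
      memcpy(dest[r] + (size_t)k * count, src[k], count * sizeof(int));
  }
}
```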

struct gpucommCliqueId
#include <buffer_collectives.h>

Dummy struct that fixes the length of the id byte array through a type.

Public Members

char gpucommCliqueId::internal[GA_COMM_ID_BYTES]
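Wrapping a fixed-size byte array in a struct lets the id be passed by value and measured with sizeof, which helps when the id has to be handed to the other participants of a world. The sketch below uses a stand-in type; `sketch_clique_id`, the helper names, and the 128-byte size are assumptions for illustration, not values taken from this header.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_ID_BYTES 128   /* assumed size, for illustration only */

/* Stand-in with the same shape as the struct documented above. */
typedef struct {
  char internal[SKETCH_ID_BYTES];
} sketch_clique_id;

/* A struct wrapping an array can be passed and returned by value,
 * which a bare char array cannot. */
static sketch_clique_id sketch_copy_id(sketch_clique_id id) {
  return id;
}

/* Copy the id into a flat byte buffer, e.g. before sending it to the
 * other participants through some out-of-band channel. */
static void sketch_pack_id(const sketch_clique_id *id, char *wire) {
  memcpy(wire, id->internal, sizeof id->internal);
}

/* Rebuild the id from a flat byte buffer on the receiving side. */
static void sketch_unpack_id(sketch_clique_id *id, const char *wire) {
  memcpy(id->internal, wire, sizeof id->internal);
}
```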