pygpu package¶
pygpu.gpuarray module¶
class pygpu.gpuarray.GpuArray¶
Device array
To create instances of this class use zeros(), empty() or array(). It cannot be instantiated directly.
You can also subclass this class and have the module create your instances by passing the cls argument to any method that returns a new GpuArray. Instances created this way will NOT have your __init__() method called.
You can also implement your own __init__() method, but you must take care to properly initialize the GpuArray C fields before using the object, or you will most likely crash the interpreter.
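For instance, a minimal creation sketch (the "cuda0" specifier and the shapes are illustrative; see init() below):

import numpy as np
import pygpu.gpuarray as gpuarray

ctx = gpuarray.init("cuda0")  # any valid device specifier works

a = gpuarray.zeros((3, 4), dtype='float32', context=ctx)
b = gpuarray.empty((3, 4), dtype='float32', context=ctx)
c = gpuarray.array(np.arange(12, dtype='float32').reshape(3, 4), context=ctx)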
T¶
astype(dtype, order='A', copy=True)¶
Cast the elements of this array to a new type.
This function returns a new array with all elements cast to the supplied dtype, but otherwise unchanged.
If copy is False and the type and order already match, self is returned.
Parameters
dtype (str or numpy.dtype or int) – type of the elements of the result
order ({'A', 'C', 'F'}) – memory layout of the result
copy (bool) – Always return a copy?
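A minimal sketch (assuming a is the float32 GpuArray created above):

b = a.astype('float64')               # new array with float64 elements
b2 = a.astype('float32', copy=False)  # type and order already match: returns a itself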
base¶
base_data¶
Return a pointer to the backing OpenCL object.
context¶
copy(order='C')¶
Return a copy of this array.
Parameters
order ({'C', 'A', 'F'}) – memory layout of the copy
data¶
Return a pointer to the raw OpenCL buffer object.
This will fail for arrays that have an offset.
dtype¶
The dtype of the elements.
flags¶
Return a flags object describing the properties of this array.
This is mostly numpy-compatible with some exceptions:
Flags are always constant (numpy allows modification of certain flags in certain circumstances).
OWNDATA is always True, since the data is refcounted in libgpuarray.
UPDATEIFCOPY is not supported, therefore always False.
get_ipc_handle()¶
gpudata¶
Return a pointer to the raw backend object.
itemsize¶
The size of the base element.
ndim¶
The number of dimensions in this object.
offset¶
Return the offset into the gpudata pointer for this array.
read(dst)¶
Reads from this GpuArray into a host Numpy array.
This method is as fast as, or faster than, the __array__() method (and thus numpy.asarray()), because it skips allocating a new buffer in host memory to hold the device's GpuArray: it uses an existing Numpy ndarray as the destination buffer. The GpuArray and the Numpy array must be compatible in byte size, contiguity and data type. dst must also be writeable and properly aligned in host memory, and self must be contiguous. This GpuArray and dst may have different shapes.
Parameters
dst (numpy.ndarray) – destination array in host
Raises
ValueError – If this GpuArray is not compatible with dst or if dst is not well behaved.
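A minimal sketch (assuming a is a contiguous GpuArray; dst must match it in byte size and dtype and be writeable):

import numpy as np

dst = np.empty(a.shape, dtype=a.dtype)  # preallocated host buffer
a.read(dst)                             # device-to-host copy, no new allocation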
reshape(shape, order='C')¶
Returns a new array with the given shape and order.
The new shape must have the same size (total number of elements) as the current one.
shape¶
Shape of this ndarray (tuple).
size¶
The number of elements in this object.
strides¶
Data pointer strides (in bytes).
sync()¶
Wait for all pending operations on this array.
This is done automatically when reading from or writing to the array, but can be useful as a separate operation for timing.
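A minimal timing sketch (run_some_kernels is a hypothetical stand-in for asynchronous device work):

import time

t0 = time.time()
run_some_kernels(a)  # hypothetical: queues asynchronous operations on a
a.sync()             # block until all pending operations on a complete
print(time.time() - t0)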
take1(idx)¶
transfer(new_ctx)¶
transpose(*params)¶
typecode¶
The gpuarray typecode for the data type of the array.
view(cls=GpuArray)¶
Return a view of this array.
The returned array shares device data with this one; changes to either will be reflected in the other.
Parameters
cls (type) – class of the view (must inherit from GpuArray)
write(src)¶
Writes a host Numpy array to this device GpuArray.
This method is as fast as, or faster than, asarray(), because it skips any allocation of a buffer in device memory: it writes src into this already-allocated GpuArray buffer. The GpuArray and the Numpy array must be compatible in byte size and data type. The GpuArray must also be well behaved and contiguous. If src is not suitably aligned or contiguous, it will first be copied to a new Numpy array that is. This GpuArray and src may have different shapes.
Parameters
src (numpy.ndarray) – source array in host
Raises
ValueError – If this GpuArray is not compatible with src or if it is not well behaved or contiguous.
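A minimal sketch mirroring read() (assuming a is a contiguous, well-behaved GpuArray of 12 float32 elements; shapes may differ as long as byte sizes match):

import numpy as np

src = np.arange(12, dtype='float32')
a.write(src)  # host-to-device copy into the existing buffer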
exception pygpu.gpuarray.GpuArrayException¶
Exception used for most errors related to libgpuarray.
class pygpu.gpuarray.GpuContext¶
Class that holds all the information pertaining to a context.
The currently implemented modules (for the kind parameter) are “cuda” and “opencl”. Which ones are available depends on the build options for libgpuarray.
The flag values are defined in the gpuarray/buffer.h header, in the “Context flags” group. If you want to use more than one value you must bitwise OR them together.
If you want an alternative interface, check init().
Parameters
kind (str) – module name for the context
devno (int) – device number
flags (int) – context flags
bin_id¶
Binary compatibility id.
devname¶
Device name for this context.
free_gmem¶
Size of free global memory on the device.
kind¶
largest_memblock¶
Size of the largest memory block you can allocate.
lmemsize¶
Size of the local (shared) memory, in bytes, for this context.
maxgsize0¶
Maximum global size for dimension 0.
maxgsize1¶
Maximum global size for dimension 1.
maxgsize2¶
Maximum global size for dimension 2.
maxlsize0¶
Maximum local size for dimension 0.
maxlsize1¶
Maximum local size for dimension 1.
maxlsize2¶
Maximum local size for dimension 2.
numprocs¶
Number of compute units for this context.
ptr¶
Raw pointer value for the context object.
total_gmem¶
Total size of global memory on the device.
unique_id¶
Device PCI Bus ID for this context.
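A minimal sketch querying some of these properties (the device specifier is illustrative):

import pygpu.gpuarray as gpuarray

ctx = gpuarray.init("cuda0")
print(ctx.devname, "with", ctx.numprocs, "compute units")
print(ctx.free_gmem, "of", ctx.total_gmem, "bytes of global memory free")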
class pygpu.gpuarray.GpuKernel(source, name, types, context=None, have_double=False, have_small=False, have_complex=False, have_half=False, cuda=False, opencl=False)¶
Compile a kernel on the device.
The kernel function is retrieved using the provided name, which must match what you named your kernel in source. You can safely reuse the same name multiple times.
The have_* parameters are there to tell libgpuarray that we need the particular type or feature to work for this kernel. If the request can't be satisfied, an UnsupportedException will be raised in the constructor.
Once you have the kernel object you can simply call it like so:
k = GpuKernel(...)
k(param1, param2, n=n)
where n is the minimum number of threads to run. libgpuarray will try to stay close to this number but may run a few more threads to match the hardware preferred multiple and stay efficient. You should watch out for this in your code and make sure to test against the size of your data.
If you want more control over thread allocation you can use the gs and ls parameters like so:
k = GpuKernel(...)
k(param1, param2, gs=gs, ls=ls)
If you choose to use this interface, make sure to stay within the limits of k.maxlsize or the call will fail.
Parameters
source (str) – complete kernel source code
name (str) – function name of the kernel
types (list or tuple) – list of argument types
context (GpuContext) – device on which the kernel is compiled
have_double (bool) – ensure working doubles?
have_small (bool) – ensure types smaller than float will work?
have_complex (bool) – ensure complex types will work?
have_half (bool) – ensure half-floats will work?
cuda (bool) – kernel is cuda code?
opencl (bool) – kernel is opencl code?
Notes
With the cuda backend, unless you use the cluda include, you must either pass the mangled name of your kernel or declare the function ‘extern “C”’, because cuda uses a C++ compiler unconditionally.
Warning
If you do not set the have_ flags properly, you will either get a device-specific error (the good case) or silently get completely bogus data (the bad case).
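A hedged end-to-end sketch for the cuda backend (the extern "C" declaration follows the note above; passing the GpuArray class for buffer arguments and dtype strings for scalars in types, and wrapping scalars as numpy scalars at call time, are assumptions):

import numpy as np
import pygpu.gpuarray as gpuarray

ctx = gpuarray.init("cuda0")
src = """
extern "C" __global__ void axpb(float *a, float b, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = a[i] + b;
}
"""
a = gpuarray.zeros((1024,), dtype='float32', context=ctx)
k = gpuarray.GpuKernel(src, "axpb", [gpuarray.GpuArray, 'float32', 'uint32'],
                       context=ctx, cuda=True)
k(a, np.float32(2.0), np.uint32(a.size), n=a.size)  # run at least a.size threads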
context¶
maxlsize¶
Maximum local size for this kernel.
numargs¶
Number of arguments to kernel.
preflsize¶
Preferred multiple for local size for this kernel.
exception pygpu.gpuarray.UnsupportedException¶
pygpu.gpuarray.abi_version()¶
pygpu.gpuarray.api_version()¶
pygpu.gpuarray.array(obj, dtype='float64', copy=True, order=None, ndmin=0, context=None, cls=None)¶
Create a GpuArray from existing data.
This function creates a new GpuArray from the data provided in obj, except if obj is already a GpuArray, all the parameters match its properties and copy is False.
The properties of the resulting array depend on the input data except if overridden by other parameters.
This function is similar to numpy.array() except that it returns GpuArrays.
Parameters
obj (array-like) – data to initialize the result
dtype (string or numpy.dtype or int) – data type of the result elements
copy (bool) – return a copy?
order (str) – memory layout of the result
ndmin (int) – minimum number of result dimensions
context (GpuContext) – allocation context
cls (type) – result class (must inherit from GpuArray)
pygpu.gpuarray.asarray(a, dtype=None, order='A', context=None)¶
Returns a GpuArray from the data in a.
If a is already a GpuArray and all other parameters match, then the object itself is returned. If a is an instance of a subclass of GpuArray then a view of the base class will be returned. Otherwise a new object is created and the data is copied into it.
context is optional if a is a GpuArray (but must match exactly the context of a if specified) and is mandatory otherwise.
Parameters
a (array-like) – data
dtype (str, numpy.dtype or int) – type of the elements
order ({'A', 'C', 'F'}) – layout of the data in memory, one of ‘A’ny, ‘C’ or ‘F’ortran
context (GpuContext) – context in which to do the allocation
pygpu.gpuarray.ascontiguousarray(a, dtype=None, context=None)¶
Returns a contiguous array in device memory (C order).
context is optional if a is a GpuArray (but must match exactly the context of a if specified) and is mandatory otherwise.
Parameters
a (array-like) – input
dtype (str, numpy.dtype or int) – type of the return array
context (GpuContext) – context to use for a new array
pygpu.gpuarray.asfortranarray(a, dtype=None, context=None)¶
Returns a contiguous array in device memory (Fortran order).
context is optional if a is a GpuArray (but must match exactly the context of a if specified) and is mandatory otherwise.
Parameters
a (array-like) – input
dtype (str, numpy.dtype or int) – type of the elements
context (GpuContext) – context in which to do the allocation
pygpu.gpuarray.cl_wrap_ctx(ptr)¶
Wrap an existing OpenCL context (the cl_context struct) into a GpuContext class.
pygpu.gpuarray.count_devices(kind, platform)¶
Returns the number of devices in the host platform compatible with kind.
pygpu.gpuarray.count_platforms(kind)¶
Returns the number of host platforms compatible with kind.
pygpu.gpuarray.cuda_wrap_ctx(ptr)¶
Wrap an existing CUDA driver context (CUcontext) into a GpuContext class.
If own is true, libgpuarray is now responsible for the context and it will be destroyed once there are no references to it. Otherwise, the context will not be destroyed and it is the calling code's responsibility.
pygpu.gpuarray.dtype_to_ctype(dtype)¶
Return the C name for a type.
Parameters
dtype (numpy.dtype) – type to get the name for
pygpu.gpuarray.dtype_to_typecode(dtype)¶
Get the internal typecode for a type.
Parameters
dtype (numpy.dtype) – type to get the code for
pygpu.gpuarray.empty(shape, dtype='float64', order='C', context=None, cls=None)¶
Returns an empty (uninitialized) array of the requested shape, type and order.
Parameters
shape (iterable of ints) – number of elements in each dimension
dtype (str, numpy.dtype or int) – type of the elements
order ({'A', 'C', 'F'}) – layout of the data in memory, one of ‘A’ny, ‘C’ or ‘F’ortran
context (GpuContext) – context in which to do the allocation
cls (type) – class of the returned array (must inherit from GpuArray)
class pygpu.gpuarray.flags¶
aligned¶
behaved¶
c_contiguous¶
carray¶
contiguous¶
f_contiguous¶
farray¶
fnc¶
forc¶
fortran¶
num¶
owndata¶
updateifcopy¶
writeable¶
pygpu.gpuarray.
from_gpudata
(data, offset, dtype, shape, context=None, strides=None, writable=True, base=None, cls=None)¶ Build a GpuArray from pre-allocated gpudata
- Parameters
data (int) – pointer to a gpudata structure
offset (int) – offset to the data location inside the gpudata
dtype (numpy.dtype) – data type of the gpudata elements
shape (iterable of ints) – shape to use for the result
context (GpuContext) – context of the gpudata
strides (iterable of ints) – strides for the results (C contiguous if not specified)
writable (bool) – is the data writable?
base (object) – base object that keeps gpudata alive
cls (type) – view type of the result
Notes
This function might be deprecated in a later release since the only way to create gpudata pointers is through libgpuarray functions that aren't exposed at the Python level. It can be used with the value of the gpudata attribute of an existing GpuArray.
Warning
This function is intended for advanced use and will crash the interpreter if used improperly.
pygpu.gpuarray.get_default_context()¶
Return the currently defined default context (or None).
pygpu.gpuarray.init(dev, sched='default', single_stream=False, kernel_cache_path=None, max_cache_size=sys.maxsize, initial_cache_size=0)¶
Creates a context from a device specifier.
Device specifiers are composed of the type string and the device id like so:
"cuda0" "opencl0:1"
For cuda the device id is the numeric identifier. You can see what devices are available by running nvidia-smi on the machine. Be aware that the ordering in nvidia-smi might not correspond to the ordering in this library. This is due to how cuda enumerates devices. If you don’t specify a number (e.g. ‘cuda’) the first available device will be selected according to the backend order.
For opencl the device id is the platform number, a colon (:) and the device number. There is no widespread and/or easy way to list available platforms and devices. You can experiment with the values; unavailable ones will just raise an error, and there are no gaps in the valid numbers.
Parameters
dev (str) – device specifier
sched ({'default', 'single', 'multi'}) – optimize scheduling for which type of operation
disable_alloc_cache (bool) – disable allocation cache (if any)
single_stream (bool) – enable single stream mode
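A minimal sketch of both specifier forms (the device numbers are illustrative):

import pygpu.gpuarray as gpuarray

ctx = gpuarray.init("cuda0")       # first CUDA device
ctx2 = gpuarray.init("opencl0:1")  # OpenCL platform 0, device 1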
pygpu.gpuarray.may_share_memory(a, b)¶
Returns True if a and b may share memory, False otherwise.
pygpu.gpuarray.open_ipc_handle(c, hpy, l)¶
Open an IPC handle to get a new GpuArray from it.
Parameters
c (GpuContext) – context
hpy (bytes) – binary handle data received
l (int) – size of the referred memory block
pygpu.gpuarray.register_dtype(dtype, cname)¶
Make a new type known to the cluda machinery.
This function returns the associated internal typecode for the new type.
Parameters
dtype (numpy.dtype) – new type
cname (str) – C name for the type declarations
pygpu.gpuarray.set_default_context(ctx)¶
Set the default context for the module.
The provided context will be used as a default value for all the other functions in this module which take a context as parameter. Call with None to clear the default value.
If you don't call this function, the context argument of those functions is mandatory.
This can be helpful to reduce clutter when working with only one context. It is strongly discouraged to use this function when working with multiple contexts at once.
Parameters
ctx (GpuContext) – default context
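A minimal sketch (the device specifier is illustrative):

import pygpu.gpuarray as gpuarray

ctx = gpuarray.init("cuda0")
gpuarray.set_default_context(ctx)
a = gpuarray.zeros((2, 3), dtype='float32')  # no context= needed anymore
gpuarray.set_default_context(None)           # clear the default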
pygpu.gpuarray.zeros(shape, dtype='float64', order='C', context=None, cls=None)¶
Returns an array of zero-initialized values of the requested shape, type and order.
Parameters
shape (iterable of ints) – number of elements in each dimension
dtype (str, numpy.dtype or int) – type of the elements
order ({'A', 'C', 'F'}) – layout of the data in memory, one of ‘A’ny, ‘C’ or ‘F’ortran
context (GpuContext) – context in which to do the allocation
cls (type) – class of the returned array (must inherit from GpuArray)
pygpu.elemwise module¶
class pygpu.elemwise.GpuElemwise¶
pygpu.elemwise.as_argument(o, name, read=False, write=False)¶
pygpu.elemwise.compare(a, op, b, broadcast=False, convert_f16=True)¶
pygpu.elemwise.elemwise1(a, op, oper=None, op_tmpl='res = %(op)sa', out=None, convert_f16=True)¶
pygpu.elemwise.elemwise2(a, op, b, ary, odtype=None, oper=None, op_tmpl='res = (%(out_t)s)a %(op)s (%(out_t)s)b', broadcast=False, convert_f16=True)¶
pygpu.elemwise.ielemwise2(a, op, b, oper=None, op_tmpl='a = a %(op)s b', broadcast=False, convert_f16=True)¶
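A hedged sketch of elemwise2 (reading the defaults above: op appears to be a C operator string spliced into op_tmpl, and ary appears to supply the output's properties; this interpretation is an assumption):

from pygpu import elemwise

# a and b: GpuArrays of matching shape in the same context
c = elemwise.elemwise2(a, '+', b, a)  # elementwise a + b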
pygpu.operations module¶
pygpu.operations.array_split(ary, indices_or_sections, axis=0)¶
pygpu.operations.atleast_1d(*arys)¶
pygpu.operations.atleast_2d(*arys)¶
pygpu.operations.atleast_3d(*arys)¶
pygpu.operations.concatenate(arys, axis=0, context=None)¶
pygpu.operations.dsplit(ary, indices_or_sections)¶
pygpu.operations.dstack(tup, context=None)¶
pygpu.operations.hsplit(ary, indices_or_sections)¶
pygpu.operations.hstack(tup, context=None)¶
pygpu.operations.split(ary, indices_or_sections, axis=0)¶
pygpu.operations.vsplit(ary, indices_or_sections)¶
pygpu.operations.vstack(tup, context=None)¶
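These functions appear to mirror their numpy namesakes for device arrays. A minimal sketch (assuming a and b are GpuArrays in the same context):

from pygpu import operations

c = operations.concatenate((a, b), axis=0)
top, bottom = operations.split(c, 2, axis=0)  # two equal parts along axis 0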
pygpu.reduction module¶
class pygpu.reduction.ReductionKernel(context, dtype_out, neutral, reduce_expr, redux, map_expr=None, arguments=None, preamble='', init_nd=None)¶
pygpu.reduction.massage_op(operation)¶
pygpu.reduction.parse_c_args(arguments)¶
pygpu.reduction.reduce1(ary, op, neutral, out_type, axis=None, out=None, oper=None)¶
pygpu.blas module¶
pygpu.blas.dot(X, Y, Z=None, overwrite_z=False)¶
pygpu.blas.gemm(alpha, A, B, beta, C=None, trans_a=False, trans_b=False, overwrite_c=False)¶
pygpu.blas.gemmBatch_3d(alpha, A, B, beta, C=None, trans_a=False, trans_b=False, overwrite_c=False)¶
pygpu.blas.gemv(alpha, A, X, beta=0.0, Y=None, trans_a=False, overwrite_y=False)¶
pygpu.blas.ger(alpha, X, Y, A=None, overwrite_a=False)¶
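These follow the usual BLAS conventions. A hedged sketch (assuming A and B are 2-D GpuArrays and x a 1-D GpuArray with compatible shapes, and that passing C=None or Y=None allocates the output):

from pygpu import blas

C = blas.gemm(1.0, A, B, 0.0)  # C = 1.0 * A . B
y = blas.gemv(1.0, A, x)       # y = 1.0 * A . x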
pygpu.collectives module¶
class pygpu.collectives.GpuComm(cid, ndev, rank)¶
Represents a communicator which participates in a multi-gpu clique.
It is used to invoke collective operations on gpus inside its clique.
Parameters
cid (GpuCommCliqueId) – Unique id shared among participating communicators.
ndev (int) – Number of communicators inside the clique.
rank (int) – User-defined rank of this communicator inside the clique. It influences order of collective operations.
all_gather(self, src, dest=None, nd_up=1)¶
AllGather collective operation for ranks in a communicator world.
Parameters
src (GpuArray) – Array to be gathered.
dest (GpuArray) – Array to receive all gathered arrays from ranks in GpuComm.
nd_up (int) – Used when creating result array. Indicates how many extra dimensions user wants result to have. Default is 1, which means that the result will store each rank’s gathered array in one extra new dimension.
Notes
Providing nd_up == 0 means that gathered arrays will be appended to the dimension with the largest stride.
all_reduce(self, src, op, dest=None)¶
AllReduce collective operation for ranks in a communicator world.
Parameters
Notes
Not providing the dest argument will result in a new compatible GpuArray being created and the result being returned in it.
broadcast(self, array, root=-1)¶
Broadcast collective operation for ranks in a communicator world.
Parameters
array (GpuArray) – Array to be broadcast.
root (int) – Rank in GpuComm which broadcasts its array.
Notes
root is necessary when invoking from a non-root rank. The root caller does not need to provide the root argument.
count¶
Total number of communicators inside the clique.
rank¶
User-defined rank of this communicator inside the clique.
reduce(self, src, op, dest=None, root=-1)¶
Reduce collective operation for ranks in a communicator world.
Parameters
Notes
root is necessary when invoking from a non-root rank. The root caller does not need to provide the root argument.
Not providing the dest argument for a root caller will result in a new compatible GpuArray being created and the result being returned in it.
reduce_scatter(self, src, op, dest=None)¶
ReduceScatter collective operation for ranks in a communicator world.
Parameters
Notes
Not providing the dest argument will result in a new compatible GpuArray being created and the result being returned in it.
class pygpu.collectives.GpuCommCliqueId(context=None, comm_id=None)¶
Represents a unique id shared among GpuComm communicators which participate in a multi-gpu clique.
Parameters
context (GpuContext) – Reference to the gpu this GpuCommCliqueId object belongs to.
comm_id (bytes) – Existing unique id to be passed in this object.
context¶
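A hedged sketch of the intended flow, with one process per device and some out-of-band channel (e.g. MPI) to share the clique id between processes; the comm_id attribute and the exchange mechanism are assumptions:

from pygpu import gpuarray, collectives

ctx = gpuarray.init("cuda" + str(rank))        # rank: this process's index
cid = collectives.GpuCommCliqueId(context=ctx)
# ... rank 0 sends cid.comm_id to the other ranks, which pass it back in
# via GpuCommCliqueId(context=ctx, comm_id=received_id) ...
comm = collectives.GpuComm(cid, nprocs, rank)  # nprocs: clique size
comm.broadcast(a, root=0)                      # a: GpuArray on every rank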
pygpu.dtypes module¶
Type mapping helpers.
pygpu.dtypes.dtype_to_ctype(dtype)¶
Return the C type that corresponds to dtype.
Parameters
dtype (data type) – a numpy dtype
pygpu.dtypes.get_common_dtype(obj1, obj2, allow_double)¶
Returns the proper output type for a numpy operation involving the two provided objects. This may not be suitable for certain obscure numpy operations.
If allow_double is False, a return type of float64 will be forced to float32 and complex128 will be forced to complex64.
pygpu.dtypes.get_np_obj(obj)¶
Returns a numpy object with the same dtype and behaviour as the source, suitable for output dtype determination.
This is used because the casting rules of numpy are rather obscure and the best way to imitate them is to try an operation and see what it does.
pygpu.dtypes.parse_c_arg_backend(c_arg, scalar_arg_class, vec_arg_class)¶
pygpu.dtypes.register_dtype(dtype, c_names)¶
Associate a numpy dtype with its C equivalents.
Will register dtype for use with the gpuarray module. If the c_names argument is a list, the first element of that list is taken as the primary association and will be used for generated C code. The other names will be mapped to the provided dtype when going in the other direction.
Parameters
dtype (numpy.dtype or string) – type to associate
c_names (str or list) – list of C type names
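A hedged sketch (the struct dtype and its C name are illustrative, and using a record dtype here is an assumption):

import numpy as np
from pygpu import dtypes

pair = np.dtype([('x', np.float32), ('y', np.float32)])
dtypes.register_dtype(pair, ['float_pair'])  # 'float_pair' must be declared in kernel code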
pygpu.dtypes.upcast(*args)¶
pygpu.tools module¶
pygpu.tools.as_argument(obj, name)¶
pygpu.tools.check_args(args, collapse=False, broadcast=False)¶
Returns the properties of the arguments and checks that they all match (are all the same shape).
Dimension collapsing is performed only if collapse is True.
If broadcast is True, array broadcasting is performed: dimensions that are of size 1 in some arrays but not others are repeated to match the size of the other arrays. If broadcast is False no broadcasting takes place.
pygpu.tools.lru_cache(maxsize=20)¶
pygpu.tools.prod(iterable)¶