If ib_umem_release() is called after ib_uverbs_close() sets context->closing,
then a process can get stuck in a D state, because the code boils down to
if (down_write_trylock(&mm->mmap_sem))
down_write(&mm->mmap_sem);
which is obviously a stupid instant deadlock. Fix the code so that we
only try to take the lock once.
This bug was introduced in commit f7c6a7b5 ("IB/uverbs: Export
ib_umem_get()/ib_umem_release() to modules") which fortunately never
made it into a release, and was reported by Pete Wyckoff <pw@osc.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/cm: Improve local id allocation
IPoIB/cm: Fix SRQ WR leak
IB/ipoib: Fix typos in error messages
IB/mlx4: Check if SRQ is full when posting receive
IB/mlx4: Pass send queue sizes from userspace to kernel
IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
mlx4_core: Fix array overrun in dump_dev_cap_flags()
IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
IB/mthca: Fix RESET to ERROR transition
IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
IB/mthca: Set GRH:HopLimit when building MLX headers
IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
IB/mthca: Fix use-after-free on device restart
IB/ehca: Return proper error code if register_mr fails
IPoIB: Handle P_Key table reordering
IB/core: Use start_port() and end_port()
IB/core: Add helpers for uncached GID and P_Key searches
IB/ipath: Fix potential deadlock with multicast spinlocks
IB/core: Free umem when mm is already gone
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Free umem when task's mm is already destroyed by the time
ib_umem_release gets called.
Found by Dotan Barak at Mellanox.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm->locked_vm. However, ib_umem_release() may be called with
mm->mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.
However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation. If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.
Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.
Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.
This has a number of advantages:
- It is better design from the standpoint of making generic code a
library that can be used or overridden by device-specific code as
the details of specific devices dictate.
- Drivers that do not need to pin userspace memory regions do not
need to take the performance hit of calling ib_mem_get(). For
example, although I have not tried to implement it in this patch,
the ipath driver should be able to avoid pinning memory and just
use copy_{to,from}_user() to access userspace memory regions.
- Buffers that need special mapping treatment can be identified by
the low-level driver. For example, it may be possible to solve
some Altix-specific memory ordering issues with mthca CQs in
userspace by mapping CQ buffers with extra flags.
- Drivers that need to pin and DMA map userspace memory for things
other than memory regions can use ib_umem_get() directly, instead
of hacks using extra parameters to their reg_phys_mr method. For
example, the mlx4 driver that is pending being merged needs to pin
and DMA map QP and CQ buffers, but it does not need to create a
memory key for these buffers. So the cleanest solution is for mlx4
to call ib_umem_get() in the create_qp and create_cq methods.
Signed-off-by: Roland Dreier <rolandd@cisco.com>