697f4d68cf
Bunch of performance improvements and cleanups Zach Brown and I have been working on. The code should be pretty solid at this point, though it could of course use more review and testing.

The results in my testing are pretty impressive, particularly when an ioctx is being shared between multiple threads. In my crappy synthetic benchmark, with 4 threads submitting and one thread reaping completions, I saw overhead in the aio code go from ~50% (mostly ioctx lock contention) to low single digits. Performance with ioctx per thread improved too, but I'd have to rerun those benchmarks.

The reason I've been focused on performance when the ioctx is shared is that for a fair number of real world completions, userspace needs the completions aggregated somehow - in practice people just end up implementing this aggregation in userspace today, but if it's done right we can do it much more efficiently in the kernel.

Performance wise, the end result of this patch series is that submitting a kiocb writes to _no_ shared cachelines - the penalty for sharing an ioctx is gone there. There's still going to be some cacheline contention when we deliver the completions to the aio ringbuffer (at least if you have interrupts being delivered on multiple cores, which for high end stuff you do) but I have a couple more patches not in this series that implement coalescing for that (by taking advantage of interrupt coalescing). With that, there's basically no bottlenecks or performance issues to speak of in the aio code.

This patch:

use_mm() is used in more places than just aio. There's no need to mention callers when describing the function.
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
60 lines
1.2 KiB
C
/* Copyright (C) 2009 Red Hat, Inc.
 *
 * See ../COPYING for licensing terms.
 */

#include <linux/mm.h>
#include <linux/mmu_context.h>
#include <linux/export.h>
#include <linux/sched.h>

#include <asm/mmu_context.h>

/*
 * use_mm
 *	Makes the calling kernel thread take on the specified
 *	mm context.
 *	(Note: this routine is intended to be called only
 *	from a kernel thread context)
 */
void use_mm(struct mm_struct *mm)
{
	struct mm_struct *active_mm;
	struct task_struct *tsk = current;

	task_lock(tsk);
	active_mm = tsk->active_mm;
	if (active_mm != mm) {
		atomic_inc(&mm->mm_count);
		tsk->active_mm = mm;
	}
	tsk->mm = mm;
	switch_mm(active_mm, mm, tsk);
	task_unlock(tsk);

	if (active_mm != mm)
		mmdrop(active_mm);
}
EXPORT_SYMBOL_GPL(use_mm);

/*
 * unuse_mm
 *	Reverses the effect of use_mm, i.e. releases the
 *	specified mm context which was earlier taken on
 *	by the calling kernel thread
 *	(Note: this routine is intended to be called only
 *	from a kernel thread context)
 */
void unuse_mm(struct mm_struct *mm)
{
	struct task_struct *tsk = current;

	task_lock(tsk);
	sync_mm_rss(mm);
	tsk->mm = NULL;
	/* active_mm is still 'mm' */
	enter_lazy_tlb(mm, tsk);
	task_unlock(tsk);
}
EXPORT_SYMBOL_GPL(unuse_mm);
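
The reference-count handoff performed by use_mm() and unuse_mm() can be followed in a small userspace model. This is only a sketch under stated assumptions: `struct mm_model`, `struct task_model`, `model_mmdrop()`, `model_use_mm()` and `model_unuse_mm()` are hypothetical stand-ins invented for illustration, not kernel APIs. The model omits locking, `switch_mm()`, `sync_mm_rss()` and `enter_lazy_tlb()`, and uses a plain `int` where the kernel uses an atomic counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct: only the reference count. */
struct mm_model {
	int mm_count;
};

/* Hypothetical stand-in for struct task_struct: only the two mm pointers. */
struct task_model {
	struct mm_model *mm;        /* NULL while acting as a plain kernel thread */
	struct mm_model *active_mm; /* address space currently loaded */
};

/* Models mmdrop(): give up one reference (freeing is not modelled). */
static void model_mmdrop(struct mm_model *mm)
{
	mm->mm_count--;
}

/* Mirrors the pointer and refcount steps of use_mm(). */
static void model_use_mm(struct task_model *tsk, struct mm_model *mm)
{
	struct mm_model *active_mm = tsk->active_mm;

	if (active_mm != mm) {
		mm->mm_count++;         /* take a reference on the new mm */
		tsk->active_mm = mm;
	}
	tsk->mm = mm;

	if (active_mm != mm)
		model_mmdrop(active_mm); /* release the previously borrowed mm */
}

/* Mirrors the pointer steps of unuse_mm(). */
static void model_unuse_mm(struct task_model *tsk, struct mm_model *mm)
{
	(void)mm;
	tsk->mm = NULL;
	/* active_mm is still 'mm', so the reference taken above is kept */
}
```

In this model, a task that starts with `mm == NULL` and a borrowed `active_mm` ends up, after `model_use_mm()`, with both pointers set to the new mm, one extra reference on it, and one reference fewer on the old one. After `model_unuse_mm()` only `mm` is cleared; `active_mm` and its reference stay in place, which matches the `/* active_mm is still 'mm' */` comment in the listing.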