Commit Graph

14940 Commits

Author SHA1 Message Date
Frederic Weisbecker
1d2c6cfd40 kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block()
GFP_ATOMIC was used in reiserfs_get_block to not lose the Bkl so that
nobody can modify the tree in the middle of its work. Now that we
kicked out the bkl, we can use a more friendly flag. We use GFP_NOFS
here because we already hold the reiserfs lock.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-11-20 18:25:02 +01:00
Frederic Weisbecker
27b3a5c51b kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0()
We had a watchdog in _get_block_create_0() that jumped to a fixup retry
path in case the bkl got relaxed while calling kmap().
This is not necessary anymore since we now have a reiserfs lock that is
not implicitly relaxed while sleeping.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 23:34:31 +02:00
Frederic Weisbecker
205cb37b89 kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl()
The reiserfs ioctl path doesn't need the big kernel lock anymore , now
that the filesystem synchronizes through its own lock.

We can then turn reiserfs_ioctl() into an unlocked_ioctl callback.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 23:28:12 +02:00
Frederic Weisbecker
ac78a07893 kill-the-bkl/reiserfs: always lock the ioctl path
Reiserfs uses the ioctl callback for its file operations, which means
that its ioctl path is still locked by the bkl, this was synchronizing
with the rest of the filsystem operations. We have changed that by
locking it with the new reiserfs lock but we do that only from the
compat_ioctl callback.

Fix that by locking reiserfs_ioctl() everytime.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 23:27:57 +02:00
Frederic Weisbecker
48f6ba5e69 kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency
While creating the reiserfs workqueue during the journal
initialization, we are holding the reiserfs lock, but
create_workqueue() also holds the cpu_add_remove_lock, creating
then the following dependency:

- reiserfs lock -> cpu_add_remove_lock

But we also have the following existing dependencies:

- mm->mmap_sem -> reiserfs lock
- cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

The merged dependency chain then becomes:

- mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock ->
	cpu_hotplug.lock -> slub_lock -> sysfs_mutex

But when we fill a dir entry in sysfs_readir(), we are holding the
sysfs_mutex and we also might fault while copying the directory entry
to the user, leading to the following dependency:

- sysfs_mutex -> mm->mmap_sem

The end result is then a lock inversion between sysfs_mutex and
mm->mmap_sem, as reported in the following lockdep warning:

[ INFO: possible circular locking dependency detected ]
2.6.31-07095-g25a3912 #4
-------------------------------------------------------
udevadm/790 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c1098942>] might_fault+0x72/0xc0

but task is already holding lock:
 (sysfs_mutex){+.+.+.}, at: [<c110813c>] sysfs_readdir+0x7c/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (sysfs_mutex){+.+.+.}:
      [...]

-> #4 (slub_lock){+++++.}:
      [...]

-> #3 (cpu_hotplug.lock){+.+.+.}:
      [...]

-> #2 (cpu_add_remove_lock){+.+.+.}:
      [...]

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
      [...]

-> #0 (&mm->mmap_sem){++++++}:
      [...]

This can be fixed by relaxing the reiserfs lock while creating the
workqueue.
This is fine to relax the lock here, we just keep it around to pass
through reiserfs lock checks and for paranoid reasons.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-10-05 16:31:37 +02:00
Frederic Weisbecker
193be0ee17 kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency
Alexander Beregalov reported the following warning:

	=======================================================
	[ INFO: possible circular locking dependency detected ]
	2.6.31-03149-gdcc030a #1
	-------------------------------------------------------
	udevadm/716 is trying to acquire lock:
	 (&mm->mmap_sem){++++++}, at: [<c107249a>] might_fault+0x4a/0xa0

	but task is already holding lock:
	 (sysfs_mutex){+.+.+.}, at: [<c10cb9aa>] sysfs_readdir+0x5a/0x200

	which lock already depends on the new lock.

	the existing dependency chain (in reverse order) is:

	-> #3 (sysfs_mutex){+.+.+.}:
	       [...]

	-> #2 (&bdev->bd_mutex){+.+.+.}:
	       [...]

	-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
	       [...]

	-> #0 (&mm->mmap_sem){++++++}:
	       [...]

On reiserfs mount path, we take the reiserfs lock and while
initializing the journal, we open the device, taking the
bdev->bd_mutex. Then rescan_partition() may signal the change
to sysfs.

We have then the following dependency:

	reiserfs_lock -> bd_mutex -> sysfs_mutex

Later, while entering reiserfs_readpage() after a pagefault in an
mmaped reiserfs file, we are holding the mm->mmap_sem, and we are going
to take the reiserfs lock too.
We have then the following dependency:

	mm->mmap_sem -> reiserfs_lock

which, expanded with the previous dependency gives us:

	mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex

Now while entering the sysfs readdir path, we are holding the
sysfs_mutex. And when we copy a directory entry to the user buffer, we
might fault and then take the mm->mmap_sem lock. Which leads to the
circular locking dependency reported.

We can fix that by relaxing the reiserfs lock during the call to
journal_init_dev(), which is the place where we open the mounted
device.

This is fine to relax the lock here because we are in the begining of
the reiserfs mount path and there is nothing to protect at this time,
the journal is not intialized.
We just keep this lock around for paranoid reasons.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-09-17 05:31:37 +02:00
Frederic Weisbecker
8050318598 kill-the-bkl/reiserfs: panic in case of lock imbalance
Until now, trying to unlock the reiserfs write lock whereas the current
task doesn't hold it lead to a simple warning.
We should actually warn and panic in this case to avoid the user datas
to reach an unstable state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-09-14 07:18:30 +02:00
Frederic Weisbecker
7e94277050 kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write()
reiserfs_commit_write() is always called with the write lock held.
Thus the current calls to reiserfs_write_lock() in this function are
acquiring the lock recursively.
We can safely drop them.

This also solves further assumptions for this lock to be really
released while calling reiserfs_write_unlock().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-09-14 07:18:29 +02:00
Frederic Weisbecker
b10ab4c337 kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir()
reiserfs_mkdir() acquires the reiserfs lock, assuming it has been called
from the dir inodes callbacks, without the lock held.

But it can also be called from other internal sites such as
reiserfs_xattr_init() which already holds the lock. This recursive
locking leads to further wrong assumptions. For example, later calls
to reiserfs_mutex_lock_safe() won't actually unlock the reiserfs lock
the time we acquire a given mutex, creating unexpected lock inversions.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-09-14 07:18:27 +02:00
Frederic Weisbecker
ae635c0bbd kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency
reiserfs_xattr_init is called with the reiserfs write lock held, but
if the ".reiserfs_priv" entry is not created, we take the superblock
root directory inode mutex until .reiserfs_priv is created.

This creates a lock dependency inversion against other sites such as
reiserfs_file_release() which takes an inode mutex and the reiserfs
lock after.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-09-14 07:18:26 +02:00
Frederic Weisbecker
08f14fc896 kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock
When do_balance() balances the tree, a trick is performed to
provide the ability for other tree writers/readers to check whether
do_balance() is executing concurrently (requires CONFIG_REISERFS_CHECK).

This is done to protect concurrent accesses to the tree. The trick
is the following:

When do_balance is called, a unique global variable called cur_tb
takes a pointer to the current tree to be rebalanced.
Once do_balance finishes its work, cur_tb takes the NULL value.

Then, concurrent tree readers/writers just have to check the value
of cur_tb to ensure do_balance isn't executing concurrently.
If it is, then it proves that schedule() occured on do_balance(),
which then relaxed the bkl that protected the tree.

Now that the bkl has be turned into a mutex, this check is still
fine even though do_balance() becomes preemptible: the write lock
will not be automatically released on schedule(), so the tree is
still protected.

But this is only fine if we have a single reiserfs mountpoint.
Indeed, because the bkl is a global lock, it didn't allowed
concurrent executions between a tree reader/writer in a mount point
and a do_balance() on another tree from another mountpoint.

So assuming all these readers/writers weren't supposed to be
reentrant, the current check now sometimes detect false positives with
the current per-superblock mutex which allows this reentrancy.

This patch keeps the concurrent tree accesses check but moves it
per superblock, so that only trees from a same mount point are
checked to be not accessed concurrently.

[ Impact: fix spurious panic while running several reiserfs mount-points ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:25 +02:00
Frederic Weisbecker
c72e05756b kill-the-bkl/reiserfs: acquire the inode mutex safely
While searching a pathname, an inode mutex can be acquired
in do_lookup() which calls reiserfs_lookup() which in turn
acquires the write lock.

On the other side reiserfs_fill_super() can acquire the write_lock
and then call reiserfs_lookup_privroot() which can acquire an
inode mutex (the root of the mount point).

So we theoretically risk an AB - BA lock inversion that could lead
to a deadlock.

As for other lock dependencies found since the bkl to mutex
conversion, the fix is to use reiserfs_mutex_lock_safe() which
drops the lock dependency to the write lock.

[ Impact: fix a possible deadlock with reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:24 +02:00
Frederic Weisbecker
2ac626955e kill-the-bkl/reiserfs: unlock only when needed in search_by_key
search_by_key() is the site which most requires the lock.
This is mostly because it is a very central function and also
because it releases/reaqcuires the write lock at least once each
time it is called.

Such release/reacquire creates a lot of contention in this place and
also opens more the window which let another thread changing the tree.
When it happens, the current path searching over the tree must be
retried from the beggining (the root) which is a wasteful and
time consuming recovery.

This patch factorizes two release/reacquire sequences:

- reading leaf nodes blocks
- reading current block

The latter immediately follows the former.

The whole sequence is safe as a single unlocked section because
we check just after if the tree has changed during these operations.

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:22 +02:00
Frederic Weisbecker
c63e3c0b24 kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe
reiserfs_mutex_lock_safe() is a hack to avoid any dependency between
an internal reiserfs mutex and the write lock, it has been proposed
to follow the old bkl logic.

The code does the following:

while (!mutex_trylock(m)) {
	reiserfs_write_unlock(s);
	schedule();
	reiserfs_write_lock(s);
}

It then imitate the implicit behaviour of the lock when it was
a Bkl and hadn't such dependency:

mutex_lock(m) {
	if (fastpath)
		let's go
	else {
		wait_for_mutex() {
			schedule() {
				unlock_kernel()
				reacquire_lock_kernel()
			}
		}
	}
}

The problem is that by using such explicit schedule(), we don't
benefit of the adaptive mutex spinning on owner.

The logic in use now is:

reiserfs_write_unlock(s);
mutex_lock(m); // -> possible adaptive spinning
reiserfs_write_lock(s);

[ Impact: restore the use of adaptive spinning mutexes in reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:21 +02:00
Frederic Weisbecker
d6f5b0aa08 kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end()
reiserfs_write_end() is a hot path in reiserfs.
We have two wasteful write lock lock/release inside that can be gathered
without changing the code logic.

This patch factorizes them out in a single protected section, reducing the
number of contentions inside.

[ Impact: reduce lock contention in a reiserfs hotpath ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:20 +02:00
Frederic Weisbecker
09eb47a7c5 kill-the-bkl/reiserfs: reduce number of contentions in search_by_key()
search_by_key() is a central function in reiserfs which searches
the patch in the fs tree from the root to a node given its key.

It is the function that is most requesting the write lock
because it's a path very often used.

Also we forget to release the lock while reading the next tree node,
making us holding the lock in a wasteful way.

Then we release the lock while reading the current node and its childs,
all-in-one. It should be safe because we have a reference to these
blocks and even if we read a block that will be concurrently changed,
we have an fs_changed check later that will make us retry the path from
the root.

[ Impact: release the write lock while unused in a hot path ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:19 +02:00
Frederic Weisbecker
b1c839bb2d kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup()
The write lock can be acquired recursively in reiserfs_lookup(). But we may
want to *really* release the lock before possible rescheduling from a
reiserfs_lookup() callee.

Hence we want to only acquire the lock once (ie: not recursively).

[ Impact: prevent from possible false unreleased write lock on sleeping ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:17 +02:00
Frederic Weisbecker
26931309a4 kill-the-bkl/reiserfs: lock only once on reiserfs_get_block()
reiserfs_get_block() is one of these sites where the write lock might
be acquired recursively.

It's a particular problem because this function is called very often.
It's a hot spot which needs to reschedule() periodically while converting
direct items to indirect ones because it can take some time.

Then if we are applying the write lock release/reacquire pattern on
schedule() here, it may not produce the desired effect since we may have
locked in more than one depth.

The solution is to use reiserfs_write_lock_once() which won't try
to reacquire the lock recursively. Then the lock will be *really*
released before schedule().

Also, we only release the lock if TIF_NEED_RESCHED is set to not
create wasteful numerous contentions.

[ Impact: fix a too long holded lock case in reiserfs_get_block() ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:16 +02:00
Frederic Weisbecker
6e3647acb4 kill-the-BKL/reiserfs: release the write lock on flush_commit_list()
flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
ll_rw_block() might sleep, and the bkl was released at this point. Then
we can also relax the write lock at this point.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:13 +02:00
Frederic Weisbecker
4c5eface5d kill-the-BKL/reiserfs: release the write lock inside reiserfs_read_bitmap_block()
reiserfs_read_bitmap_block() uses sb_bread() to read the bitmap block. This
helper might sleep.

Then, when the bkl was used, it was released at this point. We can then
relax the write lock too here.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:11 +02:00
Frederic Weisbecker
148d3504c1 kill-the-BKL/reiserfs: release the write lock inside get_neighbors()
get_neighbors() is used to get the left and/or right blocks
against a given one in order to balance a tree.

sb_bread() is used to read the buffer of these neighors blocks and
while it waits for this operation, it might sleep.

The bkl was released at this point, and then we can also release
the write lock before calling sb_bread().

This is safe because if the filesystem is changed after this
lock release, the function returns REPEAT_SEARCH (aka SCHEDULE_OCCURRED
in the function header comments) in order to repeat the neighbhor
research.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:10 +02:00
Frederic Weisbecker
5e69e3a449 kill-the-BKL/reiserfs: release write lock while rescheduling on prepare_for_delete_or_cut()
prepare_for_delete_or_cut() can process several types of items, including
indirect items, ie: items which contain no file data but pointers to
unformatted nodes scattering the datas of a file.

In this case it has to zero out these pointers to block numbers of
unformatted nodes and release the bitmap from these block numbers.

It can take some time, so a rescheduling() is performed between each
block processed. We can safely release the write lock while
rescheduling(), like the bkl did, because the code checks just after
if the item has moved after sleeping.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:09 +02:00
Frederic Weisbecker
e6950a4da3 kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end()
When do_journal_end() copies data to the journal blocks buffers in memory,
it reschedules if needed between each block copied and dirtyfied.

We can also release the write lock at this rescheduling stage,
like did the bkl implicitly.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:08 +02:00
Frederic Weisbecker
dc8f6d8936 kill-the-BKL/reiserfs: only acquire the write lock once in reiserfs_dirty_inode
Impact: fix a deadlock

reiserfs_dirty_inode() is the super_operations::dirty_inode() callback
of reiserfs. It can be called from different contexts where the write
lock can be already held.

But this function also grab the write lock (possibly recursively).
Subsequent release of the lock before sleep will actually not release
the lock if the caller of mark_inode_dirty() (which in turn calls
reiserfs_dirty_inode()) already owns the lock.

A typical case:

reiserfs_write_end() {
	acquire_write_lock()
	mark_inode_dirty() {
		reiserfs_dirty_inode() {
			reacquire_write_lock() {
				journal_begin() {
					do_journal_begin_r() {
						/*
						 * fail to release, still
						 * one depth of lock
						 */
						release_write_lock()
						reiserfs_wait_on_write_block() {
							wait_event()

The event is usually provided by something which needs the write lock but
it hasn't been released.

We use reiserfs_write_lock_once() here to ensure we only grab the
write lock in one level.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:18:04 +02:00
Frederic Weisbecker
22c963addc kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file
Impact: fix a deadlock

reiserfs_truncate_file() can be called from multiple context where
the write lock can be already hold or not.

This function also acquire (possibly recursively) the write
lock. Subsequent releases before sleeping will not actually release
the lock because we may be in more than one lock depth degree.

A typical case is:

reiserfs_file_release {
	acquire_the_lock()
	reiserfs_truncate_file()
		reacquire_the_lock()
		journal_begin() {
			do_journal_begin_r() {
				reiserfs_wait_on_write_block() {
					/*
					 * Not released because still one
					 * depth owned
					 */
					release_lock()
					wait_for_event()

At this stage the event never happen because the one which provides
it needs the write lock.

We use reiserfs_write_lock_once() here to ensure that we don't acquire the
write lock recursively.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:18:03 +02:00
Frederic Weisbecker
daf88c8983 kill-the-BKL/reiserfs: provide a tool to lock only once the write lock
Sometimes we don't want to recursively hold the per superblock write
lock because we want to be sure it is actually released when we come
to sleep.

This patch introduces the necessary tools for that.

reiserfs_write_lock_once() does the same job than reiserfs_write_lock()
except that it won't try to acquire recursively the lock if the current
task already owns it. Also the lock_depth before the call of this function
is returned.

reiserfs_write_unlock_once() unlock only if reiserfs_write_lock_once()
returned a depth equal to -1, ie: only if it actually locked.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:18:02 +02:00
Frederic Weisbecker
a412f9efdd reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock
Impact: fix a deadlock

The j_flush_mutex is acquired safely in journal.c:
if we can't take it, we free the reiserfs per superblock lock
and wait a bit.

But we have a remaining place in kupdate_transactions() where
j_flush_mutex is still acquired traditionnaly. Thus the following
scenario (warned by lockdep) can happen:

A						B

mutex_lock(&write_lock)			mutex_lock(&write_lock)
	mutex_lock(&j_flush_mutex)	mutex_lock(&j_flush_mutex) //block
	mutex_unlock(&write_lock)
	sleep...
	mutex_lock(&write_lock) //deadlock

Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:18:01 +02:00
Frederic Weisbecker
8ebc423238 reiserfs: kill-the-BKL
This patch is an attempt to remove the Bkl based locking scheme from
reiserfs and is intended.

It is a bit inspired from an old attempt by Peter Zijlstra:

   http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html

The bkl is heavily used in this filesystem to prevent from
concurrent write accesses on the filesystem.

Reiserfs makes a deep use of the specific properties of the Bkl:

- It can be acqquired recursively by a same task
- It is released on the schedule() calls and reacquired when schedule() returns

The two properties above are a roadmap for the reiserfs write locking so it's
very hard to simply replace it with a common mutex.

- We need a recursive-able locking unless we want to restructure several blocks
  of the code.
- We need to identify the sites where the bkl was implictly relaxed
  (schedule, wait, sync, etc...) so that we can in turn release and
  reacquire our new lock explicitly.
  Such implicit releases of the lock are often required to let other
  resources producer/consumer do their job or we can suffer unexpected
  starvations or deadlocks.

So the new lock that replaces the bkl here is a per superblock mutex with a
specific property: it can be acquired recursively by a same task, like the
bkl.

For such purpose, we integrate a lock owner and a lock depth field on the
superblock information structure.

The first axis on this patch is to turn reiserfs_write_(un)lock() function
into a wrapper to manage this mutex. Also some explicit calls to
lock_kernel() have been converted to reiserfs_write_lock() helpers.

The second axis is to find the important blocking sites (schedule...(),
wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
release of the write lock on these locations before blocking. Then we can
safely wait for those who can give us resources or those who need some.
Typically this is a fight between the current writer, the reiserfs workqueue
(aka the async commiter) and the pdflush threads.

The third axis is a consequence of the second. The write lock is usually
on top of a lock dependency chain which can include the journal lock, the
flush lock or the commit lock. So it's dangerous to release and trying to
reacquire the write lock while we still hold other locks.

This is fine with the bkl:

      T1                       T2

lock_kernel()
    mutex_lock(A)
    unlock_kernel()
    // do something
                            lock_kernel()
                                mutex_lock(A) -> already locked by T1
                                schedule() (and then unlock_kernel())
    lock_kernel()
    mutex_unlock(A)
    ....

This is not fine with a mutex:

      T1                       T2

mutex_lock(write)
    mutex_lock(A)
    mutex_unlock(write)
    // do something
                           mutex_lock(write)
                              mutex_lock(A) -> already locked by T1
                              schedule()

    mutex_lock(write) -> already locked by T2
    deadlock

The solution in this patch is to provide a helper which releases the write
lock and sleep a bit if we can't lock a mutex that depend on it. It's another
simulation of the bkl behaviour.

The last axis is to locate the fs callbacks that are called with the bkl held,
according to Documentation/filesystem/Locking.

Those are:

- reiserfs_remount
- reiserfs_fill_super
- reiserfs_put_super

Reiserfs didn't need to explicitly lock because of the context of these callbacks.
But now we must take care of that with the new locking.

After this patch, reiserfs suffers from a slight performance regression (for now).
On UP, a high volume write with dd reports an average of 27 MB/s instead
of 30 MB/s without the patch applied.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:17:59 +02:00
Mimi Zohar
acd0c93517 IMA: update ima_counts_put
- As ima_counts_put() may be called after the inode has been freed,
verify that the inode is not NULL, before dereferencing it.

- Maintain the IMA file counters in may_open() properly, decrementing
any counter increments on subsequent errors.

Reported-by: Ciprian Docan <docan@eden.rutgers.edu>
Reported-by: J.R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-07 11:54:58 +10:00
Linus Torvalds
5136a6c0fd Merge git://git.infradead.org/~dwmw2/mtd-2.6.31
* git://git.infradead.org/~dwmw2/mtd-2.6.31:
  JFFS2: add missing verify buffer allocation/deallocation
  mtd: nftl: fix offset alignments
  mtd: nftl: write support is broken
  mtd: m25p80: fix null pointer dereference bug
2009-09-05 14:57:04 -07:00
Linus Torvalds
0edfa2b1b5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: actually enable the swapext compat handler
2009-09-05 14:25:14 -07:00
Linus Torvalds
5a09adf130 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key
2009-09-05 14:24:33 -07:00
Nicolas Pitre
9de6886ec6 ext2: fix unbalanced kmap()/kunmap()
In ext2_rename(), dir_page is acquired through ext2_dotdot().  It is
then released through ext2_set_link() but only if old_dir != new_dir.
Failing that, the pkmap reference count is never decremented and the
page remains pinned forever.  Repeat that a couple times with highmem
pages and all pkmap slots get exhausted, and every further kmap() calls
end up stalling on the pkmap_map_wait queue at which point the whole
system comes to a halt.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 13:41:08 -07:00
Linus Torvalds
ac7ac9f2b9 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: ocfs2_write_begin_nolock() should handle len=0
  ocfs2: invalidate dentry if its dentry_lock isn't initialized.
2009-09-05 13:38:37 -07:00
Oleg Nesterov
a2a8474c3f exec: do not sleep in TASK_TRACED under ->cred_guard_mutex
Tom Horsley reports that his debugger hangs when it tries to read
/proc/pid_of_tracee/maps, this happens since

	"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"
	04b836cbf19e885f8366bccb2e4b0474346c02d

commit in 2.6.31.

But the root of the problem lies in the fact that do_execve() path calls
tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.

The tracee must not sleep in TASK_TRACED holding this mutex.  Even if we
remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
another task doing PTRACE_ATTACH should not hang until it is killed or the
tracee resumes.

With this patch do_execve() does not use ->cred_guard_mutex directly and
we do not hold it throughout, instead:

	- introduce prepare_bprm_creds() helper, it locks the mutex
	  and calls prepare_exec_creds() to initialize bprm->cred.

	- install_exec_creds() drops the mutex after commit_creds(),
	  and thus before tracehook_report_exec()->ptrace_stop().

	  or, if exec fails,

	  free_bprm() drops this mutex when bprm->cred != NULL which
	  indicates install_exec_creds() was not called.

Reported-by: Tom Horsley <tom.horsley@att.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 11:30:42 -07:00
Sunil Mushran
8379e7c46c ocfs2: ocfs2_write_begin_nolock() should handle len=0
Bug introduced by mainline commit e7432675f8
The bug causes ocfs2_write_begin_nolock() to oops when len=0.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 14:28:31 -07:00
Massimo Cirillo
bc8cec0dff JFFS2: add missing verify buffer allocation/deallocation
The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer
if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, so causing a kernel panic when
that macro is enabled and the verify function is called. Similarly the
jffs2_nor_wbuf_flash_cleanup() must free the buffer if
CONFIG_JFFS2_FS_WBUF_VERIFY is enabled.
The following patch fixes the problem.
The following patch applies to 2.6.30 kernel.

Signed-off-by: Massimo Cirillo <maxcir@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
2009-09-03 15:01:34 +01:00
Christoph Hellwig
3725867dcc xfs: actually enable the swapext compat handler
Fix a small typo in the compat ioctl handler that cause the swapext
compat handler to never be called.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01 17:00:46 -05:00
Ian Kent
37d0892c5a autofs4 - fix missed case when changing to use struct path
In the recent change by Al Viro that changes verious subsystems
to use "struct path" one case was missed in the autofs4 module
which causes mounts to no longer expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-31 17:44:05 -10:00
Ryusuke Konishi
b1f1b8ce0a nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key
This will fix the following preempt count underflow reported from
users with the title "[NILFS users] segctord problem" (Message-ID:
<949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID:
<debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>):

 WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0()
 Hardware name: HP Compaq 6530b (KR980UT#ABC)
 Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod
 Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7
 Call Trace:
  [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0
  [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0
  [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20
  [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0
  [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2]
  [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2]
  [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2]
  [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2]
  [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2]
  [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40
  [<ffffffff803c06e0>] ? __up_write+0xe0/0x150
  [<ffffffff80262959>] ? up_write+0x9/0x10
  [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2]
  [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2]
  [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2]
  [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2]
  [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
  [<ffffffff80252633>] ? add_timer+0x13/0x20
  [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90
  [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffff8025e556>] kthread+0x56/0x90
  [<ffffffff8020cdea>] child_rip+0xa/0x20
  [<ffffffff8025e500>] ? kthread+0x0/0x90
  [<ffffffff8020cde0>] ? child_rip+0x0/0x20

This problem was caused due to a missing radix_tree_preload() call in
the retry path of nilfs_btnode_prepare_change_key() function.

Reported-by: Eric A <eric225125@yahoo.com>
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jerome Poulin <jeromepoulin@gmail.com>
Cc: stable@kernel.org
2009-08-31 12:03:06 +09:00
Eric Paris
750a8870fe inotify: update the group mask on mark addition
Seperating the addition and update of marks in inotify resulted in a
regression in that inotify never gets events.  The inotify group mask is
always 0.  This mask should be updated any time a new mark is added.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28 12:51:14 -04:00
Eric Paris
83cb10f0ef inotify: fix length reporting and size checking
0db501bd06 introduced a regresion in that it now sends a nul
terminator but the length accounting when checking for space or
reporting to userspace did not take this into account.  This corrects
all of the rounding logic.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28 11:57:55 -04:00
Brian Rogers
b962e7312a inotify: do not send a block of zeros when no pathname is available
When an event has no pathname, there's no need to pad it with a null byte and
therefore generate an inotify_event sized block of zeros. This fixes a
regression introduced by commit 0db501bd06 where
my system wouldn't finish booting because some process was being confused by
this.

Signed-off-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28 10:03:06 -04:00
Tao Ma
a1b08e75df ocfs2: invalidate dentry if its dentry_lock isn't initialized.
In commit a5a0a63092, when
ocfs2_attch_dentry_lock fails, we call an extra iput and reset
dentry->d_fsdata to NULL. This resolve a bug, but it isn't
completed and the dentry is still there. When we want to use
it again, ocfs2_dentry_revalidate doesn't catch it and return
true. That make future ocfs2_dentry_lock panic out.
One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162.

The resolution is to add a check for dentry->d_fsdata in
revalidate process and return false if dentry->d_fsdata is NULL,
so that a new ocfs2_lookup will be called again.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-27 18:10:54 -07:00
Linus Torvalds
9c504cadc4 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: Ensure we alwasy write the terminating NULL.
  inotify: fix locking around inotify watching in the idr
  inotify: do not BUG on idr entries at inotify destruction
  inotify: seperate new watch creation updating existing watches
2009-08-27 12:26:02 -07:00
Linus Torvalds
cf481442f2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: update documentation pointers
  9p: remove unnecessary v9fses->options which duplicates the mount string
  net/9p: insulate the client against an invalid error code sent by a 9p server
  9p: Add missing cast for the error return value in v9fs_get_inode
  9p: Remove redundant inode uid/gid assignment
  9p: Fix possible regressions when ->get_sb fails.
  9p: Fix v9fs show_options
  9p: Fix possible memleak in v9fs_inode_from fid.
  9p: minor comment fixes
  9p: Fix possible inode leak in v9fs_get_inode.
  9p: Check for error in return value of v9fs_fid_add
2009-08-27 12:24:08 -07:00
David Howells
9886e836a6 AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr
kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.

Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-27 12:22:08 -07:00
Eric W. Biederman
0db501bd06 inotify: Ensure we alwasy write the terminating NULL.
Before the rewrite copy_event_to_user always wrote a terqminating '\0'
byte to user space after the filename.  Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size.  Ouch!

So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event.  I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27 08:02:10 -04:00
Eric Paris
dead537dd8 inotify: fix locking around inotify watching in the idr
The are races around the idr storage of inotify watches.  It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does it's removal.  Move the
locking and the refcnt'ing so that these have to happen atomically.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27 08:02:04 -04:00
Eric Paris
cf4374267f inotify: do not BUG on idr entries at inotify destruction
If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG.  This is not a dangerous situation and really
indicates a programming bug and leak of memory.  This patch changes it to
use a WARN and a printk rather than killing people's boxes.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27 08:02:04 -04:00