Commit Graph

14900 Commits

Author SHA1 Message Date
David Woodhouse
83121942b2 Btrfs: Fix crash on read failures at mount
If the tree roots hit read errors during mount, btrfs is not properly
erroring out.  We need to check the uptodate bits after
reading in the tree root node.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 16:52:13 -04:00
Daniel Cadete
c271b49241 Btrfs: remove of redundant btrfs_header_level
This removes the continues call's of btrfs_header_level. One call of
btrfs_header_level(c) its enough.

Signed-off-by Daniel Cadete <danielncadete10@gmail.com>

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 16:52:13 -04:00
Julia Lawall
33c17ad571 Btrfs: adjust NULL test
Move the call to BUG_ON to before the dereference of the tested value.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 16:49:01 -04:00
David Woodhouse
3acada49c2 Btrfs: Remove broken sanity check from btrfs_rmap_block()
It was never actually doing anything anyway (see the loop condition),
and it would be difficult to make it work for RAID[56].

Even if it was actually working, it's checking for the wrong thing
anyway. Instead of checking whether we list a block which _doesn't_ land
at the relevant physical location, it should be checking that we _have_
listed all the logical blocks which refer to the required physical
location on all devices.

This function is only called from remove_sb_from_cache() to ensure that
we reserve the logical blocks which would reside at the same physical
location as the superblock copies. So listing more blocks than we need
is actually OK.

With RAID[56] we're going to throw away an entire stripe for each block
we have to ignore, so we _are_ going to list blocks other than the
ones which actually contain the superblock.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 16:49:01 -04:00
Julia Lawall
29c5e8ce01 Btrfs: convert nested spin_lock_irqsave to spin_lock
If spin_lock_irqsave is called twice in a row with the same second
argument, the interrupt state at the point of the second call overwrites
the value saved by the first call.  Indeed, the second call does not need
to save the interrupt state, so it is changed to a simple spin_lock.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 16:49:00 -04:00
Peter Zijlstra
023d43c7b5 lockdep: Fix lockdep annotation for pipe_double_lock()
The presumed use of the pipe_double_lock() routine is to lock 2 locks in
a deadlock free way by ordering the locks by their address. However it
fails to keep the specified lock classes in order and explicitly
annotates a deadlock.

Rectify this.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
LKML-Reference: <1248163763.15751.11098.camel@twins>
2009-07-22 21:14:14 +02:00
Linus Torvalds
1f9758d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  fs/Kconfig: move nilfs2 out
2009-07-22 10:05:00 -07:00
Yan Zheng
4a8c9a62d7 Btrfs: make sure all dirty blocks are written at commit time
Write dirty block groups may allocate new block, and so may add new delayed
back ref. btrfs_run_delayed_refs may make some block groups dirty.

commit_cowonly_roots does not handle the recursion properly, and some dirty
blocks can be left unwritten at commit time. This patch moves
btrfs_run_delayed_refs into the loop that writes dirty block groups, and makes
the code not break out of the loop until there are no dirty block groups or
delayed back refs.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 10:07:05 -04:00
Yan Zheng
33c66f430b Btrfs: fix locking issue in btrfs_find_next_key
When walking up the tree, btrfs_find_next_key assumes the upper level tree
block is properly locked. This isn't always true even path->keep_locks is 1.
This is because btrfs_find_next_key may advance path->slots[] several times
instead of only once.

When 'path->slots[level] >= btrfs_header_nritems(path->nodes[level])' is found,
we can't guarantee the original value of 'path->slots[level]' is
'btrfs_header_nritems(path->nodes[level]) - 1'. If it's not, the tree block at
'level + 1' isn't locked.

This patch fixes the issue by explicitly checking the locking state,
re-searching the tree if it's not locked.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 09:59:00 -04:00
Yan Zheng
e457afec60 Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf
if 1 is returned by btrfs_search_slot, the path already points to the
first item with 'key > searching key'. So increasing path->slots[0] by
one is superfluous in that case.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 09:59:00 -04:00
Yan Zheng
bf1fb512a5 Btrfs: properly update space information after shrinking device.
Change 'goto done' to 'break' for the case of all device extents have
been freed, so that the code updates space information will be execute.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 09:59:00 -04:00
Yan Zheng
1bec1aed1e Btrfs: fix definition of struct btrfs_extent_inline_ref
use __le64 instead of u64 in on-disk structure definition.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-22 09:59:00 -04:00
Trond Myklebust
d953126a28 NFSv4: Fix a problem whereby a buggy server can oops the kernel
We just had a case in which a buggy server occasionally returns the wrong
attributes during an OPEN call. While the client does catch this sort of
condition in nfs4_open_done(), and causes the nfs4_atomic_open() to return
-EISDIR, the logic in nfs_atomic_lookup() is broken, since it causes a
fallback to an ordinary lookup instead of just returning the error.

When the buggy server then returns a regular file for the fallback lookup,
the VFS allows the open, and bad things start to happen, since the open
file doesn't have any associated NFSv4 state.

The fix is firstly to return the EISDIR/ENOTDIR errors immediately, and
secondly to ensure that we are always careful when dereferencing the
nfs_open_context state pointer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 19:22:38 -04:00
Jan Kara
f7b1aa69be ocfs2: Fix deadlock on umount
In commit ea455f8ab6, we moved the dentry lock
put process into ocfs2_wq. This causes problems during umount because ocfs2_wq
can drop references to inodes while they are being invalidated by
invalidate_inodes() causing all sorts of nasty things (invalidate_inodes()
ending in an infinite loop, "Busy inodes after umount" messages etc.).

We fix the problem by stopping ocfs2_wq from doing any further releasing of
inode references on the superblock being unmounted, wait until it finishes
the current round of releasing and finally cleaning up all the references in
dentry_lock_list from ocfs2_put_super().

The issue was tracked down by Tao Ma <tao.ma@oracle.com>.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21 15:47:55 -07:00
Tao Ma
3c5e10683e ocfs2: Add extra credits and access the modified bh in update_edge_lengths.
In normal tree rotation left process, we will never touch the tree
branch above subtree_index and ocfs2_extend_rotate_transaction doesn't
reserve the credits for them either.

But when we want to delete the rightmost extent block, we have to update
the rightmost records for all the rightmost branch(See
ocfs2_update_edge_lengths), so we have to allocate extra credits for them.
What's more, we have to access them also.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21 14:41:54 -07:00
Trond Myklebust
fccba80455 NFSv4: Fix an NFSv4 mount regression
Commit 008f55d0e0 (nfs41: recover lease in
_nfs4_lookup_root) forces the state manager to always run on mount. This is
a bug in the case of NFSv4.0, which doesn't require us to send a
setclientid until we want to grab file state.

In any case, this is completely the wrong place to be doing state
management. Moving that code into nfs4_init_session...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 16:48:07 -04:00
Trond Myklebust
b64aec8d1e NFSv4: Fix an Oops in nfs4_free_lock_state
The oops http://www.kerneloops.org/raw.php?rawid=537858&msgid= appears to
be due to the nfs4_lock_state->ls_state field being uninitialised. This
happens if the call to nfs4_free_lock_state() is triggered at the end of
nfs4_get_lock_state().

The fix is to move the initialisation of ls_state into the allocator.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 16:47:46 -04:00
Eric Paris
f44aebcc56 inotify: use GFP_NOFS under potential memory pressure
inotify can have a watchs removed under filesystem reclaim.

=================================
[ INFO: inconsistent lock state ]
2.6.31-rc2 #16
---------------------------------
inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3
{IN-RECLAIM_FS-W} state was registered at:
  [<c10536ab>] __lock_acquire+0x2c9/0xac4
  [<c1053f45>] lock_acquire+0x9f/0xc2
  [<c1308872>] __mutex_lock_common+0x2d/0x323
  [<c1308c00>] mutex_lock_nested+0x2e/0x36
  [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2
  [<c108bfb6>] shrink_slab+0xe2/0x13c
  [<c108c3e1>] kswapd+0x3d1/0x55d
  [<c10449b5>] kthread+0x66/0x6b
  [<c1003fdf>] kernel_thread_helper+0x7/0x10
  [<ffffffff>] 0xffffffff

Two things are needed to fix this.  First we need a method to tell
fsnotify_create_event() to use GFP_NOFS and second we need to stop using
one global IN_IGNORED event and allocate them one at a time.  This solves
current issues with multiple IN_IGNORED on a queue having tail drop
problems and simplifies the allocations since we don't have to worry about
two tasks opperating on the IGNORED event concurrently.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21 15:26:27 -04:00
Eric Paris
c05594b621 fsnotify: fix inotify tail drop check with path entries
fsnotify drops new events when they are the same as the tail event on the
queue to be sent to userspace.  The problem is that if the event comes with
a path we forget to break out of the switch statement and fall into the
code path which matches on events that do not have any type of file backed
information (things like IN_UNMOUNT and IN_Q_OVERFLOW).  The problem is
that this code thinks all such events should be dropped.  Fix is to add a
break.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21 15:26:26 -04:00
Eric Paris
4a148ba988 inotify: check filename before dropping repeat events
inotify drops events if the last event on the queue is the same as the
current event.  But it does 2 things wrong.  First it is comparing old->inode
with new->inode.  But after an event if put on the queue the ->inode is no
longer allowed to be used.  It's possible between the last event and this new
event the inode could be reused and we would falsely match the inode's memory
address between two differing events.

The second problem is that when a file is removed fsnotify is passed the
negative dentry for the removed object rather than the postive dentry from
immediately before the removal.  This mean the (broken) inotify tail drop code
was matching the NULL ->inode of differing events.

The fix is to check the file name which is stored with events when doing the
tail drop instead of wrongly checking the address of the stored ->inode.

Reported-by: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21 15:26:26 -04:00
Eric Paris
520dc2a526 fsnotify: use def_bool in kconfig instead of letting the user choose
fsnotify doens't give the user anything.  If someone chooses inotify or
dnotify it should build fsnotify, if they don't select one it shouldn't be
built.  This patch changes fsnotify to be a def_bool=n and makes everything
else select it.  Also fixes the issue people complained about on lwn where
gdm hung because they didn't have inotify and they didn't get the inotify
build option.....

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21 15:26:26 -04:00
Eric Paris
7e790dd5fc inotify: fix error paths in inotify_update_watch
inotify_update_watch could leave things in a horrid state on a number of
error paths.  We could try to remove idr entries that didn't exist, we
could send an IN_IGNORED to userspace for watches that don't exist, and a
bit of other stupidity.  Clean these up by doing the idr addition before we
put the mark on the inode since we can clean that up on error and getting
off the inode's mark list is hard.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21 15:26:26 -04:00
Eric Paris
75fe2b2639 inotify: do not leak inode marks in inotify_add_watch
inotify_add_watch had a couple of problems.  The biggest being that if
inotify_add_watch was called on the same inode twice (to update or change the
event mask) a refence was taken on the original inode mark by
fsnotify_find_mark_entry but was not being dropped at the end of the
inotify_add_watch call.  Thus if inotify_rm_watch was called although the mark
was removed from the inode, the refcnt wouldn't hit zero and we would leak
memory.

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21 15:26:26 -04:00
Eric Paris
5549f7cdf8 inotify: drop user watch count when a watch is removed
The inotify rewrite forgot to drop the inotify watch use cound when a watch
was removed.  This means that a single inotify fd can only ever register a
maximum of /proc/sys/fs/max_user_watches even if some of those had been
freed.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21 15:26:26 -04:00
dingdinghua
f1015c4477 jbd: fix race between write_metadata_buffer and get_write_access
The function journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in)
too early; this could potentially allow another thread to call get_write_access
on the buffer head, modify the data, and dirty it, and allowing the wrong data
to be written into the journal.  Fortunately, if we lose this race, the only
time this will actually cause filesystem corruption is if there is a system
crash or other unclean shutdown of the system before the next commit can take
place.

Signed-off-by: dingdinghua <dingdinghua85@gmail.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-07-21 11:54:42 +02:00
Wengang Wang
1f4cea3790 ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation
ocfs2_get_block() does no allocation.  Hole filling for writes should
have happened farther up in the call chain.  We detect this case and
print an error, but we then continue with the function.  We should be
exiting immediately.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-20 17:36:07 -07:00
Wengang Wang
cbfa9639ae ocfs2: Fix error return in ocfs2_write_cluster()
A typo caused ocfs2_write_cluster() to return 0 in some error cases.
Fix it.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-20 17:32:55 -07:00
Linus Torvalds
457f82bac6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: Fix incorrect parameters to v9fs_file_readn.
  9p: Possible regression in p9_client_stat
  9p: default 9p transport module fix
2009-07-20 16:48:31 -07:00
Subrata Modak
44d8e4e1e9 ocfs2: Fix compilation warning for fs/ocfs2/xattr.c
gcc 4.4.1 generates the following build warning on i386:

CC [M]  fs/ocfs2/xattr.o
fs/ocfs2/xattr.c: In function ‘ocfs2_xattr_block_get’:
fs/ocfs2/xattr.c:1055: warning: ‘block_off’ may be used uninitialized in this function

The following fix is based on a similar approach by David Howells
few days back: http://lkml.org/lkml/2009/7/9/109,

Signed-off-by: Subrata Modak<subrata@linux.vnet.ibm.com>,
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-20 15:52:57 -07:00
Goldwyn Rodrigues
cefcb800fa ocfs2: Initialize count in aio_write before generic_write_checks
generic_write_checks() expects count to be initialized to the size of
the write.  Writes to files open with O_DIRECT|O_LARGEFILE write 0 bytes
because count is uninitialized.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-20 15:47:58 -07:00
Jeff Layton
90a98b2f3f cifs: free nativeFileSystem field before allocating a new one
...otherwise, we'll leak this memory if we have to reconnect (e.g. after
network failure).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-20 18:24:37 +00:00
Steve French
f6c4338543 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2009-07-16 04:21:39 +00:00
Jan Kara
43237b5490 ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle()
Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
be a relict from some old days and setting disksize in this function does not
make much sence. Currently it was set only by ext3_getblk().  Since the
parameter has some effect only if create == 1, it is easy to check that the
three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-07-15 21:30:46 +02:00
Jan Kara
1e9fd53b78 jbd: Fix a race between checkpointing code and journal_get_write_access()
The following race can happen:

  CPU1                          CPU2
                                checkpointing code checks the buffer, adds
                                  it to an array for writeback
do_get_write_access()
  ...
  lock_buffer()
  unlock_buffer()
                                  flush_batch() submits the buffer for IO
  __jbd_journal_file_buffer()

  So a buffer under writeout is returned from do_get_write_access(). Since
the filesystem code relies on the fact that journaled buffers cannot be
written out, it does not take the buffer lock and so it can modify buffer
while it is under writeout. That can lead to a filesystem corruption
if we crash at the right moment. The similar problem can happen with
the journal_get_create_access() path.
  We fix the problem by clearing the buffer dirty bit under buffer_lock
even if the buffer is on BJ_None list. Actually, we clear the dirty bit
regardless the list the buffer is in and warn about the fact if
the buffer is already journalled.

Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>.

Reported-by: dingdinghua <dingdinghua85@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-07-15 21:30:07 +02:00
Jan Kara
9eaaa2d575 ext3: Fix truncation of symlinks after failed write
Contents of long symlinks is written via standard write methods. So when the
write fails, we add inode to orphan list. But symlinks don't have .truncate
method defined so nobody properly removes them from the orphan list (both on
disk and in memory).

Fix this by calling ext3_truncate() directly instead of calling vmtruncate()
(which is saner anyway since we don't need anything vmtruncate() does except
from calling .truncate in these paths).  We also add inode to orphan list only
if ext3_can_truncate() is true (currently, it can be false for symlinks when
there are no blocks allocated) - otherwise orphan list processing will complain
and ext3_truncate() will not remove inode from on-disk orphan list.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-07-15 21:28:07 +02:00
Jan Kara
7447a668a3 jbd: Fail to load a journal if it is too short
Due to on disk corruption, it can happen that journal is too short. Fail
to load it in such case so that we don't oops somewhere later.

Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-07-15 21:26:23 +02:00
Linus Torvalds
8aa651e23e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: free socket in error exit path
  dlm: fix plock use-after-free
  dlm: Fix uninitialised variable warning in lock.c
2009-07-14 18:37:24 -07:00
Linus Torvalds
c0c50b541a Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/function-profiler: do not free per cpu variable stat
  tracing/events: Move TRACE_SYSTEM outside of include guard
2009-07-14 18:34:32 -07:00
Abhishek Kulkarni
9c9ad6162e 9p: Fix incorrect parameters to v9fs_file_readn.
Fix v9fs_vfs_readpage. The offset and size parameters to v9fs_file_readn
were interchanged and hence passed incorrectly.

Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-07-14 15:54:42 -05:00
Casey Dahlin
a89d63a159 dlm: free socket in error exit path
In the tcp_connect_to_sock() error exit path, the socket
allocated at the top of the function was not being freed.

Signed-off-by: Casey Dahlin <cdahlin@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-07-14 12:28:43 -05:00
Ryusuke Konishi
4fed598a49 fs/Kconfig: move nilfs2 out
fs/Kconfig file was split into individual fs/*/Kconfig files before
nilfs was merged.  I've found the current config entry of nilfs is
tainting the work.  Sorry, I didn't notice.  This fixes the violation.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
2009-07-14 12:34:17 +09:00
Linus Torvalds
1cf29683f4 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix race between write_metadata_buffer and get_write_access
  ext4: Fix ext4_mb_initialize_context() to initialize all fields
  ext4: fix null handler of ioctls in no journal mode
  ext4: Fix buffer head reference leak in no-journal mode
  ext4: Move __ext4_journalled_writepage() to avoid forward declaration
  ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc
  ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation
  ext4: Don't look at buffer_heads outside i_size.
  ext4: Fix goal inum check in the inode allocator
  ext4: fix no journal corruption with locale-gen
  ext4: Calculate required journal credits for inserting an extent properly
  ext4: Fix truncation of symlinks after failed write
  jbd2: Fix a race between checkpointing code and journal_get_write_access()
  ext4: Use rcu_barrier() on module unload.
  ext4: naturally align struct ext4_allocation_request
  ext4: mark several more functions in mballoc.c as noinline
  ext4: Fix potential reclaim deadlock when truncating partial block
  jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical region
  ext4: Fix type warning on 64-bit platforms in tracing events header
2009-07-13 16:39:25 -07:00
dingdinghua
96577c4382 jbd2: fix race between write_metadata_buffer and get_write_access
The function jbd2_journal_write_metadata_buffer() calls
jbd_unlock_bh_state(bh_in) too early; this could potentially allow
another thread to call get_write_access on the buffer head, modify the
data, and dirty it, and allowing the wrong data to be written into the
journal.  Fortunately, if we lose this race, the only time this will
actually cause filesystem corruption is if there is a system crash or
other unclean shutdown of the system before the next commit can take
place.

Signed-off-by: dingdinghua <dingdinghua85@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13 17:55:35 -04:00
Linus Torvalds
a4dc32374e Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  wm97xx_batery: replace driver_data with dev_get_drvdata()
  omap: video: remove direct access of driver_data
  Sound: remove direct access of driver_data
  driver model: fix show/store prototypes in doc.
  Firmware: firmware_class, fix lock imbalance
  Driver Core: remove BUS_ID_SIZE
  sparc: remove driver-core BUS_ID_SIZE
  partitions: fix broken uevent_suppress conversion
  devres: WARN() and return, don't crash on device_del() of uninitialized device
2009-07-13 10:24:08 -07:00
Theodore Ts'o
833576b362 ext4: Fix ext4_mb_initialize_context() to initialize all fields
Pavel Roskin pointed out that kmemcheck indicated that
ext4_mb_store_history() was accessing uninitialized values of
ac->ac_tail and ac->ac_buddy leading to garbage in the mballoc
history.  Fix this by initializing the entire structure to all zeros
first.

Also, two fields were getting doubly initialized by the caller of
ext4_mb_initialize_context, so remove them for efficiency's sake.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13 09:45:52 -04:00
Peng Tao
ac046f1d61 ext4: fix null handler of ioctls in no journal mode
The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not
flush the journal in no_journal mode.  Otherwise, running resize2fs on
a mounted no_journal partition triggers the following error messages:

BUG: unable to handle kernel NULL pointer dereference at 00000014
IP: [<c039d282>] _spin_lock+0x8/0x19
*pde = 00000000 
Oops: 0002 [#1] SMP

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13 09:30:17 -04:00
Curt Wohlgemuth
e6b5d30104 ext4: Fix buffer head reference leak in no-journal mode
We found a problem with buffer head reference leaks when using an ext4
partition without a journal.  In particular, calls to ext4_forget() would
not to a brelse() on the input buffer head, which will cause pages they
belong to to not be reclaimable.

Further investigation showed that all places where ext4_journal_forget() and
ext4_journal_revoke() are called are subject to the same problem.  The patch
below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
release of the buffer head when the journal handle isn't valid.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13 09:07:20 -04:00
Li Zefan
d0b6e04a4c tracing/events: Move TRACE_SYSTEM outside of include guard
If TRACE_INCLDUE_FILE is defined, <trace/events/TRACE_INCLUDE_FILE.h>
will be included and compiled, otherwise it will be
<trace/events/TRACE_SYSTEM.h>

So TRACE_SYSTEM should be defined outside of #if proctection,
just like TRACE_INCLUDE_FILE.

Imaging this scenario:

 #include <trace/events/foo.h>
    -> TRACE_SYSTEM == foo
 ...
 #include <trace/events/bar.h>
    -> TRACE_SYSTEM == bar
 ...
 #define CREATE_TRACE_POINTS
 #include <trace/events/foo.h>
    -> TRACE_SYSTEM == bar !!!

and then bar.h will be included and compiled.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A5A9CF1.2010007@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-13 10:59:55 +02:00
Heiko Carstens
f8c73c790c partitions: fix broken uevent_suppress conversion
git commit f67f129e "Driver core: implement uevent suppress in kobject"
contains this chunk for fs/partitions/check.c:

 	/* suppress uevent if the disk supresses it */
-	if (!ddev->uevent_suppress)
+	if (!dev_get_uevent_suppress(pdev))
 		kobject_uevent(&pdev->kobj, KOBJ_ADD);

However that should have been

-	if (!ddev->uevent_suppress)
+	if (!dev_get_uevent_suppress(ddev))

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Ming Lei <tom.leiming@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-12 13:02:09 -07:00
Artem Bityutskiy
dd0d9a46f5 AFS: Fix compilation warning
Fix the following warning:

  fs/afs/dir.c: In function 'afs_d_revalidate':
  fs/afs/dir.c:567: warning: 'fid.vnode' may be used uninitialized in this function
  fs/afs/dir.c:567: warning: 'fid.unique' may be used uninitialized in this function

by marking the 'fid' variable as an uninitialized_var.  The problem is
that gcc doesn't always manage to work out that fid is always set on the
path through the function that uses it.

Cc: linux-afs@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:24:07 -07:00
Alexey Dobriyan
405f55712d headers: smp_lock.h redux
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:22:34 -07:00
Linus Torvalds
81e4e1ba7e Revert "fuse: Fix build error" as unnecessary
This reverts commit 097041e576.

Trond had a better fix, which is the parent of this one ("Fix compile
error due to congestion_wait() changes")

Requested-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-11 11:22:34 -07:00
Bartlomiej Zolnierkiewicz
8711c67bee isofs: fix Joliet regression
commit 5404ac8e44 ("isofs: cleanup mount
option processing") missed conversion of joliet option flag resulting
in non-working Joliet support.

CC: walt <w41ter@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10 19:18:59 -07:00
Linus Torvalds
44c695b13b Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix corruption dump
  UBIFS: clean up free space checking
  UBIFS: small amendments in the LEB scanning code
  UBIFS: dump a little more in case of corruptions
  MAINTAINERS: update ahunter's e-mail address
  UBIFS: allow more than one volume to be mounted
  UBIFS: fix assertion warning
  UBIFS: minor spelling and grammar fixes
  UBIFS: fix 64-bit divisions in debug print
  UBIFS: few spelling fixes
  UBIFS: set write-buffer timout to 3-5 seconds
  UBIFS: slightly optimize write-buffer timer usage
  UBIFS: improve debugging messaged
  UBIFS: fix integer overflow warning
2009-07-10 19:14:48 -07:00
Linus Torvalds
04eef90c2e Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  osdblk: Adjust queue limits to lower device's limits
  osdblk: a Linux block device for OSD objects
  MAINTAINERS: Add osd maintained files (F:)
  exofs: Avoid using file_fsync()
  exofs: Remove IBM copyrights
  exofs: Fix bio leak in error handling path (sync read)
2009-07-10 19:12:24 -07:00
Larry Finger
097041e576 fuse: Fix build error
When building v2.6.31-rc2-344-g69ca06c, the following build errors are
found due to missing includes:

 CC [M]  fs/fuse/dev.o
fs/fuse/dev.c: In function ‘request_end’:
fs/fuse/dev.c:289: error: ‘BLK_RW_SYNC’ undeclared (first use in this function)
...
fs/nfs/write.c: In function ‘nfs_set_page_writeback’:
fs/nfs/write.c:207: error: ‘BLK_RW_ASYNC’ undeclared (first use in this function)

Signed-off-by: Larry Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10 19:09:46 -07:00
Wengang Wang
812e7a6a43 ocfs2: log the actual return value of ocfs2_file_aio_write()
in ocfs2_file_aio_write(), log_exit() could don't log the value
which is really returned. this patch fixes it.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-10 16:53:52 -07:00
Linus Torvalds
69ca06c945 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: reset oom_cfqq in cfq_set_request()
  block: fix sg SG_DXFER_TO_FROM_DEV regression
  block: call blk_scsi_ioctl_init()
  Fix congestion_wait() sync/async vs read/write confusion
2009-07-10 14:29:58 -07:00
Linus Torvalds
9f2d8be426 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix disorder in cp count on error during deleting checkpoints
  nilfs2: fix lockdep warning between regular file and inode file
  nilfs2: fix incorrect KERN_CRIT messages in case of write failures
  nilfs2: fix hang problem of log writer which occurs after write failures
  nilfs2: remove unlikely directive causing mis-conversion of error code
2009-07-10 14:27:21 -07:00
FUJITA Tomonori
ecb554a846 block: fix sg SG_DXFER_TO_FROM_DEV regression
I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use
the block layer mapping API (2.6.28).

Douglas Gilbert explained SG_DXFER_TO_FROM_DEV:

http://www.spinics.net/lists/linux-scsi/msg37135.html

=
The semantics of SG_DXFER_TO_FROM_DEV were:
   - copy user space buffer to kernel (LLD) buffer
   - do SCSI command which is assumed to be of the DATA_IN
     (data from device) variety. This would overwrite
     some or all of the kernel buffer
   - copy kernel (LLD) buffer back to the user space.

The idea was to detect short reads by filling the original
user space buffer with some marker bytes ("0xec" it would
seem in this report). The "resid" value is a better way
of detecting short reads but that was only added this century
and requires co-operation from the LLD.
=

This patch changes the block layer mapping API to support this
semantics. This simply adds another field to struct rq_map_data and
enables __bio_copy_iov() to copy data from user space even with READ
requests.

It's better to add the flags field and kills null_mapped and the new
from_user fields in struct rq_map_data but that approach makes it
difficult to send this patch to stable trees because st and osst
drivers use struct rq_map_data (they were converted to use the block
layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer
mapping API.

zhou sf reported this regiression and tested this patch:

http://www.spinics.net/lists/linux-scsi/msg37128.html
http://www.spinics.net/lists/linux-scsi/msg37168.html

Reported-by: zhou sf <sxzzsf@gmail.com>
Tested-by: zhou sf <sxzzsf@gmail.com>
Cc: stable@kernel.org
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-10 20:31:53 +02:00
Jens Axboe
8aa7e847d8 Fix congestion_wait() sync/async vs read/write confusion
Commit 1faa16d228 accidentally broke
the bdi congestion wait queue logic, causing us to wait on congestion
for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-10 20:31:53 +02:00
Steve French
65bc98b005 [CIFS] Distinguish posix opens and mkdirs from legacy mkdirs in stats
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-10 15:27:25 +00:00
Linus Torvalds
c2cc49a2f8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: when ATTR_READONLY is set, only clear write bits on non-directories
  cifs: remove cifsInodeInfo->inUse counter
  cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget
  [CIFS] update cifs version number
  cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls
  cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO
  cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo
  cifs: add pid of initiating process to spnego upcall info
  cifs: fix regression with O_EXCL creates and optimize away lookup
  cifs: add new cifs_iget function and convert unix codepath to use it
2009-07-09 20:40:58 -07:00
Jeff Layton
d0c280d26d cifs: when ATTR_READONLY is set, only clear write bits on non-directories
cifs: when ATTR_READONLY is set, only clear write bits on non-directories

On windows servers, ATTR_READONLY apparently either has no meaning or
serves as some sort of queue to certain applications for unrelated
behavior. This MS kbase article has details:

http://support.microsoft.com/kb/326549/

Don't clear the write bits directory mode when ATTR_READONLY is set.

Reported-by: pouchat@peewiki.net
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 23:06:04 +00:00
Jeff Layton
aeaaf253c4 cifs: remove cifsInodeInfo->inUse counter
cifs: remove cifsInodeInfo->inUse counter

It was purported to be a refcounter of some sort, but was never
used that way. It never served any purpose that wasn't served equally well
by the I_NEW flag.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 23:06:00 +00:00
Jeff Layton
0b8f18e358 cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget
cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget

Rather than allocating an inode and filling it out, have
cifs_get_inode_info fill out a cifs_fattr and call cifs_iget. This means
a pretty hefty reorganization of cifs_get_inode_info.

For the readdir codepath, add a couple of new functions for filling out
cifs_fattr's from different FindFile response infolevels.

Finally, remove cifs_new_inode since there are no more callers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 23:05:48 +00:00
Steve French
b77863bfa1 [CIFS] update cifs version number
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 22:51:38 +00:00
Jeff Layton
3bbeeb3c93 cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls
cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls

When there's an open filehandle, SET_FILE_INFO is apparently preferred
over SET_PATH_INFO. Add a new variant that sets a FILE_UNIX_INFO_BASIC
infolevel via SET_FILE_INFO and switch cifs_setattr_unix to use the
new call when there's an open filehandle available.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 21:15:10 +00:00
Jeff Layton
654cf14ac0 cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO
cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO

The SET_FILE_INFO variant will need to do the same thing here. Break
this code out into a separate function that both variants can call.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 21:15:06 +00:00
Jeff Layton
01ea95e3b6 cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo
cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo

...in preparation of adding a SET_FILE_INFO variant.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 21:15:02 +00:00
Jeff Layton
c4c1bff64d cifs: add pid of initiating process to spnego upcall info
cifs: add pid of initiating process to spnego upcall info

This will allow the upcall to poke in /proc/<pid>/environ and get
the value of the $KRB5CCNAME env var for the process.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-09 21:14:58 +00:00
Artem Bityutskiy
0611254760 UBIFS: fix corruption dump
In the 'ubifs_recover_leb()' function, when we find corrupted
empty space, we dump 8K starting from the offset where the last
node ends. This is OK if the corrupted empty space is somewhere
near that offset. But if the corruption is far at the end of the
LEB, we will dump all 0xFF bytes and complitely ignore the
interesting data. This is observed on a PPC ("kilauea") with
NOR flash.

This patch changes the behavior and teaches UBIFS to print only
interesting data. I.e., now we find where corruption starts and
start dumping from that offset.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
2009-07-09 09:19:39 +03:00
Artem Bityutskiy
431102fed3 UBIFS: clean up free space checking
recovery.c has 'is_empty()' helper and it is better to use
this helper instead of re-implementing it in several places.
This patch does this and removes some amount of unneeded code.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
2009-07-09 09:19:38 +03:00
Artem Bityutskiy
ed43f2f06c UBIFS: small amendments in the LEB scanning code
This patch fixes few minor things I've spotted while going through
code:

1. Better document return codes
2. If 'ubifs_scan_a_node()' returns some thing we do not expect,
   treat this as an error.
3. Try to do recovery only when 'ubifs_scan()' returns %-EUCLEAN,
   not on any error.
4. If empty space starts at a non-aligned address, print a message.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
2009-07-09 09:19:38 +03:00
Artem Bityutskiy
086b3640c1 UBIFS: dump a little more in case of corruptions
In case of corruptions, dump 8192 bytes instead of 4096. The
largest node is 4096+ bytes, so it is better to see a node
boundary, which is not always possible when only 4096 bytes
are printed.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
2009-07-09 09:19:38 +03:00
Jeff Liu
17ae26b669 ocfs2: trivial fix for s/migrate/migration/ in dlmrecovery.c logging
in dlmrecovery.c:1121, replace 'migrate' to 'migration' to keep the consistency
by comparing to other lines with the similar log info in the same file.

Signed-off-by: Jeff Liu <jeff.liu@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-08 15:34:40 -07:00
Jeff Mahoney
8b712cd58a ocfs2: Fixup orphan scan cleanup after failed mount
If the mount fails for any reason, ocfs2_dismount_volume calls
ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init
be called to setup the mutex and work queues, but that doesn't
happen if the mount has failed and we oops accessing an uninitialized
work queue.

This patch splits the init and startup of the orphan scan, eliminating
the oops.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-08 15:34:02 -07:00
Jeff Layton
5ddf1e0ff0 cifs: fix regression with O_EXCL creates and optimize away lookup
cifs: fix regression with O_EXCL creates and optimize away lookup

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Shirish Pargaonkar <shirishp@gmail.com>
CC: Stable Kernel <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-08 21:55:45 +00:00
Joe Perches
ad361c9884 Remove multiple KERN_ prefixes from printk formats
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics.  printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.

<level> is now included in the output on each additional use.

Remove all uses of multiple KERN_<level>s in formats.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08 10:30:03 -07:00
Linus Torvalds
728b690fd5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6:
  quota: Fix possible deadlock during parallel quotaon and quotaoff
2009-07-08 09:35:50 -07:00
Catalin Marinas
d5ce5b40bc Free the memory allocated by memdup_user() in fs/sysfs/bin.c
Commit 1c8542c7bb replaced kmalloc() with memdup_user() in the write()
function but also dropped the kfree(temp). The memdup_user() function
allocates memory which is never freed.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08 09:34:07 -07:00
Alexey Dobriyan
b43f3cbd21 headers: mnt_namespace.h redux
Fix various silly problems wrt mnt_namespace.h:

 - exit_mnt_ns() isn't used, remove it
 - done that, sched.h and nsproxy.h inclusions aren't needed
 - mount.h inclusion was need for vfsmount_lock, but no longer
 - remove mnt_namespace.h inclusion from files which don't use anything
   from mnt_namespace.h

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08 09:31:56 -07:00
Jiaying Zhang
d01730d74d quota: Fix possible deadlock during parallel quotaon and quotaoff
The following test script triggers a deadlock on ext2 filesystem:
while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done &
while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done &

I found there is a potential deadlock between quotaon and quotaoff (or
quotasync). Basically, all of quotactl operations need to be protected by
dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call sb->s_op->quota_write
that needs to grab the i_mutex of the quota file.  But in vfs_quota_on_inode
(called from quotaon operation), the current code tries to grab  the i_mutex of
the quota file first before getting quonoff_mutex.

Reverse the order in which we take locks in vfs_quota_on_inode().

Jan Kara: Changed changelog to be more readable, made lockdep happy with
  I_MUTEX_QUOTA.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-07-07 18:15:21 +02:00
Oleg Nesterov
793285fcaf cred_guard_mutex: do not return -EINTR to user-space
do_execve() and ptrace_attach() return -EINTR if
mutex_lock_interruptible(->cred_guard_mutex) fails.

This is not right, change the code to return ERESTARTNOINTR.

Perhaps we should also change proc_pid_attr_write().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-06 13:57:04 -07:00
Zhang, Yanmin
3beab0b424 sys_sync(): fix 16% performance regression in ffsb create_4k test
I run many ffsb test cases on JBODs (typically 13/12 disks).  Comparing
with kernel 2.6.30, 2.6.31-rc1 has about 16% regression with
ffsb_create_4k.  The sub test case creates files continuously for 10
minitues and every file is 1MB.

Bisect located below patch.

5cee5815d1 is first bad commit
commit 5cee5815d1
Author: Jan Kara <jack@suse.cz>
Date:   Mon Apr 27 16:43:51 2009 +0200

    vfs: Make sys_sync() use fsync_super() (version 4)

    It is unnecessarily fragile to have two places (fsync_super() and do_sync())
    doing data integrity sync of the filesystem. Alter __fsync_super() to
    accommodate needs of both callers and use it. So after this patch
    __fsync_super() is the only place where we gather all the calls needed to
    properly send all data on a filesystem to disk.

As a matter of fact, ffsb calls sys_sync in the end to make sure all data
is flushed to disks and the flushing is counted into the result.  vmstat
shows ffsb is blocked when syncing for a long time.  With 2.6.30, ffsb is
blocked for a short time.

I checked the patch and did experiments to recover the original methods.
Eventually, the root cause is the patch deletes the calling to
wakeup_pdflush when syncing, so only ffsb is blocked on disk I/O.
wakeup_pdflush could ask pdflush to write back pages with ffsb at the
same time.

[akpm@linux-foundation.org: restore comment too]
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-06 13:57:03 -07:00
Daniel Mack
7fcd9c3ecb UBIFS: allow more than one volume to be mounted
UBIFS uses a bdi device per volume, but does not care to hand out unique
names to each of them. This causes an error when trying to mount more
than one volumes. Append the UBI volume and device ID to avoid that.

[Amended a bit by Artem Bityutskiy]

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-05 18:45:19 +03:00
Artem Bityutskiy
1fb8bd01ed UBIFS: fix assertion warning
When debugging is enabled and an unclean file-system is mounter,
the following assertion is triggered:

UBIFS assert failed in ubifs_tnc_start_commit at 805 (pid 1081)
Call Trace:
[cfaffbd0] [c0006cf8] show_stack+0x44/0x16c (unreliable)
[cfaffc10] [c011b738] ubifs_tnc_start_commit+0xbb8/0xd18
[cfaffc90] [c0112670] do_commit+0x150/0xa44
[cfaffd10] [c0125234] ubifs_rcvry_gc_commit+0xd8/0x544
[cfaffd60] [c0100e9c] ubifs_fill_super+0xe78/0x15f8
[cfaffdf0] [c0102118] ubifs_get_sb+0x20c/0x320
[cfaffe70] [c007f764] vfs_kern_mount+0x58/0xe0
[cfaffe90] [c007f83c] do_kern_mount+0x40/0xf8
[cfaffeb0] [c0095c24] do_mount+0x550/0x758
[cfafff10] [c0095ebc] sys_mount+0x90/0xe0
[cfafff40] [c000ed4c] ret_from_syscall+0x0/0x3c

The reason is that we initialize 'c->min_leb_idx' early, and do
not re-calculate it after journal replay.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-05 18:45:19 +03:00
Adrian Hunter
681947d2fa UBIFS: minor spelling and grammar fixes
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
2009-07-05 18:45:18 +03:00
Adrian Hunter
4473758944 UBIFS: fix 64-bit divisions in debug print
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
2009-07-05 18:45:18 +03:00
Artem Bityutskiy
cb54ef8b13 UBIFS: few spelling fixes
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-05 18:45:18 +03:00
Artem Bityutskiy
2a35a3a8ab UBIFS: set write-buffer timout to 3-5 seconds
This patch cleans up write-buffer timeout initialization and
sets it to 3-5 interval.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-05 18:45:17 +03:00
Artem Bityutskiy
0b335b9d7d UBIFS: slightly optimize write-buffer timer usage
This patch adds the following minor optimization:

1. If write-buffer does not use the timer, indicate it with the
   wbuf->no_timer variable, instead of using the wbuf->softlimit
   variable. This is better because wbuf->softlimit is of ktime_t
   type, and the ktime_to_ns function contains 64-bit multiplication.

2. Do not call the 'hrtimer_cancel()' function for write-buffers
   which do not use timers.

3. Do not cancel the timer in 'ubifs_put_super()' because the
   synchronization function does this.

This patch also removes a confusing comment.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-05 18:45:16 +03:00
Artem Bityutskiy
70aee2f153 UBIFS: improve debugging messaged
1. Make the I/O debugging message print the journal head number.
2. Add prints to timer functions.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-05 18:45:16 +03:00
Adrian Hunter
e3dc5a665d UBIFS: fix integer overflow warning
Fix the following warning:

fs/ubifs/io.c: In function 'ubifs_wbuf_init':
fs/ubifs/io.c:860: warning: integer overflow in expression

And limit maximum hrtimer delta to ULONG_MAX because the
argument is 'unsigned long'.

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-07-05 18:45:15 +03:00
Jiro SEKIBA
d9a0a345ab nilfs2: fix disorder in cp count on error during deleting checkpoints
This fixes a bug that checkpoint count gets wrong on errors when
deleting a series of checkpoints.

The count error is persistent since the checkpoint count is stored on
disk.  Some userland programs refer to the count via ioctl, and this
bugfix is needed to prevent malfunction of such programs.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
2009-07-05 10:44:20 +09:00
Ryusuke Konishi
ff54de363a nilfs2: fix lockdep warning between regular file and inode file
This will fix the following false positive of recursive locking which
lockdep has detected:

=============================================
[ INFO: possible recursive locking detected ]
2.6.30-nilfs #42
---------------------------------------------
nilfs_cleanerd/10607 is trying to acquire lock:
 (&bmap->b_sem){++++-.}, at: [<e0d025b7>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2]

but task is already holding lock:
 (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2]
other info that might help us debug this:
2 locks held by nilfs_cleanerd/10607:
 #0:  (&nilfs->ns_segctor_sem){++++.+}, at: [<e0d0d75a>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
 #1:  (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2]

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-07-05 10:44:20 +09:00
Ryusuke Konishi
4a52df7797 nilfs2: fix incorrect KERN_CRIT messages in case of write failures
In case of write-failure retries, the following KERN_CRIT level
messages are mistakenly output by nilfs_dat_commit_start() function:

nilfs_dat_commit_start: vbn = 408463, start = 12506, end = 18446744073709551615, pbn = 530210
nilfs_dat_commit_start: vbn = 408515, start = 12506, end = 18446744073709551615, pbn = 530211
nilfs_dat_commit_start: vbn = 408464, start = 12506, end = 18446744073709551615, pbn = 530212
...

This suppresses these messages.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
2009-07-05 10:44:20 +09:00
Ryusuke Konishi
8227b29722 nilfs2: fix hang problem of log writer which occurs after write failures
Leandro Lucarella gave me a report that nilfs gets stuck after its
write function fails.

The problem turned out to be caused by bugs which leave writeback flag
on pages.  This fixes the problem by ensuring to clear the writeback
flag in error path.

Reported-by: Leandro Lucarella <llucax@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
2009-07-05 10:44:20 +09:00
Ryusuke Konishi
0cfae3d879 nilfs2: remove unlikely directive causing mis-conversion of error code
The following error code handling in nilfs_segctor_write() function
wrongly converted negative error codes to a truth value (i.e. 1):

   err = unlikely(err) ? : res;

which originaly meant to be

   err = err ? : res;

This mis-conversion caused that write or sync functions receive the
unexpected error code.  This fixes the bug by removing the unlikely
directive.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
2009-07-05 10:44:19 +09:00
Linus Torvalds
14c1b7c212 Merge branch 'for-2.6.31' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.31' of git://linux-nfs.org/~bfields/linux:
  NFSD: Don't hold unrefcounted creds over call to nfsd_setuser()
2009-07-04 10:11:38 -07:00
David Howells
033a666ccb NFSD: Don't hold unrefcounted creds over call to nfsd_setuser()
nfsd_open() gets an unrefcounted pointer to the current process's effective
credentials at the top of the function, then calls nfsd_setuser() via
fh_verify() - which may replace and destroy the current process's effective
credentials - and then passes the unrefcounted pointer to dentry_open() - but
the credentials may have been destroyed by this point.

Instead, the value from current_cred() should be passed directly to
dentry_open() as one of its arguments, rather than being cached in a variable.

Possibly fh_verify() should return the creds to use.

This is a regression introduced by
745ca2475a "CRED: Pass credentials through
dentry_open()".

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-Verified-By: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-03 10:21:10 -04:00
Linus Torvalds
746a99a5af Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  fs/notify/inotify: decrement user inotify count on close
2009-07-02 16:54:07 -07:00
Linus Torvalds
5291a12f05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix error message formatting
  Btrfs: fix use after free in btrfs_start_workers fail path
  Btrfs: honor nodatacow/sum mount options for new files
  Btrfs: update backrefs while dropping snapshot
  Btrfs: account for space we may use in fallocate
  Btrfs: fix the file clone ioctl for preallocated extents
  Btrfs: don't log the inode in file_write while growing the file
2009-07-02 16:52:38 -07:00
Hu Tao
68f5a38c3e Btrfs: fix error message formatting
Make an error msg look nicer by inserting a space between number and word.

Signed-off-by: Hu Tao <hu.taoo@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-02 13:55:45 -04:00
Jiri Slaby
9b627e9bf4 Btrfs: fix use after free in btrfs_start_workers fail path
worker memory is already freed on one fail path in btrfs_start_workers,
but is still dereferenced. Switch the dereference and kfree.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-02 13:50:58 -04:00
Chris Mason
9427216476 Btrfs: honor nodatacow/sum mount options for new files
The btrfs attr patches unconditionally inherited the inode flags field
without honoring nodatacow and nodatasum.  This fix makes sure
we properly record the nodatacow/sum mount options in new inodes.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-02 13:41:17 -04:00
Yan Zheng
2c47e605a9 Btrfs: update backrefs while dropping snapshot
The new backref format has restriction on type of backref item.  If a tree
block isn't referenced by its owner tree, full backrefs must be used for the
pointers in it. When a tree block loses its owner tree's reference, backrefs
for the pointers in it should be updated to full backrefs. Current
btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for
general use.

This patch adds backrefs update code to btrfs_drop_snapshot.  It isn't a
problem in the restricted form btrfs_drop_snapshot is used today, but for
general snapshot deletion this update is required.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-02 13:41:17 -04:00
Josef Bacik
a970b0a16c Btrfs: account for space we may use in fallocate
Using Eric Sandeen's xfstest for fallocate, you can easily trigger a ENOSPC
panic on btrfs.  This is because we do not account for data we may use when
doing the fallocate.  This patch fixes the problem by properly reserving space,
and then just freeing it when we are done.  The reservation stuff was made with
delalloc in mind, so its a little crude for this case, but it keeps the box
from panicing.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-07-02 13:41:16 -04:00
Chris Mason
c8a894d77d Btrfs: fix the file clone ioctl for preallocated extents 2009-07-02 13:41:16 -04:00
Chris Mason
f597bb19cc Btrfs: don't log the inode in file_write while growing the file 2009-07-02 13:41:16 -04:00
Keith Packard
bdae997f44 fs/notify/inotify: decrement user inotify count on close
The per-user inotify_devs value is incremented each time a new file is
allocated, but never decremented. This led to inotify_init failing after a
limited number of calls.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-02 08:23:00 -04:00
Jeff Layton
cc0bad7552 cifs: add new cifs_iget function and convert unix codepath to use it
cifs: add new cifs_iget function and convert unix codepath to use it

In order to unify some codepaths, introduce a common cifs_fattr struct
for storing inode attributes. The different codepaths (unix, legacy,
normal, etc...) can fill out this struct with inode info. It can then be
passed as an arg to a common set of routines to get and update inodes.

Add a new cifs_iget function that uses iget5_locked to identify inodes.
This will compare inodes based on the uniqueid value in a cifs_fattr
struct.

Rather than filling out an already-created inode, have
cifs_get_inode_info_unix instead fill out cifs_fattr and hand that off
to cifs_iget. cifs_iget can then properly look for hardlinked inodes.

On the readdir side, add a new cifs_readdir_lookup function that spawns
populated dentries. Redefine FILE_UNIX_INFO so that it's basically a
FILE_UNIX_BASIC_INFO that has a few fields wrapped around it. This
allows us to more easily use the same function for filling out the fattr
as the non-readdir codepath.

With this, we should then have proper hardlink detection and can
eventually get rid of some nasty CIFS-specific hacks for handing them.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-07-01 21:26:42 +00:00
Linus Torvalds
5c5d4e8eaf Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  mtd: nand: fix build failure and incorrect return from omap_wait()
  mtd: Use BLOCK_NIL consistently in NFTL/INFTL
  mtd: m25p80 timeout too short for worst-case m25p16 devices
  mtd: atmel_nand: Fix typo s/parititions/partitions/
  mtd: cmdlineparts: Use 64-bit format when printing a debug message.
  mtd: maps: Remove BUS_ID_SIZE from integrator_flash
  jffs2: fix another potential leak on error path in scan.c
2009-07-01 11:25:46 -07:00
Linus Torvalds
fa172f4006 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: invalidation reverse calls
  fuse: allow umask processing in userspace
  fuse: fix bad return value in fuse_file_poll()
  fuse: fix return value of fuse_dev_write()
2009-07-01 11:20:46 -07:00
Amerigo Wang
e2dbe12557 elf: fix one check-after-use
Check before use it.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-01 11:14:28 -07:00
Martin K. Petersen
7878cba9f0 block: Create bip slabs with embedded integrity vectors
This patch restores stacking ability to the block layer integrity
infrastructure by creating a set of dedicated bip slabs.  Each bip slab
has an embedded bio_vec array at the end.  This cuts down on memory
allocations and also simplifies the code compared to the original bvec
version.  Only the largest bip slab is backed by a mempool.  The pool is
contained in the bio_set so stacking drivers can ensure forward
progress.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
2009-07-01 10:56:25 +02:00
Wolfgang Illmeyer
752fa51e4c hostfs: set maximum filesize in superblock for proper LFS support
Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be
read/written through hostfs. This patch initializes the maximum file size to
MAX_LFS_SIZE.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13531

Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 18:56:03 -07:00
Bryan Donlan
4d6c13f87d ext2: return -EIO not -ESTALE on directory traversal through deleted inode
ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly.  However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode.  This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.

The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.

This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
from ext2_iget(), as ext2 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.

Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 18:56:00 -07:00
KAMEZAWA Hiroyuki
341c87bf34 elf: limit max map count to safe value
With ELF, at generating coredump, some more headers other than used
vmas are added.

When max_map_count == 65536, a core generated by following kinds of
code can be unreadable because the number of ELF's program header is
written in 16bit in Ehdr (please see elf.h) and the number overflows.

==
	... = mmap(); (munmap, mprotect, etc...)
	if (failed)
		abort();
==

This can happen in mmap/munmap/mprotect/etc...which calls split_vma().

I think 65536 is not safe as _default_ and reduce it to 65530 is good
for avoiding unexpected corrupted core.

Anyway, max_map_count can be enlarged by sysctl if a user is brave..

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jakub Jelinek <jakub@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 18:55:59 -07:00
Davide Libenzi
133890103b eventfd: revised interface and cleanups
Change the eventfd interface to de-couple the eventfd memory context, from
the file pointer instance.

Without such change, there is no clean way to racely free handle the
POLLHUP event sent when the last instance of the file* goes away.  Also,
now the internal eventfd APIs are using the eventfd context instead of the
file*.

This patch is required by KVM's IRQfd code, which is still under
development.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 18:55:58 -07:00
Jiri Slaby
f7c2df9b55 AFS: Fix lock imbalance
Don't unlock on vfs_rejected_lock path in afs_do_setlk, since the lock
is unlocked after abort_attempt label.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 13:30:44 -07:00
John Muir
3b463ae0c6 fuse: invalidation reverse calls
Add notification messages that allow the filesystem to invalidate VFS
caches.

Two notifications are added:

 1) inode invalidation

   - invalidate cached attributes
   - invalidate a range of pages in the page cache (this is optional)

 2) dentry invalidation

   - try to invalidate a subtree in the dentry cache

Care must be taken while accessing the 'struct super_block' for the
mount, as it can go away while an invalidation is in progress.  To
prevent this, introduce a rw-semaphore, that is taken for read during
the invalidation and taken for write in the ->kill_sb callback.

Cc: Csaba Henk <csaba@gluster.com>
Cc: Anand Avati <avati@zresearch.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-06-30 20:12:24 +02:00
Miklos Szeredi
e0a43ddcc0 fuse: allow umask processing in userspace
This patch lets filesystems handle masking the file mode on creation.
This is needed if filesystem is using ACLs.

 - The CREATE, MKDIR and MKNOD requests are extended with a "umask"
   parameter.

 - A new FUSE_DONT_MASK flag is added to the INIT request/reply.  With
   this the filesystem may request that the create mode is not masked.

CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-06-30 20:12:23 +02:00
Miklos Szeredi
201fa69a28 fuse: fix bad return value in fuse_file_poll()
Fix fuse_file_poll() which returned a -errno value instead of a poll
mask.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
2009-06-30 20:06:24 +02:00
Csaba Henk
b4c458b3a2 fuse: fix return value of fuse_dev_write()
On 64 bit systems -- where sizeof(ssize_t) > sizeof(int) -- the following test
exposes a bug due to a non-careful return of an int or unsigned value:

implement a FUSE filesystem which sends an unsolicited notification to
the kernel with invalid opcode. The respective write to /dev/fuse
will return (1 << 32) - EINVAL with errno == 0 instead of -1 with
errno == EINVAL.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
2009-06-30 20:06:23 +02:00
Mimi Zohar
94e5d714f6 integrity: add ima_counts_put (updated)
This patch fixes an imbalance message as reported by J.R. Okajima.
The IMA file counters are incremented in ima_path_check. If the
actual open fails, such as ETXTBSY, decrement the counters to
prevent unnecessary imbalance messages.

Reported-by: J.R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-06-29 08:59:10 +10:00
Jeff Layton
f0a71eb820 cifs: fix fh_mutex locking in cifs_reopen_file
Fixes a regression caused by commit a6ce4932fb

When this lock was converted to a mutex, the locks were turned into
unlocks and vice-versa.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-27 23:46:43 +00:00
Linus Torvalds
aada1bc927 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] remove unknown mount option warning message
  [CIFS] remove bkl usage from umount begin
  cifs: Fix incorrect return code being printed in cFYI messages
  [CIFS] cleanup asn handling for ntlmssp
  [CIFS] Copy struct *after* setting the port, instead of before.
  cifs: remove rw/ro options
  cifs: fix problems with earlier patches
  cifs: have cifs parse scope_id out of IPv6 addresses and use it
  [CIFS] Do not send tree disconnect if session is already disconnected
  [CIFS] Fix build break
  cifs: display scopeid in /proc/mounts
  cifs: add new routine for converting AF_INET and AF_INET6 addrs
  cifs: have cifs_show_options show forceuid/forcegid options
  cifs: remove unneeded NULL checks from cifs_show_options
2009-06-26 09:37:19 -07:00
Steve French
71a394faaa [CIFS] remove unknown mount option warning message
Jeff's previous patch which removed the unneeded rw/ro
parsing can cause a minor warning in dmesg (about the
unknown rw or ro mount option) at mount time. This
patch makes cifs ignore them in kernel to remove the warning
(they are already handled in the mount helper and VFS).

Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-26 04:07:18 +00:00
Steve French
ad8034f197 [CIFS] remove bkl usage from umount begin
The lock_kernel call moved into the fs for umount_begin
is not needed.  This adds a check to make sure we don't
call umount_begin twice on the same fs.

umount_begin for cifs is probably not needed and
may eventually be able to be removed, but in
the meantime this smaller patch is safe and
gets rid of the bkl from this path which provides
some benefit.

Acked-by: Jeff Layton <redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-26 03:25:49 +00:00
Suresh Jayaraman
0f3bc09ee1 cifs: Fix incorrect return code being printed in cFYI messages
FreeXid() along with freeing Xid does add a cifsFYI debug message that
prints rc (return code) as well. In some code paths where we set/return
error code after calling FreeXid(), incorrect error code is being
printed when cifsFYI is enabled.

This could be misleading in few cases. For eg.
In cifs_open() if cifs_fill_filedata() returns a valid pointer to
cifsFileInfo, FreeXid() prints rc=-13 whereas 0 is actually being
returned. Fix this by setting rc before calling FreeXid().

Basically convert

FreeXid(xid);			rc = -ERR;
return -ERR;		=>	FreeXid(xid);
				return rc;

[Note that Christoph would like to replace the GetXid/FreeXid
calls, which are primarily used for debugging.  This seems
like a good longer term goal, but although there is an
alternative tracing facility, there are no examples yet
available that I know of that we can use (yet) to
convert this cifs function entry/exit logging, and for
creating an identifier that we can use to correlate
all dmesg log entries for a particular vfs operation
(ie identify all log entries for a particular vfs
request to cifs: e.g. a particular close or read or write
or byte range lock call ... and just using the thread id
is harder).  Eventually when a replacement
for this is available (e.g. when NFS switches over and various
samples to look at in other file systems) we can remove the
GetXid/FreeXid macro but in the meantime multiple people
use this run time configurable logging all the time
for debugging, and Suresh's patch fixes a problem
which made it harder to notice some low
memory problems in the log so it is worthwhile
to fix this problem until a better logging
approach is able to be used]

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-25 19:12:57 +00:00
Steve French
f46c7234e4 [CIFS] cleanup asn handling for ntlmssp
Also removes obsolete distinction between rawntlmssp and ntlmssp (in asn/SPNEGO)
since as jra noted we can always send raw ntlmssp in session setup now.

remove check for experimental runtime flag (/proc/fs/cifs/Experimental) in
ntlmssp path.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-25 03:07:48 +00:00
Simo Leone
6debdbc0ba [CIFS] Copy struct *after* setting the port, instead of before.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Simo Leone <simo@archlinux.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-25 02:44:43 +00:00
Jeff Layton
6459340cfc cifs: remove rw/ro options
cifs: remove rw/ro options

These options are handled at the VFS layer. They only ever set the
option in the smb_vol struct. Nothing was ever done with them afterward
anyway.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-25 02:33:01 +00:00
Jeff Layton
b48a485884 cifs: fix problems with earlier patches
cifs: fix problems with earlier patches

cifs_show_address hasn't been introduced yet, and fix a typo that was
silently fixed by a later patch in the series.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-25 02:32:57 +00:00
Jeff Layton
681bf72e48 cifs: have cifs parse scope_id out of IPv6 addresses and use it
This patch has CIFS look for a '%' in an IPv6 address. If one is
present then it will try to treat that value as a numeric interface
index suitable for stuffing into the sin6_scope_id field.

This should allow people to mount servers on IPv6 link-local addresses.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: David Holder <david@erion.co.uk>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-25 01:14:36 +00:00
Steve French
268875b9d1 [CIFS] Do not send tree disconnect if session is already disconnected
Noticed this when tree connect timed out (due to Samba server crash) -
we try to send a tree disconnect for a tid that does not exist
since we don't have a valid tree id yet. This checks that the
session is valid before sending the tree disconnect to handle
this case.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-06-25 00:29:21 +00:00
Al Viro
d5bb68adda another race fix in jfs_check_acl()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 17:02:42 -04:00
Al Viro
72c04902d1 Get "no acls for this inode" right, fix shmem breakage
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 16:58:48 -04:00
Linus Torvalds
936940a9c7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
  switch xfs to generic acl caching helpers
  helpers for acl caching + switch to those
  switch shmem to inode->i_acl
  switch reiserfs to inode->i_acl
  switch reiserfs to usual conventions for caching ACLs
  reiserfs: minimal fix for ACL caching
  switch nilfs2 to inode->i_acl
  switch btrfs to inode->i_acl
  switch jffs2 to inode->i_acl
  switch jfs to inode->i_acl
  switch ext4 to inode->i_acl
  switch ext3 to inode->i_acl
  switch ext2 to inode->i_acl
  add caching of ACLs in struct inode
  fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
  cleanup __writeback_single_inode
  ... and the same for vfsmount id/mount group id
  Make allocation of anon devices cheaper
  update Documentation/filesystems/Locking
  devpts: remove module-related code
  ...
2009-06-24 10:03:12 -07:00
Linus Torvalds
d7ed9c05eb Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: remove redundant tests on unsigned
  udf: Use device size when drive reported bogus number of written blocks
2009-06-24 09:57:10 -07:00
Al Viro
1cbd20d820 switch xfs to generic acl caching helpers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:07 -04:00
Al Viro
073aaa1b14 helpers for acl caching + switch to those
helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl),
forget_cached_acl(inode, type).

ubifs/xattr.c needed includes reordered, the rest is a plain switchover.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:07 -04:00
Al Viro
281eede032 switch reiserfs to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:06 -04:00
Al Viro
7a77b15d92 switch reiserfs to usual conventions for caching ACLs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:06 -04:00
Al Viro
e68888bcb6 reiserfs: minimal fix for ACL caching
reiserfs uses NULL as "unknown" and ERR_PTR(-ENODATA) as "no ACL";
several codepaths store the former instead of the latter.

All those codepaths go through iset_acl() and all cases when it's
called with NULL acl are for the second variety, so the minimal
fix is to teach iset_acl() to deal with that.

Proper fix is to switch to more usual conventions and avoid back
and forth between internally used ERR_PTR(-ENODATA) and NULL
expected by the rest of the kernel.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:05 -04:00
Al Viro
d441b1c293 switch nilfs2 to inode->i_acl
Actually, get rid of private analog, since nothing in there is
using ACLs at all so far.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:05 -04:00
Al Viro
5affd88a10 switch btrfs to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:05 -04:00
Al Viro
290c263bf8 switch jffs2 to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:05 -04:00
Al Viro
05fc0790b6 switch jfs to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:04 -04:00
Al Viro
d4bfe2f76d switch ext4 to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:04 -04:00
Al Viro
6582a0e6f6 switch ext3 to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:04 -04:00
Al Viro
5e78b43568 switch ext2 to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:28 -04:00
Al Viro
f19d4a8fa6 add caching of ACLs in struct inode
No helpers, no conversions yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:27 -04:00
Ankit Jain
3e63cbb1ef fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
This patch adds ioctls to vfs for compatibility with legacy XFS
pre-allocation ioctls (XFS_IOC_*RESVP*). The implementation
effectively invokes sys_fallocate for the new ioctls.
Also handles the compat_ioctl case.
Note: These legacy ioctls are also implemented by OCFS2.

[AV: folded fixes from hch]

Signed-off-by: Ankit Jain <me@ankitjain.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:27 -04:00
Christoph Hellwig
01c031945f cleanup __writeback_single_inode
There is no reason to for the split between __writeback_single_inode and
__sync_single_inode, the former just does a couple of checks before
tail-calling the latter.  So merge the two, and while we're at it split
out the I_SYNC waiting case for data integrity writers, as it's
logically separate function.  Finally rename __writeback_single_inode to
writeback_single_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:26 -04:00
Al Viro
f21f62208a ... and the same for vfsmount id/mount group id
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:26 -04:00
Al Viro
c63e09eccc Make allocation of anon devices cheaper
Standard trick - add a new variable (start) such that
for each n < start n is known to be busy.  Allocation can
skip checking everything in [0..start) and if it returns
n, we can set start to n + 1.  Freeing below start sets
start to what we'd just freed.

Of course, it still sucks if we do something like
	free 0
	allocate
	allocate
in a loop - still O(n^2) time.  However, on saner loads it
improves the things a lot and the entire thing is not worth
the trouble of switching to something with better worst-case
behaviour.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:25 -04:00
H. Peter Anvin
f6cc746bbb devpts: remove module-related code
These days, the devpts filesystem is closely integrated with the pty
memory management, and cannot be built as a module, even less removed
from the kernel.  Accordingly, remove all module-related stuff from
this filesystem.

[ v2: only remove code that's actually dead ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:24 -04:00
Trond Myklebust
3b22edc573 VFS: Switch init_mount_tree() to use the new create_mnt_ns() helper
Eliminates some duplicated code...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:24 -04:00
J. R. Okajima
654f562c52 vfs: fix nd->root leak in do_filp_open()
commit 2a73787110 "Cache root in nameidata"
introduced a new member nd->root, but forgot to put it in do_filp_open().

Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:24 -04:00
Christoph Hellwig
b5450d9c84 reiserfs: remove stray unlock_super in reiserfs_resize
Reiserfs doesn't use lock_super anywhere internally, and ->remount_fs
which calls reiserfs_resize does have it currently but also expects it
to be held on return, so there's no business for the unlock_super here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked by Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:24 -04:00
Roel Kluin
3391faa4f1 udf: remove redundant tests on unsigned
first_block and goal are unsigned. When negative they are wrapped and caught by
the other test.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-06-24 13:48:28 +02:00
Linus Torvalds
cf5434e894 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define.
  ocfs2: Add lockdep annotations
  vfs: Set special lockdep map for dirs only if not set by fs
  ocfs2: Disable orphan scanning for local and hard-ro mounts
  ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()
  ocfs2: Stop orphan scan as early as possible during umount
  ocfs2: Fix ocfs2_osb_dump()
  ocfs2: Pin journal head before accessing jh->b_committed_data
  ocfs2: Update atime in splice read if necessary.
  ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.
2009-06-23 19:36:02 -07:00
Trond Myklebust
b88f8a546f NFS: Correct the NFS mount path when following a referral
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
Trond Myklebust
0b75b35c7c NFS: Fix nfs_path() to always return a '/' at the beginning of the path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
Trond Myklebust
c02d7adf8c NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
As noted in the previous patch, the NFSv4 client mount code currently
has several limitations. If the mount path contains symlinks, or
referrals, or even if it just contains a '..', then the client code in
nfs4_path_walk() will fail with an error.

This patch replaces the nfs4_path_walk()-based lookup with a helper
function that sets up a private namespace to represent the namespace on the
server, then uses the ordinary VFS and NFS path lookup code to walk down the
mount path in that namespace.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
Trond Myklebust
cf8d2c11cb VFS: Add VFS helper functions for setting up private namespaces
The purpose of this patch is to improve the remote mount path lookup
support for distributed filesystems such as the NFSv4 client.

When given a mount command of the form "mount server:/foo/bar /mnt", the
NFSv4 client is required to look up the filehandle for "server:/", and
then look up each component of the remote mount path "foo/bar" in order
to find the directory that is actually going to be mounted on /mnt.
Following that remote mount path may involve following symlinks,
crossing server-side mount points and even following referrals to
filesystem volumes on other servers.

Since the standard VFS path lookup code already supports walking paths
that contain all these features (using in-kernel automounts for
following referrals) we would like to be able to reuse that rather than
duplicate the full path traversal functionality in the NFSv4 client code.

This patch therefore defines a VFS helper function create_mnt_ns(), that
sets up a temporary filesystem namespace and attaches a root filesystem to
it. It exports the create_mnt_ns() and put_mnt_ns() function for use by
filesystem modules.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
Trond Myklebust
616511d039 VFS: Uninline the function put_mnt_ns()
In order to allow modules to use it without having to export vfsmount_lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
David Woodhouse
4839641333 jffs2: fix another potential leak on error path in scan.c
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-06-23 01:34:19 +01:00
Linus Torvalds
ac1b7c378e Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (63 commits)
  mtd: OneNAND: Allow setting of boundary information when built as module
  jffs2: leaking jffs2_summary in function jffs2_scan_medium
  mtd: nand: Fix memory leak on txx9ndfmc probe failure.
  mtd: orion_nand: use burst reads with double word accesses
  mtd/nand: s3c6400 support for s3c2410 driver
  [MTD] [NAND] S3C2410: Use DIV_ROUND_UP
  [MTD] [NAND] S3C2410: Deal with unaligned lengths in S3C2440 buffer read/write
  [MTD] [NAND] S3C2410: Allow the machine code to get the BBT table from NAND
  [MTD] [NAND] S3C2410: Added a kerneldoc for s3c2410_nand_set
  mtd: physmap_of: Add multiple regions and concatenation support
  mtd: nand: max_retries off by one in mxc_nand
  mtd: nand: s3c2410_nand_setrate(): use correct macros for 2412/2440
  mtd: onenand: add bbt_wait & unlock_all as replaceable for some platform
  mtd: Flex-OneNAND support
  mtd: nand: add OMAP2/OMAP3 NAND driver
  mtd: maps: Blackfin async: fix memory leaks in probe/remove funcs
  mtd: uclinux: mark local stuff static
  mtd: uclinux: do not allow to be built as a module
  mtd: uclinux: allow systems to override map addr/size
  mtd: blackfin NFC: fix hang when using NAND on BF527-EZKITs
  ...
2009-06-22 16:56:22 -07:00
Tao Ma
d246ab307d ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define.
Actually ocfs2_sysfile_cluster_lock_key is only used if we enable
CONFIG_DEBUG_LOCK_ALLOC. Wrap it so that we can avoid a building
warning.
fs/ocfs2/sysfile.c:53: warning: ‘ocfs2_sysfile_cluster_lock_key’
defined but not used

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:34:29 -07:00
Jan Kara
cb25797d45 ocfs2: Add lockdep annotations
Add lockdep support to OCFS2. The support also covers all of the cluster
locks except for open locks, journal locks, and local quotafile locks. These
are special because they are acquired for a node, not for a particular process
and lockdep cannot deal with such type of locking.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:34:26 -07:00
Jan Kara
9a7aa12f39 vfs: Set special lockdep map for dirs only if not set by fs
Some filesystems need to set lockdep map for i_mutex differently for
different directories. For example OCFS2 has system directories (for
orphan inode tracking and for gathering all system files like journal
or quota files into a single place) which have different locking
locking rules than standard directories. For a filesystem setting
lockdep map is naturaly done when the inode is read but we have to
modify unlock_new_inode() not to overwrite the lockdep map the filesystem
has set.

Acked-by: peterz@infradead.org
CC: mingo@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:34:22 -07:00
Sunil Mushran
df152c241d ocfs2: Disable orphan scanning for local and hard-ro mounts
Local and Hard-RO mounts do not need orphan scanning.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:55 -07:00
Sunil Mushran
3211949f89 ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()
We don't access the LVB in our ocfs2_*_lock_res_init() functions.

Since the LVB can become invalid during some cluster recovery
operations, the dlmglue must be able to handle an uninitialized
LVB.

For the orphan scan lock, we initialized an uninitialzed LVB with our
scan sequence number plus one.  This starts a normal orphan scan
cycle.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:53 -07:00
Sunil Mushran
692684e19e ocfs2: Stop orphan scan as early as possible during umount
Currently if the orphan scan fires a tick before the user issues the umount,
the umount will wait for the queued orphan scan tasks to complete.

This patch makes the umount stop the orphan scan as early as possible so as
to reduce the probability of the queued tasks slowing down the umount.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:51 -07:00
Sunil Mushran
c3d38840ab ocfs2: Fix ocfs2_osb_dump()
Skip printing information that is not valid for local mounts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:49 -07:00
Sunil Mushran
94e41ecfe0 ocfs2: Pin journal head before accessing jh->b_committed_data
This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses
to jh->b_committed_data.

Fixes oss bugzilla#1131
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:47 -07:00
Tao Ma
1962f39abb ocfs2: Update atime in splice read if necessary.
We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock
in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so
that we can update atime in splice read if necessary.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:45 -07:00
Joel Becker
1c520dfbf3 ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.
The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
the DLM cannot reconstruct its state.  Clients of the DLM need to know
this.

ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
track of the state.  This is not a standard behavior, but ocfs2 has
always relied on it.  Thus, an o2dlm LVB is always "valid".

ocfs2 now supports both o2dlm and fs/dlm via the stack glue.  When
fs/dlm loses track of an LVBs state, it sets a flag
(DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB).  The contents of
the LVB may be garbage or merely stale.

ocfs2 doesn't want to try to guess at the validity of the stale LVB.
Instead, it should be checking the VALNOTVALID flag.  As this is the
'standard' way of treating LVBs, we will promote this behavior.

We add a stack glue API ocfs2_dlm_lvb_valid().  It returns non-zero when
the LVB is valid.  o2dlm will always return valid, while fs/dlm will
check VALNOTVALID.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
2009-06-22 14:24:30 -07:00
Linus Torvalds
7e0338c0de Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
  SUNRPC: Fix the TCP server's send buffer accounting
  nfsd41: Backchannel: minorversion support for the back channel
  nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
  nfsd41: Remove ip address collision detection case
  nfsd: optimise the starting of zero threads when none are running.
  nfsd: don't take nfsd_mutex twice when setting number of threads.
  nfsd41: sanity check client drc maxreqs
  nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
  NFS: kill off complicated macro 'PROC'
  sunrpc: potential memory leak in function rdma_read_xdr
  nfsd: minor nfsd_vfs_write cleanup
  nfsd: Pull write-gathering code out of nfsd_vfs_write
  nfsd: track last inode only in use_wgather case
  sunrpc: align cache_clean work's timer
  nfsd: Use write gathering only with NFSv2
  NFSv4: kill off complicated macro 'PROC'
  NFSv4: do exact check about attribute specified
  knfsd: remove unreported filehandle stats counters
  knfsd: fix reply cache memory corruption
  knfsd: reply cache cleanups
  ...
2009-06-22 12:55:50 -07:00
Linus Torvalds
df36b439c5 Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (128 commits)
  nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh()
  nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res
  nfs: remove unnecessary NFS_INO_INVALID_ACL checks
  NFS: More "sloppy" parsing problems
  NFS: Invalid mount option values should always fail, even with "sloppy"
  NFS: Remove unused XDR decoder functions
  NFS: Update MNT and MNT3 reply decoding functions
  NFS: add XDR decoder for mountd version 3 auth-flavor lists
  NFS: add new file handle decoders to in-kernel mountd client
  NFS: Add separate mountd status code decoders for each mountd version
  NFS: remove unused function in fs/nfs/mount_clnt.c
  NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument
  NFS: Clean up MNT program definitions
  lockd: Don't bother with RPC ping for NSM upcalls
  lockd: Update NSM state from SM_MON replies
  NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available
  NFS: Return error code from nfs_callback_up() to user space
  NFS: Do not display the setting of the "intr" mount option
  NFS: add support for splice writes
  nfs41: Backchannel: CB_SEQUENCE validation
  ...
2009-06-22 12:53:06 -07:00
Hitoshi Mitake
e38be994b9 Making fs/minix/minix.h double including safe
I happened to find that fs/minix/minix.h doesn't guard double include.

Yes, I know this never cause something destructive because this is
self-evidence that no source file includes minix.h twice, but I think
fixing this is better than disregarding it.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:34:42 -07:00
Boaz Harrosh
baaf94cdc7 exofs: Avoid using file_fsync()
The use of file_fsync() in exofs_file_sync() is not necessary since it
does some extra stuff not used by exofs. Open code just the parts that
are currently needed.

TODO: Farther optimization can be done to sync the sb only on inode
update of new files, Usually the sb update is not needed in exofs.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-06-21 17:53:47 +03:00
Boaz Harrosh
27d2e14919 exofs: Remove IBM copyrights
Boaz,
Congrats on getting all the OSD stuff into 2.6.30!
I just pulled the git, and saw that the IBM copyrights are still there.
Please remove them from all files:
 * Copyright (C) 2005, 2006
 * International Business Machines

IBM has revoked all rights on the code - they gave it to me.

Thanks!
Avishay

Signed-off-by: Avishay Traeger <avishay@gmail.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-06-21 17:53:47 +03:00
Boaz Harrosh
b76a3f93d0 exofs: Fix bio leak in error handling path (sync read)
When failing a read request in the sync path, called from
write_begin, I forgot to free the allocated bio, fix it.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-06-21 17:53:44 +03:00
Benny Halevy
578e458568 nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res
nfs4_open_recover_helper clears opendata->o_res
before calling nfs4_init_opendata_res, thus causing
NFSv4.0 OPEN operations to be sent rather than nfsv4.1.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-20 14:55:12 -04:00
Linus Torvalds
12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
OGAWA Hirofumi
3e107603ae fat: Fix the removal of opts->fs_dmask
(ce3b0f8d5c: New helper - current_umask())
is removing the opts->fs_dmask, probably it's a cut-and-paste
miss or something.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2009-06-20 21:50:47 +09:00
Linus Torvalds
bee89ab228 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: inotify_destroy_mark_entry could get called twice
2009-06-19 17:46:44 -07:00
Linus Torvalds
31583d6acf Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Fix kernel-doc parameter name typo in blk-settings.c:
  block: rename CONFIG_LBD to CONFIG_LBDAF
  block: Fix bounce_pfn setting
  hd: stop defining MAJOR_NR
2009-06-19 17:43:04 -07:00
Eric Paris
528da3e9e2 inotify: inotify_destroy_mark_entry could get called twice
inotify_destroy_mark_entry could get called twice for the same mark since it
is called directly in inotify_rm_watch and when the mark is being destroyed for
another reason.  As an example assume that the file being watched was just
deleted so inotify_destroy_mark_entry would get called from the path
fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() ->
fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry().  If this
happened at the same time as userspace tried to remove a watch via
inotify_rm_watch we could attempt to remove the mark from the idr twice and
could thus double dec the ref cnt and potentially could be in a use after
free/double free situation.  The fix is to have inotify_rm_watch use the
generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the
inotify_destroy_mark_entry() function can only be called one.

This patch also renames the function to inotify_ingored_remove_idr() so it is
clear what is actually going on in the function.

Hopefully this fixes:
[   20.342058] idr_remove called for id=20 which is not allocated.
[   20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077
[   20.353933] Call Trace:
[   20.356410]  [<ffffffff811a82b7>] idr_remove+0x115/0x18f
[   20.361737]  [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75
[   20.367061]  [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf
[   20.373771]  [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf
[   20.380306]  [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10
[   20.386238]  [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170
[   20.393293]  [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf
[   20.399829]  [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6
[   20.405850]  [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Peter Ziljlstra <peterz@infradead.org>
2009-06-19 12:42:48 -04:00
Bartlomiej Zolnierkiewicz
90c699a9ee block: rename CONFIG_LBD to CONFIG_LBDAF
Follow-up to "block: enable by default support for large devices
and files on 32-bit archs".

Rename CONFIG_LBD to CONFIG_LBDAF to:
- allow update of existing [def]configs for "default y" change
- reflect that it is used also for large files support nowadays

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-19 08:08:50 +02:00
Andy Adamson
ab52ae6db0 nfsd41: Backchannel: minorversion support for the back channel
Prepare to share backchannel code with NFSv4.1.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[nfsd41: use nfsd4_cb_sequence for callback minorversion]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 18:33:57 -07:00
Andy Adamson
ef52bff840 nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
Mimic the client and prepare to share the back channel xdr with NFSv4.1.
Bump the number of operations in each encode routine, then backfill the
number of operations.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 18:33:57 -07:00
Trond Myklebust
1f84603c09 Merge branch 'devel-for-2.6.31' into for-2.6.31
Conflicts:
	fs/nfs/client.c
	fs/nfs/super.c
2009-06-18 18:13:44 -07:00
Mike Sager
6ddbbbfe52 nfsd41: Remove ip address collision detection case
Verified that cthon and pynfs exchange id tests pass (except for the
two expected fails: EID8 and EID50)

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 17:43:53 -07:00
Linus Torvalds
0732f87761 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: clean up jbd2_journal_try_to_free_buffers()
  ext4: Don't update ctime for non-extent-mapped inodes
  ext4: Fix up whitespace issues in fs/ext4/inode.c
  ext4: Fix 64-bit block type problem on 32-bit platforms
  ext4: teach the inode allocator to use a goal inode number
  ext4: Use a hash of the topdir directory name for the Orlov parent group
  ext4: document the "abort" mount option
  ext4: move the abort flag from s_mount_opts to s_mount_flags
  ext4: update the s_last_mounted field in the superblock
  ext4: change s_mount_opt to be an unsigned int
  ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl
  ext4: avoid unnecessary spinlock in critical POSIX ACL path
  ext3: avoid unnecessary spinlock in critical POSIX ACL path
  ext4: convert instrumentation from markers to tracepoints
  jbd2: convert instrumentation from markers to tracepoints
2009-06-18 14:07:46 -07:00
Peter Oberparleiter
0b923606e7 seq_file: add function to write binary data
seq_write() can be used to construct seq_files containing arbitrary data.
Required by the gcov-profiling interface to synthesize binary profiling
data files.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Li Wei <W.Li@Sun.COM>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:57 -07:00