Commit Graph

2913 Commits

Author SHA1 Message Date
Linus Torvalds
1d77062b14 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (51 commits)
  nfs: remove nfs_put_link()
  nfs-build-fix-99
  git-nfs-build-fixes
  Merge branch 'odirect'
  NFS: alloc nfs_read/write_data as direct I/O is scheduled
  NFS: Eliminate nfs_get_user_pages()
  NFS: refactor nfs_direct_free_user_pages
  NFS: remove user_addr, user_count, and pos from nfs_direct_req
  NFS: "open code" the NFS direct write rescheduler
  NFS: Separate functions for counting outstanding NFS direct I/Os
  NLM: Fix reclaim races
  NLM: sem to mutex conversion
  locks.c: add the fl_owner to nlm_compare_locks
  NFS: Display the chosen RPCSEC_GSS security flavour in /proc/mounts
  NFS: Split fs/nfs/inode.c
  NFS: Fix typo in nfs_do_clone_mount()
  NFS: Fix compile errors introduced by referrals patches
  NFSv4: Ensure that referral mounts bind to a reserved port
  NFSv4: A root pathname is sent as a zero component4
  NFSv4: Follow a referral
  ...
2006-06-25 10:54:14 -07:00
Linus Torvalds
25581ad107 Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (244 commits)
  V4L/DVB (4210b): git-dvb: tea575x-tuner build fix
  V4L/DVB (4210a): git-dvb versus matroxfb
  V4L/DVB (4209): Added some BTTV PCI IDs for newer boards
  Fixes some sync issues between V4L/DVB development and GIT
  V4L/DVB (4206): Cx88-blackbird: always set encoder height based on tvnorm->id
  V4L/DVB (4205): Merge tda9887 module into tuner.
  V4L/DVB (4203): Explicitly set the enum values.
  V4L/DVB (4202): allow selecting CX2341x port mode
  V4L/DVB (4200): Disable bitrate_mode when encoding mpeg-1.
  V4L/DVB (4199): Add cx2341x-specific control array to cx2341x.c
  V4L/DVB (4198): Avoid newer usages of obsoleted experimental MPEGCOMP API
  V4L/DVB (4197): Port new MPEG API to saa7134-empress with saa6752hs
  V4L/DVB (4196): Port cx88-blackbird to the new MPEG API.
  V4L/DVB (4193): Update cx2341x fw encoding API doc.
  V4L/DVB (4192): Use control helpers for saa7115, cx25840, msp3400.
  V4L/DVB (4191): Add CX2341X MPEG encoder module.
  V4L/DVB (4190): Add helper functions for control processing to v4l2-common.
  V4L/DVB (4189): Add videodev support for VIDIOC_S/G/TRY_EXT_CTRLS.
  V4L/DVB (4188): Add new MPEG control/ioctl definitions to videodev2.h
  V4L/DVB (4186): Add support for the DNTV Live! mini DVB-T card.
  ...
2006-06-25 10:09:31 -07:00
Hua Zhong
f58a1ebb22 [PATCH] remove unlikely(sb) in prune_dcache
likely profiling shows that the following is a miss.

After boot:
[+- ] Type | # True | # False | Function:Filename@Line
+unlikely |     1074|        0  prune_dcache()@:fs/dcache.c@409

After a bonnie++ run:
+unlikely |    66716|    19584  prune_dcache()@:fs/dcache.c@409

So remove it.

Signed-off-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:26 -07:00
Evgeniy Dushistov
7d93a1a53a [PATCH] ext2: cleanup: put_page and comment fix
Things which force me think a little: why so?

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:25 -07:00
Ulrich Drepper
45c9b11a1d [PATCH] Implement AT_SYMLINK_FOLLOW flag for linkat
When the linkat() syscall was added the flag parameter was added in the
last minute but it wasn't used so far.  The following patch should change
that.  My tests show that this is all that's needed.

If OLDNAME is a symlink setting the flag causes linkat to follow the
symlink and create a hardlink with the target.  This is actually the
behavior POSIX demands for link() as well but Linux wisely does not do
this.  With this flag (which will most likely be in the next POSIX
revision) the programmer can choose the behavior, defaulting to the safe
variant.  As a side effect it is now possible to implement a
POSIX-compliant link(2) function for those who are interested.

  touch file
  ln -s file symlink

  linkat(fd, "symlink", fd, "newlink", 0)
    -> newlink is hardlink of symlink

  linkat(fd, "symlink", fd, "newlink", AT_SYMLINK_FOLLOW)
    -> newlink is hardlink of file

The value of AT_SYMLINK_FOLLOW is determined by the definition we already
use in glibc.

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:22 -07:00
Frode Isaksen
04a3446c90 [PATCH] fs: sys_poll with timeout -1 bug fix
If you do a poll() call with timeout -1, the wait will be a big number
(depending on HZ) instead of infinite wait, since -1 is passed to the
msecs_to_jiffies function.

Signed-off-by: Frode Isaksen <frode.isaksen@gmail.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:22 -07:00
Al Viro
2b943cf09d [PATCH] fix %s in affs_fill_super()
%s is only valid if array is known to contain NUL or precision is given and
does not exceed the size of array.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:22 -07:00
Serge E. Hallyn
fa366ad5d7 [PATCH] kthread: convert smbiod
Update smbiod to use kthread instead of deprecated kernel_thread.

[akpm@osdl.org: cleanup]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:21 -07:00
Eric Sesterhenn
099a71d995 [PATCH] Remove needless checks in fs/9p/vfs_inode.c
coverity found two needless checks in vfs_inode.c (cid #1165 and #1164)
In both cases inode is always NULL when we goto error; either because it
is still initialized to NULL or is set to NULL explicitly. This patch
simply removes these checks to save some code.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@lanl.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:20 -07:00
Miklos Szeredi
9c8ef5614d [PATCH] fuse: scramble lock owner ID
VFS uses current->files pointer as lock owner ID, and it wouldn't be
prudent to expose this value to userspace.  So scramble it with XTEA using
a per connection random key, known only to the kernel.  Only one direction
needs to be implemented, since the ID is never sent in the reverse
direction.

The XTEA algorithm is implemented inline since it's simple enough to do so,
and this adds less complexity than if the crypto API were used.

Thanks to Jesper Juhl for the idea.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:20 -07:00
Miklos Szeredi
a4d27e75ff [PATCH] fuse: add request interruption
Add synchronous request interruption.  This is needed for file locking
operations which have to be interruptible.  However filesystem may implement
interruptibility of other operations (e.g.  like NFS 'intr' mount option).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
f9a2842e56 [PATCH] fuse: rename the interrupted flag
Rename the 'interrupted' flag to 'aborted', since it indicates exactly that,
and next patch will introduce an 'interrupted' flag for a

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
33649c91a3 [PATCH] fuse: ensure FLUSH reaches userspace
All POSIX locks owned by the current task are removed on close().  If the
FLUSH request resulting initiated by close() fails to reach userspace, there
might be locks remaining, which cannot be removed.

The only reason it could fail, is if allocating the request fails.  In this
case use the request reserved for RELEASE, or if that is currently used by
another FLUSH, wait for it to become available.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
7142125937 [PATCH] fuse: add POSIX file locking support
This patch adds POSIX file locking support to the fuse interface.

This implementation doesn't keep any locking state in kernel.  Unlocking on
close() is handled by the FLUSH message, which now contains the lock owner id.

Mandatory locking is not supported.  The filesystem may enfoce mandatory
locking in userspace if needed.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
bafa96541b [PATCH] fuse: add control filesystem
Add a control filesystem to fuse, replacing the attributes currently exported
through sysfs.  An empty directory '/sys/fs/fuse/connections' is still created
in sysfs, and mounting the control filesystem here provides backward
compatibility.

Advantages of the control filesystem over the previous solution:

  - allows the object directory and the attributes to be owned by the
    filesystem owner, hence letting unpriviled users abort the
    filesystem connection

  - does not suffer from module unload race

[akpm@osdl.org: fix this fs for recent dhowells depredations]
[akpm@osdl.org: fix 64-bit printk warnings]
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
51eb01e735 [PATCH] fuse: no backgrounding on interrupt
Don't put requests into the background when a fatal interrupt occurs while the
request is in userspace.  This removes a major wart from the implementation.

Backgrounding of requests was introduced to allow breaking of deadlocks.
However now the same can be achieved by aborting the filesystem through the
'abort' sysfs attribute.

This is a change in the interface, but should not cause problems, since these
kinds of deadlocks never happen during normal operation.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Ian Kent
f9022f6633 [PATCH] autofs4: need to invalidate children on tree mount expire
I've found a case where invalid dentrys in a mount tree, waiting to be
cleaned up by d_invalidate, prevent the expected expire.

In this case dentrys created during a lookup for which a mount fails or has
no entry in the mount map contribute to the d_count of the parent dentry.
These dentrys may not be invalidated prior to comparing the interanl usage
count of valid autofs dentrys against the dentry d_count which makes a
mount tree appear busy so it doesn't expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:18 -07:00
Peter Staubach
6e656be899 [PATCH] ftruncate does not always update m/ctime
In the course of trying to track down a bug where a file mtime was not
being updated correctly, it was discovered that the m/ctime updates were
not quite being handled correctly for ftruncate() calls.

Quoth SUSv3:

open(2):

        If O_TRUNC is set and the file did previously exist, upon
        successful completion, open() shall mark for update the st_ctime
        and st_mtime fields of the file.

truncate(2):

        Upon successful completion, if the file size is changed, this
        function shall mark for update the st_ctime and st_mtime fields
        of the file, and the S_ISUID and S_ISGID bits of the file mode
        may be cleared.

ftruncate(2):

        Upon successful completion, if fildes refers to a regular file,
        the ftruncate() function shall mark for update the st_ctime and
        st_mtime fields of the file and the S_ISUID and S_ISGID bits of
        the file mode may be cleared. If the ftruncate() function is
        unsuccessful, the file is unaffected.

The open(O_TRUNC) and truncate cases were being handled correctly, but the
ftruncate case was being handled like the truncate case.  The semantics of
truncate and ftruncate don't quite match, so ftruncate needs to be handled
slightly differently.

The attached patch addresses this issue for ftruncate(2).

My thanx to Stephen Tweedie and Trond Myklebust for their help in
understanding the situation and semantics.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:15 -07:00
Johann Lombardi
92eeccd8ba [PATCH] ext3: cleanup dead code in ext3_add_entry()
The variables nlen and rlen are defined/initialized but not used in
ext3_add_entry().

Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:15 -07:00
Florin Malita
0710d36a0f [PATCH] 9pfs: missing result check in v9fs_vfs_readlink() and v9fs_vfs_link()
__getname() may fail and return NULL (as pointed out by Coverity 437 &
1220).

Signed-off-by: Florin Malita <fmalita@gmail.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: <rminnich@lanl.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:15 -07:00
Davide Libenzi
3419b23a91 [PATCH] epoll: use unlocked wqueue operations
A few days ago Arjan signaled a lockdep red flag on epoll locks, and
precisely between the epoll's device structure lock (->lock) and the wait
queue head lock (->lock).

Like I explained in another email, and directly to Arjan, this can't happen
in reality because of the explicit check at eventpoll.c:592, that does not
allow to drop an epoll fd inside the same epoll fd.  Since lockdep is
working on per-structure locks, it will never be able to know of policies
enforced in other parts of the code.

It was decided time ago of having the ability to drop epoll fds inside
other epoll fds, that triggers a very trick wakeup operations (due to
possibly reentrant callback-driven wakeups) handled by the
ep_poll_safewake() function.  While looking again at the code though, I
noticed that all the operations done on the epoll's main structure wait
queue head (->wq) are already protected by the epoll lock (->lock), so that
locked-style functions can be used to manipulate the ->wq member.  This
makes both a lock-acquire save, and lockdep happy.

Running totalmess on my dual opteron for a while did not reveal any problem
so far:

http://www.xmailserver.org/totalmess.c

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:13 -07:00
Valerie Henson
21730eed11 [PATCH] Make EXT2_DEBUG work again
This patch makes EXT2_DEBUG work again.  Due to lack of proper include
file, EXT2_DEBUG was undefined in bitmap.c and ext2_count_free() is left
out.  Moved to balloc.c and removed bitmap.c entirely.

Second, debug versions of ext2_count_free_{inodes/blocks} reacquires
superblock lock.  Moved lock into callers.

Signed-off-by: Val Henson <val_henson@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:12 -07:00
H. Peter Anvin
69755652c9 [PATCH] Make procfs obligatory except under CONFIG_EMBEDDED
Make procfs non-optional unless EMBEDDED is set, just like sysfs.  procfs
is already de facto required for a large subset of Linux functionality.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:11 -07:00
Mingming Cao
43d23f9039 [PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversion
Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t.  Convert the
rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t,
and replace the printk format string respondingly.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:10 -07:00
Mingming Cao
1c2bf374a4 [PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixes
Some of the in-kernel ext3 block variable type are treated as signed 4 bytes
int type, thus limited ext3 filesystem to 8TB (4kblock size based).  While
trying to fix them, it seems quite confusing in the ext3 code where some
blocks are filesystem-wide blocks, some are group relative offsets that need
to be signed value (as -1 has special meaning).  So it seem saner to define
two types of physical blocks: one is filesystem wide blocks, another is
group-relative blocks.  The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3
filesystem limit to 8TB.

With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend exts filesystem limit to 16TB.

This work is also a pre-request for the recent >32 bit ext3 work, and makes
the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem block corresponding.

Two RFC with a series patches have been posted to ext2-devel list and have
been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2

http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and
>8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39).  Tests
includes overnight fsx, tiobench, dbench and fsstress.

This patch:

Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem wide blocks.

This patch classifies all block group relative blocks, and ext3_fsblk_t blocks
occurs in the same function where used to be confusing before.  Also include
kernel bug fixes for filesystem wide in-kernel block variables.  There are
some fileystem wide blocks are treated as int/unsigned int type in the kernel
currently, especially in ext3 block allocation and reservation code.  This
patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned
long) type.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:10 -07:00
NeilBrown
01408c4939 [PATCH] Prepare for __copy_from_user_inatomic to not zero missed bytes
The problem is that when we write to a file, the copy from userspace to
pagecache is first done with preemption disabled, so if the source address is
not immediately available the copy fails *and* *zeros* *the* *destination*.

This is a problem because a concurrent read (which admittedly is an odd thing
to do) might see zeros rather that was there before the write, or what was
there after, or some mixture of the two (any of these being a reasonable thing
to see).

If the copy did fail, it will immediately be retried with preemption
re-enabled so any transient problem with accessing the source won't cause an
error.

The first copying does not need to zero any uncopied bytes, and doing so
causes the problem.  It uses copy_from_user_atomic rather than copy_from_user
so the simple expedient is to change copy_from_user_atomic to *not* zero out
bytes on failure.

The first of these two patches prepares for the change by fixing two places
which assume copy_from_user_atomic does zero the tail.  The two usages are
very similar pieces of code which copy from a userspace iovec into one or more
page-cache pages.  These are changed to remove the assumption.

The second patch changes __copy_from_user_inatomic* to not zero the tail.
Once these are accepted, I will look at similar patches of other architectures
where this is important (ppc, mips and sparc being the ones I can find).

This patch:

There is a problem with __copy_from_user_inatomic zeroing the tail of the
buffer in the case of an error.  As it is called in atomic context, the error
may be transient, so it results in zeros being written where maybe they
shouldn't be.

In the usage in filemap, this opens a window for a well timed read to see data
(zeros) which is not consistent with any ordering of reads and writes.

Most cases where __copy_from_user_inatomic is called, a failure results in
__copy_from_user being called immediately.  As long as the latter zeros the
tail, the former doesn't need to.  However in *copy_from_user_iovec
implementations (in both filemap and ntfs/file), it is assumed that
copy_from_user_inatomic will zero the tail.

This patch removes that assumption, so that after this patch it will
be safe for copy_from_user_inatomic to not zero the tail.

This patch also adds some commentary to filemap.h and asm-i386/uaccess.h.

After this patch, all architectures that might disable preempt when
kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed".
This includes
 - powerpc
 - i386
 - mips
 - sparc

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:09 -07:00
Theodore Ts'o
d2e5b13c4a [PATCH] ext3: remove inconsistent space before exclamation point in mount code
This was reported as Debian bug #336604.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Theodore Ts'o
e8f1c6227a [PATCH] ext3: fix memory leak when the journal file is corrupted
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Theodore Ts'o
f16fdadba2 [PATCH] ext2: clean up dead code from mount code
The variable i is guaranteed to be the same as db_count given the previous
for loop.  So get rid of it since it's dead code.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Mingming Cao
fcd5df3588 [PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystem
If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
CONFIG_LBD not defined in the kernel), the calculation of the disk sector
will overflow.  Add check at ext3_fill_super() and ext3_group_extend() to
prevent mount/remount/resize >2TB ext3 filesystem if sector_t size is 4
bytes.

Verified this patch on a 32 bit platform without CONFIG_LBD defined
(sector_t is 32 bits long), mount refuse to mount a 10TB ext3.

Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Jan Engelhardt
a04ee14636 [PATCH] openpromfs: factorize out
"Move" "common code" out to PTR_NOD, which does the conversion from private
pointer to node number.  This is to reduce potential casting/conversion errors
due to redundancy.  (The naming PTR_NOD follows PTR_ERR, turning a pointer
into xyz.)

[akpm@osdl.org: cleanups]
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:05 -07:00
Jan Engelhardt
515decdccf [PATCH] openpromfs: remove unnecessary casts
Remove unnecessary casts in fs/openpromfs/inode.c

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:05 -07:00
Jan Engelhardt
0928d68056 [PATCH] openpromfs: fix missing NUL
tchars is not '\0'-terminated so the strtoul may run into problems.  Fix that.
 Also make tchars as big as a long in hexadecimal form would take rather than
just 16.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:05 -07:00
Adrian Bunk
138bb68ac9 [PATCH] fs/ufs/inode.c: make 2 functions static
Make two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov
098d5af7be [PATCH] ufs: ubh_ll_rw_block cleanup
In ufs code there is function: ubh_ll_rw_block, it has parameter how many
ufs_buffer_head it should handle, but it always called with "1" on the place
of this parameter.  This patch removes unused parameter of "ubh_ll_wr_block".

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov
ee3ffd6c12 [PATCH] ufs: make fsck -f happy
ufs super block contains some statistic about file systems, like amount of
directories, free blocks, inodes and so on.

UFS1 hold this information in one location and uses 32bit integers for such
information, UFS2 hold statistic in another location and uses 64bit integers.

There is transition variant, if UFS1 has type 44BSD and flags field in super
block has some special value this mean that we work with statistic like UFS2
does.  and this also means that nobody care about old(UFS1) statistic.

So if start fsck against such file system, after usage linux ufs driver, it
found error: at now only UFS1 like statistic is updated.

This patch should fix this.  Also it contains some minor cleanup: CodingSytle
and remove unused variables.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov
577a82752f [PATCH] ufs: fsync implementation
Presently ufs doesn't support "fsync", this make some applications unhappy,
for example vim.  This patch fixes this situation.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov
647b7e87b5 [PATCH] ufs: one way to access super block
Super block of UFS usually has size >512, because of fragment size may be 512,
this cause some problems.

Currently, there are two methods to work with ufs super block:

1) split structure which describes ufs super blocks into structures with
   size <=512

2) use one structure which describes ufs super block, and hope that array
   of "buffer_head" which holds "super block", has such construction:

	bh[n]->b_data + bh[n]->b_size == bh[n + 1]->b_data

The second variant may cause some problems in the future, and usage of two
variants cause unnecessary code duplication.

This patch remove the second variant.  Also patch contains some CodingStyle
fixes.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov
f391475812 [PATCH] ufs: missed brelse and wrong baseblk
This patch fixes two bugs, which introduced by previous patches:

1) Missed "brelse"

2) Sometimes "baseblk" may be wrongly calculated, if i_size is equal to
   zero, which lead infinite cycle in "mpage_writepages".

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Andrew Morton
96710b29e0 [PATCH] ufs: printk warning fixes
fs/ufs/super.c: In function `ufs_print_super_stuff':
fs/ufs/super.c:103: warning: unsigned int format, different type arg (arg 2)    fs/ufs/super.c: In function `ufs2_print_super_stuff':                           fs/ufs/super.c:147: warning: unsigned int format, different type arg (arg 2)    fs/ufs/super.c: In function `ufs_print_cylinder_stuff':
fs/ufs/super.c:175: warning: unsigned int format, different type arg (arg 2)

Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov
022a6dc5f4 [PATCH] ufs: zero metadata
Presently if we allocate several "metadata" blocks (pointers to indirect
blocks for example), we fill with zeroes only the first block.  This cause
some problems in "truncate" function.  Also this patch remove some unused
arguments from several functions and add comments.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov
2e006393ba [PATCH] ufs: unlock_super without lock
ufs_free_blocks function looks now in so way:
if (err)
 goto failed;
 lock_super();
failed:
 unlock_super();

So if error happen we'll unlock not locked super.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov
50aa4eb0b9 [PATCH] ufs: i_blocks wrong count
At now UFS code uses DQUOT_* mechanism, but it also update inode->i_blocks
manually, this cause wrong i_blocks value.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov
dd187a2603 [PATCH] ufs: little directory lookup optimization
This patch make little optimization of ufs_find_entry like "ext2" does.  Save
number of page and reuse it again in the next call.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov
abf5d15fd2 [PATCH] ufs: easy debug
Currently to turn on debug mode "user" has to edit ~10 files, to turn off he
has to do it again.

This patch introduce such changes:
1)turn on(off) debug messages via ".config"
2)remove unnecessary duplication of code
3)make "UFSD" macros more similar to function
4)fix some compiler warnings

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov
5afb3145c9 [PATCH] ufs: Unmark CONFIG_UFS_FS_WRITE as BROKEN
To find new bugs, I suggest revert this patch:
http://lkml.org/lkml/2006/1/31/275 in -mm tree.

So others can test "write support" of UFS.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov
3e41f597b1 [PATCH] ufs: not usual amounts of fragments per block
The writing to UFS file system with block/fragment!=8 may cause bogus
behaviour.  The problem in "ufs_bitmap_search" function, which doesn't work
correctly in "block/fragment!=8" case.  The idea is stolen from BSD code.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov
9695ef16ed [PATCH] ufs: wrong type cast
There are two ugly macros in ufs code:
#define UCPI_UBH ((struct ufs_buffer_head *)ucpi)
#define USPI_UBH ((struct ufs_buffer_head *)uspi)
when uspi looks like
struct {
struct ufs_buffer_head ;
}
and USPI_UBH has some sence,
ucpi looks like
struct {
struct not_ufs_buffer_head;
}

To prevent bugs in future, this patch convert macros to inline function and
fix "ucpi" structure.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov
b71034e5e6 [PATCH] ufs: directory and page cache: from blocks to pages
Change function in fs/ufs/dir.c and fs/ufs/namei.c to work with pages
instead of straight work with blocks.  It fixed such bugs:

* for i in `seq 1 1000`; do touch $i; done - crash system
* mkdir create directory without "." and ".." entries

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov
826843a347 [PATCH] ufs: directory and page cache: install aops
This series of patches finished "bugs fixing" mentioned
here http://lkml.org/lkml/2006/1/31/275 .

The main bugs:
* for i in `seq 1 1000`; do touch $i; done - crash system
* mkdir create directory without "." and ".." entries

The suggested solution is work with page cache instead of straight work
with blocks.  Such solution has following advantages

* reduce code size and its complexity
* some global locks go away
* fix bugs

The most part of code is stolen from ext2, because of it has similar
directory structure.

Patches testes with UFS1 and UFS2 file systems.

This patch installs i_mapping->a_ops for directory inodes and removes some
duplicated code.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov
6ef4d6bf86 [PATCH] ufs: change block number on the fly
First of all some necessary notes about UFS by it self: To avoid waste of disk
space the tail of file consists not from blocks (which is ordinary big enough,
16K usually), it consists from fragments(which is ordinary 2K).  When file is
growing its tail occupy 1 fragment, 2 fragments...  At some stage decision to
allocate whole block is made and all fragments are moved to one block.

How this situation was handled before:

  ufs_prepare_write
  ->block_prepare_write
    ->ufs_getfrag_block
      ->...
        ->ufs_new_fragments:

	bh = sb_bread
	bh->b_blocknr = result + i;
	mark_buffer_dirty (bh);

This is wrong solution, because:

- it didn't take into consideration that there is another cache: "inode page
  cache"

- because of sb_getblk uses not b_blocknr, (it uses page->index) to find
  certain block, this breaks sb_getblk.

How this situation is handled now: we go though all "page inode cache", if
there are no such page in cache we load it into cache, and change b_blocknr.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Evgeniy Dushistov
c9a27b5dca [PATCH] ufs: right block allocation
* After block allocation, we map it on the same "address" as 8 others
  blocks

* We nullify block several times: once in ufs/block.c and once in
  block_*write_full_page, and use different "caches" for this.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Evgeniy Dushistov
2061df0f89 [PATCH] ufs: ufs_trunc_indirect: infinite cycle
Currently, ufs write support have two sets of problems: work with files and
work with directories.

This series of patches should solve the first problem.

This patch is similar to http://lkml.org/lkml/2006/1/17/61 this patch
complements it.

The situation the same: in ufs_trunc_(not direct), we read block, check if
count of links to it is equal to one, if so we finish cycle, if not
continue.  Because of "count of links" always >=2 this operation cause
infinite cycle and hang up the kernel.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Cliff Wickman
552c03483e [PATCH] fs/freevxfs: cleanup of spelling errors
Fix of some spelling errors in fs/freevxfs error messages and comments

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Alexey Dobriyan
9bf2aa129a nfs: remove nfs_put_link()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-25 06:39:35 -04:00
Andrew Morton
6ab86aa130 nfs-build-fix-99
fs/built-in.o:(__param+0x20): undefined reference to `nfs_idmap_cache_timeout'
fs/built-in.o:(__param+0x48): undefined reference to `nfs_callback_set_tcpport'

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: Chuck Lever <cel@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Manoj Naik <manoj@almaden.ibm.com>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-25 06:38:47 -04:00
Andrew Morton
d75d54147d git-nfs-build-fixes
Fix various problems with nfs4 disabled.  And various other things.

In file included from fs/nfs/inode.c:50:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
fs/nfs/inode.c: In function 'init_once':
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'open_states'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation_state'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'rwsem'
distcc[26452] ERROR: compile fs/nfs/inode.c on g5/64 failed
make[1]: *** [fs/nfs/inode.o] Error 1
make: *** [fs/nfs/inode.o] Error 2
make: *** Waiting for unfinished jobs....
In file included from fs/nfs/nfs3xdr.c:26:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26486] ERROR: compile fs/nfs/nfs3xdr.c on g5/64 failed
make[1]: *** [fs/nfs/nfs3xdr.o] Error 1
make: *** [fs/nfs/nfs3xdr.o] Error 2
In file included from fs/nfs/nfs3proc.c:24:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26469] ERROR: compile fs/nfs/nfs3proc.c on bix/32 failed
make[1]: *** [fs/nfs/nfs3proc.o] Error 1
make: *** [fs/nfs/nfs3proc.o] Error 2
**FAILED**

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: Chuck Lever <cel@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Manoj Naik <manoj@almaden.ibm.com>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-25 06:38:11 -04:00
Trond Myklebust
ccf01ef7aa Merge branch 'odirect' 2006-06-25 06:27:31 -04:00
Andrew Morton
1b77c54ee1 V4L/DVB (3809a): Remove compat stuff for DMX_GET_EVENT
The ioctl were removed by:
V4L/DVB (3727): Remove DMX_GET_EVENT and associated data structures
due to the ioctl DMX_GET_EVENT has never been implemented, and also
scrambling events can't be generated in a useful way by the hardware.

This patch removes the corresponding entry at fs/compat_ioctl.c

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-06-25 01:58:10 -03:00
Chuck Lever
82b145c5a5 NFS: alloc nfs_read/write_data as direct I/O is scheduled
Re-arrange the logic in the NFS direct I/O path so that nfs_read/write_data
structs are allocated just before they are scheduled, rather than
allocating them all at once before we start scheduling requests.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever
06cf6f2ed0 NFS: Eliminate nfs_get_user_pages()
Neil Brown observed that the kmalloc() in nfs_get_user_pages() is more
likely to fail if the I/O is large enough to require the allocation of more
than a single page to keep track of all the pinned pages in the user's
buffer.

Instead of tracking one large page array per dreq/iocb, track pages per
nfs_read/write_data, just like the cached I/O path does.  An array for
pages is already allocated for us by nfs_readdata_alloc() (and the write
and commit equivalents).

This is also required for adding support for vectored I/O to the NFS direct
I/O path.

The original reason to pin the user buffer and allocate all the NFS data
structures before trying to schedule I/O was to ensure all needed resources
are allocated on the client before starting to send requests.  This reduces
the chance that resource exhaustion on the client will cause a short read
or write.

On the other hand, for an application making very large application I/O
requests, this means that it will be nearly impossible for the application
to make forward progress on a resource-limited client.

Thus, moving the buffer pinning functionality into the I/O scheduling
loops should be good for scalability.  The next patch will do the same for
NFS data structure allocation.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever
9c93ab7dff NFS: refactor nfs_direct_free_user_pages
Clean-up and fix a minor bug: the logic was dirtying page cache pages on
both read and write operations.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever
51a7bc6cae NFS: remove user_addr, user_count, and pos from nfs_direct_req
Make the user_addr, user_count, and pos parameters explicit to the
scheduler routines, and remove the fields from nfs_direct_req.  The
iovec API will be passing in a series of these, not just one set.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever
fedb595c66 NFS: "open code" the NFS direct write rescheduler
An NFSv3/v4 client must reschedule on-the-wire writes if the writes are
UNSTABLE, and the server reboots before the client can complete a
subsequent COMMIT request.

To support direct asynchronous scatter-gather writes, the write
rescheduler in fs/nfs/direct.c must not depend on the I/O parameters
in the controlling nfs_direct_req structure.  iovecs can be somewhat
arbitrarily complex, so there could be an unbounded amount of information
to save for a rarely encountered requirement.

Refactor the direct write rescheduler so it uses information from each
nfs_write_data structure to reschedule writes, instead of caching that
information in the controlling nfs_direct_req structure.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:38 -04:00
Chuck Lever
b1c5921c5b NFS: Separate functions for counting outstanding NFS direct I/Os
Factor out the logic that increments and decrements the outstanding I/O
count.  This will be a commonly used bit of code in upcoming patches.
Also make this an atomic_t again, since it will be very often manipulated
outside dreq->spin lock.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:38 -04:00
Trond Myklebust
816724e65c Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Conflicts:

	fs/nfs/inode.c
	fs/super.c

Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch
'VFS: Permit filesystem to override root dentry on mount'
2006-06-24 13:07:53 -04:00
Jens Axboe
9e94cd4fd1 [PATCH] splice: retrieve mapping after locking the page
Otherwise we could be racing with truncate/mapping removal.

Problem found/fixed by Nick Piggin <npiggin@suse.de>, logic rewritten
by me.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
b31dc66a54 [PATCH] Kill PF_SYNCWRITE flag
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.

Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Mike Miller
98bd34eaf1 [PATCH] make kernel warn about incorrectly sized partitions
Sometimes partitions claim to be larger than the reported capacity of a
disk device.  This patch makes the kernel warn about those partitions.

We still permit these patitions to be used.  Quoting Andries Brouwer
<Andries.Brouwer@cwi.nl>:

 Case 1: The kernel is mistaken about the size of the disk.  (There are
 commands to clip a disk to a certain capacity, there are jumpers to tell a
 disk that it should report a certain capacity etc.  Usually this is because
 of BIOS bugs.  In bad cases the machine will crash in the BIOS and hence fail
 to boot if the disk reports full capacity.) In such cases actually accessing
 the blocks of the partition may work fine, or may work fine after running an
 unclip utility.  I wrote "setmax" some years ago precisely for this reason.

 Case 2: There was a messy partition table (maybe just a rounding error) but
 the actual filesystem on the partition is contained in the physical disk.
 Now using the filesystem goes without problem.

 Case 3: Both partition and filesystem extend beyond the end of the disk.  In
 forensic or debugging situations one often uses a copy of the start of a
 disk.  Now access beyond the end gives an expected I/O error.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Stephen Cameron <steve.cameron@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:09 -07:00
Jan Kara
78ce89c92b [PATCH] JBD: split checkpoint lists
Split the checkpoint list of the transaction into two lists.  In the first
list we keep the buffers that need to be submitted for IO.  In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete.  This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:08 -07:00
Oleg Nesterov
626ab0e69d [PATCH] list: use list_replace_init() instead of list_splice_init()
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1.  We can use list_replace_init() instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:07 -07:00
Mingming Cao
0216bfcffe [PATCH] percpu counter data type changes to suppport more than 2**31 ext3 free blocks counter
The percpu counter data type are changed in this set of patches to support
more users like ext3 who need more than 32 bit to store the free blocks
total in the filesystem.

- Generic perpcu counters data type changes.  The size of the global counter
  and local counter were explictly specified using s64 and s32.  The global
  counter is changed from long to s64, while the local counter is changed from
  long to s32, so we could avoid doing 64 bit update in most cases.

- Users of the percpu counters are updated to make use of the new
  percpu_counter_init() routine now taking an additional parameter to allow
  users to pass the initial value of the global counter.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:06 -07:00
Jesper Juhl
785d55708c [PATCH] binflt_elf: remove more casts
Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Jesper Juhl
f4e5cc2c44 [PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless casts
Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless
casts of kmalloc() return values in the same file.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Andrew Morton
e6022603b9 [PATCH] ext3_clear_inode(): avoid kfree(NULL)
Steven Rostedt <rostedt@goodmis.org> points out that `rsv' here is usually
NULL, so we should avoid calling kfree().

Also, fix up some nearby whitespace damage.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Andrew Morton
304c4c841a [PATCH] jbd: avoid kfree(NULL)
There are a couple of places where JBD has to check to see whether an unneeded
memory allocation was performed.  Usually it _was_ needed, so we end up
calling kfree(NULL).  We can micro-optimise that by checking the pointer
before calling kfree().

Thanks to Steven Rostedt <rostedt@goodmis.org> for identifying this.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Jan Kara
9ada734098 [PATCH] jbd: fix BUG in journal_commit_transaction()
Fix possible assertion failure in journal_commit_transaction() on
jh->b_next_transaction == NULL (when we are processing BJ_Forget list and
buffer is not jbddirty).

!jbddirty buffers can be placed on BJ_Forget list for example by
journal_forget() or by __dispose_buffer() - generally such buffer means
that it has been freed by this transaction.

Freed buffers should not be reallocated until the transaction has committed
(that's why we have the assertion there) but they *can* be reallocated when
the transaction has already been committed to disk and we are just
processing the BJ_Forget list (as soon as we remove b_committed_data from
the bitmap bh, ext3 will be able to reallocate buffers freed by the
committing transaction).  So we have to also count with the case that the
buffer has been reallocated and b_next_transaction has been already set.

And one more subtle point: it can happen that we manage to reallocate the
buffer and also mark it jbddirty.  Then we also add the freed buffer to the
checkpoint list of the committing trasaction.  But that should do no harm.

Non-jbddirty buffers should be filed to BJ_Reserved and not BJ_Metadata
list.  It can actually happen that we refile such buffers during the commit
phase when we reallocate in the running transaction blocks deleted in
committing transaction (and that can happen if the committing transaction
already wrote all the data and is just cleaning up BJ_Forget list).

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:04 -07:00
Vadim Lobanov
4a4b69f79b [PATCH] Poll cleanups/microoptimizations
The "count" and "pt" variables are declared and modified by do_poll(), as
well as accessed and written indirectly in the do_pollfd() subroutine.

This patch pulls all handling of these variables into the do_poll()
function, thereby eliminating the odd use of indirection in do_pollfd().
This is done by pulling the "struct pollfd" traversal loop from do_pollfd()
into its only caller do_poll().  As an added bonus, the patch saves a few
clock cycles, and also adds comments to make the code easier to follow.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:03 -07:00
Adrian Bunk
2da1326463 [PATCH] fs/fat/misc.c: unexport fat_sync_bhs
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:03 -07:00
Adrian Bunk
b0904e147f [PATCH] fs/locks.c: make posix_locks_deadlock() static
We can now make posix_locks_deadlock() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:03 -07:00
Miklos Szeredi
75e1fcc0b1 [PATCH] vfs: add lock owner argument to flush operation
Pass the POSIX lock owner ID to the flush operation.

This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally.  FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.

Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Miklos Szeredi
ff7b86b820 [PATCH] locks: clean up locks_remove_posix()
locks_remove_posix() can use posix_lock_file() instead of doing the lock
removal by hand.  posix_lock_file() now does exacly the same.

The comment about pids no longer applies, posix_lock_file() takes only the
owner into account.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Miklos Szeredi
39005d022a [PATCH] locks: don't do unnecessary allocations
posix_lock_file() always allocates new locks in advance, even if it's easy to
determine that no allocations will be needed.

Optimize these cases:

 - FL_ACCESS flag is set

 - Unlocking the whole range

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Miklos Szeredi
0d9a490abe [PATCH] locks: don't unnecessarily fail posix lock operations
posix_lock_file() was too cautious, failing operations on OOM, even if they
didn't actually require an allocation.

This has the disadvantage, that a failing unlock on process exit could lead to
a memory leak.  There are two possibilites for this:

- filesystem implements .lock() and calls back to posix_lock_file().  On
cleanup of files_struct locks_remove_posix() is called which should remove all
locks belonging to files_struct.  However if filesystem calls
posix_lock_file() which fails, then those locks will never be freed.

- if a file is closed while a lock is blocked, then after acquiring
fcntl_setlk() will undo the lock.  But this unlock itself might fail on OOM,
again possibly leaking the lock.

The solution is to move the checking of the allocations until after it is sure
that they will be needed.  This will solve the above problem since unlock will
always succeed unless it splits an existing region.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Pekka Enberg
090d2b185d [PATCH] read_mapping_page for address space
Add read_mapping_page() which is used for callers that pass
mapping->a_ops->readpage as the filler for read_cache_page.  This removes
some duplication from filesystem code.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Al Viro
0c426f26cc [PATCH] ext2 XIP won't build without MMU
Disable Ext2 XIP if the kernel is configured in no-MMU mode as the former
won't build.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:55 -07:00
Al Viro
530018bf3d [PATCH] frv: binfmt_elf_fdpic __user annotations
Add __user annotations to binfmt_elf_fdpic.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:54 -07:00
James Morris
03e6806063 [PATCH] lsm: add task_setioprio hook
Implement an LSM hook for setting a task's IO priority, similar to the hook
for setting a tasks's nice value.

A previous version of this LSM hook was included in an older version of
multiadm by Jan Engelhardt, although I don't recall it being submitted
upstream.

Also included is the corresponding SELinux hook, which re-uses the setsched
permission in the proccess class.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:53 -07:00
OGAWA Hirofumi
111ebb6e6f [PATCH] writeback: fix range handling
When a writeback_control's `start' and `end' fields are used to
indicate a one-byte-range starting at file offset zero, the required
values of .start=0,.end=0 mean that the ->writepages() implementation
has no way of telling that it is being asked to perform a range
request.  Because we're currently overloading (start == 0 && end == 0)
to mean "this is not a write-a-range request".

To make all this sane, the patch changes range of writeback_control.

So caller does: If it is calling ->writepages() to write pages, it
sets range (range_start/end or range_cyclic) always.

And if range_cyclic is true, ->writepages() thinks the range is
cyclic, otherwise it just uses range_start and range_end.

This patch does,

    - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
      -1 is usually ok for range_end (type is long long). But, if someone did,

		range_end += val;		range_end is "val - 1"
		u64val = range_end >> bits;	u64val is "~(0ULL)"

      or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
      things, and uses LLONG_MAX for range_end.

    - All callers of ->writepages() sets range_start/end or range_cyclic.

    - Fix updates of ->writeback_index. It seems already bit strange.
      If it starts at 0 and ended by check of nr_to_write, this last
      index may reduce chance to scan end of file.  So, this updates
      ->writeback_index only if range_cyclic is true or whole-file is
      scanned.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:49 -07:00
Chen, Kenneth W
a43a8c39bb [PATCH] tightening hugetlb strict accounting
Current hugetlb strict accounting for shared mapping always assume mapping
starts at zero file offset and reserves pages between zero and size of the
file.  This assumption often reserves (or lock down) a lot more pages then
necessary if application maps at none zero file offset.  libhugetlbfs is
one example that requires proper reservation on shared mapping starts at
none zero offset.

This patch extends the reservation and hugetlb strict accounting to support
any arbitrary pair of (offset, len), resulting a much more robust and
accurate scheme.  More importantly, it won't lock down any hugetlb pages
outside file mapping.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:48 -07:00
KAMEZAWA Hiroyuki
6f0419e06a [PATCH] for_each_possible_cpu: xfs
for_each_cpu() actually iterates across all possible CPUs.  We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs.  This is inefficient and
possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.

This patch replaces for_each_cpu with for_each_possible_cpu.
in xfs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells
d6938d1b27 [PATCH] XFS: Use the dentry passed to statfs() to limit the scope of the results
Enable XFS to limit the statfs() results to the project quota covering the
dentry used as a base for call.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells
726c334223 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells
454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Andrew Morton
d702ccb342 [PATCH] prune_one_dentry() tweaks
- Add description of d_lock handling to comments over prune_one_dentry().

- It has three callsites - uninline it, saving 200 bytes of text.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:57 -07:00
NeilBrown
0feae5c47a [PATCH] Fix dcache race during umount
The race is that the shrink_dcache_memory shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.

If it does, then it will call in to iput on the inode while the dentry is
no longer able to be found by the umounting process.  If iput takes a
while, generic_shutdown_super could get all the way though
shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to touch
it (which some filesystems certainly do), it will lose.  The promised
"Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry on
someone else's dentry.  As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any
such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it is
doing the prune on behalf of so that it can be careful of other
filesystems.  shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon list
out of the superblock, so it can get the s_anon list itself, and can pass
the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where it
is (at the tail of the list) and exits, on the assumption that some other
thread will be removing that dentry soon.  To try to make sure that some
work gets done, a limited number of dnetries which are untouchable are
skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Acked-by: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:57 -07:00
Miklos Szeredi
c89681ed7d [PATCH] remove steal_locks()
This patch removes the steal_locks() function.

steal_locks() doesn't work correctly with any filesystem that does it's own
lock management, including NFS, CIFS, etc.

In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().

The steal_locks() function has an effect on applications doing:

clone(CLONE_FILES)
  /* in child */
  lock
  execve
  lock

POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".

According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.

Apps using NPTL are not affected, since all other threads are killed before
execve.

Apps using LinuxThreads are only affected if they

  - have multiple threads during exec (LinuxThreads doesn't kill other
    threads, the app may do it with pthread_kill_other_threads_np())
  - rely on POSIX locks being inherited across exec

Both conditions are documented, but not their interaction.

Apps using clone() natively are affected if they

  - use clone(CLONE_FILES)
  - rely on POSIX locks being inherited across exec

The above scenarios are unlikely, but possible.

If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.

That would add more complexity though, so this solution seems to be
preferred by most people.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:57 -07:00
OGAWA Hirofumi
09d967c6f3 [PATCH] Fix a race condition between ->i_mapping and iput()
This race became a cause of oops, and can reproduce by the following.

    while true; do
	dd if=/dev/zero of=/dev/.static/dev/hdg1 bs=512 count=1000 & sync
    done

This race condition was between __sync_single_inode() and iput().

          cpu0 (fs's inode)                 cpu1 (bdev's inode)
          -----------------                 -------------------
                                       close("/dev/hda2")
                                       [...]
__sync_single_inode()
   /* copy the bdev's ->i_mapping */
   mapping = inode->i_mapping;

                                       generic_forget_inode()
                                          bdev_clear_inode()
					     /* restre the fs's ->i_mapping */
				             inode->i_mapping = &inode->i_data;
				          /* bdev's inode was freed */
                                          destroy_inode(inode);

   if (wait) {
      /* dereference a freed bdev's mapping->host */
      filemap_fdatawait(mapping);  /* Oops */

Since __sync_single_inode() is only taking a ref-count of fs's inode, the
another process can be close() and freeing the bdev's inode while writing
fs's inode.  So, __sync_signle_inode() accesses the freed ->i_mapping,
oops.

This patch takes a ref-count on the bdev's inode for the fs's inode before
setting a ->i_mapping, and the clear_inode() of the fs's inode does iput() on
the bdev's inode.  So if the fs's inode is still living, bdev's inode
shouldn't be freed.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:57 -07:00
Anton Altaparmakov
f893afbe12 [PATCH] NTFS: Critical bug fix (affects MIPS and possibly others)
Many thanks to Pauline Ng for the detailed bug report and analysis!

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:55 -07:00
Linus Torvalds
52ab3f3dc7 Merge git://oss.sgi.com:8090/xfs-2.6
* git://oss.sgi.com:8090/xfs-2.6: (43 commits)
  [XFS] Remove files from the build that are now unused.
  [XFS] Fix a Makefile issue related to exports.o handling.
  [XFS] Remove version 1 directory code.	Never functioned on Linux, just
  [XFS] Map EFSCORRUPTED to an actual error code, not just a made up one
  [XFS] Kill direct access to ->count in valusema(); all we ever use it for
  [XFS] Remove unneeded conditional code on NFS export interface related
  [XFS] Remove an incorrect use of unlikely() on a relatively likely code
  [XFS] Push some common code out of write path into core XFS code for
  [XFS] Remove unnecessary local from open_exec dmapi path.
  [XFS] Minor XFS documentation updates.
  [XFS] Fix broken const use inside local suffix_strtoul routine.
  [XFS] Fix nused counter.  It's currently getting set to -1 rather than
  [XFS] Fix mismerge of the fs_writable cleanup patch causing a freeze/thaw
  [XFS] Fix up debug code so that bulkstat wont generate thousands of
  [XFS] Remove unused parameter from di2xflags routine.
  [XFS] Cleanup a missed porting conversion, and freezing.
  [XFS] Resolve a namespace collision on remaining vtypes for FreeBSD
  [XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters.
  [XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters.
  [XFS] statvfs component of directory/project quota support, code
  ...
2006-06-21 18:10:19 -07:00