Down the road we want to eliminate the use of the global kernel lock entirely
from the NFS client. To do this, we need to protect the fields in the
nfs_inode structure adequately. Start by serializing updates to the
"cache_validity" field.
Note this change addresses an SMP hang found by njw@osdl.org, where processes
deadlock because nfs_end_data_update and nfs_revalidate_mapping update the
"cache_validity" field without proper serialization.
Test plan:
Millions of fsx ops on SMP clients. Run Nick Wilson's breaknfs program on
large SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce atomic bitops to manipulate the bits in the nfs_inode structure's
"flags" field.
Using bitops means we can use a generic wait_on_bit call instead of an ad hoc
locking scheme in fs/nfs/inode.c, so we can remove the "nfs_i_wait" field from
nfs_inode at the same time.
The other new flags field will continue to use bitmask and logic AND and OR.
This permits several flags to be set at the same time efficiently. The
following patch adds a spin lock to protect these flags, and this spin lock
will later cover other fields in the nfs_inode structure, amortizing the cost
of using this type of serialization.
Test plan:
Millions of fsx ops on SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Certain bits in nfsi->flags can be manipulated with atomic bitops, and some
are better manipulated via logical bitmask operations.
This patch splits the flags field into two. The next patch introduces atomic
bitops for one of the fields.
Test plan:
Millions of fsx ops on SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The nfsd holds the big kernel lock upon exit, when it really shouldn't.
Not to mention that this breaks Ingo's RT patch. This is a trivial fix
to release the lock.
Ingo, this patch also works with your kernel, and stops the problem with
nfsd.
Note, there's a "goto out;" where "out:" is right above svc_exit_thread.
The point of the goto also holds the kernel_lock, so I don't see any
problem here in releasing it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When the client performs an exclusive create and opens the file for writing,
a Netapp filer will first create the file using the mode 01777. It does this
since an NFSv3/v4 exclusive create cannot immediately set the mode bits.
The 01777 mode then gets put into the inode->i_mode. After the file creation
is successful, we then do a setattr to change the mode to the correct value
(as per the NFS spec).
The problem is that nfs_refresh_inode() no longer updates inode->i_mode, so
the latter retains the 01777 mode. A bit later, the VFS notices this, and calls
remove_suid(). This of course now resets the file mode to inode->i_mode & 0777.
Hey presto, the file mode on the server is now magically changed to 0777. Duh...
Fixes http://bugzilla.linux-nfs.org/show_bug.cgi?id=32
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
the buffers when mapping them after the VM had discarded them.
Thanks to Martin MOKREJŠ for the bug report.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
This adds a MOVE_SELF event to inotify. It is sent whenever the inode
you are watching is moved. We need this event so that we can catch
something like this:
- app1:
watch /etc/mtab
- app2:
cp /etc/mtab /tmp/mtab-work
mv /etc/mtab /etc/mtab~
mv /tmp/mtab-work /etc/mtab
app1 still thinks it's watching /etc/mtab but it's actually watching
/etc/mtab~.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We are saving the wrong thing in ->last_wd. We want the wd, not the
return value.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix path name conversion for long filenames when mapchars mount option
was specified at mount time.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix missing entries in search results when very long file names and more
than 50 (or so) of such long search entries in the directory.
FindNext could send corrupt last byte of resume name when resume key was
a few hundred bytes long file name or longer.
Fixes Samba Bug # 2932
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initialize key object ID in inode so that we don't try to remove the inode
when we fail on some checks even before we manage to allocate something.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
TxAnchor.anon_list is protected by jfsTxnLock (TXN_LOCK), but there was
a place in txLock() that was removing an entry from the list without holding
the spinlock.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
The patch below unhooks fsnotify from vfs_unlink & vfs_rmdir. It
introduces two new fsnotify calls, that are hooked in at the dcache
level. This not only more closely matches how the VFS layer works, it
also avoids the problem with locking and inode lifetimes.
The two functions are
- fsnotify_nameremove -- called when a directory entry is going away.
It notifies the PARENT of the deletion. This is called from
d_delete().
- inoderemove -- called when the files inode itself is going away. It
notifies the inode that is being deleted. This is called from
dentry_iput().
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm resending this patch, because I still believe it's the correct fix.
Tested before/after applying the patch with a test application
available from:
http://www.inf.bme.hu/~mszeredi/nstest.c
Bind mount from a foreign namespace results in an un-removable mount.
The reason is that mnt->mnt_namespace is copied from the old mount in
clone_mnt(). Because of this check_mnt() in sys_umount() will fail.
The solution is to set mnt->mnt_namespace to current->namespace in
clone_mnt(). clone_mnt() is either called from do_loopback() or
copy_tree(). copy_tree() is called from do_loopback() or
copy_namespace().
When called (directly or indirectly) from do_loopback(), always
current->namspace is being modified: check_mnt(nd->mnt). So setting
mnt->mnt_namespace to current->namspace is the right thing to do.
When called from copy_namespace(), the setting of mnt_namespace is
irrelevant, since mnt_namespace is reset later in that function for
all copied mounts.
Jamie said:
This patch is correct. The old code was buggy for more fundamental and
serious reason: it broke the invariant that a tree of vfsmnts all have the
same value of mnt_namespace (and the same for the mnt_list list).
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Jamie Lokier <jamie@shareable.org>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This uses the new deflateBound() thing to sanity-check the input to the
zlib decompressor before we even bother to start reading in the blocks.
Problem noted by Tim Yamin <plasmaroo@gentoo.org>
This avoids the whole #ifdef mess by just getting a copy of
dentry->d_inode before d_delete is called - that makes the codepaths the
same for the INOTIFY/DNOTIFY cases as for the regular no-notify case.
I've been running this under a Gnome session for the last 10 minutes.
Inotify is being used extensively.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The included patch fixes a problem where a inotify client would receive a
delete event before the file was actually deleted. The bug affects both
dnotify & inotify.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The inotify help text still refers to the character device. Update it.
Fixes kernel bug #4993.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If there was a read error, the bnode might miss some pages, so skip them.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If inode size hasn't changed, don't do anything further in truncate, which
also prevents a dirty inode, what might upset some readonly devices quite
badly.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some error paths may iput an invalid inode with i_nlink=0. jfs should
not try to actually delete such an inode.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
When you rm a watch, an IN_IGNORED event is sent down the event queue
with the watch descriptor that you just rm'd.
If you then add a watch you could get the ignored watch's wd and if you
haven't read the entire event queue, user space will think that it's
newly created watch was just ignored.
To avoid this problem we just use idr_get_new_above instead of
idr_get_new.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a file is moved over an existing file that you are watching,
inotify won't send you a DELETE_SELF event and it won't unref the inode
until the inotify instance is closed by the application.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o sysfs_dirent's s_mode field should also be updated in sysfs_setattr(), else
there could be inconsistency in the two fields. s_mode is used while
->readdir so as not to bring in the inode to cache.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o sysfs_chmod_file() must update the new iattr field in sysfs_dirent else
the mode change will not be persistent in case of inode evacuation from
cache.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Actually implement the hostfs "sync" method.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix bug introduced in 2.6.11-rc2: when we clone a BIO we need to copy over the
current index into it as well.
It corrupts data with some MD setups.
See http://bugzilla.kernel.org/show_bug.cgi?id=4946
Huuuuuuuuge thanks to Matthew Stapleton <matthew4196@gmail.com> for doggedly
chasing this one down.
Acked-by: Jens Axboe <axboe@suse.de>
Cc: <linux-raid@vger.kernel.org>
Cc: <dm-devel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
`gcc -W' likes to complain if the static keyword is not at the beginning of
the declaration. This patch fixes all remaining occurrences of "inline
static" up with "static inline" in the entire kernel tree (140 occurrences in
47 files).
While making this change I came across a few lines with trailing whitespace
that I also fixed up, I have also added or removed a blank line or two here
and there, but there are no functional changes in the patch.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wno-def was added to global CFLAGS
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The cache parameter to mb_cache_shrink isn't used. We may as well remove
it.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I believe that there is a problem with the handling of POSIX locks, which
the attached patch should address.
The problem appears to be a race between fcntl(2) and close(2). A
multithreaded application could close a file descriptor at the same time as
it is trying to acquire a lock using the same file descriptor. I would
suggest that that multithreaded application is not providing the proper
synchronization for itself, but the OS should still behave correctly.
SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when
a file descriptor is closed, that all POSIX locks on the file, owned by the
process which closed the file descriptor, should be released.
The trick here is when those locks are released. The current code releases
all locks which exist when close is processing, but any locks in progress
are handled when the last reference to the open file is released.
There are three cases to consider.
One is the simple case, a multithreaded (mt) process has a file open and
races to close it and acquire a lock on it. In this case, the close will
release one reference to the open file and when the fcntl is done, it will
release the other reference. For this situation, no locks should exist on
the file when both the close and fcntl operations are done. The current
system will handle this case because the last reference to the open file is
being released.
The second case is when the mt process has dup(2)'d the file descriptor.
The close will release one reference to the file and the fcntl, when done,
will release another, but there will still be at least one more reference
to the open file. One could argue that the existence of a lock on the file
after the close has completed is okay, because it was acquired after the
close operation and there is still a way for the application to release the
lock on the file, using an existing file descriptor.
The third case is when the mt process has forked, after opening the file
and either before or after becoming an mt process. In this case, each
process would hold a reference to the open file. For each process, this
degenerates to first case above. However, the lock continues to exist
until both processes have released their references to the open file. This
lock could block other lock requests.
The changes to release the lock when the last reference to the open file
aren't quite right because they would allow the lock to exist as long as
there was a reference to the open file. This is too long.
The new proposed solution is to add support in the fcntl code path to
detect a race with close and then to release the lock which was just
acquired when such as race is detected. This causes locks to be released
in a timely fashion and for the system to conform to the POSIX semantic
specification.
This was tested by instrumenting a kernel to detect the handling locks and
then running a program which generates case #3 above. A dangling lock
could be reliably generated. When the changes to detect the close/fcntl
race were added, a dangling lock could no longer be generated.
Cc: Matthew Wilcox <willy@debian.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oliver Paukstadt from our test department is testing the xip patches in
Linus' git-tree. He found a problem that shows when reading a file that
contains sparse blocks (holes) on a -o xip mounted ext2 filesystem: the
BUG_ON() in fs/ext2/xip.c:40 triggers where it should not. The problem was
introduced by a cleanup in my previous patch, this patch fixes it.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the automount daemon receives a signal which causes it to sumarily
terminate the autofs4 module leaks dentries. The same problem exists with
detached mount requests without the warning.
This patch cleans these dentries at umount.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We must drop references to quota structures before releasing the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We must drop references to quota structures before releasing the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs_new_inode() can call iput() with the xattr lock held. This will
cause a deadlock to occur when reiserfs_delete_xattrs() is called to clean
up.
The following patch releases the lock and reacquires it after the iput.
This is safe because interaction with xattrs is complete, and the relock is
just to balance out the release in the caller.
The locking needs some reworking to be more sane, but that's more intrusive
and I was just looking to fix this bug.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here's a patch to fix a missing refrigerator call in jffs2.
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Under heavy load, hot metadata pages are often locked by non-committed
transactions, making them difficult to flush to disk. This prevents
the sync point from advancing past a transaction that had modified the
page.
There is a point during the sync barrier processing where all
outstanding transactions have been committed to disk, but no new
transaction have been allowed to proceed. This is the best time
to write the metadata.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>