Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Stephen Rothwell reports:
Today's linux-next build (powerpc ppc64_defconfig) failed like this:
fs/built-in.o: In function `.nfs_get_client':
client.c:(.text+0x115010): undefined reference to `.__ipv6_addr_type'
Fix by moving the IPV6 specific parts of commit
d7371c41b0 ("Bug 11061, NFS mounts dropped")
into the '#ifdef IPV6..." section.
Also fix up a couple of formatting issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=11061
sockaddr structures can't be reliably compared using memcmp() because
there are padding bytes in the structure which can't be guaranteed to
be the same even when the sockaddr structures refer to the same
socket. Instead compare all the relevant fields. In the case of IPv6
sin6_flowinfo is not compared because it only affects QoS and
sin6_scope_id is only compared if the address is "link local" because
"link local" addresses need only be unique to a specific link.
Signed-off-by: Ian Dall <ian@beware.dropbear.id.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hi Trond,
I have been looking at a bugreport where trying to open applications on KDE
on a NFS mounted home fails temporarily. There have been multiple reports on
different kernel versions pointing to this common issue:
http://bugzilla.kernel.org/show_bug.cgi?id=12557https://bugs.launchpad.net/ubuntu/+source/linux/+bug/269954http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508866.html
This issue can be reproducible consistently by doing this on a NFS mounted
home (KDE):
1. Open 2 xterm sessions
2. From one of the xterm session, do "ssh -X <remote host>"
3. "stat ~/.Xauthority" on the remote SSH session
4. Close the two xterm sessions
5. On the server do a "stat ~/.Xauthority"
6. Now on the client, try to open xterm
This will fail.
Even if the filehandle had become stale, the NFS client should invalidate
the cache/inode and should repeat LOOKUP. Looking at the packet capture when
the failure occurs shows that there were two subsequent ACCESS() calls with
the same filehandle and both fails with -ESTALE error.
I have tested the fix below. Now the client issue a LOOKUP after the
ACCESS() call fails with -ESTALE. If all this makes sense to you, can you
consider this for inclusion?
Thanks,
If the server returns an -ESTALE error due to stale filehandle in response to
an ACCESS() call, we need to invalidate the cache and inode so that LOOKUP()
can be retried. Without this change, the nfs client retries ACCESS() with the
same filehandle, fails again and could lead to temporary failure of
applications running on nfs mounted home.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix a memory leak due to allocation in the XDR layer. In cases where the
RPC call needs to be retransmitted, we end up allocating new pages without
clearing the old ones. Fix this by moving the allocation into
nfs3_proc_setacls().
Also fix an issue discovered by Kevin Rudd, whereby the amount of memory
reserved for the acls in the xdr_buf->head was miscalculated, and causing
corruption.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The changeset ea31a4437c (nfs: Fix
misparsing of nfsv4 fs_locations attribute) causes the mountpath that is
calculated at the beginning of try_location() to be clobbered when we
later strncpy a non-nul terminated hostname using an incorrect buffer
length.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.
The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.
Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).
This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).
[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nfs4_map_errors() can become static.
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds client-side support to allow for callbacks other than
AUTH_SYS.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3 call sites look at hdr.status before returning success.
hdr.status must be zero in this case so there's no point in this.
Currently, hdr.status is correctly processed at decode_op_hdr time
if the op status cannot be decoded.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When there are no op replies encoded in the compound reply
hdr.status still contains the overall status of the compound
rpc. This can happen, e.g., when the server returns a
NFS4ERR_MINOR_VERS_MISMATCH error.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Marten Gajda <marten.gajda@fernuni-hagen.de> states:
I tracked the problem down to the function nfs4_do_open_expired.
Within this function _nfs4_open_expired is called and may return
-NFS4ERR_DELAY. When a further call to _nfs4_open_expired is
executed and does not return -NFS4ERR_DELAY the "exception.retry"
variable is not reset to 0, causing the loop to iterate again
(and as long as err != -NFS4ERR_DELAY, probably forever)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hi.
I've been looking at a bugzilla which describes a problem where
a customer was advised to use either the "noac" or "actimeo=0"
mount options to solve a consistency problem that they were
seeing in the file attributes. It turned out that this solution
did not work reliably for them because sometimes, the local
attribute cache was believed to be valid and not timed out.
(With an attribute cache timeout of 0, the cache should always
appear to be timed out.)
In looking at this situation, it appears to me that the problem
is that the attribute cache timeout code has an off-by-one
error in it. It is assuming that the cache is valid in the
region, [read_cache_jiffies, read_cache_jiffies + attrtimeo]. The
cache should be considered valid only in the region,
[read_cache_jiffies, read_cache_jiffies + attrtimeo). With this
change, the options, "noac" and "actimeo=0", work as originally
expected.
This problem was previously addressed by special casing the
attrtimeo == 0 case. However, since the problem is only an off-
by-one error, the cleaner solution is address the off-by-one
error and thus, not require the special case.
Thanx...
ps
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This ensures that we don't have to look up the dentry again after we return
the delegation if we know that the directory didn't change.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, the callback server is listening on IPv6 if it is enabled. This
means that IPv4 addresses will always be mapped.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the client is not using a delegation, the right thing to do is to return
it as soon as possible. This helps reduce the amount of state the server
has to track, as well as reducing the potential for conflicts with other
clients.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Let the actual delegreturn stuff be run in the state manager thread rather
than allocating a separate kthread.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We really shouldn't be resetting the sequence ids when doing state
expiration recovery, since we don't know if the server still remembers our
previous state owners. There are servers out there that do attempt to
preserve client state even if the lease has expired. Such a server would
only release that state if a conflicting OPEN request occurs.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a delegation cleanup phase to the state management loop, and do the
NFS4ERR_CB_PATH_DOWN recovery there.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a flag to mark delegations as requiring return, then run a garbage
collector. In the future, this will allow for more flexible delegation
management, where delegations may be marked for return if it turns out
that they are not being referenced.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4 defines a number of state errors which the client does not currently
handle. Among those we should worry about are:
NFS4ERR_ADMIN_REVOKED - the server's administrator revoked our locks
and/or delegations.
NFS4ERR_BAD_STATEID - the client and server are out of sync, possibly
due to a delegation return racing with an OPEN
request.
NFS4ERR_OPENMODE - the client attempted to do something not sanctioned
by the open mode of the stateid. Should normally just
occur as a result of a delegation return race.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we're using the flags to indicate state that needs to be
recovered, as well as having implemented proper refcounting and spinlocking
on the state and open_owners, we can get rid of nfs_client->cl_sem. The
only remaining case that was dubious was the file locking, and that case is
now covered by the nfsi->rwsem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The unlock path is currently failing to take the nfs_client->cl_sem read
lock, and hence the recovery path may see locks disappear from underneath
it.
Also ensure that it takes the nfs_inode->rwsem read lock so that it there
is no conflict with delegation recalls.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the client for some reason is not able to recover all its state within
the time allotted for the grace period, and the server reboots again, the
client is not allowed to recover the state that was 'lost' using reboot
recovery.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Without an extra lock, we cannot just assume that the delegation->inode is
valid when we're traversing the rcu-protected nfs_client lists. Use the
delegation->lock to ensure that it is truly valid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When we can update_open_stateid(), we need to be certain that we don't
race with a delegation return. While we could do this by grabbing the
nfs_client->cl_lock, a dedicated spin lock in the delegation structure
will scale better.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport. The kernel's lockd client should use an
unprivileged port in this case as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>