kernel-ark/fs/nfs/nfs4_fs.h
David Howells 54ceac4515 NFS: Share NFS superblocks per-protocol per-server per-FSID
The attached patch makes NFS share superblocks between mounts from the same
server and FSID over the same protocol.

It does this by creating each superblock with a false root and returning the
real root dentry in the vfsmount presented by get_sb(). The root dentry set
starts off as an anonymous dentry if we don't already have the dentry for its
inode, otherwise it simply returns the dentry we already have.

We may thus end up with several trees of dentries in the superblock, and if at
some later point one of anonymous tree roots is discovered by normal filesystem
activity to be located in another tree within the superblock, the anonymous
root is named and materialises attached to the second tree at the appropriate
point.

Why do it this way? Why not pass an extra argument to the mount() syscall to
indicate the subpath and then pathwalk from the server root to the desired
directory? You can't guarantee this will work for two reasons:

 (1) The root and intervening nodes may not be accessible to the client.

     With NFS2 and NFS3, for instance, mountd is called on the server to get
     the filehandle for the tip of a path. mountd won't give us handles for
     anything we don't have permission to access, and so we can't set up NFS
     inodes for such nodes, and so can't easily set up dentries (we'd have to
     have ghost inodes or something).

     With this patch we don't actually create dentries until we get handles
     from the server that we can use to set up their inodes, and we don't
     actually bind them into the tree until we know for sure where they go.

 (2) Inaccessible symbolic links.

     If we're asked to mount two exports from the server, eg:

	mount warthog:/warthog/aaa/xxx /mmm
	mount warthog:/warthog/bbb/yyy /nnn

     We may not be able to access anything nearer the root than xxx and yyy,
     but we may find out later that /mmm/www/yyy, say, is actually the same
     directory as the one mounted on /nnn. What we might then find out, for
     example, is that /warthog/bbb was actually a symbolic link to
     /warthog/aaa/xxx/www, but we can't actually determine that by talking to
     the server until /warthog is made available by NFS.

     This would lead to having constructed an errneous dentry tree which we
     can't easily fix. We can end up with a dentry marked as a directory when
     it should actually be a symlink, or we could end up with an apparently
     hardlinked directory.

     With this patch we need not make assumptions about the type of a dentry
     for which we can't retrieve information, nor need we assume we know its
     place in the grand scheme of things until we actually see that place.

This patch reduces the possibility of aliasing in the inode and page caches for
inodes that may be accessed by more than one NFS export. It also reduces the
number of superblocks required for NFS where there are many NFS exports being
used from a server (home directory server + autofs for example).

This in turn makes it simpler to do local caching of network filesystems, as it
can then be guaranteed that there won't be links from multiple inodes in
separate superblocks to the same cache file.

Obviously, cache aliasing between different levels of NFS protocol could still
be a problem, but at least that gives us another key to use when indexing the
cache.

This patch makes the following changes:

 (1) The server record construction/destruction has been abstracted out into
     its own set of functions to make things easier to get right.  These have
     been moved into fs/nfs/client.c.

     All the code in fs/nfs/client.c has to do with the management of
     connections to servers, and doesn't touch superblocks in any way; the
     remaining code in fs/nfs/super.c has to do with VFS superblock management.

 (2) The sequence of events undertaken by NFS mount is now reordered:

     (a) A volume representation (struct nfs_server) is allocated.

     (b) A server representation (struct nfs_client) is acquired.  This may be
     	 allocated or shared, and is keyed on server address, port and NFS
     	 version.

     (c) If allocated, the client representation is initialised.  The state
     	 member variable of nfs_client is used to prevent a race during
     	 initialisation from two mounts.

     (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find
     	 the root filehandle for the mount (fs/nfs/getroot.c).  For NFS2/3 we
     	 are given the root FH in advance.

     (e) The volume FSID is probed for on the root FH.

     (f) The volume representation is initialised from the FSINFO record
     	 retrieved on the root FH.

     (g) sget() is called to acquire a superblock.  This may be allocated or
     	 shared, keyed on client pointer and FSID.

     (h) If allocated, the superblock is initialised.

     (i) If the superblock is shared, then the new nfs_server record is
     	 discarded.

     (j) The root dentry for this mount is looked up from the root FH.

     (k) The root dentry for this mount is assigned to the vfsmount.

 (3) nfs_readdir_lookup() creates dentries for each of the entries readdir()
     returns; this function now attaches disconnected trees from alternate
     roots that happen to be discovered attached to a directory being read (in
     the same way nfs_lookup() is made to do for lookup ops).

     The new d_materialise_unique() function is now used to do this, thus
     permitting the whole thing to be done under one set of locks, and thus
     avoiding any race between mount and lookup operations on the same
     directory.

 (4) The client management code uses a new debug facility: NFSDBG_CLIENT which
     is set by echoing 1024 to /proc/net/sunrpc/nfs_debug.

 (5) Clone mounts are now called xdev mounts.

 (6) Use the dentry passed to the statfs() op as the handle for retrieving fs
     statistics rather than the root dentry of the superblock (which is now a
     dummy).

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:37 -04:00

229 lines
7.6 KiB
C

/*
* linux/fs/nfs/nfs4_fs.h
*
* Copyright (C) 2005 Trond Myklebust
*
* NFSv4-specific filesystem definitions and declarations
*/
#ifndef __LINUX_FS_NFS_NFS4_FS_H
#define __LINUX_FS_NFS_NFS4_FS_H
#ifdef CONFIG_NFS_V4
struct idmap;
/*
* In a seqid-mutating op, this macro controls which error return
* values trigger incrementation of the seqid.
*
* from rfc 3010:
* The client MUST monotonically increment the sequence number for the
* CLOSE, LOCK, LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE
* operations. This is true even in the event that the previous
* operation that used the sequence number received an error. The only
* exception to this rule is if the previous operation received one of
* the following errors: NFSERR_STALE_CLIENTID, NFSERR_STALE_STATEID,
* NFSERR_BAD_STATEID, NFSERR_BAD_SEQID, NFSERR_BADXDR,
* NFSERR_RESOURCE, NFSERR_NOFILEHANDLE.
*
*/
#define seqid_mutating_err(err) \
(((err) != NFSERR_STALE_CLIENTID) && \
((err) != NFSERR_STALE_STATEID) && \
((err) != NFSERR_BAD_STATEID) && \
((err) != NFSERR_BAD_SEQID) && \
((err) != NFSERR_BAD_XDR) && \
((err) != NFSERR_RESOURCE) && \
((err) != NFSERR_NOFILEHANDLE))
enum nfs4_client_state {
NFS4CLNT_STATE_RECOVER = 0,
NFS4CLNT_LEASE_EXPIRED,
};
/*
* struct rpc_sequence ensures that RPC calls are sent in the exact
* order that they appear on the list.
*/
struct rpc_sequence {
struct rpc_wait_queue wait; /* RPC call delay queue */
spinlock_t lock; /* Protects the list */
struct list_head list; /* Defines sequence of RPC calls */
};
#define NFS_SEQID_CONFIRMED 1
struct nfs_seqid_counter {
struct rpc_sequence *sequence;
int flags;
u32 counter;
};
struct nfs_seqid {
struct nfs_seqid_counter *sequence;
struct list_head list;
};
static inline void nfs_confirm_seqid(struct nfs_seqid_counter *seqid, int status)
{
if (seqid_mutating_err(-status))
seqid->flags |= NFS_SEQID_CONFIRMED;
}
/*
* NFS4 state_owners and lock_owners are simply labels for ordered
* sequences of RPC calls. Their sole purpose is to provide once-only
* semantics by allowing the server to identify replayed requests.
*/
struct nfs4_state_owner {
spinlock_t so_lock;
struct list_head so_list; /* per-clientid list of state_owners */
struct nfs_client *so_client;
u32 so_id; /* 32-bit identifier, unique */
atomic_t so_count;
struct rpc_cred *so_cred; /* Associated cred */
struct list_head so_states;
struct list_head so_delegations;
struct nfs_seqid_counter so_seqid;
struct rpc_sequence so_sequence;
};
/*
* struct nfs4_state maintains the client-side state for a given
* (state_owner,inode) tuple (OPEN) or state_owner (LOCK).
*
* OPEN:
* In order to know when to OPEN_DOWNGRADE or CLOSE the state on the server,
* we need to know how many files are open for reading or writing on a
* given inode. This information too is stored here.
*
* LOCK: one nfs4_state (LOCK) to hold the lock stateid nfs4_state(OPEN)
*/
struct nfs4_lock_state {
struct list_head ls_locks; /* Other lock stateids */
struct nfs4_state * ls_state; /* Pointer to open state */
fl_owner_t ls_owner; /* POSIX lock owner */
#define NFS_LOCK_INITIALIZED 1
int ls_flags;
struct nfs_seqid_counter ls_seqid;
u32 ls_id;
nfs4_stateid ls_stateid;
atomic_t ls_count;
};
/* bits for nfs4_state->flags */
enum {
LK_STATE_IN_USE,
NFS_DELEGATED_STATE,
};
struct nfs4_state {
struct list_head open_states; /* List of states for the same state_owner */
struct list_head inode_states; /* List of states for the same inode */
struct list_head lock_states; /* List of subservient lock stateids */
struct nfs4_state_owner *owner; /* Pointer to the open owner */
struct inode *inode; /* Pointer to the inode */
unsigned long flags; /* Do we hold any locks? */
spinlock_t state_lock; /* Protects the lock_states list */
nfs4_stateid stateid;
unsigned int n_rdonly;
unsigned int n_wronly;
unsigned int n_rdwr;
int state; /* State on the server (R,W, or RW) */
atomic_t count;
};
struct nfs4_exception {
long timeout;
int retry;
};
struct nfs4_state_recovery_ops {
int (*recover_open)(struct nfs4_state_owner *, struct nfs4_state *);
int (*recover_lock)(struct nfs4_state *, struct file_lock *);
};
extern struct dentry_operations nfs4_dentry_operations;
extern struct inode_operations nfs4_dir_inode_operations;
/* inode.c */
extern ssize_t nfs4_getxattr(struct dentry *, const char *, void *, size_t);
extern int nfs4_setxattr(struct dentry *, const char *, const void *, size_t, int);
extern ssize_t nfs4_listxattr(struct dentry *, char *, size_t);
/* nfs4proc.c */
extern int nfs4_map_errors(int err);
extern int nfs4_proc_setclientid(struct nfs_client *, u32, unsigned short, struct rpc_cred *);
extern int nfs4_proc_setclientid_confirm(struct nfs_client *, struct rpc_cred *);
extern int nfs4_proc_async_renew(struct nfs_client *, struct rpc_cred *);
extern int nfs4_proc_renew(struct nfs_client *, struct rpc_cred *);
extern int nfs4_do_close(struct inode *inode, struct nfs4_state *state);
extern struct dentry *nfs4_atomic_open(struct inode *, struct dentry *, struct nameidata *);
extern int nfs4_open_revalidate(struct inode *, struct dentry *, int, struct nameidata *);
extern int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle);
extern int nfs4_proc_fs_locations(struct inode *dir, struct dentry *dentry,
struct nfs4_fs_locations *fs_locations, struct page *page);
extern struct nfs4_state_recovery_ops nfs4_reboot_recovery_ops;
extern struct nfs4_state_recovery_ops nfs4_network_partition_recovery_ops;
extern const u32 nfs4_fattr_bitmap[2];
extern const u32 nfs4_statfs_bitmap[2];
extern const u32 nfs4_pathconf_bitmap[2];
extern const u32 nfs4_fsinfo_bitmap[2];
extern const u32 nfs4_fs_locations_bitmap[2];
/* nfs4renewd.c */
extern void nfs4_schedule_state_renewal(struct nfs_client *);
extern void nfs4_renewd_prepare_shutdown(struct nfs_server *);
extern void nfs4_kill_renewd(struct nfs_client *);
extern void nfs4_renew_state(void *);
/* nfs4state.c */
struct rpc_cred *nfs4_get_renew_cred(struct nfs_client *clp);
extern u32 nfs4_alloc_lockowner_id(struct nfs_client *);
extern struct nfs4_state_owner * nfs4_get_state_owner(struct nfs_server *, struct rpc_cred *);
extern void nfs4_put_state_owner(struct nfs4_state_owner *);
extern void nfs4_drop_state_owner(struct nfs4_state_owner *);
extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
extern void nfs4_put_open_state(struct nfs4_state *);
extern void nfs4_close_state(struct nfs4_state *, mode_t);
extern void nfs4_state_set_mode_locked(struct nfs4_state *, mode_t);
extern void nfs4_schedule_state_recovery(struct nfs_client *);
extern void nfs4_put_lock_state(struct nfs4_lock_state *lsp);
extern int nfs4_set_lock_state(struct nfs4_state *state, struct file_lock *fl);
extern void nfs4_copy_stateid(nfs4_stateid *, struct nfs4_state *, fl_owner_t);
extern struct nfs_seqid *nfs_alloc_seqid(struct nfs_seqid_counter *counter);
extern int nfs_wait_on_sequence(struct nfs_seqid *seqid, struct rpc_task *task);
extern void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid);
extern void nfs_increment_lock_seqid(int status, struct nfs_seqid *seqid);
extern void nfs_free_seqid(struct nfs_seqid *seqid);
extern const nfs4_stateid zero_stateid;
/* nfs4xdr.c */
extern uint32_t *nfs4_decode_dirent(uint32_t *p, struct nfs_entry *entry, int plus);
extern struct rpc_procinfo nfs4_procedures[];
struct nfs4_mount_data;
/* callback_xdr.c */
extern struct svc_version nfs4_callback_version1;
#else
#define nfs4_close_state(a, b) do { } while (0)
#endif /* CONFIG_NFS_V4 */
#endif /* __LINUX_FS_NFS_NFS4_FS.H */