This adds the superblock as a key for glock lookups. Since the glocks
are already stored in a per-superblock table, this has no effect at
the moment. Later on this will change though.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We can take advantage of the slab allocator to ensure that all the list
heads and the spinlock (plus one or two other fields) are initialised
by slab to speed up allocation of glocks.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Remove the unused sync feature from glocks. This is currently done by
calling the required functions to sync pages/blocks directly so this
code isn't needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
For all the usual reasons of enforcing correctness and potentially
reducing code size, this patch makes the glock operations const.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch allows the simultaneous mounting of gfs2meta and gfs2
filesystems. A restriction however is that a gfs2meta fs may only be
mounted if its corresponding gfs2 filesystem is also mounted. Also, a
gfs2 filesystem cannot be unmounted before its gfs2meta filesystem.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
log_refund() incorrectly assumed that if a transaction had been touched, it
always committed buffers to the incore log. Thus, when you got around to
flushing the log, you would need one more block than you committed, to account
for the header. So it automatically set reserved to 1, which had the effect of
making sdp->sd_log_blks_reserved one greater when you got to gfs2_log_flush().
However, if you don't actually commit anything to the incore log between
flushes, you don't need the header, because you aren't writing anything out.
With this patch, log_refund() only increments reservered to account for the
header if something has been committed since the last flush.
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I noticed the gfs2_scand seemed to be taking a lot of CPU,
so in order to cut that down a bit, here is a patch. Firstly
the type of a glock is a constant during its lifetime, so that
its possible to check this without needing locking. I've moved
the (common) case of testing for an inode glock outside of
the glmutex lock.
Also there was a mutex left over from when the glock cache was
master of the inode cache. That isn't required any more so I've
removed that too.
There is probably scope for further speed ups in the future
in this area.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This should clarify the logic in gfs2_releasepage() relating to
error handling as well as making the response to errors a bit
more graceful.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
A list_del should have been a list_del_init in lops.c which was
resulting in incorrect status returns from list_empty().
Signed-off-by: Steven Whitheouse <swhiteho@redhat.com>
This fixes a memory leak of struct gfs2_bufdata and also some
problems in the ordered write handling code. It needs a bit
more testing, but I believe that the reference counting of
ordered write buffers should now be correct.
This is aimed at fixing Red Hat bugzilla: #201028 and #201082
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
recovery.c add a brelse to deal with gfs2_replay_read_block being called
twice on the same block.
add a dput to drop the ref count on the root inode.
This was causing lingering glocks and thus causing
a mount failure to hang.
Fix a endian conversion macro that was was swizzling
16bits when it should have been swizzling 32.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In some cases we can enter write page without there being buffers
attached to the page. In this case the function to add gfs2_bufdata
to the buffers fails sliently causing further failures down the
stack.
This fix ensures that we always add buffers in writepage if they
didn't already exist (mmap is one way to trigger this).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When the result of a posix lock request is read it needs to be matched up
with the correct waiting request. The owner field needs to be used in the
comparison since more than one process may be waiting for locks on the
same file.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use the gfs2_ prefix on the register/unregister functions for the lock
modules. The gfs_ prefix was left from an old idea on how to share these
with gfs1.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Mmapped files were able to trigger a lock ordering bug. Private
maps do not need to take the glock so early on. Shared maps do
unfortunately, however we can get around that by adding a flag
into the flags for the struct gfs2_file. This only works because
we are taking an exclusive lock at this point, so we know that
nobody else can be racing with us.
Fixes Red Hat bugzilla: #201196
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This was a nasty bug which resulted in corruption of hash tables
in the directory code with larger directories. We forgot to
increment a pointer in the read/write routines internal to the
directory code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We need to use fl_owner instead of fl_pid to track the owner of a posix
lock. Pass the owner value out to user space where cluster plocks are
managed.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tidy up some files and remove an unused routine in meta_io.h. Also
added a bit of extra debugging in meta_io.h.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This means that we don't need to create a special inode just to contain
a struct address_space in order to read a single disk block. Instead
we read the disk block directly. Its slightly faster, and uses slightly
less memory, but the real reason for doing this is that it removes a
special case from the glock code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The remaining routines in page.c were all only used in one other
file, so they are now moved into the files where they are referenced
and made static. Thus page.[ch] are no longer required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tidy up gfs2_unstuffer_page by:
a) Moving it into bmap.c
b) Making it static
c) Calling it directly from gfs2_unstuff_dinode
d) Updating all callers of gfs2_unstuff_dinode due to one less
required argument.
It doesn't change the behaviour at all.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per comments received, alter the GFS2 direct I/O path so that
it uses the standard read functions "out of the box". Needs a
small change to one of the VFS functions. This reduces the size
of the code quite a lot and also removes the need for one new export.
Some more work remains to be done, but this is the bones of the
thing.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
traced the "umount hang due to spurious glock" issue that I was having
with gfs2meta. It's in the do_gfs2_set_flags function, which does a
gfs2_holder_init as well as a gfs2_glock_nq_init (increases ref count by
2 instead of 1).
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Typo causes the error value from the wrong lock to be checked.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix an endian coversion bug in log.c spotted by Kevin Anderson.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix a use after free bug in dir.c spotted by Kevin Anderson.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Update the NFS filehandles so that they contain the file type.
Signed-off-by: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
A missing initialisation when creating a new on disk inode.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We must not call GFP_KERNEL memory allocations while we
are holding the log lock (read or write) since that may
trigger a log flush resulting in a deadlock.
Eventually we need to fix the locking in log.c, for now
this solves the problem at the expense of freeing up memory
as fast as we would like to. This needs to be revisited
later on.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds a generation number for the eventual use of NFS to the
ondisk inode. Its backward compatible with the current code since
it doesn't really matter what the generation number is to start with,
and indeed since its set to zero, due to it being taken from padding
in both the inode and rgrp header, it should be fine.
The eventual plan is to use this rather than no_formal_ino in the
NFS filehandles. At that point no_formal_ino will be unused.
At the same time we also add a releasepages call back to the
"normal" address space for gfs2 inodes. Also I've removed a
one-linrer function thats not required any more.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes a bug where we were releasing a page incorrectly
sometimes when reading a stuffed file. This fixes the bug
that Kevin reported when using Xen.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This really is the correct fix this time. We just ignore all
glocks associated with inodes until the inodes are pushed
from the inode cache. At that point the glocks are queued for
reclaim, so we don't need to do it here.
Also fix one or two other minor bugs.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Under certain circumstances the glock scanning logic would
demote locks which ought not to have been selected for
demotion.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Change our one existing old-style lock initialiser to a new-style
one. This allows the lock validator to work as intended.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Update GFS2 for dhowells API changes.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes the call in GFS2 to tty_write_message and replaces
it with a printk. As the export was added by GFS2, we remove this
as well.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes one instance of GFP_NOFAIL from the glock callback
function. It also fixes a bug where a , was used at a line end
rather than ; causing unintended results.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Don't use a wrapper for generic_file_sendfile but call it
directly.
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
List new development mailing list and correct web page url.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch makes the following needlessly global code static:
- eaops.c: struct gfs2_security_eaops
- rgrp.c: gfs2_free_uninit_di()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
gfs2_repermission is just a wrapper for permission, so remove it and
call permission directly where required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The rename inode operation was trying to lock the same
inode twice in the case of renaming with the source
and destination directories the same. We now test for
this and just lock once.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Nick Piggin's comments on lkml, remove the unused ra_state
variable.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
fs/gfs2/locking/dlm/thread.c: In function ‘process_complete’:
fs/gfs2/locking/dlm/thread.c:56: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
fs/gfs2/locking/dlm/thread.c:69: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
fs/gfs2/locking/dlm/thread.c:102: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
fs/gfs2/locking/dlm/thread.c:124: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
fs/gfs2/locking/dlm/thread.c:146: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
fs/gfs2/locking/dlm/thread.c:148: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
fs/gfs2/glock.c: In function ‘gfs2_holder_get’:
fs/gfs2/glock.c:439: warning: passing argument 2 of ‘set_bit’ from incompatible pointer type
fs/gfs2/glock.c: In function ‘rq_promote’:
fs/gfs2/glock.c:512: warning: passing argument 2 of ‘set_bit’ from incompatible pointer type
fs/gfs2/glock.c:526: warning: passing argument 2 of ‘set_bit’ from incompatible pointer type
...
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Include the glock in the transaction, even when not journaling
data in order that ordered write data will be correctly flushed
when the lock is released.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes the way we have been dealing with unlinked,
but still open files. It removes all limits (other than memory
for inodes, as per every other filesystem) on numbers of these
which we can support on GFS2. It also means that (like other
fs) its the responsibility of the last process to close the file
to deallocate the storage, rather than the person who did the
unlinking. Note that with GFS2, those two events might take place
on different nodes.
Also there are a number of other changes:
o We use the Linux inode subsystem as it was intended to be
used, wrt allocating GFS2 inodes
o The Linux inode cache is now the point which we use for
local enforcement of only holding one copy of the inode in
core at once (previous to this we used the glock layer).
o We no longer use the unlinked "special" file. We just ignore it
completely. This makes unlinking more efficient.
o We now use the 4th block allocation state. The previously unused
state is used to track unlinked but still open inodes.
o gfs2_inoded is no longer needed
o Several fields are now no longer needed (and removed) from the in
core struct gfs2_inode
o Several fields are no longer needed (and removed) from the in core
superblock
There are a number of future possible optimisations and clean ups
which have been made possible by this patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The caller ensures that ea_list_i() is never called with an
invalid type, so lets BUG() if we see one. This clears up
a couple of compiler warnings too.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We can reclaim some space by moving fields in some structures
in order to allow them to pack better on 64 bit architectures.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This should fix the mount problems with gfs2 and selinux.
Signed-off-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds support to GFS2 for selinux extended attributes. There is a
known bug in gfs2_ea_get() which is believed to be independant of this
patch. Further patches will follow once that bug is fixed in order to
make GFS2 use as much of the generic eattr infrastructure as possible.
Signed-off-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Setup the lock_dlm kobject before setting up the dlm lockspace instead
of after. We want to use the sysfs files to detect the mount without
having to wait for the dlm setup which can take a while.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds some extra debugging to glock.c and changes
inode.c's deallocation code to call the debugging code
at a suitable moment. I'm chasing down a particular bug
to do with deallocation at the moment and the code can
go again once the bug is fixed.
Also this includes the first part of some changes to unify
the Linux struct inode and GFS2's struct gfs2_inode. This
transformation will happen in small parts over the next short
period.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We no longer use semaphores, everything has been converted to
mutex or rwsem, so we don't need to include this header any more.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch drops the log spinlock when an I/O error occurs
to avoid any possible problems in case of blocking or
recursion in the I/O error routine. It also has a few
cosmetic changes to tidy up various other files.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since they are small and will be inlined by the complier,
it makes sense to merge the contents of bits.[ch] into
rgrp.c
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The ref count of certain glock's got elevated too far during unlink
which caused umount to fail. This fixes it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The attributes logic for immutable was wrong so that there was
not way to remove this attribute once set. This fixes the
bug.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The gh_owner field shouldn't be set or reset outside the glock code.
These were left over from when recursive locking was allowed. It
isn't any more, so they are not needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The original code ordered the blocks allocated in the build_height
routine backwards causing excessive disk seeks during a read of the
metadata. This patch reverses the order to try and reduce disk seeks.
Example: A five level metadata tree, I = Inode, P = Pointers, D = Data
You need to read the blocks in the order:
I P5 P4 P3 P2 P1 D
in order to read a single data block. The new code now orders the blocks
in this way. The old code used to order them as:
I P1 P2 P3 P4 P5 D
requiring two extra seeks on average. Note that for files which are
grown by gradual extension rather than by truncate or by llseek/write
at a large offset, this doesn't apply. In the case of writing to a
file linearly, this routine will only be called upon to extend the
height of the tree by one block at a time, so the ordering is
determined by when its called rather than by the internals of the
routine itself. Optimising that part of the ordering is a much
harder problem.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds readpages support (and also corrects a small bug in
the readpage error path at the same time). Hopefully this will
improve performance by allowing GFS to submit larger lumps of
I/O at a time.
In order to simplify the setting of BH_Boundary, it currently gets
set when we hit the end of a indirect pointer block. There is
always a boundary at this point with the current allocation code.
It doesn't get all the boundaries right though, so there is still
room for improvement in this.
See comments in fs/gfs2/ops_address.c for further information about
readpages with GFS2.
Signed-off-by: Steven Whitehouse
Well, I managed to track down the bug in gfs2 that was causing
my grief. Below is a patch for the problem. Please incorporate
as you see fit. Or should I say: as you see git.
The problem was basically that you never set d_ops for the root
inode, so the wrong hash algorithm was being used. But only for
the root directory. Turns out that if I used subdirectories, it
used the proper hash and my files were found just fine.
Signed-off-by: Robert S Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As pointed out by Wendy Cheng, the logic in GFS2's writepage() function
wasn't quite right with respect to invalidating pages when a file has been
truncated. This patch fixes that.
CC: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 unused functions
- remove the following global function that was both unused and
unimplemented:
- super.c: gfs2_do_upgrade()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Despite my earlier careful search, there was a recursive lock left
in the deallocation code. This removes it. It also should speed up
deallocation be reducing the number of locking operations which take
place by using two "try lock" operations on the two locks involved in
inode deallocation which allows us to grab the locks out of order
(compared with NFS which grabs the inode lock first and the iopen
lock later). It is ok for us to fail while doing this since if it
does fail it means that someone else is still using the inode and
thus it wouldn't be possible to deallocate anyway.
This fixes the bug reported to me by Rob Kenna.
Cc: Rob Kenna <rkenna@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This saves the journal recovery result and makes it visible through sysfs.
User space needs to know if the node actually recovered the journal or
tried and gave up.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch changes the last user of recursive locking so that
it no longer needs this feature and removes it from the glock
layer. This makes the glock code a lot simpler and easier to
understand. Its also a prerequsite to adding support for the
AOP_TRUNCATED_PAGE return code (or at least it is if you don't
want your brain to melt in the process)
I've left in a couple of checks just in case there is some place
else in the code which is still using this feature that I didn't
spot yet, but they can probably be removed long term.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We don't need the inherited flags since this action can be
implied by setting the flags on directories where they
wouldn't otherwise make sense. It reduces the number of extra
flags by two. Also updated the list of flags to take account of
one extra ext2/3 flag.
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Remove select of SYSFS as requested by Greg KH. Change whitespace to
tabs rather than spaces in places where it was incorrect and removed
'default m' as suggested by Adrian Bunk.
Reorganised Makefile as suggested by Sam Ravnborg.
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Andrew Morton's comments, remove uneeded casts and use
wait_event_interruptible() rather than open code the wait.
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
1. Comment whitespace fix
2. Removed unused header files from dir.c
3. Split the gfs2_dir_get_buffer() function into two functions
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In order to make the file and line number reporting work
correctly, this has been moved back into the header file.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is one of the changes related to journal recovery I mentioned a
couple weeks ago. We can get into a situation where there are only
readonly nodes currently mounting the fs, but there are journals that need
to be recovered. Since the readonly nodes can't recover journals, the
next rw mounter needs to go through and check all journals and recover any
that are dirty (i.e. what the first node to mount the fs does). This rw
mounter needs to skip the journals held by the existing readonly nodes.
Skipping those journals amounts to using the TRY flag on the journal locks
so acquiring the lock of a journal held by a readonly node will fail
instead of blocking indefinately.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>