Commit Graph

628 Commits

Author SHA1 Message Date
Benjamin Marzinski
c4f68a130f [GFS2] delay glock demote for a minimum hold time
When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
too early, and is forced to call unmap_mapping_range(). This bumps the
mapping->truncate_count sequence count, forcing do_no_page() to retry. This
patch institutes a minimum glock hold time a tenth a second.  This insures
that even in heavy contention cases, the node has enough time to get some
useful work done before it gives up the glock.

A second issue is that when gfs2_glock_dq() is called from within a page fault
to demote a lock, and the associated page needs to be written out, it will
try to acqire a lock on it, but it has already been locked at a higher level.
This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this
issue. This is the same patch as Steve Whitehouse originally proposed to fix
this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock
before it queues up the work on it.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:48 +01:00
Abhijith Das
d1e2777d4f [GFS2] panic after can't parse mount arguments
When you try to mount gfs2 with -o garbage, the mount fails and the gfs2
superblock is deallocated and becomes NULL. The vfs comes around later
on and calls gfs2_kill_sb. At this point the hidden gfs2 superblock
pointer (sb->s_fs_info) is NULL and dereferencing it through
gfs2_meta_syncfs causes the panic. (the other function call to
gfs2_delete_debugfs_file() succeeds because this function already checks
for a NULL pointer)

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:46 +01:00
Bob Peterson
ec217e0ece [GFS2] Patch to protect sd_log_num_jdata
This is a patch to GFS2 to protect sd_log_num_jdata with the
gfs2_log_lock.  Without this patch, there is a timing window
where you can get hit the following assert from function
gfs2_log_flush():

gfs2_assert_withdraw(sdp,
			sdp->sd_log_num_buf + sdp->sd_log_num_jdata ==
			sdp->sd_log_commited_buf +
			sdp->sd_log_commited_databuf);

I've tested it on my roth cluster and it fixes the problem.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:43 +01:00
Abhijith Das
a947e03356 [GFS2] Wendy's dump lockname in hex & fix glock dump
With this patch, gfs2 glockdump through the debugfs filesystem will only
dump glocks for the specified filesystem instead of all glocks. Also, to
aid debugging, the glock number is dumped in hex instead of decimal.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
2007-10-10 08:55:41 +01:00
Wendy Cheng
a13b8c5f23 [GFS2] Reduce truncate IO traffic
Current GFS2 setattr call unconditionally invokes do_shrink even the
requested size and actual file size are equal. This has generated large
amount of extra IOs found during NFS benchmark runs. This patch moves
the relevant logic out of shrink code path. Since setattr is a system
call, the time stamps update is still required.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:36 +01:00
Benjamin Marzinski
9a5ad13856 [GFS2] Add NULL entry to token table
match_token() was returning garbage data instead of a fail value. This data
happened to match a valid option id for an option that required an argument (in
this case, lockproto=%s) For match_token() to correctly fail if the option
doesn't match any of the tokens, the token table must end with a NULL entry.
This patch adds the NULL entry.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:34 +01:00
Steven Whitehouse
382e6e256b [GFS2] Add a missing gfs2_trans_add_bh()
This was missing from the dir_split_leaf() function although in
most cases its not a problem due to other functions having
already previously called gfs2_trans_add_bh. This makes certain
that it is correct.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Wendy Cheng <wcheng@redhat.com>
2007-10-10 08:55:32 +01:00
Steven Whitehouse
bb3b0e3df5 [GFS2] Clean up invalidatepage/releasepage
This patch fixes some bugs relating to journaled data files by cleaning
up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now
never block during gfs2_releasepage(), instead we always either release
or refuse to release depending on the status of the buffers.

This fixes Red Hat bugzillas #248969 and #252392.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2007-10-10 08:55:29 +01:00
Abhijith Das
2d9a4bbf6d [GFS2] Fix quota do_list operation hang
This is the filesystem part of the patches to fix this bz. There are
additional userland patches (gfs2_quota, libgfs2) for the complete
solution. This patch adds a new field qu_ll_next to the gfs2_quota
structure. This field allows us to create linked lists of quotas in the
ondisk quota inode. Instead of scanning through the entire sparse quota
file for valid quotas, we can now simply walk through the user and group
quota linked lists to perform the do_list operation.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:27 +01:00
Denis Cheng
34eaae398e [GFS2] fixed a NULL pointer assignment BUG
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:24 +01:00
Abhijith Das
0fd5355470 [GFS2] Force unstuff of hidden quota inode
This patch forcibly unstuffs (if stuffed) the hidden quota inode at the
first availble opportunity. In any practical scenario the quota inode
won't be stuffed, so this is ok to do. Unstuffing the quota inode allows
us to ignore the case of a stuffed quota inode in gfs2_adjust_quota().

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:22 +01:00
Denis Cheng
5d35e31f43 [GFS2] better code for translating characters
the original code could work, but I think this code could work better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:20 +01:00
Denis Cheng
2d3ba1ea97 [GFS2] unneeded typecast
sb->s_fs_info is a void pointer, thus the type cast is not needed.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:17 +01:00
Denis Cheng
adb4ec13cd [GFS2] use list_for_each_entry instead
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:15 +01:00
Bob Peterson
75be73a824 [GFS2] Ensure journal file cache is flushed after recovery
This is for bugzilla bug #248176: GFS2: invalid metadata block

Patches 1 thru 3 were accepted upstream, but there were problems
with 4 and 5.  Those issues have been resolved and now the recovery
tests are passing without errors.  This code has gone through
41 * 3 successful gfs2 recovery tests before it hit an
unrelated (openais) problem.  I'm continuing to test it.

This is a complete rewrite of patch 5 for bug #248176, written by
Steve Whitehouse.  This is referred to in the bugzilla record as
"new 6" and "a different solution".

The problem was that the journal inodes, although protected by
a glock, were not synched with the other nodes because they don't
use the inode glock synch operations (i.e. no "glops" were defined).
Therefore, journal recovery on a journal-recovering node were causing
the blocks to get out of sync with the node that was actually trying
to use that journal as it comes back up from a reboot.

There are two possible solutions: (1) To make the journals use the
normal inode glock sync operations, or (2) To make the journal
operations take effect immediately (i.e. no caching).  Although
option 1 works, it turns out to be a lot more code.  Steve opted
for option 2, which is much simpler and therefore less prone to
regression errors.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

--
2007-10-10 08:55:12 +01:00
Bob Peterson
5f3eae7546 [GFS2] invalid metadata block - REVISED
This is for bugzilla bug #248176: GFS2: invalid metadata block

Patches 1 thru 3 were accepted upstream, but there were problems
with 4 and 5.  Those issues have been resolved and now the recovery
tests are passing without errors.  This code has gone through
41 * 3 successful gfs2 recovery tests before it hit an
unrelated (openais) problem.

This is a complete rewrite of patch 4 for bug #248176.

Part of the problem was that inodes were being recycled
before their buffers were flushed to the journal logs.
Another problem was that the clone bitmaps were being
searched for deleted inodes to recycle, but only the
"real" bitmaps should be searched for that purpose.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:10 +01:00
Steven Whitehouse
8fbbfd214c [GFS2] Reduce number of gfs2_scand processes to one
We only need a single gfs2_scand process rather than the one
per filesystem which we had previously. As a result the parameter
determining the frequency of gfs2_scand runs becomes a module
parameter rather than a mount parameter as it was before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:08 +01:00
Denis Cheng
ca5a939b33 [GFS2] use the declaration of gfs2_dops in the header file instead
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:05 +01:00
Denis Cheng
4ef290025c [GFS2] mark struct *_operations const
these struct *_operations are all method tables, thus should be const.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:03 +01:00
Bob Peterson
0f8468c8be [GFS2] Detach buf data during in-place writeback
This is patch 5 of 5 for bug #248176

Metadata corruption was occurring because page references weren't
being removed in all cases.  I previously added a function called
detach_bufdata, but I discovered there already WAS a function out
there to do the job.  It's called gfs2_meta_cache_flush.  So I added
a call to that to remove the page references.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:01 +01:00
Denis Cheng
cee23c79d0 [GFS2] use an temp variable to reduce a spin_unlock
this is more clear.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:58 +01:00
Bob Peterson
6760bdcd03 [GFS2] Prevent infinite loop in try_rgrp_unlink()
This is patch three of five for bug #248176.

The try_rgrp_unlink code in rgrp.c had an infinite loop.  This was
caused because the bitmap function rgblk_search can return a block
less than the "goal" block, in which case it was looping.  The fix is
to make it always march forward as needed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:56 +01:00
Bob Peterson
693ddeabbb [GFS2] Revert part of earlier log.c changes
This is patch 2 of 5 for bug #248176.

The list_move code previously concocted in log.c for bug #238162
(see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23)
never runs as bh can now never be NULL at this point.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:53 +01:00
Bob Peterson
905d2aefa9 [GFS2] Move some code inside the log lock
This is the first of five patches for bug #248176:

There were still some critical variables being manipulated outside
the log_lock spinlock.  That usually resulted in a hang.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:51 +01:00
Steven Whitehouse
7b08fc6201 [GFS2] Fix an oops in glock dumping
This fixes an oops which was occurring during glock dumping due to the
seq file code not taking a reference to the glock. Also this fixes a
memory leak which occurred in certain cases, in turn preventing the
filesystem from unmounting.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:49 +01:00
Steve French
afd0942d98 [GFS2] GFS2 not checking pointer on create when running under nfsd
When looking at an unrelated problem, I noticed that nfsd does not
set nameidata pointer on create (ie nd is NULL).  This should
cause an oops in some cases in which when NFSd is mounted over GFS2.

Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:46 +01:00
Jesper Juhl
aa0481e58a [GFS2] Clean up duplicate includes in fs/gfs2/
This patch cleans up duplicate includes in
	fs/gfs2/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:44 +01:00
Josef Whiter
26caee5bc6 [GFS2] Fix calculation of demote state
If a glock is in the exclusive state and a request for demote to
deferred has been received, then further requests for demote to
shared are being ignored. This patch fixes that by ensuring that
we demote to unlocked in that case.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:42 +01:00
Steven Whitehouse
87124e581b [GFS2] Fix two races relating to glock callbacks
One of the races relates to referencing a variable while not holding
its protecting spinlock. The patch simply moves the test inside the
spin lock. The other races occurs when a demote to unlocked request
occurs during the time a demote to shared request is already running.
This of course only happens in the case that the lock was in the
exclusive mode to start with. The patch adds a check to see if another
demote request has occurred in the mean time and if it has, then it
performs a second demote.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:54:39 +01:00
NeilBrown
6712ecf8f6 Drop 'size' argument from bio_endio and bi_end_io
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant.  Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size.  So don't do that either.

While we are at it, change bi_end_io to return void.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10 09:25:57 +02:00
Pavel Emelyanov
7afaac6202 GFS2: clean up explicit check for mandatory locks
The __mandatory_lock(inode) function makes the same check, but makes the code
more readable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09 18:32:46 -04:00
Steven Whitehouse
d18c4d687d [GFS2] Revert remounting w/o acl option leaves acls enabled
This reverts commit 569a7b6c2e. The
code was correct originally. The default setting for ACLs after a
remount should be to be the same as before the remount.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:34:40 +01:00
Steven Whitehouse
b9af7ca6d3 [GFS2] Fix setting of inherit jdata attr
Due to a mix up between the jdata attribute and inherit jdata attribute
it has not been possible to set the inherit jdata attribute on
directories. This is now fixed and the ioctl will report the inherit
jdata attribute for directories rather than the jdata attribute as it
did previously. This stems from our need to have the one bit in the
ioctl attr flags mean two different things according to whether the
underlying inode is a directory or not.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:34:11 +01:00
Steven Whitehouse
a867bb28c1 [GFS2] Fix incorrect error path in prepare_write()
The error path in prepare_write() was incorrect in the (very rare) event
that the transaction fails to start. The following prevents a NULL
pointer dereference,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:33:44 +01:00
Steven Whitehouse
6eefaf61f6 [GFS2] Fix incorrect return code in rgrp.c
The following patch fixes a bug where 0 was being used as a return code
to indicate "nothing to do" when in fact 0 was a valid block location
which might be returned by the function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:33:15 +01:00
Bob Peterson
24c7387333 [GFS2] soft lockup in rgblk_search
This patch seems to fix the problem described in bugzilla bug 246114.
It was written by Steve Whitehouse with some tweaking by me.

The code was looping in the relatively new section of code designed to
search for and reuse unlinked inodes.  In cases where it was finding an
appropriate inode to reuse, it was looping around and finding the same
block over and over because a "<=" check should have been a "<" when
comparing the goal block to the last unlinked block found.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:32:43 +01:00
Bob Peterson
bdcb88562c [GFS2] soft lockup detected in databuf_lo_before_commit
This is part 2 of the patch for bug #245832, part 1 of which is already
in the git tree.

The problem was that sdp->sd_log_num_databuf was not always being
protected by the gfs2_log_lock spinlock, but the sd_log_le_databuf
(which it is supposed to reflect) was protected.  That meant there
was a timing window during which gfs2_log_flush called
databuf_lo_before_commit and the count didn't match what was
really on the linked list in that window.  So when it ran out of
items on the linked list, it decremented total_dbuf from 0 to -1 and
thus never left the "while(total_dbuf)" loop.

The solution is to protect the variable sdp->sd_log_num_databuf so
that the value will always match the contents of the linked list,
and therefore the number will never go negative, and therefore, the
loop will be exited properly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:32:04 +01:00
Christoph Hellwig
0af1a45046 rename setlease to generic_setlease
Make it a little more clear that this is the default implementation for
the setleast operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:43 -07:00
Paul Mundt
20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Nick Piggin
83c54070ee mm: fault feedback #2
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer.  This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).

[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin
d0217ac04c mm: fault feedback #1
Change ->fault prototype.  We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
 FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things.  This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).

This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.

struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.

The page is now returned via a page pointer in the vm_fault struct.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin
54cb8821de mm: merge populate and nopage into fault (fixes nonlinear)
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping.  The hitch here
is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
 But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn.  Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache.  Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
users have hit mainline yet.

[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin
d00806b183 mm: fix fault vs invalidate race for linear mappings
Fix the race between invalidate_inode_pages and do_no_page.

Andrea Arcangeli identified a subtle race between invalidation of pages from
pagecache with userspace mappings, and do_no_page.

The issue is that invalidation has to shoot down all mappings to the page,
before it can be discarded from the pagecache.  Between shooting down ptes to
a particular page, and actually dropping the struct page from the pagecache,
do_no_page from any process might fault on that page and establish a new
mapping to the page just before it gets discarded from the pagecache.

The most common case where such invalidation is used is in file truncation.
This case was catered for by doing a sort of open-coded seqlock between the
file's i_size, and its truncate_count.

Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then find the
page if it is within i_size, and then check truncate_count under the page
table lock and back out and retry if it had subsequently been changed (ptl
will serialise against unmapping, and ensure a potentially updated
truncate_count is actually visible).

Complexity and documentation issues aside, the locking protocol fails in the
case where we would like to invalidate pagecache inside i_size.  do_no_page
can come in anytime and filemap_nopage is not aware of the invalidation in
progress (as it is when it is outside i_size).  The end result is that
dangling (->mapping == NULL) pages that appear to be from a particular file
may be mapped into userspace with nonsense data.  Valid mappings to the same
place will see a different page.

Andrea implemented two working fixes, one using a real seqlock, another using
a page->flags bit.  He also proposed using the page lock in do_no_page, but
that was initially considered too heavyweight.  However, it is not a global or
per-file lock, and the page cacheline is modified in do_no_page to increment
_count and _mapcount anyway, so a further modification should not be a large
performance hit.  Scalability is not an issue.

This patch implements this latter approach.  ->nopage implementations return
with the page locked if it is possible for their underlying file to be
invalidated (in that case, they must set a special vm_flags bit to indicate
so).  do_no_page only unlocks the page after setting up the mapping
completely.  invalidation is excluded because it holds the page lock during
invalidation of each page (and ensures that the page is not mapped while
holding the lock).

This also allows significant simplifications in do_no_page, because we have
the page locked in the right place in the pagecache from the start.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Marc Eshel
60446067ba gfs2: stop giving out non-cluster-coherent leases
Since gfs2 can't prevent conflicting opens or leases on other nodes, we
probably shouldn't allow it to give out leases at all.

Put the newly defined lease operation into use in gfs2 by turning off
lease, unless we're using the "nolock' locking module (in which case all
locking is local anyway).

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
2007-07-18 19:17:19 -04:00
Satyam Sharma
3bd858ab1c Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check
Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]

The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 12:00:03 -07:00
Christoph Hellwig
a569425512 knfsd: exportfs: add exportfs.h header
currently the export_operation structure and helpers related to it are in
fs.h.  fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.

[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:06 -07:00
Alexey Dobriyan
aa0ac36518 Remove capability.h from mm.h
I forgot to remove capability.h from mm.h while removing sched.h!  This
patch remedies that, because the only inline function which was using
CAP_something was made out of line.

Cross-compile tested without regressions on:

	all powerpc defconfigs
	all mips defconfigs
	all m68k defconfigs
	all arm defconfigs
	all ia64 defconfigs

	alpha alpha-allnoconfig alpha-defconfig alpha-up
	arm
	i386 i386-allnoconfig i386-defconfig i386-up
	ia64 ia64-allnoconfig ia64-defconfig ia64-up
	m68k
	mips
	parisc parisc-allnoconfig parisc-defconfig parisc-up
	powerpc powerpc-up
	s390 s390-allnoconfig s390-defconfig s390-up
	sparc sparc-allnoconfig sparc-defconfig sparc-up
	sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up
	um-x86_64
	x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:45 -07:00
Linus Torvalds
1b21f458dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits)
  [GFS2] Accept old format NFS filehandles
  [GFS2] Small fixes to logging code
  [DLM] dump more lock values
  [GFS2] Remove i_mode passing from NFS File Handle
  [GFS2] Obtaining no_formal_ino from directory entry
  [GFS2] git-gfs2-nmw-build-fix
  [GFS2] System won't suspend with GFS2 file system mounted
  [GFS2] remounting w/o acl option leaves acls enabled
  [GFS2] inode size inconsistency
  [DLM] Telnet to port 21064 can stop all lockspaces
  [GFS2] Fix gfs2_block_truncate_page err return
  [GFS2] Addendum to the journaled file/unmount patch
  [GFS2] Simplify multiple glock aquisition
  [GFS2] assertion failure after writing to journaled file, umount
  [GFS2] Use zero_user_page() in stuffed_readpage()
  [GFS2] Remove bogus '\0' in rgrp.c
  [GFS2] Journaled file write/unstuff bug
  [DLM] don't require FS flag on all nodes
  [GFS2] Fix deallocation issues
  [GFS2] return conflicts for GETLK
  ...
2007-07-10 13:56:13 -07:00
Steven Whitehouse
3ebf44902f [GFS2] Accept old format NFS filehandles
On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote:
> > -#define GFS2_LARGE_FH_SIZE 10
> > -
> > -struct gfs2_fh_obj {
> > -   struct gfs2_inum_host this;
> > -   u32 imode;
> > -};
> > +#define GFS2_LARGE_FH_SIZE 8
>
> Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE
> or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by and older
> gfs version anymore.  Stale filehandles because of a new kernel version
> are a big no-no, so please add back code to handle the old filehandles
> on the decode side.
>

This should fix that problem I think since its only relating to end of
the fh we can just ignore that field in order to accept the older
format.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wendy Cheng <wcheng@redhat.com>
2007-07-10 12:28:27 +01:00
Jens Axboe
5ffc4ef45b sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:13 +02:00
Steven Whitehouse
a0a24741ca [GFS2] Small fixes to logging code
This reverts part of an earlier patch which tried to reclaim
gfs2_bufdata structures too early and resulted in a "use after free"
case (this bit from me). Also a change to not write out log headers
unless we really need to (in the case of flushing nothing we don't need
a header) from Bob.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2007-07-09 15:43:07 +01:00
Wendy Cheng
35dcc52e3a [GFS2] Remove i_mode passing from NFS File Handle
GFS2 has been passing i_mode within NFS File Handle. Other than the
wrong assumption that there is always room for this extra 16 bit value,
the current gfs2_get_dentry doesn't really need the i_mode to work
correctly. Note that GFS2 NFS code does go thru the same lookup code
path as direct file access route (where the mode is obtained from name
lookup) but gfs2_get_dentry() is coded for different purpose. It is not
used during lookup time. It is part of the file access procedure call.
When the call is invoked, if on-disk inode is not in-memory, it has to
be read-in. This makes i_mode passing a useless overhead.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:11 +01:00
Wendy Cheng
bb9bcf0616 [GFS2] Obtaining no_formal_ino from directory entry
GFS2 lookup code doesn't ask for inode shared glock. This implies during
in-memory inode creation for existing file, GFS2 will not disk-read in
the inode contents. This leaves no_formal_ino un-initialized during
lookup time. The un-initialized no_formal_ino is subsequently encoded
into file handle. Clients will get ESTALE error whenever it tries to
access these files.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:08 +01:00
Abhijith Das
b365762924 [GFS2] System won't suspend with GFS2 file system mounted
The kernel threads in gfs2, namely gfs2_scand, gfs2_logd, gfs2_quotad,
gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend
mechanism was trying to freeze them.

I put in calls to refrigerator() in the loops for all the daemons and
suspend works as expected.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:04 +01:00
Bob Peterson
569a7b6c2e [GFS2] remounting w/o acl option leaves acls enabled
This patch is for bugzilla bug #245663.  This crosswrites a fix from
gfs1 (bz #210369) so that the mount options are reset properly upon
remount.  This was tested on system trin-10.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:01 +01:00
Wendy Cheng
090ffaa55d [GFS2] inode size inconsistency
This should have been part of the NFS patch #1 but somehow I missed it
when packaging the patches. It is not a critical issue as the others (I
hope). RHEL 5.1 31.el5 kernel runs fine without this change.

Our truncate code is chopped into two parts, one for vfs inode changes
(in vmtruncate()) and one of gfs inode (in gfs2_truncatei()). These two
operatons are, unfortunately, not atomic. So it could happens that
vmtruncate() succeeds (inode->i_size is changed) but gfs2_truncatei
fails (say kernel temporarily out of memory). This would leave gfs inode
i_di.di_size out of sync with vfs inode i_size. It will later confuse
gfs2_commit_write() if a write is issued. Last time I checked, it will
cause file corruption.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:59 +01:00
S. Wendy Cheng
1875f2f31b [GFS2] Fix gfs2_block_truncate_page err return
Code segment inside gfs2_block_truncate_page() doesn't set the return
code correctly. This causes NFSD erroneously returns EIO back to client
with setattr procedure call (truncate error).

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:54 +01:00
Robert Peterson
773ed1a044 [GFS2] Addendum to the journaled file/unmount patch
This patch is an addendum to the previous journaled file/unmount patch.
It fixes a problem discovered during testing.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:52 +01:00
Steven Whitehouse
eaf5bd3cac [GFS2] Simplify multiple glock aquisition
There is a bug in the code which acquires multiple glocks where if the
initial out-of-order attempt fails part way though we can land up trying
to acquire the wrong number of glocks. This is part of the fix for red
hat bz #239737. The other part of the bz doesn't apply to upstream
kernels since it was fixed by:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6

Since the out-of-order code doesn't appear to add anything to the
performance of GFS2, this patch just removed it rather than trying to
fix it. It should be much easier to see whats going on here now. In
addition, we don't allocate any memory unless we are using a lot of
glocks (which is a relatively uncommon case).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:50 +01:00
Robert Peterson
2332c4435b [GFS2] assertion failure after writing to journaled file, umount
This patch passes all my nasty tests that were causing the code to
fail under one circumstance or another.  Here is a complete summary
of all changes from today's git tree, in order of appearance:

1. There are now separate variables for metadata buffer accounting.
2. Variable sd_log_num_hdrs is no longer needed, since the header
   accounting is taken care of by the reserve/refund sequence.
3. Fixed a tiny grammatical problem in a comment.
4. Added a new function "calc_reserved" to calculate the reserved
   log space.  This isn't entirely necessary, but it has two benefits:
   First, it simplifies the gfs2_log_refund function greatly.
   Second, it allows for easier debugging because I could sprinkle the
   code with calls to this function to make sure the accounting is
   proper (by adding asserts and printks) at strategic point of the code.
5. In log_pull_tail there apparently was a kludge to fix up the
   accounting based on a "pull" parameter.  The buffer accounting is
   now done properly, so the kludge was removed.
6. File sync operations were making a call to gfs2_log_flush that
   writes another journal header.  Since that header was unplanned
   for (reserved) by the reserve/refund sequence, the free space had
   to be decremented so that when log_pull_tail gets called, the free
   space is be adjusted properly.  (Did I hear you call that a kludge?
   well, maybe, but a lot more justifiable than the one I removed).
7. In the gfs2_log_shutdown code, it optionally syncs the log by
   specifying the PULL parameter to log_write_header.  I'm not sure
   this is necessary anymore.  It just seems to me there could be
   cases where shutdown is called while there are outstanding log
   buffers.
8. In the (data)buf_lo_before_commit functions, I changed some offset
   values from being calculated on the fly to being constants.	That
   simplified some code and we might as well let the compiler do the
   calculation once rather than redoing those cycles at run time.
9. This version has my rewritten databuf_lo_add function.
   This version is much more like its predecessor, buf_lo_add, which
   makes it easier to understand.  Again, this might not be necessary,
   but it seems as if this one works as well as the previous one,
   maybe even better, so I decided to leave it in.
10. In databuf_lo_before_commit, a previous data corruption problem
   was caused by going off the end of the buffer.  The proper solution
   is to have the proper limit in place, rather than stopping earlier.
   (Thus my previous attempt to fix it is wrong).
   If you don't wrap the buffer, you're stopping too early and that
   causes more log buffer accounting problems.
11. In lops.h there are two new (previously mentioned) constants for
   figuring out the data offset for the journal buffers.
12. There are also two new functions, buf_limit and databuf_limit to
   calculate how many entries will fit in the buffer.
13. In function gfs2_meta_wipe, it needs to distinguish between pinned
   metadata buffers and journaled data buffers for proper journal buffer
   accounting.	It can't use the JDATA gfs2_inode flag because it's
   sometimes passed the "real" inode and sometimes the "metadata
   inode" and the inode flags will be random bits in a metadata
   gfs2_inode.	It needs to base its decision on which was passed in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:47 +01:00
Steven Whitehouse
2840501ac8 [GFS2] Use zero_user_page() in stuffed_readpage()
As suggested by Robert P. J. Day <rpjday@mindspring.com>

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Robert P. J. Day <rpjday@mindspring.com>
2007-07-09 08:23:45 +01:00
Steven Whitehouse
c4201214cb [GFS2] Remove bogus '\0' in rgrp.c
Not sure how it slipped in, but we don't want it anyway.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:43 +01:00
Robert Peterson
8fb68595d5 [GFS2] Journaled file write/unstuff bug
This patch is for bugzilla bug 283162, which uncovered a number of
bugs pertaining to writing to files that have the journaled bit on.
These bugs happen most often when writing to the meta_fs because
the files are always journaled.  So operations like gfs2_grow were
particularly vulnerable, although many of the problems could be
recreated with normal files after setting the journaled bit on.
The problems fixed are:

-GFS2 wasn't ever writing unstuffed journaled data blocks to their
 in-place location on disk. Now it does.

-If you unmounted too quickly after doing IO to a journaled file,
 GFS2 was crashing because you would discard a buffer whose bufdata
 was still on the active items list.  GFS2 now deals with this
 gracefully.

-GFS2 was losing track of the bufdata for journaled data blocks,
 and it wasn't getting freed, causing an error when you tried to
 unmount the module.  GFS2 now frees all the bufdata structures.

-There was a memory corruption occurring because GFS2 wrote
 twice as many log entries for journaled buffers.

-It was occasionally trying to write journal headers in buffers
 that weren't currently mapped.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:40 +01:00
Abhijith Das
d93cfa9884 [GFS2] Fix deallocation issues
There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.

The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.

This patch was written jointly by Abhijith Das and Steven Whitehouse.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:36 +01:00
David Teigland
a7a2ff8a95 [GFS2] return conflicts for GETLK
We weren't returning the correct result when GETLK found a conflict,
which is indicated by userspace passing back a 1.

Signed-off-by: Abhijith Das <adas redhat com>
Signed-off-by: David Teigland <teigland redhat com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:33 +01:00
David Teigland
d88101d4d8 [GFS2] set plock owner in GETLK info
Set the owner field in the plock info sent to userspace for GETLK.
Without this, gfs_controld won't correctly see when the GETLK from a
process matches one of the process's existing locks.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:31 +01:00
akpm@linux-foundation.org
037bcbb756 [GFS2] gfs2_lookupi() uninitialised var fix
fs/gfs2/inode.c: In function 'gfs2_lookupi':
fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function

Looks like a real bug to me.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:29 +01:00
Steven Whitehouse
c8cdf47937 [GFS2] Recovery for lost unlinked inodes
Under certain circumstances its possible (though rather unlikely) that
inodes which were unlinked by one node while still open on another might
get "lost" in the sense that they don't get deallocated if the node
which held the inode open crashed before it was unlinked.

This patch adds the recovery code which allows automatic deallocation of
the inode if its found during block allocation (the sensible time to
look for such inodes since we are scanning the rgrp's bitmaps anyway at
this time, so it adds no overhead to do this).

Since the inode will have had its i_nlink set to zero, all we need to
trigger recovery is a lookup and an iput(), and the normal deallocation
code takes care of the rest.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:26 +01:00
Robert Peterson
b35997d448 [GFS2] Can't mount GFS2 file system on AoE device
This patch fixes bug 243131: Can't mount GFS2 file system on AoE device.
When using AoE devices with lock_nolock, there is no locking table, so
gfs2 (and gfs1) uses the superblock s_id.  This turns out to be the device
name in some cases.  In the case of AoE, the device contains a slash,
(e.g. "etherd/e1.1p2") which is an invalid character when we try to
register the table in sysfs.  This patch replaces the "/" with underscore.
Rather than add a new variable to the stack, I'm just reusing a (char *)
variable that's no longer used: table.

This code has been tested on the failing system using a RHEL5 patch.
The upstream code was tested by using gfs2_tool sb to interject a "/"
into the table name of a clustered gfs2 file system.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:24 +01:00
Steven Whitehouse
e1cc86037b [GFS2] Fix bug in error path of inode
This fixes a bug in the ordering of operations in the error path of
createi. Its not valid to do an iput() when holding the inode's glock
since the iput() will (in this case) result in delete_inode() being
called which needs to grab the lock itself. This was causing the
recursive lock checking code to trigger.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:22 +01:00
Steven Whitehouse
ffed8ab342 [GFS2] Fix typo in rename of directories
A typo caused us to pass a NULL pointer when renaming directories. It
was accidentally introduced in: [GFS2] Clean up inode number handling

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:19 +01:00
Patrick Caulfield
44f487a553 [DLM] variable allocation
Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
(This updated version of the patch uses gfp_t for ls_allocation.)

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:17 +01:00
Steven Whitehouse
4bd91ba181 [GFS2] Add nanosecond timestamp feature
This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
to the way that the on-disk format works, older filesystems will just
appear to have this field set to zero. When mounted by an older version
of GFS2, the filesystem will simply ignore the extra fields so that
it will again appear to have whole second resolution, so that its
trivially backward compatible.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:12 +01:00
Steven Whitehouse
bb8d8a6f54 [GFS2] Fix sign problem in quota/statfs and cleanup _host structures
This patch fixes some sign issues which were accidentally introduced
into the quota & statfs code during the endianess annotation process.
Also included is a general clean up which moves all of the _host
structures out of gfs2_ondisk.h (where they should not have been to
start with) and into the places where they are actually used (often only
one place). Also those _host structures which are not required any more
are removed entirely (which is the eventual plan for all of them).

The conversion routines from ondisk.c are also moved into the places
where they are actually used, which for almost every one, was just one
single place, so all those are now static functions. This also cleans up
the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

The net result is a reduction of about 100 lines of code, many functions
now marked static plus the bug fixes as mentioned above. For good
measure I ran the code through sparse after making these changes to
check that there are no warnings generated.

This fixes Red Hat bz #239686

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:10 +01:00
Benjamin Marzinski
ddf4b426aa [GFS2] fix jdata issues
This is a patch for the first three issues of RHBZ #238162

The first issue is that when you allocate a new page for a file, it will not
start off uptodate. This makes sense, since you haven't written anything to that
part of the file yet.  Unfortunately, gfs2_pin() checks to make sure that the
buffers are uptodate.  The solution to this is to mark the buffers uptodate in
gfs2_commit_write(), after they have been zeroed out and have the data written
into them.  I'm pretty confident with this fix, although it's not completely
obvious that there is no problem with marking the buffers uptodate here.

The second issue is simply that you can try to pin a data buffer that is already
on the incore log, and thus, already pinned. This patch checks to see if this
buffer is already on the log, and exits databuf_lo_add() if it is, just like
buf_lo_add() does.

The third issue is that gfs2_log_flush() doesn't do it's block accounting
correctly.  Both metadata and journaled data are logged, but gfs2_log_flush()
only compares the number of metadata blocks with the number of blocks to commit
to the ondisk journal.  This patch also counts the journaled data blocks.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:08 +01:00
Steven Whitehouse
89918647a4 [GFS2] Make the log reserved blocks depend on block size
The number of blocks which we reserve in the log at the start of each
transaction needs to depends upon the block size since the overhead is
related to the number of "pointers" which can be fitted into a single
block.

This relates to Red Hat bz #240435

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:03 +01:00
Abhijith Das
1990e91765 [GFS2] Quotas non-functional - fix another bug
This patch fixes a bug where gfs2 was writing update quota usage
information to the wrong location in the quota file.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:01 +01:00
Abhijith Das
2a87ab0806 [GFS2] Quotas non-functional - fix bug
This patch fixes an error in the quota code where a 'struct
gfs2_quota_lvb*' was being passed to gfs2_adjust_quota() instead of a
'struct gfs2_quota_data*'. Also moved 'struct gfs2_quota_lvb' from
fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:26 +01:00
Steven Whitehouse
dbb7cae2a3 [GFS2] Clean up inode number handling
This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:24 +01:00
Steven Whitehouse
41d7db0ab4 [GFS2] Reduce size of struct gdlm_lock
This patch removes the completion (which is rather large) from struct
gdlm_lock in favour of using the wait_on_bit() functions. We don't need
to add any extra fields to the structure to do this, so we save 32 bytes
(on x86_64) per structure. This adds up to quite a lot when we may
potentially have millions of these lock structures,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: David Teigland <teigland@redhat.com>
2007-07-09 08:22:21 +01:00
Robert Peterson
cd81a4bac6 [GFS2] Addendum patch 2 for gfs2_grow
This addendum patch 2 corrects three things:

1. It fixes a stupid mistake in the previous addendum that broke gfs2.
   Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html
2. It fixes a problem that Dave Teigland pointed out regarding the
   external declarations in ops_address.h being in the wrong place.
3. It recasts a couple more %llu printks to (unsigned long long)
   as requested by Steve Whitehouse.

I would have loved to put this all in one revised patch, but there was
a rush to get some patches for RHEL5.	Therefore, the previous patches
were applied to the git tree "as is" and therefore, I'm posting another
addendum.  Sorry.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:19 +01:00
Nate Diller
0507ecf50f [GFS2] use zero_user_page
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-07-09 08:22:17 +01:00
Robert Peterson
6c53267f05 [GFS2] Kernel changes to support new gfs2_grow command (part 2)
To avoid code redundancy, I separated out the operational "guts" into
a new function called read_rindex_entry.  Then I made two functions:
the closer-to-original gfs2_ri_update (without the special condition
checks) and gfs2_ri_update_special that's designed with that condition
in mind.  (I don't like the name, but if you have a suggestion, I'm
all ears).

Oh, and there's an added benefit:  we don't need all the ugly gotos
anymore.  ;)

This patch has been tested with gfs2_fsck_hellfire (which runs for
three and a half hours, btw).

Signed-off-By: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:14 +01:00
Robert Peterson
7ae8fa8451 [GFS2] kernel changes to support new gfs2_grow command
This is another revision of my gfs2 kernel patch that allows
gfs2_grow to function properly.

Steve Whitehouse expressed some concerns about the previous
patch and I restructured it based on his comments.
The previous patch was doing the statfs_change at file close time,
under its own transaction.  The current patch does the statfs_change
inside the gfs2_commit_write function, which keeps it under the
umbrella of the inode transaction.

I can't call ri_update to re-read the rindex file during the
transaction because the transaction may have outstanding unwritten
buffers attached to the rgrps that would be otherwise blown away.
So instead, I created a new function, gfs2_ri_total, that will
re-read the rindex file just to total the file system space
for the sake of the statfs_change.  The ri_update will happen
later, when gfs2 realizes the version number has changed, as it
happened before my patch.

Since the statfs_change is happening at write_commit time and there
may be multiple writes to the rindex file for one grow operation.
So one consequence of this restructuring is that instead of getting
one kernel message to indicate the change, you may see several.
For example, before when you did a gfs2_grow, you'd get a single
message like:

GFS2: File system extended by 247876 blocks (968MB)

Now you get something like:

GFS2: File system extended by 207896 blocks (812MB)
GFS2: File system extended by 39980 blocks (156MB)

This version has also been successfully run against the hours-long
"gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck
while interjecting file system damage.  It does this repeatedly
under a variety Resource Group conditions.

Signed-off-By: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:12 +01:00
Benjamin Marzinski
b524fe646c [GFS2] flush the glock completely in inode_go_sync
Fix for bz #231910
When filemap_fdatawrite() is called on the inode mapping in data=ordered mode,
it will add the glock to the log. In inode_go_sync(), if you do the
gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock
and its associated data buffers will be on the log again. This means you can
demote a lock from exclusive, without having it flushed from the log. The
attached patch simply moves the gfs2_log_flush up to after the
filemap_fdatawrite() call.

Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but
that caused me to trip the following assert.

GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed
GFS2: fsid=cypher-36:test.0:   function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61

It appears that gfs2_log_flush() puts some of the glocks buffers in the busy
state and the filemap_fdatawrite() call is necessary to flush them. This makes
me worry slightly that a related problem could happen because of moving the
gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that
gfs2_ail_empty_gl() would catch that case as well.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:07 +01:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Christoph Lameter
a35afb830f Remove SLAB_CTOR_CONSTRUCTOR
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:04 -07:00
Randy Dunlap
e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Guillaume Chazarain
3e9f45bd18 Factor outstanding I/O error handling
Cleanup: setting an outstanding error on a mapping was open coded too many
times.  Factor it out in mapping_set_error().

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Linus Torvalds
2d56d3c43c Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
  gfs2: nfs lock support for gfs2
  lockd: add code to handle deferred lock requests
  lockd: always preallocate block in nlmsvc_lock()
  lockd: handle test_lock deferrals
  lockd: pass cookie in nlmsvc_testlock
  lockd: handle fl_grant callbacks
  lockd: save lock state on deferral
  locks: add fl_grant callback for asynchronous lock return
  nfsd4: Convert NFSv4 to new lock interface
  locks: add lock cancel command
  locks: allow {vfs,posix}_lock_file to return conflicting lock
  locks: factor out generic/filesystem switch from setlock code
  locks: factor out generic/filesystem switch from test_lock
  locks: give posix_test_lock same interface as ->lock
  locks: make ->lock release private data before returning in GETLK case
  locks: create posix-to-flock helper functions
  locks: trivial removal of unnecessary parentheses
2007-05-07 12:34:24 -07:00
Linus Torvalds
5cefcab3db Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits)
  [GFS2] Uncomment sprintf_symbol calling code
  [DLM] lowcomms style
  [GFS2] printk warning fixes
  [GFS2] Patch to fix mmap of stuffed files
  [GFS2] use lib/parser for parsing mount options
  [DLM] Lowcomms nodeid range & initialisation fixes
  [DLM] Fix dlm_lowcoms_stop hang
  [DLM] fix mode munging
  [GFS2] lockdump improvements
  [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks
  [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump)
  [DLM] fs/dlm/ast.c should #include "ast.h"
  [DLM] Consolidate transport protocols
  [DLM] Remove redundant assignment
  [GFS2] Fix bz 234168 (ignoring rgrp flags)
  [DLM] change lkid format
  [DLM] interface for purge (2/2)
  [DLM] add orphan purging code (1/2)
  [DLM] split create_message function
  [GFS2] Set drop_count to 0 (off) by default
  ...
2007-05-07 12:26:27 -07:00
Christoph Lameter
50953fe9e0 slab allocators: Remove SLAB_DEBUG_INITIAL flag
I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
SLAB.

I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again?  The callback is
performed before each freeing of an object.

I would think that it is much easier to check the object state manually
before the free.  That also places the check near the code object
manipulation of the object.

Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on.  If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e.  add debug code before kfree).

There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches.  Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

This is the last slab flag that SLUB did not support.  Remove the check for
unimplemented flags from SLUB.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:57 -07:00
Marc Eshel
586759f03e gfs2: nfs lock support for gfs2
Add NFS lock support to GFS2.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-06 20:38:50 -04:00
Marc Eshel
9d6a8c5c21 locks: give posix_test_lock same interface as ->lock
posix_test_lock() and ->lock() do the same job but have gratuitously
different interfaces.  Modify posix_test_lock() so the two agree,
simplifying some code in the process.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 17:39:00 -04:00
Greg Kroah-Hartman
823bccfc40 remove "struct subsystem" as it is no longer needed
We need to work on cleaning up the relationship between kobjects, ksets and
ktypes.  The removal of 'struct subsystem' is the first step of this,
especially as it is not really needed at all.

Thanks to Kay for fixing the bugs in this patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 18:57:59 -07:00
Steven Whitehouse
37fde8ca6c [GFS2] Uncomment sprintf_symbol calling code
Now that the patch from -mm has gone upstream, we can uncomment the code
in GFS2 which uses sprintf_symbol.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Robert Peterson <rpeterso@redhat.com>
2007-05-01 09:51:39 +01:00
akpm@linux-foundation.org
f391a4ead6 [GFS2] printk warning fixes
alpha:

fs/gfs2/dir.c: In function 'gfs2_dir_read_leaf':
fs/gfs2/dir.c:1322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t'
fs/gfs2/dir.c: In function 'gfs2_dir_read':
fs/gfs2/dir.c:1455: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type '__u64'

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:48 +01:00
Steven Whitehouse
bf126aee6d [GFS2] Patch to fix mmap of stuffed files
If a stuffed file is mmaped and a page fault is generated at some offset
above the initial page, we need to create a zero page to hang the buffer
heads off before we can unstuff the file. This is a fix for bz #236087

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:46 +01:00
Josef Bacik
476c006be0 [GFS2] use lib/parser for parsing mount options
This patch converts the mount option parsing to use the kernels lib/parser stuff
like all of the other filesystems.  I tested this and it works well.  Thank you,

Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:43 +01:00
Robert Peterson
5f8820960c [GFS2] lockdump improvements
The patch below consists of the following changes (in code order):

1. I fixed a minor compiler warning regarding the printing of
   a kernel symbol address.
2. I implemented a suggestion from Dave Teigland that moves
   the debugfs information for gfs2 into a subdirectory so
   we can easily expand our use of debugfs in the future.
   The current code keeps the glock information in:
   /debug/gfs2/<fs>
   With the patch, the new code keeps the glock information in:
   /debug/gfs2/<fs>/glock
   That will allow us to create more debugfs files in the future.
3. This fixes a bug whereby a failed mount attempt causes the
   debugfs file to not be deleted.  Failed mount attempts should
   always clean up after themselves, including deleting the
   debugfs file and/or directory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:33 +01:00
Steven Whitehouse
bdd19a22f8 [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks
This patch detects when the number of entries in a leaf block or inode
block (in the case of stuffed directories) is corrupt and informs the
user. It prevents us from running off the end of the array thats been
allocated for the sorting in this case,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:30 +01:00
Robert Peterson
7a0079d9e3 [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump)
This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx
(lock dump) seen at the "gfs2 summit".  This also fixes the bug that caused
garbage to be printed by the "initialized at" field.  I apologize for the
kludge, but that code will all be ripped out anyway when the official
sprint_symbol function becomes available in the Linux kernel.  I also
changed some formatting so that spaces are replaced by proper tabs.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:28 +01:00
Steven Whitehouse
a43a49066d [GFS2] Fix bz 234168 (ignoring rgrp flags)
Ths following patch makes GFS2 use the rgrp flags properly. Although
there are also separate flags for both data and metadata as well, I've
not implemented these as there seems little use for them. On the
otherhand, the "noalloc" flag is generally useful for future changes we
might which to make, so this ensures that we interpret it correctly.

In addition I fixed the comment above the function which was incorrect.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:17 +01:00
Steven Whitehouse
f01963f264 [GFS2] Set drop_count to 0 (off) by default
This sets the drop_count to 0 by default which is a better default
for most people.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:05 +01:00
David Teigland
b9af8a788a [GFS2] use log_error before LM_OUT_ERROR
We always want to see the details of the error returned to gfs, but
log_debug is often turned off, so use log_error (printk).

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:02 +01:00
Robert Peterson
04b933f27b [GFS2] Red Hat bz 228540: owner references
In Testing the previously posted and accepted patch for
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
I uncovered some gfs2 badness.  It turns out that the current
gfs2 code saves off a process pointer when glocks is taken
in both the glock and glock holder structures.  Those
structures will persist in memory long after the process has
ended; pointers to poisoned memory.

This problem isn't caused by the 228540 fix; the new capability
introduced by the fix just uncovered the problem.

I wrote this patch that avoids saving process pointers
and instead saves off the process pid.  Rather than
referencing the bad pointers, it now does process lookups.
There is special code that makes the output nicer for
printing holder information for processes that have ended.

This patch also adds a stub for the new "sprint_symbol"
function that exists in Andrew Morton's -mm patch set, but
won't go into the base kernel until 2.6.22, since it adds
functionality but doesn't fix a bug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:55 +01:00
Benjamin Marzinski
172e045a7f [GFS2] flush the log if a transaction can't allocate space
This is a fix for bz #208514. When GFS2 frees up space, the freed blocks
aren't available for reuse until the resource group is successfully written
to the ondisk journal. So in rare cases, GFS2 operations will fail, saying
that the filesystem is out of space, when in reality, you are just waiting for
a log flush. For instance, on a 1Gig filesystem, if I continually write 10 Mb
to a file, and then truncate it, after a hundred interations, the write will
fail with -ENOSPC, even though the filesystem is just 1% full.

The attached patch calls a log flush in these cases.  I tested this patch
fairly heavily to check if there were any locking issues that I missed, and
it seems to work just fine. Also, this patch only does the log flush if
get_local_rgrp makes a complete loop of resource groups without skipping
any do to locking issues. The code would be slightly simpler if it just always
did the log flush after the first failed pass, and you could only ever have
to go through the loop twice, instead of up to three times. However, I guessed
that failing to find a rg simply do to locking issues would be common enough
to skip the log flush in that case, but I'm not certain that this is the right
way to go. Either way, I don't suppose this code will be hit all that often.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:52 +01:00
Benjamin Marzinski
6883562588 [GFS2] Fix log entry list corruption
When glock_lo_add and rg_lo_add attempt to add an element to the log, they
check to see if has already been added before locking the log. If another
process adds that element to the log in this window between the check and
locking the log, the element will be added to the list twice. This causes
the log element list to become corrupted in such a way that the log element
can never be successfully removed from the list. This patch pulls the
list_empty() check inside the log lock, to remove this window.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:50 +01:00
Steven Whitehouse
f35ac346bc [GFS2] Speed up lock_dlm's locking (move sprintf)
The following patch speeds up lock_dlm's locking by moving the sprintf
out from the lock acquisition path and into the lock creation path. This
reduces the amount of CPU time used in acquiring locks by a fair amount.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: David Teigland <teigland@redhat.com>
2007-05-01 09:10:47 +01:00
Steven Whitehouse
420d2a1028 [GFS2] Fix a bug on i386 due to evaluation order
Since gcc didn't evaluate the last two terms of the expression in
glock.c:1881 as a constant expression, it resulted in an error on
i386 due to the lack of a 64bit divide instruction. This adds some
brackets to fix the problem.

This was reported by Andrew Morton.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2007-05-01 09:10:42 +01:00
Steven Whitehouse
3b8249f617 [GFS2] Fix bz 224480 and cleanup glock demotion code
This patch prevents the printing of a warning message in cases where
the fs is functioning normally by handing off responsibility for
unlinked, but still open inodes, to another node for eventual deallocation.
Also, there is now an improved system for ensuring that such requests
to other nodes do not get lost. The callback on the iopen lock is
only ever called when i_nlink == 0 and when a node is unable to deallocate
it due to it still being in use on another node. When a node receives
the callback therefore, it knows that i_nlink must be zero, so we mark
it as such (in gfs2_drop_inode) in order that it will then attempt
deallocation of the inode itself.

As an additional benefit, queuing a demote request no longer requires
a memory allocation. This simplifies the code for dealing with gfs2_holders
as it removes one special case.

There are two new fields in struct gfs2_glock. gl_demote_state is the
state which the remote node has requested and gl_demote_time is the
time when the request came in. Both fields are only valid when the
GLF_DEMOTE flag is set in gl_flags.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:39 +01:00
Josef Whiter
1de9139092 [GFS2] Fix bz 231380, unlock page before dequeing glocks in gfs2_commit_write
If we are writing a file, and in the middle of writing the file
another node attempts to get a shared lock on that file (by doing a du for
example) the process doing the writing will hang waiting on lock_page.  The
reason for this is because when we have waiters on a exclusive glock, we will go
through and flush out all dirty pages associated with that inode and release the
lock.  The problem is that when we flush the dirty pages, we could hit a page
that we have locked durring the generic_file_buffered_write part of this
operation.  This patch unlocks the page before we go to dequeue the lock and
locks it immediatly afterwards, since generic_file_buffered_write needs the page
locked when the commit_write is completed.  This patch resolves the problem,
however if somebody sees a better way to do this please don't hesistate to yell.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:37 +01:00
Josef Whiter
5c7342d894 [GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option
If you specify an invalid mount option when trying to mount a gfs2 filesystem,
gfs2 will oops.  The attached patch resolves this problem.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:32 +01:00
Robert Peterson
7c52b166c5 [GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540)
The attached patch resolves bz 228540.  This adds the capability
for gfs2 to dump gfs2 locks through the debugfs file system.
This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
gfs2 because all the ioctls were stripped out.  Please see the bugzilla
for more history about the fix.  This patch is also attached to the bugzilla
record.

The patch is against Steve Whitehouse's latest nmw git tree kernel
(2.6.21-rc1) and has been tested on system trin-10.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:29 +01:00
Steven Whitehouse
c3f49bc209 [GFS2] Fix bz 229873, alternate test: assertion "!ip->i_inode.i_mapping->nrpages" failed
The following removes an incorrect assertion from the GFS2 glops code. This
fixes Red Hat bz 229873. Thanks to Abhijith Das for testing the patch
and confirming the fix.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2007-03-07 14:03:53 -05:00
akpm@linux-foundation.org
95d97b7dd7 [GFS2] build fix
fs/gfs2/glock.c:2198: error: 'THIS_MODULE' undeclared here (not in a function)

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-03-07 14:03:25 -05:00
Steven Whitehouse
631c42e170 [GFS2] go_drop_bh is never used, so remove it
The ->go_drop_bh function is never used, so this removes it and the single
caller,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 14:02:53 -05:00
Steven Whitehouse
04b159b132 [GFS2] Remove unused variable
Remove an unused variable.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 14:02:30 -05:00
Steven Whitehouse
1be3867955 [GFS2] Fix bz 229831, lookup returns wrong inode
The following patch fixes Red Hat bz 229831. Without this patch its
possible for the wrong inode to be returned in certain cases. It is a
pretty unusual event, so that its taken some time to track down. Thanks
and due to Josef Whiter who did a lot of the testing required to thrack
this down and fix it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 14:01:53 -05:00
Steven Whitehouse
cad5b93927 [GFS2] Fix bz 230143, incorrect flushing of rgrps
The below patch fixes a problem where we were not flushing rgrps
correctly. It only occurred in the specific case that a callback was
received for an rgrp which was dirty and when a journal log flush had
not already resulted in the rgrp being flushed anyway. This fixes Red
Hat bz 230143,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 14:00:14 -05:00
Wendy Cheng
fb0d3bce8e [GFS2] pass formal ino in do_filldir_main
ok, the following is the minimum changes to get NFSD going before we
settle down this issue .. would appreciate this in the tree so other NFS
related works can get done in parallel.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 13:58:45 -05:00
Josef Whiter
a13cbe3753 [GFS2] fix hangup when multiple processes are trying to write to the same file
This fixes a problem I encountered while running bonnie++.  When you have one
thread that opens a file and starts to write to it, and then another thread that
tries to open and write to the same file, the second thread will loop forever
trying to grab the inode lock for that inode.  Basically we come in through
generic_buffered_file_write, which calls gfs2_prepare_write, which then attempts
to grab the glock.  Because we don't own the lock, gfs2_prepare_write gets
GLR_TRYFAILED, which returns AOP_TRUNCATED_PAGE to generic_buffered_file_write.
At this point generic_buffered_file_write loops around again and immediately
retries the prepare_write.  This means that the second process never gets off of
the processor in order to allow the process that holds the lock to finish its
work and let go of the lock.  This patch makes gfs2_glock_nq schedule() if it
gets back a GLR_TRYFAILED, which resolves this problem.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 13:58:02 -05:00
Wendy Cheng
a7d2b2bdc9 [GFS2] NFS filehandle check
File handle checking error found in '07 NFS connectathon. The fh_type
and fh_len are not necessarily identical. Some of the client machines
could fail mount with stale filehandle without this patch.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 13:57:34 -05:00
Richard Fearn
d5a6751b32 [GFS2] add newline to printk message
Patch for the 2.6.20 stable tree that adds a missing newline to one of
the printk messages in fs/gfs2/ops_fstype.c.

Signed-off-by: Richard Fearn <richardfearn@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 13:57:10 -05:00
Josef Whiter
2e95b6653b [GFS2] fix locking mistake
This patch fixes a locking mistake in the quota code, we do a mutex_lock instead
of a mutex_unlock.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 13:56:41 -05:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Josef 'Jeff' Sipek
ee9b6d61a2 [PATCH] Mark struct super_operations const
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".

Compile tested with gcc & sparse.

Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:47 -08:00
Arjan van de Ven
92e1d5be91 [PATCH] mark struct inode_operations const 2
Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
Arjan van de Ven
00977a59b9 [PATCH] mark struct file_operations const 6
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Robert P. J. Day
c376222960 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:27 -08:00
Adrian Bunk
a2cf822274 [GFS2] make gfs2_writepages() static
On Mon, Jan 29, 2007 at 08:45:28PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc6-mm2:
>...
>  git-gfs2-nmw.patch
>...
>  git trees
>...

This patch makes the needlessly global gfs2_writepages() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-07 10:48:48 -05:00
Steven Whitehouse
2d72e7101c [GFS2] Unlock page on prepare_write try lock failure
When the try lock of the glock failed in prepare_write we were
incorrectly exiting this function with the page still locked.
This was resulting in further I/O to this page hanging.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-07 10:25:59 -05:00
Wendy Cheng
549ae0ac3d [GFS2] nfsd readdirplus assertion failure
Glock assertion failure found in '07 NFS connectathon. One of the NFSDs
is doing a "readdirplus" procedure call. It passes the logic into
gfs2_readdir() where it obtains its directory inode glock. This is then
followed by filehandle construction that invokes lookup code. It hits
the assertion failure while trying to obtain the inode glock again
inside gfs2_drevalidate().

This patch bypasses the recursive glock call if caller already holds the
lock.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-06 11:36:01 -05:00
Randy Dunlap
9beeb9f3c5 [DLM/GFS2] indent help text
Indent help text as expected.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:20 -05:00
Russell Cattelan
ddee76089c [GFS2] Fix unlink deadlocks
Move the glock acquisition to outside of the transactions.

Lock odering must be preserved in order to prevent ABBA
deadlocks. The current gfs2_change_nlink code would tries
to grab the glock after having started a transaction and thus is holding
the log lock. This is inconsistent with other code paths in
gfs that grab the resource group glock prior to staring
a tranactions.

One problem with this fix is that the resource group
lock is always grabbed now even if the inode still has
ref count and can not be marked for unlink.

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:17 -05:00
Steven Whitehouse
61be084efc [GFS2] Put back semaphore to avoid umount problem
Dave Teigland fixed this bug a while back, but I managed to mistakenly
remove the semaphore during later development. It is required to avoid
the list of inodes changing during an invalidate_inodes call. I have
made it an rwsem since the read side will be taken frequently during
normal filesystem operation. The write site will only happen during
umount of the file system.

Also the bug only triggers when using the DLM lock manager and only then
under certain conditions as its timing related.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>
2007-02-05 13:38:14 -05:00
Eric Sandeen
bbb28ab759 [GFS2] more CURRENT_TIME_SEC
Whoops, quilt user error, missed this one in the previous patch.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:11 -05:00
Adrian Bunk
0011727785 [GFS2/DLM] fix GFS2 circular dependency
On Sun, Jan 28, 2007 at 11:08:18AM +0100, Jiri Slaby wrote:
> Andrew Morton napsal(a):
> >Temporarily at
> >
> >	http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
>
> Unable to select IPV6. Menuconfig doesn't offer it when INET is selected.
> When it's not it appears in the menu, but after state change it gets away.
> The same behaviour in xconfig, gconfig.
>
> $ mkdir ../a/tst
> $ make O=../a/tst menuconfig
>   HOSTCC  scripts/basic/fixdep
> [...]
>   HOSTLD  scripts/kconfig/mconf
> scripts/kconfig/mconf arch/i386/Kconfig
> Warning! Found recursive dependency: INET GFS2_FS_LOCKING_DLM SYSFS
> OCFS2_FS INET
>
> Maybe this is the problem?

Yes, patch below.

> regards,

cu
Adrian

<--  snip  -->

This patch fixes a circular dependency by letting GFS2_FS_LOCKING_DLM
and DLM depend on instead of select SYSFS.

Since SYSFS depends on EMBEDDED this change shouldn't cause any problems
for users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:08 -05:00
Randy Dunlap
67f55897ee [GFS2/DLM] use sysfs
With CONFIG_DLM=m, CONFIG_PROC_FS=n, and CONFIG_SYSFS=n, kernel build
fails with:

WARNING: "kernel_subsys" [fs/gfs2/locking/dlm/lock_dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/dlm/dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/configfs/configfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Since fs/dlm/lockspace.c and fs/gfs2/locking/dlm/sysfs.c use
kernel_subsys, they should either DEPEND on it or SELECT it.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:05 -05:00
David Teigland
ee32e4f3d3 [GFS2] make lock_dlm drop_count tunable in sysfs
We want to be able to change or disable the default drop_count (number at
which the dlm asks gfs to limit the the number of locks it's holding).
Add it to the collection of sysfs tunables for an fs.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:01 -05:00
David Teigland
2f708649ba [GFS2] increase default lock limit
Increase the number of locks at which point the dlm begins asking gfs to
reduce its lock usage.  The default value is largely arbitrary, but the
current value of 50,000 ends up limiting performance unnecessarily for too
many users.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:59 -05:00
Steven Whitehouse
8bd9572769 [GFS2] Fix list corruption in lops.c
The patch below appears to fix the list corruption that we are seeing on
occasion. Although the transaction structure is private to a single
thread, when the queued structures are dismantled during an in-core
commit, its possible for a different thread to be trying to add the same
structure to another, new, transaction at the same time.

To avoid this, this patch takes the log spinlock during this operation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:56 -05:00
Steven Whitehouse
d7c103d0bd [GFS2] Fix recursive locking attempt with NFS
In certain cases, its possible for NFS to call the lookup code while
holding the glock (when doing a readdirplus operation) so we need to
check for that and not try and lock the glock twice. This also fixes a
typo in a previous NFS related GFS2 patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:53 -05:00
Steven Whitehouse
d043e1900c [GFS2] Fix typo in glock.c
This is a one letter typo fix in glock.c, spotted by Rob Kenna.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:41 -05:00
Eric Sandeen
ddfe062783 [GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2
I was looking something else up and came across this...

I don't honestly have a good reason to change it other than to make it
like every other Linux filesystem in this regard.  ;-)  It doesn't
functionally change anything, but makes some lines shorter. :)

I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
not in timespec_t format, and only stores second resolutions?  Seems like
you're halfway to sub-second resolutions already.

I suppose if that gets implemented then all of the below should
instead be CURRENT_TIME not CURRENT_TIME_SEC.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:38 -05:00
Steven Whitehouse
90101c3186 [GFS2] Compile fix for glock.c
This one liner got missed from the previous patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:35 -05:00
Steven Whitehouse
12132933c4 [GFS2] Remove queue_empty() function
This function is not longer required since we do not do recursive
locking in the glock layer. As a result all its callers can be
replaceed with list_empty() calls.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:32 -05:00
Steven Whitehouse
b5d32bead1 [GFS2] Tidy up glops calls
This patch doesn't make any changes to the ordering of the various
operations related to glocking, but it does tidy up the calls to the
glops.c functions to make the structure more obvious.

The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
made static within glock.c since they are called by every set of glock
operations. The xmote_th and drop_th glock operations are then made
conditional upon those two routines existing and called from the
previously mentioned functions in glock.c respectively.

Also it can be seen that the go_sync operation isn't needed since it can
easily be replaced by calls to xmote_bh and drop_bh respectively. This
results in no longer (confusingly) calling back into routines in glock.c
from glops.c and also reducing the glock operations by one member.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:26 -05:00
Steven Whitehouse
1c0f4872dc [GFS2] Remove local exclusive glock mode
Here is a patch for GFS2 to remove the local exclusive flag. In
the places it was used, mutex's are always held earlier in the
call path, so it appears redundant in the LM_ST_SHARED case.

Also, the GFS2 holders were setting local exclusive in any case where
the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
code where the flag was tested have been replaced with tests for the
lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
as globally exclusive).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:20 -05:00
Steven Whitehouse
6bd9c8c2fb [GFS2] Remove unused go_callback operation
This is never used, so we might as well remove it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:17 -05:00
Steven Whitehouse
e5dab552c8 [GFS2] Remove the "greedy" function from glock.[ch]
The "greedy" code was an attempt to retain glocks for a minimum length
of time when they relate to mmap()ed files. The current implementation
of this feature is not, however, ideal in that it required allocating
memory in order to do this and its overly complicated.

It also misses the mark by ignoring the other I/O operations which are
just as likely to suffer from the same problem. So the plan is to remove
this now and then add the functionality back as part of the glock state
machine at a later date (and thus take into account all the possible
users of this feature)

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:14 -05:00
Steven Whitehouse
fee852e374 [GFS2] Shrink gfs2_inode memory by half
Here is something I spotted (while looking for something entirely
different) the other day.

Rather than using a completion in each and every struct gfs2_holder,
this removes it in favour of hashed wait queues, thus saving a
considerable amount of memory both on the stack (where a number of
gfs2_holder structures are allocated) and in particular in the
gfs2_inode which has 8 gfs2_holder structures embedded within it.

As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
In actual practice we get a much better result than that since
now that a gfs2_inode is under the 2048 byte barrier, we get two
per 4k slab page effectively halving the amount of memory required
to store gfs2_inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:11 -05:00
Steven Whitehouse
330005c2b2 [GFS2] Remove max_atomic_write tunable
This removes an unused sysfs tunable parameter.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:08 -05:00
Steven Whitehouse
3699e3a44b [GFS2] Clean up/speed up readdir
This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.

As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.

Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.

Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:04 -05:00
Steven Whitehouse
a8d638e30e [GFS2] Add writepages for "data=writeback" mounts
It occurred to me that although a gfs2 specific writepages for ordered
writes and journaled data would be tricky, by hooking writepages only
for "data=writeback" mounts we could take advantage of not needing
buffer heads (we don't use them on the read side, nor have we for some
time) and create much larger I/Os for the block layer.

Using blktrace both before and after, its possible to see that for large
I/Os, most of the requests generated through writepages are now 1024
sectors after this patch is applied as opposed to 8 sectors before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:01 -05:00
Adrian Bunk
03dc6a538e [GFS2] make gfs2_change_nlink_i() static
On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc3-mm1:
>...
>  git-gfs2-nmw.patch
>...
>  git trees
>...

This patch makes the needlessly globlal gfs2_change_nlink_i() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:49 -05:00
Robert Peterson
7083146564 [GFS2] gfs2 knows of directories which it chooses not to display
This is for Red Hat bugzilla bug bz #222302:

Moving a virtual IP from node to node between two NFS-over-GFS2
servers was causing one of the GFS2 servers to become confused and
reference a deleted inode.  The problem was due to vfs dentries that did
not reference the gfs2_dops and therefore didn't call the gfs2 revalidate
code to revalidate a dentry after a directory had been deleted & recreated.
This patch is a crosswrite from a RHEL4 bug found in GFS1 as
bz #190756 and it is against the latest -nmw git tree.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:46 -05:00
S. Wendy Cheng
87d21e07f3 [GFS2] Fix gfs2_rename deadlock
Second round of gfs2_rename lock re-ordering to allow Anaconda adding
root partition on top of gfs2. Previous to this patch the recursive
lock detector in glock.c can be triggered due to attempting to lock
the rgrp twice. This fixes it by checking to see whether the rgrp
is already locked.

This fixes Red Hat bugzilla #221237

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:31 -05:00
Russell Cattelan
6c93fd1e57 [GFS2] BZ 217008 fsfuzzer fix.
Update the quilt header comments to match the
code changes.

Change gfs2_lookup_simple to return an error in the case
of a NULL inode.
The callers of gfs2_lookup_simple do not check for NULL
in the no entry case and such would end up dereferencing a NULL ptr.

This fixes:
http://projects.info-pull.com/mokb/MOKB-15-11-2006.html

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:28 -05:00
Steven Whitehouse
49686f7106 [GFS2] Fix ordering of page disposal vs. glock_dq
In case of unlinked files with dirty pages GFS2 wasn't clearing
the pages in quite the right order. This patch clears the pages
earlier (before the qlock_dq) to avoid the situation that the
release of the glock results in attempting to write back data that
has already been deallocated.

This fixes Red Hat bugzilla: #220117

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:24 -05:00
S. Wendy Cheng
5509826f1e [GFS2] Fix change nlink deadlock
Bugzilla 215088

Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
partition. The gfs2_rename() apparently needs block allocation for the
new name (into the directory) where it requires rg locks. At the same
time, while updating the nlink count for the replaced file,
gfs2_change_nlink() tries to return the inode meta-data back to resource
group where it needs rg locks too. Our logic doesn't allow process to
acquire these locks recursively by the same process  (RHEL installer)
that results a BUG call. This only happens within rename code path and
only if the destination file exists before the rename operation.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:15 -05:00
Steven Whitehouse
e1d5b18ae9 [GFS2] Fail over to readpage for stuffed files
This is partially derrived from a patch written by Russell Cattelan.
It fixes a bug where there is a race between readpages and truncate
by ignoring readpages for stuffed files. This is ok because a stuffed
file will never be more than one block (minus sizeof(struct gfs2_dinode))
in size and block size is always less than page size, so we do not lose
anything efficiency-wise by not doing readahead for stuffed files. They
will have already been "read ahead" by the action of reading the inode
in, in the first place.

This is the remaining part of the fix for Red Hat bugzilla #218966
which had not yet made it upstream.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Russell Cattelan <cattelan@redhat.com>
2007-02-05 13:36:12 -05:00
Steven Whitehouse
c7b3383437 [GFS2] Fix DIO deadlock
This patch fixes Red Hat bugzilla #212627 in which a deadlock occurs
due to trying to take the i_mutex while holding a glock. The correct
locking order is defined as i_mutex -> glock in all cases.

I've left dealing with allocating writes. I know that we need to do
that, but for now this should do the trick. We don't need to take the
i_mutex on write, because the VFS has already taken it for us. On read
we don't need it since the glock is enough protection. The reason that
I've made some of the checks into a separate function is that we'll need
to do the checks again in the allocating write case eventually, so this
is partly in preparation for this. Likewise the return value test of !=
1 might look a bit odd and thats because we'll need a third return value
in case of requiring an allocation.

I've made the change to deferred mode on the glock to ensure flushing
read caches on other nodes. I notice that (using blktrace to look at
whats going on) we appear to do a better job of large I/Os than ext3
after this patch (in terms of not splitting up the I/Os).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Wendy Cheng <wcheng@redhat.com>
2007-02-05 13:36:09 -05:00
David Teigland
c378051177 [GFS2] don't try to lockfs after shutdown
If an fs has already been shut down, a lockfs callback should do nothing.
An fs that's been shut down can't acquire locks or do anything with
respect to the cluster.

Also, remove FIXME comment in withdraw function.  The missing bits of the
withdraw procedure are now all done by user space.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:35:44 -05:00
David Chinner
f73ca1b76c [PATCH] Revert bd_mount_mutex back to a semaphore
Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

(XFS unlocks the semaphore from a different task, by design.  The mutex
code warns about this)

Signed-off-by: Dave Chinner <dgc@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11 18:18:21 -08:00
Steven Whitehouse
1003f06953 [GFS2] Fix Kconfig
Here is a patch to fix up the Kconfig so that we don't land up with
problems when people disable the NET subsystem.  Thanks for all the hints and
suggestions that people have sent me regarding this.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Aleksandr Koltsoff <czr@iki.fi>
Cc: Toralf Förster <toralf.foerster@gmx.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Chris Zubrzycki <chris@middle--earth.org>
Cc: Patrick Caulfield <pcaulfie@redhat.com>
2006-12-15 12:51:51 -05:00
Josef Sipek
81454098f7 [PATCH] struct path: convert gfs2
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:45 -08:00
Linus Torvalds
1c1afa3c05 Merge master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits)
  [DLM] Clean up lowcomms
  [GFS2] Change gfs2_fsync() to use write_inode_now()
  [GFS2] Fix indent in recovery.c
  [GFS2] Don't flush everything on fdatasync
  [GFS2] Add a comment about reading the super block
  [GFS2] Mount problem with the GFS2 code
  [GFS2] Remove gfs2_check_acl()
  [DLM] fix format warnings in rcom.c and recoverd.c
  [GFS2] lock function parameter
  [DLM] don't accept replies to old recovery messages
  [DLM] fix size of STATUS_REPLY message
  [GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning
  [DLM] fix add_requestqueue checking nodes list
  [GFS2] Fix recursive locking in gfs2_getattr
  [GFS2] Fix recursive locking in gfs2_permission
  [GFS2] Reduce number of arguments to meta_io.c:getbuf()
  [GFS2] Move gfs2_meta_syncfs() into log.c
  [GFS2] Fix journal flush problem
  [GFS2] mark_inode_dirty after write to stuffed file
  [GFS2] Fix glock ordering on inode creation
  ...
2006-12-07 09:13:20 -08:00
Christoph Lameter
e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Steven Whitehouse
34126f9f41 [GFS2] Change gfs2_fsync() to use write_inode_now()
This is a bit better than the previous version of gfs2_fsync()
although it would be better still if we were able to call a
function which only wrote the inode & metadata. Its no big deal
though that this will potentially write the data as well since
the VFS has already done that before calling gfs2_fsync(). I've
also added a comment to explain whats going on here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
2006-12-07 09:13:14 -05:00
Steven Whitehouse
887bc5d00c [GFS2] Fix indent in recovery.c
As per comments from Andrew Morton and Jan Engelhardt, this fixes the
indent and removes the "static" from a variable declaration since its
not needed in this case (now allocated on the stack of the function
in question).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Andrew Morton <akpm@osdl.org>
2006-12-05 13:34:17 -05:00
David Howells
9db7372445 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/ata/libata-scsi.c
	include/linux/libata.h

Futher merge of Linus's head and compilation fixups.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 17:01:28 +00:00
Al Viro
bd01f843c3 [PATCH] severing skbuff.h -> poll.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04 02:00:31 -05:00
Steven Whitehouse
33c3de3287 [GFS2] Don't flush everything on fdatasync
The gfs2_fsync() function was doing a journal flush on each
and every call. While this is correct, its also a lot of
overhead. This patch means that on fdatasync flushes we
rely on the VFS to flush the data for us and we don't do
a journal flush unless we really need to.

We have to do a journal flush for stuffed files though because
they have the data and the inode metadata in the same block.
Journaled files also need a journal flush too of course.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:37:44 -05:00
Steven Whitehouse
aac1a3c77a [GFS2] Add a comment about reading the super block
The comment explains why we use the bio functions to read
the super block.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Srinivasa Ds <srinivasa@in.ibm.com>
2006-11-30 10:37:40 -05:00
Srinivasa Ds
0da3585e1e [GFS2] Mount problem with the GFS2 code
While mounting the gfs2 filesystem,our test team had a problem and we
got this error message.
=======================================================

GFS2: fsid=: Trying to join cluster "lock_nolock", "dasde1"
GFS2: fsid=dasde1.0: Joined cluster. Now mounting FS...
GFS2: not a GFS2 filesystem
GFS2: fsid=dasde1.0: can't read superblock: -22

==========================================================================
On debugging further we found that problem is while reading the super
block(gfs2_read_super) and comparing the magic number in it.
When I  replace the submit_bio() call(present in gfs2_read_super) with
the sb_getblk() and ll_rw_block(), mount operation succeded.
On further analysis we found that before calling submit_bio(),
bio->bi_sector was set to "sector" variable. This "sector" variable has
the same value of bh->b_blocknr(block number). Hence there is a need to
multiply this valuwith (blocksize >> 9)(9 because,sector size
2^9,samething happens in ll_rw_block also, before calling submit_bio()).
So I have developed the patch which solves this problem. Please let me
know your comments.
================================================================

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:37:36 -05:00
Steven Whitehouse
77386e1f66 [GFS2] Remove gfs2_check_acl()
As pointed out by Adrian Bunk, the gfs2_check_acl() function is no
longer used. This patch removes it and renamed gfs2_check_acl_locked()
to gfs2_check_acl() since we only need one variant of that function now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Adrian Bunk <bunk@stusta.de>
2006-11-30 10:37:32 -05:00
Randy Dunlap
0ac230699a [GFS2] lock function parameter
Fix function parameter typing:
fs/gfs2/glock.c💯 warning: function declaration isn't a prototype

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:37:18 -05:00
Ryusuke Konishi
aed3255f22 [GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning
Fix a printk format warning in fs/gfs2/log.c:
fs/gfs2/log.c:322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t'

Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:37:04 -05:00
Steven Whitehouse
dcf3dd852f [GFS2] Fix recursive locking in gfs2_getattr
The readdirplus NFS operation can result in gfs2_getattr being
called with the glock already held. In this case we do not want
to try and grab the lock again.

This fixes Red Hat bugzilla #215727

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:56 -05:00
Steven Whitehouse
300c7d75f3 [GFS2] Fix recursive locking in gfs2_permission
Since gfs2_permission may be called either from the VFS (in which case
we need to obtain a shared glock) or from GFS2 (in which case we already
have a glock) we need to test to see whether or not a lock is required.
The original test was buggy due to a potential race. This one should
be safe.

This fixes Red Hat bugzilla #217129

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:53 -05:00
Steven Whitehouse
cb4c031318 [GFS2] Reduce number of arguments to meta_io.c:getbuf()
Since the superblock and the address_space are determined by the
glock, we might as well just pass that as the argument since all
the callers already have that available.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:50 -05:00
Steven Whitehouse
a25311c8e0 [GFS2] Move gfs2_meta_syncfs() into log.c
By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start()
can be made static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:45 -05:00
Steven Whitehouse
b004157ab5 [GFS2] Fix journal flush problem
This fixes a bug which resulted in poor performance due to flushing
the journal too often. The code path in question was via the inode_go_sync()
function in glops.c. The solution is not to flush the journal immediately
when inodes are ejected from memory, but batch up the work for glockd to
deal with later on. This means that glocks may now live on beyond the end of
the lifetime of their inodes (but not very much longer in the normal case).

Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in
calculation of the number of free journal blocks.

The gfs2_logd process has been altered to be more responsive to the journal
filling up. We now wake it up when the number of uncommitted journal blocks
has reached the threshold level rather than trying to flush directly at the
end of each transaction. This again means doing fewer, but larger, log
flushes in general.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:42 -05:00
Steven Whitehouse
ae619320b2 [GFS2] mark_inode_dirty after write to stuffed file
Writes to stuffed files were not being marked dirty correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:36 -05:00
Steven Whitehouse
28626e2078 [GFS2] Fix glock ordering on inode creation
The lock order here should be parent -> child rather than
numeric order.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:33 -05:00
Steven Whitehouse
1a14d3a68f [GFS2] Simplify glops functions
The go_sync callback took two flags, but one of them was set on every
call, so this patch removes once of the flags and makes the previously
conditional operations (on this flag), unconditional.

The go_inval callback took three flags, each of which was set on every
call to it. This patch removes the flags and makes the operations
unconditional, which makes the logic rather more obvious.

Two now unused flags are also removed from incore.h.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:30 -05:00
Steven Whitehouse
fa2ecfc5e1 [GFS2] Fix Kconfig wrt CRC32
GFS2 requires the CRC32 library function. This was reported by
Toralf Förster.

Cc: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:24 -05:00
Steven Whitehouse
5e7d65cd9d [GFS2] Make sentinel dirents compatible with gfs1
When deleting directory entries, we set the inum.no_addr to zero
in a dirent when its the first dirent in a block and thus cannot
be merged into the previous dirent as is the usual case. In gfs1,
inum.no_formal_ino was used instead.

This patch changes gfs2 to set both inum.no_addr and inum.no_formal_ino
to zero. It also changes the test from just looking at inum.no_addr to
look at both inum.no_addr and inum.no_formal_ino and a sentinel is
now considered to be a dirent in which _either_ (or both) of them
is set to zero.

This resolves Red Hat bugzillas: #215809, #211465

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:20 -05:00
Steven Whitehouse
dcd2479959 [GFS2] Remove unused function from inode.c
The gfs2_glock_nq_m_atime function is unused in so far as its only
ever called with num_gh = 1, and this falls through to the
gfs2_glock_nq_atime function, so we might as well call that directly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:57 -05:00
Steven Whitehouse
175011cf6e [GFS2] Remove unused sysfs files
Four of the sysfs files are unused and can therefore be removed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:53 -05:00
Steven Whitehouse
4cf1ed8144 [GFS2] Tidy up bmap & fix boundary bug
This moves the locking for bmap into the bmap function itself
rather than using a wrapper function. It also fixes a bug where
the boundary flag was set on the wrong bh. Also the flags on
the mapped bh are reset earlier in the function to ensure that
they are 100% correct on the error path.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:49 -05:00
Steven Whitehouse
ab923031ce [GFS2] Fix memory allocation in glock.c
Change from GFP_KERNEL to GFP_NOFS as this was causing a
slow down when trying to push inodes from cache.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:46 -05:00
Russell Cattelan
61057c6bb3 [GFS2] Remove unused zero_readpage from stuffed_readpage
Stuffed files only consist of a maximum of
(gfs2 block size - sizeof(struct gfs2_dinode)) bytes. Since the
gfs2 block size is always less than page size, we will never see
a call to stuffed_readpage for anything other than the first page
in the file.

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:57 -05:00
Russell Cattelan
7020933156 [GFS2] Fix race in logging code
The log lock is dropped prior to io submittion, but
this exposes a hole in which the log data structures
may be going away due to a truncate.
Store the buffer head in a local pointer prior to
dropping the lock and relay on the buffer_head lock
for consitency on the buffer head.

Signed-Off-By: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:55 -05:00
Steven Whitehouse
9e2dbdac3d [GFS2] Remove gfs2_inode_attr_in
This function wasn't really doing the right thing. There was no need
to update the inode size at this point and the updating of the
i_blocks field has now been moved to the places where di_blocks is
updated. A result of this patch and some those preceeding it is that
unlocking a glock is now a much more efficient process, since there
is no longer any requirement to copy data from the gfs2 inode into
the vfs inode at this point.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:52 -05:00
Steven Whitehouse
e7c698d74f [GFS2] Inode number is constant
Since the inode number is constant, we don't need to keep updating
it everytime we refresh the other inode fields.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:48 -05:00
Steven Whitehouse
6b124d8dba [GFS2] Only set inode flags when required
We were setting the inode flags from GFS2's flags far too often, even when they
couldn't possibly have changed. This patch reduces the amount of flag
setting going on so that we do it only when the inode is read in or
when the flags have changed. The create case is covered by the "when
the inode is read in" case.

This also fixes a bug where we didn't set S_SYNC correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:45 -05:00
Steven Whitehouse
2ca99501fa [GFS2] Fix page lock/glock deadlock
This fixes a race between the glock and the page lock encountered
during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages
function doesn't need the same fix since it only uses a try lock anyway, so
it will fail back to gfs2_readpage in the case of a potential deadlock.

This bug was spotted by Russell Cattelan.

Cc: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:43 -05:00
Steven Whitehouse
c594d88664 [GFS2] Remove unused GL_DUMP flag
There is no way to set the GL_DUMP flag, and in any case the
same thing can be done with systemtap if required for debugging,
so this removes it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:40 -05:00