kernel-ark

Author	SHA1	Message	Date
Miklos Szeredi	68b47139ea	[PATCH] namespace.c: fix bind mount from foreign namespace I'm resending this patch, because I still believe it's the correct fix. Tested before/after applying the patch with a test application available from: http://www.inf.bme.hu/~mszeredi/nstest.c Bind mount from a foreign namespace results in an un-removable mount. The reason is that mnt->mnt_namespace is copied from the old mount in clone_mnt(). Because of this check_mnt() in sys_umount() will fail. The solution is to set mnt->mnt_namespace to current->namespace in clone_mnt(). clone_mnt() is either called from do_loopback() or copy_tree(). copy_tree() is called from do_loopback() or copy_namespace(). When called (directly or indirectly) from do_loopback(), always current->namspace is being modified: check_mnt(nd->mnt). So setting mnt->mnt_namespace to current->namspace is the right thing to do. When called from copy_namespace(), the setting of mnt_namespace is irrelevant, since mnt_namespace is reset later in that function for all copied mounts. Jamie said: This patch is correct. The old code was buggy for more fundamental and serious reason: it broke the invariant that a tree of vfsmnts all have the same value of mnt_namespace (and the same for the mnt_list list). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Jamie Lokier <jamie@shareable.org> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-07 10:00:38 -07:00
Andrew Morton	e525e153c7	[PATCH] __bio_clone() dead comment Remove a very wrong comment. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-07 10:00:38 -07:00
Linus Torvalds	fab5a60a29	Check input buffer size in zisofs This uses the new deflateBound() thing to sanity-check the input to the zlib decompressor before we even bother to start reading in the blocks. Problem noted by Tim Yamin <plasmaroo@gentoo.org>	2005-08-06 09:42:06 -07:00
John McCutchan	0c3dba1534	[PATCH] Clean up inotify delete race fix This avoids the whole #ifdef mess by just getting a copy of dentry->d_inode before d_delete is called - that makes the codepaths the same for the INOTIFY/DNOTIFY cases as for the regular no-notify case. I've been running this under a Gnome session for the last 10 minutes. Inotify is being used extensively. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-04 21:37:39 -07:00
John McCutchan	e234f35c54	[PATCH] inotify delete race fix The included patch fixes a problem where a inotify client would receive a delete event before the file was actually deleted. The bug affects both dnotify & inotify. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-04 13:11:15 -07:00
Robert Love	3de11748c1	[PATCH] inotify: update help text The inotify help text still refers to the character device. Update it. Fixes kernel bug #4993. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-04 13:11:15 -07:00
Roman Zippel	74f9c9c258	[PATCH] hfs: don't reference missing page If there was a read error, the bnode might miss some pages, so skip them. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-01 21:38:00 -07:00
Roman Zippel	f76d28d235	[PATCH] hfs: don't dirty unchanged inode If inode size hasn't changed, don't do anything further in truncate, which also prevents a dirty inode, what might upset some readonly devices quite badly. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-01 21:38:00 -07:00
John McCutchan	b9c55d29e9	[PATCH] inotify: fix race between the kernel and user space When you rm a watch, an IN_IGNORED event is sent down the event queue with the watch descriptor that you just rm'd. If you then add a watch you could get the ignored watch's wd and if you haven't read the entire event queue, user space will think that it's newly created watch was just ignored. To avoid this problem we just use idr_get_new_above instead of idr_get_new. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-01 09:16:53 -07:00
John McCutchan	7544953685	[PATCH] inotify: fix file deletion by rename detection When a file is moved over an existing file that you are watching, inotify won't send you a DELETE_SELF event and it won't unref the inode until the inotify instance is closed by the application. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-01 09:16:53 -07:00
Maneesh Soni	9ca1eb3282	[PATCH] sysfs: fix sysfs_setattr o sysfs_dirent's s_mode field should also be updated in sysfs_setattr(), else there could be inconsistency in the two fields. s_mode is used while ->readdir so as not to bring in the inode to cache. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-29 13:12:49 -07:00
Maneesh Soni	bc062b1b5c	[PATCH] sysfs: fix sysfs_chmod_file o sysfs_chmod_file() must update the new iattr field in sysfs_dirent else the mode change will not be persistent in case of inode evacuation from cache. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-29 13:12:49 -07:00
Paolo 'Blaisorblade' Giarrusso	a2d76bd8fa	[PATCH] uml: implement hostfs syncing Actually implement the hostfs "sync" method. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-28 21:46:05 -07:00
Andrew Morton	a5453be48e	[PATCH] bio_clone fix Fix bug introduced in 2.6.11-rc2: when we clone a BIO we need to copy over the current index into it as well. It corrupts data with some MD setups. See http://bugzilla.kernel.org/show_bug.cgi?id=4946 Huuuuuuuuge thanks to Matthew Stapleton <matthew4196@gmail.com> for doggedly chasing this one down. Acked-by: Jens Axboe <axboe@suse.de> Cc: <linux-raid@vger.kernel.org> Cc: <dm-devel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-28 08:38:59 -07:00
Linus Torvalds	49302d0c42	Merge head 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/shaggy/jfs-2.6	2005-07-27 16:42:22 -07:00
Jesper Juhl	77933d7276	[PATCH] clean up inline static vs static inline `gcc -W' likes to complain if the static keyword is not at the beginning of the declaration. This patch fixes all remaining occurrences of "inline static" up with "static inline" in the entire kernel tree (140 occurrences in 47 files). While making this change I came across a few lines with trailing whitespace that I also fixed up, I have also added or removed a blank line or two here and there, but there are no functional changes in the patch. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:26:20 -07:00
Olaf Hering	44456d37b5	[PATCH] turn many #if $undefined_string into #ifdef $undefined_string turn many #if $undefined_string into #ifdef $undefined_string to fix some warnings after -Wno-def was added to global CFLAGS Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:26:08 -07:00
Andreas Gruenbacher	02b775696f	[PATCH] reiserfs doesn't use mbcache reiserfs doesn't use the mbcache, so this can go. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:26:07 -07:00
Andreas Gruenbacher	8c52ab42c1	[PATCH] mbcache: Remove unused mb_cache_shrink parameter The cache parameter to mb_cache_shrink isn't used. We may as well remove it. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:26:07 -07:00
Peter Staubach	c293621bbf	[PATCH] stale POSIX lock handling I believe that there is a problem with the handling of POSIX locks, which the attached patch should address. The problem appears to be a race between fcntl(2) and close(2). A multithreaded application could close a file descriptor at the same time as it is trying to acquire a lock using the same file descriptor. I would suggest that that multithreaded application is not providing the proper synchronization for itself, but the OS should still behave correctly. SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when a file descriptor is closed, that all POSIX locks on the file, owned by the process which closed the file descriptor, should be released. The trick here is when those locks are released. The current code releases all locks which exist when close is processing, but any locks in progress are handled when the last reference to the open file is released. There are three cases to consider. One is the simple case, a multithreaded (mt) process has a file open and races to close it and acquire a lock on it. In this case, the close will release one reference to the open file and when the fcntl is done, it will release the other reference. For this situation, no locks should exist on the file when both the close and fcntl operations are done. The current system will handle this case because the last reference to the open file is being released. The second case is when the mt process has dup(2)'d the file descriptor. The close will release one reference to the file and the fcntl, when done, will release another, but there will still be at least one more reference to the open file. One could argue that the existence of a lock on the file after the close has completed is okay, because it was acquired after the close operation and there is still a way for the application to release the lock on the file, using an existing file descriptor. The third case is when the mt process has forked, after opening the file and either before or after becoming an mt process. In this case, each process would hold a reference to the open file. For each process, this degenerates to first case above. However, the lock continues to exist until both processes have released their references to the open file. This lock could block other lock requests. The changes to release the lock when the last reference to the open file aren't quite right because they would allow the lock to exist as long as there was a reference to the open file. This is too long. The new proposed solution is to add support in the fcntl code path to detect a race with close and then to release the lock which was just acquired when such as race is detected. This causes locks to be released in a timely fashion and for the system to conform to the POSIX semantic specification. This was tested by instrumenting a kernel to detect the handling locks and then running a program which generates case #3 above. A dangling lock could be reliably generated. When the changes to detect the close/fcntl race were added, a dangling lock could no longer be generated. Cc: Matthew Wilcox <willy@debian.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:26:06 -07:00
Carsten Otte	0cfc11ed45	[PATCH] fix xip sparse file handling in ext2 Oliver Paukstadt from our test department is testing the xip patches in Linus' git-tree. He found a problem that shows when reading a file that contains sparse blocks (holes) on a -o xip mounted ext2 filesystem: the BUG_ON() in fs/ext2/xip.c:40 triggers where it should not. The problem was introduced by a cleanup in my previous patch, this patch fixes it. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:25:53 -07:00
Ian Kent	104e49fc1e	[PATCH] autofs4: fix infamous "Busy inodes after umount ..." message If the automount daemon receives a signal which causes it to sumarily terminate the autofs4 module leaks dentries. The same problem exists with detached mount requests without the warning. This patch cleans these dentries at umount. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:25:51 -07:00
Jan Kara	ab6862e6da	[PATCH] ext3: drop quota references before releasing inode We must drop references to quota structures before releasing the inode. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:25:50 -07:00
Jan Kara	c7e9a52ef0	[PATCH] ext2: drop quota reference before releasing inode We must drop references to quota structures before releasing the inode. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:25:50 -07:00
Jeff Mahoney	b3bb8afd96	[PATCH] reiserfs: fix deadlock in inode creation failure path w/ default ACL reiserfs_new_inode() can call iput() with the xattr lock held. This will cause a deadlock to occur when reiserfs_delete_xattrs() is called to clean up. The following patch releases the lock and reacquires it after the iput. This is safe because interaction with xattrs is complete, and the relock is just to balance out the release in the caller. The locking needs some reworking to be more sane, but that's more intrusive and I was just looking to fix this bug. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:25:50 -07:00
Nigel Cunningham	ef2a701d44	[PATCH] Fix missing refrigerator invocation in jffs2 Here's a patch to fix a missing refrigerator call in jffs2. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:25:49 -07:00
Andrew Morton	89373de7dd	[PATCH] inotify: fix oops fix Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-26 14:34:18 -07:00
Robert Love	e5ca844a9d	[PATCH] inotify: check retval in init Check for (unlikely) errors in the filesystem initialization stuff in our module_init() function. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-26 13:37:22 -07:00
Robert Love	1b2ccf0cc1	[PATCH] inotify: change default limits Change default inotify limits: Maximum instances per user to 128 and maximum events per queue to 16k. The max instances used to be 128; the change to 8 was a mistake. Memory consumption is fine. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-26 13:37:22 -07:00
Robert Love	5eb22cbcdb	[PATCH] inotify: exit path cleanups Handle error out paths better. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-26 13:37:22 -07:00
Robert Love	783bc29bbc	[PATCH] inotify: oops fix Bug fix: Ensure that the fd passed to inotify_add_watch() and inotify_rm_watch() belongs to inotify. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-26 13:37:21 -07:00
Robert Love	33ea2f52b8	[PATCH] inotify: use fget_light As an optimization, use fget_light() and fput_light() where possible. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-26 13:31:57 -07:00
Robert Love	b680716ed2	[PATCH] inotify: misc. cleanup Miscellaneous invariant clean up, comment fixes, and so on. Trivial stuff. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-26 13:31:57 -07:00
Dave Kleikamp	18190cc08d	JFS: Fix i_blocks accounting when allocation fails A failure in dbAlloc caused a directory's i_blocks to be incorrectly incremented, causing jfs_fsck to find the inode to be corrupt. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2005-07-26 09:29:13 -05:00
Dave Kleikamp	c2783f3a62	JFS: Don't set log_SYNCBARRIER when log->active == 0 If a metadata page is kept active, it is possible that the sync barrier logic continues to trigger, even if all active transactions have been phyically written to the journal. This can cause a hang, since the completion of the journal I/O is what unsets the sync barrier flag to allow new transactions to be created. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2005-07-25 08:58:54 -05:00
Dave Kleikamp	c40c202493	JFS: Fix typo in last patch Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2005-07-22 11:08:44 -05:00
Dave Kleikamp	21d1ee8b37	Merge with /home/shaggy/git/linus-clean/ Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2005-07-19 13:46:53 -05:00
Linus Torvalds	af6ea9ca23	Merge master.kernel.org:/pub/scm/linux/kernel/git/aia21/ntfs-2.6	2005-07-16 11:47:51 -07:00
Thomas Gleixner	2c4eec9802	Merge with rsync://fileserver/linux	2005-07-16 09:20:01 +02:00
Carsten Otte	afa597ba20	[PATCH] execute-in-place fixes This patch includes feedback from Andrew and Christoph. Thanks for taking time to review. Use of empty_zero_page was eliminated to fix compilation for architectures that don't have it. This patch removes setting pages up-to-date in ext2_get_xip_page and all bug checks to verify that the page is indeed up to date. Setting the page state on mapping to userland is bogus. None of the code patchs involved with these pages in mm cares about the page state. still on my ToDo list: identify a place outside second extended where __inode_direct_access should reside Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-15 09:54:50 -07:00
Qu Fuping	3d9b1cdd24	JFS: fsync wrong behavior when I/O failure occurs This is half of a patch that Qu Fuping submitted in April. The first part was applied to fs/mpage.c in 2.6.12-rc4. jfs_fsync should return error, but it doesn't wait for the metadata page to be uptodate, e.g.: jfs_fsync->jfs_commit_inode->txCommit->diWrite->read_metapage-> __get_metapage->read_cache_page reads a page from disk. Because read is async, when read_cache_page: err = filler(data, page), filler will not return error, it just submits I/O request and returns. So, page is not uptodate. Checking only if(IS_ERROR(mp->page)) is not enough, we should add "\|\| !PageUptodate(mp->page)" Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2005-07-15 10:36:08 -05:00
Dave Kleikamp	56d1254917	JFS: Remove assert statement in dbJoin & return -EIO instead Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2005-07-15 09:43:36 -05:00
Thomas Gleixner	5d157885f3	[JFFS2] Fix node allocation leak In the rare case of failing to write the cleanmarker the allocated node was not freed. Pointed out by Forrest Zhao Initial cleanup by Joern Engel Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2005-07-15 08:14:44 +02:00
Dave Kleikamp	00be3e7e5c	JFS: Remove bogus WARN_ON statement and some dead code Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2005-07-14 15:15:39 -05:00
Paolo 'Blaisorblade' Giarrusso	a0d43df931	[PATCH] uml: hostfs: unuse ROOT_DEV Minimal patch removing uses of ROOT_DEV; next patch unexports it. I've opposed this, but I've planned to reintroduce the functionality without using ROOT_DEV. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-14 09:00:25 -07:00
Paolo 'Blaisorblade' Giarrusso	8e0a218124	[PATCH] uml: fix hppfs error path Fix the error message to refer to the error code, i.e. err, not count, plus add some cosmetical fixes. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-14 09:00:25 -07:00
Anton Altaparmakov	c514720716	Automatic merge with /usr/src/ntfs-2.6.git.	2005-07-13 23:09:23 +01:00
Linus Torvalds	3720bd8b1e	Merge master.kernel.org:/pub/scm/linux/kernel/git/tglx/mtd-2.6	2005-07-13 12:19:30 -07:00
Steve Dickson	7ee91ec14b	[PATCH] NFS: procfs/sysctl interfaces for lockd do not work on x86_64 Allow the setting of NLM timeouts and grace periods through the proc and sysclt interfaces on x86_64 architectures Signed-off-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-13 11:25:24 -07:00
Anton Altaparmakov	88bd5121d6	[PATCH] Fix soft lockup due to NTFS: VFS part and explanation Something has changed in the core kernel such that we now get concurrent inode write outs, one e.g via pdflush and one via sys_sync or whatever. This causes a nasty deadlock in ntfs. The only clean solution unfortunately requires a minor vfs api extension. First the deadlock analysis: Prerequisive knowledge: NTFS has a file $MFT (inode 0) loaded at mount time. The NTFS driver uses the page cache for storing the file contents as usual. More interestingly this file contains the table of on-disk inodes as a sequence of MFT_RECORDs. Thus NTFS driver accesses the on-disk inodes by accessing the MFT_RECORDs in the page cache pages of the loaded inode $MFT. The situation: VFS inode X on a mounted ntfs volume is dirty. For same inode X, the ntfs_inode is dirty and thus corresponding on-disk inode, which is as explained above in a dirty PAGE_CACHE_PAGE belonging to the table of inodes ($MFT, inode 0). What happens: Process 1: sys_sync()/umount()/whatever... calls __sync_single_inode() for $MFT -> do_writepages() -> write_page for the dirty page containing the on-disk inode X, the page is now locked -> ntfs_write_mst_block() which clears PageUptodate() on the page to prevent anyone else getting hold of it whilst it does the write out (this is necessary as the on-disk inode needs "fixups" applied before the write to disk which are removed again after the write and PageUptodate is then set again). It then analyses the page looking for dirty on-disk inodes and when it finds one it calls ntfs_may_write_mft_record() to see if it is safe to write this on-disk inode. This then calls ilookup5() to check if the corresponding VFS inode is in icache(). This in turn calls ifind() which waits on the inode lock via wait_on_inode whilst holding the global inode_lock. Process 2: pdflush results in a call to __sync_single_inode for the same VFS inode X on the ntfs volume. This locks the inode (I_LOCK) then calls write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of the page (in page cache of table of inodes $MFT, inode 0) containing the on-disk inode. This page has PageUptodate() clear because of Process 1 (see above) so read_cache_page() blocks when tries to take the page lock for the page so it can call ntfs_read_page(). Thus Process 1 is holding the page lock on the page containing the on-disk inode X and it is waiting on the inode X to be unlocked in ifind() so it can write the page out and then unlock the page. And Process 2 is holding the inode lock on inode X and is waiting for the page to be unlocked so it can call ntfs_readpage() or discover that Process 1 set PageUptodate() again and use the page. Thus we have a deadlock due to ifind() waiting on the inode lock. The only sensible solution: NTFS does not care whether the VFS inode is locked or not when it calls ilookup5() (it doesn't use the VFS inode at all, it just uses it to find the corresponding ntfs_inode which is of course attached to the VFS inode (both are one single struct); and it uses the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant) hence we want a modified ilookup5_nowait() which is the same as ilookup5() but it does not wait on the inode lock. Without such functionality I would have to keep my own ntfs_inode cache in the NTFS driver just so I can find ntfs_inodes independent of their VFS inodes which would be slow, memory and cpu cycle wasting, and incredibly stupid given the icache already exists in the VFS. Below is a patch that does the ilookup5_nowait() implementation in fs/inode.c and exports it. ilookup5_nowait.diff: Introduce ilookup5_nowait() which is basically the same as ilookup5() but it does not wait on the inode's lock (i.e. it omits the wait_on_inode() done in ifind()). This is needed to avoid a nasty deadlock in NTFS. Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-13 11:25:24 -07:00

1 2 3 4 5 ...

582 Commits