kernel-ark

Author	SHA1	Message	Date
Christoph Lameter	4983da07f1	[PATCH] page migration: fail if page is in a vma flagged VM_LOCKED page migration currently simply retries a couple of times if try_to_unmap() fails without inspecting the return code. However, SWAP_FAIL indicates that the page is in a vma that has the VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code and avoid retrying the migration. migrate_page_remove_references() now needs to return a reason why the failure occured. So switch migrate_page_remove_references to use -Exx style error messages. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-14 21:43:02 -08:00
Linus Torvalds	0ee10a4423	Merge git://oss.sgi.com:8090/oss/git/rc-fixes * git://oss.sgi.com:8090/oss/git/rc-fixes: Fix a direct I/O locking issue revealed by the new mutex code.	2006-03-14 20:50:45 -08:00
Nathan Scott	3fb962bde4	Fix a direct I/O locking issue revealed by the new mutex code. Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is not possible to get i_mutex locking correct when using DIO_OWN direct I/O locking in a filesystem due to indeterminism in the possible return code/lock/unlock combinations. This can cause a direct read to attempt a double i_mutex unlock inside XFS. We're now ensuring __blockdev_direct_IO always exits with the inode i_mutex (still) held for a direct reader. Tested with the three different locking modes (via direct block device access, ext3 and XFS) - both reading and writing; cannot find any regressions resulting from this change, and it clearly fixes the mutex_unlock warning originally reported here: http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2 Signed-off-by: Nathan Scott <nathans@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de>	2006-03-15 15:14:45 +11:00
Dave Kleikamp	c5111f504d	Merge with /home/shaggy/git/linus-clean/	2006-03-14 17:05:45 -06:00
Dave Kleikamp	a488edc914	[PATCH] JFS: Take logsync lock before testing mp->lsn This fixes a race where lsn could be cleared before taking the lock Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-14 14:00:48 -08:00
Trond Myklebust	30f4e20a0d	[PATCH] NLM: Ensure we do not Oops in the case of an unlock In theory, NLM specs assure us that the server will only reply LCK_GRANTED or LCK_DENIED_GRACE_PERIOD to our NLM_UNLOCK request. In practice, we should not assume this to be the case, and the code will currently Oops if we do. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-14 07:57:18 -08:00
Trond Myklebust	c12e87f465	[PATCH] NFSv4: fix mount segfault on errors returned that are < -1000 It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of mapping them to kernel errors. Problem spotted by Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-14 07:57:18 -08:00
Trond Myklebust	143f412eb4	[PATCH] NFS: Fix a potential panic in O_DIRECT Based on an original patch by Mike O'Connor and Greg Banks of SGI. Mike states: A normal user can panic an NFS client and cause a local DoS with 'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the user buffer starts with a valid mapped page and contains an unmapped page, will crash in this way. I haven't followed the code, but O_DIRECT reads with similar user buffers will probably also crash albeit in different ways. Details: when nfs_get_user_pages() calls get_user_pages(), it detects and correctly handles get_user_pages() returning an error, which happens if the first page covered by the user buffer's address range is unmapped. However, if the first page is mapped but some subsequent page isn't, get_user_pages() will return a positive number which is less than the number of pages requested (this behaviour is sort of analagous to a short write() call and appears to be intentional). nfs_get_user_pages() doesn't detect this and hands off the array of pages (whose last few elements are random rubbish from the newly allocated array memory) to it's caller, whence they go to nfs_direct_write_seg(), which then totally ignores the nr_pages it's given, and calculates its own idea of how many pages are in the array from the user buffer length. Needless to say, when it comes to transmit those uninitialised page* pointers, we see a crash in the network stack. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-14 07:57:17 -08:00
Nathan Scott	524fbf5dd1	[XFS] Revert kiocb and vattr stack changes, theory is the AIO rework will help here and vattr may be small enough. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25423a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 14:07:53 +11:00
Nathan Scott	f30a121111	[XFS] Dynamically allocate the xfs_dinode_core_t structure to reduce our stack footprint in xfs_ialloc_ag_alloc. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25420a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 14:07:36 +11:00
Mandy Kirkconnell	f020b67f3c	[XFS] Fix assert to check that in-core extents are inline only. SGI-PV: 950678 SGI-Modid: xfs-linux-melb:xfs-kern:207634a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 14:07:24 +11:00
Nathan Scott	a50cd26926	[XFS] Switch over from linvfs names for sb/quotactl operations for consistent naming. SGI-PV: 950556 SGI-Modid: xfs-linux-melb:xfs-kern:25382a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 14:06:18 +11:00
Nathan Scott	416c6d5bcf	[XFS] Switch over from linvfs names for inode operations for consistent naming. SGI-PV: 950556 SGI-Modid: xfs-linux-melb:xfs-kern:25381a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 14:00:51 +11:00
Nathan Scott	3562fd4565	[XFS] Switch over from linvfs names for file operations for consistent naming. SGI-PV: 950556 SGI-Modid: xfs-linux-melb:xfs-kern:25379a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 14:00:35 +11:00
Nathan Scott	e4c573bb6a	[XFS] Switch over from linvfs names for address space ops for consistent naming. SGI-PV: 950556 SGI-Modid: xfs-linux-melb:xfs-kern:25378a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:54:26 +11:00
Nathan Scott	b8b0f54656	[XFS] Remove a couple of no-longer-used macros/types from XFS. SGI-PV: 950556 SGI-Modid: xfs-linux-melb:xfs-kern:25377a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:47:32 +11:00
Nathan Scott	a365bdd5e8	[XFS] Reduce stack usage within xfs_bmapi by rearranging some code, splitting realtime/btree allocators apart. Based on Glens original patches. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25372a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:34:16 +11:00
Nathan Scott	39269e29d4	[XFS] Reduce xfs_bmapi stack use by removing some local state variables, and directly testing flags instead. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25370a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:33:50 +11:00
Nathan Scott	220b528413	[XFS] Dynamically allocate vattr in places it makes sense to do so, to reduce stack use. Also re-use vattr in some places so that multiple copies are not held on-stack. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25369a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:33:36 +11:00
Nathan Scott	9b94c2eddf	[XFS] Take a dentry structure off the stack into the data segment. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25361a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:32:54 +11:00
Nathan Scott	8f79405527	[XFS] Reduce complexity in xfs_trans_init by pushing complex macros out into functions and hence reduce the stack footprint there. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25360a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:32:41 +11:00
Nathan Scott	f6d75cbed9	[XFS] Dynamically allocate xfs_dir2_put_args_t structure to reduce stack pressure in xfs_dir2_leaf_getdents routine. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25359a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:32:24 +11:00
Nathan Scott	1f6553f9f9	[XFS] Dynamically allocate local kiocb structures in readv/writev routines to reduce stack footprint. SGI-PV: 947312 SGI-Modid: xfs-linux-melb:xfs-kern:25358a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:30:48 +11:00
Mandy Kirkconnell	0293ce3a9f	[XFS] 929045 567344 This mod introduces multi-level in-core file extent functionality, building upon the new layout introduced in mod xfs-linux:xfs-kern:207390a. The new multi-level extent allocations are only required for heavily fragmented files, so the old-style linear extent list is used on files until the extents reach a pre-determined size of 4k. 4k buffers are used because this is the system page size on Linux i386 and systems with larger page sizes don't seem to gain much, if anything, by using their native page size as the extent buffer size. Also, using 4k extent buffers everywhere provides a consistent interface for CXFS across different platforms. The 4k extent buffers are managed by an indirection array (xfs_ext_irec_t) which is basically just a pointer array with a bit of extra information to keep track of the number of extents in each buffer as well as the extent offset of each buffer. Major changes include: - Add multi-level in-core file extent functionality to the xfs_iext_ subroutines introduced in mod: xfs-linux:xfs-kern:207390a - Introduce 13 new subroutines which add functionality for multi-level in-core file extents: xfs_iext_add_indirect_multi() xfs_iext_remove_indirect() xfs_iext_realloc_indirect() xfs_iext_indirect_to_direct() xfs_iext_bno_to_irec() xfs_iext_idx_to_irec() xfs_iext_irec_init() xfs_iext_irec_new() xfs_iext_irec_remove() xfs_iext_irec_compact() xfs_iext_irec_compact_pages() xfs_iext_irec_compact_full() xfs_iext_irec_update_extoffs() SGI-PV: 928864 SGI-Modid: xfs-linux-melb:xfs-kern:207393a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:30:23 +11:00
Mandy Kirkconnell	4eea22f01b	[XFS] 929045 567344 This mod re-organizes some of the in-core file extent code to prepare for an upcoming mod which will introduce multi-level in-core extent allocations. Although the in-core extent management is using a new code path in this mod, the functionality remains the same. Major changes include: - Introduce 10 new subroutines which re-orgainze the existing code but do NOT change functionality: xfs_iext_get_ext() xfs_iext_insert() xfs_iext_add() xfs_iext_remove() xfs_iext_remove_inline() xfs_iext_remove_direct() xfs_iext_realloc_direct() xfs_iext_direct_to_inline() xfs_iext_inline_to_direct() xfs_iext_destroy() - Remove 2 subroutines (functionality moved to new subroutines above): xfs_iext_realloc() -replaced by xfs_iext_add() and xfs_iext_remove() xfs_bmap_insert_exlist() - replaced by xfs_iext_insert() xfs_bmap_delete_exlist() - replaced by xfs_iext_remove() - Replace all hard-coded (indexed) extent assignments with a call to xfs_iext_get_ext() - Replace all extent record pointer arithmetic (ep++, ep--, base + lastx,..) with calls to xfs_iext_get_ext() - Update comments to remove the idea of a single "extent list" and introduce "extent record" terminology instead SGI-PV: 928864 SGI-Modid: xfs-linux-melb:xfs-kern:207390a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:29:52 +11:00
Nathan Scott	9f989c9455	[XFS] Additional mount time superblock validation checks. SGI-PV: 950491 SGI-Modid: xfs-linux-melb:xfs-kern:25354a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:29:32 +11:00
David Chinner	01e1b69cfc	[XFS] using a spinlock per cpu for superblock counter exclusion results in a preēmpt counter overflow at 256p and above. Change the exclusion mechanism to use atomic bit operations and busy wait loops to emulate the spin lock exclusion mechanism but without the preempt count issues. SGI-PV: 950027 SGI-Modid: xfs-linux-melb:xfs-kern:25338a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:29:16 +11:00
Nathan Scott	87cbc49cd4	[XFS] Add xfs_map_buffer helper, use it in a couple of places. SGI-PV: 950211 SGI-Modid: xfs-linux-melb:xfs-kern:25312a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:26:43 +11:00
Nathan Scott	f51623b21f	[XFS] Move some code around to avoid prototypes and prep for future writepages code. SGI-PV: 950211 SGI-Modid: xfs-linux-melb:xfs-kern:25311a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:26:27 +11:00
Nathan Scott	02d7c92334	[XFS] Use XFS_VFSTOM in more places instead of open coding it. SGI-PV: 947206 SGI-Modid: xfs-linux-melb:xfs-kern:25310a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:26:09 +11:00
Tim Shimmin	fcce0f1f9a	[XFS] forgot a couple of calls to XLOG_VEC_SET_TYPE when porting from irix to linux. SGI-PV: 931456 SGI-Modid: xfs-linux-melb:xfs-kern:25238a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:25:02 +11:00
Nathan Scott	a780143ea5	[XFS] UUID endianess fix. uu_timelow is a 32bit field and needs to be swapped with be32_to_cpu. SGI-PV: 943272 SGI-Modid: xfs-linux-melb:xfs-kern:25232a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:24:46 +11:00
David Chinner	e8234a6871	[XFS] Add support for hotplug CPUs to the per-CPU superblock counters by registering a notifier callback that listens to CPU up/down events to modify the counters appropriately. SGI-PV: 949726 SGI-Modid: xfs-linux-melb:xfs-kern:25214a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:23:52 +11:00
Nathan Scott	2d0f864be3	[XFS] Make headers compile for more compiler variants; minor cleanup. SGI-PV: 949432 SGI-Modid: xfs-linux-melb:xfs-kern:25184a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:20:33 +11:00
Nathan Scott	d2c32edf64	[XFS] When compiling with gcc 4.0 and CONFIG_SMP unset, there are many warnings along the lines: xfs_linux.h:103:5: warning: "CONFIG_SMP" is not defined. SGI-PV: 946630 SGI-Modid: xfs-linux-melb:xfs-kern:25171a Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:20:13 +11:00
Nathan Scott	e0cc2325d1	[XFS] Flag the XFS inode cache as in need of spreading also. SGI-PV: 949073 SGI-Modid: xfs-linux-melb:xfs-kern:25170a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:19:55 +11:00
Nathan Scott	20722a9192	[XFS] Fix a mutex_destroy diagnostic about a locked-mutex-on-destroy from quota code. SGI-PV: 949149 SGI-Modid: xfs-linux-melb:xfs-kern:25123a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:19:08 +11:00
Nathan Scott	8758280fcc	[XFS] Cleanup the use of zones/slabs, more consistent and allows flags to be passed. SGI-PV: 949073 SGI-Modid: xfs-linux-melb:xfs-kern:25122a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:18:19 +11:00
David Chinner	8d280b98cf	[XFS] On machines with more than 8 cpus, when running parallel I/O threads, the incore superblock lock becomes the limiting factor for buffered write throughput. Make the contended fields in the incore superblock use per-cpu counters so that there is no global lock to limit scalability. SGI-PV: 946630 SGI-Modid: xfs-linux-melb:xfs-kern:25106a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:13:09 +11:00
Nathan Scott	9f4cbecd7e	[XFS] XFS propagates MS_NOATIME through two levels internally but doesn't actually use it. Kill this dead code. Signed-off-by: Christoph Hellwig <hch@lst.de> SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:25086a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:05:30 +11:00
David Chinner	0c9512d746	[XFS] find_exported_dentry(). XFS does not need to use this symbol as it is provided by a vector through the superblock export operations when the filesystem is exported by NFS. The fix is to call that vector instead of using the exported symbol directly. SGI-PV: 948858 SGI-Modid: xfs-linux-melb:xfs-kern:25062a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:02:13 +11:00
Badari Pulavarty	cd6ef84e6a	[PATCH] ext3: fix nobh mode for chattr +j inodes One can do "chattr +j" on a file to change its journalling mode. Fix writeback mode with "nobh" handling for it. Even though, we mount ext3 filesystem in writeback mode with "nobh" option, some one can do "chattr +j" on a single file to force it to do journalled mode. In order to do journaling, ext3_block_truncate_page() need to fallback to default case of creating buffers and adding them to transaction etc. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-11 09:19:34 -08:00
Kirill Korotaev	0adb25d2e7	[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside This patch fixes illegal __GFP_FS allocation inside ext3 transaction in ext3_symlink(). Such allocation may re-enter ext3 code from try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal handle in task_struct and, hence, is not reentrable. This bug led to "Assertion failure in journal_dirty_metadata()" messages. http://bugzilla.openvz.org/show_bug.cgi?id=115 Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-11 09:19:34 -08:00
Atsushi Nemoto	0ef675d491	[PATCH] mtd: 64 bit fixes Fix some bugs in mtd/jffs2 on 64bit platform. The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h. And some variables in jffs2 are declared as uint32_t but used to hold size_t values. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-09 19:47:37 -08:00
Dave Kleikamp	69eb66d7da	JFS: add uid, gid, and umask mount options OS/2 doesn't initialize the uid, gid, or unix-style permission bits. The uid, gid, & umask mount options perform pretty much like those for the fat file system, overriding what is stored on disk. This is useful for users sharing the file system with OS/2. I implemented a little feature so that if you mask the execute bit, it will be re-enabled on directories when the appropriate read bit is unmasked. I didn't want to implement an fmask & dmask option. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2006-03-09 13:59:30 -06:00
Randy Dunlap	1efa3c05f8	[NET] compat ifconf: fix limits A recent change to compat. dev_ifconf() in fs/compat_ioctl.c causes ifconf data to be truncated 1 entry too early when copying it to userspace. The correct amount of data (length) is returned, but the final entry is empty (zero, not filled in). The for-loop 'i' check should use <= to allow the final struct ifreq32 to be copied. I also used the ifconf-corruption program in kernel bugzilla #4746 to make sure that this change does not re-introduce the corruption. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-08 16:46:08 -08:00
Latchesar Ionkov	731805b494	[PATCH] v9fs: fix for access to unitialized variables or freed memory Miscellaneous fixes related to accessing uninitialized variables or memory that was already freed. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-08 14:14:02 -08:00
Horst Hummel	90f0094dc6	[PATCH] s390: dasd partition detection DASD allows to open a device as soon as gendisk is registered, which means the device is a fake device (capacity=0) and we do know nothing about blocksize and partitions at that point of time. In case the device is opened by someone, the bdev and inode creation is done with the fake device info and the following partition detection code is just using the wrong data. To avoid this modify the DASD state machine to make sure that the open is rejected until the device analysis is either finished or an unformatted device was detected. Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-08 14:14:01 -08:00
David Woodhouse	e96fb230cc	[PATCH] jffs2: avoid divide-by-zero Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-08 14:14:01 -08:00
Dipankar Sarma	529bf6be5c	[PATCH] fix file counting I have benchmarked this on an x86_64 NUMA system and see no significant performance difference on kernbench. Tested on both x86_64 and powerpc. The way we do file struct accounting is not very suitable for batched freeing. For scalability reasons, file accounting was constructor/destructor based. This meant that nr_files was decremented only when the object was removed from the slab cache. This is susceptible to slab fragmentation. With RCU based file structure, consequent batched freeing and a test program like Serge's, we just speed this up and end up with a very fragmented slab - llm22:~ # cat /proc/sys/fs/file-nr 587730 0 758844 At the same time, I see only a 2000+ objects in filp cache. The following patch I fixes this problem. This patch changes the file counting by removing the filp_count_lock. Instead we use a separate percpu counter, nr_files, for now and all accesses to it are through get_nr_files() api. In the sysctl handler for nr_files, we populate files_stat.nr_files before returning to user. Counting files as an when they are created and destroyed (as opposed to inside slab) allows us to correctly count open files with RCU. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-08 14:14:01 -08:00

... 5 6 7 8 9 ...

2251 Commits