kernel-ark

Author	SHA1	Message	Date
Konstantin Khorenko	3aa6e0aa8a	NFSD: memory corruption due to writing beyond the stat array If nfsd fails to find an exported via NFS file in the readahead cache, it should increment corresponding nfsdstats counter (ra_depth[10]), but due to a bug it may instead write to ra_depth[11], corrupting the following field. In a kernel with NFSDv4 compiled in the corruption takes the form of an increment of a counter of the number of NFSv4 operation 0's received; since there is no operation 0, this is harmless. In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the memory beyond nfsdstats. Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
Benny Halevy	0af3f814cc	NFSD: use nfserr for status after decode_cb_op_status Bugs introduced in `85a5648019` "NFSD: Update XDR decoders in NFSv4 callback client" Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
J. Bruce Fields	541ce98c10	nfsd: don't leak dentry count on mnt_want_write failure The exit cleanup isn't quite right here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:31:08 -05:00
Linus Torvalds	c8e0b00ed1	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: call __jbd2_log_start_commit with j_state_lock write locked ext4: serialize unaligned asynchronous DIO ext4: make grpinfo slab cache names static ext4: Fix data corruption with multi-block writepages support ext4: fix up ext4 error handling ext4: unregister features interface on module unload ext4: fix panic on module unload when stopping lazyinit thread	2011-02-12 09:10:24 -08:00
Theodore Ts'o	e447183180	jbd2: call __jbd2_log_start_commit with j_state_lock write locked On an SMP ARM system running ext4, I've received a report that the first J_ASSERT in jbd2_journal_commit_transaction has been triggering: J_ASSERT(journal->j_running_transaction != NULL); While investigating possible causes for this problem, I noticed that __jbd2_log_start_commit() is getting called with j_state_lock only read-locked, in spite of the fact that it's possible for it might j_commit_request. Fix this by grabbing the necessary information so we can test to see if we need to start a new transaction before dropping the read lock, and then calling jbd2_log_start_commit() which will grab the write lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-12 08:18:24 -05:00
Eric Sandeen	e9e3bcecf4	ext4: serialize unaligned asynchronous DIO ext4 has a data corruption case when doing non-block-aligned asynchronous direct IO into a sparse file, as demonstrated by xfstest 240. The root cause is that while ext4 preallocates space in the hole, mappings of that space still look "new" and dio_zero_block() will zero out the unwritten portions. When more than one AIO thread is going, they both find this "new" block and race to zero out their portion; this is uncoordinated and causes data corruption. Dave Chinner fixed this for xfs by simply serializing all unaligned asynchronous direct IO. I've done the same here. The difference is that we only wait on conversions, not all IO. This is a very big hammer, and I'm not very pleased with stuffing this into ext4_file_write(). But since ext4 is DIO_LOCKING, we need to serialize it at this high level. I tried to move this into ext4_ext_direct_IO, but by then we have the i_mutex already, and we will wait on the work queue to do conversions - which must also take the i_mutex. So that won't work. This was originally exposed by qemu-kvm installing to a raw disk image with a normal sector-63 alignment. I've tested a backport of this patch with qemu, and it does avoid the corruption. It is also quite a lot slower (14 min for package installs, vs. 8 min for well-aligned) but I'll take slow correctness over fast corruption any day. Mingming suggested that we can track outstanding conversions, and wait on those so that non-sparse files won't be affected, and I've implemented that here; unaligned AIO to nonsparse files won't take a perf hit. [tytso@mit.edu: Keep the mutex as a hashed array instead of bloating the ext4 inode] [tytso@mit.edu: Fix up namespace issues so that global variables are protected with an "ext4_" prefix.] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-12 08:17:34 -05:00
Eric Sandeen	2892c15ddd	ext4: make grpinfo slab cache names static In 2.6.37 I was running into oopses with repeated module loads & unloads. I tracked this down to: `fb1813f4` ext4: use dedicated slab caches for group_info structures (this was in addition to the features advert unload problem) The kstrdup & subsequent kfree of the cache name was causing a double free. In slub, at least, if I read it right it allocates & frees the name itself, slab seems to do something different... so in slub I think we were leaking -our- cachep->name, and double freeing the one allocated by slub. After getting lost in slab/slub/slob a bit, I just looked at other sized-caches that get allocated. jbd2, biovec, sgpool all do it more or less the way jbd2 does. Below patch follows the jbd2 method of dynamically allocating a cache at mount time from a list of static names. (This might also possibly fix a race creating the caches with parallel mounts running). [Folded in a fix from Dan Carpenter which fixed an off-by-one error in the original patch] Cc: stable@kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-12 08:12:18 -05:00
Linus Torvalds	d40b0c3482	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: use single thread workqueues	2011-02-11 16:29:57 -08:00
Linus Torvalds	3aec46c1e0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: don't always drop malformed replies on the floor (try #3) cifs: clean up checks in cifs_echo_request [CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate	2011-02-11 16:29:50 -08:00
Boaz Harrosh	d863b50ab0	vfs: call rcu_barrier after ->kill_sb() In commit `fa0d7e3de6` ("fs: icache RCU free inodes"), we use rcu free inode instead of freeing the inode directly. It causes a crash when we rmmod immediately after we umount the volume[1]. So we need to call rcu_barrier after we kill_sb so that the inode is freed before we do rmmod. The idea is inspired by Aneesh Kumar. rcu_barrier will wait for all callbacks to end before preceding. The original patch was done by Tao Ma, but synchronize_rcu() is not enough here. 1. http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2 Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-11 16:12:19 -08:00
Linus Torvalds	2dab597441	Fix possible filp_cachep memory corruption In commit `31e6b01f41` ("fs: rcu-walk for path lookup") we started doing path lookup using RCU, which then falls back to a careful non-RCU lookup in case of problems (LOOKUP_REVAL). So do_filp_open() has this "re-do the lookup carefully" looping case. However, that means that we must not release the open-intent file data if we are going to loop around and use it once more! Fix this by moving the release of the open-intent data to the function that allocates it (do_filp_open() itself) rather than the helper functions that can get called multiple times (finish_open() and do_last()). This makes the logic for the lifetime of that field much more obvious, and avoids the possible double free. Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-11 15:53:38 -08:00
David Teigland	6b155c8fd4	dlm: use single thread workqueues The recent commit to use cmwq for send and recv threads `dcce240ead` introduced problems, apparently due to multiple workqueue threads. Single threads make the problems go away, so return to that until we fully understand the concurrency issues with multiple threads. Signed-off-by: David Teigland <teigland@redhat.com>	2011-02-11 16:50:47 -06:00
Jeff Layton	71823baff1	cifs: don't always drop malformed replies on the floor (try #3 ) Slight revision to this patch...use min_t() instead of conditional assignment. Also, remove the FIXME comment and replace it with the explanation that Steve gave earlier. After receiving a packet, we currently check the header. If it's no good, then we toss it out and continue the loop, leaving the caller waiting on that response. In cases where the packet has length inconsistencies, but the MID is valid, this leads to unneeded delays. That's especially problematic now that the client waits indefinitely for responses. Instead, don't immediately discard the packet if checkSMB fails. Try to find a matching mid_q_entry, mark it as having a malformed response and issue the callback. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-11 03:59:12 +00:00
Jeff Layton	195291e68c	cifs: clean up checks in cifs_echo_request Follow-on patch to `7e90d705` which is already in Steve's tree... The check for tcpStatus == CifsGood is not meaningful since it doesn't indicate whether the NEGOTIATE request has been done. Also, clarify why we're checking for maxBuf == 0. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-10 03:44:20 +00:00
Steve French	7e90d705fc	[CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate In order to determine whether an SMBEcho request can be sent we need to know that the socket is established (server tcpStatus == CifsGood) AND that an SMB NegotiateProtocol has been sent (server maxBuf != 0). Without the second check we can send an Echo request during reconnection before the server can accept it. CC: JG <jg@cms.ac> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-08 23:52:32 +00:00
Linus Torvalds	cb5520f02c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (33 commits) Btrfs: Fix page count calculation btrfs: Drop __exit attribute on btrfs_exit_compress btrfs: cleanup error handling in btrfs_unlink_inode() Btrfs: exclude super blocks when we read in block groups Btrfs: make sure search_bitmap finds something in remove_from_bitmap btrfs: fix return value check of btrfs_start_transaction() btrfs: checking NULL or not in some functions Btrfs: avoid uninit variable warnings in ordered-data.c Btrfs: catch errors from btrfs_sync_log Btrfs: make shrink_delalloc a little friendlier Btrfs: handle no memory properly in prepare_pages Btrfs: do error checking in btrfs_del_csums Btrfs: use the global block reserve if we cannot reserve space Btrfs: do not release more reserved bytes to the global_block_rsv than we need Btrfs: fix check_path_shared so it returns the right value btrfs: check return value of btrfs_start_ioctl_transaction() properly btrfs: fix return value check of btrfs_join_transaction() fs/btrfs/inode.c: Add missing IS_ERR test btrfs: fix missing break in switch phrase btrfs: fix several uncheck memory allocations ...	2011-02-07 14:06:18 -08:00
Linus Torvalds	257a65d795	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: remove checks for ses->status == CifsExiting cifs: add check for kmalloc in parse_dacl cifs: don't send an echo request unless NegProt has been done cifs: enable signing flag in SMB header when server has it on cifs: Possible slab memory corruption while updating extended stats (repost) CIFS: Fix variable types in cifs_iovec_read/write (try #2) cifs: fix length vs. total_read confusion in cifs_demultiplex_thread	2011-02-07 14:02:06 -08:00
Yan, Zheng	3a90983dbd	Btrfs: Fix page count calculation take offset of start position into account when calculating page count. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-07 14:13:51 -05:00
Curt Wohlgemuth	d50bdd5aa5	ext4: Fix data corruption with multi-block writepages support This fixes a corruption problem with the multi-block writepages submittal change for ext4, from commit `bd2d0210cf` ("ext4: use bio layer instead of buffer layer in mpage_da_submit_io"). (Note that this corruption is not present in 2.6.37 on ext4, because the corruption was detected after the feature was merged in 2.6.37-rc1, and so it was turned off by adding a non-default mount option, mblk_io_submit. With this commit, which hopefully fixes the last of the bugs with this feature, we'll be able to turn on this performance feature by default in 2.6.38, and remove the mblk_io_submit option.) The ext4 code path to bundle multiple pages for writeback in ext4_bio_write_page() had a bug: we should be clearing buffer head dirty flags before we submit the bio, not in the completion routine. The patch below was tested on 2.6.37 under KVM with the postgresql script which was submitted by Jon Nelson as documented in commit `1449032be1`. Without the patch, I'd hit the corruption problem about 50-70% of the time. With the patch, I executed the script > 100 times with no corruption seen. I also fixed a bug to make sure ext4_end_bio() doesn't dereference the bio after the bio_put() call. Reported-by: Jon Nelson <jnelson@jamponi.net> Reported-by: Matthias Bayer <jackdachef@gmail.com> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-02-07 12:46:14 -05:00
Jeff Layton	d402539b8f	cifs: remove checks for ses->status == CifsExiting ses->status is never set to CifsExiting, so these checks are always false. Tested-by: JG <jg@cms.ac> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-07 17:25:55 +00:00
Alexey Charkov	8e4eef7a60	btrfs: Drop __exit attribute on btrfs_exit_compress As this function is called in some error paths while not removing the module, the __exit attribute prevents the kernel image from linking when btrfs is compiled in statically. Signed-off-by: Alexey Charkov <alchark@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:19:19 -05:00
Tsutomu Itoh	554233a6e0	btrfs: cleanup error handling in btrfs_unlink_inode() When btrfs_alloc_path() fails, btrfs_free_path() need not be called. Therefore, it changes the branch ahead. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:17:45 -05:00
Josef Bacik	3c14874acc	Btrfs: exclude super blocks when we read in block groups This has been resulting in a BUT_ON(ret) after btrfs_reserve_extent in btrfs_cow_file_range. The reason is we don't actually calculate the bytes_super for a block group until we go to cache it, which means that the space_info can hand out reservations for space that it doesn't actually have, and we can run out of data space. This is also a problem if you are using space caching since we don't ever calculate bytes_super for the block groups. So instead everytime we read a block group call exclude_super_stripes, which calculates the bytes_super for the block group so it can be left out of the space_info. Then whenever caching completes we just call free_excluded_extents so that the super excluded extents are freed up. Also if we are unmounting and we hit any block groups that haven't been cached we still need to call free_excluded_extents to make sure things are cleaned up properly. Thanks, Reported-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:17:44 -05:00
Josef Bacik	13dbc08987	Btrfs: make sure search_bitmap finds something in remove_from_bitmap When we're cleaning up the tree log we need to be able to remove free space from the block group. The problem is if that free space spans bitmaps we would not find the space since we're looking for too many bytes. So make sure the amount of bytes we search for is limited to either the number of bytes we want, or the number of bytes left in the bitmap. This was tested by a user who was hitting the BUG() after search_bitmap. With this patch he can now mount his fs. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:13:12 -05:00
Stanislav Fomichev	8132b65bc6	cifs: add check for kmalloc in parse_dacl Exit from parse_dacl if no memory returned from the call to kmalloc. Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-06 00:36:23 +00:00
Sage Weil	e8e1ba96b2	ceph: queue cap_snaps once per realm We were forming a dirty list, and then queueing cap_snaps for each realm _and_ its children, regardless of whether the children were already in the dirty list. This meant we did it twice for some realms. Which in turn meant we corrupted mdsc->snap_flush_list when the cap_snap was re-added to the list it was already on, and could trigger an infinite loop. We were also using recursion to do reach all the children, a no-no when stack is limited. Instead, (re)queue any children on the dirty list, avoiding processing anything twice and avoiding any recursion. Signed-off-by: Sage Weil <sage@newdream.net>	2011-02-04 20:45:58 -08:00
Jeff Layton	247ec9b418	cifs: don't send an echo request unless NegProt has been done When the socket to the server is disconnected, the client more or less immediately calls cifs_reconnect to reconnect the socket. The NegProt and SessSetup however are not done until an actual call needs to be made. With the addition of the SMB echo code, it's possible that the server will initiate a disconnect on an idle socket. The client will then reconnect the socket but no NegotiateProtocol request is done. The SMBEcho workqueue job will then eventually pop, and an SMBEcho will be sent on the socket. The server will then reject it since no NegProt was done. The ideal fix would be to either have the socket not be reconnected until we plan to use it, or to immediately do a NegProt when the reconnect occurs. The code is not structured for this however. For now we must just settle for not sending any echoes until the NegProt is done. Reported-by: JG <jg@cms.ac> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-05 03:02:14 +00:00
Jeff Layton	e3f0dadb2b	cifs: enable signing flag in SMB header when server has it on cifs_sign_smb only generates a signature if the correct Flags2 bit is set. Make sure that it gets set correctly if we're sending an async call. This patch fixes: https://bugzilla.kernel.org/show_bug.cgi?id=28142 Reported-and-Tested-by: JG <jg@cms.ac> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-04 20:19:57 +00:00
Shirish Pargaonkar	64474bdd07	cifs: Possible slab memory corruption while updating extended stats (repost) Updating extended statistics here can cause slab memory corruption if a callback function frees slab memory (mid_entry). Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-04 20:18:06 +00:00
Tetsuo Handa	78d2978874	CRED: Fix kernel panic upon security_file_alloc() failure. In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL when security_file_alloc() returned an error. As a result, kernel will panic() due to put_cred(NULL) call within RCU callback. Fix this bug by assigning f->f_cred before calling security_file_alloc(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-04 10:40:29 -08:00
Pavel Shilovsky	76429c148b	CIFS: Fix variable types in cifs_iovec_read/write (try #2 ) Variable 'i' should be unsigned long as it's used in circle with num_pages, and bytes_read/total_written should be ssize_t according to return value. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-04 04:41:06 +00:00
Linus Torvalds	89840966c5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: fix up a comparism in hfsplus_file_extend hfsplus: fix two memory leaks in wrapper.c hfsplus: do not leak buffer on error hfsplus: fix failed mount handling	2011-02-03 16:31:43 -08:00
Christoph Hellwig	1065348d47	hfsplus: fix up a comparism in hfsplus_file_extend Revert an incorrect hunk from commit `b2837fcf49`, "hfsplus: %L-to-%ll, macro correction, and remove unneeded braces" revert a pointless change of comparism operation argument order, which turned out to not even be equivalent. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:34:18 -07:00
Chuck Ebbert	a1dbcef017	hfsplus: fix two memory leaks in wrapper.c Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:34:11 -07:00
Chuck Ebbert	14dd01f883	hfsplus: do not leak buffer on error Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:34:05 -07:00
Christoph Hellwig	c5b8d0bce0	hfsplus: fix failed mount handling Currently the error handling in hfsplus_fill_super is a mess, and can lead to accessing fields in the superblock that haven't been even set up yet. Fix this by making sure we do not set up sb->s_root until we have the mount fully set up, and before that do proper step by step unwinding instead of using hfsplus_put_super as a big hammer. Reported-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:33:51 -07:00
Theodore Ts'o	dd68314ccf	ext4: fix up ext4 error handling Make sure we the correct cleanup happens if we die while trying to load the ext4 file system. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-03 14:33:49 -05:00
Lukas Czerner	8f021222c1	ext4: unregister features interface on module unload Ext4 features interface was not properly unregistered which led to problems while unloading/reloading ext4 module. This commit fixes that by adding proper kobject unregistration code into ext4_exit_fs() as well as fail-path of ext4_init_fs() Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-02-03 14:33:33 -05:00
Eric Sandeen	8f1f745331	ext4: fix panic on module unload when stopping lazyinit thread https://bugzilla.kernel.org/show_bug.cgi?id=27652 If the lazyinit thread is running, the teardown function ext4_destroy_lazyinit_thread() has problems: ext4_clear_request_list(); while (ext4_li_info->li_task) { wake_up(&ext4_li_info->li_wait_daemon); wait_event(ext4_li_info->li_wait_task, ext4_li_info->li_task == NULL); } Clearing the request list will cause the thread to exit and free ext4_li_info, so then we're waiting on something which is getting freed. Fix this up by making the thread respond to kthread_stop, and exit, without the need to wait for that exit in some other homegrown way. Cc: stable@kernel.org Reported-and-Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-03 14:33:15 -05:00
Boaz Harrosh	0b0abeaf3d	Revert "exofs: Set i_mapping->backing_dev_info anyway" This reverts commit `115e19c535`. Apparently setting inode->bdi to one's own sb->s_bdi stops VFS from sending read-aheads. This problem was bisected to this commit. A revert fixes it. I'll investigate farther why is this happening for the next Kernel, but for now a revert. I'm sending to stable@kernel.org as well, since it exists also in 2.6.37. 2.6.36 is good and does not have this patch. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 17:53:27 -08:00
Josef Bacik	d54cdc8ca7	fs: make block fiemap mapping length at least blocksize long Some filesystems don't deal well with being asked to map less than blocksize blocks (GFS2 for example). Since we are always mapping at least blocksize sections anyway, just make sure len is at least as big as a blocksize so we don't trip up any filesystems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 16:03:20 -08:00
Namhyung Kim	3cd90ea42f	vfs: sparse: add __FMODE_EXEC FMODE_EXEC is a constant type of fmode_t but was used with normal integer constants. This results in following warnings from sparse. Fix it using new macro __FMODE_EXEC. fs/exec.c:116:58: warning: restricted fmode_t degrades to integer fs/exec.c:689:58: warning: restricted fmode_t degrades to integer fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 16:03:19 -08:00
Eric Dumazet	0781b909b5	epoll: epoll_wait() should not use timespec_add_ns() commit `95aac7b1cd` ("epoll: make epoll_wait() use the hrtimer range feature") added a performance regression because it uses timespec_add_ns() with potential very large 'ns' values. [akpm@linux-foundation.org: s/epoll_set_mstimeout/ep_set_mstimeout/, per Davide] Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Shawn Bohrer <shawn.bohrer@gmail.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> [2.6.37.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 16:03:18 -08:00
Jeff Layton	9587fcff42	cifs: fix length vs. total_read confusion in cifs_demultiplex_thread length at this point is the length returned by the last kernel_recvmsg call. total_read is the length of all of the data read so far. length is more or less meaningless at this point, so use total_read for everything. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-02 00:17:04 +00:00
Linus Torvalds	405b864d3f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix length checks in checkSMB [CIFS] Update cifs minor version cifs: No need to check crypto blockcipher allocation cifs: clean up some compiler warnings cifs: make CIFS depend on CRYPTO_MD4 cifs: force a reconnect if there are too many MIDs in flight cifs: don't pop a printk when sending on a socket is interrupted cifs: simplify SMB header check routine cifs: send an NT_CANCEL request when a process is signalled cifs: handle cancelled requests better cifs: fix two compiler warning about uninitialized vars	2011-02-02 10:22:40 +11:00
Tsutomu Itoh	98d5dc13e7	btrfs: fix return value check of btrfs_start_transaction() The error check of btrfs_start_transaction() is added, and the mistake of the error check on several places is corrected. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-01 07:17:27 -05:00
Tsutomu Itoh	5df6708348	btrfs: checking NULL or not in some functions Because NULL is returned when the memory allocation fails, it is checked whether it is NULL. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-01 07:16:37 -05:00
Chris Mason	c87fb6fdca	Btrfs: avoid uninit variable warnings in ordered-data.c This one isn't really an uninit variable, but for pretty obscure reasons. Let's make it clearly correct. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-31 20:33:37 -05:00
Linus Torvalds	0fd08c5545	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: NFSv4 readdir loses entries NFS: Micro-optimize nfs4_decode_dirent() NFS: Fix an NFS client lockdep issue NFS construct consistent co_ownerid for v4.1 NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount NFS improve pnfs_put_deviceid_cache debug print NFS fix cb_sequence error processing NFS do not find client in NFSv4 pg_authenticate NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" NFS: Prevent memory allocation failure in nfsacl_encode() NFS: nfsacl_{encode,decode} should return signed integer NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" NFS: Fix "kernel BUG at fs/aio.c:554!" NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). NFS: fix handling of malloc failure during nfs_flush_multi()	2011-02-01 09:41:02 +10:00
Jeff Layton	6284644e8d	cifs: fix length checks in checkSMB The cERROR message in checkSMB when the calculated length doesn't match the RFC1001 length is incorrect in many cases. It always says that the RFC1001 length is bigger than the SMB, even when it's actually the reverse. Fix the error message to say the reverse of what it does now when the SMB length goes beyond the end of the received data. Also, clarify the error message when the RFC length is too big. Finally, clarify the comments to show that the 512 byte limit on extra data at the end of the packet is arbitrary. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 22:35:37 +00:00
Linus Torvalds	fb9f1f17e9	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: xfs_bmap_add_extent_delay_real should init br_startblock xfs: fix dquot shaker deadlock xfs: handle CIl transaction commit failures correctly xfs: limit extsize to size of AGs and/or MAXEXTLEN xfs: prevent extsize alignment from exceeding maximum extent size xfs: limit extent length for allocation to AG size xfs: speculative delayed allocation uses rounddown_power_of_2 badly xfs: fix efi item leak on forced shutdown xfs: fix log ticket leak on forced shutdown.	2011-02-01 08:15:40 +10:00
Steve French	cab6958da0	[CIFS] Update cifs minor version Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 21:56:35 +00:00
Chris Mason	b31eabd86e	Btrfs: catch errors from btrfs_sync_log btrfs_sync_log returns -EAGAIN when we need full transaction commits instead of small log commits, but sometimes we were dropping the return value. In practice, we check for this a few different ways, but this is still a bug that can leave off full log commits when we really need them. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-31 16:48:24 -05:00
Josef Bacik	b1953bcec9	Btrfs: make shrink_delalloc a little friendlier Xfstests 224 will just sit there and spin for ever until eventually we give up flushing delalloc and exit. On my box this took several hours. I could not interrupt this process either, even though we use INTERRUPTIBLE. So do 2 things 1) Keep us from looping over and over again without reclaiming anything 2) If we get interrupted exit the loop I tested this and the test now exits in a reasonable amount of time, and can be interrupted with ctrl+c. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-31 16:27:28 -05:00
Shirish Pargaonkar	7a8587e7c8	cifs: No need to check crypto blockcipher allocation Missed one change as per earlier suggestion. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 17:29:18 +00:00
Jeff Layton	31c2659d78	cifs: clean up some compiler warnings New compiler warnings that I noticed when building a patchset based on recent Fedora kernel: fs/cifs/cifssmb.c: In function 'CIFSSMBSetFileSize': fs/cifs/cifssmb.c:4813:8: warning: variable 'data_offset' set but not used [-Wunused-but-set-variable] fs/cifs/file.c: In function 'cifs_open': fs/cifs/file.c:349:24: warning: variable 'pCifsInode' set but not used [-Wunused-but-set-variable] fs/cifs/file.c: In function 'cifs_partialpagewrite': fs/cifs/file.c:1149:23: warning: variable 'cifs_sb' set but not used [-Wunused-but-set-variable] fs/cifs/file.c: In function 'cifs_iovec_write': fs/cifs/file.c:1740:9: warning: passing argument 6 of 'CIFSSMBWrite2' from incompatible pointer type [enabled by default] fs/cifs/cifsproto.h:337:12: note: expected 'unsigned int ' but argument is of type 'size_t ' fs/cifs/readdir.c: In function 'cifs_readdir': fs/cifs/readdir.c:767:23: warning: variable 'cifs_sb' set but not used [-Wunused-but-set-variable] fs/cifs/cifs_dfs_ref.c: In function 'cifs_dfs_d_automount': fs/cifs/cifs_dfs_ref.c:342:2: warning: 'rc' may be used uninitialized in this function [-Wuninitialized] fs/cifs/cifs_dfs_ref.c:278:6: note: 'rc' was declared here Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 15:39:10 +00:00
Jeff Layton	f855f6cbeb	cifs: make CIFS depend on CRYPTO_MD4 Recently CIFS was changed to use the kernel crypto API for MD4 hashes, but the Kconfig dependencies were not changed to reflect this. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 15:26:07 +00:00
Jeff Layton	92a4e0f016	cifs: force a reconnect if there are too many MIDs in flight Currently, we allow the pending_mid_q to grow without bound with SIGKILL'ed processes. This could eventually be a DoS'able problem. An unprivileged user could a process that does a long-running call and then SIGKILL it. If he can also intercept the NT_CANCEL calls or the replies from the server, then the pending_mid_q could grow very large, possibly even to 2^16 entries which might leave GetNextMid in an infinite loop. Fix this by imposing a hard limit of 32k calls per server. If we cross that limit, set the tcpStatus to CifsNeedReconnect to force cifsd to eventually reconnect the socket and clean out the pending_mid_q. While we're at it, clean up the function a bit and eliminate an unnecessary NULL pointer check. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:38:15 +00:00
Jeff Layton	d804d41d16	cifs: don't pop a printk when sending on a socket is interrupted If we kill the process while it's sending on a socket then the kernel_sendmsg will return -EINTR. This is normal. No need to spam the ring buffer with this info. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:32:21 +00:00
Jeff Layton	68abaffa6b	cifs: simplify SMB header check routine ...just cleanup. There should be no behavior change. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:30:37 +00:00
Jeff Layton	2db7c58155	cifs: send an NT_CANCEL request when a process is signalled Use the new send_nt_cancel function to send an NT_CANCEL when the process is delivered a fatal signal. This is a "best effort" enterprise however, so don't bother to check the return code. There's nothing we can reasonably do if it fails anyway. Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:24:38 +00:00
Jeff Layton	1be912dde7	cifs: handle cancelled requests better Currently, when a request is cancelled via signal, we delete the mid immediately. If the request was already transmitted however, the client is still likely to receive a response. When it does, it won't recognize it however and will pop a printk. It's also a little dangerous to just delete the mid entry like this. We may end up reusing that mid. If we do then we could potentially get the response from the first request confused with the later one. Prevent the reuse of mids by marking them as cancelled and keeping them on the pending_mid_q list. If the reply comes in, we'll delete it from the list then. If it never comes, then we'll delete it at reconnect or when cifsd comes down. Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:23:31 +00:00
Steve French	58b8a5b45a	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2011-01-31 04:17:03 +00:00
Jeff Layton	ffeb414a59	cifs: fix two compiler warning about uninitialized vars fs/cifs/link.c: In function ‘symlink_hash’: fs/cifs/link.c:58:3: warning: ‘rc’ may be used uninitialized in this function [-Wuninitialized] fs/cifs/smbencrypt.c: In function ‘mdfour’: fs/cifs/smbencrypt.c:61:3: warning: ‘rc’ may be used uninitialized in this function [-Wuninitialized] Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 03:15:57 +00:00
Anton Altaparmakov	af5eb745ef	NTFS: Fix invalid pointer dereference in ntfs_mft_record_alloc(). In ntfs_mft_record_alloc() when mapping the new extent mft record with map_extent_mft_record() we overwrite @m with the return value and on error, we then try to use the old @m but that is no longer there as @m now contains an error code instead so we crash when dereferencing the error code as if it were a pointer. The simple fix is to use a temporary variable to store the return value thus preserving the original @m for later use. This is a backport from the commercial Tuxera-NTFS driver and is well tested... Thanks go to Julia Lawall for pointing this out (whilst I had fixed it in the commercial driver I had failed to fix it in the Linux kernel). Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-31 12:58:11 +10:00
Linus Torvalds	9fbf0c08d4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: More crypto cleanup (try #2) CIFS: Add strictcache mount option CIFS: Implement cifs_strict_writev (try #4) [CIFS] Replace cifs md5 hashing functions with kernel crypto APIs	2011-01-31 12:56:27 +10:00
Josef Bacik	7adf5dfbb3	Btrfs: handle no memory properly in prepare_pages Instead of doing a BUG_ON(1) in prepare_pages if grab_cache_page() fails, just loop through the pages we've already grabbed and unlock and release them, then return -ENOMEM like we should. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:42:34 -05:00
Josef Bacik	ad0397a7a9	Btrfs: do error checking in btrfs_del_csums Got a report of a box panicing because we got a NULL eb in read_extent_buffer. His fs was borked and btrfs_search_path returned EIO, but we don't check for errors so the box paniced. Yes I know this will just make something higher up the stack panic, but that's a problem for future Josef. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:42:34 -05:00
Josef Bacik	68a82277b8	Btrfs: use the global block reserve if we cannot reserve space We call use_block_rsv right before we make an allocation in order to make sure we have enough space. Now normally people have called btrfs_start_transaction() with the appropriate amount of space that we need, so we just use some of that pre-reserved space and move along happily. The problem is where people use btrfs_join_transaction(), which doesn't actually reserve any space. So we try and reserve space here, but we cannot flush delalloc, so this forces us to return -ENOSPC when in reality we have plenty of space. The most common symptom is seeing a bunch of "couldn't dirty inode" messages in syslog. With xfstests 224 we end up falling back to start_transaction and then doing all the flush delalloc stuff which causes to hang for a very long time. So instead steal from the global reserve, which is what this is meant for anyway. With this patch and the other 2 I have sent xfstests 224 now passes successfully. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Josef Bacik	e9e22899de	Btrfs: do not release more reserved bytes to the global_block_rsv than we need When we do btrfs_block_rsv_release, if global_block_rsv is not full we will release all the extra bytes to global_block_rsv, even if it's only a little short of the amount of space that we need to reserve. This causes us to starve ourselves of reservable space during the transaction which will force us to shrink delalloc bytes and commit the transaction more often than we should. So instead just add the amount of bytes we need to add to the global reserve so reserved == size, and then add the rest back into the space_info for general use. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Josef Bacik	dedefd7215	Btrfs: fix check_path_shared so it returns the right value When running xfstests 224 I kept getting ENOSPC when trying to remove the files, and this is because we were returning ret from check_path_shared while it was uninitalized, which isn't right. Fix this to return 0 properly, and now xfstests 224 doesn't freak out when it tries to clean itself up. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Tsutomu Itoh	abd30bb0af	btrfs: check return value of btrfs_start_ioctl_transaction() properly btrfs_start_ioctl_transaction() returns ERR_PTR(), not NULL. So, it is necessary to use IS_ERR() to check the return value. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Tsutomu Itoh	3612b49598	btrfs: fix return value check of btrfs_join_transaction() The error check of btrfs_join_transaction()/btrfs_join_transaction_nolock() is added, and the mistake of the error check in several places is corrected. For more stable Btrfs, I think that we should reduce BUG_ON(). But, I think that long time is necessary for this. So, I propose this patch as a short-term solution. With this patch: - To more stable Btrfs, the part that should be corrected is clarified. - The panic isn't done by the NULL pointer reference etc. (even if BUG_ON() is increased temporarily) - The error code is returned in the place where the error can be easily returned. As a long-term plan: - BUG_ON() is reduced by using the forced-readonly framework, etc. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Julia Lawall	34d19bada0	fs/btrfs/inode.c: Add missing IS_ERR test After the conditional that precedes the following code, inode may be an ERR_PTR value. This can eg result from a memory allocation failure via the call to btrfs_iget, and thus does not imply that root is different than sub_root. Thus, an IS_ERR check is added to ensure that there is no dereference of inode in this case. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier f; @@ f(...) { ... return ERR_PTR(...); } @@ identifier r.f, fld; expression x; statement S1,S2; @@ x = f(...) ... when != IS_ERR(x) ( if (IS_ERR(x) \|\|...) S1 else S2 \| *x->fld ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
liubo	333e810544	btrfs: fix missing break in switch phrase There is a missing break in switch, fix it. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
liubo	2a29edc6b6	btrfs: fix several uncheck memory allocations To make btrfs more stable, add several missing necessary memory allocation checks, and when no memory, return proper errno. We've checked that some of those -ENOMEM errors will be returned to userspace, and some will be catched by BUG_ON() in the upper callers, and none will be ignored silently. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:36 -05:00
liubo	6b82ce8d82	btrfs: fix uncheck memory allocation in btrfs_submit_compressed_read btrfs_submit_compressed_read() is lack of memory allocation checks and corresponding error route. After this fix, if it comes to "no memory" case, errno will be returned to userland step by step, and tell users this operation cannot go on. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:36 -05:00
Chris Mason	eab49bec41	Merge branch 'bug-fixes' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-28 16:24:59 -05:00
Chuck Lever	d1205f87bb	NFS: NFSv4 readdir loses entries On recent 2.6.38-rc kernels, connectathon basic test 6 fails on NFSv4 mounts of OpenSolaris with something like: > ./test6: readdir > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors > basic tests failed > Tests failed, leaving /mnt/klimt mounted > [cel@matisse cthon04]$ I narrowed the problem down to nfs4_decode_dirent() reporting that the decode buffer had overflowed while decoding the entries for those missing files. verify_attr_len() assumes both it's pointer arguments reside on the same page. When these arguments point to locations on two different pages, verify_attr_len() can report false errors. This can happen now that a large NFSv4 readdir result can span pages. We have reasonably good checking in nfs4_decode_dirent() anyway, so it should be safe to simply remove the extra checking. At a guess, this was introduced by commit `6650239a`, "NFS: Don't use vm_map_ram() in readdir". Cc: stable@kernel.org [2.6.37] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:41:35 -05:00
Chuck Lever	c08e76d0cd	NFS: Micro-optimize nfs4_decode_dirent() Make the decoding of NFSv4 directory entries slightly more efficient by: 1. Avoiding unnecessary byte swapping when checking XDR booleans, and 2. Not bumping "p" when its value will be immediately replaced by xdr_inline_decode() This commit makes nfs4_decode_dirent() consistent with similar logic in the other two decode_dirent() functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:37:35 -05:00
Trond Myklebust	e00b8a2404	NFS: Fix an NFS client lockdep issue There is no reason to be freeing the delegation cred in the rcu callback, and doing so is resulting in a lockdep complaint that rpc_credcache_lock is being called from both softirq and non-softirq contexts. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-01-28 13:37:09 -05:00
bpm@sgi.com	24446fc66f	xfs: xfs_bmap_add_extent_delay_real should init br_startblock When filling in the middle of a previous delayed allocation in xfs_bmap_add_extent_delay_real, set br_startblock of the new delay extent to the right to nullstartblock instead of 0 before inserting the extent into the ifork (xfs_iext_insert), rather than setting br_startblock afterward. Adding the extent into the ifork with br_startblock=0 can lead to the extent being copied into the btree by xfs_bmap_extent_to_btree if we happen to convert from extents format to btree format before updating br_startblock with the correct value. The unexpected addition of this delay extent to the btree can cause subsequent XFS_WANT_CORRUPTED_GOTO filesystem shutdown in several xfs_bmap_add_extent_delay_real cases where we are converting a delay extent to real and unexpectedly find an extent already inserted. For example: 911 case BMAP_LEFT_FILLING: 912 /* 913 * Filling in the first part of a previous delayed allocation. 914 * The left neighbor is not contiguous. 915 */ 916 trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); 917 xfs_bmbt_set_startoff(ep, new_endoff); 918 temp = PREV.br_blockcount - new->br_blockcount; 919 xfs_bmbt_set_blockcount(ep, temp); 920 xfs_iext_insert(ip, idx, 1, new, state); 921 ip->i_df.if_lastex = idx; 922 ip->i_d.di_nextents++; 923 if (cur == NULL) 924 rval = XFS_ILOG_CORE \| XFS_ILOG_DEXT; 925 else { 926 rval = XFS_ILOG_CORE; 927 if ((error = xfs_bmbt_lookup_eq(cur, new->br_startoff, 928 new->br_startblock, new->br_blockcount, 929 &i))) 930 goto done; 931 XFS_WANT_CORRUPTED_GOTO(i == 0, done); With the bogus extent in the btree we shutdown the filesystem at 931. The conversion from extents to btree format happens when the number of extents in the inode increases above ip->i_df.if_ext_max. xfs_bmap_extent_to_btree copies extents from the ifork into the btree, ignoring all delalloc extents which are denoted by br_startblock having some value of nullstartblock. SGI-PV: 1013221 Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:13:29 -06:00
Dave Chinner	0fbca4d1c3	xfs: fix dquot shaker deadlock Commit `368e136` ("xfs: remove duplicate code from dquot reclaim") fails to unlock the dquot freelist when the number of loop restarts is exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory reclaim. Rework the loop control logic into an unwind stack that all the different cases jump into. This means there is only one set of code that processes the loop exit criteria, and simplifies the unlocking of all the items from different points in the loop. It also fixes a double increment of the restart counter from the qi_dqlist_lock case. Reported-by: Malcolm Scott <lkml@malc.org.uk> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	c6f990d1ff	xfs: handle CIl transaction commit failures correctly Failure to commit a transaction into the CIL is not handled correctly. This currently can only happen when racing with a shutdown and requires an explicit shutdown check, so it rare and can be avoided. Remove the shutdown check and make the CIL commit a void function to indicate it will always succeed, thereby removing the incorrectly handled failure case. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	5315837dae	xfs: limit extsize to size of AGs and/or MAXEXTLEN The extent size hint can be set to larger than an AG. This means that the alignment process can push the range to be allocated outside the bounds of the AG, resulting in assert failures or corrupted bmbt records. Similarly, if the extsize is larger than the maximum extent size supported, the alignment process will produce extents that are too large to fit into the bmbt records, resulting in a different type of assert/corruption failure. Fix this by limiting extsize at the time іt is set firstly to be less than MAXEXTLEN, then to be a maximum of half the size of the AGs in the filesystem for non-realtime inodes. Realtime inodes do not allocate out of AGs, so don't have to be restricted by the size of AGs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	4ce159890c	xfs: prevent extsize alignment from exceeding maximum extent size When doing delayed allocation, if the allocation size is for a maximally sized extent, extent size alignment can push it over this limit. This results in an assert failure in xfs_bmbt_set_allf() as the extent length is too large to find in the extent record. Fix this by ensuring that we allow for space that extent size alignment requires (up to 2 * (extsize -1) blocks as we have to handle both head and tail alignment) when limiting the maximum size of the extent. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	14b064ceaa	xfs: limit extent length for allocation to AG size Delayed allocation extents can be larger than AGs, so when trying to convert a large range we may scan every AG inside xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than an AG. We should stop when we find the first AG with a maximum possible allocation size. This causes excessive CPU usage when there are lots of AGs. The same problem occurs when doing preallocation of a range larger than an AG. Fix the problem by limiting real allocation lengths to the maximum that an AG can support. This means if we have empty AGs, we'll stop the search at the first of them. If there are no empty AGs, we'll still scan them all, but that is a different problem.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:35 -06:00
Dave Chinner	b8fc82630a	xfs: speculative delayed allocation uses rounddown_power_of_2 badly rounddown_power_of_2() returns an undefined result when passed a value of zero. The specualtive delayed allocation code is doing this when the inode is zero length. Hence occasionally the preallocation is much, much larger than is necessary (e.g. 8GB for a 270 _byte_ file). Ensure we don't even pass a zero value to this function so the result of preallocation is always the desired size. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:35 -06:00
Dave Chinner	e34a314c5e	xfs: fix efi item leak on forced shutdown After test 139, kmemleak shows: unreferenced object 0xffff880078b405d8 (size 400): comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s) hex dump (first 32 bytes): 60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff `..y....`..y.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60 [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0 [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0 [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50 [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0 [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90 [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0 [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0 [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70 [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20 [<ffffffff8117dc00>] notify_change+0x170/0x2e0 [<ffffffff81163bf6>] do_truncate+0x66/0xa0 [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0 [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff The cause of the leak is that the "remove" parameter of IOP_UNPIN() is never set when a CIL push is aborted. This means that the EFI item is never freed if it was in the push being cancelled. The problem is specific to delayed logging, but has uncovered a couple of problems with the handling of IOP_UNPIN(remove). Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN() in the CIL commit failure path or the iclog write failure path because for delayed loging we have no transaction context. Hence we must only call xfs_trans_del_item() if the log item being unpinned has an active log item descriptor. Secondly, xfs_trans_uncommit() does not handle log item descriptor freeing during the traversal of log items on a transaction. It can reference a freed log item descriptor when unpinning an EFI item. Hence it needs to use a safe list traversal method to allow items to be removed from the transaction during IOP_UNPIN(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:01:33 -06:00
Linus Torvalds	b12ece7d85	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: avoid picking MDS that is not active ceph: avoid immediate cap check after import ceph: fix flushing of caps vs cap import ceph: fix erroneous cap flush to non-auth mds ceph: fix cap_wanted_delay_{min,max} mount option initialization ceph: fix xattr rbtree search ceph: fix getattr on directory when using norbytes	2011-01-28 12:12:58 +10:00
Shirish Pargaonkar	ee2c925850	cifs: More crypto cleanup (try #2 ) Replaced md4 hashing function local to cifs module with kernel crypto APIs. As a result, md4 hashing function and its supporting functions in file md4.c are not needed anymore. Cleaned up function declarations, removed forward function declarations, and removed a header file that is being deleted from being included. Verified that sec=ntlm/i, sec=ntlmv2/i, and sec=ntlmssp/i work correctly. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-27 19:58:13 +00:00
Dave Chinner	7db37c5e65	xfs: fix log ticket leak on forced shutdown. The kmemleak detector shows this after test 139: unreferenced object 0xffff880079b88bb0 (size 264): comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff ........H{...... backtrace: [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60 [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0 [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0 [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50 [<ffffffff8148f394>] xlog_ticket_alloc+0x34/0x170 [<ffffffff81494444>] xlog_cil_push+0xa4/0x3f0 [<ffffffff81494eca>] xlog_cil_force_lsn+0x15a/0x160 [<ffffffff814933a5>] _xfs_log_force_lsn+0x75/0x2d0 [<ffffffff814a264d>] _xfs_trans_commit+0x2bd/0x2f0 [<ffffffff8148bfdd>] xfs_iomap_write_allocate+0x1ad/0x350 [<ffffffff814ac17f>] xfs_map_blocks+0x21f/0x370 [<ffffffff814ad1b7>] xfs_vm_writepage+0x1c7/0x550 [<ffffffff8112200a>] __writepage+0x1a/0x50 [<ffffffff81122df2>] write_cache_pages+0x1c2/0x4c0 [<ffffffff81123117>] generic_writepages+0x27/0x30 [<ffffffff814aba5d>] xfs_vm_writepages+0x5d/0x80 By inspection, the leak occurs when xlog_write() returns and error and we jump to the abort path without dropping the reference on the active ticket. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-27 12:02:00 +11:00
Li Zefan	4d728ec7ae	Btrfs: Fix file clone when source offset is not 0 Suppose: - the source extent is: [0, 100] - the src offset is 10 - the clone length is 90 - the dest offset is 0 This statement: new_key.offset = key.offset + destoff - off will produce such an extent for the dest file: [ino, BTRFS_EXTENT_DATA_KEY, -10] , which is obviously wrong. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:11:18 +08:00
Miao Xie	b897abec03	Btrfs: Fix memory leak in writepage fixup work fixup, which is allocated when starting page write to fix up the extent without ORDERED bit set, should be freed after this work is done. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:10:30 +08:00
Miao Xie	d0f69686c2	Btrfs: Don't return acl info when mounting with noacl option Steps to reproduce: # mkfs.btrfs /dev/sda2 # mount /dev/sda2 /mnt # touch /mnt/file0 # setfacl -m 'u:root:x,g::x,o::x' /mnt/file0 # umount /mnt # mount /dev/sda2 -o noacl /mnt # getfacl /mnt/file0 ... user::rw- user:root:--x group::--x mask::--x other::--x The output should be: user::rw- group::--x other::--x Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:16 +08:00
Tero Roponen	3f3d0bc0df	Btrfs: Free correct pointer after using strsep We must save and free the original kstrdup()'ed pointer because strsep() modifies its first argument. Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:11 +08:00
Ian Kent	bdc924bb4c	Btrfs: Fix memory leak on finding existing super We missed a memory deallocation in commit `450ba0ea`. If an existing super block is found at mount and there is no error condition then the pre-allocated tree_root and fs_info are no not used and are not freeded. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:07 +08:00
Li Zefan	83a4d54840	Btrfs: Fix memory leak at umount fs_info, which is allocated in open_ctree(), should be freed in close_ctree(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:02 +08:00
Li Zefan	f333adb5d6	btrfs: Check mergeable free space when removing a cluster After returing extents from a cluster to the block group, some extents in the block group may be mergeable. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:57 +08:00
Li Zefan	120d66eec0	btrfs: Add a helper try_merge_free_space() When adding a new extent, we'll firstly see if we can merge this extent to the left or/and right extent. Extract this as a helper try_merge_free_space(). As a side effect, we fix a small bug that if the new extent has non-bitmap left entry but is unmergeble, we'll directly link the extent without trying to drop it into bitmap. This also prepares for the next patch. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:50 +08:00

1 2 3 4 5 ...

21422 Commits