Commit Graph

5976 Commits

Author SHA1 Message Date
Colin Ian King
f6870cc9a3 rbd: fix spelling mistake: "reregisteration" -> "reregistration"
Trivial fix to spelling mistake in rdb_warn message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:49 +02:00
Ilya Dryomov
dd4358550f rbd: get the latest osdmap when using an existing client
Currently we request the latest osdmap only if ceph_pg_poolid_by_name()
fails with -ENOENT.  This is effective with newly created pools, but we
also want to avoid attempting to map from pools that were recently
deleted and report "pool does not exist" instead.  (Such an attempt
eventually fails in the OSD client after map check code kicks in, but
the error message is confusing.)

Request the latest osdmap unconditionally after bumping a ref on an
existing client in rbd_client_find().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:46 +02:00
Ilya Dryomov
5feb0d8d2f rbd: move rbd_get_client() below rbd_put_client()
... to avoid a forward declaration in the next commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:46 +02:00
Ilya Dryomov
0a4a1e68d8 rbd: remove redundant declaration of rbd_spec_put()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:46 +02:00
Ilya Dryomov
b133185217 rbd: allow "fancy" striping
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jason Dillaman <dillaman@redhat.com>
2018-04-02 10:12:44 +02:00
Ilya Dryomov
afb978884c rbd: introduce OWN_BVECS data type
If the layout is "fancy", we need to be able to rearrange the provided
bio_vecs in stripe unit chunks to make it possible for the messenger to
read/write directly from/to the provided data buffer, without employing
a temporary data buffer for assembling the result.

Higher level bio_vec arrays are generally immutable, so this requires
copying into a private array.  Only the bio_vecs themselves are shuffled
around, not the actual data.  OWN_BVECS doesn't own any pages.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:44 +02:00
Ilya Dryomov
e93aca0abb rbd: remove rbd_parent_request_{create,destroy}()
rbd_parent_request_create() takes a ref on obj_req for child_img_req.
There is no point in doing that because child_img_req is created on
behalf of obj_req -- obj_req is the initiator and can't be completed
before child_img_req.

Open-code the rest of rbd_parent_request_create() and remove it along
with rbd_parent_request_destroy().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:44 +02:00
Ilya Dryomov
dfd9875f11 rbd: get rid of img_req->{offset,length}
These are set, but no longer used.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:44 +02:00
Ilya Dryomov
0420c5dd2e rbd: remove rbd_img_request_fill() and helpers
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:43 +02:00
Ilya Dryomov
5a237819aa rbd: switch to common striping framework
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:43 +02:00
Ilya Dryomov
2bb1e56ec6 rbd: create+truncate for whole-object layered discards
A whole-object layered discard is implemented as a truncate rather
than a delete: a dummy object is needed to prevent the CoW machinery
from kicking in.  However, a truncate on a non-existent object is
a no-op.  If the object doesn't exist in HEAD, a discard request is
effectively ignored, which violates our "discard zeroes data" promise
and breaks REQ_OP_WRITE_ZEROES implementation.

A non-exclusive create on an existing object is also a no-op, so the
fix is to do a compound create+truncate instead.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:43 +02:00
Ilya Dryomov
86bd7998fa rbd: move to obj_req->img_extents
In preparation for rbd "fancy" striping, replace obj_req->img_offset
with obj_req->img_extents.  A single starting offset isn't sufficient
because we want only one OSD request per object and will merge adjacent
object extents in ceph_file_to_extents().  The final object extent may
map into multiple different byte ranges in the image.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:43 +02:00
Ilya Dryomov
43df3d35c0 rbd: incorporate ceph_object_extent
obj_req->object_no -> obj_req->ex.oe_objno
obj_req->offset -> obj_req->ex.oe_off
obj_req->length -> obj_req->ex.oe_len

... and use ex for linking object requests to image requests.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:43 +02:00
Ilya Dryomov
ecc633caeb rbd: store data_type in img_req instead of obj_req
All object requests are associated with an image request now -- avoid
duplicating the same info in each object request.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:42 +02:00
Ilya Dryomov
0be2d60ed8 rbd: remove obj_req->flags field
There are no standalone (!IMG_DATA) object requests anymore.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:42 +02:00
Ilya Dryomov
15961b4494 rbd: remove old request completion code
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:42 +02:00
Ilya Dryomov
7114edac35 rbd: new request completion code
Do away with partial request completions and all the associated
complexity.  Individual object requests no longer need to be completed
in order -- when the last one becomes ready, we complete the entire
higher level request all at once.

This also wraps up the conversion to a state machine model and
eliminates the recursion described in commit 6d69bb536b ("rbd:
prevent kernel stack blow up on rbd map").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:41 +02:00
Ilya Dryomov
efbd1a1106 rbd: update rbd_img_request_submit() signature
It should be void now.  Also, object requests are unlinked only in
image request destructor, which can't run before rbd_img_request_put(),
so no need for _safe.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:41 +02:00
Ilya Dryomov
9bb0248d9e rbd: add img_req->op_type field
Store op_type in its own field instead of packing it into flags.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:41 +02:00
Ilya Dryomov
a162b308dc rbd: simplify rbd_osd_req_create()
No need to pass rbd_dev and op_type to rbd_osd_req_create(): there are
no standalone (!IMG_DATA) object requests anymore and osd_req->r_flags
can be set in rbd_osd_req_format_{read,write}().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:40 +02:00
Ilya Dryomov
51c3509e5e rbd: remove old request handling code
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:40 +02:00
Ilya Dryomov
3da691bf43 rbd: new request handling code
The notable changes are:

- instead of explicitly stat'ing the object to see if it exists before
  issuing the write, send the write optimistically along with the stat
  in a single OSD request
- zero copyup optimization
- all object requests are associated with an image request and have
  a valid ->img_request pointer; there are no standalone (!IMG_DATA)
  object requests anymore
- code is structured as a state machine (vs a bunch of callbacks with
  implicit state)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:40 +02:00
Ilya Dryomov
7e07efb12d rbd: move from raw pages to bvec data descriptors
In preparation for rbd "fancy" striping which requires bio_vec arrays,
wire up BVECS data type and kill off PAGES data type.  There is nothing
wrong with using page vectors for copyup requests -- it's just less
iterator boilerplate code to write for the new striping framework.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02 10:12:39 +02:00
Ilya Dryomov
f9dcbc44cd rbd: get rid of img_req->copyup_pages
The initiating object request is the proper owner -- save a bit of
space.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02 10:12:39 +02:00
Ilya Dryomov
06fbb69935 rbd: don't (ab)use obj_req->pages for stat requests
obj_req->pages is for provided data buffers.  stat requests are
internal and should be NODATA.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02 10:12:39 +02:00
Ilya Dryomov
df6ba7015d rbd: remove bio cloning helpers
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02 10:12:39 +02:00
Ilya Dryomov
5359a17d27 libceph, rbd: new bio handling code (aka don't clone bios)
The reason we clone bios is to be able to give each object request
(and consequently each ceph_osd_data/ceph_msg_data item) its own
pointer to a (list of) bio(s).  The messenger then initializes its
cursor with cloned bio's ->bi_iter, so it knows where to start reading
from/writing to.  That's all the cloned bios are used for: to determine
each object request's starting position in the provided data buffer.

Introduce ceph_bio_iter to do exactly that -- store position within bio
list (i.e. pointer to bio) + position within that bio (i.e. bvec_iter).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:38 +02:00
Ilya Dryomov
a1fbb5e7bb rbd: start enums at 1 instead of 0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:38 +02:00
Ilya Dryomov
24f1df60ce rbd: set max_segment_size to UINT_MAX
Commit 21acdf45f4 ("rbd: set max_segments to USHRT_MAX") removed the
limit on max_segments.  Remove the limit on max_segment_size as well.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02 10:12:38 +02:00
Omar Sandoval
3148ffbdb9 loop: use killable lock in ioctls
Even after the previous patch to drop lo_ctl_mutex while calling
vfs_getattr(), there are other cases where we can end up sleeping for a
long time while holding lo_ctl_mutex. Let's avoid the uninterruptible
sleep from the ioctls.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-27 14:21:12 -06:00
Omar Sandoval
2d1d4c1e59 loop: don't call into filesystem while holding lo_ctl_mutex
We hit an issue where a loop device on NFS was stuck in
loop_get_status() doing vfs_getattr() after the NFS server died, which
caused a pile-up of uninterruptible processes waiting on lo_ctl_mutex.
There's no reason to hold this lock while we wait on the filesystem;
let's drop it so that other processes can do their thing. We need to
grab a reference on lo_backing_file while we use it, and we can get rid
of the check on lo_device, which has been unnecessary since commit
a34c0ae9ebd6 ("[PATCH] loop: remove the bio remapping capability") in
the linux-history tree.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-27 14:21:11 -06:00
Bart Van Assche
233bde21aa block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>
It happens often while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available for this driver? Do I have to introduce definitions of these
constants before I can use these constants? To avoid this confusion,
move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that these become available for all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional to avoid that including that header file after
<linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
redefinition.

Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.

Cc: David S. Miller <davem@davemloft.net>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-17 14:45:23 -06:00
Ross Zwisler
1d037577c3 loop: Fix lost writes caused by missing flag
The following commit:

commit aa4d86163e ("block: loop: switch to VFS ITER_BVEC")

replaced __do_lo_send_write(), which used ITER_KVEC iterators, with
lo_write_bvec() which uses ITER_BVEC iterators.  In this change, though,
the WRITE flag was lost:

-       iov_iter_kvec(&from, ITER_KVEC | WRITE, &kvec, 1, len);
+       iov_iter_bvec(&i, ITER_BVEC, bvec, 1, bvec->bv_len);

This flag is necessary for the DAX case because we make decisions based on
whether or not the iterator is a READ or a WRITE in dax_iomap_actor() and
in dax_iomap_rw().

We end up going through this path in configurations where we combine a PMEM
device with 4k sectors, a loopback device and DAX.  The consequence of this
missed flag is that what we intend as a write actually turns into a read in
the DAX code, so no data is ever written.

The very simplest test case is to create a loopback device and try and
write a small string to it, then hexdump a few bytes of the device to see
if the write took.  Without this patch you read back all zeros, with this
you read back the string you wrote.

For XFS this causes us to fail or panic during the following xfstests:

	xfs/074 xfs/078 xfs/216 xfs/217 xfs/250

For ext4 we have a similar issue where writes never happen, but we don't
currently have any xfstests that use loopback and show this issue.

Fix this by restoring the WRITE flag argument to iov_iter_bvec().  This
causes the xfstests to all pass.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: commit aa4d86163e ("block: loop: switch to VFS ITER_BVEC")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-09 08:36:36 -07:00
Maurizio Lombardi
2bbea6e117 cdrom: do not call check_disk_change() inside cdrom_open()
when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-09 08:06:35 -07:00
Bart Van Assche
8b904b5b6b block: Use blk_queue_flag_*() in drivers instead of queue_flag_*()
This patch has been generated as follows:

for verb in set_unlocked clear_unlocked set clear; do
  replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \
    $(git grep -lw queue_flag_${verb} drivers block/bsg*)
done

Except for protecting all queue flag changes with the queue lock
this patch does not change any functionality.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-08 14:13:48 -07:00
Bart Van Assche
4e699cb99d mtip32xx: Use the blk_queue_flag_*() functions
Use the blk_queue_flag_*() functions instead of open-coding these.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-08 14:13:48 -07:00
Jens Axboe
9296080a18 Merge branch 'stable/for-jens-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
Pull a xen_blkfront fix from Konrad:

"It has one simple fix for the multi-queue support not showing up after
a block device was detached/re-attached."

* 'stable/for-jens-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen-blkfront: move negotiate_mq to cover all cases of new VBDs
2018-03-08 09:24:52 -07:00
Bhavesh Davda
7ed8ce1c5f xen-blkfront: move negotiate_mq to cover all cases of new VBDs
negotiate_mq should happen in all cases of a new VBD being discovered by
xen-blkfront, whether called through _probe() or a hot-attached new VBD
from dom-0 via xenstore. Otherwise, hot-attached new VBDs are left
configured without multi-queue.

Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-03-07 15:51:24 -05:00
Ming Lei
66231ad3e2 block: null_blk: fix 'Invalid parameters' when loading module
On ARM64, the default page size has been 64K on some distributions, and
we should allow ARM64 people to play null_blk.

This patch fixes the issue by extend page bitmap size for supporting
other non-4KB PAGE_SIZE.

Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Kyungchan Koh <kkc6196@fb.com>,
Cc: weiping zhang <zhangweiping@didichuxing.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-06 10:35:09 -07:00
Jiufei Xue
158e61865a block: fix a typo
Fix a typo in pkt_start_recovery.

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-01 08:41:27 -07:00
Bart Van Assche
498f6650ae block: Fix a race between the cgroup code and request queue initialization
Initialize the request queue lock earlier such that the following
race can no longer occur:

blk_init_queue_node()             blkcg_print_blkgs()
  blk_alloc_queue_node (1)
    q->queue_lock = &q->__queue_lock (2)
    blkcg_init_queue(q) (3)
                                    spin_lock_irq(blkg->q->queue_lock) (4)
  q->queue_lock = lock (5)
                                    spin_unlock_irq(blkg->q->queue_lock) (6)

(1) allocate an uninitialized queue;
(2) initialize queue_lock to its default internal lock;
(3) initialize blkcg part of request queue, which will create blkg and
    then insert it to blkg_list;
(4) traverse blkg_list and find the created blkg, and then take its
    queue lock, here it is the default *internal lock*;
(5) *race window*, now queue_lock is overridden with *driver specified
    lock*;
(6) now unlock *driver specified lock*, not the locked *internal lock*,
    unlock balance breaks.

The changes in this patch are as follows:
- Move the .queue_lock initialization from blk_init_queue_node() into
  blk_alloc_queue_node().
- Only override the .queue_lock pointer for legacy queues because it
  is not useful for blk-mq queues to override this pointer.
- For all all block drivers that initialize .queue_lock explicitly,
  change the blk_alloc_queue() call in the driver into a
  blk_alloc_queue_node() call and remove the explicit .queue_lock
  initialization. Additionally, initialize the spin lock that will
  be used as queue lock earlier if necessary.

Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-28 12:23:35 -07:00
Bart Van Assche
5ee0524ba1 block: Add 'lock' as third argument to blk_alloc_queue_node()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-28 12:23:35 -07:00
Bart Van Assche
392db38058 zram: Delete gendisk before cleaning up the request queue
Remove the disk, partition and bdi sysfs attributes before cleaning up
the request queue associated with the disk.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-28 12:23:35 -07:00
Bart Van Assche
0fa8ebdd42 block/loop: Delete gendisk before cleaning up the request queue
Remove the disk, partition and bdi sysfs attributes before cleaning up
the request queue associated with the disk.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-28 12:23:35 -07:00
Jens Axboe
24941b90e6 null_blk: add 'requeue' fault attribute
Similarly to the support we have for testing/faking timeouts for
null_blk, this adds support for triggering a requeue condition.
Considering the issues around restart we've been seeing, this should be
a useful addition to the testing arsenal to ensure that we are handling
requeue conditions correctly.

This works for queue mode 1 (legacy request_fn based path) and 2 (blk-mq
path), as there's no good way to do requeue with a bio based driver.
This is similar to the timeout path. For the blk-mq path, we alternate
between passing back BLK_STS_RESOURCE and manually calling
blk_mq_requeue_request() in the driver. The former will hit the core
requeue path, while the latter exercises the IO scheduler requeue
path.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-28 12:23:35 -07:00
Gustavo A. R. Silva
0979962f54 nbd: fix return value in error handling path
It seems that the proper value to return in this particular case is the
one contained into variable new_index instead of ret.

Addresses-Coverity-ID: 1465148 ("Copy-paste error")
Fixes: e46c7287b1 ("nbd: add a basic netlink interface")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-27 15:51:37 -07:00
Jan Kara
3079c22ea8 genhd: Rename get_disk() to get_disk_and_module()
Rename get_disk() to get_disk_and_module() to make sure what the
function does. It's not a great name but at least it is now clear that
put_disk() is not it's counterpart.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26 09:48:42 -07:00
Linus Torvalds
9e95dae76b Things have been very quiet on the rbd side, as work continues on the
big ticket items slated for the next merge window.
 
 On the CephFS side we have a large number of cap handling improvements,
 a fix for our long-standing abuse of ->journal_info in ceph_readpages()
 and yet another dentry pointer management patch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJafGqnAAoJEEp/3jgCEfOLjNcH/R6G/xyytDMfxaN+D8DBqCPF
 IaQM7RtgYJeRzDIXYYCkDEBPYqLcD2fjHLzFotFNLcgLdeUcSOyfg7NuCOWWq7o2
 t4z6Ekyish3GWZLUmlSdPcToQ+xIlMRshU8ZmzCHTCzx8XjO+CAnCADp5dh8OKZx
 mCpRX16sXdc6ozE1hsGKIkUoNrkdj8d3+HseZ2Uxb/4FZBNgH3cmmg7c5y6M+sp6
 wT4NEES3baqq2v5cVfw7T+d4MNgRm4/JC1aBy1JBkQlmVFNGteQTT7yzo0X1AfJ+
 +kcR10ddg0gD4WGYhL+iZlQCfwyMp7vouHQbgTOgt+rDCitjDy5r1BAamtxnZjM=
 =ctaD
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Things have been very quiet on the rbd side, as work continues on the
  big ticket items slated for the next merge window.

  On the CephFS side we have a large number of cap handling
  improvements, a fix for our long-standing abuse of ->journal_info in
  ceph_readpages() and yet another dentry pointer management patch"

* tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client:
  ceph: improving efficiency of syncfs
  libceph: check kstrndup() return value
  ceph: try to allocate enough memory for reserved caps
  ceph: fix race of queuing delayed caps
  ceph: delete unreachable code in ceph_check_caps()
  ceph: limit rate of cap import/export error messages
  ceph: fix incorrect snaprealm when adding caps
  ceph: fix un-balanced fsc->writeback_count update
  ceph: track read contexts in ceph_file_info
  ceph: avoid dereferencing invalid pointer during cached readdir
  ceph: use atomic_t for ceph_inode_info::i_shared_gen
  ceph: cleanup traceless reply handling for rename
  ceph: voluntarily drop Fx cap for readdir request
  ceph: properly drop caps for setattr request
  ceph: voluntarily drop Lx cap for link/rename requests
  ceph: voluntarily drop Ax cap for requests that create new inode
  rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
  rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full()
  rbd: use kmem_cache_zalloc() in rbd_img_request_create()
  rbd: obj_request->completion is unused
2018-02-08 11:38:59 -08:00
Linus Torvalds
846ade7dd2 virtio, vhost: fixes, cleanups, features
This includes the disk/cache memory stats for for the virtio balloon,
 as well as multiple fixes and cleanups.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaempgAAoJECgfDbjSjVRpGOIIAIWiarIFrjjcE+hxcFdKkKvC
 T8YQzfvxHTuBqD8m1jd/9R/U0RHYRM4MX+cg6tVz9J2VVhQ2Hjfrs7HExqoZKul8
 sOOzk0d+ii0iQvlgnmmXamlceizP4IuNZcia7FDAZK0hWfHE84dPrG3/hhWOGruN
 NDR1v7k8GBIMS+7lExQwzmy6gs4zGJftJUF9Fnb4CVT26wWbOKYS2exC8UJerZAE
 2puWx71Fd/C27x/iOsxZOME0tOmzU7MIRksSjDcT7YQLesIuJSbjpQd5yfvXW0qC
 7iEQzueZuYP4VroHvpDbKDxvFsIxhjFGKR4/sSG+a0Zso1ejS9fyuDL9a627PJI=
 =/qQR
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost updates from Michael Tsirkin:
 "virtio, vhost: fixes, cleanups, features

  This includes the disk/cache memory stats for for the virtio balloon,
  as well as multiple fixes and cleanups"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: don't hold onto file pointer for VHOST_SET_LOG_FD
  vhost: don't hold onto file pointer for VHOST_SET_VRING_ERR
  vhost: don't hold onto file pointer for VHOST_SET_VRING_CALL
  ringtest: ring.c malloc & memset to calloc
  virtio_vop: don't kfree device on register failure
  virtio_pci: don't kfree device on register failure
  virtio: split device_register into device_initialize and device_add
  vhost: remove unused lock check flag in vhost_dev_cleanup()
  vhost: Remove the unused variable.
  virtio_blk: print capacity at probe time
  virtio: make VIRTIO a menuconfig to ease disabling it all
  virtio/ringtest: virtio_ring: fix up need_event math
  virtio/ringtest: fix up need_event math
  virtio: virtio_mmio: make of_device_ids const.
  firmware: Use PTR_ERR_OR_ZERO()
  virtio-mmio: Use PTR_ERR_OR_ZERO()
  vhost/scsi: Improve a size determination in four functions
  virtio_balloon: include disk/file caches memory statistics
2018-02-08 10:41:00 -08:00
Linus Torvalds
105cf3c8c6 pci-v4.16-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJad5lgAAoJEFmIoMA60/r8s2kQAI3PztawDpaCP9Z12pkbBHSt
 Ho0xTyk9rCZi9kQJbNjc+a+QrlA3QmTHXIXerB3LSWoh7M+XhsECjem92eHpgLNS
 JvYPhTfOrCr0vdiAmOz6hD0AqN/psrbfzgiJhSwomsGEFS77k7kERSJckRv81sxb
 Aj5F/WjucAgLorwm4auveAJEQ7atE7/6pkXzoqYm4G6NLOb46jUcRGndrnvXZBlz
 fws8fBM4BHyi7i25CYQl24tFq1CGax1rIPgLg+4KnH76bQk/N6Ju0sGVSzfh+hG8
 SIerK9bJbzGRAuNKoxB3aO1dyzsK3x9WztE2mG98w5trOISPIR1FqnvC/225FWAU
 d6eIXiC7wKnEx+DElNTzCjzfHc7SAJoupO32H7CoiTe5zPUlWlxJ1zLYkK1gt50q
 m8PRBiYTglxyznzrO0drtcdjEzvbdZNRrsYnul4wi1vSHzjk6F6XLtzT10XWM1M1
 1pXLB8384FTj0Hu4bq6Y3Aivkmz0Sf+eQM2NaOwe+Zj7/1VV0d3lvi4LUXkqzLCA
 FoXPJSMxG2Qu+iflCeYRQBJjExaZH3eNLZ3dT6QpcJrjaFVedd9u5DeeFqNL27zV
 bhr8TdqrR4p4rc8EBAGoCapw96IxLZROKB3gxbrZVOpfIZpzthwHbElHX6aqUgF4
 w/EV1JWs36WXWaxFk8wd
 =ttq9
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - skip AER driver error recovery callbacks for correctable errors
   reported via ACPI APEI, as we already do for errors reported via the
   native path (Tyler Baicar)

 - fix DPC shared interrupt handling (Alex Williamson)

 - print full DPC interrupt number (Keith Busch)

 - enable DPC only if AER is available (Keith Busch)

 - simplify DPC code (Bjorn Helgaas)

 - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
   Helgaas)

 - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
   Helgaas)

 - move ASPM internal interfaces out of public header (Bjorn Helgaas)

 - allow hot-removal of VGA devices (Mika Westerberg)

 - speed up unplug and shutdown by assuming Thunderbolt controllers
   don't support Command Completed events (Lukas Wunner)

 - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
   Jay Cornwall)

 - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)

 - clean up PCI DMA interface usage (Christoph Hellwig)

 - remove PCI pool API (replaced with DMA pool) (Romain Perier)

 - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
   Kaya)

 - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)

 - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)

 - remove warnings on sysfs mmap failure (Bjorn Helgaas)

 - quiet ROM validation messages (Alex Deucher)

 - remove redundant memory alloc failure messages (Markus Elfring)

 - fill in types for compile-time VGA and other I/O port resources
   (Bjorn Helgaas)

 - make "pci=pcie_scan_all" work for Root Ports as well as Downstream
   Ports to help AmigaOne X1000 (Bjorn Helgaas)

 - add SPDX tags to all PCI files (Bjorn Helgaas)

 - quirk Marvell 9128 DMA aliases (Alex Williamson)

 - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)

 - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
   Cassel)

 - use DMA API to get MSI address for DesignWare IP (Niklas Cassel)

 - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)

 - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)

 - add support for ARTPEC-7 SoC (Niklas Cassel)

 - add endpoint-mode support for ARTPEC (Niklas Cassel)

 - add Cadence PCIe host and endpoint controller driver (Cyrille
   Pitchen)

 - handle multiple INTx status bits being set in dra7xx (Vignesh R)

 - translate dra7xx hwirq range to fix INTD handling (Vignesh R)

 - remove deprecated Exynos PHY initialization code (Jaehoon Chung)

 - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)

 - fix NULL pointer dereference in iProc BCMA driver (Ray Jui)

 - fix Keystone interrupt-controller-node lookup (Johan Hovold)

 - constify qcom driver structures (Julia Lawall)

 - rework Tegra config space mapping to increase space available for
   endpoints (Vidya Sagar)

 - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)

 - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)

 - add support for Global Fabric Manager Server (GFMS) event to
   Microsemi Switchtec switch driver (Logan Gunthorpe)

 - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)

* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
  PCI: endpoint: Fix EPF device name to support multi-function devices
  PCI: endpoint: Add the function number as argument to EPC ops
  PCI: cadence: Add host driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
  PCI: Add vendor ID for Cadence
  PCI: Add generic function to probe PCI host controllers
  PCI: generic: fix missing call of pci_free_resource_list()
  PCI: OF: Add generic function to parse and allocate PCI resources
  PCI: Regroup all PCI related entries into drivers/pci/Makefile
  PCI/DPC: Reformat DPC register definitions
  PCI/DPC: Add and use DPC Status register field definitions
  PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
  PCI/DPC: Remove unnecessary RP PIO register structs
  PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
  PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
  PCI/DPC: Make RP PIO log size check more generic
  PCI/DPC: Rename local "status" to "dpc_status"
  PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
  ...
2018-02-06 09:59:40 -08:00
Linus Torvalds
64b28683de for-linus-20180204
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJadzbSAAoJEPfTWPspceCmt5QP/jo6MSsNVevAQOE75Jje+qa/
 aF/BjHBdUmmI5WtPrtoz4igaJou7M2U0s8jdsc3c7uMw8dGTKc6ujIquSEn0wevY
 faJPTjWzLum3y50gwRHcrHCQIlxOe5/f9rJevW4+q76aMP3aWKjO4bgBExH+2XnA
 CaT+6d40skYt20Sy428H0yhVdDAMiQYXTeg4SssWQY9AvJSSiW7ax+vmP3r5BKpV
 dXHggwgzqDuMwLZG80Tfg4GHGv5qisIrqLOCxtXNYHDNb/aDmbTFTO2jPgobT8gW
 N2kWxsOkBayUdPw6Nt2Wlm4toQgR5GJGH04LH2vI5p4dp4Grvx/aFGvUbT7+sN1u
 g/mmqsUUnYuO5AJ8XY2s2F7ezaT6v9x8BbLHuA2vz0r5GsdFVXctZ/bXgQqkmh9i
 KLtfyOPldlczclVEuKL4xai1aXLcoBzDwyLxzbFp3+eAlhcgoSqxnMsE4fCJblCU
 dfShDChu1SbBD6dyGx8sol9cT48RFj2tBtpfcYxFW/NJJOQoh9FTqPQetYQxQ72c
 TadEf40hmw5Q2l0Hu5pwVbKHWUP0wn0VznkAOfT4VV1ysk93oExMbjgS2qh16xEZ
 oQwFDQMk3D8BXI9VwH8gUUnypkhcooMhznxSC3BQxjGn/R+byp7QEPvxSEZz/4nD
 BaBSbyAU5cpof+Eaqs4B
 =qeDb
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:
 "Most of this is fixes and not new code/features:

   - skd fix from Arnd, fixing a build error dependent on sla allocator
     type.

   - blk-mq scheduler discard merging fixes, one from me and one from
     Keith. This fixes a segment miscalculation for blk-mq-sched, where
     we mistakenly think two segments are physically contigious even
     though the request isn't carrying real data. Also fixes a bio-to-rq
     merge case.

   - Don't re-set a bit on the buffer_head flags, if it's already set.
     This can cause scalability concerns on bigger machines and
     workloads. From Kemi Wang.

   - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to
     distuingish between a local (device related) resource starvation
     and a global one. The latter might happen without IO being in
     flight, so it has to be handled a bit differently. From Ming"

* tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
  block: skd: fix incorrect linux/slab_def.h inclusion
  buffer: Avoid setting buffer bits that are already set
  blk-mq-sched: Enable merging discard bio into request
  blk-mq: fix discard merge with scheduler attached
  blk-mq: introduce BLK_STS_DEV_RESOURCE
2018-02-04 11:16:35 -08:00
Arnd Bergmann
1d51877578 block: skd: fix incorrect linux/slab_def.h inclusion
skd includes slab_def.h to get access to the slab cache object size.
However, including this header breaks when we use SLUB or SLOB instead of
the SLAB allocator, since the structure layout is completely different,
as shown by this warning when we build this driver in one of the invalid
configurations with link-time optimizations enabled:

include/linux/slab.h:715:0: error: type of 'kmem_cache_size' does not match original declaration [-Werror=lto-type-mismatch]
 unsigned int kmem_cache_size(struct kmem_cache *s);

mm/slab_common.c:77:14: note: 'kmem_cache_size' was previously declared here
 unsigned int kmem_cache_size(struct kmem_cache *s)
              ^
mm/slab_common.c:77:14: note: code may be misoptimized unless -fno-strict-aliasing is used
include/linux/slab.h:147:0: error: type of 'kmem_cache_destroy' does not match original declaration [-Werror=lto-type-mismatch]
 void kmem_cache_destroy(struct kmem_cache *);

mm/slab_common.c:858:6: note: 'kmem_cache_destroy' was previously declared here
 void kmem_cache_destroy(struct kmem_cache *s)
      ^
mm/slab_common.c:858:6: note: code may be misoptimized unless -fno-strict-aliasing is used
include/linux/slab.h:140:0: error: type of 'kmem_cache_create' does not match original declaration [-Werror=lto-type-mismatch]
 struct kmem_cache *kmem_cache_create(const char *name, size_t size,

mm/slab_common.c:534:1: note: 'kmem_cache_create' was previously declared here
 kmem_cache_create(const char *name, size_t size, size_t align,
 ^

This removes the header inclusion and instead uses the kmem_cache_size()
interface to get the size in a reliable way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-02 08:07:45 -07:00
Stefan Hajnoczi
daf2a50169 virtio_blk: print capacity at probe time
Print the capacity of the block device when the driver is probed.  Many
users expect this since SCSI disks (sd) do it.  Moreover, kernel dmesg
output is the primary source of troubleshooting information so it's
helpful to include the disk size there.

The capacity is already printed by virtio_blk when a resize event
occurs.  Extract the code and reuse it from virtblk_probe().

This patch also adds the block device name to the message so it can be
correlated with a specific device:

  virtio_blk virtio0: [vda] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB)

Cc: Rodrigo A B Freire <rfreire@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01 16:26:43 +02:00
Ming Lei
86ff7c2a80 blk-mq: introduce BLK_STS_DEV_RESOURCE
This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.

Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.

If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:

1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
  completed via blk_mq_sched_restart()

3) if SCHED_RESTART is set concurently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.

One invariant is that queue will be rerun if SCHED_RESTART is set.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-30 20:18:28 -07:00
Linus Torvalds
1ed2d76e02 Merge branch 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull kern_recvmsg reduction from Al Viro:
 "kernel_recvmsg() is a set_fs()-using wrapper for sock_recvmsg(). In
  all but one case that is not needed - use of ITER_KVEC for ->msg_iter
  takes care of the data and does not care about set_fs(). The only
  exception is svc_udp_recvfrom() where we want cmsg to be store into
  kernel object; everything else can just use sock_recvmsg() and be done
  with that.

  A followup converting svc_udp_recvfrom() away from set_fs() (and
  killing kernel_recvmsg() off) is *NOT* in here - I'd like to hear what
  netdev folks think of the approach proposed in that followup)"

* 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  tipc: switch to sock_recvmsg()
  smc: switch to sock_recvmsg()
  ipvs: switch to sock_recvmsg()
  mISDN: switch to sock_recvmsg()
  drbd: switch to sock_recvmsg()
  lustre lnet_sock_read(): switch to sock_recvmsg()
  cfs2: switch to sock_recvmsg()
  ncpfs: switch to sock_recvmsg()
  dlm: switch to sock_recvmsg()
  svc_recvfrom(): switch to sock_recvmsg()
2018-01-30 18:59:03 -08:00
Linus Torvalds
0a4b6e2f80 Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "This is the main pull request for block IO related changes for the
  4.16 kernel. Nothing major in this pull request, but a good amount of
  improvements and fixes all over the map. This contains:

   - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
     Paolo.

   - Support for SMR zones for deadline and mq-deadline from Damien and
     Christoph.

   - Set of fixes for bcache by way of Michael Lyle, including fixes
     from himself, Kent, Rui, Tang, and Coly.

   - Series from Matias for lightnvm with fixes from Hans Holmberg,
     Javier, and Matias. Mostly centered around pblk, and the removing
     rrpc 1.2 in preparation for supporting 2.0.

   - A couple of NVMe pull requests from Christoph. Nothing major in
     here, just fixes and cleanups, and support for command tracing from
     Johannes.

   - Support for blk-throttle for tracking reads and writes separately.
     From Joseph Qi. A few cleanups/fixes also for blk-throttle from
     Weiping.

   - Series from Mike Snitzer that enables dm to register its queue more
     logically, something that's alwways been problematic on dm since
     it's a stacked device.

   - Series from Ming cleaning up some of the bio accessor use, in
     preparation for supporting multipage bvecs.

   - Various fixes from Ming closing up holes around queue mapping and
     quiescing.

   - BSD partition fix from Richard Narron, fixing a problem where we
     can't mount newer (10/11) FreeBSD partitions.

   - Series from Tejun reworking blk-mq timeout handling. The previous
     scheme relied on atomic bits, but it had races where we would think
     a request had timed out if it to reused at the wrong time.

   - null_blk now supports faking timeouts, to enable us to better
     exercise and test that functionality separately. From me.

   - Kill the separate atomic poll bit in the request struct. After
     this, we don't use the atomic bits on blk-mq anymore at all. From
     me.

   - sgl_alloc/free helpers from Bart.

   - Heavily contended tag case scalability improvement from me.

   - Various little fixes and cleanups from Arnd, Bart, Corentin,
     Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
2018-01-29 11:51:49 -08:00
Ilya Dryomov
e573427a44 rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
This feature bit restricts older clients from performing certain
maintenance operations against an image (e.g. clone, snap create).
krbd does not perform maintenance operations.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-01-29 18:36:03 +01:00
Ilya Dryomov
d98f153f1a rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full()
If rbd_img_request_submit() fails, parent_request->obj_request is
NULLed out, triggering an assert in rbd_obj_request_put():

  rbd_img_request_put(parent_request)
    rbd_parent_request_destroy
      rbd_obj_request_put(NULL)

Just remove it -- parent_request->obj_request will be put in
rbd_parent_request_destroy().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 15:23:01 +01:00
Ilya Dryomov
a0c5895b27 rbd: use kmem_cache_zalloc() in rbd_img_request_create()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 15:23:01 +01:00
Ilya Dryomov
2e584bce70 rbd: obj_request->completion is unused
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 15:22:56 +01:00
Corentin Labbe
796baeeef8 block: remove smart1,2.h
smart1,2.h is unused since commit d436641439 ("cpqarray: remove it from the kernel")
Remove it from tree.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-26 13:23:35 -07:00
Tina Ruchandani
85cf955df8 aoe: use ktime_t instead of timeval
'struct frame' uses two variables to store the sent timestamp - 'struct
timeval' and jiffies. jiffies is used to avoid discrepancies caused by
updates to system time. 'struct timeval' is deprecated because it uses
32-bit representation for seconds which will overflow in year 2038.

This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
  is the recommended type for timestamping
- ktime_t provides both long range (like jiffies) and high resolution
  (like timeval). Using ktime_get (monotonic time) instead of wall-clock
  time prevents any discprepancies caused by updates to system time.

[updates by Arnd below]
The original patch from Tina never went anywhere as we discussed how
to keep the impact on performance minimal. I've started over now but
arrived at basically the same patch that she had originally, except for
an slightly improved tsince_hr() function. I'm making it more robust
against overflows, and also optimize explicitly for the common case
in which a frame is less than 4.2 seconds old, using only a 32-bit
division in that case.

This should make the new version more efficient than the old code,
since we replace the existing two 32-bit division in do_gettimeofday()
plus one multiplication with a single single 32-bit division in
tsince_hr() and drop the double bookkeeping. It's also more efficient
than the ktime_get_us() API we discussed before, since that would
also rely on multiple divisions.

Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html
Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Cc: Ed Cashin <ed.cashin@acm.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17 08:41:07 -07:00
Linus Torvalds
1545dec46d Two rbd fixes for 4.12 and 4.2 issues respectively, marked for stable.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJaV5ghAAoJEEp/3jgCEfOL/bEH/3UopVYhLpHPqIAGCQuONuvG
 4imm/19uVIsbJMCHZ/bbksPPOaswqCKws5ScfIJKzg9vI8PQaQ28BnQh/vELHlmk
 lUgdXgVPaCoM/3UlCquxmBn5IooLt0t7LdCvMO2MvF3mf+jThyfwcoc1H2xe1Yh2
 k9dPQWcfNBY7jGTBGzqyuNwfg9DZSMyBRx4JmOsqlCI/GDcA8DsLHIykA4wRQRF+
 gtSCYeUuZsfukCk4Z1ImoYyfbtK/tYH4s9UOc5pEsd0s0NUs8bn7hgQn2QKWmQt4
 HsOzNTiAl/pXSRi8BMqPyFGVdZLT1cRkeo/FW1Lkxg5NqJmv005ebJ1Jx8YHW9A=
 =JTlC
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
  stable"

* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
  rbd: set max_segments to USHRT_MAX
  rbd: reacquire lock should update lock owner client id
2018-01-11 16:57:32 -08:00
Arnd Bergmann
33f782c49a null_blk: remove explicit 'select FAULT_INJECTION'
Selecting FAULT_INJECTION causes a Kconfig warning when CONFIG_DEBUG_KERNEL
is not set:

warning: (BLK_DEV_NULL_BLK && DRM_I915_SELFTEST) selects FAULT_INJECTION which has unmet direct dependencies (DEBUG_KERNEL)

The other drivers that use FAULT_INJECTION tend to have a separate
Kconfig symbol for turning on that feature, so let's do the same
thing here. This may add a bit more complexity than we like, but
it avoids the warning and is more consistent with the rest of the
kernel.

Fixes: 93b570464c ("null_blk: add option for managing IO timeouts")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-11 07:58:31 -07:00
Jens Axboe
93b570464c null_blk: add option for managing IO timeouts
Use the fault injection framework to provide a way for null_blk
to configure timeouts. This only works for queue_mode 1 and 2,
since the bio mode doesn't have code for tracking timeouts.

Let's say you want to have a 10% chance of timing out every
100,000 requests, and for 5 total timeouts, you could do:

modprobe null_blk timeout="100000,10,0,5"

This is useful for adding blktests to test that IO timeouts
are handled appropriately.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-10 09:06:23 -07:00
Jens Axboe
5448aca41c null_blk: wire up timeouts
This is needed to ensure that we actually handle timeouts.
Without it, the queue_mode=1 path will never call blk_add_timer(),
and the queue_mode=2 path will continually just return
EH_RESET_TIMER and we never actually complete the offending request.

This was used to test the new timeout code, and the changes around
killing off REQ_ATOM_COMPLETE.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-09 14:59:19 -07:00
Ilya Dryomov
21acdf45f4 rbd: set max_segments to USHRT_MAX
Commit d3834fefcf ("rbd: bump queue_max_segments") bumped
max_segments (unsigned short) to max_hw_sectors (unsigned int).
max_hw_sectors is set to the number of 512-byte sectors in an object
and overflows unsigned short for 32M (largest possible) objects, making
the block layer resort to handing us single segment (i.e. single page
or even smaller) bios in that case.

Cc: stable@vger.kernel.org
Fixes: d3834fefcf ("rbd: bump queue_max_segments")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-01-09 17:40:48 +01:00
Florian Margaine
edd8ca8015 rbd: reacquire lock should update lock owner client id
Otherwise, future operations on this RBD using exclusive-lock are
going to require the lock from a non-existent client id.

Cc: stable@vger.kernel.org
Fixes: 14bb211d32 ("rbd: support updating the lock cookie without releasing the lock")
Link: http://tracker.ceph.com/issues/19929
Signed-off-by: Florian Margaine <florian@platform.sh>
[idryomov@gmail.com: rbd_set_owner_cid() call, __rbd_lock() helper]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-09 17:40:21 +01:00
Linus Torvalds
ae6650163c loop: fix concurrent lo_open/lo_release
范龙飞 reports that KASAN can report a use-after-free in __lock_acquire.
The reason is due to insufficient serialization in lo_release(), which
will continue to use the loop device even after it has decremented the
lo_refcnt to zero.

In the meantime, another process can come in, open the loop device
again as it is being shut down. Confusion ensues.

Reported-by: 范龙飞 <long7573@126.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:32:07 -07:00
Arnd Bergmann
91f7b74aca DAC960: split up ioctl function to reduce stack size
When CONFIG_KASAN is set, all the local variables in this function are
allocated on the stack together, leading to a warning about possible
kernel stack overflow:

drivers/block/DAC960.c: In function 'DAC960_gam_ioctl':
drivers/block/DAC960.c:7061:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

By splitting up the function into smaller chunks, we can avoid that and
make the code slightly more readable at the same time. The coding style
in this file is completely nonstandard, and I chose to not touch that
at all, leaving the unconventional intendation unchanged to make it
easier to review the diff.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:18:00 -07:00
Ming Lei
263663cd3c block: convert to bio_first_bvec_all & bio_first_page_all
This patch converts to bio_first_bvec_all() & bio_first_page_all() for
retrieving the 1st bvec/page, and prepares for supporting multipage bvec.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:18:00 -07:00
Bart Van Assche
882d4171a8 pktcdvd: Fix a recently introduced NULL pointer dereference
Call bdev_get_queue(bdev) after bdev->bd_disk has been initialized
instead of just before that pointer has been initialized. This patch
avoids that the following command

pktsetup 1 /dev/sr0

triggers the following kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000548
IP: pkt_setup_dev+0x2db/0x670 [pktcdvd]
CPU: 2 PID: 724 Comm: pktsetup Not tainted 4.15.0-rc4-dbg+ #1
Call Trace:
 pkt_ctl_ioctl+0xce/0x1c0 [pktcdvd]
 do_vfs_ioctl+0x8e/0x670
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x23/0x9a

Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: commit ca18d6f769 ("block: Make most scsi_req_init() calls implicit")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05 09:03:04 -07:00
Bart Van Assche
5a0ec388ef pktcdvd: Fix pkt_setup_dev() error path
Commit 523e1d399c ("block: make gendisk hold a reference to its queue")
modified add_disk() and disk_release() but did not update any of the
error paths that trigger a put_disk() call after disk->queue has been
assigned. That introduced the following behavior in the pktcdvd driver
if pkt_new_dev() fails:

Kernel BUG at 00000000e98fd882 [verbose debug info unavailable]

Since disk_release() calls blk_put_queue() anyway if disk->queue != NULL,
fix this by removing the blk_cleanup_queue() call from the pkt_setup_dev()
error path.

Fixes: commit 523e1d399c ("block: make gendisk hold a reference to its queue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: <stable@vger.kernel.org> # v3.2
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05 09:03:03 -07:00
Matias Bjørling
74ede5af27 null_blk: remove lightnvm support
With rrpc to be removed, the null_blk lightnvm support is no longer
functional. Remove the lightnvm implementation and maybe add it to
another module in the future if someone takes on the challenge.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05 08:50:12 -07:00
Romain Perier
4695a1ad3a block: DAC960: Replace PCI pool old API
The PCI pool API is deprecated.  Replace the PCI pool old API by the
appropriate function with the DMA pool API.

Tested-by: Peter Senna Tschudin <peter.senna@collabora.com>
Signed-off-by: Romain Perier <romain.perier@collabora.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Senna Tschudin <peter.senna@collabora.com>
2018-01-02 16:09:50 -06:00
Jens Axboe
0864fe09ab null_blk: unalign call_single_data
Commit 966a967116 randomly added alignment to this structure, but
it's actually detrimental to performance of null_blk. Test case:

Running on both the home and remote node shows a ~5% degradation
in performance.

While in there, move blk_status_t to the hole after the integer tag
in the nullb_cmd structure. After this patch, we shrink the size
from 192 to 152 bytes.

Fixes: 966a967116 ("smp: Avoid using two cache lines for struct call_single_data")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20 13:16:33 -07:00
Al Viro
f7765c3646 drbd: switch to sock_recvmsg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-02 20:38:06 -05:00
Linus Torvalds
75f64f68af Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A selection of fixes/changes that should make it into this series.
  This contains:

   - NVMe, two merges, containing:
        - pci-e, rdma, and fc fixes
        - Device quirks

   - Fix for a badblocks leak in null_blk

   - bcache fix from Rui Hua for a race condition regression where
     -EINTR was returned to upper layers that didn't expect it.

   - Regression fix for blktrace for a bug introduced in this series.

   - blktrace cleanup for cgroup id.

   - bdi registration error handling.

   - Small series with cleanups for blk-wbt.

   - Various little fixes for typos and the like.

  Nothing earth shattering, most important are the NVMe and bcache fixes"

* 'for-linus' of git://git.kernel.dk/linux-block: (34 commits)
  nvme-pci: fix NULL pointer dereference in nvme_free_host_mem()
  nvme-rdma: fix memory leak during queue allocation
  blktrace: fix trace mutex deadlock
  nvme-rdma: Use mr pool
  nvme-rdma: Check remotely invalidated rkey matches our expected rkey
  nvme-rdma: wait for local invalidation before completing a request
  nvme-rdma: don't complete requests before a send work request has completed
  nvme-rdma: don't suppress send completions
  bcache: check return value of register_shrinker
  bcache: recover data from backing when data is clean
  bcache: Fix building error on MIPS
  bcache: add a comment in journal bucket reading
  nvme-fc: don't use bit masks for set/test_bit() numbers
  blk-wbt: fix comments typo
  blk-wbt: move wbt_clear_stat to common place in wbt_done
  blk-sysfs: remove NULL pointer checking in queue_wb_lat_store
  blk-wbt: remove duplicated setting in wbt_init
  nvme-pci: add quirk for delay before CHK RDY for WDC SN200
  block: remove useless assignment in bio_split
  null_blk: fix dev->badblocks leak
  ...
2017-12-01 08:05:45 -05:00
David Disseldorp
1addb798e9 null_blk: fix dev->badblocks leak
null_alloc_dev() allocates memory for dev->badblocks, but cleanup
currently only occurs in the configfs release codepath, missing a number
of other places.

This bug was found running the blktests block/010 test, alongside
kmemleak:
rapido1:/blktests# ./check block/010
...
rapido1:/blktests# echo scan > /sys/kernel/debug/kmemleak
[  306.966708] kmemleak: 32 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
rapido1:/blktests# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88001f86d000 (size 4096):
  comm "modprobe", pid 231, jiffies 4294892415 (age 318.252s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814b0379>] kmemleak_alloc+0x49/0xa0
    [<ffffffff810f180f>] kmem_cache_alloc+0x9f/0xe0
    [<ffffffff8124e45f>] badblocks_init+0x2f/0x60
    [<ffffffffa0019fae>] 0xffffffffa0019fae
    [<ffffffffa0021273>] nullb_device_badblocks_store+0x63/0x130 [null_blk]
    [<ffffffff810004cd>] do_one_initcall+0x3d/0x170
    [<ffffffff8109fe0d>] do_init_module+0x56/0x1e9
    [<ffffffff8109ebd7>] load_module+0x1c47/0x26a0
    [<ffffffff8109f819>] SyS_finit_module+0xa9/0xd0
    [<ffffffff814b4f60>] entry_SYSCALL_64_fastpath+0x13/0x94

Fixes: 2f54a613c9 ("nullb: badbblocks support")
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-22 08:43:11 -07:00
Kees Cook
841b86f328 treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
        $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
        $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

The now unused macros are also dropped from include/linux/timer.h.

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 16:35:54 -08:00
Kees Cook
86cb30ec07 treewide: setup_timer() -> timer_setup() (2 field)
This converts all remaining setup_timer() calls that use a nested field
to reach a struct timer_list. Coccinelle does not have an easy way to
match multiple fields, so a new script is needed to change the matches of
"&_E->_timer" into "&_E->_field1._timer" in all the rules.

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup-2fields.cocci

@fix_address_of depends@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _field1;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_field1._timer, NULL, _E);
+timer_setup(&_E->_field1._timer, NULL, 0);
|
-setup_timer(&_E->_field1._timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, NULL, 0);
|
-setup_timer(&_E._field1._timer, NULL, &_E);
+timer_setup(&_E._field1._timer, NULL, 0);
|
-setup_timer(&_E._field1._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _field1;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_field1._timer, _callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, &_callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
 _E->_field1._timer@_stl.function = _callback;
|
 _E->_field1._timer@_stl.function = &_callback;
|
 _E->_field1._timer@_stl.function = (_cast_func)_callback;
|
 _E->_field1._timer@_stl.function = (_cast_func)&_callback;
|
 _E._field1._timer@_stl.function = _callback;
|
 _E._field1._timer@_stl.function = &_callback;
|
 _E._field1._timer@_stl.function = (_cast_func)_callback;
|
 _E._field1._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _field1._timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _field1._timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _field1._timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_field1._timer, _callback, 0);
+setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._field1._timer, _callback, 0);
+setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_field1._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_field1._timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_field1._timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_field1._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_field1._timer
|
-(_cast_data)&_E
+&_E._field1._timer
|
-_E
+&_E->_field1._timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _field1;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_field1._timer, _callback, 0);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, _callback, 0L);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, _callback, 0UL);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, 0);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, 0L);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, 0UL);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_field1._timer, _callback, 0);
+timer_setup(&_field1._timer, _callback, 0);
|
-setup_timer(&_field1._timer, _callback, 0L);
+timer_setup(&_field1._timer, _callback, 0);
|
-setup_timer(&_field1._timer, _callback, 0UL);
+timer_setup(&_field1._timer, _callback, 0);
|
-setup_timer(_field1._timer, _callback, 0);
+timer_setup(_field1._timer, _callback, 0);
|
-setup_timer(_field1._timer, _callback, 0L);
+timer_setup(_field1._timer, _callback, 0);
|
-setup_timer(_field1._timer, _callback, 0UL);
+timer_setup(_field1._timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:09 -08:00
Kees Cook
e99e88a9d2 treewide: setup_timer() -> timer_setup()
This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
 _E->_timer@_stl.function = _callback;
|
 _E->_timer@_stl.function = &_callback;
|
 _E->_timer@_stl.function = (_cast_func)_callback;
|
 _E->_timer@_stl.function = (_cast_func)&_callback;
|
 _E._timer@_stl.function = _callback;
|
 _E._timer@_stl.function = &_callback;
|
 _E._timer@_stl.function = (_cast_func)_callback;
|
 _E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:07 -08:00
Kees Cook
b9eaf18722 treewide: init_timer() -> setup_timer()
This mechanically converts all remaining cases of ancient open-coded timer
setup with the old setup_timer() API, which is the first step in timer
conversions. This has no behavioral changes, since it ultimately just
changes the order of assignment to fields of struct timer_list when
finding variations of:

    init_timer(&t);
    f.function = timer_callback;
    t.data = timer_callback_arg;

to be converted into:

    setup_timer(&t, timer_callback, timer_callback_arg);

The conversion is done with the following Coccinelle script, which
is an improved version of scripts/cocci/api/setup_timer.cocci, in the
following ways:
 - assignments-before-init_timer() cases
 - limit the .data case removal to the specific struct timer_list instance
 - handling calls by dereference (timer->field vs timer.field)

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/setup_timer.cocci

@fix_address_of@
expression e;
@@

 init_timer(
-&(e)
+&e
 , ...)

// Match the common cases first to avoid Coccinelle parsing loops with
// "... when" clauses.

@match_immediate_function_data_after_init_timer@
expression e, func, da;
@@

-init_timer
+setup_timer
 ( \(&e\|e\)
+, func, da
 );
(
-\(e.function\|e->function\) = func;
-\(e.data\|e->data\) = da;
|
-\(e.data\|e->data\) = da;
-\(e.function\|e->function\) = func;
)

@match_immediate_function_data_before_init_timer@
expression e, func, da;
@@

(
-\(e.function\|e->function\) = func;
-\(e.data\|e->data\) = da;
|
-\(e.data\|e->data\) = da;
-\(e.function\|e->function\) = func;
)
-init_timer
+setup_timer
 ( \(&e\|e\)
+, func, da
 );

@match_function_and_data_after_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@

-init_timer
+setup_timer
 ( \(&e\|e\)
+, func, da
 );
 ... when != func = e2
     when != da = e3
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)

@match_function_and_data_before_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)
... when != func = e2
    when != da = e3
-init_timer
+setup_timer
 ( \(&e\|e\)
+, func, da
 );

@r1 exists@
expression t;
identifier f;
position p;
@@

f(...) { ... when any
  init_timer@p(\(&t\|t\))
  ... when any
}

@r2 exists@
expression r1.t;
identifier g != r1.f;
expression e8;
@@

g(...) { ... when any
  \(t.data\|t->data\) = e8
  ... when any
}

// It is dangerous to use setup_timer if data field is initialized
// in another function.
@script:python depends on r2@
p << r1.p;
@@

cocci.include_match(False)

@r3@
expression r1.t, func, e7;
position r1.p;
@@

(
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
... when != func = e7
-t.function = func;
|
-t.function = func;
... when != func = e7
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
|
-init_timer@p(t);
+setup_timer(t, func, 0UL);
... when != func = e7
-t->function = func;
|
-t->function = func;
... when != func = e7
-init_timer@p(t);
+setup_timer(t, func, 0UL);
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:06 -08:00
Kees Cook
24ed960abf treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
This changes all DEFINE_TIMER() callbacks to use a struct timer_list
pointer instead of unsigned long. Since the data argument has already been
removed, none of these callbacks are using their argument currently, so
this renames the argument to "unused".

Done using the following semantic patch:

@match_define_timer@
declarer name DEFINE_TIMER;
identifier _timer, _callback;
@@

 DEFINE_TIMER(_timer, _callback);

@change_callback depends on match_define_timer@
identifier match_define_timer._callback;
type _origtype;
identifier _origarg;
@@

 void
-_callback(_origtype _origarg)
+_callback(struct timer_list *unused)
 { ... }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:05 -08:00
Linus Torvalds
adb072d3cd We have a set of file locking improvements from Zheng, rbd rw/ro
state handling code cleanup from myself and some assorted CephFS fixes
 from Jeff.
 
 rbd now defaults to single-major=Y, lifting the limit of ~240 rbd
 images per host for everyone.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJaEwyIAAoJEEp/3jgCEfOLjgYH/jKJbQ1yJFPyTVTTv/U9/xH2
 kpHykEbzvvTT2TwNspbM9ZK4vSJPjYoHjL2qTRKxybuXYWYPxD2q6x+Z1iRP5G5N
 4Py3RUZaagCSSgbUhfNl3VCbdki6cIKHHz1tHWBuO75kFEg03yZroozzc3SCKH8T
 wHIa7UFxncDRroHMDiF5viF2tz4SfYSB0fd/Kev9qLJOiVr/lUTELfejlsu89ANT
 6UvXPiTd9iifxQxjLV+2eQM4x5JImiDJUhMvcqfDlY2l85LzVCVTPXFnN4ZoEPlt
 4NJj2SnnSQxSZLl1LwJC/gFYepdzW6qSxVqlpkAr0PvazZPushLpMA4AsKxWgVM=
 =qsu2
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.15-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "We have a set of file locking improvements from Zheng, rbd rw/ro state
  handling code cleanup from myself and some assorted CephFS fixes from
  Jeff.

  rbd now defaults to single-major=Y, lifting the limit of ~240 rbd
  images per host for everyone"

* tag 'ceph-for-4.15-rc1' of git://github.com/ceph/ceph-client:
  rbd: default to single-major device number scheme
  libceph: don't WARN() if user tries to add invalid key
  rbd: set discard_alignment to zero
  ceph: silence sparse endianness warning in encode_caps_cb
  ceph: remove the bump of i_version
  ceph: present consistent fsid, regardless of arch endianness
  ceph: clean up spinlocking and list handling around cleanup_cap_releases()
  rbd: get rid of rbd_mapping::read_only
  rbd: fix and simplify rbd_ioctl_set_ro()
  ceph: remove unused and redundant variable dropping
  ceph: mark expected switch fall-throughs
  ceph: -EINVAL on decoding failure in ceph_mdsc_handle_fsmap()
  ceph: disable cached readdir after dropping positive dentry
  ceph: fix bool initialization/comparison
  ceph: handle 'session get evicted while there are file locks'
  ceph: optimize flock encoding during reconnect
  ceph: make lock_to_ceph_filelock() static
  ceph: keep auth cap when inode has flocks or posix locks
2017-11-21 05:38:32 -10:00
Linus Torvalds
06ede5f608 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull more block layer updates from Jens Axboe:
 "A followup pull request, with some parts that either needed a bit more
  testing before going in, merge sync, or just later arriving fixes.
  This contains:

   - Timer related updates from Kees. These were purposefully delayed
     since I didn't want to pull in a later v4.14-rc tag to my block
     tree.

   - ide-cd prep sense buffer fix from Bart. Also delayed, as not to
     clash with the late fix we put into 4.14-rc.

   - Small BFQ updates series from Luca and Paolo.

   - Single nvmet fix from James, fixing a non-functional case there.

   - Bio fast clone fix from Michael, which made bcache return the wrong
     data for some cases.

   - Legacy IO path regression hang fix from Ming"

* 'for-linus' of git://git.kernel.dk/linux-block:
  bio: ensure __bio_clone_fast copies bi_partno
  nvmet_fc: fix better length checking
  block: wake up all tasks blocked in get_request()
  block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP
  block, bfq: update blkio stats outside the scheduler lock
  block, bfq: add missing invocations of bfqg_stats_update_io_add/remove
  doc, block, bfq: update max IOPS sustainable with BFQ
  ide: Make ide_cdrom_prep_fs() initialize the sense buffer pointer
  md: Convert timers to use timer_setup()
  block: swim3: Convert timers to use timer_setup()
  block/aoe: Convert timers to use timer_setup()
  amifloppy: Convert timers to use timer_setup()
  block/floppy: Convert callback to pass timer_list
2017-11-17 10:56:56 -08:00
Linus Torvalds
a3841f94c7 libnvdimm for 4.15
* Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
  'userspace flush' of persistent memory updates via filesystem-dax
   mappings. It arranges for any filesystem metadata updates that may be
   required to satisfy a write fault to also be flushed ("on disk") before
   the kernel returns to userspace from the fault handler. Effectively
   every write-fault that dirties metadata completes an fsync() before
   returning from the fault handler. The new MAP_SHARED_VALIDATE mapping
   type guarantees that the MAP_SYNC flag is validated as supported by the
   filesystem's ->mmap() file operation.
 
 * Add support for the standard ACPI 6.2 label access methods that
   replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This
   enables interoperability with environments that only implement the
   standardized methods.
 
 * Add support for the ACPI 6.2 NVDIMM media error injection methods.
 
 * Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch
   last shutdown status, firmware update, SMART error injection, and
   SMART alarm threshold control.
 
 * Cleanup physical address information disclosures to be root-only.
 
 * Fix revalidation of the DIMM "locked label area" status to support
   dynamic unlock of the label area.
 
 * Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
   (system-physical-address) command and error injection commands.
 
 Acknowledgements that came after the commits were pushed to -next:
 
 957ac8c421 dax: fix PMD faults on zero-length files
 Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
 
 a39e596baa xfs: support for synchronous DAX faults
 Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
 
 7b565c9f96 xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
 Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDfvcAAoJEB7SkWpmfYgCk7sP/2qJhBH+VTTdg2osDnhAdAhI
 co/AGEmsHFlUCMBb/Ek7UnMAmhBYiJU2q4ywPsNFBpusXpMlqNy5Iwo7k4/wQHE/
 SJcIM0g4zg0ViFuUhwV+C2T0R5UzFR8JLd9EYWj/YS6aJpurtotm5l4UStaM0Hzo
 AhxSXJLrBDuqCpbOxbctfiGEmdRL7aRfBEAARTNRKBn/iXxJUcYHlp62rtXQS+t4
 I6LC/URCWTNTTMGmzW6TRsgSD9WMfd19xKcGzN3qL6ee0KFccxN4ctFqHA/sFGOh
 iYLeR0XJUjJxyp+PkWGteXPVZL0Kj3bD/lSTG+Co5bm/ra8a/sh3TSFfgFyoBZD1
 EqMN8Ryf80hGp3FabeH2Iw2SviYPZpHSWgjddjxLD0RA6OmpzINc+Wm8eqApjMME
 sbZDTOijiab4QMQ0XamF4GuDHyQtawv5Y/w2Ehhl1tmiqW+5tKhsKqxkQt+/V3Yt
 RTVSRe2Pkway66b+cD64IdQ6L2tyonPnmi5IzgkKOhlOEGomy+4/U2Jt2bMbhzq6
 ymszKmXp2XI8P06wU8sHrIUeXO5I9qoKn/fZA73Eb8aIzgJe3tBE/5+Ab7RG6HB9
 1OVfcMWoXU1gNgNktTs63X1Lsg4aW9kt/K4fPHHcqUcaliEJpJTlAbg9GLF2buoW
 nQ+0fTRgMRihE3ZA0Fs3
 =h2vZ
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm and dax updates from Dan Williams:
 "Save for a few late fixes, all of these commits have shipped in -next
  releases since before the merge window opened, and 0day has given a
  build success notification.

  The ext4 touches came from Jan, and the xfs touches have Darrick's
  reviewed-by. An xfstest for the MAP_SYNC feature has been through
  a few round of reviews and is on track to be merged.

   - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
     'userspace flush' of persistent memory updates via filesystem-dax
     mappings. It arranges for any filesystem metadata updates that may
     be required to satisfy a write fault to also be flushed ("on disk")
     before the kernel returns to userspace from the fault handler.
     Effectively every write-fault that dirties metadata completes an
     fsync() before returning from the fault handler. The new
     MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
     is validated as supported by the filesystem's ->mmap() file
     operation.

   - Add support for the standard ACPI 6.2 label access methods that
     replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
     This enables interoperability with environments that only implement
     the standardized methods.

   - Add support for the ACPI 6.2 NVDIMM media error injection methods.

   - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
     latch last shutdown status, firmware update, SMART error injection,
     and SMART alarm threshold control.

   - Cleanup physical address information disclosures to be root-only.

   - Fix revalidation of the DIMM "locked label area" status to support
     dynamic unlock of the label area.

   - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
     (system-physical-address) command and error injection commands.

  Acknowledgements that came after the commits were pushed to -next:

   - 957ac8c421 ("dax: fix PMD faults on zero-length files"):
       Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

   - a39e596baa ("xfs: support for synchronous DAX faults") and
     7b565c9f96 ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
        Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>"

* tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
  acpi, nfit: add 'Enable Latch System Shutdown Status' command support
  dax: fix general protection fault in dax_alloc_inode
  dax: fix PMD faults on zero-length files
  dax: stop requiring a live device for dax_flush()
  brd: remove dax support
  dax: quiet bdev_dax_supported()
  fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
  tools/testing/nvdimm: unit test clear-error commands
  acpi, nfit: validate commands against the device type
  tools/testing/nvdimm: stricter bounds checking for error injection commands
  xfs: support for synchronous DAX faults
  xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
  ext4: Support for synchronous DAX faults
  ext4: Simplify error handling in ext4_dax_huge_fault()
  dax: Implement dax_finish_sync_fault()
  dax, iomap: Add support for synchronous faults
  mm: Define MAP_SYNC and VM_SYNC flags
  dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
  dax: Allow dax_iomap_fault() to return pfn
  dax: Fix comment describing dax_iomap_fault()
  ...
2017-11-17 09:51:57 -08:00
Colin Ian King
384bc41fc0 drivers/block/zram/zram_drv.c: make zram_page_end_io() static
zram_page_end_io() is local to the source and does not need to be in
global scope, so make it static.

Cleans up sparse warning:

  symbol 'zram_page_end_io' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20171016173336.20320-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Sergey Senozhatsky
0b07ff3972 zram: remove zlib from the list of recommended algorithms
ZSTD tends to outperform deflate/inflate, thus we remove zlib from the
list of recommended algorithms and recommend zstd instead.

Link: http://lkml.kernel.org/r/20170912050005.3247-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Sergey Senozhatsky
5ef3a8b125 zram: add zstd to the supported algorithms list
Add ZSTD to the list of supported compression algorithms.

ZRAM fio perf test:

                      LZO         DEFLATE         ZSTD

#jobs1
WRITE:              (2180MB/s)   (77.2MB/s)      (1429MB/s)
WRITE:              (1617MB/s)   (77.7MB/s)      (1202MB/s)
READ:                (426MB/s)   (595MB/s)       (1181MB/s)
READ:                (422MB/s)   (572MB/s)       (1020MB/s)
READ:                (318MB/s)   (67.8MB/s)      (563MB/s)
WRITE:               (318MB/s)   (67.9MB/s)      (564MB/s)
READ:                (336MB/s)   (68.3MB/s)      (583MB/s)
WRITE:               (335MB/s)   (68.2MB/s)      (582MB/s)
#jobs2
WRITE:              (3441MB/s)   (152MB/s)       (2141MB/s)
WRITE:              (2507MB/s)   (147MB/s)       (1888MB/s)
READ:                (801MB/s)   (1146MB/s)      (1890MB/s)
READ:                (767MB/s)   (1096MB/s)      (2073MB/s)
READ:                (621MB/s)   (126MB/s)       (1009MB/s)
WRITE:               (621MB/s)   (126MB/s)       (1009MB/s)
READ:                (656MB/s)   (125MB/s)       (1075MB/s)
WRITE:               (657MB/s)   (126MB/s)       (1077MB/s)
#jobs3
WRITE:              (4772MB/s)   (225MB/s)       (3394MB/s)
WRITE:              (3905MB/s)   (211MB/s)       (2939MB/s)
READ:               (1216MB/s)   (1608MB/s)      (3218MB/s)
READ:               (1159MB/s)   (1431MB/s)      (2981MB/s)
READ:                (906MB/s)   (156MB/s)       (1457MB/s)
WRITE:               (907MB/s)   (156MB/s)       (1458MB/s)
READ:                (953MB/s)   (158MB/s)       (1595MB/s)
WRITE:               (952MB/s)   (157MB/s)       (1593MB/s)
#jobs4
WRITE:              (6036MB/s)   (265MB/s)       (4469MB/s)
WRITE:              (5059MB/s)   (263MB/s)       (3951MB/s)
READ:               (1618MB/s)   (2066MB/s)      (4276MB/s)
READ:               (1573MB/s)   (1942MB/s)      (3830MB/s)
READ:               (1202MB/s)   (227MB/s)       (1971MB/s)
WRITE:              (1200MB/s)   (227MB/s)       (1968MB/s)
READ:               (1265MB/s)   (226MB/s)       (2116MB/s)
WRITE:              (1264MB/s)   (226MB/s)       (2114MB/s)
#jobs5
WRITE:              (5339MB/s)   (233MB/s)       (3781MB/s)
WRITE:              (4298MB/s)   (234MB/s)       (3276MB/s)
READ:               (1626MB/s)   (2048MB/s)      (4081MB/s)
READ:               (1567MB/s)   (1929MB/s)      (3758MB/s)
READ:               (1174MB/s)   (205MB/s)       (1747MB/s)
WRITE:              (1173MB/s)   (204MB/s)       (1746MB/s)
READ:               (1214MB/s)   (208MB/s)       (1890MB/s)
WRITE:              (1215MB/s)   (208MB/s)       (1892MB/s)
#jobs6
WRITE:              (5666MB/s)   (270MB/s)       (4338MB/s)
WRITE:              (4828MB/s)   (267MB/s)       (3772MB/s)
READ:               (1803MB/s)   (2058MB/s)      (4946MB/s)
READ:               (1805MB/s)   (2156MB/s)      (4711MB/s)
READ:               (1334MB/s)   (235MB/s)       (2135MB/s)
WRITE:              (1335MB/s)   (235MB/s)       (2137MB/s)
READ:               (1364MB/s)   (236MB/s)       (2268MB/s)
WRITE:              (1365MB/s)   (237MB/s)       (2270MB/s)
#jobs7
WRITE:              (5474MB/s)   (270MB/s)       (4300MB/s)
WRITE:              (4666MB/s)   (266MB/s)       (3817MB/s)
READ:               (2022MB/s)   (2319MB/s)      (5472MB/s)
READ:               (1924MB/s)   (2260MB/s)      (5031MB/s)
READ:               (1369MB/s)   (242MB/s)       (2153MB/s)
WRITE:              (1370MB/s)   (242MB/s)       (2155MB/s)
READ:               (1499MB/s)   (246MB/s)       (2310MB/s)
WRITE:              (1497MB/s)   (246MB/s)       (2307MB/s)
#jobs8
WRITE:              (5558MB/s)   (273MB/s)       (4439MB/s)
WRITE:              (4763MB/s)   (271MB/s)       (3918MB/s)
READ:               (2201MB/s)   (2599MB/s)      (6062MB/s)
READ:               (2105MB/s)   (2463MB/s)      (5413MB/s)
READ:               (1490MB/s)   (252MB/s)       (2238MB/s)
WRITE:              (1488MB/s)   (252MB/s)       (2236MB/s)
READ:               (1566MB/s)   (254MB/s)       (2434MB/s)
WRITE:              (1568MB/s)   (254MB/s)       (2437MB/s)
#jobs9
WRITE:              (5120MB/s)   (264MB/s)       (4035MB/s)
WRITE:              (4531MB/s)   (267MB/s)       (3740MB/s)
READ:               (1940MB/s)   (2258MB/s)      (4986MB/s)
READ:               (2024MB/s)   (2387MB/s)      (4871MB/s)
READ:               (1343MB/s)   (246MB/s)       (2038MB/s)
WRITE:              (1342MB/s)   (246MB/s)       (2037MB/s)
READ:               (1553MB/s)   (238MB/s)       (2243MB/s)
WRITE:              (1552MB/s)   (238MB/s)       (2242MB/s)
#jobs10
WRITE:              (5345MB/s)   (271MB/s)       (3988MB/s)
WRITE:              (4750MB/s)   (254MB/s)       (3668MB/s)
READ:               (1876MB/s)   (2363MB/s)      (5150MB/s)
READ:               (1990MB/s)   (2256MB/s)      (5080MB/s)
READ:               (1355MB/s)   (250MB/s)       (2019MB/s)
WRITE:              (1356MB/s)   (251MB/s)       (2020MB/s)
READ:               (1490MB/s)   (252MB/s)       (2202MB/s)
WRITE:              (1488MB/s)   (252MB/s)       (2199MB/s)

jobs1                              perfstat
instructions                 52,065,555,710 (    0.79)    855,731,114,587 (    2.64)       54,280,709,944 (    1.40)
branches                     14,020,427,116 ( 725.847)    101,733,449,582 (1074.521)       11,170,591,067 ( 992.869)
branch-misses                    22,626,174 (   0.16%)        274,197,885 (   0.27%)           25,915,805 (   0.23%)
jobs2                              perfstat
instructions                103,633,110,402 (    0.75)  1,710,822,100,914 (    2.59)      107,879,874,104 (    1.28)
branches                     27,931,237,282 ( 679.203)    203,298,267,479 (1037.326)       22,185,350,842 ( 884.427)
branch-misses                    46,103,811 (   0.17%)        533,747,204 (   0.26%)           49,682,483 (   0.22%)
jobs3                              perfstat
instructions                154,857,283,657 (    0.76)  2,565,748,974,197 (    2.57)      161,515,435,813 (    1.31)
branches                     41,759,490,355 ( 670.529)    304,905,605,277 ( 978.765)       33,215,805,907 ( 888.003)
branch-misses                    74,263,293 (   0.18%)        759,746,240 (   0.25%)           76,841,196 (   0.23%)
jobs4                              perfstat
instructions                206,215,849,076 (    0.75)  3,420,169,460,897 (    2.60)      215,003,061,664 (    1.31)
branches                     55,632,141,739 ( 666.501)    406,394,977,433 ( 927.241)       44,214,322,251 ( 883.532)
branch-misses                   102,287,788 (   0.18%)      1,098,617,314 (   0.27%)          103,891,040 (   0.23%)
jobs5                              perfstat
instructions                258,711,315,588 (    0.67)  4,275,657,533,244 (    2.23)      269,332,235,685 (    1.08)
branches                     69,802,821,166 ( 588.823)    507,996,211,252 ( 797.036)       55,450,846,129 ( 735.095)
branch-misses                   129,217,214 (   0.19%)      1,243,284,991 (   0.24%)          173,512,278 (   0.31%)
jobs6                              perfstat
instructions                312,796,166,008 (    0.61)  5,133,896,344,660 (    2.02)      323,658,769,588 (    1.04)
branches                     84,372,488,583 ( 520.541)    610,310,494,402 ( 697.642)       66,683,292,992 ( 693.939)
branch-misses                   159,438,978 (   0.19%)      1,396,368,563 (   0.23%)          174,406,934 (   0.26%)
jobs7                              perfstat
instructions                363,211,372,930 (    0.56)  5,988,205,600,879 (    1.75)      377,824,674,156 (    0.93)
branches                     98,057,013,765 ( 463.117)    711,841,255,974 ( 598.762)       77,879,009,954 ( 600.443)
branch-misses                   199,513,153 (   0.20%)      1,507,651,077 (   0.21%)          248,203,369 (   0.32%)
jobs8                              perfstat
instructions                413,960,354,615 (    0.52)  6,842,918,558,378 (    1.45)      431,938,486,581 (    0.83)
branches                    111,812,574,884 ( 414.224)    813,299,084,518 ( 491.173)       89,062,699,827 ( 517.795)
branch-misses                   233,584,845 (   0.21%)      1,531,593,921 (   0.19%)          286,818,489 (   0.32%)
jobs9                              perfstat
instructions                465,976,220,300 (    0.53)  7,698,467,237,372 (    1.47)      486,352,600,321 (    0.84)
branches                    125,931,456,162 ( 424.063)    915,207,005,715 ( 498.192)      100,370,404,090 ( 517.439)
branch-misses                   256,992,445 (   0.20%)      1,782,809,816 (   0.19%)          345,239,380 (   0.34%)
jobs10                             perfstat
instructions                517,406,372,715 (    0.53)  8,553,527,312,900 (    1.48)      540,732,653,094 (    0.84)
branches                    139,839,780,676 ( 427.732)  1,016,737,699,389 ( 503.172)      111,696,557,638 ( 516.750)
branch-misses                   259,595,561 (   0.19%)      1,952,570,279 (   0.19%)          357,818,661 (   0.32%)

seconds elapsed        20.630411534     96.084546565    12.743373571
seconds elapsed        22.292627625     100.984155001   14.407413560
seconds elapsed        22.396016966     110.344880848   14.032201392
seconds elapsed        22.517330949     113.351459170   14.243074935
seconds elapsed        28.548305104     156.515193765   19.159286861
seconds elapsed        30.453538116     164.559937678   19.362492717
seconds elapsed        33.467108086     188.486827481   21.492612173
seconds elapsed        35.617727591     209.602677783   23.256422492
seconds elapsed        42.584239509     243.959902566   28.458540338
seconds elapsed        47.683632526     269.635248851   31.542404137

Over all, ZSTD has slower WRITE, but much faster READ (perhaps
a static compression buffer used during the test helped ZSTD a
lot), which results in faster test results.

Memory consumption (zram mm_stat file):

zram LZO mm_stat
mm_stat (jobs1): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs2): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs3): 2147483648 23068672 33558528        0 33562624        0        0
mm_stat (jobs4): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs5): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs6): 2147483648 23068672 33558528        0 33562624        0        0
mm_stat (jobs7): 2147483648 23068672 33558528        0 33566720        0        0
mm_stat (jobs8): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs9): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs10): 2147483648 23068672 33558528        0 33562624        0        0

zram DEFLATE mm_stat
mm_stat (jobs1): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs2): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs3): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs4): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs5): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs6): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs7): 2147483648 16252928 25178112        0 25190400        0        0
mm_stat (jobs8): 2147483648 16252928 25178112        0 25190400        0        0
mm_stat (jobs9): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs10): 2147483648 16252928 25178112        0 25178112        0        0

zram ZSTD mm_stat
mm_stat (jobs1): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs2): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs3): 2147483648 11010048 16781312        0 16785408        0        0
mm_stat (jobs4): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs5): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs6): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs7): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs8): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs9): 2147483648 11010048 16781312        0 16785408        0        0
mm_stat (jobs10): 2147483648 11010048 16781312        0 16781312        0        0

==================================================================================

Official benchmarks [1]:

Compressor name         Ratio   Compression     Decompress.
zstd 1.1.3 -1           2.877   430 MB/s        1110 MB/s
zlib 1.2.8 -1           2.743   110 MB/s        400 MB/s
brotli 0.5.2 -0         2.708   400 MB/s        430 MB/s
quicklz 1.5.0 -1        2.238   550 MB/s        710 MB/s
lzo1x 2.09 -1           2.108   650 MB/s        830 MB/s
lz4 1.7.5               2.101   720 MB/s        3600 MB/s
snappy 1.1.3            2.091   500 MB/s        1650 MB/s
lzf 3.6 -1              2.077   400 MB/s        860 MB/s

Minchan said:

: I did test with my sample data and compared zstd with deflate.  zstd's
: compress ratio is lower a little bit but compression speed is much faster
: 3 times more and decompress speed is too 2 times more.  With different
: data, it is different but overall, zstd would be better for speed at the
: cost of a little lower compress ratio(about 5%) so I believe it's worth to
: replace deflate.

[1] https://github.com/facebook/zstd

Link: http://lkml.kernel.org/r/20170912050005.3247-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Tested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Minchan Kim
23c47d2ada bdi: introduce BDI_CAP_SYNCHRONOUS_IO
As discussed at

  https://lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com>

someday we will remove rw_page().  If so, we need something to detect
such super-fast storage on which synchronous IO operations like the
current rw_page are always a win.

Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices.  With it, we
could use various optimization techniques.

Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Minchan Kim
e447a0151f zram: set BDI_CAP_STABLE_WRITES once
With fast swap storage, the platform wants to use swap more aggressively
and swap-in is crucial to application latency.

The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage.  When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%.  Maybe, it would be
bigger in nvdimm.

This patchset reduces swap-in latency by skipping swapcache if the swap
device is a synchronous device like a rw_page() based device.

It enhances by 45% my swapin test (5G sequential swapin, no readahead)
from 2.41sec to 1.64sec.

This patch (of 4):

Commit 19b7ccf865 ("block: get rid of blk_integrity_revalidate()")
fixed a weird thing (i.e., reset BDI_CAP_STABLE_WRITES flag
unconditionally whenever revalidat_disk is called) so zram doesn't need
to reset the flag any more when revalidating the bdev.  Instead, set the
flag just once when the zram device is created.

It shouldn't change any behavior.

Link: http://lkml.kernel.org/r/1505886205-9671-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Kees Cook
b5775a6ba3 block: swim3: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-14 20:11:57 -07:00
Kees Cook
0e0cc9df86 block/aoe: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Ed L. Cashin" <ed.cashin@acm.org>
Cc: linux-block@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-14 20:11:57 -07:00
Kees Cook
cbb9d17875 amifloppy: Convert timers to use timer_setup()
This converts the amifloppy driver to pass the timer pointer to the
callback instead of the drive number (and flags). It eliminates the
decusagecounter flag, as it was unused, and drops the ininterrupt flag
which appeared to be a needless optimization. The drive can then be
calculated from the offset of the timer in the drive timer array.

Additionally moves to a static data variable instead of the
soon-to-be-gone timer->data field.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-14 20:11:57 -07:00
Kees Cook
b1bf42105a block/floppy: Convert callback to pass timer_list
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to passing in the timer pointer explicitly.
Calculate the drive from the offset of the timer in the timer list.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Geliang Tang <geliangtang@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-14 20:10:35 -07:00
Linus Torvalds
e37e0ee019 A couple of dma-mapping updates:
- turn dma_cache_sync into a dma_map_ops instance and remove
    implementation that purely are dead because the architecture
    doesn't support noncoherent allocations
  - add a flag for busses that need DMA configuration (Robin Murphy)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSrYLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOMuQ//XXD94uNPYavrgXzGsAtg+I+LEm+xyk4T0dX5fxfj
 amXX49MHoGemjsBgzJlkQMMFqwDEdkKyEuFnEuy6OeowYCyD6zW0MJ3MwP9OosNJ
 PNTdGZIfSvxPYEW8cR9AdK3iQ2loMBZnYhd+O/oVjSugULLW2DNa7r2VRktcCKoh
 8Ob/8gL6Y9xEYJBRszhrBwKTa/hU8IThxxozBFzN7I3LIKyFboSTcwXGLAHow43g
 4anCTjWTaDcoU2JwY6UTRKRRTV+gD0ZRcsZfd8lNNb5rtMVZkBVOHbF14SMAmw1r
 kSgRcU3+WIFPhK/8wBYqtGZZGnOgFBTHVeqow3AdS728pBWlWl8niTK0DiIgCd3m
 qzScF6SqfN1bCZkZAy8FUV2l0DPYKS6lvyNkf00Eb2W/f6LEqAcjCi2QDDxRfaw+
 Vm97nPUiM+uXNy/6KtAy6ChdprSqx12/edXPp7Y3H2rS/+Dmr6exeix+wb7QUN8W
 JI7ZRHo4JLaJZk/XrZtGX/6jnN1Jo7vfApQOmYDY7kE1iGtOU/LQQj8gcZRVQxML
 4soN6ivSmZX2k03LabWHpYQ8QiyCSYChLC+Az7rQH47LDLeu1IdTJu6orpXpaxyo
 ymzEWlHbmF7mE66X4g/Up/eAYk2YLUA3rKLGVjAIaWDBzHftSFg5EaAnqMADC1G2
 hSo=
 =ALJf
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - turn dma_cache_sync into a dma_map_ops instance and remove
   implementation that purely are dead because the architecture doesn't
   support noncoherent allocations

 - add a flag for busses that need DMA configuration (Robin Murphy)

* tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: turn dma_cache_sync into a dma_map_ops method
  sh: make dma_cache_sync a no-op
  xtensa: make dma_cache_sync a no-op
  unicore32: make dma_cache_sync a no-op
  powerpc: make dma_cache_sync a no-op
  mn10300: make dma_cache_sync a no-op
  microblaze: make dma_cache_sync a no-op
  ia64: make dma_cache_sync a no-op
  frv: make dma_cache_sync a no-op
  x86: make dma_cache_sync a no-op
  floppy: consolidate the dummy fd_cacheflush definition
  drivers: flag buses which demand DMA configuration
2017-11-14 16:54:12 -08:00
Dan Williams
7a862fbbde brd: remove dax support
DAX support in brd is awkward because its backing page frames are
distinct from the ones provided by pmem, dcssblk, or axonram. We need
pfn_t_devmap() entries to fully support DAX, and the limited DAX support
for pfn_t_special() page frames is not interesting for brd when pmem is
already a superset of brd.  Lastly, brd is the only dax capable driver
that may sleep in its ->direct_access() implementation. So it causes a
global burden with no net gain of kernel functionality.

For all these reasons, remove DAX support.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-14 16:49:06 -08:00
Linus Torvalds
e2c5923c34 Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
 "This is the main pull request for block storage for 4.15-rc1.

  Nothing out of the ordinary in here, and no API changes or anything
  like that. Just various new features for drivers, core changes, etc.
  In particular, this pull request contains:

   - A patch series from Bart, closing the whole on blk/scsi-mq queue
     quescing.

   - A series from Christoph, building towards hidden gendisks (for
     multipath) and ability to move bio chains around.

   - NVMe
        - Support for native multipath for NVMe (Christoph).
        - Userspace notifications for AENs (Keith).
        - Command side-effects support (Keith).
        - SGL support (Chaitanya Kulkarni)
        - FC fixes and improvements (James Smart)
        - Lots of fixes and tweaks (Various)

   - bcache
        - New maintainer (Michael Lyle)
        - Writeback control improvements (Michael)
        - Various fixes (Coly, Elena, Eric, Liang, et al)

   - lightnvm updates, mostly centered around the pblk interface
     (Javier, Hans, and Rakesh).

   - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

   - Writeback series that fix the much discussed hundreds of millions
     of sync-all units. This goes all the way, as discussed previously
     (me).

   - Fix for missing wakeup on writeback timer adjustments (Yafang
     Shao).

   - Fix laptop mode on blk-mq (me).

   - {mq,name} tupple lookup for IO schedulers, allowing us to have
     alias names. This means you can use 'deadline' on both !mq and on
     mq (where it's called mq-deadline). (me).

   - blktrace race fix, oopsing on sg load (me).

   - blk-mq optimizations (me).

   - Obscure waitqueue race fix for kyber (Omar).

   - NBD fixes (Josef).

   - Disable writeback throttling by default on bfq, like we do on cfq
     (Luca Miccio).

   - Series from Ming that enable us to treat flush requests on blk-mq
     like any other request. This is a really nice cleanup.

   - Series from Ming that improves merging on blk-mq with schedulers,
     getting us closer to flipping the switch on scsi-mq again.

   - BFQ updates (Paolo).

   - blk-mq atomic flags memory ordering fixes (Peter Z).

   - Loop cgroup support (Shaohua).

   - Lots of minor fixes from lots of different folks, both for core and
     driver code"

* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
  nvme: fix visibility of "uuid" ns attribute
  blk-mq: fixup some comment typos and lengths
  ide: ide-atapi: fix compile error with defining macro DEBUG
  blk-mq: improve tag waiting setup for non-shared tags
  brd: remove unused brd_mutex
  blk-mq: only run the hardware queue if IO is pending
  block: avoid null pointer dereference on null disk
  fs: guard_bio_eod() needs to consider partitions
  xtensa/simdisk: fix compile error
  nvme: expose subsys attribute to sysfs
  nvme: create 'slaves' and 'holders' entries for hidden controllers
  block: create 'slaves' and 'holders' entries for hidden gendisks
  nvme: also expose the namespace identification sysfs files for mpath nodes
  nvme: implement multipath access to nvme subsystems
  nvme: track shared namespaces
  nvme: introduce a nvme_ns_ids structure
  nvme: track subsystems
  block, nvme: Introduce blk_mq_req_flags_t
  block, scsi: Make SCSI quiesce and resume work reliably
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  ...
2017-11-14 15:32:19 -08:00
Linus Torvalds
abc36be236 A couple of configfs cleanups:
- proper use of the bool type (Thomas Meyer)
   - constification of struct config_item_type (Bhumika Goyal)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSTALHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNxfhAAv3cunxiEPEAvs+1xuGd3cZYaxz7qinvIODPxIKoF
 kRWiuy5PUklRMnJ8seOgJ1p1QokX6Sk4cZ8HcctDJVByqODjOq4K5eaKVN1ZqJoz
 BUzO/gOqfs64r9yaFIlKfe8nFA+gpUftSeWyv3lThxAIJ1iSbue7OZ/A10tTOS1m
 RWp9FPepFv+nJMfWqeQU64BsoDQ4kgZ2NcEA+jFxNx5dlmIbLD49tk0lfddvZQXr
 j5WyAH73iugilLtNUGVOqSzHBY4kUvfCKUV7leirCegyMoGhFtA87m6Wzwbo6ZUI
 DwQLzWvuPaGv1P2PpNEHfKiNbfIEp75DRyyyf87DD3lc5ffAxQSm28mGuwcr7Rn5
 Ow/yWL6ERMzCLExoCzEkXYJISy7T5LIzYDgNggKMpeWxysAduF7Onx7KfW1bTuhK
 mHvY7iOXCjEvaIVaF8uMKE6zvuY1vCMRXaJ+kC9jcIE3gwhg+2hmQvrdJ2uAFXY+
 rkeF2Poj/JlblPU4IKWAjiPUbzB7Lv0gkypCB2pD4riaYIN5qCAgF8ULIGQp2hsO
 lYW1EEgp5FBop85oSO/HAGWeH9dFg0WaV7WqNRVv0AGXhKjgy+bVd7iYPpvs7mGw
 z9IqSQDORcG2ETLcFhZgiJpCk/itwqXBD+wgMOjJPP8lL+4kZ8FcuhtY9kc9WlJE
 Tew=
 =+tMO
 -----END PGP SIGNATURE-----

Merge tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs

Pull configfs updates from Christoph Hellwig:
 "A couple of configfs cleanups:

   - proper use of the bool type (Thomas Meyer)

   - constification of struct config_item_type (Bhumika Goyal)"

* tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs:
  RDMA/cma: make config_item_type const
  stm class: make config_item_type const
  ACPI: configfs: make config_item_type const
  nvmet: make config_item_type const
  usb: gadget: configfs: make config_item_type const
  PCI: endpoint: make config_item_type const
  iio: make function argument and some structures const
  usb: gadget: make config_item_type structures const
  dlm: make config_item_type const
  netconsole: make config_item_type const
  nullb: make config_item_type const
  ocfs2/cluster: make config_item_type const
  target: make config_item_type const
  configfs: make ci_type field, some pointers and function arguments const
  configfs: make config_item_type const
  configfs: Fix bool initialization/comparison
2017-11-14 14:44:04 -08:00