kernel-ark

Author	SHA1	Message	Date
Matthew Wilcox	1c2ad9faaf	NVMe: Simplify nvme_unmap_user_pages By using the iod->nents field (the same way other I/O paths do), we can avoid recalculating the number of sg entries at unmap time, and make nvme_unmap_user_pages() easier to call. Also, use the 'write' parameter instead of assuming DMA_FROM_DEVICE. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:54:22 -05:00
Matthew Wilcox	fe304c43c6	NVMe: Mark the end of the sg list For user I/O and admin commands, we were forgetting to mark the end of the SG list. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:54:14 -05:00
Matthew Wilcox	497421880a	NVMe: Fix DMA mapping for admin commands We were always mapping as DMA_FROM_DEVICE then unmapping with DMA_TO_DEVICE which was clearly not correct. Follow the same pattern as nvme_submit_io() and key off the bottom bit of the opcode to determine whether this is a read or a write. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:54:05 -05:00
Matthew Wilcox	ff976d724a	NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT IO_TIMEOUT is a little too generic and might be used by other parts of the kernel in the future. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:53:54 -05:00
Matthew Wilcox	eca18b2394	NVMe: Merge the nvme_bio and nvme_prp data structures The new merged data structure is called nvme_iod. This improves performance for mid-sized I/Os (in the 16k range) since we save a memory allocation. It is also a slightly simpler interface to use. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:51:20 -05:00
Matthew Wilcox	5c1281a3bf	NVMe: Change nvme_completion_fn to take a dev The queue is only needed for some rare occasions, and it's more consistent to pass the device around. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:51:00 -05:00
Matthew Wilcox	040a93b52a	NVMe: Change get_nvmeq to take a dev instead of a namespace Upcoming patches require calling get_nvmeq when we don't have a namespace. Some callers already have the device in a local variable anyway. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:49:18 -05:00
Matthew Wilcox	c2f5b65020	NVMe: Simplify completion handling Instead of encoding the handler type in the bottom two bits of the per-completion context pointer, store the handler function as well as the context pointer. This gives us more flexibility and the code is clearer. It comes at the cost of an extra 8k of memory per queue, but this feels like a reasonable price to pay. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:47:46 -05:00
Matthew Wilcox	010e646ba2	NVMe: Update Identify Controller data structure The driver was still using an old definition of Identify Controller which only came to light once we started using the 'number of namespaces' field properly. Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Reported-by: Khosrow Panah <Khosrow.Panah@idt.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 16:24:23 -04:00
Matthew Wilcox	f1938f6e1e	NVMe: Implement doorbell stride capability The doorbell stride allows devices to spread out their doorbells instead of packing them tightly. This feature was added as part of ECN 003. This patch also enables support for more than 512 queues :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:05 -04:00
Matthew Wilcox	ce38c14957	NVMe: Version 0.7 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:05 -04:00
Matthew Wilcox	2b2c189687	NVMe: Don't probe namespace 0 ECN 001 documented that namespace 0 is not valid. Sending an Identify with CNS of 0 and Namespace of 0 is an undefined command. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Nisheeth Bhat	0d1bc91258	Fix calculation of number of pages in a PRP List The existing calculation underestimated the number of pages required as it did not take into account the pointer at the end of each page. The replacement calculation may overestimate the number of pages required if the last page in the PRP List is entirely full. By using ->npages as a counter as we fill in the pages, we ensure that we don't try to free a page that was never allocated. Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	bc5fc7e4b2	NVMe: Create nvme_identify and nvme_get_features functions Instead of open-coding calls to nvme_submit_admin_cmd, these small wrappers are simpler to use (the patch removes 14 lines from nvme_dev_add() for example). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	684f5c2025	NVMe: Fix memory leak in nvme_dev_add() The driver was allocating 8k of memory, then freeing 4k of it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Nisheeth Bhat	d1a490e026	NVMe: Fix calls to dma_unmap_sg dma_unmap_sg() must be called with the same 'nents' passed to dma_map_sg(), not the number returned from dma_map_sg(). Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	d0ba1e497b	NVMe: Correct sg list setup in nvme_map_user_pages Our SG list was constructed to always fill the entire first page, even if that was more than the length of the I/O. This is probably harmless, but some IOMMUs might do something bad. Correcting the first call to sg_set_page() made it look a lot closer to the sg_set_page() in the loop, so fold the first call to sg_set_page() into the loop. Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	6413214c5d	Fix bug in NVME_IOCTL_SUBMIT_IO Missing 'break' in the switch statement meant that we'd fall through to the 'return -EINVAL' case.	2011-11-04 15:53:04 -04:00
Matthew Wilcox	6bbf1acdde	NVMe: Rework ioctls Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD ioctl that can submit any admin command. Add a new ID ioctl that returns the namespace ID of the queried device. It corresponds to the SCSI Idlun ioctl. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	eac623ba7a	NVMe: Add the nvme thread to the wait queue before waking it up If the I/O was not completed by a single NVMe command, we add the bio to the congestion list and wake up the kthread to resubmit it. But the kthread calls remove_wait_queue() unconditionally, which will oops if it's not on the wait queue. So add the kthread to the wait queue before waking it up. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	6f0f54499f	NVMe: Return real error from nvme_create_queue nvme_setup_io_queues() was assuming that a NULL return from nvme_create_queue() was an out-of-memory error. That's not necessarily true; the adapter might return -EIO, for example. Change the calling convention to return an ERR_PTR on failure instead of NULL. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	be5e094840	NVMe: Version 0.6 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	184d2944cb	NVMe: Add a few calling convention notes For the benefit of reviewers, add comments to a few functions describing their calling context Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	b77954cbdd	NVMe: Handle failures from memory allocations in nvme_setup_prps If any of the memory allocations in nvme_setup_prps fail, handle it by modifying the passed-in data length to reflect the number of bytes we are actually able to send. Also allow the caller to specify the GFP flags they need; for user-initiated commands, we can use GFP_KERNEL allocations. The various callers are updated to handle this possibility; the main I/O path is already prepared for this possibility (as it may happen due to nvme_map_bio being unable to map all the segments of the I/O). The other callers return -ENOMEM instead of doing partial I/Os. Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	5aff9382dd	NVMe: Use an IDA to allocate minor numbers The current approach of using the namespace ID as the minor number doesn't work when there are multiple adapters in the machine. Rather than statically partitioning the number of namespaces between adapters, dynamically allocate minor numbers to namespaces as they are detected. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	fd63e9ceee	NVMe: Add include of delay.h for msleep Previously it was being implicitly included through some other header file Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	8de055350f	NVMe: Add support for timing out I/Os In the kthread, walk the list of outstanding I/Os and check they've not hit the timeout. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	21075bdee0	NVMe: Rename cancel_cmdid_data to cancel_cmdid The trailing '_data' on the end was annoying and inconsistent. Also, make it actually return the data since this is needed for timing out commands. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	09a58f5364	NVMe: Fix bug in error handling When an I/O completed with an error, we would call bio_endio twice (once with -EIO and once with 0). Found by inspection. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	22605f9681	NVMe: Time out initialisation after a few seconds The device reports (in its capability register) how long it will take to initialise. If that time elapses before the ready bit becomes set, conclude the device is broken and refuse to initialise it. Log a nice error message so the user knows why we did nothing. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	aba2080f3f	NVMe: Fix warning in free_irq We need to clear the affinity mask before calling free_irq() Reported-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	7f53f9d242	NVMe: Correct the Controller Configuration settings The arbitration field was extended by one bit, shifting the shutdown notification bits by one. Also, the SQ/CQ entry size was made configurable for future extensions. Reported-by: Paul Luse <paul.e.luse@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	8ef700678f	NVMe: Version 0.5 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	6c7d49455c	NVMe: Change the definition of nvme_user_io The read and write commands don't define a 'result', so there's no need to copy it back to userspace. Remove the ability of the ioctl to submit commands to a different namespace; it's just asking for trouble, and the use case I have in mind will be addressed through a different ioctl in the future. That removes the need for both the block_shift and nsid arguments. Check that the opcode is one of 'read' or 'write'. Future opcodes may be added in the future, but we will need a different structure definition for them. The nblocks field is redefined to be 0-based. This allows the user to request the full 65536 blocks. Don't byteswap the reftag, apptag and appmask. Martin Petersen tells me these are calculated in big-endian and are transmitted to the device in big-endian. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	9d4af1b779	NVMe: Correct the definitions of two ioctls NVME_IOCTL_SUBMIT_IO has a struct nvme_user_io, not a struct nvme_rw_command as a parameter, and NVME_IOCTL_DOWNLOAD_FW is a Write, not a Read. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	4948168280	NVMe: Add compat_ioctl Make ioctls work for 32-bit applications on 64-bit kernels. The structures are defined to be the same for both 32- and 64-bit applications, so we can use the same handler for both. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	9ecdc94621	NVMe: Simplify queue lookup Fill in all the num_possible_cpus() entries with duplicate pointers. This reduces the complexity of the frequently-called get_nvmeq(), as well as avoiding a bug in it when there are fewer queues than CPUs. Reported-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	3cb967c039	NVMe: Remove the kthread from the wait queue Once there are no more bios on the congestion list, we can stop waking up the nvme kthread every time a completion happens. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	7523d834dd	NVMe: Fix off-by-one when filling in PRP lists If the last element in the PRP list fits on the end of the page, there's no need to allocate an extra page to put that single element in. It can fit on the end of the page. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	ac88c36a38	NVMe: Fix interpretation of 'Number of Namespaces' field The spec says this is a 0s based value. We don't need to handle the maximal value because it's reserved to mean "every namespace". Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	19e899b2f9	NVMe: Remove outdated comments The head can never overrun the tail since we won't allocate enough command IDs to let that happen. The status codes are in sync with the spec. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	fa92282149	NVMe: Fix comment formatting Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	714a7a2288	NVMe: Convert comments to kernel-doc notation Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Krzysztof Wierzbicki	2ddc4f74d8	NVMe: Update admin opcodes to match the 1.0RC spec Signed-off-by: Krzysztof Wierzbicki <krzysztof.wierzbicki@intel.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	b57ab0fada	NVMe: Version 0.4 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	e6d15f79f9	NVMe: Reduce maximum queue depth by 1 The spec says we're not allowed to completely fill the submission queue. Solve this by reducing the number of allocatable cmdids by 1. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	d8ee9d69f2	NVMe: Fix discontiguous accesses When we submit subsequent portions of the I/O, we need to access the updated block, not start reading again from the original position. This was showing up as miscompares in the XFS randholes testcase. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	1ad2f8932a	NVMe: Handle bios that contain non-virtually contiguous addresses NVMe scatterlists must be virtually contiguous, like almost all I/Os. However, when the filesystem lays out files with a hole, it can be that adjacent LBAs map to non-adjacent virtual addresses. Handle this by submitting one NVMe command at a time for each virtually discontiguous range. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	00df5cb4eb	NVMe: Implement Flush Linux implements Flush as a bit in the bio. That means there may also be data associated with the flush; if so the flush should be sent before the data. To avoid completing the bio twice, I add CMD_CTX_FLUSH to indicate the completion routine should do nothing. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	c42705592b	NVMe: Mark CMD_CTX_CANCELLED as being unlikely Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
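
The PRP List page-count fixes in the log above (0d1bc91258 and 7523d834dd) come down to a small piece of arithmetic. The sketch below is not the driver's code. It assumes 4096-byte pages and 8-byte PRP entries, and every name in it (`PRP_PAGE_SIZE`, `prp_pages_naive`, `prp_pages_conservative`, `prp_pages_exact`) is hypothetical. It compares a calculation that ignores the chaining pointer at the end of each page, one that reserves a slot in every page, and an exact count in which the last page may use all of its slots.

```c
/* Illustrative assumptions only: 4 KiB pages and 8-byte PRP entries. */
#define PRP_PAGE_SIZE   4096u
#define PRP_ENTRY_SIZE  8u
#define PRP_PER_PAGE    (PRP_PAGE_SIZE / PRP_ENTRY_SIZE)   /* 512 slots per page */

/* Integer division, rounding up. */
static unsigned div_round_up(unsigned n, unsigned d)
{
	return (n + d - 1) / d;
}

/*
 * Ignores the pointer that chains one list page to the next,
 * so it can underestimate the number of pages.
 */
static unsigned prp_pages_naive(unsigned nprps)
{
	return div_round_up(nprps, PRP_PER_PAGE);
}

/*
 * Reserves one slot in every page for the chain pointer. This is never
 * too small, but it is one page too many when the entries would exactly
 * fill the last page.
 */
static unsigned prp_pages_conservative(unsigned nprps)
{
	return div_round_up(nprps, PRP_PER_PAGE - 1);
}

/*
 * Exact count: every page except the last gives up one slot to the
 * chain pointer; the last page can hold PRP_PER_PAGE entries.
 */
static unsigned prp_pages_exact(unsigned nprps)
{
	if (nprps == 0)
		return 0;
	if (nprps <= PRP_PER_PAGE)
		return 1;
	return div_round_up(nprps - 1, PRP_PER_PAGE - 1);
}
```

Under these assumptions, 1024 entries need three pages (511 + 511 + 2), while the naive count gives two. For exactly 512 entries the conservative count gives two pages where one is enough, which is the kind of overestimate the commit message for 0d1bc91258 describes.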


264521 Commits