In the ARC 700 days performance counters had no interrupt support, so
on ARC only non-sampling events were supported. Put simply, only
"perf stat" was functional.
ARC HS performance counters can raise interrupts, and this change adds
support for them.
ARC performance counters generate interrupts in the following way:
[1] A counter counts starting from value set in PCT_COUNT register pair
[2] Once counter reaches value set in PCT_INT_CNT interrupt is raised
Basic setup looks like this:
[1] PCT_COUNT = 0;
[2] PCT_INT_CNT = __limit_value__;
[3] Enable interrupts for that counter and let it run
[4] Let counter reach its limit
[5] Handle interrupt when it happens
Note that the PCT HW block is built into the CPU core, so its interrupt
line (which is basically the OR of all counter IRQs) is wired directly
to the top-level IRQC. That means that to de-assert the PCT interrupt
it is required to reset the IRQs of all counters that have reached
their limit values.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This generalization prepares for support of overflow interrupts.
Hardware event counters on ARC work this way:
Each counter counts from a programmed start value (set in
ARC_REG_PCT_COUNT) to a limit value (set in ARC_REG_PCT_INT_CNT), and
once the limit value is reached the counter generates an interrupt.
Even though this hardware implementation allows for more flexibility,
in Linux kernel we decided to mimic behavior of other architectures
this way:
[1] Set limit value as half of counter's max value (to allow counter to
run after reaching its limit, see below for more explanation):
---------->8-----------
arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
---------->8-----------
[2] Set start value as "arc_pmu->max_period - sample_period" and then
count up to the limit
Our event counters don't stop on reaching max value (the one we set in
ARC_REG_PCT_INT_CNT) but continue to count until kernel explicitly
stops each of them.
The limit is set to half of the counter capacity so that events which
occur between the moment the interrupt is triggered and the moment we
actually process PMU interrupts are still captured. That makes the
counts more precise.
For example if we count CPU cycles we keep track of cycles while
running through generic IRQ handling code:
[1] We set counter period as say 100_000 events of type "crun"
[2] Counter reaches that limit and raises its interrupt
[3] Once we get in PMU IRQ handler we read current counter value from
ARC_REG_PCT_SNAP and see there something like 105_000.
If counters stopped on reaching a limit value we would miss the
additional 5000 cycles.
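The arithmetic behind this example can be checked with a small userspace
sketch. The helper names are hypothetical and the 48-bit counter width
used in the note below is only an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Limit value: half of the counter range, as in the snippet above. */
static uint64_t max_period(unsigned int counter_size)
{
	assert(counter_size > 0 && counter_size < 64);
	return (1ULL << counter_size) / 2 - 1ULL;
}

/* Start value chosen so the limit is hit after sample_period events. */
static uint64_t start_value(uint64_t max, uint64_t sample_period)
{
	assert(sample_period <= max);
	return max - sample_period;
}

/* Events seen since the start, including those after the interrupt. */
static uint64_t events_since_start(uint64_t snap, uint64_t start)
{
	assert(snap >= start);
	return snap - start;
}
```

With an assumed 48-bit counter and a sample period of 100000, a snapshot
taken 5000 events past the limit gives events_since_start() == 105000,
so the extra cycles spent reaching the handler are not lost.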
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The number of counters in PCT can never be more than 32 (while
countable conditions could be 100+) for both ARCompact and ARCv2.
And while at it update copyright dates.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
printk() supports the %*ph format specifier for printing small buffers,
let's use it instead of %02x %02x...
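In the kernel the call takes the length and the buffer, along the lines
of printk("%*ph", len, buf), and prints the bytes as space-separated
hex. Userspace printf() has no such specifier, so the sketch below only
imitates that output; format_hex_bytes() is a hypothetical helper, not a
kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-in for the kernel's "%*ph" output: two hex digits per
 * byte, separated by single spaces. Returns the string length, or -1 if
 * the output buffer is too small.
 */
static int format_hex_bytes(char *out, size_t out_len,
			    const unsigned char *buf, size_t len)
{
	size_t pos = 0;

	assert(out != NULL);
	if (out_len == 0)
		return -1;
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		int n = snprintf(out + pos, out_len - pos, "%s%02x",
				 i ? " " : "", buf[i]);
		if (n < 0 || (size_t)n >= out_len - pos)
			return -1;	/* output would be truncated */
		pos += (size_t)n;
	}
	return (int)pos;
}
```

One length-driven call replaces a format string that would otherwise
need one %02x per byte.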
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We recently did some cleanup here and now the static checkers notice
that there is a missing error code when ioremap() fails. Let's set it
to -ENOMEM.
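The pattern being fixed looks roughly like the sketch below. It is a
userspace illustration only: struct fake_dev, fake_ioremap() and
probe_sketch() are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device description used only by this sketch. */
struct fake_dev {
	int map_fails;	/* non-zero makes the fake mapping fail */
};

/* Hypothetical stand-in for ioremap(): returns NULL on failure. */
static void *fake_ioremap(const struct fake_dev *dev)
{
	static char region[16];

	return dev->map_fails ? NULL : region;
}

static int probe_sketch(const struct fake_dev *dev)
{
	int ret = 0;
	void *base;

	assert(dev != NULL);
	base = fake_ioremap(dev);
	if (!base) {
		ret = -ENOMEM;	/* without this the error path returns 0 */
		goto out;
	}
	/* ... further setup would use base here ... */
out:
	return ret;
}
```

Because ret starts at 0 and the error path jumps to a shared label,
forgetting the assignment makes a failed mapping look like success.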
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
In the function storvsc_channel_init(), error code was not getting
set correctly in some of the failure cases. Fix this issue.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Allow WRITE_SAME for Windows10 and above hosts.
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Use storage protocol version instead of vmbus protocol
version when determining storage capabilities.
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Use correct defaults for values determined by protocol negotiation,
instead of resetting them with every scsi controller.
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Currently we are making decisions based on vmbus protocol versions
that have been negotiated; use storage protocol versions instead.
[jejb: fold ARRAY_SIZE conversion suggested by Johannes Thumshirn
<jthumshirn@suse.de>
make vmstor_protocol static]
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Use a single value to track protocol versions to simplify
comparisons and to be consistent with vmbus version tracking.
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Rather than look for sets of specific protocol versions,
make decisions based on ranges. This will be safer and require fewer changes
going forward as we add more storage protocol versions.
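A range check of this kind could look like the sketch below. It assumes
the version is packed into a single integer with the major number in the
high byte. The macro, the version constants and has_feature() are
illustrative names and values, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: major in the high byte, minor in the low byte. */
#define PROTO_VERSION(major, minor) \
	((uint16_t)((((major) & 0xff) << 8) | ((minor) & 0xff)))

/* Illustrative version constants, oldest to newest. */
#define PROTO_VERSION_A PROTO_VERSION(4, 2)
#define PROTO_VERSION_B PROTO_VERSION(5, 1)
#define PROTO_VERSION_C PROTO_VERSION(6, 0)

/* The encoding must preserve ordering for range checks to work. */
static_assert(PROTO_VERSION(4, 2) < PROTO_VERSION(5, 1),
	      "newer versions must compare greater");

/* Range check: every version at or above B has the feature. */
static bool has_feature(uint16_t negotiated)
{
	return negotiated >= PROTO_VERSION_B;
}
```

A version added later, for example PROTO_VERSION(6, 2), passes the check
without has_feature() having to be touched.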
Tested-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The queuecommand routine has a local dev pointer used for the
dev_* prints. The two prints that currently exist are tucked
under a debug define and thus can be compiled out. Use the actual
location instead of a local to avoid the resulting unused-variable
warning.
This patch is intended to be applied after the "CXL Flash Error
Recovery and Superpipe" series.
Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
"port_sel" is a u64 so the shifting should also be a 64 bit shift.
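A minimal sketch of the difference, assuming a generic one-bit mask
helper (bit64() is a hypothetical name, not the driver's code):

```c
#include <assert.h>
#include <stdint.h>

/* Build a one-bit mask in a 64-bit value; bit may be 0..63. */
static uint64_t bit64(unsigned int bit)
{
	assert(bit < 64);
	/*
	 * 1ULL makes the shift happen in 64 bits. Writing "1 << bit"
	 * would shift a plain int, which is undefined when the result
	 * does not fit in an int, even if it is then stored in a u64.
	 */
	return 1ULL << bit;
}
```

The type of the left operand decides the width of the shift, and the
type of the variable receiving the result does not change that.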
Fixes: c21e0bbfc4 ('cxlflash: Base support for IBM CXL Flash Adapter')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The > should be >= or we read one element past the end of the array.
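The off-by-one can be illustrated with a generic table lookup. This is a
sketch only: the table, its size and table_lookup() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 4	/* hypothetical table size */

static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* Returns 0 and stores the entry, or -1 if idx is out of range. */
static int table_lookup(size_t idx, int *out)
{
	assert(out != NULL);
	/* ">=" rejects idx == TABLE_SIZE; ">" would let it through. */
	if (idx >= TABLE_SIZE)
		return -1;
	*out = table[idx];
	return 0;
}
```

Valid indexes run from 0 to TABLE_SIZE - 1, so the first invalid index
is TABLE_SIZE itself.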
Fixes: c21e0bbfc4 ('cxlflash: Base support for IBM CXL Flash Adapter')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Add support for physical LUN segmentation (virtual LUNs) to device
driver supporting the IBM CXL Flash adapter. This patch allows user
space applications to virtually segment a physical LUN into N virtual
LUNs, taking advantage of the translation features provided by this
adapter.
Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Add superpipe supporting infrastructure to device driver for the IBM CXL
Flash adapter. This patch allows userspace applications to take advantage
of the accelerated I/O features that this adapter provides and bypass the
traditional filesystem stack.
Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Introduce support for enhanced I/O error handling.
A device state is added to track 3 possible states of the device:
Normal - the device is operating normally and is fully operational
Limbo - the device is in a reset/recovery scenario and its operational
status is paused
Failed/terminating - the device has either failed to be reset/recovered
or is being terminated (removed); it is no longer
operational
All operations are allowed when the device is operating normally. When the
device transitions to limbo state, I/O must be paused. To help accomplish
this, a wait queue is introduced where existing and new threads can wait
until the device is no longer in limbo. When coming out of limbo, threads
need to check the state and error out gracefully when encountering the
failed state. When the device transitions to the failed/terminating state,
normal operations are no longer allowed. Only specially designated
operations related to graceful cleanup are permitted.
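The state rules above can be summarised in a small decision function.
This is a sketch under assumptions: the enum names, the error codes and
the choice to make every caller wait while in limbo are illustrative,
and a real driver would sleep on the wait queue instead of returning
-EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for the three device states. */
enum dev_state { STATE_NORMAL, STATE_LIMBO, STATE_FAILTERM };

/*
 * Decide what a caller should do in the given state:
 *   0       -> proceed with the operation
 *   -EAGAIN -> device is in limbo, wait and check the state again
 *   -EIO    -> device failed or is terminating, error out
 * Cleanup operations are still permitted in the failed state.
 */
static int check_state(enum dev_state state, bool is_cleanup_op)
{
	switch (state) {
	case STATE_NORMAL:
		return 0;
	case STATE_LIMBO:
		return -EAGAIN;
	case STATE_FAILTERM:
		return is_cleanup_op ? 0 : -EIO;
	default:
		assert(0);	/* unreachable for valid states */
		return -EINVAL;
	}
}
```

A thread woken from limbo calls check_state() again, which covers the
case where the device moved to the failed state while it was waiting.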
Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Under certain conditions, login failures will just invoke
qla2x00_mark_device_lost() with the intent to log in again;
but if login_retry has been set already, that would fail to set the
relogin needed flag which is required to wake up the DPC to retry.
Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Fix for memory leak when command is not found by firmware due to
mismatch in sp reference count.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Instead of resetting the adapter wait for the login to timeout
and retry. Resetting the adapter can cause extended path recovery
times.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
If an SRB is NULL but the handle is in range just drop the
command instead of also resetting the adapter. If the handle
is in range then the command was valid at some point and may
have been aborted. Resetting the adapter can lead to extended
recovery times in this case.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Avoid crashing the system in the scenario where firmware
just completes the command and cannot find the command
during abort mailbox processing. This scenario can lead to
the sp reference counter being zero. Instead of crashing the
system, use WARN_ON to print a warning in the log.
Signed-off-by: Hiral Patel <hiral.patel@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Execute qla25xx_manipulate_risc_semaphore() only for
ssdid 0x0175 and 0x0240.
Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
When we get logged out, mark the port lost and set dpc flag for relogin.
Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
scsi_error: should not get sense for timeout IO in scsi error handler
When an IO timeout occurs, the IO will be aborted in
scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because
of that, SCSI_EH_CANCEL_CMD will be cleared in scsi_eh_scmd_add().
So when the scsi error handler starts, it will get sense for this
timed-out IO and the scmd of the IO request will be reused. In that
case, the scmd may be released twice when racing with io_done(),
which will result in a crash.
So SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense.
The bug may be reproduced when the link between host and disk is
unstable.
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Long Chun <long.chun@zte.com.cn>
Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn>
Reviewed-by: Cai Qu <cai.qu@zte.com.cn>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Bump pm80xx driver version to 0.1.38.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The request has to be retried in case the length of the SSP
Response IU is invalid.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
PORT RECOVERY TIMEOUT is the maximum time between the controller's
detection of the PHY down and the receipt of the ID_FRAME (from the
same remote SAS port). If the time expires before the ID_FRAME is
received, the port is considered INVALID and can be removed. The
IOP_EVENT_PORT_RECOVERY_TIMER_TMO event is reported following the
IOP_EVENT_PHY_DOWN event when the PHY/port does not recover after
Port Recovery Time.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
If a link error happens, we don't need to disconnect the phy,
which would remove the drive. Instead, acknowledging the controller
and logging the error is enough.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
In pm8001_ccb_task_free(), the dma unmapping is done based on
the ccb->n_elem value. This should be initialized to zero in
task_abort(). Otherwise, pm8001_ccb_task_free() will attempt
dma_unmap_sg(), which is invalid for task abort and can lead to a
kernel crash.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Thermal page code has been changed to 7 for the 12G controllers.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
In Nexus reset the device state requests are not needed.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Acked-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Updated 12G linkrate to libsas.
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Reviewed-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Company policy is to use the company email address, so update
my email address to the company address.
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Use logical instead of bitwise AND.
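The two operators can give different answers for the same operands, as
this small sketch shows. The helper names are hypothetical and the
values are chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* 1 and 2 are both non-zero but share no set bits. */
static_assert((1 & 2) == 0, "bitwise AND of 1 and 2 is zero");
static_assert((1 && 2) == 1, "logical AND of 1 and 2 is true");

/* Logical AND: true when both values are non-zero. */
static bool both_set_logical(int a, int b)
{
	return a && b;
}

/* Bitwise AND: true only when the values share at least one set bit. */
static bool both_set_bitwise(int a, int b)
{
	return (a & b) != 0;
}
```

When the intent is "both conditions hold", the logical form is the one
that does not depend on which bits happen to be set.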
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
Reviewed-by: James Smart <james.smart@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Invoking get_cmd_state for qla2xxx always returns 0. Instead change it
to return the actual fabric state from qla_tgt_cmd. This will help with
debugging.
Signed-off-by: Dilip Kumar Uppugandla <dilip@purestorage.com>
Signed-off-by: Spencer Baugh <sbaugh@catern.com>
Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The driver is calling hpsa_shutdown before calling scsi_remove_host.
hpsa_shutdown disables interrupts.
scsi_remove_host can trigger I/O operations, such as
SYNCHRONIZE CACHE when multipath is enabled, which hang the system.
Call scsi_remove_host before calling hpsa_shutdown.
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
Reviewed-by: Scott Teel <scott.teel@pmcs.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
A regression was introduced into the hpsa driver a while back so
non-zero LUNs of multi-LUN devices may no longer be presented via
a SAS based Smart Array. I have not done a bisection to discover
the change that caused it.
The CISS firmware specification (available on sourceforge)
defines an 8 byte lunid that describes devices that the Smart
Array can see/present to the system. The current code in the hpsa
driver attempts to find matches for non-zero LUNs with LUN 0 for
a bus/target by zeroing out byte 4 of the lunid and looking for a match.
This method is sufficient for SCSI based Smart Arrays because
byte 5 is always 0. For SAS based Smart Arrays byte 5 of the
lunid contains the path number for a multipath device and
either one or two bits (the documentation does not define how
many bits are used but it appears it may be one only) that
indicate if the given path number in byte 5 must always be
used to access that device. Byte 5 may not always be zero.
The following are lunids (spaces added for clarity) for a
MSL2024 single drive library connected via a H241 Smart Array:
00 00 00 00 01 00 00 01 (changer)
00 00 00 00 00 80 00 01 (tape)
In the 4th byte (counting from 0) you can see that the tape
is LUN 0 and the changer is LUN 1. The 0x80 set in the 5th byte
for the tape drive means the driver should force access to
path 0 (the library in this case was connected to one path only
anyway).
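The effect of byte 5 on the match can be reproduced with the two lunids
above. This is a userspace sketch: lunid_lun() and lunid_match() are
hypothetical helpers, and ignoring byte 5 in the comparison is an
assumption made for illustration, not necessarily how the driver fix is
implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LUNID_LEN 8

/* Byte 4 carries the LUN number in the examples from the text. */
static uint8_t lunid_lun(const uint8_t lunid[LUNID_LEN])
{
	assert(lunid != NULL);
	return lunid[4];
}

/*
 * Compare two lunids after clearing byte 4 (the LUN number). When
 * ignore_byte5 is set, byte 5 (path number and flags) is cleared too.
 */
static bool lunid_match(const uint8_t a[LUNID_LEN],
			const uint8_t b[LUNID_LEN], bool ignore_byte5)
{
	uint8_t x[LUNID_LEN], y[LUNID_LEN];

	assert(a != NULL && b != NULL);
	memcpy(x, a, LUNID_LEN);
	memcpy(y, b, LUNID_LEN);
	x[4] = 0;
	y[4] = 0;
	if (ignore_byte5) {
		x[5] = 0;
		y[5] = 0;
	}
	return memcmp(x, y, LUNID_LEN) == 0;
}
```

With the changer and tape lunids from the text, clearing only byte 4
leaves the 0x80 in byte 5 and the comparison fails, while clearing both
bytes makes the two lunids compare equal.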
After the changes we can see the following in the dmesg output:
scsi 0:3:0:0: RAID HP H241 1.18 \
PQ: 0 ANSI: 5
scsi 0:2:0:0: Sequential-Access HP Ultrium 6-SCSI 354W \
PQ: 0 ANSI: 6
scsi 0:2:0:1: Medium Changer HP MSL G3 Series 8.70 \
PQ: 0 ANSI: 5
Showing that the changer is correctly identified as LUN 1 of
bus 2 target 0. Before the change the changer device is not seen.
Suggested-by: shane.seymour <shane.seymour@hp.com>
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
Reviewed-by: Scott Teel <scott.teel@pmcs.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Prevent adding volumes that are not available.
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
Reviewed-by: Scott Teel <scott.teel@pmcs.com>
Reviewed-by: Justin Lindley <justin.lindley@pmcs.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
Reviewed-by: Scott Teel <scott.teel@pmcs.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
Reviewed-by: Scott Teel <scott.teel@pmcs.com>
Signed-off-by: Don Brace <don.brace@pmcs.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
showing that tables have been updated unnecessarily.
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
Reviewed-by: Scott Teel <scott.teel@pmcs.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Don Brace <don.brace@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>