Prevent known non types being detected as modifiers. Ensure we do not
look at any type which starts with a keyword.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, sparsemem is only available if EXPERIMENTAL is enabled.
However, it hasn't ever been marked experimental.
It's been about four years since sparsemem was merged, and we have
platforms which depend on it; allow architectures to decide whether
sparsemem should be the default memory model.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow csrows to properly initialize when the topology only has active
channels on 2 and 3. This new check allows proper detection and
initialization in this topology. Only checking the first mrt that
represented channels 0 and 1 is not sufficient.
I also fixed up the related debug information path. I can submit as a 2nd
patch if needed.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Acked-by: Aristeu Rozanski <aris@ruivo.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building without CONFIG_PCI the edac_pci_idx variable is unused,
causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI,
just like the rest of the PCI support.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The i5400 EDAC driver has several bugs with chip-select row computation
which most likely lead to bugs in detailed error reporting. Attempts to
contact the authors have gone mostly unanswered so I am presenting my diff
here. I do not subscribe to lkml and would appreciate being kept in the
cc.
The most egregious problem was miscalculating the addresses of MTR
registers after register 0 by assuming they are 32bit rather than 16.
This caused the driver to miss half of the memories. Most motherboards
tend to have only 8 dimm slots and not 16, so this may not have been
noticed before.
Further, the row calculations multiplied the number of dimms several
times, ultimately ending up with a maximum row of 32. The chipset only
supports 4 dimms in each of 4 channels, so csrow could not be higher than
4 unless you use a row per-rank with dual-rank dimms. I opted to
eliminate this behavior as it is confusing to the user and the error
reporting works by slot and not rank. This gives a much clearer view of
memory by slot and channel in /sys.
Signed-off-by: Jeff Roberson <jroberson@jroberson.net>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Having ->procname but not ->proc_handler is valid when PROC_SYSCTL=n,
people use such combination to reduce ifdefs with non-standard handlers.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14408
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Peter Teoh <htmldeveloper@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Augment the documentation of the hwmon sysfs API to accomodate ACPI power
meters and the current desired behavior of power capping hardware drivers.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The gpio_twl4030_probe() function calls gpio_twl4030_remove(), and the
former has __devinit, so the latter cannot use __devexit. Otherwise we
hit the section mismatch warning:
WARNING: drivers/gpio/built-in.o(.devinit.text+0x71a): Section mismatch
in reference from the function _gpio_twl4030_probe() to the function
.devexit.text:_gpio_twl4030_remove()
The function __devinit _gpio_twl4030_probe() references a function
__devexit _gpio_twl4030_remove().
This is often seen when error handling in the init function uses
functionality in the exit path.
The fix is often to remove the __devexit annotation of
_gpio_twl4030_remove() so it may be used outside an exit section.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The IBM Saturn serial card has only one port. Without that fixup,
the kernel thinks it has two, which confuses userland setup and
admin tools as well.
[akpm@linux-foundation.org: fix pci-ids.h layout]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: Michael Reed <mreed10@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_power_of_2() appears not to be constant enough for BUILD_BUG_ON()
after the latest rework, so replace it with an open-coded test.
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Isolators putting a page back to the LRU do not hold the page lock, and if
the page is mlocked, another thread might munlock it concurrently.
Expecting this, the putback code re-checks the evictability of a page when
it just moved it to the unevictable list in order to correct its decision.
The problem, however, is that ordering is not garuanteed between setting
PG_lru when moving the page to the list and checking PG_mlocked
afterwards:
#0: #1
spin_lock()
if (TestClearPageMlocked())
if (PageLRU())
move to evictable list
SetPageLRU()
spin_unlock()
if (!PageMlocked())
move to evictable list
The PageMlocked() check may get reordered before SetPageLRU() in #0,
resulting in #0 not moving the still mlocked page, and in #1 failing to
isolate and move the page as well. The page is now stranded on the
unevictable list.
The race condition is very unlikely. The consequence currently is one
page falling off the reclaim grid and eventually getting freed with
PG_unevictable set, which triggers a warning in the page allocator.
TestClearPageMlocked() in #1 already provides full memory barrier
semantics.
This patch adds an explicit full barrier to force ordering between
SetPageLRU() and PageMlocked() so that either one of the competitors
rescues the page.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If migrate_prep is failed, new variable is leaked. This patch fixes it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A particular fsfuzzer run caused an hfs file system to crash on mount.
This is due to a corrupted MDB extent record causing a miscalculation of
HFS_I(inode)->first_blocks for the extent tree. If the extent records are
zereod out, it won't trigger the first_blocks special case. Instead it
falls through to the extent code which we're still in the middle of
initializing.
This patch catches the 0 size extent records, reports the corruption, and
fails the mount.
Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is possible to have !Anon but SwapBacked pages, and some apps could
create huge number of such pages with MAP_SHARED|MAP_ANONYMOUS. These
pages go into the ANON lru list, and hence shall not be protected: we only
care mapped executable files. Failing to do so may trigger OOM.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert
commit 71de1ccbe1
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
AuthorDate: Mon Sep 21 17:01:31 2009 -0700
Commit: Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Tue Sep 22 07:17:27 2009 -0700
mm: oom analysis: add buffer cache information to show_free_areas()
show_free_areas() is called during page allocation failures, and page
allocation failures can occur in any calling context.
But nr_blockdev_pages() takes VFS locks which should not be taken from
hard IRQ context (at least). The result is lockdep warnings (and
deadlockability) during page allocation failures.
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As found in <http://bugs.debian.org/550010>, hfsplus is using type u32
rather than sector_t for some sector number calculations.
In particular, hfsplus_get_block() does:
u32 ablock, dblock, mask;
...
map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask));
I am not confident that I can find and fix all cases where a sector number
may be truncated. For now, avoid data loss by refusing to mount HFS+
volumes with more than 2^32 sectors (2TB).
[akpm@linux-foundation.org: fix 32 and 64-bit issues]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
K: is for keyword. Syntax is perl extended regex.
Reorganized header documentation and indent the section entry descriptions
so that the first K: would not be considered a regex to match by
get_maintainer.pl
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on an idea from Wolfram Sang.
Add search for MAINTAINERS line "K:" regex pattern match in a patch or file
Matches are added after file pattern matches
Add --keywords command line switch (default 1, on)
Change version to 0.21
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the mismerge of the W: URL and the S: status fields.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Which had an embedded and duplicated email address
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move to alphabetic position
Use single line F: entries
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quote a name with a period
remove L: linux-kernel@vger.kernel.org
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The missing probe handler hook will never probe the driver. Add it back.
Fixes broken MMC on OMAP.
We use platform_driver_probe() API since omap_hsmmc is not a hot-pluggable
device.
Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com>
Tested-by: Felipe Contreras <felipe.contreras@gmail.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
strstrip() can return a modified value of its input argument, when
removing elading whitesapce. So it is surely bug for this function's
return value to be ignored. The caller is probably going to use the
incorrect original pointer.
So mark it __must_check to prevent this frm happening (as it has before).
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 8aa7e847d (Fix congestion_wait() sync/async vs read/write
confusion) replace WRITE with BLK_RW_ASYNC. Unfortunately, concurrent mm
development made the unchanged place accidentally.
This patch fixes it too.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 02b51df1b0 (proc connector: add
event for process becoming session leader) we have the following warning:
Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
[<000000000048a946>] sk_filter+0x9a/0xd0
[<000000000049d938>] netlink_broadcast+0x2c0/0x53c
[<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
[<00000000003baef0>] proc_sid_connector+0xc4/0xd4
[<0000000000142604>] __set_special_pids+0x58/0x90
[<0000000000159938>] sys_setsid+0xb4/0xd8
[<00000000001187fe>] sysc_noemu+0x10/0x16
[<00000041616cb266>] 0x41616cb266
The warning is
---> WARN_ON_ONCE(in_irq() || irqs_disabled());
The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.
After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.
We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.
One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Scott James Remnant <scott@ubuntu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted
line is being shown too far right (it does align with x86_64's VmallocChunk
above, but I hope nobody will ever have that much corrupted!). Align it.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory failure on a KSM page currently oopses on its NULL anon_vma in
page_lock_anon_vma(): that may not be much worse than the consequence of
ignoring it, but it is better to be consistent with how ZERO_PAGE and
hugetlb pages and other awkward cases are treated. Just skip it.
We could fix it for 2.6.32 at the KSM end, by putting a dummy anon_vma
pointer in there; but that would get harder next time, when KSM will put a
pointer to something else there (and I'm not currently planning to do any
work to open that up to memory_failure). So I would prefer this simple
PageKsm test, until the other exceptions are handled.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_CPU_FREQ is disabled, cpufreq_get() needs a stub. Used by kvm
(although it looks like a bit of the kvm code could be omitted when
CONFIG_CPU_FREQ is disabled).
arch/x86/built-in.o: In function `kvm_arch_init':
(.text+0x10de7): undefined reference to `cpufreq_get'
(Needed in linux-next's KVM tree, but it's correct in 2.6.32).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Eric Paris <eparis@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
move virtrng_remove to .devexit.text
move virtballoon_remove to .devexit.text
virtio_blk: Revert serial number support
virtio: let header files include virtio_ids.h
virtio_blk: revert QUEUE_FLAG_VIRT addition
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
KS8851: Fix MAC address write order
KS8851: Add soft reset at probe time
net: fix section mismatch in fec.c
net: Fix struct inet_timewait_sock bitfield annotation
tcp: Try to catch MSG_PEEK bug
net: Fix IP_MULTICAST_IF
bluetooth: static lock key fix
bluetooth: scheduling while atomic bug fix
tcp: fix TCP_DEFER_ACCEPT retrans calculation
tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
tcp: accept socket after TCP_DEFER_ACCEPT period
Revert "tcp: fix tcp_defer_accept to consider the timeout"
AF_UNIX: Fix deadlock on connecting to shutdown socket
ethoc: clear only pending irqs
ethoc: inline regs access
vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
be2net: fix support for PCI hot plug
...
The function virtrng_remove is used only wrapped by __devexit_p so define
it using __devexit.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The function virtballoon_remove is used only wrapped by __devexit_p so
define it using __devexit.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This reverts "Add serial number support for virtio_blk, V4a".
Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
on virtio config space, so noone could ever use this.
This is coming back later in a cleaner form.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: john cooper <john.cooper@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Rusty,
commit 3ca4f5ca73
virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This does no longer work, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.
In addition, this patch exports virtio_ids.h to userspace.
CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It seems like the addition of QUEUE_FLAG_VIRT caueses major performance
regressions for Fedora users:
https://bugzilla.redhat.com/show_bug.cgi?id=509383https://bugzilla.redhat.com/show_bug.cgi?id=505695
while I can't reproduce those extreme regressions myself I think the flag
is wrong.
Rationale:
QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT which casus the queue
unplugged immediately. This is not a good behaviour for at least
qemu and kvm where we do have significant overhead for every
I/O operations. Even with all the latested speeups (native AIO,
MSI support, zero copy) we can only get native speed for up to 128kb
I/O requests we already are down to 66% of native performance for 4kb
requests even on my laptop running the Intel X25-M SSD for which the
QUEUE_FLAG_NONROT was designed.
If we ever get virtio-blk overhead low enough that this flag makes
sense it should only be set based on a feature flag set by the host.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>