The MAD layer was violating the DMA API by touching data buffers used
for sends after the DMA mapping was done. This causes problems on
non-cache-coherent architectures, because the device doing DMA won't
see updates to the payload buffers that exist only in the CPU cache.
Fix this by having all MAD consumers use ib_create_send_mad() to
allocate their send buffers, and moving the DMA mapping into the MAD
layer so it can be done just before calling send (and after any
modifications of the send buffer by the MAD layer).
Tested on a non-cache-coherent PowerPC 440SPe system.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a bug which was reported and diagnosed by
Stefan Jones <stefan.jones@churchillrandoms.co.uk>
IDR trees include a cache of idr_layer objects. There's no way to destroy
this cache, so when we discard an overall idr tree we end up leaking some
memory.
Add and use idr_destroy() for this. v9fs and infiniband also need to use
idr_destroy() to avoid leaks.
Or, we make the cache global, like radix_tree_preload(). Which is probably
better. Later.
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As stated in Documentation/atomic_ops.txt, atomic functions
returning values must have the memory barriers both before and after
the operation.
Thanks to DaveM for pointing that out.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On architectures where the char type defaults to unsigned some of the
arithmetic in the AX.25 stack to fail, resulting in some packets being dropped
on receive.
Credits for tracking this down and the original patch to
Bob Brose N0QBJ <linuxhams@n0qbj-11.ampr.org>.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Patch from Ben Dooks
From: Guillaume Gourat <guillaume.gourat@nexvision.fr>
Add MASK definitions for DCLK0 and DCLK1
Signed-off-by: Guillaume Gourat <guillaume.gourat@nexvision.fr>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Avoid the possiblity that if the board is using
a 16.9334 or higher crystal with a high PLL
multiplier, then the pll value could overflow
the capability of an int.
Also fix the value types of the intermediate
variables to unsigned int.
Rewrite of patch from Guillaume Gourat
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This reverts commit 3359b54c8c and
replaces it with a cleaner version that is purely based on page table
operations, so that the synchronization between inode size and hugetlb
mappings becomes moot.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces a limit parameter to the core bootmem allocator; The new
parameter indicates that physical memory allocated by the bootmem
allocator should be within the requested limit.
We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
is the only api used for swiotlb.
The existing alloc_bootmem_low_pages() api could instead have been
changed and made to pass right limit to the core allocator. But that
would make the patch more intrusive for 2.6.14, as other arches use
alloc_bootmem_low_pages(). We may be done that post 2.6.14 as a
cleanup.
With this, swiotlb gets memory within 4G for both x86_64 and ia64
arches.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The hugetlb pages are currently pre-faulted. At the time of mmap of
hugepages, we populate the new PTEs. It is possible that HW has already
cached some of the unused PTEs internally. These stale entries never
get a chance to be purged in existing control flow.
This patch extends the check in page fault code for hugepages. Check if
a faulted address falls with in size for the hugetlb file backing it.
We return VM_FAULT_MINOR for these cases (assuming that the arch
specific page-faulting code purges the stale entry for the archs that
need it).
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
[ This is apparently arguably an ia64 port bug. But the code won't
hurt, and for now it fixes a real problem on some ia64 machines ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Paul Schulz
The following trivial patch is to fix what looks like a typo in the PXA register
definitions. The correction comes directly from the definition in the
Intel Documentation.
http://www.intel.com/design/pca/applicationsprocessors/manuals/278693.htm
Intel(R) PXA 255 Processor - Developers Manual - Jan 2004 - Page 12-33
Neither 'UDCCS_IO_ROF' or 'UDCCS_IO_DME' are currently used elseware
in the main code (from grep of tree)... The current definitions have been
in the code since at lease 2.4.7.
Signed-off-by: Paul Schulz <paul@mawsonlakes.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Kenneth Tan
The get_irqnr_and_base subroutine of ixp4xx does not take interrupt 0 condition into account properly. We should not perform "subs" here. The Z flag will be set when interrupt 0 occur, which resulting "movne r1, sp" in the caller routine (irq_handler) not being executed.
When interrupt 0 occur:
o if CONFIG_CPU_IXP46X is not set, "subs" will set the Z flag and return
o if CONFIG_CPU_IXP46X is set, codes in upper interrupt handling will be trigerred. But since this is not supper interrupt, the "cmp" in the upper interrupt handling portion will set the Z flag and return
Signed-off-by: Kenneth Tan <chong.yin.tan@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Kenneth Tan
The cpu_is_ixp465 macro in include/asm-arm/arch-ixp4xx/hardware.h is always returning 0 because #ifdef CONFIG_CPU_IXP465 is always false.
Signed-off-by: Kenneth Tan <chong.yin.tan@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
lock_kiocb() was introduced to serialize retrying and cancellation. In the
process of doing so it tried to sleep waiting for KIF_LOCKED while holding
the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent
retries won't be attempted for a given iocb. Cancel has other problems and
has no significant in-tree users that have been complaining about it. So
for the immediate future we'll revert sleeping with the lock held and will
address proper cancellation and retry serialization in the future.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bind communication identifiers to a device to support device removal.
Export per HCA CM devices to userspace.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
This makes call_rcu() keep track of how many events there are on the RCU
list, and cause a reschedule event when the list gets too long.
This helps keep RCU event lists down.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add kernel/user ABI structures for marshalling poll CQ, request CQ
notification, post send, post receive, post SRQ receive, create AH and
destroy AH commands. These commands allow us to support userspace
verbs for devices that can't perform these operations directly from
userspace (eg the PathScale HCA).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Give each device a uverbs_cmd_mask, so that a low-level driver can
control which methods may be called on behalf of userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add abi_version attribute to uverbs class devices to allow for
ABI versioning of device-specific interfaces.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Introduce new userspace verbs ABI version 3. This eliminates some
unneeded commands, and adds support for user-created completion
channels. This cleans up problems with file leaks on error paths, and
also makes sure that file descriptors are always installed into the
correct process.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
It seems that all the list_*_rcu primitives are missing a memory barrier
on the very first dereference. For example,
#define list_for_each_rcu(pos, head) \
for (pos = (head)->next; prefetch(pos->next), pos != (head); \
pos = rcu_dereference(pos->next))
It will go something like:
pos = (head)->next
prefetch(pos->next)
pos != (head)
do stuff
We're missing a barrier here.
pos = rcu_dereference(pos->next)
fetch pos->next
barrier given by rcu_dereference(pos->next)
store pos
Without the missing barrier, the pos->next value may turn out to be stale.
In fact, if "do stuff" were also dereferencing pos and relying on
list_for_each_rcu to provide the barrier then it may also break.
So here is a patch to make sure that we have a barrier for the first
element in the list.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
... otherwise, things like alpha and sparc64 break and break
badly. They define cpu_possible_map to something else in smp.h
*AFTER* having included cpumask.h.
If that puppy is a macro, expansion will happen at the actual
caller, when we'd already seen #define cpu_possible_map ... and we will
get the right thing used.
As an inline helper it will be tokenized before we get to that
define and that's it; no matter what we define later, it won't affect
anything. We get modules with dependency on cpu_possible_map instead
of the right symbol (phys_cpu_present_map in case of sparc64), or outright
link errors if they are built-in.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix copy and paste error in jiffies_to_AHZ conversion which leads to wrong
BSD accounting information on alpha and ia64 when
CONFIG_BSD_PROCESS_ACCT_V3 is turned on.
Also update comment to match reorganised header files.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Richard Purdie
Add a function to allow machines to set the parent of the pxa
framebuffer device. This means the power up/down sequence can be
controlled where required by the machine.
Update spitz to use the new function, fixing a compile error.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
The __inwc/__outwc calls are capable of creating
LDRH and STRH instructions with offsets over 8bits
as GCC does not have a constraint for an 8bit
offset.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The hairy fast allocator in the sparc64 PCI IOMMU code
has a hard limit of 256 pages. Certain devices can
exceed this when performing very large I/Os.
So replace with a more simple allocator, based largely
upon the arch/ppc64/kernel/iommu.c code.
Signed-off-by: David S. Miller <davem@davemloft.net>
All the PCI controller drivers were doing the same thing
setting up the IOMMU software state, put it all in one spot.
Signed-off-by: David S. Miller <davem@davemloft.net>
Original patch by Harald Welte, with feedback from Herbert Xu
and testing by Sbastien Bernard.
EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus
are numbered linearly. That is not necessarily true.
This patch fixes that up by calculating the largest possible
cpu number, and allocating enough per-cpu structure space given
that.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from Ben Dooks
include/asm-arm/arch-s3c2410/hardware.h was missing
the definition for s3c2440_set_dsc()
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When netpoll is not being used, the macro that
defines the removed routing netpoll_poll_lock
defines the return as zero, but the real
routine returns a `void *`
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from Liam Girdwood
This patch updates the pxa2xx channel map registers definitions in
pxa-regs.h
Changes:-
o Added description for SSP2 registers
o Added definitions for SSP3 registers
Signed-off-by:Liam Girdwood <liam.girdwood@wolfsonmicro.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Glibc is about to get some new high precision timer stuff that relies on
the standard timebase of the PPC architecture.
However, some (rare & old) CPUs do not have such timebase and it is a
bit annoying to have your stuff just crash because you are running on
the wrong CPU...
This exposes to userland a CPU feature bit that tells that the current
processor doesn't have a standard timebase. It's negative logic so that
glibc will still "just work" on older kernels (it will just be unhappy
on those old CPUs but that doesn't really matter as distro tend to
update glibc & kernel at the same time).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Interestingly enough, ppc32 had broken timekeeping for ages... It
worked, but probably drifted a bit more than could be explained by the
actual bad precision of the timebase calibration. We discovered that
recently when somebody figured out that the common code was using
CLOCK_TICK_RATE to correct the timekeeing, and ppc32 had a completely
bogus value for it.
This patch turns it into something saner. Probably not as good as doing
something based on the actual timebase frequency precision but I'll
leave that sort of math to others. This at least makes it better for
the common HZ values.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is required to avoid unloading a module that has active timewait
sockets, such as DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ability of changing the state a TCP connection. I know
that this must be used with care but it's required to provide a complete
conntrack creation via conntrack_netlink. So I'll document this aspect on
the upcoming docs.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initially we used 64bit counters for conntrack-based accounting, since we
had no event mechanism to tell userspace that our counters are about to
overflow. With nfnetlink_conntrack, we now have such a event mechanism and
thus can save 16bytes per connection.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To keep consistency, the TCP private protocol information is nested
attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to
access the TCP state information looks like here below:
CTA_PROTOINFO
CTA_PROTOINFO_TCP
CTA_PROTOINFO_TCP_STATE
instead of:
CTA_PROTOINFO
CTA_PROTOINFO_TCP_STATE
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this #include, __be16 is not defined and userspace programs
will break.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When 'rustynat' was merged in 2.6.12, the use of the "helper" pointer of
struct ipt_nat_info was obsoleted, but the pointer not removed from the
struct.
This patch removes the pointer, thereby yet again shrinking struct
ip_conntrack.
Discovered-by: Rusty Russell <rusty@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e.
host byte order tags, net byte order values) are useless, unless a parser
can determine whether an attribute is nested or not.
This patch steals the highest bit of nfattr.nfa_type to indicate whether
the data payload contains a nested nfattr (1) or not (0).
This will break userspace compatibility, but luckily no kernel with
nfnetlink was released so far.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CPU hotplug fills up the possible map to NR_CPUs, but it did that after
setting up per CPU data. This lead to CPU data not getting allocated
for all possible CPUs, which lead to various side effects.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a process issues an URB from userspace and (starts to) terminate
before the URB comes back, we run into the issue described above. This
is because the urb saves a pointer to "current" when it is posted to the
device, but there's no guarantee that this pointer is still valid
afterwards.
In fact, there are three separate issues:
1) the pointer to "current" can become invalid, since the task could be
completely gone when the URB completion comes back from the device.
2) Even if the saved task pointer is still pointing to a valid task_struct,
task_struct->sighand could have gone meanwhile.
3) Even if the process is perfectly fine, permissions may have changed,
and we can no longer send it a signal.
So what we do instead, is to save the PID and uid's of the process, and
introduce a new kill_proc_info_as_uid() function.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
[ Fixed up types and added symbol exports ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>