Commit Graph

2383 Commits

Author SHA1 Message Date
Jens Axboe
ba1f188cae [PARISC] Add new ioprio_{set,get} syscalls
add syscall entries for ioprio_set/get as per Jens Axboe.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:45:57 -04:00
Grant Grundler
a366064c3f [PARISC] Update bitops from parisc tree
Optimize ext2_find_next_zero_bit. Gives about 25% perf improvement with a
rsync test with ext3.

Signed-off-by: Randolph Chung <tausq@parisc-linux.org>

fix ext3 performance - ext2_find_next_zero() was culprit.
Kudos to jejb for pointing out the the possibility that ext2_test_bit
and ext2_find_next_zero() may in fact not be enumerating bits in
the bitmap because of endianess. Took sparc64 implementation and
adapted it to our tree. I suspect the real problem is ffz() wants
an unsigned long and was getting garbage in the top half of the
unsigned int. Not confirmed but that's what I suspect.

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Fix find_next_bit for 32-bit
Make masking consistent for bitops

From: Joel Soete <soete.joel@tiscali.be>
Signed-off-by: Randolph Chung <tausq@parisc-linux.org>

Add back incorrectly removed ext2_find_first_zero_bit definition

Signed-off-by: James Bottomley <jejb@parisc-linux.org>

Fixup bitops.h to use volatile for *_bit() ops

Based on this email thread:
       http://marc.theaimsgroup.com/?t=108826637900003

In a nutshell:
        *_bit() want use of volatile.
        __*_bit() are "relaxed" and don't use spinlock or volatile.

other minor changes:
o replaces hweight64() macro with alias to generic_hweight64() (Joel Soete)
o cleanup ext2* macros so (a) it's obvious what the XOR magic is about
  and (b) one version that works for both 32/64-bit.
o replace 2 uses of CONFIG_64BIT with __LP64__. bitops.h used both.
  I think header files that might go to user space should use
  something userspace will know about (__LP64__).

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Move SHIFT_PER_LONG to standard location for BITS_PER_LONG (asm/types.h)
and ditch the second definition of BITS_PER_LONG in bitops.h

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:45:22 -04:00
Kyle McMartin
f053725b89 [PARISC] Add ability for prctl to change unaligned trap behaviour
Add support for changing unaligned trap behaviour on a
per-thread basis.

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:43:15 -04:00
Randolph Chung
5cd55b0ede [PARISC] Take into account nullified insn and lock functions for profiling
export profile_pc() symbol - oprofile needs it when built as a module.

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Take into account nullified insn and lock functions for profiling

This is needed at the end of functions; it is typical that the return
branch nullifies the next insn, which is in the next function. This
causes profiling data to show up against the "wrong" function.

We also count lock times against the locker. This is consistent with
other architectures.

Signed-off-by: Randolph Chung <tausq@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:42:18 -04:00
Matthew Wilcox
14e256c107 [PARISC] Update spinlocks from parisc tree
Neaten up the CONFIG_PA20 ifdefs

More merge fixes, this time for SMP

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>

Prettify the CONFIG_DEBUG_SPINLOCK __SPIN_LOCK_UNLOCKED initializers.

Clean up some warnings with CONFIG_DEBUG_SPINLOCK enabled.

Fix build with spinlock debugging turned on. Patch is cleaner like this,
too.

Remove mandatory 16-byte alignment requirement on PA2.0 processors by
using the ldcw,CO completer. Provides a nice insn savings.

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:41:25 -04:00
Grant Grundler
04d472dc83 [PARISC] Move pa_tlb_lock to tlb_flush.h
move pa_tlb_lock and it's primary consumers to tlb_flush.h
Future step will be to move spinlock_t definition out of system.h.

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:40:24 -04:00
Grant Grundler
896a375623 [PARISC] Make sure use of RFI conforms to PA 2.0 and 1.1 arch docs
2.6.12-rc4-pa3 : first pass at making sure use of RFI conforms to
PA 2.0 arch pages F-4 and F-5, PA 1.1 Arch page 3-19 and 3-20.

The discussion revolves around all the rules for clearing PSW Q-bit.
The hard part is meeting all the rules for "relied upon translation".

.align directive is used to guarantee the critical sequence ends more than
8 instructions (32 bytes) from the end of page.

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:40:07 -04:00
Matthew Wilcox
53f01bba49 [PARISC] Convert parisc_device to use struct resource for hpa
Convert pa_dev->hpa from an unsigned long to a struct resource.

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>

Fix up users of ->hpa to use ->hpa.start instead.

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:36:40 -04:00
Matthew Wilcox
5658374766 [PARISC] Convert parisc_device tree to use struct device klists
Fix parse_tree_node.  much more needs to be done to fix this file.

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>

Make drivers.c compile based on a patch from Pat Mochel.

From: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>

Fix drivers.c to create new device tree nodes when no match is found.

Signed-off-by: Richard Hirst <rhirst@parisc-linux.org>

Do a proper depth-first search returning parents before children, using the
new klist infrastructure.

Signed-off-by: Richard Hirst <rhirst@parisc-linux.org>

Fixed parisc_device traversal so that pdc_stable works again
Fixed check_dev so it doesn't dereference a parisc_device until it
has verified the bus type

Signed-off-by: Randolph Chung <tausq@parisc-linux.org>

Convert pa_dev->hpa from an unsigned long to a struct resource.
Use insert_resource() instead of request_mem_region().
Request resources at bus walk time instead of driver probe time.
Don't release the resources as we don't have any hotplug parisc_device
support yet.
Add parisc_pathname() to conveniently get the textual representation
of the hwpath used in sysfs.
Inline the remnants of claim_device() into its caller.

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>

I noticed that some of the STI regions weren't showing up in iomem.
Reading the STI spec indicated that all STI devices occupy at least 32MB.
So check for STI HPAs and give them 32MB instead of 4kB.

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:33:38 -04:00
Linus Torvalds
2c86c83bf4 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-10-21 12:23:07 -07:00
Ben Dooks
7fe8785e41 [ARM] 3028/1: S3C2410 - add DCLK mask definitions
Patch from Ben Dooks

From: Guillaume Gourat <guillaume.gourat@nexvision.fr>

Add MASK definitions for DCLK0 and DCLK1

Signed-off-by: Guillaume Gourat <guillaume.gourat@nexvision.fr>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-20 23:21:20 +01:00
Ben Dooks
a7ce8edc82 [ARM] 3026/1: S3C2410 - avoid possible overflow in pll calculations
Patch from Ben Dooks

Avoid the possiblity that if the board is using
a 16.9334 or higher crystal with a high PLL
multiplier, then the pll value could overflow
the capability of an int.

Also fix the value types of the intermediate
variables to unsigned int.

Rewrite of patch from Guillaume Gourat

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-20 23:21:18 +01:00
Hugh Dickins
ac9b9c667c [PATCH] Fix handling spurious page fault for hugetlb region
This reverts commit 3359b54c8c and
replaces it with a cleaner version that is purely based on page table
operations, so that the synchronization between inode size and hugetlb
mappings becomes moot.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-20 09:02:07 -07:00
Linus Torvalds
26baeba8dd Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-10-19 23:12:03 -07:00
Yasunori Goto
281dd25cdc [PATCH] swiotlb: make sure initial DMA allocations really are in DMA memory
This introduces a limit parameter to the core bootmem allocator; The new
parameter indicates that physical memory allocated by the bootmem
allocator should be within the requested limit.

We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
is the only api used for swiotlb.

The existing alloc_bootmem_low_pages() api could instead have been
changed and made to pass right limit to the core allocator.  But that
would make the patch more intrusive for 2.6.14, as other arches use
alloc_bootmem_low_pages().  We may be done that post 2.6.14 as a
cleanup.

With this, swiotlb gets memory within 4G for both x86_64 and ia64
arches.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 23:11:33 -07:00
Seth, Rohit
3359b54c8c [PATCH] Handle spurious page fault for hugetlb region
The hugetlb pages are currently pre-faulted.  At the time of mmap of
hugepages, we populate the new PTEs.  It is possible that HW has already
cached some of the unused PTEs internally.  These stale entries never
get a chance to be purged in existing control flow.

This patch extends the check in page fault code for hugepages.  Check if
a faulted address falls with in size for the hugetlb file backing it.
We return VM_FAULT_MINOR for these cases (assuming that the arch
specific page-faulting code purges the stale entry for the archs that
need it).

Signed-off-by: Rohit Seth <rohit.seth@intel.com>

[ This is apparently arguably an ia64 port bug. But the code won't
  hurt, and for now it fixes a real problem on some ia64 machines ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 13:56:27 -07:00
Paul Schulz
d1972efaf2 [ARM] 3023/1: pxa-regs: Typo in ARM pxa register definitions.
Patch from Paul Schulz

The following trivial patch is to fix what looks like a typo in the PXA register
definitions. The correction comes directly from the definition in the
Intel Documentation.

 http://www.intel.com/design/pca/applicationsprocessors/manuals/278693.htm
 Intel(R) PXA 255 Processor - Developers Manual - Jan 2004 - Page 12-33

Neither 'UDCCS_IO_ROF' or 'UDCCS_IO_DME' are currently used elseware
in the main code (from grep of tree)... The current definitions have been
in the code since at lease 2.4.7.

Signed-off-by: Paul Schulz <paul@mawsonlakes.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-18 19:40:32 +01:00
Kenneth Tan
251b928cdf [ARM] 3021/1: Interrupt 0 bug fix for ixp4xx
Patch from Kenneth Tan

The get_irqnr_and_base subroutine of ixp4xx does not take interrupt 0 condition into account properly. We should not perform "subs" here. The Z flag will be set when interrupt 0 occur, which resulting "movne r1, sp" in the caller routine (irq_handler) not being executed.

When interrupt 0 occur:
o if CONFIG_CPU_IXP46X is not set, "subs" will set the Z flag and return
o if CONFIG_CPU_IXP46X is set, codes in upper interrupt handling will be trigerred. But since this is not supper interrupt, the "cmp" in the upper interrupt handling portion will set the Z flag and return

Signed-off-by: Kenneth Tan <chong.yin.tan@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-18 07:53:35 +01:00
Kenneth Tan
ad1b472bea [ARM] 3020/1: Fixes typo error CONFIG_CPU_IXP465, which should be CONFIG_CPU_IXP46X
Patch from Kenneth Tan

The cpu_is_ixp465 macro in include/asm-arm/arch-ixp4xx/hardware.h is always returning 0 because #ifdef CONFIG_CPU_IXP465 is always false.

Signed-off-by: Kenneth Tan <chong.yin.tan@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-18 07:51:35 +01:00
Nicolas Pitre
9b15c6c4e2 [ARM] 3019/1: fix wrong comments
Patch from Nicolas Pitre

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-18 07:51:34 +01:00
Zach Brown
4faa528528 [PATCH] aio: revert lock_kiocb()
lock_kiocb() was introduced to serialize retrying and cancellation.  In the
process of doing so it tried to sleep waiting for KIF_LOCKED while holding
the ctx_lock spinlock.  Recent fixes have ensured that multiple concurrent
retries won't be attempted for a given iocb.  Cancel has other problems and
has no significant in-tree users that have been complaining about it.  So
for the immediate future we'll revert sleeping with the lock held and will
address proper cancellation and retry serialization in the future.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 17:03:57 -07:00
Eric Dumazet
5ee832dbc6 [PATCH] rcu: keep rcu callback event counter
This makes call_rcu() keep track of how many events there are on the RCU
list, and cause a reschedule event when the list gets too long.

This helps keep RCU event lists down.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 15:27:58 -07:00
Herbert Xu
b24d18aa74 [PATCH] list: add missing rcu_dereference on first element
It seems that all the list_*_rcu primitives are missing a memory barrier
on the very first dereference.  For example,

#define list_for_each_rcu(pos, head) \
	for (pos = (head)->next; prefetch(pos->next), pos != (head); \
		pos = rcu_dereference(pos->next))

It will go something like:

	pos = (head)->next

	prefetch(pos->next)

	pos != (head)

	do stuff

We're missing a barrier here.

	pos = rcu_dereference(pos->next)

		fetch pos->next

		barrier given by rcu_dereference(pos->next)

		store pos

Without the missing barrier, the pos->next value may turn out to be stale.
In fact, if "do stuff" were also dereferencing pos and relying on
list_for_each_rcu to provide the barrier then it may also break.

So here is a patch to make sure that we have a barrier for the first
element in the list.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 08:59:10 -07:00
Al Viro
688ce17b85 [PATCH]: highest_possible_processor_id() has to be a macro
... otherwise, things like alpha and sparc64 break and break
badly.  They define cpu_possible_map to something else in smp.h
*AFTER* having included cpumask.h.

	If that puppy is a macro, expansion will happen at the actual
caller, when we'd already seen #define cpu_possible_map ... and we will
get the right thing used.

	As an inline helper it will be tokenized before we get to that
define and that's it; no matter what we define later, it won't affect
anything.  We get modules with dependency on cpu_possible_map instead
of the right symbol (phys_cpu_present_map in case of sparc64), or outright
link errors if they are built-in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-16 00:17:33 -07:00
Linus Torvalds
f8cc5756de Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2005-10-14 17:17:04 -07:00
Linus Torvalds
9e04099cb9 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-10-14 17:16:35 -07:00
Tim Schmielau
e26148d934 [PATCH] Fix copy-and-paste error in BSD accounting
Fix copy and paste error in jiffies_to_AHZ conversion which leads to wrong
BSD accounting information on alpha and ia64 when
CONFIG_BSD_PROCESS_ACCT_V3 is turned on.

Also update comment to match reorganised header files.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-14 17:10:12 -07:00
Richard Purdie
cb38c569e5 [ARM] 3011/1: pxafb: Add ability to set device parent + fix spitz compile error
Patch from Richard Purdie

Add a function to allow machines to set the parent of the pxa
framebuffer device. This means the power up/down sequence can be
controlled where required by the machine.

Update spitz to use the new function, fixing a compile error.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-14 16:07:25 +01:00
Ben Dooks
6205d158d1 [ARM] 3009/1: S3C2410 - io.h offsets too large for LDRH/STRH
Patch from Ben Dooks

The __inwc/__outwc calls are capable of creating
LDRH and STRH instructions with offsets over 8bits
as GCC does not have a constraint for an 8bit
offset.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-14 12:24:24 +01:00
David S. Miller
688cb30bdc [SPARC64]: Eliminate PCI IOMMU dma mapping size limit.
The hairy fast allocator in the sparc64 PCI IOMMU code
has a hard limit of 256 pages.  Certain devices can
exceed this when performing very large I/Os.

So replace with a more simple allocator, based largely
upon the arch/ppc64/kernel/iommu.c code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 22:15:24 -07:00
David S. Miller
51e8513615 [SPARC64]: Consolidate common PCI IOMMU init code.
All the PCI controller drivers were doing the same thing
setting up the IOMMU software state, put it all in one spot.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 21:10:08 -07:00
David S. Miller
c8923c6b85 [NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.
Original patch by Harald Welte, with feedback from Herbert Xu
and testing by Sbastien Bernard.

EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus
are numbered linearly.  That is not necessarily true.

This patch fixes that up by calculating the largest possible
cpu number, and allocating enough per-cpu structure space given
that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 14:41:23 -07:00
Linus Torvalds
c931488cc4 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-10-13 09:59:32 -07:00
Ben Dooks
9153bd75f2 [ARM] 3005/1: S3C2440 - add definition for s3c2440_set_dsc() call in hardware.h
Patch from Ben Dooks

include/asm-arm/arch-s3c2410/hardware.h was missing
the definition for s3c2440_set_dsc()

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-13 16:46:35 +01:00
Linus Torvalds
02d31ed258 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-10-12 19:07:59 -07:00
Ben Dooks
afb997c616 [NETPOLL]: wrong return for null netpoll_poll_lock()
When netpoll is not being used, the macro that
defines the removed routing netpoll_poll_lock
defines the return as zero, but the real
routine returns a `void *`

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-12 15:12:21 -07:00
Liam Girdwood
a451e28c76 [ARM] 3003/1: SSP channel map register updates for pxa2xx
Patch from Liam Girdwood

This patch updates the pxa2xx channel map registers definitions in
pxa-regs.h

Changes:-
  o Added description for SSP2 registers
  o Added definitions for SSP3 registers

Signed-off-by:Liam Girdwood <liam.girdwood@wolfsonmicro.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-12 19:58:12 +01:00
Benjamin Herrenschmidt
d8e998c58a [PATCH] ppc32: Tell userland about lack of standard TB
Glibc is about to get some new high precision timer stuff that relies on
the standard timebase of the PPC architecture.

However, some (rare & old) CPUs do not have such timebase and it is a
bit annoying to have your stuff just crash because you are running on
the wrong CPU...

This exposes to userland a CPU feature bit that tells that the current
processor doesn't have a standard timebase.  It's negative logic so that
glibc will still "just work" on older kernels (it will just be unhappy
on those old CPUs but that doesn't really matter as distro tend to
update glibc & kernel at the same time).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-12 08:24:47 -07:00
Benjamin Herrenschmidt
cbd27b8ced [PATCH] ppc32: Fix timekeeping
Interestingly enough, ppc32 had broken timekeeping for ages...  It
worked, but probably drifted a bit more than could be explained by the
actual bad precision of the timebase calibration.  We discovered that
recently when somebody figured out that the common code was using
CLOCK_TICK_RATE to correct the timekeeing, and ppc32 had a completely
bogus value for it.

This patch turns it into something saner.  Probably not as good as doing
something based on the actual timebase frequency precision but I'll
leave that sort of math to others.  This at least makes it better for
the common HZ values.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-12 08:24:47 -07:00
Arnaldo Carvalho de Melo
eeb2b85606 [TWSK]: Grab the module refcount for timewait sockets
This is required to avoid unloading a module that has active timewait
sockets, such as DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:25:23 -07:00
Pablo Neira Ayuso
3392315375 [NETFILTER] ctnetlink: allow userspace to change TCP state
This patch adds the ability of changing the state a TCP connection. I know
that this must be used with care but it's required to provide a complete
conntrack creation via conntrack_netlink. So I'll document this aspect on
the upcoming docs.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:23:28 -07:00
Harald Welte
a051a8f730 [NETFILTER]: Use only 32bit counters for CONNTRACK_ACCT
Initially we used 64bit counters for conntrack-based accounting, since we
had no event mechanism to tell userspace that our counters are about to
overflow.  With nfnetlink_conntrack, we now have such a event mechanism and
thus can save 16bytes per connection.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:21:10 -07:00
Pablo Neira Ayuso
e1c73b78e3 [NETFILTER] ctnetlink: add one nesting level for TCP state
To keep consistency, the TCP private protocol information is nested
attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to
access the TCP state information looks like here below:

CTA_PROTOINFO
CTA_PROTOINFO_TCP
CTA_PROTOINFO_TCP_STATE

instead of:

CTA_PROTOINFO
CTA_PROTOINFO_TCP_STATE

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:55:49 -07:00
Harald Welte
5bbc243aaf [NETFILTER]: Add missing include to ip_conntrack_tuple.h
Without this #include, __be16 is not defined and userspace programs
will break.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:54:01 -07:00
Harald Welte
b3a91d037a [NETFILTER] nat: remove bogus structure member
When 'rustynat' was merged in 2.6.12, the use of the "helper" pointer of
struct ipt_nat_info was obsoleted, but the pointer not removed from the
struct.

This patch removes the pointer, thereby yet again shrinking struct
ip_conntrack.

Discovered-by: Rusty Russell <rusty@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:52:36 -07:00
Harald Welte
ebe0bbf06c [NETFILTER] nfnetlink: use highest bit of nfa_type to indicate nested TLV
As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e.
host byte order tags, net byte order values) are useless, unless a parser
can determine whether an attribute is nested or not.

This patch steals the highest bit of nfattr.nfa_type to indicate whether
the data payload contains a nested nfattr (1) or not (0).

This will break userspace compatibility, but luckily no kernel with
nfnetlink was released so far.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:52:19 -07:00
Andi Kleen
421c7ce6d0 [PATCH] x86_64: Allocate cpu local data for all possible CPUs
CPU hotplug fills up the possible map to NR_CPUs, but it did that after
setting up per CPU data. This lead to CPU data not getting allocated
for all possible CPUs, which lead to various side effects.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 16:33:25 -07:00
Harald Welte
46113830a1 [PATCH] Fix signal sending in usbdevio on async URB completion
If a process issues an URB from userspace and (starts to) terminate
before the URB comes back, we run into the issue described above.  This
is because the urb saves a pointer to "current" when it is posted to the
device, but there's no guarantee that this pointer is still valid
afterwards.

In fact, there are three separate issues:

1) the pointer to "current" can become invalid, since the task could be
   completely gone when the URB completion comes back from the device.

2) Even if the saved task pointer is still pointing to a valid task_struct,
   task_struct->sighand could have gone meanwhile.

3) Even if the process is perfectly fine, permissions may have changed,
   and we can no longer send it a signal.

So what we do instead, is to save the PID and uid's of the process, and
introduce a new kill_proc_info_as_uid() function.

Signed-off-by: Harald Welte <laforge@gnumonks.org>
[ Fixed up types and added symbol exports ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 16:16:33 -07:00
Linus Torvalds
eb1b74e097 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-10-10 08:38:52 -07:00
Rafael J. Wysocki
3dd083255d [PATCH] x86_64: Set up safe page tables during resume
The following patch makes swsusp avoid the possible temporary corruption
of page translation tables during resume on x86-64.  This is achieved by
creating a copy of the relevant page tables that will not be modified by
swsusp and can be safely used by it on resume.

The problem is that during resume on x86-64 swsusp may temporarily
corrupt the page tables used for the direct mapping of RAM.  If that
happens, a page fault occurs and cannot be handled properly, which leads
to the solid hang of the affected system.  This leads to the loss of the
system's state from before suspend and may result in the loss of data or
the corruption of filesystems, so it is a serious issue.  Also, it
appears to happen quite often (for me, as often as 50% of the time).

The problem is related to the fact that (at least) one of the PMD
entries used in the direct memory mapping (starting at PAGE_OFFSET)
points to a page table the physical address of which is much greater
than the physical address of the PMD entry itself.  Moreover,
unfortunately, the physical address of the page table before suspend
(i.e.  the one stored in the suspend image) happens to be different to
the physical address of the corresponding page table used during resume
(i.e.  the one that is valid right before swsusp_arch_resume() in
arch/x86_64/kernel/suspend_asm.S is executed).  Thus while the image is
restored, the "offending" PMD entry gets overwritten, so it does not
point to the right physical address any more (i.e.  there's no page
table at the address pointed to by it, because it points to the address
the page table has been at during suspend).  Consequently, if the PMD
entry is used later on, and it _is_ used in the process of copying the
image pages, a page fault occurs, but it cannot be handled in the normal
way and the system hangs.

In principle we can call create_resume_mapping() from
swsusp_arch_resume() (ie.  from suspend_asm.S), but then the memory
allocations in create_resume_mapping(), resume_pud_mapping(), and
resume_pmd_mapping() must be made carefully so that we use _only_
NosaveFree pages in them (the other pages are overwritten by the loop in
swsusp_arch_resume()).  Additionally, we are in atomic context at that
time, so we cannot use GFP_KERNEL.  Moreover, if one of the allocations
fails, we should free all of the allocated pages, so we need to trace
them somehow.

All of this is done in the appended patch, except that the functions
populating the page tables are located in arch/x86_64/kernel/suspend.c
rather than in init.c.  It may be done in a more elegan way in the
future, with the help of some swsusp patches that are in the works now.

[AK: move some externs into headers, renamed a function]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 08:36:46 -07:00