* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3030/2: fix permission check in the obscur cmpxchg syscall
[ARM] nommu: rename compressed/head.S symbols to a new style
[ARM] select TLS_REG_EMUL and NEEDS_SYSCALL_FOR_CMPXCHG
[ARM] nommu: Move hardware page table definitions to pgtable-hwdef.h
[ARM] Move read of processor ID out of lookup_processor_type()
[ARM] Fix typo in tlbflush.h
[ARM] noMMU: removes TLB codes in nommu mode
[ARM] noMMU: block sys_fork in nommu mode
[ARM] 3399/1: Fix link problem when CONFIG_PRINTK is disabled
[ARM] 3398/1: Fix the VFP registers loading/storing base address
[ARM] 3397/1: AT91RM9200 Header update
[ARM] 3385/1: Battery support for sharp zaurus sl-5500 (collie)
[ARM] SMP: don't set cpu_*_map in smp_prepare_boot_cpu
include/linux/clk.h is betraying its ARM origins
[ARM] Move enable_irq and disable_irq to assembler.h
[ARM] 3391/1: use PLAT8250_DEV_PLATFORM{,1} for platform device id instead of 0/1
Patch from Lennert Buytenhek
Add a PLAT8250_DEV_PLATFORM2, and convert the two ixdp2x01 CPLD serial
ports to use platform serial devices with ids PLAT8250_DEV_PLATFORM[12].
(The on-chip xscale UART is PLAT8250_DEV_PLATFORM, id #0.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
include/linux/clk.h is betraying its ARM origins.
Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...
Fixed trivial conflict in security/selinux/hooks.c
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits)
SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
LOCKD: Make nlmsvc_traverse_shares return void
LOCKD: nlmsvc_traverse_blocks return is unused
SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
SUNRPC: Fix memory barriers for req->rq_received
NFS: Fix a race in nfs_sync_inode()
NFS: Clean up nfs_flush_list()
NFS: Fix a race with PG_private and nfs_release_page()
NFSv4: Ensure the callback daemon flushes signals
SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
NFS, NLM: Allow blocking locks to respect signals
NFS: Make nfs_fhget() return appropriate error values
NFSv4: Fix an oops in nfs4_fill_super
lockd: blocks should hold a reference to the nlm_file
NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
NFSv4: Send the delegation stateid for SETATTR calls
...
This fixes problems with very large nodes (over 128GB) filling up all of
the first 4GB with their mem_map and not leaving enough space for the
swiotlb.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Move the core parser into dmi_scan.c. It can be useful for other
subsystems too.
- Differentiate between field doesn't exist and field is 0 or
unparseable. The first case is likely an old BIOS with broken ACPI,
the later is likely a slightly buggy BIOS where someone forget to
edit the date. Don't blacklist in the later case.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
BUG_ON() Conversion in drivers/video/
BUG_ON() Conversion in drivers/parisc/
BUG_ON() Conversion in drivers/block/
BUG_ON() Conversion in sound/sparc/cs4231.c
BUG_ON() Conversion in drivers/s390/block/dasd.c
BUG_ON() Conversion in lib/swiotlb.c
BUG_ON() Conversion in kernel/cpu.c
BUG_ON() Conversion in ipc/msg.c
BUG_ON() Conversion in block/elevator.c
BUG_ON() Conversion in fs/coda/
BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
BUG_ON() Conversion in input/serio/hil_mlc.c
BUG_ON() Conversion in md/dm-hw-handler.c
BUG_ON() Conversion in md/bitmap.c
The comment describing how MS_ASYNC works in msync.c is confusing
rcu: undeclared variable used in documentation
fix typos "wich" -> "which"
typo patch for fs/ufs/super.c
Fix simple typos
tabify drivers/char/Makefile
...
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NETFILTER] x_table.c: sem2mutex
[IPV4]: Aggregate route entries with different TOS values
[TCP]: Mark tcp_*mem[] __read_mostly.
[TCP]: Set default max buffers from memory pool size
[SCTP]: Fix up sctp_rcv return value
[NET]: Take RTNL when unregistering notifier
[WIRELESS]: Fix config dependencies.
[NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms.
[NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated.
[MODULES]: Don't allow statically declared exports
[BRIDGE]: Unaligned accesses in the ethernet bridge
This removes the support for pps. It's completely unused within the kernel
and is basically in the way for further cleanups. It should be easier to
readd proper support for it after the rest has been converted to NTP4
(where the pps mechanisms are quite different from NTP3 anyway).
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes duplicate definitions from include/linux/udf_fs_i.h
which are already defined in fs/udf/ecma_167.h.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Moved check for online cpu out of smp_prepare_cpu()
- Moved default declaration of smp_prepare_cpu() to kernel/cpu.c
- Removed lock_cpu_hotplug() from smp_prepare_cpu() to around it, since
its called from cpu_up() as well now.
- Removed clearing from cpu_present_map during cpu_offline as it breaks
using cpu_up() directly during a subsequent online operation.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
text data bss dec hex filename
before: 3605597 1363528 363328 5332453 515de5 vmlinux
after: 3605295 1363612 363200 5332107 515c8b vmlinux
218 bytes saved.
Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n.
This function seems inefficient. Can't we simply AND the two masks, then use
find_first_bit()?
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Shrinks the only caller (net/bridge/netfilter/ebtables.c) by 174 bytes.
Also, optimise highest_possible_processor_id() out of existence on
CONFIG_SMP=n.
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Documentation changes to help radix tree users avoid overrunning the tags
array. RADIX_TREE_TAGS moves to linux/radix-tree.h and is now known as
RADIX_TREE_MAX_TAGS (Nick Piggin's idea). Tag parameters are changed to
unsigned, and some comments are updated.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fls() takes an integer, so roundup_pow_of_two() is busted for ulongs larger
than 2^32-1.
Fix this by implementing and using fls_long().
(Why does roundup_pow_of_two() return a long?)
(Why is roundup_pow_of_two() __attribute_const__ whereas long_log2() is
__attribute_pure__?)
(Why does long_log2() suck so much? Because we were missing fls_long()?)
Cc: Roland Dreier <rdreier@cisco.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move capable() to kernel/capability.c and eliminate duplicate
implementations. Add __capable() function which can be used to check for
capabiilty of any process.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a proper prototype for setup_arch() in init.h.
This patch is based on a patch by Ben Dooks <ben-linux@fluff.org>.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Uninline some massive IRQ migration functions. Put them in the new
kernel/irq/migration.c.
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the assumption that driver_register() returns the number of devices
bound to the driver. In fact, it returns zero for success or a negative
error value.
zorro_module_init() used the device count to automatically unregister and
unload drivers that found no devices. That might have worked at one time,
but has been broken for some time because zorro_register_driver() returned
either a negative error or a positive count (never zero). So it could only
unregister on failure, when it's not needed anyway.
This functionality could be resurrected in individual drivers by counting
devices in their .probe() methods.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the assumption that driver_register() returns the number of devices
bound to the driver. In fact, it returns zero for success or a negative
error value.
dio_module_init() used the device count to automatically unregister and
unload drivers that found no devices. That might have worked at one time,
but has been broken for some time because dio_register_driver() returned
either a negative error or a positive count (never zero). So it could only
unregister on failure, when it's not needed anyway.
This functionality could be resurrected in individual drivers by counting
devices in their .probe() methods.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Philip Blundell <philb@gnu.org>
Cc: Jochen Friedrich <jochen@scram.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up several places where gcc issues warnings when -W is specified.
Thanks to Neil for finding that.
Signed-off-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Today I wondered about /dev/parport<n> after not seeing anything in
drivers/parport register char-major-99. Having PP_MAJOR in
include/linux/major.h would've allowed me to more quickly determine that it
was the ppdev driver driving these.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previous inotify work avoidance is good when inotify is completely unused,
but it breaks down if even a single watch is in place anywhere in the
system. Robin Holt notices that udev is one such culprit - it slows down a
512-thread application on a 512 CPU system from 6 seconds to 22 minutes.
Solve this by adding a flag in the dentry that tells inotify whether or not
its parent inode has a watch on it. Event queueing to parent will skip
taking locks if this flag is cleared. Setting and clearing of this flag on
all child dentries versus event delivery: this is no in terms of race
cases, and that was shown to be equivalent to always performing the check.
The essential behaviour is that activity occuring _after_ a watch has been
added and _before_ it has been removed, will generate events.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
param_array() in kernel/params.c can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MODULE_PARM was actually breaking: recent gcc version optimize them out as
unused. It's time to replace the last users, which are generally in the
most unloved drivers anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a couple of 'const' qualifiers to the TTY flip buffer APIs, where
appropriate.
Signed-off-by: Thomas Koeller <thomas@koeller.dyndns.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce FMODE_EXEC file flag, to indicate that file is being opened for
execution. This is useful for distributed filesystems to maintain
consistent behavior for returning ETXTBUSY when opening for write and
execution happens on different nodes.
akpm:
Needed by Lustre at present. I assume their objective to to work towards
being able to install Lustre on an unmodified distro kernel, which seems
sane. It should have zero runtime cost.
Trond and Chuck indicate that NFS4 can probably use this too, for the same
thing.
Steven says it's also on the GFS todo list.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a bug in reiserfs truncate. A transaction might overflow
when truncating long highly fragmented file. The fix is to split
truncation into several transactions to avoid overflowing.
Signed-off-by: Vladimir V. Saveliev <vs@namesys.com>
Cc; Charles McColgan <cm@chuck.net>
Cc: Alexander Zarochentsev <zam@namesys.com>
Cc: Hans Reiser <reiser@namesys.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no reason for iprune_mutex being global.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kjournald timer is currently on the kernel thread's stack and the journal
structure points at it. Save a pointer hop by moving the timer into the
journal structure.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As suggested by Eric Dumazet, optimize kzalloc() calls that pass a
compile-time constant size. Please note that the patch increases kernel
text slightly (~200 bytes for defconfig on x86).
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a memory-zeroing variant of kmem_cache_alloc. The allocator
already exits in XFS and there are potential users for it so this patch
makes the allocator available for the general public.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
alarm() calls the kernel with an unsigend int timeout in seconds. The
value is stored in the tv_sec field of a struct timeval to setup the
itimer. The tv_sec field of struct timeval is of type long, which causes
the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.
Before the hrtimer merge (pre 2.6.16) such a negative value was converted
to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
not clear whether this was intended or just happened to be done by the
timeval_to_jiffies code.
hrtimers expect a timeval in canonical form and treat a negative timeout as
already expired. This breaks the legitimate usage of alarm() with a
timeout value > INT_MAX seconds.
For 32 bit machines it is therefor necessary to limit the internal seconds
value to avoid API breakage. Instead of doing this in all implementations
of sys_alarm the duplicated sys_alarm code is moved into a common function
in itimer.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an extern declaration for exported symbols to make the compiler warn
on symbols declared statically.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new audio mode V4L2_TUNER_MODE_LANG1_LANG2 (used by VIDIOC_G/S_TUNER).
This mode allows the user to select both languages of a bilingual transmission,
one language on the left, one on the right audio channel. If there is no
bilingual transmission, or it is not supported, then this mode should act like
V4L2_TUNER_MODE_STEREO.
This mode is introduced for PVR-like drivers where it is useful to be able to
record both languages of a bilingual broadcast.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
E.D.D. has no user in-tree and mostly useless. Kill it. For possible
out-of-tree users, add a nice warning message and error handling if
LLDD doesn't report any useable reset mechanism (and thus tries to use
E.D.D.).
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] libata: Remove dependence on host_set->dev for SAS
[PATCH] libata: ata_scsi_ioctl cleanup
[PATCH] libata: ata_scsi_queuecmd cleanup
[libata] export ata_dev_pair; trim trailing whitespace
[PATCH] libata: add ata_dev_pair helper
[PATCH] Make libata not powerdown drivers on PM_EVENT_FREEZE.
[PATCH] libata: make ata_set_mode() responsible for failure handling
[PATCH] libata: use ata_dev_disable() in ata_bus_probe()
[PATCH] libata: implement ata_dev_disable()
[PATCH] libata: check if port is disabled after internal command
[PATCH] libata: make per-dev transfer mode limits per-dev
[PATCH] libata: add per-dev pio/mwdma/udma_mask
[PATCH] libata: implement ata_unpack_xfermask()
[libata] Move some bmdma-specific code to libata-bmdma.c
[libata sata_uli] kill scr_addr abuse
[libata sata_nv] eliminate duplicate codepaths with iomap
[libata sata_nv] cleanups: convert #defines to enums; remove in-file history
[libata sata_sil24] cleanups: use pci_iomap(), kzalloc()
This patch series creates a strndup_user() function to easy copying C strings
from userspace. Also we avoid common pitfalls like userspace modifying the
final \0 after the strlen_user().
Signed-off-by: Davi Arnaut <davi.arnaut@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the softlockup detector purely timer-interrupt driven, removing
softirq-context (timer) dependencies. This means that if the softlockup
watchdog triggers, it has truly observed a longer than 10 seconds
scheduling delay of a SCHED_FIFO prio 99 task.
(the patch also turns off the softlockup detector during the initial bootup
phase and does small style fixes)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In some embedded systems the IDE hardware interface may only support 16-bit
or smaller accesses. Allow the interface to specify if this is the case
and don't allow the drive or user to override the setting.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Class code and register definitions for the Secure Digital Host Controller
standard.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pull the guts out of do_fsync() - we can use it elsewhere.
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need set_page_dirty() to return true if it actually transitioned the page
from a clean to dirty state. This wasn't right in a couple of places. Do a
kernel-wide audit, fix things up.
This leaves open the possibility of returning a negative errno from
set_page_dirty() sometime in the future. But we don't do that at present.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify balance_dirty_pages_ratelimited() so that it can take a
number-of-pages-which-I-just-dirtied argument. For msync().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add two new linux-specific fadvise extensions():
LINUX_FADV_ASYNC_WRITE: start async writeout of any dirty pages between file
offsets `offset' and `offset+len'. Any pages which are currently under
writeout are skipped, whether or not they are dirty.
LINUX_FADV_WRITE_WAIT: wait upon writeout of any dirty pages between file
offsets `offset' and `offset+len'.
By combining these two operations the application may do several things:
LINUX_FADV_ASYNC_WRITE: push some or all of the dirty pages at the disk.
LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE: push all of the currently dirty
pages at the disk.
LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE, LINUX_FADV_WRITE_WAIT: push all
of the currently dirty pages at the disk, wait until they have been written.
It should be noted that none of these operations write out the file's
metadata. So unless the application is strictly performing overwrites of
already-instantiated disk blocks, there are no guarantees here that the data
will be available after a crash.
To complete this suite of operations I guess we should have a "sync file
metadata only" operation. This gives applications access to all the building
blocks needed for all sorts of sync operations. But sync-metadata doesn't fit
well with the fadvise() interface. Probably it should be a new syscall:
sys_fmetadatasync().
The patch also diddles with the meaning of `endbyte' in sys_fadvise64_64().
It is made to represent that last affected byte in the file (ie: it is
inclusive). Generally, all these byterange and pagerange functions are
inclusive so we can easily represent EOF with -1.
As Ulrich notes, these two functions are somewhat abusive of the fadvise()
concept, which appears to be "set the future policy for this fd".
But these commands are a perfect fit with the fadvise() impementation, and
several of the existing fadvise() commands are synchronous and don't affect
future policy either. I think we can live with the slight incongruity.
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide abstraction for generating type and size information of assembly
routines and data, while permitting architectures to override these
defaults.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Russell King" <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Andi Kleen" <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The hooks in the slab cache allocator code path for support of NUMA
mempolicies and cpuset memory spreading are in an important code path. Many
systems will use neither feature.
This patch optimizes those hooks down to a single check of some bits in the
current tasks task_struct flags. For non NUMA systems, this hook and related
code is already ifdef'd out.
The optimization is done by using another task flag, set if the task is using
a non-default NUMA mempolicy. Taking this flag bit along with the
PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
memory spreading' patch set, one can check for the combination of any of these
special case memory placement mechanisms with a single test of the current
tasks task_struct flags.
This patch also tightens up the code, to save a few bytes of kernel text
space, and moves some of it out of line. Due to the nested inlines called
from multiple places, we were ending up with three copies of this code, which
once we get off the main code path (for local node allocation) seems a bit
wasteful of instruction memory.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide the slab cache infrastructure to support cpuset memory spreading.
See the previous patches, cpuset_mem_spread, for an explanation of cpuset
memory spreading.
This patch provides a slab cache SLAB_MEM_SPREAD flag. If set in the
kmem_cache_create() call defining a slab cache, then any task marked with the
process state flag PF_MEMSPREAD will spread memory page allocations for that
cache over all the allowed nodes, instead of preferring the local (faulting)
node.
On systems not configured with CONFIG_NUMA, this results in no change to the
page allocation code path for slab caches.
On systems with cpusets configured in the kernel, but the "memory_spread"
cpuset option not enabled for the current tasks cpuset, this adds a call to a
cpuset routine and failed bit test of the processor state flag PF_SPREAD_SLAB.
For tasks so marked, a second inline test is done for the slab cache flag
SLAB_MEM_SPREAD, and if that is set and if the allocation is not
in_interrupt(), this adds a call to to a cpuset routine that computes which of
the tasks mems_allowed nodes should be preferred for this allocation.
==> This patch adds another hook into the performance critical
code path to allocating objects from the slab cache, in the
____cache_alloc() chunk, below. The next patch optimizes this
hook, reducing the impact of the combined mempolicy plus memory
spreading hooks on this critical code path to a single check
against the tasks task_struct flags word.
This patch provides the generic slab flags and logic needed to apply memory
spreading to a particular slab.
A subsequent patch will mark a few specific slab caches for this placement
policy.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the page cache allocation calls to support cpuset memory spreading.
See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory
spreading.
On systems without cpusets configured in the kernel, this is no change.
On systems with cpusets configured in the kernel, but the "memory_spread"
cpuset option not enabled for the current tasks cpuset, this adds a call to a
cpuset routine and failed bit test of the processor state flag PF_SPREAD_PAGE.
On tasks in cpusets with "memory_spread" enabled, this adds a call to a cpuset
routine that computes which of the tasks mems_allowed nodes should be
preferred for this allocation.
If memory spreading applies to a particular allocation, then any other NUMA
mempolicy does not apply.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the implementation and cpuset interface for an alternative
memory allocation policy that can be applied to certain kinds of memory
allocations, such as the page cache (file system buffers) and some slab caches
(such as inode caches).
The policy is called "memory spreading." If enabled, it spreads out these
kinds of memory allocations over all the nodes allowed to a task, instead of
preferring to place them on the node where the task is executing.
All other kinds of allocations, including anonymous pages for a tasks stack
and data regions, are not affected by this policy choice, and continue to be
allocated preferring the node local to execution, as modified by the NUMA
mempolicy.
There are two boolean flag files per cpuset that control where the kernel
allocates pages for the file system buffers and related in kernel data
structures. They are called 'memory_spread_page' and 'memory_spread_slab'.
If the per-cpuset boolean flag file 'memory_spread_page' is set, then the
kernel will spread the file system buffers (page cache) evenly over all the
nodes that the faulting task is allowed to use, instead of preferring to put
those pages on the node where the task is running.
If the per-cpuset boolean flag file 'memory_spread_slab' is set, then the
kernel will spread some file system related slab caches, such as for inodes
and dentries evenly over all the nodes that the faulting task is allowed to
use, instead of preferring to put those pages on the node where the task is
running.
The implementation is simple. Setting the cpuset flags 'memory_spread_page'
or 'memory_spread_cache' turns on the per-process flags PF_SPREAD_PAGE or
PF_SPREAD_SLAB, respectively, for each task that is in the cpuset or
subsequently joins that cpuset. In subsequent patches, the page allocation
calls for the affected page cache and slab caches are modified to perform an
inline check for these flags, and if set, a call to a new routine
cpuset_mem_spread_node() returns the node to prefer for the allocation.
The cpuset_mem_spread_node() routine is also simple. It uses the value of a
per-task rotor cpuset_mem_spread_rotor to select the next node in the current
tasks mems_allowed to prefer for the allocation.
This policy can provide substantial improvements for jobs that need to place
thread local data on the corresponding node, but that need to access large
file system data sets that need to be spread across the several nodes in the
jobs cpuset in order to fit. Without this patch, especially for jobs that
might have one thread reading in the data set, the memory allocation across
the nodes in the jobs cpuset can become very uneven.
A couple of Copyright year ranges are updated as well. And a couple of email
addresses that can be found in the MAINTAINERS file are removed.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make that the internal values for:
/proc/sys/vm/dirty_writeback_centisecs
/proc/sys/vm/dirty_expire_centisecs
are stored as jiffies instead of centiseconds. Let the sysctl interface do
the conversions with full precision using clock_t_to_jiffies, instead of
doing overflow-sensitive on-the-fly conversions every time the values are
used.
Cons: apparent precision loss if HZ is not a multiple of 100, because of
conversion back and forth. This is a common problem for all sysctl values
that use proc_dointvec_userhz_jiffies. (There is only one other in-tree
use, in net/core/neighbour.c.)
Signed-off-by: Bart Samwel <bart@samwel.tk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mundt <lethal@linux-sh.org> says:
This patch set implements a number of patches to clean up and restructure the
bitmap region code, in addition to extending the interface to support
multiword spanning allocations.
The current implementation (before this patch set) is limited by only being
able to allocate pages <= BITS_PER_LONG, as noted by the strategically
positioned BUG_ON() at lib/bitmap.c:752:
/* We don't do regions of pages > BITS_PER_LONG. The
* algorithm would be a simple look for multiple zeros in the
* array, but there's no driver today that needs this. If you
* trip this BUG(), you get to code it... */
BUG_ON(pages > BITS_PER_LONG);
As I seem to have been the first person to trigger this, the result ends up
being the following patch set with the help of Paul Jackson.
The final patch in the series eliminates quite a bit of code duplication, so
the bitmap code size ends up being smaller than the current implementation as
an added bonus.
After these are applied, it should already be possible to do multiword
allocations with dma_alloc_coherent() out of ranges established by
dma_declare_coherent_memory() on x86 without having to change any of the code,
and the SH store queue API will follow up on this as the other user that needs
support for this.
This patch:
Some code cleanup on the lib/bitmap.c bitmap_*_region() routines:
* spacing
* variable names
* comments
Has no change to code function.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The meaning of MS_VERBOSE is backwards; if the bit is set, it really means,
"don't be verbose". This is confusing and counter-intuitive.
In addition, there is also no way to set the MS_VERBOSE flag in the
mount(8) program in util-linux, but interesting, it does define options
which would do the right thing if MS_SILENT were defined, which
unfortunately we do not:
#ifdef MS_SILENT
{ "quiet", 0, 0, MS_SILENT }, /* be quiet */
{ "loud", 0, 1, MS_SILENT }, /* print out messages. */
#endif
So the obvious fix is to deprecate the use of MS_VERBOSE and replace it
with MS_SILENT.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All architecture independent system calls should be declared
in syscalls.h, add the one that is missing.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove some of the dependence on the host_set struct
in preparation for supporting SAS HBAs. Adds a struct device
pointer to the ata_port struct.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
At the moment libata doesn't pass pm_message_t down ata_device_suspend.
This causes drives to be powered down when we just want a freeze,
causing unnecessary wear and tear. This patch gets pm_message_t passed
down so that it can be used to determine whether to power down the
drive.
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
drivers/scsi/libata-core.c | 5 +++--
drivers/scsi/libata-scsi.c | 4 ++--
drivers/scsi/scsi_sysfs.c | 2 +-
include/linux/libata.h | 4 ++--
include/scsi/scsi_host.h | 2 +-
5 files changed, 9 insertions(+), 8 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add per-dev pio/mwdma/udma_mask. All transfer mode limits used to be
applied to ap->*_mask which unnecessarily restricted other devices
sharing the port. This change will also benefit later EH speed down
and hotplug.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'blktrace' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
[PATCH] relay: consolidate sendfile() and read() code
[PATCH] relay: add sendfile() support
[PATCH] relay: migrate from relayfs to a generic relay API
Add missing 'const' to pci_request_region[s] 'res_name' arg,
since we pass it directly to __request_region(), whose 'name' arg
is also const.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out AMD 8131 quirk only affects MSI for devices behind the 8131 bridge.
Handle this by adding a flags field in pci_bus, inherited from parent to child.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change the semantics of this call to return the max reserved
bus number instead of just the max assigned bus number.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add Broadcom HT-1000 south bridge's PCI ID to i2c-piix driver. Note
that at least on Supermicro H8SSL it uses non-standard SMBHSTCFG = 3
and standard values like 0 or 9 causes hangup.
Signed-off-by: Martin Devera <devik@cdi.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Drop the i2c-frodo bus driver. It isn't referenced by the build
system, and depends on code which was never included in 2.6 kernels.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
semaphore to mutex conversion.
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactors SENSOR_DEVICE_ATTR_2 macro, following pattern set by
SENSOR_ATTR. First it creates a new macro SENSOR_ATTR_2() which expands
to an initialization expression, then it uses that in SENSOR_DEVICE_ATTR_2,
which declares and initializes a struct sensor_device_attribute_2.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactors SENSOR_DEVICE_ATTR macro. First it creates a new
macro SENSOR_ATTR() which expands to an initialization expression, then
it uses that in SENSOR_DEVICE_ATTR, which declares and initializes a
struct sensor_device_attribute.
IOW, SENSOR_ATTR() imitates __ATTR() in include/linux/device.h.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate a handful of cache references by keeping current in a register
instead of reloading (helps x86) and avoiding the overhead of a function
call. Inlining eventpoll_init_file() saves 24 bytes. Also reorder file
initialization to make writes occur more sequentially.
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes two needlessly global structs static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.
If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.
The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes all occurances of _INLINE_ in the kernel.
With the exception of tty_flip.h, I've simply removed the inline's since
gcc should know best which functions to be inlined.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Declarations use struct notifier_block on both legs of the ifdef, so move the
notifier_block forward declaration outside the ifdef.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The fat code uses the fat_lock always in a mutex way (taking and releasing
the lock in the same function), the patch below converts it into the new
mutex primitive. Please consider this patch for the code.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ext3's truncate_sem is always released in the same function it's taken
and it otherwise is a mutex as well..
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Seems like needless clutter having a bunch of #if defined(CONFIG_$ARCH) in
include/linux/cache.h. Move the per architecture section definition to
asm/cache.h, and keep the if-not-defined dummy case in linux/cache.h to
catch architectures which don't implement the section.
Verified that symbols still go in .data.read_mostly on parisc,
and the compile doesn't break.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since early 2.4.x all cdrom drivers implement the block_device methods
themselves, so they can handle additional ioctls directly instead of going
through the cdrom layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
platforms, lowering kmalloc() allocated space by 50%.
2) Reduce the size of (files_struct), using a special 32 bits (or
64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
close_on_exec_init and open_fds_init fields. This save some ram (248
bytes per task) as most tasks dont open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.
3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.
UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus points out that ext3_readdir's readahead only cuts in when
ext3_readdir() is operating at the very start of the directory. So for large
directories we end up performing no readahead at all and we suck.
So take it all out and use the core VM's page_cache_readahead(). This means
that ext3 directory reads will use all of readahead's dynamic sizing goop.
Note that we're using the directory's filp->f_ra to hold the readahead state,
but readahead is actually being performed against the underlying blockdev's
address_space. Fortunately the readahead code is all set up to handle this.
Tested with printk. It works. I was struggling to find a real workload which
actually cared.
(The patch also exports page_cache_readahead() to GPL modules)
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is unsafe to suspend devices if the hardware is controlled by X. Add an
extra check to prevent this from happening.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move externs from C source files to header files.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce the low level interface that can be used for handling the
snapshot of the system memory by the in-kernel swap-writing/reading code of
swsusp and the userland interface code (to be introduced shortly).
Also change the way in which swsusp records the allocated swap pages and,
consequently, simplifies the in-kernel swap-writing/reading code (this is
necessary for the userland interface too). To this end, it introduces two
helper functions in mm/swapfile.c, so that the swsusp code does not refer
directly to the swap internals.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both R1BIO_Barrier and R1BIO_Returned are 4 !!!!
This means that barrier requests don't get returned (i.e. b_endio called)
because it looks like they already have been.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is version 20 of the Wireless Extensions. This is the
completion of the RtNetlink work I started early 2004, it enables the
full Wireless Extension API over RtNetlink.
Few comments on the patch :
o totally driver transparent, no change in drivers needed.
o iwevent were already RtNetlink based since they were created
(around 2.5.7). This adds all the regular SET and GET requests over
RtNetlink, using the exact same mechanism and data format as iwevents.
o This is a Kconfig option, as currently most people have no
need for it. Surprisingly, patch is actually small and well
encapsulated.
o Tested on SMP, attention as been paid to make it 64 bits clean.
o Code do probably too many checks and could be further
optimised, but better safe than sorry.
o RtNetlink based version of the Wireless Tools available on
my web page for people inclined to try out this stuff.
I would also like to thank Alexey Kuznetsov for his helpful
suggestions to make this patch better.
Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add support for new chip 5755 which is very similar to 5787.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To really make sense of route notifications in the presence of
multiple tables, userspace also needs to be notified about routing
rule updates. Notifications are sent to the so far unused
RTNLGRP_NOP1 (now RTNLGRP_RULE) group.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TCP]: Do not use inet->id of global tcp_socket when sending RST.
[NETFILTER]: Fix undefined references to get_h225_addr
[NETFILTER]: futher {ip,ip6,arp}_tables unification
[NETFILTER]: Fix xt_policy address matching
[NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
[NETFILTER]: x_tables: set the protocol family in x_tables targets/matches
[NETFILTER]: conntrack: cleanup the conntrack ID initialization
[NETFILTER]: nfnetlink_queue: fix nfnetlink message size
[NETFILTER]: ctnetlink: Fix expectaction mask dumping
[NETFILTER]: Fix Kconfig typos
[NETFILTER]: Fix ip6tables breakage from {get,set}sockopt compat layer
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] Merge avlab serial board entries in parport_serial
[SERIAL] kernel console should send CRLF not LFCR
This patch moves {ip,ip6,arp}t_entry_{match,target} definitions to
x_tables.h. This move simplifies code and future compatibility fixes.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the family field in xt_[matches|targets] registered.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TG3]: Bump driver version and reldate.
[TG3]: Skip phy power down on some devices
[TG3]: Fix SRAM access during tg3_init_one()
[X25]: dte facilities 32 64 ioctl conversion
[X25]: allow ITU-T DTE facilities for x25
[X25]: fix kernel error message 64 bit kernel
[X25]: ioctl conversion 32 bit user to 64 bit kernel
[NET]: socket timestamp 32 bit handler for 64 bit kernel
[NET]: allow 32 bit socket ioctl in 64 bit kernel
[BLUETOOTH]: Return negative error constant
Centralize the page migration functions in anticipation of additional
tinkering. Creates a new file mm/migrate.c
1. Extract buffer_migrate_page() from fs/buffer.c
2. Extract central migration code from vmscan.c
3. Extract some components from mempolicy.c
4. Export pageout() and remove_from_swap() from vmscan.c
5. Make it possible to configure NUMA systems without page migration
and non-NUMA systems with page migration.
I had to so some #ifdeffing in mempolicy.c that may need a cleanup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.
Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range(). On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.
In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).
This patch cleans up by abolishing is_aligned_hugepage_range(). Instead
prepare_hugepage_range() is defined directly. Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage. ia64 and powerpc define custom versions. The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned. The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.
No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The optional hugepage callback, hugetlb_free_pgd_range() is presently
implemented non-trivially only on ia64 (but I plan to add one for powerpc
shortly). It has its own prototype for the function in asm-ia64/pgtable.h.
However, since the function is called from generic code, it make sense for
its prototype to be in the generic hugetlb.h header file, as the protypes
other arch callbacks already are (prepare_hugepage_range(),
set_huge_pte_at(), etc.). This patch makes it so.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
of the normal free_pgd_range() on hugepage VMAs. However, the test it uses
to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
range at the start of the vma. is_hugepage_only_range() will return true
if the given range has any intersection with a hugepage address region, and
in this case the given region need not be hugepage aligned. So, for
example, this test can return true if called on, say, a 4k VMA immediately
preceding a (nicely aligned) hugepage VMA.
At present we get away with this because the powerpc version of
hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64 (the
only other arch with a non-trivial is_hugepage_only_range()) we get away
with it for a different reason; the hugepage area is not contiguous with
the rest of the user address space, and VMAs are not permitted in between,
so the test can't return a false positive there.
Nonetheless this should be fixed. We do that in the patch below by
replacing the is_hugepage_only_range() test with an explicit test of the
VMA using is_vm_hugetlb_page().
This in turn changes behaviour for platforms where is_hugepage_only_range()
returns false always (everything except powerpc and ia64). We address this
by ensuring that hugetlb_free_pgd_range() is defined to be identical to
free_pgd_range() (instead of a no-op) on everything except ia64. Even so,
it will prevent some otherwise possible coalescing of calls down to
free_pgd_range(). Since this only happens for hugepage VMAs, removing this
small optimization seems unlikely to cause any trouble.
This patch causes no regressions on the libhugetlbfs testsuite - ppc64
POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Originally, mm/hugetlb.c just handled the hugepage physical allocation path
and its {alloc,free}_huge_page() functions were used from the arch specific
hugepage code. These days those functions are only used with mm/hugetlb.c
itself. Therefore, this patch makes them static and removes their
prototypes from hugetlb.h. This requires a small rearrangement of code in
mm/hugetlb.c to avoid a forward declaration.
This patch causes no regressions on the libhugetlbfs testsuite (ppc64,
POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These days, hugepages are demand-allocated at first fault time. There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.
A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations. In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available. In particular, this defeats such
a program which will detect a mapping failure and adjust its hugepage usage
downward accordingly.
The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instatiated. MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL. MAP_PRIVATE mappings can still
trigger an OOM. (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation)
This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.
This patch causes no regressions on the libhugetblfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
mprotect.
From: David Gibson <david@gibson.dropbear.id.au>
Remove a test from the mprotect() path which checks that the mprotect()ed
range on a hugepage VMA is hugepage aligned (yes, really, the sense of
is_aligned_hugepage_range() is the opposite of what you'd guess :-/).
In fact, we don't need this test. If the given addresses match the
beginning/end of a hugepage VMA they must already be suitably aligned. If
they don't, then mprotect_fixup() will attempt to split the VMA. The very
first test in split_vma() will check for a badly aligned address on a
hugepage VMA and return -EINVAL if necessary.
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE. The
identify of hugetlb pte is lost when changing page protection via mprotect.
A page fault occurs later will trigger a bug check in huge_pte_alloc().
The fix is to always make new pte a hugetlb pte and also to clean up
legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimise page_count compound page test and make it consistent with similar
functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that compound page handling is properly fixed in the VM, move nommu
over to using compound pages rather than rolling their own refcounting.
nommu vm page refcounting is broken anyway, but there is no need to have
divergent code in the core VM now, nor when it gets fixed.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: David Howells <dhowells@redhat.com>
(Needs testing, please).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove __put_page from outside the core mm/. It is dangerous because it does
not handle compound pages nicely, and misses 1->0 transitions. If a user
later appears that really needs the extra speed we can reevaluate.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Turn basically everything in vmscan.c into `unsigned long'. This is to avoid
the possibility that some piece of code in there might decide to operate upon
more than 4G (or even 2G) of pages in one hit.
This might be silly, but we'll need it one day.
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When on_each_cpu() runs the callback on other CPUs, it runs with local
interrupts disabled. So we should run the function with local interrupts
disabled on this CPU, too.
And do the same for UP, so the callback is run in the same environment on both
UP and SMP. (strictly it should do preempt_disable() too, but I think
local_irq_disable is sufficiently equivalent).
Also uninlines on_each_cpu(). softirq.c was the most appropriate file I could
find, but it doesn't seem to justify creating a new file.
Oh, and fix up that comment over (under?) x86's smp_call_function(). It
drives me nuts.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure. However, that is not what happens. The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap(). Cache_reap()
is run every few seconds independently of memory pressure.
Could we remove the whole thing? Its only used by three slabs anyways and
I cannot find a reason for having this option.
There is an additional problem with SLAB_NO_REAP. If set then the recovery
of objects from alien caches is switched off. Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk. (Strangely __cache_shrink does not
check for SLAB_NO_REAP)
Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since size_t has the same size as a long on all architectures, it's enough
for overflow checks to check against ULONG_MAX.
This change could allow a compiler better optimization (especially in the
n=1 case).
The practical effect seems to be positive, but quite small:
text data bss dec hex filename
21762380 5859870 1848928 29471178 1c1b1ca vmlinux-old
21762211 5859870 1848928 29471009 1c1b121 vmlinux-patched
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clarify that preemption needs to be guarded against with the
__xxx_page_state functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Have an explicit mm call to split higher order pages into individual pages.
Should help to avoid bugs and be more explicit about the code's intention.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
atomic_add_unless (atomic_inc_not_zero) no longer requires an offset refcount
to function correctly.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The VM has an interesting race where a page refcount can drop to zero, but it
is still on the LRU lists for a short time. This was solved by testing a 0->1
refcount transition when picking up pages from the LRU, and dropping the
refcount in that case.
Instead, use atomic_add_unless to ensure we never pick up a 0 refcount page
from the LRU, thus a 0 refcount page will never have its refcount elevated
until it is allocated again.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
More atomic operation removal from page allocator
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the page release paths, we can be sure that nobody will mess with our
page->flags because the refcount has dropped to 0. So no need for atomic
operations here.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
operations.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
operations.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the DMA_28BIT_MASK constant in dma-mapping.h
ALSA drivers using this mask are changed to use the new constant.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Jaroslav Kysela <perex@suse.cz>
Allows use of the optional user facility to insert ITU-T
(http://www.itu.int/ITU-T/) specified DTE facilities in call set-up x25
packets. This feature is optional; no facilities will be added if the ioctl
is not used, and call setup packet remains the same as before.
If the ioctls provided by the patch are used, then a facility marker will be
added to the x25 packet header so that the called dte address extension
facility can be differentiated from other types of facilities (as described in
the ITU-T X.25 recommendation) that are also allowed in the x25 packet header.
Facility markers are made up of two octets, and may be present in the x25
packet headers of call-request, incoming call, call accepted, clear request,
and clear indication packets. The first of the two octets represents the
facility code field and is set to zero by this patch. The second octet of the
marker represents the facility parameter field and is set to 0x0F because the
marker will be inserted before ITU-T type DTE facilities.
Since according to ITU-T X.25 Recommendation X.25(10/96)- 7.1 "All networks
will support the facility markers with a facility parameter field set to all
ones or to 00001111", therefore this patch should work with all x.25 networks.
While there are many ITU-T DTE facilities, this patch implements only the
called and calling address extension, with placeholders in the
x25_dte_facilities structure for the rest of the facilities.
Testing:
This patch was tested using a cisco xot router connected on its serial ports
to an X.25 network, and on its lan ports to a host running an xotd daemon.
It is also possible to test this patch using an xotd daemon and an x25tap
patch, where the xotd daemons work back-to-back without actually using an x.25
network. See www.fyonne.net for details on how to do this.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Andrew Hendry <ahendry@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the register_ioctl32_conversion() patch in the kernel is now obsolete,
provide another method to allow 32 bit user space ioctls to reach the kernel.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
HPA presence/enabled
HPA commands
Also add ata_id_is_cfa() as that is needed to detect and handle CF cards
which currently we reject.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add some dummy noop functions for use by libata clients
that do not need to do anything. Future SAS patches will
utilize these functions.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
SCSI midlayer has moved hostt->eh_timed_out to transport template. As
libata doesn't need full-blown transport support yet, implement
minimal transport for libata. No transport class or whatsoever, just
empty transport template with ->eh_timed_out hook.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[CRYPTO] aes: Fixed array boundary violation
[CRYPTO] tcrypt: Fix key alignment
[CRYPTO] all: Add missing cra_alignmask
[CRYPTO] all: Use kzalloc where possible
[CRYPTO] api: Align tfm context as wide as possible
[CRYPTO] twofish: Use rol32/ror32 where appropriate
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (23 commits)
[PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path
[PATCH] sysfs: don't export dir symbols
[PATCH] get_cpu_sysdev() signedness fix
[PATCH] kobject_add_dir
[PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data
[PATCH] sysfs: fix problem with duplicate sysfs directories and files
[PATCH] Kobject: kobject.h: fix a typo
[PATCH] Kobject: provide better warning messages when people do stupid things
[PATCH] Driver core: add macros notice(), dev_notice()
[PATCH] firmware: fix BUG: in fw_realloc_buffer
[PATCH] sysfs: kzalloc conversion
[PATCH] fix module sysfs files reference counting
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to USB subsystem
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to RCU subsystem
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE()
[PATCH] Clean up module.c symbol searching logic
[PATCH] kobj_map semaphore to mutex conversion
[PATCH] kref: avoid an atomic operation in kref_put()
[PATCH] handle errors returned by platform_get_irq*()
[PATCH] driver core: platform_get_irq*(): return -ENXIO on error
...
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] Fix cosmetic typo in asm/irq.h
[ARM] 3367/1: CLCD mode no longer supported on the RealView boards
[ARM] 3366/1: Allow the 16bpp mode configuration in the CLCD control register
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (112 commits)
[libata] sata_mv: fix irq port status usage
[PATCH] libata: move IDENTIFY info printing from ata_dev_read_id() to ata_dev_configure()
[PATCH] libata: use local *id instead of dev->id in ata_dev_configure()
[PATCH] libata: check Word 88 validity in ata_id_xfer_mask()
[PATCH] libata: fix class handling in ata_bus_probe()
[PATCH] ahci: enable prefetching for PACKET commands
libata: turn on ATAPI by default
[PATCH] sata_sil24: lengthen softreset timeout
[PATCH] sata_sil24: exit early from softreset if SStatus reports no device
[PATCH] libata: fix missing classes[] initialization in ata_bus_probe()
[PATCH] libata: kill unused xfer_mode functions
[PATCH] libata: reimplement ata_set_mode() using xfer_mask helpers
[PATCH] libata: use xfer_mask helpers in ata_dev_set_mode()
[PATCH] libata: use ata_id_xfermask() in ata_dev_configure()
[PATCH] libata: add xfer_mask handling functions
[PATCH] libata: improve xfer mask constants and update ata_mode_string()
[PATCH] libata: rename ATA_FLAG_FLUSH_PIO_TASK to ATA_FLAG_FLUSH_PORT_TASK
[PATCH] libata: kill unused pio_task and packet_task
[PATCH] libata: convert pio_task and packet_task to port_task
[PATCH] libata: implement port_task
...
This merges the DVB tree, but fixes up the history that had gotten
screwed up by a broken commit.
The history is fixed up by re-doing the commit properly (taking the
resolve from the final result of the original), and then cherry-picking
the commits that followed the broken merge.
* dvb: (190 commits)
V4L/DVB (3545): Fixed no_overlay option and quirks on saa7134 driver
V4L/DVB (3543): Fix Makefile to adapt to bt8xx/ conversion
V4L/DVB (3538): Bt8xx documentation update
V4L/DVB (3537a): Whitespace cleanup
V4L/DVB (3533): Add WSS (wide screen signalling) module parameters
V4L/DVB (3532): Moved duplicated code of ALPS BSRU6 tuner to a standalone file.
V4L/DVB (3530): Kconfig: remove VIDEO_AUDIO_DECODER
V4L/DVB (3529): Kconfig: add menu items for cs53l32a and wm8775 A/D converters
V4L/DVB (3528): Kconfig: fix ATSC frontend menu item names by manufacturer
V4L/DVB (3527): VIDEO_CPIA2 must depend on USB
V4L/DVB (3525): Kconfig: remove VIDEO_DECODER
V4L/DVB (3524): Kconfig: add menu items for saa7115 and saa7127
V4L/DVB (3494): Kconfig: select VIDEO_MSP3400 to build msp3400.ko
V4L/DVB (3522): Fixed a trouble with other PAL standards
V4L/DVB (3521): Avoid warnings at video-buf.c
V4L/DVB (3514): SAA7113 doesn't have auto std chroma detection mode
V4L/DVB (3513): Remove saa711x driver
V4L/DVB (3509): Make a needlessly global function static.
V4L/DVB (3506): Cinergy T2 dmx cleanup on disconnect
V4L/DVB (3504): Medion 7134: Autodetect second bridge chip
...
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Conflicts:
Documentation/video4linux/CARDLIST.cx88
drivers/media/video/cx88/Kconfig
drivers/media/video/em28xx/em28xx-video.c
drivers/media/video/saa7134/saa7134-dvb.c
Resolved as in the original merge by Mauro Carvalho Chehab
Since tfm contexts can contain arbitrary types we should provide at least
natural alignment (__attribute__ ((__aligned__))) for them. In particular,
this is needed on the Xscale which is a 32-bit architecture with a u64 type
that requires 64-bit alignment. This problem was reported by Ronen Shitrit.
The crypto_tfm structure's size was 44 bytes on 32-bit architectures and
80 bytes on 64-bit architectures. So adding this requirement only means
that we have to add an extra 4 bytes on 32-bit architectures.
On i386 the natural alignment is 16 bytes which also benefits the VIA
Padlock as it no longer has to manually align its context structure to
128 bits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jing Min Zhao <zhaojignmin@hotmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move nf_bridge_alloc from header file to the one place it is
used and optimize it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.
Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're now starting to have quite a number of places that do skb_pull
followed immediately by an skb_postpull_rcsum. We can merge these two
operations into one function with skb_pull_rcsum. This makes sense
since most pull operations on receive skb's need to update the
checksum.
I've decided to make this out-of-line since it is fairly big and the
fast path where hardware checksums are enabled need to call
csum_partial anyway.
Since this is a brand new function we get to add an extra check on the
len argument. As it is most callers of skb_pull ignore its return
value which essentially means that there is no check on the len
argument.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The typedef for dn_address has been removed in favour of using __le16
or __u16 directly as appropriate. All the DECnet header files are
updated accordingly.
The byte ordering of dn_eth2dn() and dn_dn2eth() are both changed
since just about all their callers wanted network order rather than
host order, so the conversion is now done in the functions themselves.
Several missed endianess conversions have been picked up during the
conversion process. The nh_gw field in struct dn_fib_info has been
changed from a 32 bit field to 16 bits as it ought to be.
One or two cases of using htons rather than dn_htons in the routing
code have been found and fixed.
There are still a few warnings to fix, but this patch deals with the
important cases.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.
Patch purpose:
This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using. The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection. In the case of UDP, the
security context is for each individual packet. An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.
Patch design approach:
- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association. The application may retrieve this context using
getsockopt. When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations. If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.
- Design for UDP
Unlike TCP, UDP is connectionless. This requires a somewhat different
API to retrieve the peer security context. With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down. With UDP, each read/write can have
different peer and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.
The solution is to build upon the existing Unix domain socket API for
retrieving user credentials. Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).
Patch implementation details:
- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag. As an example (ignoring error
checking):
getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
printf("Socket peer context is: %s\n", optbuf);
The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations. If
these have security associations with security contexts, the security
context is returned.
getsockopt returns a buffer that contains a security context string or
the buffer is unmodified.
- Implementation for UDP
To retrieve the security context, the application first indicates to
the kernel such desire by setting the IP_PASSSEC option via
getsockopt. Then the application retrieves the security context using
the auxiliary data mechanism.
An example server application for UDP should look like this:
toggle = 1;
toggle_len = sizeof(toggle);
setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
cmsg_hdr->cmsg_level == SOL_IP &&
cmsg_hdr->cmsg_type == SCM_SECURITY) {
memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
}
}
ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
a server socket to receive security context of the peer. A new
ancillary message type SCM_SECURITY.
When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space. An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space. The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.
Testing:
We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built. For TCP, we can then
extract the peer security context using getsockopt on either end. For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.
Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated. Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.
Those days are long gone, so we can use the full 16-bits by default
now.
There is a sysctl added so that we can still interact with such old
stacks
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel. And turn dev_hold into an inline instead of a macro.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and
gets rid of some of the leftover legacy.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here goes a patch for supporting TOIM3232 based serial IrDA dongles.
The code is based on the tekram dongle code.
It's been tested with a TOIM3232 based IRWave 320S dongle. It may work
for TOIM4232 dongles, although it's not been tested.
Signed-off-by: David Basden <davidb-irda@rcpt.to>
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Uninline kfree_skb, which saves some 15k of object code on my notebook.
o Allow kfree_skb to be called with a NULL argument.
Subsequent patches can remove conditional from drivers and further
reduce source and object size.
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly.
Based upon suggestions from Yoshifuji.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nlmsvc_traverse_shares return value is always zero, hence useless.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Note that we never return non-zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use a spinlock to ensure unique sequence numbers when creating krb5 gss tokens.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[root@qemu ~]# for a in /proc/sys/net/dccp/default/* ; do echo $a ; cat $a ; done
/proc/sys/net/dccp/default/ack_ratio
2
/proc/sys/net/dccp/default/rx_ccid
3
/proc/sys/net/dccp/default/send_ackvec
1
/proc/sys/net/dccp/default/send_ndp
1
/proc/sys/net/dccp/default/seq_window
100
/proc/sys/net/dccp/default/tx_ccid
3
[root@qemu ~]#
So if wanting to test ccid3 as the tx CCID one can just do:
[root@qemu ~]# echo 3 > /proc/sys/net/dccp/default/tx_ccid
[root@qemu ~]# echo 2 > /proc/sys/net/dccp/default/rx_ccid
[root@qemu ~]# cat /proc/sys/net/dccp/default/[tr]x_ccid
2
3
[root@qemu ~]#
Of course we also need the setsockopt for each app to tell its preferences, but
for testing or defining something other than CCID2 as the default for apps that
don't explicitely set their preference the sysctl interface is handy.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per the draft. This fixes the build when netfilter dccp components
are built and dccp isn't. Thanks to Reuben Farrelly for reporting
this.
The following changesets will introduce /proc/sys/net/dccp/defaults/
to give more flexibility to DCCP developers and testers while apps
doesn't use setsockopt to specify the desired CCID, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips
the real hook by registering with high priority and returning NF_STOP if
skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not
set. The flag is only set during the simulated hook.
Because skb->nf_bridge is only freed when the packet is destroyed, the
packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but
in the case of tunnel devices on top of the bridge also all further ones.
Forwarded packets from a bridge encapsulated by a tunnel device and sent
as locally outgoing packet will also still have the incorrect bridge
information from the input path attached.
We already have nf_reset calls on all RX/TX paths of tunnel devices,
so simply reset the nf_bridge field there too. As an added bonus,
the bridge information for locally delivered packets is now also freed
when the packet is queued to a socket.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>