kernel-ark

Author	SHA1	Message	Date
Peter Zijlstra	4d78e7b656	sched: new task placement for vruntime add proper new task placement for the vruntime based math too. ( note: introduces a swap() macro, but the swap token is too widely used in the kernel namespace for a generic version to be added without changing non-scheduler code - so this cleanup will be done separately. ) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
Ingo Molnar	6cb5819514	sched: optimize vruntime based scheduling optimize vruntime based scheduling. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
Ingo Molnar	bf5c91ba8c	sched: move sched_feat() definitions move sched_feat() definitions so that it can be used sooner by generic code too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
Ingo Molnar	e9acbff648	sched: introduce se->vruntime introduce se->vruntime as a sum of weighted delta-exec's, and use that as the key into the tree. the idea to use absolute virtual time as the basic metric of scheduling has been first raised by William Lee Irwin, advanced by Tong Li and first prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset. also see: http://lkml.org/lkml/2007/9/2/76 for a simpler variant of this patch. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
Ingo Molnar	08e2388aa1	sched: clean up calc_weighted() clean up calc_weighted() - we always use the normalized shift so it's not needed to pass that in. Also, push the non-nice0 branch into the function. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
Ingo Molnar	1091985b48	sched: speed up update_load_add/_sub() speed up update_load_add/_sub() by not delaying the division - this reduces CPU pipeline dependencies. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
Ingo Molnar	19ccd97a03	sched: uninline __enqueue_entity()/__dequeue_entity() suggested by Roman Zippel: uninline __enqueue_entity() and __dequeue_entity(). this reduces code size: text data bss dec hex filename 25385 2386 16 27787 6c8b sched.o.before 25257 2386 16 27659 6c0b sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
Peter Zijlstra	e59c80c5bb	sched: simplify SCHED_FEAT_* code Peter Zijlstra suggested to simplify SCHED_FEAT_* checks via the sched_feat(x) macro. No code changed: text data bss dec hex filename 38895 3550 24 42469 a5e5 sched.o.before 38895 3550 24 42469 a5e5 sched.o.after Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Ingo Molnar	429d43bcc0	sched: cleanup: simplify cfs_rq_curr() methods cleanup: simplify cfs_rq_curr() methods - now that the cfs_rq->curr pointer is unconditionally present, remove the wrappers. kernel/sched.o: text data bss dec hex filename 11784 224 2012 14020 36c4 sched.o.before 11784 224 2012 14020 36c4 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Ingo Molnar	62160e3f4a	sched: track cfs_rq->curr on !group-scheduling too Noticed by Roman Zippel: use cfs_rq->curr in the !group-scheduling case too. Small micro-optimization and cleanup effect: text data bss dec hex filename 36269 3482 24 39775 9b5f sched.o.before 36177 3486 24 39687 9b07 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Ingo Molnar	53df556e06	sched: remove precise CPU load calculations #2 continued removal of precise CPU load calculations. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Ingo Molnar	a25707f3ae	sched: remove precise CPU load CPU load calculations are statistical anyway, and there's little benefit from having it calculated on every scheduling event. So remove this code, it gets rid of a divide from the scheduler wakeup and context-switch fastpath. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Ingo Molnar	8ebc91d936	sched: remove stat_gran remove the stat_gran code - it was disabled by default and it causes unnecessary overhead. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Ingo Molnar	2bd8e6d422	sched: use constants if !CONFIG_SCHED_DEBUG use constants if !CONFIG_SCHED_DEBUG. this speeds up the code and reduces code-size: text data bss dec hex filename 27464 3014 16 30494 771e sched.o.before 26929 3010 20 29959 7507 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Ingo Molnar	38ad464d41	sched: uniform tunings use the same defaults on both UP and SMP. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Ingo Molnar	eba1ed4b7e	sched: debug: track maximum 'slice' track the maximum amount of time a task has executed while the CPU load was at least 2x. (i.e. at least two nice-0 tasks were runnable) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Ingo Molnar	a4b29ba2f7	sched: small sched_debug cleanup small kernel/sched_debug.c cleanup - break up multi-variable assignment. no code changed: text data bss dec hex filename 38869 3550 24 42443 a5cb sched.o.before 38869 3550 24 42443 a5cb sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Matthias Kaehlcke	2e45874c5a	sched: use list_for_each_entry_safe() in __wake_up_common() Use list_for_each_entry_safe() instead of list_for_each_safe() in __wake_up_common() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Ingo Molnar	bb61c21083	sched: resched task in task_new_fair() to get full child-runs-first semantics make sure the parent is rescheduled. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Ingo Molnar	44142fac34	sched: fix sysctl_sched_child_runs_first flag fix the sched_child_runs_first flag: always call into ->task_new() if we are on the same CPU, as SCHED_OTHER tasks depend on it for correct initial setup. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:01 +02:00
Thomas Gleixner	1595f452f3	clockevents: introduce force broadcast notifier The 64bit SMP bootup is slightly different to the 32bit one. It enables the boot CPU local APIC timer before all CPUs are brought up. Some AMD C1E systems have the C1E feature flag only set in the secondary CPU. Due to the early enable of the boot CPU local APIC timer the APIC timer is registered as a fully functional device. When we detect the wreckage during the bringup of the secondary CPU, we need to force the boot CPU into broadcast mode. Add a new notifier reason and implement the force broadcast in the clock events layer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-14 22:57:45 +02:00
Al Viro	5ba253313d	more low-hanging fruits - kernel, fs, lib signedness Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-14 12:41:52 -07:00
Al Viro	2b8232ce51	minimal build fixes for uml (fallout from x86 merge) a) include/asm-um/arch can't just point to include/asm-$(SUBARCH) now b) arch/{i386,x86_64}/crypto are merged now c) subarch-obj needed changes d) cpufeature_64.h should pull "cpufeature_32.h", not <asm/cpufeature_32.h> since it can be included from asm-um/cpufeature.h e) in case of uml-i386 we need CONFIG_X86_32 for make and gcc, but not for Kconfig f) sysctl.c shouldn't do vdso_enabled for uml-i386 (actually, that one should be registered from corresponding arch//kernel/, with ifdef going away; that's a separate patch, though). With that and with Stephen's patch ("[PATCH net-2.6] uml: hard_header fix") we have uml allmodconfig building both on i386 and amd64. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-13 09:57:15 -07:00
Venki Pallipadi	4a93232dab	clock events: allow replacement of broadcast timer Change the broadcast timer, if a timer with higher rating becomes available. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-12 23:04:23 +02:00
Thomas Gleixner	c8a1d398de	clockevents: fix periodic broadcast for oneshot devices The next_event member of the clock event device is used to keep track of the next periodic event. For one shot only devices it is wrong to clear the variable, as the next event will be based on it. Pointed out by Ralf Baechle Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2007-10-12 23:04:06 +02:00
Thomas Gleixner	de68d9b173	clockevents: Allow build w/o run-tine usage for migration purposes Migration aid to allow preparatory patches which introduce not yet used parts of clock events code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2007-10-12 23:04:05 +02:00
Linus Torvalds	e86908614f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (408 commits) [POWERPC] Add memchr() to the bootwrapper [POWERPC] Implement logging of unhandled signals [POWERPC] Add legacy serial support for OPB with flattened device tree [POWERPC] Use 1TB segments [POWERPC] XilinxFB: Allow fixed framebuffer base address [POWERPC] XilinxFB: Add support for custom screen resolution [POWERPC] XilinxFB: Use pdata to pass around framebuffer parameters [POWERPC] PCI: Add 64-bit physical address support to setup_indirect_pci [POWERPC] 4xx: Kilauea defconfig file [POWERPC] 4xx: Kilauea DTS [POWERPC] 4xx: Add AMCC Kilauea eval board support to platforms/40x [POWERPC] 4xx: Add AMCC 405EX support to cputable.c [POWERPC] Adjust TASK_SIZE on ppc32 systems to 3GB that are capable [POWERPC] Use PAGE_OFFSET to tell if an address is user/kernel in SW TLB handlers [POWERPC] 85xx: Enable FP emulation in MPC8560 ADS defconfig [POWERPC] 85xx: Killed <asm/mpc85xx.h> [POWERPC] 85xx: Add cpm nodes for 8541/8555 CDS [POWERPC] 85xx: Convert mpc8560ads to the new CPM binding. [POWERPC] mpc8272ads: Remove muram from the CPM reg property. [POWERPC] Make clockevents work on PPC601 processors ... Fixed up conflict in Documentation/powerpc/booting-without-of.txt manually.	2007-10-11 21:55:47 -07:00
Olof Johansson	d0c3d534a4	[POWERPC] Implement logging of unhandled signals Implement show_unhandled_signals sysctl + support to print when a process is killed due to unhandled signals just as i386 and x86_64 does. Default to having it off, unlike x86 that defaults on. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-12 14:05:18 +10:00
Linus Torvalds	038a5008b2	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (867 commits) [SKY2]: status polling loop (post merge) [NET]: Fix NAPI completion handling in some drivers. [TCP]: Limit processing lost_retrans loop to work-to-do cases [TCP]: Fix lost_retrans loop vs fastpath problems [TCP]: No need to re-count fackets_out/sacked_out at RTO [TCP]: Extract tcp_match_queue_to_sack from sacktag code [TCP]: Kill almost unused variable pcount from sacktag [TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L [TCP]: Add bytes_acked (ABC) clearing to FRTO too [IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2 [NETFILTER]: x_tables: add missing ip6t_modulename aliases [NETFILTER]: nf_conntrack_tcp: fix connection reopening [QETH]: fix qeth_main.c [NETLINK]: fib_frontend build fixes [IPv6]: Export userland ND options through netlink (RDNSS support) [9P]: build fix with !CONFIG_SYSCTL [NET]: Fix dev_put() and dev_hold() comments [NET]: make netlink user -> kernel interface synchronious [NET]: unify netlink kernel socket recognition [NET]: cleanup 3rd argument in netlink_sendskb ... Fix up conflicts manually in Documentation/feature-removal-schedule.txt and my new least favourite crap, the "mod_devicetable" support in the files include/linux/mod_devicetable.h and scripts/mod/file2alias.c. (The latter files seem to be explicitly _designed_ to get conflicts when different subsystems work with them - that have an absolutely horrid lack of subsystem separation!) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-11 19:40:14 -07:00
Denis V. Lunev	cd40b7d398	[NET]: make netlink user -> kernel interface synchronious This patch make processing netlink user -> kernel messages synchronious. This change was inspired by the talk with Alexey Kuznetsov about current netlink messages processing. He says that he was badly wrong when introduced asynchronious user -> kernel communication. The call netlink_unicast is the only path to send message to the kernel netlink socket. But, unfortunately, it is also used to send data to the user. Before this change the user message has been attached to the socket queue and sk->sk_data_ready was called. The process has been blocked until all pending messages were processed. The bad thing is that this processing may occur in the arbitrary process context. This patch changes nlk->data_ready callback to get 1 skb and force packet processing right in the netlink_unicast. Kernel -> user path in netlink_unicast remains untouched. EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock drop, but the process remains in the cycle until the message will be fully processed. So, there is no need to use this kludges now. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:15:29 -07:00
Eric W. Biederman	9dd776b6d7	[NET]: Add network namespace clone & unshare support. This patch allows you to create a new network namespace using sys_clone, or sys_unshare. As the network namespace is still experimental and under development clone and unshare support is only made available when CONFIG_NET_NS is selected at compile time. As this patch introduces network namespace support into code paths that exist when the CONFIG_NET is not selected there are a few additions made to net_namespace.h to allow a few more functions to be used when the networking stack is not compiled in. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:46 -07:00
Adrian Bunk	464771fe47	[KERNEL]: Unexport raise_softirq_irqoff raise_softirq_irqoff no longer has any modular user. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:18 -07:00
Eric W. Biederman	b4b510290b	[NET]: Support multiple network namespaces with netlink Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Robert Olsson	c45248c701	[SOFTIRQ]: Remove do_softirq() symbol export. As noted by Christoph Hellwig, pktgen was the only user so it can now be removed. [ Add missing cases caught by Adrian Bunk. -DaveM ] Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:36 -07:00
Arnaldo Carvalho de Melo	a272378d11	[KTIME]: Introduce ktime_sub_ns and ktime_sub_us First user will be the DCCP transport networking protocol. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:12 -07:00
Jens Axboe	f5ff8422bb	Fix warnings with !CONFIG_BLOCK Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup the (few) files that fail to build because they were relying on blkdev.h pulling in extra includes for them. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
Al Viro	291041e935	fix bogus reporting of signals by audit Async signals should not be reported as sent by current in audit log. As it is, we call audit_signal_info() too early in check_kill_permission(). Note that check_kill_permission() has that test already - it needs to know if it should apply current-based permission checks. So the solution is to move the call of audit_signal_info() between those. Bogosity in question is easily reproduced - add a rule watching for e.g. kill(2) from specific process (so that audit_signal_info() would not short-circuit to nothing), say load_policy, watch the bogus OBJ_PID entry in audit logs claiming that write(2) on selinuxfs file issued by load_policy(8) had somehow managed to send a signal to syslogd... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Steve Grubb <sgrubb@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-07 16:28:43 -07:00
Anton Blanchard	74922be148	Fix timer_stats printout of events/sec When using /proc/timer_stats on ppc64 I noticed the events/sec field wasnt accurate. Sometimes the integer part was incorrect due to rounding (we werent taking the fractional seconds into consideration). The fraction part is also wrong, we need to pad the printf statement and take the bottom three digits of 1000 times the value. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-07 16:28:43 -07:00
Ingo Molnar	30084fbd1c	sched: fix profile=sleep fix sleep profiling - we lost this chunk in the CFS merge. Found-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-02 14:13:08 +02:00
Martin Schwidefsky	9f96cb1e8b	robust futex thread exit race Calling handle_futex_death in exit_robust_list for the different robust mutexes of a thread basically frees the mutex. Another thread might grab the lock immediately which updates the next pointer of the mutex. fetch_robust_entry over the next pointer might therefore branch into the robust mutex list of a different thread. This can cause two problems: 1) some mutexes held by the dead thread are not getting freed and 2) some mutexs held by a different thread are freed. The next point need to be read before calling handle_futex_death. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-01 07:52:23 -07:00
Mark Lord	4047727e5a	Fix SMP poweroff hangs We need to disable all CPUs other than the boot CPU (usually 0) before attempting to power-off modern SMP machines. This fixes the hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's new toybox. Signed-off-by: Mark Lord <mlord@pobox.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-01 07:52:23 -07:00
Al Viro	459685c75b	hibernation doesn't even build on frv - tons of helpers are missing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-26 09:22:04 -07:00
Thomas Gleixner	b7e113dc9d	clockevents: remove the suspend/resume workaround^Wthinko In a desparate attempt to fix the suspend/resume problem on Andrews VAIO I added a workaround which enforced the broadcast of the oneshot timer on resume. This was actually resolving the problem on the VAIO but was just a stupid workaround, which was not tackling the root cause: the assignement of lower idle C-States in the ACPI processor_idle code. The cpuidle patches, which utilize the dynamic tick feature and go faster into deeper C-states exposed the problem again. The correct solution is the previous patch, which prevents lower C-states across the suspend/resume. Remove the enforcement code, including the conditional broadcast timer arming, which helped to pamper over the real problem for quite a time. The oneshot broadcast flag for the cpu, which runs the resume code can never be set at the time when this code is executed. It only gets set, when the CPU is entering a lower idle C-State. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-22 17:15:34 -07:00
Davide Libenzi	b8fceee17a	signalfd simplification This simplifies signalfd code, by avoiding it to remain attached to the sighand during its lifetime. In this way, the signalfd remain attached to the sighand only during poll(2) (and select and epoll) and read(2). This also allows to remove all the custom "tsk == current" checks in kernel/signal.c, since dequeue_signal() will only be called by "current". I think this is also what Ben was suggesting time ago. The external effect of this, is that a thread can extract only its own private signals and the group ones. I think this is an acceptable behaviour, in that those are the signals the thread would be able to fetch w/out signalfd. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-20 13:19:59 -07:00
Hiroshi Shimamoto	9c95e7319b	sched: fix invalid sched_class use When using rt_mutex, a NULL pointer dereference is occurred at enqueue_task_rt. Here is a scenario; 1) there are two threads, the thread A is fair_sched_class and thread B is rt_sched_class. 2) Thread A is boosted up to rt_sched_class, because the thread A has a rt_mutex lock and the thread B is waiting the lock. 3) At this time, when thread A create a new thread C, the thread C has a rt_sched_class. 4) When doing wake_up_new_task() for the thread C, the priority of the thread C is out of the RT priority range, because the normal priority of thread A is not the RT priority. It makes data corruption by overflowing the rt_prio_array. The new thread C should be fair_sched_class. The new thread should be valid scheduler class before queuing. This patch fixes to set the suitable scheduler class. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-09-19 23:34:46 +02:00
Ingo Molnar	1799e35d5b	sched: add /proc/sys/kernel/sched_compat_yield add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield() more agressive, by moving the yielding task to the last position in the rbtree. with sched_compat_yield=0: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2539 mingo 20 0 1576 252 204 R 50 0.0 0:02.03 loop_yield 2541 mingo 20 0 1576 244 196 R 50 0.0 0:02.05 loop with sched_compat_yield=1: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2584 mingo 20 0 1576 248 196 R 99 0.0 0:52.45 loop 2582 mingo 20 0 1576 256 204 R 0 0.0 0:00.00 loop_yield Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-09-19 23:34:46 +02:00
Pavel Emelyanov	28f300d236	Fix user namespace exiting OOPs It turned out, that the user namespace is released during the do_exit() in exit_task_namespaces(), but the struct user_struct is released only during the put_task_struct(), i.e. MUCH later. On debug kernels with poisoned slabs this will cause the oops in uid_hash_remove() because the head of the chain, which resides inside the struct user_namespace, will be already freed and poisoned. Since the uid hash itself is required only when someone can search it, i.e. when the namespace is alive, we can safely unhash all the user_struct-s from it during the namespace exiting. The subsequent free_uid() will complete the user_struct destruction. For example simple program #include <sched.h> char stack[2 * 1024 * 1024]; int f(void foo) { return 0; } int main(void) { clone(f, stack + 1 1024 * 1024, 0x10000000, 0); return 0; } run on kernel with CONFIG_USER_NS turned on will oops the kernel immediately. This was spotted during OpenVZ kernel testing. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Pavel Emelyanov	735de2230f	Convert uid hash to hlist Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses list_heads, thus occupying twice as much place as it could. Convert it to hlist_heads. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Matthias Kaehlcke	d8a4821dca	kernel/user.c: Use list_for_each_entry instead of list_for_each kernel/user.c: Convert list_for_each to list_for_each_entry in uid_hash_find() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Alexey Dobriyan	efc63c4fb0	Fix UTS corruption during clone(CLONE_NEWUTS) struct utsname is copied from master one without any exclusion. Here is sample output from one proggie doing sethostname("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"); sethostname("bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"); and another clone(,, CLONE_NEWUTS, ...) uname() hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaabbbbb' hostname = 'bbbaaaaaaaaaaaaaaaaaaaaaaaaaaa' hostname = 'aaaaaaaabbbbbbbbbbbbbbbbbbbbbb' hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaaabbbb' hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaabb' hostname = 'aaabbbbbbbbbbbbbbbbbbbbbbbbbbb' hostname = 'bbbbbbbbbbbbbbbbaaaaaaaaaaaaaa' Hostname is sometimes corrupted. Yes, even _the_ simplest namespace activity had bug in it. :-( Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:17 -07:00

1 2 3 4 5 ...

2582 Commits