kernel-ark

Author	SHA1	Message	Date
Adrian Bunk	464771fe47	[KERNEL]: Unexport raise_softirq_irqoff raise_softirq_irqoff no longer has any modular user. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:18 -07:00
Eric W. Biederman	b4b510290b	[NET]: Support multiple network namespaces with netlink Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Robert Olsson	c45248c701	[SOFTIRQ]: Remove do_softirq() symbol export. As noted by Christoph Hellwig, pktgen was the only user so it can now be removed. [ Add missing cases caught by Adrian Bunk. -DaveM ] Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:36 -07:00
Arnaldo Carvalho de Melo	a272378d11	[KTIME]: Introduce ktime_sub_ns and ktime_sub_us First user will be the DCCP transport networking protocol. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:12 -07:00
Jens Axboe	f5ff8422bb	Fix warnings with !CONFIG_BLOCK Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup the (few) files that fail to build because they were relying on blkdev.h pulling in extra includes for them. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
Trond Myklebust	50e437d522	SUNRPC: Convert rpc_pipefs to use the generic filesystem notification hooks This will allow rpc.gssd to use inotify instead of dnotify in order to locate new rpc upcall pipes. This also requires the exporting of __audit_inode_child(), which is used by fsnotify_create() and fsnotify_mkdir(). Ccing David Woodhouse. Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-09 17:15:26 -04:00
Al Viro	291041e935	fix bogus reporting of signals by audit Async signals should not be reported as sent by current in audit log. As it is, we call audit_signal_info() too early in check_kill_permission(). Note that check_kill_permission() has that test already - it needs to know if it should apply current-based permission checks. So the solution is to move the call of audit_signal_info() between those. Bogosity in question is easily reproduced - add a rule watching for e.g. kill(2) from specific process (so that audit_signal_info() would not short-circuit to nothing), say load_policy, watch the bogus OBJ_PID entry in audit logs claiming that write(2) on selinuxfs file issued by load_policy(8) had somehow managed to send a signal to syslogd... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Steve Grubb <sgrubb@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-07 16:28:43 -07:00
Anton Blanchard	74922be148	Fix timer_stats printout of events/sec When using /proc/timer_stats on ppc64 I noticed the events/sec field wasnt accurate. Sometimes the integer part was incorrect due to rounding (we werent taking the fractional seconds into consideration). The fraction part is also wrong, we need to pad the printf statement and take the bottom three digits of 1000 times the value. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-07 16:28:43 -07:00
Ingo Molnar	30084fbd1c	sched: fix profile=sleep fix sleep profiling - we lost this chunk in the CFS merge. Found-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-02 14:13:08 +02:00
Martin Schwidefsky	9f96cb1e8b	robust futex thread exit race Calling handle_futex_death in exit_robust_list for the different robust mutexes of a thread basically frees the mutex. Another thread might grab the lock immediately which updates the next pointer of the mutex. fetch_robust_entry over the next pointer might therefore branch into the robust mutex list of a different thread. This can cause two problems: 1) some mutexes held by the dead thread are not getting freed and 2) some mutexs held by a different thread are freed. The next point need to be read before calling handle_futex_death. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-01 07:52:23 -07:00
Mark Lord	4047727e5a	Fix SMP poweroff hangs We need to disable all CPUs other than the boot CPU (usually 0) before attempting to power-off modern SMP machines. This fixes the hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's new toybox. Signed-off-by: Mark Lord <mlord@pobox.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-01 07:52:23 -07:00
Al Viro	459685c75b	hibernation doesn't even build on frv - tons of helpers are missing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-26 09:22:04 -07:00
Thomas Gleixner	b7e113dc9d	clockevents: remove the suspend/resume workaround^Wthinko In a desparate attempt to fix the suspend/resume problem on Andrews VAIO I added a workaround which enforced the broadcast of the oneshot timer on resume. This was actually resolving the problem on the VAIO but was just a stupid workaround, which was not tackling the root cause: the assignement of lower idle C-States in the ACPI processor_idle code. The cpuidle patches, which utilize the dynamic tick feature and go faster into deeper C-states exposed the problem again. The correct solution is the previous patch, which prevents lower C-states across the suspend/resume. Remove the enforcement code, including the conditional broadcast timer arming, which helped to pamper over the real problem for quite a time. The oneshot broadcast flag for the cpu, which runs the resume code can never be set at the time when this code is executed. It only gets set, when the CPU is entering a lower idle C-State. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-22 17:15:34 -07:00
Davide Libenzi	b8fceee17a	signalfd simplification This simplifies signalfd code, by avoiding it to remain attached to the sighand during its lifetime. In this way, the signalfd remain attached to the sighand only during poll(2) (and select and epoll) and read(2). This also allows to remove all the custom "tsk == current" checks in kernel/signal.c, since dequeue_signal() will only be called by "current". I think this is also what Ben was suggesting time ago. The external effect of this, is that a thread can extract only its own private signals and the group ones. I think this is an acceptable behaviour, in that those are the signals the thread would be able to fetch w/out signalfd. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-20 13:19:59 -07:00
Hiroshi Shimamoto	9c95e7319b	sched: fix invalid sched_class use When using rt_mutex, a NULL pointer dereference is occurred at enqueue_task_rt. Here is a scenario; 1) there are two threads, the thread A is fair_sched_class and thread B is rt_sched_class. 2) Thread A is boosted up to rt_sched_class, because the thread A has a rt_mutex lock and the thread B is waiting the lock. 3) At this time, when thread A create a new thread C, the thread C has a rt_sched_class. 4) When doing wake_up_new_task() for the thread C, the priority of the thread C is out of the RT priority range, because the normal priority of thread A is not the RT priority. It makes data corruption by overflowing the rt_prio_array. The new thread C should be fair_sched_class. The new thread should be valid scheduler class before queuing. This patch fixes to set the suitable scheduler class. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-09-19 23:34:46 +02:00
Ingo Molnar	1799e35d5b	sched: add /proc/sys/kernel/sched_compat_yield add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield() more agressive, by moving the yielding task to the last position in the rbtree. with sched_compat_yield=0: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2539 mingo 20 0 1576 252 204 R 50 0.0 0:02.03 loop_yield 2541 mingo 20 0 1576 244 196 R 50 0.0 0:02.05 loop with sched_compat_yield=1: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2584 mingo 20 0 1576 248 196 R 99 0.0 0:52.45 loop 2582 mingo 20 0 1576 256 204 R 0 0.0 0:00.00 loop_yield Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-09-19 23:34:46 +02:00
Pavel Emelyanov	28f300d236	Fix user namespace exiting OOPs It turned out, that the user namespace is released during the do_exit() in exit_task_namespaces(), but the struct user_struct is released only during the put_task_struct(), i.e. MUCH later. On debug kernels with poisoned slabs this will cause the oops in uid_hash_remove() because the head of the chain, which resides inside the struct user_namespace, will be already freed and poisoned. Since the uid hash itself is required only when someone can search it, i.e. when the namespace is alive, we can safely unhash all the user_struct-s from it during the namespace exiting. The subsequent free_uid() will complete the user_struct destruction. For example simple program #include <sched.h> char stack[2 * 1024 * 1024]; int f(void foo) { return 0; } int main(void) { clone(f, stack + 1 1024 * 1024, 0x10000000, 0); return 0; } run on kernel with CONFIG_USER_NS turned on will oops the kernel immediately. This was spotted during OpenVZ kernel testing. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Pavel Emelyanov	735de2230f	Convert uid hash to hlist Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses list_heads, thus occupying twice as much place as it could. Convert it to hlist_heads. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Matthias Kaehlcke	d8a4821dca	kernel/user.c: Use list_for_each_entry instead of list_for_each kernel/user.c: Convert list_for_each to list_for_each_entry in uid_hash_find() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Alexey Dobriyan	efc63c4fb0	Fix UTS corruption during clone(CLONE_NEWUTS) struct utsname is copied from master one without any exclusion. Here is sample output from one proggie doing sethostname("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"); sethostname("bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"); and another clone(,, CLONE_NEWUTS, ...) uname() hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaabbbbb' hostname = 'bbbaaaaaaaaaaaaaaaaaaaaaaaaaaa' hostname = 'aaaaaaaabbbbbbbbbbbbbbbbbbbbbb' hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaaabbbb' hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaabb' hostname = 'aaabbbbbbbbbbbbbbbbbbbbbbbbbbb' hostname = 'bbbbbbbbbbbbbbbbaaaaaaaaaaaaaa' Hostname is sometimes corrupted. Yes, even _the_ simplest namespace activity had bug in it. :-( Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:17 -07:00
Thomas Gleixner	5e41d0d60a	clockevents: prevent stale tick update on offline cpu Taking a cpu offline removes the cpu from the online mask before the CPU_DEAD notification is done. The clock events layer does the cleanup of the dead CPU from the CPU_DEAD notifier chain. tick_do_timer_cpu is used to avoid xtime lock contention by assigning the task of jiffies xtime updates to one CPU. If a CPU is taken offline, then this assignment becomes stale. This went unnoticed because most of the time the offline CPU went dead before the online CPU reached __cpu_die(), where the CPU_DEAD state is checked. In the case that the offline CPU did not reach the DEAD state before we reach __cpu_die(), the code in there goes to sleep for 100ms. Due to the stale time update assignment, the system is stuck forever. Take the assignment away when a cpu is not longer in the cpu_online_mask. We do this in the last call to tick_nohz_stop_sched_tick() when the offline CPU is on the way to the final play_dead() idle entry. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-09-16 15:36:43 +02:00
Thomas Gleixner	31d9b3938c	clockevents: do not shutdown the oneshot broadcast device When a cpu goes offline it is removed from the broadcast masks. If the mask becomes empty the code shuts down the broadcast device. This is wrong, because the broadcast device needs to be ready for the online cpu going idle (into a c-state, which stops the local apic timer). Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-09-16 15:36:43 +02:00
Thomas Gleixner	07eec6af44	clockevents: Enforce oneshot broadcast when broadcast mask is set on resume The jinxed VAIO refuses to resume without hitting keys on the keyboard when this is not enforced. It is unclear why the cpu ends up in a lower C State without notifying the clock events layer, but enforcing the oneshot broadcast here is safe. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-09-16 15:36:43 +02:00
Thomas Gleixner	6a669ee8a7	timekeeping: Prevent time going backwards on resume Timekeeping resume adjusts xtime by adding the slept time in seconds and resets the reference value of the clock source (clock->cycle_last). clock->cycle last is used to calculate the delta between the last xtime update and the readout of the clock source in __get_nsec_offset(). xtime plus the offset is the current time. The resume code ignores the delta which had already elapsed between the last xtime update and the actual time of suspend. If the suspend time is short, then we can see time going backwards on resume. Suspend: offs_s = clock->read() - clock->cycle_last; now = xtime + offs_s; timekeeping_suspend_time = read_rtc(); Resume: sleep_time = read_rtc() - timekeeping_suspend_time; xtime.tv_sec += sleep_time; clock->cycle_last = clock->read(); offs_r = clock->read() - clock->cycle_last; now = xtime + offs_r; if sleep_time_seconds == 0 and offs_r < offs_s, then time goes backwards. Fix this by storing the offset from the last xtime update and add it to xtime during resume, when we reset clock->cycle_last: sleep_time = read_rtc() - timekeeping_suspend_time; xtime.tv_sec += sleep_time; xtime += offs_s; /* Fixup xtime offset at suspend time */ clock->cycle_last = clock->read(); offs_r = clock->read() - clock->cycle_last; now = xtime + offs_r; Thanks to Marcelo for tracking this down on the OLPC and providing the necessary details to analyze the root cause. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com> Cc: Tosatti <marcelo@kvack.org>	2007-09-16 15:36:43 +02:00
Thomas Gleixner	3be9095063	timekeeping: access rtc outside of xtime lock Lockdep complains about the access of rtc in timekeeping_suspend inside the interrupt disabled region of the write locked xtime lock. Move the access outside. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com>	2007-09-16 15:36:43 +02:00
Tony Breeds	298a5df45d	Fix "no_sync_cmos_clock" logic inversion in kernel/time/ntp.c Seems to me that this timer will only get started on platforms that say they don't want it? Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Gabriel Paubert <paubert@iram.es> Cc: Zachary Amsden <zach@vmware.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-11 17:21:27 -07:00
Michael Ellerman	3210f0ecdb	Restore call_usermodehelper_pipe() behaviour The semantics of call_usermodehelper_pipe() used to be that it would fork the helper, and wait for the kernel thread to be started. This was implemented by setting sub_info.wait to 0 (implicitly), and doing a wait_for_completion(). As part of the cleanup done in `0ab4dc9227`, call_usermodehelper_pipe() was changed to pass 1 as the value for wait to call_usermodehelper_exec(). This is equivalent to setting sub_info.wait to 1, which is a change from the previous behaviour. Using 1 instead of 0 causes __call_usermodehelper() to start the kernel thread running wait_for_helper(), rather than directly calling ____call_usermodehelper(). The end result is that the calling kernel code blocks until the user mode helper finishes. As the helper is expecting input on stdin, and now no one is writing anything, everything locks up (observed in do_coredump). The fix is to change the 1 to UMH_WAIT_EXEC (aka 0), indicating that we want to wait for the kernel thread to be started, but not for the helper to finish. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-11 17:21:20 -07:00
Arnd Bergmann	179c85ea53	futex_compat: fix list traversal bugs The futex list traversal on the compat side appears to have a bug. It's loop termination condition compares: while (compat_ptr(uentry) != &head->list) But that can't be right because "uentry" has the special "pi" indicator bit still potentially set at bit 0. This is cleared by fetch_robust_entry() into the "entry" return value. What this seems to mean is that the list won't terminate when list iteration gets back to the the head. And we'll also process the list head like a normal entry, which could cause all kinds of problems. So we should check for equality with "entry". That pointer is of the non-compat type so we have to do a little casting to keep the compiler and sparse happy. The same problem can in theory occur with the 'pending' variable, although that has not been reported from users so far. Based on the original patch from David Miller. Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Miller <davem@davemloft.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-11 17:21:20 -07:00
Roland McGrath	7d94143291	Fix spurious syscall tracing after PTRACE_DETACH + PTRACE_ATTACH When PTRACE_SYSCALL was used and then PTRACE_DETACH is used, the TIF_SYSCALL_TRACE flag is left set on the formerly-traced task. This means that when a new tracer comes along and does PTRACE_ATTACH, it's possible he gets a syscall tracing stop even though he's never used PTRACE_SYSCALL. This happens if the task was in the middle of a system call when the second PTRACE_ATTACH was done. The symptom is an unexpected SIGTRAP when the tracer thinks that only SIGSTOP should have been provoked by his ptrace calls so far. A few machines already fixed this in ptrace_disable (i386, ia64, m68k). But all other machines do not, and still have this bug. On x86_64, this constitutes a regression in IA32 compatibility support. Since all machines now use TIF_SYSCALL_TRACE for this, I put the clearing of TIF_SYSCALL_TRACE in the generic ptrace_detach code rather than adding it to every other machine's ptrace_disable. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-10 18:57:47 -07:00
Peter Zijlstra	1169783085	sched: fix ideal_runtime calculations for reniced tasks fix ideal_runtime: - do not scale it using niced_granularity() it is against sum_exec_delta, so its wall-time, not fair-time. - move the whole check into __check_preempt_curr_fair() so that wakeup preemption can also benefit from the new logic. this also results in code size reduction: text data bss dec hex filename 13391 228 1204 14823 39e7 sched.o.before 13369 228 1204 14801 39d1 sched.o.after Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Peter Zijlstra	4a55b45036	sched: improve prev_sum_exec_runtime setting Second preparatory patch for fix-ideal runtime: Mark prev_sum_exec_runtime at the beginning of our run, the same spot that adds our wait period to wait_runtime. This seems a more natural location to do this, and it also reduces the code a bit: text data bss dec hex filename 13397 228 1204 14829 39ed sched.o.before 13391 228 1204 14823 39e7 sched.o.after Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Peter Zijlstra	7c92e54f6f	sched: simplify __check_preempt_curr_fair() Preparatory patch for fix-ideal-runtime: simplify __check_preempt_curr_fair(): get rid of the integer return. text data bss dec hex filename 13404 228 1204 14836 39f4 sched.o.before 13393 228 1204 14825 39e9 sched.o.after functionality is unchanged. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Ingo Molnar	cf2ab4696e	sched: fix xtensa build warning rename RSR to SRR - 'RSR' is already defined on xtensa. found by Adrian Bunk. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Ingo Molnar	2491b2b89d	sched: debug: fix sum_exec_runtime clearing when cleaning sched-stats also clear prev_sum_exec_runtime. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Ingo Molnar	a206c07213	sched: debug: fix cfs_rq->wait_runtime accounting the cfs_rq->wait_runtime debug/statistics counter was not maintained properly - fix this. this also removes some code: text data bss dec hex filename 13420 228 1204 14852 3a04 sched.o.before 13404 228 1204 14836 39f4 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-09-05 14:32:49 +02:00
Ingo Molnar	a0dc72601d	sched: fix niced_granularity() shift fix niced_granularity(). This resulted in under-scheduling for CPU-bound negative nice level tasks (and this in turn caused higher than necessary latencies in nice-0 tasks). Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Suresh Siddha	7fd0d2dde9	sched: fix MC/HT scheduler optimization, without breaking the FUZZ logic. First fix the check if (imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task) with this if (imbalance < busiest_load_per_task) As the current check is always false for nice 0 tasks (as SCHED_LOAD_SCALE_FUZZ is same as busiest_load_per_task for nice 0 tasks). With the above change, imbalance was getting reset to 0 in the corner case condition, making the FUZZ logic fail. Fix it by not corrupting the imbalance and change the imbalance, only when it finds that the HT/MC optimization is needed. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:48 +02:00
Linus Torvalds	5e7a39275b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: clean up task_new_fair() sched: small schedstat fix sched: fix wait_start_fair condition in update_stats_wait_end() sched: call update_curr() in task_tick_fair() sched: make the scheduler converge to the ideal latency sched: fix sleeper bonus limit	2007-08-31 10:52:00 -07:00
Oleg Nesterov	60187d2708	sigqueue_free: fix the race with collect_signal() Spotted by taoyue <yue.tao@windriver.com> and Jeremy Katz <jeremy.katz@windriver.com>. collect_signal: sigqueue_free: list_del_init(&first->list); if (!list_empty(&q->list)) { // not taken } q->flags &= ~SIGQUEUE_PREALLOC; __sigqueue_free(first); __sigqueue_free(q); Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the obviously bad implications. In particular, this double free breaks the array_cache->avail logic, so the same sigqueue could be "allocated" twice, and the bug can manifest itself via the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue. Hopefully this can explain these mysterious bug-reports, see http://marc.info/?t=118766926500003 http://marc.info/?t=118466273000005 Alexey Dobriyan reports this patch makes the difference for the testcase, but nobody has an access to the application which opened the problems originally. Also, this patch removes tasklist lock/unlock, ->siglock is enough. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: taoyue <yue.tao@windriver.com> Cc: Jeremy Katz <jeremy.katz@windriver.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Roland McGrath <roland@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:23 -07:00
Alexey Dobriyan	99db67bc04	userns: don't leak root user Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:23 -07:00
Jarek Poplawski	59845b1ffd	request_irq: fix DEBUG_SHIRQ handling Mariusz Kozlowski reported lockdep's warning: > ================================= > [ INFO: inconsistent lock state ] > 2.6.23-rc2-mm1 #7 > --------------------------------- > inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes: > (&tp->lock){+...}, at: [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too] > {in-hardirq-W} state was registered at: > [<c0138eeb>] __lock_acquire+0x949/0x11ac > [<c01397e7>] lock_acquire+0x99/0xb2 > [<c0452ff3>] _spin_lock+0x35/0x42 > [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too] > [<c0147a5d>] handle_IRQ_event+0x28/0x59 > [<c01493ca>] handle_level_irq+0xad/0x10b > [<c0105a13>] do_IRQ+0x93/0xd0 > [<c010441e>] common_interrupt+0x2e/0x34 ... > other info that might help us debug this: > 1 lock held by ifconfig/5492: > #0: (rtnl_mutex){--..}, at: [<c0451778>] mutex_lock+0x1c/0x1f > > stack backtrace: ... > [<c0452ff3>] _spin_lock+0x35/0x42 > [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too] > [<c01480fd>] free_irq+0x11b/0x146 > [<de871d59>] rtl8139_close+0x8a/0x14a [8139too] > [<c03bde63>] dev_close+0x57/0x74 ... This shows that a driver's irq handler was running both in hard interrupt and process contexts with irqs enabled. The latter was done during free_irq() call and was possible only with CONFIG_DEBUG_SHIRQ enabled. This was fixed by another patch. But similar problem is possible with request_irq(): any locks taken from irq handler could be vulnerable - especially with soft interrupts. This patch fixes it by disabling local interrupts during handler's run. (It seems, disabling softirqs should be enough, but it needs more checking on possible races or other special cases). Reported-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:23 -07:00
Rafael J. Wysocki	f3de4be9d5	PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION Dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION introduced by commit `296699de6b` "Introduce CONFIG_SUSPEND for suspend-to-Ram and standby" are incorrect, as they don't cover the facts that (1) not all architectures support suspend and (2) SMP hibernation is only possible on X86 and PPC64 (if CONFIG_PPC64_SWSUSP is set). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:22 -07:00
Oleg Nesterov	b07e35f94a	setpgid(child) fails if the child was forked by sub-thread Spotted by Marcin Kowalczyk <qrczak@knm.org.pl>. sys_setpgid(child) fails if the child was forked by sub-thread. Fix the "is it our child" check. The previous commit `ee0acf90d3` was not complete. (this patch asks for the new same_thread_group() helper, but mainline doesn't have it yet). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: <stable@kernel.org> Tested-by: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:22 -07:00
Jonathan Lim	f2ab6d8889	Assign task_struct.exit_code before taskstats_exit() taskstats.ac_exitcode is assigned to task_struct.exit_code in bacct_add_tsk() through the following kernel function calls: do_exit() taskstats_exit() fill_pid() bacct_add_tsk() The problem is that in do_exit(), task_struct.exit_code is set to 'code' only after taskstats_exit() has been called. So we need to move the assignment before taskstats_exit(). Signed-off-by: Jonathan Lim <jlim@sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:22 -07:00
Ingo Molnar	9f508f8258	sched: clean up task_new_fair() cleanup: we have the 'se' and 'curr' entity-pointers already, no need to use p->se and current->se. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Ingo Molnar	213c8af67f	sched: small schedstat fix small schedstat fix: the cfs_rq->wait_runtime 'sum of all runtimes' statistics counters missed newly forked tasks and thus had a constant negative skew. Fix this. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Ingo Molnar	b77d69db9f	sched: fix wait_start_fair condition in update_stats_wait_end() Peter Zijlstra noticed the following bug in SCHED_FEAT_SKIP_INITIAL (which is disabled by default at the moment): it relies on se.wait_start_fair being 0 while update_stats_wait_end() did not recognize a 0 value, so instead of 'skipping' the initial interval we gave the new child a maximum boost of +runtime-limit ... (No impact on the default kernel, but nice to fix for completeness.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Ting Yang	7109c4429a	sched: call update_curr() in task_tick_fair() update the fair-clock before using it for the key value. [ mingo@elte.hu: small cleanups. ] Signed-off-by: Ting Yang <tingy@cs.umass.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-08-28 12:53:24 +02:00
Ingo Molnar	f6cf891c4d	sched: make the scheduler converge to the ideal latency de-HZ-ification of the granularity defaults unearthed a pre-existing property of CFS: while it correctly converges to the granularity goal, it does not prevent run-time fluctuations in the range of [-gran ... 0 ... +gran]. With the increase of the granularity due to the removal of HZ dependencies, this becomes visible in chew-max output (with 5 tasks running): out: 28 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 37 . 40 out: 27 . 27. 32 \| flu: 0 . 0 \| ran: 17 . 13 \| per: 44 . 40 out: 27 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 36 . 40 out: 29 . 27. 32 \| flu: 2 . 0 \| ran: 17 . 13 \| per: 46 . 40 out: 28 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 37 . 40 out: 29 . 27. 32 \| flu: 0 . 0 \| ran: 18 . 13 \| per: 47 . 40 out: 28 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 37 . 40 average slice is the ideal 13 msecs and the period is picture-perfect 40 msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no mechanism in CFS to keep that from happening: it's a perfectly valid solution that CFS finds. to fix this we add a granularity/preemption rule that knows about the "target latency", which makes tasks that run longer than the ideal latency run a bit less. The simplest approach is to simply decrease the preemption granularity when a task overruns its ideal latency. For this we have to track how much the task executed since its last preemption. ( this adds a new field to task_struct, but we can eliminate that overhead in 2.6.24 by putting all the scheduler timestamps into an anonymous union. ) with this change in place, chew-max output is fluctuation-less all around: out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 1 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 1 \| ran: 13 . 13 \| per: 41 . 40 this patch has no impact on any fastpath or on any globally observable scheduling property. (unless you have sharp enough eyes to see millisecond-level ruckles in glxgears smoothness :-) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Mike Galbraith	5f01d519e6	sched: fix sleeper bonus limit There is an Amarok song switch time increase (regression) under hefty load. What is happening is that sleeper_bonus is never consumed, and only rarely goes below runtime_limit, so for the most part, Amarok isn't getting any bonus at all. We're keeping sleeper_bonus right at runtime_limit (sched_latency == sched_runtime_limit == 40ms) forever, ie we don't consume if we're lower that that, and don't add if we're above it. One Amarok thread waking (or anybody else) will push us past the threshold, so the next thread waking gets nada, but will reap pain from the previous thread waking until we drop back to runtime_limit. It looks to me like under load, some random task gets a bonus, and everybody else pays, whether deserving or not. This diff fixed the regression for me at any load rate. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-08-28 12:53:24 +02:00
Hugh Dickins	d243769d3f	fix bogus hotplug cpu warning Fix bogus DEBUG_PREEMPT warning on x86_64, when cpu brought online after bootup: current_is_keventd is right to note its use of smp_processor_id is preempt-safe, but should use raw_smp_processor_id to avoid the warning. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-27 10:27:48 -07:00
Ingo Molnar	50c46637aa	sched: s/sched_latency/sched_min_granularity runtime limit and wakeup granularity used to be a function of granularity and that was incorrect changed to sched_latency. Fix this to make wakeup granularity a function of min-granularity, and the runtime limit equal to latency. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-25 22:17:19 +02:00
Ingo Molnar	172ac3dbb7	sched: cleanup, sched_granularity -> sched_min_granularity due to adaptive granularity scheduling the role of sched_granularity has changed to "minimum granularity", so rename the variable (and the tunable) accordingly. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-08-25 18:41:53 +02:00
Peter Zijlstra	218050855e	sched: adaptive scheduler granularity Instead of specifying the preemption granularity, specify the wanted latency. By fixing the granlarity to a constany the wakeup latency it a function of the number of running tasks on the rq. Invert this relation. sysctl_sched_granularity becomes a minimum for the dynamic granularity computed from the new sysctl_sched_latency. Then use this latency to do more intelligent granularity decisions: if there are fewer tasks running then we can schedule coarser. This helps performance while still always keeping the latency target. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-25 18:41:53 +02:00
Peter Zijlstra	1fc84aaae3	sched: fix CONFIG_SCHED_DEBUG dependency of lockdep sysctls Make the lockdep sysctls not depend on CONFIG_SCHED_DEBUG. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-25 18:41:52 +02:00
Ingo Molnar	095e56c703	sched: fix startup penalty calculation fix task startup penalty miscalculation: sysctl_sched_granularity is unsigned int and wait_runtime is long so we first have to convert it to long before turning it negative ... Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Peter Zijlstra	ea0aa3b23a	sched: simplify bonus calculation #2 current code: delta = calc_delta_mine(delta_exec, curr->load.weight, lw); delta = min((u64)delta, cfs_rq->sleeper_bonus); Notice that this calc_delta_mine() line is exactly delta_mine, which gives: delta = min((u64)delta_mine, cfs_rq->sleeper_bonus); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Peter Zijlstra	a6f2994042	sched: simplify bonus calculation #1 current code: delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec); delta = calc_delta_mine(delta, curr->load.weight, lw); delta = min((u64)delta, cfs_rq->sleeper_bonus); drop the first min(), because we clip against sleeper_bonus in the 3rd line again. That gives: delta = calc_delta_mine(delta_exec, curr->load.weight, lw); delta = min((u64)delta, cfs_rq->sleeper_bonus); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Ingo Molnar	b2133c8b1e	sched: tidy up and simplify the bonus balance make the bonus balance more consistent: do not hand out a bonus if there's too much in flight already, and only deduct as much from a runner as it has the capacity. This makes the bonus engine a zero-sum game (as intended). this also simplifies the code: text data bss dec hex filename 34770 2998 24 37792 93a0 sched.o.before 34749 2998 24 37771 938b sched.o.after and it also avoids overscheduling in sleep-happy workloads like hackbench.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Dmitry Adamushko	98fbc79853	sched: optimize task_tick_rt() a bit Mitchell Erblich suggested a quality-of-implementation change to not requeue SCHED_RR tasks if there's only a single task on the runqueue, by checking for rq->nr_running == 1. provide a more efficient implementation of that, to check that particular RT priority-queue only. [ From: mingo@elte.hu ] Also first requeue the task then set need_resched - results in slightly better machine-instruction ordering. Also clean up the code a bit. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Sven-Thorsten Dietrich	deac4ee65a	sched: simplify can_migrate_task() Remove trivial conditional branch in Linux scheduler's can_migrate_task() function. text data bss dec hex filename 34770 2998 24 37792 93a0 sched.o.before 34757 2998 24 37779 9393 sched.o.after Signed-off-by: Sven-Thorsten Dietrich <sven@thebigcorporation.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Ingo Molnar	71fd371463	sched: remove HZ dependency from the granularity default remove HZ dependency from the granularity default. Use 10 msec for the base granularity, 1 msec for wakeup granularity and 25 msec for batch wakeup granularity. (These defaults are close to the values that the default HZ=250 setting got previously, and thus it's the most common setting.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Bruce Ashfield	7c6c16f354	sched: CONFIG_SCHED_GROUP_FAIR=y fixlet when I built with CONFIG_FAIR_GROUP_SCHED=y, I need the following change to make things right. [ From: mingo@elte.hu ] this config option is not upstream-configurable right now but lets fix this for completeness. Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Linus Torvalds	d0797b39dc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: tweak the sched_runtime_limit tunable sched: skip updating rq's next_balance under null SD sched: fix broken SMT/MC optimizations sched: accounting regression since rc1 sched: fix sysctl directory permissions sched: sched_clock_idle_[sleep\|wakeup]_event()	2007-08-23 21:38:39 -07:00
Linus Torvalds	de80af4cc9	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: don't warn on removal of a nonexistent binary file HOWTO: latest lxr url address changed HOWTO: korean translation of Documentation/HOWTO Fix Off-by-one in /sys/module/*/refcnt sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()	2007-08-23 21:34:43 -07:00
Ingo Molnar	505c0efd58	sched: tweak the sched_runtime_limit tunable Michael Gerdau reported reniced task CPU usage weirdnesses. Such symptoms can be caused by limit underruns so double the sched_runtime_limit. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Suresh Siddha	f549da848e	sched: skip updating rq's next_balance under null SD Was playing with sched_smt_power_savings/sched_mc_power_savings and found out that while the scheduler domains are reconstructed when sysfs settings change, rebalance_domains() can get triggered with null domain on other cpus, which is setting next_balance to jiffies + 60*HZ. Resulting in no idle/busy balancing for 60 seconds. Fix this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Suresh Siddha	f8700df7c4	sched: fix broken SMT/MC optimizations On a four package system with HT - HT load balancing optimizations were broken. For example, if two tasks end up running on two logical threads of one of the packages, scheduler is not able to pull one of the tasks to a completely idle package. In this scenario, for nice-0 tasks, imbalance calculated by scheduler will be 512 and find_busiest_queue() will return 0 (as each cpu's load is 1024 > imbalance and has only one task running). Similarly MC scheduler optimizations also get fixed with this patch. [ mingo@elte.hu: restored fair balancing by increasing the fuzz and adding it back to the power decision, without the /2 factor. ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Eric W. Biederman	c57baf1e1e	sched: fix sysctl directory permissions There are two remaining gotchas: - The directories have impossible permissions (writeable). - The ctl_name for the kernel directory is inconsistent with everything else. It should be CTL_KERN. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Ingo Molnar	2aa44d0567	sched: sched_clock_idle_[sleep\|wakeup]_event() construct a more or less wall-clock time out of sched_clock(), by using ACPI-idle's existing knowledge about how much time we spent idling. This allows the rq clock to work around TSC-stops-in-C2, TSC-gets-corrupted-in-C3 type of problems. ( Besides the scheduler's statistics this also benefits blktrace and printk-timestamps as well. ) Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup callbacks allow the scheduler to get out the most of the period where the CPU has a reliable TSC. This results in slightly more precise task statistics. the ACPI bits were acked by Len. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Len Brown <len.brown@intel.com>	2007-08-23 15:18:02 +02:00
Oleg Nesterov	834d216e1f	signalfd: fix interaction with posix-timers dequeue_signal: if (__SI_TIMER) { spin_unlock(&tsk->sighand->siglock); do_schedule_next_timer(info); spin_lock(&tsk->sighand->siglock); } Unless tsk == curent, this is absolutely unsafe: nothing prevents tsk from exiting. If signalfd was passed to another process, do_schedule_next_timer() is just wrong. Add yet another "tsk == current" check into dequeue_signal(). This patch fixes an oopsable bug, but breaks the scheduling of posix timers if the shared __SI_TIMER signal was fetched via signalfd attached to another sub-thread. Mostly fixed by the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Roland McGrath <roland@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:46 -07:00
Oleg Nesterov	d02479bdeb	posix-timers: fix creation race sys_timer_create() sets ->it_process and unlocks ->siglock, then checks tmr->it_sigev_notify to define if get_task_struct() is needed. We already passed ->it_id to the caller, another thread can delete this timer and free its memory in between. As a minimal fix, move this code under ->siglock, sys_timer_delete() takes it too before calling release_posix_timer(). A proper serialization would be to take ->it_lock, we add a partly initialized timer on posix_timers_id, not good. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:46 -07:00
Thomas Gleixner	179394af7a	posix-timers: fix deletion race timer_delete does: lock_timer(); timer->it_process = NULL; unlock_timer(); release_posix_timer(); timer->it_process is checked in lock_timer() to prevent access to a timer, which is on the way to be deleted, but the check happens after idr_lock is dropped. This allows release_posix_timer() to delete the timer before the lock code can check the timer: CPU 0 CPU 1 lock_timer(); timer->it_process = NULL; unlock_timer(); lock_timer() spin_lock(idr_lock); timer = idr_find(); spin_lock(timer->lock); spin_unlock(idr_lock); release_posix_timer(); spin_lock(idr_lock); idr_remove(timer); spin_unlock(idr_lock); free_timer(timer); if (timer->......) Change the locking to prevent this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:45 -07:00
Andrew Morton	8b7f07155f	free_irq(): fix DEBUG_SHIRQ handling If we're going to run the handler from free_irq() then we must do it with local irq's disabled. Otherwise lockdep complains that the handler is taking irq-safe spinlocks in a non-irq-safe fashion. Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:44 -07:00
john stultz	187226f57f	futex_unlock_pi() hurts my brain and may cause application deadlock Avoid futex_unlock_pi returning -EFAULT (which results in deadlock), by clearing uval before jumping to retry_locked. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:44 -07:00
Adrian Bunk	88ae704c2a	kernel/auditsc.c: fix an off-by-one This patch fixes an off-by-one in a BUG_ON() spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Amy Griffis <amy.griffis@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:44 -07:00
Alexey Dobriyan	256e2fdf03	Fix Off-by-one in /sys/module/*/refcnt sysfs internals were changed to not pin module in question. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-08-22 14:35:35 -07:00
Robin Getz	cb00e99c0a	fix - ensure we don't use bootconsoles after init has been released Gerd Hoffmann pointed out that my patch from yesterday can lead to a null pointer dereference if the kernel is booted with no console, and no earlyprintk defined. This fixes that issue. Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-21 20:23:53 -07:00
Robin Getz	0c5564bd91	ensure we don't use bootconsoles after init has been released This is a followup to the cleanups for earlyprintk patch from Gerd Hoffmann http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69331af79cf29e26d1231152a172a1a10c2df511 This ensures that a bootconsole is unregistered if it is not replaced. The current implementation spews garbage out the bootconsole in this case, since the bootconsole structure is normally in the init section, and is freed, but still used. Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-20 22:42:01 -07:00
Christian Heim	e598fbaabd	Remove double inclusion of linux/capability.h Remove the second inclusion of linux/capability.h, which has been introduced with "[PATCH] move capable() to capability.h" (commit `c59ede7b78`) Signed-off-by: Christian Heim <phreak@gentoo.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-19 10:12:32 -07:00
Linus Torvalds	738ddd3039	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: run_rebalance_domains: s/SCHED_IDLE/CPU_IDLE/ sched: fix sleeper bonus sched: make global code static	2007-08-12 11:06:45 -07:00
Thomas Gleixner	2464286ace	genirq: suppress resend of level interrupts Level type interrupts are resent by the interrupt hardware when they are still active at irq_enable(). Suppress the resend mechanism for interrupts marked as level. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-12 11:05:45 -07:00
Thomas Gleixner	496634217e	genirq: cleanup mismerge artifact Commit `5a43a066b1`: "genirq: Allow fasteoi handler to retrigger disabled interrupts" was erroneously applied to handle_level_irq(). This added the irq retrigger / resend functionality to the level irq handler. Revert the offending bits. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-12 11:05:45 -07:00
Oleg Nesterov	de0cf899bb	sched: run_rebalance_domains: s/SCHED_IDLE/CPU_IDLE/ rebalance_domains(SCHED_IDLE) looks strange (typo), change it to CPU_IDLE. the effect of this bug was slightly more agressive idle-balancing on SMP than intended. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-12 18:08:19 +02:00
Ingo Molnar	5d2b3d3695	sched: fix sleeper bonus Peter Ziljstra noticed that the sleeper bonus deduction code was not properly rate-limited: a task that scheduled more frequently would get a disproportionately large deduction. So limit the deduction to delta_exec. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-12 18:08:19 +02:00
Adrian Bunk	6707de00fd	sched: make global code static This patch makes the following needlessly global code static: - arch_reinit_sched_domains() - struct attr_sched_mc_power_savings - struct attr_sched_smt_power_savings Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-12 18:08:19 +02:00
Linus Torvalds	d291676ce8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched debug: dont print kernel address in /proc/sched_debug sched: fix typo in the FAIR_GROUP_SCHED branch sched: improve rq-clock overflow logic	2007-08-11 15:58:37 -07:00
Peter Chubb	cd5bfea278	fix compilation with gcc 4.2 gcc-4.2 is a lot more picky about its symbol handling. EXPORT_SYMBOL no longer works on symbols that are undefined or defined with static scope. For example, with CONFIG_PROFILE off, I see: kernel/profile.c:206: error: __ksymtab_profile_event_unregister causes a section type conflict kernel/profile.c:205: error: __ksymtab_profile_event_register causes a section type conflict This patch moves the EXPORTs inside the #ifdef CONFIG_PROFILE, so we only try to export symbols that are defined. Also, in kernel/kprobes.c there's an EXPORT_SYMBOL_GPL() for jprobes_return, which if CONFIG_JPROBES is undefined is a static inline and gives the same error. And in drivers/acpi/resources/rsxface.c, there's an ACPI_EXPORT_SYMBOPL() for a static symbol. If it's static, it's not accessible from outside the compilation unit, so should bot be exported. These three changes allow building a zx1_defconfig kernel with gcc 4.2 on IA64. [akpm@linux-foundation.org: export jpobe_return properly] Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:47:42 -07:00
Miao Xie	6ddfca9548	timer: remove clockevents_unregister_notifier I find a function(clockevents_unregister_notifier) which is not called by anything in tree. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:47:42 -07:00
Rafael J. Wysocki	c5a69adff9	Hibernation: do not try to mark invalid PFNs as nosave On some systems some PFNs reported by the early initialization code as 'nosave' may be invalid. If we try to set the corresponding bits in the hibernation bitmap, BUG_ON() in memory_bm_find_bit() will be triggered and the system won't be able to boot (cf. https://bugzilla.novell.com/show_bug.cgi?id=296242). Prevent this from happening by verifying if the 'nosave' PFNs are valid in mark_nosave_pages(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:47:40 -07:00
Lee Schermerhorn	8daec965e7	Fix missing numa_zonelist_order sysctl Misplaced #endif is hiding the numa_zonelist_order sysctl when !SECURITY. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:47:40 -07:00
Ingo Molnar	5167e75f4d	sched debug: dont print kernel address in /proc/sched_debug Arjan van de Ven pointed out that we should not print kernel addresses in world-readable /proc files - fix that. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2007-08-10 23:05:11 +02:00
Ingo Molnar	e56f31aad9	sched: fix typo in the FAIR_GROUP_SCHED branch while there's no in-tree way to turn group scheduling at the moment, fix a typo in it nevertheless. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-10 23:05:11 +02:00
Ingo Molnar	529c77261b	sched: improve rq-clock overflow logic improve the rq-clock overflow logic: limit the absolute rq->clock delta since the last scheduler tick, instead of limiting the delta itself. tested by Arjan van de Ven - whole laptop was misbehaving due to an incorrectly calibrated cpu_khz confusing sched_clock(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2007-08-10 23:05:11 +02:00
Linus Torvalds	be12014dd7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (61 commits) sched: refine negative nice level granularity sched: fix update_stats_enqueue() reniced codepath sched: round a bit better sched: make the multiplication table more accurate sched: optimize update_rq_clock() calls in the load-balancer sched: optimize activate_task() sched: clean up set_curr_task_fair() sched: remove __update_rq_clock() call from entity_tick() sched: move the __update_rq_clock() call to scheduler_tick() sched debug: remove the 'u64 now' parameter from print_task()/_rq() sched: remove the 'u64 now' local variables sched: remove the 'u64 now' parameter from deactivate_task() sched: remove the 'u64 now' parameter from dequeue_task() sched: remove the 'u64 now' parameter from enqueue_task() sched: remove the 'u64 now' parameter from dec_nr_running() sched: remove the 'u64 now' parameter from inc_nr_running() sched: remove the 'u64 now' parameter from dec_load() sched: remove the 'u64 now' parameter from inc_load() sched: remove the 'u64 now' parameter from update_curr_load() sched: remove the 'u64 now' parameter from ->task_new() ...	2007-08-09 08:23:31 -07:00
Linus Torvalds	88ffc35059	Revert "genirq: temporary fix for level-triggered IRQ resend" This reverts commit `0fc4969b86`. It was always meant to be temporary, but it's generating more useless noise than anything else, and we probably should never have done it in the generic kernel (only had the people involved test it on their own). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-09 08:10:16 -07:00
Ingo Molnar	7cff8cf61c	sched: refine negative nice level granularity refine the granularity of negative nice level tasks: let them reschedule more often to offset the effect of them consuming their wait_runtime proportionately slower. (This makes nice-0 task scheduling smoother in the presence of negatively reniced tasks.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:52 +02:00
Ingo Molnar	a69edb5560	sched: fix update_stats_enqueue() reniced codepath the key has to be rescaled to /weight even if it has a positive value. (this change only affects the scheduling of reniced tasks) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:52 +02:00
Ingo Molnar	194081ebfa	sched: round a bit better round a tiny bit better in high-frequency rescheduling scenarios, by rounding around zero instead of rounding down. (this is pretty theoretical though) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	254753dc32	sched: make the multiplication table more accurate do small deltas in the weight and multiplication constant table so that the worst-case numeric error is better than 1:100000000. (8 digits) the current error table is: nice mult * inv_mult error ------------------------------------------ -20: 88761 * 48388 -0.0000000065 -19: 71755 * 59856 -0.0000000037 -18: 56483 * 76040 0.0000000056 -17: 46273 * 92818 0.0000000042 -16: 36291 * 118348 -0.0000000065 -15: 29154 * 147320 -0.0000000037 -14: 23254 * 184698 -0.0000000009 -13: 18705 * 229616 -0.0000000037 -12: 14949 * 287308 -0.0000000009 -11: 11916 * 360437 -0.0000000009 -10: 9548 * 449829 -0.0000000009 -9: 7620 * 563644 -0.0000000037 -8: 6100 * 704093 0.0000000009 -7: 4904 * 875809 0.0000000093 -6: 3906 * 1099582 -0.0000000009 -5: 3121 * 1376151 -0.0000000058 -4: 2501 * 1717300 0.0000000009 -3: 1991 * 2157191 -0.0000000035 -2: 1586 * 2708050 0.0000000009 -1: 1277 * 3363326 0.0000000014 0: 1024 * 4194304 0.0000000000 1: 820 * 5237765 0.0000000009 2: 655 * 6557202 0.0000000033 3: 526 * 8165337 -0.0000000079 4: 423 * 10153587 0.0000000012 5: 335 * 12820798 0.0000000079 6: 272 * 15790321 0.0000000037 7: 215 * 19976592 -0.0000000037 8: 172 * 24970740 -0.0000000037 9: 137 * 31350126 -0.0000000079 10: 110 * 39045157 -0.0000000061 11: 87 * 49367440 -0.0000000037 12: 70 * 61356676 0.0000000056 13: 56 * 76695844 -0.0000000075 14: 45 * 95443717 -0.0000000072 15: 36 * 119304647 -0.0000000009 16: 29 * 148102320 -0.0000000037 17: 23 * 186737708 -0.0000000028 18: 18 * 238609294 -0.0000000009 19: 15 * 286331153 -0.0000000002 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	6e82a3befe	sched: optimize update_rq_clock() calls in the load-balancer optimize update_rq_clock() calls in the load-balancer: update them right after locking the runqueue(s) so that the pull functions do not have to call it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	2daa357705	sched: optimize activate_task() optimize activate_task() by removing update_rq_clock() from it. (and add update_rq_clock() to all callsites of activate_task() that did not have it before.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	c3b64f1e4f	sched: clean up set_curr_task_fair() clean up set_curr_task_fair(). ( identity transformation that causes no change in functionality. ) text data bss dec hex filename 39170 3750 36 42956 a7cc sched.o.before 39170 3750 36 42956 a7cc sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	d9e0e6aa6d	sched: remove __update_rq_clock() call from entity_tick() remove __update_rq_clock() call from entity_tick(). no change in functionality because scheduler_tick() already calls __update_rq_clock(). Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	546fe3c909	sched: move the __update_rq_clock() call to scheduler_tick() move the __update_rq_clock() call from update_cpu_load() to scheduler_tick(). ( identity transformation that causes no change in functionality. ) this allows the direct use of rq->clock in ->task_tick() functions. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	a48da48b40	sched debug: remove the 'u64 now' parameter from print_task()/_rq() remove the 'u64 now' parameter from sched_debug.c:print_task()/_rq(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	bdd4dfa89c	sched: remove the 'u64 now' local variables final step: remove all (now superfluous) 'u64 now' variables. ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:51 +02:00
Ingo Molnar	2e1cb74a50	sched: remove the 'u64 now' parameter from deactivate_task() remove the 'u64 now' parameter from deactivate_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	69be72c13d	sched: remove the 'u64 now' parameter from dequeue_task() remove the 'u64 now' parameter from dequeue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	8159f87e2b	sched: remove the 'u64 now' parameter from enqueue_task() remove the 'u64 now' parameter from enqueue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	db53181e41	sched: remove the 'u64 now' parameter from dec_nr_running() remove the 'u64 now' parameter from dec_nr_running(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	e5fa2237b5	sched: remove the 'u64 now' parameter from inc_nr_running() remove the 'u64 now' parameter from inc_nr_running(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	79b5dddf83	sched: remove the 'u64 now' parameter from dec_load() remove the 'u64 now' parameter from dec_load(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	29b4b623fe	sched: remove the 'u64 now' parameter from inc_load() remove the 'u64 now' parameter from inc_load(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	84a1d7a2f9	sched: remove the 'u64 now' parameter from update_curr_load() remove the 'u64 now' parameter from update_curr_load(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	ee0827d8b5	sched: remove the 'u64 now' parameter from ->task_new() remove the 'u64 now' parameter from ->task_new(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	31ee529cc2	sched: remove the 'u64 now' parameter from ->put_prev_task() remove the 'u64 now' parameter from ->put_prev_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	ff95f3df54	sched: remove the 'u64 now' parameter from pick_next_task() remove the 'u64 now' parameter from pick_next_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:49 +02:00
Ingo Molnar	fb8d472402	sched: remove the 'u64 now' parameter from ->pick_next_task() remove the 'u64 now' parameter from ->pick_next_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	f02231e51a	sched: remove the 'u64 now' parameter from ->dequeue_task() remove the 'u64 now' parameter from ->dequeue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	fd390f6a04	sched: remove the 'u64 now' parameter from ->enqueue_task() remove the 'u64 now' parameter from ->enqueue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	f1e14ef64d	sched: remove the 'u64 now' parameter from update_curr_rt() remove the 'u64 now' parameter from update_curr_rt(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	ab6cde2692	sched: remove the 'u64 now' parameter from put_prev_entity() remove the 'u64 now' parameter from put_prev_entity(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	9948f4b2a7	sched: remove the 'u64 now' parameter from pick_next_entity() remove the 'u64 now' parameter from pick_next_entity(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	8494f412ed	sched: remove the 'u64 now' parameter from set_next_entity() remove the 'u64 now' parameter from set_next_entity(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	525c2716a4	sched: remove the 'u64 now' parameter from dequeue_entity() remove the 'u64 now' parameter from dequeue_entity(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	668031ca8f	sched: remove the 'u64 now' parameter from enqueue_entity() remove the 'u64 now' parameter from enqueue_entity(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	2396af69be	sched: remove the 'u64 now' parameter from enqueue_sleeper() remove the 'u64 now' parameter from enqueue_sleeper(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	dfdc119e54	sched: remove the 'u64 now' parameter from __enqueue_sleeper() remove the 'u64 now' parameter from __enqueue_sleeper(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	c7e9b5b293	sched: remove the 'u64 now' parameter from update_stats_curr_end() remove the 'u64 now' parameter from update_stats_curr_end(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	19b6a2e370	sched: remove the 'u64 now' parameter from update_stats_dequeue() remove the 'u64 now' parameter from update_stats_dequeue(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:48 +02:00
Ingo Molnar	79303e9e02	sched: remove the 'u64 now' parameter from update_stats_curr_start() remove the 'u64 now' parameter from update_stats_curr_start(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	9ef0a9615b	sched: remove the 'u64 now' parameter from update_stats_wait_end() remove the 'u64 now' parameter from update_stats_wait_end(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	eac55ea376	sched: remove the 'u64 now' parameter from __update_stats_wait_end() remove the 'u64 now' parameter from __update_stats_wait_end(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	d2417e5a3e	sched: remove the 'u64 now' parameter from update_stats_enqueue() remove the 'u64 now' parameter from update_stats_enqueue(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	5870db5b83	sched: remove the 'u64 now' parameter from update_stats_wait_start() remove the 'u64 now' parameter from update_stats_wait_start(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	b7cc089657	sched: remove the 'u64 now' parameter from update_curr() remove the 'u64 now' parameter from update_curr(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	5cef9eca38	sched: remove the 'u64 now' parameter from print_cfs_rq() remove the 'u64 now' parameter from print_cfs_rq(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	d281918d7c	sched: remove 'now' use from assignments change all 'now' timestamp uses in assignments to rq->clock. ( this is an identity transformation that causes no functionality change: all such new rq->clock is necessarily preceded by an update_rq_clock() call. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	eb59449400	sched: remove __rq_clock() remove the (now unused) __rq_clock() function. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	c1b3da3ecd	sched: eliminate __rq_clock() use eliminate __rq_clock() use by changing it to: __update_rq_clock(rq) now = rq->clock; identity transformation - no change in behavior. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	2ab81159fa	sched: remove rq_clock() remove the now unused rq_clock() function. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	a8e504d2a5	sched: eliminate rq_clock() use eliminate rq_clock() use by changing it to: update_rq_clock(rq) now = rq->clock; identity transformation - no change in behavior. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:47 +02:00
Ingo Molnar	b04a0f4c16	sched: add [__]update_rq_clock(rq) add the [__]update_rq_clock(rq) functions. (No change in functionality, just reorganization to prepare for elimination of the heavy 64-bit timestamp-passing in the scheduler.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Peter Williams	a4ac01c36e	sched: fix bug in balance_tasks() There are two problems with balance_tasks() and how it used: 1. The variables best_prio and best_prio_seen (inherited from the old move_tasks()) were only required to handle problems caused by the active/expired arrays, the order in which they were processed and the possibility that the task with the highest priority could be on either. These issues are no longer present and the extra overhead associated with their use is unnecessary (and possibly wrong). 2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same this_best_prio variable needs to be used by all scheduling classes or there is a risk of moving too much load. E.g. if the highest priority task on this at the beginning is a fairly low priority task and the rt class migrates a task (during its turn) then that moved task becomes the new highest priority task on this_rq but when the sched_fair class initializes its copy of this_best_prio it will get the priority of the original highest priority task as, due to the run queue locks being held, the reschedule triggered by pull_task() will not have taken place. This could result in inappropriate overriding of skip_for_load and excessive load being moved. The attached patch addresses these problems by deleting all reference to best_prio and best_prio_seen and making this_best_prio a reference parameter to the various functions involved. load_balance_fair() has also been modified so that this_best_prio is only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should preserve the effect of helping spread groups' higher priority tasks around the available CPUs while improving system performance when CONFIG_FAIR_GROUP_SCHED isn't set. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Alexey Dobriyan	e0361851e5	sched: remove binary sysctls from kernel.sched_domain kernel.sched_domain hierarchy is under CTL_UNNUMBERED and thus unreachable to sysctl(2). Generating .ctl_number's in such situation is not useful. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Ingo Molnar	fd8bb43e27	sched: delta_exec accounting fix small delta_exec accounting fix: increase delta_exec and increase sum_exec_runtime even if the task is not on the runqueue anymore. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Ingo Molnar	c5dcfe72aa	sched: clean up delta_mine cleanup: delta_mine is an unsigned value. no code impact: text data bss dec hex filename 27823 2726 16 30565 7765 sched.o.before 27823 2726 16 30565 7765 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Ingo Molnar	8e717b194c	sched: schedule() speedup speed up schedule(): share the 'now' parameter that deactivate_task() was calculating internally. ( this also fixes the small accounting window between the deactivate call and the pick_next_task() call. ) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Ingo Molnar	7bfd048587	sched: uninline rq_clock() uninline rq_clock() to save 263 bytes of code: text data bss dec hex filename 39561 3642 24 43227 a8db sched.o.before 39298 3642 24 42964 a7d4 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Josh Triplett	291ae5a120	sched: mark print_cfs_stats static sched_fair.c defines print_cfs_stats, and sched_debug.c uses it, but sched.c includes both sched_fair.c and sched_debug.c, so all the references to print_cfs_stats occur in the same compilation unit. Thus, mark print_cfs_stats static. Eliminates a sparse warning: warning: symbol 'print_cfs_stats' was not declared. Should it be static? Signed-off-by: Josh Triplett <josh@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Ulrich Drepper	9531b62f5e	sched: clean up sched_getaffinity() here's another tiny cleanup. The generated code is not affected (gcc is smart enough) but for people looking over the code it is just irritating to have the extra conditional. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Peter Williams	4301065920	sched: simplify move_tasks() The move_tasks() function is currently multiplexed with two distinct capabilities: 1. attempt to move a specified amount of weighted load from one run queue to another; and 2. attempt to move a specified number of tasks from one run queue to another. The first of these capabilities is used in two places, load_balance() and load_balance_idle(), and in both of these cases the return value of move_tasks() is used purely to decide if tasks/load were moved and no notice of the actual number of tasks moved is taken. The second capability is used in exactly one place, active_load_balance(), to attempt to move exactly one task and, as before, the return value is only used as an indicator of success or failure. This multiplexing of sched_task() was introduced, by me, as part of the smpnice patches and was motivated by the fact that the alternative, one function to move specified load and one to move a single task, would have led to two functions of roughly the same complexity as the old move_tasks() (or the new balance_tasks()). However, the new modular design of the new CFS scheduler allows a simpler solution to be adopted and this patch addresses that solution by: 1. adding a new function, move_one_task(), to be used by active_load_balance(); and 2. making move_tasks() a single purpose function that tries to move a specified weighted load and returns 1 for success and 0 for failure. One of the consequences of these changes is that neither move_one_task() or the new move_tasks() care how many tasks sched_class.load_balance() moves and this enables its interface to be simplified by returning the amount of load moved as its result and removing the load_moved pointer from the argument list. This helps simplify the new move_tasks() and slightly reduces the amount of work done in each of sched_class.load_balance()'s implementations. Further simplification, e.g. changes to balance_tasks(), are possible but (slightly) complicated by the special needs of load_balance_fair() so I've left them to a later patch (if this one gets accepted). NB Since move_tasks() gets called with two run queue locks held even small reductions in overhead are worthwhile. [ mingo@elte.hu ] this change also reduces code size nicely: text data bss dec hex filename 39216 3618 24 42858 a76a sched.o.before 39173 3618 24 42815 a73f sched.o.after Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:46 +02:00
Ingo Molnar	f1a438d813	sched: reorder update_cpu_load(rq) with the ->task_tick() call Peter Williams suggested to flip the order of update_cpu_load(rq) with the ->task_tick() call. This is a NOP for the current scheduler (the two functions are independent of each other), ->task_tick() might create some state for update_cpu_load() in the future (or in PlugSched). Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:45 +02:00
Ingo Molnar	0915c4e89d	sched: batch sleeper bonus batch up the sleeper bonus sum a bit more. Anything below sched-granularity is too small to make a practical difference anyway. this optimization reduces the math in high-frequency scheduling scenarios. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-09 11:16:45 +02:00
Al Viro	175fc48425	fix oops in __audit_signal_info() The check for audit_signals is misplaced and the check for audit_dummy_context() is missing; as the result, if we send a signal to auditd from task with NULL ->audit_context while we have audit_signals != 0 we end up with an oops. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-07 19:58:56 -07:00
Al Viro	6f605d83dd	take sched_debug.c out of nasal demon territory C99 6.10.3[11]: preprocessing directive within the argument list of macro invocation => undefined behaviour. Don't do that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-06 18:22:27 -07:00
Oleg Nesterov	247284481c	Kill some obsolete sub-thread-ptrace stuff There is a couple of subtle checks which were needed to handle ptracing from the same thread group. This was deprecated a long ago, imho this code just complicates the understanding. And, the "->parent->signal->flags & SIGNAL_GROUP_EXIT" check in exit_notify() is not right. SIGNAL_GROUP_EXIT can mean exec(), not exit_group(). This means ptracer can lose a ptraced zombie on exec(). Minor problem, but still the bug. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-03 15:06:33 -07:00
Daniel Ritz	b6b1d87785	serial: fix 8250 early console setup the early setup function serial8250_console_early_setup() can be called from non __init code (eg. hotpluggable serial ports like serial_cs) so remove the __init from the call chain to avoid crashes. Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch> Cc: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-03 15:02:56 -07:00
Ingo Molnar	6cfb0d5d06	[PATCH] sched: reduce debug code move the rest of the debugging/instrumentation code to under CONFIG_SCHEDSTATS too. This reduces code size and speeds code up: text data bss dec hex filename 33044 4122 28 37194 914a sched.o.before 32708 4122 28 36858 8ffa sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	8179ca23d5	[PATCH] sched: use schedstat_set() API make use of the new schedstat_set() API to eliminate two #ifdef sections. No functional changes: text data bss dec hex filename 29009 4122 28 33159 8187 sched.o.before 29009 4122 28 33159 8187 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	c3c7011969	[PATCH] sched: add schedstat_set() API add the schedstat_set() API, to allow the reduction of CONFIG_SCHEDSTAT related #ifdefs. No code changed. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	9c2172459a	[PATCH] sched: move load-calculation functions move load-calculation functions so that they can use the per-policy declarations and methods. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	cad60d93e1	[PATCH] sched: ->task_new cleanup make sched_class.task_new == NULL a 'default method', this allows the removal of task_rt_new. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	4e6f96f313	[PATCH] sched: uninline inc/dec_nr_running() uninline inc_nr_running() and dec_nr_running(): text data bss dec hex filename 29039 4162 24 33225 81c9 sched.o.before 29027 4162 24 33213 81bd sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	cb1c4fc924	[PATCH] sched: uninline calc_delta_mine() uninline calc_delta_mine(): text data bss dec hex filename 29162 4162 24 33348 8244 sched.o.before 29039 4162 24 33225 81c9 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	ecf691daf7	[PATCH] sched: calc_delta_mine(): use fixed limit use fixed limit in calc_delta_mine() - this saves an instruction :) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Peter Williams	5a4f3ea77e	[PATCH] sched: tidy up left over smpnice code 1. The only place that RTPRIO_TO_LOAD_WEIGHT() is used is in the call to move_tasks() in the function active_load_balance() and its purpose here is just to make sure that the load to be moved is big enough to ensure that exactly one task is moved (if there's one available). This can be accomplished by using ULONG_MAX instead and this allows RTPRIO_TO_LOAD_WEIGHT() to be deleted. 2. This, in turn, allows PRIO_TO_LOAD_WEIGHT() to be deleted. 3. This allows load_weight() to be deleted which allows TIME_SLICE_NICE_ZERO to be deleted along with the comment above it. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Ingo Molnar	362a701663	[PATCH] sched: remove cache_hot_time remove the last unused remains of cache_hot_time. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-02 17:41:40 +02:00
Thomas Gleixner	0fc4969b86	genirq: temporary fix for level-triggered IRQ resend Marcin Slusarz reported a ne2k-pci "hung network interface" regression. delayed disable relies on the ability to re-trigger the interrupt in the case that a real interrupt happens after the software disable was set. In this case we actually disable the interrupt on the hardware level _after_ it occurred. On enable_irq, we need to re-trigger the interrupt. On i386 this relies on a hardware resend mechanism (send_IPI_self()). Actually we only need the resend for edge type interrupts. Level type interrupts come back once enable_irq() re-enables the interrupt line. I assume that the interrupt in question is level triggered because it is shared and above the legacy irqs 0-15: 17: 12 IO-APIC-fasteoi eth1, eth0 Looking into the IO_APIC code, the resend via send_IPI_self() happens unconditionally. So the resend is done for level and edge interrupts. This makes the problem more mysterious. The code in question lib8390.c does disable_irq(); fiddle_with_the_network_card_hardware() enable_irq(); The fiddle_with_the_network_card_hardware() might cause interrupts, which are cleared in the same code path again, Marcin found that when he disables the irq line on the hardware level (removing the delayed disable) the card is kept alive. So the difference is that we can get a resend on enable_irq, when an interrupt happens during the time, where we are in the disabled region. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-01 20:46:22 -07:00
Jesper Juhl	c9b3febc5b	Fix a use after free bug in kernel->userspace relay file support Coverity spotted what looks like a real possible case of using a variable after it has been freed. The problem is in kernel/relay.c::relay_open_buf() If the code hits "goto free_buf;" it ends up in this code : free_buf: relay_destroy_buf(buf); <--- calls kfree() on 'buf'. free_name: kfree(tmpname); end: return buf; <-- use after free of 'buf'. I read through the callers and they all handle a NULL return from this function as an error (and hitting the 'free_buf' label only happens on failure to chan->cb->create_buf_file(), so that looks like a clear error to me). The patch simply sets 'buf' to NULL after the call to relay_destroy_buf(buf); - as far as I can see that should take care of the problem. The patch also corrects a reference to a documentation file while I was at it. Note from Mathieu: the documentation reference change should have been done in a separate patch, but I guess no one will really care. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: "David J. Wilder" <wilder@us.ibm.com> Tested-by: "David J. Wilder" <wilder@us.ibm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Tom Zanussi <zanussi@us.ibm.com> Cc: Karim Yaghmour <karim@opersys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:42 -07:00
Satyam Sharma	e804a4a4dd	kthread: silence bogus section mismatch warning WARNING: kernel/built-in.o(.text+0x16910): Section mismatch: reference to .init.text: (between 'kthreadd' and 'init_waitqueue_head') comes because kernel/kthread.c:kthreadd() is not __init but calls kthreadd_setup() which is __init. But this is ok, because kthreadd_setup() is only ever called at init time, and then kthreadd() proceeds into its "for (;;)" loop. We could mark kthreadd __init_refok, but kthreadd_setup() with just one callsite and 4 lines in it (it's been that small since `10ab825bde`) doesn't need to be a separate function at all -- so let's just move those four lines at beginning of kthreadd() itself. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:42 -07:00
Andreas Schwab	f54f098612	futex: pass nr_wake2 to futex_wake_op The fourth argument of sys_futex is ignored when op == FUTEX_WAKE_OP, but futex_wake_op expects it as its nr_wake2 parameter. The only user of this operation in glibc is always passing 1, so this bug had no consequences so far. Signed-off-by: Andreas Schwab <schwab@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:40 -07:00
Alexey Dobriyan	c0f3358621	Fix leak on /proc/lockdep_stats Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:40 -07:00
Alexey Dobriyan	5ea473a1df	Fix leaks on /proc/{*/sched,sched_debug,timer_list,timer_stats} On every open/close one struct seq_operations leaks. Kudos to /proc/slab_allocators. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:40 -07:00
Randy Dunlap	421cee2935	sched: fix kernel-doc warnings Fix kernel-doc warnings in sched.c: Warning(linux-2623-rc1g4//kernel/sched.c:1685): No description found for parameter 'notifier' Warning(linux-2623-rc1g4//kernel/sched.c:1696): No description found for parameter 'notifier' Warning(linux-2623-rc1g4//kernel/sched.c:1750): No description found for parameter 'prev' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:38 -07:00
Greg Kroah-Hartman	74c5b597e9	modules: better error messages when modules fail to load due to a sysfs problem. This helps people when debugging problems like the ones that were in the recent -mm releases. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-30 14:25:23 -07:00
Len Brown	673d5b43da	ACPI: restore CONFIG_ACPI_SLEEP Restore the 2.6.22 CONFIG_ACPI_SLEEP build option, but now shadowing the new CONFIG_PM_SLEEP option. Signed-off-by: Len Brown <len.brown@intel.com> [ Modified to work with the PM config setup changes. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-29 16:53:59 -07:00
Rafael J. Wysocki	296699de6b	Introduce CONFIG_SUSPEND for suspend-to-Ram and standby Introduce CONFIG_SUSPEND representing the ability to enter system sleep states, such as the ACPI S3 state, and allow the user to choose SUSPEND and HIBERNATION independently of each other. Make HOTPLUG_CPU be selected automatically if SUSPEND or HIBERNATION has been chosen and the kernel is intended for SMP systems. Also, introduce CONFIG_PM_SLEEP which is automatically selected if CONFIG_SUSPEND or CONFIG_HIBERNATION is set and use it to select the code needed for both suspend and hibernation. The top-level power management headers and the ACPI code related to suspend and hibernation are modified to use the new definitions (the changes in drivers/acpi/sleep/main.c are, mostly, moving code to reduce the number of ifdefs). There are many other files in which CONFIG_PM can be replaced with CONFIG_PM_SLEEP or even with CONFIG_SUSPEND, but they can be updated in the future. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-29 16:45:38 -07:00
Rafael J. Wysocki	b0cb1a19d0	Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid confusion (among other things, with CONFIG_SUSPEND introduced in the next patch). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-29 16:45:38 -07:00
Peter Zijlstra	040b3a2df2	audit: fix two bugs in the new execve audit code copy_from_user() returns the number of bytes not copied, hence 0 is the expected output. axi->mm might not be valid anymore when not equal to current->mm, do not dereference before checking that - thanks to Al for spotting that. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-28 19:42:22 -07:00
Al Viro	0af3678f7c	rip some includes from linux/interrupt.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-28 19:42:22 -07:00
Linus Torvalds	257f49251c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: [PATCH] sched: debug feature - make the sched-domains tree runtime-tweakable [PATCH] sched: add above_background_load() function [PATCH] sched: update Documentation/sched-stats.txt [PATCH] sched: mark sysrq_sched_debug_show() static [PATCH] sched: make cpu_clock() not use the rq clock [PATCH] sched: remove unused rq->load_balance_class [PATCH] sched: arch preempt notifier mechanism [PATCH] sched: increase SCHED_LOAD_SCALE_FUZZ	2007-07-26 13:59:59 -07:00
Rafael J. Wysocki	58b3b71dfa	Fix ThinkPad T42 poweroff failure introduced by by "PM: Introduce pm_power_off_prepare" Commit `bd804eba1c` ("PM: Introduce pm_power_off_prepare") caused problems in the poweroff path, as reported by YOSHIFUJI Hideaki / 吉藤英明. Generally, sysdev_shutdown() should be called after the ACPI preparation for powering the system off. To make it happen, we can separate sysdev_shutdown() from device_shutdown() and call it directly wherever necessary. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 12:13:06 -07:00
Randy Dunlap	61df47c8da	kernel-doc fix for kmod.c Fix kmod.c: Warning(linux-2.6.23-rc1//kernel/kmod.c:364): No description found for parameter 'envp' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:33:06 -07:00
Nick Piggin	e692ab5347	[PATCH] sched: debug feature - make the sched-domains tree runtime-tweakable debugging feature: make the sched-domains tree runtime-tweakable. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ mingo@elte.hu: made it depend on CONFIG_SCHED_DEBUG & small updates ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-26 13:40:43 +02:00
Josh Triplett	f337346193	[PATCH] sched: mark sysrq_sched_debug_show() static Only sched.c uses sysrq_sched_debug_show, and sched.c includes sched_debug.c, so all uses of sysrq_sched_debug_show occur in the same source file. Eliminates a sparse warning: warning: symbol 'sysrq_sched_debug_show' was not declared. Should it be static? Signed-off-by: Josh Triplett <josh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-26 13:40:43 +02:00
Ingo Molnar	2cd4d0ea19	[PATCH] sched: make cpu_clock() not use the rq clock it is enough to disable interrupts to get the precise rq-clock of the local CPU. this also solves an NMI watchdog regression: the NMI watchdog calls touch_softlockup_watchdog(), which might deadlock on rq->lock if the NMI hits an rq-locked critical section. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-26 13:40:43 +02:00
Satoru Takeuchi	018a221295	[PATCH] sched: remove unused rq->load_balance_class Remove unused rq->load_balance_class. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-26 13:40:43 +02:00
Avi Kivity	e107be36ef	[PATCH] sched: arch preempt notifier mechanism This adds a general mechanism whereby a task can request the scheduler to notify it whenever it is preempted or scheduled back in. This allows the task to swap any special-purpose registers like the fpu or Intel's VT registers. Signed-off-by: Avi Kivity <avi@qumranet.com> [ mingo@elte.hu: fixes, cleanups ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-26 13:40:43 +02:00
Linus Torvalds	a4fb2122f1	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source ACPI: quiet ACPI Exceptions due to no _PTC or _TSS ACPI: Remove references to ACPI_STATE_S2 from acpi_pm_enter ACPI: Kconfig: always enable CONFIG_ACPI_SLEEP on X86 ACPI: Kconfig: fold /proc/acpi/sleep under CONFIG_ACPI_PROCFS ACPI: Kconfig: CONFIG_ACPI_PROCFS now defaults to N ACPI: autoload modules - Create __mod_acpi_device_table symbol for all ACPI drivers ACPI: autoload modules - Create ACPI alias interface ACPI: autoload modules - ACPICA modifications ACPI: asus-laptop: Fix failure exits ACPI: fix oops due to typo in new throttling code ACPI: ignore _PSx method for hotplugable PCI devices ACPI: Use ACPI methods to select PCI device suspend state ACPI, PNP: hook ACPI D-state to PNP suspend/resume ACPI: Add acpi_pm_device_sleep_state helper routine ACPI: Implement the set_target() callback from pm_ops	2007-07-25 11:28:00 -07:00
john stultz	17c38b7490	Cache xtime every call to update_wall_time This avoids xtime lag seen with dynticks, because while 'xtime' itself is still not updated often, we keep a 'xtime_cache' variable around that contains the approximate real-time that _is_ updated each time we do a 'update_wall_time()', and is thus never off by more than one tick. IOW, this restores the original semantics for 'xtime' users, as long as you use the proper abstraction functions (ie 'current_kernel_time()' or 'get_seconds()' depending on whether you want a timespec or just the seconds field). [ Updated Patch. As penance for my sins I've also yanked another #ifdef that was added to avoid the xtime lag w/ hrtimers. ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-25 10:17:44 -07:00
john stultz	2c6b47de17	Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). This avoids use of the kernel-internal "xtime" variable directly outside of the actual time-related functions. Instead, use the helper functions that we already have available to us. This doesn't actually change any behaviour, but this will allow us to fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ (because much of the realtime information is maintained as separate offsets to 'xtime'), which has caused interfaces that use xtime directly to get a time that is out of sync with the real-time clock by up to a third of a second or so. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-25 10:09:20 -07:00
Len Brown	e8b2fd0122	ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source As it was a synonym for (CONFIG_ACPI && CONFIG_X86), the ifdefs for it were more clutter than they were worth. For ia64, just add a few stubs in anticipation of future S3 or S4 support. Signed-off-by: Len Brown <len.brown@intel.com>	2007-07-25 01:29:39 -04:00
Linus Torvalds	dd6ccfe64d	Merge branch 'audit.b39' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b39' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] get rid of AVC_PATH postponed treatment [PATCH] allow audit filtering on bit & operations [PATCH] audit: fix broken class-based syscall audit [PATCH] Make IPC mode consistent	2007-07-22 11:18:20 -07:00
Masoud Asgharifard Sharbiani	abd4f7505b	x86: i386-show-unhandled-signals-v3 This patch makes the i386 behave the same way that x86_64 does when a segfault happens. A line gets printed to the kernel log so that tools that need to check for failures can behave more uniformly between debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 > /proc/sys/debug/exception-trace) Also, all of the lines being printed are now using printk_ratelimit() to deny the ability of DoS from a local user with a program like the following: main() { while (1) if (!fork()) (int )0 = 0; } This new revision also includes the fix that Andrew did which got rid of new sysctl that was added to the system in earlier versions of this. Also, 'show-unhandled-signals' sysctl has been renamed back to the old 'exception-trace' to avoid breakage of people's scripts. AK: Enabling by default for i386 will be likely controversal, but let's see what happens AK: Really folks, before complaining just fix your segfaults AK: I bet this will find a lot of silent issues Signed-off-by: Masoud Sharbiani <masouds@google.com> Signed-off-by: Andi Kleen <ak@suse.de> [ Personally, I've found the complaints useful on x86-64, so I'm all for this. That said, I wonder if we could do it more prettily.. -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-22 11:03:37 -07:00
Al Viro	4259fa01a2	[PATCH] get rid of AVC_PATH postponed treatment Selinux folks had been complaining about the lack of AVC_PATH records when audit is disabled. I must admit my stupidity - I assumed that avc_audit() really couldn't use audit_log_d_path() because of deadlocks (== could be called with dcache_lock or vfsmount_lock held). Shouldn't have made that assumption - it never gets called that way. It _is_ called under spinlocks, but not those. Since audit_log_d_path() uses ab->gfp_mask for allocations, kmalloc() in there is not a problem. IOW, the simple fix is sufficient: let's rip AUDIT_AVC_PATH out and simply generate pathname as part of main record. It's trivial to do. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: James Morris <jmorris@namei.org>	2007-07-22 09:57:02 -04:00
Eric Paris	74f2345b6b	[PATCH] allow audit filtering on bit & operations Right now the audit filter can match on = != > < >= blah blah blah. This allow the filter to also look at bitwise AND operations, & Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-07-22 09:57:02 -04:00
Klaus Weidner	c926e4f432	[PATCH] audit: fix broken class-based syscall audit The sanity check in audit_match_class() is wrong. We are able to audit 2048 syscalls but in audit_match_class() we were accidentally using sizeof(_u32) instead of number of bits in _u32 when deciding how many syscalls were valid. On ia64 in particular we were hitting syscall numbers over the (wrong) limit of 256. Fixing the audit_match_class check takes care of the problem. Signed-off-by: Klaus Weidner <klaus@atsec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-07-22 09:57:02 -04:00
Steve Grubb	5b9a426223	[PATCH] Make IPC mode consistent The mode fields for IPC records are not consistent. Some are hex, others are octal. This patch makes them all octal. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-07-22 09:57:02 -04:00

... 2 3 4 5 6 ...

2702 Commits