This patch changes attach_task_by_pid() to take a u64 rather than a
string; as a result it can be called directly as a control groups
write_u64 handler, and cgroup_common_file_write() can be removed.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the write handler for the cgroups notify_on_release
file into a separate handler. This handler requires no cgroups locking
since it relies on atomic bitops for synchronization.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch contains cleanups suggested by reviewers for the recent
write_string() patchset:
- pair cgroup_lock_live_group() with cgroup_unlock() in cgroup.c for
clarity, rather than directly unlocking cgroup_mutex.
- make the return type of cgroup_lock_live_group() a bool
- use a #define'd constant for the local buffer size in read/write functions
Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds cgroup_release_agent_write() and cgroup_release_agent_show()
methods to handle writing/reading the path to a cgroup hierarchy's
release agent. As a result, cgroup_common_file_read() is now unnecessary.
As part of the change, a previously-tolerated race in
cgroup_release_agent() is avoided by copying the current
release_agent_path prior to calling call_usermode_helper().
Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a write_string() method for cgroups control files. The
semantics are that a buffer is copied from userspace to kernelspace
and the handler function invoked on that buffer. The buffer is
guaranteed to be nul-terminated, and no longer than max_write_len
(defaulting to 64 bytes if unspecified). Later patches will convert
existing raw file write handlers in control group subsystems to use
this method.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- need_forkexit_callback will be read only after system boot.
- use_task_css_set_links will be read only after it's set.
And these 2 variables are checked when a new process is forked.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
--------------------------
while() {
list_entry();
...
}
--------------------------
is equivalent to following code.
--------------------------
list_for_each_entry(){
...
}
--------------------------
later can review easily more.
this patch is just clean up.
it doesn't have any behavor change.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function does not modify anything (except the temporary css template), so
it's sufficient to hold read lock.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that there's a CONFIG_KPROBES check inside kernel/kprobes.c,
which is redundant.
Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All ratelimit user use same jiffies and burst params, so some messages
(callbacks) will be lost.
For example:
a call printk_ratelimit(5 * HZ, 1)
b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will
will be supressed.
- rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for
hints from andrew.
- Add WARN_ON_RATELIMIT, update rcupreempt.h
- remove __printk_ratelimit
- use __ratelimit in net_ratelimit
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace a printk+WARN_ON() by a WARN(); this increases the chance of the
string making it into the bugreport (ie: it goes inside the
---[ cut here ]--- section)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a WARN() macro that acts like WARN_ON(), with the added feature that it
takes a printk like argument that is printed as part of the warning message.
[akpm@linux-foundation.org: fix printk arguments]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We duplicate alloc/free_thread_info defines on many platforms (the
majority uses __get_free_pages/free_pages). This patch defines common
defines and removes these duplicated defines.
__HAVE_ARCH_THREAD_INFO_ALLOCATOR is introduced for platforms that do
something different.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently call_usermodehelper_setup() uses GFP_ATOMIC. but it can return
NULL _very_ easily.
GFP_ATOMIC is needed only when we can't sleep. and, GFP_KERNEL is robust
and better.
thus, I add gfp_mask argument to call_usermodehelper_setup().
So, its callers pass the gfp_t as below:
call_usermodehelper() and call_usermodehelper_keys():
depend on 'wait' argument.
call_usermodehelper_pipe():
always GFP_KERNEL because always run under process context.
orderly_poweroff():
pass to GFP_ATOMIC because may run under interrupt context.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Paul Menage" <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Build kernel/profile.o only if CONFIG_PROFILING is enabled.
This makes CONFIG_PROFILING=n kernels smaller.
As a bonus, some profile_tick() calls and one branch from schedule() are
now eliminated with CONFIG_PROFILING=n (but I doubt these are
measurable effects).
This patch changes the effects of CONFIG_PROFILING=n, but I don't think
having more than two choices would be the better choice.
This patch also adds the name of the first parameter to the prototypes
of profile_{hits,tick}() since I anyway had to add them for the dummy
functions.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This will probably never trigger... but it won't hurt to be careful.
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Joshua Bloch <jjb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
arch/mips/kernel/stacktrace.c: Heiko can't type
kthread: reduce stack pressure in create_kthread and kthreadd
fix core/stacktrace changes on avr32, mips, sh
This patch adds O_NONBLOCK support to pipe2. It is minimally more involved
than the patches for eventfd et.al but still trivial. The interfaces of the
create_write_pipe and create_read_pipe helper functions were changed and the
one other caller as well.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_pipe2
# ifdef __x86_64__
# define __NR_pipe2 293
# elif defined __i386__
# define __NR_pipe2 331
# else
# error "need __NR_pipe2"
# endif
#endif
int
main (void)
{
int fds[2];
if (syscall (__NR_pipe2, fds, 0) == -1)
{
puts ("pipe2(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int fl = fcntl (fds[i], F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if (fl & O_NONBLOCK)
{
printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}
if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
{
puts ("pipe2(O_NONBLOCK) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int fl = fcntl (fds[i], F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if ((fl & O_NONBLOCK) == 0)
{
printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces the new syscall inotify_init1 (note: the 1 stands for
the one parameter the syscall takes, as opposed to no parameter before). The
values accepted for this parameter are function-specific and defined in the
inotify.h header. Here the values must match the O_* flags, though. In this
patch CLOEXEC support is introduced.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_inotify_init1
# ifdef __x86_64__
# define __NR_inotify_init1 294
# elif defined __i386__
# define __NR_inotify_init1 332
# else
# error "need __NR_inotify_init1"
# endif
#endif
#define IN_CLOEXEC O_CLOEXEC
int
main (void)
{
int fd;
fd = syscall (__NR_inotify_init1, 0);
if (fd == -1)
{
puts ("inotify_init1(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("inotify_init1(0) set close-on-exit");
return 1;
}
close (fd);
fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
if (fd == -1)
{
puts ("inotify_init1(IN_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the new eventfd2 syscall. It extends the old eventfd
syscall by one parameter which is meant to hold a flag value. In this
patch the only flag support is EFD_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.
A new name EFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_eventfd2
# ifdef __x86_64__
# define __NR_eventfd2 290
# elif defined __i386__
# define __NR_eventfd2 328
# else
# error "need __NR_eventfd2"
# endif
#endif
#define EFD_CLOEXEC O_CLOEXEC
int
main (void)
{
int fd = syscall (__NR_eventfd2, 1, 0);
if (fd == -1)
{
puts ("eventfd2(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("eventfd2(0) sets close-on-exec flag");
return 1;
}
close (fd);
fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC);
if (fd == -1)
{
puts ("eventfd2(EFD_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the new signalfd4 syscall. It extends the old signalfd
syscall by one parameter which is meant to hold a flag value. In this
patch the only flag support is SFD_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.
A new name SFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_signalfd4
# ifdef __x86_64__
# define __NR_signalfd4 289
# elif defined __i386__
# define __NR_signalfd4 327
# else
# error "need __NR_signalfd4"
# endif
#endif
#define SFD_CLOEXEC O_CLOEXEC
int
main (void)
{
sigset_t ss;
sigemptyset (&ss);
sigaddset (&ss, SIGUSR1);
int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
if (fd == -1)
{
puts ("signalfd4(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("signalfd4(0) set close-on-exec flag");
return 1;
}
close (fd);
fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
if (fd == -1)
{
puts ("signalfd4(SFD_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is by far the most complex in the series. It adds a new syscall
paccept. This syscall differs from accept in that it adds (at the userlevel)
two additional parameters:
- a signal mask
- a flags value
The flags parameter can be used to set flag like SOCK_CLOEXEC. This is
imlpemented here as well. Some people argued that this is a property which
should be inherited from the file desriptor for the server but this is against
POSIX. Additionally, we really want the signal mask parameter as well
(similar to pselect, ppoll, etc). So an interface change in inevitable.
The flag value is the same as for socket and socketpair. I think diverging
here will only create confusion. Similar to the filesystem interfaces where
the use of the O_* constants differs, it is acceptable here.
The signal mask is handled as for pselect etc. The mask is temporarily
installed for the thread and removed before the call returns. I modeled the
code after pselect. If there is a problem it's likely also in pselect.
For architectures which use socketcall I maintained this interface instead of
adding a system call. The symmetry shouldn't be broken.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#ifndef __NR_paccept
# ifdef __x86_64__
# define __NR_paccept 288
# elif defined __i386__
# define SYS_PACCEPT 18
# define USE_SOCKETCALL 1
# else
# error "need __NR_paccept"
# endif
#endif
#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
({ long args[6] = { \
(long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif
#define PORT 57392
#define SOCK_CLOEXEC O_CLOEXEC
static pthread_barrier_t b;
static void *
tf (void *arg)
{
pthread_barrier_wait (&b);
int s = socket (AF_INET, SOCK_STREAM, 0);
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);
s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);
pthread_barrier_wait (&b);
sleep (2);
pthread_kill ((pthread_t) arg, SIGUSR1);
return NULL;
}
static void
handler (int s)
{
}
int
main (void)
{
pthread_barrier_init (&b, NULL, 2);
struct sockaddr_in sin;
pthread_t th;
if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0)
{
puts ("pthread_create failed");
return 1;
}
int s = socket (AF_INET, SOCK_STREAM, 0);
int reuse = 1;
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);
pthread_barrier_wait (&b);
int s2 = paccept (s, NULL, 0, NULL, 0);
if (s2 < 0)
{
puts ("paccept(0) failed");
return 1;
}
int coe = fcntl (s2, F_GETFD);
if (coe & FD_CLOEXEC)
{
puts ("paccept(0) set close-on-exec-flag");
return 1;
}
close (s2);
pthread_barrier_wait (&b);
s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC);
if (s2 < 0)
{
puts ("paccept(SOCK_CLOEXEC) failed");
return 1;
}
coe = fcntl (s2, F_GETFD);
if ((coe & FD_CLOEXEC) == 0)
{
puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (s2);
pthread_barrier_wait (&b);
struct sigaction sa;
sa.sa_handler = handler;
sa.sa_flags = 0;
sigemptyset (&sa.sa_mask);
sigaction (SIGUSR1, &sa, NULL);
sigset_t ss;
pthread_sigmask (SIG_SETMASK, NULL, &ss);
sigaddset (&ss, SIGUSR1);
pthread_sigmask (SIG_SETMASK, &ss, NULL);
sigdelset (&ss, SIGUSR1);
alarm (4);
pthread_barrier_wait (&b);
errno = 0 ;
s2 = paccept (s, NULL, 0, &ss, 0);
if (s2 != -1 || errno != EINTR)
{
puts ("paccept did not fail with EINTR");
return 1;
}
close (s);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: make it compile]
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Roland McGrath <roland@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_type returns an int indicating success or failure, but up to now
setup_irq ignores that.
In my case this resulted in a machine hang:
gpio-keys requested IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING, but
arm/ns9xxx can only trigger on one direction so set_type didn't touch
the configuration which happens do default to a level sensitiveness and
returned -EINVAL. setup_irq ignored that and unmasked the irq. This
resulted in an endless triggering of the gpio-key interrupt service
routine which effectively killed the machine.
With this patch applied setup_irq propagates the error to the caller.
Note that before in the case
chip && !chip->set_type && !chip->name
a NULL pointer was feed to printk. This is fixed, too.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix try_to_freeze_tasks()'s use of do_div() on an s64 by making
elapsed_csecs64 a u64 instead and dividing that.
Possibly this should be guarded lest the interval calculation turn up
negative, but the possible negativity of the result of the division is
cast away anyway.
This was introduced by patch 438e2ce68d.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
schedule sysrq poweroff on boot cpu.
sysrq poweroff needs to disable nonboot cpus, and we need to run this on boot
cpu to avoid any recursion. http://bugzilla.kernel.org/show_bug.cgi?id=10897
[kosaki.motohiro@jp.fujitsu.com: build fix]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Rus <harbour@sfinx.od.ua>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This interface allows adding a job on a specific cpu.
Although a work struct on a cpu will be scheduled to other cpu if the cpu
dies, there is a recursion if a work task tries to offline the cpu it's
running on. we need to schedule the task to a specific cpu in this case.
http://bugzilla.kernel.org/show_bug.cgi?id=10897
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Rus <harbour@sfinx.od.ua>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch simplifies the memory bitmap manipulations.
- remove the member size in struct bm_block
It is not necessary for struct bm_block to have the number of bit chunks that
can be calculated by using end_pfn and start_pfn.
- use find_next_bit() for memory_bm_next_pfn
No need to invent the bitmap library only for the memory bitmap.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boot-time test for system suspend states (STR or standby). The generic
RTC framework triggers wakeup alarms, which are used to exit those states.
- Measures some aspects of suspend time ... this uses "jiffies" until
someone converts it to use a timebase that works properly even while
timer IRQs are disabled.
- Triggered by a command line parameter. By default nothing even
vaguely troublesome will happen, but "test_suspend=mem" will give
you a brief STR test during system boot. (Or you may need to use
"test_suspend=standby" instead, if your hardware needs that.)
This isn't without problems. It fires early enough during boot that for
example both PCMCIA and MMC stacks have misbehaved. The workaround in
those cases was to boot without such media cards inserted.
[matthltc@us.ibm.com: fix compile failure in boot time suspend selftest]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tell the user about the no_console_suspend option, so that we don't have to
tell each bug reporter personally.
[akpm@linux-foundation.org: clarify the text a little]
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To date, we've tried hard to confine filesystem support for capabilities
to the security modules. This has left a lot of the code in
kernel/capability.c in a state where it looks like it supports something
that filesystem support for capabilities actually suppresses when the LSM
security/commmoncap.c code runs. What is left is a lot of code that uses
sub-optimal locking in the main kernel
With this change we refactor the main kernel code and make it explicit
which locks are needed and that the only remaining kernel races in this
area are associated with non-filesystem capability code.
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add basic support for more than one hstate in hugetlbfs. This is the key
to supporting multiple hugetlbfs page sizes at once.
- Rather than a single hstate, we now have an array, with an iterator
- default_hstate continues to be the struct hstate which we use by default
- Add functions for architectures to register new hstates
[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in
a similar manner to the reservations taken for MAP_SHARED mappings. The
reserve count is accounted both globally and on a per-VMA basis for
private mappings. This guarantees that a process that successfully calls
mmap() will successfully fault all pages in the future unless fork() is
called.
The characteristics of private mappings of hugetlbfs files behaviour after
this patch are;
1. The process calling mmap() is guaranteed to succeed all future faults until
it forks().
2. On fork(), the parent may die due to SIGKILL on writes to the private
mapping if enough pages are not available for the COW. For reasonably
reliable behaviour in the face of a small huge page pool, children of
hugepage-aware processes should not reference the mappings; such as
might occur when fork()ing to exec().
3. On fork(), the child VMAs inherit no reserves. Reads on pages already
faulted by the parent will succeed. Successful writes will depend on enough
huge pages being free in the pool.
4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper
and at fault time otherwise.
Before this patch, all reads or writes in the child potentially needs page
allocations that can later lead to the death of the parent. This applies
to reads and writes of uninstantiated pages as well as COW. After the
patch it is only a write to an instantiated page that causes problems.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds proper extern declarations for five variables in
include/linux/vmstat.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: hrtick_enabled() should use cpu_active()
sched, x86: clean up hrtick implementation
sched: fix build error, provide partition_sched_domains() unconditionally
sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
cpu hotplug: Make cpu_active_map synchronization dependency clear
cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
sched: rework of "prioritize non-migratable tasks over migratable ones"
sched: reduce stack size in isolated_cpu_setup()
Revert parts of "ftrace: do not trace scheduler functions"
Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
* 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softlockup: fix invalid proc_handler for softlockup_panic
softlockup: fix watchdog task wakeup frequency
softlockup: fix watchdog task wakeup frequency
softlockup: show irqtrace
softlockup: print a module list on being stuck
softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
softlockup: fix softlockup_thresh fix
softlockup: fix softlockup_thresh unaligned access and disable detection at runtime
softlockup: allow panic on lockup
This adds a fast path for 64-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
This path does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Signed-off-by: Roland McGrath <roland@redhat.com>
Since 15a647eba9 set_irq_wake returned -ENXIO
if another device had it already enabled. Zero is the right value to
return in this case. Moreover the change to desc->status was not reverted
if desc->chip->set_wake returned an error.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton reported this s390 allmodconfig build failure:
kernel/built-in.o: In function `hrtick_start_fair':
sched.c:(.text+0x69c6): undefined reference to `__smp_call_function_single'
the reason is that s390 is not a generic-ipi SMP platform yet, while
the hrtick code relies on it. Fix the dependency.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
remove CONFIG_KMOD from core kernel code
remove CONFIG_KMOD from lib
remove CONFIG_KMOD from sparc64
rework try_then_request_module to do less in non-modular kernels
remove mention of CONFIG_KMOD from documentation
make CONFIG_KMOD invisible
modules: Take a shortcut for checking if an address is in a module
module: turn longs into ints for module sizes
Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module: reorder struct module to save space on 64 bit builds
module: generic each_symbol iterator function
module: don't use stop_machine for waiting rmmod
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (79 commits)
arm: bus_id -> dev_name() and dev_set_name() conversions
sparc64: fix up bus_id changes in sparc core code
3c59x: handle pci_name() being const
MTD: handle pci_name() being const
HP iLO driver
sysdev: Convert the x86 mce tolerant sysdev attribute to generic attribute
sysdev: Add utility functions for simple int/ulong variable sysdev attributes
sysdev: Pass the attribute to the low level sysdev show/store function
driver core: Suppress sysfs warnings for device_rename().
kobject: Transmit return value of call_usermodehelper() to caller
sysfs-rules.txt: reword API stability statement
debugfs: Implement debugfs_remove_recursive()
HOWTO: change email addresses of James in HOWTO
always enable FW_LOADER unless EMBEDDED=y
uio-howto.tmpl: use unique output names
uio-howto.tmpl: use standard copyright/legal markings
sysfs: don't call notify_change
sysdev: fix debugging statements in registration code.
kobject: should use kobject_put() in kset-example
kobject: reorder kobject to save space on 64 bit builds
...
Always compile request_module when the kernel allows modules.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>