Commit Graph

9227 Commits

Author SHA1 Message Date
Bartlomiej Zolnierkiewicz
8ac2b42a45 ide: add IDE_HFLAG_CLEAR_SIMPLEX host flag
* Rename 'simplex_stat' variable to 'dma_stat' in ide_get_or_set_dma_base().

* Factor out code for forcing host out of "simplex" mode from
  ide_get_or_set_dma_base() to ide_pci_clear_simplex() helper.

* Add IDE_HFLAG_CLEAR_SIMPLEX host flag and set it in alim15x3 (for M5229),
  amd74xx (for AMD 7409), cmd64x (for CMD643), generic (for Netcell) and
  serverworks (for CSB5) host drivers.

* Make ide_get_or_set_dma_base() test for IDE_HFLAG_CLEAR_SIMPLEX host flag
  instead of checking dev->device (BTW the code was buggy because it didn't
  check for dev->vendor, luckily none of these PCI Device IDs was used by
  some other vendor for PCI IDE controller).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-01 23:09:30 +01:00
Sergei Shtylyov
ecf3279639 ide: ide_setup_dma() assumes 8 ports
According to http://marc.info/?l=linux-ide&m=114346138611631, the drivers must
always register 8 DMA ports with ide_setup_dma(), so its last argument is not
needed. While at it, kill some useless parens in that function...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-01 23:09:30 +01:00
Bartlomiej Zolnierkiewicz
7b9f25b539 ide: add ide_dump_identify() debug helper
* Add ide_dump_identify() debug helper for dumping raw identify data in
  the hdparm friendly format (== the identify data can be extracted from
  dmesg output and passed to hdparm --Istdin).

* Dump identify data in ide-probe.c::do_identify() if DEBUG is enabled.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-01 23:09:28 +01:00
Bartlomiej Zolnierkiewicz
a1bb9457f0 ide-cd: move lba_to_msf() and msf_to_lba() to <linux/cdrom.h>
* Move lba_to_msf() and msf_to_lba() to <linux/cdrom.h>
  (use 'u8' type instead of 'byte' while at it).

* Remove msf_to_lba() copy from drivers/cdrom/cdrom.c.

Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-01 23:09:24 +01:00
Adrian Bunk
da6f4c7f6f ide: make wait_drive_not_busy() static again
After commit 7267c33774 
wait_drive_not_busy() can become static again.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-01 23:09:16 +01:00
Adrian Bunk
2eae6ebbf9 ide: small ide-scan-pci.c cleanup
- ide_scan_pcibus() can become static
- instead of ide_scan_pci() we can use ide_scan_pcibus() directly
  in module_init()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-01 23:09:16 +01:00
Jeff Layton
0113ab3464 SUNRPC: spin svc_rqst initialization to its own function
Move the initialzation in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.

Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:15 -05:00
Tom Tucker
d21b05f101 rdma: SVCRMDA Header File
This file defines the data types used by the SVCRDMA transport module.
The principle data structure is the transport specific extension to
the svcxprt structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
9571af18fa svc: Add svc_xprt_names service to replace svc_sock_names
Create a transport independent version of the svc_sock_names function.

The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
7fcb98d58c svc: Add svc API that queries for a transport instance
Add a new svc function that allows a service to query whether a
transport instance has already been created. This is used in lockd
to determine whether or not a transport needs to be created when
a lockd instance is brought up.

Specifying 0 for the address family or port is effectively a wild-card,
and will result in matching the first transport in the service's list
that has a matching class name.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
dc9a16e49d svc: Add /proc/sys/sunrpc/transport files
Add a file that when read lists the set of registered svc
transports.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
260c1d1298 svc: Add transport hdr size for defer/revisit
Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
0f0257eaa5 svc: Move the xprt independent code to the svc_xprt.c file
This functionally trivial patch moves all of the transport independent
functions from the svcsock.c file to the transport independent svc_xprt.c
file.

In addition the following formatting changes were made:
- White space cleanup
- Function signatures on single line
- The inline directive was removed
- Lines over 80 columns were reformatted
- The term 'socket' was changed to 'transport' in comments
- The SMP comment was moved and updated.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
57b1d3baba svc: Removing remaining references to rq_sock in rqstp
This functionally empty patch removes rq_sock and unamed union
from rqstp structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
9dbc240f19 svc: Move the sockaddr information to svc_xprt
This patch moves the transport sockaddr to the svc_xprt
structure.  Convenience functions are added to set and
get the local and remote addresses of a transport from
the transport provider as well as determine the length
of a sockaddr.

A transport is responsible for setting the xpt_local
and xpt_remote addresses in the svc_xprt structure as
part of transport creation and xpo_accept processing. This
cannot be done in a generic way and in fact varies
between TCP, UDP and RDMA. A set of xpo_ functions
(e.g. getlocalname, getremotename) could have been
added but this would have resulted in additional
caching and copying of the addresses around.  Note that
the xpt_local address should also be set on listening
endpoints; for TCP/RDMA this is done as part of
endpoint creation.

For connected transports like TCP and RDMA, the addresses
never change and can be set once and copied into the
rqstp structure for each request. For UDP, however, the
local and remote addresses may change for each request. In
this case, the address information is obtained from the
UDP recvmsg info and copied into the rqstp structure from
there.

A svc_xprt_local_port function was also added that returns
the local port given a transport. This is used by
svc_create_xprt when returning the port associated with
a newly created transport, and later when creating a
generic find transport service to check if a service is
already listening on a given port.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
8c7b0172a1 svc: Make deferral processing xprt independent
This patch moves the transport independent sk_deferred list to the svc_xprt
structure and updates the svc_deferred_req structure to keep pointers to
svc_xprt's directly. The deferral processing code is also moved out of the
transport dependent recvfrom functions and into the generic svc_recv path.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
def13d7401 svc: Move the authinfo cache to svc_xprt.
Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
transports to share this logic. A flag bit is used to determine if
auth information is to be cached or not. Previously, this code looked
at the transport protocol.

I've also changed the spin_lock/unlock logic so that a lock is not taken for
transports that are not caching auth info.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
4bc6c497b2 svc: Remove sk_lastrecv
With the implementation of the new mark and sweep algorithm for shutting
down old connections, the sk_lastrecv field is no longer needed.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
a6046f71f2 svc: Change svc_sock_received to svc_xprt_received and export it
All fields touched by svc_sock_received are now transport independent.
Change it to use svc_xprt directly. This function is called from
transport dependent code, so export it.

Update the comment to clearly state the rules for calling this function.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
a50fea26b9 svc: Make svc_send transport neutral
Move the sk_mutex field to the transport independent svc_xprt structure.
Now all the fields that svc_send touches are transport neutral. Change the
svc_send function to use the transport independent svc_xprt directly instead
of the transport dependent svc_sock structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
7a90e8cc21 svc: Move sk_reserved to svc_xprt
This functionally trivial patch moves the sk_reserved field to the
transport independent svc_xprt structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
7a18208383 svc: Make close transport independent
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.

The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.

This code races with module removal and transport addition.

Thanks to Simon Holm Thøgersen for a compile fix.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
2008-02-01 16:42:11 -05:00
Tom Tucker
bb5cf160b2 svc: Move sk_server and sk_pool to svc_xprt
This is another incremental change that moves transport independent
fields from svc_sock to the svc_xprt structure. The changes
should be functionally null.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
02fc6c3618 svc: Move sk_flags to the svc_xprt structure
This functionally trivial change moves the transport independent sk_flags
field to the transport independent svc_xprt structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
e1b3157f97 svc: Change sk_inuse to a kref
Change the atomic_t reference count to a kref and move it to the
transport indepenent svc_xprt structure. Change the reference count
wrapper names to be generic.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
d7c9f1ed97 svc: Change services to use new svc_create_xprt service
Modify the various kernel RPC svcs to use the svc_create_xprt service.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Tom Tucker
b700cbb11f svc: Add a generic transport svc_create_xprt function
The svc_create_xprt function is a transport independent version
of the svc_makesock function.

Since transport instance creation contains transport dependent and
independent components, add an xpo_create transport function. The
transport implementation of this function allocates the memory for the
endpoint, implements the transport dependent initialization logic, and
calls svc_xprt_init to initialize the transport independent field (svc_xprt)
in it's data structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Tom Tucker
38a417cc99 svc: Add xpo_accept transport function
Previously, the accept logic looked into the socket state to determine
whether to call accept or recv when data-ready was indicated on an endpoint.
Since some transports don't use sockets, this logic now uses a flag
bit (SK_LISTENER) to identify listening endpoints. A transport function
(xpo_accept) allows each transport to define its own accept processing.
A transport's initialization logic is reponsible for setting the
SK_LISTENER bit. I didn't see any way to do this in transport independent
logic since the passive side of a UDP connection doesn't listen and
always recv's.

In the svc_recv function, if the SK_LISTENER bit is set, the transport
xpo_accept function is called to handle accept processing.

Note that all functions are defined even if they don't make sense
for a given transport. For example, accept doesn't mean anything for
UDP. The function is defined anyway and bug checks if called. The
UDP transport should never set the SK_LISTENER bit.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
323bee32e9 svc: Add a transport function that checks for write space
In order to avoid blocking a service thread, the receive side checks
to see if there is sufficient write space to reply to the request.
Each transport has a different mechanism for determining if there is
enough write space to reply.

The code that checked for write space was coupled with code that
checked for CLOSE and CONN. These checks have been broken out into
separate statements to make the code easier to read.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
e831fe65b1 svc: Add xpo_prep_reply_hdr
Some transports add fields to the RPC header for replies, e.g. the TCP
record length. This function is called when preparing the reply header
to allow each transport to add whatever fields it requires.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
755cceaba7 svc: Add per-transport delete functions
Add transport specific xpo_detach and xpo_free functions. The xpo_detach
function causes the transport to stop delivering data-ready events
and enqueing the transport for I/O.

The xpo_free function frees all resources associated with the particular
transport instance.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
5148bf4ebc svc: Add transport specific xpo_release function
The svc_sock_release function releases pages allocated to a thread. For
UDP this frees the receive skb. For RDMA it will post a receive WR
and bump the client credit count.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
5d137990f5 svc: Move sk_sendto and sk_recvfrom to svc_xprt_class
The sk_sendto and sk_recvfrom are function pointers that allow svc_sock
to be used for both UDP and TCP. Move these function pointers to the
svc_xprt_ops structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
490231558e svc: Add a max payload value to the transport
The svc_max_payload function currently looks at the socket type
to determine the max payload. Add a max payload value to svc_xprt_class
so it can be returned directly.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
9f29868b49 svc: Change the svc_sock in the rqstp structure to a transport
The rqstp structure contains a pointer to the transport for the
RPC request. This functionaly trivial patch adds an unamed union
with pointers to both svc_sock and svc_xprt. Ultimately the
union will be removed and only the rq_xprt field will remain. This
allows incrementally extracting transport independent interfaces without
one gigundo patch.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
360d873864 svc: Make svc_sock the tcp/udp transport
Make TCP and UDP svc_sock transports, and register them
with the svc transport core.

A transport type (svc_sock) has an svc_xprt as its first member,
and calls svc_xprt_init to initialize this field.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
Tom Tucker
1d8206b97a svc: Add an svc transport class
The transport class (svc_xprt_class) represents a type of transport, e.g.
udp, tcp, rdma.  A transport class has a unique name and a set of transport
operations kept in the svc_xprt_ops structure.

A transport class can be dynamically registered and unregisterd. The
svc_xprt_class represents the module that implements the transport
type and keeps reference counts on the module to avoid unloading while
there are active users.

The endpoint (svc_xprt) is a generic, transport independent endpoint that can
be used to send and receive data for an RPC service. It inherits it's
operations from the transport class.

A transport driver module registers and unregisters itself with svc sunrpc
by calling svc_reg_xprt_class, and svc_unreg_xprt_class respectively.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
Frank Filz
406a7ea97d nfsd: Allow AIX client to read dir containing mountpoints
This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.

I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i

The AIX client mounts / -o sec=krb5:krb5i onto /mnt

If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network traceIwe see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.

Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.

In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.

The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to it's parent.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields
2e8138a274 nfsd: move nfsd/auth.h into fs/nfsd
This header is used only in a few places in fs/nfsd, so there seems to
be little point to having it in include/.  (Thanks to Robert Day for
pointing this out.)

Cc: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields
dbf847ecb6 knfsd: allow cache_register to return error on failure
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.

Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.

Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields
df95a9d4fb knfsd: cache unregistration needn't return error
There's really nothing much the caller can do if cache unregistration
fails.  And indeed, all any caller does in this case is print an error
and continue.  So just return void and move the printk's inside
cache_unregister.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
J. Bruce Fields
d5c3428b2c nfsd: fail module init on reply cache init failure
If the reply cache initialization fails due to a kmalloc failure,
currently we try to soldier on with a reduced (or nonexistant) reply
cache.

Better to just fail immediately: the failure is then much easier to
understand and debug, and it could save us complexity in some later
code.  (But actually, it doesn't help currently because the cache is
also turned off in some odd failure cases; we should probably find a
better way to handle those failure cases some day.)

Fix some minor style problems while we're at it, and rename
nfsd_cache_init() to remove the need for a comment describing it.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
Chuck Lever
48b4ba3fdd NFSD: Path name length signage in nfsd request argument structures
Clean up: For consistency, store the length of path name strings in nfsd
argument structures as unsigned integers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Chuck Lever
5a022fc870 NFSD: Adjust filename length argument of nfsd_lookup
Clean up: adjust the sign of the length argument of nfsd_lookup and
nfsd_lookup_dentry, for consistency with recent changes.  NFSD version
4 callers already pass an unsigned file name length.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Chuck Lever
29d5e55538 NFSD: File name length signage in nfsd request argument structures
Clean up: For consistency, store the length of file name strings in nfsd
argument structures as unsigned integers.  This matches the XDR routines
and client argument structures for the same operation types.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:02 -05:00
Chuck Lever
48df020aa1 NLM: Fix sign of length of NLM variable length strings
According to The Open Group's NLM specification, NLM callers are variable
length strings.  XDR variable length strings use an unsigned 32 bit length.
And internally, negative string lengths are not meaningful for the Linux
NLM implementation.

Clean up: Make nlm_lock.len and nlm_reboot.len unsigned integers.  This
makes the sign of NLM string lengths consistent with the sign of xdr_netobj
lengths.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:02 -05:00
Chuck Lever
e5cff482c7 SUNRPC: Use unsigned string lengths in xdr_decode_string_inplace
XDR strings, opaques, and net objects should all use unsigned lengths.
To wit, RFC 4506 says:

4.2.  Unsigned Integer

   An XDR unsigned integer is a 32-bit datum that encodes a non-negative
   integer in the range [0,4294967295].

 ...

4.11.  String

   The standard defines a string of n (numbered 0 through n-1) ASCII
   bytes to be the number n encoded as an unsigned integer (as described
   above), and followed by the n bytes of the string.

After this patch, xdr_decode_string_inplace now matches the other XDR
string and array helpers that take a string length argument.  See:

xdr_encode_opaque_fixed, xdr_encode_opaque, xdr_encode_array

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:02 -05:00
Linus Torvalds
dd5f5fed6c Merge branch 'audit.b46' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b46' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  [AUDIT] Add uid, gid fields to ANOM_PROMISCUOUS message
  [AUDIT] ratelimit printk messages audit
  [patch 2/2] audit: complement va_copy with va_end()
  [patch 1/2] kernel/audit.c: warning fix
  [AUDIT] create context if auditing was ever enabled
  [AUDIT] clean up audit_receive_msg()
  [AUDIT] make audit=0 really stop audit messages
  [AUDIT] break large execve argument logging into smaller messages
  [AUDIT] include audit type in audit message when using printk
  [AUDIT] do not panic on exclude messages in audit_log_pid_context()
  [AUDIT] Add End of Event record
  [AUDIT] add session id to audit messages
  [AUDIT] collect uid, loginuid, and comm in OBJ_PID records
  [AUDIT] return EINTR not ERESTART*
  [PATCH] get rid of loginuid races
  [PATCH] switch audit_get_loginuid() to task_struct *
2008-02-02 08:37:03 +11:00
Tomas Winkler
e53cfe0ead iwlwifi: Fix MIMO PS mode
This patch setups correctly MIMO PS mode flags

Signed-off-by: Guy Cohen <guy.cohen@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-01 16:13:16 -05:00
Eric Paris
de6bbd1d30 [AUDIT] break large execve argument logging into smaller messages
execve arguments can be quite large.  There is no limit on the number of
arguments and a 4G limit on the size of an argument.

this patch prints those aruguments in bite sized pieces.  a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k

single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=

Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01 14:23:55 -05:00
Eric Paris
c0641f28dc [AUDIT] Add End of Event record
This patch adds an end of event record type. It will be sent by the kernel as
the last record when a multi-record event is triggered. This will aid realtime
analysis programs since they will now reliably know they have the last record
to complete an event. The audit daemon filters this and will not write it to
disk.

Signed-off-by: Steve Grubb <sgrubb redhat com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01 14:07:19 -05:00
Eric Paris
4746ec5b01 [AUDIT] add session id to audit messages
In order to correlate audit records to an individual login add a session
id.  This is incremented every time a user logs in and is included in
almost all messages which currently output the auid.  The field is
labeled ses=  or oses=

Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01 14:06:51 -05:00
Al Viro
bfef93a5d1 [PATCH] get rid of loginuid races
Keeping loginuid in audit_context is racy and results in messier
code.  Taken to task_struct, out of the way of ->audit_context
changes.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-02-01 14:05:28 -05:00
Al Viro
0c11b9428f [PATCH] switch audit_get_loginuid() to task_struct *
all callers pass something->audit_context

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-02-01 14:04:59 -05:00
Thomas Gleixner
cd689985cf futex: Add bitset conditional wait/wakeup functionality
To allow the implementation of optimized rw-locks in user space, glibc
needs a possibility to select waiters for wakeup depending on a bitset
mask.

This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
expecting an absolute timeout value instead of the relative one, which
is used for the FUTEX_WAIT OP.

FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.

FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiters futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets is matching), then the waiter is not woken up. If
the result is not zero (i.e. one of the set bits in the bitsets is
matching), then the waiter is woken.

The bitset provided by the caller must be non zero. In case the
provided bitset is zero the kernel returns EINVAL.

Internaly the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.

Signed-off-by: Thomas Gleixner <tgxl@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:45:14 +01:00
Thomas Gleixner
5df7fa1c62 tick-sched: add more debug information
To allow better diagnosis of tick-sched related, especially NOHZ
related problems, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.

Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:45:14 +01:00
Thomas Gleixner
1001d0a9ee timekeeping: update xtime_cache when time(zone) changes
xtime_cache needs to be updated whenever xtime and or wall_to_monotic
are changed. Otherwise users of xtime_cache might see a stale (and in
the case of timezone changes utterly wrong) value until the next
update happens.

Fixup the obvious places, which miss this update.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:45:13 +01:00
Linus Torvalds
24e1c13c93 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: kill swap_io_context()
  as-iosched: fix inconsistent ioc->lock context
  ide-cd: fix leftover data BUG
  block: make elevator lib checkpatch compliant
  cfq-iosched: make checkpatch compliant
  block: make core bits checkpatch compliant
  block: new end request handling interface should take unsigned byte counts
  unexport add_disk_randomness
  block/sunvdc.c:print_version() must be __devinit
  splice: always updated atime in direct splice
2008-02-01 21:48:45 +11:00
Jens Axboe
3bc217ffe6 block: kill swap_io_context()
It blindly copies everything in the io_context, including the lock.
That doesn't work so well for either lock ordering or lockdep.

There seems zero point in swapping io contexts on a request to request
merge, so the best point of action is to just remove it.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 11:34:49 +01:00
Jens Axboe
22b132102f block: new end request handling interface should take unsigned byte counts
No point in passing signed integers as the byte count, they can never
be negative.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 09:26:33 +01:00
Patrick McHardy
e5dfb81518 [NET_SCHED]: Add flow classifier
Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...

Some examples:

- Classic SFQ hash:

  tc filter add ... flow hash \
  	keys src,dst,proto,proto-src,proto-dst divisor 1024

- Classic SFQ hash, but using information from conntrack to work properly in
  combination with NAT:

  tc filter add ... flow hash \
  	keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024

- Map destination IPs of 192.168.0.0/24 to classids 1-257:

  tc filter add ... flow map \
  	key dst addend -192.168.0.0 divisor 256

- alternatively:

  tc filter add ... flow map \
  	key dst and 0xff

- similar, but reverse ordered:

  tc filter add ... flow map \
  	key dst and 0xff xor 0xff

Perturbation is currently not supported because we can't reliable kill the
timer on destruction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:36 -08:00
Patrick McHardy
94de78d195 [NET_SCHED]: sch_sfq: make internal queues visible as classes
Add support for dumping statistics and make internal queues visible as
classes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:35 -08:00
Adrian Bunk
0027ba8434 [IPV4]: Make struct ipv4_devconf static.
struct ipv4_devconf can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:31 -08:00
Masahide NAKAMURA
9472c9ef64 [XFRM]: Fix statistics.
o Outbound sequence number overflow error status
  is counted as XfrmOutStateSeqError.
o Additionaly, it changes inbound sequence number replay
  error name from XfrmInSeqOutOfWindow to XfrmInStateSeqError
  to apply name scheme above.
o Inbound IPv4 UDP encapsuling type mismatch error is wrongly
  mapped to XfrmInStateInvalid then this patch fiex the error
  to XfrmInStateMismatch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:30 -08:00
Eric Dumazet
29e75252da [IPV4] route cache: Introduce rt_genid for smooth cache invalidation
Current ip route cache implementation is not suited to large caches.

We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.

A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.

Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.

This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
2008-01-31 19:28:27 -08:00
Chris Leech
e83a2ea850 [VLAN]: set_rx_mode support for unicast address list
Reuse the existing logic for multicast list synchronization for the
unicast address list. The core of dev_mc_sync/unsync are split out as
__dev_addr_sync/unsync and moved from dev_mcast.c to dev.c.  These are
then used to implement dev_unicast_sync/unsync as well.

I'm working on cleaning up Intel's FCoE stack, which generates new MAC
addresses from the fibre channel device id assigned by the fabric as
per the current draft specification in T11.  When using such a
protocol in a VLAN environment it would be nice to not always be
forced into promiscuous mode, assuming the underlying Ethernet driver
supports multiple unicast addresses as well.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-01-31 19:28:24 -08:00
Stephen Hemminger
71d67e666e [IPV4] fib_trie: rescan if key is lost during dump
Normally during a dump the key of the last dumped entry is used for
continuation, but since lock is dropped it might be lost. In that case
fallback to the old counter based N^2 behaviour.  This means the dump
will end up skipping some routes which matches what FIB_HASH does.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:23 -08:00
Pavel Emelyanov
d86e0dac2c [NETNS]: Tcp-v6 sockets per-net lookup.
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:20 -08:00
Pavel Emelyanov
535174efbe [IPV6]: Introduce the INET6_TW_MATCH macro.
We have INET_MATCH, INET_TW_MATCH and INET6_MATCH to test sockets and
twbuckets for matching, but ipv6 twbuckets are tested manually.

Here's the INET6_TW_MATCH to help with it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:17 -08:00
Jan Engelhardt
9ddd0ed050 [NETFILTER]: nf_{conntrack,nat}_pptp: annotate PPtP helper with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:09 -08:00
Jan Engelhardt
13f7d63c29 [NETFILTER]: nf_{conntrack,nat}_sip: annotate SIP helper with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:08 -08:00
Alexey Dobriyan
3cb609d57c [NETFILTER]: x_tables: create per-netns /proc/net/*_tables_*
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:06 -08:00
Jan Engelhardt
09e410def6 [NETFILTER]: xt_hashlimit match, revision 1
Introduces the xt_hashlimit match revision 1. It adds support for
kernel-level inversion and grouping source and/or destination IP
addresses, allowing to limit on a per-subnet basis. While this would
technically obsolete xt_limit, xt_hashlimit is a more expensive due
to the hashbucketing.

Kernel-level inversion: Previously you had to do user-level inversion:

	iptables -N foo
	iptables -A foo -m hashlimit --hashlimit(-upto) 5/s -j RETURN
	iptables -A foo -j DROP
	iptables -A INPUT -j foo

now it is simpler:

	iptables -A INPUT -m hashlimit --hashlimit-over 5/s -j DROP

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:04 -08:00
Patrick McHardy
b0a6363c24 [NETFILTER]: {ip,arp,ip6}_tables: fix sparse warnings in compat code
CHECK   net/ipv4/netfilter/ip_tables.c
net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1453:8:    expected int *size
net/ipv4/netfilter/ip_tables.c:1453:8:    got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1458:44:    expected int *size
net/ipv4/netfilter/ip_tables.c:1458:44:    got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1603:2:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1603:2:    got int *<noident>
net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1627:8:    expected int *size
net/ipv4/netfilter/ip_tables.c:1627:8:    got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1634:40:    expected int *size
net/ipv4/netfilter/ip_tables.c:1634:40:    got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness)
net/ipv4/netfilter/ip_tables.c:1653:8:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1653:8:    got int *<noident>
net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1666:2:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1666:2:    got int *<noident>
  CHECK   net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1285:40:    expected int *size
net/ipv4/netfilter/arp_tables.c:1285:40:    got unsigned int *size
net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1543:44:    expected int *size
net/ipv4/netfilter/arp_tables.c:1543:44:    got unsigned int [usertype] *size
  CHECK   net/ipv6/netfilter/ip6_tables.c
net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1481:8:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1481:8:    got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1486:44:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1486:44:    got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1631:2:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1631:2:    got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1655:8:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1655:8:    got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1662:40:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1662:40:    got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1680:8:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1680:8:    got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1693:2:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1693:2:    got int *<noident>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:49 -08:00
Jan Engelhardt
edc26f7aaa [NETFILTER]: xt_owner: allow matching UID/GID ranges
Add support for ranges to the new revision. This doesn't affect
compatibility since the new revision was not released yet.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:43 -08:00
Alexey Dobriyan
79df341ab6 [NETFILTER]: arp_tables: netns preparation
* Propagate netns from userspace.
* arpt_register_table() registers table in supplied netns.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:40 -08:00
Alexey Dobriyan
336b517fdc [NETFILTER]: ip6_tables: netns preparation
* Propagate netns from userspace down to xt_find_table_lock()
* Register ip6 tables in netns (modules still use init_net)

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:39 -08:00
Alexey Dobriyan
44d34e721e [NETFILTER]: x_tables: return new table from {arp,ip,ip6}t_register_table()
Typical table module registers xt_table structure (i.e. packet_filter)
and link it to list during it. We can't use one template for it because
corresponding list_head will become corrupted. We also can't unregister
with template because it wasn't changed at all and thus doesn't know in
which list it is.

So, we duplicate template at the very first step of table registration.
Table modules will save it for use during unregistration time and actual
filtering.

Do it at once to not screw bisection.

P.S.: renaming i.e. packet_filter => __packet_filter is temporary until
      full netnsization of table modules is done.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:36 -08:00
Alexey Dobriyan
8d87005207 [NETFILTER]: x_tables: per-netns xt_tables
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.

Every user stubbed with init_net for a while.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:35 -08:00
Alexey Dobriyan
a98da11d88 [NETFILTER]: x_tables: change xt_table_register() return value convention
Switch from 0/-E to ptr/PTR_ERR convention.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:35 -08:00
Jan Engelhardt
b41649989c [NETFILTER]: xt_conntrack: add port and direction matching
Extend the xt_conntrack match revision 1 by port matching (all four
{orig,repl}{src,dst}) and by packet direction matching.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:31 -08:00
Jan Engelhardt
c82a5cb8b2 linux/types.h: Use __u64 for aligned_u64
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:30 -08:00
Patrick McHardy
2fd8e526f4 [NETFILTER]: bridge netfilter: remove nf_bridge_info read-only netoutdev member
Before the removal of the deferred output hooks, netoutdev was used in
case of VLANs on top of a bridge to store the VLAN device, so the
deferred hooks would see the correct output device. This isn't
necessary anymore since we're calling the output hooks for the correct
device directly in the IP stack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:29 -08:00
Jan Engelhardt
ecb6f85e11 [NETFILTER]: Use const in struct xt_match, xt_target, xt_table
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:28 -08:00
Herbert Xu
1a6509d991 [IPSEC]: Add support for combined mode algorithms
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.

Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD.  Each algorithms
is identified by its name and the ICV length.

For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms.  This is in line with how they are negotiated using IKE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:03 -08:00
Jussi Kivilinna
3692e94f15 Move usbnet.h and rndis_host.h to include/linux/usb
Move headers usbnet.h and rndis_host.h to include/linux/usb and fix includes
for drivers/net/usb modules. Headers are moved because rndis_wlan will be
outside drivers/net/usb in drivers/net/wireless and yet need these headers.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:00 -08:00
Iñaky Pérez-González
303d9bf6bb rfkill: add the WiMAX radio type
Teach rfkill about wimax radios.

Had to define a KEY_WIMAX as a 'key for disabling only wimax radios',
as other radio technologies have. This makes sense as hardware has
specific keys for disabling specific radios.

The RFKILL enabling part is, otherwise, a copy and paste of any other
radio technology.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:26:46 -08:00
Linus Torvalds
75659ca0c1 Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
  Remove commented-out code copied from NFS
  NFS: Switch from intr mount option to TASK_KILLABLE
  Add wait_for_completion_killable
  Add wait_event_killable
  Add schedule_timeout_killable
  Use mutex_lock_killable in vfs_readdir
  Add mutex_lock_killable
  Use lock_page_killable
  Add lock_page_killable
  Add fatal_signal_pending
  Add TASK_WAKEKILL
  exit: Use task_is_*
  signal: Use task_is_*
  sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
  ptrace: Use task_is_*
  power: Use task_is_*
  wait: Use TASK_NORMAL
  proc/base.c: Use task_is_*
  proc/array.c: Use TASK_REPORT
  perfmon: Use task_is_*
  ...

Fixed up conflicts in NFS/sunrpc manually..
2008-02-01 11:45:47 +11:00
Ingo Molnar
62152d0ea7 asm-generic/tlb.h: build fix
bring back the avr32, blackfin, sh, sparc architectures into working order,
by reverting the effects of this change that came in via the x86 tree:

   commit a5a19c63f4
   Author: Jeremy Fitzhardinge <jeremy@goop.org>
   Date:   Wed Jan 30 13:33:39 2008 +0100

       x86: demacro asm-x86/pgalloc_32.h

Sorry about that!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31 22:05:48 +01:00
Linus Torvalds
8af03e782c Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (454 commits)
  [POWERPC] Cell IOMMU fixed mapping support
  [POWERPC] Split out the ioid fetching/checking logic
  [POWERPC] Add support to cell_iommu_setup_page_tables() for multiple windows
  [POWERPC] Split out the IOMMU logic from cell_dma_dev_setup()
  [POWERPC] Split cell_iommu_setup_hardware() into two parts
  [POWERPC] Split out the logic that allocates struct iommus
  [POWERPC] Allocate the hash table under 1G on cell
  [POWERPC] Add set_dma_ops() to match get_dma_ops()
  [POWERPC] 83xx: Clean up / convert mpc83xx board DTS files to v1 format.
  [POWERPC] 85xx: Only invalidate TLB0 and TLB1
  [POWERPC] 83xx: Fix typo in mpc837x compatible entries
  [POWERPC] 85xx: convert sbc85* boards to use machine_device_initcall
  [POWERPC] 83xx: rework platform Kconfig
  [POWERPC] 85xx: rework platform Kconfig
  [POWERPC] 86xx: Remove unused IRQ defines
  [POWERPC] QE: Explicitly set address-cells and size cells for muram
  [POWERPC] Convert StorCenter DTS file to /dts-v1/ format.
  [POWERPC] 86xx: Convert all 86xx DTS files to /dts-v1/ format.
  [PPC] Remove 85xx from arch/ppc
  [PPC] Remove 83xx from arch/ppc
  ...
2008-01-31 13:37:27 +11:00
Linus Torvalds
6232665040 Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  alpha: fix x86.git merge build error
  ia64: on UP percpu variables are not small memory model
  x86: fix arch/x86/kernel/test_nx.c modular build bug
  s390: use generic percpu linux-2.6.git
  POWERPC: use generic per cpu
  ia64: use generic percpu
  SPARC64: use generic percpu
  percpu: change Kconfig to HAVE_SETUP_PER_CPU_AREA
  modules: fold percpu_modcopy into module.c
  x86: export copy_from_user_ll_nocache[_nozero]
  x86: fix duplicated TIF on 64-bit
2008-01-31 11:48:53 +11:00
Paul Mackerras
bd45ac0c5d Merge branch 'linux-2.6' 2008-01-31 11:25:51 +11:00
Linus Torvalds
44c3b59102 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  security: compile capabilities by default
  selinux: make selinux_set_mnt_opts() static
  SELinux: Add warning messages on network denial due to error
  SELinux: Add network ingress and egress control permission checks
  NetLabel: Add auditing to the static labeling mechanism
  NetLabel: Introduce static network labels for unlabeled connections
  SELinux: Allow NetLabel to directly cache SIDs
  SELinux: Enable dynamic enable/disable of the network access checks
  SELinux: Better integration between peer labeling subsystems
  SELinux: Add a new peer class and permissions to the Flask definitions
  SELinux: Add a capabilities bitmap to SELinux policy version 22
  SELinux: Add a network node caching mechanism similar to the sel_netif_*() functions
  SELinux: Only store the network interface's ifindex
  SELinux: Convert the netif code to use ifindex values
  NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
  NetLabel: Add secid token support to the NetLabel secattr struct
  NetLabel: Consolidate the LSM domain mapping/hashing locks
  NetLabel: Cleanup the LSM domain hash functions
  NetLabel: Remove unneeded RCU read locks
2008-01-31 09:32:24 +11:00
Linus Torvalds
3b470ac43f Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  PPC: Fix powerpc vio_find_name to not use devices_subsys
  Driver core: add bus_find_device_by_name function
  Module: check to see if we have a built in module with the same name
  x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
  Driver core: Fix up build when CONFIG_BLOCK=N
2008-01-31 09:31:37 +11:00
travis@sgi.com
05991bef10 ia64: use generic percpu
ia64 has a special processor specific mapping that can be used to locate the
offset for the current per cpu area.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 23:27:58 +01:00
Avi Kivity
2f52d58c92 KVM: Move apic timer migration away from critical section
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port.  Move migration to the regular
vcpu execution path, triggered by a new bitflag.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Glauber de Oliveira Costa
a03d7f4b54 KVM: Put kvm_para.h include outside __KERNEL__
kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Christian Ehrhardt
6f723c7911 KVM: Portability: Move kvm_fpu to asm-x86/kvm.h
This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Marcelo Tosatti
aaee2c94f7 KVM: MMU: Switch to mmu spinlock
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.

Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:21 +02:00
Marcelo Tosatti
7ec5458821 KVM: Add kvm_read_guest_atomic()
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Avi Kivity
b93463aa59 KVM: Accelerated apic support
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Avi Kivity
b209749f52 KVM: local APIC TPR access reporting facility
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel.  This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Zhang Xiantao
ec10f4750d KVM: Expose ioapic to ia64 save/restore APIs
IA64 also needs to see ioapic structure in irqchip.

Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Zhang Xiantao
5736199afb KVM: Move kvm_vcpu_kick() to x86.c
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Avi Kivity
edf884172e KVM: Move arch dependent files to new directory arch/x86/kvm/
This paves the way for multiple architecture support.  Note that while
ioapic.c could potentially be shared with ia64, it is also moved.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:18 +02:00
Avi Kivity
fb56dbb31c KVM: Export include/linux/kvm.h only if $ARCH actually supports KVM
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing.  Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:16 +02:00
Jerone Young
51e296258c KVM: Add ifdef in irqchip struct for x86 only structures
This patch fixes a small issue where sturctures:
	kvm_pic_state
	kvm_ioapic_state

are defined inside x86 specific code and may or may not
be defined in anyway for other architectures. The problem
caused is one cannot compile userspace apps (ex. libkvm)
for other archs since a size cannot be determined for these
structures.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:15 +02:00
Dan Kenigsberg
0771671749 KVM: Enhance guest cpuid management
The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:

 - No way to tell which features the host supports

  While some features can be supported with no changes to kvm, others
  need explicit support.  That means kvm needs to vet the feature set
  before it is passed to the guest.

 - No support for indexed or stateful cpuid entries

  Some cpuid entries depend on ecx as well as on eax, or on internal
  state in the processor (running cpuid multiple times with the same
  input returns different output).  The current cpuid machinery only
  supports keying on eax.

 - No support for save/restore/migrate

  The internal state above needs to be exposed to userspace so it can
  be saved or migrated.

This patch adds extended cpuid support by means of three new ioctls:

 - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
   supports

 - KVM_SET_CPUID2: sets the vcpu's cpuid table

 - KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state

[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
      before]

Signed-off-by: Dan Kenigsberg <danken@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:13 +02:00
Jerone Young
a162dd5873 KVM: Portability: Move cpuid structures to <asm/kvm.h>
This patch moves structures:
	kvm_cpuid_entry
	kvm_cpuid

from include/linux/kvm.h to include/asm-x86/kvm.h

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:08 +02:00
Jerone Young
244d57ece9 KVM: Portability: Move kvm_sregs and msr structures to <asm/kvm.h>
Move structures:
	kvm_sregs
	kvm_msr_entry
	kvm_msrs
	kvm_msr_list

from include/linux/kvm.h to include/asm-x86/kvm.h

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:08 +02:00
Jerone Young
3a56b20104 KVM: Portability: Move kvm_segment & kvm_dtable structure to <asm/kvm.h>
This patch moves structures:
	kvm_segment
	kvm_dtable
from include/linux/kvm.h to include/asm-x86/kvm.h

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:08 +02:00
Jerone Young
d9ecf92810 KVM: Portability: Move structure lapic_state to <asm/kvm.h>
This patch moves structure lapic_state from include/linux/kvm.h
to include/asm-x86/kvm.h

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:08 +02:00
Jerone Young
19d30b1644 KVM: Portability: Move kvm_regs to <asm/kvm.h>
This patch moves structure kvm_regs to include/asm-x86/kvm.h.
Each architecture will need to create there own version of this
structure.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:07 +02:00
Jerone Young
da1386a5bc KVM: Portability: Move x86 pic strutctures
This patch moves structures:
	kvm_pic_state
	kvm_ioapic_state

to inclue/asm-x86/kvm.h.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:07 +02:00
Jerone Young
f6a40e3bdf KVM: Portability: Move kvm_memory_alias to asm/kvm.h
This patch moves sturct kvm_memory_alias from include/linux/kvm.h
to include/asm-x86/kvm.h. Also have include/linux/kvm.h include
include/asm/kvm.h.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:53:07 +02:00
Izik Eidus
cbc9402297 KVM: Add ioctl to tss address from userspace,
Currently kvm has a wart in that it requires three extra pages for use
as a tss when emulating real mode on Intel.  This patch moves the allocation
internally, only requiring userspace to tell us where in the physical address
space we can place the tss.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:56 +02:00
Christian Borntraeger
5f43238d03 KVM: Per-architecture hypercall definitions
Currently kvm provides hypercalls only for x86* architectures. To
provide hypercall infrastructure for other kvm architectures I split
kvm_para.h into a generic header file and architecture specific
definitions.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:55 +02:00
Izik Eidus
6fc138d227 KVM: Support assigning userspace memory to the guest
Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.

This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:51 +02:00
Izik Eidus
82ce2c9683 KVM: Allow dynamic allocation of the mmu shadow cache size
The user is now able to set how many mmu pages will be allocated to the guest.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Anthony Liguori
7aa81cc047 KVM: Refactor hypercall infrastructure (v3)
This patch refactors the current hypercall infrastructure to better
support live migration and SMP.  It eliminates the hypercall page by
trapping the UD exception that would occur if you used the wrong hypercall
instruction for the underlying architecture and replacing it with the right
one lazily.

A fall-out of this patch is that the unhandled hypercalls no longer trap to
userspace.  There is very little reason though to use a hypercall to
communicate with userspace as PIO or MMIO can be used.  There is no code
in tree that uses userspace hypercalls.

[avi: fix #ud injection on vmx]

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:46 +02:00
Bernhard Kaindl
f212ec4b7b x86: early boot debugging via FireWire (ohci1394_dma=early)
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.

If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.

With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.

In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.

An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.

A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt

It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff

Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Ingo Molnar
12d6f21eac x86: do not PSE on CONFIG_DEBUG_PAGEALLOC=y
get more testing of the c_p_a() code done by not turning off
PSE on DEBUG_PAGEALLOC.

this simplifies the early pagetable setup code, and tests
the largepage-splitup code quite heavily.

In the end, all the largepages will be split up pretty quickly,
so there's no difference to how DEBUG_PAGEALLOC worked before.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:58 +01:00
Jeremy Fitzhardinge
a5a19c63f4 x86: demacro asm-x86/pgalloc_32.h
Convert macros into inline functions, for better type-checking.

This patch required a little bit of fiddling with headers in order to
make __(pte|pmd)_free_tlb inline rather than macros.
asm-generic/tlb.h includes asm/pgalloc.h, though it doesn't directly
use any pgalloc definitions.  I removed this include to avoid an
include cycle, but it may cause secondary compile failures by things
depending on the indirect inclusion; arch/x86/mm/hugetlbpage.c was one
such place; there may be others.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:39 +01:00
Florian Fainelli
0acf8e3447 pci: add PCI identifiers for the RDC devices
This patch defines the PCI identifiers found in
the RDC R-321x System-on-Chip.

Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:36 +01:00
Andi Kleen
03252919b7 x86: print which shared library/executable faulted in segfault etc. messages v3
They now look like:

hal-resmgr[13791]: segfault at 3c rip 2b9c8caec182 rsp 7fff1e825d30 error 4 in libacl.so.1.1.0[2b9c8caea000+6000]

This makes it easier to pinpoint bugs to specific libraries.

And printing the offset into a mapping also always allows to find the
correct fault point in a library even with randomized mappings. Previously
there was no way to actually find the correct code address inside
the randomized mapping.

Relies on earlier patch to shorten the printk formats.

They are often now longer than 80 characters, but I think that's worth it.

[includes fix from Eric Dumazet to check d_path error value]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:18 +01:00
Andi Kleen
ca74a6f84e x86: optimize lock prefix switching to run less frequently
On VMs implemented using JITs that cache translated code changing the lock
prefixes is a quite costly operation that forces the JIT to throw away and
retranslate a lot of code.

Previously a SMP kernel would rewrite the locks once for each CPU which
is quite unnecessary. This patch changes the code to never switch at boot in
 the normal case (SMP kernel booting with >1 CPU) or only once for SMP kernel
on UP.

This makes a significant difference in boot up performance on AMD SimNow!
Also I expect it to be a little faster on native systems too because a smp
switch does a lot of text_poke()s which each synchronize the pipeline.

v1->v2: Rename max_cpus
v1->v2: Fix off by one in UP check (Thomas Gleixner)

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:17 +01:00
John Reiser
6b8be6df7f x86: add ENDPROC() markers
The ENDPROCs() were not used everywhere.  Some code used just END() instead,
while other code used nothing.  um/sys-i386/checksum.S didn't #include
<linux/linkage.h> .  I also got confused because gcc puts the
.type near the ENTRY, while ENDPROC puts it on the opposite end.

Signed off by: John Reiser <jreiser@BitWagon.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:13 +01:00
Ingo Molnar
076f9776f5 x86: make early printk selectable on 64-bit as well
Enable CONFIG_EMBEDDED to select CONFIG_EARLY_PRINTK on 64-bit as well.

saves ~2K:

   text    data     bss     dec     hex filename
   7290283 3672091 1907848 12870222         c4624e vmlinux.before
   7288373 3671795 1907848 12868016         c459b0 vmlinux.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:06 +01:00
Ingo Molnar
d50efc6c40 x86: fix UML and -regparm=3
introduce the "asmregparm" calling convention: for functions
implemented in assembly with a fixed regparm input parameters
calling convention.

mark the semaphore and rwsem slowpath functions with that.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:00 +01:00
Ananth N Mavinakayanahalli
8c1c935642 x86: kprobes: add kprobes smoke tests that run on boot
Here is a quick and naive smoke test for kprobes. This is intended to
just verify if some unrelated change broke the *probes subsystem. It is
self contained, architecture agnostic and isn't of any great use by itself.

This needs to be built in the kernel and runs a basic set of tests to
verify if kprobes, jprobes and kretprobes run fine on the kernel. In case
of an error, it'll print out a message with a "BUG" prefix.

This is a start; we intend to add more tests to this bucket over time.

Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.

Tested on x86 (32/64) and powerpc.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:32:53 +01:00
travis@sgi.com
5280e004fc percpu: move arch XX_PER_CPU_XX definitions into linux/percpu.h
- Special consideration for IA64: Add the ability to specify
  arch specific per cpu flags

- remove .data.percpu attribute from DEFINE_PER_CPU for non-smp case.

The arch definitions are all the same. So move them into linux/percpu.h.

We cannot move DECLARE_PER_CPU since some include files just include
asm/percpu.h to avoid include recursion problems.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:32:52 +01:00
Jeremy Fitzhardinge
74ef649fe8 x86: add _AT() macro to conditionally cast
# HG changeset patch
# User Jeremy Fitzhardinge <jeremy@xensource.com>
# Date 1199317452 28800
# Node ID f7e7db3facd9406545103164f9be8f9ba1a2b549
# Parent  4d9a413a0f4c1d98dbea704f0366457b5117045d
x86: add _AT() macro to conditionally cast

Define _AT(type, value) to conditionally cast a value when compiling C
code, but not when used in assembler.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:42 +01:00
Roland McGrath
bb61682b3f x86: x86 core dump TLS
This makes ELF core dumps of 32-bit processes include a new
note type NT_386_TLS (0x200) giving the contents of the TLS
slots in struct user_desc format.  This lets post mortem
examination figure out what the segment registers mean like
the debugger does with get_thread_area on a live process.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:56 +01:00
Roland McGrath
c269f19617 x86: compat_sys_ptrace
This adds a generic definition of compat_sys_ptrace that calls
compat_arch_ptrace, parallel to sys_ptrace/arch_ptrace.  Some
machines needing this already define a function by that name.
The new generic function is defined only on machines that
put #define __ARCH_WANT_COMPAT_SYS_PTRACE into asm/ptrace.h.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:48 +01:00
Roland McGrath
032d82d906 x86: compat_ptrace_request
This adds a compat_ptrace_request that is the analogue of ptrace_request
for the things that 32-on-64 ptrace implementations can share in common.
So far there are just a couple of requests handled generically.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:47 +01:00
Roland McGrath
5bde4d1817 x86: user_regset user-copy helpers
This defines two new inlines in linux/regset.h, for use in arch_ptrace
implementations and the like.  These provide simplified wrappers for using
the user_regset interfaces to copy thread regset data into the caller's
user-space memory.  The inlines are trivial, but make the common uses in
places such as ptrace implementation much more concise, easier to read, and
less prone to code-copying errors.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:47 +01:00
Roland McGrath
bae3f7c39d x86: user_regset helpers
This adds some inlines to linux/regset.h intended for arch code to use in
its user_regset get and set functions.  These make it pretty easy to deal
with the interface's optional kernel-space or user-space pointers and its
generalized access to a part of the register data at a time.

In simple cases where the internal data structure matches the exported
layout (core dump format), a get function can be nothing but a call to
user_regset_copyout, and a set function a call to user_regset_copyin.

In other cases the exported layout is usually made up of a few pieces each
stored contiguously in a different internal data structure.  These helpers
make it straightforward to write a get or set function by processing each
contiguous chunk of the data in order.  The start_pos and end_pos arguments
are always constants, so these inlines collapse to a small amount of code.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:45 +01:00
Roland McGrath
bdf88217b7 x86: user_regset header
The new header <linux/regset.h> defines the types struct user_regset and
struct user_regset_view, with some associated declarations.  This new set
of interfaces will become the standard way for arch code to expose
user-mode machine-specific state.  A single set of entry points into arch
code can do all the low-level work in one place to fill the needs of core
dumps, ptrace, and any other user-mode debugging facilities that might come
along in the future.

For existing arch code to adapt to the user_regset interfaces, each arch
can work from the code it already has to support core files and ptrace.
The formats you want for user_regset are the core file formats.  The only
wrinkle in adapting old ptrace implementation code as user_regset get and
set functions is that these functions can be called on current as well as
on another task_struct that is stopped and switched out as for ptrace.
For some kinds of machine state, you may have to load it directly from CPU
registers or otherwise differently for current than for another thread.
(Your core dump support already handles this in elf_core_copy_regs for
current and elf_core_copy_task_regs for other tasks, so just check there.)
The set function should also be made to work on current in case that
entails some special cases, though this was never required before for
ptrace.  Adding this flexibility covers the arch needs to open the door to
more sophisticated new debugging facilities that don't always need to
context-switch to do every little thing.

The copyin/copyout helper functions (in a later patch) relieve the arch
code of most of the cumbersome details of the flexible get/set interfaces.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:44 +01:00
Jan Beulich
cae4595764 x86: make __{save,restore}_processor_state static
.. allowing to remove their declarations from a global include file
(the symbols don't exist for anything but x86).

Likewise for 64-bits' fix_processor_context(), just that that one was
properly declared in an arch-specific header.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:23 +01:00
Nick Piggin
95c354fe9f spinlock: lockbreak cleanup
The break_lock data structure and code for spinlocks is quite nasty.
Not only does it double the size of a spinlock but it changes locking to
a potentially less optimal trylock.

Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
__raw_spin_is_contended that uses the lock data itself to determine whether
there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
not set.

Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
decouple it from the spinlock implementation, and make it typesafe (rwlocks
do not have any need_lockbreak sites -- why do they even get bloated up
with that break_lock then?).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:20 +01:00
Ingo Molnar
2355188570 x86: avoid build warning
fix this build warning:

 include/asm/topology_32.h: In function 'node_to_first_cpu':
 include/asm/topology_32.h:66: warning: unused variable 'mask'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:10 +01:00
Jeremy Fitzhardinge
5548fecdff x86: clean up bitops-related warnings
Add casts to appropriate places to silence spurious bitops warnings.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:55 +01:00
Roland McGrath
5b88abbf77 ptrace: generic PTRACE_SINGLEBLOCK
This makes ptrace_request handle PTRACE_SINGLEBLOCK along with
PTRACE_CONT et al.  The new generic code makes use of the
arch_has_block_step macro and generic entry points on machines
that define them.

[ mingo@elte.hu: bugfix ]

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:53 +01:00
Roland McGrath
dc802c2d2e ptrace: arch_has_block_step
This defines the new macro arch_has_block_step() in linux/ptrace.h, a
default for when asm/ptrace.h does not define it.  This is the analog
of arch_has_single_step() for step-until-branch on machines that have
it.  It declares the new user_enable_block_step function, which goes
with the existing user_enable_single_step and user_disable_single_step.
This is not used yet, but paves the way to harmonize on this interface
for the arch-specific calls on all machines.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:53 +01:00
Roland McGrath
fb7fa8f174 ptrace: arch_has_single_step
This defines the new macro arch_has_single_step() in linux/ptrace.h, a
default for when asm/ptrace.h does not define it.  It declares the new
user_enable_single_step and user_disable_single_step functions.
This is not used yet, but paves the way to harmonize on this interface
for the arch-specific calls on all machines.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:47 +01:00
Bernhard Walle
c9cce83dd1 x86: remove extern declarations for code, data, bss resources
This patch removes the extern struct resource declarations for
data_resource, code_resource and bss_resource on x86 and declares that
three structures as static as done on other architectures like IA64.

On i386, these structures are moved to setup_32.c (from e820_32.c) because
that's code that is not specific to e820 and also required on EFI systems.
That makes the "extern" reference superfluous.

On x86_64, data_resource, code_resource and bss_resource are passed to
e820_reserve_resources() as arguments just as done on i386 and IA64.  That
also avoids the "extern" reference and it's possible to make it static.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:32 +01:00
Thomas Gleixner
02456c708e x86: nuke a ton of dead hpet code
No users, just ballast

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:30:27 +01:00
Thomas Gleixner
70a2002563 x86: move pmtmr related declarations
Move more stuff out of proto.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:30:18 +01:00
Thomas Gleixner
c202f298de x86: clean up arch/x86/ia32/sys_ia32.c
White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:30:08 +01:00
Venki Pallipadi
6378ddb592 time: track accurate idle time with tick_sched.idle_sleeptime
Current idle time in kstat is based on jiffies and is coarse grained.
tick_sched.idle_sleeptime is making some attempt to keep track of idle time
in a fine grained manner.  But, it is not handling the time spent in
interrupts fully.

Make tick_sched.idle_sleeptime accurate with respect to time spent on
handling interrupts and also add tick_sched.idle_lastupdate, which keeps
track of last time when idle_sleeptime was updated.

This statistics will be crucial for cpufreq-ondemand governor, which can
shed some conservative gaurd band that is uses today while setting the
frequency.  The ondemand changes that uses the exact idle time is coming
soon.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:04 +01:00
Balaji Rao
e3f37a54f6 x86: assign IRQs to HPET timers
The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
device. This patch fixes it by allocating IRQs to timer blocks in the HPET.

arch/x86/kernel/hpet.c |   13 +++++--------
drivers/char/hpet.c    |   45 ++++++++++++++++++++++++++++++++++++++-------
include/linux/hpet.h   |    2 +-
3 files changed, 44 insertions(+), 16 deletions(-)

Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:03 +01:00
Thomas Gleixner
4713e22ce8 clocksource: add unregister function to disable unusable clocksources
On x86 the PIT might become an unusable clocksource. Add an unregister
function to provide a possibilty to remove the PIT from the list of
available clock sources.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:30:02 +01:00
Atsushi Nemoto
1d76c26228 clocksource: make CLOCKSOURCE_MASK bullet-proof
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:01 +01:00
Pavel Machek
a6fa8e5a61 time: clean hungarian notation from timers
Clean up hungarian notation from timer code.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:00 +01:00
Benny Halevy
99fadcd764 nfs: convert NFS_*(inode) helpers to static inline
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:11 -05:00
Benny Halevy
3a10c30acc nfs: obliterate NFS_FLAGS macro
use NFS_I(inode)->flags instead

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:11 -05:00
Chuck Lever
f1ec08cb94 SUNRPC: Use appropriate argument types in rpcb client
Clean up: Follow recommendations of Chapter 5 of Documentation/CodingStyle
and use "u32" instead of "__u32" for types in definitions that are not
shared with user space.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:09 -05:00
Chuck Lever
c0e07cb68d NFS: NFS version number is unsigned
RPC protocol version numbers are unsigned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:08 -05:00
Chuck Lever
883bb163f8 NLM: Introduce an arguments structure for nlmclnt_init()
Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the
new nfs_client_initdata structure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2008-01-30 02:06:07 -05:00
Chuck Lever
1093a60ef3 NLM/NFS: Use cached nlm_host when calling nlmclnt_proc()
Now that each NFS mount point caches its own nlm_host structure, it can be
passed to nlmclnt_proc() for each lock request.  By pinning an nlm_host for
each mount point, we trade the overhead of looking up or creating a fresh
nlm_host struct during every NLM procedure call for a little extra memory.

We also restrict the nlmclnt_proc symbol to limit the use of this call to
in-tree modules.

Note that nlm_lookup_host() (just removed from the client's per-request
NLM processing) could also trigger an nlm_host garbage collection.  Now
client-side nlm_host garbage collection occurs only during NFS mount
processing.  Since the NFS client now holds a reference on these nlm_host
structures, they wouldn't have been affected by garbage collection
anyway.

Given that nlm_lookup_host() reorders the global nlm_host chain after
every successful lookup, and that a garbage collection could be triggered
during the call, we've removed a significant amount of per-NLM-request
CPU processing overhead.

Sidebar: there are only a few remaining references to the internals of
NFS inodes in the client-side NLM code.  The only references I found are
related to extracting or comparing the inode's file handle via NFS_FH().
One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:07 -05:00
Chuck Lever
9289e7f91a NFS: Invoke nlmclnt_init during NFS mount processing
Cache an appropriate nlm_host structure in the NFS client's mount point
metadata for later use.

Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if
nfs_start_lockd() returns a non-zero value, its callers ensure that the
mount request fails outright.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:07 -05:00
Chuck Lever
52c4044d00 NLM: Introduce external nlm_host set-up and tear-down functions
We would like to remove the per-lock-operation nlm_lookup_host() call from
nlmclnt_proc().

The new architecture pins an nlm_host structure to each NFS client
superblock that has the "lock" mount option set.  The NFS client passes
in the pinned nlm_host structure during each call to nlmclnt_proc().  NFS
client unmount processing "puts" the nlm_host so it can be garbage-
collected later.

This patch introduces externally callable NLM functions that handle
mount-time nlm_host set up and tear-down.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:06 -05:00
Chuck Lever
b454ae9060 SUNRPC: fewer conditionals in the format_ip_address routines
Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID.  This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.

Tighten up type checking on the address_strings array while we're at it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:04 -05:00
Trond Myklebust
59dca3b28c NFS: Fix the 'proto=' mount option
Currently, if you have a server mounted using networking protocol, you
cannot specify a different value using the 'proto=' option on another
mountpoint.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:00 -05:00
Trond Myklebust
331702337f NFS: Support per-mountpoint timeout parameters.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:59 -05:00
Trond Myklebust
ba7392bb37 SUNRPC: Add support for per-client timeout values
In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:59 -05:00
Trond Myklebust
2881ae74e6 SUNRPC: Clean up the transport timeout initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:58 -05:00
Trond Myklebust
69dd716c5f NFSv4: Add socket proto argument to setclientid
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:58 -05:00
Chuck Lever
6e4cffd7b2 NFS: Expand server address storage in nfs_client struct
Prepare for managing larger addresses in the NFS client by widening the
nfs_client struct's cl_addr field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>

(Modified to work with the new parameters for nfs_alloc_client)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:54 -05:00
Chuck Lever
4392f25922 NFS: Increase size of cl_ipaddr field to hold IPv6 addresses
The nfs_client's cl_ipaddr field needs to be larger to hold strings that
represent IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:51 -05:00
Chuck Lever
cc38bac3a0 NFS: Ensure NFSv4 SETCLIENTID send buffer is large enough
Ensure that the RPC buffer size specified for NFSv4 SETCLIENTID procedures
matches what we are encoding into the buffer.  See the definition of
struct nfs4_setclientid {} and the encode_setclientid() function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:51 -05:00
Chuck Lever
0fb2b7e945 SUNRPC: Move universal address definitions to global header
Universal addresses are defined in RFC 1833 and clarified in RFC 3530.  We
need to use them in several places in the NFS and RPC clients, so move the
relevant definition and block comment to an appropriate global include
file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:50 -05:00
Trond Myklebust
40c553193d NFS: Remove the redundant nfs_client->cl_nfsversion
We can get the same information from the rpc_ops structure instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:49 -05:00
Trond Myklebust
bfc69a4566 NFS: define a function to update nfsi->cache_change_attribute
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:47 -05:00
Trond Myklebust
c087567d3f SUNRPC: Remove the obsolete RPC_WAITQ macro
Now that we've killed off all the users.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:41 -05:00
Trond Myklebust
47fe064831 SUNRPC: Unexport rpc_init_task() and rpc_execute()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:40 -05:00
Trond Myklebust
e8f5d77c80 SUNRPC: allow the caller of rpc_run_task to preallocate the struct rpc_task
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:38 -05:00
Trond Myklebust
b5627943ab SUNRPC: Remove the now unused function rpc_call_setup()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:35 -05:00
Trond Myklebust
bdc7f021f3 NFS: Clean up the (commit|read|write)_setup() callback routines
Move the common code for setting up the nfs_write_data and nfs_read_data
structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:32 -05:00
Trond Myklebust
77de2c590e SUNRPC: Add a helper rpc_call_start() that initialises task->tk_action
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:31 -05:00
Trond Myklebust
3ff7576dda SUNRPC: Clean up the initialisation of priority queue scheduling info.
We want the default scheduling priority (priority == 0) to remain
RPC_PRIORITY_NORMAL.

Also ensure that the priority wait queue scheduling is per process id
instead of sometimes being per thread, and sometimes being per inode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
c970aa85e7 SUNRPC: Clean up rpc_run_task
Make it use the new task initialiser structure instead of acting as a
wrapper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
84115e1cd4 SUNRPC: Cleanup of rpc_task initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
62da3b2488 SUNRPC: Rename xprt_disconnect()
xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:27 -05:00
Trond Myklebust
3b948ae5be SUNRPC: Allow the client to detect if the TCP connection is closed
Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
66af1e5585 SUNRPC: Fix a race in xs_tcp_state_change()
When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Steve Dickson
ef818a28fa NFS: Stop sillyname renames and unmounts from racing
Added an active/deactive mechanism to the nfs_server structure
allowing async operations to hold off umount until the
operations are done.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Trond Myklebust
acee478afc NFS: Clean up the write request locking.
Ensure that we set/clear NFS_PAGE_TAG_LOCKED when the nfs_page is hashed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Paul Moore
13541b3ada NetLabel: Add auditing to the static labeling mechanism
This patch adds auditing support to the NetLabel static labeling mechanism.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:29 +11:00
Paul Moore
d621d35e57 SELinux: Enable dynamic enable/disable of the network access checks
This patch introduces a mechanism for checking when labeled IPsec or SECMARK
are in use by keeping introducing a configuration reference counter for each
subsystem.  In the case of labeled IPsec, whenever a labeled SA or SPD entry
is created the labeled IPsec/XFRM reference count is increased and when the
entry is removed it is decreased.  In the case of SECMARK, when a SECMARK
target is created the reference count is increased and later decreased when the
target is removed.  These reference counters allow SELinux to quickly determine
if either of these subsystems are enabled.

NetLabel already has a similar mechanism which provides the netlbl_enabled()
function.

This patch also renames the selinux_relabel_packet_permission() function to
selinux_secmark_relabel_packet_permission() as the original name and
description were misleading in that they referenced a single packet label which
is not the case.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:26 +11:00
Martin K. Petersen
7da975a231 Fix blktrace compile warning
request_queue_t is deprecated

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:15 +01:00
Jens Axboe
023ccde109 block: fix warning on compile with CONFIG_BLOCK
struct io_context was not defined, just add an empty forward decl.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:14 +01:00
Linus Torvalds
0ba6c33bcd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
  [IPV6] ADDRLABEL: Fix double free on label deletion.
  [PPP]: Sparse warning fixes.
  [IPV4] fib_trie: remove unneeded NULL check
  [IPV4] fib_trie: More whitespace cleanup.
  [NET_SCHED]: Use nla_policy for attribute validation in ematches
  [NET_SCHED]: Use nla_policy for attribute validation in actions
  [NET_SCHED]: Use nla_policy for attribute validation in classifiers
  [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
  [NET_SCHED]: sch_api: introduce constant for rate table size
  [NET_SCHED]: Use typeful attribute parsing helpers
  [NET_SCHED]: Use typeful attribute construction helpers
  [NET_SCHED]: Use NLA_PUT_STRING for string dumping
  [NET_SCHED]: Use nla_nest_start/nla_nest_end
  [NET_SCHED]: Propagate nla_parse return value
  [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
  [NET_SCHED]: act_api: use nlmsg_parse
  [NET_SCHED]: act_api: fix netlink API conversion bug
  [NET_SCHED]: sch_netem: use nla_parse_nested_compat
  [NET_SCHED]: sch_atm: fix format string warning
  [NETNS]: Add namespace for ICMP replying code.
  ...
2008-01-29 22:54:01 +11:00
Linus Torvalds
5ea293a904 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (79 commits)
  Remove references to "make dep"
  kconfig: document use of HAVE_*
  Introduce new section reference annotations tags: __ref, __refdata, __refconst
  kbuild: warn about ld added unique sections
  kbuild: add verbose option to Section mismatch reporting in modpost
  kconfig: tristate choices with mixed tristate and boolean values
  asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies
  remove __attribute_used__
  kbuild: support ARCH=x86 in buildtar
  kconfig: remove "enable"
  kbuild: simplified warning report in modpost
  kbuild: introduce a few helpers in modpost
  kbuild: use simpler section mismatch warnings in modpost
  kbuild: link vmlinux.o before kallsyms passes
  kbuild: introduce new option to enhance section mismatch analysis
  Use separate sections for __dev/__cpu/__mem code/data
  compiler.h: introduce __section()
  all archs: consolidate init and exit sections in vmlinux.lds.h
  kbuild: check section names consistently in modpost
  kbuild: introduce blacklisting in modpost
  ...
2008-01-29 22:46:14 +11:00
Linus Torvalds
03bc26cfef Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  Module: check to see if we have a built in module with the same name
  module: add module taint on ndiswrapper
  module: fix the module name length in param_sysfs_builtin
  module: make module_address_lookup safe
  module: better OOPS and lockdep coverage for loading modules
  module: Fix gratuitous sprintf in module.c
  module: wait for dependent modules doing init.
  module: Don't report discarded init pages as kernel text.
2008-01-29 22:45:39 +11:00
Rusty Russell
6dd06c9fbe module: make module_address_lookup safe
module_address_lookup releases preemption then returns a pointer into
the module space.  The only user (kallsyms) copies the result, so just
do that under the preempt disable.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-01-29 17:13:23 +11:00
Mingming Cao
7b7510662f jbd2: add lockdep support
Ported from similar patch for the jbd layer.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-28 23:58:27 -05:00
Alex Tomas
c9de560ded ext4: Add multi block allocator for ext4
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-29 00:19:52 -05:00
Alex Tomas
1988b51e47 ext4: Add new functions for searching extent tree
Add the functions ext4_ext_search_left() and ext4_ext_search_right(),
which are used by mballoc during ext4_ext_get_blocks to decided whether
to merge extent information.

Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V
aa02ad67d9 ext4: Add ext4_find_next_bit()
This function is used by the ext4 multi block allocator patches.

Also add generic_find_next_le_bit

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-01-28 23:58:27 -05:00