8947 Commits

Author SHA1 Message Date
Herbert Xu
27ab256864 [UDP]: Avoid repeated counting of checksum errors due to peeking
Currently it is possible for two processes to peek on the same socket
and end up incrementing the error counter twice for the same packet.

This patch fixes it by making skb_kill_datagram return whether it
succeeded in unlinking the packet and only incrementing the counter
if it did.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
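
The fix described in 27ab256864 can be pictured with a small user-space model. This is only a sketch: toy_pkt, toy_kill_datagram and toy_in_errors are hypothetical names, not the kernel's, and the real code does the unlink under the receive-queue lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a queued packet; not the kernel's struct sk_buff. */
struct toy_pkt {
    bool queued;                     /* still linked on the receive queue? */
};

static unsigned long toy_in_errors;  /* models the checksum-error counter */

/* Try to unlink a bad packet.
 * Returns 0 if this caller unlinked it, -1 if it was already gone. */
static int toy_kill_datagram(struct toy_pkt *pkt)
{
    if (!pkt->queued)
        return -1;
    pkt->queued = false;
    return 0;
}

/* Called by every reader that peeked the packet and found a bad checksum.
 * Only the reader that actually removed the packet bumps the counter. */
static void toy_csum_error(struct toy_pkt *pkt)
{
    if (toy_kill_datagram(pkt) == 0)
        toy_in_errors++;
}
```

If two readers both call toy_csum_error() on the same packet, the second call finds it already unlinked and leaves the counter alone.
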
Pavel Emelyanov
68dd299bc8 [INET]: Merge sys.net.ipv4.ip_forward and sys.net.ipv4.conf.all.forwarding
AFAIS these two entries should do the same thing - change the
forwarding state on ipv4_devconf and on all the devices.

I propose to merge the handlers together using ctl paths.

The inet_forward_change() is static after this and I move
it higher to be closer to other "propagation" helpers and
to avoid diff making patches based on { and } matching :)
i.e. - make them easier to read.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:31 -08:00
Pavel Emelyanov
08913681e4 [NET]: Remove the empty net_table
I have removed all the entries from this table (core_table,
ipv4_table and tr_table), so now we can safely drop it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:29 -08:00
Pavel Emelyanov
36f0bebd98 [TR]: Use ctl paths to register net/token-ring/ table
The same thing for token-ring - use ctl paths and get
rid of external references on the tr_table.

Unfortunately, I couldn't split this patch into cleanup and
use-the-paths parts.

As a lame excuse I can say that the cleanup is just moving
the tr_table from one file to another - closer to a single
variable that this ctl table tunes. Since the source file
becomes empty after the move, I remove it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:28 -08:00
Patrick McHardy
02f014d888 [NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. This avoids one allocation/free per packet and
simplifies the code a bit.

Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
c01cd429fc [NETFILTER]: nf_queue: move queueing related functions/struct to separate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
f9d8928f83 [NETFILTER]: nf_queue: remove unused data pointer
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
e3ac529815 [NETFILTER]: nf_queue: make queue_handler const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:09 -08:00
Patrick McHardy
1841a4c7ae [NETFILTER]: nf_ct_h323: remove ipv6 module dependency
nf_conntrack_h323 needs ip6_route_output for the call forwarding filter.
Add a ->route function to nf_afinfo and use that to avoid pulling in the
ipv6 module.

Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is
only needed when IPv6 conntrack is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:05 -08:00
Patrick McHardy
50c164a81f [NETFILTER]: x_tables: add rateest match
Add rate estimator match. The rate estimator match can match on
estimated rates by the RATEEST target. It supports matching on
absolute bps/pps values, comparing two rate estimators and matching
on the difference between two rate estimators.

This is what I use to route outgoing data connections from a FTP
server over two lines based on the available bandwidth:

# estimate outgoing rates
iptables -t mangle -A POSTROUTING -o eth0 -j RATEEST --rateest-name eth0 \
                                                     --rateest-interval 250ms \
                                                     --rateest-ewma 0.5s
iptables -t mangle -A POSTROUTING -o ppp0 -j RATEEST --rateest-name ppp0 \
                                                     --rateest-interval 250ms \
                                                     --rateest-ewma 0.5s

# mark based on available bandwidth
iptables -t mangle -A BALANCE -m state --state NEW \
                              -m helper --helper ftp \
                              -m rateest --rateest-delta \
                                         --rateest1 eth0 \
                                         --rateest-bps1 2.5mbit \
                                         --rateest-gt \
                                         --rateest2 ppp0 \
                                         --rateest-bps2 2mbit \
                              -j CONNMARK --set-mark 0x1

iptables -t mangle -A BALANCE -m state --state NEW \
                              -m helper --helper ftp \
                              -m rateest --rateest-delta \
                                         --rateest1 ppp0 \
                                         --rateest-bps1 2mbit \
                                         --rateest-gt \
                                         --rateest2 eth0 \
                                         --rateest-bps2 2.5mbit \
                              -j CONNMARK --set-mark 0x2

iptables -t mangle -A BALANCE -j CONNMARK --restore-mark

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:03 -08:00
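
The delta comparison used in the rules of 50c164a81f can be sketched as follows. This assumes that "--rateest-delta" compares the headroom of each link, i.e. the configured bandwidth minus the currently estimated rate, clamped at zero; the function names are hypothetical and this is not the module's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Headroom left on a link: configured capacity minus estimated rate,
 * clamped at zero. All values are in bits per second. */
static uint32_t toy_headroom(uint32_t capacity_bps, uint32_t estimated_bps)
{
    return capacity_bps >= estimated_bps ? capacity_bps - estimated_bps : 0;
}

/* True when link 1 has strictly more headroom than link 2
 * (the assumed meaning of --rateest-delta with --rateest-gt). */
static bool toy_delta_gt(uint32_t cap1, uint32_t est1,
                         uint32_t cap2, uint32_t est2)
{
    return toy_headroom(cap1, est1) > toy_headroom(cap2, est2);
}
```

With eth0 configured at 2.5mbit and carrying 1mbit, and ppp0 configured at 2mbit and carrying 1.8mbit, eth0 has more room left, so under this reading the first BALANCE rule would match and set mark 0x1.
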
Patrick McHardy
5859034d7e [NETFILTER]: x_tables: add RATEEST target
Add new rate estimator target (using gen_estimator). In combination with
the rateest match (next patch) this can be used for load-based multipath
routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:02 -08:00
Jan Engelhardt
5c350e5a38 [NETFILTER]: IPv6 capable xt_TOS v1 target
Extends the xt_DSCP target by xt_TOS v1 to add support for selectively
setting and flipping any bit in the IPv4 TOS and IPv6 Priority fields.
(ipt_TOS and xt_DSCP only accepted a limited range of possible
values.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:00 -08:00
Jan Engelhardt
f1095ab51d [NETFILTER]: IPv6 capable xt_tos v1 match
Extends the xt_dscp match by xt_tos v1 to add support for selectively
matching any bit in the IPv4 TOS and IPv6 Priority fields. (ipt_tos
and xt_dscp only accepted a limited range of possible values.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:00 -08:00
Laszlo Attila Toth
e2cf5ecbea [NETFILTER]: ipt_addrtype: limit address type checking to an interface
The addrtype match has a new revision (1), which allows address type checking
to be limited to the interface the current packet belongs to. Either the incoming
or the outgoing interface can be used, depending on the current hook. In the
FORWARD hook, two matches should be used if both interfaces have to be checked.
The new structure is ipt_addrtype_info_v1.

Revision 0 lets older userspace programs use the match as earlier.
ipt_addrtype_info is used.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:56 -08:00
Jan Engelhardt
0265ab44ba [NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner
xt_owner merges ipt_owner and ip6t_owner, and adds a flag to match
on socket (non-)existence.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:55 -08:00
Eric Dumazet
259d4e41f3 [NETFILTER]: x_tables: struct xt_table_info diet
Instead of using a big array of NR_CPUS entries, we can compute the size
needed at runtime, using nr_cpu_ids.

This should save some RAM (especially on David's machines where NR_CPUS=4096:
32 KB can be saved per table, and 64 KB for dynamically allocated ones, because
of slab/slub alignment).

In particular, the 'bootstrap' tables are not any more static (in data
section) but on stack as their size is now very small.

This also should reduce the size used on stack in compat functions
(get_info() declares an automatic variable, that could be bigger than kernel
stack size for big NR_CPUS)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:54 -08:00
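
The saving quoted in 259d4e41f3 follows from simple arithmetic: with 8-byte pointers, 4096 slots take 4096 * 8 = 32768 bytes, i.e. 32 KB. A minimal sketch of the idea, using hypothetical toy_* types rather than the real struct xt_table_info, and a C99 flexible array member as one way to size the array at allocation time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS 4096   /* compile-time maximum, as in the message */

/* Before: one pointer slot per possible CPU, whether present or not. */
struct toy_info_fixed {
    unsigned int size;
    void *entries[TOY_NR_CPUS];
};

/* After: the array is sized when the table is allocated. */
struct toy_info_dyn {
    unsigned int size;
    void *entries[];       /* flexible array member */
};

/* Bytes needed for a table serving nr_cpu_ids CPUs. */
static size_t toy_info_dyn_size(unsigned int nr_cpu_ids)
{
    return sizeof(struct toy_info_dyn) + nr_cpu_ids * sizeof(void *);
}

/* Allocate a zeroed table; the caller releases it with free(). */
static struct toy_info_dyn *toy_info_alloc(unsigned int nr_cpu_ids)
{
    return calloc(1, toy_info_dyn_size(nr_cpu_ids));
}
```

On a machine with four possible CPUs the dynamic variant needs only a few dozen bytes, which is also why such a structure becomes small enough to live on the stack.
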
Sven Schnelle
338e8a7926 [NETFILTER]: x_tables: add TCPOPTSTRIP target
Signed-off-by: Sven Schnelle <svens@bitebene.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:51 -08:00
Eric W. Biederman
e51b6ba077 sysctl: Infrastructure for per namespace sysctls
This patch implements the basic infrastructure for per namespace sysctls.

A list of lists of sysctl headers is added, allowing each namespace to have
its own list of sysctl headers.

Each list of sysctl headers has a lookup function to find the first
sysctl header in the list, allowing the lists to have a per namespace
instance.

register_sysctl_root is added to tell sysctl.c about additional
lists of sysctl_headers.  As all of the users are expected to be in
kernel no unregister function is provided.

sysctl_head_next is updated to walk through the list of lists.

__register_sysctl_paths is added to add a new sysctl table on
a non-default sysctl list.

The only intrusive part of this patch is propagating the information
to decide which list of sysctls to use for sysctl_check_table.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:17 -08:00
Eric W. Biederman
23eb06de7d sysctl: Remember the ctl_table we passed to register_sysctl_paths
By doing this we allow users of register_sysctl_paths that build
and dynamically allocate their ctl_table to be simpler.  This allows
them to just remember the ctl_table_header returned from
register_sysctl_paths from which they can now find the
ctl_table array they need to free.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:17 -08:00
Eric W. Biederman
29e796fd4d sysctl: Add register_sysctl_paths function
There are a number of modules that register a sysctl table
somewhere deeply nested in the sysctl hierarchy, such as
fs/nfs, fs/xfs, dev/cdrom, etc.

They all specify several dummy ctl_tables for the path name.
This patch implements register_sysctl_paths that takes
an additional path name, and makes up dummy sysctl nodes
for each component.

This patch was originally written by Olaf Kirch and
brought to my attention and reworked some by Olaf Hering.
I have changed a few additional things so the bugs are mine.

After converting all of the easy callers, Olaf Hering observed that with
allyesconfig ARCH=i386 the patch reduces the final binary size by 9369 bytes.

.text +897
.data -7008

     text      data      bss       dec      hex  filename
 26959310   4045899  4718592  35723801  2211a19  ../vmlinux-vanilla
 26960207   4038891  4718592  35717690  221023a  ../O-allyesconfig/vmlinux

So this change is both a space savings and a code simplification.

CC: Olaf Kirch <okir@suse.de>
CC: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:16 -08:00
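
The "dummy sysctl nodes for each component" idea in 29e796fd4d can be modelled in user space. This is a sketch under loud assumptions: struct toy_ctl and toy_register_paths are hypothetical, and the kernel's ctl_table and its registration function carry far more state than shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy node; the kernel's struct ctl_table has many more fields. */
struct toy_ctl {
    const char *name;
    struct toy_ctl *child;   /* non-NULL for directory nodes */
};

/* Wrap `table` in one dummy directory node per path component.
 * `path` is a NULL-terminated array of names, outermost first.
 * Returns the outermost node, `table` itself for an empty path,
 * or NULL if an allocation fails. */
static struct toy_ctl *toy_register_paths(const char *const *path,
                                          struct toy_ctl *table)
{
    size_t n = 0;
    while (path[n])
        n++;

    struct toy_ctl *inner = table;
    /* Build from the innermost component outwards. */
    while (n-- > 0) {
        struct toy_ctl *dir = calloc(1, sizeof(*dir));
        if (!dir) {
            /* Undo the nodes created so far. */
            while (inner != table) {
                struct toy_ctl *next = inner->child;
                free(inner);
                inner = next;
            }
            return NULL;
        }
        dir->name = path[n];
        dir->child = inner;
        inner = dir;
    }
    return inner;
}
```

A caller that previously had to spell out a "fs" table containing an "nfs" table containing its own entries would, in this model, pass the path { "fs", "nfs", NULL } together with the innermost table only.
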
Patrick McHardy
be0ea7d5da [NETFILTER]: Convert old checksum helper names
Kill the defines again, convert to the new checksum helper names and
remove the dependency of NET_ACT_NAT on NETFILTER.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:15 -08:00
Patrick McHardy
a99a00cf1a [NET]: Move netfilter checksum helpers to net/core/utils.c
This allows us to get rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
This patch redefines the old names to keep the noise low, the next patch
converts all users.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:14 -08:00
Gerrit Renker
0c86962076 [DCCP]: Integrate state transitions for passive-close
This adds the necessary state transitions for the two forms of passive-close

 * PASSIVE_CLOSE    - which is entered when a host   receives a Close;
 * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.

Here is a detailed account of what the patch does in each state.

1) Receiving CloseReq

  The pseudo-code in 8.5 says:

     Step 13: Process CloseReq
          If P.type == CloseReq and S.state < CLOSEREQ,
              Generate Close
              S.state := CLOSING
              Set CLOSING timer.

  This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.

   * CLOSED:         silently ignore - it may be a late or duplicate CloseReq;
   * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
   * REQUEST:        perform Step 13 directly (no need to enqueue packet);
   * OPEN/PARTOPEN:  enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.

  When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
  I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
  the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
  gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.

2) Receiving Close

  The code below from 8.5 is unconditional.

     Step 14: Process Close
          If P.type == Close,
              Generate Reset(Closed)
              Tear down connection
              Drop packet and return

  Thus we need to consider all states:
   * CLOSED:           silently ignore, since this can happen when a retransmitted or late Close arrives;
   * LISTEN:           dccp_rcv_state_process() will generate a Reset ("No Connection");
   * REQUEST:          perform Step 14 directly (no need to enqueue packet);
   * RESPOND:          dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
   * OPEN/PARTOPEN:    enter PASSIVE_CLOSE so that application has a chance to process unread data;
   * CLOSEREQ:         server performed active-close -- perform Step 14;
   * CLOSING:          simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
   * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
   * TIMEWAIT:         packet is ignored.

   Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
   is no socket for such a connection": the socket still exists, but its state indicates it is unusable.

   Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
   sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:13 -08:00
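
The per-state handling listed in 0c86962076 can be condensed into two lookup functions. This is a sketch of the message's tables only, not the DCCP code: the enum and function names are hypothetical, and states the message does not mention for a given packet type (for example PASSIVE_CLOSE on reception of a Close) are assumed to ignore the packet.

```c
#include <assert.h>

enum toy_state {
    ST_CLOSED, ST_LISTEN, ST_REQUEST, ST_RESPOND, ST_PARTOPEN, ST_OPEN,
    ST_CLOSEREQ, ST_CLOSING, ST_TIMEWAIT,
    ST_PASSIVE_CLOSE, ST_PASSIVE_CLOSEREQ
};

enum toy_action {
    ACT_IGNORE,           /* drop the packet silently */
    ACT_STEP13,           /* generate Close, enter CLOSING */
    ACT_STEP14,           /* generate Reset(Closed), tear down */
    ACT_ENTER_PASSIVE,    /* enter PASSIVE_CLOSE or PASSIVE_CLOSEREQ */
    ACT_RESET_ELSEWHERE,  /* another code path generates the Reset */
    ACT_TIE_BREAK         /* simultaneous close */
};

/* Section 1 of the message: a CloseReq arrives. */
static enum toy_action toy_rcv_closereq(enum toy_state s)
{
    switch (s) {
    case ST_REQUEST:
        return ACT_STEP13;
    case ST_PARTOPEN:
    case ST_OPEN:
        return ACT_ENTER_PASSIVE;
    default:
        /* CLOSED and every other state: ignored. LISTEN/RESPOND are
         * not expected here because Step 7 runs first. */
        return ACT_IGNORE;
    }
}

/* Section 2 of the message: a Close arrives. */
static enum toy_action toy_rcv_close(enum toy_state s)
{
    switch (s) {
    case ST_LISTEN:
    case ST_RESPOND:
        return ACT_RESET_ELSEWHERE;
    case ST_REQUEST:
    case ST_CLOSEREQ:
        return ACT_STEP14;
    case ST_PARTOPEN:
    case ST_OPEN:
        return ACT_ENTER_PASSIVE;
    case ST_CLOSING:
        return ACT_TIE_BREAK;
    default:
        /* CLOSED, PASSIVE_CLOSEREQ, TIMEWAIT; PASSIVE_CLOSE is an
         * assumption, the message does not list it. */
        return ACT_IGNORE;
    }
}
```

Writing the tables this way makes the asymmetry visible: a CloseReq is acted on in only three states, while a Close needs an explicit decision in almost all of them.
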
Gerrit Renker
f11135a344 [DCCP]: Dedicated auxiliary states to support passive-close
This adds two auxiliary states to deal with passive closes:
  * PASSIVE_CLOSE    (reached from OPEN via reception of Close)    and
  * PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
as internal intermediate states.

These states are used to allow a receiver to process unread data before
acknowledging the received connection-termination-request (the Close/CloseReq).

Without such support, it will happen that passively-closed sockets enter CLOSED
state while there is still unprocessed data in the queue; leading to unexpected
and erratic API behaviour.

PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
seamlessly work with inet_accept() (which tests for this state).

The state names are thanks to Arnaldo, who suggested this naming scheme
following an earlier revision of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:12 -08:00
Fred L. Templin
c7dc89c0ac [IPV6]: Add RFC4214 support
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.

This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.

Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:09 -08:00
Pavel Emelyanov
8d8ad9d7c4 [NET]: Name magic constants in sock_wake_async()
sock_wake_async() performs slightly different actions
depending on its "how" argument. Unfortunately, this argument
only has numerical magic values.

I propose to give names to these constants to help people
reading this function's callers understand what's going on
without looking into this function all the time.

I suppose this is 2.6.25 material, but if it's not (or the
naming seems poor/bad/awful), I can rework it against the
current net-2.6 tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:03 -08:00
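
What naming the constants buys can be shown with a toy. The message of 8d8ad9d7c4 does not list the names or their meanings, so the TOY_WAKE_* identifiers and the reason strings below are assumptions for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the four numeric "how" values. */
enum toy_wake_how {
    TOY_WAKE_IO    = 0,   /* assumed: ordinary I/O became possible */
    TOY_WAKE_WAITD = 1,   /* assumed: a reader was waiting for data */
    TOY_WAKE_SPACE = 2,   /* assumed: a writer was waiting for space */
    TOY_WAKE_URG   = 3    /* assumed: urgent data arrived */
};

/* A call site reads toy_wake_reason(TOY_WAKE_SPACE) instead of
 * toy_wake_reason(2), so the intent is visible without opening
 * the callee. */
static const char *toy_wake_reason(enum toy_wake_how how)
{
    switch (how) {
    case TOY_WAKE_IO:    return "I/O possible";
    case TOY_WAKE_WAITD: return "data arrived for a waiting reader";
    case TOY_WAKE_SPACE: return "write space available";
    case TOY_WAKE_URG:   return "urgent data";
    }
    return "unknown";
}
```

Because the enumerators keep the old numeric values, callers can be converted one at a time without changing behaviour.
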
Ilpo Järvinen
e7d0362dd4 [PCOUNTER] Fix build error without CONFIG_SMP
I keep getting this build error and couldn't find anyone fixing
it in archives. ...Maybe all net developers except me build
just SMP kernels :-).

In file included from include/net/sock.h:50,
                 from ipc/mqueue.c:35:
include/linux/pcounter.h: In function 'pcounter_add':
include/linux/pcounter.h:87: error: 'struct pcounter' has no
member named 'value'
make[1]: *** [ipc/mqueue.o] Error 1
make: *** [ipc] Error 2

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:50 -08:00
Gerrit Renker
9b91ad2747 [DCCP]: Make PARTOPEN an autonomous state
This decouples PARTOPEN from TCP-specific stream-states.

It thus addresses the FIXME.

The code has been checked with regard to dependency on PARTOPEN and FIN_WAIT1
states (to which PARTOPEN previously was mapped): there is no difference, as
PARTOPEN is always referred to directly (i.e. not via the mapping to TCP
state).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:44 -08:00
Arnaldo Carvalho de Melo
de4d1db369 [LIB]: Introduce struct pcounter
This just generalises what was introduced by Eric Dumazet for the struct proto
inuse field in 286ab3d460:

    [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way.

Please look at the comment in there to see the rationale.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:39 -08:00
Ron Rindjunsky
6b4e324164 mac80211: adding 802.11n definitions in ieee80211.h
This patch adds several structs and definitions to ieee80211.h
to support 802.11n draft specifications.
As 802.11n depends on and extends the 802.11e standard in several issues,
there are also several definitions that belong to 802.11e.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:38 -08:00
Denis V. Lunev
e372c41401 [NET]: Consolidate net namespace related proc files creation.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:28 -08:00
Denis V. Lunev
97c53cacf0 [NET]: Make rtnetlink infrastructure network namespace aware (v3)
After this patch none of the netlink callbacks support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:25 -08:00
Michael Wu
c237899d1f ieee80211: Add IEEE80211_MAX_FRAME_LEN to linux/ieee80211.h
This patch adds IEEE80211_MAX_FRAME_LEN which is useful for drivers trying
to determine how much to allocate for their RX buffers.

It also updates the comment on IEEE80211_MAX_DATA_LEN based on revisions
in 802.11e.

IEEE80211_MAX_FRAG_THRESHOLD and IEEE80211_MAX_RTS_THRESHOLD are also
revised due to the new maximum frame size.

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:20 -08:00
Stephen Hemminger
c7b6ea24b4 [NETPOLL]: Don't need rx_flags.
The rx_flags variable is redundant. Turning rx on/off is done
via setting the rx_np pointer.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:18 -08:00
Stephen Hemminger
0953864160 [NETPOLL]: no need to store local_mac
The local_mac is managed by the network device, no need to keep a
spare copy and all the management problems that could cause.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:17 -08:00
Oliver Hartkopp
1f98eefae8 [CAN]: Add missing Kbuild entries
This patch adds the missing Kbuild entries and the missing Kbuild file
in include/linux/can for the CAN subsystem.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:13 -08:00
Oliver Hartkopp
4195e31780 [CAN]: Fix plain integer definitions in userspace header.
This patch fixes the use of plain integers instead of __u32 in a struct
that is visible from kernel space and user space.

Thanks to Sam Ravnborg for pointing out the wrong plain int usage.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:12 -08:00
Oliver Hartkopp
ffd980f976 [CAN]: Add broadcast manager (bcm) protocol
This patch adds the CAN broadcast manager (bcm) protocol.

Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:11 -08:00
Oliver Hartkopp
c18ce101f2 [CAN]: Add raw protocol
This patch adds the CAN raw protocol.

Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:10 -08:00
Oliver Hartkopp
0d66548a10 [CAN]: Add PF_CAN core module
This patch adds the CAN core functionality but no protocols or drivers.
No protocol implementations are included here.  They come as separate
patches.  Protocol numbers are already in include/linux/can.h.

Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:10 -08:00
Oliver Hartkopp
cd05acfe65 [CAN]: Allocate protocol numbers for PF_CAN
This patch adds a protocol/address family number, ARP hardware type,
ethernet packet type, and a line discipline number for the SocketCAN
implementation.

Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:09 -08:00
Ilpo Järvinen
68f8353b48 [TCP]: Rewrite SACK block processing & sack_recv_cache use
Key points of this patch are:

  - In case new SACK information is advance only type, no skb
    processing below previously discovered highest point is done
  - Optimize cases below highest point too since there's no need
    to always go up to highest point (which is very likely still
    present in that SACK), this is not entirely true though
    because I'm dropping the fastpath_skb_hint which could
    previously optimize those cases even better. Whether that's
    significant, I'm not too sure.

Currently it will provide skipping by walking. Combined with
RB-tree, all skipping would become fast too regardless of window
size (can be done incrementally later).

Previously, a number of cases in TCP SACK processing failed to
take advantage of costly stored information in sack_recv_cache,
most importantly expected events such as cumulative ACKs and new
hole ACKs. Processing such ACKs results in rather long walks,
building up latencies (which easily get nasty when the window is
huge). Those latencies are often completely unnecessary
compared with the amount of _new_ information received; usually
for a cumulative ACK there's no new information at all, yet TCP
walks the whole queue unnecessarily, potentially taking a number of
costly cache misses on the way.

Since the inclusion of highest_sack, there's a lot of information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same the whole time (in all unearthly
scenarios). Take advantage of this knowledge here and drop
fastpath hint and use direct access to highest SACKed skb as
a replacement.

Effectively "special cased" fastpath is dropped. This change
adds some complexity to introduce a better-covered "fastpath",
though the added complexity should make TCP behave more cache
friendly.

The current ACK's SACK blocks are compared against each cached
block individually and only ranges that are new are then scanned
by the high constant walk. For other parts of write queue, even
when in previously known part of the SACK blocks, a faster skip
function is used (if necessary at all). In addition, whenever
possible, TCP fast-forwards to highest_sack skb that was made
available by an earlier patch. In typical case, no other things
but this fast-forward and mandatory markings after that occur
making the access pattern quite similar to the former fastpath
"special case".

DSACKs are a special case that must always be walked.

The local to recv_sack_cache copying could be more intelligent
w.r.t DSACKs which are likely to be there only once but that
is left to a separate patch.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:07 -08:00
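
The "only ranges that are new are then scanned" step of 68f8353b48 can be illustrated by comparing one incoming SACK block against one cached block. This is a simplified sketch, not the patch's code: toy_block and toy_new_ranges are hypothetical, blocks are treated as half-open [start, end) ranges, and only a single cached block is considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_block {
    uint32_t start, end;   /* [start, end) in TCP sequence space */
};

/* Wraparound-safe "a is before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Store the parts of `cur` not covered by `cached` in out[0..1].
 * Returns how many ranges were stored (0, 1 or 2); only those
 * would need the expensive walk. */
static int toy_new_ranges(struct toy_block cur, struct toy_block cached,
                          struct toy_block out[2])
{
    int n = 0;

    /* No overlap at all: the whole block is new. */
    if (!seq_before(cur.start, cached.end) ||
        !seq_before(cached.start, cur.end)) {
        out[n++] = cur;
        return n;
    }
    /* Head: part of cur below the cached block. */
    if (seq_before(cur.start, cached.start))
        out[n++] = (struct toy_block){ cur.start, cached.start };
    /* Tail: part of cur above the cached block. */
    if (seq_before(cached.end, cur.end))
        out[n++] = (struct toy_block){ cached.end, cur.end };
    return n;
}
```

A SACK block that merely grows at its upper edge, the common case, yields a single short tail range, while an unchanged block yields nothing to walk.
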
Ilpo Järvinen
fd6dad616d [TCP]: Earlier SACK block verification & simplify access to them
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:07 -08:00
Ilpo Järvinen
a47e5a988a [TCP]: Convert highest_sack to sk_buff to allow direct access
It is going to replace the sack fastpath hint quite soon... :-)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:03 -08:00
YOSHIFUJI Hideaki
2a8cc6c890 [IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.
Policy table is implemented as an RCU linear list since we do not expect
large list nor frequent updates.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:58 -08:00
Patrick McHardy
6e23ae2a48 [NETFILTER]: Introduce NF_INET_ hook values
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:55 -08:00
Jens Axboe
9c55e01c0c [TCP]: Splice receive support.
Support for network splice receive.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:31 -08:00
Jens Axboe
bbdfc2f706 [SPLICE]: Don't assume regular pages in splice_to_pipe()
Allow caller to pass in a release function, there might be
other resources that need releasing as well. Needed for
network receive.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:30 -08:00
Sam Ravnborg
312b1485fb Introduce new section reference annotations tags: __ref, __refdata, __refconst
Today we have the following annotations for functions/data
referencing __init/__exit functions / data:

__init_refok     => for init functions
__initdata_refok => for init data
__exit_refok     => for exit functions

There is really no difference between the __init and __exit
versions, so to simplify things and to introduce a shorter annotation,
the following new annotations are introduced:

__ref      => for functions (code) that
              references __*init / __*exit
__refdata  => for variables
__refconst => for const variables

With these annotations it is more obvious what the annotation
is for, and there is no longer the arbitrary division
between __init and __exit code.

The mechanism is the same as before - a special section
is created which is made part of the usual sections
in the linker script.

We will start to see annotations like this:

-static struct pci_serial_quirk pci_serial_quirks[] = {
+static const struct pci_serial_quirk pci_serial_quirks[] __refconst = {
-----------------
-static struct notifier_block __cpuinitdata cpuid_class_cpu_notifier =
+static struct notifier_block cpuid_class_cpu_notifier __refdata =
----------------
-static int threshold_cpu_callback(struct notifier_block *nfb,
+static int __ref threshold_cpu_callback(struct notifier_block *nfb,

[The above is just random samples].

Note: No modifications were needed in modpost
to support the new sections due to the newly introduced
blacklisting.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:21:19 +01:00
Adrian Bunk
3ff6eecca4 remove __attribute_used__
Remove the deprecated __attribute_used__.

[Introduce __section in a few places to silence checkpatch /sam]

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:21:18 +01:00
Sam Ravnborg
eb8f689046 Use separate sections for __dev/__cpu/__mem code/data
Introducing separate sections for __dev* (HOTPLUG),
__cpu* (HOTPLUG_CPU) and __mem* (MEMORY_HOTPLUG)
allows us to do a much more reliable Section mismatch
check in modpost. We are no longer dependent on the actual
configuration of for example HOTPLUG.

This has the effect that all users see many more
Section mismatch warnings than before because they
were almost all hidden when HOTPLUG was enabled.
The advantage of this is that when building a piece
of code it is much more likely that the Section
mismatch errors are spotted, and the warnings will
feel less random in nature.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Adrian Bunk <bunk@kernel.org>
2008-01-28 23:21:17 +01:00
Sam Ravnborg
f3fe866d59 compiler.h: introduce __section()
Add a new helper: __section() that makes a section definition
much shorter and more readable.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:21:17 +01:00
Robert P. J. Day
7998a73166 A few corrections to include/linux/Kbuild
auxvec.h, i2c-dev.h and vt.h *should* be unifdef'ed; i2o-dev.h does not need
unifdef'ing.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:14:36 +01:00
Linus Torvalds
f4798748de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (24 commits)
  HID: ADS/Tech Radio si470x needs blacklist entry
  HID: Logitech Extreme 3D needs NOGET quirk
  HID: Refactor MS Presenter 8K key mapping
  HID: MS Presenter mapping for PID 0x0701
  HID: Support Samsung IR remote
  HID: fix compilation of hidbp drivers without usbhid
  HID: Blacklist the Gretag-Macbeth Huey display colorimeter
  HID: the `bit' in hidinput_mapping_quirks() is an out parameter
  HID: remove redundant WARN_ON()s in order not to scare users
  HID: force hiddev creation for SONY PS3 controller
  HID: Use hid blacklist in usbmouse/usbkbd
  HID: proper handling of MS 4k and 6k devices
  HID: remove unused variable in quirk event handler
  HID: hid-input quirk for BTC 8193
  HID: separate hid-input event quirks from generic code
  HID: refactor mapping to input subsystem for quirky devices
  HID: Microsoft Wireless Optical Desktop 3.0 quirk
  HID: Add support for Logitech Elite keyboards
  HID: add full support for Genius KB-29E
  HID: fix a potential bug in pointer casting
  ...
2008-01-29 08:52:20 +11:00
Linus Torvalds
8d01eddf29 Merge branch 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block:
  block: implement drain buffers
  __bio_clone: don't calculate hw/phys segment counts
  block: allow queue dma_alignment of zero
  blktrace: Add blktrace ioctls to SCSI generic devices
2008-01-29 08:51:56 +11:00
Linus Torvalds
f0f0052069 Merge branch 'blk-end-request' of git://git.kernel.dk/linux-2.6-block
* 'blk-end-request' of git://git.kernel.dk/linux-2.6-block: (30 commits)
  blk_end_request: changing xsysace (take 4)
  blk_end_request: changing ub (take 4)
  blk_end_request: cleanup of request completion (take 4)
  blk_end_request: cleanup 'uptodate' related code (take 4)
  blk_end_request: remove/unexport end_that_request_* (take 4)
  blk_end_request: changing scsi (take 4)
  blk_end_request: add bidi completion interface (take 4)
  blk_end_request: changing ide-cd (take 4)
  blk_end_request: add callback feature (take 4)
  blk_end_request: changing ide normal caller (take 4)
  blk_end_request: changing cpqarray (take 4)
  blk_end_request: changing cciss (take 4)
  blk_end_request: changing ide-scsi (take 4)
  blk_end_request: changing s390 (take 4)
  blk_end_request: changing mmc (take 4)
  blk_end_request: changing i2o_block (take 4)
  blk_end_request: changing viocd (take 4)
  blk_end_request: changing xen-blkfront (take 4)
  blk_end_request: changing viodasd (take 4)
  blk_end_request: changing sx8 (take 4)
  ...
2008-01-29 08:51:32 +11:00
Linus Torvalds
68fbda7de0 Merge branch 'sg' of git://git.kernel.dk/linux-2.6-block
* 'sg' of git://git.kernel.dk/linux-2.6-block:
  SG: work with the SCSI fixed maximum allocations.
  SG: Convert SCSI to use scatterlist helpers for sg chaining
  SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
2008-01-29 08:51:05 +11:00
Linus Torvalds
d4928196fe Merge branch 'cfq-ioc-share' of git://git.kernel.dk/linux-2.6-block
* 'cfq-ioc-share' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: kill some big inlines
  cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions
  kernel: add CLONE_IO to specifically request sharing of IO contexts
  io_context sharing - anticipatory changes
  block: cfq: make the io contect sharing lockless
  io_context sharing - cfq changes
  io context sharing: preliminary support
  ioprio: move io priority from task_struct to io_context
2008-01-29 08:50:42 +11:00
Robert Schedel
fe56caa97e HID: Support Samsung IR remote
Samsung USB remotes (0419:0001) are rejected by kernel 2.6.23, because the
report descriptor from the remote contains a 48 bit HID report field. HID 1.11
states: Fields may span at most 4 bytes.

This patch, based on 2.6.23, fixes this by modifying the internal report
descriptor in hid-quirks.c. Additional user space support (e.g. LIRC) is
required to fetch the information from the hiddev interface.

The burden to reconstruct the data is moved into userspace (lirc through hiddev).
There is no need to set HID_QUIRK_HIDDEV quirk, as the device has also output
applications, which trigger the creation of hiddev device automatically.

Signed-off-by: Robert Schedel <r.schedel@yahoo.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:22 +01:00
Fengguang Wu
70d215c4a7 HID: the `bit' in hidinput_mapping_quirks() is an out parameter
Fix a panic, by changing
	hidinput_mapping_quirks(,, unsigned long *bit,)
to
	hidinput_mapping_quirks(,, unsigned long **bit,)

The `bit' in this function is an out parameter.
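
A minimal userspace sketch of why the extra level of indirection matters. All names here (keybit, relbit, pick_bitmap and so on) are hypothetical and are not the real hid-input code; the point is only that a callee cannot change the caller's pointer unless it receives the pointer's address.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical bitmaps the callee may select between. */
static unsigned long keybit[1];
static unsigned long relbit[1];

/* Broken variant: assigning to `bit` only changes the callee's local copy. */
void pick_bitmap_by_value(int want_rel, unsigned long *bit)
{
	bit = want_rel ? relbit : keybit;
	(void)bit;	/* the caller never sees this assignment */
}

/* Fixed variant: the caller passes &bit, so the new pointer is visible. */
void pick_bitmap(int want_rel, unsigned long **bit)
{
	*bit = want_rel ? relbit : keybit;
}

/* Returns 1 if only the pointer-to-pointer variant updated the caller. */
int demo_out_parameter(void)
{
	unsigned long *bit = NULL;

	pick_bitmap_by_value(1, bit);
	if (bit != NULL)
		return 0;	/* would mean the by-value call had an effect */

	pick_bitmap(1, &bit);
	return bit == relbit;
}
```

With the single-pointer prototype the caller keeps using its stale pointer, which is the kind of mistake that can end in a panic when that pointer is later dereferenced.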

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:22 +01:00
Jiri Kosina
628edcde87 HID: proper handling of MS 4k and 6k devices
This removes ugly macros IS_* to distinguish devices that
need special handling in hid-input, and establish proper
quirks for them.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:21 +01:00
Jiri Kosina
36ccaad640 HID: hid-input quirk for BTC 8193
The BTC 8193 keyboard handles its scrollwheel in a very non-standard way.
It produces two non-standard usages for scrolling up and down, in
both cases with a positive value equal to 1. We handle this by a temporary
mapping, which we then catch in the quirk event handler and remap to a
negative HWHEEL event in order to introduce correct behavior.

Also the button requires special mapping, as it triggers standard-violating
usage code.

Reported in kernel.org bugzilla #9385

Reported-by: Kir Kolyshkin <kir@sacred.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:21 +01:00
Jiri Kosina
87bc2aa993 HID: separate hid-input event quirks from generic code
This patch also separates the hid-input quirks that have to be
applied at the time the event occurs, so that the generic code
handling HUT-compliant devices is not messed up by them too much.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:20 +01:00
Jiri Kosina
10bd065fac HID: refactor mapping to input subsystem for quirky devices
Currently, the handling of mapping between hid and input for devices
that don't conform to HUT 1.12 specification is very messy -- no per-device
handling, no blacklists, conditions on idVendor and idProduct placed
all over the code.

This patch moves all the device-specific input mapping to a separate
file, and introduces a blacklist-style handling for non-standard
device-specific mappings.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:20 +01:00
Jiri Kosina
af9e0eacdc HID: add full support for Genius KB-29E
Genius KB-29E has a broken report descriptor, which causes some of the
Consumer usages to appear incorrectly as Button usages. We fix it by
fixing the report descriptor before it is parsed.

Also a few of the keys violate the HUT standard, so they need special
handling. They currently fall into the "Reserved" range as per HUT 1.12.

Reported-by: Szekeres Istvan <szekeres@iii.hu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:20 +01:00
Pavel Troller
c80e5ffac0 HID: Implement horizontal wheel handling for A4 Tech X5-005D
This mouse distinguishes horizontal wheel from vertical by a special "pseudo
event" GenericDesktop.00b8, with values of 0 for vertical and 8 for horizontal
wheel. Because this event is supplied by the parser too late, we need to delay
a wheel event, wait for this one and send either REL_WHEEL or REL_HWHEEL to
input depending on the event value.
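
The delayed-dispatch idea can be sketched as a tiny state machine. The selector values 0 and 8 are taken from the text above; the struct, the function names and the AXIS_* enumerators are hypothetical stand-ins (AXIS_WHEEL and AXIS_HWHEEL play the role of REL_WHEEL and REL_HWHEEL), not the driver's actual code.

```c
#include <assert.h>

/* Stand-ins for "no event", REL_WHEEL and REL_HWHEEL. */
enum out_axis { AXIS_NONE, AXIS_WHEEL, AXIS_HWHEEL };

struct wheel_state {
	int pending;	/* a wheel value is waiting for the pseudo event */
	int value;	/* the delayed wheel delta */
};

/* Wheel usage arrives first: remember it, emit nothing yet. */
void wheel_seen(struct wheel_state *s, int value)
{
	s->pending = 1;
	s->value = value;
}

/*
 * Pseudo event arrives later: selector 0 means vertical, 8 means
 * horizontal. Only now is the delayed wheel value released.
 */
enum out_axis selector_seen(struct wheel_state *s, int selector, int *value_out)
{
	if (!s->pending)
		return AXIS_NONE;

	s->pending = 0;
	*value_out = s->value;
	return selector == 8 ? AXIS_HWHEEL : AXIS_WHEEL;
}
```

Holding the value until the selector is known is what lets one physical report be routed to either axis.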

Signed-off-by: Pavel Troller <patrol@sinus.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:19 +01:00
Michel Daenzer
81e1a87550 HID: Rename some code identifiers from PowerBook specific to Apple generic
Preserve identifiers exposed in build and run time configuration though in
order not to break existing configurations.

This is in preparation for adding support for Apple aluminum USB keyboards.

Signed-off-by: Michel Daenzer <michel@tungstengraphics.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-01-28 14:51:19 +01:00
Russell King
c00d4ffdba Merge branch 'orion' into devel
* orion: (26 commits)
  [ARM] Orion: implement power-off method for QNAP TS-109/209
  [ARM] Orion: add support for QNAP TS-109/TS-209
  [ARM] Orion: I2C support
  [I2C] i2c-mv64xxx: Don't set i2c_adapter.retries
  [I2C] Split mv643xx I2C platform support
  [ARM] Orion: enable CONFIG_RTC_DRV_M41T80 for D-Link DNS-323
  [ARM] Orion defconfig
  [ARM] Orion: add support for Orion/MV88F5181 based D-Link DNS-323
  [ARM] Orion: MV88F5181 support bits
  [ARM] Orion: Buffalo/Revogear Kurobox Pro support
  [ARM] OrionNAS RD board support
  [ARM] Orion: support for Marvell Orion-2 (88F5281) Development Board
  [ARM] Orion: common platform setup for Gigabit Ethernet port
  [ARM] Orion: platform device registration for UART, USB and NAND
  [ARM] Orion: system timer support
  [ARM] Orion edge GPIO IRQ support
  [ARM] Orion: IRQ support
  [ARM] Orion: provide GPIO method for enabling hardware assisted blinking
  [ARM] Orion: GPIO support
  [ARM] Orion: programable address map support
  ...

Conflicts:

	arch/arm/Kconfig
	arch/arm/Makefile

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-01-28 13:21:30 +00:00
James Bottomley
7cedb1f17f SG: work with the SCSI fixed maximum allocations.
SCSI sg table allocation has a maximum size (of SCSI_MAX_SG_SEGMENTS,
currently 128) and this will cause a BUG_ON() in SCSI if something
tries an allocation over it.  This patch adds a size limit to the
chaining allocator to allow the specification of the maximum
allocation size for chaining, so we always chain in units of the
maximum SCSI allocation size.
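
To make the "chain in units of the maximum allocation size" idea concrete, here is a small sketch that counts how many allocations a chained table would need. It assumes that every non-final allocation spends its last entry on the chain link; sg_chunks_needed() and DEMO_MAX_SG_SEGMENTS are hypothetical names, and this is not the allocator's real code.

```c
#include <assert.h>

#define DEMO_MAX_SG_SEGMENTS 128	/* the limit quoted in the text above */

/*
 * Count the allocations needed for `nents` data entries when each
 * allocation holds at most `max_ents` entries. A chunk that is followed
 * by another one carries max_ents - 1 data entries plus one chain link.
 * This sketch requires max_ents >= 2 and returns 0 otherwise.
 */
unsigned int sg_chunks_needed(unsigned int nents, unsigned int max_ents)
{
	unsigned int chunks = 0;
	unsigned int left = nents;

	if (max_ents < 2)
		return 0;

	while (left) {
		unsigned int data = left;

		if (data > max_ents)
			data = max_ents - 1;	/* reserve one slot for the link */
		left -= data;
		chunks++;
	}
	return chunks;
}
```

For example, 129 entries with a limit of 128 need two chunks: 127 data entries plus a link in the first, and the remaining 2 in the second.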

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:54:49 +01:00
James Bottomley
fa0ccd837e block: implement drain buffers
These DMA drain buffer implementations in drivers are pretty horrible
to do in terms of manipulating the scatterlist.  Plus they're being
done at least in drivers/ide and drivers/ata, so we now have code
duplication.

The one use case for this, as I understand it is AHCI controllers doing
PIO mode to mmc devices but translating this to DMA at the controller
level.

So, what about adding a callback to the block layer that permits the
adding of the drain buffer for the problem devices.  The idea is that
you'd do this in slave_configure after you find one of these devices.

The beauty of doing it in the block layer is that it quietly adds the
drain buffer to the end of the sg list, so it automatically gets mapped
(and unmapped) without anything unusual having to be done to the
scatterlist in driver/scsi or drivers/ata and without any alteration to
the transfer length.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:54:11 +01:00
Jens Axboe
fadad878cc kernel: add CLONE_IO to specifically request sharing of IO contexts
syslets (or other threads/processes that want io context sharing) can
set this to enforce sharing of io context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:36 +01:00
Jens Axboe
4ac845a2e9 block: cfq: make the io contect sharing lockless
The io context sharing introduced a per-ioc spinlock, that would protect
the cfq io context lookup. That is a regression from the original, since
we never needed any locking there because the ioc/cic were process private.

The cic lookup is changed from an rbtree construct to a radix tree, which
we can then use RCU to make the reader side lockless. That is the performance
critical path, modifying the radix tree is only done on process creation
(when that process first does IO, actually) and on process exit (if that
process has done IO).

As it so happens, radix trees are also much faster for this type of
lookup where the key is a pointer. It's a very sparse tree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:33 +01:00
Jens Axboe
d38ecf935f io context sharing: preliminary support
Detach task state from ioc, instead keep track of how many processes
are accessing the ioc.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:31 +01:00
Jens Axboe
fd0928df98 ioprio: move io priority from task_struct to io_context
This is where it belongs and then it doesn't take up space for a
process that doesn't do IO.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:29 +01:00
Kiyoshi Ueda
5450d3e1d6 blk_end_request: cleanup 'uptodate' related code (take 4)
This patch converts 'uptodate' arguments of no longer exported
interfaces, end_that_request_first/last, to 'error', and removes
internal conversions for it in blk_end_request interfaces.

Also, this patch removes no longer needed end_io_error().

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:13 +01:00
Kiyoshi Ueda
3bcddeac1c blk_end_request: remove/unexport end_that_request_* (take 4)
This patch removes the following functions:
  o end_that_request_first()
  o end_that_request_chunk()
and stops exporting the functions below:
  o end_that_request_last()

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:12 +01:00
Kiyoshi Ueda
e3a04fe34a blk_end_request: add bidi completion interface (take 4)
This patch adds a variant of the interface, blk_end_bidi_request(),
which completes a bidi request.

A bidi request must be completed as a whole, both rq and rq->next_rq
at once.  So the interface has 2 arguments for completion size.

As for ->end_io, only rq->end_io is called (rq->next_rq->end_io is not
called).  So if special completion handling is needed, the handler
must be set to rq->end_io.
And the handler must take care of freeing next_rq too, since
the interface doesn't take care of it if rq->end_io is not NULL.

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:08 +01:00
Kiyoshi Ueda
e19a3ab058 blk_end_request: add callback feature (take 4)
This patch adds a variant of the interface, blk_end_request_callback(),
which has driver callback feature.

Drivers may need to do special work between end_that_request_first()
and end_that_request_last().
For such drivers, blk_end_request_callback() allows them to pass
a callback function which is called between end_that_request_first()
and end_that_request_last().

This interface is only for fallback of other blk_end_request interfaces.
Drivers should avoid their tricky behaviors and use other interfaces
as much as possible.

Currently, only one driver, ide-cd, needs this interface.
So this interface should/will be removed, after the driver removes
such tricky behaviors.

o ide-cd (cdrom_newpc_intr())
  In PIO mode, cdrom_newpc_intr() needs to defer end_that_request_last()
  until the device clears DRQ_STAT and raises an interrupt after
  end_that_request_first().
  So end_that_request_first() and end_that_request_last() are called
  separately in cdrom_newpc_intr().

  This means blk_end_request_callback() has to return without
  completing the request even if nothing is left over in the request.
  To satisfy this requirement, the callback function has a return value
  so that drivers can tell blk_end_request_callback() to return
  without completing the request.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:04 +01:00
Kiyoshi Ueda
3b11313a6c blk_end_request: add/export functions to get request size (take 4)
This patch adds/exports functions to get the size of request in bytes.
They are useful because blk_end_request interfaces take bytes
as a completed I/O size instead of sectors.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:35:56 +01:00
Kiyoshi Ueda
336cdb4003 blk_end_request: add new request completion interface (take 4)
This patch adds 2 new interfaces for request completion:
  o blk_end_request()   : called without queue lock
  o __blk_end_request() : called with queue lock held

blk_end_request takes 'error' as an argument instead of 'uptodate',
which current end_that_request_* take.
The meanings of values are below and the value is used when bio is
completed.
    0 : success
  < 0 : error
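
A small sketch of how the two conventions relate. The 'error' side (0 for success, negative for failure) is taken from the text above; the 'uptodate' side is an assumption made for illustration (positive means success, 0 means a generic I/O error, negative carries a specific errno), and uptodate_to_error() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <errno.h>

/*
 * Translate an old-style 'uptodate' value into a new-style 'error':
 *   uptodate  > 0  ->  0       (success)
 *   uptodate == 0  ->  -EIO    (generic failure, assumed mapping)
 *   uptodate  < 0  ->  itself  (already a negative errno)
 */
int uptodate_to_error(int uptodate)
{
	if (uptodate > 0)
		return 0;
	if (uptodate == 0)
		return -EIO;
	return uptodate;
}
```

Passing the error directly removes the need for this kind of translation at every call site.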

Some device drivers call some generic functions below between
end_that_request_{first/chunk} and end_that_request_last().
  o add_disk_randomness()
  o blk_queue_end_tag()
  o blkdev_dequeue_request()
These are called in the blk_end_request interfaces as a part of
generic request completion.
So all device drivers now call the above functions.
To decide whether to call blkdev_dequeue_request(), blk_end_request
uses list_empty(&rq->queuelist) (blk_queued_rq() macro is added for it).
So drivers must re-initialize it using list_init() or so before calling
blk_end_request if drivers use it for their own specific purpose.
(Currently, there is no driver which completes a request without
 re-initializing the queuelist after using it.  So rq->queuelist
 can be used for the purpose above.)

"Normal" drivers can be converted to use blk_end_request()
in a standard way shown below.

 a) end_that_request_{chunk/first}
    spin_lock_irqsave()
    (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
    end_that_request_last()
    spin_unlock_irqrestore()
    => blk_end_request()

 b) spin_lock_irqsave()
    end_that_request_{chunk/first}
    (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
    end_that_request_last()
    spin_unlock_irqrestore()
    => spin_lock_irqsave()
       __blk_end_request()
       spin_unlock_irqrestore()

 c) spin_lock_irqsave()
    (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
    end_that_request_last()
    spin_unlock_irqrestore()
    => blk_end_request()   or   spin_lock_irqsave()
                                __blk_end_request()
                                spin_unlock_irqrestore()

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:35:53 +01:00
Jens Axboe
0db9299f48 SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
Manually doing chained sg lists is not trivial, so add some helpers
to make sure that drivers get it right.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:05:27 +01:00
Pete Wyckoff
482eb68916 block: allow queue dma_alignment of zero
Let queue_dma_alignment return 0 if it was specifically set to 0.
This permits devices with no particular alignment restrictions to
use arbitrary user space buffers without copying.
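
As a rough illustration, assuming the queue's alignment requirement is expressed as a bit mask (for example 511 for 512-byte alignment), the decision whether a user buffer can be used directly looks something like the sketch below. buffer_needs_copy() is a hypothetical helper, not the block layer's actual function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the buffer violates the alignment mask and would have to
 * be copied, 0 if it can be used as-is. A mask of 0 means "no alignment
 * restriction", so every address and length passes.
 */
int buffer_needs_copy(uintptr_t addr, size_t len, unsigned int align_mask)
{
	return ((addr | len) & align_mask) != 0;
}
```

With a mask of 0 the check never fails, which is what allows arbitrary user space buffers to be used without copying.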

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:04:46 +01:00
Christof Schmitt
6da127ad09 blktrace: Add blktrace ioctls to SCSI generic devices
Since the SCSI layer uses the request queues from the block layer, blktrace can
also be used to trace the requests to all SCSI devices (like SCSI tape drives),
not only disks. The only missing part is the ioctl interface to start and stop
tracing.

This patch adds the SETUP, START, STOP and TEARDOWN ioctls from blktrace to the
sg device files. With this change, blktrace can be used for SCSI devices like
for disks, e.g.: blktrace -d /dev/sg1 -o - | blkparse -i -

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:04:46 +01:00
David Brownell
e9f1373b64 i2c: Add i2c_new_dummy() utility
This adds an i2c_new_dummy() primitive to help work with devices
that consume multiple addresses, which include many I2C eeproms
and at least one RTC.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-27 18:14:52 +01:00
David Brownell
9b766b814d i2c: Stop using the redundant client list
The i2c_adapter.clients list of i2c_client nodes duplicates driver
model state.  This patch starts removing that list, letting us remove
most existing users of those i2c-core lists.

 * The core I2C code now iterates over the driver model's list instead
   of the i2c-internal one in some places where it's safe:
      - Passing a command/ioctl to each client, a mechanism
        used almost exclusively by DVB adapters;
      - Device address checking, in both i2c-core and i2c-dev.

 * Provide i2c_verify_client() to use with driver model iterators.

 * Flag the relevant i2c_adapter and i2c_client fields as deprecated,
   to help prevent new users from appearing.

For the moment the list needs to stick around, since some issues show
up when deleting devices created by legacy I2C drivers.  (They don't
follow standard driver model rules.  Removing those devices can cause
self-deadlocks.)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-27 18:14:51 +01:00
Jean Delvare
7bca0871ca i2c: Discard unused driver IDs
Discard all I2C driver IDs that aren't used anywhere. That's not just a
couple of them, but more like 49 or one quarter of all defined IDs! And
this is just a first pass, next will come all IDs that are set but
never used, or used but never set.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-27 18:14:50 +01:00
David Brownell
6d16bfb5e8 i2c/tps65010: move header to <linux/i2c/...>
Move the tps65010 header file from the OMAP arch directory to the
more generic <linux/i2c/...> directory, and remove the spurious
dependency of this driver on OMAP.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-27 18:14:49 +01:00
Jean Delvare
026526f5af i2c: Drop redundant i2c_driver.list
i2c_driver.list is superfluous, this list duplicates the one
maintained by the driver core. Drop it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
2008-01-27 18:14:49 +01:00
Jean Delvare
87c6c22945 i2c: Drop redundant i2c_adapter.list
i2c_adapter.list is superfluous, this list duplicates the one
maintained by the driver core. Drop it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
2008-01-27 18:14:48 +01:00
Jean Delvare
e48d33193d i2c: Change prototypes of refcounting functions
Use more standard prototypes for i2c_use_client() and
i2c_release_client(). The former now returns a pointer to the client,
and the latter no longer returns anything. This matches what all other
subsystems do.
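
The prototype shapes described above follow the usual get/put pattern, sketched here in plain C. struct demo_client and both function names are hypothetical, the counter is not atomic, and returning NULL for a NULL argument is an assumption; only the return types (pointer for the "use" side, void for the "release" side) come from the text.

```c
#include <assert.h>
#include <stddef.h>

struct demo_client {
	int refcount;	/* plain int for illustration only, not atomic */
};

/* "get": take a reference and hand the object back to the caller. */
struct demo_client *demo_use_client(struct demo_client *c)
{
	if (c == NULL)
		return NULL;
	c->refcount++;
	return c;
}

/* "put": drop a reference; there is nothing useful to return. */
void demo_release_client(struct demo_client *c)
{
	if (c != NULL)
		c->refcount--;
}
```

Returning the pointer from the "get" side lets callers take the reference and test for failure in a single expression.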

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: David Brownell <david-b@pacbell.net>
2008-01-27 18:14:48 +01:00
Jean Delvare
bdc511f438 i2c: Use the driver model reference counting
Don't implement our own reference counting mechanism for i2c clients
when the driver model already has one.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: David Brownell <david-b@pacbell.net>
2008-01-27 18:14:48 +01:00
Mark M. Hoffman
bfb6df24fa i2c: Constify client address data
This patch allows much of the I2C client address data to move from initdata
into text.
    
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-27 18:14:46 +01:00
Adrian Bunk
7e8b99251b i2c: some overdue driver removal
This patch contains the overdue removal of three I2C drivers.

[JD: In fact only i2c-ixp4xx can be removed at the moment, the other two
platforms don't implement the generic GPIO layer yet.]

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-27 18:14:46 +01:00
Adrian Bunk
eee87d3196 i2c: the scheduled I2C RTC driver removal
This patch contains the scheduled removal of legacy I2C RTC drivers with 
replacement drivers.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-27 18:14:45 +01:00
Bartlomiej Zolnierkiewicz
7267c33774 ide: remove REQ_TYPE_ATA_CMD
Based on the earlier work by Tejun Heo.

All users are gone so we can finally remove it.

Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:13 +01:00
Bartlomiej Zolnierkiewicz
34f5d5ae35 ide: switch set_xfer_rate() to use REQ_TYPE_ATA_TASKFILE requests
Based on the earlier work by Tejun Heo.

Switch set_xfer_rate() to use REQ_TYPE_ATA_TASKFILE requests
and make ide_wait_cmd() static.

There should be no functionality changes caused by this patch.

Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:12 +01:00
Bartlomiej Zolnierkiewicz
2624565caa ide: use wait_drive_not_busy() in drive_cmd_intr() (take 2)
Use wait_drive_not_busy() in drive_cmd_intr().

v2:
* Fix wait_drive_not_busy() comment (noticed by Sergei).

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:11 +01:00
Bartlomiej Zolnierkiewicz
4906f3b4cd ide: kill DATA_READY define
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:11 +01:00
Tejun Heo
4d7a984bdc ide: task_end_request() fix
task_end_request() modified to always call ide_end_drive_cmd()
for taskfile requests.  Previously, ide_end_drive_cmd() was
called only when IDE_TFLAG_FLAGGED was set.  Also,
ide_dma_intr() is modified to use task_end_request().

Enables TASKFILE ioctls to get valid register outputs on
successful completion.

Bart:
- ported it over recent IDE changes

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:11 +01:00
Bartlomiej Zolnierkiewicz
657cc1a8f6 ide: set IDE_TFLAG_IN_* flags before queuing/executing command
* Add IDE_TFLAG_{HOB,TF,DEVICE} defines.

* Set IDE_TFLAG_IN_* flags in {do_rw,ide_no_data,ide_raw}_taskfile() users.

* Remove no longer needed ->tf_flags setup from ide_end_drive_cmd().

There should be no functionality changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:10 +01:00
Tejun Heo
35cf2b94d0 ide: fix ->io_32bit race in ide_taskfile_ioctl()
In ide_taskfile_ioctl(), there was a race condition involving
drive->io_32bit.  It was cleared and restored during ioctl
requests but there was no synchronization with other requests.
So, other requests could execute with the altered ->io_32bit
setting or updated drive->io_32bit could be overwritten by
ide_taskfile_ioctl().

This patch adds IDE_TFLAG_IO_16BIT flag to indicate to
ide_pio_datablock() that 16-bit I/O is needed regardless of
drive->io_32bit setting.
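
The fix pattern, a per-request flag instead of temporarily modifying shared per-drive state, can be sketched as follows. struct demo_drive, demo_use_32bit_io() and the bit value chosen for DEMO_TFLAG_IO_16BIT are hypothetical; only the idea of a 16-bit override flag comes from the text above.

```c
#include <assert.h>

#define DEMO_TFLAG_IO_16BIT	(1u << 0)	/* hypothetical bit value */

struct demo_drive {
	int io_32bit;	/* shared setting, also read by other requests */
};

/*
 * Decide the transfer width for one request. The drive is const here:
 * the request-local flag overrides the setting without writing to it,
 * so concurrent requests cannot observe a temporarily altered value.
 */
int demo_use_32bit_io(const struct demo_drive *drive, unsigned int tf_flags)
{
	if (tf_flags & DEMO_TFLAG_IO_16BIT)
		return 0;
	return drive->io_32bit != 0;
}
```

Since nothing is cleared and restored any more, there is no window in which another request could run with the wrong setting.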

Bart:
- ported it over recent IDE changes

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:10 +01:00
Bartlomiej Zolnierkiewicz
9e47be0c97 ide: remove broken disk byte-swapping support
Remove broken disk byte-swapping support:
- it can cause a data corruption on SMP (or if using PREEMPT on UP)
- all data coming from disk are byte-swapped by taskfile_*_data() which
  results in incorrect identify data being reported by /proc/ide/ and IOCTLs
- "hdx=bswap/byteswap" kernel parameter has been broken on m68k host drivers
  (including Atari/Q40 ones) since 2.5.x days (because of 'hwif' zero-ing)
- byte-swapping is limited to PIO transfers (for working with TiVo disks on
  x86 machines using user-space solutions or dm-byteswap should result in
  much better performance because DMA can be used)

For previous discussions please see:

http://www.ussg.iu.edu/hypermail/linux/kernel/0201.0/0768.html
http://lkml.org/lkml/2004/2/28/111

[ I have dm-byteswap device mapper target if somebody is interested
  (patch is for 2.6.4 though but I'll dust it off if needed). ]

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:09 +01:00
Bartlomiej Zolnierkiewicz
81ca691981 ide: add ide_set_irq() inline helper
There should be no functionality changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:08 +01:00
Bartlomiej Zolnierkiewicz
ade2daf9c6 ide: make remaining built-in only IDE host drivers modular (take 2)
* Make remaining built-in only IDE host drivers modular, add ide-scan-pci.c
  file for probing PCI host drivers registered with IDE core (special case
  for built-in IDE and CONFIG_IDEPCI_PCIBUS_ORDER=y) and then take care of
  the ordering in which all IDE host drivers are probed when IDE is built-in
  during link time.

* Move probing of gayle, falconide, macide, q40ide and buddha (m68k arch
  specific) host drivers, before PCI ones (no PCI on m68k), ide-cris (cris
  arch specific), cmd640 (x86 arch specific) and pmac (ppc arch specific).

* Move probing of ide-cris (cris arch specific) host driver before cmd640
  (x86 arch specific).

* Move probing of mpc8xx (ppc specific) host driver before ide-pnp (depends
  on ISA and none of the ppc platforms that use mpc8xx support ISA) and
  ide-h8300 (h8300 arch specific).

* Add "probe_vlb" kernel parameter to cmd640 host driver and update
  Documentation/ide.txt accordingly.

* Make IDE_ARM config option visible so it can also be disabled if needed.

* Remove bogus comment from ide.c while at it.

v2:
* Fix two issues spotted by Sergei:
  - replace ENOMEM error value by ENOENT in ide-h8300 host driver
  - fix MODULE_PARM_DESC() in cmd640 host driver

Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:07 +01:00
Bartlomiej Zolnierkiewicz
cbb010c180 ide: drop 'initializing' argument from ide_register_hw()
* Rename init_hwif_data() to ide_init_port_data() and export it.

* For all users of ide_register_hw() with 'initializing' argument set
  hwif->present and hwif->hold are always zero so convert these host
  drivers to use ide_find_port()+ide_init_port_data()+ide_init_port_hw()
  instead (also no need for init_hwif_default() call since the setup
  done by it gets over-ridden by ide_init_port_hw() call).

* Drop 'initializing' argument from ide_register_hw().

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:06 +01:00
Bartlomiej Zolnierkiewicz
57c802e84f ide: add ide_init_port_hw() helper
* Add ide_init_port_hw() helper.

* rapide.c: convert rapide_locate_hwif() to rapide_setup_ports()
  and use ide_init_port_hw().

* ide_platform.c: convert plat_ide_locate_hwif() to plat_ide_setup_ports()
  and use ide_init_port_hw().

* sgiioc4.c: use ide_init_port_hw().

* pmac.c: add 'hw_regs_t *hw' argument to pmac_ide_setup_device(),
  setup 'hw' in pmac_ide_{macio,pci}_attach() and use ide_init_port_hw()
  in pmac_ide_setup_device().

This patch is a preparation for the future changes in the IDE probing code.

There should be no functionality changes caused by this patch.

Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: Jeremy Higdon <jeremy@sgi.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:05 +01:00
Olof Johansson
b0d5bc27ce ide: Fix build break caused by "ide: remove ideprobe_init()"
Fix build break of powerpc holly_defconfig:

In file included from arch/powerpc/platforms/embedded6xx/holly.c:24:
include/linux/ide.h:1206: error: 'CONFIG_IDE_MAX_HWIFS' undeclared here (not in a function)

There's no need to have a sized array in the prototype, might as well
turn it into a pointer.

It could probably be argued that large parts of the include file can be
covered under #ifdef CONFIG_IDE, but that's a larger undertaking.
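
A minimal sketch of why the sized array can be dropped from the prototype:
in C, an array parameter is adjusted to a pointer, so both spellings below
declare the same function type. DEMO_MAX_HWIFS and the count_used_*() names
are hypothetical stand-ins, not taken from <linux/ide.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_HWIFS 4	/* hypothetical stand-in for CONFIG_IDE_MAX_HWIFS */

/*
 * Sized-array spelling: the size is documentation only, the parameter
 * type is adjusted to 'uint8_t *'.  The prototype still needs the macro
 * to be defined wherever the header is included.
 */
int count_used_sized(uint8_t idx[DEMO_MAX_HWIFS], size_t n);

/* Pointer spelling: same function type, no dependency on the macro. */
int count_used_ptr(uint8_t *idx, size_t n);

int count_used_ptr(uint8_t *idx, size_t n)
{
	int used = 0;

	assert(idx != NULL);
	for (size_t i = 0; i < n; i++)
		if (idx[i] != 0xff)	/* 0xff marks an unused slot in this sketch */
			used++;
	return used;
}

int count_used_sized(uint8_t idx[DEMO_MAX_HWIFS], size_t n)
{
	return count_used_ptr(idx, n);
}
```

Because the two types are identical, callers compile unchanged whichever
spelling the header uses.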

Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:05 +01:00
Bartlomiej Zolnierkiewicz
151575e464 ide: remove ideprobe_init()
* Rename ide_device_add() to ide_device_add_all() and make it accept
  'u8 idx[MAX_HWIFS]' instead of 'u8 idx[4]' as an argument.

* Add ide_device_add() wrapper for ide_device_add_all().

* Convert ide_generic_init() to use ide_device_add_all().

* Remove no longer needed ideprobe_init().

There should be no functionality changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:05 +01:00
Bartlomiej Zolnierkiewicz
f01393e48c ide: merge ->fixup and ->quirkproc methods
* Assign drive->quirk_list in ->quirkproc implementations:
  - hpt366.c::hpt3xx_quirkproc()
  - pdc202xx_new.c::pdcnew_quirkproc()
  - pdc202xx_old.c::pdc202xx_quirkproc()

* Make ->quirkproc void.

* Move calling ->quirkproc from do_identify() to probe_hwif().

* Convert it821x_fixups() to it821x_quirkproc() in it821x.c.

* Convert siimage_fixup() to sil_quirkproc() in siimage.c, also remove
  no longer needed drive->present check from is_dev_seagate_sata().

* Convert ide_undecoded_slave() to accept 'drive' instead of 'hwif'
  as an argument.  Then convert ide_register_hw() to accept 'quirkproc'
  argument instead of 'fixup' one.

* Remove no longer needed ->fixup method.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:03 +01:00
Bartlomiej Zolnierkiewicz
15ce926ada ide: merge ->dma_host_{on,off} methods into ->dma_host_set method
Merge ->dma_host_{on,off} methods into ->dma_host_set method
which takes 'int on' argument.

There should be no functionality changes caused by this patch.
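
As an illustration only, the shape of such a merge might look like the
sketch below: one method taking an 'int on' switch replaces a pair of
on/off methods. All demo_* names are hypothetical and the struct is not
the real ide_hwif_t.

```c
#include <assert.h>
#include <stddef.h>

struct demo_port;

struct demo_port_ops {
	/* single method with a switch argument instead of separate on/off */
	void (*dma_host_set)(struct demo_port *port, int on);
};

struct demo_port {
	const struct demo_port_ops *ops;
	int dma_enabled;
};

static void demo_dma_host_set(struct demo_port *port, int on)
{
	port->dma_enabled = on ? 1 : 0;
}

static const struct demo_port_ops demo_ops = {
	.dma_host_set = demo_dma_host_set,
};

void demo_port_init(struct demo_port *port)
{
	assert(port != NULL);
	port->ops = &demo_ops;
	port->dma_enabled = 0;
}

/* Former on/off entry points become thin wrappers around the one method. */
void demo_dma_on(struct demo_port *port)
{
	port->ops->dma_host_set(port, 1);
}

void demo_dma_off(struct demo_port *port)
{
	port->ops->dma_host_set(port, 0);
}
```

A host driver then has one function pointer to fill in rather than two.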

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:03 +01:00
Bartlomiej Zolnierkiewicz
4a546e046d ide: remove ->ide_dma_on and ->dma_off_quietly methods from ide_hwif_t
* Make ide_dma_off_quietly() and __ide_dma_on() always available.

* Drop "__" prefix from __ide_dma_on().

* Check for presence of ->dma_host_on instead of ->ide_dma_on.

* Convert all users of ->ide_dma_on and ->dma_off_quietly methods
  to use ide_dma_on() and ide_dma_off_quietly() instead.

* Remove no longer needed ->ide_dma_on and ->dma_off_quietly methods
  from ide_hwif_t.

* Make ide_dma_on() void.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:01 +01:00
Bartlomiej Zolnierkiewicz
8704de8f29 cy82c693: add ->set_dma_mode method
* Fix SWDMA/MWDMA masks in cy82c693_chipset.

* Add IDE_HFLAG_CY82C693 host flag and use it in ide_tune_dma() to
  check whether the DMA should be enabled even if ide_max_dma_mode()
  fails.

* Convert cy82c693_dma_enable() to become cy82c693_set_dma_mode()
  and remove no longer needed cy82c693_ide_dma_on().  Then set
  IDE_HFLAG_CY82C693 instead of IDE_HFLAG_TRUST_BIOS_FOR_DMA in
  cy82c693_chipset.

* Bump driver version.

As a result of this patch cy82c693 driver will configure and use DMA on
all SWDMA0-2 and MWDMA0-2 capable ATA devices instead of relying on BIOS.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-26 20:13:00 +01:00
Jean Delvare
2f0a8df40f [I2C] i2c-mv64xxx: Don't set i2c_adapter.retries
I2C adapter drivers are supposed to handle retries on nack by themselves
if they do, so there's no point in setting .retries if they don't.

As this retry mechanism is going away (at least in its current form),
clean this up now so that we don't get build failures later.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mark A. Greer <mgreer@mvista.com>
2008-01-26 15:04:01 +00:00
Tzachi Perelstein
a0832798c0 [I2C] Split mv643xx I2C platform support
The motivation for this change is to allow other chips, like the
Marvell Orion ARM SoC family, to use the existing i2c-mv64xxx driver.

Signed-off-by: Tzachi Perelstein <tzachi@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Acked-by: Mark A. Greer <mgreer@mvista.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
2008-01-26 15:03:59 +00:00
Linus Torvalds
9b73e76f3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
  [SCSI] usbstorage: use last_sector_bug flag universally
  [SCSI] libsas: abstract STP task status into a function
  [SCSI] ultrastor: clean up inline asm warnings
  [SCSI] aic7xxx: fix firmware build
  [SCSI] aacraid: fib context lock for management ioctls
  [SCSI] ch: remove forward declarations
  [SCSI] ch: fix device minor number management bug
  [SCSI] ch: handle class_device_create failure properly
  [SCSI] NCR5380: fix section mismatch
  [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
  [SCSI] IB/iSER: add logical unit reset support
  [SCSI] don't use __GFP_DMA for sense buffers if not required
  [SCSI] use dynamically allocated sense buffer
  [SCSI] scsi.h: add macro for enclosure bit of inquiry data
  [SCSI] sd: add fix for devices with last sector access problems
  [SCSI] fix pcmcia compile problem
  [SCSI] aacraid: add Voodoo Lite class of cards.
  [SCSI] aacraid: add new driver features flags
  [SCSI] qla2xxx: Update version number to 8.02.00-k7.
  [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
  ...
2008-01-25 17:19:08 -08:00
Linus Torvalds
29bd17af7d Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (31 commits)
  ocfs2: clean up bh null checks
  ocfs2: document access rules for blocked_lock_list
  configfs: file.c fix possible recursive locking
  configfs: dir.c fix possible recursive locking
  configfs: Remove EXPERIMENTAL
  ocfs2: bump version number
  ocfs2/dlm: Clear joining_node on hearbeat node down
  ocfs2: convert byte order of constant instead of variable
  ocfs2: Update default cluster timeouts
  ocfs2: printf fixes
  ocfs2: Use generic_file_llseek
  ocfs2: Safer read_inline_data()
  ocfs2: Silence false lockdep warnings
  [PATCH 2/2] ocfs2: cluster aware flock()
  [PATCH 1/2] ocfs2: add flock lock type
  ocfs2: Local alloc window size changeable via mount option
  ocfs2: Support commit= mount option
  ocfs2: Add missing permission checks
  [PATCH 2/2] ocfs2: Implement group add for online resize
  [PATCH 1/2] ocfs2: Add group extend for online resize
  ...
2008-01-25 17:11:13 -08:00
Linus Torvalds
2ba14a017a Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (67 commits)
  fix drivers/ata/sata_fsl.c double-decl
  [libata] Prefer SCSI_SENSE_BUFFERSIZE to sizeof()
  pata_legacy: Merge winbond support
  ata_generic: Cenatek support
  pata_winbond: error return
  pata_serverworks: Fix cable types and cosmetics
  pata_mpc52xx: remove un-needed assignment
  libata: fix off-by-one in error categorization
  ahci: factor out AHCI enabling and enable AHCI before reading CAP
  ata_piix: implement SIDPR SCR access
  ata_piix: convert to prepare - activate initialization
  libata: factor out ata_pci_activate_sff_host() from ata_pci_one()
  [libata] Prefer SCSI_SENSE_BUFFERSIZE to sizeof()
  pata_legacy: resychronize with upstream changes and resubmit
  [libata] pata_legacy: typo fix
  [libata] pata_winbond: update for new ->data_xfer hook
  pata_pcmcia: convert to new data_xfer prototype
  libata annotations and fixes
  libata: use dev_driver_string() instead of "libata" in libata-sff.c
  ata_piix: kill unused constants and flags
  ...
2008-01-25 17:08:28 -08:00
Joel Becker
d69a3ad6a0 dlm: Split lock mode and flag constants into a sharable header.
This allows others to use the DLM constants without being tied to the
function API of fs/dlm.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:46:04 -08:00
Linus Torvalds
b31fde6db2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (509 commits)
  V4L/DVB (7078): radio: fix sf16fmi section mismatch
  V4L/DVB (7077): bt878: remove handcrafted PCI subsystem ID check
  V4L/DVB (7075): Make a local function static
  V4L/DVB (7074): DiB7000P: correct tuning problem for 7MHz channel
  V4L/DVB (7073): DiB7070: Reception quality improved
  V4L/DVB (7072): sets the MT2060 IF1 frequency according to EEPROM
  V4L/DVB (7071): DiB0700: Start streaming the right way
  V4L/DVB (7070): Fix some tuning problems
  V4L/DVB (7069):  Support for myTV.t
  V4L/DVB (7068): Add support for WinTV Nova-T-CE driver
  V4L/DVB (7067): fix autoserach in the Hauppauge NOVA-T 500
  V4L/DVB (7066):  ASUS My Cinema U3000 Mini DVBT Tuner
  V4L/DVB (7065): Artec T14BR patches
  V4L/DVB (7063): xc5000: Fix OOPS caused by missing firmware
  V4L/DVB (7062): radio-si570x: Some fixes and new USB ID addition
  V4L/DVB (7061): radio-si470x: Some cleanups
  V4L/DVB (7060): em28xx: remove has_tuner
  V4L/DVB (7059): cx88: Ensure the tuner is reset correctly
  V4L/DVB (7058): IR corrections for the Pinnacle 800i
  V4L/DVB (7056): tuner: suppress obsolete tuner i2c address warning for XC5000 tuners
  ...
2008-01-25 13:59:51 -08:00
Linus Torvalds
f31c338675 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (67 commits)
  ide: remove redundant DMA blacklist check from __ide_dma_on()
  ide: cleanup ide_set_dma()
  ide: remove redundant ->ide_dma_on call from set_using_dma()
  sc1200: move DMA timings to timing tables
  ide: add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag
  sis5513: factor out UDMA programming code
  pdc202xx_new: move PIO programming code to pdcnew_set_pio_mode()
  ide: make 'extra' field in struct ide_port_info u8
  ide: kill duplicate code in ide_dump_{ata,atapi}_status()
  ide-disk: use ide_get_lba_addr()
  ide: printk fix
  ide: add ide_tf_read() helper
  ide: fix registers loading order in ide_dump_ata_status()
  ide-disk: use do_rw_taskfile() (take 2)
  ide-disk: add ide_tf_set_cmd() helper
  ide-disk: extend timeout for PIO-in commands
  ide: remove 'handler' field from ide_task_t (take 2)
  ide: use ->data_phase to set ->handler in do_rw_taskfile()
  ide: convert do_rw_taskfile() to use ->data_phase
  ide: merge flagged_taskfile() into do_rw_taskfile()
  ...
2008-01-25 13:57:26 -08:00
Bartlomiej Zolnierkiewicz
4db90a1452 ide: add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag
* Add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag and use it to decide
  what to do with transfer modes < XFER_PIO_0 in ide_set_xfer_rate().

* Set IDE_HFLAG_ABUSE_SET_DMA_MODE in host drivers that need it
  (aec62xx, amd74xx, cs5520, cs5535, hpt34x, hpt366, pdc202xx_old,
  serverworks, tc86c001 and via82cxxx) and cleanup ->set_dma_mode
  methods in host drivers that don't (IDE core code guarantees that
  ->set_dma_mode will be called only for modes which are present
  in SWDMA/MWDMA/UDMA masks).

While at it:

* Add IDE_HFLAGS_HPT34X/HPT3XX/PDC202XX/SVWKS define in
  hpt34x/hpt366/pdc202xx_old/serverworks host driver.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:18 +01:00
Bartlomiej Zolnierkiewicz
3071a9d00b ide: make 'extra' field in struct ide_port_info u8
The maximum value used currently for 'extra' field in struct ide_port_info
is 240.

Make 'extra' u8 so it packs nicely together with enablebits[] and 'chipset'
fields (ide_pci_enablebit_t is 3 bytes and hwif_chipset_t is 1 byte).
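
The packing effect can be sketched as follows. The types are hypothetical
stand-ins sized as described above, and the two-element enablebits[] array
is an assumption; the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins with the sizes quoted in the message. */
typedef struct { uint8_t reg, mask, val; } demo_enablebit_t;	/* 3 bytes */
typedef uint8_t demo_chipset_t;					/* 1 byte  */

struct demo_info_wide {
	demo_enablebit_t enablebits[2];
	demo_chipset_t chipset;
	unsigned int extra;	/* wider type forces alignment padding */
};

struct demo_info_u8 {
	demo_enablebit_t enablebits[2];
	demo_chipset_t chipset;
	uint8_t extra;		/* 240 still fits in 8 bits */
};

size_t demo_wide_size(void)
{
	return sizeof(struct demo_info_wide);
}

size_t demo_u8_size(void)
{
	return sizeof(struct demo_info_u8);
}

void demo_set_extra(struct demo_info_u8 *info, unsigned int value)
{
	assert(info != NULL);
	assert(value <= UINT8_MAX);	/* guard against silent truncation */
	info->extra = (uint8_t)value;
}
```

On common ABIs the u8 variant comes out smaller because no padding is
needed in front of 'extra'.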

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:17 +01:00
Bartlomiej Zolnierkiewicz
a501633c7d ide-disk: use ide_get_lba_addr()
* Export ide_get_lba_addr().

* Convert idedisk_{read_native,set}_max_address() to use ide_get_lba_addr().

* Remove incorrect comment from idedisk_read_native_max_address()
  (noticed by Sergei).

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:17 +01:00
Bartlomiej Zolnierkiewicz
c2b57cdc1d ide: add ide_tf_read() helper
* Factor out code reading taskfile registers from ide_end_drive_cmd()
  to the new ide_tf_read() helper.

* Add IDE_TFLAG_IN_* taskfile flags to indicate the need to load
  particular IDE taskfile register in ide_tf_read().

* Update ide_end_drive_cmd() to set respective IDE_TFLAG_IN_* taskfile flags.

* Add ide_get_lba_addr() for getting LBA sector address from taskfile struct.

* Factor out code getting sector address from ide_dump_ata_status()
  to the new ide_dump_sector() function.

* Convert ide_dump_sector() to use ide_tf_read() and ide_get_lba_addr().

* Remove no longer needed ide_read_24().

The only change in functionality caused by this patch is that
ide_dump_ata_status() no longer prints "high"/"low" parts of LBA48
sector address (of course LBA48 sector address is still printed).
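
For illustration, assembling an LBA sector address from taskfile bytes
could look like the sketch below. It follows the usual ATA register
layout (three current LBA bytes, three "high order" bytes for LBA48, the
low nibble of the device register for LBA28); struct demo_taskfile and
demo_get_lba_addr() are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_taskfile {
	uint8_t lbal, lbam, lbah;		/* LBA bits 0-23 */
	uint8_t hob_lbal, hob_lbam, hob_lbah;	/* LBA48 only: bits 24-47 */
	uint8_t device;				/* LBA28: low nibble = bits 24-27 */
};

uint64_t demo_get_lba_addr(const struct demo_taskfile *tf, int lba48)
{
	uint32_t high, low;

	assert(tf != NULL);

	if (lba48)
		high = ((uint32_t)tf->hob_lbah << 16) |
		       ((uint32_t)tf->hob_lbam << 8) | tf->hob_lbal;
	else
		high = tf->device & 0x0f;

	low = ((uint32_t)tf->lbah << 16) |
	      ((uint32_t)tf->lbam << 8) | tf->lbal;

	/* the upper part sits directly above the 24 low bits */
	return ((uint64_t)high << 24) | low;
}
```

With one helper returning the full address, a caller no longer needs to
print the "high" and "low" halves separately.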

Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:17 +01:00
Bartlomiej Zolnierkiewicz
f6e29e35cc ide-disk: use do_rw_taskfile() (take 2)
* Add IDE_TFLAG_DMA_PIO_FALLBACK taskfile flag to indicate the need
  to skip loading taskfile registers in do_rw_taskfile().

* Export do_rw_taskfile().

* Convert __ide_do_rw_disk() to use do_rw_taskfile().

* Unexport ide_tf_load().

* Unexport {pre_task_out,task_in}_intr() and make them static.

* Remove incorrect comment about do_rw_taskfile() from <linux/ide.h>.

There should be no functionality changes caused by this patch.

v2:
* Add missing blk_fs_request() check to task_dma_ok() (for VDMA).

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:16 +01:00
Bartlomiej Zolnierkiewicz
57d7366b78 ide: remove 'handler' field from ide_task_t (take 2)
* Add IDE_TFLAG_CUSTOM_HANDLER taskfile flag and use it for internal requests
  which require custom handlers.  Check the flag in do_rw_taskfile() and set
  handler accordingly.

* Cleanup ide_init_{specify,restore,setmult}_cmd() and rename it to
  ide_tf_set_{specify,restore,setmult}_cmd().

* Make {set_geometry,recal,set_multmode}_intr() static.

* Remove no longer needed 'handler' field from ide_task_t.

v2:
* 'handler' in do_rw_taskfile() must be set to NULL initially.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:16 +01:00
Bartlomiej Zolnierkiewicz
1192e528e0 ide: use ->data_phase to set ->handler in do_rw_taskfile()
* Use ->data_phase to set ->handler in do_rw_taskfile() instead of
  setting ->handler in callers of ide_raw_taskfile()/do_rw_taskfile().

* Unexport task_no_data_intr() and make it static.

There should be no functionality changes caused by this patch.
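
A rough sketch of the idea, with hypothetical names throughout: the
callee derives the interrupt handler from the data phase, so callers only
describe the transfer and never pass a handler themselves.

```c
#include <assert.h>
#include <stddef.h>

enum demo_data_phase { DEMO_NO_DATA, DEMO_PIO_IN, DEMO_PIO_OUT };

typedef int (*demo_handler_t)(void);

/* Stand-in handlers; each returns a distinct code for demonstration. */
static int demo_no_data_intr(void) { return 0; }
static int demo_in_intr(void)      { return 1; }
static int demo_out_intr(void)     { return 2; }

/* Map the data phase to its handler; NULL for an unknown phase. */
demo_handler_t demo_pick_handler(enum demo_data_phase phase)
{
	switch (phase) {
	case DEMO_NO_DATA:
		return demo_no_data_intr;
	case DEMO_PIO_IN:
		return demo_in_intr;
	case DEMO_PIO_OUT:
		return demo_out_intr;
	}
	return NULL;
}

int demo_run_phase(enum demo_data_phase phase)
{
	demo_handler_t handler = demo_pick_handler(phase);

	assert(handler != NULL);
	return handler();
}
```

Since the handlers are referenced only from the selection function, they
can stay static, which matches the unexport in the second item.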

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:16 +01:00
Bartlomiej Zolnierkiewicz
10d90157c8 ide: convert do_rw_taskfile() to use ->data_phase
* Use task->data_phase in do_rw_taskfile() to decide what to do.

* task->prehandler is only used by TASKFILE[_MULTI]_OUT so just
  use pre_task_out_intr() directly and remove no longer needed
  'prehandler' field from ide_task_t.

* Remove no longer needed ide_pre_handler_t type.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:16 +01:00
Bartlomiej Zolnierkiewicz
1edee60e9d ide: merge flagged_taskfile() into do_rw_taskfile()
Based on the earlier work by Tejun Heo.

task->data_phase == TASKFILE_MULTI_{IN,OUT} vs drive->mult_count == 0
check is needed also for ide_taskfile_ioctl() requests that don't have
IDE_TFLAG_FLAGGED taskfile flag set.

Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:15 +01:00
Bartlomiej Zolnierkiewicz
866e2ec9ce ide: remove 'tf_in_flags' field from ide_task_t
* Add IDE_TFLAG_IN_DATA taskfile flag to indicate the need of reading
  IDE_DATA_REG in ide_end_drive_cmd().

  Set the new flag in ide_taskfile_ioctl() if ->in_flags.b.data is set.

* Add IDE_TFLAG_FLAGGED_SET_IN_FLAGS taskfile flag to indicate the
  need of modifying ->in_flags in ide_taskfile_ioctl().

  Set the new flag in flagged_taskfile() and move the code modifying
  ->tf_in_flags to ide_taskfile_ioctl().

  While at it remove the bogus comment: ->tf_in_flags (except .b.data)
  have no effect on selection of registers to read.

* Remove no longer needed 'tf_in_flags' field from ide_task_t.

As the result we finally have the internals of HDIO_DRIVE_TASKFILE ioctl
separated from the core IDE code.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:14 +01:00
Bartlomiej Zolnierkiewicz
ac026ff254 ide: remove 'command_type' field from ide_task_t
* Add 'data_buf' and 'nsect' variables in ide_taskfile_ioctl()
  to cache data buffer pointer and number of sectors to transfer
  (this allows us to have only one ide_diag_taskfile() call).

* Add IDE_TFLAG_WRITE taskfile flag and use it to check whether
  the REQ_RW request flag should be set.

* Move ->command_type handling from ide_diag_taskfile() to
  ide_taskfile_ioctl() and use ->req_cmd instead of ->command_type.

* Add 'nsect' parameter to ide_raw_taskfile().

* Merge ide_diag_taskfile() into ide_raw_taskfile().

* Initialize ->data_phase explicitly in idedisk_prepare_flush(),
  ide_start_power_step() and ide_disk_special().

* Remove no longer needed 'command_type' field from ide_task_t.

* Add #ifndef/#endif __KERNEL__ to <linux/hdreg.h> around no
  longer used by kernel IDE_DRIVE_TASK_* and TASKFILE_* defines.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:14 +01:00
Bartlomiej Zolnierkiewicz
7299a39184 ide: remove hwif->intrproc
Given that:

* hpt366.c::hpt3xx_intrproc() is the only user of hwif->intrproc

* hpt366.c::hpt3xx_quirkproc() sets drive->quirk_list to 1 for quirky drives
  which is a value unique to hpt366 host driver

we can remove hwif->intrproc and just check for drive->quirk_list == 1
in ide_do_request().

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:14 +01:00
Bartlomiej Zolnierkiewicz
f919790f8c ide: remove SELECT_INTERRUPT()
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:13 +01:00
Bartlomiej Zolnierkiewicz
cd3dbc99da ide: remove QUIRK_LIST()
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:13 +01:00
Bartlomiej Zolnierkiewicz
2fc5738819 ide: add ide_pktcmd_tf_load() helper
Add ide_pktcmd_tf_load() helper and convert ATAPI device drivers to use it.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:13 +01:00
Bartlomiej Zolnierkiewicz
8e7657ae0f ide: remove atapi_ireason_t (take 3)
Remove atapi_ireason_t.

While at it:
* replace 'HWIF(drive)' by 'drive->hwif' (or just 'hwif' where possible)

v2:
* v1 had CD and IO bits reversed in many places.

* Use CD and IO defines from <linux/hdreg.h>.

v3:
* Fix incorrect "(ireason & IO) == test_bit()". (Noticed by Sergei)
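
The class of bug fixed in v3 can be sketched like this: a masked bit is
either 0 or the mask value, so comparing it directly with a 0-or-1 truth
value goes wrong whenever the mask is not 1. DEMO_IO = 0x02 and the
function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_IO 0x02	/* assumed bit value of the IO flag */

/* Buggy: (ireason & DEMO_IO) is 0 or 2, 'flag' is 0 or 1. */
bool demo_io_equals_flag_buggy(uint8_t ireason, bool flag)
{
	return (ireason & DEMO_IO) == flag;
}

/* Fixed: normalize the masked bit to 0 or 1 before comparing. */
bool demo_io_equals_flag(uint8_t ireason, bool flag)
{
	return !!(ireason & DEMO_IO) == flag;
}

void demo_self_check(void)
{
	/* IO set and flag set: the buggy form compares 2 with 1. */
	assert(!demo_io_equals_flag_buggy(DEMO_IO, true));
	assert(demo_io_equals_flag(DEMO_IO, true));
}
```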

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:12 +01:00
Bartlomiej Zolnierkiewicz
790d123989 ide: remove ata_nsector_t, ata_data_t and atapi_bcount_t
Remove ata_nsector_t, ata_data_t (unused) and atapi_bcount_t.

While at it:
* replace 'HWIF(drive)' by 'hwif'

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:12 +01:00
Bartlomiej Zolnierkiewicz
e5f9f5a89a ide: remove atapi_feature_t
Remove atapi_feature_t.

While at it:
* replace 'HWIF(drive)' by 'hwif'

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:12 +01:00
Bartlomiej Zolnierkiewicz
0e38a66a1e ide: remove atapi_error_t (take 2)
Remove atapi_error_t.

While at it:
* replace 'HWIF(drive)' by 'drive->hwif'

v2:
* Add {ILI,EOM,LFS}_ERR defines to <linux/hdreg.h>.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:12 +01:00
Bartlomiej Zolnierkiewicz
22c525b976 ide: remove ata_status_t and atapi_status_t
Remove ata_status_t (unused) and atapi_status_t.

While at it:
* replace 'HWIF(drive)' by 'drive->hwif' (or just 'hwif' where possible)

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:11 +01:00
Bartlomiej Zolnierkiewicz
6a2144146a ide: CPU endianness doesn't matter for special_t
special_t is used only internally by the IDE subsystem (it isn't
related to hardware registers and isn't exported to the user-space).

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:11 +01:00
Bartlomiej Zolnierkiewicz
29ed2a5f8c ide: remove REQ_TYPE_ATA_TASK
Based on the earlier work by Tejun Heo.

All users are gone so we can finally remove it.

Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:11 +01:00
Bartlomiej Zolnierkiewicz
807e35d695 ide: use ide_tf_load() in execute_drive_cmd()
* Add IDE_TFLAG_OUT_DEVICE taskfile flag to indicate the need of writing
  the Device register and handle it in ide_tf_load().

  Update ide_tf_load() and {do_rw,flagged}_taskfile() users accordingly.

* Use struct ide_taskfile and ide_tf_load() in execute_drive_cmd().

* Make the debugging code dump all taskfile registers for both
  REQ_TYPE_ATA_{CMD,TASK} requests and move it to ide_tf_load()
  so it also covers REQ_TYPE_ATA_TASKFILE requests.

There should be no functionality changes caused by this patch
(unless DEBUG is defined).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:10 +01:00
Bartlomiej Zolnierkiewicz
4ee06b7e67 ide: remove stale ide.h "configuration options"
Remove stale ide.h "configuration options":

* INITIAL_MULT_COUNT - always defined to 0

* SUPPORT_SLOW_DATA_PORTS - unused

* OK_TO_RESET_CONTROLLER - always defined to 1

* DISABLE_IRQ_NOSYNC - always defined to 0

Leave SUPPORT_VLB_SYNC (defined to 0 for CRIS and FRV, otherwise to 1)
for now but disallow overriding it by <asm/ide.h>.

There should be no functionality changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:08 +01:00
Bartlomiej Zolnierkiewicz
74095a91ed ide: use do_rw_taskfile() in flagged_taskfile()
Based on the earlier work by Tejun Heo.

* Move setting IDE_TFLAG_LBA48 taskfile flag from do_rw_taskfile()
  function to the callers.

* Add IDE_TFLAG_FLAGGED taskfile flag for flagged taskfiles coming
  from ide_taskfile_ioctl().  Check it instead of ->tf_out_flags.all.

* Add IDE_TFLAG_OUT_DATA taskfile flag to indicate the need to load
  IDE data register in ide_tf_load().

* Add IDE_TFLAG_OUT_* taskfile flags to indicate the need to load
  particular IDE taskfile registers in ide_tf_load().

* Update do_rw_taskfile() and ide_tf_load() users to set respective
  IDE_TFLAG_OUT_* taskfile flags.

* Add task_dma_ok() helper.

* Use IDE_TFLAG_FLAGGED taskfile flag to select HIHI mask in ide_tf_load().

* Use do_rw_taskfile() in flagged_taskfile().

* Remove no longer needed 'tf_out_flags' field from ide_task_t.

There should be no functionality changes caused by this patch.
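
The flag-driven loading described above can be sketched as below, under
the assumption that each flag selects one register to write. The
DEMO_TFLAG_* values, the three-register layout and demo_tf_load() are
hypothetical and much smaller than the real taskfile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One flag per register that should be written (hypothetical values). */
enum {
	DEMO_TFLAG_OUT_FEATURE = 1 << 0,
	DEMO_TFLAG_OUT_NSECT   = 1 << 1,
	DEMO_TFLAG_OUT_LBAL    = 1 << 2,
};

struct demo_regs {
	uint8_t feature, nsect, lbal;
};

struct demo_task {
	uint32_t tf_flags;	/* which registers to load */
	struct demo_regs tf;	/* values to load */
};

/* Copy only the flagged registers to 'hw'; return the number of writes. */
int demo_tf_load(const struct demo_task *task, struct demo_regs *hw)
{
	int writes = 0;

	assert(task != NULL && hw != NULL);

	if (task->tf_flags & DEMO_TFLAG_OUT_FEATURE) {
		hw->feature = task->tf.feature;
		writes++;
	}
	if (task->tf_flags & DEMO_TFLAG_OUT_NSECT) {
		hw->nsect = task->tf.nsect;
		writes++;
	}
	if (task->tf_flags & DEMO_TFLAG_OUT_LBAL) {
		hw->lbal = task->tf.lbal;
		writes++;
	}
	return writes;
}
```

A caller that wants every register loaded simply sets all flags, while a
flagged taskfile from user space sets only the ones it asked for.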

Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:07 +01:00
Bartlomiej Zolnierkiewicz
9a3c49be5c ide: add ide_no_data_taskfile() helper
* Add ide_no_data_taskfile() helper and convert ide_raw_taskfile() w/ NO DATA
  protocol users to use it instead.

* Set ->data_phase explicitly in ide_no_data_taskfile()
  (TASKFILE_NO_DATA is defined as 0x0000).

* Unexport task_no_data_intr().

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:07 +01:00
Bartlomiej Zolnierkiewicz
9e42237f26 ide: add ide_tf_load() helper
Based on the earlier work by Tejun Heo.

* Add 'tf_flags' field (for taskfile flags) to ide_task_t.

* Add IDE_TFLAG_LBA48 taskfile flag for LBA48 taskfiles.

* Add IDE_TFLAG_NO_SELECT_MASK taskfile flag for __ide_do_rw_disk()
  which doesn't use SELECT_MASK() (looks like a bug but it requires
  some more investigation).

* Split off ide_tf_load() helper from do_rw_taskfile().

* Convert __ide_do_rw_disk() to use ide_tf_load().

There should be no functionality changes caused by this patch.

Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:07 +01:00
Bartlomiej Zolnierkiewicz
650d841d9e ide: add struct ide_taskfile (take 2)
* Don't set write-only ide_task_t.hobRegister[6] and ide_task_t.hobRegister[7]
  in idedisk_set_max_address_ext().

* Add struct ide_taskfile and use it in ide_task_t instead of tfRegister[]
  and hobRegister[].

* Remove no longer needed IDE_CONTROL_OFFSET_HOB define.

* Add #ifndef/#endif __KERNEL__ around definitions of {task,hob}_struct_t.

While at it:

* Use ATA_LBA define for LBA bit (0x40) as suggested by Tejun Heo.

v2:
* Add missing newlines. (Noticed by Sergei)

* Use ~ATA_LBA instead of 0xBF. (Noticed by Sergei)

* Use unnamed unions for error/feature and status/command.
  (Suggested by Sergei).

There should be no functionality changes caused by this patch.
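
A small sketch of the unnamed-union layout mentioned in v2: one byte of
storage carries two names, depending on whether the register is being
read or written. The struct below is a hypothetical three-register
excerpt, and anonymous unions need C11 or the GNU extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_tf_regs {
	union {			/* same register: error on read, feature on write */
		uint8_t error;
		uint8_t feature;
	};
	uint8_t nsect;
	union {			/* same register: status on read, command on write */
		uint8_t status;
		uint8_t command;
	};
};

/* Fill in the "write" view of the registers for a new command. */
void demo_prepare(struct demo_tf_regs *tf, uint8_t feature, uint8_t command)
{
	assert(tf != NULL);
	tf->feature = feature;
	tf->nsect = 0;
	tf->command = command;
}
```

The members are reached without an intermediate union name, for example
tf->feature rather than tf->u.feature.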

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:06 +01:00
Bartlomiej Zolnierkiewicz
cd2a2d9697 ide: remove task_ioreg_t typedef (take 2)
Remove task_ioreg_t typedef from the kernel code (but leave it
in <linux/hdreg.h> for #ifndef/#endif __KERNEL__ case).

While at it also move sata_ioreg_t typedef under #ifndef/#endif __KERNEL__.

v2:
Remove name of the second parameter from ide_execute_command() declaration.
(Noticed by Sergei).

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:06 +01:00
Bartlomiej Zolnierkiewicz
1c029fd658 ide: remove ->dma_master field from ide_hwif_t (take 5)
* Convert cmd64x, hpt366 and pdc202xx_old host drivers to use
  pci_resource_start(hwif->pci_dev, 4) instead of hwif->dma_master.

* Remove no longer needed ->dma_master field from ide_hwif_t.

v2:
* Use the more readable 'hwif->dma_base - (hwif->channel * 8)' instead of
  pci_resource_start(hwif->pci_dev, 4).

v3:
* Use hwif->extra_base in hpt366/pdc202xx_old + some cosmetic fixups over v2
  (suggested by Sergei).

v4:
* Correct offsets in hpt3xxn_set_clock().

v5:
* Use hwif->extra_base in hpt366 for _real_ this time. (Noticed by Sergei)

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-01-25 22:17:05 +01:00
Hans Verkuil
0b394def21 V4L/DVB (6868): i2c-id.h: add I2C_DRIVERID_CS5345
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-01-25 19:04:07 -02:00
Hans Verkuil
05e997197e V4L/DVB (6487): i2c-id: add M52790 driver ID
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-01-25 19:01:47 -02:00
Arjan van de Ven
6d082592b6 sched: keep total / count stats in addition to the max for
Right now, the linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process is waiting to be scheduled. While the maximum
is a very useful metric, tracking average and total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.
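
As an illustration only (this is not the kernel code), the relationship between the maximum, total and count can be sketched as below. The class and field names (`WaitStats`, `wait_max`, `wait_sum`, `wait_count`) are hypothetical and chosen for this example; the average is simply total / count.

```python
from dataclasses import dataclass


@dataclass
class WaitStats:
    """Hypothetical per-task scheduler wait statistics (nanoseconds)."""
    wait_max: int = 0    # longest single wait seen so far
    wait_sum: int = 0    # accumulated wait time (the "total")
    wait_count: int = 0  # number of waits recorded

    def record(self, delta_ns: int) -> None:
        """Account one completed wait of delta_ns nanoseconds."""
        self.wait_max = max(self.wait_max, delta_ns)
        self.wait_sum += delta_ns
        self.wait_count += 1

    def average(self) -> float:
        """Average wait; 0.0 when nothing has been recorded yet."""
        return self.wait_sum / self.wait_count if self.wait_count else 0.0


stats = WaitStats()
for delta in (100, 300, 2000):
    stats.record(delta)
```

With these three samples the maximum alone (2000) hides the fact that the accumulated delay is 2400 and the average wait is 800, which is the kind of information a tool like latencytop would want.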

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:35 +01:00
Alexey Dobriyan
286100a6cf sched, futex: detach sched.h and futex.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Ingo Molnar
90739081ef softlockup: fix signedness
fix softlockup tunables signedness.

mark tunables read-mostly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Arjan van de Ven
9745512ce7 sched: latencytop support
LatencyTOP kernel infrastructure; it measures latencies in the
scheduler and tracks it system wide and per process.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Pavel Machek
e118adef23 timers: don't #error on higher HZ values
For some crazy reason (trying to work around hw problem in i810) I wanted
to use HZ around 4000.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Ingo Molnar
6478d8800b sched: remove the !PREEMPT_BKL code
remove the !PREEMPT_BKL code.

this removes 160 lines of legacy code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:33 +01:00
Peter Zijlstra
d3d74453c3 hrtimer: fixup the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ fallback
Currently all highres=off timers are run from softirq context, but
HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context.

Fix this up by splitting it similar to the highres=on case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:31 +01:00
Peter Zijlstra
48d5e25821 sched: rt throttling vs no_hz
We need to teach no_hz about the rt throttling because it's tick driven.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:31 +01:00
Peter Zijlstra
6f505b1642 sched: rt group scheduling
Extend group scheduling to also cover the realtime classes. It uses the time
limiting introduced by the previous patch to allow multiple realtime groups.

The hard time limit is required to keep behaviour deterministic.

The algorithms used make the realtime scheduler O(tg), i.e. linear scaling wrt
the number of task groups. This is the worst-case behaviour I can't seem to get
out of; the average case of the algorithms can be improved, but I focused on
correctness and the worst case.

[ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:30 +01:00
Peter Zijlstra
fa85ae2418 sched: rt time limit
Very simple time limit on the realtime scheduling classes.
Allow the rq's realtime class to consume sched_rt_ratio of every
sched_rt_period slice. If the class exceeds this quota the fair class
will preempt the realtime class.
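
A minimal sketch of the quota idea, not the kernel implementation: the class below is hypothetical, and expressing the ratio as an integer percentage is an assumption made only to keep the arithmetic exact. Runtime consumed by the realtime class is accumulated per period, and the class is marked throttled once it exceeds its share.

```python
class RtThrottle:
    """Hypothetical per-runqueue realtime quota accounting."""

    def __init__(self, period_ns: int, ratio_percent: int) -> None:
        self.period_ns = period_ns
        # Share of each period the realtime class may consume (assumed unit: %).
        self.quota_ns = period_ns * ratio_percent // 100
        self.rt_time_ns = 0
        self.throttled = False

    def account(self, delta_ns: int) -> bool:
        """Charge delta_ns of realtime runtime; return True once throttled."""
        self.rt_time_ns += delta_ns
        if self.rt_time_ns > self.quota_ns:
            self.throttled = True  # the fair class may now preempt
        return self.throttled

    def new_period(self) -> None:
        """Start a fresh period: reset the consumed time and the throttle."""
        self.rt_time_ns = 0
        self.throttled = False


rq = RtThrottle(period_ns=1_000_000, ratio_percent=50)
```

For example, with a 1 ms period and a 50% ratio, charging 400 us leaves the class runnable, while a further 200 us pushes it over the 500 us quota until the next period begins.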

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:29 +01:00
Peter Zijlstra
8f4d37ec07 sched: high-res preemption tick
Use HR-timers (when available) to deliver an accurate preemption tick.

The regular scheduler tick that runs at 1/HZ can be too coarse when nice
levels are used. The fairness system will still keep the cpu utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time, but we try
to minimize this by delivering preemption points spot-on.

The average frequency of this extra interrupt is sched_latency / nr_latency,
which need not be higher than 1/HZ; it's just that the distribution within the
sched_latency period is important.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:29 +01:00
Herbert Xu
02b67cc3ba sched: do not do cond_resched() when CONFIG_PREEMPT
Why do we even have cond_resched when real preemption
is on? It seems to be a waste of space and time.

remove cond_resched with CONFIG_PREEMPT on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:28 +01:00
Peter Zijlstra
78f2c7db60 sched: SCHED_FIFO/SCHED_RR watchdog timer
Introduce a new rlimit that allows the user to set a runtime timeout on
the slice of real-time tasks. Once this limit is exceeded the task will
receive SIGXCPU.

So it measures runtime since the last sleep.

Input and ideas by Thomas Gleixner and Lennart Poettering.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Lennart Poettering <mzxreary@0pointer.de>
CC: Michael Kerrisk <mtk.manpages@googlemail.com>
CC: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:27 +01:00
Peter Zijlstra
fa717060f1 sched: sched_rt_entity
Move the task_struct members specific to rt scheduling together.
A future optimization could be to put sched_entity and sched_rt_entity
into a union.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:27 +01:00
Paul E. McKenney
e260be673a Preempt-RCU: implementation
This patch implements a new version of RCU which allows its read-side
critical sections to be preempted. It uses a set of counter pairs
to keep track of the read-side critical sections and flips them
when all tasks exit read-side critical section. The details
of this implementation can be found in this paper -

	http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf

and the article-

	http://lwn.net/Articles/253651/

This patch was developed as a part of the -rt kernel development and
meant to provide better latencies when read-side critical sections of
RCU don't disable preemption.  As a consequence of keeping track of RCU
readers, the readers have a slight overhead (optimizations in the paper).
This implementation co-exists with the "classic" RCU implementations
and can be switched to at compile time.

Also includes RCU tracing summarized in debugfs.

[ akpm@linux-foundation.org: build fixes on non-preempt architectures ]

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:24 +01:00
Paul E. McKenney
01c1c660f4 Preempt-RCU: reorganize RCU code into rcuclassic.c and rcupdate.c
This patch re-organizes the RCU code to enable multiple implementations
of RCU. Users of RCU continue to include rcupdate.h and the
RCU interfaces remain the same. This is in preparation for
subsequently merging the preemptible RCU implementation.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:24 +01:00
Dipankar Sarma
c2d727aa2f Preempt-RCU: Use softirq instead of tasklets for
This patch makes RCU use softirq instead of tasklets.

It also adds a memory barrier after raising the softirq
in order to ensure that the cpu sees the most recently updated
value of rcu->cur while processing callbacks.
The discussion of the related theoretical race pointed out
by James Huang can be found here --> http://lkml.org/lkml/2007/11/20/603

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:23 +01:00
Steven Rostedt
cb46984504 sched: RT-balance, add new methods to sched_class
Dmitry Adamushko found that the current implementation of the RT
balancing code left out changes to the sched_setscheduler and
rt_mutex_setprio.

This patch addresses this issue by adding methods to the schedule classes
to handle being switched out of (switched_from) and being switched into
(switched_to) a sched_class. A method for handling priority changes
(prio_changed) is also added.

This patch also removes some duplicate logic between rt_mutex_setprio and
sched_setscheduler.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:22 +01:00
Steven Rostedt
9a897c5a67 sched: RT-balance, replace hooks with pre/post schedule and wakeup methods
To make the main sched.c code more agnostic to the schedule classes,
the specific hooks in the schedule code for RT class balancing are
replaced with pre_schedule, post_schedule and task_wake_up methods.
These methods may be used by any of the classes but currently only the
sched_rt class implements them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:22 +01:00
Ingo Molnar
32525d022a sched: whitespace cleanups in topology.h
whitespace cleanups in topology.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:20 +01:00
Ingo Molnar
52d853431e sched: reactivate fork balancing
reactivate fork balancing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:20 +01:00
Gregory Haskins
57d885fea0 sched: add sched-domain roots
We add the notion of a root-domain which will be used later to rescope
global variables to per-domain variables.  Each exclusive cpuset
essentially defines an island domain by fully partitioning the member cpus
from any other cpuset.  However, we currently still maintain some
policy/state as global variables which transcend all cpusets.  Consider,
for instance, rt-overload state.

Whenever a new exclusive cpuset is created, we also create a new
root-domain object and move each cpu member to the root-domain's span.
By default the system creates a single root-domain with all cpus as
members (mimicking the global state we have today).

We add some plumbing for storing class specific data in our root-domain.
Whenever a RQ is switching root-domains (because of repartitioning) we
give each sched_class the opportunity to remove any state from its old
domain and add state to the new one.  This logic doesn't have any clients
yet but it will later in the series.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Christoph Lameter <clameter@sgi.com>
CC: Paul Jackson <pj@sgi.com>
CC: Simon Derr <simon.derr@bull.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:18 +01:00
Gregory Haskins
e7693a362e sched: de-SCHED_OTHER-ize the RT path
The current wake-up code path tries to determine if it can optimize the
wake-up to "this_cpu" by computing load calculations.  The problem is that
these calculations are only relevant to SCHED_OTHER tasks where load is king.
For RT tasks, priority is king.  So the load calculation is completely wasted
bandwidth.

Therefore, we create a new sched_class interface to help with
pre-wakeup routing decisions and move the load calculation as a function
of CFS task's class.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:09 +01:00
Gregory Haskins
73fe6aae84 sched: add RT-balance cpu-weight
Some RT tasks (particularly kthreads) are bound to one specific CPU.
It is fairly common for two or more bound tasks to get queued up at the
same time.  Consider, for instance, softirq_timer and softirq_sched.  A
timer goes off in an ISR which schedules softirq_thread to run at RT50.
Then the timer handler determines that it's time to smp-rebalance the
system so it schedules softirq_sched to run.  So we are in a situation
where we have two RT50 tasks queued, and the system will go into
rt-overload condition to request other CPUs for help.

This causes two problems in the current code:

1) If a high-priority bound task and a low-priority unbounded task queue
   up behind the running task, we will fail to ever relocate the unbounded
   task because we terminate the search on the first unmovable task.

2) We spend precious futile cycles in the fast-path trying to pull
   overloaded tasks over.  It is therefore optimal to strive to avoid the
   overhead altogether if we can cheaply detect the condition before
   overload even occurs.

This patch tries to achieve this optimization by utilizing the hamming
weight of the task->cpus_allowed mask.  A weight of 1 indicates that
the task cannot be migrated.  We will then utilize this information to
skip non-migratable tasks and to eliminate unnecessary rebalance attempts.

We introduce a per-rq variable to count the number of migratable tasks
that are currently running.  We only go into overload if we have more
than one rt task, AND at least one of them is migratable.

In addition, we introduce a per-task variable to cache the cpus_allowed
weight, since the hamming calculation is probably relatively expensive.
We only update the cached value when the mask is updated which should be
relatively infrequent, especially compared to scheduling frequency
in the fast path.
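
The scheme described above can be sketched roughly as follows. This is a simplified model, not the patch itself: the class and attribute names (`Task`, `RtRunqueue`, `nr_cpus_allowed`, `rt_nr_migratory`) are illustrative assumptions. The Hamming weight of the affinity mask is cached whenever the mask changes, and a runqueue only counts as overloaded when it holds more than one RT task and at least one of them can migrate.

```python
def hamming_weight(mask: int) -> int:
    """Number of set bits in a CPU affinity mask."""
    return bin(mask).count("1")


class Task:
    def __init__(self, name: str, cpus_allowed: int) -> None:
        self.name = name
        self.set_cpus_allowed(cpus_allowed)

    def set_cpus_allowed(self, mask: int) -> None:
        # Recompute the cached weight only when the mask is updated.
        self.cpus_allowed = mask
        self.nr_cpus_allowed = hamming_weight(mask)

    @property
    def migratable(self) -> bool:
        # A weight of 1 means the task is bound to a single CPU.
        return self.nr_cpus_allowed > 1


class RtRunqueue:
    def __init__(self) -> None:
        self.rt_nr_running = 0    # RT tasks queued on this runqueue
        self.rt_nr_migratory = 0  # how many of them may run elsewhere

    def enqueue(self, task: Task) -> None:
        self.rt_nr_running += 1
        if task.migratable:
            self.rt_nr_migratory += 1

    def dequeue(self, task: Task) -> None:
        self.rt_nr_running -= 1
        if task.migratable:
            self.rt_nr_migratory -= 1

    def overloaded(self) -> bool:
        # Asking other CPUs for help is pointless if nothing can move.
        return self.rt_nr_running > 1 and self.rt_nr_migratory > 0
```

In this model, queueing two CPU-bound tasks (mask `0b0001` each) does not trigger overload, whereas adding a third task with mask `0b1111` does.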

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:07 +01:00
Ingo Molnar
82a1fcb902 softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:

 ------------------>
 INFO: task prctl:3042 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
 prctl         D fd5e3793     0  3042   2997
        f6050f38 00000046 00000001 fd5e3793 00000009 c06d8264 c06dae80 00000286
        f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 00000001 f6050000 00000000
        f7e92d00 00000286 f6050f18 c0489d1a f6050f40 00006605 00000000 c0133a5b
 Call Trace:
  [<c04883a5>] schedule_timeout+0x6d/0x8b
  [<c04883d8>] schedule_timeout_uninterruptible+0x15/0x17
  [<c0133a76>] msleep+0x10/0x16
  [<c0138974>] sys_prctl+0x30/0x1e2
  [<c0104c52>] sysenter_past_esp+0x5f/0xa5
  =======================
 2 locks held by prctl/3042:
 #0:  (&sb->s_type->i_mutex_key#5){--..}, at: [<c0197d11>] do_fsync+0x38/0x7a
 #1:  (jbd_handle){--..}, at: [<c01ca3d2>] journal_start+0xc7/0xe9
 <------------------

the current default timeout is 120 seconds. Such messages are printed
up to 10 times per bootup. If the system has crashed already then the
messages are not printed.

if lockdep is enabled then all held locks are printed as well.

this feature is a natural extension to the softlockup-detector (kernel
locked up without scheduling) and to the NMI watchdog (kernel locked up
with IRQs disabled).

[ Gautham R Shenoy <ego@in.ibm.com>: CPU hotplug fixes. ]
[ Andrew Morton <akpm@linux-foundation.org>: build warning fix. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-01-25 21:08:02 +01:00
Ingo Molnar
d0d23b5432 cpu-hotplug: fix build on !CONFIG_SMP
fix build on !CONFIG_SMP.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:02 +01:00
Gautham R Shenoy
95402b3829 cpu-hotplug: replace per-subsystem mutexes with get_online_cpus()
This patch converts the known per-subsystem mutexes to get_online_cpus()/
put_online_cpus(). It also eliminates the CPU_LOCK_ACQUIRE and
CPU_LOCK_RELEASE hotplug notification events.

Signed-off-by: Gautham  R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:02 +01:00
Gautham R Shenoy
86ef5c9a8e cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus()
Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use
get_online_cpus and put_online_cpus instead as it highlights the
refcount semantics in these operations.

The new API guarantees protection against the cpu-hotplug operation, but
it doesn't guarantee serialized access to any of the local data
structures. Hence the changes need to be reviewed.

In case of pseries_add_processor/pseries_remove_processor, use
cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the
cpu_present_map there.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:02 +01:00
Gautham R Shenoy
d221938c04 cpu-hotplug: refcount based cpu hotplug
This patch implements a Refcount + Waitqueue based model for
cpu-hotplug.

Now, a thread which wants to prevent cpu-hotplug, will bump up a global
refcount and the thread which wants to perform a cpu-hotplug operation
will block till the global refcount goes to zero.

The readers, if any, during an ongoing cpu-hotplug operation are blocked
until the cpu-hotplug operation is over.
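
A user-space sketch of the refcount + waitqueue model, assuming a condition variable in place of the kernel's waitqueue. The class `HotplugLock` and the methods `hotplug_begin()`/`hotplug_done()` are hypothetical names for this example; only the reader-side names mirror get_online_cpus()/put_online_cpus() from the follow-up patches.

```python
import threading


class HotplugLock:
    """Toy model: readers pin the CPU set, the hotplug writer waits for zero."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._refcount = 0            # readers currently preventing hotplug
        self._hotplug_active = False  # True while a hotplug operation runs

    def get_online_cpus(self) -> None:
        with self._cond:
            # Readers arriving during an ongoing hotplug operation block here.
            while self._hotplug_active:
                self._cond.wait()
            self._refcount += 1

    def put_online_cpus(self) -> None:
        with self._cond:
            self._refcount -= 1
            if self._refcount == 0:
                self._cond.notify_all()  # wake a waiting hotplug writer

    def hotplug_begin(self) -> None:
        with self._cond:
            # Block until no reader holds a reference and no other writer runs.
            while self._hotplug_active or self._refcount > 0:
                self._cond.wait()
            self._hotplug_active = True

    def hotplug_done(self) -> None:
        with self._cond:
            self._hotplug_active = False
            self._cond.notify_all()  # release blocked readers and writers
```

Note that in this sketch a steady stream of readers can starve the writer, since new readers are admitted while the writer is still waiting for the count to reach zero; whether that matters depends on how often hotplug is expected to happen.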

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:01 +01:00
Srivatsa Vaddagiri
6b2d770026 sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups
The current load balancing scheme isn't good enough for precise
group fairness.

For example: on a 8-cpu system, I created 3 groups as under:

	a = 8 tasks (cpu.shares = 1024)
	b = 4 tasks (cpu.shares = 1024)
	c = 3 tasks (cpu.shares = 1024)

a, b and c are task groups that have equal weight. We would expect each
of the groups to receive 33.33% of cpu bandwidth under a fair scheduler.

This is what I get with the latest scheduler git tree:

--------------------------------------------------------------------------------
Col1  | Col2    | Col3  |  Col4
------|---------|-------|-------------------------------------------------------
a     | 277.676 | 57.8% | 54.1%  54.1%  54.1%  54.2%  56.7%  62.2%  62.8% 64.5%
b     | 116.108 | 24.2% | 47.4%  48.1%  48.7%  49.3%
c     |  86.326 | 18.0% | 47.5%  47.9%  48.5%
--------------------------------------------------------------------------------

Explanation of o/p:

Col1 -> Group name
Col2 -> Cumulative execution time (in seconds) received by all tasks of that
	group in a 60sec window across 8 cpus
Col3 -> CPU bandwidth received by the group in the 60sec window, expressed in
        percentage. Col3 data is derived as:
		Col3 = 100 * Col2 / (NR_CPUS * 60)
Col4 -> CPU bandwidth received by each individual task of the group.
		Col4 = 100 * cpu_time_recd_by_task / 60

[I can share the test case that produces a similar o/p if reqd]

The deviation from desired group fairness is as below:

	a = +24.47%
	b = -9.13%
	c = -15.33%

which is quite high.
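
For reference, the Col3 figures and the deviations above can be reproduced from the Col2 values with the short calculation below. It is only a worked check of the arithmetic; the variable names are made up for this example, and the fair share is taken as 33.33% as stated earlier.

```python
NR_CPUS = 8
WINDOW_SECS = 60

# Col2: cumulative execution time (seconds) per group, before the patch.
exec_time = {"a": 277.676, "b": 116.108, "c": 86.326}


def group_bandwidth(col2_secs: float) -> float:
    """Col3 = 100 * Col2 / (NR_CPUS * 60)."""
    return 100.0 * col2_secs / (NR_CPUS * WINDOW_SECS)


# Three equally weighted groups: each should receive one third.
fair_share = round(100.0 / len(exec_time), 2)

bandwidth = {g: round(group_bandwidth(t), 1) for g, t in exec_time.items()}
deviation = {g: round(bw - fair_share, 2) for g, bw in bandwidth.items()}

for group in exec_time:
    print(f"{group}: {bandwidth[group]:.1f}%  deviation {deviation[group]:+.2f}%")
```

The deviations are computed from the rounded Col3 values, which is how the figures quoted above appear to have been derived.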

After the patch below is applied, here are the results:

--------------------------------------------------------------------------------
Col1  | Col2    | Col3  |  Col4
------|---------|-------|-------------------------------------------------------
a     | 163.112 | 34.0% | 33.2%  33.4%  33.5%  33.5%  33.7%  34.4%  34.8% 35.3%
b     | 156.220 | 32.5% | 63.3%  64.5%  66.1%  66.5%
c     | 160.653 | 33.5% | 85.8%  90.6%  91.4%
--------------------------------------------------------------------------------

Deviation from desired group fairness is as below:

	a = +0.67%
	b = -0.83%
	c = +0.17%

which is far better IMO. Most of other runs have yielded a deviation within
+-2% at the most, which is good.

Why do we see bad (group) fairness with current scheduler?
==========================================================

Currently cpu's weight is just the summation of individual task weights.
This can yield incorrect results. For ex: consider three groups as below
on a 2-cpu system:

	CPU0	CPU1
---------------------------
	A (10)  B(5)
		C(5)
---------------------------

Group A has 10 tasks, all on CPU0, Group B and C have 5 tasks each all
of which are on CPU1. Each task has the same weight (NICE_0_LOAD =
1024).

The current scheme would yield a cpu weight of 10240 (10*1024) for each cpu and
the load balancer will think both CPUs are perfectly balanced and won't
move around any tasks. This, however, would yield this bandwidth:

	A = 50%
	B = 25%
	C = 25%

which is not the desired result.

What's changing in the patch?
=============================

	- How cpu weights are calculated when CONFIG_FAIR_GROUP_SCHED is
	  defined (see below)
	- API Change
		- Two tunables introduced in sysfs (under SCHED_DEBUG) to
		  control the frequency at which the load balance monitor
		  thread runs.

The basic change made in this patch is how cpu weight (rq->load.weight) is
calculated. It's now calculated as the summation of group weights on a cpu,
rather than summation of task weights. Weight exerted by a group on a
cpu is dependent on the shares allocated to it and also the number of
tasks the group has on that cpu compared to the total number of
(runnable) tasks the group has in the system.

Let,
	W(K,i)  = Weight of group K on cpu i
	T(K,i)  = Task load present in group K's cfs_rq on cpu i
	T(K)    = Total task load of group K across various cpus
	S(K) 	= Shares allocated to group K
	NRCPUS	= Number of online cpus in the scheduler domain to
	 	  which group K is assigned.

Then,
	W(K,i) = S(K) * NRCPUS * T(K,i) / T(K)
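
As a worked example of this formula, the sketch below applies it to the 2-cpu scenario described earlier (A with 10 tasks on CPU0, B and C with 5 tasks each on CPU1). It assumes, purely for illustration, that each group has 1024 shares; the function names are hypothetical and integer division stands in for whatever rounding the kernel actually uses.

```python
NICE_0_LOAD = 1024


def group_weight(shares: int, nr_cpus: int,
                 task_load_on_cpu: int, total_task_load: int) -> int:
    """W(K,i) = S(K) * NRCPUS * T(K,i) / T(K)."""
    if total_task_load == 0:
        return 0
    return shares * nr_cpus * task_load_on_cpu // total_task_load


def cpu_weights(placement: dict, shares: dict, nr_cpus: int) -> list:
    """Sum the group weights on each cpu; placement maps group -> {cpu: ntasks}."""
    weights = [0] * nr_cpus
    for group, per_cpu in placement.items():
        total_load = sum(per_cpu.values()) * NICE_0_LOAD
        for cpu, ntasks in per_cpu.items():
            weights[cpu] += group_weight(
                shares[group], nr_cpus, ntasks * NICE_0_LOAD, total_load)
    return weights


def task_sum_weights(placement: dict, nr_cpus: int) -> list:
    """The old scheme: cpu weight is the plain sum of task weights."""
    weights = [0] * nr_cpus
    for per_cpu in placement.values():
        for cpu, ntasks in per_cpu.items():
            weights[cpu] += ntasks * NICE_0_LOAD
    return weights


placement = {"A": {0: 10}, "B": {1: 5}, "C": {1: 5}}
shares = {"A": 1024, "B": 1024, "C": 1024}  # assumed equal shares

old = task_sum_weights(placement, nr_cpus=2)     # [10240, 10240]
new = cpu_weights(placement, shares, nr_cpus=2)  # [2048, 4096]
```

Under the old scheme both cpus weigh 10240 and look balanced; with group weights CPU1 carries twice the weight of CPU0, so the load balancer has a reason to move tasks.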

A load balance monitor thread is created at bootup, which periodically
runs and adjusts group's weight on each cpu. To avoid its overhead, two
min/max tunables are introduced (under SCHED_DEBUG) to control the rate
at which it runs.

Fixes from: Peter Zijlstra <a.p.zijlstra@chello.nl>

- don't start the load_balance_monitor when there is only a single cpu.
- rename the kthread because it's currently longer than TASK_COMM_LEN

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:00 +01:00
Linus Torvalds
b47711bfbc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  selinux: make mls_compute_sid always polyinstantiate
  security/selinux: constify function pointer tables and fields
  security: add a secctx_to_secid() hook
  security: call security_file_permission from rw_verify_area
  security: remove security_sb_post_mountroot hook
  Security: remove security.h include from mm.h
  Security: remove security_file_mmap hook sparse-warnings (NULL as 0).
  Security: add get, set, and cloning of superblock security information
  security/selinux: Add missing "space"
2008-01-25 08:44:29 -08:00
Linus Torvalds
eba0e319c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (125 commits)
  [CRYPTO] twofish: Merge common glue code
  [CRYPTO] hifn_795x: Fixup container_of() usage
  [CRYPTO] cast6: inline bloat--
  [CRYPTO] api: Set default CRYPTO_MINALIGN to unsigned long long
  [CRYPTO] tcrypt: Make xcbc available as a standalone test
  [CRYPTO] xcbc: Remove bogus hash/cipher test
  [CRYPTO] xcbc: Fix algorithm leak when block size check fails
  [CRYPTO] tcrypt: Zero axbuf in the right function
  [CRYPTO] padlock: Only reset the key once for each CBC and ECB operation
  [CRYPTO] api: Include sched.h for cond_resched in scatterwalk.h
  [CRYPTO] salsa20-asm: Remove unnecessary dependency on CRYPTO_SALSA20
  [CRYPTO] tcrypt: Add select of AEAD
  [CRYPTO] salsa20: Add x86-64 assembly version
  [CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)
  [CRYPTO] gcm: Introduce rfc4106
  [CRYPTO] api: Show async type
  [CRYPTO] chainiv: Avoid lock spinning where possible
  [CRYPTO] seqiv: Add select AEAD in Kconfig
  [CRYPTO] scatterwalk: Handle zero nbytes in scatterwalk_map_and_copy
  [CRYPTO] null: Allow setkey on digest_null 
  ...
2008-01-25 08:38:25 -08:00
Greg Kroah-Hartman
79a6ee42fd Kobject: fix coding style issues in kobject.h
Finally clean up the odd spaces and other mess in kobject.h

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 21:27:06 -08:00
Greg Kroah-Hartman
d462943afe Driver core: fix coding style issues in device.h
Finally clean up the odd spaces and other mess in device.h

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 21:04:46 -08:00
Dave Young
fd04897bb2 Driver Core: add class iteration api
Add the following class iteration functions for driver use:
	class_for_each_device
	class_find_device
	class_for_each_child
	class_find_child

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:44 -08:00
Stephen Rothwell
ae72cddb23 Driver Core: constify the name passed to platform_device_register_simple
This name is just passed to platform_device_alloc which has its parameter
declared const.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:43 -08:00
Kay Sievers
af5ca3f4ec Driver core: change sysdev classes to use dynamic kobject names
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman
528a4bf1d5 Kobject: remove kobject_unregister() as no one uses it anymore
There are no in-kernel users of kobject_unregister() so it should be
removed.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Kay Sievers
0f4dafc056 Kobject: auto-cleanup on final unref
We save the current state in the object itself, so we can do proper
cleanup when the last reference is dropped.

If the initial reference is dropped, the object will be removed from
sysfs if needed, if an "add" event was sent, "remove" will be sent, and
the allocated resources are released.
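
A toy model of this behaviour, assuming nothing about the real struct kobject layout: the class `Kobj` and its flag names are hypothetical. The object records what has been done to it, and dropping the last reference undoes exactly those steps. The relative order of the "remove" event and the sysfs removal in this sketch is an assumption.

```python
class Kobj:
    """Toy object that remembers its own state for cleanup on final unref."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 1            # the initial reference
        self.in_sysfs = False        # set once the object is added to sysfs
        self.add_event_sent = False  # set once an "add" event went out
        self.log = []                # records actions, for illustration

    def add(self) -> None:
        self.in_sysfs = True
        self.log.append("sysfs add")

    def send_add_event(self) -> None:
        self.add_event_sent = True
        self.log.append("uevent add")

    def get(self) -> "Kobj":
        self.refcount += 1
        return self

    def put(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self._cleanup()

    def _cleanup(self) -> None:
        # Undo only what the saved state says was actually done.
        if self.add_event_sent:
            self.log.append("uevent remove")
        if self.in_sysfs:
            self.log.append("sysfs remove")
        self.log.append("release")
```

A caller that only ever drops references no longer needs to remember whether the object was registered or announced; the final put() works that out from the saved flags.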

This allows us to clean up some driver core usage as well as allowing us
to do other such changes to the rest of the kernel.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:39 -08:00
Greg Kroah-Hartman
12e339ac6e Kset: remove kset_add function
No one is calling this anymore, so just remove it and hard-code the one
internal-use of it.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:39 -08:00
Greg Kroah-Hartman
6d06adfaf8 Kobject: remove kobject_register()
The function is no longer used by anyone in the kernel, and it prevents
the proper sending of the kobject uevent after the needed files are set
up by the caller.  kobject_init_and_add() can be used in its place.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:39 -08:00
Greg Kroah-Hartman
f9cb074bff Kobject: rename kobject_init_ng() to kobject_init()
Now that the old kobject_init() function is gone, rename
kobject_init_ng() to kobject_init() to clean up the namespace.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:38 -08:00
Greg Kroah-Hartman
e1543ddf73 Kobject: remove kobject_init() as no one uses it anymore
The old kobject_init() function is no longer in use, so let us remove it
from the public scope (kset mess in the kobject.c file still uses it,
but that can be cleaned up later very simply.)

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:38 -08:00
Greg Kroah-Hartman
b2d6db5878 Kobject: rename kobject_add_ng() to kobject_add()
Now that the old kobject_add() function is gone, rename kobject_add_ng()
to kobject_add() to clean up the namespace.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:38 -08:00
Greg Kroah-Hartman
9e7bbccd02 Kobject: remove kobject_add() as no one uses it anymore
The old kobject_add() function is no longer in use, so let us remove it
from the public scope (kset mess in the kobject.c file still uses it,
but that can be cleaned up later very simply.)

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:38 -08:00
Kay Sievers
edfaa7c365 Driver core: convert block from raw kobjects to core devices
This moves the block devices to /sys/class/block. It will create a
flat list of all block devices, with the disks and partitions in one
directory. For compatibility /sys/block is created and contains symlinks
to the disks.

  /sys/class/block
  |-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
  |-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1
  |-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10
  |-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5
  |-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6
  |-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7
  |-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8
  |-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9
  `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0

  /sys/block/
  |-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
  `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0
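
From user space, the flat layout can be walked as sketched below. This is an illustrative helper, not part of the patch: the function name `list_block_devices` is made up, and telling disks from partitions by looking at the parent directory of the resolved symlink is an assumption based on the tree shown above.

```python
import os


def list_block_devices(class_dir: str = "/sys/class/block") -> dict:
    """Map each block device name to its parent disk (None for whole disks).

    Every entry in class_dir is a symlink into the devices tree. A disk
    resolves to .../block/<disk>, a partition to .../block/<disk>/<part>.
    """
    result = {}
    for name in sorted(os.listdir(class_dir)):
        target = os.path.realpath(os.path.join(class_dir, name))
        parent = os.path.basename(os.path.dirname(target))
        # Disks sit directly under a "block" directory; partitions sit
        # under the directory of the disk they belong to.
        result[name] = None if parent == "block" else parent
    return result
```

On a system matching the listing above, calling `list_block_devices()` would be expected to report `sda` and `sr0` as disks and `sda1`, `sda5` and so on as partitions of `sda`.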

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:36 -08:00
Greg Kroah-Hartman
e5dd127846 Driver core: move the static kobject out of struct driver
This patch removes the kobject, and a few other driver-core-only fields
out of struct driver and into the driver core only.  Now drivers can be
safely created on the stack or statically (like they currently are.)

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:35 -08:00
Greg Kroah-Hartman
c63469a398 Driver core: move the driver specific module code into the driver core
The module driver specific code should belong in the driver core, not in
the kernel/ directory.  So move this code.  This is done in preparation
for some struct device_driver rework that should be confined to the
driver core code only.

This also lets us keep from exporting these functions, as no external
code should ever be calling it.

Thanks to Andrew Morton for the !CONFIG_MODULES fix.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:35 -08:00
Greg Kroah-Hartman
cbe9c595f1 Driver: add driver_add_kobj for looney iseries_veth driver
The iseries driver wants to hang kobjects off of its driver, so, to
preserve backwards compatibility, we need to add a call to the driver
core to allow future changes to work properly.

Hopefully no one uses this function in the future and the iseries_veth
driver authors come to their senses so I can remove this hack...

Cc: Dave Larson <larson1@us.ibm.com>
Cc: Santiago Leon <santil@us.ibm.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:34 -08:00
Cornelia Huck
57c745340a driver core: Introduce default attribute groups.
This is a lot like default attributes for devices (and indeed,
a lot of the code is lifted from there).

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:34 -08:00
Greg Kroah-Hartman
c6f7e72a3f driver core: remove fields from struct bus_type
struct bus_type is static everywhere in the kernel.  This moves the
kobject in the structure out of it, and a bunch of other private only to
the driver core fields are now moved to a private structure.  This lets
us dynamically create the backing kobject properly and gives us the
chance to be able to document to users exactly how to use the struct
bus_type as there are no fields they can improperly access.

Thanks to Kay for the build fixes on this patch.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:33 -08:00
Greg Kroah-Hartman
b249072ee6 driver core: add way to get to bus device klist
This allows an easier way to get to the device klist associated with a
struct bus_type (you have three to choose from...)  This will make it
easier to move these fields to be dynamic in a future patch.

The only user of this is the PCI core which horribly abuses this
interface to rearrange the order of the pci devices.  This should be
done using the existing bus device walking functions, but that's left
for future patches.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:33 -08:00
Greg Kroah-Hartman
0fed80f7a6 driver core: add way to get to bus kset
This allows an easier way to get to the kset associated with a struct
bus_type (you have three to choose from...)  This will make it easier to
move these fields to be dynamic in a future patch.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:33 -08:00
Greg Kroah-Hartman
cc972e896b driver core: remove owner field from struct bus_type
This isn't used by anything in the driver core, and by no one in the 204
different usages of it in the kernel tree.  Remove this field so no one
gets the idea that it needs to be used.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:31 -08:00
Greg Kroah-Hartman
81e7c6a636 UIO: fix kobject usage
The uio kobject code is "weird".  This patch should hopefully fix it up
to be sane and not leak memory anymore.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:26 -08:00
Greg Kroah-Hartman
d76e15fb20 driver core: make /sys/power a kobject
/sys/power should not be a kset, that's overkill.  This patch renames it
to power_kobj and fixes up all usages of it in the tree.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:25 -08:00
Greg Kroah-Hartman
2fb9113b97 kobject: remove subsystem_(un)register functions
These functions are no longer used and are the last remnants of the old
subsystem crap.  So delete them for good.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:24 -08:00
Greg Kroah-Hartman
0ff21e4663 kobject: convert kernel_kset to be a kobject
kernel_kset does not need to be a kset, but a much simpler kobject now
that we have kobj_attributes.

We also rename kernel_kset to kernel_kobj to catch all users of this
symbol with a build error instead of an easy-to-ignore build warning.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:24 -08:00
Greg Kroah-Hartman
5c03c7ab88 kset: remove decl_subsys macro
This macro is no longer used.  ksets should be created dynamically with
a call to kset_create_and_add() not declared statically.

Yes, there are 5 remaining static struct kset usages in the kernel tree,
but they will be fixed up soon.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:24 -08:00
Greg Kroah-Hartman
f62ed9e33b firmware: change firmware_kset to firmware_kobj
There is no firmware "subsystem"; it's just a directory in /sys that
other portions of the kernel want to hook into.  So make it a kobject,
not a kset, to help deter anyone who tries to do odd kset-like
things with it.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:23 -08:00
Greg Kroah-Hartman
15f2f9b3a9 firmware: remove firmware_(un)register()
These functions are no longer called or needed, so we can remove them.

As I rewrote the whole firmware.c file, add my copyright.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:23 -08:00
Kay Sievers
000f2a4d8c Driver Core: kill subsys_attribute and default sysfs ops
Remove the no longer needed subsys_attributes, they are all converted to
the more sensical kobj_attributes.

There is no longer a magic fallback in sysfs attribute operations; all
kobjects which create simple attributes explicitly need a ktype
assigned, which tells the core what was intended here.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:22 -08:00
Greg Kroah-Hartman
9e5f7f9abe firmware: export firmware_kset so that people can use that instead of the braindead firmware_register interface
Needed for future firmware subsystem cleanups.

In the end, the firmware_register/unregister functions will be deleted
entirely, but we need this symbol so that subsystems can migrate over.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:19 -08:00
Kay Sievers
eb41d9465c fix struct user_info export's sysfs interaction
Clean up the use of ksets and kobjects. Kobjects are instances of
objects (like struct user_info), ksets are collections of objects of a
similar type (like the uids directory containing the user_info directories).
So, use kobjects for the user_info directories, and a kset for the "uids"
directory.

On object cleanup, the final kobject_put() was missing.

Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:18 -08:00
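
The split described above (one object per user_info directory, one collection for "uids", and a final put that frees the object) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct set`, `struct obj`, `obj_create()`, `obj_get()`, `obj_put()` and the `released` counter are hypothetical stand-ins, not the kernel's struct kset, struct kobject or kobject_put().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a collection such as the "uids" directory. */
struct set {
    char name[16];
    int members;            /* number of live objects in the collection */
};

/* Hypothetical stand-in for one object such as a user_info directory. */
struct obj {
    char name[16];
    int refs;               /* plain int; a real implementation would be atomic */
    struct set *parent;
};

int released;               /* counts releases so a missing final put is visible */

/* Allocate an object holding one reference and add it to its collection. */
struct obj *obj_create(struct set *parent, const char *name)
{
    struct obj *o = calloc(1, sizeof(*o));

    if (!o)
        return NULL;
    strncpy(o->name, name, sizeof(o->name) - 1);
    o->refs = 1;
    o->parent = parent;
    parent->members++;
    return o;
}

/* Take an additional reference. */
void obj_get(struct obj *o)
{
    o->refs++;
}

/* Drop a reference; the last put removes the object from the set and frees it. */
void obj_put(struct obj *o)
{
    assert(o->refs > 0);
    if (--o->refs == 0) {
        o->parent->members--;
        released++;
        free(o);
    }
}
```

In this model, an object created with obj_create() and never given its final obj_put() stays in the set and is never freed, which is the kind of leak the missing kobject_put() caused.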
Kay Sievers
23b5212cc7 Driver Core: add kobj_attribute handling
Add kobj_sysfs_ops to replace subsys_sysfs_ops. There is no
need for special kset operations, we want to be able to use
simple attribute operations at any kobject, not only ksets.

The whole concept of any default sysfs attribute operations
will go away with the upcoming removal of subsys_sysfs_ops.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:18 -08:00
Greg Kroah-Hartman
6dcec2511f kset: convert struct bus_device->drivers to use kset_create
Dynamically create the kset instead of declaring it statically.

Having 3 static kobjects in one structure is not only foolish, but ripe
for nasty race conditions if handled improperly.  We also rename the
field to catch any potential users of it (not that there should be
outside of the driver core...)

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:16 -08:00
Greg Kroah-Hartman
3d8995963d kset: convert struct bus_device->devices to use kset_create
Dynamically create the kset instead of declaring it statically.

Having 3 static kobjects in one structure is not only foolish, but ripe
for nasty race conditions if handled improperly.  We also rename the
field to catch any potential users of it (not that there should be
outside of the driver core...)

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:16 -08:00
Greg Kroah-Hartman
039a5dcd2f kset: convert /sys/power to use kset_create
Dynamically create the kset instead of declaring it statically.  We also
rename power_subsys to power_kset to catch all users of the variable and
we properly export it so that people don't have to guess that it really
is present in the system.

The pseries code is weird, why is it creating /sys/power if CONFIG_PM
is disabled?  Oh well, stupid big boxes ignoring config options...

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:16 -08:00
Greg Kroah-Hartman
7405c1e15e kset: convert /sys/module to use kset_create
Dynamically create the kset instead of declaring it statically.  We also
rename module_subsys to module_kset to catch all users of the variable.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:16 -08:00
Greg Kroah-Hartman
2d72fc00a1 kobject: convert /sys/hypervisor to use kobject_create
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.

We also rename hypervisor_subsys to hypervisor_kobj to catch all users
of the variable.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:15 -08:00
Greg Kroah-Hartman
bd35b93d80 kset: convert kernel_subsys to use kset_create
Dynamically create the kset instead of declaring it statically.  We also
rename kernel_subsys to kernel_kset to catch all users of this symbol
with a build error instead of an easy-to-ignore build warning.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman
e5e38a86c0 kset: remove decl_subsys_name
The last user of this macro (pci hotplug core) is now switched over to
using a dynamic kset, so this macro is no longer needed at all.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman
81ace5cd8f kset: convert pci hotplug to use kset_create_and_add
This also renames pci_hotplug_slots_subsys to pci_hotplug_slots_kset to
catch all current users with a build error instead of a build warning
which can easily be missed.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman
00d2666623 kobject: convert main fs kobject to use kobject_create
This also renames fs_subsys to fs_kobj to catch all current users with a
build error instead of a build warning which can easily be missed.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman
43968d2f16 kobject: get rid of kobject_kset_add_dir
kobject_kset_add_dir is only called in one place so remove it and use
kobject_create() instead.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman
4ff6abff83 kobject: get rid of kobject_add_dir
kobject_create_and_add is the same as kobject_add_dir, so drop
kobject_add_dir.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman
3f9e3ee9dc kobject: add kobject_create_and_add function
This lets users create dynamic kobjects much easier.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Greg Kroah-Hartman
b727c70289 kset: add kset_create_and_add function
Now ksets can be dynamically created on the fly, no static definitions
are required.  Thanks to Miklos for hints on how to make this work
better for the callers.

And thanks to Kay for finding some stupid bugs in my original version
and pointing out that we need to handle the fact that kobjects can have
a kset as a parent and to handle that properly in kobject_add().

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Greg Kroah-Hartman
12d03da7c1 kobject: remove kobj_set_kset_s as no one is using it anymore
What a confusing name for a macro...

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Greg Kroah-Hartman
3514faca19 kobject: remove struct kobj_type from struct kset
We don't need a "default" ktype for a kset.  We should set this
explicitly every time for each kset.  This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumptions that the kset/kobject/ktype model has.

This patch is based on a lot of help from Kay Sievers.

Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Greg Kroah-Hartman
c11c4154e7 kobject: add kobject_init_and_add function
Also add a kobject_init_and_add function which bundles up what a lot of
the current callers want to do all at once, and it properly handles the
memory usages, unlike kobject_register();

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Greg Kroah-Hartman
244f6cee9a kobject: add kobject_add_ng function
This is what the kobject_add function is going to become.

Add this to the kernel and then we can convert the tree over to use it.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:09 -08:00
Greg Kroah-Hartman
e86000d042 kobject: add kobject_init_ng function
This is what the kobject_init function is going to become.

Add this to the kernel and then we can convert the tree over to use it.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:09 -08:00
Greg Kroah-Hartman
18041f4775 kobject: make kobject_cleanup be static
No one except the kobject core calls it so make the function static.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:09 -08:00
Emil Medve
7b8712e563 driver core: Make the dev_*() family of macros in device.h complete
Removed duplicates defined elsewhere

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:08 -08:00
Tony Jones
7dd817d083 tifm: Convert from class_device to device for TI flash media
Signed-off-by: Tony Jones <tonyj@suse.de>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:06 -08:00
Tony Jones
6013c12be8 pktcdvd: Convert from class_device to device for block/pktcdvd
struct class_device is going away, this converts the code to use struct
device instead.

Signed-off-by: Tony Jones <tonyj@suse.de>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:06 -08:00
Tony Jones
891f78ea83 DMA: Convert from class_device to device for DMA engine
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Shannon Nelson <shannon.nelson@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:05 -08:00
Evgeniy Polyakov
41ca28ab2a kref: add kref_set()
This adds kref_set() to the kref api for future use by people who really
know what they are doing with krefs...

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:05 -08:00
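
As a rough illustration of what a "set" operation adds to an init/get/put reference counter, here is a userspace sketch. It is not the kernel's kref code: `struct refcnt` and the `refcnt_*()` helpers are hypothetical names, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model of a kref-style counter; not kernel code. */
struct refcnt {
    atomic_int count;
};

/* Start with a single reference held by the creator. */
void refcnt_init(struct refcnt *r)
{
    atomic_init(&r->count, 1);
}

/* Force the counter to an arbitrary positive value (the "set" operation). */
void refcnt_set(struct refcnt *r, int n)
{
    assert(n > 0);
    atomic_store(&r->count, n);
}

/* Take one more reference. */
void refcnt_get(struct refcnt *r)
{
    atomic_fetch_add(&r->count, 1);
}

/*
 * Drop one reference. Returns 1 and calls release (if given) when the
 * last reference goes away, 0 otherwise.
 */
int refcnt_put(struct refcnt *r, void (*release)(struct refcnt *))
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        if (release)
            release(r);
        return 1;
    }
    return 0;
}
```

Setting the count directly bypasses the usual get/put pairing, which is presumably why the commit message reserves it for callers who really know what they are doing.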
Rafael J. Wysocki
775b64d2b6 PM: Acquire device locks on suspend
This patch reorganizes the way suspend and resume notifications are
sent to drivers.  The major changes are that now the PM core acquires
every device semaphore before calling the methods, and calls to
device_add() during suspends will fail, while calls to device_del()
during suspends will block.

It also provides a way to safely remove a suspended device with the
help of the PM core, by using the device_pm_schedule_removal() callback
introduced specifically for this purpose, and updates two drivers (msr
and cpuid) that need to use it.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:04 -08:00
Jan Engelhardt
1996a10948 security/selinux: constify function pointer tables and fields
Constify function pointer tables and fields.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-25 11:29:54 +11:00
David Howells
63cb344923 security: add a secctx_to_secid() hook
Add a secctx_to_secid() LSM hook to go along with the existing
secid_to_secctx() LSM hook.  This patch also includes the SELinux
implementation for this hook.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-25 11:29:53 +11:00
H. Peter Anvin
bced95283e security: remove security_sb_post_mountroot hook
The security_sb_post_mountroot() hook is long-since obsolete, and is
fundamentally broken: it is never invoked if someone uses initramfs.
This is particularly damaging, because the existence of this hook has
been used as motivation for not using initramfs.

Stephen Smalley confirmed on 2007-07-19 that this hook was originally
used by SELinux but can now be safely removed:

     http://marc.info/?l=linux-kernel&m=118485683612916&w=2

Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-25 11:29:50 +11:00
James Morris
42d7896ebc Security: remove security.h include from mm.h
Remove security.h include from mm.h, as it is only needed for a single
extern declaration, and pulls in all kinds of crud.

Fine-by-me: David Chinner <dgc@sgi.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-25 11:29:49 +11:00
Eric Paris
c9180a57a9 Security: add get, set, and cloning of superblock security information
Adds security_get_sb_mnt_opts, security_set_sb_mnt_opts, and
security_clone_sb_mnt_opts to the LSM and to SELinux.  This will allow
filesystems to directly own and control all of their mount options if they
so choose.  This interface deals only with option identifiers and strings so
it should be generic enough for any LSM which may come in the future.

Filesystems which pass text mount data around in the kernel (almost all of
them) need not currently make use of this interface when dealing with
SELinux since it will still parse those strings as it always has.  I assume
future LSM's would do the same.  NFS is the primary FS which does not use
text mount data and thus must make use of this interface.

An LSM would need to implement these functions only if they had mount time
options, such as selinux has context= or fscontext=.  If the LSM has no
mount time options they could simply not implement and let the dummy ops
take care of things.

An LSM other than SELinux would need to define new option numbers in
security.h and any FS which decides to own their own security options would
need to be patched to use this new interface for every possible LSM.  This
is because it was stated to me very clearly that LSMs should not attempt to
understand FS mount data and the burden to understand security should be in
the FS which owns the options.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-25 11:29:46 +11:00
Len Brown
d4b7dc499d ACPI: make _OSI(Linux) console messages smarter
If BIOS invokes _OSI(Linux), the kernel response
depends on what the ACPI DMI list knows about the system,
and that is reflected in dmesg:

1) System unknown to DMI:

ACPI: BIOS _OSI(Linux) query ignored
ACPI: DMI System Vendor: LENOVO
ACPI: DMI Product Name: 7661W1P
ACPI: DMI Product Version: ThinkPad T61
ACPI: DMI Board Name: 7661W1P
ACPI: DMI BIOS Vendor: LENOVO
ACPI: DMI BIOS Date: 10/18/2007
ACPI: Please send DMI info above to linux-acpi@vger.kernel.org
ACPI: If "acpi_osi=Linux" works better, please notify linux-acpi@vger.kernel.org

2) System known to DMI, but effect of OSI(Linux) unknown:

ACPI: DMI detected: Lenovo ThinkPad T61
...
ACPI: BIOS _OSI(Linux) query ignored via DMI
ACPI: If "acpi_osi=Linux" works better, please notify linux-acpi@vger.kernel.org

3) System known to DMI, which disables _OSI(Linux):

ACPI: DMI detected: Lenovo ThinkPad T61
...
ACPI: BIOS _OSI(Linux) query ignored via DMI

4) System known to DMI, which enables _OSI(Linux):

ACPI: DMI detected: Lenovo ThinkPad T61
ACPI: Added _OSI(Linux)
...
ACPI: BIOS _OSI(Linux) query honored via DMI

cmdline overrides take precedence over the built-in
default and the DMI prescribed default.
cmdline "acpi_osi=Linux" results in:

ACPI: BIOS _OSI(Linux) query honored via cmdline

Signed-off-by: Len Brown <len.brown@intel.com>
2008-01-23 21:26:15 -05:00
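
The precedence described above (cmdline first, then DMI, then the built-in default) can be sketched as a small decision function. Everything here is an assumption made for illustration: `enum osi_pref`, `enum osi_source`, `struct osi_decision` and `osi_linux_decide()` are hypothetical and do not appear in the ACPI code.

```c
#include <assert.h>

/* Hypothetical tri-state for a preference that may or may not be given. */
enum osi_pref { OSI_UNSET, OSI_DISABLE, OSI_ENABLE };

/* Which input decided the outcome. */
enum osi_source { SRC_BUILTIN, SRC_DMI, SRC_CMDLINE };

struct osi_decision {
    int honored;            /* 1 if the _OSI(Linux) query is honored */
    enum osi_source source;
};

/*
 * Resolve the effective setting: cmdline overrides DMI, and DMI
 * overrides the built-in default.
 */
struct osi_decision osi_linux_decide(enum osi_pref cmdline,
                                     enum osi_pref dmi,
                                     int builtin_honor)
{
    struct osi_decision d;

    assert(builtin_honor == 0 || builtin_honor == 1);

    if (cmdline != OSI_UNSET) {
        d.honored = (cmdline == OSI_ENABLE);
        d.source = SRC_CMDLINE;
    } else if (dmi != OSI_UNSET) {
        d.honored = (dmi == OSI_ENABLE);
        d.source = SRC_DMI;
    } else {
        d.honored = builtin_honor;
        d.source = SRC_BUILTIN;
    }
    return d;
}
```

With these definitions, a cmdline preference wins even when the DMI entry disagrees, matching the "honored via cmdline" case in the message above.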
Len Brown
f89e3b0620 DMI: create dmi_get_slot()
This simply allows other sub-systems (such as ACPI)
to access and print out slots in static dmi_ident[].

Signed-off-by: Len Brown <len.brown@intel.com>
2008-01-23 21:23:13 -05:00
Len Brown
81b4e1f626 DMI: move dmi_available declaration to linux/dmi.h
Signed-off-by: Len Brown <len.brown@intel.com>
2008-01-23 21:22:21 -05:00