kernel-ark/Documentation
David Howells 2d6fff6370 FS-Cache: Add the FS-Cache netfs API and documentation
Add the API for a generic facility (FS-Cache) by which filesystems (such as AFS
or NFS) may call on local caching capabilities without having to know anything
about how the cache works, or even if there is a cache:

	+---------+
	|         |                        +--------------+
	|   NFS   |--+                     |              |
	|         |  |                 +-->|   CacheFS    |
	+---------+  |   +----------+  |   |  /dev/hda5   |
	             |   |          |  |   +--------------+
	+---------+  +-->|          |  |
	|         |      |          |--+
	|   AFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+  +-->|          |  |
	             |   |          |  |   +--------------+
	+---------+  |   +----------+  |   |              |
	|         |  |                 +-->|  CacheFiles  |
	|  ISOFS  |--+                     |  /var/cache  |
	|         |                        +--------------+
	+---------+

General documentation and documentation of the netfs-specific API are provided
in addition to the header files.

As this patch stands, it is possible to build a filesystem against the facility
and attempt to use it.  All that will happen is that all requests will be
immediately denied as if no cache is present.

Further patches will implement the core of the facility.  The facility will
transfer requests from networking filesystems to appropriate caches if
possible, or else gracefully deny them.

If this facility is disabled in the kernel configuration, then all its
operations will trivially reduce to nothing during compilation.
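
The compile-out behaviour described above can be sketched roughly as follows.
This is only an illustration, not the code in this patch: DEMO_CACHE_ENABLED
stands in for a kernel configuration option, and every demo_* name and return
value is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical names throughout: DEMO_CACHE_ENABLED stands in for a kernel
 * configuration option and demo_* for the calls a netfs would make.
 */
struct demo_cookie;	/* opaque handle; NULL means "nothing cached there" */

#ifdef DEMO_CACHE_ENABLED
/* Facility enabled: real implementations would live elsewhere. */
struct demo_cookie *demo_acquire_cookie(const char *key);
int demo_read_page(struct demo_cookie *cookie, unsigned long index, void *page);
#else
/* Facility disabled: each operation is an inline stub the compiler can fold away. */
static inline struct demo_cookie *demo_acquire_cookie(const char *key)
{
	(void)key;
	return NULL;
}

static inline int demo_read_page(struct demo_cookie *cookie,
				 unsigned long index, void *page)
{
	(void)cookie;
	(void)index;
	(void)page;
	return -1;	/* request denied, as if no cache is present */
}
#endif

/*
 * The netfs-side caller is written once and is the same in both
 * configurations.  Returns 0 if the page came from the cache, or 1 if the
 * netfs must fetch it from the server.
 */
static inline int demo_netfs_readpage(const char *key, unsigned long index,
				      void *page)
{
	struct demo_cookie *cookie;

	assert(key != NULL);
	cookie = demo_acquire_cookie(key);
	if (!cookie)
		return 1;	/* no cache: go to the network */
	return demo_read_page(cookie, index, page) == 0 ? 0 : 1;
}
```

With the option left undefined, demo_netfs_readpage() always reports that the
page must be fetched from the server, which mirrors the "immediately denied"
behaviour described for this patch.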

WHY NOT I_MAPPING?
==================

I have added my own API to implement caching rather than using i_mapping to do
this for a number of reasons.  These have been discussed a lot on the LKML and
CacheFS mailing lists, but to summarise the basics:

 (1) Most filesystems don't do hole reportage.  Holes in files are treated as
     blocks of zeros and can't be distinguished otherwise, making it difficult
     to distinguish blocks that have been read from the network and cached from
     those that haven't.

 (2) The backing inode must be fully populated before being exposed to
     userspace through the main inode because the VM/VFS goes directly to the
     backing inode and does not interrogate the front inode's VM ops.

     Therefore:

     (a) The backing inode must fit entirely within the cache.

     (b) All backed files currently open must fit entirely within the cache at
     	 the same time.

     (c) A working set of files in total larger than the cache may not be
     	 cached.

     (d) A file may not grow larger than the available space in the cache.

     (e) A file that's open and cached, and remotely grows larger than the
     	 cache is potentially stuffed.

 (3) Writes go to the backing filesystem, and can only be transferred to the
     network when the file is closed.

 (4) There's no record of what changes have been made, so the whole file must
     be written back.

 (5) The pages belong to the backing filesystem, and all metadata associated
     with those pages is relevant only to the backing filesystem, and not to
     anything stacked atop it.

OVERVIEW
========

FS-Cache provides (or will provide) the following facilities:

 (1) Caches can be added / removed at any time, even whilst in use.

 (2) Tags can be used to refer to caches, even if they're not available yet.

 (3) More than one cache can be used at once.  Caches can be selected
     explicitly by use of tags.

 (4) The netfs is provided with an interface that allows either party to
     withdraw caching facilities from a file (required for (1)).

 (5) A netfs may annotate cache objects that belong to it.  This permits the
     storage of coherency maintenance data.

 (6) Cache objects will be pinnable and space reservations will be possible.

 (7) The interface to the netfs returns as few errors as possible, preferring
     rather to let the netfs remain oblivious.

 (8) Cookies are used to represent indices, files and other objects to the
     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
     cached there.

 (9) The netfs is allowed to propose - dynamically - any index hierarchy it
     desires, though it must be aware that the index search function is
     recursive, stack space is limited, and indices can only be children of
     indices.

(10) Indices can be used to group files together to reduce key size and to make
     group invalidation easier.  The use of indices may make lookup quicker,
     but that's cache dependent.

(11) Data I/O is effectively done directly to and from the netfs's pages.  The
     netfs indicates that page A is at index B of the data-file represented by
     cookie C, and that it should be read or written.  The cache backend may or
     may not start I/O on that page, but if it does, a netfs callback will be
     invoked to indicate completion.  The I/O may be either synchronous or
     asynchronous.

(12) Cookies can be "retired" upon release.  At this point FS-Cache will mark
     them as obsolete and the index hierarchy rooted at that point will get
     recycled.

(13) The netfs provides a "match" function for index searches.  In addition to
     saying whether a match was made or not, this can also specify that an
     entry should be updated or deleted.
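
Points (8) and (11) can be illustrated with a minimal user-space sketch.  It
assumes a cookie that simply points at an in-memory backing store and completes
its I/O synchronously; the demo_* names, DEMO_PAGE_SIZE and the return values
are all hypothetical and are not the API added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16	/* hypothetical, tiny for illustration */

/* Hypothetical cookie: a data file held in an in-memory backing store. */
struct demo_cookie {
	const unsigned char *backing;
	size_t n_pages;
};

/* Completion callback supplied by the netfs. */
typedef void (*demo_done_t)(void *page, void *context, int error);

/*
 * Ask the cache to fill 'page' with page 'index' of the file that 'cookie'
 * represents.  Returns 0 if I/O was started (the callback is then invoked on
 * completion), or -1 if the cache declined and the callback is never called.
 */
static int demo_read_page(struct demo_cookie *cookie, size_t index,
			  void *page, demo_done_t done, void *context)
{
	assert(page != NULL && done != NULL);

	if (!cookie)			/* NULL cookie: nothing cached there */
		return -1;
	if (index >= cookie->n_pages)	/* page not held by the cache */
		return -1;

	/* I/O goes directly into the netfs's own page. */
	memcpy(page, cookie->backing + index * DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);
	done(page, context, 0);		/* this sketch completes synchronously */
	return 0;
}

/* Example callback: count the successful completions. */
static void demo_count_done(void *page, void *context, int error)
{
	(void)page;
	if (!error)
		++*(int *)context;
}
```

A netfs using this sketch would fall back to reading from the server whenever
demo_read_page() returns -1, without needing to know why the cache declined.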

FS-Cache maintains a virtual index tree in which all indices, files, objects
and pages are kept.  Bits of this tree may actually reside in one or more
caches.

                                           FSDEF
                                             |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                     mirror          afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                         |
 +---+---+     +-----+      +---+      +------+------+            +-----+----+
 |   |   |     |     |      |   |      |      |      |            |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                     |                                            |
                    PG0                                       +-------+
                                                              |       |
                                                            00001   00003
                                                              |
                                                          +---+---+
                                                          |   |   |
                                                         PG0 PG1 PG2

In the example above, two netfs's can be seen to be backed: NFS and AFS.  These
have different index hierarchies:

 (*) The NFS primary index will probably contain per-server indices.  Each
     server index is indexed by NFS file handles to get data file objects.
     Each data file object can have an array of pages, but may also have
     further child objects, such as extended attributes and directory entries.
     Extended attribute objects themselves have page-array contents.

 (*) The AFS primary index contains per-cell indices.  Each cell index contains
     per-logical-volume indices.  Each volume index contains up to three
     indices for the read-write, read-only and backup mirrors of those volumes.
     Each of these contains vnode data file objects, each of which contains an
     array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children.  Any index with non-index object children will be assumed to only
reside in one cache.
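
The two structural rules (indices can only be children of indices, and an index
may span caches only while all of its children are indices) can be sketched as
below.  The structure layout and the demo_* names are hypothetical and chosen
only for illustration; the real object representation is not shown in this
patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object kinds in the virtual index tree. */
enum demo_type { DEMO_INDEX, DEMO_DATAFILE, DEMO_OTHER };

struct demo_object {
	enum demo_type type;
	struct demo_object *parent;
	unsigned int n_index_children;
	unsigned int n_other_children;
};

/*
 * Attach 'child' beneath 'parent'.  An index may only be the child of another
 * index, so attaching an index below a data file or other object is refused.
 */
static bool demo_attach(struct demo_object *parent, struct demo_object *child)
{
	assert(parent != NULL && child != NULL);

	if (child->type == DEMO_INDEX && parent->type != DEMO_INDEX)
		return false;

	child->parent = parent;
	if (child->type == DEMO_INDEX)
		parent->n_index_children++;
	else
		parent->n_other_children++;
	return true;
}

/* An index may reside in more than one cache only if it has only index children. */
static bool demo_may_span_caches(const struct demo_object *obj)
{
	return obj->type == DEMO_INDEX && obj->n_other_children == 0;
}
```

Applied to the diagram above, the NFS index (whose children are the homedir and
mirror indices) could span caches, whereas the homedir index could not once a
data file such as 00001 is attached to it.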

The FS-Cache overview can be found in:

	Documentation/filesystems/caching/fscache.txt

The netfs API to FS-Cache can be found in:

	Documentation/filesystems/caching/netfs-api.txt

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:36 +01:00
ABI Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2009-04-01 10:57:49 -07:00
accounting Documentation/accounting/getdelays.c: fix endless loop 2009-01-15 16:39:37 -08:00
acpi ACPI: update debug parameter documentation 2008-11-07 21:45:29 -05:00
aoe
arm Merge branch 'next-s3c-pm' of git://aeryn.fluff.org.uk/bjdooks/linux into devel 2009-03-26 22:44:43 +00:00
auxdisplay .gitignore updates 2008-10-30 11:38:45 -07:00
blackfin Blackfin arch: Add document about bfin-gpio 2009-01-07 23:14:38 +08:00
block block: Repeated lines in switching-sched.txt 2009-03-26 11:01:28 +01:00
blockdev Create/use more directory structure in the Documentation/ tree. 2008-11-14 17:28:53 +00:00
cdrom doc/cdrom: Trvial documentation error, file not present 2008-10-10 08:22:44 +02:00
cgroups memcg: fix OOM killer under memcg 2009-04-02 19:04:55 -07:00
connector Documentation/connector/cn_test.c: don't use gfp_any() 2009-02-12 16:47:01 -08:00
console
cpu-freq [CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions 2009-02-24 22:47:31 -05:00
cpuidle
cris fix random typos 2008-10-16 11:21:30 -07:00
crypto async_tx, dmaengine: document channel allocation and api rework 2009-01-05 18:10:19 -07:00
development-process Fix a typo in the development process document. 2009-01-08 16:32:13 -07:00
device-mapper
DocBook documentation: ignore byproducts from latex 2009-04-02 19:04:53 -07:00
driver-model PATCH [1/2] Documentation/driver-model/device.txt: fix struct device_attribute 2009-02-22 09:27:15 -08:00
dvb V4L/DVB (11138): get_dvb_firmware: add support for downloading the cx2584x firmware for pvrusb2 2009-03-30 12:43:31 -03:00
early-userspace
fault-injection
fb fbdev: remove cyblafb driver 2009-04-01 08:59:33 -07:00
filesystems FS-Cache: Add the FS-Cache netfs API and documentation 2009-04-03 16:42:36 +01:00
firmware_class
frv
hwmon hwmon: Add LTC4215 driver 2009-04-01 08:59:21 -07:00
i2c Move the pcf8591 driver to hwmon 2009-03-30 21:46:43 +02:00
i2o
ia64 .gitignore updates 2008-10-30 11:38:45 -07:00
ide ide: update warm-plug HOWTO 2009-01-06 17:21:00 +01:00
infiniband
input Merge commit 'v2.6.28-rc9' into next 2008-12-20 04:54:54 -05:00
ioctl V4L/DVB (10870a): remove all references for video_decoder.h 2009-03-30 12:43:15 -03:00
isdn Rationalise Randy's address a bit 2008-10-30 11:38:47 -07:00
ja_JP Sync patch for jp_JP/stable_kernel_rules.txt 2009-01-28 15:55:48 -08:00
kbuild kbuild: fix kbuild.txt typos 2009-01-14 21:42:51 +01:00
kdump powerpc: Support for relocatable kdump kernel 2008-10-22 15:01:22 +11:00
ko_KR
laptops ACPI: thinkpad-acpi: bump up version to 0.22 2009-01-15 13:48:24 -05:00
lguest lguest: barrier me harder 2009-03-30 21:55:26 +10:30
m68k
make
mips ide: remove unused CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ 2009-01-14 19:19:03 +01:00
misc-devices drivers/misc/isl29003.c: driver for the ISL29003 ambient light sensor 2009-04-01 08:59:18 -07:00
mn10300
mtd
namespaces
netlabel
networking Neterion: Driver help file 2009-04-02 00:33:39 -07:00
parisc
PCI PCI MSI: Add example request loop to MSI-HOWTO.txt 2009-03-20 11:35:04 -07:00
pcmcia .gitignore updates 2008-10-30 11:38:45 -07:00
power pm: document use of RTC in pm_trace 2008-10-16 11:21:29 -07:00
powerpc powerpc: add mmc-spi-slot bindings 2009-04-01 08:59:23 -07:00
prctl
RCU rcu: documentation 1Q09 update 2009-03-10 15:55:11 -07:00
s390 documentation: update s390 header file paths 2009-01-06 15:59:28 -08:00
scheduler sched, documentation: remove old O(1) scheduler document 2009-03-02 12:02:52 +01:00
scsi [SCSI] osd: Documentation for OSD library 2009-03-12 12:58:09 -05:00
serial Create/use more directory structure in the Documentation/ tree. 2008-11-14 17:28:53 +00:00
sh sh: Kill off remaining CONFIG_SH_KGDB bits. 2008-12-22 18:44:05 +09:00
sound Merge branch 'topic/oxygen' into for-linus 2009-03-24 00:36:17 +01:00
sparc
spi hwmon: (lm70) Code streamlining and cleanup 2009-01-07 16:37:34 +01:00
sysctl documentation: fix unix_dgram_qlen description 2009-04-02 19:04:53 -07:00
telephony
thermal
timers
tracers doc: mmiotrace.txt, buffer size control change 2009-02-15 20:05:13 +01:00
uml
usb USB: usbmon: Add binary API v1 2009-03-24 16:20:36 -07:00
video4linux V4L/DVB (11225): v4lgrab: fix compilation warnings 2009-03-30 12:43:41 -03:00
vm mm: remove try_to_munlock from vmscan 2009-01-06 15:59:03 -08:00
w1 w1: send status messages after command processing 2009-01-08 08:31:14 -08:00
watchdog .gitignore updates 2008-10-30 11:38:45 -07:00
wimax i2400m: documentation and instructions for usage 2009-01-07 10:00:18 -08:00
x86 Merge branch 'x86/doc' into x86/core 2009-03-05 21:49:44 +01:00
zh_CN
00-INDEX Merge branch 'doc-subdirs' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs 2008-11-15 11:51:03 -08:00
applying-patches.txt
atomic_ops.txt
bad_memory.txt Document handling of bad memory 2008-12-03 16:09:53 -07:00
basic_profiling.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
BUG-HUNTING
c2port.txt Add c2 port support 2008-11-12 17:17:18 -08:00
cachetlb.txt
Changes Documentation/Changes: add required versions for new filesystems 2009-01-29 18:19:30 -08:00
CodingStyle fix emacs indenting howto filename expansion 2009-01-29 18:19:29 -08:00
cpu-hotplug.txt x86: use possible_cpus=NUM to extend the possible cpus allowed 2008-12-18 12:08:05 +01:00
cpu-load.txt
cputopology.txt cpumask: Use topology_core_cpumask()/topology_thread_cpumask() 2009-01-11 19:12:49 +01:00
credentials.txt CRED: Documentation 2008-11-14 10:39:26 +11:00
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt trivial: fix an -> a typos in documentation and comments 2009-01-06 11:28:07 +01:00
devices.txt [SCSI] major.h: char-major number for OSD device driver 2009-03-12 12:58:05 -05:00
DMA-API.txt dma-debug: Documentation update 2009-03-17 12:56:47 +01:00
DMA-attributes.txt
DMA-ISA-LPC.txt
DMA-mapping.txt documentation: update header file paths 2009-01-06 15:59:28 -08:00
dmaengine.txt async_tx, dmaengine: document channel allocation and api rework 2009-01-05 18:10:19 -07:00
dontdiff dontdiff: Fix asm exclude 2009-03-26 15:45:43 -07:00
dynamic-debug-howto.txt Dynamic debug: allow simple quoting of words 2009-03-24 16:38:27 -07:00
edac.txt
eisa.txt
email-clients.txt Documentation/email-clients.txt: add some info about gmail 2008-11-06 15:41:19 -08:00
exception.txt
feature-removal-schedule.txt gpio: gpio_{request,free}() now required (feature removal) 2009-04-02 19:04:51 -07:00
ftrace.txt ftrace: improve documentation 2008-11-28 13:15:14 +01:00
gpio.txt gpio: gpio_{request,free}() now required (feature removal) 2009-04-02 19:04:51 -07:00
highuid.txt
HOWTO Remove Andrew Morton's http://www.zip.com.au/~akpm/ 2008-10-16 11:21:32 -07:00
hw_random.txt
ics932s401 ics932s401: new clock generator chip driver 2008-11-12 17:17:18 -08:00
initrd.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt io mapping: improve documentation 2008-11-03 18:21:44 +01:00
IO-mapping.txt Documentation: move DMA-mapping.txt to Doc/PCI/ 2009-01-29 18:19:29 -08:00
iostats.txt
IPMI.txt
IRQ-affinity.txt
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt kernel-doc: preferred ending marker and examples 2009-02-11 14:25:36 -08:00
kernel-docs.txt
kernel-parameters.txt Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2009-04-01 09:47:12 -07:00
keys-request-key.txt
keys.txt
kobject.txt kobject: Make Documentation/kobject.txt a little more coherent. 2009-01-06 10:44:32 -08:00
kprobes.txt kprobes: support probing module __exit function 2009-01-06 15:59:21 -08:00
kref.txt
ldm.txt
leds-class.txt
local_ops.txt documentation: local_ops fix on_each_cpu 2008-12-01 13:51:26 +01:00
lockdep-design.txt lockdep: get_user_chars() redo 2009-02-14 23:28:22 +01:00
lockstat.txt lockstat: contend with points 2008-10-20 15:43:10 +02:00
logo.svg linux.conf.au 2009: Tuz 2009-03-16 07:55:37 -07:00
logo.txt linux.conf.au 2009: Tuz 2009-03-16 07:55:37 -07:00
magic-number.txt documentation: update header file paths 2009-01-06 15:59:28 -08:00
Makefile
ManagementStyle docs: fix ManagementStyle book name 2008-10-30 11:38:46 -07:00
markers.txt markers: comment marker_synchronize_unregister() on data dependency 2008-11-28 16:47:41 +01:00
mca.txt
md.txt
memory-barriers.txt
memory-hotplug.txt mm: show node to memory section relationship with symlinks in sysfs 2009-01-06 15:59:00 -08:00
memory.txt
mono.txt
mutex-design.txt
nmi_watchdog.txt x86, nmi-watchdog: update procfs nmi_watchdog file documentation v2 2008-10-30 19:07:04 +01:00
nommu-mmap.txt NOMMU: Make mmap allocation page trimming behaviour configurable. 2009-01-08 12:04:47 +00:00
numastat.txt
oops-tracing.txt
parport-lowlevel.txt
parport.txt
pi-futex.txt
pnp.txt
preempt-locking.txt
printk-formats.txt DOC: add printk-formats.txt 2008-11-12 17:17:17 -08:00
prio_tree.txt
rbtree.txt
rfkill.txt rfkill: add master_switch_mode and EPO lock to rfkill and rfkill-input 2008-10-31 19:00:09 -04:00
robust-futex-ABI.txt
robust-futexes.txt
rt-mutex-design.txt
rt-mutex.txt
rtc.txt
SAK.txt Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
SecurityBugs
SELinux.txt
serial-console.txt
sgi-ioc4.txt
sgi-visws.txt
slow-work.txt Document the slow work thread pool 2009-04-03 16:42:35 +01:00
SM501.txt
Smack.txt smack: Add a new '-CIPSO' option to the network address label configuration 2009-03-28 15:01:37 +11:00
sparse.txt
spinlocks.txt
stable_api_nonsense.txt
stable_kernel_rules.txt Update stable tree documentation 2008-10-29 15:03:49 -07:00
SubmitChecklist documentation: explain memory barriers 2008-10-16 11:21:32 -07:00
SubmittingDrivers Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
SubmittingPatches Merge branch 'docs' of git://git.lwn.net/linux-2.6 2008-10-16 12:18:16 -07:00
svga.txt
sysfs-rules.txt
sysrq.txt filesystem freeze: allow SysRq emergency thaw to thaw frozen filesystems 2009-04-01 08:59:17 -07:00
tracepoints.txt tracepoints: Documentation TPPROTO misspelt in Documentation/tracepoints.txt 2008-11-29 15:13:42 +01:00
unaligned-memory-access.txt
unicode.txt
unshare.txt
VGA-softcursor.txt
video-output.txt
volatile-considered-harmful.txt
voyager.txt
zorro.txt