<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||
<html>
|
||
<!-- This manual is for R, version 3.3.1 (2016-06-21).
|
||
|
||
Copyright (C) 2000-2016 R Core Team
|
||
|
||
Permission is granted to make and distribute verbatim copies of this
|
||
manual provided the copyright notice and this permission notice are
|
||
preserved on all copies.
|
||
|
||
Permission is granted to copy and distribute modified versions of this
|
||
manual under the conditions for verbatim copying, provided that the
|
||
entire resulting derived work is distributed under the terms of a
|
||
permission notice identical to this one.
|
||
|
||
Permission is granted to copy and distribute translations of this manual
|
||
into another language, under the above conditions for modified versions,
|
||
except that this permission notice may be stated in a translation
|
||
approved by the R Core Team. -->
|
||
<!-- Created by GNU Texinfo 6.1, http://www.gnu.org/software/texinfo/ -->
|
||
<head>
|
||
<title>R Data Import/Export</title>
|
||
|
||
<meta name="description" content="R Data Import/Export">
|
||
<meta name="keywords" content="R Data Import/Export">
|
||
<meta name="resource-type" content="document">
|
||
<meta name="distribution" content="global">
|
||
<meta name="Generator" content="texi2any">
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||
<link href="#Top" rel="start" title="Top">
|
||
<link href="#Function-and-variable-index" rel="index" title="Function and variable index">
|
||
<link href="#SEC_Contents" rel="contents" title="Table of Contents">
|
||
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
body {
    margin-left: 5%;
    margin-right: 5%;
}

h1 {
    background: white;
    color: rgb(25%, 25%, 25%);
    font-family: monospace;
    font-size: xx-large;
    text-align: center;
}

h2 {
    background: white;
    color: rgb(40%, 40%, 40%);
    font-family: monospace;
    font-size: x-large;
    text-align: center;
}

h3 {
    background: white;
    color: rgb(40%, 40%, 40%);
    font-family: monospace;
    font-size: large;
}

h4 {
    background: white;
    color: rgb(40%, 40%, 40%);
    font-family: monospace;
}

span.samp {
    font-family: monospace;
}

span.command {
    font-family: monospace;
}

span.option {
    font-family: monospace;
}

span.file {
    font-family: monospace;
}

span.env {
    font-family: monospace;
}

ul {
    margin-top: 0.25ex;
    margin-bottom: 0.25ex;
}

li {
    margin-top: 0.25ex;
    margin-bottom: 0.25ex;
}

p {
    margin-top: 0.6ex;
    margin-bottom: 1.2ex;
}

-->
</style>
|
||
|
||
|
||
</head>
|
||
|
||
<body lang="en">
|
||
<h1 class="settitle" align="center">R Data Import/Export</h1>
<a name="SEC_Contents"></a>
|
||
<h2 class="contents-heading">Table of Contents</h2>
|
||
|
||
<div class="contents">
|
||
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Acknowledgements-1" href="#Acknowledgements">Acknowledgements</a></li>
|
||
<li><a name="toc-Introduction-1" href="#Introduction">1 Introduction</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Imports-1" href="#Imports">1.1 Imports</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Encodings-1" href="#Encodings">1.1.1 Encodings</a></li>
|
||
</ul></li>
|
||
<li><a name="toc-Export-to-text-files-1" href="#Export-to-text-files">1.2 Export to text files</a></li>
|
||
<li><a name="toc-XML-1" href="#XML">1.3 XML</a></li>
|
||
</ul></li>
|
||
<li><a name="toc-Spreadsheet_002dlike-data-1" href="#Spreadsheet_002dlike-data">2 Spreadsheet-like data</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Variations-on-read_002etable-1" href="#Variations-on-read_002etable">2.1 Variations on <code>read.table</code></a></li>
|
||
<li><a name="toc-Fixed_002dwidth_002dformat-files-1" href="#Fixed_002dwidth_002dformat-files">2.2 Fixed-width-format files</a></li>
|
||
<li><a name="toc-Data-Interchange-Format-_0028DIF_0029-1" href="#Data-Interchange-Format-_0028DIF_0029">2.3 Data Interchange Format (DIF)</a></li>
|
||
<li><a name="toc-Using-scan-directly-1" href="#Using-scan-directly">2.4 Using <code>scan</code> directly</a></li>
|
||
<li><a name="toc-Re_002dshaping-data-1" href="#Re_002dshaping-data">2.5 Re-shaping data</a></li>
|
||
<li><a name="toc-Flat-contingency-tables-1" href="#Flat-contingency-tables">2.6 Flat contingency tables</a></li>
|
||
</ul></li>
|
||
<li><a name="toc-Importing-from-other-statistical-systems-1" href="#Importing-from-other-statistical-systems">3 Importing from other statistical systems</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-EpiInfo_002c-Minitab_002c-S_002dPLUS_002c-SAS_002c-SPSS_002c-Stata_002c-Systat" href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">3.1 EpiInfo, Minitab, S-PLUS, SAS, SPSS, Stata, Systat</a></li>
|
||
<li><a name="toc-Octave-1" href="#Octave">3.2 Octave</a></li>
|
||
</ul></li>
|
||
<li><a name="toc-Relational-databases-1" href="#Relational-databases">4 Relational databases</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Why-use-a-database_003f-1" href="#Why-use-a-database_003f">4.1 Why use a database?</a></li>
|
||
<li><a name="toc-Overview-of-RDBMSs-1" href="#Overview-of-RDBMSs">4.2 Overview of RDBMSs</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-SQL-queries-1" href="#SQL-queries">4.2.1 <acronym>SQL</acronym> queries</a></li>
|
||
<li><a name="toc-Data-types-1" href="#Data-types">4.2.2 Data types</a></li>
|
||
</ul></li>
|
||
<li><a name="toc-R-interface-packages-1" href="#R-interface-packages">4.3 R interface packages</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Packages-using-DBI" href="#DBI">4.3.1 Packages using DBI</a></li>
|
||
<li><a name="toc-Package-RODBC" href="#RODBC">4.3.2 Package RODBC</a></li>
|
||
</ul></li>
|
||
</ul></li>
|
||
<li><a name="toc-Binary-files-1" href="#Binary-files">5 Binary files</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Binary-data-formats-1" href="#Binary-data-formats">5.1 Binary data formats</a></li>
|
||
<li><a name="toc-dBase-files-_0028DBF_0029-1" href="#dBase-files-_0028DBF_0029">5.2 dBase files (DBF)</a></li>
|
||
</ul></li>
|
||
<li><a name="toc-Image-files-1" href="#Image-files">6 Image files</a></li>
|
||
<li><a name="toc-Connections-1" href="#Connections">7 Connections</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Types-of-connections-1" href="#Types-of-connections">7.1 Types of connections</a></li>
|
||
<li><a name="toc-Output-to-connections-1" href="#Output-to-connections">7.2 Output to connections</a></li>
|
||
<li><a name="toc-Input-from-connections-1" href="#Input-from-connections">7.3 Input from connections</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Pushback-1" href="#Pushback">7.3.1 Pushback</a></li>
|
||
</ul></li>
|
||
<li><a name="toc-Listing-and-manipulating-connections-1" href="#Listing-and-manipulating-connections">7.4 Listing and manipulating connections</a></li>
|
||
<li><a name="toc-Binary-connections-1" href="#Binary-connections">7.5 Binary connections</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Special-values-1" href="#Special-values">7.5.1 Special values</a></li>
|
||
</ul></li>
|
||
</ul></li>
|
||
<li><a name="toc-Network-interfaces-1" href="#Network-interfaces">8 Network interfaces</a>
|
||
<ul class="no-bullet">
|
||
<li><a name="toc-Reading-from-sockets-1" href="#Reading-from-sockets">8.1 Reading from sockets</a></li>
|
||
<li><a name="toc-Using-download_002efile-1" href="#Using-download_002efile">8.2 Using <code>download.file</code></a></li>
|
||
</ul></li>
|
||
<li><a name="toc-Reading-Excel-spreadsheets-1" href="#Reading-Excel-spreadsheets">9 Reading Excel spreadsheets</a></li>
|
||
<li><a name="toc-References-1" href="#References">Appendix A References</a></li>
|
||
<li><a name="toc-Function-and-variable-index-1" href="#Function-and-variable-index">Function and variable index</a></li>
|
||
<li><a name="toc-Concept-index-1" href="#Concept-index">Concept index</a></li>
|
||
</ul>
|
||
</div>
|
||
|
||
|
||
<a name="Top"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Acknowledgements" accesskey="n" rel="next">Acknowledgements</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="R-Data-Import_002fExport"></a>
|
||
<h1 class="top">R Data Import/Export</h1>
|
||
|
||
<p>This is a guide to importing and exporting data to and from R.
|
||
</p>
|
||
<p>This manual is for R, version 3.3.1 (2016-06-21).
|
||
</p>
|
||
<p>Copyright © 2000–2016 R Core Team
|
||
</p>
|
||
<blockquote>
|
||
<p>Permission is granted to make and distribute verbatim copies of this
|
||
manual provided the copyright notice and this permission notice are
|
||
preserved on all copies.
|
||
</p>
|
||
<p>Permission is granted to copy and distribute modified versions of this
|
||
manual under the conditions for verbatim copying, provided that the
|
||
entire resulting derived work is distributed under the terms of a
|
||
permission notice identical to this one.
|
||
</p>
|
||
<p>Permission is granted to copy and distribute translations of this manual
|
||
into another language, under the above conditions for modified versions,
|
||
except that this permission notice may be stated in a translation
|
||
approved by the R Core Team.
|
||
</p></blockquote>
|
||
|
||
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Acknowledgements" accesskey="1">Acknowledgements</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Introduction" accesskey="2">Introduction</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Spreadsheet_002dlike-data" accesskey="3">Spreadsheet-like data</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Importing-from-other-statistical-systems" accesskey="4">Importing from other statistical systems</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Relational-databases" accesskey="5">Relational databases</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Binary-files" accesskey="6">Binary files</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Image-files" accesskey="7">Image files</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Connections" accesskey="8">Connections</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Network-interfaces" accesskey="9">Network interfaces</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#References">References</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Function-and-variable-index">Function and variable index</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Concept-index">Concept index</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<hr>
|
||
<a name="Acknowledgements"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Introduction" accesskey="n" rel="next">Introduction</a>, Previous: <a href="#Top" accesskey="p" rel="prev">Top</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Acknowledgements-1"></a>
|
||
<h2 class="unnumbered">Acknowledgements</h2>
|
||
|
||
<p>The relational databases part of this manual is based in part on an
|
||
earlier manual by Douglas Bates and Saikat DebRoy. The principal author
|
||
of this manual was Brian Ripley.
|
||
</p>
|
||
<p>Many volunteers have contributed to the packages used here. The
|
||
principal authors of the packages mentioned are
|
||
</p>
|
||
<blockquote>
|
||
<table summary="">
<tr><td><a href="https://CRAN.R-project.org/package=DBI"><strong>DBI</strong></a></td><td>David A. James</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=dataframes2xls"><strong>dataframes2xls</strong></a></td><td>Guido van Steen</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=foreign"><strong>foreign</strong></a></td><td>Thomas Lumley, Saikat DebRoy, Douglas Bates, Duncan Murdoch and Roger Bivand</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=gdata"><strong>gdata</strong></a></td><td>Gregory R. Warnes</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=hdf5"><strong>hdf5</strong></a></td><td>Marcus Daniels</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=ncdf"><strong>ncdf</strong></a>, <a href="https://CRAN.R-project.org/package=ncdf4"><strong>ncdf4</strong></a></td><td>David Pierce</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=rJava"><strong>rJava</strong></a></td><td>Simon Urbanek</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=RJDBC"><strong>RJDBC</strong></a></td><td>Simon Urbanek</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=RMySQL"><strong>RMySQL</strong></a></td><td>David James and Saikat DebRoy</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=RNetCDF"><strong>RNetCDF</strong></a></td><td>Pavel Michna</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=RODBC"><strong>RODBC</strong></a></td><td>Michael Lapsley and Brian Ripley</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=ROracle"><strong>ROracle</strong></a></td><td>David A. James</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=RPostgreSQL"><strong>RPostgreSQL</strong></a></td><td>Sameer Kumar Prayaga and Tomoaki Nishiyama</td></tr>
<tr><td><strong>RSPerl</strong></td><td>Duncan Temple Lang</td></tr>
<tr><td><strong>RSPython</strong></td><td>Duncan Temple Lang</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=RSQLite"><strong>RSQLite</strong></a></td><td>David A. James</td></tr>
<tr><td><strong>SJava</strong></td><td>John Chambers and Duncan Temple Lang</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=WriteXLS"><strong>WriteXLS</strong></a></td><td>Marc Schwartz</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=XLConnect"><strong>XLConnect</strong></a></td><td>Mirai Solutions GmbH</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=xlsReadWrite"><strong>xlsReadWrite</strong></a></td><td>Hans-Peter Suter</td></tr>
<tr><td><a href="https://CRAN.R-project.org/package=XML"><strong>XML</strong></a></td><td>Duncan Temple Lang</td></tr>
</table>
|
||
</blockquote>
|
||
|
||
<p>Brian Ripley is the author of the support for connections.
|
||
</p>
|
||
|
||
<hr>
|
||
<a name="Introduction"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Spreadsheet_002dlike-data" accesskey="n" rel="next">Spreadsheet-like data</a>, Previous: <a href="#Acknowledgements" accesskey="p" rel="prev">Acknowledgements</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Introduction-1"></a>
|
||
<h2 class="chapter">1 Introduction</h2>
|
||
|
||
<p>Reading data into a statistical system for analysis and exporting the
|
||
results to some other system for report writing can be frustrating tasks
|
||
that can take far more time than the statistical analysis itself, even
|
||
though most readers will find the latter far more appealing.
|
||
</p>
|
||
<p>This manual describes the import and export facilities available either
|
||
in R itself or via packages which are available from <acronym>CRAN</acronym>
|
||
or elsewhere.
|
||
</p>
|
||
<p>Unless otherwise stated, everything described in this manual is (at
|
||
least in principle) available on all platforms running R.
|
||
</p>
|
||
<p>In general, statistical systems like R are not particularly well
|
||
suited to manipulations of large-scale data. Some other systems are
|
||
better than R at this, and part of the thrust of this manual is to
|
||
suggest that rather than duplicating functionality in R we can make
|
||
another system do the work! (For example Therneau & Grambsch (2000)
|
||
commented that they preferred to do data manipulation in SAS and then
|
||
use package <a href="https://CRAN.R-project.org/package=survival"><strong>survival</strong></a> in S for the analysis.) Database
|
||
manipulation systems are often very suitable for manipulating and
|
||
extracting data: several packages to interact with DBMSs are discussed
|
||
here.
|
||
</p>
|
||
<p>There are packages to allow functionality developed in languages such as
|
||
<code>Java</code>, <code>perl</code> and <code>python</code> to be directly integrated
|
||
with R code, making the use of facilities in these languages even
|
||
more appropriate. (See the <a href="https://CRAN.R-project.org/package=rJava"><strong>rJava</strong></a> package from <acronym>CRAN</acronym>
|
||
and the <strong>SJava</strong>, <strong>RSPerl</strong> and <strong>RSPython</strong> packages from the
|
||
Omegahat project, <a href="http://www.omegahat.net">http://www.omegahat.net</a>.)
|
||
</p>
|
||
|
||
<a name="index-Unix-tools"></a>
|
||
<a name="index-awk"></a>
|
||
<a name="index-perl"></a>
|
||
<p>It is also worth remembering that R, like S, comes from the Unix
tradition of small re-usable tools, and it can be rewarding to use tools
such as <code>awk</code> and <code>perl</code> to manipulate data before import or
after export. The case study in Becker, Chambers & Wilks (1988, Chapter
9) is an example of this, where Unix tools were used to check and
manipulate the data before input to S. The traditional Unix tools
are now much more widely available, including for Windows.
</p>
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Imports" accesskey="1">Imports</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Export-to-text-files" accesskey="2">Export to text files</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#XML" accesskey="3">XML</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<hr>
|
||
<a name="Imports"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Export-to-text-files" accesskey="n" rel="next">Export to text files</a>, Previous: <a href="#Introduction" accesskey="p" rel="prev">Introduction</a>, Up: <a href="#Introduction" accesskey="u" rel="up">Introduction</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Imports-1"></a>
|
||
<h3 class="section">1.1 Imports</h3>
|
||
<a name="index-scan"></a>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Encodings" accesskey="1">Encodings</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<p>The easiest form of data to import into R is a simple text file, and
|
||
this will often be acceptable for problems of small or medium scale.
|
||
The primary function to import from a text file is <code>scan</code>, and this
|
||
underlies most of the more convenient functions discussed in
|
||
<a href="#Spreadsheet_002dlike-data">Spreadsheet-like data</a>.
|
||
</p>
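<p>As a minimal sketch of <code>scan</code> on a plain text file: the
temporary file and the numbers in it are made up for illustration, and the
field names <code>a</code>, <code>b</code> and <code>c</code> are arbitrary.
</p>

```r
# Create a small whitespace-separated file (hypothetical data).
tmp <- tempfile(fileext = ".txt")
cat("1.5 2.5 3.5\n4.5 5.5 6.5\n", file = tmp)

# By default scan() reads every field into one numeric vector.
x <- scan(tmp, quiet = TRUE)

# A list passed as 'what' describes one record; the result is a list
# with one component per field, each holding that field for all records.
rec <- scan(tmp, what = list(a = 0, b = 0, c = 0), quiet = TRUE)

print(x)
print(rec$a)

unlink(tmp)
```

<p>The second form is close to what the more convenient functions do on the
user's behalf: each component of <code>rec</code> corresponds to one column of
the file.
</p>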
|
||
<p>However, all statistical consultants are familiar with being presented
|
||
by a client with a memory stick (formerly, a floppy disc or CD-R) of
|
||
data in some proprietary binary format, for example ‘an Excel
|
||
spreadsheet’ or ‘an SPSS file’. Often the simplest thing to do is to
|
||
use the originating application to export the data as a text file (and
|
||
statistical consultants will have copies of the most common applications
|
||
on their computers for that purpose). However, this is not always
|
||
possible, and <a href="#Importing-from-other-statistical-systems">Importing from other statistical systems</a> discusses
|
||
what facilities are available to access such files directly from R.
|
||
For Excel spreadsheets, the available methods are summarized in
|
||
<a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a>.
|
||
</p>
|
||
<p>In a few cases, data have been stored in a binary form for compactness
|
||
and speed of access. One application of this that we have seen several
|
||
times is imaging data, which is normally stored as a stream of bytes as
|
||
represented in memory, possibly preceded by a header. Such data formats
|
||
are discussed in <a href="#Binary-files">Binary files</a> and <a href="#Binary-connections">Binary connections</a>.
|
||
</p>
|
||
<p>For much larger databases it is common to handle the data using a
|
||
database management system (DBMS). There is once again the option of
|
||
using the DBMS to extract a plain file, but for many such DBMSs the
|
||
extraction operation can be done directly from an R package:
|
||
See <a href="#Relational-databases">Relational databases</a>. Importing data via network connections is
|
||
discussed in <a href="#Network-interfaces">Network interfaces</a>.
|
||
</p>
|
||
<hr>
|
||
<a name="Encodings"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Imports" accesskey="p" rel="prev">Imports</a>, Up: <a href="#Imports" accesskey="u" rel="up">Imports</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Encodings-1"></a>
|
||
<h4 class="subsection">1.1.1 Encodings</h4>
|
||
<a name="index-Encodings"></a>
|
||
|
||
<p>Unless the file to be imported from is entirely in <acronym>ASCII</acronym>, it
is usually necessary to know how it was encoded. For a text file, a good
way to find out something about its structure is the <code>file</code>
command-line tool (for Windows, included in <code>Rtools</code>). This
reports something like
</p>
|
||
<div class="example">
<pre class="example">text.Rd: UTF-8 Unicode English text
text2.dat: ISO-8859 English text
text3.dat: Little-endian UTF-16 Unicode English character data,
   with CRLF line terminators
intro.dat: UTF-8 Unicode text
intro.dat: UTF-8 Unicode (with BOM) text
</pre></div>
|
||
|
||
<p>Modern Unix-alike systems, including OS X, are likely to produce
UTF-8 files. Windows may produce what it calls ‘Unicode’ files
(<code>UCS-2LE</code> or just possibly <code>UTF-16LE</code><a name="DOCF1" href="#FOOT1"><sup>1</sup></a>). Otherwise most files will be in an
8-bit encoding unless from a Chinese/Japanese/Korean locale (which have
a wide range of encodings in common use). It is not possible to
detect automatically and with certainty which 8-bit encoding was used
(although guesses may be possible and <code>file</code> may guess as it did in the
example above), so you may simply have to ask the originator for some
clues (e.g. ‘Russian on Windows’).
</p>
|
||
<p>‘BOMs’ (Byte Order Marks,
|
||
<a href="https://en.wikipedia.org/wiki/Byte_order_mark">https://en.wikipedia.org/wiki/Byte_order_mark</a>) cause problems for
|
||
Unicode files. In the Unix world BOMs are rarely used, whereas in the
|
||
Windows world they almost always are for UCS-2/UTF-16 files, and often
|
||
are for UTF-8 files. The <code>file</code> utility will not even recognize
|
||
UCS-2 files without a BOM, but many other utilities will refuse to read
|
||
files with a BOM and the <acronym>IANA</acronym> standards for <code>UTF-16LE</code>
|
||
and <code>UTF-16BE</code> prohibit it. We have too often been reduced to
|
||
looking at the file with the command-line utility <code>od</code> or a hex
|
||
editor to work out its encoding.
|
||
</p>
|
||
<p>Note that <code>utf8</code> is not a valid encoding name (<code>UTF-8</code> is),
|
||
and <code>macintosh</code> is the most portable name for what is sometimes
|
||
called ‘Mac Roman’ encoding.
|
||
</p>
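<p>When <code>od</code> or a hex editor is not to hand, the first few bytes of
a file can be inspected from R itself. The following is only a sketch:
<code>guess_bom</code> is a hypothetical helper (not part of R), it checks just
the UTF-8 and UTF-16 byte order marks, and a file without a BOM tells it
nothing. The temporary file is fabricated for the demonstration.
</p>

```r
# Hypothetical helper: report an encoding suggested by a leading BOM,
# or NA if none of the three marks checked here is present.
guess_bom <- function(path) {
  b <- as.integer(readBin(path, what = "raw", n = 3L))
  if (length(b) >= 3L && all(b[1:3] == c(0xEF, 0xBB, 0xBF))) return("UTF-8-BOM")
  if (length(b) >= 2L && all(b[1:2] == c(0xFF, 0xFE))) return("UTF-16LE")
  if (length(b) >= 2L && all(b[1:2] == c(0xFE, 0xFF))) return("UTF-16BE")
  NA_character_
}

# Fabricate a UTF-8 file that starts with a BOM.
tmp <- tempfile()
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("abc\n")), tmp)

enc <- guess_bom(tmp)

# The "UTF-8-BOM" encoding name asks the connection to drop the BOM on input.
con <- file(tmp, encoding = "UTF-8-BOM")
txt <- readLines(con)
close(con)

print(enc)
print(txt)
unlink(tmp)
```

<p>A result of <code>NA</code> means only that no BOM was found; the file may
still be in any encoding, and the originator remains the best source of
information.
</p>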
|
||
<hr>
|
||
<a name="Export-to-text-files"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#XML" accesskey="n" rel="next">XML</a>, Previous: <a href="#Imports" accesskey="p" rel="prev">Imports</a>, Up: <a href="#Introduction" accesskey="u" rel="up">Introduction</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Export-to-text-files-1"></a>
|
||
<h3 class="section">1.2 Export to text files</h3>
|
||
<a name="index-Exporting-to-a-text-file"></a>
|
||
|
||
<p>Exporting results from R is usually a less contentious task, but
|
||
there are still a number of pitfalls. There will be a target
|
||
application in mind, and normally a text file will be the most
|
||
convenient interchange vehicle. (If a binary file is required, see
|
||
<a href="#Binary-files">Binary files</a>.)
|
||
</p>
|
||
<a name="index-cat"></a>
|
||
<p>Function <code>cat</code> underlies the functions for exporting data. It
|
||
takes a <code>file</code> argument, and the <code>append</code> argument allows a
|
||
text file to be written via successive calls to <code>cat</code>. Better,
|
||
especially if this is to be done many times, is to open a <code>file</code>
|
||
connection for writing or appending, and <code>cat</code> to that connection,
|
||
then <code>close</code> it.
|
||
</p>
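<p>The two approaches can be compared in a short sketch. The file name and
the text written are placeholders chosen for illustration.
</p>

```r
out <- tempfile(fileext = ".txt")

# Successive calls: each call opens and closes the file again.
cat("first line\n", file = out)
cat("second line\n", file = out, append = TRUE)

# One connection opened for appending ("a"; use "w" to overwrite),
# written to several times, then closed.
con <- file(out, open = "a")
for (i in 3:5) cat("line ", i, "\n", sep = "", file = con)
close(con)

lines <- readLines(out)
print(lines)
unlink(out)
```

<p>With an open connection there is no need for <code>append</code>: output
continues from wherever the previous call to <code>cat</code> finished.
</p>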
|
||
<a name="index-write"></a>
|
||
<a name="index-write_002etable"></a>
|
||
<p>The most common task is to write a matrix or data frame to file as a
|
||
rectangular grid of numbers, possibly with row and column labels. This
|
||
can be done by the functions <code>write.table</code> and <code>write</code>.
|
||
Function <code>write</code> just writes out a matrix or vector in a specified
|
||
number of columns (and transposes a matrix). Function
|
||
<code>write.table</code> is more convenient, and writes out a data frame (or
|
||
an object that can be coerced to a data frame) with row and column
|
||
labels.
|
||
</p>
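<p>A small sketch of the difference between the two functions, using a
made-up matrix. Because <code>write</code> emits values in column-major
order, the matrix is transposed first so that each row of <code>m</code>
becomes one line of the file.
</p>

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
f1 <- tempfile()
f2 <- tempfile()

# write(): values only, in a fixed number of columns, no labels.
write(t(m), file = f1, ncolumns = ncol(m))

# write.table(): the matrix is coerced to a data frame and written
# with its row and column labels.
write.table(m, file = f2)

plain <- readLines(f1)
labelled <- readLines(f2)
print(plain)
print(labelled)
unlink(c(f1, f2))
```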
|
||
<p>There are a number of issues that need to be considered in writing out a
|
||
data frame to a text file.
|
||
</p>
|
||
<ol>
|
||
<li> <a name="index-format"></a>
|
||
<strong>Precision</strong>
|
||
|
||
<p>Most of the conversions of real/complex numbers done by these functions
are to full precision, but those by <code>write</code> are governed by the
current setting of <code>options(digits)</code>. For more control, use
<code>format</code> on a data frame, possibly column-by-column.
</p>
|
||
</li><li> <strong>Header line</strong>
|
||
|
||
<p>R prefers the header line to have no entry for the row names, so the
|
||
file looks like
|
||
</p>
|
||
<div class="example">
<pre class="example">                dist    climb   time
Greenmantle     2.5     650     16.083
   ...
</pre></div>
|
||
|
||
<p>Some other systems require a (possibly empty) entry for the row names, which
|
||
is what <code>write.table</code> will provide if argument <code>col.names = NA</code>
|
||
is specified. Excel is one such system.
|
||
</p>
|
||
</li><li> <strong>Separator</strong>
|
||
<a name="index-CSV-files"></a>
|
||
<a name="index-comma-separated-values"></a>
|
||
<a name="index-write_002ecsv"></a>
|
||
<a name="index-write_002ecsv2"></a>
|
||
|
||
<p>A common field separator to use in the file is a comma, as that is
|
||
unlikely to appear in any of the fields in English-speaking countries.
|
||
Such files are known as CSV (comma separated values) files, and wrapper
|
||
function <code>write.csv</code> provides appropriate defaults. In some
|
||
locales the comma is used as the decimal point (set this in
|
||
<code>write.table</code> by <code>dec = ","</code>) and there CSV files use the
|
||
semicolon as the field separator: use <code>write.csv2</code> for appropriate
|
||
defaults. There is an IETF standard for CSV files (which mandates
|
||
commas and CRLF line endings, for which use <code>eol = "\r\n"</code>), RFC4180
|
||
(see <a href="https://tools.ietf.org/html/rfc4180">https://tools.ietf.org/html/rfc4180</a>), but what is more
|
||
important in practice is that the file is readable by the application it
|
||
is targeted at.
|
||
</p>
|
||
<p>A semicolon or a tab (<code>sep = "\t"</code>) is probably the safest
choice of separator.
</p>
|
||
</li><li> <strong>Missing values</strong>
|
||
<a name="index-Missing-values"></a>
|
||
|
||
<p>By default missing values are output as <code>NA</code>, but this may be
|
||
changed by argument <code>na</code>. Note that <code>NaN</code>s are treated as
|
||
<code>NA</code> by <code>write.table</code>, but not by <code>cat</code> nor <code>write</code>.
|
||
</p>
|
||
</li><li> <strong>Quoting strings</strong>
|
||
<a name="index-Quoting-strings"></a>
|
||
|
||
<p>By default strings are quoted (including the row and column names).
Argument <code>quote</code> controls whether character and factor variables are
quoted: some programs, for example <strong>Mondrian</strong>, do not accept quoted
strings.
</p>
|
||
<p>Some care is needed if the strings contain embedded quotes. Three
|
||
useful forms are
|
||
</p>
|
||
<div class="example">
<pre class="example">> df <- data.frame(a = I("a \" quote"))
> write.table(df)
"a"
"1" "a \" quote"
> write.table(df, qmethod = "double")
"a"
"1" "a "" quote"
> write.table(df, quote = FALSE, sep = ",")
a
1,a " quote
</pre></div>
|
||
|
||
<p>The second is the form of escape commonly used by spreadsheets.
|
||
</p>
|
||
</li><li> <strong>Encodings</strong>
|
||
<a name="index-Encodings-1"></a>
|
||
|
||
<p>Text files do not contain metadata on their encodings, so for
|
||
non-<acronym>ASCII</acronym> data the file needs to be targeted at the
|
||
application intended to read it. All of these functions can write to a
|
||
<em>connection</em> which allows an encoding to be specified for the file,
|
||
and <code>write.table</code> has a <code>fileEncoding</code> argument to make this
|
||
easier.
|
||
</p>
|
||
<p>The hard part is to know what file encoding to use. For use on Windows,
|
||
it is best to use what Windows calls ‘Unicode’<a name="DOCF2" href="#FOOT2"><sup>2</sup></a>, that is <code>"UTF-16LE"</code>. Using UTF-8 is a good way
|
||
to make portable files that will not easily be confused with any other
|
||
encoding, but even OS X applications (where UTF-8 is the system
|
||
encoding) may not recognize them, and Windows applications are most
|
||
unlikely to. Apparently Excel:mac 2004/8 expects <code>.csv</code> files in
|
||
<code>"macroman"</code> encoding (the encoding used in much earlier versions
|
||
of Mac OS).
|
||
</p>
|
||
</li></ol>
|
||
|
||
<a name="index-write_002ematrix"></a>
|
||
<p>Function <code>write.matrix</code> in package <a href="https://CRAN.R-project.org/package=MASS"><strong>MASS</strong></a> provides a
|
||
specialized interface for writing matrices, with the option of writing
|
||
them in blocks and thereby reducing memory usage.
|
||
</p>
|
||
<a name="index-sink"></a>
|
||
<p>It is possible to use <code>sink</code> to divert the standard R output to
|
||
a file, and thereby capture the output of (possibly implicit)
|
||
<code>print</code> statements. This is not usually the most efficient route,
|
||
and the <code>options(width)</code> setting may need to be increased.
|
||
</p>
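<p>As a rough sketch of this use of <code>sink</code> (the file name and
the width of 120 are arbitrary choices made for illustration):
</p>
<div class="example">
<pre class="example">out_file &lt;- tempfile(fileext = ".txt")
old &lt;- options(width = 120)   # widen lines so printed output wraps less
sink(out_file)                # start diverting output to the file
print(head(mtcars, 3))
sink()                        # stop diverting
options(old)                  # restore the previous width
captured &lt;- readLines(out_file)
</pre></div>

<p>It is important to call <code>sink()</code> again to restore normal
output, otherwise later results will silently go to the file.
</p>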
|
||
<a name="index-write_002eforeign"></a>
|
||
<p>Function <code>write.foreign</code> in package <a href="https://CRAN.R-project.org/package=foreign"><strong>foreign</strong></a> uses
|
||
<code>write.table</code> to produce a text file and also writes a code file
|
||
that will read this text file into another statistical package. There is
|
||
currently support for export to <code>SAS</code>, <code>SPSS</code> and <code>Stata</code>.
|
||
</p>
|
||
<hr>
|
||
<a name="XML"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Export-to-text-files" accesskey="p" rel="prev">Export to text files</a>, Up: <a href="#Introduction" accesskey="u" rel="up">Introduction</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="XML-1"></a>
|
||
<h3 class="section">1.3 XML</h3>
|
||
<a name="index-XML"></a>
|
||
|
||
<p>When reading data from text files, it is the responsibility of the user
|
||
to know and to specify the conventions used to create that file,
|
||
e.g. the comment character, whether a header line is present, the value
|
||
separator, the representation for missing values (and so on) described
|
||
in <a href="#Export-to-text-files">Export to text files</a>. A markup language which can be used to
|
||
describe not only content but also the structure of the content can
|
||
make a file self-describing, so that one need not provide these details
|
||
to the software reading the data.
|
||
</p>
|
||
<p>The eXtensible Markup Language – more commonly known simply as
|
||
<acronym>XML</acronym> – can be used to provide such structure, not only for
|
||
standard datasets but also more complex data structures.
|
||
<acronym>XML</acronym> is becoming extremely popular and is emerging as a
|
||
standard for general data markup and exchange. It is being used by
|
||
different communities to describe geographical data such as maps,
|
||
graphical displays, mathematics and so on.
|
||
</p>
|
||
<p><acronym>XML</acronym> provides a way to specify the file’s encoding, e.g.
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example"><?xml version="1.0" encoding="UTF-8"?>
|
||
</pre></div>
|
||
|
||
<p>although it does not require it.
|
||
</p>
|
||
<p>The <a href="https://CRAN.R-project.org/package=XML"><strong>XML</strong></a> package provides general facilities for reading and
|
||
writing <acronym>XML</acronym> documents within R.
|
||
Package <a href="https://CRAN.R-project.org/package=StatDataML"><strong>StatDataML</strong></a> on <acronym>CRAN</acronym> is one example building
|
||
on <a href="https://CRAN.R-project.org/package=XML"><strong>XML</strong></a>.
|
||
</p>
|
||
<p>NB: <a href="https://CRAN.R-project.org/package=XML"><strong>XML</strong></a> is available as a binary package for Windows, normally
|
||
from the ‘CRAN extras’ repository (which is selected by default on
|
||
Windows).
|
||
</p>
|
||
<hr>
|
||
<a name="Spreadsheet_002dlike-data"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Importing-from-other-statistical-systems" accesskey="n" rel="next">Importing from other statistical systems</a>, Previous: <a href="#Introduction" accesskey="p" rel="prev">Introduction</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Spreadsheet_002dlike-data-1"></a>
|
||
<h2 class="chapter">2 Spreadsheet-like data</h2>
|
||
<a name="index-Spreadsheet_002dlike-data"></a>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Variations-on-read_002etable" accesskey="1">Variations on read.table</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Fixed_002dwidth_002dformat-files" accesskey="2">Fixed-width-format files</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Data-Interchange-Format-_0028DIF_0029" accesskey="3">Data Interchange Format (DIF)</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Using-scan-directly" accesskey="4">Using scan directly</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Re_002dshaping-data" accesskey="5">Re-shaping data</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Flat-contingency-tables" accesskey="6">Flat contingency tables</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<p>In <a href="#Export-to-text-files">Export to text files</a> we saw a number of variations on the
|
||
format of a spreadsheet-like text file, in which the data are presented
|
||
in a rectangular grid, possibly with row and column labels. In this
|
||
section we consider importing such files into R.
|
||
</p>
|
||
<hr>
|
||
<a name="Variations-on-read_002etable"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Fixed_002dwidth_002dformat-files" accesskey="n" rel="next">Fixed-width-format files</a>, Previous: <a href="#Spreadsheet_002dlike-data" accesskey="p" rel="prev">Spreadsheet-like data</a>, Up: <a href="#Spreadsheet_002dlike-data" accesskey="u" rel="up">Spreadsheet-like data</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Variations-on-read_002etable-1"></a>
|
||
<h3 class="section">2.1 Variations on <code>read.table</code></h3>
|
||
<a name="index-read_002etable"></a>
|
||
|
||
<p>The function <code>read.table</code> is the most convenient way to read in a
|
||
rectangular grid of data. Because of the many possibilities, there are
|
||
several other functions that call <code>read.table</code> but change a group
|
||
of default arguments.
|
||
</p>
|
||
<p>Beware that <code>read.table</code> is an inefficient way to read in
|
||
very large numerical matrices: see <code>scan</code> below.
|
||
</p>
|
||
<p>Some of the issues to consider are:
|
||
</p>
|
||
<ol>
|
||
<li> <strong>Encoding</strong>
|
||
|
||
<p>If the file contains non-<acronym>ASCII</acronym> character fields, ensure that
|
||
it is read in the correct encoding. This is mainly an issue for reading
|
||
Latin-1 files in a UTF-8 locale, which can be done by something like
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">read.table("file.dat", fileEncoding="latin1")
|
||
</pre></div>
|
||
|
||
<p>Note that this will work in any locale which can represent Latin-1
|
||
strings, but not many Greek/Russian/Chinese/Japanese … locales.
|
||
</p>
|
||
|
||
</li><li> <strong>Header line</strong>
|
||
|
||
<p>We recommend that you specify the <code>header</code> argument explicitly.
|
||
Conventionally the header line has entries only for the columns and not
|
||
for the row labels, so is one field shorter than the remaining lines.
|
||
(If R sees this, it sets <code>header = TRUE</code>.) If presented with a
|
||
file that has a (possibly empty) header field for the row labels, read
|
||
it in by something like
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">read.table("file.dat", header = TRUE, row.names = 1)
|
||
</pre></div>
|
||
|
||
<p>Column names can be given explicitly via the <code>col.names</code> argument; explicit
|
||
names override the header line (if present).
|
||
</p>
|
||
</li><li> <strong>Separator</strong>
|
||
|
||
<p>Normally looking at the file will determine the field separator to be
|
||
used, but with white-space separated files there may be a choice between
|
||
the default <code>sep = ""</code> which uses any white space (spaces, tabs or
|
||
newlines) as a separator, <code>sep = " "</code> and <code>sep = "\t"</code>. Note
|
||
that the choice of separator affects the input of quoted strings.
|
||
</p>
|
||
<p>If you have a tab-delimited file containing empty fields be sure to use
|
||
<code>sep = "\t"</code>.
|
||
</p>
|
||
|
||
</li><li> <strong>Quoting</strong>
|
||
<a name="index-Quoting-strings-1"></a>
|
||
|
||
<p>By default character strings can be quoted by either ‘<samp>"</samp>’ or
|
||
‘<samp>'</samp>’, and in each case all the characters up to a matching quote are
|
||
taken as part of the character string. The set of valid quoting
|
||
characters (which might be none) is controlled by the <code>quote</code>
|
||
argument. For <code>sep = "\n"</code> the default is changed to <code>quote =
|
||
""</code>.
|
||
</p>
|
||
<p>If no separator character is specified, quotes can be escaped within
|
||
quoted strings by immediately preceding them by ‘<samp>\</samp>’, C-style.
|
||
</p>
|
||
<p>If a separator character is specified, quotes can be escaped within
|
||
quoted strings by doubling them as is conventional in spreadsheets. For
|
||
example
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">'One string isn''t two',"one more"
|
||
</pre></div>
|
||
|
||
<p>can be read by
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">read.table("testfile", sep = ",")
|
||
</pre></div>
|
||
|
||
<p>This does not work with the default separator.
|
||
</p>
|
||
</li><li> <strong>Missing values</strong>
|
||
<a name="index-Missing-values-1"></a>
|
||
|
||
<p>By default the file is assumed to contain the character string <code>NA</code>
|
||
to represent missing values, but this can be changed by the argument
|
||
<code>na.strings</code>, which is a vector of one or more character
|
||
representations of missing values.
|
||
</p>
|
||
<p>Empty fields in numeric columns are also regarded as missing values.
|
||
</p>
|
||
<p>In numeric columns, the values <code>NaN</code>, <code>Inf</code> and <code>-Inf</code> are
|
||
accepted.
|
||
</p>
|
||
</li><li> <strong>Unfilled lines</strong>
|
||
|
||
<p>It is quite common for a file exported from a spreadsheet to have all
|
||
trailing empty fields (and their separators) omitted. To read such
|
||
files set <code>fill = TRUE</code>.
|
||
</p>
|
||
</li><li> <strong>White space in character fields</strong>
|
||
|
||
<p>If a separator is specified, leading and trailing white space in
|
||
character fields is regarded as part of the field. To strip the space,
|
||
use argument <code>strip.white = TRUE</code>.
|
||
</p>
|
||
</li><li> <strong>Blank lines</strong>
|
||
|
||
<p>By default, <code>read.table</code> ignores empty lines. This can be changed
|
||
by setting <code>blank.lines.skip = FALSE</code>, which will only be useful in
|
||
conjunction with <code>fill = TRUE</code>, perhaps to use blank rows to
|
||
indicate missing cases in a regular layout.
|
||
</p>
|
||
</li><li> <strong>Classes for the variables</strong>
|
||
|
||
<p>Unless you take any special action, <code>read.table</code> reads all the
|
||
columns as character vectors and then tries to select a suitable class
|
||
for each variable in the data frame. It tries in turn <code>logical</code>,
|
||
<code>integer</code>, <code>numeric</code> and <code>complex</code>, moving on if any
|
||
entry is not missing and cannot be converted.<a name="DOCF3" href="#FOOT3"><sup>3</sup></a>
|
||
If all of these fail, the variable is converted to a factor.
|
||
</p>
|
||
<p>Arguments <code>colClasses</code> and <code>as.is</code> provide greater control.
|
||
Specifying <code>as.is = TRUE</code> suppresses conversion of character
|
||
vectors to factors (only). Using <code>colClasses</code> allows the desired
|
||
class to be set for each column in the input: it will be faster and use
|
||
less memory.
|
||
</p>
|
||
<p>Note that <code>colClasses</code> and <code>as.is</code> are specified <em>per</em>
|
||
column, not <em>per</em> variable, and so include the column of row names
|
||
(if any).
|
||
</p>
|
||
</li><li> <strong>Comments</strong>
|
||
|
||
<p>By default, <code>read.table</code> uses ‘<samp>#</samp>’ as a comment character,
|
||
and if this is encountered (except in quoted strings) the rest of the
|
||
line is ignored. Lines containing only white space and a comment are
|
||
treated as blank lines.
|
||
</p>
|
||
<p>If it is known that there will be no comments in the data file, it is
|
||
safer (and may be faster) to use <code>comment.char = ""</code>.
|
||
</p>
|
||
</li><li> <strong>Escapes</strong>
|
||
|
||
<p>Many OSes have conventions for using backslash as an escape character in
|
||
text files, but Windows does not (and uses backslash in path names).
|
||
It is optional in R whether such conventions are applied to data files.
|
||
</p>
|
||
<p>Both <code>read.table</code> and <code>scan</code> have a logical argument
|
||
<code>allowEscapes</code>. This is false by default, and backslashes are then
|
||
only interpreted as (under circumstances described above) escaping
|
||
quotes. If this is set to true, C-style escapes are interpreted, namely
|
||
the control characters <code>\a, \b, \f, \n, \r, \t, \v</code> and octal and
|
||
hexadecimal representations like <code>\040</code> and <code>\0x2A</code>. Any
|
||
other escaped character is treated as itself, including backslash. Note
|
||
that Unicode escapes such as <code>\u<var>xxxx</var></code> are never interpreted.
|
||
</p>
|
||
</li><li> <strong>Encoding</strong>
|
||
|
||
<p>This can be specified by the <code>fileEncoding</code> argument, for example
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">fileEncoding = "UCS-2LE" # Windows ‘Unicode’ files
|
||
fileEncoding = "UTF-8"
|
||
</pre></div>
|
||
|
||
<p>If you know (correctly) the file’s encoding this will almost always
|
||
work. However, we know of one exception, UTF-8 files with a BOM. Some
|
||
people claim that UTF-8 files should never have a BOM, but some software
|
||
(apparently including Excel:mac) uses them, and many Unix-alike OSes do
|
||
not accept them. So faced with a file which <code>file</code> reports as
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">intro.dat: UTF-8 Unicode (with BOM) text
|
||
</pre></div>
|
||
|
||
<p>it can be read on Windows by
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">read.table("intro.dat", fileEncoding = "UTF-8")
|
||
</pre></div>
|
||
|
||
<p>but on a Unix-alike might need
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">read.table("intro.dat", fileEncoding = "UTF-8-BOM")
|
||
</pre></div>
|
||
|
||
<p>(This would most likely work without specifying an encoding in a UTF-8 locale.)
|
||
</p>
|
||
<p>Another problem with this (real-life) example is that whereas
|
||
<code>file-5.03</code> reported the BOM, <code>file-4.17</code> found on OS X
|
||
10.5 (Leopard) did not.
|
||
</p></li></ol>
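<p>Several of the issues above can be seen together in one call. The
following is a small sketch on invented data (the vector <code>txt</code>
and the missing-value code <code>n/a</code> are assumptions made for the
example): it uses an explicit separator, a doubled quote inside a quoted
string, an extra missing-value string, padded fields and a short final
line.
</p>
<div class="example">
<pre class="example">txt &lt;- c("id,label,score",
         "1,'isn''t split',  2.5",
         "2,  padded  ,n/a",
         "3,short")
d &lt;- read.table(text = txt, header = TRUE, sep = ",",
                na.strings = c("NA", "n/a"),  # extra missing-value code
                fill = TRUE,                  # pad the short last line
                strip.white = TRUE,           # trim unquoted fields
                stringsAsFactors = FALSE)
str(d)
</pre></div>

<p>Here the <code>score</code> column should come out as numeric with two
missing values, one from <code>n/a</code> and one from the unfilled line.
</p>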
|
||
|
||
<a name="index-read_002ecsv"></a>
|
||
<a name="index-read_002ecsv2"></a>
|
||
<a name="index-read_002edelim"></a>
|
||
<a name="index-read_002edelim2"></a>
|
||
<a name="index-CSV-files-1"></a>
|
||
<a name="index-Sys_002elocaleconv"></a>
|
||
<a name="index-locales"></a>
|
||
<p>Convenience functions <code>read.csv</code> and <code>read.delim</code> provide
|
||
arguments to <code>read.table</code> appropriate for CSV and tab-delimited
|
||
files exported from spreadsheets in English-speaking locales. The
|
||
variations <code>read.csv2</code> and <code>read.delim2</code> are appropriate for
|
||
use in those locales where the comma is used for the decimal point and
|
||
(for <code>read.csv2</code>) for spreadsheets which use semicolons to separate
|
||
fields.
|
||
</p>
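<p>To illustrate the difference, here is a brief sketch in which the same
numbers are written in the two conventions (the inline data are invented
for the example) and read back to identical values:
</p>
<div class="example">
<pre class="example">english     &lt;- c("x,y", "1.5,a", "2.25,b")   # comma separator, decimal point
continental &lt;- c("x;y", "1,5;a", "2,25;b")   # semicolon separator, decimal comma
d1 &lt;- read.csv(text = english)
d2 &lt;- read.csv2(text = continental)
identical(d1$x, d2$x)
</pre></div>
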
|
||
<p>If the options to <code>read.table</code> are specified incorrectly, the error
|
||
message will usually be of the form
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">Error in scan(file = file, what = what, sep = sep, :
|
||
line 1 did not have 5 elements
|
||
</pre></div>
|
||
|
||
<p>or
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">Error in read.table("files.dat", header = TRUE) :
|
||
more columns than column names
|
||
</pre></div>
|
||
|
||
<a name="index-count_002efields"></a>
|
||
|
||
<p>This may give enough information to find the problem, but the auxiliary
|
||
function <code>count.fields</code> can be useful to investigate further.
|
||
</p>
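<p>For instance, a sketch along the following lines (with a deliberately
malformed file invented for the purpose) reports the number of fields
found on each line, so that the offending lines can be located:
</p>
<div class="example">
<pre class="example">f &lt;- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "6,7,8,9"), f)
n &lt;- count.fields(f, sep = ",")
n                      # fields per line
which(n != n[1])       # lines that disagree with the header line
</pre></div>
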
|
||
<p>Efficiency can be important when reading large data grids. It will help
|
||
to specify <code>comment.char = ""</code>, <code>colClasses</code> as one of the
|
||
atomic vector types (logical, integer, numeric, complex, character or
|
||
perhaps raw) for each column, and to give <code>nrows</code>, the number of
|
||
rows to be read (and a mild over-estimate is better than not specifying
|
||
this at all). See the examples in later sections.
|
||
</p>
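<p>A sketch of such a call, assuming a purely numeric file with no header
and no comments (the file is generated here only so that the example is
self-contained, and the value of <code>nrows</code> is a mild
over-estimate):
</p>
<div class="example">
<pre class="example">f &lt;- tempfile(fileext = ".txt")
m &lt;- matrix(round(rnorm(5000 * 4), 3), ncol = 4)
write.table(m, f, row.names = FALSE, col.names = FALSE)

d &lt;- read.table(f, header = FALSE,
                comment.char = "",        # no comments to look for
                colClasses = "numeric",   # recycled over all columns
                nrows = 5500)             # slightly more than the true 5000
dim(d)
</pre></div>
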
|
||
|
||
<hr>
|
||
<a name="Fixed_002dwidth_002dformat-files"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Data-Interchange-Format-_0028DIF_0029" accesskey="n" rel="next">Data Interchange Format (DIF)</a>, Previous: <a href="#Variations-on-read_002etable" accesskey="p" rel="prev">Variations on read.table</a>, Up: <a href="#Spreadsheet_002dlike-data" accesskey="u" rel="up">Spreadsheet-like data</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Fixed_002dwidth_002dformat-files-1"></a>
|
||
<h3 class="section">2.2 Fixed-width-format files</h3>
|
||
<a name="index-Fixed_002dwidth_002dformat-files"></a>
|
||
|
||
<p>Sometimes data files have no field delimiters but have fields in
|
||
pre-specified columns. This was very common in the days of punched
|
||
cards, and is still sometimes used to save file space.
|
||
</p>
|
||
<a name="index-read_002efwf"></a>
|
||
<p>Function <code>read.fwf</code> provides a simple way to read such files,
|
||
specifying a vector of field widths. The function reads the file into
|
||
memory as whole lines, splits the resulting character strings, writes
|
||
out a temporary tab-separated file and then calls <code>read.table</code>.
|
||
This is adequate for small files, but for anything more complicated we
|
||
recommend using the facilities of a language like <code>perl</code> to
|
||
pre-process the file.
|
||
<a name="index-perl-1"></a>
|
||
</p>
|
||
|
||
<a name="index-read_002efortran"></a>
|
||
<p>Function <code>read.fortran</code> is a similar function for fixed-format files,
|
||
using Fortran-style column specifications.
|
||
</p>
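<p>As a small illustration of <code>read.fwf</code>, the sketch below
assumes an invented layout of a two-character code, a three-digit
identifier and a four-character value:
</p>
<div class="example">
<pre class="example">f &lt;- tempfile()
writeLines(c("AB1234.50",
             "CD045 7.2"), f)
d &lt;- read.fwf(f, widths = c(2, 3, 4),
              col.names = c("code", "id", "value"),
              stringsAsFactors = FALSE)
d
</pre></div>

<p>Additional arguments such as <code>col.names</code> are passed on to
<code>read.table</code>.
</p>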
|
||
<hr>
|
||
<a name="Data-Interchange-Format-_0028DIF_0029"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Using-scan-directly" accesskey="n" rel="next">Using scan directly</a>, Previous: <a href="#Fixed_002dwidth_002dformat-files" accesskey="p" rel="prev">Fixed-width-format files</a>, Up: <a href="#Spreadsheet_002dlike-data" accesskey="u" rel="up">Spreadsheet-like data</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Data-Interchange-Format-_0028DIF_0029-1"></a>
|
||
<h3 class="section">2.3 Data Interchange Format (DIF)</h3>
|
||
<a name="index-Data-Interchange-Format-_0028DIF_0029"></a>
|
||
|
||
<p>An old format sometimes used for spreadsheet-like data is DIF, or Data Interchange
|
||
Format.
|
||
</p>
|
||
<a name="index-read_002eDIF"></a>
|
||
<p>Function <code>read.DIF</code> provides a simple way to read such files. It takes
|
||
arguments similar to <code>read.table</code> for assigning types to each of the columns.
|
||
</p>
|
||
<p>On Windows, spreadsheet programs often store spreadsheet data copied to
|
||
the clipboard in this format; <code>read.DIF("clipboard")</code> can read it
|
||
from there directly. It is slightly more robust than
|
||
<code>read.table("clipboard")</code> in handling spreadsheets with empty
|
||
cells.
|
||
</p>
|
||
<hr>
|
||
<a name="Using-scan-directly"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Re_002dshaping-data" accesskey="n" rel="next">Re-shaping data</a>, Previous: <a href="#Data-Interchange-Format-_0028DIF_0029" accesskey="p" rel="prev">Data Interchange Format (DIF)</a>, Up: <a href="#Spreadsheet_002dlike-data" accesskey="u" rel="up">Spreadsheet-like data</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Using-scan-directly-1"></a>
|
||
<h3 class="section">2.4 Using <code>scan</code> directly</h3>
|
||
<a name="index-scan-1"></a>
|
||
|
||
<p>Both <code>read.table</code> and <code>read.fwf</code> use <code>scan</code> to read the
|
||
file, and then process the results of <code>scan</code>. They are very
|
||
convenient, but sometimes it is better to use <code>scan</code> directly.
|
||
</p>
|
||
<p>Function <code>scan</code> has many arguments, most of which we have already
|
||
covered under <code>read.table</code>. The most crucial argument is
|
||
<code>what</code>, which specifies a list of modes of variables to be read
|
||
from the file. If the list is named, the names are used for the
|
||
components of the returned list. Modes can be numeric, character or
|
||
complex, and are usually specified by an example, e.g. <code>0</code>,
|
||
<code>""</code> or <code>0i</code>. For example
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">cat("2 3 5 7", "11 13 17 19", file="ex.dat", sep="\n")
|
||
scan(file="ex.dat", what=list(x=0, y="", z=0), flush=TRUE)
|
||
</pre></div>
|
||
|
||
<p>returns a list with three components and discards the fourth column in
|
||
the file.
|
||
</p>
|
||
<a name="index-readLines"></a>
|
||
<p>There is a function <code>readLines</code> which will be more convenient if
|
||
all you want is to read whole lines into R for further processing.
|
||
</p>
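<p>A minimal sketch of that approach, in which the splitting on single
spaces is just one possible kind of further processing:
</p>
<div class="example">
<pre class="example">f &lt;- tempfile()
writeLines(c("2 3 5 7", "11 13 17 19"), f)
lines  &lt;- readLines(f)                       # one string per line
fields &lt;- strsplit(lines, " ", fixed = TRUE) # split each line on spaces
first  &lt;- vapply(fields, function(p) as.numeric(p[1]), numeric(1))
first
</pre></div>
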
|
||
<p>One common use of <code>scan</code> is to read in a large matrix. Suppose
|
||
file <samp>matrix.dat</samp> just contains the numbers for a 200 x 2000
|
||
matrix. Then we can use
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)
|
||
</pre></div>
|
||
|
||
<p>On one test this took 1 second (under Linux, 3 seconds under Windows on
|
||
the same machine) whereas
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">A <- as.matrix(read.table("matrix.dat"))
|
||
</pre></div>
|
||
|
||
<p>took 10 seconds (and more memory), and
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200,
|
||
comment.char = "", colClasses = "numeric"))
|
||
</pre></div>
|
||
|
||
<p>took 7 seconds. The difference is almost entirely due to the overhead
|
||
of reading 2000 separate short columns: were they of length 2000,
|
||
<code>scan</code> took 9 seconds whereas <code>read.table</code> took 18 if used
|
||
efficiently (in particular, specifying <code>colClasses</code>) and 125 if
|
||
used naively.
|
||
</p>
|
||
|
||
<p>Note that timings can depend on the type read and the data.
|
||
Consider reading a million distinct integers:
|
||
</p><div class="example">
|
||
<pre class="example">writeLines(as.character((1+1e6):2e6), "ints.dat")
|
||
xi <- scan("ints.dat", what=integer(0), n=1e6) # 0.77s
|
||
xn <- scan("ints.dat", what=numeric(0), n=1e6) # 0.93s
|
||
xc <- scan("ints.dat", what=character(0), n=1e6) # 0.85s
|
||
xf <- as.factor(xc) # 2.2s
|
||
DF <- read.table("ints.dat") # 4.5s
|
||
</pre></div>
|
||
<p>and a million examples of a small set of codes:
|
||
</p><div class="example">
|
||
<pre class="example">code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
|
||
writeLines(sample(code, 1e6, replace=TRUE), "code.dat")
|
||
y <- scan("code.dat", what=character(0), n=1e6) # 0.44s
|
||
yf <- as.factor(y) # 0.21s
|
||
DF <- read.table("code.dat") # 4.9s
|
||
DF <- read.table("code.dat", nrows=1e6) # 3.6s
|
||
</pre></div>
|
||
|
||
<p>Note that these timings depend heavily on the operating system (the
|
||
basic reads in Windows take at least twice as long as these Linux
|
||
times) and on the precise state of the garbage collector.
|
||
</p>
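<p>Because of this, it is worth measuring on the machine and data at
hand. One way to do so is sketched below with <code>system.time</code>;
the file size is an arbitrary choice and no particular timings are
implied.
</p>
<div class="example">
<pre class="example">f &lt;- tempfile()
writeLines(as.character(seq_len(100000L)), f)
t_scan &lt;- system.time(
    x &lt;- scan(f, what = integer(0), n = 100000, quiet = TRUE))
t_tab &lt;- system.time(
    DF &lt;- read.table(f, colClasses = "integer", nrows = 100000,
                     comment.char = ""))
rbind(scan = t_scan, read.table = t_tab)[, "elapsed"]
</pre></div>
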
|
||
|
||
<hr>
|
||
<a name="Re_002dshaping-data"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Flat-contingency-tables" accesskey="n" rel="next">Flat contingency tables</a>, Previous: <a href="#Using-scan-directly" accesskey="p" rel="prev">Using scan directly</a>, Up: <a href="#Spreadsheet_002dlike-data" accesskey="u" rel="up">Spreadsheet-like data</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Re_002dshaping-data-1"></a>
|
||
<h3 class="section">2.5 Re-shaping data</h3>
|
||
<a name="index-Re_002dshaping-data"></a>
|
||
|
||
<p>Sometimes spreadsheet data is in a compact format that gives the
|
||
covariates for each subject followed by all the observations on that
|
||
subject. R’s modelling functions need observations in a single
|
||
column. Consider the following sample of data from repeated MRI brain
|
||
measurements
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example"> Status Age V1 V2 V3 V4
|
||
P 23646 45190 50333 55166 56271
|
||
CC 26174 35535 38227 37911 41184
|
||
CC 27723 25691 25712 26144 26398
|
||
CC 27193 30949 29693 29754 30772
|
||
CC 24370 50542 51966 54341 54273
|
||
CC 28359 58591 58803 59435 61292
|
||
CC 25136 45801 45389 47197 47126
|
||
</pre></div>
|
||
|
||
<p>There are two covariates and up to four measurements on each subject.
|
||
The data were exported from Excel as a file <samp>mr.csv</samp>.
|
||
</p>
|
||
<a name="index-stack"></a>
|
||
<p>We can use <code>stack</code> to help manipulate these data to give a single
|
||
response.
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">zz <- read.csv("mr.csv", strip.white = TRUE)
|
||
zzz <- cbind(zz[gl(nrow(zz), 1, 4*nrow(zz)), 1:2], stack(zz[, 3:6]))
|
||
</pre></div>
|
||
|
||
<p>with result
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example"> Status Age values ind
|
||
X1 P 23646 45190 V1
|
||
X2 CC 26174 35535 V1
|
||
X3 CC 27723 25691 V1
|
||
X4 CC 27193 30949 V1
|
||
X5 CC 24370 50542 V1
|
||
X6 CC 28359 58591 V1
|
||
X7 CC 25136 45801 V1
|
||
X11 P 23646 50333 V2
|
||
...
|
||
</pre></div>
|
||
|
||
<a name="index-unstack_002e"></a>
|
||
<p>Function <code>unstack</code> goes in the opposite direction, and may be
|
||
useful for exporting data.
|
||
</p>
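<p>A short sketch of the round trip, using a two-subject subset of the
measurements above entered by hand:
</p>
<div class="example">
<pre class="example">zz &lt;- data.frame(Status = c("P", "CC"), Age = c(23646, 26174),
                 V1 = c(45190, 35535), V2 = c(50333, 38227))
long &lt;- stack(zz[, c("V1", "V2")])    # columns 'values' and 'ind'
wide &lt;- unstack(long, values ~ ind)   # back to one column per measurement
wide
</pre></div>
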
|
||
<a name="index-reshape"></a>
|
||
<p>Another way to do this is to use the function
|
||
<code>reshape</code>, by
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">> reshape(zz, idvar="id",timevar="var",
|
||
varying=list(c("V1","V2","V3","V4")),direction="long")
|
||
Status Age var V1 id
|
||
1.1 P 23646 1 45190 1
|
||
2.1 CC 26174 1 35535 2
|
||
3.1 CC 27723 1 25691 3
|
||
4.1 CC 27193 1 30949 4
|
||
5.1 CC 24370 1 50542 5
|
||
6.1 CC 28359 1 58591 6
|
||
7.1 CC 25136 1 45801 7
|
||
1.2 P 23646 2 50333 1
|
||
2.2 CC 26174 2 38227 2
|
||
...
|
||
</pre></div>
|
||
|
||
<p>The <code>reshape</code> function has a more complicated syntax than
|
||
<code>stack</code> but can be used for data where the ‘long’ form has more
|
||
than the one column in this example. With <code>direction="wide"</code>,
|
||
<code>reshape</code> can also perform the opposite transformation.
|
||
</p>
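<p>The opposite transformation might look like the following sketch,
which again uses a hand-entered two-subject subset; the column name
<code>value</code> is an assumption introduced for the example.
</p>
<div class="example">
<pre class="example">zz &lt;- data.frame(Status = c("P", "CC"), Age = c(23646, 26174),
                 V1 = c(45190, 35535), V2 = c(50333, 38227))
long &lt;- reshape(zz, idvar = "id", timevar = "var",
                varying = list(c("V1", "V2")), v.names = "value",
                direction = "long")
## one row per subject again, with columns value.1 and value.2
wide &lt;- reshape(long, idvar = "id", timevar = "var",
                v.names = "value", direction = "wide")
wide
</pre></div>
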
|
||
<p>Some people prefer the tools in packages <a href="https://CRAN.R-project.org/package=reshape"><strong>reshape</strong></a>,
|
||
<a href="https://CRAN.R-project.org/package=reshape2"><strong>reshape2</strong></a> and <a href="https://CRAN.R-project.org/package=plyr"><strong>plyr</strong></a>.
|
||
</p>
|
||
<hr>
|
||
<a name="Flat-contingency-tables"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Re_002dshaping-data" accesskey="p" rel="prev">Re-shaping data</a>, Up: <a href="#Spreadsheet_002dlike-data" accesskey="u" rel="up">Spreadsheet-like data</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Flat-contingency-tables-1"></a>
|
||
<h3 class="section">2.6 Flat contingency tables</h3>
|
||
<a name="index-Flat-contingency-tables"></a>
|
||
|
||
<p>Displaying higher-dimensional contingency tables in array form typically
|
||
is rather inconvenient. In categorical data analysis, such information
|
||
is often represented in the form of bordered two-dimensional arrays with
|
||
leading rows and columns specifying the combination of factor levels
|
||
corresponding to the cell counts. These rows and columns are typically
|
||
“ragged” in the sense that labels are only displayed when they change,
|
||
with the obvious convention that rows are read from top to bottom and
|
||
columns are read from left to right. In R, such “flat” contingency
|
||
tables can be created using <code>ftable</code>,
|
||
<a name="index-ftable"></a>
|
||
which creates objects of class <code>"ftable"</code> with an appropriate print
|
||
method.
|
||
</p>
|
||
<p>As a simple example, consider the R standard data set
|
||
<code>UCBAdmissions</code> which is a 3-dimensional contingency table
|
||
resulting from classifying applicants to graduate school at UC Berkeley
|
||
for the six largest departments in 1973 classified by admission and sex.
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">> data(UCBAdmissions)
|
||
> ftable(UCBAdmissions)
|
||
Dept A B C D E F
|
||
Admit Gender
|
||
Admitted Male 512 353 120 138 53 22
|
||
Female 89 17 202 131 94 24
|
||
Rejected Male 313 207 205 279 138 351
|
||
Female 19 8 391 244 299 317
|
||
</pre></div>
|
||
|
||
<p>The printed representation is clearly more useful than displaying the
|
||
data as a 3-dimensional array.
|
||
</p>
|
||
<p>There is also a function <code>read.ftable</code> for reading in flat-like
|
||
contingency tables from files.
|
||
<a name="index-read_002eftable"></a>
|
||
This has additional arguments for dealing with variants on how exactly
|
||
the information on row and column variable names and levels is
|
||
represented. The help page for <code>read.ftable</code> has some useful
|
||
examples. The flat tables can be converted to standard contingency
|
||
tables in array form using <code>as.table</code>.
|
||
</p>
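<p>The conversion between the flat and the array form can be sketched as
follows. This is a minimal illustration using only the <code>UCBAdmissions</code>
data shipped with R; the particular choice of row and column variables is
an assumption made for the example.
</p>

```r
# Build a flat table from the 3-dimensional array shipped with R.
ft <- ftable(UCBAdmissions)

# Choose explicitly which variables label the rows and which the columns.
ft_dept <- ftable(UCBAdmissions, row.vars = "Dept",
                  col.vars = c("Admit", "Gender"))

# as.table() turns a flat table back into an ordinary contingency array.
arr <- as.table(ft)
dim(arr)

# The cell counts are unchanged by the round trip.
all(arr == UCBAdmissions)
```

<p>Which variables go into the rows and which into the columns only affects
the display: <code>as.table</code> recovers the same cell counts either way.
</p>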
|
||
<p>Note that flat tables are characterized by their “ragged” display of
|
||
row (and maybe also column) labels. If the full grid of levels of the
|
||
row variables is given, one should instead use <code>read.table</code> to read
|
||
in the data, and create the contingency table from this using
|
||
<code>xtabs</code>.
|
||
</p>
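<p>The <code>read.table</code> plus <code>xtabs</code> route might look like the
sketch below. The temporary file is a stand-in created for this
illustration: it holds the <code>UCBAdmissions</code> counts with one line per
combination of levels, and the column name <code>Freq</code> is simply what
<code>as.data.frame</code> produces for a table.
</p>

```r
# Stand-in for a file that lists every combination of levels on its own line
# (a temporary file created only for this illustration).
grid_file <- tempfile(fileext = ".txt")
write.table(as.data.frame(UCBAdmissions), grid_file, row.names = FALSE)

# Read the full grid back as a data frame ...
counts <- read.table(grid_file, header = TRUE)

# ... and rebuild the contingency table, using Freq as the cell counts.
tab <- xtabs(Freq ~ Admit + Gender + Dept, data = counts)

# Display the result in flat form.
ftable(tab)
```

<p>Because the levels are re-created from the file, their order may differ
from the original table, so cells are best addressed by name rather than by
position.
</p>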
|
||
|
||
<hr>
|
||
<a name="Importing-from-other-statistical-systems"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Relational-databases" accesskey="n" rel="next">Relational databases</a>, Previous: <a href="#Spreadsheet_002dlike-data" accesskey="p" rel="prev">Spreadsheet-like data</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Importing-from-other-statistical-systems-1"></a>
|
||
<h2 class="chapter">3 Importing from other statistical systems</h2>
|
||
<a name="index-Importing-from-other-statistical-systems"></a>
|
||
|
||
<p>In this chapter we consider the problem of reading a binary data file
|
||
written by another statistical system. This is often best avoided, but
|
||
may be unavoidable if the originating system is not available.
|
||
</p>
|
||
<p>In all cases the facilities described were written for data files from
|
||
specific versions of the other system (often in the early 2000s), and
|
||
have not necessarily been updated for the most recent versions of the
|
||
other system.
|
||
</p>
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat" accesskey="1">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Octave" accesskey="2">Octave</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<hr>
|
||
<a name="EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Octave" accesskey="n" rel="next">Octave</a>, Previous: <a href="#Importing-from-other-statistical-systems" accesskey="p" rel="prev">Importing from other statistical systems</a>, Up: <a href="#Importing-from-other-statistical-systems" accesskey="u" rel="up">Importing from other statistical systems</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="EpiInfo_002c-Minitab_002c-S_002dPLUS_002c-SAS_002c-SPSS_002c-Stata_002c-Systat"></a>
|
||
<h3 class="section">3.1 EpiInfo, Minitab, S-PLUS, SAS, SPSS, Stata, Systat</h3>
|
||
|
||
<p>The recommended package <a href="https://CRAN.R-project.org/package=foreign"><strong>foreign</strong></a> provides import facilities for
|
||
files produced by these statistical systems, and for export to Stata. In
|
||
some cases these functions may require substantially less memory than
|
||
<code>read.table</code> would. <code>write.foreign</code> (see <a href="#Export-to-text-files">Export to text files</a>) provides an export mechanism, currently with support for
|
||
<code>SAS</code>, <code>SPSS</code> and <code>Stata</code>.
|
||
</p>
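<p>As a rough sketch of the export side, <code>write.foreign</code> writes a plain
text data file together with a code file that the target system can run to
read it. The data frame and the temporary file names below are made up for
the illustration.
</p>

```r
library(foreign)   # recommended package, normally installed with R

# A small illustrative data frame (invented for this sketch).
df <- data.frame(age = c(23, 35, 41),
                 sex = factor(c("F", "M", "F")))

datafile <- tempfile(fileext = ".txt")   # will hold the data as text
codefile <- tempfile(fileext = ".sps")   # will hold SPSS syntax to read it

write.foreign(df, datafile = datafile, codefile = codefile,
              package = "SPSS")

# Inspect the generated SPSS syntax.
readLines(codefile)
```

<p>Changing <code>package</code> to <code>"SAS"</code> or <code>"Stata"</code> produces
the corresponding code file instead.
</p>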
|
||
<a name="index-EpiInfo"></a>
|
||
<a name="index-EpiData"></a>
|
||
<a name="index-read_002eepiinfo"></a>
|
||
<p>EpiInfo versions 5 and 6 stored data in a self-describing fixed-width
|
||
text format. <code>read.epiinfo</code> will read these <samp>.REC</samp> files into
|
||
an R data frame. EpiData also produces data in this format.
|
||
</p>
|
||
<a name="index-Minitab"></a>
|
||
<a name="index-read_002emtp"></a>
|
||
<p>Function <code>read.mtp</code> imports a ‘Minitab Portable Worksheet’. This
|
||
returns the components of the worksheet as an R list.
|
||
</p>
|
||
<a name="index-SAS"></a>
|
||
<a name="index-read_002export"></a>
|
||
<p>Function <code>read.xport</code> reads a file in SAS Transport (XPORT) format
|
||
and returns a list of data frames. If SAS is available on your system,
|
||
function <code>read.ssd</code> can be used to create and run a SAS script that
|
||
saves a SAS permanent dataset (<samp>.ssd</samp> or <samp>.sas7bdat</samp>) in
|
||
Transport format. It then calls <code>read.xport</code> to read the resulting
|
||
file. (Package <a href="https://CRAN.R-project.org/package=Hmisc"><strong>Hmisc</strong></a> has a similar function <code>sas.get</code>, also
|
||
running SAS.) For those without access to SAS but running on Windows,
|
||
the SAS System Viewer (a zero-cost download) can be used to open SAS
|
||
datasets and export them to e.g. <samp>.csv</samp> format.
|
||
</p>
|
||
<a name="index-S_002dPLUS"></a>
|
||
<a name="index-read_002eS"></a>
|
||
<a name="index-data_002erestore"></a>
|
||
|
||
<p>Function <code>read.S</code> can read binary objects produced by S-PLUS
|
||
3.x, 4.x or 2000 on (32-bit) Unix or Windows (and can read them on a
|
||
different OS). This is able to read many but not all S objects: in
|
||
particular it can read vectors, matrices and data frames and lists
|
||
containing those.
|
||
</p>
|
||
<p>Function <code>data.restore</code> reads S-PLUS data dumps (created by
|
||
<code>data.dump</code>) with the same restrictions (except that dumps from the
|
||
Alpha platform can also be read). It should be possible to read data
|
||
dumps from S-PLUS 5.x and later written with <code>data.dump(oldStyle=T)</code>.
|
||
</p>
|
||
<p>If you have access to S-PLUS, it is usually more reliable to <code>dump</code>
|
||
the object(s) in S-PLUS and <code>source</code> the dump file in R. For
|
||
S-PLUS 5.x and later you may need to use <code>dump(..., oldStyle=T)</code>,
|
||
and to read in very large objects it may be preferable to use the dump
|
||
file as a batch script rather than use the <code>source</code> function.
|
||
</p>
|
||
<a name="index-SPSS"></a>
|
||
<a name="index-SPSS-Data-Entry"></a>
|
||
<a name="index-read_002espss"></a>
|
||
<p>Function <code>read.spss</code> can read files created by the ‘save’ and
|
||
‘export’ commands in <acronym>SPSS</acronym>. It returns a list with one
|
||
component for each variable in the saved data set. <acronym>SPSS</acronym>
|
||
variables with value labels are optionally converted to R factors.
|
||
</p>
|
||
<p><acronym>SPSS</acronym> Data Entry is an application for creating data entry
|
||
forms. By default it creates data files with extra formatting
|
||
information that <code>read.spss</code> cannot handle, but it is possible to
|
||
export the data in an ordinary <acronym>SPSS</acronym> format.
|
||
</p>
|
||
<p>Some third-party applications claim to produce data ‘in SPSS format’ but
|
||
with differences in the formats: <code>read.spss</code> may or may not be able
|
||
to handle these.
|
||
</p>
|
||
<a name="index-Stata"></a>
|
||
<a name="index-read_002edta"></a>
|
||
<a name="index-write_002edta"></a>
|
||
<p>Stata <samp>.dta</samp> files are a binary file format. Files from versions 5
|
||
up to 11 of Stata can be read and written by functions <code>read.dta</code>
|
||
and <code>write.dta</code>. Stata variables with value labels are optionally
|
||
converted to (and from) R factors. Stata version 12 by default
|
||
writes ‘format-115 datasets’: <code>read.dta</code> currently may not be able
|
||
to read those.
|
||
</p>
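<p>A round trip through a <samp>.dta</samp> file can be sketched as below. The
data frame is invented for the illustration, and the file is written in the
default (old) format of <code>write.dta</code>, so this says nothing about
files produced by recent versions of Stata.
</p>

```r
library(foreign)   # recommended package, normally installed with R

# A small illustrative data frame (invented for this sketch).
df <- data.frame(id    = 1:3,
                 score = c(2.5, 3.0, 4.5),
                 group = factor(c("a", "b", "a")))

dta_file <- tempfile(fileext = ".dta")

# Factors are written as Stata variables with value labels ...
write.dta(df, dta_file)

# ... and converted back to R factors on reading.
back <- read.dta(dta_file, convert.factors = TRUE)
str(back)
```

<p>Setting <code>convert.factors = FALSE</code> would return the underlying
numeric codes rather than factors.
</p>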
|
||
|
||
<a name="index-Systat"></a>
|
||
<a name="index-read_002esystat"></a>
|
||
<p><code>read.systat</code> reads those Systat <code>SAVE</code> files that are
|
||
rectangular data files (<code>mtype = 1</code>) written on little-endian
|
||
machines (such as from Windows). These have extension <samp>.sys</samp>
|
||
or (more recently) <samp>.syd</samp>.
|
||
</p>
|
||
|
||
<hr>
|
||
<a name="Octave"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat" accesskey="p" rel="prev">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a>, Up: <a href="#Importing-from-other-statistical-systems" accesskey="u" rel="up">Importing from other statistical systems</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Octave-1"></a>
|
||
<h3 class="section">3.2 Octave</h3>
|
||
<a name="index-Octave"></a>
|
||
<a name="index-read_002eoctave"></a>
|
||
|
||
<p>Octave is a numerical linear algebra system
|
||
(<a href="http://www.octave.org">http://www.octave.org</a>), and function <code>read.octave</code> in
|
||
package <a href="https://CRAN.R-project.org/package=foreign"><strong>foreign</strong></a> can read in files in Octave text data format
|
||
created using the Octave command <code>save -ascii</code>, with support for
|
||
most of the common types of variables, including the standard atomic
|
||
(real and complex scalars, matrices, and <em>N</em>-d arrays, strings,
|
||
ranges, and boolean scalars and matrices) and recursive (structs, cells,
|
||
and lists) ones.
|
||
</p>
|
||
<hr>
|
||
<a name="Relational-databases"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Binary-files" accesskey="n" rel="next">Binary files</a>, Previous: <a href="#Importing-from-other-statistical-systems" accesskey="p" rel="prev">Importing from other statistical systems</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Relational-databases-1"></a>
|
||
<h2 class="chapter">4 Relational databases</h2>
|
||
|
||
<a name="index-Relational-databases"></a>
|
||
<a name="index-DBMS"></a>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Why-use-a-database_003f" accesskey="1">Why use a database?</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Overview-of-RDBMSs" accesskey="2">Overview of RDBMSs</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#R-interface-packages" accesskey="3">R interface packages</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<hr>
|
||
<a name="Why-use-a-database_003f"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Overview-of-RDBMSs" accesskey="n" rel="next">Overview of RDBMSs</a>, Previous: <a href="#Relational-databases" accesskey="p" rel="prev">Relational databases</a>, Up: <a href="#Relational-databases" accesskey="u" rel="up">Relational databases</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Why-use-a-database_003f-1"></a>
|
||
<h3 class="section">4.1 Why use a database?</h3>
|
||
|
||
<p>There are limitations on the types of data that R handles well.
|
||
Since all data being manipulated by R are resident in memory, and
|
||
several copies of the data can be created during execution of a
|
||
function, R is not well suited to extremely large data sets. Data
|
||
objects that are more than a (few) hundred megabytes in size can cause
|
||
R to run out of memory, particularly on a 32-bit operating system.
|
||
</p>
|
||
<p>R does not easily support concurrent access to data. That is, if
|
||
more than one user is accessing, and perhaps updating, the same data,
|
||
the changes made by one user will not be visible to the others.
|
||
</p>
|
||
<p>R does support persistence of data, in that you can save a data
|
||
object or an entire workspace from one session and restore it at the
|
||
subsequent session, but the format of the stored data is specific to
|
||
R and not easily manipulated by other systems.
|
||
</p>
|
||
<p>Database management systems (DBMSs) and, in particular, relational
|
||
DBMSs (RDBMSs) <em>are</em> designed to do all of these things well.
|
||
Their strengths are
|
||
</p>
|
||
<ol>
<li> Fast access to selected parts of large databases.

</li><li> Powerful ways to summarize and cross-tabulate columns in databases.

</li><li> Storage of data in more organized ways than the rectangular grid model of
spreadsheets and R data frames.

</li><li> Concurrent access from multiple clients running on multiple hosts while
enforcing security constraints on access to the data.

</li><li> The ability to act as a server to a wide range of clients.
</li></ol>
|
||
|
||
<p>Typical statistical uses of a DBMS are to
extract a 10% sample of the data, to cross-tabulate data to produce a
multi-dimensional contingency table, and to extract data group by group
from a database for separate analysis.
|
||
</p>
|
||
<p>Increasingly OSes are themselves making use of DBMSs for these reasons,
|
||
so it is nowadays likely that one will be already installed on your
|
||
(non-Windows) OS. <a href="https://en.wikipedia.org/wiki/Akonadi">Akonadi</a>
|
||
is used by KDE4 to store personal information. Several OS X
|
||
applications, including Mail and Address Book, use SQLite.
|
||
</p>
|
||
<hr>
|
||
<a name="Overview-of-RDBMSs"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#R-interface-packages" accesskey="n" rel="next">R interface packages</a>, Previous: <a href="#Why-use-a-database_003f" accesskey="p" rel="prev">Why use a database?</a>, Up: <a href="#Relational-databases" accesskey="u" rel="up">Relational databases</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Overview-of-RDBMSs-1"></a>
|
||
<h3 class="section">4.2 Overview of RDBMSs</h3>
|
||
|
||
<p>Traditionally there were large (and expensive) commercial RDBMSs
|
||
(<a href="http://www.informix.com">Informix</a>; <a href="https://www.oracle.com">Oracle</a>; <a href="http://www.sybase.com">Sybase</a>;
|
||
<a href="http://www.ibm.com/db2">IBM’s DB2</a>;
|
||
<a href="https://www.microsoft.com/SQL/default.mspx">Microsoft <acronym>SQL</acronym>
|
||
Server</a> on Windows) and academic and small-system databases (such as
|
||
MySQL<a name="DOCF4" href="#FOOT4"><sup>4</sup></a>, PostgreSQL, Microsoft
|
||
Access, …), the former marked out by much greater emphasis on data
|
||
security features. The line is blurring, with MySQL and PostgreSQL
|
||
having more and more high-end features, and free ‘express’ versions
|
||
being made available for the commercial DBMSs.
|
||
</p>
|
||
<a name="index-ODBC"></a>
|
||
<a name="index-Open-Database-Connectivity"></a>
|
||
<p>There are other commonly used data sources, including spreadsheets,
|
||
non-relational databases and even text files (possibly compressed).
|
||
Open Database Connectivity (<acronym>ODBC</acronym>) is a standard interface for accessing all of
these data sources. It originated on Windows (see
|
||
<a href="https://msdn.microsoft.com/en-us/library/ms710252%28v=vs.85%29.aspx">https://msdn.microsoft.com/en-us/library/ms710252%28v=vs.85%29.aspx</a>)
|
||
but is also implemented on Linux/Unix/OS X.
|
||
</p>
|
||
<p>All of the packages described later in this chapter provide clients to
|
||
client/server databases. The database can reside on the same machine or
|
||
(more often) remotely. There is an <acronym>ISO</acronym> standard (in fact
|
||
several: <acronym>SQL</acronym>92 is <acronym>ISO</acronym>/IEC 9075, also known as
|
||
<acronym>ANSI</acronym> X3.135-1992, and <acronym>SQL</acronym>99 is coming into use) for
|
||
an interface language called <acronym>SQL</acronym> (Structured Query Language,
|
||
sometimes pronounced ‘sequel’: see Bowman <em>et al.</em> 1996 and Kline
|
||
and Kline 2001) which these DBMSs support to varying degrees.
|
||
</p>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#SQL-queries" accesskey="1">SQL queries</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Data-types" accesskey="2">Data types</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<hr>
|
||
<a name="SQL-queries"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Data-types" accesskey="n" rel="next">Data types</a>, Previous: <a href="#Overview-of-RDBMSs" accesskey="p" rel="prev">Overview of RDBMSs</a>, Up: <a href="#Overview-of-RDBMSs" accesskey="u" rel="up">Overview of RDBMSs</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="SQL-queries-1"></a>
|
||
<h4 class="subsection">4.2.1 <acronym>SQL</acronym> queries</h4>
|
||
<a name="index-SQL-queries"></a>
|
||
|
||
<p>The more comprehensive R interfaces generate <acronym>SQL</acronym> behind the
|
||
scenes for common operations, but direct use of <acronym>SQL</acronym> is needed
for complex operations in all of them. Conventionally <acronym>SQL</acronym> is written
|
||
in upper case, but many users will find it more convenient to use lower
|
||
case in the R interface functions.
|
||
</p>
|
||
<p>A relational DBMS stores data as a database of <em>tables</em> (or
|
||
<em>relations</em>) which are rather similar to R data frames, in that
|
||
they are made up of <em>columns</em> or <em>fields</em> of one type
|
||
(numeric, character, date, currency, …) and <em>rows</em> or
|
||
<em>records</em> containing the observations for one entity.
|
||
</p>
|
||
<p><acronym>SQL</acronym> ‘queries’ are quite general operations on a relational
|
||
database. The classical query is a SELECT statement of the type
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">SELECT State, Murder FROM USArrests WHERE Rape &gt; 30 ORDER BY Murder

SELECT t.sch, c.meanses, t.sex, t.achieve
  FROM student as t, school as c WHERE t.sch = c.id

SELECT sex, COUNT(*) FROM student GROUP BY sex

SELECT sch, AVG(sestat) FROM student GROUP BY sch LIMIT 10
</pre></div>
|
||
|
||
<p>The first of these selects two columns from the R data frame
|
||
<code>USArrests</code> that has been copied across to a database table,
|
||
subsets on a third column and asks that the results be sorted. The second
|
||
performs a database <em>join</em> on two tables <code>student</code> and
|
||
<code>school</code> and returns four columns. The third and fourth queries do
|
||
some cross-tabulation and return counts or averages. (The five
|
||
aggregation functions are COUNT(*) and SUM, MAX, MIN and AVG, each
|
||
applied to a single column.)
|
||
</p>
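<p>The first of these queries can be tried from R without a database server.
The sketch below assumes that packages <strong>DBI</strong> and <strong>RSQLite</strong>
are installed and uses an in-memory SQLite database; the column name
<code>State</code> is created here from the row names of <code>USArrests</code> so
that the query can be used as written.
</p>

```r
library(DBI)

# In-memory SQLite database; assumes packages DBI and RSQLite are installed.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy USArrests across, storing the row names in a column called State.
arrests <- data.frame(State = rownames(USArrests), USArrests,
                      row.names = NULL, stringsAsFactors = FALSE)
dbWriteTable(con, "USArrests", arrests)

# Two columns are selected, rows are subset on a third, output is sorted.
res <- dbGetQuery(con,
  "SELECT State, Murder FROM USArrests WHERE Rape > 30 ORDER BY Murder")
res

dbDisconnect(con)
```

<p>The result is an ordinary data frame with one row per state satisfying the
condition, sorted by increasing murder rate.
</p>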
|
||
<p>SELECT queries use FROM to select the table, WHERE to specify a
condition for inclusion (or more than one condition separated by AND or
OR), and ORDER BY to sort the result. Unlike data frames, rows in RDBMS
tables are best thought of as unordered, and without an ORDER BY
clause the ordering is indeterminate. You can sort (in
lexicographical order) on more than one column by separating them by
commas. Placing DESC after a column name in the ORDER BY clause sorts on
that column in descending order.
|
||
</p>
|
||
<p>SELECT DISTINCT queries will only return one copy of each distinct row
|
||
in the selected table.
|
||
</p>
|
||
<p>The GROUP BY clause selects subgroups of the rows according to the
|
||
criterion. If more than one column is specified (separated by commas)
|
||
then multi-way cross-classifications can be summarized by one of the
|
||
five aggregation functions. A HAVING clause allows the select to
|
||
include or exclude groups depending on the aggregated value.
|
||
</p>
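<p>A two-way cross-classification with a HAVING condition might be written as
in the sketch below. It assumes packages <strong>DBI</strong> and
<strong>RSQLite</strong> are installed and uses the <code>mtcars</code> data set
purely as convenient example data; the threshold of four cars per group is
arbitrary.
</p>

```r
library(DBI)

# In-memory SQLite database; assumes packages DBI and RSQLite are installed.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Count cars and average mpg for each (cyl, am) combination,
# keeping only the groups that contain at least four cars.
by_group <- dbGetQuery(con, "
  SELECT cyl, am, COUNT(*) AS n, AVG(mpg) AS mean_mpg
    FROM mtcars
   GROUP BY cyl, am
  HAVING COUNT(*) >= 4
   ORDER BY cyl, am")
by_group

dbDisconnect(con)
```

<p>WHERE filters individual rows before grouping, whereas HAVING filters
whole groups after the aggregation has been computed.
</p>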
|
||
<p>If the SELECT statement contains an ORDER BY clause that produces a
|
||
unique ordering, a LIMIT clause can be added to select (by number) a
|
||
contiguous block of output rows. This can be useful to retrieve rows a
|
||
block at a time. (It may not be reliable unless the ordering is unique,
|
||
as the LIMIT clause can be used to optimize the query.)
|
||
</p>
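<p>Retrieving rows a block at a time could be sketched as follows. The
<code>LIMIT ... OFFSET ...</code> form used here is accepted by SQLite (and by
several other DBMSs) but the exact syntax is DBMS-specific, so treat it as an
assumption; packages <strong>DBI</strong> and <strong>RSQLite</strong> are assumed
to be installed, and the block size is arbitrary.
</p>

```r
library(DBI)

# In-memory SQLite database; assumes packages DBI and RSQLite are installed.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
arrests <- data.frame(State = rownames(USArrests), USArrests,
                      row.names = NULL, stringsAsFactors = FALSE)
dbWriteTable(con, "arrests", arrests)

block_size <- 20L   # arbitrary block size for the illustration
offset <- 0L
total <- 0L

repeat {
  # ORDER BY State gives a unique ordering, so the blocks do not overlap.
  sql <- sprintf(
    "SELECT State, Murder FROM arrests ORDER BY State LIMIT %d OFFSET %d",
    block_size, offset)
  block <- dbGetQuery(con, sql)
  if (nrow(block) == 0) break       # no rows left
  total <- total + nrow(block)      # process the block here
  offset <- offset + block_size
}

dbDisconnect(con)
total
```

<p>Each pass issues a fresh query, so this approach relies on the table not
changing between blocks.
</p>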
|
||
<p>There are also queries to create a table (CREATE TABLE, although in these
interfaces one usually copies a data frame to the database instead), and to
INSERT, DELETE or UPDATE data. A table is destroyed by a DROP TABLE ‘query’.
|
||
</p>
|
||
<p>Kline and Kline (2001) discuss the details of the implementation of SQL
|
||
in Microsoft SQL Server 2000, Oracle, MySQL and PostgreSQL.
|
||
</p>
|
||
<hr>
|
||
<a name="Data-types"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#SQL-queries" accesskey="p" rel="prev">SQL queries</a>, Up: <a href="#Overview-of-RDBMSs" accesskey="u" rel="up">Overview of RDBMSs</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Data-types-1"></a>
|
||
<h4 class="subsection">4.2.2 Data types</h4>
|
||
|
||
<p>Data can be stored in a database in various data types. The range of
|
||
data types is DBMS-specific, but the <acronym>SQL</acronym> standard defines many
|
||
types, including the following that are widely implemented (often not by
|
||
the <acronym>SQL</acronym> name).
|
||
</p>
|
||
<dl compact="compact">
|
||
<dt><code>float(<var>p</var>)</code></dt>
|
||
<dd><p>Real number, with optional precision. Often called <code>real</code> or
|
||
<code>double</code> or <code>double precision</code>.
|
||
</p></dd>
|
||
<dt><code>integer</code></dt>
|
||
<dd><p>32-bit integer. Often called <code>int</code>.
|
||
</p></dd>
|
||
<dt><code>smallint</code></dt>
|
||
<dd><p>16-bit integer
|
||
</p></dd>
|
||
<dt><code>character(<var>n</var>)</code></dt>
|
||
<dd><p>fixed-length character string. Often called <code>char</code>.
|
||
</p></dd>
|
||
<dt><code>character varying(<var>n</var>)</code></dt>
|
||
<dd><p>variable-length character string. Often called <code>varchar</code>. Many
systems (particularly older ones) limit this to 255 characters.
|
||
</p></dd>
|
||
<dt><code>boolean</code></dt>
|
||
<dd><p>true or false. Sometimes called <code>bool</code> or <code>bit</code>.
|
||
</p></dd>
|
||
<dt><code>date</code></dt>
|
||
<dd><p>calendar date
|
||
</p></dd>
|
||
<dt><code>time</code></dt>
|
||
<dd><p>time of day
|
||
</p></dd>
|
||
<dt><code>timestamp</code></dt>
|
||
<dd><p>date and time
|
||
</p></dd>
|
||
</dl>
|
||
|
||
<p>There are variants on <code>time</code> and <code>timestamp</code> declared <code>with
time zone</code>. Other types widely implemented are <code>text</code> and
<code>blob</code>, for large blocks of text and binary data, respectively.
|
||
</p>
|
||
<p>The more comprehensive of the R interface packages hide the type
|
||
conversion issues from the user.
|
||
</p>
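<p>How an interface maps R types to database types can be inspected directly.
The sketch below assumes packages <strong>DBI</strong> and <strong>RSQLite</strong>
are installed; the type names reported are those chosen by SQLite and will
differ for other back-ends. The table name and data are invented for the
illustration.
</p>

```r
library(DBI)

# In-memory SQLite database; assumes packages DBI and RSQLite are installed.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Illustrative data frame with integer, double and character columns.
df <- data.frame(n = 1:3,
                 x = c(0.5, 1.5, 2.5),
                 s = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# SQL type the back-end would choose for each column.
sql_types <- sapply(df, function(col) dbDataType(con, col))
sql_types

# Write the data frame and read it back: the R classes are restored.
dbWriteTable(con, "types_demo", df)
back <- dbReadTable(con, "types_demo")
sapply(back, class)

dbDisconnect(con)
```

<p>Types without a direct counterpart in the DBMS (factors or dates, for
example) may not survive such a round trip unchanged, so it is worth checking
the classes of the columns read back.
</p>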
|
||
<hr>
|
||
<a name="R-interface-packages"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Overview-of-RDBMSs" accesskey="p" rel="prev">Overview of RDBMSs</a>, Up: <a href="#Relational-databases" accesskey="u" rel="up">Relational databases</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="R-interface-packages-1"></a>
|
||
<h3 class="section">4.3 R interface packages</h3>
|
||
|
||
<p>There are several packages available on <acronym>CRAN</acronym> to help R
|
||
communicate with DBMSs. They provide different levels of abstraction.
|
||
Some provide means to copy whole data frames to and from databases. All
|
||
have functions to select data within the database via <acronym>SQL</acronym>
|
||
queries, and to retrieve the result as a whole as a
|
||
data frame or in pieces (usually as groups of rows).
|
||
</p>
|
||
<p>All except <a href="https://CRAN.R-project.org/package=RODBC"><strong>RODBC</strong></a> are tied to one DBMS, but there has been a
|
||
proposal for a unified ‘front-end’ package <a href="https://CRAN.R-project.org/package=DBI"><strong>DBI</strong></a>
|
||
(<a href="https://developer.r-project.org/db">https://developer.r-project.org/db</a>) in conjunction with a
|
||
‘back-end’, the most developed of which is <a href="https://CRAN.R-project.org/package=RMySQL"><strong>RMySQL</strong></a>. Also on
|
||
<acronym>CRAN</acronym> are the back-ends <a href="https://CRAN.R-project.org/package=ROracle"><strong>ROracle</strong></a>, <a href="https://CRAN.R-project.org/package=RPostgreSQL"><strong>RPostgreSQL</strong></a> and
|
||
<a href="https://CRAN.R-project.org/package=RSQLite"><strong>RSQLite</strong></a> (which works with the bundled DBMS <code>SQLite</code>,
|
||
<a href="https://www.sqlite.org">https://www.sqlite.org</a>), <a href="https://CRAN.R-project.org/package=RJDBC"><strong>RJDBC</strong></a> (which uses Java and can
|
||
connect to any DBMS that has a JDBC driver) and <a href="https://CRAN.R-project.org/package=RpgSQL"><strong>RpgSQL</strong></a> (a
|
||
specialist interface to PostgreSQL built on top of <a href="https://CRAN.R-project.org/package=RJDBC"><strong>RJDBC</strong></a>).
|
||
</p>
|
||
<p>The Bioconductor project has updated <strong>RdbiPgSQL</strong> (formerly on
|
||
<acronym>CRAN</acronym> ca 2000), a first-generation interface to PostgreSQL.
|
||
</p>
|
||
<p><strong>PL/R</strong> (<a href="http://www.joeconway.com/plr/"><code>http://www.joeconway.com/plr/</code></a>) is a project to embed R into
|
||
PostgreSQL.
|
||
</p>
|
||
<p>Package <a href="https://CRAN.R-project.org/package=RMongo"><strong>RMongo</strong></a> provides an R interface to a Java client for
|
||
‘MongoDB’ (<a href="https://en.wikipedia.org/wiki/MongoDB">https://en.wikipedia.org/wiki/MongoDB</a>) databases, which
|
||
are queried using JavaScript rather than SQL. Package <a href="https://CRAN.R-project.org/package=rmongodb"><strong>rmongodb</strong></a> is
|
||
another client using <strong>mongodb</strong>’s C driver.
|
||
</p>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#DBI" accesskey="1">DBI</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#RODBC" accesskey="2">RODBC</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
|
||
<hr>
|
||
<a name="DBI"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#RODBC" accesskey="n" rel="next">RODBC</a>, Previous: <a href="#R-interface-packages" accesskey="p" rel="prev">R interface packages</a>, Up: <a href="#R-interface-packages" accesskey="u" rel="up">R interface packages</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Packages-using-DBI"></a>
|
||
<h4 class="subsection">4.3.1 Packages using DBI</h4>
|
||
<a name="index-MySQL-database-system"></a>
|
||
|
||
<p>Package <a href="https://CRAN.R-project.org/package=RMySQL"><strong>RMySQL</strong></a> on <acronym>CRAN</acronym> provides an interface to the
|
||
MySQL database system (see <a href="https://www.mysql.com">https://www.mysql.com</a> and Dubois,
|
||
2000) or its fork MariaDB (see <a href="https://mariadb.org/">https://mariadb.org/</a>). The
|
||
description here applies to versions <code>0.5-0</code> and later: earlier
|
||
versions had a substantially different interface. The current version
|
||
requires the <a href="https://CRAN.R-project.org/package=DBI"><strong>DBI</strong></a> package, and this description will apply with
|
||
minor changes to all the other back-ends to <a href="https://CRAN.R-project.org/package=DBI"><strong>DBI</strong></a>.
|
||
</p>
|
||
<p>MySQL exists on Unix/Linux/OS X and Windows: there is a ‘Community
|
||
Edition’ released under GPL but commercial licenses are also available.
|
||
MySQL was originally a ‘light and lean’ database. (It preserves the
|
||
case of names where the operating file system is case-sensitive, so not
|
||
on Windows.)
|
||
</p>
|
||
|
||
<a name="index-dbDriver"></a>
|
||
<a name="index-dbConnect"></a>
|
||
<a name="index-dbDisconnect"></a>
|
||
<p>The call <code>dbDriver("MySQL")</code> returns a database connection manager
|
||
object, and then a call to <code>dbConnect</code> opens a database connection
|
||
which can subsequently be closed by a call to the generic function
|
||
<code>dbDisconnect</code>. Use <code>dbDriver("Oracle")</code>,
|
||
<code>dbDriver("PostgreSQL")</code> or <code>dbDriver("SQLite")</code> with those
|
||
DBMSs and packages <a href="https://CRAN.R-project.org/package=ROracle"><strong>ROracle</strong></a>, <a href="https://CRAN.R-project.org/package=RPostgreSQL"><strong>RPostgreSQL</strong></a> or <a href="https://CRAN.R-project.org/package=RSQLite"><strong>RSQLite</strong></a>
|
||
respectively.
|
||
</p>
|
||
<a name="index-dbSendQuery"></a>
|
||
<a name="index-dbClearResult"></a>
|
||
<a name="index-dbGetQuery"></a>
|
||
<p><acronym>SQL</acronym> queries can be sent by either <code>dbSendQuery</code> or
|
||
<code>dbGetQuery</code>. <code>dbGetQuery</code> sends the query and retrieves the
|
||
results as a data frame. <code>dbSendQuery</code> sends the query and returns
|
||
an object of class inheriting from <code>"DBIResult"</code> which can be used
|
||
to retrieve the results, and subsequently used in a call to
|
||
<code>dbClearResult</code> to remove the result.
|
||
</p>
|
||
<a name="index-fetch"></a>
|
||
<p>Function <code>fetch</code> is used to retrieve some or all of the rows in the
query result, as a data frame. The function <code>dbHasCompleted</code> indicates if
all the rows have been fetched, and <code>dbGetRowCount</code> returns the
number of rows fetched so far.
|
||
</p>
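<p>Retrieving a result in pieces might look like the sketch below. It uses
<code>dbFetch</code>, the name under which current versions of
<strong>DBI</strong> provide <code>fetch</code>, and an in-memory SQLite database in
place of MySQL; both packages <strong>DBI</strong> and <strong>RSQLite</strong> are
assumed to be installed, and the chunk size is arbitrary.
</p>

```r
library(DBI)

# In-memory SQLite database; assumes packages DBI and RSQLite are installed.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "arrests", USArrests)

# Send the query without retrieving anything yet.
res <- dbSendQuery(con, "SELECT Murder, Rape FROM arrests")

n_seen <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 15)     # at most 15 rows per call
  n_seen <- n_seen + nrow(chunk)    # process the chunk here
}

fetched <- dbGetRowCount(res)       # rows fetched over all calls

# Release the result set, then close the connection.
dbClearResult(res)
dbDisconnect(con)
```

<p>Clearing the result before disconnecting avoids warnings about pending
result sets with some back-ends.
</p>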
|
||
<a name="index-dbReadTable"></a>
|
||
<a name="index-dbWriteTable"></a>
|
||
<a name="index-dbExistsTable"></a>
|
||
<a name="index-dbRemoveTable"></a>
|
||
<p>Functions <code>dbReadTable</code>, <code>dbWriteTable</code>, <code>dbExistsTable</code> and
<code>dbRemoveTable</code> are convenient interfaces to read/write/test/delete tables in the
database. <code>dbReadTable</code> and <code>dbWriteTable</code> copy to and from
an R data frame, mapping the row names of the data frame to the field
<code>row_names</code> in the <code>MySQL</code> table.
|
||
</p>
|
||
<div class="smallexample">
|
||
<pre class="smallexample">&gt; library(RMySQL) # will load DBI as well
## open a connection to a MySQL database
&gt; con &lt;- dbConnect(dbDriver("MySQL"), dbname = "test")
## list the tables in the database
&gt; dbListTables(con)
## load a data frame into the database, deleting any existing copy
&gt; data(USArrests)
&gt; dbWriteTable(con, "arrests", USArrests, overwrite = TRUE)
TRUE
&gt; dbListTables(con)
[1] "arrests"
## get the whole table
&gt; dbReadTable(con, "arrests")
               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0
Arkansas          8.8     190       50 19.5
...
## Select from the loaded table
&gt; dbGetQuery(con, paste("select row_names, Murder from arrests",
                        "where Rape &gt; 30 order by Murder"))
   row_names Murder
1   Colorado    7.9
2    Arizona    8.1
3 California    9.0
4     Alaska   10.0
5 New Mexico   11.4
6   Michigan   12.1
7     Nevada   12.2
8    Florida   15.4
&gt; dbRemoveTable(con, "arrests")
&gt; dbDisconnect(con)
</pre></div>
|
||
|
||
<hr>
|
||
<a name="RODBC"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#DBI" accesskey="p" rel="prev">DBI</a>, Up: <a href="#R-interface-packages" accesskey="u" rel="up">R interface packages</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Package-RODBC"></a>
|
||
<h4 class="subsection">4.3.2 Package RODBC</h4>
|
||
<a name="index-ODBC-1"></a>
|
||
<a name="index-Open-Database-Connectivity-1"></a>
|
||
|
||
<p>Package <a href="https://CRAN.R-project.org/package=RODBC"><strong>RODBC</strong></a> on <acronym>CRAN</acronym> provides an interface to
|
||
database sources supporting an <acronym>ODBC</acronym> interface. This is very
|
||
widely available, and allows the same R code to access different
|
||
database systems. <a href="https://CRAN.R-project.org/package=RODBC"><strong>RODBC</strong></a> runs on Unix/Linux, Windows and OS X,
|
||
and almost all database systems provide support for <acronym>ODBC</acronym>. We
|
||
have tested Microsoft SQL Server, Access, MySQL, PostgreSQL, Oracle and
|
||
IBM DB2 on Windows and MySQL, MariaDB, Oracle, PostgreSQL and SQLite on
|
||
Linux.
|
||
</p>
|
||
<p>ODBC is a client-server system, and we have happily connected to a DBMS
|
||
running on a Unix server from a Windows client, and <em>vice versa</em>.
|
||
</p>
|
||
<p>On Windows ODBC support is part of the OS. On Unix/Linux you will need
|
||
an <acronym>ODBC</acronym> Driver Manager such as unixODBC
|
||
(<a href="http://www.unixODBC.org">http://www.unixODBC.org</a>) or iODBC (<a href="http://www.iODBC.org">http://www.iODBC.org</a>:
|
||
this is pre-installed in OS X) and an installed driver for your
|
||
database system.
|
||
</p>
|
||
<a name="index-Excel"></a>
|
||
<a name="index-_002exls"></a>
|
||
<a name="index-Dbase"></a>
|
||
<a name="index-_002edbf"></a>
|
||
<p>Windows provides drivers not just for DBMSs but also for Excel
|
||
(<samp>.xls</samp>) spreadsheets, DBase (<samp>.dbf</samp>) files and even text
|
||
files. (The named applications do <em>not</em> need to be
|
||
installed. Which file formats are supported depends on the versions of
|
||
the drivers.) There are versions for Excel and Access 2007/2010 (go to
|
||
<a href="https://www.microsoft.com/en-us/download/default.aspx">https://www.microsoft.com/en-us/download/default.aspx</a>, and
|
||
search for ‘Office ODBC’, which will lead to
|
||
<samp>AccessDatabaseEngine.exe</samp>), the ‘2007 Office System Driver’ (the
|
||
latter has a version for 64-bit Windows, and that will also read earlier
|
||
versions).
|
||
</p>
|
||
<p>On OS X the Actual Technologies
|
||
(<a href="https://www.actualtech.com/product_access.php">https://www.actualtech.com/product_access.php</a>) drivers
|
||
provide ODBC interfaces to Access databases (including Access 2007/2010)
|
||
and to Excel spreadsheets (not including Excel 2007/2010).
|
||
</p>
|
||
<a name="index-odbcConnect"></a>
|
||
<a name="index-odbcDriverConnect"></a>
|
||
<a name="index-odbcGetInfo"></a>
|
||
<p>Many simultaneous connections are possible. A connection is opened by a
|
||
call to <code>odbcConnect</code> or <code>odbcDriverConnect</code> (which on the
|
||
Windows GUI allows a database to be selected via dialog boxes) which
|
||
returns a handle used for subsequent access to the database. Printing a
|
||
connection will provide some details of the ODBC connection, and calling
|
||
<code>odbcGetInfo</code> will give details on the client and server.
|
||
</p>
|
||
|
||
<a name="index-odbcClose"></a>
|
||
<a name="index-close"></a>
|
||
<p>A connection is closed by a call to <code>close</code> or <code>odbcClose</code>,
|
||
and also (with a warning) when no R object refers to it and at the end
|
||
of an R session.
|
||
</p>
|
||
<a name="index-sqlTables"></a>
|
||
<p>Details of the tables on a connection can be found using
|
||
<code>sqlTables</code>.
|
||
</p>
|
||
<a name="index-sqlFetch"></a>
|
||
<a name="index-sqlSave"></a>
|
||
<p>Function <code>sqlSave</code> copies an R data frame to a table in the
|
||
database, and <code>sqlFetch</code> copies a table in the database to an R
|
||
data frame.
|
||
</p>
|
||
<a name="index-sqlQuery"></a>
|
||
<a name="index-sqlCopy"></a>
|
||
<a name="index-odbcQuery"></a>
|
||
<a name="index-sqlGetResults"></a>
|
||
<a name="index-sqlFetchMore"></a>
|
||
<p>An <acronym>SQL</acronym> query can be sent to the database by a call to
|
||
<code>sqlQuery</code>. This returns the result in an R data frame.
|
||
(<code>sqlCopy</code> sends a query to the database and saves the result as a
|
||
table in the database.) A finer level of control is attained by first
|
||
calling <code>odbcQuery</code> and then <code>sqlGetResults</code> to fetch the
|
||
results. The latter can be used within a loop to retrieve a limited
|
||
number of rows at a time, as can function <code>sqlFetchMore</code>.
|
||
</p>
|
||
<a name="index-PostgreSQL-database-system"></a>
|
||
<p>Here is an example using PostgreSQL, for which the <acronym>ODBC</acronym> driver
|
||
maps column and data frame names to lower case. We use a database
|
||
<code>testdb</code> we created earlier, and had the DSN (data source name) set
|
||
up in <samp>~/.odbc.ini</samp> under <code>unixODBC</code>. Exactly the same code
|
||
worked using MyODBC to access a MySQL database under Linux or Windows
|
||
(where MySQL also maps names to lowercase). Under Windows,
|
||
<acronym>DSN</acronym>s are set up in the <acronym>ODBC</acronym> applet in the Control
|
||
Panel (‘Data Sources (ODBC)’ in the ‘Administrative Tools’ section).
|
||
<a name="index-MySQL-database-system-1"></a>
|
||
</p>
|
||
<div class="smallexample">
|
||
<pre class="smallexample">> library(RODBC)
|
||
## tell it to map names to l/case
|
||
> channel <- odbcConnect("testdb", uid="ripley", case="tolower")
|
||
## load a data frame into the database
|
||
> data(USArrests)
|
||
> sqlSave(channel, USArrests, rownames = "state", addPK = TRUE)
|
||
> rm(USArrests)
|
||
## list the tables in the database
|
||
> sqlTables(channel)
|
||
TABLE_QUALIFIER TABLE_OWNER TABLE_NAME TABLE_TYPE REMARKS
|
||
1 usarrests TABLE
|
||
## list it
|
||
> sqlFetch(channel, "USArrests", rownames = "state")
|
||
murder assault urbanpop rape
|
||
Alabama 13.2 236 58 21.2
|
||
Alaska 10.0 263 48 44.5
|
||
...
|
||
## an SQL query, originally on one line
|
||
> sqlQuery(channel, "select state, murder from USArrests
|
||
where rape > 30 order by murder")
|
||
state murder
|
||
1 Colorado 7.9
|
||
2 Arizona 8.1
|
||
3 California 9.0
|
||
4 Alaska 10.0
|
||
5 New Mexico 11.4
|
||
6 Michigan 12.1
|
||
7 Nevada 12.2
|
||
8 Florida 15.4
|
||
## remove the table
|
||
> sqlDrop(channel, "USArrests")
|
||
## close the connection
|
||
> odbcClose(channel)
|
||
</pre></div>
|
||
|
||
<a name="index-Excel-1"></a>
|
||
<a name="index-_002exls-1"></a>
|
||
<a name="index-odbcConnectExcel"></a>
|
||
<p>As a simple example of using <acronym>ODBC</acronym> under Windows with an Excel
|
||
spreadsheet, we can read from a spreadsheet by
|
||
</p>
|
||
<div class="smallexample">
|
||
<pre class="smallexample">> library(RODBC)
|
||
> channel <- odbcConnectExcel("bdr.xls")
|
||
## list the spreadsheets
|
||
> sqlTables(channel)
|
||
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS
|
||
1 C:\\bdr NA Sheet1$ SYSTEM TABLE NA
|
||
2 C:\\bdr NA Sheet2$ SYSTEM TABLE NA
|
||
3 C:\\bdr NA Sheet3$ SYSTEM TABLE NA
|
||
4 C:\\bdr NA Sheet1$Print_Area TABLE NA
|
||
## retrieve the contents of sheet 1, by either of
|
||
> sh1 <- sqlFetch(channel, "Sheet1")
|
||
> sh1 <- sqlQuery(channel, "select * from [Sheet1$]")
|
||
</pre></div>
|
||
|
||
<p>Notice that the specification of the table is different from the name
|
||
returned by <code>sqlTables</code>: <code>sqlFetch</code> is able to map the
|
||
differences.
|
||
</p>
|
||
<hr>
|
||
<a name="Binary-files"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Image-files" accesskey="n" rel="next">Image files</a>, Previous: <a href="#Relational-databases" accesskey="p" rel="prev">Relational databases</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Binary-files-1"></a>
|
||
<h2 class="chapter">5 Binary files</h2>
|
||
<a name="index-Binary-files"></a>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Binary-data-formats" accesskey="1">Binary data formats</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#dBase-files-_0028DBF_0029" accesskey="2">dBase files (DBF)</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<p>Binary connections (<a href="#Connections">Connections</a>) are now the preferred way to
|
||
handle binary files.
|
||
</p>
|
||
|
||
|
||
<hr>
|
||
<a name="Binary-data-formats"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#dBase-files-_0028DBF_0029" accesskey="n" rel="next">dBase files (DBF)</a>, Previous: <a href="#Binary-files" accesskey="p" rel="prev">Binary files</a>, Up: <a href="#Binary-files" accesskey="u" rel="up">Binary files</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Binary-data-formats-1"></a>
|
||
<h3 class="section">5.1 Binary data formats</h3>
|
||
<a name="index-hdf5"></a>
|
||
<a name="index-Hierarchical-Data-Format"></a>
|
||
|
||
<a name="index-netCDF"></a>
|
||
<a name="index-network-Common-Data-Form"></a>
|
||
|
||
<p>Packages <a href="https://CRAN.R-project.org/package=hdf5"><strong>hdf5</strong></a>, <a href="https://CRAN.R-project.org/package=h5r"><strong>h5r</strong></a>, Bioconductor’s <strong>rhdf5</strong>,
|
||
<a href="https://CRAN.R-project.org/package=RNetCDF"><strong>RNetCDF</strong></a>, <a href="https://CRAN.R-project.org/package=ncdf"><strong>ncdf</strong></a> and <a href="https://CRAN.R-project.org/package=ncdf4"><strong>ncdf4</strong></a> on <acronym>CRAN</acronym> provide
|
||
interfaces to <acronym>NCSA</acronym>’s HDF5 (Hierarchical Data Format, see
|
||
<a href="https://www.hdfgroup.org/HDF5/">https://www.hdfgroup.org/HDF5/</a>) and to UCAR’s netCDF data files
|
||
(network Common Data Form, see
|
||
<a href="http://www.unidata.ucar.edu/software/netcdf/">http://www.unidata.ucar.edu/software/netcdf/</a>).
|
||
</p>
|
||
<p>Both of these are systems to store scientific data in array-oriented
|
||
ways, including descriptions, labels, formats, units, …. HDF5 also
|
||
allows <em>groups</em> of arrays, and the R interface maps lists
|
||
to HDF5 groups, and can write numeric and character vectors and
|
||
matrices.
|
||
</p>
|
||
<p>NetCDF’s version 4 format (confusingly, implemented in netCDF 4.1.1 and
|
||
later, but not in 4.0.1) includes the use of various HDF5 formats. This
|
||
is handled by package <a href="https://CRAN.R-project.org/package=ncdf4"><strong>ncdf4</strong></a> whereas <a href="https://CRAN.R-project.org/package=RNetCDF"><strong>RNetCDF</strong></a> and
|
||
<a href="https://CRAN.R-project.org/package=ncdf"><strong>ncdf</strong></a> handle version 3 files.
|
||
</p>
|
||
<p>The availability of software to support these formats is somewhat
|
||
limited by platform, especially on Windows.
|
||
</p>
|
||
<hr>
|
||
<a name="dBase-files-_0028DBF_0029"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Binary-data-formats" accesskey="p" rel="prev">Binary data formats</a>, Up: <a href="#Binary-files" accesskey="u" rel="up">Binary files</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="dBase-files-_0028DBF_0029-1"></a>
|
||
<h3 class="section">5.2 dBase files (DBF)</h3>
|
||
|
||
<a name="index-dBase"></a>
|
||
<a name="index-DBF-files"></a>
|
||
<p><code>dBase</code> was a DOS program written by Ashton-Tate and later owned by
|
||
Borland which has a binary flat-file format that became popular, with
|
||
file extension <samp>.dbf</samp>. It has been adopted for the ‘Xbase’ family
|
||
of databases, covering dBase, Clipper, FoxPro and their Windows
|
||
equivalents Visual dBase, Visual Objects and Visual FoxPro (see
|
||
<a href="http://www.e-bachmann.dk/docs/xbase.htm">http://www.e-bachmann.dk/docs/xbase.htm</a>). A dBase file contains
|
||
a header and then a series of fields and so is most similar to an R
|
||
data frame. The data itself is stored in text format, and can include
|
||
character, logical and numeric fields, and other types in later versions
|
||
(see for example
|
||
<a href="http://www.digitalpreservation.gov/formats/fdd/fdd000325.shtml">http://www.digitalpreservation.gov/formats/fdd/fdd000325.shtml</a>
|
||
and
|
||
<a href="http://www.clicketyclick.dk/databases/xbase/format/index.html">http://www.clicketyclick.dk/databases/xbase/format/index.html</a>).
|
||
</p>
|
||
<a name="index-read_002edbf"></a>
|
||
<a name="index-write_002edbf"></a>
|
||
<p>Functions <code>read.dbf</code> and <code>write.dbf</code> provide ways to read and
|
||
write basic DBF files on all R platforms. For Windows users
|
||
<code>odbcConnectDbase</code> in package <a href="https://CRAN.R-project.org/package=RODBC"><strong>RODBC</strong></a> provides more
|
||
comprehensive facilities to read DBF files <em>via</em> Microsoft’s dBase
|
||
ODBC driver (and the Visual FoxPro driver can also be used via
|
||
<code>odbcDriverConnect</code>).
|
||
<a name="index-odbcConnectDbase"></a>
|
||
</p>
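<p>A minimal round trip might look as follows. This is a sketch that
assumes the recommended package <strong>foreign</strong>, which supplies
<code>read.dbf</code> and <code>write.dbf</code>, is installed; the data
frame and the temporary file name are made up for illustration.
</p>
<div class="example">
<pre class="example">library(foreign)
## a small illustrative data frame with integer, character and numeric fields
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 score = c(1.5, 2.25, 3), stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".dbf")
write.dbf(df, tmp)
## as.is = TRUE keeps character fields as character vectors
back <- read.dbf(tmp, as.is = TRUE)
unlink(tmp)
</pre></div>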
|
||
<hr>
|
||
<a name="Image-files"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Connections" accesskey="n" rel="next">Connections</a>, Previous: <a href="#Binary-files" accesskey="p" rel="prev">Binary files</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Image-files-1"></a>
|
||
<h2 class="chapter">6 Image files</h2>
|
||
|
||
<p>A particular class of binary files are those representing images, and a
|
||
not uncommon request is to read such a file into R as a matrix.
|
||
</p>
|
||
<p>There are many formats for image files (most with lots of variants), and
|
||
it may be necessary to use external conversion software to first convert
|
||
the image into one of the formats for which a package currently provides
|
||
an R reader. A versatile example of such software is ImageMagick and
|
||
its fork GraphicsMagick. These provide command-line programs
|
||
<code>convert</code> and <code>gm convert</code> to convert images from one
|
||
format to another: what formats they can input is determined when they
|
||
are compiled, and the supported formats can be listed by e.g.
|
||
<code>convert -list format</code>.
|
||
</p>
|
||
<p>Package <a href="https://CRAN.R-project.org/package=pixmap"><strong>pixmap</strong></a> has a function <code>read.pnm</code> to read ‘portable
|
||
anymap’ images in PBM (black/white), PGM (grey) and PPM (RGB colour)
|
||
formats. These are also known as ‘netpbm’ formats.
|
||
</p>
|
||
<p>Packages <a href="https://CRAN.R-project.org/package=bmp"><strong>bmp</strong></a>, <a href="https://CRAN.R-project.org/package=jpeg"><strong>jpeg</strong></a> and <a href="https://CRAN.R-project.org/package=png"><strong>png</strong></a> read the
|
||
formats after which they are named. See also packages <a href="https://CRAN.R-project.org/package=biOps"><strong>biOps</strong></a>
|
||
and <a href="https://CRAN.R-project.org/package=Momocs"><strong>Momocs</strong></a>, and Bioconductor package <strong>EBImage</strong>.
|
||
</p>
|
||
<p>TIFF is more a meta-format, a wrapper within which a very large variety
|
||
of image formats can be embedded. Packages <a href="https://CRAN.R-project.org/package=rtiff"><strong>rtiff</strong></a> (orphaned)
|
||
and <a href="https://CRAN.R-project.org/package=tiff"><strong>tiff</strong></a> can read some of the sub-formats (depending on the
|
||
external <code>libtiff</code> software against which they are compiled).
|
||
There are some facilities for specialized sub-formats, for example in
|
||
Bioconductor package <strong>beadarray</strong>.
|
||
</p>
|
||
<p>Raster files are common in the geographical sciences, and package
|
||
<a href="https://CRAN.R-project.org/package=rgdal"><strong>rgdal</strong></a> provides an interface to GDAL which provides some
|
||
facilities of its own to read raster files and links to many others.
|
||
Which formats it supports is determined when GDAL is compiled: use
|
||
<code>gdalDrivers()</code> to see what these are for the build you are using.
|
||
It can be useful for uncommon formats such as JPEG 2000 (which is a
|
||
different format from JPEG, and currently supported in neither the OS X
|
||
nor Windows binary versions of <a href="https://CRAN.R-project.org/package=rgdal"><strong>rgdal</strong></a>).
|
||
</p>
|
||
|
||
<hr>
|
||
<a name="Connections"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Network-interfaces" accesskey="n" rel="next">Network interfaces</a>, Previous: <a href="#Image-files" accesskey="p" rel="prev">Image files</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Connections-1"></a>
|
||
<h2 class="chapter">7 Connections</h2>
|
||
|
||
<a name="index-Connections"></a>
|
||
<p><em>Connections</em> are used in R in the sense of Chambers (1998) and
|
||
Ripley (2001), a set of functions to replace the use of file names by a
|
||
flexible interface to file-like objects.
|
||
</p>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Types-of-connections" accesskey="1">Types of connections</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Output-to-connections" accesskey="2">Output to connections</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Input-from-connections" accesskey="3">Input from connections</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Listing-and-manipulating-connections" accesskey="4">Listing and manipulating connections</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Binary-connections" accesskey="5">Binary connections</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
|
||
<hr>
|
||
<a name="Types-of-connections"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Output-to-connections" accesskey="n" rel="next">Output to connections</a>, Previous: <a href="#Connections" accesskey="p" rel="prev">Connections</a>, Up: <a href="#Connections" accesskey="u" rel="up">Connections</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Types-of-connections-1"></a>
|
||
<h3 class="section">7.1 Types of connections</h3>
|
||
<a name="index-Connections-1"></a>
|
||
|
||
<a name="index-file"></a>
|
||
<a name="index-File-connections"></a>
|
||
<p>The most familiar type of connection will be a file, and file
|
||
connections are created by function <code>file</code>. File connections can
|
||
(if the OS will allow it for the particular file) be opened for reading
|
||
or writing or appending, in text or binary mode. In fact, files can be
|
||
opened for both reading and writing, and R keeps a separate file
|
||
position for reading and writing.
|
||
</p>
|
||
<a name="index-open"></a>
|
||
<a name="index-close-1"></a>
|
||
<p>Note that by default a connection is not opened when it is created. The
|
||
rule is that a function using a connection should open the connection
if it is not already open, and close the connection
|
||
after use if it opened it. In brief, leave the connection in the state
|
||
you found it in. There are generic functions <code>open</code> and
|
||
<code>close</code> with methods to explicitly open and close connections.
|
||
</p>
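<p>The rule can be sketched in user code as follows. The helper
<code>count_lines</code> is hypothetical (it is not part of R) and only
illustrates the convention: open the connection if needed, and close it
only if this function opened it. Note that at the R level
<code>close</code> also destroys the connection, so a connection passed
unopened cannot be reused afterwards.
</p>
<div class="example">
<pre class="example">## hypothetical helper: count the lines available on a connection
count_lines <- function(con) {
    if (!isOpen(con)) {
        open(con, "r")
        on.exit(close(con))   # we opened it, so we close it
    }
    length(readLines(con))
}

tmp <- tempfile()
writeLines(c("a", "b", "c"), tmp)

con <- file(tmp, "r")         # already open: the helper leaves it open
n1 <- count_lines(con)
still_open <- isOpen(con)
close(con)

n2 <- count_lines(file(tmp))  # not open: the helper opens and closes it
unlink(tmp)
</pre></div>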
|
||
<a name="index-gzfile"></a>
|
||
<a name="index-bzfile"></a>
|
||
<a name="index-Compressed-files"></a>
|
||
<p>Files compressed via the algorithm used by <code>gzip</code> can be used as
|
||
connections created by the function <code>gzfile</code>, whereas files
|
||
compressed by <code>bzip2</code> can be used via <code>bzfile</code>.
|
||
</p>
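<p>As a brief sketch, a compressed file can be written and read back
through <code>gzfile</code> just like an ordinary file; the temporary
file name and the two lines of text are illustrative only.
</p>
<div class="example">
<pre class="example">tmp <- tempfile(fileext = ".gz")
zz <- gzfile(tmp, "w")        # compressed output connection
writeLines(c("first line", "second line"), zz)
close(zz)

zz <- gzfile(tmp, "r")        # decompression happens on reading
back <- readLines(zz)
close(zz)
unlink(tmp)
</pre></div>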
|
||
<a name="index-Terminal-connections"></a>
|
||
<a name="index-stdin"></a>
|
||
<a name="index-stdout"></a>
|
||
<a name="index-stderr"></a>
|
||
<p>Unix programmers are used to dealing with special files <code>stdin</code>,
|
||
<code>stdout</code> and <code>stderr</code>. These exist as <em>terminal
|
||
connections</em> in R. They may be normal files, but they might also
|
||
refer to input from and output to a GUI console. (Even with the standard
|
||
Unix R interface, <code>stdin</code> refers to the lines submitted from
|
||
<code>readline</code> rather than a file.)
|
||
</p>
|
||
<p>The three terminal connections are always open, and cannot be opened or
|
||
closed. <code>stdout</code> and <code>stderr</code> are conventionally used for
|
||
normal output and error messages respectively. They may normally go to
|
||
the same place, but whereas normal output can be re-directed by a call
|
||
to <code>sink</code>, error output is sent to <code>stderr</code> unless re-directed
|
||
by <code>sink(, type="message")</code>. Note carefully the language used here:
|
||
the connections cannot be re-directed, but output can be sent to other
|
||
connections.
|
||
</p>
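<p>A small sketch of sending messages to a file rather than to
<code>stderr</code>; the temporary file is illustrative, and the second
call to <code>sink</code> restores the usual destination.
</p>
<div class="example">
<pre class="example">tmp <- tempfile()
zz <- file(tmp, open = "wt")
sink(zz, type = "message")    # divert messages to the connection
message("this goes to the file")
sink(type = "message")        # messages go to stderr again
close(zz)
captured <- readLines(tmp)
unlink(tmp)
</pre></div>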
|
||
<a name="index-Text-connections"></a>
|
||
<a name="index-textConnection"></a>
|
||
<p><em>Text connections</em> are another source of input. They allow R
|
||
character vectors to be read as if the lines were being read from a text
|
||
file. A text connection is created and opened by a call to
|
||
<code>textConnection</code>, which copies the current contents of the
|
||
character vector to an internal buffer at the time of creation.
|
||
</p>
|
||
<p>Text connections can also be used to capture R output to a character
|
||
vector. <code>textConnection</code> can be asked to create a new character
|
||
object or append to an existing one, in both cases in the user’s
|
||
workspace. The connection is opened by the call to
|
||
<code>textConnection</code>, and at all times the complete lines output to the
|
||
connection are available in the R object. Closing the connection
|
||
writes any remaining output to a final element of the character vector.
|
||
</p>
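<p>Both uses can be sketched briefly. The object name
<code>captured</code> is arbitrary; the sketch assumes it is run at top
level, so that the object is created in the user’s workspace.
</p>
<div class="example">
<pre class="example">## input: read numbers from a character vector as if it were a file
zz <- textConnection(c("1 2 3", "4 5 6"))
m <- scan(zz, quiet = TRUE)
close(zz)

## output: complete lines appear in 'captured' as they are written
out <- textConnection("captured", "w")
writeLines(c("alpha", "beta"), out)
cat("partial", file = out)    # no newline yet, so this is held back
n_before <- length(captured)  # counts complete lines only
close(out)                    # flushes the incomplete last line
n_after <- length(captured)
</pre></div>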
|
||
<a name="index-Pipe-connections"></a>
|
||
<a name="index-pipe"></a>
|
||
<p><em>Pipes</em> are a special form of file that connects to another
|
||
process, and pipe connections are created by the function <code>pipe</code>.
|
||
Opening a pipe connection for writing (it makes no sense to append to a
|
||
pipe) runs an OS command, and connects its standard input to whatever
|
||
R then writes to that connection. Conversely, opening a pipe
|
||
connection for input runs an OS command and makes its standard output
|
||
available for R input from that connection.
|
||
</p>
|
||
<a name="index-URL-connections"></a>
|
||
<a name="index-url"></a>
|
||
<p><acronym>URL</acronym>s of types ‘<samp>http://</samp>’, ‘<samp>ftp://</samp>’ and ‘<samp>file://</samp>’
|
||
can be read from using the function <code>url</code>. For convenience,
|
||
<code>file</code> will also accept these as the file specification and call
|
||
<code>url</code>. On most platforms ‘<samp>https://</samp>’ <acronym>URL</acronym>s are also accepted.
|
||
</p>
|
||
<a name="index-Sockets"></a>
|
||
<a name="index-socketConnection"></a>
|
||
<p>Sockets can also be used as connections via function
|
||
<code>socketConnection</code> on platforms which support Berkeley-like sockets
|
||
(most Unix systems, Linux and Windows). Sockets can be written to or
|
||
read from, and both client and server sockets can be used.
|
||
</p>
|
||
|
||
<hr>
|
||
<a name="Output-to-connections"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Input-from-connections" accesskey="n" rel="next">Input from connections</a>, Previous: <a href="#Types-of-connections" accesskey="p" rel="prev">Types of connections</a>, Up: <a href="#Connections" accesskey="u" rel="up">Connections</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Output-to-connections-1"></a>
|
||
<h3 class="section">7.2 Output to connections</h3>
|
||
<a name="index-Connections-2"></a>
|
||
|
||
<a name="index-cat-1"></a>
|
||
<a name="index-write-1"></a>
|
||
<a name="index-write_002etable-1"></a>
|
||
<a name="index-sink-1"></a>
|
||
<p>We have described functions <code>cat</code>, <code>write</code>, <code>write.table</code>
|
||
and <code>sink</code> as writing to a file, possibly appending to a file if
|
||
argument <code>append = TRUE</code>, and this is what they did prior to R
|
||
version 1.2.0.
|
||
</p>
|
||
<p>The current behaviour is equivalent, but what actually happens is that
|
||
when the <code>file</code> argument is a character string, a file connection
|
||
is opened (for writing or appending) and closed again at the end of the
|
||
function call. If we want to repeatedly write to the same file, it is
|
||
more efficient to explicitly declare and open the connection, and pass
|
||
the connection object to each call to an output function. This also
|
||
makes it possible to write to pipes, which was implemented earlier in a
|
||
limited way via the syntax <code>file = "|cmd"</code> (which can still be
|
||
used).
|
||
</p>
|
||
<a name="index-writeLines"></a>
|
||
<p>There is a function <code>writeLines</code> to write complete text lines
|
||
to a connection.
|
||
</p>
|
||
<p>Some simple examples are
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">zz <- file("ex.data", "w") # open an output file connection
|
||
cat("TITLE extra line", "2 3 5 7", "", "11 13 17",
|
||
file = zz, sep = "\n")
|
||
cat("One more line\n", file = zz)
|
||
close(zz)
|
||
|
||
## convert decimal point to comma in output, using a pipe (Unix)
|
||
## both R strings and (probably) the shell need \ doubled
|
||
zz <- pipe(paste("sed s/\\\\./,/ >", "outfile"), "w")
|
||
cat(format(round(rnorm(100), 4)), sep = "\n", file = zz)
|
||
close(zz)
|
||
## now look at the output file:
|
||
file.show("outfile", delete.file = TRUE)
|
||
|
||
## capture R output: use examples from help(lm)
|
||
zz <- textConnection("ex.lm.out", "w")
|
||
sink(zz)
|
||
example(lm, prompt.echo = "> ")
|
||
sink()
|
||
close(zz)
|
||
## now ‘ex.lm.out’ contains the output for further processing.
|
||
## Look at it by, e.g.,
|
||
cat(ex.lm.out, sep = "\n")
|
||
</pre></div>
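<p>A further sketch shows repeated writes to one open connection, mixing
<code>writeLines</code> and <code>write.table</code>. The file name and
the values written are illustrative only.
</p>
<div class="example">
<pre class="example">tmp <- tempfile()
zz <- file(tmp, "w")          # open once, write several times
writeLines("# measurements", zz)
for (i in 1:3)
    write.table(data.frame(run = i, value = i^2), file = zz,
                col.names = FALSE, row.names = FALSE)
close(zz)
lines <- readLines(tmp)
unlink(tmp)
</pre></div>

<p>Because the connection stays open, each call continues where the
previous one stopped and no <code>append = TRUE</code> argument is needed.
</p>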
|
||
|
||
<hr>
|
||
<a name="Input-from-connections"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Listing-and-manipulating-connections" accesskey="n" rel="next">Listing and manipulating connections</a>, Previous: <a href="#Output-to-connections" accesskey="p" rel="prev">Output to connections</a>, Up: <a href="#Connections" accesskey="u" rel="up">Connections</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Input-from-connections-1"></a>
|
||
<h3 class="section">7.3 Input from connections</h3>
|
||
|
||
<a name="index-scan-2"></a>
|
||
<a name="index-read_002etable-1"></a>
|
||
<a name="index-readLines-1"></a>
|
||
<p>The basic functions to read from connections are <code>scan</code> and
|
||
<code>readLines</code>. These take a character string argument and open a
|
||
file connection for the duration of the function call, but explicitly
|
||
opening a file connection allows a file to be read sequentially in
|
||
different formats.
|
||
</p>
|
||
<p>Other functions that call <code>scan</code> can also make use of connections,
|
||
in particular <code>read.table</code>.
|
||
</p>
|
||
<p>Some simple examples are
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">## read in file created in last examples
|
||
readLines("ex.data")
|
||
unlink("ex.data")
|
||
|
||
## read listing of current directory (Unix)
|
||
readLines(pipe("ls -1"))
|
||
|
||
# remove trailing commas from an input file.
|
||
# Suppose we are given a file ‘data’ containing
|
||
450, 390, 467, 654, 30, 542, 334, 432, 421,
|
||
357, 497, 493, 550, 549, 467, 575, 578, 342,
|
||
446, 547, 534, 495, 979, 479
|
||
# Then read this by
|
||
scan(pipe("sed -e s/,$// data"), sep=",")
|
||
</pre></div>
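<p>Reading one file sequentially in different formats can be sketched as
follows: the connection is opened once, and each call continues from
where the previous one stopped. The file contents are made up for
illustration.
</p>
<div class="example">
<pre class="example">tmp <- tempfile()
writeLines(c("TITLE extra line", "2 3 5 7", "a 1", "b 2"), tmp)

zz <- file(tmp, "r")                          # open once, in text mode
title  <- readLines(zz, n = 1)                # first line as text
primes <- scan(zz, nlines = 1, quiet = TRUE)  # second line as numbers
rest   <- scan(zz, what = list(key = "", val = 0),
               quiet = TRUE)                  # remaining lines as records
close(zz)
unlink(tmp)
</pre></div>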
|
||
|
||
<a name="index-URL-connections-1"></a>
|
||
<p>For convenience, if the <code>file</code> argument specifies an FTP or HTTP
|
||
<acronym>URL</acronym>, the <acronym>URL</acronym> is opened for reading via <code>url</code>.
|
||
Specifying files via ‘<samp>file://foo.bar</samp>’ is also allowed.
|
||
</p>
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Pushback" accesskey="1">Pushback</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<hr>
|
||
<a name="Pushback"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Input-from-connections" accesskey="p" rel="prev">Input from connections</a>, Up: <a href="#Input-from-connections" accesskey="u" rel="up">Input from connections</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Pushback-1"></a>
|
||
<h4 class="subsection">7.3.1 Pushback</h4>
|
||
|
||
<a name="index-pushBack_002e"></a>
|
||
<a name="index-Pushback-on-a-connection"></a>
|
||
<p>C programmers may be familiar with the <code>ungetc</code> function to push
|
||
back a character onto a text input stream. R connections have the
|
||
same idea in a more powerful way, in that an (essentially) arbitrary
|
||
number of lines of text can be pushed back onto a connection via a call
|
||
to <code>pushBack</code>.
|
||
</p>
|
||
<p>Pushbacks operate as a stack, so a read request first uses each line
|
||
from the most recently pushed-back text, then those from earlier
|
||
pushbacks and finally reads from the connection itself. Once a
|
||
pushed-back line is read completely, it is cleared. The number of
|
||
pending lines pushed back can be found via a call to
|
||
<code>pushBackLength</code>.
|
||
<a name="index-pushBackLength"></a>
|
||
</p>
|
||
<p>A simple example will show the idea.
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">> zz <- textConnection(LETTERS)
|
||
> readLines(zz, 2)
|
||
[1] "A" "B"
|
||
> scan(zz, "", 4)
|
||
Read 4 items
|
||
[1] "C" "D" "E" "F"
|
||
> pushBack(c("aa", "bb"), zz)
|
||
> scan(zz, "", 4)
|
||
Read 4 items
|
||
[1] "aa" "bb" "G" "H"
|
||
> close(zz)
|
||
</pre></div>
|
||
|
||
<p>Pushback is only available for connections opened for input in text mode.
|
||
</p>
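<p>The stack behaviour and <code>pushBackLength</code> can be sketched as
follows.  This is only an illustration: the strings and variable names
are made up for the example.
</p>
<div class="example">
<pre class="example">zz <- textConnection(c("first", "second"))
pushBack("older", zz)                  # pushed back first
pushBack(c("newer1", "newer2"), zz)    # pushed back last, so read first
n_pending <- pushBackLength(zz)        # three lines are pending
lines <- readLines(zz)                 # pushbacks first, then the connection
n_after <- pushBackLength(zz)          # nothing is pending once they are read
close(zz)
</pre></div>

<p>Here <code>lines</code> holds the most recent pushback, then the earlier
one, then the two lines of the connection itself.
</p>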
|
||
<hr>
|
||
<a name="Listing-and-manipulating-connections"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Binary-connections" accesskey="n" rel="next">Binary connections</a>, Previous: <a href="#Input-from-connections" accesskey="p" rel="prev">Input from connections</a>, Up: <a href="#Connections" accesskey="u" rel="up">Connections</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Listing-and-manipulating-connections-1"></a>
|
||
<h3 class="section">7.4 Listing and manipulating connections</h3>
|
||
<a name="index-Connections-3"></a>
|
||
|
||
<a name="index-showConnections"></a>
|
||
<p>A summary of all the connections currently opened by the user can be
|
||
found by <code>showConnections()</code>, and a summary of all connections,
|
||
including closed and terminal connections, by <code>showConnections(all
|
||
= TRUE)</code>.
|
||
</p>
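<p>A minimal sketch of the two calls, assuming a text connection opened
purely for illustration:
</p>
<div class="example">
<pre class="example">tc <- textConnection("some text")
open_conns <- showConnections()            # connections opened by the user
all_conns <- showConnections(all = TRUE)   # also stdin, stdout and stderr
close(tc)
</pre></div>

<p>Both results are character matrices with one row per connection, so
they can be subsetted like any other matrix, for example by the
<code>"class"</code> column.
</p>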
|
||
<a name="index-seek"></a>
|
||
<a name="index-isSeekable"></a>
|
||
<p>The generic function <code>seek</code> can be used to read and (on some
|
||
connections) reset the current position for reading or writing.
|
||
Unfortunately it depends on OS facilities which may be unreliable
|
||
(e.g. with text files under Windows). Function <code>isSeekable</code>
|
||
reports if <code>seek</code> can change the position on the connection
|
||
given by its argument.
|
||
</p>
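<p>As a sketch, the following rewinds a temporary file opened in binary
mode (binary mode is used here because of the caveat above about text
files; the file contents are arbitrary):
</p>
<div class="example">
<pre class="example">tmp <- tempfile()
writeBin(as.raw(1:10), tmp)           # ten bytes with values 1 to 10
con <- file(tmp, "rb")
seekable <- isSeekable(con)           # can the position be changed?
first <- readBin(con, "raw", n = 3)
pos <- seek(con)                      # report the current read position
invisible(seek(con, where = 0))       # move back to the start of the file
again <- readBin(con, "raw", n = 3)   # the same three bytes as 'first'
close(con)
unlink(tmp)
</pre></div>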
|
||
<a name="index-truncate"></a>
|
||
<p>The function <code>truncate</code> can be used to truncate a file opened for
|
||
writing at its current position. It works only for <code>file</code>
|
||
connections, and is not implemented on all platforms.
|
||
</p>
|
||
|
||
<hr>
|
||
<a name="Binary-connections"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Listing-and-manipulating-connections" accesskey="p" rel="prev">Listing and manipulating connections</a>, Up: <a href="#Connections" accesskey="u" rel="up">Connections</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Binary-connections-1"></a>
|
||
<h3 class="section">7.5 Binary connections</h3>
|
||
<a name="index-Binary-files-1"></a>
|
||
|
||
<a name="index-readBin"></a>
|
||
<a name="index-writeBin"></a>
|
||
<p>Functions <code>readBin</code> and <code>writeBin</code> read from and write to
|
||
binary connections. A connection is opened in binary mode by appending
|
||
<code>"b"</code> to the mode specification, that is using mode <code>"rb"</code> for
|
||
reading, and mode <code>"wb"</code> or <code>"ab"</code> (where appropriate) for
|
||
writing. The functions have arguments
|
||
</p>
|
||
<div class="example">
|
||
<pre class="example">readBin(con, what, n = 1, size = NA, signed = TRUE, endian = .Platform$endian)
|
||
writeBin(object, con, size = NA, endian = .Platform$endian)
|
||
</pre></div>
|
||
|
||
<p>In each case <code>con</code> is a connection which will be opened if
|
||
necessary for the duration of the call, and if a character string is
|
||
given it is assumed to specify a file name.
|
||
</p>
|
||
<p>It is slightly simpler to describe writing, so we will do that first.
|
||
<code>object</code> should be an atomic vector object, that is a vector of
|
||
mode <code>numeric</code>, <code>integer</code>, <code>logical</code>, <code>character</code>,
|
||
<code>complex</code> or <code>raw</code>, without attributes. By default this is
|
||
written to the file as a stream of bytes exactly as it is represented in
|
||
memory.
|
||
</p>
|
||
<p><code>readBin</code> reads a stream of bytes from the file and interprets them
|
||
as a vector of mode given by <code>what</code>. This can be either an object
|
||
of the appropriate mode (e.g. <code>what=integer()</code>) or a character
|
||
string describing the mode (one of the six given in the previous
|
||
paragraph or <code>"double"</code> or <code>"int"</code>). Argument <code>n</code>
|
||
specifies the maximum number of vector elements to read from the
|
||
connection: if fewer are available a shorter vector will be returned.
|
||
Argument <code>signed</code> allows 1-byte and 2-byte integers to be
|
||
read as signed (the default) or unsigned integers.
|
||
</p>
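<p>A short sketch of a round trip, of asking for more elements than are
available, and of the <code>signed</code> argument.  The file is a temporary
one and the byte values are chosen only for illustration; note that a raw
vector can be given in place of a connection.
</p>
<div class="example">
<pre class="example">tmp <- tempfile()
writeBin(1:5, tmp)                            # five 4-byte integers
x <- readBin(tmp, what = integer(), n = 10)   # asks for ten, only five exist
unlink(tmp)

bytes <- as.raw(c(0xff, 0x01))
s <- readBin(bytes, "integer", n = 2, size = 1)                   # signed
u <- readBin(bytes, "integer", n = 2, size = 1, signed = FALSE)   # unsigned
</pre></div>

<p>The byte <code>0xff</code> is read as -1 when signed and as 255 when
unsigned.
</p>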
|
||
<p>The remaining two arguments are used to write or read data for
|
||
interchange with another program or another platform. By default binary
|
||
data is transferred directly from memory to the connection or <em>vice
|
||
versa</em>. This will not suffice if the data are to be transferred to a
|
||
machine with a different architecture, but between almost all R
|
||
platforms the only change needed is that of byte-order. Common PCs
|
||
(‘<samp>ix86</samp>’-based and ‘<samp>x86_64</samp>’-based machines), Compaq Alpha
|
||
and Vaxen are <em>little-endian</em>, whereas Sun Sparc, mc680x0 series,
|
||
IBM R6000, SGI and most others are <em>big-endian</em>. (Network
|
||
byte-order (as used by XDR, eXternal Data Representation) is
|
||
big-endian.) To transfer to or from other programs we may need to do
|
||
more, for example to read 16-bit integers or write single-precision real
|
||
numbers. This can be done using the <code>size</code> argument, which
|
||
(usually) allows sizes 1, 2, 4, 8 for integers and logicals, and sizes
|
||
4, 8 and perhaps 12 or 16 for reals. Transferring at different sizes
|
||
can lose precision, and should not be attempted for vectors containing
|
||
<code>NA</code>s.
|
||
</p>
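<p>The effect of <code>endian</code> and <code>size</code> can be seen by writing
to a raw vector rather than a file.  This is a sketch with arbitrarily
chosen values:
</p>
<div class="example">
<pre class="example">be <- writeBin(1L, raw(), endian = "big")       # most significant byte first
le <- writeBin(1L, raw(), endian = "little")    # least significant byte first

short <- writeBin(258L, raw(), size = 2, endian = "big")       # 16-bit integer
back <- readBin(short, "integer", size = 2, endian = "big")

single <- writeBin(0.1, raw(), size = 4)        # single-precision real
approx <- readBin(single, "double", size = 4)   # close to, but not equal to, 0.1
</pre></div>

<p>The last two lines illustrate the loss of precision mentioned above:
0.1 cannot be stored exactly in four bytes, so the value read back
differs slightly from the one written.
</p>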
|
||
<a name="index-readChar"></a>
|
||
<a name="index-writeChar"></a>
|
||
<p>Character strings are read and written in C format, that is as a string
|
||
of bytes terminated by a zero byte. Functions <code>readChar</code> and
|
||
<code>writeChar</code> provide greater flexibility.
|
||
</p>
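<p>The difference can be sketched by writing the same string both ways to
a raw vector (the string is arbitrary; <code>eos = NULL</code> asks
<code>writeChar</code> not to write a terminator):
</p>
<div class="example">
<pre class="example">z <- writeBin("abc", raw())                 # three bytes plus a zero byte
w <- writeChar("abc", raw(), eos = NULL)    # three bytes, no terminator
from_z <- readBin(z, "character")           # reads up to the zero byte
from_w <- readChar(w, nchars = 3)           # reads a stated number of characters
</pre></div>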
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Special-values" accesskey="1">Special values</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<hr>
|
||
<a name="Special-values"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Binary-connections" accesskey="p" rel="prev">Binary connections</a>, Up: <a href="#Binary-connections" accesskey="u" rel="up">Binary connections</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Special-values-1"></a>
|
||
<h4 class="subsection">7.5.1 Special values</h4>
|
||
|
||
<p>Functions <code>readBin</code> and <code>writeBin</code> will pass missing and
|
||
special values, although this should not be attempted if a size change
|
||
is involved.
|
||
</p>
|
||
<p>The missing value for R logical and integer types is <code>INT_MIN</code>,
|
||
the smallest representable <code>int</code> defined in the C header
|
||
<samp>limits.h</samp>, normally corresponding to the bit pattern
|
||
<code>0x80000000</code>.
|
||
</p>
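<p>On a platform where this is the case, the bit pattern can be inspected
by writing a missing value to a raw vector in big-endian order:
</p>
<div class="example">
<pre class="example">int_na <- writeBin(NA_integer_, raw(), endian = "big")   # bytes 80 00 00 00
lgl_na <- writeBin(NA, raw(), endian = "big")            # logical NA, same bytes
</pre></div>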
|
||
<p>The representation of the special values for R numeric and complex
|
||
types is machine-dependent, and possibly also compiler-dependent. The
|
||
simplest way to make use of them is to link an external application
|
||
against the standalone <code>Rmath</code> library which exports double
|
||
constants <code>NA_REAL</code>, <code>R_PosInf</code> and <code>R_NegInf</code>, and
|
||
include the header <samp>Rmath.h</samp> which defines the macros <code>ISNAN</code>
|
||
and <code>R_FINITE</code>.
|
||
</p>
|
||
<p>If that is not possible, on all current platforms IEC 60559 (aka IEEE
|
||
754) arithmetic is used, so standard C facilities can be used to test
|
||
for or set <code>Inf</code>, <code>-Inf</code> and <code>NaN</code> values. On such
|
||
platforms <code>NA</code> is represented by the <code>NaN</code> value with low-word
|
||
<code>0x7a2</code> (1954 in decimal).
|
||
</p>
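<p>As a sketch, assuming such a platform, the low word of <code>NA</code> can
be extracted from its big-endian byte representation (the variable names
are illustrative):
</p>
<div class="example">
<pre class="example">na_bytes <- writeBin(NA_real_, raw(), endian = "big")   # eight bytes
low_word <- na_bytes[5:8]                               # the last four bytes
low_value <- readBin(low_word, "integer", endian = "big")

inf_bytes <- writeBin(Inf, raw(), endian = "big")       # 7f f0 then six zero bytes
round_trip <- readBin(na_bytes, "double", endian = "big")
</pre></div>

<p>Here <code>low_value</code> is 1954, and <code>round_trip</code> is again
recognized by <code>is.na</code>.
</p>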
|
||
<p>Character missing values are written as <code>NA</code>, and there is no
|
||
provision to recognize character values as missing (as this can be done
|
||
by re-assigning them once read).
|
||
</p>
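<p>A small sketch of the consequence, and of the re-assignment mentioned
above:
</p>
<div class="example">
<pre class="example">bytes <- writeBin(NA_character_, raw())   # the letters N and A, then a zero byte
back <- readBin(bytes, "character")       # the ordinary string "NA"
was_missing <- is.na(back)                # FALSE: it is no longer missing
back[back == "NA"] <- NA                  # re-assign once read
</pre></div>

<p>The re-assignment is only safe if the string <code>"NA"</code> cannot
occur as a genuine value in the data.
</p>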
|
||
|
||
<hr>
|
||
<a name="Network-interfaces"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Reading-Excel-spreadsheets" accesskey="n" rel="next">Reading Excel spreadsheets</a>, Previous: <a href="#Connections" accesskey="p" rel="prev">Connections</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Network-interfaces-1"></a>
|
||
<h2 class="chapter">8 Network interfaces</h2>
|
||
|
||
<table summary="" class="menu" border="0" cellspacing="0">
|
||
<tr><td align="left" valign="top">• <a href="#Reading-from-sockets" accesskey="1">Reading from sockets</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
<tr><td align="left" valign="top">• <a href="#Using-download_002efile" accesskey="2">Using download.file</a>:</td><td> </td><td align="left" valign="top">
|
||
</td></tr>
|
||
</table>
|
||
|
||
<p>Some limited facilities are available to exchange data at a lower level
|
||
across network connections.
|
||
</p>
|
||
<hr>
|
||
<a name="Reading-from-sockets"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Using-download_002efile" accesskey="n" rel="next">Using download.file</a>, Previous: <a href="#Network-interfaces" accesskey="p" rel="prev">Network interfaces</a>, Up: <a href="#Network-interfaces" accesskey="u" rel="up">Network interfaces</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Reading-from-sockets-1"></a>
|
||
<h3 class="section">8.1 Reading from sockets</h3>
|
||
|
||
<a name="index-Sockets-1"></a>
|
||
<p>Base R comes with some facilities to communicate <em>via</em>
|
||
<acronym>BSD</acronym> sockets on systems that support them (including the common
|
||
Linux, Unix and Windows ports of R). One potential problem with
|
||
using sockets is that these facilities are often blocked for security
|
||
reasons or to force the use of Web caches, so these functions may be
|
||
more useful on an intranet than externally. For new projects it
|
||
is suggested that socket connections are used instead.
|
||
</p>
|
||
<a name="index-make_002esocket"></a>
|
||
<a name="index-read_002esocket"></a>
|
||
<a name="index-write_002esocket"></a>
|
||
<a name="index-close_002esocket"></a>
|
||
<p>The earlier low-level interface is given by functions <code>make.socket</code>,
|
||
<code>read.socket</code>, <code>write.socket</code> and <code>close.socket</code>.
|
||
</p>
|
||
|
||
<hr>
|
||
<a name="Using-download_002efile"></a>
|
||
<div class="header">
|
||
<p>
|
||
Previous: <a href="#Reading-from-sockets" accesskey="p" rel="prev">Reading from sockets</a>, Up: <a href="#Network-interfaces" accesskey="u" rel="up">Network interfaces</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Using-download_002efile-1"></a>
|
||
<h3 class="section">8.2 Using <code>download.file</code></h3>
|
||
|
||
<p>Function <code>download.file</code> is provided to read a file from a
|
||
Web resource via FTP or HTTP and write it to a file. Often this can be
|
||
avoided, as functions such as <code>read.table</code> and <code>scan</code> can read
|
||
directly from a URL, either by explicitly using <code>url</code> to open a
|
||
connection, or implicitly using it by giving a URL as the <code>file</code>
|
||
argument.
|
||
</p>
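<p>Both routes can be sketched without network access by using a
‘<samp>file://</samp>’ URL that points at a temporary file.  The data and
the way the URL is built from the path are assumptions made for this
example, and paths containing unusual characters may need extra care.
</p>
<div class="example">
<pre class="example">src <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), src, row.names = FALSE)

## build a file:// URL from the path (works for Unix and Windows paths)
src_url <- paste0("file:///", sub("^/", "", normalizePath(src, winslash = "/")))

dest <- tempfile(fileext = ".csv")
status <- download.file(src_url, dest, quiet = TRUE)   # 0 indicates success
via_copy <- read.csv(dest)       # read the downloaded copy
via_url <- read.csv(src_url)     # or read directly from the URL
unlink(c(src, dest))
</pre></div>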
|
||
|
||
<hr>
|
||
<a name="Reading-Excel-spreadsheets"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#References" accesskey="n" rel="next">References</a>, Previous: <a href="#Network-interfaces" accesskey="p" rel="prev">Network interfaces</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Reading-Excel-spreadsheets-1"></a>
|
||
<h2 class="chapter">9 Reading Excel spreadsheets</h2>
|
||
|
||
<p>The most common R data import/export question seems to be ‘how do I read
|
||
an Excel spreadsheet’. This chapter collects together advice and
|
||
options given earlier. Note that most of the advice is for pre-Excel
|
||
2007 spreadsheets and not the later <samp>.xlsx</samp> format.
|
||
</p>
|
||
<a name="index-read_002ecsv-1"></a>
|
||
<a name="index-read_002edelim-1"></a>
|
||
<a name="index-read_002eDIF-1"></a>
|
||
<a name="index-read_002etable-2"></a>
|
||
<a name="index-readClipboard"></a>
|
||
<p>The first piece of advice is to avoid doing so if possible! If you have
|
||
access to Excel, export the data you want from Excel in tab-delimited or
|
||
comma-separated form, and use <code>read.delim</code> or <code>read.csv</code> to
|
||
import it into R. (You may need to use <code>read.delim2</code> or
|
||
<code>read.csv2</code> in a locale that uses comma as the decimal point.)
|
||
Exporting a DIF file and reading it using <code>read.DIF</code> is another
|
||
possibility.
|
||
</p>
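<p>The choice between these functions can be sketched with inline text
standing in for the exported files (the column names and values are made
up):
</p>
<div class="example">
<pre class="example">## comma-separated, with '.' as the decimal point
english <- read.csv(text = "height,weight\n1.5,60.2\n1.8,75.0")

## semicolon-separated, with ',' as the decimal point
continental <- read.csv2(text = "height;weight\n1,5;60,2\n1,8;75,0")

## tab-delimited, with '.' as the decimal point
tabbed <- read.delim(text = "height\tweight\n1.5\t60.2\n1.8\t75.0")
</pre></div>

<p>All three calls give the same data frame with two numeric columns.
</p>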
|
||
<p>If you do not have Excel, many other programs are able to read such
|
||
spreadsheets and export in a text format on both Windows and Unix, for
|
||
example Gnumeric (<a href="http://www.gnome.org/projects/gnumeric/">http://www.gnome.org/projects/gnumeric/</a>) and
|
||
OpenOffice (<a href="https://www.openoffice.org">https://www.openoffice.org</a>). You can also
|
||
cut-and-paste between the display of a spreadsheet in such a program and
|
||
R: <code>read.table</code> will read from the R console or, under Windows,
|
||
from the clipboard (via <code>file = "clipboard"</code> or
|
||
<code>readClipboard</code>). The <code>read.DIF</code> function can also read from
|
||
the clipboard.
|
||
</p>
|
||
<p>Note that an Excel <samp>.xls</samp> file is not just a spreadsheet: such
|
||
files can contain many sheets, and the sheets can contain formulae,
|
||
macros and so on. Not all readers can read other than the first sheet,
|
||
and may be confused by other contents of the file.
|
||
</p>
|
||
<a name="index-odbcConnectExcel-1"></a>
|
||
<a name="index-odbcConnectExcel2007"></a>
|
||
<p>Windows users (of 32-bit R) can use <code>odbcConnectExcel</code> in
|
||
package <a href="https://CRAN.R-project.org/package=RODBC"><strong>RODBC</strong></a>. This can select rows and columns from any of the
|
||
sheets in an Excel spreadsheet file (at least from Excel 97–2003,
|
||
depending on your ODBC drivers: by calling <code>odbcConnect</code> directly
|
||
versions back to Excel 3.0 can be read). The version
|
||
<code>odbcConnectExcel2007</code> will read the Excel 2007 formats as well as
|
||
earlier ones (provided the drivers are installed, including with 64-bit
|
||
Windows R: see <a href="#RODBC">RODBC</a>). OS X users can also use <a href="https://CRAN.R-project.org/package=RODBC"><strong>RODBC</strong></a> if
|
||
they have a suitable driver (e.g. that from Actual Technologies).
|
||
</p>
|
||
<a name="index-read_002exls"></a>
|
||
<p><code>Perl</code> users have contributed a module
|
||
<code>OLE::SpreadSheet::ParseExcel</code> and a program <code>xls2csv.pl</code> to
|
||
convert Excel 95–2003 spreadsheets to CSV files. Package <a href="https://CRAN.R-project.org/package=gdata"><strong>gdata</strong></a>
|
||
provides a basic wrapper in its <code>read.xls</code> function. With suitable
|
||
<code>Perl</code> modules installed this function can also read Excel 2007
|
||
spreadsheets.
|
||
</p>
|
||
<a name="index-xlsReadWrite"></a>
|
||
<p>32-bit Windows package <a href="https://CRAN.R-project.org/package=xlsReadWrite"><strong>xlsReadWrite</strong></a> from
|
||
<a href="http://www.swissr.org/">http://www.swissr.org/</a> and CRAN has a function <code>read.xls</code> to
|
||
read <samp>.xls</samp> files (based on a third-party non-Open-Source Delphi
|
||
component).
|
||
</p>
|
||
<a name="index-dataframes2xls"></a>
|
||
<a name="index-WriteXLS"></a>
|
||
<p>Packages <a href="https://CRAN.R-project.org/package=dataframes2xls"><strong>dataframes2xls</strong></a> and <a href="https://CRAN.R-project.org/package=WriteXLS"><strong>WriteXLS</strong></a> each contain a function
|
||
to <em>write</em> one or more data frames to an <samp>.xls</samp> file, using
|
||
Python and Perl respectively.  Another version of <code>write.xls</code> is
|
||
available in package <a href="https://CRAN.R-project.org/package=xlsReadWrite"><strong>xlsReadWrite</strong></a>.
|
||
</p>
|
||
<a name="index-xlsx"></a>
|
||
<a name="index-RExcelXML"></a>
|
||
<p>Two packages which can read and manipulate Excel 2007/10
|
||
spreadsheets but not earlier formats are <a href="https://CRAN.R-project.org/package=xlsx"><strong>xlsx</strong></a> (which requires
|
||
Java) and the Omegahat package <strong>RExcelXML</strong>.
|
||
</p>
|
||
<a name="index-XLConnect"></a>
|
||
<p>Package <a href="https://CRAN.R-project.org/package=XLConnect"><strong>XLConnect</strong></a> can read, write and manipulate both Excel
|
||
97–2003 and Excel 2007/10 spreadsheets, requiring Java.
|
||
</p>
|
||
|
||
<hr>
|
||
<a name="References"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Function-and-variable-index" accesskey="n" rel="next">Function and variable index</a>, Previous: <a href="#Reading-Excel-spreadsheets" accesskey="p" rel="prev">Reading Excel spreadsheets</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="References-1"></a>
|
||
<h2 class="appendix">Appendix A References</h2>
|
||
|
||
<p>R. A. Becker, J. M. Chambers and A. R. Wilks (1988)
|
||
<em>The New S Language. A Programming Environment for Data Analysis
|
||
and Graphics.</em> Wadsworth & Brooks/Cole.
|
||
</p>
|
||
<p>J. Bowman, S. Emberson and M. Darnovsky (1996) <em>The
|
||
Practical <acronym>SQL</acronym> Handbook. Using Structured Query Language.</em>
|
||
Addison-Wesley.
|
||
</p>
|
||
<p>J. M. Chambers (1998) <em>Programming with Data. A Guide to the S
|
||
Language.</em> Springer-Verlag.
|
||
</p>
|
||
<p>P. Dubois (2000) <em>MySQL.</em> New Riders.
|
||
</p>
|
||
<p>M. Henning and S. Vinoski (1999) <em>Advanced CORBA Programming
|
||
with C++.</em> Addison-Wesley.
|
||
</p>
|
||
<p>K. Kline and D. Kline (2001) <em>SQL in a Nutshell.</em> O’Reilly.
|
||
</p>
|
||
<p>B. Momjian (2000) <em>PostgreSQL: Introduction and Concepts.</em>
|
||
Addison-Wesley.
|
||
Also available at <a href="http://momjian.us/main/writings/pgsql/aw_pgsql_book/">http://momjian.us/main/writings/pgsql/aw_pgsql_book/</a>.
|
||
</p>
|
||
<p>B. D. Ripley (2001) Connections. <em>R News</em>, <strong>1/1</strong>, 16–7.
|
||
<a href="https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf">https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf</a>
|
||
</p>
|
||
|
||
<p>T. M. Therneau and P. M. Grambsch (2000) <em>Modeling Survival
|
||
Data. Extending the Cox Model.</em> Springer-Verlag.
|
||
</p>
|
||
<p>E. J. Yarger, G. Reese and T. King (1999) <em>MySQL & mSQL</em>.
|
||
O’Reilly.
|
||
</p>
|
||
<hr>
|
||
<a name="Function-and-variable-index"></a>
|
||
<div class="header">
|
||
<p>
|
||
Next: <a href="#Concept-index" accesskey="n" rel="next">Concept index</a>, Previous: <a href="#References" accesskey="p" rel="prev">References</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
|
||
</div>
|
||
<a name="Function-and-variable-index-1"></a>
|
||
<h2 class="unnumbered">Function and variable index</h2>
|
||
|
||
<table summary=""><tr><th valign="top">Jump to: </th><td><a class="summary-letter" href="#Function-and-variable-index_vr_symbol-1"><b>.</b></a>
|
||
|
||
<br>
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-B"><b>B</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-C"><b>C</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-D"><b>D</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-F"><b>F</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-G"><b>G</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-H"><b>H</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-I"><b>I</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-M"><b>M</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-N"><b>N</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-O"><b>O</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-P"><b>P</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-R"><b>R</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-S"><b>S</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-T"><b>T</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-U"><b>U</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-W"><b>W</b></a>
|
||
|
||
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-X"><b>X</b></a>
|
||
|
||
</td></tr></table>
|
||
<table summary="" class="index-vr" border="0">
|
||
<tr><td></td><th align="left">Index Entry</th><td> </td><th align="left"> Section</th></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_symbol-1">.</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-_002edbf"><code>.dbf</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-_002exls"><code>.xls</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-_002exls-1"><code>.xls</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-B">B</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-bzfile"><code>bzfile</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-C">C</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-cat"><code>cat</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-cat-1"><code>cat</code></a>:</td><td> </td><td valign="top"><a href="#Output-to-connections">Output to connections</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-close"><code>close</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-close-1"><code>close</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-close_002esocket"><code>close.socket</code></a>:</td><td> </td><td valign="top"><a href="#Reading-from-sockets">Reading from sockets</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-count_002efields"><code>count.fields</code></a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-D">D</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-data_002erestore"><code>data.restore</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dataframes2xls"><code>dataframes2xls</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbClearResult"><code>dbClearResult</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbConnect"><code>dbConnect</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbDisconnect"><code>dbDisconnect</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbDriver"><code>dbDriver</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbExistsTable"><code>dbExistsTable</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbGetQuery"><code>dbGetQuery</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbReadTable"><code>dbReadTable</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbRemoveTable"><code>dbRemoveTable</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbSendQuery"><code>dbSendQuery</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-dbWriteTable"><code>dbWriteTable</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-F">F</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-fetch"><code>fetch</code></a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-file"><code>file</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-format"><code>format</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-ftable"><code>ftable</code></a>:</td><td> </td><td valign="top"><a href="#Flat-contingency-tables">Flat contingency tables</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-G">G</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-gzfile"><code>gzfile</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-H">H</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-hdf5"><code>hdf5</code></a>:</td><td> </td><td valign="top"><a href="#Binary-data-formats">Binary data formats</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-I">I</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-isSeekable"><code>isSeekable</code></a>:</td><td> </td><td valign="top"><a href="#Listing-and-manipulating-connections">Listing and manipulating connections</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-M">M</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-make_002esocket"><code>make.socket</code></a>:</td><td> </td><td valign="top"><a href="#Reading-from-sockets">Reading from sockets</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-N">N</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-netCDF"><code>netCDF</code></a>:</td><td> </td><td valign="top"><a href="#Binary-data-formats">Binary data formats</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-O">O</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcClose"><code>odbcClose</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcConnect"><code>odbcConnect</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcConnectDbase"><code>odbcConnectDbase</code></a>:</td><td> </td><td valign="top"><a href="#dBase-files-_0028DBF_0029">dBase files (DBF)</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcConnectExcel"><code>odbcConnectExcel</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcConnectExcel-1"><code>odbcConnectExcel</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcConnectExcel2007"><code>odbcConnectExcel2007</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcDriverConnect"><code>odbcDriverConnect</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcGetInfo"><code>odbcGetInfo</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-odbcQuery"><code>odbcQuery</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-open"><code>open</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-P">P</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-pipe"><code>pipe</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-pushBack_002e"><code>pushBack.</code></a>:</td><td> </td><td valign="top"><a href="#Pushback">Pushback</a></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-pushBackLength"><code>pushBackLength</code></a>:</td><td> </td><td valign="top"><a href="#Pushback">Pushback</a></td></tr>
|
||
<tr><td colspan="4"> <hr></td></tr>
|
||
<tr><th><a name="Function-and-variable-index_vr_letter-R">R</a></th><td></td><td></td></tr>
|
||
<tr><td></td><td valign="top"><a href="#index-read_002ecsv"><code>read.csv</code></a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002ecsv-1"><code>read.csv</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002ecsv2"><code>read.csv2</code></a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002edbf"><code>read.dbf</code></a>:</td><td> </td><td valign="top"><a href="#dBase-files-_0028DBF_0029">dBase files (DBF)</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002edelim"><code>read.delim</code></a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002edelim-1"><code>read.delim</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002edelim2"><code>read.delim2</code></a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002eDIF"><code>read.DIF</code></a>:</td><td> </td><td valign="top"><a href="#Data-Interchange-Format-_0028DIF_0029">Data Interchange Format (DIF)</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002eDIF-1"><code>read.DIF</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002edta"><code>read.dta</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002eepiinfo"><code>read.epiinfo</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002efortran"><code>read.fortran</code></a>:</td><td> </td><td valign="top"><a href="#Fixed_002dwidth_002dformat-files">Fixed-width-format files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002eftable"><code>read.ftable</code></a>:</td><td> </td><td valign="top"><a href="#Flat-contingency-tables">Flat contingency tables</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002efwf"><code>read.fwf</code></a>:</td><td> </td><td valign="top"><a href="#Fixed_002dwidth_002dformat-files">Fixed-width-format files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002emtp"><code>read.mtp</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002eoctave"><code>read.octave</code></a>:</td><td> </td><td valign="top"><a href="#Octave">Octave</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002eS"><code>read.S</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002esocket"><code>read.socket</code></a>:</td><td> </td><td valign="top"><a href="#Reading-from-sockets">Reading from sockets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002espss"><code>read.spss</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002esystat"><code>read.systat</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002etable"><code>read.table</code></a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002etable-1"><code>read.table</code></a>:</td><td> </td><td valign="top"><a href="#Input-from-connections">Input from connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002etable-2"><code>read.table</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002exls"><code>read.xls</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-read_002export"><code>read.xport</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-readBin"><code>readBin</code></a>:</td><td> </td><td valign="top"><a href="#Binary-connections">Binary connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-readChar"><code>readChar</code></a>:</td><td> </td><td valign="top"><a href="#Binary-connections">Binary connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-readClipboard"><code>readClipboard</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-readLines"><code>readLines</code></a>:</td><td> </td><td valign="top"><a href="#Using-scan-directly">Using scan directly</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-readLines-1"><code>readLines</code></a>:</td><td> </td><td valign="top"><a href="#Input-from-connections">Input from connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-reshape"><code>reshape</code></a>:</td><td> </td><td valign="top"><a href="#Re_002dshaping-data">Re-shaping data</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-RExcelXML"><code>RExcelXML</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Function-and-variable-index_vr_letter-S">S</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-scan"><code>scan</code></a>:</td><td> </td><td valign="top"><a href="#Imports">Imports</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-scan-1"><code>scan</code></a>:</td><td> </td><td valign="top"><a href="#Using-scan-directly">Using scan directly</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-scan-2"><code>scan</code></a>:</td><td> </td><td valign="top"><a href="#Input-from-connections">Input from connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-seek"><code>seek</code></a>:</td><td> </td><td valign="top"><a href="#Listing-and-manipulating-connections">Listing and manipulating connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-showConnections"><code>showConnections</code></a>:</td><td> </td><td valign="top"><a href="#Listing-and-manipulating-connections">Listing and manipulating connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sink"><code>sink</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sink-1"><code>sink</code></a>:</td><td> </td><td valign="top"><a href="#Output-to-connections">Output to connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-socketConnection"><code>socketConnection</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sqlCopy"><code>sqlCopy</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sqlFetch"><code>sqlFetch</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sqlFetchMore"><code>sqlFetchMore</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sqlGetResults"><code>sqlGetResults</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sqlQuery"><code>sqlQuery</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sqlSave"><code>sqlSave</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-sqlTables"><code>sqlTables</code></a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-stack"><code>stack</code></a>:</td><td> </td><td valign="top"><a href="#Re_002dshaping-data">Re-shaping data</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-stderr"><code>stderr</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-stdin"><code>stdin</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-stdout"><code>stdout</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Sys_002elocaleconv"><code>Sys.localeconv</code></a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Function-and-variable-index_vr_letter-T">T</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-textConnection"><code>textConnection</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-truncate"><code>truncate</code></a>:</td><td> </td><td valign="top"><a href="#Listing-and-manipulating-connections">Listing and manipulating connections</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Function-and-variable-index_vr_letter-U">U</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-unstack_002e"><code>unstack</code></a>:</td><td> </td><td valign="top"><a href="#Re_002dshaping-data">Re-shaping data</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-url"><code>url</code></a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Function-and-variable-index_vr_letter-W">W</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-write"><code>write</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write-1"><code>write</code></a>:</td><td> </td><td valign="top"><a href="#Output-to-connections">Output to connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002ecsv"><code>write.csv</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002ecsv2"><code>write.csv2</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002edbf"><code>write.dbf</code></a>:</td><td> </td><td valign="top"><a href="#dBase-files-_0028DBF_0029">dBase files (DBF)</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002edta"><code>write.dta</code></a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002eforeign"><code>write.foreign</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002ematrix"><code>write.matrix</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002esocket"><code>write.socket</code></a>:</td><td> </td><td valign="top"><a href="#Reading-from-sockets">Reading from sockets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002etable"><code>write.table</code></a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-write_002etable-1"><code>write.table</code></a>:</td><td> </td><td valign="top"><a href="#Output-to-connections">Output to connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-writeBin"><code>writeBin</code></a>:</td><td> </td><td valign="top"><a href="#Binary-connections">Binary connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-writeChar"><code>writeChar</code></a>:</td><td> </td><td valign="top"><a href="#Binary-connections">Binary connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-writeLines"><code>writeLines</code></a>:</td><td> </td><td valign="top"><a href="#Output-to-connections">Output to connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-WriteXLS"><code>WriteXLS</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Function-and-variable-index_vr_letter-X">X</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-XLConnect"><code>XLConnect</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-xlsReadWrite"><code>xlsReadWrite</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-xlsx"><code>xlsx</code></a>:</td><td> </td><td valign="top"><a href="#Reading-Excel-spreadsheets">Reading Excel spreadsheets</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
</table>
<table summary=""><tr><th valign="top">Jump to: </th><td><a class="summary-letter" href="#Function-and-variable-index_vr_symbol-1"><b>.</b></a>
<br>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-B"><b>B</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-C"><b>C</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-D"><b>D</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-F"><b>F</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-G"><b>G</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-H"><b>H</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-I"><b>I</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-M"><b>M</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-N"><b>N</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-O"><b>O</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-P"><b>P</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-R"><b>R</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-S"><b>S</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-T"><b>T</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-U"><b>U</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-W"><b>W</b></a>
<a class="summary-letter" href="#Function-and-variable-index_vr_letter-X"><b>X</b></a>
</td></tr></table>
<hr>
<a name="Concept-index"></a>
<div class="header">
<p>
Previous: <a href="#Function-and-variable-index" accesskey="p" rel="prev">Function and variable index</a>, Up: <a href="#Top" accesskey="u" rel="up">Top</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Function-and-variable-index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Concept-index-1"></a>
<h2 class="unnumbered">Concept index</h2>
<table summary=""><tr><th valign="top">Jump to: </th><td><a class="summary-letter" href="#Concept-index_cp_letter-A"><b>A</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-B"><b>B</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-C"><b>C</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-D"><b>D</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-E"><b>E</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-F"><b>F</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-H"><b>H</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-I"><b>I</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-L"><b>L</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-M"><b>M</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-N"><b>N</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-O"><b>O</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-P"><b>P</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-Q"><b>Q</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-R"><b>R</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-S"><b>S</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-T"><b>T</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-U"><b>U</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-X"><b>X</b></a>
</td></tr></table>
<table summary="" class="index-cp" border="0">
<tr><td></td><th align="left">Index Entry</th><td> </td><th align="left"> Section</th></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-A">A</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-awk">awk</a>:</td><td> </td><td valign="top"><a href="#Introduction">Introduction</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-B">B</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Binary-files">Binary files</a>:</td><td> </td><td valign="top"><a href="#Binary-files">Binary files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Binary-files-1">Binary files</a>:</td><td> </td><td valign="top"><a href="#Binary-connections">Binary connections</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-C">C</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-comma-separated-values">comma separated values</a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Compressed-files">Compressed files</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Connections">Connections</a>:</td><td> </td><td valign="top"><a href="#Connections">Connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Connections-1">Connections</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Connections-2">Connections</a>:</td><td> </td><td valign="top"><a href="#Output-to-connections">Output to connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Connections-3">Connections</a>:</td><td> </td><td valign="top"><a href="#Listing-and-manipulating-connections">Listing and manipulating connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-CSV-files">CSV files</a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-CSV-files-1">CSV files</a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-D">D</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Data-Interchange-Format-_0028DIF_0029">Data Interchange Format (DIF)</a>:</td><td> </td><td valign="top"><a href="#Data-Interchange-Format-_0028DIF_0029">Data Interchange Format (DIF)</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Dbase">dBase</a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-dBase">dBase</a>:</td><td> </td><td valign="top"><a href="#dBase-files-_0028DBF_0029">dBase files (DBF)</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-DBF-files">DBF files</a>:</td><td> </td><td valign="top"><a href="#dBase-files-_0028DBF_0029">dBase files (DBF)</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-DBMS">DBMS</a>:</td><td> </td><td valign="top"><a href="#Relational-databases">Relational databases</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-E">E</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Encodings">Encodings</a>:</td><td> </td><td valign="top"><a href="#Encodings">Encodings</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Encodings-1">Encodings</a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-EpiData">EpiData</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-EpiInfo">EpiInfo</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Excel">Excel</a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Excel-1">Excel</a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Exporting-to-a-text-file">Exporting to a text file</a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-F">F</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-File-connections">File connections</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Fixed_002dwidth_002dformat-files">Fixed-width-format files</a>:</td><td> </td><td valign="top"><a href="#Fixed_002dwidth_002dformat-files">Fixed-width-format files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Flat-contingency-tables">Flat contingency tables</a>:</td><td> </td><td valign="top"><a href="#Flat-contingency-tables">Flat contingency tables</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-H">H</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Hierarchical-Data-Format">Hierarchical Data Format</a>:</td><td> </td><td valign="top"><a href="#Binary-data-formats">Binary data formats</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-I">I</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Importing-from-other-statistical-systems">Importing from other statistical systems</a>:</td><td> </td><td valign="top"><a href="#Importing-from-other-statistical-systems">Importing from other statistical systems</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-L">L</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-locales">locales</a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-M">M</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Minitab">Minitab</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Missing-values">Missing values</a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Missing-values-1">Missing values</a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-MySQL-database-system">MySQL database system</a>:</td><td> </td><td valign="top"><a href="#DBI">DBI</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-MySQL-database-system-1">MySQL database system</a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-N">N</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-network-Common-Data-Form">network Common Data Form</a>:</td><td> </td><td valign="top"><a href="#Binary-data-formats">Binary data formats</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-O">O</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Octave">Octave</a>:</td><td> </td><td valign="top"><a href="#Octave">Octave</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-ODBC">ODBC</a>:</td><td> </td><td valign="top"><a href="#Overview-of-RDBMSs">Overview of RDBMSs</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-ODBC-1">ODBC</a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Open-Database-Connectivity">Open Database Connectivity</a>:</td><td> </td><td valign="top"><a href="#Overview-of-RDBMSs">Overview of RDBMSs</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Open-Database-Connectivity-1">Open Database Connectivity</a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-P">P</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-perl">perl</a>:</td><td> </td><td valign="top"><a href="#Introduction">Introduction</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-perl-1">perl</a>:</td><td> </td><td valign="top"><a href="#Fixed_002dwidth_002dformat-files">Fixed-width-format files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Pipe-connections">Pipe connections</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-PostgreSQL-database-system">PostgreSQL database system</a>:</td><td> </td><td valign="top"><a href="#RODBC">RODBC</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Pushback-on-a-connection">Pushback on a connection</a>:</td><td> </td><td valign="top"><a href="#Pushback">Pushback</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-Q">Q</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Quoting-strings">Quoting strings</a>:</td><td> </td><td valign="top"><a href="#Export-to-text-files">Export to text files</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Quoting-strings-1">Quoting strings</a>:</td><td> </td><td valign="top"><a href="#Variations-on-read_002etable">Variations on read.table</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-R">R</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Re_002dshaping-data">Re-shaping data</a>:</td><td> </td><td valign="top"><a href="#Re_002dshaping-data">Re-shaping data</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Relational-databases">Relational databases</a>:</td><td> </td><td valign="top"><a href="#Relational-databases">Relational databases</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-S">S</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-S_002dPLUS">S-PLUS</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-SAS">SAS</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Sockets">Sockets</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Sockets-1">Sockets</a>:</td><td> </td><td valign="top"><a href="#Reading-from-sockets">Reading from sockets</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Spreadsheet_002dlike-data">Spreadsheet-like data</a>:</td><td> </td><td valign="top"><a href="#Spreadsheet_002dlike-data">Spreadsheet-like data</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-SPSS">SPSS</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-SPSS-Data-Entry">SPSS Data Entry</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-SQL-queries">SQL queries</a>:</td><td> </td><td valign="top"><a href="#SQL-queries">SQL queries</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Stata">Stata</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Systat">Systat</a>:</td><td> </td><td valign="top"><a href="#EpiInfo-Minitab-SAS-S_002dPLUS-SPSS-Stata-Systat">EpiInfo Minitab SAS S-PLUS SPSS Stata Systat</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-T">T</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Terminal-connections">Terminal connections</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-Text-connections">Text connections</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-U">U</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-Unix-tools">Unix tools</a>:</td><td> </td><td valign="top"><a href="#Introduction">Introduction</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-URL-connections">URL connections</a>:</td><td> </td><td valign="top"><a href="#Types-of-connections">Types of connections</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-URL-connections-1">URL connections</a>:</td><td> </td><td valign="top"><a href="#Input-from-connections">Input from connections</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th><a name="Concept-index_cp_letter-X">X</a></th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-XML">XML</a>:</td><td> </td><td valign="top"><a href="#XML">XML</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
</table>
<table summary=""><tr><th valign="top">Jump to: </th><td><a class="summary-letter" href="#Concept-index_cp_letter-A"><b>A</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-B"><b>B</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-C"><b>C</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-D"><b>D</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-E"><b>E</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-F"><b>F</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-H"><b>H</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-I"><b>I</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-L"><b>L</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-M"><b>M</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-N"><b>N</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-O"><b>O</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-P"><b>P</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-Q"><b>Q</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-R"><b>R</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-S"><b>S</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-T"><b>T</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-U"><b>U</b></a>
<a class="summary-letter" href="#Concept-index_cp_letter-X"><b>X</b></a>
</td></tr></table>
<div class="footnote">
<hr>
<h4 class="footnotes-heading">Footnotes</h4>
<h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
<p>The distinction is subtle (see
<a href="https://en.wikipedia.org/wiki/UTF-16/UCS-2">https://en.wikipedia.org/wiki/UTF-16/UCS-2</a>), and the use of
surrogate pairs is very rare.</p>
<h3><a name="FOOT2" href="#DOCF2">(2)</a></h3>
<p>Even then,
Windows applications may expect a Byte Order Mark, which the
implementation of <code>iconv</code> used by R may or may not add, depending
on the platform.</p>
<h3><a name="FOOT3" href="#DOCF3">(3)</a></h3>
<p>This is normally
fast, as looking at the first entry rules out most of the possibilities.</p>
<h3><a name="FOOT4" href="#DOCF4">(4)</a></h3>
<p>and forks, notably MariaDB.</p>
</div>
<hr>
</body>
</html>