Spline regression¶
Patsy offers a set of specific stateful transforms (for more details about stateful transforms see Stateful transforms) that you can use in formulas to generate splines bases and express non-linear fits.
General B-splines¶
B-spline bases can be generated with the bs()
stateful
transform. The spline bases returned by bs()
are designed to be
compatible with those produced by the R bs
function.
The following code illustrates a typical basis and the resulting spline:
In [1]: import matplotlib.pyplot as plt
In [2]: plt.title("B-spline basis example (degree=3)");
In [3]: x = np.linspace(0., 1., 100)
In [4]: y = dmatrix("bs(x, df=6, degree=3, include_intercept=True) - 1", {"x": x})
# Define some coefficients
In [5]: b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
# Plot B-spline basis functions (colored curves) each multiplied by its coeff
In [6]: plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
In [7]: plt.plot(x, np.dot(y, b), color='k', linewidth=3);

In the following example we first set up our B-spline basis using some data and then make predictions on a new set of data:
In [8]: data = {"x": np.linspace(0., 1., 100)}
In [9]: design_matrix = dmatrix("bs(x, df=4)", data)
In [10]: new_data = {"x": [0.1, 0.25, 0.9]}
In [11]: build_design_matrices([design_matrix.design_info], new_data)[0]
Out[11]:
DesignMatrix with shape (3, 5)
Intercept bs(x, df=4)[0] bs(x, df=4)[1] bs(x, df=4)[2] bs(x, df=4)[3]
1 0.43400 0.052 0.00200 0.000
1 0.59375 0.250 0.03125 0.000
1 0.00200 0.052 0.43400 0.512
Terms:
'Intercept' (column 0)
'bs(x, df=4)' (columns 1:5)
bs()
can produce B-spline bases of arbitrary degrees – e.g.,
degree=0
will give produce piecewise-constant functions,
degree=1
will produce piecewise-linear functions, and the default
degree=3
produces cubic splines. The next section describes more
specialized functions for producing different types of cubic splines.
Natural and cyclic cubic regression splines¶
Natural and cyclic cubic regression splines are provided through the stateful
transforms cr()
and cc()
respectively. Here the spline is
parameterized directly using its values at the knots. These splines were designed
to be compatible with those found in the R package
mgcv
(these are called cr, cs and cc in the context of mgcv), but
can be used with any model.
Warning
Note that the compatibility with mgcv applies only to the generation of spline bases: we do not implement any kind of mgcv-compatible penalized fitting process. Thus these spline bases can be used to precisely reproduce predictions from a model previously fitted with mgcv, or to serve as building blocks for other regression models (like OLS).
Here are some illustrations of typical natural and cyclic spline bases:
In [12]: plt.title("Natural cubic regression spline basis example");
In [13]: y = dmatrix("cr(x, df=6) - 1", {"x": x})
# Plot natural cubic regression spline basis functions (colored curves) each multiplied by its coeff
In [14]: plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
In [15]: plt.plot(x, np.dot(y, b), color='k', linewidth=3);

In [16]: plt.title("Cyclic cubic regression spline basis example");
In [17]: y = dmatrix("cc(x, df=6) - 1", {"x": x})
# Plot cyclic cubic regression spline basis functions (colored curves) each multiplied by its coeff
In [18]: plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
In [19]: plt.plot(x, np.dot(y, b), color='k', linewidth=3);

In the following example we first set up our spline basis using same data as for the B-spline example above and then make predictions on a new set of data:
In [20]: design_matrix = dmatrix("cr(x, df=4, constraints='center')", data)
In [21]: new_design_matrix = build_design_matrices([design_matrix.design_info], new_data)[0]
In [22]: new_design_matrix
Out[22]:
DesignMatrix with shape (3, 5)
Columns:
['Intercept',
"cr(x, df=4, constraints='center')[0]",
"cr(x, df=4, constraints='center')[1]",
"cr(x, df=4, constraints='center')[2]",
"cr(x, df=4, constraints='center')[3]"]
Terms:
'Intercept' (column 0)
"cr(x, df=4, constraints='center')" (columns 1:5)
(to view full data, use np.asarray(this_obj))
In [23]: np.asarray(new_design_matrix)