By default, and in order to avoid having to pay
marshalling/unmarshalling costs for each argument every time one
invokes an internal R function, we represent R values in exactly the
same way R does, as a pointer to a SEXPREC structure (defined in
R/Rinternals.h). This choice has a downside, however: Haskell’s
pattern matching facilities are not immediately available, since only
algebraic datatypes can be pattern matched.
HExp is R’s SEXP (or SEXPREC *) structure represented as
a (generalized) algebraic datatype. A simplified definition of HExp
would go along the lines of:
    data HExp
      = Nil                 -- NILSXP
      | Symbol { ... }      -- SYMSXP
      | Real { ... }        -- REALSXP
      | ...
We define one constructor for each value of the SEXPTYPE enumeration
in <Rinternals.h>.
For the sake of efficiency, we do not use HExp as the basic
datatype that all inline-r generated code expects. That is, we do
not use HExp as the universe of R expressions, merely as a view.
We introduce the following view function to locally convert a SEXP
from R into an HExp:
hexp :: SEXP s -> HExp
The fact that this conversion is local is crucial for good performance
of the translated code. It means that conversion happens at each use
site, and happens against values with a statically known form. Thus we
expect that the view function can usually be inlined, and the
short-lived HExp values that it creates compiled away by code
simplification rules applied by GHC. In this manner, we get the
convenience of pattern matching that comes with a bona fide
algebraic datatype, but without paying the penalty of allocating
long-lived data structures that need to be converted to and from
R internals every time we invoke internal R functions or C extension
functions.
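The idea of a local view can be sketched on a toy model. The code below does not use inline-r: RawSExp is a hypothetical stand-in for the untyped C structure, the HExp here has only a handful of constructors (including an illustrative Other case), and total is an invented use site. Only the tag numbers are taken from R's SEXPTYPE enumeration.

```haskell
import Data.Int (Int32)

-- Hypothetical stand-in for R's untyped SEXPREC: a type tag plus payloads.
-- In inline-r the real structure lives in R's heap behind a pointer.
data RawSExp = RawSExp
  { rawTag     :: Int       -- plays the role of the SEXPTYPE tag
  , rawInts    :: [Int32]   -- payload used when the tag says INTSXP
  , rawDoubles :: [Double]  -- payload used when the tag says REALSXP
  }

-- A toy algebraic view with far fewer constructors than the real HExp.
data HExp
  = Nil
  | Int [Int32]
  | Real [Double]
  | Other Int               -- any tag this sketch does not model
  deriving (Eq, Show)

-- The view function: inspect the tag once and build a short-lived value.
hexp :: RawSExp -> HExp
hexp (RawSExp 0  _  _ ) = Nil      -- NILSXP
hexp (RawSExp 13 is _ ) = Int is   -- INTSXP
hexp (RawSExp 14 _  ds) = Real ds  -- REALSXP
hexp (RawSExp t  _  _ ) = Other t
{-# INLINE hexp #-}

-- A use site: the conversion happens here, right where we pattern match,
-- so the intermediate HExp never needs to outlive this function.
total :: RawSExp -> Double
total x = case hexp x of
  Nil     -> 0
  Int is  -> fromIntegral (sum is)
  Real ds -> sum ds
  Other _ -> 0
```

Because hexp is applied directly in the scrutinee of a case expression, an optimizing compiler has the opportunity to inline it and fuse the two matches, which is the effect the paragraph above relies on.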
Using an algebraic datatype for viewing R internal structures has the further advantage that invariants about these structures can readily be checked and enforced, including invariants that R itself does not check (e.g. that types that are special forms of the list type really do have the right number of elements). The algebraic type statically guarantees that no ill-formed value will ever be constructed on the Haskell side and passed to R.
We also define an inverse of the view function:
unhexp :: HExp -> SEXP
In reality, inline-r defines HExp in a slightly more elaborate
way. Most R functions expect their inputs to have certain
predetermined forms. For example, the + function expects that its
arguments be of some numeric type. A runtime error will occur when
this is not the case. Likewise, append expects its first argument to
be a vector, and its last argument to be a subscript. These form
restrictions are documented in a systematic way in each function’s
manual page. While neither R itself nor its implementation makes any
attempt to enforce these restrictions statically, Haskell’s type
system is rich enough to allow us to do so.
For this reason, inline-r allows the SEXP and HExp types to be
indexed by the form of the expression. For example, a value which is
known to be a real number can be given the type SEXP s R.Real. In
general, one does not always know a priori the form of an
R expression, but pattern matching on an algebraic view of the
expression allows us to “discover” the form at runtime. In inline-r,
we define the HExp algebraic view type as a generalized algebraic
datatype (GADT). In this way, the body of each branch can be typed
under the assumption that the scrutinee matches the pattern in the
left hand side of the branch. For example, in the body of a branch
with pattern Real x, the type checker can refine the type of the
scrutinee to SEXP s R.Real. In inline-r, HExp is defined as follows:
    data HExp s (a :: SEXPTYPE) where
      Nil       :: HExp s R.Nil
      -- Fields: pname, value, internal.
      Symbol    :: SEXP s R.Char
                -> SEXP s a
                -> Maybe (SEXP s b)
                -> HExp s R.Symbol
      Int       :: {-# UNPACK #-} !(Vector.Vector R.Int Int32)
                -> HExp s R.Int
      Real      :: {-# UNPACK #-} !(Vector.Vector R.Real Double)
                -> HExp s R.Real
      ...
See the Haddock-generated documentation for the Language.R.HExp
module for the full definition.
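The type refinement that a GADT provides can be illustrated with a small, self-contained sketch. It is a toy model rather than inline-r's definition: the constructor names NilH, IntH and RealH, the SomeHExp wrapper, and the mean and describe functions are all hypothetical, payloads are plain lists instead of R vectors, and the region parameter s is omitted for brevity.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Int (Int32)

-- Toy form index; the real enumeration has many more members.
data SEXPTYPE = Nil | Int | Real

-- Toy indexed view: each constructor fixes the index in its result type.
data HExp (a :: SEXPTYPE) where
  NilH  :: HExp 'Nil
  IntH  :: [Int32]  -> HExp 'Int
  RealH :: [Double] -> HExp 'Real

-- Accepts only values whose form is statically known to be Real, so a
-- single equation is enough: no other constructor can have this type.
mean :: HExp 'Real -> Double
mean (RealH xs) = sum xs / fromIntegral (length xs)

-- When the form is not known a priori, hide the index...
data SomeHExp where
  SomeHExp :: HExp a -> SomeHExp

-- ...and discover it at runtime by pattern matching.
describe :: SomeHExp -> String
describe (SomeHExp h) = case h of
  NilH    -> "nil"
  IntH is -> "int vector of length " ++ show (length is)
  -- In this branch the checker knows a ~ 'Real, so the scrutinee h
  -- itself can be passed to a function that demands HExp 'Real.
  RealH _ -> "real vector with mean " ++ show (mean h)
```

Calling mean on an IntH value is rejected at compile time, which is the static counterpart of the runtime error R would raise for an argument of the wrong form.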
In the above, notice that the Symbol constructor produces a value of
type HExp s R.Symbol, while the Real constructor produces a value of
type HExp s R.Real. In other words, the type index reflects the
constructor of each variant, which itself is a function of the form of
a SEXP. For safety and clarity, we preclude indexing SEXP and HExp
with arbitrary Haskell types (which usually have kind *). We use GHC’s
DataKinds extension to introduce a new kind of types, named SEXPTYPE,
and limit the possible type indexes to types that have kind SEXPTYPE.
The DataKinds extension, available in GHC 7.4 and later, lets us
define SEXPTYPE as a regular algebraic datatype and then, depending on
context, treat SEXPTYPE as a kind and its constructors as types of
kind SEXPTYPE. See the relevant section of the GHC User’s Guide for
more information.
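As a minimal sketch of how DataKinds restricts the index, consider the following toy module. It is not taken from inline-r: the Tagged type, the KnownForm class and formOfTagged are hypothetical names introduced only for illustration, and the enumeration is cut down to four members.

```haskell
{-# LANGUAGE DataKinds, FlexibleInstances, KindSignatures #-}

-- An ordinary algebraic datatype...
data SEXPTYPE = Nil | Symbol | Int | Real
  deriving (Eq, Show)

-- ...which DataKinds also lets us use as a kind. The index a can only be
-- one of the promoted constructors 'Nil, 'Symbol, 'Int or 'Real.
newtype Tagged (a :: SEXPTYPE) = Tagged String

-- Recover the term-level constructor from the type-level index.
class KnownForm (a :: SEXPTYPE) where
  formOf :: proxy a -> SEXPTYPE

instance KnownForm 'Nil    where formOf _ = Nil
instance KnownForm 'Symbol where formOf _ = Symbol
instance KnownForm 'Int    where formOf _ = Int
instance KnownForm 'Real   where formOf _ = Real

-- The same name is a value on the right and a type (index) on the left.
formOfTagged :: KnownForm a => Tagged a -> SEXPTYPE
formOfTagged = formOf

-- Rejected by the kind checker, because Bool has kind *, not SEXPTYPE:
-- bad :: Tagged Bool
-- bad = Tagged "oops"
```

The commented-out declaration at the end shows the point of the restriction: an index that is not a promoted SEXPTYPE constructor is a kind error rather than a silently accepted type.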