The Rfssa R package implementation of the functional singular spectrum analysis (FSSA) algorithm begins with the computer representation of functional time series (FTS) which may be univariate or multivariate. Furthermore, the variables of the FTS may be observed over one or two-dimensional domains allowing for analysis of functional curves or images or both jointly. We refer the reader to Haghbin et al. (2021) for information on FTS and the FSSA algorithm, we refer to Trinka, Haghbin, and Maadooliat (2022) for information on multivariate FTS and multivariate FSSA, and we refer to Trinka et al. ({in press}) for more information on the univariate FSSA-based forecasting techniques.

We use an S3 object of class funts to implement the computer representation and we note that the constructor of such objects leverages a variety of techniques that may be used to define a funts object. We also offer custom validity checking to help guide the user through the different methods used to create funts objects. We illustrate the use of the constructor by way of call center data from a bank and satellite images of vegetation near the Saint Mary, Montana coupled with intraday temperature curves observed from a weather station near the same location. The data is hosted on GitHub which can be accessed using the Rfssa R package.

## Load packages ##
require(Rfssa)
require(fda)

The call center data is a collection of 365 functions (curves) observed between January 1, 1999 and December 31, 1999 where the domain of each function is intraday time between 12:00 AM and 11:59 PM and the range of each function is the square root of the number of calls to a call center. The data is observed in a 240 by 365 matrix where the first dimension captures the number of sampling points for each function and the second dimension captures the length of the FTS. The following code preprocesses the data and constructs the funts object.

## Call center data ##
loadCallcenterData()
D <- matrix(sqrt(callcenter$calls), nrow = 240)
bs1 <- create.bspline.basis(c(0, 23), 22)
u <- seq(0, 23, len = nrow(D))
Y <- funts(X = D, basisobj = bs1, start = as.Date("1999-1-1"),
           vnames = "Sqrt of Call Numbers",
           dnames = "Time (6 minutes aggregated)",
           tname = "Date" )
print(Y)
## 
## Functional time series (funts) object:
## Number of variables:  1
## Lenght:  365
## Start:  10592
## End:  10956
## Time:  Date[1:365], format: "1999-01-01" "1999-01-02" "1999-01-03" "1999-01-04" "1999-01-05" ...

Now that the funts object has been defined, we may visualize the FTS using the corresponding plotting functions. The plotly_funts plotting function is a wrapper of plotly codes which has been specially designed for FTS data. The following code shows various plots of the call center funts object.

## Line plot ##
plotly_funts(Y,
             main = "Callcenter Data",
             xlab = "Time (6 minutes aggregated)",
             ylab = "Sqrt of Call Numbers", type = "line",xticklabels = list(c("00:00","06:00","12:00","18:00","24:00")),xticklocs =
 list(c(1,60,120,180,240))
)

## Heatmap plot ##
plotly_funts(Y,
             main = "Callcenter Data",
             xlab = "Time (6 minutes aggregated)",
             ylab = "Sqrt of Call Numbers", type = "heatmap",,xticklabels = list(c("00:00","06:00","12:00","18:00","24:00")),xticklocs =
 list(c(1,60,120,180,240))
)

## 3Dsurface plot ##
plotly_funts(Y,
             main = "Callcenter Data",
             xlab = "Time Point",
             ylab = "Sqrt of Call Numbers", type = "3Dsurface"
)

## 3Dline plot ##
plotly_funts(Y,
             main = "Callcenter Data",
             xlab = "Time Point",
             ylab = "Sqrt of Call Numbers", type = "3Dline"
)

For the multivariate, different dimensional domains example, we consider satellite images of vegetation as well as intraday temperature curves observed from the same region on the same days. The images of vegetation are gathered from the MODIS Terra satellite where each image may be viewed as a function observed over a two-dimensional domain of longitude and latitude whose range is a normalized difference vegetation index (NDVI) value. We note that the NDVI value at each sampling point (pixel value) is bounded between zero and one where NDVI values closer to zero indicate less vegetation is present while values closer to one indicate more vegetation. Each image is represented by a 33 by 33 matrix. Each intraday temperature curve is represented by a vector of length 24 (one sampling point each hour of each day) and is standardized by dividing each sampling point by the standard deviation of all the sampling points. The data consists of 133 multivariate observations (133 images and 133 intraday temperature curves), recorded in 16 day intervals, starting in January 1, 2008 and ending in September 30, 2013. The following code temporarily loads the data from GitHub and defines the multivariate funts object.

## NDVI images and intraday temperature curves from Saint Mary, Montana, USA ##
loadMontanaData()
Temp <- montana$Temp
NDVI <- montana$NDVI
Montana_Data <- list(Temp / sd(Temp), NDVI)
bs1 <- create.bspline.basis(c(0, 23), 11)
bs2 <- create.bspline.basis(c(1, 33), 13)
bs2d <- list(bs2, bs2)
bsmv <- list(bs1, bs2d)
Y <- funts(
  X = Montana_Data, basisobj = bsmv,
  start = as.Date("2008-01-01"),
  end = as.Date("2013-09-30")
)
print(Y)
## 
## Functional time series (funts) object:
## Number of variables:  2
## Lenght:  133
## Start:  13879
## End:  15978
## Time:  Date[1:133], format: "2008-01-01" "2008-01-16" "2008-02-01" "2008-02-17" "2008-03-04" ...

Visualization of funts objects that are multivariate in nature is carried out using the same plotting function. In the following, we plot the first 100 observations of the Montana FTS variables.

## Plotting method ##
plotly_funts(Y[1:100],
             xlab = c("Time", "Longitude"),
             ylab = c("Normalized Temperature (\u00B0C)", "Latitude"),
             zlab = c("", "NDVI"),
             main = c("Temperature Curves", "NDVI Images"),
             color_palette = "RdYlGn",
             xticklabels = list(
               c("00:00", "06:00", "12:00", "18:00", "24:00"),
               c("113.40\u00B0 W", "113.30\u00B0 W")
             ),
             xticklocs = list(c(1, 6, 12, 18, 24), c(1, 33)),
             yticklabels = list(NA, c("48.70\u00B0 N", "48.77\u00B0 N")),
             yticklocs = list(NA, c(1, 33))
)

Oftentimes, the visualization method may be used to get an idea of what modes of variation are present in the data such as trend, periodic, and mean behaviors. In the FSSA decomposition routine we choose a window length parameter known as L using a variety of techniques and we extract the aforementioned signals. We note that there exists several methods for choosing L such as using domain-specific knowledge.

Haghbin, Hossein, Seyed Morteza Najibi, Rahim Mahmoudvand, Jordan Trinka, and Mehdi Maadooliat. 2021. “Functional Singular Spectrum Analysis.” Stat, e330. https://doi.org/10.1002/sta4.330.
Trinka, Jordan, Hossein Haghbin, and Mehdi Maadooliat. 2022. “Multivariate Functional Singular Spectrum Analysis: A Nonparametric Approach for Analyzing Multivariate Functional Time Series.” In Innovations in Multivariate Statistical Modelling: Navigating Theoretical and Multidisciplinary Domains, edited by A. Bekker, J. Ferreira, M. Arashi, and D. Chen. Emerging Topics in Statistics and Biostatistics. Springer Cham.
Trinka, Jordan, Hossein Haghbin, Han Lin Shang, and Mehdi Maadooliat. {in press}. “Functional Time Series Forecasting: Functional Singular Spectrum Analysis Approaches.” Stat, {in press}, e621. https://doi.org/10.1002/sta4.621.