1. Introduction
Multivariate count time series are prevalent in modern statistical analysis, and the component series are rarely independent of one another. Discretization of time series data has been proposed in data mining by authors such as Chaudhari et al. (2014) [1], Marquez-Grajales et al. (2020) [2], and Mordvanyuk et al. (2022) [3]. Our objective is to propose a model for discrete data with temporal dependence. Pair copula construction has gained popularity because it offers flexibility in learning the joint distribution of time series data. Pearson’s correlation is tied to the normal shape of the marginal distributions, and it is not invariant to stretching or reshaping of those margins: such transformations change its value. Categorizing the values, as is usually done in climate research (e.g., saying the temperature is in the 80s), creates challenges that copula-based modeling can overcome. Further advantages of the copula approach concern the distributional assumptions, the time dependence, the mixing of different types of marginal distributions, and the addition of covariates. In addition, new connections or time markers that may have been missed in how observations are linked can be discovered. In this paper, we propose a comparative study of the climate data under the following two different cases:
These models all belong to a multivariate time series framework that offers flexible auto-correlation structures when the data are described as discrete counts.
Weather describes the short-term fluctuations of temperature, dew point, humidity, wind speed and direction, precipitation, atmospheric pressure, and other meteorological variables at a given location. Climate, on the other hand, is the long-term average variation of these meteorological variables at a location. By way of example, we look at the weather to decide what clothes to put on each day, whereas what we stock in our closet will probably depend on the climate of the place. The day-to-day activities of humans continue to change the makeup of the earth’s atmosphere. Causes of climate and weather change include air pollution, deforestation, and increasing energy demand for heating and cooling, to name a few. Natural processes, such as volcanic eruptions, also add to the greenhouse gases in the atmosphere. When these atmospheric variables change quickly and negatively, they can impact many elements of human activity. In recent years, extreme weather conditions have displaced many people and exacerbated the factors driving people into poverty. These unfavorable weather events also increase health problems, as droughts and destructive storms disrupt food production. It is essential to continue to explore statistical methods, ranging from data collection to model development, to explain the variability of weather and climate data and assess future weather conditions so that people can be better prepared for extreme weather events. Advanced technology makes it possible to collect data on many weather-related factors. Thus, new approaches to visualizing and analyzing the relationships between these variables are needed as multivariate climate data sets become more widely available. A range of filter colors and patterns are available in modern data visualizations, which are intended to help users explore graphs interactively and improve the forecasting of future trends under changing environmental conditions. Since these graphs are also open to subjective interpretation, statistical climate models based on mathematical representations of the atmospheric variables remain a more dependable approach to obtaining information about current and future weather and climate states.
Quite a few data visualization methods have been developed to explore the inter-variable relations of these atmospheric properties. Teuling et al. [4] present a methodology, extending that of [5], for describing the inter-variable relations of atmospheric properties. Their method uses the properties of common color schemes to plot two variables in a single color map, with a two-dimensional color legend for both sequential and diverging data. Concerning climate models, Agrawal [6] investigates the effectiveness of copula models in estimating and predicting climate extremes. The study examines the bivariate distributions of temperature–humidity, temperature–wind speed, and wind speed–humidity in Boulder County, Colorado. It bootstraps simulated data from a climate model and examines the accuracy of extreme event probability predictions for data of different lengths and internal variability under different copula functions. The results reveal lower bias and variance for longer data records than for shorter ones when estimating the true probability of extreme compound events. Li et al. [7] utilize three Archimedean copula models (the Clayton, Frank, and Gumbel copulas) to compare measured wave data to simulated wave climate data at a wave energy converter test site. In assessing the goodness of fit of the three models, the study finds that the Gumbel copula performs better than the other two copulas. Lee et al. [8] apply the Clayton, Frank, Gumbel, and Gaussian copula functions to analyze the joint frequency of drought intensity and duration. They examine the performance of these copulas and find that the Frank and Gumbel copulas outperform the Clayton copula in the bivariate drought frequency analysis. The impacts of climate change and human activity have led other natural scientists to develop non-stationary multivariate analysis techniques to model these environmental changes (Li et al. [9]; see also Yin et al. [10]).
The two atmospheric properties most often discussed in connection with living conditions are temperature and humidity. Temperature is the degree of warmth or coldness measured on a definite temperature scale using thermometers. Although the Celsius (°C) and Kelvin (K) scales are used for measuring temperature, the Fahrenheit (°F) scale is used by the United States and very few other countries. Relative humidity, typically expressed as a percentage, measures the amount of water vapor in the air relative to its capacity to hold it at a given temperature. The relationship between temperature and relative humidity is inverse: relative humidity decreases as temperature rises, and vice versa. Thus, temperature relates to the amount of moisture the atmosphere is able to hold. The way these variables interact affects human health and well-being as well as the weather. Barma et al. [11] utilize one-parameter bivariate Archimedean copulas to assess the conditional probability of the number of COVID-19 cases given the mean daily temperature and relative humidity.
The literature has shown that climate data at different time points are correlated. These environmental phenomena exhibit both serial and cross-correlations that complicate model specification and inference. Research on the case of negative correlation is still ongoing. In the bivariate case, the joint distribution is typically assumed to be normal, but that assumption is easily violated because of changes in behavior or climate. Integrating over a discrete set also requires special treatment. The bivariate copula offers flexibility regarding the underlying distributions. Moreover, when counts are used in the modeling, the theoretical and sample results lack consistency. The main reason is that the count data are modeled under the assumption of a conditional discrete distribution, instead of a marginal distribution.
Here, we use data from Mobile Regional Airport in Alabama, where the temperatures are measured over a 14-month period with thermal gradients. Data on humidity for the same period are also recorded. George et al. [12] establish that temperature and rain amount are related. In their paper, they present linear regression models to describe bivariate relationships between the two meteorological variables. One can also regress temperature on humidity, or vice versa, but it is not obvious which variable should be the response and which the predictor. Additionally, even though we know that these two variables are related, the relationship may not always be linear. To gain efficiency, we propose to capture the dependence between these variables in a multivariate distribution format with discrete classification. To circumvent limitations due to the normality assumption, or to the fact that the data exhibit many tied values, we investigate the relationship by applying copula-based time series models.
Our analysis differs from the above analyses in two ways. First, we do not classify the temperature as high, low, or medium. Second, we convert both the humidity and temperature variables to a discrete scale by partitioning them in their associated time intervals. The copula-based approach then derives the relationship in a general framework.
The paper is organized as follows. The motivation for the bivariate time series data is given in Section 2, with a review of the correlation structure. Model construction in each of the cases and inference (under maximum likelihood estimation) are provided in Section 3 and Section 4, respectively. The simulations and data application are shown in Section 5 and Section 6, respectively, followed by a discussion and conclusion in Section 7.
4. Inference
Parameters are estimated by maximizing the likelihood function. The log-likelihood function is constructed using copula theory. However, this function has no closed form, so its maximization does not follow the standard theory [16]. The maximization technique used is presented next.
Using the conditional density function in Equation (3), the joint distribution of the two series at the first time point is given by Equation (4), and, for each subsequent time point, the conditional bivariate distribution of the current pair of observations given the preceding pair is given by Equation (5). Hence, combining Equations (4) and (5) gives the likelihood function in Equation (6). Its parameter vector collects the vector of marginal parameters, the two serial dependence parameters for the first and second time series, respectively, and the parameter that captures the bivariate dependence between the two time series. Taking the log of the function in (6) yields the log-likelihood function in Equation (7).
Maximizing the log-likelihood function in (7) provides ML estimates for the proposed class of models. However, the log-likelihood function contains a bivariate normal integral that, as shown in (3), has no closed form. Hence, we evaluate the bivariate integral using the standard randomized importance sampling method presented by Genz and Bretz [17]. This method has been shown to be effective in fewer than ten dimensions. Hothorn et al. [18] implement this procedure in the R package mvtnorm, available on CRAN. The package provides a function, pmvnorm, for the computation of multivariate normal probabilities. The parameter estimates are then obtained as the values that maximize the log-likelihood in (7). This maximization technique produces a numerically calculated Hessian matrix that provides the Fisher information matrix (FIM), as shown by Silva and Diniz [19]. The square roots of the diagonal entries of the inverse FIM give the standard errors of the ML estimates. In the next section, we evaluate the effectiveness of the proposed class of models through a comprehensive simulation study.
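To make the role of the bivariate normal integral concrete, the following is a minimal Python sketch of the joint probability of a pair of counts at a single time point, assuming Poisson margins coupled by a Gaussian copula. It is not the authors' implementation: the function names (`poisson_cdf`, `bvn_cdf`, `to_normal_score`, `joint_pmf`) are hypothetical, and the bivariate normal CDF is evaluated with Plackett's identity and Simpson's rule as a simple stand-in for pmvnorm, so that the example depends only on the standard library.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf() and inv_cdf()


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); zero when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for j in range(1, k + 1):
        term *= lam / j  # P(X = j) obtained from P(X = j - 1)
        total += term
    return min(total, 1.0)


def bvn_cdf(h, k, rho, n=200):
    """Standard bivariate normal CDF at (h, k) with correlation rho.

    Plackett's identity: Phi2(h, k; rho) = Phi(h) * Phi(k) plus the
    integral over r from 0 to rho of the bivariate normal density at
    (h, k), computed with Simpson's rule on n (even) subintervals.
    """
    def density(r):
        s = 1.0 - r * r
        q = (h * h - 2.0 * r * h * k + k * k) / (2.0 * s)
        return math.exp(-q) / (2.0 * math.pi * math.sqrt(s))

    step = rho / n
    acc = density(0.0) + density(rho)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * density(i * step)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + acc * step / 3.0


def to_normal_score(u, eps=1e-12):
    """Map a probability to a normal quantile, clipped away from 0 and 1."""
    return STD_NORMAL.inv_cdf(min(max(u, eps), 1.0 - eps))


def joint_pmf(x1, x2, lam1, lam2, rho):
    """P(X1 = x1, X2 = x2): rectangle probability under the copula."""
    lo1 = to_normal_score(poisson_cdf(x1 - 1, lam1))
    hi1 = to_normal_score(poisson_cdf(x1, lam1))
    lo2 = to_normal_score(poisson_cdf(x2 - 1, lam2))
    hi2 = to_normal_score(poisson_cdf(x2, lam2))
    return (bvn_cdf(hi1, hi2, rho) - bvn_cdf(lo1, hi2, rho)
            - bvn_cdf(hi1, lo2, rho) + bvn_cdf(lo1, lo2, rho))


if __name__ == "__main__":
    # Illustrative values only: means 4 and 6, copula correlation 0.5.
    print(joint_pmf(3, 5, 4.0, 6.0, 0.5))
```

Because the counts are discrete, each likelihood contribution is a difference of four CDF evaluations rather than a density, which is why a numerical routine for the bivariate normal CDF is called repeatedly during maximization.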
5. Simulation Studies
A comprehensive simulation study is conducted to evaluate the proposed estimation method and validate the asymptotic properties of the parameter estimates. We first consider bivariate Poisson count time series data. For each univariate time series, we consider a first-order stationary copula-based Markov model, in which a copula family is used for the joint distribution of consecutive observations, and we then couple the two time series using a bivariate copula function at each time point. The model has five parameters: the means of the two marginal distributions, the two parameters measuring the serial dependence within each time series, and the parameter measuring the cross-correlation between the two time series. The Gaussian copula is selected as the candidate copula family, with true marginal means of 4 and 6 and fixed true values of the three dependence parameters. Assuming the process is stationary, the parameters of the marginal distributions are held constant across time. Simulations are performed with sample sizes of 100, 500, and 1500, each replicated 1000 times. For each of the five parameter estimates, the standard error (SE), mean square error (MSE), and mean absolute error (MAE) are calculated, and the results are displayed in Table 2. The SE is the standard deviation of the estimates over the 1000 replications. The MSE is the average squared difference between the estimated values and the true value, while the MAE is the average absolute difference between the estimated values and the true value; in both cases, the average is taken over the m replications. We conduct a second simulation using the Gaussian copula as the candidate copula family, with true marginal means of 3 and 5. In this setting, we consider negative cross-correlation between the two time series. The corresponding simulation results are displayed in Table 3.
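One simple way to generate counts with this kind of dependence can be sketched as follows. The sketch is an illustrative stand-in, not the authors' simulation algorithm: it thresholds a latent bivariate Gaussian autoregression, which reproduces a Gaussian copula between consecutive observations and across the two series, but the resulting counts are not exactly first-order Markov. The function names and the dependence values 0.3, 0.4, and 0.5 are hypothetical; only the marginal means 4 and 6 come from the text.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    u = min(u, 1.0 - 1e-12)  # guard against u == 1
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    while total < u:
        k += 1
        term *= lam / k
        total += term
    return k


def simulate_bivariate_counts(n, lam1, lam2, delta1, delta2, rho, seed=0):
    """Simulate two dependent Poisson count series of length n.

    delta1, delta2: lag-one correlations of the latent Gaussian series.
    rho: contemporaneous correlation of the latent Gaussian series.
    """
    rng = random.Random(seed)
    s1 = math.sqrt(1.0 - delta1 ** 2)
    s2 = math.sqrt(1.0 - delta2 ** 2)
    # Innovation correlation that keeps the stationary cross-correlation at rho.
    r = rho * (1.0 - delta1 * delta2) / (s1 * s2)
    if abs(r) > 1.0:
        raise ValueError("delta1, delta2 and rho are not jointly admissible")
    # Start from the stationary distribution of the latent process.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    x1, x2 = [], []
    for _ in range(n):
        # Transform latent normals to Poisson counts through their CDFs.
        x1.append(poisson_quantile(STD_NORMAL.cdf(z1), lam1))
        x2.append(poisson_quantile(STD_NORMAL.cdf(z2), lam2))
        e1 = rng.gauss(0.0, 1.0)
        e2 = r * e1 + math.sqrt(1.0 - r ** 2) * rng.gauss(0.0, 1.0)
        z1 = delta1 * z1 + s1 * e1
        z2 = delta2 * z2 + s2 * e2
    return x1, x2


if __name__ == "__main__":
    x1, x2 = simulate_bivariate_counts(500, 4.0, 6.0, 0.3, 0.4, 0.5, seed=1)
    print(sum(x1) / len(x1), sum(x2) / len(x2))
```

A negative cross-correlation setting, as in the second simulation, corresponds to passing a negative `rho`.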
Table 2 and Table 3 illustrate that the parameter estimates converge to the true values and that the standard error decreases as the sample size increases. The results show that the estimates become more stable as the sample size increases. Figure 1 and Figure 2 show the quantile plots of the estimated parameters, which are approximately normally distributed.
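The summary measures reported for each parameter can be computed from the replicated estimates as in the following minimal Python sketch. The function name `summarize_estimates` is hypothetical, and the use of the sample standard deviation (divisor m - 1) for the SE is an assumption, since the text only states that the SE is the standard deviation over the replications.

```python
import math


def summarize_estimates(estimates, true_value):
    """Return the mean, SE, MSE and MAE of replicated estimates.

    estimates: one estimate of the same parameter per replication.
    true_value: the value used to generate the simulated data.
    """
    m = len(estimates)
    mean_est = sum(estimates) / m
    # SE: sample standard deviation of the estimates (assumed divisor m - 1).
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (m - 1))
    # MSE: average squared distance from the true value.
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    # MAE: average absolute distance from the true value.
    mae = sum(abs(e - true_value) for e in estimates) / m
    return {"mean": mean_est, "se": se, "mse": mse, "mae": mae}


if __name__ == "__main__":
    # Toy estimates of a marginal mean whose true value is 4.
    print(summarize_estimates([3.9, 4.1, 4.0, 4.2], 4.0))
```

The SE measures spread around the average estimate, whereas the MSE and MAE measure distance from the true value, so the latter two also reflect any bias.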
6. Real-Data Application
We apply our proposed bivariate model to weather data acquired from the Mobile Regional Airport station in Alabama at https://www.wunderground.com/history/monthly/us/al/mobile/KMOB, accessed on 6 April 2023. The data consist of average daily temperature and humidity values from January 2022 to February 2023. These averages are computed from readings taken at 15 min intervals each day. The daily temperature averages are given in degrees Fahrenheit (°F), while the daily humidity averages are expressed as percentages. Both variables are measured on a continuous scale and are converted to discrete count data, with levels ranging from 0 to 5 as shown in Table 4. We categorize the data in this way because values on any given day or time are commonly described as falling into one of these groups. Erhardt et al. [20] considered a similar transformation of temperature data into copula data in their paper.
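The conversion of a continuous reading to a level between 0 and 5 can be sketched in Python as follows. The cut points below are hypothetical placeholders chosen only to illustrate the mechanics; they are not the values of Table 4, and the names `TEMP_EDGES_F` and `to_level` are likewise assumptions.

```python
from bisect import bisect_right

# Hypothetical cut points in degrees Fahrenheit (NOT the Table 4 values).
# Five cut points define six levels, 0 through 5.
TEMP_EDGES_F = [40, 50, 60, 70, 80]


def to_level(value, edges):
    """Return the level of value: the number of cut points at or below it."""
    return bisect_right(edges, value)


def discretize(series, edges):
    """Convert a sequence of continuous readings to integer levels."""
    return [to_level(v, edges) for v in series]


if __name__ == "__main__":
    daily_avg_temp = [38.5, 47.0, 55.2, 61.9, 74.3, 83.1]  # toy values
    print(discretize(daily_avg_temp, TEMP_EDGES_F))
```

With these placeholder cut points, a reading "in the 50s" maps to a single level, which mirrors the way such values are usually described. The same function applies to humidity percentages with its own set of cut points.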
Figure 3 shows the daily levels of humidity and temperature for the first three months of 2022, from January to March. Figure 4 shows the relationship between temperature and humidity for the weather data. There appears to be a relationship between temperature and humidity for the given months, but the relationship does not appear to be linear. In Figure 5, we observe that the distributions of temperature and humidity are broadly similar, except when the two atmospheric variables are at levels zero and five.
Table 5 presents the parameter estimates for the fitted univariate models for temperature and humidity, choosing the Gaussian copula as the candidate copula family with a Poisson marginal distribution. The reported parameters are the means of the two marginal distributions and the serial dependence parameters within each time series, for temperature and humidity, respectively. These results suggest that the estimated Poisson means for temperature and humidity are both around the 50s (degrees Fahrenheit in the case of temperature).
Table 6 presents the parameter estimates, together with their standard errors, for the fitted bivariate model, choosing the Gaussian copula as the candidate copula family with Poisson marginals. Notably, the standard errors of both the marginal and copula parameter estimates are stable, and the parameter estimates are more reliable than those of the univariate models. The temperature range around 60 degrees Fahrenheit holds greater relevance than the temperature around the 50s in the city of Mobile, AL. The same can be said for the city’s humidity. Hence, this alternate structure better underscores the joint relationship that was previously ignored. The inclusion of other pertinent variables would enhance the climate and ecosystem models, making them more useful for preparedness efforts.