Description
This thesis examines two approaches to describing the complexity of US climate data. The analyses are based on a partition of the contiguous United States into 120 grid boxes of size 2.5° × 3.5° in latitude and longitude, respectively. The first part deals with the determination of spatial degrees of freedom (DOF) for monthly United States Historical Climatology Network, version 2 (USHCN V2) temperature and precipitation data from 1961 to 1990. The DOF can be interpreted as the effective number of statistically independent spatial locations of a climate field. The __ and S methods are used to estimate the DOF for each month and for four different data sets. Estimates from the S method suggest that roughly 2 to 6 grid boxes suffice to explain temperature data, whereas precipitation data are more complex and require 6 to 30 grid boxes. The __ method apparently underestimates the DOF in most cases but agrees with the S method in that climate in the summer months is affected by more independent influences than in the winter months. Based on the results from the S method, three different methods are used to find the supposedly most independent subsets of grid boxes. Each method computes a value of interest for every subset under consideration. The __ and S DOF maximization methods find the subset with the highest DOF value under the respective DOF estimation method when only data from that subset are used for estimation. The correlation minimization method minimizes a so-called correlation value. Judged visually, most of the selected subsets show a fairly uniform spatial distribution. The second part starts by discussing the theory of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) and gives an example of their application to a linear regression model.
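To make the role of AIC and BIC in a linear regression model concrete, the following Python sketch compares polynomial regressions of increasing degree on synthetic data. It is not the example from the thesis: the data, the helper name `gaussian_aic_bic`, and the convention of counting the error variance as one additional parameter are assumptions made here for illustration only. The sketch uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, with L the maximized Gaussian likelihood.

```python
import numpy as np


def gaussian_aic_bic(y, X):
    """Return (AIC, BIC) for an ordinary least squares fit of y on X.

    Assumes i.i.d. Gaussian errors. The error variance is counted as one
    extra parameter (an illustrative convention, not taken from the thesis).
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = p + 1  # regression coefficients plus the error variance
    # Maximized Gaussian log-likelihood, using sigma^2 = RSS / n
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(n) - 2 * log_lik
    return aic, bic


# Synthetic data (hypothetical): a quadratic signal plus Gaussian noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=n)

# Score nested polynomial models of degree 0 to 5.
scores = {}
for degree in range(6):
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    scores[degree] = gaussian_aic_bic(y, X)

best_aic = min(scores, key=lambda d: scores[d][0])
best_bic = min(scores, key=lambda d: scores[d][1])
print("degree selected by AIC:", best_aic)
print("degree selected by BIC:", best_bic)
```

Both criteria reward goodness of fit through the likelihood term and penalize the number of parameters. Because ln n exceeds 2 for this sample size, BIC penalizes each additional parameter more heavily than AIC and never selects a larger model than AIC among nested candidates.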
Subsequently, an optimal average method is introduced. It determines optimal weights for empirical orthogonal functions (EOFs) in the approximation of the mean squared error (MSE) between the theoretical value of a temperature anomaly average and the discrete sum used to compute it in practice on the partition of the contiguous US into grid boxes. AIC and BIC are applied to MSEs computed for USHCN V2 temperature data from 1897 to 2010 in an attempt to determine optimal modes, i.e., cutoff points for the MSE approximation. The results reveal an inconsistency in the theory: the notion of MSE used in the optimal average method does not correspond to the MSE of a regression model. AIC and BIC are therefore not appropriate for determining the optimal modes.