## Description

The flow duration curve (FDC) is one of the most widely used tools for displaying streamflow data, and percentile flows derived from the FDC provide essential information for managing rivers. These statistics are generally unavailable because most basins are ungauged. Percentile flows are therefore frequently predicted with regression models developed from streamflow and ancillary data for gauged basins. Because spatially distributed physical and climatic data for basins are now readily available, many potential independent variables can be used to predict percentile flows. A subset of the variables is often selected using automated regression procedures, but these procedures evaluate only a portion of the possible variable combinations. Other approaches for exploiting the information in physical and climatic data may produce stronger models for predicting percentile flows. The overarching hypothesis guiding this dissertation research was that more extensive approaches for extracting information from large sets of independent variables may improve percentile flow predictions. The dissertation was organized into three linked studies: (1) a performance evaluation of various approaches for selecting the independent variables of percentile flow regression models, (2) a comparison of different sets of variables for percentile flow regression modeling with increasing amounts of information in terms of the number of variables and their description of the statistical distribution of the data, and (3) a proof-of-concept study using a neural network approach called the self-organizing map (SOM) to account for the noise and non-linearity of predictive relations between the independent variables and percentile flows.
Key findings from these studies were as follows: (1) random forests was the best approach for selecting the independent variables for regression models used to predict percentile flows, but variables selected based on a conceptual understanding of the FDC performed nearly as well, (2) a set of only three variables (mean annual precipitation, potential evapotranspiration, and baseflow index) performed as well as models with larger sets of variables representing more physical and climatic information, and (3) the SOM performed similarly to global regression models based on all the basins, but did not outperform regression models developed for regions composed of similar basins. The SOM result may be due to the SOM using all the independent variables, whereas the regression models discarded irrelevant variables that could increase the error in percentile flow predictions. All the studies in this dissertation were performed using 918 basins in the contiguous US, and the resulting predictive models provide a tool for local watershed managers to predict 13 percentile flows along with an estimate of the predictive error. These models could be improved through future research that (1) emphasizes the role of geology, which provided the most valuable information for predicting the percentile flows, (2) exploits new sources of remotely sensed information, since classic topographic variables provided little predictive information, and (3) develops specialized models designed for high and low flows, which were the most difficult to predict.
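
For readers unfamiliar with how percentile flows relate to the FDC, the following is a minimal sketch of deriving them from a daily streamflow record at a gauged site. It is an illustration only and is not taken from the dissertation: the function names `flow_duration_curve` and `percentile_flow`, the Weibull plotting position, and the linear interpolation between ranked flows are all assumptions made here.

```python
def flow_duration_curve(daily_flows):
    """Return (exceedance_pct, flow) pairs ordered from highest to lowest flow.

    Assumption: exceedance probability uses the Weibull plotting position,
    rank / (n + 1); the source does not state which formula was used.
    """
    flows = sorted((q for q in daily_flows if q is not None), reverse=True)
    n = len(flows)
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(flows, start=1)]


def percentile_flow(fdc, exceedance_pct):
    """Interpolate the flow that is equalled or exceeded exceedance_pct % of the time."""
    if not fdc:
        raise ValueError("flow duration curve is empty")
    # Outside the range of plotting positions, fall back to the extreme flows.
    if exceedance_pct <= fdc[0][0]:
        return fdc[0][1]
    if exceedance_pct >= fdc[-1][0]:
        return fdc[-1][1]
    # Linear interpolation between the two bracketing points of the curve.
    for (p0, q0), (p1, q1) in zip(fdc, fdc[1:]):
        if p0 <= exceedance_pct <= p1:
            weight = (exceedance_pct - p0) / (p1 - p0)
            return q0 + weight * (q1 - q0)
    raise ValueError("exceedance percentage could not be bracketed")


# Hypothetical record: 99 daily flows with values 1 through 99 (arbitrary units).
fdc = flow_duration_curve(range(1, 100))
high_flow = percentile_flow(fdc, 10)    # flow exceeded 10% of the time
median_flow = percentile_flow(fdc, 50)  # flow exceeded 50% of the time
low_flow = percentile_flow(fdc, 90)     # flow exceeded 90% of the time
```

In an ungauged basin no such record exists, which is why the percentile flows are instead predicted from basin characteristics with regression models of the kind described above.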