Description
Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or sequencing-based methods. Researchers who perform high-throughput gene expression assays often deposit their data in public databases, but the heterogeneity of measurement platforms makes it difficult to integrate data sets. Those wishing to perform cross-platform normalization face three major obstacles. First, they must choose which method or methods to employ: nine are currently available, and no rigorous comparison of them exists. Second, they must construct a suitable multi-platform data set. Third, they must obtain software for the selected method and incorporate it into a data analysis workflow. In this work, cross-platform normalization methods are compared on two publicly available testing data sets, based on inter-platform concordance and on the consistency of the gene lists obtained from the transformed data. Scatter plots and ROC-like plots are produced, and bootstrapping is used to obtain distributions for statistics derived from those plots. The consistency of platform effects across studies is explored theoretically and with respect to the testing data sets. The comparisons indicate that four methods, DWD, EB, GQ, and XPN, are generally effective, while the remaining methods do not adequately correct for platform effects. Of these four, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is the most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection. The applicability of cross-platform normalization to non-ideal data sets is explored, and the significance of treatment-platform interaction effects in microarray data is demonstrated. An R package, CONOR, is provided that can perform all nine cross-platform normalization methods considered.
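The abstract mentions using bootstrapping to obtain distributions for the statistics that compare methods, but it does not define those statistics. The Python sketch below therefore uses a hypothetical stand-in: the Pearson correlation, across genes, between per-gene treatment effects estimated separately on two platforms, with samples resampled with replacement within each treatment group. The function names (`pearson`, `gene_effects`, `bootstrap_concordance`, `simulate`) and the synthetic data are illustrative assumptions. They are not part of CONOR and are not the thesis's own procedure.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gene_effects(matrix, cols_a, cols_b):
    """Per-gene difference of treatment-group means (rows = genes, columns = samples)."""
    return [
        statistics.fmean([row[j] for j in cols_b]) - statistics.fmean([row[j] for j in cols_a])
        for row in matrix
    ]


def bootstrap_concordance(platform_1, platform_2, n_boot=1000, seed=0):
    """Bootstrap distribution of a hypothetical concordance statistic (not CONOR's).

    Each platform is a (matrix, cols_a, cols_b) tuple. Samples are resampled with
    replacement within each treatment group on each platform; genes stay fixed.
    Returns the sorted bootstrap statistics and an approximate 95% percentile interval.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        effects = []
        for matrix, cols_a, cols_b in (platform_1, platform_2):
            boot_a = [rng.choice(cols_a) for _ in cols_a]
            boot_b = [rng.choice(cols_b) for _ in cols_b]
            effects.append(gene_effects(matrix, boot_a, boot_b))
        stats.append(pearson(effects[0], effects[1]))
    stats.sort()
    cuts = statistics.quantiles(stats, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return stats, (cuts[0], cuts[-1])


# Synthetic example (hypothetical data): 50 genes, 5 samples per group per platform.
rng = random.Random(1)
n_genes, n_per_group = 50, 5
baseline = [rng.uniform(4, 10) for _ in range(n_genes)]
effect = [rng.uniform(-2, 2) for _ in range(n_genes)]


def simulate(scale, shift):
    """Simulate one platform; `scale` and `shift` mimic an affine platform effect."""
    matrix = []
    for g in range(n_genes):
        a = [baseline[g] + rng.gauss(0, 0.3) for _ in range(n_per_group)]
        b = [baseline[g] + effect[g] + rng.gauss(0, 0.3) for _ in range(n_per_group)]
        matrix.append([scale * v + shift for v in a + b])
    cols_a = list(range(n_per_group))
    cols_b = list(range(n_per_group, 2 * n_per_group))
    return matrix, cols_a, cols_b


stats, (lo, hi) = bootstrap_concordance(simulate(1.0, 0.0), simulate(1.5, 3.0), n_boot=500)
print(f"median r = {statistics.median(stats):.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

Pearson correlation is unchanged by an affine rescaling of either platform. This particular statistic therefore reflects agreement between gene-level effects, which is closer to gene-list consistency. It does not indicate whether the two platforms' values share a common scale. A concordance measure that compares normalized values directly would respond to platform effects that this one ignores.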