Many data-intensive applications need to process massive data sets (e.g., scientific data, photographs, and videos) to discover useful information such as hidden patterns or market trends. Delivering these large data sets all the way from a storage system to the host CPU(s) puts tremendous pressure on a traditional host-CPU-based computing architecture, as it incurs substantial data transfer latency and energy consumption. Thus, a new computing paradigm is greatly needed. Non-volatile memory (NVM) technologies such as NAND flash have advanced rapidly in the past decade, offering huge potential for processing data in situ or offloading computation to near the data. In this dissertation research, we first propose a new in-storage processing (ISP) architecture called RISP (Reconfigurable In-Storage Processing), which employs a field-programmable gate array (FPGA) as both the data processing unit and the NVM controller. Unlike traditional ISP techniques, RISP can reconfigure its in-storage data processing resources to achieve high energy efficiency without any performance degradation for big data analysis applications. Three case studies are provided in this project. Experimental results show that RISP significantly outperforms a CPU-centric processing architecture in terms of both performance and energy efficiency. Second, a near-data processing (NDP) server architecture is proposed, and its impact is evaluated on a diverse range of data center applications, from data-intensive to compute-intensive. Several new findings are reported. For example, we found that an FPGA-based NDP server can offer performance benefits not only for data-intensive applications but also for compute-intensive applications. Third, by applying the idea of reconfigurability proposed in the first project to an NDP server, we developed a reconfigurable NDP server that can dynamically reconfigure its computing resources according to the characteristics of an application.
Our results show that the reconfigurable NDP server achieves higher energy efficiency without any performance degradation compared to the NDP architecture proposed in the second project. Finally, two memory-access-efficient implementations of kNN (k-Nearest Neighbors) on FPGA are presented. Principal component analysis (PCA) and low-precision data representation are employed to reduce data accesses. Results show that external memory accesses are reduced by more than 28x, which improves both execution time and energy efficiency.
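
To illustrate why dimensionality reduction and low-precision representation compound in reducing memory traffic, the following is a minimal software sketch, not the FPGA implementation described in this dissertation. All sizes (64 float32 features, 8 retained components, 8-bit values) and the helper names `quantize`, `knn`, and `bytes_read` are hypothetical and chosen only for illustration. The PCA projection itself is omitted, and the stored points are assumed to be already projected.

```python
import heapq


def quantize(vec, scale):
    """Map floats to signed 8-bit integers, clamped to [-128, 127]."""
    return [max(-128, min(127, round(x / scale))) for x in vec]


def knn(query, points, k):
    """Return the indices of the k points closest to query (squared Euclidean)."""
    dists = ((sum((a - b) ** 2 for a, b in zip(query, p)), i)
             for i, p in enumerate(points))
    return [i for _, i in heapq.nsmallest(k, dists)]


def bytes_read(n_points, dims, bytes_per_value):
    """Bytes fetched from external memory when every stored point is scanned once."""
    return n_points * dims * bytes_per_value


# Hypothetical sizes: 64 float32 features versus 8 int8 components per point.
full = bytes_read(1000, 64, 4)
reduced = bytes_read(1000, 8, 1)
reduction = full / reduced  # the two savings multiply: (64 / 8) * (4 / 1)

# Tiny kNN query on quantized data (assumed scale of 0.1 per integer step).
points = [quantize(p, 0.1) for p in [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]]
nearest = knn(quantize([1.0, 1.0], 0.1), points, 2)
```

Under these assumed sizes, the per-scan traffic falls by the product of the dimensionality ratio and the precision ratio. The factor actually achieved depends on the data set and on how many components and bits can be dropped without harming classification accuracy.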