(Invited) An Experimental and Computational Data Ecosystem for Advancing Materials Research

Tuesday, 15 October 2019: 16:30
Room 216 (The Hilton Atlanta)
R. R. White, K. Munch, and J. D. Perkins (National Renewable Energy Laboratory)
For the past few decades, technological advances in computer capabilities, along with researchers’ desire for more data, have combined into a perfect storm of data overload. While data is being generated at an ever-increasing pace, there is also a wealth of historical and potentially useful data residing on researchers’ computers and abandoned hard drives. Over 2.5 quintillion bytes of data are generated every day, and 90% of all data in the world was generated over the last two years. These statistics present a daunting problem for both data scientists and materials researchers in general. We need methods not only to manage the oncoming flood of materials data, but also to prepare, aggregate, and effectively use that data.

Over the past decade, we have developed a materials data ecosystem at the National Renewable Energy Laboratory (NREL) to give researchers the ability to manage and explore the vast amounts of data generated both in-house and across multiple materials collaborations. During the course of this work, we have identified several key concepts required for any research data ecosystem: a ground-up development approach, workflow and metadata capture, easy accessibility, the ability to handle large data formats, and integration with other data resources. Any such system needs to focus initially on acquiring and archiving raw data from all expected and available resources, and in most cases this should be an automated process. It must help researchers past traditional workflow barriers and make it easy to capture metadata and associate it with the data. All acquired information must be easy to query and retrieve at numerous points in the process, in support of both research and data quality control. The resulting data warehouse must support filtering and aggregation so that it can feed supplemental, research-targeted databases. Because extremely large data sets may need to be processed and analyzed, a means of transmitting big data and providing enclosed work environments for it should be anticipated and developed. With the advent of large materials repositories and virtual laboratory consortia, such as the Energy Material Networks (EMN), the data ecosystem needs to take advantage of their resources and supply data streams as needed for researchers who want to contribute to those platforms. Finally, the research data ecosystem needs forward-facing tools and web applications that are intuitive and easy to use. In the end, information garnered from the data ecosystem completes the research cycle and spurs the next round of experimental design and data generation.
In this talk we will look at the underlying drivers of our ecosystem design, the design and development of the various elements, and how that work helped drive our development of the data hub, analysis tools and support for the Energy Material Networks (EMN).

Acknowledgements: This work was supported by the U.S. Department of Energy under contract DE-AC36-08GO28308 to Alliance for Sustainable Energy, LLC, the manager and operator of the National Renewable Energy Laboratory.