16–19 Oct 2016
Copenhagen University
Europe/Copenhagen timezone

High-Performance XPCS Data Reduction using Virtualized Computing Resources

17 Oct 2016, 12:30
5m
Marble Hall (Copenhagen University)

Thorvaldsensvej 40
Mini Oral Contributions 1

Speaker

Mr Nicholas Schwarz (Argonne National Laboratory)

Description

Demand for increased computing at synchrotron facilities is driven by new scientific opportunities, often enabled by technological advances in detectors and in data analysis algorithms. These advances generate larger amounts of data, which in turn require more computing power to obtain near real-time results. One example where advances in computation are critical is X-ray Photon Correlation Spectroscopy (XPCS), a unique tool for studying the dynamic nature of complex materials at length scales from micrometers to atomic dimensions and at time scales from seconds to nanoseconds. The recent development and application of higher-frequency detectors allows the investigation of faster dynamic processes, enabling novel science in a wide range of areas. A consequence of these detector advancements is the creation of greater amounts of image data, which must be processed within the time it takes to collect the next data set in order to guide the experiment. Parallel computational and algorithmic techniques and high-performance computing (HPC) resources are required to handle this increase in data.
To meet this need, the Advanced Photon Source (APS) has teamed with the Computing, Environment, and Life Sciences directorate at Argonne to use a virtualized computing resource located on the Argonne site. Virtual computing environments separate physical hardware resources from the software running on them, isolating an application from the physical platform. Using this remote virtualized computing resource together with the OpenStack and Cloudera management tools affords the APS many benefits. The virtualized environment allows the APS to install, configure, and update its Hadoop-based XPCS reduction software easily and without interfering with other users on the system. Its scalability allows more computing resources to be provisioned when larger data sets are collected.
The XPCS workflow starts with raw data streaming directly from the detector to a compressed file on the parallel file system located at the APS. Once the acquisition is complete, the data is automatically transferred using GridFTP to the Hadoop Distributed File System (HDFS) running on the virtualized resource in a different building. This transfer occurs over two dedicated 10 Gbps optical links. By bypassing intermediate firewalls, this dedicated connection provides a very low-latency, high-performance data pipe between the two facilities. Immediately after the transfer, Hadoop MapReduce-based data reduction algorithms are run in parallel on the provisioned compute instances, followed by Python-based error-fitting code. Resources provisioned for typical use by the XPCS application include approximately 120 CPU cores, 500 GB of distributed RAM, and 20 TB of distributed disk storage. Provenance information and the resultant reduced data are added to an HDF5 file, which is automatically transferred back to the APS for interpretation. This system is in regular use at the 8-ID-I beamline of the APS. The whole reduction process is completed shortly after data acquisition, typically in less than one minute, a significant improvement over previous setups. The faster turnaround time helps scientists make time-critical, near real-time adjustments to experiments, enabling greater scientific discovery.
*Work supported by the U.S. Department of Energy, Office of Science, under Contract No. DE-AC02-06CH11357.
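To illustrate the map/reduce shape of the reduction step described in the abstract, the following is a minimal single-process Python sketch, not the APS implementation. It assumes the textbook intensity autocorrelation g2(q, τ) = ⟨I(t) I(t+τ)⟩ / ⟨I⟩², computed per pixel over linearly spaced lags and then averaged over the pixels in each q bin; the abstract does not say which correlation scheme or normalization the production Hadoop code uses. The names `map_pixel`, `reduce_mean`, and `g2_by_bin` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_pixel(q_bin, series, max_lag):
    """Map step: yield ((q_bin, lag), g2 estimate) for one pixel's time series.

    Assumption: g2 = <I(t) * I(t + lag)> / <I>^2 with linear lag spacing.
    """
    n = len(series)
    mean = sum(series) / n
    if mean == 0:
        return  # a pixel with no counts contributes no estimate
    for lag in range(1, min(max_lag, n - 1) + 1):
        pairs = n - lag
        corr = sum(series[t] * series[t + lag] for t in range(pairs)) / pairs
        yield (q_bin, lag), corr / (mean * mean)


def reduce_mean(values):
    """Reduce step: average all per-pixel estimates that share one key."""
    values = list(values)
    return sum(values) / len(values)


def g2_by_bin(pixel_series, pixel_bins, max_lag):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    grouped = defaultdict(list)
    for series, q_bin in zip(pixel_series, pixel_bins):
        for key, value in map_pixel(q_bin, series, max_lag):
            grouped[key].append(value)
    return {key: reduce_mean(vals) for key, vals in grouped.items()}


# Two toy pixels in two different q bins: one constant, one alternating.
result = g2_by_bin([[2, 2, 2, 2], [0, 2, 0, 2]], [0, 1], max_lag=2)
```

Because each pixel's contribution is independent, the map step can be distributed across compute instances and only the small per-key estimates need to be grouped and averaged. That independence is the property that makes this kind of reduction a natural fit for a MapReduce framework.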

Primary author

Mr Faisal Khan (Argonne National Laboratory)

Co-authors

Dr Alec Sandy (Argonne National Laboratory)
Mr Benjamin Pausma (Argonne National Laboratory)
Mr Collin Schmitz (Argonne National Laboratory)
Mr Daniel Murphy-Olson (Argonne National Laboratory)
Mr Nicholas Schwarz (Argonne National Laboratory)
Mr Ryan Aydelott (Argonne National Laboratory)
Dr Suresh Narayanan (Argonne National Laboratory)
