NOBUGS 2016

Name: NOBUGS 2016
Start: 2016-10-16T14:00:00+02:00
End: 2016-10-19T18:00:00+02:00
Location: Copenhagen University

16–19 Oct 2016

Europe/Copenhagen timezone

Organiser Email: nobugs@esss.se

Data Analysis Infrastructure for Diamond Light Source Macromolecular & Chemical Crystallography

18 Oct 2016, 14:30

20m

Marble Hall (Copenhagen University)

Thorvaldsensvej 40

Oral Contribution

Contributions 5

Speaker: Dr Markus Gerstel (Diamond Light Source Ltd)

A proposal for the future data analysis infrastructure at Diamond Light Source is presented. Built on a messaging framework, a variable number of distributed servers working in parallel replaces monolithic batch jobs running on a single node. This infrastructure is scalable, can be easily extended, and even allows heavily CPU-bound tasks, such as the processing of reduced macromolecular data, to be moved off-site, e.g. to external cloud providers.

Diamond Light Source has 8 MX & CX beamlines with DECTRIS PILATUS detectors, each capable of producing diffraction data at rates between 25 and 100 images per second. Upgrades to new DECTRIS EIGER detectors are planned over the forthcoming year. These offer maximum frame rates of 133 to 3,000 Hz, together with increased image sizes, for compressed data rates of around 18 Gbit/s.

The current automated data analysis process consists mainly of two aspects: a very fast and embarrassingly parallel per-image analysis for timely feedback during data collection, and more involved data reduction and processing designed to answer the experimental questions. The existing infrastructure depends on submitting batch jobs to a high performance computing cluster. While appropriate for the current workload, this approach alone does not scale to the very high data rates anticipated in the near future. In particular, live processing performs poorly when the workload exceeds the capacity of one cluster node. Conversely, when data rates stay significantly below a node's capacity, the cluster is not used efficiently.

In the proposed infrastructure, fine-grained tasks are submitted as messages to a central queue. Servers running on cluster nodes consume these messages and process the tasks. Results can be written to a common file system, sent to another queue for further downstream processing, sent to a dynamic number of subscribing observers, or any combination of these. This will increase the availability of high performance nodes to allow increased parallelisation of more computationally expensive tasks, thus increasing the overall efficiency of cluster usage. The resulting distributed infrastructure is resource-optimal, low-latency, fault-tolerant, and allows for highly dynamic data processing.
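
The queue-and-consumer pattern described in the abstract can be illustrated with a minimal sketch. This is not the Diamond implementation: it assumes in-process threads in place of distributed servers and Python's standard `queue.Queue` in place of a message broker, and every name in it (`analyse_image`, `worker`, `run_pipeline`, the fake "spots" value) is hypothetical. It shows fine-grained per-image tasks being placed on a central queue, consumed by a variable number of workers, and the results being both forwarded to a downstream queue and delivered to subscribing observers.

```python
import queue
import threading

SENTINEL = None  # hypothetical shutdown marker, one is sent per worker


def analyse_image(image_id):
    """Stand-in for a per-image analysis step (assumption: the real work
    would be e.g. spot finding); returns a fabricated, deterministic value."""
    return {"image": image_id, "spots": image_id * 2}


def worker(tasks, downstream, observers):
    """Consume task messages until a sentinel arrives."""
    while True:
        image_id = tasks.get()
        if image_id is SENTINEL:
            break
        result = analyse_image(image_id)
        downstream.put(result)      # hand over for further processing
        for notify in observers:    # fan out to subscribing observers
            notify(result)


def run_pipeline(image_ids, n_workers=4):
    """Submit one message per image and process them with n_workers consumers."""
    tasks = queue.Queue()       # the central task queue
    downstream = queue.Queue()  # queue for downstream processing
    seen = []
    lock = threading.Lock()

    def observer(result):
        # A single example subscriber that records which images it saw.
        with lock:
            seen.append(result["image"])

    threads = [
        threading.Thread(target=worker, args=(tasks, downstream, [observer]))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for image_id in image_ids:
        tasks.put(image_id)
    for _ in threads:
        tasks.put(SENTINEL)
    for thread in threads:
        thread.join()

    results = []
    while not downstream.empty():
        results.append(downstream.get())
    # Workers finish in arbitrary order, so sort before returning.
    return sorted(results, key=lambda r: r["image"]), sorted(seen)
```

Calling `run_pipeline(range(10), n_workers=3)` processes ten image tasks on three workers. Changing `n_workers` alters only how many consumers share the queue, which is the property the abstract relies on when the workload exceeds or falls below the capacity of a single node.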

Author: Dr Markus Gerstel (Diamond Light Source Ltd)

Co-authors: Dr Alun Ashton (Diamond Light Source Ltd), Dr Graeme Winter (Diamond Light Source), Dr Richard Gildea (Diamond Light Source Ltd)

Slides: Data Analysis Infrastructure for Diamond MX and CX
