FAIR-DI e.V.
FAIRmat
NOMAD Laboratory
NOMAD CoE

Pillar C: Soft-matter and biomolecular simulations

Spokesperson: Kurt Kremer (Max Planck Institute for Polymer Research, Mainz)
Deputy: Carsten Baldauf (Fritz Haber Institute of the Max Planck Society, Berlin)

SCOPE

The field of biophysical and soft-matter simulations covers a wide range of methodologies, from all-electron simulations through atomistic resolution and coarse graining to finite-element methods, as well as hybrid methods combining them. It touches on related fields such as computational/theoretical chemistry, materials science and condensed-matter physics. In the field of molecular-mechanics-based atomistic simulations alone, a multitude of computer codes using many different force fields and parametrizations are in use.

GOALS

  • To develop an infrastructure for the upload, storage, and sharing of input and output files of diverse simulation types;
  • To raise awareness in the community of the importance of sharing simulation outputs publicly and accessibly, in particular to allow data storage in accordance with the rules of good scientific practice, but also to advance science through data sharing and to increase the reach and visibility of one’s own research.

MEANS

First, Pillar C will need to devise a categorization of simulations and develop a data infrastructure that supports upload, processing, categorization, normalization, storage and sharing, following the example of the NOMAD Repository and Archive. We plan to extend the infrastructure developed in NOMAD to trajectories from simulations (e.g., molecular dynamics) and to force fields (including the storage of run parameters, etc.). Molecular dynamics (MD) simulations in particular cover a large range of multiscale biophysics and soft-matter problems and are thus an essential starting point. The first step will be parsing the input and output files of the main existing MD codes, e.g., GROMACS or LAMMPS. It is crucial to obtain all parameters relevant for propagating the simulation, including the force field used. This will require some flexibility, given the tendency to use custom force fields, especially in the soft-matter community. The parsed output should contain enough information to perform any commonly applied analysis. This includes, but is not limited to:

  • Conformational/configurational averages of structural order parameters (e.g., radial distribution function, radius of gyration, root-mean-square deviation), in order to extract Boltzmann averages from trajectories.
  • Kinetic analyses, such as Markov state models.
  • Free-energy calculations, for example thermodynamic integration, which requires conformational averages under a series of different Hamiltonians.
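For the parameter-extraction step described above, a minimal sketch of reading GROMACS run parameters (.mdp files) into a flat dictionary is shown below. It assumes the simple "key = value" line format with ";" comments and treats "_" and "-" in parameter names as interchangeable; the function name parse_mdp is hypothetical and this is not the NOMAD parser. Force-field terms live in the GROMACS topology files (.top/.itp), which would need a separate and considerably richer parser.

```python
from typing import Dict


def parse_mdp(text: str) -> Dict[str, str]:
    """Parse GROMACS .mdp run-parameter text into a {parameter: value} dict.

    Assumptions of this sketch: one 'key = value' pair per line, ';' starts
    a comment, and '_' and '-' in parameter names are interchangeable
    (normalized to '-'). Values stay raw strings; typing and unit handling
    would belong to a later normalization step.
    """
    params: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split(";", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = (part.strip() for part in line.split("=", 1))
        params[key.replace("_", "-")] = value
    return params


# Hypothetical example input; values are illustrative only.
example = """
; minimal NVT run
integrator  = md
dt          = 0.002     ; ps
nsteps      = 50000
tcoupl      = V-rescale
tc_grps     = Protein Non-Protein
ref_t       = 300 300
"""

params = parse_mdp(example)
print(params["tc-grps"])  # Protein Non-Protein
```

Keeping values as raw strings defers decisions about types and units (e.g., that dt is in ps) to the normalization stage, where they can be mapped onto a common schema across codes.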
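As an illustration of the structural averages in the first item of the list above, the following sketch computes the mass-weighted radius of gyration and the root-mean-square deviation (RMSD) after optimal superposition with the Kabsch algorithm, then averages both over frames. It assumes coordinates are NumPy arrays of shape (N, 3) with no periodic-boundary handling, and that the trajectory samples the equilibrium ensemble without bias, so a plain time average approximates the Boltzmann average. The function names are hypothetical; the radial distribution function is omitted because it needs periodic-image handling.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame (coords: N x 3)."""
    com = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - com) ** 2, axis=1)
    return np.sqrt(np.average(sq_dist, weights=masses))


def kabsch_rmsd(coords, reference):
    """RMSD after optimal rigid-body superposition (Kabsch algorithm)."""
    p = coords - coords.mean(axis=0)
    q = reference - reference.mean(axis=0)
    h = p.T @ q  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid improper rotations
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))


def trajectory_averages(frames, masses, reference):
    """Time averages of Rg and RMSD over a list of frames."""
    rg = np.array([radius_of_gyration(f, masses) for f in frames])
    rmsd = np.array([kabsch_rmsd(f, reference) for f in frames])
    return rg.mean(), rmsd.mean()


# Usage: a rigidly rotated and translated copy of the reference.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = ref @ rot.T + np.array([1.0, -2.0, 0.5])
masses = np.ones(20)
mean_rg, mean_rmsd = trajectory_averages([ref, moved], masses, ref)
```

Because the second frame differs from the reference only by a rigid-body motion, its RMSD after superposition is zero up to rounding, and the radius of gyration is unchanged.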
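For the kinetic analysis mentioned in the list above, the core of a Markov state model can be sketched as follows: count transitions between discrete states at a chosen lag time, row-normalize the counts into a transition matrix, and obtain the stationary distribution as its left eigenvector with eigenvalue 1. This assumes the trajectory has already been discretized into integer state labels (e.g., by clustering); the simple estimator here does not enforce detailed balance, and the function names are hypothetical.

```python
import numpy as np


def msm_transition_matrix(dtraj, n_states, lag):
    """Row-stochastic transition matrix from one discrete trajectory.

    Counts transitions i -> j between frames t and t + lag (sliding window)
    and normalizes each row by its total count.
    """
    if lag < 1:
        raise ValueError("lag must be a positive number of frames")
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never visited as a starting point keep an all-zero row.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


def stationary_distribution(t):
    """Left eigenvector of t with eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(t.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()


# Usage with a toy two-state trajectory.
dtraj = [0, 0, 1, 1, 0, 0, 1, 1]
t = msm_transition_matrix(dtraj, n_states=2, lag=1)
pi = stationary_distribution(t)
```

For this toy trajectory the transition matrix is [[1/2, 1/2], [1/3, 2/3]], whose stationary distribution is [0.4, 0.6]. In practice the lag time is chosen so that the dynamics between discretized states are approximately Markovian.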
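For thermodynamic integration, listed above, the free-energy difference between two Hamiltonians connected by a coupling parameter λ is ΔF = ∫₀¹ ⟨∂H/∂λ⟩_λ dλ, where each ensemble average is taken from a separate simulation at fixed λ. The sketch below assumes the per-window samples of ∂H/∂λ have already been extracted from the parsed output and integrates the window means with the trapezoidal rule; other quadratures are common, and the function names are hypothetical.

```python
import numpy as np


def ti_free_energy(lambdas, dh_dlambda_means):
    """Trapezoidal-rule estimate of the integral of <dH/dlambda> over lambda."""
    lam = np.asarray(lambdas, dtype=float)
    y = np.asarray(dh_dlambda_means, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(lam)))


def ti_from_samples(windows):
    """windows: list of (lambda, samples of dH/dlambda at that lambda)."""
    windows = sorted(windows, key=lambda w: w[0])
    lambdas = [lam for lam, _ in windows]
    means = [np.mean(samples) for _, samples in windows]
    return ti_free_energy(lambdas, means)


# Usage: a linear <dH/dlambda> = 10 - 4*lambda integrates to exactly 8.
lams = np.linspace(0.0, 1.0, 11)
delta_f = ti_free_energy(lams, 10.0 - 4.0 * lams)
```

Storing the raw ∂H/∂λ samples per window, rather than only the final ΔF, allows the integration and error analysis to be redone later with a different quadrature.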

In all cases, the original uploaded files will be stored alongside the normalized (extracted) data. Beyond these first fundamental steps, future goals include:

  • Quality measures, e.g., for the equilibration of simulations or the accuracy of potentials.
  • Links to existing online resources, which are imperative in particular for biomolecular simulations, such as those of the Protein Data Bank (www.wwpdb.org), the European Bioinformatics Institute (www.ebi.ac.uk), ExPASy (www.expasy.org), or the Kyoto Encyclopedia of Genes and Genomes (www.genome.jp/kegg/).

  • Making the data usable and accessible for analyses based on artificial-intelligence methods.
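As one possible illustration of an equilibration quality measure, the sketch below uses block averaging to estimate the standard error of the mean of an observable time series, and then applies a simple drift heuristic: the first- and second-half means should agree within a few combined standard errors. The heuristic, its default tolerance, and the function names are hypothetical choices for illustration, not an established NOMAD measure; block averaging also assumes each block is longer than the correlation time of the series.

```python
import numpy as np


def block_standard_error(series, n_blocks):
    """Standard error of the mean estimated from n_blocks contiguous blocks."""
    x = np.asarray(series, dtype=float)
    block_len = len(x) // n_blocks
    blocks = x[: block_len * n_blocks].reshape(n_blocks, block_len).mean(axis=1)
    return blocks.std(ddof=1) / np.sqrt(n_blocks)


def looks_equilibrated(series, n_blocks=10, tolerance=2.0):
    """Hypothetical heuristic: half-means agree within `tolerance` errors."""
    x = np.asarray(series, dtype=float)
    half = len(x) // 2
    first, second = x[:half], x[half:]
    err = np.hypot(block_standard_error(first, n_blocks),
                   block_standard_error(second, n_blocks))
    return bool(abs(first.mean() - second.mean()) <= tolerance * err)


# Usage: a steadily drifting observable versus a stationary oscillation.
drifting = np.linspace(0.0, 10.0, 1000)
stationary = np.tile([1.0, -1.0], 500)
```

Here the linear ramp is flagged as not equilibrated because its half-means differ by about 5 while the combined block error is well below 1, whereas the stationary oscillation has identical half-means.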