Kirchmair Group | Servers, Software, Data

The New E-Resource for Drug Discovery (NERDD)

Most of the tools developed in our research group and presented on the Research website are available via NERDD. The web platform is designed to be maintainable and scalable. NERDD meets modern security standards and supports encrypted communication via HTTPS. The web service is linked to an in-house high-performance computing facility that can handle large numbers of concurrent requests.

The Sperrylite Dataset

The Sperrylite Dataset is a complete collection of high-quality structures of protein-bound ligand conformations extracted from the PDB. It consists of a total of 10,936 high-quality structures of 4548 unique ligands and hence offers a unique resource for the study of protein-bound ligand conformations.

The Sperrylite Dataset was compiled with a recently published cheminformatics pipeline that automatically (i) prepares the chemical structures of small molecules by taking into account the protein environment (in order to determine, e.g., the most likely tautomeric and protonation states); (ii) removes undesirable molecules such as crystallization aids as well as structures with topological and/or geometrical errors; and (iii) rejects structures of low quality. Importantly, the procedure not only checks the resolution and the diffraction-component precision index (DPI), but also employs the recently developed EDIA method (electron density score for individual atoms) to assess how well the individual atoms of a structure are supported by the electron density.
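
The quality checks in step (iii) can be illustrated with a minimal Python sketch. It is not the published pipeline: the `LigandStructure` record and the `passes_quality_checks` function are hypothetical names, the cut-offs are passed in as arguments because they are not stated here, and requiring every atom to reach the EDIA cut-off is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LigandStructure:
    """Hypothetical record for one protein-bound ligand structure."""
    ligand_id: str
    resolution: float       # resolution of the crystal structure, in Å
    dpi: float              # diffraction-component precision index, in Å
    atom_edia: List[float]  # one EDIA score per atom (higher = better support)


def passes_quality_checks(structure: LigandStructure,
                          max_resolution: float,
                          max_dpi: float,
                          min_edia: float) -> bool:
    """Return True if the structure passes all three quality checks.

    The cut-offs are arguments on purpose: the values used by the
    actual pipeline are not given in this text.
    """
    if structure.resolution > max_resolution:
        return False
    if structure.dpi > max_dpi:
        return False
    # Assumption: every individual atom must be supported by the density.
    return all(score >= min_edia for score in structure.atom_edia)


# Usage with placeholder cut-offs (illustrative values only).
candidates = [
    LigandStructure("LIG_A", resolution=1.5, dpi=0.2, atom_edia=[0.9, 1.0]),
    LigandStructure("LIG_B", resolution=1.5, dpi=0.2, atom_edia=[0.9, 0.3]),
    LigandStructure("LIG_C", resolution=3.1, dpi=0.2, atom_edia=[0.9, 1.0]),
]
accepted = [s.ligand_id for s in candidates
            if passes_quality_checks(s, max_resolution=2.0,
                                     max_dpi=0.5, min_edia=0.8)]
print(accepted)
```

In this toy input, LIG_B is rejected because one atom falls below the placeholder EDIA cut-off, and LIG_C because of its resolution.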

The Sperrylite Dataset contains (among others) a total of 91 ligands represented by at least ten high-quality structures of their protein-bound conformations. We recently published an analysis of the conformational diversity of these ligands. Of these 91 molecules, 69 had at least two distinct conformations (defined by an RMSD greater than 1 Å). For a representative subset of 17 approved drugs and cofactors, we observed a clear trend toward the formation of a small number of clusters of highly similar conformers. Even in proteins that share very low sequence identity, ligands were regularly found to adopt similar conformations. For cofactors, a clear trend toward extended conformations was observed, although coiled conformers were also found in a few cases.
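
The criterion for distinct conformations can be sketched as follows. This is a simplified illustration, not the analysis code behind the published study: it assumes that all conformers of a ligand list their atoms in the same order and have already been superposed, and it ignores molecular symmetry. The function names are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two conformers.

    Each conformer is a list of (x, y, z) tuples with atoms in the
    same order. No superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformers must have the same, non-zero atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def has_distinct_conformations(conformers, threshold=1.0):
    """True if any pair of conformers differs by an RMSD above the threshold."""
    return any(rmsd(a, b) > threshold for a, b in combinations(conformers, 2))


# Toy two-atom example (coordinates in Å).
conf_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_2 = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(has_distinct_conformations([conf_1, conf_1]))
print(has_distinct_conformations([conf_1, conf_1, conf_2]))
```

A production analysis would normally rely on a cheminformatics toolkit for superposition and symmetry-aware atom matching before comparing RMSD values against the 1 Å threshold.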

The Sperrylite Dataset has been published in Frontiers in Chemistry. The full dataset can be downloaded here; the subset of 91 ligands represented by at least ten high-quality conformations is available for download here.

The Platinum Dataset

The Platinum Dataset is a complete subset of unique molecules of the Sperrylite Dataset and contains more than 4500 high-quality structures. It was designed as a benchmark dataset for assessing the performance of conformer ensemble generators. The first version of the Platinum Dataset was published in the Journal of Chemical Information and Modeling in early 2017. An updated version was recently published in the same journal as part of a benchmarking study of eight commercial conformer ensemble generators.

Regularly updated versions of the Platinum Dataset can be downloaded below.

The following versions of the Platinum Dataset are available for download:

  • Versions: Platinum Dataset 2016_01 (as published in DOI:10.1021/acs.jcim.6b00613); Platinum Dataset 2017_01
  • Data extracted from the PDB on: February 12, 2016 (2016_01); February 16, 2017 (2017_01)
  • No. of compounds, Platinum Dataset: 4626 (2016_01); 4548 (2017_01)
  • No. of compounds, Platinum Diverse Dataset: 2912 (2016_01); 2859 (2017_01)
  • Compounds present in both the 2016_01 and 2017_01 versions of the Platinum Dataset: 4456
  • Compounds present in both the 2016_01 and 2017_01 versions of the Platinum Diverse Dataset: 2763
  • Compounds removed from the 2016_01 Platinum Dataset: 170
  • Compounds added to the 2017_01 Platinum Dataset: 92
  • Download: Platinum Dataset 2016_01; Platinum Dataset 2017_01; Platinum Diverse Dataset 2016_01; Platinum Diverse Dataset 2017_01
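
The counts listed above are related by simple set arithmetic: the shared compounds are the 2016_01 compounds minus those removed, and the 2017_01 total is the shared compounds plus those added. The Python sketch below checks the reported numbers and shows how the same comparison could be made on two downloaded versions. The `compare_versions` function and the compound identifiers in the toy example are hypothetical.

```python
def compare_versions(old_ids, new_ids):
    """Count shared, removed and added compounds between two versions."""
    old, new = set(old_ids), set(new_ids)
    return {
        "shared": len(old & new),   # present in both versions
        "removed": len(old - new),  # only in the older version
        "added": len(new - old),    # only in the newer version
    }


# Consistency check on the counts reported for the Platinum Dataset.
TOTAL_2016_01, TOTAL_2017_01 = 4626, 4548
SHARED, REMOVED, ADDED = 4456, 170, 92
assert TOTAL_2016_01 - REMOVED == SHARED
assert SHARED + ADDED == TOTAL_2017_01

# Toy example with made-up compound identifiers.
toy_2016 = ["AAA", "BBB", "CCC", "DDD"]
toy_2017 = ["AAA", "BBB", "DDD", "EEE", "FFF"]
summary = compare_versions(toy_2016, toy_2017)
print(summary)
```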

Change log

  • Platinum Dataset 2017_01
    • Refined EDIA method
    • Pipeline now rejects ligands that are wrongly annotated as “free” ligands in the PDB (while actually being covalently bound)
    • Pipeline now also rejects ligands with planarity issues of aromatic systems
  • Platinum Dataset 2016_01