Contact: daniel.hollarek@student.kit.edu
Towards a high quality public powder X-ray diffraction dataset
— About
We, the AiMat group at the Karlsruhe Institute of Technology (KIT), are collecting a dataset of both labeled and unlabeled experimental powder X-ray diffraction (pXRD) data, which will be made publicly available once sufficient data has been collected. We look forward to publishing a paper surveying the database and discussing its uses, and would be happy to include anyone who contributes data as a co-author. You can view the current draft here: State-of-the-database
The main motivation for collecting and publishing this dataset is to assist researchers working on machine-learning-based approaches to pXRD analysis. Any data, even unlabeled data, can provide clues as to how simulations can be made to better reflect real-world experiments, and labeled data can be used to test newly developed algorithms. We hope to provide both types of data in wide variety and large quantity.
While we do not collect single-crystal data, we are interested in any kind of polycrystalline sample, including thin films. If your samples exhibit preferred crystallite orientation, please mention this in the description field of the upload form. So far, we have received more than 4155 unique powder X-ray diffraction patterns from institutions including:
- University of Southern California
- Lawrence Berkeley National Laboratory
- Institute of Nanotechnology at KIT
— How to contribute
Please upload your data using the “Data upload” form below. There are two upload fields, one for the files and one for the labels.
(1) Data and .cif files: Please submit your intensity-over-angle data files in the first upload field as a single .zip archive. If possible, sort the data files in your .zip file into folders by the material system of the underlying samples, and make sure that your contribution contains only pXRD files and no other types of XRD data. If any information on the samples is available to you in the form of .cif files, please include them alongside the data files, each with the same filename as the data file it belongs to.
Our preferred data file formats are: .json, .raw, .xrdml, .txt, .dat, .cif, .rd, .udf, .uxd, .gss, .cpi, .dbw, .mca, .cnf, .xdd, .xy, .xsyg.
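If you would like to assemble the archive with a script rather than by hand, the following is a minimal Python sketch of one way to do it. It assumes your files already sit in sub-folders named after their material system; the function name `build_submission_zip` and the folder layout are illustrative assumptions, not part of our tooling or of the submission helper.

```python
import zipfile
from pathlib import Path

# Extensions copied from the list of preferred data file formats.
PREFERRED_EXTENSIONS = {
    ".json", ".raw", ".xrdml", ".txt", ".dat", ".cif", ".rd", ".udf", ".uxd",
    ".gss", ".cpi", ".dbw", ".mca", ".cnf", ".xdd", ".xy", ".xsyg",
}


def build_submission_zip(source_dir, zip_path):
    """Pack every file with a preferred extension into a single .zip archive.

    Sub-folders of source_dir (assumed here to be named after material
    systems) are kept as folders inside the archive. Files with any other
    extension are skipped. Returns the list of paths stored in the archive.
    """
    source_dir = Path(source_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() not in PREFERRED_EXTENSIONS:
                continue
            # Store the path relative to source_dir, with forward slashes.
            arcname = path.relative_to(source_dir).as_posix()
            archive.write(path, arcname)
            archived.append(arcname)
    return archived
```

Calling `build_submission_zip("my_data", "submission.zip")` returns the archived paths, which makes it easy to check that each .cif file sits next to the data file it belongs to.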
If your data files have a different format, please include instructions for how to convert them into intensity-over-angle files in the description field of the submission form. If your files map intensity over q value instead of angle, please also mention this in the description field.
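For reference, q values and diffraction angles are related through the X-ray wavelength, which is why the wavelength is worth mentioning alongside q-based data. The sketch below assumes the common convention q = 4π·sin(θ)/λ; some instruments and programs use a different definition without the factor 2π, so please check which one applies to your files. The function name and the Cu Kα1 wavelength in the usage line are illustrative assumptions only.

```python
import math


def q_to_two_theta(q, wavelength):
    """Convert a scattering vector magnitude q to the angle 2θ in degrees.

    Assumes the convention q = 4π·sin(θ)/λ. q and wavelength must use
    matching units, for example 1/Å and Å.
    """
    sin_theta = q * wavelength / (4.0 * math.pi)
    if not 0.0 <= sin_theta <= 1.0:
        raise ValueError("q cannot be reached at this wavelength")
    return math.degrees(2.0 * math.asin(sin_theta))


# Example call with the Cu Kα1 wavelength (1.5406 Å), chosen for illustration.
two_theta = q_to_two_theta(2.0, 1.5406)
```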
You can download a template .zip file here: template.zip
(2) (Optional) Labels: If any labels (i.e. sample properties) are available to you, please submit them as a single .csv file delimited with “,”. The first column specifies the file paths in the zip archive, and the following columns specify properties of the underlying sample. You can simply leave out any columns for which you have no information. In your .csv file, specify the:
- Lattice parameters in nm
- Angles in degrees
- Space group through the International space group number
- Element composition as specified by the IUPAC conventions for inorganic chemistry
You can view and download a template .csv file here: template.csv.
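As an illustration of the label file layout described above, here is a minimal Python sketch that writes such a .csv file. The column names used here are hypothetical placeholders; please take the actual names from template.csv. The sketch only writes columns for which at least one sample has a value, so unknown properties are left out.

```python
import csv

# Hypothetical column names, for illustration only; see template.csv for
# the real ones. Lattice parameters are in nm, angles in degrees.
COLUMNS = ["path", "a", "b", "c", "alpha", "beta", "gamma",
           "space_group", "composition"]


def write_labels_csv(csv_path, rows):
    """Write one row per data file to a comma-delimited .csv file.

    rows is a list of dicts keyed by entries of COLUMNS. Columns that no
    row provides are omitted; missing values in a row become empty cells.
    Returns the list of columns actually written.
    """
    used = [name for name in COLUMNS if any(name in row for row in rows)]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=used, delimiter=",", restval="")
        writer.writeheader()
        writer.writerows(rows)
    return used


# Example rows; the file paths are made up and refer to paths inside the .zip.
example_rows = [
    {"path": "NaCl/sample_01.xy", "a": 0.564, "b": 0.564, "c": 0.564,
     "alpha": 90, "beta": 90, "gamma": 90,
     "space_group": 225, "composition": "NaCl"},
    {"path": "NaCl/sample_02.xy", "space_group": 225},
]
```

Calling `write_labels_csv("labels.csv", example_rows)` writes a header line followed by one line per data file.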
— (Optional): Submission helper:
To help you prepare files for submission, we offer a “submission helper”, which you can download as an executable below in the “Read more” field. Using the submission helper is entirely optional.
— Data upload:
In addition to uploading your data, please also fill out the form below, so we can keep track of who has contributed to the dataset.
If you have any questions regarding the upload process or the submission helper, feel free to contact Daniel Hollarek (daniel.hollarek@student.kit.edu) or Pascal Friederich (pascal.friederich@kit.edu).
Thank you for your contribution!