Towards a high quality public powder X-ray diffraction dataset

— About
We, the AiMat group at the Karlsruhe Institute of Technology (KIT), are collecting a dataset of both labeled and unlabeled experimental powder X-ray diffraction (pXRD) data, which will be made publicly available once sufficient data has been collected. We look forward to publishing a paper surveying the database and discussing its uses, and would be happy to include anyone contributing data as co-authors. You can view the current draft here: State-of-the-database

The main motivation for collecting and publishing this dataset is to assist researchers working on machine-learning-based approaches to pXRD analysis. Any data, even unlabeled data, can provide clues as to how simulations can be made to better reflect real-world experiments, and labeled data can be used to test developed algorithms. We hope to provide both types of data in a wide variety and large quantity.

While we do not collect single-crystal data, we are interested in any kind of polycrystalline sample, including thin films. If your samples exhibit preferred crystallite orientation, please mention this in the description field of the upload form. So far, we have received more than 4155 unique powder X-ray diffraction patterns from institutions including:

  • University of Southern California
  • Lawrence Berkeley National Laboratory
  • Institute of Nanotechnology at KIT

How to contribute
Please upload your data using the “Data upload” form below. There are two upload fields, one for the files and one for the labels.

(1) Data and .cif files: Please submit your intensity-over-angle data files in the first upload field as a single .zip archive. If possible, sort the data files in your .zip file into folders by the material system of the underlying samples, and make sure that your contribution contains only pXRD files and no other types of XRD data. If any information on the samples is available to you as .cif files, please include them alongside the data files, each with the same filename as the data file it describes.
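As a rough aid for the pairing rule above, the following Python sketch lists .cif entries in an archive that have no data file with the same folder and filename stem. The function names and the example paths are hypothetical, and the check treats every non-.cif entry as a data file, which is an assumption rather than a rule of the upload form.

```python
import zipfile
from pathlib import PurePosixPath


def cifs_without_data(names):
    """Return .cif entries that have no data file with the same folder and stem.

    Assumption: every entry that is not a .cif file counts as a data file.
    """
    # Zip entry names use forward slashes; directory entries end with "/".
    paths = [PurePosixPath(n) for n in names if not n.endswith("/")]
    data_keys = {(p.parent, p.stem) for p in paths if p.suffix.lower() != ".cif"}
    return [
        str(p)
        for p in paths
        if p.suffix.lower() == ".cif" and (p.parent, p.stem) not in data_keys
    ]


def check_archive(zip_path):
    """Run the same check on an existing .zip archive (opened read-only)."""
    with zipfile.ZipFile(zip_path) as archive:
        return cifs_without_data(archive.namelist())


# Hypothetical archive listing, sorted into folders by material system.
example = ["LiCoO2/sample_01.xy", "LiCoO2/sample_01.cif", "TiO2/sample_07.cif"]
print(cifs_without_data(example))
```

An empty result means every .cif file sits next to a data file of the same name.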

Our preferred data file formats are: .json, .raw, .xrdml, .txt, .dat, .cif, .rd, .udf, .uxd, .gss, .cpi, .dbw, .mca, .cnf, .xdd, .xy, .xsyg.
If your data files have a different format, please include instructions for how to convert them into intensity-over-angle files in the description field of the submission form. If your files map intensity over q value instead of angle, please also mention this in the description field.

You can download a template .zip file here:

(2) (Optional) Labels: If any labels (i.e. sample properties) are available to you, please submit them as a single .csv file delimited with “,”, with the first column specifying file paths in the .zip archive and the following columns specifying properties of the underlying sample. You can simply leave out any columns for which you have no information. In your .csv file, specify the:

You can view and download a template .csv file here: template.csv.
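A minimal Python sketch of writing such a label file is shown here. The header names ("path", "composition", "temperature_K") and the sample paths are hypothetical placeholders; the actual column names are the ones defined in template.csv. Leaving a cell empty when a single file lacks a value is likewise an assumption.

```python
import csv
import io


def labels_csv(rows, property_columns):
    """Build comma-delimited label text: file path first, then sample properties.

    rows maps a file path inside the .zip archive to a dict of properties.
    Missing values are written as empty cells (an assumption, see above).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
    writer.writerow(["path", *property_columns])  # hypothetical header names
    for path, properties in rows.items():
        writer.writerow([path, *(properties.get(c, "") for c in property_columns)])
    return buffer.getvalue()


# Hypothetical example data.
rows = {
    "TiO2/sample_01.xy": {"composition": "TiO2", "temperature_K": 300},
    "TiO2/sample_02.xy": {"composition": "TiO2"},
}
text = labels_csv(rows, ["composition", "temperature_K"])
print(text)
```

Writing `text` to a file with `open("labels.csv", "w", newline="")` produces the single .csv file for the second upload field.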

(Optional): Submission helper:
To help you in preparing files for submission, we offer a “submission helper”, which you can download as an executable below in the “Read more” field. The usage of the submission helper is entirely optional.

The submission helper allows you to search a folder of your choosing for files matching common XRD formats. You can add or remove formats according to your preferences. From the list of matched files, you can then select the files to be copied into a .zip file ready for submission through the form below. Once you confirm, a .zip file with the copied data is created, along with a corresponding .csv file.
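The workflow described above can be approximated in a few lines of Python. This is an illustrative sketch, not the source code of the submission helper: the function names and the default set of extensions (a subset of the preferred formats) are assumptions. The source files are only read, never modified.

```python
import zipfile
from pathlib import Path

# Assumed default subset of the preferred formats; adjust as needed.
DEFAULT_FORMATS = {".xy", ".xrdml", ".raw", ".dat", ".txt"}


def find_xrd_files(folder, formats=DEFAULT_FORMATS):
    """Recursively collect files whose extension matches one of the formats."""
    wanted = {f.lower() for f in formats}
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def write_submission_zip(folder, files, zip_path):
    """Copy the selected files into a .zip archive, keeping the folder layout.

    ZipFile.write only reads the source files; they are left unchanged.
    """
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=path.relative_to(folder).as_posix())
    return Path(zip_path)


# Example usage (paths are placeholders):
# selected = find_xrd_files("my_measurements")
# write_submission_zip("my_measurements", selected, "submission.zip")
```

Keeping the relative paths as archive names preserves any folder structure by material system, so the first column of a label file can refer to the same paths.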

In the process of creating the .zip file, your data files are only ever opened in read mode. Even if the program were to crash during execution, your data files remain safe and unchanged. Warnings that Chrome or Microsoft may show when downloading and executing the software can be ignored. The source code for these executables can be viewed on the same GitHub repository that also features the paper draft.

On Ubuntu: To run the tool, execute the following command in the download folder: “chmod +x aimat_xrd_ubuntu && ./aimat_xrd_ubuntu”

— Data upload:
In addition to uploading your data, please also fill out the form below, so we can keep track of who has contributed to the dataset.
If you have any questions regarding the upload process or the submission helper, feel free to direct them to Daniel Hollarek or Pascal Friederich.

Thank you for your contribution!

Does the data contain in-situ/in-operando experiments?