matsml is a Python-based machine-learning (ML) toolkit for generic problems in materials science. Initiated to support some of Huan Tran's non-polymer ML research, matsml was designed to be portable and self-contained, providing the materials needed to follow the workflows and reproduce the reported results. To that end, some examples in matsml contain the actual scripts and data used in those works, while others serve tutorial and other purposes. For example, the Jupyter notebook ex4_hoips.ipynb and its PDF version ex4_hoips.pdf, which detail the training of the ML models reported in Phys. Rev. Materials 5, 125402 (2021), are included in one of the toolkit's examples. matsml is available at https://github.com/huantd/matsml.git.
The traditional workflow of materials informatics, which has been widely used and is assumed in this toolkit, includes preparing, generating, and collecting suitable data; featurizing (or fingerprinting) the data; learning from the featurized data to build models; using the developed models to make predictions; inverting the models to solve inverse problems; and more. This toolkit does not aim to provide complete solutions for any of these steps. However, demonstrations of most of the typical workflow can be found within matsml, specifically in examples/ex1_pcm-molecs and other examples.
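As a rough illustration of these steps, here is a minimal, standard-library-only sketch. It does not use matsml's API. The toy compositions, the made-up property values, and the helper names (`featurize`, `fit_line`, `predict`, `invert`) are all hypothetical and chosen only to make the workflow concrete.

```python
from statistics import fmean

# Step 1 (data): hypothetical binary compounds A_n B_m with a made-up property.
# The values follow y = 2*x + 1, where x is the fraction of A, so the fit can be checked.
DATA = [
    ({"A": 1, "B": 3}, 1.5),
    ({"A": 1, "B": 1}, 2.0),
    ({"A": 3, "B": 1}, 2.5),
    ({"A": 1, "B": 0}, 3.0),
]


def featurize(composition):
    """Step 2: map a composition to one numeric feature (fraction of A)."""
    total = sum(composition.values())
    return composition.get("A", 0) / total


def fit_line(xs, ys):
    """Step 3: ordinary least-squares fit of y = slope * x + intercept."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def predict(model, composition):
    """Step 4: predict the property of a new composition."""
    slope, intercept = model
    return slope * featurize(composition) + intercept


def invert(model, target, candidates):
    """Step 5: pick the candidate whose predicted property is closest to target."""
    return min(candidates, key=lambda c: abs(predict(model, c) - target))


features = [featurize(c) for c, _ in DATA]
targets = [y for _, y in DATA]
model = fit_line(features, targets)

# Hypothetical candidate pool for the inverse problem.
candidates = [{"A": 1, "B": 4}, {"A": 2, "B": 3}, {"A": 4, "B": 1}]
best = invert(model, target=2.6, candidates=candidates)
```

Here the inverse step is a brute-force search over a fixed candidate list. That is the simplest possible choice and is used only for illustration. Real inverse-design problems typically need a more capable search or optimization strategy.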
In fact, the field has recently been moving rapidly, with numerous new developments that go well beyond this classical workflow. These advances will be incorporated into matsml as soon as possible.
Most of the computed data referred to in this toolkit are from Huan's works; the rest are openly reported data. Where experimental data are subject to copyright and ownership restrictions, suitable freely available alternatives are provided for demonstration purposes.