Publication Title

Data in Brief


Asphaltenes, a distinct class of molecules found in crude oil, are insoluble in nonpolar solvents such as n-heptane but soluble in aromatic solvents such as toluene and benzene. Understanding asphaltenes is crucial in the petroleum industry because they impair oil processing, causing significant economic losses and production disruptions. Although no single structure defines asphaltenes, two major molecular architectures, the archipelago and continental models, have gained wide acceptance because they are consistent with various experimental investigations and have subsequently been used in computational studies.

The archipelago model comprises two or more polyaromatic hydrocarbon units interconnected by aliphatic chains. In contrast, the island or continental model features a single polyaromatic hydrocarbon core with 4 to 10 fused aromatic rings, averaging around 7 rings.

To build a comprehensive collection, we curated over 250 asphaltene structures from previous experimental and computational studies in this field. The curation process involved an extensive literature survey, conversion of figures from publications into molecular structure files, careful verification of conversion accuracy, and structure editing to ensure consistency with the corresponding molecular formulas. Our database provides digital structure files and optimized geometries for both predominant structural motifs. Geometries were first optimized with the PM6 semi-empirical method and then further optimized using density functional theory with the B3LYP functional and the 6-31+G(d,p) basis set. Furthermore, we compiled a range of structural and electronic features for these molecules, providing a foundation for applying machine learning algorithms to the study of asphaltenes. This work provides a ready-to-use structural database of asphaltenes and sets the stage for future research in this domain.
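As a rough illustration of how the two architectures differ, the sketch below groups a molecule's aromatic rings into fused ring systems (rings that share a bond) using a union-find structure, then applies the ring-count criteria of the two models. It is a simplification under stated assumptions: rings are supplied as tuples of atom indices (for instance, from a cheminformatics toolkit's ring perception), every separate aromatic ring system counts as its own unit, the aliphatic linkage between units is not checked, and the function names are hypothetical rather than part of the database.

```python
from itertools import combinations


def fused_ring_systems(aromatic_rings):
    """Group aromatic rings into fused ring systems.

    Each ring is a tuple of atom indices. Two rings are treated as fused
    when they share at least two atoms, i.e. a common bond; rings joined
    only through a single bond or an aliphatic chain stay separate.
    """
    parent = list(range(len(aromatic_rings)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of rings that shares a bond.
    for i, j in combinations(range(len(aromatic_rings)), 2):
        if len(set(aromatic_rings[i]) & set(aromatic_rings[j])) >= 2:
            parent[find(i)] = find(j)

    # Collect the rings belonging to each root.
    systems = {}
    for i, ring in enumerate(aromatic_rings):
        systems.setdefault(find(i), []).append(ring)
    return list(systems.values())


def classify_architecture(aromatic_rings, min_core=4, max_core=10):
    """Return 'archipelago', 'continental' or 'other' (hypothetical rule)."""
    systems = fused_ring_systems(aromatic_rings)
    if len(systems) >= 2:
        # Two or more separate aromatic units (the linkage is not checked here).
        return "archipelago"
    if len(systems) == 1 and min_core <= len(systems[0]) <= max_core:
        # A single fused core with 4 to 10 rings.
        return "continental"
    return "other"


if __name__ == "__main__":
    # Naphthalene-like unit: two six-membered rings sharing atoms 4 and 5.
    naphthalene = [(0, 1, 2, 3, 4, 5), (4, 5, 6, 7, 8, 9)]
    # A second, unconnected naphthalene-like unit with different atom indices.
    other_unit = [(20, 21, 22, 23, 24, 25), (24, 25, 26, 27, 28, 29)]
    print(classify_architecture(naphthalene + other_unit))  # archipelago
    # Seven linearly fused rings, each sharing one bond with the next.
    core = [tuple(range(4 * k, 4 * k + 6)) for k in range(7)]
    print(classify_architecture(core))  # continental
```

Counting fused systems rather than individual rings is what separates the two motifs: a seven-ring continental core and an archipelago molecule built from two three-ring units both contain several aromatic rings, but only the latter has more than one system.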
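The verification step, checking that a structure converted from a published figure matches its molecular formula, lends itself to a simple automated check. The minimal sketch below assumes the structure's atoms are already available as a list of element symbols (from whatever structure-file parser is used) and that formulas are written without parentheses, charges or isotope labels; `parse_formula`, `hill_formula` and `formula_mismatch` are hypothetical helpers, not tools distributed with the database.

```python
import re
from collections import Counter

# One element symbol followed by an optional count, e.g. "C42" or "S".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")
# A whole formula made only of such element/count pairs.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula):
    """Parse a plain formula such as 'C42H54N2S' into element counts.

    Parentheses, charges and isotope labels are not handled in this sketch.
    """
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for symbol, number in _ELEMENT.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def hill_formula(counts):
    """Format element counts in Hill order: C, then H, then alphabetical."""
    counts = Counter({el: n for el, n in counts.items() if n > 0})
    head = []
    if "C" in counts:
        head = ["C"] + (["H"] if "H" in counts else [])
    rest = sorted(el for el in counts if el not in head)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in head + rest)


def formula_mismatch(atom_symbols, target_formula):
    """Compare a structure's atoms with a target molecular formula.

    Returns {element: structure count - target count} for each element
    whose counts differ; an empty dict means the structure matches.
    """
    have = Counter(atom_symbols)
    want = parse_formula(target_formula)
    elements = sorted(set(have) | set(want))
    return {el: have[el] - want[el] for el in elements if have[el] != want[el]}


if __name__ == "__main__":
    # Hypothetical converted structure that lost two hydrogens in the redraw.
    atoms = ["C"] * 42 + ["H"] * 52 + ["N"] * 2 + ["S"]
    print(hill_formula(Counter(atoms)))          # C42H52N2S
    print(formula_mismatch(atoms, "C42H54N2S"))  # {'H': -2}
```

A non-empty result such as `{'H': -2}` flags a structure that needs editing, in this case one that is two hydrogens short of the target formula.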


Published in Data in Brief by Elsevier Inc. Available via doi: 10.1016/j.dib.2023.109907.

This is an open access article under the CC BY-NC-ND license.