Compare molecular and crystal structures
Introduction
This page contains a set of examples that shows how to compare molecular and crystal structures using critic2.
There are different comparison methods available, depending on whether molecules or crystals are being compared but, in general, they all render a numerical value for the similarity of a chosen pair of structures, and it is up to the user to then decide whether the structures are considered “the same” or “different” for the given similarity value. The interface to the comparison methods is the COMPARE keyword in general and COMPAREVC for comparisons between crystal structures allowing for cell distortions. We focus on COMPARE for now.
The simplest way to use the COMPARE keyword is to simply list the structures after the keyword:
COMPARE file1.xyz file2.cif file3.in . file4.out
where any of the formats supported by critic2
are allowed and .
represents the system that has been loaded by a
previous CRYSTAL or
MOLECULE command.
Comparing Molecules
Atoms Are in the Same Order
There are several different ways of comparing molecules. The most
important point to note about molecular comparison is whether the atomic
sequence is the same in all the structures being compared, that is, if
the atoms and their coordinates as they appear in the corresponding
structure files can be assumed to be in the same order. If this is the
case, then the comparison is much simpler and you should use the
SORTED keyword. For instance, the HIS_TYR_HIS
molecule in the
example contains a tripeptide from the
PEPCONF library. The three
variants of this structure are:
-
HIS_TYR_HIS_0.xyz
: the original structure calculated at the LC-wPBE-XDM/aug-cc-pVTZ level of theory. -
HIS_TYR_HIS_0-rattle.xyz
: the same structure with a 0.02 standard deviation rattle imposed using ASE. -
HIS_TYR_HIS_0-sto3g.xyz
: the same structure after relaxation with B3LYP/STO-3G.
(The choice of xyz
as the format for these files is immaterial; any
format accepted by critic2 can be used.)
Neither the rattle nor the geometry relaxation alter the order of the
atoms in this molecule, and therefore the comparison can be carried
out assuming the atomic sequence is the same, with:
compare sorted HIS_TYR_HIS_0-sto3g.xyz HIS_TYR_HIS_0-rattle.xyz HIS_TYR_HIS_0.xyz
For a given pair of molecules, the comparison is carried out by translating the centers of mass of both molecules to the origin and then finding the rotation that brings one of them in closest agreement with the other in the least-squares sense. The output is:
* COMPARE: compare structures
Molecule 1 : HIS_TYR_HIS_0-sto3g.xyz
Molecule 2 : HIS_TYR_HIS_0-rattle.xyz
Molecule 3 : HIS_TYR_HIS_0.xyz
# Assuming the atom sequence is the same in all molecules.
# RMS of the atomic positions in bohr
Molecule HIS_TYR_HIS_0-sto3g.xyz HIS_TYR_HIS_0-rattle.xyz HIS_TYR_HIS_0.xyz
RMS 1 2 3
HIS_TYR_HIS_0-sto3g.xyz 0.0000000 2.5006092 2.4935909
HIS_TYR_HIS_0-rattle.xyz 2.5006092 0.0000000 0.0599950
HIS_TYR_HIS_0.xyz 2.4935909 0.0599950 0.0000000
where the individual values represent the root-mean-square (RMS) of the deviations in the atomic positions in bohr (the units can be changed using the UNITS keyword). Note how the RMS is about 0.03 angstrom for the rattled structure, consistent with the rattling parameter used in ASE.
Atoms Are Not in the Same Order
The comparison problem becomes considerably trickier if the atomic sequences are not in the same order. Consider the structure:
HIS_TYR_HIS_0-shuf.xyz
: the same structure asHIS_TYR_HIS_0-rattle.xyz
but with a random permutation applied on the atoms.
In this case, there are two options: either a method is applied to try to put the atoms in order or a comparison method is used for which the order of the atomic sequence is irrelevant. The recommended procedure is to use Ullmann’s graph matching algorithm, which is the default if no keywords are passed to COMPARE:
compare HIS_TYR_HIS_0-rattle.xyz HIS_TYR_HIS_0-shuf.xyz HIS_TYR_HIS_0.xyz
This method tries to match the molecular graph of one molecule onto the other based on the atomic connectivity by exploring all possible permutations, with tricks applied to shorten the search. Ullmann’s method can be expensive particularly for branching molecules with symmetric (in the molecular graph sense) substituents. Once the two atomic sequences are in the same order, the comparison proceeds as in the ordered case:
* COMPARE: compare structures
Molecule 1 : HIS_TYR_HIS_0-rattle.xyz
Molecule 2 : HIS_TYR_HIS_0-shuf.xyz
Molecule 3 : HIS_TYR_HIS_0.xyz
# Using Ullmann's graph matching algorithm.
# RMS of the atomic positions in bohr
Molecule HIS_TYR_HIS_0-rattle.xyz HIS_TYR_HIS_0-shuf.xyz HIS_TYR_HIS_0.xyz
RMS 1 2 3
HIS_TYR_HIS_0-rattle.xyz 0.0000000 0.0000000 0.0599950
HIS_TYR_HIS_0-shuf.xyz 0.0000000 0.0000000 0.0599950
HIS_TYR_HIS_0.xyz 0.0599950 0.0599950 0.0000000
Note how the RMS between the rattled structure and the same structure with shuffled atoms is zero, and the RMS of both of them with the original structure is the same value as above.
Comparing Crystals
The recommended procedure for comparing crystal structures is to use their simulated powder diffraction patterns (see the POWDER keyword) and compare them using cross-correlation functions as described by de Gelder et al.. In this method, the powder diffractograms of the two crystals are calculated with some arbitrary, but reasonable, parameters for peak shapes and sizes, incident wavelength, etc. Unlike the molecular comparison, this method does not rely on the atoms being in the same order. In the provided example three structures are given:
-
uracil.cif
: the uracil structure from the critic2 structure library. -
urea.cif
: the urea structure from the critic2 structure library. -
urea-rattle.cif
: the same urea structure but with a 0.02 standard deviation rattle imposed using ASE. -
urea-bigrattle.cif
: the same urea structure but with a 1.0 standard deviation rattle imposed using ASE. This essentially destroys the molecular structure.
(The choice of cif
as the format for these files is immaterial; any
format accepted by critic2 can be used.) To compare them, use:
compare urea.cif urea-rattle.cif urea-bigrattle.cif uracil.cif
and the result is:
* COMPARE: compare structures
Crystal 1 : urea.cif
Crystal 2 : urea-rattle.cif
Crystal 3 : urea-bigrattle.cif
Crystal 4 : uracil.cif
# Using cross-correlated POWDER diffraction patterns.
# Please cite:
# de Gelder et al., J. Comput. Chem., 22 (2001) 273
# Two structures are exactly equal if DIFF = 0.
Crystal urea.cif urea-rattle.cif urea-bigrattle.cif uracil.cif
DIFF 1 2 3 4
urea.cif 0.0000000 0.0000248 0.2463948 0.9384096
urea-rattle.cif 0.0000248 0.0000000 0.2448225 0.9385707
urea-bigrattle.cif 0.2463948 0.2448225 0.0000000 0.8963729
uracil.cif 0.9384096 0.9385707 0.8963729 0.0000000
The similarity index calculated by this method (their POWDIFF value) goes between 0 and 1, which 0 being a perfect match and 1 complete dissimilarity. Note how the small rattle gives a urea structure that is essentially coincident with the original, whereas a big rattle results in a much higher value, and the uracil and urea structures have nothing in common.
Determining Repeated and Unique Structures
An operation that is commonly carried out with the COMPARE keyword is to determine, from a (large) list of structures, which of them are unique and which are repeated. This can be easily accomplished with the REDUCE option to COMPARE, which works similar to normal COMPARE, but skips over structures that have already been determined to be equivalent to others in the list. For instance, if we wanted to compare a list of 50 very similar structures we would do:
COMPARE REDUCE 1e-6 str-001.POSCAR str-002.POSCAR str-003.POSCAR \
str-004.POSCAR ...
This means that two of the crystal structures in the list are considered equal if the difference in their powder diffraction patterns is less than 1e-6, calculated as in the example above. The output of COMPARE/REDUCE contains the following parts. First, the powder diffraction difference values are reported:
DIFF 1 2 3 4 5
str-001.POSCAR not-calculated 0.0007855 0.0007439 0.0007439 0.0007855
str-002.POSCAR 0.0007855 not-calculated 0.0000701 0.0000701 not-calculated
str-003.POSCAR 0.0007439 0.0000701 not-calculated not-calculated not-calculated
str-004.POSCAR 0.0007439 0.0000701 not-calculated not-calculated not-calculated
...
where some of the difference values were not calculated, because some of the structures were determined to be equivalent to others at the requested level (i.e. their DIFF was lower than 1e-6). This cuts down the computational cost of the COMPARE/REDUCE calculation significantly. After this, the list of unique structures is listed:
+ List of unique structures (15):
1: str-001.POSCAR with multiplicity 1
2: str-002.POSCAR with multiplicity 4
3: str-003.POSCAR with multiplicity 7
6: str-006.POSCAR with multiplicity 3
...
where the “multiplicity” is the number of repeated structures in the original list corresponding to the given unique representative structure. Lastly, the repeated structures are listed, along with their unique representative from the list above:
+ List of repeated structures (34):
4: str-004.POSCAR same as 3: str-003.POSCAR
5: str-005.POSCAR same as 2: str-002.POSCAR
9: str-009.POSCAR same as 3: str-003.POSCAR
10: str-010.POSCAR same as 7: str-007.POSCAR
11: str-011.POSCAR same as 6: str-006.POSCAR
...
Comparing Crystals Allowing for Cell Distortions (VC-PWDF)
The variable-cell powder difference (VC-PWDF) comparison method is used to compare two crystal structures. It is similar to the plain powder diffraction comparison method described above, but it is designed to produce a high similarity result (a low VC-PWDF value) when one of the structures is a lattice distortion of the other. This can happen, for instance, due to the effect of temperature or pressure, or when comparing a calculated structure with an experimental one. VC-PWDF works by designating one of the structures as reference and the other as candidate; both are first transformed into their reduced cells. Then, all possible transformations of the reduced cell of the candidate structure are explored that may bring it into (rough) agreement with the reference reduced cell. Then, the candidate adopts the cell parameters of the reference structure’s reduced cell and the powder diffractogram dissimilarity index (POWDIFF) is calculated as in the example above. The process is repeated for all possible cell transformations and the final VC-PWDF value is the minimum of all calculated dissimilarity indices. The algorithm for the VC-PWDF method is described in detail in Mayo et al..
Because it involves several (sometimes many) powder diffraction generation and comparison steps, VC-PWDF is slower than the usual powder diffraction comparison and it should not be used unless lattice distortions are expected to play a role in the comparison. A VC-PWDF value of 0.03 or lower indicates considerable similarity and a probable match, although user discretion is recommended.
To compare two structures using the VC-PWDF method, use the COMPAREVC keyword:
COMPAREVC xtal1.cif xtal2.cif
The output shows which structure is being used as reference and which is the candidate, the reduced cell lattice vectors for both, the list of transformed candidate lattice vectors, and the calculated powder diffraction dissimilarity values for each transformation. The final VC-PWDF value is given at the end, calculated as the minimum of the dissimilarity indices for all candidate transformations. Note that, unlike COMPARE, COMPAREVC cannot be used to compare more than two structures at a time. Hence, if you want to compare a list of structures, you will need to repeat COMPAREVC as many times as necessary (see the link at the end of this section).
After the two crystals are compared, COMPAREVC can be used to write
the transformed structures that most resemble each other, that is, the
ones that generated the final VC-PWDF value. This is done by using the
WRITE
option:
COMPAREVC xtal1.cif xtal2.cif WRITE
which generates two .res
files (<root>_structure_1.res
and
<root>_structure_2.res
), one for each of the crystal structures that
yielded the VC-PWDF score. The powder diffraction patterns
of the original and deformed structures can be generated and plotted for
visual comparison using the
POWDER command on the corresponding files.
The same method implemented in COMPAREVC can be used to compare crystal
structures to
experimental powder diffraction patterns, as described in
Mayo et al..
This experimental VC-PWDF (VC-xPWDF) method requires the experimental
diffraction pattern using Cu Kα radiation in .xy
format
(2θ (°) vs intensity), the indexed cell dimensions from the
experimental PXRD data, and the crystal structure you wish to compare
to the PXRD data. For now, it can be accessed via:
TRICK COMPARE PROGST10.cif PROGST-PXRD.xy 10.3741 12.6059 13.8464 90 90.268 90
where the candidate structure, diffractogram, cell lengths (in Angstroms) and cell angles (in degrees) are given. The output is entirely analogous to COMPAREVC. Same as with COMPAREVC, the WRITE keyword can be used to write the transformed structure that best matches the experimental pattern. A VC-xPWDF score of 0.1 or less implies notable similarity but does not guarantee a match, so it is recommended that the simulated powder diffractogram is plotted along with the experimental one.
The VC-PWDF methods cannot be used reliably for disordered structures,
and will yield low scores for certain polytype and conformational
phase structures. In VC-xPWDF, experimental PXRD data that exhibit
considerable baseline require pre-processing because critic2 onlyl
does minimal processing of the experimental pattern (specifically, the
minimum y value in the .xy
file is subtracted). In addition, data
that shows severe preferred orientation may yield poor results,
because the peak heights deviate from those predicted by critic2, and
pre-processing to remove extraneous peaks is highly recommended.
You can find scripts and more detailed instructions on how to use VC-PWDF and VC-xPWDF at Erin Johnson’s software page.
Example Files Package
Files: example_13_01.tar.xz.