Reading a Molecular Structure
Loading a Molecular Structure (MOLECULE)
MOLECULE file.xyz [border.r] [CUBIC|CUBE]
MOLECULE file.mol2 [border.r] [CUBIC|CUBE] [name.s]
MOLECULE file.pdb [border.r] [CUBIC|CUBE]
MOLECULE file.wfn [border.r] [CUBIC|CUBE]
MOLECULE file.wfx [border.r] [CUBIC|CUBE]
MOLECULE file.fchk [border.r] [CUBIC|CUBE]
MOLECULE file.molden [border.r] [CUBIC|CUBE]
MOLECULE file.molden.input [border.r] [CUBIC|CUBE]
MOLECULE file.log [border.r] [CUBIC|CUBE]
MOLECULE file.{gjf,com} [border.r] [CUBIC|CUBE]
MOLECULE file.zmat [border.r] [CUBIC|CUBE]
MOLECULE file.dat [border.r] [CUBIC|CUBE]
MOLECULE file.out [border.r] [CUBIC|CUBE]
MOLECULE file.pgout [border.r] [CUBIC|CUBE]
MOLECULE file.gen [border.r] [CUBIC|CUBE]
MOLECULE file.cube
MOLECULE file.bincube
MOLECULE file.{in,in.next_step} # (geometry.in, FHIaims input)
MOLECULE file.{out,own} # (FHIaims output)
MOLECULE file.cif
MOLECULE ...
MOLECULE [CIF|SHELX|21|CUBE|BINCUBE|WIEN|ABINIT|ELK|QE_IN|QE_OUT|CRYSTAL|XYZ|MOL2|WFN|WFX|
FCHK|MOLDEN|GAUSSIAN|GJF|ZMAT|SIESTA|XSF|GEN|VASP|PWC|AXSF|DAT|PGOUT|ORCA|DMAIN|
FHIAIMS_IN|FHIAIMS_OUT|FRAC] ...
MOLECULE
NEQ x.r y.r z.r atom.s [ANG/ANGSTROM] [BOHR/AU]
atom.s x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]
atnumber.i x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]
CUBIC|CUBE
BORDER border.r
ENDMOLECULE/END
MOLECULE LIBRARY label.s
Critic2 can be used for gas-phase (isolated) molecules as well as crystals. A molecular structure is loaded using the MOLECULE keyword. The MOLECULE keyword is most often used for loading an xyz or a similar format generated by a program that works natively with gas-phase molecules (Gaussian, psi4, etc.). However, the molecular geometry can also be given directly in the input using the MOLECULE/ENDMOLECULE environment or using any of the formats typically employed for crystals (cube, cif, scf.in, etc.). All file formats valid in the CRYSTAL keyword are also allowed in MOLECULE. As in the CRYSTAL case, the file extension is used to interpret the file format.
Because critic2 always works under periodic boundary conditions, it does the analysis of molecular structures by placing the molecule at the center of a supercell large enough to contain it plus a border. Provided the size of the vacuum is large enough, the results of the analysis should be correct. The use of MOLECULE instead of CRYSTAL changes some of the default behavior in critic2. Namely:
- The default distance units in input and output are angstrom instead of bohr (use the UNITS keyword to change this behavior). In particular, this applies to the Cartesian coordinates for the atoms in the MOLECULE environment and to the argument for BORDER. In the case of xyz, mol2, wfn, wfx, fchk, dat, out, pgout, molden, molden.input, gen, and cube files, the Cartesian coordinate system in input and output is the same as in the original file. The “Input orientation” is read from Gaussian log (output) files.
- The use of symmetry is automatically deactivated. All molecular structures are run in the P1 space group (equivalent to the C1 point group).
- The default critical point search seeding strategy in AUTO is modified. In a crystal, a recursive subdivision of a symmetry-reduced portion of the Wigner-Seitz cell is used (SEED WS with DEPTH 1). In a molecule, the default is to seed at the center of every interatomic line between atom pairs less than 15 bohr apart (SEED PAIR).
- In addition to the supercell, a second smaller cell is defined, the “molecular cell”. The molecular cell can be visualized by using the MOLCELL keyword in WRITE or CPREPORT. The region outside the molecular cell is treated as vacuum. Any CPs found outside the molecular cell are discarded, and all downwards gradient paths that exit the molecular cell are assumed to have diverged to infinity.
A simple example of MOLECULE input and the corresponding output generated by critic2 can be found here. Multiple molecular and crystal structures can be read in succession, same as in CRYSTAL.
Molecular File Formats (xyz, wfn, wfx, log, gjf, zmat, com, fchk, dat, out, pgout, molden, molden.input)
A gas-phase molecule can be read from any of the following formats:
- An xyz file.
- Gaussian wavefunction (wfn/wfx) file.
- Gaussian output (log) file.
- Gaussian input file (gjf, com). Only simple input files are interpreted correctly. The molecular geometry is read from the text block after the second blank line. The first line (charge and multiplicity) is skipped and the rest are interpreted as:
  at.s x.r y.r z.r
  where at.s is the atomic symbol and the rest of the fields are the atomic coordinates in angstrom.
- Z-matrix file format (zmat). This file contains the z-matrix of the molecule line by line in Gaussian format, optionally with the charge and multiplicity in the first line.
- Gaussian formatted checkpoint file (fchk).
- psi4 output file (dat).
- orca output file (out).
- postg output file (pgout).
- molden format (psi4, ADF, orca, etc.). The molden.input extension is the same as molden (used by orca).
The input molecule is enclosed in a box that is larger (default: 10 angstrom) in all directions than the minimal box encompassing the molecule. If the CUBIC (or CUBE) keyword is given, then a cubic supercell is used. The width of the vacuum around the molecule can be changed with the optional border.r argument (by default in angstrom, the units can be changed with the UNITS keyword). The molecule is automatically translated to the center of the supercell. The transformation from fractional coordinates referred to the encompassing cell to Cartesian coordinates is made so that the latter correspond to the original coordinate system in the input file. A molecular cell is chosen following the default procedure, see below.
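As a minimal sketch of the options described above, the following input lines load an xyz file with the default settings, with a wider vacuum border, and with a cubic supercell. The file name benzene.xyz is a hypothetical placeholder, and the three lines are meant as alternatives, not as steps that must be run together.

```
# default vacuum border (10 angstrom)
MOLECULE benzene.xyz
# 15 angstrom border (the default distance units for MOLECULE are angstrom)
MOLECULE benzene.xyz 15
# 15 angstrom border and a cubic supercell
MOLECULE benzene.xyz 15 CUBIC
```

A wider border costs nothing in accuracy but enlarges the supercell, so it is mostly useful when gradient paths or critical points far from the nuclei are of interest.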
TRIPOS/SYBYL mol2 format (mol2)
The TRIPOS/SYBYL mol2 is a molecular format. A mol2 file may contain one or more molecule specifications. If no further information is given, critic2 reads the first molecule in the mol2 file. If the optional argument name.s is given, critic2 reads the molecule with that name (the name is the line after @<TRIPOS>MOLECULE). name.s is case-sensitive.
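For illustration, and assuming a hypothetical file ligands.mol2 that contains a molecule whose name line reads caffeine, the two behaviors described above could be requested as follows. The argument order follows the synopsis at the top of this section.

```
# read the first molecule in the file
MOLECULE ligands.mol2
# read the molecule named "caffeine" (case-sensitive), with a
# 10 angstrom border and a cubic supercell
MOLECULE ligands.mol2 10 CUBIC caffeine
```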
pdb format (pdb)
This is a molecular format used by the Protein Data Bank (PDB) for biological molecules. Critic2 reads the molecular structure from the ATOM and HETATM records.
DFTB+ gen Format (gen)
A molecule can be read in DFTB+’s gen file format. If no lattice vectors are provided, the optional border.r and CUBIC (or CUBE) keyword can be used to control the size and shape of the encompassing cell. The same considerations as for xyz files apply. The coordinates in critic2’s input and output are the same as in the gen file. See the DFTB+ example for worked-out cases.
Cube Files (cube, bincube)
Cube files are also often used to describe molecular structures, for instance, the cube files generated by Gaussian’s cubegen program. As in the case of xyz files, the Cartesian coordinate system in the rest of the input and in the output is chosen so that it is the same as in the cube file.
Note that, contrary to xyz files, critic2 does not choose the size and shape of the encompassing cell; the cell is given by the cube file. Hence, the molecule is not translated by critic2, and it should be centered for MOLECULE to work correctly.
Critic2 can be used to convert cube files to binary format in order to save disk space and reading/writing time. Binary cube files have extension .bincube, and contain essentially the same information as a usual cube file.
FHIaims Inputs and Outputs (in, in.next_step, out, own)
Molecular (and crystal) structures can be loaded from an FHIaims “geometry.in” input file. Alternatively, you can also load the structure from the “geometry.in.next_step” file written by FHIaims during a geometry optimization.
The molecular structure can also be loaded from an FHIaims output file, which is assumed to have a .out or .own extension. In the case of a geometry optimization, the last available geometry in the output file is read.
Other Crystallographic Formats (cif, scf.in,…)
All CRYSTAL keywords can be replaced by MOLECULE and vice versa, with the effect discussed above. The behavior of MOLECULE in this case is essentially the same as in the case of a cube file: the encompassing cell is taken from the file, and the molecule is not translated in any way.
Files with Other Extensions
If the molecular structure file you want to read does not have one of the above extensions but conforms to one of these formats, you can force critic2 to read the file using that particular format. To do this, you must follow the MOLECULE keyword with another keyword specifying the required format. The allowed keywords are:
- XYZ: an .xyz file.
- WFN: a .wfn wavefunction file.
- WFX: a .wfx wavefunction file.
- FCHK: a Gaussian formatted checkpoint file (.fchk).
- MOLDEN: a Molden file (.molden).
- DAT: a psi4 output file (.dat).
- PGOUT: a postg output file (.pgout).
- GAUSSIAN: a Gaussian output file.
- GJF: a Gaussian input file.
- GEN: a DFTB+ structure file (.gen).
- CUBE: a cube file.
- BINCUBE: a binary cube file.
- FHIAIMS_IN: an FHIaims input file (geometry.in).
- FHIAIMS_OUT: an FHIaims output file.
- ORCA: an ORCA output file.
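A short sketch of how a format keyword might be used follows. The file names are hypothetical placeholders, chosen so that their extensions would not be recognized on their own.

```
# an xyz-format geometry stored under a non-standard extension
MOLECULE XYZ geometry.txt
# a Gaussian output file that does not end in .log
MOLECULE GAUSSIAN opt_run.output
# an FHIaims output file that does not end in .out or .own
MOLECULE FHIAIMS_OUT aims_run.txt
```

Giving the format explicitly is also a way to avoid ambiguity when two formats share the same extension, as is the case for the .out files written by ORCA and FHIaims.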
Manual Specification of the Molecular Structure (MOLECULE Environment)
A molecule can be specified directly in the input using the MOLECULE environment. The atoms can be given in three different ways:
NEQ x.r y.r z.r atom.s [ANG/ANGSTROM] [BOHR/AU]
atom.s x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]
atnumber.i x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]
Each of these lines adds one atom to the molecule: the atom can be given either with the NEQ keyword followed by the position and the atomic symbol, or by putting the atomic symbol or the atomic number in the first field. The position (x.r, y.r, z.r) must be given in Cartesian coordinates. The units default to angstrom, but can be changed using the ANG/ANGSTROM and BOHR/AU keywords, and also with the global UNITS keyword.
The keywords CUBIC (or CUBE) and BORDER set the size and shape of the encompassing supercell. This cell is taken as the minimal encompassing cell plus a default border of 10 angstrom. This value can be changed with the BORDER keyword (units: angstrom by default, unless changed by the global UNITS keyword). The default cell is an orthogonal box whose three axes can have different lengths. To make the cell cubic, use the CUBE/CUBIC keyword.
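The following sketch shows a complete MOLECULE environment for a water molecule and uses each of the three atom specifications once. The coordinates are approximate and purely illustrative, given in angstrom (the default for MOLECULE), and the border value is an arbitrary choice.

```
MOLECULE
  # atomic symbol first
  O 0.0000 0.0000 0.1173
  # NEQ keyword: position first, then the atomic symbol
  NEQ 0.0000 0.7572 -0.4692 H
  # atomic number first (1 = hydrogen)
  1 0.0000 -0.7572 -0.4692
  # 8 angstrom of vacuum instead of the default 10
  BORDER 8
  # make the encompassing supercell cubic
  CUBIC
ENDMOLECULE
```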
The Molecular Library (MOLECULE LIBRARY)
A library of molecular structures is provided with critic2, and can be accessed using the MOLECULE LIBRARY keyword:
MOLECULE LIBRARY h2o
The molecular library file is dat/molecule.dat, in the root of the critic2 distribution. The location of the molecular library can be changed using:
LIBRARY MOLECULE bleh.s
The behavior of the LIBRARY keyword is the same as in CRYSTAL.
The Molecular Cell (MOLCELL)
In molecular calculations, it is convenient to define a region of space away from the molecule that represents infinity. Critical points in this region are discarded because the electron density (and therefore the gradient) is zero everywhere. Gradient paths that reach this region are terminated as if they had diverged to infinity.
In molecular systems, critic2 will reserve some space close to the edges of the cell encompassing the molecule for this region. The remaining (smaller) cell where the molecule is placed is called the “molecular cell”.
When the MOLECULE keyword is used, a molecular cell is automatically set up. By default, the molecular cell is chosen as the minimal encompassing cell for the molecule plus 80% of the border or 2 bohr, whichever is larger. Naturally, the molecular cell cannot exceed the actual cell. If the molecular structure is loaded from an external file (xyz, wfn, etc.), then critic2 will set up both the encompassing and the molecular cells correctly. If the structure source is a cube or any other file format in which the encompassing cell is read from the file, it is the user’s responsibility to leave enough room for the molecular cell.
The size of the molecular cell can be changed after the structure is read using the MOLCELL keyword:
MOLCELL [border.r]
The MOLCELL keyword calculates the smallest box encompassing the molecule and then adds a border to it in order to build the molecular cell. The border length can be controlled by passing a numerical argument (border.r, in the default distance units for the run, angstrom if you used MOLECULE to read the structure). Using this keyword only makes sense if the molecule is placed close to the center of the cell and if there is enough vacuum between the molecule and the cell edges to contain the molecular cell. If no numerical argument is given, border.r defaults to 10 angstrom. In order to use MOLCELL, the input structure needs to be read using the MOLECULE keyword and the cell needs to be orthogonal.
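As a hedged sketch of a typical sequence, the input below loads a structure with MOLECULE and then redefines the molecular cell with a smaller border. The file name water.xyz and both border values are hypothetical choices. The MOLCELL border is kept smaller than the vacuum border so that the molecular cell fits inside the encompassing cell.

```
# load the molecule with a 10 angstrom vacuum border
MOLECULE water.xyz 10
# rebuild the molecular cell: smallest box around the molecule
# plus a 5 angstrom border (angstrom is the default unit here)
MOLCELL 5
```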