Reading a Molecular Structure

Loading a Molecular Structure (MOLECULE)

MOLECULE file.xyz [border.r] [CUBIC|CUBE]
MOLECULE file.mol2 [border.r] [CUBIC|CUBE] [name.s]
MOLECULE file.sdf [border.r] [CUBIC|CUBE] [ID id.i]
MOLECULE file.mol [border.r] [CUBIC|CUBE] [ID id.i]
MOLECULE file.pdb [border.r] [CUBIC|CUBE]
MOLECULE file.wfn [border.r] [CUBIC|CUBE]
MOLECULE file.wfx [border.r] [CUBIC|CUBE]
MOLECULE file.fchk [border.r] [CUBIC|CUBE]
MOLECULE file.molden [border.r] [CUBIC|CUBE]
MOLECULE file.molden.input [border.r] [CUBIC|CUBE]
MOLECULE file.log [border.r] [CUBIC|CUBE]
MOLECULE file.{gjf,com} [border.r] [CUBIC|CUBE]
MOLECULE file.zmat [border.r] [CUBIC|CUBE]
MOLECULE file.dat [border.r] [CUBIC|CUBE]
MOLECULE file.out [border.r] [CUBIC|CUBE]
MOLECULE file.pgout [border.r] [CUBIC|CUBE]
MOLECULE file.gen [border.r] [CUBIC|CUBE]
MOLECULE file.cube
MOLECULE file.bincube
MOLECULE file.{in,in.next_step} # (geometry.in, FHIaims input)
MOLECULE file.{out,own} # (FHIaims output)
MOLECULE file.cif
MOLECULE ...
MOLECULE [CIF|SHELX|21|CUBE|BINCUBE|WIEN|ABINIT|ELK|QE_IN|QE_OUT|CRYSTAL|XYZ|MOL2|WFN|WFX|
          FCHK|MOLDEN|GAUSSIAN|GJF|ZMAT|SIESTA|XSF|GEN|VASP|PWC|AXSF|DAT|PGOUT|ORCA|DMAIN|
          FHIAIMS_IN|FHIAIMS_OUT|FRAC] ...
MOLECULE
  NEQ x.r y.r z.r atom.s [ANG/ANGSTROM] [BOHR/AU]
  atom.s x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]
  atnumber.i x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]
  CUBIC|CUBE
  BORDER border.r
ENDMOLECULE/END
MOLECULE LIBRARY label.s

Critic2 can be used for gas-phase (isolated) molecules as well as crystals. A molecular structure is loaded using the MOLECULE keyword. The MOLECULE keyword is most often used for loading an xyz or a similar format generated by a program that works natively with gas-phase molecules (Gaussian, psi4, etc.). However, the molecular geometry can also be given directly in the input using the MOLECULE/ENDMOLECULE environment or using any of the formats typically employed for crystals (cube, cif, scf.in, etc.). All file formats valid in the CRYSTAL keyword are also allowed in MOLECULE. As in the CRYSTAL case, the file extension is used to interpret the file format.

Because critic2 always works under periodic boundary conditions, it analyzes molecular structures by placing the molecule at the center of a supercell large enough to contain it plus a border. Provided the vacuum region is large enough, the results of the analysis should be correct. The use of MOLECULE instead of CRYSTAL changes some of the default behavior in critic2. Namely:

  • The default distance units in input and output are angstrom instead of bohr (use the UNITS keyword to change this behavior). In particular, this applies to the Cartesian coordinates for the atoms in the MOLECULE environment and to the argument for BORDER. In the case of xyz, mol2, sdf/mol, wfn, wfx, fchk, dat, out, pgout, molden, molden.input, gen, and cube files, the Cartesian coordinate system in input and output is the same as in the original file. The “Input orientation” is read from Gaussian log (output) files.

  • The use of symmetry is automatically deactivated. All molecular structures are run in the P1 space group (equivalent to the C1 point group).

  • The default critical point search seeding strategy in AUTO is modified. In a crystal, a recursive subdivision of a symmetry-reduced portion of the Wigner-Seitz cell is used (SEED WS with DEPTH 1). In a molecule, the default is to seed at the center of every interatomic line between atom pairs less than 15 bohr apart (SEED PAIR).

  • In addition to the supercell, a second, smaller cell is defined: the “molecular cell”. The molecular cell can be visualized by using the MOLCELL keyword in WRITE or CPREPORT. The region outside the molecular cell is treated as vacuum. Any CPs found outside the molecular cell are discarded, and all downwards gradient paths that exit the molecular cell are assumed to have diverged to infinity.
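
The pair-based seeding described in the list above can be pictured with a short Python sketch. This is only an illustration of the idea (one seed at the midpoint of every atom pair closer than 15 bohr), not critic2's actual implementation; the function name pair_seeds and its arguments are hypothetical.

```python
from itertools import combinations
from math import dist


def pair_seeds(atoms_bohr, max_dist=15.0):
    """Return the midpoints of all atom pairs closer than max_dist.

    atoms_bohr: list of (x, y, z) positions in bohr.
    max_dist:   cutoff in bohr (15 bohr is the default quoted in the text).
    """
    seeds = []
    for a, b in combinations(atoms_bohr, 2):
        if dist(a, b) < max_dist:
            # One seed at the center of the interatomic line.
            seeds.append(tuple((ai + bi) / 2.0 for ai, bi in zip(a, b)))
    return seeds


# Hypothetical three-atom arrangement: only the first two atoms are close.
atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 20.0)]
seeds = pair_seeds(atoms)
```

In this arrangement only the first pair is within the cutoff, so a single seed is produced between those two atoms.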

A simple example of MOLECULE input and the corresponding output generated by critic2 can be found here. Multiple molecular and crystal structures can be read in succession, same as in CRYSTAL.

Molecular File Formats (xyz, wfn, wfx, log, gjf, zmat, com, fchk, dat, out, pgout, molden, molden.input)

A gas-phase molecule can be input using any of the following formats:

  • An xyz file.

  • Gaussian wavefunction (wfn/wfx) file.

  • Gaussian output (log) file.

  • Gaussian input file (gjf, com). Only simple input files are interpreted correctly. The molecular geometry is read from the text block after the second blank line. The first line of this block (charge and multiplicity) is skipped and the rest are interpreted as:
    at.s x.r y.r z.r
    

    where at.s is the atomic symbol and the rest of the fields are the atomic coordinates in angstrom.

  • Z-matrix file format (zmat). This file contains the z-matrix of the molecule line by line in Gaussian format, optionally with the charge and multiplicity in the first line.

  • Gaussian formatted checkpoint file (fchk).

  • psi4 output file (dat).

  • orca output file (out).

  • postg output file (pgout).

  • molden format (psi4, ADF, orca, etc.). The molden.input extension is the same as molden (used by orca).
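
As an illustration of the Gaussian input rule in the list above (geometry block after the second blank line, first line of the block skipped), here is a minimal Python sketch. It is not critic2's reader; the function name read_gjf_geometry is hypothetical, and it assumes a simple file with Cartesian coordinates and no consecutive blank lines.

```python
def read_gjf_geometry(text):
    """Parse (symbol, x, y, z) tuples from a simple Gaussian input string.

    Coordinates are returned as given in the file (angstrom).
    """
    blank_count = 0
    block = []
    for line in text.splitlines():
        if not line.strip():
            if blank_count >= 2:
                break  # blank line after the geometry block: stop reading
            blank_count += 1
            continue
        if blank_count == 2:
            block.append(line)

    atoms = []
    for line in block[1:]:  # block[0] holds charge and multiplicity
        fields = line.split()
        x, y, z = (float(v) for v in fields[1:4])
        atoms.append((fields[0], x, y, z))
    return atoms


# Hypothetical water input used only to exercise the sketch.
sample = """%chk=water.chk
# hf/sto-3g

water molecule

0 1
O 0.0 0.0 0.117
H 0.0 0.757 -0.471
H 0.0 -0.757 -0.471

"""
atoms = read_gjf_geometry(sample)
```

Inputs that use a Z-matrix, variables, or extra sections would need a more careful reader, which is consistent with the remark that only simple input files are interpreted correctly.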

The input molecule is enclosed in a box that is larger (default: 10 angstrom) in all directions than the minimal box encompassing the molecule. If the CUBIC (or CUBE) keyword is given, then a cubic supercell is used. The width of the vacuum around the molecule can be changed with the optional border.r argument (by default in angstrom, the units can be changed with the UNITS keyword). The molecule is automatically translated to the center of the supercell. The transformation from fractional coordinates referred to the encompassing cell to Cartesian coordinates is made so that the latter correspond to the original coordinate system in the input file. A molecular cell is chosen following the default procedure, see below.
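
The construction of the encompassing cell can be sketched in Python as follows. This is an illustration under stated assumptions, not critic2's code: the border is assumed to be added on every side of the minimal box, the CUBIC option is assumed to use the longest edge for all three axes, and the function name enclosing_cell is hypothetical.

```python
def enclosing_cell(coords, border=10.0, cubic=False):
    """Return (origin, lengths) of an orthogonal box around the molecule.

    coords: list of (x, y, z) Cartesian positions (angstrom).
    border: vacuum width added on each side of the minimal box (assumption).
    cubic:  if True, use the longest edge for all three axes (assumption).
    The origin is expressed in the original Cartesian frame, so the atomic
    coordinates themselves do not need to change.
    """
    lo = [min(c[i] for c in coords) for i in range(3)]
    hi = [max(c[i] for c in coords) for i in range(3)]
    lengths = [h - l + 2.0 * border for l, h in zip(lo, hi)]
    if cubic:
        lengths = [max(lengths)] * 3
    # Place the center of the minimal box at the center of the cell.
    center = [(l + h) / 2.0 for l, h in zip(lo, hi)]
    origin = [c - length / 2.0 for c, length in zip(center, lengths)]
    return origin, lengths


# Hypothetical two-atom example.
coords = [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0)]
origin, lengths = enclosing_cell(coords)
cubic_origin, cubic_lengths = enclosing_cell(coords, cubic=True)
```

Returning the cell origin in the input frame mirrors the statement that Cartesian coordinates in the output correspond to the coordinate system of the input file.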

TRIPOS/SYBYL mol2 Format (mol2)

TRIPOS/SYBYL mol2 is a molecular file format. A mol2 file may contain one or more molecule specifications. If no further information is given, critic2 reads the first molecule in the mol2 file. If the optional argument name.s is given, critic2 reads the molecule with that name (the name is the line after @<TRIPOS>MOLECULE). name.s is case-sensitive.

sdf/mol Format (mol, sdf)

MOL and SDF are molecular file formats used by MDL Information Systems (now BIOVIA). These files can use one of two formats, V2000 or V3000, indicated at the end of the fourth line in the file. Both are supported by critic2. A mol/sdf file may contain one or more molecule specifications. Each molecule is separated by $$$$. If no further information is given, critic2 reads the first molecule in the file. If the optional keyword ID is given with a positive integer argument id.i, critic2 reads molecule number id.i from the file.
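
The record selection described above can be illustrated with a short Python sketch. It is not critic2's reader: the function names sdf_record and ctab_version are hypothetical, and the sketch only splits the text on $$$$ lines and reads the version tag at the end of the fourth line.

```python
def sdf_record(text, mol_id=1):
    """Return the lines of molecule number mol_id (1-based) in SDF text."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(ln.strip() for ln in current):
        records.append(current)  # e.g. a mol file without a $$$$ terminator
    if not 1 <= mol_id <= len(records):
        raise ValueError(f"molecule {mol_id} not found ({len(records)} in file)")
    return records[mol_id - 1]


def ctab_version(record):
    """Return the version tag (V2000 or V3000) at the end of the fourth line."""
    return record[3].split()[-1]


# Hypothetical two-molecule file used only to exercise the sketch.
text = (
    "methane\n  prog\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0\n"
    "M  END\n$$$$\n"
    "water\n  prog\n\n"
    "  0  0  0     0  0            999 V3000\n"
    "M  V30 BEGIN CTAB\nM  V30 END CTAB\nM  END\n$$$$\n"
)
first = sdf_record(text)
second = sdf_record(text, mol_id=2)
```

Here mol_id plays the role of the id.i argument: omitting it selects the first molecule, as in the default behavior described above.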

pdb Format (pdb)

This is a molecular format used by the Protein Data Bank (PDB) for biological molecules. Critic2 reads the molecular structure from the ATOM and HETATM records.

DFTB+ gen Format (gen)

A molecule can be read in DFTB+’s gen file format. If no lattice vectors are provided, the optional border.r and CUBIC (or CUBE) keyword can be used to control the size and shape of the encompassing cell. The same considerations as for xyz files apply. The coordinates in critic2’s input and output are the same as in the gen file.

See the DFTB+ example for worked-out cases.

Cube Files (cube, bincube)

Cube files, such as those generated by Gaussian’s cubegen program, are also often used to describe molecular structures. As in the case of xyz files, the Cartesian coordinate system in the rest of the input and in the output is chosen so that it is the same as in the cube file.

Note that, contrary to xyz files, critic2 does not choose the size and shape of the encompassing cell; the cell is given by the cube file. Hence, the molecule is not translated by critic2, and it should be centered for MOLECULE to work correctly.

Critic2 can be used to convert cube files to binary format in order to save disk space and reading/writing time. Binary cube files have extension .bincube, and contain essentially the same information as a usual cube file.

FHIaims Inputs and Outputs (in, in.next_step, out, own)

Molecular (and crystal) structures can be loaded from an FHIaims “geometry.in” input file. Alternatively, you can also load the structure from the “geometry.in.next_step” file written by FHIaims during a geometry optimization.

The molecular structure can also be loaded from an FHIaims output file, which is assumed to have a .out or .own extension. In the case of a geometry optimization, the last available geometry in the output file is read.

Other Crystallographic Formats (cif, scf.in,…)

All CRYSTAL keywords can be replaced by MOLECULE and vice versa, with the effect discussed above. The behavior of MOLECULE in this case is essentially the same as in the case of a cube file: the encompassing cell is taken from the file, and the molecule is not translated in any way.

Files with Other Extensions

If the molecular structure file you want to read does not have one of the above extensions but conforms to one of these formats, you can force critic2 to read the file using that particular format. To do this, you must follow the MOLECULE keyword with another keyword specifying the required format. The allowed keywords are:

  • XYZ: an .xyz file.

  • WFN: a .wfn wavefunction file.

  • WFX: a .wfx wavefunction file.

  • FCHK: a Gaussian formatted checkpoint file (.fchk).

  • MOLDEN: a Molden file (.molden).

  • DAT: a psi4 output file (.dat).

  • PGOUT: a postg output file (.pgout).

  • GAUSSIAN: a Gaussian output file.

  • GJF: a Gaussian input file.

  • GEN: a DFTB+ structure file (.gen).

  • CUBE: a cube file.

  • BINCUBE: a binary cube file.

  • FHIAIMS_IN: an FHIaims input file (geometry.in).

  • FHIAIMS_OUT: an FHIaims output file.

  • ORCA: an ORCA output file.

Manual Specification of the Molecular Structure (MOLECULE Environment)

A molecule can be specified directly in the input using the MOLECULE environment. The atoms can be given in three different ways:

NEQ x.r y.r z.r atom.s [ANG/ANGSTROM] [BOHR/AU]
atom.s x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]
atnumber.i x.r y.r z.r [ANG/ANGSTROM] [BOHR/AU]

Each of these lines adds one atom to the molecule: the atom can be given either with the NEQ keyword followed by the position and the atomic symbol, or by putting the atomic symbol or the atomic number in the first field. The position (x.r, y.r, z.r) must be given in Cartesian coordinates. The units default to angstrom, but can be changed using the ANG/ANGSTROM and BOHR/AU keywords, and also with the global UNITS keyword.
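
To make the three line forms concrete, the following Python sketch parses one atom line of the MOLECULE environment and returns the position in angstrom. It is an illustration only, not critic2's parser: the function name parse_atom_line is hypothetical, keywords are assumed to be case-insensitive, and the unit keyword is assumed to appear only as an extra trailing field.

```python
BOHR_TO_ANG = 0.529177210903  # CODATA 2018 bohr radius in angstrom


def parse_atom_line(line, default_unit="ang"):
    """Return (atom, (x, y, z) in angstrom) for one environment line.

    atom is an int if the line gives an atomic number, else the symbol.
    """
    fields = line.split()
    neq = fields[0].upper() == "NEQ"
    required = 5 if neq else 4

    # Only a field beyond the required ones is treated as a unit keyword,
    # so that a symbol such as "Au" is never mistaken for the AU keyword.
    unit = default_unit
    if len(fields) > required:
        word = fields[required].upper()
        if word in ("ANG", "ANGSTROM"):
            unit = "ang"
        elif word in ("BOHR", "AU"):
            unit = "bohr"
        else:
            raise ValueError(f"unknown keyword: {fields[required]}")

    if neq:
        xyz, atom = fields[1:4], fields[4]
    else:
        atom, xyz = fields[0], fields[1:4]
    if atom.isdigit():
        atom = int(atom)

    scale = BOHR_TO_ANG if unit == "bohr" else 1.0
    return atom, tuple(float(v) * scale for v in xyz)


by_symbol = parse_atom_line("O 0.0 0.0 0.117")
by_neq = parse_atom_line("NEQ 0 0 0 Au")
by_number = parse_atom_line("8 1.0 0.0 0.0 bohr")
```

The default_unit argument stands in for the effect of the global UNITS keyword.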

The keywords CUBIC (or CUBE) and BORDER set the size and shape of the encompassing supercell. By default, this cell is the minimal box encompassing the molecule plus a border of 10 angstrom. The border can be changed with the BORDER keyword (units: angstrom by default, unless changed by the global UNITS keyword). The default cell is an orthogonal box whose three axes can have different lengths. To make the cell cubic, use the CUBIC (or CUBE) keyword.

The Molecular Library (MOLECULE LIBRARY)

A library of molecular structures is provided with critic2, and can be accessed using the MOLECULE LIBRARY keyword:

MOLECULE LIBRARY h2o

The molecular library file is dat/molecule.dat, in the root of the critic2 distribution. The location of the molecular library can be changed using:

LIBRARY MOLECULE bleh.s

The behavior of the LIBRARY keyword is the same as in CRYSTAL.

The Molecular Cell (MOLCELL)

In molecular calculations, it is convenient to define a region of space away from the molecule that represents infinity. Critical points in this region are discarded because the electron density (and therefore its gradient) is considered to be zero everywhere in it. Gradient paths that reach this region are terminated as if they had diverged to infinity.

In molecular systems, critic2 will reserve some space close to the edges of the cell encompassing the molecule for this region. The remaining (smaller) cell where the molecule is placed is called the “molecular cell”.

When the MOLECULE keyword is used, a molecular cell is automatically set up. By default, the molecular cell is chosen as the minimal encompassing cell for the molecule plus 80% of the border or 2 bohr, whichever is larger. Naturally, the molecular cell cannot exceed the actual cell. If the molecular structure is loaded from an external file (xyz, wfn, etc.), then critic2 will set up both the encompassing and the molecular cells correctly. If the structure source is a cube or any other file format in which the encompassing cell is read from the file, it is the user’s responsibility to leave enough room for the molecular cell.
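
The default rule for the molecular cell can be written as a small Python sketch. This is an illustration of the rule as stated above, not critic2's code: the function names molcell_margin and inside_molcell are hypothetical, and the margin is assumed to be applied on every side of the minimal box and capped at the border so that the molecular cell stays inside the actual cell.

```python
BOHR_TO_ANG = 0.529177210903  # CODATA 2018 bohr radius in angstrom


def molcell_margin(border, min_margin=2.0 * BOHR_TO_ANG):
    """Distance (angstrom) from the minimal box to the molecular cell edge.

    border: vacuum width of the encompassing cell, in angstrom.
    Rule from the text: 80% of the border or 2 bohr, whichever is larger,
    never exceeding the border itself.
    """
    return min(max(0.8 * border, min_margin), border)


def inside_molcell(point, lo, hi, margin):
    """True if point lies inside the minimal box [lo, hi] grown by margin."""
    return all(l - margin <= p <= h + margin for p, l, h in zip(point, lo, hi))


# With the default 10 angstrom border, the margin is 8 angstrom.
margin = molcell_margin(10.0)
kept = inside_molcell((0.0, 0.0, 9.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), margin)
dropped = inside_molcell((0.0, 0.0, 9.5), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), margin)
```

A check like inside_molcell is the kind of test that decides whether a critical point is kept or whether a gradient path is considered to have left the molecule.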

The size of the molecular cell can be changed after the structure is read using the MOLCELL keyword:

MOLCELL [border.r]

The MOLCELL keyword calculates the smallest box encompassing the molecule and then adds a border to it in order to build the molecular cell. The border length can be controlled by passing a numerical argument (border.r, in the default distance units for the run, angstrom if you used MOLECULE to read the structure). Using this keyword only makes sense if the molecule is placed close to the center of the cell and if there is enough vacuum between the molecule and the cell edges to contain the molecular cell. If no numerical argument is given, border.r defaults to 10 angstrom. In order to use MOLCELL, the input structure needs to be read using the MOLECULE keyword and the cell needs to be orthogonal.
