The program gmconvert converts various molecular models (such as atomic models and 3D density maps) into GMMs (Gaussian mixture models). The EM (expectation maximization) algorithm is employed for the conversion into a GMM. The program gmconvert also has many other useful functions for handling GMMs.
The source code of gmconvert is written in C and assumes the "gcc" compiler in a Linux environment. After you download the file "gmconvert-src-[date].tar.gz", just type the following commands:
tar zxvf gmconvert-src-[date].tar.gz
cd src
make
Then you will find the executable file "gmconvert" in the upper directory, i.e. the parent directory of src.
Both atomic models and 3D density maps can be converted into GMMs.
gmconvert -ipdb [atomic model in pdb] -ogmm [output GMM file] -ng [number of Gaussian functions]
gmconvert -icif [atomic model in mmCIF] -assembly [assembly_id] -ogmm [output GMM file] -ng [number of Gaussian functions]
gmconvert -imap [3D density map] -ogmm [output GMM file] -ng [number of Gaussian functions]
gmconvert -igmm [input GMM file] -omap [output 3D density map] -gw [grid_width]
gmconvert -igmm [input GMM file] -opdb [wireframe model in PDB] -gw [grid_width]
gmconvert -igmm [input GMM file] -owrl [surface/wireframe model in VRML] -gw [grid_width]
gmconvert -igmm [input GMM file] -oewrl [ellipsoidal model in VRML]
gmconvert -igmm [input GMM file] -oobj [surface model in obj] -gw [grid_width]
gmconvert -imap [input 3D density map] -opdb [wireframe model in PDB]
gmconvert -imap [input 3D density map] -owrl [surface/wireframe model in VRML]
gmconvert -imap [input 3D density map] -oobj [surface in Object]
gmconvert -ipdb [atomic model in pdb] -omap [output 3D density map] -reso [resolution]
gmconvert -igmm [GMMfile] -ogmm [output transformed GMM file] -ipdb [original PDBfile] -itpdb [target PDB file]
-ng : number of Gaussian distribution functions.
-emalg : Algorithm type for the EM algorithm. The default is -emalg G.
-emalg P
: Point-input EM. A set of 3D points is employed as observed inputs for the EM algorithm.
In the case of atomic model, the centers of heavy atoms are used as the input 3D points.
In the case of a 3D density map, the positions of the grid points are used as the input 3D points, together with their density values.
-emalg G
: GMM-input EM. A set of 3D Gaussian distribution functions (GDFs) is employed as the observed inputs for the EM algorithm.
In the case of atomic model, one isotropic GDF is assigned to each heavy atom. Its center is the center of the heavy atom.
The variance of the GDF is rr2var * [radius] * [radius], where [radius] is the atomic radius.
In the case of a 3D density map, one isotropic GDF is assigned to each grid point. Its center is the position of the grid point.
The variance of the GDF is ww2var * [grid_width] * [grid_width].
-emalg O
: one-to-one_atom/grid. It simply assigns one GDF to each atom or grid point, without any refinement by the EM algorithm.
-I
: Initialization of GMM. 'K'-means, 'R'andom 'O':one-to-one_atom/voxel [K]
-delzw
: Delete zero-weight GDFs from the GMM. ('T' or 'F') [T]
-delid
: Delete identical GDFs in the GMM. ('T' or 'F') [T]
Options to convert atomic model into GMM
- How to restrict atoms in the PDB file?
-hetatm
: Read HETATM ('T' or 'F') [F].
The default is -hetatm F, which means the program reads only "ATOM" lines.
-ch
: Chain ID. (or 'auth_asym_id' in mmCIF).
The default is -ch -, which means the program reads all the chains in the PDB/mmCIF file.
-assembly
: assembly_id for mmCIF file (-icif) [].
The assembly_id in the mmCIF file (such as 1, 2, 3, PAU, XAU, ...) can be assigned.
The program applies the symmetry operations to the asymmetric unit to generate the XYZ coordinates of the assembly.
If no assembly_id is assigned, the program uses the asymmetric unit.
-atmsel
: Atom selection. 'A'll atom except hydrogen, 'R'esidue-based (only ' CA ' and ' P ') [A]
-maxatm
: maximum allowed number of atoms for '-atmsel A'. If the number of atoms exceeds '-maxatm', the program switches to '-atmsel R'. [-1]
-model
: 'S':read only single model (for NMR). 'M':read multiple models (for biological unit) [S]
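Generating an assembly from the asymmetric unit (-assembly) boils down to applying each symmetry operator, a 3x3 rotation plus a translation, to every atom: x' = R x + t. In mmCIF these operators are listed in the _pdbx_struct_oper_list category (matrix[1][1] ... matrix[3][3], vector[1] ... vector[3]), and _pdbx_struct_assembly_gen says which operators apply to which chains. The C sketch below shows only the coordinate transform; build_assembly is a hypothetical name, and how gmconvert parses operator expressions is not described here.

```c
/* Generate assembly coordinates by applying nops symmetry operators to the
   n atoms of the asymmetric unit.
   xyz   : n*3 input coordinates (x, y, z per atom)
   rots  : nops*9 rotation matrices, row-major (matrix[1][1] .. matrix[3][3])
   trans : nops*3 translation vectors (vector[1] .. vector[3])
   out   : nops*n*3 output coordinates; copy o holds atoms o*n .. o*n+n-1 */
void build_assembly(const double *xyz, int n,
                    const double *rots, const double *trans, int nops,
                    double *out)
{
    for (int o = 0; o < nops; o++) {
        const double *R = rots + 9 * o;
        const double *t = trans + 3 * o;
        for (int i = 0; i < n; i++) {
            const double *p = xyz + 3 * i;
            double *q = out + 3 * ((long)o * n + i);
            /* x' = R x + t, one row of R per output coordinate */
            for (int r = 0; r < 3; r++)
                q[r] = R[3 * r] * p[0] + R[3 * r + 1] * p[1] + R[3 * r + 2] * p[2] + t[r];
        }
    }
}
```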
- Other options for converting atoms into GDFs.
-atmrw
: Model for radius and weight. 'A':atom model, 'R':residue model, 'U':uniform radius/weight, 'C':decide from content. [C]
-varatm
: Variance type for atom (for -emalg G). 'A': var = rr2var * Rvdw*Rvdw for each atom, 'R': var = (resoatm/2.0)^2.[A]
-rr2var
: Constant for variance = Const * Rvdw*Rvdw for -emalg G -varatm A. Default is 1/5. [0.200000].
-resoatm
: Resolution for atom for -emalg G -varatm R. [0.000000].
-radtype
: radius type for '-atmrw A'. V:van der Waals radius, C:covalent radius [V]
-raduni
: radius for uniform model for '-atmrw U'. [1.900000]
-ccatm
: Calculate the correlation coefficient between atoms and GMMs (time-consuming). (T or F) [F]
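The variance rules for the input GDFs of the GMM-input EM (-emalg G) can be written as two small functions: for atoms, -varatm A gives rr2var * R * R and -varatm R gives (resoatm/2)^2; for density-map grid points, the variance is ww2var * grid_width * grid_width. This is a sketch of the formulas stated in this manual; the function names are hypothetical, and the default value of ww2var is not given here, so it is left as a parameter.

```c
/* Variance of the isotropic input GDF for one atom (-emalg G).
   varatm 'A': var = rr2var * radius^2 (default rr2var = 0.2)
   varatm 'R': var = (resoatm / 2)^2 */
double atom_variance(char varatm, double radius, double rr2var, double resoatm)
{
    if (varatm == 'R') {
        double s = resoatm / 2.0;
        return s * s;
    }
    return rr2var * radius * radius;
}

/* Variance of the isotropic input GDF for one grid point of a density map. */
double grid_variance(double ww2var, double grid_width)
{
    return ww2var * grid_width * grid_width;
}
```

For example, with the default rr2var = 0.2 and the uniform radius 1.9 angstrom (-raduni), the variance is 0.2 * 1.9 * 1.9 = 0.722 square angstroms.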
Options to convert 3D density map into GMM
- How to assign the threshold value of a 3D density map?
The options -zth and -zsd specify the threshold value of the 3D density map.
In order to get a proper GMM of the map, you should assign a proper threshold value, such as the 'ContourLevel' value in the EMDB entry.
-zth : if density < [-zth], it is regarded as zero density. [-1.000000]
If the density of a voxel is less than the -zth value, its density is set to zero.
After that, the voxel is regarded as a place where no atom exists.
Only a positive -zth value is meaningful. A negative -zth value will be ignored.
-zsd : if density < MEAN + [-zsd]*SD, it is regarded as zero density. [3.000000]
If you do not know the proper threshold value, the statistics of the density map will help you.
If the option -zsd is assigned, the threshold value is [MEAN of density] + [-zsd] * [SD of density].
Only a positive -zsd value is meaningful. A negative -zsd value will be ignored.
If both -zth and -zsd are positive, the option -zth has priority.
- How to quickly generate a GMM for a 3D density map?
The computational time of the EM algorithm is roughly proportional to [Number of data points] x [Number of Gaussian functions].
This means that, for the same number of Gaussian functions, a 256x256x256 map requires roughly 64 times (= 4 x 4 x 4) longer computational time than a 64x64x64 map does.
If you want to speed up the computation by decreasing the resolution of the density map, we recommend using the '-redsize' option.
-redsize : reducing size scale (2,3,4,...) [1]
If you add '-redsize 2', the program transforms a 256x256x256 map into a 128x128x128 map. If you add '-redsize 4', the program transforms a 256x256x256 map into a 64x64x64 map.
A similar reduction of the map size can be done with the option '-maxsize':
-maxsize : maximum voxel size of each axis(if over, reducing size). [-1]
If the total number of voxels is over [-maxsize]^3, the program automatically sets the -redsize value so that the number of voxels does not exceed [-maxsize]^3. For example, if a 180x180x120 map is given with the option '-maxsize 64', the program sets -redsize to 3, and the map becomes 60x60x40.
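The automatic choice of -redsize from -maxsize can be sketched as a search for the smallest reduction factor r whose reduced map has at most [-maxsize]^3 voxels. Rounding partial blocks up along each axis is an assumption; auto_redsize is a hypothetical name.

```c
/* Voxels along one axis after reducing by factor r (partial blocks rounded up). */
static long long reduced_len(long long n, int r)
{
    return (n + r - 1) / r;
}

/* Smallest reduction factor r such that the reduced map has at most
   maxsize^3 voxels. maxsize <= 0 disables the reduction (returns 1). */
int auto_redsize(long long nx, long long ny, long long nz, long long maxsize)
{
    if (maxsize <= 0) return 1;
    long long limit = maxsize * maxsize * maxsize;
    int r = 1;
    while (reduced_len(nx, r) * reduced_len(ny, r) * reduced_len(nz, r) > limit)
        r++;
    return r;
}
```

For the 180x180x120 example with -maxsize 64, the limit is 64^3 = 262144 voxels: r = 2 still leaves 90x90x60 = 486000 voxels, while r = 3 gives 60x60x40 = 144000, so r = 3. A 256x256x256 map gives r = 4, i.e. a 64x64x64 map.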
Options for making surface/wireframe model
- How to determine the threshold density value for the surface/wireframe model?
We provide four methods to determine the threshold density value.
The names of these four options begin with 'mc' because we employ the Marching Cubes algorithm to convert a density map into a surface/wireframe model.
-mcth: raw density value for threshold density [-1000.000000]
-mcsd: SD value for threshold density [3.000000]
-mcnv: number for voxel for threshold density [-1]
-mcvo: volume (A^3) for threshold density [-1.000000]
These four options are ignored if a negative value is assigned.
In the default setting, the option -mcsd 3.0 is assigned, which means that
the threshold density := [average density] + 3.0 * [standard deviation of density].
If the putative volume of the molecule is known, you can assign the volume, in cubic angstroms, with the option -mcvo.
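One way to turn a voxel count (-mcnv) or a molecular volume (-mcvo) into a threshold density is to pick the density of the N-th highest voxel, so that about N voxels lie above the contour; a volume is first converted to a voxel count by dividing by the voxel volume. This C sketch is an assumption about the idea, not gmconvert's actual procedure: which grid width is used for the conversion, and how ties are handled, are not specified here, and the function names are hypothetical.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: descending order of float densities. */
static int cmp_desc(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x < y) - (x > y);
}

/* Density threshold such that about nvox voxels have density >= threshold
   (ties at the threshold may admit a few more). Returns NAN on bad input. */
double threshold_by_count(const float *dens, long n, long nvox)
{
    if (n <= 0 || nvox <= 0) return NAN;
    float *tmp = malloc(sizeof(float) * (size_t)n);
    if (tmp == NULL) return NAN;
    memcpy(tmp, dens, sizeof(float) * (size_t)n);
    qsort(tmp, (size_t)n, sizeof(float), cmp_desc);
    if (nvox > n) nvox = n;
    double th = tmp[nvox - 1];  /* density of the nvox-th highest voxel */
    free(tmp);
    return th;
}

/* Convert a volume (cubic angstroms) into a voxel count for voxel width w (angstroms). */
long voxels_for_volume(double volume, double w)
{
    return (long)(volume / (w * w * w) + 0.5);
}
```

For instance, a volume of 1000 cubic angstroms on a 2-angstrom grid corresponds to 125 voxels.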
- How to control the quality of the surface/wireframe model?
The grid width, assigned by the option -gw, controls the quality of the surface/wireframe model. The default value is 4 angstroms.
-gw : grid width (angstrom) [4.000000]
If you output in the VRML format (-owrl), you can choose the surface model with the option -mcSW S.
-mcSW : model type. 'S'urface, 'W'ireframe [W]
In the surface VRML model, you can assign the color of the surface with the option -mcRGBT.
-mcRGBT: RGBT string (red:green:blue:transparency) [0:1:0:0]
File format for GMM
An example of a GMM (PDB code: 1omp, number of Gaussian functions = 2) is shown as follows:
HEADER 3D Gaussian Mixture Model
REMARK COMMAND gmconvert -ipdb /DB/PDBv3/om/pdb1omp.ent -ng 2 -ogmm 1omp_g2.gmm
REMARK START_DATE Feb 16,2013 16:40:19
REMARK END_DATE Feb 16,2013 16:40:19
REMARK COMP_TIME_SEC 0.024982 2.498198e-02
REMARK FILENAME 1omp_g2.gmm
REMARK NGAUSS 2
HETATM 1 GAU GAU A 1 1.491 -5.359 -14.106 0.473 0.473
REMARK GAUSS 1 W 0.4731064444
REMARK GAUSS 1 M 1.491210 -5.358893 -14.106107
REMARK GAUSS 1 CovM xx 68.3302668925 xy 1.4432837673 xz 20.6877652908
REMARK GAUSS 1 CovM yy 91.2978302824 yz -0.6854124916 zz 78.2751151639
HETATM 2 GAU GAU A 2 -1.303 4.885 12.899 0.527 0.527
REMARK GAUSS 2 W 0.5268935556
REMARK GAUSS 2 M -1.302614 4.884799 12.899464
REMARK GAUSS 2 CovM xx 99.7411943616 xy -54.9586151001 xz 7.6722146525
REMARK GAUSS 2 CovM yy 99.9101607174 yz -11.9555659402 zz 76.8822244943
TER
This is a pseudo-PDB format. If a molecular viewer opens it as a PDB file, it reads only the "HETATM" lines, which describe the center of each Gaussian distribution function.
However, the important information of this file is described in the "REMARK" lines.
A Gaussian Mixture Model is the weighted sum of Gaussian Distribution Functions (GDFs).
Its parameters are Ngauss (the number of GDFs) and Ngauss sets of
{weight, center position (x,y,z), covariance matrix (3x3)}.
The covariance matrix (CovM) is a 3x3 symmetric matrix, so it requires only six parameters (xx, xy, xz, yy, yz, zz).
These parameters are described in the following format:
REMARK NGAUSS [Number of GDFs for GMM]
REMARK GAUSS [GDFnumber] W [Weight for GDF]
REMARK GAUSS [GDFnumber] M [Center position of GDF (x y z) ]
REMARK GAUSS [GDFnumber] CovM xx [xx of CovM] xy [xy of CovM] xz [xz of CovM]
REMARK GAUSS [GDFnumber] CovM yy [yy of CovM] yz [yz of CovM] zz [zz of CovM]
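A program that reads GMM files only needs the REMARK lines above. The following C sketch parses one line at a time with sscanf and fills an array of GDFs (GDF numbers are 1-based); HETATM and other lines are ignored. GmmGdf and parse_gmm_line are hypothetical names, not part of gmconvert.

```c
#include <stdio.h>

/* Parameters of one GDF as stored in the REMARK lines. */
typedef struct { double w, m[3], xx, xy, xz, yy, yz, zz; } GmmGdf;

/* Parse one line of a GMM file into gdfs[] (capacity maxg).
   Returns Ngauss for a "REMARK NGAUSS" line and 0 for every other line. */
int parse_gmm_line(const char *line, GmmGdf *gdfs, int maxg)
{
    int k, ng;
    double a, b, c;

    if (sscanf(line, "REMARK NGAUSS %d", &ng) == 1) return ng;
    if (sscanf(line, "REMARK GAUSS %d", &k) != 1 || k < 1 || k > maxg) return 0;

    GmmGdf *g = &gdfs[k - 1];
    if (sscanf(line, "REMARK GAUSS %*d W %lf", &a) == 1) {
        g->w = a;
    } else if (sscanf(line, "REMARK GAUSS %*d M %lf %lf %lf", &a, &b, &c) == 3) {
        g->m[0] = a; g->m[1] = b; g->m[2] = c;
    } else if (sscanf(line, "REMARK GAUSS %*d CovM xx %lf xy %lf xz %lf", &a, &b, &c) == 3) {
        g->xx = a; g->xy = b; g->xz = c;
    } else if (sscanf(line, "REMARK GAUSS %*d CovM yy %lf yz %lf zz %lf", &a, &b, &c) == 3) {
        g->yy = a; g->yz = b; g->zz = c;
    }
    return 0;
}
```

To read a whole file, call parse_gmm_line on every line obtained with fgets, using the NGAUSS value to size the array.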
Reference
Kawabata, T.
Multiple subunit fitting into a low-resolution density map of a macromolecular complex using a gaussian mixture model.
Biophys J 2008 Nov 15;95(10):4643-58.