RasTop Manual


File Formats

PDB files
  RasMol and PDB
  NMR models
  PDB colours
CIF and mmCIF Files

RasMol Scripts
  Inline scripts
  RSM scripts
  Clipboard formats

 

Protein Data Bank Files

 

If you do not have the PDB documentation, you may find the following summary of the PDB file format useful. The Protein Data Bank is a computer-based archival database for macromolecular structures. The database was established in 1971 by Brookhaven National Laboratory, Upton, New York, as a public domain repository for resolved crystallographic structures. The Bank uses a uniform format to store atomic coordinates and partial bond connectivities as derived from crystallographic studies. In 1999 the Protein Data Bank moved to the Research Collaboratory for Structural Bioinformatics (RCSB).

PDB file entries consist of records of 80 characters each. Using the punched card analogy, columns 1 to 6 contain a record-type identifier and columns 7 to 70 contain data. In older entries, columns 71 to 80 are normally blank, but may contain sequence information added by library management programs. In new entries conforming to the 1996 PDB format, those columns carry other information. The first four characters of the record identifier are sufficient to identify the type of record uniquely, and the syntax of each record is independent of the order of records within any entry for a particular macromolecule.

 

The only record types that are of major interest to the RasMol program are the ATOM and HETATM records which describe the position of each atom. ATOM/HETATM records contain standard atom names and residue abbreviations, along with sequence identifiers, coordinates in Ångstrom units, occupancies and thermal motion factors. The exact details are given below as a FORTRAN format statement. The "fmt" column indicates use of the field in all PDB formats, in the 1992 and earlier formats or in the 1996 and later formats.

FORMAT(6A1,I5,1X,A4,A1,A3,1X,A1,I4,A1,3X,3F8.3,2F6.2,1X,I3,2X,A4,2A2)

Column   Content                                             fmt
1-6      'ATOM' or 'HETATM'                                  all
7-11     Atom serial number (may have gaps)                  all
13-16    Atom name, in IUPAC standard format                 all
17       Alternate location indicator (A, B or C)            all
18-20    Residue name, in IUPAC standard format              all
22       Chain identifier                                    all
23-26    Residue sequence number                             all
27       Code for insertions of residues (e.g. 66A & 66B)    all
31-38    X coordinate                                        all
39-46    Y coordinate                                        all
47-54    Z coordinate                                        all
55-60    Occupancy                                           all
61-66    Temperature factor                                  all
68-70    Footnote number                                     92
73-76    Segment Identifier (left-justified)                 96
77-78    Element Symbol (right-justified)                    96
79-80    Charge on the Atom                                  96
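
The column layout above maps directly onto string slices. The following is a minimal Python sketch, not RasMol's own reader: the names `Atom` and `parse_atom` are hypothetical, only the fields common to all PDB formats are read, and the chain identifier is assumed to sit in column 22 (the A1 field that follows the residue name in the FORMAT statement). The sample record is built from the same layout so that no spaces have to be counted by hand.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    record: str        # 'ATOM' or 'HETATM'
    serial: int        # columns 7-11
    name: str          # columns 13-16, padding kept (e.g. ' N  ')
    alt_loc: str       # column 17
    res_name: str      # columns 18-20
    chain: str         # column 22 (assumed position of the chain identifier)
    res_seq: int       # columns 23-26
    ins_code: str      # column 27
    x: float           # columns 31-38
    y: float           # columns 39-46
    z: float           # columns 47-54
    occupancy: float   # columns 55-60
    temp_factor: float # columns 61-66


def parse_atom(line: str) -> Atom:
    """Slice one ATOM/HETATM record; column n in the table is index n-1 here."""
    line = line.rstrip("\r\n").ljust(80)
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record: %r" % record)
    return Atom(
        record=record,
        serial=int(line[6:11]),
        name=line[12:16],
        alt_loc=line[16],
        res_name=line[17:20],
        chain=line[21],
        res_seq=int(line[22:26]),
        ins_code=line[26],
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
    )


# Sample record assembled field by field, following the FORTRAN layout.
sample = "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    "ATOM", 1, " N", "", "ALA", "A", 1, "", 11.104, 6.134, -6.504, 1.0, 0.0)
atom = parse_atom(sample)
print(atom.res_name, atom.chain, atom.res_seq, atom.x, atom.y, atom.z)
```

A record with blank numeric fields would raise a ValueError in this sketch; a real reader would need to decide how to treat such records.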

 

 

Residues occur in order starting from the N-terminal residue for proteins and 5'-terminus for nucleic acids. If the residue sequence is known, certain atom serial numbers may be omitted to allow for future insertion of any missing atoms. Within each residue, atoms are ordered in a standard manner, starting with the backbone (N-C-C-O for proteins) and proceeding in increasing remoteness from the alpha carbon, along the side chain.

HETATM records are used to define post-translational modifications and cofactors associated with the main molecule. TER records are interpreted as breaks in the main molecule's backbone.

If present, RasMol also inspects HEADER, COMPND, HELIX, SHEET, TURN, CONECT, CRYST1, SCALE, MODEL, ENDMDL, EXPDTA and END records. Information such as the name, database code, revision date and classification of the molecule are extracted from HEADER and COMPND records, initial secondary structure assignments are taken from HELIX, SHEET and TURN records, and the end of the file may be indicated by an END record.


RasMol Interpretation of PDB fields

Atoms located at 9999.000, 9999.000, 9999.000 are assumed to be Insight pseudo atoms and are ignored by RasMol. Atom names beginning ' Q' are also assumed to be pseudo atoms or position markers.
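
As an illustration of the two rules above, the check can be written as a small Python predicate. This is only a sketch; the function name `is_pseudo_atom` is hypothetical, and `name` is assumed to be the raw four-character atom name field (columns 13 to 16) with its leading space intact.

```python
def is_pseudo_atom(name: str, x: float, y: float, z: float) -> bool:
    """True for atoms at the 9999.000 sentinel position or with names beginning ' Q'."""
    at_sentinel = (x, y, z) == (9999.0, 9999.0, 9999.0)
    return at_sentinel or name.startswith(" Q")


print(is_pseudo_atom(" CA ", 9999.0, 9999.0, 9999.0))  # sentinel coordinates
print(is_pseudo_atom(" Q1 ", 1.0, 2.0, 3.0))            # name begins ' Q'
print(is_pseudo_atom(" CA ", 1.0, 2.0, 3.0))            # ordinary atom
```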

When a data file contains an NMR structure, multiple conformations may be placed in a single PDB file delimited by pairs of MODEL and ENDMDL records. RasMol displays all the NMR models contained in the file.
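
The MODEL/ENDMDL delimiting can be sketched in Python as follows. This is an illustration rather than RasMol's loader: the function name `split_models` is hypothetical, a file with no MODEL records is treated as one model, and atoms that appear outside any MODEL/ENDMDL pair in a multi-model file are simply dropped by this sketch.

```python
from typing import Iterable, List


def split_models(lines: Iterable[str]) -> List[List[str]]:
    """Group ATOM/HETATM records by the MODEL/ENDMDL pair that encloses them."""
    models: List[List[str]] = []
    loose: List[str] = []   # atom records seen outside any MODEL block
    current = None          # records of the model being read, or None
    for line in lines:
        tag = line[0:6].strip()
        if tag == "MODEL":
            current = []
        elif tag == "ENDMDL":
            if current is not None:
                models.append(current)
            current = None
        elif tag in ("ATOM", "HETATM"):
            (current if current is not None else loose).append(line)
    if current:             # final model without a closing ENDMDL
        models.append(current)
    if not models and loose:
        return [loose]      # no MODEL records at all: a single model
    return models


records = ["MODEL        1", "ATOM      1", "ENDMDL",
           "MODEL        2", "HETATM    2", "ATOM      3", "ENDMDL"]
print([len(m) for m in split_models(records)])
```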

Residue names "CSH", "CYH" and "CSM" are considered pseudonyms for cysteine "CYS". Residue names "WAT", "H20", "SOL" and "TIP" are considered pseudonyms for water "HOH". The residue name "D20" is considered heavy water "DOD". The residue name "SUL" is considered a sulphate ion "SO4". The residue name "CPR" is considered to be cis-proline and is translated as "PRO". The residue name "TRY" is considered a pseudonym for tryptophan "TRP".

RasMol uses the HETATM fields to define the sets hetero, water, solvent and ligand. Any group with the name "HOH", "DOD", "SO4" or "PO4" (or aliased to one of these names by the preceding rules) is considered a solvent and is considered to be defined by a HETATM field.
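
The aliasing and solvent rules of the two preceding paragraphs amount to a lookup table followed by a set membership test. The Python sketch below copies the names exactly as listed in the text; the identifiers `RESIDUE_ALIASES`, `SOLVENT_NAMES`, `canonical_residue` and `is_solvent` are hypothetical and not taken from the RasMol source.

```python
# Pseudonym -> canonical residue name, as listed in the text.
RESIDUE_ALIASES = {
    "CSH": "CYS", "CYH": "CYS", "CSM": "CYS",
    "WAT": "HOH", "H20": "HOH", "SOL": "HOH", "TIP": "HOH",
    "D20": "DOD",
    "SUL": "SO4",
    "CPR": "PRO",
    "TRY": "TRP",
}

# Groups treated as solvent once aliases have been resolved.
SOLVENT_NAMES = {"HOH", "DOD", "SO4", "PO4"}


def canonical_residue(name: str) -> str:
    """Translate a residue name through the alias table; unknown names pass through."""
    key = name.strip()
    return RESIDUE_ALIASES.get(key, key)


def is_solvent(name: str) -> bool:
    """True if the residue, after aliasing, is one of the solvent groups."""
    return canonical_residue(name) in SOLVENT_NAMES


for residue in ("WAT", "SUL", "CPR", "ALA"):
    print(residue, canonical_residue(residue), is_solvent(residue))
```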

RasMol only respects CONECT connectivity records in PDB files containing fewer than 256 atoms. This is explained in more detail in the section on determining molecule connectivity. CONECT records that define a bond more than once are interpreted as specifying the bond order of that bond, i.e. a bond specified twice is a double bond and a bond specified three (or more) times is a triple bond. This is not a standard PDB feature.
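
The bond-order convention can be illustrated by counting how often each bond is mentioned. The Python sketch below is an assumption-laden illustration: the function name `bond_orders` is hypothetical, the input is assumed to be a list of (atom serial, atom serial) pairs already extracted from the CONECT records with one entry per mention, and the text does not say how RasMol treats a bond that is listed once from each of its two atoms, so such reciprocal listings should be removed before calling it.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bond_orders(pairs: Iterable[Tuple[int, int]]) -> Dict[Tuple[int, int], int]:
    """Count mentions of each bond; two mentions mean double, three or more mean triple."""
    counts = Counter(tuple(sorted(pair)) for pair in pairs)
    return {bond: min(n, 3) for bond, n in counts.items()}


mentions = [(1, 2), (1, 2), (2, 3), (4, 5), (4, 5), (4, 5), (4, 5)]
print(bond_orders(mentions))
```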


PDB Colour Scheme Specification

RasMol also accepts the supplementary COLO record type in PDB files. This record format was introduced by David Bacon's Raster3D program for specifying the colour scheme to be used when rendering the molecule. This extension is not currently supported by the PDB. The COLO record has the same basic layout as the ATOM and HETATM records described above.

Colours are assigned to atoms using a matching process. The Mask field is used in the matching process as follows. First RasMol reads in and remembers all the ATOM, HETATM and COLO records in input order. When the user-defined ('User') colour scheme is selected, RasMol goes through each remembered ATOM/HETATM record in turn, and searches for a COLO record that matches in all of columns 7 through 30. The first such COLO record to be found determines the colour and radius of the atom.

Column   Content
1-6      'COLOR' or 'COLOUR'
7-30     Mask (described below)
31-38    Red component
39-46    Green component
47-54    Blue component
55-60    Sphere radius in Ångstroms
61-70    Comments
 

Note that the Red, Green and Blue components are in the same positions as the X, Y, and Z components of an ATOM or HETA record, and the van der Waals radius goes in the place of the Occupancy. The Red, Green and Blue components must all be in the range 0 to 1.

 

In order that one COLO record can provide colour and radius specifications for more than one atom (e.g. based on residue, atom type, or any other criterion for which labels can be given somewhere in columns 7 through 30), a 'don't-care' character, the hash mark "#" (number or sharp sign), is used. This character, when found in a COLO record, matches any character in the corresponding column of an ATOM/HETATM record. All other characters must match identically to count as a match. As an extension to the specification, any atom that fails to match a COLO record is displayed in white.
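
The matching process described in this section can be sketched in Python as below. This is an illustration under stated assumptions, not RasMol's code: the names `mask_matches` and `user_colour` are hypothetical, white is taken to be (1, 1, 1) on the 0 to 1 scale, and the radius for an unmatched atom is returned as None because the text does not specify one. The sample records are assembled from the column layout rather than typed by hand.

```python
from typing import Iterable, Optional, Tuple

WILDCARD = "#"
WHITE = (1.0, 1.0, 1.0)  # assumed RGB for "white" on the 0-1 scale


def mask_matches(colo_line: str, atom_line: str) -> bool:
    """Compare columns 7-30 character by character; '#' in the mask matches anything."""
    mask = colo_line[6:30].ljust(24)
    field = atom_line[6:30].ljust(24)
    return all(m == WILDCARD or m == a for m, a in zip(mask, field))


def user_colour(atom_line: str, colo_lines: Iterable[str]
                ) -> Tuple[Tuple[float, float, float], Optional[float]]:
    """Return ((r, g, b), radius) from the first matching COLO record, in input order."""
    for colo in colo_lines:
        if mask_matches(colo, atom_line):
            rgb = (float(colo[30:38]), float(colo[38:46]), float(colo[46:54]))
            return rgb, float(colo[54:60])
    return WHITE, None


# Mask that only constrains the residue name (columns 18-20).
ala_mask = "#" * 11 + "ALA" + "#" * 10
colo = "%-6s%s%8.3f%8.3f%8.3f%6.2f" % ("COLOR", ala_mask, 1.0, 0.0, 0.0, 1.5)

atom_layout = "%-6s%5d %-4s%1s%3s %1s%4d%1s   "
ala = atom_layout % ("ATOM", 1, " CA", "", "ALA", "A", 1, "")
gly = atom_layout % ("ATOM", 2, " CA", "", "GLY", "A", 2, "")

print(user_colour(ala, [colo]))
print(user_colour(gly, [colo]))
```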


Multiple NMR Models

RasMol loads all of the NMR models from a PDB file regardless of which command is used, 'load pdb <filename>' or 'load nmrpdb <filename>'.

Once multiple NMR conformations have been loaded they may be manipulated with the atom expression extensions described in 'Primitive Expressions'. In particular, the command 'restrict */1' will restrict the display to the first model only.

 

CIF and mmCIF Format Files


 

CIF is the IUCr standard for presentation of small molecules and mmCIF is intended as the replacement for the fixed-field PDB format for presentation of macromolecular structures. RasMol can accept data sets in either format.

There are many useful sites on the World Wide Web where information tools and software related to CIF, mmCIF and the PDB can be found. The following are good starting points for exploration:

The International Union of Crystallography (IUCr) provides access to software, dictionaries, policy statements and documentation relating to CIF and mmCIF at: IUCr, Chester, England (www.iucr.org/iucr-top/cif/) with many mirror sites.

The Nucleic Acid Database Project provides access to its entries, software and documentation, with an mmCIF page giving access to the dictionary and mmCIF software tools at Rutgers University, New Jersey, USA (http://ndbserver.rutgers.edu/NDB/mmcif) with many mirror sites.

This version of RasMol restricts CIF or mmCIF tag values to essentially the same conventions as are used for the fixed-field PDB format. Thus chain identifiers and alternate conformation identifiers are limited to a single character, atom names are limited to 4 characters, etc.

A search is made through multiple data blocks for the desired tags, so a single data set may be composed from multiple data blocks, but multiple data sets may not be stacked in the same file. RasMol interprets the following CIF and mmCIF tags:

mmCIF tag                               CIF tag                             Used for
_struct_biol.details                                                        Info.classification
_database_2.database_code                                                   Info.identcode
_entry.id
_struct_biol.id
_struct.title                                                               Info.moleculename
                                        _chemical_name_common
                                        _chemical_name_systematic
                                        _chemical_name_mineral
_symmetry.space_group_name_H-M          _symmetry_space_group_name_H-M      Info.spacegroup
_cell.length_a                          _cell_length_a                      Info.cell
_cell.length_b                          _cell_length_b
_cell.length_c                          _cell_length_c
_cell.angle_alpha                       _cell_angle_alpha
_cell.angle_beta                        _cell_angle_beta
_cell.angle_gamma                       _cell_angle_gamma
_atom_sites.fract_transf_matrix[1][1]   _atom_sites_fract_tran_matrix_11    Used to compute orthogonal coords
...                                     ...
_atom_sites.fract_transf_vector[1]      _atom_sites_fract_tran_vector_1
...                                     ...
_atom_sites.cartn_transf_matrix[1][1]   _atom_sites_cartn_tran_matrix_11    Alternative to compute orth. coords
...                                     ...
_atom_sites.cartn_transf_vector[1]      _atom_sites_cartn_tran_vector_1
...                                     ...
_atom_site.cartn_x                      _atom_site_cartn_x                  atomic coordinates
...                                     ...
or
_atom_site.fract_x                      _atom_site_fract_x
...                                     ...
_struct_conn.id                                                             bonds
...
_geom_bond.atom_site_id_1               _geom_bond_atom_site_label_1
...                                     ...
_struct_conf.id                                                             helices, sheets, turns
_struct_sheet_range.id

 

RasMol Script Format

 

Script Files

A script is a text file containing a set of commands that are executed sequentially by RasMol using the 'script' command. The RasMol command 'source' is synonymous with the 'script' command.

All commands are allowed in a script file, including the 'load' command, which loads a molecular coordinate file while the script executes. In this case, be sure to place a 'zap' command just before it to erase any molecule already loaded.

Scripts may be written with a simple text editor or generated automatically using the 'write script' or 'write rasmol' command to describe the current representation.

A RasMol script file may contain further 'script' commands up to a maximum "depth" of 10, allowing complicated sequences of actions to be executed. This is particularly useful for teaching purposes in combination with the 'pause' command, which temporarily interrupts script execution.

The 'echo' command writes text to the command line, allowing online annotation. It is particularly useful just before a 'pause' command.

RasMol ignores all characters after the first '#' character on each line allowing the script text files to be annotated.
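
The comment rule can be illustrated with a few lines of Python that reduce a script to its effective commands. This is a sketch only; the function names `strip_comment` and `script_commands` are hypothetical, and dropping blank lines is a convenience of the sketch rather than documented RasMol behaviour.

```python
from typing import List


def strip_comment(line: str) -> str:
    """Discard everything from the first '#' to the end of the line."""
    return line.split("#", 1)[0].rstrip()


def script_commands(text: str) -> List[str]:
    """Return the non-empty commands left after comments are removed."""
    commands = []
    for raw in text.splitlines():
        command = strip_comment(raw).strip()
        if command:
            commands.append(command)
    return commands


script = "#!rasmol -script\nzap\nload pdb inline  # coordinates follow 'exit'\n\necho hello\n"
print(script_commands(script))
```

Note that under this rule the optional '#!rasmol -script' first line is itself a comment.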

The 'refresh' command refreshes the current drawing.  

The 'exit' command executed within a script terminates its current execution. The command applies to the current 'script level' only and is useful for mixing molecular coordinate files with script files (see Inline Script below).


Inline Script

Inline script files combine scripting information with molecular coordinates in a single file.

The format is typically:

#!rasmol -script (optional)
zap
load <fileformat> inline
...
exit
<molecular data>

Typically this is used in the command 'load pdb inline'. The 'exit' command terminates execution of the current script and returns control to the command line (or the calling script). This means any lines following 'exit' are never interpreted by RasMol. These may be used to store atomic coordinates in PDB, CIF or mmCIF file format.
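
To make the role of 'exit' concrete, the following Python sketch separates an inline script file into its script part and the coordinate data stored after 'exit'. The function name `split_inline` is hypothetical, and the sketch assumes that 'exit' stands alone on its line (apart from a trailing comment) and that command keywords are case-insensitive.

```python
from typing import List, Tuple


def split_inline(text: str) -> Tuple[List[str], List[str]]:
    """Return (script_lines, data_lines); the script part ends with the first 'exit'."""
    lines = text.splitlines()
    for i, raw in enumerate(lines):
        command = raw.split("#", 1)[0].strip()  # ignore any trailing comment
        if command.lower() == "exit":
            return lines[: i + 1], lines[i + 1:]
    return lines, []  # no 'exit': the whole file is script


inline_file = "\n".join([
    "#!rasmol -script",
    "zap",
    "load pdb inline",
    "wireframe off",
    "exit",
    "HEADER    EXAMPLE",
    "END",
])
script_part, data_part = split_inline(inline_file)
print(len(script_part), "script lines,", len(data_part), "data lines")
```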


RSM Scripts

RSM scripts are RasTop specific.

RSM scripts are an attempt to link the scripting information with the molecular coordinates in a single file, in a slightly different way from inline scripts. The <*.rsm> extension is just a convenient way for RasTop to handle these files; it does not mean that the molecular coordinates they contain are written in a specific molecular format. Indeed, the original extension, for example <*.cif> or <*.pdb>, may be kept.

When a file is opened, RasTop looks for the field '#!rasmol -rsm file'. When found, RasTop looks for the field 'load <format> inline' on the next line. If it finds one, RasTop asks RasMol to reopen the file and to load the molecule in the specified format. RasTop then returns to the initial script line and executes the remainder of the script. For compatibility between molecular files and RSM files, it is advisable to terminate the molecular data before the start of the script. In PDB files this is done with the keyword END at the beginning of a new line. The script can make use of the keyword 'exit'. This causes the script to stop and allows the user to put other kinds of information at the end of the file. In RasTop 1.3.1 the 'exit' keyword is automatically entered at the end.

Format:

<molecular coordinate data in format>
#!rasmol -rsm file
load <format> inline
<rasmol script to describe the molecule>
exit (optional)

The tag '#!rasmol -rsm file' is a required field and marks the beginning of the script.  The <format> field indicates the format of the molecular coordinates.  The 'exit' command at the end of the script is optional.
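
The detection steps for the simple format above can be sketched in Python as follows. This is not RasTop's implementation: the name `split_rsm` is hypothetical, the sketch only handles the layout in which 'load <format> inline' immediately follows the tag line, and it does not cover files where other commands come between the two.

```python
from typing import List, Optional, Tuple

RSM_TAG = "#!rasmol -rsm file"


def split_rsm(text: str) -> Optional[Tuple[str, List[str], List[str]]]:
    """Return (format, coordinate_lines, script_lines), or None if no tag pair is found."""
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip().startswith(RSM_TAG):
            # The line after the tag must read: load <format> inline
            words = lines[i + 1].split("#", 1)[0].split()
            if len(words) == 3 and words[0] == "load" and words[2] == "inline":
                return words[1], lines[:i], lines[i:]
    return None


rsm_file = "\n".join([
    "HEADER    EXAMPLE",
    "END",
    "#!rasmol -rsm file",
    "load pdb inline",
    "spacefill",
    "exit",
])
print(split_rsm(rsm_file))
```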

World Script:

Starting with version 2.0, RasTop saves multiple molecules in a single <RSM> script. The first molecule is saved in the main script <filename.rsm>; molecules of higher index are saved under the same filename with the suffix 'wn' appended, where n is the molecule index (filenamew1, filenamew2, etc.). The following is a typical main script, annotated for better understanding:

#!rasmol -rsm file	         #required
zap
set connect on
load <format> inline	#required
set title <title>
# Colour details
<...>
# Transformation
reset
rotate molecule
set picking ident        #some commands are picking mode dependent!
set worlddepth 18984	 #set worlddepth to the largest molecule
# zoom 203.21            #define zoom before translation
scale 37.37              #but use scale indeed, see command description
position x 0.000 y 0.000 z 0.000
<...>

<...>

# Molecules
add <title>w2		#add multiple molecules
add <title>w3
add <title>w4
<...>

# World Transformation
rotate world           #start world loading
centre origin          #only way to make zooming and rotation coherent
set axes world off
set worlddepth 18984
rotate world x 160.64
rotate world y 54.13
rotate world z -158.34
# translate world x -36.79
# translate world y 1.34
position world x -13.749 y 0.500
reset slab
slab off
reset depth
depth off
molecule 2		#select centre in different molecule in this case
centre molecule
molecule 1

exit

Clipboard Formats

Clipboard formats are RasTop specific.

On the command line, RasTop accepts the commands 'clipboard image', 'clipboard selected' and 'clipboard position' to copy the current image, the current atom selection or the current molecule position to the clipboard, respectively. The latter two are in text format with the following starting tags:

Atom selection:

#!rasmol -selection
<rasmol script to describe the current atom selection>

Position:

#!rasmol -position
<rasmol script to describe the current molecule or world position>

When the world is active, the world position is copied to the clipboard. When the molecule is active, the molecule position is copied to the clipboard.

The command 'clipboard paste' pastes the clipboard content. The clipboard may also be pasted into a different running instance of RasTop.

RasTop recognizes the three following tags:

#!rasmol -script
#!rasmol -position
#!rasmol -selection

Therefore, it is possible to copy a script from a text editor and paste it into RasTop, provided that it starts with the '#!rasmol -script' tag.
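
The tag recognition can be pictured as a check on the first line of the clipboard text. The Python sketch below is illustrative only; the names `CLIPBOARD_TAGS` and `clipboard_kind` are hypothetical, and the tag is assumed to start at the very first character of the text.

```python
from typing import Optional

# The three starting tags listed above, mapped to a short label.
CLIPBOARD_TAGS = {
    "#!rasmol -script": "script",
    "#!rasmol -position": "position",
    "#!rasmol -selection": "selection",
}


def clipboard_kind(text: str) -> Optional[str]:
    """Return which kind of RasTop content the text announces, or None if untagged."""
    first_line = text.split("\n", 1)[0]
    for tag, kind in CLIPBOARD_TAGS.items():
        if first_line.startswith(tag):
            return kind
    return None


print(clipboard_kind("#!rasmol -script\nzap\nload pdb 1crn.pdb"))
print(clipboard_kind("select all"))
```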

 
