Atom Expressions
RasMol atom expressions uniquely identify an arbitrary group of atoms
within a molecule. An atom expression is either a primitive expression,
a predefined set, a comparison expression, a 'within' expression, or a
logical (boolean) combination of these expression types.
The logical operators allow complex queries to be constructed out of
simpler ones using the standard boolean connectives 'and',
'or' and 'not'. These may be abbreviated
by the symbols "&", "|" and
"!", respectively. Parentheses (brackets) may be used
to alter the precedence of the operators. For convenience, a comma may
also be used for boolean disjunction.
Because the atom expression is evaluated separately for each atom,
'protein and backbone' selects only the protein backbone atoms, not all
protein atoms together with all backbone atoms (which would also include
nucleic acid backbones)!
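The per-atom evaluation rule can be sketched in Python. The `Atom` record and its `is_protein`/`is_backbone` flags are hypothetical stand-ins for RasMol's predefined sets, not RasMol's internal data structures; the sketch only shows that each boolean operator combines the truth values computed for one atom at a time.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Atom:
    # Hypothetical atom record for illustration; RasMol's internals differ.
    name: str
    is_protein: bool
    is_backbone: bool


# A predicate answers the question "is this one atom selected?"
Predicate = Callable[[Atom], bool]


def and_(p: Predicate, q: Predicate) -> Predicate:
    return lambda atom: p(atom) and q(atom)


def or_(p: Predicate, q: Predicate) -> Predicate:
    return lambda atom: p(atom) or q(atom)


def not_(p: Predicate) -> Predicate:
    return lambda atom: not p(atom)


def select(atoms: List[Atom], pred: Predicate) -> List[str]:
    # The expression is evaluated once per atom, never over whole sets.
    return [atom.name for atom in atoms if pred(atom)]


protein: Predicate = lambda atom: atom.is_protein
backbone: Predicate = lambda atom: atom.is_backbone

atoms = [
    Atom("CA", is_protein=True, is_backbone=True),   # protein backbone
    Atom("CB", is_protein=True, is_backbone=False),  # protein side chain
    Atom("P", is_protein=False, is_backbone=True),   # nucleic acid backbone
]

print(select(atoms, and_(protein, backbone)))  # ['CA']
print(select(atoms, or_(protein, backbone)))   # ['CA', 'CB', 'P']
```

Reading 'protein and backbone' as a union of two sets would correspond to `or_` here, which is exactly the misreading the text warns against.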
RasMol primitive expressions are the fundamental building blocks of
atom expressions. There are two types of primitive expression. The first
type is used to identify a given residue number or range of residue
numbers. A single residue is identified by its number (position in the
sequence), and a range is specified by lower and upper bounds separated by
a hyphen character. For example, 'select 5,6,7,8' is equivalent to
'select 5-8'. Note that this selects the given residue numbers in all
macromolecule chains.
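A minimal sketch of how a comma-separated list of residue numbers and ranges might be expanded, assuming inclusive bounds. `parse_residue_numbers` is a hypothetical helper; it only produces the set of numbers and does not perform the matching against every chain. Accepting a leading minus sign (for the negative residue numbers PDB files permit) is an assumption of the sketch, not something the text above specifies for ranges.

```python
import re
from typing import Set

# One item: a residue number, optionally followed by "-" and an upper bound.
# A leading minus sign on either bound is accepted (an assumption).
_ITEM = re.compile(r"(-?\d+)(?:-(-?\d+))?")


def parse_residue_numbers(spec: str) -> Set[int]:
    """Expand a list such as "8, 12, 20-28" into a set of residue numbers."""
    numbers: Set[int] = set()
    for item in spec.split(","):
        match = _ITEM.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"not a residue number or range: {item!r}")
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) is not None else low
        numbers.update(range(low, high + 1))  # both bounds are inclusive
    return numbers


print(parse_residue_numbers("5,6,7,8") == parse_residue_numbers("5-8"))  # True
print(sorted(parse_residue_numbers("8, 12, 16, 20-28")))
# [8, 12, 16, 20, 21, 22, 23, 24, 25, 26, 27, 28]
```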
The second type of primitive expression specifies a sequence of fields
that must match for a given atom. The first part specifies a residue (or
group of residues) and an optional second part specifies the atoms within
those residues. The first part consists of a residue name, optionally
followed by a residue number and/or chain identifier.
A residue name typically consists of up to three alphabetic characters,
which are case insensitive. Hence the primitive expressions 'SER'
and 'ser' are equivalent, identifying all serine residues.
Residue names that contain non-alphabetic characters, such as sulphate
groups, may be delimited using square brackets, e.g. '[SO4]'.
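The residue-name rules above can be sketched as a small normalisation step: strip optional square brackets, then compare without regard to case. The function names are hypothetical, and folding the case of bracketed names as well is an assumption the text does not state.

```python
def normalize_residue_name(token: str) -> str:
    """Strip optional [ ] delimiters and fold case (assumed for bracketed names too)."""
    if len(token) >= 2 and token[0] == "[" and token[-1] == "]":
        token = token[1:-1]
    return token.upper()


def residue_name_matches(token: str, residue_name: str) -> bool:
    # Residue names are compared case-insensitively, so 'ser' matches 'SER'.
    return normalize_residue_name(token) == residue_name.upper()


print(residue_name_matches("ser", "SER"))    # True
print(residue_name_matches("[SO4]", "SO4"))  # True
print(residue_name_matches("ser", "THR"))    # False
```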
The residue number is intended to be the residue's position in the
macromolecule sequence, but negative sequence numbers, gaps in numbering,
or even reverse numbering are permitted in the PDB format. Care must be
taken when specifying both residue name and number. If the group at the
specified position isn't the specified residue then no atoms are selected.
The chain identifier is typically a single case-insensitive alphabetic
or numeric character. Numeric chain identifiers must be separated from
the residue number by a colon character. For example, "SER70:A" specifies
serine 70 in the alphabetic chain "A", and "SER70:1" specifies serine 70
in the numeric chain "1".
A second colon is used to specify an alternate conformer or an NMR
model. For example, the expression "CYS32:A:25.SG" denotes the gamma
sulfur of cysteine residue 32 in chain A of model 25.
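The residue part of a primitive expression (everything before the "." atom part) can be illustrated with a regular-expression parser. This is a sketch, not RasMol's own parser: `ResidueSpec` and `parse_residue_spec` are hypothetical names, and only the colon-separated chain form shown above is handled (chain letters appended directly after a name, as in "*p", are not).

```python
import re
from typing import NamedTuple, Optional

# Residue part only; the ".atom" part is parsed separately.
_RESIDUE = re.compile(
    r"(?P<name>\[[^\]]*\]|[A-Za-z*?]+)?"  # residue name, [bracketed] name or wildcard
    r"(?P<number>-?\d+)?"                 # optional residue number
    r"(?::(?P<chain>[A-Za-z0-9*?]?)"      # ":" then an optional chain identifier
    r"(?::(?P<model>\d+))?)?"             # second ":" then a model number
)


class ResidueSpec(NamedTuple):
    name: Optional[str]    # None means "any residue name"
    number: Optional[int]  # None means "any residue number"
    chain: Optional[str]   # None means "any chain"
    model: Optional[int]   # None means "any model"


def parse_residue_spec(text: str) -> ResidueSpec:
    match = _RESIDUE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse residue specification: {text!r}")
    name, number, chain, model = match.group("name", "number", "chain", "model")
    return ResidueSpec(
        name=name.upper() if name else None,
        number=int(number) if number else None,
        chain=chain.upper() if chain else None,  # "" (as in "::4") means any chain
        model=int(model) if model else None,
    )


print(parse_residue_spec("SER70:A"))
# ResidueSpec(name='SER', number=70, chain='A', model=None)
print(parse_residue_spec("CYS32:A:25"))
# ResidueSpec(name='CYS', number=32, chain='A', model=25)
```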
The second part consists of a period character followed by an atom
name. An atom name may be up to four alphabetic or numeric characters.
An alternate conformation identifier, preceded by a semicolon, or a model
number, preceded by a slash, may optionally be appended to the atom name.
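The atom part can be sketched the same way: a period, an atom name of up to four characters, then an optional ";conformer" or "/model" suffix. `AtomSpec` and `parse_atom_part` are hypothetical names; treating the conformation identifier as a single character is an assumption of this sketch.

```python
import re
from typing import NamedTuple, Optional

_ATOM = re.compile(
    r"\.(?P<atom>[A-Za-z0-9*?]{1,4})"  # "." then an atom name of up to four characters
    r"(?:;(?P<conformer>[A-Za-z0-9])"  # ";" then a conformation identifier (assumed one char)
    r"|/(?P<model>\d+))?"              # or "/" then a model number
)


class AtomSpec(NamedTuple):
    atom: str
    conformer: Optional[str]
    model: Optional[int]


def parse_atom_part(text: str) -> AtomSpec:
    match = _ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse atom part: {text!r}")
    atom, conformer, model = match.group("atom", "conformer", "model")
    return AtomSpec(atom.upper(), conformer, int(model) if model else None)


print(parse_atom_part(".SG"))   # AtomSpec(atom='SG', conformer=None, model=None)
print(parse_atom_part(".*;A"))  # AtomSpec(atom='*', conformer='A', model=None)
```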
An asterisk may be used as a wildcard for a whole field and a question
mark as a single-character wildcard. The wildcard "*" may be omitted from
the residue identifier, e.g. ":A" for chain A, ":A:4" for chain A of
model 4, or "::4" for all atoms in all chains of NMR model 4.
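Wildcard matching on a single field can be sketched by translating the pattern into a regular expression in which "*" matches any run of characters and "?" exactly one. The helpers are hypothetical, and each field is treated as an already-trimmed string; RasMol's handling of fixed-width, space-padded PDB fields is not modelled.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """'*' matches any run of characters, '?' exactly one; all else is literal."""
    escaped = re.escape(pattern.upper())  # re.escape turns '*' into '\*' and '?' into '\?'
    return re.compile(escaped.replace(r"\*", ".*").replace(r"\?", "."))


def field_matches(pattern: str, value: str) -> bool:
    # Comparison is case-insensitive, like residue names in RasMol.
    return wildcard_to_regex(pattern).fullmatch(value.upper()) is not None


for residue in ("ASN", "ASP", "ALA"):
    print(residue, field_matches("as?", residue))
# ASN True
# ASP True
# ALA False
```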
Parts of a molecule may also be distinguished using equality,
inequality and ordering operators on their properties. The format of such
a comparison expression is a property name, followed by a comparison
operator and then an integer value.
The atom properties that may be used in RasMol are 'atomno'
for the atom serial number, 'elemno' for the atom's atomic
number (element), 'resno' for the residue number, 'radius'
for the spacefill radius in RasMol units (or zero if not represented as a
sphere) and 'temperature' for the PDB isotropic
temperature value.
The equality operator is denoted either "="
or "==". The inequality operator is denoted either
"<>", "!=" or
"/=". The ordering operators are "<"
for less than, "<=" for less than or equal
to, ">" for greater than, and ">="
for greater than or equal to.
Examples:
    resno < 23
    temperature >= 900
    atomno == 487
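A sketch of how the comparison expressions above might be parsed into per-atom tests, mapping every operator spelling onto Python's `operator` module. `parse_comparison` and the dictionary-based atom record are hypothetical, and any internal scaling RasMol applies to property values (for example to temperature factors) is not modelled.

```python
import operator
import re
from typing import Callable, Mapping

# Every operator spelling listed above, mapped to a Python comparison.
_OPERATORS = {
    "=": operator.eq, "==": operator.eq,
    "<>": operator.ne, "!=": operator.ne, "/=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}
_PROPERTIES = {"atomno", "elemno", "resno", "radius", "temperature"}
# Two-character operators are tried before their one-character prefixes.
_COMPARISON = re.compile(r"\s*([a-z]+)\s*(==|<>|!=|/=|<=|>=|=|<|>)\s*(-?\d+)\s*")


def parse_comparison(text: str) -> Callable[[Mapping[str, int]], bool]:
    match = _COMPARISON.fullmatch(text.lower())
    if match is None:
        raise ValueError(f"not a comparison expression: {text!r}")
    prop, op, value = match.group(1), match.group(2), int(match.group(3))
    if prop not in _PROPERTIES:
        raise ValueError(f"unknown property: {prop!r}")
    compare = _OPERATORS[op]
    return lambda atom: compare(atom[prop], value)


# Hypothetical atom record with integer property values.
atom = {"atomno": 487, "elemno": 8, "resno": 12, "radius": 0, "temperature": 950}
for expr in ("resno < 23", "temperature >= 900", "atomno == 487", "atomno <> 487"):
    print(expr, parse_comparison(expr)(atom))
# resno < 23 True
# temperature >= 900 True
# atomno == 487 True
# atomno <> 487 False
```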
A RasMol 'within' expression allows atoms to be selected based on their
proximity to another set of atoms. A 'within' expression takes two
parameters separated by a comma and surrounded by parentheses. The first
argument is the "cut-off" distance of the within expression and the
second argument is any valid atom expression. The cut-off distance is
given either as an integer in RasMol units or as a value in Ångstroms
containing a decimal point. An atom is selected if it is within the
cut-off distance of any of the atoms defined by the second argument.
Because the second argument may itself be any atom expression, 'within'
expressions can be nested to build more complex selections.
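A brute-force sketch of the 'within' rule, chosen for clarity rather than speed. The helper names are hypothetical; treating the boundary as inclusive is an assumption, and the conversion factor of 250 RasMol units per Ångstrom is taken from RasMol's general documentation and should be checked before relying on it. Note that atoms in the reference set are themselves selected, since their distance to themselves is zero.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Assumption: one RasMol unit is 1/250 of an Ångstrom.
RASMOL_UNITS_PER_ANGSTROM = 250


def cutoff_in_angstroms(token: str) -> float:
    """A value with a decimal point is in Ångstroms; an integer is in RasMol units."""
    if "." in token:
        return float(token)
    return int(token) / RASMOL_UNITS_PER_ANGSTROM


def select_within(cutoff: float, atoms: Sequence[Point], reference: Sequence[Point]) -> List[int]:
    """Indices of atoms lying within `cutoff` (inclusive) of at least one reference atom."""
    return [
        index
        for index, position in enumerate(atoms)
        if any(math.dist(position, other) <= cutoff for other in reference)
    ]


atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0)]
print(select_within(cutoff_in_angstroms("3.2"), atoms, reference))  # [0, 1]
print(select_within(cutoff_in_angstroms("800"), atoms, reference))  # [0, 1]
```

Nesting follows naturally: the `reference` positions could themselves be the result of another `select_within` call.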
For example, the command 'select within(3.2,backbone)'
selects any atom within a 3.2 Ångstrom radius of any atom in a protein or
nucleic acid backbone. 'Within' expressions are
particularly useful for selecting the atoms around an active site.
The following table gives some useful examples of RasMol atom
expressions.
Expression    Interpretation
*             All atoms
cys           Atoms in cysteines
hoh           Atoms in heterogen water molecules
as?           Atoms in either asparagine or aspartic acid
*120          Atoms at residue 120 of all chains
*p            Atoms in chain P
*.n?          Nitrogen atoms
cys.sg        Sulphur atoms in cysteine residues
ser70.c?      Carbon atoms in serine-70
hem*p.fe      Iron atoms in the heme groups of chain P
*.*;A         All atoms in alternate conformation A
*/4           All atoms in model 4
Examples combining basic expressions:
    backbone and not helix
    within( 8.0, ser70 )
    not (hydrogen or hetero)
    not *.FE and hetero
    8, 12, 16, 20-28
    arg, his, lys