A Formalism of Organic Compounds

Formal Classes of Organic Compounds
Luciana Morogan
No Institute Given
Abstract. abstract
1 Prerequisites
We choose $O$, a nonempty finite alphabet of abstract symbols (called objects) representing chemicals in an aqueous solution. If we denote by $O^*$ the set of all strings of objects in $O$ and by $\lambda \in O^*$ the empty string, then, mathematically speaking, $O^*$ is the free monoid generated by $O$ under the operation of concatenation. For $x \in O^*$, the string of objects $x$ may represent the chemicals in solution needed in the construction of a well-formed molecule. The notation $|x|$ represents the number of object occurrences in $x$ (the total number of chemicals needed in the formation of a molecule represented by $x$) and is called the length of $x$. For $a \in O$, the number of occurrences of object $a$ in $x$ is denoted $|x|_a$ and is called the multiplicity of $a$ in $x$ (or simply the number of copies of object $a$ in $x$), $|x|_a \in \mathbb{N}$. If we denote by $\mathrm{alph}(x)$ ($\mathrm{alph}(x) \subseteq O$) the set of objects forming $x$, then
\[ |x| = \sum_{a \in \mathrm{alph}(x)} |x|_a . \]
The length of the string obtained by deleting from $x$ all symbols not in $U$, $U \subseteq O$, is denoted $|x|_U$, and
\[ |x|_U = \sum_{a \in U} |x|_a . \]
The last working definition given here is that of the Parikh vector of $x$ associated with $\mathrm{alph}(x)$. For $\mathrm{alph}(x) = \{a_1, a_2, \ldots, a_n\}$, $n \in \mathbb{N}_+$ (the order of $a_1, a_2, \ldots, a_n$ is arbitrary but fixed), it is given by the function $\psi_{\mathrm{alph}(x)} : O^* \to \mathbb{N}^n$,
\[ \psi_{\mathrm{alph}(x)}(x) = (|x|_{a_1}, |x|_{a_2}, \ldots, |x|_{a_n}) \]
(for a language $L \subseteq O^*$ we have $\psi(L) = \{\psi(x) \mid x \in L\}$).
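The quantities defined so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each object is encoded as a single character of a Python string, and the function names `length`, `multiplicity`, `alph` and `parikh` are hypothetical helpers, not notation from the text.

```python
from collections import Counter

def length(x, U=None):
    """Return |x|, or |x|_U when a set U of objects is given."""
    if U is None:
        return len(x)
    # Count only the occurrences of symbols that belong to U.
    return sum(1 for symbol in x if symbol in U)

def multiplicity(x, a):
    """Return |x|_a, the number of occurrences of object a in x."""
    return x.count(a)

def alph(x):
    """Return alph(x), the set of objects occurring in x."""
    return set(x)

def parikh(x, order):
    """Return the Parikh vector of x for a fixed ordering of the objects."""
    counts = Counter(x)
    return tuple(counts[a] for a in order)

x = "abcaadff"            # assumption: one character per object
order = sorted(alph(x))   # any order works, as long as it stays fixed
vector = parikh(x, order)
```

With these definitions, the sum of the entries of `vector` equals `length(x)`, which mirrors the identity $|x| = \sum_{a \in \mathrm{alph}(x)} |x|_a$.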
Similarly, we denote the number of object occurrences in an aqueous solution by $|O|$, and for $a \in O$ the number of occurrences of the object $a$ in solution is denoted by $|O|_a$ and is called the multiplicity of $a$ in $O$. The number $|O|_a \in \mathbb{N}$ represents the number of copies of $a$ in $O$. If, by notation, $|O|_a = *$, then we say that the object $a$ is found in an arbitrary number of copies.
As $O$ is the alphabet of objects representing chemicals in an aqueous solution, what matters most is the concentration, that is, the number of copies of each object. To capture this, we take the multiset as the data structure we will work with. Formally, a multiset over the alphabet $O$ of objects is a mapping $M : O \to \mathbb{N} \cup \{*\}$, defined by
\[ M(a) = \begin{cases} |M|_a, & \text{if } a \in O \text{ and } |M|_a \text{ is the number of copies of } a \text{ in the multiset } M, \\ *, & \text{if there is an arbitrary finite number of copies of } a \text{ in the multiset } M. \end{cases} \]

We denote by $\mathrm{alph}(M) = \{a \in O \mid M(a) > 0\}$ the finite alphabet of the multiset $M$ (one may say the support of $M$). Generally, a subset $S \subseteq O$ is interpreted as the multiset defined by
\[ S(a) = 1 \ \text{ for all } a \in S, \qquad S(a) = 0 \ \text{ for all } a \notin S \]
(any subset of $O$ is a multiset in which each object is found in a unique copy).
For any two multisets $M_1, M_2 : O \to \mathbb{N} \cup \{*\}$, we say that $M_2$ is included in $M_1$ iff $M_1(a) \geq M_2(a)$ for all $a \in O$. The union of $M_1$ with $M_2$ is the multiset $M_1 \cup M_2 : O \to \mathbb{N} \cup \{*\}$, $(M_1 \cup M_2)(a) = M_1(a) + M_2(a)$ for all $a \in O$ (if $M_1(a) = *$ or $M_2(a) = *$, then $(M_1 \cup M_2)(a) = M_1(a) + M_2(a) = *$). The difference between $M_1$ and $M_2$ (defined only if $M_2$ is included in $M_1$) is the multiset $M_1 - M_2 : O \overset{\circ}{\longrightarrow} \mathbb{N} \cup \{*\}$, $(M_1 - M_2)(a) = M_1(a) - M_2(a)$ for all $a \in O$ with $M_1(a) \geq M_2(a)$ and $M_1(a)$, $M_2(a)$ finite (if $M_1(a) = *$, then $(M_1 - M_2)(a) = M_1(a) - M_2(a) = *$). We define the multiset $M$ of finite alphabet (support) as $M = \{(a, M(a)) \mid a \in \mathrm{alph}(M)\}$.
Example 1. Let $\mathrm{alph}(M_1) = \mathrm{alph}(M_2) = \{a, b, c\} \subseteq O$, so that $M_i(a) > 0$, $M_i(b) > 0$ and $M_i(c) > 0$ for $i \in \{1, 2\}$. If $M_1 = \{(a, 3), (b, 2), (c, *)\}$, then $M_1$ is the multiset over $O$ with the support $\mathrm{alph}(M_1)$, consisting of three objects $a$, $b$, $c$ such that $a$ appears in three copies, $b$ in two copies and $c$ in an arbitrary finite number of copies. Similarly, let $M_2 = \{(a, 1), (b, 2), (c, 3)\}$. It is easily seen that $M_2$ is included in $M_1$. As $(M_1 \cup M_2)(a) = M_1(a) + M_2(a) = 3 + 1 = 4$, $(M_1 \cup M_2)(b) = M_1(b) + M_2(b) = 4$ and $(M_1 \cup M_2)(c) = M_1(c) + M_2(c) = *$, the union is the multiset $M_1 \cup M_2 = \{(a, 4), (b, 4), (c, *)\}$. Also, as $(M_1 - M_2)(a) = M_1(a) - M_2(a) = 3 - 1 = 2$, $(M_1 - M_2)(b) = M_1(b) - M_2(b) = 0$ and $(M_1 - M_2)(c) = M_1(c) - M_2(c) = *$, the difference is the multiset $M_1 - M_2 = \{(a, 2), (c, *)\}$.
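Example 1 can be replayed with a small Python sketch. It rests on assumptions that the text does not state explicitly: a multiset is encoded as a dictionary from objects to counts, the marker `STAR` stands for $*$, and $*$ is taken to be greater than or equal to every finite count (which is what the inclusion claimed in Example 1 requires). The function names are hypothetical.

```python
STAR = "*"  # stands for "an arbitrary finite number of copies"

def included(m2, m1):
    """True iff M2 is included in M1, i.e. M1(a) >= M2(a) for every object a."""
    def geq(p, q):
        if p == STAR:
            return True    # assumption: * dominates every multiplicity
        if q == STAR:
            return False
        return p >= q
    return all(geq(m1.get(a, 0), m2.get(a, 0)) for a in set(m1) | set(m2))

def union(m1, m2):
    """(M1 u M2)(a) = M1(a) + M2(a), with * absorbing any finite count."""
    out = {}
    for a in set(m1) | set(m2):
        p, q = m1.get(a, 0), m2.get(a, 0)
        out[a] = STAR if STAR in (p, q) else p + q
    return out

def difference(m1, m2):
    """(M1 - M2)(a) = M1(a) - M2(a); defined only if M2 is included in M1."""
    if not included(m2, m1):
        raise ValueError("M2 is not included in M1")
    out = {}
    for a, p in m1.items():
        n = STAR if p == STAR else p - m2.get(a, 0)
        if n != 0:         # objects with zero copies leave the support
            out[a] = n
    return out

M1 = {"a": 3, "b": 2, "c": STAR}
M2 = {"a": 1, "b": 2, "c": 3}
U = union(M1, M2)
D = difference(M1, M2)
```

Dropping zero-count entries in `difference` keeps the dictionary equal to the support-based set notation, so `D` contains only $a$ and $c$, as in the example.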
Obviously, every string $x \in O^*$ describes a multiset $M_x$ over $O$, and we write $M_x = \{(a, |x|_a) \mid a \in \mathrm{alph}(x)\}$. Let us consider the string $x = a_1 a_2 \ldots a_n \in O^*$ with $\psi_{\mathrm{alph}(x)}(x) = (|x|_{a_1}, |x|_{a_2}, \ldots, |x|_{a_n})$, and let $S_n$ be the set of all permutations of $n$ elements. For any permutation $\pi \in S_n$ we define a mapping $f_\pi : O^n \to O^n$ such that $f_\pi(a_1 \ldots a_n) = a_{\pi(1)} \ldots a_{\pi(n)}$, and we denote this string by $y$, $y \in O^*$, with $\psi_{\mathrm{alph}(y)}(y) = (|y|_{a_1}, |y|_{a_2}, \ldots, |y|_{a_n})$ ($y$ is a permutation of $x$). As $\mathrm{alph}(x) = \mathrm{alph}(y)$ (so that $\psi_{\mathrm{alph}(x)}(y) = (|y|_{a_1}, |y|_{a_2}, \ldots, |y|_{a_n})$) and $(|x|_{a_1}, |x|_{a_2}, \ldots, |x|_{a_n}) = (|y|_{a_1}, |y|_{a_2}, \ldots, |y|_{a_n})$ because $y$ is a permutation of $x$, we have $\psi_{\mathrm{alph}(x)}(x) = \psi_{\mathrm{alph}(y)}(y)$. So, if $M_x = \{(a_1, |x|_{a_1}), (a_2, |x|_{a_2}), \ldots, (a_n, |x|_{a_n})\}$ and $M_y = \{(a_{\pi(1)}, |y|_{a_{\pi(1)}}), (a_{\pi(2)}, |y|_{a_{\pi(2)}}), \ldots, (a_{\pi(n)}, |y|_{a_{\pi(n)}})\}$, and $y$ is obtained by a permutation of $x$, then $M_x = M_y$. In other words, all permutations of the same string $x \in O^*$ describe exactly the same objects in the support and the same multiplicities, that is, exactly the same multiset over $O$.
We will represent the multiset $M = \{(a_1, M(a_1)), (a_2, M(a_2)), \ldots, (a_n, M(a_n))\}$, with $\mathrm{alph}(M) = \{a_1, \ldots, a_n\}$, as the string
\[ a_1^{M(a_1)} a_2^{M(a_2)} \ldots a_n^{M(a_n)} . \]
Example 2. Let us consider the string $x \in O^*$, $x = abcaadff$, with $\mathrm{alph}(x) = \{a, b, c, d, f\}$. The Parikh vector associated with $x$ over $\mathrm{alph}(x)$ is $\psi_{\mathrm{alph}(x)}(x) = (|x|_a, |x|_b, |x|_c, |x|_d, |x|_f) = (3, 1, 1, 1, 2)$. If we consider the permutation $\pi \in S_8$,
\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 1 & 2 & 5 & 4 & 8 & 7 & 6 \end{pmatrix}, \]
and the mapping $f_\pi : O^8 \to O^8$, $f_\pi(a_1 \ldots a_8) = a_{\pi(1)} \ldots a_{\pi(8)} = y$, then $f_\pi(abcaadff) = cabaaffd = y$, and the Parikh vector associated with $y$ over $\mathrm{alph}(y) = \mathrm{alph}(x)$ is $\psi_{\mathrm{alph}(y)}(y) = (|y|_a, |y|_b, |y|_c, |y|_d, |y|_f) = (3, 1, 1, 1, 2)$. We have that $\psi_{\mathrm{alph}(x)}(x) = \psi_{\mathrm{alph}(y)}(y)$ whenever $y$ is a permuted string obtained from $x$.
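The permutation argument of Example 2 can be checked mechanically. The following is a minimal sketch, assuming one character per object and a permutation given as a 1-based list of images; `apply_permutation` and `multiset_of` are hypothetical names.

```python
from collections import Counter

def apply_permutation(x, pi):
    """Return f_pi(a_1...a_n) = a_pi(1)...a_pi(n); pi lists pi(1), ..., pi(n)."""
    if sorted(pi) != list(range(1, len(x) + 1)):
        raise ValueError("pi is not a permutation of 1..n")
    return "".join(x[p - 1] for p in pi)

def multiset_of(x):
    """Return M_x as a dictionary {a: |x|_a} over alph(x)."""
    return dict(Counter(x))

x = "abcaadff"
pi = [3, 1, 2, 5, 4, 8, 7, 6]   # second row of the permutation in Example 2
y = apply_permutation(x, pi)
same_multiset = multiset_of(x) == multiset_of(y)
```

Because dictionary equality ignores the order of the entries, the comparison expresses exactly the claim $M_x = M_y$.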
The notion of a multiset may be extended to strings. A multiset over $O^*$ is a mapping $M : O^* \to \mathbb{N} \cup \{*\}$, defined by
\[ M(x) = \begin{cases} |M|_x, & \text{if } x \in O^* \text{ and } |M|_x \text{ is the number of copies of the string } x \text{ in the multiset } M, \\ *, & \text{if there is an arbitrary finite number of copies of the string } x \text{ in the multiset } M. \end{cases} \]

We denote by
\[ \mathrm{alph}(M) = \{ x \in O^* \mid \exists a_i \in \mathrm{alph}(x),\ i = 1, \ldots, n,\ n \in \mathbb{N},\ \mathrm{alph}(x) \subseteq O, \text{ such that } M(a_i) > 0 \text{ and } \psi_{\mathrm{alph}(x)}(x) = (|x|_{a_1}, \ldots, |x|_{a_n}) \} \]
the finite alphabet of the multiset $M$ (one may say the support of $M$). Generally, a subset $S \subseteq O^*$ is interpreted as the multiset defined by
\[ S(x) = 1 \ \text{ for all } x \in S, \qquad S(x) = 0 \ \text{ for all } x \notin S \]
(any subset of $O^*$ is a multiset in which each string object is found in a unique copy). The strings $x \in \mathrm{alph}(M)$ are the legal strings of chemicals in solution needed in the formation of well-formed macro-molecules. A multiset $M$ of finite support can be represented either by a set $M = \{(x, M(x)) \mid x \in \mathrm{alph}(M)\}$ or as a string of the form
\[ x_1^{M(x_1)} x_2^{M(x_2)} \ldots x_n^{M(x_n)}, \]
for $\mathrm{alph}(M) = \{x_1, \ldots, x_n\}$ and the multiplicities $M(x_1), M(x_2), \ldots, M(x_n)$ of the objects in the support. All permutations of this string identify the same objects in the support (alphabet) of $M$ and the same multiplicities. For two multisets $M_1$ and $M_2$, inclusion, union and difference are defined as above.
Example 3. The string $(ab)^3 (abb)^2 (aa)^*$ describes the multiset of string objects containing three copies of $ab$, two copies of $abb$ and an arbitrary number of copies of $aa$. Here, $\mathrm{alph}(M) = \{ab, abb, aa\} \cup \{(, )\}$, where the parentheses are used for a better representation of the string molecules in $M$.
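The string representation of Example 3 can be sketched as a pair of conversion functions. The textual form `(x)^n` used here (with `^` marking the exponent) is an assumption made for plain-text encoding, and `render` and `parse` are hypothetical helpers.

```python
import re

STAR = "*"  # stands for "an arbitrary finite number of copies"

def render(multiset):
    """Write {x: M(x)} as the string (x1)^M(x1)(x2)^M(x2)..."""
    return "".join(f"({x})^{n}" for x, n in multiset.items())

def parse(text):
    """Read a string of the form (x1)^n1(x2)^n2... back into a dictionary."""
    out = {}
    for x, n in re.findall(r"\(([^()]+)\)\^(\d+|\*)", text):
        out[x] = STAR if n == "*" else int(n)
    return out

M = {"ab": 3, "abb": 2, "aa": STAR}
text = render(M)
reordered = parse("(aa)^*(ab)^3(abb)^2")   # a permutation of the factors
```

Parsing a reordered string yields the same dictionary, which reflects the remark that all permutations of the representing string identify the same multiset.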
2 A formalization of primary structures of object ("organic") compounds
We construct here a finite set of rules for the formation of molecular compounds (molecules, macro-molecules, organic compounds or organic complexes). We consider the sets of string objects over $O^*$ recursively defined as follows: if, for all $k, i \in \{1, \ldots, n\}$, we denote by $M_k(a_i)$ the multiplicity of $a_i$ in $M_k$ for all objects $a_i \in \mathrm{alph}(M_k)$ with $\mathrm{alph}(M_k) \subseteq O^*$, then
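– $M_1 = \left\{ a_i^{M_1(a_i)} \mid \forall i \in \{1, \ldots, n\},\ a_i \in \mathrm{alph}(M_1),\ \mathrm{alph}(M_1) = \{a_1, \ldots, a_n\} \right\}$
– $M_2 = \left\{ a_{i_1}^{M_2(a_{i_1})} a_{i_2}^{M_2(a_{i_2})} \mid \forall i_1, i_2 \in \{1, \ldots, n\},\ i_1 \neq i_2,\ a_{i_1}, a_{i_2} \in \mathrm{alph}(M_2),\ \mathrm{alph}(M_2) = \{a_1 a_2, a_1 a_3, \ldots, a_n a_{n-1}\} \right\}$
...
– $M_n = \left\{ a_1^{M_n(a_1)} a_2^{M_n(a_2)} \ldots a_n^{M_n(a_n)} \mid \forall i \in \{1, \ldots, n\},\ a_i \in \mathrm{alph}(M_n),\ \mathrm{alph}(M_n) = \{a_1 a_2 \ldots a_n\} \right\}$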
One may say that, for all $k \in \{1, \ldots, n\}$, we define
\[ M_k = \left\{ a_{i_1}^{M_k(a_{i_1})} a_{i_2}^{M_k(a_{i_2})} \ldots a_{i_k}^{M_k(a_{i_k})} \mid \forall i_1, \ldots, i_k \in \{1, \ldots, n\},\ a_{i_1}, \ldots, a_{i_k} \in \mathrm{alph}(M_k) \text{ and } \forall r_1, r_2 \in \{1, \ldots, n\} \text{ with } r_1 \neq r_2 \text{ we have } i_{r_1} \neq i_{r_2} \right\}. \]
If we denote $[M] = \{M_k \mid k \in \{1, \ldots, n\},\ n \in \mathbb{N},\ n \geq 1 \text{ (finite)},\ M_k \text{ defined as above}\}$ (or, perhaps more correctly, $[M] = \{x_k \in M_k \mid k \in \{1, \ldots, n\},\ n \in \mathbb{N},\ n \geq 1 \text{ (finite)},\ M_k \text{ defined as above}\}$), then $[M]^*$ is the set of all string objects formed with elements of $M_k$, for $k \in \{1, \ldots, n\}$, under the operation of concatenation.
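Over $[M]^*$ we define a rewriting relation, denoted "$\Rightarrow$" ($\Rightarrow \subseteq [M]^* \times [M]^*$), on pairs $(x_1, x_2) \in M_{k_1} \times M_{k_2}$, for any $k_1, k_2 \in \{1, \ldots, n\}$ with $k_1 \geq k_2$ and $M_{k_1}, M_{k_2} \subseteq [M]^*$. We say that the string $x_1$ is rewritten into the string $x_2$, and we write $x_1 \Rightarrow x_2$, iff the following holds: if the string $x_1 \in M_{k_1}$ has the form
\[ x_1 = a_{i_1}^{M_{k_1}(a_{i_1})} a_{i_2}^{M_{k_1}(a_{i_2})} \ldots a_{i_{r_1}}^{M_{k_1}(a_{i_{r_1}})} \ldots a_{i_{r_p}}^{M_{k_1}(a_{i_{r_p}})} \ldots a_{i_{k_1}}^{M_{k_1}(a_{i_{k_1}})} \]
such that
– there are $r_1, r_2, \ldots, r_p \in \{1, \ldots, k_1\}$, $1 \leq p \leq k_1$, with $i_{r_1} = i_{r_2} = \ldots = i_{r_p}$, a common value that we denote by $i_r$
– for all $j \in \{1, \ldots, k_1\}$ with $j \neq r_l$ for all $l \in \{1, \ldots, p\}$ we have $i_j \neq i_{r_l}$
then the string $x_2 \in M_{k_2}$ has the form
\[ x_2 = a_{i_1}^{M_{k_2}(a_{i_1})} \ldots a_{i_r}^{M_{k_2}(a_{i_r})} \ldots a_{i_{k_2}}^{M_{k_2}(a_{i_{k_2}})} \]
where
– $M_{k_2}(a_{i_r}) = M_{k_1}(a_{i_{r_1}}) + M_{k_1}(a_{i_{r_2}}) + \ldots + M_{k_1}(a_{i_{r_p}})$
– if, for all $j_1 \in \{1, \ldots, k_1\}$ with $j_1 \neq r_l$ for all $l \in \{1, \ldots, p\}$ and for all $j_2 \in \{1, \ldots, k_2\}$ with $j_2 \neq r$, we have $i_{j_1} = i_{j_2}$, then $M_{k_1}(a_{i_{j_1}}) = M_{k_2}(a_{i_{j_2}})$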
We denote by $[M]^*_{\Rightarrow}$ the set $\{x_2 \in M_{k_2} \mid \text{there is } (x_1, x_2) \in M_{k_1} \times M_{k_2}, \text{ for } k_1 \geq k_2 \text{ and } M_{k_1}, M_{k_2} \subseteq [M]^*, \text{ such that } x_1 \Rightarrow x_2\}$. The set $[M]^*_{\Rightarrow}$ is called the Compounds Domain and, to simplify the writing, from now on we will refer to it by the notation $CD$.
Example 4. To illustrate the above, let us take two strings $a_1^{M_1(a_1)}, a_2^{M_1(a_2)} \in M_1$. Under concatenation, a new string $a_1^{M_1(a_1)} a_2^{M_1(a_2)}$ is formed in $M_2$. Another example: taking the string objects $a_1^{M_1(a_1)} \in M_1$ and $a_1^{M_2(a_1)} a_2^{M_2(a_2)} \in M_2$, the newly formed string $a_1^{M_1(a_1)} a_1^{M_2(a_1)} a_2^{M_2(a_2)}$ from $M_3$ can be rewritten into $a_1^{M_1(a_1) + M_2(a_1)} a_2^{M_2(a_2)}$ in $M_2$.
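The second case of Example 4 can be sketched in Python as follows. The encoding is an assumption: a string object is a list of `(symbol, exponent)` pairs, the merged factor is placed at the position of the first occurrence of the repeated symbol (the text does not fix this), and $*$ is treated as absorbing under addition by analogy with the multiset union. The concrete exponents 2, 3 and 1 are illustrative values, not taken from the text.

```python
STAR = "*"  # stands for "an arbitrary finite number of copies"

def add(p, q):
    """Add two exponents; * absorbs any finite value (assumption)."""
    return STAR if STAR in (p, q) else p + q

def concatenate(x1, x2):
    """Concatenate two string objects given as lists of (symbol, exponent)."""
    return x1 + x2

def rewrite(x):
    """Merge all factors carrying the same symbol by summing their exponents."""
    merged = {}
    for symbol, exponent in x:
        if symbol in merged:
            merged[symbol] = add(merged[symbol], exponent)
        else:
            merged[symbol] = exponent
    # Dictionaries keep insertion order, so the first occurrence fixes the place.
    return list(merged.items())

x_from_M1 = [("a1", 2)]               # a1^{M1(a1)} with M1(a1) = 2 (illustrative)
x_from_M2 = [("a1", 3), ("a2", 1)]    # a1^{M2(a1)} a2^{M2(a2)} (illustrative)
x1 = concatenate(x_from_M1, x_from_M2)
x2 = rewrite(x1)
```

Here `x1` has three factors and `x2` has two, matching the passage from $M_3$ to $M_2$ with the exponent of $a_1$ equal to $M_1(a_1) + M_2(a_1)$.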
In what follows, we consider $P_1$, a finite set of rules in $M_1 \times \ldots \times M_1 \times CD$.
2.1 Molecular formula production rules
Production rules of molecular formulas are elements of the form $(x, y) \in P_1$ with $x \in M_1 \times \ldots \times M_1$ (the product taken $m$ times, if the molecular formula we want to form requires $m$ objects, each of them representing chemicals in solution) and $y \in CD$. The rules have the form
\[ M_1 / x \to M_1^{new} / y \]
where:
– if $M_1 = \left\{ a_1^{M_1(a_1)}, \ldots, a_{i_1}^{M_1(a_{i_1})}, \ldots, a_{i_2}^{M_1(a_{i_2})}, \ldots, a_{i_m}^{M_1(a_{i_m})}, \ldots, a_n^{M_1(a_n)} \mid \forall k \in \{1, \ldots, n\},\ a_k \in \mathrm{alph}(M_1),\ \mathrm{alph}(M_1) = \{a_1, \ldots, a_n\} \right\}$ and $x = (a_{i_1}^{j_1}, a_{i_2}^{j_2}, \ldots, a_{i_m}^{j_m})$ for $j_k \in \mathbb{N}$, $0 \leq j_k \leq M_1(a_{i_k})$, then $y = a_{i_1}^{j_1} a_{i_2}^{j_2} \ldots a_{i_m}^{j_m}$ and
– $M_1^{new} = \left\{ a_1^{M_1(a_1)}, \ldots, a_{i_1}^{M_1(a_{i_1}) - j_1}, \ldots, a_{i_2}^{M_1(a_{i_2}) - j_2}, \ldots, a_{i_m}^{M_1(a_{i_m}) - j_m}, \ldots, a_n^{M_1(a_n)} \right\}$
If there is $a_k \in M_1$ such that $M_1(a_k) = *$, then $M_1(a_k) - j_k = *$.
We call such elements $y \in CD$ well-formed molecular formulas.

2.2 Molecular formula decomposition rules

Decomposition rules of molecular formulas are elements of the form $(y, x) \in P_1$ with $x \in M_1 \times \ldots \times M_1$ (the product taken $m$ times, if applying the rule decomposes the molecular formula into $m$ objects, each of them representing chemicals in solution) and $y \in CD$. The rules have the form
\[ M_1 / y \to M_1^{new} / x \]
where:
– if $M_1 = \left\{ a_1^{M_1(a_1)}, \ldots, a_{i_1}^{M_1(a_{i_1})}, \ldots, a_{i_2}^{M_1(a_{i_2})}, \ldots, a_{i_m}^{M_1(a_{i_m})}, \ldots, a_n^{M_1(a_n)} \mid \forall k \in \{1, \ldots, n\},\ a_k \in \mathrm{alph}(M_1),\ \mathrm{alph}(M_1) = \{a_1, \ldots, a_n\} \right\}$ and $y = a_{i_1}^{j_1} a_{i_2}^{j_2} \ldots a_{i_m}^{j_m}$, then $x = (a_{i_1}^{j_1}, a_{i_2}^{j_2}, \ldots, a_{i_m}^{j_m})$ for $j_k \in \mathbb{N}$, $0 \leq j_k \leq M_1(a_{i_k})$, and
– $M_1^{new} = \left\{ a_1^{M_1(a_1)}, \ldots, a_{i_1}^{M_1(a_{i_1}) + j_1}, \ldots, a_{i_2}^{M_1(a_{i_2}) + j_2}, \ldots, a_{i_m}^{M_1(a_{i_m}) + j_m}, \ldots, a_n^{M_1(a_n)} \right\}$
If there is $a_k \in M_1$ such that $M_1(a_k) = *$, then $M_1(a_k) + j_k = *$.
Example 5. If we consider $\mathrm{alph}(M_1) = \{a, b, c\}$ and $M_1 = \{a^3, b^2, c^*\}$, then the molecule $a^3 c^5$ is obtained by applying the rule $M_1 / (a^3, c^5) \to M_1^{new} / a^3 c^5$ with $M_1^{new} = \{b^2, c^*\}$. At the decomposition of the molecular formula $a^3 c^5$, if we suppose that $M_1$ has its initial content, the rule $M_1 / a^3 c^5 \to M_1^{new} / (a^3, c^5)$ is applied and the content of $M_1^{new}$ is $M_1^{new} = \{a^6, b^2, c^*\}$.
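Both rule types can be replayed on Example 5 with a short Python sketch. The encoding is assumed, not given in the text: $M_1$ is a dictionary from objects to multiplicities, the tuple $x$ and the formula $y$ are both tuples of `(symbol, exponent)` pairs with distinct symbols, and `produce` and `decompose` are hypothetical names.

```python
STAR = "*"  # stands for "an arbitrary finite number of copies"

def produce(m1, x):
    """Apply M1/x -> M1new/y: consume j copies of each listed object."""
    new = dict(m1)
    for symbol, j in x:
        have = new.get(symbol, 0)
        if j < 0:
            raise ValueError("exponents must be non-negative")
        if have == STAR:
            continue                  # * - j = *
        if j > have:
            raise ValueError(f"not enough copies of {symbol}")
        new[symbol] = have - j
    # Objects left with zero copies drop out of the multiset.
    new = {s: n for s, n in new.items() if n != 0}
    return new, tuple(x)              # y keeps the factors of x, in order

def decompose(m1, y):
    """Apply M1/y -> M1new/x: release the j copies of each factor of y."""
    new = dict(m1)
    for symbol, j in y:
        have = new.get(symbol, 0)
        new[symbol] = STAR if have == STAR else have + j   # * + j = *
    return new, tuple(y)

M1 = {"a": 3, "b": 2, "c": STAR}
M1_after_production, y = produce(M1, (("a", 3), ("c", 5)))
M1_after_decomposition, x = decompose(M1, y)
```

As in the example, the decomposition is applied to the initial content of `M1`, which is why the count of `a` becomes 6 rather than returning to 3.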
2.3 Primary structures of object (”organic”) compounds

If, instead of $x \in M_1 \times \ldots \times M_1$, we use elements $x \in M_{k_1} \times \ldots \times M_{k_n}$ for any (arbitrary or specific) choice of $k_1, \ldots, k_n \in \{1, \ldots, n\}$, then we get well-formed molecular formulas for any molecules, macro-molecules, organic compounds or organic complexes. The set of all compounds obtained in the ways described above is called the set (language) of primary structures of object compounds. We have thus illustrated that the primary structure of an object ("organic") compound is represented by its molecular formula. The set (the language) of all primary structures of object compounds (object complexes) is denoted by
\[ S_I = \{ y \in CD \mid \text{there is } x \in M_{k_1} \times \ldots \times M_{k_n} \text{ for any } k_1, \ldots, k_n \in \{1, \ldots, n\} \}. \]
3 A formalization of structural formulas of object

$P_2$ is a set of rules that attach a structural arrangement to the primary structure of a compound. This structural arrangement is connected with the way that objects bind together in the formation of molecules. The need for such a structural formula comes from the physico-chemical properties of a real organic compound, which are determined by the way atoms (different groups of atoms) bind together in forming organic compounds or complexes. In order to model this phenomenon we will consider two more symbols over $CD$: "|" and "—". The symbol "|" will be used to illustrate the attachment of the structural arrangement to the primary structure of a compound, while "—" will illustrate the bonds between atoms (different groups of atoms) in the formation of molecules (compounds/complexes). We observe here that, although in biology and biochemistry there are different types of chemical bonds, it is not our goal to represent reality accurately, but to construct a mathematical model of a "molecule, compound or complex". This is the reason why the symbol "—" will represent one type of chemical bond (or all types of chemical bonds, if one prefers) between atoms (between different groups of atoms).
The rule of structural arrangement of an object molecule/compound/complex is denoted by $\vdash_{S_I}$ and associates a distribution of atoms (different groups of atoms) in molecules to each molecular formula: for $y \in S_I$ of the form $y = a_1^{j_1} \ldots a_m^{j_m}$ we have $y \vdash_{S_I} z$ iff
– $z = a_1^{j_{11}} \ldots a_m^{j_{m1}} - a_1^{j_{12}} \ldots a_m^{j_{m2}} - \ldots - a_1^{j_{1p}} \ldots a_m^{j_{mp}}$, for $1 \leq p \leq m$, $p \in \mathbb{N}$, and for all $l \in \{1, \ldots, m\}$ we have $\sum_{k=1}^{p} j_{lk} = j_l$ with $0 \leq j_{lk} \leq j_l$
• $p$ represents the number of atoms/groups of atoms/groups of compounds that $y$ is formed of
• If $j_{lk} = 0$, then the corresponding group will not contain the element $a_l^{j_{lk}}$, for $k \in \{1, \ldots, p\}$ and $l \in \{1, \ldots, m\}$, $k, l \in \mathbb{N}$
• Each $a_1^{j_{1k}} \ldots a_m^{j_{mk}}$ with $k \in \{1, \ldots, p\}$ is an element of $CD$
– We denote by $Nom$ a complete set of all legally existing objects, string objects or string object compounds coming from the domain where the application is needed. We need this set to validate the well-formed structural formulas. Let $ver : CD \cup \{-\} \to \{0, 1\}$ be a mapping such that
\[ ver(z) = \begin{cases} 1, & \text{if } z \text{ exists in } Nom \\ 0, & \text{otherwise} \end{cases} \]
If $ver(z) = 1$ then, by notation, $z = z$ and $z$ is called a well-formed structural formula.
The attachment of the structural formula of an object molecule (string object compound) to its molecular formula defines the set (the language) of all secondary structures of object compounds (object complexes), and it is denoted by
\[ S_{II} = \{ (y|z) \mid y \in S_I \text{ and } y \vdash_{S_I} z \}. \]
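The conditions on a structural arrangement can be checked with a small Python sketch. Everything concrete here is assumed for illustration: a molecular formula $y$ is a dictionary from objects to exponents $j_l$, an arrangement $z$ is a list of $p$ such dictionaries (one per group), the textual form with `^` and `-` is an ad hoc encoding, and the contents of `nom` are hypothetical, since the text leaves $Nom$ to the application domain.

```python
def is_arrangement(y, z):
    """Check 1 <= p <= m and, for each object l, that sum_k j_lk == j_l."""
    m, p = len(y), len(z)
    if not 1 <= p <= m:
        return False
    for group in z:
        # Every group may only use objects of y, with non-negative exponents.
        if any(symbol not in y or j < 0 for symbol, j in group.items()):
            return False
    return all(sum(g.get(symbol, 0) for g in z) == j for symbol, j in y.items())

def render(z):
    """Write the groups separated by '-', omitting factors with exponent 0."""
    return "-".join(
        "".join(f"{s}^{j}" for s, j in group.items() if j > 0) for group in z
    )

def ver(z_text, nom):
    """Return 1 if the structural formula exists in Nom, and 0 otherwise."""
    return 1 if z_text in nom else 0

y = {"a": 3, "c": 5}                   # the molecular formula a^3 c^5
z = [{"a": 1, "c": 5}, {"a": 2}]       # two groups, so p = 2 <= m = 2
nom = {"a^1c^5-a^2"}                   # hypothetical content of Nom
z_text = render(z)
secondary = (render([y]), z_text) if is_arrangement(y, z) and ver(z_text, nom) else None
```

The bound $0 \leq j_{lk} \leq j_l$ does not need a separate test: it follows from the non-negativity check together with the sum condition.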
4 A formalization of spatial configurations of object

We will use a set $P_3$ of rules that attach a spatial orientation (of the atoms/groups of atoms in its constitution) to a compound from $S_{II}$. The universal alphabet of all possible spatial configurations of objects is $\Gamma$. As our work has its inspiration in biology, where the number of spatial arrangements of atoms/groups of atoms in a real molecule is finite (for example, there are only a few possible protein spatial conformations), the term universal alphabet is used here to refer to all three-dimensional structures of all types of molecules and object compounds. So $\Gamma$ is the set of all three-dimensional structures of molecules, that is, the spatial arrangements of the secondary structures of molecules in $S_{II}$, and we will refer to it as the set of all spatial configurations.
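Let $\Gamma = \{\Gamma_1, \Gamma_2, \ldots, \Gamma_q\}$, where $q \in \mathbb{N}$, $q$ fixed. $\Gamma^*$ is the set of all possible strings of configurations under the operation of concatenation. A rule in $P_3$ has the form $(u, v) \in P_3$ with $P_3 \subseteq S_{II} \times \Gamma^*$, and we denote it by the symbol $\models_{S_{II}}$. We write that
\[ \text{for all } u \in S_{II} \text{ there is } v \in \Gamma^* \text{ such that } u \models_{S_{II}} (u|v). \]
So, we have
\[ \text{for all } u \in S_{II},\ u = (y|z), \text{ there is } v \in \Gamma^* \text{ such that } (y|z) \models_{S_{II}} (y|z|v). \]
We define, in this way,
\[ S_{III} = \{ (y|z|v) \mid \text{for all } y \in S_I,\ y \vdash_{S_I} z, \text{ there is } v \in \Gamma^* \text{ such that } (y|z) \models_{S_{II}} (y|z|v) \}. \]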
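The three levels can be pictured as nested records, each one extending the previous with one more component. The sketch below is only an illustration: the class and function names are hypothetical, formulas are kept as plain strings, and the labels `G1`, `G2`, `G3` stand in for the configurations $\Gamma_1, \Gamma_2, \Gamma_3$.

```python
from typing import NamedTuple, Tuple

class Secondary(NamedTuple):
    """An element (y|z) of S_II: molecular formula plus structural formula."""
    y: str
    z: str

class Tertiary(NamedTuple):
    """An element (y|z|v) of S_III: a secondary structure plus a string over Gamma."""
    y: str
    z: str
    v: Tuple[str, ...]

def attach_configuration(u, v, gamma):
    """Build (y|z|v) from u = (y|z), checking that v is a string over Gamma."""
    if any(c not in gamma for c in v):
        raise ValueError("v must be a string of configurations from Gamma")
    return Tertiary(u.y, u.z, tuple(v))

def show(t):
    """Write a tertiary structure in the (y|z|v) notation."""
    return f"({t.y}|{t.z}|{''.join(t.v)})"

gamma = {"G1", "G2", "G3"}                 # hypothetical labels for Gamma_1..Gamma_3
u = Secondary("a^3c^5", "a^1c^5-a^2")      # illustrative element of S_II
t = attach_configuration(u, ["G2", "G1"], gamma)
text = show(t)
```

Whether a particular pair $(u, v)$ actually belongs to $P_3$ is not decided by this sketch; it only checks that $v$ is built from symbols of $\Gamma$.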
This hierarchy of construction of the complex language of object molecules (compounds or complexes) can be graphically represented in the figure below.
