Eythan Levy
École supérieure d'informatique (HEB ESI), Bruxelles, Belgium,
Centre de Recherches en Archéologie et Patrimoine, Université libre
de Bruxelles, Bruxelles, Belgium
Frédéric Pluquet
École supérieure d'informatique (HEB ESI), Bruxelles, Belgium
Correspondence: Eythan Levy, École supérieure d'informatique (HEB ESI),
rue royale 67, 1000 Bruxelles, Belgium.
E-mail: elevy@heb.be
This article presents a new theoretical framework for computer-assisted decipherment of ancient alphabetic inscriptions. This framework is based on regular
expressions, a widely used computer science formalism for encoding text strings
with partially unknown characters. We then present a new software tool called SCRYPT,
which applies our framework to the Khirbet Qeiyafa ostracon, an important
Proto-Canaanite inscription recently discovered in Israel, as a first case study.
Several new anthroponymic readings for the Qeiyafa ostracon, found with the
help of our software, are presented as part of that case study. The software, freely
available online (www.ScryptApp.com), enables users to encode all possible readings for a given grapheme in the ostracon and provides fast automated dictionary
searches for lexemes.
1 Introduction
The Khirbet Qeiyafa ostracon1 (Fig. 1) is an
inscribed pottery sherd discovered in 2008 in
Khirbet Qeiyafa, 27 km southwest of Jerusalem, in
an archaeological excavation led by the Hebrew
University of Jerusalem and the Israel Antiquities
Authority, under the direction of Yosef Garfinkel
and Saar Ganor (Garfinkel and Ganor, 2009). The
inscription, now on display in the Israel Museum,
Jerusalem, is dated to the 10th century BCE, and is
considered the longest existing inscription in the
Proto-Canaanite script, a typical, but rare, Late
Bronze and early Iron Age script that predates the
well-known Hebrew and Phoenician Iron Age
scripts. The ostracon is composed of five lines of
text, and is written from left to right.2 As the
upper line is partially broken, it is unknown whether
Digital Scholarship in the Humanities © The Author 2016. Published by Oxford University Press on behalf of EADH.
All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/llc/fqw028
Abstract
2 Methods
2.1 Regular expressions
epigraphic community, and a more cautious approach, as recently expressed by Rollston (2011),
seems to be called for. The difficulties in interpreting
the inscription are certainly due to its very damaged
state, the unusual shape of some of its letters, and the
scarcity of our knowledge of the Proto-Canaanite
script. Furthermore, the lack of division marks between words, in most parts of the ostracon, poses
an additional difficulty for identifying lexemes. Since
the existing readings of the ostracon vary considerably, but at the same time agree on the identification
of many letters, it seemed natural to us to
examine the ostracon using a computer-assisted approach, by encoding its inscription as a regular expression, a widely used computer science formalism
for encoding text strings with partially unknown characters. The first part of this article will show how
regular expressions indeed offer an efficient theoretical framework for representing partial epigraphic
knowledge in alphabetic inscriptions of any script
and language. The second part will present a software tool
that implements our theoretical framework and
applies it to the Qeiyafa ostracon, resulting in the
proposal of several new readings for the inscription.
This software, called SCRYPT, enables users to encode
all possible readings for a given grapheme in the ostracon and provides fast automated dictionary
searches for lexemes using the Brown-Driver-Briggs
2.2 Methodology
Our methodology is based on dividing an inscription into discrete cells containing at most one letter,
then launching automated dictionary searches for
words in contiguous cells. These steps are detailed
below.
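The two steps above can be illustrated by a minimal sketch, which is not the SCRYPT implementation (SCRYPT is written in JavaScript). The function names `cells_to_pattern` and `search`, the Latin letters, and the toy word list are illustrative assumptions. Note also that in Python the dot matches any character except a newline, which is equivalent to "any letter" only as long as the lexicon contains alphabet letters alone.

```python
import re

def cells_to_pattern(cells):
    """Concatenate per-cell expressions into one compiled pattern.

    Each cell is wrapped in a non-capturing group so that a union such
    as 'c|d' stays local to its own cell. An empty string stands for
    a cell holding no letter.
    """
    return re.compile("".join(f"(?:{cell})" for cell in cells))

def search(cells, lexicon):
    """Return the lexicon entries that match the whole cell sequence."""
    pattern = cells_to_pattern(cells)
    return [word for word in lexicon if pattern.fullmatch(word)]

# Hypothetical toy data: three contiguous cells and a five-word lexicon.
cells = ["a", "c|d", "e"]
lexicon = ["ace", "ade", "abe", "ac", "aced"]
print(search(cells, lexicon))  # ['ace', 'ade']
```

Using `fullmatch` rather than `search` reflects the fact that a candidate word must fill the selected cells exactly, with no letters left over on either side.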
2.2.1 Cell encoding
A cell can contain either a clearly readable letter, a
partially readable letter, or no letter at all. A regular
expression is associated with each cell, defining its possible readings. Each expression takes one of the
following forms:
ε : no letter
a : a clearly identified letter a in the given alphabet
a|b|c : one letter among a limited set of possible readings
. : one unidentified letter
.? : one or no letter
(a|b|c)? : one letter among a limited set of possible readings, or no letter
Cell 1   Cell 2   Cell 3   Readings
a        c|d      e        ace, ade
a        b        .        aba, abb, abc, abd, abe, ..., abz
a        b        .?       aba, abb, abc, abd, abe, ..., abz, ab
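The readings listed for the three example rows can be reproduced mechanically. The following sketch expands each cell expression into its set of options and takes the Cartesian product over the cells. It assumes a 26-letter Latin alphabet and handles only the cell forms listed above; the names `cell_options` and `readings` are hypothetical.

```python
from itertools import product
import string

ALPHABET = string.ascii_lowercase  # assumption: 26-letter Latin alphabet

def cell_options(expr):
    """Expand one cell expression into the list of strings it allows."""
    optional = expr.endswith("?")
    core = expr[:-1] if optional else expr
    core = core.strip("()")
    if core == "":
        options = [""]              # no letter
    elif core == ".":
        options = list(ALPHABET)    # one unidentified letter
    else:
        options = core.split("|")   # one letter, or a union of letters
    if optional and "" not in options:
        options.append("")          # '?' adds the 'no letter' option
    return options

def readings(cells):
    """Enumerate every reading allowed by a sequence of cell expressions."""
    return ["".join(choice)
            for choice in product(*(cell_options(c) for c in cells))]

print(readings(["a", "c|d", "e"]))       # ['ace', 'ade']
print(len(readings(["a", "b", "."])))    # 26
print(len(readings(["a", "b", ".?"])))   # 27
```

The number of readings is the product of the number of options per cell, so it grows quickly: four cells encoded as `.`, `a|b`, `c|d`, and `.` already allow 26 × 2 × 2 × 26 = 2,704 readings.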
The inductive nature of the above definition enables the construction of more complex regular expressions than in the given examples. In such cases,
concatenation normally takes precedence over
union, and parentheses can be used to impose the
order of application of the operators. For example,
(a·b)|c denotes the set {ab, c}, while a·(b|c) denotes
the set {ab, ac}.
Regular expressions are used in many computing
languages, under various syntaxes, sometimes different from the formal one given above. These syntaxes
often include notational shortcuts, such as . (the dot
operator, not to be confused with the concatenation
operator), denoting any letter of the alphabet
(hence, if Σ is the Latin alphabet, . is a shortcut
for a|b|c|d|...|z), and, for any regular expression X,
the notation X? (the question mark operator), denoting X|ε (any string in X or the empty string). This
article will make use of all the above-mentioned
operators (including . and ?), with the exception of
the Kleene closure.5
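The precedence rule and the two shortcuts can be checked with any practical regular expression engine. The sketch below uses Python's standard `re` module, in which concatenation is written by simple juxtaposition (there is no explicit concatenation operator); the helper name `language` and the candidate list are illustrative assumptions.

```python
import re

def language(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return {s for s in candidates if re.fullmatch(pattern, s)}

candidates = ["", "a", "b", "c", "ab", "ac", "abc"]

union_last = language(r"ab|c", candidates)     # read as (ab)|c
concat_last = language(r"a(b|c)", candidates)  # parentheses force the union first
optional = language(r"ab?", candidates)        # a followed by b, or by nothing
any_letter = language(r"a.", candidates)       # a followed by any one character

print(sorted(union_last))   # ['ab', 'c']
print(sorted(concat_last))  # ['ab', 'ac']
print(sorted(optional))     # ['a', 'ab']
print(sorted(any_letter))   # ['ab', 'ac']
```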
The software presented in Section 3 provides pre-encoded default expressions for each cell, but still
allows the user to change the encoding of a cell at
any time. This section describes these default expressions and the methodology used to obtain
them. The default expressions are meant as a starting point for users of the software, and are naturally
destined to be modified and refined by them during
the computer-assisted decipherment process.
Fig. 3 Determination of default expressions, with ε standing for the empty string (see Section 2.1) and - standing for
a word divider
3.2 Internals
The SCRYPT software is implemented as a web application using the latest web standards and technologies, such as HTML5, CSS3, and JavaScript. The
application is entirely client-based: all computations
and dictionary searches are performed on the client
side19 and implemented in JavaScript. The
jQuery 2.0.3 library20 has been used to facilitate
manipulation of the Document Object Model. The
KineticJS21 library has been used to create the cells
of the ostracon, to highlight them, and to handle
mouse events related to these cells. The image processing tool for the ostracon has been implemented
using the glfx.js library.22 The JavaScript source code
has been designed using the Model-View-Controller
design pattern, thus separating domain objects from
the view and enhancing modularity. The code is
hence modular enough to be easily adaptable to other inscriptions,23 dictionaries, languages,
Fig. 7 Launching a dictionary search for the first three cells of line 2
Fig. 10 The magnifying glass mode (magnifying here the first cell)
Cells 4–7. We adapt the default regular expression in the following way: cell 6 is clearly too small
to hold a letter (pace Galil), hence we set its reading
to 'Nothing'. We also add another possible identification to cell 7, namely the letter y, which has apparently been overlooked by our four base authors,
although the visible elements of the letter clearly
match the y of some other Proto-Canaanite inscriptions such as the Lachish ewer and the Izbet Sartah
ostracon. We now need to determine the length of
the word. SCRYPT finds no matches for any region
longer than cells 4–9 (starting in cell 4). For cells 4–9,
interesting theophoric names are found, namely
Abiel and Shebuel, both of which need to be discarded however (the first one because Abiel would
be written defectively as ʾbʾl rather than ʾbyʾl in
Proto-Canaanite orthography, the second because
the traces of a vertical stroke35 on the right side of
cell 4 do not seem to match a letter š). For cells 4–8,
only one name is found, namely Tsibya (ṣbyʾ), but it is
also discarded because no suitable match is then
found for the sequel, beginning in cell 9.36 Finally,
for cells 4–7, nine matches are found: two are
false positives,37 and five are unlikely candidates.38
We retain the two remaining ones, Abyah39 (ʾbyh)
and Ribay40 (ryby), as likely candidates.
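The word-length reasoning used here (fix a starting cell, then look for matches over regions of increasing length) can be sketched as follows. This is an illustration only, with toy Latin letters and a hypothetical name list rather than the actual encoding of the ostracon or the dictionary used by SCRYPT; the function name `matches_by_region` is likewise an assumption.

```python
import re

def matches_by_region(cells, start, lexicon):
    """Map each region end index to the names matching cells[start..end].

    `cells` is a list of per-cell expressions and `start` a 0-based
    index. Regions without any match are left out of the result.
    """
    found = {}
    for end in range(start, len(cells)):
        pattern = "".join(f"(?:{c})" for c in cells[start:end + 1])
        hits = [name for name in lexicon if re.fullmatch(pattern, name)]
        if hits:
            found[end] = hits
    return found

# Hypothetical toy data, not taken from the inscription.
cells = ["a", "b", "c|d", ".?"]
names = ["ab", "abc", "abde", "abx"]
found = matches_by_region(cells, 0, names)
print(found)       # {1: ['ab'], 2: ['abc'], 3: ['abc', 'abde']}
print(max(found))  # 3, the end index of the longest matching region
```

Knowing the longest region that still yields a match bounds the search from above; shorter regions are then examined in turn, as in the paragraph above.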
Cells 8–10. Starting in cell 8, SCRYPT finds no
anthroponymic matches covering more than three
cells, which orients us toward the search for a
name covering cells 8–10. Only one anthroponym
is found, namely Ulam41 (ʾwlm).
Cells 11–14. SCRYPT finds two possible anthroponyms for this region, namely Yoshafat42 (ywšpṭ) and
Shafat (špṭ, already noted by Millard and Richelle),
depending on whether one considers cell 11 ('One
or no letter') to be a real letter or not.
Cells 15–17. If we consider the vertically written
letters as part of the preceding name, then Shafat-based theophoric names are possible, as proposed
by Richelle.43 An alternative is to see these letters as
part of a new name, possibly featuring an additional
letter in the broken upper right corner of the ostracon. To reduce the number of matches, we set cell
16 to z (Puech's reading), rather than the default
'One letter', as z seems by far the best identification
here. SCRYPT then finds only the name Yaziz44 (yzz).
3.3.2.3 Line 3. Cells 1–4. A large number of
matches are possible for this region, and hence we
will restrict cell identifications to a minimum. For
cell 1, we encode w or b or 'Nothing', limiting ourselves to the actual identifications of three of our
base authors (thus rejecting Yardeni's cautious 'One
or no letter'). In the same way, we reject Misgav's
minority identifications of cells 2 and 3, and keep
the identifications g and r proposed by the three
other authors. SCRYPT then finds no matches for
cells 1–2 or 1–3. But cells 1–4 yield Gera45 (grʾ)
and Gareb46 (grb), the same two readings as proposed by Millard. The available traces in cell 4
clearly resemble an alef in our opinion, and hence
we retain the reading Gera. Checking for longer options (cells 1–5, 1–6, 1–7, or 1–8) yields no other
match.
Cells 5–7. This region is problematic. As noted by
most authors, the well-known Semitic root bʿl naturally comes to mind, a root appearing in nouns
('master', 'lord', 'husband'), verbs ('marry', 'rule over'),
theonyms (Baal, Baal-Zaphon, Baal-Shamem), and
many anthroponyms, and attested in several Proto-Canaanite inscriptions.47 The reading bʿl has however been discarded by Galil and Puech because of
an apparent word divider in cell 10 (a single dot)
and an apparent letter l in cell 9. Since the letter l
does not correspond to any possible single-letter
morphemic suffix in Hebrew (one would have
rather expected y, k, m, or n for example), these
authors have opted for the rarer48 lexeme ʿwll
('infant'). We believe however that the preservation
of the much more common root bʿl in the reading is
a more probable working hypothesis, and use SCRYPT
to check if one can cover line 3 with anthroponyms
while retaining this root. This would of course entail
either rejecting the reading of cell 10 as a word divider
(as did Galil, who joined it with cell 11 to form the
letter r) or seeing it as a misplaced divider.49
Experimenting with cells 5–7 yields one possible
match, namely the anthroponym Baal50 (bʿl).
Checking for matches in any longer region starting
in cell 5 yields no result.
Cells 8–11. Since cells 10 and 11 are encoded as 'One letter',
we get too many matches. We shall thus restrict the
regular expression to the actual identifications made
by our base authors, namely r or q for cell 10 and b
or ṣ for cell 11 (thus rejecting Yardeni's cautious
choice 'One letter' for these cells). We also encode
'Nothing' in cell 13 (pace Yardeni) since the cell
seems too small to hold a letter, and we remove
the reading d in cell 12, since the visible traces clearly
do not match this letter (pace Galil). SCRYPT finds no
matches for any region beginning in cell 8. We therefore decide to encode an alternative reading for cell 8,
which was read as l by our four base authors. We
propose a badly drawn p, since this letter is close to
an l in the Proto-Canaanite alphabet (see for example
the roundish shape of p in the second cell of line 2).
SCRYPT now finds only two matches beginning in cell
8: either Purah (prh, see Judg 7:10) in cells 8–10 or
Perets51 (prṣ) in cells 8–11. We retain Perets (cells 8–11)
only, since SCRYPT finds no anthroponym (of any
length) starting in cell 11.
Cells 12–16. As noted in the preceding paragraph,
we have encoded 'Nothing' in cell 13 and removed
the reading d in cell 12. Two possibilities now exist
for dividing between words. Cells 12–14 yield two
matches: Mikah52 (mykh) and Maki53 (mky), in
which case an additional name Iddo54 (ydw, in defective spelling here) is proposed by SCRYPT in cells
15–16. The alternative is to search for a single name
in the whole region, which entails the unique match
Mikayah55 (mykyh) (in which case cell 16 bears no
letter, with Yardeni and Galil). These readings are
however much more conjectural than the preceding
ones, since the letters are much more damaged here;
hence letter identifications other than the ones proposed by our four base authors should also be
possible.
3.3.2.4 Line 4. Cells 1–4. To reduce the large
number of possible readings, we need to limit the
default reading 'One letter' of cell 4. We retain
Misgav and Puech's w but reject Galil's n, since
the preserved vertical stroke seems too long for
the shape of the letter n in the Proto-Canaanite
stage.56 We also add a divider (vertical stroke57) as
an additional possible reading of this cell. We also
change cell 2 from 'One or no letter' to 'One letter',
since we do not expect a vacat in the middle of the
first name. We now discuss the division of words.
Limiting the first word to cells 1–2 would yield a
large number of matches (due to the unknown value
of cell 2), but a problem then arises for the second
word since SCRYPT finds no anthroponyms for any
region beginning in cell 3. We thus search for a
name covering more than two cells. SCRYPT finds
no matches for any region longer than cells 1–3,
but proposes the following names for cells 1–3:
Edom (ʾdwm), Adam (ʾdm), Ulam (ʾwlm), Onam
(ʾwnm), Otsem (ʾṣm), and Aram (ʾrm). The first
two names need to be discarded since they are
borne only by the mythological first human, and
by the eponymic ancestor of the Edomite people.
We thus retain Ulam58 (ʾwlm), Onam59 (ʾwnm),
Otsem60 (ʾṣm), and Aram61 (ʾrm) as likely candidates
for cells 1–3. Cell 4 then could either be a conjunctive
w ('and') or a vertical word divider (like cell 8
of line 1). Indeed, SCRYPT finds no match (of any
length) beginning with cell 4.
Cells 5–7. As for word division, cells 5–6 yield a
reading62 Ner (nr), which we reject since no word can
then be found starting from cell 7. In the same way,
SCRYPT finds no anthroponyms for cells 5–7, nor for
any longer region starting from cell 5. Millard
3.3.2.5 Line 5. Cells 1–4. We adapt the regular expression in the following way: Galil's readings b and y
in cells 2 and 3 seem forced (see Rollston, 2011, p. 76),
and are hence removed from the regular expression. We
also remove the reading n from cell 4, as the shape of
the letter clearly seems like an m to us (with Yardeni
and Puech, pace Misgav and Galil). SCRYPT then proposes Aram67 (ʾrm), Hiram68 (ḥyrm), and Harim69
(ḥrm). No longer region yields any match. A shorter
region (cells 1–3) does yield results, but does not
permit us to find a second anthroponymic match
starting in cell 4 (with the letter identifications for
cells 5–7 described below).
Cells 5–7. The default expression is 'One letter'
for each of these three cells, so we obviously need to
restrict them to limit the number of matches. We
take the actual identifications proposed by our base
authors, namely the ʿ, b, and d proposed by Galil
and Puech, and the additional s for cell 7 proposed
by Yardeni. Cells 5–6 yield no anthroponymic
match, nor does any region longer than cells
5–7, with the expression we retained for cells 8–13
(see below). For cells 5–7, SCRYPT proposes the following matches: Ebed70 (ʿbd), Abdy71 (ʿbdy), and
Obed72 (ʿwbd).
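A recurring argument in the line-by-line analysis is that a candidate name is rejected when no further name can be found in the cells that follow it. This can be sketched as a small recursive search for all ways of covering a sequence of cells with consecutive lexicon matches. It is a simplified illustration under stated assumptions: toy Latin letters, a hypothetical lexicon, no treatment of word dividers or of the conjunction w, and a hypothetical function name `coverings`. It is not a description of how SCRYPT works internally, since in the article these checks are carried out interactively by the user.

```python
import re
from functools import lru_cache

def coverings(cells, lexicon):
    """List every way to split `cells` into consecutive lexicon matches."""
    cells = tuple(cells)

    def names_for(start, end):
        # Names matching exactly the cells in the half-open range [start, end).
        pattern = "".join(f"(?:{c})" for c in cells[start:end])
        return [n for n in lexicon if re.fullmatch(pattern, n)]

    @lru_cache(maxsize=None)
    def cover_from(start):
        if start == len(cells):
            return [[]]  # nothing left to cover
        results = []
        for end in range(start + 1, len(cells) + 1):
            for name in names_for(start, end):
                for rest in cover_from(end):
                    results.append([name] + rest)
        return results

    return cover_from(0)

# Hypothetical toy data: 'ab' matches the first two cells, but nothing
# in the lexicon starts at the third cell, so that split is rejected.
cells = ["a", "b", "c", "d|e"]
lexicon = ["ab", "abc", "d", "e"]
print(coverings(cells, lexicon))  # [['abc', 'd'], ['abc', 'e']]
```

An empty result means that, under the current cell encodings, the region cannot be read as a sequence of names, which is the situation that prompts a change of encoding in several of the paragraphs above.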
4 Conclusion
4.1 Discussion
Acknowledgements
Funding
References
Ariel, C. (2013). Orthography: Biblical Hebrew. In Khan,
G. (ed.), Encyclopedia of Hebrew Language and
Linguistics. Leiden: Brill.
Beit-Arieh, I. (1993). A literary ostracon from Ḥorvat
ʿUza. Tel Aviv, 20: 55–65.
Benz, F. L. (1972). Personal Names in the Phoenician and
Punic Inscriptions. Rome: Pontifical Biblical Institute.
Richelle, M. (2015). Quelques nouvelles lectures sur l'ostracon de Khirbet Qeiyafa. Semitica, 57: 147–62.
Rollston, C. (2011). The Khirbet Qeiyafa ostracon:
Methodological musings and caveats. Tel Aviv, 38:
67–82.
Sass, B. (1988). The Genesis of the Alphabet and its
Development in the Second Millennium B.C., Vol. 13
of Ägypten und Altes Testament. Wiesbaden: Otto
Harrassowitz.
Yardeni, A. (2009). Further observations on the ostracon.
In Garfinkel, Y., Ganor, S. (eds), Khirbet Qeiyafa Vol. 1.
Excavation Report 2007-2008. Jerusalem: Israel
Exploration Society, pp. 259–60.
Notes
1. The word ostracon refers to inscribed pottery sherds.
2. Such is the current standard understanding of the ostracon. Note however that Demsky proposed that the
ostracon was rather written vertically, and that its horizontal dividing lines (see Fig. 1) represent columns
rather than lines (Demsky, 2012). This theory actually
only affects the way the ostracon was held by the scribe,
but Demsky's order of reading the letters is the same as
that of the other authors.
3. Note that concatenation is not commutative, i.e. X·Y ≠ Y·X.
4. Note that union is commutative, i.e. X|Y = Y|X.
5. The Kleene closure has been presented here for the sake
of theoretical completeness in the formal definition of
regular expressions. This operator is useful for denoting infinite languages, as in the expression .*·b, denoting all strings, of any length, ending in b. Our
approach, however, is based on finite structures, with
14. This section uses the classical linguistic square brackets notation for phonetic transcription. The notations
used for Hebrew vowels are those of Brill's
Encyclopedia of Hebrew Language and Linguistics
(Khan, 2013), with [a ] for qameṣ gadol, [e] for ṣere,
[] for seghol, [] for long ḥireq, [i] for short ḥireq, [u]
for long šureq/qibbuṣ, [u] for short šureq/qibbuṣ, and
[o] for ḥolem.
15. Note that here, an exhaustive manual dictionary
search for an expression like .(a|b)(c|d). would take
even more prohibitive time, since the need for inserting matres lectionis entails a combinatorial explosion
in the number of possible matches, making this
number jump from 2,704 (see Section 2.2) to 292,032.
16. The name SCRYPT is inspired by a fusion of script (as
in ancient script), script (a list of computer commands stored in a file), and crypt (as in
cryptography).
17. These cells can either be on the same line, or overlap
two consecutive lines. Order of selection is also important, since the user can select cells either right-to-left or left-to-right, depending on the assumed writing order of the inscription (left-to-right in the case of
the Qeiyafa ostracon).
18. The word can later be removed from the panel list at
any time by clicking on the X button to its left.
19. Future versions of the software might include server
treatment, to save user sessions and readings for
example.
20. http://jquery.com
21. http://kineticjs.com
22. http://evanw.github.io/glfx.js/
23. Currently, two other recently discovered inscriptions
have been added to the software, namely the Ophel
pithos inscription (Mazar et al., 2013) and the
Qeiyafa Ishbaal inscription (Garfinkel et al., 2015),
both available through the 'Select an inscription'
button in the application menu.
24. For line 1, Millard proposed the unattested anthroponym Ellat-ash (ʾltʿš) (lit. 'the goddess helped'),
followed by the conjunction w ('and') and the biblical
name Abdel or Abdiel (ʿbdʾl). For line 2, he proposed
the name Shafat (špṭ), and noted the possibility of a
second Shafat at the end of the line, possibly followed
by a complementary element in the vertical cells
above the line. For line 3, he proposed Gera (grʾ) or
Gerab (grb), followed by Baal-X (i.e. a theophoric
name beginning with Baal). For line 4, he proposed
the epigraphically attested Hebrew name Naqmay
(nqmy), and the Phoenician name Bodmilk (bdmlk)
already noted by Yardeni. Millard proposed no readings for line 5.