Tito Orlandi
THE "CORPUS DEI MANOSCRITTI COPTI LETTERARI"
I
The enterprise, which is now called Corpus dei Manoscritti Copti
Letterari, started in 1968 as a conventional archive, mainly of
photos of Coptic literary manuscripts. Its aim was the
reconstruction of the Coptic codices coming from the library of
the White Monastery in Upper Egypt, which were dispersed in
libraries throughout the world, and the study and publication of
their contents.
The program was later expanded to include the whole field of
Coptic literature, and in the process, unique archives of data
and photographs were gathered. The work was concentrated upon the
following items: 1. Photographic archive; 2. Catalogue of
manuscript collections; 3. History of the manuscripts;
4. Catalogue of Coptic literary texts; 5. Reconstruction of the
White Monastery Library; 6. Bibliography of the Coptic
Literature, then general Coptic bibliography; 7. Publication of
texts with introduction and translation.
In 1980 a new project was started, to transfer the data of the
archives into computer memories, so that they could be
processed automatically. We used a Data Base Management
program ("OMNIDATA") running on the SPERRY 1100/80 mainframe of
the Centro di Calcolo of the University of Rome, and several
files of data were created: 1. Description of manuscripts;
2. Coptic Bibliography; 3. Inventory of the collections of
manuscripts; 4. Catalogue of literary works in Coptic.
After the experience of the following years, with different
programs both for data bases and (later) for the manipulation of
texts, the organization of the work was changed. At present, the
Corpus dei Manoscritti Copti Letterari is AN ENTERPRISE WHOSE AIM
IS TO TREAT IN A COMPREHENSIVE BUT ALSO ARTICULATE AND FLEXIBLE
WAY, AND - WHERE CONVENIENT - AUTOMATICALLY, THE REPRODUCTION OF
THE COPTIC LITERARY MANUSCRIPTS AND ALL INFORMATION ON THEM AND ON
SUBJECTS RELATED TO THEM (scribes, authors, texts, production,
readers, collections, scholars), AND TO DISSEMINATE THE
SCIENTIFIC RESULTS THEREOF BY THE TECHNICAL MEANS WHICH IN TURN
ARE THOUGHT TO BE THE MOST APPROPRIATE.
The functions of the enterprise are organized around two main
tasks: DESCRIPTION and REPRODUCTION of manuscripts, works, etc.
The description is obtained by connecting in various ways the
information about Coptic manuscripts and literature, in order to
form a consistent and possibly complete picture of the Coptic
literary world. For this, Data Base Management techniques are
employed. The reproduction is obtained in
"analogical" form through different photographic systems (mainly
microfilm and microfiche); and in "encoded" ("digital") form
suitable for various automated Text Processing possibilities.
The problems posed by this approach have by now been partially
answered, bearing in mind that technical and theoretical
progress keeps matters in motion. We should especially
mention the recent choice of UNIX (but especially of the UNIX
"philosophy") as the privileged environment in which the
computerized work is done. This has brought a series of
invaluable improvements in all the steps of the organization.
The problems which CMCL tries to cope with are:
PROCEDURAL ARRANGEMENTS:
- Identification of the different archives, through the
definition of uniform characteristics of the objects taken into
consideration;
- Organization of the information to be put in the records
forming the archives;
- Relations between the archives and cross references.
COMPUTER-RELATED PROPERTIES:
- Portability of the files, that may be processed in different
machines and by different programs;
- Central updating of the files with simultaneous correction of
data wherever necessary;
- Visualization of the files convenient for the different
steps in the management activity (screen or paper or microfiche;
Coptic or Latin characters; etc.).
II
Having all this in mind, the work in the Corpus dei Manoscritti
Copti Letterari is now carried on using nearly all kinds of
machines available, from portable computers to mainframes (with
their peripherals), and five basic kinds of programs: editor,
word processor, text formatter, text analyzer (mainly concordance
producer), in the different steps of information and text
processing.
On the other hand, all information (texts, bibliography, archival
data, etc.) is stored only in the form of pure ASCII files,
without any interspersed codes of the kind that certain packages
(notably word processors and data managers) produce and require,
but that are unintelligible to others.
The existing packages are used insofar as they neither require nor
insert such codes, except for certain particular (generally
final) purposes, or in particular moments of the process, after
which the texts are again made free of non-ASCII codes.
Therefore the products of the Corpus will be available to
scholars, not only in the more or less conventional ways as
printed texts or microform, but also in files suitable for
management with most of the machines and software which they
normally use.
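
The pure-ASCII rule stated above can be checked mechanically. What
follows is only a minimal sketch in Python, chosen here for
illustration (it is not a language used by the CMCL). The function
names and the list of tolerated control characters are assumptions
of this example.

```python
# Hypothetical helpers: the names and the whitelist of control characters
# are illustrative assumptions, not part of the CMCL software.
ALLOWED_CONTROLS = {9, 10, 12, 13}  # tab, line feed, form feed, carriage return


def foreign_codes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte that is not plain ASCII text."""
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if value > 126 or (value < 32 and value not in ALLOWED_CONTROLS)
    ]


def strip_foreign_codes(data: bytes) -> bytes:
    """Drop every byte reported by foreign_codes and keep the rest unchanged."""
    bad = {offset for offset, _ in foreign_codes(data)}
    return bytes(value for offset, value in enumerate(data) if offset not in bad)


# An invented sample: plain text polluted by an escape byte and a NUL byte.
sample = b"abc\x1b[1m def\x00\r\n"
print(foreign_codes(sample))
print(strip_foreign_codes(sample))
```

The sketch removes only the offending bytes. Printable residue of a
package-specific code (here the "[1m" that followed the escape
byte) would still have to be removed with knowledge of the package
that produced it.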
FILES OF TEXTS
The files containing Coptic texts do not "reproduce" the physical
shape of the text in any given manuscript, but are seen as a
"kilometric" text (using the ASCII character Decimal 12 for
practical reasons in the visualization) in which the ASCII
characters are adapted to ENCODE ALL PHENOMENA found in the
manuscript in question, which are RELEVANT TO THE PRESERVATION OF
THE TEXT in a magnetic memory.
The CODIFICATION ratio, i.e. the correspondence between Coptic
and other necessary special characters on the one side, and the
"numbers" (sequences of bits) stored in the memory of the
computer on the other, is INDEPENDENT "per se" of those
actually used in the keyboards and in the printing devices. This
has been done because the keyboards and printing devices
generally in use do not share exactly the same systems. It is
true, however, that the systems in use are rather similar.
The CODIFICATION system has been studied in order to facilitate
the INPUT of the texts by the scholars through the keyboards
normally in use today. It is understood that the encoder of a
manuscript, or part of it, shall not change the Coptic text in
any way, not even by separating words. He will only read what is
clearly legible, encoding the Coptic text as it appears in the
manuscript in the present state of conservation, and encoding
other relevant information according to the chosen rules.
The phenomena selected to be encoded are:
Coptic text: each Coptic letter will correspond to one ASCII
character which in the "normal" keyboards and printing devices
corresponds to one alphanumeric or special character. Inside the
text, the following other relevant information will appear: End
of line; End of column; Beginning of page with its numbering, if
any; Punctuation (in several forms); Majuscule in the
margin; Physical lacuna; Illegible letters; Separator (a special
Coptic orthographic feature); Raised dot; Apostrophe; Blank;
Marginal glosses.
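
To make the idea of a one-character-per-phenomenon encoding more
concrete, here is a minimal sketch in Python. The actual CMCL codes
are not listed in this paper, so every marker assignment below is
hypothetical, as are the function names and the sample string.
Page numbering and marginal glosses, which need more than one
character, are left out of the sketch.

```python
# All marker assignments are hypothetical: the paper does not list the
# actual CMCL codes. Any character not in the table counts as a letter.
MARKERS = {
    "/": "end of line",
    "|": "end of column",
    "#": "beginning of page",
    "*": "punctuation",
    "^": "majuscule in the margin",
    "[": "start of physical lacuna",
    "]": "end of physical lacuna",
    "?": "illegible letter",
    "-": "separator",
    ".": "raised dot",
    "'": "apostrophe",
    " ": "blank",
}


def classify(encoded: str) -> list[tuple[str, str]]:
    """Label each character of an encoded text as a letter or a marked phenomenon."""
    return [(ch, MARKERS.get(ch, "letter")) for ch in encoded]


def letters_only(encoded: str) -> str:
    """Return the bare sequence of letters, dropping every marker."""
    return "".join(ch for ch, kind in classify(encoded) if kind == "letter")


# An invented sample, not a transcription of any manuscript.
sample = "pnoute/pe.[??]"
for ch, kind in classify(sample):
    print(repr(ch), kind)
print(letters_only(sample))
```

Because the stored file is independent of keyboards and printers,
a table of this kind can be replaced by another one without
touching the programs that read it.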
After the Coptic text has been coded, it is submitted to
different procedures, according to different steps and goals of
study and publication. Some of the programs used in the
procedures have been especially written inside the CMCL (in
BASIC); the above-mentioned packages are also used, when possible
and convenient; some steps require an intervention by the
scholar, of course through an "editor" program.
First, the text can be automatically printed with Coptic
characters in the shape of a "diplomatic" transcription. If
another style of publication is (also) envisaged, the text is
automatically divided into numbered paragraphs, according to the
original punctuation of the manuscript (one or more punctuation
signs may be selected). In this shape, it is passed through a
concordance program. The result is used to check the
transcription (e.g., the unusual spellings are highlighted in a
concordance, and this is very helpful), and to normalize the
orthography, if such are the criteria of the edition.
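
The division into numbered paragraphs and the concordance check
described above can be sketched as follows. This is an illustration
in Python, not the BASIC programs written inside the CMCL. It
assumes that words are already separated by blanks in the working
copy, and the function names and sample text are invented.

```python
from collections import defaultdict


def split_paragraphs(text: str, signs: str = ".") -> list[str]:
    """Close a paragraph after each occurrence of one of the selected signs."""
    paragraphs, current = [], []
    for ch in text:
        current.append(ch)
        if ch in signs:
            paragraphs.append("".join(current).strip())
            current = []
    paragraphs.append("".join(current).strip())  # text after the last sign
    return [paragraph for paragraph in paragraphs if paragraph]


def concordance(paragraphs: list[str], signs: str = ".") -> dict[str, list[int]]:
    """Map each word form to the numbers of the paragraphs where it occurs."""
    index = defaultdict(list)
    for number, paragraph in enumerate(paragraphs, start=1):
        for word in paragraph.split():
            word = word.strip(signs)
            if word:
                index[word].append(number)
    return dict(sorted(index.items()))


def rare_forms(index: dict[str, list[int]], limit: int = 1) -> list[str]:
    """List the forms occurring at most `limit` times: candidates for checking."""
    return [word for word, places in index.items() if len(places) <= limit]


# Invented sample text; two punctuation signs are selected.
text = "auo pnoute pe. pnoute auo prome: auo pnute."
paragraphs = split_paragraphs(text, signs=".:")
index = concordance(paragraphs, signs=".:")
print(paragraphs)
print(index)
print(rare_forms(index))
```

A form that occurs only once is not necessarily a mistake, so the
list produced by `rare_forms` is only a prompt for the editor's
judgment.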
At this stage the editor makes a first attempt to fill the
lacunae, and to improve the texts of the manuscript, when there
are manifest mistakes. After that, he prepares the translation
using a program which checks each word in a dictionary (a sort of
self-augmenting file), presents to the editor a choice of
translations, and registers on another, "vertical", file the
Coptic words and the chosen Italian equivalent. This will lead to
other modifications of the text itself, of the division of the
paragraphs, etc., which are done by the editor on a copy of the
original encoded file.
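
The translation aid just described can be pictured with a short
sketch. It is written in Python for illustration and makes several
assumptions: the dictionary is held as a mapping from a word to the
equivalents chosen so far, the editor is represented by a callback
named `choose`, and the "vertical" file holds one pair per line
separated by a tab. None of these details comes from the CMCL
programs themselves.

```python
def translate(words, dictionary, choose):
    """Return the 'vertical' list of (word, chosen equivalent) pairs.

    dictionary maps each word to the equivalents recorded so far; choose is
    a callback standing in for the editor, given the word and its options.
    """
    vertical = []
    for word in words:
        options = dictionary.setdefault(word, [])
        chosen = choose(word, options)
        if chosen not in options:
            options.append(chosen)  # the dictionary augments itself
        vertical.append((word, chosen))
    return vertical


def vertical_file(pairs):
    """Format the pairs one per line, word and equivalent separated by a tab."""
    return "".join(f"{word}\t{chosen}\n" for word, chosen in pairs)


# Illustrative data: one word is already known, one is new to the dictionary.
dictionary = {"noute": ["dio"]}
new_answers = {"rome": "uomo"}


def scripted_editor(word, options):
    """Take the first known equivalent, or else supply a new one."""
    return options[0] if options else new_answers[word]


pairs = translate(["noute", "rome", "rome"], dictionary, scripted_editor)
print(vertical_file(pairs), end="")
print(dictionary)
```

In real use the callback would show the options on the screen and
wait for the editor's choice; the scripted version above only keeps
the example self-contained.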
This copy represents the correct form of the text as the editor
sees it, and is used to produce: the "final" concordance, the
formatted text for print, the "final" translation; unless there
are other manuscripts to collate. In this case, every manuscript
is treated first individually, and then their texts are collated.
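
This paper does not say how the collation is carried out. As one
possible illustration, the sketch below compares two texts word by
word with the `difflib` module of the Python standard library. The
choice of this technique, the function name and the sample texts
are all assumptions of the example.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str) -> list[tuple[str, str, str]]:
    """List the places where the witness differs from the base, word by word.

    Each entry is (kind of change, reading of the base, reading of the witness),
    where the kind is one of 'replace', 'delete' or 'insert'.
    """
    base_words, witness_words = base.split(), witness.split()
    matcher = SequenceMatcher(None, base_words, witness_words, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append(
                (tag, " ".join(base_words[i1:i2]), " ".join(witness_words[j1:j2]))
            )
    return variants


# Invented sample readings of the same passage in two manuscripts.
print(collate("auo pnoute pe prome", "auo pnute pe prome"))
```

Treating each manuscript individually first, as described above,
means that the two inputs are already checked transcriptions, so
the differences reported are variants and not encoding errors.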
FILES OF DATA
The files of data are conceived as a list of material, put in
memory in such a way that it can easily be transformed into
whatever kind of "active" Data Base different scholars may
choose (Catalogues, Bibliographies, Descriptions of manuscripts,
etc.). Therefore they are put in memory in the form of a common,
simple text file, divided only into "lines" (= portions of text
delimited by one so-called CR = "Carriage Return"). But the file
is organized so as to be read easily by any normal "Data Base
Program", whether available on the market or personally written.
At first, the data were stored according to the principles
of hierarchical data bases; now they are being modified
according to the relational theory.
The data are stored in Records divided into a number of Fields.
The size of each Field is not fixed. There are codes (always in
plain ASCII) indicating the separation of the different records;
and other codes indicating the separation of the different fields
in one record.
Consequently, all we need to define, in order to obtain a FILE
useful for automatic manipulation, is:
1. Content of the file.
2. Number and order of the fields.
3. Markers indicating the separation of the records
4. Markers indicating the separation of the fields.
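
The four definitions listed above are enough to read such a file
with a very small program. The sketch below, in Python, assumes two
markers that are purely hypothetical ("%%" alone on a line to close
a record, "@" at the start of a line to open a field); the field
names and the sample data are likewise invented.

```python
RECORD_MARK = "%%"  # hypothetical: a line holding only this closes a record
FIELD_MARK = "@"    # hypothetical: a line starting with this opens a new field


def parse_records(text: str, field_names: list[str]) -> list[dict[str, str]]:
    """Turn a plain text file into a list of records keyed by field name.

    Fields have no fixed size: lines without a marker continue the field
    opened last. Fields are matched to names by their order in the record.
    """
    records, fields = [], []
    for line in text.splitlines():  # also accepts CR as a line delimiter
        if line == RECORD_MARK:
            if fields:
                records.append(dict(zip(field_names, fields)))
            fields = []
        elif line.startswith(FIELD_MARK):
            fields.append(line[len(FIELD_MARK):].strip())
        elif fields:
            fields[-1] = (fields[-1] + " " + line.strip()).strip()
    if fields:  # last record not closed by a marker
        records.append(dict(zip(field_names, fields)))
    return records


# Invented sample: two records of three fields, one field running over two lines.
names = ["call_number", "dialect", "content"]
sample = "@X 1\n@S\n@Homily, first\nline continues\n%%\n@X 2\n@B\n@\n%%\n"
for record in parse_records(sample, names):
    print(record)
```

Since the markers are ordinary ASCII characters, the same file can
be loaded into a commercial data base program or read by a few
lines of code such as these.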
As for the CONTENT, there are 5 types of FILES, namely for:
CODICES, ENTIRE OR RECONSTRUCTED.
COLLECTIONS OF MANUSCRIPTS.
CLAVIS COPTICA.
CODICOLOGICAL AND PALAEOGRAPHICAL DESCRIPTION.
BIBLIOGRAPHY.
CODICES: Each RECORD represents one codex, which is preserved
more or less complete, or can be reconstructed from a sufficient
number of scattered leaves.
FIELD 1. Conventional call-number.
FIELD 2. Call number given by the owner, or (for the
reconstructed codices) the key-word: FRAGMENTS, which
refers to Field 9.
FIELD 3. Dialect.
FIELD 4. Provenience.
FIELD 5. Editions.
FIELD 6. Available reproductions.
FIELD 7. Other bibliography.
FIELD 8. List of the content.
FIELD 9. List of the fragments.
COLLECTIONS: Each RECORD represents one manuscript, kept in the
Collection which gives the name to the single file.
FIELD 1. Call number given by the owner.
FIELD 2. Catalogue number.
FIELD 3. Conventional call-number of the reconstructed
codex (if any).
FIELD 4. Dialect.
FIELD 5. Content.
FIELD 6. Provenience.
FIELD 7. Editions.
FIELD 8. Bibliography.
FIELD 9. Previous owners.
FIELD 10. Complementary fragments.
CLAVIS COPTICA: Each RECORD will represent one work of the
Coptic Literature.
FIELD 1. Number of "access" in the list.
FIELD 2. Number of the Clavis Patrum Graecorum.
FIELD 3. Number of the Bibliotheca Hagiographica Graeca.
FIELD 4. Author or Literary genre (in case of obviously
anonymous works: Passio; Acta Conc.; Canones; etc.)
FIELD 5. Title.
FIELD 6. Manuscripts.
FIELD 7. Bibliography.
FIELD 8. Abstract.
CODICOLOGICAL AND PALAEOGRAPHICAL DESCRIPTION: A great number of
fields have been conceived, to contain detailed information on all
the characteristics of the manuscripts.
BIBLIOGRAPHY: It consists of 4 interrelated files. Each listed
publication has an identification number which is the same in all
files. Each file concerns one aspect of the publications.
FILE 1: Description of the publication.
FIELD 1. Author.
FIELD 2. Title.
FIELD 3. Periodical.
FIELD 4. Miscellany.
FIELD 5. Editor.
FIELD 6. Collection.
FILE 2: Subject. Only one Field, but there may be many
records for each publication.
FILE 3: Manuscripts published (id.).
FILE 4: Reviews (id.).
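
The way the four files hang together through the identification
number can be sketched as a simple join. The Python code below is
an illustration only: the in-memory layout (a mapping for FILE 1,
lists of pairs for FILES 2 to 4), the function name and the sample
entries are assumptions, not the CMCL format.

```python
def join_bibliography(descriptions, subjects, manuscripts, reviews):
    """Gather, per identification number, what the four files say about it.

    descriptions maps an identification number to the fields of FILE 1;
    the other three arguments are lists of (identification number, value)
    pairs, with any number of pairs per publication.
    """
    merged = {
        ident: {"description": fields, "subjects": [], "manuscripts": [], "reviews": []}
        for ident, fields in descriptions.items()
    }
    linked = (("subjects", subjects), ("manuscripts", manuscripts), ("reviews", reviews))
    for key, rows in linked:
        for ident, value in rows:
            if ident in merged:  # rows pointing to no description are skipped
                merged[ident][key].append(value)
    return merged


# Placeholder entries, not taken from the real bibliography.
descriptions = {1: {"author": "Author A", "title": "Title A"}}
subjects = [(1, "subject x"), (1, "subject y"), (2, "subject z")]
manuscripts = [(1, "manuscript m")]
reviews = []

print(join_bibliography(descriptions, subjects, manuscripts, reviews))
```

Keeping each aspect in its own file, linked only by the number,
is what allows a publication to have many subjects or reviews
without repeating its description.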
III
The CMCL has four series of publications:
The COPTIC BIBLIOGRAPHY, which is published every year as a
brochure with a set of microfiche;
A series of printed books of editions of texts, translations and
studies;
A series of preliminary editions of single manuscripts, as a
brochure with a set of microfiche reproducing the manuscript;
A series of Catalogues of the collections of Coptic manuscripts.
In conclusion, the work done by the Corpus dei Manoscritti Copti
Letterari should produce three advantages in the field of Coptic
Studies: 1. To increase the amount of information available about
Coptic literature, and accelerate the publication of texts and of
tools for scholars. 2. To facilitate the subdivision of the work
to be done on each particular text (linguistic, philological,
historical, theological analysis), because each specialist may
exert his particular competence on one part of a set of uniformly
organized materials. 3. To identify the sectors of Coptology
where the contribution of scholars is most urgent.