
Allocating Memory the Newfangled Way:

The new Operator


Dale E. Rogerson
Microsoft Developer Network Technology Group

Created: August 6, 1992


Revised: January 21, 1993

"The business side is easy—easy! ...if you're any good at math at all, you understand
business. It's not its own deep, deep subject. It's not like C++."

William Gates, Upside Magazine, April 1992

There are two sample applications associated with this technical article.


Abstract
Many developers ask the question, "Do I need to overload the new operator for
Windows™–based applications?" when they start programming in C++ with the
Microsoft® C/C++ version 7.0 compiler. These developers want to conserve selectors
while allocating memory from the global heap. Fortunately, the C/C++ version 7.0 run-
time library allows developers to reduce selector consumption without overloading the
new operator.

This article examines the behavior of the new operator in a Windows-based program. It
provides an overview of new, discusses whether you should overload new, examines the
C++ ambient memory model, and discusses large-model C++ programming and
dynamic-link library (DLL) ownership issues.

Two sample applications, newopr and owner, illustrate the concepts in this technical
article. A bibliography of suggested reading material is included at the end of the article.

Overview
This section provides an overview of the new operator, the _fmalloc function, and the
_nmalloc function.

new
The new operator calls malloc directly. In small or medium model, it calls the near
version of malloc, which is _nmalloc. In large model, it calls _fmalloc.

Alarms are probably ringing in the heads of experienced programmers. In the past,
Microsoft has recommended against using malloc because it was incompatible with
Windows real mode. In C/C++ version 7.0, malloc is designed for Windows protected-
mode programming, and real mode is no longer a concern in Microsoft® Windows™
version 3.1. In most cases, calling _fmalloc is now better than calling GlobalAlloc
directly.

_fmalloc

_fmalloc is better than GlobalAlloc because of subsegment allocation. Instead of calling
GlobalAlloc directly for each memory request, _fmalloc tries to satisfy as many requests
as possible with only one GlobalAlloc call, and uses GlobalReAlloc to increase the size
of a segment. Reducing the calls to GlobalAlloc cuts down on the overhead, time, and
selectors required by an application.

Reducing selectors is particularly important for C++ programs. Most programs allocate
lots of small objects on the heap. If new called GlobalAlloc directly, each small object
would use a selector, and the program would reach the system limit of 8192 selectors
(4096 in standard mode) too quickly.
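To make the selector arithmetic concrete, here is a minimal sketch in portable C++. It does not call any Windows function: the segment capacity is an assumed figure, per-block overhead is ignored, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed usable bytes in one segment (illustrative value only).
const std::size_t kSegmentCapacity = 65510;

// One GlobalAlloc-style call per object: every object costs one selector.
std::size_t selectorsPerObject(std::size_t objectCount)
{
    return objectCount;
}

// Subsegment-style pooling: objects are packed into shared segments,
// and only each segment costs a selector.
std::size_t selectorsPooled(std::size_t objectCount, std::size_t objectSize)
{
    assert(objectSize > 0 && objectSize <= kSegmentCapacity);
    std::size_t perSegment = kSegmentCapacity / objectSize;
    return (objectCount + perSegment - 1) / perSegment;  // ceiling division
}
```

Under these assumptions, 10,000 objects of 32 bytes each would need 10,000 selectors when allocated one by one, which already exceeds the system limit, but only a handful when pooled.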

_nmalloc

Although _fmalloc is fine and dandy, _nmalloc is not nearly (no pun intended) as
sophisticated. _nmalloc allocates fixed memory with LocalAlloc directly, which may
result in memory fragmentation in the local heap. _nmalloc performs no subsegment
allocation scheme, and the local heap must share a maximum of 64K with the stack and
static data.

Here's another gotcha: _nmalloc is the default for the new operator in the medium and
small models. _nmalloc allocates its memory from the local heap and must share the
heap with the static data and stack—so a lot of things compete for only 64K of space. It is
rather easy to run out of memory in the local heap. For example, a simple phone book
that requires 200 bytes of data per entry could store no more than about 330 names
(65,536 / 200 is roughly 327), and fewer once the stack and static data take their share.
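The phone book estimate can be worked through with a small helper. This is only a sketch: the function name and the stack and static-data sizes passed to it are hypothetical, and local heap bookkeeping overhead is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on the number of fixed-size entries that fit in one 64K
// segment after the stack and static data have taken their share.
std::size_t maxEntries(std::size_t entrySize,
                       std::size_t stackBytes,
                       std::size_t staticBytes)
{
    const std::size_t kSegment = 65536;  // 64K default data segment
    assert(entrySize > 0);
    std::size_t reserved = stackBytes + staticBytes;
    if (reserved >= kSegment)
        return 0;                        // nothing left for the heap
    return (kSegment - reserved) / entrySize;
}
```

With nothing else in the segment, maxEntries(200, 0, 0) gives 327. Assuming a 4K stack and 8K of static data, the bound drops to 266.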

Heap Walker can help you determine the source of memory allocation. Memory allocated
with LocalAlloc (through _nmalloc) expands the segment labeled DGroup. Memory
allocated with GlobalAlloc (through _fmalloc) is labeled as a private segment.

For more information on _fmalloc, see the "Allocating Memory the Old-Fashioned Way:
_fmalloc and Applications for Windows" technical article on the Microsoft Developer
Network CD (Technical Articles, C/C++ Articles).

Overloading the new Operator
Many developers want to overload the new operator as soon as they learn that new calls
_nmalloc. You can overload the new operator to perform specialized memory
management, but overloading new to call _fmalloc instead of _nmalloc will not work.

The new operator has four versions. In this article, we are concerned only with the
following two:

void __near *operator new( size_t size );


void __far *operator new( size_t size ) ;

In small and medium models, the compiler calls the near version of the new operator, and
this version then calls _nmalloc. If we try to overload this function by calling _fmalloc,
we would get a far-to-near pointer conversion error:

void __near *operator new( size_t size )
{
    return _fmalloc(size);
    // ERROR: Lost segment in far/near conversion.
}

A memory management scheme that overloads the near version of the new operator can
return only near pointers, so using GlobalAlloc or GlobalAllocPtr will not work either.

Overloading the new operator to call _fmalloc instead of _nmalloc is obviously not the
answer.
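By contrast, overloading new for specialized memory management remains a legitimate use. The sketch below uses standard C++ rather than the 16-bit near/far extensions, and the class name and counter are hypothetical. It gives one class its own operator new and operator delete so that live allocations can be counted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class used only for illustration.
class CTracked {
public:
    static std::size_t liveCount;  // objects currently allocated with new

    // Class-specific operator new: used only for CTracked objects.
    static void *operator new(std::size_t size)
    {
        void *p = std::malloc(size);
        if (p == 0)
            throw std::bad_alloc();
        ++liveCount;
        return p;
    }

    // Matching operator delete: releases the block and updates the count.
    static void operator delete(void *p)
    {
        if (p == 0)
            return;
        assert(liveCount > 0);
        --liveCount;
        std::free(p);
    }

    int value;
};

std::size_t CTracked::liveCount = 0;
```

After two calls to new CTracked, CTracked::liveCount is 2, and it returns to 0 once both objects are deleted. Other classes continue to use the global new operator.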

Ambient Memory Models


Asking the proper question will lead to a useful solution. The proper question is: "How
do I get the far version of the new operator compiled in my code?" There are three ways
to do this:

• Specify an ambient memory model.

• Override the ambient memory model.

• Use the large memory model.

The following sections describe each method in turn.

Specifying an Ambient Memory Model

You can think of the ambient memory model as the default memory model. Normally, the
ambient memory model of a class is identical to the data model you specify at
compilation time. If the data model is near (for example, in small or medium models), the
ambient memory model is near. You can specify the ambient memory model for a class
explicitly by using __near or __far; for example:

class __far CFoo {


};

Using the new operator on the CFoo class, as defined above, allocates the CFoo object
on the global heap using _fmalloc.

Note The ambient memory model of a class must be identical to the memory model of
all of its base classes. For example, if your class inherits from a Microsoft Foundation
class, your class must have the same memory model as the Foundation class. If you use
small and medium memory models, the ambient memory model of a Foundation class is
near. We discuss the large model in the "Large-Model Programs" section.

Overriding the Ambient Memory Model

You can override the ambient memory model on a per-object-instance basis:

class CBar{
};

void main()
{
CBar __far *pBar = new __far CBar ;
}

At first glance, the code above looks very straightforward. However, nonstatic member
functions have a hidden parameter called the this pointer. It is through the this pointer
that an object instance references its data. If the member function is near, it expects the
this pointer to be near. A far this pointer results in an error because a far pointer cannot
be converted to a near pointer.

The following code generates an error because the compiler cannot find a default
constructor that accepts a far this pointer:

class CBar{
public:
CBar();
};

CBar::CBar()
{
}

void main()
{
CBar __far *pBar = new __far CBar ;
// ERROR C2512: 'CBar': An appropriate
// default constructor is not available.
}

To compile the code above, you must overload the constructor based on the addressing
type. This results in the following correct code:

class CBar{
public:
CBar();
CBar() __far ;
// Overload the constructor to take far this pointers.
};

CBar::CBar()
{
}

// Overloaded constructor.
CBar::CBar() __far
{
}

void main()
{
CBar __far *pBar = new __far CBar ;
}

Only member functions that are actually called through a far pointer need a far overload:

class CBar{
private:
    int value;
    void buildIt() __far {}  // Must be far: CBar() __far calls it.
public:
    CBar();
    CBar() __far ;  // Overload the constructor to take far this pointers.

    // inc is called through a far pointer.
    void inc() __far { value++; }

    // dec is not called through a far pointer.
    void dec() { value--; }
};

CBar::CBar()
{
    buildIt() ;
}

// Overloaded constructor.
CBar::CBar() __far
{
    buildIt() ;
}

void main()
{
    CBar *npBar = new CBar ;             // Allocated in default data segment.
    CBar __far *pBar = new __far CBar ;  // Allocated in global heap.

    pBar->inc() ;   // Far addressing.
    npBar->dec() ;  // Near addressing.
    npBar->inc() ;  // Converts near pointer to a far pointer.
}
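Current compilers no longer accept __far, so the code above cannot be tried directly. As a rough analogy only, standard C++ lets you overload a member function on the const qualification of the hidden this pointer in much the same way. In the sketch below, const plays the role of __far, and every name is hypothetical.

```cpp
#include <cassert>

// Hypothetical class; const stands in for __far in this analogy.
class CCounter {
public:
    CCounter() : value(0) {}

    // Chosen when the hidden this pointer has type 'CCounter *'.
    int which() { return 1; }

    // Chosen when the hidden this pointer has type 'const CCounter *'.
    int which() const { return 2; }

    // No const overload exists, so dec() cannot be called through a
    // pointer to const, much as a near-only member function cannot be
    // called through a far pointer.
    void dec() { --value; }

    int value;
};

int callThrough(CCounter *p)
{
    assert(p != 0);
    return p->which();
}

int callThroughConst(const CCounter *p)
{
    assert(p != 0);
    return p->which();
}
```

The compiler picks the overload from the type of the pointer used for the call, which is the same mechanism that selects between the near and far versions of a member function.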

Confusion Is Nothing New

The use of the __far modifier can make programs very difficult to understand and debug.
For example, let's assume that the following code is compiled in the small or medium
memory models:

class __far CFoo {
public:
    CFoo() ;
    ~CFoo() ;
    ...
};

class CBar {
public:
    CBar() ;
    ~CBar() ;
    ...
};

CFoo aFoo;            // Allocated in a far data segment.
CBar aBar;            // Allocated in default data segment.

CFoo *pFoo;           // Far pointer.
CBar *pBar;           // Near pointer.

CFoo __near *npFoo ;  // Near pointer.
CBar __far *fpBar ;   // Far pointer.

main()
{
    CFoo anotherFoo;  // Allocated on stack (default data segment).

    pFoo = new CFoo;  // Allocated in global heap.
    pBar = new CBar;  // Allocated in default data segment.

    fpBar = new __far CBar;
    // Error: No appropriate default constructor.

    npFoo = new CFoo;
    // Error: Cannot convert from a far pointer to a near pointer.

    npFoo = new __near CFoo ;
    // Error: Cannot convert from a far class to a near class.

    npFoo = &aFoo;
    // Error: Cannot convert from a far pointer to a near pointer.
}

You can see how complex an application can get when it mixes near objects and far
objects.

Again, Heap Walker can help you determine whether memory is being allocated in the
default data segment or in the global heap.

For additional information on the new operator, see Chapter 5 of the Microsoft C/C++
version 7.0 Programming Techniques manual on the Microsoft Developer Network CD.

Large-Model Programs
As we discussed in the previous section, mixing near and far addressing is even more of a
nightmare in C++ than it is in C and can offset many C++ benefits such as ease of
maintenance and readability. The solution is to use the large model.

Although the large model has not been recommended in the past, the combination of
Microsoft C/C++ version 7.0 and Windows version 3.1 now makes large model the
memory model of choice.

When a C or C++ program is compiled with the large memory model, malloc is mapped
to its model-independent or far version known as _fmalloc. Because the new operator
calls malloc, heap objects are allocated in global memory.

The two issues associated with using the large model involve speed and creating multiple
instances. The time you save by not worrying whether an object is near or far can be used
to run a profiler and to optimize the application, thus compensating for any speed losses
caused by the large model.

Multiple Instances

The new /Gx option in C/C++ version 7.0 simplifies the creation of multiple-instance,
large-model applications. Make sure to use the following compiler options:

/Gt65500 /Gx

Programs with multiple read/write data segments cannot have multiple instances. By
default, the Microsoft C compiler places initialized and uninitialized static data in two
separate segments. The compiler places each static object that is larger than or equal to
32,767 bytes into its own segment. The /Gx and /Gt options override this behavior.
The /Gx option forces all initialized and uninitialized static data into the same segment.
The /Gt[n] option places any object larger than n bytes in a new segment. (n is optional,
as indicated by the square brackets.) If n is not specified, it defaults to 256 bytes. If n is
large (for example, 65,500 bytes), most objects remain in the default data segment.

Because a multiple-instance application can have only one read/write data segment, the
application is limited to 64K for all statics, the local heap, and the stack. However, C++
promotes the use of the heap through the new operator, which allocates memory from the
global heap instead of the local heap in the large model, so the 64K local heap limit
should not be a problem. Moreover, multiple-instance, small-model and medium-model
applications also have only one read/write data segment.

Warning A bug in Microsoft C/C++ version 7.0 causes the compiler to place
uninitialized global instances of classes and structures in a far data segment
(FAR_DATA) when the /Gx option is used, resulting in two data segments. For this
reason, you must declare global class objects and structures as near.

To illustrate, most Microsoft Foundation Class Library programs have a global object
declared as follows:

CTheApp theApp;

To get multiple instances of this program, you must change the line to:

CTheApp __near theApp;

We recommend that you use the NEAR define:

CTheApp NEAR theApp;

The EXEHDR utility reports the number of data segments a program contains. In the
sample EXEHDR output below, the lines of interest are the two DATA entries in the
segment table (segments 3 and 4).

Microsoft (R) EXE File Header Utility Version 3.00


Copyright (C) Microsoft Corp 1985-1992. All rights reserved.

Module: NEWOPR
Description: newopr - demonstrates new operator in
medium v. large model
Data: NONSHARED
Initial CS:IP: seg 1 offset e392
Initial SS:SP: seg 4 offset 0000
Extra stack allocation: 1000 bytes
DGROUP: seg 4
Heap allocation: 0400 bytes
Application type: WINDOWAPI
Runs in protected mode only

no. type address  file  mem   flags
  1 CODE 00000600 0ff8f 0ff8f PRELOAD, (movable), (discardable)
  2 CODE 00010a00 013b0 013b1 PRELOAD, (movable), (discardable)
  3 DATA 00000000 00000 00038 PRELOAD, (movable)
  4 DATA 00012000 0159f 01fee PRELOAD, (movable)

Exports:
ord seg offset name
1 1 e358 _AFX_VERSION exported
2 1 f718 ___EXPORTEDSTUB exported

The MAP file helps determine which data is placed in the FAR_DATA segment instead
of the default data segment. To get a MAP file, be sure to specify a MAP filename and the
/MAP option on the link line. In the sample MAP file below, the lines of interest are the
FAR_DATA segment (0003) and the public symbols located in it, ?spaceholder and ?theApp.

Start Length Name Class


0001:0000 004CFH NEWOPR_TEXT CODE
.
.
.
0001:E378 01C17H _TEXT CODE
0002:0000 013B1H COMDAT_SEG1 CODE

0003:0000 00038H NEWOPR1_DATA FAR_DATA


0004:0000 00010H NULL BEGDATA
0004:0010 011E0H _DATA DATA
0004:11F0 00000H XIFCB DATA
.
.
.
0004:140E 0000CH IOB2E DATA
0004:141A 00004H CONST CONST
0004:141E 00008H HDR MSG
0004:1426 00163H MSG MSG
0004:1589 00015H PAD MSG
0004:159E 00001H EPAD MSG
0004:15A0 009B6H _BSS BSS
.
.
.
Origin Group
0004:0 DGROUP

Address Export Alias

0001:E358 _AFX_VERSION _AFX_VERSION


0001:F718 ___ExportedStub ___ExportedStub

Address Publics by Name

0001:7C52 ??0CArchive@@REC@PEVCFile@@IHPEX@Z
.
.
.
0003:0000 ?spaceholder@@3VCObArray@@E
0001:7BF2 ?Store@CRuntimeClass@@RECXAEVCArchive@@@Z
0002:01F0 ?TextOut@CDC@@RECHHHPFDH@Z

0003:000E ?theApp@@3VCTheApp@@E
.
.
.
Address Publics by Value
.
.
.
0002:1380 ?GetStartPosition@CMapPtrToWord@@RFCPEXXZ

0003:0000 ?spaceholder@@3VCObArray@@E

0003:000E ?theApp@@3VCTheApp@@E
0004:0004 rsrvptrs
.
.
.
0004:1FEE __end
0004:1FEE _end

Program entry point at 0001:E392

Multiple-Instance, Large-Model Foundation Class Programs

Multiple-instance, large-model programs that use the Microsoft Foundation classes must
build special versions of the Microsoft Foundation Class Library using the /Gt and /Gx
options. Use the following command line:

nmake MODEL=L TARGET=W DEBUG=1 OPT="/Gt65500 /Gx"

Warning This variant of the Microsoft Foundation Class Library has not been tested
extensively by Microsoft.

For additional information on using large-model programs with Windows, see the
"Programming at Large" technical article on the Microsoft Developer Network CD
(Technical Articles, C/C++ Articles).

The NEWOPR Sample


newopr is a rather simple application that demonstrates some of the issues presented in
this technical article. newopr tries to allocate 128 blocks of memory, 1024 bytes per
block. When newopr is compiled as a medium-model program, it cannot allocate 128
blocks because it runs out of memory in the default data segment. In fact, the Microsoft
Foundation Class Library raises an exception when the new operator fails, and newopr
handles this exception gracefully.

When newopr is compiled as a large-model program, it can allocate all 128 blocks
because the memory is allocated from the global heap instead of the local heap.

The best way to use newopr is to compile it medium model, run it, and examine the heap
with Heap Walker. Run NMAKE with the CLEAN option, and then compile large model.
Run the large-model version, and re-examine the heap with Heap Walker.

The following parameters control how newopr gets built:

DEBUG=[0|1]   Setting of 1 enables debugging information.
LARGE=[0|1]   Setting of 1 compiles newopr as a large-model program.
MINST=[0|1]   Setting of 1 compiles with /Gt and /Gx options to allow multiple
              instances. LARGE must be set to 1.
CLEAN         Deletes .exe, .res, and .obj files.

Sample nmake command lines are shown below:

Command Line                     Makes

nmake                            Medium-model version.
nmake DEBUG=1                    Medium-model debug version.
nmake LARGE=1                    Large-model version.
nmake MINST=1                    Medium-model version. MINST is ignored.
nmake LARGE=1 MINST=1            Multi-instance, large-model version.
                                 Foundation class large-model library must be
                                 compiled with /Gx and /Gt for this to work.
nmake DEBUG=1 LARGE=1 MINST=1    Same as above, but enables debugging.

DLLs and Memory Ownership


As discussed in the "Allocating Memory the Old-Fashioned Way: _fmalloc and
Applications for Windows" technical article on the Microsoft Developer Network CD,
_fmalloc called from a DLL behaves differently than _fmalloc called from an
application. If you call _fmalloc from a DLL, it calls GlobalAlloc with the
GMEM_SHARE flag, which changes the ownership of the allocated memory from the
calling application to the DLL.

Ownership determines when the system will clean up the memory:

• If the application owns the memory, exiting the application releases the memory.

• If the DLL owns the memory, unloading the DLL from memory releases the
memory.

• If multiple applications or multiple instances of an application use a DLL, the
DLL is unloaded only after all applications that use it are unloaded.

The key point here is that memory owned by a DLL (for example, memory allocated with
GMEM_SHARE from within the DLL) can exist even after your application exits. The
Smart Alloc sample application, which accompanies "Allocating Memory the
Old-Fashioned Way: _fmalloc and Applications for Windows," illustrates this issue.

Ambiguous Memory Ownership

A DLL owns the memory allocated as GMEM_SHARE from within the DLL (in C++ or
C). A DLL also owns the memory allocated by new in the DLL. Determining when and
where memory is allocated can become very confusing in C++.

The code samples below are from the owner sample application and its associated
OWNERDLL.DLL.

The DLL contains the following class:

class __export CContainedClass{
public:
    char aMessage[1024] ;
};

class __export CFooInDLL{
public:
    CFooInDLL () ;
    void yourString() ;
    void myString();

    CContainedClass aContainedClass ;
    char aBuffer[1024] ;
    char *aString ;
} ;

CFooInDLL::CFooInDLL()
{
    aString = new char[1024] ;
}

/////// INLINE FUNCTION ////////
inline
void CFooInDLL::yourString()
{
    if (aString)
        delete [] aString ;  // Array form matches new char[1024].
    aString = new char[1024] ;
}

/////// OUTLINE FUNCTION ///////
void CFooInDLL::myString()
{
    if (aString)
        delete [] aString ;  // Array form matches new char[1024].
    aString = new char[1024] ;
}

The .EXE for the program contains the following code fragment:

// Code in .EXE
CFooInDLL aFoo;

void somefunc()
{
aFoo.yourString() ; // Now application owns aString.
aFoo.myString() ; // Now DLL owns aString.
aFoo.yourString() ; // Now application owns aString.
}

Given these code fragments (where the class is defined in a DLL and the object is
declared in an application), the following rules apply:

• The application owns the memory for objects declared in the application.
  Therefore, the application owns the memory for aFoo.

• Space for an object and its contained objects is allocated where the object is
  declared. Therefore, during the construction of the aFoo object, memory for
  aContainedClass is allocated, and aContainedClass is also located in the
  application's memory space.

• The module (application or DLL) that executes the new operator owns the memory
  for the object (see figure below).

    • The CFooInDLL constructor calls the new operator to allocate space for
      aString; therefore, the DLL owns the memory for aString.

    • yourString is an inline function and executes inside the application;
      therefore, the application owns the memory allocated by yourString.

    • myString executes inside the DLL; therefore, the DLL owns memory
      allocated by myString.

• The debug versions of the Foundation classes track the allocation of memory. An
  assertion in the Microsoft Foundation Class Library MEMORY.CPP source file will
  fail when yourString tries to free memory allocated by the DLL. As a result, the
  retail versions of owner and OWNERDLL run fine, but the debug versions fail.

In most cases, it is best to design classes exported from a DLL so that memory ownership
will not bounce between the application and the DLL. Using the debug versions of the
Foundation class libraries helps track this problem.
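The kind of check that catches this problem can be modeled in a few lines of portable C++. The sketch below is not the Foundation class implementation. It is a hypothetical tracker (all names are assumptions) that records which module allocated each block and refuses a release request that comes from the other module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

enum Module { Application, Dll };  // hypothetical module tags

class OwnerTracker {
public:
    // Allocates a block and records the calling module as its owner.
    void *allocate(std::size_t size, Module caller)
    {
        assert(size > 0);
        void *p = std::malloc(size);
        if (p != 0)
            owners[p] = caller;
        return p;
    }

    // Frees the block only if the caller owns it. Returns false, and
    // frees nothing, on an ownership mismatch or an unknown pointer.
    bool release(void *p, Module caller)
    {
        std::map<void *, Module>::iterator it = owners.find(p);
        if (it == owners.end() || it->second != caller)
            return false;
        owners.erase(it);
        std::free(p);
        return true;
    }

    std::size_t liveBlocks() const { return owners.size(); }

private:
    std::map<void *, Module> owners;
};
```

A block allocated with the Dll tag and released with the Application tag is rejected, which mirrors the situation in which yourString frees a string that myString or the constructor allocated.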

Memory ownership for CFooInDLL object

The problem of determining memory ownership is just one more reason not to export
C++ class interfaces from a DLL. In most cases, it is much better to export a C interface
from a DLL.
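One way to picture such a C interface is an opaque handle with create and destroy functions that both execute inside the DLL, so ownership never moves to the application. The sketch below is illustrative only: the names are hypothetical, and the Windows-specific export and far-pointer keywords are omitted so that the code compiles with any standard C++ compiler.

```cpp
#include <cassert>
#include <cstring>

// Everything below would live inside the DLL (hypothetical names).
struct FooImpl {
    char *text;
};

// Allocation happens inside the DLL.
extern "C" FooImpl *FooCreate(void)
{
    FooImpl *foo = new FooImpl;
    foo->text = new char[1024];
    foo->text[0] = '\0';
    return foo;
}

// The application passes data in; the DLL copies it into its own buffer.
extern "C" void FooSetString(FooImpl *foo, const char *s)
{
    assert(foo != 0 && s != 0);
    std::strncpy(foo->text, s, 1023);
    foo->text[1023] = '\0';
}

extern "C" const char *FooGetString(const FooImpl *foo)
{
    assert(foo != 0);
    return foo->text;
}

// Deallocation also happens inside the DLL, matching FooCreate.
extern "C" void FooDestroy(FooImpl *foo)
{
    if (foo == 0)
        return;
    delete [] foo->text;
    delete foo;
}
```

Because the application never calls new or delete on the object or its string, no inline function can shift ownership the way yourString does.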

Conclusion
There is no need to override the new operator to make it compatible with the Windows
environment. The new operator calls malloc. The model-independent version of malloc,
_fmalloc, is designed to manage subsegment allocation under Windows.

However, in medium or small memory models, malloc calls _nmalloc instead of
_fmalloc. _nmalloc allocates memory through LocalAlloc. The best way to get the new
operator to call _fmalloc is to use the large memory model. The ambient memory model
can be specified for a class or overridden for a class instance, but both of these methods
can quickly lead to confusing and complex code.

Bibliography
The following technical articles on the Microsoft Developer Network CD (Technical
Articles, C/C++ Articles) are good sources of information on memory management in
C++:

• "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for
  Windows"

• "Programming at Large"

• "Exporting with Class"

We also recommend the Microsoft C/C++ version 7.0 Programming Techniques manual,
also available on the Microsoft Developer Network CD. Chapter 5 of this manual
discusses memory management in C++.

Allocating Memory the Old-Fashioned Way:
_fmalloc and Applications for Windows

Dale Rogerson
Microsoft Developer Network Technology Group

Created: July 10, 1992


Abstract
One of the most shocking things that a first-time programmer for Windows has to learn is
not to use malloc but to use special Microsoft® Windows™ memory allocation functions
such as GlobalAlloc, GlobalReAlloc, GlobalLock, GlobalUnlock, and GlobalFree.
The reasons for requiring special memory allocation functions have mostly gone away
with the demise of real mode. In fact, Microsoft C/C++ version 7.0 brings us almost full
circle, because the preferred method for memory allocation is the large-model version of
malloc or _fmalloc. Even the C startup code now uses malloc to allocate space for the
environment.

This article discusses the behavior of malloc supplied with Microsoft C/C++ version 7.0.
The article focuses on programming for the protected modes—standard and enhanced—
of Microsoft Windows version 3.1. The following topics are discussed:

• _nmalloc: Why _fmalloc is not the same

• History: Why _fmalloc was bad

• Subsegment Allocation: Why _fmalloc is good

• _ffree: Why _fmalloc is not perfect

• DLLs: Why _fmalloc may not do what you want

• Versatility: Why _fmalloc is not for everything

The information for this article was gleaned from the C/C++ version 7.0 compiler
run-time library source code.

To interactively explore the behavior of _fmalloc, the Smart Alloc (SMART.EXE)
sample application is provided. Smart Alloc is best used in conjunction with Heap
Walker, which shows the exact state of the global segments allocated. Segments allocated
with GlobalAlloc (or _fmalloc) are listed by Heap Walker as having a type designation
of "Private." Smart Alloc has a dynamic-link library (DLL) that intercepts all calls to
GlobalAlloc, GlobalFree, and GlobalReAlloc made by Smart Alloc or the C run-time
library and prints messages with OutputDebugString to the debugging console. It is
usually most convenient to use DBWIN.EXE to view these messages.

_nmalloc: Why _fmalloc Is Not the Same


When compiling with the large data model libraries (compact-, large-, and huge-model
programs), malloc is automatically mapped to _fmalloc. In other memory models, the
programmer must explicitly call _fmalloc, because malloc maps to _nmalloc in these
memory models.

_nmalloc functions differently from _fmalloc. _nmalloc directly maps to LocalAlloc
with the LMEM_NODISCARD | LMEM_FIXED flags. _nfree directly calls LocalFree.
Because _nmalloc allocates fixed memory blocks, it can lead to fragmentation of the
local heap.

History: Why _fmalloc Was Bad


Before Microsoft® Windows™ version 3.1, programmers had to worry about
compatibility with Windows real mode, which required the locking and unlocking
of memory handles to support movable memory. A locked block in real mode is fixed in
memory, and leaving blocks locked would result in performance degradation. The way
_fmalloc is defined meant that an allocated block would have to be locked throughout its
lifetime. When Microsoft C version 6.0 was released, real mode was the only mode in
Windows; therefore, _fmalloc was designed to work under real mode.

Microsoft C/C++ version 7.0 was designed to develop protected-mode applications for
Windows. In protected mode, there is no penalty for locking a memory handle and
leaving it locked. It is not even necessary to retain the handle returned from GlobalAlloc,
because the GlobalHandle function returns the handle to a selector returned from
GlobalLock. Macros defined in WINDOWSX.H simplify the process of getting a pointer
to a block of memory. The GlobalAllocPtr and GlobalFreePtr macros automatically
lock and unlock a memory block.

Microsoft C/C++ version 7.0 takes advantage of the new freedom allowed by protected
mode. _fmalloc can now leave memory blocks locked with no penalty under the two
protected modes of Windows version 3.x.

Subsegment Allocation: Why _fmalloc Is Good


One of the current limitations of Windows version 3.x is the systemwide limit of 8192
selectors (4096 for standard mode). Each call to GlobalAlloc uses one selector and has
an overhead of 32 bytes, which makes GlobalAlloc inappropriate for allocating many
small blocks of memory.

For example, take a flat file database that reads in a list of names and addresses from the
hard disk and puts them in a binary tree. If GlobalAlloc is called for each name and
address, this program would not be able to store more than 4096 names. Many companies
have more than 4096 employees. In fact, the actual number of available selectors is far
less than 8192 because all Windows-based applications and libraries must share from the
same pool of selectors.

_fmalloc implements a much more intelligent use of selectors. Instead of allocating a
new segment for each memory request, _fmalloc tries to satisfy as many requests as
possible using a single segment. _fmalloc expands the segment as needed and returns
pointers to areas of memory within the segment. This process of managing memory
within a segment is called subsegment allocation.

In the first call, _fmalloc allocates a segment with GlobalAlloc using
GMEM_MOVEABLE. (GMEM_SHARE, also set when compiling dynamic-link
libraries [DLLs], will be examined in the section on DLLs.) The block allocated by
_fmalloc is, therefore, not fixed in memory. It is movable. The selector associated with
this block of memory will not change. However, because malloc returns a pointer to a
location within the segment, the pointer will not have an offset of zero (selector:0) as
a pointer obtained from GlobalAlloc does.

In the next call, _fmalloc first tries to satisfy the request without allocating any memory.
If this is not possible, it attempts to do a GlobalReAlloc instead of a GlobalAlloc. This
reduces the number of selectors used by the program. If the segment size must grow
larger than the _HEAP_MAXREQ constant defined in malloc.h to meet the allocation
request, GlobalAlloc is called again. _HEAP_MAXREQ is defined to be 0x0FFE6 or
65,510 bytes. This leaves enough room for the overhead needed to manage the heap and
not have any memory crossing a segment boundary. If more than _HEAP_MAXREQ
memory is requested, the _fmalloc call returns a null pointer.
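The decision sequence described above can be sketched as a small model in portable C++. This is not the run-time library source. It ignores per-block overhead and free lists, assumes segments are sized in 4K chunks, and uses hypothetical names throughout. The counters stand in for calls to GlobalAlloc and GlobalReAlloc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kHeapMaxReq = 0xFFE6;  // 65,510 bytes, as quoted above
const std::size_t kChunk = 4096;         // assumed growth granularity

struct Segment {
    std::size_t size;  // bytes obtained from the (simulated) system
    std::size_t used;  // bytes handed out to callers
};

class SubsegmentModel {
public:
    std::vector<Segment> segments;  // one selector per entry
    unsigned globalAllocCalls;
    unsigned globalReAllocCalls;

    SubsegmentModel() : globalAllocCalls(0), globalReAllocCalls(0) {}

    // Returns false where _fmalloc would return a null pointer.
    bool request(std::size_t bytes)
    {
        assert(bytes > 0);
        if (bytes > kHeapMaxReq)
            return false;
        // 1. Try to satisfy the request without allocating any memory.
        for (std::size_t i = 0; i < segments.size(); ++i) {
            if (segments[i].size - segments[i].used >= bytes) {
                segments[i].used += bytes;
                return true;
            }
        }
        // 2. Try to grow an existing segment (simulated GlobalReAlloc).
        for (std::size_t i = 0; i < segments.size(); ++i) {
            if (segments[i].used + bytes <= kHeapMaxReq) {
                segments[i].used += bytes;
                segments[i].size = chunked(segments[i].used);
                ++globalReAllocCalls;
                return true;
            }
        }
        // 3. Start a new segment (simulated GlobalAlloc, one more selector).
        Segment fresh = { chunked(bytes), bytes };
        segments.push_back(fresh);
        ++globalAllocCalls;
        return true;
    }

private:
    // Rounds up to a whole chunk, but never beyond kHeapMaxReq.
    static std::size_t chunked(std::size_t n)
    {
        std::size_t r = ((n + kChunk - 1) / kChunk) * kChunk;
        return r > kHeapMaxReq ? kHeapMaxReq : r;
    }
};
```

In this model, requests for 100, 200, and 5000 bytes end up in a single segment after one simulated GlobalAlloc and one simulated GlobalReAlloc. A further request for 65,000 bytes forces a second segment, and a request for 70,000 bytes fails.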

Figures 1 and 2 illustrate the behavior of _fmalloc.

Figure 1. _fmalloc vs. GlobalAlloc

Figure 1 illustrates how _fmalloc satisfies several memory requests with one segment
consuming only one selector when the requested blocks are less than _HEAP_MAXREQ.
Each call to GlobalAlloc, on the other hand, uses up a selector.

Figure 2. _fmalloc Subsegment Allocation

Figure 2 shows how _fmalloc allocates a new segment when it cannot satisfy a request
with the old segment because the requested block would cause the segment to grow larger
than _HEAP_MAXREQ. Notice how neither GlobalAlloc nor _fmalloc allocates exactly
the number of bytes that are requested. Both functions have some overhead. The current
version of _fmalloc requires 22 bytes of overhead on top of the overhead of GlobalAlloc.
It also defines the smallest segment size to be 26. Future versions of _fmalloc may
require more or less overhead. _fmalloc also returns a pointer that is guaranteed to be
aligned on double-word boundaries.

_fmalloc attempts to be more efficient than GlobalAlloc by allocating memory from
Windows in chunks, hoping to satisfy several memory requests while using only one
selector and without needing to call GlobalAlloc or GlobalReAlloc again. In some cases,
this can lead to faster speeds.

The amount of memory that _fmalloc initially allocates to a new segment is rounded up
to the next 4K boundary. If 4070 bytes (4096 - 26) or fewer are requested, 4K is
allocated. If 4071 bytes (4096 - 26 + 1) are requested, 8K is allocated. This behavior
differs from the explanation in the Microsoft C/C++ version 7.0 Run-Time Library
Reference, which states that the initial requested size for a segment is just enough to
satisfy the allocation request.
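The rounding rule can be expressed as a short function. This is a sketch inferred from the 4096 - 26 arithmetic in this article, not code taken from the run-time library, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSegmentOverhead = 26;  // from the 4096 - 26 figure above
const std::size_t kBoundary = 4096;       // 4K

// Size of the segment that would be requested from Windows for the
// first allocation of 'requested' bytes, under the rule described above.
std::size_t initialSegmentSize(std::size_t requested)
{
    assert(requested > 0);
    std::size_t needed = requested + kSegmentOverhead;
    return ((needed + kBoundary - 1) / kBoundary) * kBoundary;
}
```

Under this rule, requests for 1 byte and for 4070 bytes both yield a 4096-byte segment, and a request for 4071 bytes yields 8192 bytes.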

When _fmalloc can satisfy a request by growing the segment, it calls GlobalReAlloc.
The global variable _amblksiz determines the amount by which the segment is grown.
_fmalloc will grow the segment in enough multiples of _amblksiz to satisfy the request.
The default value of _amblksiz is 4K for Windows, instead of the 8K used by MS-
DOS®. You can set _amblksiz to any value, but _fmalloc rounds it up to the nearest
whole power of two before it is used.
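The growth rule can be sketched in the same way. The code below is an assumption-laden model rather than the library source: the function names are hypothetical, and the step size is limited to 32K so that the loop cannot overflow.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the nearest whole power of two (n itself if it is one).
std::size_t roundUpToPowerOfTwo(std::size_t n)
{
    assert(n > 0 && n <= 0x8000);  // keep the model within 32K
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Number of bytes by which a segment would grow to cover 'shortfall'
// bytes, in whole multiples of the rounded _amblksiz value.
std::size_t growthFor(std::size_t shortfall, std::size_t amblksiz)
{
    assert(shortfall > 0);
    std::size_t step = roundUpToPowerOfTwo(amblksiz);
    return ((shortfall + step - 1) / step) * step;
}
```

With the default 4K setting, a 100-byte shortfall grows the segment by 4096 bytes and a 5000-byte shortfall grows it by 8192 bytes. A setting of 3000 is first rounded up to 4096, so it gives the same results.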

The sample application, Smart Alloc (SMART.EXE), can be used to explore the behavior
of _fmalloc in detail. Examine Smart Alloc's Help file for more information on using it.
Try allocating 1 byte of memory. _fmalloc calls GlobalAlloc with a size of 4K. Try
allocating 4070 bytes and 4071 bytes. Smart Alloc also lets you experiment with different
values of _amblksiz.

The frugal behavior of _fmalloc makes it suited to allocating bunches of small memory
objects. However, as will be shown in the next section, _fmalloc is not suitable for all
uses.

_ffree: Why _fmalloc Is Not Perfect


While the subsegment allocation scheme employed by _fmalloc is very good, the
behavior of _ffree is not as straightforward as GlobalFree. Knowledge of this behavior is
very important to avoid wasting large amounts of memory. The following example
illustrates the behavior of _ffree.

Note In Figures 3 through 7, it is possible for Selector 3 to have a lower or higher value
than Selector 1. The number indicates in what order the selectors were allocated.

Figure 3. Freed Segments Are Not GlobalFree'd

In Figure 3, the last block allocated has been freed. However, its memory is not returned
to the system.

Figure 4. Freed Blocks Are Not Reallocated

In Figure 4, the first and fourth blocks of memory are freed in addition to Block 5. Again,
no memory is returned to Windows with a GlobalFree. If _fmalloc returned the memory
for the first block to Windows, the pointer to Block 2 would have to change. It would be
possible for _fmalloc to GlobalReAlloc the memory associated with Selector 2 and
GlobalFree the memory associated with Selector 3. This can be accomplished with the
C/C++ run-time library, as will be explained in conjunction with Figure 7.

Figure 5. Figure 4 Followed with an _fmalloc(x/2)

In Figure 5, a new block has been allocated. Because this block is half the size of the
previous first block, _fmalloc places it in this empty block of Selector 1.

Figure 6. Figure 5 Followed with an _fmalloc(2 * x)

In Figure 6, another block of memory is allocated. This time it is twice the size of the
previous blocks of memory. Because this block is too large to fit into the heap associated
with Selector 2, the memory associated with Selector 3 is reallocated to hold it.

Figure 7. Figure 4 Followed by _heapmin

If memory is set up as in Figure 4, calling _heapmin will leave memory in the state
shown by Figure 7. _heapmin performs the following actions to achieve this state:

• Memory associated with Selector 1 is GlobalReAlloc'ed to remove the padding.

• Selector 2's memory is GlobalReAlloc'ed to remove the freed block and padding.

• GlobalFree releases Selector 3 and all of its memory.

To recreate the previous examples with Smart Alloc, use 22,000 bytes for the size x. It is
important to note that Smart Alloc sorts allocated memory by handle (that is, selector)
and not the order in which it was allocated.

In addition to _heapmin, the C compiler run-time library contains many other functions
to help manage the heap created by _fmalloc. Descriptions of these functions are in the
Microsoft C/C++ version 7.0 Run-Time Library Reference. Like _heapmin, most of these
functions are unique to C/C++ version 7.0 and are not ANSI C compatible. Below is a list
of these unique functions:

Reallocation functions:

    _fexpand     Expands or shrinks a block of memory without moving its location.
    _frealloc    Reallocates a block to a new size. Might move the block of memory.
    _heapadd     Adds memory to a heap.
    _heapmin     Releases unused memory in a heap.

Information functions:

    _fmsize      Returns size of an allocated block.
    _fheapwalk   Returns information about each entry in a heap.

Debugging functions:

    _fheapset    Fills free heap entries with a specified value.

All programmers who decide to use _fmalloc must be aware that _ffree does not return
memory to the operating system. For example, an application might read in an entire text
file and display it on the screen. Let's say that the application keeps a linked list of lines
and mallocs the memory for each line in the file. If the user selects a large file of about 1
megabyte (MB), the application allocates at least 1 MB of memory. The user then closes
the file. The application faithfully calls _ffree for each line in the file. Even though the
application does not need the memory, it is still hogging it from the system. This
application needs to call _heapmin or one of the other heap management functions.
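
A minimal sketch of the text-file scenario follows. It uses the standard malloc and free (which, as this article explains, map to _fmalloc and _ffree in large model) and calls _heapmin after the last line is freed. The Line structure and the push_line and close_file names are hypothetical, and the _MSC_VER guard is an assumption added only so that the sketch also compiles with compilers that lack _heapmin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#ifdef _MSC_VER
#include <malloc.h>   // declares _heapmin on Microsoft compilers
#endif

// One line of the text file, kept in a singly linked list.
struct Line {
    Line* next;
    char* text;
};

// Prepends a copy of `text` to the list. Returns the new head, or the
// old head unchanged if an allocation fails.
Line* push_line(Line* head, const char* text)
{
    assert(text != NULL);
    std::size_t length = std::strlen(text) + 1;
    Line* node = static_cast<Line*>(std::malloc(sizeof(Line)));
    if (node == NULL)
        return head;
    node->text = static_cast<char*>(std::malloc(length));
    if (node->text == NULL) {
        std::free(node);
        return head;
    }
    std::memcpy(node->text, text, length);
    node->next = head;
    return node;
}

// Frees every line, then asks the run-time library to release unused
// heap memory. Returns the number of lines that were freed.
std::size_t close_file(Line* head)
{
    std::size_t freed = 0;
    while (head != NULL) {
        Line* next = head->next;
        std::free(head->text);
        std::free(head);
        head = next;
        ++freed;
    }
#ifdef _MSC_VER
    _heapmin();   // without this call, the freed memory stays in the heap
#endif
    return freed;
}
```

The important design point is the placement of the _heapmin call: once, after the whole list is gone, rather than after each individual free.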

Why doesn't _ffree call GlobalFree? There are two main reasons:

• Speed. It is faster to keep the memory allocated than to repeatedly call
GlobalAlloc, GlobalReAlloc, and GlobalFree. _fmalloc calls can be extremely
fast when _fmalloc only has to return a pointer to an existing block of memory.

• Pointers. _fmalloc returns pointers to an offset inside a segment. _fmalloc would
have to move the memory pointed to by these pointers if it were to actually call
GlobalFree to free the memory. It is not possible for _fmalloc or _ffree to update
all the pointers into its heap.

Note All memory (freed and unfreed) is returned to the system as part of the Windows
kernel's normal clean-up process when the application exits.

DLLs: Why _fmalloc May Not Do What You Want


As mentioned above, when _fmalloc must allocate a segment, it makes a call to
GlobalAlloc. For applications, it allocates the segment as GMEM_MOVEABLE. For
DLLs, _fmalloc calls GlobalAlloc with GMEM_SHARE | GMEM_MOVEABLE flags.
_fmalloc maintains only one heap for a DLL, which is shared by all applications that use
the DLL. In most cases, programmers do not really want the memory allocated from a
DLL marked as GMEM_SHARE.

The GMEM_SHARE flag tells Windows that this memory is going to be shared by
several programs. The most immediate consequence of using GMEM_SHARE in a DLL
is that the memory will not be released until the DLL is unloaded from memory. The DLL
is not always unloaded from memory when the application that loads it exits. Because
multiple applications or instances of an application are using a DLL, the DLL and its
memory will not be unloaded until all applications using the DLL have exited.

The following are the possible times when memory is freed:

• If an application allocates memory and does not free it, the memory is freed by
Windows when the application exits.

• If an application calls a DLL that allocates memory without the GMEM_SHARE
flag (via GlobalAlloc), the memory is owned by the application and will be freed
when the application exits.

• If an application calls a DLL that allocates memory with the GMEM_SHARE
flag, the memory will be owned by the DLL and not by the application. The
memory will be released when the DLL is unloaded and not when the application
exits.

If a programmer is not careful, the use of _fmalloc in a DLL can lead to large pools of
allocated but unneeded memory. It is usually best to use the GMEM_SHARE flag only
when memory must be shared or must exist for the lifetime of the DLL. This means that,
in many cases, GlobalAlloc should be used instead of _fmalloc in a DLL.

Remember, calling _ffree does not generate a call to GlobalFree. Even if the DLL is
freeing memory before it returns to the application, memory can be wasted. Refer to the
previous section on _ffree for more information.

The situations listed above can be demonstrated by using the Smart Alloc sample
application. Perform the following steps:

1. Run Heap Walker (HEAPWALK.EXE).

2. Run an instance of Smart Alloc (SMART.EXE).

3. GlobalAlloc 1000 bytes of movable memory from a DLL. (See the Smart Alloc
help file for details on how to do this.)

4. Walk the global heap using Heap Walker and examine the listing. The above
memory should be owned by Smart Alloc. It will differ slightly in size due to the
overhead and padding performed by GlobalAlloc.

5. GlobalAlloc 2000 bytes of shared memory from a DLL.

6. Walk the global heap using Heap Walker and examine the listing. The memory
allocated in step 5 should be owned by SMARTDLL.DLL. It will differ slightly in
size due to the overhead and padding performed by GlobalAlloc.

7. Run a second instance of Smart Alloc. Do not exit the first instance.

8. GlobalAlloc 3000 bytes of movable memory from a DLL using the second
instance of Smart Alloc.

9. GlobalAlloc 4000 bytes of shared memory from a DLL using the second instance
of Smart Alloc.

10. Walk the global heap using Heap Walker and examine the listing. The memory
allocated in steps 8 and 9 should be owned and allocated like the memory
allocated by the first instance in steps 3 and 5. In fact, the memory allocated in
step 9 will be allocated in the same segment as the memory allocated for the first
instance of Smart Alloc in step 5.

11. Exit the second instance of Smart Alloc.


12. Walk the global heap using Heap Walker and examine the listing. The 3000-byte
segment will have been discarded by Windows, but the 4000-byte segment owned
by SMARTDLL.DLL will still exist.

Figures 8 and 9 illustrate the above sequence. Figure 8 illustrates the state of memory
after executing steps 1 through 10 in the list above.

Figure 8. State of Memory After Step 10

Figure 9 illustrates what is freed after the second instance exits.

Figure 9. State of Memory After Closing Instance 2

Remember that _fmalloc allocates memory with the GMEM_SHARE option set.

Versatility: Why _fmalloc Is Not for Everything


While the subsegment allocation makes _fmalloc better for general use, it does not
provide the same kind of versatility that GlobalAlloc does. Below is a list of some of the
things that GlobalAlloc can do that _fmalloc cannot:

• Allocate memory with the GMEM_SHARE flag in an application.

• Allocate nonshared memory from a DLL.

• Allocate more than 64K. GlobalAlloc takes a DWORD, while _fmalloc takes a
size_t, which is an unsigned int. _halloc can also be used to allocate more than
64K in a block of memory.

• Allocate fixed memory, discardable memory, or memory with the other GMEM_*
attributes.
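
The 64K limit follows directly from the 16-bit size_t. The sketch below models it in portable C++; size16, as_16bit_size, needs_huge_allocation, and checked_size16 are hypothetical names used only for illustration, and unsigned short is assumed to stand in for the 16-bit unsigned int of the segmented compilers.

```cpp
#include <cassert>

// Models a 16-bit size_t: only the low 16 bits of a byte count survive.
typedef unsigned short size16;

size16 as_16bit_size(unsigned long bytes)
{
    return static_cast<size16>(bytes & 0xFFFFUL);
}

// True when a request cannot be expressed in a 16-bit size_t and so
// needs GlobalAlloc (which takes a DWORD) or _halloc instead.
bool needs_huge_allocation(unsigned long bytes)
{
    return bytes > 0xFFFFUL;
}

// Converts a byte count for an _fmalloc-style call, refusing requests
// that would be silently truncated.
size16 checked_size16(unsigned long bytes)
{
    assert(!needs_huge_allocation(bytes));
    return static_cast<size16>(bytes);
}
```

A 70,000-byte request, for instance, would be seen as 4464 bytes once squeezed into 16 bits, which is why such a request has to go through a different allocator.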

Although most programmers do not think of general protection faults as a positive thing,
they can be helpful in locating where a program writes outside of a memory block.
Because _fmalloc returns a pointer into a block of memory, it is possible to write past the
end of the block and not write past the end of the segment. In that case, no general
protection fault occurs and the overwrite goes undetected.

Conclusion
In most cases, _fmalloc and _ffree utilize system resources better than directly calling
GlobalAlloc and GlobalFree. The subsegment allocation scheme used by _fmalloc
reduces the number of selectors needed and also reduces the amount of system overhead.

While the subsegment allocation scheme is a boon to programmers for Windows,
_fmalloc is not without its limitations. The most important one to remember is that
memory is not returned to Windows when _ffree is called.

Also keep in mind that calling _fmalloc from a DLL allocates memory with the
GMEM_SHARE attribute set, which is usually not what is wanted because memory is
not freed until the DLL is unloaded.

Calling All Members: Member Functions as Callbacks
Dale Rogerson
Microsoft Developer Network Technology Group

Created: April 30, 1992

Click to open or copy the files in the CALLB sample application for this technical article.

Abstract
Microsoft® Windows™ version 3.1 has over 30 callback functions that applications can
use to enumerate objects, hook into the hardware, and perform a variety of other
activities. Due to the prevalence of callbacks, it is only natural to want to handle
callbacks with C++ member functions. However, callbacks are prototyped as C functions
and, therefore, do not associate data with operations on that data, making the handling of
callbacks less straightforward when you use C++ than it initially might appear.

This article explains why normal member functions cannot be used as callback functions,
gives several techniques for handling callbacks, and illustrates these techniques with code
fragments. The code fragments are included as the CALLB sample program on the
Microsoft Developer Network CD.

The article and source code are targeted toward Microsoft C/C++ version 7.0, but the
ideas presented apply to all C++ compilers, including those by Borland and Zortech.

The reader should be familiar with Windows callbacks and with C++. A bibliography is
supplied at the end of the article.

The Hidden Parameter, the this Pointer


Every callback function has its own prototype, which determines the parameters that the
Microsoft® Windows™ operating system passes to the function. For example,
EnumObjects is a Windows function that enumerates objects inside of Windows, such as
pens and brushes (these objects should not be confused with C++ objects).
EnumObjectsProc is the callback for EnumObjects and is prototyped this way:

int FAR PASCAL __export EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;

Note CALLBACK can be used in place of FAR PASCAL above.

When Windows calls the EnumObjectsProc function, it passes the two parameters—
lpLogObject and lpData—to the function.

The following code attempts to set up a member function as a callback. The code
compiles and links successfully but causes a protection fault at run time.

// See CProg1.cpp
// Run nmake -fmake1
class CProg1 {
private:
    int nCount ;
    // Incorrect callback declaration
    // Use a static or nonmember function.
    int FAR PASCAL EXPORT
        EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
public:
    // Constructor
    CProg1() : nCount(0) {};
    // Member function
    void enumIt(CDC& dc) ;
};

void CProg1::enumIt(CDC& dc)
{
    // Register callback
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;
}

// Callback handler
int FAR PASCAL EXPORT
CProg1::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
    // Process the callback.
    nCount++ ;
    MessageBeep(0) ;
    return 1 ;
}

If the Windows ::EnumObjects function is called instead of CDC::EnumObjects, as in
this line:

::EnumObjects(hdc, OBJ_BRUSH, (FARPROC)EnumObjectsProc, NULL) ;

the following error would occur:

cprog1.cpp(13) : error C2643: illegal cast from pointer to member

The reason for the above error and protection fault is that C++ member functions have a
hidden parameter known as the this pointer. C++ is able to associate a function with a
particular instance of an object by means of the this pointer. When C++ compiles the
following line:

dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;

it generates a call equivalent to:

CDC::EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL, (CDC *)&dc) ;

The last parameter, (CDC*) &dc, is the this pointer. Member functions access an object's
data through the this pointer. C++ handles the this pointer implicitly when accessing
member data. Inside a CProg1 member function, a line such as:

nCount = 0 ;

is actually compiled this way:

this->nCount = 0 ;

Windows passes only two parameters to EnumObjectsProc. It does not call functions
through objects and cannot send a this pointer to the callback function. However, as
compiled above, EnumObjectsProc expects three parameters instead of two. The result
is that a random value on the stack is used as the this pointer, causing a crash. To handle
EnumObjectsProc as a member function, the compiler must be told not to expect a this
pointer as the last parameter.
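
The hidden parameter can be made visible by writing the same operation twice, once as a member function and once as a nonmember function that takes the object pointer explicitly. The Counter class and the bump_explicit function below are hypothetical and exist only to illustrate the point; they are not part of the CALLB sample.

```cpp
#include <cassert>

class Counter {
public:
    Counter() : nCount(0) {}
    // Written as a member: the compiler supplies the this pointer.
    void bump() { nCount++; }               // same as this->nCount++
    int count() const { return nCount; }
    friend void bump_explicit(Counter* self);
private:
    int nCount;
};

// Equivalent nonmember form, with the hidden parameter spelled out.
void bump_explicit(Counter* self)
{
    assert(self != 0);
    self->nCount++;
}
```

A caller that knows nothing about objects, as Windows does not, can supply the arguments of the second form but has no way to supply the implicit argument of the first.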

Avoiding the this Pointer


Two function types in C++ do not have a this pointer:

• Nonmember functions

• Static member functions

Nonmember Functions

A nonmember function is not part of a C++ class and, therefore, does not have a this
pointer. A nonmember function does not have access to the private or protected members
of a class. However, a nonmember friend function can access the private and protected
class members with which the function is friendly. Using nonmember functions to handle
a callback is similar to handling a callback in C.

Static Member Functions

Static member functions are class member functions that do not receive this pointers. As
a result:

• An object does not have to be created before a static member function is called or
static member data is accessed.

• The class scope operator can access static members without an object, for
example:

CFoo::someFunc(someValue) ;

• A static member function cannot access a nonstatic member of its class without an
object instance. In other words, all object access must be explicit, such as:

object.nonStatFunc(someValue);
// NOT: nonStatFunc(someValue) ;

or through an object pointer, such as:

ptrObject->nonStatFunc(someValue);
// NOT: nonStatFunc(someValue) ;

The last point above is the kicker. Unlike a nonstatic member function, a static member
function is not bound to an object. A static function cannot implicitly access nonstatic
members.
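
These rules can be seen together in one small class. The names CFoo, someFunc, nonStatFunc, and ptrObject come from the fragments above; the function bodies, the value member, and the setThrough function are hypothetical additions made so that the sketch is complete.

```cpp
#include <cassert>

class CFoo {
public:
    CFoo() : value(0) {}
    // Static: callable as CFoo::someFunc(...) with no object at all.
    static int someFunc(int someValue) { return someValue * 2; }
    // Nonstatic: needs an object, reached through the this pointer.
    void nonStatFunc(int someValue) { value = someValue; }
    int get() const { return value; }
    // A static member must name the object it works on.
    static void setThrough(CFoo* ptrObject, int someValue)
    {
        assert(ptrObject != 0);
        ptrObject->nonStatFunc(someValue);   // OK: explicit object
        // nonStatFunc(someValue);           // error: no implicit object
    }
private:
    int value;
};
```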

For more information on static member functions, see the bibliography at the end of this
article.

Techniques for Handling Callbacks


The rest of this article demonstrates techniques for handling callbacks with static member
functions. The main concern is linking the callback routine with a particular object by
providing a pointer to the object—kind of a pseudo-this pointer. In other words, our goal
is to make a static function act like a nonstatic function. You can use the following
techniques to achieve this goal:

• Not providing a pointer

• Providing a pointer in a static member variable

• Passing a pointer in a parameter for application-supplied data

• Keeping a pointer in a collection indexed by a return value


The callback being handled will determine the technique to use. Many callbacks do not
have a parameter for application-supplied data, nor do they return a unique value.

Not Providing a Pointer

In some cases, object pointers are unnecessary because the callback does not need to
access member data. In these cases, the callback operates only on static data. The
following code fragment demonstrates the technique.

// See CProg3.cpp
// Run nmake -fmake3
class CProg3 {
private:
    static int statCount ;
    int nCount ;
    // Use a static member function for callbacks.
    static int FAR PASCAL EXPORT
        EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
public:
    // Constructor
    CProg3() : nCount(0) {};
    // Member function
    void enumIt(CDC& dc) ;
};

// Static data members must be defined.
int CProg3::statCount = 0 ;

// Enumerate the Windows DC objects.
void CProg3::enumIt(CDC& dc)
{
    // Register callback and start enumerating.
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;
}

// Callback handler
int FAR PASCAL EXPORT
CProg3::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
    // Process the callback.
    statCount++;
    // nCount++; This line would cause an error if not commented.
    MessageBeep(0) ;
    return 1 ;
}

Note that all objects of the CProg3 class above will share the statCount variable. Whether
this is good or bad depends on what the application is trying to accomplish. The
following code fragment illustrates how the outcome might not be what is expected.

void someFunc(CDC& aDC, CDC& bDC, CDC& cDC)
{
    // Assume that aDC has a = 3 objects.
    // Assume that bDC has b = 4 objects.
    // Assume that cDC has c = 7 objects.
    // Create some objects.
    CProg3 aObject;
    CProg3 bObject;
    CProg3 cObject;
    aObject.enumIt(aDC) ; // statCount = a = 3
    bObject.enumIt(bDC) ; // statCount = a + b = 7
    cObject.enumIt(cDC) ; // statCount = a + b + c = 14
}

There are several ways to avoid the sharing of data between instances of a class. The next
sections describe techniques that link the callback function to a particular object by
providing a pseudo-this pointer.
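
The sharing effect can be reproduced without Windows. In the sketch below, enumerate_items is a hypothetical stand-in for EnumObjects that simply calls the callback a given number of times, and Tally is a hypothetical class patterned on CProg3.

```cpp
#include <cassert>

// Hypothetical stand-in for EnumObjects: calls `proc` once per item.
typedef int (*EnumProc)(void* item, void* data);

void enumerate_items(int itemCount, EnumProc proc, void* data)
{
    assert(proc != 0);
    for (int i = 0; i < itemCount; ++i)
        if (proc(0, data) == 0)
            break;   // a zero return stops the enumeration
}

class Tally {
public:
    void enumIt(int itemCount) { enumerate_items(itemCount, callback, 0); }
    static int total() { return statCount; }
private:
    static int statCount;   // shared by every Tally object
    static int callback(void*, void*) { statCount++; return 1; }
};

int Tally::statCount = 0;   // static data members must be defined
```

Three separate Tally objects that enumerate 3, 4, and 7 items leave the shared count at 14, because no object has a counter of its own.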

Providing a Pointer in a Static Member Variable

The main reason to have a callback as a member function is for accessing class members
unique to a particular object (that is, nonstatic members). A callback member function
must be a static function and, therefore, can only access static members without using "."
or "->".

The next listing shows how to use a static member variable to pass an object's this
pointer to the callback. The callback can then use the pointer to access object members.
To simplify the code, the callback calls a helper function that performs all the work. The
helper function is nonstatic and can implicitly access member data through its this
pointer.

// See CProg5.cpp
// Run nmake -fmake1
class CProg5 {
private:
    int nCount ;
    // Use a static variable to pass the this pointer.
    static CProg5 * pseudoThis ;

    // Use a static member function for callbacks.
    static int FAR PASCAL EXPORT
        EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;

    // Use a nonstatic member function as a helper.
    int EnumObjectsHelper( LPSTR lpLogObject, LPSTR lpData);
public:
    CProg5() : nCount(0) {};
    void enumIt(CDC& dc) ;
};

// Static data members must be defined.
CProg5 * CProg5::pseudoThis = NULL;

// Enumerate the objects.
void CProg5::enumIt(CDC& dc)
{
    // Publish the object for the duration of the enumeration.
    pseudoThis = this ;
    // Register callback and start enumerating.
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, NULL) ;
    pseudoThis = NULL ;
}

// Callback handler
int FAR PASCAL EXPORT
CProg5::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
    if (pseudoThis != (CProg5 *)NULL)
        return pseudoThis->EnumObjectsHelper(lpLogObject, lpData) ;
    else
        return 0 ;
}

int CProg5::EnumObjectsHelper( LPSTR lpLogObject, LPSTR lpData)
{
    // Process the callback.
    nCount++;
    MessageBeep(0) ;
    return 1 ;
}

While the above technique works fine in many cases, the objects must coordinate the use
of the callback. For callbacks (such as EnumObjects) that do their work and then exit,
coordination is not much of a problem. For other callbacks, it may be. The techniques
described in the next two sections require less coordination but work only with certain
callbacks.

Passing a Pointer in a Parameter for Application-Supplied Data

A close examination of the EnumObjects function reveals that it has an extra 32-bit
parameter, lpData, for supplying data to the callback routine. This is a great place to pass
a pointer to an object. The following overworked sample demonstrates this technique.

// See CProg6.cpp
// Run nmake -fmake1
class CProg6 {
private:
    int nCount ;
    // Use a static member function for callbacks.
    static int FAR PASCAL EXPORT
        EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
    // Use a nonstatic member function as a helper.
    int EnumObjectsHelper( LPSTR lpLogObject) ;
public:
    CProg6() : nCount(0) {};
    void enumIt(CDC& dc) ;
};

// Enumerate the objects.
void CProg6::enumIt(CDC& dc)
{
    // Register callback and start enumerating.
    dc.EnumObjects(OBJ_BRUSH, EnumObjectsProc, (LPSTR)this) ;
}

// Callback handler
int FAR PASCAL EXPORT
CProg6::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
    CProg6 * pseudoThis = (CProg6 *)lpData ;
    if ( pseudoThis != (CProg6 *)NULL )
        return pseudoThis->EnumObjectsHelper(lpLogObject) ;
    else
        return 0 ;
}

// Callback helper function.
int CProg6::EnumObjectsHelper( LPSTR lpLogObject)
{
    // Process the callback.
    nCount++;
    MessageBeep(0) ;
    return 1;
}

This technique will, of course, only work with callbacks that take application-supplied
data. The following list shows those callbacks:

• EnumChildProc

• EnumChildWindows

• EnumFontFamProc

• EnumFontFamilies

• EnumFontsProc

• EnumMetaFileProc

• EnumObjectsProc

• EnumPropFixedProc

• EnumPropMovableProc

• EnumTaskWndProc

• EnumWindowsProc

• LineDDAProc
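
The same pattern can be tried outside Windows with any C-style enumerator that forwards an application-supplied pointer. In this sketch, enumerate_range is a hypothetical stand-in for such an API, and Summer is a hypothetical class; only the lpData and pseudoThis names are borrowed from the listing above.

```cpp
#include <cassert>

// Hypothetical stand-in for an enumerating API with a data parameter.
typedef int (*ItemProc)(int item, void* lpData);

void enumerate_range(int first, int last, ItemProc proc, void* lpData)
{
    assert(proc != 0);
    for (int item = first; item <= last; ++item)
        if (proc(item, lpData) == 0)
            break;   // a zero return stops the enumeration
}

class Summer {
public:
    Summer() : sum(0) {}
    void run(int first, int last)
    {
        enumerate_range(first, last, thunk, this);   // pass this as data
    }
    int result() const { return sum; }
private:
    int sum;
    // Static thunk: recovers the object, then forwards to the helper.
    static int thunk(int item, void* lpData)
    {
        Summer* pseudoThis = static_cast<Summer*>(lpData);
        return pseudoThis != 0 ? pseudoThis->helper(item) : 0;
    }
    // Nonstatic helper: has a real this pointer.
    int helper(int item) { sum += item; return 1; }
};
```

Because each call carries its own object pointer, two Summer objects can enumerate independently without coordinating through shared static data.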

Keeping a Pointer in a Collection Indexed by a Return Value

Another technique for linking an object pointer with a callback uses the return value of
the function that sets up the callback. This return value is used as an index into a
collection of object pointers.

In the following example, SetTimer sets up a TimerProc callback and returns a unique
timer ID. The timer ID is passed to TimerProc each time the function is called. The
CTimer class uses the timer ID to find the object pointer in a CMapWordToPtr
collection. The CTimer class is an abstract class designed to be inherited by other classes.

// See CTimer.h
// Run nmake -ftmake

// Declaration
class CTimer
{
private:
    UINT id ;
    static CMapWordToPtr timerList ;
    static void stopTimer(int id) ;
    static void FAR PASCAL EXPORT
        timerProc(HWND hwnd, UINT wMsg, int timerId, DWORD dwTime);

protected:
    virtual void timer(DWORD dwTime) = 0 ;

public:
    // Constructor
    CTimer() : id(NULL) {};
    // Destructor
    ~CTimer() {stop();};
    // Use
    BOOL start(UINT msec) ;
    void stop() ;
};

// Define statics.
CMapWordToPtr CTimer::timerList ;

// Implementation

BOOL CTimer::start (UINT msecs)
{
    id = SetTimer(NULL,0,msecs,(FARPROC)timerProc);
    if (id != NULL)
    {
        timerList.SetAt(id, this);
        return TRUE ;
    }
    else
        return FALSE;
}

void CTimer::stop()
{
    if (id != NULL)
    {
        stopTimer(id) ;
        id = NULL ;
    }
}

// The static keyword appears only in the class declaration,
// not in the out-of-class definition.
void CTimer::stopTimer(int timerId)
{
    KillTimer(NULL,timerId) ;
    timerList.RemoveKey(timerId) ;
}

void FAR PASCAL EXPORT
CTimer::timerProc(HWND hwnd, UINT wMsg, int timerId, DWORD dwTime)
{
    CTimer * pseudoThis ;
    if ( timerList.Lookup(timerId, (void*&)pseudoThis))
    {
        if ( pseudoThis != (CTimer *)NULL)
            pseudoThis->timer(dwTime) ;
        else
            stopTimer(timerId) ;
    }
    else
        KillTimer(NULL,timerId) ;
}

// Inherit CTimer class in order to use it.
class CMyTimer : public CTimer {
protected:
    void timer(DWORD dwTimer) { MessageBeep(0); } ;
};
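
The lookup idea does not depend on MFC or on SetTimer. The following portable sketch substitutes std::map for CMapWordToPtr and a simple counter for the timer ID that SetTimer would return; TickTarget, dispatch, nextId, and registry are hypothetical names, and dispatch plays the role of timerProc.

```cpp
#include <cassert>
#include <map>

class TickTarget {
public:
    TickTarget() : id(0), ticks(0) {}
    ~TickTarget() { stop(); }
    // Obtains an ID and records which object it belongs to.
    void start() { stop(); id = nextId++; registry[id] = this; }
    void stop() { if (id != 0) { registry.erase(id); id = 0; } }
    unsigned timerId() const { return id; }
    int tickCount() const { return ticks; }
    // Static callback: the ID is the only link back to the object.
    static bool dispatch(unsigned timerId)
    {
        std::map<unsigned, TickTarget*>::iterator it = registry.find(timerId);
        if (it == registry.end())
            return false;            // unknown ID: nothing to call
        assert(it->second != 0);
        it->second->ticks++;
        return true;
    }
private:
    unsigned id;
    int ticks;
    static unsigned nextId;                            // mock ID source
    static std::map<unsigned, TickTarget*> registry;   // ID -> object
};

unsigned TickTarget::nextId = 1;
std::map<unsigned, TickTarget*> TickTarget::registry;
```

As in CTimer, removing the entry when the object stops or is destroyed matters: otherwise a late callback would find a stale pointer in the collection.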

Conclusion
Static member functions are used in C++ to handle callbacks because they do not have
this pointers, and Windows callbacks are not designed to accept this pointers. Because it
is often desirable for the callback to have access to an object anyway, this article has
suggested four ways of handling the callback, three of which provide the static member
function with a pseudo-this pointer.

Bibliography
For more information on C++ topics such as the this pointer, friend functions, or static
functions, see:

• Stroustrup, Bjarne. The C++ Programming Language. 2d ed. Addison-Wesley, 1991.

• Ellis, Margaret A., and Bjarne Stroustrup. The Annotated C++ Reference Manual.
Addison-Wesley, 1990.

• Lippman, Stanley B. C++ Primer. 2d ed. Addison-Wesley, 1991.

• Microsoft C/C++ version 7.0 C++ Language Reference. Microsoft Corporation, 1991.

• Microsoft C/C++ version 7.0 C++ Class Libraries User's Guide. Microsoft
Corporation, 1991.

For information on callbacks, see:

• Microsoft Windows version 3.1 Software Development Kit (SDK) Programmer's
Reference, Volume 1: Overview. Microsoft Corporation, 1987-1992.

• Microsoft Windows version 3.1 SDK Programmer's Reference, Volume 2:
Functions. Microsoft Corporation, 1987-1992.

• Microsoft Windows version 3.1 SDK Guide to Programming. Microsoft
Corporation, 1987-1992.

• Petzold, Charles. Programming Windows. 2d ed. Microsoft Press, 1990.

• Norton, Peter, and Paul Yao. Peter Norton's Windows 3.0 Power Programming
Techniques. Bantam Computer Books, 1990.

The C/C++ Compiler Learns New Tricks


Dale Rogerson
Microsoft Developer Network Technology Group

Created: August 28, 1992


Revised: January 27, 1993
The section on simplified building was removed. (This method links all programs with
the /NOI option enabled, which causes problems.)

Click to open or copy the files in the Back sample application for this technical article.

Abstract
WinMain, GlobalAlloc, and mixed-model programming—these are just some of the
conventions C programmers had to accept when they started programming for the
Microsoft® Windows™ operating system. Microsoft C/C++ version 7.0 can now hide
these conventions so that programmers can use standard C practices; applications thus
become much easier to develop and port. This article provides an overview of
programming conventions that C/C++ programmers no longer need and a discussion of
the new programming practices in C/C++ version 7.0. A bibliography of suggested
reading material is included at the end of this article.

A sample application called Back (BACK.EXE) and its accompanying dynamic-link
library (DLL) called Trace (TRACE.DLL) demonstrate many of the ideas in this article.
See the "Notes on the Sample Application" section for more information about Back and
Trace.

Note: The information in this article is valid only for Microsoft Windows version 3.x
standard and enhanced modes.

Introduction
The Microsoft® C/C++ version 7.0 compiler and run-time libraries were designed for the
Microsoft Windows™ operating system. For this reason, programmers no longer have to
follow many of the conventions that differentiated Windows-based programs from MS-
DOS®–based programs. For example, C/C++ programmers can now use:

• Large-model programming instead of mixed-model programming.

• The main function instead of the WinMain function.

• _fmalloc instead of GlobalAlloc.

• GlobalAllocPtr instead of GlobalAlloc.

• Dynamic-link libraries (DLLs) with default LibMain and WEP.

The following sections discuss each of these topics in detail.

Large Model vs. Mixed Model


One of the first weird conventions that programmers moving to Windows face is mixed-
model programming. Mixed-model programming brings out the worst in segmented
processor architectures. Some pointers are near while others are far. Some variables
default to near while others default to far. Source code becomes a confused mass with
near and far casts strewn throughout. In Windows protected modes, large model is now
the model of choice.

Single Instances

The behavior of Microsoft C version 6.0 was one reason why programmers were
reluctant to use the large model. C version 6.0 built large-model applications with
multiple read/write data segments. Windows forces an application that uses multiple
read/write data segments to be single instance; therefore, applications built by C version
6.0 would run only single instance.

If you want to build a single-instance application, the Microsoft C/C++ compiler's large
model gives it to you for free. There is no need to check hPrevInstance—Windows does
all the work for you, including putting up an informative dialog box that tells the user that
only one instance can run.

Note If you are not using the Microsoft C/C++ compiler, you should check the
documentation for your C compiler to see which options will generate multiple read/write
data segments.

Multiple Instances

It is possible to get multiple instances with a large-model application. If you use
Borland® C++ or Microsoft C/C++ version 7.0, it is easy to get a single 64K read/write
data segment. For the Microsoft C/C++ compiler, the /Gx and /Gtnnn options will do the
trick; for the Borland C++ compiler, a single data segment is the default.

For more information, see the "Programming at Large" and "Allocating Memory the
Newfangled Way: The new Operator" technical articles on the Microsoft Developer
Network CD (Technical Articles, C/C++ Articles).

Performance

Many programmers are concerned about the amount of overhead in a large-model
application compared with a small-model or mixed-model application. Performance is
never free. Just using the mixed model instead of the large model never makes an
application significantly faster. The best method is to use a profiler to determine which
code gets executed the most, and optimize that code.

It is preferable to optimize code using portable techniques. If you spend a week making
functions near and using other optimizations specific to a segmented architecture, the
optimizations (and your week of work) will be lost when you port the code to Windows
NT™. Instead, you could spend the week reworking the algorithms used in the code that
is executed the most. These improvements will impact performance more significantly
than which language, compiler, or compiler options you use.

However, if your marketing department changes specifications faster than an 80486 can
prefetch an instruction, algorithms often change overnight. In this situation, a
programmer must often use the compiler (sometimes blindly) to try to speed up code
instead of optimizing the code itself.

main vs. WinMain


The Microsoft C/C++ startup code first checks for a function labeled main in a
Windows-based program. If it cannot find main, it tries to locate a function called
WinMain. The gist of this wonderful information is that a Windows-based application
can use main instead of WinMain as its entry point, just like an MS-DOS C program.
One of the standard ways to declare main is:

void main(int argc, char *argv[], char **envp)
{
}

Why would a program want to use main? Possibly for portability or to use a common
source between Windows and MS-DOS or UNIX®. Using main also allows
programmers to build upon their MS-DOS knowledge for handling the command line and
the environment.

Not to be outdone by any old application, DLLs can also use main instead of LibMain as
an entry point. However, in C/C++ version 7.0, the Windows libraries include a default
LibMain, so most DLLs will not need a main or LibMain function. This is covered later
in the "Using DLLs" section.

The above information was found in plain and public display in the DETAILS.TXT file,
which is provided with the Microsoft C/C++ compiler. Those interested in reading code
should check out the SOURCE\STARTUP\WIN directory for the STUBMAIN.ASM and
CRT0.ASM files.

Getting to hInstance

The careful reader will be wondering where the program is going to get its instance
handle. Why, from _hInstance, of course! _hInstance is an undocumented feature of the
C/C++ startup code.

When Windows calls the startup code, the instance handle is passed to the startup code in
the DI register, as documented in the Microsoft Windows version 3.1 Software
Development Kit (SDK) Programmer's Reference, Volume 1: Overview, in Chapter 22.

The instance handle is then placed in a global variable called _hInstance. To access this
variable, you must declare it first:

extern const HINSTANCE _hInstance;

The startup code also includes the following global variables for the other parameters
normally passed to WinMain:

• _hPrevInstance

• _lpszCmdLine
• _cmdShow

You can access these variables by using the following declarations:

extern const HINSTANCE _hPrevInstance;
extern const LPSTR _lpszCmdLine;
extern const int _cmdShow;

The parameters passed to a DLL are different from parameters passed to an application.
The following global variables are defined in the startup code for a DLL:

• _hModule

• _lpszCmdLine

• _wDataSeg

• _wHeapSize

The following declarations will give you access to these variables:

extern const HINSTANCE _hModule ;
extern const LPSTR _lpszCmdLine ;
extern const WORD _wDataSeg ;
extern const WORD _wHeapSize ;

A quick look into the startup code uncovered the above information. The startup code is
included with Microsoft C/C++ version 7.0; look in the SOURCE\STARTUP directory.
For more information on the startup code and what it does, see "A Comprehensive
Examination of the Microsoft C Version 6.0 Startup Code" in the Microsoft Systems
Journal, Vol. 7, No. 1, on the Microsoft Developer Network CD. The article examines C
version 6.0 startup code for MS-DOS, but most of the information is also valid for
version 7.0. This article explains the work the startup code must perform and provides
background information for reading the source code.

Note The startup source code is subject to change between compiler releases. The
inclusion of specific startup variables or functions is not guaranteed in future releases.

_fmalloc vs. GlobalAlloc


The big problem with GlobalAlloc is that it consumes a selector for each call. A selector
is a limited resource in Windows version 3.x, so GlobalAlloc is inappropriate for
allocating small blocks of memory such as nodes in a linked list. The solution is to
implement a subsegment allocation scheme in which one segment is allocated with
GlobalAlloc and divided up into small blocks.

Fortunately, Microsoft C/C++ version 7.0 includes a subsegment allocation scheme called
_fmalloc. _fmalloc is the large-model or model-independent version of malloc. When
you compile with the large model, malloc is mapped to _fmalloc. In other memory
models, you must call _fmalloc explicitly.

_fmalloc manages its own heap on top of the Windows global heap. When _fmalloc is
called, it first checks whether it can satisfy the memory request by simply returning a
pointer to an unused block inside its heap. If it can't, _fmalloc takes one of the following
actions:

• If the 64K limit of the block is reached or no memory had been allocated,
_fmalloc allocates a segment with GlobalAlloc.

• If more room is needed, _fmalloc enlarges a segment with GlobalReAlloc.

You use _ffree to free the memory blocks allocated by _fmalloc. However, _ffree does
not call GlobalFree. Instead, _ffree marks a block as unused, and _fmalloc tries to
satisfy future requests for memory with these unused blocks by reusing them. The
_heapmin function releases unused blocks back to Windows.
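
The idea of subsegment allocation can be sketched in portable C. This is not the run-time library's implementation: malloc stands in for GlobalAlloc, the names sub_alloc and sub_segment_count are hypothetical, and the sketch neither grows a segment (the GlobalReAlloc step) nor reuses freed blocks. It only shows how many small requests can share one segment, so that the count of segments (and therefore selectors) stays low.

```c
#include <stddef.h>
#include <stdlib.h>

#define SEGMENT_SIZE 65536u          /* models one 64K segment */

typedef struct Segment {
    struct Segment *next;            /* list of segments obtained so far */
    size_t used;                     /* bytes already handed out */
    unsigned char data[SEGMENT_SIZE];
} Segment;

static Segment *segments = NULL;
static unsigned segment_count = 0;   /* models selectors consumed */

/* Hand out a small block, taking a new segment only when none has room. */
void *sub_alloc(size_t size)
{
    Segment *seg;
    void *block;

    size = (size + 7u) & ~(size_t)7u;         /* keep blocks 8-byte aligned */
    if (size == 0 || size > SEGMENT_SIZE)
        return NULL;

    for (seg = segments; seg != NULL; seg = seg->next)
        if (SEGMENT_SIZE - seg->used >= size)
            break;

    if (seg == NULL) {                        /* stands in for GlobalAlloc */
        seg = (Segment *)malloc(sizeof *seg);
        if (seg == NULL)
            return NULL;
        seg->used = 0;
        seg->next = segments;
        segments = seg;
        segment_count++;
    }

    block = seg->data + seg->used;
    seg->used += size;
    return block;
}

/* Number of segments taken from the underlying allocator so far. */
unsigned sub_segment_count(void)
{
    return segment_count;
}
```

Allocating a thousand 32-byte list nodes through sub_alloc touches the underlying allocator once, whereas a GlobalAlloc call per node would consume a thousand selectors.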

For more information on using malloc in a Windows-based program, see the "Allocating
Memory the Old-Fashioned Way: _fmalloc and Applications for Windows" technical
article on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles).

GlobalAllocPtr vs. GlobalAlloc


If you don't want to use _fmalloc, at least use GlobalAllocPtr instead of GlobalAlloc.
GlobalAllocPtr is a macro defined in WINDOWSX.H that allocates the memory, locks
the handle, and returns a pointer to the allocated memory. To free the memory, use
GlobalFreePtr. There is no need to retain memory handles or lock and unlock memory
blocks.

What makes this possible is the GlobalHandle function, which takes a pointer and
returns the handle to it. GlobalHandle removes the need for saving and tracking handles,
resulting in incredible savings in time, memory, and complexity.

Other convenient memory macros in WINDOWSX.H are:

• GlobalPtrHandle

• GlobalLockPtr

• GlobalUnlockPtr

• GlobalReAllocPtr

If these macros were C functions, they would be prototyped as follows:

void FAR * GlobalAllocPtr(UINT flags, DWORD size) ;
// Allocates and locks a block of size bytes with the
// flags set.

BOOL GlobalFreePtr(void FAR* lp) ;
// Unlocks and frees the block pointed to by lp;
// returns the result of GlobalFree, which is zero on success.

void FAR * GlobalReAllocPtr(void FAR* lp, DWORD size, UINT flags) ;
// Reallocates the block pointed to by lp to size bytes with
// the flags set.
// The return value is the pointer to the reallocated block.

HGLOBAL GlobalPtrHandle(void FAR* lp) ;
// Gets the global handle of the block pointed to by lp.

BOOL GlobalLockPtr(void FAR* lp) ;
// Locks the block lp points to.
// If successful, returns a nonzero value.

BOOL GlobalUnlockPtr(void FAR* lp) ;
// Unlocks the block lp points to.
// If successful, returns a nonzero value.

For the curious, here are the definitions of GlobalAllocPtr and GlobalFreePtr:

#define GlobalAllocPtr(flags, cb) \
    (GlobalLock(GlobalAlloc((flags), (cb))))

#define GlobalFreePtr(lp) \
    (GlobalUnlockPtr(lp),(BOOL)GlobalFree(GlobalPtrHandle(lp)))

Using DLLs
Microsoft C/C++ version 7.0 run-time libraries provide better support for building DLLs.
Two changes that simplify building DLLs are:

• A default LibMain function

• A default WEP function

Most of the information for this section can be found in the DETAILS.TXT file, which is
included with the Microsoft C/C++ compiler.

Note The library files that do not include the C run-time functions (for example,
xNOCRTDW.LIB, where x is the memory model) do not have a default LibMain or
WEP function. You must provide your own LibMain and WEP functions if you use
these libraries.

LibMain

Many DLLs are collections of functions that do not need to perform initialization and
therefore do nothing in the LibMain function. If a function does not do anything, it
would be nice if the developer did not have to worry about it. The C run-time libraries
now include a version of LIBENTRY.OBJ and a default LibMain function. So, if the
DLL links to the C run-time functions, it does not have to link to LIBENTRY.OBJ or
provide its own LibMain function.

The default LibMain function is very clever. It does nothing.

WEP

It is no longer necessary to include a dummy WEP function in your DLL code. The C
run-time libraries now include a default version of the WEP function. The default WEP
performs the following functions:

1. Calls the optional user termination function _WEP (see next section).

2. Performs C exit processing (calls _cexit).

3. Returns to Windows the value returned by _WEP.

Placing WEP in a fixed segment ensures that it will exist in memory in case of error. For
proper placement of WEP, include the following lines in the .DEF file:

SEGMENTS 'WEP_TEXT' FIXED PRELOAD

EXPORTS
WEP @1 RESIDENTNAME

The source code for the default WEP function is included with the Microsoft C/C++
version 7.0 compiler. Look in the SOURCE\STARTUP\WIN directory for a file called
WEP.ASM.

_WEP

To add your own processing to the default WEP function, add a _WEP function to your
DLL. (Note the leading underscore character.) Here is an example:

int FAR PASCAL _WEP(int nExitType);

// Put _WEP code into same fixed segment as the WEP function.
#pragma alloc_text(WEP_TEXT, _WEP)

int FAR PASCAL _WEP(int nExitType)
{
    //
    // Exit cleanup code goes here.
    //
    return nExitType ;
}

The _WEP function is optional; use this function for cleanup tasks that you want done
when the DLL is unloaded. If you do not provide a _WEP function, the default WEP
function calls the default _WEP function, which simply returns a one (1). To verify for
yourself, check the source in the STUBWEP.ASM file included with the Microsoft
C/C++ compiler in the SOURCE\STARTUP\WIN directory.

Avoid the following in a _WEP function:

• Do not use deep stacks (that is, do not use recursion or call a bunch of functions).

• Do not use operating system requests.

• Do not use file I/O.

• Do not call functions that are not in a FIXED segment.

Building DLLs becomes much easier with the default WEP and LibMain functions. It is
almost possible to cut functions from an application and simply recompile them to get a
DLL. Using the large model for both the DLL and the application simplifies this process.

Notes on the Sample Application


The sample application, Back, demonstrates some of the concepts presented in this
article. Back lists command-line options and environment variables. It can be built for
MS-DOS or for Windows. The MKWIN.BAT batch file builds the Windows version,
while the MKDOS.BAT batch file builds the MS-DOS version. The code for the sample
application is simple and straightforward.

To display the output, the MS-DOS version of Back uses printf and the Windows version
uses trace, which is a function exported from a DLL called TRACE.DLL. trace performs
printf-like printing to the debug monitor. It demonstrates how to export a CDECL
variable argument function from a DLL and shows how simple a DLL can be.
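
The source of trace is in TRACE.C; the sketch below is not that code. It only illustrates, in portable C, the general shape of a printf-like variable-argument function. The names trace_sketch and trace_last_message are hypothetical, and writing to stderr is an assumption that stands in for sending the text to the debug monitor.

```c
#include <stdarg.h>
#include <stdio.h>

static char trace_last[256];   /* last formatted message, kept for inspection */

/* printf-style formatting into a fixed buffer; returns the stored length. */
int trace_sketch(const char *format, ...)
{
    va_list args;
    int n;

    va_start(args, format);
    n = vsnprintf(trace_last, sizeof trace_last, format, args);
    va_end(args);

    if (n < 0)
        return -1;                            /* formatting error */
    if ((size_t)n >= sizeof trace_last)
        n = (int)sizeof trace_last - 1;       /* output was truncated */
    fputs(trace_last, stderr);                /* stand-in for the debug monitor */
    return n;
}

/* Returns the most recently formatted message. */
const char *trace_last_message(void)
{
    return trace_last;
}
```

A variable-argument function has to use the C calling convention because only the caller knows how many arguments were pushed and can clean up the stack, which is why such a function is exported as CDECL rather than PASCAL.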

To view the BACK.C and TRACE.C files, click the sample application button at the
beginning of this article.

Conclusions
Microsoft C/C++ version 7.0 introduces new programming practices that facilitate the
development of applications for Windows version 3.1 in protected mode. Programmers
can now:

• Use the large memory model.

Large-model programs are compatible with the protected modes of Windows
version 3.1 and can have multiple instances.

• Use _fmalloc.

With C/C++, _fmalloc, which is the large or model-independent version of
malloc, performs subsegment allocation and conserves selectors.

• Use the run-time version of LibMain.

DLLs no longer have to link to LIBENTRY.OBJ because the C/C++ run-time
libraries include a default LibMain.

• Use the run-time version of WEP.

You no longer need a dummy WEP function because the C/C++ run-time
libraries include a default WEP function. To add your own exit processing, use a
_WEP function.

Bibliography
Technical Articles

All of the articles below are available on the Microsoft Developer Network CD
(Technical Articles, C/C++ Articles):

• "Microsoft Windows and the C Compiler Options"

• "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for
Windows"

• "Allocating Memory the Newfangled Way: The new Operator"

• "Programming at Large"

Product Documentation

On the Microsoft Developer Network CD, you can find the following books under
C/C++ 7.0 in the Product Documentation section of the Source index:

• Programming Techniques

• Environment and Tools


See the following book under Windows 3.1 SDK in the Product Documentation section of
the Source index:

• Programmer's Reference, Volume 1: Overview

Other

• DETAILS.TXT included with Microsoft C/C++ version 7.0

How to Pass Parameters Between COBOL and C
Michael Hendrick
Systems Support Engineer, Languages

February 1992

Abstract
This article explains how Microsoft® COBOL programs can pass parameters to and
receive parameters from Microsoft C programs. It assumes you have a basic
understanding of the COBOL and C languages.

Microsoft COBOL supports calls to routines written in Microsoft C, FORTRAN, Pascal,
and Assembler. This article describes the necessary syntax for calling Microsoft C
routines and contains a series of examples demonstrating the interlanguage capabilities
between COBOL and C. The sample programs apply to the following Microsoft products:

• Microsoft COBOL Professional Development System (PDS) versions 4.0 and 4.5
for MS-DOS® and OS/2®

• Microsoft C Optimizing Compiler version 6.0 for MS-DOS and OS/2

Mixed-Language Programming with COBOL and C


The C Interface to COBOL

The C interface to COBOL utilizes the standard C extern statement. The following are the
recommended steps for using this statement to execute a mixed-language CALL from C:

1. In the C code, include an extern statement for each COBOL routine CALLed. The
extern statement should be at the beginning of the C program, before any CALLs
to the COBOL routine.
Note: When compiling, if the /Gc compiler directive is used (the /Gc option
causes all functions in the module to use the FORTRAN/Pascal naming and
CALLing conventions), then the cdecl keyword should be used when the COBOL
function is declared (because COBOL uses the C CALLing convention, not the
Pascal CALLing convention).

2. To pass an argument by reference, pass a pointer to the object (all parameters
must be passed by reference to COBOL).

C automatically translates array names into addresses. Therefore, arrays are
automatically passed by reference and don't need the & (address-of) operator.

3. Once a routine has been properly declared with an extern statement, CALL it just
as you would CALL a C function.

4. If passing structures between COBOL and C, compile the C routine with the /Zp1
compiler option to pack structure members.

5. Always compile the C module in large model.
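
The effect of packing in step 4 can be sketched as follows. A COBOL record has no padding between its fields, so the C structure that overlays it must not have any either. The example uses #pragma pack(push, 1), which current Microsoft, GCC, and Clang compilers accept and which is assumed here to be equivalent to compiling with /Zp1; the structure and function names are hypothetical.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding before 'amount'. */
struct DefaultRec {
    char code;
    long amount;
};

/* Packed layout: members sit on one-byte boundaries, as with /Zp1. */
#pragma pack(push, 1)
struct PackedRec {
    char code;
    long amount;
};
#pragma pack(pop)

/* Offset of 'amount' in the packed structure: always 1. */
size_t packed_amount_offset(void)
{
    return offsetof(struct PackedRec, amount);
}

/* Offset of 'amount' in the default structure: compiler dependent. */
size_t default_amount_offset(void)
{
    return offsetof(struct DefaultRec, amount);
}
```

If the two offsets differ, a record passed from COBOL would be read at the wrong byte positions by an unpacked C structure.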

C Arguments

The default for C is to pass all arrays by reference (near or far, depending on the memory
model) and all other data types by value. C uses far data pointers for compact, large, and
huge models, and near data pointers for small and medium models.

Passing C Arguments by Value

The C default is to pass everything except arrays by value.

Arrays can be passed by value only if they are declared as the only member of a structure.
The following example passes all 100 elements of x directly to the C function test():

struct x_struct {int x[100];} xs;
.
.
.
test(xs);
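
A slightly fuller sketch of the same technique shows the practical difference: a function that receives the structure works on a copy, while a function that receives the array name works on the caller's storage. The function names sum_copy and clear_by_reference are hypothetical.

```c
struct x_struct {
    int x[100];
};

/* Receives a copy of the whole structure, so the caller's array is untouched. */
int sum_copy(struct x_struct xs)
{
    int i, sum = 0;
    for (i = 0; i < 100; i++) {
        sum += xs.x[i];
        xs.x[i] = 0;          /* modifies only the local copy */
    }
    return sum;
}

/* Receives the array's address, so changes are visible to the caller. */
void clear_by_reference(int x[], int count)
{
    int i;
    for (i = 0; i < count; i++)
        x[i] = 0;
}
```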

Passing C Arguments by Reference (Near or Far)

In C, passing a pointer to an object is the equivalent of passing the object itself by
reference. Within the CALLed function, each reference to the parameter itself is prefixed
by an * (asterisk).

Note: To pass a pointer to an object, prefix the parameter in the CALL statement with
&. To receive a pointer to an object, prefix the parameter's declaration with *. In the latter
case, this may mean adding a second * to a parameter that already has an *. For example,
to receive a pointer by value, declare it as follows:

int *ptr;

But to receive the same pointer by reference, declare it as the following:

int **ptr;

The default for arrays is to pass by reference.
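
The difference between receiving a pointer by value (int *) and by reference (int **) can be sketched as follows. The names table, advance_copy, and advance_in_place are hypothetical.

```c
static int table[3] = { 10, 20, 30 };

/* Pointer received by value: only the local copy of ptr changes. */
void advance_copy(int *ptr)
{
    ptr++;
    (void)ptr;                /* the caller's pointer is unaffected */
}

/* Pointer received by reference: the caller's pointer itself is updated. */
void advance_in_place(int **ptr)
{
    (*ptr)++;
}

/* Returns a pointer to the first element of the sample table. */
int *table_start(void)
{
    return table;
}
```

A caller holding int *p passes p to advance_copy and &p to advance_in_place; only the second call moves p to the next element.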

Effect of C Memory Models on Size of Reference

Near reference is the default for passing pointers in small and medium model C. Far
reference is the default for the compact, large, and huge models.

Note All C programs that are linked with COBOL must be compiled with the large
memory model.

Restrictions on CALLs from COBOL

The COBOL to C interface does not support near heap in the C run time. This means you
should not use the function calls that access near heap in your C programs. This includes
the following functions:

• _nfree()

• _nheapchk()

• _nheapset()

• _nheapwalk()

• _nmalloc()

• _nmsize()

To work around this, compile and link with C as the initial program. After the main C
program begins, the COBOL routine can be CALLed. The COBOL code can then CALL
back and forth with C. Since the C support modules are not used, there are no special
restrictions on the near heap functions.

Special Note on C Strings

C stores strings as simple arrays of bytes (like COBOL) but also uses a null character
[ASCII NULL (0)] as the delimiter to show the end of the string. For example, consider
the string declared as follows:

char str[] = "String of text";

The string is stored in 15 bytes of memory as follows:

|S|t|r|i|n|g| |o|f| |t|e|x|t|\0|

When passing a string from COBOL to C, the string will normally not have a NULL
appended to the end. Because of this, none of the C routines that deal directly with a
string (printf, sprintf, scanf, and so on) can be used with these strings unless a NULL is
appended to the end.

A NULL can be put at the end of a COBOL string by using the following declaration:

01 CSTRING.
05 THE_STRING PIC X(10).
05 FILLER PIC X VALUE x"00".
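
When the COBOL declaration cannot be changed, the C side can do the conversion instead. The following is a minimal sketch, assuming the field is fixed-width and space-padded; the helper names field_to_cstring and cstring_to_field are hypothetical and are not part of either run-time library.

```c
#include <stddef.h>
#include <string.h>

/* Copy a fixed-width, space-padded field into a null-terminated buffer,
 * dropping trailing spaces. dst must hold at least width + 1 bytes. */
size_t field_to_cstring(char *dst, const char *field, size_t width)
{
    while (width > 0 && field[width - 1] == ' ')
        width--;
    memcpy(dst, field, width);
    dst[width] = '\0';
    return width;
}

/* Copy a C string into a fixed-width field, padding with spaces and
 * truncating if necessary. No null terminator is written. */
void cstring_to_field(char *field, size_t width, const char *src)
{
    size_t len = strlen(src);
    if (len > width)
        len = width;
    memcpy(field, src, len);
    memset(field + len, ' ', width - len);
}
```

With these helpers, a C routine can copy a PIC X(10) field into an 11-byte buffer before handing it to printf, and can write a result back without leaving a stray null character in the COBOL field.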

Compiling and LINKing

Several compile and link options need to be used when interfacing C and COBOL. The
standard C compile line is as follows:

CL /c /Aulf CProgName ;

Option   Description
/c       Compiles without linking (produces only an .OBJ file).
/Aulf    Sets up a customized large memory model.
  u      SS not equal to DS. DS is reloaded on function entry.
  l      Selects large memory model far (32-bit) code pointers.
  f      Selects large memory model far (32-bit) data pointers.

The standard LINK line for COBOL CALLing C is as follows:

For MS-DOS®

LINK CobProg CProg MFC6INTF C6DOSIF C6DOSLB,,,LCOBOL COBAPI LLIBCE/NOE/NOD;

For OS/2®

LINK CobProg CProg MFC6INTF C6OS2IF C6OS2LB,,,LCOBOL OS2 LLIBCEP/NOE/NOD;

The standard LINK line for C CALLing COBOL is as follows:

For MS-DOS

LINK CProg CobProg,,,LLIBCE LCOBOL COBAPI/NOE/NOD;

For OS/2

LINK CProg CobProg,,,LLIBCEP LCOBOL OS2/NOE/NOD;

Note that the order in which the libraries are specified in the LINK line is important.

Microsoft® COBOL versions 4.0 and 4.5 introduced the shared run-time system.
Although it is generally more useful to link your applications using the static run-time
system (LCOBOL.LIB), you may also choose to link the applications with the shared
run-time library (COBLIB.LIB) to take advantage of its more efficient methods of
utilizing memory. In order to do this and link your applications to Microsoft C, you must
SET the COBPOOL environment variable as referenced in the Microsoft COBOL
Operating Guide.

Common Pitfalls
Use the following checklist when you encounter problems with mixed-language
programming:

• Make certain the version numbers of the two languages are compatible. Microsoft
COBOL versions 4.0 and 4.5 are compatible with the C versions 5.1 and 6.x.

• Use the /NOD switch when LINKing to avoid duplicate definition errors. If
duplicate definition errors still occur, use the /NOE switch in addition to the
/NOD switch when LINKing.

• Watch for incompatible functions such as _nfree() and _nheapchk().

• Make certain the C program is compiled in the large memory model and the /Aulf
compile options are used.

• If passing structures (records) to and from COBOL, use the /Zp1 compile option.
(/Zp1 means that structure members will be packed on one-byte boundaries.)

• When COBOL is the main module and there are some C functions that are not
working correctly, make the C routine the main routine and then CALL the
COBOL routine. The COBOL routine can then in turn CALL back into the C
routines. When this method is used, the COBOL/C support modules do not have
to be used. This can correct some incompatibilities.

Batch Files
The following batch files can be helpful when using the sample programs below. The
CBC6.BAT file sets up your environment table, but treat it as a convenience rather than
a necessity: ideally, these settings are already present in your environment when you use
both languages together.

CBC6.BAT

REM THIS BATCH FILE SHOULD CONFIGURE THE ENVIRONMENT TABLE TO ENABLE
REM YOU TO COMPILE BOTH THE C AND COBOL APPLICATIONS UNDER MS-DOS
REM CORRECTLY.
REM
REM PLEASE LOOK CLOSELY AT THE ENVIRONMENT SETTINGS AND CHANGE THOSE
REM NECESSARY IN YOUR OWN AUTOEXEC.BAT FILE.
REM
REM NOTE: IF, AFTER INVOKING THIS BATCH FILE, YOU SEE THE MESSAGE
REM "OUT OF ENVIRONMENT", YOU WILL HAVE TO INCREASE THE AMOUNT OF
REM ENVIRONMENT TABLE SPACE. PLEASE SEE YOUR MS-DOS MANUAL UNDER THE
REM HEADING COMMAND.COM FOR INSTRUCTIONS ON HOW TO DO THIS.
REM
SET LIB=C:\COBOL\LIB;C:\C600\LIB
SET INCLUDE=C:\C600\INCLUDE;C:\COBOL\SOURCE
SET HELPFILES=C:\C600\HELP\*.HLP
SET INIT=C:\C600\INIT;C:\COBOL\INIT
PATH=C:\COBOL\BINB;C:\COBOL\BINR;C:\C600\BINB;C:\C600\BIN;C:\DOS

RUN.BAT

REM THIS BATCH FILE CAN BE USED TO COMPILE AND LINK BOTH THE C AND
REM COBOL APPLICATIONS FOR MS-DOS.
REM
REM THOSE PROGRAMS THAT REQUIRE A DIFFERENT METHOD OF COMPILING AND
REM LINKING WITHIN THE SCOPE OF THIS APPLICATION NOTE WILL BE NOTED.
REM
REM TO INVOKE THIS BATCH FILE, YOU MUST ENTER THE BATCH FILE NAME,
REM FOLLOWED BY THE C PROGRAM NAME (WITH NO EXTENSION), FOLLOWED BY
REM THE COBOL PROGRAM NAME. FOR EXAMPLE:
REM
REM RUN <C PROGRAM NAME> <COBOL PROGRAM NAME>
REM
cl /c /Aulf %1.c
COBOL %2;
LINK %2 %1 MFC6INTF C6DOSIF C6DOSLB,,,LCOBOL COBAPI LLIBCER /NOD/NOE;

RUN_C.BAT

REM THIS BATCH FILE CAN BE USED TO COMPILE AND LINK UNDER MS-DOS
REM ONLY WHEN THE SAMPLE C CODE IS CALLING A COBOL PROCEDURE.
REM
REM THOSE PROGRAMS THAT REQUIRE A DIFFERENT METHOD OF COMPILING AND
REM LINKING WITHIN THE SCOPE OF THIS APPLICATION NOTE WILL BE NOTED.
REM
REM TO INVOKE THIS BATCH FILE, YOU MUST ENTER THE BATCH FILE NAME,
REM FOLLOWED BY THE C PROGRAM NAME (WITH NO EXTENSION), FOLLOWED BY
REM THE COBOL PROGRAM NAME. FOR EXAMPLE:
REM
REM RUN_C <C PROGRAM NAME> <COBOL PROGRAM NAME>
REM
cl /c /Aulf %1.c
COBOL %2;
LINK %1 %2,,,LLIBCER LCOBOL COBAPI/NOD/NOE;

WINRUN.BAT

REM THIS BATCH FILE IS USED TO COMPILE AND LINK THE QUICKWIN
REM APPLICATION PROGRAM DEMONSTRATED IN THIS DOCUMENT. THIS IS A
REM SPECIALIZED BATCH FILE. IT HAS BEEN CREATED SPECIFICALLY FOR THE
REM SAMPLE PROGRAM PRESENTED.
REM TO CREATE A GENERIC BATCH FILE, CHANGE ALL OCCURRENCES OF CDLL AND
REM TEST TO %1 AND %2 RESPECTIVELY.
REM
cl /ML /Gs /c /Zi CDLL.C
LINK CDLL+LIBENTRY,CDLL.DLL,CDLL.MAP/MAP,LDLLCEW+LIBW/NOE/NOD,CDLL
/CO;
IMPLIB CDLL.LIB CDLL.DLL
COPY CDLL.DLL C:\
COBOL TEST TARGET(286);
LINK CBLWINC+TEST+ADIS+ADISINIT+ADISKEY,TEST.EXE,,LIBW+LLIBCEW+LCOBOL+
COBAPIDW+CDLL.LIB,TEST.DEF/NOE/NOD;

Sample Code
The following sample code demonstrates how to pass common numeric types to a C
routine by reference and by value.

COBNUMS.CBL

* Passing Common Numeric Types to C by Reference and by Value
working-storage section.
01 field1 pic 9(4) comp-5 value 123.
01 field2 pic 9(8) comp-5 value 123456.
01 field3 pic 9(4) comp-5 value 456.
01 field4 pic 9(8) comp-5 value 456789.
procedure division.
* Fields 1 and 2 (below) are passed BY REFERENCE. The keywords
* are omitted here since BY REFERENCE is the default method.
call "_CFUNC" using field1, field2,
by value field3,
by value field4.
display "Returned pic 9(4): " field1.
display "Returned pic 9(8): " field2.
stop run.

CFUNC.C

#include <stdio.h>
void CFunc(int *RefInt, long *RefLong, int ValInt, long ValLong)
{
    printf("By Reference: %i %li\r\n", *RefInt, *RefLong);
    printf("By Value : %i %li\r\n", ValInt, ValLong);

    *RefInt = 321;
    *RefLong = 987654;
}

OUTPUT

Returned PIC 9(4): 00321
Returned PIC 9(8): 000987654
By Reference: 123 123456
By Value : 456 456789

The following sample code demonstrates how to pass an alphanumeric string from C to
COBOL.

_COBPROG.CBL

program-id. "_cobprog".
data division.
linkage section.
01 field1 pic x(6).

procedure division using field1.
display "String from C: " field1.
stop run.

C.C

#include <stdio.h>
extern cdecl cobprog(char *Cptr);
char Cptr[] = "ABCDEF";
void main() {
cobprog(Cptr);
}

Output

String from C: ABCDEF

The following sample code demonstrates how to pass a record from COBOL to a C data
struct.

STRUCT.CBL

$set vsc2 rtncode-size(4)
* Passing a Record from COBOL to a C struct
data division.
working-storage section.
01 rec-1.
02 var1 pic X(8) value "HELLO".
02 var2 pic X(12) value "W O R LD".
02 varc2 pic 9(04) comp-5 value 2.
02 varc3 pic 9(04) comp-5 value 3.
02 varc4 pic 9(04) comp-5 value 4.
02 varc5 pic 9(04) comp-5 value 5.
02 varc1 pic 9(04) comp-5 value 1.

procedure division.
call "C_FUNCTION1" using by reference rec-1.
display "CBL varC--> " varC1.
display "CBL varC--> " varC2.
display "CBL varC--> " varC3.
display "CBL varC--> " varC4.
display "CBL varC--> " varC5.
display "CBL var1--> " var1.
display "CBL var2--> " var2.
stop run.

STRUCTC.C

#include <stdio.h>
struct struct1 {
unsigned char var1[8];
unsigned char var2[12];
unsigned int var3[5];
};

void function1(struct struct1 far *p1)
{
    int a;
    for (a=0; a<5; a++)
        printf("%i\n",p1->var3[a]);

    for (a=0; a<8; a++)
        printf("%c", p1->var1[a]);
    printf("\n");

    for (a=0; a<12; a++)
        printf("%c", p1->var2[a]);
    printf("\n");
}

OUTPUT

2
3
4
5
1
HELLO
W O R LD
CBL VARC--> 00001
CBL VARC--> 00002
CBL VARC--> 00003
CBL VARC--> 00004
CBL VARC--> 00005
CBL VAR1--> HELLO
CBL VAR2--> W O R LD

The following sample code demonstrates how to pass a struct from C to COBOL, where
it is received as a record.

_COBPROC.CBL

identification division.
environment division.
data division.
working-storage section.
01 Integer pic 9(4).
01 Long pic 9(8).
linkage section.
01 CobRec.
03 COBInt pic s9(4) comp-5 value zero.
03 COBLong pic s9(8) comp-5 value zero.
03 COBString pic x(21) value spaces.
procedure division using CobRec.
move COBInt to Integer.
move COBLong to Long.
display "Integer from C: " Integer.
display "Long integer from C: " Long.
display "String from C: " COBString.
exit program.

STRUCT2C.C

#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <string.h>
struct CobRec // defines data type CobRec
{
unsigned int varInt; // integer variable
unsigned long varLong; // long int
char szString[21]; // string variable
};

/* COBOL routines are cdecl; this means the name must be prefixed
* with '_'. Alternatively, you can manually reverse the
* parameters.
*/
extern far cdecl COBPROC(struct CobRec *cPtr);

main()
{
struct CobRec *cPtr; // declare pointer to struct

// get memory to hold struct
cPtr = (struct CobRec *) _fmalloc(sizeof(struct CobRec));
/* NOTE: COBOL will be the main program unless BP is nonzero.
* BP is zero until some local variables are allocated and used.
*
* In this example, we do use some local variables; therefore, this
* is taken care of already.
*/
printf("Positive Integers and String\n");
cPtr->varInt = 32767; // refer to member of struct and
cPtr->varLong = 60000; // assign values
strncpy(cPtr->szString,"This is a test string",sizeof(cPtr->szString)); // fills all 21 bytes; no room for a null

COBPROC ( cPtr); // CALL to COBOL procedure

printf("\n\n\n");

printf("Negative Integers and String\n");
cPtr->varInt = -32765;
cPtr->varLong = -987654;
strncpy(cPtr->szString,"Here's another string\n",sizeof(cPtr->szString)); // copies at most 21 bytes

COBPROC ( cPtr);
}

OUTPUT

Positive Integers and String
Integer from C: 2767
Long integer from C: 00060000
String from C: This is a test string

Negative Integers and String
Integer from C: 2765
Long integer from C: 00987654
String from C: Here's another string
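
The output shows 2767 rather than 32767 because the COBOL routine moves the binary value into a four-digit unsigned display field (pic 9(4)). The sketch below models that behavior in C, assuming the usual COBOL rule that such a MOVE drops the sign and keeps only the low-order decimal digits. The helper move_to_unsigned_pic is hypothetical and is not part of either run-time library.

```c
#include <stdlib.h>

/* Model a MOVE of a signed binary value to an unsigned PIC 9(digits) field:
 * the sign is dropped and high-order decimal digits are truncated. */
long move_to_unsigned_pic(long value, int digits)
{
    long limit = 1;
    while (digits-- > 0)
        limit *= 10;
    return labs(value) % limit;
}
```

Under that assumption, move_to_unsigned_pic(32767L, 4) gives 2767 and move_to_unsigned_pic(-32765L, 4) gives 2765, which matches the two Integer lines in the output. Declaring the working-storage fields as signed and at least five digits wide would presumably preserve the full values.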

The following sample code demonstrates how to pass an array of integers from COBOL
to C.

INTARRAY.CBL

* Passing an Array of Integers from COBOL to C
data division.
working-storage section.
01 t-count pic 99.
01 t-table.
05 the-table pic 9(4) comp-5 occurs 5 times.
procedure division.
perform varying t-count from 1 by 1 until t-count > 5
move t-count to the-table(t-count)
end-perform.
call "C_CProc" using t-table.
stop run.

CNTARRAY.C

#include <stdio.h>
void CProc(int IntTable[5]) {
    int count;

    for (count = 0; count < 5; count++)
        printf("Array [%i]: %i\r\n", count, IntTable[count]);
}

OUTPUT

Array [0]: 1
Array [1]: 2
Array [2]: 3
Array [3]: 4
Array [4]: 5

The following sample code demonstrates how to pass a two-dimensional array of long
integers from COBOL to C.

LINT.CBL

* Passing long integers from COBOL to C
$set bound
data division.
working-storage section.
01 I1 pic 9.
01 J1 pic 9.
01 t-table.
02 t-field occurs 2 times.
05 the-table pic 9(8) comp-5 occurs 3 times.

procedure division.
perform varying I1 from 1 by 1 until I1 > 2
perform varying J1 from 1 by 1 until J1 > 3
move J1 to the-table(I1, J1)
end-perform
end-perform.
call "_CProc" using t-table.
stop run.

LINTC.C

#include <stdio.h>
void CProc(long IntTable[2][3]) {
    int i, j;

    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            printf("Array [%i,%i]: %ld\r\n", i, j, IntTable[i][j]);
}

OUTPUT

Array [0,0]: 1
Array [0,1]: 2
Array [0,2]: 3
Array [1,0]: 1
Array [1,1]: 2
Array [1,2]: 3
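
The example works because the nested OCCURS table and the C array long IntTable[2][3] occupy memory in the same order: the last subscript varies fastest, so element (i, j) of the C array sits at element offset i * 3 + j. The sketch below checks that layout in C alone; fill_table and flat_element are hypothetical names, and the fill pattern mirrors the COBOL loops.

```c
#include <stddef.h>
#include <string.h>

#define ROWS 2
#define COLS 3

/* Fill the table the way the COBOL loops do: element (i, j) gets j + 1. */
void fill_table(long table[ROWS][COLS])
{
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            table[i][j] = j + 1;
}

/* Read element (i, j) by computing its byte offset in the raw storage. */
long flat_element(const void *storage, int i, int j)
{
    long value;
    size_t offset = (size_t)(i * COLS + j) * sizeof(long);
    memcpy(&value, (const unsigned char *)storage + offset, sizeof value);
    return value;
}
```

Note that the COBOL subscripts start at 1 while the C subscripts start at 0, so the-table(2, 3) corresponds to IntTable[1][2].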

The following sample code demonstrates how to pass a two-dimensional array of records
from C to COBOL

_COBPROC.CBL

program-id. "_CobProc".
data division.
working-storage section.
01 I1 pic 9.
01 J1 pic 9.
linkage section.
01 the-table.
02 t-table occurs 2 times.
05 t-field occurs 3 times.
10 field1 pic 9(4) comp-5.
10 field2 pic x(6).

procedure division using the-table.
perform varying I1 from 1 by 1 until I1 > 2
perform varying J1 from 1 by 1 until J1 > 3
display "table[" I1 "][" J1 "]: " field1(I1, J1)
display " " field2(I1, J1)
end-perform
end-perform.
stop run.

2DRECS.C

#include <stdio.h>
#include <string.h>

struct TableStruc {                     /* define structure */
    int TheInt;
    char String[6];
} TheTable[2][3];

extern void far cdecl CobProc(struct TableStruc TheTable[2][3]);

void main() {
    int i, j;
    char buf[8];                        /* "[i][j]" plus terminating null */

    for (i = 0; i < 2; i++)             /* initialize structure */
        for (j = 0; j < 3; j++) {
            TheTable[i][j].TheInt = j;
            sprintf(buf, "[%1i][%1i]", i, j);
            /* copy 6 characters, no null, to match pic x(6) */
            memcpy(TheTable[i][j].String, buf, 6);
        }

    CobProc(TheTable);                  /* CALL COBOL routine */
}

OUTPUT

table[1][1]: 00000
[0][0]
table[1][2]: 00001
[0][1]
table[1][3]: 00002
[0][2]
table[2][1]: 00000
[1][0]
table[2][2]: 00001
[1][1]
table[2][3]: 00002
[1][2]

The following sample code demonstrates how to pass integers by reference from COBOL
to C.

COBINT.CBL

* Passing integers by reference from COBOL to C
working-storage section.
01 passvar1 pic 9(4) comp-5 value 16384.
01 passvar2 pic 9(4) comp-5 value 33.
procedure division.
display "Before the call to the C swapping function...".
display "Passvar1 is equal to: " passvar1.
display "Passvar2 is equal to: " passvar2.

call "_SwapFunc" using by reference passvar1
by reference passvar2.

display "After the call to the C swapping function...".
display "Passvar1 is equal to: " passvar1.
display "Passvar2 is equal to: " passvar2.
stop run.

CINT.C

/* Manipulates integers passed from a COBOL program */
#include <stdio.h>

void SwapFunc(int *var1, int *var2)
{
    int tmp;    /* Temporary value for use in swap */

    tmp = *var1;
    *var1 = *var2;
    *var2 = tmp;
    return;
}

OUTPUT

Before the call to the C swapping function...
PassVar1 is equal to: 16384
PassVar2 is equal to: 00033
After the call to the C swapping function...
PassVar1 is equal to: 00033
PassVar2 is equal to: 16384

The following sample code demonstrates how to pass an integer from COBOL to C.

CBLINT.CBL

working-storage section.
01 pass-var pic 9(4) comp-5 value 3.

procedure division.
call "_Circum" using by value pass-var.
display "Radius of circle: " pass-var.
display "Circumference of circle: " return-code.
stop run.

C.C

#include <stdio.h>

int Circum(int Radius) {
    float cir;
    cir = 2 * 3.14159 * Radius;
    return((int) cir);
}

OUTPUT

Radius of circle: 00003
Circumference of circle: +0018

The following sample code demonstrates passing a long integer from COBOL to C.

LINT.CBL

$set rtncode-size(2)
working-storage section.
01 pass-var pic 9(4) comp-5.

procedure division.
display "Radius of circle?".
accept pass-var.
call "_Area" using by value pass-var.
display "Area of circle: " return-code.
stop run.

LINTC.C

#include <stdio.h>
long Area(int Radius) {
    float cir;
    cir = 3.14159 * Radius * Radius;
    return((long) cir);
}

OUTPUT

Radius of circle?
1
Area of circle: +0003

The following sample code demonstrates how to pass a string from COBOL to C.

COBSTR.CBL

* Passing a string from COBOL to C


identification division.
program-id. cobstr.
data division.
working-storage section.
01 passvar pic x(15) value "Replace this".
procedure division.
display "This is what is passed: " passvar.
call "_Funct" using passvar.
display "This is what comes back: " passvar.
stop run.

CSTR.C

#include <ctype.h>

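/* Converts the string passed from COBOL to uppercase, in place. */
/* Note: the loop stops at the first null byte. COBOL pads pic x */
/* fields with spaces, not nulls. */
void Funct(char *Rvalue)
{
    char *cp;
    cp = Rvalue;
    while (*cp != '\0') {
        *cp = toupper(*cp);
        ++cp;
    }
    return;
}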

OUTPUT
This is what is passed: Replace this
This is what comes back: REPLACE THIS

The following samples demonstrate how to call a C 6.x routine from a COBOL 4.5
program, where the C function, in turn, spawns another COBOL 4.5 executable.

Note: The COBOL program titled COB2.CBL must be compiled and linked as a stand-
alone executable module. Use the following lines to compile and link this program:

COBOL COB2;
LINK COB2,,,LCOBOL COBAPI/NOE/NOD;

MAIN.CBL

* How to call a C function that executes another COBOL
* program.
program-id. main.
working-storage section.
01 commandL.
05 filler pic x(01) value 'S'.
05 cmdlin pic x(124) value "COB2.EXE".
procedure division.
display "In COBOL program 1".
call "_pcexec" using commandL.
display "End of COBOL program 1".
stop run.

PCEXEC.C

#include <stdio.h>
#include <process.h>
pcexec (commandL)
char far commandL[125];
{
printf ("Prior to C call of COB2.EXE \n");
spawnl (P_WAIT, "COB2.EXE", "COB2", "spawnl", NULL);
printf("After C call to COB2.EXE \n");
}

COB2.CBL

* This program must be a separate executable.


program-id. cob2.
procedure division.
display "Inside COBOL program 2".
stop run.

OUTPUT

In COBOL program 1
Prior to C call of COB2.EXE
Inside COBOL program 2
After C call to COB2.EXE
End of COBOL program 1

The following samples demonstrate how a COBOL 4.5 QuickWin application can call a
Windows™-based DLL written in C 6.x.

MAIN.CBL

working-storage section.
77 Var1 pic 9(4) comp-5.
77 Char pic x.
procedure division.
move 1 to Var1.
display "Prior to DLL call: " at 0101
display Var1 at 0120.
call 'cdll' using by reference Var1.
display "After DLL call: " at 0201.
display Var1 at 0217.
call "cbl_read_kbd_char" using Char.
stop run.

MAIN.DEF

EXETYPE WINDOWS 3.0
APPLOADER '__MSLANGLOAD'
PROTMODE
STUB 'WINSTUB.EXE'
CODE PRELOAD FIXED
DATA PRELOAD FIXED MULTIPLE
STACKSIZE 16384
HEAPSIZE 1024
EXPORTS WNDPROC

CDLL.C

#include <windows.h>
int FAR PASCAL LibMain(HANDLE hInstance,
WORD wDataSeg,
WORD cbHeapSize,
LPSTR lpszCmdLine)
{
//Additional DLL initialization fits here
if (cbHeapSize != 0)
UnlockData(0);
return (1);
}

/* Increments the integer passed by reference; returns 1 on success */
int FAR PASCAL cdll(int _far *piIntPointer)
{
    if((*piIntPointer >= -32768) && (*piIntPointer < 32767))
    {
        (*piIntPointer)++;
        return(1);
    }
    else
    {
        return(0);
    }
}

/* Windows exit procedure; returns 1 in every case */
int FAR PASCAL WEP (int nParameter)
{
    if (nParameter == WEP_SYSTEM_EXIT)
    {
        return (1);
    }
    else
    {
        if (nParameter == WEP_FREE_DLL)
        {
            return (1);
        }
        else
        {
            return (1);
        }
    }
}

CDLL.DEF

LIBRARY cdll
DESCRIPTION 'C DLL FOR WINDOWS 3.0'
EXETYPE WINDOWS
STUB 'WINSTUB.EXE'
CODE PRELOAD MOVEABLE DISCARDABLE
DATA PRELOAD MOVEABLE SINGLE
HEAPSIZE 0
EXPORTS Cdll @1
WEP @2 RESIDENTNAME

The Microsoft Overlay Virtual Environment (MOVE)
Microsoft Corporation

Created: March 20, 1992

Abstract
This article explains how the Microsoft® overlay virtual environment (MOVE) helps
overcome memory limitations for programs that run in the MS-DOS® operating system.
The article compares MOVE technology to conventional overlays and to paged virtual
memory systems, and explains the basics of the technology.

Introduction
Along with death and taxes, all programmers eventually share another misery:
insufficient memory. Since the beginning of their profession, programmers have needed
to cram too-big programs into too-little random-access memory (RAM). Programmers for
MS-DOS® are further restricted by the infamous 640K limit; a program running on a 4
MB computer, for example, can directly execute only in the first 640K of RAM. Many
techniques have been employed to overcome this limitation: optimizing compilers,
interpreters, MS-DOS extenders, and so on. The most commonly used technique,
overlays, is also one of the most cumbersome to use. The new Microsoft overlay virtual
environment (MOVE) is a significant advance over previous overlay methods. MOVE is
both easier to use and more effective than conventional overlay systems.

In many ways, the MOVE technology combines the benefits of overlays and virtual
memory. Some of the advantages of MOVE over conventional overlays are:

• The MOVE system keeps multiple overlays in memory at the same time. This
makes devising efficient overlay structures much easier.

• Discarded overlays can be cached in extended memory (XMS) or expanded
memory (EMS).

• MOVE supports pointers to functions. You do not need to modify your source
code.

• The memory allocated for overlays can be set at program startup. Your program
can adapt to different memory situations.

The MOVE technology can be used only in MS-DOS operating system programs.
Programs in the Microsoft Windows™ graphical environment automatically take
advantage of a similar mechanism built into Windows.

The next three sections cover the basics of conventional overlays and virtual memory. If
you're already familiar with these concepts, you can skip ahead to "MOVE Basics."

Overlay Basics
If you're not using overlays or other techniques, your program size cannot exceed
available memory. When loading your program, MS-DOS copies the program's code and
data segments into memory, starting at the first available memory location and continuing
to the end of the program (see Figure 1).

Figure 1. Memory Map for a Nonoverlaid Application


With overlays, however, the entire program need not fit into memory at one time. A
portion of the program, called the root, occupies the lower portion of available memory
and works just like a nonoverlaid program. The other portions of the program, called
overlays, have overlapping memory addresses. This trick is accomplished by keeping
only one or a subset of these overlays in memory at one time.

When you use overlays, the linker automatically includes a routine called the overlay
manager in your program's EXE file. When the program calls a function located in
another overlay, the overlay manager loads the necessary overlay into memory,
overwriting the previous overlay (Figure 2).

Figure 2. Memory Map for an Overlaid Application

This way, a program can be many times larger than available memory; it only needs
sufficient memory to hold the root and the largest overlay. In some overlay systems the
overlays are included within the EXE file, whereas in others the overlays are separate
files, usually with the OVL extension. You need not keep track of which overlay is in
memory or which function is in which overlay; the overlay manager automatically
handles loading the appropriate overlay when necessary.

If overlays sound too good to be true, they are: they have some drawbacks. They
slow your program down, sometimes considerably. All that reloading of overlays from
the disk can gum up the works. Reading an instruction from an overlay on the disk can be
several thousand times slower than reading the instruction from an already-loaded
overlay, so the speed of your program depends heavily on how the overlays are
structured. Ideal candidates for overlays are functions that are called only once during a
program's execution, like initialization or error-handling routines. Routines that are used
together should be grouped into the same overlay so that multiple overlays needn't be
loaded to accomplish a task. The worst situation is caused by a tight inner loop calling
routines in two different overlays. In cases like this, the computer spends more time
loading overlays from disk than executing instructions. This phenomenon, called
thrashing, is accompanied by grinding from your user's hard disk and groaning from your
users.

Determining an efficient overlay structure is fiendishly difficult, an activity closer to art
than to science. Your intuitions about who calls what, particularly in a large program, are
often dead wrong. Even when you know which functions are involved in a particular task,
it's still difficult to balance the performance hit with the need to reduce the required
memory.

Example of Overlay Structure


Most programmers structure large projects into several source files, using one file for
each major system in the program. For example, a hypothetical calendar program that
allows the user to add appointments, view a date page, and print the calendar might be
composed of the files listed below.

Source file                                   Key routines

DATABASE.C                                    DatabaseInit
(read, write database appointment records)    DatabaseReadRecord
                                              DatabaseWriteRecord
                                              DatabaseExit
DATAFORM.C                                    DataFormEnter
(show, get appointment data entry)            DataFormShow
DATEUTIL.C                                    DateDifference
(various date routines)                       DateGet
                                              DateShow
INIT.C                                        InitializeApplication
(main initialization routine)
MAIN.C                                        main
(main program file)                           ShowMenu
PRINTER.C                                     PrinterInit
(print appointments)                          PrinterWrite
STRUTIL.C                                     StringGet
(various string routines)                     StringShow

An obvious overlay structure for the program can be illustrated as follows.

Root          Overlays

MAIN.C        1: DATABASE.C
              2: DATAFORM.C
              3: DATEUTIL.C
              4: INIT.C
              5: PRINTER.C
              6: STRUTIL.C

Although this structure reduces memory requirements to a bare minimum, it is probably
very slow. For example, the primitives in DATEUTIL.C and STRUTIL.C are used
throughout the code, so these routines should be placed in the root.

As you analyze the call tree and optimize the overlay structure, you may find yourself
putting more and more routines in the program's root. However, if you put too many
routines in the root, your program will need nearly as much memory as the nonoverlaid
version. The initialization routines in INIT.C call the hypothetical routines DatabaseInit
in DATABASE.C and PrinterInit in PRINTER.C. Although these routines thematically
belong in DATABASE.C and PRINTER.C, they should be included in the INIT overlay
for best performance. If you move too many routines from where they belong to where
they are used, your program may run faster, but it will be harder to read and to maintain
the source code. A more balanced overlay structure is shown below.

Root          Overlays

MAIN.C        1: DATABASE.C
DATEUTIL.C       (except DatabaseInit)
STRUTIL.C     2: DATAFORM.C
              3: INIT.C
                 (plus DatabaseInit from DATABASE.C)
                 (plus PrinterInit from PRINTER.C)
                 PRINTER.C
                 (except PrinterInit)

Producing a good overlay structure requires lengthy and tedious trial-and-error work. As
new capabilities are added to your program, the structure quickly becomes obsolete.
Programmers working on a large system that contains hundreds of source files and
thousands of functions often spend as much time tuning the overlay structure as they do
writing code.

Paged Virtual Memory


Because working with overlays is so difficult, computer designers have come up with a
radically different approach called paged virtual memory (VM). In a paged virtual
memory system, the entire address space of the computer is divided into fixed-size blocks
called pages. The address range of the processor can be significantly larger than the
memory physically contained in the computer; therefore, only a fraction of the page
addresses represent actual memory addresses. The programmer doesn't have to worry
about the amount of memory in a computer that has VM.

All addresses used in a VM program are virtual addresses. The computer's virtual
memory manager maps virtual page addresses to the physical addresses of memory.
When a program needs a virtual memory page that is not mapped to a physical page in
memory, the virtual memory manager copies the contents of that page from disk to a page
of physical memory. The operating system maps the virtual address of the page to the
physical address of the page's contents. This way, when the program reads from a
particular virtual address, the computer's VM mapping scheme ensures that the program
reads from the appropriate physical page. The computer doesn't need room for all the
pages containing a program. The more physical pages available, the less disk activity
needed and the faster the program runs. The operating system's VM manager handles
loading pages from the disk, swapping modified pages to the disk, and translating virtual
addresses to physical addresses.
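
The mapping step described above can be sketched in C. This is only a minimal
illustration, not the way any particular operating system implements it: the page size, the
table layout, and the names PageTable and vm_translate are assumptions made for this
example.

```c
#define PAGE_SIZE   4096u   /* assumed page size, in bytes */
#define NUM_PAGES   8u      /* assumed number of virtual pages */
#define NOT_PRESENT (-1)    /* virtual page has no physical page assigned */

/* frame[v] is the physical page number for virtual page v, or NOT_PRESENT */
typedef struct {
    int frame[NUM_PAGES];
} PageTable;

/* Translates a virtual address to a physical address.
   Returns 1 and stores the result in *paddr when the page is mapped.
   Returns 0 when the page is out of range or not mapped; a real VM manager
   would load the page from disk at this point and then retry. */
int vm_translate(const PageTable *pt, unsigned long vaddr, unsigned long *paddr)
{
    unsigned long page   = vaddr / PAGE_SIZE;   /* which virtual page */
    unsigned long offset = vaddr % PAGE_SIZE;   /* position inside the page */

    if (page >= NUM_PAGES || pt->frame[page] == NOT_PRESENT)
        return 0;
    *paddr = (unsigned long)pt->frame[page] * PAGE_SIZE + offset;
    return 1;
}
```

The offset within the page is carried over unchanged; only the page number is translated,
which is why fixed-size pages keep the lookup simple.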

Virtual memory has several advantages over overlays. First, it does not require
programmer effort and eliminates the tedious process of creating overlay structures.
Second, the program performs efficiently regardless of the amount of memory the user's
computer contains. Most of the program's execution time is spent in a small fraction of
the code. As the program executes, pages containing this core code replace pages with
less critical code. The set of pages that make up the often-used code is called the
program's working set. If the working set can fit in the computer's physical memory, the
program executes efficiently and swaps pages only occasionally for infrequently used
routines. If the working set cannot fit in the computer's memory, the computer thrashes,
spending more time loading code from the disk than executing the program.

Of course, VM is no panacea either. First, the virtual memory manager and the address
translation scheme must be part of the computer hardware. The more powerful members
of the Intel® CPU family, particularly the 80386 and higher, support address translation.
Less powerful CPUs, however, do not support this feature. Second, the virtual memory
manager must realistically be an integral part of the operating system. MS-DOS does not
support virtual memory.

MOVE Basics
Microsoft's new MOVE overlay technology has the best of both the overlay and virtual
memory worlds. MOVE is an overlay system but has significant advantages over
conventional overlays. Unlike conventional overlays, MOVE allows more than one
overlay to reside in memory simultaneously. Like virtual memory, the MOVE memory
manager keeps resident as many overlays as will fit. Each overlay need not fully cover a
single task; two or three overlays can cooperate to complete the task. When loading a
new overlay, MOVE discards the least recently used (LRU) overlay. If there is still
insufficient room for the new overlay, MOVE discards the next least recently used
overlay, and so on.
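
The LRU discard policy can be sketched as follows. This is a simplified model under stated
assumptions, not the MOVE manager's source: it tracks only the total free space in the
heap (ignoring contiguity), and the OvlHeap structure and ovl_touch function are
hypothetical names invented for the example.

```c
#define MAX_OVL 8

typedef struct {
    unsigned size[MAX_OVL];            /* overlay sizes, in bytes */
    int resident[MAX_OVL];             /* 1 if the overlay is in the heap */
    unsigned long last_used[MAX_OVL];  /* clock value of the most recent use */
    unsigned count;                    /* number of overlays in the program */
    unsigned heap_free;                /* free bytes in the overlay heap */
    unsigned long clock;               /* advances on every overlay call */
} OvlHeap;

/* Records a call into overlay id, loading it first if necessary.
   Returns the number of overlays discarded to make room, or -1 if the
   overlay does not fit even after everything else has been discarded. */
int ovl_touch(OvlHeap *h, unsigned id)
{
    int discarded = 0;

    if (id >= h->count)
        return -1;
    h->clock++;
    if (!h->resident[id]) {
        while (h->heap_free < h->size[id]) {
            unsigned i, victim = h->count;      /* count means "none found" */
            for (i = 0; i < h->count; i++)
                if (h->resident[i] &&
                    (victim == h->count ||
                     h->last_used[i] < h->last_used[victim]))
                    victim = i;                 /* least recently used so far */
            if (victim == h->count)
                return -1;                      /* nothing left to discard */
            h->resident[victim] = 0;
            h->heap_free += h->size[victim];
            discarded++;
        }
        h->resident[id] = 1;
        h->heap_free -= h->size[id];
    }
    h->last_used[id] = h->clock;
    return discarded;
}
```

With a 9000-byte heap and overlays of 4000, 3000, and 5000 bytes, calling overlays 0, 1, 0,
and then 2 discards overlay 1 only, because overlay 0 was used more recently.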

With MOVE you can make your overlays smaller and more modular, letting the LRU
algorithm determine which overlays stay in memory. Some of your overlays may remain
in memory because they are needed for the normal operation of the program. This
working set of overlays is similar to the working set of pages in a virtual memory system.
As in a virtual memory system, MOVE programs naturally configure themselves for efficient
operation on a given computer. Unlike virtual memory, however, MOVE does not limit you to
fixed-size pages; you can group functions for better control. For example, if function A is
called each time function B is called and only when function B is called, you can group A
and B in the same overlay to save the disk time of loading them separately.

MOVE Mechanics
You don't need to modify your C source code to create a MOVE application, but you do
need to modify your CL and LINK command lines. These changes are described in the
"Creating Overlaid Programs" section.

Like a nonoverlaid program, a MOVE application has a single EXE file. The EXE file
contains the root and all overlays. The file also contains the overlay manager routines
(about 5K), which are automatically added by the linker. When a MOVE application is
launched, the program's startup routine allocates a memory area to store the overlays.
This area, called the overlay heap, is distinct from the regular heap used for malloc.
When your application calls a function in an overlay that is not currently loaded in RAM,
the MOVE manager must read the overlay from disk and copy its contents to the overlay
heap before program execution can continue. If the heap does not have enough free space
to hold the requested overlay, the MOVE manager discards one or more of the currently
resident overlays. The least recently used overlay is discarded first. Because overlays can
vary in size, the MOVE manager may have to discard multiple overlays to make
sufficient room for the requested overlay.

If your program is running on a computer with EMS or XMS memory, the MOVE
manager can create an overlay cache for copying discarded overlays. The program cannot
execute overlays directly from this cache because the cache resides above the 640K limit.
If a discarded overlay is needed again, the manager copies it from the overlay cache to
the overlay heap rather than reading it from the disk. Because reading from the cache is
much faster than reading from the disk, the space for your working set is effectively the
cache size plus the heap size. The overlay manager routines maintain the overlay cache
with an LRU algorithm in a manner similar to the overlay heap.

Heap and Cache Management


The MOVE overlay manager is responsible for loading requested overlays from the disk
or cache and copying them to the heap. If there is insufficient contiguous heap space for
the requested overlay, the MOVE manager discards the LRU overlay from the heap and
checks for contiguous space again. If space is still insufficient, the MOVE manager
discards the next LRU overlay and repeats these steps until sufficient contiguous space is
available.

At program startup, the MOVE manager attempts to allocate an overlay heap equal to the
sum of the program's three largest overlays. If space is insufficient or there are fewer than
four program overlays, MOVE allocates a heap that is the size of the largest overlay. The
remaining free memory is retained for the conventional (malloc) heap. (This is the
default initialization behavior; you can substitute another scheme if desired.)

If the program is running on a computer with EMS or XMS memory, the MOVE manager
attempts to allocate an overlay cache three times the size of the overlay heap. If there is
not enough memory for a cache this size, all EMS or XMS memory is used.
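
The default sizing rules in the two paragraphs above amount to a small calculation. The
sketch below is an illustration of those rules as stated here, not the code in MOVEINIT.C;
the function names are hypothetical, and all sizes are assumed to be in the same unit.

```c
/* Returns the default overlay heap size: the sum of the three largest
   overlays when there are at least four overlays and that sum fits in the
   available memory; otherwise the size of the largest overlay. */
unsigned long default_heap_size(const unsigned *ovl, unsigned count,
                                unsigned long avail)
{
    unsigned long top[3] = {0, 0, 0};   /* three largest sizes, descending */
    unsigned long sum;
    unsigned i;

    for (i = 0; i < count; i++) {
        unsigned long s = ovl[i];
        if (s > top[0])      { top[2] = top[1]; top[1] = top[0]; top[0] = s; }
        else if (s > top[1]) { top[2] = top[1]; top[1] = s; }
        else if (s > top[2]) { top[2] = s; }
    }
    sum = top[0] + top[1] + top[2];
    if (count >= 4 && sum <= avail)
        return sum;
    return top[0];
}

/* Returns the default cache size: three times the heap size, or all of the
   available EMS/XMS memory when that much is not available. */
unsigned long default_cache_size(unsigned long heap, unsigned long avail_cache)
{
    unsigned long want = 3 * heap;
    return want <= avail_cache ? want : avail_cache;
}
```

For overlays of 100, 400, 200, 300, and 50 units, the requested heap is 900 units when at
least that much memory is free and 400 units otherwise.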

When the MOVE manager discards an overlay from the heap, it does not copy the
overlay to the cache if a copy of the overlay is already in the cache.

Individual overlays can be up to 64K in size but are usually much smaller. Overlays can
be individual OBJ files, as in a conventional overlay system, or they may contain a list of
functions. With large overlays, your program's performance will suffer the problems
associated with conventional overlays. Your overlays should be large enough to justify
the time it takes to load them from disk. Specifics vary depending on your program, and
experimentation will help you find the optimal overlay size and organization. For most
programs, an optimal overlay size is about 4K.

If your overlaid program temporarily needs the EMS or XMS memory occupied by the
cache, you can use the MOVE application programming interface (API) _movepause
function to release the cache memory and _moveresume to restore the cache. This is
particularly useful if your program spawns another program that needs EMS or XMS
memory to function. The MOVE API functions are described in Appendix A.

How Does MOVE Work?


One aspect of MOVE seems quite mysterious until you know how it works. How does
the overlay manager know when it needs to load an overlay? How do calls to overlaid
functions know where to branch in the overlay heap? This magic is accomplished by
inserting an additional link between the function and its callers. This link, called a thunk,
works like an additional function call. One thunk data structure is created in the root for
each far function contained in the overlays. The thunk data structure contains the overlay
number containing the function and the offset of the function's entry point within the
overlay. The linker modifies all function calls to overlaid functions so that they call the
thunk instead of directly calling the function. When a function calls the thunk, the MOVE
manager locates the appropriate overlay in the heap or loads the overlay from the cache
or disk and jumps to the offset specified in the thunk.
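
The thunk mechanism can be imitated in portable C with a table of function pointers. This
is a sketch of the idea only: real thunks are generated by LINK and work at the machine
level, whereas here the Thunk structure, the overlay_code table, and call_through_thunk are
stand-ins invented for the example, and "loading" an overlay just sets a flag.

```c
typedef int (*OvlFunc)(int);

/* One thunk per overlaid function: which overlay holds it, and where. */
typedef struct {
    unsigned overlay;   /* overlay number containing the function */
    unsigned entry;     /* entry point of the function within the overlay */
} Thunk;

#define N_OVL   2
#define N_ENTRY 2

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

/* Stand-in for the overlay images stored in the EXE file */
static OvlFunc overlay_code[N_OVL][N_ENTRY] = {
    { add_one, twice },
    { negate,  0     }
};
static int loaded[N_OVL];       /* 1 when the overlay is resident */
static unsigned load_count;     /* how many loads have taken place */

/* Every call to an overlaid function goes through its thunk: make sure the
   overlay is resident, then branch to the entry point recorded in the thunk. */
int call_through_thunk(const Thunk *t, int arg)
{
    if (!loaded[t->overlay]) {
        loaded[t->overlay] = 1;     /* simulated load from cache or disk */
        load_count++;
    }
    return overlay_code[t->overlay][t->entry](arg);
}
```

The caller never needs to know whether the overlay is resident; only the first call to a
function in a given overlay pays the cost of the load.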

Creating Overlaid Programs


You create a MOVE application by following the same edit-compile-link development
cycle used for all C programs. You will need to create an additional file, called a DEF file,
for each application. The DEF file is used by the linker and specifies the makeup of the
root and of each overlay. (The old syntax, link a+(b)+(c), is also supported.) A sample
DEF file for the hypothetical calendar program is shown below:

EXETYPE DOS

;FUNCTIONS:init DatabaseInit PrinterInit

; Place main, strutil, and dateutil in the root.
FUNCTIONS:0 _main
FUNCTIONS:0 _strutil
FUNCTIONS:0 _dateutil

FUNCTIONS:1 _database
FUNCTIONS:2 _dataform
FUNCTIONS:3 _init
FUNCTIONS:3 _printer

For more information on the syntax of DEF files, see "Creating Overlaid MS-DOS
Programs" and "Creating Module Definition Files" in the C/C++ Environment and Tools
manual.

The first step in creating a MOVE application is to determine an appropriate overlay
structure. For most programs, a good starting point is to place each OBJ file in its own
overlay. The program entry point must be in the root; the normal startup sequence is
_astart followed by main. OBJ files containing universally called primitives should be
placed in the root as well.

MOVE gives you control over the placement of individual functions. Instead of moving a
function's source code physically to another file, you specify the function in a
FUNCTIONS statement in your application's DEF file. A function can be specified in
this way only if it is a packaged function. Functions can be packaged by specifying the
/Gy switch during compilation. For more information on packaging functions, see "CL
Command Reference" and "Creating Overlaid MS-DOS Programs" in the C/C++
Environment and Tools manual.

Optimizing Overlaid Programs


After you've created a MOVE program, you can run it under different memory
conditions, assess its performance, and compare the performance of different overlay
sizes and structures. A MOVE feature called tracing can help you optimize your overlays.
Tracing a MOVE application generates a log file during program execution. The log file
contains an entry for each load and discard of an overlay. A separate MS-DOS utility
called TRACE reports and summarizes the information in trace log files. The TRACE
utility is discussed in Appendix C. For more information on tracing, see "Creating
Overlaid MS-DOS Programs" in the C/C++ Environment and Tools manual. Future
versions of MOVE will include enhanced tools that make designing and optimizing the
overlay structure easier.

You can modify some of the characteristics of the MOVE manager. For example, you can
change the amount of memory MOVE allocates for the overlay heap and cache by
changing the constants and heuristics in the MOVEINIT.C file. For more information, see
"Creating Overlaid MS-DOS Programs" in the C/C++ Environment and Tools manual.

Appendix A: The MOVE API
The MOVE API is provided in a library called MOVE.LIB. This library is a component
of the C combined libraries for medium and large models. (Another form of the library,
MOVETR.LIB, also contains the MOVE API; see Appendix C.) The MOVE API is
declared in the MOVEAPI.H file, which is available on disk. This appendix describes
MOVE routines and functionality.

The _moveinit Function

MOVE begins an overlaid program with a call to _moveinit, which calculates the heap
and cache needed for the overlays and allocates memory for the heap and cache.

You can use the default _moveinit function provided in MOVE.LIB, or you can write
your own version of _moveinit and link it to your program. The source code for the
default _moveinit function is available in the MOVEINIT.C file.

The _moveinit call occurs before the call to _astart that begins a C program and
performs initialization. For this reason, do not call C run-time routines from any version
of _moveinit.

The following functions are called from _moveinit:

• _movesetheap

• _movegetcache

• _movesetcache

• _movetraceon (only in MOVETR.LIB)

The functions are described in the sections below. In addition, LINK creates several
variables that begin with $$; these variables are described in the "LINK Variables"
section.

Heap Allocation

The _movesetheap function sets the overlay heap size.

extern unsigned short __far __cdecl _movesetheap(
    unsigned short maxovl,
    unsigned short minheap,
    unsigned short reqheap );

where:

maxovl    is the maximum number of overlays. The $$COVL variable always contains
          this value.
minheap   is the minimum heap size, specified in 16-byte paragraphs. The heap must
          be at least the size of the largest overlay. To calculate overlay sizes, use
          $$MPOVLSIZE as in MOVEINIT.C.
reqheap   is the requested heap size, specified in 16-byte paragraphs. The default
          _moveinit function requests the sum of the sizes of the three largest
          overlays.

MOVE attempts to allocate the requested amount of memory. If that much memory is not
available, MOVE tries to allocate as much as possible. If the amount of available memory
is less than the minimum heap requested, MOVE ends the program and issues a run-time
error.
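
The allocation policy just described, and the conversion from bytes to 16-byte paragraphs,
can be sketched as follows. This models the behavior stated above and is not the library's
implementation; bytes_to_paragraphs and grant_heap are hypothetical helper names, and
raising a request that is below the minimum up to the minimum is an assumption of the
sketch.

```c
#define PARA_BYTES 16ul   /* one paragraph is 16 bytes */

/* Rounds a byte count up to a whole number of paragraphs. */
unsigned long bytes_to_paragraphs(unsigned long bytes)
{
    return (bytes + PARA_BYTES - 1) / PARA_BYTES;
}

/* Returns the heap size granted, in paragraphs: the requested size if it is
   available, otherwise as much as is available. Returns 0 when even the
   minimum cannot be met; MOVE itself ends the program with a run-time
   error in that case. */
unsigned long grant_heap(unsigned long minheap, unsigned long reqheap,
                         unsigned long avail)
{
    if (reqheap < minheap)
        reqheap = minheap;      /* assumption: never request below the minimum */
    if (avail < minheap)
        return 0;
    return reqheap <= avail ? reqheap : avail;
}
```

For example, a 64K overlay occupies 4096 paragraphs, so a program whose largest overlay
is 64K needs a minheap of at least 4096.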

Cache Allocation

The _movegetcache function determines the amount of memory available for a cache.

extern void __far __cdecl _movegetcache(
    unsigned short __far *expmem,
    unsigned short __far *extmem );

where:

*expmem   is available expanded memory, in kilobytes.
*extmem   is available extended memory, in kilobytes.

The _movesetcache function allocates expanded and extended memory for an overlay
cache.

extern unsigned short __far __cdecl _movesetcache(
    unsigned short expmem,
    unsigned short extmem );

where:

expmem    is the requested amount of expanded memory, specified in kilobytes.
extmem    is the requested amount of extended memory, specified in kilobytes.

The default _moveinit function requests a cache equal to the sum of the sizes of all
overlays. If _movesetcache cannot allocate the requested amount of memory, it sets a bit in
the return value. MOVEAPI.H defines the following constants to represent bits in the
return value.

Constant                   Bit   Description

__MOVESETCACHE_ERR_NO      0     No error
__MOVESETCACHE_ERR_XMS     1     Cannot allocate extended memory
__MOVESETCACHE_ERR_EMS     2     Cannot allocate expanded memory

The _movesetcache function sets the following global variables when the overlay cache
is allocated:

extern unsigned short __far __cdecl _moveckbxms;
extern unsigned short __far __cdecl _moveckbems;

The _moveckbxms variable is set to the size of the allocated extended memory. The
_moveckbems variable is set to the size of the allocated expanded memory.

Freeing and Reallocating Cache Memory

You can temporarily release and then restore the memory allocated for the overlay cache.
This is useful when your program spawns another program that uses extended or
expanded memory or when you want to prepare for a possible abnormal exit from your
program.

The _movepause function frees the cache memory and closes the executable file.

extern void __far __cdecl _movepause( void );

The _moveresume function reallocates memory for the overlay cache and reopens the
executable file.

extern void __far __cdecl _moveresume( void );

MOVEAPI.H defines the following variables for use by these functions:

extern unsigned short __far __cdecl _movefpause;
extern unsigned short __far __cdecl _movefpaused;

MOVEAPI.H also defines constants to represent bits in _movefpause and _movefpaused
as follows.

Constant               Bit   Description

__MOVE_PAUSE_DISK      2     Represents the executable file
__MOVE_PAUSE_CACHE     4     Represents the cache memory

The _movepause function reads the value in _movefpause and sets _movefpaused to the
value of the action taken by _movepause. Before you call _movepause, set the
__MOVE_PAUSE_DISK bit in _movefpause to close the file, and set the
__MOVE_PAUSE_CACHE bit to free the cache, as in:

_movefpause |= __MOVE_PAUSE_DISK;
_movefpause |= __MOVE_PAUSE_CACHE;
_movepause();

The _moveresume function reads the value in _movefpaused and then clears
_movefpaused. The overlays that were in the heap and cache are not restored. Therefore,
after a call to _moveresume, the program may at first run slightly more slowly as it
makes calls to routines in overlays.

LINK Variables

LINK creates the following variables:

$$MAIN           Entry point to an overlaid program. In a C program, this is
                 defined to be __astart.
$$OVLTHUNKBEG    Beginning of the interoverlay call (thunk) table.
$$OVLTHUNKEND    End of the interoverlay call table.
$$CGSN           Number of global segments. Each object file contributing to an
                 overlay takes up one global segment number (GSN). Each
                 COMDAT (packaged function) segment takes up one GSN.
$$COVL           Number of overlays. Each overlay can contain several GSNs.
$$MPGSNBASE      Map of GSNs to segment displacements in an overlay.
$$MPGSNOVL       Map of GSNs to overlay numbers.
$$MPOVLLFA       Map of overlay numbers to logical file addresses of overlays in
                 the executable file.
$$MPOVLSIZE      Map of overlay numbers to overlay image sizes (the size of the
                 code actually loaded into the overlay heap).
$$INTNO          Overlay interrupt number.

Appendix B: MOVE Environment Variables


You can use environment variables at run time to specify the size of the requested overlay
heap and overlay cache and the maximum number of overlays. The _moveinit function
given in MOVEINIT.C provides environment support; you can compile this function and
link it with your program. (MOVETR.LIB includes a version of _moveinit that already
contains environment support.)

First, enable environment support by compiling MOVEINIT.C with MOVE_ENV defined.
Then specify the resulting MOVEINIT.OBJ when linking your program. With
MOVE_ENV defined, MOVEAPI.H declares the following variable:

extern unsigned short __far __cdecl _movesegenv;

Compiling for environment support causes MOVEINIT.C to define a function called
_movegetenv. The environment-support version of _moveinit uses _movegetenv to get
the values of the following environment variables:

MOVE_HEAP    Requested heap (paragraphs)
MOVE_COVL    Maximum number of overlays
MOVE_EMS     Requested expanded-memory cache (paragraphs)
MOVE_XMS     Requested extended-memory cache (paragraphs)

To use these variables, set them to strings that represent the desired settings. Each string
must consist of exactly four hexadecimal digits.
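
A value in this format can be validated and converted as sketched below. This is not the
_movegetenv routine from MOVEINIT.C, whose source is not shown here; parse_hex4 is a
hypothetical helper that only illustrates the "exactly four hexadecimal digits" rule.

```c
#include <ctype.h>
#include <string.h>

/* Converts a string of exactly four hexadecimal digits to its value.
   Returns 1 and stores the result in *value on success; returns 0 if the
   string is missing, has the wrong length, or contains a non-hex character. */
int parse_hex4(const char *s, unsigned *value)
{
    unsigned v = 0;
    int i;

    if (s == NULL || strlen(s) != 4)
        return 0;
    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)s[i];
        unsigned digit;
        if (!isxdigit(c))
            return 0;
        if (isdigit(c))
            digit = (unsigned)(c - '0');
        else
            digit = (unsigned)(toupper(c) - 'A') + 10u;
        v = v * 16u + digit;
    }
    *value = v;
    return 1;
}
```

Under this rule, the setting MOVE_HEAP=0400 requests 0x400 (1024) paragraphs, which is
16K; a value such as 400 is rejected because it has only three digits.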

Appendix C: The TRACE Utility


You can optimize the overlays in your program with the help of the tracing form of the
MOVE library (MOVETR.LIB) and the Microsoft MOVE trace utility (TRACE) version
1.0. MOVETR.LIB contains MOVE.LIB and additional routines for tracing overlay
behavior.

Create a tracing version of your program as described in the following sections. When
you run your program, the tracing functions create a binary file called MOVE.TRC in the
directory from which the program is run. After your program ends, use TRACE to read
MOVE.TRC. If the tracing results indicate that some functions cause overlays to be
swapped frequently, you can reorganize the functions in the overlays by using statements
in the module definition file.

Creating a Tracing Version of an Overlaid Program

To create a program that will trace overlay performance, specify MOVETR.LIB in LINK's
libraries field. This causes LINK to use the MOVETR.LIB library instead of the
MOVE.LIB component of the default combined library. Use LINK's /NOE option to
prevent conflicts between MOVETR.LIB and the combined library. If you explicitly
specify the combined library in the libraries field, list MOVETR.LIB before the
combined library.

The Trace Functions

By default, tracing is in effect during the entire run of your program. You do not need to
make any changes in your program to enable tracing. However, MOVETR.LIB provides
two functions that you can use to turn tracing on and off within your program.

The _movetraceon function turns on tracing.

extern void __far __cdecl _movetraceon( void );

This function opens the MOVE.TRC file and activates tracing. During tracing,
information about overlay behavior is written to MOVE.TRC. The default _moveinit
function calls _movetraceon at the start of the program if MOVE_PROF is defined; this
definition is in MOVETR.LIB.

The _movetraceoff function turns off tracing and closes MOVE.TRC.

extern void __far __cdecl _movetraceoff( void );

The tracing functions are declared in MOVEAPI.H. They are defined only in
MOVETR.LIB.
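
One way to use the two functions is to limit tracing to a single region of interest. The following is a minimal sketch under stated assumptions: USE_MOVETR is an invented build switch, trace_on, trace_off, run_traced, and sample_work are hypothetical names, and the trace_active flag exists only so the sketch can be built and checked without MOVETR.LIB. Whether reopening MOVE.TRC appends to or replaces earlier trace data is not stated in this appendix.

```c
/* USE_MOVETR is an assumed build switch for this sketch, not a MOVE symbol. */
#ifdef USE_MOVETR
#include <moveapi.h>           /* declares _movetraceon and _movetraceoff */
#endif

int trace_active = 1;          /* tracing is on by default at program start */

void trace_on(void)
{
#ifdef USE_MOVETR
    _movetraceon();            /* opens MOVE.TRC and activates tracing */
#endif
    trace_active = 1;
}

void trace_off(void)
{
#ifdef USE_MOVETR
    _movetraceoff();           /* turns off tracing and closes MOVE.TRC */
#endif
    trace_active = 0;
}

/* Runs one piece of work with tracing enabled, then turns tracing off. */
int run_traced(int (*work)(void))
{
    int result;

    trace_on();
    result = work();
    trace_off();
    return result;
}

/* Hypothetical workload: reports whether tracing was active while it ran. */
int sample_work(void)
{
    return trace_active;
}
```

A program using this pattern would call trace_off once at startup and then wrap only the overlay-heavy code in run_traced.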

Running TRACE

To run TRACE, use the following syntax:

TRACE [options] [tracefile]

The tracefile is the MOVE.TRC file created during a tracing session. You can specify a
path with the filename. If tracefile is not specified, TRACE looks in the current directory
for a file called MOVE.TRC.

An option is preceded by an option specifier, either a forward slash (/) or a dash (–).
Options are not case sensitive. An option can be abbreviated to its initial letter. Options
can appear anywhere on the command line.

TRACE options are:

/SUM            Displays a summary of the program's performance. If /SUM is not
                specified, TRACE displays the entire tracing session. For details,
                see the "TRACE Performance Summary" section. If /SUM is
                specified, /EXE and /MAP have no effect.
/EXE:filename   Allows TRACE to read the executable file that was traced and to
                extract function names for use in the trace output. Specify the
                filename of the executable file that generated the MOVE.TRC file.
                You can specify a path with the filename. If /EXE is not specified,
                the trace output refers to functions by overlay number and offset.
                The program must contain Microsoft Symbolic Debugging
                Information that is compatible with Microsoft CodeView® version
                4.0. To include debugging information, create the object file using
                the /Zi option and link the program using the /CO option.
/HELP           Displays a usage statement.
/?              Displays a usage statement.

TRACE Output

TRACE displays information on the tracing session to the standard output device. You
can use the redirection operator (>) to save the output in a file. The output is in table
format. Each line of output represents an interoverlay transaction. A line of information is
organized into the following fields:

• The overlay to which to return from the current transaction. (If blank, the overlay
in the previous line is implied.)

• The physical return address in segment:offset form. (If blank, the address in the
previous line is implied.)

• The transaction type, which is one of the following:

    • Present

    • Load from disk

    • Load from expanded memory

    • Load from extended memory

    • Discard from heap

    • Cache to expanded memory

    • Cache to extended memory

    • Discard from cache

    • Invalid

• The overlay that is the object of the transaction.

• The segment in memory where the transaction overlay is loaded.

• The interoverlay operation, which is one of the following:

    • Call function, in which function is:

        An overlay number and an offset in default output

        A function name if /EXE is used

        A decorated function name if /EXE and /MAP are used

    • Return.

    • If blank, the Call in the previous line is implied.

TRACE Performance Summary

When you run TRACE with the /SUM option, TRACE displays a summary of overlay
performance to the standard output device. The full session is not displayed. You can use
the redirection operator (>) to save the output in a file. The summary information is
organized into the following fields.

OVERALL

calls                        Sum of Call operations
returns                      Sum of Return operations

HEAP

discards                     Sum of "Discard from heap" transactions
discards / entries           Discards as percent of (calls + returns)
loads from disk              Sum of "Load from disk" transactions
loads from expanded memory   Sum of "Load from expanded memory" transactions
loads from extended memory   Sum of "Load from extended memory" transactions

CACHE

discards                     Sum of "Discard from cache" transactions
discards / entries           Discards as percent of (calls + returns)
caches to expanded memory    Sum of "Cache to expanded memory" transactions
caches to extended memory    Sum of "Cache to extended memory" transactions
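
The "discards / entries" fields are simple ratios. As a worked illustration, the sketch below computes the percentage from the three counts; discard_percent is a hypothetical name, and returning 0 for a session with no entries is an assumption made here, not documented TRACE behavior.

```c
/* Discards as a percentage of entries, where entries = calls + returns.
   Returns 0.0 when there are no entries (an assumption of this sketch). */
double discard_percent(unsigned long discards,
                       unsigned long calls,
                       unsigned long returns)
{
    unsigned long entries = calls + returns;

    if (entries == 0)
        return 0.0;
    return 100.0 * (double)discards / (double)entries;
}
```

For example, 25 discards against 60 calls and 40 returns gives 25 percent. A high value suggests that overlays are being swapped often relative to the number of interoverlay transfers.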

TRACE Errors

TRACE issues the following errors and warnings.

TR1001 Invalid filename for /EXE

The string specified with the /EXE option was not a valid filename.

TR1005 Missing filename for /EXE

The /EXE option must be followed by a colon and a filename, with no spaces in between.

TR1007 Unrecognized option

The command line contained an option specifier, either a forward slash (/) or a dash (–),
followed by a string that was not recognized as a TRACE option.

TR1010 Cannot find trace file

One of the following occurred:

• A trace file was specified on the command line, but the specified file does not
exist.

• No trace file was specified on the command line and TRACE assumed a trace file
called MOVE.TRC, but MOVE.TRC does not exist.

TR1011 Error opening/reading .EXE file

TRACE either failed to find the executable file specified with /EXE or encountered an
error while opening the file.

TR1012 Out of memory

The available memory is insufficient for the size of the program being traced.

TR1013 Invalid debugging information

The debugging information contained in the executable file was not packed using
CVPACK version 4.0.

TR4001 Cannot find function name

TRACE could not find a function name to display. TRACE continues to generate output
without displaying the function name.

Function names are displayed when the /EXE option is specified. Either the executable
file contains corrupt debugging information or a module in the executable file was
compiled without the /Zi option for including debugging information.

TR4002 Missing debugging information for module

TRACE could not find a symbol to correspond to a given physical address. A module
may have been compiled without the /Zi option for including debugging information.

Microsoft Windows and the C Compiler Options

Dale Rogerson
Microsoft Developer Network Technology Group

Mr. Rogerson is widely known for having reported the largest number of duckbilled
platypus sightings in the greater Seattle area.

Created: May 5, 1992

Click to view or copy the Zusammen sample application files for this technical article.

Abstract
One of the key issues in the development and design of commercial applications is
optimization—how to make an application run quickly while taking up as little memory
as possible. Although optimization is a goal for all applications, the Microsoft®
Windows™ graphical environment presents some unique challenges. This article
provides tips and techniques for using the Microsoft C version 6.0 and C/C++ version 7.0
compilers to optimize applications for Windows. It discusses the following optimization
techniques:

• Using compiler options

• Optimizing the prolog and epilog code

• Optimizing the calling convention

• Aliasing (using the /Ow and /Oa options)

General Optimization Strategies


Optimization is a battle between two forces: small size and fast execution. As with most
engineering problems, deciding which side to take is never easy. The following
guidelines will help you optimize your applications for the Microsoft® Windows™
graphical environment.

• If your application runs in real mode, always optimize for size. Memory is the
limiting resource in real mode, and using too much of it costs speed as well as
space.

• Memory is not as scarce in protected mode (that is, in standard and enhanced
modes) as it is in real mode, so you must decide whether to optimize for speed or
for size. However, as users start running multiple programs simultaneously,
memory becomes scarce. The rule of thumb for both Windows and other
operating environments is to optimize for speed the 10 percent that runs 90
percent of the time. Tools such as the Microsoft Source Code Profiler help
determine where optimizations should be made.

• Because Windows is a visual interactive environment, several shortcuts help
identify areas that need speed optimization. Any code that displays information
directly on the screen, including code that responds to WM_PAINT,
WM_CREATE, and WM_INITDIALOG messages, should be optimized. A dialog
box does not appear until the WM_INITDIALOG process is complete, so the user
must wait. Speed is not as critical in other areas because the user can move the
mouse only so fast. In most situations, the code underlying the selection processes
in a dialog box need not be optimized.

Note The Microsoft C version 6.0 compiler precedes most function modifiers with a
single underscore (_), for example, _loadds, _export, _near, _far, _pascal, and
_cdecl. The Microsoft C/C++ version 7.0 compiler uses two underscores (__) for ANSI
C compatibility but recognizes the single underscore for backward compatibility. This
article uses C version 6.0 compiler syntax except when discussing features available only
in C/C++ version 7.0.

The Sample Application: Zusammen


The sample application, Zusammen, illustrates the use of the compiler options.
Zusammen, which means together in German, scrolls two different windows
simultaneously. To scroll, the user selects the windows with the mouse and clicks
Zusammen's scroll bars. This makes it easy to compare data in two different windows or
applications.

Zusammen consists of a program generated by MAKEAPP and a dynamic-link library
(DLL) called Picker. MAKEAPP is a sample program included in the Windows version
3.1 Software Development Kit (SDK). The Picker DLL selects the windows to be
scrolled.

The make files for Zusammen and Picker are combined for simplicity. All functions are
classified as local, global, entry point, or DLL entry point and declared with an
appropriate #define statement, for example:

void LOCAL cleanup(HWND hwndDlg);
BOOL DLL_ENTRY Picker_Do(HWND, LP_PICKER_STRUCT);

• A local function is a function called from within a segment.

• A global function is a function called from outside a segment.

• An entry point is a function that Windows calls.

• A DLL entry point is a DLL function that a client application calls.

For demonstration purposes, the symbols are defined in the make files. Using symbols
facilitates switching memory models and optimizing applications. You can also port
applications to flat-model environments easily by using #define NEAR and #define FAR
(from WINDOWS.H) instead of __near and __far. Some possibilities are:

#define LOCAL NEAR
#define DLL_ENTRY FAR PASCAL

or:

#define LOCAL NEAR PASCAL
#define DLL_ENTRY FAR PASCAL __loadds
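
To illustrate why the symbols ease porting, the sketch below builds the same declarations on a flat-model target, where NEAR and FAR expand to nothing. It is a sketch under assumptions: the #ifndef fallbacks stand in for WINDOWS.H (including an empty PASCAL, which a real header may define differently), and clamp_to_byte and Sample_Do are hypothetical functions, not part of Zusammen or Picker.

```c
/* Stand-ins for WINDOWS.H on a flat-model target (assumption of this sketch). */
#ifndef NEAR
#define NEAR
#endif
#ifndef FAR
#define FAR
#endif
#ifndef PASCAL
#define PASCAL
#endif

/* Classification symbols, normally defined in the make file. */
#define LOCAL     NEAR
#define DLL_ENTRY FAR PASCAL

/* Hypothetical functions, named only for this sketch. */
int LOCAL clamp_to_byte(int value);
int DLL_ENTRY Sample_Do(int value);

/* Local function: called only from within its own segment. */
int LOCAL clamp_to_byte(int value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return value;
}

/* DLL entry point: the function a client application would call. */
int DLL_ENTRY Sample_Do(int value)
{
    return clamp_to_byte(value);
}
```

Because the source refers only to LOCAL and DLL_ENTRY, switching memory models or calling conventions means changing two definitions rather than every prototype.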

The Solution
Tables 1 through 3 show options recommended for general use. These options can be
used as defaults in make files because they do not require changes to the source code to
compile correctly. Each table shows the options for building an application and a DLL
and differentiates between the debugging (development) phase and the released product.
The options in Table 1 apply to applications or libraries that run in real mode; the options
in Tables 2 and 3 apply to applications or libraries that run only in protected mode. Table
3 is for C/C++ version 7.0 use only.

The developer must choose either the /Ot option to optimize for speed (time) or the /Os
option to optimize for size. The C version 6.0 compiler defaults to /Ot. The C/C++
version 7.0 compiler defaults to /Od, which disables all optimizations and enables fast
compiling (/f).

The /Oa and /Ow options do not appear in the tables; both options assume no aliasing
and require that the C source meet certain conditions to work properly. These two options
are discussed in the "Aliasing and Windows" section. In general, use /Ow instead of /Oa
for Windows-based applications. You can turn the no-aliasing assumption on and off
using #pragma optimize with the a or w switch.

Another option that is not included in the tables is the optimized prolog/epilog option
/GW. In C version 6.0, this option generates code that does not work in real mode; it is
fixed in C/C++ version 7.0. For backward compatibility, the C/C++ version 7.0 /Gq
option generates the same prolog/epilog as the C version 6.0 /GW switch. Although the
fixed /GW option results in a smaller prolog for non-entry-point functions, better
optimizations are available for protected-mode applications, as discussed in the next
section.

Table 1. Compiler Options for Real Mode (C 6.0 and C/C++ 7.0)

The General Solution for Protected Mode

If your application runs only in protected mode, you can use the additional optimization
options shown in the second row of Table 2. Make1 demonstrates the use of these
options, which are safe for all modules in a protected-mode application.

You can realize additional savings in space and time by compiling modules without entry
points separately from those with entry points. Use the options in the third row of Table 2
for modules without entry points. Make2 demonstrates the use of both sets of options.
The Zusammen sample application is already set up with far calls and entry points in
separate C files. This application should run only in protected mode, so you should
compile with the resource compiler (RC) /T option to ensure that the application never
runs in real mode.

DLLs can benefit from the techniques presented in the "Optimized DLL Prolog and
Epilog" section. These techniques work with both C version 6.0 and C/C++ version 7.0.

Table 2. Compiler Options for Protected Mode Only (C 6.0 and C/C++ 7.0)

The General Solution for Protected Mode and C/C++ 7.0

The C/C++ version 7.0 compiler includes special optimizations for protected-mode
Windows programs (see Table 3). These special optimizations include /GA (for
applications), /GD (for DLLs), and /GEx (to customize the prolog) and help reduce the
amount of overhead the prolog/epilog code causes. The /GA and /GD options add the
prolog and epilog code only to far functions marked with __export instead of compiling
all far functions with the extra code. With __export, entry points need not be placed in a
separate file as required by C version 6.0.

Applications that do not mark far functions with __export can use the /GA /GEf or /GD
/GEf options to generate the prolog/epilog code for all far functions. /GEe causes the
compiler to export the functions by emitting a linker EXPDEF record. By default, /GD
emits the EXPDEF record but /GA does not. Applications compiled with /GA usually do
not need the EXPDEF record. Only real-mode applications need /GEr and /GEm;
protected-mode applications have no use for these options. The following options
generate equivalent prolog/epilog code:

• /GA is equivalent to /GA /GEs /D_WINDOWS.

• /GD is equivalent to /GD /GEd /GEe /Aw /D_WINDOWS /D_WINDLL.

Table 3. Compiler Options for Protected Mode (C/C++ 7.0 Only)

Overview of Compiler Options


Generate Intrinsic Functions (/Oi)

The /Oi option replaces often-used C library functions with equivalent inline versions.
This replacement saves time by removing the function overhead but increases program
size because it expands the functions.

In C version 6.0, the /Oi option is not recommended for general use because it causes
bugs in some situations, especially when DS != SS. Using #pragma intrinsic to
selectively optimize functions reduces the chance of encountering a bug.

The ZUSAMMEN.C module of the sample application demonstrates the use of #pragma
intrinsic. Although this particular use does not drastically increase program speed, it does
demonstrate the right ideas: it speeds up WM_PAINT processing, and it is applied to a
function that is called three times per WM_PAINT message. The best savings occur when
the intrinsic function is in a loop or is called frequently.
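
A minimal sketch of selective intrinsic use follows. It does not reproduce the code in ZUSAMMEN.C; total_length is a hypothetical function, and the pragma is guarded with _MSC_VER so that other compilers simply skip it.

```c
#include <stddef.h>
#include <string.h>

/* Request the inline form of strlen for this module only, instead of
   compiling every module with /Oi. */
#if defined(_MSC_VER)
#pragma intrinsic(strlen)
#endif

/* Hypothetical example: strlen is called inside a loop, which is where
   an intrinsic is most likely to pay off. */
size_t total_length(const char *const *items, size_t count)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += strlen(items[i]);
    return total;
}
```

Listing only the functions that matter keeps the size increase confined to the modules where the speed gain is worth it.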

Pack Structure Members (/Zp)

The /Zp option controls storage allocation for structures and structure members. To save
as much memory as possible, Windows packs all structures on a 1-byte boundary.
Although this saves memory, it can result in performance degradation. Intel® processors
work more efficiently when word-sized data is placed in even addresses. An application
must pack Windows structures to communicate successfully with Windows, but it need
not pack its own structures. Because Windows structures are prevalent, it is better to
compile with the /Zp option and use #pragma pack on internal data structures. Passing
an improperly packed structure to Windows can lead to problems that are difficult to
debug. Both Zusammen and Picker use #pragma pack on their internal data structures.
(See the FRAME.H, APP.H, and PACK_DLL.H modules.)
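
The effect of packing can be seen by comparing member offsets. The sketch below is illustrative only: the structure names are hypothetical, #pragma pack(1) is used here to imitate the 1-byte packing that Windows structures require, and the sizes in the comments assume a 2-byte unsigned short.

```c
#include <stddef.h>

#pragma pack(1)                 /* 1-byte packing, as Windows structures use */
struct WindowsStyleRec {
    unsigned char  flag;
    unsigned short count;       /* lands at odd offset 1 */
};

#pragma pack(2)                 /* internal structure: word-align members */
struct InternalRec {
    unsigned char  flag;
    unsigned short count;       /* lands at even offset 2 */
};

#pragma pack()                  /* return to the packing set on the command line */
```

The packed structure occupies 3 bytes and the word-aligned one 4 bytes; the extra byte buys an even address for the word-sized member.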

Note that PICKER.DLL packs PICKER_STRUCT. Because most Windows-based
applications pack structures, it is safer to leave DLL structures packed. In most cases, the
speed optimization is not worth the extra trouble of documenting the unpacked functions,
especially if the DLL will be used with other languages or products, such as Microsoft
Visual Basic™ or Microsoft Word for Windows.

Set Warning Level (/W3)

All Windows-based programs should be compiled at warning level 3. You can fix many
hard-to-detect bugs by removing the warnings that appear during compilation. It is less
expensive to fix a warning message than to ship a bug fix release to unsatisfied users. All
applications should be run in Windows debug mode before release.

Compile for Debugging (/Zi) and Disable Optimizations (/Od)

It is often easier to turn off optimizations to debug a module. Some optimizations can
introduce bugs into (or remove bugs from) otherwise correct programs. For this reason,
an application must be fully tested with release options, and all developers and testers
should be aware of the options used.

Stack Checking (/Gs)

By default, the compiler generates code to "check the stack"; that is, each time a function
is called, chkstk (actually _aNchkstk) compares the available stack space with the
additional amount the function needs. If the function requires more space than is
available, the program generates a run-time error message. Table 4 (below, under
"Examining the Prolog and Epilog Code") shows the call to chkstk, which is removed by
compiling with /Gs. Stack checking adds significant overhead, so it is usually disabled
with the /Gs option after sufficient testing. It is usually a good idea to reenable stack
checking on recursive functions with the check_stack pragma.
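
As a sketch of that advice, the pragma can bracket a single recursive function so that it keeps its stack probe even when the module is compiled with /Gs. The function sum_to is hypothetical, and the pragma is guarded with _MSC_VER so that other compilers ignore it.

```c
#if defined(_MSC_VER)
#pragma check_stack(on)         /* keep stack probes for the function below */
#endif

/* Hypothetical recursive function: stack use grows with n, so a probe
   on each call is worth its cost here. */
unsigned long sum_to(unsigned int n)
{
    if (n == 0)
        return 0UL;
    return (unsigned long)n + sum_to(n - 1);
}

#if defined(_MSC_VER)
#pragma check_stack()           /* revert to the command-line setting */
#endif
```

Functions outside the bracketed region still follow the /Gs setting, so the overhead is paid only where deep recursion makes an overflow plausible.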

#define Statements (/DSTRICT, /D_WINDOWS, /D_WINDLL)

The #define statements /DSTRICT, /D_WINDOWS, and /D_WINDLL are
recommended for all Windows-based applications. Using /DSTRICT with
WINDOWS.H results in a more robust and type-safe application. /DSTRICT lets you
use macros to replace Windows functions such as GetStockObject with type-safe
versions such as GetStockBrush and GetStockPen.

The C header files use /D_WINDOWS and /D_WINDLL to determine the correct
prototypes and typedefs to include. /D_WINDLL ensures that using an invalid library
function in a DLL generates an error. The C/C++ version 7.0 compiler /GA option
automatically sets /D_WINDOWS; the /GD option sets both /D_WINDOWS and
/D_WINDLL.

Optimizing the Prolog and Epilog


Programs designed for Windows, unlike those designed for MS-DOS®, have special
sections of code called the prolog and epilog added to entry points. For this reason,
Windows uses special compilers. When you compile a program with the /Gw option, all
far functions receive the extra prolog and epilog code and increase in size by about 10
bytes. You can take the following steps to reduce this overhead, especially for protected-
mode-only applications:

• Reduce the number of far calls.

• Reduce the prolog and epilog code.

Reducing the Number of Far Calls

Because /Gw adds the extra code only to far functions, reducing the number of far
functions is a good way to trim program size. In the small memory model, all functions
are near unless explicitly labeled as far, so reducing far calls is not a problem. In the
medium memory model, all functions default to far and therefore receive the extra prolog
and epilog code. In C version 6.0, you can use two methods to reduce this overhead:

• Organize source modules. Label all functions explicitly as either near or far, and
compile with the medium model.

• Use mixed-model programming with small model as the base.

C/C++ version 7.0 users do not need either of these methods; they can use the /GA and
/GD options to add prolog/epilog code only to far functions marked with __export. Other
far functions are compiled without additional overhead. To add the prolog and epilog
code to all far functions, use /GA /GEf or /GD /GEf.

Organizing source modules

To reduce the number of far calls, you must organize source modules carefully. Each
module is divided into internal functions and external functions. Internal functions are
called only from within the module; external functions are called from outside the
module. As a direct result of this arrangement, internal functions are marked near and
external functions are marked far.

The Zusammen sample application is arranged in this manner. Each module has a header
file that prototypes all external functions as far. Each source file prototypes its internal
functions as near because they are not needed outside the module.

For large applications, you can use a tool such as MicroQuill's Segmentor to determine
the best segmentation to use. You can also organize source modules manually, but the
process must be repeated whenever the source file changes.

Another method for reducing far call overhead is to use the FARCALLTRANSLATION
and PACKCODE linker options. This method works exclusively on protected-mode-
only applications and should not be used in real mode. PACKCODE combines code
segments. You can specify the size of the segments to pack on the command line (for
example, /PACKCODE:8000). The default size limit is 65530 bytes. C/C++ version 7.0
turns PACKCODE on by default for all segmented executables. If a far function is called
from the same segment, FARCALLTRANSLATION replaces the far call with a near
call.

Mixed-model programming

In mixed-model programming, the small model acts as the base. All far functions are
explicitly labeled as in the previous method. Each module is compiled with the /NT
option, which places the module in a different segment, for example:

cl /c /Gw /Od /Zp /W3 /NT _MOD1 mod1.c


cl /c /Gw /Od /Zp /W3 /NT _MOD2 mod2.c

Because the small model is used, all other functions default to near, and presto! No far
call overhead. The SDK Multipad sample application uses this method for
compiling, although many of its near functions are explicitly labeled near. Make3 compiles
Zusammen using this method.

In practice, this method does not save much work—it only eliminates the need to label
near functions explicitly. However, labeling near functions is useful for documenting
local and global functions.

In mixed-model programming, only functions in the default _TEXT code segment can
call the C run-time library. Multipad avoids this limitation by not calling any C run-time
library functions. Mixed-model programming uses the small-model C library, which is
placed in the _TEXT segment. Because these library routines are based in small model,
they assume all code is near. If a C library function is called from a different segment, a
linker fixup error occurs because the linker cannot resolve a near jump into another
segment. There is no convenient way to avoid this restriction.

Removing the C run-time library

If your application, like Multipad, does not call the C run-time library, you need not link
to it. The Windows version 3.1 SDK includes libraries named xNOCRTW.LIB that do not
contain any C run-time functions. Each memory model has one such library containing the
minimum amount of code needed to resolve all compiler references. Using this library
saves about 1.5K from the _TEXT code segment size and about 500 bytes from the default
data segment size. Linking time also improves slightly. When using the xNOCRTW.LIB
libraries, note that some operations that seem ordinary (such as long multiplication) may
rely on code in the standard C libraries.

Examining the Prolog and Epilog Code

Decreasing the number of far functions is only part of the battle. Not all far functions
need the full prolog and epilog code, as the existence of the /GW, /GA, and /GD options
shows. The C/C++ version 7.0 /GA and /GD options provide the best achievable
optimizations of the prolog and epilog code. The C version 6.0 /GW option provides an
optimized version of the prolog/epilog code for far functions that are not entry points.
However, when armed with a little knowledge, the C version 6.0 compiler user can
generate better results for protected-mode applications than those the /GW option
provides, as discussed in the following sections.

What does the prolog/epilog code do anyway?

The prolog/epilog code sets the DS register to the correct value to compensate for the
existence of multiple data segments and their movements. The second column of Table 4
shows the assembly-language listing of the prolog/epilog code that every far function
receives when it is compiled with /Gw. The last column shows the prolog/epilog code
that near functions receive. This is the same code that far functions contain when they are
not compiled with /Gw.

Table 4. Assembly Listing of Prolog and Epilog Code (C 6.0)

C/C++ version 7.0 provides additional optimizations for real mode, even if you use the
/Gw and /GW options. These optimizations include:

• Using mov ax,ds instead of a push/pop sequence in the Preamble phase.

• Using lea sp, WORD PTR -2[bp] for the Release Frame phase.

Table 5 shows the compiler output for these options.

Table 5. Assembly Listing of Prolog and Epilog Code (C/C++ 7.0)

Most of the prolog/epilog code is not needed in protected mode but is essential for real
mode. The /GW option does not have the push ds instruction that all far functions
require in real mode to save the data segment; for this reason, /GW does not work in real
mode. Not much can be done to optimize the prolog/epilog code that C version 6.0
generates for real-mode applications, so this article focuses on optimization in protected
mode only. For more information on what happens during real mode, see Programming
Windows by Charles Petzold (Redmond, Wash.: Microsoft Press, 1990). For the compiler
writer's viewpoint, see the Windows version 3.1 SDK Help file.

The order of phases in the C/C++ version 7.0 compiler options /GA and /GD differs
slightly from that of /Gw: The Alloc Frame phase occurs before the Save DS and Load
DS phases (when compiling without /G2). As a result, the /GA and /GD options remove
the two dec bp instructions from the Release Frame phase. The compiler output for the
/GA and /GD options is shown in Table 6.

Table 6. Assembly Listing of Prolog and Epilog Code (C/C++ 7.0)

Protected mode only

The Mark Frame and Unmark Frame phases are not needed during protected mode and
can be ignored. The prolog/epilog code for a near function and the prolog/epilog
compiled with /Gw differ in four phases: Preamble, Save DS, Load DS, and Restore DS.
The other phases—Link Frame, Alloc Frame, Release Frame, and Unlink Frame—are the
same; they set up the stack frame for the function. (See Figure 1.)

Figure 1. Stack Frame Creation

The compiler generates code to access the parameters passed to the function using
positive offsets to BP ([BP + XXXX]). Negative offsets from BP ([BP – XXXX]) access
the function's local variables. This happens for all C functions—near functions, far
functions, and functions compiled with the /Gw option.

Optimizing for 80286 processors (/G2)

Because protected mode requires an 80286 processor at the minimum, you should use
some of the special 80286 instructions through the /G2 option. Two instructions, enter
and leave, are relevant to our current discussion. The enter instruction performs the same
function as Link Frame and Alloc Frame, and leave performs the same function as Release
Frame and Unlink Frame. Table 7 shows the prolog/epilog code for near and far functions
compiled with the /G2s option and without the /Gw option.

Table 7. Assembly Listing of Prolog/Epilog Code Compiled with /G2s

Unfortunately, the /Gw option overrides the /G2 option in C version 6.0 and generates the
prolog/epilog code without the enter and leave instructions. The C/C++ version 7.0
compiler corrects this limitation; it generates Windows prolog/epilog code with the enter
and leave instructions when it compiles with /GA or /GD and /G2. Table 8 shows the
prolog/epilog code for functions compiled with C/C++ version 7.0 options.

Table 8. Assembly Listing of Prolog and Epilog Code for C/C++ 7.0 (Protected Mode
Only)

The prolog preamble's purpose

The Preamble, Save DS, Load DS, and Restore DS phases exist only when you compile a
far function with a Windows option (/Gw, /GW, /GD, or /GA). Programs developed for
Windows, unlike those developed for MS-DOS, can have multiple instances, each with
its own movable default data segment. When control is transferred from Windows to an
application or from an application to a DLL, a mechanism is needed for changing DS to
point to the correct default data segment. This mechanism consists of the prolog/epilog
code, the Windows program loader, the EXPORT section of the DEF file (or _export),
and the MakeProcInstance function.

Nothing seems to happen in the Preamble, Save DS, and Load DS phases:

push ds        ; move ds into ax
pop ax
nop            ; now ax = ds
push ds        ; save ds
mov ds,ax      ; ds = ax, but ax = ds
               ; therefore ds = ds

It seems like a lot of work to set DS equal to itself. However, a lot happens behind the
scenes. Examining the code with the Microsoft CodeView® debugger reveals three
Preamble variations that differ from the listing the /Fc compiler option generates (see
Make4). The Client_WinProc (in WCLIENT.C), Client_Initialize (in CLIENT.C), and
Picker_Do (in PICKER.C) functions demonstrate these variations. Table 9 lists them.

Table 9. Preamble Variations

The Windows program loader magically changes the Preamble phase of the prolog. The
loader first examines the list of exported functions when it loads a program. When it finds
an entry-point function with the /Gw preamble, it changes the preamble. If the function is
not exported or the preamble is different, the loader leaves it alone, and DS retains its
value. For example, in Client_Initialize, the DS register does not have to be changed so
it is not.

If the function is part of a single-instance application, the value can be set directly
because single-instance applications have only one data segment. Because DLLs are
always single instance, they belong to this group. AX is set directly to DGROUP. In the
Load DS phase, DS is loaded with the DGROUP value from AX, resulting in a correct
DS value for the function.

In exported far functions, as demonstrated by Client_WinProc, Windows removes the
entire preamble but still loads DS from AX during the Load DS phase. So where does
AX get its value? It depends on how Windows calls the function. For all window
procedures, including Client_WinProc, Windows sets up AX correctly before calling the
procedure.

That leaves callbacks such as those used with the EnumFontFamilies function. You can
set up an EnumFontFamilies callback as follows:

FARPROC lpCallBack;

lpCallBack = MakeProcInstance(CallBack, hInstance);
EnumFontFamilies(hdc, NULL, lpCallBack, (LPSTR)NULL);
FreeProcInstance(lpCallBack);

MakeProcInstance creates an instance thunk, which is basically a jump table with an
added capability: setting AX. Instance thunks appear as follows:

mov ax,XXXX
jmp <actual function address> ;jump to actual function

The return value of MakeProcInstance is the address of the instance thunk. This address
is passed to EnumFontFamilies, which calls the instance thunk instead of the function
itself. The instance thunk sets up AX with the current address of the data segment; in real
mode, Windows updates this address each time it moves the data segment. The thunk then
jumps to the function, which loads DS with the value in AX. And presto! chango! DS has
the correct value.
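
The thunk mechanism can be modeled in portable C. The sketch below is a simulation only,
not Windows code: InstanceThunk, make_thunk, call_thunk, and font_callback are
hypothetical names, and the g_ax and g_ds variables merely stand in for the AX and DS
registers.

```c
/* Simulated registers (hypothetical stand-ins for AX and DS). */
static unsigned g_ax;
static unsigned g_ds;

typedef int (*Callback)(int);

/* Simulated instance thunk: "mov ax,XXXX" followed by "jmp <function>". */
typedef struct {
    unsigned data_segment;   /* the XXXX value stored in the thunk */
    Callback target;         /* the actual function address */
} InstanceThunk;

/* Stand-in for MakeProcInstance: bind a function to one instance's data. */
static InstanceThunk make_thunk(Callback fn, unsigned data_segment)
{
    InstanceThunk t;
    t.data_segment = data_segment;
    t.target = fn;
    return t;
}

/* Stand-in for calling through the thunk address. */
static int call_thunk(const InstanceThunk *t, int arg)
{
    g_ax = t->data_segment;  /* mov ax,XXXX */
    return t->target(arg);   /* jmp <actual function address> */
}

/* Models an exported far function whose prolog loads DS from AX. */
static int font_callback(int arg)
{
    g_ds = g_ax;             /* Load DS phase */
    return (int)g_ds + arg;  /* pretend to use per-instance data */
}
```

Two thunks made from font_callback with different data segment values share the same
function code, yet each call leaves g_ds set to the value stored in the thunk that was
used. This is how one callback can serve several instances.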

This discussion leads to some interesting conclusions:

• An application cannot call an exported far function directly; it must use the result
of MakeProcInstance as a function pointer instead.

• An application should not use MakeProcInstance when calling a function in a
DLL.

• DLLs should not call MakeProcInstance on any exported far function that
resides inside the DLL.

• Nonexported far functions do not need the prolog/epilog code.

• Windows sets up the AX register as part of its message-passing mechanism.
Window procedures do not have instance thunks.

• There are no obvious optimizations.

FixDS (/GA and /GEs)

FixDS by Michael Geary is a public domain program available on CompuServe® that
brings insight and imagination to the optimization process. Borland® C++ and Microsoft
C/C++ version 7.0 both incorporate this feature. Under Microsoft C/C++ version 7.0 you
can use /GA to perform the same function as FixDS (see Tables 6 and 8).

So far, we have not discussed the SS stack segment register. The prolog code does not set
SS anywhere. This must mean that the Windows Task Manager sets SS before the
function is executed. Because a Windows-based application is not normally compiled
with the /Au or /Aw option, SS == DS. So there is no reason why DS cannot be loaded
simply from SS.

Instead of pushing DS into AX, FixDS modifies the prolog to put SS into AX, which is
eventually placed in DS (see the fourth column of Table 10). This preamble differs from
the standard Windows preamble, so the Windows loader does not modify it.

This method has two convenient side effects:

• You no longer need MakeProcInstance.

• You do not have to export entry points.

FixDS does not work for DLLs because DS != SS.


Table 10. Assembly Listing of Optimized Prolog and Epilog

The C/C++ version 7.0 compiler extends the ideas of FixDS by letting the programmer
specify where DS gets its value. You can use the /GEx option in conjunction with the
/GA and /GD options to load DS. The following options are available:

• /GEa—Load DS from AX. This is equivalent to /Gw and /GW.

• /GEd—Load DS from DGROUP. This is the default behavior for /GD and is
useful for DLLs, as explained in the next section.

• /GEs—Load DS from SS. This is equivalent to FixDS and is the default behavior
for /GA.

When you compile an application with /GA, the functions marked with __export are not
really exported (you can look at the exported functions with EXEHDR). If you compile
the program with /GA /GEe, the EXEHDR listing shows all exported functions. A
program that you compile with /GA loads DS from SS and does not need to export its
entry points, as mentioned above. A program compiled with /GA /GEa should normally
be compiled with /GEe.

The /GD and /GA options work differently. The /GD option exports functions marked
with __export. To stop the compiler from exporting functions in a DLL, use /GA /GEd
/D_WINDLL /Aw instead of /GD.

Optimized DLL Prolog and Epilog

Although the previous recommendations (excluding FixDS) work fine with DLLs, a
better optimization method exists. To optimize a DLL with C version 6.0, compile all
DLL modules with the options listed in Table 2 for modules without entry points:

/Aw /G2 /Gs /Oclgne /W3 /Zp

This compilation does not generate prolog or epilog code because the /Gw option is not
used. To load DS correctly, mark all entry-point functions with _loadds. Place the
functions that the client application calls in the DEF file. This changes the prolog/epilog
code to match the second column of Table 10.

_loadds basically adds the same lines that the Windows function loader changes in the
Preamble for a DLL. See Make5 for an example of this method. Again, this is for
protected-mode-only applications.

The /GD option in C/C++ version 7.0 defaults to loading DS from the default data
segment (see the third column of Table 10). The /GD option also sets _WINDLL and
/Aw.

Notice that the compiler options include /Aw but not /Au. The /Aw option informs the
compiler that DS != SS. The /Au option is equivalent to /Aw and a _loadds on every
function, far and near. This is not an optimization because even near functions receive the
three lines of code that set up the DS register.

Using _loadds does not work for applications that have multiple instances and therefore
multiple DGROUPs. It does, however, work for single-instance applications. A single-
instance application need not export functions because the application passes function
addresses to Windows. The application should make sure that another instance cannot
start by checking the value of hPrevInstance. If a second instance starts, Windows creates
a new data segment for it, but the application contains hard-coded pointers to the first
data segment.
The application should also set up a single data segment in the DEF file as:

DATA SINGLE MOVEABLE

Otherwise, the _loadds function modifier will generate warnings. There is no need to use
MakeProcInstance because the _loadds function modifier sets up the DS register
correctly.

EXPORT vs. _export

In the previous examples, the functions are exported in the DEF file. You can also use the
_export keyword to export DLL functions. This method has some drawbacks, depending
on the method you use to link the application with the DLL. There are three methods:

• Including an IMPORTS line in the DEF file

• Using the IMPLIB utility

• Linking explicitly at run time

Including an IMPORTS line in the DEF file

Including an IMPORTS line in the DEF file of the application, for example:

IMPORTS
PICKER.Picker_Do

although inconvenient for DLLs with many functions, allows you to rename functions,
for example:

IMPORTS
PickIt = PICKER.Picker_Do

Now the application can call PickIt instead of Picker_Do. This is useful when DLLs
from different vendors use the same function name and when you import a function
directly by its ordinal number. The linker gives each exported function an ordinal number
to speed up linking by eliminating the need to search for the function. You can override
the default ordinal number by specifying a number after an "at" sign (@) in the DLL's
DEF file, for example:

; DLL .DEF
EXPORTS
Picker_Do @1

An application can import this function with the following DEF file entry:

; Apps .DEF
IMPORTS
PickIt = PICKER.1

DLLs should always include ordinal numbers on exported functions.

Using the IMPLIB utility

Most programmers use the IMPLIB utility instead of an IMPORTS line in their DEF
files. IMPLIB takes the DEF file of a DLL or, if _export is used, takes the DLL itself and
builds a LIB file. The application links with the LIB file to resolve the calls to the DLL.
Therefore, the IMPORTS line is not needed.

One of the drawbacks of _export is that it assumes linking by name instead of linking by
ordinal number. As a result, the linker gives the function an ordinal number and the
function name is placed in the Resident Name Table.

The linker is not likely to assign the same number each time it links the program. For
example, the output of the EXEHDR program for a program with two exported functions
may originally look like this:

Exports:
ord  seg  offset  name
  1    1  07a1    WEP                exported, shared data
  4    1  0e06    ___EXPORTEDSTUB    exported, shared data
  3    1  00ac    PICKER_OLDDLGPROC  exported, shared data
  2    1  0061    PICKER_DO          exported, shared data

Adding a third exported function to the program may change all the ordinals in the
EXEHDR output, for example:

Exports:
ord  seg  offset  name
  1    1  07a1    WEP                exported, shared data
  3    1  0e06    ___EXPORTEDSTUB    exported, shared data
  4    1  0f00    NewFunction        exported, shared data
  2    1  00ac    PICKER_OLDDLGPROC  exported, shared data
  5    1  0061    PICKER_DO          exported, shared data

Applications that use any method of ordinal linking must now be recompiled to use the
new ordinals. You may also have to recompile if you use the EXPORTS statement without
explicitly giving ordinal numbers. Having to recompile an application each time the DLL
changes negates many of the advantages of using DLLs.

Linking by name also results in function names being placed in the Resident Name Table,
which is an array of function addresses indexed by function name. The Resident Name
Table stays in memory for the life of the DLL. When linking by ordinal number, the
function names reside on disk in the Non-Resident Name Table while an array of function
addresses indexed by ordinal number resides in memory.

For a large DLL, the Resident Name Table could consume a significant amount of
memory. Also, linking by name is much slower than linking by ordinal number because
Windows must perform a series of string comparisons to find the function in the table.
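
The cost difference can be sketched in portable C. The tables below are a simplified
model, not the actual layout that Windows uses; NameEntry, name_table, ordinal_table,
find_by_name, and find_by_ordinal are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef int (*ExportedFn)(void);

static int picker_do(void)         { return 1; }
static int picker_olddlgproc(void) { return 2; }

/* Hypothetical model of a name table entry: one name per export. */
typedef struct {
    const char *name;
    ExportedFn  fn;
} NameEntry;

static const NameEntry name_table[] = {
    { "PICKER_DO",         picker_do },
    { "PICKER_OLDDLGPROC", picker_olddlgproc },
};

/* Entry table indexed by ordinal; slot 0 is unused in this sketch. */
static const ExportedFn ordinal_table[] = { NULL, picker_do, picker_olddlgproc };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Linking by name: one string comparison per entry until a match is found. */
static ExportedFn find_by_name(const char *name)
{
    size_t i;
    for (i = 0; i < COUNT(name_table); i++) {
        if (strcmp(name_table[i].name, name) == 0)
            return name_table[i].fn;
    }
    return NULL;
}

/* Linking by ordinal: one bounds check and one array index. */
static ExportedFn find_by_ordinal(unsigned ordinal)
{
    if (ordinal == 0 || ordinal >= COUNT(ordinal_table))
        return NULL;
    return ordinal_table[ordinal];
}
```

In this model, find_by_name("PICKER_DO") and find_by_ordinal(1) return the same
function, but the ordinal lookup never touches a string and needs no name table in
memory.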

Linking explicitly at run time

Run-time dynamic linking occurs when a function call is resolved at run time instead of
load time. For example:

HANDLE hLib ;
FARPROC lpfnPick ;
// Get library handle.
hLib = LoadLibrary("PICKER.DLL") ;
// Get address of function.
lpfnPick = GetProcAddress(hLib, "Picker_Do") ;
// Call the function.
(*lpfnPick) (hwnd, &aPicker ) ;
// Free the library.
FreeLibrary( hLib) ;

Linking by name does not use the ordinal number of the function. When linking by name
it is much faster to have the function name in the Resident Name Table.

However, using ordinal numbers is still faster and uses less memory. For example:

#define PICKER_DO 3
HANDLE hLib ;
FARPROC lpfnPick ;
// Get library handle.
hLib = LoadLibrary("PICKER.DLL") ;
// Get address of function.
lpfnPick = GetProcAddress(hLib, MAKEINTRESOURCE(PICKER_DO)) ;
// Call the function.
(*lpfnPick) (hwnd, &aPicker ) ;
// Free the library.
FreeLibrary( hLib) ;

The fastest, most flexible method, regardless of the linking method you use, is to
explicitly list the functions with ordinal numbers in the EXPORTS section of the DEF
file. The C/C++ version 7.0 /GD option encourages the use of __export to mark entry
points. If you use this option, we recommend that you add an EXPORTS entry in the DEF
file for all functions that an application calls.

DS != SS issues

Some problems can arise within a DLL because DS != SS. A common problem occurs
when a DLL calls the standard C run-time library. For example, if you compile the
following code with the /Aw option:

void Foo()
{
char str[10]; // allocates str on stack,
strcpy(str,"BAR"); // passing the far pointer as a
// near pointer
}

the compiler generates a near/far mismatch error because strcpy expects str to be in the
default data segment (a near pointer). However, str is allocated on the stack (making it a
far pointer) because the stack segment does not equal the data segment. The following
examples show how to avoid this situation.

• You can place the array in the data segment by making it static:

void Foo2()
{
static char str[10]; // allocate str in data segment
strcpy(str,"BAR");
}

• You can place the array in the data segment by making it global:

char str[10]; // allocate str in data segment

void Foo3()
{
strcpy(str,"BAR");
}

• Instead of linking with the small-model version of strcpy, you can use the
large-model (also called the model-independent) version:

void Foo4()
{
char str[10];
_fstrcpy(str,"BAR"); // accept far pointers
}

This version expects far pointers instead of near pointers and therefore casts the
near pointers into far pointers.

• You can also use the following functions from the Windows library:
• lstrcat

• lstrcmp

• lstrcmpi

• lstrcpy

• lstrlen

• wsprintf

• wvsprintf

If you use one of these functions, the previous example becomes:

void Foo4()
{
char str[10];
lstrcpy(str,"BAR"); // accept far pointers
}

The following code fragment:

void Foo5()
{
char str[10]; // allocated on stack
char *pstr ; // near pointer based on DS

pstr = str ; // loss of segment
strcpy(pstr,"BAR");
}

causes the compiler to generate the warning message:

warning C4758: address of automatic (local) variable taken. DS != SS.

In this example, pstr is set to the offset of str, and the segment is lost because pstr is a
near pointer. Declaring pstr a far pointer eliminates this problem. However, you cannot
pass a far pointer to strcpy so you must use _fstrcpy, which results in the following
corrected code:

void Foo6()
{
char str[10];
char FAR *pstr ; // far pointer

pstr = str ; // no segment loss
_fstrcpy(pstr,"BAR");
}

The following code also prevents the segment loss:

void Foo7()
{
static char str[10]; // DS-based pointer
char *pstr ;

pstr = str ; // no segment loss
strcpy(pstr,"BAR");
}

What happens if the C run-time function does not have a far version? For example, in the
Picker DLL, the picker_OnMouseUp function calls _splitpath, which requires near
pointers. Using static or global structures poses problems for multiple applications that
use Picker simultaneously. To avoid these problems, Picker allocates memory from the
local heap with the LocalAlloc(LMEM_FIXED,size) function, which returns a local
pointer. This is exactly what Picker needs to call _splitpath.

Summary

Follow these guidelines to avoid DS != SS problems:

• Be sure that all pointers you pass to a DLL are far pointers.

• Declare pointers to stack variables as far pointers.

• Declare arrays as static or global.

• Avoid storing arrays on the stack.

• Avoid storing variables referenced by pointers on the stack.

• Use the local heap for storing data.

• Use far versions of C run-time functions (such as _fstrcpy).

• Use equivalent Windows functions (such as wsprintf or lstrcpy).

• Use prototypes on all functions.

Reminders about DLLs:

• FixDS does not work with DLLs because DS != SS.

• Avoid using _export in DLLs with C version 6.0.


• Use the DEF file to override the default behavior of functions marked with
_export.

• Always assign ordinal numbers to all exported DLL functions.

• /Au introduces a considerable amount of overhead; use /Aw and _loadds instead.

• Replace /Gw with _loadds on exported functions.

Optimizing the Calling Convention


Several calling conventions can be used for optimization, including _cdecl (/Gd),
PASCAL (/Gc), and _fastcall (/Gr):

• _cdecl is the default C calling convention and is slightly slower than PASCAL
and _fastcall.

• PASCAL (defined in WINDOWS.H as _pascal) is used to communicate between
Windows and an application. It is faster than _cdecl but does not allow variable
argument functions such as wsprintf.

• _fastcall is the fastest method. It places some of the parameters in registers but
does not support variable argument functions and cannot be used with _export or
PASCAL, so entry points cannot use the _fastcall modifier. Under C/C++ version
7.0, the __fastcall modifier can conflict with the Windows prolog/epilog code if
used in the following combinations:

• __fastcall, __far, /Gw (also invalid in C version 6.0)

• __fastcall, __far, __export, /GA

• __fastcall, __far, __export, /GD

• __fastcall, __far, /GA, /GEf

• __fastcall, __far, /GD, /GEf

• __fastcall, __far, __export, /GA, /GEf

• __fastcall, __far, __export, /GD, /GEf

Because the C run-time library is compiled with the _cdecl convention, you must include
header files such as STDLIB.H and STRING.H when you use a different calling
convention. These header files explicitly mark each function as _cdecl to simplify
changing the default convention. When you use a third-party library, you may have to add
the _cdecl function modifier to the header files.

You can use any calling convention as the default convention for applications, as long as
you declare all entry points FAR PASCAL and declare the WinMain function PASCAL.
Marking callback functions as PASCAL is usually safer, even if you use the /Gc Pascal
convention option, because it avoids problems if the calling convention changes
inadvertently. It is also a good form of code commenting.

Summary of calling conventions:

• WinMain should use the PASCAL calling convention.

• Entry points that Windows calls must be FAR PASCAL.

• Only _cdecl allows variable arguments.

• _fastcall is incompatible with _export or PASCAL and is therefore incompatible
with Windows prolog/epilog code.

DLLs and _cdecl

A DLL, unlike an application, can use any calling convention, even for application-called
entry points. An application that calls a DLL must know which calling convention the
DLL expects and must use that convention.

A DLL may need to implement a variable argument function. Because _cdecl is the only
convention that supports variable arguments, it is the convention of choice. If you want a
DLL function to use variable arguments, use the _cdecl convention instead of the
PASCAL convention.

Note the following caveats when using variable argument lists in DLLs:

• The variable argument macros from STDARG.H use the default pointer size to
point to the arguments that are on the stack. In the small or medium model, the
pointers are near pointers. Because DS != SS, these pointers do not point to the
correct value and must be changed to far pointers before you can use these
macros, as shown in the modified STDARG.H below:

/****************************************************************
* File: wstdarg.h
* Remarks: Macro definitions for variable argument lists
* used in DLLs.
****************************************************************/
typedef char _far *wva_list ;

#define wva_start( ap, v ) (ap = (wva_list) &v + sizeof( v ))
#define wva_arg( ap, t ) (((t _far *)(ap += sizeof( t )))[-1])
#define wva_end( ap ) (ap = NULL)

• When passing arguments by reference, always use far pointer declarations. The
compiler synthesizes far pointers by pushing the DS and the offset of the memory
location onto the stack. This provides the DLL with the proper information to
access the application's data segment.

• Because functions with variable arguments are defined using _cdecl, pointer
arguments that are not declared in the parameter list must be typecast in the
function call; otherwise, the omission of the function parameter prototype causes
unpredictable results. For example:

void FAR _cdecl DebugPrint( LPSTR lpStr, LPSTR lpFmt, ... );

DebugPrint( szValue, "%s, value passed: %d\r\n",
(LPSTR) "DebugPrint() called", (int) 10 ) ;

• When you import or export a function, you must declare it with an underscore (_)
prefix in the DEF file. You must also preserve case sensitivity in the function
name. For example, you can declare the function above as follows:

EXPORTS
WEP @1 RESIDENTNAME
_DebugPrint @2

• _cdecl functions must either be linked by ordinal number or have all-uppercase
names.

Unlike Pascal functions, which are converted to uppercase before they are
exported, _cdecl functions retain their case when exported. The Windows
dynamic-linking mechanism always converts function names to uppercase before
it looks in the DLL for the function. However, functions exported from a DLL are
expected to be in uppercase and are not converted. The result is a comparison
between an uppercase function name and a mixed-case function name. This
comparison, of course, fails. The solution is to declare the function name all-
uppercase or to link by ordinal number and avoid the whole comparison problem.

• Variable argument C run-time library functions such as vsprintf and vfprintf do
not take DS != SS into account. These functions are not available in DLLs.
Compile with /D_WINDLL instead of /D_WINDOWS to detect functions that
DLLs do not support. The C/C++ version 7.0 compiler option /GD does this
automatically.
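
The need for explicit casts applies to any C variable-argument function, not only to those
in DLLs. The following portable sketch uses the standard STDARG.H macros rather than
the far-pointer versions above; debug_format and sum_longs are hypothetical helpers, and
vsnprintf assumes a C99 or later run-time library.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: formats a message into a caller-supplied buffer. */
static int debug_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);                 /* ap refers to the first unnamed argument */
    n = vsnprintf(out, size, fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical helper: sums 'count' long values passed as variable arguments. */
static long sum_longs(int count, ...)
{
    va_list ap;
    long total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, long);     /* each argument must really be a long */
    va_end(ap);
    return total;
}
```

A call such as sum_longs(2, (long)10, (long)20) casts each unnamed argument because
no prototype tells the compiler what type to pass. Without the casts, the arguments would
be passed as int, and va_arg(ap, long) would read the wrong number of bytes wherever int
and long differ in size, as they do in 16-bit code.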

If the DLL will be used with different languages such as Visual Basic, Borland C++,
Microsoft Excel, Zortech C++, or Microsoft FORTRAN, you should use the PASCAL
convention. The registers used by the _fastcall convention can change between compiler
versions and are not compatible between compilers by different vendors.

Aliasing and Windows (/Ow and /Oa)


An alias is a second name that refers to a memory location. For example, in:

int i ;
int *p ;
p = &i ;

pointer p is an alias of variable i. You can use aliases to perform tasks while keeping the
original pointer around, for example:

// No error checking.
// Get a pointer.
//
LPSTR ptr = GlobalLock(GlobalAlloc(GHND,1000));
LPSTR ptr_alias = ptr ; // alias the pointer
for ( i = 0 ; i < 1000 ; i++)
*(ptr_alias++) = foo(i) ; // use the alias
GlobalFree(GlobalHandle(ptr)); // free the memory

The compiler makes the following assumptions if there is no aliasing:

• If a variable is used directly, no pointers reference that variable.

• If a pointer references a variable, that variable is not used directly.

• If a pointer modifies a memory location, no other pointers access the same
memory location.
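
A short portable C sketch, using hypothetical names, shows code that breaks these
assumptions: g_count is used directly by bump_directly while store_and_increment
modifies the same variable through the pointer p.

```c
/* A variable that one function uses directly ... */
static int g_count;

static void bump_directly(void)
{
    g_count += 10;      /* direct use of the variable */
}

/* ... while another function reaches it through a pointer. */
static int store_and_increment(int *p)
{
    *p = 5;             /* a no-alias optimizer may keep this value in a register */
    bump_directly();    /* changes *p behind the pointer when p == &g_count */
    (*p)++;             /* correct code must reload *p from memory here */
    return *p;
}
```

Called as store_and_increment(&g_count) with g_count initially 0, a compiler that
honors aliasing returns 16. A compiler told to assume that no aliasing occurs could keep
5 in a register across the call and store 6 instead.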

Global Register Allocation (/Oe)

Although aliasing is a common and acceptable practice, the compiler can improve
optimizations if it can assume that there is no aliasing, because it can place more memory
locations into registers. By default, the compiler uses registers:

• To hold temporary copies of variables.

• To hold variables declared with the register keyword.

• To pass arguments to functions declared with _fastcall or compiled with /Gr.

The /Ow and /Oa options signal the compiler that it has more freedom to place variables
or memory locations into registers; these options do not cause the compiler to keep
variables in registers.

The global register allocation option /Oe, on the other hand, allocates register storage to
variables, memory locations, or common subexpressions. Instead of using registers only
for temporary storage or for producing intermediate results, the /Oe option places the
most frequently used variables into registers. For example, /Oe places a window handle,
hWnd, in a register if a function is likely to use hWnd repeatedly.

Because the no-aliasing options increase the compiler's opportunities to place a variable
in a register, it makes sense to use these options with /Oe. In many cases, the /Ow and
/Oa options do not optimize without the /Oe option. In some cases, you can eliminate
problems with /Ow or /Oa by turning off /Oe optimization.

Using /Ow Instead of /Oa

What is the difference between /Ow (Windows version) and /Oa? Basically, /Ow is a
relaxed version of /Oa. It assumes aliasing will occur across function calls, so a memory
location placed in a register is reloaded after a function call. For example, in:

void foobar( int * p)
{
// Compiler puts the value that p points to into a register.
*p = 5 ;
foo() ;
// If compiled with /Ow, the compiler reloads the register
// with *p.
(*p)++ ;
}

the compiler places the memory referenced by pointer p into a register. If the /Ow option
is set, the compiler reloads the register after the call to foo. If the /Oa option is set, the
value that p points to is not reloaded after the function call. Thus, /Ow tells the compiler
to forget everything about pointed-to values after function calls.

Compiling the code fragment above with /Ox and /Oa results in the following code:

mov si,WORD PTR [bp+4] ; pointer p is passed in at [bp+4]
mov WORD PTR [si],5
call _foo
mov WORD PTR [si],6 ; compiler assumes that *p cannot
; change and generates *p=6 instead
; of (*p)++

Notice how the compiler optimized away the last line, which incremented the value that
p points to.

Compiling the code with /Ox and /Ow results in the following correct version:

mov si,WORD PTR [bp+4] ; p
mov WORD PTR [si],5
call _foo
inc WORD PTR [si] ; compiler assumes that
; *p might change.

To understand the benefit this technique adds to a Windows-based program, look at the
following code fragment:

void Foo(HWND hwnd)
{
char ach[80];

// Zero terminate the string in case of error.
//
ach[0] = 0;
SendMessage(hwnd, WM_GETTEXT, sizeof(ach), (LONG)(LPSTR)ach);

// If some text is returned, do something with it.
//
if (ach[0] != 0)
{
Bar(ach);
}
}

If you compile this code fragment with /Oa and C version 6.0, Bar is never called. If you
use C/C++ version 7.0, Bar is called. The C version 6.0 compiler assumes that ach does
not change in the SendMessage call and optimizes away the if block because it believes
ach[0] is still zero. If you compile the code with /Ow, the compiler expects ach to change
after any function call, including SendMessage.

The C version 6.0 compiler appears to be pretty dumb—it does not realize that the ach
pointer was passed to SendMessage. However, as far as the compiler can tell, a LONG
was passed, not the pointer. If a pointer had been passed, /Oa would have worked. For
example, in the following code:

void SomeFunc(HWND hwnd, LPSTR astr, int asize)
{
SendMessage(hwnd, WM_GETTEXT, asize, (LONG)astr);
}

void Foo(HWND hwnd)
{
char ach[80];

//Pass a pointer.
SomeFunc(hwnd,(LPSTR)ach, sizeof(ach));

if (ach[0] != 0)
{
Bar(ach);
}
}

the compiler knows that the pointer is being passed and can be changed. This problem
can occur in any function that takes a pointer as a DWORD (lparam) or a WORD
(wparam). The C/C++ version 7.0 compiler corrects this behavior.
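
The same situation can be reproduced in portable C by passing a buffer address through
an integer parameter. In this sketch, fill_text is a hypothetical stand-in for SendMessage,
and uintptr_t (from STDINT.H) plays the role of the LONG parameter; neither comes from
the Windows headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for SendMessage: the buffer arrives as an integer. */
static void fill_text(uintptr_t lparam, unsigned size)
{
    char *buffer = (char *)lparam;   /* recover the pointer from the integer */
    if (size >= 3) {
        buffer[0] = 'O';
        buffer[1] = 'K';
        buffer[2] = '\0';
    }
}

/* Returns nonzero if text came back, mirroring the ach[0] != 0 test. */
static int got_text(void)
{
    char ach[80];

    ach[0] = 0;                                      /* zero terminate in case of error */
    fill_text((uintptr_t)ach, (unsigned)sizeof(ach)); /* pointer disguised as an integer */
    return ach[0] != 0;                              /* must observe the callee's write */
}
```

A compiler that treats the integer argument as a possible alias of ach makes got_text
return 1. An optimizer that assumes ach cannot change, because no pointer was visibly
passed, would be free to return 0.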

You can also solve this problem by simply declaring ach volatile. This causes the
compiler to place a variable in a register only if it must. However, /Ow usually generates
better code than using the volatile keyword.

Although /Ow is the easiest solution, the code it generates is not as efficient as the code
/Oa generates, as illustrated by the hWnd window handle in the previous example.
Window handles are commonly used in functions. They are perfect examples of variable
types that are meant to be placed into registers; however, with the /Ow option they are
reloaded after any function call. Using #pragma optimize at strategic locations to turn
/Ow and /Oa off prevents problems associated with reloading. A profiler can help
determine the placement of such statements.

The STRICT macros defined in the Windows version 3.1 SDK WINDOWS.H file also
reduce the need for the /Ow option. WINDOWSX.H includes macros that make most
window functions type-safe. So, a pointer is passed as a pointer instead of being passed
as a LONG. The STRICT macros can make an application more robust and should be
used even if the /Oa option is not in effect.

Avoiding Undocumented Features

Undocumented "features" are rarely necessary or useful, with the exception of file
functions such as _lcreat that were not documented before Windows version 3.x. For
example, an undocumented feature that saves neither time nor effort is demonstrated by
the following code segment.

HANDLE h = LocalAlloc(LMEM_MOVEABLE, cb);
HANDLE h2;
char* p;

// WARNING: Undocumented Hack.
// Dereference the handle without locking it.
//
p = *((char**)h);

// Use *p for a bit.
*p = 0;

h2 = LocalAlloc(LMEM_MOVEABLE, cb);

// Hmm... It could have moved, so dereference it again.
//
p = *((char**)h);

if (*p == 0)
{
// Do something.
}

You should not use this undocumented feature for two reasons:

• Future versions of Windows will have a flat memory model and will not support
this type of memory accessing.

• The code will not work as expected if you compile it with the /Oa option. The p pointer is
not passed to the LocalAlloc function; therefore, the compiler assumes that p will
not change as a result of this function call. The programmer has tried to outsmart
the compiler by dereferencing the pointer again after the function call, so the
program appears to be safe. Not quite.... The compiler removes the second
dereference statement because it assumes that p did not change as a result of the
function call; this is exactly what the person who had to support the code would
do.

To avoid this problem:

• Do not use this undocumented feature. (This is the best solution.)

• Use /Ow instead of /Oa.

• Always lock handles to memory before using them.

• Use #pragma optimize to selectively turn the /Ow option on and off. You can
also turn /Oe off.

• Use the volatile keyword to ensure that variables are not placed in registers.

Programming at Large
Dale Rogerson
Microsoft Developer Network Technology Group

Created: April 13, 1992

Abstract
Microsoft® Windows™ version 3.1 signals the death of Windows real mode. With the
release of Windows version 3.1, only standard and enhanced modes are supported. The
end of real mode is the beginning of new programming freedoms, such as writing large-
model applications.

This article explains why the large model is valid for protected mode applications and
discusses solutions for two limitations of large-model applications: the single-instance
restriction and the Windows version 3.0 page-locking bug.

The Large Memory Model and Protected Mode


For large-model applications running under real mode, the Microsoft® Windows™
version 3.0 graphical environment fixed the data segments. Fixed segments cannot move,
reducing the ability of Windows to manage memory effectively. Under protected mode,
Windows can move fixed data segments. Therefore, protected mode does not suffer the
performance degradation that real mode does.

The difference between real mode's inability to move memory and protected mode's
ability to move memory lies in the way the two modes address memory. Large-model
data pointers default to 32-bit far pointers. In real mode, a far pointer consisted of a
segment address and a segment offset, both 16 bits in length. If Windows moved the
segment, the segment address would change. Windows had no efficient method for
tracking and updating all pointers to a segment.

In protected mode, the processor provides a mechanism, the segment selector, that
removes the need to track and update individual pointers. All far pointers in protected
mode consist of a 16-bit segment selector and a 16-bit segment offset. The segment
selector does not refer directly to a physical address; instead, it indexes into a table. The
value in this table is a segment address. When a segment moves, the segment selector
does not change, but the value in the table is updated. The maintenance of the segment
selector and the selector tables is supported directly by the Intel® 80x86 microprocessor.
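
The indirection can be modeled in portable C. The sketch below is a simulation under
simplifying assumptions, not the processor's actual descriptor format: memory,
descriptor_base, FarPtr, resolve, and move_segment are hypothetical names, and a
selector is treated as a plain table index.

```c
#include <string.h>

#define MEMORY_SIZE   256
#define MAX_SELECTORS 4

/* Simulated physical memory and descriptor table. */
static unsigned char memory[MEMORY_SIZE];
static unsigned descriptor_base[MAX_SELECTORS];   /* selector -> segment address */

/* A protected-mode far pointer: selector plus offset, never a raw address. */
typedef struct {
    unsigned selector;
    unsigned offset;
} FarPtr;

/* Translate selector:offset to a byte address through the table. */
static unsigned char *resolve(FarPtr p)
{
    return &memory[descriptor_base[p.selector] + p.offset];
}

/* Move a segment: copy its bytes and update only the table entry. */
static void move_segment(unsigned selector, unsigned new_base, unsigned length)
{
    memmove(&memory[new_base], &memory[descriptor_base[selector]], length);
    descriptor_base[selector] = new_base;
}
```

After move_segment runs, every FarPtr that names the moved selector still resolves to the
same data, even though none of those pointers was touched. In real mode, by contrast,
each pointer would hold the segment address itself and would have to be found and
rewritten.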

While the segment selector solves many of the old problems caused by using the large
model, it does not resolve two limitations. One limitation requires applications with
multiple data segments to have only a single instance. The other limitation is a bug in
Windows version 3.0 that caused multiple data segments to be page-locked in memory.
These limitations do not affect dynamic-link libraries.

Single Instances
Windows version 3.1 cannot run multiple instances of applications with multiple read-
write data segments. If a large-model application has a single read-write data segment, it
can run multiple instances. A read-only segment can also be safely shared by multiple
instances because the instances cannot change the segment. Most large-model
applications, however, have multiple data segments and, therefore, cannot run multiple
instances.

While there are several methods for getting only one data segment in a large-model
program, one must remember that the application can have only 64 kilobytes (K) of static
data, local heap, and stack combined. This is the same as a medium-model program. For
this reason, when porting from a flat model 32-bit environment, it is probably best to use
a compiler that supports development of 32-bit applications under Windows. These
compilers, such as Watcom C 9.0, MetaWare 32-Bit Windows Application Development
Kit, or MicroWay NPD C-386, use WINMEM32.DLL to get a full 32-bit flat memory
model.

The Reason

In a multiple-instance application, all instances share the same code segments but have
unique default data segments. Small- and medium-model applications have only one data
segment. Most large-model applications have multiple data segments, but the current
Windows kernel cannot resolve fixups to multiple data segments. Consider the following
code fragment found in large-model applications that establishes the DS register:

mov ax,_data_01
mov ds,ax

This code is shared by all instances of the application. When the code is loaded,
_data_01 can hold only one value. Windows has no way to associate other data segments
with a given instance of an application.
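
The problem can be modeled in a few lines of portable C. This is a toy model, not Windows code: selectors are plain integers, and the structure and function names are invented for illustration. It shows only that a fixup stored in shared code yields the same selector for every instance, whereas the default data segment is tracked per instance.

```c
/* Toy model: a selector is just a 16-bit number here. */
typedef unsigned short selector_t;

/* One copy of the code segment is shared by every instance, so the
   operand of "mov ax,_data_01" exists exactly once. */
struct shared_code {
    selector_t data_01_fixup;
};

/* Each instance has its own default data segment but points at the
   same shared code. */
struct instance {
    selector_t default_ds;
    const struct shared_code *code;
};

/* The selector an instance loads into DS when it runs the fragment. */
selector_t run_fragment(const struct instance *inst)
{
    return inst->code->data_01_fixup;
}
```

Two instances built on the same shared_code value receive the same selector from run_fragment, even though their default_ds fields differ. The second instance would therefore read and write the first instance's extra data segment.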

The program loader determines if only one instance is allowed after examining the .EXE
header. If it discovers more than one data segment, it limits an application to one
instance. If an application has less than 64K of data, stack, and local heap, it is possible to
collapse the data into one data segment.
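
The loader's decision can be sketched as follows. The segment_info structure is illustrative and is not the actual .EXE header layout. The sketch also assumes that read-only data segments are exempt from the count, as described in the "Single Instances" section.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative segment description; not the real .EXE header format. */
struct segment_info {
    bool is_data;      /* data segment rather than code segment */
    bool is_readonly;  /* segment is marked READONLY */
};

/* Counts the data segments that an instance could modify. */
size_t count_readwrite_data_segments(const struct segment_info *segs,
                                     size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].is_data && !segs[i].is_readonly)
            count++;
    }
    return count;
}

/* Multiple instances are possible only with at most one read-write
   data segment. */
bool allows_multiple_instances(const struct segment_info *segs, size_t n)
{
    return count_readwrite_data_segments(segs, n) <= 1;
}
```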

Gaining Multiple Instances

To get multiple instances, there must be only one read-write data segment. Under
Microsoft C/C++ version 7.0, follow these guidelines to allow for multiple instances:

• Do not use /ND to name extra data segments unless the segment is READONLY.

• Use the .DEF file to mark extra data segments READONLY.

• Do not use __far or FAR to mark data items.

• Use /PACKDATA to combine data segments.

• Use /Gt65500 /Gx to force all data into the default data segment.

All of the above guidelines apply to Microsoft C version 6.0, except for the last one.
Microsoft C version 6.0 and C/C++ version 7.0 will usually generate two read-write data
segments. One is for initialized static data (DATA). The other one (FAR_BSS) is for
uninitialized static data. The Borland® C compilers default to generating only one data
segment. The existence of multiple data segments for a program called
SOMEPROG.EXE can be verified by the following command:

c:\> EXEHDR -v someprog.exe | more

Microsoft C version 6.0 does not have the /Gx option to stop the generation of FAR_BSS
and to combine initialized and uninitialized data. While there are ways to stop the
creation of FAR_BSS with C version 6.0, in most cases it is easier to use C/C++ version
7.0. To eliminate FAR_BSS with C version 6.0:

• Initialize all uninitialized static variables, and mark all extern variables as NEAR.

• Mark all variables as NEAR, forcing the variables into the DATA segment.

For large programs, these ways of eliminating FAR_BSS can be very time-consuming.
The big problem with all methods for gaining multiple instances is that the application
still has only one read-write data segment. It does not have more data space than a
medium- or small-model program. A large-model program can have either multiple
instances or multiple read-write data segments, but not both.

Windows Version 3.0 Page-Locking Bug


Multiple data segments do not cause any problems in Windows version 3.1, except for
requiring an application to run a single instance. In Windows version 3.0, however, there
is a bug in the memory manager that page-locks fixed segments of an application. When
a segment is page-locked, it becomes a dam in memory because it can be neither moved in
physical memory nor paged to disk. This is of great concern for applications compiled
with a large model, because large-model applications can have more than one data
segment that is fixed. Under Windows version 3.1, fixed segments in a DLL are still
page-locked to support interrupt service routines.

Page-Lock Fix

To get around the page-lock problem, follow these steps:

1. Compile your application normally, and generate a map file during linking.
Examine the map file and find the names of the FAR_DATA and FAR_BSS
segments.

2. Write one or more assembly language routines that will return handles to the
FAR_DATA and FAR_BSS segments found in step 1. The following function will
return a handle to the data segments named MYSEGMENT and FAR_BSS:

    title simhan.asm
    ;****************************************************************
    ?WIN = 1
    ?PLM=1 ; PASCAL calling convention is DEFAULT
    ?WIN=1 ; Windows calling convention
    ; Use 386 code?
    .MODEL LARGE
    include cmacros.inc
    sBegin DATA
    sEnd DATA
    MYSEGMENT SEGMENT MEMORY 'FAR_DATA'
    MYSEGMENT ENDS
    FAR_BSS SEGMENT MEMORY 'FAR_BSS'
    FAR_BSS ENDS
    sBegin CODE
    assumes CS,CODE
    assumes DS,DATA
    ;**************************************************************
    cProc gethandle,<PUBLIC,FAR,PASCAL>
    cBegin
        mov ax,MYSEGMENT
    cEnd gethandle
    ;**************************************************************
    cProc gethandle2,<PUBLIC,FAR,PASCAL>
    cBegin
        mov ax,FAR_BSS
    cEnd gethandle2
    sEnd CODE
    end

3. Add a call to the following unlockAll function in your application's InitInstance
function after testing the success of your CreateWindow call:

    // Prototypes for the assembly routines from step 2 (assumed here to
    // return the segment handle in AX) and for the helper defined below.
    HGLOBAL FAR PASCAL gethandle(void);
    HGLOBAL FAR PASCAL gethandle2(void);
    void unlockExtra(HGLOBAL hExtraSeg);

    void unlockAll()
    {
        // This fix is needed only for Windows version 3.0, so check
        // the version.
        if (LOWORD(GetVersion()) == 0x0003)
        {
            // Un-pagelock MYSEGMENT
            unlockExtra(gethandle());
            // Un-pagelock FAR_BSS
            unlockExtra(gethandle2());
        }
    }

    void unlockExtra(HGLOBAL hExtraSeg)
    {
        BOOL fRet;

        // Unfix segment in logical memory
        GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_MOVEABLE);
        // Only discardable memory can be GlobalPageUnlock'ed
        GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_DISCARDABLE);
        // Unfix in physical (protected mode) memory
        GlobalPageUnlock(hExtraSeg);

        // Reset the lock count to 0 because Windows happens to lock
        // it multiple times.
        do {
            fRet = GlobalUnlock(hExtraSeg);
        } while (fRet);

        // Modify the flags to moveable
        GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_MOVEABLE);
    }

4. Modify your make file to assemble and link your procedures that return handles to
your fixed data segments.

5. Recompile your program, and check results using the Microsoft Windows 80386
Debugger (WDEB386.EXE).
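
The version test in unlockAll relies on the layout of the low-order word returned by GetVersion in 16-bit Windows: the low byte holds the major version and the high byte holds the minor version. Windows version 3.0 therefore reports 0x0003, and Windows version 3.1 (version 3.10) reports 0x0A03. The following portable C sketch decodes such a word. The function names are hypothetical stand-ins, not Windows API calls.

```c
/* Major version: low byte of the version word. */
unsigned windows_major(unsigned short version_word)
{
    return version_word & 0xFFu;
}

/* Minor version: high byte of the version word. */
unsigned windows_minor(unsigned short version_word)
{
    return (version_word >> 8) & 0xFFu;
}

/* Same comparison that unlockAll makes against LOWORD(GetVersion()). */
int is_windows_3_0(unsigned short version_word)
{
    return version_word == 0x0003;
}
```

Comparing the whole word against 0x0003, rather than testing only the major version, keeps the fix from running under Windows version 3.1.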

Testing the Page-Lock Fix

It is a good idea to test the fix under Windows version 3.0. A program that reports the
page-lock status of segments is needed. Microsoft CodeView® for Windows and the 3.0
version of the Windows Heap Walker utility do not report the page-lock status. Also, the
3.1 version of Heap Walker does not run reliably under Windows version 3.0. WDEB386,
however, does report the page-lock status of segments.

You can use WDEB386 to get page-lock information as follows:

1. Install the debugging version of WIN386.EXE and WIN386.SYM.

2. Run WDEB386.EXE.

3. Issue the DL selector command to dump the local descriptor table (LDT) entry
for the selector in which you are interested.

4. Take the Base linear address from the DL command and issue the .ML linear
address command.

5. Take the PFT address from the .ML command and issue the .MS PFT address
command. This will list the lock count for that page.

For more information on WDEB386.EXE, refer to Chapter 5, "Advanced Debugging:
80386 Debugger," in the Microsoft Windows version 3.1 Software Development Kit
(SDK) Programming Tools.

Words of Warning
It is important to keep the following points in mind when deciding to use the large model:

• A bug in Microsoft C/C++ version 7.0 causes C++ objects to be placed outside the
default data segment, ignoring the /Gx compiler option. To avoid this bug, specify
the object as near. For example:

    CTheApp NEAR theApp ;

• To get multiple-instance, large-model Microsoft Foundation Class (MFC)
applications, a special variant of the large-model libraries must be built. Use the
following make line:

    nmake MODEL=L TARGET=W DEBUG=1 OPT="/Gt65500 /Gx"

The above variant of the MFC library has not been extensively tested.

• Large-model applications run more slowly than medium- and small-model
applications.

• Basically, a multiple-instance, large-model application differs from a medium-model
application only in the size of its default data pointers.

• Multiple-instance, large-model applications have only one read-write data
segment.

• Multiple-instance, large-model applications can have only 64K total of stack,
local heap, and static data.

• It is easier to build multiple-instance, large-model applications with Microsoft
C/C++ version 7.0 and Borland C compilers than with Microsoft C version 6.0.

• When porting from a flat-model 32-bit environment, it is probably best to use a
compiler that supports development of 32-bit applications under Windows. These
compilers, such as Watcom C 9.0, MetaWare 32-Bit Windows Application
Development Kit, or MicroWay NPD C-386, use WINMEM32.DLL to get a full
32-bit flat-memory model. Another option is to wait for the release of Win32s™,
a subset of the Win32™ Application Programming Interface that lets you develop
32-bit applications for Windows version 3.1.

On a more positive note, large-model DLLs work very well because the SS != DS
condition assumed by the large model is exactly the condition under which a DLL runs.
Also, a DLL is always a single instance. The Microsoft Foundation Classes recommend
using the large model for DLLs.
