Intermediate Language

243885141.
doc 1 od 372
Contents

Introduction

1. Introduction to Microsoft's IL

2. IL Basics

3. Selection and Repetition

4. Keywords and Operators

5. Operator Overloading

6. Reference and Value Types

7. Pointers

8. Methods

9. Properties and Indexers

10. Exception Handling

11. Delegates and Events

12. Arrays

13. The Other Odds and Ends

14. External DLLs

15. A GUI Application in IL

Appendix 1
Managed C++

Appendix 2
Demystifying ildasm.exe

Glossary

Introduction
The .NET languages are growing more pervasive with each passing day. We therefore
decided to explore the world of Intermediate Language (IL), the language into which
source code written in languages such as C# and COBOL gets compiled.
IL represents the transient stage in the process of conversion of source code into
machine language. It is imperative to gain mastery over IL, because
knowledge of IL translates into competence over IL code that may have originally
been written in any programming language. Thus, it provides a common platform to
all the programming languages. Realising its importance in the scheme of things in
the .NET world, we decided to get under the skin of IL and unravel its mysteries.

IL puts an end to the unending dissidence amongst programmers with regard to the
superiority of one programming language over the others. To this end, IL is a great
leveller. It is also a stupendous facilitator because, in the .NET world, one part of
the code may have been written in COBOL, while another may have been written in
C#, but it all eventually gets converted into IL. This provides great freedom and
flexibility to a programmer to select the language he/she is most familiar with and
does away with the need to constantly retrain oneself to learn new languages that
seem to crop up every other day.

Our modus operandi has been to strip the sheen of complexity surrounding IL by
presenting complex concepts in a simple and comprehensible manner. These
concepts have been corroborated with lucid examples. We have put into service all
the powers of clairvoyance at our disposal to discover and illustrate concepts of IL
that are not readily discernible to the layman.

To facilitate the understanding of the sample programs, in every example, we have
first presented the source code of the programs in the C# language, and then, we
have offered their equivalent IL code. Thereafter, we have applied reverse
engineering to fathom the IL code by unravelling the equivalent C# code. We have
demonstrated how, in some cases, IL transcends the limitations of conventional
programming languages; while in others, it falls awfully short of expectations.
Though we don't expect undue approbation, it would not be immodest to mention
that, while working with the Beta version of IL, we have unearthed many
undocumented features and lacunae, which we have highlighted in the book for
your benefit.

To conclude, it can be stated that the aim of this book is to explain the various
nuances of IL and to make you adept at understanding IL code. It is also our desire
to alleviate your fear of lower level languages. The lure of IL lies in its simplicity
coupled with its tremendous power, which makes an intoxicating cocktail. But,
don't be beguiled by the apparent simplicity of the examples. We don't expect an
acquiescent attitude from you. Instead, we implore you to try out all the examples
by yourself and ascertain their outcome. Thus, don't stand ruminating at the brink
of this exciting sea of knowledge. Dive right in! We assure you that you shall come
out a winner at the end of this sojourn into the world of IL.

Minimum Requirements

The software requirements to successfully run all the programs in this book are:

Operating System - Windows 98/NT/2000
Internet Explorer 5.5
.NET Framework SDK Beta 1 (111 MB)

Internet Explorer 5.5 can be downloaded from the Microsoft site
http://www.microsoft.com/windows/ie/download/ie55sp1.htm

The .NET Framework SDK Beta 1 can be downloaded from the Microsoft site
http://download.microsoft.com/download/VisualStudioNET/Install/2204/NT5/EN-US/setup.exe
Alternatively, you can visit the download section at Microsoft
( http://msdn.microsoft.com/downloads/default.asp ) and download the .NET
Framework SDK Beta 1 under the Software Development Kit option.

Acknowledgements
I wish to thank a number of people who gave me their support, new ideas and
inspiration while writing this book.

First and foremost, thanks to Manish Jain of BPB Publications for publishing the
book.

Special thanks to my co-authors, Akash and Sonal, who have put in their very
best in the work assigned to them; without them and their efforts the book
would have never seen the light of day.

Thanks to Tanuja Sodhi, an ex-naval officer from the first batch of lady officers
and an MBA from JBIMS, for editing the book. She is presently freelancing as a
creative writer.

Thanks to my cover designers, Altaf Hemani and Kishore Rohra, for designing
the cover.

Thanks to Manish Purohit for putting in late hours, formatting and aligning the
text in the book.

Thanks to Pradeep Mukhi and Shivanand Shetty, who made it simple for me and my
co-authors to come up with the book. They have always been a source of inspiration
and encouragement.

A long list of friends and my family need a mention here for their patience and
cooperation on this book while it was being written.

-Vijay Mukhi

-1-

Introduction to Microsoft's IL

The code that we write in a programming language like C#, ASP+ or in any other .NET
compatible language is finally converted to either Assembler or Intermediate Language
(IL). Thus, code written in the COBOL Programming Language can be modified in C#
and subsequently used in ASP+. Therefore, the best way to deepen our
comprehension of the .NET technologies is by understanding IL.

Once you are conversant with IL, you will have no difficulty in understanding the .NET
technologies, since all .NET languages finally compile to it. IL was invented first and it is
programming language neutral. It was then followed by other programming languages
like C#, Visual Basic.NET, ASP.NET, etc.

We shall raise the curtains on IL with a very small program. Also, we will
commence with the assumption that you are familiar with at least one .NET
programming language.

a.il
.method void vijay()
{
}

We have written a very small, non-working IL program in the il subdirectory and named
it a.il. How do we assemble it into an executable program? There is no need to fret
over this problem. Microsoft has provided a program called ilasm whose sole task is to
create an executable file from an IL file.

Before you run this command, make sure that your path variable includes the bin
subdirectory of the framework SDK. If it does not, give the command as

set path=c:\progra~1\microsoft.net\frameworksdk\bin;%PATH%

Now we use the command as follows:

c:\il>ilasm /nologo /quiet a.il

On doing so, the following error is generated:

Source file is ANSI
Error: No entry point declared for executable
***** FAILURE *****

In future, we shall not display the first and the last lines of the output generated by
ilasm. We shall also remove the blank lines between non-blank lines.

In IL, we are permitted to commence a line with or without a dot '.'. Anything that
begins with a dot is a directive to the assembler, asking it to perform some function,
such as creating a function or class etc. Anything that does not start with a '.' is an
actual assembler instruction.

The significance of .method is that a function or method called vijay is created and this
function returns void, i.e. it does not return any value. The function has been named
vijay arbitrarily for want of any other superior nomenclature.

The assembler was obviously not impressed with this program and thus brandished the
message 'no entry point'. This error message is generated because the IL file can
contain numerous functions, and the assembler has no way of distinguishing as to
which of them is to be executed first.

In IL, the first function to be executed is called the entrypoint function. In C#, the
function is Main. The syntax for a function is the name followed by the familiar pair of
round () brackets. The start point and the end point of the function's code are signified by
the curly braces {}.

a.il
.method void vijay()
{
.entrypoint
}

c:\il>ilasm /nologo /quiet a.il
Source file is ANSI
Creating PE file
Emitting members:
Global Methods: 1;
Writing PE file
Operation completed successfully

Now no error is generated. The directive entrypoint signifies that the program execution
has to begin from this function. In this case, we have to use this directive
notwithstanding the fact that this program has only one function. On giving the dir
command at the DOS prompt, we see three files created. a.exe is an executable file
which can now be executed to see the output of the program.

C:\il>a
Exception occurred: System.BadImageFormatException: Exception from HRESULT:
0x8007000B. Failed to load C:\IL\A.EXE.

Our luck seems to run out when we try to execute the above program because the
above run-time error is generated. One probable reason for this could be the poor
formation of the function. Every function should have the instruction 'end of function'
incorporated in it. We obviously overlooked this fact in our haste.

a.il
.method void vijay()
{
.entrypoint
ret
}

The 'end of function' instruction is called ret. All well formed functions have to end with
this instruction.

Output
Exception occurred: System.BadImageFormatException: Exception from HRESULT:
0x8007000B. Failed to load C:\IL\A.EXE.

On executing the function, we get the same error again. Where could we have faltered
this time?

a.il
.assembly mukhi {}
.method void vijay()
{
.entrypoint
ret
}

The blunder was that we forgot to use the mandatory directive called assembly followed
by a name. We have incorporated it in the code above, and have used the name mukhi
followed by a pair of empty curly braces {}. The assembly directive is used to give a name
to the program. An assembly is also called a deployment unit.

The code above is the smallest program that can be assembled without any errors,
though it does not perform anything useful when executed. It does not have any
function called Main. It only has a function called vijay with the entrypoint directive.
The program now assembles and runs with no errors at all.

The concept of assembly is extremely crucial in the .NET world and should be
thoroughly understood. We will explore this directive in the latter half of the chapter.

a.il
.assembly mukhi {}
.method void vijay()
{
.entrypoint
ret
}
.method void vijay1()
{
.entrypoint
ret
}

Error
***** FAILURE *****

The cause for the above failure message is that the above program has two functions,
vijay and vijay1, with each containing the .entrypoint directive. As mentioned earlier,
this directive specifies which function is to be executed first.

Thus, in functionality, it is akin to the Main function in C#. When C# code gets
converted into IL code, the code contained in the function Main gets converted into a
function in IL and contains the directive .entrypoint. For example, if the first function to
be executed in a COBOL program is called abc, the code generated in IL inserts the
.entrypoint directive in this function.

In conventional programming languages, the function to be executed first has to have a
specific name, e.g. Main, but in IL, only the .entrypoint directive is required. Therefore,
since a program can have only one starting point, only one function in the IL code is
allowed to contain the .entrypoint directive.
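
As a small cross-check from the C# side, the sketch below asks the runtime which method of the running program carries the entry point marker. It is only an illustration: the class name EntryDemo is our own, and it assumes a console program built with a current C# compiler rather than the Beta 1 tools used in this chapter.

```csharp
using System;
using System.Reflection;

class EntryDemo
{
    public static void Main()
    {
        // EntryPoint is the one method in the executable that was
        // marked with .entrypoint when the IL was generated.
        MethodInfo entry = Assembly.GetEntryAssembly().EntryPoint;

        Console.WriteLine(entry.DeclaringType.Name + "::" + entry.Name);
        Console.WriteLine("static: " + entry.IsStatic);
    }
}
```

With a conventional Main, this should report EntryDemo::Main. The name is whatever the compiler chose to mark, which is the point made above: the runtime looks for the marker, not for a particular name.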

It is pertinent to note that no error message number or explanation is generated,
making it difficult to debug this error.

a.il
.assembly mukhi {}
.method void vijay()
{
ret
.entrypoint
}

The .entrypoint directive need not be positioned as the first or last directive in the
function. It merely has to be present in the body of the function, to herald its status as
the first function to be executed. Directives are not assembly instructions and can even
be placed after the ret instruction. To remind you, ret signifies the end of the function
code.

a.il
.assembly mukhi {}
.method void vijay()
{
.entrypoint
call void System.Console::WriteLine()
ret
}

We may have a function written in C#, ASP+ or COBOL, but the mechanism for
executing this function in IL is the same. It is as follows:

We have to use the assembler instruction call. The call instruction is to be followed by
the following details in the given sequence:

return type of the function (void).
the namespace (System).
the class (Console).
the function name (WriteLine()).

The function gets called but does not produce any output. So, we pass a parameter to
the WriteLine function.

a.il
.assembly mukhi {}
.method void vijay()
{
.entrypoint
call void System.Console::WriteLine(class System.String)
ret
}

The above code has a glaring omission. When a function is called in IL, in addition to its
return type, the data types of the parameters that are being passed to the function have
to be specified as well. We have already stated that the WriteLine function expects a
parameter of the class named System.String, but since no string is passed to the
function, it generates a runtime error.

Thus, there is a significant difference between IL and other programming languages
when it comes to calling a function. In IL, when we call a function, we have to specify
everything we know about the function, including its return type and the data types of
its parameters. This ensures that the assembler can authenticate the syntactical
propriety of your code, by conducting appropriate checks at run time.

We shall now see how to facilitate passing of parameters to a function.

a.il
.assembly mukhi {}
.method void vijay()
{
.entrypoint
ldstr "hell"
call void System.Console::WriteLine(class System.String)
ret
}

Output
hell

The assembler instruction ldstr places a string on the stack. The name ldstr is an
abbreviated version of the text "load a string on the stack". A stack is an area of memory
that facilitates passing of parameters to a function. All functions receive their
parameters from the stack. Thus, instructions like ldstr are indispensable.
a.il
.assembly mukhi {}
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hell"
call void System.Console::WriteLine(class System.String)
ret
}

Output
hell

We have added some attributes to the method vijay. We shall explain them one by one
below.

public: This is called an accessibility attribute as it decides who can access a
method. Public means that this method is accessible to every other part of the program.

hidebysig: A class can be derived from another class. The attribute hidebysig
states that this method hides any method of a base class that has the
same name and signature. In this example, if a function vijay with the same signature
were present in the base class, it would be hidden by the vijay defined here.

static: Methods can either be static or non-static. A static method belongs to a class
and not to an instance. Thus, as we have only a single class, we cannot have more than
one copy of a static function. There are no restrictions on where a static method can be
created. The function with the entrypoint directive must be static. Static functions must
have a body or source code associated with them and they are referenced using the type
name and not the instance name.
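
The difference between static and instance members can be sketched in C#, the language from which most of the IL in this book is generated. The class Counter and its members are invented for this illustration and do not appear elsewhere in the book.

```csharp
using System;

class Counter
{
    // Static field: a single copy that belongs to the type itself.
    public static int Created;

    // Instance field: every object carries its own copy.
    public int Id;

    public Counter()
    {
        Created++;
        Id = Created;
    }

    // Static method: referenced through the type name, no object needed.
    public static void Report()
    {
        Console.WriteLine("objects created: " + Created);
    }

    // Instance method: referenced through an object.
    public void Describe()
    {
        Console.WriteLine("I am object number " + Id);
    }
}

class StaticDemo
{
    // The entry point is static: no object exists when the program starts.
    public static void Main()
    {
        Counter.Report();          // objects created: 0
        Counter c = new Counter();
        c.Describe();              // I am object number 1
        Counter.Report();          // objects created: 1
    }
}
```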

il managed: Due to its complex nature, we shall postpone the explanation of this
attribute. When the time is appropriate, its functionality will be clearly explained.

The abovementioned attributes do not modify the output of the function. In a short
while, it will become apparent to you as to why we have provided the explanation of
these attributes.

Whenever we write a program in the C# programming language, we first specify the
keyword class, followed by the name of the class and then, we enclose the source code
within a pair of curly braces {}. This is demonstrated in a.cs.

a.cs
class zzz
{
}

Let us now introduce the IL directive called class.

a.il
.assembly mukhi {}
.class zzz
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hell"
call void System.Console::WriteLine(class System.String)
ret
}
}

Notice the change in assembler output: Class 1 Methods: 1;

Output
hell

The directive .class is followed by the name of the class. It is optional in IL. Let us
enhance the functionality of the class by adding a few class attributes.

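a.il
.assembly mukhi {}
.class private auto ansi zzz
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hell"
call void System.Console::WriteLine(class System.String)
ret
}
}

Output
hell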

We have added three attributes to our class directive:

private: This signifies that the class is not visible outside the assembly in which it
is defined.
auto: This means that the layout of the class in memory will be decided only at
runtime, and not by our program.
ansi: The source code is generally divided into two main categories:
- Managed Code
- Unmanaged Code

Code written in languages like C is called unmanaged code or untrustworthy code. We
need an attribute that handles interoperability between unmanaged code and managed
code. For example, this attribute can be put to use when we want to transfer strings
between managed and unmanaged code.

If we cross the bounds of managed code and vault into the realm of unmanaged code, a
string, which is an array of 2-byte Unicode characters, will be converted into an ANSI
string, which is an array of 1-byte ANSI characters, and vice versa. The modifier ansi is
used for a smooth transition between managed and unmanaged code.
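
The same Unicode to ANSI conversion can be observed from C# at a call into unmanaged code. The sketch below is an assumption-laden illustration, not part of the original example: it runs only on Windows, it calls the Win32 functions lstrlenA and lstrlenW in kernel32.dll, and the names AnsiDemo, AnsiLength and UnicodeLength are our own.

```csharp
using System;
using System.Runtime.InteropServices;

class AnsiDemo
{
    // CharSet.Ansi: the managed 2-byte Unicode string is converted to a
    // 1-byte ANSI string before the unmanaged function receives it.
    [DllImport("kernel32.dll", EntryPoint = "lstrlenA", CharSet = CharSet.Ansi)]
    static extern int AnsiLength(string text);

    // CharSet.Unicode: the string crosses the boundary as 2-byte characters.
    [DllImport("kernel32.dll", EntryPoint = "lstrlenW", CharSet = CharSet.Unicode)]
    static extern int UnicodeLength(string text);

    public static void Main()
    {
        // Both calls count four characters; only the byte layout handed
        // to the unmanaged side differs.
        Console.WriteLine(AnsiLength("hell"));
        Console.WriteLine(UnicodeLength("hell"));
    }
}
```

The ansi modifier on a class plays the role of the default character set for that type, much as CharSet.Ansi does for a single imported function here.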

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hell"
call void System.Console::WriteLine(class System.String)
ret
}
}

Output
hell

The class zzz has been derived from the class System.Object. In the .NET world, in order
to maintain type consistency, all types are ultimately derived from System.Object. Thus,
all objects have a common base class of Object. In IL, classes are derived from other
classes in the same manner as in programming languages like C++, C#
and Java.
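
A short C# sketch makes the common base class visible by walking up the inheritance chain with reflection. The empty class zzz mirrors the one used above; BaseDemo is a name chosen only for this illustration.

```csharp
using System;

class zzz
{
}

class BaseDemo
{
    public static void Main()
    {
        // A class with no explicit base class derives from System.Object.
        Console.WriteLine(typeof(zzz).BaseType);            // System.Object

        // Value types reach System.Object through System.ValueType.
        Console.WriteLine(typeof(int).BaseType);            // System.ValueType
        Console.WriteLine(typeof(int).BaseType.BaseType);   // System.Object

        // System.Object itself is the root and has no base type.
        Console.WriteLine(typeof(object).BaseType == null); // True
    }
}
```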

a.il
.module aa.exe
.subsystem 3
.corflags 1
.assembly extern mscorlib
{
.originator = (03 68 91 16 D3 A4 AE 33 )
.hash = (52 44 F8 C9 55 1F 54 3F 97 D7 AB AD E2 DF 1D E0
F2 9D 4F BC )
.ver 1:0:2204:21
}
.assembly a as "a"
{
.hash algorithm 0x00008004
.ver 0:0:0:0
}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hell"
call void System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
.maxstack 8
ldstr "hell1"
call void System.Console::WriteLine(class System.String)
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}

Output
hell

You are bound to wonder as to why we have written such an ungainly program. You
need to exercise a little patience before the mist clears and it all starts to make sense.
We shall explain the newly introduced functions and attributes one by one:

.ctor: We have introduced a new function called .ctor which calls the WriteLine function
to display hell1, but it does not get called. .ctor refers to the constructor.

rtspecialname: This attribute signifies to the runtime that the name of the function is
special and it is to be treated in a special manner.

specialname: This attribute alerts the compilers and tools that the function is special.
The runtime may choose to ignore this attribute.

instance: A normal function is called an instance function. Such a function is
associated with an object, unlike a static method, which is associated with a class.

The reason for choosing the specified name for the function will become apparent in due
course.

ldarg.0: This is an assembler instruction which loads either the this pointer or the
ZEROth parameter on the execution stack. We shall explain ldarg.0 in
detail subsequently.

mscorlib: In the program above, the function .ctor is being called from the base class
System.Object. The name of the function is normally prefixed with the name of the
library that contains the code. This library name is placed within square brackets. In
this case, it is optional because mscorlib.dll is the default library and it contains most
of the classes that .NET requires.

.maxstack: This directive specifies the maximum number of elements that can be
present on the evaluation stack when a method is being executed.

.module: All IL files must be part and parcel of a logical entity called a module. The file
is added to a module using the .module directive. The name of the module may be
stated as aa.exe, but the name of the executable file remains the same as before, i.e.
a.exe.

.subsystem: This directive is used to specify the operating system on which the
executable will run. This is another way of specifying the kind of executable the
assembly is representing. Some of the numeric values and their corresponding
Operating Systems are as follows:

2 - A Windows GUI Subsystem.
3 - A Windows Character Subsystem.
5 - An older operating system called OS/2.

.corflags: This directive is used to specify flags that are unique to a 64 bit computer. A
value of 1 indicates that it is an executable created from IL and a value of 4 signifies a
library.

.assembly: We very briefly touched upon a directive called .assembly a couple of pages
earlier. Let's delve a little deeper now.

Whatever we create is part of an entity called a manifest. The .assembly directive marks
the beginning of a manifest. In the hierarchy, the module is the next smaller entity to a
manifest. The .assembly directive specifies the assembly to which this module belongs.
A module can only contain a single .assembly directive.
The presence of this directive is mandatory for exe files but is optional for modules in
a .dll. This is because this directive is needed to create an assembly for us. It is a basic
requirement of the .NET world. An assembly directive contains other directives.

.hash: Hashing is a common technique used in the computer world and there are a
large number of hashing methods or algorithms used. This directive is used for hashing.

.ver: The .ver directive consists of 4 numbers separated by colons. They represent the
following information in the order given below:

major version number
minor version number
build
revision number
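
The same four parts can be read back in C# through the System.Version class. This is only a sketch: the numbers are those of the mscorlib reference in the listing above, and VersionDemo is a name of our own.

```csharp
using System;

class VersionDemo
{
    public static void Main()
    {
        // .ver 1:0:2204:21 expressed as a System.Version value.
        Version v = new Version(1, 0, 2204, 21);

        Console.WriteLine("major    " + v.Major);     // 1
        Console.WriteLine("minor    " + v.Minor);     // 0
        Console.WriteLine("build    " + v.Build);     // 2204
        Console.WriteLine("revision " + v.Revision);  // 21
    }
}
```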

extern: If there is a requirement to refer to other assemblies, the extern directive is
used. The code of the core .NET classes is in mscorlib.dll. Besides this dll, when our
program needs to refer to code from a large number of other dlls, the extern directive
comes into play.

originator: This is the last directive that we shall explore before we move on to explain
the essence and significance of the above example. This directive discloses the identity
of the creator of the dll. It contains eight bytes of the public key of the owner of the dll.
It is obviously a hash value.
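
An eight-byte identity of this kind can also be read at run time from C#. The sketch below rests on an assumption worth stating: as far as we are aware, later releases of the framework call this value the public key token rather than the originator, and the bytes printed by a current runtime will not match the Beta 1 listing. TokenDemo is a name of our own.

```csharp
using System;
using System.Reflection;

class TokenDemo
{
    public static void Main()
    {
        // The assembly that defines System.Object is the core library.
        AssemblyName name = typeof(object).Assembly.GetName();

        // Eight bytes that identify the publisher of the assembly.
        byte[] token = name.GetPublicKeyToken();

        Console.WriteLine(name.Name + " " + name.Version);
        Console.WriteLine(BitConverter.ToString(token));
    }
}
```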

Let us revise what we have done so far, step by step via a different approach:

(a) We started with the smallest C# program that we could write. This program was
called a.cs and contained the following code:

a.cs
class zzz
{
public static void Main()
{
System.Console.WriteLine("hi");
}
}

(b) Then we ran the C# compiler using the following command:

>csc a.cs

This created the executable file a.exe.

(c) On the executable, we ran a program called ildasm, provided by Microsoft:

>ildasm /out=a.txt a.exe

This created a text file a.txt with the following contents:

a.txt
// Microsoft (R) .NET Framework IL Disassembler. Version 1.0.2204.21
// Copyright (C) Microsoft Corp. 1998-2000

// VTableFixup Directory:
// No data.
.subsystem 0x00000003
.corflags 0x00000001
.assembly extern mscorlib
{
.originator = (03 68 91 16 D3 A4 AE 33 ) // .h.....3
.hash = (52 44 F8 C9 55 1F 54 3F 97 D7 AB AD E2 DF 1D E0
F2 9D 4F BC ) // RD..U.T?..........O.
.ver 1:0:2204:21
}
.assembly a as "a"
{
.ver 0:0:0:0
}
.module aa.exe
// MVID: {89CFAD60-F1BD-11D4-A11A-96B1C7D61E7B}
.class private auto ansi zzz
extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
// Code size 11 (0xb)
.maxstack 8
IL_0000: ldstr "hell"
IL_0005: call void System.Console::WriteLine(class System.String)
IL_000a: ret
} // end of method zzz::vijay

.method public hidebysig specialname rtspecialname
instance void .ctor() il managed
{
// Code size 17 (0x11)
.maxstack 8
IL_0000: ldstr "hell1"
IL_0005: call void System.Console::WriteLine(class System.String)
IL_000a: ldarg.0
IL_000b: call instance void [mscorlib]System.Object::.ctor()
IL_0010: ret
} // end of method zzz::.ctor

} // end of class zzz

//*********** DISASSEMBLY COMPLETE ***********************

When you read the above file, you will realize that all of it has been explained earlier. We
started out with a simple C# program and then compiled it into an executable file.
Under normal circumstances, it would have got converted into machine language or the
assembler of the computer/microprocessor that the program is running on. Once the
executable is created, we disassemble it using ildasm. The disassembled output is saved
in a new file a.txt. This file could be named a.il and we could have then reversed gear
by running ilasm on it to create the executable again.

Let us take a look at the smallest VB.NET program. We have named it one.vb and its
source code is as follows:
one.vb
Public Module modmain
Sub Main()
System.Console.WriteLine("hell")
End Sub
End Module

After writing the above code, we run the Visual Basic.NET compiler, vbc, as:

>vbc one.vb

This produces the file one.exe.

Next we execute ildasm as follows:

>ildasm /out=a.txt one.exe

This produces the following file a.txt:

a.txt

// No data.
.corflags 0x00000001
.assembly extern mscorlib
{
.originator = (03 68 91 16 D3 A4 AE 33 ) // .h.....3
.hash = (52 44 F8 C9 55 1F 54 3F 97 D7 AB AD E2 DF 1D E0
F2 9D 4F BC ) // RD..U.T?..........O.
.ver 1:0:2204:21
}
.assembly extern Microsoft.VisualBasic
{
.originator = (03 68 91 16 D3 A4 AE 33 ) // .h.....3
.hash = (5B 42 1F D2 5E 1A 42 83 F1 90 B2 29 9F 31 A1 BE
E1 1E 0D E4 ) // [B..^.B....).1....
.ver 1:0:0:0
}
.assembly one as "one"
{
.ver 1:0:0:0
}
.module one.exe
// MVID: {1ED19820-F1C2-11D4-A11A-96B1C7D61E7B}
.class public auto ansi modmain
extends [mscorlib]System.Object
{
.custom instance void [Microsoft.VisualBasic]Microsoft.VisualBasic.Globals/Globals$StandardModuleAttribute::.ctor() = ( 01 00 00 00 )
.method public static void Main() il managed
{
// Code size 11 (0xb)
.maxstack 1
.locals init (class System.Object[] V_0)
IL_0000: ldstr "hell"
IL_0005: call void [mscorlib]System.Console::WriteLine(class System.String)
IL_000a: ret
} // end of method modmain::Main

} // end of class modmain

.class private auto ansi _vbProject
{
.custom instance void [Microsoft.VisualBasic]Microsoft.VisualBasic.Globals/Globals$StandardModuleAttribute::.ctor() = ( 01 00 00 00 )
.method public static void _main(class System.String[] _s) il managed
{
.entrypoint
// Code size 6 (0x6)
.maxstack 8
IL_0000: call void modmain::Main()
IL_0005: ret
} // end of method _vbProject::_main
} // end of class _vbProject

You would be amazed to see that the outputs produced by two different compilers are
almost identical. We have shown you this example to demonstrate that, irrespective of
the language you use, ultimately, the source code will get converted to IL code. Whether
we use VB.NET or C#, the same WriteLine function gets called.

Thus, the differences between programming languages have now become a superficial
issue. The endless debate over which language is superior has finally been put to rest.
IL has created a situation where programmers are free to use the programming
language of their choice.

Let us now demystify the code given above.

Every VB.NET program needs to be included in a module. We've called it modmain. All
modules in Visual Basic have to end with the keyword End, hence we see End Module.
This is where the syntax of VB differs from that of C#, which does not understand
modules.

In VB.NET, functions are known as sub-routines. We need a sub-routine to mark the
starting point of program execution. This sub-routine is called Main.

The VB.NET code not only refers to mscorlib.dll, but also uses the file
Microsoft.VisualBasic.

A class called _vbProject is created in IL, as the class name is not mandatory in VB.

The function called _main is the starting sub-routine to be called as it has the
entrypoint directive. Its name is preceded by a leading underscore. These names are
chosen by the VB compiler that generates the IL code.

This function is passed an array of strings as a parameter. It has a custom directive
that deals with the concept of metadata.

Next, we have the full prototype of the function, ending with an optional series of bytes.
These bytes are part of the metadata specifications.

The module modmain gets converted into a class having the same name. This class also
has the same directive .custom as before and a function called Main. The function uses
a directive called .locals to create a variable on the stack that can only be used within
the method. This variable exists only for the duration of the execution of the method
and dies when the method stops running.

Fields are also stored in memory, but it takes a longer time to allocate memory for
them. The word init signifies that on creation, these variables should be initialized to
their default values. The default values depend upon the type of the variable. Numbers
are always initialized to the value ZERO. The word init is followed by the data type of the
variable and finally by its name.
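
The default values themselves can be displayed from C#. The sketch uses the default(T) expression of current C# compilers, which is an assumption on our part since the Beta 1 compiler used in this book predates it. C# does not allow an unassigned local variable to be read, so default(T) stands in for the zeroed value that an init-flagged IL local starts with. DefaultsDemo is a name of our own.

```csharp
using System;

class DefaultsDemo
{
    public static void Main()
    {
        // Numbers start at zero.
        Console.WriteLine(default(int));              // 0
        Console.WriteLine(default(double));           // 0

        // A boolean starts as false.
        Console.WriteLine(default(bool));             // False

        // Reference types, such as strings and arrays, start as null.
        Console.WriteLine(default(string) == null);   // True
        Console.WriteLine(default(object[]) == null); // True
    }
}
```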
-2-

IL Basics

This chapter and the next couple of them will focus on and elicit a simple belief of
ours, that if you really want to understand C# code in earnest, then the best way of
doing so is by understanding the IL code generated by the C# compiler.

So, we shall raise the curtains with a small C# program and then explain the IL
code generated by the compiler. In doing so, we will be able to kill two birds with
one stone: Firstly, we will be able to unravel the mysteries of IL and secondly, we
will obtain a more intuitive understanding of the C# programming language.

We will first show you a .cs file and then a program written in IL by the C# compiler,
whose output will be the same as that of the .cs file. The output displayed will be that of
the IL code. This will enhance our understanding of not only C# but also IL. So,
without much ado, let's take the plunge.

a.cs
class zzz
{
public static void Main()
{
System.Console.WriteLine("hi");
zzz.abc();
}
public static void abc()
{
System.Console.WriteLine("bye");
}
}

c:\il>csc a.cs
c:\il>ildasm /output=a.il a.exe

a.il

// No data.
.corflags 0x00000001
.assembly extern mscorlib
{
.originator = (03 68 91 16 D3 A4 AE 33 ) // .h.....3
.hash = (52 44 F8 C9 55 1F 54 3F 97 D7 AB AD E2 DF 1D E0
F2 9D 4F BC ) // RD..U.T?..........O.
.ver 1:0:2204:21
}
.assembly a as "a"
{
// --- The following custom attribute is added automatically, do not uncomment -------
// .custom instance void [mscorlib]System.Diagnostics.DebuggableAttribute::
// .ctor(bool, bool) = ( 01 00 00 01 00 00 )
.ver 0:0:0:0
}
.module a.exe
// MVID: {3C938660-2A02-11D5-9089-9712D1D64E03}
.class private auto ansi zzz
extends [mscorlib]System.Object
{
.method public hidebysig static void Main() il managed
{
.entrypoint
// Code size 16 (0x10)
.maxstack 8
IL_0000: ldstr "hi"
IL_0005: call void [mscorlib]System.Console::WriteLine(class System.String)
IL_000a: call void zzz::abc()
IL_000f: ret
} // end of method zzz::Main

.method public hidebysig static void abc() il managed
{
// Code size 11 (0xb)
.maxstack 8
IL_0000: ldstr "bye"
IL_0005: call void [mscorlib]System.Console::WriteLine(class System.String)
IL_000a: ret
} // end of method zzz::abc

.method public hidebysig specialname rtspecialname
instance void .ctor() il managed
{
// Code size 7 (0x7)
.maxstack 8
IL_0000: ldarg.0
IL_0001: call instance void [mscorlib]System.Object::.ctor()
IL_0006: ret
} // end of method zzz::.ctor

} // end of class zzz


The above code is generated by the il disassembler

After executing ildasm on the exe file, we studied the IL code generated by the
program. Subsequently, we eliminated parts of the code that did not improve our
understanding of IL. This consisted of some comments, directives, functions etc.
The remaining IL code presented is as close to the original as possible.

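Edited a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void Main() il managed
{
.entrypoint
ldstr "hi"
call void System.Console::WriteLine(class System.String)
call void zzz::abc()
ret
}
.method public hidebysig static void abc() il managed
{
ldstr "bye"
call void System.Console::WriteLine(class System.String)
ret
}
}

c:\il>ilasm a.il

Output
hi
bye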

The advantage of this technique of mastering IL by studying the IL code itself is
that we are learning from the master, i.e. the C# compiler, how to write decent
IL code. We cannot find a better authority than the C# compiler to enlighten us
about IL.

The rules for creating a static function abc remain the same as for any other function
such as Main or vijay. As abc is a static function, we have to use the static modifier
in the .method directive.

When we want to call a function, the following information has to be provided in the
order given below:

the return data type.
the class name.
the function name to be called.
the data types of the parameters.

The same rules also apply when we call the .ctor function from the base class. It is
mandatory to write the name of the class before the name of the function. In IL, no
assumptions are made about the name of the class; it never defaults to the class
we are in while calling the function.

Thus, the above program first displays "hi" using the WriteLine function and then
calls the static function abc. This function too uses the WriteLine function to
display "bye".

a.cs
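class zzz
{
public static void Main()
{
System.Console.WriteLine("hi");
}
static zzz()
{
System.Console.WriteLine("bye");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldstr "bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
bye
hi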

Static constructors are always called before any other code is executed. In C#, a
static constructor is merely a static function with the same name as the class. In IL,
the name of the function changes to .cctor. You may also have observed that in the
earlier example, we got a free function called .ctor.

Whenever we have a class with no constructors, a free constructor with no
parameters is created. This free constructor is given the name .ctor. This knowledge
should enhance our ability as C# programmers, as we are now in a better position
to comprehend what goes on under the hood.

The static constructor gets called first and the function with the entrypoint directive
gets called thereafter.
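
The same ordering can be observed from C# itself. The following is a minimal sketch,
not taken from the compiler output above; the class name StaticOrderDemo and its Log
list are our own hypothetical names, used only to record the order of the calls.

```csharp
using System;
using System.Collections.Generic;

class StaticOrderDemo
{
    // Static field initializers run first, so Log already exists
    // when the body of the static constructor executes.
    public static readonly List<string> Log = new List<string>();

    // The C# compiler turns this into the IL method .cctor.
    static StaticOrderDemo()
    {
        Log.Add("bye");
    }

    // Stands in for the work done by the entry point.
    public static void Run()
    {
        Log.Add("hi");
    }

    static void Main()
    {
        Run();
        Console.WriteLine(string.Join(" ", Log)); // prints "bye hi"
    }
}
```

Even though Main is the entry point, "bye" is recorded before "hi", because the static
constructor has to finish before any static member of the class is used.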

a.cs
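class zzz
{
public static void Main()
{
System.Console.WriteLine("hi");
new zzz();
}
zzz()
{
System.Console.WriteLine("bye");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
newobj instance void zzz::.ctor()
pop
ret
}
.method private hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
hi
bye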

The keyword new in C# gets converted to the assembler instruction newobj. This
provides evidence that IL is not a low level assembler, and that it can also create
objects in memory. The instruction newobj creates a new object in memory. Even in
IL, we are shielded from what new or newobj really does. This demonstrates that IL
is not just another high level language, but is designed in such a way that other
modern languages can be compiled to it.

The rules for using newobj are the same as those for calling a function. The full
prototype of the function is required. In this case, we are calling the
constructor without any parameters, hence the function .ctor is called. In the
constructor, the WriteLine function is called.

As we had promised earlier, we are going to explain the instruction ldarg.0 here.
Whenever we create an object that is an instance of a class, it contains two basic
entities:

functions
fields or variables i.e. data.

When a function gets called, it does not know or care where it is being called
from or who is calling it. It receives all its parameters on the stack. There is no
point in having two copies of a function in memory. If each object carried its own
copy, a class containing a megabyte of code would occupy an additional megabyte of
memory each time we said 'new' on it.

When new is called for the first time, memory gets allocated for the code and the
variables. But thereafter, with every call on new, fresh memory is allocated only for
the variables. Thus, if we have five instances of a class, there will be only one copy
of the code, but five separate copies of the variables.

Every non-static or instance function is passed a handle which indicates the
location of the variables of the object that has called this function. This handle is
called the this pointer. 'this' is loaded by the instruction ldarg.0. This handle is
always passed as the first parameter to every instance function. Since it is always
passed by default, it is not mentioned in the parameter list of a function.
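
The hidden first parameter can be made visible in C#. The sketch below is only an
illustration under our own assumptions: the class Counter and the method
IncrementExplicit are hypothetical names, and the static method merely imitates by
hand what the compiler arranges automatically for an instance method.

```csharp
using System;

class Counter
{
    public int count;

    // Instance method: the object is passed as a hidden first
    // argument, which the IL code reads with ldarg.0.
    public void Increment()
    {
        this.count = this.count + 1;
    }

    // Hand-written static equivalent with the hidden argument spelled out.
    public static void IncrementExplicit(Counter self)
    {
        self.count = self.count + 1;
    }

    static void Main()
    {
        Counter a = new Counter();
        Counter b = new Counter();
        a.Increment();                // 'a' travels as the hidden argument
        Counter.IncrementExplicit(a); // 'a' travels as a visible argument
        Console.WriteLine(a.count);   // 2
        Console.WriteLine(b.count);   // 0: b has its own copy of the variables
    }
}
```

There is one copy of the code of Increment, yet a and b keep separate values of count,
because each call is told which object's variables to work on.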

All the action takes place on the stack. The instruction pop removes whatever is on
the top of the stack. In this example, we use it to remove the instance of zzz that
has been placed on top of the stack by the newobj instruction.

a.cs
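class zzz
{
public static void Main()
{
System.Console.WriteLine("hi");
new zzz();
}
zzz()
{
System.Console.WriteLine("bye");
}
static zzz()
{
System.Console.WriteLine("byes");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
newobj instance void zzz::.ctor()
pop
ret
}
.method private hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldstr "byes"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
byes
hi
bye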

The static constructor always gets called first whereas the instance constructor gets
called only after new. IL enforces this sequence of execution. The calling of the base
class constructor is not mandatory. Hence, to save space in our book, we have not
shown its code in all the programs.

In some cases, if we do not include the code of a constructor, the programs do not
work. Only in these cases has the code of the constructor been included. The static
constructor does not call the base class constructor. Also, the this pointer is not
passed to static functions.

a.cs
class zzz
{
public static void Main()
{
int i = 6;
long j = 7;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object {
.method public hidebysig static void vijay() il managed {
.entrypoint
.locals (int32 V_0,int64 V_1)
ldc.i4.6
stloc.0
ldc.i4.7
conv.i8
stloc.1
ret
}
}

We have created two variables called i and j in our function Main in the C#
program. They are local variables and are created on the stack. On conversion to IL,
if you notice, the names of the variables are lost.

The variables get created in IL through the locals directive, which assigns its own
names to the variables, beginning with V_0 and V_1 and so on. The data types are
also altered from int to int32 and from long to int64. The basic types in C# are
aliases. They all get converted to data types that IL understands.

The task on hand is to initialize the variable i to a value of 6. This value has to be
loaded on the stack or evaluation stack. The instruction to do so is ldc.i4.value. An
i4 takes up four bytes of memory.

The value mentioned in the syntax above is the constant that has to be put on the
stack. After the value 6 has been loaded on to the stack, we now need to initialize
the variable i to this value. The variable i has been renamed as V_0 and is the first
variable in the locals directive.

The instruction stloc.0 takes the value present at the top of the stack i.e. 6 and
initializes the variable V_0 to it. The process of initializing a variable is definitely
complicated.

The second ldc instruction copies the value of 7 onto the stack. On a 32 bit
machine, memory is handled in chunks of 32 bits, i.e. 4 bytes. In the same vein, on
a 64 bit machine, memory is handled in chunks of 64 bits, i.e. 8 bytes.

The number 7 is stored as a constant and requires only 4 bytes, but a long requires
8 bytes. Thus, we need to convert the 4 bytes to 8 bytes. The instruction conv.i8 is
used for this purpose. It places an 8 byte number on the stack. Only after doing so
do we use stloc.1 to initialize the second variable V_1 to the value of 7.

Thus, the ldc series is used to place a constant number on the stack and stloc is
utilized to pick up what is on the stack and initialize a local to that value.
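
a.cs
class zzz
{
static int i = 6;
public long j = 7;
public static void Main()
{
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static int32 i
.field public int64 j
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method public hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.6
stsfld int32 zzz::i
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
ldc.i4.7
conv.i8
stfld int64 zzz::j
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}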

Now you will finally be able to see the light at the end of the tunnel and understand
why we wanted you to read this book in the first place.

Let us understand the above code, one field at a time. We have created a variable i
that is static and initialized it to the value of 6. Since the variable i has not been
given an access modifier, the default is private. The static modifier of C# is
applicable to variables in IL also.

The real action begins now. The variable needs to be assigned an initial value. This
value must be assigned in the static constructor only, because the variable is static.
We employ ldc to place the value 6 on the stack. Note that the locals directive is not
used here.

To initialize i, we use the instruction stsfld, which looks for a value on top of the
stack. The operand that follows stsfld names the type of the field, which determines
the number of bytes it has to pick up from the stack to initialize the static variable.
In this case, the type is int32, so the number of bytes is 4.

The variable name is preceded by the name of the class. This is in contrast to the
syntax of local variables.

For the instance variable j, since its access modifier was public in C#, on
conversion to IL, its access modifier is retained as public. Since it is an instance
variable, its value gets initialized in the instance constructor. The instruction used
here is stfld and not stsfld. Here we need 8 bytes of the stack.

The rest of the code remains the same as before. Thus, we can see that the
instruction stloc is used to initialize locals and the instruction stfld is used to
initialise fields.

a.cs
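class zzz
{
static int i = 6;
public long j = 7;
public static void Main()
{
new zzz();
}
static zzz()
{
System.Console.WriteLine("zzzs");
}
zzz()
{
System.Console.WriteLine("zzzi");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static int32 i
.field public int64 j
.method public hidebysig static void vijay() il managed
{
.entrypoint
newobj instance void zzz::.ctor()
pop
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.6
stsfld int32 zzz::i
ldstr "zzzs"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method private hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
ldc.i4.7
conv.i8
stfld int64 zzz::j
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "zzzi"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
zzzs
zzzi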

The main purpose of the above example is to verify whether the variables are
initialized first or the code contained in a constructor gets called first. The IL output
demonstrates very lucidly that first all the variables get initialized and thereafter,
the code in a constructor gets executed.

You may have also noticed that the base class constructor gets executed first and
then, and only then, does the code that is written in a constructor get called.

This nugget of knowledge is sure to enhance your understanding of C# and IL.
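
The order can also be recorded from C#. The following is a minimal sketch under our
own assumptions: the classes Base and Derived, the helper Note and the Log list are
hypothetical names introduced purely to trace the sequence. As we understand the C#
rules, the field initializers of a class run even before its base class constructor.

```csharp
using System;
using System.Collections.Generic;

class Base
{
    public static readonly List<string> Log = new List<string>();

    public Base()
    {
        Log.Add("base constructor");
    }
}

class Derived : Base
{
    // Field initializer: executed before the base constructor is called.
    private long j = Note("field initializer", 7);

    public Derived()
    {
        // Runs last, after the field and the base class are set up.
        Log.Add("constructor body, j = " + j);
    }

    // Records a step in the log and hands the value back unchanged.
    static long Note(string what, long value)
    {
        Base.Log.Add(what);
        return value;
    }

    static void Main()
    {
        new Derived();
        foreach (string line in Base.Log)
            Console.WriteLine(line);
        // field initializer
        // base constructor
        // constructor body, j = 7
    }
}
```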

a.cs
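class zzz
{
public static void Main()
{
System.Console.WriteLine(10);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
10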

We can print a number instead of a string by calling an overloaded version of the
WriteLine function.

First, we push the value 10 onto the stack using the ldc family. Observe carefully,
the instruction now is ldc.i4.s followed by the value 10. The plain ldc.i4 instruction
carries its constant as a 4 byte operand, but the short form, tagged with .s, carries
it in only one byte.

Then the C# compiler calls the correct overloaded version of the WriteLine function,
which accepts an int32 value from the stack.

This is similar to printing strings.

a.cs
class zzz
{
public static void Main()
{
System.Console.WriteLine("{0}",20);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldstr "{0}"
ldc.i4.s 20
stloc.0
ldloca.s V_0
box [mscorlib]System.Int32
call void [mscorlib]System.Console::WriteLine(class System.String,class System.Object)
ret
}
}

Output
20

We shall now delve into how to print a number on the screen.

The WriteLine function accepts a string followed by a variable number of objects.
The {0} prints the first object after the comma. Even though there is no variable in
the C# code, on conversion to IL code, a variable of type int32 is created.

The string {0} is loaded on the stack using our trustworthy ldstr. Then, we place the
number that is to be passed as a parameter to the WriteLine function, on the stack.
To do so, we use ldc.i4.s which loads the constant value on the stack. After this, we
initialize the variable V_0 to 20 with the stloc.0 instruction, and then ldloca.s loads
the address of the local variable on the stack.

The major roadblock that we experience here is that the WriteLine function accepts
a string followed by an object as the next parameter. In this case, the variable is of
value type and not reference type.

An int32 is a value type variable whereas the WriteLine function wants a
full-fledged object of a reference type.

How do we solve the dilemma of converting a value type into a reference type?

As informed earlier, we use the instruction ldloca.s to load the address of the local
variable V_0 onto the stack. Thus, our stack contains a string followed by the
address of a value type variable, V_0.

Next, we call an instruction called box. There are only two types of variables in
the .NET world i.e. value types and reference types. Boxing is the method that .NET
uses to convert a value type variable into a reference type variable.

The box instruction takes an unboxed or value type variable and converts it into a
boxed or reference type variable. The box instruction needs the address of a value
type on the stack and allocates space on the heap for its equivalent reference type.

The heap is an area of memory used to store reference types. The values on the
stack disappear at the end of a function, but the heap is available for a much longer
duration.

Once this space is allocated, the box instruction initializes the instance fields of the
reference object. Then, it places the memory location in the heap of this newly
constructed object on the stack. The box instruction requires the memory location of
a local variable on the stack.

The constant stored on the stack has no physical address. Thus, the variable V_0 is
created to provide the memory location.

This boxed version on the heap is similar to the reference type variable that we are
familiar with. It really does not have any type and thus looks like System.Object. To
access its specific values, we need to unbox it first. The WriteLine function does this
internally.

The data type of the parameter that is to be boxed must be the same as that of the
variable whose address has been placed on the stack. We will subsequently explain
these details.
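
Boxing can be tried out directly in C#. The sketch below is our own illustration, not
code produced by the authors' compiler run; the class BoxingDemo and the method
BoxAndUnbox are hypothetical names. It shows that the boxed object on the heap is a
separate copy of the value.

```csharp
using System;

class BoxingDemo
{
    // Boxes a value, changes the original afterwards, and unboxes the copy.
    public static int BoxAndUnbox(int value)
    {
        object boxed = value;   // box: the int is copied into a new object on the heap
        value = value + 10;     // the boxed copy is not affected by this change
        return (int)boxed;      // unbox: the stored value is copied back out
    }

    static void Main()
    {
        Console.WriteLine(BoxAndUnbox(20)); // 20
    }
}
```

The assignment to an object variable is where the C# compiler emits the box instruction,
and the cast back to int is where it emits the matching unbox.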

a.cs
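class zzz {
static int i = 10;
public static void Main() {
System.Console.WriteLine("{0}",i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "{0}"
ldsflda int32 zzz::i
box [mscorlib]System.Int32
call void [mscorlib]System.Console::WriteLine(class System.String, class System.Object)
ret
}
.method public hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 10
stsfld int32 zzz::i
ret
}
}

Output
10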

The above code is used to display the value of a static variable. The .cctor function
initializes the static variable to a value of 10. Then, the string {0} is stored on the
stack.

The instruction ldsflda loads the address of a static variable of a certain data type on
the stack. Then, as usual, box takes over. The explanation regarding the
functionality of 'box' given above is relevant here also.

Static variables in IL work in the same way as instance variables. The only
difference is in the fact that they have their own set of instructions. Instructions
like box need a memory location on the stack without discriminating between static
and instance variables.

a.il
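.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "{0}"
ldsflda int32 zzz::i
box [mscorlib]System.Int32
call void [mscorlib]System.Console::WriteLine(class System.String, class System.Object)
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed {
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}

Output
0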

The only variation that we indulged in from the earlier program is that we have
removed the static constructor. All static variables and instance variables get
initialized internally to ZERO. Thus, IL does not generate any error. Internally, even
before the static constructor gets called, the field i is assigned an initial value of
ZERO.
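
The same default values are visible from C#. This is a small sketch of our own; the
class DefaultsDemo, its fields and the method Describe are hypothetical names, and the
fields are deliberately left without initializers.

```csharp
#pragma warning disable 0649 // the fields below are deliberately never assigned
using System;

class DefaultsDemo
{
    static int i;      // no initializer, so nothing is stored in .cctor: starts as 0
    static bool flag;  // starts as false, i.e. 0
    long j;            // instance field: starts as 0

    // Formats the untouched fields so that their default values can be seen.
    public static string Describe()
    {
        DefaultsDemo d = new DefaultsDemo();
        return string.Format("{0} {1} {2}", i, flag, d.j);
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // 0 False 0
    }
}
```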

a.cs
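class zzz
{
public static void Main()
{
int i = 10;
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldc.i4.s 10
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
10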

We have initialised the local i to a value of 10. This cannot be done in the
constructor since the variable i has been created on the stack. Then, stloc.0 has
been used to assign the value of 10 to V_0. Thereafter, ldloc.0 has been utilised to
place the value of V_0 on the stack, so that it is available to the WriteLine function.

The WriteLine function thereafter displays the value on the screen. A field and a
local behave in a similar manner, except that they use separate sets of instructions.

a.il
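.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
51380288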

All local variables have to be initialised, or else, the compiler will generate an
unintelligible error message. Here, even though we have eliminated the ldc and stloc
instructions, no error is generated at runtime. Instead, a very large number is
displayed.

The variable V_0 has not been initialised to any value. It was created on the stack
and contained whatever value was available at the memory location assigned to it.
On your machine, the output will be very different from ours.

In a similar situation, the C# compiler will give you an error and not allow you to
proceed further, because the variable has not been initialized. IL, on the other
hand, is a strange kettle of fish. It is much more lenient in its outlook. It does very
few error or sanity checks on the source code. This has its drawback, meaning the
programmer has to be much more responsible and careful while using IL.

a.cs
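class zzz {
static int i;
public static void Main() {
i = 10;
System.Console.WriteLine("{0}",i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
stsfld int32 zzz::i
ldstr "{0}"
ldsflda int32 zzz::i
box [mscorlib]System.Int32
call void [mscorlib]System.Console::WriteLine(class System.String,class System.Object)
ret
}
}

Output
10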

In the above example, a static variable has been initialised inside a function and
not at the time of its creation, as seen earlier. The function vijay now contains the
code that was earlier present in the static constructor.

The process given above is the only way to initialize a static or an instance variable.

a.cs
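class zzz
{
public static void Main()
{
zzz a = new zzz();
a.abc(10);
}
void abc(int i) {
System.Console.WriteLine("{0}",i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0)
newobj instance void zzz::.ctor()
stloc.0
ldloc.0
ldc.i4.s 10
call instance void zzz::abc(int32)
ret
}
.method private hidebysig instance void abc(int32 i) il managed
{
ldstr "{0}"
ldarga.s i
box [mscorlib]System.Int32
call void [mscorlib]System.Console::WriteLine(class System.String,class System.Object)
ret
}
}

Output
10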

The above program demonstrates how we can call a function with a single
parameter. The rules for placing parameters on the stack are similar to those for the
WriteLine function.

Now let us comprehend how a function receives parameters from the stack.

We begin by stating the data type and parameter name in the function declaration.
This is similar to the workings in C#.

Next, we use the instruction ldarga.s to load the address of the parameter i onto
the stack. box will then convert this value type into an object and finally the
WriteLine function uses these values to display the output on the screen.

a.cs
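class zzz
{
public static void Main()
{
zzz a = new zzz();
a.abc(10);
}
void abc(object i)
{
System.Console.WriteLine("{0}",i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0,int32 V_1)
newobj instance void zzz::.ctor()
stloc.0
ldloc.0
ldc.i4.s 10
stloc.1
ldloca.s V_1
box [mscorlib]System.Int32
call instance void zzz::abc(class System.Object)
ret
}
.method private hidebysig instance void abc(class System.Object i) il managed {
ldstr "{0}"
ldarg.1
call void [mscorlib]System.Console::WriteLine(class System.String,class System.Object)
ret
}
}

Output
10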

In the above example, we have converted an int into an object because, the
WriteLine function requires the parameter to be of this data type.

The only method of achieving this conversion is by using the box instruction. The
box instruction converts an int into an object.

In the function abc, we accept a System.Object and we use the instruction ldarg
and not ldarga. The reason being, we require the value of the parameter and not its
address. The dot after the name signifies the parameter number. In order to place
the values of parameters on the stack, a new instruction is required.

Thus, IL handles locals, fields and parameters with their own set of instructions.

a.cs
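class zzz
{
public static void Main()
{
int i;
zzz a = new zzz();
i = zzz.abc();
System.Console.WriteLine(i);
}
static int abc()
{
return 20;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,class zzz V_1)
newobj instance void zzz::.ctor()
stloc.1
call int32 zzz::abc()
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method private hidebysig static int32 abc() il managed
{
.locals (int32 V_0)
ldc.i4.s 20
ret
}
}

Output
20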

Functions return values. Here, a static function abc has been called. We know from
the function's signature that it returns an int. Return values are stored on the
stack.

Thus, the stloc.1 instruction picks up the value on the stack and places it in the
local V_1. In this specific case, it is the return value of the function.

Newobj is also like a function. It returns an object which, in our case, is an instance
of the class zzz, and puts it on the stack.

The stloc instruction has been used repeatedly to initialize all our local variables.
Just to refresh your memory, ldloc does the reverse of this process.

A function has to just place a value on the stack using the trustworthy ldc and then
cease execution using the ret instruction.

Thus, the stack has a dual role to play:

It holds the values that are passed to the functions.
It receives the return values of the functions.

a.cs
class zzz
{
int i;
public static void Main()
{
zzz a = new zzz();
a.i = zzz.abc();
System.Console.WriteLine(a.i);
}
static int abc()
{
return 20;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0)
newobj instance void zzz::.ctor()
stloc.0
ldloc.0
call int32 zzz::abc()
stfld int32 zzz::i
ldloc.0
ldfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method private hidebysig static int32 abc() il managed
{
.locals (int32 V_0)
ldc.i4.s 20
ret
}
}

Output
20

The only innovation and novelty that has been introduced in the above example is
that the return value of the function abc has been stored in an instance variable.

Stloc assigns the value on the stack to a local variable.
Ldloc, on the other hand, places the value of a local variable on the stack.

At first sight, it is not obvious why the object that looks like zzz has to be put on
the stack again, especially since abc is a static function and not an instance function.
Mind you, static functions are not passed the this pointer on the stack. The reference
is not meant for abc at all; it is placed there for stfld, which expects the object
whose field is to be set on the stack, beneath the value to be stored.

Thereafter, the function abc is called, which places the value 20 on the stack. The
instruction stfld picks up the value 20 from the stack, and initializes the instance
variable i with this value.

Local and instance variables are handled in a similar manner except that the
instructions for their initialization are different.

The instruction ldfld does the reverse of what stfld does. It places the value of an
instance variable on the stack to make it available for the WriteLine function.
-3-

Selection and Repetition

In IL, a label is a name followed by the colon sign i.e. ":". It gives us the ability to
jump from one part of the code to another, unconditionally. We have been
constantly witnessing the labels in the IL code generated by the disassembler. For
example:

IL_0000: ldstr "hi"
IL_000a: call void zzz::abc()
IL_000f: ret

The words preceding the colon are labels. In the program given below, we have
created a label called a2 in the abc function. The instruction br facilitates the
jumping to any label in the program, whenever desired.

a.il
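.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,class zzz V_1)
newobj instance void zzz::.ctor()
stloc.1
call int32 zzz::abc()
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method private hidebysig static int32 abc() il managed
{
.locals (int32 V_0)
ldc.i4.s 20
br.s a2
ldc.i4.s 30
a2: ret
}
}

Output
20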

The function abc demonstrates this concept. In this function, the code bypasses the
instruction ldc.i4.s 30. Therefore, the return value is displayed as 20, and not 30.
Thus, IL uses the br instruction to jump unconditionally to any part of the code.
(The instruction br carries its jump offset as a 4 byte operand whereas the short form
br.s carries it in 1 byte. The same explanation is applicable to every instruction
tagged with .s.)

The br instruction is one of the key pivots on which IL revolves.

a.cs
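class zzz
{
static bool i = true;
public static void Main()
{
if (i)
System.Console.WriteLine("hi");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static bool i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld bool zzz::i
brfalse.s IL_0011
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
IL_0011: ret
}
.method public hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.1
stsfld bool zzz::i
ret
}
}

Output
hi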

We have initialized the static variable to the value true in our C# program.

Static variables, if they are fields, are initialized in the static constructor
.cctor. This is shown in the above example.
Local variables, on the other hand, are initialized in the function that they
are present in.

Here, surprisingly, the value 1 is placed on the stack in the static constructor using
the ldc instruction. Even though the field i had been defined to be of type bool in
both C# and IL, there is no sign of true or false values.

Next, stsfld is used to initialize the static variable i to the value 1 even though the
variable is of the type bool. This proves that IL supports the concept of a data type
called bool, but it does not recognise the words true and false. Thus, in IL, the bool
values true and false are simply aliases for the numbers 1 and 0 respectively.

The bool values TRUE and FALSE are artefacts introduced by C# to make the life
of programmers easier. Since IL does not support these artefacts directly, it uses
the numbers 1 and 0 instead.

The instruction ldsfld places the value of a static variable on the stack. The brfalse
instruction examines the value on top of the stack. If it finds the number 1, it
interprets it as TRUE, and if it finds the number 0, it interprets it as FALSE.

In this example, the value it finds on the stack is a 1 or TRUE and hence, it does
not jump to the label IL_0011. On conversion from C# to IL, ildasm replaces the
label with a name beginning with IL_.

The instruction brfalse means "jump to the label if FALSE". This differs from br,
which always results in a jump. Thus, brfalse is called a conditional jump
instruction.

There is no instruction in IL that provides the functionality of the if statement. The
if statement of C# gets converted to branch instructions in IL. None of the
assemblers that we have worked with support high level concepts like the if
construct.

It can be appreciated from what we have just learnt that it is imperative to gain
mastery over IL. This will help one to differentiate between the concepts that are a
part of IL and those that have been introduced by the designers of the programming
languages.

It is significant to note that if IL does not support a certain feature, it cannot be
implemented in any .NET programming language. Thus, the importance of
familiarising oneself with the various concepts that IL supports cannot be
overemphasised.

a.cs
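class zzz
{
static bool i = true;
public static void Main()
{
if (i)
System.Console.WriteLine("hi");
else
System.Console.WriteLine("false");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static bool i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld bool zzz::i
brfalse.s IL_0013
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
br.s IL_001d
IL_0013: ldstr "false"
call void [mscorlib]System.Console::WriteLine(class System.String)
IL_001d: ret
}
.method public hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.1
stsfld bool zzz::i
ret
}
}

Output
hi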

An if-else statement is extremely simple to comprehend in a programming language,
but it is rather baffling in IL. IL checks whether the value on the stack is 1 or 0.

If the value on the stack is 1, as in this case, it calls the WriteLine function
with the parameter "hi", and then jumps to the label IL_001d using the
unconditional jump instruction br.
If the value on the stack is 0, the code jumps to IL_0013 and the WriteLine
function prints false.

Thus, to implement an if-else construct in IL, a conditional and unconditional jump
are required. The complexity of the IL code increases dramatically if we use multiple
if-else statements.
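
The same shape can be imitated in C# with goto statements. This is only a sketch of
the idea and not what the compiler emits; the class BranchDemo, the method Choose and
the label names ElseBranch and Done are our own hypothetical names.

```csharp
using System;

class BranchDemo
{
    // An if-else written the way IL expresses it: one conditional
    // jump and one unconditional jump.
    public static string Choose(bool i)
    {
        string text = "";
        if (!i) goto ElseBranch;   // plays the role of brfalse.s
        text = "hi";
        goto Done;                 // plays the role of br.s, skipping the else part
    ElseBranch:
        text = "false";
    Done:
        return text;
    }

    static void Main()
    {
        Console.WriteLine(Choose(true));  // hi
        Console.WriteLine(Choose(false)); // false
    }
}
```

Without the unconditional jump to Done, the code for the if part would simply run on
into the else part, which is why both kinds of jump are needed.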

You can now appreciate the intelligence level of the people who write compilers.

a.cs
class zzz
{
public static void Main()
{
}
void abc( bool a)
{
if (a)
{
int i = 0;
}
if ( a)
{
int i = 3;
}
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 x
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method private hidebysig instance void abc(bool a) il managed
{
.locals (int32 V_0,int32 V_1)
ldarg.1
brfalse.s IL_0005
ldc.i4.0
stloc.0
IL_0005: ldarg.1
brfalse.s IL_000a
ldc.i4.3
stloc.1
IL_000a: ret
}
}

The C# programming language can complicate life. In an inner set of braces, we
cannot create a variable that is already created earlier, in an outer set. The above
C# program is syntactically correct since the braces are at the same level.

In IL, life is comparatively hassle free. The two i's become two separate variables
V_0 and V_1. Thus, IL does not impose any such restrictions on variables.

a.cs
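class zzz
{
static bool i = true;
public static void Main()
{
while (i)
{
System.Console.WriteLine("hi");
}
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static bool i
.method public hidebysig static void vijay() il managed
{
.entrypoint
br.s IL_000c
IL_0002: ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
IL_000c: ldsfld bool zzz::i
brtrue.s IL_0002
ret
}
.method public hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.1
stsfld bool zzz::i
ret
}
}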

On seeing the disassembled code, you will comprehend why programmers do
not write IL code for a living. Even a simple while loop gets converted into IL code of
stupendous complexity.

For a while construct, an unconditional jump is first made to the label IL_000c, which
is at the end of the function. Here, the value of the static variable i is loaded on the
stack.

The next instruction, brtrue, does the reverse of what the instruction brfalse does.
It is implemented as follows:

If the uppermost value on the stack, i.e. the value of the field i, is 1, it jumps
to label IL_0002. Then the value "hi" is put on the stack and the WriteLine
function is called.
If the stack value is 0, the program falls through to the ret instruction.

The above program, as you may have noticed, does not intend to stop. It continues
to flow like a perennial stream of water originating from a gigantic glacier.
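
The layout of the loop, with the test placed after the body, can be imitated in C#
using goto. The sketch below is our own; the class LoopDemo, the method CountDown and
the labels Body and Test are hypothetical names, and, unlike the program above, the
condition eventually becomes false so that the loop ends.

```csharp
using System;

class LoopDemo
{
    // A while loop laid out the way the IL code lays it out:
    // jump to the test first, then jump back to the body while it holds.
    public static int CountDown(int n)
    {
        int passes = 0;
        goto Test;                 // plays the role of the opening br.s
    Body:
        passes = passes + 1;       // the body of the loop
        n = n - 1;
    Test:
        if (n > 0) goto Body;      // plays the role of brtrue.s
        return passes;
    }

    static void Main()
    {
        Console.WriteLine(CountDown(3)); // 3
    }
}
```

Placing the test at the bottom means each pass through the loop costs only one jump.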

a.cs
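class zzz
{
static int i = 2;
public static void Main()
{
i = i + 3;
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld int32 zzz::i
ldc.i4.3
add
stsfld int32 zzz::i
ldsfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method public hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.2
stsfld int32 zzz::i
ret
}
}

Output
5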

IL does not have an operator for adding two numbers. The add instruction has to be
used instead.

The add instruction requires the two numbers to be added, to be first made
available on the stack. Therefore, the ldsfld instruction places the value of the static
variable i on the stack and ldc.i4.3 places the constant value 3 on it. The add
instruction then adds them up and places the resultant sum on the stack. It also
removes the two numbers, that were used in the addition, from the stack.

Most instructions in IL get rid of the parameters that are placed on the stack for the
instruction to operate upon, once the instruction has been executed.
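
The behaviour of add on the stack can be mimicked with an ordinary Stack in C#. This
is a rough sketch under our own assumptions and not how the runtime is implemented;
the class EvalStackDemo and the helper Add are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

class EvalStackDemo
{
    // Mimics the IL add instruction: remove the two operands
    // from the stack and put their sum in their place.
    static void Add(Stack<int> stack)
    {
        int second = stack.Pop();
        int first = stack.Pop();
        stack.Push(first + second);
    }

    public static int Run()
    {
        Stack<int> stack = new Stack<int>();
        stack.Push(2);   // ldsfld int32 zzz::i   (i holds 2)
        stack.Push(3);   // ldc.i4.3
        Add(stack);      // add
        // Only the sum is left; the two operands are gone.
        return stack.Count == 1 ? stack.Pop() : -1;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 5
    }
}
```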

The instruction stsfld is used to initialize the static variable i with the resultant
sum of the addition. The rest of the code simply displays the value of the variable i.

There is no equivalent for the ++ operator in IL. It gets converted to the instruction
ldc.i4.1 followed by add. In the same vein, to multiply two numbers, the mul
instruction is used, to subtract, sub is used and so on. They all have their
equivalents in IL. The code following it remains the same.

a.cs
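class zzz {
static bool i;
static int j = 19;
public static void Main() {
i = j > 16;
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.field private static bool i
.field private static int32 j
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld int32 zzz::j
ldc.i4.s 16
cgt
stsfld bool zzz::i
ldsfld bool zzz::i
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
.method public hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 19
stsfld int32 zzz::j
ret
}
}

Output
True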

We shall now look at how IL handles the relational operators. Let us consider
the statement j > 16 in C#. IL first pushes the value of j on the stack followed by the
constant value 16. It then executes the instruction cgt, which is being introduced for
the first time in our source code. This instruction checks if the first value on the stack
is larger than the second. If so, it puts the value 1 (TRUE) on the stack, or else it
puts the value 0 (FALSE) on the stack. This value is then stored in the variable i.
Using the WriteLine function, a bool output is produced, hence we see True
displayed.

243885141.doc 48 od 372
In the same vein, the < operator gets converted to the instruction clt, which checks
if the frst value on the stack is smaller than the second. Thus, we can see that IL
has its own set of logical operators to internally handle the basic logical operations.

a.cs
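class zzz
{
static bool i;
static int j = 19;
public static void Main()
{
i = j == 16;
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static bool i
.field private static int32 j
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld int32 zzz::j
ldc.i4.s 16
ceq
stsfld bool zzz::i
ldsfld bool zzz::i
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 19
stsfld int32 zzz::j
ret
}
}

Output
False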

The operator == is the EQUALITY operator. It also needs the two operands that are to
be checked for equality to be placed on the stack. It thereafter uses the ceq instruction to
check for equality. If they are equal, it places the value 1 (TRUE) on the stack, and if
they are not equal, it places the value 0 (FALSE) on the stack. The ceq instruction
is an integral part of the logical instruction set of IL.

a.cs
class zzz
{
static bool i;
static int j = 19;
public static void Main()
{
i = j <= 16;
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static bool i
.field private static int32 j
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld int32 zzz::j
ldc.i4.s 16
cgt
ldc.i4.0
ceq
stsfld bool zzz::i
ldsfld bool zzz::i
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 19
stsfld int32 zzz::j
ret
}
}

Output
False

The implementation of the "less than or equal to" (i.e. <=) and the "greater than or
equal to" (i.e. >=) operators is a little more complex. They both actually have 2
conditions rolled into one.

In the case of <=, IL first uses the cgt instruction to check if the first number is
greater than the second one. If so, it places the value 1 on the stack, or else it places
the value 0. The ceq instruction then compares this result with 0. Since "less than
or equal to" is the same as "not greater than", the answer is TRUE whenever cgt
produced 0, and FALSE otherwise. The >= operator works the same way, with clt
in place of cgt.

Let us try to decipher the above IL code from a slightly different perspective. We are
comparing the value 19 with 16. In this case, the instruction cgt will put the value
1 on the stack since 19 is greater than 16. The value 0 is put on the stack using the
instruction ldc.

The ceq will compare the value 1 returned by the instruction cgt and the value 0
that was put on the stack by the instruction ldc. Since these two values are not
equal, ceq will return 0 or FALSE on the stack.

Let us change the value of the field j in the static constructor to 1. Now, since the
number 1 is not greater than 16, the cgt instruction will place the value FALSE or 0
on the stack. Thereafter, another 0 is placed on the stack by the ldc instruction.
Now, when the instruction ceq compares the two values, since they are both 0, it
returns TRUE.

Now, if we change the value of j to 16, the cgt instruction will return a FALSE
because 16 is not greater than 16. Thereafter, since the value of 0 is placed on the
stack by the instruction ldc, both the values passed to the instruction ceq will be
0. Since a 0 is equal to a 0, the value returned will be 1 or TRUE.

If you have not understood the above explanation, remove the lines ldc.i4.0 and ceq
from the source code and observe the output.

a.cs
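class zzz
{
static bool i;
static int j = 19;
public static void Main()
{
i = j != 16;
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static bool i
.field private static int32 j
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld int32 zzz::j
ldc.i4.s 16
ceq
ldc.i4.0
ceq
stsfld bool zzz::i
ldsfld bool zzz::i
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 19
stsfld int32 zzz::j
ret
}
}

Output
True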

The "not equal to" operator, i.e. !=, is the reverse of ==. It uses two ceq instructions.
The first ceq instruction is used to check whether the values on the stack are equal.
If they are equal, it returns TRUE; if they are not equal, it returns FALSE.

The second ceq compares the result of the earlier ceq with a FALSE. If the result of
the first ceq is TRUE, the final answer is FALSE and vice versa.

This is truly an ingenious way of negating a value!
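
The negation trick can be tried out in C# itself. The following is a rough sketch, not compiler output: the helpers Cgt, Clt and Ceq are hypothetical functions that return 1 or 0 the way the instructions of the same name do, and the composite operators are built from them by comparing the first result with 0.

```csharp
using System;

public static class CompareSketch
{
    // Each helper returns 1 or 0, as the IL instruction of the same name does.
    public static int Cgt(int a, int b) { return a > b ? 1 : 0; }
    public static int Clt(int a, int b) { return a < b ? 1 : 0; }
    public static int Ceq(int a, int b) { return a == b ? 1 : 0; }

    // a <= b  corresponds to  cgt, ldc.i4.0, ceq
    public static int LessOrEqual(int a, int b) { return Ceq(Cgt(a, b), 0); }
    // a >= b  corresponds to  clt, ldc.i4.0, ceq
    public static int GreaterOrEqual(int a, int b) { return Ceq(Clt(a, b), 0); }
    // a != b  corresponds to  ceq, ldc.i4.0, ceq
    public static int NotEqual(int a, int b) { return Ceq(Ceq(a, b), 0); }

    public static void Main()
    {
        foreach (int j in new[] { 19, 1, 16 })
        {
            Console.WriteLine("j = " + j + ": j <= 16 gives " + LessOrEqual(j, 16)
                + ", j != 16 gives " + NotEqual(j, 16));
        }
    }
}
```

Comparing a 1-or-0 result with 0 flips it, which is why no separate "not" instruction is needed for these operators.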

a.cs
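class zzz
{
static int i = 1;
public static void Main()
{
while ( i <= 2)
{
System.Console.WriteLine(i);
i++;
}
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
br.s IL_0018
IL_0002: ldsfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ldsfld int32 zzz::i
ldc.i4.1
add
stsfld int32 zzz::i
IL_0018: ldsfld int32 zzz::i
ldc.i4.2
ble.s IL_0002
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 1
stsfld int32 zzz::i
ret
}
}

Output
1
2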

We shall now refocus on the while loop after the slight digression into conditional
statements. This diversion was essential because we use conditional statements in
loops such as the while loop. A while loop containing a condition is slightly
complex.

Let us go straight to label IL_0018, which is near the end of the function vijay in the
IL code. The condition is present here. The value of i (i.e. 1) is placed on the stack.
Next, the constant 2 is placed on the stack.

If you revisit the C# code, the condition in the while statement is i <= 2. The
instruction ble.s is equivalent to the two instructions cgt and brfalse. It
checks whether the first value, i.e. the variable i, is less than or equal to the
second. If so, it instructs the program to jump to the label IL_0002. If not, the
program moves to the next instruction.

Thus, instructions like ble make our life simpler because we do not have to use the
instructions cgt and brfalse anymore.

In C#, the condition of a while construct is present at the top, but in IL the code of the
condition is present at the bottom. On conversion to IL, the code to be executed for
the duration of the while construct is placed above the code for the condition.
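
The same layout can be written by hand in C# using goto. This is a sketch under our own assumptions: the label names Body and Condition are hypothetical stand-ins for the IL offsets, and a list is used in place of the WriteLine call so that the result can be inspected.

```csharp
using System;
using System.Collections.Generic;

public static class WhileSketch
{
    // Hand-lowered form of: while (i <= 2) { record i; i++; }
    public static List<int> Run()
    {
        var seen = new List<int>();
        int i = 1;
        goto Condition;          // br.s: skip the body on entry
    Body:
        seen.Add(i);             // stands in for the WriteLine call
        i = i + 1;               // ldc.i4.1, add
    Condition:
        if (i <= 2) goto Body;   // ble.s: jump back while the test holds
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run()));
    }
}
```

The body sits above the condition, and the single jump at the top is what makes the loop test its condition before the first pass.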

a.cs
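class zzz
{
static int i = 1;
public static void Main()
{
for ( i = 1; i <= 2 ; i++)
{
System.Console.WriteLine(i);
}
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.1
stsfld int32 zzz::i
br.s IL_001e
IL_0008: ldsfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ldsfld int32 zzz::i
ldc.i4.1
add
stsfld int32 zzz::i
IL_001e: ldsfld int32 zzz::i
ldc.i4.2
ble.s IL_0008
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 1
stsfld int32 zzz::i
ret
}
}

Output
1
2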

It has been oft repeated that the while and the for constructs provide the same
functionality, and can be interchanged.

In the for loop, the code up to the first semicolon is to be executed only once. Hence,
the initialisation of the variable i is placed outside the loop. Then, we
unconditionally jump to label IL_001e to check whether the value of i is less than or
equal to 2. If TRUE, the code jumps to label IL_0008, which is the beginning of the
code of the for statement.

The value of i is printed using the WriteLine function. Thereafter, the value of the
variable i is increased by one and the condition is checked once again.

a.cs
public class zzz
{
public static void Main()
{
int i;
i = 1;
while ( i <= 2)
{
System.Console.Write(i);
i++;
}
i = 1;
do
{
System.Console.Write(i);
i++;
} while ( i <= 2);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldc.i4.1
stloc.0
br.s IL_000e
IL_0004: ldloc.0
call void [mscorlib]System.Console::Write(int32)
ldloc.0
ldc.i4.1
add
stloc.0
IL_000e: ldloc.0
ldc.i4.2
ble.s IL_0004
ldc.i4.1
stloc.0
IL_0014: ldloc.0
call void [mscorlib]System.Console::Write(int32)
ldloc.0
ldc.i4.1
add
stloc.0
ldloc.0
ldc.i4.2
ble.s IL_0014
ret
}
}

Output
1212

The difference between a do while and a while in a C# program lies in the position
at which the condition gets checked.

In a do while, the condition gets checked at the end of the loop. This means
that the code contained in it will get called at least once.
In a while, the condition is checked at the beginning of the loop. Hence, the
code may never ever get executed.

In either case, we place the value 1 on the stack and initialise the variable i or V_0.

In the while loop, we first jump to label IL_000e, where the condition checked
is whether the variable is "less than or equal to 2". If TRUE, we jump to label
IL_0004.
In the do while loop, first the Write function is called and then the rest of
the code contained in the {} braces is executed. On reaching the last line of the
code within the braces, the condition is checked.

Thus, it is easier to write a do-while loop in IL than a while loop, since the condition
is a simple check at the end of the loop.
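
A short C# sketch makes the "at least once" rule visible. The class LoopSketch and its two counting functions are hypothetical names introduced only for this illustration; each returns how many times the loop body ran for a given starting value.

```csharp
using System;

public static class LoopSketch
{
    // Counts how often the body of a while loop runs.
    public static int WhileRuns(int start)
    {
        int runs = 0;
        int i = start;
        while (i <= 2) { runs++; i++; }
        return runs;
    }

    // Counts how often the body of a do-while loop runs.
    public static int DoWhileRuns(int start)
    {
        int runs = 0;
        int i = start;
        do { runs++; i++; } while (i <= 2);
        return runs;
    }

    public static void Main()
    {
        // The condition i <= 2 is already false when i starts at 5.
        Console.WriteLine(WhileRuns(5));    // the body never runs
        Console.WriteLine(DoWhileRuns(5));  // the body runs once
    }
}
```

When the starting value is 1, both functions report two passes, which matches the output 1212 of the program above.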

a.cs
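public class zzz
{
public static void Main()
{
int i ;
for ( i = 1; i<= 10 ; i++)
{
if ( i == 2)
break;
System.Console.WriteLine(i);
}
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldc.i4.1
stloc.0
br.s IL_0014
IL_0004: ldloc.0
ldc.i4.2
bne.un.s IL_000a
br.s IL_0019
IL_000a: ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ldloc.0
ldc.i4.1
add
stloc.0
IL_0014: ldloc.0
ldc.i4.s 10
ble.s IL_0004
IL_0019: ret
}
}

Output
1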

A break statement facilitates an exit from a for loop, while loop, do-while loop etc.

As usual, we jump to the label IL_0014 where the value of variable V_0 or i is
placed on the stack. Then, we place the condition value 10 on the stack and check
whether i is less than or equal to 10, using the instruction ble.s.

If it is, we get into the loop at label IL_0004. We again place the value of the
variable i on the stack and place the value 2 of the if statement on the stack. Then,
we use the bne.un.s instruction, which is a combination of the ceq and the brfalse
instructions.

If the variable V_0 is equal to 2, the break statement ensures an exit from the loop by
jumping to the ret instruction at label IL_0019 using the instruction br.s.

a.cs
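public class zzz
{
public static void Main()
{
int i ;
for ( i = 1; i<= 10 ; i++)
{
if ( i == 2)
continue;
System.Console.WriteLine(i);
}
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldc.i4.1
stloc.0
br.s IL_0014
IL_0004: ldloc.0
ldc.i4.2
bne.un.s IL_000a
br.s IL_0010
IL_000a: ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
IL_0010: ldloc.0
ldc.i4.1
add
stloc.0
IL_0014: ldloc.0
ldc.i4.s 10
ble.s IL_0004
ret
}
}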

A continue statement takes control to the end of the for loop. When the if statement
results in true, the program will jump to the end of the loop, bypassing the
WriteLine function. The code will then resume execution at label IL_0010, where
the value of the variable V_0 is incremented by 1.

The main difference between the break and the continue statements is as follows:

In a break statement, the program jumps out of the loop.
In a continue statement, the program jumps to the end of the loop,
bypassing the remaining statements.

A goto statement could have also been used to achieve the same functionality. Thus,
the break, continue or goto statements, on conversion to IL, are transformed into
the same br instruction.

The following program demonstrates that a goto statement of C# is simply translated
into a br instruction in IL.
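
To see that break and continue are nothing more than jumps, a for loop can be rewritten in C# with goto alone. The sketch below is illustrative only: the function Run, its parameters skip and stop, and all the label names are hypothetical, and a list replaces the WriteLine call.

```csharp
using System;
using System.Collections.Generic;

public static class JumpSketch
{
    // Equivalent of:
    // for (i = 1; i <= 10; i++)
    // { if (i == skip) continue; if (i == stop) break; record i; }
    public static List<int> Run(int skip, int stop)
    {
        var seen = new List<int>();
        int i = 1;
        goto Condition;
    Body:
        if (i == skip) goto Increment;   // continue: jump to the end of the body
        if (i == stop) goto Exit;        // break: jump past the loop
        seen.Add(i);
    Increment:
        i = i + 1;
    Condition:
        if (i <= 10) goto Body;
    Exit:
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run(2, 5)));
    }
}
```

The only difference between the two statements is the target of the jump: continue lands on the increment, break lands after the loop.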

a.cs
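public class zzz {
public static void Main()
{
goto aa;
aa: ;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
br.s IL_0002
IL_0002: ret
}
}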

A simple goto statement in C# is translated into a br instruction in IL. Using a goto
is considered inappropriate in languages like C#, but its equivalent, the br instruction
in IL, is extensively utilised for implementing various constructs like the if
statement, loops etc. Thus, what is taboo in a programming language is extremely
useful in IL.

a.cs
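public class zzz
{
public static void Main()
{
int j;
for ( int i = 1; i <= 2 ; i++)
System.Console.Write(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,int32 V_1)
ldc.i4.1
stloc.1
br.s IL_000e
IL_0004: ldloc.1
call void [mscorlib]System.Console::Write(int32)
ldloc.1
ldc.i4.1
add
stloc.1
IL_000e: ldloc.1
ldc.i4.2
ble.s IL_0004
ret
}
}

Output
12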

This example illustrates a for statement. We have created a variable j in the function
Main and a variable i in the for statement. This variable i is visible only in the for
loop in C#. Thus, this variable has a limited scope.

But on conversion to IL, all variables are given the same scope. This is because
the concept of variable scoping is alien to IL. Therefore, it is up to the C# compiler to
enforce the rules of variable scoping. We can therefore conclude that all the local
variables of a function have the same scope or visibility in IL.
-4-

Keywords and Operators

Code that is placed after the return statement never gets executed. In the first
program given below, you will notice that there is a WriteLine function call in C#,
but it is not visible in our IL code. This is because the compiler is aware that any
statement after return is never executed and hence, it serves no purpose to convert it
into IL.

a.cs
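class zzz
{
public static void Main()
{
return;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
br.s IL_0002
IL_0002: ret
}
}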

The compiler does not waste time compiling code that will never get executed.
Instead, it generates a warning when it encounters such a situation.

a.cs
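class zzz
{
public static void Main()
{
}
zzz( int i)
{
System.Console.WriteLine("hi");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method private hidebysig specialname rtspecialname instance void .ctor(int32 i) il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}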

If a constructor is not present in the source code, a constructor with no parameters
gets generated. If a constructor is present, the one with no parameters is eliminated
from the code.
The base class constructor always gets called without any parameters and it gets
called first. The above IL code proves this fact.

a.cs
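namespace vijay
{
namespace mukhi
{
class zzz
{
public static void Main()
{
}
}
}
}

a.il
.assembly mukhi {}
.namespace vijay.mukhi
{
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
}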

We may write a namespace within a namespace, but the compiler converts it all
into one namespace in the IL file. Thus, the two namespaces vijay and mukhi in the
C# file get merged into a single namespace vijay.mukhi in the IL file.

a.il
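.assembly mukhi {}
.namespace vijay
{
.namespace mukhi
{
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
}
}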

In C#, one namespace can be present within another namespace, but the C#
compiler prefers using only a single namespace, hence the IL output displays only
one namespace. The .namespace directive in IL is similar in concept to the
namespace keyword in C#. The idea of a namespace originally germinated in IL,
and not in a programming language such as C#.

a.cs
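namespace mukhi
{
class zzz
{
public static void Main()
{
}
}
}
namespace mukhi
{
class pqr
{
}
}

a.il
.assembly mukhi {}
.namespace mukhi
{
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
.class private auto ansi pqr extends [mscorlib]System.Object
{
}
}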

We may have two namespaces called mukhi in the C# file, but they become one
large namespace in the IL file and their contents get merged. This facility of merging
namespaces is offered by the C# compiler.

Had the designers deemed it fit, they could have flagged the above program as an
error instead.

a.cs
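class zzz
{
public static void Main()
{
int i = 6;
zzz a = new zzz();
a.abc(ref i);
System.Console.WriteLine(i);
}
public void abc(ref int i)
{
i = 10;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,class zzz V_1)
ldc.i4.6
stloc.0
newobj instance void zzz::.ctor()
stloc.1
ldloc.1
ldloca.s V_0
call instance void zzz::abc(int32&)
ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method public hidebysig instance void abc(int32& i) il managed
{
ldarg.1
ldc.i4.s 10
stind.i4
ret
}
}

Output
10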

We will now explain how IL implements passing by reference. Unlike C#, it is very
convenient to work with pointers in IL. It has three types of pointers.

When the function abc is called, the variable i is passed to it as a reference
parameter. In IL, the instruction ldloca.s gets called, which places the address of
the variable on the stack. Had the instruction been ldloc instead, the value of the
variable would be placed on the stack.

In the function call, we add the symbol & at the end of the type name to indicate
the address of a variable. & suffixed to a data type indicates the memory location of
a variable, and not the value contained in it.

In the function itself, ldarg.1 is used to place the address held in parameter 1 on the
stack. Then, we place the number that we want to initialise it with on the stack. In
the above example, we have first placed the address of the variable i on the stack,
followed by the value that we want to initialise it with, i.e. 10.

The instruction stind places the value that is present on top of the stack i.e. 10 in
the variable whose address is stored as the second item on the stack. In this case,
as we have passed the address of the variable i on the stack, the variable i is
assigned the value 10.

The instruction stind is used when an address is given on the stack. It fills up that
memory location with the specified value.

If the word ref is replaced with the word out, IL shows the same instructions because, in
either case, the address of a variable is being put on the stack. Thus, ref and out
are artificial concepts implemented in C# and have no separate instructions in
IL.

The instructions have no way of recording whether the original program used ref or
out. Thus, on disassembling this program, the code of the functions gives us no way
of differentiating between ref and out, as this information is lost on conversion from
C# code into IL code.
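
The claim that both keywords pass an address can be checked from C# with reflection. This is a small sketch with hypothetical names (RefOutSketch, ByRef, ByOut, FirstParameterIsByRef); it only asks the runtime whether the first parameter of each function has a by-reference type, i.e. the int32& seen in the IL above.

```csharp
using System;
using System.Reflection;

public static class RefOutSketch
{
    public static void ByRef(ref int i) { i = 10; }
    public static void ByOut(out int i) { i = 10; }

    // True when the named function's first parameter is passed by address.
    public static bool FirstParameterIsByRef(string methodName)
    {
        MethodInfo method = typeof(RefOutSketch).GetMethod(methodName);
        return method.GetParameters()[0].ParameterType.IsByRef;
    }

    public static void Main()
    {
        int a = 6;
        ByRef(ref a);
        int b;
        ByOut(out b);
        Console.WriteLine(a + " " + b);
        Console.WriteLine(FirstParameterIsByRef("ByRef"));
        Console.WriteLine(FirstParameterIsByRef("ByOut"));
    }
}
```

Both functions change the caller's variable, and both report a by-reference parameter type.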

a.cs
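class zzz
{
public static void Main()
{
string s = "hi" + "bye";
System.Console.WriteLine(s);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String V_0)
ldstr "hibye"
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
hibye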

The next focus is on concatenating two strings. The C# compiler does this by
converting them into one string. This occurs due to the compiler's zest to optimise
constants. The value is stored in a local variable and then placed on the stack.
Thus, at compile time, the C# compiler optimises the code as far as possible.

a.cs
class zzz
{
public static void Main()
{
string s = "hi" ;
string t = s + "bye";
System.Console.WriteLine(t);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String V_0,class System.String V_1)
ldstr "hi"
stloc.0
ldloc.0
ldstr "bye"
call class System.String [mscorlib]System.String::Concat(class System.String,class System.String)
stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
hibye

Whenever the compiler deals with variables, it is ignorant of their values at compile
time. The following steps are executed in the above program:

Two variables s and t are converted into the local variables V_0 and V_1
respectively.
The local variable V_0 is assigned the string "hi".
This variable is then pushed onto the stack.
Next, the constant string "bye" is put on the stack.
Thereafter, the + operator is converted into a static function Concat, which
belongs to the String class.
This function concatenates the two strings and creates a new string on the
stack.
This concatenated string is stored in the variable V_1.
The concatenated string is finally printed out.

There are two PLUS (+) operators in C#:

One handles strings. This operator gets converted into the function Concat
from the String class in IL.
The other one handles numbers. This operator gets converted to the add
instruction in IL.

Thus, the String class and its functions are built into the C# compiler. We can
therefore conclude that, C# can understand and handle String operations.
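
The two forms of string concatenation can be placed side by side in C#. The class ConcatSketch and its functions are hypothetical names for this sketch; Folded uses two constants, so the compiler can join them itself, while AtRuntime uses a variable, so the join has to happen when the program runs.

```csharp
using System;

public static class ConcatSketch
{
    // Both operands are compile-time constants, so the compiler folds them.
    public static string Folded() { return "hi" + "bye"; }

    // One operand is a variable, so the + becomes a call to a Concat function.
    public static string AtRuntime(string s) { return s + "bye"; }

    public static void Main()
    {
        Console.WriteLine(Folded());
        Console.WriteLine(AtRuntime("hi"));
    }
}
```

Both functions return the same text. Disassembling this sketch with ildasm should show a single ldstr in the first function and a call to Concat in the second.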

a.cs
class zzz
{
public static void Main()
{
string a = "bye";
string b = "bye";
System.Console.WriteLine(a == b);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String V_0,class System.String V_1)
ldstr "bye"
stloc.0
ldstr "bye"
stloc.1
ldloc.0
ldloc.1
call bool [mscorlib]System.String::Equals(class System.String,class System.String)
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
}

Output
True

Like the + operator, when the == operator is used with strings, the compiler
converts it into the function Equals.

From the above examples, we can deduce that the C# compiler is totally at ease
with strings. The next version will introduce many more of such classes which the
compiler shall understand intuitively.

a.cs
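class zzz
{
public static void Main()
{
System.Console.WriteLine((char)65);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 65
call void [mscorlib]System.Console::WriteLine(wchar)
ret
}
}

Output
A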

Whenever we cast a variable, like a numeric value to a character value, internally,
the program merely calls the function with the data type of the cast. A cast does not
modify the original variable. What actually happens is that, instead of the WriteLine
function being called with an int, it gets called with a wchar. Thus a cast does not
incur any run-time overhead.

a.cs
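class zzz
{
public static void Main()
{
char i = 'a';
System.Console.WriteLine((char)i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (wchar V_0)
ldc.i4.s 97
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(wchar)
ret
}
}

Output
a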

The char data type of C# has a size of 16 bits. It is converted into a wchar on
conversion to IL. The character 'a' gets converted into the ASCII number 97. This is
placed on the stack and the variable V_0 is initialised to this value. Thereafter, the
program displays the value 'a' on the screen.

a.cs
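class zzz
{
public static void Main()
{
System.Console.WriteLine('\u0041');
System.Console.WriteLine(0x41);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 65
call void [mscorlib]System.Console::WriteLine(wchar)
ldc.i4.s 65
call void [mscorlib]System.Console::WriteLine(int32)
ret
ret
}
}

Output
A
65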

The IL code shows neither the UNICODE escape nor the HEXADECIMAL number. Both
are converted into the plain and simple decimal 65. The \u escape sequence is
provided as a convenience to C# programmers, to enhance their productivity.

You may have noticed that, even though the above program has two ret
instructions, no error is generated. The criterion is that at least one ret instruction
should be present.
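
That the different notations stand for one and the same number can be confirmed in C#. The class LiteralSketch and its three functions are hypothetical names used only in this sketch.

```csharp
using System;

public static class LiteralSketch
{
    public static char FromEscape() { return '\u0041'; }  // Unicode escape
    public static int FromHex() { return 0x41; }          // hexadecimal literal
    public static char FromCast() { return (char)65; }    // decimal value cast to char

    public static void Main()
    {
        Console.WriteLine(FromEscape());
        Console.WriteLine(FromHex());
        Console.WriteLine(FromCast());
    }
}
```

The escape and the cast yield the same character, and the hexadecimal literal is simply the int 65 written differently.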

a.cs
class zzz
{
public static void Main()
{
int @int;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ret
}
}

Variables created on the stack in C# are not given the same names on conversion to
IL. So, the situation where a reserved word of C# could create a problem in IL does
not arise.

a.cs
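class zzz
{
int @int;
public static void Main()
{
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 'int'
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}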

In the above program, the variable @int becomes a field named int, and the int
datatype is changed to int32. Since int is a reserved word in IL too, the
compiler writes the field name in single inverted commas. On conversion to IL, the @
sign simply disappears from the name of the variable.

a.cs
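// hi this is comment
class zzz {
public static void Main() // allowed here
{
/*
A comment over
two lines
*/
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}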

When you see the above code, you will realize why programmers the world over have
an aversion to writing comments. All comments in C# are stripped off when the IL
file is generated. Not a single comment is copied over into the IL code.

The compiler has scant respect for comments, and it throws all of them away. There
is little wonder that programmers consider writing comments as an exercise in
futility, and their frustration is well founded.

a.cs
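class zzz
{
public static void Main()
{
System.Console.WriteLine("hi \nBye\tNo");
System.Console.WriteLine("\\");
System.Console.WriteLine(@"hi \nBye\tNo");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hi \nBye\tNo"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldstr "\\"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldstr "hi \\nBye\\tNo"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
hi
Bye No
\
hi \nBye\tNo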

The String handling capabilities of C# have been inherited from IL. The escape
sequences like \n have been simply copied over.

The two backslashes (\\) result in a single backslash when displayed.

If a string is prefaced with an @ sign, the special meaning of the escape sequences
in the string is ignored and they are displayed verbatim, as shown in the program
above.
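
The effect of the @ sign can be measured by counting characters. In the sketch below, the class EscapeSketch and its functions are hypothetical names; the same text is written once as a regular string and once as a verbatim string.

```csharp
using System;

public static class EscapeSketch
{
    public static string Regular() { return "hi \nBye\tNo"; }
    public static string Verbatim() { return @"hi \nBye\tNo"; }

    public static void Main()
    {
        // The regular string holds a real newline and a real tab: 10 characters.
        Console.WriteLine(Regular().Length);
        // The verbatim string keeps each backslash and the letter after it: 12 characters.
        Console.WriteLine(Verbatim().Length);
        // Doubling the backslashes in a regular string gives the same text.
        Console.WriteLine(Verbatim() == "hi \\nBye\\tNo");
    }
}
```

The last comparison mirrors what the IL shows: the verbatim string is stored with doubled backslashes.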

If IL had not provided support for string formatting, it would have been vexed with
the predicament of handling most of the modern programming languages.

a.cs
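#define vijay
class zzz {
public static void Main()
{
#if vijay
System.Console.WriteLine("1");
#else
System.Console.WriteLine("2");
#endif
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "1"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
ret
}
}

Output
1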

The next series of programs deals with the pre-processor directives, which are alien
to IL. They are resolved at compile time, before any IL code is generated.

In the above .cs program, the #define directive creates a word called "vijay". The
compiler knows that the #if statement is TRUE and therefore, it ignores the #else
statement. Thus, the IL file that is generated contains only the WriteLine function
that has the parameter '1' and not the one that has the parameter '2'.

This is the power of compile time knowledge. A large amount of the code that is
never going to be used is simply eliminated by the pre-processor prior to converting
it into IL.
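
The same selection can be observed through a return value. This sketch uses the hypothetical names DirectiveSketch and Chosen; only one of the two return statements survives compilation.

```csharp
#define vijay
using System;

public static class DirectiveSketch
{
    public static string Chosen()
    {
#if vijay
        return "1";   // compiled, because the word vijay is defined
#else
        return "2";   // never reaches the IL code
#endif
    }

    public static void Main()
    {
        Console.WriteLine(Chosen());
    }
}
```

Removing the #define line at the top would make the function return "2" instead, without any change to the rest of the source.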

a.cs
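#define vijay
#undef vijay
#undef vijay
class zzz {
public static void Main()
{
#if vijay
#endif
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}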

We can use as many #undef statements as we like. The compiler knows that the
word 'vijay' has been undefined and therefore, it ignores the code in the #if
statement.

There is no way the original pre-processor directives can be recovered on
re-conversion of code from IL to C#.

a.cs
#warning We have a code red
class zzz
{
public static void Main()
{
}
}

The pre-processor directive #warning in C# is used to display warnings for the
benefit of the programmer who runs the compiler.

The pre-processor directives #line and #error also do not produce any executable
output. They are used merely for providing information.

Inheritance

a.cs
class zzz
{
public static void Main()
{
xxx a = new xxx();
a.abc();
}
}
class yyy
{
public void abc()
{
System.Console.WriteLine("yyy abc");
}
}
class xxx : yyy
{
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class xxx V_0)
newobj instance void xxx::.ctor()
stloc.0
ldloc.0
call instance void yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig instance void abc() il managed
{
ldstr "yyy abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class private auto ansi xxx extends yyy
{
}

Output
yyy abc

The concept of inheritance is identical in all programming languages that support
it. The word extends has originated in IL and Java and not in C#.

When we write a.abc(), the compiler decides on the abc function to call based on the
following criteria:

If the class xxx has a function abc, then the call in function vijay will have
the prefix xxx.
If the class yyy has a function abc, then the call in function vijay will have
the prefix yyy.

Therefore, the intelligence that decides as to which function abc is to be called,
resides in the compiler and not in the generated IL code.

a.cs
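class zzz {
public static void Main()
{
yyy a = new xxx();
a.abc();
}
}
class yyy
{
public virtual void abc()
{
System.Console.WriteLine("yyy abc");
}
}
class xxx : yyy
{
public new void abc()
{
System.Console.WriteLine("xxx abc");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void xxx::.ctor()
stloc.0
ldloc.0
callvirt instance void yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig newslot virtual instance void abc() il managed
{
ldstr "yyy abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class private auto ansi xxx extends yyy
{
.method public hidebysig instance void abc() il managed
{
ldstr "xxx abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
yyy abc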


In the context of the above program, a small explanation would not be out of place
for the benefit of C# neophytes.

We can initialise a variable a of the base class yyy to an object of the derived class
xxx. We have called the function a.abc(). The question that comes to the fore is:
which of the following two versions of the function abc will be called?

The function abc present in the base class yyy, to which the calling object
belongs.
OR
The function abc present in the class xxx, which is the type that it has been
initialised to.

In other words, is the compile time type significant or the runtime type?

The base class function has a modifier called virtual, implying that the derived
classes can override this function. The derived class, by adding the modifier new,
informs the compiler that this function abc has nothing to do with the function
abc of the base class. It is to treat them as separate entities.

First, the this pointer is put on the stack using ldloc.0. Then, in place of a call
instruction, there is a callvirt instead. This is because the function abc is virtual.
Other than this, there exists no difference. The function abc in class yyy is declared
virtual and is also tagged with newslot. This signifies that it is a new virtual
function. The word new is placed in the derived class in C#.

IL also uses a mechanism similar to that of C# to figure out which version of
abc is to be called.

a.cs
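class zzz
{
public static void Main()
{
yyy a = new xxx();
a.abc();
}
}
class yyy
{
public virtual void abc()
{
System.Console.WriteLine("yyy abc");
}
}
class xxx : yyy
{
public override void abc()
{
System.Console.WriteLine("xxx abc");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void xxx::.ctor()
stloc.0
ldloc.0
callvirt instance void yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig newslot virtual instance void abc() il managed
{
ldstr "yyy abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class private auto ansi xxx extends yyy
{
.method public hidebysig virtual instance void abc() il managed
{
ldstr "xxx abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void yyy::.ctor()
ret
}
}

Output
xxx abc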

If the base constructor of class xxx is not called, no output is displayed in the
output window. As a rule, we have not included the free constructor code in our IL
programs.

In the absence of the keywords new or override, the default keyword used is new. In the
above function abc, in class xxx, we have used the override keyword, which implies
that this function abc overrides the function of the base class.

By default, IL names the virtual function through the class which the object looks like,
i.e. it uses the compile time type. In this case, it is yyy.

The first change that occurs with override in the derived class is the addition of the
word virtual to the function prototype. This was not supplied earlier with new
because a new function got created altogether which isolated itself from the base
class.

The use of override effectively results in the overriding of the base class function.
This makes the function abc a virtual function in the class xxx. In other words,
override becomes virtual whereas new becomes nothing.

As there is a newslot modifier in the base class and a virtual function of the same
name in the derived class, the function of the derived class gets called.

In a virtual function, the run time type of the object gets preference. The instruction
callvirt resolves this issue at run-time and not at compile time.
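
The difference between new and override can be seen in one short C# program. The class names Base, Hides, Overrides and DispatchSketch are hypothetical; both derived objects are called through a variable of the base class type.

```csharp
using System;

public class Base
{
    public virtual string Name() { return "Base"; }
}

public class Hides : Base
{
    // new: an unrelated function that merely shares the name.
    public new string Name() { return "Hides"; }
}

public class Overrides : Base
{
    // override: replaces the base class function for this type.
    public override string Name() { return "Overrides"; }
}

public static class DispatchSketch
{
    // The compile time type of b is Base in both calls.
    public static string ViaBase(Base b) { return b.Name(); }

    public static void Main()
    {
        Console.WriteLine(ViaBase(new Hides()));      // prints Base
        Console.WriteLine(ViaBase(new Overrides()));  // prints Overrides
    }
}
```

The call site is identical in both cases. Only the function marked override takes part in the run time decision.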

a.cs
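class zzz
{
public static void Main()
{
yyy a = new xxx();
a.abc();
}
}
class yyy
{
public virtual void abc()
{
}
}
class xxx : yyy
{
public override void abc()
{
base.abc();
System.Console.WriteLine("xxx abc");
}
}

a.il
.method public hidebysig virtual instance void abc() il managed
{
ldarg.0
call instance void yyy::abc()
ldstr "xxx abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}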

Only the code of the function abc in class xxx has been shown above. The rest of
the IL code has been omitted. base.abc() calls the function abc from the base class,
i.e. class yyy. The keyword base is a reference to the object in memory. This keyword
of C# is not understood by IL as it is a compile time issue. Base does not care
whether the function is virtual or not.

Whenever we make a function virtual for the first time, it is a good idea to mark it
as newslot, solely to signify a break from all the functions with the same name
present in the superclasses.

a.il
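.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
newobj instance void yyy::.ctor()
callvirt instance void iii::pqr()
ret
}
}
.class interface iii
{
.method public virtual abstract void pqr() il managed
{
}
}
.class public yyy implements iii
{
.override iii::pqr with instance void yyy::abc()
.method public virtual hidebysig newslot instance void abc() il managed
{
ldstr "yyy abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}

Output
yyy abc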

We have created an interface iii with just one function called pqr. Then, the class
yyy implements the interface iii but does not implement the function pqr. Instead, it
adds a function called abc. In the entrypoint function vijay, the function pqr of
the interface iii is called.

The reason we get no errors is due to the presence of the override directive. This
directive informs the assembler to redirect any call made to the function pqr of
interface iii, to the class yyy function abc. The assembler is very serious about the
override directive. This can be gauged from the fact that without the implements iii
in the definition of class yyy, we are given the following exception:

Output
Exception occurred: System.TypeLoadException: Class yyy tried to override
method pqr but does not implement or inherit that methods.
at zzz.vijay()

Destructors

a.cs
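class zzz
{
public static void Main()
{
}
~zzz()
{
System.Console.WriteLine("hi");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method family hidebysig virtual instance void Finalize() il managed
{
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldarg.0
call instance void [mscorlib]System.Object::Finalize()
ret
}
}

No output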

A destructor gets converted into a function called Finalize. This piece of information
is also laid down in the C# documentation. The Finalize function calls the original
from Object. The text "hi" does not get displayed because the function is called as
and when the runtime decides. All we know is that it gets called at the demise of the
object. Thus, whenever the object dies, it calls Finalize. There is no way of explicitly
destroying anyone or anything, including .NET objects.

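a.cs
class zzz
{
public zzz()
{
}
public zzz(int i)
{
}
public static void Main()
{
}
~zzz()
{
}
}
class yyy : zzz
{
}

a.il
.class private auto ansi yyy extends zzz
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void zzz::.ctor()
ret
}
}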

In the above code, we have displayed only the yyy class. Even though the base class
has 2 constructors and 1 destructor, the class yyy only receives the free constructor
with no parameters. Thus, derived classes do not inherit constructors or destructors
of the base class.

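a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
call void yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Array
{
.method public hidebysig static void abc() il managed
{
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
hi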

In C#, we are not allowed to derive a class from certain classes like System.Array.
However, in IL there is no such restriction. Thus, the above code does not generate
any error.

We can safely conclude that the C# compiler has added the above restrictions and
that IL is less restrictive. The rules of a language are decided by the compiler at
compile time.

For your information, the other classes that we cannot derive from, in C#, are
Delegate, Enum and ValueType.

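a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class aa V_0)
newobj instance void aa::.ctor()
stloc.0
ret
}
}
.class public auto ansi aa extends bb
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void bb::.ctor()
ldstr "aa"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class public auto ansi bb extends cc
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void cc::.ctor()
ldstr "bb"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class public auto ansi cc extends aa
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void aa::.ctor()
ldstr "cc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Error
Exception occurred: System.TypeLoadException: Could not load class 'aa'
because the format is bad (too long?)
at zzz.vijay()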

We are forbidden to have a circular reference in C#. The compiler checks for it and, if
found, reports an error. IL, however, does not check for a circular reference because
Microsoft does not expect all programmers to use pure IL.

Here, class aa extends bb, class bb extends cc and finally class cc extends aa.
This completes the circular reference. The exception that is thrown at runtime does
not give any indication of a circular reference. Thus, if we had not unravelled this
mystery for you here, the exception would most probably have left you baffled. We
do not intend to disclose the fact that we have understood IL deeply, but there is no
harm in giving oneself a pat on the back once in a while.

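a.cs
internal class zzz
{
public static void Main()
{
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}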

Access modifiers, like the keyword internal, are only part of the C# lexicon and have
no relevance in IL. The keyword internal signifies that the particular class can only
be accessed from within the assembly in which it is present.

Thus, by mastering IL, we are in a position to differentiate between the core
belongings of .NET and features existing in the realms of C#.

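a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
.class public auto ansi yyy extends xxx
{
}
.class private auto ansi xxx extends [mscorlib]System.Object
{
}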

In C#, there is a rule: the base class has to be at least as accessible as the derived
class. This rule is not adhered to in IL. Thus, even though the base class xxx is
private and the derived class yyy is public, no error is generated in IL.

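a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}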

A function in C# cannot be more accessible than the class within which it resides.
The function vijay is public, whereas the class that it is located in is private. Thus,
the class is more restrictive than the function contained in it. Again, there is no
such restriction imposed in IL.

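a.cs
class zzz
{
public static void Main()
{
yyy a = new yyy();
xxx b = new xxx();
a = b;
b = (xxx) a;
}
}
class yyy
{
}
class xxx : yyy
{
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class xxx V_1)
newobj instance void yyy::.ctor()
stloc.0
newobj instance void xxx::.ctor()
stloc.1
ldloc.1
stloc.0
ldloc.0
castclass xxx
stloc.1
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
}
.class private auto ansi xxx extends yyy
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void yyy::.ctor()
ret
}
}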

Without a constructor in xxx, the following exception is thrown:

Output
Exception occurred: System.InvalidCastException: An exception of type
System.InvalidCastException was thrown.
at zzz.vijay()

In the above example, we are creating two objects a and b, which are instances of
classes yyy and xxx respectively. The class xxx is the derived class and yyy is the
base class. We can write a = b, but if we assign a base class reference to a derived
class variable, as in b = a, an error is generated. Thus, a cast operator is required.

A cast in C# gets converted to the instruction castclass, followed by the name of the
derived class that the object has to be cast into. If it cannot be cast, the above
mentioned exception will be raised.
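
The behaviour of castclass can be observed from C# itself. The following is only a minimal sketch; the class names Animal, Dog and CastDemo are our own and hypothetical, not part of the listings above. The explicit cast compiles to castclass, and a failed cast surfaces as an InvalidCastException.

```csharp
using System;

class Animal { }
class Dog : Animal { }

static class CastDemo
{
    // Returns true when the explicit cast (castclass in IL) succeeds.
    public static bool CanCast(Animal a)
    {
        try
        {
            Dog d = (Dog)a;          // compiles to: castclass Dog
            return d != null;
        }
        catch (InvalidCastException)
        {
            return false;            // the run time type was not a Dog
        }
    }

    static void Main()
    {
        Console.WriteLine(CanCast(new Dog()));     // True
        Console.WriteLine(CanCast(new Animal()));  // False
    }
}
```

The check happens at run time against the real type of the object, not against the type of the variable that holds it.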

In the above code, there is no constructor, and hence, the exception is generated.

Thus, IL has a number of higher level primitives that deal with objects and classes.

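a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class xxx V_1)
newobj instance void yyy::.ctor()
stloc.0
newobj instance void xxx::.ctor()
stloc.1
ldloc.1
stloc.0
ldloc.0
castclass xxx
stloc.1
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
}
.class private auto ansi xxx extends [mscorlib]System.Object
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void System.Object::.ctor()
ret
}
}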

In the above case, the class xxx does not derive from class yyy anymore. They both
extend from the Object class. Yet, we are allowed to cast the class yyy to class xxx.
No error is generated with a constructor in the class xxx, but on removal of the
constructor, an exception is generated. IL too has its own strange way of working.

a.il
.assembly mukhi {}
.class private auto ansi sealed zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
.class private auto ansi yyy extends zzz
{
}

The documentation states very clearly that a sealed class cannot be extended or
sub-classed any further. In this case, an error was expected but none was
generated. We must remind you that we are working on a beta copy. The next
version may generate an error.

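a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
ret
}
}
.class private auto ansi abstract yyy
{
}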

An abstract class cannot be used directly. It can only be derived from. The above
code should have generated an error, but it does not.

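a.cs
public class zzz
{
const int i = 10;
public static void Main()
{
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
10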

A constant is an entity that only exists at compile time. It is not visible at run-time.
This proves that the compiler removes all traces of compile time objects. On
conversion to IL, all occurrences of int i in the C# code get replaced by the number
10.

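a.cs
public class zzz
{
const int i = j + 4;
const int j = k - 1;
const int k = 3;
public static void Main() {
System.Console.WriteLine(k);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static literal int32 i = int32(0x00000006)
.field private static literal int32 j = int32(0x00000002)
.field private static literal int32 k = int32(0x00000003)
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.3
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
3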

All the constants are evaluated by the compiler and, even though they may refer to
other constants, they are given absolute values. The runtime does not allocate
any memory for literal fields. This falls in the realm of metadata, which we shall
explain later.
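
That the folded values live only in metadata can be checked with reflection. This is a small sketch under our own assumptions: the class ConstDemo and its helper methods are hypothetical names, while FieldInfo.IsLiteral and FieldInfo.GetRawConstantValue are the standard reflection members used to read a literal field.

```csharp
using System;
using System.Reflection;

public class ConstDemo
{
    const int i = j + 4;
    const int j = k - 1;
    const int k = 3;

    static FieldInfo Find(string name)
    {
        return typeof(ConstDemo).GetField(
            name, BindingFlags.NonPublic | BindingFlags.Static);
    }

    // Reads the value the compiler stored in metadata for a const.
    public static int RawValue(string name)
    {
        return (int)Find(name).GetRawConstantValue();
    }

    // True when the field carries the literal attribute seen in the IL.
    public static bool IsLiteral(string name)
    {
        return Find(name).IsLiteral;
    }

    static void Main()
    {
        Console.WriteLine(RawValue("i"));   // 6
        Console.WriteLine(IsLiteral("i"));  // True
    }
}
```

The value 6 was computed by the compiler from k = 3, j = k - 1 and i = j + 4; nothing is calculated when the program runs.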

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static literal int32 i = int32(0x00000006)
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.6
stsfld int32 zzz::i
ret
}
}

Output
Exception occurred: System.MissingFieldException: zzz.i
at zzz.vijay()

A literal field represents a constant value. In IL, we are not allowed to access any
literal field. The assembler does not generate any error at the time of assembling,
but an exception is thrown at run time. We expected a compile time error, since we
have used a literal field in the instruction stsfld.

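a.cs
public class zzz
{
public static readonly int i = 10;
public static void Main()
{
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field public static initonly int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldsfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method private hidebysig specialname rtspecialname static void .cctor() il managed
{
ldc.i4.s 10
stsfld int32 zzz::i
ret
}
}

Output
10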

A readonly field cannot be modified. In IL, we have a modifier called initonly which
implements the same concept.

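a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field public static initonly int32 i
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
stsfld int32 zzz::i
ldsfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}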

The documentation very clearly states that initonly fields can only be changed in
the constructor, but the CLR (Common Language Runtime) does not strictly check
this. Maybe in the next version, they should guard against such occurrences.

Thus, the entire series of restrictions on readonly have to be enforced by the
programming language that converts the source code to IL. We are not trying to run
down IL, but IL expects someone else to do the error checking in this situation.
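
The mapping from readonly to initonly is visible through reflection as well. The sketch below uses our own hypothetical class ReadonlyDemo; FieldInfo.IsInitOnly is the standard reflection property that reports the initonly attribute.

```csharp
using System;
using System.Reflection;

public class ReadonlyDemo
{
    public static readonly int i = 10;

    // True when the field was emitted with the initonly attribute.
    public static bool IsInitOnly()
    {
        FieldInfo f = typeof(ReadonlyDemo).GetField("i");
        return f.IsInitOnly;
    }

    static void Main()
    {
        Console.WriteLine(i);             // 10
        Console.WriteLine(IsInitOnly());  // True
    }
}
```

Unlike a const, the field exists at run time and is assigned in the static constructor; it is the C# compiler that refuses any later assignment.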

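a.cs
public class zzz
{
public static void Main()
{
zzz a = new zzz();
pqr();
a.abc();
}
public static void pqr()
{
}
public void abc()
{
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0)
newobj instance void zzz::.ctor()
stloc.0
call void zzz::pqr()
ldloc.0
call instance void zzz::abc()
ret
}
.method public hidebysig static void pqr() il managed
{
ret
}
.method public hidebysig instance void abc() il managed
{
ret
}
}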

This example serves as a refresher. The static function pqr is not passed the this
pointer on the stack, whereas, the non-static function abc is passed the this
pointer or a reference to where its variables are stored in memory.

Thus, before the call to function abc, the instruction ldloc.0 pushes the reference of
zzz onto the stack.

a.cs
public class zzz
{
public static void Main()
{
pqr(10,20);
}
public static void pqr(int i , int j)
{
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
ldc.i4.s 20
call void zzz::pqr(int32,int32)
ret
}
.method public hidebysig static void pqr(int32 i,int32 j) il managed
{
ret
}
}

The calling convention indicates the order in which the parameters should be
pushed onto the stack. The default sequence in IL is the order in which they were
written. Thus, the number 10 first goes onto the stack, followed by the number 20.

Microsoft's native calling conventions implement the reverse order. There, 20 first
goes on the stack, followed by 10. We cannot reason out this idiosyncrasy.
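
The left to right order can be observed from C# by giving each argument a side effect. This is a sketch only; ArgOrderDemo, Note and Pqr are hypothetical names of our own.

```csharp
using System;
using System.Collections.Generic;

static class ArgOrderDemo
{
    static readonly List<int> log = new List<int>();

    static int Note(int value)
    {
        log.Add(value);   // record the moment this argument is evaluated
        return value;
    }

    static void Pqr(int i, int j) { }

    // Returns the order in which the two arguments were evaluated.
    public static string Order()
    {
        log.Clear();
        Pqr(Note(10), Note(20));
        return string.Join(",", log);
    }

    static void Main()
    {
        Console.WriteLine(Order());   // 10,20
    }
}
```

The arguments are evaluated, and hence pushed, in the order in which they are written.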

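a.cs
public class zzz
{
public static void Main()
{
bb a = new bb();
}
}
public class aa
{
public aa()
{
System.Console.WriteLine("in const aa");
}
public aa(int i)
{
System.Console.WriteLine("in const aa" + i);
}
}
public class bb : aa
{
public bb() : this(20)
{
System.Console.WriteLine("in const bb");
}
public bb(int i) : base(i)
{
System.Console.WriteLine("in const bb" + i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class bb V_0)
newobj instance void bb::.ctor()
stloc.0
ret
}
}
.class public auto ansi aa extends [mscorlib]System.Object
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "in const aa"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor(int32 i) il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "in const aa"
ldarga.s i
box [mscorlib]System.Int32
call class System.String [mscorlib]System.String::Concat(class System.Object,class System.Object)
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class public auto ansi bb extends aa
{
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
ldc.i4.s 20
call instance void bb::.ctor(int32)
ldstr "in const bb"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor(int32 i) il managed
{
ldarg.0
ldarg.1
call instance void aa::.ctor(int32)
ldstr "in const bb"
ldarga.s i
box [mscorlib]System.Int32
call class System.String [mscorlib]System.String::Concat(class System.Object,class System.Object)
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
in const aa20
in const bb20
in const bb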

We have created only one object, which is an instance of the class bb. Instead of two
constructors, one from the base class and one from the derived class, three
constructors are called.

In IL, at first, a call is made to the constructor of bb with no parameters.
Then, on reaching this constructor, a call is made to another constructor
of the same class, but with a parameter value of 20. this(20) gets converted into
an actual constructor call with one parameter.
Now, we move on to the one-parameter constructor of bb. Here, a call to the
one-parameter constructor of aa is made first, as the base class constructor needs
to be called first.

Luckily, the base class constructor of aa does not take us on another wild goose
chase. After it finishes execution, the strings are displayed, and finally, the rest of
the constructor of bb that has no parameters gets executed.

Thus, base and this do not exist in IL and are compile time artefacts that get hard
coded into the IL code.
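
The order of the three constructor bodies can be recorded rather than printed. The sketch below mirrors the example above with our own hypothetical names (Aa, Bb, ChainDemo) and a shared log in place of WriteLine.

```csharp
using System;
using System.Collections.Generic;

public class Aa
{
    public static readonly List<string> Log = new List<string>();
    public Aa() { Log.Add("aa()"); }
    public Aa(int i) { Log.Add("aa(int) " + i); }
}

public class Bb : Aa
{
    public Bb() : this(20) { Log.Add("bb()"); }          // call bb::.ctor(int32)
    public Bb(int i) : base(i) { Log.Add("bb(int) " + i); } // call aa::.ctor(int32)
}

static class ChainDemo
{
    // Creates one Bb object and returns the order of the constructor bodies.
    public static string Run()
    {
        Aa.Log.Clear();
        new Bb();
        return string.Join(" | ", Aa.Log);
    }

    static void Main()
    {
        Console.WriteLine(Run());   // aa(int) 20 | bb(int) 20 | bb()
    }
}
```

The parameterless constructor of Aa never runs, because no constructor in the chain calls it.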

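a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object {
.method public hidebysig static void vijay() il managed {
.entrypoint
newobj instance void aa::.ctor()
pop
ret
}
}
.class public auto ansi aa extends [mscorlib]System.Object {
.method private hidebysig specialname rtspecialname instance void .ctor() il managed
{
ret
}
}

Output
Exception occurred: System.MethodAccessException: aa..ctor()
at zzz.vijay()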

We cannot access a private member from outside the class. Thus, as we have made
the only constructor in the class aa private, we are not allowed to create any object
that looks like class aa. In C#, the same rules apply to the access modifiers.

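a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy();
}
}
class yyy
{
public int i;
public bool j;
public yyy()
{
System.Console.WriteLine(i);
System.Console.WriteLine(j);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.field public int32 i
.field public bool j
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldarg.0
ldfld int32 yyy::i
call void [mscorlib]System.Console::WriteLine(int32)
ldarg.0
ldfld bool yyy::j
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
}

Output
0
False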

Here, the variables i and j are not initialized. Thus, these fields do not get initialized
in the constructor of class yyy. Before any code in class yyy gets called, the memory
of the object is zeroed by the runtime, so these variables are assigned their default
values, which depend upon their data type. In this case, the int is 0 and the bool is
False.
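
A short sketch makes the default values visible. The class names Yyy and DefaultsDemo, and the extra string field s, are our own hypothetical additions.

```csharp
using System;

class Yyy
{
    public int i;      // never assigned: stays 0
    public bool j;     // never assigned: stays false
    public string s;   // never assigned: stays null
}

static class DefaultsDemo
{
    // Describes the field values of a freshly created object.
    public static string Describe()
    {
        Yyy a = new Yyy();
        return a.i + "," + a.j + "," + (a.s == null);
    }

    static void Main()
    {
        Console.WriteLine(Describe());   // 0,False,True
    }
}
```

A reference type field such as the string receives null, which is why the last comparison is True.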

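a.cs
class zzz
{
public static void Main()
{
int i = 10;
string j;
j = i >= 20 ? "hi" : "bye";
System.Console.WriteLine(j);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,class System.String V_1)
ldc.i4.s 10
stloc.0
ldloc.0
ldc.i4.s 20
bge.s IL_000f
ldstr "bye"
br.s IL_0014
IL_000f: ldstr "hi"
IL_0014: stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
bye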

The ternary operator is a glorified if statement compressed into a single line. The
variables i and j in C# become V_0 and V_1 on conversion to IL. We first initialize
variable V_0 to 10 and then place the condition value 20 on the stack.

The instruction bge.s is based on the instructions clt and brfalse.

If the condition is TRUE, bge.s executes a jump to the label IL_000f, where the
string "hi" is loaded.
If the condition is FALSE, the program falls through, loads the string "bye" and
branches to the label IL_0014.

Then, the program proceeds to the WriteLine function and prints the appropriate
text.

From the resultant IL code, there is no way of deciphering whether the original C#
code had used an if statement or a ?: operator. A large number of operators in C#,
such as the ternary operator, have been borrowed from the C programming
language.

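a.cs
class zzz
{
public static void Main()
{
int i = 1, j= 2;
if ( i >= 4 & j > 1)
System.Console.WriteLine("& true");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,int32 V_1)
ldc.i4.1
stloc.0
ldc.i4.2
stloc.1
ldloc.0
ldc.i4.4
clt
ldc.i4.0
ceq
ldloc.1
ldc.i4.1
cgt
and
brfalse.s IL_001c
ldstr "& true"
call void [mscorlib]System.Console::WriteLine(class System.String)
IL_001c: ret
}
}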

The & operator in C# makes the if statement more complex. It only returns TRUE if
both the conditions are TRUE. Otherwise, it returns FALSE. There is no single
instruction in IL for this condition. Thus, it is implemented in a roundabout way as
follows:

First, we use the ldc instruction to place a constant value on the stack.
Next, the instruction stloc initializes the variables i and j, i.e. V_0 and V_1.
Then, the value of V_0 is placed on the stack.
Thereafter, the condition value 4 is placed on the stack.
Then, the instruction clt is used to check if the first item on the stack is less
than the second. If it is, as is the case in the above example, then the value 1
(TRUE) is put on the stack.
The original expression in C# is i >= 4. In IL, a check for <, i.e. clt, is made.
Then we place zero on the stack and check for equality using ceq. Since 1 is not
equal to 0, this results in a FALSE.
Then we follow the same rules for j > 1. Here, we use cgt instead of clt. The
result of the cgt instruction is TRUE.
This result of TRUE is ANDed with the previous result of FALSE to finally
give a FALSE value.

Note that the and instruction will return a 1 if, and only if, both the conditions are
TRUE. In all other cases, it will return FALSE.

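a.cs
class zzz
{
public static void Main()
{
int i = 1, j= 2;
if ( i >= 4 && j > 1)
System.Console.WriteLine("&& true");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,int32 V_1)
ldc.i4.1
stloc.0
ldc.i4.2
stloc.1
ldloc.0
ldc.i4.4
blt.s IL_0016
ldloc.1
ldc.i4.1
ble.s IL_0016
ldstr "&& true"
call void [mscorlib]System.Console::WriteLine(class System.String)
IL_0016: ret
}
}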

Operators like the && operator are called short circuit operators as they execute the
second condition only if the first condition is true. We have repeated the same
example as earlier, but now the condition is checked by the instruction blt.s, a
combination of the clt and brtrue instructions.

If the condition is FALSE, a jump is made to the ret instruction at label IL_0016.
Only if the condition is TRUE do we proceed further and check the second condition.
For this, we use the instruction ble.s, which is a combination of cgt and brfalse. If the
second condition is FALSE, we jump to the ret instruction as before, and for TRUE we
execute the WriteLine function.

The && operator executes faster than the & because it only proceeds further if the
first condition results in TRUE. In doing so, the output of the first expression
affects the final outcome.

The | and || operators also behave in a similar manner.
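
The difference between & and && can be measured by counting how often the right hand operand runs. This is a sketch with hypothetical names of our own (ShortCircuitDemo, Second, CountWithAnd, CountWithAndAlso).

```csharp
using System;

static class ShortCircuitDemo
{
    static int calls;

    static bool Second()
    {
        calls++;          // counts how often the right operand runs
        return true;
    }

    // & always evaluates both operands.
    public static int CountWithAnd(bool first)
    {
        calls = 0;
        bool r = first & Second();
        return calls;
    }

    // && skips the right operand when the first one is false.
    public static int CountWithAndAlso(bool first)
    {
        calls = 0;
        bool r = first && Second();
        return calls;
    }

    static void Main()
    {
        Console.WriteLine(CountWithAnd(false));      // 1
        Console.WriteLine(CountWithAndAlso(false));  // 0
        Console.WriteLine(CountWithAndAlso(true));   // 1
    }
}
```

With & the second function is always called; with && it is called only when the first operand is true, which corresponds to the early branch to ret in the IL.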

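a.cs
class zzz {
public static void Main()
{
bool x,y;
x = true;
y = false;
System.Console.WriteLine( x ^ y);
x = false;
System.Console.WriteLine( x ^ y); }
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (bool V_0,bool V_1)
ldc.i4.1
stloc.0
ldc.i4.0
stloc.1
ldloc.0
ldloc.1
xor
call void [mscorlib]System.Console::WriteLine(bool)
ldc.i4.0
stloc.0
ldloc.0
ldloc.1
xor
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
}

Output
True
False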

The ^ sign is called an XOR operator. The XOR is like an OR statement, but there is
a difference: an OR returns TRUE if any of its operands is TRUE, but an XOR will
return TRUE if and only if one of its operands is TRUE and the other one is FALSE.
If both operands are TRUE, it will return FALSE. xor is an IL instruction.

The != operator gets converted into the normal set of IL instructions i.e. a
comparison is done and the program branches accordingly.

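a.cs
class zzz
{
public static void Main()
{
bool x = true;
System.Console.WriteLine(!x);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (bool V_0)
ldc.i4.1
stloc.0
ldloc.0
ldc.i4.0
ceq
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
}

Output
False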

The ! operator in C# converts a TRUE to a FALSE and vice versa. In IL, the
instruction used is ceq. This instruction checks the top two values on the
stack. If they are the same, it returns TRUE, otherwise it returns FALSE.

Since the variable x is TRUE, it gets initialized to 1. It is thereafter checked for
equality with the value 0. As they are not equal, the final result is 0 or FALSE. This
result is put on the stack. The same logic applies had x been FALSE. 0 would have
been put on the stack and checked for equality with the other 0. Since they match,
the final answer would be TRUE.
-5-

Operator Overloading

Every operator overload that we use in C# gets converted to a function call in IL.
The overloaded > operator translates into the function op_GreaterThan and a + gets
converted to op_Addition etc. In the first program of this chapter, we have
overloaded the + operator in class yyy to facilitate adding of two yyy objects.

a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy(10);
yyy b = new yyy(1);
yyy c;
c = a + b ;
System.Console.WriteLine(c.i);
}
}
public class yyy
{
public int i;
public yyy( int j)
{
i = j;
}
public static yyy operator + ( yyy x , yyy y) {
System.Console.WriteLine(x.i);
yyy z = new yyy(12);
return z;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class yyy V_1,class yyy V_2)
ldc.i4.s 10
newobj instance void yyy::.ctor(int32)
stloc.0
ldc.i4.1
newobj instance void yyy::.ctor(int32)
stloc.1
ldloc.0
ldloc.1
call class yyy yyy::op_Addition(class yyy,class yyy)
stloc.2
ldloc.2
ldfld int32 yyy::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.field public int32 i
.method public hidebysig specialname static class yyy op_Addition(class yyy x,class yyy y) il managed
{
.locals (class yyy V_0,class yyy V_1)
ldarg.0
ldfld int32 yyy::i
call void [mscorlib]System.Console::WriteLine(int32)
ldc.i4.s 12
newobj instance void yyy::.ctor(int32)
stloc.0
ldloc.0
stloc.1
ldloc.1
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor(int32 j) il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldarg.0
ldarg.1
stfld int32 yyy::i
ret
}
}

Output
10
12

While using the plus (+) operator on the two yyy objects, C# is aware that IL does
not support operator overloading. Therefore, it creates a function called op_Addition
in the class yyy.

Thus, operator overloading gets represented as a mere function call. The rest of the
code is easy for you to figure out.

In IL, there is no rule stating that if the > operator is overloaded, then the <
operator also has to be overloaded. These rules are imposed by the C# compiler, and
not by IL, since IL does not support the concept of overloading at all.
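
Since an overloaded operator is just a static function with a special name, it can be looked up and called by that name. The following sketch uses our own hypothetical classes Money and OperatorDemo; Type.GetMethod, MethodBase.IsSpecialName and MethodBase.Invoke are standard reflection members.

```csharp
using System;
using System.Reflection;

public class Money
{
    public int Amount;
    public Money(int amount) { Amount = amount; }

    public static Money operator +(Money x, Money y)
    {
        return new Money(x.Amount + y.Amount);
    }
}

static class OperatorDemo
{
    // Finds the function the compiler generated for operator +.
    public static MethodInfo FindAddition()
    {
        return typeof(Money).GetMethod("op_Addition");
    }

    static void Main()
    {
        MethodInfo m = FindAddition();
        Console.WriteLine(m.Name);            // op_Addition
        Console.WriteLine(m.IsSpecialName);   // True
        Money sum = (Money)m.Invoke(
            null, new object[] { new Money(10), new Money(1) });
        Console.WriteLine(sum.Amount);        // 11
    }
}
```

Calling op_Addition through reflection gives the same result as writing a + b in C#.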

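a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy();
System.Console.WriteLine(a);
}
}
public class yyy
{
public static implicit operator string(yyy y)
{
System.Console.WriteLine("operator string");
return "yyy class " ;
}
public override string ToString()
{
System.Console.WriteLine("ToString");
return "mukhi";
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
ldloc.0
call class System.String yyy::op_Implicit(class yyy)
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig specialname static class System.String op_Implicit(class yyy y) il managed
{
.locals (class System.String V_0)
ldstr "operator string"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldstr "yyy class "
stloc.0
ldloc.0
ret
}
.method public hidebysig virtual instance class System.String ToString() il managed
{
.locals (class System.String V_0)
ldstr "ToString"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldstr "mukhi"
stloc.0
ldloc.0
ret
}
}

Output
operator string
yyy class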

The C# compiler is extremely intelligent. Whenever a yyy object has to be converted
to a string, it first checks for the presence of an operator called string in the class
yyy. If it exists, it calls that operator.

The name of the operator, string, is a predefined data type in C#. Hence, the operator
is converted into the function op_Implicit. This function takes a yyy object as a
parameter. It returns a string on the stack for the WriteLine function. The ToString
function is not called.

C# will generate an error if you alter even a single parameter of the operator string,
but such is not the case with IL, as it does not support operator overloading and
conversions.

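a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy();
System.Console.WriteLine(a);
}
}
public class yyy
{
public override string ToString()
{
System.Console.WriteLine("ToString yyy");
return "mukhi";
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(class System.Object)
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig virtual instance class System.String ToString() il managed
{
.locals (class System.String V_0)
ldstr "ToString yyy"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldstr "mukhi"
stloc.0
ldloc.0
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}

Output
ToString yyy
mukhi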

In the C# code above, we have dispensed with the operator string and instead, have
used the ToString function. As usual, we put the object a on the stack. In the IL
code given earlier, due to the presence of operator overloads in the C# code, the
function op_Implicit was called. In this case, since there are no operator overloads,
the object reference to object a is simply put on the stack. In class yyy, even though,
the function ToString is not explicitly called, the function does get executed.

Since the ToString is virtual in the class Object, at run time, the ToString function
is called from the class yyy, instead of being called from the class Object. This is
due to the concept of a vtable, where all virtual function addresses reside.

If the word virtual is removed from the function, the ToString function gets called
from the class Object instead of the class yyy.

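a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy();
string s;
s = (string)a;
System.Console.WriteLine(s);
}
}
public class yyy
{
public static explicit operator string(yyy y)
{
System.Console.WriteLine("operator string");
return "string yyy" ;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class System.String V_1)
newobj instance void yyy::.ctor()
stloc.0
ldloc.0
call class System.String yyy::op_Explicit(class yyy)
stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig specialname static class System.String op_Explicit(class yyy y) il managed
{
.locals (class System.String V_0)
ldstr "operator string"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldstr "string yyy"
stloc.0
ldloc.0
ret
}
}

Output
operator string
string yyy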

In the above code, we have cast a yyy object into a string using an explicit cast. IL
does not understand C# keywords like implicit or explicit. It converts the cast to an
actual function such as op_Explicit or op_Implicit. Thus writing a C# compiler
requires a lot of grey matter.

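a.cs
public class zzz
{
public static void Main()
{
yyy a ;
a = 10;
}
}
public class yyy
{
static public implicit operator yyy(int v)
{
System.Console.WriteLine(v);
yyy z = new yyy();
return z;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
ldc.i4.s 10
call class yyy yyy::op_Implicit(int32)
stloc.0
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig specialname static class yyy op_Implicit(int32 v) il managed
{
.locals (class yyy V_0,class yyy V_1)
ldarg.0
call void [mscorlib]System.Console::WriteLine(int32)
newobj instance void yyy::.ctor()
stloc.0
ldloc.0
stloc.1
ldloc.1
ret
}
}

Output
10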

In the code above, we are not creating an object that is an instance of class yyy.
Instead, we are simply initializing it to a numeric value of 10. This results in a call
to the implicit operator yyy, which takes an int value as a parameter and creates a
yyy object.

The IL code does not understand any of this. It simply calls the relevant operator,
which in this case is op_Implicit, with an int value. It is the responsibility of this
function to create an object that is an instance of class yyy. We are, in effect,
creating two locals that look like yyy, and initializing them to the new yyy like object
on the stack. Finally, this object is put on the stack as the return value.

a.cs
class zzz
{
public static void Main()
{
yyy a = new yyy();
yyy b = new yyy();
System.Console.WriteLine( a && b);
System.Console.WriteLine( a & b);
}
}
class yyy
{
public static yyy operator & (yyy x,yyy y)
{
System.Console.WriteLine("op &" );
return new yyy();
}
public static bool operator true(yyy x)
{
System.Console.WriteLine("true ");
return true;
}
public static bool operator false(yyy x)
{
System.Console.WriteLine("false " );
return true;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class yyy V_1)
newobj instance void yyy::.ctor()
stloc.0
newobj instance void yyy::.ctor()
stloc.1
ldloc.0
dup
call bool yyy::op_False(class yyy)
brtrue.s IL_001b
ldloc.1
call class yyy yyy::op_BitwiseAnd(class yyy,class yyy)
IL_001b: call void [mscorlib]System.Console::WriteLine(class System.Object)
ldloc.0
ldloc.1
call class yyy yyy::op_BitwiseAnd(class yyy,class yyy)
call void [mscorlib]System.Console::WriteLine(class System.Object)
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig specialname static class yyy op_BitwiseAnd(class yyy x,class yyy y) il managed
{
.locals (class yyy V_0)
ldstr "op &"
call void [mscorlib]System.Console::WriteLine(class System.String)
newobj instance void yyy::.ctor()
stloc.0
ldloc.0
ret
}
.method public hidebysig specialname static bool op_True(class yyy x) il managed
{
.locals (bool V_0)
ldstr "true "
call void [mscorlib]System.Console::WriteLine(class System.String)
ldc.i4.1
stloc.0
ldloc.0
ret
}
.method public hidebysig specialname static bool op_False(class yyy x) il managed
{
.locals (bool V_0)
ldstr "false "
call void [mscorlib]System.Console::WriteLine(class System.String)
ldc.i4.1
stloc.0
ldloc.0
ret
}
}

Output
false
System.Object
op &
System.Object

In the above code, we have created two objects, a and b, that are instances of a
class yyy. Then, we have employed the overloaded operators & and && to determine
how IL handles them internally. If we can grasp the intricacies of IL, our
understanding of C# will become so much better. Maybe, a programmer should be
allowed to program in C# only if he/she has learnt IL.

The dup instruction duplicates the value present at the top of the stack. In this case,
it is the local V_0. All occurrences of && and & in the C# code are replaced by the
functions op_False and op_BitwiseAnd respectively, on conversion to IL code.

The op_False operator returns either TRUE or FALSE.

If it returns TRUE, then the first operand itself is the answer, and the rest of the
condition is not checked. This is how the code is short-circuited. We simply jump
past code that is not to be executed.
If it returns FALSE, the & operator gets called. This operator gets converted
to op_BitwiseAnd. In order to enhance the efficiency, the two objects are
already present on the stack for the op_BitwiseAnd operator to act upon.

You will appreciate that IL makes the abstract concepts of C# much easier to
understand.
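
The sequence of operator calls can be traced from C# without reading the IL. This is a sketch with our own hypothetical names (Flag, UserAndDemo, Trace); as in the example above, operator false deliberately returns true.

```csharp
using System;
using System.Collections.Generic;

class Flag
{
    public static readonly List<string> Log = new List<string>();

    public static Flag operator &(Flag x, Flag y) { Log.Add("op &"); return new Flag(); }
    public static bool operator true(Flag x) { Log.Add("true"); return true; }
    public static bool operator false(Flag x) { Log.Add("false"); return true; }
}

static class UserAndDemo
{
    // Returns the operator functions called for a && b or a & b.
    public static string Trace(bool shortCircuit)
    {
        Flag.Log.Clear();
        Flag a = new Flag(), b = new Flag();
        Flag r = shortCircuit ? (a && b) : (a & b);
        return string.Join(",", Flag.Log);
    }

    static void Main()
    {
        Console.WriteLine(Trace(true));    // false
        Console.WriteLine(Trace(false));   // op &
    }
}
```

For a && b only op_False runs, because it reports that a is false and op_BitwiseAnd is skipped; for a & b only op_BitwiseAnd runs.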

a.cs
class zzz
{
public static void Main()
{
System.Type m;
m = typeof(int);
System.Console.WriteLine(m.FullName);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Type V_0)
ldtoken [mscorlib]System.Int32
call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(value class [mscorlib]System.RuntimeTypeHandle)
stloc.0
ldloc.0
callvirt instance class System.String [mscorlib]System.Type::get_FullName()
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
System.Int32

In IL, the object m is a local named V_0 of type System.Type. In C#, the typeof
keyword returns a Type object, but in IL, a large number of steps have to be
executed to achieve the same result.

Firstly, a type is placed on the stack using the instruction ldtoken. This
loads a token that represents a type or a field or a method.
Next, the function GetTypeFromHandle is called that picks up a token, i.e. a
structure or value class from the stack.
The function thereafter returns a Type object representing a type, which in
this case is an int. This is stored in the local V_0 and then again loaded on the
stack.
Next, the function get_FullName is called. The function is not called
FullName but get_FullName as it is a property. This property returns a string
on the stack that is displayed using the WriteLine function.
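
The same steps can be written out by hand in C#. This is a sketch; the class TypeofDemo and its method are hypothetical names of our own, while Type.TypeHandle and Type.GetTypeFromHandle are the standard members that correspond to the token and the call seen in the IL.

```csharp
using System;

static class TypeofDemo
{
    // Performs by hand what typeof(int) does in IL.
    public static string NameViaHandle()
    {
        RuntimeTypeHandle handle = typeof(int).TypeHandle;  // the value ldtoken loads
        Type t = Type.GetTypeFromHandle(handle);            // handle back to a Type
        return t.FullName;                                  // calls get_FullName
    }

    static void Main()
    {
        Console.WriteLine(NameViaHandle());   // System.Int32
    }
}
```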

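a.cs
class zzz
{
public static void Main()
{
zzz z = new zzz();
z.abc(z);
object o = new object();
z.abc(o);
}
void abc(object a)
{
if ( a is zzz)
System.Console.WriteLine("zzz");
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0,class System.Object V_1)
newobj instance void zzz::.ctor()
stloc.0
ldloc.0
ldloc.0
call instance void zzz::abc(class System.Object)
newobj instance void [mscorlib]System.Object::.ctor()
stloc.1
ldloc.0
ldloc.1
call instance void zzz::abc(class System.Object)
ret
}
.method private hidebysig instance void abc(class System.Object a) il managed
{
ldarg.1
isinst zzz
brfalse.s IL_0012
ldstr "zzz"
call void [mscorlib]System.Console::WriteLine(class System.String)
IL_0012: ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}

Output
zzz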

The keyword is lets us determine the data type of an object at run time. The is
keyword of C# has an equivalent instruction in IL, isinst. We are passing a zzz like
object and an object that is an instance of class object to the function abc. This
function demotes every parameter it receives to a class object, but the is keyword is
intelligent enough to know that the run time data type can be of a type other than
an object. Thus, it returns TRUE for the z object, but not for the o object.

The assembler code in Main or vijay remains the same. The relevant code is
present in the function abc.
The instruction ldarg.1 pushes the value of parameter 1 onto the stack. The
data type of this parameter is Object.
Next, the instruction isinst is called. The type with which we want to
compare the object on the stack is passed as a parameter to isinst. This
instruction determines the data type of the value present on the stack.
If the type given to the isinst instruction matches that of the object already on
the stack, the object remains on the stack. If it does not match, a NULL is placed
on the stack.
The brfalse instruction executes the jump to the label if the result is NULL, i.e.
FALSE, thereby skipping the call to the WriteLine function.

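a.cs
class zzz {
public static void Main()
{
abc(100);
abc("hi");
}
static void abc( object a) {
string s;
s = a as string;
System.Console.WriteLine(s);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldc.i4.s 100
stloc.0
ldloca.s V_0
box [mscorlib]System.Int32
call void zzz::abc(class System.Object)
ldstr "hi"
call void zzz::abc(class System.Object)
ret
}
.method private hidebysig static void abc(class System.Object a) il managed
{
.locals (class System.String V_0)
ldarg.0
isinst [mscorlib]System.String
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output

hi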

The keyword as is similar to is. The function abc is called twice, each time with
one value placed on the stack. This function requires an object on the stack, so in
the first call the value has to be converted from an int to an Object. The isinst
instruction takes the value at the top of the stack and checks it against the data
type specified. If the value is not of that type, it puts a NULL on the stack.

In the first call, since an int32 value cannot be converted into a string, a NULL
value is placed on the stack. Hence the WriteLine function displays a blank line.
In the second call, a string is obtained for the WriteLine function, which displays hi.
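
A minimal C# sketch of the same idea follows. The names AsDemo and Show are ours, and we print the text (null) instead of a blank line so that the NULL result is easier to see.

```csharp
using System;

class AsDemo
{
    static void Show(object a)
    {
        // 'as' yields the same reference if 'a' is a string, otherwise null.
        string s = a as string;
        Console.WriteLine(s == null ? "(null)" : s);
    }

    static void Main()
    {
        Show(100);  // (null), a boxed int is not a string
        Show("hi"); // hi
    }
}
```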

Unsafe Code

a.cs
class zzz
{
unsafe public static void Main()
{
System.Console.WriteLine(sizeof(byte *));
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
sizeof unsigned int8*
ret
}
}

Output
4

On a 32-bit platform, all pointers in C# have a size of 4 bytes each. The sizeof
keyword is an instruction in IL that returns the size of the type that is passed as
a parameter to it. It can only be used on a value type, not on a reference type.

In C# we use the modifier unsafe while introducing pointers. This modifier does not
exist in IL, as IL regards everything as unsafe. Note that a byte in C# is converted
into an unsigned int8 in IL.

a.cs
class zzz
{
{
zzz a = new zzz();
a.abc();
}
unsafe public void abc()
{
int *i;
int j=1;
i = &j;
System.Console.WriteLine((int)i);
*i = 10;
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
ldloc.0
ret
}
.method public hidebysig instance void abc() il managed {
.locals (int32* V_0,int32 V_1)
ldc.i4.1
stloc.1
ldloca.s V_1
stloc.0
ldloc.0
conv.i4
ldloc.0
ldc.i4.s 10
stind.i4
ldloc.1
ret
}
}

Output
6552336
10

In the above program, the main function calls a function called abc. That part of
the code has already been explained previously. The remaining part of the code is
explained in the next few lines in bullet form.

In C#, whenever we want to obtain the address of a variable, we have to precede the
name of the variable with the symbol &. The & places the address of a variable on
the stack. IL interprets a pointer as a data type.

We start by creating a pointer to an int, i, in C#. In IL, V_0 is interpreted as a
pointer due to the * sign that follows its type int32.
Next, we initialize the variable j or V_1 to the value 1.
The instruction ldloca.s places the address of j or V_1 on the stack.
The instruction stloc.0 initializes V_0 to this value i.e. the address of j or
V_1.
The instruction ldloc.0 then places the value of the pointer on the stack,
conv.i4 converts it to an int, and the WriteLine function is called with an int
as a parameter.
We then place the value of the pointer that is pointing to int j in memory, on
the stack.
Next, we place the number 10 on the stack.
The instruction stind.i4 stores the value at the top of the stack i.e. 10 into the
memory location placed earlier on the stack. Thus, we have utilised stind to fill
up a certain memory location with a specific value. This memory location is the
address of the variable j.
The WriteLine function is finally called to display the new value of the
variable j.

a.cs
class zzz
{
{
zzz a = new zzz();
a.abc();
}
{
int *i;
int j=1;
i = &j;
i++;
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
ldloc.0
ret
}
{
.locals (int32* V_0,int32 V_1)
ldc.i4.1
stloc.1
ldloca.s V_1
stloc.0
ldloc.0
conv.i4
ldloc.0
ldc.i4.4
add
stloc.0
ldloc.0
conv.i4
ret
}
}

Output
6552336
6552340

The above program is presented to demonstrate that the C# compiler understands
pointer arithmetic, whereas IL does not.

The crucial line in the above code is the one that contains the code ldc.i4.4. The C#
compiler calculates that the int to which the pointer points has a size of 4 and
therefore, it puts this instruction in the IL code to facilitate pointer arithmetic.

Had we replaced the int by a short, the C# compiler would have replaced the ldc
instruction with the code ldc.i4.2 because, it is aware that the size of a
short is 2. Thus, we can safely conclude that it is the C# compiler that understands
pointer arithmetic and not IL.
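
The scaling can be observed from C# itself. The following is a minimal sketch with names of our own choosing (PointerStep, before, beforeShort); it assumes the compiler is invoked with the /unsafe option.

```csharp
using System;

class PointerStep
{
    unsafe static void Main()
    {
        int j = 1;
        int* i = &j;
        long before = (long)i;
        i++;                                      // compiler adds sizeof(int)
        Console.WriteLine((long)i - before);      // 4

        short s = 1;
        short* p = &s;
        long beforeShort = (long)p;
        p++;                                      // compiler adds sizeof(short)
        Console.WriteLine((long)p - beforeShort); // 2
    }
}
```

The pointers are only compared after the increment and never dereferenced, so stepping past the variables is harmless here.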

a.cs
class zzz
{
public static unsafe void Main()
{
int* i = stackalloc int[100];
}
}
a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32* V_0)
ldc.i4.4
ldc.i4.s 100
mul
localloc
stloc.0
ret
}
}

In C#, the stackalloc keyword allocates a certain amount of memory on the stack
whereas, new allocates memory on the heap. Heap memory is longer lasting than
stack memory, which is released when the function returns.

The equivalent of this keyword in the IL instruction set is localloc. The value at
the top of the stack specifies the amount of memory, in bytes, to be allocated. In
the C# program, we have specified that we want to allocate memory for 100 ints.
Since each int requires 4 bytes of memory, in IL, the numbers 4 and 100 are put on
the stack and they are multiplied using the mul instruction. Thus, a total of 400
bytes of memory are finally allocated.
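
The arithmetic can be checked with a short sketch. It assumes a much newer compiler than the one used in this book (C# 7.2 or later with the System.Span type), which lets stackalloc be used without the unsafe modifier; the names StackAllocDemo and buffer are ours.

```csharp
using System;

class StackAllocDemo
{
    static void Main()
    {
        // 100 ints on the stack; the compiler still emits localloc for this.
        Span<int> buffer = stackalloc int[100];
        buffer[0] = 42;

        // Bytes requested from localloc: element count times element size.
        Console.WriteLine(buffer.Length * sizeof(int)); // 400
    }
}
```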

a.il
.assembly mukhi {}
{
.method public hidebysig static void vijay(class System.String[] a, int32 i ) il managed
{
.entrypoint
ldarg.0
ldlen
conv.i4
ret
}
}

Output
Exception occurred: System.MethodAccessException: The signature for the
entry point has too many arguments.

The assembler does not check the signature for the entrypoint function. But at run
time, the signature is checked to confirm whether it has only one parameter or not.
Since there are two parameters in the entrypoint function, the run time exception
has been generated. If there had been a single int parameter, no exception would
have occurred at run-time.

The directive entrypoint cannot be present in more than one function, even if they
are in separate classes. This is already illustrated in Chapter 1.

Enums

a.cs
class zzz
{
{
System.Console.WriteLine(yyy.black);
}
}
enum yyy
{
a1,black,hell
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (value class yyy V_0)
ldc.i4.1
stloc.0
ldloca.s V_0
box yyy
ret
}
}
.class value private auto ansi serializable sealed yyy extends [mscorlib]System.Enum
{
.field public specialname rtspecialname int32 value__
.field public static literal value class yyy a1 = int32(0x00000000)
.field public static literal value class yyy black = int32(0x00000001)
.field public static literal value class yyy hell = int32(0x00000002)
}

Output
1

An enum is implemented as a class that is serializable. This means that the CLR
can write it to a disk or send it over a network. It extends the class System.Enum.

In the C# program, an enum with three members is created. On conversion to IL, three
corresponding literal fields with the same names are created. The values of the
enum members are calculated at compile time. There is a special variable
introduced called value__.

Also, in the function vijay, the value of enum 'black' is being displayed. Observe
carefully, there is no mention of 'black' in the generated IL code.

IL handles this situation in the following chronological steps:

First, it puts the number 1 on the stack.
Then, it stores this value 1 in the yyy value class or structure V_0, using the
instruction stloc.0.
Next, it uses ldloca.s to place the address of the variable V_0 on the stack.
Thereafter, it uses box to convert it into an object.
Finally, this object is handed over to the WriteLine function.

Thus, it may be appreciated that IL discards all the enum names and only deals
with the values. However, we cannot get rid of the special variable value__ because
its omission will result in an error at run time.
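
A small C# sketch makes the relationship between the names and the values visible. The class name EnumDemo is ours; the enum is the one from the example above.

```csharp
using System;

enum yyy { a1, black, hell }

class EnumDemo
{
    static void Main()
    {
        // The cast is resolved at compile time to the constant 1 (ldc.i4.1).
        Console.WriteLine((int)yyy.black);                       // 1

        // The type of the hidden value__ field.
        Console.WriteLine(Enum.GetUnderlyingType(typeof(yyy)));  // System.Int32
    }
}
```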

a.cs
public enum aa : byte
{
a1,a2,a3
}
class zzz
{
{
System.Console.WriteLine(10 + aa.a2);
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (value class aa V_0)
ldc.i4.s 11
stloc.0
ldloca.s V_0
box aa
ret
}
}
.class value public auto ansi serializable sealed aa extends [mscorlib]System.Enum
{
.field public specialname rtspecialname unsigned int8 value__
.field public static literal value class aa a1 = int8(0x00)
.field public static literal value class aa a2 = int8(0x01)
.field public static literal value class aa a3 = int8(0x02)
}

Output
11

It can be seen from the code above that in the IL file, the expression 10 + aa.a2 is
conspicuous by its absence. On generation of the IL code, the expression gets
converted to its actual value i.e. 11.

After examining the above code, we can rest assured that the names of enum members,
like other artefacts mentioned earlier, exist only in the realm of C#. The IL code
that uses them works directly with their values.

a.cs
class zzz
{
System.Console.WriteLine(yyy.a1 == yyy.a2);
}
}
enum yyy
{
a1 = 1,a2 = 4
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.0
ret
}
}
.class value private auto ansi serializable sealed yyy extends [mscorlib]System.Enum
{
.field public specialname rtspecialname int32 value__
.field public static literal value class yyy a1 = int32(0x00000001)
.field public static literal value class yyy a2 = int32(0x00000004)
}

Output
False

When we compare two enum members using the comparison operator ==, the compiler
evaluates the comparison itself and replaces it with the value FALSE at compile
time. Therefore, the IL code that is generated is vastly at variance with the
original C# code.

Switch

a.cs
public class zzz
{
{
zzz a = new zzz();
a.abc(1);
a.abc(10);
}
void abc(int i)
{
switch (i)
{
case 0:
System.Console.WriteLine("zero");
break;
case 1:
System.Console.WriteLine("one");
break;
default:
System.Console.WriteLine("end");
}
}
}

a.il
.assembly mukhi {}
{
.entrypoint
stloc.0
ldloc.0
ldc.i4.1
ldloc.0
ldc.i4.s 10
ret
}
{
.locals (int32 V_0)
ldarg.1
stloc.0
ldloc.0
switch ( IL_0012,IL_001e)
br.s IL_002a
IL_0012: ldstr "zero"
br.s IL_0034
IL_001e: ldstr "one"
br.s IL_0034
IL_002a: ldstr "end"
IL_0034: ret
}
}

Output
one
end

The switch statement of C# is converted to the switch instruction in IL. This
instruction checks the value at the top of the stack and accordingly branches to the
relevant label.
If the value is 0, it branches to the label IL_0012.
If the value is 1, it branches to the label IL_001e and so on.
If none of the cases match, the default clause will apply. In this case, the br.s
IL_002a instruction is executed.

a.cs
public class zzz
{
{
zzz a = new zzz();
a.abc(0);
a.abc(10);
}
void abc(int i)
{
switch (i)
{
case 0:
System.Console.WriteLine("zero");
break;
case 5:
System.Console.WriteLine("one");
break;
default:
System.Console.WriteLine("end");
}
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
ldloc.0
ldc.i4.0
ldloc.0
ldc.i4.s 10
ret
}
{
.locals (int32 V_0)
ldarg.1
stloc.0
ldloc.0
ldc.i4.0
beq.s IL_000c
ldloc.0
ldc.i4.5
beq.s IL_0018
br.s IL_0024
IL_000c: ldstr "zero"
br.s IL_002e
IL_0018: ldstr "one"
br.s IL_002e
IL_0024: ldstr "end"
IL_002e: ret
}
}

Output
zero
end

In the previous example, we consciously used consecutive values such as 0, 1 and
so on. In this example, we have used discontinuous values like 0 and 5.

On conversion to IL code, we do not see the instruction switch, but instead, we see
a series of jumps. The instruction beq.s is based on ceq and brtrue.s.
We place the individual case values on the stack and use beq.s to check whether it
returns TRUE or FALSE.

If it is TRUE, we execute the relevant code and jump to the ret instruction.
If it is FALSE, the next case value on the stack is checked.
Finally, if none of the beq.s instructions result in TRUE, the default clause,
which is at the end of the switch construct, is executed.

Just as we do not have the equivalent of the if statement in IL, the switch
statement of C# does not always map to the switch instruction in IL. For
discontinuous case values, the compiler emits a series of comparisons and jumps
instead. The switch is more of a convenience to programmers of C#. The rule that a
case has to end with a break statement does not apply in IL.

Checked and Unchecked

a.cs
class zzz
{
int b = 1000000;
int c = 1000000;
{
zzz a = new zzz();
a.pqr(a.b,a.c);
a.xyz(a.b,a.c);
}
int pqr( int x, int y)
{
return unchecked(x*y);
}
int xyz( int x, int y)
{
return checked(x*y);
}
}

a.il
.assembly mukhi {}
{
.field private int32 b
.field private int32 c
{
.entrypoint
stloc.0
ldloc.0
ldloc.0
ldfld int32 zzz::b
ldloc.0
ldfld int32 zzz::c
call instance int32 zzz::pqr(int32,int32)
pop
ldloc.0
ldloc.0
ldfld int32 zzz::b
ldloc.0
ldfld int32 zzz::c
call instance int32 zzz::xyz(int32,int32)
pop
ret
}
.method private hidebysig instance int32 pqr(int32 x,int32 y) il managed
{
.locals (int32 V_0)
ldarg.1
ldarg.2
mul
stloc.0
br.s IL_0006
IL_0006: ldloc.0
ret
}
.method private hidebysig instance int32 xyz(int32 x,int32 y) il managed
{
.locals (int32 V_0)
ldarg.1
ldarg.2
mul.ovf
stloc.0
br.s IL_0006
IL_0006: ldloc.0
ret
}
{
ldarg.0
ldc.i4 0xf4240
stfld int32 zzz::b
ldarg.0
ldc.i4 0xf4240
stfld int32 zzz::c
ldarg.0
ret
}
}

Output
Exception occurred: System.OverflowException: An exception of type
System.OverflowException was thrown.
at zzz.vijay()

This program demonstrates the use of the checked and unchecked operators and
their implementation in IL.

The fields b and c are initialised to a decimal value of 1000000 or a hex value of
0xf4240 in the constructor. Then, in the function vijay, they are put on the stack,
and functions pqr and xyz are called. These functions return values that are not
subsequently used anywhere. Thus, the pop instruction is used to remove them from
the stack.

The function pqr does not achieve anything useful. The br.s instruction also does
not achieve anything of significance. This function uses the unchecked operator in
C#, which happens to be the default.

The function xyz only introduces a small variation: the mul instruction has been
replaced by the mul.ovf instruction. The term ovf is the short form for the word
overflow. In case an overflow occurs, the mul.ovf instruction will throw an
exception.

Thus, overflow handling is done internally by employing IL instructions. If IL was
unable to provide for handling overflows, the C# compiler would have had to
provide the code for generation of an exception.

In conclusion, whenever we use the checked operator, the compiler tells IL to use
the ovf family of instructions, so that the program can check for an overflow and
generate an exception.
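
The difference between the two operators can be seen in a short sketch. The names CheckedDemo, MulUnchecked and MulChecked are ours; we pass the values as parameters so that the compiler cannot fold the multiplication into a constant.

```csharp
using System;

class CheckedDemo
{
    static int MulUnchecked(int x, int y) { return unchecked(x * y); } // mul
    static int MulChecked(int x, int y)   { return checked(x * y); }   // mul.ovf

    static void Main()
    {
        int b = 1000000, c = 1000000;

        // The product does not fit in 32 bits; the high bits are silently dropped.
        Console.WriteLine(MulUnchecked(b, c)); // -727379968

        try
        {
            Console.WriteLine(MulChecked(b, c));
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow");     // overflow
        }
    }
}
```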

a.cs
class zzz
{
const int x = 1000000;
const int y = 1000000;
static int abc() {
return checked(x * y);
}
static int pqr() {
return unchecked(x * y);
}
static void Main()
{
int i ;
i = abc();
i = pqr();
}
}

a.il
.assembly mukhi {}
{
.field private static literal int32 x = int32(0x000F4240)
.field private static literal int32 y = int32(0x000F4240)
{
.entrypoint
.locals (int32 V_0)
stloc.0
ldloc.0
call int32 zzz::pqr()
stloc.0
ldloc.0
ret
}
{
.locals (int32 V_0)
ldc.i4 0xd4a51000
stloc.0
br.s IL_0008
IL_0008: ldloc.0
ret
}
.method private hidebysig static int32 pqr() il managed
{
.locals (int32 V_0)
ldc.i4 0xd4a51000
stloc.0
br.s IL_0008
IL_0008: ldloc.0
IL_0009: ret
}
}

Output
-727379968
-727379968

In the case of a constant, it does not matter whether a function uses the checked or
unchecked operators. This is because, constants are a compile time issue. They are
converted to actual constants by the compiler, as has oft been repeated.

The compiler actually multiplies the constants x and y and replaces them with the
value of the resultant product. Thus, the mul instruction does not make an
appearance anywhere and there is no trace of the checked operator.

It can be appreciated that the treatment of constants is different in C# and IL. So,
given the IL code, it is very difficult to use reverse engineering to arrive back at the
original C# code.

Please note that the arithmetic instructions add, sub and mul in IL can be suffixed
with .ovf thereby ensuring that they check for overflow.

a.cs
class zzz {
int i,j = 8;
i = j >> 2;
i = j << 2;
}
}

a.il
.assembly mukhi {}
{
.entrypoint
ldc.i4.8
stloc.1
ldloc.1
ldc.i4.2
shr
stloc.0
ldloc.0
ldloc.1
ldc.i4.2
shl
stloc.0
ldloc.0
ret
}
}

Output
2
32

The bitwise left shift and right shift operators of C# are converted to the
instructions shl and shr respectively.

Every position that we shift to the right is equivalent to dividing
by 2.
Every position that we shift to the left is equivalent to
multiplying by 2.

These instructions generally execute faster than the division and multiplication
instructions.
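
A short sketch, with a class name of our own, confirms the two results above and also shows one caveat: for negative numbers, a right shift rounds towards negative infinity whereas division rounds towards zero, so the two are not always interchangeable.

```csharp
using System;

class ShiftDemo
{
    static void Main()
    {
        int j = 8;
        Console.WriteLine(j >> 2);  // 2, same as 8 / 4
        Console.WriteLine(j << 2);  // 32, same as 8 * 4

        Console.WriteLine(-7 >> 1); // -4
        Console.WriteLine(-7 / 2);  // -3
    }
}
```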

a.cs
class zzz
{
{
int i=2;
i = ++i/++i;
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32 V_0)
ldc.i4.2
stloc.0
ldloc.0
ldc.i4.1
add
dup
stloc.0
ldloc.0
ldc.i4.1
add
dup
stloc.0
div
stloc.0
ldloc.0
ret
}
}

Output
0
.75 ( when int is changed to float)

The C# compiler evaluates code as it sees it. It starts from left to right. It first
encounters ++i. The value of i is thus increased from 2 to 3.

The dup instruction of IL duplicates the value at the top of the stack. The stloc.0
assigns the number 3 to i. Then the number 1 is added to the variable i, making its
resultant value 4.

The div instruction now sees 3 and 4 on the stack and thus, divides 3 by 4. The
final answer is 0 or .75, depending upon the data type of i.

In programming languages like C, the result is not pre-determinable, but in C#, the
order of evaluation is very lucid and clear - it executes the code from left to right
using the principle of "first come first served".
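
Both results can be reproduced with the sketch below. The class name and the variables r, f and q are ours; the float is printed with the invariant culture so that the decimal separator does not depend on the machine settings.

```csharp
using System;
using System.Globalization;

class EvaluationOrder
{
    static void Main()
    {
        int i = 2;
        int r = ++i / ++i;     // left operand is 3, right operand is 4
        Console.WriteLine(r);  // 0, integer division of 3 by 4
        Console.WriteLine(i);  // 4

        float f = 2;
        float q = ++f / ++f;   // 3 divided by 4
        Console.WriteLine(q.ToString(CultureInfo.InvariantCulture)); // 0.75
    }
}
```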

a.cs
class zzz
{
{
zzz a = new zzz();
int i = 0;
a.abc(i++,i++,i++);
}
public void abc( int x, int y, int z)
{
System.Console.WriteLine(z);
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (class zzz V_0,int32 V_1)
stloc.0
ldc.i4.0
stloc.1
ldloc.0
ldloc.1
dup
ldc.i4.1
add
stloc.1
ldloc.1
dup
ldc.i4.1
add
stloc.1
ldloc.1
dup
ldc.i4.1
add
stloc.1
call instance void zzz::abc(int32,int32,int32)
ldloc.1
ret
}
.method public hidebysig instance void abc(int32 x,int32 y,int32 z) il managed
{
ldarg.3
ret
}
}

Output
2
3

The above example again demonstrates that the compiler is unambiguous about the
order of execution of code on a "first come first served" basis. It builds on the
earlier example.

The variable i is first placed on the stack and then incremented by one, making its
value 1, but the value 0 is placed on the stack. Thus x becomes zero. Thereafter, 1
is placed on the stack and i is again incremented by 1, making its value 2. The
value of the parameter y is 1. Finally, 2 is placed on the stack. Parameter z has the
value 2 and the value of the variable i now becomes 3.

The IL code is much easier to understand.

We have created a zzz like object as local V_0 and only one int32 representing the
variable i. The instruction ldc.i4.0 places the initial value of i on the stack. Then,
stloc.1 assigns the value 0 to i. When the function abc is called, the this pointer is
placed on the stack using ldloc.0.

Now the fun starts. The value of i, which is 0, is placed on the stack and duplicated
using the dup instruction. Thus, two zeroes are placed on the stack. Next, the
number 1 is placed on the stack and the add instruction adds this number to the 0
already on the stack, resulting in the sum of 1. The numbers 1 and 0, which were
present on the stack earlier, are removed.

We store this value in i using stloc.1, which leaves the original 0 on the stack as
the first parameter, and place the new value 1 on the stack using ldloc.1. We
again use dup to duplicate this value, put the number 1 on the stack and use the
add instruction to increment the duplicated value. The same steps are then
repeated for the third parameter.

By now, the value of i is 3 and the this pointer and the values 0, 1 and 2 are
present on the stack. Hence WriteLine shows 2 in abc.

All this IL code has been written by a compiler and not a human being. If you are
not clear about the above code, you can draw the stack diagrams.
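
The values that finally reach the function can be confirmed with a small sketch. The names ArgumentOrder and Show are ours; unlike abc above, Show displays all three parameters.

```csharp
using System;

class ArgumentOrder
{
    static void Show(int x, int y, int z)
    {
        Console.WriteLine(x + " " + y + " " + z);
    }

    static void Main()
    {
        int i = 0;
        Show(i++, i++, i++);   // 0 1 2, arguments are evaluated left to right
        Console.WriteLine(i);  // 3
    }
}
```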

a.cs
class zzz {
int i = 32768;
int j = ~i;
}
}

a.il
.assembly mukhi {}
{
.entrypoint
ldc.i4 0x8000
stloc.0
ldloc.0
not
stloc.1
ldloc.1
ret
}
}

Output
-32769

The bitwise operator ~ complements the bits, converting the 0s to 1s and 1s to 0s.
This operator has a very simple equivalent in IL, which is the not instruction.

a.cs
class zzz
{
{
int i;
i = 19;
System.Console.WriteLine(i%5);
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32 V_0)
ldc.i4.s 19
stloc.0
ldloc.0
ldc.i4.5
rem
ret
}
}

Output
4

The remainder operator % is converted to the rem instruction in IL. Thus, you must
have noticed that, all the basic operators of C# have simple equivalent IL
instructions.

a.cs
class zzz
{
{
int i = 21,j = 15, k = 11;
System.Console.WriteLine(i & j );
System.Console.WriteLine(i | k );
System.Console.WriteLine(i ^ k );
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32 V_0,int32 V_1,int32 V_2)
ldc.i4.s 21
stloc.0
ldc.i4.s 15
stloc.1
ldc.i4.s 11
stloc.2
ldloc.0
ldloc.1
and
ldloc.0
ldloc.2
or
ldloc.0
ldloc.2
xor
ret
}
}

Output
5
31
30

The bitwise anding, oring and xoring are also supported in IL by the equivalent
instructions and, or and xor. Thus, IL has most of the instructions present in a
conventional assembler.

In addition, it has a number of higher level constructs. However, there is no logical
ANDing and ORing in IL because, IL does not understand the logical values TRUE
and FALSE.
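
The operators of the last few programs can be verified together in one sketch. The class OperatorDemo and the helper Touch are ours. The last part illustrates the remark on logical ANDing: since the compiler turns && into a conditional jump, the right-hand side is never executed when the left-hand side is false.

```csharp
using System;

class OperatorDemo
{
    static bool Touched;

    static bool Touch()
    {
        Touched = true;
        return true;
    }

    static void Main()
    {
        int i = 21, j = 15, k = 11;
        Console.WriteLine(i & j);   // 5   (and)
        Console.WriteLine(i | k);   // 31  (or)
        Console.WriteLine(i ^ k);   // 30  (xor)
        Console.WriteLine(~32768);  // -32769 (not)
        Console.WriteLine(19 % 5);  // 4   (rem)

        bool both = (i < 0) && Touch(); // Touch is skipped by a branch
        Console.WriteLine(both);    // False
        Console.WriteLine(Touched); // False
    }
}
```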

-6-

Reference and Value Types

Interfaces

An interface is a reference type, in spite of the fact that it has no code at all.
We cannot instantiate an interface. We can use it as a construct for the creation of
new types. An interface defines a contract that is left to the class to implement.

An interface can have static fields. If an interface contains 10 abstract virtual
functions, then the class implementing that interface has to supply the code
for all 10 of them. Thus, if a class does not provide all the function
implementations, then we cannot use the class. In such a scenario, a class derived
from it must provide the implementation.

The interface keyword in C# becomes a class in IL with the interface attribute,
which the documentation describes as a semantic attribute.

a.il
.assembly mukhi {}
.class zzz
{
.method static void vijay()
{
.entrypoint
ret
}
}
.class interface yyy
{
}

We are not allowed to place any code in an interface. An interface consists only of
the function prototype, followed by a pair of curly braces {}.

a.il
.assembly mukhi {}
.class zzz
{
{
.entrypoint
ret
}
}
{
.method static void vijay1()
{
ret
}
}

Output
***** FAILURE *****

vijay1 is a function created in the interface yyy. As this is not permitted, the il
assembler has the domino effect as shown above.

a.il
.assembly mukhi {}
.class zzz
{
{
.entrypoint
ret
}
}
{
.field int32 i
}

Output
***** FAILURE *****

No variables can be placed inside an interface either. This rule is similar to that of
C#. Even though the documentation says that we can place static fields in an
interface, when we tried to do so, an error was generated.

a.il
.assembly mukhi {}
.class zzz
{
{
.entrypoint
ret
}
}
{
}

Output
Exception occurred: System.ExecutionEngineException: An exception of type
System.ExecutionEngineException was thrown.
at zzz.vijay()

If we create an object such as an interface using newobj, the assembler does not
generate any error, but the runtime throws an exception.

a.il
.assembly mukhi {}
.class zzz implements yyy
{
{
.entrypoint
ret
}
}
{
.method public hidebysig newslot virtual abstract instance void a1() il managed
{
}
}

Output
Exception occurred: System.TypeLoadException: Could not load class 'zzz'
because the method 'a1' is not defined.
Exception occurred: System.MissingMethodException: Could not find the entry
point.

The above program has only one function, a1 in the interface. This function has a
pair of curly braces {}, but as mentioned earlier, we are not allowed to place any
code within them.

The class zzz has been derived from yyy, using the keyword implements. The
assembler does not check whether all code of the interface is implemented as it is
done only at runtime, and hence, an exception is generated. Thus, we can see that
most IL errors occur at run time, and not at compile time.

a.cs
class zzz
{
{
}
}
interface ddd
{
void a1();
void a2();
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ret
}
}
.class interface private abstract auto ansi ddd
{
{
}
{
}
}

To reiterate what we have said earlier, an interface in C# becomes a class directive
in IL with the interface modifier added to it. The two functions a1 and a2 become
actual functions in the class ddd, and are marked as virtual, newslot as well as
abstract, i.e. having no implementation.

a.il
.assembly mukhi {}
{
{
.entrypoint
ret
}
}
{
{
ret
}
}

Error
***** FAILURE *****

We cannot place any code, including a ret, in a method that is marked as abstract.
This modifier signifies that the code for the function will be provided from some
other source. In spite of what the documentation says, a static constructor cannot
be placed in an interface.

a.cs
class zzz : ddd,eee {
{
}
}
interface ddd
{
}
interface eee
{
}

a.il
.assembly mukhi {}
{
}
.class interface private abstract auto ansi eee
{
}
.class private auto ansi zzz extends [mscorlib]System.Object implements ddd,eee
{
{
.entrypoint
ret
}
}

For the purposes of inheritance, C# does not differentiate between an interface and
a class. There is, however, a subtle difference between them in the sense that we can
derive from more than one interface, but not from more than one class.

In IL, there is a marked differentiation between an interface and a class. We extend
from a class and implement an interface. This is the same syntax that the Java
programming language uses.

When one compares C# with Java, their support for features such as these should
be highlighted.

a.cs
class zzz {
{
yyy a = new yyy();
ddd d = a;
d.a1();
a.a1();
}
}
interface ddd
{
void a1();
}
class yyy : ddd
{
public void a1()
{
System.Console.WriteLine("a1");
}
}
a.il
.assembly mukhi {}
.entrypoint
.locals (class yyy V_0,class ddd V_1)
stloc.0
ldloc.0
stloc.1
ldloc.1
callvirt instance void ddd::a1()
ldloc.0
call instance void yyy::a1()
ret
}
}
{
{
}
}
.class private auto ansi yyy extends [mscorlib]System.Object implements ddd
{
.method public hidebysig newslot final virtual instance void a1() il managed
{
ldstr "a1"
ret
}
{
ldarg.0
ret
}
}

Output
a1
a1

We created two locals that look like class yyy and interface ddd. Then, we created
an object that looks like yyy and initialized the variable V_0 to it.

The statement d = a is translated to loading the value of V_0 on the stack, and
using the instruction stloc.1 to initialize the variable V_1. Thereafter, the
function a1 of the interface ddd is called.

We loaded the variable value V_1 on the stack. Since it was called through the
interface, we used callvirt instead of call. If we had called it through the object of
type yyy, then we would have used call.

Thus, IL understands that a call through an interface object is to be treated in a
special manner. We can change the last occurrence of ldloc.1 to ldloc.0, since both
have the same values. A call to an interface is evaluated at run time, as the
assembler does not convert it into a class access. In the locals directive, the word
class, and not interface, is placed in front of ddd.

Thus, calling a function through an interface is equivalent to using callvirt since,
there is no code in the interface. A callvirt takes more time to execute than the
plain call instruction. However, callvirt introduces dynamism at runtime.
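
The two kinds of calls can be placed side by side in a small sketch. The class name InterfaceCall is ours; the interface and class are those of the example above. The last line confirms the remark that V_0 and V_1 hold the same value.

```csharp
using System;

interface ddd
{
    void a1();
}

class yyy : ddd
{
    public void a1()
    {
        Console.WriteLine("a1");
    }
}

class InterfaceCall
{
    static void Main()
    {
        yyy a = new yyy();
        ddd d = a;          // no new object, only a second reference
        d.a1();             // a1, resolved at run time through the interface
        a.a1();             // a1, called through the class
        Console.WriteLine(ReferenceEquals(a, d)); // True
    }
}
```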

a.cs
class zzz
{
{
ddd d = new yyy();
eee e = new yyy();
d.a1();
e.a1();
}
}
interface ddd
{
void a1();
}
interface eee
{
void a1();
}
class yyy : ddd , eee
{
void ddd.a1()
{
System.Console.WriteLine("ddd a1");
}
void eee.a1()
{
System.Console.WriteLine("eee a1");
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals ( class yyy v , class eee v1)
stloc.0
stloc.1
ldloc.0
ldloc.1
callvirt instance void eee::a1()
ret
}
}
{
{
}
}
.class interface private abstract auto ansi eee
{
{
}
}
.class private auto ansi yyy extends [mscorlib]System.Object implements ddd,eee
{
.method private hidebysig newslot final virtual instance void ddd.a1() il managed
{
.override ddd::a1
ldstr "ddd a1"
ret
}
.method private hidebysig newslot final virtual instance void eee.a1() il managed
{
.override eee::a1
ldstr "eee a1"
ret
}
{
ldarg.0
ret
}
}

Output
ddd a1
eee a1

We have created a class yyy that is derived from two interfaces, ddd and eee, that
have the same function a1.

Since we want a separate implementation for each, we have to preface each
occurrence of a1 with the name of the interface, i.e. either ddd or eee, in the class
yyy.
We have created the two objects that look like yyy and stored them in classes that
look like ddd and eee. Since the function is called from an interface pointer, we have
to use callvirt instead of call. In the IL code, the two interfaces are created as shown
earlier.

In class yyy, we implement from ddd and eee, but since the two functions cannot
have the same name, we have to preface the name of the function with the name of
the interface.

In each method, we have used a directive called .override. This directive clearly
specifies which function of which interface the method overrides.
Calling of an interface is a run time issue. The CLR does all the routine work.
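
A compact sketch of the same arrangement is given below, with a class name (ExplicitDemo) of our own. It uses a single object to stress that the interface through which we call, and not the object, decides which of the two functions runs.

```csharp
using System;

interface ddd { void a1(); }
interface eee { void a1(); }

class yyy : ddd, eee
{
    // Each of these becomes a private method carrying an .override directive in IL.
    void ddd.a1() { Console.WriteLine("ddd a1"); }
    void eee.a1() { Console.WriteLine("eee a1"); }
}

class ExplicitDemo
{
    static void Main()
    {
        yyy y = new yyy();
        ((ddd)y).a1();   // ddd a1
        ((eee)y).a1();   // eee a1
        // y.a1() would not compile: the explicit implementations are not public members of yyy.
    }
}
```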

a.il
.assembly mukhi {}
.entrypoint
.locals (class xxx V_0,class ddd V_1)
stloc.0
ldloc.0
stloc.1
ldloc.1
ldloc.0
call instance void xxx::a2()
ret
}
}
{
{
}
{
}
}
.class private auto abstract ansi yyy extends [mscorlib]System.Object implements ddd
{
{
ldstr "a1"
ret
}
.method public hidebysig newslot virtual instance void a2() il managed
{
ldstr "a2"
ret
}
{
ldarg.0
ret
}
}
.class xxx extends yyy
{
{
ldstr "a22"
ret
}
}

Output
a2
a22

The above program clears a large number of cobwebs. Let us start analysing this
program from the beginning.
We have a class interface ddd that has two functions a1 and a2. We then create a
class yyy, that implements from ddd and contains code for only one function a1.

This makes the class incomplete and hence, we tag it with the modifier abstract.
However, this modifier is optional. One more class xxx is created, that derives from
yyy and implements the second function a2. All goes well so far.

Then, using call, the function a2 of class xxx is called. However, when we call the
same function of the interface ddd using callvirt, the function of class yyy is
called and not that of xxx. This is so as the function in class xxx has nothing to
do with the one in class yyy.

Compare this example with the override modifier example shown earlier. If we
eliminate the code of function a2 from the class yyy, we get the following error:

Output
Exception occurred: System.TypeLoadException: Could not load class 'xxx'
because the method 'a2' is not defined.
at zzz.vijay()

The function a2 is present in the class xxx, but it has been eliminated from the
class yyy. However, the function needs to be present in both the classes.

The same rules of nesting apply to interfaces also. Nothing stops an interface from
implementing another interface, using the keyword implements. Here, the word
implements may be misleading.

The keyword implies that the class that implements this interface must provide the
code for it.

An interface has six restrictions:

i. All methods must be either virtual or static.
ii. The virtual methods must be abstract and public.
iii. No instance fields are allowed.
iv. An interface is abstract and cannot be instantiated.
v. An interface cannot inherit from a class.
vi. Under no circumstances can an interface contain code.

Structures

A structure handles memory more efficiently than a class. IL does not support a
struct type directly. As IL does not recognise a structure, it does not enforce the
following rules:

Constructors must have parameters.
All members of a structure must be initialised before leaving the constructor.

Also, structures are derived from ValueType and not Object.

The type system of the .NET world is simplicity personified. It divides all the known
types into one of the two categories: a value type or a reference type.

A reference type is known by a reference, that is, a memory location that
stores the address where the object resides in memory.
A value type, however, is directly stored in the memory location occupied by
the variable that represents the type.

Value types are used to represent small data items like local variables, integers,
numbers with decimal places etc. The memory allocated is on the stack and not on
the heap.

To access a reference type, the location of the variable in memory is to be first
determined. This is not true for a value type. Hence, there is no overhead of an
indirection involved with a value type and therefore, it is much more efficient.

The disadvantage of a value type is that it cannot be derived from and, if the
data it represents is fairly large, then copying the type on the stack is not an
efficient way of representing that type. There is no need to instantiate a variable of
value type, as it is already instantiated. Apart from these variations, value types are
similar to reference types.

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (value class xxx v)
ldloca v
initobj value class xxx
ldloca v
ldfld int32 xxx::i
call void System.Console::WriteLine(int32)
ret
}
}
.class value xxx
{
.field public int32 i
.field public int32 j
}

Output
0

In the above example, we have created a value class or a value type called xxx, that
has two public fields i and j. We used the instruction initobj to create a new value
type.

To display the value of i, we first created a local variable that represents our value
class. In our case, the variable is called v. ldloca is used to load the address of the
variable v on the stack. Then we called initobj with the name of the value class xxx
as a parameter, thus creating a new value type.

We then again load the address of the value type v on the stack and call ldfld. This
instruction needs the address of the value type on the stack to work with. The only
reason that the value of i is ZERO is that the instruction initobj guarantees that all
members of the value type will be initialized to zero.

a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (value class xxx v)
ldloca v
initobj value class xxx
ldloca v
ldc.i4.2
call instance void xxx::.ctor(int32)
ldloca v
ldfld int32 xxx::i
call void System.Console::WriteLine(int32)
ret
}
}
.class value xxx
{
.field public int32 i
.method public hidebysig specialname rtspecialname instance void .ctor(int32 p) il managed
{
ldarg.0
ldarg.1
stfld int32 xxx::i
ret
}
}

Output
2

The correct way to initialize a value class is to call the constructor. We first have to
load the address of the value type v on the stack. Then, since the constructor
expects a single parameter on the stack, we place the number 2 on the stack using
ldc. The constructor is then called in the same manner as we call any other
function.

In the constructor, we first place the this pointer on the stack. The this pointer, or
the first invisible parameter to a function, is a reference to the starting location of
the object in memory. Parameter 1 is then placed on the stack and stfld is called.

The constructor initializes all members of a value class. The static fields of a value
class are initialized when the value type is first loaded.

a.il
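.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (value class xxx v)
ldloca v
ldfld int32 xxx::i
call void System.Console::WriteLine(int32)
ret
}
}
.class value xxx
{
.field public int32 i
}

Output
51380288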

If initobj is not used, as in the above example, the value type is left holding a
random value. The use of initobj is optional. This instruction requires a managed
pointer to an instance of the value type and it is one of the few instructions that
does not return anything on the stack. The constructor is never called by the initobj
instruction. The sole role initobj performs is to initialize all the value class members
to ZERO.
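
The zeroing behaviour of initobj can also be observed from C#. The following is only
a minimal sketch: the struct Point and the class InitObjDemo are hypothetical names
of our own, and we assume (as C# compilers typically do) that assigning
default(Point) to a struct local is translated into ldloca followed by initobj. No
constructor runs, yet every field reads as zero.

```csharp
using System;

// Hypothetical value type, used only for this illustration.
struct Point
{
    public int i;
    public int j;
}

static class InitObjDemo
{
    // Returns a Point whose fields were zeroed without calling any constructor.
    public static Point MakeDefault()
    {
        Point p = default(Point); // typically compiled to: ldloca p / initobj Point
        return p;
    }

    static void Main()
    {
        Point p = MakeDefault();
        Console.WriteLine(p.i); // prints 0
        Console.WriteLine(p.j); // prints 0
    }
}
```

Unlike hand-written IL, C# does not let us read a struct local that was never
assigned, which is why the random value seen in the IL example cannot be reproduced
this way.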

While verifying code, one should ensure that all the fields of a value type are
assigned a value before they are read or passed as parameters to a method. The
code in the constructor assigns values to every field.

You can see the contrast between initobj and newobj. Value types use initobj
whereas reference types use newobj. Also, value types are derived from
System.ValueType.

Value types can have static, instance and virtual methods. Here, the static
methods are called in the same manner as in a class.

a.il
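.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (value class xxx v)
ldloca v
ldc.i4.3
call instance void xxx::a1(int32)
ldloca v
ldfld int32 xxx::i
call void System.Console::WriteLine(int32)
ret
}
}
.class value xxx
{
.field public int32 i
.method public instance void a1(int32 p) il managed
{
ldarg.0
ldarg.1
stfld int32 xxx::i
ret
}
}

Output
3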

To call an instance function of a value class, there is no need for either initobj or
the constructor call. But, it is a good practice to do so. We have to place the address
of the value type or the this pointer on the stack and then place the parameters.
The function a1 uses the this pointer to access the fields.

We modified the function a1 to read as follows:

.method public virtual instance void a1(int32 p) il managed

Despite making the function virtual, the program executes as before. The order of
the virtual modifier is very important.

You may recall that a virtual function has to be called using the instruction callvirt
and not the instruction call. However, in the case of a value class, we cannot use
callvirt. Instead, the instruction call is used.

a.il
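.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (value class xxx v)
ldloca v
box xxx
callvirt instance void ddd::a1()
ret
}
}
.class interface ddd
{
.method public virtual abstract instance void a1() il managed
{
}
}
.class value xxx implements ddd
{
.method public virtual instance void a1() il managed
{
ldstr "hi"
call void System.Console::WriteLine(class System.String)
ret
}
}

Output
hi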

In the above program, we specified an interface ddd, that contains a single function
called a1. We created a value class xxx that implements ddd.

Our intention is to call the function a1 from the interface ddd. As mentioned earlier,
to call a function of an interface, the instruction callvirt has to be used and not the
instruction call, as an interface does not contain any code.

The callvirt instruction requires a reference type on the stack because it does not
work with value types. Thus, we use ldloca to load the address of the value type on
the stack. Then, we use the box instruction to convert it into a reference type.

If we comment out the box instruction, the following exception is generated because
callvirt looks for a boxed type on the stack:

Output
Exception occurred: System.NullReferenceException: Attempted to
dereference a null object reference.
at zzz.vijay()

Boxing and Unboxing

a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy(10,20);
yyy b;
b = a;
System.Console.WriteLine( b.i );
}
}
class yyy
{
public int i,j;
public yyy(int x, int y)
{
System.Console.WriteLine("Const");
i=x;j=y;
}
}

a.il
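.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class yyy V_1)
ldc.i4.s 10
ldc.i4.s 20
newobj instance void yyy::.ctor(int32,int32)
stloc.0
ldloc.0
stloc.1
ldloc.1
ldfld int32 yyy::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.field public int32 i
.field public int32 j
.method public hidebysig specialname rtspecialname instance void .ctor(int32 x,int32 y) il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "Const"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldarg.0
ldarg.1
stfld int32 yyy::i
ldarg.0
ldarg.2
stfld int32 yyy::j
ret
}
}

Output
Const
10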

The constructor assigns values to fields. It places the values on the stack and uses
stfld to assign the values to the fields. The question that arises is: what happens
when we equate reference objects with each other?

The explanation is very simple: A reference object is simply a memory location
stored in a local variable. The variable V_0 contains a reference to the newly created
object in memory. We place this value on the stack and use stloc.1 to initialize the
variable V_1 to this value.

Thus, a reference object is a number representing the memory location of an
object. Here, the same number is stored in the objects a and b. Hence b.i displays
the number 10. Here, the constructor does not get called again, as no new object is
created.

a.cs
class zzz {
public static void Main()
{
xxx a = new xxx(1);
object b = a;
a.x = 2;
System.Console.WriteLine(((xxx)b).x);
}
}
struct xxx
{
public int x;
public xxx(int i)
{
x = i;
}
}

a.il
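.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (value class xxx V_0,class System.Object V_1,value class xxx V_2)
ldloca.s V_0
ldc.i4.1
call instance void xxx::.ctor(int32)
ldloca.s V_0
box xxx
stloc.1
ldloca.s V_0
ldc.i4.2
stfld int32 xxx::x
ldloc.1
unbox xxx
ldobj xxx
stloc.2
ldloca.s V_2
ldfld int32 xxx::x
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
.class value private auto ansi sealed xxx extends [mscorlib]System.ValueType
{
.field public int32 x
.method public hidebysig specialname rtspecialname instance void .ctor(int32 i) il managed
{
ldarg.0
ldarg.1
stfld int32 xxx::x
ret
}
}

Output
1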

This program and the next one should be read in conjunction if you want to grasp
the following:

Concepts of boxing and unboxing.
The major difference between a class and a structure.

The concept of a structure is not supported by IL. On conversion to IL, a struct
becomes a class with the modifier value added to it. It is sealed and derived from
ValueType, hence referred to as a value class.

In the C# program, an object is created, which is an instance of the structure xxx.
The constructor is passed the constant value 1, which is used to initialize the int
field x to 1. Then, the object b is initialized to a. Next, we change the value of the
member x from 1 to 2 using the value object a.

We display the value of the field x using b. The cast operator is used as the data
type of b is Object and not xxx. We notice that there are two x ints in memory:

One with a value of 2 that is associated with a.
The other one with the value of 1 that is associated with b.

So much for the C# program; let us see what happens in our IL program. We
create 3 locals in IL, i.e. two variables V_0 and V_2 of the value class xxx and one,
V_1, that is an Object.

We place the address of V_0 on the stack followed by the value 1. Then, we call the
constructor using a call and not newobj, since we have a value class or structure
and not a pure class. The constructor initializes the field x to 1.

We have to convert this value class to a pure object that is an instance of the class
Object. We load the address of V_0 and call the box instruction, which converts a
value class into a class and places the reference of the newly created object on the
stack.

Then, we store this reference in the local variable V_1 using stloc.1. This is the code
generated when the statement object b = a is converted to IL. We have created a
fresh object using the box instruction. Thus there are two xxx objects in memory,
one as the value object V_0 and one as a reference object V_1.

We now need to set the field x to 2. To do so, the address of V_0 and the constant 2
are placed on the stack and stfld is called. The easier part of the code is over.

The problem is in the expression WriteLine(((xxx)b).x). The object b or V_1 is a
reference object. We have to cast it to a value object. To do this, we need to unbox it.
The act of converting a reference object to a value object is called unboxing. The
unbox instruction requires a reference type on the stack and it will place on the
stack a pointer to a value type whose data type is specified by the name following
unbox, here xxx.

The instruction ldobj loads an instance of xxx on the stack whose pointer is already
present on the stack. We store this instance in V_2 and load the address of this value
type again on the stack. Then we load the value of x and display it using WriteLine.

a.cs
class zzz
{
public static void Main()
{
yyy c = new yyy(1);
object d = c;
c.x = 2;
System.Console.WriteLine(((yyy)d).x);
}
}
class yyy
{
public int x;
public yyy(int i)
{
x = i;
}
}

a.il
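.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class System.Object V_1)
ldc.i4.1
newobj instance void yyy::.ctor(int32)
stloc.0
ldloc.0
stloc.1
ldloc.0
ldc.i4.2
stfld int32 yyy::x
ldloc.1
castclass yyy
ldfld int32 yyy::x
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.field public int32 x
.method public hidebysig specialname rtspecialname instance void .ctor(int32 i) il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldarg.0
ldarg.1
stfld int32 yyy::x
ret
}
}

Output
2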

c is an object of type yyy and holds a value of 1 in its member x. When object d is
equated to c, since yyy is a class and not a structure, no new object is created in
memory; instead, d points to the same object referenced by c. Thus, we have one yyy
object in memory, and any changes made to the value of x using d will be reflected
when using c and vice-versa.

Here, as we are dealing with a class, the instruction newobj is used to create it. To
initialize the object d to c, we first use ldloc.0 to place its value on the stack and
then use the instruction stloc.1 to initialize local V_1.

Then, we initialize c.x to 2 in the usual manner, by first placing the reference on the
stack using ldloc.0, then placing the value on the stack and calling stfld.

Object d is already a reference object and yyy is a class. Hence we simply use
castclass. It is easy to use casting here because neither boxing nor unboxing is
required to be carried out.

The important point to be mentioned is that we are not creating another yyy in
memory, and hence, there is only one field x in memory. This was not the case
earlier, when a structure was used.
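
The two programs can be placed side by side in one short C# sketch. The type names
SPoint and CPoint and the class CopyDemo are our own, chosen only for this
illustration; the behaviour shown is the copy made by boxing a structure versus the
sharing of a single object by two references.

```csharp
using System;

struct SPoint { public int x; }  // value type: boxing copies it
class CPoint { public int x; }   // reference type: only the reference is copied

static class CopyDemo
{
    public static int BoxedStructValue()
    {
        SPoint a = new SPoint();
        a.x = 1;
        object b = a;            // box: a fresh heap object holding a copy of a
        a.x = 2;                 // changes the original only
        return ((SPoint)b).x;    // unbox: the copy still holds 1
    }

    public static int ClassValue()
    {
        CPoint c = new CPoint();
        c.x = 1;
        object d = c;            // d refers to the same object as c
        c.x = 2;                 // visible through d as well
        return ((CPoint)d).x;    // castclass only: yields 2
    }

    static void Main()
    {
        Console.WriteLine(BoxedStructValue()); // prints 1
        Console.WriteLine(ClassValue());       // prints 2
    }
}
```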

a.cs
class zzz
{
public static void Main()
{
long f = 1;
object b = f;
int i = (int)b;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int64 V_0,class System.Object V_1,int32 V_2)
ldc.i4.1
conv.i8
stloc.0
ldloca.s V_0
box [mscorlib]System.Int64
stloc.1
ldloc.1
unbox [mscorlib]System.Int32
ldind.i4
stloc.2
ret
}
}

This example is deceptively similar to the one above.

First we take a reference object b and equate it to a value object f. Then we cast the
reference type object b to a value type int. The C# compiler gives us no errors but a
runtime exception is thrown.

Back to IL. We first create a long or an int64 V_0 using locals. Then we create an
Object V_1 and finally an int V_2.

We thereafter place 1 on the stack, convert it into 8 bytes using conv.i8 and use
stloc.0 to store the value in V_0. The address is then placed on the stack, as we
need to use the box instruction to convert it into a reference type, which is finally to
be stored in b or V_1.

Next, we unbox the object b, the one created out of a value type, and store it in an
int. To do this, we need to place a reference on the stack and call unbox. This places
the address of the value on the stack, and we use ldind.i4 to fetch the value stored
at this address. Then, we use stloc.2 to initialize the variable V_2. The exception is
thrown because the boxed object holds an int64, and unbox requires the type it names
to match the boxed type exactly; a boxed int64 cannot be unboxed as an int32.

Output
Exception occurred: System.InvalidCastException: An exception of type
System.InvalidCastException was thrown.
at zzz.vijay()
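
One way to avoid this exception, sketched below under our own naming (UnboxDemo and
its two methods are hypothetical), is to unbox to the exact type that was boxed and
only then convert the resulting value. The first cast is an unbox, the second an
ordinary numeric conversion.

```csharp
using System;

static class UnboxDemo
{
    // Unbox to the exact boxed type first, then narrow the value.
    public static int UnboxLongAsInt(object boxed)
    {
        long asLong = (long)boxed;  // unbox: the type must match the boxed type
        return (int)asLong;         // plain numeric conversion from long to int
    }

    // Reports whether casting the boxed value directly to int throws.
    public static bool DirectCastFails(object boxed)
    {
        try
        {
            int i = (int)boxed;     // unbox as Int32: fails for a boxed Int64
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        long f = 1;
        object b = f;
        Console.WriteLine(DirectCastFails(b));  // prints True
        Console.WriteLine(UnboxLongAsInt(b));   // prints 1
    }
}
```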

a.cs
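class zzz {
public static void Main()
{
}
int x;
void abc( int x)
{
this.x = x;
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 x
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method private hidebysig instance void abc(int32 x) il managed
{
ldarg.0
ldarg.1
stfld int32 zzz::x
ret
}
}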

Languages decide on how you write code and name variables. In C# a field and a
parameter to a function can have the same name, but the parameter name takes
precedence over the field name.

In the function abc, this.x refers to the field, whereas a plain x refers to the
parameter. In IL, this dilemma does not arise, as we have one set of instructions
that deals with fields, a second set that deals with parameters to functions and a
third set that deals with locals. Thus there is no way that a name clash can ever occur.

a.cs
class zzz
{
public static void Main()
{
xxx x = new xxx(10);
System.Console.WriteLine(x.i);
x.abc();
}
}
struct xxx
{
public int i;
public xxx( int j)
{
i = j;
}
public void abc()
{
System.Console.WriteLine("abc");
}
}

a.il
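.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (value class xxx V_0)
ldloca.s V_0
ldc.i4.s 10
call instance void xxx::.ctor(int32)
ldloca.s V_0
ldfld int32 xxx::i
call void [mscorlib]System.Console::WriteLine(int32)
ldloca.s V_0
call instance void xxx::abc()
ret
}
}
.class value private auto ansi sealed xxx extends [mscorlib]System.ValueType
{
.field public int32 i
.method public hidebysig specialname rtspecialname instance void .ctor(int32 j) il managed
{
ldarg.0
ldarg.1
stfld int32 xxx::i
ret
}
.method public hidebysig instance void abc() il managed
{
ldstr "abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
10
abc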

We take one more program on structures before we close this chapter. We have
created a struct containing a field i and a function abc. Structures are value objects
and are stored on the stack and not on the heap. Thus, the word value has been
used in the locals directive. It is a class, but of a value type.

In the definition of the structure, we have added two modifiers, sealed and value.
Therefore, we cannot derive from this value class. Everything else is similar to a
class.

-7-

Pointers

Pointers are the heart and soul of a programming language. The only reason why
the C programming language is so popular amongst programmers is its
concept of pointers. Even C#, grudgingly, supports the concept of pointers. A
pointer value is an address that represents a memory location.

In IL, numbers can be of two types:
normal numbers, that we are so familiar with.
numbers that represent a location in memory.

A pointer represents the second type where the number represents a memory
location. Memory locations contain data of specific types. A pointer also needs to be
typed, so that it can point to memory locations that contain data of the same type.
This is required to guarantee type safety.

IL defines a location signature for pointers that contains the data type and a special
syntax to identify it as a pointer. A pointer type value is not an object.

The & symbol signifies a managed pointer whereas the * symbol signifies an
unmanaged pointer. The managed world does not like pointers. Then there are
transient pointers which we will introduce later.

a.il
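.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 * v)
ldc.i4.1
stloc.0
ldloc.0
call void System.Console::WriteLine(int32)
ret
}
}

Output
1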

Like C, the programming language C# also understands a pointer to mean a
variable that contains a special number, one representing a computer memory
location. Thus, pointers are no different from other variables. Any number can be
stored in them.

In the above example, we have placed the value 1 on the stack and used stloc.0 to
store this value in a pointer variable. A pointer variable is no different from a
non-pointer variable.

a.il
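.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 * v)
ldc.i4.1
stloc.0
ldloc.0
call void System.Console::WriteLine(int32)
ldloc.0
ldc.i4.1
add
call void System.Console::WriteLine(int32)
ret
}
}

Output
1
2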

IL does not understand pointers. Therefore, IL does the following:

places the value of the pointer v on the stack
places 1 on the stack
calls the add instruction.

The add instruction does not sense the pointer on the stack and simply increases
its value by 1.

a.il
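.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 * v)
ldloca v
call void System.Console::WriteLine(int32)
ldloca v
ldc.i4.1
add
call void System.Console::WriteLine(int32)
ret
}
}

Output
6552340
6552341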

As explained earlier, C# increases the value of a pointer variable by 4 if it is a
pointer to an int. An int requires 4 bytes of memory.

Let us now understand some basics of pointers. The value of a pointer variable is a
memory location and it is, in turn, stored in memory too.

a.il
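.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 * v , int32 j)
ldloca j
stloc v
ldloc v
call void System.Console::WriteLine(int32)
ldloca j
call void System.Console::WriteLine(int32)
ret
}
}

Output
6552340
6552340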

We have loaded the address of the variable j on the stack and stored it in the
variable v. Thus, the variable v now contains the address of the variable j in
memory. From the output, we can infer that, variable j in memory, begins at the
memory location 6552340.

a.il
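.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 * v , int32 j)
ldloca j
stloc v
ldloc v
ldc.i4.2
stobj int32
ldloc j
call void System.Console::WriteLine(int32)
ret
}
}

Output
2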

In the above program, we have stored the address of the int32 j in the pointer
variable v. We have then loaded the value of v, i.e. the address of variable j, on the
stack and thereafter called the instruction stobj. This instruction takes a data type
as a parameter and initializes the memory location placed earlier on the stack with
the value that is on top of the stack.

Thus, even though the instruction stloc j is not used anywhere, we have been able
to place a value in the memory location occupied by j. The instructions ldloc and
stloc read from and write to a memory location respectively.

We can thus see that the value of any variable, whether it is a local or a parameter
or a field, is simply the value that is stored in the specific memory location.

a.il
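.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 * v , int32 j)
ldloca j
stloc v
ldloc v
ldc.i4.2
stobj int8
ldloc j
call void System.Console::WriteLine(int32)
ret
}
}

Output
51380226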

This program is almost identical to the earlier one, and yet, the output is vastly
different. The reason is that we have changed the parameter that was passed to the
instruction stobj from int32 to int8.

Let us explain the repercussions of this change.

a.il
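.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 j)
ldc.i4 66051
stloc j
ldloca j
ldobj int32
call void System.Console::WriteLine(int32)
ldloca j
ldobj int8
call void System.Console::WriteLine(int32)
ret
}
}

Output
66051
3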

The variable j is initialised with the value 66051. Following it, the address of this
variable is placed on the stack and the instruction ldobj is called with a parameter
int32. This instruction picks up an address from the stack and returns the value
that is contained in the first 4 memory locations starting at the retrieved address.

It takes up 4 bytes as we have specified the parameter as int32. When we modify
the same parameter to int8 or 1 byte, we get a different answer. We are using the
instruction ldobj to find out what is stored in a specific memory location.

a.il
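.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int16 j)
ldc.i4 515
stloc j
ldloca j
ldobj int16
call void System.Console::WriteLine(int32)
ldloca j
ldobj int8
call void System.Console::WriteLine(int32)
ldloca j
ldc.i4.1
add
ldobj int8
call void System.Console::WriteLine(int32)
ret
}
}

Output
515
3
2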

Here, we have used a short i.e. int16 that requires 2 bytes, to store the value of the
variable j. We have placed its address on the stack and called ldobj with an int16 to
get its actual value, i.e. 515.

Thereafter, we have again placed its address on the stack and called ldobj with int8.
This generates an answer of 3. This means that the number 3 is stored in the first
memory location occupied by the variable j. We will explain the reason for this
shortly.

We again place the address of the variable j on the stack and add 1 to it. The
address gets incremented by 1 and ldobj is called once again with int8. This time,
the answer generated is the number 2. Thus, the second memory location occupied
by the variable j contains 2.

Though we are aware that the value of the variable j is 515, how is it that the
memory it occupies contains the numbers 3 and 2? Why is the number 515 stored
as the numbers 3 and 2?

The answer is very simple. A single memory location, i.e. one byte, can only store
values ranging from 0 to 255 i.e. a range of 256 different values. Thus, a value that
lies in the range of 0 to 255 can be stored in one memory location. But the number
515 is larger than 255.

In this case, the number 515 is first divided by 256. For a 2-byte value, the result
of this division cannot be larger than 255. The remainder of the division, i.e. the
number 3, is stored in the first memory location. Further, the result of the division,
i.e. the number 2, is stored in the second memory location.

Thus, the number 515 gets stored as the numbers 3 and 2 in memory. When we
want to access the value of j, the number in the first location is multiplied by 1 and
the number in the second location by 256. Thus 1*3 + 256*2
gives us back the original number 515.
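
The arithmetic above can be checked with a few lines of C#. This is merely a sketch
of the calculation, not of what the runtime actually executes; the class ShortBytes
and its method names are our own, and we assume a non-negative value that fits in
2 bytes.

```csharp
using System;

static class ShortBytes
{
    // Remainder of the division by 256: the byte in the first memory location.
    public static int LowByte(int value) { return value % 256; }

    // Result of the division by 256: the byte in the second memory location.
    public static int HighByte(int value) { return value / 256; }

    // Rebuilds the original number from the two bytes.
    public static int Combine(int low, int high) { return 1 * low + 256 * high; }

    static void Main()
    {
        Console.WriteLine(LowByte(515));   // prints 3
        Console.WriteLine(HighByte(515));  // prints 2
        Console.WriteLine(Combine(3, 2));  // prints 515
    }
}
```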

Doesn't the above explanation give you a warm feeling in the heart and make you
feel more comfortable while dealing with computers? At least, it had that effect on
us!

a.il
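.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 j)
ldc.i4 66051
stloc j
ldloca j
ldobj int32
call void System.Console::WriteLine(int32)
ldloca j
ldobj int8
call void System.Console::WriteLine(int32)
ldloca j
ldc.i4.1
add
ldobj int8
call void System.Console::WriteLine(int32)
ldloca j
ldc.i4.2
add
ldobj int8
call void System.Console::WriteLine(int32)
ldloca j
ldc.i4.3
add
ldobj int8
call void System.Console::WriteLine(int32)
ret
}
}

Output
66051
3
2
1
0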

The above program is similar to its predecessor, albeit with some minor
modifications.

We want to unravel how an int32 is stored in memory. We initialize the
variable j to 66051. Then, we display the values at the 4 memory locations occupied
by j. The only small change we make here is that we increase the memory location
for ldobj by 1 the first time, and then by 2 and then by 3, because we want to read
different memory locations each time.

We have to change the values in the add as we cannot change the address at which
the variable starts. Whenever a variable is stored over four memory locations, the
mathematics becomes tedious. Over four memory locations, we can store numbers
in a range of 4 billion or 2 raised to the power of 32.

The numbers to be stored in the 4 memory locations are arrived at as follows:

First, the number 66051 is divided by 2 raised to the power 24 or 16777216.
The answer is 0 and the remainder is 66051.
This remainder of 66051 is then divided by 2 raised to the power 16 or
65536. The answer is 1 and the remainder is 515.
This remainder of 515 is then divided by 2 raised to the power 8 or 256, as
explained in the example above. The answer is 2 and the remainder is 3.

The 4 answers i.e. 0, 1, 2 and 3 are finally stored in the 4 memory locations
occupied by j, with the last one, 3, in the first memory location, as the output shows.
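
The same decomposition can be sketched in C# using shifts instead of repeated
division; shifting right by 8, 16 or 24 bits is equivalent to dividing by 256, 65536
or 16777216. The class Int32Bytes and its method Split are hypothetical names used
only here. The bytes are returned lowest-order first, which is the order in which the
program above found them in memory.

```csharp
using System;

static class Int32Bytes
{
    // Returns the four bytes of value, lowest-order byte first.
    public static byte[] Split(uint value)
    {
        byte[] bytes = new byte[4];
        for (int k = 0; k < 4; k++)
        {
            // Shift the wanted byte down to the bottom and mask off the rest.
            bytes[k] = (byte)((value >> (8 * k)) & 0xFF);
        }
        return bytes;
    }

    static void Main()
    {
        byte[] b = Split(66051);
        for (int k = 0; k < 4; k++)
        {
            Console.WriteLine(b[k]); // prints 3, 2, 1, 0 on successive lines
        }
    }
}
```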

a.il
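.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 * v , int32 j)
ldc.i4.0
stloc j
ldloca j
stloc v
ldloc v
ldc.i4.3
stobj int8
ldloc j
call void System.Console::WriteLine(int32)
ldloc v
ldc.i4.1
add
stloc v
ldloc v
ldc.i4.2
stobj int8
ldloc j
call void System.Console::WriteLine(int32)
ldloc v
ldc.i4.1
add
stloc v
ldloc v
ldc.i4.1
stobj int8
ldloc j
call void System.Console::WriteLine(int32)
ret
}
}

Output
3
515
66051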

This example simply builds upon the preceding example. A variable on the stack
starts out with a random value, so j is first initialised to 0. Then, we store the
address of j in the variable v. Next, we place this address and the number 3 on the
stack.

Thereafter, we use stobj with int8 to place this number 3 at the first memory
location occupied by j. When we display the value of j, the following calculation
takes place:

The number at the first memory location is multiplied by 1 (2 raised to the
power 0).
The number at the second memory location is multiplied by 256 (2 raised to
the power 8).
The number at the third memory location is multiplied by 65536 (2 raised to
the power 16).
The number at the fourth memory location is multiplied by 2 raised to the
power 24.

The output of the above program is generated as follows:

Since the first memory location of j has a value 3, the value of j becomes 3.
Then, we encounter the value 2 in the second memory location of j. Thus, its
value now becomes 515.
Then we find the value 1 in the third memory location occupied by j,
changing its value to 66051 because of the following calculation:

1*3 + 256*2 + 65536*1 = 66051.

This is the reverse of the earlier program. Instead of placing 66051 on the stack, we
individually place values in the memory locations of j to build the number.
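
The three outputs can be reproduced with the weights listed above. The sketch below
uses our own hypothetical names, ComposeBytes and Compose; it only performs the
arithmetic and does not touch memory through pointers.

```csharp
using System;

static class ComposeBytes
{
    // Weighs each byte by 1, 256, 65536 and 16777216, lowest-order byte first.
    public static int Compose(byte b0, byte b1, byte b2, byte b3)
    {
        return b0 * 1 + b1 * 256 + b2 * 65536 + b3 * 16777216;
    }

    static void Main()
    {
        Console.WriteLine(Compose(3, 0, 0, 0)); // prints 3
        Console.WriteLine(Compose(3, 2, 0, 0)); // prints 515
        Console.WriteLine(Compose(3, 2, 1, 0)); // prints 66051
    }
}
```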

a.il
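.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 **** v , int32 j)
ldloca j
stloc v
ldloc v
ldc.i4.2
stobj int32
ldloca j
ldobj int32
call void System.Console::WriteLine(int32)
ret
}
}

Output
2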

It is unfortunate that IL does not understand pointers the way C# or any other
programming language does.

Here, v is declared with several levels of indirection, i.e. as a pointer to a pointer
to a pointer and so on to an int32. Ultimately it is treated as a pointer to an int32,
and everything works as shown. We have stored the address of j in it and used ldobj
and stobj to access the memory.

We have gone a step further and removed all the asterisk symbols from the locals
directive and made v a simple int32. We see no errors because a pointer and an
int32 take up the same amount of memory. Thus, the parameters to ldobj and stobj
are most crucial.

a.il
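.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 *v , int32 *u, int32 i, int32 j)
ldloca i
stloc v
ldloca j
stloc u
ldloc u
ldloc v
sub
call void System.Console::WriteLine(int32)
ret
}
}

Output
4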

Pointers are not interpreted as memory locations of a particular data type, but as
numbers. Thus, subtracting them will give us the amount of memory separating
the pointers.

In the program above, as the two ints are separated by 4 bytes, the result of the
subtraction is 4. The pointers we have used are called unmanaged pointers. They
never reference any memory which is being monitored by the garbage collector. The
garbage collector is oblivious to the existence of these pointers. Garbage collectors
like to move things around in memory, as and when they please. This has led to the
concept of pinning. Code that uses pointers is not verifiable.
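
Pinning can be illustrated from C# without unsafe code. The following is a minimal
sketch using GCHandle and Marshal from System.Runtime.InteropServices; the class
PinDemo and its method ReadPinnedBytes are our own names. The handle stops the
garbage collector from moving the array, so its address remains valid while we read
the 4 memory locations of an int one byte at a time. We assume a little-endian
machine when quoting the order of the bytes.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinDemo
{
    // Pins a one-element int array and reads its 4 bytes through the raw address.
    public static byte[] ReadPinnedBytes(int value)
    {
        int[] data = new int[] { value };
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject(); // address of data[0]
            byte[] result = new byte[4];
            for (int k = 0; k < 4; k++)
            {
                result[k] = Marshal.ReadByte(address, k); // byte at address + k
            }
            return result;
        }
        finally
        {
            handle.Free(); // unpin, so that the collector may move the array again
        }
    }

    static void Main()
    {
        // On a little-endian machine this prints 3, 2, 1 and 0.
        foreach (byte b in ReadPinnedBytes(66051))
        {
            Console.WriteLine(b);
        }
    }
}
```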

There are 5 different load instructions in IL. They are for the following:

a field
a static field
a local
a parameter
an array.

If we add the letter 'a' at the end of these load instructions, we will get the address
of the variable instead of its value.

a.il
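.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (int32 v)
ldstr "vijay"
call void System.Console::WriteLine(int32)
ldstr "vijay"
stloc v
ldloc v
call void System.Console::WriteLine(class System.String)
ret
}
}

Output
11N99SN9
vijay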

The ldstr instruction stores the string in memory and places its memory location on
the stack. Here, we are merely displaying this value.

This value that we are displaying is very different from earlier values as the values
earlier were on the stack, whereas the string is stored on the heap.

We have called ldstr again, but this time stored the value in the variable v. Then, we
have placed the value of v on the stack and called the WriteLine function with a
string as a parameter.

We can't place strings or objects on the stack. We can only place numbers on the
stack. Also, we can place the reference of an object on the stack. This reference is a
number that indicates the starting location of the object in memory. Using ldobj, we
can access the value that is stored in memory allocated for the object.

a.il
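.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (class zzz v)
stloc v
ldloc v
ldc.i4.1
ldc.i4.2
ldc.i4.3
call instance void zzz::abc(int32,int32,int32)
ldloc v
ldc.i4.1
ldc.i4.2
ldc.i4.3
call instance void zzz::pqr(int32,int32,int32 )
ret
}
.method instance void abc(int32 i, int32 j, int32 k)
{
ldarga i
call void System.Console::WriteLine(int32)
ldarga j
call void System.Console::WriteLine(int32)
ldarga k
call void System.Console::WriteLine(int32)
ret
}
.method instance void pqr(int32 i, int32 j, int32 k)
{
ldarga i
call void System.Console::WriteLine(int32)
ldarga j
call void System.Console::WriteLine(int32)
ldarga k
call void System.Console::WriteLine(int32)
ret
}
}

Output
6552320
6552336
6552332
6552320
6552336
6552332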

All locals are created on the stack. The rest are on the heap. The stack only
contains numbers. On a 32 bit machine, as in this case, they are in multiples of 4.

The stack is also used to transfer parameters to a function. In this case, parameters
are pushed onto the stack and the functions are called. When the function
encounters a ret, the stack is restored to the state prior to the function call.

Thereafter, when another function is called, the same stack, i.e. the same memory
that was used earlier to transfer parameters for the previous function, is used again
for the new function also.

This is how memory is conserved. Once a function finishes execution, the memory
allocated to its locals is used by another function. Thus, locals lose their values
once a function returns.

You may have noticed that there is a 16 byte gap between two of the parameters.
There is no information available as to what is stored in these 16 bytes.

a.il
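.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public static void vijay() il managed
{
.entrypoint
.locals (class zzz v , int32 v)
stloc v
ldloca v
call void System.Console::WriteLine(int32)
ldloc v
ldc.i4.1
ldc.i4.2
call instance void zzz::abc(int32,int32)
ldloc v
call void System.Console::WriteLine(int32)
ret
}
.method instance void abc(int32 i, int32 j)
{
ldarga j
call void System.Console::WriteLine(int32)
ldarga j
ldc.i4 4
add
ldc.i4 23
stobj int32
ret
}
}

Output
6552340
6552336
23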

Nothing stops you from shooting yourself in the foot. The reason why the powers that
be do not like you using pointers is that they are very powerful but, at the same
time, they are extremely dangerous.

Here, we are displaying the address of the local v and also the parameter j. We
realized that they differ by 4 memory locations only. Thus, we added 4 to the
address of j and wrote 23 to the memory locations that signify the address of the
local v in the function vijay.

Thus, when we displayed the value of v in the function vijay, the number 23 was
displayed. Thus, from one function, we have been able to change the value of a
variable present in another function.

This feature can create havoc if the pointers are not used carefully. Let us assume
that there is some bug in the WriteLine function and it writes some random value
somewhere in memory. If that random memory location contained any crucial data
or variables, the program can crash and there is no way that you can find out
where the error has occurred.

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (class zzz v )
ldc.i4.2
ret
}
.method instance void abc(int32 i)
{
ldarga i
ldc.i4 23
stobj int32
ldarg.1
ret
}
}

Output
23

We can overwrite any piece of memory we like. In the function abc, we have
taken the address of the parameter i and stored the value 23 at that address.
When we subsequently displayed its value, ldarg.1 did not, even for a moment,
consider the old value. All that it did was read the memory location and
display the value stored there.

a.il
.assembly mukhi {}
{
.field int32 i
.field static int32 j
{
.entrypoint
.locals (class zzz v )
ldflda int32 zzz::i
ldsflda int32 zzz::j
ret
}
}

Output
11N99S:>
KS>@88K9

The above example simply prints out the addresses of an instance field and a static
field. They are both stored on the heap, but at different locations in the heap
memory.

Here is a summary of all that we have learnt about unmanaged pointers:

The concept of pointers has been borrowed from languages like C and C++.
There are no restrictions on their use, and thus, code that uses them cannot
be verified at all.
They are internally recognized as unsigned integers by the Execution Engine
(EE).
The * symbol and a data type should be used with pointers.
The runtime does not report the existence of unmanaged pointers to the
garbage collector. Hence, the garbage collector does not track these unmanaged
pointers.

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32 & v, int32 j)
ldloca j
stloc v
ldloc v
ldc.i4.2
stobj int32
ldloc j
ret
}
}

Output
2

Now let us understand the managed pointer. This is the second type of pointer and
is marked with an & symbol. This type of pointer may point to a field of an object,
to a value type or to any other type. It cannot, however, be NULL.

The most important thing about this type of pointer is that it must be reported to
the garbage collector, even if it does not point to managed memory. This
type of pointer works in the good managed world.

The last type of pointer is the transient pointer. It lies in between the managed and
unmanaged pointers. We cannot create pointers of this type. They are created by the
EE with the help of some IL instructions and, depending upon the destination, the
EE makes them either managed or unmanaged pointers.

-8-

Methods

The code of a data type is implemented by a method, which is executed by the
Execution Engine. The CLR offers a large number of services to support the
execution of code.

Any code that uses these services is called managed code. Managed code allows the
CLR to provide a set of features such as exception handling. It also makes sure
that the code is verifiable. Only managed code has access to managed data.

a.il
.assembly mukhi {}
.entrypoint
call instance void a1()
ret
}
}
.method public instance void a1() il managed
{
ldstr "hi"
ret
}
Output
hi

There is no rule in the IL book that prevents a method from being global. It can
certainly be written outside a class.

a.il
.assembly mukhi {}
{
.entrypoint
ldstr "hi"
ret
}

Output
hi

In fact we can write the smallest IL program without using the class directive. It is
mandatory to have a function with the entrypoint directive. Thus, had the designers
of C# so desired, they could have provided the facility of global functions, but they
chose not to. They decided, in their infinite wisdom, that all functions should be
placed within a class. There is no such restriction imposed by IL.

The CLR recognizes three types of methods: static, instance and virtual. There are
also some special functions that the runtime calls automatically: static
constructors (type initializers), named .cctor, and instance constructors, named
.ctor.

A method in IL is uniquely identified by its signature. A signature consists of five
parts:

The name of the method
The type or class that the method resides in
The calling convention used
The return type
The parameter types.
a.il
.assembly mukhi {}
{
.entrypoint
call instance int32 a2()
pop
call instance void a2()
ret
}
{
ldstr "hi"
ret
}
.method public instance int32 a2() il managed
{
ldstr "hi1"
ldc.i4.2
ret
}

Output
hi1
hi

For people like us, who are familiar with the world of C, C++ and Java, the concept
of a method signature depending upon the return type of a function is alien.

Here, we have two functions, both named a2, which differ only in the type of the return
value. This is perfectly valid in IL. The reason is that, when calling a method in
IL, we have to state its return type. But what is allowed in IL may be taboo in
C#.

Method overloading is a concept where the same function name appears in a class
more than once. You may not have observed that, in the above programs,
the this pointer is not passed to the global functions. Even then, things worked
well.
The reason for this is that global functions are generally static by default. In fact,
static functions are found in classes, value types and interfaces. Static functions
always have a body associated with them.

The second type of method, very commonly used, is the instance method. These are functions
associated with an instance of a class. In this version of the CLR, we cannot declare
them in interfaces. Unlike static methods, which are stand-alone methods and
behave like global functions, an instance function is always passed a pointer or
reference to the data associated with the object. Thus, it can use the this pointer to
access a different set of data each time.

a.il
.assembly mukhi {}
.entrypoint
call void zzz::a1()
ret
}
{
ldstr "hi"
ret
}
}

Output
Exception occurred: System.MissingMethodException: Void .zzz.a1()
at zzz.vijay()

A runtime exception is thrown because the call expects the method to be static,
whereas our method is an instance method. To avoid this runtime error, replace the
modifier instance with static.

The this pointer is of the same type as the class in which the method resides. We
therefore have to create an instance of a class before we can execute any instance
method from the class.
As a rule, all instance functions must have the this pointer as the first parameter.
Therefore, it is automatically added as a hidden first parameter. The this pointer
can be a null reference too.

a.il
.assembly mukhi {}
{
.field int32 i
{
.entrypoint
ldnull
call instance void zzz::a1()
ret
}
{
ldstr "hi"
ldarg.0
ldc.i4.2
stfld int32 zzz::i
ret
}
}

Output
at zzz.vijay()

Whenever we refer to a field in a type through a function, the this pointer should
first be available on the stack. This facilitates access to the instance fields. This
explains the above error.

Here, we have placed a ldnull as the this pointer, and thus are unable to access the
instance members. On commenting out the ldnull, no error is generated.

The instruction newobj places a this pointer on the stack. Therefore, prior to using
it, ldarg.0 is checked for NULL. However, for a value type, the this pointer is a
managed pointer to the value type. Unlike static or virtual, instance is not an
attribute of a method. It is part of the calling convention of a method.

There are three ways to call a method in IL. These are: call, callvirt and calli. Two of
these, call and callvirt, have already been dealt with, in the past.

There are three other instructions that can be used to call a method in a special
way. These are jmp, jmpi and newobj. Every method that we call has its own
evaluation stack. The parameters to the function are placed on this stack, and
instructions also obtain their arguments from the same stack.

On the execution of an instruction, the result is also placed on the same stack. The
runtime creates and maintains this stack. When the method quits out, the stack is
released.

There is another stack that we do not concern ourselves with. This stack keeps
track of the method being called, and hence, is known as the call stack.

The last instruction in any function is the ret instruction. This instruction
is responsible for the method returning control back to the calling method. If a
function returns a value, it must be placed on the stack before ret is executed. When
a method exits, its stack must not contain any value other than the value to
be returned.
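
The rule about ret can be seen in a minimal sketch. This is our own illustration and not one of the listings above: the assembly name retdemo is made up, and the class and method names simply follow the zzz and vijay convention of this chapter.

```il
.assembly extern mscorlib {}
.assembly retdemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void vijay()
  {
    .entrypoint
    // abc leaves its return value on our evaluation stack
    call int32 zzz::abc()
    // WriteLine consumes that value, so our stack is empty again
    call void [mscorlib]System.Console::WriteLine(int32)
    ret   // a void method must return with an empty stack
  }
  .method public hidebysig static int32 abc()
  {
    ldc.i4 10
    ldc.i4 20
    add   // the stack now holds the single value 30
    ret   // exactly one value is left for the caller
  }
}
```

Assembled with ilasm, this should print 30: abc leaves exactly one value on its stack, and vijay, which returns void, leaves none.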

We use the call instruction to call static or instance functions. Before the call
instruction, all the parameters to the method must be placed on the stack. The first
argument to the function is placed first. The only difference between calling a static
and an instance method is that the modifier instance is used for an instance
method, whereas no modifier is required for a static method.
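
The two kinds of call can be placed side by side in a small sketch. It is an illustration under our own assumptions: the assembly name calldemo and the method pqr are hypothetical, and the explicit constructor is written out only so that newobj has something to call.

```il
.assembly extern mscorlib {}
.assembly calldemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void vijay()
  {
    .entrypoint
    // static call: only the declared parameter goes on the stack
    ldc.i4 5
    call void zzz::pqr(int32)
    // instance call: the object reference (this) goes on first
    newobj instance void zzz::.ctor()
    ldc.i4 6
    call instance void zzz::abc(int32)
    ret
  }
  .method public hidebysig static void pqr(int32 i)
  {
    ldarg.0   // in a static method the first parameter is argument 0
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
  .method public hidebysig instance void abc(int32 i)
  {
    ldarg.1   // argument 0 is the this pointer, so i is argument 1
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
  .method public hidebysig specialname rtspecialname instance void .ctor()
  {
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
}
```

The program should print 5 and then 6. Note how the same parameter i is argument 0 in the static method but argument 1 in the instance method.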

a.il
.assembly mukhi {}
{
.field int32 i
{
.entrypoint
pop
ldnull
callvirt instance void ###::a1()
ret
}
.method public virtual instance void a1() il managed
{
ldstr "hi"
ret
}
}

Output
at zzz.vijay()

Virtual functions have to be handled with care as they are runtime entities. With
virtual functions, the instruction callvirt is used in place of call. Unlike call, callvirt
executes the overriding version of the method.

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
ldloc.0
ldloc.0
ret
}
}
{
{
ldstr "yyy abc"
ret
}
}
{
{
ldstr "xxx abc"
ret
}
{
ldarg.0
ret
}
}

Output
xxx abc
yyy abc

We have pulled out this program from an earlier chapter, where we explained new,
override and virtual functions. The callvirt instruction calls the function abc from xxx,
as it overrides the one from the class yyy.

The reason is that, in the class xxx, the function abc carries no newslot modifier,
and hence it is not a different abc from the one in the base class. With call, however, the
instruction simply calls abc from the class specified, as it does not understand
modifiers like virtual, newslot etc. The modifier instance is used with callvirt, as the this pointer
can, under no circumstances, be NULL.
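
The contrast between the two instructions can be reproduced with a complete sketch. This is our own minimal illustration and not the listing from the earlier chapter: the assembly name virtdemo and the local name a are assumptions, while the classes yyy and xxx follow the names used above.

```il
.assembly extern mscorlib {}
.assembly virtdemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void vijay()
  {
    .entrypoint
    .locals (class yyy a)
    newobj instance void xxx::.ctor()   // run time type is xxx
    stloc.0
    ldloc.0
    callvirt instance void yyy::abc()   // looks up the override at run time
    ldloc.0
    call instance void yyy::abc()       // calls exactly the method named
    ret
  }
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
  .method public hidebysig virtual instance void abc()
  {
    ldstr "yyy abc"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
  .method public hidebysig specialname rtspecialname instance void .ctor()
  {
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
}
.class private auto ansi xxx extends yyy
{
  // virtual without newslot: this abc overrides the one in yyy
  .method public hidebysig virtual instance void abc()
  {
    ldstr "xxx abc"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
  .method public hidebysig specialname rtspecialname instance void .ctor()
  {
    ldarg.0
    call instance void yyy::.ctor()
    ret
  }
}
```

The callvirt line should print xxx abc and the call line yyy abc, even though both name yyy::abc and both receive the same object.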

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
ldloc.0
ret
}
}
{
{
ldstr "yyy abc"
ret
}
}
{
{
ldstr "xxx abc"
ldloc.0
ret
}
{
ldarg.0
ret
}
}

Output
xxx abc
yyy abc

In the above example, the super class function abc from the class yyy is called
from the function abc of class xxx. This facilitates reusing code defined in the
super class.

A virtual function may want to call all code in the base class. In IL parlance, it is
termed as a super call. In the above code, we foresee a problem with callvirt as it
will either call itself over and over again, or give us the following exception:

Output
xxx abc

at xxx.abc()
at zzz.vijay()

The reason for the above error is that the this pointer refers to the class xxx and not to
the class yyy. Thus, the instruction call is used and not callvirt.

a.il
.assembly mukhi {}
{
{
.entrypoint
newobj void zzz::.ctor()
ret
}
.method public instance void abc()
{
ldstr "hi"
jmp instance void zzz::pqr()
ldstr "bye"
ret
}
.method public instance void pqr()
{
ldstr "pqr"
ret
}
}

Output
hi
pqr

We have created an object of type zzz using newobj. It places a reference to a zzz on the
stack. With this as the this pointer, the instance function abc is then called.

Here we have displayed "hi" and then an instance method pqr is called using the
jmp instruction.

After the method pqr finishes execution, control does not return to method abc.
Instead, control returns to vijay, which is the method that called abc. Thus the
string "bye", present in the method abc, does not get displayed.

The jmp instruction does not revert the control back to the method from where the
program initially branched out.

a.il
.assembly mukhi {}
.entrypoint
ret
}
{
ldstr "hi"
ldftn instance void zzz::pqr()
jmpi
ldstr "bye"
ret
}
.method public instance void pqr()
{
ldstr "pqr"
ret
}
}

Output
hi
pqr

The above program is similar to its predecessor, but it uses the instruction jmpi
instead of jmp. This instruction is similar to jmp, but differs in the following
aspects:

In the case of the jmp instruction, we supplied the method signature as a
parameter to the instruction.
In the case of the jmpi instruction, we first use the instruction ldftn to load
the address of the function pqr on the stack, and then call jmpi.

The jmp family of instructions executes a jump or a branch across methods. We
can only jump to the beginning of a method, and not to anywhere inside it. The
signature of the method that we intend to jump to must be the same as that of
the method we jump from.

Output
at zzz.abc()
at zzz.vijay()

If the signature of the method being jumped to is not the same, the above exception
is thrown. The jmp instruction is not verifiable.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.1
ldc.i4.2
call instance void zzz::abc(int32,int32)
ret
}
.method public instance void abc(int32 i, int32 j)
{
ldc.i4.3
starg j
ldarg j
jmp instance void zzz::pqr(int32,int32)
ret
}
.method public instance void pqr(int32 p,int32 q)
{
ldarg.1
ldarg.2
ret
}
}

Output
3
1
2

The method abc takes two ints as parameters. We have placed the constant 3 on the
stack, and then used the instruction starg to change the parameter j. Then, ldarg is
used to place the new value on the stack. Thereafter, we have called the WriteLine
function to confirm that the new value is 3. The jmp instruction is the next to be
executed.

Here we have not placed any parameters on the stack. The jmp instruction first
places the numbers 1 and 2 on the stack, and then calls the function pqr, which
simply displays the parameters that have been passed.

Even though we have changed the parameter j, the change is not reflected in the
called function pqr. This is contrary to what the documentation states. The caller
does not pass parameters to the next method; the instruction jmp does so.

If function pqr returns a value, it will be passed to the function vijay and not to abc.
We cannot place any values on the stack before executing the jump. Jumps can be
executed only between methods that have the same signatures.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldftn instance void zzz::abc()
calli instance void ()
ret
}
{
ldstr "hi"
ret
}
}

Output
hi

We can call a method indirectly by first placing its address on the stack, and then
using the calli instruction. First, the instruction ldftn places the address of a
non-virtual function on the stack. As in the case of instance functions, the this
pointer has to be placed first on the stack, followed by the parameters to the
function. When we tried using calli with the address of a virtual function, Windows
generated an error.
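
The order of items on the stack for an indirect call is easiest to see with a static function that takes one parameter. The following is a sketch under our own assumptions: the assembly name callidemo is made up, and abc is declared static here purely to keep the example short.

```il
.assembly extern mscorlib {}
.assembly callidemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void vijay()
  {
    .entrypoint
    ldc.i4 7                     // the argument goes on the stack first
    ldftn void zzz::abc(int32)   // the address of the method goes on last
    calli void (int32)           // only a signature is named, not a method
    ret
  }
  .method public hidebysig static void abc(int32 i)
  {
    ldarg.0
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
}
```

This should print 7. The calli instruction names only the return type and the parameter types; which method actually runs is decided by the address sitting on top of the stack.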

We use the newobj instruction to create a new instance and also to call the
constructor of a class, which is nothing more than a special instance method.

The only difference between a constructor call and an instance call is that we do not pass the this
pointer to the constructor ourselves. newobj first creates the object, and then
automatically places the this pointer on the stack.

a.il
.assembly mukhi {}
{
.field int32 i
{
.entrypoint
.locals ( class zzz v)
stloc.1
ldloc.1
ldfld int32 zzz::i
ldloc.1
ldc.i4.2
stfld int32 zzz::i
ldloc.1
ldfld int32 zzz::i
ldloc.1
call instance void zzz::.ctor()
ldloc.1
ldfld int32 zzz::i
ret
}
{
ldarg.0
ldc.i4.1
stfld int32 zzz::i
ret
}
}

Output
1
2
1

The newobj instruction places the this pointer on the stack before calling the
constructor. If we desire to call the constructor ourselves, we too need to place the
this pointer on the stack.

In the above program, we have changed the value of the field i to 1, then
changed it to 2 using stfld and then displayed this value. Thereafter, we have called
the constructor, which changes the value back to 1 again. This proves that a
constructor is no different from any other function.

A method definition is called a method head in IL. The head also functions as an
interface to other methods. The format of the head is as follows:

It starts with a number of predefined method attributes.
These are followed by an optional indication, specifying whether the method
is an instance method or not.
Thereafter, the calling convention is specified.
This is followed by the return type and a few more optional parameters.
Finally, we state the name and the parameters to the method and the
implementation attributes.
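
The parts of the head listed above can be picked out in a small annotated sketch. It is our own illustration: the assembly name headdemo is made up, the keyword default is the explicit spelling of the ordinary calling convention, and we have assumed the spelling cil managed for the implementation attributes that the listings in this book write as il managed.

```il
.assembly extern mscorlib {}
.assembly headdemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static   // predefined method attributes
          void                      // return type
          vijay()                   // name and parameters
          cil managed               // implementation attributes
  {
    .entrypoint
    newobj instance void zzz::.ctor()
    ldc.i4 3
    call instance void zzz::abc(int32)
    ret
  }
  .method public hidebysig          // predefined method attributes
          instance                  // instance indication: a this pointer is passed
          default                   // calling convention
          void                      // return type
          abc(int32 i)              // name and parameters
          cil managed               // implementation attributes
  {
    ldarg.1
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
  .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
  {
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
}
```

The program should print 3. The point of the sketch is the order of the parts in each head, which the assembler accepts on separate lines with comments in between.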

Methods are instance by default. To change the default behavior, we use the
modifiers static or virtual. As of today, the return type cannot have any attributes,
but who knows what changes may take place tomorrow.

The code for the method is written in the method body. It can incorporate a large
number of directives.

a.il
.assembly mukhi {}
{
{
.entrypoint
.emitbyte 0x19
ret
}
}

Output
3

The code that we write, gets converted into numbers. Every IL instruction is
represented by a number. The ldc.i4.3 instruction is known by the number 19 hex.
This information is available in the Instruction Set Reference. The directive emitbyte
emits an unsigned 8 bit number directly into the code section of the method.

Thus, we can use the opcodes of an IL instruction directly in il programs.

The return value of the entrypoint function can be either void, int32 or unsigned
int32. This value is handed over to the operating system. A value of zero normally
indicates success and any other value indicates an error. The entrypoint method is
unique in that it can have private accessibility and yet be accessed by the
runtime.
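
Both points can be combined in one short sketch: a private entrypoint method that hands a value back to the operating system. The assembly name exitdemo is our own invention, and the string displayed is arbitrary.

```il
.assembly extern mscorlib {}
.assembly exitdemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  // private, yet the runtime can still start the program here
  .method private hidebysig static int32 vijay()
  {
    .entrypoint
    ldstr "done"
    call void [mscorlib]System.Console::WriteLine(string)
    ldc.i4.0   // zero reports success to the operating system
    ret        // the int32 left on the stack becomes the exit code
  }
}
```

On Windows, the value returned can be inspected from a batch file or the command prompt through ERRORLEVEL after the program has run.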

The .locals directive is used to create a local variable that can only be accessed from
within that method. Thus, it is used to store data that exists only for the duration
of a method call. After a method exits, all the memory allocated for its locals is
reclaimed by the runtime.

It is faster for the system to allocate memory on the stack, where locals get stored,
than to allocate memory on the heap for fields. We cannot specify attributes for
local variables, like we do for parameters.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.1
stloc.0
ldloc.0
.locals ( int32 i)
ret
}
}

Output
1

The .locals directive can be placed at the end of the code and does not have to be
placed at the beginning. Thus, in a sense, a forward reference is allowed here.

a.il
.assembly mukhi {}
{
{
.entrypoint
//.zeroinit
.locals ( int32 i)
ldloc.0
ret
}
}

Output
?1@98>99

Remove the comments and a value of zero will be displayed.

There is some overlap in IL. If we use the modifier init in the locals directive, then
all the variables will be assigned their default values, depending upon their type.
We have touched upon this point earlier.

The same effect is seen when we use the directive .zeroinit. This applies to all the
locals in the method.

If we retain the comments, the variable i will hold whatever value happens to be
present on the stack.
If we remove the comments, the runtime initialises all the value types to
ZERO and all the reference types to NULL.
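
The init modifier can be tried out with one value type and one reference type. This is a minimal sketch of our own; the assembly name initdemo and the local names i and s are assumptions.

```il
.assembly extern mscorlib {}
.assembly initdemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void vijay()
  {
    .entrypoint
    .locals init (int32 i, string s)
    ldloc.0
    call void [mscorlib]System.Console::WriteLine(int32)   // the value type starts as 0
    ldloc.1
    ldnull
    ceq        // compares the reference in s with NULL
    call void [mscorlib]System.Console::WriteLine(bool)
    ret
  }
}
```

The program should display 0 followed by True, showing that the int32 was zeroed and the string reference was set to NULL before any of our code ran.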

a.il
.assembly mukhi {}
{
.zeroinit
{
.entrypoint
ret
}
}

Error
a.il(4) : error : syntax error at token '.zeroinit' in: .zeroinit
***** FAILURE *****

Some of the directives can only be used within certain entities. The directive
.zeroinit can only be used within a method and not outside. The assembler checks
whether the directive has been used at the right place or not. If not, it generates an
error message that is hardly informative.

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
ldloc.0
ret
}
}
{
{
ldstr "yyy abc"
ret
}
{
ldarg.0
ret
}
}
{
{
ldstr "xxx abc"
ret
}
{
ldarg.0
ret
}
}

Output
xxx abc

You may accuse us of being repetitive, but there is no harm in refreshing our
memory.

Class yyy is a base class and xxx the derived class. We have created a local of type
yyy, which is the base class, but initialized it to the class xxx, which is the derived
class. A better way to say it is, we are creating an object that looks like xxx, but
storing it in a yyy local.

callvirt calls the function abc from the class xxx despite it being called through the
yyy class. This is because the instruction callvirt is resolved at runtime. In that
environment, the this pointer on the stack is of class xxx, and thus abc from the
class xxx is called. The virtual function has its own unique way of deciding on the
pointer to be placed on the stack.

If we remove the modifier virtual from the function abc in class xxx, then the
function abc will be called from the yyy class. Changing the newobj to yyy does not
make a difference, as both the run time and compile time data types are then the
same. The run time data type takes precedence over the compile time data type.

We add the modifier newslot to the function abc in class xxx as follows:


Here, from the point of view of the run time, the function abc is treated as a new
function. As there is no connection with the abc of class yyy, they are now treated
as two distinct functions. The abc of class yyy is called. Placing the modifier newslot
on the function abc in class yyy makes it a new function abc, distinct from any abc present in a
base class. Thus, it makes no difference here.
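
The effect of newslot can be checked with a complete sketch. It is our own reconstruction of the idea and not the original listing: the assembly name slotdemo and the local name a are assumptions, and the second call through xxx is added only to show that the new function is still reachable.

```il
.assembly extern mscorlib {}
.assembly slotdemo {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void vijay()
  {
    .entrypoint
    .locals (class yyy a)
    newobj instance void xxx::.ctor()
    stloc.0
    ldloc.0
    callvirt instance void yyy::abc()   // the slot of yyy was never overridden
    ldloc.0
    castclass xxx
    callvirt instance void xxx::abc()   // the new slot introduced by xxx
    ret
  }
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
  .method public hidebysig virtual instance void abc()
  {
    ldstr "yyy abc"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
  .method public hidebysig specialname rtspecialname instance void .ctor()
  {
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
}
.class private auto ansi xxx extends yyy
{
  // newslot: this abc starts a family of its own
  .method public hidebysig newslot virtual instance void abc()
  {
    ldstr "xxx abc"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
  .method public hidebysig specialname rtspecialname instance void .ctor()
  {
    ldarg.0
    call instance void yyy::.ctor()
    ret
  }
}
```

The program should print yyy abc and then xxx abc. The object is the same in both calls; only the slot named in the callvirt instruction differs.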

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (class yyy V_0, class xxx V_1)
newobj instance void www::.ctor()
stloc.0
ldloc.0
newobj instance void www::.ctor()
stloc.1
ldloc.1
callvirt instance void xxx::abc()
ret
}
}
{
{
ldstr "yyy abc"
ret
}
{
ldarg.0
ret
}
}
{
{
ldstr "xxx abc"
ret
}
{
ldarg.0
ret
}
}
.class private auto ansi www extends xxx
{
{
ldstr "www abc"
ret
}
{
ldarg.0
call instance void xxx::.ctor()
ret
}
}

Output
www abc
www abc

The above program is pretty large. The only difference between this program and its
predecessor is that we have added one more class, www, derived from xxx. We have
created two locals, one each of the types yyy and xxx, but the run time data type of
both the locals is a www object.

The functions abc are virtual throughout. When we call the functions abc through
callvirt, even though we are using the class prefixes xxx and yyy, the function gets
called from www.

This is so because the run time data type, i.e. www, of the this pointer has been
passed.

Then, we make our first small change: we add newslot to the function abc in class
www.

The output now reads as follows:

xxx abc
xxx abc

This output results because newslot dissociates the function
abc of the class www from the earlier abc functions. Thus, since the abc of class
xxx is the newest, it gets called.

Next, we add the modifier newslot to the function abc in class xxx and remove it
from the class www. The output now reads as follows:

yyy abc
www abc

Isn't the output fascinating? Now you can probably understand why we are
revisiting virtual functions.

By adding the modifier newslot to the function abc in class xxx, we are creating two
families of abc:

One that comprises only a single abc in class yyy
Another made up of the abc functions from classes xxx and www.

Thus, in every instance, the last member of the family gets called and, since the
first family has only one member, this single member, i.e. the abc of class yyy, gets called.

In the second case, the abc of class www gets called. Now let us add the newslot
modifier to the function abc in class www, without removing the one from class xxx.

The output now reads as follows:

yyy abc
xxx abc

Now, we have three families of abc functions. Each of them has only one function
abc that has nothing to do with the abc functions of the other families.

If we add the modifier newslot to the function abc in class yyy, we will not see any
change in the output. This is because we are cutting off abc from its root, from
class yyy onwards. There is no function abc in any of the classes that yyy derives
from. Hence, there is no change in the output.

If we remove virtual from the function abc in class www, it has the same effect as
adding the modifier newslot. The virtual modifier on a function signifies that the address of
the function to be called should be read from the vtable. If we remove the virtual
modifier from the function abc in class xxx, the output will be as follows:

www abc
xxx abc

This output has resulted because of the following:

The object created is of type www.

In the first case, the vtable has the address of the www abc. The vtable stores
a single address for every virtual function. The runtime checks the compile
time data type of the pointer, which is yyy. Within yyy, it
discovers that the function abc is virtual. Thus it looks into the vtable for the
address, which turns out to be that of www.
In the second case, the compile time type is xxx. But within
the class xxx, the function is not virtual and thus, the vtable does not come
into play.

Now we remove virtual from the function abc of class yyy only. Remember, we are
making only one change at a time. The output now will be as follows:

yyy abc
www abc

The same explanation as given earlier applies here too. We hope you will remember
us and our brilliant explanation of the concept of virtual. At least, this is how we
interpret it, and do not mind being the only ones to do so in this manner.

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32 i)
ldc.i4.1
stloc i
ldloc i
{
.locals (int32 i)
ldc.i4.2
stloc i
ldloc i
}
ldloc.0
ldloc.1
ret
}
}

Output
1
2
1
2

In IL, the scoping levels do not behave like those found in traditional
languages like C. Here, i is created as a new variable at each { brace, even
though all the variables are merged together into one large locals directive.
Thus, by name, we refer to the individual variables i in their respective blocks. The ldloc.0
stands for the first i, whereas ldloc.1 stands for the inner i, which remains accessible in the
outer brace.

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32 i)
ldloca i
{
.locals (int32 i)
ldloca i
}
ret
}
}

Output
6552336
6552340

The above program displays different addresses for the two local variables i. The output
proves that they are created consecutively in memory.

Whenever you are in doubt, display the value of the variables and clear up the
cobwebs in your mind. Thus, scope blocks are merely syntactic sugar and
are only used to increase readability and to help debug code written by others.

Internally, for a variable name, IL begins at the scope we are presently in, and
recursively tries to resolve the name of the variable. Thus, even though a
declaration hides the name of a variable, we can access it using the index. The
scope does not change the lifetime of a variable. All the variables in a method are
created when we first enter the method, and die when we exit from it. A variable
is always accessible by its zero-based index, which is allocated on a "first come, first
served" basis.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i8 4
call vararg void zzz::abc(...,int32)
ret
}
.method public static vararg void abc()
{
.locals init (value class System.ArgIterator it,int32 x)
ldloca it
initobj value class System.ArgIterator
ldloca it
arglist
call instance void System.ArgIterator::.ctor(value class
System.RuntimeArgumentHandle)
ldloca x
ldloca it
call instance typedref System.ArgIterator::GetNextArg()
call class System.Object System.TypedReference::ToObject(typedref)
castclass System.Int32
unbox int32
cpobj int32
ldloc x
ret
}
}

Output
4

The above program demonstrates how a function accepts a variable number of
parameters.

Vararg is a calling convention that allows the passing of a variable number of parameters to a
function. We have created a variable called it, of type System.ArgIterator. We
have then loaded its address on the stack using ldloca and then called arglist. This
instruction returns an opaque handle, i.e. an unmanaged pointer, which represents
all the arguments passed to the method. This handle can be passed to other
methods but is valid only during the lifetime of the current method. This opaque
handle is of the type RuntimeArgumentHandle.

The arglist instruction is valid on methods that take a variable number of
arguments. The constructor of the value class ArgIterator is called with this handle
as a parameter.

Once the value class is instantiated, we place the address of a local variable x on
the stack. This is where we store the parameter passed to our function. Subsequently,
the address of the variable it is put on the stack too. The function GetNextArg from the class
ArgIterator is called, which places a typedref on the stack, which is then passed to the
function ToObject.

Then, the object is cast to an int32 and unboxed, as we need a value type. This
value is copied to the variable x. Vararg is a calling convention, and thus part
of the signature of the method. We specify it as part of the call instruction.
The ellipsis denotes the end of the fixed parameters and the beginning of the variable
number of parameters. This is because a function may want to have a certain fixed
number of parameters also.

The other functions of the class ArgIterator can also give us useful information,
such as the number of items on the stack.

We use method parameters to enable a method to accept data from the caller.
Method parameters are checked for type safety. They make it mandatory for a
method to be called with the correct parameters. The Execution Engine enforces the
contract between the caller and the called methods.

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals ( int32 i)
ldc.i4 4
stloc.0
ldloc.0
ldloc.0
ret
}
.method public instance void abc(int32 )
{
ldarg.1
ret
}
}

Output
4
4

We are not compelled to assign any name to the parameters. In the above program,
we have a local as well as a parameter of type int32 which has no name or id. IL
does not seem to care at all. However, the unnamed variables can be referenced
only as an index. Parameters can also have attributes, as we shall now see, but
these attributes have nothing to do with the signature.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4 2
ret
}
.method public instance void abc([opt] int32 i )
{
ldarg.1
ret
}
}

Output
2

The first attribute of a parameter is opt, which marks it as optional. This means that
it is not compulsory to pass a parameter to our function.

a.il
.assembly mukhi {}
{
{
.entrypoint
ret
}
.method public instance void abc([opt] int32 i )
{
ret
}
}

Output
Exception occurred: System.MissingMethodException: Void .zzz.abc()
at zzz.vijay()

Always read the fine print. The opt attribute may indicate that the parameter is
optional, but it is used for documentation purposes only. A compiler may place
the opt attribute on a parameter so that other tools can make sense of it. As far as the
runtime is concerned, however, all the parameters are mandatory, and it simply
ignores the opt attribute. Thus, opt has no significance for the runtime.

Implementation attributes provide a lot of information about the nature of the
method to the runtime. These attributes decide whether the method requires
special handling at runtime or not.

The Synchronized Attribute

a.il
.assembly mukhi {}
{
.method public hidebysig instance void abc() synchronized
{
.locals (int32 V_0)
ldc.i4.0
stloc.0
br.s IL_0018
IL_0004: ldloc.0
ldc.i4 0x3e8
call void [mscorlib]System.Threading.Thread::Sleep(int32)
ldloc.0
ldc.i4.1
add
stloc.0
IL_0018: ldloc.0
ldc.i4.3
ble.s IL_0004
ret
}
}
{
{
.entrypoint
.locals (class yyy V_0,class [mscorlib]System.Threading.Thread V_1,class
[mscorlib]System.Threading.Thread V_2)
stloc.0
ldloc.0
ldftn instance void yyy::abc()
newobj instance void [mscorlib]System.Threading.ThreadStart::.ctor(class
System.Object,int32)
newobj instance void [mscorlib]System.Threading.Thread::.ctor(class
[mscorlib]System.Threading.ThreadStart)
stloc.1
ldloc.0
ldftn instance void yyy::abc()
newobj instance void [mscorlib]System.Threading.ThreadStart::.ctor(class
System.Object,int32)
newobj instance void [mscorlib]System.Threading.Thread::.ctor(class
[mscorlib]System.Threading.ThreadStart)
stloc.2
ldloc.1
call instance void [mscorlib]System.Threading.Thread::Start()
ldloc.2
call instance void [mscorlib]System.Threading.Thread::Start()
ret
}
}

6utput with s-nchroni=ed
8
1
>
@

6utput without s-nchroni=ed
8
8
1
1
>
>
@
@

You should run the above program with and without the synchronized attribute to appreciate its significance.

The attribute il managed tells the runtime that the method contains IL code that
will run in the managed world. We have created two threads, V_1 and V_2. These
execute the same function abc from class yyy.

In the function abc, we display the numbers from 0 to 3, using a loop. After displaying a number, the Sleep function stalls the thread for 1000 milliseconds. Thus the first thread executes function abc, prints the value 0 and then sleeps. Now the second thread takes advantage of the fact that the first thread is sleeping, and it also displays 0 and falls asleep. This continues till we reach the value 3 and exit from the loop.

With the synchronized attribute, the second thread cannot enter the function until the first thread has left it. Thus, the second thread has no choice but to wait until the first thread finishes execution. Try implementing the above in C#.

What we are trying to say is that if C# does not expose a feature of IL, there is no way you can use it in any .cs program.
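
For this particular flag, the class library does offer a route from C#. The sketch below assumes the MethodImpl attribute from System.Runtime.CompilerServices with the option MethodImplOptions.Synchronized, which asks the compiler to set the synchronized flag on the method. The class SyncDemo and the 20 millisecond sleep are our own choices, made only to keep the run short.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;

public class SyncDemo
{
    private readonly List<int> log = new List<int>();

    // Emits the 'synchronized' implementation flag on Abc.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public void Abc()
    {
        for (int i = 0; i <= 3; i++)
        {
            // Record which thread is inside the method at this step.
            log.Add(Thread.CurrentThread.ManagedThreadId);
            Thread.Sleep(20);
        }
    }

    // Runs Abc on two threads over the same object and returns the log.
    public static List<int> Run()
    {
        SyncDemo a = new SyncDemo();
        Thread t1 = new Thread(a.Abc);
        Thread t2 = new Thread(a.Abc);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return a.log;
    }

    public static void Main()
    {
        foreach (int id in Run())
            Console.WriteLine(id);
    }
}
```

The log shows four entries from one thread followed by four from the other. Remove the attribute and the entries from the two threads may interleave, just as in the second output above.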

If a code implementation attribute is not given, the default value is il managed. The other three options are native, optil and runtime. These are mutually exclusive. The runtime attribute specifies that the implementation of the code will be supplied by the runtime, and not by the programmer. We cannot place any code in this type of method. It is used for constructors and delegates.

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() optil
{
.entrypoint
ret
}
}

On running the a.exe executable, three message boxes pop up with the following messages.

Unable to load OptJit Compiler (MSCOROJT.DLL). File may be missing or corrupt. Please check your setup or rerun setup.

Failure to compile a method to native code. Most likely it is a corrupt executable file.

Windows Protection Error

The program reported the above errors on the introduction of the new attribute optil. It clearly says that it could not find a particular DLL. The attribute optil means that the code is optimized IL code that runs faster.

We normally end all our attributes for a method with the qualifier managed or unmanaged. The default value is managed. This signifies who will manage the execution of the method.

Managed signifies that the CLR will manage it.
Unmanaged signifies that someone else will manage it.

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il unmanaged
{
.entrypoint
ret
}
}

Output
at zzz.vijay()

If we use the unmanaged attribute with pure IL code we get the above exception.

a.cs
using System;
using System.Runtime.InteropServices;
class zzz
{
[DllImport("user32.dll")]
public static extern int MessageBoxA(int h, string m, string c, int type);
public static void Main()
{
MessageBoxA(0,"Hell","Bye",0);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static pinvokeimpl("user32.dll" winapi)
int32 MessageBoxA(int32 h,class System.String m,class System.String c,int32 type) il managed
{
}
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.0
ldstr "Hell"
ldstr "Bye"
ldc.i4.0
call int32 zzz::MessageBoxA(int32,class System.String,class System.String,int32)
pop
ret
}
}

There are over a trillion lines of code already written in the programming language C under the Windows Operating System. This code resides in files called DLLs, or Dynamic Link Libraries. To ensure that this code is also available to programs written in IL, C# provides an attribute called DllImport.

To be technically accurate, code in a DLL has nothing to do with a programming language. Once we obtain a DLL, there is no way to detect which programming language it was originally written in. The C# compiler converts the function marked with the DllImport attribute into a method in IL. This implies that C# understands attributes and, depending upon the attribute, generates the relevant IL code. The method is called MessageBoxA and has the same parameters that we specified in C#. The added attribute is pinvokeimpl, which is first passed the name of the DLL that contains the function.

Then we have the calling convention, which settles three questions. The parameters are pushed on the stack before the function gets called. The order in which IL places parameters on the stack is "first written, first placed", i.e. from left to right. The winapi calling convention follows the reverse order, i.e. right to left.

Then, the name of the function is decorated with a number specifying the size of the parameters on the stack. Finally, who restores the stack: the caller or the callee?

The function MessageBoxA can be called in the same manner that any other static
function of IL gets called.

There are two primary ways of calling unmanaged methods :

One is using pinvokeimpl,
The other is using IJW (It Just Works).

In IJW, the runtime stays out of our way, and we have to write code for handling everything. We stick to pinvokeimpl, the one we can work with. The runtime will automatically move us from managed to unmanaged code, convert data types and handle all the issues of transition management. The attributes to be used are native and unmanaged, as that is what the documentation recommends. The C# compiler, however, is not familiar with the documentation.

Tail Calls

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4 2
ldc.i4 3
call int32 zzz::abc(int32, int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method static public int32 abc(int32 a, int32 r)
{
ldarg a
ldc.i4 0
bgt c
ldarg r
ret
c:
ldarg a
ldc.i4 1
sub
ldarg r
ldarg a
mul
tail.
call int32 zzz::abc(int32, int32)
ret
}
}

Output
6

The above example uses recursion to find the factorial of a number. It uses the prefix tail., which turns the call that follows it into a tail call. Functional programming languages like Lisp or Prolog use tail calls extensively. In a non-tail call, the current stack frame is kept intact, and a new frame is allocated. This means that the stack position changes. In a tail call, the stack frame is replaced with a frame for the function to be called.

When a call terminates with a ret, control returns to the caller function. In the case of tail calls, control continues to remain with the called method. Since a non-tail call needs to store information about who the caller is, it uses up memory on the stack, and may limit the amount of recursion that is possible. Thus, tail calls handle recursion more effectively than non-tail calls.

The above program works even without the tail. prefix.
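
The same function can be sketched in C#. This is only an illustration under our own names (TailDemo, Abc, AbcLoop); as far as we know, the C# compiler does not emit the tail. prefix for such code. The second method shows what reusing the stack frame amounts to, namely a plain loop.

```csharp
using System;

public static class TailDemo
{
    // Same shape as the IL method abc: when a > 0, the recursive call is
    // the last thing the method does, so it sits in tail position.
    public static int Abc(int a, int r)
    {
        if (a > 0)
            return Abc(a - 1, r * a);
        return r;
    }

    // What a tail call achieves in effect: the frame is reused,
    // which is the same as looping over the two parameters.
    public static int AbcLoop(int a, int r)
    {
        while (a > 0)
        {
            r = r * a;
            a = a - 1;
        }
        return r;
    }

    public static void Main()
    {
        Console.WriteLine(Abc(2, 3));      // same arguments as the IL program
        Console.WriteLine(AbcLoop(2, 3));
    }
}
```

Called with 2 and 3, both versions compute 3 * 2 * 1, the value printed by the IL program. Passing 1 as the second parameter yields the plain factorial.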

-9-

Properties and Indexers

A field is simply a memory location, whereas a property is a collection of methods. A property is represented by a value, in the same way as a field. Properties can be considered as smart fields.

It is not compulsory to store the value of a property in a field, but this is the accepted practice. The CLR supports the syntax of properties, but these properties do not exist at runtime.

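a.cs
public class zzz
{
public static void Main()
{
aa a = new aa();
int gg = a.ff + 9;
System.Console.WriteLine(gg);
}
}
public class aa
{
public int ff
{
get
{
System.Console.WriteLine("in get");
return 12;
}
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class aa V_0,int32 V_1)
newobj instance void aa::.ctor()
stloc.0
ldloc.0
call instance int32 aa::get_ff()
ldc.i4.s 9
add
stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
.class public auto ansi aa extends [mscorlib]System.Object
{
.method public hidebysig specialname instance int32 get_ff() il managed
{
.locals (int32 V_0)
ldstr "in get"
call void [mscorlib]System.Console::WriteLine(class System.String)
ldc.i4.s 12
stloc.0
br.s IL_000f
IL_000f: ldloc.0
ret
}
.property instance int32 ff()
{
.get instance int32 aa::get_ff()
}
}

Output
in get
21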

We have created a property called ff in the class aa. This property is written as a directive called .property in the IL file, with the modifier instance, as it is a non-static property, and with the return type int32.

There is an accessor called get, whose equivalent directive in IL is called .get. This get is represented by the function get_ff, which simply returns a value with the data type of the property.

In this case, the br instruction is superfluous. The local variable V_0 is used to store the return value that is to be placed on the stack.

The statement int gg = a.ff + 9; gets executed as follows:

The this pointer is placed on the stack.
Then, the expression a.ff gets replaced by a call to the function get_ff from the class aa.
Thereafter, the return value is placed on the stack.
The number 9 is placed on the stack, followed by the add instruction.
The property gets converted into a function beginning with get.

On executing the il assembler 'ilasm' on a.il, you will see the following output

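Output
Class 2 Methods: 1; Props: 1;

a.cs
public class zzz
{
public static void Main()
{
aa a = new aa();
a.ff = 19;
}
}
public class aa
{
public int ff
{
set
{
System.Console.WriteLine(value);
}
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class aa V_0)
newobj instance void aa::.ctor()
stloc.0
ldloc.0
ldc.i4.s 19
call instance void aa::set_ff(int32)
ret
}
}
.class public auto ansi aa extends [mscorlib]System.Object
{
.method public hidebysig specialname instance void set_ff(int32 'value') il managed
{
ldarg.1
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.property instance int32 ff()
{
.set instance void aa::set_ff(int32)
}
}

Output
19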

A property with a set obtains a function called set_ff, having a parameter called value. We also have a directive called .set.

ldarg.1 is used to place the first parameter of a function on the stack. The assignment a.ff = 19 gets converted into a call to this function. Thus, a property actually consists of two functions, get and set. They get called depending on whether we want to obtain the value of the property or change it, respectively.
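
We can ask the runtime to confirm this through reflection. The following is a minimal sketch with names of our own (PropertyDemo and the field backing); it looks up the two accessor functions by name and calls them directly, bypassing the property syntax altogether.

```csharp
using System;
using System.Reflection;

public class aa
{
    private int backing;
    public int ff
    {
        get { return backing; }
        set { backing = value; }
    }
}

public static class PropertyDemo
{
    // Name of the function behind the get accessor.
    public static string GetterName()
    {
        return typeof(aa).GetProperty("ff").GetGetMethod().Name;
    }

    // Name of the function behind the set accessor.
    public static string SetterName()
    {
        return typeof(aa).GetProperty("ff").GetSetMethod().Name;
    }

    // Calls set_ff and get_ff as ordinary methods.
    public static int RoundTrip(int v)
    {
        aa a = new aa();
        MethodInfo set = typeof(aa).GetMethod("set_ff");
        MethodInfo get = typeof(aa).GetMethod("get_ff");
        set.Invoke(a, new object[] { v });
        return (int)get.Invoke(a, null);
    }

    public static void Main()
    {
        Console.WriteLine(GetterName());
        Console.WriteLine(SetterName());
        Console.WriteLine(RoundTrip(19));
    }
}
```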

a.il
.assembly mukhi {}
{
{
.entrypoint
ret
}
}
{
{
}
}

Error checks in IL are sparse. We have a property called f, which does not have
either a get or a set directive. The C# compiler screams at this omission, but the IL
assembler turns a blind eye to this.
Hopefully, the next version of IL should have more reasonable error checks. Having
said this, henceforth, we will not comment on the lack of error checks. It will be
useful to remember in the IL world, you are on your own. The excess freedom given
by the IL assembler also means that you have to assume greater responsibility as a
programmer.

a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy();
a[1] = 17;
System.Console.WriteLine(a[1]);
}
}
public class yyy
{
public int this[int i]
{
set
{
System.Console.WriteLine("{0} {1} ",value ,i);
}
get
{
System.Console.WriteLine("{0}" , i);
return 23;
}
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.s 17
call instance void yyy::set_Item(int32,int32)
ldloc.0
ldc.i4.1
call instance int32 yyy::get_Item(int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig specialname instance void set_Item(int32 i,int32 'value') il managed
{
ldstr "{0} {1} "
ldarga.s 'value'
box [mscorlib]System.Int32
ldarga.s i
box [mscorlib]System.Int32
call void [mscorlib]System.Console::WriteLine(class System.String,class System.Object,class System.Object)
ret
}
.method public hidebysig specialname instance int32 get_Item(int32 i) il managed
{
.locals (int32 V_0)
ldstr "{0}"
ldarga.s i
box [mscorlib]System.Int32
call void [mscorlib]System.Console::WriteLine(class System.String,class System.Object)
ldc.i4.s 23
stloc.0
br.s IL_0016
IL_0016: ldloc.0
ret
}
.property instance int32 Item(int32)
{
.get instance int32 yyy::get_Item(int32)
.set instance void yyy::set_Item(int32,int32)
}
}

Output
17 1
1
23

An indexer is a property. It has no equivalent directive in IL. An indexer is simply a property with an extra parameter, and no other complications. When we initialize a[1] using the statement a[1] = 17, we are actually placing three parameters on the stack:

The this pointer
The array index 1
The value 17.

Then, we call set_Item, as it is an indexer and not a property. The two parameters to
the function are i and value. If you remember, the indexer variable has been named
i.

The function get_Item gets called with the single parameter i and returns a value. The first parameter to the WriteLine function is a string and the rest of the parameters are objects. We need to convert our int value types into objects. Thus we need to box them.

Using the function set_Item, we are displaying the value and the index.
Using the function get_Item, we are displaying only the index.
Using the last WriteLine function, we are displaying the value of a[1], which is 23.

Thus, indexers are an alias for a property with an extra parameter.
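
Reflection tells the same story. In this sketch (IndexerDemo and Describe are our own names), the indexer shows up as an ordinary property called Item that happens to carry one index parameter, and whose set function takes the index plus the value.

```csharp
using System;
using System.Reflection;

public class yyy
{
    public int this[int i]
    {
        get { return 23; }
        set { }
    }
}

public static class IndexerDemo
{
    // Summarises how the indexer is stored in the metadata.
    public static string Describe()
    {
        PropertyInfo p = typeof(yyy).GetProperty("Item");
        int indexCount = p.GetIndexParameters().Length;
        int setterParams = p.GetSetMethod().GetParameters().Length;
        return p.Name + " " + p.GetGetMethod().Name + " " + p.GetSetMethod().Name
            + " " + indexCount + " " + setterParams;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```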

The properties directive is used only by compilers and other tools, to understand as
to what methods are being associated with the property. If you are not convinced,
you can delete the property directive from the above programs and run them. There
will be no change at all in the way they execute.

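a.cs
public class zzz {
public static void Main() {
yyy a = new xxx();
a.ff = 19;
}
}
public class yyy
{
public virtual int ff
{
set
{
System.Console.WriteLine("yyy");
}
}
}
public class xxx : yyy
{
public override int ff
{
set
{
System.Console.WriteLine("xxx");
}
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void xxx::.ctor()
stloc.0
ldloc.0
ldc.i4.s 19
callvirt instance void yyy::set_ff(int32)
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig newslot specialname virtual instance void set_ff(int32 'value') il managed
{
ldstr "yyy"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.property instance int32 ff()
{
.set instance void yyy::set_ff(int32)
}
}
.class public auto ansi xxx extends yyy
{
.method public hidebysig specialname virtual instance void set_ff(int32 'value') il managed
{
ldstr "xxx"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void .ctor() il managed
{
ldarg.0
call instance void yyy::.ctor()
ret
}
.property instance int32 ff()
{
.set instance void xxx::set_ff(int32)
}
}

Output
xxx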

The above example demonstrates the use of virtual properties. The concept of a property is simply an illusion. As mentioned earlier, properties are converted into a series of functions. Thus what applies to virtual functions also applies to virtual properties. We cannot use the modifier virtual in the property directive.

The rationale behind using a property over a field is:

If the value of a field changes, no code gets called. The class is thus unaware of the change. In the case of a property, a method gets called. This method can contain a large amount of code. This code can do anything.

Also, we can be very sure that the user does not change the value of the property beyond certain acceptable limits. A method call can be optimised, and hence, a property does not carry any significant overhead as compared to a direct access to a field. The only disadvantage is that properties cannot be made global.
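
As a small illustration of the "acceptable limits" argument, consider the following sketch. The class Account, its Balance property and the limit of 1000 are hypothetical and exist only for this example.

```csharp
using System;

public class Account
{
    private int balance;

    public int Balance
    {
        get { return balance; }
        set
        {
            // The set function runs on every assignment, so it can
            // refuse values outside the range the class accepts.
            if (value < 0 || value > 1000)
                throw new ArgumentOutOfRangeException("value");
            balance = value;
        }
    }
}

public static class AccountDemo
{
    // Returns true when the assignment was rejected by the property.
    public static bool Rejects(int v)
    {
        Account a = new Account();
        try
        {
            a.Balance = v;
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Rejects(500));
        Console.WriteLine(Rejects(-1));
    }
}
```

Had Balance been a public field, the assignment of -1 would have gone through without the class ever knowing about it.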

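a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
ldloc.0
ldc.i4.s 19
call instance void yyy::set_ff(int32)
ret
}
}
.class public auto ansi yyy extends [mscorlib]System.Object
{
.field int32 jj
.method public hidebysig specialname instance void set_ff(int32 'value') il managed
{
ldstr "yyy"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.property instance int32 ff()
{
.backing int32 jj
}
}

Output
yyy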

A property normally has a field that stores the value associated with it. Since the property directive is used for documentation purposes, it would not be a bad idea to have a directive called .backing, which can be used to state the name of this field. We are not forced to use it. The assembler only checks to make sure that the field is present. It is not used in any other way. It must have the same data type as the property. Using the attribute specialname, we can inform the compiler to give it special treatment.

a.il
.assembly mukhi {}
{
{
.entrypoint
ret
}
}
{
{
ret
}
{
.other void abc()
}
}

We finally have one last directive called .other, that specifies the other functions that are associated with the property. In this case, the assembler does not check for the existence of the function, and thus, we have not included it.

To summarise, the properties directive is implemented as a series of method calls.
The same is true for indexers also.
-10-

Exception Handling

Exception handling in IL is a big let down. We expected a significant amount of complexity, but were proved wrong, right from the beginning. IL cannot be termed as a machine level assembler. It actually has a number of directives like try and catch, that work like their higher level counterparts.

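a.cs
class zzz
{
public static void Main()
{
try
{
abc();
System.Console.WriteLine("Bye");
}
catch (System.Exception e)
{
System.Console.WriteLine("In Exception");
}
System.Console.WriteLine("After Exception");
}
public static void abc()
{
throw new System.Exception();
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
After Exception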

In the above example, the function abc first creates an Exception object using newobj and places it on the stack. Thereafter, the throw instruction throws the exception. The runtime places this Exception object on the stack and transfers control to the catch handler, where stloc.0 stores it in e, a local variable of type Exception. The next instruction, leave.s, jumps to the label IL_001e, which lies beyond the catch.

To exit from a try or a catch block, leave is used instead of the branch instruction br. The reason is that we are dealing with exceptions, which are handled in a special way in IL. Exception handling in IL is done using higher level instructions.

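a.cs
class zzz
{
public static void Main()
{
yyy a;
a=new yyy();
try
{
a.abc();
System.Console.WriteLine("Bye");
}
catch
{
System.Console.WriteLine("In Exception");
}
finally
{
System.Console.WriteLine("In finally");
}
}
}
class yyy
{
public void abc()
{
throw new System.Exception();
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
.try
{
.try
{
ldloc.0
call instance void yyy::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_0021
}
catch [mscorlib]System.Object
{
pop
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_0021
}
IL_0021: leave.s IL_0032
}
finally
{
ldstr "In finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_0032: ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig instance void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
In finally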

The above program uses a catch without a parameter and a finally clause. Adding a finally clause associates the same try with both a catch and a finally. In a sense, two copies of try are created, one for the catch and the other for the finally.

If the catch directive is not supplied with an Exception type, it catches System.Object. In the catch, the item is popped off the stack, as its value holds no significance. The string is printed before the leave. Along with the try-catch comes the finally clause, which is the next to be executed. A finally is emitted as a separate try-finally directive, and it can only be exited using the endfinally instruction.
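
The "two copies of try" can be written out by hand in C#. The sketch below uses our own names (FinallyDemo, Combined, Nested) and records what runs in a list instead of printing it; the second method spells out the nesting that the compiler generates for the first.

```csharp
using System;
using System.Collections.Generic;

public static class FinallyDemo
{
    // One try with both a catch and a finally.
    public static List<string> Combined()
    {
        List<string> log = new List<string>();
        try { throw new Exception(); }
        catch { log.Add("In Exception"); }
        finally { log.Add("In finally"); }
        return log;
    }

    // The same thing as two trys: a try-catch nested inside a try-finally.
    public static List<string> Nested()
    {
        List<string> log = new List<string>();
        try
        {
            try { throw new Exception(); }
            catch { log.Add("In Exception"); }
        }
        finally { log.Add("In finally"); }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Combined().ToArray()));
        Console.WriteLine(string.Join(", ", Nested().ToArray()));
    }
}
```

Both methods log the same two entries in the same order.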

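a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.try
{
leave.s IL_0032
}
finally
{
ldstr "In finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_0032: ret
}
}

Output
In finally

Nowhere is it specified that a try must have a catch. A finally will ultimately be called at the end of the try.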

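a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
Exception occurred: System.Exception: An exception of type System.Exception
was thrown.
at zzz.vijay()

In the absence of a try catch block, if function abc throws an exception, it will not get caught. Instead, a runtime error is generated. A try catch clause is recommended to proactively catch the exception; otherwise, when an exception is thrown, the program will come to a grinding halt.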

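a.cs
public class zzz
{
public static void Main()
{
int i = 1;
for ( i = 1; i<= 10 ; i++)
{
try
{
System.Console.WriteLine("1 try");
try
{
System.Console.WriteLine("2 try");
break;
}
finally
{
System.Console.WriteLine("2 finally");
}
}
finally
{
System.Console.WriteLine("1 finally");
}
}
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldc.i4.1
stloc.0
ldc.i4.1
stloc.0
br.s IL_0032
.try
{
IL_0006: ldstr "1 try"
call void [mscorlib]System.Console::WriteLine(class System.String)
.try
{
ldstr "2 try"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_0037
}
finally
{
ldstr "2 finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
}
finally
{
ldstr "1 finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_0032: ldloc.0
ldc.i4.s 10
ble.s IL_0006
IL_0037: ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
1 try
2 try
2 finally
1 finally
1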

The above program is quite lengthy, but very simple. It proves the fact that code placed in a finally block is always executed. Like death and taxes, a finally cannot be avoided.

The for statement branches to label IL_0032, where we first check whether the value is less than or equal to 10. If the check results in TRUE, the code at label IL_0006 is executed. As we learnt in one of the earlier chapters, the condition check for the for statement is always placed at the bottom in IL.

In the first iteration, the string "1 try" is printed. Thereafter, the code within the second try is executed, where "2 try" is printed. The break statement in C# gets converted to a leave to label IL_0037 in IL. This label signifies the end of the for statement. The leave instruction is smart enough to realize that it is located within two trys, each with a finally clause, hence it executes the code of both finally handlers.

Under normal circumstances, break becomes a branch instruction if not placed
within try-catch.
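
The order in which the two finally handlers run can be checked with a short C# sketch. BreakDemo and Run are our own names, and the strings are collected in a list rather than printed, so that the sequence can be inspected.

```csharp
using System;
using System.Collections.Generic;

public static class BreakDemo
{
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        int i;
        for (i = 1; i <= 10; i++)
        {
            try
            {
                log.Add("1 try");
                try
                {
                    log.Add("2 try");
                    break;   // becomes a leave that crosses both trys
                }
                finally
                {
                    log.Add("2 finally");
                }
            }
            finally
            {
                log.Add("1 finally");
            }
        }
        // The break left the loop before i could be incremented.
        log.Add(i.ToString());
        return log;
    }

    public static void Main()
    {
        foreach (string s in Run())
            Console.WriteLine(s);
    }
}
```

The inner finally runs first, then the outer one, and only then does control reach the statement after the loop.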

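a.cs
public class zzz
{
public void abc()
{
lock(this)
{
}
}
public static void Main()
{
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method public hidebysig instance void abc() il managed
{
.locals (class zzz V_0)
ldarg.0
dup
stloc.0
call void [mscorlib]System.Threading.Monitor::Enter(class System.Object)
.try
{
leave.s IL_0011
}
finally
{
ldloc.0
call void [mscorlib]System.Threading.Monitor::Exit(class System.Object)
endfinally
}
IL_0011: ret
}
}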

The lock keyword ensures that while one thread executes a function, the other
threads remain suspended. This keyword gets translated into a large amount of IL
code. In fact, it generates the maximum amount of code amongst all the keywords.

The C# compiler first calls the static function Enter from the Monitor class. Then, it executes the code located in a try. The try block here contains no code. On encountering the leave instruction, the program enters the finally, which calls the Exit function from the Monitor class. This releases another thread that may be waiting at the Enter function.
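
The expansion can be written by hand in C#. This is a sketch under our own names (LockDemo, WithLock, WithMonitor); the second method mirrors the IL above, including the temporary that holds the locked object.

```csharp
using System;
using System.Threading;

public class LockDemo
{
    private int count;

    // The keyword form.
    public void WithLock()
    {
        lock (this)
        {
            count++;
        }
    }

    // The hand-written form: Enter, a try, and Exit in the finally.
    public void WithMonitor()
    {
        object temp = this;
        Monitor.Enter(temp);
        try
        {
            count++;
        }
        finally
        {
            Monitor.Exit(temp);
        }
    }

    public int Count
    {
        get { return count; }
    }

    public static void Main()
    {
        LockDemo d = new LockDemo();
        d.WithLock();
        d.WithMonitor();
        Console.WriteLine(d.Count);
    }
}
```

Because Exit sits in a finally, the lock is released even if the protected code throws an exception.
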

Whenever an exception occurs, an object representing the exception must be created. This exception object has to be a class derived from Exception and cannot be a value type or a pointer.

A Structured Exception Handling (SEH) block is made up of a try and one or more
handlers. A try directive is used to declare a protected block.

a.il
.assembly mukhi {}
{
{
.entrypoint
.try
{
ret
}
{
stloc.0
leave.s IL_001e
}
ret
}
}

We cannot exit from a protected block with a ret but, on using it, no errors are generated at assemble time or run time. As a rule, only a leave or a throw of another exception is acceptable to exit from a protected block. A leave is permitted in the try as well as in the catch.

a.il
.assembly mukhi {}
{
.entrypoint
.try
{
ldstr "Bye"
.try
{
ldstr "Bye1"
leave aa1
}
{
stloc.0
ldstr "In Exception1"
leave.s IL_001e
}
aa1: leave.s IL_001e
}
{
stloc.0
leave.s IL_001e
}
ret
}
{
ret
}
{
throw
}
}

Output
Bye
In Exception1
After Exception

We can nest as many trys or protected blocks within each other as we like. A leave is required at the end of every try to avoid all errors.

We can have four types of handlers for a try or a protected block. They are:

finally
catch
fault
filter

Only one of them can be used at a time.

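a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
finally
{
ldstr "In finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
After Exception

If we comment out the code where the function abc has been called, we get the following output:

Output
Bye
In finally
After Exception

As mentioned earlier, the protected block can only be handled by a single handler.

In the above example, when an exception is thrown, the catch is called and not the finally, as the finally does not have its own try directive. The runtime does not give us any error, but it ignores the finally handler.

If, however, no exception is thrown, as is the case when we comment out the call of the function abc, then the finally gets called.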

It is quaint that the try is a directive, but the handler is not. We have two classes of
handlers:

exception resolving handlers
exception observing handlers.

The finally and fault handlers are exception observing handlers, as they do not resolve the exception.

In an exception resolving handler, we try and resolve the exception so that normal program control continues. The catch and filter handlers are examples of such handlers.

The deepest handler will be visited first, followed by the next enclosing one and so on, until we find an appropriate handler. A handler has its own instructions, using which the program can exit the handler. It is illegal to use any other instruction for this purpose, but at times the assembler is unable to detect this misfit.

a.il
.assembly mukhi {}
{
{
.entrypoint
.try
{
}
{
br a1
}
ret
}
{
a1:
ret
}
}

Error
***** FAILURE *****

We are not allowed to jump out of a catch handler. It is essential to leave the handler in an orderly manner only.

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
startTry:
ldstr "in try"
//call instance void zzz::abc()
ldstr "after function call"
leave exitSEH
endTry:
startFault:
ldstr "in fault"
endfault
endFault:
.try startTry to endTry fault handler startFault to endFault
exitSEH:
ldstr "over"
ret
}
{
throw
}
}

Output
in try
after function call
over

If we remove the comment from the call of the function abc, we get an error, a Windows error, which is incomprehensible. The purpose of the above program is to demonstrate that we can use labels to delimit code in a protected block. Thereafter, we can use the try directive, indicating to it the start and end labels of our code and also the start and end of the fault handler. This is another way of utilising the try directive.

The catch keyword creates a type filtered exception handler. Whenever an exception occurs in a protected block, the execution engine (EE) checks whether the exception that occurred is equal to, or a sub-type of, the type that the catch expects. If the type matches, the code of the catch is called. If the type does not match, the EE will continue to search for another handler.

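a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
a1:
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s a1
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
Bye
After Exception

The program is not allowed to resume execution after an exception occurs. This means that we cannot go back to the protected block where the exception took place. In this case, we are allowed to do so, but keep in mind that we were using a beta copy of the assembler.

Whenever the EE sees a leave in a catch block, it knows that the exception is done with, and the system returns to a normal state.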

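a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
rethrow
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
Exception occurred: System.Exception: An exception of type System.Exception
was thrown.
at zzz.vijay()

Here, at the end of the catch is the rethrow instruction, which throws the same exception again. As there is no other catch block to catch the exception, the exception surfaces at run time.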

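a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
rethrow
}
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception1"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
In Exception1
After Exception

Here, we placed another try directive with a separate catch. The exception that is rethrown by the inner catch cannot be caught by another catch at the same level. It needs to be caught by a catch at the higher level. Thus one more catch is needed.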
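
In C#, the rethrow instruction corresponds to a throw statement without an operand. The following sketch (RethrowDemo and Run are our own names) reproduces the program above and collects the three strings in a list.

```csharp
using System;
using System.Collections.Generic;

public static class RethrowDemo
{
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        try
        {
            try
            {
                throw new Exception();
            }
            catch (Exception)
            {
                log.Add("In Exception");
                throw;   // a bare throw is compiled to rethrow
            }
        }
        catch (Exception)
        {
            // Only a catch on the enclosing try sees the rethrown exception.
            log.Add("In Exception1");
        }
        log.Add("After Exception");
        return log;
    }

    public static void Main()
    {
        foreach (string s in Run())
            Console.WriteLine(s);
    }
}
```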

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.IO.IOException
{
stloc.0
leave.s IL_001e
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.IO.IOException::.ctor()
throw
}
}

Output
In Exception
After Exception

The C# compiler watches our code like a hawk. On the other hand, the assembler is a blind bat.

There are two exception handlers:

IOException: This handles Input/Output Exceptions.
Exception: This is a generic handler that handles generic exceptions, since all exceptions are derived from Exception.

Consciously, we have placed the generic Exception handler first. Therefore, irrespective of the exception thrown, the second handler will never get called. The function abc now throws an IOException. The generic Exception handler, which is placed earlier in the code, is encountered first, and therefore, it deals with the exception. The C# compiler would have generated an error in this situation, but the assembler assumes that you are conscious of your deeds.
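
Since C# refuses to compile a generic handler placed ahead of a specific one, the sketch below shows the matching rule from the other side. The names HandlerOrderDemo, Classify and GenericOnly are our own. The handlers are tried in the order written, and the first one whose type matches the exception, or is a base type of it, wins.

```csharp
using System;
using System.IO;

public static class HandlerOrderDemo
{
    // Specific handler first, generic handler second.
    public static string Classify(Exception toThrow)
    {
        try
        {
            throw toThrow;
        }
        catch (IOException)
        {
            return "IOException handler";
        }
        catch (Exception)
        {
            return "Exception handler";
        }
    }

    // A lone generic handler also accepts an IOException,
    // because IOException derives from Exception.
    public static string GenericOnly(Exception toThrow)
    {
        try
        {
            throw toThrow;
        }
        catch (Exception e)
        {
            return "Exception handler caught " + e.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Classify(new IOException()));
        Console.WriteLine(Classify(new InvalidOperationException()));
        Console.WriteLine(GenericOnly(new IOException()));
    }
}
```

The last call is the situation in the IL program: the generic handler is reached first and takes the IOException, so a handler for IOException placed after it would never run.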

Output
at zzz.vijay()

The above exception is thrown when there is no leave in a catch and there is another catch following this one. In one of our earlier programs above, we have used only a single catch.

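a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public static void vijay()
{
.entrypoint
ldstr "start"
call void [mscorlib]System.Console::WriteLine(class System.String)
br start
a2:
ldstr "in a2"
pop
ldc.i4.0
endfilter

start:
.try
{
ldstr "in try"
call void [mscorlib]System.Console::WriteLine(class System.String)
call void zzz::abc()
leave a1
}
filter a2
{
ldstr "filter"
pop
leave a1
}
a1:
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
start
in try
Exception occurred: System.Exception: An exception of type System.Exception
was thrown.
at zzz.vijay()

The last type of handler, filter, which is the most generic, either does not seem to work or it could also mean that we have done something wrong.

We use the keyword filter with a label, which denotes the start point of some code. This code checks whether the exception must be handled or not. It signals its decision by placing either the number 0 or 1 on the stack. In our case, none of the code dealing with the filter gets called.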
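
For comparison, later versions of C# (6.0 onwards, well after the compiler used here) expose filters through the when clause. The sketch below assumes such a compiler; FilterDemo, Run and Check are our own names. The function Check plays the role of the code at label a2, and its boolean result plays the role of the 1 or 0 placed on the stack.

```csharp
using System;
using System.Collections.Generic;

public static class FilterDemo
{
    // The filter code: decides whether the handler should take the exception.
    private static bool Check(List<string> log, bool handle)
    {
        log.Add("in filter");
        return handle;
    }

    public static List<string> Run(bool handle)
    {
        List<string> log = new List<string>();
        try
        {
            try
            {
                log.Add("in try");
                throw new Exception();
            }
            catch (Exception) when (Check(log, handle))
            {
                log.Add("filter handler");
            }
        }
        catch (Exception)
        {
            // Reached only when the filter declined the exception.
            log.Add("outer catch");
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(true).ToArray()));
        Console.WriteLine(string.Join(", ", Run(false).ToArray()));
    }
}
```

When the filter answers false, the search simply continues with the next enclosing handler, which matches the behaviour described for catch earlier.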
-11-

Delegates and Events
Exception handling in IL is a big let down. We expected a significant amount of complexity, but were proved wrong, right from the beginning. IL cannot be termed as a machine level assembler. It actually has a number of directives like try and catch, that work like their higher level counterparts.

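a.cs
class zzz
{
public static void Main()
{
try
{
abc();
System.Console.WriteLine("Bye");
}
catch (System.Exception e)
{
System.Console.WriteLine("In Exception");
}
System.Console.WriteLine("After Exception");
}
public static void abc()
{
throw new System.Exception();
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
After Exception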

In the above example, the function abc first creates an Exception object using newobj and places it on the stack. Thereafter, the throw instruction throws the exception. The runtime places this Exception object on the stack and transfers control to the catch handler, where stloc.0 stores it in e, a local variable of type Exception. The next instruction, leave.s, jumps to the label IL_001e, which lies beyond the catch.

To exit from a try or a catch block, leave is used instead of the branch instruction br. The reason is that we are dealing with exceptions, which are handled in a special way in IL. Exception handling in IL is done using higher level instructions.

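a.cs
class zzz
{
public static void Main()
{
yyy a;
a=new yyy();
try
{
a.abc();
System.Console.WriteLine("Bye");
}
catch
{
System.Console.WriteLine("In Exception");
}
finally
{
System.Console.WriteLine("In finally");
}
}
}
class yyy
{
public void abc()
{
throw new System.Exception();
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void yyy::.ctor()
stloc.0
.try
{
.try
{
ldloc.0
call instance void yyy::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_0021
}
catch [mscorlib]System.Object
{
pop
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_0021
}
IL_0021: leave.s IL_0032
}
finally
{
ldstr "In finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_0032: ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig instance void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}

Output
In Exception
In finally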

The above program uses a catch without a parameter and a finally
clause. Adding a finally clause associates the same try with both a catch and a finally.
In a sense, two copies of try are created, one for the catch and the other for the finally.

If the catch directive is not supplied with an Exception type, it catches an object
of type System.Object. In the catch, the item is popped off the stack, as its
value holds no significance. The string is printed before the leave. Along with
the try-catch is the finally clause, which is the next to be executed. A finally is emitted
as a separate try-finally directive and it can only be exited using the endfinally
instruction.
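
The "two copies of try" can be written out by hand. The sketch below, in present-day C#, compares the combined form with an explicitly nested one; the class and method names are ours, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

class NestedFinallyDemo
{
    // try-catch-finally written as one statement.
    public static List<string> Combined()
    {
        var log = new List<string>();
        try
        {
            throw new Exception();
        }
        catch
        {
            log.Add("In Exception");
        }
        finally
        {
            log.Add("In finally");
        }
        return log;
    }

    // The same logic laid out the way the IL shows it:
    // an inner try-catch wrapped in an outer try-finally.
    public static List<string> Nested()
    {
        var log = new List<string>();
        try
        {
            try
            {
                throw new Exception();
            }
            catch
            {
                log.Add("In Exception");
            }
        }
        finally
        {
            log.Add("In finally");
        }
        return log;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Combined()));   // In Exception, In finally
        Console.WriteLine(string.Join(", ", Nested()));     // In Exception, In finally
    }
}
```

Both methods record the same two entries in the same order, which matches the nesting of the two .try directives in the IL listing.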

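a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.try
{
leave.s IL_0032
}
finally
{
ldstr "In finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_0032: ret
}
}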

Output
In finally

Nowhere is it specified that a try must have a catch. A finally will ultimately be
called at the end of the try.

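a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}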

Output
Exception occurred: System.Exception: An exception of type System.Exception
was thrown.
at zzz.vijay()

In the absence of a try catch block, if function abc throws an exception, it will not
get caught. Instead, a runtime error is generated. A try catch clause is
recommended to proactively catch the exception; otherwise, when an exception is
thrown, the program will come to a grinding halt.

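a.cs
public class zzz
{
public static void Main()
{
int i = 1;
for ( i = 1; i<= 10 ; i++)
{
try
{
System.Console.WriteLine("1 try");
try
{
System.Console.WriteLine("2 try");
break;
}
finally
{
System.Console.WriteLine("2 finally");
}
}
finally
{
System.Console.WriteLine("1 finally");
}
}
System.Console.WriteLine(i);
}
}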

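a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ldc.i4.1
stloc.0
ldc.i4.1
stloc.0
br.s IL_0032
.try
{
IL_0006: ldstr "1 try"
call void [mscorlib]System.Console::WriteLine(class System.String)
.try
{
ldstr "2 try"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_0037
}
finally
{
ldstr "2 finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
}
finally
{
ldstr "1 finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_0032: ldloc.0
ldc.i4.s 10
ble.s IL_0006
IL_0037: ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}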

Output
1 try
2 try
2 finally
1 finally
1

The above program is quite lengthy, but very simple. It demonstrates that code
placed in a finally block is always executed. Like death and taxes, a finally cannot
be avoided.

The for statement branches to label IL_0032, where we first check whether the value
is less than or equal to 10. If the result is TRUE, the code at label IL_0006 is
executed. As we learnt in one of the earlier chapters, the condition check for the for
statement is always placed at the bottom in IL.

In the first iteration, the string "1 try" is printed. Thereafter, the code within the second
try is executed, where "2 try" is printed. The break statement in C# gets converted
to a leave to label IL_0037 in IL. This label signifies the end of the for statement.
The leave instruction is smart enough to realize that it is located within two trys,
each with a finally clause, hence it runs the code of both finally handlers, the
innermost one first.

Under normal circumstances, break becomes a branch instruction if it is not placed
within a try-catch.
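
The order in which the two finally blocks run when a break leaves both trys can be checked with a small trace. This is a sketch only; the class name FinallyOrderDemo and the trace strings are our own.

```csharp
using System;
using System.Text;

class FinallyOrderDemo
{
    public static string Run()
    {
        var trace = new StringBuilder();
        for (int i = 1; i <= 10; i++)
        {
            try
            {
                try
                {
                    trace.Append("break;");
                    break;                      // becomes a leave, not a br
                }
                finally
                {
                    trace.Append("inner finally;");
                }
            }
            finally
            {
                trace.Append("outer finally;");
            }
        }
        trace.Append("after loop");
        return trace.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run());   // break;inner finally;outer finally;after loop
    }
}
```

The loop body runs only once, and both finally blocks run on the way out, innermost first, before control reaches the statement after the loop.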

a.cs
public class zzz
{
public void abc()
{
lock(this)
{
}
}
public static void Main()
{
}
}
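
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method public hidebysig instance void abc() il managed
{
.locals (class zzz V_0)
ldarg.0
dup
stloc.0
call void [mscorlib]System.Threading.Monitor::Enter(class System.Object)
.try
{
leave.s IL_0011
}
finally
{
ldloc.0
call void [mscorlib]System.Threading.Monitor::Exit(class System.Object)
endfinally
}
IL_0011: ret
}
}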

The lock keyword ensures that while one thread executes a function, the other
threads remain suspended. This keyword gets translated into a large amount of IL
code. In fact, it generates the maximum amount of code amongst all the keywords.

The C# compiler first calls the static function Enter from the Monitor class. Then, it
executes the code located in a try. The try block here contains no code. On
encountering the leave instruction, the program enters the finally, which calls the
Exit function from the Monitor class. This releases another thread that is waiting
at the Enter function.
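
The Enter/try/finally/Exit shape can also be written by hand in C#. The sketch below mirrors the IL shown here; the Counter class is hypothetical, and newer C# compilers emit a slightly different pattern that uses an overload of Monitor.Enter taking a ref bool.

```csharp
using System;
using System.Threading;

class Counter
{
    private int count;

    // Hand-written equivalent of: lock (this) { count++; }
    public void Increment()
    {
        Monitor.Enter(this);
        try
        {
            count++;
        }
        finally
        {
            Monitor.Exit(this);   // runs even if the body throws
        }
    }

    public int Count { get { return count; } }

    static void Main()
    {
        var c = new Counter();
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 1000; j++) c.Increment();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        Console.WriteLine(c.Count);   // 4000
    }
}
```

Placing Exit in the finally is what guarantees that a waiting thread is released even when the protected code fails.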

Whenever an exception occurs, an object representing the exception must be
created. This exception object has to be a class derived from Exception and cannot
be a value type or a pointer.

A Structured Exception Handling (SEH) block is made up of a try and one or more
handlers. A try directive is used to declare a protected block.

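a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
ret
}
catch [mscorlib]System.Exception
{
stloc.0
leave.s IL_001e
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}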

We are not supposed to exit from a protected block with a ret; yet, on using it here, no
errors were generated at assemble time or run time. As a rule, only a leave or a throw of
another exception is acceptable to exit from a protected block. A leave is
permitted in a try and in a catch, but not in a finally or a fault handler, which must
end with endfinally or endfault.

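a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
.try
{
call void zzz::abc()
ldstr "Bye1"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave aa1
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception1"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
aa1: leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
{
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}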

Output
Bye
In Exception1
After Exception

We can nest as many trys or protected blocks within each other as we like. A leave is
required at the end of every try to avoid errors.

We can have four types of handlers for a try or a protected block. They are:

finally
catch
fault
filter

Only one of them can be used at a time.

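a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
finally
{
ldstr "In finally"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfinally
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}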

Output
In Exception
After Exception

If we comment out the code where the function abc has been called, we get the
following output:

Output
Bye
In finally
After Exception

As mentioned earlier, the protected block can only be handled by a single handler.

In the above example, when an exception is thrown, the catch is called and not the
finally, as the finally does not have its own try directive. The runtime does not give us any
error, but it ignores the finally handler.

If, however, no exception is thrown, as is the case when we comment out the call of
the function abc, then the finally gets called.

It is quaint that the try is a directive, but the handler is not. We have two classes of
handlers:

exception resolving handlers
exception observing handlers.

The finally and fault handlers are exception observing handlers as they do not
resolve the exception.

In an exception resolving handler, we try and resolve the exception so that normal
program control continues. The catch and filter handlers are examples of such
handlers.

The deepest handler will be visited first, followed by the next enclosing one and so
on, until we find an appropriate handler. A handler has its own instructions, using
which the program can exit the handler. It is illegal to use any other instruction for
this purpose, but at times the assembler is unable to detect this misfit.

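a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.try
{
}
catch [mscorlib]System.Exception
{
br a1
}
ret
}
.method public hidebysig static void abc() il managed
{
a1:
ret
}
}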

Error
***** FAILURE *****

We are not allowed to jump out of a catch handler. It is essential to leave the handler
in an orderly manner only.

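a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0)
newobj instance void zzz::.ctor()
stloc.0
startTry:
ldstr "in try"
call void [mscorlib]System.Console::WriteLine(class System.String)
//call instance void zzz::abc()
ldstr "after function call"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave exitSEH
endTry:
startFault:
ldstr "in fault"
call void [mscorlib]System.Console::WriteLine(class System.String)
endfault
endFault:
.try startTry to endTry fault handler startFault to endFault
exitSEH:
ldstr "over"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig instance void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}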

Output
in try
after function call
over

If we remove the comment from the call of the function abc, we get an error, a Windows
error, which is incomprehensible. The purpose of the above program is to demonstrate
that we can use labels to delimit code in a protected block. Thereafter, we can use
the try directive, indicating to it the start and end labels of our code and also the
start and end of the fault handler. This is another way of utilising the try directive.

The catch keyword creates a type filtered exception handler. Whenever an exception occurs
in a protected block, the EE checks whether the type of the exception that occurred is
equal to, or a sub-type of, the type that the catch expects. If the type matches, the
code of the catch is called. If the type does not match, the EE will continue to
search for another handler.
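
Type-filtered matching is easy to observe from C#. In the sketch below, the class and method names are ours; FileNotFoundException is used only because it is a well-known sub-type of IOException.

```csharp
using System;
using System.IO;

class CatchOrderDemo
{
    public static string Handle(Exception toThrow)
    {
        try
        {
            throw toThrow;
        }
        catch (IOException)          // matches IOException and its sub-types
        {
            return "IOException handler";
        }
        catch (Exception)            // matches everything else
        {
            return "Exception handler";
        }
    }

    static void Main()
    {
        Console.WriteLine(Handle(new FileNotFoundException()));      // IOException handler
        Console.WriteLine(Handle(new InvalidOperationException()));  // Exception handler
    }
}
```

The handlers are examined in the order in which they are written, and the first one whose type is equal to, or a base of, the thrown type wins.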

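a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
a1:
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s a1
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}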

Output
In Exception
Bye
After Exception

The program is not supposed to resume execution inside the protected block after an
exception occurs. This means that we cannot go back to the protected block where the
exception took place. In this case, we are allowed to do so, but keep in mind that we
were using a beta copy of the assembler.

Whenever the EE sees a leave in a catch block, it knows that the exception is done
with, and the system returns to a normal state.

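a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
rethrow
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}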

Output
In Exception
Exception occurred: System.Exception: An exception of type System.Exception
was thrown.
at zzz.vijay()

Here, at the end of the catch is the rethrow instruction, which rethrows the same
exception again. As there is no other catch block to catch the exception, the
exception is thrown at run time.

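a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
rethrow
}
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception1"
call void [mscorlib]System.Console::WriteLine(class System.String)
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}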

Output
In Exception
In Exception1
After Exception

Here, we placed another try directive with a separate catch. The exception that is
rethrown by the inner catch cannot be caught by another catch at the same level. It
needs to be caught by a catch at the higher level. Thus one more catch is needed.
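
In C#, the counterpart of the rethrow instruction is a throw statement without an operand. The following sketch reproduces the inner and outer catch arrangement; the names RethrowDemo and Abc are ours.

```csharp
using System;

class RethrowDemo
{
    static void Abc()
    {
        throw new Exception("from abc");
    }

    public static string Run()
    {
        string trace = "";
        try
        {
            try
            {
                Abc();
            }
            catch (Exception)
            {
                trace += "inner;";
                throw;               // rethrows the exception being handled
            }
        }
        catch (Exception e)
        {
            trace += "outer:" + e.Message;
        }
        return trace;
    }

    static void Main()
    {
        Console.WriteLine(Run());    // inner;outer:from abc
    }
}
```

The outer handler receives the very same exception object, which is why its message is still the one set in Abc.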

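a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class [mscorlib]System.Exception V_0)
.try
{
call void zzz::abc()
ldstr "Bye"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.Exception
{
stloc.0
ldstr "In Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
leave.s IL_001e
}
catch [mscorlib]System.IOException
{
stloc.0
leave.s IL_001e
}
IL_001e: ldstr "After Exception"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.IOException::.ctor()
throw
}
}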

Output
In Exception
After Exception

The C# compiler watches our code like a hawk. On the other hand, the assembler is
a blind bat.

There are two exception handlers:

IOException: This handles Input/Output exceptions.
Exception: This is a generic handler that handles all exceptions, since all
exceptions are derived from Exception.

Consciously, we have placed the generic Exception handler first. Therefore,
irrespective of the exception thrown, the second handler will never get called. The
function abc now throws an IOException. The generic Exception handler, which is
placed earlier in the code, is encountered first, and therefore, it deals with the
exception. The C# compiler would have generated an error in this situation, but the
assembler assumes that you are conscious of your deeds.

Output
at zzz.vijay()

The above exception is thrown when there is no leave in a catch and there is
another catch following this one. In one of our earlier programs above, we have used
only a single catch.

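a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public static void vijay()
{
.entrypoint
ldstr "start"
call void [mscorlib]System.Console::WriteLine(class System.String)
br start
a2:
ldstr "in a2"
call void [mscorlib]System.Console::WriteLine(class System.String)
pop
ldc.i4.0
endfilter

start:
.try
{
ldstr "in try"
call void [mscorlib]System.Console::WriteLine(class System.String)
call void zzz::abc()
leave a1
}
filter a2
{
ldstr "filter"
call void [mscorlib]System.Console::WriteLine(class System.String)
pop
leave a1
}
a1:
ret
}
.method public hidebysig static void abc() il managed
{
newobj instance void [mscorlib]System.Exception::.ctor()
throw
}
}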

Output
start
in try
Exception occurred: System.Exception: An exception of type System.Exception
was thrown.
at zzz.vijay()

The last type of handler, the filter, which is the most generic, either does not seem to
work, or it could also mean that we have done something wrong.

We use the keyword filter with a label, which denotes the start point of some code.
This code checks whether the exception must be handled or not. The decision is
signalled by placing either the number 0 or 1 on the stack. In our case, none of the code
dealing with the filter gets called.
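
Later versions of C# (6 and above) expose filters directly through the when clause of a catch. The sketch below is not taken from the original text; it only illustrates the idea of a handler that is chosen by a true or false answer, and all the names in it are ours.

```csharp
using System;

class FilterDemo
{
    public static string Run(int code)
    {
        try
        {
            throw new InvalidOperationException("code " + code);
        }
        catch (InvalidOperationException e) when (code == 1)
        {
            // Reached only when the filter expression evaluated to true.
            return "filtered handler: " + e.Message;
        }
        catch (Exception)
        {
            return "general handler";
        }
    }

    static void Main()
    {
        Console.WriteLine(Run(1));   // filtered handler: code 1
        Console.WriteLine(Run(0));   // general handler
    }
}
```

When the expression after when is false, the handler is skipped and the search continues with the next handler, just as a filter that leaves 0 on the stack declines the exception.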

-12-

Arrays

An array is a contiguous block of memory that stores values of the same type.
These values are an indexed collection. The runtime has built in support to handle
arrays. Vector is another name for an array that has only one dimension and the
index count starts at zero. An array type can be any type derived from
System.Object. This includes everything under the sun, excluding pointers, which
are not allowed in this version of the CLR. Nobody knows about the next version. An
array is a subtype of System.Array and we are given plenty of leeway in working
with arrays. The newarr instruction is used only for single dimensional arrays.

a.cs
class zzz
{
public static void Main()
{
int[] a;
a= new int[3];
a[1]= 10;
System.Console.WriteLine(a[1]);
}
}

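a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[] V_0)
ldc.i4.3
newarr [mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.s 10
stelem.i4
ldloc.0
ldc.i4.1
ldelem.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}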

Output
10

IL recognises the array data type. Thus, in the locals directive, we see an array of
int32 called V_0. This is similar to the process of creating an array in C#, where we
first specify that we want an array variable. Then, to create the actual array, the
size of the array is mentioned. In IL, the size is placed on the stack. IL uses newarr,
similar to newobj, to create the array in memory. In C#, however, new is used for
an array as well as for any other reference type. The data type of the array to be
created is also passed to the newarr instruction. Like newobj, newarr also places the
reference of the array on the stack. Thereafter, V_0 is initialized with this reference,
which is pushed back on the stack using ldloc.0.

We will now explain the IL code generated for the statement a[1] = 10. To do so, the
array reference is pushed first, then the array index, in this case the value 1, followed
by the value the member is to be initialized to, i.e. 10. So, there are 3 items on our
stack: at the bottom, the array reference, then the array index and finally the new
value of the array member.

These parameters are required by the instruction stelem.i4 to initialize an array
member. To read the value of an array member, the array reference
is loaded on the stack, followed by the index of the member. The instruction ldelem.i4
does the reverse of stelem.i4. It retrieves the value of an array member. As mentioned
earlier, i4 stands for 4 bytes on the stack. Most instructions have such a data type at
the end of their name.

a.cs
class zzz
{
public static void Main(string[] a)
{
System.Console.WriteLine(a.Length);
}
}

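a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay(class System.String[] a) il managed
{
.entrypoint
ldarg.0
ldlen
conv.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}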

>a one two

Output
2

The array class has a member called Length. This Length member in C# gets
converted to the IL instruction ldlen, which requires an array object on the stack and
returns the length. Array handling is very powerful in .NET because IL has an
intrinsic ability to understand arrays. In IL, the array has been made a first class
member.

a.cs
class zzz
{
public static void Main()
{
int[] a;
a= new int[2];
a[0]= 12; a[1]= 10;
foreach (int i in a)
System.Console.WriteLine(i);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[] V_0,int32 V_1,int32[] V_2,int32 V_3,int32 V_4)
ldc.i4.2
newarr [mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.s 12
stelem.i4
ldloc.0
ldc.i4.1
ldc.i4.s 10
stelem.i4
ldloc.0
stloc.2
ldloc.2
ldlen
conv.i4
stloc.3
ldc.i4.0
stloc.s V_4
br.s IL_002d
IL_001c: ldloc.2
ldloc.s V_4
ldelem.i4
stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(int32)
ldloc.s V_4
ldc.i4.1
add
stloc.s V_4
IL_002d: ldloc.s V_4
ldloc.3
blt.s IL_001c
ret
}
}

Output
12
10

Here we have a small C# program that has been transformed into a large IL program.
To begin, we have created 5 locals instead of 1. Two of them, V_0 and V_2, are
arrays and the rest are mere ints. The two stelem.i4 instructions initialize the 2
array members as seen in the above programs.

Now let us understand how IL deals with a foreach statement. ldloc.0 loads the
reference of the array on the stack. The instruction stloc.2 makes local V_2 hold the
same array reference as V_0. Then the array reference V_2, which is the same as V_0,
is loaded on the stack. Finally, using the instruction ldlen, the length of the array is
determined.

The number 2 is now present on the stack. This represents the length of the array. It is
converted to occupy 4 bytes on the stack and is stored in local V_3, using the
instruction stloc.3. The number 0 is then placed on the stack using the ldc
instruction. stloc.s pops this value 0 into local V_4 and br.s branches to label
IL_002d, where the value of variable V_4, 0, is loaded. Then the value of local V_3,
which stores the length of the array, i.e. 2, is loaded on the stack.

Since 0 is less than 2, the code at label IL_001c is executed. This loads the array
reference on the stack, then loads local V_4, which is the index. Finally, ldelem.i4
fetches the value of member a[0].

Adding 1 to the local V_4 serves a dual purpose: one, to index the array for
ldelem.i4 and the other, to stop the loop whenever we cross the length of the array
stored in local V_3. This is how a foreach statement is converted, step by step, into
IL code.
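
The steps above can be written back into C# by hand. The following is a sketch under our own naming; the variables copy, length and index merely echo the roles of the locals V_2, V_3 and V_4.

```csharp
using System;

class ForeachExpansion
{
    public static int SumWithForeach(int[] a)
    {
        int sum = 0;
        foreach (int i in a)
            sum += i;
        return sum;
    }

    // Roughly what the IL does for a foreach over an array.
    public static int SumExpanded(int[] a)
    {
        int sum = 0;
        int[] copy = a;             // V_2: a second reference to the same array
        int length = copy.Length;   // V_3: ldlen followed by conv.i4
        int index = 0;              // V_4
        while (index < length)      // the blt.s at the bottom of the loop
        {
            int i = copy[index];    // V_1: filled by ldelem.i4
            sum += i;
            index = index + 1;
        }
        return sum;
    }

    static void Main()
    {
        int[] a = { 12, 10 };
        Console.WriteLine(SumWithForeach(a));   // 22
        Console.WriteLine(SumExpanded(a));      // 22
    }
}
```

Both methods visit the members in the same order, so a foreach over an array costs no more than an ordinary indexed loop.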

a.cs
public class zzz
{
public static void Main()
{
zzz z = new zzz();
z.abc("hi","bye");
}
void abc(params string [] b)
{
System.Console.WriteLine(b[0]);
}
}

a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0,class System.String[] V_1)
newobj instance void zzz::.ctor()
stloc.0
ldloc.0
ldc.i4.2
newarr [mscorlib]System.String
stloc.1
ldloc.1
ldc.i4.0
ldstr "hi"
stelem.ref
ldloc.1
ldc.i4.1
ldstr "bye"
stelem.ref
ldloc.1
call instance void zzz::abc(class System.String[])
ret
}
.method private hidebysig instance void abc(class System.String[] b) il managed
{
.param [1]
.custom instance void [mscorlib]System.ParamArrayAttribute::.ctor() = ( 01 00 00 00 )
ldarg.1
ldc.i4.0
ldelem.ref
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

Output
hi

A function with a params parameter accepts a variable number of parameters. How
does the compiler handle it?

As usual, we see object V_0, which is an instance of class zzz. Along with it is an
array of strings V_1, which we have not created. The number 2 is then placed on
the stack and newarr creates an array of size 2. As the two parameters, i.e. the
strings "hi" and "bye", are to be passed to the function, IL first creates an array of
size 2. This array reference is stored in V_1 and pushed back onto the stack.

Using ldc.i4.0, index 0 is pushed on the stack, followed by the string "hi". Thereafter,
the instruction stelem is suffixed with the type. Here, ref stands for an object reference.
Thus, the temp array V_1's first or zeroth member gets the value "hi" and the
same process is repeated for the second array member. Thus, for a params
parameter, all the parameters are converted into one array and the function
abc is called with this array on the stack.

In the function abc, the first change is that the function accepts an array with the
same name as in C#. The .param directive with the number 1 stands
for the first parameter in the function prototype. Here, 0 stands for the return value
and 1 stands for the first parameter, that is, our array. The .custom directive that
follows attaches the ParamArrayAttribute to this parameter in the metadata, which is
how the params modifier is recorded.

We will explore the custom directive in detail later. The rest of the IL code loads the
first member of the array on the stack using ldelem.ref. This is similar in concept
to stelem.ref. Thus, the compiler does a lot of hard work for implementing the
params modifier. To sum up, it converts all the individual parameters into one
array, and this array is placed on the stack. IL does not fully understand the
params modifier. Thus the params modifier has to be the last entry in the
parameter list. The ref suffix is used to denote a reference element.
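
That the compiler merely builds an array on our behalf can be confirmed from C# itself. In this sketch the class ParamsDemo and the method First are hypothetical names of our own.

```csharp
using System;

class ParamsDemo
{
    public static string First(params string[] b)
    {
        return b.Length + ":" + b[0];
    }

    static void Main()
    {
        // The compiler creates the array for us...
        Console.WriteLine(First("hi", "bye"));                    // 2:hi
        // ...which is the same as passing it explicitly.
        Console.WriteLine(First(new string[] { "hi", "bye" }));   // 2:hi
    }
}
```

Both calls arrive at the function as a single array of two strings, exactly as the newarr and stelem.ref sequence in the IL suggests.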

a.cs
class zzz
{
public static void Main()
{
zzz a = new zzz();
a.abc();
}
unsafe public void pqr( int *b)
{
System.Console.WriteLine(b[1]);
b[1] = 16;
}
unsafe public void abc()
{
int [] a = new int[2];
a[0] = 10; a[1] = 2;
fixed ( int *i = a) pqr(i);
System.Console.WriteLine(a[1]);
}
}

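a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0)
newobj instance void zzz::.ctor()
stloc.0
ldloc.0
call instance void zzz::abc()
ret
}
.method public hidebysig instance void abc() il managed
{
.locals (int32[] V_0,int32& pinned V_1)
ldc.i4.2
newarr [mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.s 10
stelem.i4
ldloc.0
ldc.i4.1
ldc.i4.2
stelem.i4
ldloc.0
ldc.i4.0
ldelema [mscorlib]System.Int32
stloc.1
ldarg.0
ldloc.1
conv.i
call instance void zzz::pqr(int32*)
ldc.i4.0
conv.u
stloc.1
ldloc.0
ldc.i4.1
ldelem.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method public hidebysig instance void pqr(int32* b) il managed
{
ldarg.1
ldc.i4.4
ldc.i4.1
mul
add
ldind.i4
call void [mscorlib]System.Console::WriteLine(int32)
ldarg.1
ldc.i4.4
ldc.i4.1
mul
add
ldc.i4.s 16
stind.i4
ret
}
}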

Output
2
16

Here, we will explain certain features of pointer handling in C# and IL. In the C#
program we have created an array of size 2 in the function abc and the array
members are initialised. The keyword fixed fixes the array in memory. For
the purpose of efficiency, the garbage collector can move things around in memory.
By fixing the array in memory, we prevent the garbage collector from
moving it.

The address of the array is stored in a pointer to an int and the function pqr is called.
This function displays the value of the second member of the array and then changes
it. The change is reflected in the original array also. In the locals, we define our int
array as usual, but we have another variable V_1, which is also a pointer, but with a
& and not a *. This pointer is also pinned, which means that the runtime will not move
what it points to. If it were moved in memory, we could not keep track of its memory
location. Thus, a fixed becomes a pinned location.

Using ldelema, the address of member 0 of the array is pushed on the stack. V_1 is
initialized to this value and function pqr is called. In the function pqr, a [] is converted
into a memory location. Thus, the address held in b is loaded on the stack. Then, the
numbers 4 and 1 are placed on the stack because the size of an int is 4 and the array
index is 1. After multiplying them, the product, 4, is added to the address to get the
location of the member. ldind.i4 then reads the value at that location so that it can be
displayed. The same logic is applied, with stind.i4, to
change its value. Whether b[1] or *(b+1) is used, the IL code remains the
same.

a.cs
public class zzz
{
public static void Main()
{
string [] s = new string[3];
object [] t = s;
t[0] = null;
t[1] = "hi";
t[2] = new yyy();
}
}
class yyy
{
}

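a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String[] V_0,class System.Object[] V_1)
ldc.i4.3
newarr [mscorlib]System.String
stloc.0
ldloc.0
stloc.1
ldloc.1
ldc.i4.0
ldnull
stelem.ref
ldloc.1
ldc.i4.1
ldstr "hi"
stelem.ref
ldloc.1
ldc.i4.2
newobj instance void yyy::.ctor()
stelem.ref
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
}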

Output
Exception occurred: System.ArrayTypeMismatchException: An exception of
type System.ArrayTypeMismatchException was thrown.
at zzz.vijay()

The array s is an array of three strings. We have declared an array of objects but
initialised it to an array of strings, which is perfectly legal in C#. We then initialised
the members of t to a null, a string and a yyy object respectively. The runtime
knows that even though t is an array of objects, it was initialized to an array of
strings. Its members can only be strings or null.

The IL code is very straightforward. It uses newarr to create an array of strings.
Then it uses stloc.1 to initialize V_1 or array t. Thereafter, stelem.ref is used to
initialize the individual array members. However, the last stelem.ref checks the data
type of the value at run time, finds that a yyy is not a string and flags this as an
exception. The code used for throwing
the exception is not present in the array class at all. It is in stelem.ref and we are
not privy to this code.
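
The run time check made by stelem.ref can be observed without crashing the program. The sketch below uses our own names (CovarianceDemo, Store) and simply reports what happened to each store.

```csharp
using System;

class CovarianceDemo
{
    public static string Store(object value)
    {
        object[] t = new string[3];   // legal: an array of strings seen as an array of objects
        try
        {
            t[0] = value;             // stelem.ref checks the real element type here
            return "stored";
        }
        catch (ArrayTypeMismatchException)
        {
            return "ArrayTypeMismatchException";
        }
    }

    static void Main()
    {
        Console.WriteLine(Store("hi"));           // stored
        Console.WriteLine(Store(null));           // stored
        Console.WriteLine(Store(new object()));   // ArrayTypeMismatchException
    }
}
```

A string and a null are accepted, whereas any other object is rejected at the moment of the store and not at the moment the array is assigned to t.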

a.cs
public class zzz
{
public static void Main()
{
string [] s = new string[3];
object [] t = s;
t[0] = (string)new yyy();
System.Console.WriteLine(t[0]);
t[1] = new yyy();
System.Console.WriteLine(t[1]);
}
}
class yyy
{
public static implicit operator string ( yyy a)
{
return "hi";
}
}

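a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String[] V_0,class System.Object[] V_1)
ldc.i4.3
newarr [mscorlib]System.String
stloc.0
ldloc.0
stloc.1
ldloc.1
ldc.i4.0
newobj instance void yyy::.ctor()
call class System.String yyy::op_Implicit(class yyy)
stelem.ref
ldloc.1
ldc.i4.0
ldelem.ref
call void [mscorlib]System.Console::WriteLine(class System.Object)
ldloc.1
ldc.i4.1
newobj instance void yyy::.ctor()
stelem.ref
ldloc.1
ldc.i4.1
ldelem.ref
call void [mscorlib]System.Console::WriteLine(class System.Object)
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig specialname static class System.String op_Implicit(class yyy
a) il managed
{
.locals (class System.String V_0)
ldstr "hi"
stloc.0
ldloc.0
ret
}
}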

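Output
hi
Exception occurred: System.ArrayTypeMismatchException: An exception of
type System.ArrayTypeMismatchException was thrown.
at zzz.vijay()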

Had the compiler been a little more concerned about exceptions, it would have
prevented the above program from throwing one at run time, by spotting the error at
compile time itself. We have the same situation as before. The array t is an array of
objects, but initialized to an array of strings. The member t[0] is initialized to a yyy
object, but now with a cast. This cast calls the string operator, i.e. the op_Implicit
function, which returns a string.

As the cast is not stated explicitly in the second case, the function op_Implicit is
not called to convert the yyy object into a string. The compiler should have noticed
this at compile time and reported an error. But it ignores this completely. Sometimes
compilers do not behave as intelligently as expected.

a.cs
class zzz
{
static void F(params object[] b)
{
object o = b[0];
System.Console.WriteLine(o.GetType().FullName );
System.Console.WriteLine(b.Length);
}
static void Main()
{
object[] a = {1, "Hello", 123};
object o = a;
F(a);
F((object)a);
F(o);
F((object[])o);
}
}

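a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method private hidebysig static void F(class System.Object[] b) il managed
{
.param [1]
.custom instance void [mscorlib]System.ParamArrayAttribute::.ctor() = ( 01 00 00 00 )
.locals (class System.Object V_0)
ldarg.0
ldc.i4.0
ldelem.ref
stloc.0
ldloc.0
call instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
callvirt instance class System.String [mscorlib]System.Type::get_FullName()
call void [mscorlib]System.Console::WriteLine(class System.String)
ldarg.0
ldlen
conv.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method private hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.Object[] V_0,class System.Object V_1,class System.Object[]
V_2,int32 V_3)
ldc.i4.3
newarr [mscorlib]System.Object
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.1
stloc.3
ldloca.s V_3
box [mscorlib]System.Int32
stelem.ref
ldloc.2
ldc.i4.1
ldstr "Hello"
stelem.ref
ldloc.2
ldc.i4.2
ldc.i4.s 123
stloc.3
ldloca.s V_3
box [mscorlib]System.Int32
stelem.ref
ldloc.2
stloc.0
ldloc.0
stloc.1
ldloc.0
call void zzz::F(class System.Object[])
ldc.i4.1
newarr [mscorlib]System.Object
stloc.2
ldloc.2
ldc.i4.0
ldloc.0
stelem.ref
ldloc.2
call void zzz::F(class System.Object[])
ldc.i4.1
newarr [mscorlib]System.Object
stloc.2
ldloc.2
ldc.i4.0
ldloc.1
stelem.ref
ldloc.2
call void zzz::F(class System.Object[])
ldloc.1
castclass class System.Object[]
call void zzz::F(class System.Object[])
ret
}
}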

Output
System.Int32
3
System.Object[]
1
System.Object[]
1
System.Int32
3

This is quite a huge program. The explanation is slightly complicated but, without
understanding IL code, it is next to impossible to understand the nitty-gritty of C#.

Let us tread one step at a time. This example demonstrates some basic concepts of
C# programming. We first create an array of objects called a, of size 3, and initialize
its members to two numbers and one string. Remember that everything in the .NET
world is an object. Then we have another object o that is initialized to a. We do not get
an error, but you need to bear in mind that a is an array and o is an object that now
stores a reference to an array.

We call the function F four times:

first with the object a, which is an array.
then with the same object cast to an object.
then with the object o.
finally with the object o cast to an array of objects.

The function F accepts the parameter in an array of objects called b. The first
member b[0] is stored in an object called o. The full name of the type of this object and
the length of the array are printed using the WriteLine function.

In the first case, the array of 3 objects is passed as it is. The type name of its first
member is System.Int32 and the size of the array is 3.

In the second case, as the array is cast to an Object, the whole array becomes the
single member of a new array. Hence the name is System.Object[] and the size is 1.

The third case has an object placed on the stack, which is wrapped in an array of
objects. The size is displayed as 1, since a single object was passed.

In the last case, the cast tells C# that o really refers to the array of 3 members, and
thus the size is 3.

Up to the third stelem.ref statement, the 3 array members are merely being initialized to
the values 1, Hello and 123. The local V_0 is array a and local V_1 refers to object
o. As it is an array of objects, the string does not pose any problems, but since the
numbers are value types, they have to be first converted to a reference type using
the box instruction.

The first call simply places the array stored in local V_0 on the stack. The second
call places 1 on the stack and then creates a new array of size 1 using newarr. It
stores this new array in local V_2 and then loads the value of local V_2 on the
stack. Then, it loads a 0 and the first main array containing 3
members, on the stack. stelem.ref stores this array as member 0 of V_2. This local is
then placed on the stack. See what a simple cast does.

Similarly, in the third case we create an array of size 1, store it in local V_2 and
then place it on the stack. Then, we place 0 and the local V_1 on the stack and
use stelem.ref to store V_1 as member 0 of the new array, which is passed to the
function. The last call simply places the object V_1 on the
stack and calls castclass. Function F is straightforward while performing its job.
Ask yourself whether it was the C# code that enabled you to grasp the program or
was it the IL code?

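a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[] V_0)
ldc.i4.3
newarr [mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.6
ldc.i4.s 10
stelem.i4
ret
}
}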

Output
Exception occurred: System.IndexOutOfRangeException: An exception of type
System.IndexOutOfRangeException was thrown.
at zzz.vijay()

Our array above has only 3 members, whereas we tried to store a value in the
seventh member. Whenever we exceed the bounds of an array, we will get an
IndexOutOfRangeException at run time. Thus, be careful in dealing with arrays. Do
not cross the picket line. We store values in an array and index them, so that we
can retrieve a single item by position.

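a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[] V_0)
ldc.i4.3
newarr [mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.1
ldelema System.Int32
ldc.i4.2
stobj System.Int32
ldloc.0
ldc.i4.1
ldelema System.Int32
ldobj System.Int32
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}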

Output
2

We have different instructions for dealing with value types and arrays. Arrays are
nothing but a number of variables stored together in memory. The ldelema instruction
takes two parameters on the stack. The first is the array reference, that is V_0, and
the second is the index of the member whose memory location is desired.

After running the instruction, we have on the stack the address of the member at the
specified array index. The instruction ldelema requires the data type of the array,
because the offset of the members of the array is decided by the data type. The
instruction stobj stores the value in the memory location, thereby initializing the
second member of the array to 2.

To display this member, its address is placed on the stack and ldobj is used to
retrieve the value. The instructions ldobj and stobj have nothing to do with arrays.
They deal with reading a memory location and placing the value found on the stack
and vice versa. Thus they only work with value type arrays.

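a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldnull
ldc.i4.1
ldc.i4.s 10
stelem.i4
ret
}
}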

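Output
Exception occurred: System.NullReferenceException: An exception of type
System.NullReferenceException was thrown.
at zzz.vijay()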

Since we placed a null array reference on the stack, we get a
NullReferenceException error. We are basically simulating some of the exceptions
that arrays can throw at us.
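
The two array exceptions simulated so far can be reproduced from C# and caught. This is only a sketch; the class ArrayErrorsDemo and the method Write are names of our own.

```csharp
using System;

class ArrayErrorsDemo
{
    public static string Write(int[] a, int index)
    {
        try
        {
            a[index] = 10;            // stelem.i4 performs both checks
            return "ok";
        }
        catch (IndexOutOfRangeException)
        {
            return "IndexOutOfRangeException";
        }
        catch (NullReferenceException)
        {
            return "NullReferenceException";
        }
    }

    static void Main()
    {
        Console.WriteLine(Write(new int[3], 1));   // ok
        Console.WriteLine(Write(new int[3], 6));   // IndexOutOfRangeException
        Console.WriteLine(Write(null, 1));         // NullReferenceException
    }
}
```

In both failing cases it is the store instruction itself that raises the exception, so no extra checking code appears in the IL.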

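a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.3
newarr [mscorlib]System.Int32
call instance int32 [mscorlib]System.Array::get_Length()
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}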

Output
3

Just as we used the ldlen instruction earlier, we could have instead used the
get_Length function, which, in turn, is a property of the Array class. The choice is
yours, but as we demonstrated earlier, the Length property is converted to the ldlen
instruction by the C# compiler, as it is far more efficient. At the end of the day, the
get_Length function does the same thing. IL does not have instructions that can
handle arrays other than vectors. Thus, multi-dimensional arrays, also called
general arrays, are created using array functions.

a.cs
class zzz
{
public static void Main()
{
int [,] a = new int[1,2];
a[0,0] = 10;
a[0,1] = 20;
System.Console.WriteLine(a[0,1]);
}
}

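a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[0...,0...] V_0)
ldc.i4.1
ldc.i4.2
newobj instance void int32[0...,0...]::.ctor(int32,int32)
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.0
ldc.i4.s 10
call instance void int32[0...,0...]::Set(int32,int32,int32)
ldloc.0
ldc.i4.0
ldc.i4.1
ldc.i4.s 20
call instance void int32[0...,0...]::Set(int32,int32,int32)
ldloc.0
ldc.i4.0
ldc.i4.1
call instance int32 int32[0...,0...]::Get(int32,int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}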

Output
20

One area where C# excels in is array handling. This is only because IL understands
arrays internally. Lets us now fnd out how IL handles two dimensional arrays.

A two dimensional array is declared in the same way that a normal array is
declared, and the dimensions are stated in the new instruction. The array index
starts from 0 and not from 1. In IL, to create a two dimensional array, there is a
special syntax, i.e. a 0 followed by 3 dots, twice in the locals directive. The two array
dimensions are placed on the stack and newobj is called. It is not newarr. Newobj
calls the constructor of the two dimensional array class that takes two parameters.
The return value is then stored in local V_0.

To store a value in a two dimensional array, the reference to the array, held in
V_0, is loaded on the stack, followed by the two indexes, using ldc. Thereafter
the value that initializes the array member is placed on the stack. The function
Set of the same int array class is then called with these four items on the stack.

Conversely, to fetch a value, the function Get is called with three items on the
stack: the array reference and the two index values. Thus, multi-dimensional arrays
are built using array class functions, and not the IL instructions that are used to
build single dimensional arrays. The rank of an array is defined as the number of
dimensions of the array. The runtime expects a rank of at least 1.

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32[1...3,1...] V_0)
ldc.i4.2
ldc.i4.6
newobj instance void int32[1...3,1...]::.ctor(int32,int32)
pop
ret
}
}

A general purpose array has an upper bound and a lower bound. Unfortunately, as
of now, the runtime does not do any bound checking. The first dimension has a
lower bound of 1 and an upper bound of 3. You can choose the bounds you desire.

a.cs
class zzz
{
{
int [,,] a;
a = new int[2,3,4];
a[1,2,3] = 10;
System.Console.WriteLine(a[1,2,3]);
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32[0...,0...,0...] V_0)
ldc.i4.2
ldc.i4.3
ldc.i4.4
newobj instance void int32[0...,0...,0...]::.ctor(int32,int32,int32)
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.2
ldc.i4.3
ldc.i4.s 10
call instance void int32[0...,0...,0...]::Set(int32,int32,int32,int32)
ldloc.0
ldc.i4.1
ldc.i4.2
ldc.i4.3
call instance int32 int32[0...,0...,0...]::Get(int32,int32,int32)
ret
}
}

Output
10

An array can have any rank. The above array is a three dimensional one and has a
rank of 3. So, we have to use the array handling functions to work with it. The
rank of an array is declared by using commas between the square brackets. The
number of commas plus one is the rank of the array. If no specific bounds are
supplied, the default is 0 for the lower bound and infinity for the upper bound.

You can specify none, one or both bounds. The CLR, in this version, ignores all the
bounds information you provide, and only pays heed to the numbers placed on the
stack at the time of creation of the array. Here, you have to supply all the
information. Only those arrays that have a lower bound of 0 in all their dimensions
are CLS compliant.

In the above example, three dimension sizes are placed on the stack and the array
constructor is called with three values. We are not allowed to use newarr, as the
above array is not a vector. Now, to set a member to a value, the three index values
are placed on the stack in a specific order. The same Set function is called, but this
time with four parameters. The same rules apply to the Get function.
The point that we want to make is that the magnitude of the rank has no effect on
the way the array is handled. No substantial changes are required.

There are two array constructors that can be used. The first takes the same
number of parameters as the rank of the array. The second constructor takes
twice the number of parameters as the rank of the array. In the second type of
constructor, the first two parameters specify the lower bound and the length of the
first dimension, the next two parameters specify the lower bound and the length of
the second dimension, and so on. The first constructor always assumes the lower
bound to be zero.

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32[5...10,3...7] V_0)
ldc.i4.5
ldc.i4.6
ldc.i4.3
ldc.i4.5
newobj instance void int32[5...10,3...7]::.ctor(int32, int32,int32,int32)
stloc.0
ldloc.0
ldc.i4.6
ldc.i4.5
ldc.i4.s 10
ldloc.0
ldc.i4.6
ldc.i4.5
ret
}
}

Output
10

ldc.i4.6
ldc.i4.5

We then change the above two lines to

ldc.i4.1
ldc.i4.2

and we see the following exception thrown at us.

at zzz.vijay()

An array with a lower and upper bound, having a rank of 2, is placed on the stack.
The first dimension starts at 5 and ends at 10. Thus, the lower bound, i.e. 5, is
placed on the stack first, followed by the length of the dimension. No upper bound
is passed. As the dimension starts at 5 and ends at 10, the length is calculated as
follows: 10 - 5 + 1 = 6 (i.e. upper bound - lower bound + 1). The same rule holds
true for the next dimension.

The rest of the code remains the same. When the index values 6, 5 of the array
member are changed to 1, 2, an exception is thrown. This is because the array bounds
for the first dimension are 5 to 10 and for the second dimension are 3 to 7. Any
attempt to cross the array bounds in any direction generates an exception.
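
The length rule can also be tried out from C#. The sketch below is our own illustration and not part of the original program. It assumes the framework function Array.CreateInstance, which takes the lengths and the lower bounds as two separate arrays; the names BoundsDemo, LengthFromBounds and Create are hypothetical.

```csharp
using System;

static class BoundsDemo
{
    // Length of one dimension from its inclusive bounds: upper - lower + 1.
    public static int LengthFromBounds(int lower, int upper)
    {
        return upper - lower + 1;
    }

    // Builds an int array shaped like int32[5...10,3...7].
    public static Array Create()
    {
        int[] lengths = { LengthFromBounds(5, 10), LengthFromBounds(3, 7) };
        int[] lowerBounds = { 5, 3 };
        return Array.CreateInstance(typeof(int), lengths, lowerBounds);
    }

    static void Main()
    {
        Array a = Create();
        a.SetValue(10, 6, 5);                 // indexes 6, 5 lie inside both ranges
        Console.WriteLine(a.GetValue(6, 5));  // prints 10
        Console.WriteLine(a.GetLowerBound(0) + ".." + a.GetUpperBound(0)); // prints 5..10
        try
        {
            a.SetValue(10, 1, 2);             // indexes 1, 2 lie below both lower bounds
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("out of range");
        }
    }
}
```

As in the IL program, only the lower bound and the length are supplied; the upper bound of each dimension follows from the two.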

Array of Arrays

a.cs
class zzz
{
{
int [][] a = new int[2][];
a[0] = new int[1];
a[1] = new int[10];
System.Console.WriteLine(a.Length) ;
System.Console.WriteLine(a[0].Length) ;
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.2
newarr int32[]
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.1
stelem.ref
ldloc.0
ldc.i4.1
ldc.i4.s 10
stelem.ref
ldloc.0
ldlen
conv.i4
ldloc.0
ldc.i4.0
ldelem.ref
ldlen
conv.i4
ret
}
}

Output
2
1

Let us explore jagged arrays, where an array member can contain another array of a
different length. We are creating an array that has an irregular shape. In C#, the
syntax to create an array of arrays is the same. It consists of two pairs of square
brackets, [][]. We first create the array using only the first dimension. This is
done by using newarr and stating an array data type as a parameter. We then
initialize V_0 with this array reference.

Now, since we have to create two separate one dimensional arrays, we first place the
array reference on the stack. Then we place the index of the array member we want
to initialise, followed by the size of the new array. Finally, we call newarr to create
an array of ints and place the reference on the stack. stelem.ref is used to initialize
the array member with this array reference. The same is repeated for the second
member a[1] also.

The instruction ldlen returns the length of the array. For the main array, its
reference is placed on the stack using ldloc.0. For the second length, ldelem.ref is
first used to fetch the reference of the array out of the first array member a[0],
and then ldlen is used to obtain the length.

a.cs
class zzz
{
{
int[][] a = new int[2][] { new int[] {2,3}, new int[] {1,6,7} };
System.Console.WriteLine(a[0][1]) ;
System.Console.WriteLine(a[1][2]) ;
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.2
newarr int32[]
stloc.1
ldloc.1
ldc.i4.0
ldc.i4.2
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.2
stelem.i4
ldloc.2
ldc.i4.1
ldc.i4.3
stelem.i4
ldloc.2
stelem.ref
ldloc.1
ldc.i4.1
ldc.i4.3
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.1
stelem.i4
ldloc.2
ldc.i4.1
ldc.i4.6
stelem.i4
ldloc.2
ldc.i4.2
ldc.i4.7
stelem.i4
ldloc.2
stelem.ref
ldloc.1
stloc.0
ldloc.0
ldc.i4.0
ldelem.ref
ldc.i4.1
ldelem.i4
ldloc.0
ldc.i4.1
ldelem.ref
ldc.i4.2
ldelem.i4
ret
}
}

Output
3
7

The above example is similar to its predecessor, though it is more elaborate and
complete. A jagged array is created that is made of two arrays of sizes 2 and 3
respectively. In C#, they can be initialized in one stroke. IL does it the hard way.
To fetch the value of a[1][2], it places the reference of the array on the stack. Then
it places 1, the first array index, on the stack. Thereafter, ldelem.ref is used to
obtain an array reference.

Thus, at first an array reference is pushed on the stack. Then 2 is placed on the
stack, and ldelem.i4 is used to get the member at index 2 of this new array. A jagged
array is treated as an array whose members contain other independent arrays.

An array of arrays is different from a multi dimensional array. A multi dimensional
array forms one memory block, whereas an array of arrays holds references to other
arrays in memory. Thus, an array of arrays is slower in execution since it needs to
make an extra indirection to reach the final element.

We can also use pointers with arrays. The salient feature of an array of arrays is
that the first array merely stores the addresses of other arrays. The disadvantage of
a multi dimensional array is the fact that all the rows have to be of the same
size.
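
The difference in shape can be observed from C# itself. The following is a minimal sketch of our own, with the hypothetical names ShapeDemo and CountJagged; it only uses the Rank and Length members of arrays.

```csharp
using System;

static class ShapeDemo
{
    // Total number of ints reachable through an array of arrays.
    public static int CountJagged(int[][] rows)
    {
        int total = 0;
        foreach (int[] row in rows)
            total += row.Length;
        return total;
    }

    static void Main()
    {
        int[,] grid = new int[2, 3];   // one block: every row has 3 columns
        int[][] jagged = { new int[] { 2, 3 }, new int[] { 1, 6, 7 } }; // rows of length 2 and 3

        Console.WriteLine(grid.Rank);            // prints 2
        Console.WriteLine(grid.Length);          // prints 6, the total number of members
        Console.WriteLine(jagged.Rank);          // prints 1, a vector of references
        Console.WriteLine(jagged.Length);        // prints 2, the number of inner arrays
        Console.WriteLine(CountJagged(jagged));  // prints 5
        Console.WriteLine(jagged[1][2]);         // prints 7
    }
}
```

The jagged array reports a rank of 1 because, as far as the runtime is concerned, it is a vector whose members happen to be other vectors.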

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32[][][] a)
ldc.i4.5
newobj instance void int32[][][]::.ctor(int32)
stloc a
ldloc a
ldc.i4.0
ldc.i4.3
call instance void int32[][][]::Set(int32, int32[][])
ldloc a
ldc.i4.0
call instance int32[][] int32[][][]::Get(int32)
ldc.i4.1
ldc.i4 10
ldloc a
ldc.i4.0
ldc.i4.1
ldc.i4.5
ldc.i4 100
ldloc a
ldc.i4.0
ldc.i4.1
ldc.i4.5
ret
}
}

Output
100

Here, we shall see how to create an array a of type [][][]. We first create a local a of
type array of array of array. Thus, we have two levels of indirection. We want the
first or main array to have a size of 5, i.e. it should be able to store the references of
5 arrays in memory. The instruction ldc places the size 5 of this array on the stack.
Thereafter newobj is used to create the first dimension of this array. The instruction
stloc a initializes this array and ldloc a puts its reference on the stack.

Subsequently two values are placed on the stack. One is the index of the first
member a[0] and the other is the size of the array that this member should point to,
i.e. 3. newobj creates an array of type int32[][]. To store it in a[0] the Set function is
used. This function requires the index of the array member as the first parameter.
Hence, 0 is placed on the stack, even though newobj does not require it. It simplifies
the call of the Set function.

The next thing required is an int32[] to store in our int32[][]. So, the array a is
placed again on the stack and 0 is used to obtain the array that has
just been created. The Get function does the job of retrieving values. Then, as
before, 1 is placed on the stack followed by the size of the new array, i.e. 10. Finally,
newobj creates a simple array int32[] and places it on the stack, which is then
stored using the Set function.

Remember that the value 1 has already been placed on the stack. To execute the
operation a[0][1][5] = 100, the member a[0] is required. So, the array reference a is
placed on the stack followed by 0 and the Get function is called.

To access a[0][1], as the first member of array a, a[0], is already on the stack, all that
is required is placing 1 on the stack and calling Get again. Now, to store the value in
the member a[0][1][5], 5 is loaded on the stack. To fetch the value of member
a[0][1][5], the same procedure as before is followed. That is

load the array reference on the stack.
obtain the member 0 by using Get.
obtain the member 1 of this array.
finally obtain the member 5 of this array.

The logic is the same as described earlier.

a.cs
using System;
public class zzz
{
public static void abc(int i, __arglist)
{
ArgIterator a = new ArgIterator(__arglist);
while (a.GetRemainingCount() > 0)
Console.WriteLine(__refvalue(a.GetNextArg(), int));
}
{
abc(20, __arglist(1, 2, 3));
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.s 20
ldc.i4.1
ldc.i4.2
ldc.i4.3
call vararg void zzz::abc(int32,...,int32,int32,int32)
ret
}
.method public hidebysig static vararg void abc(int32 i) il managed
{
.locals (value class [mscorlib]System.ArgIterator V_0)
ldloca.s V_0
arglist
call instance void [mscorlib]System.ArgIterator::.ctor(value class
[mscorlib]System.RuntimeArgumentHandle)
br.s IL_001d
IL_000b: ldloca.s V_0
call instance typedref [mscorlib]System.ArgIterator::GetNextArg()
refanyval [mscorlib]System.Int32
ldind.i4
IL_001d: ldloca.s V_0
call instance int32 [mscorlib]System.ArgIterator::GetRemainingCount()
ldc.i4.0
bgt.s IL_000b
ret
}
}

Output
1
2
3

This example builds upon the earlier example, which has a function that accepts a
variable number of arguments. In C#, __arglist enables us to implement a function
that accepts a variable number of arguments.

Internally, in IL, the function is marked with a vararg modifier and the ArgIterator
class is used to display the values in a loop.

-13-

The Other Odds and Ends
An array is a contiguous block of memory that stores values of the same type.
These values are an indexed collection. The runtime has built in support to handle
arrays. Vector is another name for an array that has only one dimension and the
index count starts at zero. An array type can be any type derived from
System.Object. This includes everything under the sun, excluding pointers, which
are not allowed in this version of the CLR. Nobody knows about the next version. An
array is a subtype of System.Array and we are given plenty of leeway in working
with arrays. The newarr instruction is used only for single dimensional arrays.

a.cs
class zzz
{
{
int[] a;
a= new int[3];
a[1]= 10;
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.3
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.s 10
stelem.i4
ldloc.0
ldc.i4.1
ldelem.i4
ret
}
}

Output
10

IL recognises the array data type. Thus, in the locals directive, we see an array of
int32 called V_0. This is similar to the process of creating an array in C#, where we
first specify that we want an array variable. Then, to create the actual array, the
size of the array is mentioned. In IL, the size is placed on the stack. IL uses newarr,
similar to newobj, to create the array in memory. However, in C#, new is used for
an array as well as for a reference type. The data type of the array to be created is
also passed to the newarr instruction. Like newobj, newarr also places the reference
of the array on the stack. Thereafter, V_0 is initialized with this reference, which is
pushed back on the stack using ldloc.0.

We will now explain the IL code generated for the statement a[1] = 10. To do so, the
array index, in this case the value 1, is pushed on the stack, followed by the value
that the array member is to be initialized to, i.e. 10. So, there are 3 items on our
stack: at the bottom, the array reference, then the array index and finally the new
value of the array member.

These parameters are required by the instruction stelem.i4 to initialize an array
member. To read the value of an array member, the array reference
is loaded on the stack, followed by the index of the member. The instruction ldelem.i4
does the reverse of stelem.i4. It retrieves the value of an array member. As mentioned
earlier, i4 stands for 4 bytes on the stack. Most instructions have such a data type at
the end of their name.

a.cs
class zzz
{
public static void Main(string[] a)
{
System.Console.WriteLine(a.Length);
}
}

a.il
.assembly mukhi {}
{
.method public hidebysig static void vijay(class System.String[] a) il managed
{
.entrypoint
ldarg.0
ldlen
conv.i4
ret
}
}

>a one two

Output
2

The array class has a member called Length. This Length member in C# gets
converted to the IL instruction ldlen, which requires an array object on the stack and
returns the length. Array handling is very powerful in .NET because IL has an
intrinsic ability to understand arrays. In IL, the array has been made a first class
member.

a.cs
class zzz
{
{
int[] a;
a= new int[2];
a[0]= 12; a[1]= 10;
foreach (int i in a)
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32[] V_0,int32 V_1,int32[] V_2,int32 V_3,int32 V_4)
ldc.i4.2
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.s 12
stelem.i4
ldloc.0
ldc.i4.1
ldc.i4.s 10
stelem.i4
ldloc.0
stloc.2
ldloc.2
ldlen
conv.i4
stloc.3
ldc.i4.0
stloc.s V_4
br.s IL_002d
IL_001c: ldloc.2
ldloc.s V_4
ldelem.i4
stloc.1
ldloc.1
ldloc.s V_4
ldc.i4.1
add
stloc.s V_4
IL_002d: ldloc.s V_4
ldloc.3
blt.s IL_001c
ret
}
}

Output
12
10

Here we have a small C# program that has been transformed into a large IL program.
To begin, we have created 5 locals instead of 1. Two of them, V_0 and V_2, are
arrays and the rest are mere ints. The two stelem.i4 instructions initialize the 2
array members as seen in the above programs.

Now let us understand how IL deals with a foreach statement. ldloc.0 loads the
reference of the array on the stack. The instruction stloc.2 makes local V_2 hold the
same array reference as V_0. Then the array reference in V_2, which is the same as
the one in V_0, is loaded on the stack. Finally, using the instruction ldlen, the length
of the array is determined.

The number 2 is now present on the stack. This represents the length of the array. It
is converted to occupy 4 bytes on the stack and is stored in local V_3, using the
instruction stloc.3. The number 0 is then placed on the stack using the ldc
instruction. stloc pops this value 0 into local V_4 and br branches to label
IL_002d, where the value of variable V_4, i.e. 0, is loaded. Also the value of local V_3,
which stores the length of the array, i.e. 2, is loaded on the stack.

Since 0 is less than 2, the code at label IL_001c is executed. This loads the array
reference on the stack, then loads local V_4, which is the index. Finally, ldelem
fetches the value of member a[0].

Adding 1 to the local V_4 serves a dual purpose: one, to index the array for
ldelem.i4 and the other, to stop the loop when we reach the length of the array
stored in local V_3. This is how a foreach statement is converted, step by step, into
IL code.
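
To make the mapping concrete, the loop that the IL performs can be written by hand in C#. This is only a sketch of our own that mirrors the locals described above; the names ForeachDemo and SumLikeForeach are hypothetical, and we add up the members instead of printing them so that the result is easy to check.

```csharp
using System;

static class ForeachDemo
{
    // Hand-written equivalent of: foreach (int i in a) sum += i;
    public static int SumLikeForeach(int[] a)
    {
        int sum = 0;
        int[] copy = a;             // V_2: a second reference to the same array
        int length = copy.Length;   // V_3: ldlen followed by conv.i4
        int index = 0;              // V_4: the hidden loop counter
        while (index < length)      // blt.s compares V_4 with V_3
        {
            int i = copy[index];    // V_1: ldelem.i4 fetches the member
            sum += i;
            index = index + 1;      // ldc.i4.1, add, stloc.s V_4
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumLikeForeach(new int[] { 12, 10 })); // prints 22
    }
}
```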

a.cs
public class zzz
{
{
zzz z = new zzz();
z.abc("hi","bye");
}
void abc(params string [] b)
{
System.Console.WriteLine(b[0]);
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (class zzz V_0,class System.String[] V_1)
stloc.0
ldloc.0
ldc.i4.2
stloc.1
ldloc.1
ldc.i4.0
ldstr "hi"
stelem.ref
ldloc.1
ldc.i4.1
ldstr "bye"
stelem.ref
ldloc.1
call instance void zzz::abc(class System.String[])
ret
}
.method private hidebysig instance void abc(class System.String[] b) il managed
{
.param [1]
ldarg.1
ldc.i4.0
ldelem.ref
ret
}
}

Output
hi

A function with a params parameter accepts a variable number of parameters. How
does the compiler handle it?

As usual, we see object V_0, which is an instance of class zzz. Along with it is an
array of strings V_1, which we have not created. The number 2 is then placed on
the stack and, following it, an array of size 2 is created. As the two parameters,
i.e. the strings "hi" and "bye", are to be placed on the stack, IL first creates an
array of size 2. This array reference is pushed onto the stack.

Using ldc.i4.0, index 0 is pushed on the stack, followed by the string "hi". Thereafter
the instruction stelem, suffixed with the type, is used. Here, ref stands for an object
reference. Thus, the first or zeroth member of the temp array V_1 gets the value "hi"
and the same process is repeated for the second array member. Thus, for a params
parameter, all the individual parameters are placed in one big array and the function
abc is called with this array on the stack.
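
Since the compiler merely builds an array behind our backs, a params function can just as well be handed a ready-made array. The sketch below is our own illustration; the names ParamsDemo, Count and First are hypothetical.

```csharp
using System;

static class ParamsDemo
{
    // Number of values that arrived in the hidden array.
    public static int Count(params string[] b)
    {
        return b.Length;
    }

    // First value of the hidden array, like b[0] in the function abc.
    public static string First(params string[] b)
    {
        return b[0];
    }

    static void Main()
    {
        Console.WriteLine(Count("hi", "bye"));                   // prints 2
        Console.WriteLine(Count(new string[] { "hi", "bye" }));  // prints 2, same call in expanded form
        Console.WriteLine(Count());                              // prints 0, an empty array is created
        Console.WriteLine(First("hi", "bye"));                   // prints hi
    }
}
```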

In the function abc, the first change is that the function accepts an array with the
same name as in C#. The .param directive uses the metadata to store an initial
value for the array. The array has two members "hi" and "bye". It is this data that
the array b's members must be initialized to. The .param directive with the number 1
stands for the first parameter in the function prototype. Here 0 stands for the return
value and 1 stands for the first parameter, that is our array.

We will explore the custom directive in detail later. The rest of the IL code loads the
first member of the array on the stack using ldelem.ref. This is similar in concept
to stelem.ref. Thus, the compiler does a lot of hard work in implementing the
params modifier. To sum up, it converts all the individual parameters into one
array, and this array is placed on the stack. IL does not fully understand the
params modifier. Thus the params modifier has to be the last entry in the
parameter list. The ref suffix is used to denote a reference element.

a.cs
class zzz
{
{
zzz a = new zzz();
a.abc();
}
unsafe public void pqr( int *b)
{
System.Console.WriteLine(b[1]);
b[1] = 16;
}
{
int [] a = new int[2];
a[0] = 10; a[1] = 2;
fixed ( int *i = a) pqr(i);
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
stloc.0
ldloc.0
ret
}
{
.locals (int32[] V_0,int32& pinned V_1)
ldc.i4.2
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.s 10
stelem.i4
ldloc.0
ldc.i4.1
ldc.i4.2
stelem.i4
ldloc.0
ldc.i4.0
ldelema [mscorlib]System.Int32
stloc.1
ldarg.0
ldloc.1
conv.i
call instance void zzz::pqr(int32*)
ldc.i4.0
conv.u
stloc.1
ldloc.0
ldc.i4.1
ldelem.i4
ret
}
}
.method public hidebysig instance void pqr(int32* b) il managed
{
ldarg.1
ldc.i4.4
ldc.i4.1
mul
add
ldind.i4
ldarg.1
ldc.i4.4
ldc.i4.1
mul
add
ldc.i4.s 16
stind.i4
ret
}

Output
2
16

Here, we will explain certain features of pointer handling in C# and IL. In the C#
program we have created an array of size 2 in the function abc and the array
members are initialised. The keyword fixed fixes the array in memory. For
the purpose of efficiency, the garbage collector can move things around in memory.
By fixing the array in memory, we prevent the garbage collector from
moving it.

The address of the array is stored in a pointer to an int and the function pqr is called.
This function displays the value of the second member of the array and then changes
it. The change is reflected in the original array also. In the locals, we define our int
array as usual, but we have another variable V_1, that is also a pointer, but with a
& and not a *. This pointer is also pinned, which means that the garbage collector will
not move what it points to. If it were moved in memory, we could not keep track of its
memory location. Thus, a fixed statement becomes a pinned local.

Using ldelema, with the array and its index pushed on the stack, the address of the
array member is obtained. V_1 is initialized to this value and function pqr is called.
In the function pqr, a [] is converted into a memory location. Thus, the address of the
array is loaded on the stack. Then, the numbers 4 and 1 are placed on the stack
because the size of an int is 4 and the array index is 1. After multiplying them, the
product 4 is added to the address to get the location of the member. The array member
is then displayed. The same logic is applied to change its value. Whether b[1] or
*(b+1) is used, the above program remains the same.
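
The offset arithmetic can be checked without writing unsafe code. The sketch below is our own illustration and takes a different route from the program above: it copies the bytes of the array with Buffer.BlockCopy and reads an int back at the byte offset index * 4 with BitConverter.ToInt32. The names OffsetDemo, ByteOffset and ReadAt are hypothetical.

```csharp
using System;

static class OffsetDemo
{
    // Byte offset of a member: the index multiplied by the size of an int (4).
    public static int ByteOffset(int index)
    {
        return index * sizeof(int);
    }

    // Reads member 'index' by going through the raw bytes of the array.
    public static int ReadAt(int[] a, int index)
    {
        byte[] raw = new byte[a.Length * sizeof(int)];
        Buffer.BlockCopy(a, 0, raw, 0, raw.Length);          // copy of the array's memory
        return BitConverter.ToInt32(raw, ByteOffset(index)); // read 4 bytes at the offset
    }

    static void Main()
    {
        int[] a = { 10, 2 };
        Console.WriteLine(ByteOffset(1)); // prints 4
        Console.WriteLine(ReadAt(a, 1));  // prints 2, the same value as a[1]
    }
}
```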

a.cs
public class zzz
{
{
object [] t = s;
t[0] = null;
t[1] = "hi";
t[2] = new yyy();
}
}
class yyy
{
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.3
stloc.0
ldloc.0
stloc.1
ldloc.1
ldc.i4.0
ldnull
stelem.ref
ldloc.1
ldc.i4.1
ldstr "hi"
stelem.ref
ldloc.1
ldc.i4.2
stelem.ref
ret
}
}
{
}

Output
at zzz.vijay()

The array s is an array of three strings. We have declared an array of objects but
initialised it to an array of strings, which is perfectly legal in C#. We then initialised
the members of t to a null, a string and a yyy object respectively. The runtime
knows that even though t is declared as an array of objects, it was initialized to an
array of strings. Its members can only be strings or null.

The IL code is very straightforward. It uses newarr to create an array of strings.
Then it uses stloc.1 to initialize V_1 or array t. Thereafter, stelem.ref is used to
initialize the individual array members. However, the last stelem.ref checks the data
type of the value at run time, finds that a yyy object is not a string and flags this as
an exception. The code used for throwing the exception is not present in the array
class at all. It is in stelem.ref and we are not privy to this code.
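
The run time check can be observed from C# by catching the exception, which on the .NET runtime is an ArrayTypeMismatchException. The following is a small sketch of our own; the names CovarianceDemo and TryStore are hypothetical.

```csharp
using System;

static class CovarianceDemo
{
    // Tries to store a value in an object array and reports whether
    // the runtime accepted it.
    public static bool TryStore(object[] t, int index, object value)
    {
        try
        {
            t[index] = value;   // compiles to stelem.ref, which checks the type
            return true;
        }
        catch (ArrayTypeMismatchException)
        {
            return false;
        }
    }

    static void Main()
    {
        object[] t = new string[3];                        // declared as objects, created as strings
        Console.WriteLine(TryStore(t, 0, null));           // prints True
        Console.WriteLine(TryStore(t, 1, "hi"));           // prints True
        Console.WriteLine(TryStore(t, 2, new object()));   // prints False
    }
}
```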

a.cs
public class zzz
{
{
object [] t = s;
t[0] = (string)new yyy();
System.Console.WriteLine(t[0]);
t[1] = new yyy();
System.Console.WriteLine(t[1]);
}
}
class yyy
{
public static implicit operator string ( yyy a)
{
return "hi";
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.3
stloc.0
ldloc.0
stloc.1
ldloc.1
ldc.i4.0
stelem.ref
ldloc.1
ldc.i4.0
ldelem.ref
ldloc.1
ldc.i4.1
stelem.ref
ldloc.1
ldc.i4.1
ldelem.ref
ret
}
}
{
a) il managed
{
ldstr "hi"
stloc.0
ldloc.0
ret
}
}

Output
hi
at zzz.vijay()

Had the compiler been a little more concerned about exceptions, it would have
prevented the above program from throwing one at run time, by spotting the error at
compile time itself. We have the same situation as before. The array t is an array of
objects, but initialized to an array of strings. The member t[0] is initialized to a yyy
object, but now with a cast. This cast calls the string operator, or op_Implicit
function, which returns a string.

As the cast is not stated explicitly in the second case, the function op_Implicit is not
called to convert the yyy object into a string. The compiler should have noticed this
at compile time and reported an error. But it ignores this completely. Sometimes
compilers do not behave as intelligently as expected.

a.cs
class zzz
{
static void F(params object[] b)
{
object o = b[0];
System.Console.WriteLine(o.GetType().FullName );
System.Console.WriteLine(b.Length);
}
static void Main()
{
object[] a = {1, "Hello", 123};
object o = a;
F(a);
F((object)a);
F(o);
F((object[])o);
}
}

a.il
.assembly mukhi {}
{
.method private hidebysig static void F(class System.Object[] b) il managed
{
.param [1]
.locals (class System.Object V_0)
ldarg.0
ldc.i4.0
ldelem.ref
stloc.0
ldloc.0
call instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
ldarg.0
ldlen
conv.i4
ret
}
.method private hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.Object[] V_0,class System.Object V_1,class System.Object[]
V_2,int32 V_3)
ldc.i4.3
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.1
stloc.3
ldloca.s V_3
stelem.ref
ldloc.2
ldc.i4.1
ldstr "Hello"
stelem.ref
ldloc.2
ldc.i4.2
ldc.i4.s 123
stloc.3
ldloca.s V_3
stelem.ref
ldloc.2
stloc.0
ldloc.0
stloc.1
ldloc.0
ldc.i4.1
stloc.2
ldloc.2
ldc.i4.0
ldloc.0
stelem.ref
ldloc.2
ldc.i4.1
stloc.2
ldloc.2
ldc.i4.0
ldloc.1
stelem.ref
ldloc.2
ldloc.1
castclass class System.Object[]
ret
}
}

Output
System.Int32
3
System.Object[]
1
System.Object[]
1
System.Int32
3

This is quite a huge program. The explanation is slightly complicated but, without
understanding IL code, it is next to impossible to understand the nitty-gritty of C#.

Let us tread one step at a time. This example demonstrates some basic concepts of
C# programming. We first create an array of objects called a, of size 3, and initialize
its members to two numbers and one string. Remember that everything in the .NET
world is an object. Then we have another object o that is initialized to a. We do not
get an error, but you need to bear in mind that a is an array and o is an object that
now stores a reference to an array.

We call the function F four times:

first with the object a, which is an array.
then with the same object cast to an object.
then with the object o.
finally with the object o cast to an array of objects.

The function F accepts the parameter in an array of objects called b. The first
member b[0] is stored in an object called o. The full name of the type of this object
and the length of the array are printed using the WriteLine function.

In the first case, the array of 3 objects is placed on the stack. The type name of its
first member is System.Int32 and the size of the array is 3.

In the second case, as the array is cast to an object, the whole array becomes the
first member of a new array, and so the type name is System.Object[].

The third case has an object placed on the stack, which is received in an array of
objects. The size is displayed as 1 since the new array has only 1 member.

In the last case, the cast tells C# that o refers to an array of 3 objects and thus
the array size is 3.

Up to the third stelem.ref instruction, the 3 array members are merely being
initialized to the values 1, Hello and 123. The local V_0 is array a and local V_1
refers to object o. As it is an array of objects, the string does not pose any problems,
but since the numbers are value types, they first have to be converted to a reference
type using the box instruction.

The first call simply places the array stored in local V_0 on the stack. The second
call places 1 on the stack and then creates a new array of size 1 using newarr. It
stores this new array in local V_2 and then loads the value of local V_2 on the stack.
Then, it loads a 0 and the first main array, containing 3 members, on the stack.
stelem.ref is used to initialize member 0 of V_2 to this value. The local V_2 is
then placed on the stack. See what a simple cast does.

Similarly, in the third case we create an array of size 1, store it in local V_2 and
then place it on the stack. Then, we place 0 and the local V_1 on the stack and
initialize member 0 of the new array to it for the function. The last call simply places
the object V_1 on the stack and calls castclass. Function F is straightforward while
performing its job. Ask yourself whether it was the C# code that enabled you to grasp
the program or was it the IL code?

a.il
.assembly mukhi {}
{
.entrypoint
ldc.i4.3
stloc.0
ldloc.0
ldc.i4.6
ldc.i4.s 10
stelem.i4
ret
}
}

Output
at zzz.vijay()

Our array above has only 3 members, whereas we tried to store a value in the
seventh member. Whenever we exceed the bounds of an array, we will get an
IndexOutOfRangeException at run time. Thus, be careful in dealing with arrays. Do
not cross the picket line. We store values in an array and index them, so that we
can retrieve a single item by position.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.3
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.2
stobj System.Int32
ldloc.0
ldc.i4.1
ldobj System.Int32
ret
}
}

Output
2

We have different instructions for dealing with value types and arrays. Arrays are
nothing but a number of variables stored together in memory. The instruction ldelema
takes two parameters on the stack. The first is the reference to the array, that is V_0,
and the second is the index of the variable whose memory location is desired.

After running the instruction we have on the stack the address of a variable at a
specified array index. The instruction ldelema requires the data type of the array,
because the offset of the members of the array is decided by the data type. The
instruction stobj stores the value in the memory location, thereby initializing the
member at index 1 of the array to 2.

To display this member, its address is placed on the stack and ldobj is used to
retrieve the value. The instructions ldobj and stobj have nothing to do with arrays.
They deal with reading a memory location and placing the value found on the stack
and vice versa. Thus they only work with value type arrays.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldnull
ldc.i4.1
ldc.i4.s 10
stelem.i4
ret
}
}

Output
at zzz.vijay()

Since we placed a null array reference on the stack, we get a
NullReferenceException. We are basically simulating some of the exceptions
that arrays can throw at us.

a.il
.assembly mukhi {}
{
{
.entrypoint
ldc.i4.3
call instance int32 [mscorlib]System.Array::get_Length()
ret
}
}

Output
3

Just as we used the ldlen instruction earlier, we could instead have called the
get_Length function, which is the accessor of the Length property of the Array class.
The choice is yours but, as we demonstrated earlier, the C# compiler converts the
Length property to the ldlen instruction, as it is far more efficient. At the end of the
day, the get_Length function does the same thing. IL does not have instructions that
can handle arrays other than vectors. Thus, multi-dimensional arrays, also called
general arrays, are created using array functions.

a.cs
class zzz
{
{
int [,] a = new int[1,2];
a[0,0] = 10;
a[0,1] = 20;
System.Console.WriteLine(a[0,1]);
}
}

a.il
.assembly mukhi {}
{
{
.entrypoint
.locals (int32[0...,0...] V_0)
ldc.i4.1
ldc.i4.2
newobj instance void int32[0...,0...]::.ctor(int32,int32)
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.0
ldc.i4.s 10
ldloc.0
ldc.i4.0
ldc.i4.1
ldc.i4.s 20
ldloc.0
ldc.i4.0
ldc.i4.1
ret
}
}

6utput
>8

One area where C# excels is array handling. This is only because IL understands
arrays internally. Let us now find out how IL handles two dimensional arrays.

A two dimensional array is declared in the same way that a normal array is
declared, and the dimensions are stated in the new instruction. The array index
starts from 0 and not from 1. In IL, a two dimensional array has a special syntax
in the locals directive, i.e. a 0 followed by three dots, written twice. The two array
dimensions are placed on the stack and newobj is called. It is not newarr. Newobj
calls the constructor of the two dimensional array class that takes two parameters.
The return value is then stored in local V_0.

To set a value in a two dimensional array, the reference to the array is loaded on
the stack from V_0, followed by the two indexes, using ldc. Thereafter the value
that initializes the array member is placed on the stack. The function Set of the
same int array class is then called with four items on the stack: the array
reference, the two index values and the value.

Conversely, to fetch a value, the function Get is called with three items on the
stack: the array reference and the two index values. Thus, multi-dimensional arrays
are built using array class functions, and not IL instructions, which are used to
build single dimensional arrays. The rank of an array is defined as the number of
dimensions of the array. The runtime expects at least a rank of 1.
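
The notions of rank and of setting or fetching a member through array class functions are also visible from C#. The following sketch assumes a 1 by 2 array of ints; the class name RankSketch is ours. It uses the Rank property and the GetLength, SetValue and GetValue members of System.Array, which are the library counterparts of what the text describes.

```csharp
using System;

class RankSketch
{
    static void Main()
    {
        int[,] a = new int[1, 2];

        a[0, 0] = 10;
        a[0, 1] = 20;

        Console.WriteLine(a[0, 1]);        // member at row 0, column 1
        Console.WriteLine(a.Rank);         // number of dimensions
        Console.WriteLine(a.GetLength(1)); // size of the second dimension

        // The same member, reached through Array functions rather than
        // the indexing syntax: value first, then one index per dimension.
        a.SetValue(30, 0, 1);
        Console.WriteLine(a.GetValue(0, 1));
    }
}
```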

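a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[1...3,1...] V_0)
ldc.i4.2
ldc.i4.6
newobj instance void int32[1...3,1...]::.ctor(int32,int32)
pop
ret
}
}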

A general purpose array has an upper bound and a lower bound. Unfortunately, as
of now, the runtime does not do any bound checking. The first dimension has a
lower bound of 1 and an upper bound of 3. You can choose the bounds you desire.
a.cs
class zzz
{
public static void Main()
{
int [,,] a;
a = new int[2,3,4];
a[1,2,3] = 10;
System.Console.WriteLine(a[1,2,3]);
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[0...,0...,0...] V_0)
ldc.i4.2
ldc.i4.3
ldc.i4.4
newobj instance void int32[0...,0...,0...]::.ctor(int32,int32,int32)
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.2
ldc.i4.3
ldc.i4.s 10
call instance void int32[0...,0...,0...]::Set(int32,int32,int32,int32)
ldloc.0
ldc.i4.1
ldc.i4.2
ldc.i4.3
call instance int32 int32[0...,0...,0...]::Get(int32,int32,int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
10

An array can have any rank. The above array is a three dimensional one and has a
rank of 3. So, we have to use the array handling functions to work with it. The
rank of an array is declared by using commas between the square brackets. The
number of commas plus one is the rank of an array. If no specific bounds are
supplied, the default is 0 for the lower bound and infinity for the upper bound.

You can specify none, one or both bounds. The CLR, in this version, ignores all the
bounds information you provide, and only pays heed to the numbers placed on the
stack at the time of creation of the array. Here, you have to supply all the
information. Only those arrays that have a lower bound of 0 in all their dimensions
are CLS compliant.

In the above example, three bound values are placed on the stack and the array
constructor is called with three values. We are not allowed to use newarr, as the
above array is not a vector. Now to set a member to a value, the three index values
are placed on the stack in a specific order. The same Set function is called, but this
time with four parameters. The same rules are relevant for the Get function also.
The point that we want to make is that the magnitude of the rank has no effect on
the way the array is handled. No substantial changes are required.

There are two array constructors that can be used. The first takes the same
number of parameters as the rank of the array. The second constructor takes
twice the number of parameters as the rank of the array. In the second type of
constructor, the first two parameters specify the lower bound and the length of the
first dimension, the next two parameters specify the lower bound and the length of
the second dimension, and so on. The first constructor always assumes the lower
bound to be zero.

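a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[5...10,3...7] V_0)
ldc.i4.5
ldc.i4.6
ldc.i4.3
ldc.i4.5
newobj instance void int32[5...10,3...7]::.ctor(int32,int32,int32,int32)
stloc.0
ldloc.0
ldc.i4.6
ldc.i4.5
ldc.i4.s 10
call instance void int32[5...10,3...7]::Set(int32,int32,int32)
ldloc.0
ldc.i4.6
ldc.i4.5
call instance int32 int32[5...10,3...7]::Get(int32,int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
10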

ldc.i4.6
ldc.i4.5

We then change the above two lines to

ldc.i4.1
ldc.i4.2

and we see the following exception thrown at us.

at zzz.vijay()

An array with a lower and an upper bound, having a rank of 2, is created. The first
dimension starts at 5 and ends at 10. Thus, on the stack is placed first the lower
bound i.e. 5, and then the length of the dimension. The upper bound is never placed
on the stack. As the dimension starts at 5 and ends at 10, the length is calculated
as follows: 10 - 5 + 1 = 6 (i.e. the upper bound - lower bound + 1). The same rule
holds true for the next dimension.

The rest of the code remains the same. When the index values 6, 5 of the array
member are changed to 1, 2, an exception is thrown. This is because the array
bounds for the first dimension are 5 to 10 and for the second dimension are 3 to 7.
Any attempt to cross the array bounds in any direction generates an exception.
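
C# has no syntax for non-zero lower bounds, but the library function Array.CreateInstance accepts a list of lengths and a list of lower bounds, in the same spirit as the lower bound and length pairs above. The sketch below assumes the bounds from the text, 5 to 10 and 3 to 7, and a current runtime; the class name BoundsSketch is ours.

```csharp
using System;

class BoundsSketch
{
    static void Main()
    {
        // length = upper bound - lower bound + 1
        int[] lengths     = { 10 - 5 + 1, 7 - 3 + 1 };   // 6 and 5
        int[] lowerBounds = { 5, 3 };

        Array a = Array.CreateInstance(typeof(int), lengths, lowerBounds);

        Console.WriteLine(a.GetLowerBound(0) + " to " + a.GetUpperBound(0));
        Console.WriteLine(a.GetLowerBound(1) + " to " + a.GetUpperBound(1));

        a.SetValue(10, 6, 5);                 // inside both ranges
        Console.WriteLine(a.GetValue(6, 5));

        try
        {
            a.GetValue(1, 2);                 // below both lower bounds
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("index outside the array bounds");
        }
    }
}
```

On a runtime that checks bounds, the last access is rejected because 1 and 2 lie below the lower bounds of 5 and 3.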

Array of Arrays

a.cs
class zzz
{
public static void Main()
{
int [][] a = new int[2][];
a[0] = new int[1];
a[1] = new int[10];
System.Console.WriteLine(a.Length);
System.Console.WriteLine(a[0].Length);
}
}

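a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[][] V_0)
ldc.i4.2
newarr int32[]
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.1
newarr [mscorlib]System.Int32
stelem.ref
ldloc.0
ldc.i4.1
ldc.i4.s 10
newarr [mscorlib]System.Int32
stelem.ref
ldloc.0
ldlen
conv.i4
call void [mscorlib]System.Console::WriteLine(int32)
ldloc.0
ldc.i4.0
ldelem.ref
ldlen
conv.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
2
1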

Let us explore jagged arrays, where an array member can contain another array of a
different length. We are creating an array that has an irregular shape. In C#, the
syntax to create an array of arrays consists of two pairs of square brackets [][].
We first create the array using only the first dimension. This is done by using
newarr and stating an array data type as a parameter. We then initialize V_0 with
this array reference.

Now, since we have to create two separate one dimensional arrays, we first place the
array reference on the stack. Then we place the index of the array member we want
to initialise, followed by the size of the new array. Finally, we call newarr to create an
array of ints and place the reference on the stack. stelem.ref is used to initialize the
array member with this array reference. The same is repeated for the second
member a[1] also.

The instruction ldlen returns the length of the array. For the main array, its
reference is placed on the stack using ldloc.0. For the second length, ldelem.ref is
used to first fetch the reference of the array out of the first array member a[0], and
then ldlen is used to obtain the length.

a.cs
class zzz
{
public static void Main()
{
int[][] a = new int[2][] { new int[] {2,3}, new int[] {5,6,7} };
System.Console.WriteLine(a[0][1]);
System.Console.WriteLine(a[1][2]);
}
}

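a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[][] V_0, int32[][] V_1, int32[] V_2)
ldc.i4.2
newarr int32[]
stloc.1
ldloc.1
ldc.i4.0
ldc.i4.2
newarr [mscorlib]System.Int32
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.2
stelem.i4
ldloc.2
ldc.i4.1
ldc.i4.3
stelem.i4
ldloc.2
stelem.ref
ldloc.1
ldc.i4.1
ldc.i4.3
newarr [mscorlib]System.Int32
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.5
stelem.i4
ldloc.2
ldc.i4.1
ldc.i4.6
stelem.i4
ldloc.2
ldc.i4.2
ldc.i4.7
stelem.i4
ldloc.2
stelem.ref
ldloc.1
stloc.0
ldloc.0
ldc.i4.0
ldelem.ref
ldc.i4.1
ldelem.i4
call void [mscorlib]System.Console::WriteLine(int32)
ldloc.0
ldc.i4.1
ldelem.ref
ldc.i4.2
ldelem.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}

Output
3
7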

The above example is similar to its predecessor, though it is more elaborate and
complete. A jagged array is created that is made of two arrays of sizes 2 and 3
respectively. In C#, they can be initialized in one stroke. IL does it the hard way.
To fetch the value of a[1][2], it places the reference of the array on the stack. Then
it places 1, the first array index, on the stack. Thereafter, ldelem.ref is used to
obtain an array reference.

Thus, at first an array reference is pushed on the stack. Then 2 is placed on the
stack, and ldelem.i4 is used to get the member at index 2 of this new array. A jagged
array is treated as an array whose members contain other independent arrays.

An array of arrays is different from a multi dimensional array. A multi dimensional
array forms one memory block whereas an array of arrays holds references to other
arrays in memory. Thus, an array of arrays is slower in execution since it needs to
make an extra indirection to reach the final element.

We can also use pointers with arrays. The salient feature of an array of arrays is
that the first array merely stores the addresses of other arrays. The disadvantage of
a multi dimensional array is the fact that every row of a dimension has to be of the
same size.
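
The difference between one memory block and an array of references can be observed from C#. The sketch below is ours (class name JaggedVersusGrid included) and uses small arrays chosen only for illustration.

```csharp
using System;

class JaggedVersusGrid
{
    static void Main()
    {
        // One block: every row has exactly 3 members.
        int[,] grid = new int[2, 3];

        // Array of arrays: each member is a reference to its own array,
        // so the rows may differ in length.
        int[][] jagged = { new int[] { 2, 3 }, new int[] { 5, 6, 7 } };

        Console.WriteLine(grid.Length);        // total members in the block
        Console.WriteLine(jagged.Length);      // number of row references
        Console.WriteLine(jagged[0].Length);   // length of the first row
        Console.WriteLine(jagged[1].Length);   // length of the second row

        // Because the outer array only stores references, two members
        // can be made to point at the very same inner array.
        jagged[0] = jagged[1];
        Console.WriteLine(object.ReferenceEquals(jagged[0], jagged[1]));
    }
}
```

Reaching jagged[1][2] takes two steps, first fetching the inner array reference and then the int, which is the extra indirection mentioned above.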

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[][][] a)
ldc.i4.5
stloc a
ldloc a
ldc.i4.0
ldc.i4.3
ldloc a
ldc.i4.0
ldc.i4.1
ldc.i4 10
ldloc a
ldc.i4.0
ldc.i4.1
ldc.i4.5
ldc.i4 100
ldloc a
ldc.i4.0
ldc.i4.1
ldc.i4.5
ret
}
}

Output
100

Here, we shall see how to create an array a of type [][][]. We first create a local a of
type array of array of array. Thus, we have two levels of indirection. We want the
first or main array to have a size of 5 i.e. it should be able to store the references of
5 arrays in memory. The instruction ldc places the size 5 of this array on the stack.
Thereafter newobj is used to create the first dimension of this array. The instruction
stloc a initializes this array and ldloc a puts its reference on the stack.

Subsequently two values are placed on the stack. One is the index of the first
member a[0] and the other is the size of the array that this member should point to
i.e. 3. newobj creates an array called int32[][]. To store it in a[0] the Set function is
used. This function requires the index of the array as the first parameter. Hence, 0
is placed on the stack, even though newobj does not require it. It simplifies the call
of the Set function.

The next thing required is an int32[] to store in our int32[][]. So, the array a is
placed again on the stack and 0 is used to obtain the value of the array that has
just been created. The Get function does the job of retrieving values. Then, as
before, 1 is placed on the stack followed by the size of the new array i.e. 10. Finally,
newobj creates a simple array int32[] and places it on the stack, which is then
stored using the Set function.

Remember that the value 1 has already been placed on the stack. To execute the
operation a[0][1][5] = 100 the member a[0] is required. So, the array reference a is
placed on the stack followed by 0 and the Get function is called.

To access a[0][1], as the first member a[0] of the array is already on the stack, all
that is required is placing 1 on the stack and calling Get again. Now, to store the
value in the member a[0][1][5], 5 is loaded on the stack. To fetch the value of
member a[0][1][5], the same procedure as before is followed. That is

load the array reference on the stack.
obtain the member 0 by using Get.
obtain the member 1 of this array.
finally, obtain the member 5 of this array.

The logic is the same as described earlier.
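
The same steps can be written out in C#, one level at a time. This is a sketch under the sizes mentioned in the text (5, 3 and 10); the class name and the intermediate variables level1 and level2 are ours and only serve to make each fetch visible.

```csharp
using System;

class ThreeLevelSketch
{
    static void Main()
    {
        int[][][] a = new int[5][][];   // main array: room for 5 references
        a[0] = new int[3][];            // a[0] refers to an int[][] of size 3
        a[0][1] = new int[10];          // a[0][1] refers to an int[] of size 10
        a[0][1][5] = 100;

        // Fetching the value back, step by step:
        int[][] level1 = a[0];          // obtain member 0 of the main array
        int[] level2 = level1[1];       // obtain member 1 of that array
        Console.WriteLine(level2[5]);   // finally, member 5
    }
}
```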

a.cs
using System;
public class zzz
{
public static void abc(int i, __arglist)
{
ArgIterator a = new ArgIterator(__arglist);
while (a.GetRemainingCount() > 0)
Console.WriteLine(__refvalue(a.GetNextArg(), int));
}
public static void Main()
{
abc(20, __arglist(1, 2, 3));
}
}

a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 20
ldc.i4.1
ldc.i4.2
ldc.i4.3
call vararg void zzz::abc(int32,...,int32,int32,int32)
ret
}
.method public hidebysig static vararg void abc(int32 i) il managed
{
.locals (value class [mscorlib]System.ArgIterator V_0)
ldloca.s V_0
arglist
call instance void [mscorlib]System.ArgIterator::.ctor(value class
[mscorlib]System.RuntimeArgumentHandle)
br.s IL_001d
IL_000b: ldloca.s V_0
call instance typedref [mscorlib]System.ArgIterator::GetNextArg()
refanyval [mscorlib]System.Int32
ldind.i4
call void [mscorlib]System.Console::WriteLine(int32)
IL_001d: ldloca.s V_0
call instance int32 [mscorlib]System.ArgIterator::GetRemainingCount()
ldc.i4.0
bgt.s IL_000b
ret
}
}

Output
1
2
3

This example builds upon the earlier example, which has a function that accepts a
variable number of arguments. In C#, __arglist enables us to implement a function
that accepts a variable number of arguments.

Internally, in IL, the function is marked with a vararg modifier, and the ArgIterator
class is used to display the values in a loop.
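
For comparison, everyday C# code usually reaches for the params keyword rather than __arglist. Note that this is a different technique: with params the compiler packs the extra arguments into an ordinary array, and no vararg modifier or ArgIterator is involved. The sketch below, with names of our own choosing, displays the same three values.

```csharp
using System;

class ParamsSketch
{
    // 'rest' receives every argument after the first as an int array.
    static void Abc(int i, params int[] rest)
    {
        foreach (int value in rest)
            Console.WriteLine(value);
    }

    static void Main()
    {
        Abc(20, 1, 2, 3);
    }
}
```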
-14-

External DLLs

So far, all our code has been contained in one single IL file. This is not practical
because, in real life projects, hundreds of people work together, and the code they
write is placed in different files that must be shared or used by others.

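a.il
.module a.dll
.class private auto autochar zzz
{
.method public hidebysig static void abc() il managed
{
ldstr "Hi"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}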

In the above program, a class zzz is created that resides in a file called a.il. It
contains a single function abc. The class is private and the function is public since
we want to enable other programs to call the code located in our class. We
compile the above program by running ilasm on a.il with the dll option:

>ilasm a.il /dll

The assembler creates a file with a .dll extension and not an exe file.

A dll is used under Windows to store code that other programs can call. As we are
not specifying an executable or a stand-alone program, we have no directive called
assembly. Instead, we have used the directive module, which is given the name of
the dll. This directive is optional. IL creates one for us automatically, defaulting to
the name of the output file, if we don't specify one. It is a good idea to tell the world
that this is not an executable program, but one containing code for others to use.

We would now like to call this function abc from class zzz, which is located in a.il
from a function in b.il.

b.il
.assembly mukhi {}
.class public auto autochar yyy
{
.method public static hidebysig void Main() il managed {
.entrypoint
call void zzz::abc()
ret
}
}

We assemble this program as before. As usual we start with the directive assembly.
As stated earlier, we are planning to call a function abc from class zzz. When we
assemble the program, we do not get any error, but when we run the program, we
get the following exception:

Output
Exception occurred: System.TypeLoadException: Could not load class 'zzz'.
at yyy.Main()

The runtime cannot load the class zzz since it does not reside in the current
directory. If you remember, this function was created in a module called a.dll. Let
us get back to the drawing board and supply this piece of information to the
assembler.

b.il
.assembly mukhi {}
.class public auto autochar yyy
{
.method public static hidebysig void Main() il managed
{
.entrypoint
call void [.module a.dll]zzz::abc()
ret
}
}

Error
***** FAILURE *****

Oops! An error has been generated. If you remember, some time ago, we prefaced
the name of the class with the name of the dll file that contained the code.
We thought of doing the same in the above program, but this caused an error.
However, the assembler does not tell us where the error is. Most of the time it
behaves in such a secretive manner and keeps the line numbers of the code where
the error has occurred close to its chest.

b.il
.assembly mukhi {}
.module extern a.dll
.class public auto autochar yyy
{
.method public static hidebysig void Main() il managed
{
.entrypoint
call void [.module a.dll]zzz::abc()
ret
}
}

Now the assembler error disappears, but the runtime generates an exception.

Output
at yyy.Main()

Whenever we use the module directive in front of a function, the module must be
declared earlier. This is done using the same module directive with two parameters,
i.e. extern and the name of the module.

The extern indicates that some of the code that we will be using later will reside in
the file a.dll. Thus, if we do not declare a module as extern earlier in our file, we
cannot use it later to signify that the code comes from this module. However, we still
get an error at runtime, saying that the class could not be loaded.

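b.il
.assembly mukhi {}
.file a.dll
.module extern a.dll
.class public auto autochar yyy
{
.method public static hidebysig void Main() il managed
{
.entrypoint
call void [.module a.dll]zzz::abc()
ret
}
}

Output
Hi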

Now everything works as expected. This is because we added a file directive that
informed the runtime to load the file a.dll in memory, as it contains some code that
we are referring to. This class zzz could contain numerous functions and fields.
This is how we access the WriteLine function from the Console class.

Let us now explain a fundamental concept that the .NET world has introduced.

a.cs
public class zzz
{
public static void abc()
{
System.Console.WriteLine("bye");
}
}

We compile the above C# program as

csc /target:library a.cs

The above line produces a file called a.dll. When we run the same program b.exe,
the string "bye" is displayed. The point that we want to make is that we can call
code from a dll without being concerned whether the code was written in C# or in
any other programming language. Thus, we have no way of knowing which
language the code of the WriteLine function from the class Console has been
written in.

This is how the .NET world puts a stop to all debates on which programming
language is better. Finally, all the code is converted into IL code in the .NET world.

Let us now modify b.il to create a dll.

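b.il
.assembly b {}
.module b.dll
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void abc() il managed
{
ldstr "abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

We compile this file to a dll by running

ilasm b.il /dll

This creates a file called b.dll. The only change that we have introduced is the
addition of an assembly directive. Other than that, everything else remains the
same.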

Then we created a file called c.il as follows:

c.il
.assembly extern b {}
.assembly c {}
.class public auto ansi yyy extends [b]zzz
{
.method public static hidebysig void Main() il managed
{
.entrypoint
call void [b]zzz::abc()
ret
}
}

Output
abc

We assemble it as normal. When we run c.exe, abc is displayed. Here, we are
deriving from the class zzz that is present in the dll b. Whenever we derive from
a class in another assembly, we have to add the assembly extern directive in the
file c.il. Otherwise, the following error is displayed when the program runs:

Output
Exception occurred: System.MissingMethodException: Could not find the entry point.

c.il
.assembly c {}
.class public auto ansi yyy extends [b]zzz
{
.method public static hidebysig void Main() il managed
{
.entrypoint
call void abc()
ret
}
}

Error
Source file is ANSI
Creating PE file
Emitting members:
Global
Class 1 Methods: 1;
Resolving member refs:
***** FAILURE *****

The function abc may lie in the class zzz, from which the class yyy has been
derived, but we have to explicitly inform the assembler as to which class the
function should be called from. The assembler stays ignorant of this fact. The point
is that the file b.dll is not parsed to check for the functions contained in it.

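c.il
.assembly c {}
.class public auto ansi yyy extends [b]zzz
{
.method public static hidebysig void Main() il managed
{
.entrypoint
call void zzz::abc()
ret
}
}

The above program generates no error when we assemble it, but when we run the
program, we get the following exception:

Output
at yyy.Main()
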
No assumptions are made in the IL world. You have to explicitly state each and
every time that, the class zzz is located in module b.dll. Stating it once is not
enough. Now you know why it is better for programs to generate IL code. There is
too much repetition.

Now, modify a.il to create a dll and b.il to create an executable.

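a.il
.assembly a.dll {}
.module a.dll
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void abc() il managed
{
ldstr "abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}

b.il
.assembly extern a.dll {}
.assembly b {}
.class public auto ansi yyy extends [a.dll]zzz
{
.method public static hidebysig void Main() il managed
{
.entrypoint
call void [a.dll]zzz::abc()
ret
}
}

Output
abc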

The only change made here is that we have called the assembly a.dll and hence we
place the same name in the [] brackets. Also, we are not allowed to specify .module
in the [] brackets. We also do not have a .file directive in our file b.il, like we had
earlier.

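a.il
.assembly a.dll {}
.module a.dll
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void abc() il managed
{
ldstr "abc"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}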

We have made one small change in the file a.il only. The file b.il remains the same.
When we run b.exe, we get the following exception.

Output
Exception occurred: System.MethodAccessException: zzz.abc()
at yyy.Main()

The reason for the above error is that the class zzz has been made private and
hence cannot be accessed from outside. Private is the most restrictive access
modifier.

We then changed the access modifier of the class to public, but changed the access
modifier of the function to private. On doing so, we get the same exception again.
Thus, both the class and the function must be public if the function is to be
accessible from the outside.

We have exactly seven accessibility types. We have touched upon two of them,
public and private. We will now change the access modifier from private to family.
Thereafter, no error will be generated since classes derived from zzz are allowed to
call the function. But, on replacing family by assembly an error is generated, since
only code in the same assembly is allowed to call the function. Then, we have
variants on the above two. They are famandassem, which is true only if both the
conditions are true, and famorassem, which is true if either one is true. The last
one is privatescope.
-15-

A GUI Application in IL
So far, we have been writing small programs and rendering explanations for specific
concepts. The IL documentation contains a very large program that demonstrates
how to write a small application. Their application spans five files, creates four
dll's and has code that is spread over at least 30 pages.

We believe in what Newton once said, that he could see very far because he stood
on the shoulders of giants. We would like to emulate this thought. So, let us get
down to hard work in our last chapter. We have decided to use the same application
to explain the different concepts. We also expect you to read the original program
yourselves.

We first created a batch file a.bat as shown below. The batch file does everything for
us, including running the program. As far as possible, we have maintained the
same variable and class names that the original program contained. The only
change incorporated is that we have put everything into the following two files:

countdown.il that becomes an executable.
aaa.il that contains the rest of the code and becomes a dll.

a.bat
del *.dll
del *.exe
ilasm aaa.il /dll
ilasm countdown.il
countdown

countdown.il
.assembly CountDown {}
.assembly extern System.WinForms {}
.assembly extern System.Drawing {}
.file aaa.dll
.module extern aaa.dll
.class zzz extends [System.WinForms]System.WinForms.Form
{
.field class [.module aaa.dll]ctb counterBox
.field class [.module aaa.dll]ssb button
.method public static void Main()
{
.entrypoint
newobj instance void zzz::.ctor()
call void [System.WinForms]System.WinForms.Application::Run(class
[System.WinForms]System.WinForms.Form)
ret
}
.method instance void .ctor()
{
ldarg.0
call instance void [System.WinForms]System.WinForms.Form::.ctor()
ldarg.0
ldarg.0
newobj instance void [.module aaa.dll]ssb::.ctor(class
[System.WinForms]System.WinForms.Form)
stfld class [.module aaa.dll]ssb zzz::button
ldarg.0
dup
newobj instance void [.module aaa.dll]ctb::.ctor(class zzz)
stfld class [.module aaa.dll]ctb zzz::counterBox
.locals init (value class [System.Drawing]System.Drawing.Size size)
ldloca size
initobj value class [System.Drawing]System.Drawing.Size
ldloca size
ldc.i4 425
ldc.i4 300
call instance void [System.Drawing]System.Drawing.Size::.ctor(int32, int32)
ldarg.0
ldloc size
callvirt instance void zzz::set_Size(value class [System.Drawing]System.Drawing.Size)
ldarg.0
call value class [System.Drawing]System.Drawing.Color
[System.Drawing]System.Drawing.Color::get_CadetBlue()
callvirt instance void zzz::set_BackColor(value class
[System.Drawing]System.Drawing.Color)
ldarg.0
ldstr "CountDown"
callvirt instance void zzz::set_Text(class System.String)
ret
}
}

aaa.il
.module aaa.dll
.assembly extern mscorlib {}
.file CountDown.exe
.module extern CountDown.exe
.class public ctb extends [System.WinForms]System.WinForms.TextBox
{
.field class [.module CountDown.exe]zzz parent
.field static family int32 counterDefault at Vijay
.method instance void .ctor(class [.module CountDown.exe]zzz parent)
{
ldarg.0
call instance void [System.WinForms]System.WinForms.TextBox::.ctor()
ldarg.0
ldarg parent
stfld class [.module CountDown.exe]zzz ctb::parent
.locals (value class [System.Drawing]System.Drawing.Point point)
ldloca point
ldc.i4 75
ldc.i4 100
call instance void [System.Drawing]System.Drawing.Point::.ctor(int32, int32)
ldarg.0
ldloc point
call instance void [System.WinForms]System.WinForms.TextBox::set_Location(value
class [System.Drawing]System.Drawing.Point)
ldarg parent
call instance class [System.WinForms]System.WinForms.Control/ControlCollection
[System.WinForms]System.WinForms.Form::get_Controls()
ldarg.0
callvirt instance void
[System.WinForms]System.WinForms.Control/ControlCollection::Add(class
[System.WinForms]System.WinForms.Control)
ldarg.0
ldsfld int32 ctb::counterDefault
call class System.String [mscorlib]System.Int32::ToString(int32)
callvirt instance void [System.WinForms]System.WinForms.TextBox::set_Text(class
System.String)
ret
}
}
.data Vijay = int32(3)
.class public ssb extends [System.WinForms]System.WinForms.Button
{
.method instance void .ctor(class [System.WinForms]System.WinForms.Form parent)
{
ldarg.0
call instance void [System.WinForms]System.WinForms.Button::.ctor()
.locals init (value class [System.Drawing]System.Drawing.Size size, value class
[System.Drawing]System.Drawing.Point point)
ldloca point
initobj value class [System.Drawing]System.Drawing.Point
ldloca point
ldc.i4 100
ldc.i4 200
call instance void [System.Drawing]System.Drawing.Point::.ctor(int32, int32)
ldarg.0
ldloc point
call instance void [System.WinForms]System.WinForms.Button::set_Location(value
class [System.Drawing]System.Drawing.Point)
ldloca size
initobj value class [System.Drawing]System.Drawing.Size
ldloca size
ldc.i4 200
ldc.i4 50
call instance void [System.Drawing]System.Drawing.Size::.ctor(int32, int32)
ldarg.0
ldloc size
callvirt instance void [System.WinForms]System.WinForms.Button::set_Size(value class
[System.Drawing]System.Drawing.Size)
ldarg.0
call value class [System.Drawing]System.Drawing.Color
[System.Drawing]System.Drawing.Color::get_Gold()
callvirt instance void [System.WinForms]System.WinForms.Button::set_BackColor(value
class [System.Drawing]System.Drawing.Color)
ldarg.0
call value class [System.Drawing]System.Drawing.Color
[System.Drawing]System.Drawing.Color::get_Navy()
callvirt instance void [System.WinForms]System.WinForms.Button::set_ForeColor(value
class [System.Drawing]System.Drawing.Color)
ldarg.0
ldstr "Arial"
ldc.r4 20
newobj instance void [System.Drawing]System.Drawing.Font::.ctor(class
System.String, float32)
callvirt instance void [System.WinForms]System.WinForms.Button::set_Font(class
[System.Drawing]System.Drawing.Font)
ldarg.0
ldstr "Start"
callvirt instance void [System.WinForms]System.WinForms.Button::set_Text(class
System.String)
ldarg parent
call instance class [System.WinForms]System.WinForms.Control/ControlCollection
[System.WinForms]System.WinForms.Form::get_Controls()
ldarg.0
callvirt instance void
[System.WinForms]System.WinForms.Control/ControlCollection::Add(class
[System.WinForms]System.WinForms.Control)
ret
}
}

We have repeated some of the earlier explanations to refresh your memory. The
explanation is also a summary of what we have learnt so far.

When we run the above program, a button and a text box are displayed on the
screen. The button has a different color and the window has a title.

This large program enlightens us on designing a GUI application in IL. To write one
ourselves, we start with the main program which is countdown.il.

The assembly is named CountDown for want of a better name. The code for the
GUI classes resides in a file called System.WinForms.dll and belongs to the
namespace System.WinForms. The type is prefaced with the dll name, which is
optional in the case of mscorlib.dll.

As we are referring to code in an external assembly, the extern parameter is given
after the directive assembly. This is followed by the name of the assembly that
contains the code. The extern directive is supplied for System.Drawing also. No
error is generated if an extern directive is mentioned and the code within the file is
not executed.

Most of our code is contained in classes and resides in a file aaa.il. The assembler
converts the il file into a dll, hence the file name is aaa.dll. The two directives used
are:

The first specifies the name of the physical file using the file directive.
The second is the module extern directive.

We will not repeat the code explanations that have been dwelt upon earlier.

We have created two fields:

The first one is named counterBox and typed ctb. The class ctb exists in the aaa.dll
file, hence the type of the field is prefaced with the .module directive and the name
of the dll. This information is mandatory when referencing a type in an external
dll. In the same vein, the field named button is of type class ssb, which also resides
in the file aaa.dll.

The class zzz is derived from the class System.WinForms.Form. To build a GUI
application, or in Microsoft parlance, a WinForms application, the class has to be
derived from Form in the namespace System.WinForms.

The code execution begins in Main as the entrypoint directive is placed in this
function. Within this function, a new object zzz is created and placed on the stack.
This facilitates the next function call, Run, which requires a Form object on the
stack.

The object created by the newobj instruction is all that we need to place an empty
window on our screen. newobj calls the constructor of class zzz i.e. .ctor, which
populates the window with a variety of buttons, text boxes and widgets.

A look at the constructor of class zzz:

ldarg.0
call instance void [System.WinForms]System.WinForms.Form::.ctor()

At first, the constructor of the base class Form is called. Prior to the call, the this
pointer is the only value that is placed on the stack, as the constructor of the base
class Form takes no parameters. The this pointer is stored in the invisible first
parameter to every non static function and its value can be accessed using the
instruction ldarg.0.

The next job is creating an object, an instance of class ssb i.e. a button. The
instruction newobj comes into focus again and the constructor of the class ssb is
called.

As explained earlier, the this pointer is placed on the stack, but it is done twice. The
second this pointer is the parameter to the constructor. A reference to the newly
created object on the stack is stored in the field button, so that it can be used later.

To store the return value of newobj in a field, the first this pointer is used. This
proves that to access a field, we first need a reference to the object that contains the
field.

The return value of newobj that is placed on the stack can be stored in a local, if
so desired. But, to store it in a field, the this pointer must be loaded on the stack
prior to calling newobj. newobj then removes the second this pointer from the stack.

In short, newobj creates the object, supplies its this pointer and then calls the
constructor. When a constructor is called directly, the this pointer must be placed
on the stack first. With newobj it is impossible for us to place the this pointer of the
new object on the stack, as the object has not yet been created.

The instruction stfld not only requires the name of the field, but also its data type,
since the field can be in another assembly. Incidentally, it takes the same effort to
use an entity from another assembly as it takes to use the same entity located in
our file. The only difference is in the use of square brackets [] and in the name of
the module or assembly.

After the button, the next widget to be created is a text box of type ctb.

The constructor of this class needs a parameter, as in the ssb class. Here, instead of
repeating the ldarg.0 instruction twice, the dup instruction is used. This
instruction simply duplicates the value present on the stack. ldarg.0 places the this
pointer on the stack and dup creates one more copy of the this pointer.

You are free to choose between two ldarg.0 statements and the dup instruction.
We will focus on the role of the constructors a little later, after a brief synopsis of
the program.

The reference to the newly created object on the stack, like that of the button, is
stored in the field counterBox, so that it can be used later.
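
The pattern of handing the this pointer to a child's constructor and storing the result in a field looks as follows in C#. This is only a sketch of the shape of the code: Window, Button1 and TextBox1 are hypothetical stand-ins of ours for zzz, ssb and ctb, and they do not draw anything.

```csharp
using System;

class Button1
{
    public Button1(Window parent)
    {
        Console.WriteLine("button created for " + parent.Name);
    }
}

class TextBox1
{
    public TextBox1(Window parent)
    {
        Console.WriteLine("text box created for " + parent.Name);
    }
}

class Window
{
    public string Name = "CountDown";
    Button1 button;
    TextBox1 counterBox;

    public Window()
    {
        // Each assignment needs 'this' twice: once as the constructor
        // argument, and once as the object whose field is being stored.
        button = new Button1(this);
        counterBox = new TextBox1(this);
    }

    static void Main()
    {
        new Window();
    }
}
```

Whether the compiler loads this twice or duplicates it on the stack is its own choice; the source code is the same either way.
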
The directive .locals very often used for creating variables can be placed anywhere
in the function, as styles evolve with time, but good programming style demands
that we use it at the beginning. A local variable called size of type Size within the
namespaces System.Drawing is created.

.locals init (value class [System.Drawing]System.Drawing.Size size)

Since it is a value type, we see the modifier value in front of class.
The init keyword initializes the locals to zero. ldloca places the address of size on
the stack and then initobj initializes the value class or structure at that address.

ldloca size

initobj is optional, but it is a good idea to use it. We cannot use newobj on a value
type.

Now to initialize this size object to the size of our opening window:
To accomplish this, the address of size is placed on the stack first, followed by the
width and height of the window.

ldloca size
ldc.i4 425
ldc.i4 300

Thereafter, the constructor of the Size class is called which initializes the size
object.


The Form class has a function or property called set_Size that initializes the
window size on the screen, depending upon the size object passed.

In order to change the look and feel of any GUI application, the parameters are first
placed on the stack and then the relevant functions are called. From now on, we
will not comment upon the unavoidable requirement of the this pointer on the
stack.
To change the foreground and background color of the window, a property called
get_CadetBlue from the Color class, a value class, is used.

[System.Drawing]System.Drawing.Color::get_CadetBlue()

The return value is placed on the stack and can either be an enum or a constant.
Remember properties are functions within a class.

The virtual function set_BackColor changes the background color.

callvirt instance void zzz::set_BackColor(value class
[System.Drawing]System.Drawing.Color)

We could go on endlessly changing the appearance of our window.
However, we will stop with just one more function, to change the title of our
window. To achieve this, the title is loaded on the stack as a string using ldstr and
then the function set_Text is called to change the title.

ldstr "CountDown"
callvirt instance void zzz::set_Text(class System.String)

Now, to understand the file aaa.il.

aaa.il eventually gets converted into a dll, hence the assembly directive is omitted.
Instead, a module directive stating the name of the module is inserted.

.module aaa.dll
.file CountDown.exe

This file references some fields stored in the executable file countdown.exe, and
thus, we need both the file and assembly extern directives.

The first function to be called in the dll is the constructor of class ssb, which derives
from the Button class. This is because countdown.il executes newobj on ssb, i.e. the
Button class. The Button class contains all the code needed to represent a Button
object on the screen. As usual, the first call is to the constructor of the base class
Button.


Two locals are created thereafter:

One to store a width and height dimension, called size.
Another to represent a point on the screen, in x and y coordinates, called
point.

.locals init (value class [System.Drawing]System.Drawing.Size size,value class
[System.Drawing]System.Drawing.Point point)

The point member is initialized in the same manner as the size member. point is
then loaded on the stack and set_Location from the Button class is called.

ldloc point
call instance void [System.WinForms]System.WinForms.Button::set_Location(value

Remember, the this pointer in this case is an ssb, a Button reference type.

Every Winforms widget has a set_Size function that lets us set the size of the
widget. The color can be set as shown before. To create a font object, besides the
this pointer, we need only two parameters to the constructor:

The first is the name of the font.
The second is a point size for the font.

ldarg.0
ldstr "Arial"
ldc.r4 20
newobj instance void [System.Drawing]System.Drawing.Font::.ctor(class
System.String, float32)
callvirt instance void [System.WinForms]System.WinForms.Button::set_Font(class
[System.Drawing]System.Drawing.Font)

This newly created font object is placed on the stack and the function set_Font is
called.

ldstr "Start"
System.String)

To place the label "Start" on the button, the string is loaded on the stack using ldstr
and thereafter the function set_Text is called. It is as simple as that.

The next instruction in the sequence is loading the parameter parent on the stack.

ldarg parent
ldarg.0

If you remember, the caller placed its this pointer on the stack twice before newobj.
One of those copies arrives here as the parameter parent, whereas ldarg.0 now refers
to the ssb object being constructed. The function get_Controls
places a Control$ControlCollection object on the stack. The this pointer is then
loaded and the virtual function Add is called. Add adds the newly created control
to the list of controls that are finally to be displayed by Winforms. The function Add
belongs to the class Control$ControlCollection and hence a reference to this class is
required. It is easier in a language like C#, which shields us from passing the this
pointer.

The next widget to be displayed on the window is the text box. The text box class is
called ctb and derives from TextBox. As usual, the constructor is called. We will
repeat this aspect of the code, i.e. the calling of the constructor, for the last time. If you
do not call the base class constructor, you will be in serious trouble.

To set the location, set_Location is used and then the control is added to the list of
controls using the Add function. How does Winforms display the number 3 in the
text box?

For this purpose, a field called counterDefault is created, which is static and an
int32. Tagged along with it is the modifier at, and then a name, Vijay.

.field static family int32 counterDefault at Vijay

This denotes that the variable counterDefault will receive its initial value from Vijay.

There is a directive called .data that creates a data item named Vijay and initialises it
to the number 3 using int32.

.data Vijay = int32(3)

The .data directive places the value 3 in the .data section of the PE
executable file. The PE file is divided into smaller parts called sections. All the code
goes into a section called .text and all the data goes into a section called .data. We
thus do not have to initialize the variable in a constructor.
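
As an illustration of what placing the value 3 in the data section means at the byte level, the following Python sketch packs the number as a 32-bit little-endian integer, which is how an int32 is laid out in a PE file. The helper name int32_bytes is an assumption made for this example; it is not part of any IL tool.

```python
import struct


def int32_bytes(value):
    """Return the four bytes an int32 occupies, least significant byte first."""
    return struct.pack("<i", value)


raw = int32_bytes(3)
print(raw.hex())                      # 03000000
print(struct.unpack("<i", raw)[0])    # 3
```

The field counterDefault simply reads these four bytes back as a number when it is first used.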

This number is placed on the stack and a static function To_String is called, that
converts this int32 into a string. Then our good old function set_Text displays it in a
text box.

Let us now proceed to the next example.

countdown.il
.file aaa.dll
{
{
.entrypoint
ret
}
{
ldarg.0
ldarg.0
ldarg.0
ret
}
}

aaa.il
.module aaa.dll
.file CountDown.exe
{
.field class [mscorlib]System.EventHandler onClickEventHandler
{
ldarg.0
ldarg.0
ldstr "Start"
System.String)
ldarg parent
ldarg.0
ldarg.0
dup
dup
ldvirtftn instance void ssb::OnClick(class System.Object, class
[mscorlib]System.EventArgs)
newobj instance void [mscorlib]System.EventHandler::.ctor(class System.Object,
int32)
stfld class [mscorlib]System.EventHandler ssb::onClickEventHandler
ldarg.0
dup
ldfld class [mscorlib]System.EventHandler ssb::onClickEventHandler
call instance void [System.WinForms]System.WinForms.Button::add_Click(class
[mscorlib]System.EventHandler)
ret
}
.method virtual newslot public instance void OnClick(class System.Object, class
{
ldc.i4.1
call int8 User32::MessageBeep(unsigned int32)
pop
ret
}
}
.class abstract sealed public auto autochar User32 extends [mscorlib]System.Object
{
.method public static pinvokeimpl("user32.dll" cdecl) int8 MessageBeep(unsigned int32)
native unmanaged
{
}
}

In this example we simply display a button sans all the bells and whistles. When we
click on this button, the function OnClick gets called. In this function, we
simply call another function, MessageBeep, that rings a bell or produces a
beep sound. This function is present in a dll called user32.dll. In the file
countdown.il, no new code has been added. Modifications have only been made in
the file aaa.il.

After the control is registered with WinForms, the this pointer is placed thrice on
the stack. Also, the address of the virtual function OnClick is loaded on the stack.

ldarg.0
dup
dup

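This function is called each time the button is clicked. The two dup instructions
place further copies of the this pointer on the stack: ldvirtftn consumes one to look up
the virtual function, the EventHandler constructor takes one as its object parameter,
and stfld needs one as the object whose field is being set.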

Simultaneously, a new object is created that is an instance of the class
EventHandler.

int32)

This newly created object is stored in the field onClickEventHandler. Then the function
add_Click from the Button class registers this handler with Winforms.
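
The pairing of an object with a function address can be modelled in a few lines. The Python sketch below is a hypothetical analogy, not the WinForms API: the classes EventHandler and Button and the methods add_click, invoke and click are stand-ins invented for this example. It shows that the handler stores both the target object and the function, and that registering the handler lets the button call back later.

```python
class EventHandler:
    """Pairs a target object with a function, as newobj does for EventHandler."""
    def __init__(self, target, function):
        self.target = target
        self.function = function

    def invoke(self, sender, event_args):
        return self.function(self.target, sender, event_args)


class Button:
    def __init__(self):
        self.clicks_seen = 0
        self._click_handlers = []
        # Plays the role of: ldarg.0 / dup / dup / ldvirtftn / newobj / stfld
        self.on_click_event_handler = EventHandler(self, Button.on_click)
        # Plays the role of: ldarg.0 / dup / ldfld / call add_Click
        self.add_click(self.on_click_event_handler)

    def add_click(self, handler):
        self._click_handlers.append(handler)

    def on_click(self, sender, event_args):
        self.clicks_seen += 1

    def click(self):
        """Simulates the user pressing the button."""
        for handler in self._click_handlers:
            handler.invoke(self, None)


button = Button()
button.click()
button.click()
print(button.clicks_seen)
```

Because the handler keeps its own reference to the target, the callback still reaches the right object long after the constructor has returned.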


Now, the function OnClick gets called with a click on the button and a static
function MessageBeep is executed. The code for this function is not available as the
attributes on the function are native. This implies that the developers have
supplied the code and the function will be executed in the unmanaged state. To call
a function from a dll, pinvokeimpl is used, stating the name of the dll and the
calling convention.
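
For comparison, the same idea of naming a dll and calling an unmanaged function in it exists in other environments. The Python sketch below uses the standard ctypes module to call MessageBeep from user32.dll. The wrapper name message_beep and the decision to return False on systems other than Windows are assumptions of this example.

```python
import ctypes
import sys


def message_beep(sound_type=0):
    """Call the unmanaged MessageBeep in user32.dll; return True on success."""
    if sys.platform != "win32":
        return False                      # user32.dll exists only on Windows
    user32 = ctypes.windll.user32         # loads user32.dll, stdcall convention
    return bool(user32.MessageBeep(ctypes.c_uint(sound_type)))


if __name__ == "__main__":
    print(message_beep())
```

As with pinvokeimpl, only the name of the dll, the name of the function and the types of its parameters are stated; the code of the function itself lives outside the program.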

The program countdown.il remains the same for the next program. The modified file
aaa.il is given below.

aaa.il
.module aaa.dll
.assembly extern System.Timers {}
.file CountDown.exe
{
native unmanaged {}
}
{
{
ldarg.0
ldarg.0
ldstr "Start"
System.String)
ldarg parent
ldarg.0
ldarg.0
dup
dup
int32)
ldarg.0
dup
ret
}
{
.locals (class [System.Timers]System.Timers.Timer V_0)
newobj instance void [System.Timers]System.Timers.Timer::.ctor()
stloc.0
ldloc.0
ldnull
ldftn void ssb::OnTimedEvent(class System.Object,class [mscorlib]System.EventArgs)
newobj instance void [mscorlib]System.EventHandler::.ctor(class System.Object,int32)
call instance void [System.Timers]System.Timers.Timer::add_Tick(class
ldloc.0
ldc.r8 300.
call instance void [System.Timers]System.Timers.Timer::set_Interval(float64)
ldloc.0
ldc.i4.1
callvirt instance void [System.Timers]System.Timers.Timer::set_Enabled(bool)
ret
}
.method public hidebysig static void OnTimedEvent(class System.Object source,class
[mscorlib]System.EventArgs e) il managed
{
ldc.i4.1
pop
ret
}
}

In this program, when the button is clicked on, for 3 seconds nothing happens.
Thereafter, beeps are heard. Then on, after every 3 seconds, a beep sound is heard.
This is a program that does nothing for three seconds and then activates some
code.

These programs are timer-based.

The program aaa.il has only one change incorporated in it, within the OnClick
function. A local that looks like class Timer is declared, and an object that looks like
class Timer is created using newobj.


The value returned is stored in the local V_0. Thereafter, the Timer object is again
loaded on the stack, followed by a NULL reference and the address of a static
function OnTimedEvent.

ldloc.0
ldnull
ldftn void ssb::OnTimedEvent(class System.Object,class [mscorlib]System.EventArgs)

This function is repeatedly called after a certain time period has elapsed.

Concurrently, an object that looks like an EventHandler is created.

newobj instance void [mscorlib]System.EventHandler::.ctor(class System.Object,int32)

This constructor, in addition to the this pointer, needs two more parameters, the
second being the address of our function.

ldloc.0

The add_Tick function then registers this handler with the timer, and the timer
reference held in the local variable is loaded on the stack once more.

So, the timeout period, i.e. the time duration after which the function is to be called,
is placed on the stack. This number is a float64, and hence, 8 bytes are allocated on
the stack for it. The set_Interval function sets the timeout period and the
set_Enabled function switches the timer on, which then runs periodically.

ldc.r8 300.

Add the following lines of code to the end of the function OnClick, just before the ret
instruction.

ldloc.0
ldc.i4.0
callvirt instance void [System.Timers]System.Timers.Timer::set_AutoReset(bool)

This code calls the function set_AutoReset. A value of 1 (true) makes the timeout
function get called over and over again; a value of 0 (false) makes it fire only once.
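
The interplay of the interval, the enabled flag and the auto reset flag can be sketched with a deterministic model. The Python class below is a simulation written for illustration only; it is not System.Timers.Timer, and the names FakeTimer, add_tick, advance and count_beeps are assumptions. Time is advanced by hand so that the behaviour is repeatable.

```python
class FakeTimer:
    """A hand-cranked timer: interval in milliseconds, optional auto reset."""
    def __init__(self, interval, auto_reset=True):
        self.interval = interval
        self.auto_reset = auto_reset
        self.enabled = False
        self.handlers = []
        self._elapsed = 0.0

    def add_tick(self, handler):
        self.handlers.append(handler)

    def advance(self, milliseconds):
        """Move the clock forward, firing handlers each time the interval elapses."""
        if not self.enabled:
            return
        self._elapsed += milliseconds
        while self.enabled and self._elapsed >= self.interval:
            self._elapsed -= self.interval
            for handler in self.handlers:
                handler()
            if not self.auto_reset:
                self.enabled = False      # a one-shot timer switches itself off


def count_beeps(auto_reset):
    beeps = []
    timer = FakeTimer(interval=3000.0, auto_reset=auto_reset)
    timer.add_tick(lambda: beeps.append("beep"))
    timer.enabled = True
    timer.advance(10000.0)                # ten seconds pass
    return len(beeps)


print(count_beeps(auto_reset=True), count_beeps(auto_reset=False))
```

In this model, ten seconds with a three second interval give three beeps when the timer resets itself and a single beep when it does not.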

countdown.il
.file aaa.dll
{
{
.entrypoint
ret
}
{
ldarg.0
ldarg.0
dup
ldarg.0
ldarg.0
ldfld class [.module aaa.dll]ctb zzz::counterBox
[System.WinForms]System.WinForms.Form, class [.module aaa.dll]ctb)
pop
ret
}
}

aaa.il
.module aaa.dll
.file CountDown.exe
{
.field int32 count
{
ldarg.0
ldarg.0
ldc.i4 3
stfld int32 ctb::count
ldloca point
ldc.i4 41
ldc.i4 100
ldarg.0
ldloc point
ldarg.0
ldarg.0
ldfld int32 ctb::count
System.String)
ldarg parent
ldarg.0
ret
}
.method virtual newslot instance void SetCount(int32 count1)
{
ldarg.0
ldarg count1
System.String)
ret
}
.method virtual newslot instance int32 GetCount()
{
ldarg.0
callvirt instance class System.String ctb::get_Text()
callvirt instance int32 [mscorlib]System.String::ToInt32()
ret
}
}
{
.field class ctb par1
.method instance void .ctor(class [System.WinForms]System.WinForms.Form
parent,class ctb aa)
{
ldarg.0
ldarg.0
ldarg.2
stfld class ctb ssb::par1
ldarg.0
ldstr "Start"
System.String)
ldarg parent
ldarg.0
ldarg.0
dup
dup
int32)
ldarg.0
dup
ret
}
{
.locals ( int32 i)
ldarg.0
ldfld class ctb ssb::par1
callvirt instance int32 ctb::GetCount()
ldc.i4.1
sub
stloc.0
ldc.i4.0
ldloc.0
bgt a1
ldarg.0
ldloc.0
callvirt instance void ctb::SetCount(int32)
a1:
ret
}
}

The above program simply displays the number 3 in the edit box. With every click
on the button, the number decreases by 1. When the value becomes 0, the
execution of the program stops.

Let us now understand what goes into writing such a program.

In countdown.il, a field counterBox is created that stores the reference of the text
box. Since the reference value is saved in a field, any method can access the text
box if it possesses this value. Bear in mind that any call to a function in the text box
class ctb, or access to the value stored in the text box, needs the reference of the
text box on the stack.

At first, a text box object is created and the value is stored in counterBox. Then
this reference to the text box is passed on to the constructor of the button class, as
the second parameter. The button can now call any method from the text box class
by simply placing this reference on the stack. The button constructor stores the
address of this text box in a field for later use. A point to note here is that all the
locals of a method die at the end of the method, whereas fields are perpetual.
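
The arrangement described above, in which the button keeps a reference to the text box in a field and uses it on every click, can be sketched as follows. This is a Python analogy with invented class names (TextBoxModel and ButtonModel); only the names par1, GetCount, SetCount and OnClick are taken from the listing.

```python
class TextBoxModel:
    """Plays the role of ctb: the count lives in the displayed text."""
    def __init__(self, start):
        self.text = str(start)

    def get_count(self):                  # GetCount: text -> number
        return int(self.text)

    def set_count(self, count):           # SetCount: number -> text
        self.text = str(count)


class ButtonModel:
    """Plays the role of ssb: par1 is a field, so it outlives the constructor."""
    def __init__(self, text_box):
        self.par1 = text_box

    def on_click(self):
        i = self.par1.get_count() - 1
        if 0 > i:                         # the comparison made with 0 pushed first
            return
        self.par1.set_count(i)


box = TextBoxModel(3)
button = ButtonModel(box)
shown = []
for _ in range(5):
    button.on_click()
    shown.append(box.text)
print(shown)
```

Five clicks are simulated here on purpose: once the text reaches 0, further clicks leave it unchanged.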

The constructor of the text box has been explained before. The first change
incorporated is in the constructor of the button class, i.e. ssb. Here, the this
pointer is placed on the stack, and then, using ldarg.2, the reference of the text box is
also placed on the stack. Thereafter, the reference of the text box is stored in the
field par1.

ldarg.0
ldarg.2

The rest of the code ensures that the function OnClick gets called each time the
button is clicked.

In function OnClick, a local int32 is created to store the current value of the text
box. After that, the this pointer is placed on the stack, and the value of the field
par1 is retrieved. Using the reference to the text box on the stack, function
GetCount from class ctb is called.

.locals ( int32 i)
ldarg.0

This function places its this pointer on the stack, and then calls the
virtual function get_Text from the text box class.

{
ldarg.0
ret
}

This function places a string, representing the text within the text box, on the
stack. This string is converted into a number by calling the function ToInt32, which
resides in the String class. GetCount returns this number on the stack as the value
of the text box.

After placing the number 1 on the stack, sub is called.

ldc.i4.1
sub
stloc.0

This instruction now subtracts 1 from the value present earlier on the stack. The
number happens to be the value stored in the text box. This new value is stored in
the local i to make our programming easier.

Since IL has no equivalent of the if statement, the bgt instruction is used to
compare two values. 0 is placed on the stack, followed by the value of the variable i.

ldc.i4.0
ldloc.0
bgt a1

bgt jumps when the first value pushed is greater than the second. If the value of i is
negative, 0 is greater than i and the bgt instruction jumps to label a1, which is at the
end of the function. If i has a value of 2, or even 0, then no jump takes place as the
first value is not larger than the second.
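
The comparison can be checked with a tiny model. The Python functions below are a sketch written for this explanation and are not part of any IL tool; the names bgt and jumps_to_a1 are assumptions. The model mimics how bgt pops two values and branches only when the value pushed first is greater than the value pushed second.

```python
def bgt(stack):
    """Pop value2, then value1; report whether the branch is taken (value1 > value2)."""
    value2 = stack.pop()
    value1 = stack.pop()
    return value1 > value2


def jumps_to_a1(i):
    stack = []
    stack.append(0)       # ldc.i4.0
    stack.append(i)       # ldloc.0
    return bgt(stack)     # bgt a1


for i in (2, 0, -1):
    print(i, jumps_to_a1(i))
```

Only the negative value takes the branch, which is why the text box is still updated when the count reaches 0 but is left alone on any later click.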

The this pointer or reference of the text box is again pushed on the stack, followed
by the new value of i and then, the function SetCount is called. This function
simply changes the value displayed in the text box.

ldarg.0
ldloc.0

This function loads the parameter passed, i.e. count1, on the stack, and uses
ToString from the int32 class to convert it into a string. The string is placed on the
stack. Finally set_Text is called to change the value displayed. This function is the
reverse of the function GetCount.

{
ldarg.0
ldarg count1
System.String)
ret
}

The last part of the code only gets called if the value of the local i is zero or positive.
There is no change in the program countdown.il, and the file aaa.il now looks as shown
below.

aaa.il
.module aaa.dll
.file CountDown.exe
{
.field int32 count
{
ldarg.0
ldarg.0
ldc.i4 3
stfld int32 ctb::count
ldloca point
ldc.i4 41
ldc.i4 100
ldarg.0
ldloc point
ldarg.0
ldarg.0
ldfld int32 ctb::count
System.String)
ldarg parent
ldarg.0
ret
}
{
ldarg.0
ldarg count1
System.String)
ret
}
{
ldarg.0
ret
}
}
{
.field class ctb par1
.field class [System.Timers]System.Timers.Timer timer
.method instance void .ctor(class [System.WinForms]System.WinForms.Form
parent,class ctb aa)
{
ldarg.0
ldarg.0
ldarg.2
ldarg.0
ldstr "Start"
System.String)
ldarg parent
ldarg.0
ldarg.0
dup
dup
int32)
ldarg.0
dup
ret
}
{
stloc.0
ldloc.0
ldarg.0
stfld class [System.Timers]System.Timers.Timer ssb::timer
ldloc.0
ldarg.1
ldftn instance void ssb::OnTimedEvent(class System.Object,class
newobj instance void [mscorlib]System.EventHandler::.ctor(class
ldloc.1
ldc.r8 100.
ldloc.1
ldc.i4.1
callvirt instance void [System.Timers]System.Timers.Timer::set_Enabled(bool)
ret
}
.method public instance void OnTimedEvent(class System.Object source,class
[mscorlib]System.EventArgs e) il managed
{
.locals ( int32 i)
ldarg.0
ldc.i4.1
sub
stloc.0
ldc.i4.0
ldloc.0
bgt a1
ldarg.0
ldloc.0
br a2
a1:
ldarg.0
ldfld class [System.Timers]System.Timers.Timer ssb::timer
call instance void [System.Timers]System.Timers.Timer::Stop()
a2:
ldc.i4.1
pop
ret
}
}
{
native unmanaged {}
}

In this program, the number counts down automatically after a click on the button.
The program stops when the value becomes 0. The beep sound also stops. The class
ctb and the constructor of class ssb remain the same. The function OnClick is still
called at the click of the mouse.

The function OnClick does things differently. Using stfld, the timer object is first
stored in a field called timer.

stfld class [System.Timers]System.Timers.Timer ssb::timer
ldloc.0

This is because, a function from the timer class is to be called. The same value is
saved in a local V_0. This is a poor programming style, but nobody's looking. The
timer object calls the function OnTimedEvent periodically.

Earlier the function was static, but now it is an instance function and it is given the
this pointer instead of a NULL. The code that ran on the button click in the earlier
program has been moved to the timer tick. The only change is that the timer is
stopped when the text box value reaches ZERO. This is done with the function Stop
from the timer class, which is given the timer reference on the stack.

ldfld class [System.Timers]System.Timers.Timer ssb::timer

Let us put together all that we have learnt so far and write the largest program in
our book. This program should be followed up by reading the same program in the
IL documentation. It is relatively larger and spread over more files. Let us start from
the very beginning.

countdown.il
.file aaa.dll
{
.field class [.module aaa.dll]Counter counter
{
.entrypoint
ret
}
{
.locals (class [.module aaa.dll]Count count)
ldarg.0
ldarg.0
ldarg.0
ldarg.0
dup
ldarg.0
ldfld class [.module aaa.dll]ctb zzz::counterBox
newobj instance void [.module aaa.dll]Count::.ctor(class [.module
aaa.dll]ICountDisplay)
stloc count
ldarg.0
dup
ldfld class [.module aaa.dll]ssb zzz::button
ldloc count
newobj instance void [.module aaa.dll]BeepingCounter::.ctor(class [.module
aaa.dll]IStartStopEventSource, class [.module aaa.dll]Count)
stfld class [.module aaa.dll]Counter zzz::counter
ldarg.0
ldfld class [.module aaa.dll]ssb zzz::button
ldarg.0
ldfld class [.module aaa.dll]Counter zzz::counter
call instance void [.module aaa.dll]ssb::AddToTimeUp(class [.module aaa.dll]Counter)
ret
}
}

aaa.il
.module aaa.dll
.file CountDown.exe
.class public ctb extends [System.WinForms]System.WinForms.TextBox implements
ICountDisplay
{
.field class [.module CountDown.exe]zzz parent
{
ldarg.0
ldarg.0
ldarg parent
stfld class [.module CountDown.exe]zzz ctb::parent
ldloca point
ldc.i4 41
ldc.i4 100
ldarg.0
ldloc point
ldarg parent
ldarg.0
ret
}
.method virtual newslot instance void SetCount(int32 count)
{
ldarg.0
ldarg count
System.String)
ret
}
{
ldarg.0
ret
}
}
.data COUNTER_DEFAULT = int32(3)
.class interface abstract public auto autochar ICountDisplay
{
.method virtual abstract public hidebysig instance void SetCount(int32 count) il
managed {}
.method virtual abstract public hidebysig instance int32 GetCount() il managed {}
}
.class interface abstract auto autochar public IStartStopEventSource
{
.method virtual abstract public hidebysig instance void add_StartStopEvent(class
StartStopEventHandler) il managed {}
}
.class public Count extends [mscorlib]System.Object
{
.field int32 count
.field static family int32 counterDefault at COUNTER_DEFAULT
.field class ICountDisplay display
.method public instance void .ctor(class ICountDisplay display)
{
ldarg.0
ldarg.0
ldarg display
stfld class ICountDisplay Count::display
ldarg.0
ldsfld int32 Count::counterDefault
callvirt instance void Count::set_Count(int32)
ret
}
.property int32 Count()
{
.backing int32 count
.get instance int32 get_Count()
.set instance void set_Count(int32)
.other instance void refresh_Count()
}
.method virtual newslot instance int32 get_Count()
{
ldarg.0
ldfld int32 Count::count
ret
}
.method virtual newslot instance void set_Count(int32 newCount) synchronized
{
ldarg.0
ldarg newCount
stfld int32 Count::count
ldarg.0
ldfld class ICountDisplay Count::display
ldarg newCount
callvirt instance void ICountDisplay::SetCount(int32)
ret
}
.method virtual newslot instance void refresh_Count() synchronized
{
ldarg.0
dup
ldfld class ICountDisplay Count::display
callvirt instance int32 ICountDisplay::GetCount()
stfld int32 Count::count
ret
}
}
.class public Counter extends [mscorlib]System.Object
{
.field class [System.Timers]System.Timers.Timer timer
.field class [mscorlib]System.EventHandler timerEventHandler
.field class Count count
.field class IStartStopEventSource startStopEventSource
.field class StartStopEventHandler startStopEventHandler
.field class TimeUpEventHandler timeUpEventHandler
.method instance void .ctor(class IStartStopEventSource startStopEventSource, class
Count count)
{
ldarg.0
ldarg.0
ldarg startStopEventSource
stfld class IStartStopEventSource Counter::startStopEventSource
ldarg.0
ldarg count
stfld class Count Counter::count
ldarg.0
callvirt instance void Counter::SetupTimer()
ldarg.0
callvirt instance void Counter::SetupStartStopEvent()
ret
}
.method virtual newslot instance void SetupTimer()
{
ldarg.0
ldc.r8 1000
newobj instance void [System.Timers]System.Timers.Timer::.ctor(float64)
stfld class [System.Timers]System.Timers.Timer Counter::timer
ldarg.0
ldfld class [System.Timers]System.Timers.Timer Counter::timer
ldc.i4.1
call instance void [System.Timers]System.Timers.Timer::set_AutoReset(bool)
ldarg.0
dup
dup
ldvirtftn instance void Counter::OnTick(class System.Object, class
int32)
stfld class [mscorlib]System.EventHandler Counter::timerEventHandler
ldarg.0
ldarg.0
ldfld class [mscorlib]System.EventHandler Counter::timerEventHandler
ret
}
.method virtual newslot instance instance void SetupStartStopEvent()
{
ldarg.0
dup
ldftn instance void Counter::OnStartStop(int32)
newobj instance void StartStopEventHandler::.ctor(class System.Object, int32)
stfld class StartStopEventHandler Counter::startStopEventHandler
ldarg.0
ldfld class IStartStopEventSource Counter::startStopEventSource
ldarg.0
ldfld class StartStopEventHandler Counter::startStopEventHandler
callvirt instance void IStartStopEventSource::add_StartStopEvent(class
StartStopEventHandler)
ret
}
.method instance void OnStartStop(int32 action)
{
ldarg action
brtrue start
ldarg.0
call instance void Counter::Stop()
br done
start:
ldarg.0
call instance void Counter::Start()
done:
ret
}
.method private hidebysig instance void Start() il managed {
ldarg.0
ldfld class Count Counter::count
callvirt instance void Count::refresh_Count()
ldarg.0
callvirt instance int32 Count::get_Count()
ldc.i4.0
ble do_not_start
ldarg.0
call instance void [System.Timers]System.Timers.Timer::Start()
br done
do_not_start:
ldarg.0
callvirt instance void Counter::fire_TimeUpEvent()
done:
ret
}
.method private hidebysig instance void Stop()
{
ldarg.0
ret
}
.method virtual newslot family hidebysig instance void OnTick(class System.Object,
class [mscorlib]System.EventArgs) il managed {
ldarg.0
dup
ldc.i4.1
sub
callvirt instance void Count::set_Count(int32)
ldarg.0
ldc.i4.0
ble time_up
br done
time_up:
ldarg.0
call instance void Counter::Stop()
ldarg.0
callvirt instance void Counter::fire_TimeUpEvent()
done:
ret
}
.event TimeUpEventHandler TimeUpEvent
{
.addon instance void add_TimeUp(class TimeUpEventHandler 'handler')
.removeon instance void remove_TimeUp(class TimeUpEventHandler 'handler')
.fire instance void fire_TimeUpEvent()
}
.method virtual newslot instance void add_TimeUp(class TimeUpEventHandler 'handler')
il managed {
ldarg.0
dup
ldfld class TimeUpEventHandler Counter::timeUpEventHandler
ldarg 'handler'
call class[mscorlib]System.Delegate [mscorlib]System.Delegate::Combine(class
[mscorlib]System.Delegate, class [mscorlib]System.Delegate)
castclass TimeUpEventHandler
stfld class TimeUpEventHandler Counter::timeUpEventHandler
ret
}
.method virtual newslot instance void remove_TimeUp(class TimeUpEventHandler
'handler') il managed {ret}
.method virtual newslot instance void fire_TimeUpEvent()
{
ldarg.0
ldfld class TimeUpEventHandler Counter::timeUpEventHandler
callvirt instance void TimeUpEventHandler::Invoke()
ret
}
}
.class public BeepingCounter extends Counter
{
.method instance void .ctor(class IStartStopEventSource startStopEventSource, class
Count count) il managed {
ldarg.0
ldarg startStopEventSource
ldarg count
call instance void Counter::.ctor(class IStartStopEventSource, class Count)
ret
}
.method virtual instance void OnTick(class System.Object object, class
[mscorlib]System.EventArgs eventArgs)
{
ldarg.0
ldarg object
ldarg eventArgs
call instance void Counter::OnTick(class System.Object, class
ldarg.0
dup
ldc.i4.0
ble final_beep
ldc.i4.0
br beep_it
final_beep:
ldc.i4.1
beep_it:
callvirt instance void BeepingCounter::Beep(bool)
ret
}
.method virtual newslot instance void Beep(bool finalBeep)
{
ldarg finalBeep
brtrue 'final'
ldc.i4.0
br continue
'final':
ldc.i4 48
continue:
pop
ret
}
}
.class abstract sealed public auto autochar User32 extends [mscorlib]System.Object {
native unmanaged {}
}
.class private sealed auto autochar StartStopEventHandler extends
[mscorlib]System.MulticastDelegate {
.method public specialname rtspecialname hidebysig instance void .ctor(class
System.Object object, int32 'method') runtime managed {}
.method virtual newslot public hidebysig instance void Invoke(int32 action) runtime
managed {}
.method virtual newslot public hidebysig instance class ['mscorlib']System.IAsyncResult
BeginInvoke(int32 action,class ['mscorlib']System.AsyncCallback callback, class
System.Object object) runtime managed {}
.method virtual newslot public hidebysig instance void EndInvoke(class
['mscorlib']System.IAsyncResult result) runtime managed {}
}
.class private sealed auto autochar TimeUpEventHandler extends
[mscorlib]System.MulticastDelegate {
.method public specialname rtspecialname hidebysig instance void .ctor(class
System.Object object, int32 'method') runtime managed {}
.method virtual newslot public hidebysig instance void Invoke() runtime managed {}
.method virtual newslot public newslot hidebysig instance class
['mscorlib']System.IAsyncResult BeginInvoke(class ['mscorlib']System.AsyncCallback
callback, class System.Object object) runtime managed {}
.method virtual newslot public hidebysig instance void EndInvoke(class
['mscorlib']System.IAsyncResult result) runtime managed {}
}
.class public ssb extends [System.WinForms]System.WinForms.Button implements
IStartStopEventSource
{
.field class TimeUpEventHandler timeUpEventHandler
.field class StartStopEventHandler startStopEventHandler
.field bool state
{
ldarg.0
ldarg.0
ldc.i4.0
stfld bool ssb::state
ldarg.0
ldstr "Start"
System.String)
ldarg parent
ldarg.0
ldarg.0
dup
dup
int32)
ldarg.0
dup
ret
}
.method virtual newslot instance void SetState(int32 newState)
{
ldarg.0
ldarg newState
stfld bool ssb::state
ldarg newState
ldc.i4.0
beq stop_state
ldarg.0
ldstr "Stop"
callvirt instance void ssb::set_Text(class System.String)
br done
stop_state:
ldarg.0
ldstr "Start"
callvirt instance void ssb::set_Text(class System.String)
done:
ret
}
{
ldarg.0
callvirt instance void ssb::fire_StartStopEvent()
ret
}
.method public instance void AddToTimeUp(class Counter counter)
{
ldarg.0
dup
ldftn instance void ssb::OnTimeUp()
newobj instance void TimeUpEventHandler::.ctor(class System.Object, int32)
stfld class TimeUpEventHandler ssb::timeUpEventHandler
ldarg counter
ldarg.0
ldfld class TimeUpEventHandler ssb::timeUpEventHandler
call instance void Counter::add_TimeUp(class TimeUpEventHandler)
ret
}
.method virtual newslot instance void OnTimeUp()
{
ldarg.0
ldc.i4.0
callvirt instance void ssb::SetState(int32)
ret
}
.event StartStopEventHandler StartStopEvent
{
.addon instance void add_StartStopEvent(class StartStopEventHandler 'handler')
.fire instance void fire_StartStopEvent()
}
.method virtual newslot instance void add_StartStopEvent(class StartStopEventHandler
'handler')
{
ldarg.0
dup
ldfld class StartStopEventHandler ssb::startStopEventHandler
ldarg 'handler'
call class[mscorlib]System.Delegate [mscorlib]System.Delegate::Combine(class
[mscorlib]System.Delegate, class [mscorlib]System.Delegate)
castclass StartStopEventHandler
stfld class StartStopEventHandler ssb::startStopEventHandler
ret
}
.method virtual newslot instance void fire_StartStopEvent()
{
ldarg.0
ldfld bool ssb::state
brtrue stop_it
ldarg.0
ldc.i4.1 // start counter
br continue
stop_it:
ldarg.0
ldc.i4.0 // stop counter
continue:
ldarg.0
ldfld class StartStopEventHandler ssb::startStopEventHandler
ldarg.0
ldfld bool ssb::state
callvirt instance void StartStopEventHandler::Invoke(int32)
ret
}
}

Let us first start with the file countdown.il. We will proceed extremely cautiously, in
a step by step manner, so that you can understand how to write a large IL program.
Fields will be explained just prior to their use.

We start with the following directives, viz. assembly, assembly extern, file and
module.
In aaa.dll, class zzz extends from the class Form as it is a Winforms application.
The entrypoint method is Main. An object that looks like zzz is created using newobj
and it is then placed on the stack. Using the function Run, which accepts a
parameter that looks like Form, a window is displayed. This function keeps
executing until the user closes the window.

But before this, the constructor of class zzz puts in a lot of hard work.

The constructor of class zzz first calls the constructor of class Form. It is necessary
to call the base class constructor here, as it may have some initialization of its
own to perform. Why take a chance?

When the constructor of the base class (or super class) is called, the this pointer
is placed on the stack explicitly. This is unlike the instruction newobj, which
creates the object itself and therefore needs no this pointer.

Now to place a button on the screen:

A new object that looks like the button class ssb is created. Even though the
constructor of this class needs only one parameter, the this pointer is placed on
the stack twice. This is because the reference to the ssb object is then stored in
the field button using the stfld instruction, which needs a this pointer of its own.

Let us take a quick look at the constructor of class ssb, present in aaa.dll at
runtime and in the file aaa.il.

The class ssb implements the interface IStartStopEventSource. It is also derived
from the Button class. This interface has one function called add_StartStopEvent.
The question that comes to mind is: Why should we have an interface at all? The
only advantage of having it is that, an ssb object can now be referred to as an ssb
object, or a Button class or an interface. Any object that looks like
IStartStopEventSource will now be supplied with an ssb object. This too can be
overridden. Since ssb is derived from Button, it contains all the functionality of a
button object.

IL has no separate bool type on the evaluation stack. A bool is handled as an
integer that stores either 0 or 1. Thus, the field state indicates whether the timer
is on or off. As its value is set to zero, the timer is currently off. The label of the
button is set to Start. Then, using the Add function, the button is registered with
Winforms, to be displayed later.

With every click on the button, the function OnClick from the ssb class is to be
called. The function add_Click is used to accomplish this task.

The next call made is to the constructor of the ctb class, and the resulting reference
is stored in the field counterBox within the class zzz. The class ctb represents a text
box in the program. It is derived from the class TextBox and it implements the
interface ICountDisplay. This interface has two members, SetCount and GetCount,
whose job is to deal with the number displayed in the text box.

We seem to be digressing from the topic. Back again, the function set_Location from
the TextBox class is used to position the text box on the screen and eventually the
text box too is registered with Winforms.

Next, an instance of the class Count is created by passing the constructor a
textbox reference, disguised as ICountDisplay. This object is stored in the local
count. The class Count is a stand-alone class, since it is derived only from
Object. A class need not state explicitly that it derives from Object.

A field display, that looks like the interface ICountDisplay, is created and the
textbox reference is saved in it. Thus, the class Count uses this field, display, to
get and set the text box at will, since it now holds a reference to it.

We have also created a field called counterDefault, which is given a value directly
from the data section of the PE file. The .data directive uses the same name that
follows the at, i.e. COUNTER_DEFAULT, which we have initialized to the value 3. We
could also have used the instruction ldc to initialize this variable. We put its
current value, 3, on the stack and call the function set_Count from the class Count
itself. The number 3 is passed as the parameter newCount to this function.

There is an int32 field called count in the class Count. At first, the parameter
newCount is saved in this field and then the reference to the text box is stored in
the field display. Eventually, the function SetCount from class ctb is called; to be
precise, it is called through the interface ICountDisplay. This displays the number 3
in the textbox. The field count stores the currently displayed value.

Thus, the function set_Count does little work. It internally calls SetCount from class
ctb to do the real work. The function SetCount first loads the parameter 3, passed
to it, on the stack. It then converts it into a string using the static function ToString
from the int class, which places the string on the stack. This string on the stack is
used up by the set_Text function from the TextBox class, to actually display the
string in the text box.

While we are at it, let us also understand the corresponding get functions. The
function get_Count from the Count class simply returns the value of the field count.
GetCount from class ctb first uses the function get_Text to place the string stored in
the text box on the stack. Then, the static function ToInt32 from the String class is
called to convert it into a number. In either case, the value is left on the stack. The
last function in class Count is refresh_Count. This function first places the field
display on the stack, and directly calls the virtual function GetCount from the class
ctb. The return value is stored in the field count.

The directive property is for illustrative purposes only. It is used by tools to
document the property. To refresh your memory, in IL, a property is simply a
function, but in other languages like C#, properties have much more significance
and make programming simpler. The get and set directives have already been
explained earlier.

The backing directive denotes a field that will store the value of the property. In this
case, we use count. The other directive is for functions that are part of a property,
but do not fit cleanly in a get/set world, like refresh_Count. As mentioned earlier,
the directive property is optional.

An object that looks like BeepingCounter is the next in sequence to be created. Two
parameters are given to the constructor, one a button and the other a variable that
represents a count. The return value is held in a Counter field called counter. Note
that the type of the field is Counter, not BeepingCounter.

The class BeepingCounter is derived from Counter, which in turn is derived from
Object. In the constructor of BeepingCounter, both entities, i.e. the button (as the
IStartStopEventSource interface) and the count, are loaded on the stack. The
constructor of the base class Counter is the next routine to be executed.

In the constructor:

A field called startStopEventSource stores the button reference and count stores the
count reference. The function SetupTimer from the Counter class is then called to
create a timer object. The constructor of this object accepts the timer tick interval
as a float parameter. This timer object is stored in a field called timer.

The property set_AutoReset is set to true to ensure that a function called OnTick is
called when the timer is enabled. This is done by the function add_Tick of the timer
class. All this code has been explained in detail in the earlier programs.

The last function that this constructor calls is SetupStartStopEvent, from the
Counter class. In this function, at first the address of a function called OnStartStop
is placed on the stack. A new object is created that is an instance of the class
StartStopEventHandler. This class is derived from MulticastDelegate and is passed
the address of a function that is to be executed whenever the function Invoke is
called.

This class is nothing but a delegate and contains no code at all, since all the code of
its functions is supplied by the runtime. Thus, Invoke will call the function
OnStartStop. The field startStopEventHandler now contains the reference to this
delegate. Two more parameters are placed on the stack: a button reference, used to
call the function add_StartStopEvent from class ssb (or the interface it implements),
and the event handler just created above.

The function add_StartStopEvent registers the earlier function with the runtime. If
you recollect, in the delegate chapter we had explained the importance of the
function Combine. A reserved word, when it is to be used as a parameter name, has
to be placed in single quotes, as in the case of 'handler'. The result of Combine is
cast to the correct class and stored in the field startStopEventHandler within ssb.

To avoid any more confusion, remember that a delegate simply calls a function
indirectly. Thus, using this reference, the function OnStartStop can be called.
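
The idea that a delegate simply calls a function indirectly can be sketched in C
with function pointers. This is only a loose analogy and not how the runtime
implements delegates: the names delegate, delegate_combine, delegate_invoke,
on_start_stop and last_state_seen are all hypothetical, and unlike the real
Combine, which returns a new delegate, this sketch appends to an existing list.

```c
#include <stddef.h>

#define MAX_HANDLERS 8

/* Signature of a handler, mirroring Invoke(int32) in the IL above. */
typedef void (*start_stop_fn)(int state);

/* A very small stand-in for a multicast delegate: a list of function pointers. */
struct delegate {
    start_stop_fn handlers[MAX_HANDLERS];
    size_t count;
};

/* Loosely analogous to Combine: appends one more handler to the list.
   Returns 0 on success, -1 if the arguments are invalid or the list is full. */
int delegate_combine(struct delegate *d, start_stop_fn fn)
{
    if (d == NULL || fn == NULL || d->count >= MAX_HANDLERS)
        return -1;
    d->handlers[d->count++] = fn;
    return 0;
}

/* Loosely analogous to Invoke: calls every registered handler in order. */
void delegate_invoke(const struct delegate *d, int state)
{
    size_t i;
    for (i = 0; i < d->count; i++)
        d->handlers[i](state);
}

/* Sample handler playing the role of OnStartStop; it records what it was given. */
int last_state_seen = -1;

void on_start_stop(int state)
{
    last_state_seen = state;
}
```

The caller of delegate_invoke never names on_start_stop; it only holds the list,
which is the point the paragraph above makes about delegates.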

The last act of the zzz constructor is to place the button reference on the stack and
call function AddToTimeUp with a counter reference as a parameter. In this
function, the address of a function OnTimeUp is placed on the stack. Then an
object is created which is an instance of TimeUpEventHandler. This is a class
derived from MulticastDelegate, thus incorporating two delegates.

The function OnTimeUp is to be called when Invoke is executed. This delegate is
stored in the field TimeUpEventHandler in the ssb class. Finally, the add_TimeUp
function is called with a counter as a parameter. This function completes the
delegate handling by calling the Combine function and storing the reference in the
field TimeUpEventHandler. Two functions have been registered so far. Every delegate
that is added can also be removed. The function remove_TimeUp removes the delegate.

Like a property, an event also has a directive that has an effect on compilers and
tools only. An event called TimeUpEvent, an instance of class TimeUpEventHandler,
is available. It has the usual .addon and .removeon directives and also a fire
directive. The latter supplies information as to which function would be called
by Invoke. This is used for documentation purposes only. All the code called so far
sets up the actual framework. The action starts only when the button is
clicked. On doing so, the function OnClick from class ssb gets called.

In the function OnClick, another function, fire_StartStopEvent, is called from the
same class. In this function, a check is made on the value of the field called state.
If you flip back a few pages, it was given an initial value of zero or false. As its
value is false, the brtrue does not branch and a function called SetState is called
from class ssb with the value 1 as a parameter.

In the function, the parameter received becomes the new value of the field state,
whose value now changes from 0 to 1. This value and zero are placed on the stack and
the instruction beq is executed. A jump is made to the label if the two values are
equal or, in other words, if the value of state is zero. As they are unequal for the
moment, the string Stop is placed on the stack and set_Text is called to change the
label to Stop. Execution then proceeds to the label done.

We now return to the function fire_StartStopEvent, at the label continue. The
delegate startStopEventHandler and the state of the button, i.e. enabled or 1, are
placed on the stack and the Invoke function is called. This calls the
function OnStartStop in the class Counter.

This function checks the value of the parameter passed to it. Since the value of the
field state, 1, is on the stack, the instruction brtrue jumps to the label start: and
calls the function Start. In other circumstances, the function Stop is called. Thus,
OnStartStop is called through Invoke and it will in turn call either the function
Start or the function Stop, depending upon whether the parameter passed is 1 or 0,
respectively.

In the function Start, refresh_Count refreshes the counter, giving it a new value. If
the value is less than zero, the program is to be terminated. In that case, the code
at the label do_not_start: is executed, where the function fire_TimeUpEvent is
called, which ceases everything.

Next, the timer is started, so that the function OnTick is called every 1000
milliseconds. To achieve this, the function Start is called from the timer class.

The function OnTick is called from the class BeepingCounter and not from the class
Counter. Next, the function OnTick is called from the class Counter. In the function
OnTick, which is called every second, the new value of the counter is displayed. This
is achieved using the get_Count function. The value is decremented by 1 and the
set_Count function stores this new value back in the variable.

A check is performed on the value of count using ble. When zero, the timer is
stopped by calling the function Stop from the timer class.
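
The decrement-and-check logic described above can be sketched in C as follows. The
struct countdown and the function countdown_tick are hypothetical names used only
for illustration; the comparison uses <= to mirror the ble instruction, and the
running flag stands in for the real timer object.

```c
/* Hypothetical state of the counter: the current count and whether the
   timer is still running. */
struct countdown {
    int count;
    int running;
};

/* One timer tick: decrement the count and stop the timer once the count
   is less than or equal to zero (the ble check in the IL code).
   Returns 1 if this tick caused the timer to stop, 0 otherwise. */
int countdown_tick(struct countdown *c)
{
    if (!c->running)
        return 0;               /* a stopped timer does nothing */
    c->count--;
    if (c->count <= 0) {
        c->running = 0;         /* corresponds to calling Stop on the timer */
        return 1;
    }
    return 0;
}
```

Starting from a count of 3, the third call stops the timer, after which further
ticks leave the count untouched.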

The class BeepingCounter.

The function Beep is called where the parameter is given to change the occurrence
of beep sound. Obviously it should beep only once and terminate when the value of
count is 0.

In the function Beep, a value of 0 or 48 is placed on the stack, depending upon
whether the parameter is 0 or 1. Then, the actual MessageBeep function from class
User32 is called. This class is an abstract class and has a static function
MessageBeep that is of type pinvokeimpl. This specifies that the code of this
function is in user32.dll. Finally, the value returned by get_Count evaluates to
zero. This will result in a call to Stop of the timer, so that the function OnTick
is not called thereafter. The OnTimeUp function calls the function SetState with a
value of 0. This triggers off the shutdown procedure as explained above.

This is the most exhaustive explanation of any program we have given so far.

-16-

Appendix 1

Managed C++
There are many superficial differences between programming languages. Minor
modifications have been carried out in the C++ programming language, and the
modified language is called Managed Extensions for C++.

a.cpp
#using <mscorlib.dll>
using namespace System;
void main()
{
Console::WriteLine("hell");
}

We specify the dll that contains the code using a pre-processor directive, #using.
This pre-processor directive works in the same manner in Managed C++ as it does in
C#.

The C/C++ languages support global functions, and hence, Managed C++ also does
so. Thus, the function main is written with a lowercase m.

Static functions in C++ are separated from the class name by the symbol "::". The
same syntax is used here too. Thus, there is no variation from one .NET language to
another, and since they all finally compile to IL, we can use the same WriteLine
function in any programming language that we like.

We compile the above program a.cpp into an executable as follows:

>cl /CLR a.cpp

As before we run ildasm and get the following output.

// IMAGE_COR_VTABLEFIXUP[0]:
// RVA: 00008038
// Count: 0001
// Type: 0005
// [0000] (06000001)
// IMAGE_COR_VTABLEFIXUP[1]:
// RVA: 00008730
// Count: 0001
// Type: 0001
// [0000] (06000002)

.corflags 0x00000002
.vtfixup [1] int32 fromunmanaged at D_00008038 // 06000001
.vtfixup [1] int32 at D_00008730 // 06000002
.assembly extern mscorlib
{
.originator = (03 68 91 16 D3 A4 AE 33 )
.hash = (12 44 F8 C9 11 1F 14 3F 97 D7 AB AD E2 DF 1D E0
F2 9D 4F BC )
.ver 1:0:2204:21
}
.assembly extern Microsoft.VisualC
{
.originator = (03 68 91 16 D3 A4 AE 33 )
.hash = (A6 33 A1 3F A1 6E 0A F3 60 E1 22 E1 88 6E 72 1B
48 1F 71 98 )
.ver 7:0:9030:0
}
.assembly a as "a"
{
.ver 0:0:0:0
}
.module a.exe
// MVID: {188ADAE2-1C64-11D5-A11B-88B2F731D07B}
.class value private explicit ansi sealed $MultiByte$size$5
extends [mscorlib]System.ValueType
{
.pack 1
.size 5
} // end of class $MultiByte$size$5

//Global fields
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.field static privatescope value class $MultiByte$size$5 'unnamed-global-0$PST04000001' at D_00008030
//Global methods
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.method public static int32
modopt([mscorlib]System.Runtime.InteropServices.CallConvCdecl)
main() il managed
{
.vtentry 1 : 1
// Code size 17 (0x11)
.maxstack 1
IL_0000: ldsflda value class $MultiByte$size$5 'unnamed-global-0$PST04000001'
IL_0005: newobj instance void [mscorlib]System.String::.ctor(int8*)
IL_000a: call void [mscorlib]System.Console::WriteLine(class System.String)
IL_000f: ldc.i4.0
IL_0010: ret
} // end of global method main

.method public static pinvokeimpl(/* No map */)
unsigned int32 _mainCRTStartup() native unmanaged
{
.entrypoint
System.Security.SuppressUnmanagedCodeSecurityAttribute::.ctor() = ( 01 00 00 00 )
.vtentry 2 : 1
// Embedded native code
// Disassembly of native methods is not supported.
// Managed TargetRVA = 0x1
} // end of global method _mainCRTStartup

.data D_00008030 = bytearray (
68 65 6C 6C 00 00 00 00)

The code is quite large, but after reading it, you will realize that all the code
generated in the .NET family of languages is similar. Thus, we believe that those
who know IL have a great future in the world of computing.
-17-

Appendix 2

Demystifying ildasm.exe
In this chapter we will solve a great mystery for you. We unravel how the
programmers at Microsoft wrote the disassembler for IL, 'ildasm.exe'. To follow such
a program, you need to be familiar with the C programming language. If you are not,
then trying to understand the C programs is futile. We will therefore not try to
teach you C. Instead, in the next couple of pages, we will attempt to teach you the
structure of the PE file format under Windows and how ildasm creates its magic.

We first created a file b.il that reads as follows:

b.il
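.assembly mukhi{}
.class public auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "vijayO"
call void System.Console::WriteLine(class System.String)
ret
}
}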

We create b.exe and then run ildasm with the following syntax:

ildasm /out=a.txt b.exe /All

The output created by ildasm is as follows.

a.txt
// PE Header:
// Subsystem: 00000003
// Native entry point address: 0000227e
// Image base: 00400000
// Section alignment: 00002000
// File alignment: 00000200
// Stack reserve size: 00100000
// Stack commit size: 00001000
// Directories: 00000010
// 0 [0 ] address [size] of Export Directory:
// 2230 [4b ] address [size] of Import Directory:
// 0 [0 ] address [size] of Resource Directory:
// 0 [0 ] address [size] of Exception Directory:
// 0 [0 ] address [size] of Security Directory:
// 4000 [c ] address [size] of Base Relocation Table:
// 0 [0 ] address [size] of Debug Directory:
// 0 [0 ] address [size] of Architecture Specific:
// 0 [0 ] address [size] of Global Pointer:
// 0 [0 ] address [size] of TLS Directory:
// 0 [0 ] address [size] of Load Config Directory:
// 0 [0 ] address [size] of Bound Import Directory:
// 2000 [8 ] address [size] of Import Address Table:
// 0 [0 ] address [size] of Delay Load IAT:
// 2008 [48 ] address [size] of COM+ Header:

// Import Address Table
// mscoree.dll
// 00002000 Import Address Table
// 0000226e Import Name Table
// 0 time date stamp
// 0 Index of first forwarder reference
//
// 0 _CorExeMain

// Delay Load Import Address Table
// No data.
// CLR Header:
// 72 Header Size
// 2 Major Runtime Version
// 0 Minor Runtime Version
// 1 Flags
// 6000001 Entrypoint Token
// 205c [1d4 ] address [size] of Metadata Directory:
// 0 [0 ] address [size] of Resources Directory:
// 0 [0 ] address [size] of Strong Name Signature:
// 0 [0 ] address [size] of CodeManager Table:
// 0 [0 ] address [size] of VTableFixups Directory:
// 0 [0 ] address [size] of Export Address Table:
// 0 [0 ] address [size] of Precompile Header:
// Code Manager Table:
// default
// No data.
// Export Address Table Jumps:
// No data.

.corflags 0x00000001
.assembly extern /*23000001*/ mscorlib
{
.ver 0:0:0:0
}
.assembly /*20000001*/ mukhi
{
.ver 0:0:0:0
}
.module b.EXE
// MVID: {DE224FE0-1C36-11D5-A11B-444553540000}
.class /*02000002*/ public auto ansi zzz
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
.method /*06000001*/ public hidebysig static
void vijay() il managed
// SIG: 00 00 01
{
.entrypoint
// Method begins at RVA 0x2050
// Code size 11 (0xb)
.maxstack 8
IL_0000: /* 72 | (70)000001 */ ldstr "vijayO"
IL_0005: /* 28 | (0A)000001 */ call void [mscorlib/* 23000001
*/]System.Console/* 01000002 */::WriteLine(class System.String) /* 0A000001 */
IL_000a: /* 2A | */ ret
} // end of method zzz::vijay

Let us see how ildasm creates such an output. We will start with the simplest
C program.

a.c
#include <stdio.h>
#include <sys/stat.h>
struct stat st;
main()
{
FILE *fp;
fp = fopen("c:\\il\\b.exe","rb");
fstat(fileno(fp),&st);
printf("%d\n",st.st_size);
}

Output
2048

We compiled this program, and all the others, by calling the C compiler as follows:

>cl a.c

This generates an executable file called a.exe.

fp is a pointer to a structure called FILE, a structure tag in the header file stdio.h.
The function fopen opens the file b.exe, stored in the il subdirectory, for reading
in binary mode. Binary mode means that all bytes are to be treated equally.
Thus, when the number 26 is encountered, the file system should not read it as a
-1, i.e. as the end of the file.
Now, to determine the size of the file, we use a function called fstat that accepts two
parameters.

The first parameter is the file handle returned by the fileno function. This function
accepts a higher level file handle, i.e. a pointer returned by fopen, and returns an
int, signifying the lower level file handle that is returned by the function open.

The second parameter to fstat is a structure st, that looks like the structure tag stat
found in the header file sys/stat.h. We pass the address of this structure as the
second parameter, which the function fstat fills. Using the function printf, we
display the member st_size that holds the size of the file b.exe.

a.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
struct stat st;
char *p;
main() {
FILE *fp;
fp = fopen("c:\\il\\b.exe","rb");
fstat(fileno(fp),&st);
p=malloc( st.st_size );
fread(p,st.st_size,1,fp);
printf("%c %c \n",*p,*(p+1));
}

Output
M Z

In the above program, a function called malloc is used to allocate 2048 bytes of
memory. This function returns a pointer to where this memory allocation begins.
We store the beginning address of this memory, returned by malloc, in a pointer to
a char called p.

Now to read the entire file into memory:

To accomplish this, we use a function called fread, that requires the memory
location where the file is to be stored and the size of the file to be read. We have
asked for the entire file.
Thus, the first byte of the file will be stored at the memory location p, the second
byte at memory location p+1, etc. So, using our handy printf function, we display
the first two bytes of the file as chars. They happen to be M and Z.

Every executable file under DOS begins with these two bytes. To refresh your memory,
a file on disk is made up of numbers ranging from 0 to 255 only, in the same manner
in which the computer memory is structured. These two numbers are called the
signature of the file. If we change either of these two bytes, the operating system
refuses to recognize our file as a valid DOS executable.
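
The signature check described above can be sketched as a small C function. The
name has_mz_signature is hypothetical and chosen only for this illustration; it
works on a buffer that has already been read into memory, such as the pointer p
in the program above.

```c
#include <stddef.h>

/* Returns 1 if the buffer starts with the two signature bytes 'M' and 'Z',
   and 0 otherwise (including when the buffer is too short to tell). */
int has_mz_signature(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 2)
        return 0;
    return buf[0] == 0x4D && buf[1] == 0x5A;   /* 'M' is 0x4D, 'Z' is 0x5A */
}
```

A caller would typically pass the buffer and the file size obtained from fstat,
and refuse to go any further if the function returns 0.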

Legend has it that the person at Microsoft who designed the memory management
sub system of DOS had the same initials. Why are we talking about DOS when we
are programming Windows? The reason for this is that every Windows file is a DOS
compatible file.

a.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#define WORD short
#define LONG long
struct _IMAGE_DOS_HEADER1 {
WORD e_magic;
WORD e_cblp;
WORD e_cp;
WORD e_crlc;
WORD e_cparhdr;
WORD e_minalloc;
WORD e_maxalloc;
WORD e_ss;
WORD e_sp;
WORD e_csum;
WORD e_ip;
WORD e_cs;
WORD e_lfarlc;
WORD e_ovno;
WORD e_res[4];
WORD e_oemid;
WORD e_oeminfo;
WORD e_res2[10];
LONG e_lfanew;
};
struct _IMAGE_DOS_HEADER1 *imagedoshdr;
struct stat st;
char *p,*p1;
char *pe;
main()
{
FILE *fp;
fp = fopen("c:\\il\\b.exe","rb");
fstat(fileno(fp),&st);
p=(char *)malloc(st.st_size);
fread(p,st.st_size,1,fp);
p1 = p;
imagedoshdr =(struct _IMAGE_DOS_HEADER1 *) p1;
printf("Magic No %x \n",imagedoshdr->e_magic);
printf("e_lfanew %x %d\n",imagedoshdr->e_lfanew,imagedoshdr->e_lfanew);
pe = p+imagedoshdr->e_lfanew;
printf("%c %c %d %d\n",*pe,*(pe+1),*(pe+2),*(pe+3));
}

Output
Magic No 5a4d
e_lfanew 80 128
P E 0 0

We have initialized another pointer, p1, to p, so that we maintain at least one pointer
to the starting location of our file in memory. We next create a new pointer,
imagedoshdr, to the structure tag _IMAGE_DOS_HEADER1 and initialize it to p1.

A DOS file begins with a structure that looks like _IMAGE_DOS_HEADER1. Two
members of this structure are displayed: the field e_magic, i.e. MZ, and the
member e_lfanew. e_lfanew stores a decimal value of 128. This number signifies
the starting point of the Windows PE header, measured from the start of the file.

We jump 128 bytes and print the next four bytes, i.e. the first 4 bytes of a Windows
PE file. These are P E 0 0. PE is the signature of a Windows file. If we change even
one byte, Windows will not recognize this file. Many file formats have a signature of
their own. In a Java class file, the signature is spread over 4 bytes and, in hex, it
reads CA FE BA BE.
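
The jump from the DOS header to the PE signature can also be written without
relying on the layout of a C structure. The sketch below is an assumption-laden
illustration: find_pe_header and read_le32 are hypothetical helper names, and the
offset 0x3C is simply where e_lfanew falls when the fields of the DOS header
listed above are added up (60 bytes precede it).

```c
#include <stddef.h>
#include <string.h>

/* Reads a 32-bit little-endian value from four consecutive bytes. */
static unsigned long read_le32(const unsigned char *buf)
{
    return (unsigned long)buf[0] |
           ((unsigned long)buf[1] << 8) |
           ((unsigned long)buf[2] << 16) |
           ((unsigned long)buf[3] << 24);
}

/* Returns the file offset of the "PE\0\0" signature, or -1 if the buffer
   does not look like a PE file. */
long find_pe_header(const unsigned char *buf, size_t len)
{
    unsigned long off;

    if (buf == NULL || len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;                       /* no complete DOS header */
    off = read_le32(buf + 0x3C);         /* e_lfanew, the last member */
    if (off > len || len - off < 4)
        return -1;                       /* signature would lie outside the file */
    if (memcmp(buf + off, "PE\0\0", 4) != 0)
        return -1;
    return (long)off;
}
```

For the file b.exe discussed here, such a function would be expected to return
128, the value printed for e_lfanew.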

a.c
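#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
struct _IMAGE_DOS_HEADER *imagedoshdr;
struct stat st;
char *p,*p1;
char *pe;
main()
{
FILE *fp;
fp = fopen("c:\\il\\b.exe","rb");
fstat(fileno(fp),&st);
p=(char *)malloc(st.st_size);
fread(p,st.st_size,1,fp);
p1 = p;
imagedoshdr =(struct _IMAGE_DOS_HEADER *) p1;
printf("Magic No %x \n",imagedoshdr->e_magic);
printf("e_lfanew %x %d\n",imagedoshdr->e_lfanew,imagedoshdr->e_lfanew);
pe = p+imagedoshdr->e_lfanew;
printf("%c %c %d %d\n",*pe,*(pe+1),*(pe+2),*(pe+3));
}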

We see the same output as earlier. The reason is that the structure tag
_IMAGE_DOS_HEADER is already present in the file winnt.h, which is included by
windows.h. Thus, the PE file format is documented by Microsoft. The only addition
that has to be made while running the C compiler is as follows:

cl /nologo a.c c:\progra~1\micros~1\vc98\lib\uuid.lib

a.c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
struct _IMAGE_DOS_HEADER *imagedoshdr;
struct _IMAGE_FILE_HEADER *imagefilehdr;
struct stat st;
char *p,*p1,*p2;
char *pe;
main()
{
FILE *fp;
fp = fopen("c:\\il\\b.exe","rb");
fstat(fileno(fp),&st);
p=(char *)malloc(st.st_size);
fread(p,st.st_size,1,fp);
p1 = p;
imagedoshdr =(struct _IMAGE_DOS_HEADER *) p1;
pe = p+imagedoshdr->e_lfanew;
p2 = pe+4;
imagefilehdr = p2;
printf("Machine %x\n",imagefilehdr->Machine);
printf("Number Of Sections %d\n",imagefilehdr->NumberOfSections);
printf("Pointer To Symbol Table %d\n",imagefilehdr->PointerToSymbolTable);
printf("Number Of Symbols %d\n",imagefilehdr->NumberOfSymbols);
printf("Size Of Optional Header %d\n",imagefilehdr->SizeOfOptionalHeader);
printf("Characteristics %x\n",(unsigned short)imagefilehdr->Characteristics);
}

Output
Machine 14c
Number Of Sections 2
Pointer To Symbol Table 0
Number Of Symbols 0
Size Of Optional Header 224
Characteristics 10e

The PE file starts with a magic number, PE00, that takes up 4 bytes. We need to
skip over these 4 bytes. Hence, we add 4 to the variable pe and store the result in
p2. This is the starting position of the structure _IMAGE_FILE_HEADER, from the
header file. We created a pointer imagefilehdr to the structure tag and initialized
it to p2. Thereafter, the members of this structure are displayed. Our exe file is
made up of a large number of entities. Two of them are code and data.

Different entities are stored in different places or sections. Our tiny exe file
comprises two sections.

After this header comes another header, that is optional for an obj file. The size of
this header can change, but as of now, it is 224 bytes. This header follows the above
header, thus showing that our PE file is made up of a series of C structures.
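
Since the header is just 20 bytes laid out one field after another, it can also be
decoded byte by byte, without windows.h. The following is a minimal sketch under
that assumption; the struct coff_file_header and the functions le16, le32 and
parse_file_header are hypothetical names, and the field order follows the members
printed by the program above, with a 4-byte time stamp between NumberOfSections
and PointerToSymbolTable.

```c
#include <stddef.h>

/* Hypothetical, compiler-independent mirror of the 20-byte file header. */
struct coff_file_header {
    unsigned int  machine;
    unsigned int  number_of_sections;
    unsigned long time_date_stamp;
    unsigned long pointer_to_symbol_table;
    unsigned long number_of_symbols;
    unsigned int  size_of_optional_header;
    unsigned int  characteristics;
};

/* Little-endian readers for 2-byte and 4-byte fields. */
static unsigned int le16(const unsigned char *b)
{
    return (unsigned int)b[0] | ((unsigned int)b[1] << 8);
}

static unsigned long le32(const unsigned char *b)
{
    return (unsigned long)le16(b) | ((unsigned long)le16(b + 2) << 16);
}

/* Decodes the 20 bytes that follow the "PE\0\0" signature.
   Returns 0 on success, -1 if fewer than 20 bytes are available. */
int parse_file_header(const unsigned char *b, size_t len,
                      struct coff_file_header *out)
{
    if (b == NULL || out == NULL || len < 20)
        return -1;
    out->machine                 = le16(b);
    out->number_of_sections      = le16(b + 2);
    out->time_date_stamp         = le32(b + 4);
    out->pointer_to_symbol_table = le32(b + 8);
    out->number_of_symbols       = le32(b + 12);
    out->size_of_optional_header = le16(b + 16);
    out->characteristics         = le16(b + 18);
    return 0;
}
```

Fed the bytes at pe+4 of b.exe, this would be expected to report the same values
as the output above, e.g. a machine type of 14c hex and 2 sections.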

a.c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
struct _IMAGE_DOS_HEADER *imagedoshdr;
struct _IMAGE_FILE_HEADER *imagefilehdr;
struct _IMAGE_OPTIONAL_HEADER *imageoptionalhdr;
struct stat st;
char *p,*p1,*p2,*p3;
char *pe;
main()
{
FILE *fp;
fp = fopen("c:\\il\\b.exe","rb");
fstat(fileno(fp),&st);
p=(char *)malloc(st.st_size);
fread(p,st.st_size,1,fp);
p1 = p;
imagedoshdr =(struct _IMAGE_DOS_HEADER *) p1;
pe = p+imagedoshdr->e_lfanew;
p2 = pe+4;
imagefilehdr = p2;
p3 = (long)imagefilehdr+20;
imageoptionalhdr = (struct _IMAGE_OPTIONAL_HEADER *)p3;
printf("Subsystem %d\n",imageoptionalhdr->Subsystem);
printf("Magic %x\n",imageoptionalhdr->Magic);
printf("MajorLinkerVersion %d\n",imageoptionalhdr->MajorLinkerVersion);
printf("MinorLinkerVersion %d\n",imageoptionalhdr->MinorLinkerVersion);
printf("SizeOfCode %d\n",imageoptionalhdr->SizeOfCode);
printf("SizeOfInitializedData %d\n",imageoptionalhdr->SizeOfInitializedData);
printf("SizeOfUninitializedData %d\n",imageoptionalhdr->SizeOfUninitializedData);
printf("AddressOfEntryPoint %d %x\n",imageoptionalhdr->AddressOfEntryPoint,
imageoptionalhdr->AddressOfEntryPoint);
printf("BaseOfCode %d\n",imageoptionalhdr->BaseOfCode);
printf("BaseOfData %d\n",imageoptionalhdr->BaseOfData);
printf("ImageBase %d %x\n",imageoptionalhdr->ImageBase,
imageoptionalhdr->ImageBase);
printf("SectionAlignment %d %x\n",imageoptionalhdr->SectionAlignment,
imageoptionalhdr->SectionAlignment);
printf("FileAlignment %d %x\n",imageoptionalhdr->FileAlignment,
imageoptionalhdr->FileAlignment);
printf("MajorOperatingSystemVersion %d\n",
imageoptionalhdr->MajorOperatingSystemVersion);
printf("MinorOperatingSystemVersion %d\n",
imageoptionalhdr->MinorOperatingSystemVersion);
printf("SizeOfImage %d\n",imageoptionalhdr->SizeOfImage);
printf("SizeOfHeaders %d\n",imageoptionalhdr->SizeOfHeaders);
printf("DllCharacteristics %x\n",imageoptionalhdr->DllCharacteristics);
printf("LoaderFlags %x\n",imageoptionalhdr->LoaderFlags);
printf("NumberOfRvaAndSizes %d\n",imageoptionalhdr->NumberOfRvaAndSizes);
printf("Stack Reserve %d %x\n",imageoptionalhdr->SizeOfStackReserve,
imageoptionalhdr->SizeOfStackReserve);
printf("Stack Commit %d %x\n",imageoptionalhdr->SizeOfStackCommit,
imageoptionalhdr->SizeOfStackCommit);
}

Output
Subsystem 3
Magic 10b
MajorLinkerVersion 6
MinorLinkerVersion 0
SizeOfCode 1024
SizeOfInitializedData 512
SizeOfUninitializedData 0
AddressOfEntryPoint 8830 227e
BaseOfCode 8192
BaseOfData 16384
ImageBase 4194304 400000
SectionAlignment 8192 2000
FileAlignment 512 200
MajorOperatingSystemVersion 4
MinorOperatingSystemVersion 0
SizeOfImage 24576
SizeOfHeaders 512
DllCharacteristics 0
LoaderFlags 0
NumberOfRvaAndSizes 16
Stack Reserve 1048576 100000
Stack Commit 4096 1000

We have now displayed the members of the third structure,
_IMAGE_OPTIONAL_HEADER. We are not going to explain all the members. We will
focus only on those that are present in the ildasm output, and those that are
important for our understanding of PE files.

The size of the structure _IMAGE_FILE_HEADER is 20 bytes. So we add 20 to
imagefilehdr, which has been cast to a long, to get to the address of the third
structure. We store it in the pointer imageoptionalhdr. The member Subsystem has
a value of 3, which means that it is a Windows console (character mode) executable
file. The address of the entry point is the location, relative to the image base,
where the first executable instruction starts. The number 8830 is 227e in hex. The
image base tells us where the program will be loaded in memory by the Windows
loader. It has a value of 400000 hex, the default preferred load address for an
executable under Windows.
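
The arithmetic behind these numbers is simple enough to check in code. In the
sketch below, rva_to_virtual_address is a hypothetical helper name; it only
assumes that the loader honours the preferred image base, in which case an
address stored relative to the image base becomes an absolute address by plain
addition.

```c
/* Converts an address stored relative to the image base into an absolute
   memory address, assuming the file is loaded at its preferred base. */
unsigned long rva_to_virtual_address(unsigned long image_base, unsigned long rva)
{
    return image_base + rva;
}
```

With the values printed above, an image base of 400000 hex and an entry point of
227e hex (8830 decimal) give an absolute entry address of 40227e hex.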

The section alignment means that each section will start in memory at a location
that is divisible by 0x2000. Thus, even if the size of the data is 2 bytes, it will
take up an entire aligned unit, i.e. 8192 bytes of memory. The file alignment
provides the same functionality for the file stored on disk. A section will take up
a minimum of 512 bytes of disk space. Then, we have the amount of stack memory to
be allocated for our program.
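
Rounding a size up to the next multiple of an alignment is a one-line
calculation. The helper below is a sketch with a hypothetical name, align_up; it
is not taken from any Windows header.

```c
/* Rounds size up to the next multiple of alignment.
   An alignment of zero is treated as "no alignment" and returns size as is. */
unsigned long align_up(unsigned long size, unsigned long alignment)
{
    if (alignment == 0)
        return size;
    return (size + alignment - 1) / alignment * alignment;
}
```

With a section alignment of 0x2000, 2 bytes of data occupy 8192 bytes in memory,
and with a file alignment of 0x200, the same 2 bytes occupy 512 bytes on disk.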

a.c
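#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
int i;
char *dirtype[]={
"EXPORT","IMPORT","RESOURCE","EXCEPTION",
"SECURITY","BASERELOC","DEBUG","ARCHITECTURE",
"GLOBALPTR","TLS","LOAD_CONFIG","BOUND_IMPORT",
"IAT","DELAY_IMPORT","COM_DESCRIPTOR"
};
struct _IMAGE_DOS_HEADER *imagedoshdr;
struct _IMAGE_FILE_HEADER *imagefilehdr;
struct _IMAGE_OPTIONAL_HEADER *imageoptionalhdr;
struct stat st;
char *p,*p1,*p2,*p3;
char *pe;
main() {
FILE *fp;
fp = fopen("c:\\il\\b.exe","rb");
fstat(fileno(fp),&st);
p=(char *)malloc(st.st_size);
fread(p,st.st_size,1,fp);
p1 = p;
imagedoshdr =(struct _IMAGE_DOS_HEADER *) p1;
pe = p+imagedoshdr->e_lfanew;
p2 = pe+4;
imagefilehdr = p2;
p3 = (long)imagefilehdr+20;
imageoptionalhdr = (struct _IMAGE_OPTIONAL_HEADER *)p3;
printf("Virtual Address Size Name \n");
for(i=0;i<15;i++)
{
printf("%08x %4d ", imageoptionalhdr->DataDirectory[i].VirtualAddress,
imageoptionalhdr->DataDirectory[i].Size);
printf("%-15s\n",dirtype[i]);
}
}
Output
Virtual Address Size Name
00000000    0 EXPORT
00002230   75 IMPORT
00000000    0 RESOURCE
00000000    0 EXCEPTION
00000000    0 SECURITY
00004000   12 BASERELOC
00000000    0 DEBUG
00000000    0 ARCHITECTURE
00000000    0 GLOBALPTR
00000000    0 TLS
00000000    0 LOAD_CONFIG
00000000    0 BOUND_IMPORT
00002000    8 IAT
00000000    0 DELAY_IMPORT
00002008   72 COM_DESCRIPTOR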

Our pointer imageoptionalhdr is a pointer located at the start of the third header.
The last member of this structure is an array of 16 IMAGE_DATA_DIRECTORY
structures and is called DataDirectory. This structure has two members,
VirtualAddress and Size. Each of these entries tells us the size and the location in
memory, i.e. the virtual address, where we can find the respective data for these
entities.

We prefer to handle numbers in the decimal format, whereas ildasm prefers to
handle them in hex. The fifteenth member of the array, i.e. index 14, will always
store the data for COM_DESCRIPTOR. This has been pre-decided, and that is how the
array of pointers dirtype is filled up.

a.c
#include <stdio.h>
int i;
char *dirtype[]={
"EXPORT","IMPORT","RESOURCE","EXCEPTION",
"SECURITY","BASERELOC","DEBUG","ARCHITECTURE",
"GLOBALPTR","TLS","LOAD_CONFIG","BOUND_IMPORT",
"IAT","DELAY_IMPORT","COM_DESCRIPTOR"
};
struct _IMAGE_SECTION_HEADER *imagesectionhdr;
struct stat st;
char *p,*p1,*p2,*p3,*p4;
char *pe;
main()
{
    FILE *fp;
    p1 = p;
    p2 = pe+4;
    imagefilehdr = p2;
    p4 = p3+224;
    imagesectionhdr = p4;
    printf("Name P Addr V Addr SzOfRData PtrToRData ");
    printf("PtrToRel PtrToLinenos NoOfRel NoLinenos Char\n");
    for(i=0; i <imagefilehdr->NumberOfSections;i++ )
    {
        printf("%-8s %08x %08x %-8d %08x ",imagesectionhdr->Name,
            imagesectionhdr->Misc.PhysicalAddress,imagesectionhdr->VirtualAddress,
            imagesectionhdr->SizeOfRawData,imagesectionhdr->PointerToRawData);
        printf(" %08x %08x %4d %4d %08x \n",imagesectionhdr->PointerToRelocations,
            imagesectionhdr->PointerToLinenumbers,imagesectionhdr->NumberOfRelocations,
            imagesectionhdr->NumberOfLinenumbers,imagesectionhdr->Characteristics);
        imagesectionhdr++;
    }
}
Output
Name P Addr V Addr SzOfRData PtrToRData
PtrToRel PtrToLinenos NoOfRel NoLinenos Char
.text 00000284 00002000 1024 00000200
00000000 00000000 0 0 60000020

.reloc 0000000c 00004000 512 00000600
00000000 00000000 0 0 42000040

We are now displaying the sections of the file. It is in these sections that the data of
the file is stored. The section headers start immediately after the Optional Header.
We have a structure that represents each section. If you recall, we have two sections,
as specified by the second header. We display the members of each section header
along with the name of the section. By convention, the name starts with a dot. The
section .text is where the code resides. We are then provided with the memory
location and size of the section.

a.c
#include <stdio.h>
int i,j;
struct stat st;
char *p,*p1,*p2,*p3,*pe;
struct complus
{
    int size;
    short major;
    short minor;
    long maddr,msize;
    long flags,token;
    long raddr,rsize;
    long saddr,ssize;
    long caddr,csize;
    long vaddr,vsize;
    long eaddr,esize;
    long paddr,psize;
};
struct complus *a;
main()
{
    FILE *fp;
    p1 = p;
    p2 = pe+4;
    imagefilehdr = p2;
    i = imageoptionalhdr->DataDirectory[14].VirtualAddress%0x2000;
    i = i + 512;
    p = p + i;
    a = p;
    printf("%d Header size\n",a->size);
    printf("%d Major version \n",a->major);
    printf("%d Minor version \n",a->minor);
    printf("%ld Flags \n",a->flags);
    printf("%x Token \n",a->token);
    printf("%x %x MetaData\n",a->maddr,a->msize);

    printf("%x %x Resources\n",a->raddr,a->rsize);
    printf("%x %x Strong Name\n",a->saddr,a->ssize);
    printf("%x %x Code Manager\n",a->caddr,a->csize);
    printf("%x %x Vtable Fixups\n",a->vaddr,a->vsize);
    printf("%x %x Export Address Table\n",a->eaddr,a->esize);
    printf("%x %x PreCompile Header\n",a->paddr,a->psize);
}

Output
72 Header size
2 Major version
0 Minor version
1 Flags
6000001 Token
205c 1d4 MetaData
0 0 Resources
0 0 Strong Name
0 0 Code Manager
0 0 Vtable Fixups
0 0 Export Address Table
0 0 PreCompile Header

Now to display the COM+ header. The address of the starting location of this header
is given in member 14 of the DataDirectory array. This value tells us where in
memory the COM+ header starts. Here, it happens to be 0x2008.

Unfortunately, we have loaded our exe file in memory using malloc, and not the
mmap series of functions. Thus, we have used a quick, albeit messy, shortcut. We
simply took this virtual address and got the remainder after dividing it by 0x2000,
because the section alignment in memory is 0x2000. This remainder, 8, is added to
512, because the file alignment is 512 and the first section therefore starts 512 bytes
into the file. Thus we arrive at the number 520. This is the offset from the start of
the file for the location of the COM+ header.

We have created a structure that maps a COM+ header and simply display its
members. The COM+ structure starts with the width of the structure, i.e. 72,
followed by the version number. Then, it tells us the starting location and size of
the metadata in memory. The concept of metadata is one of the linchpins of the
.NET world. Then, there is a flags member and a token, followed by a series of other
directory entries.

a.c
#include <stdio.h>
int i,j;
struct stat st;
char *p,*p1,*p2,*p3,*pe, *p4;
struct _IMAGE_IMPORT_DESCRIPTOR *a;
long *b;short *b1;
main()
{
    FILE *fp;
    p1 = p;
    p2 = pe+4;
    imagefilehdr = p2;
    i = imageoptionalhdr->DataDirectory[1].VirtualAddress%0x2000;
    printf("%x\n",i);
    i = i + 512;
    p = p + i;
    a = p;
    printf("%08x Import Address Table\n",a->FirstThunk);
    printf("%08x Import Name Table\n",a->Name);
    printf("%x time date stamp\n",a->TimeDateStamp);
    printf("%x Index of first Forwarder reference\n",a->ForwarderChain);
    printf("%08x Characteristics\n",a->Characteristics%0x2000);
    p4 = p1 + 512 + a->Name%0x2000;
    printf("Name %s\n",p4);
    p4 = p1 + 512 + a->Characteristics %0x2000;
    b = p4;
    printf("Address %x\n",*b);
    printf("Address %x\n",*b%0x2000);
    p4 = p1 + 512 + *b%0x2000;
    b1 = p4;
    printf("Hint %d\n", *b1);
    b1++;
    printf("%s\n",b1);
}

Output
230
00002000 Import Address Table
0000226e Import Name Table
0 time date stamp
0 Index of first Forwarder reference
00000258 Characteristics
Name mscoree.dll
Address 2260
Address 260
Hint 0
_CorExeMain

We finally display the Import Address Table. We have written this code in such an
unstructured manner, knowing that the purists of programming conventions will
throw a fit and ask for this book to be banned.

The second member of the DataDirectory array tells us where the Import
directory starts. We store this value in a variable called i after taking the remainder.
Microsoft devised the PE file format in such a way that retrieving information at
runtime would be a breeze. Thus, the first section would be loaded at memory
location 0x2000 hex from the start of the base. Thus, at runtime, we will find this
Import directory at memory location 0x2230 from the base. Since we have used
malloc, we add 512 to the remainder to obtain the starting address of the Import
table in our buffer. This gives us the starting memory location of the structure
IMAGE_IMPORT_DESCRIPTOR, whose members are then displayed. The member
Name is an RVA, or relative virtual address.

We do the same computation for Name to obtain another memory location that tells
us the name of the dll that we are importing from. The member Characteristics is
also an RVA. From it, we obtain a memory location that holds another RVA. This
points to a structure where the first member is a short for the hint and the second
member is a NULL terminated string containing the name of the function in the dll.

This is definitely not a pretty looking program, but it works well to prove the
concept. We can use loops to make the program generic.

a.c
#include <stdio.h>
int i;
struct _IMAGE_SECTION_HEADER *imagesectionhdr;
struct stat st;
char *p,*p1,*p2,*p3,*p4,*p5;
char *pe;
main()
{
    FILE *fp;
    p1 = p;
    p2 = pe+4;
    imagefilehdr = p2;
    printf("%x\n",imageoptionalhdr->AddressOfEntryPoint);
    p4 = p1 + 512 + imageoptionalhdr->AddressOfEntryPoint%0x2000;
    for(i=0; i<7 ; i++)
        printf("%x ",(unsigned char)p4[i]);
    printf("\n");
}

Output
227e
ff 25 0 20 40 0 0

The last program in this series displays the initial bytes of the first function to be
called. This function begins at RVA 0x227e, and we simply jump to that location in
our program and display the value of the initial bytes. An ff 25 is a jump instruction
in the Intel Assembler. Thus, a disassembler would convert these bytes to a jump,
followed by the memory location that holds the address to jump to. The
documentation very clearly states that the op code for ldstr is 0x72 hex. The IL code
and metadata are thus stored in a PE file within a section. This completes our
exploration into PE files.

-18-

Glossary
The Windows Operating System runs on the Intel family of microprocessor chips.
A file containing Intel op codes, or assembler, will only be able to execute on Intel
machines.

If we compiled our code into an assembler meant for a hypothetical
microprocessor, then the code would not execute directly on any machine in the
world. However, if, at the time of execution, we could convert the compiled code
from this hypothetical microprocessor assembler into a form that is suitable for the
microprocessor chip of the machine that the code is running on, we could,
theoretically at least, run the code on any machine, provided such a converter
program was available.

This program would convert the code from our hypothetical chip's assembler into a
form suitable to the actual chip of the machine that the code is to be run on.

This is the concept that Java is based on, and the program that does this conversion
is called the Java Virtual Machine or JVM. The assembler so generated is called
byte code. The only flaw in Java is that the byte codes represent the Java
programming language very closely. As a result, it is difficult to get other
programming languages to compile to Java byte codes.

Microsoft learnt from this drawback of the Sun based Java, and first created an
assembler that was powerful and generic. This assembler was named IL or
Intermediate Language. Any code, whether written in C#, VB or COBOL, is initially
converted to IL. The Perl compiler also converts code written in the Perl
programming language into IL. This IL code is then converted to the assembler of
the chip at run time.

Therefore, if we study IL in depth, we can safely conclude that we will be able to
master the .NET technologies, as everything eventually gets converted to it. If IL
does not support a feature, then the programming language also cannot support it.
Hence, undoubtedly, the most important language to learn is IL.

For example, the code of the commonly used WriteLine function may have been
written in COBOL, but it can be used in C# since it will finally get converted to IL.
Thus, you will now be able to appreciate the significance and concept of compiling
source code into an Intermediate Language.

IL directives

.method           : creates a function
.entrypoint       : entrypoint for program execution
.assembly         : gives a name to the program, a deployment unit
.class            : optional directive, collection of functions and variables
.module           : a logical entity, it can be a dll or an exe file
.subsystem        : the operating system on which the executable runs
.corflags         : flags unique to a 64 bit computer, 1 - il generated
                    executable, 4 - library
.originator       : identity of the creator, a hash value representing
                    the public key of the owner
.hash             : algorithm used for hashing
.ver              : 4 numbers separated by colons: major, minor,
                    build and revision
.maxstack         : maximum no. of elements that can be placed on
                    the evaluation stack when a method is being executed
.ctor             : constructor of a class
.custom           : deals with meta data
.locals           : creates local variables for a particular function
                    on the stack
.cctor            : static constructor
.field            : class variables
.namespace        : collection or grouping of code
.override         : function overrides the base class functions
.emitbyte         : emits an unsigned 8 bit number directly into
                    the code section
.zeroinit         : initialize the members to default values
.property         : a property directive
.get              : gets the value of the property
.set              : sets the value for the property
.backing          : used to state the name of the field
.other            : other functions associated with the property
.event            : event directive
.addon            : add method for event
.removeon         : remove method for event
.file             : manifest resource is in file <filename> at byte
                    offset <int32>
.vtfixup          : declares that at a certain memory location there
                    is a table that contains MethodDefs which need
                    to be converted into method pointers. The CLR will
                    do this conversion automatically.
.data             : creates and initializes a data variable

IL instructions
ilasm             : il assembler
ret               : return from function
call              : calls / executes a function
ldstr             : load a string on the stack
public            : accessible to all parts of the code
hidebysig         : functions with similar signatures in parent class
                    are not available to the derived class
static            : belongs to a class, only one copy is created,
                    must be referenced using typename
il managed        : code is managed by il assembler
private           : access is restricted to the current class
auto              : layout in memory to be decided at runtime
ansi              : refers to ansi character set for smooth transition
                    from managed to unmanaged code
extends           : a class deriving from base class
ldarg.0           : loads the this pointer or the value in the 0th
                    parameter on the execution stack
specialname       : attribute signifies that the function is special
rtspecialname     : attribute signifies that the function is special
                    and is to be treated in a special manner at runtime
instance          : a normal function which is always associated
                    with the class
mscorlib          : library name that contains code for .Net functions
extern            : references code from other assemblies
ildasm            : il disassembler
init              : initializes variables to default initialization value
newobj            : creates a new object in memory
pop               : removes values off the stack
ldc.i4.value      : load constant numeric value on the stack (i4 -
                    four bytes of memory)
stloc.position    : stores the value from the stack at the location
                    identified by position
conv              : convert value to fit the data type specified
stsfld            : store value in static field
stfld             : store value in field
ldc.i4.s          : load the constant value following it on the stack
                    (in short form)
ldloca.s          : load the address of the local variable on the
                    stack (in short form)
box               : converts value type variables to reference type
                    variables
ldsflda           : loads the static field address on the stack
ldloc.0           : load the local variable value on the stack
starg             : stores / changes argument value on the stack
ldarg             : load argument on the stack
ldarga.s          : load an argument address
ldarg.1           : load argument onto the stack (the second one)
ldfld             : load field of an object
br.s              : branch to target (1 byte)
ldsfld            : load the value of the static field on the stack
brfalse.s         : branch to label when false
brtrue.s          : branch to label when true
add               : adds loaded value2 to loaded value1 and pushes
                    the result on the stack
cgt               : compares the value1 pushed first on the stack
                    to be greater than value2
ceq               : compares the two values for equality
ble.s             : branch on less than or equal to
bne.un.s          : branch on not equal or unordered
stind.i4          : store value indirect from stack (top value on
                    the stack is placed in the second block holding
                    the address)
virtual           : base class functions can be overridden by the
                    derived class
newslot           : the virtual function is treated as a new function
                    and detaches itself from a similar function in
                    the base class
callvirt          : call a virtual function
castclass         : cast the value to the class following it
abstract          : class that cannot be instantiated directly, it has
                    to be derived from
initonly          : can be initialized only once and read from, no
                    modifications allowed

bge.s             : branch on greater than or equal to
clt               : compare less than
xor               : bitwise XOR
dup               : duplicate the top value of the stack
sizeof            : returns the size in terms of bytes in memory
localloc          : allocate space in the local dynamic memory pool
serializable      : data can be written to disk or sent over network
ldtoken           : load the runtime representation of a metadata token
isinst            : test if an object is an instance of a class or
                    interface, returning NULL or an instance of that
                    class or interface
mul               : multiply values
mul.ovf           : overflow check while multiplying two numbers
ldlen             : load the length of an array
switch            : checks for values and branches accordingly
not               : bitwise complement
rem               : compute remainder
or                : bitwise OR
and               : bitwise AND
ldind             : load value indirect onto the stack
unbox             : convert boxed value type to its raw form
ldnull            : load a null pointer
interface         : class with function prototypes
abstract          : class with no code
final             : the method cannot be overridden
finalize          : destroys the instance members and the object
implements        : class implementing functions of the interfaces
explicit          : the layout of the fields depends on programmer's
                    instruction
sequential        : layout is in a sequential order
initobj           : initializes a value type
value             : value class - a structure
ldobj             : copy value type to the stack
stobj             : store a value type from the stack into memory
ldflda            : load field address
ldsflda           : load static field address
ldftn             : load a method pointer on the stack
calli             : call method indicated on the stack with arguments
jmp               : jump to method
jmpi              : jump using method pointer on the stack
vararg            : vararg is a calling convention that lets us pass
                    multiple number of parameters to a function
arglist           : get argument list
cpobj             : copy a value type
bgt               : branch on greater than
sub               : subtract numeric values
tail.             : subsequent call terminates current method
extends           : a class derives from the base class
leave.s           : unconditionally transfers control to target
throw             : throw an exception
try - catch       : block of statements where exceptions can be
                    thrown and caught
finally-endfinally : code that gets called in spite of exceptions
                    being thrown
sealed            : class cannot be derived from
nested            : class within class
rethrow           : rethrows the current exception
endfilter         : end filter clause of SEH
runtime managed   : the runtime generates code for the function
synchronized      : code gets executed on completion of the earlier code
newarr            : create a zero-based, one-dimensional array
ldlen             : load the length of an array
ldelem.i4         : load an element of an array
stelem.i4         : store an element of an array
stelem.ref        : implicitly casts value to the element type of array
ldelem.ref        : load the element at index, an object, onto the
                    top of the stack as an O
fixed             : fixes the array reference in memory
ldvirtftn         : load a virtual function
ldnull            : load null value
arglist           : multiple arguments to a function
optil             : optimised IL code
pinvokeimpl       : invokes a function from the dll
CLR               : Common Language Runtime
