This chapter and the next few rest on a simple belief of ours: if you really want to understand C# code in earnest, the best way to do so is to understand the IL code generated by the C# compiler.
So, we shall raise the curtain with a small C# program and then explain the IL code generated by the compiler. In doing so, we kill two birds with one stone: firstly, we unravel the mysteries of IL, and secondly, we obtain a more intuitive understanding of the C# programming language.
We will first show you a .cs file and then a program written in IL by the C# compiler, whose output is the same as that of the .cs file. The output displayed is that of the IL code. This will enhance our understanding of not only C# but also IL. So, without much ado, let's take the plunge.
The above code is generated by the IL disassembler, ildasm. After executing ildasm on the exe file, we studied the IL code generated for the program. Subsequently, we eliminated the parts of the code that did not improve our understanding of IL, namely some comments, directives, functions and so on. The remaining IL code is presented as close to the original as possible.
The advantage of mastering IL by studying the IL code itself is that we learn from the master, i.e. the C# compiler, how to write decent IL code. We cannot find a better authority than the C# compiler to enlighten us about IL.
The rules for creating a static function abc are the same as for any other function, such as Main or vijay. As abc is a static function, we have to use the static modifier in the .method directive.
When we want to call a function, the following information has to be provided in the order given below:
• the return data type.
• the class name.
• the function name to be called.
• the data types of the parameters.
The same rules also apply when we call the .ctor function of the base class. It is mandatory to write the name of the class before the name of the function. IL makes no assumptions about the name of the class; in C#, by contrast, the name defaults to the class we are in while calling the function.
Thus, the above program first displays "hi" using the WriteLine function and then calls the static function abc. This function too uses the WriteLine function to display "bye".
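The calling rules listed above can be sketched in a few lines of IL. This is our own minimal reconstruction, not the disassembler output itself: we assume the .NET Framework ilasm assembler, a reference to mscorlib, and a made-up assembly name demo.

```il
// Minimal sketch: Main prints "hi" and then calls the static function abc.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    ldstr "hi"
    // return type, class name, function name, parameter types
    call void [mscorlib]System.Console::WriteLine(string)
    call void zzz::abc()
    ret
  }

  // static modifier in the .method directive, as for Main
  .method public hidebysig static void abc() cil managed
  {
    ldstr "bye"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
}
```

Assembled with ilasm, this should display hi followed by bye. Note that even the call to abc spells out the class name zzz.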
Static constructors are always called before any other code in the class is executed. In C#, a static constructor is merely a static function with the same name as the class. In IL, the name of the function changes to .cctor. You may have observed that in the earlier example we got a free function called .ctor.
Whenever we have a class with no constructors, a free constructor with no parameters is created. This free constructor is given the name .ctor. This knowledge should enhance our ability as C# programmers, as we are now in a better position to comprehend what goes on under the hood.
The static constructor gets called first, and the function with the .entrypoint directive gets called thereafter.
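To see the order of execution for ourselves, here is a small sketch with a .cctor and an entry point. The strings and the assembly name demo are our own choices, and we again assume the .NET Framework ilasm with mscorlib.

```il
// Sketch: the static constructor (.cctor) runs before the entry point.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    ldstr "hi"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }

  // A static constructor is an ordinary static method named .cctor.
  .method private hidebysig specialname rtspecialname static void .cctor() cil managed
  {
    ldstr "static constructor"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
}
```

The line printed by .cctor appears before hi, because the runtime initializes the class before it runs Main.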
The keyword new in C# gets converted to the IL instruction newobj. This provides evidence that IL is not a low-level assembler, since it can create objects in memory. The instruction newobj creates a new object in memory. Even in IL, we are shielded from what new or newobj really does. This demonstrates that IL is not just another high-level language either, but is designed in such a way that other modern languages can be compiled to it.
The rules for using newobj are the same as those for calling a function. The full prototype of the function is required. In this case, we are calling the constructor without any parameters, hence the function .ctor is called. In the constructor, the WriteLine function is called.
As we had promised earlier, we are going to explain the instruction ldarg.0 here. Whenever we create an object that is an instance of a class, it contains two basic entities:
• functions
• fields or variables, i.e. data.
When a function gets called, it does not know or care where it is being called from or who is calling it. It receives all its parameters off the stack. There is no point in having two copies of a function in memory: if a class contained a megabyte of code, each 'new' on it would otherwise occupy an additional megabyte of memory.
When new is called for the first time, memory gets allocated for the code and the variables. Thereafter, with every call to new, fresh memory is allocated only for the variables. Thus, if we have five instances of a class, there will be only one copy of the code, but five separate copies of the variables.
Every non-static or instance function is passed a handle which indicates the location of the variables of the object that has called this function. This handle is called the this pointer. 'this' is loaded with ldarg.0. The handle is always passed as the first parameter to every instance function. Since it is always passed by default, it is not mentioned in the parameter list of the function.
All the action takes place on the stack. The instruction pop removes whatever is on top of the stack. In this example, we use it to remove the reference to the instance of zzz that has been placed on top of the stack by the newobj instruction.
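The interplay of newobj, ldarg.0 and pop described above can be sketched as follows. This is a hand-written approximation under the same assumptions as before (.NET Framework ilasm, mscorlib, assembly name demo); the real disassembler output carries more attributes.

```il
// Sketch: newobj calls .ctor, ldarg.0 is 'this', pop discards the new object.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    // full prototype of the constructor, as for any other call
    newobj instance void zzz::.ctor()
    pop                      // remove the zzz reference left by newobj
    ret
  }

  .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
  {
    ldarg.0                  // 'this', the hidden first parameter
    call instance void [mscorlib]System.Object::.ctor()
    ldstr "hi"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
}
```

The parameter list of .ctor is empty, yet ldarg.0 still finds the object reference, since it is always passed as the first parameter of an instance function.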
The static constructor always gets called first, whereas the instance constructor gets called only after new. IL enforces this sequence of execution. The calling of the base class constructor is not mandatory. Hence, to save space in our book, we have not shown its code in all the programs.
In some cases, the programs do not work if we leave out the code of a constructor. Only in these cases has the code of the constructor been included. The static constructor does not call the base class constructor. Also, 'this' is not passed to static functions.
We have created two variables called i and j in our function Main in the C# program. They are local variables and are created on the stack. Notice that on conversion to IL, the names of the variables are lost.
The variables get created in IL through the .locals directive, which assigns its own names to the variables, beginning with V_0, V_1 and so on. The data types are also altered, from int to int32 and from long to int64. The basic types in C# are aliases; they all get converted to data types that IL understands.
The task at hand is to initialize the variable i to a value of 6. This value has to be loaded on the stack, also called the evaluation stack. The instruction to do so is ldc.i4.value. An i4 takes up four bytes of memory.
The value mentioned in the syntax above is the constant that has to be put on the stack. After the value 6 has been loaded onto the stack, we need to initialize the variable i to this value. The variable i has been renamed V_0 and is the first variable in the .locals directive.
The instruction stloc.0 takes the value present at the top of the stack, i.e. 6, and initializes the variable V_0 to it. The process of initializing a variable is definitely more involved than it looks in C#.
The second ldc instruction copies the value 7 onto the stack. On a 32-bit machine, memory is handled in chunks of 32 bits, i.e. 4 bytes. In the same vein, on a 64-bit machine, memory is handled in chunks of 64 bits, i.e. 8 bytes.
The number 7 is stored as a constant and requires only 4 bytes, but a long requires 8 bytes. Thus, we need to convert the 4 bytes to 8 bytes. The instruction conv.i8 is used for this purpose. It places an 8-byte number on the stack. Only after doing so do we use stloc.1 to initialize the second variable V_1 to the value 7.
Thus, the ldc series is used to place a constant number on the stack, and stloc is utilized to pick up what is on the stack and initialize a local to that value.
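The sequence just described can be sketched as a complete program. The two WriteLine calls and the ldloc instructions that feed them are our own additions, included only so that the result is visible; the assembly name demo and the use of .NET Framework ilasm with mscorlib are assumptions too.

```il
// Sketch of: int i = 6; long j = 7; followed by printing both locals.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .locals init (int32 V_0, int64 V_1)   // i and j lose their names
    ldc.i4.6                 // push the 4-byte constant 6
    stloc.0                  // V_0 = 6
    ldc.i4.7                 // push the 4-byte constant 7
    conv.i8                  // widen it to 8 bytes
    stloc.1                  // V_1 = 7
    ldloc.0                  // added for display only
    call void [mscorlib]System.Console::WriteLine(int32)
    ldloc.1
    call void [mscorlib]System.Console::WriteLine(int64)
    ret
  }
}
```

Without the conv.i8, the 4-byte value on the stack would not match the int64 local that stloc.1 stores into.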
Now you will finally be able to see the light at the end of the tunnel and understand why we wanted you to read this book in the first place.
Let us understand the above code, one field at a time. We have created a variable i that is static and initialized it to the value 6. Since the variable i has not been given an access modifier, it defaults to private. The static modifier of C# is applicable to variables in IL also.
The real action begins now. The variable needs to be assigned an initial value. This value must be assigned in the static constructor, because the variable is static. We employ ldc to place the value 6 on the stack. Note that the .locals directive is not used here.
To initialize i, we use the instruction stsfld, which looks for a value on top of the stack. What follows the instruction stsfld is the data type of the field, which tells it how many bytes to pick up from the stack to initialize the static variable. In this case, the type is int32, i.e. 4 bytes.
The variable name is preceded by the name of the class. This is in contrast to the syntax for local variables.
For the instance variable j, since its access modifier was public in C#, its access modifier is retained as public on conversion to IL. Since it is an instance variable, its value gets initialized in the instance constructor. The instruction used here is stfld and not stsfld. Here the type is int64, so we need 8 bytes from the stack.
The rest of the code remains the same as before. Thus, we can see that the instruction stloc is used to initialize locals and the instruction stfld is used to initialize fields.
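A sketch of the two fields and their constructors follows. It is hand-written rather than disassembled, and it assumes the .NET Framework ilasm, mscorlib and the assembly name demo. The code in Main that reads the fields back with ldfld and ldsfld is our own addition for display.

```il
// Sketch of: static int i = 6; public long j = 7;
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .field private static int32 i
  .field public int64 j

  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    newobj instance void zzz::.ctor()
    ldfld int64 zzz::j       // added for display only
    call void [mscorlib]System.Console::WriteLine(int64)
    ldsfld int32 zzz::i
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }

  .method private hidebysig specialname rtspecialname static void .cctor() cil managed
  {
    ldc.i4.6
    stsfld int32 zzz::i      // field type, then class name, then field name
    ret
  }

  .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
  {
    ldarg.0                  // object whose field is being set
    ldc.i4.7
    conv.i8
    stfld int64 zzz::j       // the field is initialized first ...
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()   // ... then the base constructor runs
    ret
  }
}
```

This should display 7 and then 6. Unlike stsfld, stfld needs the object reference (ldarg.0) on the stack beneath the value.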
The main purpose of the above example is to verify whether the variables are initialized first or the code contained in a constructor gets called first. The IL output demonstrates very lucidly that all the variables get initialized first and that the code in the constructor gets executed thereafter.
You may have also noticed that the base class constructor gets executed before the code that is written in the constructor gets called.
This nugget of knowledge is sure to enhance your understanding of C# and IL.
We can print a number instead of a string by using an overload of the WriteLine function.
First, we push the value 10 onto the stack using the ldc family. Observe carefully that the instruction is now ldc.i4.s, followed by the value 10. The plain ldc.i4 instruction takes a 4-byte operand, whereas the .s (short) form takes an operand of only one byte.
Then, the call uses the correct overloaded version of the WriteLine function, chosen by the C# compiler, which accepts an int32 value from the stack.
This is similar to printing strings.
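As a minimal sketch of the above, assuming as before the .NET Framework ilasm, mscorlib and an assembly named demo:

```il
// Sketch: print the number 10 through the int32 overload of WriteLine.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    ldc.i4.s 10              // short form: the operand fits in one byte
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
}
```

The overload is selected purely by the parameter type written in the call, here int32 instead of string.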
We shall now delve into how to print a number on the screen using a format string.
The WriteLine function accepts a string followed by a variable number of objects. The {0} prints the first object after the comma. Even though there is no variable in the C# code, a variable of type int32 is created on conversion to IL code.
The string {0} is loaded on the stack using our trustworthy ldstr. Then, we place the number that is to be passed as a parameter to the WriteLine function on the stack. To do so, we use ldc.i4.s, which loads the constant value on the stack. After this, we initialize the variable V_0 to 20 with the stloc.0 instruction, and then ldloca.s loads the address of the local variable on the stack.
The major roadblock that we experience here is that the WriteLine function accepts a string followed by an object as the next parameter. In this case, the variable is of a value type and not a reference type.
An int32 is a value type variable, whereas the WriteLine function wants a full-fledged object of a reference type.
How do we solve the dilemma of converting a value type into a reference type?
As mentioned earlier, we use the instruction ldloca.s to load the address of the local variable V_0 onto the stack. Thus, our stack contains a string followed by the address of a value type variable, V_0.
Next, we call an instruction called box. There are only two types of variables in the .NET world, i.e. value types and reference types. Boxing is the method that .NET uses to convert a value type variable into a reference type variable.
The box instruction takes an unboxed or value type variable and converts it into a boxed or reference type variable. In the listing studied here, the box instruction needs the address of a value type on the stack and allocates space on the heap for its equivalent reference type. (In the standardized instruction set, ECMA-335, box consumes the value itself rather than its address.)
The heap is an area of memory used to store reference types. The values on the stack disappear at the end of a function, but the heap is available for a much longer duration.
Once this space is allocated, the box instruction initializes the instance fields of the reference object. Then, it places the memory location in the heap of this newly constructed object on the stack. The box instruction requires the memory location of a local variable on the stack.
The constant stored on the stack has no physical address. Thus, the variable V_0 is created to provide the memory location.
This boxed version on the heap is similar to the reference type variables that we are familiar with. To the WriteLine function it simply looks like a System.Object. To access its specific values, we need to unbox it first. The WriteLine function does this internally.
The data type named in the box instruction must be the same as that of the variable whose address has been placed on the stack. We will subsequently explain these details.
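Boxing can be sketched in a runnable form as shown below. Note one deliberate difference: this sketch follows the standardized (ECMA-335) form, in which box consumes the value loaded by ldloc.0, rather than the ldloca.s form discussed above. The assembly name demo and the use of .NET Framework ilasm with mscorlib are assumptions.

```il
// Sketch of: System.Console.WriteLine("{0}", 20);
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .locals init (int32 V_0)
    ldstr "{0}"
    ldc.i4.s 20
    stloc.0                  // V_0 = 20
    ldloc.0                  // ECMA-335 form: load the value, not its address
    box [mscorlib]System.Int32   // copy the value into a new heap object
    call void [mscorlib]System.Console::WriteLine(string, object)
    ret
  }
}
```

After box, the stack holds the string and a reference to a heap object, which is exactly what the (string, object) overload expects.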
The above code is used to display the value of a static variable. The .cctor function initializes the static variable to a value of 10. Then, the string {0} is stored on the stack.
The instruction ldsflda loads the address of a static variable of a certain data type on the stack. Then, as usual, box takes over. The explanation regarding the functionality of box given above is relevant here also.
Static variables in IL work in the same way as instance variables. The only difference is that they have their own set of instructions. Instructions like box need a memory location on the stack, without discriminating between static and instance variables.
The only variation that we indulged in from the earlier program is that we have removed the static constructor. All static variables and instance variables get initialized internally to zero. Thus, IL does not generate any error. Internally, even before the static constructor gets called, the field i is assigned an initial value of zero.
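A sketch of a static field that is never assigned follows. As in standardized IL, it loads the field's value with ldsfld before boxing, instead of its address with ldsflda. We assume the .NET Framework ilasm, mscorlib and the assembly name demo.

```il
// Sketch: a static field with no .cctor starts out as zero.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .field private static int32 i

  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    ldstr "{0}"
    ldsfld int32 zzz::i      // no stsfld has run, so this loads 0
    box [mscorlib]System.Int32
    call void [mscorlib]System.Console::WriteLine(string, object)
    ret
  }
}
```

Adding a .cctor that executes ldc.i4.s 10 followed by stsfld int32 zzz::i would turn the displayed value into 10.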
We have initialized the local i to a value of 10. This cannot be done in the constructor, since the variable i has been created on the stack. Then, stloc.0 has been used to assign the value 10 to V_0. Thereafter, ldloc.0 has been utilized to place the value of the variable V_0 on the stack, so that it is available to the WriteLine function.
The WriteLine function thereafter displays the value on the screen. A field and a local behave in a similar manner, except that they use separate sets of instructions.
All local variables have to be initialized, or else the C# compiler will generate an error message. Here, even though we have eliminated the ldc and stloc instructions from the IL, no error is generated at runtime. Instead, a very large number is displayed.
The variable V_0 has not been initialized to any value. It was created on the stack and contained whatever value was available at the memory location assigned to it. On your machine, the output may be very different from ours.
In a similar situation, the C# compiler will give you an error and not allow you to proceed further, because the variable has not been initialized. IL, on the other hand, is a strange kettle of fish. It is much more lenient in its outlook. It does very few error or sanity checks on the source code. This has its drawback: the programmer has to be much more responsible and careful while using IL.
In the above example, a static variable has been initialized inside a function and not at the time of its creation, as seen earlier. The function vijay runs the same code that was present in the static constructor.
The process given above is the only way to initialize a static or an instance variable.
The above program demonstrates how we can call a function with a single parameter. The rules for placing parameters on the stack are similar to those for the WriteLine function.
Now let us comprehend how a function receives parameters from the stack. We begin by stating the data type and parameter name in the function declaration. This is similar to the workings of C#.
Next, we use the instruction ldarga.s to load the address of the parameter i onto the stack. box will then convert the value type into an object type, and finally the WriteLine function uses these values to display the output on the screen.
In the above example, we have converted an int into an object because the WriteLine function requires the parameter to be of this data type.
The only method of achieving this conversion is by using the box instruction. The box instruction converts an int into an object.
In the function abc, we accept a System.Object, and we use the instruction ldarg and not ldarga. The reason is that we require the value of the parameter and not its address. The number after the dot signifies the parameter number. In order to place the values of parameters on the stack, a new instruction is required.
Thus, IL handles locals, fields and parameters with their own sets of instructions.
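Passing a boxed parameter can be sketched as below. We have made abc a static function here, which is our own simplification: in a static function ldarg.0 is the first declared parameter, whereas in an instance function ldarg.0 would be 'this'. The usual assumptions apply (.NET Framework ilasm, mscorlib, assembly name demo).

```il
// Sketch: box an int at the call site, read it back with ldarg in abc.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    ldc.i4.s 10
    box [mscorlib]System.Int32   // int32 becomes an object
    call void zzz::abc(object)
    ret
  }

  // data type and parameter name in the declaration, as in C#
  .method public hidebysig static void abc(object i) cil managed
  {
    ldstr "{0}"
    ldarg.0                  // the value of parameter i, not its address
    call void [mscorlib]System.Console::WriteLine(string, object)
    ret
  }
}
```

Since i is already an object inside abc, no further boxing is required before the call to WriteLine.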
Functions return values. Here, a static function abc has been called. We know from the function's signature that it returns an int. Return values are stored on the stack.
Thus, the stloc.1 instruction picks up the value on the stack and places it in the local V_1. In this specific case, it is the return value of the function.
newobj is also like a function. It returns an object which, in our case, is an instance of the class zzz, and puts it on the stack.
The stloc instruction has been used repeatedly to initialize all our local variables. Just to refresh your memory, ldloc does the reverse of this process.
To return a value, a function just has to place it on the stack using the trustworthy ldc and then cease execution using the ret instruction.
Thus, the stack has a dual role to play:
• It carries the values passed to functions.
• It receives the return values of the functions.
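The dual role of the stack can be sketched with a function that returns 20. This is a simplified version with a single local, so the return value lands in V_0 rather than V_1. We assume the .NET Framework ilasm, mscorlib and the assembly name demo.

```il
// Sketch: abc leaves its return value on the stack; Main stores and prints it.
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .locals init (int32 V_0)
    call int32 zzz::abc()    // the return value is now on the stack
    stloc.0                  // pick it up into the local
    ldloc.0                  // and place it back for WriteLine
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }

  .method public hidebysig static int32 abc() cil managed
  {
    ldc.i4.s 20              // the value to return
    ret                      // hands the top of the stack to the caller
  }
}
```

The stloc.0 and ldloc.0 pair could be dropped without changing the output, since the return value is already where WriteLine expects its parameter.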
The only innovation that has been introduced in the above example is that the return value of the function abc has been stored in an instance variable.
• stloc assigns the value on the stack to a local variable.
• ldloc, on the other hand, places the value of a local variable on the stack.
At first sight, it is not clear why the reference to the zzz object has to be put on the stack again, especially since abc is a static function and not an instance function. Mind you, static functions are not passed the this pointer on the stack. The reference is there for stfld, not for abc: stfld expects the object reference on the stack beneath the value to be stored.
Thereafter, the function abc is called, which places the value 20 on the stack. The instruction stfld picks up the value 20 and the object reference from the stack, and initializes the instance variable i with this value.
Local and instance variables are handled in a similar manner, except that the instructions for their initialization are different.
The instruction ldfld does the reverse of what stfld does. It places the value of an instance variable on the stack to make it available for the WriteLine function.
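The stack traffic around stfld and ldfld can be sketched as follows. It is a hand-written approximation, assuming the .NET Framework ilasm, mscorlib and the assembly name demo, and it prints through the int32 overload of WriteLine to keep the listing short.

```il
// Sketch of: zzz a = new zzz(); a.i = abc(); System.Console.WriteLine(a.i);
.assembly extern mscorlib {}
.assembly demo {}            // assembly name is an assumption

.class private auto ansi zzz extends [mscorlib]System.Object
{
  .field public int32 i

  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .locals init (class zzz V_0)
    newobj instance void zzz::.ctor()
    stloc.0
    ldloc.0                  // object reference that stfld will consume
    call int32 zzz::abc()    // static call: it does not touch the reference
    stfld int32 zzz::i       // pops the value 20 and the reference beneath it
    ldloc.0
    ldfld int32 zzz::i       // reverse of stfld: pushes the field's value
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }

  .method public hidebysig static int32 abc() cil managed
  {
    ldc.i4.s 20
    ret
  }

  .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
  {
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
}
```

The first ldloc.0 sits on the stack across the call to abc, which explains why the object reference is loaded before a static function is called.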