dotNET Framework Essentials 3rd Edition

2.5 Intermediate Language (IL)

In software engineering, the concept of abstraction is extremely important. We often use abstraction to hide the complexity of system or application services, providing instead a simple interface to the consumer. As long as we can keep the interface the same, we can change the hideous internals, and different consumers can use the same interface.

As programming languages advanced, computer scientists introduced different incarnations of language-abstraction layers, such as p-code and bytecode. Produced by the Pascal-P compiler, p-code is an intermediate language that supports procedural programming. Generated by Java compilers, bytecode is an intermediate language that supports object-oriented programming. Bytecode is a language abstraction that allows Java code to run on different operating platforms, as long as the platforms have a Java Virtual Machine (JVM) to execute bytecode.

Microsoft calls its own language-abstraction layer the Microsoft Intermediate Language (MSIL), or IL for short. IL is an implementation of the Common Intermediate Language (CIL), a key element of the ECMA CLI specification. Similar to bytecode, IL supports all object-oriented features, including data abstraction, inheritance, polymorphism, and useful concepts such as exceptions and events. In addition to these features, IL supports other concepts, such as properties, fields, and enumerations. Any .NET language can be compiled into IL, so .NET supports multiple languages and multiple platforms, as long as the target platforms have a CLR.

Shipped with the .NET SDK, Partition III CIL.doc describes the important IL instructions that language compilers should use. In addition to this specification, the .NET SDK includes another important document, Partition II Metadata.doc. Both of these documents are intended for developers who write compilers and tools, but you should read them to further understand how IL fits into .NET. Although you can develop a valid .NET assembly using the supported IL instructions and features, you'll find IL to be very tedious because the instructions are a bit cryptic. However, should you decide to write pure IL code, you could use the IL Assembler (ilasm.exe) to turn your IL code into a .NET PE file.^[8]

^[8] You can test this utility using the IL disassembler to load a .NET PE file and dump out the IL to a text file. Once you've done this, use the IL Assembler to convert the text file back into a .NET PE file.

Enough with the theory: let's take a look at some IL. Here's an excerpt of IL code for the hello.exe program that we wrote earlier:^[9]

^[9] Don't compile this IL code: it's incomplete because we've removed some obscure details to make it easier to read. If you want to see the complete IL code, use ildasm.exe on hello.exe.

.class private auto ansi beforefieldinit MainApp
  extends [mscorlib]System.Object
{
  .method public hidebysig static 
          void Main(  ) cil managed
  {
    .entrypoint
    .maxstack  1
    ldstr "C# hello world!"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  } // End of method MainApp::Main

  .method public hidebysig specialname rtspecialname 
    instance void .ctor(  ) cil managed
  {
    .maxstack  1
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor(  )
    ret
  } // End of method MainApp::.ctor

} // End of class MainApp

Ignoring the weird-looking syntactic details, you can see that IL is conceptually the same as any other object-oriented language. Clearly, there is a class that is called MainApp that derives from System.Object. This class supports a static method called Main( ), which contains the code to dump out a text string to the console. Although we didn't write a constructor for this class, our C# compiler has added the default constructor for MainApp to support object construction.
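For comparison, C# source along the following lines would compile to IL of the shape shown above. This is a sketch reconstructed from the IL excerpt, not a copy of the earlier listing, so details such as the missing access modifier on the class (inferred from the private flag in the IL) are assumptions.

```csharp
using System;

// No access modifier: a top-level C# class defaults to internal,
// which ildasm displays as "private" (an assumption drawn from the IL excerpt).
class MainApp
{
    // Becomes the static method that carries the .entrypoint directive.
    public static void Main()
    {
        // Compiles to ldstr followed by call ...Console::WriteLine(string).
        Console.WriteLine("C# hello world!");
    }

    // No constructor is written here; the compiler emits a public
    // parameterless .ctor that simply calls System.Object's constructor.
}
```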

Since a lengthy discussion of IL is beyond the scope of this book, let's just concentrate on the Main( ) method to examine its implementation briefly. First, you see the following method signature:

.method public hidebysig static 
        void Main(  ) cil managed

This signature declares a method that is public (meaning that it can be called by anyone) and static (meaning it's a class-level method). The name of this method is Main( ). The cil managed attributes indicate that Main( ) contains IL code that is to be managed, or executed, by the CLR. The hidebysig attribute (hide by name and signature) says that this method hides only those methods with the same name and the same signature defined earlier in the class hierarchy; inherited methods that merely share the name remain visible. This is the behavior of languages such as C# and Java, whereas C++ hides by name alone. Having gone over the method signature, let's talk about the method body itself:
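To make hiding by signature concrete, here is a small C# sketch. The class and method names (Base, Derived, Show, HideBySigDemo) are hypothetical and exist only for illustration. The C# compiler marks the methods it emits as hidebysig, so Derived.Show(int) hides only the inherited overload that has the same signature.

```csharp
using System;

class Base
{
    public void Show(int value)    { Console.WriteLine("Base.Show(int)"); }
    public void Show(string value) { Console.WriteLine("Base.Show(string)"); }
}

class Derived : Base
{
    // Hides Base.Show(int) only; Base.Show(string) is still reachable.
    public new void Show(int value) { Console.WriteLine("Derived.Show(int)"); }
}

class HideBySigDemo
{
    static void Main()
    {
        Derived d = new Derived();
        d.Show(42);          // prints "Derived.Show(int)"
        d.Show("text");      // prints "Base.Show(string)": this overload was not hidden
        ((Base)d).Show(42);  // prints "Base.Show(int)": hiding is not overriding
    }
}
```

Under hide-by-name rules, the second call would not compile, because declaring any Show( ) in Derived would hide every inherited Show( ) overload.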

{
  .entrypoint
  .maxstack 1
  ldstr "C# hello world!"
  call void [mscorlib]System.Console::WriteLine(string)
  ret
} // End of method MainApp::Main

This method uses two directives: .entrypoint and .maxstack. The .entrypoint directive specifies that Main( ) is the one and only entry point for this assembly. The .maxstack directive specifies the maximum number of stack slots needed by this method; in this case, Main( ) requires only one. Stack information is needed for each IL method because IL instructions are stack-based: they take their operands from, and leave their results on, an evaluation stack, which makes it easy for language compilers to generate IL code.
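The stack-based model can be sketched in a few lines of C#. This is a toy illustration, not how the CLR is implemented: the EvalStackSketch class and its Push and PopArguments helpers are hypothetical. It pushes three values the way load instructions would, hands them to a call in the order they were pushed, and records the deepest the stack ever got, which is the number a compiler would write after .maxstack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of an evaluation stack.
class EvalStackSketch
{
    static readonly Stack<object> stack = new Stack<object>();
    static int maxDepth = 0;

    // Models a load instruction such as ldstr: push one value.
    static void Push(object value)
    {
        stack.Push(value);
        maxDepth = Math.Max(maxDepth, stack.Count);
    }

    // Models the argument handling of a call: pops argCount values and
    // returns them in the order in which they were pushed.
    static object[] PopArguments(int argCount)
    {
        object[] args = new object[argCount];
        for (int i = argCount - 1; i >= 0; i--)
        {
            args[i] = stack.Pop();
        }
        return args;
    }

    static void Main()
    {
        // Roughly: three ldstr instructions followed by a call to
        // Console.WriteLine(string, object, object).
        Push("{0} {1}");
        Push("hello");
        Push("world");

        object[] args = PopArguments(3);
        Console.WriteLine((string)args[0], args[1], args[2]);  // prints "hello world"
        Console.WriteLine("max stack depth: " + maxDepth);     // prints "max stack depth: 3"
    }
}
```

The hello.exe excerpt pushes only a single string before its call, which is why its .maxstack value is 1.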

In addition to these directives, this method uses three IL instructions. The first IL instruction, ldstr, loads our literal string onto the stack so that the code in the same block can use it. The next IL instruction, call, invokes the WriteLine( ) method, which picks up the string from the stack. The call IL instruction expects the method's arguments to be on the stack, with the first argument being the first object pushed on the stack, the second argument being the second object pushed onto the stack, and so forth. In addition, when you use the call instruction to invoke a method, you must specify the method's signature. For example, examine the method signature of WriteLine( ):

void [mscorlib]System.Console::WriteLine(string)

and you'll see that WriteLine( ) is a static method of the Console class. The Console class belongs to the System namespace and is implemented in the mscorlib assembly. The WriteLine( ) method takes a string (an alias for System.String) and returns void. The last thing to note in this IL snippet is that the ret IL instruction simply returns control to the caller.
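These facts about WriteLine( ) can also be checked from C# with reflection. The sketch below is illustrative only, and the class name SignatureCheck is hypothetical. The last line's output is deliberately not annotated, because the assembly that hosts Console depends on the runtime: it is mscorlib on the .NET Framework this book describes, but it may differ on other CLI implementations.

```csharp
using System;
using System.Reflection;

class SignatureCheck
{
    static void Main()
    {
        // Look up the WriteLine overload that takes a single string.
        MethodInfo writeLine =
            typeof(Console).GetMethod("WriteLine", new Type[] { typeof(string) });

        Console.WriteLine(writeLine.IsStatic);                       // prints "True"
        Console.WriteLine(writeLine.ReturnType == typeof(void));     // prints "True"
        Console.WriteLine(typeof(string) == typeof(System.String));  // prints "True"
        Console.WriteLine(typeof(Console).Namespace);                // prints "System"

        // Name of the assembly that implements Console (runtime-dependent).
        Console.WriteLine(typeof(Console).Assembly.GetName().Name);
    }
}
```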

Because .NET assemblies contain IL code that tools such as ildasm.exe can readily disassemble, anyone with access to your assemblies can inspect your proprietary algorithms. To make your intellectual property harder to recover, use an obfuscator, either the one that comes with Visual Studio .NET or one that is commercially available.
