CLI Back-End and Front-End

Latest News

2007-07-10

Added CLI front-end

2007-05-14

Roberto Costa steps down as a maintainer, replaced by Andrea Ornstein and Erven Rohou.

2007-01-09

Added documentation about the back-end internal structure.

2006-09-07

Creation of st/cli branch.

Introduction

CLI is a framework that defines a platform-independent format for executables and a run-time environment for the execution of applications. The framework has been standardized by the European Computer Manufacturers Association (ECMA-335) and by the International Organization for Standardization (ISO/IEC 23271:2006). CLI executables are encoded in the Common Intermediate Language (CIL), a stack-based bytecode language. The CLI framework is designed to support several programming languages with different abstraction levels, from object-oriented managed languages to low-level languages with no managed execution at all.

The purpose of this project is to develop a GCC back-end that produces CLI-compliant binaries. The initial focus is on the C language (more precisely, C99); C++ is likely to be considered in the future, as well as any other language for which there is interest in a CLI back-end.

To explore the potential of .NET as a deployment file format, we also developed a CIL front-end (again based on GCC), in collaboration with HiPEAC.

The original maintainer of the branch is Roberto Costa. Since May 2007, the current maintainers are Andrea Ornstein and Erven Rohou.

The implementation currently resides in the st/cli branch.

Contributing

Check out the st/cli branch following the instructions found in the SVN documentation.

Since this is a branch, the usual maintainer rules do not apply. The branch is maintained by Andrea Ornstein and Erven Rohou. Anyone may check in to the branch, provided that the change is coordinated with the branch maintainers and that the usual contribution and testing rules are followed. The branch is still in heavy development and check-ins into the mainline are not planned yet.

The CLI back-end

Unlike a typical GCC back-end, the CLI back-end stops the compilation flow at the end of the middle-end passes and, without going through any RTL pass, emits CIL bytecode directly from the GIMPLE representation. RTL is not a convenient representation from which to emit CLI code, while GIMPLE is much better suited to this purpose.

CIL bytecode is much higher-level than processor machine code. For instance, there is no concept of registers or of a stack frame; instructions operate on an unbounded set of locals (which closely match the concept of local variables in high-level languages) and on elements on top of an evaluation stack. In addition, CIL bytecode is strongly typed and requires high-level data type information that is not preserved across RTL.
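To make the stack-based model concrete, here is a minimal sketch in C of a toy evaluation-stack interpreter. The opcode names imitate CIL mnemonics (ldc.i4, ldloc, stloc, add, mul, ret), but the enum, the instruction struct and run_toy are hypothetical names introduced for illustration only; this is not how the back-end or any CLI run-time is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, named after the CIL mnemonics they imitate. */
enum toy_op { TOY_LDC_I4, TOY_LDLOC, TOY_STLOC, TOY_ADD, TOY_MUL, TOY_RET };

struct toy_insn {
    enum toy_op op;
    int32_t     arg;  /* constant for LDC_I4, local index for LDLOC/STLOC */
};

#define TOY_STACK_MAX 16

/* Runs a toy program. Operands live on an evaluation stack and named
   storage lives in `locals`; there are no registers and no visible
   frame layout. Arithmetic overflow is not handled in this sketch. */
int32_t run_toy(const struct toy_insn *code, size_t n_insns,
                int32_t *locals, size_t n_locals)
{
    int32_t stack[TOY_STACK_MAX];
    size_t sp = 0;  /* number of values currently on the stack */

    for (size_t pc = 0; pc < n_insns; pc++) {
        const struct toy_insn *insn = &code[pc];
        switch (insn->op) {
        case TOY_LDC_I4:  /* push a constant */
            assert(sp < TOY_STACK_MAX);
            stack[sp++] = insn->arg;
            break;
        case TOY_LDLOC:   /* push the value of a local */
            assert(sp < TOY_STACK_MAX && (size_t)insn->arg < n_locals);
            stack[sp++] = locals[insn->arg];
            break;
        case TOY_STLOC:   /* pop the top of the stack into a local */
            assert(sp >= 1 && (size_t)insn->arg < n_locals);
            locals[insn->arg] = stack[--sp];
            break;
        case TOY_ADD:     /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] + stack[sp];
            break;
        case TOY_MUL:     /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] * stack[sp];
            break;
        case TOY_RET:     /* return the value on top of the stack */
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
    return 0;  /* program ended without TOY_RET */
}
```

A program such as "ldc.i4 2, ldc.i4 3, add, stloc 0, ldloc 0, ldc.i4 4, mul, ret" computes (2 + 3) * 4 in this toy machine, with the intermediate sum passing through local 0.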

Target machine model

Like existing GCC back-ends, CLI is truly seen as a target machine and, as such, it follows GCC policy about the organization of the back-end specific files.

Unfortunately, it is not feasible to define a single CLI target machine. The reason is that, when dealing with languages with unmanaged data such as C and C++, the pointer size of the target machine must be known at compile time. Therefore, separate 32-bit and 64-bit CLI targets are defined, namely cil32 and cil64. CLI binaries compiled for cil32 are not guaranteed to work on 64-bit machines and vice versa. Current work focuses on the cil32 target, but the differences between the two are minimal.

Since cil32 is the target machine, the machine model description is located in files config/cil32/cil32.*. This is an overview of such a description:
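The following C sketch illustrates why the pointer size has to be fixed when compiling unmanaged code: sizeof and offsetof are constant expressions, so the compiler folds layout decisions that depend on the pointer size into the generated code. The struct and function names are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* A typical unmanaged C data structure containing a pointer. */
struct list_node {
    int               value;
    struct list_node *next;
};

/* The size of this pool is a compile-time constant that depends on
   the pointer size of the target. */
static unsigned char node_pool[4 * sizeof(struct list_node)];

/* Each of these returns a value the compiler computes at compile time. */
size_t pointer_size(void) { return sizeof(void *); }
size_t node_size(void)    { return sizeof(struct list_node); }
size_t next_offset(void)  { return offsetof(struct list_node, next); }
size_t pool_bytes(void)   { return sizeof node_pool; }
```

On common ABIs with a 4-byte int, node_size() is typically 8 with 4-byte pointers and 16 with 8-byte pointers, so the same source yields different constants depending on the target.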

CIL simplification pass

Though most GIMPLE tree codes closely match what is representable in CIL, some simply do not. Those codes could still be expressed in CIL bytecode by a CIL-emission pass; however, performing the required transformations at CIL emission time (for example, those that involve generating new local temporary variables or modifying the control-flow graph or types) would be much more difficult than performing them directly on GIMPLE expressions.

Pass simpcil (file config/cil32/tree-simp-cil.c) is in charge of performing such transformations. The input is any code in GIMPLE form; the outcome is still valid GIMPLE, but it contains only constructs for which CIL emission is straightforward. Such a constrained GIMPLE format is referred to as "CIL simplified" GIMPLE throughout this documentation.

The pass is currently performed just once, after leaving SSA form and immediately before the CIL emission. This is not a constraint; the only requirement is that the CIL emission is immediately preceded by a run of simpcil. The simpcil pass is designed to be idempotent, and it is perfectly fine to insert additional earlier runs in the compilation flow. Given its current position in the list of passes, simpcil does not yet support SSA form (though support is planned).

This is a non-exhaustive list of simpcil transformations:

CIL emission pass

Pass cil (file config/cil32/gen-cil.c) receives a CIL-simplified GIMPLE form as input and it produces a CLI assembly file as output. It is the final pass of the compilation flow.

Before the emission proper, cil currently merges GIMPLE expressions in an attempt to eliminate local variables. Eliminating such variables improves both the performance and the size of the generated code (each such useless local variable results in an avoidable pair of stloc and ldloc CIL opcodes). The resulting code is no longer in valid GIMPLE form; this is fine because the code stays in this form only within the pass. This is conceptually (perhaps not only conceptually) similar to what is done by the out-of-ssa pass; out-of-ssa may even be more powerful in doing this, since it operates in SSA form. It may be interesting to move the simpcil pass before out-of-ssa and to avoid any variable elimination in cil. To be evaluated.

Here is an overview of how the cil pass handles some GIMPLE constructs. Constructs for which the emission is straightforward are omitted.
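As an illustration of the effect of merging expressions, consider the two equivalent C functions below. The CIL listings in the comments are hand-written sketches of what a straightforward stack-machine translation could look like; they are not actual output of the back-end, and the function names are hypothetical.

```c
/* Before merging: the temporary `t` is a separate local variable.
   Hand-written CIL sketch of a naive translation:
       ldarg.0
       ldarg.1
       add
       stloc.0     <- store the temporary
       ldloc.0     <- immediately reload it
       ldarg.2
       mul
       ret                                                        */
int scale_sum_with_temp(int a, int b, int c)
{
    int t = a + b;
    return t * c;
}

/* After merging: the sum stays on the evaluation stack, so the
   stloc/ldloc pair disappears.
   Hand-written CIL sketch:
       ldarg.0
       ldarg.1
       add
       ldarg.2
       mul
       ret                                                        */
int scale_sum_merged(int a, int b, int c)
{
    return (a + b) * c;
}
```

Both functions compute the same value; only the number of locals and of stloc/ldloc opcodes in the sketched translation differs.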

The CLI front-end

The objective of the project was to create a new GCC frontend able to take a .NET executable as input, and produce optimized native code as output.

This frontend, called gcccil, would allow us to achieve two goals:

Gcccil primarily targets assemblies produced by the GCC CIL back-end, since this is more convenient for achieving the first of the goals mentioned above.

Implemented functionality

Since the time available for the project was limited, it was not possible to produce a complete implementation of the standard. However, the subset implemented is enough to correctly compile some medium-sized CIL programs produced by the GCC CIL back-end. Basically, everything required to compile assemblies produced by the GCC CIL back-end has been implemented and tested.

In particular, the following features have been implemented:

Missing functionality

On the other hand, almost every feature that is not required to compile assemblies produced by the GCC CIL back-end remains unimplemented.

This includes:

The main obstacle to implementing these features is the lack of a run-time library (similar to libgcj), which is necessary to implement virtual machine services such as garbage collection and reflection. Also, the standard class library (CORLIB) needs to be ported to this environment.

Implementation overview

Gcccil does not implement its own CLR metadata parser. Instead, it uses Mono to "parse" the input assembly. That is, Mono is used to load the assembly and parse the metadata and types. The frontend only has to parse the actual CIL code of each method. Mono provides a comprehensive API to allow this.

Once gcccil has loaded the assembly (or assemblies) to be compiled, it builds GCC types for all the types declared or referenced in the assembly. For this, the assemblies declaring the referenced types may need to be loaded too.

CIL basic types (int32, intptr, float...) are translated to their obvious GCC equivalents. To translate classes and value types, the GCC RECORD_TYPE tree node is used. There is some support for class inheritance, although it cannot be tested yet.

Generating types with explicit layout and size requires some additional effort. GCC already supports one language with explicit-layout types (Ada); however, CIL allows the definition of some types that are not directly translatable using the current GCC infrastructure. In particular, CIL allows the definition of types with "holes", that is, types of a given size that don't have fields defined for part of their storage (or that don't have fields at all). However, the contents of that storage can be accessed using pointer arithmetic and must be preserved. GCC4NET produces these kinds of types very frequently. Some GCC optimization passes don't expect these types, so a different representation has to be used in these cases.
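The C sketch below shows what a type with "holes" means in practice and one possible way to model it so that all of its storage is preserved. The source does not say which representation gcccil actually uses; the byte-array struct, the size of 16 and every name here are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a CIL value type declared with an explicit
   size of 16 bytes whose only declared field is a 32-bit integer at
   offset 0. Bytes 4..15 are a "hole": no field covers them. */
#define HOLEY_SIZE 16

/* Modelling the whole type as raw bytes makes every byte, including the
   hole, part of the object's value, so copies must preserve it. */
struct holey {
    unsigned char bytes[HOLEY_SIZE];
};

/* Access the declared field at offset 0 through memcpy. */
void holey_set_tag(struct holey *h, int32_t tag)
{
    memcpy(h->bytes, &tag, sizeof tag);
}

int32_t holey_get_tag(const struct holey *h)
{
    int32_t tag;
    memcpy(&tag, h->bytes, sizeof tag);
    return tag;
}

/* Code may also reach into the hole using plain byte offsets. */
void holey_poke(struct holey *h, size_t offset, unsigned char value)
{
    h->bytes[offset] = value;
}

/* Copying by value copies all HOLEY_SIZE bytes, not only the field. */
struct holey holey_copy(const struct holey *src)
{
    return *src;
}
```

With this representation, a byte written into the hole through holey_poke survives holey_copy, which is the property the text above requires.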

Once the types have been parsed, gcccil parses the CIL code stream of the methods defined in the assembly in order to build GCC GENERIC trees for them. Translating from CIL opcodes to GENERIC trees should be straightforward once the types are built, since CIL opcodes are simple instructions that almost always have a direct translation to GENERIC. The hardest part is using the correct type conversions to get the correct CIL semantics in the presence of all the GCC optimizations.

Gcccil cannot compile methods that use an unsupported feature. Such methods can be skipped, allowing the user to provide a native implementation if necessary.
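To illustrate why explicit type conversions matter, the sketch below models two CIL conversion opcodes in C. In CIL, conv.u1 and conv.i1 narrow a value to 8 bits and push the result back on the evaluation stack as an int32 (zero-extended and sign-extended respectively). The helper names are hypothetical, and this only demonstrates the semantics; it does not show how gcccil builds the corresponding GENERIC trees.

```c
#include <stdint.h>

/* Models CIL conv.u1: keep the low 8 bits, then zero-extend to int32.
   The cast to uint8_t is well defined in C (reduction modulo 256). */
int32_t toy_conv_u1(int32_t value)
{
    return (int32_t)(uint8_t)value;
}

/* Models CIL conv.i1: keep the low 8 bits, then sign-extend to int32.
   The sign extension is written out explicitly to avoid relying on
   implementation-defined narrowing to a signed type. */
int32_t toy_conv_i1(int32_t value)
{
    uint8_t low = (uint8_t)value;
    return low >= 0x80 ? (int32_t)low - 0x100 : (int32_t)low;
}
```

For example, the low byte 0xFF becomes 255 under toy_conv_u1 and -1 under toy_conv_i1, so omitting or choosing the wrong conversion in the translation would change program results.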

Readings

[1] ECMA, Common Language Infrastructure (CLI), 4th edition, June 2006.
[2] John Gough, Compiling for the .NET Common Language Runtime (CLR), Prentice Hall, ISBN 0-13-062296-6.
[3] Serge Lidin, Inside Microsoft .NET IL Assembler, Microsoft Press, ISBN 073561547.