28 December 2024
Our framework, CodePorting.Translator Cs2Cpp, enables the release of libraries developed for the .NET platform in C++. In this article, we will discuss how we managed to reconcile the memory models of these two languages and ensure the correct operation of the translated code in an unmanaged environment.
You will learn about the smart pointers we use and why we had to develop our own implementations for them. Additionally, we will cover the process of preparing C# code for porting from the perspective of object lifetime management, some of the challenges we encountered, and the specific diagnostic methods we have to use in our work.
C# code is executed in a managed environment with garbage collection. For the purposes of translation, this primarily means that a C# programmer, unlike a C++ developer, does not have to return unused heap memory to the system. This is done by the garbage collector (GC), a component of the CLR that periodically checks which objects are still used in the program and reclaims those that no longer have active references.
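An active reference is considered to be one located on the stack, in a static field, or in a field of an object that is itself reachable through active references.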
The garbage collector algorithm is usually described as traversing the object graph, starting from references located on the stack and in static fields, and then deleting everything that was not reached in the previous stage. To protect against invalidating the reference graph during the GC operation, a stop-the-world mechanism is implemented. This description may be simplified, but it is sufficient for our purposes.
Unlike reference-counting smart pointers, the garbage collection approach is free from the problem of cross or cyclic references. If two objects reference each other, possibly through a number of intermediate objects, this does not prevent the GC from deleting them once the entire group, known as an island of isolation, no longer has active references from outside. Consequently, C# programmers have no reservations about linking objects together at any time and in any combination.
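To make the cyclic reference problem concrete, here is a minimal sketch that uses the standard std::shared_ptr and std::weak_ptr as stand-ins; they are not the translator's own pointer types. The Node type, the alive_nodes counter, and both function names are hypothetical and exist only for illustration.

```cpp
#include <memory>

// Counts Node instances that are currently alive, so a leak becomes observable.
static int alive_nodes = 0;

// Hypothetical node type with one owning and one non-owning link.
struct Node {
    std::shared_ptr<Node> strong_peer;  // owning link: can form a cycle
    std::weak_ptr<Node> weak_peer;      // non-owning link: does not keep the peer alive
    Node() { ++alive_nodes; }
    ~Node() { --alive_nodes; }
};

// Two nodes own each other. After the local references are gone, each node
// is still held by its peer, so neither destructor ever runs.
int leaked_with_strong_cycle() {
    const int before = alive_nodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->strong_peer = a;  // closes the cycle
    }
    return alive_nodes - before;  // nodes that outlived their last external reference
}

// Same shape, but the back link is weak, so the cycle does not own anything.
int leaked_with_weak_back_link() {
    const int before = alive_nodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->weak_peer = a;  // observes a without extending its lifetime
    }
    return alive_nodes - before;
}
```

With reference counting, the first function leaves both nodes alive even though nothing outside the pair can reach them, which is exactly the situation a tracing garbage collector handles without help. The second function shows the usual manual remedy: deciding which direction of the link owns the object.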
Data types in C# are divided into reference and value types. Instances of value types are always located on the stack, in static memory, or directly in fields of structures and objects. In contrast, instances of reference types are always created on the heap, while only their references (addresses) are stored on the stack, in static memory, and in fields. Value types include structures, primitive arithmetic values, and references themselves (the addresses, as opposed to the objects they point to). Reference types include classes and, historically, delegates. The only exceptions to this rule are cases of boxing, where a structure or other value type is copied to the heap for use in a specific context.
The moment of object destruction is not defined: the language guarantees only that it will not occur before the last active reference to the object is removed. Consequently, converted code that destroys an object immediately after its last active reference is removed stays within this guarantee and does not disrupt the program's operation.
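The following sketch shows what destruction "immediately upon the deletion of the last active reference" looks like in C++. It uses std::shared_ptr as a stand-in for whatever pointer type the translated code relies on; the Resource type and the destruction_log function are hypothetical names introduced for this example.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records the order of events around the release of two references
// to the same heap-allocated object.
std::vector<std::string> destruction_log() {
    std::vector<std::string> events;

    // Hypothetical type that reports its own construction and destruction.
    struct Resource {
        std::vector<std::string>& sink;
        explicit Resource(std::vector<std::string>& s) : sink(s) {
            sink.push_back("created");
        }
        ~Resource() { sink.push_back("destroyed"); }
    };

    auto first = std::make_shared<Resource>(events);
    auto second = first;  // two active references to one object

    first.reset();        // one reference remains, so the object stays alive
    events.push_back("first reference released");

    second.reset();       // last reference removed: the destructor runs here
    events.push_back("second reference released");

    return events;
}
```

The destructor runs inside the second reset() call, before the final log entry is written. This is stricter than what C# promises, but it never destroys an object that is still referenced, so it remains compatible with the guarantee described above.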
Another important point is related to the support of generic types and methods in C#. C# allows writing generics once and then using them with both reference and value type parameters. As will be shown later, this point proves to be significant.
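As an illustration of why this matters for a C++ translation, consider a template that must work both when its parameter holds a value and when it holds a reference to a heap object. The sketch below assumes std::shared_ptr as a stand-in for the reference case; Counter and the two function names are hypothetical.

```cpp
#include <memory>

// Hypothetical stand-in for a C# class (a reference type).
struct Counter {
    int count = 0;
};

// One template definition, written once, as with a C# generic.
template <typename T>
class Generic {
public:
    T value;
};

// Value-type argument: copying the container copies the stored value,
// so the two copies change independently.
bool value_copy_is_independent() {
    Generic<int> a{1};
    Generic<int> b = a;
    b.value = 2;
    return a.value == 1;
}

// Reference-type argument: copying the container copies only the pointer,
// so both copies refer to the same heap object.
bool reference_copy_is_shared() {
    Generic<std::shared_ptr<Counter>> a{std::make_shared<Counter>()};
    Generic<std::shared_ptr<Counter>> b = a;
    b.value->count = 2;
    return a.value->count == 2;
}
```

The template body is identical in both cases; the difference in behavior comes entirely from the type substituted for T. Any scheme for representing references in translated code therefore has to be expressible as a template argument.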
A few words about how we map C# types to C++ types. Since the CodePorting.Translator Cs2Cpp framework is intended for porting libraries rather than applications, an important requirement for us is to reproduce the API of the original .NET project as accurately as possible. Therefore, we convert C# classes, interfaces, and structures into C++ classes inheriting the corresponding base types.
For example, consider the following code:
interface I1 {}
interface I2 {}
interface I3 : I2 {}
class A {}
class B : A, I1 {}
class C : B, I2 {}
class D : C, I3 {}
class Generic<T> { public T value; }
struct S {}
It will be translated as follows:
class I1 : public virtual System::Object {};
class I2 : public virtual System::Object {};
class I3 : public virtual I2 {};
class A : public virtual System::Object {};
class B : public A, public virtual I1 {};
class C : public B, public virtual I2 {};
class D : public C, public virtual I3 {};
template <typename T> class Generic { public: T value; };
class S : public System::Object {};
The System::Object class is a system class declared in the translator support library, external to the converted project. Classes and interfaces inherit from it virtually to avoid the diamond problem. Structures in the translated C++ code inherit from System::Object, whereas in C# they inherit from it through System.ValueType, but this extra inheritance is removed for optimization purposes. Generic types and methods are translated into template classes and methods, respectively.
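The effect of virtual inheritance on the diamond problem can be checked with a small self-contained sketch. The Object class below is a hypothetical placeholder, not the real System::Object from the support library; the remaining types mirror the hierarchy from the example above.

```cpp
// Hypothetical placeholder for the common base class.
struct Object {
    virtual ~Object() = default;
};

struct I1 : public virtual Object {};
struct I2 : public virtual Object {};
struct I3 : public virtual I2 {};
struct A : public virtual Object {};
struct B : public A, public virtual I1 {};
struct C : public B, public virtual I2 {};
struct D : public C, public virtual I3 {};

// D reaches Object through several paths (A, I1, I2, and I2 again via I3).
// Because every path is virtual, D contains exactly one Object subobject.
bool single_object_subobject() {
    D d;
    Object* direct = &d;                           // unambiguous only with virtual bases
    Object* via_class = static_cast<C*>(&d);       // path through the class chain
    Object* via_interface = static_cast<I3*>(&d);  // path through the interface chain
    return direct == via_class && via_class == via_interface;
}
```

If the interfaces inherited from Object non-virtually, D would contain several Object subobjects and the conversion from D* to Object* would be rejected by the compiler as ambiguous.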
Instances of translated types must adhere to the same guarantees provided by C#. Class instances must be created on the heap, and their lifetime must be determined by the lifetime of active references to them. Structure instances must be created on the stack, except in cases of boxing. Delegates are a special and relatively trivial case that falls outside the scope of this article.
We have considered how the memory management model in C# affects the code conversion process to C++. In the next part of the article, we will discuss how the decision was made on the method of managing the lifetime of heap-allocated objects and how this approach was implemented.