15 February 2024
At first glance, the translator seems to have only one use: we feed it C# code and expect equivalent C++ code as output. This is indeed the most common mode, but far from the only one. Below are other modes supported by the code translation framework and related utilities.
Products are developed by programmers who would need to learn the details of how code is translated to other languages, and the limitations that come with it. As a result, situations arise where changes that product developers make, although correct from the C# point of view, break the release process for other languages.
During the development of the project, we have tried several ways to automate the detection of such problems:
Restricting programmers to an older C# language version requires discipline and is often inconvenient. To get around this restriction, translation can be done in two stages: first, constructs from newer versions of C# are replaced with supported equivalents from earlier versions (this step is called lowering), and then the translation proper is performed.
With a translator built on an outdated parser, lowering can only be done by external tools (for example, Roslyn-based utilities). Translators built on Roslyn, by contrast, perform both stages in sequence, so the same lowering code serves both for their own translation and for preparing code for translation by the older tools.
Translating usage examples is similar to translating product code but comes with somewhat different requirements. When translating a library of tens of millions of lines, the first priority is to follow the behavior of the original C# code as strictly as possible, even at the expense of readability: simpler code that behaves differently takes longer to debug. Usage examples, on the other hand, should be as simple as possible and make it clear how to use the code from C++, even if they do not exactly reproduce the behavior of the original C# examples.
For example, when creating temporary objects, C# programmers often use the using statement to avoid resource leaks and to control exactly when resources are released, rather than relying on the GC. A strict translation of using produces fairly complex C++ code because of many nuances of the kind "if an exception is thrown in the using statement block and Dispose() also throws, which of the two exceptions reaches the catching context?". Such code would only mislead a C++ programmer into thinking the library is hard to use, whereas in fact a smart pointer on the stack, which deletes the object and frees its resources at the right moment, is quite enough.
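The contrast can be sketched in C++. This is an illustration, not the translator's actual output: it assumes a hypothetical TempResource class standing in for a translated IDisposable type, and the global counters exist only so the example can report what happened. UsingStrict mirrors the C# semantics: Dispose() runs on every path, and if both the body and Dispose() throw, the exception from Dispose() replaces the original one, as it does when thrown from a C# finally block. UseSimply is the kind of code a usage example should contain.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a translated C# IDisposable class.
// The counters are global only so the example can report what happened.
int g_disposed = 0;
int g_destroyed = 0;

struct TempResource {
    bool fail_on_dispose = false;
    void Use(bool fail) {
        if (fail) throw std::runtime_error("body");
    }
    void Dispose() {
        ++g_disposed;
        if (fail_on_dispose) throw std::runtime_error("dispose");
    }
    ~TempResource() { ++g_destroyed; }  // releases the object's resources
};

// Strict sketch of C#: using (var r = expr) { body(r); }
// Dispose() runs on every path; if both the body and Dispose() throw,
// the exception from Dispose() replaces the original, as with a C# finally block.
template <typename T, typename Body>
void UsingStrict(const std::shared_ptr<T>& r, Body body) {
    try {
        body(r);
    } catch (...) {
        if (r) r->Dispose();  // may throw and replace the body's exception
        throw;                // otherwise rethrow the body's exception
    }
    if (r) r->Dispose();
}

// Answers the question from the text: which exception reaches the caller?
std::string WhichExceptionWins() {
    auto r = std::make_shared<TempResource>();
    r->fail_on_dispose = true;
    try {
        UsingStrict(r, [](const std::shared_ptr<TempResource>& p) { p->Use(true); });
    } catch (const std::exception& e) {
        return e.what();
    }
    return "none";
}

// What a usage example needs: a smart pointer on the stack frees the object
// (and, through its destructor, its resources) when the scope ends.
void UseSimply() {
    auto r = std::make_unique<TempResource>();
    r->Use(false);
}  // ~TempResource() runs here
```

Here WhichExceptionWins() returns "dispose". The strict version cannot simply move Dispose() into a destructor: since C++11 destructors are implicitly noexcept, and an exception escaping one calls std::terminate, which is one reason a faithful translation of using is more verbose than the idiomatic one.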
Libraries that expose an API can be documented with XML comments, following C# conventions. Transferring these comments to C++ (for example, into Doxygen format) is not a trivial task. Besides converting the markup, references to types and their members must be rewritten (in C#, fully qualified names are separated by dots; in C++, by double colons), and for properties the translator must also determine whether the reference is to the getter or the setter. Code fragments embedded in the comments must be translated as well, even though they cannot be semantically analyzed and may be incomplete.
This task is handled both by the translator itself and by external utilities, which, for example, analyze the generated XML documentation and prepare additional fragments such as method usage examples.
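The name-rewriting part of this task can be sketched as follows. The sketch assumes the documentation ID strings that the C# compiler writes into the XML documentation file (prefixes such as T: for types, M: for methods, P: for properties) and a hypothetical get_/set_ naming convention for translated property accessors; CrefToCpp, Accessor and the MyLib names are made up for illustration. It splits names naively on dots, so nested types, generic arity suffixes such as `1 and explicit interface implementations would need the semantic information a real translator has.

```cpp
#include <string>

// Which accessor a property reference should point to in C++ (hypothetical).
enum class Accessor { None, Getter, Setter };

// Converts a C# documentation ID such as "T:MyLib.Text.Paragraph"
// into a C++ qualified name such as "MyLib::Text::Paragraph".
std::string CrefToCpp(const std::string& cref, Accessor acc = Accessor::None) {
    std::string id = cref;
    char kind = 0;
    // Strip the kind prefix ("T:", "M:", "P:", ...) and remember it.
    if (id.size() > 2 && id[1] == ':') {
        kind = id[0];
        id = id.substr(2);
    }
    // Drop a method parameter list such as "(System.String)".
    const auto paren = id.find('(');
    if (paren != std::string::npos) id.erase(paren);

    // C# separates namespaces, types and members with '.', C++ with "::".
    std::string out;
    for (char c : id) {
        if (c == '.') out += "::";
        else out += c;
    }
    // A property becomes a getter or setter method (assumed get_/set_ prefixes).
    if (kind == 'P' && acc != Accessor::None) {
        const auto pos = out.rfind("::");
        const std::string::size_type start = (pos == std::string::npos) ? 0 : pos + 2;
        out.insert(start, acc == Accessor::Getter ? "get_" : "set_");
    }
    return out;
}
```

For example, CrefToCpp("P:MyLib.Text.Paragraph.Style", Accessor::Getter) yields "MyLib::Text::Paragraph::get_Style", while the getter/setter decision itself has to come from the context in which the comment refers to the property.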
As we can see, besides translating C# code to C++ with high quality, a professional code conversion framework should be able to check whether the source code is translatable, lower the language version when necessary, and translate both the usage examples for the converted libraries and their documentation.