• Paper: LLM Translation of Compiler Intermediate Representation

    From John R Levine@johnl@taugh.com to comp.compilers on Tue May 12 11:34:57 2026
    From Newsgroup: comp.compilers

    They use an LLM to translate between GCC and LLVM intermediate representation, a
    famously hard task, and claim success even though one table says it's at best 84% correct.

    Abstract

    GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create significant barriers for cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have attempted to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. In this context, Large Language Models (LLMs) appear to be an emerging technology that offers a data-driven alternative, capable of learning complex mappings between heterogeneous compiler IRs directly
    from sufficiently representative examples. To explore this approach, this paper presents IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate GIMPLE (as emitted by GCC) to LLVM IR (as emitted by LLVM). The model is trained on paired IRs extracted from C sources and evaluated on the GIMPLE-to-LLVM IR transformation applied to IRs derived from real-world C code and competitive programming problems. To the best of our knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms the accuracy of widely used models, including the largest state-of-the-art open models available today, ranging from 13 to 1,000 billion parameters, by up to 44
    percentage points. The proposed transformation supports the integration of LLMs as complementary components within hybrid neuro-symbolic compiler architectures,
    where models such as IRIS-14B act as interoperability layers enabling cross-toolchain workflows without modifying existing compiler passes, while traditional compiler infrastructure continues to perform deterministic compilation and optimization.

    https://arxiv.org/abs/2605.08247

    Regards,
    John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.22a-Linux NewsLink 1.2