LLVM IR codegen segfaults during exit only when method declarations have parameters

Question

Explanation

I am writing a compiler for a C-like language with yacc/bison, flex, and the LLVM C++ API (LLVM 12). I have been developing and testing on Ubuntu 20.04.3 LTS (Focal Fossa) and macOS 11.6 Big Sur. The problem is that the compiler segfaults on exit whenever a method declaration has parameters, for example:

func test(x int) void {}

The LLVM IR is printed correctly:

; ModuleID = 'Test'
source_filename = "Test"

define void @test(i32 %x) {
entry:
  %x1 = alloca i32, align 4
  store i32 %x, i32* %x1, align 4
  ret void
}

The program then segfaults immediately afterwards.

A method declaration like

func test() int {
    var x int;
    x = 5;
    return (x);
}

does not segfault.

GDB reports that the segfault occurs during llvm::LLVMContextImpl::~LLVMContextImpl(). Valgrind reports ~LLVMContextImpl() doing an invalid read of size 8.

Edit: Valgrind output relating to invalid read

==10254== Invalid read of size 8
==10254==    at 0x5553C30: llvm::LLVMContextImpl::~LLVMContextImpl() (in /usr/lib/x86_64-linux-gnu/libLLVM-12.so.1)
==10254==    by 0x5552130: llvm::LLVMContext::~LLVMContext() (in /usr/lib/x86_64-linux-gnu/libLLVM-12.so.1)
==10254==    by 0xA44AA26: __run_exit_handlers (exit.c:108)
==10254==    by 0xA44ABDF: exit (exit.c:139)
==10254==    by 0xA4280B9: (below main) (libc-start.c:342)
==10254==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10254== 
==10254== 
==10254== Process terminating with default action of signal 11 (SIGSEGV)
==10254==  Access not within mapped region at address 0x0
==10254==    at 0x5553C30: llvm::LLVMContextImpl::~LLVMContextImpl() (in /usr/lib/x86_64-linux-gnu/libLLVM-12.so.1)
==10254==    by 0x5552130: llvm::LLVMContext::~LLVMContext() (in /usr/lib/x86_64-linux-gnu/libLLVM-12.so.1)
==10254==    by 0xA44AA26: __run_exit_handlers (exit.c:108)
==10254==    by 0xA44ABDF: exit (exit.c:139)
==10254==    by 0xA4280B9: (below main) (libc-start.c:342)
==10254==  If you believe this happened as a result of a stack
==10254==  overflow in your program's main thread (unlikely but
==10254==  possible), you can try to increase the size of the
==10254==  main thread stack using the --main-stacksize= flag.
==10254==  The main thread stack size used in this run was 8388608.

I'm hoping that by asking here I can get some kind of hint for how to work towards solving this issue. I've been stuck on this for days.

Source Code Fragments

The parts of my code that deal with method declarations and method parameters are as follows. I apologize for the length.

Bison grammar rule for program

program: extern_list decafpackage
    { 
        ProgramAST *prog = new ProgramAST((decafStmtList*)$1, (PackageAST*)$2); 
        if (printAST) {
            cout << getString(prog) << endl;
        }
        prog->Codegen();
        delete prog;
    }
    ;

Bison grammar rule for method declaration

method_decl: T_FUNC T_ID T_LPAREN params T_RPAREN method_type method_block 
    {
        $$ = new Method(*$2, $6->str(), $4, $7);
        delete $2; 
        delete $6;
    }

Bison grammar rule for method parameter

param: T_ID type { $$ = new VarDef(*$1, $2->str()); delete $1; delete $2; }
    ;

C++ Method::Codegen() handling of parameters

llvm::Function *func = llvm::Function::Create(
            llvm::FunctionType::get(returnTy, args, false),
            llvm::Function::ExternalLinkage,
            name,
            TheModule
        );

llvm::BasicBlock *BB = llvm::BasicBlock::Create(TheContext, "entry", func);
Builder.SetInsertPoint(BB);

. . .

for (auto &Arg : func->args()) {
            llvm::AllocaInst* Alloca = CreateEntryBlockAlloca(func, Arg.getName().str());
            Builder.CreateStore(&Arg, Alloca);
            sTStack->enter_symtbl(Arg.getName().str(), Alloca);
        }

C++ VarDef::Codegen()

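llvm::Value *Codegen() {
        // Map the declared type name to its LLVM type.
        llvm::Type* ty = getLLVMType(type);
        // A null array size allocates a single element.
        llvm::AllocaInst* V = Builder.CreateAlloca(ty, nullptr, name);
        V->setName(name);
        // Record the stack slot so later references can find it.
        sTStack->enter_symtbl(name, V);
        return V;
    }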

Bison main

int main() {
  // Setup
  llvm::LLVMContext &Context = TheContext;
  TheModule = new llvm::Module("Test", Context);
  FPM = std::make_unique<llvm::legacy::FunctionPassManager>(TheModule);
  FPM->add(llvm::createPromoteMemoryToRegisterPass());
  FPM->add(llvm::createInstructionCombiningPass());
  FPM->add(llvm::createReassociatePass());
  FPM->add(llvm::createGVNPass());
  FPM->add(llvm::createCFGSimplificationPass());
  FPM->doInitialization();

  int retval = yyparse();

  TheModule->print(llvm::errs(), nullptr);
  return(retval >= 1 ? EXIT_FAILURE : EXIT_SUCCESS);
}

Tags: c++, segmentation-fault, bison, flex-lexer, llvm-c++-api

Solution

The problem was in code I had not included in the question. llvm::Function::Create requires an llvm::FunctionType, which is built from a vector of llvm::Type* parameter types. I wrote a function to fill that vector:

void getLLVMTypes(vector<llvm::Type*>* v) {
    for (auto* i : stmts) {
        llvm::Type* type = getLLVMType(i->getType());
        ((llvm::Value*)(type))->setName(i->getName()); // Problem
        v->push_back(type);
    }
}

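The issue was casting each llvm::Type* to llvm::Value* and calling llvm::Value::setName on it. I had added this to work around an earlier problem with parameter names not being set. llvm::Type does not derive from llvm::Value, so the cast is invalid and setName treats the type object as though it were a Value, which is undefined behavior. I could not confirm the exact mechanism because I had trouble compiling LLVM from source with debug flags, but corrupting type objects owned by the LLVMContext would be consistent with the crash in ~LLVMContextImpl(), and it explains why methods without parameters were unaffected. Removing the line, and preserving method parameter names another way, solved the issue.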
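For anyone hitting the same thing, here is a rough sketch of one way to keep parameter names without touching the type objects. It does not use LLVM at all: Type, Argument, Function, Param, and makeFunction are hypothetical stand-ins I made up so the snippet compiles on its own. The idea is to collect names next to the types and apply them to the argument objects once the function exists. In real LLVM code the last step would presumably be a call to setName on each llvm::Argument from func->args(), since llvm::Argument is a Value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for llvm::Type and llvm::Argument so that this
// sketch compiles without LLVM. As in LLVM, the two classes are unrelated.
struct Type { std::string spelling; };
struct Argument {
    const Type *type;
    std::string name;
    void setName(const std::string &n) { name = n; }
};

// No implicit conversion exists between unrelated classes, so only a C-style
// cast or reinterpret_cast compiles, and using the result is undefined behavior.
static_assert(!std::is_convertible<Type *, Argument *>::value,
              "Type* must not convert to Argument*");

// Hypothetical parameter record, standing in for the question's VarDef.
struct Param { std::string name; const Type *type; };

// Stand-in for llvm::Function: one unnamed Argument per parameter type.
struct Function {
    std::vector<Argument> args;
    explicit Function(const std::vector<const Type *> &types) {
        for (const Type *t : types) args.push_back(Argument{t, ""});
    }
};

// Collect types and names side by side instead of storing names on the types.
Function makeFunction(const std::vector<Param> &params) {
    std::vector<const Type *> types;
    std::vector<std::string> names;
    for (const Param &p : params) {
        types.push_back(p.type);
        names.push_back(p.name);
    }
    Function f(types);
    assert(f.args.size() == names.size());
    // Name each argument object only after the function has been created.
    for (std::size_t i = 0; i < f.args.size(); ++i) f.args[i].setName(names[i]);
    return f;
}
```

Calling makeFunction with two Param entries named x and y that share one Type yields a Function whose arguments carry those names, and the shared Type object is never written to.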

