Linux kernel programming: "Unable to handle kernel access to user memory with fs=KERNEL_DS at virtual address 000000003a8ef000"

Problem description

When running the beegfs client on an arm64 server (Linux kernel 4.19.46), copy_to_user sometimes causes a kernel oops. The system log is as follows:

[Mon Jul 19 18:20:47 2021] Unable to handle kernel access to user memory with fs=KERNEL_DS at virtual address 00000000289de000
[Mon Jul 19 18:20:47 2021] Mem abort info:
[Mon Jul 19 18:20:47 2021]   ESR = 0x9600004e
[Mon Jul 19 18:20:47 2021]   Exception class = DABT (current EL), IL = 32 bits
[Mon Jul 19 18:20:47 2021]   SET = 0, FnV = 0
[Mon Jul 19 18:20:47 2021]   EA = 0, S1PTW = 0
[Mon Jul 19 18:20:47 2021] Data abort info:
[Mon Jul 19 18:20:47 2021]   ISV = 0, ISS = 0x0000004e
[Mon Jul 19 18:20:47 2021]   CM = 0, WnR = 1
[Mon Jul 19 18:20:47 2021] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000c2f1d2c2
[Mon Jul 19 18:20:47 2021] [00000000289de000] pgd=00000180f09ed003, pud=00000180f20f7003, pmd=00e000800f600fd1
[Mon Jul 19 18:20:47 2021] Internal error: Oops: 9600004e [#1] SMP  
[Mon Jul 19 18:20:47 2021] Modules linked in: beegfs(O) zni_net(O) zni_dev(O) knem(O) ip_tables x_tables [last unloaded: beegfs]
[Mon Jul 19 18:20:47 2021] Process IOR-ft (pid: 2125, stack limit = 0x00000000e2d2510d)
[Mon Jul 19 18:20:47 2021] CPU: 1 PID: 2125 Comm: IOR-ft Tainted: G           O      4.19.46-mt+ #354
[Mon Jul 19 18:20:47 2021] Hardware name: M3000 (DT)
[Mon Jul 19 18:20:47 2021] pstate: 00000005 (nzcv daif -PAN -UAO)
[Mon Jul 19 18:20:47 2021] pc : __arch_copy_to_user+0x50/0x160
[Mon Jul 19 18:20:47 2021] lr : copyout+0x54/0x68
[Mon Jul 19 18:20:47 2021] sp : ffff0000104d3780
[Mon Jul 19 18:20:47 2021] x29: ffff0000104d3780 x28: ffff8180f6557c00
[Mon Jul 19 18:20:47 2021] x27: ffff8380c2020008 x26: ffff0000104d38f0
[Mon Jul 19 18:20:47 2021] x25: ffff8180f6557c28 x24: ffff000008b78688
[Mon Jul 19 18:20:47 2021] x23: 0000000000020000 x22: ffff0000104d3900
[Mon Jul 19 18:20:47 2021] x21: 0000000000020000 x20: 0000000000020000
[Mon Jul 19 18:20:47 2021] x19: 0000000000020000 x18: 0000000000000000
[Mon Jul 19 18:20:47 2021] x17: 0000000000000000 x16: 0000000000000000
[Mon Jul 19 18:20:47 2021] x15: 0000000000000400 x14: 0000000000000400
[Mon Jul 19 18:20:47 2021] x13: 00000000000001fe x12: 0000000000000000
[Mon Jul 19 18:20:47 2021] x11: 0000000000000001 x10: 0000000000000930
[Mon Jul 19 18:20:47 2021] x9 : 0000000000000000 x8 : ffff0000104d3c60
[Mon Jul 19 18:20:47 2021] x7 : 0000000000040000 x6 : 00000000289de000
[Mon Jul 19 18:20:47 2021] x5 : 00000000289fe000 x4 : 0000000000000008
[Mon Jul 19 18:20:47 2021] x3 : 0000001b0000001b x2 : 000000000001fff8
[Mon Jul 19 18:20:47 2021] x1 : ffff8380c2000010 x0 : 00000000289de000
[Mon Jul 19 18:20:47 2021] Call trace:
[Mon Jul 19 18:20:47 2021]  __arch_copy_to_user+0x50/0x160
[Mon Jul 19 18:20:47 2021]  _copy_to_iter+0x90/0x3f8
[Mon Jul 19 18:20:47 2021]  __commkit_readfile_receive.isra.4+0x12c/0x180 [beegfs]
[Mon Jul 19 18:20:47 2021]  __commkit_readfile_recvdata+0x98/0x1d0 [beegfs]
[Mon Jul 19 18:20:47 2021]  FhgfsOpsCommkit_communicate+0x318/0xbe0 [beegfs]
[Mon Jul 19 18:20:47 2021]  FhgfsOpsCommKit_readfileV2bCommunicate+0x38/0x58 [beegfs]
[Mon Jul 19 18:20:47 2021]  FhgfsOpsRemoting_readfileVec+0x2e4/0x508 [beegfs]
[Mon Jul 19 18:20:47 2021]  FhgfsOpsRemoting_readfile+0x60/0x88 [beegfs]
[Mon Jul 19 18:20:47 2021]  FhgfsOps_read+0xa8/0x190 [beegfs]
[Mon Jul 19 18:20:47 2021]  __vfs_read+0x30/0x158
[Mon Jul 19 18:20:47 2021]  vfs_read+0x90/0x160
[Mon Jul 19 18:20:47 2021]  ksys_read+0x64/0xd8
[Mon Jul 19 18:20:47 2021]  __arm64_sys_read+0x18/0x20
[Mon Jul 19 18:20:47 2021]  el0_svc_common+0x84/0xf0
[Mon Jul 19 18:20:47 2021]  el0_svc_handler+0x24/0x80
[Mon Jul 19 18:20:47 2021]  el0_svc+0x8/0xc
[Mon Jul 19 18:20:47 2021] Code: b8404423 b80044c3 36180064 f8408423 (f80084c3)
[Mon Jul 19 18:20:47 2021] ---[ end trace daa2f1a08c5b3727 ]---
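The `Mem abort info` and `Data abort info` lines are simply fields of ESR = 0x9600004e. Below is a minimal userspace sketch that decodes them using the ESR_EL1 layout from the Arm architecture reference manual. The helper names are hypothetical. Decoded, the value describes a data abort taken at the current exception level (EC 0x25), caused by a write (WnR = 1), with DFSC 0x0e. That DFSC is a permission fault at translation level 2, not a translation fault on an unmapped address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: ESR_EL1 field extraction following the ARMv8-A layout. */
#define ESR_EC_DABT_CUR 0x25u  /* data abort without a change in exception level */

static unsigned esr_ec(uint32_t esr)  { return (esr >> 26) & 0x3fu; }  /* exception class, bits[31:26] */
static bool     esr_il(uint32_t esr)  { return (esr >> 25) & 0x1u; }   /* 1 = 32-bit instruction */
static uint32_t esr_iss(uint32_t esr) { return esr & 0x1ffffffu; }     /* syndrome, bits[24:0] */

/* Data-abort ISS fields. */
static bool     dabt_isv(uint32_t iss)  { return (iss >> 24) & 0x1u; } /* instruction syndrome valid */
static bool     dabt_wnr(uint32_t iss)  { return (iss >> 6) & 0x1u; }  /* 1 = abort caused by a write */
static unsigned dabt_dfsc(uint32_t iss) { return iss & 0x3fu; }        /* data fault status code */

/* DFSC 0b0011xx is a permission fault; the low two bits give the lookup level. */
static bool     dfsc_is_permission(unsigned dfsc) { return (dfsc & 0x3cu) == 0x0cu; }
static unsigned dfsc_level(unsigned dfsc)         { return dfsc & 0x3u; }
```

For example, `esr_ec(0x9600004e)` yields 0x25 and `dabt_dfsc(esr_iss(0x9600004e))` yields 0x0e. Both oopses in this question report the same ESR value.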

Even when copy_to_user is called from the first read function that the VFS calls into (FhgfsOps_read), it still causes the same kernel oops:

[Thu Jul 22 08:19:49 2021] Unable to handle kernel access to user memory with fs=KERNEL_DS at virtual address 000000003a8ef000
[Thu Jul 22 08:19:49 2021] Mem abort info:
[Thu Jul 22 08:19:49 2021]   ESR = 0x9600004e
[Thu Jul 22 08:19:49 2021]   Exception class = DABT (current EL), IL = 32 bits
[Thu Jul 22 08:19:49 2021]   SET = 0, FnV = 0
[Thu Jul 22 08:19:49 2021]   EA = 0, S1PTW = 0
[Thu Jul 22 08:19:49 2021] Data abort info:
[Thu Jul 22 08:19:49 2021]   ISV = 0, ISS = 0x0000004e
[Thu Jul 22 08:19:49 2021]   CM = 0, WnR = 1
[Thu Jul 22 08:19:49 2021] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000055c2b623
[Thu Jul 22 08:19:49 2021] [000000003a8ef000] pgd=0000020074421003, pud=00000200002c5003, pmd=00e0020009000fd1
[Thu Jul 22 08:19:49 2021] Internal error: Oops: 9600004e [#1] SMP
[Thu Jul 22 08:19:49 2021] Modules linked in: orcafs(O) zni_net(O) zni_dev(O) knem(O) ip_tables x_tables [last unloaded: orcafs]
[Thu Jul 22 08:19:49 2021] Process IOR-ft (pid: 15237, stack limit = 0x00000000d5fdfa99)
[Thu Jul 22 08:19:49 2021] CPU: 3 PID: 15237 Comm: IOR-ft Tainted: G           O      4.19.46-mt+ #354
[Thu Jul 22 08:19:49 2021] Hardware name: M3000 (DT)
[Thu Jul 22 08:19:49 2021] pstate: 20000005 (nzCv daif -PAN -UAO)
[Thu Jul 22 08:19:49 2021] pc : __arch_copy_to_user+0x110/0x160
[Thu Jul 22 08:19:49 2021] lr : FhgfsOps_access_mem+0x100/0x130 [orcafs]
[Thu Jul 22 08:19:49 2021] sp : ffff0000104b3c50
[Thu Jul 22 08:19:49 2021] x29: ffff0000104b3c50 x28: ffff8380f1784080
[Thu Jul 22 08:19:49 2021] x27: ffff8780f1cc3480 x26: ffff820072784200
[Thu Jul 22 08:19:49 2021] x25: 000000003a8ef000 x24: ffff8780f1cc3480
[Thu Jul 22 08:19:49 2021] x23: 0000ffffffffffff x22: 000000003a8ef000
[Thu Jul 22 08:19:49 2021] x21: ffff000019a39000 x20: 0000000000100000
[Thu Jul 22 08:19:49 2021] x19: ffff8380f1784080 x18: 0000000000000000
[Thu Jul 22 08:19:49 2021] x17: 0000000000000000 x16: 0000000000000000
[Thu Jul 22 08:19:49 2021] x15: 0000000000000000 x14: 0000000000000000
[Thu Jul 22 08:19:49 2021] x13: 0000000000000000 x12: 0000000000000000
[Thu Jul 22 08:19:49 2021] x11: 0000000000000000 x10: 0000000000000000
[Thu Jul 22 08:19:49 2021] x9 : 0000000000000000 x8 : 0000000000000000
[Thu Jul 22 08:19:49 2021] x7 : 0000000000000000 x6 : 000000003a8ef000
[Thu Jul 22 08:19:49 2021] x5 : 000000003a9ef000 x4 : 0000000000000000
[Thu Jul 22 08:19:49 2021] x3 : ffff8400764d79c8 x2 : 00000000000fff80
[Thu Jul 22 08:19:49 2021] x1 : ffff000019a39040 x0 : 000000003a8ef000
[Thu Jul 22 08:19:49 2021] Call trace:
[Thu Jul 22 08:19:49 2021]  __arch_copy_to_user+0x110/0x160
[Thu Jul 22 08:19:49 2021]  FhgfsOps_read+0xa8/0x1c8 [orcafs]
[Thu Jul 22 08:19:49 2021]  __vfs_read+0x30/0x158
[Thu Jul 22 08:19:49 2021]  vfs_read+0x90/0x160
[Thu Jul 22 08:19:49 2021]  ksys_read+0x64/0xd8
[Thu Jul 22 08:19:49 2021]  __arm64_sys_read+0x18/0x20
[Thu Jul 22 08:19:49 2021]  el0_svc_common+0x84/0xf0
[Thu Jul 22 08:19:49 2021]  el0_svc_handler+0x24/0x80
[Thu Jul 22 08:19:49 2021]  el0_svc+0x8/0xc
[Thu Jul 22 08:19:49 2021] Code: a8c12027 a8c12829 a8c1302b a8c1382d (a88120c7)
[Thu Jul 22 08:19:49 2021] ---[ end trace dddd700636b0af2d ]---
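The `pmd=` values in the two logs (00e000800f600fd1 and 00e0020009000fd1) have identical attribute bits. They can be decoded with the VMSAv8-64 stage 1 descriptor layout for a 4 KiB granule. The sketch below uses hypothetical helper names. It shows that both are 2 MiB block mappings with the access flag set, AP[2:1] = 0b11 (read-only at EL1 and EL0), and DBM (bit 51) clear. If 4.19 arm64 still reuses bit 51 as its software `PTE_WRITE` bit (my reading of the headers, not verified against this tree), the kernel itself treated the destination range as write-protected. That is consistent with the write permission fault reported by ESR = 0x9600004e.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: VMSAv8-64 stage 1 block/page descriptor fields. */
static bool     desc_is_block(uint64_t d) { return (d & 0x3u) == 0x1u; } /* bits[1:0] = 01 at level 1/2 */
static unsigned desc_ap(uint64_t d)       { return (d >> 6) & 0x3u; }    /* AP[2:1], bits[7:6] */
static bool     desc_af(uint64_t d)       { return (d >> 10) & 0x1u; }   /* access flag */
static bool     desc_dbm(uint64_t d)      { return (d >> 51) & 0x1u; }   /* dirty bit modifier */
static bool     desc_pxn(uint64_t d)      { return (d >> 53) & 0x1u; }   /* privileged execute-never */
static bool     desc_uxn(uint64_t d)      { return (d >> 54) & 0x1u; }   /* unprivileged execute-never */

/* AP[2] = 1: no write access at any EL; AP[1] = 1: EL0 may access the mapping. */
static bool desc_read_only(uint64_t d)  { return (desc_ap(d) & 0x2u) != 0; }
static bool desc_el0_access(uint64_t d) { return (desc_ap(d) & 0x1u) != 0; }
```

For both logged values, `desc_is_block()` is true, `desc_ap()` is 3 and `desc_dbm()` is false. So the user buffer was mapped and accessible, but not writable, at the moment of the store.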

My code is:

void FhgfsOps_access_mem(Logger* log, char __user *buf, size_t size)
{
    const char* logContext = "FhgfsOps_access_mem";
    char *p_buff = NULL;
    unsigned long ret;

    mm_segment_t old_fs = get_fs();

    /* zero-filled kernel buffer used as the copy source */
    p_buff = vzalloc(size);
    if (p_buff == NULL)
    {
        Logger_logFormatted(log, Log_ERR, logContext, "vzalloc %zu failed.", size);
        return;
    }

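    /* widen addr_limit to KERNEL_DS around the copy, then restore it */
    set_fs(KERNEL_DS);
    ret = copy_to_user(buf, p_buff, (unsigned long)size);
    set_fs(old_fs);
    if (ret)
    {
        /* copy_to_user() returns the number of bytes that could NOT be copied */
        Logger_logFormatted(log, Log_ERR, logContext, "copy_to_user failed ret = %lu.", ret);
    }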

    vfree(p_buff);

    return;
}

ssize_t FhgfsOps_read(struct file* file, char __user *buf, size_t size, loff_t *offsetPointer)
{
    App* app = FhgfsOps_getApp(file_dentry(file)->d_sb);
    Logger* log = App_getLogger(app);
    const char* logContext = "FhgfsOps_read";

    struct inode* inode = file->f_mapping->host;
    FhgfsInode* fhgfsInode = ORCAFS_INODE(inode);
    FsFileInfo* fileInfo = __FhgfsOps_getFileInfo(file);
    RemotingIOInfo ioInfo;
    ssize_t readRes;

    FhgfsOpsHelper_logOpDebug(app, file_dentry(file), inode, __func__, "(offset: %lld; size: %lld)",
        (long long)*offsetPointer, (long long)size);
    IGNORE_UNUSED_VARIABLE(app);

    FsFileInfo_getIOInfo(fileInfo, fhgfsInode, &ioInfo);

    if (app->cfg->tuneCoherentBuffers)
    {
        readRes = filemap_write_and_wait(file->f_mapping);
        if (readRes < 0)
            return readRes;

        // ignore the -EBUSY we could receive here, because there is just *no* way we can keep caches
        // coherent without locking everything all the time. if this produces inconsistent data,
        // something must have been racy anyway.
        invalidate_inode_pages2(file->f_mapping);
    }

    FhgfsOps_access_mem(log, buf, size);
    ...

I also tried clear_user and put_user in FhgfsOps_access_mem. They cause the same kernel oops as copy_to_user.
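copy_to_user, clear_user and put_user all store to user memory, so each of them faults the same way on a write-protected destination page. As far as I can tell from the 4.19 sources, the message in the title comes from a check in arm64's `do_page_fault()` (arch/arm64/mm/fault.c). An EL1 permission fault on a user address oopses when the addr_limit saved on exception entry was KERNEL_DS, instead of going through the normal user-fault path. The sketch below models only that decision, in plain C. The struct and function names are hypothetical, and the other checks in that function (exception-table lookup, instruction aborts) are left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one branch of the 4.19 arm64 fault
 * handler; this is NOT kernel code. */
enum fault_outcome { HANDLE_NORMALLY, DIE_KERNEL_DS };

struct el1_fault {
    bool user_addr;         /* faulting address is in the TTBR0 (user) range */
    bool permission_fault;  /* DFSC reports a permission fault */
    bool addr_limit_kernel; /* addr_limit was KERNEL_DS when the abort was taken */
};

static enum fault_outcome classify(struct el1_fault f)
{
    /* A user-address permission fault taken under KERNEL_DS is reported as
     * "access to user memory with fs=KERNEL_DS" and oopses. Otherwise the
     * fault continues to the ordinary handling, which can e.g. resolve a
     * copy-on-write fault and let the store be retried. */
    if (f.user_addr && f.permission_fault && f.addr_limit_kernel)
        return DIE_KERNEL_DS;
    return HANDLE_NORMALLY;
}
```

Under this reading, the fault itself is an ordinary one: a write to a mapped but write-protected user page. It becomes fatal only because set_fs(KERNEL_DS) is in effect. copy_to_user() into a genuine `__user` buffer such as `buf` does not need set_fs() at all.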

Tags: linux-kernel, kernel, arm64
