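linux-kernel - Linux kernel programming: "Unable to handle kernel access to user memory with fs=KERNEL_DS at virtual address 000000003a8ef000"
Problem description
When running the beegfs client on an arm64 server (Linux kernel 4.19.46), copy_to_user sometimes causes a kernel oops. The system log is as follows: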
[Mon Jul 19 18:20:47 2021] Unable to handle kernel access to user memory with fs=KERNEL_DS at virtual address 00000000289de000
[Mon Jul 19 18:20:47 2021] Mem abort info:
[Mon Jul 19 18:20:47 2021] ESR = 0x9600004e
[Mon Jul 19 18:20:47 2021] Exception class = DABT (current EL), IL = 32 bits
[Mon Jul 19 18:20:47 2021] SET = 0, FnV = 0
[Mon Jul 19 18:20:47 2021] EA = 0, S1PTW = 0
[Mon Jul 19 18:20:47 2021] Data abort info:
[Mon Jul 19 18:20:47 2021] ISV = 0, ISS = 0x0000004e
[Mon Jul 19 18:20:47 2021] CM = 0, WnR = 1
[Mon Jul 19 18:20:47 2021] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000c2f1d2c2
[Mon Jul 19 18:20:47 2021] [00000000289de000] pgd=00000180f09ed003, pud=00000180f20f7003, pmd=00e000800f600fd1
[Mon Jul 19 18:20:47 2021] Internal error: Oops: 9600004e [#1] SMP
[Mon Jul 19 18:20:47 2021] Modules linked in: beegfs(O) zni_net(O) zni_dev(O) knem(O) ip_tables x_tables [last unloaded: beegfs]
[Mon Jul 19 18:20:47 2021] Process IOR-ft (pid: 2125, stack limit = 0x00000000e2d2510d)
[Mon Jul 19 18:20:47 2021] CPU: 1 PID: 2125 Comm: IOR-ft Tainted: G O 4.19.46-mt+ #354
[Mon Jul 19 18:20:47 2021] Hardware name: M3000 (DT)
[Mon Jul 19 18:20:47 2021] pstate: 00000005 (nzcv daif -PAN -UAO)
[Mon Jul 19 18:20:47 2021] pc : __arch_copy_to_user+0x50/0x160
[Mon Jul 19 18:20:47 2021] lr : copyout+0x54/0x68
[Mon Jul 19 18:20:47 2021] sp : ffff0000104d3780
[Mon Jul 19 18:20:47 2021] x29: ffff0000104d3780 x28: ffff8180f6557c00
[Mon Jul 19 18:20:47 2021] x27: ffff8380c2020008 x26: ffff0000104d38f0
[Mon Jul 19 18:20:47 2021] x25: ffff8180f6557c28 x24: ffff000008b78688
[Mon Jul 19 18:20:47 2021] x23: 0000000000020000 x22: ffff0000104d3900
[Mon Jul 19 18:20:47 2021] x21: 0000000000020000 x20: 0000000000020000
[Mon Jul 19 18:20:47 2021] x19: 0000000000020000 x18: 0000000000000000
[Mon Jul 19 18:20:47 2021] x17: 0000000000000000 x16: 0000000000000000
[Mon Jul 19 18:20:47 2021] x15: 0000000000000400 x14: 0000000000000400
[Mon Jul 19 18:20:47 2021] x13: 00000000000001fe x12: 0000000000000000
[Mon Jul 19 18:20:47 2021] x11: 0000000000000001 x10: 0000000000000930
[Mon Jul 19 18:20:47 2021] x9 : 0000000000000000 x8 : ffff0000104d3c60
[Mon Jul 19 18:20:47 2021] x7 : 0000000000040000 x6 : 00000000289de000
[Mon Jul 19 18:20:47 2021] x5 : 00000000289fe000 x4 : 0000000000000008
[Mon Jul 19 18:20:47 2021] x3 : 0000001b0000001b x2 : 000000000001fff8
[Mon Jul 19 18:20:47 2021] x1 : ffff8380c2000010 x0 : 00000000289de000
[Mon Jul 19 18:20:47 2021] Call trace:
[Mon Jul 19 18:20:47 2021] __arch_copy_to_user+0x50/0x160
[Mon Jul 19 18:20:47 2021] _copy_to_iter+0x90/0x3f8
[Mon Jul 19 18:20:47 2021] __commkit_readfile_receive.isra.4+0x12c/0x180 [beegfs]
[Mon Jul 19 18:20:47 2021] __commkit_readfile_recvdata+0x98/0x1d0 [beegfs]
[Mon Jul 19 18:20:47 2021] FhgfsOpsCommkit_communicate+0x318/0xbe0 [beegfs]
[Mon Jul 19 18:20:47 2021] FhgfsOpsCommKit_readfileV2bCommunicate+0x38/0x58 [beegfs]
[Mon Jul 19 18:20:47 2021] FhgfsOpsRemoting_readfileVec+0x2e4/0x508 [beegfs]
[Mon Jul 19 18:20:47 2021] FhgfsOpsRemoting_readfile+0x60/0x88 [beegfs]
[Mon Jul 19 18:20:47 2021] FhgfsOps_read+0xa8/0x190 [beegfs]
[Mon Jul 19 18:20:47 2021] __vfs_read+0x30/0x158
[Mon Jul 19 18:20:47 2021] vfs_read+0x90/0x160
[Mon Jul 19 18:20:47 2021] ksys_read+0x64/0xd8
[Mon Jul 19 18:20:47 2021] __arm64_sys_read+0x18/0x20
[Mon Jul 19 18:20:47 2021] el0_svc_common+0x84/0xf0
[Mon Jul 19 18:20:47 2021] el0_svc_handler+0x24/0x80
[Mon Jul 19 18:20:47 2021] el0_svc+0x8/0xc
[Mon Jul 19 18:20:47 2021] Code: b8404423 b80044c3 36180064 f8408423 (f80084c3)
[Mon Jul 19 18:20:47 2021] ---[ end trace daa2f1a08c5b3727 ]---
Even when copy_to_user is called directly in the read handler, the first function the VFS calls, it still triggers the same kernel oops:
[Thu Jul 22 08:19:49 2021] Unable to handle kernel access to user memory with fs=KERNEL_DS at virtual address 000000003a8ef000
[Thu Jul 22 08:19:49 2021] Mem abort info:
[Thu Jul 22 08:19:49 2021] ESR = 0x9600004e
[Thu Jul 22 08:19:49 2021] Exception class = DABT (current EL), IL = 32 bits
[Thu Jul 22 08:19:49 2021] SET = 0, FnV = 0
[Thu Jul 22 08:19:49 2021] EA = 0, S1PTW = 0
[Thu Jul 22 08:19:49 2021] Data abort info:
[Thu Jul 22 08:19:49 2021] ISV = 0, ISS = 0x0000004e
[Thu Jul 22 08:19:49 2021] CM = 0, WnR = 1
[Thu Jul 22 08:19:49 2021] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000055c2b623
[Thu Jul 22 08:19:49 2021] [000000003a8ef000] pgd=0000020074421003, pud=00000200002c5003, pmd=00e0020009000fd1
[Thu Jul 22 08:19:49 2021] Internal error: Oops: 9600004e [#1] SMP
[Thu Jul 22 08:19:49 2021] Modules linked in: orcafs(O) zni_net(O) zni_dev(O) knem(O) ip_tables x_tables [last unloaded: orcafs]
[Thu Jul 22 08:19:49 2021] Process IOR-ft (pid: 15237, stack limit = 0x00000000d5fdfa99)
[Thu Jul 22 08:19:49 2021] CPU: 3 PID: 15237 Comm: IOR-ft Tainted: G O 4.19.46-mt+ #354
[Thu Jul 22 08:19:49 2021] Hardware name: M3000 (DT)
[Thu Jul 22 08:19:49 2021] pstate: 20000005 (nzCv daif -PAN -UAO)
[Thu Jul 22 08:19:49 2021] pc : __arch_copy_to_user+0x110/0x160
[Thu Jul 22 08:19:49 2021] lr : FhgfsOps_access_mem+0x100/0x130 [orcafs]
[Thu Jul 22 08:19:49 2021] sp : ffff0000104b3c50
[Thu Jul 22 08:19:49 2021] x29: ffff0000104b3c50 x28: ffff8380f1784080
[Thu Jul 22 08:19:49 2021] x27: ffff8780f1cc3480 x26: ffff820072784200
[Thu Jul 22 08:19:49 2021] x25: 000000003a8ef000 x24: ffff8780f1cc3480
[Thu Jul 22 08:19:49 2021] x23: 0000ffffffffffff x22: 000000003a8ef000
[Thu Jul 22 08:19:49 2021] x21: ffff000019a39000 x20: 0000000000100000
[Thu Jul 22 08:19:49 2021] x19: ffff8380f1784080 x18: 0000000000000000
[Thu Jul 22 08:19:49 2021] x17: 0000000000000000 x16: 0000000000000000
[Thu Jul 22 08:19:49 2021] x15: 0000000000000000 x14: 0000000000000000
[Thu Jul 22 08:19:49 2021] x13: 0000000000000000 x12: 0000000000000000
[Thu Jul 22 08:19:49 2021] x11: 0000000000000000 x10: 0000000000000000
[Thu Jul 22 08:19:49 2021] x9 : 0000000000000000 x8 : 0000000000000000
[Thu Jul 22 08:19:49 2021] x7 : 0000000000000000 x6 : 000000003a8ef000
[Thu Jul 22 08:19:49 2021] x5 : 000000003a9ef000 x4 : 0000000000000000
[Thu Jul 22 08:19:49 2021] x3 : ffff8400764d79c8 x2 : 00000000000fff80
[Thu Jul 22 08:19:49 2021] x1 : ffff000019a39040 x0 : 000000003a8ef000
[Thu Jul 22 08:19:49 2021] Call trace:
[Thu Jul 22 08:19:49 2021] __arch_copy_to_user+0x110/0x160
[Thu Jul 22 08:19:49 2021] FhgfsOps_read+0xa8/0x1c8 [orcafs]
[Thu Jul 22 08:19:49 2021] __vfs_read+0x30/0x158
[Thu Jul 22 08:19:49 2021] vfs_read+0x90/0x160
[Thu Jul 22 08:19:49 2021] ksys_read+0x64/0xd8
[Thu Jul 22 08:19:49 2021] __arm64_sys_read+0x18/0x20
[Thu Jul 22 08:19:49 2021] el0_svc_common+0x84/0xf0
[Thu Jul 22 08:19:49 2021] el0_svc_handler+0x24/0x80
[Thu Jul 22 08:19:49 2021] el0_svc+0x8/0xc
[Thu Jul 22 08:19:49 2021] Code: a8c12027 a8c12829 a8c1302b a8c1382d (a88120c7)
[Thu Jul 22 08:19:49 2021] ---[ end trace dddd700636b0af2d ]---
My code is:
void FhgfsOps_access_mem(Logger* log, char __user *buf, size_t size)
{
    const char* logContext = "FhgfsOps_access_mem";
    char *p_buff = NULL;
    unsigned long ret;
    mm_segment_t old_fs = get_fs();

    p_buff = (char *)vzalloc(size);
    if (p_buff == NULL)
    {
        Logger_logFormatted(log, Log_ERR, logContext, "vmalloc %lld failed.", size);
        return;
    }

    set_fs(KERNEL_DS);
    ret = copy_to_user(buf, p_buff, (unsigned long)size);
    set_fs(old_fs);

    if (ret)
    {
        Logger_logFormatted(log, Log_ERR, logContext, "copy_to_user failed ret = %lld.", ret);
    }

    vfree(p_buff);
    return;
}

ssize_t FhgfsOps_read(struct file* file, char __user *buf, size_t size, loff_t *offsetPointer)
{
    App* app = FhgfsOps_getApp(file_dentry(file)->d_sb);
    Logger* log = App_getLogger(app);
    const char* logContext = "FhgfsOps_read";
    struct inode* inode = file->f_mapping->host;
    FhgfsInode* fhgfsInode = ORCAFS_INODE(inode);
    FsFileInfo* fileInfo = __FhgfsOps_getFileInfo(file);
    RemotingIOInfo ioInfo;
    ssize_t readRes;

    FhgfsOpsHelper_logOpDebug(app, file_dentry(file), inode, __func__, "(offset: %lld; size: %lld)",
        (long long)*offsetPointer, (long long)size);
    IGNORE_UNUSED_VARIABLE(app);

    FsFileInfo_getIOInfo(fileInfo, fhgfsInode, &ioInfo);

    if (app->cfg->tuneCoherentBuffers)
    {
        readRes = filemap_write_and_wait(file->f_mapping);
        if (readRes < 0)
            return readRes;

        // ignore the -EBUSY we could receive here, because there is just *no* way we can keep caches
        // coherent without locking everything all the time. if this produces inconsistent data,
        // something must have been racy anyway.
        invalidate_inode_pages2(file->f_mapping);
    }

    FhgfsOps_access_mem(log, buf, size);
    ...
I also tested clear_user and put_user in FhgfsOps_access_mem; they cause the same kernel oops as copy_to_user.
Solution