r - data.table not reading characters appropriately
Problem description
I have the following tibble:
> a
# A tibble: 1 x 1
Page
<chr>
1 勒布朗·詹姆斯_zh.wikipedia.org_desktop_all-agents
> dput(a)
structure(list(Page = "<U+52D2><U+5E03><U+6717>·<U+8A79><U+59C6><U+65AF>_zh.wikipedia.org_desktop_all-agents"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"))
When I convert it to a data.table, the characters are no longer displayed correctly:
b <- as.data.table(a)
> b
Page
1: <U+52D2><U+5E03><U+6717>·<U+8A79><U+59C6><U+65AF>_zh.wikipedia.org_desktop_all-agents
I get this data frame from a .csv file, where these Chinese characters only display correctly when I use read_csv. With fread, it doesn't work even if I set encoding = 'UTF-8'. How can I overcome this problem with data.table?
Here is my sessionInfo():
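To separate "fread read the bytes wrong" from "the table is only printed with escapes", it may help to check what fread actually stored. The following is a minimal sketch, not the original workflow: the temporary file and the `page` variable are hypothetical stand-ins for the real .csv, and the string is written with `\u` escapes so the script itself does not depend on the source file's encoding.

```r
library(data.table)

# Hypothetical one-row file standing in for the real .csv
page <- "\u52d2\u5e03\u6717\u00b7\u8a79\u59c6\u65af_zh.wikipedia.org_desktop_all-agents"
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
# useBytes = TRUE writes the UTF-8 bytes as-is, without re-encoding to the locale
writeLines(enc2utf8(c("Page", page)), con, useBytes = TRUE)
close(con)

# Read it back, asking fread to mark the strings as UTF-8
dt <- fread(tmp, sep = ",", header = TRUE, encoding = "UTF-8")

# Inspect the stored value rather than the printed table
enc_mark <- Encoding(dt$Page)          # encoding mark on the string
same_text <- identical(dt$Page, page)  # TRUE if the characters survived the round trip
print(enc_mark)
print(same_text)
```

If `same_text` is TRUE, the data was read intact and any `<U+XXXX>` sequences in the printed table come from how the table is displayed, not from fread.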
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_3.0.3 readr_1.3.1 data.table_1.13.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 knitr_1.29 magrittr_1.5 hms_0.5.3 R6_2.4.1
[7] rlang_0.4.7 fansi_0.4.1 tools_4.0.2 xfun_0.16 tinytex_0.25 utf8_1.1.4
[13] cli_2.0.2 htmltools_0.5.0 ellipsis_0.3.1 yaml_2.2.1 digest_0.6.25 assertthat_0.2.1
[19] lifecycle_0.2.0 crayon_1.3.4 vctrs_0.3.2 glue_1.4.1 evaluate_0.14 rmarkdown_2.3
[25] compiler_4.0.2 pillar_1.4.6 pkgconfig_2.0.3
Update:
If I print the element on its own, it displays correctly:
> b[[1]]
[1] "勒布朗·詹姆斯_zh.wikipedia.org_desktop_all-agents"
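That the column prints correctly on its own suggests the string inside the data.table is intact and that only the table display escapes the characters as `<U+XXXX>`, which would be consistent with the Windows-1252 locale shown in the session info. A minimal sketch to check this, assuming a plain data.frame as a stand-in for the tibble `a` (so that only data.table is needed):

```r
library(data.table)

# Stand-in for the tibble `a`; \u escapes keep the script encoding-independent
a <- data.frame(
  Page = "\u52d2\u5e03\u6717\u00b7\u8a79\u59c6\u65af_zh.wikipedia.org_desktop_all-agents",
  stringsAsFactors = FALSE
)
b <- as.data.table(a)

# Compare the stored strings, not their printed form
unchanged <- identical(a$Page, b$Page)

# Code point of the first character, formatted like the escapes in the printout
first_cp <- sprintf("U+%04X", utf8ToInt(substr(b$Page, 1, 1)))

print(unchanged)
print(first_cp)
print(Encoding(b$Page))
```

If `unchanged` is TRUE and `first_cp` matches the first escape in the printed table, the conversion did not alter the data; the difference lies in how the table is rendered in the console under the current locale.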
Solution