sas - SAS中的列而不是行中的日期 - 优点?
问题描述
我的客户中很少有人在列中使用 SAS 存储日期。
例如:
| Id | Variable1_201101 | Variable1_201102 | ... | Variable1_201909 | Variable2_201101 | Variable2_201102 | ... |
等等
而不是在行中存储日期:
| Id | Date | Variable1 | Variable2 |
结果,它们有大量的单元格,因为即使某些 ID 在特定日期不存在,第一个结构中也会有空单元格,而在第二个结构中,该行将被省略。
我从未在 SQL 中遇到过这样的存储结构,这不是完美的解决方案。SAS中这种结构有什么优势吗?
解决方案
There is never a perfect storage structure. There are superior structures for solutions to problems at hand. Sometimes you have to reshape data for a particular solution, sometimes a procedure has grammar or mechanisms for reshaping within the procedure itself.
For example, examining a variable in different time frames in The TTEST Procedure
might use a PAIRED
statement and require different variables for the values. Thus the comparing Jan-2011 values to Jan-2012 values would make sense to have structure with Variable1_201101
Variable1_201201
.
Disk space for sparse wide data can be reduced effectively using COMPRESS=
options, at the cost of decompression CPU cycles. Depending on the data it can be significantly less disk use, but then is hard to deal with in alternate categorical analysis.
Traditional RDBMS has the categorical form (vertical) as a very common best practice, with indexing and foreign keys. If this is the original layout, you might need to pivot or reshape the data for a particular TTEST analysis.
Dealing with data found in a NOSQL data store you might end up more often encountering the horizontal form (because underlayment handles sparseness better).
推荐阅读
- jquery - Jquery在几个元素上的相同功能
- angular - 如何将 NGXS 与路由解析器一起使用?
- django - Django Web 应用在 Azure App Service Linux 上成功部署,但只能看到 Azure 默认页面
- mysql - 尝试在 Kubernetes 中创建两个具有相同卷的 MySQL pod 以实现高可用性
- android - 为什么单个片段在一个活动中占据整个屏幕,而其他视图使用约束布局?
- swift - UITableView 在模拟器中运行良好,但在我的物理设备中运行良好
- php - 如何在 php 中使用 curl 在标头中添加 google 身份验证
- mysql - 登录节点js api后无法返回响应
- python - 如何防止网络爬虫的 301 重定向
- c++ - Nemiver 找不到文件 /build/glibc-LK5gWL/glibc-2.23/stdlib/random.c