首页 > 解决方案 > SAS中的列而不是行中的日期 - 优点?

问题描述

我的客户中很少有人在列中使用 SAS 存储日期。
例如:

| Id | Variable1_201101 | Variable1_201102 | ... | Variable1_201909 | Variable2_201101 | Variable2_201102 | ... |  

等等

而不是在行中存储日期:

| Id | Date | Variable1 | Variable2 |

结果,它们有大量的单元格,因为即使某些 ID 在特定日期不存在,第一个结构中也会有空单元格,而在第二个结构中,该行将被省略。

我从未在 SQL 中遇到过这样的存储结构,这不是完美的解决方案。SAS中这种结构有什么优势吗?

标签: sas

解决方案


There is never a perfect storage structure. There are superior structures for solutions to problems at hand. Sometimes you have to reshape data for a particular solution, sometimes a procedure has grammar or mechanisms for reshaping within the procedure itself.

For example, examining a variable in different time frames in The TTEST Procedure might use a PAIRED statement and require different variables for the values. Thus the comparing Jan-2011 values to Jan-2012 values would make sense to have structure with Variable1_201101 Variable1_201201.

Disk space for sparse wide data can be reduced effectively using COMPRESS= options, at the cost of decompression CPU cycles. Depending on the data it can be significantly less disk use, but then is hard to deal with in alternate categorical analysis.

Traditional RDBMS has the categorical form (vertical) as a very common best practice, with indexing and foreign keys. If this is the original layout, you might need to pivot or reshape the data for a particular TTEST analysis.

Dealing with data found in a NOSQL data store you might end up more often encountering the horizontal form (because underlayment handles sparseness better).


推荐阅读