Pivot in PostgreSQL with TRUE/FALSE markers

Problem description

I would like to know how to turn several array values into column names holding TRUE/FALSE values. Here is a concrete example:

What I have are repeated rows that differ only in the last column:

DATE        ID    Species   Illness         Tag
20180101    001   Dog       Asthma          Mucus
20180101    001   Dog       Asthma          Noisy
20180101    001   Dog       Asthma          Respiratory
20180102    002   Cat       Osteoarthritis  Locomotor
20180102    002   Cat       Osteoarthritis  Limp
...
20180131    003   Bird      Avian Pox       Itchy

What I would like to get is:

DATE        ID    Species   Illness      Mucus  Noisy ... Limp  Itchy 
20180101    001   Dog       Asthma       TRUE   TRUE  ... FALSE FALSE
20180102    002   Cat       Osteoarth.   FALSE  FALSE ... TRUE  FALSE
...
20180131    003   Bird      Avian Pox    FALSE  FALSE ... FALSE TRUE

I tried the crosstab function on just a subset of the tags, but it fails with an error saying the function does not exist:

 select * 
 from crosstab (
   'select c.id, tg."name"  
    FROM taggings t 
    join consultations c
      on c.id=t.taggable_id
    join tags tg 
      on t.tag_id=tg.id
    group by c.id, tg."name"'
 ) as final_result(dermatological BOOLEAN, behaviour BOOLEAN)

By the way, I have around 350 tags, so it is not the most convenient function for this :/

EDIT: I eventually added the tablefunc extension and tried crosstab() again, but now I get the following error:

Query execution failed. Reason: SQL Error [22023]: ERROR: invalid source data SQL statement. Detail: The provided SQL must return 3 columns: rowid, category, and values.

I will try to find a solution and update it here, but in the meantime, if anyone knows how to solve it, please share :) Thanks!
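For reference, the error above points at the shape of the source query: crosstab() expects three columns (row id, category, value), while the query in the attempt returns only two. A minimal sketch of a source query with the expected shape follows. The inline VALUES list is made-up sample data standing in for the taggings/consultations/tags join, and the integer id type is an assumption.

```sql
-- crosstab() lives in the tablefunc extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
  -- Source query: row id, category, value (three columns), ordered by row id.
  -- The VALUES list is hypothetical data replacing the real three-table join.
  $$ SELECT id, tag, true
     FROM (VALUES (1, 'dermatological'),
                  (1, 'behaviour'),
                  (2, 'behaviour')) AS v(id, tag)
     ORDER BY 1, 2 $$,
  -- Category query: the tags to pivot, in the same order as the output columns.
  $$ VALUES ('dermatological'), ('behaviour') $$
) AS final_result(id int, dermatological boolean, behaviour boolean);
```

Tags missing for an id come back as NULL rather than FALSE in this form.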


After a few days of reading and trying the suggested solutions, this is what worked for me:

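What I did was build 3 separate tables and then join the first and the third to get the information I needed, with each tag as a column whose value is 1/0 depending on whether the tag exists for a given ID. One more edit => I did not actually need the date, so I based the tables on the consultation ID.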

Table 1: a table with all the columns you need, grouped by ID, together with every tag that an ID has.

ID    Species   Age      Illness         Tag
001   Dog        2       Asthma          Mucus
001   Dog        2       Asthma          Noisy
001   Dog        2       Asthma          Respiratory
002   Cat        5       Osteoarthritis  Locomotor
002   Cat        5       Osteoarthritis  Limp
...
003   Bird       1       Avian Pox       Itchy

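Table 2: the Cartesian product that crosses every consultation with the list of all distinct tags, ordered for use by the crosstab() function. (The crosstab function needs 3 columns: ID, tag, and value.)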

with consultations_tags as
    (/* put the query of TABLE 1 here */),
tag_list as
    (select tags."name"
    from tags
    join taggings t on t.tag_id = tags.id
    join consultations c on c.id = t.taggable_id
    group by 1), -- gets the list of all possible tags in the DB
cartesian_consultations_tags as
    (select consultations_tags.id, tag_list.name,
     case when tag_list.name = consultations_tags.tag_name then 1
     else 0  -- "case" yields 1/0 depending on whether the tag is present for an ID
     end as tag_exists
    from
    consultations_tags
    cross join
    tag_list)
select cartesian_consultations_tags.id, cartesian_consultations_tags.name,
SUM(cartesian_consultations_tags.tag_exists) -- for me, the values were duplicated, and so were tags
from cartesian_consultations_tags
group by 1, 2
order by 1, 2

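--> The order of the tags really matters here, because you are the one naming the columns in the crosstab function. It does not turn the tag names into columns; it only transfers the value found at each tag's position, so if you get the naming order wrong, the values will not line up with the right columns.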

Table 3: the crosstab of the second table -> it pivots the Cartesian product table, in this case Table 2.

SELECT * 
FROM crosstab(' COPY THE TABLE 2 ') -- if you have conditions such as where species = 'Dogs', you need to double the apostrophes inside the string: where species = ''Dogs''
AS ct(id int4, "Itchy" int8,
"Limp" int8,
"Locomotor" int8,
"Mucus" int8,
"Noisy" int8) -- your tag list. You can prepare it in Excel, so that every tag is in double quotes and has its corresponding datatype. The datatype of the tags has to be the same as the datatype of the "value" column in Table 2
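As a side note on the doubled apostrophes: PostgreSQL's dollar quoting lets the inner query keep its single quotes unchanged. A minimal sketch, assuming the tablefunc extension can be installed; the inline VALUES list is made-up sample data replacing the Table 2 query, and the tag $q$ is an arbitrary choice.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
  -- Everything between $q$ ... $q$ is the query text; 'Dog' needs no doubling.
  $q$
    SELECT id, tag, 1
    FROM (VALUES ('001', 'Dog', 'Mucus'),
                 ('001', 'Dog', 'Noisy'),
                 ('002', 'Cat', 'Limp')) AS v(id, species, tag)
    WHERE species = 'Dog'
    ORDER BY 1, 2
  $q$
) AS ct(id text, "Mucus" int4, "Noisy" int4);
```

The column types in the AS clause match the source query here: text for the id and int4 for the literal 1.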

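Finally, the table I wanted is the join of Table 1 and Table 3, so I have the consultation ID information I need plus the list of tags as columns, with a value of 0/1 depending on whether the tag is present for a given consultation.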

with table1 as ( Copy the query of table1),
table3 as ( Copy the query of table3)
select *
from table1
join table3 on 
table1.id=table3.id 
order by 1

The final table looks like this:

ID    Species   Illness      Mucus  Noisy ... Limp  Itchy 
001   Dog       Asthma       1      1     ... 0     0
002   Cat       Osteoarth.   0      0     ... 1     0
...
003   Bird      Avian Pox    0      0     ... 0     1  

Tags: postgresql, pivot, crosstab

Solution


I experimented a little, and this is what I came up with.

# Reading the data into a table

SELECT * INTO crosstab_test FROM 
(VALUES (20180101,'001','Dog','Asthma','Mucus'),
(20180101,'001','Dog','Asthma','Noisy'),
(20180101,'001','Dog','Asthma','Respiratory'),
(20180102,'002','Cat','Osteoarthritis','Locomotor'),
(20180102,'002','Cat','Osteoarthritis','Limp'),
(20180131, '003', 'Bird', 'Avian Pox','Itchy')) as a (date, id, species, illness, tag);

SELECT DISTINCT date, id, species, illness, mucus, noisy, locomotor, respiratory,  limp, itchy 
FROM 
(SELECT "date", id, species, illness
FROM crosstab_test) a
INNER JOIN             
(SELECT * FROM crosstab(
'SELECT id, tag, ''TRUE'' FROM crosstab_test ORDER BY 1,2,3',
'SELECT DISTINCT tag FROM crosstab_test ORDER BY 1')
as tabelle (id text, Itchy text, Limp text, Locomotor text, Mucus text, Noisy text, Respiratory text)) b
USING(id)
ORDER BY 1;


   date   | id  | species |    illness     | mucus | noisy | locomotor | respiratory | limp | itchy
----------+-----+---------+----------------+-------+-------+-----------+-------------+------+-------
 20180101 | 001 | Dog     | Asthma         | TRUE  | TRUE  |           | TRUE        |      |
 20180102 | 002 | Cat     | Osteoarthritis |       |       | TRUE      |             | TRUE |
 20180131 | 003 | Bird    | Avian Pox      |       |       |           |             |      | TRUE
(3 rows)

If you do not care about the order of the columns, you can simply use SELECT DISTINCT * ...

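Replacing the NULLs with FALSE will probably be a bit tedious given the 350 tags you mention, so I suggest leaving them as they are. If you do want them, you can write SELECT DISTINCT date, id, species, illness, COALESCE(mucus, 'FALSE'), COALESCE(noisy, 'FALSE'),...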
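If real boolean TRUE/FALSE values without NULLs are the goal, one alternative to crosstab() (a different technique from the one used above) is conditional aggregation with bool_or. A minimal sketch follows; the CTE named sample is a hypothetical stand-in for the crosstab_test data, and only four of the tags are spelled out.

```sql
WITH sample(date, id, species, illness, tag) AS (
  VALUES (20180101, '001', 'Dog',  'Asthma',         'Mucus'),
         (20180101, '001', 'Dog',  'Asthma',         'Noisy'),
         (20180102, '002', 'Cat',  'Osteoarthritis', 'Limp'),
         (20180131, '003', 'Bird', 'Avian Pox',      'Itchy')
)
SELECT date, id, species, illness,
       -- bool_or is true if any row in the group carries the tag, false otherwise
       bool_or(tag = 'Mucus') AS mucus,
       bool_or(tag = 'Noisy') AS noisy,
       bool_or(tag = 'Limp')  AS limp,
       bool_or(tag = 'Itchy') AS itchy
FROM sample
GROUP BY date, id, species, illness
ORDER BY date;
```

Each tag still needs its own expression, so with 350 tags the list would have to be generated rather than typed.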

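The bitter pill you will have to swallow, however, is specifying all 350 tags as columns of type text in the as tabelle (id text, Itchy text, Limp text, Locomotor text, Mucus text, Noisy text, Respiratory text) part of the crosstab statement. Make sure to put them in the correct order, as determined by the 'SELECT DISTINCT tag FROM crosstab_test ORDER BY 1' query that is also part of the crosstab statement.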
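The column list does not have to be typed by hand; it can probably be generated with string_agg and format and then pasted into the statement. A minimal sketch, with an inline VALUES list as made-up sample data in place of crosstab_test:

```sql
-- Builds the "name text, name text, ..." list in the same alphabetical order
-- that 'SELECT DISTINCT tag ... ORDER BY 1' produces for the category query.
SELECT string_agg(format('%I text', tag), ', ' ORDER BY tag) AS column_defs
FROM (
  SELECT DISTINCT tag
  FROM (VALUES ('Mucus'), ('Noisy'), ('Limp'), ('Itchy')) AS v(tag)
) AS t;
-- column_defs: "Itchy" text, "Limp" text, "Mucus" text, "Noisy" text
```

Note that %I double-quotes names containing uppercase letters, which makes the resulting column names case-sensitive; wrapping the tag in lower() is one way to get plain lowercase names like those in the statement above.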

Hope this is what you were looking for.

