首页 > 解决方案 > 生成包含问题 ID(作为列)和答案(作为它们的值)的考试评估报告的代码

问题描述

我需要编写代码来生成考试评估报告(给定多个学生的答案及其尝试),其中包括问题 ID(作为列)和答案(作为它们的值)。我需要注意的是,学生(参与者)可以在单个评估和地理标签中回答一个或多个问题。

     /** Input data */
    val inputDf = Seq(
    (1, "Question1Text", "Yes", "abcde1", 0, List("x1", "y1")),
    (2, "Question2Text", "No", "abcde1", 0, List("x1", "y1")),
    (3, "Question3Text", "3", "abcde1", 0, List("x1", "y1")),
    (1, "Question1Text", "No", "abcde2", 0, List("x2", "y2")),
    (2, "Question2Text", "Yes", "abcde2", 0, List("x2", "y2"))
    ).toDF("Qid", "Question", "AnswerText", "ParticipantID", "Assessment", "GeoTag")
    println("Input:")
    inputDf.show(false)

我的解决方案是:

     inputDf
    .groupBy($"ParticipantID")
    .pivot("Question")
    .agg(first($"ParticipantID"))
    .sort($"ParticipantID")

但这就是我得到的:

     Input:
     +---+-------------+----------+-------------+----------+--------+
     |Qid|Question     |AnswerText|ParticipantID|Assessment|GeoTag  |
     +---+-------------+----------+-------------+----------+--------+
     |1  |Question1Text|Yes       |abcde1       |0         |[x1, y1]|
     |2  |Question2Text|No        |abcde1       |0         |[x1, y1]|
     |3  |Question3Text|3         |abcde1       |0         |[x1, y1]|
     |1  |Question1Text|No        |abcde2       |0         |[x2, y2]|
     |2  |Question2Text|Yes       |abcde2       |0         |[x2, y2]|
     +---+-------------+----------+-------------+----------+--------+

     Expected:
     +-------------+----------+--------+-----+-----+-----+
     |ParticipantID|Assessment|GeoTag  |Qid_1|Qid_2|Qid_3|
     +-------------+----------+--------+-----+-----+-----+
     |abcde1       |0         |[x1, y1]|Yes  |No   |3    |
     |abcde2       |0         |[x2, y2]|No   |Yes  |null |
     +-------------+----------+--------+-----+-----+-----+

     Actual:
     +-------------+-------------+-------------+-------------+
     |ParticipantID|Question1Text|Question2Text|Question3Text|
     +-------------+-------------+-------------+-------------+
     |abcde1       |abcde1       |abcde1       |abcde1       |
     |abcde2       |abcde2       |abcde2       |null         |
     +-------------+-------------+-------------+-------------+

标签: scalaapache-spark

解决方案


你应该使用

inputDf
  .groupBy($"ParticipantID",  $"Assessment", $"GeoTag")
  .pivot("Question")
  .agg(first($"AnswerText"))
  .sort($"ParticipantID")

  .show(false)

您可以稍后根据需要重命名该列。

输出:

 +-------------+----------+--------+-------------+-------------+-------------+
|ParticipantID|Assessment|GeoTag  |Question1Text|Question2Text|Question3Text|
+-------------+----------+--------+-------------+-------------+-------------+
|abcde1       |0         |[x1, y1]|Yes          |No           |3            |
|abcde2       |0         |[x2, y2]|No           |Yes          |null         |
+-------------+----------+--------+-------------+-------------+-------------+

推荐阅读