Why is the header automatically skipped in the output file?

Problem description

I want to store my data without losing the header row.

Here is my Pig script:

CRE_GM05 = LOAD '$input1' USING PigStorage(';') AS (MGM_COMPTEUR:chararray,CIA_CD_CRV_CIA:chararray,CIA_DA_EM_CRV:chararray,CIA_CD_CTRL_BLCE:chararray,CIA_IDC_EXTR_RDJ:chararray,CIA_VLR_IDT_CRV_LOQ:chararray,CIA_VLR_REF_CRV:chararray,CIA_NO_SEQ_CRV:chararray,CIA_VLR_LG_ZON_RTG:chararray,CIA_HEU_CIA:chararray,CIA_TM_STP_CRE:chararray,CIA_CD_SI:chararray,CIA_VLR_1:chararray,CIA_DA_ARR_FIC:chararray,CIA_TY_ENR:chararray,CIA_CD_BTE:chararray,CIA_CD_PER:chararray,CIA_CD_EFS:chararray,CIA_CD_ETA_VAL_CRV:chararray,CIA_CD_EVE_CPR:int,CIA_CD_APLI_TDU:chararray,CIA_CD_STE_RTG:chararray,CIA_DA_TT_RTG:chararray,CIA_NO_ENR_RTG:chararray,CIA_DA_VAL_EVE:chararray,T32_001:chararray,TEC_013:chararray,TEC_014:chararray,DAT_001_X:chararray,DAT_002_X:chararray,TEC_001:chararray);
CRE_GM11 = LOAD '$input2' USING PigStorage(';') AS (MGM_COMPTEUR:chararray,CIA_CD_CRV_CIA:chararray,CIA_DA_EM_CRV:chararray,CIA_CD_CTRL_BLCE:chararray,CIA_IDC_EXTR_RDJ:chararray,CIA_VLR_IDT_CRV_LOQ:chararray,CIA_VLR_REF_CRV:chararray,CIA_NO_SEQ_CRV:chararray,CIA_VLR_LG_ZON_RTG:chararray,CIA_HEU_CIA:chararray,CIA_TM_STP_CRE:chararray,CIA_CD_SI:chararray,CIA_VLR_1:chararray,CIA_DA_ARR_FIC:chararray,CIA_TY_ENR:chararray,CIA_CD_BTE:chararray,CIA_CD_PER:chararray,CIA_CD_EFS:chararray,CIA_CD_ETA_VAL_CRV:chararray,CIA_CD_EVE_CPR:int,CIA_CD_APLI_TDU:chararray,CIA_CD_STE_RTG:chararray,CIA_DA_TT_RTG:chararray,CIA_NO_ENR_RTG:chararray,CIA_DA_VAL_EVE:chararray,DAT_001_X:chararray,DAT_002_X:chararray,D08_001:chararray,PSE_001:chararray,PSE_002:chararray,PSE_003:chararray,RUB_001:chararray,RUB_002:chararray,RUB_003:chararray,RUB_004:chararray,RUB_005:chararray,RUB_006:chararray,RUB_007:chararray,RUB_008:chararray,RUB_009:chararray,RUB_010:chararray,TEC_001:chararray,TEC_002:chararray,TEC_003:chararray,TX_001_VLR:chararray,TX_001_DCM:chararray,D08_004:chararray,D11_004:chararray,RUB_016:chararray,T03_001:chararray);


-- Join the two relations on TEC_001

JOINED_TABLES = JOIN CRE_GM05 BY TEC_001, CRE_GM11 BY TEC_001;

-- Generate the output columns

DATA_GM05 = FOREACH JOINED_TABLES GENERATE 
        CRE_GM05::MGM_COMPTEUR  AS MGM_COMPTEUR,
        CRE_GM05::CIA_CD_CRV_CIA  AS CIA_CD_CRV_CIA,
        CRE_GM05::CIA_DA_EM_CRV   AS CIA_DA_EM_CRV,
        CRE_GM05::CIA_CD_CTRL_BLCE AS CIA_CD_CTRL_BLCE,
        CRE_GM05::CIA_IDC_EXTR_RDJ  AS CIA_IDC_EXTR_RDJ,
        CRE_GM05::CIA_VLR_IDT_CRV_LOQ AS CIA_VLR_IDT_CRV_LOQ,
        CRE_GM05::CIA_VLR_REF_CRV  AS CIA_VLR_REF_CRV,
        CRE_GM05::CIA_VLR_LG_ZON_RTG  AS CIA_VLR_LG_ZON_RTG,
        CRE_GM05::CIA_HEU_CIA AS CIA_HEU_CIA,
        CRE_GM05::CIA_TM_STP_CRE AS CIA_TM_STP_CRE,
        CRE_GM05::CIA_VLR_1 AS CIA_VLR_1,
        CRE_GM05::CIA_DA_ARR_FIC AS CIA_DA_ARR_FIC,
        CRE_GM05::CIA_TY_ENR AS CIA_TY_ENR,
        CRE_GM05::CIA_CD_BTE AS CIA_CD_BTE,
        CRE_GM05::CIA_CD_PER AS CIA_CD_PER,
        CRE_GM05::CIA_CD_EFS AS CIA_CD_EFS,
        CRE_GM05::CIA_CD_ETA_VAL_CRV AS CIA_CD_ETA_VAL_CRV,
        CRE_GM05::CIA_CD_EVE_CPR AS CIA_CD_EVE_CPR,
        CRE_GM05::CIA_CD_APLI_TDU AS CIA_CD_APLI_TDU,
        CRE_GM05::CIA_CD_STE_RTG AS CIA_CD_STE_RTG,
        CRE_GM05::CIA_DA_TT_RTG AS CIA_DA_TT_RTG,
        CRE_GM05::CIA_NO_ENR_RTG AS CIA_NO_ENR_RTG,
        CRE_GM05::CIA_DA_VAL_EVE AS CIA_DA_VAL_EVE,
        CRE_GM05::T32_001 AS T32_001,
        CRE_GM05::TEC_013 AS TEC_013,
        CRE_GM05::TEC_014 AS TEC_014,
        CRE_GM05::DAT_001_X AS DAT_001_X,
        CRE_GM05::DAT_002_X AS DAT_002_X,
        CRE_GM05::TEC_001 AS TEC_001;

STORE DATA_GM05 INTO '$OUTPUT_FILE' USING PigStorage(';');

It returns the data, but the header (the first line) is missing!

Note that my $input1 and $input2 variables are CSV files.

I tried using CSVLoader, but that did not work either.

I need the output to be stored together with the header.

Tags: hadoop, apache-pig

Solution


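By default, Pig does not write a header to its final output. Adding a header there also makes little sense, because the order of rows in Pig output is not fixed.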

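If you want a header in the final output, merge the data from all part files into a single file on the local file system, where you can add the header information explicitly, or store the output of this Pig script in a Hive table. HCatalog storage can be used for that.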
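The first option (merge the part files locally and add the header yourself) could look roughly like the sketch below. It assumes the Pig output directory has already been copied to the local file system (for example with hadoop fs -get) and that its data files are named part-*. The function name merge_parts_with_header and its parameters are hypothetical, chosen only for illustration.

```python
from pathlib import Path
import shutil


def merge_parts_with_header(part_dir, out_file, columns, sep=";"):
    """Concatenate local copies of Pig part files into out_file, header first.

    Assumption: part_dir is a local copy of the Pig output directory and the
    data files in it are named part-* (other files such as _SUCCESS are skipped).
    Returns the number of part files that were merged.
    """
    # Collect the part files before creating the output, sorted by name.
    parts = sorted(p for p in Path(part_dir).iterdir()
                   if p.is_file() and p.name.startswith("part-"))

    with open(out_file, "wb") as out:
        # Write the header line explicitly, using the same delimiter as STORE.
        out.write((sep.join(columns) + "\n").encode("utf-8"))
        # Append each part file unchanged.
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

    return len(parts)
```

For the script in the question, columns would be the field names listed in the FOREACH ... GENERATE statement (MGM_COMPTEUR, CIA_CD_CRV_CIA, and so on) and sep would stay ';' to match the STORE statement. The row order in the merged file simply follows the part file names.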

