Hive (2): Common Operations and Common Functions

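I recently read through the Hive documentation. This post records the parts I expect to be useful later; it is not a complete reference, so please bear with me.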

I. Common Hive Operations

1. Table information

    analyze table trandw.dwd_log_app_open_detail_di partition(dt='20220220') compute statistics; -- compute basic statistics (table/partition level)
    analyze table trandw.dwd_log_app_open_detail_di partition(dt='20220220') compute statistics for columns; -- compute column-level statistics
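To see what the two ANALYZE statements actually record, the statistics can be read back with DESC FORMATTED. The following is a minimal sketch, assuming Hive 0.14 or later (for INSERT ... VALUES and column-level DESCRIBE); the table `demo_stats` and its columns are hypothetical and exist only for illustration.

```sql
-- Hypothetical table, for illustration only.
create table if not exists demo_stats (id int, name string) partitioned by (dt string);
insert into table demo_stats partition (dt='20220220') values (1, 'a'), (2, 'b');

-- Recompute basic statistics, then column-level statistics, for one partition.
analyze table demo_stats partition (dt='20220220') compute statistics;
analyze table demo_stats partition (dt='20220220') compute statistics for columns;

-- Basic statistics (numFiles, numRows, totalSize, rawDataSize) are listed
-- among the partition parameters.
desc formatted demo_stats partition (dt='20220220');

-- Column statistics for the id column of that partition.
desc formatted demo_stats id partition (dt='20220220');
```

If automatic statistics gathering is enabled on the cluster, the basic statistics may already be filled in before ANALYZE runs; the column statistics normally are not.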

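2. DESC/DESCRIBE

    desc schema ods; -- show database information
    desc database ods; -- show database information
    desc extended tablename partition(dt='20220220'); -- show table size and file count
    desc formatted tablename partition(dt='20220220'); -- more complete: input/output format, location, file count, row count, file size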

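3. ALTER TABLE

    alter table table_name rename to new_table_name; -- rename the table; reportedly this moves the files, but I have not tested it because my environment does not support renaming tables
    alter table table_name concatenate; -- merge small files into larger ones (RCFile/ORC tables)
    alter table srcpart archive partition(ds='2008-04-08', hr='12'); -- archive the partition to reduce the number of small files on HDFS
    alter table t2 exchange partition (d1 = 1, d2 = 2) with table t1; -- move partition from t1 to t2
    alter table table_name set serdeproperties ('field.delim' = ','); -- set a SerDe property (here the field delimiter)
    alter table table_name unset serdeproperties ('field.delim'); -- remove that SerDe property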
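The direction of EXCHANGE PARTITION is easy to get backwards, so here is a small sketch modeled on the example in the Hive documentation. The tables `t1` and `t2` are throwaway illustrations; both must have the same schema, and the destination must not already contain the partition.

```sql
-- Two tables with identical schemas and partition columns (illustrative only).
create table t1 (a string, b string) partitioned by (ds string);
create table t2 (a string, b string) partitioned by (ds string);

-- The partition initially lives in t1.
alter table t1 add partition (ds='1');

-- The table named after ALTER TABLE is the destination:
-- this moves partition ds='1' from t1 into t2.
alter table t2 exchange partition (ds='1') with table t1;

show partitions t1; -- ds=1 is no longer listed
show partitions t2; -- ds=1 is now listed
```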

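4. SHOW

    show tblproperties tblname; -- show all table properties
    show tblproperties tblname("foo"); -- show one specific table property
    show formatted (index|indexes) on table_with_index [(from|in) db_name]; -- show indexes (indexing was removed in Hive 3.0)
    show compactions; -- show compaction tasks
    show functions like '%%'; -- list all functions
    show columns from foo from db like 'col*'; -- show the columns of db.foo whose names match col*
    show partitions tablename; -- list all partitions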

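5. INSERT

    insert overwrite directory '/user/facebook/tmp/pv_age_sum' stored as parquet select * from source_table; -- write query results to a directory (source_table is a placeholder)
    insert overwrite table tablename partition (dt='20220101') select * from source_table; -- overwrite the partition with the query result
    insert into table tablename partition (dt='20220101') select * from source_table; -- append the query result to the partition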

6. LOAD, EXPORT, IMPORT

    load data [local] inpath 'filepath' [overwrite] into table tablename [partition (dt='20220101')];
    export table department partition (emp_country="in", emp_state="ka") to 'filepath';
    import table employee partition (emp_country="us", emp_state="tn") from 'filepath';
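EXPORT and IMPORT are usually used as a pair: EXPORT writes the data plus the table metadata to a directory, and IMPORT recreates a table from it. A minimal round-trip sketch follows; the HDFS path and the name `department_copy` are assumptions made up for illustration.

```sql
-- Export the whole table (data and metadata) to a hypothetical HDFS directory.
export table department to '/tmp/hive_export/department';

-- Import it under a new name; the table is created if it does not exist.
import table department_copy from '/tmp/hive_export/department';

-- Check that the schema and partitions came across.
desc formatted department_copy;
show partitions department_copy;
```

The export directory can also be copied to another cluster (for example with distcp) before running IMPORT there.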

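7. FUNCTION

    create [temporary] function geohash as 'bigdata.aaa' using jar 'hdfs:///var/jars/*.jar'; -- create a function
    reload function; -- reload functions so that ones registered in other sessions become visible (takes no function name)
    drop function if exists geohash; -- drop the function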

II. Common Hive Functions

1. XML parsing

    select xpath('<a><b id="1"><c/></b><b id="2"><c/></b></a>','a/b/@id'); -- get the id attributes of the b elements
    select xpath('<a><b>b1</b><b>b2</b></a>','a/*/text()') from src limit 1; -- get the text content of the elements
    select xpath('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c><c>c2</c></a>', 'a/*[@class="bb"]/text()'); -- get the text of elements whose class is "bb"
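Because xpath() returns an array of strings, a single value usually has to be picked out of the result. A short sketch of two ways to do that, assuming a Hive version that allows SELECT without a FROM clause (0.13 or later); xpath_string is a sibling built-in that is not used elsewhere in these notes.

```sql
-- Index into the array returned by xpath(): first matching text node.
select xpath('<a><b>b1</b><b>b2</b></a>', 'a/b/text()')[0];  -- b1

-- xpath_string() returns the text of the first matching node directly.
select xpath_string('<a><b>bb</b><c>cc</c></a>', 'a/b');     -- bb
```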

2. JSON parsing

    select get_json_object(src_json.json, '$.owner') from src_json;
    select get_json_object(src_json.json, '$.store.fruit[0]') from src_json;
    select get_json_object(src_json.json, '$.non_exist_key') from src_json;
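The queries above depend on a src_json table. To try the path syntax without one, the JSON can be passed as a string literal. This is a sketch with a made-up document, assuming Hive 0.13 or later so that SELECT works without FROM.

```sql
-- Top-level key.
select get_json_object(
  '{"owner":"amy","store":{"fruit":[{"type":"apple"},{"type":"pear"}]}}',
  '$.owner');                    -- amy

-- Nested key inside an array element (arrays are zero-indexed).
select get_json_object(
  '{"owner":"amy","store":{"fruit":[{"type":"apple"},{"type":"pear"}]}}',
  '$.store.fruit[1].type');      -- pear

-- A key that does not exist yields NULL rather than an error.
select get_json_object(
  '{"owner":"amy","store":{"fruit":[{"type":"apple"},{"type":"pear"}]}}',
  '$.non_exist_key');            -- NULL
```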

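3. TABLESAMPLE sampling

    select * from source tablesample(bucket 3 out of 32 on rand()) s; -- assign rows to 32 buckets by rand() and take the 3rd bucket (a random sample of about 1/32 of the rows)
    select * from source tablesample(10 rows); -- take 10 rows from each input split
    select * from source tablesample(bucket 3 out of 16 on id) s; -- picks the 3rd and 19th clusters, because each bucket consists of (32/16)=2 clusters when 'source' was created with 'clustered by id into 32 buckets'

    select * from customers order by create_date limit 2,5; -- skip the first 2 rows and return the next 5 (offset, row count)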
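The relationship between the table's bucket count and the sampling denominator is easier to see next to the table definition. A sketch following the example in the Hive sampling documentation; the column list of `source` is an assumption, only the clustering clause comes from the notes above.

```sql
-- 'source' is clustered on id into 32 buckets (columns are illustrative).
create table source (id int, val string)
clustered by (id) into 32 buckets;

-- Denominator 16 < 32: each sample bucket covers 32/16 = 2 clusters,
-- so this reads the 3rd and 19th clusters without a full scan.
select * from source tablesample(bucket 3 out of 16 on id) s;

-- Denominator 64 > 32: each sample bucket is half a cluster,
-- so this reads half of the 3rd cluster.
select * from source tablesample(bucket 3 out of 64 on id) s;
```

Sampling on the clustering column is what lets Hive prune clusters; sampling on rand() or on a different column scans the whole table and filters rows.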

Commonly used functions in the official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
