Custom date format in Data set:
Basically we receive data from different sources, so usually have different types of date formats.
(may something like dd-MM-yyyy or dd/MM/yyyy etc.,)
When we create a hive table on top of these data, it becomes necessary to convert them into date format which is supported by hive(yyyy-MM-dd).
Example date formats in data sets:
Note:
Hive supports yyyy-MM-dd date format. So output format of all kinds of date should be yyyy-MM-dd.
Table to convert any date format into fixed format i.e yyyy-MM-dd:
Assuming input is in string data type but contains date as value
Input column name: date_in String:
Step 6:Using either cast or to_date, we can change the type to date, on final table
Step 7: Use CTAS to make the final table
Basically we receive data from different sources, so usually have different types of date formats.
(may something like dd-MM-yyyy or dd/MM/yyyy etc.,)
When we create a hive table on top of these data, it becomes necessary to convert them into date format which is supported by hive(yyyy-MM-dd).
Example date formats in data sets:
Note:
Hive supports yyyy-MM-dd date format. So output format of all kinds of date should be yyyy-MM-dd.
Table to convert any date format into fixed format i.e yyyy-MM-dd:
Assuming input is in string data type but contains date as value
Input column name: date_in String:
Input Format
|
code
|
ddMMyyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMyyyy’)))
|
dd-MM-yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MM-yyyy’)))
|
dd/MM/yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MM/yyyy’)))
|
dd MM
yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd
MM yyyy’)))
|
dd.MM.yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MM.yyyy’)))
|
ddMMMyyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMyyyy’)))
|
dd-MMM-yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMM-yyyy’)))
|
dd/MMM/yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMM/yyyy’)))
|
dd MMM
yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd
MMM yyyy’)))
|
dd.MMM.yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMM.yyyy’)))
|
ddMMMMyyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMMyyyy’)))
|
dd-MMMM-yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMMM-yyyy’)))
|
dd/MMMM/yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMMM/yyyy’)))
|
dd MMMM
yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd
MMMM yyyy’)))
|
dd.MMMM.yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMMM.yyyy’)))
|
ddMMyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMyy’)))
|
dd-MM-yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MM-yy’)))
|
dd/MM/yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MM/yy’)))
|
dd MM
yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd
MM yy’)))
|
dd.MM.yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MM.yy’)))
|
ddMMMyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMyy’)))
|
dd-MMM-yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMM-yy’)))
|
dd/MMM/yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMM/yy’)))
|
dd MMM
yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd
MMM yy’)))
|
dd.MMM.yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMM.yy’)))
|
ddMMMMyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMMyy’)))
|
dd-MMMM-yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMMM-yy’)))
|
dd/MMMM/yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMMM/yy’)))
|
dd MMMM
yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd
MMMM yy’)))
|
dd.MMMM.yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMMM.yy’)))
|
MMddyyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMddyyyy’)))
|
MM-dd-yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM-dd-yyyy’)))
|
MM/dd/yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM/dd/yyyy’)))
|
MM dd
yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM
dd yyyy’)))
|
MM.dd.yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM.dd.yyyy’)))
|
MMMddyyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMddyyyy’)))
|
MMM-dd-yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM-dd-yyyy’)))
|
MMM/dd/yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM/dd/yyyy’)))
|
MMM dd
yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM
dd yyyy’)))
|
MMM.dd.yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM.dd.yyyy’)))
|
MMMMddyyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMMddyyyy’)))
|
MMMM-dd-yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM-dd-yyyy’)))
|
MMMM/dd/yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM/dd/yyyy’)))
|
MMMM dd
yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM
dd yyyy’)))
|
MMMM.dd.yyyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM.dd.yyyy’)))
|
MMddyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMddyy’)))
|
MM-dd-yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM-dd-yy’)))
|
MM/dd/yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM/dd/yy’)))
|
MM dd
yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM
dd yy’)))
|
MM.dd.yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM.dd.yy’)))
|
MMMddyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMddyy’)))
|
MMM-dd-yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM-dd-yy’)))
|
MMM/dd/yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM/dd/yy’)))
|
MMM dd
yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM
dd yy’)))
|
MMM.dd.yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM.dd.yy’)))
|
MMMMddyy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMMddyy’)))
|
MMMM-dd-yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM-dd-yy’)))
|
MMMM/dd/yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM/dd/yy’)))
|
MMMM dd
yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM
dd yy’)))
|
MMMM.dd.yy
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM.dd.yy’)))
|
yyyyMMdd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyyMMdd’)))
|
yyyy-MM-dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy-MM-dd’)))
|
yyyy/MM/dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy/MM/dd’)))
|
yyyy MM
dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy
MM dd ‘)))
|
yyyy.MM.dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy.MM.dd’)))
|
yyyyMMMdd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyyMMMdd’)))
|
yyyy-MMM-dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy-MMM-dd’)))
|
yyyy/MMM/dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy/MMM/dd’)))
|
yyyy
MMM dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy
MMM dd ‘)))
|
yyyy.MMM.dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy.MMM.dd’)))
|
yyyyMMMMdd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyyMMMMdd’)))
|
yyyy-MMMM-dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy-MMMM-dd’)))
|
yyyy/MMMM/dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy/MMMM/dd’)))
|
yyyy
MMMM dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy
MMMM dd ‘)))
|
yyyy.MMMM.dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy.MMMM.dd’)))
|
yyMMdd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyMMdd’)))
|
yy-MM-dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy-MM-dd’)))
|
yy/MM/dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy/MM/dd’)))
|
yy MM
dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy
MM dd ‘)))
|
yy.MM.dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy.MM.dd’)))
|
yyMMMdd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyMMMdd’)))
|
yy-MMM-dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy-MMM-dd’)))
|
yy/MMM/dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy/MMM/dd’)))
|
yy MMM
dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy
MMM dd ‘)))
|
yy.MMM.dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy.MMM.dd’)))
|
yyMMMMdd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyMMMMdd’)))
|
yy-MMMM-dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy-MMMM-dd’)))
|
yy/MMMM/dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy/MMMM/dd’)))
|
yy MMMM
dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy
MMMM dd ‘)))
|
yy.MMMM.dd
|
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy.MMMM.dd’)))
|
Steps to create CUSTOM DATE FORMAT of HDFS data for HIVE
Step 1: Let's assume, below is the dataset resides on HDFS,
Simple Dataset:
1,Mano,05-09-2017
2,Govind,01-07-2017
3,Abu,09-05-2017
4,Priyanka,01-07-2017
5,Bhusan,03-06-2017
Note: As we could know , Hive supports the date data type is in the format of yyyy-MM-dd, but the date in the dataset is in the format of dd-MM-yyyy,
So, to overcome this, use the Built in HIVE functions, that can play a role, (if needed can we create UDF's as well);
Step 2: Let's decide the way to make a table that hold the custom format date
Firstly, We can follow like,
- Creating a Temporary table or stag table that can take the CUSTOM DATE from the dateset as string
- Then using HIVE QL built in functions to format the CUSTOM Date , that resides as string in temporary table to defualt format of date data type like yyyy-MM-dd.
- At final, we can follow the CTAS to make final table, with duplication of formated date in the form of date data type.
Step 3:Creating a Temporary table or stag table that can take the CUSTOM DATE from the dateset as string.
HIVEQL:
CREATE TABLE DATE_TEMP(ID INT,NAME STRING,DATE_IN STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
Step 4: Load data into the Temporary table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/Mano/data.txt' OVERWRITE INTO TABLE DATE_TEMP;
Step 5: So using HIVE QL built in functions to format the CUSTOM Date.
These functions, are usefully,
- unix_timestamp(date[, pattern]) - Converts the time to a number
Example:
hive> select unix_timestamp(date_in,'dd-MM-yyyy') from date_temp;
OK
c0
1504569600
1498867200
1494288000
1498867200
1496448000
Time taken: 0.369 seconds, Fetched: 5 row(s)
- from_unixtime(unix_time, format) - returns unix_time in the specified format
Example:
hive> select from_unixtime(unix_timestamp(date_in,'dd-MM-yyyy'),'yyyy-MM-dd') from date_temp;
OK
c0
2017-09-05
2017-07-01
2017-05-09
2017-07-01
2017-06-03
Time taken: 0.104 seconds, Fetched: 5 row(s)
Step 6:Using either cast or to_date, we can change the type to date, on final table
Example:
hive> select cast(from_unixtime(unix_timestamp(date_in,'dd-MM-yyyy'),'yyyy-MM-dd')as date) from date_temp;
OK
c0
2017-09-05
2017-07-01
2017-05-09
2017-07-01
2017-06-03
Time taken: 0.107 seconds, Fetched: 5 row(s)
Step 7: Use CTAS to make the final table
Example:
hive> CREATE TABLE DATE_FINAL AS SELECT ID,NAME,CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_IN,'dd-MM-yyyy'),'yyy-MM-dd') AS DATE) FROM DATE_TEMP;
Good blog,Very useful article,Thanks for sharing this information.
ReplyDeleteSpark Scala Online Training
Nice post..
ReplyDeleteBig Data Hadoop Training
I have a table with a field open_date which is of string datatype
ReplyDeleteOpen_date
-----------
25-Mar-2020
2020-10-01
16-sep-2016
13-jun-2003
How to convert all the records to yyyy-MM-dd format
such a infomatic blog
ReplyDeletesap pp online training