Friday 15 September 2017

Custom date format in Data set

Custom date format in Data set:

Basically we receive data from different sources, so usually have different types of date formats.
(may something like dd-MM-yyyy or dd/MM/yyyy etc.,) 

When we create a hive table on top of these data, it becomes necessary to convert them into date format which is supported by hive(yyyy-MM-dd).

Example date formats in data sets:



Note: 
Hive supports yyyy-MM-dd date format. So output format of all kinds of date should be yyyy-MM-dd. 


Table to convert any date format into fixed format i.e yyyy-MM-dd:

Assuming input is in string data type but contains date as value

Input column name: date_in String:


Input Format
code
ddMMyyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMyyyy’)))
dd-MM-yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MM-yyyy’)))
dd/MM/yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MM/yyyy’)))
dd MM yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd MM yyyy’)))
dd.MM.yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MM.yyyy’)))
ddMMMyyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMyyyy’)))
dd-MMM-yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMM-yyyy’)))
dd/MMM/yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMM/yyyy’)))
dd MMM yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd MMM yyyy’)))
dd.MMM.yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMM.yyyy’)))
ddMMMMyyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMMyyyy’)))
dd-MMMM-yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMMM-yyyy’)))
dd/MMMM/yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMMM/yyyy’)))
dd MMMM yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd MMMM yyyy’)))
dd.MMMM.yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMMM.yyyy’)))
ddMMyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMyy’)))
dd-MM-yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MM-yy’)))
dd/MM/yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MM/yy’)))
dd MM yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd MM yy’)))
dd.MM.yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MM.yy’)))
ddMMMyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMyy’)))
dd-MMM-yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMM-yy’)))
dd/MMM/yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMM/yy’)))
dd MMM yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd MMM yy’)))
dd.MMM.yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMM.yy’)))
ddMMMMyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’ddMMMMyy’)))
dd-MMMM-yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd-MMMM-yy’)))
dd/MMMM/yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd/MMMM/yy’)))
dd MMMM yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd MMMM yy’)))
dd.MMMM.yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’dd.MMMM.yy’)))
MMddyyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMddyyyy’)))
MM-dd-yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM-dd-yyyy’)))
MM/dd/yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM/dd/yyyy’)))
MM dd yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM dd yyyy’)))
MM.dd.yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM.dd.yyyy’)))
MMMddyyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMddyyyy’)))
MMM-dd-yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM-dd-yyyy’)))
MMM/dd/yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM/dd/yyyy’)))
MMM dd yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM dd yyyy’)))
MMM.dd.yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM.dd.yyyy’)))
MMMMddyyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMMddyyyy’)))
MMMM-dd-yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM-dd-yyyy’)))
MMMM/dd/yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM/dd/yyyy’)))
MMMM dd yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM dd yyyy’)))
MMMM.dd.yyyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM.dd.yyyy’)))
MMddyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMddyy’)))
MM-dd-yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM-dd-yy’)))
MM/dd/yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM/dd/yy’)))
MM dd yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM dd yy’)))
MM.dd.yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MM.dd.yy’)))
MMMddyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMddyy’)))
MMM-dd-yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM-dd-yy’)))
MMM/dd/yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM/dd/yy’)))
MMM dd yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM dd yy’)))
MMM.dd.yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMM.dd.yy’)))
MMMMddyy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMMddyy’)))
MMMM-dd-yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM-dd-yy’)))
MMMM/dd/yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM/dd/yy’)))
MMMM dd yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM dd yy’)))
MMMM.dd.yy
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’MMMM.dd.yy’)))
yyyyMMdd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyyMMdd’)))
yyyy-MM-dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy-MM-dd’)))
yyyy/MM/dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy/MM/dd’)))
yyyy MM dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy MM dd ‘)))
yyyy.MM.dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy.MM.dd’)))
yyyyMMMdd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyyMMMdd’)))
yyyy-MMM-dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy-MMM-dd’)))
yyyy/MMM/dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy/MMM/dd’)))
yyyy MMM dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy MMM dd ‘)))
yyyy.MMM.dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy.MMM.dd’)))
yyyyMMMMdd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyyMMMMdd’)))
yyyy-MMMM-dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy-MMMM-dd’)))
yyyy/MMMM/dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy/MMMM/dd’)))
yyyy MMMM dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy MMMM dd ‘)))
yyyy.MMMM.dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyyy.MMMM.dd’)))
yyMMdd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyMMdd’)))
yy-MM-dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy-MM-dd’)))
yy/MM/dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy/MM/dd’)))
yy MM dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy MM dd ‘)))
yy.MM.dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy.MM.dd’)))
yyMMMdd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyMMMdd’)))
yy-MMM-dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy-MMM-dd’)))
yy/MMM/dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy/MMM/dd’)))
yy MMM dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy MMM dd ‘)))
yy.MMM.dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy.MMM.dd’)))
yyMMMMdd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yyMMMMdd’)))
yy-MMMM-dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy-MMMM-dd’)))
yy/MMMM/dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy/MMMM/dd’)))
yy MMMM dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy MMMM dd ‘)))
yy.MMMM.dd
to_date(from_unixtime(UNIX_TIMESTAMP(date_in,’yy.MMMM.dd’)))


Steps to create CUSTOM DATE FORMAT of HDFS data for HIVE

Step 1: Let's assume, below is the dataset resides on HDFS,

Simple Dataset:
1,Mano,05-09-2017
2,Govind,01-07-2017
3,Abu,09-05-2017
4,Priyanka,01-07-2017
5,Bhusan,03-06-2017

Note: As we could know , Hive supports the date data type is in the format of yyyy-MM-dd, but the date in the dataset is in the format of dd-MM-yyyy,

So, to overcome this, use the Built in HIVE functions, that can play a role, (if needed can we create UDF's as well);

Step 2: Let's decide the way to make a table that hold the custom format date

Firstly, We can follow like, 
  • Creating a Temporary table or stag table that can take the CUSTOM DATE from the dateset as string
  • Then using HIVE QL built in functions to format the CUSTOM Date , that resides as string in temporary table to defualt format of date data type like yyyy-MM-dd.
  • At final, we can follow the CTAS to make final table, with duplication of formated date in the form of date data type.

Step 3:Creating a Temporary table or stag table that can take the CUSTOM DATE from the dateset as string.

HIVEQL:

CREATE TABLE DATE_TEMP(ID INT,NAME STRING,DATE_IN STRING) ROW FORMAT DELIMITED FIELDS TERMINATED  BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;



Step 4: Load data into the Temporary table

hive> LOAD DATA LOCAL INPATH '/home/hadoop/Mano/data.txt' OVERWRITE INTO TABLE DATE_TEMP;


Step 5: So using HIVE QL built in functions to format the CUSTOM Date.

These functions, are usefully,
  • unix_timestamp(date[, pattern]) - Converts the time to a number
Example:
hive> select unix_timestamp(date_in,'dd-MM-yyyy') from date_temp;
OK
c0
1504569600
1498867200
1494288000
1498867200
1496448000
Time taken: 0.369 seconds, Fetched: 5 row(s)
  • from_unixtime(unix_time, format) - returns unix_time in the specified format
Example: 
hive> select from_unixtime(unix_timestamp(date_in,'dd-MM-yyyy'),'yyyy-MM-dd') from date_temp;
OK
c0
2017-09-05
2017-07-01
2017-05-09
2017-07-01
2017-06-03
Time taken: 0.104 seconds, Fetched: 5 row(s)


Step 6:Using either cast or to_date, we can change the type to date, on final table

Example:
hive> select cast(from_unixtime(unix_timestamp(date_in,'dd-MM-yyyy'),'yyyy-MM-dd')as date) from date_temp;
OK
c0
2017-09-05
2017-07-01
2017-05-09
2017-07-01
2017-06-03
Time taken: 0.107 seconds, Fetched: 5 row(s)


Step 7: Use CTAS to make the final table
Example:
hive> CREATE TABLE DATE_FINAL AS SELECT ID,NAME,CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_IN,'dd-MM-yyyy'),'yyy-MM-dd') AS DATE) FROM DATE_TEMP;




4 comments:

  1. Good blog,Very useful article,Thanks for sharing this information.
    Spark Scala Online Training

    ReplyDelete
  2. I have a table with a field open_date which is of string datatype

    Open_date
    -----------
    25-Mar-2020
    2020-10-01
    16-sep-2016
    13-jun-2003

    How to convert all the records to yyyy-MM-dd format

    ReplyDelete
  3. such a infomatic blog
    sap pp online training

    ReplyDelete

Fundamentals of Python programming

Fundamentals of Python programming: Following below are the fundamental constructs of Python programming: Python Data types Python...