A basic Java-based User Defined Function (UDF) for Hive. This UDF converts various timestamps(SQL,ISO,.NET formats) into the Hive Timestamp format.
The various formats that it converts are:
The file timestampconv.java uses regex to convert various different timestamp formats, and extracts the relavant information from it. This information is then parsed into the Hive timestamp format.
The file timestampconvTest.java contains the JUNIT test cases which, validate the various formats that this example handles.
Clone or download this project.
From a command-line in the project directory, use the following to build, test, and package the project.
mvn clean package
Once the process completes, a file named hiveudf-1.0-SNAPSHOT.jar can be found in the target directory.
Copy the hiveudf-1.0-SNAPSHOT.jar file to your HDInsight cluster. There are a variety of methods that can be used to do this. See Upload data for Hadoop jobs for information on uploading the file directly to the storage account used by HDInsight.
You can also simply use SCP to upload the data to the head node of a Linux-based cluster, then connect with SSH and use the
hdfs dfs -put command to copy the data to storage for the cluster.
Ideally, you want the file in the default storage for the cluster; this way, it is accessible from all nodes in the cluster and is persisted when you delete the cluster.
The following example assumes the following:
The hiveudf-1.0-SNAPSHOT.jar has been stored into the /example/jars folder in the default storage account for the cluster.
The hiveudf1 table contains a column named a1 that contains a timestamp value that needs to be converted to Hive timestamp format.
add jar wasb:///example/jars/hiveudf-1.0-SNAPSHOT.jar; CREATE TEMPORARY FUNCTION timeconv AS 'com.microsoft.example.timestampconv'; select cast (timeconv(a1,"yyyy-mm-ddthh:mm:ss[.mmm]") as timestamp) from hiveudf1;