Big Data: Alternative to Minus/Except operations in Hive

Nandkishor Manjrekar
2 min readJun 10, 2021

Apache hive does not support some SET operation like Minus/Except, Intersect. So what if the user wants to find the difference between datasets in Hive?

First, let’s understand what is Hive? Hive is a data warehouse tool to process structured data on top of Hadoop aka Big Data.

Working with Hive is as same as working with traditional databases, and hive provides parallelism which traditional database doesn’t (Just in case “Why Hive then?” thought )

There are 2 alternatives for Minus/Except operation in Hive:

  1. Using NOT EXISTS
  2. JOINS (As per requirement user can tweak the Joins i.e. Left/Right/Full Outer)

NOT EXISTS:

Consider 2 tables T1 and T2 having ID column in it, Using Minus/Except we can find the difference between these 2 tables i.e. ID’s available in T2 but not available in T1 and vice versa.

Same thing we can achieve in Hive using NOT EXISTS:

#To get the ID’s available in T2 but not in T1 using NOT EXISTS

SELECT T2.ID
FROM T2
WHERE NOT EXISTS
(SELECT 1 FROM T1
WHERE T1.ID=T2.ID)

Demonstration of NOT EXISTS as alternative of MINUS/EXCEPT operator

As we can see in the image, output of the query ran using EXCEPT and output of query ran using NOT EXISTS is same.

JOIN:

Let’s try to achieve the same result using LEFT JOIN

#To get the ID’s available in T2 but not in T1 using LEFT JOIN

SELECT T2.T_id
FROM T2 LEFT JOIN T1
ON T2.T_id = T1.T_id
WHERE T1.T_id IS NULL;

Demonstration of LEFT JOIN as alternative of MINUS/EXCEPT operator

As mentioned earlier, user can tweak the join as per requirement i.e. We can achieve same output using RIGHT JOIN and FULL OUTER JOIN as well.

Demonstration of RIGHT and FULL OUTER JOIN as alternative of MINUS/EXCEPT operator

As we can see in the image, output of all the queries ran using NOT EXISTS, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN is same as that of EXCEPT query output.

Thank You for reading, hope you like it..!! Please like and share.

Would love to connect on LinkedIn: Nandkishor Manjrekar

--

--

Nandkishor Manjrekar

Hello There, I am an IT Professional and BigData enthusiast, love to play with data, currently working as Technical Lead in MNC.