Big Data: Alternative to INTERSECT operations in Hive
In my last post, we have seen the alternative of MINUS/EXCEPT operations in Apache Hive.
One more such operation which Hive does not support is INTERSECT, and in this post, we will see the alternate for the same.
Before we proceed, let’s understand what is Intersect operation in SQL? So, as the name itself explains; Intersect operation returns the intersection of two datasets, In simple words, the INTERSECT operation will return only those rows which will be common to both of the SELECT statements.
There are 2 alternatives for Intersect operation in Hive:
- Using EXISTS
- INNER JOIN
EXISTS:
Let’s
#To get the ID’s available in both T2 and T1 using EXISTS
SELECT T2.ID
FROM T2
WHERE EXISTS
(SELECT 1 FROM T1
WHERE T1.ID=T2.ID)
As we can see in the image, the output of the query ran using INTERSECT, and the output of the query ran using EXISTS is the same.
JOIN:
Let’s try to achieve the same result using INNER JOIN
#To get the ID’s available in both T2 and T1 using INNER JOIN
SELECT T2.T_id
FROM T2 INNER JOIN T1
ON T2.T_id = T1.T_id;
As we can see in the image, the output of all the queries ran using INNER JOIN is the same as that of INTERSECT query output.
Thank You for reading, hope you like it..!! Please like and share.
Would love to connect on LinkedIn: Nandkishor Manjrekar