In my last post, we have seen the alternative of MINUS/EXCEPT operations in Apache Hive.
One more such operation which Hive does not support is INTERSECT, and in this post, we will see the alternate for the same.
Before we proceed, let’s understand what is Intersect operation in SQL? So, as the name itself explains; Intersect operation returns the intersection of two datasets, In simple words, the INTERSECT operation will return only those rows which will be common to both of the SELECT statements.
There are 2 alternatives for Intersect operation in Hive:
- Using EXISTS
- INNER JOIN
#To get the ID’s available in both T2 and T1 using EXISTS
(SELECT 1 FROM T1
As we can see in the image, the output of the query ran using INTERSECT, and the output of the query ran using EXISTS is the same.
Let’s try to achieve the same result using INNER JOIN
#To get the ID’s available in both T2 and T1 using INNER JOIN
FROM T2 INNER JOIN T1
ON T2.T_id = T1.T_id;
As we can see in the image, the output of all the queries ran using INNER JOIN is the same as that of INTERSECT query output.
Thank You for reading, hope you like it..!! Please like and share.
Would love to connect on LinkedIn: Nandkishor Manjrekar