Recently I came across a situation at a client where the developer had used a GROUP BY clause in a query that involved no Aggregation Functions. The columns in the GROUP BY clause matched all of the columns in the SELECT clause. There were 7 columns in total. When I asked the Developer why he’d used GROUP BY instead of DISTINCT he replied “because [GROUP BY] was a more efficient method of eliminating [duplicate rows].” This didn’t sit right with me: why would Microsoft spend $$$ on a clause (DISTINCT) that was no better – or perhaps even worse – than using GROUP BY?
Consider the two queries:
Now look at the Actual Execution Plans:
Notice the difference? That’s right. There is none. They are the same. The SQL optimizer makes no distinction (lol). Of course, as you can see, the GROUP BY clause requires considerably more keystrokes (imagine a query returning over 100 columns).
I’ve tried this at work with a number of different queries involving multiple joins, CTEs and large tables (1m+ rows) and I get the same result: identical execution plans. So, in a nutshell: unless you want to exercise your typing skills, you should reserve your use of GROUP BY to queries using Aggregation Functions and stick with DISTINCT for eliminating duplicate rows.