SQL Server: Group By vs Distinct


Recently I came across a situation at a client where the developer had used a GROUP BY clause in a query that involved no Aggregation Functions. The columns in the GROUP BY clause matched all of the columns in the SELECT clause. There were 7 columns in total. When I asked the Developer why he’d used GROUP BY instead of DISTINCT he replied “because [GROUP BY] was a more efficient method of eliminating [duplicate rows].” This didn’t sit right with me: why would Microsoft spend $$$ on a clause (DISTINCT) that was no better – or perhaps even worse – than using GROUP BY?

Consider the two queries:

GoupByDistinct1

Now look at the Actual Execution Plans:

GoupByDistinct2

Notice the difference?  That’s right.  There is none.  They are the same.  The SQL optimizer makes no distinction (lol).  Of course, as you can see, the GROUP BY clause requires considerably more keystrokes (imagine a query returning over 100 columns).

I’ve tried this at work with a number of different queries involving multiple joins, CTEs and large tables (1m+ rows) and I get the same result: identical execution plans.  So, in a nutshell: unless you want to exercise your typing skills, you should reserve your use of GROUP BY to queries using Aggregation Functions and stick with DISTINCT for eliminating duplicate rows.

Categories:SQL Server

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: