Difference in performance when querying with SELECT DISTINCT and GROUP BY?

11

I'm reviewing and learning SQL, and I noticed something I find curious.

Suppose I have a table called productos, and one of its fields is categoria. When I run the following queries, I see that the result is the same:

SELECT DISTINCT categoria FROM productos;

and

SELECT categoria FROM productos GROUP BY categoria;

The difference I notice is that DISTINCT filters out the duplicates and keeps the order in which the values appear, while the query that uses GROUP BY returns them in alphabetical order. Based on that, can we say that the first query runs faster? If so, would the performance difference be significant when handling large volumes of data?
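A minimal sketch that reproduces the observation, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in engine, since the question does not name one. Only `productos` and `categoria` come from the question; the `id` and `nombre` columns and all rows are hypothetical. The results are compared as sorted lists because neither query guarantees a row order without ORDER BY.

```python
import sqlite3

# Hypothetical schema and data: only `productos` and `categoria` come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE productos (id INTEGER PRIMARY KEY, nombre TEXT, categoria TEXT)")
conn.executemany(
    "INSERT INTO productos (nombre, categoria) VALUES (?, ?)",
    [("manzana", "frutas"), ("leche", "lacteos"), ("pera", "frutas"),
     ("queso", "lacteos"), ("pan", "panaderia")],
)

distinct_rows = conn.execute("SELECT DISTINCT categoria FROM productos").fetchall()
group_by_rows = conn.execute("SELECT categoria FROM productos GROUP BY categoria").fetchall()

# Compare the rows as sorted lists: without ORDER BY the row order is unspecified.
print(sorted(distinct_rows) == sorted(group_by_rows))  # True
```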

    
asked by JMEspiz 20.01.2016 at 21:15

5 answers

-1

DISTINCT removes duplicate records, while GROUP BY is designed to group records.

DISTINCT is executed as follows:

  • Copy all business_key values (the column being de-duplicated; categoria in the question) to a temporary table
  • Sort the temporary table
  • Scan the temporary table, returning each element that is different from the previous one
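A minimal Python sketch of the sort-based strategy described above, working on a plain list of values instead of a real table (an assumption made for illustration). The function name `distinct_by_sorting` is hypothetical, and a real engine may use either this or a hash-based strategy for DISTINCT.

```python
def distinct_by_sorting(values):
    """Sort-based de-duplication (hypothetical helper mirroring the steps above)."""
    temp = list(values)   # 1. copy every value into a temporary list
    temp.sort()           # 2. sort the temporary list
    result = []
    previous = object()   # sentinel that compares unequal to any real value
    for value in temp:    # 3. scan, keeping each value that differs from the previous one
        if value != previous:
            result.append(value)
            previous = value
    return result


print(distinct_by_sorting(["frutas", "lacteos", "frutas", "panaderia", "lacteos"]))
# ['frutas', 'lacteos', 'panaderia']
```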

GROUP BY is executed as follows:

  • Scan the full table, storing each business_key value in a hashtable
  • Return the keys of the hashtable
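The hash-based strategy can be sketched the same way, again on a plain Python list (the function name `group_by_hashing` is hypothetical). A Python `dict` plays the role of the hashtable; unlike most engine hashtables it preserves insertion order, so the keys come back in first-seen order here, which is not a guarantee any database makes.

```python
def group_by_hashing(values):
    """Hash-based grouping (hypothetical helper mirroring the steps above)."""
    buckets = {}                 # the hashtable: one entry per distinct key
    for value in values:         # 1. single pass over the full "table"
        buckets[value] = None    #    store the key; duplicates overwrite the same entry
    return list(buckets.keys())  # 2. return the keys of the hashtable


print(group_by_hashing(["frutas", "lacteos", "frutas", "panaderia", "lacteos"]))
# ['frutas', 'lacteos', 'panaderia']
```

The dictionary holds one entry per distinct key, so its memory use grows with the number of keys rather than with the number of rows.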

The former optimizes memory usage, while the latter optimizes speed but can require a large amount of memory, depending on the number of distinct keys.

Greetings.

    
answered by 20.01.2016 / 21:35
3

Although both techniques clearly produce the same final result, they are not equally appropriate for what you want to achieve.

Given the query you describe, the right choice is DISTINCT, since it applies to the rows, whereas GROUP BY was designed to work with aggregate functions such as SUM(), MAX(), AVG(), etc.
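To illustrate the kind of query GROUP BY was designed for, here is a sketch assuming SQLite via Python's `sqlite3` module; the `precio` column and all rows are hypothetical. Each category comes back once, together with per-group figures that DISTINCT alone cannot compute.

```python
import sqlite3

# Hypothetical schema and data (only `productos` and `categoria` come from the question).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productos (id INTEGER PRIMARY KEY, nombre TEXT, categoria TEXT, precio REAL)")
conn.executemany(
    "INSERT INTO productos (nombre, categoria, precio) VALUES (?, ?, ?)",
    [("manzana", "frutas", 1.0), ("pera", "frutas", 3.0),
     ("leche", "lacteos", 2.0), ("queso", "lacteos", 6.0), ("pan", "panaderia", 1.5)],
)

# GROUP BY paired with aggregate functions: one row per category plus per-group values.
rows = conn.execute(
    """
    SELECT categoria, COUNT(*), MAX(precio), AVG(precio)
    FROM productos
    GROUP BY categoria
    ORDER BY categoria
    """
).fetchall()
for row in rows:
    print(row)
# ('frutas', 2, 3.0, 2.0)
# ('lacteos', 2, 6.0, 4.0)
# ('panaderia', 1, 1.5, 1.5)
```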

The ordering is not a problem either: adding an ORDER BY resolves the difference (without one, neither query guarantees any particular order).
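A sketch of the ORDER BY fix, again assuming SQLite via Python's `sqlite3` module with hypothetical rows: once both queries carry the same ORDER BY, they return identical, deterministic results.

```python
import sqlite3

# Hypothetical data, inserted in a non-alphabetical order on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE productos (id INTEGER PRIMARY KEY, categoria TEXT)")
conn.executemany(
    "INSERT INTO productos (categoria) VALUES (?)",
    [("lacteos",), ("frutas",), ("panaderia",), ("frutas",), ("lacteos",)],
)

# With an explicit ORDER BY, both queries must return the same rows in the same order.
distinct_sorted = conn.execute(
    "SELECT DISTINCT categoria FROM productos ORDER BY categoria").fetchall()
group_by_sorted = conn.execute(
    "SELECT categoria FROM productos GROUP BY categoria ORDER BY categoria").fetchall()

print(distinct_sorted == group_by_sorted)  # True
```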

The same topic was raised in these links (in English):

answered by 20.01.2016 at 21:36
1

GROUP BY is mostly used for operations such as COUNT, SUM, etc.

Depending on the number of records in the table (say, millions of records), the SELECT will take roughly the same time whether it uses DISTINCT or GROUP BY.

If the table has hundreds of millions of records (100, 200, 500 million), it is sometimes best to extract the data you want to group into a temporary table (INSERT ... SELECT) and then run the DISTINCT or GROUP BY on the temporary table. The query then runs considerably faster.
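A minimal sketch of the temporary-table approach, assuming SQLite via Python's `sqlite3` module. The `tmp_categorias` table, the `precio` column and the `precio >= 2` filter are hypothetical; the point is the two steps (INSERT ... SELECT into a temporary table, then DISTINCT on it). Whether this is actually faster depends on the engine and the data, and nothing here measures it.

```python
import sqlite3

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productos (id INTEGER PRIMARY KEY, nombre TEXT, categoria TEXT, precio REAL)")
conn.executemany(
    "INSERT INTO productos (nombre, categoria, precio) VALUES (?, ?, ?)",
    [("manzana", "frutas", 1.0), ("pera", "frutas", 3.0),
     ("leche", "lacteos", 2.0), ("queso", "lacteos", 6.0), ("pan", "panaderia", 1.5)],
)

# Step 1 (INSERT ... SELECT): copy only the column and rows to be grouped
# into a temporary table. The `precio >= 2` filter is a hypothetical example.
conn.execute("CREATE TEMP TABLE tmp_categorias (categoria TEXT)")
conn.execute(
    "INSERT INTO tmp_categorias (categoria) SELECT categoria FROM productos WHERE precio >= 2")

# Step 2: run DISTINCT (GROUP BY would work the same way) on the smaller temporary table.
categorias = conn.execute(
    "SELECT DISTINCT categoria FROM tmp_categorias ORDER BY categoria").fetchall()
print(categorias)  # [('frutas',), ('lacteos',)]
```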

    
answered by 20.01.2016 at 21:28
0

The first option only filters the rows as it finds them, but it still has to go through all of them to get the result. With GROUP BY, the intermediate result is reprocessed to sort it by the grouping value, in your case categoria. Without indexes, the first option is faster; however, if you index the categoria field, the GROUP BY query is practically as fast. Keep in mind that each alternative should be used according to the result you need.
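One way to see the effect of an index, assuming SQLite via Python's `sqlite3` module, is to inspect the query plans rather than time the queries. The index name `idx_productos_categoria`, the extra columns and the rows are hypothetical, and the exact wording of the plan text varies between SQLite versions. With the index in place, SQLite can typically answer both queries by scanning the covering index in order.

```python
import sqlite3

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productos (id INTEGER PRIMARY KEY, nombre TEXT, categoria TEXT, precio REAL)")
conn.executemany(
    "INSERT INTO productos (nombre, categoria, precio) VALUES (?, ?, ?)",
    [("manzana", "frutas", 1.0), ("leche", "lacteos", 2.0), ("pan", "panaderia", 1.5)],
)

# Index on the grouping column (hypothetical index name).
conn.execute("CREATE INDEX idx_productos_categoria ON productos (categoria)")


def plan(sql):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN for a query."""
    # Each plan row has four columns; the last one holds the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


distinct_plan = plan("SELECT DISTINCT categoria FROM productos")
group_by_plan = plan("SELECT categoria FROM productos GROUP BY categoria")
print(distinct_plan)
print(group_by_plan)
```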

    
answered by 20.01.2016 at 21:23
0

In addition to what Leandro says, and as a faithful translation of one of the answers in the link he posted: the answer varies between engines, but here is an overview for these two database engines:

Answer:

No difference (in SQL Server, at least): both queries use the same execution plan.

link

There may be a difference if subqueries are involved:

link

No difference (in Oracle):

link

answered by 21.01.2016 at 21:42