The count() aggregate function in sql can be uses to get a histogram of the distribution of values in a table. Assume there is a table with customer data and you want to get a distribution of the distance to each customer:

```
```select distance, count(distance)
from cust\_with\_dist
group by distance;

This will show the distance group be the exact values, which is not always desirable, f.e. when the dist column contains floating point data, this would yield the following result:

```
``` distance | count
----------+-------
0.0 | 142
1.2 | 165
5.5 | 24
7.6 | 17
8.3 | 11
9.3 | 11
9.4 | 6
...

Better would be to arrange the distance column in whole numbers:

```
```select distance, count(distance)
from (select round(distance) as distance
from cust\_with\_dist) as foo
group by distance

This gives:

```
``` distance | count
----------+-------
0 | 142
1 | 165
6 | 24
8 | 28
9 | 17

Better for now, but if there is a large scale of values for the distance, e.g. 0 up to 1000km it dosn’t make sense to group the values from 100 to 200 into 1km intervals. A better grouping will be exponential or logarithmic grouping, e.g. into group up to 1, 10, 100, 1000km or 1, 5, 25, 125, 625km. This can be accomplished by transforming the dist column before grouping:

```
```select dist, count(dist)
from (select round(pow(10,ceil(log(10,distance+0.0001))))
as dist from cust\_with\_dist) as foo
group by dist

The distance is added 0.0001 to prevent log(0) exceptions, by using the ceil() function the data is grouped into intervals “up-to”, the floor() function would give “as least”. The result is

```
``` dist | count
------+-------
0 | 142
10 | 235
100 | 111
1000 | 7

If you exchange “10” by “5” you’ll get distances intervals of 0, 5, 125, 625 and so on.

{ Comments are closed! }