You are currently browsing the Philipp Meier's weblog blog archives for May, 2005

Group by logarithmic count in SQL

The count() aggregate function in sql can be uses to get a histogram of the distribution of values in a table. Assume there is a table with customer data and you want to get a distribution of the distance to each customer:

select distance, count(distance) 
from cust\_with\_dist 
group by distance;

This will show the distance group be the exact values, which is not always desirable, f.e. when the dist column contains floating point data, this would yield the following result:

 distance | count
      0.0 |   142
      1.2 |   165
      5.5 |    24
      7.6 |    17
      8.3 |    11
      9.3 |    11
      9.4 |     6

Better would be to arrange the distance column in whole numbers:

select distance, count(distance) 
from (select round(distance) as distance
           from cust\_with\_dist) as foo
 group by distance

This gives:

 distance | count
        0 |   142
        1 |   165
        6 |    24
        8 |    28
        9 |    17

Better for now, but if there is a large scale of values for the distance, e.g. 0 up to 1000km it dosn’t make sense to group the values from 100 to 200 into 1km intervals. A better grouping will be exponential or logarithmic grouping, e.g. into group up to 1, 10, 100, 1000km or 1, 5, 25, 125, 625km. This can be accomplished by transforming the dist column before grouping:

select dist, count(dist) 
from (select round(pow(10,ceil(log(10,distance+0.0001)))) 
           as dist from cust\_with\_dist) as foo
group by dist

The distance is added 0.0001 to prevent log(0) exceptions, by using the ceil() function the data is grouped into intervals “up-to”, the floor() function would give “as least”. The result is

 dist | count
    0 |   142
   10 |   235
  100 |   111
 1000 |     7

If you exchange “10” by “5” you’ll get distances intervals of 0, 5, 125, 625 and so on.


EJB 3.0 Annotations Cheat Sheet

Update: The Annotations Cheat Sheet got it’s own page here

I just made Version 1.0 of my EJB 3.0 Annotations Cheat Sheet – just download it from here as pdf or OpenOffice Document. I’d greatly appreciate any feedback about errors and possible improvements. The sheet is based on the EJB 3.0 early draft 2.

Preview of cheat sheet.


Version 1.1 contains some formatting cleanup and fixes some typos. Thanks to Emmanuel Bernard for pointing out the typos. I checked the EJB 3.0 Peristence edr 2 but it reads that all collections except many-to-many are fetched EAGER by default. The links above were updated accordingly.

Second Update

Fixed broken links.