Histograms (and bar graphs of discrete data) are really really handy when you’re trying to work out how people are using your system. Generating the data for them using SQL is a touch fiddly though. Here’sa simple query to generate some data I’ve just written.
All the problems stem from the niggle that what you are generally interested in is a count of counts eg. how many people have how many rows associated with them.
We have a simple table where a row is added everything somebody “makes friends”, so my graph wants to be a visulisation of the question “How many friends have people added”. We do it in two steps…
First – a simple query to get the raw data. This returns a big list of user id’s and the number of friends they’re added.
select id, count(id) as myCount from relationships group by id
Which runs and returns a row per user, and the number of friends that each user has added. Now to take counts of these counts, which means adding a table identifier to each table and a subquery. The table identifiers allow us to avoid the “Every derived table must have its own alias” error
select one.mycount, count(one.myCount) from (select two.id, count(two.id) as myCount from relationships two group by two.id) as one group by myCount

…and there we go – data you can drop into Google Docs and graph. The one data entry point that is missing is the Zero point eg. how many users don’t have any friends, but we get that data from another table (relationships only contains those who have added relationships).
Tags: data, dataporn, frequency, histogram, mysql, sql
