The Lapidary Lemur

Musings from Brandon Weaver

Aggregate Active Record

Active Record is an extremely powerful abstraction on SQL, but many a Rails programmer tends to forget that that means the entirety of the SQL standard. While it might be common knowledge for some, aggregate queries seem to be missing from the toolkit of a newer rails programmer.

For this we’ll be using a model called Foo with the fields a, b, and c. All of the fields are strings with random words chosen from OSX’s built in wordlist:

db_seed
1
2
3
4
# rails g model foo a b c
words   = IO.readlines('/usr/share/dict/words').flat_map { |w| w.chomp.downcase }
records = 10_000.times.map { |i| {a: words.sample, b: words.sample, c: words.sample} }
Foo.create(records)

Count

How many times have you done this?

1
Model.all.size

The problem with this one is quite simply that it’s retrieving all the records just to get a count. Seem inefficient? It is:

1
2
[3] pry(main)> Foo.all
  Foo Load (33.7ms)  SELECT "foos".* FROM "foos"

Instead, use the count method:

1
2
[4] pry(main)> Foo.count
   (0.2ms)  SELECT COUNT(*) FROM "foos"

Let’s go ahead and blank out the a field for the first thousand or so records. Note that I don’t use first, as that returns an array:

1
2
3
[8] pry(main)> Foo.where('id < 1000').update_all(a: nil)
  SQL (3.5ms)  UPDATE "foos" SET "a" = NULL WHERE (id < 1000)
=> 999

Now how would we get the count of records where a is present? Count takes arguments:

1
2
3
[9] pry(main)> Foo.count(:a)
   (1.5ms)  SELECT COUNT("foos"."a") FROM "foos"
=> 9001

Would you look at that, it’s over 9000!

Group

Let’s say we want to group our records by their length to find out how many words there are for a certain length. Ruby has a built in group_by method:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
[28] pry(main)> Foo.all.group_by { |v| v.a.try(:size) }.map { |k,v| [k,v.size] }.to_h
  Foo Load (22.5ms)  SELECT "foos".* FROM "foos"
=> {nil=>999,
 14=>348,
 10=>1246,
 11=>992,
 4=>204,
 8=>1166,
 9=>1262,
 5=>404,
 6=>630,
 7=>856,
 12=>795,
 16=>117,
 13=>576,
 15=>220,
 17=>65,
 3=>51,
 18=>40,
 19=>14,
 2=>4,
 20=>6,
 1=>2,
 22=>2,
 21=>1}

That all should be enough of a trigger to start looking for an aggregate method:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
[29] pry(main)> Foo.group('length(a)').count
   (9.7ms)  SELECT COUNT(*) AS count_all, length(a) AS length_a FROM "foos" GROUP BY length(a)
=> {nil=>999,
 1=>2,
 2=>4,
 3=>51,
 4=>204,
 5=>404,
 6=>630,
 7=>856,
 8=>1166,
 9=>1262,
 10=>1246,
 11=>992,
 12=>795,
 13=>576,
 14=>348,
 15=>220,
 16=>117,
 17=>65,
 18=>40,
 19=>14,
 20=>6,
 21=>1,
 22=>2}

SQL functions are perfectly valid in this context, and quite helpful as well. Just using a column name in group, we can group by similar values as well.

Pluck

Pluck doesn’t just get certain columns from a database, it can also be used for SQL functions. Let’s say we want a list of what length of words we have:

1
2
3
[35] pry(main)> Foo.pluck('DISTINCT length(a)')
   (3.3ms)  SELECT DISTINCT length(a) FROM "foos"
=> [nil, 14, 10, 11, 4, 8, 9, 5, 6, 7, 12, 16, 13, 15, 17, 3, 18, 19, 2, 20, 1, 22, 21]

How about the average length of our a column?

1
2
3
[36] pry(main)> Foo.pluck('avg(length(a))')
   (2.2ms)  SELECT avg(length(a)) FROM "foos"
=> [9.574158426841462]

Noted you can use the average(:a) function here as well:

1
2
3
[39] pry(main)> Foo.average('length(a)')
   (2.9ms)  SELECT AVG(length(a)) FROM "foos"
=> #<BigDecimal:7fcf75efd2f0,'0.9574158426 84146E1',27(36)>

…but what you cannot do with average, min, max, and other calculation functions is this useful tidbit:

1
2
3
[40] pry(main)> Foo.pluck('avg(length(a))', 'max(length(a))', 'min(length(a))', 'count(a)')
   (5.4ms)  SELECT avg(length(a)), max(length(a)), min(length(a)), count(a) FROM "foos"
=> [[9.574158426841462, 22, 1, 9001]]

That one, without aggregate functions, is likely to take quite a while indeed.

Calculations

There are some other common functions that may well come in handy if you only happen to need one value:

1
2
3
4
5
6
7
[42] pry(main)> Foo.minimum('length(a)')
   (2.4ms)  SELECT MIN(length(a)) FROM "foos"
=> 1

[43] pry(main)> Foo.maximum('length(a)')
   (2.2ms)  SELECT MAX(length(a)) FROM "foos"
=> 22

Where

Not an aggregate per-se, but using a where clause can still use SQL functions. Say you only want records with an a field longer than 10 characters:

1
2
3
[44] pry(main)> Foo.where('length(a) > 10').count
   (2.1ms)  SELECT COUNT(*) FROM "foos" WHERE (length(a) > 10)
=> 3176

Maybe the count isn’t what you’re after. Perhaps you want the ids instead?:

1
2
3
4
5
6
[45] pry(main)> Foo.where('length(a) > 10').ids
   (5.0ms)  SELECT "foos"."id" FROM "foos" WHERE (length(a) > 10)
=> [1000,
 1002,
 1004,
 # ...

Never underestimate the value of being familiar with basic functions in SQL, as they’ll save your database a lot of headaches.

Finishing up

While a strong knowledge of SQL is not always necessary for Rails development, it will most certainly improve your code and your performance. Not everything has to fit into hash arguments for a where clause.

Comments