Joining a table to itself in Ruby on Rails 3 when you can’t use the Id

Mar 31, 2011

Ruby on Rails

First of all let me explain that joining a table to itself normally – i.e. using primary and foreign keys, is pretty simple and can be done using the lovely ActiveRecord belongs_to and has_many association. That isn’t what I am talking about here, I am interested in joining to the same table using other fields and conditions.

Why would you want to do this? Well in my case it is the first step in shortlisting customers in the database who have potential duplicates. Now I wouldn’t trust de-duping to be done entirely by Sql but by narrowing down the list it will save a lot of processing time and/or manual checking.

At this point I have to confess that I really hope there is actually a better way of doing this. Perhaps there is but I can’t find it. Writing Sql fragments for the ActiveRecord layer to put together with its generated fragments is a major problem. If somebody does know if this can be done purely through ActiveRecord method calls then please let me know.

Ok, so on to some code.

select_clause = 'DISTINCT customers.*'
from_clause = 'customers, customers as other_customers'
where_clause = "customers.first_name = other_customers.first_name and customers.surname = other_customers.surname"
@customers = Customer.select(select_clause).where(where_clause).from(from_clause).all

So, to simply explain we are doing an inner join between the records in customer table against themselves and selecting records where the first name and surname match. Also, as a match will produce two rows I remove them with the select clause.

What niggles me here is having to use Sql. I don’t particularly love or hate Sql but I feel that if I am going to write it then I may as well write the whole lot. The problem here is that my first attempt was to use the Sql statement DISTINCT ROW – which failed against mySql as it is not supported. By mixing the potentially Sql implementation specific Sql statements with the implementation neutral ActiveRecord means that you could easily tie yourself to one database implementation without realising it. At least if you are writing all of the Sql you are aware that you need to take steps to address the issue.

Anyway, I hope somebody out there can offer a better solution but if not – at least this one works.