While you can always fix a bug, you cannot always recover of inconsistent data.

In this blog post, I’ll describe some practices and techniques I use to prevent inconsistent data from silently ruining your companies’ most valuable asset: its database.

Use SQL database features

SQL databases provides simple features to enforce data integrity.

null: false

“Well… I don’t know.”

Most columns in your database should be set to a value for the record to be consistent. When writing your migration, set them to null: false. The database will blow up if you forgot to set them instead of silently setting them to NULL.

create_table :users do |t|
  t.string :email, null: false
end

unique: true

“So that guy has two accounts with the same email address?”

Add index with unique constraints on unique attributes. It’s cheap and it makes look-up even more efficient! In case you also have a `deleted_at` attribute, add it to the unique index and you’re good to go.

add_index :users, [:email, :deleted_at], unique: true

transactions

“Errr, the money transfer failed half way through. We did not credit the other guy.”

Updating multiple records? Wrap that into a SQL transaction. If something goes wrong, changes are reverted and the database is rolled back to a consistent state.

Account.transaction do
  from_account.debit!(amount)
  to_account.credit!(amount)
end

locking

“Two workers were running the invoicing task in parallele… Some people got charged twice.”

Use locking when you want to prevent a record from being updated by others when you are using it.

task = InvoicingTask.lock(true).first
task.perform!

foreign keys

“We lost the destination for this pending order…”

Foreign keys will help you prevent orphan records and missing relationships. They are effective on race conditions that Active Record validations can’t spot. I use Foreigner to add foreign key support to migrations.

Use ActiveRecord validations as live checks

Validations are mainly used to check user’s inputs. I also use them as live tests to ensure data integrity.

validates :duration_is_set_when_session_is_complete

Fail early, fail loudly

Use #create!, find_by_email!, cancel!: shout at objects, they will just blow up when things go wrong.

So instead of doing:

def update_price(product_name, new_price)
  if product = Product.find_by_name(product_name)
    product.update_attributes(price: new_price)
  end
end

do:

def update_price!(product_name, new_price)
  product = Product.find_by_name!(product_name)
  product.update_attributes!(price: new_price)
end

Your users are likely to see an error message when something go wrong, but that’s much better than leaving inconsistent data.

Ensure that existing records are valid

As the code evolves, existing records become invalid. You’ll know about it when a user attempts to update his profile or when you run a migration that updates data. Run ActiveSanity daily to ensure that your data is valid.

What about you? Do you think that those practices are worthwhile? Are there other technics you use to ensure data integrity?