Translating with Mobility

In the first real post on this blog, I described my experience hacking globalize, a gem that calls itself the “Rails I18n de-facto standard library for ActiveRecord model/data translation.”

Globalize was the first gem that I contributed to, and the experience was very important to me as a step toward becoming a better and more knowledgeable programmer.

I am now one of the authors of that gem, having made many commits to the repository. In the time since I first contributed to Globalize, I’ve become keenly aware of the limitations of the gem, and of problems with existing solutions for managing translations in Rails projects.

I’d like to look back in this post to the problem of translation and to solutions devised so far, as motivation to introduce a gem I’ve just released called Mobility. Mobility attempts to solve some core problems with the multitude of Ruby-based translation gems out there right now.

The Problem

You have an application with user-generated content. You want to translate that content into multiple languages. And – importantly – you want to do it in such a way that you don’t need to recode all your existing presentation logic (views, mailers, controllers, etc.) to handle the new languages.

In other words, things should just work – more or less the same as they do with one language, except that switching the language should tell the application to switch context to read or write content stored in that language.

So this:

post = Post.first
post.title

… should return the title of the post, in the language I am currently viewing.

I18n.locale = :en
post.title
#=> "Mobility (noun): quality of being changeable, adaptable or versatile"
I18n.locale = :ja
post.title
#=> "Mobility(名詞):動きやすさ、可動性"

Likewise, I should be able to write translated values the same way:

I18n.locale = :en
post.title = "Translating with Mobility"
post.save

Now the English title has changed, but the Japanese one has not:

post = Post.first
post.title
#=> "Translating with Mobility"
I18n.locale = :ja
post.title
#=> "Mobility(名詞):動きやすさ、可動性"

For bonus points, this solution should also work for querying attributes, so something similar to this:

Post.find_by(title: "Translating with Mobility")

… should work even if title is a translated attribute.

For this to happen, something has to hook in to attribute accessors, to detect changes in the locale and return language-specific content accordingly. This may seem easy to do at first glance, but it gets tricky.

Notice that what we’re asking for is similar to, but different from, the problem solved by localization solutions such as the Rails I18n API, which provide namespaced translation lookup, generally for interface translation and typically with static data.

With application content, we want translations at the class level, in the form of persisted attributes of a model. Ultimately we’re talking about a storage strategy.

As it turns out, there are a number of different strategies that different people have come up with to solve this problem. Let’s take a look at them.

Solutions

Strategy 1: Translatable Columns

The most obvious solution to this problem is to create columns on your model table for each locale you want to support, with some convention for identifying the locale of the column.

Typically, this is done by appending a suffix to the column name made up of an underscore plus the locale, with any dashes in the locale converted to underscores.

So title becomes title_en, title_fr, title_ja, and so on. Adding a new locale to an application then requires that you add columns for that locale to each and every translated model table (see figure below.)

Table structure of column strategy
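As a rough sketch of the naming convention described above, here is a hypothetical helper (localized_column is my own name for illustration, not part of any gem's API). Downcasing the locale is an extra assumption on my part, so that a locale like pt-BR maps to a lower-case column name.

```ruby
# Hypothetical helper: builds the column name for a translated attribute
# in a given locale, e.g. title + en -> "title_en".
def localized_column(attribute, locale)
  # Dashes become underscores; downcasing is an assumption of this sketch.
  suffix = locale.to_s.downcase.tr("-", "_")
  "#{attribute}_#{suffix}"
end

localized_column(:title, :en)      #=> "title_en"
localized_column(:title, :"pt-BR") #=> "title_pt_br"
```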

For a small number of models, and a small number of locales, this is a reasonable solution. It has the advantage that the column structure is easy to understand, and querying by translated attribute is straightforward:

Post.find_by(title_en: "Mobility")
#=> #<Post:0x... id: 123 title_en: "Mobility" title_ja: "モビリティ"...>
Post.find_by(title_ja: "モビリティ")
#=> #<Post:0x... id: 123 title_en: "Mobility" title_ja: "モビリティ"...>

Some key disadvantages of this approach:

For many models and many locales, this approach requires many migrations, each time a locale or model is added.
Space is not used very efficiently: every record has a column for every locale, even if the locale is not used.

Despite these disadvantages, this strategy is appropriate for applications with a fixed number of locales. For a very nice implementation of this approach, see Traco.

Mobility supports this strategy with its :column backend.

Strategy 2: Model Translation Tables

Now consider an application with many translated models and a potentially open-ended list of locales to support. A service with user-generated content offered to users around the world would be a good example of this pattern. In this case, the first strategy becomes impractical as the number of locales grows; with, say, 50 locales and only a small number of actual translations, the vast majority of translated columns would be null. On top of which, adding new locales to all the translated models would be a pain, not to mention error-prone.

This brings us to the next strategy, shown in the database schema below. Here, each model table is assigned its own translations table. Each row in the translations table has:

A column for the locale (locale)
A foreign key pointing to the model table (post_id)
One or more columns for translations (title, content)

With this structure, we can now join the translation table (e.g. post_translations) to the model table (e.g. posts) on the locale we want to get the desired result: a model with translations in that locale.

Table structure of model translation strategy

In ActiveRecord, we would have an association on the model Post to the model PostTranslation, such that we can fetch translations with post.translations and pick out the one we want:

post = Post.first
translation = post.translations.find do |t|
  t.locale == "en"
end
#=> #<PostTranslation:0x... id: 123 locale: "en" ...>
translation.title
#=> "Translating with Mobility"
translation.content
#=> "In the first real post on this blog..."

With this approach, we’ve nicely solved some of the problems with the first strategy:

Since translations now have their own model and table, and we only create rows in that table when we save a translation in that locale, space usage is more efficient (no wasted columns/rows for locales that have no translation).
We do not need any migrations to add locales: simply save the translation for a new locale and it becomes available. So scalability on the “locale axis” is very good.

However, there are some clear disadvantages with this approach that should be kept in mind:

Having translations on a different table implies a lot of joining, which (if not managed very carefully) can lead to performance degradation.
Joining can also add significant complexity to the code dealing with translations. In particular, querying on translated attributes is much more complicated than it was with the first strategy, where we queried on the model column directly.

Note also that although this strategy scales nicely along the “locale axis”, it nonetheless requires a table for every translated model, and thus a migration every time we add a new model or a new translated attribute to an existing model. So this strategy may not scale well if an application has many different forms of translated content, spread across a large and growing list of models.

This strategy is implemented by Globalize, the gem mentioned at the beginning of this post (it is also implemented by many other gems). For the reasons mentioned above, Globalize has become the preferred option for model translation in Rails. Like the first strategy, it is a very useful approach for many (but not all) use cases.

Mobility supports this strategy with its :table backend.

Strategy 3: Shared Translation Tables

My experience working with Globalize, and in particular with scalability issues mentioned above, prompted me to consider another strategy which would scale better with model creation. I wanted a strategy which would be usable immediately at the code level, without having to add a table or column every time a new model or translated attribute is added.

This led me to the third strategy, which I’ve called the “key value” strategy. The basic idea is simple: rather than using a dedicated table for each translated model, instead create shared tables to store translations across all models (as shown in the figure below). Two tables are required since we will be storing two types of values: strings and text. (If you wanted to store other column types, you could add more tables for each of those types, or cast strings to some other type).

Table structure of key value strategy

The key thing here is that rather than using a normal association as we did in the last strategy, where post_id on the post_translations table pointed to the posts table, we will instead use a polymorphic association. By doing this, we can share the translations tables with as many models as we like, migrating just once to create the shared tables.

The new tables have the following columns:

A column for the locale (locale)
A foreign key pointing to the translated model id (translatable_id)
A column holding the model class (translatable_type)
A column holding the name of the attribute being translated (key)
A column holding the value of the translated attribute (value)

With these tables in place, we define a polymorphic relationship on the model, something like this:

class Post < ActiveRecord::Base
  has_many :text_translations,
    as: :translatable,
    class_name: "TextTranslation",
    inverse_of: :translatable
  #...
end

and a translation class:

class TextTranslation < ActiveRecord::Base
  self.table_name = "mobility_text_translations"
  belongs_to :translatable, polymorphic: true
end

Now that we have these models in place, we can easily fetch and set translations of a model class:

post = Post.first
post.text_translations.find do |translation|
  translation.key == "title" && translation.value == "Translating with Mobility"
end
#=> #<TextTranslation:0x...
#     id: 123
#     locale: "en"
#     key: "title"
#     value: "Translating with Mobility"
#     ... >

To add translations to a new model, we simply add a polymorphic relationship like the one above on Post, and we can immediately add translations to the model.
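To make the shape of the shared table concrete, here is a minimal in-memory sketch in plain Ruby, with no database or ActiveRecord involved. TranslationRow, ROWS and lookup are names I made up for illustration, and the Comment rows are invented sample data; the point is only that one table can hold translations for any model, distinguished by translatable_type and key.

```ruby
# In-memory stand-in for a shared translations table. Each row mirrors the
# columns listed earlier: locale, translatable_id, translatable_type, key, value.
TranslationRow = Struct.new(:locale, :translatable_id, :translatable_type, :key, :value)

ROWS = [
  TranslationRow.new("en", 1, "Post",    "title", "Translating with Mobility"),
  TranslationRow.new("ja", 1, "Post",    "title", "モビリティ"),
  TranslationRow.new("en", 7, "Comment", "body",  "Nice post!") # hypothetical second model
].freeze

# Hypothetical lookup: the value of one attribute of one record in one locale,
# or nil when no translation row exists.
def lookup(rows, type:, id:, key:, locale:)
  row = rows.find do |r|
    r.translatable_type == type && r.translatable_id == id &&
      r.key == key && r.locale == locale
  end
  row && row.value
end

lookup(ROWS, type: "Post", id: 1, key: "title", locale: "en")
#=> "Translating with Mobility"
lookup(ROWS, type: "Comment", id: 7, key: "body", locale: "ja")
#=> nil
```

Note that a missing translation is simply a missing row, which is where the storage efficiency of this strategy comes from.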

In practice, there are some other issues that need to be handled to make this work effectively, but done correctly this strategy has some notable benefits:

No more migrations (after the first one), so an application can quickly add new translated models.
As with Strategy 2, no extra effort to add new locales.
Relatively efficient in terms of storage, in the sense that we only create records for translations in locales where they actually exist. However, since each attribute translation gets a row in the table, for multiple attributes on a single translation table, Strategy 2 may be more space efficient since it stores translated attributes together in a single row.

Now for the disadvantages:

The join complexity described in the last section is even worse now, because we have to join every attribute separately when searching for matches. This is quite tricky but can be done. I’ll probably write about this in a future post.
Since we have each attribute stored separately (as a key/value pair), it becomes trickier to maintain our database in a consistent state. For example, if we remove a model, its translations will remain in the translations table(s) unless we explicitly remove them. Likewise, if we decide to rename a translated attribute, we may end up with translations for the previous attribute name left over in the table.

Despite these disadvantages, I consider this strategy to be applicable to the widest variety of situations. It is the default strategy in Mobility, implemented as the :key_value backend.

Strategy 4: Serialized Data

Yet another set of strategies takes the approach of jamming all the translations for an attribute into a single column. The first strategy following this approach is one which serializes translations as a hash and stores it on a text column.

At the model layer, this is pretty trivial in Rails:

class Post < ActiveRecord::Base
  serialize :title_translations, Hash
end

post = Post.new
post.title_translations = { "en" => "Translating with Mobility" }
post.save

We can now fetch translations simply by fetching the value for the key matching a given locale:

post.title_translations["en"]
#=> "Translating with Mobility"

This strategy has the obvious advantage of simplicity: all our translations are stored together on the attribute column. Like the model translations and key-value strategies, locale scalability is not an issue since we can just add new hash keys as we need them.

In addition, no migrations are needed to add locales: a single text column on the model for each translated attribute is enough to support any number of translations.

Despite these advantages, the serialized approach is not generally recommended, for the key reason that querying on serialized data is impractical: the database sees each value only as an opaque string. Indeed, Rails itself has discouraged the use of serialized attributes in favour of database-specific storage solutions (described below).
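The querying problem is easy to see in a small plain-Ruby sketch. Here the "table" is just an array of hashes, and find_ids_by_translation is a hypothetical helper of mine; I use JSON as the serialization format only to keep the example short (Rails' serialize uses YAML by default).

```ruby
require "json"

# What ends up in a serialized column: a single string holding every locale.
def serialize_translations(translations)
  JSON.generate(translations)
end

# Hypothetical finder. Matching on a translation means deserializing every
# row in application code, since the stored string cannot be matched by a
# simple equality condition in SQL.
def find_ids_by_translation(rows, locale, value)
  rows.select { |row| JSON.parse(row[:title_translations])[locale] == value }
      .map { |row| row[:id] }
end

rows = [
  { id: 1, title_translations: serialize_translations("en" => "Translating with Mobility") },
  { id: 2, title_translations: serialize_translations("en" => "Mobility", "ja" => "モビリティ") }
]

find_ids_by_translation(rows, "ja", "モビリティ") #=> [2]
```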

Nonetheless, there are valid use cases for this strategy. One gem that implements this approach is Multilang.

Mobility supports this strategy with its :serialized backend.

Strategy 5/6: PostgreSQL Hstore/Jsonb

The last two strategies, Hstore and jsonb, require the use of PostgreSQL as your database. Hstore columns store depth-1 hashes, where keys and values of the hash must be of string type. Jsonb (“binary json”) columns store data in JSON format, which supports a much broader range of primitives (strings, numbers, booleans and null) as well as arrays, and unlimited depth. Jsonb is generally preferred over Hstore since it is more flexible.

Further details on these data types can be found in the PostgreSQL documentation; for our purposes, the main point is that they both store complex data types in a single column, in a form that can be easily and quickly queried.

An example of such a query (on a Jsonb column) looks like this:

SELECT "posts".* FROM "posts"
WHERE (posts.title @> ('{"en":"Translating with Mobility"}')::jsonb)

Querying on multiple locales at once is also simple given the flexibility of the data format.

Gems that use Hstore as a storage format for translations include Trasto, Multilang Hstore and Hstore Translate. I only know of one gem (as of yet) that uses Jsonb for storage, Json Translate.

Mobility supports Hstore with its :hstore backend and Jsonb with its :jsonb backend.

Common Patterns

Having worked extensively on Globalize and having delved into the code of many of the other gems mentioned above, it became clear to me that each translation solution ended up implementing many of the same things, over and over again.

The concept of a translation cache, for example, is implemented for translatable columns in Traco, for model table translations in Globalize, and for hstore translations in Multilang Hstore.

The concept of fallbacks (originally described by Sven Fuchs in this blog post) is likewise implemented again, and again, and again.
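For readers who have not met fallbacks before, the idea fits in a few lines of plain Ruby. This is a sketch under my own assumptions (read_with_fallbacks is a hypothetical name, and treating blank strings as missing is one possible policy), not the implementation used by any of the gems above.

```ruby
# Hypothetical fallback lookup over a plain hash of translations.
# `fallbacks` maps a locale to the list of locales to try after it.
def read_with_fallbacks(translations, locale, fallbacks)
  candidates = [locale, *fallbacks.fetch(locale, [])]
  # Assumption: nil and empty strings both count as "no translation".
  found = candidates.find { |candidate| !translations[candidate].to_s.empty? }
  found && translations[found]
end

translations = { ja: "モビリティ" }
fallbacks    = { en: [:ja] }

read_with_fallbacks(translations, :ja, fallbacks) #=> "モビリティ"
read_with_fallbacks(translations, :en, fallbacks) #=> "モビリティ"
read_with_fallbacks(translations, :fr, fallbacks) #=> nil
```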

Locale accessors, methods which fetch an attribute in a particular locale with a dedicated method like title_en, are yet another repeated pattern found in many gems, for example Globalize Accessors in the case of Globalize.

Finally, tracking of changes (i.e. Active Model Dirty in Rails) is another common pattern, found in both Globalize and Multilang Hstore, for example.

What is important to note here is that while each of these is implemented in its own way, the repeated patterns (caching, fallbacks, locale accessors, dirty tracking, etc.) are not in general dependent on the storage implementation strategies described earlier.

In fact, the core pattern of fetching and updating translations using getter and setter methods, which behind the scenes map to some storage format, is itself similar across translation gems, independent of the storage strategy.

This code from Trasto captures this core pattern succinctly:

def define_localized_attribute(column)
  define_method(column) do
    read_localized_value(column)
  end

  define_method("#{column}=") do |value|
    write_localized_value(column, value)
  end
end

(Here, read_localized_value fetches the value of the translated attribute column in the current locale.)
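To see the pattern run end to end without Rails, here is a self-contained sketch of my own. Everything in it is hypothetical scaffolding: CurrentLocale stands in for I18n.locale, the Translatable module and its translates macro are my names, and storage is a nested hash rather than a database.

```ruby
# Hypothetical stand-in for I18n.locale, so the sketch has no dependencies.
module CurrentLocale
  class << self
    attr_accessor :value
  end
  self.value = :en
end

module Translatable
  # Class-level macro: defines a reader and a writer for each attribute,
  # both of which delegate to locale-aware storage methods.
  def translates(*attributes)
    attributes.each do |attribute|
      define_method(attribute) { read_localized_value(attribute) }
      define_method("#{attribute}=") { |value| write_localized_value(attribute, value) }
    end
  end
end

class Post
  extend Translatable
  translates :title

  private

  # Storage is a nested hash: { title: { en: "...", ja: "..." } }
  def translations
    @translations ||= Hash.new { |hash, key| hash[key] = {} }
  end

  def read_localized_value(attribute)
    translations[attribute][CurrentLocale.value]
  end

  def write_localized_value(attribute, value)
    translations[attribute][CurrentLocale.value] = value
  end
end

post = Post.new
post.title = "Mobility"
CurrentLocale.value = :ja
post.title               #=> nil
post.title = "モビリティ"
CurrentLocale.value = :en
post.title               #=> "Mobility"
```

Swapping the two private storage methods for something else is all it would take to change the storage strategy, which is the observation the rest of this post builds on.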

This commonality between translation solutions motivated me to consider whether it would be possible to separate the implementation of translation as a high-level pattern from the lower-level implementation of storing translations (an idea also discussed here). This led me to start developing Mobility.

Introducing Mobility

So what is Mobility? Well, I’ve spent most of this post explaining a bunch of storage strategies and some things which are similar about them. Mobility takes this whole idea to the next level and turns the problem of translating attributes into the problem of mapping translated attribute accessors (readers and writers) to an abstract pluggable “backend”, which encapsulates (and isolates) all the storage logic.
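A toy version of this idea, in plain Ruby, might look like the sketch below. To be clear, this is not Mobility's actual code: HashBackend, ColumnBackend and the per-instance locale attribute are simplifications I have invented for illustration. What it shows is that the accessors stay the same while the backend decides where values live.

```ruby
# Two hypothetical backends sharing one interface: read(locale) and
# write(locale, value).
class HashBackend
  def initialize(_model, _attribute)
    @store = {}
  end

  def read(locale)
    @store[locale]
  end

  def write(locale, value)
    @store[locale] = value
  end
end

# Stores values under "<attribute>_<locale>" keys on the model itself,
# imitating the translatable columns strategy.
class ColumnBackend
  def initialize(model, attribute)
    @model = model
    @attribute = attribute
  end

  def read(locale)
    @model.columns["#{@attribute}_#{locale}"]
  end

  def write(locale, value)
    @model.columns["#{@attribute}_#{locale}"] = value
  end
end

class Post
  attr_reader :columns    # stands in for the model's own table columns
  attr_accessor :locale   # per-instance locale keeps the sketch free of globals

  def initialize(backend_class)
    @columns = {}
    @locale = :en
    @title_backend = backend_class.new(self, :title)
  end

  # The accessors know nothing about storage; they only talk to the backend.
  def title
    @title_backend.read(locale)
  end

  def title=(value)
    @title_backend.write(locale, value)
  end
end

[HashBackend, ColumnBackend].each do |backend_class|
  post = Post.new(backend_class)
  post.title = "Mobility"
  post.locale = :ja
  post.title = "モビリティ"
  post.locale = :en
  post.title #=> "Mobility", whichever backend is used
end
```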

Rather than jump into the details of how these backends work (which would require a blog post of its own), let’s look at how we can use Mobility to quickly implement any of the storage strategies above.

Getting Started

First, add mobility to the Gemfile:

gem 'mobility', '~> 0.1.3'

In Rails, you can use a generator to create an initializer file and migrations for the shared translation tables for the key value strategy:

rails generate mobility:install

The initializer file has the line:

Mobility.config.default_backend = :key_value

With this configuration, the key value backend will be our default backend. Run the two migrations to generate string and text translation tables with rake db:migrate, and we’re ready to go.

Creating a translated attribute in a model is as easy as including (or extending) the Mobility module and declaring translated attributes (similar to Globalize and other gems):

class Post < ActiveRecord::Base
  include Mobility
  translates :title,   type: :string
  translates :content, type: :text
end

Now we can use the attributes:

post = Post.new
#=> #<Post id: nil>
post.title = "Mobility"
post.title
#=> "Mobility"
I18n.locale = :ja
post.title
#=> nil
post.title = "モビリティ"
post.title
#=> "モビリティ"
post.save

When you call post.title, you will see the SQL:

SELECT "mobility_string_translations".* FROM "mobility_string_translations"
WHERE "mobility_string_translations"."translatable_id" = 1
AND "mobility_string_translations"."translatable_type" = 'Post'
AND "mobility_string_translations"."key" = 'title'

post.content looks similar, but instead fetches from the mobility_text_translations table:

SELECT "mobility_text_translations".* FROM "mobility_text_translations"
WHERE "mobility_text_translations"."translatable_id" = 1
AND "mobility_text_translations"."translatable_type" = 'Post'
AND "mobility_text_translations"."key" = 'content'

It’s also possible to query for a post with finder methods using the i18n scope:

Post.i18n.find_by(title: "Mobility")
#=> #<Post id: 1>

The SQL for this query is a somewhat tricky table join, which is necessary since translations for this backend are stored on the shared translations table. (I’ll leave the details of that for another blog post.)

What is important to note here is that although the format for declaring translated attributes (with translates) is similar in form to other translation gems, the implementation is totally different. In particular, Mobility strives to keep the accessor and setup code for a given set of attributes totally isolated from attributes declared separately. This is what allows us to call translates twice in the model code above, once for each type of attribute (string and text).

Translation Options

Behind the scenes, when translates is called, Mobility creates an anonymous backend class which optionally includes modules for caching, fallbacks, and dirty accessors. Which of these modules is included can be customized by passing options:

class Post < ActiveRecord::Base
  include Mobility
  translates :title,   type: :string, fallbacks: { en: :ja }
  translates :content, type: :text,   fallbacks: { en: :fr }
end

I wouldn’t want to set up fallbacks like this in a real application, but regardless, it actually works: if we call post.title and the title is not set in English, it will fall back to Japanese. However, post.content will fall back to French, since this is how we have configured the attributes.

The fallbacks option is an example of a shared option which can be included for any backend. Others of this kind are cache (true by default), dirty (limited to classes which include ActiveModel::Dirty) and locale_accessors.
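The general technique of building an anonymous class that conditionally includes behaviour modules can be sketched in plain Ruby as follows. This is only an illustration of the technique under my own assumptions: BaseBackend, Cache, Fallbacks and build_backend_class are hypothetical names, and Mobility's real classes are more involved.

```ruby
# Hypothetical base backend that counts reads so the effect of caching is visible.
class BaseBackend
  attr_reader :reads

  def initialize(data)
    @data = data
    @reads = 0
  end

  def read(locale)
    @reads += 1
    @data[locale]
  end
end

# Optional behaviours wrap `read` and defer to the next implementation with `super`.
module Cache
  def read(locale)
    @cache ||= {}
    return @cache[locale] if @cache.key?(locale)
    @cache[locale] = super
  end
end

module Fallbacks
  def read(locale)
    value = super
    return value unless value.nil?
    fallback = fallback_map[locale]
    fallback ? super(fallback) : nil
  end
end

# Builds an anonymous subclass, including only the modules asked for.
def build_backend_class(base, cache: true, fallbacks: nil)
  Class.new(base) do
    if fallbacks
      define_method(:fallback_map) { fallbacks }
      include Fallbacks
    end
    include Cache if cache # included last, so it is consulted first
  end
end

backend = build_backend_class(BaseBackend, fallbacks: { en: :ja }).new({ ja: "モビリティ" })
backend.read(:en) #=> "モビリティ" (falls back to Japanese)
backend.read(:en) #=> "モビリティ" (served from the cache)
backend.reads     #=> 2
```

Because each call to build_backend_class returns a fresh class, two attributes on the same model can end up with differently configured backends without affecting each other.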

Here is an example of adding locale accessors to attributes:

class Post < ActiveRecord::Base
  include Mobility
  translates :title,   type: :string, locale_accessors: [:en, :ja]
  translates :content, type: :text,   locale_accessors: [:en, :ja]
end

Now we can use the accessors to access attribute values in English or Japanese:

post = Post.first
post.title_en
#=> "Mobility"
post.title_ja
#=> "モビリティ"

For more details on how these options are set, see the API documentation on the Mobility::Attributes class.
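If you are curious how accessors like title_en can be generated, one common Ruby technique is to build a module on the fly and include it. The sketch below is mine and makes no claim about Mobility's internals: locale_accessors, read_translation and write_translation are hypothetical names, and storage is a plain hash.

```ruby
# Hypothetical module builder: returns a module defining <attribute>_<locale>
# readers and writers that delegate to locale-aware read/write methods.
def locale_accessors(attribute, locales)
  Module.new do
    locales.each do |locale|
      suffix = locale.to_s.downcase.tr("-", "_")
      define_method("#{attribute}_#{suffix}") do
        read_translation(attribute, locale)
      end
      define_method("#{attribute}_#{suffix}=") do |value|
        write_translation(attribute, locale, value)
      end
    end
  end
end

class Post
  include locale_accessors(:title, [:en, :ja])

  def read_translation(attribute, locale)
    translations.dig(attribute, locale)
  end

  def write_translation(attribute, locale, value)
    (translations[attribute] ||= {})[locale] = value
  end

  private

  # Storage for this sketch: { title: { en: "...", ja: "..." } }
  def translations
    @translations ||= {}
  end
end

post = Post.new
post.title_en = "Mobility"
post.title_ja = "モビリティ"
post.title_en #=> "Mobility"
post.title_ja #=> "モビリティ"
```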

Translation Backends

As mentioned in Getting Started, the default backend is set in the initializer file to :key_value. But we can override it to use any backend, for any attribute:

class Post < ActiveRecord::Base
  include Mobility
  translates :title,   backend: :column
  translates :content, backend: :table
end

In order for this to work, we will need to create the necessary columns for the column backend (as described in Strategy 1, here for three locales):

class AddColumnsToPosts < ActiveRecord::Migration[5.0]
  def change
    add_column :posts, :title_en, :string
    add_column :posts, :title_ja, :string
    add_column :posts, :title_fr, :string
  end
end

…and translations table for the table backend (as described in Strategy 2):

class CreatePostTranslations < ActiveRecord::Migration[5.0]
  def change
    create_table :post_translations do |t|
      t.string :locale
      t.integer :post_id
      t.text :content
    end
  end
end

With these columns and table, we can use the attributes:

post.title = "Translating with Mobility"
#=> "Translating with Mobility"
post.content = "In the first real post on this blog..."
#=> "In the first real post on this blog..."

If we now save, this is the generated SQL:

INSERT INTO "posts" ("title_en") VALUES ('Translating with Mobility')
INSERT INTO "post_translations" ("locale", "post_id", "content") VALUES ('en',5,'In the first real post on this blog...')

So each attribute is treated entirely in isolation, and we can have multiple backends operating on the same model without any conflicts.

This works as well for queries:

Post.i18n.find_by(title: "Translating with Mobility", content: "In the first real post on this blog...")
#=> #<Post id: 5, title_en: "Translating with Mobility", title_ja: nil, title_fr: nil>

Note the SQL:

SELECT  "posts".* FROM "posts"
INNER JOIN "post_translations"
ON "post_translations"."post_id" = "posts"."id"
AND "post_translations"."locale" = 'en'
WHERE "posts"."title_en" = 'Translating with Mobility'
AND "post_translations"."content" = 'In the first real post on this blog...'

So components for each backend are combined to generate the final query. This applies to any combination of backends.

Mobility = Freedom

Why go to the trouble to implement things this way? There are a few reasons, but the driving motivation is to free the developer from having to constantly think about the details of the underlying translation implementation when working with translations. As its name implies, Mobility also strives to make it easy to change these details (e.g. change strategy) without having to rewrite an entire code base.

This in fact goes further than the examples described here. If you look at Mobility’s gemspec, you will see that it does not depend on activerecord or on activesupport. These are optional dependencies, and indeed the basic framework for setting up translated attributes with a backend (including caching, fallbacks and locale accessors) is totally independent of Rails. (Dirty tracking can also be used on a non-persisted class as long as it includes ActiveModel::Dirty.)

Thanks to this isolation of dependencies, Mobility can also support other ORMs, and currently does: all strategies described here are also implemented for Sequel models. So this will work (provided translation tables have been created):

class Post < ::Sequel::Model
  include Mobility
  translates :title,   type: :string
  translates :content, type: :text
end

Mobility detects the parent class and generates a backend instance using a class namespaced accordingly.
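One way such detection could work is sketched below. I want to stress that every name here is hypothetical (FakeActiveRecord, FakeSequel, the Backends namespace and backend_class_for are stand-ins so the example runs without either ORM installed), and that this is an illustration of the general idea rather than Mobility's actual lookup code.

```ruby
# Stand-ins for the ORM base classes; hypothetical names so the sketch
# runs without ActiveRecord or Sequel installed.
module FakeActiveRecord
  class Base; end
end

module FakeSequel
  class Model; end
end

# Stand-ins for ORM-specific backend classes, namespaced by ORM.
module Backends
  module ActiveRecord
    class KeyValue; end
  end

  module Sequel
    class KeyValue; end
  end
end

# Hypothetical resolver: picks the namespace from the model's ancestry,
# then the backend class from the backend name (:key_value -> KeyValue).
def backend_class_for(model_class, backend_name)
  orm =
    if model_class < FakeActiveRecord::Base
      "ActiveRecord"
    elsif model_class < FakeSequel::Model
      "Sequel"
    else
      raise ArgumentError, "no supported ORM detected for #{model_class}"
    end
  class_name = backend_name.to_s.split("_").map(&:capitalize).join
  Backends.const_get(orm).const_get(class_name)
end

class Post < FakeSequel::Model; end

backend_class_for(Post, :key_value) #=> Backends::Sequel::KeyValue
```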

Aside from keeping with the “DRY” philosophy of not repeating yourself, having a shared framework which works with many different configurations (ActiveRecord models, Sequel models, PORO, etc.) creates an opportunity to build something more interesting on top of the translation framework. I think there are some really interesting possibilities in this space, but current solutions being spread across more than a dozen different gems makes it impossible to explore those possibilities.

Try it!

So, try it and let me know what you think! I’d be really eager to hear from anyone interested in building interesting projects using Mobility for translation. While this blog has been nearly dead for a long time, I plan to follow up this post with more details on how Mobility works, other use cases, and on further developments of the gem. Stay tuned!

Posted on July 27th, 2024. Filed under: translation, mobility, ruby, sequel, rails, activerecord, globalize, i18n, postgres.