Translating with Mobility
In the first real post on this blog, I described my experience hacking globalize, a gem that calls itself the “Rails I18n de-facto standard library for ActiveRecord model/data translation.”
Globalize was the first gem that I contributed to, and the experience was very important to me as a step toward becoming a better and more knowledgeable programmer.
I am now one of the authors of that gem, having made many commits to the repository. In the time since I first contributed to Globalize, I’ve become keenly aware of the limitations of the gem, and of problems with existing solutions for managing translations in Rails projects.
I’d like to look back in this post to the problem of translation and to solutions devised so far, as motivation to introduce a gem I’ve just released called Mobility. Mobility attempts to solve some core problems with the multitude of Ruby-based translation gems out there right now.
The Problem
You have an application with user-generated content. You want to translate that content into multiple languages. And – importantly – you want to do it in such a way that you don’t need to recode all your existing presentation logic (views, mailers, controllers, etc.) to handle the new languages.
In other words, things should just work – more or less the same as they do with one language, except that switching the language should tell the application to switch context to read or write content stored in that language.
So this:
post = Post.first
post.title
… should return the title of the post, in the language I am currently viewing.
I18n.locale = :en
post.title
#=> "Mobility (noun): quality of being changeable, adaptable or versatile"
I18n.locale = :ja
post.title
#=> "Mobility(名詞):動きやすさ、可動性"
Likewise, I should be able to write translated values the same way:
I18n.locale = :en
post.title = "Translating with Mobility"
post.save
Now the English title has changed, but the Japanese one has not:
post = Post.first
post.title
#=> "Translating with Mobility"
I18n.locale = :ja
post.title
#=> "Mobility(名詞):動きやすさ、可動性"
For bonus points, this solution should also work for querying attributes, so something similar to this:
Post.find_by(title: "Translating with Mobility")
… should work even if title is a translated attribute.
For this to happen, something has to hook in to attribute accessors, to detect changes in the locale and return language-specific content accordingly. This may seem easy to do at first glance, but it gets tricky.
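To make the idea concrete, here is a minimal plain-Ruby sketch of accessors that consult the current locale on every call. It is only an illustration, not how any particular gem does it: CurrentLocale is a hypothetical stand-in for I18n.locale (so the sketch runs without any gems), and translations are kept in an in-memory hash rather than a database.

```ruby
# Hypothetical stand-in for I18n.locale, so the sketch runs without any gems.
module CurrentLocale
  class << self
    attr_writer :value

    def value
      @value ||= :en
    end
  end
end

class Post
  # Defines a reader and a writer per attribute; both look up the
  # current locale each time they are called.
  def self.translates(*attributes)
    attributes.each do |attribute|
      define_method(attribute) do
        translations[attribute][CurrentLocale.value]
      end

      define_method("#{attribute}=") do |value|
        translations[attribute][CurrentLocale.value] = value
      end
    end
  end

  translates :title

  private

  # In-memory storage: { attribute => { locale => value } }
  def translations
    @translations ||= Hash.new { |hash, key| hash[key] = {} }
  end
end

post = Post.new
CurrentLocale.value = :en
post.title = "Translating with Mobility"
CurrentLocale.value = :ja
post.title = "Mobility(名詞):動きやすさ、可動性"

CurrentLocale.value = :en
post.title #=> "Translating with Mobility"
```

The tricky parts mentioned above (persistence, querying, change tracking) are exactly what this sketch leaves out.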
Notice that what we’re asking for is similar to, but different from, the problem solved by localization solutions such as the Rails I18n API, which provide namespaced translation lookup (generally) for interface translation, typically with static data.
With application content, we want translations at the class level, in the form of persisted attributes of a model. Ultimately we’re talking about a storage strategy.
As it turns out, there are a number of different strategies that different people have come up with to solve this problem. Let’s take a look at them.
Solutions
Strategy 1: Translatable Columns
The most obvious solution to this problem is to create columns on your model table for each locale you want to support, with some convention for identifying the locale of the column.
Typically, this is done by appending a suffix to the column name made up of an underscore plus the locale, with any dashes in the locale converted to underscores.
So title becomes title_en, title_fr, title_ja, and so on. Adding a new locale to an application then requires that you add columns for that locale to each and every translated model table (see figure below).
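The naming convention is simple enough to sketch in a couple of lines. The helper name localized_column below is made up for illustration; real gems have their own helpers, and some also downcase the locale, which this sketch does not assume.

```ruby
# Builds the column name for a translated attribute in a given locale:
# attribute, underscore, then the locale with dashes turned into underscores.
# (localized_column is a hypothetical helper, not part of any gem's API.)
def localized_column(attribute, locale)
  "#{attribute}_#{locale.to_s.tr('-', '_')}"
end

localized_column(:title, :en)      #=> "title_en"
localized_column(:title, :"pt-BR") #=> "title_pt_BR"
```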
For a small number of models, and a small number of locales, this is a reasonable solution. It has the advantage that the column structure is easy to understand, and querying by translated attribute is straightforward:
Post.find_by(title_en: "Mobility")
#=> #<Post:0x... id: 123 title_en: "Mobility" title_ja: "モビリティ"...>
Post.find_by(title_ja: "モビリティ")
#=> #<Post:0x... id: 123 title_en: "Mobility" title_ja: "モビリティ"...>
Some key disadvantages of this approach:
- For many models and many locales, this approach requires many migrations, each time a locale or model is added.
- Space is not used very efficiently: every record has a column for every locale, even if the locale is not used.
Despite these disadvantages, this strategy is appropriate for applications with a fixed number of locales. For a very nice implementation of this approach, see Traco.
Mobility supports this strategy with its :column backend.
Strategy 2: Model Translation Tables
Now consider that we have an application where we have many translated models, with a potentially open-ended list of locales we want to support. A service with user-generated content offered to users around the world would be a good example of this pattern. In this case, the first strategy becomes impractical as the number of locales grows; with, say, 50 locales, and only a small number of actual translations, the vast majority of translated columns would be null. On top of which, adding new locales to all the translated models would be a pain, not to mention error-prone.
This brings us to the next strategy, shown in the database schema below. Here, each model table is assigned its own translations table. Each row in the translations table has:
- A column for the locale (locale)
- A foreign key pointing to the model table (post_id)
- One or more columns for translations (title, content)
With this structure, we can now join the translation table (e.g. post_translations) to the model table (e.g. posts) on the locale we want to get the desired result: a model with translations in that locale.
In ActiveRecord, we would have an association on the model Post to the model PostTranslation, such that we can fetch translations with post.translations and pick out the one we want:
post = Post.first
translation = post.translations.find do |t|
  t.locale == "en"
end
#=> #<PostTranslation:0x... id: 123 locale: "en" ...>
translation.title
#=> "Translating with Mobility"
translation.content
#=> "In the first real post on this blog..."
With this approach, we’ve nicely solved some of the problems with the first strategy:
- Since translations now have their own model and table, and we only create rows in that table when we save a translation in that locale, space usage is more efficient (no wasted columns/rows for locales that have no translation).
- We do not need any migrations to add locales: simply save the translation for a new locale and it becomes available. So scalability on the “locale axis” is very good.
However, there are some clear disadvantages with this approach that should be kept in mind:
- Having translations on a different table implies a lot of joining, which (if not managed very carefully) can lead to performance degradation.
- Joining can also add significant complexity to the code dealing with translations. In particular, querying on translated attributes is much more complicated than it was with the first strategy, where we queried on the model column directly.
Note also that although this strategy scales nicely along the “locale axis”, it nonetheless requires a table for every translated model, and thus a migration every time we add a new model or a new translated attribute to an existing model. So this strategy may not scale well if an application has many different forms of translated content, spread across a large and growing list of models.
This strategy is implemented by Globalize, the gem mentioned at the beginning of this post (it is also implemented by many other gems). For the reasons mentioned above, Globalize has become the preferred option for model translation in Rails. Like the first strategy, it is a very useful approach for many (but not all) use cases.
Mobility supports this strategy with its :table backend.
Strategy 3: Shared Translation Tables
My experience working with Globalize, and in particular with scalability issues mentioned above, prompted me to consider another strategy which would scale better with model creation. I wanted a strategy which would be usable immediately at the code level, without having to add a table or column every time a new model or translated attribute is added.
This led me to the third strategy, which I’ve called the “key value” strategy. The basic idea is simple: rather than using a dedicated table for each translated model, instead create shared tables to store translations across all models (as shown in the figure below). Two tables are required since we will be storing two types of values: strings and text. (If you wanted to store other column types, you could add more tables for each of those types, or cast strings to some other type).
The key thing here is that rather than using a normal association as we did in the last strategy, where post_id on the post_translations table pointed to the posts table, we will instead use a polymorphic association.
By doing this, we can share the translations tables with as many models as we like, migrating just once to create the shared tables.
The new tables have the following columns:
- A column for the locale (locale)
- A foreign key pointing to the translated model id (translatable_id)
- A column holding the model class (translatable_type)
- A column holding the name of the attribute being translated (key)
- A column holding the value of the translated attribute (value)
With these tables in place, we define a polymorphic relationship on the model, something like this:
class Post < ActiveRecord::Base
  has_many :text_translations, as: :translatable,
                               class_name: "TextTranslation",
                               inverse_of: :translatable
  #...
end
and a translation class:
class TextTranslation < ActiveRecord::Base
  self.table_name = "mobility_text_translations"
  belongs_to :translatable, polymorphic: true
end
Now that we have these models in place, we can easily fetch and set translations of a model class:
post = Post.first
post.text_translations.find do |translation|
  translation.key == "title" && translation.value == "Translating with Mobility"
end
#=> #<TextTranslation:0x... id: 123 locale: "en" key: "title" value: "Translating with Mobility" ... >
To add translations to a new model, we simply add a polymorphic relationship like the one above on Post, and we can immediately add translations to the model.
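To show how one shared table can serve several models, here is a small in-memory sketch in plain Ruby. The rows use the columns listed above; TranslationRow, STRING_TRANSLATIONS and find_translation are names invented for this illustration, and the Comment rows are made-up sample data.

```ruby
# One row of a shared translations table, with the columns described above.
TranslationRow = Struct.new(:translatable_type, :translatable_id, :locale, :key, :value)

# A single shared "table" holding string translations for any number of models.
STRING_TRANSLATIONS = [
  TranslationRow.new("Post",    1, "en", "title", "Mobility"),
  TranslationRow.new("Post",    1, "ja", "title", "モビリティ"),
  TranslationRow.new("Comment", 1, "en", "title", "Nice post")
].freeze

# Looks up one attribute translation; the type/id pair plays the role of
# the polymorphic association.
def find_translation(rows, type:, id:, locale:, key:)
  row = rows.find do |r|
    r.translatable_type == type && r.translatable_id == id &&
      r.locale == locale && r.key == key
  end
  row && row.value
end

find_translation(STRING_TRANSLATIONS, type: "Post", id: 1, locale: "ja", key: "title")
#=> "モビリティ"
find_translation(STRING_TRANSLATIONS, type: "Comment", id: 1, locale: "ja", key: "title")
#=> nil
```

Supporting a new model here means adding rows with a new translatable_type, not a new table.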
In practice, there are some other issues that need to be handled to make this work effectively, but done correctly this strategy has some notable benefits:
- No more migrations (after the first one), so an application can quickly add new translated models.
- As with Strategy 2, no extra effort to add new locales.
- Relatively efficient in terms of storage, in the sense that we only create records for translations in locales where they actually exist. However, since each attribute translation gets a row in the table, for multiple attributes on a single translation table, Strategy 2 may be more space efficient since it stores translated attributes together in a single row.
Now for the disadvantages:
- The join complexity described in the last section is even worse now, because we have to join every attribute separately when searching for matches. This is quite tricky but can be done. I’ll probably write about this in a future post.
- Since we have each attribute stored separately (as a key/value pair) it becomes trickier to maintain our database in a consistent state. For example, if we remove a model, its translations will remain in the translations table(s) unless we explicitly remove them. Likewise, if we decide to rename a translated attribute, we may end up with translations for the previous attribute name leftover in the table.
Despite these disadvantages, I consider this strategy to be applicable to the widest variety of situations. It is the default strategy in Mobility, implemented as the :key_value backend.
Strategy 4: Serialized Data
Yet another set of strategies takes the approach of jamming all the translations for an attribute into a single column. The first strategy following this approach is one which serializes translations as a hash and stores it on a text column.
At the model layer, this is pretty trivial in Rails:
class Post < ActiveRecord::Base
  serialize :title_translations, Hash
end

post = Post.new
post.title_translations = { "en" => "Translating with Mobility" }
post.save
We can now fetch translations simply by fetching the value for the key matching a given locale:
post.title_translations["en"] #=> "Translating with Mobility"
This strategy has the obvious advantage of simplicity: all our translations are stored together on the attribute column. Like the model translations and key-value strategies, locale scalability is not an issue since we can just add new hash keys as we need them.
In addition, no additional migrations are necessary; simply adding a text column to every model for each translated attribute is enough to support any number of translations.
Despite these advantages, the serialized approach is not generally recommended, for the key reason that querying on serialized data is not practical: the database sees only an opaque string. Indeed, Rails itself has discouraged the use of serialized attributes in favour of database-specific storage solutions (described below).
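To see why querying is the sticking point, consider this plain-Ruby sketch. It uses JSON from the standard library as the serialization format purely for illustration (Rails' serialize uses its own coder), and find_by_title is a hypothetical helper. Since the stored value is just text, finding a record by a translated value means deserializing every row in application code.

```ruby
require "json"

# Each "row" keeps all title translations serialized into one text value,
# the way a text column would.
rows = [
  { id: 1, title_translations: JSON.generate("en" => "Mobility", "ja" => "モビリティ") },
  { id: 2, title_translations: JSON.generate("en" => "Globalize") }
]

# No index or WHERE clause can help here: every row is parsed in Ruby
# until one matches the title in the requested locale.
def find_by_title(rows, locale, title)
  rows.find do |row|
    JSON.parse(row[:title_translations])[locale] == title
  end
end

find_by_title(rows, "ja", "モビリティ")[:id] #=> 1
find_by_title(rows, "ja", "Globalize")      #=> nil
```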
Nonetheless, there are valid use cases for this strategy. One gem that implements this approach is Multilang.
Mobility supports this strategy with its :serialized backend.
Strategy 5/6: PostgreSQL Hstore/Jsonb
The last two strategies, Hstore and jsonb, require the use of PostgreSQL as your database. Hstore columns store depth-1 hashes, where keys and values of the hash must be of string type. Jsonb (“binary json”) columns store data in JSON format, which supports a much broader range of primitives (strings, numbers, booleans and null) as well as arrays, and unlimited depth. Jsonb is generally preferred over Hstore since it is more flexible.
Further details on these data types can be found in the links above; for our purposes, the main point is that they both store complex data types in a single column, in a form that can be easily and quickly queried.
An example of such a query (on a Jsonb column) looks like this:
SELECT "posts".* FROM "posts" WHERE (posts.title @> ('{"en":"Translating with Mobility"}')::jsonb)
Querying on multiple locales at once is also simple given the flexibility of the data format.
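As a rough sketch of what that looks like from Ruby, the right-hand side of the @> containment operator is just a JSON document, so it can be built from a hash. The helper name containment_payload is made up for this example; it only builds the payload and does not talk to a database.

```ruby
require "json"

# Builds the JSON document for the right-hand side of a jsonb @> query.
# Passing several locales asks for rows that contain all of the given pairs.
# (containment_payload is a hypothetical helper, not part of any gem.)
def containment_payload(translations)
  JSON.generate(translations)
end

containment_payload(en: "Translating with Mobility")
#=> "{\"en\":\"Translating with Mobility\"}"
containment_payload(en: "Mobility", ja: "モビリティ")
#=> "{\"en\":\"Mobility\",\"ja\":\"モビリティ\"}"
```

In a real application the payload should be passed to the database as a bind parameter rather than interpolated into the SQL string.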
Gems that use Hstore as a storage format for translations include Trasto, Multilang Hstore and Hstore Translate. I only know of one gem (as of yet) that uses Jsonb for storage, Json Translate.
Mobility supports Hstore with its :hstore backend and Jsonb with its :jsonb backend.
Common Patterns
Having worked extensively on Globalize and having delved into the code of many of the other gems mentioned above, it became clear to me that each translation solution ended up implementing many of the same things, over and over again.
The concept of a translation cache, for example, is implemented for translatable columns in Traco, for model table translations in Globalize, and for hstore translations in Multilang Hstore.
The concept of fallbacks (originally described by Sven Fuchs in this blog post) is likewise implemented again, and again, and again.
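Stripped of any storage details, the fallback idea fits in a few lines of plain Ruby. This is a simplified sketch, not the implementation from any of these gems; read_with_fallbacks and the shape of the fallbacks hash are assumptions made for the example.

```ruby
# Returns the first non-empty translation, trying the requested locale first
# and then each of its fallback locales in order.
# (read_with_fallbacks is a hypothetical helper for illustration.)
def read_with_fallbacks(translations, locale, fallbacks)
  candidates = [locale, *Array(fallbacks[locale])]
  candidates.each do |candidate|
    value = translations[candidate]
    return value unless value.nil? || value.empty?
  end
  nil
end

title = { ja: "モビリティ" }
read_with_fallbacks(title, :en, { en: :ja }) #=> "モビリティ"
read_with_fallbacks(title, :fr, { en: :ja }) #=> nil
```

Nothing here depends on where the translations hash came from, which is why the same logic keeps getting rewritten for each storage strategy.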
Locale accessors, methods which fetch an attribute in a particular locale with a dedicated method like title_en, are yet another repeated pattern found in many gems, for example Globalize Accessors in the case of Globalize.
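The pattern itself is small. Here is a plain-Ruby sketch, with translations held in an in-memory hash and a locale list hard-coded for the example; it is not taken from any of the gems mentioned.

```ruby
class Post
  # In-memory translations keyed by locale, for the sketch only.
  def initialize(titles)
    @titles = titles
  end

  def title(locale)
    @titles[locale]
  end

  # Define title_en, title_ja, ... as shortcuts with the locale fixed.
  [:en, :ja].each do |locale|
    define_method("title_#{locale}") { title(locale) }
  end
end

post = Post.new(en: "Mobility", ja: "モビリティ")
post.title_en #=> "Mobility"
post.title_ja #=> "モビリティ"
```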
Finally, tracking of changes (i.e. Active Model Dirty in Rails) is another common pattern, found in Globalize and in Multilang Hstore, for example.
What is important to note here is that while each of these is implemented in its own way, the repeated patterns (caching, fallbacks, locale accessors, dirty tracking, etc.) are not in general dependent on the storage implementation strategies described earlier.
In fact, the core pattern of fetching and updating translations using getter and setter methods, which behind the scenes map to some storage format, is itself similar across translation gems, independent of the storage strategy.
This code from Trasto captures this core pattern succinctly:
def define_localized_attribute(column)
  define_method(column) do
    read_localized_value(column)
  end

  define_method("#{column}=") do |value|
    write_localized_value(column, value)
  end
end
(Here, read_localized_value fetches the value of the translated attribute column in the current locale.)
This commonality between translation solutions motivated me to consider whether it would be possible to separate the implementation of translation as a high-level pattern from the lower-level implementation of storing translations (an idea also discussed here). This led me to start developing Mobility.
Introducing Mobility
So what is Mobility? Well, I’ve spent most of this post explaining a bunch of storage strategies and some things which are similar about them. Mobility takes this whole idea to the next level and turns the problem of translating attributes into the problem of mapping translated attribute accessors (readers and writers) to an abstract pluggable “backend”, which encapsulates (and isolates) all the storage logic.
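As a rough illustration of the idea only (this is not Mobility's actual code, and all class names here are invented for the sketch), the accessor layer can be written so that it knows nothing about storage and simply delegates reads and writes to whichever backend it is given:

```ruby
# Backend 1: keeps translations in a plain hash keyed by locale.
class HashBackend
  def initialize
    @store = {}
  end

  def read(locale)
    @store[locale]
  end

  def write(locale, value)
    @store[locale] = value
  end
end

# Backend 2: stores translations in "columns" with a locale suffix,
# in the spirit of Strategy 1. The row is just a hash in this sketch.
class ColumnBackend
  def initialize(attribute, row)
    @attribute = attribute
    @row = row
  end

  def read(locale)
    @row["#{@attribute}_#{locale}"]
  end

  def write(locale, value)
    @row["#{@attribute}_#{locale}"] = value
  end
end

# The accessor layer only talks to the backend interface (read/write).
class TranslatedAttribute
  def initialize(backend)
    @backend = backend
  end

  def get(locale)
    @backend.read(locale)
  end

  def set(locale, value)
    @backend.write(locale, value)
  end
end

row = {}
results = [HashBackend.new, ColumnBackend.new(:title, row)].map do |backend|
  title = TranslatedAttribute.new(backend)
  title.set(:en, "Mobility")
  title.get(:en)
end
results #=> ["Mobility", "Mobility"]
row     #=> {"title_en"=>"Mobility"}
```

Swapping the storage strategy changes which backend is constructed, while the code calling get and set stays the same.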
Rather than jump into the details of how these backends work (which would require a blog post of its own), let’s look at how we can use Mobility to quickly implement any of the storage strategies above.
Getting Started
First, add mobility to the Gemfile:
gem 'mobility', '~> 0.1.3'
In Rails, you can use a generator to create an initializer file and migrations for the shared translation tables for the key value strategy:
rails generate mobility:install
The initializer file has the line:
Mobility.config.default_backend = :key_value
With this configuration, the key value backend will be our default backend.
Run the two migrations to generate string and text translation tables with rake db:migrate, and we’re ready to go.
Creating a translated attribute in a model is as easy as including (or extending) the Mobility module and declaring translated attributes (similar to Globalize and other gems):
class Post < ActiveRecord::Base
  include Mobility
  translates :title, type: :string
  translates :content, type: :text
end
Now we can use the attributes:
post = Post.new
#=> #<Post id: nil>
post.title = "Mobility"
post.title
#=> "Mobility"
I18n.locale = :ja
post.title
#=> nil
post.title = "モビリティ"
post.title
#=> "モビリティ"
post.save
When you call post.title, you will see the SQL:
SELECT "mobility_string_translations".* FROM "mobility_string_translations"
WHERE "mobility_string_translations"."translatable_id" = 1
  AND "mobility_string_translations"."translatable_type" = 'Post'
  AND "mobility_string_translations"."key" = 'title'
post.content looks similar, but instead fetches from the mobility_text_translations table:
SELECT "mobility_text_translations".* FROM "mobility_text_translations"
WHERE "mobility_text_translations"."translatable_id" = 1
  AND "mobility_text_translations"."translatable_type" = 'Post'
  AND "mobility_text_translations"."key" = 'content'
It’s also possible to query for a post with finder methods using the i18n scope:
Post.i18n.find_by(title: "Mobility") #=> #<Post id: 1>
The SQL for this query is a somewhat tricky table join, which is necessary since translations for this backend are stored on the shared translations table. (I’ll leave the details of that for another blog post.)
What is important to note here is that although the format for declaring translated attributes (with translates) is similar in form to other translation gems, the implementation is totally different. In particular, Mobility strives to keep the accessor and setup code for a given set of attributes totally isolated from attributes declared separately. This is what allows us to call translates twice in the model code above, once for each type of attribute (string and text).
Translation Options
Behind the scenes, when translates is called, Mobility creates an anonymous backend class which optionally includes modules for caching, fallbacks, and dirty accessors. Which of these modules is included can be customized by passing options:
class Post < ActiveRecord::Base
  include Mobility
  translates :title, type: :string, fallbacks: { en: :ja }
  translates :content, type: :text, fallbacks: { en: :fr }
end
I wouldn’t want to set up fallbacks like this in a real application, but regardless, it actually works: if we call post.title and the title is not set in English, it will fall back to Japanese. However, post.content will fall back to French, since this is how we have configured the attributes.
The fallbacks option is an example of a shared option which can be included for any backend. Others of this kind are cache (true by default), dirty (limited to classes which include ActiveModel::Dirty) and locale_accessors.
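The general technique of building an anonymous class and mixing in optional behaviour can be sketched in plain Ruby as follows. This is only an illustration of the idea under simplifying assumptions, not Mobility's implementation: Backend, Cache and build_backend_class are names invented here, and only a cache option is shown.

```ruby
# A bare backend that counts how often the underlying storage is read.
class Backend
  attr_reader :reads

  def initialize(data)
    @data = data
    @reads = 0
  end

  def read(locale)
    @reads += 1
    @data[locale]
  end
end

# Optional behaviour layered on top of any backend's read method.
module Cache
  def read(locale)
    @cache ||= {}
    # On a cache miss, call the backend's own read via super and remember it.
    @cache.fetch(locale) { @cache[locale] = super }
  end
end

# Builds an anonymous subclass and mixes in only the requested modules.
def build_backend_class(cache: true)
  Class.new(Backend) do
    include Cache if cache
  end
end

cached = build_backend_class(cache: true).new(en: "Mobility")
2.times { cached.read(:en) }
cached.reads #=> 1

uncached = build_backend_class(cache: false).new(en: "Mobility")
2.times { uncached.read(:en) }
uncached.reads #=> 2
```

Because each call builds a fresh class, options chosen for one set of attributes do not leak into another.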
Here is an example of adding locale accessors to attributes:
class Post < ActiveRecord::Base
  include Mobility
  translates :title, type: :string, locale_accessors: [:en, :ja]
  translates :content, type: :text, locale_accessors: [:en, :ja]
end
Now we can use the accessors to access attribute values in English or Japanese:
post = Post.first
post.title_en
#=> "Mobility"
post.title_ja
#=> "モビリティ"
For more details on how these options are set, see the API documentation on the Mobility::Attributes class.
Translation Backends
As mentioned in Getting Started, the default backend is set in the initializer file to :key_value. But we can override it to use any backend, for any attribute:
class Post < ActiveRecord::Base
  include Mobility
  translates :title, backend: :column
  translates :content, backend: :table
end
In order for this to work, we will need to create the necessary columns for the column backend (as described in Strategy 1, here for three locales):
class AddColumnsToPosts < ActiveRecord::Migration[5.0]
  def change
    add_column :posts, :title_en, :string
    add_column :posts, :title_ja, :string
    add_column :posts, :title_fr, :string
  end
end
…and translations table for the table backend (as described in Strategy 2):
class CreatePostTranslations < ActiveRecord::Migration[5.0]
  def change
    create_table :post_translations do |t|
      t.string :locale
      t.integer :post_id
      t.text :content
    end
  end
end
With these columns and table, we can use the attributes:
post.title = "Translating with Mobility"
#=> "Translating with Mobility"
post.content = "In the first real post on this blog..."
#=> "In the first real post on this blog..."
If we now save, this is the generated SQL:
INSERT INTO "posts" ("title_en") VALUES ('Translating with Mobility')
INSERT INTO "post_translations" ("locale", "post_id", "content") VALUES ('en',5,'In the first real post on this blog...')
So each attribute is treated entirely in isolation, and we can have multiple backends operating on the same model without any conflicts.
This works as well for queries:
Post.i18n.find_by(title: "Translating with Mobility", content: "In the first real post on this blog...")
#=> #<Post id: 5, title_en: "Translating with Mobility", title_ja: nil, title_fr: nil>
Note the SQL:
SELECT "posts".* FROM "posts"
INNER JOIN "post_translations"
  ON "post_translations"."post_id" = "posts"."id"
  AND "post_translations"."locale" = 'en'
WHERE "posts"."title_en" = 'Translating with Mobility'
  AND "post_translations"."content" = 'In the first real post on this blog...'
So components for each backend are combined to generate the final query. This applies to any combination of backends.
Mobility = Freedom
Why go to the trouble to implement things this way? There are a few reasons, but the driving motivation is to free the developer from having to constantly think about the details of the underlying translation implementation when working with translations. As its name implies, Mobility also strives to make it easy to change these details (e.g. change strategy) without having to rewrite an entire code base.
This in fact goes further than the examples described here. If you look at Mobility’s gemspec, you will see that it does not depend on activerecord or on activesupport. These are optional dependencies, and indeed the basic framework for setting up translated attributes with a backend (including caching, fallbacks and locale accessors) is totally independent of Rails. (Dirty tracking can also be used on a non-persisted class as long as it includes ActiveModel::Dirty.)
Thanks to this isolation of dependencies, Mobility can also support other ORMs, and currently does: all strategies described here are also implemented for Sequel models. So this will work (provided translation tables have been created):
class Post < ::Sequel::Model
  include Mobility
  translates :title, type: :string
  translates :content, type: :text
end
Mobility detects the parent class and generates a backend instance using a class namespaced accordingly.
Aside from keeping with the “DRY” philosophy of not repeating yourself, having a shared framework which works with many different configurations (ActiveRecord models, Sequel models, PORO, etc.) creates an opportunity to build something more interesting on top of the translation framework. I think there are some really interesting possibilities in this space, but current solutions being spread across more than a dozen different gems makes it impossible to explore those possibilities.
Try it!
So, try it and let me know what you think! I’d be really eager to hear from anyone interested in building interesting projects using Mobility for translation. While this blog has been nearly dead for a long time, I plan to follow up this post with more details on how Mobility works, other use cases, and on further developments of the gem. Stay tuned!
Posted on October 6th, 2024.