A partial archive of meta.discourse.org as of Tuesday July 18, 2017.

# Importers for large forums

zogstrip

Discourse already has around 40 importers in order to cover a wide range of community software.
These importers work very well, but they tend to be slow for very large forums.
That's why we've built the bulk importers.

## What is a bulk importer?

Our standard importers go through the same code paths as the application. This has the advantage of ensuring the imported data is consistent, but it tends to be slow because the data is imported record by record.

In order to go faster, we need to import in bulk.
In order to import in bulk, we need to bypass Rails and use SQL.
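To illustrate the difference between record-by-record and bulk inserts, here is a minimal sketch that builds one multi-row `INSERT` per batch. The helper names (`quote`, `bulk_insert_sql`) and the `posts`/`raw` table and column names are made up for the example; this is not the importer's actual code, which may use a different mechanism altogether.

```ruby
# Hypothetical helper: turn a Ruby value into a SQL literal.
# Real code should rely on the database driver's own escaping instead.
def quote(value)
  case value
  when nil then "NULL"
  when Numeric then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

# Hypothetical helper: build a single INSERT statement covering many rows.
def bulk_insert_sql(table, columns, rows)
  values = rows.map { |row| "(" + row.map { |v| quote(v) }.join(", ") + ")" }
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

# Illustrative data; table and column names are assumptions.
rows = [[1, "Hello"], [2, "It's here"], [3, nil]]
rows.each_slice(2) do |batch|
  puts bulk_insert_sql("posts", %w[id raw], batch)
end
```

Each statement carries a whole batch, so the per-row overhead of going through the application (callbacks, validations, one round trip per record) disappears. That is also exactly why the validations are lost.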

This solution has 2 drawbacks:

1. We lose pretty much all the validations (since they're done in Rails), but we can import 25 million posts in a couple of hours instead of a week
2. We need to keep it up to date whenever we change the structure of the database

There's not much we can do about #1 other than being careful to respect the validations in the importers.
For #2, we decided to split the code into 2 parts:

• An importer script which will import the minimum viable content
• A rake task that is launched post-import in order to populate all the other required columns and tables

The importer will be responsible for importing the most important data that can't be computed.
The rake task will be responsible for computing all the missing (but required) data and stats.
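As a rough illustration of that split, the sketch below recomputes derived values from imported records after the fact. It works on in-memory structs rather than database tables, and `Topic`, `Post`, `recompute_topic_stats` and the chosen fields are assumptions made for the example, not the rake task's real implementation.

```ruby
# Illustrative in-memory records; the real task would work on database tables.
Topic = Struct.new(:id, :title, :posts_count, :last_posted_at)
Post  = Struct.new(:id, :topic_id, :created_at)

# Hypothetical helper: recompute derived columns from the imported posts,
# overwriting whatever stale value was there before.
def recompute_topic_stats(topics, posts)
  by_topic = posts.group_by(&:topic_id)
  topics.each do |topic|
    topic_posts = by_topic.fetch(topic.id, [])
    topic.posts_count = topic_posts.size
    topic.last_posted_at = topic_posts.map(&:created_at).max
  end
end

topics = [Topic.new(1, "Welcome"), Topic.new(2, "Empty topic", 10)]
posts = [
  Post.new(1, 1, Time.utc(2017, 7, 1)),
  Post.new(2, 1, Time.utc(2017, 7, 2)),
]

recompute_topic_stats(topics, posts)
topics.each { |t| puts "#{t.title}: #{t.posts_count} posts" }
```

Because the counts are derived entirely from the imported rows, the importer itself never has to know about these columns, which keeps it small when the schema changes.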

A bulk importer will only import

• groups (name, description)
• user passwords & salts (so they can re-use the same password)
• user profiles (location, website, description)
• categories (name, description)
• topics (title, user, category, status, type)
• post_actions (bookmarks, likes, flags)
• tags (name)

A bulk importer will not import

• post revisions
• group permissions
• category permissions
• avatars (1)
• attachments (2)

(1) The script stores the avatar URLs in a custom field, which can be used later to download the avatars.

## When to use a bulk importer?

If you are planning to migrate a forum with more than 5 million posts to Discourse, then it is recommended to try our bulk importers.

We currently only support bulk importing from vBulletin but are planning to support phpBB and XenForo as well.

## How to bulk import?

### Setup

• You need to have a working development environment of Discourse.
• The database of the forum you are importing should be running on the same machine for best performance

### Import

1. Fire up your terminal and go to the discourse directory

2. Install the gem used by the importer

IMPORT=1 bundle install

3. Run the importer

ruby script/bulk_import/vbulletin.rb

You can change the locale by using the LOCALE environment variable

LOCALE=fr ruby script/bulk_import/vbulletin.rb

You can also change the connection settings of the imported database

DB_HOST=localhost DB_USERNAME=user DB_PASSWORD=1234 DB_NAME=myforum ruby script/bulk_import/vbulletin.rb
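For readers unfamiliar with this pattern, here is a minimal sketch of how a Ruby script can pick up such settings from the environment and fall back to defaults. The `connection_settings` helper and every fallback value in it are placeholders chosen for the example; they are not the script's real defaults.

```ruby
# Hypothetical helper: collect connection settings from the environment.
# The fallback values are placeholders, not the importer's actual defaults.
def connection_settings(env = ENV)
  {
    host: env.fetch("DB_HOST", "localhost"),
    username: env.fetch("DB_USERNAME", "root"),
    password: env.fetch("DB_PASSWORD", ""),
    database: env.fetch("DB_NAME", "vbulletin"),
  }
end

# A plain Hash stands in for ENV here so the example is reproducible.
settings = connection_settings({ "DB_HOST" => "localhost", "DB_NAME" => "myforum" })
puts settings[:database]
```

Variables set on the command line, as in the commands above, only apply to that single invocation of the script.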

### Post-import

1. Once the import is done, you need to run a rake task to generate all computed data and stats

rake import:ensure_consistency

2. Create a backup

./script/discourse backup

3. Upload the backup to your production instance, enable restoring from a backup, and restore your imported data

pfaffman

This is very cool!

But that's not important now.

When did that become official? Last I knew:

And especially since password rules are , I'm surprised that importing a database full of likely crappy passwords is supported now.

juli

Hey @zogstrip I recently began doing some tests to migrate my 6M+ posts forum over to Discourse but I was facing a lot of trouble with the importer taking so long. I'm so happy that the bulk importer is now a reality.

May I ask who is working on the XenForo importer? I would really love to help you guys out since I was going to do it anyway with the current XenForo importer. If a PR is welcome, I might be up to the task!

Cheers!

zogstrip

@pfaffman passwords & salts are imported in custom fields so that you can use the discourse-migratepassword (or similar) plugin to ease the migration

No one is working on it. A PR is more than welcome. Feel free to ask me questions if you need to.
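To make the custom-field approach to passwords more concrete, here is a hedged sketch of how a legacy hash could be checked at login. It assumes the scheme commonly described for vBulletin 3.x/4.x, `md5(md5(password) + salt)`, and uses made-up custom field names (`import_pass`, `import_salt`); neither the field names nor the method names come from the importer or the plugin, so verify both against your own data.

```ruby
require "digest"

# Assumption: legacy hash is md5(md5(password) + salt), as commonly described
# for vBulletin 3.x/4.x. Check this against your forum before relying on it.
def legacy_hash(password, salt)
  Digest::MD5.hexdigest(Digest::MD5.hexdigest(password) + salt)
end

# Hypothetical custom field names holding the imported hash and salt.
def legacy_password_matches?(custom_fields, password)
  stored = custom_fields["import_pass"]
  salt = custom_fields["import_salt"]
  return false if stored.nil? || salt.nil?
  legacy_hash(password, salt) == stored
end

fields = { "import_pass" => legacy_hash("hunter2", "abc"), "import_salt" => "abc" }
puts legacy_password_matches?(fields, "hunter2")
puts legacy_password_matches?(fields, "wrong")
```

The idea is that the old hash is only consulted once: after a successful match the password can be stored again with the new site's own hashing, and the legacy fields can be dropped.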

fefrei

This looks interesting! Is this task safe to run on an existing site?

After some fiddling in the rails console, I once ended up with an install where topic counts were wrong (e.g. an empty category claimed to have 10 topics in it) and manually wrote code to fix this (see # update all topic counts in the post linked above). It sounds like this rake task would probably solve issues like this.

zogstrip

It should be safe, but I haven't tested it thoroughly. I highly recommend that you take a backup first.

mtawil

Hello,
I have a vBulletin forum with more than 80M posts and a database of more than 100GB, and I need to migrate it from vBulletin to Discourse using that fantastic tool (the bulk importer).

I followed your instructions step by step with no luck; every time I run the script I get various errors.

First of all, the script does not work at all when run as the root user. When I run it as root I get this error:

Loading application...
URGENT: FATAL:  Peer authentication failed for user "discourse"
Failed to initialize site default
/usr/local/lib/ruby/gems/2.4.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql_adapter.rb:651:in `initialize': FATAL:  Peer authentication failed for user "discourse" (PG::ConnectionBad)


OK, let's switch to the discourse user with su - discourse and run the script:

discourse@ip-10-0-1-178-app:/var/www/discourse$ ruby script/bulk_import/vbulletin.rb
Loading application...
/usr/local/lib/ruby/gems/2.4.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql_adapter.rb:661:in `rescue in connect': FATAL: database "discourse_development" does not exist (ActiveRecord::NoDatabaseError)
......

Just to be clear, I did a hard search about setting discourse development environment, but I can’t find anything, all results are talking about how to install Discourse step by step without using Docker, this is the old way!

Please, your assistance in this matter is highly appreciated.

Thank you.

zogstrip

We don’t recommend running an import on a production instance. But if you really want to, you’ll need to tell the script to use the production database

RAILS_ENV=production ruby script/bulk_import/vbulletin.rb

mtawil

Here is what I got when executing the script with production environment:

discourse@ip-10-0-1-178-app:/var/www/discourse$ RAILS_ENV=production ruby script/bulk_import/vbulletin.rb
/usr/local/lib/ruby/gems/2.4.0/gems/activesupport-4.2.8/lib/active_support/dependencies.rb:274:in `require': cannot load such file -- mysql2 (LoadError)
from /usr/local/lib/ruby/gems/2.4.0/gems/activesupport-4.2.8/lib/active_support/dependencies.rb:274:in `block in require'
from /usr/local/lib/ruby/gems/2.4.0/gems/activesupport-4.2.8/lib/active_support/dependencies.rb:240:in `load_dependency'
from /usr/local/lib/ruby/gems/2.4.0/gems/activesupport-4.2.8/lib/active_support/dependencies.rb:274:in `require'
from script/bulk_import/vbulletin.rb:2:in `<main>'

zogstrip

Did you do the bundle install step in production mode too?

mtawil

Yes

......
......
Bundle complete! 99 Gemfile dependencies, 176 gems now installed.
Gems in the group development were not installed.
Use bundle info [gemname] to see where a bundled gem is installed.

mtawil

Just to be clear, I installed the gems using IMPORT=1 RAILS_ENV=production bundle install

Aref

Hi @zogstrip @codinghorror @sam,
We are waiting for a solution to this problem, and for @mtawil to share the final result of the migration.

Thanks

zogstrip

Like I said, your best bet is to run the import in a dev environment outside of Docker.

mtawil

So you did not test the script inside of Docker? Have you tried running it there?
Do you have any documentation on how to install Discourse without using Docker?

Thank you.

pfaffman

Search for “Development install #howto”