Wednesday, January 27

Blogger’s FTP Migration Plan: Tricky, Weird, But Effective

I was going to call this post “the worst FTP migration plan, except for all of the others,” but I was afraid that that would truncate poorly on Twitter. But that’s kind of where I’m coming from.

Blogger is turning off FTP publishing, which sucks in a lot of ways, but I think everyone will be better for it: bloggers will be happier with the performance and features of custom domains, and engineers will be happy not to have to support a creaky system that uses an even creakier protocol.

Though we’ve tried to put together a migration process that will work smoothly for everyone, I’m sure it won’t be perfect; there are too many moving parts in FTP publishing to guarantee that everyone will have a great experience. Nevertheless, I believe that our overall plan is sound, so I’d like to tell you about what we came up with, as well as some of the alternatives that we considered (and that might work better for you if you want to try them out).

First up, what’s our goal? The reason most folks have for using FTP is that, before our custom domain feature was launched, FTP was the only way to put your blog on your own domain instead of blogspot.com. So, it makes sense that converting blogs to use custom domains is the focus of the migration.

Unfortunately, in the general case (which I’ll get to below), changing to a custom domain will necessitate a change in post URLs. It is an absolute priority for us to preserve all existing links to blog posts. We do have to turn off our FTP publishing, but it would be unacceptable for us to break a blog’s inbound links, both from other sites and (cough, cough) search engines. Along those same lines, we want to make sure that any accrued PageRank is preserved as well.

Here are the three reasons why an FTP blog can’t just be converted to a custom domain blog at the same host, with the same URLs:
  • Hosting at sub-paths. With our FTP feature, you can publish your blog at www.yourdomain.com/myblog/. Custom domains don’t support that “/myblog/” bit, so if you did point www.yourdomain.com to Blogger’s servers, we wouldn’t serve your posts at the same URLs.
  • Other pages on the domain. Even if your blog is at www.yourdomain.com, you may have uploaded other pages or files that Blogger doesn’t know about. Pointing www.yourdomain.com to Blogger would keep your blog live but break links to those other files and pages.
  • Uploaded images. Ok, say you’re at www.yourdomain.com and you’ve never touched an FTP client. Would repointing the domain work then? Unfortunately, probably not. When you upload images to an FTP blog, Blogger re-uploads those images to your FTP site in an “uploaded_images” directory. Moving the blog directly to a custom domain at the same address would cause all of your images to break.
Ugh. Not fun.

Now, we’ve thought about this issue before, and have a bit of a solution, one that I implemented on this very blog a few years back. As long as you didn’t publish your FTP blog to a sub-path, you can use Blogger’s missing files host feature as a workaround. Unfortunately, it’s tricky to explain.

The way it works is that you tell your hosting provider to serve your old blog on a different domain, changing from, say, www.yourdomain.com to old.yourdomain.com or something along those lines. You might be able to do this easily in a configuration page, less easily from a remote shell, or you might have to copy your site by downloading and re-uploading the whole thing with an FTP client.

Once the old site is moved, you point www.yourdomain.com to Blogger and set your “Missing Files Host” setting to old.yourdomain.com. Then, whenever Blogger gets a request for a page it doesn’t know about, it sends a redirect to the missing files host. You can see this at work by trying the following link, which will redirect from blog.grogmaster.com to my missing files host, static.grogmaster.com: http://blog.grogmaster.com/uploaded_images/ptbridgeport-742307.jpg
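
To make that redirect behavior concrete, here is a minimal Python sketch of the decision described above. It is only a model, not Blogger's actual code: the function name, the `known_paths` set, and the choice of a 302 status are assumptions made for illustration (the post only says that "a redirect" is sent).

```python
from urllib.parse import urlunsplit


def handle_request(path, known_paths, missing_files_host):
    """Model a request to the Blogger-hosted domain.

    Returns (status, location). The 302 status is an assumption; the
    real service may use a different redirect code.
    """
    if path in known_paths:
        # Blogger knows this page, so it serves it directly.
        return 200, None
    # Unknown path: send the visitor to the same path on the
    # missing files host (the relocated copy of the old site).
    location = urlunsplit(("http", missing_files_host, path, "", ""))
    return 302, location


# Hypothetical set of pages that Blogger would know about.
known = {"/", "/2009/10/some-post.html"}
status, location = handle_request(
    "/uploaded_images/ptbridgeport-742307.jpg", known, "static.grogmaster.com"
)
```

In this model, anything Blogger doesn't recognize keeps its path and only changes host, which is why the old site has to stay reachable under the second domain.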

Yay, right? Everything’s served up from its rightful URL, or redirected to a URL that works! If you ignore the bit about this not working at a sub-path, it sounds like a reasonable enough solution. And, it actually is, if you’re the sort of person who is interested in following the directions from the previous two paragraphs.

While I think missing files host is the ideal solution (that’s why I wrote it, for when I personally switched off of FTP), it’s not for everyone because we can’t automate or even provide instructions for the “tell your hosting provider to serve your old blog on a different domain” step. Everyone’s hosting provider is different, and I know that many people would get terminally frustrated at trying to get this to work.

So, missing files host works in some cases and, if you’re handy with the tools, is a great way to go. It can’t be our general solution to handle the FTP migration, though. What’s next?

Given the popularity of Apache web servers, using mod_rewrite sounds very appealing. Here, the strategy would be to create a new domain for the blog, perhaps at blog.yourdomain.com. You’d write a quick, easy RewriteRule in .htaccess (or we could write it for you!) that would redirect all of your blog’s traffic, regardless of the path (and correctly handling uploaded_images and other exceptions), from the old host to the new host. Upload the .htaccess file, let the server do the redirects, Bob’s your uncle, &tc.

Quick and easy, though? Tell that to Louis Gray. We worked with Louis last Fall to field test our .htaccess migration strategy, and, though I have repressed most of the memories, I know at one point we had www.louisgray.com/blog/ redirecting correctly, but www.louisgray.com/blog (no trailing slash) ending up in a broken, double-redirected 404 page because of how the hosting provider was configured.

In this way, our dreams of a “universal” .htaccess/mod_rewrite solution, already somewhat delusional given that not all servers are Apache, and not all Apache servers have it turned on, blew up in our faces. Over the course of three hours. And Louis didn’t even have an uploaded_images directory to worry about. The variety of odd ways that an Apache server could be configured, and how those configurations could affect any suggested RewriteRules that we might provide, meant that this solution would not give anyone a chance at a smooth migration.

Nevertheless, I will note that mod_rewrite is the second suggested way of migrating off of FTP if you know what you’re doing. Unlike the missing files host strategy, it can work when your FTP blog was at a sub-path.
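
If you do go the mod_rewrite route, it may help to write down the mapping you expect before touching .htaccess. The Python sketch below models that mapping for a blog at a sub-path; it is not a RewriteRule and says nothing about how a given server will behave. The `/blog` prefix, the `blog.yourdomain.com` host, and the decision to leave `uploaded_images` on the old host are all assumptions for illustration.

```python
def redirect_target(path, old_prefix="/blog", new_host="blog.yourdomain.com",
                    keep_prefixes=("/blog/uploaded_images/",)):
    """Return the new URL for an old path, or None if the old host keeps it."""
    # Paths outside the blog (including look-alikes such as /blogroll.html)
    # are not redirected.
    if path != old_prefix and not path.startswith(old_prefix + "/"):
        return None
    # Exceptions such as uploaded images stay on the old host.
    if any(path.startswith(prefix) for prefix in keep_prefixes):
        return None
    # Strip the sub-path; "/blog" and "/blog/" both map to the new root.
    rest = path[len(old_prefix):] or "/"
    return "http://" + new_host + rest
```

The case to check by hand is the one that bit us: the sub-path with and without its trailing slash should land on the same new URL in a single hop.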

At this point in my narrative, it’s getting into December, and we have to shut off FTP around March. We’re still looking for a way of getting people on to custom domains in an automated, will-probably-pretty-much-just-work way, and neither missing files host nor mod_rewrite fits the bill.

An epiphany hit when we realized that we didn’t need the globally-unreliable mod_rewrite to generate a redirect; we could use the pages of the blog itself. We knew that we could reliably update those (for some definition of “reliably,” as FTP users know). They were, after all, exactly 100% of the files that we wanted to redirect to the new, Blogger-hosted custom domain. Though this technique would mean that we’d be limited to redirects that work embedded in static files, at least we’d know that those static files could be updated and served correctly.

We need three types of redirects:
  • Redirects for humans
  • Redirects for search engines
  • Redirects for feed readers
For humans, we’ll use the very traditional <meta> refresh tag, with a little bit of “this page has moved, click here to see it” HTML tossed in at the top of the page. People who follow old links to your blog will then get automatically redirected to the new URL after a short delay. This is certainly less seamless than the invisible HTTP redirect from the previous two solutions, but it’s also something that we can publish to any FTP server and have work in any browser.
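
As a rough illustration of the human-facing redirect, here is a small Python sketch that builds the two pieces of markup for one post. The function name, the five-second delay, and the exact wording of the notice are assumptions; the final published markup may well differ.

```python
from html import escape


def moved_notice(new_url, delay_seconds=5):
    """Build a meta refresh tag and a visible notice for a moved page."""
    # Escape the URL so it is safe inside an HTML attribute.
    url = escape(new_url, quote=True)
    meta = '<meta http-equiv="refresh" content="{0}; url={1}">'.format(
        delay_seconds, url
    )
    banner = (
        '<p>This page has moved. '
        '<a href="{0}">Click here to see it</a>.</p>'.format(url)
    )
    return meta, banner


meta, banner = moved_notice("http://blog.yourdomain.com/2009/10/post.html")
```

The meta tag belongs in the page's head and the notice at the top of the body, so that a visitor whose browser ignores the refresh still has a link to follow.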

For search engines, we can let them know that the page has moved. The trick is a <link> with rel="canonical" that will get invisibly added to the old posts. When a search engine sees this, it will update its index to point search traffic directly to the new URLs, so those visitors won’t even see the (somewhat ugly) redirect.
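
The search engine hint is a single tag in the head of each old post. A minimal Python sketch of adding it to an already-published page follows; the helper names are hypothetical, and a simple text search for the closing head tag stands in for whatever templating the real publisher uses.

```python
import re
from html import escape


def canonical_link(new_url):
    """Build the rel="canonical" tag pointing at the post's new URL."""
    return '<link rel="canonical" href="{0}">'.format(escape(new_url, quote=True))


def add_to_head(page_html, tag):
    """Insert tag just before the first closing head tag, if there is one."""
    match = re.search(r"</head\s*>", page_html, re.IGNORECASE)
    if match is None:
        # No head element found: leave the page untouched.
        return page_html
    index = match.start()
    return page_html[:index] + tag + "\n" + page_html[index:]


page = "<html><head><title>Old post</title></head><body>...</body></html>"
updated = add_to_head(page, canonical_link("http://blog.yourdomain.com/2009/10/post.html"))
```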

Feed readers are the last category, but sadly we have only the hackiest of solutions. As far as any of us know, there’s no Atom or RSS element to do the equivalent of a meta-refresh or rel="canonical". Therefore, we’ll have to settle for putting a “this feed has moved” post in the old feed, and rely on readers to update subscriptions themselves. This is certainly the least-satisfying aspect of this strategy, and we’re open to suggestions for fixes. (Note that if you’ve been using FeedBurner, you can just change what URL it burns and avoid this problem.)
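
For the curious, a “this feed has moved” post is just an ordinary entry. Here is a hedged Python sketch of one as an Atom entry, built with the standard library. The title and summary wording, the use of the new feed URL as the entry id, and the function name are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def moved_entry(new_feed_url, updated):
    """Build an Atom entry telling subscribers where the feed now lives.

    `updated` is an RFC 3339 timestamp string supplied by the caller.
    """
    # Serialize Atom elements without a namespace prefix.
    ET.register_namespace("", ATOM_NS)

    def tag(name):
        return "{%s}%s" % (ATOM_NS, name)

    entry = ET.Element(tag("entry"))
    ET.SubElement(entry, tag("title")).text = "This feed has moved"
    # Using the new feed URL as the id is an arbitrary choice for this sketch.
    ET.SubElement(entry, tag("id")).text = new_feed_url
    ET.SubElement(entry, tag("updated")).text = updated
    ET.SubElement(entry, tag("link"), rel="alternate", href=new_feed_url)
    ET.SubElement(entry, tag("summary")).text = (
        "Please update your subscription to " + new_feed_url
    )
    return ET.tostring(entry, encoding="unicode")


entry_xml = moved_entry(
    "http://blog.yourdomain.com/feeds/posts/default", "2010-03-01T00:00:00Z"
)
```

Nothing in this entry makes a feed reader resubscribe on its own; it only gives the person reading it a link to follow.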

So, once the migration tool is available, you’ll choose a new subdomain for us to host your blog on, and then we’ll do a final publish to your FTP server to put the redirect tags and the like in place. Of course, after the end of March, we won’t be able to update the FTP server for you. We’re looking at a solution for this; it’ll probably involve us generating a ZIP file of the redirect-ified posts that you can upload yourself.
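
Since the ZIP approach isn't final, treat the following only as a sketch of the idea: bundle the redirect-ified pages, keyed by their paths on the old site, into one archive that can be unpacked and uploaded by hand. The function name and the shape of the `pages` dictionary are assumptions.

```python
import io
import zipfile


def build_redirect_zip(pages):
    """Pack redirect pages into a ZIP archive and return its bytes.

    `pages` maps a path relative to the blog root (for example
    "2009/10/post.html") to that page's HTML text.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, html_text in sorted(pages.items()):
            # Keep the original relative paths so old URLs still resolve.
            archive.writestr(path, html_text.encode("utf-8"))
    return buffer.getvalue()


zip_bytes = build_redirect_zip({
    "index.html": "<html><head></head><body>moved</body></html>",
    "2009/10/post.html": "<html><head></head><body>moved</body></html>",
})
```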

And that’s how we plan to get FTP blogs moved over to custom domains. I hope that this post has given you an understanding of why we’re going with this method over some of the more obvious alternatives. We’ve set up an official blog that will have any important notices and announcements. Check it out for the latest.

I’d be happy to talk about the contents of this post in the comments below, but I won’t be able to offer specific support for FTP issues. We’re working on setting up a forum to handle any questions you might have about your own situation.