Seagate’s Hard Drive Debacle

Background

In mid-January, the IT media gave some coverage to widespread problems with Seagate’s SATA hard drives, including the Barracuda 7200.11 series. These drives are popular because they offer good performance, a low price and a 5-year warranty.

There had been a low-level undercurrent of dissatisfaction with these drives over the past few months, which finally came to a head in January 2009. People had been reporting that their drives failed for no apparent reason, and with no warning, after only a few months in use; the failed drives were no longer visible to the BIOS at boot.

In January, Seagate demonstrated how to turn every hard drive manufacturer’s nightmare into a very public PR disaster. To their credit, they did eventually resolve the problem, but the way they did it was instructive and, depending on your involvement with the hardware, either amusing or extremely frustrating.


Events

In mid-January, under increasing customer pressure, Seagate announced that they were aware of problems with the firmware on some of their SATA hard drives. In normal use, these drives write errors to a log on the drive, and when this log reaches a certain size, the drive fails. At the time, Seagate had no fix, but they warned that affected drives could stop working at any time. Data on failed drives would remain intact, but recovering it would require professional data recovery services. Failure was apparently random; the best way to avoid disaster was to leave systems running and make sure backups were current.

There was something of an uproar on the Seagate forums, although, judging from the content of the messages, many of the most vocal contributors were probably not professional or competent administrators (many had no backups).

Seagate used a knowledge base (KB) article to communicate with those affected by this issue. Initially, the article described the problem and announced that a fix would be available shortly. To obtain the fix, customers had to email Seagate with the drive’s model number, part number, serial number and current firmware version; Seagate would then send the necessary information by email once it was available.

Following the link provided meant setting up a personal account with Seagate and then opening a support case against that account. An alternative link, published elsewhere, let customers bypass that process and write to a support email address directly. Neither route produced any response for over a week; in fact, customer accounts became inaccessible soon after they were opened, leaving no way to update a case or check its status.

Over the next couple of days, a serial number checker appeared on the KB article. It was immediately discredited when customers who entered the serial numbers of known defective drives (i.e., drives in their possession that had actually failed) were told that their drives were not affected. It seemed fairly clear that Seagate had no idea how widespread the problem was, or which models were affected.

The serial number checker disappeared after a day or so.

Shortly afterwards, a link to firmware SD1A was published. Early reports on the forums said that it didn’t work: it failed to locate the target drives and crashed instead of exiting gracefully. Not confidence-inspiring. A note then appeared on the KB article saying that the firmware had been withdrawn for validation, which took a couple of days.

The next firmware release was also identified as SD1A (seeing a pattern here?), and reports suggested that it installed successfully in most cases. However, this firmware had a significant problem: it worked fine on 1TB drives, but people who applied it to 500GB drives found that, although it installed successfully, the drive would not boot when they restarted the system. Even worse, some people with multiple drives (in one case, 4 drives) found that there was no way to control which drives the firmware was applied to: every connected drive was updated, and all of them were subsequently unable to boot.

Seagate had managed to extend the problem to customers whose drives had been working perfectly well; they were now unable to access any of their data.

After another 2 days or so, a third firmware release (SD1A again!) appeared. This did seem to resolve the original problem and, at the same time, restore the 500GB drives that had been rendered unusable. Even so, it would be prudent to check the forums for reports of issues before downloading and applying this fix.

Looking Back

What was particularly interesting about this incident was how many of the basic steps for placating customers Seagate failed to take. In this kind of situation, people need to be kept informed and reassured that their needs are going to be met. The background in Seagate’s case was particularly worrying: the problem had been ongoing for some months with no action on Seagate’s part, the company’s share price had plummeted in recent months, and the CEO and COO had recently left the company.

From the repeated attempts to produce working firmware, to the continual updates to the KB article and other communications without times, dates or any other indication of what had changed, to the firmware being silently revised several times under the same SD1A name, Seagate gave the strong impression that they were not handling this problem rigorously.

It will be interesting to see where these issues go from here. At one stage, Seagate reassured customers that they would meet the costs of data recovery, which can become very expensive very quickly. There must also be many customers who lost data and had to restore from backups before this problem was made public, and who probably, and rightly, feel that Seagate should contribute to their costs.

Meanwhile, Seagate claim that very few drives are ever likely to be affected by the original problem. Even so, upgrading to the latest firmware is probably the most prudent option at this time, because once a drive has failed, it has to be returned to Seagate for repair.

Links:

Media coverage at TechReport:

http://techreport.com/discussions.x/16246

Original Seagate KB article:

http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931

Seagate announcement in forums:

http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=4771

Latest firmware:

http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951

Unofficial explanation from a Seagate employee:

http://it.slashdot.org/comments.pl?sid=1098793&cid=26542735

Forum discussion:

http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=3668

Email address for support:

discsupport@seagate.com
