Smartmontools saves the day!


While booting up my x86 laptop this week, I noticed the following errors on the console:

Feb 26 18:16:54 zebox smartd[492]: Device: /dev/ad0, 1 Currently unreadable (pending) sectors
Feb 26 18:16:54 zebox smartd[492]: Device: /dev/ad0, 1 Offline uncorrectable sectors
Feb 26 18:46:55 zebox smartd[492]: Device: /dev/ad0, 1 Currently unreadable (pending) sectors
Feb 26 18:46:55 zebox smartd[492]: Device: /dev/ad0, 1 Offline uncorrectable sectors

Eeeeep – it looks like the disk drive is going bad. To verify this, I decided to run smartctl against the device:

$ smartctl -a /dev/ad0

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       92
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   062   030    Pre-fail  Always       -       6508058
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       329
 10 Spin_Retry_Count        0x0013   100   100   034    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       96
187 Unknown_Attribute       0x0032   094   094   000    Old_age   Always       -       6
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   065   055   045    Old_age   Always       -       622067747
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       56
193 Load_Cycle_Count        0x0032   090   090   000    Old_age   Always       -       21143
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)
195 Hardware_ECC_Recovered  0x001a   078   054   000    Old_age   Always       -       177096143
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
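
For anyone who would rather script this check than eyeball the table, here is a minimal Python sketch that parses the attribute rows and lists the sector-health attributes with a nonzero raw count. The names `parse_attributes`, `nonzero_watched` and `WATCHED_IDS` are made up for this example, and watching IDs 5, 197 and 198 is an assumption of the sketch, not something smartctl prescribes.

```python
import re

# Matches one row of the attribute table printed by smartctl.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+?)\s*$"
)

# Attributes to flag when their raw count is nonzero (an assumption of
# this sketch): reallocated, pending and offline-uncorrectable sectors.
WATCHED_IDS = {5, 197, 198}


def parse_attributes(text):
    """Return {attribute_id: row_dict} for every table row found in text."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines and anything else that is not a row
        row = m.groupdict()
        row["id"] = int(row["id"])
        # The raw column can carry a trailing note, e.g. "35 (Lifetime ...)",
        # so keep only the leading digits; None if there are none.
        digits = re.match(r"\d+", row["raw"])
        row["raw_int"] = int(digits.group()) if digits else None
        rows[row["id"]] = row
    return rows


def nonzero_watched(rows):
    """(name, raw count) for each watched attribute with a nonzero raw value."""
    return [(r["name"], r["raw_int"]) for i, r in sorted(rows.items())
            if i in WATCHED_IDS and r["raw_int"]]


# Three rows copied from the output above.
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
"""

print(nonzero_watched(parse_attributes(SAMPLE)))
```

On the sample rows this reports Current_Pending_Sector and Offline_Uncorrectable, the same two counters smartd complained about in the log.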

Hmmm – the raw value of Seek_Error_Rate looks extremely high, so I decided to run smartctl a second time to see if it was climbing:

$ smartctl -a /dev/ad0

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       92
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   062   030    Pre-fail  Always       -       6508123
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       329
 10 Spin_Retry_Count        0x0013   100   100   034    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       96
187 Unknown_Attribute       0x0032   094   094   000    Old_age   Always       -       6
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   065   055   045    Old_age   Always       -       622067747
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       56
193 Load_Cycle_Count        0x0032   090   090   000    Old_age   Always       -       21146
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)
195 Hardware_ECC_Recovered  0x001a   078   054   000    Old_age   Always       -       177096157
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

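Sure enough, the raw value had climbed from 6508058 to 6508123 between the two runs, and the drive was still reporting one pending and one offline uncorrectable sector! Since I had just purchased the drive from NewEgg, I gave them a call, and they are going to send me a replacement. Viva la smartmontools!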
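
Comparing two smartctl runs by eye is error-prone, so here is a rough Python sketch that diffs the raw values of two saved snapshots. The helper names `raw_values` and `raw_deltas` are hypothetical, and the sample rows are copied from the two runs above. It keys on the attribute ID rather than the name, because several rows share the name Unknown_Attribute.

```python
import re

# ID, name, flag, six whitespace-separated columns (VALUE .. WHEN_FAILED),
# then the leading digits of the raw value.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")


def raw_values(text):
    """Map attribute ID -> (name, raw value) for each table row in text."""
    out = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            out[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return out


def raw_deltas(before, after):
    """List (id, name, old, new, delta) for attributes whose raw value changed."""
    old, new = raw_values(before), raw_values(after)
    changes = []
    for attr_id in sorted(old.keys() & new.keys()):
        name, old_raw = old[attr_id]
        new_raw = new[attr_id][1]
        if new_raw != old_raw:
            changes.append((attr_id, name, old_raw, new_raw, new_raw - old_raw))
    return changes


FIRST_RUN = """\
7 Seek_Error_Rate 0x000f 068 062 030 Pre-fail Always - 6508058
193 Load_Cycle_Count 0x0032 090 090 000 Old_age Always - 21143
195 Hardware_ECC_Recovered 0x001a 078 054 000 Old_age Always - 177096143
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
"""

SECOND_RUN = """\
7 Seek_Error_Rate 0x000f 068 062 030 Pre-fail Always - 6508123
193 Load_Cycle_Count 0x0032 090 090 000 Old_age Always - 21146
195 Hardware_ECC_Recovered 0x001a 078 054 000 Old_age Always - 177096157
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
"""

for attr_id, name, old_raw, new_raw, delta in raw_deltas(FIRST_RUN, SECOND_RUN):
    print(f"{attr_id:>3} {name:<24} {old_raw} -> {new_raw} ({delta:+d})")
```

With these rows it reports Seek_Error_Rate (+65), Load_Cycle_Count (+3) and Hardware_ECC_Recovered (+14), and skips Current_Pending_Sector because it did not change. Raw values are vendor-specific, so a delta is a hint to look closer rather than a verdict on the drive.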

This article was posted by Matty on 2006-03-04 10:49:00 -0400