Out SMART Your Hard Drive


If you have used PCs over the course of your career, you are probably well aware of the dreaded “click of death” that occurs when a disk drive fails. These failures can be devastating, and can cost companies and individuals thousands of dollars and a good deal of stress. I was curious to see whether any solutions were available to address this problem, and came across Self-Monitoring, Analysis, and Reporting Technology (also referred to as SMART or S.M.A.R.T.) while researching it.

SMART is a method through which devices monitor, store and analyze information on their operational state. This state information is exported through a set of attributes (e.g., temperature, number of reallocated sectors, seek errors), which software solutions can use to measure the health of a device, predict when a device may fail, and provide notifications when attributes are approaching unsafe values.

One software solution that allows you to monitor and manage SMART devices is smartmontools. smartmontools supports a wide variety of hardware and operating systems (e.g., FreeBSD, Linux, OpenBSD, OS X, OS/2, Solaris, Windows), ships with a complete set of documentation, and includes a command line utility (smartctl(1m)) and a UNIX daemon (smartd(1m)) that can be used to view SMART attributes, run device self-tests, and notify support personnel when problems are detected.

Using smartmontools

To get started with smartmontools, the source code can be downloaded from SourceForge, and the typical “configure,” “make,” and “make install” process can be used to compile and install the software in the default location:

$ wget http://unc.dl.sourceforge.net/sourceforge/smartmontools/smartmontools-5.36.tar.gz

$ gtar xfvz smartmontools-5.36.tar.gz

$ cd smartmontools-5.36

$ ./configure

$ make

$ sudo make install

If you are using OpenBSD or FreeBSD, you can install smartmontools from the ports collection by executing “make install” in the smartmontools ports directory:

$ cd /usr/ports/sysutils/smartmontools

$ make install

Once the binaries are compiled and installed, the smartctl(1m) utility can be invoked with the “-h” (help) option to print the available options:

$ smartctl -h

smartctl version 5.36 [sparc-sun-solaris2.10] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Usage: smartctl [options] device

============================================ SHOW INFORMATION OPTIONS =====

  -h, --help, --usage
         Display this help and exit

  -V, --version, --copyright, --license
         Print license, copyright, and version information and exit

  -i, --info                                                       
         Show identity information for device

  -a, --all                                                        
         Show all SMART information for device

================================== SMARTCTL RUN-TIME BEHAVIOR OPTIONS =====

  -q TYPE, --quietmode=TYPE                                           (ATA)
         Set smartctl quiet mode to one of: errorsonly, silent

  -d TYPE, --device=TYPE
         Specify device type to one of: ata, scsi, marvell, 3ware,N

  -T TYPE, --tolerance=TYPE                                           (ATA)
         Tolerance: normal, conservative, permissive, verypermissive

  -b TYPE, --badsum=TYPE                                              (ATA)
         Set action on bad checksum to one of: warn, exit, ignore

  -r TYPE, --report=TYPE
         Report transactions (see man page)

============================== DEVICE FEATURE ENABLE/DISABLE COMMANDS =====

  -s VALUE, --smart=VALUE
        Enable/disable SMART on device (on/off)

  -o VALUE, --offlineauto=VALUE                                       (ATA)
        Enable/disable automatic offline testing on device (on/off)

  -S VALUE, --saveauto=VALUE                                          (ATA)
        Enable/disable Attribute autosave on device (on/off)

======================================= READ AND DISPLAY DATA OPTIONS =====

  -H, --health
        Show device SMART health status

  -c, --capabilities                                                  (ATA)
        Show device SMART capabilities

  -A, --attributes                                                         
        Show device SMART vendor-specific Attributes and values

  -l TYPE, --log=TYPE
        Show device log. TYPE: error, selftest, selective, directory

  -v N,OPTION , --vendorattribute=N,OPTION                            (ATA)
        Set display OPTION for vendor Attribute N (see man page)

  -F TYPE, --firmwarebug=TYPE                                         (ATA)
        Use firmware bug workaround: none, samsung, samsung2

  -P TYPE, --presets=TYPE                                             (ATA)
        Drive-specific presets: use, ignore, show, showall

============================================ DEVICE SELF-TEST OPTIONS =====

  -t TEST, --test=TEST
        Run test.  TEST is: offline short long conveyance select,M-N pending,N afterselect,on afterselect,off

  -C, --captive
        Do test in captive mode (along with -t)

  -X, --abort
        Abort any non-captive test on device

=================================================== SMARTCTL EXAMPLES =====

  smartctl -a /dev/rdsk/c0t0d0s0             (Prints all SMART information)

  smartctl --smart=on --offlineauto=on --saveauto=on /dev/rdsk/c0t0d0s0
                                              (Enables SMART on first disk)

  smartctl -t long /dev/rdsk/c0t0d0s0 (Executes extended disk self-test)

  smartctl --attributes --log=selftest --quietmode=errorsonly /dev/rdsk/c0t0d0s0
                                      (Prints Self-Test & Attribute errors)

Once you have reviewed the available options, smartctl(1m) can be invoked with the “-i” (show device identity) option to see if a device supports SMART:

$ smartctl -i /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.9] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3120023A
Serial Number:    3KA192MF
Firmware Version: 3.33
User Capacity:    120,034,123,776 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Fri May 27 10:34:53 2005 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

If SMART is supported by the device, it will be indicated as shown above. If smartctl(1m) indicates that SMART is not enabled, the smartctl(1m) “-s on” (enable/disable SMART on device) option can be used to enable SMART on the device passed as an argument:

$ smartctl -s on /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.9] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
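
The check-then-enable sequence above can be scripted. The following is a minimal sketch, assuming the “SMART support is:” lines look like the ATA output shown earlier; the function names and the SMARTCTL override variable are my own inventions, not part of smartmontools:

```shell
# Succeeds (exit status 0) when `smartctl -i` output read from stdin
# reports "SMART support is: Enabled". Function names are hypothetical.
smart_is_enabled() {
    grep '^SMART support is: *Enabled' >/dev/null
}

# Enable SMART on a device only if it is not already enabled.
# SMARTCTL may point at an alternate smartctl binary; this override is
# an assumption added here so the function is easy to try out.
ensure_smart_enabled() {
    dev=$1
    if "${SMARTCTL:-smartctl}" -i "$dev" | smart_is_enabled; then
        echo "SMART already enabled on $dev"
    else
        "${SMARTCTL:-smartctl}" -s on "$dev"
    fi
}

# Example (not run here): ensure_smart_enabled /dev/rdsk/c0t0d0s0
```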

SMART-compliant devices support a set of capabilities. These capabilities indicate which SMART features are supported by the device, and include items such as offline surface scanning support, error logging support, and the ability to perform offline self-tests. To see which capabilities are supported on a device, smartctl(1m) can be executed with the “-c” (show capabilities) option:

$ smartctl -c /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.10] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                 ( 426) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  84) minutes.
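
The recommended polling times at the end of this output are handy when scripting self-tests, since they tell you roughly how long to wait before reading the results. Here is a small sketch that pulls them out, assuming the two-line “routine” / “recommended polling time” layout shown above; the function name and its argument convention are hypothetical:

```shell
# Print the recommended polling time, in minutes, for a "Short" or
# "Extended" self-test, read from `smartctl -c` output on stdin.
# The function name and argument convention are my own.
polling_minutes() {
    awk -v kind="$1" '
        # The label line, e.g. "Extended self-test routine"
        $1 == kind && $2 == "self-test" && $3 == "routine" { want = 1; next }
        # The value line that follows it; strip everything but the digits
        want && /^recommended polling time:/ {
            gsub(/[^0-9]/, "")
            print
            exit
        }
        { want = 0 }
    '
}

# Example (not run here):
#   smartctl -c /dev/rdsk/c0t0d0s0 | polling_minutes Extended
```

For the capabilities output above, this would report 84 for the extended test and 1 for the short test.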

Once SMART is enabled and the capabilities have been reviewed, the smartctl(1m) utility can be executed with the “-H” (health status) option to retrieve a device’s overall SMART health status:

$ smartctl -H /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.9] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
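
If you want to act on the health status from a script, the result line can be extracted with sed. This is a sketch that assumes the ATA-style result line shown above (other device types may word it differently); both function names are hypothetical:

```shell
# Print the overall health verdict (e.g. "PASSED") from `smartctl -H`
# output on stdin. Prints nothing if the result line is absent.
health_result() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Succeed only when the verdict is exactly PASSED.
health_passed() {
    [ "$(health_result)" = "PASSED" ]
}

# Example (not run here):
#   smartctl -H /dev/rdsk/c0t0d0s0 | health_passed || echo "check this disk"
```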

In addition to viewing overall drive health, smartctl(1m) allows you to view SMART attributes. These attributes contain information such as operating temperature, reallocated sectors, seek errors, and CRC errors. SMART attributes are invaluable for locating environmental problems and faulty devices. To view SMART attributes, smartctl(1m) can be invoked with the “-A” (show device SMART vendor-specific attributes and values) option, and the device to retrieve the attribute values from:

$ smartctl -A /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.10] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   056   051   006    Pre-fail  Always       -       207936224
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   062   030    Pre-fail  Always       -       59492653
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       19215
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       73
194 Temperature_Celsius     0x0022   043   054   000    Old_age   Always       -       43
195 Hardware_ECC_Recovered  0x001a   056   051   000    Old_age   Always       -       207936224
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
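
The comparison that matters in this table is between the normalized VALUE column and the THRESH column. As a rough sketch, the following awk filter prints how far each attribute sits above its threshold, assuming the ten-column layout shown above and skipping attributes whose threshold is 000; the function name and the output format are my own:

```shell
# Read `smartctl -A` output on stdin and print, for every attribute with
# a nonzero threshold: ID, name, headroom (VALUE minus THRESH), and a
# status word. The function name and output format are hypothetical.
attr_headroom() {
    awk '
        # Attribute rows start with a numeric ID and have ten columns.
        $1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 {
            headroom = $4 - $6
            print $1, $2, headroom, (headroom <= 0 ? "FAILING" : "ok")
        }
    '
}

# Example (not run here): smartctl -A /dev/rdsk/c0t0d0s0 | attr_headroom
```

For the table above, the smallest headroom would be reported for Spin_Retry_Count (a VALUE of 100 against a THRESH of 097, so 3).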

If an attribute value begins to approach the value defined in the THRESH column, this may be an indication that the device is reaching the end of its useful life. If you suspect that a device may be close to failure (e.g., you hear clicking), but the attributes are still above the values defined in the THRESH column, a SMART self-test can be performed on the drive. This will cause the device to update the drive’s SMART attributes and record any errors it finds in the device’s logs. To run a self-test, smartctl(1m) can be invoked with the “-t” (test) option, a test to run, and a device to test:

$ smartctl -t offline /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.9] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART off-line routine immediately in off-line mode".
Drive command "Execute SMART off-line routine immediately in off-line mode" successful.
Testing has begun.
Please wait 426 seconds for test to complete.
Test will complete after Wed Sep 28 20:07:26 2005

Use smartctl -X to abort test.

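Errors found by an immediate offline test are reported through the SMART error log (“-l error”), while short and extended self-tests record their results in the self-test log. To view a log, smartctl(1m) can be invoked with the “-l” (show device log) option and the log type (e.g., SMART error log, SMART selective self-test log, SMART self-test log, or the log directory) to view: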

$ smartctl -l selftest /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.9] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       929         -
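
When a log holds many entries, it is easier to let a script pick out the ones that need attention. The sketch below prints every self-test log entry whose status is anything other than “Completed without error” (so aborted or still-running tests are listed too), assuming entries begin with “#” and a number as shown above; the function name is hypothetical:

```shell
# Read `smartctl -l selftest` output on stdin, print each log entry that
# did not complete without error, and exit nonzero if any were found.
# The function name is my own.
selftest_problems() {
    awk '
        /^# *[0-9]+ / && !/Completed without error/ { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Example (not run here):
#   smartctl -l selftest /dev/rdsk/c0t0d0s0 | selftest_problems \
#       || echo "self-test log needs a look"
```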

This log entry indicates that the extended self-test completed without error. To get additional detail on the work that occurs behind the scenes with smartctl(1m), the “-r ioctl” option can be added to the smartctl(1m) command line:

$ smartctl -r ioctl -i /dev/rdsk/c0t0d0s0

smartctl version 5.36 [sparc-sun-solaris2.9] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE
REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE returned 0
=== START OF INFORMATION SECTION ===
Device Model:     ST320414A
Serial Number:    3EC1CNGY
Firmware Version: 3.28
User Capacity:    20,404,101,120 bytes

Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Mar  3 14:07:58 2005 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES
REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS
REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS returned 0

The smartctl(1m) examples up to this point have provided invaluable status information, but they all required input from the keyboard. To set up automated alerts when devices fail or a SMART attribute changes, a cron job can be developed to check smartctl(1m) self-test logs, or the smartd(1m) daemon can be configured to analyze devices and report anomalies when they are detected.
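
As a starting point for the cron approach, here is a minimal sketch built on smartctl’s exit status, which the smartctl manual page describes as a bit mask. The bit meanings below are paraphrased from that manual page as I understand it, so verify them against the version you have installed; the function names and the SMARTCTL override variable are hypothetical:

```shell
# Translate a smartctl exit status (a bit mask) into one message per
# set bit. Bit meanings are paraphrased from the smartctl manual page;
# check them against your installed version.
decode_smartctl_status() {
    status=$1
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or a checksum error was found" \
        "SMART status check returned DISK FAILING" \
        "pre-fail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains records of errors" \
        "device self-test log contains records of errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "$msg"
        fi
        bit=$((bit + 1))
    done
}

# Run a silent, full check of one device and report any problems.
# SMARTCTL may point at an alternate binary (an assumption made here).
check_disk() {
    dev=$1
    status=0
    "${SMARTCTL:-smartctl}" -q silent -a "$dev" || status=$?
    problems=$(decode_smartctl_status "$status")
    if [ -n "$problems" ]; then
        printf '%s:\n%s\n' "$dev" "$problems"
        return 1
    fi
    return 0
}

# Example (not run here): check_disk /dev/rdsk/c0t0d0s0
```

Saved as a script and run from cron, this stays quiet for healthy disks, and cron will typically mail any output to the owner of the job.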

Conclusion

This article only scratches the surface of what smartmontools can do, and I refer you to the manual pages and documentation for further details. I use smartmontools on my servers and laptop to notify me when disk drives are about to fail. This ensures that I have time to back up my data before a disk drive fails, and saves me from having to purchase large quantities of Advil to deal with unexpected drive failures! As with all software, you should read the FAQ and documentation prior to using the software, and perform all testing on a test system. If you have questions or comments on the article, please feel free to e-mail the author.

Acknowledgements

Ryan would like to thank the smartmontools developers for their awesome work!