[OmniOS-discuss] disk failure causing reboot?
jstockett at molalla.com
Mon May 18 20:33:33 UTC 2015
The pool is set to failmode=wait.
Looking at the fmdump -e and fmdump -eV output, it looks like the drive started having media/disk/transport errors around 3:40am, which eventually culminated in the reboot around 6:18am. The funny thing is that driver-assessment = fatal was returned 42 times on the same device in that period, so I'm not quite sure why it didn't just drop the drive, because the documentation says:
Note: An ereport with the value driver-assessment = fatal results in the fault being propagated.
It appears it didn't drop the drive until after it rebooted. I can upload the crash dump and/or fmdump output if anyone is interested.
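For anyone wanting to reproduce a tally like the "42 times on the same device" figure above, here is a minimal Python sketch that counts driver-assessment values per device in saved fmdump -eV text. The SAMPLE excerpt is hypothetical and heavily simplified (the device paths are made up), and the function name count_assessments is my own. It also assumes that device-path appears before driver-assessment within each ereport.

```python
import re
from collections import Counter

# Hypothetical, simplified excerpt in the general shape of `fmdump -eV`
# output. The device paths are invented for illustration only.
SAMPLE = """\
nvlist version: 0
        detector = (embedded nvlist)
                device-path = /pci@0,0/example/disk@a
        (end detector)
        driver-assessment = fatal
nvlist version: 0
        detector = (embedded nvlist)
                device-path = /pci@0,0/example/disk@b
        (end detector)
        driver-assessment = retry
nvlist version: 0
        detector = (embedded nvlist)
                device-path = /pci@0,0/example/disk@a
        (end detector)
        driver-assessment = fatal
"""

FIELD = re.compile(r"\s*(device-path|driver-assessment)\s*=\s*(.+?)\s*$")


def count_assessments(text):
    """Count (device-path, driver-assessment) pairs, one per ereport."""
    counts = Counter()
    device = None
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "device-path":
            # Remember the device until its assessment line shows up.
            device = value
        else:
            counts[(device, value)] += 1
            # Reset so a later ereport without a device-path is not
            # attributed to this device.
            device = None
    return counts


if __name__ == "__main__":
    for (device, assessment), n in sorted(count_assessments(SAMPLE).items()):
        print(device, assessment, n)
```

Against real data, the text would come from a file holding the fmdump -eV output rather than from SAMPLE.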
From: Paul Henson [mailto:paul.b.henson at gmail.com] On Behalf Of Paul B. Henson
Sent: Monday, May 18, 2015 1:09 PM
To: Jeff Stockett
Cc: omnios-discuss at lists.omniti.com
Subject: Re: [OmniOS-discuss] disk failure causing reboot?
On Mon, May 18, 2015 at 06:25:34PM +0000, Jeff Stockett wrote:
> A drive failed in one of our supermicro 5048R-E1CR36L servers running
> omnios r151012 last night, and somewhat unexpectedly, the whole system
> seems to have panicked.
You don't happen to have failmode set to panic on the pool?
From the zpool manpage:
failmode=wait | continue | panic
Controls the system behavior in the event of catastrophic pool
failure. This condition is typically a result of a loss of
connectivity to the underlying storage device(s) or a failure of
all devices within the pool. The behavior of such an event is
determined as follows:
wait      Blocks all I/O access until the device connectivity is
          recovered and the errors are cleared. This is the
          default behavior.
continue  Returns EIO to any new write I/O requests but allows
          reads to any of the remaining healthy devices. Any
          write requests that have yet to be committed to disk
          would be blocked.
panic     Prints out a message to the console and generates a
          system crash dump.
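
As a rough illustration of checking which of these modes a pool is using, here is a small Python sketch. It assumes a zpool that supports scripted output (zpool get -H, tab-separated name, property, value, source), which may not hold on every release. The pool name "tank" and the helper names parse_failmode and query_failmode are hypothetical. The behavior summaries are paraphrased from the manpage text above.

```python
import subprocess

# Summaries paraphrased from the zpool manpage excerpt quoted above.
FAILMODE_BEHAVIOR = {
    "wait": "block all I/O until connectivity is recovered and errors are cleared",
    "continue": "return EIO to new writes, allow reads from remaining healthy devices",
    "panic": "print a console message and generate a system crash dump",
}


def parse_failmode(line):
    """Parse one tab-separated line: name, property, value, source."""
    name, prop, value, source = line.strip().split("\t")
    if prop != "failmode":
        raise ValueError("unexpected property: " + prop)
    if value not in FAILMODE_BEHAVIOR:
        raise ValueError("unexpected failmode value: " + value)
    return name, value, source


def query_failmode(pool):
    """Run zpool for a pool; assumes `zpool get -H` is available."""
    result = subprocess.run(
        ["zpool", "get", "-H", "failmode", pool],
        check=True, capture_output=True, text=True,
    )
    return parse_failmode(result.stdout)


if __name__ == "__main__":
    # "tank" is a placeholder pool name.
    name, value, source = query_failmode("tank")
    print(name, value, "->", FAILMODE_BEHAVIOR[value])
```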