[OmniOS-discuss] iSCSI traffic suddenly comes to a halt and then resumes

Narayan Desai narayan.desai at gmail.com
Tue May 5 17:24:05 UTC 2015


If the theory is that you have a small number of drives causing trouble,
then smaller raid sets would probably help, depending on the number of
marginal devices you have.
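
As a rough illustration of why set size matters (a simplified model, not a performance prediction): assume the marginal drives are scattered at random over 50 drives, and count what fraction of raid sets contain at least one of them. The function name and the numbers are made up for illustration.

```python
from math import comb


def fraction_of_sets_affected(total_drives, set_size, marginal):
    """Probability that a raid set of `set_size` drives, drawn at random from
    `total_drives`, contains at least one of the `marginal` slow drives.

    Assumption: marginal drives are placed uniformly at random. This only
    counts exposure; it does not model throughput or ZFS scheduling.
    """
    # Ways to pick a set that avoids every marginal drive.
    clean = comb(total_drives - marginal, set_size)
    return 1.0 - clean / comb(total_drives, set_size)


if __name__ == "__main__":
    for marginal in (1, 2, 3):
        one_big = fraction_of_sets_affected(50, 50, marginal)
        five_small = fraction_of_sets_affected(50, 10, marginal)
        print(marginal, round(one_big, 3), round(five_small, 3))
```

With a single marginal drive, one 50-drive set is always exposed to it, while only one in five 10-drive sets is. The benefit shrinks as the number of marginal drives grows.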

I bet that you see a few drives pegged when you start looking at device
level service times.
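
A minimal sketch of that check, assuming the illumos `iostat -xn` column layout (a header row that includes `asvc_t` and ends in `device`). The function names, the 5x threshold, and the comparison against the median are illustrative choices, not something from this thread.

```python
from statistics import median


def parse_iostat_xn(text):
    """Return {device: [asvc_t samples]} from `iostat -xn`-style text."""
    samples = {}
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header repeats before every report; remember its layout.
        if "asvc_t" in fields and fields[-1] == "device":
            columns = fields
            continue
        # Skip banner lines and anything that does not match the header width.
        if columns is None or len(fields) != len(columns):
            continue
        row = dict(zip(columns, fields))
        try:
            svc = float(row["asvc_t"])
        except ValueError:
            continue
        samples.setdefault(row["device"], []).append(svc)
    return samples


def slow_devices(samples, factor=5.0):
    """Devices whose mean asvc_t exceeds `factor` times the median device mean."""
    means = {dev: sum(v) / len(v) for dev, v in samples.items() if v}
    if not means:
        return []
    typical = median(means.values())
    return sorted(d for d, m in means.items() if typical > 0 and m > factor * typical)
```

Feed it the text of something like `iostat -xn 10 6`, leaving out the first report, since that one is the average since boot.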
 -nld

On Tue, May 5, 2015 at 11:48 AM, Matej Zerovnik <matej at zunaj.si> wrote:

> I will replace the hardware in about 4 months with all SAS drives, but I
> would love to have a working setup for the time being as well ;)
>
> I looked at the SMART stats and there don't seem to be any errors. Also, no
> hard/soft/transport errors are reported by any drive. I will take a look at
> service times tomorrow, and maybe put the drives into graphite and look at
> them over a longer period.
>
> I looked at iostat -x output today, and the stats for the pool itself showed
> 100% busy most of the time, 98-100% wait, 500-1300 transactions in the queue,
> and around 500 active. The first line, which is the average since boot, shows
> an average service time of around 1600 ms, which seems like a lot. Could that
> be due to the really big queue?
>
> Would it help to create five 10-drive raidz pools instead of one with 50
> drives?
>
> Matej
> ------------------------------
> From: Narayan Desai <narayan.desai at gmail.com>
> Sent: 5.5.2015 16:32
> To: Michael Rasmussen <mir at miras.org>
> Cc: Matej Zerovnik <matej at zunaj.si>; omnios-discuss
> <omnios-discuss at lists.omniti.com>
> Subject: Re: [OmniOS-discuss] iSCSI traffic suddenly comes to a halt and
> then resumes
>
> And, if you don't have the luxury of discarding hardware and replacing it
> with a supported configuration, you might look at finding marginal drives,
> either via error counters displayed in iostat -En, or drives with really
> high service times (in iostat -xnz output). We found (on a similar setup)
> that being really aggressive about drive replacement helped a lot.
>
> If you have desktop SATA drives, then the drive firmware is part of the
> problem. Desktop drives retry for quite a long time when they encounter
> errors, which produces really inconsistent performance profiles. When you
> aggregate into a raid set (including in ZFS), tail latencies really start to
> matter for performance, and the pool just starts going out to lunch for a
> long time. If you can figure out and replace the drive that is causing the
> problem (even if it isn't causing any hard errors), the pool performance
> goes back to normal.
>  -nld
>
> On Tue, May 5, 2015 at 4:21 AM, <mir at miras.org> wrote:
>
>> On 2015-05-05 09:46, Matej Zerovnik wrote:
>>
>>>
>>> We still kept our SATA hard drives in Supermicro JBOD with SAS
>>> expander and SATA drives.
>>>
>> Your problem boils down to using SATA disks in a SAS expander. Search the
>> omnios user list and you will find numerous reports that using SATA disks in
>> a SAS expander causes weird behavior and instability.
>>
>> The fact is that SATA disks are unsupported in a SAS expander due to
>> incompatibility between the SAS and SATA command sets. As an example, SATA
>> NCQ is not passed through the SAS expander, which could be the cause
>> of the strange iSCSI disconnects experienced on the client side.