Discussion:
[Samba] Many "smbd" process with "D" uninterruptible sleep status
Simon Wang
2013-10-17 07:06:22 UTC
I have a Samba server (Linux) that I frequently use to store data from my
Windows PC. Last night I found that Samba had stopped working. After logging
in to the Samba server, I found a total of 91 smbd processes in state "D".
# ps
169 root 0 SW [pdflush]
170 root 0 DW [pdflush]
....
29534 simon 20608 D /sbin/smbd -D
29548 root 2792 D /bin/sync
30160 simon 20608 D /sbin/smbd -D
30474 simon 20608 D /sbin/smbd -D
30496 simon 18676 D /sbin/smbd -D
30673 simon 20504 D /sbin/smbd -D
30810 simon 20504 D /sbin/smbd -D
31302 simon 20608 D /sbin/smbd -D
31965 simon 20608 D /sbin/smbd -D
32288 simon 20608 D /sbin/smbd -D
...
Five minutes later, the output of "ps" was still the same.

The output of "top" is shown below. The load average is very high and the CPU
is spending almost all of its time waiting on I/O.
# top
Mem: 366504K used, 143900K free, 0K shrd, 14216K buff, 293224K cached
CPU: 1.7% usr 3.5% sys 0.0% nic 0.0% idle 92.9% io 1.7% irq 0.0% sirq
Load average: 103.10 103.03 102.65 1/149 5614
PID PPID USER STAT VSZ %MEM %CPU COMMAND
5532 3325 root R 3000 0.5 3.5 top
2421 1 simon D 20712 4.0 0.0 /sbin/smbd -D
594 1 simon D 20712 4.0 0.0 /sbin/smbd -D
4130 1 simon D 20712 4.0 0.0 /sbin/smbd -D
1261 1 simon D 20608 4.0 0.0 /sbin/smbd -D
23452 1 simon D 20608 4.0 0.0 /sbin/smbd -D
30474 1 simon D 20608 4.0 0.0 /sbin/smbd -D
21641 1 simon D 20608 4.0 0.0 /sbin/smbd -D
9053 1 simon D 20608 4.0 0.0 /sbin/smbd -D
3068 1 simon D 20608 4.0 0.0 /sbin/smbd -D

After a reboot, Samba works fine again. So far I have not been able to
reproduce the problem, and I can't figure out what is going on.
Is this a bug in the kernel or in Samba?
Did pdflush hang first and then cause smbd and sync to hang as well?

Please help,
Thanks very much.


I set the following options in smb.conf:
max smbd process = 100
max connections = 100

Linux Samba Server:
Linux 2.6.31.8
Samba 3.5.6
Software RAID 1


Simon
Volker Lendecke
2013-10-17 08:02:49 UTC
Long-running D is almost always a kernel problem. D means
uninterruptible wait. smbd made some syscall that does not
return for a long time. With a working kernel no user space
process should be able to do that. Look at your syslog
(/var/log/messages or /var/log/syslog or such) for kernel
problems. Another possibility is extremely slow storage,
like a USB1 memory stick. In that case the processes should
eventually come back, but until then it can look the same.
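
As a rough sketch of the syslog check described above, the following Python snippet scans kernel log text for hung-task warnings. It assumes the kernel was built with the hung-task detector (CONFIG_DETECT_HUNG_TASK), which logs lines of the form "INFO: task NAME:PID blocked for more than N seconds."; without it, no such lines will appear and you would have to look for other kernel errors by hand. The function name find_hung_tasks and the sample log lines are made up for illustration, and the log file location on your system may differ.

```python
import re

# Message written by the kernel hung-task detector. This assumes the kernel
# was built with CONFIG_DETECT_HUNG_TASK; otherwise nothing will match.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (name, pid, seconds) for each hung-task warning."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append(
                (m.group("name"), int(m.group("pid")), int(m.group("secs")))
            )
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, NOT taken from the reporter's system.
    sample = (
        "Oct 17 02:11:05 host kernel: INFO: task smbd:29534 "
        "blocked for more than 120 seconds.\n"
        "Oct 17 02:11:06 host kernel: some unrelated message\n"
    )
    for name, pid, secs in find_hung_tasks(sample):
        print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

To use it on a real system, read /var/log/messages (or whichever file your syslog writes kernel messages to) and pass its contents to find_hung_tasks().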

Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de
Jeff Layton
2013-10-17 13:46:03 UTC
Long-running D is not necessarily a kernel "problem" per se, but you're
correct that it indicates the process is stuck in uninterruptible sleep in
the kernel.

Another thing you can do to troubleshoot this is to see what the
processes are doing with:

# cat /proc/<pid>/stack

That'll give you a kernel stack trace for the process in question,
which may give you some hint as to what they're hanging on.
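
With 91 stuck processes, doing that by hand gets tedious, so here is a minimal Python sketch that finds every process in state D and prints its kernel stack. It assumes it is run as root and that the kernel exposes /proc/<pid>/stack (this needs stack trace support built in). The function names parse_stat, d_state_processes and kernel_stack are just illustrative helpers, not part of any existing tool.

```python
import os


def parse_stat(stat_text):
    """Parse the start of /proc/<pid>/stat into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or stat was unreadable
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc_root="/proc"):
    """Return the kernel stack of pid, or a note if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"(could not read stack: {exc})\n"


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(f"== {pid} {comm} ==")
        print(kernel_stack(pid), end="")
```

If most of the stacks end in the same function (for example somewhere in the block layer, the md driver, or the filesystem), that is a good hint as to what everything is waiting on.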
--
Jeff Layton <jlayton at samba.org>
Stéphane PURNELLE
2013-10-17 08:31:35 UTC
On Monday I ran into the same problem, but in a different situation.

We have two physical servers (node1 and node2) running Lifekeeper
(software that provides HA, network RAID, and failover).

On Monday, node2 crashed. Lifekeeper performed an automatic failover and
all services started on node1, but node1 came under high load for an
unknown reason and Samba stopped responding correctly.
I saw the same thing as you: many smbd processes in state D.

Solution: a hardware reset of the server.

I'm worried because we plan to replace these servers with new servers
running Samba 4 with the same configuration.
I don't know whether this issue is caused by the number of clients trying
to reconnect or by something else.

best regards

Stéphane

-----------------------------------
Stéphane PURNELLE, Systems and Network Administrator
IT Department, Corman S.A. Tel: 00 32 (0)87/342467