FreeNAS 7 server went down and no longer boots

  • timberleek
  • Registered: July 2009
  • Last online: 16-06 21:50
Hello,

I have had a FreeNAS 7 server running here for quite a while, by now with 4 hard drives:
500 GB WDC as a buffer disk for torrents
750 GB Seagate Barracuda for music, PC backups and miscellaneous files
2 TB Samsung Spinpoint F4 for films
1 TB WDC enterprise for series

I know FreeNAS 7 is somewhat outdated, and I do want to move to something else (something quite different right away, maybe Ubuntu or similar), but I have no time for that right now.

This week I came home and found it sitting there waiting at the BIOS screen, with the HDD LED lit solid.
It will no longer boot; I cannot even get into the BIOS.

When I start the PC, the HDD LED comes on as soon as it gets past POST (the screen has only just come up). After that it no longer responds to anything, yet it has not frozen: if I hold down the Delete key, which normally takes you into the BIOS, it starts ticking (as you often hear when a key is held down), but it never enters the BIOS. It simply does nothing, and the HDD LED stays on.

If I disconnect the drives and then power it on, it boots normally (though of course it cannot find the drives).
I disconnected the drives one by one and found that it does boot when the 750 GB drive is disconnected. It then starts up fine again (minus that drive). I can open the other drives as usual, Samba works again, and so on.

FreeNAS itself is installed on a USB stick.

Does anyone have an idea what is going on here?
Why won't it boot with that 750 GB drive attached?
And is there any way to find out why it went down in the first place? (It has since been power-cycled once, which in hindsight may not have been smart.)

Thanks in advance,
timberleek

Edit:
Come to think of it, I recently received SMART error mails from the server. I just checked, and they concern the same drive (the Seagate).
One on 21 June and one on 22 June:

21 June: SMART error (OfflineUncorrectableSector) detected on host: freenas.local
22 June: SMART error (ErrorCount) detected on host: freenas.local

The full mails:
This email was generated by the smartd daemon running on:

host name: freenas.local
DNS domain: local
NIS domain:

The following warning/error was logged by the smartd daemon:

Device: /dev/ad6, 1 Offline uncorrectable sectors


For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.

smartctl 5.40 2010-10-16 r3189 [FreeBSD 7.3-RELEASE-p3 i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11 family
Device Model: ST3750330AS
Serial Number: 5QK05FSW
Firmware Version: SD35
User Capacity: 750,156,374,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Jun 21 21:13:56 2012 UTC

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://seagate.custkb.com...e/search.jsp?DocId=207931
http://seagate.custkb.com...e/search.jsp?DocId=207951
http://seagate.custkb.com...e/search.jsp?DocId=207957

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 634) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 167) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 099 006 Pre-fail Always - 179217153
3 Spin_Up_Time 0x0003 096 084 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 097 097 020 Old_age Always - 3635
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 148
7 Seek_Error_Rate 0x000f 066 060 030 Pre-fail Always - 68781239153
9 Power_On_Hours 0x0032 067 067 000 Old_age Always - 29416
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 58
12 Power_Cycle_Count 0x0032 100 037 020 Old_age Always - 236
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 074 000 Old_age Always - 171801510054
189 High_Fly_Writes 0x003a 016 016 000 Old_age Always - 84
190 Airflow_Temperature_Cel 0x0022 056 043 045 Old_age Always In_the_past 44 (1 174 44 26)
194 Temperature_Celsius 0x0022 044 058 000 Old_age Always - 44 (0 9 0 0)
195 Hardware_ECC_Recovered 0x001a 030 024 000 Old_age Always - 179217153
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 29399 -
# 2 Short offline Completed without error 00% 29375 -
# 3 Short offline Completed without error 00% 29351 -
# 4 Short offline Completed without error 00% 29336 -
# 5 Short offline Completed without error 00% 29288 -
# 6 Short offline Completed without error 00% 29264 -
# 7 Short offline Completed without error 00% 29240 -
# 8 Short offline Completed without error 00% 29216 -
# 9 Short offline Completed without error 00% 29192 -
#10 Short offline Completed without error 00% 29168 -
#11 Short offline Completed without error 00% 29120 -
#12 Short offline Completed without error 00% 29096 -
#13 Short offline Completed without error 00% 29072 -
#14 Short offline Completed without error 00% 29048 -
#15 Short offline Completed without error 00% 29024 -
#16 Short offline Completed without error 00% 29000 -
#17 Short offline Completed without error 00% 28952 -
#18 Short offline Completed without error 00% 28928 -
#19 Short offline Completed without error 00% 28904 -
#20 Short offline Completed without error 00% 28880 -
#21 Short offline Completed without error 00% 28856 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


22 June:
This email was generated by the smartd daemon running on:

host name: freenas.local
DNS domain: local
NIS domain:

The following warning/error was logged by the smartd daemon:

Device: /dev/ad6, ATA error count increased from 0 to 22

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.

smartctl 5.40 2010-10-16 r3189 [FreeBSD 7.3-RELEASE-p3 i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11 family
Device Model: ST3750330AS
Serial Number: 5QK05FSW
Firmware Version: SD35
User Capacity: 750,156,374,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Fri Jun 22 18:13:56 2012 UTC

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://seagate.custkb.com...e/search.jsp?DocId=207931
http://seagate.custkb.com...e/search.jsp?DocId=207951
http://seagate.custkb.com...e/search.jsp?DocId=207957

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 634) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 167) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 104 099 006 Pre-fail Always - 202131103
3 Spin_Up_Time 0x0003 096 084 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 097 097 020 Old_age Always - 3635
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 149
7 Seek_Error_Rate 0x000f 066 060 030 Pre-fail Always - 68781318537
9 Power_On_Hours 0x0032 067 067 000 Old_age Always - 29437
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 58
12 Power_Cycle_Count 0x0032 100 037 020 Old_age Always - 236
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 077 077 000 Old_age Always - 23
188 Command_Timeout 0x0032 100 074 000 Old_age Always - 171801510054
189 High_Fly_Writes 0x003a 016 016 000 Old_age Always - 84
190 Airflow_Temperature_Cel 0x0022 064 043 045 Old_age Always In_the_past 36 (1 174 44 26)
194 Temperature_Celsius 0x0022 036 058 000 Old_age Always - 36 (0 9 0 0)
195 Hardware_ECC_Recovered 0x001a 030 024 000 Old_age Always - 202131103
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 22 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 22 occurred at disk power-on lifetime: 29436 hours (1226 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 20 ff ff ff 4f 00 3d+21:27:54.638 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:51.639 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:48.732 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:45.784 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:42.877 READ DMA EXT

Error 21 occurred at disk power-on lifetime: 29436 hours (1226 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 20 ff ff ff 4f 00 3d+21:27:51.639 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:48.732 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:45.784 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:42.877 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:39.887 READ DMA EXT

Error 20 occurred at disk power-on lifetime: 29436 hours (1226 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 20 ff ff ff 4f 00 3d+21:27:48.732 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:45.784 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:42.877 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:39.887 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:36.996 READ DMA EXT

Error 19 occurred at disk power-on lifetime: 29436 hours (1226 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 20 ff ff ff 4f 00 3d+21:27:45.784 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:42.877 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:39.887 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:36.996 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:34.123 READ DMA EXT

Error 18 occurred at disk power-on lifetime: 29436 hours (1226 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 20 ff ff ff 4f 00 3d+21:27:42.877 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:39.887 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:36.996 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:34.123 READ DMA EXT
25 00 20 ff ff ff 4f 00 3d+21:27:31.133 READ DMA EXT
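
The register dump in the error entries above can be decoded by hand. Below is a minimal Python sketch, assuming the classic ATA task-file layout in which a 28-bit LBA is spread over SN, CL, CH and the low nibble of DH. The function names are made up for illustration, and the error-bit table is only a subset. Note that 0x0fffffff is the largest value a 28-bit LBA can hold, while the failing commands are 48-bit READ DMA EXT, so this value is probably a saturated placeholder rather than the true location of the bad sector.

```python
# Bit names as printed by smartctl for the ATA error and status registers.
# ERROR_BITS is a subset, enough for the entries in this log.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}


def lba28_from_registers(sn, cl, ch, dh):
    """Combine the task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_bits(value, names):
    """Return the names of all bits that are set in value."""
    return [name for bit, name in names.items() if value & bit]


# Registers from "Error 22": ER ST SC SN CL CH DH = 40 51 00 ff ff ff 0f
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in "40 51 00 ff ff ff 0f".split())

print(decode_bits(er, ERROR_BITS))
print(decode_bits(st, STATUS_BITS))
print(hex(lba28_from_registers(sn, cl, ch, dh)))
```

UNC stands for an uncorrectable data error: the drive could not read the requested sectors even after its own retries.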

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 29423 -
# 2 Short offline Completed without error 00% 29399 -
# 3 Short offline Completed without error 00% 29375 -
# 4 Short offline Completed without error 00% 29351 -
# 5 Short offline Completed without error 00% 29336 -
# 6 Short offline Completed without error 00% 29288 -
# 7 Short offline Completed without error 00% 29264 -
# 8 Short offline Completed without error 00% 29240 -
# 9 Short offline Completed without error 00% 29216 -
#10 Short offline Completed without error 00% 29192 -
#11 Short offline Completed without error 00% 29168 -
#12 Short offline Completed without error 00% 29120 -
#13 Short offline Completed without error 00% 29096 -
#14 Short offline Completed without error 00% 29072 -
#15 Short offline Completed without error 00% 29048 -
#16 Short offline Completed without error 00% 29024 -
#17 Short offline Completed without error 00% 29000 -
#18 Short offline Completed without error 00% 28952 -
#19 Short offline Completed without error 00% 28928 -
#20 Short offline Completed without error 00% 28904 -
#21 Short offline Completed without error 00% 28880 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
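
Putting the two reports side by side shows what changed in the 21 hours between them. A small Python sketch, with the raw values copied by hand from the two attribute tables above; the dictionary and function names are just for illustration:

```python
# Raw values copied from the 21 June and 22 June attribute tables.
june21 = {
    "Reallocated_Sector_Ct": 148,
    "Power_On_Hours": 29416,
    "Reported_Uncorrect": 0,
    "Current_Pending_Sector": 1,
    "Offline_Uncorrectable": 1,
    "UDMA_CRC_Error_Count": 0,
}
june22 = {
    "Reallocated_Sector_Ct": 149,
    "Power_On_Hours": 29437,
    "Reported_Uncorrect": 23,
    "Current_Pending_Sector": 0,
    "Offline_Uncorrectable": 0,
    "UDMA_CRC_Error_Count": 0,
}


def changed(before, after):
    """Return {name: (old, new)} for every attribute whose raw value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if before[name] != after[name]
    }


for name, (old, new) in changed(june21, june22).items():
    print(f"{name}: {old} -> {new} ({new - old:+d})")
```

The pending sector from 21 June is gone while the reallocated count went up by one, which would be consistent with the drive having remapped that sector. The 23 reported uncorrectable errors are new, and the CRC error count staying at 0 gives no sign of cable trouble in these two reports.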

[ Edited 133% by timberleek on 07-07-2012 13:34. Reason: one of the problems turned out not to be a problem ]



  • Raven
  • Registered: November 2004
  • Not online

Raven

Marion Raven fan

Does another PC boot with the drive in question attached? And what does a quick test with the drive manufacturer's diagnostic tool report? That would be SeaTools in this case, if it will start.

[ Edited 7% by Raven on 07-07-2012 13:33 ]

After the first glass you see things as you wish they were. After the second you see things as they are not. Finally you see things as they really are, and that is the most horrible thing in the world...

Oscar Wilde



  • alt-92
  • Registered: March 2000
  • Not online

alt-92

ye olde farte

So.
Disk dead.
And your system even warned you by mail that the disk was dying...
Frankly, I think you should have started palliative care a bit earlier; now all that is left is a post-mortem...

I have an 864 GB floppy drive! - certified bungler - the social skills of a thermonuclear device



  • timberleek
  • Registered: July 2009
  • Last online: 16-06 21:50
I have no other PC running Linux.

I assume Windows will not recognise the drive, since it is UFS.
An Ubuntu live CD seems like a better idea.


But isn't it odd that it ran fine for two weeks and then, without warning (it was not even doing anything), suddenly died?
And is it normal for the server to shut down in that case?

And I suppose it won't boot because the Seagate does not get through POST?

Edit:
I had completely forgotten about those mails. I received them at work and never thought of them again until just now.

[ Edited 59% by timberleek on 07-07-2012 13:48 ]



  • terror538
  • Registered: June 2002
  • Last online: 18:59
There is a fair chance this was a combination of circumstances:
- a drive that is dying
- a power hiccup

The latter would then have been the final blow for your drive.
If it is a SATA drive and your system runs in AHCI mode (I believe that is only possible from version 8 onwards), you could boot with it disconnected and attach it after BSD has booted (hot swap). If you are very lucky, something may still be possible then.

Have you already checked whether the drive works on other cables/ports on the motherboard? If you are "lucky", the controller or cable is at fault and your data is still intact.

Support for UFS and BSD partitions is rather thin in Linux. Whatever you do, do _NOT_ write to it! Booting a BSD live disc is better, although your options are somewhat more limited. In any case, check beforehand whether the Ubuntu live versions include the BSD partition and UFS kernel modules, otherwise you still cannot do anything with it.
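
A read-only mount from an Ubuntu live session could look roughly like the sketch below. It assumes that the Linux `ufs` kernel module is available, that the volume is UFS2, and that the partition shows up as `/dev/sdb1`. The device node and mount point are hypothetical and have to be checked on the actual system (for example with `lsblk`). The helper only builds and prints the command; it does not run it.

```python
import shlex


def readonly_ufs_mount_command(device, mountpoint):
    """Build a Linux mount command that attaches a UFS volume read-only."""
    # "ro" mounts read-only; "ufstype=ufs2" is an assumption about the on-disk format.
    return ["mount", "-t", "ufs", "-o", "ro,ufstype=ufs2", device, mountpoint]


# Hypothetical device node and mount point; adjust to the real system.
cmd = readonly_ufs_mount_command("/dev/sdb1", "/mnt/seagate")
print("sudo " + " ".join(shlex.quote(part) for part in cmd))
```

Copying the data off to another disk straight away, rather than browsing around on the failing drive, keeps the number of extra reads low.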

too weird to live too rare to die



  • BoAC
  • Registered: February 2003
  • Last online: 21:55

BoAC

Memento mori

Hm, I'm afraid you have a real problem with that drive:

From your log:
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 149
I can tell you this really is a serious problem...
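
To pick such lines out of a full `smartctl` report automatically, something like the following Python sketch could be used. The set of watched attribute IDs and the rule "any non-zero raw value is worth a look" are assumptions chosen for illustration, not vendor thresholds. The sample lines are copied from the 22 June report.

```python
import re

# Attribute IDs commonly associated with failing sectors (illustrative choice).
WATCHED_IDS = {5, 187, 197, 198}

# Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTRIBUTE_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def suspicious_attributes(report):
    """Return {id: (name, raw)} for watched attributes with a non-zero raw value."""
    found = {}
    for line in report.splitlines():
        match = ATTRIBUTE_LINE.match(line)
        if not match:
            continue  # not an attribute row
        attr_id, name, raw = int(match.group(1)), match.group(2), int(match.group(3))
        if attr_id in WATCHED_IDS and raw > 0:
            found[attr_id] = (name, raw)
    return found


sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 149
187 Reported_Uncorrect 0x0032 077 077 000 Old_age Always - 23
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""

for attr_id, (name, raw) in suspicious_attributes(sample).items():
    print(attr_id, name, raw)
```

Looking at the raw value matters here because the normalised value for attribute 5 is still 100 against a threshold of 036, which is why the overall self-assessment in both mails still reads PASSED.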