Memory test hung after upgrade to 64GB RAM (Follow-up)

Note: This is a follow-up to Memory test hung after upgrade to 64GB RAM.

This machine had TrueNAS installed, but it rebooted quite often, a few times a week. After I switched it to Proxmox, one of the active VMs kept hanging.

I decided to take another look at the memory test issue.

32GB testing

With only 32GB of memory installed, the test was OK at the beginning. But after a full day of testing, it started to show 1 failure in a pass. That is not many, but it is not acceptable.

64GB testing

With 48GB or 64GB of memory installed, the test caused the system to hang with a black screen.

Reduce memory speed

After I reduced the memory speed to 1066 with 64GB installed, Pass 1 looked OK, but the issue came back in Pass 2.

Getting worse

After a full day of testing, the results got worse. Previously the test hung at 80%+, then it hung at 40%+. It started to look like the problem was caused by other factors, such as temperature.

Then I noticed that the power supply was very hot and made no noise. I began to think the issue could be that the system was not getting enough power.

Change power supply

After taking out the power supply, I found that its fan was not turning. Normally this cannot be noticed, because the fan faces the bottom of the case.

I got a new power supply. After installing it, the system ran cooler.

Memory test

After I changed the power supply, the memory passed the test at a speed of 1333, which is also the speed detected by the motherboard. Even with the motherboard automatically overclocking the CPU to 4000+ (the CPU speed should be 3600), there were no errors.

Speed of the old set of RAMs

The new set of RAM passes testing at 1600, but the old set produces errors in less than 1 minute at that speed.

Although the old and new RAM have the same spec (brand, speed, etc.), the system detects the new RAM as 1600 but the old set as 1333, no matter which memory slot is used. I'm not sure whether the lower speed is caused by aging or because I was cheated by the seller.

Speed of testing among 1600, 1333, and 1066

During the testing, I noticed that the test itself runs at quite different speeds at 1600, 1333 and 1066, even under the same CPU speed.
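
One likely contributor is raw memory bandwidth. As a rough sketch, assuming the 1600, 1333 and 1066 settings are DDR3 transfer rates in MT/s on a 64-bit (8-byte) memory channel, the theoretical peak bandwidth per channel scales directly with the setting. The function name is made up for illustration, and real test throughput also depends on timings and the number of channels:

```python
def peak_bandwidth_mb_s(transfer_rate_mt_s: int, bus_width_bytes: int = 8) -> int:
    """Theoretical peak bandwidth of one memory channel in MB/s.

    Assumption: a 64-bit (8-byte) channel, as used by DDR3 modules.
    """
    return transfer_rate_mt_s * bus_width_bytes


# Compare the three speed settings against the fastest one.
base = peak_bandwidth_mb_s(1600)  # 12800 MB/s
for rate in (1600, 1333, 1066):
    peak = peak_bandwidth_mb_s(rate)
    print(f"{rate}: {peak} MB/s per channel ({peak / base:.0%} of the 1600 setting)")
```

On paper, the 1066 setting has only about two thirds of the bandwidth of the 1600 setting, so a bandwidth-bound memory test would be expected to run noticeably slower.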

References

MemTest86

Memory test hung after upgrade to 64GB RAM

Update: The problem was fixed, as described in Memory test hung after upgrade to 64GB RAM (Follow-up).

It took many hours to troubleshoot the RAM test hanging issue.

Testing software

MemTest86

The test hung at 86% during pass 1.

Memtest86+

The USB drive I created for it did not boot.

Ubuntu Live CD

The test hung right after it started.

Possible issues

Hardware issue

It should not be a RAM issue. I tested the RAM with MemTest86 by splitting it into two sets of 32GB each, and both sets passed.

Maybe slot issue

The RAM must be installed in the slots in the correct order; otherwise it cannot be detected. In fact, this motherboard is quite sensitive to RAM position.

E8036_P9X79_DELUXE

Maybe RAM too hot

The RAM could be too hot. The newly added modules make the gaps between modules too small, and the fans are not strong enough, which could cause high temperatures. I tried adjusting the fan speed with the controls on the case, but it had no effect.

But RAM should be able to tolerate high temperatures, and the type of RAM I bought comes with a cooling case.

Software issue

Maybe conflict with VGA

The maximum RAM size that MemTest86 supports is 64GB, which means it could have a bug at that limit too. On the other hand, when the system hung, nothing was shown on the screen (black screen), but the keyboard light was still responsive and the VGA light on the motherboard was turned on. Maybe the software wrote to the memory area used by the video card and caused the issue.

Conclusion

Although the test hung, I decided to keep using the RAM.

In order to use the memory fully in TrueNAS, I set zfs_arc_max to 60GB and ran two VMs, one Windows with 4GB and one Ubuntu with 6GB, until free memory dropped below 4GB.
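
The ZFS ARC limit is specified in bytes, so the 60GB figure has to be converted first. A minimal sketch of the arithmetic, assuming 60GB here means 60 GiB (binary units); the helper name is hypothetical, and where the value is actually set depends on the TrueNAS version:

```python
GIB = 1024 ** 3  # bytes in one GiB


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes."""
    return gib * GIB


arc_max_bytes = gib_to_bytes(60)
print(arc_max_bytes)  # 64424509440

# Sum of the configured ceilings (ARC cap plus both VMs) versus installed RAM.
installed_gib = 64
configured_gib = 60 + 4 + 6
print(configured_gib - installed_gib)  # 6
```

The configured ceilings add up to 6GB more than the installed 64GB, so the ARC cannot reach its cap while both VMs use their full allocation. That is consistent with free memory dropping below 4GB under this load.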

Result

TrueNAS appears to be working fine. Although it restarted once, the log didn't show a memory-related cause, and there were no memory issues in the dmesg output on the monitoring screen.