
Server Crash every 5 minutes


estiaan1234


Hi guys,

 

My Linux server is crashing every 5 minutes and logs the following output. Does anyone know what this means or what causes it?

 


mono_fdhandle_insert: duplicate File fd 0

Receiving unhandled NULL exception

#0 0x007f5355a7602a in abort

#1 0x007f53533d712c in mono_dl_fallback_unregister

#2 0x007f53533e74d8 in monoeg_g_logv

#3 0x007f53533e756b in monoeg_g_log

#4 0x007f53533ca34c in mono_reflection_get_custom_attrs_data

#5 0x007f5353317b6d in mono_unity_jit_cleanup

#6 0x007f5353342c5e in mono_install_unhandled_exception_hook


Just migrated a save from a friend's Windows 10 PC to a dedicated server running the didstopia/7dtd-server Docker image. Running it in a Docker container helps rule out bad environment state as a factor, since a fresh Linux container is provisioned each time the server starts. The migration went fine and the world loads exactly as we left it, but we started getting the error referenced in this thread.

 

Interestingly, I have an extra frame in the stack trace from the logs that might give more insight into this (all other frames are exactly the same as OP's).

 

#7 0x000000416f1225 in (wrapper managed-to-native) System.IO.MonoIO:Open (char*,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.IO.FileOptions,System.IO.MonoIOError&)

 

This is helpful, since this frame is in managed (intermediate-language) code, which is much easier to debug. Now we know the exception is coming from Mono, the open-source implementation of the .NET Framework used to run .NET code on Linux (it predates .NET Core, and Unity still uses it). Here's the code referenced in the stack trace that the exception is coming from:

[MethodImplAttribute(MethodImplOptions.InternalCall)]
private unsafe extern static IntPtr Open(char* filename,
                                        FileMode mode,
                                        FileAccess access,
                                        FileShare share,
                                        FileOptions options,
                                        out MonoIOError error);

public static IntPtr Open(string filename,
                         FileMode mode,
                         FileAccess access,
                         FileShare share,
                         FileOptions options,
                         out MonoIOError error)
{
   unsafe
   {
       fixed (char* filenameChars = filename)
       {
           return Open(filenameChars, mode, access, share, options, out error);
       }
   }
}

 

So what appears to be happening here is that the game calls the public Open method, which is supposed to return a handle (the IntPtr) to the opened file. When the call finishes and the handle is returned, either the handle or whatever it refers to ends up null. This could be many different things, but because the devs didn't wrap the call in a try-catch (which should always be done when calling unsafe code), we have no clue what state led up to this error. It's not permissions: I set all server files to 777 and still got the crash. Beyond that, I have no idea. It could be a file that should exist but doesn't, or a number of other things.

 

The sad part is that the same Mono class the exception came from has a method made for exactly this, GetException, which takes a MonoIOError. That is the same MonoIOError whose value is handed back to the caller (hence the out keyword on the parameter). If this error were handled gracefully, we would get a detailed message telling us why it failed to open the file it was looking for.
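To make that pattern concrete, here is a minimal Python analog (not Mono code): the raw OS error code is kept and turned into a readable message that names the file, which is roughly the job MonoIO.GetException(MonoIOError) does in Mono. The helper name open_with_diagnostics is made up for illustration. One caveat: the traces in this thread end in monoeg_g_log followed by abort inside the native runtime, so it is not certain a managed try-catch would ever see this particular failure.

```python
import errno
import os


def open_with_diagnostics(path: str, flags: int = os.O_RDONLY) -> int:
    """Open `path` and return the raw file descriptor (hypothetical helper).

    On failure, the OS error code is kept and turned into a readable
    message naming the file, instead of being discarded.
    """
    try:
        return os.open(path, flags)
    except OSError as exc:
        # exc.errno plays the role of MonoIOError here: a raw error code.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        raise RuntimeError(
            f"failed to open {path!r}: {name} ({exc.strerror})"
        ) from exc
```

Calling it on a missing path raises a RuntimeError whose message includes ENOENT and the system's description of the error, so a log would say why the open failed rather than just that it did.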

 

I'm not trying to call the devs out or anything here; I work with C# professionally myself, and stuff like this is stupidly easy to let fall through the cracks. Especially when it appears to be such an obscure error, given that only migrating hosts to a different Linux server seems to trigger it (coming from both Windows and Linux). If anything, I give them props for making such a well-rounded game run so well in intermediate code (although I'm sure Unity helps).

 

TL;DR: This is on the devs for not handling this exception in a way that lets users debug it without relying on them for support. This is unlikely to be something we'll be able to figure out without them debugging from their side. Hopefully I've provided enough information to make that easier.


My server also started crashing after about five minutes runtime recently, with the same exception as yours.

I noticed a bunch of dropped packets to random Amazon IPs on port 443.

So I disabled EAC, and now the server has been running for 30 minutes, so far...

Since the logging is a bit sparse, it's hard to say if you are encountering the same issue, but it might be worth a try.
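If you want to check whether outbound connections like those actually complete, one option is a small TCP connect test with an explicit timeout. This is a generic sketch, not something EAC or the game provides; can_connect is a hypothetical helper.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port is established within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable networks, DNS failures and timeouts
        # all surface as OSError subclasses.
        return False
```

For example, running `can_connect("hydra.easyanticheat.net", 443)` on the server (that hostname is the EAC endpoint quoted later in this thread) shows whether it is reachable from there; the result depends on your network and firewall.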


I have been experiencing the same problem on my Linux dedicated server since 01.01.2020. Did anyone experience this bug in 2019?

We didn't change anything obvious, so I guess it's caused by a change in some external code (EAC?), possibly connected to a date bug.

 

Interestingly, though, the fd 0 problem occurs not only in System.IO.MonoIO:Open() but also in System.Net.Sockets.Socket:Accept_internal():

mono_fdhandle_insert: duplicate Socket fd 0

Receiving unhandled NULL exception

#0 0x007f3a3aed5535 in abort

#1 0x007f3a3874512c in mono_dl_fallback_unregister

#2 0x007f3a387554d8 in monoeg_g_logv

#3 0x007f3a3875556b in monoeg_g_log

#4 0x007f3a3873834c in mono_reflection_get_custom_attrs_data

#5 0x007f3a386821ae in mono_unity_jit_cleanup

#6 0x007f3a386fbe37 in mono_opcode_value

#7 0x000000416d9889 in (wrapper managed-to-native) System.Net.Sockets.Socket:Accept_internal (intptr,int&,bool)

#8 0x000000416d92b8 in System.Net.Sockets.Socket:Accept ()

#9 0x000000416d912e in System.IOSelectorJob:System.Threading.IThreadPoolWorkItem.ExecuteWorkItem ()

#10 0x0000004160fffc in System.Threading._ThreadPoolWaitCallback:PerformWaitCallback ()

#11 0x007f3a385860ad in mono_print_method_from_ip

#12 0x007f3a386f199b in mono_perfcounter_foreach

#13 0x007f3a3870d219 in mono_threads_detach_coop

#14 0x007f3a3868a6a6 in mono_unity_jit_cleanup

#15 0x007f3a3870b289 in mono_threads_set_shutting_down

#16 0x007f3a3876c843 in GC_inner_start_routine

#17 0x007f3a387615b6 in GC_call_with_stack_base

#18 0x007f3a3b08b458 in __libpthread_freeres

#19 0x007f3a3afb980f in clone
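Both traces fail in mono_fdhandle_insert, and the message indicates that Mono's handle table already held an entry for descriptor 0 (normally stdin) when a new file or socket came back as fd 0. One possible explanation, which nobody in this thread has confirmed: something in the process closes fd 0 without Mono knowing, and the next open() or accept() is then handed 0, because Linux always allocates the lowest free descriptor number (POSIX requires this for open()). The sketch below only demonstrates that allocation rule, using ordinary descriptors rather than fd 0 so it is safe to run; the function name is made up for illustration.

```python
import os
from typing import Tuple


def lowest_fd_reuse_demo() -> Tuple[int, int]:
    """Free one descriptor and show that the next open() gets the same number back."""
    fds = [os.open(os.devnull, os.O_RDONLY) for _ in range(3)]
    freed = fds[0]                # lowest of the three: it was allocated first
    os.close(freed)
    reused = os.open(os.devnull, os.O_RDONLY)
    # open() returns the lowest-numbered unused descriptor, so (with no other
    # thread opening files in between) reused == freed.
    for fd in [reused] + fds[1:]:
        os.close(fd)
    return freed, reused
```

In a single-threaded process the function returns a pair of equal numbers; by the same rule, if fd 0 were closed, the next file or socket would be given 0.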


We are also having the same problem on multiple (probably all 7dtd) servers that we are hosting.

 

mono_fdhandle_insert: duplicate File fd 0

Receiving unhandled NULL exception

#0 0x007fd1409b1028 in abort

#1 0x007fd13e30f12c in mono_dl_fallback_unregister

#2 0x007fd13e31f4d8 in monoeg_g_logv

#3 0x007fd13e31f56b in monoeg_g_log

#4 0x007fd13e30234c in mono_reflection_get_custom_attrs_data

#5 0x007fd13e24fb6d in mono_unity_jit_cleanup

#6 0x007fd13e27ac5e in mono_install_unhandled_exception_hook

#7 0x00000041c39ac5 in (wrapper managed-to-native) System.IO.MonoIO:Open (char*,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.IO.FileOptions,System.IO.MonoIOError&)

Servers crashing after about 5 minutes, but only if some players are connected. Empty servers not crashing.
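Since several posts here paste what looks like the same crash, one way to check that two traces really match is to compare only the header line and the symbol names, ignoring the hex addresses. The addresses differ between posts, presumably because the runtime library is loaded at a different base address on each run (the last three hex digits of most Mono frames do line up). A minimal sketch; trace_signature is a hypothetical helper, and the regular expressions assume the line format shown in these posts.

```python
import re
from typing import List, Optional, Tuple

# Header Mono prints before aborting, e.g. "mono_fdhandle_insert: duplicate File fd 0".
HEADER = re.compile(r"mono_fdhandle_insert: duplicate (\w+) fd (\d+)")
# Frame lines look like "#3 0x007f53533e756b in monoeg_g_log".
FRAME = re.compile(r"#(\d+) 0x[0-9a-fA-F]+ in (.+)")


def trace_signature(log_text: str) -> Optional[Tuple[str, int, List[str]]]:
    """Return (handle_type, fd, frame_symbols) for the first such crash, or None.

    Hex addresses are dropped so traces from different runs can be compared.
    """
    header = HEADER.search(log_text)
    if header is None:
        return None
    symbols: List[str] = []
    for line in log_text[header.end():].splitlines():
        line = line.strip()
        match = FRAME.fullmatch(line)
        if match:
            symbols.append(match.group(2).strip())
        elif symbols and line:
            break  # first non-frame text after the trace ends it
    return header.group(1), int(header.group(2)), symbols
```

Applied to the first post and the multi-server post above, the symbol lists agree frame for frame, with the later post carrying the extra System.IO.MonoIO:Open frame.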



From what I've read across different threads, this only occurs if a player reconnects, i.e. a player joins the server, disconnects, and joins again. It may not strictly require a rejoin, but it seems to occur only after a player has joined (maybe even the first time?). Maybe the server fails to open the file where the player stats are saved.

Maybe it would help if affected server owners tried to narrow this down?

Since multiple server admins have recently run into this problem, I guess it's worth an official bug report.


Has anyone else tried disabling EAC, and did that also fix the problem for you?

Anyhow, since running without EAC feels rather risky, I continued investigating along that track and think I've found a safer solution.


OK, then try lowering your TCP SYN retries. I noticed that the 7D2D server tried to connect to these IPs (in random order): 99.81.3.206, 99.80.156.235 and 52.30.110.86.

After the first one times out, it tries another one and when that also times out the server crashes.

My assumption is that they have some software-defined timeout that triggers first and then fails to do the necessary cleanup.

 

tl;dr

Try this as root (but make sure it won't affect anything else you've got running on that server):

echo 3 > /proc/sys/net/ipv4/tcp_syn_retries

echo 3 > /proc/sys/net/ipv4/tcp_synack_retries
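If you'd rather be able to undo the change, the same two writes can be done from a script that remembers the previous values first. A minimal sketch, assuming it runs as root on Linux; the helper names are made up, and `sysctl -w net.ipv4.tcp_syn_retries=3` is the usual command-line equivalent of each write.

```python
from pathlib import Path
from typing import Dict, Iterable

# Standard Linux locations of the two settings from the commands above.
SYN_RETRIES = Path("/proc/sys/net/ipv4/tcp_syn_retries")
SYNACK_RETRIES = Path("/proc/sys/net/ipv4/tcp_synack_retries")


def set_proc_int(path: Path, value: int) -> int:
    """Write `value` to a /proc/sys file and return the value it had before."""
    old = int(path.read_text().strip())
    path.write_text(f"{value}\n")
    return old


def apply_lower_retries(
    value: int = 3, paths: Iterable[Path] = (SYN_RETRIES, SYNACK_RETRIES)
) -> Dict[Path, int]:
    """Set every path to `value`; return the old values so they can be restored."""
    return {path: set_proc_int(path, value) for path in paths}


def restore(previous: Dict[Path, int]) -> None:
    """Put back the values returned by apply_lower_retries."""
    for path, value in previous.items():
        set_proc_int(path, value)
```

Run as root, `previous = apply_lower_retries()` lowers both values to 3 and `restore(previous)` undoes it; like the echo commands, nothing here survives a reboot.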


I had the same problem with my server. Running Debian 10.

 

My solution was to reinstall 7dtd-ServerTools.

 

I installed 7dtd-ServerTools-18.2.3, unpacked ServerTools-Linux-SQLite-Fix.tgz, copied ServerTools-Linux-SQLite-Fix/ubuntu18/libSQLite.Interop.so to 7DaysToDieServer_Data/Mono/x86_64/libSQLite.Interop.so, and restarted the server.

 

Worked fine after that.
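The copy step above can also be scripted. A minimal sketch that performs only that step and keeps a backup of whatever library was there before; fix_dir (where ServerTools-Linux-SQLite-Fix.tgz was unpacked), server_dir (the server install root) and install_sqlite_fix are assumptions for illustration, and the server still needs a restart afterwards, as in the post.

```python
import shutil
from pathlib import Path


def install_sqlite_fix(fix_dir: Path, server_dir: Path, variant: str = "ubuntu18") -> Path:
    """Copy the patched libSQLite.Interop.so into the server's Mono folder.

    An existing copy is saved as libSQLite.Interop.so.bak so the change can
    be undone. Returns the path of the installed library.
    """
    src = fix_dir / variant / "libSQLite.Interop.so"
    dest = server_dir / "7DaysToDieServer_Data" / "Mono" / "x86_64" / "libSQLite.Interop.so"
    if not src.is_file():
        raise FileNotFoundError(src)
    if dest.exists():
        # Keep the previous library next to the new one.
        shutil.copy2(dest, dest.with_name(dest.name + ".bak"))
    shutil.copy2(src, dest)
    return dest
```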


I had moved the server's Linux VM to another disk and lowered its memory from 16 GB to 8 GB, and ran into the same problem. After suspecting file corruption and swapping everything around, the fix turned out to be as simple as giving it back its original memory.

I think this null pointer problem may simply be due to a lack of memory to allocate something... maybe? Bottom line: increase the server's memory and it might (fingers crossed) solve the problem.
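Before resizing a VM on this theory, it may be worth checking how much memory the kernel reports as available. A minimal sketch that reads the MemAvailable line from /proc/meminfo (values there are in kB); mem_available_kib is a hypothetical helper name.

```python
def mem_available_kib(meminfo_text: str) -> int:
    """Return the MemAvailable value (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        # Lines look like "MemAvailable:    5123456 kB".
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("no MemAvailable line found")
```

On the server itself, pass it the text of /proc/meminfo, e.g. `mem_available_kib(open("/proc/meminfo").read()) // 1024` for a figure in MiB.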

 


Servertools has nothing to do with it. Turning off EAC solves it.

 

host hydra.easyanticheat.net

hydra.easyanticheat.net is an alias for hydra.eac-front.com.

hydra.eac-front.com is an alias for hydra-eu.eac-front.com.

hydra-eu.eac-front.com is an alias for gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com.

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 54.229.129.174

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 54.72.32.34

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 54.72.193.107

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 34.255.142.199

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 3.248.158.34

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 99.81.42.103

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 52.211.61.165

gamesec-hydra-eu-lb-prod-220534806.eu-west-1.elb.amazonaws.com has address 52.17.75.189

These are the hosts it's trying to connect to. Nobody would think all of them would go down, but hey, welcome to Amazon Web Services. Put your stuff in the cloud, they said. It's safe, they said.
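The address part of that host lookup can be reproduced from Python with the system resolver, which is handy for feeding the addresses into other checks. A minimal sketch; resolve_ipv4 is a hypothetical helper, it does not print the alias chain the way host does, and load-balancer addresses like these rotate, so results will likely differ from the list above.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the de-duplicated IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})
```

For example, `resolve_ipv4("hydra.easyanticheat.net")` lists whatever addresses that name currently resolves to on your server.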


Yeah, I am having the same troubles too.

 

eac_server.so [x64] :: OnLoad()

mono_fdhandle_insert: duplicate File fd 0

Receiving unhandled NULL exception

#0 0x007fb60b995801 in abort

#1 0x007fb6092ed12c in mono_dl_fallback_unregister

#2 0x007fb6092fd4d8 in monoeg_g_logv

#3 0x007fb6092fd56b in monoeg_g_log

#4 0x007fb6092e034c in mono_reflection_get_custom_attrs_data

#5 0x007fb60922db6d in mono_unity_jit_cleanup

#6 0x007fb609258c5e in mono_install_unhandled_exception_hook

#7 0x00000041066cb5 in (wrapper managed-to-native) System.IO.MonoIO:Open (char*,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.IO.FileOptions,System.IO.MonoIOError&)

 

I put in a support ticket with Easy Anti-Cheat. We'll see where that goes.

I applied the ServerTools SQLite fix posted earlier in this thread (copying ServerTools-Linux-SQLite-Fix/ubuntu18/libSQLite.Interop.so to 7DaysToDieServer_Data/Mono/x86_64/libSQLite.Interop.so and restarting) yesterday. So far everything is doing well.


We might be observing two different issues here (depending on whether we run linux or windows?).

But since I reduced the SYN retries on my servers, I've not seen this issue anymore (with EAC enabled).

I'd assume the developers would be grateful if we could pinpoint this issue to a soft timeout/cleanup they have to look into.

 

These were my values on an out-of-the-box Red Hat/CentOS 7 server before any changes:

$ cat /proc/sys/net/ipv4/tcp_syn_retries

6

$ cat /proc/sys/net/ipv4/tcp_synack_retries

5

 

Which I changed to:

$ echo 3 > /proc/sys/net/ipv4/tcp_syn_retries

$ echo 3 > /proc/sys/net/ipv4/tcp_synack_retries

 

Don't worry, these settings are not persistent (they are reset when the server reboots) unless you also define them in your sysctl configuration.

 

This basically changes the total SYN timeout from about 180 seconds to 40 seconds, which seems sufficient to circumvent the bug.
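The arithmetic behind that estimate can be sketched with a simplified model of SYN retransmission: the timer starts at an initial retransmission timeout (RTO), doubles after each retry (capped at 120 seconds), and the kernel gives up one further interval after the last retry. The initial RTO is the main assumption. Linux's ip-sysctl documentation describes the default of 6 retries as 63 seconds to the last retransmission and 127 seconds to the final timeout with a 1-second initial RTO, while older kernels started at 3 seconds, so the exact figures depend on the kernel. syn_timeouts is a hypothetical helper.

```python
from typing import Tuple


def syn_timeouts(
    retries: int, initial_rto: float = 1.0, max_rto: float = 120.0
) -> Tuple[float, float]:
    """Return (seconds until last SYN retransmission, seconds until giving up).

    Simplified exponential-backoff model; real kernels may differ slightly.
    """
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto                 # wait out the current timer, then retransmit
        rto = min(rto * 2, max_rto)    # back off, up to the cap
    return elapsed, elapsed + rto      # give up one more interval later
```

Under this model `syn_timeouts(6)` gives (63.0, 127.0) and `syn_timeouts(3)` gives (7.0, 15.0); with a 3-second initial RTO the give-up times are 309 and 45 seconds. If the server tries two addresses back to back, as described earlier in the thread, the total wait before both attempts fail is roughly double the give-up time.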


Oh yeah, it wouldn't be the first time I've run into all sorts of weirdness when clocks roll over, DST changes, etc. And for all we know, this could be TFP failing to upgrade some integration library that their servers expect to have an API change for.

 

I will confirm that disabling EAC seems to stop this problem from happening.


I can confirm same issue here:

 

eac_server.so [x64] :: OnLoad()

mono_fdhandle_insert: duplicate File fd 0

Receiving unhandled NULL exception

#0 0x007f7e99f17a28 in abort

#1 0x007f7e9787812c in mono_dl_fallback_unregister

#2 0x007f7e978884d8 in monoeg_g_logv

#3 0x007f7e9788856b in monoeg_g_log

#4 0x007f7e9786b34c in mono_reflection_get_custom_attrs_data

#5 0x007f7e977b8b6d in mono_unity_jit_cleanup

#6 0x007f7e977e3c5e in mono_install_unhandled_exception_hook

#7 0x00000041328b85 in (wrapper managed-to-native) System.IO.MonoIO:Open (char*,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.IO.FileOptions,System.IO.MonoIOError&)

 

Distro Details

=================================

Distro: CentOS Linux 7 (Core)

Arch: x86_64

Kernel: 3.10.0-1062.9.1.el7.x86_64

Uptime: 0d, 11h, 19m

tmux: tmux 1.8

glibc: 2.17

 

Server Resource

=================================

CPU

Model: Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS)

Cores: 4

Frequency: 2999.998MHz

Avg Load: 0.37, 0.39, 0.41

 

Memory

Mem:       total  used   free   cached  available

Physical:  7.7GB  2.5GB  5.0GB  5.1GB   5.0GB

Swap:      0B     0B     0B

 

Storage

Filesystem: /dev/sda1

Total: 30G

Used: 27G

Available: 3.2G

 

 

7 Days To Die Server Details

=================================

Maxplayers: 16

Game mode: GameModeSurvival

Game world: Navezgane

Master server: true

Status: ONLINE

 

Command-line Parameters

=================================

./7DaysToDieServer.x86_64 -logfile /home/sdtdhost/log/server/output_log__2020-01-05__04-10-42.txt -quit -batchmode -nographics -dedicated -configfile=/home/sdtdhost/serverfiles/sdtdserver.xml


Archived

This topic is now archived and is closed to further replies.
