How can you troubleshoot server lag not caused by memory, CPU, or latency?


Aesirkin

I'm running my first dedicated server. We chose Ping Perfect as our host and are now on day 105 (2-hour days). We restart every 6 hours. We provisioned 8 GB of RAM and above-average CPU priority with 16 player slots. We usually have 2-4 players on at a time, but sometimes that jumps as high as 7-8. Max view distance was set at 7 (I just decreased it to 6), max zombies is 36, and max animals is 12 (though I saw 62 zombies on the livemap the other day, and not during a blood moon). We have many mods and a few large player-made POIs.

 

Over the past 5-10 game days we've noticed increased lag. When riding around, we're constantly jumping forward or back (every 2-3 seconds). Zombies walk in place as much as they move, and there are other similar issues. A restart seems to resolve it for a time, but it's coming back quicker and more frequently as we go.

 

I checked Ping Perfect's stats page, and during these periods I see nothing that looks like a likely culprit. CPU usage rarely goes above 5%. Memory usage is usually just under 5 GB, but sometimes rises to around 5.5 GB; nowhere near our 8 GB limit. Ping times for players are always sub-100, and usually sub-35. So all of my hardware reporting indicates things are running smoothly. Yet...lag...

 

Where would I start in troubleshooting an issue like this?  An example started around 8ish last night.  I ran a mem command to check memory usage around 9pm while it was still occurring, but everything looks good, though not matching the graph (am I correct that mem reports server resources while memcl reports client?).

 

[Two screenshots attached]

 


32 minutes ago, meganoth said:

It seems to happen when at least 4 players are online, and a possible culprit could be the network.

 

 

 

Thanks. When it happened last night there were 6 players online. I checked at the time and all had pings listed in the 20s or 10s (a little better than usual). Is there some metric other than the ping in the player panel that I can check to help determine whether this is a network issue?


That's the rub. Without access to a root shell, your options are limited, even if I knew where to look.

 

One thing you could do would be to check your server's log file for the times the FPS drops below 20. Generally, 20 is an internal limit that the server shouldn't fall below regularly.

If this happens often, then I'd say your server hardware is generally running into some limit, and it doesn't matter much which one.

 

In that case you might try:

1) Turn off dynamic mesh on the server.

2) Try a vanilla test game with the same players. If that changes anything, then one of the mods is too much for the system.
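
For the log check suggested above, a small script can save scrolling through a six-hour log. The following is a minimal Python sketch, not an official tool. It assumes the server writes periodic status lines that start with a timestamp and contain a field such as `FPS: 18.42`; that line shape, the regular expression, and the function name `low_fps_reports` are assumptions for illustration, so adjust the pattern to whatever your own log actually contains.

```python
import re
import sys

# ASSUMPTION: a status line starts with a timestamp token and later contains
# "FPS: <number>". Verify this against a real line from your own log.
STATUS_LINE = re.compile(r"^(?P<stamp>\S+)\s.*?\bFPS:\s*(?P<fps>\d+(?:\.\d+)?)")


def low_fps_reports(lines, threshold=20.0):
    """Return (total_reports, low) where low is a list of (timestamp, fps)
    for every FPS report strictly below the threshold."""
    total = 0
    low = []
    for line in lines:
        match = STATUS_LINE.search(line)
        if match is None:
            continue  # not a status line, skip it
        total += 1
        fps = float(match.group("fps"))
        if fps < threshold:
            low.append((match.group("stamp"), fps))
    return total, low


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of a downloaded log file as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        total, low = low_fps_reports(handle)
    print(f"{len(low)} of {total} FPS reports were below 20")
    for stamp, fps in low:
        print(stamp, fps)
```

Run it against a downloaded log and compare the listed timestamps with the times players actually reported lag.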

 

 

 



Thanks.  I'll see what I can figure out looking at the logs.

 

I'm toying with the idea of just putting together a budget server machine and hosting here.  My suspicion is that I'm sharing this server with too many other VMs and we're hitting bottlenecks somewhere regardless of what those graphs show.


On 8/6/2022 at 9:19 PM, meganoth said:

That's the rub. Without access to a root shell your options are limited even if I knew where to look.

 

One thing you could do would be to check your server's log file for the times the FPS drops below 20. Generally, 20 is an internal limit that the server shouldn't fall below regularly.

If this happens often, then I'd say your server hardware is generally running into some limit, and it doesn't matter much which one.

 

 

I downloaded a log from the 6 hour session on the night referenced.  There were 665 FPS reports in the log.  Around 70 reported FPS below 20; some as low as 10 or 11.  Is that abnormal?  How regularly should sub-20 be hit?  I assume once or twice would be okay, but 10% is far too high?

 



Do those drops coincide with the times when you also have problems in the game?

 

For example, if too many virtual machines are hosted on one physical server, you might see FPS drop at any time (i.e. whenever other virtual machines are under heavy load). But if your own virtual server is hitting hard limits (set either by the hardware or by the virtual environment), only your own server load determines the FPS, and your own actions trigger the drops.

 

The average FPS should be taken into account as well. If you normally see FPS at, say, 25, it indicates you have no spare resources for times of heavy load. If you normally see FPS at 60, it could mean the server's hardware is in principle quite capable of running the game, but something unexpected (like a bug, an unfortunate setting, or a badly optimized mod) kills performance at times.
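
One way to look at both points (the average FPS, and whether drops follow your own load) is to group the FPS reports by how many players were online. This is a rough Python sketch, assuming each status line carries both an `FPS:` field and a player-count field written as `Ply:`. Both field names and the function name `fps_by_player_count` are assumptions to verify against your own log.

```python
import re
from collections import defaultdict
from statistics import mean

# ASSUMPTION: status lines contain "FPS: <number>" and "Ply: <integer>".
FPS_FIELD = re.compile(r"\bFPS:\s*(\d+(?:\.\d+)?)")
PLAYERS_FIELD = re.compile(r"\bPly:\s*(\d+)")


def fps_by_player_count(lines, threshold=20.0):
    """Group FPS reports by online player count.

    Returns {players: (report_count, mean_fps, share_below_threshold)}.
    Lines missing either field are ignored.
    """
    samples = defaultdict(list)
    for line in lines:
        fps = FPS_FIELD.search(line)
        players = PLAYERS_FIELD.search(line)
        if fps and players:
            samples[int(players.group(1))].append(float(fps.group(1)))

    summary = {}
    for players, values in sorted(samples.items()):
        below = sum(1 for value in values if value < threshold)
        summary[players] = (len(values), mean(values), below / len(values))
    return summary
```

If the share of sub-20 reports climbs with the player count, your own load is the likely trigger. If drops show up even with few players online, contention from other virtual machines becomes more plausible.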

 

Two years ago my server hardware was updated (from 2014 server hardware to 2020 server hardware). Interestingly, there was no increase in CPU frequency; I think both generations ran at about 2.1-2.3 GHz.

Before, I had <30 FPS on average, and drops below 20 happened. Afterwards I had <40 on average and no drops below 20. I remember being a little disappointed at first by the small jump in FPS, until I saw many improvements in actual gameplay. For example, rubberbanding and delayed area loading while driving vehicles at high speed were almost gone.

 

Did you test turning off dynamic mesh? This is a test you can easily do in an existing game.

 

Edited by meganoth

Sorry, it took me a bit to get time to work through this.

 

Yes, those low frame rates, and how severe they were, coincided with the lag we experienced.

 

I did not do anything with dynamic mesh yet.  I want to look into it and understand what it is and what the impact of disabling it is before doing anything.

 

Thanks!


I really can't find much on dynamic mesh.  Many many Google results talk about disabling it to improve performance, but nothing really describes exactly what it is and what it does, and how it will impact players to disable it.

 

I'll try turning it off during the next reboot.  But would you be able to tell me what it is that I'm turning off and what potential issues I should look out for?

 

Thanks!


Let's say you build a horde base or change a POI so it looks very different. If you now drive away, you will suddenly see your horde base vanish or the POI look as if it were unchanged.

The reason is that what you see from a distance are prerendered pictures. Since your changes to the horde base and POI were made long after the prerendering happened, there are no smaller pictures of your horde base or the changed POI.

 

The dynamic mesh feature dynamically renders those smaller pictures whenever you change the landscape. It does this as a background job. Effect: you can see your horde base and changes to POIs from a distance. Disadvantage: it seems to use up a lot of resources at the moment.

 

You can also read about it in the changelog/announcement of A20.
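
If you want to script the change instead of editing by hand, here is a hedged Python sketch that flips one setting in the server config text. It assumes the settings file uses `<property name="..." value="..."/>` entries and that the dynamic mesh switch is named `DynamicMeshEnabled`; check both against the serverconfig.xml your host gives you. The helper name `set_server_property` is made up for this example.

```python
import re


def set_server_property(config_text, name, value):
    """Return config_text with the first <property name="NAME" value="..."/>
    entry changed to VALUE. Raises KeyError if no such property is found.

    The text is edited with a regular expression instead of an XML parser
    so that comments and formatting in the file are left untouched.
    """
    pattern = re.compile(
        r'(<property\s+name="' + re.escape(name) + r'"\s+value=")[^"]*(")'
    )
    # A function is used as the replacement so that backslashes or group
    # references inside VALUE are never interpreted by the regex engine.
    updated, count = pattern.subn(
        lambda match: match.group(1) + value + match.group(2),
        config_text,
        count=1,
    )
    if count == 0:
        raise KeyError(name)
    return updated
```

To use it, read the config file into a string, call `set_server_property(text, "DynamicMeshEnabled", "false")`, write the result back while the server is stopped, and restart. Note that a commented-out copy of the property appearing earlier in the file would be matched first, so check the result before restarting.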

 

Edited by meganoth

Thank you; I understand now.

 

Is there any way to still have this done, but not as a background process?  Possibly a rerender of all updated POIs on startup?  Or is there a task I can schedule to run in the middle of the night?  Or possibly just turn the aggressiveness of that process down (update less frequently or only for large changes)?  It seems like this is a valuable feature that I'd hate to just turn off.

