Stress Testing Minecraft Servers
Learn how to properly stress test your Minecraft server, measure performance metrics, identify bottlenecks, and optimize for peak player loads
Understanding Server Stress Testing
Stress testing simulates high player loads so you can find performance problems before real players find them for you. The point isn't to see when the server crashes. It's to figure out how many players you can actually handle and where things start falling apart.
A new plugin might work fine with 10 players and tank the server at 50. A VPS upgrade might not deliver the improvement you were hoping for. Stress testing gives you hard numbers instead of guesses.
Key Performance Metrics
Minecraft servers have two numbers you need to watch: TPS and MSPT.
TPS (Ticks Per Second) measures the server's tick rate, targeting 20 ticks per second. MSPT (Milliseconds Per Tick) measures how long each tick takes to process. MSPT is the one you should actually care about, because it shows performance trends before TPS drops. A server at 45ms MSPT is still hitting 20 TPS but has almost no headroom left.
| TPS | MSPT | Server State |
|---|---|---|
| 20.0 | <40ms | Healthy, has headroom |
| 18-19 | 40-50ms | Minor lag, running near limit |
| 15-17 | 50-70ms | Noticeable delays |
| <15 | >70ms | Severe, nearly unplayable |
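The two metrics are tied by simple arithmetic: a 20 TPS target means a 50ms budget per tick, and effective TPS can never exceed 20. A quick sketch of the relationship:

```python
def effective_tps(mspt: float) -> float:
    """Effective ticks per second for a given average MSPT.

    The server targets 20 TPS, i.e. a 50ms budget per tick. While MSPT
    stays under 50ms the server sleeps the remainder and holds 20 TPS;
    past 50ms, ticks overrun and TPS falls to 1000 / MSPT.
    """
    return min(20.0, 1000.0 / mspt)

# 45ms: still a solid 20 TPS, but only 5ms of headroom left per tick
print(effective_tps(45))  # 20.0
print(effective_tps(70))  # ~14.3, matching the "severe" row in the table
```

This is why MSPT leads TPS as a warning signal: everything from 1ms to 50ms reads as a flat 20 TPS, so TPS only moves after the headroom is already gone.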
The best tool for monitoring both is Spark, which works on Paper, Spigot, Fabric, Forge, and proxies. Install it, then use these commands in-game:
/spark tps # Shows TPS, MSPT, and CPU usage
/spark profiler start # Start CPU profiler
/spark profiler stop # Stop and generate shareable report
/spark heapsummary # Memory breakdown
/spark gc # Garbage collection activity
Spark's profiler generates flamegraphs showing exactly where server time is spent. Look for the widest sections; those are eating the most time. If a plugin name shows up disproportionately large, that's your problem.
Optimize Before Testing
Testing an unoptimized server wastes time. You'll just discover it needs optimization. Get your server running well first, then stress test to see what it can handle.
Server Software
Paper is the standard for performance. It includes hundreds of optimizations over Spigot while maintaining plugin compatibility. For servers expecting 100+ players, consider Pufferfish (a Paper fork tuned for high player counts). Avoid vanilla Bukkit and CraftBukkit; they lack modern optimizations entirely.
View and Simulation Distance
Of everything you'll touch, these two settings make the biggest difference.
view-distance=8
simulation-distance=4
Every chunk within simulation distance requires entity processing, crop growth, and redstone ticking, which is far more expensive than just rendering terrain. Setting simulation distance lower than view distance gives players good visibility without the processing overhead.
| Server Type | View Distance | Simulation Distance |
|---|---|---|
| Budget / high player count | 5-6 | 3-4 |
| Mid-range | 7-8 | 4-5 |
| High-performance | 10+ | 6-8 |
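To see why simulation distance matters so much, count the chunks: a distance of d covers roughly a (2d+1) × (2d+1) square of chunks around each player (a simplification; the exact shape varies by implementation). A quick comparison:

```python
def chunks(distance: int) -> int:
    # Distance d covers approximately a (2d+1) x (2d+1) square of chunks
    return (2 * distance + 1) ** 2

# Old-style 10/10 settings vs the mid-range 8/5 recommendation above
for view, sim in [(10, 10), (8, 5)]:
    print(f"view {view}: {chunks(view)} chunks sent, "
          f"sim {sim}: {chunks(sim)} chunks fully ticked")
```

Dropping simulation distance from 10 to 5 cuts fully-ticked chunks per player from 441 to 121, while a view distance of 8 still sends 289 chunks of visible terrain. The cost scales quadratically, which is why a change of a few chunks is so noticeable.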
Entity Limits
Mob AI is one of the most expensive operations on any Minecraft server. Tighten the defaults in paper-world-defaults.yml:
entities:
  spawning:
    spawn-limits:
      monster: 20
      creature: 5
      water_creature: 2
      water_ambient: 2
      ambient: 1
    ticks-per-spawn:
      monster: 10
      creature: 400
      water_creature: 400
      ambient: 400
  behavior:
    max-entity-collisions: 2 # Down from 8
Pregenerate Chunks
Chunk generation is brutally expensive. TPS can plummet from 20 down to 2-8 while generating new terrain. Use the Chunky plugin to pregenerate before players explore:
/chunky world world
/chunky radius 5000
/chunky start
Run this overnight or on a separate server, then copy the world files to production. Never pregenerate while players are online.
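For a sense of scale, here's a rough estimate of what that 5000-block radius covers, assuming Chunky's default square shape; the generation rate is a made-up placeholder that varies wildly with hardware and world settings:

```python
RADIUS_BLOCKS = 5000
CHUNK_SIZE = 16       # blocks per chunk side
GEN_RATE = 40         # chunks/second -- assumed; measure your own hardware

# A square of radius 5000 spans 10,000 blocks per side
side_chunks = 2 * RADIUS_BLOCKS // CHUNK_SIZE
total = side_chunks ** 2
hours = total / GEN_RATE / 3600
print(f"{total:,} chunks, roughly {hours:.1f} hours at {GEN_RATE} chunks/s")
```

Nearly 400,000 chunks even for a modest 5000-block radius, which is why "run it overnight" is not an exaggeration and why letting players generate this on demand destroys TPS.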
Java Flags
The right JVM flags depend on your Java version and hardware. Java 25 brought production-ready ZGC and Compact Object Headers (JEP 519), which together are now the recommended choice for servers with sufficient hardware. For older Java versions or smaller setups, Aikar's G1GC flags remain solid.
Replace the memory values with your available RAM (Xms and Xmx should match):
java -Xms10G -Xmx10G -XX:+UseZGC \
-XX:+UseCompactObjectHeaders -XX:+AlwaysPreTouch \
-XX:+DisableExplicitGC -XX:+PerfDisableSharedMem \
-XX:+UseDynamicNumberOfGCThreads -jar paper.jar --nogui
ZGC is self-tuning, so you don't need to hand-tune GC parameters. Compact Object Headers reduce per-object header size from 12 to 8 bytes, lowering memory usage and GC pressure across the board.
Requirements: 8+ CPU cores and 6GB+ allocated RAM. On smaller hardware, use G1GC instead.
java -Xms10G -Xmx10G -XX:+UseG1GC -XX:+ParallelRefProcEnabled \
-XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions \
-XX:+DisableExplicitGC -XX:+AlwaysPreTouch \
-XX:G1NewSizePercent=30 -XX:G1MaxNewSizePercent=40 \
-XX:G1HeapRegionSize=8M -XX:G1ReservePercent=20 \
-XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4 \
-XX:InitiatingHeapOccupancyPercent=15 \
-XX:G1MixedGCLiveThresholdPercent=90 \
-XX:G1RSetUpdatingPauseTimePercent=5 \
-XX:SurvivorRatio=32 -XX:+PerfDisableSharedMem \
-XX:MaxTenuringThreshold=1 -jar paper.jar --noguiG1GC vs ZGC
G1GC (Aikar's flags) divides the heap into regions and collects garbage in phases, aiming for a 200ms pause time target. It still produces stop-the-world pauses that scale with heap size, but Aikar's tuning parameters optimize it for Minecraft's allocation patterns: lots of short-lived objects from chunk loading, entity processing, and packet handling. Works well on modest hardware and smaller heaps.
ZGC does nearly all garbage collection concurrently with the application. Pause times stay under 1ms regardless of heap size. That's not a typo. Where G1GC can spike to 200ms+ (long enough for players to notice rubber-banding), ZGC pauses are measured in microseconds. brucethemoose's testing found no measurable server throughput hit, and Netflix's production data on Generational ZGC showed a 6-8% throughput improvement over G1 in allocation-heavy workloads. The tradeoff is higher CPU usage, since ZGC runs its collection threads concurrently instead of pausing everything.
Compact Object Headers (JEP 519) complement either collector. They shrink object headers from 12 bytes to 8 bytes, reducing heap usage and improving cache locality. The SPECjbb2015 benchmark showed 22% less heap usage and 8% less CPU time with compact headers enabled. Amazon validated these numbers across hundreds of production services. For Minecraft specifically, where the server creates millions of small, short-lived objects per tick (chunk data, entity state, packets), that 10-20% reduction in live data memory is significant.
In practice: If your server has 8+ cores and you're running Java 25, use ZGC. Sub-millisecond pauses mean players will never feel a GC hiccup. If you're on older Java, constrained hardware (under 8 cores), or allocating less than 6GB RAM, stick with Aikar's G1GC flags. Monitor with /spark gc either way.
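The rule of thumb above reduces to a few threshold checks. A hypothetical helper, just to make the cutoffs explicit (the function and its thresholds are this section's guidance restated, not part of any official tool):

```python
def pick_gc(java_version: int, cores: int, heap_gb: int) -> str:
    """Collector choice per this guide's rule of thumb: ZGC with Compact
    Object Headers on Java 25 with 8+ cores and 6GB+ heap, otherwise
    Aikar's G1GC flags."""
    if java_version >= 25 and cores >= 8 and heap_gb >= 6:
        return "ZGC + Compact Object Headers"
    return "G1GC (Aikar's flags)"

print(pick_gc(25, 12, 10))  # ZGC + Compact Object Headers
print(pick_gc(21, 4, 8))    # G1GC (Aikar's flags)
```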
For deeper benchmarks and analysis, see brucethemoose's Minecraft Performance Flags Benchmarks and Obydux's modern startup flags.
Conducting the Stress Test
The key is testing incrementally. Connecting 500 bots at once teaches you nothing except "it crashes."
Phase 1: Baseline
Start with an empty server. Record idle TPS (should be 20.0), MSPT (should be under 10ms), memory usage, and CPU. This is your reference point.
Phase 2: Gradual Load
Increase bots in steps and hold each level for 10-15 minutes:
| Step | Bots | Goal |
|---|---|---|
| 1 | 10 | Confirm test setup works |
| 2 | 25 | Watch for early degradation |
| 3 | 50 | Typical peak for small servers |
| 4 | 100 | Mid-size server stress test |
| 5 | 200+ | Find the breaking point |
At each step, record TPS, MSPT (current, median, and 95th percentile), memory trends, and any errors. Run a Spark profiler for 2 minutes during each level to capture where time is being spent.
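Spark reports these percentiles for you, but if you log raw per-tick MSPT samples yourself, the median and 95th percentile are stdlib one-liners:

```python
import statistics

def mspt_summary(samples: list[float]) -> dict:
    """Median and 95th-percentile MSPT from raw per-tick samples."""
    qs = statistics.quantiles(samples, n=100)  # qs[94] is the 95th percentile
    return {"median": statistics.median(samples), "p95": qs[94]}

# e.g. a mostly-healthy load level with occasional slow ticks
samples = [12.0] * 90 + [48.0] * 10
print(mspt_summary(samples))  # median 12.0, p95 48.0
```

The p95 is the number to watch during ramp-up: a median of 12ms looks healthy, but a p95 near 50ms means one tick in twenty is already at the edge of the budget.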
Phase 3: Realistic Behavior
Idle bots standing still don't represent real players. Add movement and actions to hit the parts that actually slow down:
- Movement: tests entity tracking, pathfinding, collision detection, and chunk loading. Random walking is enough to stress entity tracking and collision. Check EntityAI CPU usage and chunk loading frequency in Spark.
- Exploration: tests world I/O and chunk generation. Bots flying or moving to unexplored areas reveal whether your pregeneration was sufficient. Watch for disk I/O spikes and TPS drops.
- Combat: tests mob AI, damage calculations, and entity death/spawn cycles. High entity AI usage and frequent state changes stress the server differently than passive movement.
Phase 4: Spike Testing
Real servers get sudden load spikes. Think 100 players joining within 30 seconds during an event. Start with 10 bots connected, then join 50 simultaneously. Watch whether TPS drops below 15 during the surge and how long recovery takes.
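One way to quantify the surge, assuming you log TPS at a fixed interval during the join wave (the sample values here are invented for illustration):

```python
def spike_recovery(tps_series: list[float], threshold: float = 15.0,
                   interval_s: int = 5) -> dict:
    """Given TPS sampled every `interval_s` seconds across a join surge,
    report the worst dip and how long TPS stayed below the threshold."""
    below = [t for t in tps_series if t < threshold]
    return {
        "worst_tps": min(tps_series),
        "seconds_below": len(below) * interval_s,
    }

# Hypothetical samples across a 50-bot surge: dip, then recovery
surge = [20.0, 19.5, 14.2, 11.8, 13.5, 16.0, 18.9, 20.0]
print(spike_recovery(surge))  # worst dip 11.8 TPS, 15s below 15 TPS
```

A brief dip is usually acceptable; a server that takes minutes to claw back to 20 TPS after a surge has a deeper problem, often chunk loading on the main thread.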
Phase 5: Endurance
Some issues only show up after hours: memory leaks, gradual resource exhaustion, increasing GC pause times. Connect 50-75% of your target capacity and let it run for 4-8 hours. If memory keeps climbing or MSPT creeps up over time, something's leaking.
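A quick way to put a number on "memory keeps climbing" is to fit a slope to periodic heap samples. The sample data below is fabricated to show what roughly 60 MB/hour of steady growth looks like:

```python
def leak_slope(samples_mb: list[float], interval_min: int = 10) -> float:
    """Least-squares slope (MB per hour) of heap samples taken every
    `interval_min` minutes. A persistently positive slope, long after
    GC has had time to settle, suggests a leak."""
    n = len(samples_mb)
    xs = [i * interval_min / 60 for i in range(n)]  # elapsed hours
    mx, my = sum(xs) / n, sum(samples_mb) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, samples_mb))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical 4-hour run, sampled every 10 minutes, +10 MB each sample
samples = [4000 + 10 * i for i in range(25)]
print(f"{leak_slope(samples):.0f} MB/hour")  # 60 MB/hour
```

Healthy servers sawtooth: memory rises, GC reclaims it, and the post-GC floor stays flat. It's the floor trending upward that matters, so sample after full GC cycles rather than at random moments.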
Identifying Bottlenecks
TPS drops proportionally with player count. This is general overhead from entities, chunks, and networking. The fix is straightforward: reduce view/simulation distance and entity caps.
Random TPS spikes are a different beast. Usually it's chunk generation, large redstone devices, or a plugin running heavy tasks on the main thread. Pregenerate more of the world and profile during the spikes. The flamegraph will tell you exactly what's blocking.
MSPT creeping up over time points to memory pressure or resource exhaustion. Pull up Spark's heap summary and look for unusual object counts or memory usage that just keeps growing.
If TPS looks fine but CPU is pinned above 80%, you're closer to the limit than you think. There's no headroom for spikes. Optimize further or upgrade hardware.
When Spark shows a specific plugin consuming more than 10% of CPU, investigate its configuration first. Many plugins have expensive features that can be disabled. If entity AI dominates the profile (>30%), lower mob caps. If chunk operations are excessive, tighten chunk loading rates in paper-global.yml.
Using SoulFire for Testing
I'd recommend SoulFire for this. Unlike most bot frameworks that reimplement the Minecraft protocol from scratch, SoulFire runs actual Fabric client code, so bots behave exactly like real players at the protocol level. You don't get weird packet timing or broken physics that throws off your results. The load is basically identical to real players, which is the whole point.
It also supports multiple Minecraft versions through ViaFabricPlus, so you can test whatever version you're running without hunting for a compatible tool.
Here's how I usually run a test:
- Set your target server and bot count in the SoulFire interface
- Stagger the joins (1-3 seconds between bots) so you don't flood the login server
- Turn on the movement and activity plugins so bots aren't just standing there
- Keep Spark running on the server the whole time
- Hold each load level for 15-30 minutes before bumping it up
When to Test
Don't just test once. Run tests before launches, after major plugin updates, before events with expected high turnout, and after hardware changes. A monthly baseline test catches gradual degradation from accumulating data and slow memory leaks that daily monitoring misses.
The process is: optimize first, monitor with Spark, test incrementally with realistic behavior, fix what the profiles reveal, and retest. That's really all there is to it.