I was too lazy to edit the Sonic, Fantom, and Avalanche logos together myself, so I asked Grok to do it. Claude and ChatGPT were even worse, if you can believe it.
Foreword
My original note about load testing Sonic got a lot more visibility than I expected. If you’re here from Sonic / Fantom you probably don’t know me, because I’ve spent most of the last two years in Avalanche communities.
I’ll try to be unbiased, even though I know quite a bit more about how Avalanche works. I’m writing this foreword (and most of this article) before actually doing the final tests, no idea how it will turn out. Please don’t assume I cheated if Avalanche looks better when we get to the results.
FYI - I’M JUST SOME GUY
These tests are not official in any way. They don’t test anything standard. There might be bugs. There might be inaccuracies. Please read the details about how the tests were conducted, which are included later on.
This is absolutely not the top performance for any of the chains tested, if testing such a thing is even possible.
I do think that it’s a good directional indication of how the chains perform relatively to each other, using one guy’s arbitrary idea of how that should be done.
Background
I just recently botted the non-public ETH→Sonic USDC bridge for a few friends so they could buy into AnonAI early. While doing some bulk actions, I couldn’t help but notice that transactions zip along pretty fast, so I started reading the docs on the Sonic Labs website.
Personally I think these are a bit too vague and high level to explain what the magic juice is for Sonic. I “read” the Lachesis whitepaper linked from the docs (not smart enough to understand it) but that’s the consensus used in Fantom already, so not something new. I tried asking on Discord for pointers to more information but no response.
So I figured, why not just test both and see how they do relative to each other?
First test
Seemed like a simple idea. I have all kinds of tools from previous projects that should make this quick. I hacked up a load test and kicked it off on 100 different machines.
Just an offhand thing, so I didn’t do the best job of explaining what I did. And honestly I didn’t collect enough details for a really good load test. So I decided to put a bit more work into it.
Huge mistake.
You need a PhD in load testing
I’m exaggerating, but god damn this was a lot harder than I expected it to be. Partially because I wanted to be a bit efficient, I guess. It’s a lot easier if you just spin up 1,000 machines to do a test rather than if you try to make it work with only 100. Let’s talk about some of the problems I ran into.
Rate limiting?
When I did my initial tests, I was pleased to see that the RPCs were super performant and that there were really basically no rate limits. That made testing pretty easy.
Well, I have no inside information, but I wouldn’t be surprised if they made some changes in the last 24H because some asshole kept sending 10K transactions over the span of 2 minutes.
Or maybe not. Could be the phase of the moon, maybe the RPCs are drunk, maybe I hallucinated or ran insufficient tests. Anyway, it went significantly worse on my retests, and I had to make a lot of adjustments, including increasing the bot count from 100 to 200. Probably would have been better with 300-500 but I’m cheap.
Nonces!
When you send a transaction, you need to include a nonce, which is the count of transactions you’ve sent previously. So if you send 10 transactions from a fresh account, they go out with nonces 0, 1, 2… 9.
I have some code to handle nonces efficiently, but apparently it doesn’t deal well with trying to send a massive number of transactions at high speed with potentially flaky RPC endpoints. I ended up writing some custom stuff for this experiment.
Still, I bumped into issues. For example, in one test, a bot only got two transactions onto Fantom. After investigating, it seems like the RPC node accepted the TX and then just… lost it? Or whatever, it never made it on chain.
Then it kept sending the next 200 transactions with nonces above that, and since transactions from an account can only be included in nonce order, they all just sat there until they timed out.
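A minimal sketch of the local nonce tracking I’m describing (the class and names here are mine for illustration, not the actual bot code): keep a local counter instead of fetching before every send, and when a dropped transaction strands everything after it, resync from the chain’s confirmed count.

```python
class NonceTracker:
    """Tracks nonces locally instead of fetching before every send.

    If an RPC node silently drops a transaction, every later nonce is
    stranded, so on timeout we resync from the chain's confirmed count
    (what eth_getTransactionCount would report) and re-send from there.
    """

    def __init__(self, start_nonce=0):
        self.next_nonce = start_nonce

    def take(self):
        # Hand out the next nonce and bump the local counter.
        nonce = self.next_nonce
        self.next_nonce += 1
        return nonce

    def resync(self, confirmed_count):
        # confirmed_count is the chain's view of how many transactions
        # landed; rewinding lets us re-send everything after the lost one.
        self.next_nonce = confirmed_count


tracker = NonceTracker()
sent = [tracker.take() for _ in range(5)]    # nonces 0..4 go out
# Suppose the RPC lost nonce 2: only 0 and 1 ever confirm on chain.
tracker.resync(confirmed_count=2)
resend = [tracker.take() for _ in range(3)]  # re-issue nonces 2, 3, 4
```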
Other weirdness and mistakes
Unfortunately I didn’t keep a good log but I ran into all kinds of problems, which I mostly blame on RPC flakiness. It was a frustrating day. Suspicious how much easier that one-off test was, I probably did something wrong.
I made so many dumb mistakes, it’s remarkable. E.g. just now I accidentally re-ran the Fantom load test again after I had already swept the gas out of the accounts. I forgot to fund accounts several times, ran out of gas several times, funded the accounts incorrectly, got rate limited by Google Cloud, you name it.
Test details
You can nitpick any of these if you want, but at least they’re standardized across all the tests, which is (in my opinion) the most important thing. Once again, we’re not trying to measure ‘max performance’, just ‘relative for this test’.
200 bots, running on Google Cloud, spread across 6 different regions (against the common wisdom which is ‘run in us-east-1 for best performance’).
0.1s pause between sending each transaction (not including transmission time). This is done to try and minimize RPC flakiness, which was an issue at lower pause times. It puts the absolute max TPS per bot at 10, but 3 is a more realistic number, which makes the max effective TPS 600 across the 200 bots.
Test runs for one minute. Reduced from two minutes to limit the possibility of rate limiting, to save money, and to reduce the impact on actual users of the chains.
Past tests have shown empirically that ‘multicasting’ TX (sending the same TX to multiple RPC nodes) improves time to inclusion. In this case, Sonic only has two public nodes (official and DRPC) so for parity I used the official node and DRPC for all chains.
The transactions are pre-built and cached ahead of time, then issued one by one.
Transactions use a fixed 30K gas limit and are EIP-1559 transactions with a fixed per-chain max gas price (set high enough that it’s never exceeded). Nonces are calculated locally, not fetched prior to sending.
Test runs for 1 minute with a fixed start time and end time. The monitor will watch for blocks with TX from the bots in them and record relevant data, and summarize it.
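Roughly, each bot’s send loop looks like the sketch below. This is illustrative, not my actual code: the endpoint URLs are placeholders, and the transactions are assumed to be pre-signed EIP-1559 raw transactions as described above.

```python
import json
import time
import urllib.request

# Placeholder endpoints standing in for the official RPC + DRPC.
RPC_URLS = ["https://rpc.example-official.org", "https://rpc.example-drpc.org"]

SEND_PAUSE = 0.1   # pause between sends, excluding transmission time
RUN_SECONDS = 60   # fixed one-minute window


def multicast(raw_tx):
    """Send the same signed transaction to every RPC node.

    Duplicates are harmless: a node that already has the transaction
    rejects the second copy, and inclusion latency tends to improve
    because whichever node is fastest wins.
    """
    for url in RPC_URLS:
        body = json.dumps({
            "jsonrpc": "2.0", "id": 1,
            "method": "eth_sendRawTransaction", "params": [raw_tx],
        }).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        try:
            urllib.request.urlopen(req, timeout=5)
        except OSError:
            pass  # flaky RPC: the other endpoint may still accept it


def run_bot(signed_txs, start_time):
    # signed_txs: pre-signed transactions (30K gas limit, fixed per-chain
    # max fee, locally calculated nonces), issued one by one.
    while time.time() < start_time:
        time.sleep(0.01)            # all bots begin at the same instant
    for raw_tx in signed_txs:
        if time.time() - start_time > RUN_SECONDS:
            break                   # hard stop at the end of the window
        multicast(raw_tx)
        time.sleep(SEND_PAUSE)
```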
Latency vs throughput
Andre brings up a good point here but I don’t think it matters much for this test. Worth investigating for a really comprehensive attempt at maximizing TPS, but even then I doubt it would matter if you’re willing to throw enough resources at it.
My reasoning is that the latency of the RPCs only really impacts the time to land on chain (which is not a focus of this test). If the RPC responds slowly, or doesn’t gossip fast enough to validators, well, it will still get included eventually.
Over the course of 1-2 minutes that kind of displacement should get lost in the noise; if you can only mine 800 transactions per second, it doesn’t matter if they were broadcast one second ago or five seconds ago.
It would only matter if the public RPC nodes weren’t up to the task of handling the throughput, which is probably worth knowing anyway right? Hard to say the chain can do X TPS when the RPC nodes can’t accept them.
Costs
The compute costs were actually the largest part of running these tests. Gas for 10K minimal transactions is quite low, given the average gas price on these three chains.
Depending on the run I was doing, the compute cost between $5 and $20 to do one of these tests. Could have been optimized lower but whatever. Probably another $5-$30 for gas for each test (Avalanche load tests are expensive).
In total I probably spent about $600 on this, way too much on this for just a random hobby project. I’ll return some of my kid’s Christmas presents, he’s a pain in the ass anyway.
I think that if you did want to try and measure max performance for a chain, you could probably get close by scaling this test up 3x-5x to ensure saturation.
On Avalanche at least, I know that would be pretty expensive as the gas scaling kicks in. I’m not clear on how the scaling works on Sonic/Fantom but I haven’t been able to make it budge with my tests, they have targets but I believe they’re set much higher.
Expectations
My completely unscientific guess is that the results will be Sonic > Avalanche > Fantom for TPS.
One thing to note is that there’s a big difference between ‘transactions accepted’ and ‘transactions included in blocks’. My original test for Sonic showed 850 accepted transactions per second, but they got included in blocks at a much lower speed, and had quite a long tail of smaller TX blocks after the flood had stopped.
The ‘accepted speed’ is really just a measurement of my bots and the RPC nodes themselves, although it does put an upper limit on the TPS that the chain can actually provide.
My expectation is that none of the chains will finish including TX immediately (within 1s-2s) after the flood ceases. It will be interesting to see what the slope of included TX looks like after the end time. I would like to see it remain relatively flat and then dip straight to zero.
Of the three chains, only Avalanche lacks sub-second block times. So I expect to see some seconds with zero TPS on Avalanche.
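The accepted-vs-included split above is what the monitor measures. A sketch of the bucketing, under my own simplified data shape (real code would pull blocks via eth_getBlockByNumber and match senders against the bot addresses):

```python
def summarize(blocks, bot_addresses, start_ts, end_ts):
    """Split bot transactions into the runtime window and the tail.

    `blocks` is a list of (timestamp, [sender_address, ...]) pairs,
    a simplified stand-in for what the monitor records per block.
    """
    runtime, post = 0, 0
    for ts, senders in blocks:
        bot_txs = sum(1 for s in senders if s in bot_addresses)
        if ts <= end_ts:
            runtime += bot_txs
        else:
            post += bot_txs          # the "spill over" after the flood
    window = end_ts - start_ts
    return {
        "runtime_tx": runtime,
        "runtime_tps": runtime / window,
        "post_tx": post,
    }


# Toy data: a 10-second window, one straggler block after it ends.
bots = {"0xbot1", "0xbot2"}
blocks = [
    (100, ["0xbot1", "0xother"]),
    (105, ["0xbot2", "0xbot1"]),
    (112, ["0xbot2"]),               # lands after end_ts: counted as tail
]
print(summarize(blocks, bots, start_ts=100, end_ts=110))
```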
Quick test results
Before starting I’ll run a test with just one bot to see what it looks like (no fancy charts, just bot output).
Fantom
Total bot transactions: 252
Runtime window statistics:
Transactions: 201
TPS: 3.35
Post-runtime window statistics:
Transactions: 51
TPS: 6.38
Time to reach transaction percentiles:
Time to 95%: 66.00 seconds
Time to 100%: 68.00 seconds
Sonic
Total bot transactions: 355
Runtime window statistics:
Transactions: 354
TPS: 5.90
Post-runtime window statistics:
Transactions: 1
TPS: 1.00
Time to reach transaction percentiles:
Time to 95%: 58.00 seconds
Time to 100%: 61.00 seconds
Avalanche
Total bot transactions: 290
Runtime window statistics:
Transactions: 285
TPS: 4.75
Post-runtime window statistics:
Transactions: 5
TPS: 5.00
Time to reach transaction percentiles:
Time to 95%: 59.00 seconds
Time to 99%: 61.00 seconds
Summary
Not a huge difference here, and it can probably mostly be attributed to the speed of the RPCs.
Reviewing the logs for the run, Fantom was the only test to experience RPC flakiness, which probably impacted the total throughput.
Sonic’s RPCs consistently returned faster than Avalanche’s, ~0.16s vs ~0.19s, which explains the TPS difference.
Fantom had a surprising and considerable amount of ‘spill over’ after the test. Sonic had less spill over than Avalanche, but that’s to be expected given Avalanche’s longer block time; I’d say both kept up with the transactions promptly.
Full test results
Fantom
Processed blocks from 100882092 to 100882390
Load Test Summary:
Used native: 0.497300722597066983
Min transactions per bot: 4
Average transactions per bot: 37.90
Max transactions per bot: 271
Total bot transactions: 7580
Runtime window statistics:
Transactions: 6136
TPS: 102.27
Post-runtime window statistics:
Transactions: 1444
TPS: 12.13
Time to reach transaction percentiles:
80%: 59.00 seconds
90%: 81.00 seconds
95%: 125.00 seconds
99%: 166.00 seconds
100%: 179.00 seconds
I think it’s become clear that the Fantom RPCs are quite a bit worse than the Sonic ones. Of the transactions that did land, most landed in the test window, but there was a 120s tail for the remainder.
Blue line is the blocks per second, you can see it hovers around 1-2 with a few gaps to 3 or 0.
Sonic
Processed blocks from 1364593 to 1365309
Load Test Summary:
Min transactions per bot: 147
Average transactions per bot: 196.69
Max transactions per bot: 395
Total bot transactions: 39337
Runtime window statistics:
Transactions: 24621
TPS: 410.35
Post-runtime window statistics:
Transactions: 14716
TPS: 69.74
Time to reach transaction percentiles:
80%: 88.00 seconds
90%: 118.00 seconds
95%: 157.00 seconds
99%: 206.00 seconds
100%: 271.00 seconds
The Sonic RPCs are pretty good, and the chain handled quite a few more TPS than Fantom.
But the tail for inclusion here is VERY long. Those last few TX between 99% and 100% were lost in the phantom zone for over a minute.
Pretty consistently builds 2-3 blocks per second. Interestingly, it produces fewer but larger blocks at the start, and then smaller but faster blocks at the end. I guess that’s a good thing.
Avalanche
Processed blocks from 54774059 to 54774121
Load Test Summary:
Min transactions per bot: 69
Average transactions per bot: 197.13
Max transactions per bot: 283
Total bot transactions: 39427
Runtime window statistics:
Transactions: 17618
TPS: 293.63
Post-runtime window statistics:
Transactions: 21809
TPS: 42.51
Time to reach transaction percentiles:
80%: 338.00 seconds
90%: 410.00 seconds
95%: 455.00 seconds
99%: 526.00 seconds
100%: 573.00 seconds
Uh oh! I fucked up.
Block 54776336: 0 bot tx, 10 total tx
Block 54776337: 0 bot tx, 7 total tx
Block 54776338: 689 bot tx, 690 total tx
Block 54776339: 678 bot tx, 682 total tx
Block 54776340: 0 bot tx, 7 total tx
Block 54776341: 0 bot tx, 1 total tx
Block 54776342: 0 bot tx, 7 total tx
Looks like I did NOT set the max gas properly for this load test. The blocks with 0 TX before suddenly getting a huge chunk are when the minimum fee for the block exceeded my TX fees.
I see a suspicious number of blocks with ~700 transactions, taking a look at one:
Suddenly it makes more sense. The block is literally full of transactions. Some quick math explains why; 15M (max gas per block) / 21.5K (gas used by my TX) = 697.
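The back-of-the-envelope version of that math:

```python
block_gas_limit = 15_000_000  # max gas per block on the C-Chain at the time
tx_gas_used = 21_500          # gas burned by one of my minimal transactions

# Integer division: how many of my transactions physically fit per block.
txs_per_block = block_gas_limit // tx_gas_used
print(txs_per_block)  # 697: exactly the suspicious block size
```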
I’ve heard people say that Avalanche has hit 850 TPS, which looking at the raw math, now seems a bit suspicious. I’m not sure it can produce multiple blocks per second at the moment. Here’s an example of the first few blocks, including their timestamp.
Block 54776239 at 1734969921: 660 bot tx, 669 total tx
Block 54776240 at 1734969922: 610 bot tx, 620 total tx
Block 54776241 at 1734969923: 651 bot tx, 667 total tx
Block 54776242 at 1734969927: 699 bot tx, 701 total tx
Block 54776243 at 1734969928: 667 bot tx, 673 total tx
Block 54776244 at 1734969929: 451 bot tx, 461 total tx
Block 54776245 at 1734969930: 669 bot tx, 678 total tx
Block 54776246 at 1734969932: 699 bot tx, 701 total tx
I’ve highlighted a few gaps of more than one second between blocks. But there are plenty of examples of it hitting exactly one block per second.
Block 54776250 at 1734969940: 696 bot tx, 698 total tx
Block 54776251 at 1734969941: 675 bot tx, 678 total tx
Block 54776252 at 1734969942: 573 bot tx, 582 total tx
Block 54776253 at 1734969943: 393 bot tx, 395 total tx
Block 54776254 at 1734969944: 491 bot tx, 497 total tx
Block 54776255 at 1734969945: 481 bot tx, 484 total tx
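Spotting those gaps is easy to automate. A small sketch over the (block number, timestamp) pairs from the monitor output, using the first run of blocks above as sample data:

```python
def block_gaps(blocks, min_gap=2):
    """Find spots where consecutive blocks are min_gap or more seconds apart.

    `blocks` is a list of (block_number, unix_timestamp) tuples, the
    shape of the monitor output shown above.
    """
    gaps = []
    for (n1, t1), (n2, t2) in zip(blocks, blocks[1:]):
        if t2 - t1 >= min_gap:
            gaps.append((n1, n2, t2 - t1))
    return gaps


# The first run of Avalanche blocks from the output above:
sample = [
    (54776239, 1734969921), (54776240, 1734969922),
    (54776241, 1734969923), (54776242, 1734969927),
    (54776243, 1734969928), (54776244, 1734969929),
    (54776245, 1734969930), (54776246, 1734969932),
]
print(block_gaps(sample))  # [(54776241, 54776242, 4), (54776245, 54776246, 2)]
```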
So why the gaps? I speculate that some of it is caused by under-provisioned validators; I know Avalanche saw reduced liveness during the inscriptions craze because some validators could not keep up with block production. I imagine there’s a visualization somewhere that shows the historical slot misses by validators, but I’m not sure where that is.
No chart for Avalanche alone, looks kind of ugly, see below for combined charts.
All combined on a chart
Since I goofed on Avalanche, I only used the first 30 seconds of data to keep the three charts in parity.
Missing timestamps for Avalanche make it ugly, so here’s a 3s moving average version.
Finally, cumulative transaction counts over the first 30s.
Summary
It seems like Sonic is a huge improvement over Fantom, but I think some of the distinction can be attributed to the RPC nodes (eating my words from earlier right now). It does consistently produce more blocks per second, which is a good improvement regardless of throughput.
Avalanche consistently produces larger and fuller blocks (up to the gas limit). Gas price scales up relatively quickly compared to the other chains. A surprising number of timestamps with zero blocks produced. It seems to do better at producing blocks on time (one per second) when the transactions per second included are lower.
The gas limit on Sonic seems insanely high, and the gas target is also pretty extreme, I didn’t experience any gas price increases over the baseline. No zero-block timestamps, a max of four blocks per second.
Thoughts
Note to self: stop doing expensive hobby research.
Location
One interesting outcome is that bots launched in non-US locations had a distinctly worse time getting transactions included than US-based bots. Those poor Australia bots! But even Europe had a worse time than the US. The tyranny of the speed of light in action.
“Actual load testing”
I’ve said this like 10 times in this post but I’m not really sure what an actual load test even means. But if I wanted to attempt it, I’d probably use the same code but:
Only run in us-east1.
Use a validator for each chain in addition to the primary RPC and DRPC. Maybe also use QuickNode (paid node service), I’ve had good results with them.
Launch more bots, like a lot more. Maybe 800.
Be safer in sending TX from each bot, probably shoot for 2/s since none of these chains can currently handle 1600 TPS.
Set higher max gas for Avalanche, a lot higher, and fund the accounts more.
Just run for 30s, I don’t think 60s is meaningfully more useful.
Afterword
Apparently Sonic is running in something called “safety mode”. This means that there is a more even distribution of stake, which generally leads to higher finalization times because more nodes (possibly lower-performance ones) need to learn about the transactions to finalize them.
In general I’d like to see more technical docs about Sonic, this seems like the kind of thing I should be able to read about. Will probably give it another go once it’s out of safety mode to see what the difference looks like.
If you’re reading this and you know how to visualize the DAG prior to inclusion in the main chain, PLEASE reach out to me to let me know, I would love to see what it looks like under load.
I’m also open to testing other L1s with cheap gas if you have suggestions. I don’t think it’s worth testing L2s with centralized sequencers, it’s not a fair comparison or even interesting.