The Ulm maul and battleaxe troops have odd results.
The maul troop has slightly lower defense but is otherwise the same, yet in this readout it does better than the battleaxe troop. If anything, it should be the other way around.
Something else you need to consider when running these tests is that match-ups versus all units include lots of lightly armored troops, a much higher percentage than you would see in a game against humans. This skews the results in favor of troops that do well versus light armor, such as flail troops that do low damage but strike twice. Such troops are much less effective versus heavy armor, which is more popular when playing against humans.
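To illustrate the skew, here is a rough sketch in Python. It is not the game's actual combat formula: it assumes each strike simply deals weapon damage minus armor (never below zero), and every number in it (the weapon stats, the armor values, the pool make-up) is made up for the example.

```python
def damage_per_round(weapon_damage, strikes, armor):
    """Simplified model (assumption): each strike deals damage minus armor, floored at 0."""
    return strikes * max(0, weapon_damage - armor)


def average_damage(weapon_damage, strikes, pool):
    """Average damage per round over a pool given as {armor value: number of opponents}."""
    total = sum(count * damage_per_round(weapon_damage, strikes, armor)
                for armor, count in pool.items())
    return total / sum(pool.values())


# Hypothetical weapons as (damage, strikes per round)
FLAIL = (9, 2)          # low damage, strikes twice
HEAVY_WEAPON = (13, 1)  # high damage, strikes once

# Hypothetical opponent pools: armor 4 is "light", armor 12 is "heavy"
ALL_UNITS_POOL = {4: 7, 12: 3}   # mostly light armor, like a test versus all units
HUMAN_GAME_POOL = {4: 3, 12: 7}  # mostly heavy armor, like a game against humans

for name, pool in (("all units", ALL_UNITS_POOL), ("human game", HUMAN_GAME_POOL)):
    flail = average_damage(*FLAIL, pool)
    heavy = average_damage(*HEAVY_WEAPON, pool)
    print(f"{name}: flail {flail:.1f}, heavy weapon {heavy:.1f}")
```

With these made-up numbers the flail comes out ahead against the light-heavy pool and behind against the armor-heavy one, which is the kind of reversal I mean. Weighting the match-ups by how often each opponent actually shows up would be one way to correct for it.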
Just some considerations. I applaud the effort that you're putting into this work.
