LLM Game Benchmark Leaderboard





Game Type | Prompt Type | Prompt Version | LLM (1st) | LLM (2nd) | Win Ratio (1st) | Win Ratio (2nd) | Wins (1st) | Wins (2nd) | DQ (1st) | DQ (2nd) | Draws | Invalid Moves Ratio (1st) | Invalid Moves Ratio (2nd) | Total Moves (1st) | Total Moves (2nd) | Provider Email | Date-Time | UUID

*DQ: Disqualification. A player is disqualified after making a set number of invalid moves in a single game: 3 in Tic-Tac-Toe, 6 in Connect Four, and 15 in Gomoku. A move is invalid when the LLM's response does not follow the specified format, gives a row or column outside the allowed range, or chooses a position that is already occupied by a previous move.
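The disqualification rule above can be sketched in Python. This is a minimal illustration, not the benchmark's actual code: the names `DQ_THRESHOLDS`, `classify_move`, and `is_disqualified` are hypothetical, the board is assumed to be a zero-indexed grid with `None` marking empty cells, and a player is assumed to be disqualified as soon as the invalid-move count reaches the threshold.

```python
from typing import List, Optional

# Thresholds taken from the note above; the dictionary keys are hypothetical labels.
DQ_THRESHOLDS = {"tic-tac-toe": 3, "connect-four": 6, "gomoku": 15}

Board = List[List[Optional[str]]]


def classify_move(board: Board, row: Optional[int], col: Optional[int]) -> Optional[str]:
    """Return the reason a move is invalid, or None if the move is valid.

    Assumption: row/col are None when the LLM's response could not be
    parsed into the specified format.
    """
    if row is None or col is None:
        return "bad_format"
    if not (0 <= row < len(board) and 0 <= col < len(board[0])):
        return "out_of_range"
    if board[row][col] is not None:
        return "occupied"
    return None


def is_disqualified(game: str, invalid_moves: int) -> bool:
    """Assumption: reaching the threshold triggers disqualification."""
    return invalid_moves >= DQ_THRESHOLDS[game]


if __name__ == "__main__":
    board: Board = [[None] * 3 for _ in range(3)]
    board[1][1] = "X"
    print(classify_move(board, 1, 1))
    print(is_disqualified("tic-tac-toe", 3))
```

Keeping the validity check separate from the threshold check means the same counting logic can serve all three games, with only the threshold differing.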

To submit your results to the leaderboard, send the zip file downloaded after running the game simulation to research.explorations@gmail.com. Please contact us if you have any questions.





For a deeper look at the results of the games, see the Results Matrix.