

Gemini vs Grok vs Claude vs DeepSeek: Which AI tool is best for chess?
A detailed view of a chess board and pieces (Photo by Dean Mouhtaropoulos/Getty Images)

The inaugural day of the AI chess exhibition tournament, hosted by Google’s Kaggle Game Arena project, saw four Large Language Models (LLMs) secure dominant 4-0 victories to advance to the semifinals. Gemini 2.5 Pro, o4-mini, Grok 4, and o3 defeated their respective opponents Claude 4 Opus, DeepSeek R1, Gemini 2.5 Flash, and Kimi k2, showcasing the capabilities of general-purpose AI models in strategic gameplay.

The Kaggle Game Arena, a new initiative by Google-owned Kaggle, aims to evaluate how LLMs perform in competitive environments. The tournament features eight leading LLMs competing in a single-elimination knockout bracket, with games broadcast live on multiple platforms.

Google has partnered with DeepMind to organise this unique tournament, in which the LLMs use a universal controller called a “harness” to visualise positions and submit moves. Each AI gets four attempts to produce a legal move; failing all four results in losing the game. (A minimal sketch of this retry rule appears at the end of this article.)

The match between Kimi k2 and o3 ended quickly, with none of the games lasting beyond eight moves. Kimi k2 repeatedly failed to produce legal moves, despite showing the ability to follow opening theory in the early moves.

o4-mini’s victory over DeepSeek R1 followed a pattern of strong opening moves and declining play quality thereafter. Despite the inconsistencies, o4-mini delivered two checkmates during the match.

“This is a side effect btw. @xAI spent almost no effort on chess,” posted Elon Musk on X, responding to Grok 4’s impressive performance in the tournament.

Gemini 2.5 Pro’s match against Claude 4 Opus featured more checkmates than illegal-move forfeits. In the first game, both AIs played well through move nine, before Claude 4 Opus made a critical error with 10…g5.

Grok 4 delivered the strongest performance of the day, demonstrating particular skill at identifying and capitalising on undefended pieces in its match against Gemini 2.5 Flash.

The tournament has revealed three primary challenges for LLMs in chess: visualising the entire board, understanding piece interactions, and making legal moves. These limitations vary among the different AI models.

The competition continues on Wednesday, August 6, starting at 1 p.m. ET / 19:00 CEST / 10:30 p.m. IST. Viewers can watch the event live on GM Hikaru Nakamura’s Twitch and YouTube channels, as well as on the tournament’s dedicated events page.
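
For readers curious how the “four attempts” legality rule might work in practice, here is a minimal, hypothetical sketch using the open-source python-chess library. The actual Kaggle harness has not been published in detail, so the callback name ask_model_for_move and the retry structure below are illustrative assumptions based only on the rule described above.

    import chess  # open-source python-chess library for board state and move legality

    MAX_ATTEMPTS = 4  # the tournament rule: each AI gets four tries per move

    def play_one_move(board: chess.Board, ask_model_for_move) -> bool:
        """Ask the model for one move; return False on an illegal-move forfeit."""
        for attempt in range(MAX_ATTEMPTS):
            # ask_model_for_move is a hypothetical callback that queries the LLM,
            # here given the current position encoded as a FEN string
            san_move = ask_model_for_move(board.fen(), attempt)
            try:
                board.push_san(san_move)  # raises ValueError if the move is illegal or unparsable
                return True
            except ValueError:
                continue  # let the model retry, up to the attempt limit
        return False  # four failed attempts: the game is lost by forfeit

A real harness would also need to pass the model the move history and handle time limits, but catching the error raised by push_san is enough to detect the illegal-move forfeits that decided several of Tuesday’s games.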





