Abstract: Trustworthy evaluation methods for code snippets play a crucial role in neural code generation. Traditional methods, which either rely on reference solutions or require executable test cases ...
Abstract: This paper presents a quantitative evaluation method for assessing the system stability of thermal power units after adding condenser function. An evaluation system for power system ...
CATArena (Code Agent Tournament Arena) is an open-ended environment where LLMs write executable code agents to battle each other and then learn from each other. CATArena is an engineering-level ...