Code Policy Models
Code Policy Models is an AI research project exploring whether large language models can serve as effective optimizers for game-playing agents by iteratively writing and refining Python code as the policy representation. It replaces neural network weights with human-readable programs, and gradient descent with LLM-guided code editing. The system runs an evolutionary loop: in each generation, the LLM produces candidate policy edits, which are evaluated through parallel rollouts in a target environment (currently Pokemon Blue running on a headless Game Boy emulator). Optional Gemini video analysis provides multimodal feedback on agent behavior. A tournament selection mechanism pits multiple LLM-generated candidates against an elite policy to balance exploration with stability, and the full rollout trajectory and reward signal are fed back as context for the next generation's edits.
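
The generational loop described above can be illustrated with a minimal sketch. It is not the project's implementation: every name here (`rollout`, `evaluate`, `propose_edit`, `evolve`) is hypothetical, the "environment" is a toy arithmetic task rather than a Game Boy emulator, and `propose_edit` is a random stub standing in for the LLM call. The sketch only shows the shape of the idea: policies are Python source strings, candidates are scored by parallel rollouts, and a candidate replaces the elite only if it wins the tournament.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

Policy = str  # a policy is plain Python source that defines `act(obs) -> int`


def rollout(policy_src: Policy, seed: int) -> float:
    """Toy stand-in for one emulator rollout (assumption, not the real env).

    The reward is the negative distance between the policy's action and a
    hidden target of `obs + 3`, summed over 10 steps.
    """
    namespace: dict = {}
    exec(policy_src, namespace)  # load the candidate's `act` function
    rng = random.Random(seed)
    total = 0.0
    for _ in range(10):
        obs = rng.randint(0, 9)
        total -= abs(namespace["act"](obs) - (obs + 3))
    return total


def evaluate(policy_src: Policy, n_rollouts: int = 4) -> float:
    """Run several seeded rollouts in parallel and return the mean reward."""
    with ThreadPoolExecutor() as pool:
        rewards = list(pool.map(lambda s: rollout(policy_src, s), range(n_rollouts)))
    return sum(rewards) / len(rewards)


def propose_edit(elite_src: Policy, feedback: str, rng: random.Random) -> Policy:
    """Hypothetical placeholder for the LLM edit step.

    A real system would send `elite_src` and `feedback` to a model; this stub
    ignores both and emits a policy with a randomly chosen offset.
    """
    offset = rng.randint(0, 5)
    return f"def act(obs):\n    return obs + {offset}\n"


def evolve(initial: Policy, generations: int = 5, n_candidates: int = 4,
           seed: int = 0) -> Tuple[Policy, float]:
    """Evolutionary loop with tournament selection against an elite policy."""
    rng = random.Random(seed)
    elite, elite_score = initial, evaluate(initial)
    for gen in range(generations):
        # Reward signal from the previous generation becomes context for edits.
        feedback = f"generation={gen} elite_reward={elite_score}"
        candidates = [propose_edit(elite, feedback, rng) for _ in range(n_candidates)]
        scored = [(evaluate(c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        # The elite is replaced only by a strictly better candidate, so the
        # elite's score never decreases across generations.
        if best_score > elite_score:
            elite, elite_score = best, best_score
    return elite, elite_score


if __name__ == "__main__":
    start = "def act(obs):\n    return obs\n"
    final_policy, final_score = evolve(start)
    print(final_score)
    print(final_policy)
```

Keeping the elite unless a candidate strictly beats it is one simple way to get the stability the description mentions, while sampling several candidates per generation supplies the exploration. In a real setup, the stubs would presumably be replaced by an actual LLM call and emulator rollouts, and the feedback string by the full trajectory and any video analysis.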