The Lab

Experiments at the frontier of AI.

We don’t just deploy AI - we push it forward. Our lab is where we test ideas that haven’t been tried, build tools that don’t exist yet, and publish research cited by leading labs and universities.

Experiments

2026 · Active Research

Code Policy Models

An AI research project exploring whether large language models can serve as effective optimizers for game-playing agents by iteratively writing and refining Python code as the policy representation - replacing neural network weights with human-readable programs, and gradient descent with LLM-guided code editing. The system runs an evolutionary loop. Each generation, the LLM proposes candidate policy edits, which are evaluated through parallel rollouts in a target environment (currently Pokémon Blue running on a headless Game Boy emulator); optional Gemini video analysis adds multimodal feedback on agent behavior. A tournament selection step pits several LLM-generated candidates against the current elite policy to balance exploration with stability, and the full rollout trajectory and reward signal are fed back as context for the next generation's edits.
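A minimal sketch of one generation of the tournament step, assuming policies are stored as Python source strings and that the LLM call and the emulator rollout are supplied as callables. `Scored`, `run_generation`, `propose_edit`, and `rollout` are hypothetical names, not the project's actual code, and a thread pool stands in for whatever parallel rollout workers the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Scored:
    source: str   # policy represented as human-readable Python source
    score: float  # mean reward across rollouts


# Hypothetical hooks: an LLM call that rewrites the elite policy (the real
# system also feeds back trajectories and video feedback), and one emulator
# rollout that returns a scalar reward for a given random seed.
ProposeEdit = Callable[[Scored], str]
Rollout = Callable[[str, int], float]


def evaluate(source: str, rollout: Rollout, n_rollouts: int,
             pool: ThreadPoolExecutor) -> Scored:
    """Run n_rollouts episodes in parallel and average their rewards."""
    rewards = list(pool.map(lambda seed: rollout(source, seed), range(n_rollouts)))
    return Scored(source, mean(rewards))


def run_generation(elite: Scored, propose_edit: ProposeEdit, rollout: Rollout,
                   n_candidates: int = 4, n_rollouts: int = 8) -> Scored:
    """One generation: several LLM-written candidates challenge the elite."""
    candidates = [propose_edit(elite) for _ in range(n_candidates)]
    with ThreadPoolExecutor() as pool:
        scored = [evaluate(c, rollout, n_rollouts, pool) for c in candidates]
    challenger = max(scored, key=lambda s: s.score)
    # Tournament rule: the elite is replaced only if a challenger strictly beats it.
    return challenger if challenger.score > elite.score else elite
```

Requiring the best challenger to strictly beat the elite is what provides the stability half of the trade-off: a generation of poor edits leaves the current policy unchanged, while several candidates per generation keep exploration going.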

2026 · Active Research

Code Language Models

A research project exploring whether LLMs can act as optimizers for language models by writing and refining Python code as the model itself - replacing learned weights updated by gradient descent with human-readable rules updated by LLM-guided code edits. The system runs a multi-agent optimization loop: a planner agent reviews past results and proposes improvement ideas, parallel improver agents implement each idea on isolated branches, and an integrator agent evaluates and merges the best performers back into the main line. A constraint scanner enforces that the model stays purely rule-based - no neural networks, no corpus statistics, no learned parameters.
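A sketch of how a static constraint scanner could work, using Python's standard `ast` module to flag imports of ML libraries and runtime file reads that might smuggle in corpus statistics. The blocklist and the `scan_source` function are illustrative assumptions, not the project's real rules; a purely static check like this can be sidestepped (for example through `importlib`), so it is best read as a first line of defense.

```python
import ast
from typing import List

# Hypothetical blocklist: libraries that would let the "model" rely on
# neural networks or learned parameters instead of hand-written rules.
BANNED_MODULES = frozenset(
    {"torch", "tensorflow", "jax", "keras", "sklearn", "transformers"}
)


def scan_source(source: str, banned: frozenset = BANNED_MODULES) -> List[str]:
    """Statically scan model source code and return a list of violations."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import rules" have module=None.
            modules = [node.module] if node.module else []
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id == "open"):
            # Reading files at runtime could pull in corpus statistics.
            violations.append(f"line {node.lineno}: open() call")
            continue
        else:
            continue
        for module in modules:
            # Compare the top-level package, so "torch.nn" matches "torch".
            if module.split(".")[0] in banned:
                violations.append(f"line {node.lineno}: banned import '{module}'")
    return violations
```

An integrator step could then refuse to merge any branch whose model file returns a non-empty list.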

2025

Quantamental Trading System

An AI experiment exploring whether multi-model LLM consensus can reliably detect market-moving corporate announcements and generate actionable trading signals across global equity markets. The system runs a continuous surveillance loop over announcement feeds from nine sources (NZX, ASX, SGX, JPX, SEC, LSE, HKEX, XETRA, TWSE) and feeds rich financial context to three frontier LLMs (GPT-5, Claude Opus, Gemini 2.5 Pro) at the same time, each reasoning independently with extended thinking enabled. A signal is emitted only when at least two of the three models agree on direction and the expected gain clears a minimum threshold of 4%.
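One possible reading of the consensus rule, as a minimal Python sketch. It assumes each model returns a direction plus an expected gain expressed as a fraction, and that the 4% threshold applies to the average expected gain of the models that agree; the `Verdict` type and `consensus_signal` function are hypothetical, and the real system may apply the threshold differently (for example, to each agreeing model separately).

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    model: str            # e.g. "gpt-5", "claude-opus", "gemini-2.5-pro"
    direction: str        # "long", "short", or "none" (no trade)
    expected_gain: float  # expected move as a fraction: 0.04 means 4%


def consensus_signal(verdicts: List[Verdict], min_agree: int = 2,
                     min_gain: float = 0.04) -> Optional[str]:
    """Return "long" or "short" if enough models agree, otherwise None."""
    votes = Counter(v.direction for v in verdicts if v.direction != "none")
    if not votes:
        return None
    direction, count = votes.most_common(1)[0]
    if count < min_agree:
        return None  # no quorum on direction
    # Assumption: the threshold applies to the agreeing models' average estimate.
    gains = [v.expected_gain for v in verdicts if v.direction == direction]
    if sum(gains) / len(gains) < min_gain:
        return None
    return direction
```

With three models, a two-vote quorum can only ever be reached by one direction, so the `most_common` lookup never has to break a tie that matters.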

2025

LucentBench

A benchmarking framework for evaluating AI performance on financial intelligence tasks, designed to rigorously test how well language models handle real-world fund management scenarios - from research synthesis to portfolio analysis.

2024

AI "No-Code" Notebooks

A collection of notebooks that require no coding, letting non-technical users work with a handful of open-source models and capabilities - democratizing access to AI tools in early 2024.

2023

GPT CLI

An early open-source command-line interface for ChatGPT, enabling developers to interact with GPT models directly from their terminal.

2022

Halo Lang

An experimental programming language project exploring novel approaches to language design and compilation.

2022

Torchwindow

An open-source ML visualization library for PyTorch, providing real-time training visualization and debugging tools for machine learning engineers.

2020

Temporal Probability Calibration

Published research on calibrating probabilistic predictions over time - cited by Google Brain, Amazon Web Services, Stanford, NYU, and other leading institutions.
