(a) Performance profiles for 50 tasks comparing GFP against a wide range of prior works, showing the fraction of tasks where each algorithm achieves a score above threshold tau. (b) Performance profiles on 105 tasks, including more challenging ones, and carefully reevaluated prior methods. (c) Performance profiles restricted to 30 noisy and explore tasks.
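To make the plotted quantity concrete, the following is a minimal sketch of a performance profile for a single algorithm, assuming one scalar score per task and a strict "above threshold" comparison. The function name `performance_profile` and the example scores are hypothetical illustrations, not taken from the released code or results; an actual evaluation may also aggregate over seeds before thresholding.

```python
def performance_profile(scores, taus):
    """For each threshold tau, return the fraction of tasks with score > tau.

    `scores` holds one score per task for a single algorithm (assumed layout).
    """
    if not scores:
        raise ValueError("scores must contain at least one task")
    n_tasks = len(scores)
    return [sum(score > tau for score in scores) / n_tasks for tau in taus]


# Hypothetical scores on five tasks, for illustration only.
example_scores = [0.2, 0.5, 0.5, 0.8, 1.0]
example_taus = [0.0, 0.5, 0.9]
example_profile = performance_profile(example_scores, example_taus)
print(example_profile)  # [1.0, 0.4, 0.2]
```

Under this definition the profile is non-increasing in tau, so a curve that lies above another at every threshold indicates an algorithm that clears each score level on at least as many tasks.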
Along with the code, we release CSV files containing all our benchmarking results. These include the 144 tasks GFP was evaluated on, all the hyperparameters used, our careful reevaluation of ReBRAC on OGBench, and the first evaluation of GFP and FQL on Minari.