STARS performs adaptive rejection sampling at the segment level, enabling efficient alignment of LLM outputs with reward models during inference without requiring additional training. This script uses ...
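To make the idea of segment-level rejection sampling concrete, here is a minimal toy sketch in Python. It is not the STARS implementation: the function names (`segment_rejection_sample`, `toy_propose`, `toy_reward`), the acceptance rule `min(1, exp((r - threshold) / beta))`, and the choice to set the threshold to the reward of the last accepted prefix are all assumptions made for illustration. In practice the proposer would be an LLM generating a short segment and the reward would come from a reward model.

```python
import math
import random
from typing import Callable, List

Proposer = Callable[[List[str], random.Random], List[str]]
Reward = Callable[[List[str]], float]


def segment_rejection_sample(
    propose: Proposer,
    reward: Reward,
    num_segments: int,
    max_tries: int = 8,
    beta: float = 1.0,
    seed: int = 0,
) -> List[str]:
    """Build an output segment by segment, rejecting low-reward segments.

    Hypothetical acceptance rule (an assumption, not taken from the source):
    accept a candidate with probability min(1, exp((r - threshold) / beta)),
    where the threshold adapts to the reward of the accepted prefix.
    """
    rng = random.Random(seed)
    prefix: List[str] = []
    threshold = reward(prefix)
    for _ in range(num_segments):
        chosen, chosen_r = None, -math.inf
        fallback, fallback_r = None, -math.inf
        for _ in range(max_tries):
            cand = propose(prefix, rng)
            r = reward(prefix + cand)
            # Remember the best candidate in case every try is rejected.
            if r > fallback_r:
                fallback, fallback_r = cand, r
            accept_prob = math.exp(min(0.0, (r - threshold) / beta))
            if rng.random() < accept_prob:
                chosen, chosen_r = cand, r
                break
        if chosen is None:
            chosen, chosen_r = fallback, fallback_r
        prefix = prefix + chosen
        threshold = chosen_r  # adapt the threshold to the accepted prefix
    return prefix


# Toy stand-ins for an LLM proposer and a reward model (both hypothetical).
def toy_propose(prefix: List[str], rng: random.Random) -> List[str]:
    return [rng.choice(["good", "bad"]) for _ in range(2)]


def toy_reward(tokens: List[str]) -> float:
    return float(tokens.count("good") - tokens.count("bad"))


if __name__ == "__main__":
    print(segment_rejection_sample(toy_propose, toy_reward, num_segments=3))
```

Because rejection happens per segment rather than per full response, a poor continuation costs only one segment of generation before it is resampled. A smaller `beta` makes acceptance stricter, and `max_tries` bounds the extra inference cost per segment.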