AI safety tests found to rely on 'obvious' trigger words; with easy rephrasing, models labeled 'reasonably safe' suddenly fail, with attacks succeeding up to 98% of the time. New corporate research ...
Sasha Stiles turned GPT-2 experiments into a self-writing poem at a Museum of Modern Art installation—and a new way to think about text-generating AI optimization ...