BOTHALTEROUT: Training an AI to hate the USMNT

By Eliot McKinley

The United States Men’s National Team Twitter community is an interesting place. While there is lots of good stuff there, like detailed tactical analysis and extensive coverage of every bounce pass, there is also a large contingent of people who seem to hate everything about soccer in this country. Win or lose, you’ll find them in the replies to tweets from US Soccer or journalists, confidently explaining why whatever just happened is bad for the current and future state of American soccer, and how they have the solution.

For sanity, it is best to just mute these “Reply Guys,” but what if you are a sicko who still needs to see these kinds of takes and just took a workshop on AI? You force a bot to read thousands of these tweets and then ask it to write its own, of course.

It turns out that training a model to imitate the writing style and vocabulary of a person or group of people is not too hard these days. In fact, with the transformer models available on Hugging Face and a pre-written Google Colab notebook by Boris Dayma, you can make your own AI version of someone’s Twitter account in about 10 minutes.

I decided to go a bit further, collecting 21,832 tweets from 12 accounts that are among the pre-eminent Reply Guys of USMNT Twitter to fine-tune a new model based on the existing GPT-2 text-generation model. Training takes a bit longer than it does for a single account, but within an hour we’ve got a bot whose output looks a lot like what you might see from these types of accounts on Twitter.
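For the curious, here is a rough sketch of how scraped tweets might be turned into a training corpus before fine-tuning. It is not the notebook's actual code: the cleaning rules (dropping retweets, stripping links, skipping very short tweets), the function names, and the sample tweets are all assumptions made up for illustration. The only borrowed detail is `<|endoftext|>`, the token GPT-2 uses to mark the end of a document, which works here as a separator between tweets.

```python
import re

# GPT-2's end-of-text token, used here to separate individual tweets.
EOS = "<|endoftext|>"

URL_RE = re.compile(r"https?://\S+")


def clean_tweet(text):
    """Return a cleaned tweet, or None if it should be dropped (hypothetical rules)."""
    if text.startswith("RT @"):
        # Retweets are someone else's words, so skip them.
        return None
    text = URL_RE.sub("", text)               # strip links
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    # Assumed threshold: tweets under three words carry little style to learn from.
    return text if len(text.split()) >= 3 else None


def build_corpus(tweets):
    """Join the cleaned tweets into one string, each followed by the EOS token."""
    cleaned = [c for c in (clean_tweet(t) for t in tweets) if c is not None]
    return "".join(c + EOS for c in cleaned)


# Made-up sample tweets, for illustration only.
sample = [
    "RT @ussoccer: Lineup is out",
    "Same lineup again?? This is why we never develop players https://t.co/abc123",
    "lol",
]
corpus = build_corpus(sample)
print(corpus)
```

Only the second sample tweet survives the filters, minus its link. The resulting string is the kind of text you would tokenize and hand to a GPT-2 fine-tuning run.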

I call this model BOTHALTEROUT after the rallying hashtag of the Reply Guys. After supplying a short prompt, BOTHALTEROUT will finish the tweet, including things like hashtags and emojis. While it often returns nonsense, if you give it a few tries it almost always gives you something amazing. Here are some examples:

The model does have some limitations. Since the GPT-2 model was trained on text pulled from the internet, it may carry the same inherent biases that come with text on the internet. The additional training on the Reply Guy tweets does not help with this issue and in some ways may make it worse. Furthermore, the data set is still pretty small. Adding more accounts and training for longer might improve performance a bit, although I’m not sure this task is worth many more computing resources.

You can try BOTHALTEROUT on Hugging Face or generate multiple results using this notebook. Please use responsibly.
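If you would rather generate several completions from your own script, a minimal sketch with the Hugging Face `transformers` text-generation pipeline might look like the following. The helper names are hypothetical, and the base `gpt2` checkpoint is only a stand-in: swap in the fine-tuned model's identifier to get Reply Guy output.

```python
def complete_tweet(generator, prompt, tries=5, max_new_tokens=40):
    """Ask a text-generation pipeline for several sampled completions of a prompt."""
    outputs = generator(
        prompt,
        do_sample=True,               # sample, so each try can differ
        num_return_sequences=tries,   # one completion per try
        max_new_tokens=max_new_tokens,
    )
    # Each result is a dict whose "generated_text" holds the prompt plus completion.
    return [out["generated_text"].strip() for out in outputs]


def load_generator(model_name="gpt2"):
    """Build the pipeline. "gpt2" is a placeholder, not the fine-tuned model."""
    # Imported lazily so complete_tweet can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_name)
```

Calling `complete_tweet(load_generator(), "The USMNT lineup today")` returns a list of five candidate tweets. Since the model often returns nonsense, asking for several at once and picking the best is the practical way to use it.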