by Bilyana Lilly & Florentine Eloundou Nekoul
As Donald Trump’s victory in the 2016 U.S. presidential election was being announced, several Russian bloggers in St. Petersburg, across the ocean, raised their champagne glasses and said almost in unison, “We made America great.” They were members of Russia’s state-sponsored Internet Research Agency (IRA), which had hired them to wage an unprecedented media campaign against the United States aimed at sowing social divisions, denigrating presidential candidate Hillary Clinton, and aiding Trump’s campaign. The IRA staff, known as trolls because they create internet content in accordance with Russian government objectives, exploited existing social divisions and spread disinformation to amplify incendiary issues and to enlist unwitting Americans in organizing rallies in the United States. One fabricated narrative that the IRA trolls spread, a conspiracy theory known as PizzaGate, even convinced an American citizen to open fire in a D.C. pizzeria. Despite significant policy advancements and the removal of hundreds of IRA accounts from social media after the 2016 election, the IRA has continued to operate and is now targeting the American public ahead of the 2020 election. We developed an AI model that detects Russian trolls across social media platforms using only a small amount of data.
Our team tested whether machine learning models, particularly Native Language Identification (NLI) models and a pre-trained neural network architecture, can serve as the basis for a detector that identifies Russian trolls across platforms with small amounts of data. The detector relies on linguistic characteristics of posted content, which are more difficult to change than behavior on social media. Although Russia’s IRA has recently started testing tactics that can diminish the relevance of linguistic features, such as recruiting local social media users to spread IRA content, our model can still be applied to identify the Russian trolls who remain part of these operations. We used several datasets to build our model, including:
- a Twitter dataset of tweets by Russian trolls who were active during the 2016 U.S. elections that we used as a treatment sample;
- a control dataset of tweets from a representative sample of American Twitter users who were active in the same period;
- a dataset of error-annotated English essays written by native Russian speakers, which serves as our treatment sample for the NLI analysis;
- and a dataset of trolls’ content on Reddit.
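To illustrate the kind of linguistic-feature pipeline that underlies NLI-style detection, the sketch below trains a character n-gram classifier on a handful of toy sentences. This is a minimal illustration under our own assumptions, not the authors’ actual model: the texts, labels, and variable names are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the treatment (troll) and control corpora described above.
troll_texts = [
    "we was going to the store yesterday",
    "he have many informations about election",
]
control_texts = [
    "I went to the store yesterday",
    "she has a lot of information about the election",
]
texts = troll_texts + control_texts
labels = [1, 1, 0, 0]  # 1 = troll, 0 = control user

# Character n-grams capture spelling and grammatical habits that are
# harder for a writer to disguise than posting behavior.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["he have information about election"]))
```

A real pipeline would of course be trained on the full tweet corpora and a richer feature set; the point here is only the shape of the approach.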
We ran several models and found that our NLI model outperforms comparable approaches, such as a simple logistic regression, which achieved a precision of 63.3% and an AUC of 81.5% (a standard set of metrics for evaluating such classifiers). Table 1 lists the results of some of the models we ran. Our NLI model yielded superior results, achieving 98% precision and almost 100% accuracy. In other words, our model can distinguish tweets by a Russian troll from tweets by English-speaking users with almost perfect precision and accuracy.
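The two metrics reported above can be computed with scikit-learn as in the sketch below. The labels and scores here are purely illustrative, not drawn from our data.

```python
from sklearn.metrics import precision_score, roc_auc_score

# Illustrative ground truth and model outputs (1 = troll, 0 = authentic user).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # predicted troll probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

# Precision: of the accounts flagged as trolls, the share that really are trolls.
precision = precision_score(y_true, y_pred)
# AUC: the probability the model ranks a random troll above a random non-troll.
auc = roc_auc_score(y_true, y_score)
print(precision, auc)
```

Here precision is 2/3 (two of the three flagged accounts are trolls) and AUC is 8/9 (one troll/non-troll pair is ranked incorrectly), which shows why the two metrics can diverge.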
We then applied our model to Reddit data to evaluate how well the detection algorithm performs on troll-generated content from a different social media platform. As Table 2 shows, the model transferred successfully: we identified Reddit comments from trolls, including their metadata, with high precision and accuracy.
To determine how little data our model needs, we ran the troll detection algorithm on progressively smaller samples. The results in Table 3 show that the model remains highly effective with as few as 2,500 observations.
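A data-efficiency check of this kind can be sketched as a simple subsampling loop: fit the classifier on ever-smaller training subsets and score each fit on a fixed held-out set. The sketch below uses synthetic features and a logistic regression stand-in; the sample sizes and all names are illustrative, not our experiment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for vectorized tweet features (not our real data).
X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

rng = np.random.default_rng(0)
scores = {}
for n in [500, 1_000, 2_500, 5_000]:
    # Subsample n training observations and refit from scratch.
    idx = rng.choice(len(X_train), size=n, replace=False)
    clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    scores[n] = clf.score(X_test, y_test)
    print(n, round(scores[n], 3))
```

Plotting `scores` against `n` yields a learning curve; the point at which it plateaus indicates the smallest sample size the model can tolerate.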
The perpetual race in sophistication between digital defenders and malicious actors on social media means that methods of influence are constantly evolving. We addressed this challenge by building a model that identifies Russian trolls based on linguistic features, which are more difficult to alter than behavioral ones. This property lets us apply the model across media platforms where users exhibit different behavioral patterns. Another valuable aspect of our model is that it accurately detects Russian trolls with relatively little data. It can improve the robustness of detection tools and can become part of digital forensic analysis for establishing attribution, a prerequisite for holding adversaries accountable and protecting our democracy from foreign interference.
This work was supported by the RAND Center for Scalable Computing and Analysis (SCAN).