Between May and July of 2018, researchers collected and analyzed nearly 90 million public Twitter accounts that had posted over 500 million tweets. They also examined attributes of each account, including screen names, follower counts, avatars and descriptions, assembling one of the largest collections of Twitter data ever studied.
Among the report’s most interesting findings was a sophisticated “cryptocurrency scam botnet” consisting of at least 15,000 bots. The botnet siphons money from individual users by posing as cryptocurrency exchanges, news organizations, verified accounts and even celebrities. Accounts in the botnet are programmed to use deceptive behaviors that help them evade detection and look like real profiles.
Researchers also mapped the botnet’s three-tiered structure: “hub” accounts that many bots follow, bots that publish the scam tweets, and amplification bots that like those tweets to boost their apparent popularity and make them look legitimate.
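The three tiers can be illustrated with a short Python sketch. It is not the researchers' method: the function name, the input shapes, the threshold and the account names are all assumptions made for illustration. It labels an account a hub if enough other accounts follow it, a publisher if it posts tweets, and an amplifier if it only likes tweets.

```python
from collections import Counter

def classify_roles(follows, tweets_posted, likes_given, hub_threshold=2):
    """Assign a rough tier label to each account (illustrative heuristic only).

    follows:       list of (follower, followed) pairs
    tweets_posted: dict mapping account -> number of tweets it published
    likes_given:   dict mapping account -> number of likes it gave
    hub_threshold: assumed minimum follower count for a "hub"
    """
    followed_counts = Counter(followed for _, followed in follows)
    accounts = {name for edge in follows for name in edge}
    accounts |= set(tweets_posted) | set(likes_given)

    roles = {}
    for account in accounts:
        if followed_counts[account] >= hub_threshold:
            roles[account] = "hub"          # followed by many other accounts
        elif tweets_posted.get(account, 0) > 0:
            roles[account] = "publisher"    # posts the scam tweets
        elif likes_given.get(account, 0) > 0:
            roles[account] = "amplifier"    # only likes tweets
        else:
            roles[account] = "unknown"
    return roles

# Hypothetical toy network
follows = [("pub1", "hub"), ("amp1", "hub"), ("amp2", "hub"), ("amp1", "pub1")]
tweets_posted = {"pub1": 3}
likes_given = {"amp1": 10, "amp2": 7}
roles = classify_roles(follows, tweets_posted, likes_given)
```

On real data the tiers would overlap and the thresholds would need tuning, so the labels here are only a starting point for inspection.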
Olabode Anise, a data scientist and co-author of the report, explained, “Users are likely to trust a tweet depending on how many times it’s been retweeted or liked. Those behind this particular botnet know this and have designed it to exploit this very tendency.”
To find the scam bots, researchers built features from the account data and used them to train several machine-learning classifiers. The five algorithms considered were AdaBoost, Logistic Regression, Random Forest, Naive Bayes and Decision Trees. Random Forest outperformed the others during initial testing. From there, three separate Random Forest models were trained to deal with both social and crypto spam bots.
Researchers found that bot accounts exhibit certain behaviors that, once identified, make them easier to recognize. For example, bot accounts often tweet in short bursts, which keeps the average time between tweets low, while genuine Twitter users tend to wait longer between tweets.
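The timing signal described above can be sketched in a few lines of Python. This is a minimal illustration and not the feature the researchers built: the function name and the sample timestamps are hypothetical, and a real system would combine this value with many other features.

```python
from datetime import datetime
from statistics import mean

def average_gap_seconds(timestamps):
    """Return the mean number of seconds between consecutive tweets."""
    if len(timestamps) < 2:
        return None  # not enough tweets to measure a gap
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps)

# Hypothetical accounts: one tweeting in a burst, one tweeting every few days
burst_account = [datetime(2018, 6, 1, 12, 0, s) for s in (0, 5, 10, 15)]
casual_account = [datetime(2018, 6, d, 9, 0, 0) for d in (1, 3, 6)]

burst_gap = average_gap_seconds(burst_account)    # seconds apart
casual_gap = average_gap_seconds(casual_account)  # days apart
```

A low average gap alone does not prove automation, since people also post in bursts during live events, so it works best as one input among several.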
Some methods for evading discovery, however, are more sophisticated. Bots often use Unicode characters in tweets in place of the ASCII characters they resemble. They also use screen names that are slight misspellings of the screen names they spoof, and insert extra spaces between words and punctuation marks. Profile pictures are also edited to prevent image detection. Finally, many bots appear to follow the same accounts.
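A few of these evasion tricks can be countered with simple text checks, sketched below in Python. The helper names and sample strings are hypothetical, and the approach is an assumption for illustration, not the report's implementation. Note that NFKC normalization only folds compatibility forms such as fullwidth letters; it does not map look-alike letters from other scripts.

```python
import unicodedata

def non_ascii_ratio(text):
    """Fraction of characters in text that fall outside the ASCII range."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)

def normalize(text):
    """Fold compatibility characters with NFKC and collapse runs of whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    return " ".join(folded.split())

def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical tweet text written with fullwidth letters and padded spaces
disguised = "\uff26\uff52\uff45\uff45   ETH !"
cleaned = normalize(disguised)

# Hypothetical screen names: a spoof that differs by one character
distance = edit_distance("examplecoin", "examplec0in")
```

A screen name within an edit distance of one or two of a well-known account, combined with a high non-ASCII ratio in its tweets, could be flagged for closer review.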
Twitter has suspended cryptocurrency spam bots in the past and usually identifies fake accounts quickly. Nevertheless, the company appears to have missed several parts of this latest scam operation.
A Twitter spokesperson said, “Spam and certain forms of automation are against Twitter’s rules. In many cases, spammy content is hidden on Twitter on the basis of automated detections. When spammy content is hidden on Twitter from areas like search and conversations, that may not affect its availability via the API. This means certain types of spam may be visible via Twitter’s API even if it is not visible on Twitter itself. Less than 5% of Twitter accounts are spam-related.”