By now, everyone has heard of fake news, also known as misinformation or propaganda. While false and misleading news predates the internet, fake news has become one facet of a much larger problem in the digital world: content abuse. Content abuse covers any form of malicious user-generated content: trolling, bullying, harassment, fake reviews, and social engineering scams that play on our emotions to trick us into handing over money, data, or, as in the 2016 presidential election scandal involving Facebook and the consulting firm Cambridge Analytica, our votes.

Digital identities are growing rapidly: an estimated 3.8 billion people were online as of 2017. With that many users, criminals, armed with machine learning technologies of their own, can gather ever more information to identify and exploit our vulnerabilities. On the flip side, machine learning has proven a powerful force in fighting back against fraudsters, malicious content, and abuse. Without it, businesses are left running antiquated fraud-detection methods, fraudsters stay a step ahead, and the costs can be very high.

Although harder to quantify than direct losses such as chargebacks or fraudulent purchases, abusive content is arguably more costly for businesses on the receiving end because it attacks a company's top line: the user. Losing user trust means customer acquisition costs go up and the lifetime value of a customer goes down.

The fallout from the Facebook-Cambridge Analytica scandal shows these costs quite clearly: a recent poll found that just 41 percent of Americans trust Facebook to protect their privacy, and Facebook's market value dropped $60 billion in a 48-hour span as the hashtag #DeleteFacebook went viral.

What abusive content looks like

Combating fraud and abuse online is becoming more and more difficult because the sheer volume and variety of fraud defies any single, clear definition. Catfishing and cyberbullying have gained mainstream attention in recent years, as users who have fallen victim to these malicious activities seek to expose them to the broader population of digital users.

Marketplace scams are also common. Amazon, for example, has recently struggled with hackers stealing account credentials from legitimate merchants and using the hijacked storefronts to sell counterfeit or non-existent goods. Buyers don't realize a scam has taken place until fake goods arrive, or, in many cases, no goods arrive at all. By then the "seller" is long gone, along with the buyer's money.

Users often direct their anger at being scammed, catfished, or having their data shared without their consent or knowledge toward the platform they've been using. The expectation is that these companies are putting the necessary technology in place to protect their users; when they don't, users lose trust in that company's ability to protect them from fraud. This ultimately affects a company's top line (its users) and its bottom line (its profits), and rebuilding user trust is no easy feat. These businesses need to find the most effective way to protect themselves and their users, or risk falling prey again and again to fraudsters, scammers, and malicious, abusive content.

Machine learning to the rescue

Fraudsters have found their way around the basic protections most sites have in place. Sites often deal with content abuse by hiring a team of content moderators whose job is to comb through ratings and reviews or watch videos. Moderators are usually supported by rules-based systems and blacklists that check each piece of content for violations, and either block offending content or flag it for review.
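To make that concrete, here is a minimal sketch of what such a rules-based filter might look like. The blacklist terms, link threshold, and decision labels are invented for illustration; real systems maintain far larger, frequently updated rule sets.

```python
import re

# Hypothetical blacklist and threshold, purely for illustration.
BLACKLISTED_TERMS = {"free money", "wire transfer now", "guaranteed winner"}
MAX_LINKS = 3

def check_content(text: str) -> str:
    """Return 'block', 'flag', or 'allow' for a piece of user content."""
    lowered = text.lower()
    # Rule 1: block anything containing a blacklisted phrase.
    if any(term in lowered for term in BLACKLISTED_TERMS):
        return "block"
    # Rule 2: flag link-heavy posts for a human moderator to review.
    if len(re.findall(r"https?://", lowered)) > MAX_LINKS:
        return "flag"
    return "allow"

print(check_content("Claim your FREE MONEY today!"))  # -> block
```

The brittleness is easy to see: a scammer who writes "fr3e m0ney" sails straight past Rule 1, which is exactly the weakness described next.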

Professional scammers know this, so rules-based systems have become a never-ending game of whack-a-mole. They are not agile enough to respond to new words or slang terms, they are very difficult to scale, and they are not designed to consider user behavior at all. The result is "false positives," where legitimate users are turned away while fraudulent users skate under the radar.

Machine learning-based technology can analyze far more signals than rules-based systems can handle, and it looks at them all simultaneously to form a more complete picture. This is a far more effective approach: fraudsters can alter text to avoid triggering rules, but it's much harder to alter signals such as the device they're using or the speed at which they're posting. For example, if someone signs up for 15 accounts and starts posting to each of them within two minutes of signing up, the system compares that to the behavior of an average user and flags it as an anomaly.
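As a rough sketch of that idea, the toy example below flags the "15 accounts posting within two minutes" pattern with a standard outlier detector. The behavioral features and training data are fabricated for illustration; production systems use far richer signals and models.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical behavioral features per signup:
# [accounts created from this device, seconds from signup to first post]
# Most legitimate users create one account and post after some delay.
normal_users = np.array([[1, 3600], [1, 7200], [1, 5400], [2, 2400],
                         [1, 9000], [1, 4800], [1, 6600], [1, 3000]])

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(normal_users)

# 15 accounts from one device, each posting within two minutes of signup.
suspicious = np.array([[15, 120]])
print(model.predict(suspicious))  # -1 means flagged as an anomaly
```

Note that no hand-written rule mentions account counts or posting speed; the model flags the behavior simply because it sits far outside what typical users do.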

More and more companies are realizing the power of machine learning to keep hackers and fraudsters from scamming legitimate users. For example, Zoosk uses machine learning to streamline its fraud management workflow, helping analysts review content more efficiently and quickly identify what is and isn't fraudulent on the platform. Machine learning models give companies that rely on user-generated content faster access to accurate data, so they can make more informed decisions in real time, such as blocking users or investigating suspicious content.

Considering all the factors

As machine learning systems gather more data, they generate new insights that make the picture ever sharper. One insight we've uncovered with machine learning is that users on an iPhone 5 are seven times more likely to commit fraud than iPhone 8 users. Why? If you want to get around the rules by using a lot of different devices, you don't need the latest and greatest iPhone. Used iPhone 5s will do just fine, so hackers and fraudsters choose older, cheaper hardware to carry out these various forms of malicious activity.
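The arithmetic behind a device-level insight like this is straightforward: compare the fraud rate within each device cohort. The sketch below uses a fabricated event log purely to show the calculation; the real figure comes from aggregating millions of events.

```python
import pandas as pd

# Made-up event log; values are illustrative only.
events = pd.DataFrame({
    "device":   ["iPhone 5", "iPhone 5", "iPhone 5", "iPhone 8",
                 "iPhone 8", "iPhone 8", "iPhone 8", "iPhone 8"],
    "is_fraud": [1, 1, 0, 0, 0, 0, 1, 0],
})

# Fraud rate per device model, then the ratio between the two cohorts.
rates = events.groupby("device")["is_fraud"].mean()
print(rates)
print("relative risk:", rates["iPhone 5"] / rates["iPhone 8"])
```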

As machine learning becomes more accessible, fraudsters are building their own machine learning models to perpetrate content abuse schemes. Fortunately, the same tools are more accessible than ever to businesses of all types and sizes, not just the Googles and Amazons of the world. These tools can put an ever-growing body of data to work detecting fraud patterns and blocking threats before they strike. And because the technology learns continuously, it stays one step ahead of malicious and abusive schemes, protecting users from things we haven't even thought of yet.