Consumer Internet products are launched into consistently adversarial online environments. Highly motivated adversaries weaponize products in ways their developers never imagined, often with surprising creativity.
Seemingly innocuous products or features will eventually attract spam and abuse aimed at users, customers, and the business, along with more subtle forms of manipulation that are even harder to detect. The scale of this problem is plain to see; it’s a defining feature of nearly every online social platform.
When Twitter launched Spaces in 2021, the product had horrific problems [1] [2] [3] in both the audio content and the text of Spaces room titles. Those titles were rendered at the top of users’ home timelines for all to see, leveraging product infrastructure built for a completely different experience. The safety and moderation systems we’d built for the rest of Twitter had simply not been properly integrated into that product infrastructure, and it was an expensive post-launch effort to fix that up.
That’s one example from a long list of Twitter product mechanics weaponized by bad actors, a list that included everything from profile bios to reply recommendations to trends. One of the perpetually broken experiences was Lists, which was used to harass people in their notifications as well as through search typeahead via list-association signal ingestion.
Designing for safety requires inputs from all functions of the development process — user research, design, product management, trust & safety, legal and engineering.
In what kinds of situations is the user uncomfortable using the product? Are they able to detect or recognize abuse, spam, or scams? Will they be able to report it? Will they be able to categorize it as such? Is the reporting flow seamless enough with the product experience that users actually follow through? Will the act of reporting make the experience better for them in the short term, and in the long term?
The technical side of this problem is extremely complex — I wrote a little about it here (and since Twitter has disabled logged-out Tweet browsing, I’ll quote it inline):
The engineers (and everyone else, honestly) working on anti-abuse/product-safety/policy-enforcement systems don’t talk enough about how hard that job is.
I spent two years helping build an ML serving platform for models in that space. It’s infinitely complex.
The problem space is adversarial; the nature of language and behavior is always changing — actors trying to work around changes to rules or enforcement; models drift
The objective functions are noisy: people handle abuse on the platform in different, sometimes unmeasurable ways
Understanding of language, its interpretation and intent is hard: conversational context, slang, dialects, code switching add dimensions of complexity
Precision and recall in these systems require a careful balance; that balance holds only at a point in time, and the systems require constant adaptation
All that to say: if you work in the relevance or recommendations space and have never given thought to how your system might work with ambiguous objective functions, noisy metrics, and a slice of the user base constantly evading your product, that is what this space is like
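For reference outside the quoted thread, the precision/recall tension mentioned there is the standard classification tradeoff; a quick restatement of the definitions:

```latex
\text{Precision} = \frac{TP}{TP + FP}
\qquad
\text{Recall} = \frac{TP}{TP + FN}
```

Tightening an enforcement threshold generally trades recall for precision, and because adversarial behavior drifts, whatever operating point you pick today will not stay calibrated for long.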
What follows are a few considerations any mass-market consumer Internet product should make prior to launch:
Clear Policy
An ambiguous usage policy or terms of service is a bilateral problem, affecting both users and the service’s ability to manage itself effectively.
For users, poor policy means the parameters of conduct are unclear and leave a lot of room for interpretation, both by bad actors (because abusive or fraudulent behavior may seem technically acceptable) and by victims (who lack clarity on conduct standards and produce noisy reports).
For the service, ambiguous policy results in an extremely difficult optimization problem for the technical solutions and agents combating abuse. Imprecision in labeling training data for models, or in developing hand-written heuristics, results in low-precision preemptive detection of violative content or conduct.
Proactively setting (and maintaining) explicit, clear terms of service for site conduct is the foundation upon which effective product safety technology is built.
Take two real-world examples of online social platform conduct policy. The first:
You may not threaten, incite, glorify, or express desire for violence or harm.
You may not wish, hope, or express desire for harm. This includes (but is not limited to) hoping for others to die, suffer illnesses, tragic incidents, or experience other physically harmful consequences.
The second:
Be kind. Don’t be snarky. Converse curiously; don’t cross-examine.
One of these is easier to interpret and model than the other.
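To make that concrete, here is a minimal sketch of how the first policy decomposes into a label taxonomy that annotators and heuristics can consistently target. The category names are hypothetical illustrations, not any platform’s real schema; the point is that the second policy offers no comparable decomposition.

```python
# Minimal sketch: an explicit policy enumerates categories that labelers,
# heuristics, and models can agree on. Category names are illustrative only.
from dataclasses import dataclass

VIOLENT_SPEECH_LABELS = {
    "threat": "Stated intent to inflict violence on a person or group",
    "incitement": "Calling for others to commit violence",
    "glorification": "Praise or celebration of violent acts",
    "wish_of_harm": "Hoping for death, illness, injury, or other physical harm",
}

@dataclass
class LabeledExample:
    text: str
    label: str  # a key of VIOLENT_SPEECH_LABELS, or "none"

def validate(example: LabeledExample) -> None:
    """Reject annotations that fall outside the policy's enumerated categories."""
    if example.label != "none" and example.label not in VIOLENT_SPEECH_LABELS:
        raise ValueError(f"label {example.label!r} is not defined by the policy")

validate(LabeledExample(text="hope you get what's coming to you", label="wish_of_harm"))

# "Be kind. Don't be snarky." gives annotators no such enumerable set of
# categories, so labels drift from person to person and model quality suffers.
```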
Another note for emerging companies and services expanding to international markets: internationalizing policy is not as simple as translating it into other languages. Good policy localization relies on experts from relevant locales to interpret context, customs, conduct and language.
Reporting and Reviewing
Users need a way to report specific content or conduct.
Even when relying on automated systems or third-party services to try to filter obvious spam, abuse, or fraud, users need a way to indicate what kinds of things weren’t caught — or what kinds of things the developers didn’t realize might be a problem.
The reporting and subsequent actioning systems must themselves be resilient against abuse — against troll armies or unhappy users attempting to co-opt spam and abuse reporting as a punitive weapon against other users.
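As a sketch of what that resilience can look like in practice, here is one way report intake might be deduplicated, rate-limited, and weighted by reporter reputation. All of the names here (ReportAggregator, reporter_reputation, the daily cap) are hypothetical, not a real API.

```python
# A minimal sketch, not a production design: aggregating reports in a way
# that resists brigading.
import time
from collections import defaultdict

MAX_REPORTS_PER_REPORTER_PER_DAY = 25  # illustrative cap

class ReportAggregator:
    def __init__(self, reporter_reputation):
        # reporter_reputation: reporter_id -> weight in [0, 1], e.g. derived
        # from how often that reporter's past reports were upheld on review.
        self.reputation = reporter_reputation
        self.reports_by_target = defaultdict(dict)  # target_id -> {reporter_id: ts}
        self.daily_counts = defaultdict(int)        # reporter_id -> count today

    def submit(self, reporter_id: str, target_id: str) -> bool:
        """Record a report; drop duplicates and rate-limit heavy reporters."""
        if self.daily_counts[reporter_id] >= MAX_REPORTS_PER_REPORTER_PER_DAY:
            return False  # likely automation or a brigade participant
        if reporter_id in self.reports_by_target[target_id]:
            return False  # duplicate report from the same user
        self.reports_by_target[target_id][reporter_id] = time.time()
        self.daily_counts[reporter_id] += 1
        return True

    def score(self, target_id: str) -> float:
        """Reputation-weighted report score, so a coordinated mob of
        low-reputation accounts cannot outweigh a few credible reports."""
        return sum(self.reputation.get(r, 0.1)
                   for r in self.reports_by_target[target_id])
```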
Spam, Abuse and Prevalence Metrics
Defining metrics which track abusive experiences is notoriously difficult — if one could accurately measure how often users were exposed to abuse in a product, one could probably have prevented the abuse in the first place. Note that the rate at which abuse is experienced is not the same as the rate at which it is reported.
Companies sometimes fall back to spam and abuse reporting volume as a metric, sometimes normalized over impression counts or other usage metrics. These metrics are important, but may move up or down depending on changes to product mechanics or safety-related interventions. The idea here is roughly:
- When users see spam or abuse, they report it
- If report levels go down, users are having a safer experience
- If report levels go up, users are having a less safe experience
There are a few problems with these assumptions. Primarily: this is a very noisy metric. Reporting is not uniformly performed by all users. Users may not report bad experiences consistently, or at all, because:
- They couldn’t be bothered
- They couldn’t figure out how to do it
- It was too much work
- They didn’t think anything would come of it
- They’re worried about being obligated to follow up
- They’re worried about retaliation by the other party
- They’ve given up trying to improve the product
- They’d rather leave
Reports shouldn’t be assessed independently from other kinds of product telemetry, because they depend on users taking a specific action. For example, if report rates decrease but dwell time is also decreasing, chances are the problems haven’t been solved; quite the opposite.
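Here is a minimal sketch of that cross-check, with illustrative numbers and field names rather than real product metrics: a falling report rate only reads as an improvement if engagement telemetry isn’t falling with it.

```python
# Sketch: interpret report-rate changes alongside an engagement signal.

def report_rate(reports: int, impressions: int) -> float:
    """Reports per 1,000 impressions."""
    return 1000 * reports / max(impressions, 1)

def interpret(week_a: dict, week_b: dict) -> str:
    rate_a = report_rate(week_a["reports"], week_a["impressions"])
    rate_b = report_rate(week_b["reports"], week_b["impressions"])
    dwell_delta = week_b["median_dwell_seconds"] - week_a["median_dwell_seconds"]
    if rate_b < rate_a and dwell_delta < 0:
        return "Reports fell, but so did dwell time: users may be disengaging, not safer."
    if rate_b < rate_a:
        return "Reports fell while engagement held: cautiously positive signal."
    return "Report rate rose: investigate recent product or enforcement changes."

print(interpret(
    {"reports": 480, "impressions": 2_000_000, "median_dwell_seconds": 95},
    {"reports": 300, "impressions": 1_400_000, "median_dwell_seconds": 70},
))
```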
Prevalence metrics are a useful complementary technique for understanding the user base’s exposure to violative content via offline analysis. The goal is to approximate how many users viewed or interacted with violative or marginal classes of actions or content. Here’s an example from Facebook.
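As a rough sketch of one way such an estimate can be computed (and not a description of Facebook’s actual methodology): sample viewed content, have trained reviewers label the sample, and extrapolate with a confidence interval. The labeling function below is a stand-in for a real reviewer pipeline.

```python
# Sketch: estimate the fraction of impressions that were of violative content
# from a labeled random sample, with a normal-approximation confidence interval.
import math
import random

def estimate_prevalence(impression_log, label_fn, sample_size=1000, z=1.96):
    sample = random.sample(impression_log, min(sample_size, len(impression_log)))
    violative = sum(1 for impression in sample if label_fn(impression))
    p = violative / len(sample)
    margin = z * math.sqrt(p * (1 - p) / len(sample))
    return p, (max(0.0, p - margin), min(1.0, p + margin))

# Example with synthetic data: ~0.5% of impressions are violative.
log = [{"violative": random.random() < 0.005} for _ in range(100_000)]
p, ci = estimate_prevalence(log, lambda imp: imp["violative"], sample_size=2000)
print(f"Estimated prevalence: {p:.3%} (95% CI {ci[0]:.3%} to {ci[1]:.3%})")
```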
Minimum Safety Integration as a Product Policy and Platform Feature
Developing new policy, representative abuse metrics, and robust (and proactive) product safety countermeasures is hard to balance with rapid product development. These things take time, and they are slow to react to seemingly minor but occasionally substantive changes in product mechanics that dictate how users interact with each other.
At a bare minimum, new products and features need to be wired into safety actioning systems so that the service can react quickly when it needs to. The interfaces for safety systems or platforms should be simple and intuitive enough that basic integration can be quickly and cheaply factored into the product roadmap. Whether that’s rapid rules- and heuristics-based actioning or blunt takedown tools, it’s important to have ways to respond to unexpected events quickly after they happen.
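A minimal sketch of what such a shared integration surface could look like follows; everything here (SafetyPlatform, register_surface, SafetyAction) is hypothetical and meant only to illustrate wiring every new product into the same actioning hooks, not a real platform API.

```python
# Sketch: a shared actioning surface that new products register with once,
# so rule-based actioning and blunt takedowns work from day one.
from enum import Enum
from typing import Callable, Dict, List, Optional

class SafetyAction(Enum):
    HIDE = "hide"                      # remove from display/recommendation surfaces
    TAKEDOWN = "takedown"              # delete the content outright
    RESTRICT_AUTHOR = "restrict_author"

class SafetyPlatform:
    def __init__(self):
        self._handlers: Dict[str, Callable[[str, SafetyAction], None]] = {}
        self._rules: List[Callable[[dict], Optional[SafetyAction]]] = []

    def register_surface(self, surface: str,
                         handler: Callable[[str, SafetyAction], None]) -> None:
        """Each new product surface (room titles, bios, lists...) registers once."""
        self._handlers[surface] = handler

    def add_rule(self, rule: Callable[[dict], Optional[SafetyAction]]) -> None:
        """Rapid heuristic rules: content dict in, optional action out."""
        self._rules.append(rule)

    def evaluate(self, surface: str, content: dict) -> None:
        for rule in self._rules:
            action = rule(content)
            if action is not None:
                self._handlers[surface](content["id"], action)
                return

# Example: a new surface launches with a single keyword heuristic in place.
platform = SafetyPlatform()
platform.register_surface("room_title",
                          lambda cid, action: print(f"{action.value}: {cid}"))
platform.add_rule(lambda c: SafetyAction.HIDE if "spam-link" in c["text"] else None)
platform.evaluate("room_title", {"id": "room/123", "text": "join spam-link now"})
```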
The worst-case situation is where every new product release requires a completely bespoke safety integration. This either pushes out release time dramatically, or leads to premature release without adequate safety controls in place.
Beyond being a challenge for product timelines, this creates tension between the product and safety teams, and that tension is usually unconstructive.
Red Team Exercises
NIST defines a red team exercise as:
An exercise, reflecting real-world conditions, that is conducted as a simulated adversarial attempt to compromise organizational missions or business processes and to provide a comprehensive assessment of the security capabilities of an organization and its systems.
This is a security-oriented definition, but when applied to product safety, red teams provide an invaluable way to understand how new products or platforms could be used in unintended or unexpected ways. It’s also an interesting way to identify gaps in product requirements or behaviors from the perspective of users or customers who may not be malicious, but who may be trying to repurpose the product for some other legitimate use case.
I’ve thoroughly enjoyed every red team exercise I’ve participated in. These exercises tend to produce a huge list of “p0” issues to close before launch, so it’s best to organize them around a clear set of risk prioritization criteria rather than treating every functional or safety gap as equally severe.
Safety should not be an afterthought in the process of developing consumer-oriented Internet products. As online spaces become more complex, it’s crucial for product developers to understand the risks involved and take proactive steps to mitigate them. Clear policies, robust reporting and reviewing mechanisms, accurate abuse metrics, and the integration of safety features should be considered early in the development process.
Add a regular red-team evaluation process for significant product launches, both to find gaps and to build a culture of adversarial awareness in the development team.