What is a Feed?
Feeds are everywhere: the Facebook news feed, the Instagram or Threads home feeds, the GitHub homepage feed, the Amazon home feed, the Twitter/X home timeline, the LinkedIn home feed, the Medium For you feed.
These feeds are core constructs within social or content platforms, and are complex aggregations of user or partner-generated content, events and relationships between actors (human or otherwise). They are conceptual derivatives of web feeds. Feeds generally try to balance several objectives, which often exist in tension:
- Keep viewers informed about, or entertained with, recent content updates from followed accounts or concepts
- Expand viewers’ exposure to relevant, interesting content from outside of their directly-followed set of content producers
- Incentivize viewers’ engagement with the various elements in the feed, including native ad formats, new products, and other things
I’m going to describe at a high level how feeds work and some of their characteristics. This post will focus on high-level product and technical concepts, and leave design specifics for another time.
But first, a technical definition:
A feed is a flattened, scrollable view of a personalized subset of a content platform’s data graph
That’s a mouthful, so I’ll break this down below.
Data Graph
Content platforms operate on enormous amounts of data. The products tend to be built on structured data, often called entities, which may have records counting in the billions, trillions, or more.
Many of these entity types are familiar to users; they are often visible in various product experiences and can be created or interacted with, from posts to friends, messages to hashtags.
It’s useful to think about these entities as nodes in a graph, regardless of whether they are actually stored and managed that way. The edges between the nodes in this conceptual graph represent relationships between the various entity types and records; for example:
Post(ID) -[links to]-> Article(ID)
Article(ID) -[linked by]-> Post(ID)
Post(ID) -[created by]-> User(ID)
User(ID) -[created]-> Post(ID)
User(ID) -[likes]-> Post(ID)
Post(ID) -[liked by]-> User(ID)
User(ID) -[interested in (0.7)]-> Topic(ID)
Topic(ID) -[interest from (>= 0.7)]-> User([IDs])
Edge types usually have a bidirectional conceptual relationship, though in practice the graph traversal through the underlying systems may be organized primarily for queries in one direction or the other. It depends on the nature of the relationship between the entities and how the platform uses them. For example, the bidirectional queries of
User(ID) -[membership in]-> Group([IDs])
Group(ID) -[members]-> User([IDs])
may be common and probably have read-optimized query paths in both directions, while the queries
Post(ID) -[contains]-> Hashtag([IDs])
Hashtag(ID) -[contained within]-> Post([IDs])
are not 1:1 in terms of complexity, and are probably not served by the same query paths. The second query is considerably more expensive and probably requires specialized infrastructure, like an inverted index. The number of hashtags in a post may be on the order of 10; the number of posts associated with a given hashtag may be on the order of millions.
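The asymmetry between the two directions can be sketched with a toy in-memory index. This is a minimal illustration, not how any particular platform stores hashtags: the `HashtagIndex` class and its method names are hypothetical, and a real inverted index would be a sharded, persistent service rather than a dictionary.

```python
from collections import defaultdict


class HashtagIndex:
    """Toy store for both directions of the Post <-> Hashtag edge (hypothetical)."""

    def __init__(self):
        # Forward direction: post ID -> hashtags it contains (roughly 10 entries).
        self.hashtags_by_post = {}
        # Inverted direction: hashtag -> post IDs (can grow to millions of entries).
        self.posts_by_hashtag = defaultdict(list)

    def add_post(self, post_id, hashtags):
        unique = sorted(set(hashtags))
        self.hashtags_by_post[post_id] = unique
        for tag in unique:
            # Assumes posts are added in creation order, so each list stays time-sorted.
            self.posts_by_hashtag[tag].append(post_id)

    def hashtags_for(self, post_id):
        # Cheap lookup: one key, one small list.
        return self.hashtags_by_post.get(post_id, [])

    def posts_for(self, hashtag, limit=10):
        # Only the newest `limit` entries of a potentially huge posting list are read.
        postings = self.posts_by_hashtag.get(hashtag, [])
        start = max(0, len(postings) - limit)
        return postings[start:][::-1]
```

The forward lookup touches one small record, while the inverted lookup has to be bounded with a limit because the posting list behind a popular hashtag keeps growing.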
Using the graph and these relationships, the structure between the entities can be mapped out and understood. Ultimately the various product experiences of the content platform, from its feed to other things, are compositions and arrangements of this structure.
At a technical level, these relationships might be stored in an actual graph database like Neptune or Neo4j, or the data graph might just be a conceptual graph, consisting of relational or distributed SQL, key-value stores, search indexes, application logic in clients and backend services, and other things.
Personalization
There’s a lot of data in the data graph; the challenge is getting the right subset of that data in front of the feed viewer. If viewers had infinite time and interest, they could browse everything and see 100% of relevant content; in reality, viewers tend to bounce in under a minute if they’re not engaged. This could broadly be described as an experience personalization or content relevance problem.
Depending on the nature of the content platform, there are a few different inputs which could be considered when pulling data from the graph into a feed for a viewer:
- Explicit edges between a viewer and entities (followed users, trending hashtags, Subreddits, topics or interests), usually created by the viewer
- Nth-degree explicit edges between a viewer and entities (for example, the viewer follows a user who liked a post)
- Global popularity, locality-based or subnetwork-based trending interest, or timeliness of an entity and its content
- Inferred relevance of the data based on the matching of a viewer’s interests with the content or characteristics of the entity (implicit edges)
There are usually a number of linear (rules-based) and recommender systems which work to produce candidate feed entities along these lines, which are combined during a final heavy ranking stage. For example, the articles recommender system could be distinct from the inferred topical interest posts recommender system, but the end result visible to the user is the product of a singular cross-origin entity interleaving and ranking system. This final ranking and interleaving system might be located in the clients, but is usually a backend process.
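A rough sketch of that shape: several independent candidate sources feed one final interleaving and ranking step. Everything here is an assumption made for illustration, including the source functions, the stubbed candidates, and the `SOURCE_WEIGHTS` table, which stands in for what would normally be a learned ranking model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity_id: str
    source: str
    score: float  # source-local relevance estimate in [0, 1]


# Hypothetical candidate sources; each would be its own system in practice.
def followed_accounts_source(viewer_id):
    return [Candidate("post:1", "followed", 0.9), Candidate("post:2", "followed", 0.6)]


def topical_interest_source(viewer_id):
    return [Candidate("post:2", "topical", 0.8), Candidate("post:7", "topical", 0.7)]


def articles_source(viewer_id):
    return [Candidate("article:3", "articles", 0.75)]


# Assumed per-source weights, a stand-in for a heavy ranking model.
SOURCE_WEIGHTS = {"followed": 1.0, "topical": 0.8, "articles": 0.7}


def assemble_feed(viewer_id, sources, limit=10):
    best = {}
    for source in sources:
        for candidate in source(viewer_id):
            final = candidate.score * SOURCE_WEIGHTS.get(candidate.source, 0.5)
            # The same entity may arrive from several sources; keep its best score.
            if candidate.entity_id not in best or final > best[candidate.entity_id]:
                best[candidate.entity_id] = final
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:limit]
```

The viewer only ever sees the output of `assemble_feed`; which source produced a given entity is invisible by the time the feed is rendered.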
Some feed inputs might be precomputed in an offline or asynchronous fashion, while other inputs are assembled on-demand at read-time when the viewer loads the feed. The difference depends on the nature of the product mechanics, scale and performance realities, and cost considerations.
For example, Twitter famously built an enormous write-time assembly system for home timelines (called home fanout) in the early 2010s, which was the foundation of home timeline serving for many years. By around 2016, however, that system was just one of many inputs, largely on-demand, that went into the final home timeline composition.
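The difference between write-time and read-time assembly can be sketched in a few lines. This is a toy model, not a description of Twitter’s actual system: the `TimelineStore` class, its method names, and the `max_len` cap of 800 entries are all assumptions.

```python
from collections import defaultdict, deque


class TimelineStore:
    """Toy model contrasting write-time fanout with read-time assembly (hypothetical)."""

    def __init__(self, max_len=800):
        self.followers = defaultdict(set)         # author -> viewers following them
        self.following = defaultdict(set)         # viewer -> authors they follow
        self.posts_by_author = defaultdict(list)  # author -> [(timestamp, post_id)]
        # Precomputed timelines, newest first, capped so storage stays bounded.
        self.materialized = defaultdict(lambda: deque(maxlen=max_len))

    def follow(self, viewer, author):
        self.followers[author].add(viewer)
        self.following[viewer].add(author)

    def publish(self, author, post_id, timestamp):
        self.posts_by_author[author].append((timestamp, post_id))
        # Write-time fanout: cost grows with the author's follower count.
        # Assumes posts are published in timestamp order.
        for viewer in self.followers[author]:
            self.materialized[viewer].appendleft((timestamp, post_id))

    def read_precomputed(self, viewer, limit=10):
        # Cheap read: the work was already done at write time.
        return [post_id for _, post_id in list(self.materialized[viewer])[:limit]]

    def read_on_demand(self, viewer, limit=10):
        # Read-time assembly: cost grows with the number of followed authors.
        merged = []
        for author in self.following[viewer]:
            merged.extend(self.posts_by_author[author])
        merged.sort(reverse=True)
        return [post_id for _, post_id in merged[:limit]]
```

Both read paths return the same reverse-chronological result here; what changes is whether the cost is paid once per post per follower at write time, or once per followed author on every read.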
Flattened Views and Intuitive Navigation
Given all that complexity, the feed needs to be flattened into a simple, linear and vertically-scrollable format amenable to mobile devices and traditional vertical scroll on web. There are a few general experience expectations from viewers:
- Viewers generally should be able to navigate back to previously-seen elements, giving the feed a feeling of stability and concreteness, as opposed to ephemerality. In a strict reverse-chronological feed, this is akin to scrolling over a literal timeline of events
- The feed shouldn’t contain duplicative elements across sessions. In a strict reverse-chronological feed, this isn’t very difficult, but in a highly recommendations-oriented one, it can be challenging
For the first point, this can generally be accomplished by caching a larger computed (and yet unserved) feed in the backend, or caching viewed feed entities, and their ordering, on the client. These solutions require careful pagination mechanics to ensure the clients and backend both understand where the viewer is scrolling.
For the second expectation, de-duplication can be accomplished either by a feed-wide backend deduplication or ranking-penalty system applied at serving time, or by an entity-specific deduplication layer built into the feed’s constituent candidate sources. It’s also possible to cache the set of viewed entity IDs on the client side and use it to prevent the rendering of duplicates, but this cache can grow very large and might become impractical, and it wouldn’t address cross-client deduplication concerns.
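Both expectations can be sketched together on the backend: cache a larger ranked feed once, drop duplicates and previously seen entities up front, and hand the client an opaque cursor. The `FeedSession` class and the JSON-in-base64 cursor format below are assumptions chosen for illustration; real cursors usually carry more state, such as a feed version or ranking snapshot ID.

```python
import base64
import json


def encode_cursor(offset):
    # Opaque to the client; only the backend interprets it.
    return base64.urlsafe_b64encode(json.dumps({"offset": offset}).encode()).decode()


def decode_cursor(cursor):
    if cursor is None:
        return 0
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["offset"]


class FeedSession:
    """Serves stable pages from a cached, already-ranked feed (hypothetical)."""

    def __init__(self, ranked_entity_ids, previously_seen=()):
        seen = set(previously_seen)  # entity IDs served in earlier sessions
        self.cached = []
        for entity_id in ranked_entity_ids:
            # Drop cross-session repeats and duplicates from overlapping sources.
            if entity_id not in seen:
                seen.add(entity_id)
                self.cached.append(entity_id)

    def page(self, cursor=None, size=3):
        offset = decode_cursor(cursor)
        items = self.cached[offset:offset + size]
        end = offset + len(items)
        next_cursor = encode_cursor(end) if end < len(self.cached) else None
        return items, next_cursor
```

Because deduplication happens once when the feed is cached, requesting the same cursor twice returns the same page, which is what lets a viewer scroll back to something they saw earlier.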
Challenges
There are a few challenges specific to the home feed variant of the broader family of feeds, which differentiate it from others like search results or top posts.
Product Dominance
The home or main feed tends to dominate where viewers spend their time. Most people don’t spend a lot of time actively searching for things they’d like to read about or engage with — they expect interesting content to be surfaced to them.
This dynamic makes it difficult to launch new products that lack some sort of hook into, or distribution mechanism from, the home feed, or at least the broader home page. Sometimes this manifests as a home feed of heterogeneous containers or content types: for example, the Reels or People You May Know horizontal containers on the Facebook news feed, or Ads in native formats. The goal is to place these features into the feed, where viewers spend their time. The feed system has global awareness of the quality of the feed, so it tends to decide where external products are located, even if it has no say in their contents.
Feed {
  FeedElementContainer {
    OrganicPost {} // Selected by feed team
  }
  FeedElementContainer {
    Ad {} // Slot and content selected by ad-serving team
  }
  FeedElementContainer {
    OrganicPost {} // Selected by feed team
  }
  FeedElementContainer {
    PYMK {} // Slot selected by feed team; contents by PYMK team
  }
}
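The structure above can be sketched as a small slotting routine in which the feed places containers and other teams supply their contents. The `build_feed` function, the `slot_plan` mapping, and the provider callables are hypothetical names. For simplicity the sketch uses one slot plan for every container type, regardless of which team actually chose each position.

```python
from dataclasses import dataclass


@dataclass
class FeedElementContainer:
    kind: str     # e.g. "organic", "ad", "pymk"
    payload: str  # opaque to the feed when supplied by another team


def build_feed(organic_posts, slot_plan, providers):
    """Interleave organic posts with externally owned containers.

    slot_plan maps a feed position to a container kind; providers maps a kind
    to a zero-argument callable returning a payload, or None if it has nothing.
    """
    feed = []
    for post in organic_posts:
        # Fill any reserved slot that falls at the current position.
        while len(feed) in slot_plan:
            kind = slot_plan[len(feed)]
            payload = providers[kind]()
            if payload is None:
                break  # nothing to show; the slot goes back to organic content
            feed.append(FeedElementContainer(kind, payload))
        feed.append(FeedElementContainer("organic", post))
    return feed
```

Reserved slots are only filled while organic posts remain, so in this sketch the feed never ends on an external container.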
This means that in practice, product and engineering teams building these feeds are part product owners, and part product platforms. In the latter case, their responsibility is to help other products and features succeed. That can be difficult, especially when launching new classes of entities or containers into the feed requires a tradeoff against existing, high-performing products or containers.
Scale
The home feed is usually in the default app-opening or home page experience, and is where users tend to spend their time, so they generate a lot of online request load into the backend serving systems.
If the home feed is completely pre-materialized and stored in an efficient serving layer, then serving load becomes more manageable. However, if some, or all, of the home feed is computed on-demand by viewer read-request, it can become incredibly expensive when heavy recommendation or inference workloads are in the hot path.
Though a default feed response may assemble 10 or 100 posts at a time for any given request, the earlier candidate generation and filtering stages may process many orders of magnitude more posts — hundreds of thousands, or millions of potential entities, depending on the density of the viewer’s graph. If ads are being served, a similar process is being run in parallel via real-time bidding in the ads stack.
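A minimal sketch of that funnel, assuming a cheap scoring function that narrows a large candidate pool before an expensive one runs on the survivors. The function name, its parameters, and the default cutoffs are illustrative assumptions rather than numbers from any real system.

```python
import heapq


def run_funnel(candidates, is_eligible, light_score, heavy_score,
               light_keep=1000, page_size=10):
    """Narrow a large candidate pool down to one page of results."""
    # Stage 1: filtering (blocked authors, policy rules, already-seen content).
    eligible = (c for c in candidates if is_eligible(c))
    # Stage 2: light ranking with a cheap score; keep only the best few.
    shortlist = heapq.nlargest(light_keep, eligible, key=light_score)
    # Stage 3: heavy ranking; the expensive model only sees the shortlist.
    return heapq.nlargest(page_size, shortlist, key=heavy_score)
```

The point of the shape is cost control: the expensive scorer runs on at most `light_keep` candidates, however many entered the top of the funnel.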
Back when Twitter was Twitter, around a third of total infrastructure costs could be directly attributed just to core home timeline relevance and serving, which did not include ad serving and other peripheral products.
Viewer intent
The intent of a viewer when opening their home feed is ambiguous, and feeds need to appeal to a generalized viewer intent for that session.
This is different from other products, where the viewer intent is clearer. For example, if a user searches for toronto maple leafs, their intent to find content relevant to a hockey team is obvious. If a user visits a content discovery experience (these things tend to be called explore, discover or popular), their intent is likely to find content outside of their strong-edge graph, perhaps because they’re dissatisfied with their home feed composition.
So while it’s not too difficult to understand what kinds of things a viewer is generally interested in, it’s very difficult to understand whether their intent for any given session on the feed is oriented around catch me up on the latest news or keep me entertained while I kill time.
Feed Optimality
Feed assembly is not just a pure recommendation system problem because of two key differences: user-created constraints and timeliness.
Users declare strong edges with other entities, usually other accounts. This behavior may result in a suboptimal Feed experience (by a variety of metrics-based technical measures), but results in the feed users want or expect.
Secondly, there’s an expectation that viewers see novel updates or deltas from their mental state of their network since their last login or session. Thus, personalization efforts are necessarily time-bound within some product-defined window, and need to account for the value of timeliness in the product experience, as well as the prevention of serving duplicative content.
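One common way to express a time-bound relevance window is an exponential decay applied on top of a relevance score, combined with a hard cutoff. This is a sketch of that general technique, not a description of any platform’s ranking; the six-hour half-life, the 24-hour window, and the function names are assumptions.

```python
def time_decayed_score(relevance, age_seconds, half_life_seconds=6 * 3600):
    # The score halves every `half_life_seconds`; fresh content keeps full relevance.
    return relevance * 0.5 ** (age_seconds / half_life_seconds)


def rank_recent(posts, now, seen_ids=(), window_seconds=24 * 3600,
                half_life_seconds=6 * 3600):
    """posts is an iterable of (post_id, relevance, created_at) tuples."""
    seen = set(seen_ids)
    scored = []
    for post_id, relevance, created_at in posts:
        age = now - created_at
        # Skip content outside the product-defined window or already served.
        if age < 0 or age > window_seconds or post_id in seen:
            continue
        scored.append((time_decayed_score(relevance, age, half_life_seconds), post_id))
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored]
```

A shorter half-life pushes the feed toward a reverse-chronological timeline, while a longer one lets highly relevant but older content compete with fresh updates.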
Feed assembly is an optimization problem with major, noisy constraints. Finding the right balance of engaging and expected entity interleaving in the Feed is a challenging task that needs to account for broader product and platform mechanics, product strategy, cost, and user needs.
There’s a lot more to write about here, perhaps in a future post, including:
- Pagination mechanics for clients
- Content filtering
- Candidate generation and ranking funnels
- Engagement and recommendation strategy
- Feed update streaming and incremental ranking
Feeds are extremely complex structures, where nuanced changes can dramatically impact an online platform’s entire architecture and business. I’ve found that mentally modeling the feed as a flattened, scrollable view of a personalized subset of a content platform’s data graph helps clarify its purpose and relationship to other product surfaces. Hopefully you will too!