NIST: Data Distribution in Privacy-Preserving Federated Learning

Our first post in the series introduced the concept of federated learning and described how it’s different from traditional centralized learning – in federated learning, the data is distributed among participating organizations, which share model updates (instead of raw data).

What kinds of techniques can we use to build privacy-preserving federated learning systems? It turns out to depend heavily on how the data is distributed. This post defines and explains the different ways data can be distributed, or partitioned, among participants in federated learning systems. Future posts in the series will describe specific techniques applicable in each situation.

Data partitioning schemes describe how data is distributed among participating organizations, as compared to the centralized scheme in which one party holds all the data.

In a horizontal partitioning scheme, the rows of the data are distributed among the participants.
In a vertical partitioning scheme, the columns of the data are distributed among the participants.
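
To make the two schemes concrete, here is a minimal sketch in plain Python. The toy dataset, the column names, and the helper functions `horizontal_partition` and `vertical_partition` are all hypothetical and chosen only for illustration. The sketch also assumes that, in the vertical case, each participant keeps a shared `id` column so that its rows can be matched with those of other participants.

```python
# Hypothetical toy dataset: each row is one record, each column one attribute.
COLUMNS = ["id", "age", "income", "defaulted"]
ROWS = [
    [1, 34, 52000, 0],
    [2, 45, 61000, 1],
    [3, 29, 48000, 0],
    [4, 52, 75000, 0],
]


def horizontal_partition(rows, num_parties):
    """Split the rows among the parties; every party holds all columns."""
    # Round-robin assignment: party i gets rows i, i + num_parties, ...
    return [rows[i::num_parties] for i in range(num_parties)]


def vertical_partition(rows, columns, column_groups, key="id"):
    """Split the columns among the parties; every party holds all rows.

    Assumption: each party also keeps the `key` column so that rows
    belonging to the same record can be aligned across parties.
    """
    parts = []
    for group in column_groups:
        names = [key] + [c for c in group if c != key]
        idx = [columns.index(n) for n in names]
        parts.append((names, [[row[i] for i in idx] for row in rows]))
    return parts


# Two participants in each scheme.
horizontal = horizontal_partition(ROWS, 2)
vertical = vertical_partition(ROWS, COLUMNS, [["age", "income"], ["defaulted"]])

for party, held_rows in enumerate(horizontal):
    print(f"horizontal, party {party}: columns={COLUMNS} rows={held_rows}")
for party, (names, held_rows) in enumerate(vertical):
    print(f"vertical, party {party}: columns={names} rows={held_rows}")
```

In the horizontal case each participant sees complete records for only some individuals; in the vertical case each participant sees only some attributes for every individual.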

Combinations of the two are also possible—we’ll get to those at the end of this post…
