Split Learning Project Page: Distributed deep learning without sharing raw data

Abstract: Split learning is a technique developed at the MIT Media Lab’s Camera Culture group that allows participating entities to train machine learning models without sharing any raw data.


Ramesh Raskar, Associate Professor, MIT Media Lab; Principal Investigator (raskar(at)mit.edu)
Praneeth Vepakomma, Research Assistant, MIT Media Lab (vepakom(at)mit.edu)
Otkrist Gupta, MIT Affiliate
Vitor Pamplona, MIT Affiliate
Kevin Pho, MIT UROP

Application Scenarios:

Split learning removes barriers to collaboration in a whole range of sectors, including healthcare, finance, security, logistics, governance, operations and manufacturing.

For example, a split learning configuration as shown below allows resource-constrained local hospitals with small individual datasets to collaborate and build a machine learning model that offers superior healthcare diagnostics, without sharing any raw data with one another, as required by trust, regulation and privacy constraints.

Landscape of related work: As shown below, split learning fills a gap: it makes it possible to perform advanced AI tasks, such as training machine learning models, in distributed settings with a substantial level of data protection.

Privacy-aware AI: Split Learning at the World Economic Forum and NITI Aayog

Health Grid: Blockchain-based Data Marketplace | Ramesh Raskar | WEF 2019
RAMESH RASKAR INTERVIEW WITH BLOXLIVE AT THE WEF
AI for All | Speedtalk | Ramesh Raskar
Ramesh Raskar: UNC-Chapel Hill Convocation Speaker | 2019

Key technical idea: In the simplest split learning configuration, each client (for example, a radiology center) trains a partial deep network up to a specific layer known as the cut layer. The outputs at the cut layer are sent to another entity (a server or another client), which completes the rest of the forward pass without ever seeing the raw data held by the clients. This completes a round of forward propagation without sharing raw data. The gradients are then backpropagated in a similar fashion from the server's last layer down to the cut layer. The gradients at the cut layer (and only these gradients) are sent back to the radiology client centers, which complete the rest of backpropagation locally. This process is repeated until the distributed split learning network is trained, without any participant seeing another's raw data.
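The round described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the group's implementation: a single client with one tanh layer up to the cut layer, a server holding a linear output layer and a squared-error loss, per-sample gradient descent, and labels available to the server (as assumed for this simplest configuration). The class and function names (`Client`, `Server`, `train`) and the toy data are hypothetical.

```python
import math
import random

class Client:
    """Hypothetical client: holds raw inputs and the layer up to the cut layer (one tanh layer)."""
    def __init__(self, n_in, n_cut, rng, lr=0.05):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_cut)]
        self.lr = lr

    def forward(self, x):
        self.x = x
        self.h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]
        return self.h  # cut-layer activations: the only thing sent to the server

    def backward(self, grad_h):
        # grad_h = dL/dh received from the server; finish backpropagation locally
        grad_z = [g * (1 - h * h) for g, h in zip(grad_h, self.h)]  # through tanh
        for i, gz in enumerate(grad_z):
            for j, xj in enumerate(self.x):
                self.W[i][j] -= self.lr * gz * xj

class Server:
    """Hypothetical server: holds the layers after the cut (a linear output) and the loss."""
    def __init__(self, n_cut, rng, lr=0.05):
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_cut)]
        self.lr = lr

    def step(self, h, y):
        y_hat = sum(wi * hi for wi, hi in zip(self.w, h))
        d = y_hat - y                          # dL/dy_hat for L = (y_hat - y)^2 / 2
        grad_h = [d * wi for wi in self.w]     # cut-layer gradient, computed before the update
        self.w = [wi - self.lr * d * hi for wi, hi in zip(self.w, h)]
        return 0.5 * d * d, grad_h

def train(data, epochs=200, seed=0):
    rng = random.Random(seed)
    client, server = Client(2, 4, rng), Server(4, rng)
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            h = client.forward(x)              # client -> server: activations only
            loss, grad_h = server.step(h, y)   # server -> client: cut-layer gradients only
            client.backward(grad_h)
            total += loss
        history.append(total / len(data))
    return history

# Hypothetical toy data: y = 0.5*x0 - 0.3*x1 on a 5x5 grid in [0, 1]^2
data = [([a / 4, b / 4], 0.5 * a / 4 - 0.3 * b / 4) for a in range(5) for b in range(5)]
history = train(data)
print(f"mean loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

Only `h` and `grad_h` cross the client/server boundary; the raw input `x` and the client weights `W` never leave the `Client` object. With several clients, the client-side weights also have to be handed from one client to the next between turns; that step is omitted here.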

SplitNN Architectures


Potential Partner/Want to connect with us?

Please fill out this simple form to reach out.

Frequently asked questions

  1. How does split learning work and what is new in our approach?
    Split learning attains higher resource efficiency for distributed deep learning than existing methods by splitting the model's architecture across distributed entities. It communicates only the activations and gradients at the split layer, unlike other popular methods that share weights or gradients from all layers. Split learning requires no sharing of raw data, whether labels or features.

  2. How is raw data protected and who can benefit?
    Split learning requires no sharing of raw data at all. Sectors such as healthcare, finance, security and surveillance, where data sharing is prohibited, will benefit from this approach to training distributed deep learning models. A variant of split learning called NoPeek SplitNN further and drastically reduces leakage through the communicated activations by minimizing their distance correlation with the raw data, while maintaining model performance through a categorical cross-entropy loss.

  3. How long will it take to transition from laboratory setting to actual deployments between cooperating entities?
    The approach is easily deployable for collaboration within and across entities or organizations, and it is highly versatile in terms of possible network topologies. Because it is efficient in computation, memory and communication bandwidth, it is also naturally suited to distributed learning where the clients are pervasive, ubiquitous edge devices such as mobile phones or IoT devices, as well as to learning across larger devices and organizations.
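The NoPeek idea from the second answer can be made concrete by computing the sample distance correlation between raw inputs X and cut-layer activations Z. The sketch below is a pure-Python, O(n²) version of the standard sample distance correlation (double-centered pairwise distance matrices). It assumes the NoPeek objective is a weighted sum of the form α·DCOR(X, Z) + CCE; the names `nopeek_loss` and `alpha` are hypothetical. In real training this would be computed on minibatches inside an autodiff framework so that its gradient reaches the client layers; here it only evaluates the value.

```python
import math

def _centered_distances(points):
    """Double-centered Euclidean distance matrix: a_jk - row_j - col_k + grand mean."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]   # the matrix is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)] for j in range(n)]

def distance_correlation(xs, zs):
    """Sample distance correlation between paired samples xs[i] and zs[i] (value in [0, 1])."""
    n = len(xs)
    A, B = _centered_distances(xs), _centered_distances(zs)
    dcov2 = sum(A[j][k] * B[j][k] for j in range(n) for k in range(n)) / n**2
    dvar_x = sum(a * a for r in A for a in r) / n**2
    dvar_z = sum(b * b for r in B for b in r) / n**2
    if dvar_x == 0 or dvar_z == 0:
        return 0.0                  # a constant variable carries no dependence
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_z))

def nopeek_loss(raw_inputs, cut_activations, cce, alpha=0.1):
    """Hypothetical NoPeek-style objective: alpha * DCOR(X, Z) + categorical cross-entropy."""
    return alpha * distance_correlation(raw_inputs, cut_activations) + cce

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]   # hypothetical raw inputs
Z = [[1.0], [0.0], [1.0], [0.0]]                       # hypothetical cut-layer activations
print(distance_correlation(X, X), distance_correlation(X, Z))
```

Distance correlation is 1 when Z is a scaled and shifted copy of X and 0 under independence, so driving it down pushes the communicated activations away from being an easily invertible copy of the raw data.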


Split Learning Papers:

1.) Distributed learning of deep neural network over multiple agents, Otkrist Gupta and Ramesh Raskar, Journal of Network and Computer Applications 116 (2018). (PDF)

2.) Reducing leakage in distributed deep learning for sensitive health data, Praneeth Vepakomma, Otkrist Gupta, Abhimanyu Dubey, Ramesh Raskar, accepted to the ICLR 2019 Workshop on AI for Social Good (2019). (PDF)

3.) Split learning for health: Distributed deep learning without sharing raw patient data, Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, Ramesh Raskar, accepted to the ICLR 2019 Workshop on AI for Social Good (2018). (PDF)

4.) Survey paper: No Peek: A Survey of private distributed deep learning, Praneeth Vepakomma, Tristan Swedish, Ramesh Raskar, Otkrist Gupta, Abhimanyu Dubey (2018). (PDF)

AutoML Papers:

1.) Accelerating neural architecture search using performance prediction, Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik, conference paper at ICLR (2018). (PDF)

2.) Designing neural network architectures using reinforcement learning, Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar, conference paper at ICLR (2017). (PDF)

We are giving a half-day tutorial at CVPR 2019: On Distributed Private Machine Learning for Computer Vision: Federated Learning, Split Learning and Beyond by Brendan McMahan (Google, USA), Jakub Konečný (Google, USA), Otkrist Gupta (LendBuzz), Ramesh Raskar (MIT Media Lab, Cambridge, Massachusetts, USA), Hassan Takabi (University of North Texas, Texas, USA) and Praneeth Vepakomma (MIT Media Lab, Cambridge, Massachusetts, USA).

Recent talk on Split Learning at Datacouncil.ai SF 2019 (Slides)

Split learning’s computational and communication efficiency on clients:

Client-side communication costs are significantly reduced because clients only exchange data tied to the initial layers of the split learning network (SplitNN) that precede the split: the cut-layer activations and their gradients, rather than weights for the whole network. Client-side computation costs for learning the network's weights are significantly reduced for the same reason, since each client trains only those initial layers. In terms of model performance, SplitNN accuracies remained competitive with other distributed deep learning methods such as federated learning and large-batch synchronous SGD, while imposing a drastically smaller client-side computational burden when training on a larger number of clients. This is shown below in teraflops of computation and gigabytes of communication when split learning is used to train ResNet and VGG architectures over 100 and 500 clients on the CIFAR-10 and CIFAR-100 datasets.
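Why per-client communication shrinks can be checked with back-of-the-envelope arithmetic. The sketch below assumes a simple cost model, not the one behind the measurements above: in split learning a client sends one activation vector and receives one gradient vector of the same size per training sample; in federated learning a client downloads the full model and uploads a full-size update once per round. It ignores protocol overhead and any client-to-client weight handoff. All function names and numbers are hypothetical.

```python
def split_learning_bytes(num_samples, cut_layer_size, epochs=1, bytes_per_value=4):
    """Client traffic for split learning: per sample, one activation vector goes out
    and one gradient vector of the same size comes back."""
    return 2 * num_samples * cut_layer_size * epochs * bytes_per_value

def federated_learning_bytes(num_params, rounds=1, bytes_per_value=4):
    """Client traffic for federated learning: per round, the full model is downloaded
    and a full-size update is uploaded."""
    return 2 * num_params * rounds * bytes_per_value

def crossover_samples(num_params, cut_layer_size):
    """Samples per client at which one split learning epoch costs as much as one FL round."""
    return num_params / cut_layer_size

# Hypothetical numbers for illustration only (not taken from the experiments above)
params, cut = 10_000_000, 4_096
for n in (100, 1_000, 10_000):
    sl = split_learning_bytes(n, cut) / 1e9
    fl = federated_learning_bytes(params) / 1e9
    print(f"{n:>6} samples/client: split {sl:.3f} GB vs federated {fl:.3f} GB")
print(f"crossover at about {crossover_samples(params, cut):.0f} samples per client")
```

Under this model, split learning traffic grows with the amount of data each client holds, while federated learning traffic grows with model size. Spreading a fixed dataset over more clients therefore shifts the balance toward split learning, which is consistent with the advantage being largest for large numbers of clients.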

Versatile plug-and-play configurations of split learning

Split learning configurations cater to various practical settings, including i) multiple entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks, iii) learning without sharing labels, iv) multi-task split learning, v) multi-hop split learning, and other hybrid possibilities, as shown below and detailed further in our paper (PDF).
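Setting iii), learning without sharing labels, is often described as a U-shaped split: the client keeps both the first layers and the final layers together with the loss, so the server only ever sees intermediate activations and the gradients for them. The sketch below is a minimal pure-Python illustration under assumptions of our own (one tanh layer per side, a linear output head, squared-error loss, per-sample gradient descent, toy data); the names `Dense` and `train_u_shaped` are hypothetical.

```python
import math
import random

class Dense:
    """Hypothetical fully connected layer with optional tanh; caches values for backprop."""
    def __init__(self, n_in, n_out, rng, act=True, lr=0.05):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.act, self.lr = act, lr

    def forward(self, x):
        self.x = x
        z = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        self.out = [math.tanh(v) for v in z] if self.act else z
        return self.out

    def backward(self, grad_out):
        # chain through tanh if present, then compute dL/dx before updating W
        g = [go * (1 - o * o) for go, o in zip(grad_out, self.out)] if self.act else grad_out
        grad_x = [sum(g[i] * self.W[i][j] for i in range(len(g))) for j in range(len(self.x))]
        for i, gi in enumerate(g):
            for j, xj in enumerate(self.x):
                self.W[i][j] -= self.lr * gi * xj
        return grad_x

def train_u_shaped(data, epochs=200, seed=0):
    rng = random.Random(seed)
    client_front = Dense(2, 4, rng)            # client: raw features -> first cut
    server_middle = Dense(4, 4, rng)           # server: first cut -> second cut
    client_head = Dense(4, 1, rng, act=False)  # client: second cut -> prediction
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            h1 = client_front.forward(x)            # client -> server: activations
            h2 = server_middle.forward(h1)          # server -> client: activations
            y_hat = client_head.forward(h2)[0]      # label y never leaves the client
            total += 0.5 * (y_hat - y) ** 2
            g2 = client_head.backward([y_hat - y])  # client -> server: dL/dh2
            g1 = server_middle.backward(g2)         # server -> client: dL/dh1
            client_front.backward(g1)
        history.append(total / len(data))
    return history

data = [([a / 4, b / 4], 0.5 * a / 4 - 0.3 * b / 4) for a in range(5) for b in range(5)]
history = train_u_shaped(data)
print(f"mean loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

Compared with the simplest configuration, the price of keeping labels local is one extra exchange of activations and gradients per step, since the data path leaves the client and returns to it.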

News stories

  1. A new AI method can train on medical records without revealing patient data

  2. A little-known AI method can train on your health data without threatening your privacy

  3. The Algorithm Newsletter: The privacy-preserving AI technique that will transform healthcare

  4. Les Echos: Medical secrecy, artificial intelligence and GDPR (RGPD): irreconcilable? Not so sure…
