Abstract: Split learning is a technique developed at the MIT Media Lab’s Camera Culture group that allows for participating entities to train machine learning models without sharing any raw data.
Ramesh Raskar, Associate Professor, MIT Media Lab; Principal Investigator (raskar(at)mit.edu)
Praneeth Vepakomma, Research Assistant, MIT Media Lab (vepakom(at)mit.edu)
Otkrist Gupta, MIT Affiliate
Vitor Pamplona, MIT Affiliate
Kevin Pho, MIT UROP
Split learning removes barriers for collaboration in a whole range of sectors including healthcare, finance, security, logistics, governance, operations and manufacturing.
For example, a split learning configuration as shown below allows for resource-constrained local hospitals with smaller individual datasets to collaborate and build a machine learning model that offers superior healthcare diagnostics, without sharing any raw data across each other as necessitated by trust, regulation and privacy.
Landscape of related work: As shown below, split learning ideally fills the gap for being able to perform advanced AI tasks like training machine learning models in distributed settings with a substantial level of data protection.
Key technical idea: In the simplest of configurations of split learning, each client (for example, radiology center) trains a partial deep network up to a specific layer known as the cut layer. The outputs at the cut layer are sent to another entity (server/another client) which completes the rest of the training without looking at raw data from
any client that holds the raw data. This completes a round of forward propagation without sharing raw data. The gradients
are now back propagated again from its last layer until the cut layer in a similar fashion. The gradients at the
cut layer (and only these gradients) are sent back to radiology client centers. The rest of back
propagation is now completed at the radiology client centers. This process is continued until the
distributed split learning network is trained without looking at each others raw data.
Please fill this simple form to reach out
How does split learning work and what is new in our approach?
Split learning attains high resource efficiency for distributed deep learning in comparison to existing methods by splitting the models architecture across distributed entities. It only communicates activations and gradients just from the split layer unlike other popular methods that share weights/gradients from all the layers. Split learning requires no raw data sharing; either of labels or features.
How is raw data protected and who can get positively impacted?
Split learning requires absolutely no raw data sharing. Sectors like healthcare, finance, security, surveillance and others where data sharing is prohibited will benefit from our approach for training distributed deep learning models. Another modality of split learning called NoPeek SplitNN also drastically reduces leakage due to any communicated activations by reducing their distance correlation with raw data while maintaining model performance via categorical cross-entropy.
How long will it take to transition from laboratory setting to actual deployments between cooperating entities?
The approach is easily deployable for inter and intra entity or organizational collaboration and is highly versatile in terms of possible network topologies. Due to its high resource efficiency in terms of computations, memory, communication bandwidth it is also naturally suitable for distributed learning where the clients are pervasive and ubiquitous edge devices like mobile phones or IOT devices as well as across larger devices and organizations.
Split Learning Papers:
1.) Distributed learning of deep neural network over multiple agents, Otkrist Gupta and Ramesh Raskar, In: Journal of Network and Computer Applications 116, (PDF) (2018)
2.) Reducing leakage in distributed deep learning for sensitive health data, Praneeth Vepakomma, Otkrist Gupta, Abhimanyu Dubey, Ramesh Raskar, Accepted to ICLR 2019 Workshop on AI for social good.(PDF) (2019)
3.) Split learning for health: Distributed deep learning without sharing raw patient data, Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, Ramesh Raskar, Accepted to ICLR 2019 Workshop on AI for social good.(PDF) (2018)
4.) Survey paper: No Peek: A Survey of private distributed deep learning, Praneeth Vepakomma, Tristan Swedish, Ramesh Raskar, Otkrist Gupta, Abhimanyu Dubey, (PDF) (2018)
1.) Accelerating neural architecture search using performance prediction, Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik, In: conference paper at ICLR, (PDF) (2018)
2.) Designing neural network architecture using reinforcement learning, Bowen Baker, Otkrist Gupta, Nikhil Naik & Ramesh Raskar, In: conference paper at ICLR, (PDF) (2017)
We are giving a half-day tutorial at CVPR 2019:
On Distributed Private Machine Learning for Computer Vision: Federated Learning, Split Learning and Beyond by
Brendan McMahan (Google, USA), Jakub Konečný (Google, USA), Otkrist Gupta (LendBuzz), Ramesh Raskar (MIT Media Lab, Cambridge, Massachusetts, USA), Hassan Takabi (University of North Texas, Texas, USA) and Praneeth Vepakomma (MIT Media Lab, Cambridge, Massachusetts, USA).
Client-side communication costs are significantly reduced as the data to be transmitted is restricted to initial layers of the split learning network (splitNN) prior to the split. The client-side computation costs of learning the weights of the network are also significantly reduced for the same reason. In terms of model performance, the accuracies of Split NN remained competitive to other distributed deep learning methods like federated learning and large batch synchronous SGD with a drastically smaller client side computational burden when training on a larger number of clients as shown below in terms of teraflops of computation and gigabytes of communication when split learning is used to train Resnet and VGG architectures over 100 and 500 clients with CIFAR 10 and CIFAR 100 datasets.
Versatile configurations of split learning configurations cater to various practical settings of i) multiple entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks, iii) learning without sharing labels, iv) multi-task split learning, v) multi-hop split learning and other hybrid possibilities to name a few as shown below and further detailed in our paper here (PDF)
Please fill this simple form to reach out