Machine learning models trained on data from blockchain-based marketplaces have the potential to create the world’s most powerful artificial intelligences.
They combine two potent primitives: private machine learning, which allows training to be done on sensitive private data without revealing it, and blockchain-based incentives, which allow these systems to attract the best data and models to make them smarter. The result is open marketplaces where anyone can sell their data while keeping it private, and where developers can use incentives to attract the best data for their algorithms.
Constructing these systems is challenging and the requisite building blocks are still being created, but simple initial versions look like they are starting to become possible. I believe these marketplaces will transition us out of the current era of Web 2.0 data monopolies into a Web 3.0 era of open competition for data and algorithms, where both are directly monetized.
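To make the "private machine learning" primitive a little more concrete, here is a minimal sketch of additive secret sharing, one common building block of secure multiparty computation. It only shows how several parties could compute a sum over private values without any single party seeing an individual value. The modulus, the function names, and the three-party setup are assumptions made for illustration; real private training needs far more machinery (multiplication protocols, fixed-point arithmetic, protection against malicious parties).

```python
import secrets

# Field modulus for the shares. An arbitrary choice for this sketch;
# values must be non-negative integers smaller than PRIME.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; any strict subset of them looks uniformly random."""
    return sum(shares) % PRIME


def secure_sum(private_values, n_parties=3):
    """Each data owner splits its value across the parties, each party adds
    up the shares it holds, and only the grand total is reconstructed."""
    per_party = [0] * n_parties
    for value in private_values:
        for i, s in enumerate(share(value, n_parties)):
            per_party[i] = (per_party[i] + s) % PRIME
    return reconstruct(per_party)


if __name__ == "__main__":
    # Three data owners contribute private numbers; only the total is revealed.
    print(secure_sum([12, 30, 7]))
```

The point of the sketch is the shape of the guarantee: each party holds only random-looking shares, yet the aggregate is exact.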
Origin
The seed of this idea came in 2015 from talking with Richard of Numerai. Numerai is a hedge fund that sends encrypted market data to any data scientist who wants to compete to model the stock market. Numerai combines the best model submissions into a “metamodel”, trades that metamodel, and pays data scientists whose models perform well.
Having data scientists compete seemed like a powerful idea. So it got me thinking: can you create a fully decentralized version of this system that could be generalized to any problem? I believe the answer is yes.
Construction
As an example, let’s try creating a fully decentralized system for trading cryptocurrencies on decentralized exchanges. This is one of many potential constructions:
Data: Data providers stake data and make it available to modelers.
Model building: Modelers choose what data to use and create models. Training is done using a secure computation method that allows models to be trained without revealing the underlying data. Models are staked as well.
Metamodel building: Models can be combined into a metamodel. This step is optional: you can imagine models that are used without being combined into a metamodel.
Using the metamodel: A smart contract takes the metamodel and trades programmatically through decentralized exchange mechanisms on-chain.
Distributing gains/losses: After some time period passes, trading produces a profit or loss. This profit or loss is divided up amongst contributors to the metamodel based on how much smarter they made it. Models that contributed negatively have some or all of their staked funds taken. Models in turn perform similar distributions and stake slashing for their data providers.
Verifiable computation: Computation for each step is performed either centrally, in a way that is verifiable and challengeable through a verification game like Truebit, or in a decentralized fashion using secure multiparty computation.
Hosting: Data and models are hosted either on IPFS or with nodes in a secure multiparty computation network, as on-chain storage would be too expensive.
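The staking, metamodel, and distribution steps above can be sketched in a few lines of off-chain Python. This is a toy illustration, not a specification: the stake-weighted average, the leave-one-out measure of "how much smarter a model made the metamodel", the 50% slash rate, and all names (`Model`, `metamodel_signal`, `settle`) are assumptions chosen for clarity. The further cascade from models to their data providers is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    stake: float       # tokens the modeler has put at risk
    prediction: float  # hypothetical signal in [-1, 1]; positive means "buy"


def metamodel_signal(models):
    """Stake-weighted average of predictions (an assumed combination rule)."""
    total_stake = sum(m.stake for m in models)
    if total_stake == 0:
        return 0.0
    return sum(m.stake * m.prediction for m in models) / total_stake


def contributions(models, realized_return):
    """Leave-one-out contribution: how much the metamodel's return per unit
    of capital changes when a given model is removed."""
    full = metamodel_signal(models) * realized_return
    result = {}
    for m in models:
        others = [o for o in models if o is not m]
        result[m.name] = full - metamodel_signal(others) * realized_return
    return result


def settle(models, realized_return, capital, slash_rate=0.5):
    """Return (payouts, slashed) for one trading period.

    Profit is split among positive contributors in proportion to their
    contribution; negative contributors lose slash_rate of their stake.
    """
    contrib = contributions(models, realized_return)
    profit = capital * metamodel_signal(models) * realized_return
    positive_total = sum(c for c in contrib.values() if c > 0)
    payouts, slashed = {}, {}
    for m in models:
        c = contrib[m.name]
        pays = c > 0 and profit > 0
        payouts[m.name] = profit * c / positive_total if pays else 0.0
        slashed[m.name] = m.stake * slash_rate if c < 0 else 0.0
    return payouts, slashed


if __name__ == "__main__":
    models = [
        Model("A", stake=100.0, prediction=0.8),
        Model("B", stake=50.0, prediction=0.4),
        Model("C", stake=50.0, prediction=-0.6),
    ]
    # Suppose the traded asset rose 10% over the period.
    payouts, slashed = settle(models, realized_return=0.1, capital=1000.0)
    print(payouts)
    print(slashed)
```

In this example the asset went up, so the two models that predicted "buy" share the profit, while the model that predicted "sell" dragged the metamodel down and loses part of its stake. On a real system this logic would live in a smart contract and the contribution measure would need to be verifiable, which is the role of the verifiable computation step.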
What makes this system powerful?
Incentives to attract the best data globally: Incentives to attract data are the most potent part of the system, as data tends to be the limiting factor for most machine learning. In the same way Bitcoin created an emergent system with the most compute power in the world through open incentives, a properly engineered incentive structure for data would cause the best data in the world for your application to come to you. And it’s nearly impossible to shut down a system where data is coming from thousands or millions of sources.
Competition between algorithms: The system creates open competition between models and algorithms in places where it previously didn’t exist. Picture a decentralized Facebook with thousands of competing newsfeed algorithms.
Transparency in rewards: Because all computation is verifiable, data and model providers can see that they are getting fair value for what they’ve submitted, making them far more likely to participate.
Automation: Taking action on-chain and generating value directly in tokens creates an automated and trustless closed loop.
Network effects: Multi-sided network effects from users, data providers, and data scientists make the system self-reinforcing. The better it performs, the more capital it attracts, which means more potential payouts, which attracts more data providers and data scientists, who make the system smarter, which in turn attracts more capital, and back around again.
Source/More: Blockchain-based Machine Learning Marketplaces – Fred Ehrsam – Medium