Chain Century Conversation with Dr. Yongzheng Jia: An Interpretation of Artificial Intelligence and Storage and Computing on Blockchain


On August 18th, the Chain Century Finance Conversation, Distributed Storage No. 6, was held. It was hosted by Chain Century Finance and Everchain, co-sponsored by Juying International and the PAI Community, and co-organized by the BTRAC Global Digital Network Advanced Think Tank, Interview Blockchain, Coin World, and Carbon Chain Value, and it was broadcast live on Golden Finance and nearly 200 community platforms.

The event had five leading industry media outlets as strategic partners: Golden Finance, Cointelegraph, Mars Finance, Block Technology, and Interstellar Vision. More than 50 blockchain media outlets also supported it.

Under the theme “Artificial Intelligence and Storage and Computing on Blockchain”, the event invited Dr. Yongzheng Jia, Secretary-General of the Blockchain Branch of the Chinese Society of Technology and Economics, as the guest, with Shike Jiao, CEO of Chain Century Finance, as host. Together they discussed artificial intelligence, blockchain storage and computing, and other hot topics such as business opportunities in distributed storage.

Special guest: Dr. Yongzheng Jia

Secretary-General of the Blockchain Branch of the Chinese Society of Technology and Economics and founder & CEO of Ever Chain. He received his bachelor’s degree from the Computer Science Experimental Class (Yao Class) at Tsinghua University in 2009 and his Ph.D. from the Institute for Interdisciplinary Information Sciences, and he was a visiting scholar at the University of California, Berkeley and the University of Pennsylvania. In 2013 he co-founded Xuetang Online, the world’s largest Chinese-language MOOC platform, and in 2018 he founded Ever Chain. His research covers dynamic optimization of large-scale social networks, cloud computing systems, and the application of artificial intelligence and blockchain technology to products and services for large user bases. He has published many papers at top conferences and journals in the Internet, distributed systems, artificial intelligence, and game theory fields, such as WWW, IEEE/ACM Transactions on Networking, JSAC, ICSE, and ICDCS.

Host: Shike Jiao

CEO of Chain Century Finance and Partner at the BTRAC Global Digital Network Advanced Think Tank. Together with leading industry experts, Chain Century has launched a series of conversations on topics such as “New Infrastructure”, “IPFS”, “Filecoin”, and the “New Third Board”, bringing together experts, pioneers, and other prominent figures to help more people seize opportunities in emerging sectors.

A Look Back on the Conversation:

What is the relationship between artificial intelligence and storage and computing on the blockchain?

[ Dr. Yongzheng Jia ] : First of all, whether it is the earlier big data and cloud computing or today’s artificial intelligence technology, all of them rest on secure data storage, mining value from data, and privacy protection.

As we all know, artificial intelligence algorithms require large amounts of data and large models, so we need to store and process large amounts of data.

The convolutional neural network (CNN) AlexNet, introduced in 2012, has 60 million parameters.

Google’s NLP model BERT, which was popular for a while, has about 340 million parameters in its large version.

GPT-2, the predecessor of the GPT-3 model, has 1.5 billion parameters.

Nvidia’s Megatron-LM has 8.3 billion parameters.

In February 2020, Microsoft’s Turing-NLG came out with 17 billion parameters.

In June 2020, OpenAI’s GPT-3 came out with 175 billion parameters, the largest AI model to date.
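To put these parameter counts in storage terms, the following back-of-the-envelope Python sketch assumes 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit floats. It counts the raw weights only; training data, optimizer state, and checkpoints, which usually need far more space, are left out.

```python
# Approximate parameter counts quoted in the conversation (illustrative only).
MODELS = {
    "AlexNet": 60e6,
    "GPT-2": 1.5e9,
    "Turing-NLG": 17e9,
    "GPT-3": 175e9,
}


def weight_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, n in MODELS.items():
        fp32 = weight_size_gb(n, 4)  # 32-bit floats: 4 bytes each
        fp16 = weight_size_gb(n, 2)  # 16-bit floats: 2 bytes each
        print(f"{name:<11} {fp32:>8.2f} GB (fp32) {fp16:>8.2f} GB (fp16)")
```

Under these assumptions, GPT-3’s weights alone come to about 700 GB in 32-bit form, versus about 0.24 GB for AlexNet.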

At the same time, the data used by artificial intelligence and big data systems around the world is growing at a rate of 180 billion terabytes per year, which reflects AI’s demand for massive data storage and computing. Clearly, AI has entered an era that is both data-intensive and compute-intensive. We need to fully guarantee the security of AI data and models to prevent data and model parameters from being lost. We also need to protect data privacy to prevent losses caused by leaks of user information while the data is being used.

We have noticed that distributed storage and blockchain technology can provide better security for AI. In traditional data storage systems, we usually have to rely on redundant backups and disaster recovery to keep data and models safe. Distributed storage, represented by IPFS, lets AI data and models be stored more securely and effectively prevents single points of failure. Blockchain technology can also give AI data and algorithms access control that does not rely on a trusted third party, and it records on chain, at a fine granularity, how data is stored and used so that this history can be traced. Therefore, through the blockchain we can build better mechanisms for storing and sharing data, promote data circulation and value creation, and protect data privacy.
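The idea of tracing how data is stored and used can be illustrated with a toy append-only log in which every entry commits to the previous entry’s hash, so altering any past record breaks the chain. This Python sketch is hypothetical (the `AccessLog` class and its fields are not taken from any specific blockchain) and leaves out signatures, consensus, and replication, which a real chain would need:

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


@dataclass
class AccessLog:
    """Hypothetical hash-chained log of who stored or used which data item."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, data_id: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "data_id": data_id, "prev": prev}
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash; any edited entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("actor", "action", "data_id", "prev")}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True

    def history(self, data_id: str) -> list:
        """Fine-grained trace: every (actor, action) recorded for one data item."""
        return [(e["actor"], e["action"]) for e in self.entries if e["data_id"] == data_id]
```

Usage: append a few events, call `verify()` to confirm the chain is intact, and call `history("dataset-1")` to list who did what with that dataset; editing an old entry in place makes `verify()` return False.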

At the same time, we need to further think about how to better combine AI and blockchain technology so that the two can integrate and promote each other.

Here, I want to share with you the Proof of Useful Work (PoUW) project proposed by Project PAI. I think this research project represents the trend of integrating AI and blockchain technology and has the potential to change the industry.

The purpose of PoUW is to replace the hashing of traditional PoW with AI computation, so that the computing power of the blockchain can be used for real AI applications and becomes “useful”. In the future, this computing power could be used not only for machine learning training but also for general-purpose on-chain computation, such as on-chain data analysis and cryptography-based on-chain privacy computation (for example, zero-knowledge proofs and secure multi-party computation).

As we all know, Bitcoin mining is a very time- and resource-consuming process: adding each block of transactions to the Bitcoin blockchain costs a great deal of time and resources. The Proof of Work (PoW) mechanism used by the Bitcoin protocol reaches consensus through a huge number of hash operations, and these calculations serve no purpose other than consensus. PoUW is a novel protocol built on training machine learning models on the blockchain: after honestly completing a certain amount of machine learning training work, a miner gets a chance to mint new coins and receive the block reward. At the same time, we have brought AI and on-chain computing customers into the network. Customers submit AI training tasks to the PoUW blockchain network and pay tokens to the nodes that take part in the training. This gives nodes an additional incentive to participate, on top of the block reward provided by the system, and it is grounded in real demand.
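For contrast with PoUW, here is a simplified Python sketch of the hash puzzle behind PoW: the miner tries nonce after nonce until a double SHA-256 hash falls below a target. It simplifies real Bitcoin, which hashes an 80-byte block header, reads the hash as a little-endian number, and encodes the target compactly in the header.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for blocks."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Brute-force a nonce whose hash, read as an integer, is below the target."""
    target = 1 << (256 - difficulty_bits)  # more difficulty bits -> smaller target
    for nonce in range(max_nonce):
        h = sha256d(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(h, "big") < target:
            return nonce, h.hex()
    return None  # no valid nonce in range


def check(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Anyone can verify a claimed solution with a single hash."""
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(sha256d(header + nonce.to_bytes(4, "little")), "big") < target
```

Every hash except the winning one is thrown away, which is the waste PoUW aims to redirect to useful computation.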

The following figure shows a PoUW working model, describing how the participants in the PoUW system work together. First, customers submit a machine learning or on-chain computing task to PAI’s PoUW network and pay for it in PAI coins. Worker nodes (miners) perform the AI training and computation needed to complete the task. Supervisors and evaluators in the network verify the workers’ results, evaluate how income and payments should be distributed, and guard against malicious behavior by Byzantine nodes. Ordinary nodes can also carry out regular on-chain transactions in the PoUW network and use general blockchain services. In this way, PAI’s PoUW blockchain secures the entire machine learning training process and lets real AI tasks be solved with the computing power of the blockchain network. It also introduces more incentives into the blockchain, giving the token on the PoUW blockchain (PAI coin) richer application scenarios.
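One way to picture “useful” mining is a toy in which each unit of work is a real training step and the lottery ticket is a hash of the block header and the resulting model state, so the training itself produces the mining attempts; a verifier re-runs the step to confirm the work was done honestly. This Python sketch only illustrates that general idea. The tiny regression task, the ticket format, and all function names are hypothetical, and it is not Project PAI’s actual protocol, which also involves supervisors, evaluators, and payment settlement:

```python
import hashlib
import struct

# Hypothetical toy task a customer might submit: fit y = w*x + b (data follows y = 2x + 1).
DATA = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
LR = 0.01  # learning rate


def train_step(w: float, b: float) -> tuple:
    """One full-batch gradient-descent step on mean squared error (the 'useful work')."""
    n = len(DATA)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in DATA) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in DATA) / n
    return w - LR * grad_w, b - LR * grad_b


def ticket(header: bytes, step: int, w: float, b: float) -> int:
    """Lottery ticket derived from the block header and the model state after a step."""
    payload = header + struct.pack("<Qdd", step, w, b)
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def is_winner(t: int, difficulty_bits: int) -> bool:
    return t < (1 << (256 - difficulty_bits))


def worker_iteration(header: bytes, step: int, w: float, b: float, difficulty_bits: int):
    """Worker: train one step, then check whether the new state also wins the block."""
    new_w, new_b = train_step(w, b)
    return new_w, new_b, is_winner(ticket(header, step, new_w, new_b), difficulty_bits)


def verify(header: bytes, step: int, old_w: float, old_b: float,
           claimed_w: float, claimed_b: float, difficulty_bits: int) -> bool:
    """Verifier: re-execute the step and check both the work and the winning ticket.

    Exact float comparison works here because the re-execution is identical; a real
    system would have to handle floating-point nondeterminism across hardware.
    """
    w, b = train_step(old_w, old_b)
    if (w, b) != (claimed_w, claimed_b):
        return False  # the claimed state was not produced by honest training
    return is_winner(ticket(header, step, w, b), difficulty_bits)
```

A worker would call `worker_iteration` with `step = 0, 1, 2, ...`, starting from `w = b = 0.0`, until it reports a win; a ticket only passes `verify` if the claimed model state really follows from the previous state by one training step.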

Currently, Project PAI’s PoUW project is in the testnet stage, and its mainnet launch is planned for next year. For AI and on-chain computing to run well on a blockchain network, we also need to consider the blockchain’s storage requirements. The PAI blockchain has designed and implemented its own data storage protocol (PDP-2), which has been used in areas such as supply chain management and product traceability. In the future, the PAI data storage protocol will also connect to the BitTorrent and IPFS networks to support AI training and general computing at a larger scale. Building on the PAI data storage protocol, Project PAI has also developed PAI PASS, a blockchain-based digital identity authentication and authorization system used to share and manage data on the blockchain while better protecting user privacy.

Here, I would like to thank Project PAI for sharing its latest research on combining AI and blockchain. As today’s case study, the PoUW project gives us a complete system architecture that combines AI with blockchain storage and computing.

For more research results on the PAI blockchain and PoUW, you are welcome to view:

IPFS is hailed as a new generation of underlying Internet protocol. What role do you think it plays in the development of distributed storage?

[ Dr. Yongzheng Jia ] : First of all, distributed storage systems and distributed file systems are not necessarily decentralized. They can be scalable storage architectures launched and operated by a single entity, which spread the storage load across multiple storage servers and use a location (metadata) server to find and store information, ensuring the reliability, availability, and data access efficiency of the system. Classic distributed storage systems include GFS, Hadoop (HDFS), and GlusterFS, all of which have had an important impact on distributed storage technology.
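As a rough illustration of the single-operator design just described, this Python sketch has one metadata (location) server that splits files into chunks, places replicas on several storage servers, and answers lookups. It loosely mirrors the master/chunkserver split of systems like GFS, with the servers simulated in memory; the class and its parameters are hypothetical:

```python
import hashlib


class MetadataServer:
    """One entity decides placement and answers 'where is chunk i of file f?'."""

    def __init__(self, servers, replicas: int = 3, chunk_size: int = 4):
        self.servers = list(servers)
        self.replicas = min(replicas, len(self.servers))
        self.chunk_size = chunk_size
        self.locations = {}                           # (file, chunk index) -> server names
        self.storage = {s: {} for s in self.servers}  # simulated disks of each server

    def put(self, filename: str, data: bytes) -> int:
        chunks = [data[i:i + self.chunk_size] for i in range(0, len(data), self.chunk_size)]
        for idx, chunk in enumerate(chunks):
            # Deterministic placement: a hash picks a start server, replicas go to the next ones.
            digest = hashlib.sha256(f"{filename}:{idx}".encode()).digest()
            start = int.from_bytes(digest, "big") % len(self.servers)
            chosen = [self.servers[(start + r) % len(self.servers)] for r in range(self.replicas)]
            for s in chosen:
                self.storage[s][(filename, idx)] = chunk
            self.locations[(filename, idx)] = chosen
        return len(chunks)

    def get(self, filename: str, failed=frozenset()) -> bytes:
        """Reassemble a file, reading each chunk from any replica that is still up."""
        out, idx = [], 0
        while (filename, idx) in self.locations:
            alive = [s for s in self.locations[(filename, idx)] if s not in failed]
            if not alive:
                raise OSError(f"all replicas of chunk {idx} of {filename} are down")
            out.append(self.storage[alive[0]][(filename, idx)])
            idx += 1
        return b"".join(out)
```

Reads survive the loss of a storage server as long as one replica of each chunk remains, but every lookup still goes through the one metadata server controlled by a single entity, which is where IPFS differs.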

IPFS (InterPlanetary File System), by contrast, is a peer-to-peer (P2P) distributed file system that can be fully decentralized, which sets it apart from the distributed storage and file systems above. This is why IPFS works well with blockchain systems and why an incentive mechanism (Filecoin) can be built on top of it.

At the same time, IPFS can also serve as a new generation of underlying Internet protocol, and it is entirely reasonable to see it as an alternative to, and an upgrade of, HTTP. Unlike HTTP, which fetches content from one specific server, IPFS uses P2P transmission and can retrieve content from any node that holds a copy, so it is far less prone to errors such as 404 and 502. Content can remain available for as long as at least one node keeps storing it. By making use of idle hard drives and bandwidth, IPFS avoids much of the cost of maintaining dedicated equipment, saves bandwidth and storage resources, and can greatly reduce the cost of data transmission.

Here is a brief introduction to the working principle of IPFS:

Each file and all blocks in IPFS are given a unique fingerprint called a cryptographic hash.

IPFS removes duplicates across the network: content with the same hash is recognized as identical and stored only once. IPFS also tracks the version history of each file.

Each network node stores only the content it is interested in, plus some index information that helps figure out who is storing what.

To find a file, you use its hash to locate the nodes on the network that store it and retrieve it from them.

Using IPNS (the InterPlanetary Name System), files can be given stable names, which can in turn be mapped to readable names, so you can easily find the file you want to view.
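The steps above can be sketched in a few dozen lines of Python: blocks are named by their SHA-256 hash, identical blocks are stored once, retrieved blocks are checked against their names, and a Kademlia-style XOR distance decides which nodes hold the record of who stores a given hash. This is a simplified model under stated assumptions, not the IPFS implementation: real IPFS uses CIDs, a Merkle DAG, and a full DHT, and this sketch does not model version history or IPNS.

```python
import hashlib

CHUNK_SIZE = 8  # bytes per block; tiny for illustration (real IPFS blocks are far larger)


def block_hash(data: bytes) -> str:
    """Content fingerprint: plain SHA-256 hex (real IPFS wraps hashes in CIDs)."""
    return hashlib.sha256(data).hexdigest()


class Node:
    def __init__(self, name: str):
        self.name = name
        self.node_id = int(block_hash(name.encode()), 16)
        self.blocks = {}     # hash -> bytes; identical blocks are stored only once
        self.providers = {}  # content hash -> names of nodes that hold it

    def add(self, data: bytes) -> str:
        """Split data into blocks, store each under its hash, return the root hash."""
        hashes = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = block_hash(chunk)
            self.blocks.setdefault(h, chunk)  # a duplicate block is not stored twice
            hashes.append(h)
        manifest = "\n".join(hashes).encode()  # stand-in for a Merkle DAG root node
        root = block_hash(manifest)
        self.blocks[root] = manifest
        return root

    def cat(self, root: str) -> bytes:
        """Reassemble content, checking every block against the hash that names it."""
        out = []
        for h in self.blocks[root].decode().split("\n"):
            if not h:
                continue
            block = self.blocks[h]
            if block_hash(block) != h:
                raise ValueError(f"block {h[:8]} failed verification")
            out.append(block)
        return b"".join(out)


def closest_nodes(key: str, nodes, k: int = 2):
    """Kademlia-style routing: the k nodes whose IDs are XOR-closest to the key."""
    key_int = int(key, 16)
    return sorted(nodes, key=lambda n: n.node_id ^ key_int)[:k]


def announce(root: str, holder: Node, nodes) -> None:
    """Publish a provider record ('holder stores root') on the closest nodes."""
    for n in closest_nodes(root, nodes):
        n.providers.setdefault(root, set()).add(holder.name)


def find_providers(root: str, nodes) -> set:
    """Find who stores the content, knowing only its hash."""
    found = set()
    for n in closest_nodes(root, nodes):
        found |= n.providers.get(root, set())
    return found
```

Adding the same bytes twice returns the same root hash, and a file made of repeated blocks occupies the space of one copy of each distinct block.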

The IPFS open-source protocol started in 2014 and has been running safely and stably for 6 years. Its GitHub repositories are very active, and developer participation is high. We will introduce the IPFS incentive mechanism (Filecoin) and the various Internet and blockchain applications built on IPFS in detail later.

You have said that the new strategy for data computing is to move computation to the storage, instead of moving data from its source to the CPU before computing on it. How should we understand this?

[ Dr. Yongzheng Jia ] : In traditional system architecture, storage and computation are usually separated: data is moved from where it is stored (memory, disks, and other storage devices) to the CPU, and only then is the computation performed. This design is not always optimal. As we said, AI is both data-intensive and compute-intensive, so narrowing the gap between computation and storage improves the efficiency of both.

For a long time, researchers have realized that traditional CPU-centric processing of large data sets is inefficient.

Therefore, to achieve higher performance and energy efficiency for data-intensive processing, many research initiatives have begun to explore a new storage and computing approach, near-data processing (NDP), which moves computation to the storage (that is, to the data source) instead of moving data from the data source to the CPU.

These studies argue that the spare computing resources inside storage devices can be used to run data processing tasks locally. With the continued development of solid-state drives (SSDs) and the rise of data-intensive applications, near-data processing (NDP) has in recent years attracted wide attention from researchers in storage, high-performance computing, and database systems, with promising results.
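A small simulation shows why moving the computation to the data can pay off. In this Python sketch, a hypothetical `StorageDevice` counts the payload bytes sent to the host, first when every record is shipped to the CPU for filtering and then when the filter runs on the device itself. Real NDP systems run code on an SSD’s embedded processors or similar hardware, which this toy does not model:

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int
    payload: bytes


class StorageDevice:
    """Simulated drive that counts how many payload bytes cross the 'bus' to the host."""

    def __init__(self, records):
        self.records = records
        self.bytes_transferred = 0

    def read_all(self):
        # Conventional path: ship every record to the host CPU, which filters afterwards.
        for r in self.records:
            self.bytes_transferred += len(r.payload)
            yield r

    def scan_with_filter(self, predicate):
        # Near-data path: evaluate the predicate on the device and ship only matches.
        for r in self.records:
            if predicate(r):
                self.bytes_transferred += len(r.payload)
                yield r


if __name__ == "__main__":
    records = [Record(i, bytes(100)) for i in range(1000)]
    wanted = lambda r: r.key % 100 == 0  # keeps 1% of the records

    host = StorageDevice(records)
    on_host = [r for r in host.read_all() if wanted(r)]

    ndp = StorageDevice(records)
    on_device = list(ndp.scan_with_filter(wanted))

    print(len(on_host), host.bytes_transferred)  # same results, all data moved
    print(len(on_device), ndp.bytes_transferred)  # same results, 1% of data moved
```

With 1,000 records of 100 bytes and a filter that keeps 1% of them, the conventional path moves 100,000 bytes while the near-data path moves 1,000, for the same 10 results.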

Beyond the near-data processing described above, I believe decentralized storage and blockchain technology give us more ways to integrate storage and computing. Nodes responsible for storage in IPFS may also perform on-chain computation in the future. Likewise, the miners that perform training on Project PAI’s PoUW blockchain, or the verifiers that check the training results, could also store data and AI models. This idea is consistent with near-data processing. Combining decentralized storage and computing will open up more application scenarios and, especially for data-intensive applications such as AI and big data science, will greatly improve the efficiency of computing and storage.

How will the rapid development of 5G and AI promote distributed storage technology?

[ Dr. Yongzheng Jia ] : A main goal of 5G networks is to keep end users connected. Whereas 4G LTE service delivers transmission rates of only about 75 Mbps, 5G has reached 1 Gbps in the 28 GHz band. As a result, 5G can greatly improve the efficiency of point-to-point transmission in the network, increasing bandwidth and reducing latency.
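The gap between the two rates quoted above is easiest to see as transfer time. This Python sketch computes the ideal time to move a file, assuming the full quoted rate is sustained and ignoring latency and protocol overhead:

```python
def transfer_seconds(size_bytes: float, rate_mbps: float) -> float:
    """Ideal transfer time: bits to send divided by bits per second (1 Mbps = 10**6 bit/s)."""
    return size_bytes * 8 / (rate_mbps * 1e6)


if __name__ == "__main__":
    one_gb = 1e9  # bytes
    for label, rate in [("4G LTE, 75 Mbps", 75), ("5G, 1 Gbps", 1000)]:
        print(f"{label}: {transfer_seconds(one_gb, rate):.1f} s")
```

For a 1 GB file (10^9 bytes), that is roughly 107 seconds at 75 Mbps versus 8 seconds at 1 Gbps.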

The very large capacity of 5G networks can connect hundreds of billions of devices, which improves the efficiency of storing, transmitting, and sharing all kinds of data and files.

At the same time, 5G also raises the level of coordination and intelligence in the system, through multi-user, multi-point, multi-antenna cooperative networking and flexible automatic adjustment between networks. This opens up more possibilities for flexible decentralized storage architectures.

As we mentioned earlier, AI has created huge demand for storage technology (including distributed storage) and has also driven the integration of storage and computing. In turn, AI algorithms can be used to schedule distributed storage resources intelligently and better match supply and demand in the storage market. This improves market efficiency and makes distributed storage more intelligent, and it is a hot research topic in distributed storage and cloud computing.

From a technical point of view, what industry opportunities will the trends in artificial intelligence and distributed storage bring, for example in terms of startup tracks or project cases?

[ Dr. Yongzheng Jia ] : First of all, decentralized storage is a big market, and IPFS provides us with a lot of imagination. Building richer Internet applications and blockchain applications based on IPFS is a good choice for entrepreneurs.

The near-data processing and intelligent storage we just mentioned are both good research directions for combining AI and storage.

A decentralized autonomous organization (DAO) based on decentralized storage is also a good direction. Decentralized storage creates greater capacity and more possibilities for DAO.

Among the many IPFS applications, finance is a good direction, especially decentralized finance (DeFi), which is very hot right now.

In addition, we just mentioned the PoUW of Project PAI, which integrates AI and computing on the chain very well. Combined with decentralized storage technology, the computing and storage of AI and blockchain can be integrated into a unified solution, applied to various real-life scenarios.

Do you think the Filecoin protocol will make distributed storage a large-scale blockchain application project?

[ Dr. Yongzheng Jia ] : As an open source decentralized storage protocol, IPFS was born in 2014, and the network runs smoothly.

Filecoin’s mechanism design adds an incentive mechanism to this decentralized storage protocol. As with Bitcoin, the incentive mechanism in cryptoeconomics is a model for coordinating production relationships among nodes in a decentralized (trustless) system, and a well-designed mechanism keeps the system running in the intended direction.

Filecoin is an incentive mechanism and public blockchain built on the IPFS protocol. The IPFS protocol defines how files are stored, retrieved, and transmitted in a distributed system, and it can store and share files permanently and in a decentralized way. It is a content-addressed, peer-to-peer distributed protocol. FIL is the token issued by Filecoin, used to incentivize the various roles in the Filecoin network’s storage and retrieval markets.

Filecoin uses a hybrid consensus mechanism: it is based on Expected Consensus (EC), which works somewhat like a hybrid of PoW and PoS, supplemented by Proof of Replication (PoRep) and Proof of Spacetime (PoSt).

In Expected Consensus, the probability that a miner wins the election is directly proportional to the miner’s current storage capacity. Miners prove their storage capacity with Proof of Spacetime (PoSt) and Proof of Replication (PoRep). Proof of Spacetime uses a chain of proofs and timestamps to show that a miner has stored data for a certain period of time. Even if the verifier is offline, it can later check that the miner generated the proof chain during that period, which effectively prevents generation attacks, in which a miner produces the data on the fly only when challenged instead of actually storing it.
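The election rule described above, where the chance of winning is proportional to storage power, can be simulated with a toy Python sketch. It is simplified under stated assumptions: Filecoin’s real Expected Consensus uses VRF-based election tickets and a Poisson-distributed win count, none of which is modeled here; the hash-based draw below only reproduces the proportionality, and the function names are hypothetical.

```python
import hashlib


def wins_election(randomness: bytes, miner_id: str, power_share: float,
                  expected_leaders: float = 1.0) -> bool:
    """Toy leader election: win if a hash-derived uniform draw falls below the miner's share."""
    digest = hashlib.sha256(randomness + miner_id.encode()).digest()
    draw = int.from_bytes(digest, "big") / 2**256  # approximately uniform in [0, 1)
    return draw < min(1.0, expected_leaders * power_share)


def simulate(powers: dict, rounds: int) -> dict:
    """Fraction of rounds each miner wins, given each miner's storage power."""
    total = sum(powers.values())
    wins = {m: 0 for m in powers}
    for r in range(rounds):
        randomness = r.to_bytes(8, "big")  # stand-in for the chain's per-round randomness
        for miner, p in powers.items():
            if wins_election(randomness, miner, p / total):
                wins[miner] += 1
    return {m: w / rounds for m, w in wins.items()}


if __name__ == "__main__":
    # Storage power in arbitrary units; win rates should come out near 0.5 / 0.3 / 0.2.
    print(simulate({"a": 50, "b": 30, "c": 20}, 20000))
```

In a given round, zero, one, or several miners can win; only the expected number of winners per round is fixed, which matches the idea behind the name Expected Consensus.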

The Filecoin economic model has two major markets, the storage market and the retrieval market, each with customers and miners. Beyond these two roles, the Filecoin ecosystem also includes developers and investors. Developers build new tools and applications and put forward proof-of-concept proposals that improve the Filecoin ecosystem and its storage and retrieval markets; developers who submit proposals can receive grants from the foundation. Investors add value to the Filecoin network by providing liquidity on the secondary market.

At present, there are various types of applications in the IPFS ecosystem. You can refer to the following picture:

The above figure shows various applications built on IPFS. IPFS is already widely used in fields such as data storage, social media, browsers, finance, content, NFTs, governance, and exchanges. We believe IPFS will find even more application scenarios in the future and bring us more surprises.

The following Zhihu link introduces some of the above applications; everyone is welcome to check it out:

Featured questions

(1) How far along is PAI’s storage technology? Compared with other systems, what advantages does the PAI blockchain storage system have? Can IPFS be used as its underlying layer?

Dr. Yongzheng Jia : Project PAI’s storage technology is based on the PAI data storage protocol PDP-2, and it can already provide technical services. Some time ago it also partnered with Uncle Saba’s on food traceability and supply chain management. In addition, PAI PASS is an identity authentication and data permission control system built on the PAI storage protocol, with more data sharing scenarios to come. Project PAI’s storage system is an application service protocol built on top of an underlying storage layer, which can be a decentralized network such as IPFS or BitTorrent.

(2) A comment for Dr. Jia: Dr. Donglin Wang, a technical expert at the BTRAC Global Digital Network Advanced Think Tank, pointed out that distributed storage in the storage industry is a form of centralized storage, whereas distributed storage in the blockchain industry means decentralized storage, so the two are not the same thing. He also noted that IPFS does not encrypt or fragment files; those features were added by Chinese miners, and claims otherwise are false propaganda. What do you think?

Dr. Yongzheng Jia : Distributed storage and decentralized storage are two different concepts, and they are not distinguished by industry. Decentralized storage is necessarily distributed. The storage industry and the blockchain industry are also independent of each other, and the blockchain needs storage infrastructure to support it. As for the claim that IPFS encrypts and fragments files: those features were added by Chinese miners and are false propaganda, and the original protocol indeed does not include them. Thank you, Dr. Wang, for the correction!

(3) Storage will be a hard requirement across many industries. Could it set off a new trend like DeFi?

Dr. Yongzheng Jia : I am quite optimistic about the storage industry. Both modern storage technology and IPFS have great potential. DeFi has also made a significant contribution to the blockchain industry.

(4) The Filecoin testnet launch was postponed again. Why is it so difficult?

Dr. Yongzheng Jia : I think Filecoin’s mechanism design is very complicated, so some problems are bound to show up in system testing. Personally, I have reservations about Filecoin’s current technical route, but IPFS does need a system-level incentive mechanism.

The article only represents the author’s opinion and does not constitute any investment advice or suggestion. Investment is risky, so be cautious when entering the market.

Project PAI is a public blockchain that supports artificial intelligence technology and applications. It is committed to promoting the integration of, and innovation in, artificial intelligence and blockchain technology, providing blockchain infrastructure for AI applications and providing AI technical support and toolkits that make blockchain systems more intelligent.