Xiaohongshu Completes Industry's Largest Data Lake Migration to Cloud with Zero Downtime, Moving 500PB of Data in Just One Year

On November 6th, after a year of intensive work, Xiaohongshu successfully migrated its industry-leading data lake to Alibaba Cloud with zero downtime. According to statistics, the project involved 1,500 participants and moved 500PB of data. As one of China's leading internet companies, Xiaohongshu boasts over 300 million monthly active users. Its data lake stores all raw data from the past 11 years, including structured, semi-structured, and unstructured data.

In recent years, rapid business growth has sharply increased Xiaohongshu's demand for online data processing. At the same time, legacy problems accumulated in offline processing meant that any future switch would carry significant cost and schedule risk. For this reason, Xiaohongshu launched a cloud migration project in November 2023, aiming to move its data lake to Alibaba Cloud within a year.

After the migration to Alibaba Cloud, the data lake can place multiple OSS Buckets in a unified resource pool, so that the Buckets share the pool's OSS throughput and QPS capacity. This flow-control capability allocates resources and uses throughput efficiently across Xiaohongshu's complex business scenarios, while minimizing interference between different business tenants.
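
The idea of several Buckets drawing on one shared budget can be illustrated with a small sketch. It is not how OSS implements resource pools; it is a generic max-min fair allocation, and the function name `allocate`, the tenant names, and the numbers are all hypothetical. It shows how a pool-wide QPS limit can be divided so that a heavy tenant cannot starve a light one.

```python
def allocate(pool_capacity, demands, caps=None):
    """Split a shared QPS budget across buckets using max-min fairness.

    pool_capacity: total QPS available to the whole pool (hypothetical figure).
    demands:       mapping of bucket name -> requested QPS.
    caps:          optional mapping of bucket name -> per-bucket QPS ceiling.
    """
    caps = caps or {}
    # A bucket never receives more than it asks for or more than its cap.
    want = {b: min(d, caps.get(b, float("inf"))) for b, d in demands.items()}
    grant = {b: 0.0 for b in want}
    remaining = float(pool_capacity)
    active = {b for b, w in want.items() if w > 0}

    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Buckets whose outstanding need fits inside an equal share.
        satisfied = {b for b in active if want[b] - grant[b] <= share}
        if not satisfied:
            # Everyone wants more than an equal share: split evenly and stop.
            for b in active:
                grant[b] += share
            remaining = 0.0
            break
        for b in satisfied:
            need = want[b] - grant[b]
            grant[b] += need
            remaining -= need
        active -= satisfied
    return grant


if __name__ == "__main__":
    # Three hypothetical tenants sharing a pool of 100 QPS.
    print(allocate(100, {"ads": 20, "search": 70, "etl": 200}))
```

In this example the small tenant receives everything it asked for, and the two larger tenants split what is left evenly, which is one simple way to limit interference between tenants that share a pool.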

Alibaba Cloud's native HDFS + DLA metadata integrates seamlessly with the Hadoop EMR system and supports linear scaling of metadata, so it can readily keep pace with the linear growth of Xiaohongshu's hundreds of PBs of data.

The volume of data migrated is far larger than in previous industry-leading cases. Xiaohongshu's cloud migration project went through three phases:

  • Phase 1: The project team first settled the outstanding questions about standards, then carried out governance based on those standards.
  • Phase 2: Once governance was complete, the project formally entered the dual-run phase in May 2024. Data was copied to Alibaba Cloud and both environments ran in parallel so that correctness and timeliness could be verified.
  • Phase 3: In August 2024, the project concluded the dual-run phase and moved into the cutover stage. The Alibaba Cloud team provided on-site support throughout, and the cutover was completed successfully.
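
The article does not describe how correctness and timeliness were checked during the dual-run phase. As a rough illustration only, a check of this kind might compare an order-independent fingerprint of each task's output from both environments and confirm that the new environment finished before a deadline. The names `RunResult`, `fingerprint`, and `verify` and the sample data are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RunResult:
    rows: list          # output rows of one task, as tuples
    finished_at: float  # completion time, in epoch seconds


def fingerprint(rows):
    """Return (row count, digest) that does not depend on row order.

    Each row is hashed separately and the digests are summed modulo 2**256,
    so reordered outputs match while duplicated or changed rows do not.
    """
    total = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
    return len(rows), total


def verify(old, new, deadline):
    """Compare the new environment's run against the old one."""
    issues = []
    if fingerprint(old.rows) != fingerprint(new.rows):
        issues.append("content mismatch")
    if new.finished_at > deadline:
        issues.append("late")
    return issues


if __name__ == "__main__":
    old = RunResult(rows=[(1, "a"), (2, "b")], finished_at=100.0)
    new = RunResult(rows=[(2, "b"), (1, "a")], finished_at=90.0)
    print(verify(old, new, deadline=120.0))  # an empty list means the run passed
```

Summing per-row digests is a deliberate simplification. A real pipeline would more likely compute such checks inside the query engine and would need to account for floating-point and type differences between the two environments.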

In November 2024, Xiaohongshu's cloud migration project was officially declared complete. The project moved 500PB of data with zero downtime, completed 110,000 tasks, and involved 1,500 participants across more than 40 departments.

Xiaohongshu's successful completion of the industry's largest data lake migration project not only demonstrates its strong technical prowess but also highlights its commitment to data security and efficiency. It also offers a successful case study for other internet companies. We believe that Xiaohongshu will continue to invest in the data field, providing users with even better services in the future.

