Fault Detection and Diagnosis Software of LHAASO
H. Zhang, M. Gu and S. Fan, "Fault Detection and Diagnosis Software of LHAASO," in IEEE Transactions on Nuclear Science, doi: 10.1109/TNS.2024.3454806.
======
Mainly responsible for Java Spring Cloud development, writing the RESTful services consumed by the front end.
Developed the member management system for scientific experiment collaborations. The system uses Next.js for the front end and Spring Boot for the back end, and I am responsible for writing all of the front-end and back-end code. It is currently used by three large-scale international collaborations, with more than 3,000 users.
Developed a fault detection and diagnosis system for large-scale scientific experiments. The system is written in Python and provides users with a common monitoring tool, unified fault monitoring, and DAG-based root cause analysis of faults. I am responsible for all aspects of the system. It has been used in LHAASO and achieved good results.
Dedicated to improving the reliability and stability of large-scale online distributed data systems. Modified MinIO so that it runs in memory and manages distributed memory. It currently manages a cluster of 50 computers with over 25 TB of memory, sustains more than 10 GB/s of mixed reads and writes, and runs stably.
Deploying and fine-tuning ChatGLM3-6B to explore the use of large language models to improve the user experience of high-energy physics online systems.
Talk at Sichuan, Sichuan, China
Mandarin (Native), English (Fluent)