Wenqi Lou (娄文启)
About me
I was born in Xinxiang, Henan Province. I received my Bachelor's degree from the School of Computer Science at Northwestern Polytechnical University (NWPU), Xi'an, in June 2018.
In the same year, I was admitted without the entrance examination to the M.Sc. program in the School of Computer Science and Technology, University of Science and Technology of China (USTC). In September 2020, I began my Ph.D. studies under the supervision of Professor Xuehai Zhou and Professor Chao Wang, and I received my Ph.D. in Computer Science from USTC in December 2023.
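Wenqi Lou is currently a Specially Appointed Associate Researcher and master's student supervisor at the School of Software Engineering, USTC. He received his bachelor's degree from the School of Computer Science, Northwestern Polytechnical University, in June 2018, and his Ph.D. in Computer Architecture from the University of Science and Technology of China in December 2023, supervised by Professor Xuehai Zhou and Professor Chao Wang. After graduating, he stayed on as a faculty member in the same group.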
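His main research interests include intelligent accelerator architecture, FPGA accelerator design, and hardware/software co-optimization, with the goal of easing the deployment burden of deep learning models from both the algorithm and the hardware side.
In recent years, he has published more than 20 papers in computer architecture, in top journals and conferences including IEEE TCAD, TC, DAC, FPGA, and DATE. Of these, 10 are CCF-A/B papers on which he is the first or corresponding author. He also holds 4 granted patents.
He leads a National Natural Science Foundation of China (NSFC) Young Scientists Fund project, a Natural Science Foundation of Jiangsu Province Youth project, university-level projects, and an industry collaboration project with AISpeech (思必驰). He serves as a reviewer for the Chinese Journal of Computers (《计算机学报》), IEEE TCAD, TVLSI, TCBB, and other venues.
📢: I am recruiting students with a solid foundation in mathematics and science who are interested in hardware acceleration and model inference optimization (recommended-admission, joint-training, or engineering-practice tracks). Students will be co-supervised by me and Professor Chao Wang. Please contact me by email and attach your CV.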
Research Interests
- FPGA Accelerator Design: Specialized in designing FPGA accelerators for Convolutional Neural Networks (CNN), Vision Transformers (ViT), and Large Language Models (LLM) to enhance performance and efficiency.
- Neural Architecture and Accelerator Co-Search: Focused on the co-evolution of neural network architectures and hardware accelerators to achieve optimized performance on specific hardware platforms.
- Model Quantization and Pruning for FPGA/GPU Inference: Expertise in reducing model complexity and size through quantization and pruning techniques, specifically tailored for efficient inference on FPGA and GPU architectures.
- AI for HW Design: Utilizing artificial intelligence to revolutionize the hardware design process, including automated design exploration, optimization, and verification.
Publications (* corresponding author, # equal contribution)
- Jiale Dong, Wenqi Lou*, Hao Wu, Zhendong Zheng, Yunji Qin, Lei Gong, Chao Wang*, Xuehai Zhou.
"MoE-Sched: Enabling Efficient FPGA Deployment of Mixture-of-Experts Vision Transformers via Coordinated Scheduling".
IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2025:1-14.
(CCF-B, JCR-Q1)
- Wei Fu#, Wenqi Lou#*, Cheng Tang, Hongbing Wen, Yunji Qin, Lei Gong, Chao Wang*, Xuehai Zhou.
"UniCoS: A Unified Neural and Accelerator Co-Search Framework for CNNs and ViTs".
ACM/IEEE Design Automation Conference (DAC), San Francisco, Jun. 22-25, 2025:1-6.
(CCF-A, Top Conference in EDA Area)
- Haoyu Cai, Wenqi Lou*, Chao Wang*, Xuehai Zhou.
"Picasso: Analyzing Prompt Design for Text-To-Image Generative Diffusion Models from a Temporal-Spatial Perspective".
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2025:1-24.
(CCF-B, JCR-Q1)
- Hongbing Wen, Zihao Wang, Jiale Dong, Wenqi Lou*, Chao Wang, Xuehai Zhou.
"QLlama: An FPGA-Based Microscaling Quantization Accelerator for Energy-Efficient Llama2 Inference".
IEEE Embedded Systems Letters (ESL), September 28-October 3, 2025, Taipei, China.
(CCF-B, CODES+ISSS 2025, Late Breaking Tracks)
- Cheng Tang, Wenqi Lou*, Qianyu Cheng, Jiayi Tuo, Wei Fu, Tianhao Jiang, Chao Wang, Xuehai Zhou.
"Spectral Enhanced Tuning: A Plug-and-Play Framework for Dehazing Models with Frequency Decoding and Fusion".
IEEE International Conference on Multimedia & Expo (ICME) 2025, Nantes, Jun. 30 to July 4, 2025.
(CCF-B)
- Jiale Dong#, Hao Wu#, Zihao Wang, Wenqi Lou*, Zhendong Zheng, Lei Gong, Chao Wang, Xuehai Zhou.
"CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA".
31st International European Conference on Parallel and Distributed Computing (EURO-PAR), Dresden, Germany, Aug. 25 to Aug. 29, 2025:1-14.
(CCF-B)
- Wenqi Lou, Yunji Qin, Zihao Wang, Chao Wang*, Lei Gong, Xuehai Zhou.
"Automated FPGA Accelerator Generation Framework for Transformers with Dataflow Optimization".
54th International Conference on Parallel Processing (ICPP), San Diego, Sept. 8-11, 2025:1-12.
(CCF-B, Leading Conference in Parallel Computing, Accepted)
- Wenqi Lou, Yunji Qin, Xuan Wang, Lei Gong, Chao Wang, Xuehai Zhou.
"FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs".
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE TCAD), 2024, 43(11):3852-3863.
(CCF-A; accepted to CODES+ISSS 2024)
- Wenqi Lou, Lei Gong, Chao Wang, Jiaming Qian, Xuan Wang, Changlong Li, Xuehai Zhou.
"Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search".
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE TCAD), 2024, 43(10):3041-3054.
(CCF-A)
- Wenqi Lou, Lei Gong, Chao Wang, Zidong Du, Xuehai Zhou.
"OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm".
IEEE Transactions on Computers (IEEE TC), 2022, 71(8): 1847-1859.
(CCF-A)
- Wenqi Lou, Jiaming Qian, Lei Gong, Xuan Wang, Chao Wang, Xuehai Zhou.
"NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA".
Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2023:1-6.
(CCF-B, Flagship Conference in EDA Area)
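- Wenqi Lou, Chao Wang, Lei Gong, Xuehai Zhou.
"A Neural Network Instruction Set Extension and Code Mapping Mechanism" (一种神经网络指令集扩展与代码映射机制).
Journal of Software (软件学报), 2020:3074-3086.
(CCF-T1, Chinese Journal)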
- Wenqi Lou, Chao Wang, Lei Gong, Xuehai Zhou.
"OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm".
IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2020:410-411.
(CCF-B, Poster)
- Wenqi Lou, Chao Wang, Lei Gong, Xuehai Zhou.
"RV-CNN: Flexible and efficient instruction set for CNNs based on RISC-V processors".
Advanced Parallel Processing Technologies: 13th International Symposium (APPT), 2019.
(EI Index; organized by the CCF Technical Committee on Computer Architecture)
- Wenqi Lou, Chao Wang, Lei Gong, Xuehai Zhou.
"Neural Network Instruction Set Extension and Code Mapping Mechanism".
International Journal of Software and Informatics (IJSI), 2020. (EI Index)
- Yunji Qin, Wenqi Lou*, Chao Wang*, Lei Gong, Xuehai Zhou.
"Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion".
Proceedings of the 2024 ACM Great Lakes Symposium on VLSI (GLVLSI). 2024:599-603. (CCF-C)
- Wei Fu, Wenqi Lou*, Yunji Qin, Lei Gong, Chao Wang, Xuehai Zhou.
"MFNAS: Multi-Fidelity Exploration in Neural Architecture Search with Stable Zero-shot Proxy".
Pacific Rim International Conference on Artificial Intelligence (PRICAI). Springer, 2024:348-360. (CCF-C)
- Wei Fu, Wenqi Lou*, Lei Gong, Chao Wang, Xuehai Zhou.
"Beyond Training: A Zero-Shot Framework to Neural Architecture and Accelerator Co-Exploration".
IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops). IEEE, 2024. (CCF-B)
- Cheng Tang, Wenqi Lou*.
"TCL-Net: A Lightweight and Efficient Dehazing Network with Frequency-Domain Fusion and Multi-Angle Attention".
Asian Conference on Computer Vision (ACCV). Springer, 2024. (CCF-C, Oral, 5.6%)
- Yixuan Zhu, Wenqi Lou, Yinkang Gao, Binze Jiang, Xiaohang Gong, Xi Li.
"Fine-Grained Shared Cache Interference Analysis using Basic Block's Execution Time".
IEEE International Conference on Computer Design (ICCD), 2024.
(CCF-B)
- Xuan Wang, Lei Gong, Jing Cao, Wenqi Lou, Weiya Wang, Chao Wang, Xuehai Zhou.
"hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA".
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA). 2023. (CCF-B, Top Conference in FPGA Area)
- Xiangjun Qu, Lei Gong, Wenqi Lou, Qianyu Cheng, Xianglan Chen, Chao Wang, Xuehai Zhou.
"AutoSparse: A Source-to-Source Format and Schedule Auto-Tuning Framework for Sparse Tensor Program".
IEEE International Conference on Computer Design (ICCD), 2024.
(CCF-B)
- Zihan Wang, Lei Gong, Wenqi Lou, Qianyu Cheng, Xianglan Chen, Chao Wang, Xuehai Zhou.
"UniCoMo: A Unified Learning-Based Cost Model for Tensorized Program Tuning".
IEEE International Conference on Computer Design (ICCD), 2024.
(CCF-B)
- Jiale Dong, Wenqi Lou*, Zhendong Zheng, Yunji Qin, Lei Gong, Chao Wang, Xuehai Zhou.
"UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA".
2025 IEEE International Symposium on Circuits and Systems (ISCAS), London, May 25-28, 2025:1-5.
(CCF-C, Oral)
Activities
- Reviewer for Chinese Journal of Computers (《计算机学报》)
- Reviewer for IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
- Reviewer for IEEE Transactions on Very Large Scale Integration Systems (TVLSI)
- Reviewer for IEEE Transactions on Biomedical Circuits and Systems (TBioCAS)
- Reviewer for IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
- Reviewer for The Journal of Supercomputing (CCF-C)
- Reviewer for Computers & Electrical Engineering (JCR-Q1)
- Reviewer for Scientific Reports (JCR-Q1)
- Reviewer for Journal of Cryptographic Engineering (JCR-Q2)
- Reviewer for Microprocessors and Microsystems (SCIE)
- Reviewer for International Journal of Electronics (SCIE)
- Reviewer for IET Computers & Digital Techniques (EI)
- Host of the Programming Practical Training Case Forum, 2025 Programming Education Conference of China (全国高校程序设计教育大会)
Awards
- 2025 Jiangsu Young Science and Technology Talent Support Program (江苏省青年科技人才托举工程)
- 2025 Programming Education Conference of China, grand prize
- Intel Fellowship 2022
- USTC-Gusu First Class Scholarship 2021
- Outstanding Graduate of NWPU 2018
Projects
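- "Research and Development of Optimized Deployment Techniques for Speech Models on General-Purpose Chips" (通用芯片语音模型优化部署技术研发), industry collaboration project with AISpeech Co., Ltd. (思必驰科技有限公司), 2025-04 to 2026-06, Principal Investigator, ongoing.
- "Research and Practice of Teaching Reform for the Practical Algorithms Course" (实用算法课程教学改革研究与实践), Anhui Province New Era Education Quality Project (Teaching Reform Research), 2025-04 to 2026-06, Principal Investigator, ongoing.
- "New Principles, Architectures, and Methods for Reconfigurable Neural Network Accelerators Based on Frequency-Domain Filtering Convolution" (基于频域滤波卷积的神经网络可重构加速器新原理、新结构与新方法), NSFC General Program, 2022-01 to 2025-12-31, Key Technical Member, ongoing.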
Teaching
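- 2024: Practical Algorithm Design (《实用算法设计》), core professional course, School of Software Engineering, USTC; Lecturer, 2024.03 and 2024.09
- 2025: Advanced Computer Architecture (《高级计算机体系结构》), professional elective course, School of Software Engineering, USTC; Lecturer, 2025.02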
Patent
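- "Training Method and Device for a Performance Prediction Model, and Performance Prediction Method and Device" (性能预测模型的训练方法及装置、性能预测方法及装置); Lei Gong, Chao Wang, Xuehai Zhou, Teng Wang, Wenqi Lou, Xi Li, Xianglan Chen; invention patent, granted
- "Graph Data Processing Method and Device" (图数据处理方法及装置); Lei Gong, Chao Wang, Xuehai Zhou, Teng Wang, Wenqi Lou, Xi Li, Xianglan Chen; invention patent, granted
- "Joint Search Method, Apparatus, Device, and Medium for CNN Models and Accelerators" (CNN模型与加速器的联合搜索方法、装置、设备及介质); Wenqi Lou; invention patent, published
- "Joint Search Method and Device for Neural Networks and Hardware" (神经网络和硬件的联合搜索方法及装置); Wenqi Lou, Chao Wang, Wei Fu, Cheng Tang, Lei Gong, Teng Wang, Xuehai Zhou; invention patent, granted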