My research revolves around Multimodal Interaction and Modeling, and my internship projects involve LLMs and Image Generation.
I am currently seeking positions related to multimodal algorithms and AIGC.
I am pursuing my master’s degree at the University of Science and Technology of China under the supervision of Associate Professor Jun Yu. My corporate mentors are Peng Chang, head of the multimodal group at the Silicon Valley Research Institute of Ping An Technology in the United States, and Iek-Heng Chu. I completed my undergraduate studies at Guangzhou University, supervised by Professor Jin Li, executive dean of the Institute of Artificial Intelligence, and Associate Professor Xianmin Wang. To date, I have contributed to more than 10 published papers.
During my undergraduate and postgraduate years, I frequently took part in algorithm competitions: more than 20 AI algorithm competitions in total, which gave me extensive competition experience and strategies. I was a member of the Alibaba Security Student Expert Group and am ranked 7th in the Alibaba Security Challenger Program.
My research interests include:
- Multimodal Interaction and Modeling (CV/NLP)
- AIGC
- Fine-grained Image Recognition
- Robust Machine Learning
My industry focus areas include:
- Large Language Models
- Exploratory Data Analysis (EDA)
- Data Mining
- Style Transfer (Autoencoder, GAN, Diffusion)
- Object Detection
📝 Published Papers
Dialogue Cross-Enhanced Central Engagement Attention Model for Real-Time Engagement Estimation
Jun Yu, Keda Lu, Ji Zhao et al. (First student author)
- Propose a center-based sliding window that avoids the repeated inference of standard sliding windows, improving inference efficiency by 100%.
- Propose a central engagement attention model based on self-attention (SA) that surpasses the previous SOTA BiLSTM model while improving inference efficiency by 300%.
- Propose a cross-enhanced module based on cross-attention (CA) that integrates seamlessly with the central engagement attention model, establishing a new SOTA result.
MvAV-pix2pixHD: Multi-view Aerial View Image Translation
Jun Yu, Keda Lu, Shenshen Du et al. (First student author)
- Design two sampling strategies: time-priority sampling and random sampling.
- Propose MvAV-pix2pixHD for multi-view aerial view image translation, trained with three loss functions.
- This method won 1st and 2nd place in the two multi-view image translation tasks of the MAVIC-T competition.
ACM-MM 2023 (CCF-A)
Sliding Window Seq2seq Modeling for Engagement Estimation
Jun Yu, Keda Lu, Mohan Jing et al. (First student author)
TOMM 2024, under review (CCF-B)
Exploring Seq2seq Models for Engagement Estimation in Dyadic Conversations
Jun Yu, Keda Lu, Lei Wang et al. (First student author)
- Design multiple Seq2seq models based on Transformer and BiLSTM.
- Propose a sliding-window approach to address the significant loss of context.
- Propose Ai-BiLSTM, which aligns the multimodal features of the dialogue participants and lets them interact, further enhancing performance.
- This method won the championship🏆 at ACM-MM 2023.
A Comprehensive and Unified Out-of-Distribution Classification Solution Framework
Jun Yu, Keda Lu, Yifan Wang et al. (First student author)
- Propose semantic masking to enhance model robustness.
- Propose OOD-DAS, a comprehensive collection of data augmentations.
- Propose OOD-Attention, which integrates seamlessly with SOTA classification models to improve model robustness.
- Propose an iterative pseudo-labeling method that ensembles models of multiple architectures, further improving OOD recognition accuracy.
- This method won the championship🏆 at ICCV 2023.
ACM-MM 2024
End-to-end Spatio-Temporal Information Aggregation For Micro-Action Detection
Jun Yu, Mohan Jing, Gongpeng Zhao, Keda Lu et al.
ACM-MM 2024
Building Robust Video-Level Deepfake Detection via Audio-Visual Local-Global Interactions
Yifan Wang, Xuecheng Wu, Jia Zhang, Mohan Jing, Keda Lu et al.
ACM-MM 2023
Answer-Based Entity Extraction and Alignment for Visual Text Question Answering
Jun Yu, Mohan Jing, Weihao Liu, Tongxu Luo, Bingyuan Zhang, Keda Lu et al.
CLEF 2022
Bag of Tricks and a Strong Baseline for FGVC
Jun Yu, Hao Chang, Keda Lu et al.
CLEF 2022
Efficient Model Integration for Snake Classification
Jun Yu, Hao Chang, Zhongpeng Cai, Guochen Xie, Liwen Zhang, Keda Lu et al.
CVPR 2022 workshop
Pseudo-label generation and various data augmentation for semi-supervised hyperspectral object detection
Jun Yu, Liwen Zhang, Shenshen Du, Hao Chang, Keda Lu et al.
AAAI 2022 workshop
Mining limited data for more robust and generalized ML models
Jun Yu, Hao Chang, Keda Lu et al.
International Journal of Machine Learning and Cybernetics
Generating transferable adversarial examples based on perceptually-aligned perturbation
Hongqiao Chen, Keda Lu, Xianmin Wang et al.
💻 Projects
- 2024.03 - now Multimodal Large Language Models
- 2023.10 - 2024.02 Loan Customer Repayment Intention Recognition
- Conduct EDA on a dataset with millions of records and tens of millions of call texts.
- Build an EDA -> data cleaning -> feature engineering pipeline, and use BERT for text modeling to identify customers’ repayment intentions.
- Explore LLM-based data augmentation of call texts to enhance model robustness.
- 2023.05 - 2023.09 Vertical-Domain Chat Assistant (training corpus construction and fine-tuning based on ChatGLM, Bloomz, Qwen, etc.)
- 2023.03 - 2023.06 OCR Large Model Showcase Platform
- Use Gradio to build the complete OCR large model showcase interface, incorporating DocQA, MLLM, and pure OCR modules.
- Independently maintain the platform for internal analysis and debugging as well as external business showcases.
- This project was awarded the 2023 H1 XXX·Enterprise Excellence Award - Technical Advancement.
- Responsible for the DocQA module.
- 2023.01 - 2023.03 Arbitrary-Style Chinese Font Generation (GAN, Diffusion model)
- Explore Chinese font generation algorithms, including DG-Font and Diff-Font.
- Collect a dataset of fonts in 400 different styles.
- Design an end-to-end font generation model based on a diffusion model (DDPM), which slightly outperforms Diff-Font and DG-Font on metrics such as SSIM and LPIPS.
- Future Improvements: End-to-end, Contrastive learning, Diffusion model.
- 2022.11 - 2023.01 Document generation and style transfer (Independent research)
- Explore diffusion models and GANs for end-to-end document generation.
- Survey five years of style transfer papers from top conferences, following the progression CNN -> Attention -> Transformer, including AdaIN (ICCV 2017), MetaNet (CVPR 2018), SANet (CVPR 2019), MAST (ACM-MM 2020), StyleFormer (ICCV 2021), AdaAttN (ICCV 2021), and StyTr2 (CVPR 2022).
- Reproduce StyTr2 (CVPR 2022) and AdaAttN (ICCV 2021) and adapt them to document generation for data augmentation.
- Future improvements: Contrastive learning, GAN, Diffusion model
- 2022.06 - 2022.12 Reproducing mainstream algorithms with the MindSpore framework
- Participate in reproducing the RetinaFace face detection algorithm.
- Independently reproduce the FCENet text detection algorithm.
- 2020.12 - 2021.01 Genetic Algorithm-based Intelligent Timetabling - Course Management System (Individually implemented)
- Use a sqlite3 database and Bootstrap-Flask for visualization, with distinct client interfaces for students, teachers, and educational directors.
- Propose an intelligent timetabling algorithm with a novel optimization objective function based on course variance, optimized with a genetic algorithm.
- The project comprises over 2,000 lines of Python and 1,000 lines of HTML, and has been shared openly on my personal blog and GitHub.
- 2019.04 - 2019.06 Student performance management system based on MFC (C++) (Individually implemented).
- Includes all basic CRUD functions (Create, Read, Update, Delete), as well as import, save, and sorting operations.
- The design was primarily inspired by QQ’s large-login-button interface, aiming for a clear and clean user experience.
- The project comprises over 10,000 lines of C++ code and has been open-sourced on my personal blog and GitHub.
🏅 Competitions
Master’s phase (Main contributor)
- 2024.07 ACM-MM 2024: Grand challenge, Multi-Domain Engagement Estimation (Solo, Champion🏆) [LeaderBoard]
- 2024.03 CVPR 2024: Multi-modal Aerial View Image Challenge - Translation (Top-3 prize $2,500, Solo, Runner-up🥈) [LeaderBoard] [Paper]
- 2023.10 ICCV 2023: Out Of Distribution Generalization: Object Classification track (Solo, Champion🏆) [LeaderBoard] [Paper]
- 2023.10 ICCV 2023: Out Of Distribution Generalization: Pose Estimation track (Solo, Champion🏆) [LeaderBoard] [Report]
- 2023.07 ACM-MM 2023: Grand challenge, Engagement Estimation (Solo, Champion🏆) [LeaderBoard] [Paper] [New]
- 2022.10 ECCV 2022: Out Of Distribution Generalization Track-1: Object Classification (Top-3 prize $3,300, Runner-up🥈) [LeaderBoard] [Code]
- 2022.10 ECCV 2022: Out Of Distribution Generalization Track-2: Object Detection (Top-3 prize $3,300, Runner-up🥈) [LeaderBoard] [Code]
- 2022.05 CVPR 2022: FGVC9 workshop FungiCLEF2022 challenge (Runner-up🥈) [LeaderBoard] [Code] [Paper]
- 2022.03 CVPR 2022: Multi-modal Aerial View Object Classification - SAR+EO (Top-3 prize $6,000, Champion🏆) [LeaderBoard] [Report] [New]
- 2022.03 CVPR 2022: Multi-modal Aerial View Object Classification - SAR (Top-3 prize $6,000, Champion🏆) [LeaderBoard] [Report] [New]
Master’s phase (Supporting contributor)
- 2024.07 ACM-MM 2024: 1M-Deepfakes Detection Challenge (Champion🏆) [LeaderBoard] [Paper]
- 2024.07 ACM-MM 2024: Micro-Action Analysis Grand Challenge: Multi-label Micro-Action Detection (Champion🏆) [LeaderBoard] [Paper]
- 2024.07 ACM-MM 2024: Micro-Action Analysis Grand Challenge: Micro-Action Detection (Runner-up🥈) [LeaderBoard] [Paper]
- 2023.12 ICCV 2023: WECIA - Caption Generation Challenge (Champion🏆) [LeaderBoard]
- 2023.07 ACM-MM 2023: Visual Text Question Answering (3rd🥉) [LeaderBoard] [Paper]
- 2023.03 CVPR 2023: Multi-modal Aerial View Imagery Challenges - Translation (Top-3 prize $2,250, Champion🏆) [LeaderBoard] [Paper]
- 2022.06 CVPR 2022: Robustness in Sequential Data challenge (Champion🏆) [LeaderBoard] [Report] [New]
- 2022.03 CVPR 2022: Semi-Supervised Hyperspectral Object Detection Challenge (Champion🏆) [LeaderBoard] [Paper]
Bachelor’s phase
- 2022.08 Computer Competition of China (Top-10 prize ¥560,000, Solo, National Second Prize, Top 30/3000+) [LeaderBoard] [Code]
- 2022.01 AAAI 2022: Data-Centric Robust Learning on ML Models (Top-10 prize ¥1,000,000, Solo, Rank 10/3692) [LeaderBoard] [Code] [Paper]
- 2021.11 OPPO Security AI Challenge - Face Recognition Attacks (Top-10 prize ¥600,000, Solo, Rank 12/2000+) [LeaderBoard] [Code]
- 2021.03 CVPR 2021: White-box Adversarial Attacks on ML Defense Models (Top-10 prize ¥100,000, Rank 20/1681) [LeaderBoard] [Code] [Blog]
- 2020.10 Adversarial Attacks on Forged Images (Top-10 prize ¥2 million, Rank 6/1666) [LeaderBoard]
- 2020.08 Tencent Advertising Algorithm Competition (Top-10 prize $100,000, Rank 11/10000+) [Code] [Blog]
- 2020.04 Used Car Trading Price Forecast (Solo, Winner, Rank 13/2815) [LeaderBoard] [Code] [Blog]
- 2020.03 Text Adversarial Attack Competition (Top-10 prize ¥68,000, Rank 4/1666) [LeaderBoard] [Code] [Blog]
- 2019.12 ImageNet Adversarial Attack Competition (Top-10 prize ¥68,000, Rank /1522) [LeaderBoard] [Blog]
- 2019.10 GeekPwn2019 CAAD CTF Finals (Finals prize ¥100,000, 5th place in Finals) [LeaderBoard] [New]
🎖 Honors and Awards
- 2024.10 National Scholarship (Top 1% of graduate students)
- 2023.11 Huawei Scholarship (Top 30 in the university)
- 2023.10 National Scholarship (Top 1% of graduate students)
- 2022.10 National Scholarship (Top 1% of graduate students)
- 2021.10 National Scholarship (Top 1% of undergraduate students)
- 2020.10 National Scholarship (Top 1% of undergraduate students)
🎓 Education
- 2022.09 - 2025.07, University of Science and Technology of China, Computer Technology, Master’s Degree (admitted through exam-exempt recommendation)
- 2018.09 - 2022.06, Guangzhou University, Computer Science and Technology (1/591), Bachelor’s Degree
🏛️ Academic conferences
- 2024.03, MindSpore AI Framework Industry Conference (organized by Huawei), invited by Huawei, Beijing.
- 2023.11, 31st ACM International Conference on Multimedia, Ottawa, Canada.
- 2020.12, The 1st AI and Security Symposium (organized by Tsinghua University and Alibaba Security), invited by Alibaba, Beijing.
- 2019.10, The 5th GeekPwn International Security Geek Competition, Shanghai.
💻 Internships
- 2023.10 - 2024.10, Palo Alto Lab, PAII, Inc.
- 2023.04 - 2023.06, Fuxi Lab, Netease.
- 2022.11 - 2023.09, YouTu Lab, Tencent.
- 2022.06 - 2022.12, 2012 Lab, Huawei.
Thank you for visiting, and I look forward to hearing from you!