My research centers on Multimodal Interaction and Modeling, and my internship projects involve LLMs and Image Generation.

I am currently seeking positions related to multimodal algorithms and AIGC.

I am pursuing my master’s degree at the University of Science and Technology of China under the supervision of Associate Professor Jun Yu. My corporate mentors are Peng Chang, who heads the multimodal group at the Silicon Valley Research Institute of Ping An Technology in the United States, and Iek-Heng Chu. I completed my undergraduate studies at Guangzhou University, where I was supervised by Professor Jin Li, executive dean of the Institute of Artificial Intelligence, and Associate Professor Xianmin Wang. To date, I have contributed to more than 10 published papers.

During my undergraduate and postgraduate years, I took part in more than 20 AI algorithm competitions, gaining extensive competition experience and strategies. I was a member of the Alibaba Security Student Expert Group and ranked 7th in the Alibaba Security Challenger Program.

My research interests include:

  • Multimodal Interaction and Modeling (CV/NLP)
  • AIGC
  • Fine-grained Image Recognition
  • Robust Machine Learning

My applied (industry) work includes:

  • Large Language Models
  • Exploratory Data Analysis (EDA)
  • Data Mining
  • Style Transfer (Autoencoder, GAN, Diffusion)
  • Object Detection

📝 Published Papers

IJCAI 2024 (CCF-A)

Dialogue Cross-Enhanced Central Engagement Attention Model for Real-Time Engagement Estimation
Jun Yu, Keda Lu, Ji Zhao et al. (First student author)

  1. Propose a center-based sliding window that eliminates repeated inference across overlapping windows, improving inference efficiency by 100%.
  2. Propose the central engagement attention model based on SA, surpassing the previous SOTA BiLSTM model while improving inference efficiency by 300%.
  3. Propose a cross-enhanced module based on CA that integrates seamlessly with the central engagement attention model, establishing a new SOTA result.
CVPR 2024 (CCF-A) workshop

MvAV-pix2pixHD: Multi-view Aerial View Image Translation
Jun Yu, Keda Lu, Shenshen Du et al. (First student author)

  1. Design time-priority and random sampling strategies.
  2. Propose MvAV-pix2pixHD for multi-view aerial view image translation, trained with three loss functions.
  3. This method won 1st and 2nd place in the two multi-view image translation tasks of the MAVIC-T competition.
ACM-MM 2023 (CCF-A)
  1. Design multiple Seq2seq models based on Transformer and BiLSTM.
  2. Propose a sliding window to address the significant context loss issue.
  3. Propose Ai-BiLSTM to align and interact multimodal features of dialogue participants, further enhancing performance.
  4. This method won the championship🏆 at ACM-MM 2023.
Trans (under review)

A Comprehensive and Unified Out-of-Distribution Classification Solution Framework
Jun Yu, Keda Lu, Yifan Wang et al. (First student author)

  1. Propose semantic masking for enhancing model robustness.
  2. Propose OOD-DAS, a comprehensive data augmentation collection.
  3. Propose OOD-Attention, which seamlessly integrates with SOTA classification models to improve model robustness.
  4. Propose an iterative pseudo-labeling method for ensembling models of multiple architectures, further enhancing OOD recognition accuracy.
  5. This method won the championship🏆 at ICCV 2023.

💻 Projects

  • 2024.03 - present Multimodal Large Language Models
EDA Show
  • 2023.10 - 2024.02 Loan Customer Repayment Intention Recognition
  1. Conduct EDA on a dataset with millions of records and tens of millions of call texts.
  2. Pipeline: EDA -> Data Cleaning -> Feature Engineering. Use BERT for text modeling to identify customers’ repayment intentions.
  3. Explore LLM-based data augmentation on call texts to enhance model robustness.
  • 2023.05 - 2023.09 Vertical Domain Chat Assistant (training corpus construction; fine-tuning of ChatGLM, Bloomz, Qwen, etc.)
OCR Large Model Showcase Platform
  • 2023.03 - 2023.06 OCR Large Model Showcase Platform
  1. Use Gradio to construct the entire OCR large model showcase interface, incorporating DocQA, MLLM, and pure OCR modules.
  2. Independently maintain the platform for internal analysis and debugging, as well as external business showcasing.
  3. This project was awarded the 2023 H1 XXX·Enterprise Excellence Award - Technical Advancement.
  4. Responsible for the DocQA module.
Chinese font generation
  • 2023.01 - 2023.03 Chinese font generation of arbitrary style (GAN, Diffusion model)
  1. Explore Chinese font generation algorithms, including DG-Font and Diff-Font.
  2. Collect a dataset of 400 different styles of fonts.
  3. Design an end-to-end font generation model based on the Diffusion model (DDPM). It slightly outperformed Diff-Font and DG-Font in metrics such as SSIM and LPIPS.
  • Future Improvements: End-to-end, Contrastive learning, Diffusion model.
Document generation and style transfer
  • 2022.11 - 2023.01 Document generation and style transfer (Independent research)
  1. Explore Diffusion model and GAN for end-to-end document generation.
  2. Survey five years of style transfer papers from top conferences, following the progression CNN -> Attention -> Transformer: AdaIN (ICCV 2017), MetaNet (CVPR 2018), SANet (CVPR 2019), MAST (ACM-MM 2020), StyleFormer (ICCV 2021), AdaAttN (ICCV 2021), and StyTr2 (CVPR 2022).
  3. Reproduce StyTr2 (CVPR 2022) and AdaAttN (ICCV 2021) and transfer them to the document generation task for data augmentation.
  • Future improvements: Contrastive learning, GAN, Diffusion model
Face Recognition and Text Detection
  • 2022.06 - 2022.12 Reproducing mainstream algorithms with the MindSpore framework
  1. Participate in reproducing the RetinaFace face detection algorithm.
  2. Independently reproduce the FCENet text detection algorithm.
Course Management System
  • 2020.12 - 2021.01 Genetic Algorithm-based Intelligent Timetabling - Course Management System (Individually implemented)
  1. Use a sqlite3 database and Bootstrap-Flask for visualization. Implement distinct client interfaces for students, teachers, and educational directors.
  2. Propose an intelligent timetabling algorithm with a novel optimization objective function (based on course variance), optimized with a genetic algorithm.
  3. This project comprises over 2000 lines of Python code and 1000 lines of HTML code. It has been openly shared on my personal blog and GitHub.
Student performance management system
  • 2019.04 - 2019.06 Student performance management system based on MFC (C++) (Individually implemented).
  1. Includes all basic functions (Create, Read, Update, Delete), as well as operations like import, save, and sorting.
  2. The design was primarily inspired by the large login button interface of QQ, aiming to create a clear and clean user experience.
  3. This project comprises over 10,000 lines of C++ code and has been open-sourced on my personal blog and GitHub.

🏅 Competitions

Master’s phase (main contributor)


Master’s phase (supporting role)


Bachelor’s phase


🎖 Honors and Awards

  • 2023.11 Huawei Scholarship (top 30 in the university)
  • 2023.10 National Scholarship (Top 1% of graduate students)
  • 2022.10 National Scholarship (Top 1% of graduate students)
  • 2021.10 National Scholarship (Top 1% of undergraduate students)
  • 2020.10 National Scholarship (Top 1% of undergraduate students)

🎓 Education

  • 2022.09 - 2025.07, University of Science and Technology of China, Computer Technology, Master’s Degree (admitted by recommendation)
  • 2018.09 - 2022.06, Guangzhou University, Computer Science and Technology (ranked 1/591), Bachelor’s Degree

🏛️ Academic conferences

  • 2024.03, MindSpore AI Framework Industry Conference (organized by Huawei), invited by Huawei, Beijing.
  • 2023.11, 31st ACM International Conference on Multimedia, Ottawa, Canada.
  • 2020.12, The 1st AI and Security Symposium (organized by Tsinghua University and Alibaba Security), invited by Alibaba, Beijing.
  • 2019.10, The 5th GeekPwn International Security Geek Competition, Shanghai.

💻 Internships

  • 2023.10 - 2024.10, Palo Alto Lab, PAII, Inc.
  • 2023.04 - 2023.06, Fuxi Lab, NetEase.
  • 2022.11 - 2023.09, YouTu Lab, Tencent.
  • 2022.06 - 2022.12, 2012 Lab, Huawei.

Thank you to every visitor. I look forward to hearing from you!