Paper-Conference

Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization

Jianbo Dong

• Jan 1, 2025 • 1 min read

Salus: A Practical Trusted Execution Environment for CPU-FPGA Heterogeneous Cloud Platforms

Yu Zou

• Jan 1, 2024 • 1 min read

EVT: Accelerating deep learning training with epilogue visitor tree

Zhaodong Chen

• Jan 1, 2024 • 1 min read

TT-GNN: Efficient on-chip graph neural network training via embedding reformation and hardware optimization

Zheng Qu

• Jan 1, 2023 • 1 min read

Spada: Accelerating sparse matrix multiplication with adaptive dataflow

Zhiyao Li

• Jan 1, 2023 • 1 min read

RM-STC: Row-merge dataflow inspired GPU sparse tensor core for energy-efficient sparse acceleration

Guyue Huang

• Jan 1, 2023 • 1 min read

Predicting the output structure of sparse matrix multiplication with sampled compression ratio

Zhaoyang Du

• Jan 1, 2023 • 1 min read

Klotski: DNN model orchestration framework for dataflow architecture accelerators

Chen Bai

• Jan 1, 2023 • 1 min read

HBP: Hierarchically balanced pruning and accelerator co-design for efficient DNN inference

Ao Ren

• Jan 1, 2023 • 1 min read

Gamora: Graph learning based symbolic reasoning for large-scale Boolean networks

Nan Wu

• Jan 1, 2023 • 1 min read