• Research
  • Publications
  • People
  • Posts
  • News
  • Advice
  • Contact Us
  • Posts
    • A Weathervane for Computer Architecture: A Summary of ISCA 2025
    • The Long-Sought General LLM Compression Algorithm Has Been Hiding in Video Coding All Along!
  • Advice Collection
  • Contact Us
  • Publications
    • Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization
    • H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
    • Matrix: Multi-Cipher Structures Dataflow for Parallel and Pipelined TFHE Accelerator
    • MemTunnel: a CXL-based Rack-Scale Host Memory Pooling Architecture for Cloud Service
    • NVMePass: A Lightweight, High-performance and Scalable NVMe Virtualization Architecture with I/O Queues Passthrough
    • Phoenix: Pauli-based High-level Optimization Engine for Instruction Execution on NISQ devices
    • Push Multicast: A Speculative and Coherent Interconnect for Mitigating Manycore CPU Communication Bottleneck
    • TRACI: Network Acceleration of Input-Dynamic Communication for Large-Scale Deep Learning Recommendation Model
    • UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures
    • A Comprehensive Survey on GNN Characterization
    • A Tightly Coupled AI-ISP Vision Processor
    • DSTC: Dual-Side Sparsity Tensor Core for DNN Acceleration on Modern GPU Architectures
    • Enabling Efficient Sparse Multiplications on GPUs with Heuristic Adaptability
    • EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree
    • Klotski v2: Improved DNN Model Orchestration Framework for Dataflow Architecture Accelerators
    • Large-Scale Self-Normalizing Neural Networks
    • NoCFuzzer: Automating NoC Verification in UVM
    • Optimizing NVMe Storage for Large-Scale Deployment: Key Technologies and Strategies in Alibaba Cloud
    • SAFE: A Scalable Homomorphic Encryption Accelerator for Vertical Federated Learning
    • Salus: A Practical Trusted Execution Environment for CPU-FPGA Heterogeneous Cloud Platforms
    • Survey of Machine Learning for Software-Assisted Hardware Design Verification: Past, Present, and Prospect
    • A Comprehensive Survey on Distributed Training of Graph Neural Networks
    • Accelerating Distributed GNN Training by Codes
    • Addressing Data Explosion Issue in Emerging Deep Learning Applications
    • ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs
    • ArchExplorer: Microarchitecture Exploration via Bottleneck Analysis
    • CHAM: A Customized Homomorphic Encryption Accelerator for Fast Matrix-Vector Product
    • Dynamic N:M Fine-Grained Structured Sparse Attention Mechanism
    • E-Booster: A Field-Programmable Gate Array-Based Accelerator for Secure Tree Boosting Using Additively Homomorphic Encryption
    • ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification
    • Efficient Super-Resolution System with Block-Wise Hybridization and Quantized Winograd on FPGA
    • Gamora: Graph Learning Based Symbolic Reasoning for Large-Scale Boolean Networks
    • HBP: Hierarchically Balanced Pruning and Accelerator Co-Design for Efficient DNN Inference
    • High-Performance and Scalable Software-Based NVMe Virtualization Mechanism with I/O Queues Passthrough
    • Klotski: DNN Model Orchestration Framework for Dataflow Architecture Accelerators
    • Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis
    • MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-in-Memory Architectures
    • MPU: Memory-Centric SIMT Processor via In-DRAM Near-Bank Computing
    • NPS: A Framework for Accurate Program Sampling Using Graph Neural Network
    • Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio
    • RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
    • SPADA: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow
    • SPG: Structure-Private Graph Database via SqueezePIR
    • TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization
    • Adaptive Ship-Radiated Noise Recognition with Learnable Fine-Grained Wavelet Transform
    • AI-Assisted Synthesis in Next-Generation EDA: Promises, Challenges, and Prospects
    • AutoComm: A Framework for Enabling Efficient Communication in Distributed Quantum Programs
    • Beacon: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support
    • Characterizing and Understanding HGNNs on GPUs
    • Compact Multi-Level Sparse Neural Networks with Input-Independent Dynamic Rerouting
    • Dynamic Sparse Attention for Scalable Transformer Acceleration
    • EPQuant: A Graph Neural Network Compression Approach Based on Product Quantization
    • HEDA: Multi-Attribute Unbounded Aggregation over Homomorphically Encrypted Database
    • ICCAD CAD Contest 2022
    • MILAN: Masked Image Pretraining on Language Assisted Representation
    • Multi-Node Acceleration for Large-Scale GCNs
    • Mutually Reinforcing Structure with Proposal Contrastive Consistency for Few-Shot Object Detection
    • OpSparse: A Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs
    • Optimal Transport for Label-Efficient Visible-Infrared Person Re-Identification
    • ReDCIM: Reconfigurable Digital Computing-in-Memory Processor with Unified FP/INT Pipeline for Cloud AI Acceleration
    • SaARSP: An Architecture for Systolic-Array Acceleration of Recurrent Spiking Neural Networks
    • SPCIM: Sparsity-Balanced Practical CIM Accelerator with Optimized Spatial-Temporal Multi-Macro Utilization
    • The Spike Gating Flow: A Hierarchical Structure-Based Spiking Neural Network for Online Gesture Recognition
    • Toward Robust Spiking Neural Network Against Adversarial Perturbation
    • TranCIM: Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes
    • Underwater-ART: Expanding Information Perspectives with Text Templates for Underwater Acoustic Target Recognition
  • Research
  • Privacy
  • Terms of Service
  • News
    • Prof. Yuan Xie Secures Major RGC Strategic Topics Grant, Contributing to HKUST’s Record-Breaking Funding Success
    • Prof. Yuan Xie Named to the Fang Professorship in Engineering

Privacy

Dec 1, 2023 · 1 min read

Add your company privacy policy here…

Last updated on Dec 1, 2023



© 2025 JC STEM FACT Lab @ HKUST.

Published with Hugo Blox Builder — the free, open source website builder that empowers creators.