A 3D hybrid-cube processing unit (HCU) featuring 64 TOPS of Compute-in-Memory on a planar-node process for tensors, paired with a custom RISC-V engine for vectors. Sub-millisecond reasoning powered by 6 GB–48 GB of 3D DRAM.
Our Compute-in-Memory core is built from hand-crafted custom circuits. By bypassing standard-cell libraries, our MAC units use an optimized adder-tree structure to achieve extreme efficiency for FP4 and FP8 tensor operations.
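The idea behind an adder-tree MAC can be sketched in software: instead of accumulating N products one after another, a balanced tree sums them pairwise in log2(N) levels, which is what the hardware wires up in parallel. This is a minimal illustrative model, not the actual circuit; the function name and structure are assumptions for illustration.

```python
def adder_tree_mac(weights, activations):
    """Multiply element-wise, then reduce with a balanced adder tree.

    Illustrative model only: each `while` pass corresponds to one level
    of adders, halving the number of partial sums until one remains.
    """
    products = [w * a for w, a in zip(weights, activations)]
    while len(products) > 1:
        if len(products) % 2:          # odd count: pad so pairs line up
            products.append(0)
        products = [products[i] + products[i + 1]
                    for i in range(0, len(products), 2)]
    return products[0]

print(adder_tree_mac([1, 2, 3, 4], [5, 6, 7, 8]))  # 1*5+2*6+3*7+4*8 = 70
```

In hardware, every level of this tree evaluates simultaneously, so the latency grows with the tree depth rather than the vector length.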
By stacking 6 GB of DRAM directly atop our hybrid logic die and eliminating the buffer die, we unlock 2.0 TB/s of bandwidth with zero vertical delay.
Complementing our Compute-in-Memory engine, the specialized RISC-V vector engine uses custom instruction-set extensions to handle non-linear layers and complex vector arithmetic in BF16 and FP32 formats.
Hardware-native Broadcast and All-Reduce support. Our self-developed NoC enables multi-cluster synchronization between Compute-in-Memory nodes at near-physical limits.
Dedicated hardware logic for prefix-sum and collective operations, slashing inter-cluster communication latency by 80%.
Proprietary NoC fabric supports single-cycle operand broadcast to all Compute-in-Memory clusters simultaneously.
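One of the collectives named above, the prefix sum, has a classic logarithmic formulation that hardware scan networks implement: at each step, every lane adds in the value a fixed stride behind it, so N elements are scanned in log2(N) steps. The sketch below is a generic Hillis-Steele scan for illustration, not the proprietary NoC logic.

```python
def prefix_sum(values):
    """Inclusive prefix sum via a Hillis-Steele scan.

    Each pass doubles the stride; all additions within a pass are
    independent, which is why hardware can run them in parallel.
    """
    out = list(values)
    stride = 1
    while stride < len(out):
        out = [out[i] + (out[i - stride] if i >= stride else 0)
               for i in range(len(out))]
        stride *= 2
    return out

print(prefix_sum([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

A sequential scan would need N-1 dependent additions; the strided form trades extra adders for a latency that grows only with log2(N), which is the kind of win dedicated collective hardware targets.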
Computation triggers automatically upon operand arrival. Whether it is a Compute-in-Memory tensor op or a RISC-V vector op, our architecture cuts energy waste by 40% by removing instruction-fetch wait states.
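Fire-on-arrival execution can be modeled in a few lines: an operation latches each operand as it arrives and fires itself the moment its last input lands, with no fetch or dispatch step in between. The class and method names below are hypothetical, chosen only to illustrate the dataflow-triggering idea.

```python
class DataflowNode:
    """Toy model of a dataflow-triggered operation (illustrative only)."""

    def __init__(self, n_operands, fn):
        self.fn = fn
        self.slots = [None] * n_operands   # one latch per operand

    def deliver(self, index, value):
        """Latch an operand; fire automatically once every slot is full."""
        self.slots[index] = value
        if all(s is not None for s in self.slots):
            return self.fn(*self.slots)    # computation triggers on arrival
        return None                        # still waiting for operands

mul = DataflowNode(2, lambda a, b: a * b)
print(mul.deliver(0, 6))   # None — one operand still missing
print(mul.deliver(1, 7))   # 42 — last operand arrived, node fires
```

The point of contrast: a fetch-driven pipeline would sit in a wait state polling for operands, whereas here readiness itself is the trigger.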
Join us to craft the next generation of custom circuit AI silicon.
Send your resume to:
hr@cimicro.ai