
PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences

Huo, Pingyi, Devulapally, Anusha, Maruf, Hasan Al, Park, Minseo, Nair, Krishnakumar, Arunachalam, Meena, Akbulut, Gulsum Gudukbay, Kandemir, Mahmut Taylan, Narayanan, Vijaykrishnan

arXiv.org Artificial Intelligence

Deep Learning Recommendation Models (DLRMs) have become increasingly popular and prevalent in today's datacenters, consuming most of the AI inference cycles. The performance of DLRMs is heavily influenced by available bandwidth due to their large vector sizes in embedding tables and concurrent accesses. To achieve substantial improvements over existing solutions, novel approaches to DLRM optimization are needed, especially in the context of emerging interconnect technologies like CXL. This study explores CXL-enabled systems and implements a process-in-fabric-switch (PIFS) solution to accelerate DLRMs while optimizing their memory and bandwidth scalability. We present an in-depth characterization of industry-scale DLRM workloads running on CXL-ready systems, identifying the predominant bottlenecks in existing CXL systems. We therefore propose PIFS-Rec, a PIFS-based scheme that implements near-data processing through the downstream ports of the fabric switch. PIFS-Rec achieves a latency that is 3.89x lower than Pond, an industry-standard CXL-based system, and also outperforms BEACON, a state-of-the-art scheme, by 2.03x.
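The bandwidth pressure the abstract describes comes from DLRM embedding lookups: a sparse gather-and-pool that, on a conventional host, pulls every gathered row across the interconnect. A minimal sketch of why near-data processing helps (an illustrative model, not the PIFS-Rec implementation; function names are hypothetical):

```python
import numpy as np

def host_side_pool(table: np.ndarray, indices: list[int]) -> np.ndarray:
    """Baseline: transfer every gathered row to the host, then pool there."""
    rows = table[indices]            # len(indices) * dim values cross the link
    return rows.sum(axis=0)

def near_data_pool(table: np.ndarray, indices: list[int]) -> np.ndarray:
    """Near-data: pool where the table lives; only one vector crosses the link."""
    acc = np.zeros(table.shape[1], dtype=table.dtype)
    for i in indices:
        acc += table[i]              # accumulation happens near the memory
    return acc                       # only dim values cross the fabric

def bytes_moved(n_indices: int, dim: int, itemsize: int = 4) -> tuple[int, int]:
    """Link traffic for (baseline, near-data) pooling of one lookup batch."""
    return n_indices * dim * itemsize, dim * itemsize
```

Both functions compute the same pooled embedding; the traffic ratio is `n_indices` to 1, which is why moving the pooling into the fabric switch's downstream ports attacks the bandwidth bottleneck directly.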


How Flexible Is CXL's Memory Protection?

Communications of the ACM

Samuel W. Stark is a Ph.D. student and Harding Scholar in the Department of Computer Science and Technology at the University of Cambridge, U.K., where he is studying the wider applications of capabilities for shared-memory systems with Simon Moore. A. Theodore Markettos is a senior research associate in the Department of Computer Science and Technology at the University of Cambridge, U.K., where he co-leads the CAPcelerate project, which is researching the use of capabilities for securing distributed distrustful accelerators. Simon W. Moore is a professor of computer engineering in the Department of Computer Science and Technology at the University of Cambridge, U.K., where he conducts research and teaching in the general area of computer architecture, with particular interests in secure and rigorously engineered processors and subsystems.


Failure Tolerant Training with Persistent Memory Disaggregation over CXL

Kwon, Miryeong, Jang, Junhyeok, Choi, Hanjin, Lee, Sangwon, Jung, Myoungsoo

arXiv.org Artificial Intelligence

This paper proposes TRAININGCXL, which can efficiently process large-scale recommendation datasets in a pool of disaggregated memory while making training fault tolerant with low overhead. To this end, i) we integrate persistent memory (PMEM) and the GPU into a cache-coherent domain as a CXL Type-2 device. Enabling CXL allows PMEM to be placed directly in the GPU's memory hierarchy, so that the GPU can access PMEM without software intervention. TRAININGCXL introduces computing and checkpointing logic near the CXL controller, thereby processing training data and managing persistence in an active manner. Considering PMEM's vulnerability, ii) we utilize the unique characteristics of recommendation models and take the checkpointing overhead off the critical path of their training. Lastly, iii) TRAININGCXL employs an advanced checkpointing technique that relaxes the updating sequence of model parameters and embeddings across training batches. The evaluation shows that TRAININGCXL achieves a 5.2x training performance improvement and 76% energy savings compared to modern PMEM-based recommendation systems.
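The key idea in point ii) — taking checkpointing off the training critical path — can be sketched as an asynchronous writer: the trainer captures a cheap snapshot, and a background thread persists it while the next batch runs. This is a minimal illustration under assumed names (the `AsyncCheckpointer` class is hypothetical; a list stands in for the PMEM-backed store), not the TRAININGCXL mechanism itself:

```python
import copy
import threading

class AsyncCheckpointer:
    """Persist checkpoints in the background, off the training hot path."""

    def __init__(self):
        self._thread = None
        self.persisted = []                       # stand-in for PMEM storage

    def checkpoint(self, step: int, state: dict) -> None:
        snapshot = (step, copy.deepcopy(state))   # cheap capture on the hot path
        self.wait()                               # at most one write in flight
        self._thread = threading.Thread(target=self._persist, args=(snapshot,))
        self._thread.start()                      # write overlaps the next batch

    def _persist(self, snapshot) -> None:
        self.persisted.append(snapshot)           # would be a PMEM flush/commit

    def wait(self) -> None:
        if self._thread is not None:
            self._thread.join()

# Toy training loop: updates proceed while earlier checkpoints are written.
ckpt = AsyncCheckpointer()
state = {"dense": [0.0], "embedding": [0.0]}
for step in range(3):
    state["dense"][0] += 1.0                      # simulated parameter update
    ckpt.checkpoint(step, state)
ckpt.wait()
```

Relaxing the update sequence across batches (point iii) goes further than this sketch: dense parameters and embeddings need not be persisted in lockstep, which shrinks the window in which the trainer must stall for persistence.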