Media Summary: Seoul National University Graduate School of Data Science, Data Lakehouse Systems for Data Science Lab, 2024.09.13 Mini-Conference. MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric ... Kamino: Efficient VM Allocation at Scale with Latency-Driven Cache-Aware
OSDI 24 Llumnix Dynamic Scheduling - Detailed Analysis & Overview
A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications Lei Chen, University of Chinese Academy ...
Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning Yi Zhai, ...
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving Yinmin Zhong and ...
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models Yao Fu, Leyang Xue, Yeqi Huang, and ...
Low End-to-End Latency atop a Speculative Shared Log with Fix-Ante Ordering Shreesha G. Bhat, Tony Hong, Xuhao Luo, Jiyu ...
nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training Zhiqi Lin, University of Science and ...
Optimizing Resource Allocation in Hyperscale Datacenters: Scalability, Usability, and Experiences Neeraj Kumar, Pol Mauri Ruiz, ...
DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization Yeonhong Park, Jake Hyun, Hojoon Kim, and Jae W. Lee, ...