Crayfish: Navigating the Labyrinth of Machine Learning Inference in Stream Processing Systems
2024 (English). In: Advances in Database Technology - EDBT, Open Proceedings.org, 2024, Vol. 27, p. 676-689, article id 3. Conference paper, Published paper (Refereed)
Abstract [en]
As Machine Learning predictions are increasingly being used in business analytics pipelines, integrating stream processing with model serving has become a common data engineering task. Despite their synergies, separate software stacks typically handle streaming analytics and model serving. Systems for data stream management do not support ML inference out-of-the-box, while model-serving frameworks have limited functionality for continuous data transformations, windowing, and other streaming tasks. As a result, developers are left with a design space dilemma whose trade-offs are not well understood. This paper presents Crayfish, an extensible benchmarking framework that facilitates designing and executing comprehensive evaluation studies of streaming inference pipelines. We demonstrate the capabilities of Crayfish by studying four data processing systems, three embedded libraries, three external serving frameworks, and two pre-trained models. Our results prove the necessity of a standardized benchmarking framework and show that (1) even for serving tools in the same category, the performance can vary greatly and, sometimes, defy intuition, (2) GPU accelerators can show compelling improvements for the serving task, but the improvement varies across tools, and (3) serving alternatives can achieve significantly different performance, depending on the stream processors they are integrated with.
Place, publisher, year, edition, pages
Open Proceedings.org, 2024. Vol. 27, p. 676-689, article id 3
Series
Advances in Database Technology - EDBT, ISSN 2367-2005 ; 27
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-346149
DOI: 10.48786/edbt.2024.58
Scopus ID: 2-s2.0-85190993856
OAI: oai:DiVA.org:kth-346149
DiVA, id: diva2:1855934
Conference
27th International Conference on Extending Database Technology, EDBT 2024, Paestum, Italy, Mar 25 2024 - Mar 28 2024
Note
QC 20240507
Part of ISBN: 978-389318091-2, 978-389318094-3, 978-389318095-0
2024-05-03, 2024-05-03, 2024-05-07. Bibliographically approved