DataDome Open-Sources Its First Machine Learning Package: Sliceline


DataDome’s data science team is happy to announce the open-sourcing of Sliceline, a machine learning package for model debugging.

Sliceline

This package is a Python implementation of “SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging”, a paper published at SIGMOD 2021 by Svetlana Sagadeeva and Matthias Boehm from Graz University of Technology.

It is designed to quickly find slices of a dataset on which a trained machine learning (ML) model performs significantly worse than it does on average. Sliceline generates rules, also known as “filters”, that identify sub-populations of a dataset on which the ML model struggles to predict the output. These rules give you clues about how the model works and how to improve it.
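To make the idea of a slice concrete, here is a minimal sketch in plain Python. It does not use the Sliceline API: the toy rows, the error values, and the helper names `slice_mask` and `slice_mean_error` are all hypothetical and only illustrate what a rule selects.

```python
from statistics import mean

# Hypothetical toy dataset: each row is a dict of categorical features.
rows = [
    {"sex": "male", "class": "3rd"},
    {"sex": "male", "class": "3rd"},
    {"sex": "male", "class": "1st"},
    {"sex": "female", "class": "3rd"},
    {"sex": "female", "class": "1st"},
    {"sex": "female", "class": "1st"},
]
# Hypothetical per-row model error (1.0 = wrong prediction, 0.0 = correct).
errors = [1.0, 1.0, 0.0, 0.0, 0.0, 1.0]


def slice_mask(rows, rule):
    """Return one boolean per row: True if the row matches every predicate in the rule."""
    return [all(row[feature] == value for feature, value in rule.items()) for row in rows]


def slice_mean_error(rows, errors, rule):
    """Average model error over the rows selected by the rule (NaN for an empty slice)."""
    selected = [e for e, keep in zip(errors, slice_mask(rows, rule)) if keep]
    return mean(selected) if selected else float("nan")


# A rule (or "filter") is a conjunction of feature == value predicates.
rule = {"sex": "male", "class": "3rd"}
overall_error = mean(errors)
rule_error = slice_mean_error(rows, errors, rule)
print(f"overall error: {overall_error}, error on slice: {rule_error}")
```

In this toy data the model is wrong on every row matched by the rule, while it is wrong on only half of the rows overall. That gap is the kind of signal a slice finder looks for.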

Sliceline is a newly available component for your MLOps and/or explainability stacks.

Slice enumeration and pruning are performed with sparse linear algebra, and the algorithm scores each slice on several criteria, including:

Slice size (the number of elements in the slice).
Slice errors (the error the model is making on the slice).
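For intuition on how these two criteria can be combined, the sketch below follows the scoring function as defined in the SliceLine paper, where a weight `alpha` balances the relative error of a slice against its size. The function name and its arguments are illustrative assumptions and are not taken from the Sliceline package, whose internals may differ.

```python
def slice_score(slice_size, slice_error_sum, n, total_error_sum, alpha=0.95):
    """Score a slice following the formula in the SliceLine paper (illustrative sketch).

    score = alpha * (avg_slice_error / avg_error - 1)
            - (1 - alpha) * (n / slice_size - 1)

    A positive score means the slice is both large enough and clearly worse
    than the dataset average.
    """
    if slice_size == 0 or total_error_sum == 0:
        # Empty slices, or a model with no error at all, cannot be scored.
        return float("-inf")
    avg_slice_error = slice_error_sum / slice_size
    avg_error = total_error_sum / n
    error_term = avg_slice_error / avg_error - 1  # how much worse than average
    size_term = n / slice_size - 1                # penalty for small slices
    return alpha * error_term - (1 - alpha) * size_term


# Hypothetical numbers: 6 rows with a total error of 3.0; the slice covers
# 2 rows and accounts for an error of 2.0.
score = slice_score(slice_size=2, slice_error_sum=2.0, n=6, total_error_sum=3.0)
print(score)
```

With `alpha` close to 1 the score mostly rewards high error. Lowering `alpha` favors larger slices. The full dataset always scores 0, since it is neither worse than average nor smaller than itself.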

At DataDome, we use Sliceline in two different ways:

As an ML model debugger.
As a contrast set algorithm, to generate dynamic blocking patterns in certain contexts.

We will soon present an application of Sliceline on the open-source Titanic dataset.

Titanic departing Southampton April 10, 1912.

Contributing

We are happy to contribute to the community’s global knowledge thanks to open source. Sliceline has been open-sourced under the BSD-3-Clause license.

Feel free to contribute in any way you like. We are always open to new ideas and approaches.

Open a discussion if you have any questions or inquiries. Asking your question in public is more useful than sending a private email. We also encourage people to open a discussion before contributing, so that everyone is aligned and unnecessary work is avoided.
Feel free to open an issue if you think you’ve spotted a bug or a performance issue.

Please check out the contribution guidelines if you want to bring modifications to the code base.