DataDome

DataDome Open-Sources Its First Machine Learning Package: Sliceline


DataDome’s data science team is happy to announce the open-sourcing of Sliceline, a machine learning package for model debugging.

Sliceline

This package is a Python implementation of “SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging”, a paper published at SIGMOD 2021 by Svetlana Sagadeeva and Matthias Boehm of Graz University of Technology.

It is designed to quickly find slices of a dataset on which a trained machine learning (ML) model performs significantly worse than on the dataset as a whole. Sliceline generates rules, also known as “filters”, that identify the sub-populations on which the model struggles to predict the output. These rules give you clues about how the model works and how to improve it.
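To make the idea of a rule concrete, here is a minimal sketch in plain Python. It is not the Sliceline API. It assumes that a slice is a conjunction of `feature == value` predicates and judges the slice by comparing the model's average error inside it with the average error over the whole dataset. The feature names, rows and error values are made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]
Rule = Dict[str, str]  # hypothetical representation: feature name -> required value


def in_slice(row: Row, rule: Rule) -> bool:
    """A row belongs to the slice if it satisfies every predicate of the rule."""
    return all(row.get(feature) == value for feature, value in rule.items())


def slice_error(rows: List[Row], errors: List[float], rule: Rule) -> float:
    """Average model error over the rows matched by the rule (0.0 if none match)."""
    matched = [e for row, e in zip(rows, errors) if in_slice(row, rule)]
    return sum(matched) / len(matched) if matched else 0.0


# Made-up data: one row per prediction, with the model's 0/1 error on that row.
rows = [
    {"color": "red", "shape": "square"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "circle"},
    {"color": "red", "shape": "circle"},
]
errors = [1.0, 1.0, 0.0, 0.0]

overall = sum(errors) / len(errors)
worst = slice_error(rows, errors, {"color": "red", "shape": "square"})
print(overall, worst)
```

In this toy data, the rule `color == red AND shape == square` isolates exactly the rows the model gets wrong. That is the kind of filter a slice finder is meant to surface.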

Sliceline is a new component available for your MLOps and/or explainability stacks.

Slice enumeration and pruning are performed with sparse linear algebra, and the algorithm scores slices on several criteria, including:

  • Slice size (the number of elements in the slice).
  • Slice errors (the error the model makes on the slice).
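The NumPy sketch below illustrates, under simplifying assumptions, how linear algebra can evaluate many candidate slices at once.

* The features are one-hot encoded into a binary matrix `X`.
* Each candidate slice is a binary row of `S` that marks its predicates.
* A row belongs to a slice exactly when it matches all of that slice's predicates.

The score follows the form given in the SliceLine paper as we understand it: `alpha * (slice_avg_error / avg_error - 1) - (1 - alpha) * (n / size - 1)`. Here `alpha` trades off slice error against slice size. The function names, the dense arrays and `alpha=0.95` are illustrative choices. A real implementation would use sparse matrices and level-wise enumeration with pruning, which this sketch omits.

```python
import numpy as np


def evaluate_slices(X: np.ndarray, S: np.ndarray, errors: np.ndarray):
    """Return (sizes, total_errors) for each candidate slice.

    X: (n_rows, n_items) 0/1 one-hot encoded data.
    S: (n_slices, n_items) 0/1 matrix; row j marks the predicates of slice j.
    errors: (n_rows,) per-row model error.
    """
    level = S.sum(axis=1)                 # number of predicates in each slice
    # Row i is in slice j iff it matches all level[j] predicates of slice j.
    members = (X @ S.T) == level          # (n_rows, n_slices) boolean
    sizes = members.sum(axis=0)           # slice size |S|
    total_errors = errors @ members.astype(float)  # summed error per slice
    return sizes, total_errors


def score(sizes, total_errors, errors, alpha=0.95):
    """SliceLine-style score: positive when a slice is worse than average
    without being too small; empty slices get -inf."""
    n = len(errors)
    avg_error = errors.mean()
    sizes = sizes.astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        slice_avg = total_errors / sizes
        sc = alpha * (slice_avg / avg_error - 1) - (1 - alpha) * (n / sizes - 1)
    return np.where(sizes > 0, sc, -np.inf)


# Hypothetical one-hot columns: [color=red, color=blue, shape=square, shape=circle]
X = np.array([[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1]])
errors = np.array([1.0, 1.0, 0.0, 0.0])
# Candidate slices: "red", "red AND square", "circle"
S = np.array([[1, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]])

sizes, total_errors = evaluate_slices(X, S, errors)
scores = score(sizes, total_errors, errors)
print(sizes, total_errors, scores.round(2))
```

The highest score goes to "red AND square" (index 1). It is small, but every row in it is an error. The broader "red" slice scores lower because its average error is diluted.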

At DataDome, we use Sliceline in two ways:

  1. As an ML model debugger.
  2. As a contrast set algorithm, to generate dynamic blocking patterns in certain contexts.
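Both uses come down to giving the slice finder one number per row to focus on. The sketch below is our assumption of how such vectors could be built, not DataDome's actual pipeline.

* For debugging, the vector can be a model's 0/1 misclassification loss or its squared error.
* For a contrast-set style search, it can be a hypothetical indicator equal to 1 for rows of a target group. High-scoring slices are then sub-populations where that group is over-represented.

All function names and values are illustrative.

```python
from typing import List, Sequence


def classification_errors(y_true: Sequence[int], y_pred: Sequence[int]) -> List[float]:
    """0/1 loss per row: 1.0 where the prediction is wrong, 0.0 otherwise."""
    return [float(t != p) for t, p in zip(y_true, y_pred)]


def regression_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Squared error per row, for regression models."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


def contrast_indicator(groups: Sequence[str], target: str) -> List[float]:
    """Hypothetical contrast-set framing: 1.0 for rows of the target group."""
    return [1.0 if g == target else 0.0 for g in groups]


# Toy usage with made-up labels, predictions and group names.
print(classification_errors([0, 1, 1, 0], [0, 0, 1, 1]))
print(regression_errors([1.0, 2.0], [1.5, 2.0]))
print(contrast_indicator(["bot", "human", "bot"], target="bot"))
```

The slice-finding step does not care what the numbers mean. It only looks for sub-populations where they are high on average, which is why the same algorithm can serve both purposes.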

We will soon present an application of Sliceline to the open-source Titanic dataset.


Titanic departing Southampton April 10, 1912.

Contributing

We are happy to contribute to the community’s knowledge through open source. Sliceline is released under the BSD-3-Clause license.

Feel free to contribute in any way you like. We are always open to new ideas and approaches.

  • Open a discussion if you have any questions. Asking in public is more useful than sending a private email. We also encourage people to open a discussion before contributing, so that everyone is aligned and unnecessary work is avoided.
  • Feel free to open an issue if you think you’ve spotted a bug or a performance issue.

Please check out the contribution guidelines if you want to modify the code base.


Antoine de Daran
Cybersecurity Data Scientist
Antoine de Daran is a Data Scientist at DataDome, focused on developing new approaches to stop bots with the power of machine learning (ML). As part of DataDome’s data science team, Antoine leverages the wealth of data processed by DataDome to identify new weak signals to improve our ML detection. Antoine has more than 5 years of experience in data science, from time series analysis to supervised learning, unsupervised learning, and more.
