Meet the Woman in Charge of Data Science at DataDome: Q&A With Konstantina Kontoudi, PhD

Paige Tester, Sr. Content Marketing Manager
25 Jan, 2022
“When we put new machine learning models into production, it’s not enough to know the theory—you need to write the actual code, make it work, and make it reliable.”
Konstantina Kontoudi, PhD, Lead Data Scientist at DataDome

From getting her PhD in theoretical physics to spending 60% of her workday coding, Konstantina tells us how she got here and why her role as Lead Data Scientist suits her perfectly. Get exclusive insights from the brains behind DataDome’s machine learning magic on what it’s like to break into the cybersecurity field, why machine learning competitions are a great learning tool, and what she sees as the top trends in data science today.

Q: What does your typical workday look like?

A: I usually start the day by checking Slack and email, and catching up on ML-related news. At 9:45, my team has a 5-minute daily standup meeting to synchronize. After that, most of my time is spent coding and implementing, doing code reviews, and reading up on topics related to my projects.

Coding and implementation typically represent around 60% of my day and reading maybe 20%, but the balance depends on the stage of the project I’m working on. On average, I have two meetings per day, primarily technical meetings with my team members or synchronization meetings with other teams or DataDome management.

Q: How did you get into the data science field?

A: After finishing my PhD in theoretical physics, I wanted to step out of academia and find a “real” job. Someone mentioned Coursera, where I found Andrew Ng’s Machine Learning course, which has since become a must-have for every aspiring data scientist. Millions of people have taken it by now. Anyway, I took the course, and I thought, “This is amazing!” I already knew all the math I needed in order to work with machine learning; what I didn’t know so well was coding. But I found courses for that, too. One of them required that I participate in a Kaggle competition, which is how I started to learn Python.

 I then found a job as a developer, which provided the perfect opportunity to make the jump from academia to business. I always had in the back of my mind that I wanted to do data science, but those years of developer experience turned out to be very valuable. When we put new machine learning models into production, it’s not enough to know the theory—you need to write the actual code, make it work, and make it reliable. 

Q: What exactly attracted you to data science?

A: I enjoy the complexity and the math involved, and I like that the knowledge is very transferable. In my previous job, we used machine learning to perform quality tests in a production plant. Now I apply it to cybersecurity, but the same algorithms can also be used for things like medical imaging, or almost anything, really. And I think that’s amazing.

Q: How have you seen the data science field change since you started? 

 A: When I started to follow the field, some eight years ago, all the hype was around XGBoost and more traditional algorithms. Today, there’s a lot more focus on deep learning and neural networks. 

 I also think that, at the time, most companies doing data science were just scratching the surface and experimenting to see what could be done. This has changed. More and more companies are now using data science in production, which requires not only a theoretical comprehension of machine learning, but also the ability to write production-quality code. 

Q: What do you see as the top trends in data science right now? 

 A: Well, it’s a huge field, but one interesting area of research is trying to understand why neural networks work and what exactly happens when they are trained. People are trying to create models of neural networks’ behavior, but it’s still an open research question. I also see a lot of hype around natural language processing (NLP).

The latest NeurIPS conference program also had a large section on bias and ethics and how to address them. Models learn from data, so if the data is biased, the model will be biased. This is especially true of language models, which are often both racist and sexist because they capture the biases that exist in the data that is out there.

In general, there’s a lot of focus on data-centric AI, and rightly so. If your data isn’t good, you just won’t get reliable output. We data scientists like to create complicated models because it’s so much fun, but the truth is that very often, if you have better data, you don’t need to change the model. Even well-known datasets like ImageNet have been found to contain mislabeled images, so there’s a growing body of tools available to help you identify this sort of problem, gather expert knowledge, and automate your data labeling.

 This is something we’re already working on at DataDome. Without going into too much detail, we’re using automated data labeling functions to produce probabilistic labels for every fingerprint. This helps us, for example, identify false negatives.
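To make the idea of labeling functions concrete, here is a minimal Python sketch. It is not DataDome’s implementation: the fingerprint fields, the thresholds, the three labeling functions, and the smoothed-vote rule used to combine them are all hypothetical and chosen only for illustration. Each function votes “bot”, “human”, or abstains, and the votes are combined into a probability rather than a hard label.

```python
from typing import Callable, Dict, List, Optional

BOT, HUMAN = 1, 0
ABSTAIN = None  # a labeling function may decline to vote

Fingerprint = Dict[str, object]
LabelingFunction = Callable[[Fingerprint], Optional[int]]


# Hypothetical labeling functions: the field names and thresholds are assumptions.
def lf_headless_user_agent(fp: Fingerprint) -> Optional[int]:
    return BOT if "headless" in str(fp.get("user_agent", "")).lower() else ABSTAIN


def lf_high_request_rate(fp: Fingerprint) -> Optional[int]:
    return BOT if float(fp.get("requests_per_minute", 0)) > 300 else ABSTAIN


def lf_has_mouse_events(fp: Fingerprint) -> Optional[int]:
    return HUMAN if int(fp.get("mouse_events", 0)) > 0 else ABSTAIN


def probabilistic_label(
    fp: Fingerprint, lfs: List[LabelingFunction], prior: float = 0.5
) -> float:
    """Return an estimate of P(bot) from the non-abstaining votes.

    The prior counts as one pseudo-vote, so a single labeling function
    can never push the estimate all the way to 0 or 1.
    """
    votes = [v for v in (lf(fp) for lf in lfs) if v is not ABSTAIN]
    return (sum(votes) + prior) / (len(votes) + 1)


def possible_false_negative(
    p_bot: float, flagged_by_engine: bool, threshold: float = 0.8
) -> bool:
    """Flag traffic that looks bot-like but was not caught by the detection engine."""
    return p_bot >= threshold and not flagged_by_engine


LFS: List[LabelingFunction] = [
    lf_headless_user_agent,
    lf_high_request_rate,
    lf_has_mouse_events,
]

suspicious = {"user_agent": "HeadlessChrome/96.0", "requests_per_minute": 500}
p = probabilistic_label(suspicious, LFS)
print(round(p, 3), possible_false_negative(p, flagged_by_engine=False))
```

In this sketch, a fingerprint that receives a high bot probability from the labeling functions but was let through by the detection engine is a candidate false negative worth reviewing. Dedicated weak-supervision tools typically go further and learn how much to trust each labeling function instead of weighting every vote equally.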

Q: How do you stay knowledgeable about trends? What are your preferred resources?

A: I subscribe to a lot of newsletters, probably too many. One that I particularly like is The Batch by Professor Andrew Ng, whom I already mentioned. It summarizes research papers, but also covers interesting topics that appear in the news, in a short, concise format. Another very useful tool is called Connected Papers. When you put in an academic paper, it creates a graph of other publications with similar content. It allows you to quickly find the most relevant papers for the subject area you are working on.

In my spare time, I also still do Kaggle competitions now and then, to learn and gain experience in new areas. The last one I did was about image segmentation, which isn’t something I use in my day-to-day work. There are other platforms, too, but Kaggle is great because participants share a lot. There are spaces for discussion and spaces where you can share your notebooks, and people really do. So you can see other people’s work and how they explain their approaches, which is a great way to learn.

 You can also access the history of earlier competitions, and if you’re lucky, you’ll find a problem that’s similar to what you’re currently trying to do. Of course, competitions are only about getting the best score, and the winning solutions aren’t always practical in real life, but there’s still often useful inspiration to be found.

Fun fact: In 2021, Konstantina won 2nd place in a competition called the Feel The Rhythm Challenge, where the Australian utility company Western Power asked data scientists to develop a model to help keep people safe at work. Congratulations, Konstantina!

Q: What’s the best part of your job?

A: I really enjoy getting quick feedback on my work. When we deploy a new machine learning model to DataDome’s bot detection engine, our feedback loops will tell us almost instantly how it performs. In many other businesses, you have to wait for a long time before you get any feedback at all.

I also like that all my customers—internal and external—are technical people. I find it very easy to interact with them; even if they’re not in the same field as me, they understand my challenges and pain points. 

Q: What are you most proud of in your career? 

A: On the technical side, it must be the implementation of machine learning in DataDome’s API servers. I managed every aspect of it, with help from the engine team in the final phase. Many of the technologies involved were new to me, and I touched almost every component of DataDome’s infrastructure.

My colleagues and I have also laid the foundations for a really strong machine learning team at DataDome. I think my soft skills have improved a lot over the last couple of years.

Q: What advice do you have for someone looking to break into the cybersecurity field? 

A: Cybersecurity is huge. Personally, I didn’t know anything about cybersecurity before joining DataDome, but I learned by reading a lot and asking plenty of questions. 

I’d say that if you’re looking to break into cybersecurity, you first need to narrow down the area you want to focus on, and then just start to experiment with it. If you’re interested in DataDome’s domain, for example, you can start by creating a few bots yourself, trying to scrape some websites, and seeing what happens. Another great way to explore different domains and gain hands-on experience is to enter Capture the Flag (CTF) challenges.

Q: Women are notoriously underrepresented in cybersecurity; what has your experience been like? 

A: Well, I’ve been in this situation since I started studying physics, but honestly I’ve never had any bad experiences related to being a woman in a male-dominated field. Maybe I’ve been lucky, or maybe I’ve just not connected the dots. I don’t tend to overthink these things, and if someone’s unpleasant, I’ll just think he’s an idiot—I won’t necessarily believe that it’s because I’m a woman.

Q: If you had to work in any other industry or role, what would it be?

A: My role suits me perfectly, and I can’t think of anything I’d rather do. But if I had to change industries, I might choose the medical field. There’s lots of interesting work going on, and if you’re successful, you can really change people’s lives for the better.