Machine Learning Architect Anne Schuth recommends news - and that you read this
If it touches upon machine learning, it’s part of Anne Schuth’s job. In only a few years, Anne has built a small but impressive machine learning empire within DPG Media. Here is his personalized story.
In 2018, Anne joined as a Machine Learning Engineer, set up a team, and leveled up news personalization for all of DPG Media’s news brands. Now, he’s proudly wearing his fifth job title: Machine Learning Architect. Anne laughs: “Yes, the fifth one already, but it’s a very logical sequence of job titles. They’ve evolved in line with my tasks and responsibilities.”
Anne’s steps within the company are illustrative of how things work at DPG Media. If you have a good idea and you convince the right people, you can make it happen. “It’s pretty cool that it’s not all mapped out from above. If you think of something of value, you can just plan a meeting with a director. I know that in a lot of companies, that is out of the question.”
Anne looks back on fruitful years in which the News Personalization squad grew from zero to twelve people. All of them work on machine learning applications for news recommendation. As a Machine Learning Architect, Anne focuses on the technical aspects of building a large-scale recommendation system and oversees the technical challenges of content understanding, search, user understanding, and ranking.
Before Anne’s arrival, DPG Media was only just beginning, cautiously, to explore personalization. Anne: “I remember a Proof of Concept for Personalization as a Service, but it wasn’t quite right. It was more of an e-commerce solution, the kind you see at Bol.com or Amazon. ‘People who have read this also read…’ That’s a very limited approach to personalization. Also, a huge downside was that such an approach only works for older content, while we are serving readers news, which is inherently new.”
Time for a different approach. “Don’t focus on what other people are reading, but focus on the contents of an article,” says Anne. When he and his growing team started delivering their first products based on this rationale, personalization gained real traction within the company. “People started to see the effectiveness and slowly, but surely, started to realize that DPG Media is big enough to work on machine learning in-house. Also, I strongly believe that we, a publisher, shouldn’t outsource news recommendation as it’s such a fundamental part of our products. In the end, we select the news. Normally, the editorial staff would do that. If you leave it up to an algorithm, you have to understand that algorithm inside and out.”
The devil’s advocate and a moral compass
Anne thinks DPG Media is a stimulating company to work for if you’re into machine learning. The technical challenges are pretty thrilling, as the scale and impact are massive, and news recommendation is just fun to work on. And then there’s the moral appeal. Anne explains: “If we don’t get it right, companies like Google and Facebook will. They’ll sideline us. For society’s sake, I think that it’s important that a publisher takes on this role and not the foreign tech giants that have no sense of social responsibility.”
The devil’s advocate will argue that society doesn’t need news personalization and is not interested in filter bubbles. Anne recognizes the resistance but says personalization doesn’t necessarily lead to filter bubbles. “Part of our job is explaining that we’re not creating filter bubbles. Personalization is nothing more than adjusting a product to a person, and yes, we could do that by creating a filter bubble. But we could also do the opposite and show people everything they don’t like. That’s also personalization, but it’s of no use to the user or to us.”
The real question is how to adjust a product so that it improves the user experience. For news, that means not exhausting readers by throwing the thousands of articles created daily by DPG Media’s news brands at them. One way or another, a manageable selection needs to be made, ensuring that a person is informed. And, unfortunately, it’s not as simple as a filter bubble. It’s always a mix of editorial selection and algorithmic personalization.
“One of our journalistic responsibilities is to inform people, and there are different ways to do so - no filter bubble needed. Let’s take me as an example. I’m not interested in soccer; I don’t want to read it, I don’t want to see it. So just don’t serve it. It’s as simple as that. It’s a very basic personalization option, one that not many people object to, but it is of high value as it opens up space on our news platforms for other content.”
Let’s talk tech: algorithms, tech stack, and more
But how does this machine learning thing actually work? Is the entire squad constantly writing and adjusting algorithms? Nope. The algorithms are at the heart of the systems the Machine Learning Engineers work on, but most time is spent on the systems themselves. The squad works with an advanced tech stack, including Python, Kubernetes, Redis, Elasticsearch, Airflow, Delta, and MLflow. Kafka is used for stream processing and to connect all the dots in the infrastructure, and PySpark is used for batch processing. Everything runs on AWS.
One of the products Anne’s squad recently delivered is personalized push notifications for Algemeen Dagblad and its regional titles. “The notifications are mainly location-based. Each day, we send hundreds of thousands of notifications. For every single published article, we basically consider each user individually. That’s an enormous load we have to deal with intelligently.”
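To make the idea of location-based matching concrete, here is a minimal Python sketch. It is not the squad's actual system: the function names `build_region_index` and `recipients`, the user IDs, and the choice of an index keyed by region are all assumptions made for illustration. The index is one possible way to avoid scanning every user for every published article.

```python
from collections import defaultdict


def build_region_index(user_regions):
    """Map each region to the set of users who follow it (hypothetical data model)."""
    index = defaultdict(set)
    for user_id, regions in user_regions.items():
        for region in regions:
            index[region].add(user_id)
    return index


def recipients(index, article_regions):
    """Return the users whose regions overlap with the regions of an article."""
    users = set()
    for region in article_regions:
        # .get() avoids creating empty entries for regions nobody follows
        users |= index.get(region, set())
    return users


# Illustrative data only
user_regions = {
    "u1": {"Rotterdam"},
    "u2": {"Utrecht", "Rotterdam"},
    "u3": {"Groningen"},
}
index = build_region_index(user_regions)
print(recipients(index, {"Rotterdam"}))
```

With an index like this, the cost of handling a new article depends on how many users follow its regions rather than on the total number of users.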
Anne already mentioned that news recommendations should be content-based. Several squad members work on automatically understanding what an article is about. Through text extraction and labels added by journalists, they optimize personalized news recommendations. Anne: “We train the algorithm ourselves based on our data. We search for mentions of persons, organizations, and locations, for example. We also try to recognize sentiment, but you can already make quite a difference by just counting positive and negative words. We use that to balance gruesome and more light-hearted news in recommendation lists. We’re also figuring out how to balance political bias in our recommendations, but that is still a difficult one to tackle.”
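The word-counting approach Anne mentions can be sketched in a few lines of Python. This is an illustration only: the word lists, the function name `sentiment_score`, the tokenization, and the scoring formula are assumptions, not the squad's implementation.

```python
import re

# Hypothetical, tiny word lists; a real lexicon would be far larger
# and would match the language of the articles.
POSITIVE_WORDS = {"win", "celebrate", "rescue", "success", "happy"}
NEGATIVE_WORDS = {"crash", "murder", "fire", "victim", "loss"}


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative), a value in [-1, 1].

    Returns 0.0 when the text contains no word from either list.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    total = positive + negative
    if total == 0:
        return 0.0
    return (positive - negative) / total


print(sentiment_score("Fans celebrate a happy win"))
print(sentiment_score("Fire and crash leave one victim"))
```

A score like this could then be used when composing a recommendation list, for example by limiting how many strongly negative articles appear in a row.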
In addition to content understanding, user understanding is important for news recommendation. There are implicit signals, like reading behavior, and explicit signals, like following topics. All the squad has to do is put two and two together. Easier said than done, by the way.
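As a rough illustration of putting the two kinds of signals together, the Python sketch below builds a simple topic profile and ranks articles against it. Everything in it is an assumption for the sake of the example: the function names, the topic labels, and the idea of giving an explicitly followed topic a fixed, heavier weight than a single read.

```python
from collections import Counter


def build_profile(read_topics, followed_topics, follow_weight=3.0):
    """Combine implicit signals (topics of read articles) with explicit ones (followed topics)."""
    profile = Counter()
    for topic in read_topics:       # implicit: one point per article read
        profile[topic] += 1.0
    for topic in followed_topics:   # explicit: assumed to count more than a single read
        profile[topic] += follow_weight
    return profile


def score_article(profile, article_topics):
    """Sum the profile weights of the topics an article is labeled with."""
    return sum(profile[topic] for topic in article_topics)


def rank(profile, articles):
    """Order articles from highest to lowest score for this profile."""
    return sorted(articles, key=lambda a: score_article(profile, a["topics"]), reverse=True)


# Illustrative data only
profile = build_profile(read_topics=["politics", "tech"], followed_topics=["cycling"])
articles = [
    {"id": "a", "topics": ["soccer"]},
    {"id": "b", "topics": ["cycling"]},
    {"id": "c", "topics": ["politics", "tech"]},
]
print([article["id"] for article in rank(profile, articles)])
```

The weight of 3.0 is arbitrary; in practice, how much an explicit follow should outweigh reading behavior is exactly the kind of thing that is easier said than done.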
“We’re all in it for the long run, not just one sprint”
Machine Learning Engineers all have different backgrounds. Some are Backend Developers, others are DevOps Engineers, but everyone is a Machine Learning Engineer. “We believe that knowledge should be spread across people as much as possible. So when someone joins us, we first let them pick up stories they are unfamiliar with. That’s not very efficient at first, but it pays off in the long run. It’s what makes us a good team: we can help each other out and easily take something over when someone is on holiday, for example. Also, when you understand how different services and applications are connected and how they run, you make different engineering decisions—better ones,” Anne smiles.
Anne is a lucky man, as he had the chance to hand-pick his team. “I’m very happy with my colleagues, and we have a great time. I think the feeling is mutual. In general, I think DPG Media is a very friendly, supportive organization. Because of its size, DPG Media offers the time and the space to figure things out. Of course, we all have to deliver. We’re all in it for the long run, not just one sprint.”