Exploring the Cosmos through Data: Meet Sebastian Ratzenböck, our First PhD Graduate in Data Science

Using interpretable machine learning techniques, Sebastian discovers hidden stellar groups and sets out to create Atlas Galaxia Proxima, a thorough catalogue revealing the nearby Milky Way's star formation history.

Sebastian Ratzenböck combines data science with physics. During his Master's, he analysed data from various scientific projects, including dark matter searches and high-energy particle physics experiments at CERN. For his PhD, Sebastian developed interpretable machine learning techniques to reveal new stellar groups in the Milky Way. His current objective is to build Atlas Galaxia Proxima, a comprehensive catalogue that provides valuable insights into the recent star formation history of our local Milky Way. Join us for an interview with Sebastian, in which he tells us about his dissertation and we explore his FFG-funded project.

 

Dissertation

Finding needles in the Galactic haystack - towards interpretable machine learning methods to identify stellar structures in the Milky Way.

In his dissertation, Sebastian introduced new machine-learning techniques for identifying unknown stellar populations and as-yet unseen members of known stellar groups in the Milky Way. These techniques include Uncover, a novelty detection approach that looks for undetected star cluster members, and SigMA, a clustering algorithm that studies the topological properties of stellar data. Both methods are formulated in a domain-specific language, use expressive hyper-parameters, and allow for result validation to enhance confidence in the results. The tools are intended to enable astronomers to build models based on their own expertise. He has successfully applied these methods to Gaia data, an extensive census of Milky Way stars, and detected previously unknown stellar populations.

 

What motivated you to pursue a PhD and how did you choose your field of research?

My Master’s program in Technical Physics fuelled my interest in the analysis of physical data, especially with machine learning. During my Master's, I got hands-on experience in data analysis by working on three big scientific projects and had the chance to apply machine learning methods to classify high-energy physics data recorded at the ALICE experiment at CERN. This experience taught me the importance of modern data science methods in solving complex problems in physics and motivated me to pursue a PhD focused on data science.

 

To what extent was it beneficial for your research to be part of our research network Data Science?

As my work centres around applied data science, I encounter various domain-specific challenges that demand expertise from diverse quantitative fields. Being part of the research network Data Science has given me access to a wide range of knowledge. Throughout my studies, I had many insightful discussions with my colleagues and members of the data science board, which fuelled new ideas that helped shape my research.

 

What was the most significant discovery of your research and how do you think it will impact your field and benefit our society?

Our research has revealed a new aspect of the star formation process: star-forming regions undergo multiple distinct and apparently coordinated star-forming events, each showing slightly different ages and kinematics. This finding challenges a widely accepted theory that most stars are formed in a much more chaotic and dispersed process. Instead, our work suggests that all star formation happens within clusters (not dispersed) that form age-ordered chains of young stars. We are currently conducting further research to support this claim.

Additionally, we have developed a tool that enables astronomers to create machine-learning models on their own. It lets domain experts select hyper-parameters in the space of physical quantities they care about instead of in the space of arbitrarily defined clustering-method parameters.

 

What were the biggest challenges and how did you maintain your motivation and focus during your studies?

Working in an interdisciplinary field during my PhD program presented certain challenges. Rather than focusing on a single problem and studying it in depth, the program requires (and enables) learning various skills from different fields, such as physics, statistics, machine learning, and visualisation. Each domain has its own language and research methods, which take time to get accustomed to and master. A broad background also makes it harder to establish credibility in one or multiple fields simultaneously.

Regular exchanges with colleagues and supervisors, joining weekly group meetings, and taking on teaching responsibilities helped me familiarise myself with different fields and deepen my knowledge of them. But the biggest motivation is seeing a project work out and domain scientists gain some great insight with the tools that we have built. It’s a great feeling when the work pays off, and an immense motivation to keep going.

 

What are your future career plans and how do you see your PhD degree influencing your professional trajectory?

I’m happy to have secured funding as a postdoctoral researcher, which enables me to try to understand the origin and evolution of stellar structures in the Milky Way by employing and further developing tools built during my PhD. I plan to use SigMA and Uncover to construct a high-accuracy master catalogue of stellar structures. This catalogue will serve as a baseline for further investigations into the fundamental properties of all nearby stellar populations, which provide a unique laboratory for studying star and planet formation and evolution.

Beyond that, a PhD in data science is a great foundation for many real-world applications where data has become a significant and abundant resource. The ability to manipulate and make sense of complex data sets is becoming increasingly important in many applied fields outside physics, such as finance, healthcare, and marketing. Therefore, obtaining a PhD in data science not only opens up exciting research opportunities but also provides a strong foundation for an impactful career in various industries.

 

Please tell us why you are staying affiliated with the research network.

I’m happy to have secured funding from the FFG for a project at the Research Network Data Science. In the three-year project, which started in the autumn of last year, I aim to use the tools developed during my PhD to further study and understand the origin and evolution of star clusters in the Milky Way.