What’s it like to work in climate data science?
Meet Climate Policy Radar data scientist Kalyan Dutia. He shares his love of linguistics, the opportunities and challenges of responsible artificial intelligence, and exciting developments in the field of machine learning for climate change.
Downtime from data science, foraging with colleagues
Give us a whistlestop tour of your career as a data scientist
I discovered machine learning during my engineering Master’s. I was fascinated by the way computers could learn by example rather than from a set of rules. For a few years I was a data science consultant in various industries at IBM, doing everything from detecting smart meter faults to building and maintaining machine learning models for a chatbot that serves millions of users. That’s what sparked a real interest in natural language processing and linguistics.
I then led work at the Science Museum on a research project looking at how you could use machine learning to build knowledge graphs that link the country’s gallery, library, archive and museum collections together. It was a really fun opportunity to get deep into research on an interesting and impactful problem in the open science space.
When that project came to an end, I was figuring out what I wanted to do next and realised I wanted to work on climate. I stumbled on Climate Policy Radar through Twitter, and before I knew it I was in the door as the company’s first data scientist.
Why did you want to do data science for climate change?
I was looking for an impactful problem that I could use my natural language processing skills in. I narrowed that down to climate and healthcare. When I went looking for natural language processing jobs in climate, most roles involved non-language applications of AI, like using computer vision to analyse satellite imagery or time-series data to better understand the energy grid. Climate Policy Radar was the first company I found that was applying natural language processing to climate change.
What does your job look like at Climate Policy Radar?
At the moment I’m focussing on improving the experience people have when they search for climate change laws and policies on our open tool. This means understanding their intent when they search for something in our database, designing powerful but efficient ways to turn that search into a query on millions of text passages within those laws and policies, and aggregating those in a way that’s useful to display back to them.
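The passage above describes the core retrieval problem: turning a user’s search into a ranked query over many text passages. As an illustrative sketch only (a toy in-memory corpus and a simple TF-IDF-weighted overlap score, not Climate Policy Radar’s actual system, which operates over millions of passages), the idea looks something like this:

```python
import math
from collections import Counter

def tokenize(text):
    return [t.lower().strip(".,") for t in text.split()]

def rank_passages(query, passages):
    """Rank passages by TF-IDF-weighted overlap with the query terms."""
    n = len(passages)
    docs = [Counter(tokenize(p)) for p in passages]
    # Inverse document frequency: rarer query terms carry more weight.
    idf = {t: math.log(n / sum(1 for d in docs if t in d))
           for t in set(tokenize(query)) if any(t in d for d in docs)}
    scores = [sum(d[t] * w for t, w in idf.items()) for d in docs]
    return sorted(zip(scores, passages), reverse=True)

passages = [
    "This act sets a national target for net zero emissions by 2050.",
    "The policy establishes a fund for flood defence infrastructure.",
    "Annual reporting on emissions reduction progress is required.",
]
ranked = rank_passages("emissions target", passages)  # best match first
```

A production search system would use an inverted index or dense vector retrieval rather than scoring every passage in a loop, but the shape of the problem, query understanding, scoring, and ranking, is the same.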
I’m also doing lots of thinking about how we can use machine learning to start creating a structured evidence base from all the text we have. And recently, I’ve been thinking about how we might use the latest advances in AI, such as large language models like OpenAI’s GPT-3, in a responsible and reliable way to add powerful new features to our tool.
This could look like automated summaries of legal and policy documents (see the image below for one I made as an example), or suggesting queries people might like to use to interrogate our database based on what they typed into the search bar. I tweeted about some early experiments I’ve been playing with.
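One way a query-tailored summary like the one described above could work is by composing a prompt that combines the document text with the user’s search query before sending it to a large language model. The sketch below is a hedged illustration: the prompt wording, function name, and truncation strategy are all assumptions for the example, not the tool’s actual implementation.

```python
def build_summary_prompt(document_text, search_query, max_chars=4000):
    """Compose a prompt asking an LLM to summarise a policy document
    with respect to the user's search query."""
    # Naive truncation so the document fits in a model's context window;
    # a real system would select the most relevant passages instead.
    excerpt = document_text[:max_chars]
    return (
        "Summarise the following climate policy document, focusing on "
        f"aspects relevant to the query '{search_query}'.\n\n"
        f"Document:\n{excerpt}\n\nSummary:"
    )

prompt = build_summary_prompt(
    "The Climate Change Act 2008 commits the UK to reducing emissions...",
    "net zero targets",
)
```

The resulting string would then be sent to a model such as GPT-3 via its completions API; the interesting design work is in passage selection and in checking the model’s output for reliability, which is exactly the responsible-AI question raised later in this interview.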
A screenshot of an experiment using OpenAI’s GPT-3 to produce a summary of a climate policy that has been tailored to a particular search query.
I do as much knowledge sharing with the team as I can, so everyone feels they can chip in on the direction and development of the machine learning work we’re doing.
Why do you like working with language?
It’s fun to learn linguistics, and I’ve been lucky to work with really good linguists here and elsewhere. I find there’s a lot of power in building things that make predictions using language, because naturally everyone understands the inputs and outputs without needing to understand any of the maths. I like nerding out about the language aspect of machine learning and the additional complexities it brings from fields like psychology.
What big problems are you tackling at work right now?
My big challenge is: how do we do machine learning in a measurably responsible, ethical and effective way? And how do we build things in a way that gives us the information to build future things better, without taking on extra maintenance overhead, given we’re only a small team?
What applications of machine learning for climate change are you excited about?
I’m really fascinated by the use of machine learning to tackle climate change using satellite imagery. Open Climate Fix - among many others - are doing great work in this area and their commitment to open science is something I hope we can replicate in the policy domain.
We also recently partnered with OpenAI for their hackathon on climate change, and one of the teams built a demo that would allow policymakers to query Google Earth Engine using natural language. Now I’m thinking more about how we could connect our future analyses of climate policies with the effects of climate change seen from satellites.
Who’s doing brilliant things in this space that you’d love to collaborate with?
The level of engagement in the hackathon we ran with OpenAI showed that there are so many data scientists and software engineers itching to work in climate, so I’m keen to figure out ways we can make community engagement as impactful as possible. I’m also super excited to collaborate with all the policy experts from our partners at LSE’s Grantham Research Institute and elsewhere, whose work we’re building on using machine learning.
What gets you out of bed on a Monday morning?
The opportunity to work on exciting and impactful problems with really good people. Textbook, but true.
And finally, on a serious note… What series are you into right now?
After years of not succumbing to peer pressure, I’m finally binge-watching all of the US version of The Office for the first time. I’m definitely late to the party on that one.
Stay updated with the latest developments from Climate Policy Radar - sign up for our newsletter.