VIDEO TRANSCRIPT
Welcome back to Google DeepMind: The Podcast. I'm your host, Professor Hannah Fry. Now, there are few areas that have felt the transformative influence of AI over the past couple of years as keenly as the education sector. Teachers around the world are having to abruptly rethink how they engage with and assess students, and there are concerns around the potential for cheating and dependence on technology. Those fears are valid. But despite the rapid transition, there has been something remarkably resilient about the idea of a human teacher, something that, at its heart, has remained immune to the ebbs and flows of technology. After all, we've had classrooms for almost as long as we've had civilisations. And there are undoubtedly opportunities here too.
Imagine a classroom where each lesson is tailored to your individual pace of learning, where an AI tutor is available around the clock, and where technology can predict where you're likely to get stuck before you do. Well, researchers here at Google DeepMind have been grappling with both the opportunities and challenges of AI in education. They recently published a major paper on developing AI responsibly in this area, and one of its lead authors is my guest on the podcast. Irina Jurenka is a research lead at Google DeepMind. Her background spans experimental psychology and computational neuroscience, and she has spent a decade within these walls asking questions like: how do humans learn? Welcome to the podcast, Irina.
This is a space where people are very heavily invested, right? Does that make it quite a difficult space to navigate? It does, because if you think about it, education has been around for thousands of years, and it is a fundamental structure in our society. Every child is supposed to get educated. So the educational systems have been around for a while; they are quite rigid and very established. So to come in and say, look, we have this amazing technology and we're going to revolutionize everything and change everything, I think it's not going to work so easily. And we've seen this happen with technologies of the past. Intelligent tutoring systems, for example, have existed for 50-plus years.
A lot of investment and research has gone into them, but you could argue that the promise of that technology hasn't fully materialized. Or more recently, we had MOOCs, these massive open online courses. And again, there was so much excitement about how we wouldn't need traditional education anymore; you could just go online and learn anything you would ever want to learn. And once again, when you actually look at who uses these systems, it's people who have already gone through traditional education, and typically it's people trying to get their second master's. So it's definitely not the thing that came and broke the system. And I guess maybe we shouldn't be trying to break the system. There is a lot of amazing stuff happening in traditional education. It's not just about
taking the knowledge from the teacher and kind of distilling or drip-feeding it into the student. It's about the social aspects of talking to your peers and learning together. It's about the teachers giving you skills like how to be a global citizen, how to think critically, how to evaluate information. So there is so much more to educational systems than just the knowledge that they convey. So in our team, we're thinking about the new technology in terms of how it can work within the current system and how it can add to it. So this isn't like starting with a brand new blank sheet of paper and saying, design an education system from scratch. It's like augmenting the one that exists.
Yeah, so actually Justin Reich, a researcher at MIT, has a really nice quote. He says that new technology doesn't break educational systems; educational systems kind of tame new technology. Which is what happened with MOOCs. Exactly, exactly, yes. And, as I said, we're also seeing that there are these human aspects of teacher-student interactions that we can't possibly ever change with technology. For example, if you think about a student and a tutor, there are some social rules in place where a student is very unlikely to just stand up and walk away from a human tutor. But if you are interacting with an AI tutor, you can just close the window and that's it. So there are certain challenges that come with bringing
technology in, and there are certain things that human-to-human interactions have that technology will never replace. So this is why we're trying to work within the system to begin with. How disruptive do you expect it will be to education, then? Because on the one hand, there has been quite a lot of disruption already, particularly recently with large language models. When I spoke to Demis, he was talking about overestimating the impact of something in the short term and then underestimating how big the longer-term impact will be. Where do you think education fits in with this? I feel like there is so much buzz about generative AI right now in education. I feel like everyone actually expects it to completely change everything immediately.
And there have been so many different attempts that have sprung up around taking a language model and turning it into a tutor or a homework helper or anything else that kind of helps students, and honestly, so far nothing has really made the impact that I think everyone was expecting. So that's why, from our perspective, we are at the center of actually improving this technology. We have unprecedented access to Gemini, we can influence how things change, and in fact one of our goals is to make Gemini the best large language model for education. So what is the ambition here? Is it to build a sort of universal AI tutor? It is, but we also wanted to power different experiences. So the very first place where we deployed our AI tutor was YouTube.
So on learning videos, there is now a new function where, if you're watching a learning video and you don't quite understand something, you can virtually raise your hand, and an AI tutor will pop up and you can ask it all your learning questions. And then more recently, we also launched a Gemini Gem called Learning Coach. It's basically optimized to be your guide through learning experiences. So here you can come with any question and say, oh, I want to learn about photosynthesis, or tell me about the American Civil War. And it will give you a plan, it will try to understand what you know and don't know, and it will try to guide you through the materials.
So what we're hoping to do is really push the research to make these base models as good as possible for education and then figure out how to actually make the best use of them. And in fact, we hope that the community can help us with that, so that it's not just us dictating what an AI tutor should be like. It's us listening to people who have been in the space for much longer than us, trying to help them make the most of the technology and make the technology the best it can be for them.
How far do you think the technology can go, though? Can you paint me an image of what, in a very optimistic scenario, you would like the future to look like? I think a lot of people talk about kind of AI-first schooling. I think there is even a school in the UK that just switched to mostly having AI-based education, and they only have a few teachers on hand to kind of help around. And I just don't think that that future is something we should be striving for. We really don't want to replace human teachers. We want to give a tool that enhances this kind of in-person classroom experience between teachers and students.
I think it's a little bit sad if students come to school and just sit around looking at screens all day. So the way we are thinking about it is that there are still teachers as mentors, as role models for the students, and there is a lot of peer interaction during learning. But there is this AI system that works with teachers and learners and helps them make the best of the situation. So maybe for each learner, the AI tutor can help them move at their own pace and really target their interests. And at the same time, the teacher gets a view of where everyone is, and they can still steer the student.
So they still have control and they can still bring their own personality and teaching style to the lessons. Because I think this connection between teachers and students is so important. And looking back on my own education, what stands out to me is these amazing teachers who made me excited about a certain subject. So I think what technology should be trying to do is create more interactions and memories like that, and maybe remove the less ideal situations where the teacher and the student don't click, or the teacher is so overworked that they don't have time to spend with a particular student who actually needs them the most. I imagine that there'll be some people watching who don't necessarily know about your background.
So can you tell us a little bit about what your path was to get to thinking about AI in education? I mean, how far back should I start? Day one. A brief history. Yeah. So I've always been fascinated by intelligence, any kind of intelligence, human or artificial. I started coding quite early on in life. It was just a lucky coincidence that my brother and I got
a comic book as children, and it was basically an introduction to programming. So we started writing small games around the age of probably 11 or 12. And I remember at some point during the summer, my brother and I were bored, and we discovered that you can actually get access to the source code of one of those shooter games, and you could actually code up your opponents. So we thought, wow, this is exciting, we can actually create AI. I remember putting in my diary that this summer we're going to solve AI. Of course, that didn't happen. The ambition of you! Oh yes. Amazing. Surprisingly though, my brother went on to study computer science, but I was growing up in a kind of traditional society where somehow it just didn't click for me that computer science the degree, and making games and playing around on computers with my brother, were the same thing. To me, computer science was something kind of dry and more about the hardware, and I really did not enjoy that. So I ended up studying psychology as my degree, and I was kind of wondering, okay, how can I move towards AI and still study intelligence? Because I was fascinated: how does the brain do it? How does this incredible behavior and intelligence and reasoning, how does it all arise? And then I was very lucky that by the time I finished my PhD, I heard about DeepMind and how you can actually do neuroscience research and answer these deep fundamental questions with deep learning. It was kind of this perfect job for me.
So I started off in the neuroscience team. And as I mentioned, this idea of intelligence and reasoning has always been at the back of my mind, because reasoning is kind of what makes us intelligent. So I started to work on improving reasoning in language models. And very early on, even before language models became this big thing, I realized that they were quite bad at reasoning. But what I also realized is that humans don't really use reasoning that much. If you think about it, in our daily lives we don't actually think through a lot of our actions; we're almost acting on autopilot. So to really study reasoning, we needed a domain where reasoning was important.
And that's where education became a thing again, because this is where humans discover how to reason well. There's something so interesting in that then, that the motivation is in some ways trying to teach AI to be better at reasoning, and in the process, understand what it means to teach reasoning. I mean, that's kind of quite a nice way around to look at it. Yeah. And also it's interesting how doing something and teaching somebody else how to do it are not the same. And this is basically the challenge we are now solving. So the base Gemini is slowly improving at reasoning and math and coding and all of those basic skills.
But then our job is to actually stop the model from using these skills and giving away the answer, from just doing the job for the student, and instead to hold back and think about what are the right questions to ask the student so that they can figure it out by themselves. And that's very hard. Models are fine-tuned to be helpful, so the initial reaction is, I'll just give you the answer. So we have to do a lot of work to stop them from doing that. But then actually, I think you've really hit the nail on the head there, that being able to do something is not the same as being able to teach it.
And I'm really struck, in maths education, which is the space that I know most about, by how there is this push and pull from different sectors about what is required of students and the best possible way to instil those skills and that knowledge. If you're building a sort of AI which will have this universal appeal, how do you find the balance of making sure that you're hitting all of the notes that are required from all of the different areas? It is a good question.
So when we first started building the tutor, we thought, you know, we can talk to teachers and maybe other academics in the field, as well as learners, and figure out, okay, what is the perfect way to teach? So there is a sort of a best way? Yeah, you kind of assume that in everything there is this optimal strategy; maybe this is the kind of question a scientist is going to ask. But we did that. We went and interviewed a lot of stakeholders. And what we realized is that there is a lot of disagreement. And actually, once we started deploying our early tutor models on different Google surfaces, like YouTube or the Gemini app, we found that even there, there were different requirements.
So let's say on YouTube, the educational video is the main act, so the tutor is really there to support that, and maybe the tutor should be giving away answers much more, because that's actually helpful for the learner on that surface. At the same time, if you talk to a teacher at school, they have very different requirements. They really don't want the tutor to give away answers, definitely not to the exam questions. And they will also want the tutor to follow particular exam board requirements or the particular teaching style of that teacher. So how do you actually incorporate all of those diverse voices into a single tutor? What we've realized is that we need to build kind of a base pedagogical model that you can steer with different instructions.
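[Editorial aside: as a rough illustration of what "steering a base pedagogical model with instructions" could look like in practice, here is a minimal, hypothetical Python sketch. The `call_tutor_model` stub and the instruction strings are invented for illustration; they are not the actual Gemini or LearnLM API.]

```python
# Hypothetical sketch: one base tutor "pedagogy" steered by per-teacher
# instructions. The model call is a stub; in practice it would go to an
# LLM endpoint of your choice.

BASE_PEDAGOGY = (
    "You are a patient tutor. Ask guiding questions, check understanding, "
    "and avoid simply giving away final answers."
)

def build_system_prompt(teacher_instructions: str, lesson_context: str) -> str:
    """Compose the shared base pedagogy with a teacher's own steering text."""
    return "\n\n".join([
        BASE_PEDAGOGY,
        f"Teacher instructions: {teacher_instructions}",
        f"Lesson context: {lesson_context}",
    ])

def call_tutor_model(system_prompt: str, student_message: str) -> str:
    """Placeholder for a real model call (e.g. a Gemini or other LLM API)."""
    return f"[tutor reply to '{student_message}' under a {len(system_prompt)}-char prompt]"

# Two teachers steering the same base model in different directions.
fun_prompt = build_system_prompt(
    "Today is exploration day: answer any question and keep it playful.",
    "Photosynthesis, Year 8",
)
exam_prompt = build_system_prompt(
    "Exam practice only: never reveal final answers, guide step by step.",
    "Quadratic equations, GCSE",
)

print(call_tutor_model(fun_prompt, "Why are leaves green?"))
print(call_tutor_model(exam_prompt, "What's the answer to question 3?"))
```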
So one teacher can come and say, actually, I want my students to just have fun today, so answer any question they have and really push on some fun experiences. And another teacher might be much more academic and say, no, today we're doing exam practice problems, so just guide the students through these topics and make sure that they understand everything. I guess one of the big things about education, I mean, as you said, is that there isn't this optimal approach to teaching, but there are these kind of imperfect measures, really. You know, we sort of know good teaching when we see it, but it feels quite difficult to quantify.
So how do you decide what counts as good pedagogy when you're navigating in this space? So first, you might say, well, there's learning science, right? So why don't you just look at the papers and they'll give you the answer. And yes, there is a lot of literature, but there is no consensus as such. But another thing is that pedagogy is very context-dependent. So what works for a novice learner might not work for an expert learner, or what works for a subject that's more procedural, like math, where you actually learn the procedure of how to solve a problem, might not work for a more memory-based subject like history. So when you start thinking about, okay, there are these hundreds of different pedagogical strategies that have been studied,
all of them working slightly differently in different contexts, suddenly you have this massive space where maybe there isn't one single point that's the best pedagogy, but many different regions that are the best pedagogy in a given context.
But the problem is, how do you even quantify this space? And then how do you search it for this perfect pedagogical strategy? It becomes kind of similar to the work that DeepMind has done before, like playing the game of Go. The reason why it was such a huge challenge for AI was that the search space was huge, all the possible moves you can make, and there isn't one known strategy. The AI has to search the space of possible moves and strategies and discover what it thinks is the best one. And what we found with the AlphaGo work was that, first of all, AI was much better than humans.
Compared to basically all of humanity playing the game of Go for thousands of years, AI was actually able to search the space and discover better strategies in a matter of days or months. So our hope is that we can do something similar with education. But going back to this question of how we actually know what success looks like: in Go, you can still measure who has won, and it's pretty unambiguous. Whereas in education, the holy grail is whether the students' learning outcomes have become better. But this is not something you can measure quickly. You kind of need months, if not years, to really track
the learner, and that's not really feasible. So a lot of our work is actually: okay, we know what we're aiming for, but how can we approximate it in a way that's easier and faster to measure? So we published a report recently, which is like 70 pages of basically our trial and error, different attempts at measuring pedagogy. It goes from working with real students at Arizona State University, maybe measuring at the longer timescales of a couple of months, to asking pedagogical raters and teachers to look through a few examples of conversations between students and our AI tutor and give us quicker feedback on the order of weeks or days, to automatic measures where we actually ask AI to evaluate AI
and give us much more targeted, much more limited, but still useful feedback in a matter of hours. But I guess if you really want to evaluate what good teaching is, you want to do that full randomized controlled trial where you're monitoring people over a period of time. How far away do you think we are from being able to run those? Well, these are being run right now. So Arizona State University is one example where we're actually running these.
I think the problem with these is, even if you take your students and you split them into students who have access to the AI tutor and students who don't, what we find is that in the group that theoretically has access to the tutor, only a small percentage actually engage with it. And that creates a problem, because why are some students engaging and others not? Is there something inherently different about these students? And then, if we
only see success in those who engage, is it because of the tutor, or is it because these learners were inherently more motivated and hence would have done better anyway? And then the question is, who are we helping, and what effect does it have at the larger scale? If you think about the top students and the bottom students, and you're helping the top students do better but not actually helping the bottom students, you're actually increasing the gap. But I think everyone, when they go into edtech, actually wants to decrease the gap. So how do we do that? How do we make sure that everyone engages? That's another big question that we're working on.
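[Editorial aside: the "AI evaluating AI" measure Irina mentioned a moment ago might look, in spirit, something like this minimal hypothetical sketch, where a judge model scores a tutoring transcript against a small pedagogy rubric. The rubric items and the `call_judge_model` stub are invented for illustration, not taken from the published report.]

```python
# Hypothetical sketch of "AI evaluating AI": a judge model scores a
# tutor-student transcript against a small pedagogy rubric.
# call_judge_model is a stub standing in for a real LLM call.

RUBRIC = [
    "Did the tutor avoid giving away the final answer prematurely?",
    "Did the tutor ask guiding questions to check understanding?",
    "Did the tutor respond supportively to signs of frustration?",
]

def call_judge_model(prompt: str) -> int:
    """Placeholder for a real judge-model call returning a 1-5 score."""
    return 3  # stubbed score for the sketch

def score_transcript(transcript: str) -> dict:
    """Score one transcript on each rubric item and report the average."""
    scores = {}
    for item in RUBRIC:
        prompt = (
            "You are rating a tutoring conversation.\n"
            f"Criterion: {item}\n"
            f"Transcript:\n{transcript}\n"
            "Reply with a single score from 1 (poor) to 5 (excellent)."
        )
        scores[item] = call_judge_model(prompt)
    scores["average"] = sum(scores.values()) / len(RUBRIC)
    return scores

example = "Student: I'm stuck on question 3.\nTutor: What have you tried so far?"
print(score_transcript(example))
```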
I mean, there are just imperfect measures everywhere you look, aren't there? It's very, very difficult to get a real ground truth in any of this. But then I suppose there are further complications here because, okay, that's sort of about teaching styles. But presumably there are some subjects where there's more of a ground truth than others. I'm thinking, for example, if you created a tutor for history, it would change depending on which country you were in as to what might be the most relevant answers to a particular question. Yes, this is a big issue for us. We've thought a lot about what you do in this situation, because you can't give this one true answer to any history question.
This is, again, why we're thinking about steerability, so that teachers in different countries can give background information to the tutor, so it knows what the expected way of answering certain questions is. But also, historical topics often bring up questions that are really important to discuss, but are also hard to discuss and very sensitive. I'm thinking of things like the Holocaust. So again, how should the tutor behave in these situations? I think the standard approach to safety is often effectively declining to engage in a difficult conversation, but that's not something a tutor can do. I mean, part of the point of education is to think about difficult things. Exactly. So I can't say that we've solved this problem.
We are trying to give different views and trying to give the learners a chance to critically evaluate different ideas in the space. I'm also really trying to bring metacognition to this problem. Metacognition is an interesting one. I think it often gets overlooked, but a lot of people don't actually know how to learn. It's often not as much fun as you would expect. It requires you to plan ahead, to really engage with the materials. And yeah, most people don't really know how to do that. So what a tutor can do is actually teach the learner: okay, if you're trying to answer this difficult question, maybe what you should do is go and look up different primary sources. And then
think about what they are telling you and what you think about it, and what other experts think about this. It's about teaching the learner how to go about answering these questions rather than necessarily giving the answers directly. There are layers to this. And I guess on one layer you have knowledge and facts, which, I guess, maths is quite full of. And then above that, you've got the skills of critically evaluating. And then above that, metacognition, which is how to develop the skills to evaluate the knowledge. Exactly. So you think that's the answer to this safety question of approaching difficult problems? So, not necessarily the answer to safety.
It's more of an answer to how to engage with subjects where, as you said, there isn't necessarily a single ground truth. In terms of safety, I think it's a slightly different question. Sometimes people ask us, why are you working on safety at all? Aren't you using base models which have already gone through a lot of safety fine-tuning and safety work? And the answer to that is, even though they have done all of this background work, when it comes to the educational use case specifically, you have to think about how these systems will be used. So, one thing we found:
Our tutors are deployed to Arizona State University students, in particular through their Study Hall program, which is aimed at bringing more diverse learners into higher education. So essentially anyone watching ASU videos on YouTube can get invited to take part in this course, where it's the same lectures but with more faculty support, and essentially an opportunity to earn credit and then transfer to become an actual student at Arizona State University. But what it means is that these learners are typically already in full-time work, or they have family commitments; they're quite short on time and stressed. And so when they're learning, naturally, sometimes they are just in a bad state and there's no one around; maybe they're studying at 11pm and they just need to vent.
And the only thing that they can vent to is this AI tutor that's sitting in front of them on the screen. So we find these kinds of emotional outbursts, like: I am so stressed. I'm really struggling here. Will I ever be able to solve this problem? Maybe I should just quit. And, you know, the tutor can't ignore these messages. It can't just say, sorry, I can't answer this. It really needs to engage and say something that connects with the user in this very vulnerable state. So what our tutors are trained to do, and we see transcripts like this coming in, is say something like, you know, it's fine to feel this way, everyone feels this way.
We can get through this together, there are resources that can help you, and things like that. I know that you've written that an AI tutor should be careful about sensitive self-disclosure, particularly in that sort of setting, I guess. What did you mean by that? So, when people speak to each other, what often happens is that one of the conversation partners will say something personal, maybe mention a personal fact, and that encourages the other person to also open up and share something about themselves. And through this they build trust and a kind of connection that helps the conversation move forward. And when a learner mentions something so personal about how stressed they are, naturally they would almost expect the tutor to share back.
But then, of course, the tutor doesn't have a stressful situation from their past that they can share. Anything they self-disclose like that would effectively be a lie. So there's this very thin line that we have to walk, where the tutor needs to maintain the connection and make sure that they support the learner, but at the same time not mislead them and not create a connection which shouldn't exist between a human and an AI. So at no point can it pretend to be another human, but it needs to understand how to empathise with a human student. Exactly. But then, okay, I sort of wonder, there's something really interesting there about the correct amount of anthropomorphisation.
Are there some advantages to students knowing that it's an AI, knowing that there isn't a human at the other end? Are students more comfortable making mistakes in front of the AI, for instance? Yes, for sure. So something we've heard from students is that they feel much more comfortable asking what they might perceive as a silly question to AI tutors, just because they don't feel judged, as you kind of do when there is a human on the other side. Also, when you're in a class, you could ask a question, but then there's also peer judgment. In this one-on-one setting with an AI tutor, you can basically say anything and it's going to be fine. So we find that the learners really appreciate that.
But then what about trust? Do you find that people end up believing the AI more than they would a human tutor? Sometimes we do. So we had this very interesting situation where, in the very first stages of developing the AI tutor, we wanted to test out how it compares to human teachers. So we connected with paid raters who were told, look, you have this opportunity to learn different subjects, you will get connected to a tutor. And we didn't tell them whether it was an AI or a human. And just, you know, have fun, enjoy the learning experience. And after that,
they were given a questionnaire, and in this questionnaire we asked things like, you know, how much do you think you've learned, how much did you enjoy the experience. And this was the very first version of our tutor, which we knew was quite bad, and we found, surprisingly, that the learners reported having learned more with an AI tutor than with a human. So that seemed strange. So we decided to look through the transcripts to understand what was going on there. And we found that the AI tutor hallucinated all sorts of interesting, surprising facts, so that, of course, to a learner, pretty much everything the tutor says sounds like: I did not expect that, this is a fun fact I just learned today.
So, of course, they were very impressed and felt like they had learned more. But in fact, this is not something that the tutor should be doing, and it's definitely something that we worked to address in future iterations. Is that a concern going forwards? I mean, the idea of hallucinations and people mistaking those for real knowledge? It is definitely a concern. The base technology is getting better at factuality, and also, with education, we're teaching some material that is known, so there's always some sort of grounding.
Our tutors avoid some of these issues of factuality by being able to say, you know, I'm only teaching you about this particular YouTube video or this particular piece of text that your teacher has provided, and referring facts back to that primary source. So it gives the tutor less opportunity to actually make things up. I just wanted to think also about the effect that large language models have had on education
So what kind of safeguards can you put in? This technology is so pervasive. And we actually, we talked to students about their use of Gen AI and Even we were surprised by how much they used it. So literally they were saying that their screen is kind of their lecture, their notes, and then Gen AI at the bottom. I think the technology is here to stay and it will be used by the learners. What can be done is trying to encourage learners to kind of critically evaluate the responses, trying to maybe change how we evaluate, like what are the assignments, so that it actually works with the technology. Because if you think about it, education is preparing us for the real world.
And in the real world, I think the expectation will be more and more to actually work with this technology, because it does help in many ways and does make us more productive. So it doesn't make sense to ban it during education and then expect learners to know how to use it properly in their work. So maybe one way to think about it, and I think that's what we've heard from teachers, is how to change assignments and the ways of teaching and working so that generative AI is encouraged as a partner, but the evaluation is done slightly differently. So it's kind of like calculators in the past, where you're allowed to use calculators in certain math exams, but you're still expected to know how to do the calculations without help.
I do wonder, in the longer term, as we start to see generative AI being the assistant at all times, whether we can end up building a bit of a dependency on it. I mean, do students end up with a feeling that they have mastery when they don't, when actually it's the AI that's doing the work? So I think there are two potential issues here that you've identified. One is this feeling of mastery when there isn't one. And this is a very common factor in any kind of learning, even if you're talking about traditional education. For example, one of the things that students do a lot in preparation for an exam is just reread their notes or reread the textbook.
And that kind of creates a feeling that they have mastered the material, just because they are so familiar with it. But when they go into an exam, they actually find that they can't remember the facts or use the information well. So this is one of the things that is very well known to be a bad educational strategy, just rereading. And we find the same with AI tutors, where if we ask a learner how they thought the conversation went and how much they think they've learned, they can report really good satisfaction.
Whereas if we give the same conversation to a teacher and then ask them the same questions, how pedagogical was the tutor, how well do you think that session went, they could rate it very, very differently. I guess the other factor is this question of dependency. We definitely find that if learners use generative AI a lot during their studies, they feel like it's really helping in the
process, and actually studies show that it does increase their success in exercises and their marks. But when it comes to an exam, the learner's performance actually drops. And that's because during their studies they get so dependent on the AI providing them with the answers, or, even if it guides them, if it doesn't actually teach them the right things, they're just outsourcing their reasoning to the AI. That can be a problem in exam conditions, where you don't have access to it anymore and you don't actually remember or know how to reason through these problems on your own. I guess because the best exams aren't just testing knowledge, they're also testing skill. Exactly.
From all of your research, then, what are the big conclusions that you draw about how to create an effective AI tutor? Do you reckon you've solved it? No, nowhere near. And I would say we've just made the very first step, and that step is realizing how hard of a problem this is. I think when we first started doing this work, we were naive and kind of wide-eyed, thinking that we would come in and solve it within a year. But now I think we have a better idea of the scope of the problem, and of what the main things are to address to start making meaningful progress. And these are things like: how do we know success, how do we measure pedagogy, where do we get the data, how do we actually train these models? And also, how do we engage the communities better, and who are we building for, so that we are not accidentally increasing the gaps in education but making meaningful steps towards decreasing them.
So I think there is a very long road ahead of us, and actually we think that we really need to bring the whole community together to work on this problem. So we are trying to create common benchmarks that we can all climb together. That was really nice. Really nice. Thank you for joining me, Irina. I was really struck in that conversation with Irina by the notable shift in the sorts of problems that are being considered in this building.
We've gone from dealing with definites, right? Like winning or losing at chess or Go, or recognising cat or no cat in images, to education, a space with no absolutes, only imperfect measures in every direction: in what counts as good teaching, in what counts as an effective learning experience, in how to get the balance between knowledge and skills and learning how to learn, or how to walk the line between how much a tutor should prompt and how much it should withhold, even how human a tutor should be. None of those questions have ground truths. And that is what makes this challenge so incredibly difficult, but also one, as beautifully demonstrated by Irina there, which requires humility and collaboration to solve. You've been listening to Google DeepMind: The Podcast with me, Professor Hannah Fry.
If you enjoyed that episode, do subscribe to our YouTube channel. You can also find us on your favourite podcast platform. And we have got plenty more episodes on a whole range of topics to come, so do check those out too.