John Schulman, co-founder of OpenAI and lead architect of ChatGPT, developed two reinforcement learning algorithms that shaped how ChatGPT was trained: Trust Region Policy Optimization (TRPO) and its successor, Proximal Policy Optimization (PPO). Both came out of his work in deep reinforcement learning. By combining neural networks trained on large amounts of data with machines that learn through trial and error, he helped usher in today’s AI revolution.
But before all that, John was working towards his PhD in Neuroscience at UC Berkeley. Let’s delve a little deeper into how he got started.
Academic Beginnings
John’s initial plan was to study physics at the California Institute of Technology and then get a PhD in neuroscience at Berkeley. He remembers choosing Berkeley because he had a “good feeling” about it, and because he liked the professors he talked to during visit day.
One of his lab rotations under the neuroscience program happened to be with Pieter Abbeel, director of the Berkeley Robot Learning Lab and co-director of the Berkeley Artificial Intelligence Research lab.
John already knew of (and was interested in) Abbeel’s work, citing helicopter control and towel-folding robots as the projects that specifically caught his eye. But when he started actually working in Abbeel’s lab, his interest quickly transformed into excitement. He found himself spending all his time there working on surgical and personal robotics.
It wasn’t long before he requested a transfer to Berkeley’s EECS (Electrical Engineering and Computer Sciences) department.
OpenAI Was a Sidequest
Interestingly enough, John co-founded OpenAI before he had finished his PhD in Computer Science.
After he’d done a few projects in EECS, John ran into a major concern. He realized that the methods available at the time weren’t sophisticated or robust enough for real-world applications. Even a single, specific demo took an enormous amount of engineering, so building a usable product on those methods simply wasn’t realistic.
But rather than accept it as one of those “it is what it is” scenarios, John decided to tackle the problem head-on. He says (or, rather, writes) it himself in a guide he created for the OpenAI Fellows Program back in December 2017:
“The keys to success are working on the right problems, making continual progress on them, and achieving continual personal growth.”
He wasn’t about to back down.
He noted that, around that time, a lot of people were getting good results with deep learning. People in the field started analyzing what these results meant for AI, and John was one of them. He investigated what deep learning could do for robotics, and the answer he arrived at was reinforcement learning.
He hypothesized that training complex neural networks on large amounts of data could be combined with machines learning through trial and error. This approach, known as “deep reinforcement learning,” could be the key to making robotics practical for real-world use.
With this new goal in mind, he joined OpenAI in 2015 so he could focus on AI research. He thought the company’s mission was ambitious but, given that he already had an interest in AI, he wasn’t too skeptical. He figured that if there was any place where it would be acceptable to talk about AI and AGI (Artificial General Intelligence), it would be this company.
In a recent interview with his old mentor, Pieter Abbeel, John recognizes that he was in the right place at the right time. AI was still a largely untapped technology, but the resources and approaches were steadily catching up. He wanted to research deep reinforcement learning even further for his PhD, and there was an enterprising new company determined to engineer AGI, or AI that could match or exceed human intelligence.
All the pieces were perfectly in place–John just had to put in the work.
John’s Contributions
John plays a crucial, ongoing role in this AI-powered era of tech and innovation. Aside from being a research scientist, co-founder, and lead architect, he has also contributed to the following projects:
- OpenAI Gym
- OpenAI Baselines
- Stable Baselines
- TrajOpt
- Computation Graph Toolkit
- Procgen Benchmark
In 2018, John was named one of MIT Technology Review’s 35 Innovators Under 35. That recognition joined his two other awards: the C.V. Ramamoorthy Distinguished Research Award and the Best Vision Paper award at ICRA 2013.
And In His Downtime…
When he isn’t revolutionizing machine learning as we know it, John says he’s sometimes “a lazy person.” He still struggles to be productive and get things done.
Aside from tinkering with deep reinforcement learning, John likes to go rock climbing and running. He’ll wind down by going for a jog around the neighborhood or playing the piano. He also travels abroad for vacation whenever he can.
And when it gets to be too much, John has some chickens in his backyard that he enjoys taking care of.