

How RL Changed My Taste in AI Systems
I used to treat reinforcement learning as a mysterious corner of machine learning where agents somehow “figure it out” through trial and error. The more I read, the more I realized that the mystery comes from a single twist: the feedback is delayed, noisy, and often sparse. Once you accept that, RL stops being magic and starts being a very specific kind of optimization problem that punishes sloppy assumptions. What follows is the learning path that actually worked for me.
5 hours ago · 6 min read
Reinforcement learning vs “regular” training: the real difference is not the math, it is the loop
Most ML people grow up on a simple mental model: you have a dataset, you define a loss, you run gradient descent, you ship a checkpoint. That covers supervised learning and a lot of self-supervised pretraining. The model is learning from a fixed distribution of examples, and the training pipeline is basically a linear flow from data to gradients. Reinforcement learning (RL) breaks that mental model because the model is not only learning from data, it is also actively creating…
Jan 26 · 7 min read
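The excerpt above hinges on a structural contrast: in supervised learning the dataset is fixed before training starts, while in RL the agent's own actions generate the data it later learns from. A minimal toy sketch of that difference (all names hypothetical, not taken from the article):

```python
# Toy illustration of the loop difference described above (hypothetical example).
import random

def supervised_step(dataset, params):
    """Supervised learning: data is fixed; training only consumes it."""
    x, y = random.choice(dataset)          # sample from a fixed distribution
    grad = 2 * (params * x - y) * x        # gradient of a toy squared-error loss
    return params - 0.01 * grad            # one gradient-descent step

def rl_episode(env_step, policy_params):
    """RL: the agent's actions produce the very data it will learn from."""
    state, trajectory = 0.0, []
    for _ in range(10):
        action = policy_params * state + random.gauss(0, 0.1)  # policy acts
        state, reward = env_step(state, action)                # world responds
        trajectory.append((state, action, reward))             # newly created data
    return trajectory  # this distribution shifts whenever the policy changes
```

The point of the sketch: `supervised_step` can be called forever against the same `dataset`, but every call to `rl_episode` yields a trajectory whose distribution depends on `policy_params`, which is exactly the feedback loop the excerpt calls out.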


GPT-OSS Safeguard as Policy-Executable Safety, and the Cabinet Briefing Risk Scanner Built on Top of It
Abstract: This article presents a systems-focused account of how GPT-OSS Safeguard can be used as a policy-executable safety component and how that capability can be operationalized into a real workflow for high-stakes government communications. The case study is a Cabinet Briefing Risk Scanner, an AI tool that reviews draft communications prior to distribution by applying an explicit written risk policy, treating the analyzed text as untrusted, and emitting strict structured…
Jan 3 · 14 min read
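The scanner's contract as described in the excerpt (explicit written policy, untrusted input, strict structured output) can be sketched generically. Everything here is hypothetical: `scan_draft`, `classify`, and the policy text are placeholders, not GPT-OSS Safeguard's actual API.

```python
# Hypothetical sketch of the scanner contract described above; the real system
# would back classify() with a model rather than a keyword heuristic.
RISK_POLICY = "Flag any draft that names an unannounced decision."  # illustrative

def scan_draft(draft_text: str, policy: str = RISK_POLICY) -> dict:
    # Treat the draft as untrusted: it is data to be analyzed,
    # never instructions to be followed, so it rides in a plain field.
    request = {"policy": policy, "untrusted_text": draft_text}
    verdict = classify(request)  # placeholder for the model call
    # Enforce a strict output schema before anything downstream trusts it.
    assert set(verdict) == {"risk", "rationale"}
    assert verdict["risk"] in {"low", "high"}
    return verdict

def classify(request: dict) -> dict:
    # Stand-in classifier: trivially flags a keyword from the policy.
    risky = "unannounced" in request["untrusted_text"].lower()
    return {"risk": "high" if risky else "low",
            "rationale": "keyword heuristic (placeholder)"}
```

The design choice worth noting is that schema validation happens in `scan_draft`, outside the model stand-in, so a malformed or manipulated verdict fails loudly instead of flowing into the distribution workflow.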
2025: The Year I Bet on Myself
On December 30th, 2024, I finished my last day at IBM. It was the kind of ending that looks simple from the outside, but internally it carried years of thought and a lot of quiet pressure. I wasn’t leaving because I hated the work, and I wasn’t leaving because something broke. I was leaving because I could feel myself outgrowing the comfort of a structured path. IBM gave me discipline, exposure, and a solid environment to sharpen my skills, but I kept feeling a stronger pull…
Jan 1, 2026 · 7 min read


Research Imperatives and the Struggle for Algorithmic Dominance
Rethinking the Question of AI Supremacy: In the world of artificial intelligence, the pace of advancement has become dizzying. Breakthroughs that seemed like science fiction only a few years ago are now real. At the center of this whirlwind are three pioneering labs – Google DeepMind, OpenAI, and Anthropic – engaged in a high-stakes race for influence, innovation, and the future of intelligence. A common question asked is, “Which AI company is the best?” However, framing…
Dec 14, 2025 · 39 min read
From Scaling To Research: Reflections On The Ilya Sutskever Conversation With Dwarkesh
There is a moment in the recent Dwarkesh Podcast episode with Ilya Sutskever that captures a turning point in how the AI community understands its own progress. Sutskever, one of the central figures behind modern deep learning and now the founder of Safe Superintelligence Inc., looks back at the last few years and says, in effect: the era when simply scaling models was the main engine of progress is ending. It is time to return to the age of research, only this time with very…
Dec 2, 2025 · 10 min read




