1. Build for Future Models (Source: Kevin Weil, CPO of OpenAI)
The Exponential Improvement Mindset:
> "The AI model you're using today is the worst AI model you will ever use for the rest of your life. What computers can do changes every two months."
Core Principle:
- Don't design around current model limitations
- Build assuming capabilities will 10x in 2 months
- Edge cases today = core use cases tomorrow
- Make room for the model to get smarter
How to Apply:
```
DON'T:
- "AI can't do X, so we won't support it"
- Build fallbacks that limit model capabilities
- Design UI that assumes current limitations
DO:
- Build interfaces that scale with model improvements
- Design for the capability you want, not current reality
- Test with future models in mind
- Make it easy to swap/upgrade models
```
Example:
```
Feature: "AI code review"
❌ Current-Model Thinking:
- "Models can't catch logic bugs, only style"
- Limit to linting and formatting
- Don't even try complex reasoning

✅ Future-Model Thinking:
- Design for full logic review capability
- Start with style, but UI supports deeper analysis
- As models improve, feature gets better automatically
- Progressive: Basic → Advanced → Expert review
```
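One way to apply "make it easy to swap/upgrade models" is to keep feature code behind a thin, provider-agnostic interface. A minimal sketch, with all names hypothetical (this is not any specific provider's API):
```typescript
// Hypothetical sketch: feature code depends only on a narrow model
// interface, so upgrading to a newer model is a config change, not a rewrite.
interface ModelClient {
  complete(prompt: string): Promise<string>;
}

// One adapter per provider/model; all satisfy the same contract.
class StubModelClient implements ModelClient {
  constructor(private modelName: string) {}
  async complete(prompt: string): Promise<string> {
    // A real adapter would call the provider SDK here.
    return `[${this.modelName}] response to: ${prompt}`;
  }
}

// Feature code never names a specific model.
async function reviewCode(diff: string, model: ModelClient): Promise<string> {
  return model.complete(`Review this diff for bugs, not just style:\n${diff}`);
}
```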
---
2. Evals as Product Specs (Source: Kevin Weil, OpenAI)
Test Cases = Product Requirements:
> "At OpenAI, evals are the product spec. If you can define what good looks like in test cases, you've defined the product."
The Approach:
Traditional PM:
```markdown
Requirement: "Search should return relevant results"
```
AI-Native PM:
```javascript
// Eval as Product Spec
const searchEvals = [
  {
    query: "best PM frameworks",
    expectedResults: ["RICE", "LNO", "Jobs-to-be-Done"],
    quality: "all3InTop5",
  },
  {
    query: "how to prioritize features",
    expectedResults: ["Shreyas Doshi", "Marty Cagan"],
    quality: "relevantInTop3",
  },
  {
    query: "shiip prodcut", // typo
    correctAs: "ship product",
    quality: "handleTypos",
  },
];
```
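Evals like these only act as a spec if they actually run. A minimal sketch of a runner, assuming a `search` function that returns ranked result titles (the `EvalCase` shape and the scoring rule here are deliberately simplified):
```typescript
// Hypothetical runner for eval cases shaped like searchEvals above.
type EvalCase = { query: string; expectedResults?: string[]; quality: string };

async function runEvals(
  cases: EvalCase[],
  search: (q: string) => Promise<string[]> // stand-in for the real system
): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const top5 = (await search(c.query)).slice(0, 5);
    // Simplified scoring: every expected result must appear in the top 5.
    const ok = (c.expectedResults ?? []).every((want) =>
      top5.some((got) => got.includes(want))
    );
    if (ok) passed += 1;
  }
  console.log(`Eval pass rate: ${passed}/${cases.length}`);
  return passed / cases.length;
}
```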
How to Write Evals:
```
- Define Success Cases:
  - Input: [specific user query/action]
  - Expected: [what good output looks like]
  - Quality bar: [how to measure success]

- Define Failure Cases:
  - Input: [edge case, adversarial, error]
  - Expected: [graceful handling]
  - Quality bar: [minimum acceptable]

- Make Evals Runnable:
  - Automated tests
  - Run on every model change
  - Track quality over time
```
Example:
```typescript
// Product Requirement as Eval
// getRecommendations, getTestUsers, and measureClickRate are the system
// under test; toIncludePopularItems, toMatchInterests, and toHaveDiversity
// are custom Jest matchers defined elsewhere.
describe("AI Recommendations", () => {
  test("cold start: new user gets popular items", async () => {
    const newUser = { signupDate: new Date(), interactions: [] };
    const recs = await getRecommendations(newUser);
    expect(recs).toIncludePopularItems();
    expect(recs.length).toBeGreaterThan(5);
  });

  test("personalized: returning user gets relevant items", async () => {
    const user = { interests: ["PM", "AI", "startups"] };
    const recs = await getRecommendations(user);
    expect(recs).toMatchInterests(user.interests);
    expect(recs).toHaveDiversity(); // Not all same topic
  });

  test("quality bar: recommendations >70% click rate", async () => {
    const users = await getTestUsers(100);
    const clickRate = await measureClickRate(users);
    expect(clickRate).toBeGreaterThan(0.7);
  });
});
```
---
3. Hybrid Approaches (Source: Kevin Weil)
AI + Traditional Code:
> "Don't make everything AI. Use AI where it shines, traditional code where it's reliable."
When to Use AI:
- Pattern matching, recognition
- Natural language understanding
- Creative generation
- Ambiguous inputs
- Improving over time
When to Use Traditional Code:
- Deterministic logic
- Math, calculations
- Data validation
- Access control
- Critical paths
Hybrid Patterns:
Pattern 1: AI for Intent, Code for Execution
```javascript
// Hybrid: AI understands, code executes
async function processUserQuery(query) {
  // AI: Understand intent
  const intent = await ai.classify(query, {
    types: ["search", "create", "update", "delete"],
  });

  // Traditional: Execute deterministically
  switch (intent.type) {
    case "search": return search(intent.params);
    case "create": return create(intent.params);
    // ... reliable code paths
  }
}
```
Pattern 2: AI with Rule-Based Fallbacks
```javascript
// Hybrid: AI primary, rules backup
async function moderateContent(content) {
  // Fast rules-based check first
  if (containsProfanity(content)) return "reject";
  if (content.length > 10000) return "reject";

  // AI for nuanced cases
  const aiModeration = await ai.moderate(content);

  // Hybrid decision
  if (aiModeration.confidence > 0.9) {
    return aiModeration.decision;
  } else {
    return "human_review"; // Uncertain → human
  }
}
```
Pattern 3: AI + Ranking/Filtering
```javascript
// Hybrid: AI generates, code filters
async function generateRecommendations(user) {
  // AI: Generate candidates
  const candidates = await ai.recommend(user, { count: 50 });

  // Code: Apply business rules
  const filtered = candidates
    .filter((item) => item.inStock)
    .filter((item) => item.price <= user.budget)
    .filter((item) => !user.previouslyPurchased(item));

  // Code: Apply ranking logic
  return filtered
    .sort((a, b) => scoringFunction(a, b))
    .slice(0, 10);
}
```
---
4. AI UX Patterns
Streaming:
```javascript
// Show results as they arrive
for await (const chunk of ai.stream(prompt)) {
  updateUI(chunk); // Immediate feedback
}
```
Progressive Disclosure:
```
[AI working...] → [Preview...] → [Full results]
```
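One way to implement this is to model the stages as explicit UI states that upgrade as output arrives; a minimal sketch (the state and stage names are illustrative):
```typescript
// Hypothetical sketch: each disclosure stage is an explicit UI state.
type AiViewState =
  | { stage: "working" }                   // [AI working...]
  | { stage: "preview"; partial: string }  // [Preview...]
  | { stage: "complete"; result: string }; // [Full results]

function render(state: AiViewState): string {
  switch (state.stage) {
    case "working":  return "AI working...";
    case "preview":  return `Preview: ${state.partial}`;
    case "complete": return state.result;
  }
}
```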
Retry and Refinement:
```
User: "Find PM articles"
AI: [shows results]
User: "More about prioritization"
AI: [refines results]
```
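Refinement typically works by carrying the earlier turns into the next call, so the follow-up is interpreted in context. A minimal sketch, assuming a `search` function that accepts the whole thread:
```typescript
// Hypothetical sketch: refinement = re-query with the full thread.
type Turn = { role: "user" | "assistant"; content: string };

async function refine(
  history: Turn[],
  followUp: string,
  search: (turns: Turn[]) => Promise<string[]> // stand-in for the AI call
): Promise<string[]> {
  // The model sees every prior turn, so "More about prioritization"
  // is resolved against the earlier "Find PM articles" request.
  const turns: Turn[] = [...history, { role: "user", content: followUp }];
  return search(turns);
}
```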
Confidence Indicators:
```javascript
if (result.confidence > 0.9) {
  show(result); // High confidence
} else if (result.confidence > 0.5) {
  show(result, { disclaimer: "AI-generated, verify" });
} else {
  show("I'm not confident. Try rephrasing?");
}
```
Cost-Aware Patterns:
```javascript
// Progressive cost
if (simpleQuery) {
return await smallModel(query); // Fast, cheap
} else {
return await largeModel(query); // Slow, expensive
}
```
---