Stepping into the Future: How StepFun AI Desktop Agent Aims to Outpace Claude Cowork
Claude Cowork has put locally run AI agent products in the spotlight. StepFun AI, launched in September 2025, is already pushing further: it is building a desktop-first agent with deeper integration, proactive assistance, and a bold vision for the future of human-AI collaboration.

🔍 A New Era of Desktop Agents
With the rise of Claude Cowork, attention has turned sharply toward AI agents operating directly on users’ devices. However, StepFun AI (from StepFun Stars) entered this space months earlier with its StepFun AI Desktop Companion, now available for both Mac and Windows in free preview.
Unlike cloud-centric models, StepFun’s approach emphasizes local execution, file handling, and task automation, positioning it not just as an assistant but as a proactive collaborator embedded in daily workflows.
🔗 Download now: https://www.stepfun.com/download
🔄 Part 1: Beyond Claude Cowork — Vision, Strategy & Divergence
Shared Goals, Different Paths
In an exclusive interview with Zhong Jingwei, Product Lead at StepFun AI, we explored how their agent compares to emerging competitors like Claude Cowork:
“We’re moving one step further in on-device agent exploration. Features like Global Memory and the floating window interface allow deeper context awareness and smoother interaction. While others refine agent precision, we’re expanding the boundaries of what’s possible on-device.”
Both aim for end-to-end task automation, but differ in strategy:
| Feature | StepFun AI | Claude Cowork |
|---|---|---|
| Execution Environment | Local-first (desktop app) | Cloud-assisted |
| User Interaction | Floating panel + proactive prompts | Chat-focused |
| Context Handling | Global memory across sessions | Context-limited per session |
| Installation Barrier | Higher (requires download) | Lower (web-based) |
Despite these differences, both recognize a shared trajectory: hybrid edge-cloud collaboration. For now, given cloud cost and latency, starting local offers richer access to context, even if the download requirement means lower initial adoption.
The Core Challenge: Agent Capability & User Adoption
Two universal hurdles remain:
- Agent Performance: Making agents faster, more reliable, and cheaper.
- User Awareness: Most people don’t know when or how to use agents.
To tackle performance, StepFun introduced “Miaoji” (妙计, literally “clever plan”): reusable workflows akin to Anthropic’s Skills, but with broader customization. These encapsulate scripts, templates, and validation rules, turning one-time actions into repeatable tools.
“Imagine finishing a perfect Excel analysis once, then saving that entire process as a single-click tool. That’s Miaoji. It’s not just automation—it’s institutional knowledge made executable.”
The team also plans autonomous Miaoji creation: if the agent detects that you have successfully completed a complex task twice, it may proactively suggest saving that task as a Miaoji.
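To make the idea concrete, here is a minimal sketch of a reusable workflow that bundles a script with a validation rule, plus a counter that flags a task once it has succeeded twice. The names `Workflow` and `SuggestionTracker` are hypothetical and chosen for illustration; nothing here reflects StepFun’s actual implementation or API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workflow:
    """Hypothetical stand-in for a saved Miaoji: a script plus a validation rule."""
    name: str
    script: Callable[[list[float]], float]   # the repeatable action
    validate: Callable[[float], bool]        # rule the result must pass

    def run(self, data: list[float]) -> float:
        result = self.script(data)
        if not self.validate(result):
            raise ValueError(f"{self.name}: result {result!r} failed validation")
        return result


@dataclass
class SuggestionTracker:
    """Counts successful completions per task (assumed bookkeeping, not StepFun's)."""
    threshold: int = 2                       # "completed twice", as described above
    successes: Counter = field(default_factory=Counter)

    def record_success(self, task_key: str) -> bool:
        """Return True only on the completion that first reaches the threshold."""
        self.successes[task_key] += 1
        return self.successes[task_key] == self.threshold


average = Workflow(
    name="quarterly-average",
    script=lambda values: sum(values) / len(values),
    validate=lambda result: result >= 0,
)
tracker = SuggestionTracker()
first = tracker.record_success("quarterly-average")   # one success: no suggestion yet
second = tracker.record_success("quarterly-average")  # second success: suggest saving
```

Returning `True` only at the threshold keeps the suggestion from repeating on every later run, which is one plausible way to avoid nagging the user.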
On user adoption, real-world testing revealed a surprising gap: even professionals rarely realize AI can help them. Three teachers interviewed used the same product in wildly different ways—lesson planning, grade analysis, admin work—and only discovered new uses after sharing tips.
This highlights a key insight: education through demonstration is critical.
💡 Part 2: Defining the Product — An “Exploratory” AI Layer
StepFun AI isn’t trying to replace apps. Instead, they envision a new layer atop existing systems:
“Think of it as an AI Processing Layer sitting above Web, Apps, and databases. Not replicating old flows—but creating new outcomes and new states.”
🆕 Three Types of “New Results”
- New Information: e.g., Deep Research that synthesizes web data beyond surface-level summaries.
- New Media: Like auto-generating PPTs from videos (à la NotebookLM).
- New Interfaces: Custom UIs built dynamically—such as reformatting leaked PDF emails into Gmail-like views.
🔄 Two Kinds of “New States”
Actions that change your relationship with the world:
- Auto-filling forms
- Sending messages on your behalf
- Scheduling meetings proactively
These split into two core experiences:
| Experience | Description | Example |
|---|---|---|
| Task Execution | Command-driven (like J.A.R.V.I.S.) | “Summarize all my Q3 sales reports.” |
| Browsing Operation | Immersive, dynamic UI (like Iron Man’s desk) | Real-time dashboard generation |
For now, StepFun focuses on task execution, believing browsing enhancements aren’t yet compelling enough.
🎯 Part 3: Why Start with Desktop & Office?
Their choice of platform wasn’t random. Here’s why desktop wins—for now.
🖥️ Device Choice: Depth Over Reach
Desktops offer:
- Deeper OS integration
- Secure access to local files and browsers
- Richer context via screen reading and activity tracking
While mobile and browser extensions have potential, desktop provides the most fertile ground for early experimentation.
Even automotive systems are promising—high data availability and voice-friendly environments make car-based agents highly effective.
📊 Use Case Focus: Solve Painful, Complex Tasks
Why office work?
“If an agent fails 2 out of 10 times ordering coffee, users abandon it. But in offices, tasks are long, tedious, and manual. Even 60–70% success feels valuable.”
So StepFun targets two high-friction scenarios:
- File Processing – Data cleaning, batch renaming, report merging
- Bulk Information Extraction – Scraping social media, updating spreadsheets automatically
One parent even created a self-updating spaced-repetition vocabulary quiz, where correct answers trigger automatic deletion from a child’s mistake list.
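The deletion rule in that quiz can be sketched in a few lines. This is an illustrative guess at the logic, not the parent’s actual setup: the `run_quiz` function and the dictionary layout are assumptions, and the sketch covers only the “correct answer removes the word” step, not the spaced-repetition scheduling.

```python
def run_quiz(mistakes: dict[str, str], answers: dict[str, str]) -> dict[str, str]:
    """Return the mistake list with correctly answered words removed.

    `mistakes` maps each word to its expected answer; `answers` holds the
    child's responses for this session. Both structures are assumptions.
    """
    remaining = {}
    for word, expected in mistakes.items():
        given = answers.get(word, "")
        # Compare case-insensitively and ignore surrounding whitespace.
        if given.strip().lower() != expected.strip().lower():
            remaining[word] = expected  # wrong or unanswered: keep for next time
    return remaining


mistake_list = {"ubiquitous": "everywhere", "candid": "honest", "frugal": "thrifty"}
session_answers = {"ubiquitous": "Everywhere", "candid": "shy"}
remaining = run_quiz(mistake_list, session_answers)
```

In this run, “ubiquitous” is answered correctly and drops off the list, while “candid” (wrong) and “frugal” (unanswered) stay for the next session.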

⚙️ Part 4: Miaoji — Lowering the Barrier to Power
“Miaoji” isn’t just a feature—it’s a bridge between novice users and advanced automation.
What Miaoji Enables:
- ✅ Simplify Repeated Actions
- ✅ Lower Prompt Engineering Barriers
- ✅ Preserve Scripts as Assets
- ✅ Pave the Way to Autonomous Learning
Currently, usage falls short of expectations—not because of functionality, but discoverability.
Many users miss the “/” trigger hint in the input box. Others don’t understand what “Miaoji” means without guidance.
Future updates will include:
- Stronger visual cues
- Pre-loaded templates (e.g., McKinsey-style reports)
- In-app tutorials
- Community sharing incentives
“We want users to ask: Can StepFun try this too? That’s the mindset shift we’re after.”
🤖 Part 5: Proactive Service — The Next Frontier
True intelligence isn’t reactive—it’s anticipatory.
The Vision: From Assistant to Anticipator
“Ideally, before our interview today, the agent would’ve downloaded the product for you and said: ‘You’re discussing agents—want to try one?’ Or prepared a competitor analysis on Manus, Genspark… automatically.”
That future requires deep contextual understanding and trust.
Today, StepFun explores proactivity in two ways:
- Curated Scenarios
  - Auto-create to-dos from screen content
  - Daily recap suggestions
  - Recommend relevant Miaoji based on behavior
- User-Defined Rules
  - “Every night at 9 PM, summarize my Obsidian notes”
  - “When I’m distracted, remind me to focus”
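
A time-based rule such as “every night at 9 PM” ultimately comes down to computing when the rule should next fire. The sketch below shows one simple way to do that with Python’s standard library; the `next_run` helper is hypothetical and says nothing about how StepFun schedules rules internally.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, at: time) -> datetime:
    """Return the next moment a daily rule should fire, strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already arrived or passed, so roll over to tomorrow.
        candidate += timedelta(days=1)
    return candidate


NINE_PM = time(hour=21)
before = next_run(datetime(2026, 1, 5, 20, 30), NINE_PM)  # 8:30 PM: fires tonight
after = next_run(datetime(2026, 1, 5, 21, 0), NINE_PM)    # 9:00 PM sharp: fires tomorrow
```

A behavior-based rule like “when I’m distracted” would need a signal from screen or activity context rather than a clock, which is harder to sketch without knowing what the product observes.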
Privacy remains paramount—especially with screen monitoring. All processing happens locally, ensuring security while building smarter models.
As Zhong notes:
“Proactivity helps cross the chasm. Early adopters find clever uses; we package those insights so mainstream users benefit seamlessly.”
📁 Part 6: Real-World Usage — What Are Users Actually Doing?
Data shows clear patterns in usage:
| Use Case | Share |
|---|---|
| File Processing | 40% |
| Info Gathering | 30% |
| Quick Q&A / Misc | 30% |
Creative File Use Cases:
- HR Automation: Merge multiple attendance sheets across departments
- Research Aid: Download arXiv papers and rename them intelligently based on content
- Invoice Management: Auto-classify and organize financial documents
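
As an illustration of the first use case, merging attendance sheets is the kind of script an agent might generate and run locally. The sketch below assumes each department exports a CSV with the same columns; the column names and the `merge_sheets` helper are invented for the example.

```python
import csv
import io


def merge_sheets(sheets: dict[str, str]) -> list[dict[str, str]]:
    """Merge per-department CSV text into one list of rows.

    Each row gains a `department` column so its origin is preserved.
    """
    merged = []
    for department, csv_text in sheets.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            merged.append({"department": department, **row})
    return merged


# Inline CSV text stands in for files on disk so the example is self-contained.
sheets = {
    "Engineering": "name,days_present\nAda,21\nLin,19\n",
    "Sales": "name,days_present\nSam,20\n",
}
rows = merge_sheets(sheets)
```

With real files, the same function would work by reading each file’s text first; tagging rows with their department avoids losing track of where a record came from after the merge.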
One standout example? A parent using scheduled quizzes from a curated error log—turning static files into evolving learning tools.
🧠 Part 7: Model Matters — But So Does Everything Else
Is “Model = Product” Still True?
“Strong models still matter. They bring inherent credibility and virality. When a smarter model emerges, people test it immediately. There’s natural momentum.”
Yet Chinese agentic models still lag behind global leaders. Breakthroughs like Gemini 3 continue unlocking new capabilities.
But StepFun believes the future lies in integration:
- Personal data access
- Workflow embedding
- Experience design
- Feedback loops for model training
“A brilliant model trapped in poor UX underperforms a good model in a well-designed system.”
Recent updates reflect this:
- Enhanced Global Memory for cross-session awareness
- Internal demos of proactive query suggestions
- Plans to publish curated Miaoji examples
And yes—the product feeds back into model development:
- Popular Miaoji become benchmarks for model improvement
- Real user tasks generate high-quality synthetic training data
- User feedback refines environmental signals
🌐 Part 8: The Future of Interaction — Conversation First
When asked about agent marketplaces like Macaron or MuleRun, Zhong emphasized simplicity:
“The lowest-cost way to solve a task is conversation. Just say what you want. The agent finds or creates the right Miaoji. No app store, no installation, no learning curve.”
Zhong also praised innovations such as:
- Hero AI: Context-aware input fields that suggest structured inputs mid-typing
- Sky.app (by OpenAI): Elegant floating window UX
- MineContext (ByteDance): ADHD-focused context management
But Zhong cautioned that, unless there’s a true unmet need, standalone tools struggle with discoverability and retention.
“It’s like browser extensions. Great in theory—but who remembers to install them?”
Instead, StepFun bets on conversational invocation as the dominant paradigm.
🏁 Final Thoughts: Rethinking Human-Machine Collaboration
StepFun AI Desktop Companion represents more than a productivity boost. It’s a prototype for a future where:
- AI doesn’t wait for commands—it anticipates needs
- Tools aren’t downloaded—they’re created on demand
- Workflows aren’t linear—they evolve autonomously
While challenges remain in model strength, privacy, and user education, StepFun is betting that starting local while thinking big is the right path forward.
And if their vision holds, one day we might all work like top tech executives—with armies of silent agents doing the heavy lifting, leaving us only to decide.
Article adapted from Founder Park. Interview conducted November 2025, supplemented January 2026.