Agentic Coding: Context, Memory, Workflows, Skills, Subagents
Last updated on April 24, 2026
Agentic coding uses AI assistants as active participants in development, under strict human oversight. Many teams treat coding agents like magic autocomplete. They’re not. Treat them as supervised assistants that work under explicit rules.
Why it matters:
- Speed – faster delivery by offloading boilerplate and routine tasks.
- Safety & Control – strict governance over what the agent can access and modify, with all changes reviewed before deployment.
- Predictability – standardized workflows produce consistent results.
This document is a distillation of the most effective principles and workflows our team has adopted for software development. These practices are the product of extensive experimentation and are actively used to enhance our engineering efficiency.
We maintain an open-source B2B SaaS starter template with a production-ready agentic coding context setup: github.com/softcery/saas-template.
The Problem That Started Everything
The main problem was clear: the agent has no context about the project. It doesn’t know our architectural decisions, our coding rules, or the reasons why we made certain technical choices. Every time we start a session, the agent is like a new software engineer who has never seen our codebase.
That’s why context files are the solution: we needed to teach the agent how we work, document our decisions, and make sure it understood our constraints before it wrote a single line of code.
After building a proper context system, the same agent that went in circles now ships production code daily. It finally retains what we’ve taught it.
Seven Steps to Actually Productive AI Coding
After months of iteration, here’s the system that actually ships production code. We’ll use a real example: a voice agent integration that took one day instead of three.
Step 1: Build Your Agent’s Brain (System Prompt)
We primarily use Claude Code and Cursor, but the concepts of context files, working memory, and structured workflows are universal: they apply equally to Codex CLI, Aider, Devin, Windsurf (part of Cognition since July 2025), and other agentic tools.
Before writing any prompts, create a CLAUDE.md at your project root. For Cursor, the modern format is .cursor/rules/*.mdc: individual Markdown files with frontmatter, one per scope (core, framework, architecture, testing, security). The legacy single-file .cursorrules still works but is deprecated and is silently ignored in Agent mode, so any project using Cursor’s agent should migrate. These files become the agent’s persistent knowledge about your project: CLAUDE.md is loaded at the start of every session, and each Cursor rule is attached according to its frontmatter (always, by file glob, when the agent judges it relevant, or manually).
Start with the essentials:
## **CRITICAL: Follow these personality guidelines strictly before responding:**
1. Exercise Quiet Confidence: Trust your abilities without needing to prove them. State what you know simply. Acknowledge uncertainty directly and explore options together.
2. ...
3. ...
4. ...
## **Further reading:**
- Foundation document that shapes this project as a product: @.ai/project-brief.md
- Backend architecture: api/.ai-knowledge/backend-architecture.md
- Frontend architecture: web/.ai-knowledge/frontend-architecture.md
- Backend tech stack: api/.ai-knowledge/backend-tech-stack.md
- Frontend tech stack: web/.ai-knowledge/frontend-tech-stack.md
- Engineering instructions: @.ai/engineering.md
## **Memory**
Follow the memory instructions in @.ai/memory.md
## **Workflows (Tools)**
- Code review workflow for backend: api/.ai-knowledge/backend-code-review-workflow.md
- Code review workflow for frontend: web/.ai-knowledge/frontend-code-review-workflow.md
  - Always follow the matching code review workflow when the user provides a git diff or other context and asks for a review
- Task preparation workflow in @.ai/tools/task-preparation-workflow.md
  - Don't follow this workflow unless the user explicitly asks for it
- Task execution workflow in @.ai/tools/implementation-workflow.md
  - This workflow runs after task-preparation-workflow.md, so at this stage the task folder already contains trd.md and implementation-strategy.md
  - Always follow this file when the user asks for it
This modular approach keeps the main file clean while providing comprehensive context. The agent now understands your constraints before writing a single line.
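Note the two kinds of reference in the file above. In Claude Code, an @path in CLAUDE.md is an import: the file is pulled into context at the start of every session. A bare path is only a pointer the agent may choose to read. The helper below is a hypothetical audit script, not part of any tool, that separates the two so you can see which files you pay for on every session; its regex assumes references are relative paths ending in .md.

```typescript
// Hypothetical audit helper (not part of Claude Code or Cursor).
// Splits the file references in a CLAUDE.md body into "@" imports, which are
// loaded into context automatically, and bare paths the agent reads on demand.
interface ContextRefs {
  imports: string[];
  mentions: string[];
}

function classifyRefs(claudeMd: string): ContextRefs {
  const refs: ContextRefs = { imports: [], mentions: [] };
  // Optional "@", then a relative path ending in ".md",
  // e.g. "@.ai/memory.md" or "api/.ai-knowledge/backend-architecture.md".
  const pattern = /(@?)((?:[\w.-]+\/)*[\w.-]+\.md)\b/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(claudeMd)) !== null) {
    if (match[1] === "@") {
      refs.imports.push(match[2]);
    } else {
      refs.mentions.push(match[2]);
    }
  }
  return refs;
}
```

Run over the example above, the @-prefixed brief, engineering, memory, and workflow files come back as imports, and the architecture, tech-stack, and code-review files as mentions. Anything long that lands in imports is a candidate for moving behind a bare path.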
Step 2: Create Working Memory
Agents keep a history of conversations, but they don’t track the state of the project between sessions. Start a new chat, and the agent won’t know that you refactored the authentication module yesterday, how you store JWTs, or where you’re stuck in a complex implementation.
We solve this by instructing the agent to log its sessions and update core documentation. The agent follows a strict protocol for documenting its work, so architectural changes, new dependencies, and key outcomes are captured permanently. This turns a series of stateless conversations into a continuously evolving knowledge base.
Create .ai/memory.md that tracks what actually matters:
# Working Memory
1. Document sessions in @.ai/sessions.
2. Use the following file format: "{yyyy-mm-dd}-{title}.md".
3. Use the following memory log file format: title, description, session log, session outcomes, lessons learned (if any).
4. Keep memories concise, only document what is worth documenting.
5. Document only when the user asks you to.
6. Also update @.ai/architecture.md and @.ai/tech-stack.md when, for example, we change the project architecture or add a new dependency.
The instructions for this behavior are defined in a dedicated file (@.ai/memory.md), which is referenced in the main CLAUDE.md or rules file.
Now the agent builds on previous work instead of starting fresh each time.
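The file-naming and section rules in .ai/memory.md are mechanical enough to express in code. A minimal sketch with hypothetical function names, assuming the {title} part is slugified into kebab-case (the rules only say {title}) and the date is taken in UTC:

```typescript
// Hypothetical helpers following the .ai/memory.md conventions:
// one file per session in .ai/sessions, named "{yyyy-mm-dd}-{title}.md".
interface SessionLog {
  title: string;
  description: string;
  log: string[];
  outcomes: string[];
  lessons?: string[]; // "lessons learned (if any)"
}

function sessionFileName(date: Date, title: string): string {
  const day = date.toISOString().slice(0, 10); // yyyy-mm-dd, UTC
  // Assumption: kebab-case slug so the file name has no spaces.
  const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return `${day}-${slug}.md`;
}

function renderSession(s: SessionLog): string {
  const bullets = (items: string[]) => items.map((item) => `- ${item}`).join("\n");
  const parts = [
    `# ${s.title}`,
    s.description,
    `## Session log\n${bullets(s.log)}`,
    `## Session outcomes\n${bullets(s.outcomes)}`,
  ];
  // Rule 4 (keep memories concise): omit the section when there is nothing to say.
  if (s.lessons && s.lessons.length > 0) {
    parts.push(`## Lessons learned\n${bullets(s.lessons)}`);
  }
  return parts.join("\n\n") + "\n";
}
```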
Step 3: Define Reusable Workflows
Workflows are step-by-step instructions that tell the agent how to handle routine tasks. Instead of explaining how to review code or implement features repeatedly, we created reusable patterns the agent follows autonomously.
Create .ai/tools/task-preparation-workflow.md:
When given a new feature, the agent will:
- Create a knowledge folder for the task (e.g., knowledge/billing).
- Add trd.md (Task Requirement Document) with:
  - Original requirements
  - Acceptance criteria
  - Dependencies
- Review existing code if specified.
- Create implementation-strategy.md with:
  - Database changes / API modifications / UI components / Testing approach
- Create progress.md for tracking.
- Await the next instruction.
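The scaffolding part of this workflow is deterministic, so it can be sketched as a pure function. This is a hypothetical helper, not something the workflow requires: it returns each file the agent creates as a path mapped to its initial Markdown, with section headings taken from the list above and "TODO" placeholders as our assumption.

```typescript
// Hypothetical scaffold for the task-preparation workflow.
// Returns path -> initial Markdown; writing the files to disk is left to the caller.
function scaffoldTask(feature: string, requirements: string): Record<string, string> {
  const dir = `knowledge/${feature}`;
  const section = (heading: string, body = "TODO") => `## ${heading}\n${body}`;
  return {
    [`${dir}/trd.md`]: [
      `# ${feature}: Task Requirement Document`,
      section("Original requirements", requirements),
      section("Acceptance criteria"),
      section("Dependencies"),
    ].join("\n\n"),
    [`${dir}/implementation-strategy.md`]: [
      `# ${feature}: Implementation strategy`,
      section("Database changes"),
      section("API modifications"),
      section("UI components"),
      section("Testing approach"),
    ].join("\n\n"),
    // Left empty here; the implementation workflow turns the strategy into a checklist.
    [`${dir}/progress.md`]: `# ${feature}: Progress\n`,
  };
}
```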
The critical part is the implementation strategy, the implementation-strategy.md file. Having the agent write this detailed plan upfront saves thousands of tokens in later sessions: instead of re-analyzing the entire codebase to work out what needs to be done, the agent reads its own strategy document and continues working.
Then create .ai/tools/implementation-workflow.md. When asked to implement a feature, the agent will:
- Read knowledge/{feature}/implementation-strategy.md
- Check for or create progress.md
  - If new, break the strategy into a checklist of subtasks
  - If existing, identify the next incomplete task
- Implement the next task
- Update progress.md
- Repeat until complete
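The loop above can be sketched as two small helpers. This assumes progress.md is a Markdown checklist ("- [ ] task" / "- [x] task"); the workflow itself does not fix a format, and the function names are hypothetical.

```typescript
// Hypothetical helpers for the implementation workflow.
// Assumes progress.md uses Markdown checkboxes: "- [ ] task" (open) and "- [x] task" (done).
const CHECKBOX = /^(\s*[-*] \[)([ xX])(\] )(.*)$/;

// Returns the first unchecked subtask, or null when the feature is complete.
function nextIncompleteTask(progressMd: string): string | null {
  for (const line of progressMd.split("\n")) {
    const m = CHECKBOX.exec(line);
    if (m && m[2] === " ") return m[4].trim();
  }
  return null;
}

// Ticks off an unchecked subtask by its exact text, leaving other lines untouched.
function markDone(progressMd: string, task: string): string {
  return progressMd
    .split("\n")
    .map((line) => {
      const m = CHECKBOX.exec(line);
      return m && m[2] === " " && m[4].trim() === task ? `${m[1]}x${m[3]}${m[4]}` : line;
    })
    .join("\n");
}
```

"Repeat until complete" then becomes: implement nextIncompleteTask(progress), write back markDone(progress, task), and stop when the result is null.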
In Claude Code, you have three places to put reusable workflows:
- .claude/commands/*.md: explicit slash commands. Type /task-preparation voice-agent-feature to invoke.
- .claude/skills/<skill-name>/SKILL.md: the recommended modern format. A skill’s body loads only when invoked, so long reference material costs almost nothing until needed. Each skill automatically gets a /slash-command interface and can be invoked by Claude on its own when its description matches the task.
- .claude/agents/*.md: subagents that run tasks in their own isolated context window. Use them for parallel work or to keep heavy investigation out of your main session; only the summary returns to your conversation.
/task-preparation voice-agent-feature
/implementation voice-agent-feature
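In Anthropic’s Agent Skills format, each skill is a SKILL.md file whose YAML frontmatter carries a name and a description; the description is what Claude matches against the task when deciding to load the skill. The helper below is a hypothetical generator for such a file. The name rule (lowercase letters, digits, hyphens, up to 64 characters) and the 1,024-character description limit reflect our reading of the Agent Skills documentation, so check them against the current docs.

```typescript
// Hypothetical generator for a Claude Code skill file (.claude/skills/<name>/SKILL.md).
// Limits below are our reading of the Agent Skills docs; verify before relying on them.
function renderSkill(name: string, description: string, body: string): string {
  if (!/^[a-z0-9-]{1,64}$/.test(name)) {
    throw new Error(`invalid skill name: ${name}`);
  }
  if (description.length === 0 || description.length > 1024) {
    throw new Error("description must be 1-1024 characters");
  }
  // The description should say what the skill does AND when to use it, since it drives
  // auto-invocation. A description containing ": " would need YAML quoting (not handled here).
  return `---\nname: ${name}\ndescription: ${description}\n---\n\n${body.trim()}\n`;
}
```

Keeping the body short and pointing it at the existing workflow file (for example, "Follow @.ai/tools/task-preparation-workflow.md") avoids maintaining the same instructions in two places.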
In Cursor, invoke by @-mentioning the rule or workflow file:
@.ai/tools/task-preparation-workflow.md implement voice agent
This two-phase approach — plan then execute — catches problems early and keeps work organized. With subagents, you can also run plan generation in one isolated context and execution in your main session, which keeps the planning artifacts out of the implementation context window.
Step 4: Connect to Live Systems (MCP)
The Model Context Protocol (MCP) — open-sourced by Anthropic in late 2024 and now standard across Claude, Cursor, Codex, Devin, Windsurf, and most other agentic tools — is the standardized communication layer that bridges the gap between an isolated AI agent and live development tools. MCP transforms the agent from a passive text generator into an active partner capable of performing real actions in our environment.
We run a set of specialized MCP servers, each granting the agent a specific, controlled capability. Here are our primary use cases:
Playwright MCP (and the new Playwright CLI): Microsoft’s official Playwright MCP lets the agent drive a real browser through structured accessibility snapshots — DOM analysis, network inspection, console log review, form-filling, and end-to-end test generation. In early 2026, Microsoft released @playwright/cli as a token-cheaper alternative: a typical browser automation task costs ~114k tokens through MCP vs ~27k via CLI (~4× reduction). Microsoft now recommends the CLI for coding agents on cost-sensitive workloads. Replaces the older Browser MCP for most web-debugging use cases.
Anthropic Computer Use & Claude in Chrome: For tasks the agent can’t accomplish via Playwright accessibility (visual-heavy debugging, drag interactions), Claude’s computer-use mode and the Claude in Chrome extension let the model interact with arbitrary GUIs via screenshot + click/type. Slower and more expensive than Playwright, but the universal fallback.
Figma-Context-MCP: Allows agents to translate design into code. Provide a link to a Figma frame, and the agent retrieves layout information, component properties, and styling — then generates corresponding code with high fidelity to the original design.
Postgres-MCP: A dedicated MCP server exposes a secure interface to PostgreSQL development databases. The agent can run queries, analyze schemas, and suggest optimizations — for example, identifying missing indexes or rewriting inefficient queries based on an execution plan it requests through the server. Do NOT grant access to production databases.
Other servers we run: GitHub MCP (issue / PR / repo operations), Slack MCP (notifications and triage), Linear MCP (ticket lifecycle), and one custom MCP server per long-running internal service. Anthropic’s Channels feature (research preview) extends MCP so servers can push messages into a Claude session without being polled.
Important: Always be mindful of the data you transmit to the LLM and be conservative with the permissions you grant for command execution. Use the Hooks system (Step 7) for deterministic guardrails on top of model-level prompts.
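The Postgres-MCP rule above (development databases only, never production) can be enforced in code rather than by convention. A minimal sketch, assuming you start the MCP server from your own launcher script and keep an explicit allowlist of development hosts; the function names and host names are hypothetical.

```typescript
// Hypothetical pre-flight check, run by your own launcher before starting a Postgres MCP server.
// Extracts the host from a postgres:// or postgresql:// URL. Credentials containing "@" or "/"
// must be percent-encoded; IPv6 literals in brackets are not handled by this sketch.
function dbHost(connectionString: string): string | null {
  const m = /^postgres(?:ql)?:\/\/(?:[^@\/]*@)?([^:\/?]+)/.exec(connectionString);
  return m ? m[1].toLowerCase() : null;
}

// Throws unless the connection string points at an explicitly allowed development host.
function assertDevDatabase(connectionString: string, allowedHosts: string[]): void {
  const host = dbHost(connectionString);
  if (host === null || allowedHosts.indexOf(host) === -1) {
    throw new Error(`refusing to expose ${host ?? "an unparseable host"} to the agent`);
  }
}
```

An allowlist fails closed: a new production replica is blocked by default, whereas a denylist of production hosts would let it through until someone remembers to add it.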
Step 5: Write Prompts That Get Results
Specificity beats brevity every time. The agent can infer intent, but it can’t read minds. The difference between a prompt that works and one that leads to hours of revision is usually just missing details.
What We Learned About Specificity
Every vague instruction costs you iteration time. Watch the difference:
Waste of time: “add tests”
Actually works: “add unit tests for the payment retry logic in processPayment(), especially the exponential backoff when Stripe returns a 429”
Gets you nowhere: “make the API faster”
Gets results: “the /api/users endpoint times out with 1000+ results. Add pagination with 100 items per page, keep the response under 200ms”
Creates a mess: “add user management”
Ships to production: “create CRUD endpoints for user management at /api/admin/users following our existing REST patterns from /api/admin/teams. Include role checking - only admins can modify. Use the same error responses we have everywhere else”
Five Rules for Prompts That Actually Work
1. Give it something to copy. Don’t describe patterns when you can point to them. “Build it like our TeamService class” beats a paragraph of architectural explanation.
2. Set boundaries explicitly. The agent will helpfully “improve” things you didn’t ask it to touch. Always specify: “Only modify files in the user module. Don’t touch the database schema.”
3. Define success specifically. Replace “make it better” with concrete outcomes: “Refactor this part of the code following SOLID principles” or “Use streaming to return messages as they are generated by the LLM.”
4. Link to your source of truth. You wrote documentation for a reason. Use it: “Follow the error handling described in @.ai/standards/errors.md.” This prevents the agent from inventing its own approach.
5. Demand a plan before code. Complex tasks need thinking first. End with: “List your implementation steps and potential breaking changes before starting.”
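If your team writes many similar task prompts, the five rules can be baked into a template so none gets forgotten. A hypothetical sketch; the field names are ours, and the test values come from the user-management prompt earlier in this step.

```typescript
// Hypothetical prompt template that applies the five rules; field names are illustrative.
interface TaskPrompt {
  task: string;        // what to build, stated specifically
  example?: string;    // rule 1: something to copy
  scope: string[];     // rule 2: what the agent may modify
  forbidden: string[]; // rule 2: what it must not touch
  success: string[];   // rule 3: concrete outcomes
  docs: string[];      // rule 4: source-of-truth documents
}

function buildPrompt(p: TaskPrompt): string {
  const lines = [p.task];
  if (p.example) lines.push(`Follow the existing pattern in ${p.example}.`);
  lines.push(`Only modify: ${p.scope.join(", ")}.`);
  if (p.forbidden.length > 0) lines.push(`Do not touch: ${p.forbidden.join(", ")}.`);
  for (const criterion of p.success) lines.push(`Done when: ${criterion}`);
  for (const doc of p.docs) lines.push(`Follow ${doc}.`);
  // Rule 5: always demand a plan before code.
  lines.push("List your implementation steps and potential breaking changes before starting.");
  return lines.join("\n");
}
```

Making scope a required field is deliberate: a prompt cannot be built without stating what the agent may touch.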
Step 6: Build. Test. Improve.
Perfect setups don’t exist, and yours will never be finished: every project is unique, so the key is continuous refinement.
When the agent makes mistakes, update your documentation:
Agent used wrong error format?
## **Error Handling**
- Always use AppError class:
- `throw new AppError('message', 'ERROR_CODE', statusCode)`
- Never use plain Error or console.error
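The rule assumes an AppError class already exists in the codebase; the document does not show it. A minimal TypeScript sketch consistent with the new AppError('message', 'ERROR_CODE', statusCode) call above, where the field names and the JSON error shape are our assumptions:

```typescript
// Hypothetical AppError matching the `new AppError('message', 'ERROR_CODE', statusCode)` call shape.
class AppError extends Error {
  constructor(
    message: string,
    public readonly code: string,       // machine-readable code, e.g. "USER_NOT_FOUND"
    public readonly statusCode: number, // HTTP status to return to the client
  ) {
    super(message);
    this.name = "AppError";
    // Restores the prototype chain so `instanceof AppError` works when compiled to ES5.
    Object.setPrototypeOf(this, AppError.prototype);
  }

  // Called by JSON.stringify, so every endpoint serializes errors the same way.
  toJSON(): { error: { code: string; message: string } } {
    return { error: { code: this.code, message: this.message } };
  }
}
```

Keeping the response shape inside the class means the documented rule ("always use AppError") is enough to keep error responses consistent across endpoints.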
Agent missed a performance issue?
Add to memory.md:
## **2025-01-16: Database Performance**
- Learned: Our users table has 2M rows
- Always use indexes for user_id lookups
- Never use LIKE queries on email field
Agent needs new capability?
Create a new workflow in .ai/tools/.
Every mistake becomes a prevention. Every success becomes a pattern.
Step 7: Security Measures and Hooks
Be careful about what data you transmit to the LLM and restrictive with the permissions you grant. Claude Code also supports Hooks, event-driven scripts that run deterministic code on agent events, which are a much stronger primitive for guardrails than prompts alone.
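For Claude Code (permissions): Use the permissions.deny block in ~/.claude/settings.json (user-wide) or .claude/settings.json (per project) to forbid access to sensitive files. Each rule names a tool and a path pattern, for example Read(./.env):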
{
"permissions": { "deny": ["Read(./.env)", "Read(./secrets/**)", "Read(~/.aws/credentials)"] }
}
For Claude Code (hooks): Hooks run as deterministic shell commands, not model prompts, so they cannot be talked around. Anthropic ships these hook events:
- PreToolUse: runs before every tool call and can block it. Use it to prevent edits to protected paths or rm -rf on the wrong directory.
- PermissionRequest: runs when a permission dialog is shown and can auto-allow or auto-deny. Use it to enforce org-wide allowlists.
- PostToolUse / Stop / SessionStart: for logging, audit trails, and session-bootstrap automation.
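Example PreToolUse hook that blocks file edits outside src/. Register it under hooks.PreToolUse in .claude/settings.json with a matcher such as Edit|Write; Claude Code passes the tool call to the script as JSON on stdin, and exit code 2 blocks the call: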
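#!/usr/bin/env bash
# .claude/hooks/pre-tool-use.sh (requires jq)
# The tool call arrives as JSON on stdin; Edit and Write put the absolute target path in tool_input.file_path.
file=$(jq -r '.tool_input.file_path // empty')
case "$file" in
  "$CLAUDE_PROJECT_DIR"/src/*|"") exit 0 ;;  # inside src/, or a tool call without a file path
  *) echo "blocked: edits outside src/" >&2; exit 2 ;;  # exit 2 blocks the call; stderr is shown to Claude
esac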
For Cursor: Use a .cursorignore file at the project root. It functions like .gitignore, making any listed file or directory invisible to the agent.
Real Results from Real Projects
Faster Onboarding and Code Navigation: Instead of engineers spending days manually tracing logic through an unfamiliar codebase, the agent can instantly locate relevant code sections, explain architectural patterns, and identify dependencies. This has cut the effective onboarding time for new engineers from weeks to days, allowing them to contribute meaningful code much faster.
Rapid Unit Test Generation: The agent creates comprehensive test suites in minutes, covering standard success and failure cases. Edge-case tests that used to take hours to write by hand now take about 5 minutes. This significantly reduces the risk of regressions without slowing down the development cycle.
Automated Documentation Drafting: A well-defined prompt instructs the agent to generate high-level technical documentation for new features and endpoints. What previously took a developer 30 minutes to write is now a 5-minute review and editing process, ensuring the project remains maintainable over time.
Automated Testing and Debugging: Manually testing numerous edge cases for an API endpoint is inefficient. By leveraging the Model Context Protocol (MCP) or standard CLI tools, the agent can execute a predefined test suite against an endpoint in minutes, covering dozens of scenarios and generating a concise report of the results for human review.
Architectural Planning and Brainstorming: When faced with a complex architectural problem, developers use a Socratic dialogue with the agent to brainstorm solutions. This approach helps in quickly exploring trade-offs, evaluating different patterns, and solidifying a design, reducing research time and making individual developers more autonomous in their decision-making.
Accelerated Infrastructure and Deployment: The agent creates initial Terraform or CloudFormation configurations based on high-level requirements. The developer’s role has shifted from writing code from scratch to validating the deployment plan generated by the agent, enabling faster and more reliable infrastructure deployment.
Drastic Reduction in UI Development Time: Building a standard UI component like a modal window, which used to take an hour, is now completed in minutes. Using Figma-Context-MCP or screenshots of a design, the agent generates the corresponding code with high fidelity, allowing developers to focus on complex state management and interaction logic.
Efficient and Thorough Code Reviews: The agent acts as a first-pass reviewer on pull requests. It generates a summary of changes, highlights potential risks, identifies deviations from coding standards, and points out specific code blocks that require careful human attention. This ensures that even large PRs receive a thorough review without becoming a bottleneck or being approved blindly.
Automation of Routine Coding Tasks: Repetitive tasks that previously required searching documentation or copying snippets from Stack Overflow are now delegated to the agent. Simple tasks like adding a new CRUD endpoint, writing a database migration, or setting up boilerplate for a new service are completed almost instantly, freeing up developer time for higher-value work.
Tips and Best Practices
This is a list of practices we have integrated into our daily workflow.
Onboard with the Agent. When learning a new codebase or feature, use the agent as you would a human colleague in a pair programming session. Ask it about architecture, control flow, and purpose.
Delegate Git Operations. Agents can handle many Git operations. We use them for searching commit history and writing detailed commit messages.
Embrace Iteration. Perfect prompts do not exist. Treat prompt engineering as a continuous, iterative process. Always be refining your instructions to better fit your project’s specific needs.
Be Specific. The agent’s success rate improves with precise instructions. Giving clear directions upfront reduces rework.
- Poor: “add tests for billing.py”
- Good: “write a new test case for billing.py, covering the edge case where the user is logged out. avoid mocks”
Provide Visual Context. Use screenshots and images, especially when working from design mocks for UI development or analyzing charts. If you cannot provide an image, explicitly state the importance of visual appeal in the prompt.
Give the Agent URLs. For tools, libraries, or versions released after the model’s knowledge cut-off, include links to official documentation in your prompt to ensure accuracy.
Leverage Terminal Access. Since the agent has terminal access, instruct it to make API requests with curl for debugging or to use other CLI tools as part of its workflow.
Be an Active Collaborator. While autonomous modes exist, you get superior results by guiding the agent’s approach. Either create a detailed technical plan before implementation or course-correct the agent as it works.
Keep Context Focused. During long sessions, the LLM’s context window can fill with irrelevant conversation, file contents, and command output. In Claude Code, use /clear to start fresh between tasks, or /compact to summarize the current session into a smaller seed context. For investigation-heavy tasks, delegate to a subagent so the verbose exploration stays in its own isolated window. In Cursor, start a new chat between tasks. Sonnet 4.6 supports a 1M-token context window with 300k output, but a lean context still beats a bloated one: keep it tight.
What Still Needs Humans
Even with Sonnet 4.6 at 79.6% on SWE-bench Verified and 72.5% on OSWorld computer use, a few categories still belong to humans:
Complex cross-service refactoring: Agents can hold far more context than they could a year ago (1M tokens on Sonnet 4.6, 300k output), but spanning many services with message queues and databases simultaneously still requires human judgment about which invariants must hold across the boundary.
Business-critical architecture decisions: “Should we shard the database?” requires understanding growth projections, cost constraints, and business priorities — none of which sit in the codebase.
Design sense: “Make this dashboard more intuitive” produces functional but soulless interfaces. Models don’t have a feel for visual hierarchy or user psychology.
Production debugging under fire: When everything’s broken, you need pattern recognition and intuition that agents still lack — they’re great at hypothesis testing once you’ve narrowed the search space.
The Reality Check
Agents do not replace software engineers. They remove the low-value work – boilerplate, routine refactors, test scaffolding, code search – but they do not provide product judgment or architectural insight. They will not challenge unclear requirements.
Use them as supervised assistants with a clear task, scope, and definition of done. Keep guardrails in place: least-privilege permissions, required code review, CI checks, and audit logs. Measure impact instead of making claims – track lead time per PR, first-pass CI rate, review time, and defect rate. The result is higher throughput on repeatable work while humans stay accountable for design, trade-offs, and quality.
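Measuring these does not require special tooling. A minimal sketch, assuming you export one record per pull request from your Git host (the PullRequest shape is hypothetical) and approximating lead time as PR opened to merged and review time as opened to first review:

```typescript
// Hypothetical PR record; populate it from your Git host's API or export.
interface PullRequest {
  openedAt: Date;
  firstReviewAt: Date | null;
  mergedAt: Date | null;
  firstCiPassed: boolean; // did the first CI run on this PR pass?
  defectsLinked: number;  // bugs later traced back to this PR
}

interface ImpactMetrics {
  medianLeadTimeHours: number;   // opened -> merged (merged PRs only)
  firstPassCiRate: number;       // share of PRs whose first CI run passed
  medianReviewWaitHours: number; // opened -> first review (reviewed PRs only)
  defectRate: number;            // linked defects per merged PR
}

const HOUR_MS = 60 * 60 * 1000;

function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function impactMetrics(prs: PullRequest[]): ImpactMetrics {
  const merged = prs.filter((pr) => pr.mergedAt !== null);
  const reviewed = prs.filter((pr) => pr.firstReviewAt !== null);
  return {
    medianLeadTimeHours: median(merged.map((pr) => (pr.mergedAt!.getTime() - pr.openedAt.getTime()) / HOUR_MS)),
    firstPassCiRate: prs.length > 0 ? prs.filter((pr) => pr.firstCiPassed).length / prs.length : NaN,
    medianReviewWaitHours: median(reviewed.map((pr) => (pr.firstReviewAt!.getTime() - pr.openedAt.getTime()) / HOUR_MS)),
    defectRate: merged.length > 0 ? merged.reduce((sum, pr) => sum + pr.defectsLinked, 0) / merged.length : NaN,
  };
}
```

Medians rather than means keep one long-running PR from dominating the lead-time and review numbers; compare the same metrics before and after introducing agents rather than reading them in isolation.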
About Softcery: We’re the AI engineering team that founders call when other teams say “it’s impossible” or “it’ll take 6+ months.” We specialize in building advanced AI systems that actually work in production, handle real customer complexity, and scale with your business. We work with B2B SaaS founders in marketing automation, legal tech, and e-commerce, solving the gap between prototypes that work in demos and systems that work at scale.