Blog post written by Kshetrajna Raghavan and Ricardo Tejedor.
Our product classification system processes over tens of millions of predictions daily with a high degree of accuracy. But what happens when the taxonomy that powers it needs to grow and adapt? We needed a solution that ensures that our 10,000+ categories and 2,000+ attributes continue to serve merchants and customers as commerce evolves.
The solution: an innovative AI multi-agent system that goes beyond classifying products—one that actively improves the Taxonomy labels themselves, keeping our system agile and future-proof.
The challenge: Keeping taxonomy current at scale
The taxonomy that powers our product classification system faces its own scaling challenges. Commerce never stands still: new products emerge, existing categories evolve, and merchant needs shift as markets change.
The volume problem
Managing a product taxonomy on a worldwide scale requires constant attention. Every new product type, emerging technology category, and seasonal trend potentially requires taxonomy updates. Traditional manual curation simply can't keep pace.
For example, consider the rapid emergence of categories like smart home devices, sustainable products, or remote work equipment. Each category represents not just new categories but also entirely new attribute sets.
Smart home devices need connectivity types, power requirements, and compatibility– specifications that didn't exist in our taxonomy before.
The expertise problem
Effective taxonomy design requires deep domain expertise. Understanding the nuanced differences between types of guitar pickups, the classification hierarchy for industrial equipment, or the appropriate attributes for skincare products demands specialized knowledge across dozens of verticals.
Our taxonomy team couldn't maintain expertise across every product category that merchants sell. Yet inconsistent or poorly structured taxonomy directly impacts merchant success through reduced discoverability, suboptimal search results, and ineffective filtering options for customers.
The consistency problem
As our taxonomy grew organically, inconsistencies crept in. We began noticing similar concepts represented differently across categories, along with inconsistencies in naming conventions and discrepancies in product categorizations between merchants and customers.
These inconsistencies compound over time, creating confusion for merchants during product listing and frustration for customers trying to filter and compare products. More critically, they reduce the quality of our classification system.
Our journey: From manual curation to AI-driven evolution
Our approach to taxonomy management has evolved significantly. What started as manual processes has transformed into an AI-driven system that scales with the complexity of modern commerce.
Traditional taxonomy management
Initially, taxonomy updates followed a familiar pattern: domain experts would analyze product data, identify gaps or inconsistencies, propose changes, and implement them through careful manual review processes. While this approach ensured quality, it created bottlenecks that limited our ability to keep pace with the rapid evolution of commerce.
The process was inherently reactive—we'd recognize the need for new categories or attributes only after merchants began listing products that didn't fit well within existing structures. By then, we'd already missed opportunities to provide better experiences for both merchants and customers.
The agent-based breakthrough
The emergence of advanced language models presented an opportunity to reimagine taxonomy management entirely. Rather than replacing human expertise, we envisioned AI agents that could augment our team's capabilities, providing the scale and consistency needed while preserving the quality and domain knowledge that manual curation provided.
Our breakthrough came from recognizing that different types of analysis could be combined for more comprehensive insights. Some improvements become clear when analyzing the logical structure of the taxonomy itself, identifying gaps in category hierarchies or missing attribute relationships. Others emerge only when examining real product data, understanding how merchants actually describe their products and what attributes would help customers make purchasing decisions.
Technical deep dive: The AI agent architecture
Our AI agent system is built on three foundational principles: specialized analysis, intelligent coordination, and quality assurance. Each component addresses specific challenges we identified in scaling taxonomy management while maintaining the expertise and consistency our platform requires.
A novel approach to taxonomy evolution
While AI has been applied to product categorization and basic taxonomy creation, our system represents a fundamentally different approach: using specialized AI agents for continuous taxonomy evolution rather than static construction.
Real product grounding: Our system integrates actual merchant product data from our platform, ensuring that proposed changes reflect how merchants actually describe and categorize their products. This grounds taxonomy decisions in commerce reality rather than purely theoretical organizational principles.
Multi-agent specialization: We employ multiple specialized agents—one focused on structural consistency, another on product-driven insights—that are intelligently synthesized. This combination discovers improvements that neither approach would identify alone.
Sophisticated equivalence discovery: Perhaps most uniquely, our system detects complex equivalence relationships where specific categories equal broader categories filtered by attribute values. This enables the crucial commerce insight that merchants should organize their catalogs however serves their business best, while platform systems understand the underlying product relationships.
System architecture flow
Enabling agent-taxonomy interaction
For AI agents to effectively analyze and improve taxonomy, they need sophisticated ways to explore, understand, and validate the existing structure. We implemented a system that allows agents to search for related categories, examine hierarchical relationships, and verify whether proposed changes might conflict with existing elements.
This foundation enables agents to perform contextual analysis—understanding not just isolated categories or attributes, but how they fit within the broader taxonomy structure. An agent analyzing guitar-related categories, for example, can explore the entire musical instruments hierarchy, examine related attributes across different instrument types, and identify patterns that inform better structural decisions.
Multi-Stage analysis pipeline
Our analysis pipeline combines different types of expertise through specialized agents, each optimized for specific types of insights.
Structural analysis examines the logical consistency and completeness of the taxonomy itself. This agent identifies gaps in category hierarchies, inconsistencies in naming conventions, and opportunities to better organize related concepts. It operates purely on the taxonomy structure, ensuring logical coherence and consistent organization principles.
Product-driven analysis integrates real merchant data, examining how products are actually described and categorized on our platform. This agent analyzes patterns in product titles, descriptions, and merchant-defined categories to identify gaps between how merchants think about their products and how our taxonomy represents them.
Intelligent synthesis merges insights from both approaches, resolving conflicts and eliminating redundancies. When structural analysis suggests one improvement and product analysis suggests another, this synthesis process determines the optimal path forward, often combining insights from both sources.
Equivalence detection solves a fundamental challenge in commerce: how to maintain merchant flexibility while enabling intelligent system behavior. This autonomous agent identifies when different taxonomy approaches represent identical product sets, creating the foundation for systems that understand product relationships across merchant organizational preferences.
Consider golf shoes—one merchant might create a specific "Golf Shoes" category, while another uses "Athletic Shoes" with an "Activity Type = Golf" attribute. Both approaches serve their merchant and customers' needs perfectly, but our search, recommendation, and analytics systems benefit from understanding that these represent the same products.
The system detects sophisticated attribute-based equivalences: a specific category can equal a broader category filtered by one or more attribute values. "Women's Golf Shoes" might be equivalent to "Athletic Shoes" + "Activity Type = Golf" + "Gender = Women." This enables merchants to organize their catalogs however makes sense for their business while ensuring that platform intelligence works seamlessly regardless of their chosen taxonomy approach.
Automated quality assurance
The final stage introduces automated quality assurance through specialized AI judges. These judges evaluate proposed changes using advanced reasoning capabilities, applying domain expertise and taxonomy design principles to filter and refine suggestions before human review.
Different types of changes—adding new attributes, creating category hierarchies, or modifying existing structures—require different types of evaluation. Our judge system uses specialized evaluation criteria for each change type, ensuring that technical requirements, business rules, and domain expertise are properly applied.
Domain-specific judges provide specialized expertise for different product verticals. A judge focused on electronics, for example, understands the technical requirements and common patterns specific to that industry, while a judge specializing in musical instruments applies different expertise relevant to that domain.
Results & impact
The transformation from manual taxonomy curation to AI-driven evolution has delivered significant improvements across multiple dimensions, enabling us to maintain taxonomy quality while scaling to meet the growing complexity of commerce on our platform.
Efficiency gains
Our AI agent system can analyze entire taxonomy branches in parallel, identifying improvement opportunities that would previously require weeks of manual analysis. Where taxonomy experts might analyze a few categories per day, our system can comprehensively evaluate hundreds of categories, examining both structural consistency and alignment with real product data.
This efficiency gain is particularly valuable for emerging product categories. When new product types gain popularity on our platform, our system can quickly identify taxonomy gaps and propose comprehensive solutions, rather than reactive patches that accumulate technical debt over time.
Quality improvements
The multi-agent approach has improved the consistency and comprehensiveness of our taxonomy evolution. By combining structural analysis with real product data, we identify improvements that neither approach would discover alone. Structural analysis ensures logical consistency and proper hierarchy organization, while product-driven analysis ensures that categories and attributes reflect how merchants actually describe and differentiate their products.
The automated quality assurance layer has proven particularly valuable, catching potential issues before human review and ensuring that domain-specific expertise is consistently applied across different product verticals. This has reduced the iteration cycles typically required between initial proposals and final implementation.
Consider how the system handled mobile phone accessories: our product analysis agent identified that merchants frequently advertise "MagSafe support" for accessories like chargers, cases, and wallets—a growing compatibility differentiator.
The agent proposed adding a "MagSafe compatible" boolean attribute to help customers filter for MagSafe-ready products. The specialized electronics judge evaluated this proposal, verifying that no duplicate attribute existed, confirming the boolean type was appropriate, and recognizing that while brand-specific, MagSafe represents a legitimate technical standard similar to Bluetooth or Qi charging. The judge approved the attribute with 93% confidence, noting it would "improve customer filtering for MagSafe-ready chargers, cases, wallets, etc."
This example demonstrates how our agents identify real merchant needs, propose solutions, and receive sophisticated evaluation—all working together to evolve the taxonomy systematically.
Scaling taxonomy development
Perhaps most importantly, the system has fundamentally changed how we approach taxonomy development. Rather than reactive improvements triggered by specific merchant needs or platform limitations, we can now proactively identify and address taxonomy gaps before they impact merchant and customer experiences.
The system's ability to process and reason about the entire taxonomy structure enables comprehensive improvements that consider cross-category relationships and maintain global consistency. This holistic approach prevents the fragmentation that often occurs when addressing taxonomy issues in isolation.
To validate this systematic approach, we applied our AI-powered taxonomy evolution method specifically to the Electronics > Communications > Telephony area (referred to as "Telephony AI" in our analysis), comparing it against our previous manual expansion approach. This focused implementation served as a proof-of-concept for the broader methodology:

Future directions
As AI capabilities continue to advance, we see exciting opportunities to further enhance our taxonomy evolution system and create even tighter integration with our product classification pipeline.
Enhanced agent capabilities
We're exploring how newer language models and reasoning capabilities can improve the sophistication of our analysis agents. Enhanced reasoning could enable more nuanced understanding of product relationships, better detection of subtle inconsistencies, and more sophisticated synthesis of conflicting insights from different analysis approaches.
We're particularly interested in expanding the domain expertise of our specialized judges, enabling them to handle increasingly complex product categories and emerging commerce trends with greater precision and understanding.
Cross-language support
As Shopify's global reach continues to expand, we're investigating how to extend our taxonomy evolution system to better support international commerce. This includes understanding how product categorization and attribute relevance might vary across different markets and cultures, and how to maintain consistency while allowing for regional customization.
Deeper integration with classification
The relationship between our taxonomy evolution system and our product classification pipeline presents opportunities for continuous improvement loops. Classification patterns and merchant feedback could inform taxonomy evolution priorities, while taxonomy improvements could immediately benefit classification accuracy and merchant acceptance rates.
We envision a future where these systems work together seamlessly, with taxonomy evolution informed by real-world classification performance and classification benefiting from continuously optimized taxonomic structures.
Conclusion
Our evolution of taxonomy management represents a fundamental shift from manual, reactive processes to AI-driven, proactive improvement systems. By combining multiple types of analysis, automated quality assurance, and human expertise, we've created a system that can scale with the complexity of modern commerce while maintaining the quality and consistency our merchants and customers depend on.
This work demonstrates how AI agents can augment human expertise in complex, knowledge-intensive domains. Rather than replacing human judgment, our system amplifies the capabilities of our taxonomy team, enabling them to focus on high-level strategic decisions while AI handles the comprehensive analysis and quality assurance that underpins effective taxonomy management.
As we continue to push the boundaries of what's possible in ecommerce infrastructure, we remain committed to building systems that scale with the diverse and evolving needs of our global merchant community. The AI-driven taxonomy evolution system represents another step in our ongoing mission to make commerce better for everyone.
Does this type of work resonate with you? Check out our careers page — we want to hear from you.
About the authors
Kshetrajna Raghavan is a Principal ML Engineer.
X: @kshetrajna
Ricardo Tejedor is a Senior Taxonomist.