Vector DB and data processing service

Company
📱 GutDiaries
Skill Area
📊 Data & Analytics, Product & Development
Tool Stack
Zilliz (vector database), vector embeddings, Python, LLMs
📋

Overview

๐Ÿ” Project context
A specialized backend service for GutDiaries that fetches ingredient pH and FODMAP content using vector databases and LLMs. The system combines existing food data with AI-generated insights to provide users with digestive health information during food logging.
👥 UX Driver
Developed in response to user surveys revealing that GERD and IBS sufferers needed immediate pH and FODMAP information while logging meals. This feature directly addressed a gap in the market where existing apps lacked real-time nutritional guidance for digestive conditions. It bridges the critical gap between first use and long-term retention by delivering value as early as possible in the user's experience, which increases day-over-day engagement.
โš™๏ธ Technical approach
  • Self-expanding knowledge base that learns from user queries
  • Hybrid search approach (scalar + vector + AI fallback)
  • JSON-structured API with multiple specialized endpoints
💡

UX Drivers for the Feature

  • User-driven development: Developed in response to user surveys highlighting the need for immediate pH and FODMAP information during food logging
  • Retention driver: Designed to create an "aha moment" during initial food logging to motivate continued app usage
⚙️

Technical Implementation

I designed a multi-layered food analysis API with three main endpoints:

  1. Food-to-Ingredients Conversion - Converts food names into standardized ingredient lists using Google's Gemini API
  2. pH & FODMAP Ingredient Analysis - Fetches digestive health data for individual ingredients, i.e. fermentability (FODMAP content), which affects IBS, and pH, which determines acidity
  3. Comprehensive Food Summary - Provides gut health insights for complete meals based on data from the first two endpoints
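The way the three endpoints chain together can be sketched as below. This is a minimal, hypothetical sketch: `IngredientProfile`, the `low`/`moderate`/`high` FODMAP scale, the aggregation rule (worst FODMAP level, lowest pH) and all numeric values are illustrative assumptions, not the service's real schema or verified nutritional data. The Gemini and vector database calls are replaced by lookup tables.

```python
from dataclasses import dataclass

# Hypothetical ordinal FODMAP scale (an assumption, not the real schema).
FODMAP_LEVELS = ["low", "moderate", "high"]


@dataclass
class IngredientProfile:
    name: str
    ph: float    # approximate pH (illustrative placeholder value)
    fodmap: str  # one of FODMAP_LEVELS


# Stand-ins for endpoints 1 and 2; the real service calls Gemini and Zilliz here.
FOOD_TO_INGREDIENTS = {"tomato pasta": ["pasta", "tomato", "garlic"]}
INGREDIENT_DATA = {
    "pasta": IngredientProfile("pasta", 6.0, "moderate"),
    "tomato": IngredientProfile("tomato", 4.3, "low"),
    "garlic": IngredientProfile("garlic", 5.8, "high"),
}


def food_to_ingredients(food: str) -> list[str]:
    """Endpoint 1 stand-in: map a food name to a standardized ingredient list."""
    return FOOD_TO_INGREDIENTS.get(food.lower().strip(), [])


def analyze_ingredient(name: str) -> IngredientProfile:
    """Endpoint 2 stand-in: look up pH and FODMAP data for one ingredient."""
    return INGREDIENT_DATA[name]


def summarize_food(food: str) -> dict:
    """Endpoint 3 stand-in: combine the outputs of endpoints 1 and 2."""
    profiles = [analyze_ingredient(i) for i in food_to_ingredients(food)]
    if not profiles:
        return {"food": food, "ingredients": []}
    # Assumed rule: a meal is as problematic as its worst ingredient.
    worst = max(profiles, key=lambda p: FODMAP_LEVELS.index(p.fodmap))
    most_acidic = min(profiles, key=lambda p: p.ph)
    return {
        "food": food,
        "ingredients": [p.name for p in profiles],
        "fodmap_level": worst.fodmap,
        "fodmap_triggers": [p.name for p in profiles if p.fodmap == "high"],
        "lowest_ph": most_acidic.ph,
        "most_acidic_ingredient": most_acidic.name,
    }
```

Calling `summarize_food("Tomato Pasta")` flags garlic as the high-FODMAP trigger and tomato as the most acidic ingredient, which is the kind of per-meal insight the summary endpoint is meant to surface.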

The service uses a three-tiered lookup approach:

  • First attempts exact matches via scalar search
  • Falls back to semantic similarity through vector embeddings
  • When no match is found, uses a pre-trained Gemini AI model to generate reasonable estimates
  • Automatically stores AI-generated results to continually expand the knowledge base
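The tiered lookup above can be sketched in plain Python. This is a minimal sketch under stated assumptions: an in-memory `IngredientStore` stands in for the Zilliz collection, `toy_embed` (character-bigram counts) is a hypothetical stand-in for the Gemini embedding model, and the `0.75` similarity threshold is arbitrary. The write-back here is synchronous for simplicity.

```python
import math
from collections import Counter
from typing import Callable

Record = dict  # e.g. {"name": "tomato", "ph": 4.3, "fodmap": "low"}


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a Gemini embedding: character-bigram counts."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class IngredientStore:
    """In-memory stand-in for a collection with a scalar name field and a vector field."""

    def __init__(self, embed: Callable[[str], Counter] = toy_embed):
        self.embed = embed
        self.records: dict[str, Record] = {}
        self.vectors: dict[str, Counter] = {}

    def insert(self, record: Record) -> None:
        key = record["name"].lower().strip()
        self.records[key] = record
        self.vectors[key] = self.embed(key)

    def lookup(self, name: str, ai_fallback: Callable[[str], Record],
               threshold: float = 0.75) -> tuple[str, Record]:
        key = name.lower().strip()
        # Tier 1: exact match on the scalar name field.
        if key in self.records:
            return "scalar", self.records[key]
        # Tier 2: nearest neighbour by cosine similarity, accepted only above the threshold.
        query = self.embed(key)
        best = max(self.vectors, key=lambda k: cosine(query, self.vectors[k]), default=None)
        if best is not None and cosine(query, self.vectors[best]) >= threshold:
            return "vector", self.records[best]
        # Tier 3: ask the model for an estimate and store it, so the next lookup hits tier 1.
        record = ai_fallback(name)
        self.insert(record)
        return "ai", record
```

With only "tomato" stored, a query for "Tomato" resolves at tier 1, "tomatoes" resolves at tier 2, and "garlic" falls through to the AI tier; a second "garlic" query is then an exact match. The threshold is what keeps tier 2 from returning a merely nearest but unrelated ingredient.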

Highlights

  • Vector database integration: Created a custom vector embedding system (using a Gemini embedding model) with Zilliz that stores and retrieves food data based on semantic similarity
  • Three-tier search: Implemented a three-tier search strategy that degrades gracefully from exact matches to AI inferences
  • Self-learning architecture: Designed an async process that captures AI-generated insights and feeds them back into the vector database
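The asynchronous write-back in the self-learning highlight can be sketched with `asyncio` as a write-behind queue: the request returns the AI estimate immediately while a background task persists it. This is a hypothetical sketch: `fake_gemini_estimate` and `WriteBehindStore` are illustrative names, and a dict stands in for the Zilliz upsert.

```python
import asyncio
import contextlib


async def fake_gemini_estimate(name: str) -> dict:
    """Hypothetical stand-in for the Gemini call that estimates pH/FODMAP data."""
    await asyncio.sleep(0)  # simulate a network round trip
    return {"name": name, "ph": 5.8, "fodmap": "high", "source": "ai"}


class WriteBehindStore:
    """Knowledge base that persists AI-generated records in a background task."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}
        self.queue: asyncio.Queue = asyncio.Queue()

    async def writer(self) -> None:
        # Drain the queue forever; a real service would upsert into the vector DB here.
        while True:
            record = await self.queue.get()
            self.records[record["name"]] = record
            self.queue.task_done()

    async def handle_request(self, name: str) -> dict:
        if name in self.records:
            return self.records[name]
        record = await fake_gemini_estimate(name)
        self.queue.put_nowait(record)  # enqueue the write without delaying the response
        return record


async def main() -> dict:
    store = WriteBehindStore()
    worker = asyncio.create_task(store.writer())
    first = await store.handle_request("garlic")   # served by the AI fallback
    await store.queue.join()                       # wait until the background write lands
    second = await store.handle_request("garlic")  # now served from the stored record
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return {"first": first, "second": second, "stored": list(store.records)}
```

Running `asyncio.run(main())` shows the second request being answered from the stored record. The trade-off of write-behind is that a request arriving before the write lands will call the model again, which is acceptable when duplicate estimates are cheap compared with slower responses.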

Swagger Docs

(Requires a Bearer token; contact me if you'd like one to test the API.)

https://food-engine.gutdiaries.com/docs
📊

Impact & Results

  • Engagement metrics: Significantly increased user retention by providing value from the very first food logging experience
  • User satisfaction: Achieved higher app store ratings as users appreciated immediate, actionable gut health insights
  • Knowledge expansion: The self-expanding database keeps growing as AI-generated results for new queries are stored
  • Competitive advantage: Created unique market positioning by addressing the key user need identified in surveys that competitors had overlooked