Vector DB and data processing service

Company
📱 GutDiaries
Skill Area
📊 Data & Analytics, Product & Development
Tool Stack
Zilliz (vector database), vector embeddings, Python, LLMs

📋 Overview

🔍 Project context
A specialized backend service for GutDiaries that fetches ingredient pH and FODMAP content using vector databases and LLMs. The system combines existing food data with AI-generated insights to provide users with digestive health information during food logging.
👥 UX Driver
Developed in response to user surveys showing that GERD and IBS sufferers needed immediate pH and FODMAP information while logging meals. The feature addressed a gap in the market: existing apps lacked real-time nutritional guidance for digestive conditions. It also aims to bridge the gap between first use and long-term retention by delivering value as early as possible in the user's experience, with the goal of increasing day-over-day engagement.
⚙️ Technical approach
  • Self-expanding knowledge base that learns from user queries
  • Hybrid search approach (scalar + vector + AI fallback)
  • JSON-structured API with multiple specialized endpoints

💡 UX Drivers for the Feature

  • User-driven development: Developed in response to user surveys highlighting the need for immediate pH and FODMAP information during food logging
  • Retention driver: Designed to create an "aha moment" during initial food logging to motivate continued app usage

⚙️ Technical Implementation

I designed a multi-layered food analysis API with three main endpoints:

  1. Food to Ingredients Conversion - Converts food names into standardized ingredient lists using Google’s Gemini API
  2. pH & FODMAP Ingredient Analysis - Fetches pH and FODMAP data for individual ingredients (FODMAP content reflects fermentability, which affects IBS; pH reflects acidity, which matters for GERD)
  3. Comprehensive Food Summary - Provides gut health insights for complete meals based on data from the first two endpoints
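
To illustrate how the third endpoint could build on the other two, here is a minimal Python sketch of a meal summary assembled from per-ingredient results. The class names, field names, the "low / moderate / high" FODMAP scale, and the aggregation rule (most acidic ingredient, highest FODMAP level) are all assumptions made for this example; the actual response schema is not described above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# Assumed ordering of FODMAP levels, used only for this sketch.
FODMAP_RANK = {"low": 0, "moderate": 1, "high": 2}


@dataclass
class IngredientAnalysis:
    """Hypothetical shape of one result from the ingredient analysis endpoint."""
    name: str
    ph: Optional[float]   # None when no pH value is known
    fodmap_level: str     # "low", "moderate", or "high" (assumed scale)


@dataclass
class FoodSummary:
    """Hypothetical shape of the meal-level summary."""
    food: str
    ingredients: List[IngredientAnalysis]
    lowest_ph: Optional[float]
    highest_fodmap: Optional[str]


def summarize(food: str, analyses: List[IngredientAnalysis]) -> FoodSummary:
    """Combine per-ingredient results into one summary for the whole meal."""
    known_ph = [a.ph for a in analyses if a.ph is not None]
    levels = [a.fodmap_level for a in analyses if a.fodmap_level in FODMAP_RANK]
    return FoodSummary(
        food=food,
        ingredients=analyses,
        # The most acidic ingredient is reported as the meal's lowest pH.
        lowest_ph=min(known_ph) if known_ph else None,
        # The meal is rated by its highest-FODMAP ingredient.
        highest_fodmap=max(levels, key=FODMAP_RANK.__getitem__) if levels else None,
    )


def to_json(summary: FoodSummary) -> str:
    """Serialize the summary as a JSON document."""
    return json.dumps(asdict(summary), indent=2)
```

Calling `to_json(summarize("salsa", [...]))` with a list of `IngredientAnalysis` objects returns a nested JSON document; any ingredient values passed in would be placeholders for whatever the first two endpoints return.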

The service uses a three-tiered lookup, followed by a write-back step:

  • First attempts exact matches via scalar search
  • Falls back to semantic similarity through vector embeddings
  • When no match is found, uses a pre-trained Gemini AI model to generate reasonable estimates
  • Automatically stores AI-generated results to continually expand the knowledge base
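
The tiers above can be sketched in a few lines of Python. This is not the service's code: a plain dictionary stands in for the Zilliz collection, `toy_embed` (character-bigram counts) stands in for the Gemini embedding model, `ai_fallback` is a placeholder callable for the Gemini call, and the 0.8 similarity threshold is an arbitrary assumption.

```python
import math
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

Record = Dict[str, object]


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: character-bigram counts."""
    t = text.lower().strip()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TieredLookup:
    """Exact match, then similarity match, then AI fallback with write-back."""

    def __init__(self, ai_fallback: Callable[[str], Record], threshold: float = 0.8):
        self.store: Dict[str, Record] = {}   # stand-in for the vector database
        self.ai_fallback = ai_fallback       # stand-in for the LLM call
        self.threshold = threshold           # assumed similarity cutoff

    def add(self, name: str, record: Record) -> None:
        self.store[name.lower().strip()] = record

    def lookup(self, name: str) -> Tuple[str, Record]:
        key = name.lower().strip()

        # Tier 1: exact (scalar) match on the normalized name.
        if key in self.store:
            return "scalar", self.store[key]

        # Tier 2: nearest stored name by embedding similarity.
        query = toy_embed(key)
        best_name: Optional[str] = None
        best_score = 0.0
        for stored in self.store:
            score = cosine(query, toy_embed(stored))
            if score > best_score:
                best_name, best_score = stored, score
        if best_name is not None and best_score >= self.threshold:
            return "vector", self.store[best_name]

        # Tier 3: ask the model, then store the answer for future queries.
        record = self.ai_fallback(key)
        self.add(key, record)
        return "ai", record
```

With only "garlic" stored, a query for "Garlic" resolves in tier 1, "garlics" resolves in tier 2, and an unrelated name such as "quinoa" falls through to the fallback and is then served from tier 1 on the next request. A real deployment would compare dense vectors inside the database rather than looping over stored names.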

Highlights

  • Vector database integration: Built a custom embedding pipeline (using a Gemini embeddings model) with Zilliz that stores and retrieves food data by semantic similarity
  • Three-tier search: Implemented a search strategy that degrades gracefully from exact matches to semantic matches to AI inference
  • Self-learning architecture: Designed an async process that captures AI-generated insights and feeds them back into the vector database
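
One way an async write-back like this could work is a queue drained by a background task, so the request returns the AI-generated result without waiting for the database write. The sketch below uses only `asyncio` from the standard library; the function names, the dictionary standing in for the vector database, and the placeholder record are all assumptions for illustration, not the service's actual design.

```python
import asyncio
from typing import Dict, Tuple

Record = Dict[str, object]


async def writeback_worker(queue: asyncio.Queue, store: Dict[str, Record]) -> None:
    """Drain queued AI results and persist them (dict as a database stand-in)."""
    while True:
        name, record = await queue.get()
        try:
            # A real worker would embed the name and upsert into the vector DB here.
            store[name] = record
        finally:
            queue.task_done()


async def handle_miss(name: str, queue: asyncio.Queue) -> Record:
    """Return an AI-generated record immediately and queue it for storage."""
    # Placeholder for the model's output; the values are not real data.
    record: Record = {"ph": None, "fodmap": "unknown", "source": "ai"}
    queue.put_nowait((name, record))
    return record


async def demo() -> Tuple[Dict[str, Record], Record]:
    store: Dict[str, Record] = {}
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(writeback_worker(queue, store))

    result = await handle_miss("quinoa", queue)  # the caller gets its answer here
    await queue.join()                           # wait until the write-back finishes

    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return store, result
```

Running `asyncio.run(demo())` returns the store and the record that was handed back to the caller; after `queue.join()` the store contains that same record under "quinoa".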

Swagger Docs

(Requires a Bearer token; contact me if you’d like one to test the API.)

https://food-engine.gutdiaries.com/docs

📊 Impact & Results

  • Engagement metrics: Significantly increased user retention by providing value from the very first food logging experience
  • User satisfaction: Achieved higher app store ratings as users appreciated immediate, actionable gut health insights
  • Knowledge expansion: Self-expanding database continues to improve with each user query
  • Competitive advantage: Created unique market positioning by addressing the key user need identified in surveys that competitors had overlooked