Mentoring Tomorrow's AI Developers

AI Experiments in Supermarket Price Tag Recognition

Exploring Snowflake for Image Processing

In January, I began experimenting with Snowflake Document AI to process supermarket price tag images. The goal was to extract structured product data from shelf photos, leveraging the scalable infrastructure and built-in machine learning capabilities of Snowflake. While the platform offered promising tools for document analysis, my early results showed that the models trained within Snowflake struggled with the complexity and variability of real-world price tags. The extraction accuracy was limited, and the workflow proved less efficient than hoped.

Transition to Gemini 2.0 Flash API

Seeking improved performance, I tested a new approach using the Gemini 2.0 Flash API from Google. This shift marked a significant improvement in both speed and accuracy. The Gemini API demonstrated a much stronger ability to process diverse price tag images, handling variations in layout, fonts, and lighting that had challenged the Snowflake models.

How the Application Works

The new application workflow is streamlined for efficiency and reliability:

  1. Image Preparation: Uploaded images are automatically converted to JPG format (if needed), and their size is reduced to optimize processing speed.
  2. API Request: The prepared image is sent to the Google Gemini API for analysis.
  3. Data Extraction: The API processes the image and returns a structured JSON response containing extracted product and price information.
  4. Data Storage: Extracted product data is stored in a Supabase table for further querying and analysis.

This architecture enables rapid prototyping and easy scaling for new use cases, as detailed in the application overview and use case documentation.
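Steps 2 to 4 above can be sketched with the Python standard library alone. This is a minimal illustration and not the application's actual code: the prompt wording, the extracted field names, the `products` table name, and all helper function names are assumptions. Image preparation (step 1) is left out because it needs an imaging library. The functions only build the requests; sending them is left to the caller.

```python
import base64
import json
import urllib.request

# Public REST endpoint for generateContent; the model name follows the text above.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.0-flash:generateContent"
)

# Hypothetical prompt; the real application's wording is not shown in this post.
PROMPT = (
    "Extract every product on this price tag. Return a JSON array of objects "
    "with the keys: name, price, currency, unit."
)


def build_gemini_request(jpeg_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Step 2: wrap the prepared JPEG and the prompt in one API request."""
    body = {
        "contents": [{
            "parts": [
                {"text": PROMPT},
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode("ascii"),
                }},
            ]
        }],
        # Ask the model to answer with JSON instead of free text.
        "generationConfig": {"responseMimeType": "application/json"},
    }
    return urllib.request.Request(
        GEMINI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def parse_gemini_response(response_json: dict) -> list:
    """Step 3: the model's JSON arrives as text inside the first candidate."""
    text = response_json["candidates"][0]["content"]["parts"][0]["text"]
    products = json.loads(text)
    # Normalize a single object to a one-element list.
    return products if isinstance(products, list) else [products]


def build_supabase_insert(rows: list, project_url: str, service_key: str,
                          table: str = "products") -> urllib.request.Request:
    """Step 4: insert rows through Supabase's REST interface (/rest/v1/<table>)."""
    return urllib.request.Request(
        f"{project_url}/rest/v1/{table}",
        data=json.dumps(rows).encode("utf-8"),
        headers={
            "apikey": service_key,
            "Authorization": f"Bearer {service_key}",
            "Content-Type": "application/json",
            "Prefer": "return=minimal",
        },
        method="POST",
    )
```

Each request object can be passed to `urllib.request.urlopen`, and the Gemini reply decoded with `json.load` before it is handed to `parse_gemini_response`. Building requests separately from sending them keeps the logic easy to test without network access or API keys.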

Extending Core Functionality with Simple Prompt Changes

A standout feature of this solution is that its core AI processing logic takes only about 15 lines of code.

This compact design means you can easily adapt the application’s capabilities by simply modifying the prompt sent to the AI model. For example, you can:

  • Identify stamps and signatures in scanned documents.
  • Detect document types (as demonstrated in the 'categorizer' app).
  • Check for specific key names or values within documents.
  • Apply custom extraction or validation logic for new use cases.

This modularity allows the application to evolve rapidly, supporting new requirements or document types without major code changes—just update the prompt and the AI will adjust its behavior accordingly.
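To illustrate the prompt-swapping idea, here is a small sketch in which the request body stays identical across tasks and only the prompt text changes. The task names and prompt wordings are hypothetical, chosen to mirror the list above; `build_body` is an illustrative helper, not part of the original application.

```python
# Hypothetical prompts, one per use case; the wording is an assumption.
PROMPTS = {
    "price_tags": "Extract product name, price, and unit from this price tag as JSON.",
    "stamps": "List every stamp and signature visible in this scan as JSON.",
    "categorizer": "Return the document type of this scan as JSON: {\"type\": ...}.",
    "key_check": "Report as JSON whether the keys 'invoice_number' and 'total' appear.",
}


def build_body(task: str, image_b64: str) -> dict:
    """Build a generateContent body; only the prompt depends on the task."""
    return {
        "contents": [{
            "parts": [
                {"text": PROMPTS[task]},
                {"inline_data": {"mime_type": "image/jpeg", "data": image_b64}},
            ]
        }],
        # Request a JSON answer regardless of the task.
        "generationConfig": {"responseMimeType": "application/json"},
    }
```

Supporting a new use case then amounts to adding one entry to `PROMPTS`; the image handling, request shape, and response parsing stay untouched.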

You can also expose this API as an MCP (Model Context Protocol) tool within larger enterprise workflows, enabling centralized and flexible document intelligence across your systems.

Potential Pricing

When considering production deployment, it’s important to understand the cost structure of the underlying AI service. For Google Gemini 2.0 Flash, pricing is based on the number of tokens processed (input and output), regardless of whether the content is text or image.

Pricing Table for Gemini 2.0 Flash

Item                         Price per 1M Tokens (USD)
Input (text/image/video)     $0.10
Output (text/image/video)    $0.40
  • Example: Processing a 1024×1024 image typically consumes about 1,290 tokens, and costs scale with the number and size of images processed. At these rates, USD 0.10 covers more than 500 images; the exact figure also depends on the size of the output.
  • Context caching and storage have additional minor costs for advanced use cases, but are not required for basic processing.

This token-based pricing model enables precise cost estimation and makes it affordable to process large batches of documents or images at scale.
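The arithmetic behind the example can be checked with a short calculation. The prices and the 1,290 tokens per image come from the table and example above; the prompt length (100 tokens) and output length (150 tokens) are assumptions picked only for illustration, so actual costs will differ with your prompt and response sizes.

```python
INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens (from the table above)
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens (from the table above)
TOKENS_PER_IMAGE = 1290    # approximate tokens for a 1024x1024 image


def cost_per_image(prompt_tokens: int = 100, output_tokens: int = 150) -> float:
    """Estimated USD cost of one request; the default token counts are assumptions."""
    input_tokens = TOKENS_PER_IMAGE + prompt_tokens
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def images_per_budget(budget_usd: float, **token_counts: int) -> int:
    """How many images fit into a budget at the estimated per-image cost."""
    return int(budget_usd / cost_per_image(**token_counts))


if __name__ == "__main__":
    print(f"Cost per image: ${cost_per_image():.6f}")
    print(f"Images for $0.10: {images_per_budget(0.10)}")
```

Under these assumptions a request costs roughly USD 0.0002, which is consistent with the estimate of just over 500 images for USD 0.10. Counting image input tokens alone, the same budget would cover about 775 images, so the output size is what pulls the figure down.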

Comparing Approaches and Future Directions

While the Gemini API provided a quick and effective solution, it is also possible to integrate other models within Snowflake. This flexibility could be valuable for organizations with specific requirements around data residency, customization, or integration with existing Snowflake data pipelines.

Considerations for Production-Scale Solutions

Leveraging Google’s API enabled rapid development and high-quality results, but several factors must be considered before choosing a long-term solution:

  • Scalability: Cloud APIs like Gemini are easy to scale for pilot projects, but costs can rise quickly at production volumes.
  • Data Storage: Storing sensitive product data in third-party clouds (e.g., Google, Supabase) raises questions about privacy and compliance.
  • Pricing: API usage fees and cloud storage costs must be weighed against the benefits of managed infrastructure.
  • Privacy and Security: For sensitive or regulated data, on-premises or private cloud solutions (such as custom models in Snowflake) may be preferable.

In summary, while rapid prototyping is easier with Google’s API, it is important to evaluate all technical and business requirements—especially for large-scale, privacy-sensitive deployments—before committing to a particular architecture.