Building Apps with Generative AI

This document covers the complete AI application development journey, from ideation and model selection through building with RAG and fine-tuning to production deployment with MLOps best practices.

This document explores the three main phases of AI application development—ideation and experimentation, building with frameworks and techniques like RAG and fine-tuning, and operationalizing with MLOps—providing developers with practical guidance for creating production-ready AI-powered applications using open source tools and technologies.


Introduction to AI Application Development

Recent data from Gartner indicates that 80% of enterprises will have used some type of generative AI through models or APIs by 2026. While many developers have experience using AI through co-pilots in IDEs and popular large language models online, building applications that actually use AI represents a different challenge. The accessibility of AI development has improved significantly, making it easier than ever for developers to get started.

The journey from a simple proof of concept to a production application involves three main steps: ideation and experimentation, building the application, and development and operations. Open source tools and technologies are available to help throughout the inner loop development process of building, running, and testing applications with AI capabilities.


Phase One: Ideation and Experimentation

The first step in building an application that uses generative AI is ideating around exploration and proof of concepts. This phase can be broken down into several essential steps.

Model Research and Evaluation

Use cases are specialized, requiring specialized models that can do the job effectively. The process begins with researching and evaluating models from popular repositories like Hugging Face or the open source community. Different factors must be considered during evaluation:

FactorConsideration
Model SizeResource requirements and computational needs
PerformanceSpeed, accuracy, and task-specific capabilities
BenchmarkingEvaluation using popular benchmark tools

Ground Rules for Model Selection

Several general principles guide model selection decisions:

Self-hosting a large language model will generally be cheaper than a cloud-based service. Small language models (SLMs) versus large language models (LLMs) will generally perform better with lower latency and are specialized for specific tasks.

Prompting Techniques

Understanding various prompting techniques is essential when working with models:

Zero-shot prompting involves asking a model a question without any examples of how to respond. This tests the model’s baseline capabilities without additional context.

Few-shot prompting provides a few different examples of how to respond, demonstrating the behavior desired from the LLM as work progresses with the AI.

Chain of thought prompting asks the model to explain its thinking process step by step, making the reasoning transparent and verifiable.

Understanding Capabilities and Limitations

The different capabilities and limitations of models must be understood early in the process. Experimenting with data during this phase helps identify potential challenges that might arise as the AI journey progresses.


Phase Two: Building the Application

After evaluating models for the use case, the building phase begins. Similar to how databases and different services can be locally run on a machine, AI can also be served locally and accessed through its API from localhost.

Local Model Serving

Running models locally provides the added benefit of keeping data secure and private on premise. This approach becomes particularly important when working with sensitive or proprietary information.

Retrieval Augmented Generation (RAG)

RAG is a method for using data with large language models. The approach takes a pre-trained foundational model and supplements it with relevant and accurate data. This helps provide better and more accurate responses by grounding the model’s outputs in specific, up-to-date information.

Fine-Tuning

Fine-tuning represents an alternative approach where the large language model includes the data within it. This process bakes information about desired behavior, styles, and intuition directly into the model itself. When the model is inferenced, it has domain-specific data available every time, without requiring external retrieval.

These represent just two approaches among many available methods for incorporating data into AI applications.

Frameworks and Tools

Having the right tools and frameworks, such as LangChain, simplifies development work. These frameworks enable focus on building new features for popular generative AI use cases:

  • Chatbots
  • IT process automation
  • Data management
  • And much more

The simplification comes through streamlining the different calls made through the model using sequences of prompts and model calls to accomplish more complex tasks. Problems are broken down into smaller, more manageable steps, with evaluation of flows during model calls in both development and production environments.


Phase Three: Operationalizing AI Applications

The final step involves deploying the AI-powered application to production to enable scaling. This falls under the umbrella of machine learning operations, or MLOps.

Infrastructure Requirements

Infrastructure must be able to handle efficient model deployment and scaling. Technologies such as containers and orchestrators like Kubernetes help accomplish this by:

  • Auto-scaling based on demand
  • Balancing traffic for the application
  • Managing resource allocation efficiently

Production-ready runtime environments, such as vLLM, can be used for model serving to ensure optimal performance.

Hybrid Approach

Organizations are increasingly taking a hybrid approach with both models and infrastructure:

Multi-model approach: Different models for different use cases, similar to a Swiss Army knife with specialized tools for specific tasks.

Hybrid infrastructure: A combination of on-premise and cloud infrastructure to maximize resource utilization and budget efficiency.

Benchmarking and Monitoring

With AI-powered applications in production, the work continues. Ongoing requirements include:

  • Benchmarking performance
  • Monitoring behavior
  • Handling different exceptions from the application

Similar to how DevOps practices ensure smooth deployments, MLOps ensures models go into production in a controlled, reliable fashion.


The Developer Perspective on AI

Recent innovations in AI have made this topic much more accessible for developers. Numerous tools are available throughout the process to support development efforts. While AI represents new capabilities, it functions as another tool that can be added to the developer tool belt.

The complete process flows from ideation through building to deployment. These steps enable developers to make real impact with their work by leveraging generative AI capabilities in practical, production-ready applications.


Key Takeaways

The three-phase approach to AI application development provides a structured path forward:

The ideation and experimentation phase focuses on model research, evaluation, and understanding prompting techniques. The building phase leverages frameworks like LangChain and techniques like RAG or fine-tuning to create functional applications. The operationalization phase applies MLOps practices to deploy, scale, and monitor AI-powered applications in production environments.

Throughout all phases, open source tools and technologies provide the foundation for efficient development. Local model serving offers security and privacy benefits, while hybrid approaches to both models and infrastructure maximize flexibility and resource efficiency.


Conclusion

Building applications with generative AI follows a clear progression from ideation to production. The accessibility of modern tools and frameworks has lowered the barrier to entry for developers, making AI application development a practical skill to acquire. By understanding the three main phases—ideation and experimentation, building, and operationalization—developers can successfully create AI-powered applications that deliver real value. The key is to approach AI as another tool in the developer toolkit, using established development practices adapted for the unique requirements of machine learning operations.


FAQ

  1. Design, implementation, and testing
  2. Ideation and experimentation, building, and development and operations
  3. Planning, coding, and deployment
  4. Research, prototyping, and maintenance
(2) The three main phases of AI application development are ideation and experimentation, building the application, and development and operations (MLOps). These phases guide developers from proof of concept to production-ready AI-powered applications.

Zero-shot prompting involves asking a model a question without any examples of how to respond. This tests the model’s baseline capabilities without additional context, relying entirely on the model’s pre-trained knowledge to generate appropriate responses.

Small language models (SLMs) versus large language models (LLMs) generally perform better with lower latency and are specialized for specific tasks. While LLMs offer broader capabilities, SLMs provide more efficient performance for targeted use cases with reduced computational requirements.

TechniqueDescription
A. Zero-shot prompting1. Asking the model to explain its thinking process step by step
B. Few-shot prompting2. Asking a model without any examples of how to respond
C. Chain of thought prompting3. Providing several examples demonstrating desired behavior
D. Fine-tuning4. Baking domain-specific information directly into the model
A-2, B-3, C-1, D-4.

Self-hosting a large language model is generally more expensive than using a cloud-based service.

False. Self-hosting a large language model will generally be cheaper than a cloud-based service. However, this comes with the trade-off of managing infrastructure and resources internally.

RAG is a method for using data with large language models. The approach takes a pre-trained foundational model and supplements it with relevant and accurate data. This helps provide better and more accurate responses by grounding the model’s outputs in specific, up-to-date information without modifying the model itself.

  1. RAG is faster than fine-tuning
  2. RAG supplements the model with external data while fine-tuning bakes information directly into the model
  3. RAG requires more computational resources than fine-tuning
  4. Fine-tuning cannot handle domain-specific data
(2) RAG supplements a pre-trained model with relevant external data during inference, while fine-tuning bakes domain-specific information, desired behaviors, and styles directly into the model itself. This means RAG retrieves data externally while fine-tuning modifies the model’s parameters.

Experimenting with data early in the ideation phase helps identify potential challenges that might arise as the AI journey progresses. This early experimentation allows developers to:

  • Understand model capabilities and limitations
  • Discover data quality issues
  • Identify edge cases and failure modes
  • Make informed decisions about model selection
  • Plan for necessary data preprocessing or augmentation

Running models locally provides several benefits:

  • Data remains secure and private on premise
  • No need to transmit sensitive information to external services
  • Ability to serve the model through API from localhost
  • Similar workflow to running databases and services locally
  • Reduced latency for development and testing

This approach becomes particularly important when working with sensitive or proprietary information.

  1. TensorFlow
  2. PyTorch
  3. LangChain
  4. Scikit-learn
(3) LangChain is mentioned as a framework that simplifies development work by enabling focus on building new features for popular generative AI use cases like chatbots, IT process automation, and data management through streamlined model calls and prompt sequences.

MLOps stands for machine learning operations. Its purpose is to ensure models go into production in a controlled, reliable fashion, similar to how DevOps practices ensure smooth software deployments. MLOps encompasses infrastructure management, model deployment, scaling, benchmarking, monitoring, and exception handling for AI-powered applications.

Once an AI-powered application is deployed to production, the development work is complete and no further monitoring is needed.

False. With AI-powered applications in production, the work continues. Ongoing requirements include benchmarking performance, monitoring behavior, and handling different exceptions from the application. MLOps ensures continuous management of production models.

TechnologyPurpose
A. Kubernetes1. Production-ready runtime for model serving
B. Containers2. Managing resource allocation and balancing traffic
C. vLLM3. Auto-scaling based on demand
D. Hybrid infrastructure4. Combining on-premise and cloud resources for efficiency
A-3, B-2, C-1, D-4.

A multi-model approach means using different models for different use cases, similar to a Swiss Army knife with specialized tools for specific tasks. Organizations implement this strategy to optimize performance, cost, and capabilities by selecting the most appropriate model for each particular use case rather than trying to use a single model for all purposes.

  1. Zero-shot prompting
  2. Few-shot prompting
  3. Chain of thought prompting
  4. Fine-tuning only
(3) Chain of thought prompting asks the model to explain its thinking process step by step, making the reasoning transparent and verifiable. This is essential for compliance purposes where understanding the model’s decision-making process is required.

  1. Models should be evaluated from repositories like Hugging Face
  2. Model size is an important factor to consider
  3. Benchmarking tools can help assess performance
  4. Only cloud-based models should be considered for cost efficiency
(4) is incorrect. The document states that self-hosting a large language model will generally be cheaper than a cloud-based service. Both self-hosted and cloud-based models should be considered, with cost being one of several evaluation factors.

Popular generative AI use cases mentioned include:

  • Chatbots
  • IT process automation
  • Data management

Frameworks like LangChain help simplify development for these use cases by streamlining model calls and enabling sequences of prompts to accomplish more complex tasks.

Breaking down problems into smaller, manageable steps enables:

  • More effective use of sequences of prompts and model calls
  • Accomplishment of more complex tasks through composition
  • Evaluation of flows during model calls in both development and production
  • Better debugging and troubleshooting capabilities
  • Clearer understanding of the application logic and behavior

This approach aligns with the principle of using frameworks to simplify complex AI workflows.

  1. AI development remains highly complex and inaccessible
  2. Only experienced data scientists can build AI applications
  3. Recent innovations have made AI development more accessible for developers
  4. AI tools are only available through expensive enterprise licenses
(3) The document states that recent innovations in AI have made this topic much more accessible for developers, and that it has become easier than ever to get started. Open source tools and technologies provide the foundation for efficient development.

The three phases form a progression that guides AI application development from concept to production. The ideation and experimentation phase focuses on model research and evaluation, the building phase leverages frameworks and techniques to create functional applications, and the operationalization phase applies MLOps practices to deploy and maintain applications in production. Each phase builds upon the previous one, creating a complete development lifecycle for AI-powered applications.