Building Apps with Generative AI

November 21, 2025 12 min read Generative AI Machine Learning Application Development Docs IBM-RAG-AI AI Mlops Rag Model-Fine-Tuning

This document covers the complete AI application development journey, from ideation and model selection through building with RAG and fine-tuning to production deployment with MLOps best practices.

On this page

This document explores the three main phases of AI application development—ideation and experimentation, building with frameworks and techniques like RAG and fine-tuning, and operationalizing with MLOps—providing developers with practical guidance for creating production-ready AI-powered applications using open source tools and technologies.

Introduction to AI Application Development

Recent data from Gartner indicates that 80% of enterprises will have used some type of generative AI through models or APIs by 2026. While many developers have experience using AI through co-pilots in IDEs and popular large language models online, building applications that actually use AI represents a different challenge. The accessibility of AI development has improved significantly, making it easier than ever for developers to get started.

The journey from a simple proof of concept to a production application involves three main steps: ideation and experimentation, building the application, and development and operations. Open source tools and technologies are available to help throughout the inner loop development process of building, running, and testing applications with AI capabilities.

Phase One: Ideation and Experimentation

The first step in building an application that uses generative AI is ideating around exploration and proof of concepts. This phase can be broken down into several essential steps.

Model Research and Evaluation

Use cases are specialized, requiring specialized models that can do the job effectively. The process begins with researching and evaluating models from popular repositories like Hugging Face or the open source community. Different factors must be considered during evaluation:

Factor	Consideration
Model Size	Resource requirements and computational needs
Performance	Speed, accuracy, and task-specific capabilities
Benchmarking	Evaluation using popular benchmark tools

Ground Rules for Model Selection

Several general principles guide model selection decisions:

Self-hosting a large language model will generally be cheaper than a cloud-based service. Small language models (SLMs) versus large language models (LLMs) will generally perform better with lower latency and are specialized for specific tasks.

Prompting Techniques

Understanding various prompting techniques is essential when working with models:

Zero-shot prompting involves asking a model a question without any examples of how to respond. This tests the model’s baseline capabilities without additional context.

Few-shot prompting provides a few different examples of how to respond, demonstrating the behavior desired from the LLM as work progresses with the AI.

Chain of thought prompting asks the model to explain its thinking process step by step, making the reasoning transparent and verifiable.

Understanding Capabilities and Limitations

The different capabilities and limitations of models must be understood early in the process. Experimenting with data during this phase helps identify potential challenges that might arise as the AI journey progresses.

Phase Two: Building the Application

After evaluating models for the use case, the building phase begins. Similar to how databases and different services can be locally run on a machine, AI can also be served locally and accessed through its API from localhost.

Local Model Serving

Running models locally provides the added benefit of keeping data secure and private on premise. This approach becomes particularly important when working with sensitive or proprietary information.

Retrieval Augmented Generation (RAG)

RAG is a method for using data with large language models. The approach takes a pre-trained foundational model and supplements it with relevant and accurate data. This helps provide better and more accurate responses by grounding the model’s outputs in specific, up-to-date information.

Fine-Tuning

Fine-tuning represents an alternative approach where the large language model includes the data within it. This process bakes information about desired behavior, styles, and intuition directly into the model itself. When the model is inferenced, it has domain-specific data available every time, without requiring external retrieval.

These represent just two approaches among many available methods for incorporating data into AI applications.

Frameworks and Tools

Having the right tools and frameworks, such as LangChain, simplifies development work. These frameworks enable focus on building new features for popular generative AI use cases:

Chatbots
IT process automation
Data management
And much more

The simplification comes through streamlining the different calls made through the model using sequences of prompts and model calls to accomplish more complex tasks. Problems are broken down into smaller, more manageable steps, with evaluation of flows during model calls in both development and production environments.

Phase Three: Operationalizing AI Applications

The final step involves deploying the AI-powered application to production to enable scaling. This falls under the umbrella of machine learning operations, or MLOps.

Infrastructure Requirements

Infrastructure must be able to handle efficient model deployment and scaling. Technologies such as containers and orchestrators like Kubernetes help accomplish this by:

Auto-scaling based on demand
Balancing traffic for the application
Managing resource allocation efficiently

Production-ready runtime environments, such as vLLM, can be used for model serving to ensure optimal performance.

Hybrid Approach

Organizations are increasingly taking a hybrid approach with both models and infrastructure:

Multi-model approach: Different models for different use cases, similar to a Swiss Army knife with specialized tools for specific tasks.

Hybrid infrastructure: A combination of on-premise and cloud infrastructure to maximize resource utilization and budget efficiency.

Benchmarking and Monitoring

With AI-powered applications in production, the work continues. Ongoing requirements include:

Benchmarking performance
Monitoring behavior
Handling different exceptions from the application

Similar to how DevOps practices ensure smooth deployments, MLOps ensures models go into production in a controlled, reliable fashion.

The Developer Perspective on AI

Recent innovations in AI have made this topic much more accessible for developers. Numerous tools are available throughout the process to support development efforts. While AI represents new capabilities, it functions as another tool that can be added to the developer tool belt.

The complete process flows from ideation through building to deployment. These steps enable developers to make real impact with their work by leveraging generative AI capabilities in practical, production-ready applications.

Key Takeaways

The three-phase approach to AI application development provides a structured path forward:

The ideation and experimentation phase focuses on model research, evaluation, and understanding prompting techniques. The building phase leverages frameworks like LangChain and techniques like RAG or fine-tuning to create functional applications. The operationalization phase applies MLOps practices to deploy, scale, and monitor AI-powered applications in production environments.

Throughout all phases, open source tools and technologies provide the foundation for efficient development. Local model serving offers security and privacy benefits, while hybrid approaches to both models and infrastructure maximize flexibility and resource efficiency.

Conclusion

Building applications with generative AI follows a clear progression from ideation to production. The accessibility of modern tools and frameworks has lowered the barrier to entry for developers, making AI application development a practical skill to acquire. By understanding the three main phases—ideation and experimentation, building, and operationalization—developers can successfully create AI-powered applications that deliver real value. The key is to approach AI as another tool in the developer toolkit, using established development practices adapted for the unique requirements of machine learning operations.

FAQ

Design, implementation, and testing
Ideation and experimentation, building, and development and operations
Planning, coding, and deployment
Research, prototyping, and maintenance

(2) The three main phases of AI application development are ideation and experimentation, building the application, and development and operations (MLOps). These phases guide developers from proof of concept to production-ready AI-powered applications.

Zero-shot prompting involves asking a model a question without any examples of how to respond. This tests the model’s baseline capabilities without additional context, relying entirely on the model’s pre-trained knowledge to generate appropriate responses.

Small language models (SLMs) versus large language models (LLMs) generally perform better with lower latency and are specialized for specific tasks. While LLMs offer broader capabilities, SLMs provide more efficient performance for targeted use cases with reduced computational requirements.

Technique	Description
A. Zero-shot prompting	1. Asking the model to explain its thinking process step by step
B. Few-shot prompting	2. Asking a model without any examples of how to respond
C. Chain of thought prompting	3. Providing several examples demonstrating desired behavior
D. Fine-tuning	4. Baking domain-specific information directly into the model

A-2, B-3, C-1, D-4.

Self-hosting a large language model is generally more expensive than using a cloud-based service.

False. Self-hosting a large language model will generally be cheaper than a cloud-based service. However, this comes with the trade-off of managing infrastructure and resources internally.

RAG is faster than fine-tuning
RAG supplements the model with external data while fine-tuning bakes information directly into the model
RAG requires more computational resources than fine-tuning
Fine-tuning cannot handle domain-specific data

(2) RAG supplements a pre-trained model with relevant external data during inference, while fine-tuning bakes domain-specific information, desired behaviors, and styles directly into the model itself. This means RAG retrieves data externally while fine-tuning modifies the model’s parameters.

Experimenting with data early in the ideation phase helps identify potential challenges that might arise as the AI journey progresses. This early experimentation allows developers to:

Understand model capabilities and limitations
Discover data quality issues
Identify edge cases and failure modes
Make informed decisions about model selection
Plan for necessary data preprocessing or augmentation

Running models locally provides several benefits:

Data remains secure and private on premise
No need to transmit sensitive information to external services
Ability to serve the model through API from localhost
Similar workflow to running databases and services locally
Reduced latency for development and testing

This approach becomes particularly important when working with sensitive or proprietary information.

TensorFlow
PyTorch
LangChain
Scikit-learn

(3) LangChain is mentioned as a framework that simplifies development work by enabling focus on building new features for popular generative AI use cases like chatbots, IT process automation, and data management through streamlined model calls and prompt sequences.

MLOps stands for machine learning operations. Its purpose is to ensure models go into production in a controlled, reliable fashion, similar to how DevOps practices ensure smooth software deployments. MLOps encompasses infrastructure management, model deployment, scaling, benchmarking, monitoring, and exception handling for AI-powered applications.

Once an AI-powered application is deployed to production, the development work is complete and no further monitoring is needed.

False. With AI-powered applications in production, the work continues. Ongoing requirements include benchmarking performance, monitoring behavior, and handling different exceptions from the application. MLOps ensures continuous management of production models.

Technology	Purpose
A. Kubernetes	1. Production-ready runtime for model serving
B. Containers	2. Managing resource allocation and balancing traffic
C. vLLM	3. Auto-scaling based on demand
D. Hybrid infrastructure	4. Combining on-premise and cloud resources for efficiency

A-3, B-2, C-1, D-4.

A multi-model approach means using different models for different use cases, similar to a Swiss Army knife with specialized tools for specific tasks. Organizations implement this strategy to optimize performance, cost, and capabilities by selecting the most appropriate model for each particular use case rather than trying to use a single model for all purposes.

Zero-shot prompting
Few-shot prompting
Chain of thought prompting
Fine-tuning only

(3) Chain of thought prompting asks the model to explain its thinking process step by step, making the reasoning transparent and verifiable. This is essential for compliance purposes where understanding the model’s decision-making process is required.

Models should be evaluated from repositories like Hugging Face
Model size is an important factor to consider
Benchmarking tools can help assess performance
Only cloud-based models should be considered for cost efficiency

(4) is incorrect. The document states that self-hosting a large language model will generally be cheaper than a cloud-based service. Both self-hosted and cloud-based models should be considered, with cost being one of several evaluation factors.

Popular generative AI use cases mentioned include:

Chatbots
IT process automation
Data management

Frameworks like LangChain help simplify development for these use cases by streamlining model calls and enabling sequences of prompts to accomplish more complex tasks.

Breaking down problems into smaller, manageable steps enables:

More effective use of sequences of prompts and model calls
Accomplishment of more complex tasks through composition
Evaluation of flows during model calls in both development and production
Better debugging and troubleshooting capabilities
Clearer understanding of the application logic and behavior

This approach aligns with the principle of using frameworks to simplify complex AI workflows.

AI development remains highly complex and inaccessible
Only experienced data scientists can build AI applications
Recent innovations have made AI development more accessible for developers
AI tools are only available through expensive enterprise licenses

(3) The document states that recent innovations in AI have made this topic much more accessible for developers, and that it has become easier than ever to get started. Open source tools and technologies provide the foundation for efficient development.

The three phases form a progression that guides AI application development from concept to production. The ideation and experimentation phase focuses on model research and evaluation, the building phase leverages frameworks and techniques to create functional applications, and the operationalization phase applies MLOps practices to deploy and maintain applications in production. Each phase builds upon the previous one, creating a complete development lifecycle for AI-powered applications.

Choose AI Models

Flask

Browse Courses

Building Apps with Generative AI

Introduction to AI Application Development

Phase One: Ideation and Experimentation

Model Research and Evaluation

Ground Rules for Model Selection

Prompting Techniques

Understanding Capabilities and Limitations

Phase Two: Building the Application

Local Model Serving

Retrieval Augmented Generation (RAG)

Fine-Tuning

Frameworks and Tools

Phase Three: Operationalizing AI Applications

Infrastructure Requirements

Hybrid Approach

Benchmarking and Monitoring

The Developer Perspective on AI

Key Takeaways

Conclusion

FAQ

Which of the following best explains the three main phases of AI application development?

What is zero-shot prompting?

What are the key differences between SLMs and LLMs in terms of performance?

Match the following prompting techniques with their descriptions

True or False

What is Retrieval Augmented Generation (RAG)?

Which of the following is the primary difference between RAG and fine-tuning?

What is the most likely outcome if a developer experiments with data early in the ideation phase?

Why is running models locally beneficial for AI application development?

Which of the following frameworks is mentioned for simplifying AI application development?

What does MLOps stand for and what is its purpose?

True or False

Match the following technologies with their purposes in AI application deployment

What is meant by a multi-model approach in AI application development?

A development team needs to build a chatbot that must explain its reasoning for compliance purposes. Which prompting technique should they prioritize?

Which of the following is incorrect regarding model evaluation during the ideation phase?

What are popular generative AI use cases mentioned for application development?

Why should problems be broken down into smaller, manageable steps when building AI applications?

What can most likely be inferred about the accessibility of AI development based on the document?

What is the relationship between the three phases of AI application development?