# Embedding Models

<figure><img src="/files/Q0ehcrlN29fF0uUcBGxW" alt=""><figcaption></figcaption></figure>

### Overview <a href="#overview" id="overview"></a>

Embedding models are the foundation of semantic search in AINexLayer. They transform text into high-dimensional vectors that capture the meaning and context of your content, enabling the AI to find relevant information based on meaning rather than just keywords.

### How Embeddings Work <a href="#how-embeddings-work" id="how-embeddings-work"></a>

#### Text to Vector Conversion <a href="#text-to-vector-conversion" id="text-to-vector-conversion"></a>

1. **Text Input**: Raw text from your documents
2. **Tokenization**: Break text into tokens (words, subwords)
3. **Model Processing**: Neural network processes tokens
4. **Vector Output**: Numerical representation of text meaning
5. **Storage**: Vectors stored in vector database
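
The five steps above can be sketched end to end with a toy stand-in for the model. This is a minimal illustration, not how AINexLayer or any production model works: `tokenize`, `toy_embed`, `index_document`, and `vector_store` are hypothetical names, and the hashing "model" only captures word overlap, not meaning.

```python
import hashlib
import math
import re

def tokenize(text):
    """Step 2: split text into lowercase word tokens (real models use subword tokenizers)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_embed(text, dimensions=8):
    """Steps 3-4: stand-in for the neural network.

    Hashes each token into one of `dimensions` buckets and L2-normalizes the counts.
    """
    vector = [0.0] * dimensions
    for token in tokenize(text):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vector[digest[0] % dimensions] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector

# Step 5: a plain dict standing in for the vector database.
vector_store = {}

def index_document(doc_id, text):
    """Step 1 enters here as raw text; the resulting vector is stored under doc_id."""
    vector_store[doc_id] = toy_embed(text)
    return vector_store[doc_id]
```

Swapping `toy_embed` for a call to a real embedding model leaves the rest of the flow unchanged.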

#### Semantic Understanding <a href="#semantic-understanding" id="semantic-understanding"></a>

* **Meaning Capture**: Vectors represent semantic meaning
* **Context Awareness**: Understands word context and relationships
* **Similarity Matching**: Similar concepts have similar vectors
* **Cross-Language**: Multilingual models map equivalent text in different languages to nearby vectors; English-only models do not

### Supported Embedding Models <a href="#supported-embedding-models" id="supported-embedding-models"></a>

#### OpenAI Embeddings <a href="#openai-embeddings" id="openai-embeddings"></a>

**Best for**: General-purpose semantic search, high accuracy

**Available Models**

* **text-embedding-ada-002**: General-purpose embedding model
* **text-embedding-3-small**: Smaller, faster model
* **text-embedding-3-large**: Larger, more accurate model

**Configuration**

```json
{
  "provider": "openai",
  "model": "text-embedding-3-small",
  "dimensions": 1536,
  "apiKey": "your-openai-api-key"
}
```

**Specifications**

* **Dimensions**: 1536 (ada-002), 1536 (3-small), 3072 (3-large)
* **Context Length**: 8192 tokens
* **Languages**: 100+ languages supported
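* **Pricing**: Varies by model. OpenAI's published list prices have been $0.0001/1K tokens (ada-002), $0.00002/1K tokens (3-small), and $0.00013/1K tokens (3-large); check the current OpenAI pricing page before budgeting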

#### Azure OpenAI Embeddings <a href="#azure-openai-embeddings" id="azure-openai-embeddings"></a>

**Best for**: Enterprise deployments, compliance requirements

**Available Models**

* **text-embedding-ada-002**: General-purpose embedding model
* **text-embedding-3-small**: Smaller, faster model
* **text-embedding-3-large**: Larger, more accurate model

**Configuration**

```json
{
  "provider": "azure-openai",
  "model": "text-embedding-ada-002",
  "endpoint": "https://your-resource.openai.azure.com/",
  "apiKey": "your-azure-api-key",
  "apiVersion": "2024-02-15-preview"
}
```

#### Cohere Embeddings <a href="#cohere-embeddings" id="cohere-embeddings"></a>

**Best for**: Multilingual support, business applications

**Available Models**

* **embed-english-v3.0**: English-optimized model
* **embed-multilingual-v3.0**: Multilingual model
* **embed-english-light-v3.0**: Lightweight English model

**Configuration**

```json
{
  "provider": "cohere",
  "model": "embed-multilingual-v3.0",
  "apiKey": "your-cohere-api-key",
  "inputType": "search_document"
}
```

**Specifications**

* **Dimensions**: 1024 (embed-english-v3.0, embed-multilingual-v3.0), 384 (embed-english-light-v3.0)
* **Context Length**: 512 tokens
* **Languages**: 100+ languages (embed-multilingual-v3.0); the English models are optimized for English only
* **Pricing**: $0.0001/1K tokens

#### Local Embedding Models <a href="#local-embedding-models" id="local-embedding-models"></a>

**Best for**: Privacy, offline use, cost control

**Sentence Transformers**

```json
{
  "provider": "sentence-transformers",
  "model": "all-MiniLM-L6-v2",
  "dimensions": 384,
  "device": "cpu"
}
```

**Available Models**

* **all-MiniLM-L6-v2**: Fast, general-purpose model
* **all-mpnet-base-v2**: High-quality English model
* **paraphrase-multilingual-MiniLM-L12-v2**: Multilingual model
* **distilbert-base-nli-mean-tokens**: Distilled BERT model (older; its model card marks it as deprecated in favor of newer models)

**Configuration**

```json
{
  "provider": "sentence-transformers",
  "model": "all-mpnet-base-v2",
  "dimensions": 768,
  "device": "cuda",
  "batchSize": 32
}
```

#### Ollama Embeddings <a href="#ollama-embeddings" id="ollama-embeddings"></a>

**Best for**: Local deployment, custom models

**Available Models**

* **nomic-embed-text**: High-quality local embedding
* **mxbai-embed-large**: Large local embedding model
* **all-minilm**: Lightweight local model

**Configuration**

```json
{
  "provider": "ollama",
  "model": "nomic-embed-text",
  "baseURL": "http://localhost:11434",
  "dimensions": 768
}
```

**Installation**

```bash
# Install embedding model
ollama pull nomic-embed-text

# Start Ollama service
ollama serve
```
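
Once the service is running, an embedding can be requested over HTTP. The sketch below assumes Ollama's `/api/embeddings` endpoint, which takes a `model` and a `prompt` and returns an `embedding` array; newer Ollama releases also offer other embedding endpoints, so check the version you run. The function names are illustrative.

```python
import json
import urllib.request

def build_ollama_request(text, model="nomic-embed-text", base_url="http://localhost:11434"):
    """Build (but do not send) a POST request to Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def get_ollama_embedding(text, **kwargs):
    """Send the request; needs `ollama serve` running and the model already pulled."""
    request = build_ollama_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["embedding"]
```

Calling `len(get_ollama_embedding("hello"))` against a running instance is a quick way to confirm the `dimensions` value in your configuration matches what the model returns.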

### Embedding Model Selection <a href="#embedding-model-selection" id="embedding-model-selection"></a>

#### By Use Case <a href="#by-use-case" id="by-use-case"></a>

**General Document Search**

* **Recommended**: OpenAI text-embedding-3-small
* **Why**: Good balance of speed and accuracy
* **Use Cases**: General document retrieval, Q\&A

**High-Accuracy Search**

* **Recommended**: OpenAI text-embedding-3-large
* **Why**: Highest accuracy for complex queries
* **Use Cases**: Research, complex analysis

**Multilingual Content**

* **Recommended**: Cohere embed-multilingual-v3.0
* **Why**: Optimized for multiple languages
* **Use Cases**: International documents, multilingual search

**Privacy-Sensitive**

* **Recommended**: Local models (Sentence Transformers)
* **Why**: Data stays on your infrastructure
* **Use Cases**: Sensitive documents, compliance

**Cost-Optimized**

* **Recommended**: Local models, OpenAI text-embedding-3-small
* **Why**: Lower cost per embedding
* **Use Cases**: High-volume processing, budget constraints

#### By Performance Requirements <a href="#by-performance-requirements" id="by-performance-requirements"></a>

**Speed Priority**

* **Fastest**: Local models, OpenAI 3-small
* **Medium**: Cohere models, OpenAI ada-002
* **Slower**: OpenAI 3-large, complex local models

**Accuracy Priority**

* **Highest**: OpenAI 3-large, Cohere multilingual
* **High**: OpenAI 3-small, Cohere English
* **Good**: Local models, OpenAI ada-002

**Cost Priority**

* **Cheapest**: Local models (no per-token fees, though you supply the hardware)
* **Low**: OpenAI 3-small
* **Moderate**: OpenAI ada-002, Cohere models
* **Expensive**: OpenAI 3-large
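
To turn these tiers into a rough budget for a hosted model, multiply the expected token volume by the provider's per-1K-token price. The helper below is a back-of-the-envelope sketch; `embedding_cost` is a hypothetical name, and the price must come from the provider's current price list.

```python
def embedding_cost(num_documents, avg_tokens_per_document, price_per_1k_tokens):
    """Estimated one-time cost (in the price's currency) of embedding a corpus."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1000 * price_per_1k_tokens

# 10,000 documents of about 500 tokens each at $0.0001 per 1K tokens
print(f"${embedding_cost(10_000, 500, 0.0001):.2f}")
```

Re-embedding after switching models incurs the same cost again, which is worth factoring into model selection.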

### Configuration Management <a href="#configuration-management" id="configuration-management"></a>

#### Environment Variables <a href="#environment-variables" id="environment-variables"></a>

```bash
# OpenAI Embeddings
OPEN_AI_KEY=your-openai-api-key

# Azure OpenAI Embeddings
AZURE_OPENAI_API_KEY=your-azure-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/

# Cohere Embeddings
COHERE_API_KEY=your-cohere-api-key

# Local Models
EMBEDDING_MODEL_PATH=/path/to/local/model
```
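
A startup check can catch missing variables before the first embedding request fails. This is a sketch only: the variable names come from the listing above, while `REQUIRED_ENV`, `missing_env`, and the `"local"` label are hypothetical.

```python
import os

# Variable names are from the listing above; the grouping is an assumption.
REQUIRED_ENV = {
    "openai": ["OPEN_AI_KEY"],
    "azure-openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "cohere": ["COHERE_API_KEY"],
    "local": ["EMBEDDING_MODEL_PATH"],
}

def missing_env(provider, env=None):
    """Return the variables that are unset or empty for the given provider."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]

if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        print(provider, "missing:", missing_env(provider) or "none")
```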

#### Model Configuration <a href="#model-configuration" id="model-configuration"></a>

```json
{
  "defaultEmbeddingModel": "text-embedding-3-small",
  "embeddingModels": {
    "text-embedding-3-small": {
      "provider": "openai",
      "dimensions": 1536,
      "contextLength": 8192
    },
    "all-MiniLM-L6-v2": {
      "provider": "sentence-transformers",
      "dimensions": 384,
      "contextLength": 512
    }
  }
}
```
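
As an illustration of how such a file might be consumed, the sketch below resolves a model name (or the default) to its settings and rejects unknown names. `resolve_model` is a hypothetical helper, not part of AINexLayer; the JSON mirrors the configuration above.

```python
import json

CONFIG = json.loads("""
{
  "defaultEmbeddingModel": "text-embedding-3-small",
  "embeddingModels": {
    "text-embedding-3-small": {"provider": "openai", "dimensions": 1536, "contextLength": 8192},
    "all-MiniLM-L6-v2": {"provider": "sentence-transformers", "dimensions": 384, "contextLength": 512}
  }
}
""")

def resolve_model(config, name=None):
    """Return the settings for `name`, falling back to the configured default."""
    name = name or config["defaultEmbeddingModel"]
    try:
        settings = config["embeddingModels"][name]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {name!r}") from None
    return {"model": name, **settings}
```

Vectors produced by models with different `dimensions` cannot share one index, so changing the default generally means re-embedding existing documents.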

### Performance Optimization <a href="#performance-optimization" id="performance-optimization"></a>

#### Embedding Generation <a href="#embedding-generation" id="embedding-generation"></a>

* **Batch Processing**: Process multiple texts together
* **Parallel Processing**: Use multiple workers
* **Caching**: Cache embeddings for repeated text
* **Optimization**: Use appropriate model for task
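
Batching and caching combine naturally: hash each text, skip anything already cached, and send the rest in fixed-size batches. The sketch below assumes an `embed_batch` callable that takes a list of strings and returns one vector per string; that callable, the function name, and the dict-based cache are all illustrative.

```python
import hashlib

def embed_with_cache(texts, embed_batch, cache, batch_size=32):
    """Embed texts in batches, reusing cached vectors for repeated text."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]

    # Collect uncached texts once each, in first-seen order.
    pending = {}
    for key, text in zip(keys, texts):
        if key not in cache and key not in pending:
            pending[key] = text

    items = list(pending.items())
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        vectors = embed_batch([text for _, text in batch])
        for (key, _), vector in zip(batch, vectors):
            cache[key] = vector

    # Return vectors in the same order as the input texts.
    return [cache[key] for key in keys]
```

Because the cache is keyed on text alone, use a separate cache per embedding model.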

#### Storage Optimization <a href="#storage-optimization" id="storage-optimization"></a>

* **Vector Compression**: Compress vectors for storage
* **Indexing**: Efficient vector indexing
* **Quantization**: Reduce vector precision
* **Deduplication**: Remove duplicate embeddings
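
Quantization is the easiest of these to show concretely. The sketch below applies simple per-vector scalar quantization to 8-bit integers, one of several possible schemes and not necessarily what a given vector database uses; the function names are illustrative.

```python
def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using a single scale per vector."""
    max_abs = max((abs(v) for v in vector), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(v / scale) for v in vector], scale

def dequantize_int8(codes, scale):
    """Recover approximate floats; each value is off by at most about scale / 2."""
    return [code * scale for code in codes]
```

Storing one byte per value instead of a 4-byte float cuts raw vector storage by roughly 4x, at the price of a small loss in precision.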

#### Search Optimization <a href="#search-optimization" id="search-optimization"></a>

* **Index Optimization**: Optimize vector indexes
* **Similarity Metrics**: Choose appropriate similarity function
* **Query Optimization**: Optimize search queries
* **Result Caching**: Cache search results

### Vector Dimensions <a href="#vector-dimensions" id="vector-dimensions"></a>

#### Dimension Trade-offs <a href="#dimension-trade-offs" id="dimension-trade-offs"></a>

* **Higher Dimensions**: Can capture more nuance, but need more storage and memory and make search slower
* **Lower Dimensions**: Faster search and less storage, often with some loss of accuracy
* **Optimal Range**: 384-1536 dimensions for most use cases

#### Common Dimensions <a href="#common-dimensions" id="common-dimensions"></a>

* **384**: Fast, lightweight models
* **768**: Balanced performance
* **1024**: Good accuracy
* **1536**: High accuracy (OpenAI standard)
* **3072**: Maximum accuracy (OpenAI large)
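
The storage side of this trade-off is simple arithmetic: vectors times dimensions times bytes per value. The sketch below assumes 4-byte float32 values and ignores index structures and metadata, which add overhead on top; `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage only, excluding index structures and metadata."""
    return num_vectors * dimensions * bytes_per_value

for dims in (384, 768, 1024, 1536, 3072):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dims: {gib:.2f} GiB per million vectors")
```

At float32, a million 1536-dimension vectors take about 5.7 GiB of raw storage, compared with about 1.4 GiB at 384 dimensions.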

### Similarity Metrics <a href="#similarity-metrics" id="similarity-metrics"></a>

#### Cosine Similarity <a href="#cosine-similarity" id="cosine-similarity"></a>

* **Best for**: General semantic similarity
* **Range**: -1 to 1
* **Advantages**: Scale-invariant, good for text
* **Use Cases**: Most document search applications

#### Euclidean Distance <a href="#euclidean-distance" id="euclidean-distance"></a>

* **Best for**: Geometric similarity
* **Range**: 0 to infinity
* **Advantages**: Intuitive distance measure
* **Use Cases**: Clustering, classification

#### Dot Product <a href="#dot-product" id="dot-product"></a>

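* **Best for**: Unit-length (normalized) vectors, where it ranks results identically to cosine similarity
* **Range**: -infinity to infinity (-1 to 1 for unit-length vectors)
* **Advantages**: Cheapest to compute, since it skips the normalization step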
* **Use Cases**: High-performance applications
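
The three metrics can be compared side by side in a few lines of plain Python. This is an illustrative sketch; production systems use vectorized or database-native implementations.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Undefined for zero vectors (division by zero); callers should guard against them.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Same direction, different length: cosine ignores the length difference,
# while Euclidean distance and dot product do not.
a, b = [1.0, 0.0], [2.0, 0.0]
print(cosine_similarity(a, b), euclidean_distance(a, b), dot(a, b))
```

When vectors are normalized to unit length, dot product equals cosine similarity, which is why many vector stores normalize at write time and use the dot product at query time.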

### Troubleshooting <a href="#troubleshooting" id="troubleshooting"></a>

#### Common Issues <a href="#common-issues" id="common-issues"></a>

**Embedding Generation Failures**

* **API Errors**: Check API keys and quotas
* **Model Errors**: Verify model availability
* **Text Length**: Check text length limits
* **Network Issues**: Verify network connectivity

**Poor Search Results**

* **Model Selection**: Try different embedding models
* **Text Quality**: Improve source text quality
* **Chunking Strategy**: Optimize text chunking
* **Similarity Threshold**: Adjust similarity thresholds
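
To see how a similarity threshold shapes results, the brute-force sketch below scores every stored vector against the query, drops anything under `min_score`, and returns the top matches. The `search` function and the dict-based store are hypothetical; a real deployment would query the vector database instead.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vector, store, top_k=3, min_score=0.0):
    """Return up to top_k (doc_id, score) pairs with score >= min_score, best first."""
    scored = ((doc_id, cosine(query_vector, vec)) for doc_id, vec in store.items())
    hits = [(doc_id, score) for doc_id, score in scored if score >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]
```

Raising `min_score` trades recall for precision: fewer irrelevant chunks reach the model, but borderline matches are lost. Useful thresholds differ between embedding models, so tune them per model.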

**Performance Issues**

* **Slow Generation**: Use faster models or batch processing
* **Memory Issues**: Monitor memory usage
* **Storage Issues**: Optimize vector storage
* **Search Speed**: Optimize vector indexes

#### Error Handling <a href="#error-handling" id="error-handling"></a>

```json
{
  "error": {
    "type": "embedding_generation_failed",
    "message": "Failed to generate embedding",
    "details": {
      "textLength": 10000,
      "maxLength": 8192,
      "suggestion": "Reduce text length or use chunking"
    }
  }
}
```
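
An error like the one above is usually resolved by chunking the text before embedding it. The sketch below splits on whitespace and uses word count as a rough proxy for tokens; exact token counts require the model's own tokenizer, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of whitespace-separated words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap repeats a few words at each boundary so that a sentence cut between two chunks still appears whole in at least one of them.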

### Best Practices <a href="#best-practices" id="best-practices"></a>

#### Model Selection <a href="#model-selection" id="model-selection"></a>

* **Start Simple**: Begin with OpenAI text-embedding-3-small or a local model
* **Test Performance**: Evaluate models for your specific use case
* **Consider Costs**: Balance accuracy with cost
* **Plan for Scale**: Consider scaling requirements

#### Text Preparation <a href="#text-preparation" id="text-preparation"></a>

* **Clean Text**: Remove noise and formatting issues
* **Appropriate Length**: Use optimal text chunk sizes
* **Context Preservation**: Maintain document context
* **Language Consistency**: Use consistent language

#### Performance Optimization <a href="#performance-optimization-1" id="performance-optimization-1"></a>

* **Batch Processing**: Process multiple texts together
* **Caching**: Cache embeddings for repeated content
* **Indexing**: Use efficient vector indexes
* **Monitoring**: Monitor embedding performance

#### Security and Privacy <a href="#security-and-privacy" id="security-and-privacy"></a>

* **API Key Security**: Secure embedding API keys
* **Data Privacy**: Consider data privacy requirements
* **Local Models**: Use local models for sensitive data
* **Access Control**: Implement proper access controls

### Integration Examples <a href="#integration-examples" id="integration-examples"></a>

#### Python Integration <a href="#python-integration" id="python-integration"></a>

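```python
import os
from functools import lru_cache

from openai import OpenAI  # requires openai>=1.0
from sentence_transformers import SentenceTransformer

@lru_cache(maxsize=1)
def _openai_client():
    # Uses OPEN_AI_KEY if set; otherwise the SDK falls back to OPENAI_API_KEY
    return OpenAI(api_key=os.environ.get("OPEN_AI_KEY"))

@lru_cache(maxsize=1)
def _local_model():
    # Load the model once; reloading it on every call is slow
    return SentenceTransformer("all-MiniLM-L6-v2")

# OpenAI Embeddings
def get_openai_embedding(text):
    response = _openai_client().embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Local Embeddings
def get_local_embedding(text):
    embedding = _local_model().encode(text)
    return embedding.tolist()
```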

#### API Integration <a href="#api-integration" id="api-integration"></a>

```http
# Generate embedding
POST /api/v1/embeddings/generate
Content-Type: application/json

{
  "text": "Your text here",
  "model": "text-embedding-3-small"
}

# Batch embedding generation
POST /api/v1/embeddings/batch
Content-Type: application/json

{
  "texts": ["Text 1", "Text 2", "Text 3"],
  "model": "text-embedding-3-small"
}
```
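
From Python, the same two calls might look like the sketch below. The paths and body fields come from the requests above; the base URL is a placeholder, authentication headers are omitted because this page does not specify them, and the response is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request

def build_embedding_request(base_url, texts, model="text-embedding-3-small"):
    """Build a POST request: a single string uses /generate, a list uses /batch."""
    if isinstance(texts, str):
        path, body = "/api/v1/embeddings/generate", {"text": texts, "model": model}
    else:
        path, body = "/api/v1/embeddings/batch", {"texts": list(texts), "model": model}
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `send(build_embedding_request("https://your-ainexlayer-host", ["Text 1", "Text 2"]))` would call the batch endpoint, with the host replaced by your own deployment.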

***

**🔢 Embedding models are the foundation of semantic search. Choose the right model for your needs to achieve optimal search performance and accuracy.**
