MDFE (MetLife Data Framework Engine) - Comprehensive Documentation
Table of Contents
- Executive Summary
- Architecture Overview
- Core Components
- Data Flow and Processing
- System Integration
- Configuration Management
- Data Zones and Storage Layers
- Pipeline Architecture
- SCD2 Implementation
- Error Handling and Quality Assurance
- Deployment and Operations
- Best Practices
- Troubleshooting Guide
- Appendices
Executive Summary
MDFE (MetLife Data Framework Engine) is a comprehensive, enterprise-grade data processing and integration platform developed by MetLife for its BD (Big Data) Data Platform. MDFE serves as the central orchestration engine for data ingestion, transformation, and distribution across multiple source systems and data warehouses.
Key Capabilities:
- Multi-System Integration: Seamlessly connects CNXT, OLAS, CustomerPortal, and other MetLife systems
- Azure Synapse Integration: Native support for Azure Synapse Analytics and Spark processing
- SCD2 Support: Built-in Slowly Changing Dimension Type 2 implementation for historical data tracking
- Mapping-Driven Architecture: Configuration-based approach using CSV mapping files
- Real-time and Batch Processing: Supports both streaming and batch data processing patterns
Architecture Overview
High-Level Architecture
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Source Systems  │     │      MDFE       │     │ Target Systems  │
│                 │     │    Framework    │     │                 │
│ • CNXT/LifeLine │────▶│                 │────▶│ • Azure Synapse │
│ • OLAS          │     │ • Processors    │     │ • CDZ/DDZ       │
│ • CustomerPortal│     │ • Mapping Mgmt  │     │ • Data Lake     │
│ • Agent Systems │     │ • Pipeline Orch │     │ • Analytics     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
Technology Stack:
- Processing Engine: Java-based processors with Apache Spark integration
- Orchestration: Azure Data Factory pipelines
- Storage: Azure Synapse Analytics, Azure Data Lake Storage
- Configuration: CSV-based mapping files and JSON pipeline definitions
- Monitoring: Built-in quality assurance and reconciliation processes
Core Components
1. MDFE Processors
1.1 MdfeProcessor
Class: com.metlife.mdfe.processors.MdfeProcessor
Purpose: Main processing engine for data transformation and routing
Key Responsibilities:
- Data ingestion from source systems
- Schema mapping and transformation
- Data validation and quality checks
- Routing to appropriate target systems
1.2 MdfeBDToJsonProcessor
Class: com.metlife.mdfe.processors.MdfeBDToJsonProcessor
Purpose: Specialized processor for converting Big Data formats to JSON
Key Responsibilities:
- Big Data to JSON conversion
- Structure preservation during transformation
- Metadata management
- Performance optimization for large datasets
2. Mapping Management System
2.1 Mapping File Structure
All mapping configurations follow a standardized CSV format:
sor,app_id,mapping_id
MDFE,bdp_agent_bd_olas,DAETLLIB_LLDCA_STG
CustomerPortal,mdfe_CustomerPortal,CustomerPortal_RiderDetailedInformation_stg
Field Definitions:
- sor: System of Record identifier
- app_id: Application identifier for processing context
- mapping_id: Specific table or entity mapping identifier
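The sketch below shows one way such a mapping file could be read into typed records before processing. It is illustrative only: the file name, the MappingEntry type, and the load_mapping_file function are assumptions for this example, not part of the MDFE codebase.

```python
import csv
from typing import List, NamedTuple

class MappingEntry(NamedTuple):
    sor: str         # System of Record identifier
    app_id: str      # Application identifier for processing context
    mapping_id: str  # Table or entity mapping identifier

def load_mapping_file(path: str) -> List[MappingEntry]:
    """Read an MDFE-style mapping CSV (sor, app_id, mapping_id) into a list of records."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        return [MappingEntry(row["sor"], row["app_id"], row["mapping_id"]) for row in reader]

# Example usage (file name is illustrative):
# for entry in load_mapping_file("MDFE_mapping_OLAS_CACB.csv"):
#     print(entry.sor, entry.app_id, entry.mapping_id)
```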
2.2 Naming Conventions
- STG: Staging layer tables (e.g., OLAS_OLPABDCLN_LLDAA_stg)
- CDZ: Customer Data Zone tables (e.g., OLAS_OLPABDCLN_LLDAA_cdz)
- HIST: Historical tables for SCD2 (e.g., OLAS_OLPABDCLN_LLDAA_cdz_hist)
- DDZ: Derived Data Zone for analytics-ready data
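A minimal helper illustrating these suffix conventions is sketched below. The function and dictionary names are hypothetical; the real framework may derive table names differently.

```python
# Illustrative only: layer suffixes as documented in the naming conventions above.
LAYER_SUFFIXES = {
    "stg": "_stg",        # Staging layer
    "cdz": "_cdz",        # Customer Data Zone
    "hist": "_cdz_hist",  # Historical (SCD2) layer
}

def layer_table_name(base_name: str, layer: str) -> str:
    """Build a layer-specific table name, e.g. OLAS_OLPABDCLN_LLDAA -> OLAS_OLPABDCLN_LLDAA_cdz."""
    try:
        return base_name + LAYER_SUFFIXES[layer]
    except KeyError:
        raise ValueError(f"Unknown layer '{layer}'; expected one of {sorted(LAYER_SUFFIXES)}")

# layer_table_name("OLAS_OLPABDCLN_LLDAA", "hist") -> "OLAS_OLPABDCLN_LLDAA_cdz_hist"
```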
Data Flow and Processing
1. Data Ingestion Flow
Source System → MDFE Processor → Staging (STG) → Validation → CDZ → DDZ
2. Processing Stages
Stage 1: Data Ingestion
- Raw data extraction from source systems
- Initial schema validation
- Data type conversion and standardization
- Load into staging tables (_stg suffix)
Stage 2: Data Transformation
- Business rule application
- Data cleansing and standardization
- Derived field calculations
- Quality assurance checks
Stage 3: Data Storage
- Load into Customer Data Zone (CDZ)
- Historical tracking implementation (SCD2)
- Metadata update and lineage tracking
Stage 4: Analytics Preparation
- Aggregation and summarization
- Load into Derived Data Zone (DDZ)
- Index optimization for query performance
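A minimal PySpark sketch of the STG-to-CDZ hop in these stages is shown below, assuming Synapse Spark tables named per the layer conventions above. The table names, the trim-based cleansing, and the audit columns are illustrative stand-ins for the mapping-driven business rules MDFE actually applies.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mdfe_stg_to_cdz_sketch").getOrCreate()

# Table names follow the documented naming convention but are illustrative here.
stg_df = spark.table("bd_data_platform_stg.olas_olpabdcln_lldaa_stg")

cdz_df = (
    stg_df
    # Stage 2: placeholder standardization; real cleansing rules come from the mappings.
    .select([F.trim(F.col(c)).alias(c) if t == "string" else F.col(c)
             for c, t in stg_df.dtypes])
    # Stage 3: audit timestamps used downstream for SCD2 tracking.
    .withColumn("ins_ts", F.current_timestamp())
    .withColumn("upd_ts", F.current_timestamp())
)

cdz_df.write.mode("append").saveAsTable("bd_data_platform_cdz2.olas_olpabdcln_lldaa_cdz")
```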
System Integration
1. CNXT (Context/LifeLine) Integration
Purpose: Life insurance policy and customer data management
Key Tables:
- OLPABDDATA_LADPH: Policy administration data
- OLPABDCLN_LLBAGTBNK: Agent banking information
- OLPABDCLN_LLDAA: Policy details
- OLPABDCLN_LLDBC: Beneficiary information
- OLPABDCLN_LLDCS: Customer service records
Processing Pattern:
-- Example CNXT query pattern for schema mapping
WITH cte AS (
    SELECT
        REPLACE(REPLACE([Library], '[', ''), ']', '') AS [Library],
        REPLACE(REPLACE([Table_Name], '[', ''), ']', '') AS Table_Name,
        REPLACE(REPLACE([Column_Name], '[', ''), ']', '') AS Column_Name,
        LOWER(REPLACE(REPLACE([Data_Type], '[', ''), ']', '')) AS Data_Type
    FROM [PAAS_Anomaly_Report].[dbo].[CNXT]
)
SELECT
    Column_Name AS columnName,
    CASE
        WHEN Data_Type = 'decimal'  THEN 'decimal(16,2)'
        WHEN Data_Type = 'char'     THEN 'varchar(400)'
        WHEN Data_Type = 'numeric'  THEN 'long'
        WHEN Data_Type = 'nvarchar' THEN 'varchar(400)'
        ELSE Data_Type
    END AS dataType
FROM cte
WHERE Table_Name = REPLACE(REPLACE('{0}', 'OLPABDCLN_', ''), 'OLPABDDATA_', '');
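The CASE expression above normalizes CNXT source types into target types. The Python sketch below expresses the same mapping rules outside SQL; the dictionary and function names are hypothetical and used here only for illustration.

```python
# Same type-normalization rules as the CASE expression above, expressed in Python.
TYPE_OVERRIDES = {
    "decimal": "decimal(16,2)",
    "char": "varchar(400)",
    "numeric": "long",
    "nvarchar": "varchar(400)",
}

def map_cnxt_data_type(source_type: str) -> str:
    """Return the target data type for a CNXT source column type."""
    return TYPE_OVERRIDES.get(source_type.lower(), source_type.lower())

# map_cnxt_data_type("NUMERIC") -> "long"
```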
2. OLAS (Online Life Administration System) Integration
Purpose: Online life insurance administration and processing
Configuration Files:
- MDFE_mapping_OLAS_CACB.csv: Customer and agent data mappings
- MDFE_mapping_OLAS_cdz_CACB.csv: CDZ-specific mappings
3. CustomerPortal Integration
Purpose: Customer self-service portal data integration
Key Entities:
- CustomerPortal_RiderDetailedInformation: Rider and benefit details
Data Flow:
CustomerPortal → mdfe_CustomerPortal → STG → CDZ → CDZ_HIST
Configuration Management
1. Pipeline Configuration Files
1.1 LoadMain.json
Purpose: Main class processor pipeline configuration
Key Activities:
- read_mdfe_config_file: Configuration file reading
- get_storage_account_name: Storage account resolution
- Data processing orchestration
1.2 RunMdfe_Forech.json
Purpose: Iterative processing pipeline for multiple mappings
Key Features:
- Lookup-driven processing
- Dynamic configuration loading
- Parallel execution support
2. Configuration Parameters
{"mdfe_processor": "com.metlife.mdfe.processors.MdfeProcessor","mdfe_bd_to_json_processor": "com.metlife.mdfe.processors.MdfeBDToJsonProcessor","config_file_path": "mdfe/apps/bd_dataplatform/configs/","mapping_file": "MDFE_mapping_clics.csv"}
Data Zones and Storage Layers
1. Staging Layer (STG)
Purpose: Raw data landing and initial processing
Characteristics:
- Temporary storage for validation
- Schema preservation from source
- Data type standardization
- Initial quality checks
Example Tables:
- OLAS_OLPABDCLN_LLDAA_stg
- CustomerPortal_RiderDetailedInformation_stg
2. Customer Data Zone (CDZ)
Purpose: Cleansed and standardized data storage
Features:
- Business rule application
- Data quality enforcement
- SCD2 implementation ready
- Optimized for analytical queries
Example Schema:
CREATE TABLE bd_data_platform_cdz2.cnxt_insured_policy_info (
    POL_NUM          varchar(50),
    POL_STTS         varchar(50),
    TOT_PREM         varchar(50),
    POL_ISSUANCE_DT  timestamp,
    LST_PY_DT        timestamp,
    PLN_CD           varchar(50),
    PLN_NM           varchar(50),
    SUM_ASRD         decimal(16,2),
    PREM_AMT         decimal(16,2),
    -- Additional fields...
    ins_ts           timestamp,
    upd_ts           timestamp
);
3. Historical Layer (CDZ_HIST)
Purpose: SCD2 historical data tracking
Implementation:
- Effective date tracking
- Change reason capture
- Previous value preservation
- Audit trail maintenance
4. Derived Data Zone (DDZ)
Purpose: Analytics-ready aggregated data
Features:
- Pre-calculated metrics
- Business KPIs
- Optimized for reporting tools
- Scheduled refresh patterns
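As an example of a DDZ refresh, the PySpark sketch below rolls the CDZ policy table shown earlier up into plan-level premium metrics. The target table name and the chosen KPIs are illustrative assumptions, not the actual DDZ model.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mdfe_cdz_to_ddz_sketch").getOrCreate()

cdz = spark.table("bd_data_platform_cdz2.cnxt_insured_policy_info")

# Illustrative KPI roll-up: policy counts and premium totals per plan.
ddz = (
    cdz.groupBy("PLN_CD", "PLN_NM")
       .agg(
           F.count("POL_NUM").alias("policy_count"),
           F.sum("PREM_AMT").alias("total_premium"),
           F.sum("SUM_ASRD").alias("total_sum_assured"),
       )
       .withColumn("refresh_ts", F.current_timestamp())
)

# Target table name is illustrative.
ddz.write.mode("overwrite").saveAsTable("bd_data_platform_ddz.plan_premium_summary")
```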
Pipeline Architecture
1. Azure Data Factory Integration
Pipeline Components:
- Lookup Activities: Configuration and mapping file reading
- Filter Activities: Data subset identification
- ForEach Activities: Iterative processing
- Copy Activities: Data movement between layers
- Databricks/Spark Activities: Complex transformations
2. Execution Patterns
2.1 Batch Processing Pattern
Trigger → Config Load → Mapping Lookup → ForEach Processing → Validation → Load
2.2 Real-time Processing Pattern
Event Trigger → Stream Processing → Immediate Validation → Incremental Load
3. Error Handling
- Retry mechanisms with exponential backoff
- Dead letter queue for failed records
- Comprehensive logging and monitoring
- Alert system for critical failures
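A minimal sketch of the retry-with-exponential-backoff behavior is shown below. It is a generic Python pattern, not MDFE's internal retry handler; the function and logger names are assumptions, and dead-letter routing is left to the caller.

```python
import logging
import random
import time

log = logging.getLogger("mdfe.retry")  # logger name is illustrative

def run_with_retry(step, max_attempts: int = 4, base_delay_s: float = 2.0):
    """Run a pipeline step, retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:  # in practice, narrow this to known transient error types
            if attempt == max_attempts:
                log.error("Step failed after %d attempts: %s", attempt, exc)
                raise  # caller routes the payload to a dead-letter location and raises an alert
            delay = base_delay_s * (2 ** (attempt - 1)) + random.uniform(0, 1)
            log.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            time.sleep(delay)
```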
SCD2 Implementation
1. SCD2 Processing Logic
-- SCD2 Implementation Pattern
WITH cte AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY TRIM(cspol) ORDER BY ins_ts DESC) AS RowNo
    FROM bd_data_platform_cdz1.dbo.olas_lldcs
)
SELECT
    *,
    CASE
        WHEN c.RowNo = 1 THEN 'New'
        WHEN c.RowNo = cte2.maxRowNo THEN 'Old'
        ELSE 'Updated'
    END AS Flag
FROM cte c
INNER JOIN (
    SELECT TRIM(cspol) AS cspol, MAX(RowNo) AS maxRowNo
    FROM cte
    GROUP BY TRIM(cspol)
) cte2 ON TRIM(c.cspol) = TRIM(cte2.cspol)
2. Change Detection
- Insert: New records with current timestamp
- Update: Close existing record, insert new version
- Delete: Soft delete with end-date update
3. Effective Dating
- eff_start_dt: Record effective start date
- eff_end_dt: Record effective end date (null for current)
- ins_ts: System insertion timestamp
- upd_ts: Last update timestamp
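The PySpark sketch below illustrates the close-and-insert mechanics using the effective-dating columns above. It assumes a _hist table that shares the CDZ columns plus eff_start_dt/eff_end_dt, and it uses POL_NUM as the business key and illustrative table names; the real MDFE SCD2 job may differ.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mdfe_scd2_sketch").getOrCreate()

KEY = "POL_NUM"  # business key; illustrative
now = F.current_timestamp()

hist = spark.table("bd_data_platform_cdz2.cnxt_insured_policy_info_cdz_hist")  # name illustrative
batch = spark.table("bd_data_platform_cdz2.cnxt_insured_policy_info")          # latest CDZ load

open_rows = hist.filter(F.col("eff_end_dt").isNull())
closed_rows = hist.filter(F.col("eff_end_dt").isNotNull())
arriving = batch.select(KEY).distinct()

# Update: close the currently-open version for every key present in this batch.
newly_closed = (open_rows.join(arriving, KEY, "left_semi")
                         .withColumn("eff_end_dt", now)
                         .withColumn("upd_ts", now))
still_open = open_rows.join(arriving, KEY, "left_anti")

# Insert: each batch row becomes the new current version (eff_end_dt stays null).
new_versions = (batch.withColumn("eff_start_dt", now)
                     .withColumn("eff_end_dt", F.lit(None).cast("timestamp"))
                     .withColumn("upd_ts", now))

result = (closed_rows.unionByName(newly_closed)
                     .unionByName(still_open)
                     .unionByName(new_versions))

# Written to a staging name to avoid overwriting the table being read in the same job.
result.write.mode("overwrite").saveAsTable("bd_data_platform_cdz2.cnxt_insured_policy_info_cdz_hist_tmp")
```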
Error Handling and Quality Assurance
1. Data Quality Checks
Pre-Processing Validation:
- Schema validation
- Data type verification
- Business rule validation
- Completeness checks
Post-Processing Validation:
- Record count reconciliation
- Data integrity verification
- Business logic validation
- Audit trail completeness
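A simple sketch of the record count reconciliation check is shown below, assuming the layers are exposed as Spark tables; the function name and tolerance handling are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mdfe_recon_sketch").getOrCreate()

def reconcile_counts(source_table: str, target_table: str, tolerance: int = 0) -> bool:
    """Compare row counts between two layers and flag mismatches beyond the tolerance."""
    source_count = spark.table(source_table).count()
    target_count = spark.table(target_table).count()
    diff = abs(source_count - target_count)
    if diff > tolerance:
        print(f"RECON FAIL: {source_table}={source_count}, {target_table}={target_count}, diff={diff}")
        return False
    print(f"RECON OK: {source_table}={source_count}, {target_table}={target_count}")
    return True

# reconcile_counts("bd_data_platform_stg.olas_olpabdcln_lldaa_stg",
#                  "bd_data_platform_cdz2.olas_olpabdcln_lldaa_cdz")  # table names illustrative
```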
2. Error Categories
Critical Errors:
- Schema mismatches
- Connection failures
- Authentication issues
- Resource unavailability
Warning-Level Issues:
- Data quality concerns
- Performance degradation
- Capacity constraints
- Configuration inconsistencies
3. Monitoring and Alerting
- Real-time pipeline monitoring
- Performance metrics tracking
- Error rate monitoring
- Capacity utilization alerts
Deployment and Operations
1. Environment Management
Development Environment:
- Feature development and testing
- Unit test execution
- Code quality validation
- Peer review processes
Staging Environment:
- Integration testing
- Performance validation
- User acceptance testing
- Production simulation
Production Environment:
- Live data processing
- Real-time monitoring
- Automated backup and recovery
- Incident response procedures
2. Deployment Process
Steps:
- Code commit and review
- Automated testing execution
- Security and compliance validation
- Staging deployment and validation
- Production deployment approval
- Production deployment execution
- Post-deployment monitoring
3. Operational Procedures
Daily Operations:
- Pipeline execution monitoring
- Performance metrics review
- Error log analysis
- Capacity planning
Weekly Operations:
- System health assessment
- Performance trend analysis
- Capacity utilization review
- Security audit
Monthly Operations:
- Comprehensive system review
- Documentation updates
- Process optimization
- Training and knowledge transfer
Best Practices
1. Configuration Management
- Version control for all configuration files
- Environment-specific parameter management
- Automated configuration validation
- Change documentation requirements
2. Performance Optimization
- Parallel processing where possible
- Efficient data partitioning strategies
- Index optimization for query performance
- Resource allocation optimization
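One possible partitioning approach for large CDZ tables is sketched below: writing the policy table partitioned by issuance year so analytical queries can prune partitions. The partition column and target table name are illustrative choices, not the platform's standard.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mdfe_partitioning_sketch").getOrCreate()

cdz = spark.table("bd_data_platform_cdz2.cnxt_insured_policy_info")

# Partition by a coarse, frequently-filtered column (issuance year) so downstream
# queries scan only the relevant partitions instead of the full table.
(cdz.withColumn("pol_issuance_yr", F.year("POL_ISSUANCE_DT"))
    .repartition("pol_issuance_yr")
    .write.mode("overwrite")
    .partitionBy("pol_issuance_yr")
    .saveAsTable("bd_data_platform_cdz2.cnxt_insured_policy_info_part"))  # target name illustrative
```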
3. Security and Compliance
- Data encryption in transit and at rest
- Access control implementation
- Audit trail maintenance
- Compliance reporting automation
4. Monitoring and Observability
- Comprehensive logging implementation
- Performance metrics collection
- Alert threshold configuration
- Dashboard and reporting setup
Troubleshooting Guide
1. Common Issues and Resolutions
Pipeline Failures:
Symptom: Pipeline execution failures
Causes:
- Configuration errors
- Resource constraints
- Data quality issues
- Network connectivity problems
Resolution Steps:
- Check pipeline logs for specific error messages
- Validate configuration files and parameters
- Verify resource availability and capacity
- Test network connectivity to source systems
- Examine data quality reports for issues
Performance Issues:
Symptom: Slow processing times
Causes:
- Insufficient resources
- Inefficient queries
- Large data volumes
- Network latency
Resolution Steps:
- Analyze resource utilization metrics
- Review query execution plans
- Implement data partitioning strategies
- Optimize network configuration
Data Quality Issues:
Symptom: Incorrect or missing data in target systems
Causes:
- Source system issues
- Mapping configuration errors
- Business rule violations
- Processing logic errors
Resolution Steps:
- Validate source data quality
- Review mapping configurations
- Test business rule implementations
- Analyze processing logic for errors
2. Diagnostic Tools and Techniques
Log Analysis:
- Pipeline execution logs
- System performance logs
- Error and warning logs
- Audit trail analysis
Performance Monitoring:
- Resource utilization monitoring
- Query performance analysis
- Network latency measurement
- Data volume trending
Data Validation:
- Schema validation tools
- Data profiling utilities
- Business rule testing
- Reconciliation reports
Appendices
Appendix A: System Acronyms and Definitions
| Acronym | Full Form | Definition |
|---|---|---|
| MDFE | MetLife Data Framework Engine | Core data processing framework |
| CNXT | Context/LifeLine | Life insurance policy management system |
| OLAS | Online Life Administration System | Online insurance administration platform |
| CDZ | Customer Data Zone | Cleansed and standardized data layer |
| DDZ | Derived Data Zone | Analytics-ready data layer |
| STG | Staging | Raw data landing layer |
| SCD2 | Slowly Changing Dimension Type 2 | Historical data tracking methodology |
| BD | Big Data | Large-scale data processing initiative |
Appendix B: Configuration File Templates
B.1 Mapping File Template
sor,app_id,mapping_id
[SOURCE_SYSTEM],[APPLICATION_ID],[MAPPING_IDENTIFIER]
B.2 Pipeline Parameter Template
{"mdfe_processor": "com.metlife.mdfe.processors.MdfeProcessor","config_file_path": "[CONFIG_PATH]","config_file_name": "[CONFIG_FILE]","storage_account": "[STORAGE_ACCOUNT]","container_name": "[CONTAINER_NAME]"}
Appendix C: Performance Benchmarks
Processing Volume Benchmarks:
- Small Dataset: < 100MB, Processing Time: < 5 minutes
- Medium Dataset: 100MB - 1GB, Processing Time: < 30 minutes
- Large Dataset: 1GB - 10GB, Processing Time: < 2 hours
- Enterprise Dataset: > 10GB, Processing Time: < 8 hours
Resource Utilization Targets:
- CPU Utilization: 70-85% during peak processing
- Memory Utilization: 75-90% of allocated resources
- Storage I/O: Optimized for sequential read/write patterns
- Network Bandwidth: Efficient utilization with compression
Appendix D: Contact Information and Support
Development Team:
- Platform Architecture: [Contact Information]
- Data Engineering: [Contact Information]
- Operations Support: [Contact Information]
- Business Analysis: [Contact Information]
Support Channels:
- Production Issues: [Emergency Contact]
- Development Support: [Development Team Contact]
- Business Questions: [Business Team Contact]
- Documentation Updates: [Technical Writing Team]
Document Version: 1.0
Last Updated: October 19, 2025
Document Owner: BD Data Platform Team
Review Cycle: Quarterly
This documentation is proprietary to MetLife and contains confidential business information. Distribution should be limited to authorized personnel only.