MDFE Architecture Overview

Published: October 19, 2025


MDFE (MetLife Data Framework Engine) - Comprehensive Documentation

Table of Contents

  1. Executive Summary
  2. Architecture Overview
  3. Core Components
  4. Data Flow and Processing
  5. System Integration
  6. Configuration Management
  7. Data Zones and Storage Layers
  8. Pipeline Architecture
  9. SCD2 Implementation
  10. Error Handling and Quality Assurance
  11. Deployment and Operations
  12. Best Practices
  13. Troubleshooting Guide
  14. Appendices

Executive Summary

MDFE (MetLife Data Framework Engine) is a comprehensive, enterprise-grade data processing and integration platform developed by MetLife for its BD (Big Data) Data Platform. MDFE serves as the central orchestration engine for data ingestion, transformation, and distribution across multiple source systems and data warehouses.

Key Capabilities:

  • Multi-System Integration: Seamlessly connects CNXT, OLAS, CustomerPortal, and other MetLife systems
  • Azure Synapse Integration: Native support for Azure Synapse Analytics and Spark processing
  • SCD2 Support: Built-in Slowly Changing Dimension Type 2 implementation for historical data tracking
  • Mapping-Driven Architecture: Configuration-based approach using CSV mapping files
  • Real-time and Batch Processing: Supports both streaming and batch data processing patterns

Architecture Overview

High-Level Architecture

  ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
  │ Source Systems  │     │      MDFE       │     │ Target Systems  │
  │                 │     │    Framework    │     │                 │
  │ CNXT/LifeLine   │────▶│                 │────▶│ Azure Synapse   │
  │ OLAS            │     │ Processors      │     │ CDZ/DDZ         │
  │ CustomerPortal  │     │ Mapping Mgmt    │     │ Data Lake       │
  │ Agent Systems   │     │ Pipeline Orch   │     │ Analytics       │
  └─────────────────┘     └─────────────────┘     └─────────────────┘

Technology Stack:

  • Processing Engine: Java-based processors with Apache Spark integration
  • Orchestration: Azure Data Factory pipelines
  • Storage: Azure Synapse Analytics, Azure Data Lake Storage
  • Configuration: CSV-based mapping files and JSON pipeline definitions
  • Monitoring: Built-in quality assurance and reconciliation processes

Core Components

1. MDFE Processors

1.1 MdfeProcessor

Class: com.metlife.mdfe.processors.MdfeProcessor

Purpose: Main processing engine for data transformation and routing

Key Responsibilities:

  • Data ingestion from source systems
  • Schema mapping and transformation
  • Data validation and quality checks
  • Routing to appropriate target systems

1.2 MdfeBDToJsonProcessor

Class: com.metlife.mdfe.processors.MdfeBDToJsonProcessor

Purpose: Specialized processor for converting Big Data formats to JSON

Key Responsibilities:

  • Big Data to JSON conversion
  • Structure preservation during transformation
  • Metadata management
  • Performance optimization for large datasets

2. Mapping Management System

2.1 Mapping File Structure

All mapping configurations follow a standardized CSV format:

  sor,app_id,mapping_id
  MDFE,bdp_agent_bd_olas,DAETLLIB_LLDCA_STG
  CustomerPortal,mdfe_CustomerPortal,CustomerPortal_RiderDetailedInformation_stg

Field Definitions:

  • sor: System of Record identifier
  • app_id: Application identifier for processing context
  • mapping_id: Specific table or entity mapping identifier

2.2 Naming Conventions

  • STG: Staging layer tables (e.g., OLAS_OLPABDCLN_LLDAA_stg)
  • CDZ: Customer Data Zone tables (e.g., OLAS_OLPABDCLN_LLDAA_cdz)
  • HIST: Historical tables for SCD2 (e.g., OLAS_OLPABDCLN_LLDAA_cdz_hist)
  • DDZ: Derived Data Zone for analytics-ready data

Data Flow and Processing

1. Data Ingestion Flow

  Source System → MDFE Processor → Staging (STG) → Validation → CDZ → DDZ

2. Processing Stages

Stage 1: Data Ingestion

  • Raw data extraction from source systems
  • Initial schema validation
  • Data type conversion and standardization
  • Load into staging tables (_stg suffix)

Stage 2: Data Transformation

  • Business rule application
  • Data cleansing and standardization
  • Derived field calculations
  • Quality assurance checks

Stage 3: Data Storage

  • Load into Customer Data Zone (CDZ)
  • Historical tracking implementation (SCD2)
  • Metadata update and lineage tracking

Stage 4: Analytics Preparation

  • Aggregation and summarization
  • Load into Derived Data Zone (DDZ)
  • Index optimization for query performance

System Integration

1. CNXT (Context/LifeLine) Integration

Purpose: Life insurance policy and customer data management

Key Tables:

  • OLPABDDATA_LADPH: Policy administration data
  • OLPABDCLN_LLBAGTBNK: Agent banking information
  • OLPABDCLN_LLDAA: Policy details
  • OLPABDCLN_LLDBC: Beneficiary information
  • OLPABDCLN_LLDCS: Customer service records

Processing Pattern:

  -- Example CNXT query pattern for schema mapping
  WITH cte AS (
      SELECT
          REPLACE(REPLACE([Library], '[', ''), ']', '') AS [Library],
          REPLACE(REPLACE([Table_Name], '[', ''), ']', '') AS Table_Name,
          REPLACE(REPLACE([Column_Name], '[', ''), ']', '') AS Column_Name,
          LOWER(REPLACE(REPLACE([Data_Type], '[', ''), ']', '')) AS Data_Type
      FROM [PAAS_Anomaly_Report].[dbo].[CNXT]
  )
  SELECT
      Column_Name AS columnName,
      CASE
          WHEN Data_Type = 'decimal' THEN 'decimal(16,2)'
          WHEN Data_Type = 'char' THEN 'varchar(400)'
          WHEN Data_Type = 'numeric' THEN 'long'
          WHEN Data_Type = 'nvarchar' THEN 'varchar(400)'
          ELSE Data_Type
      END AS dataType
  FROM cte
  WHERE Table_Name = REPLACE(REPLACE('{0}', 'OLPABDCLN_', ''), 'OLPABDDATA_', '');

2. OLAS (Online Life Administration System) Integration

Purpose: Online life insurance administration and processing

Configuration Files:

  • MDFE_mapping_OLAS_CACB.csv: Customer and agent data mappings
  • MDFE_mapping_OLAS_cdz_CACB.csv: CDZ-specific mappings

3. CustomerPortal Integration

Purpose: Customer self-service portal data integration

Key Entities:

  • CustomerPortal_RiderDetailedInformation: Rider and benefit details

Data Flow:

  CustomerPortal → mdfe_CustomerPortal → STG → CDZ → CDZ_HIST

Configuration Management

1. Pipeline Configuration Files

1.1 LoadMain.json

Purpose: Main class processor pipeline configuration

Key Activities:

  • read_mdfe_config_file: Configuration file reading
  • get_storage_account_name: Storage account resolution
  • Data processing orchestration

1.2 RunMdfe_Forech.json

Purpose: Iterative processing pipeline for multiple mappings

Key Features:

  • Lookup-driven processing
  • Dynamic configuration loading
  • Parallel execution support

2. Configuration Parameters

  {
    "mdfe_processor": "com.metlife.mdfe.processors.MdfeProcessor",
    "mdfe_bd_to_json_processor": "com.metlife.mdfe.processors.MdfeBDToJsonProcessor",
    "config_file_path": "mdfe/apps/bd_dataplatform/configs/",
    "mapping_file": "MDFE_mapping_clics.csv"
  }

Data Zones and Storage Layers

1. Staging Layer (STG)

Purpose: Raw data landing and initial processing

Characteristics:

  • Temporary storage for validation
  • Schema preservation from source
  • Data type standardization
  • Initial quality checks

Example Tables:

  • OLAS_OLPABDCLN_LLDAA_stg
  • CustomerPortal_RiderDetailedInformation_stg

2. Customer Data Zone (CDZ)

Purpose: Cleansed and standardized data storage

Features:

  • Business rule application
  • Data quality enforcement
  • SCD2 implementation ready
  • Optimized for analytical queries

Example Schema:

  CREATE TABLE bd_data_platform_cdz2.cnxt_insured_policy_info (
      POL_NUM          varchar(50),
      POL_STTS         varchar(50),
      TOT_PREM         varchar(50),
      POL_ISSUANCE_DT  timestamp,
      LST_PY_DT        timestamp,
      PLN_CD           varchar(50),
      PLN_NM           varchar(50),
      SUM_ASRD         decimal(16,2),
      PREM_AMT         decimal(16,2),
      -- Additional fields...
      ins_ts           timestamp,
      upd_ts           timestamp
  );

3. Historical Layer (CDZ_HIST)

Purpose: SCD2 historical data tracking

Implementation:

  • Effective date tracking
  • Change reason capture
  • Previous value preservation
  • Audit trail maintenance
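
A minimal sketch of what such a history table could look like, assuming it mirrors the CDZ schema shown above and adds the SCD2 effective-dating columns described later in this document; the table name and column placement are illustrative, not taken from the framework:

  -- Illustrative sketch only: a history table mirroring the CDZ example table
  -- and adding SCD2 effective-dating columns; names and layout are assumed.
  CREATE TABLE bd_data_platform_cdz2.cnxt_insured_policy_info_hist (
      POL_NUM        varchar(50),
      POL_STTS       varchar(50),
      SUM_ASRD       decimal(16,2),
      PREM_AMT       decimal(16,2),
      -- ...remaining CDZ columns...
      eff_start_dt   timestamp,   -- record effective start date
      eff_end_dt     timestamp,   -- NULL while this version is current
      ins_ts         timestamp,   -- system insertion timestamp
      upd_ts         timestamp    -- last update timestamp
  );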

4. Derived Data Zone (DDZ)

Purpose: Analytics-ready aggregated data

Features:

  • Pre-calculated metrics
  • Business KPIs
  • Optimized for reporting tools
  • Scheduled refresh patterns
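
As a hedged illustration of the kind of pre-calculated metric the DDZ might hold, the sketch below rolls the CDZ example table from the previous section up to a plan-level premium summary; the DDZ schema name, table name, and grain are assumptions, not part of the documented framework:

  -- Illustrative only: plan-level premium summary derived from the CDZ example table.
  -- The DDZ schema/table name and aggregation grain are assumptions.
  CREATE TABLE bd_data_platform_ddz.plan_premium_summary AS
  SELECT
      PLN_CD,
      PLN_NM,
      COUNT(DISTINCT POL_NUM)  AS policy_count,
      SUM(PREM_AMT)            AS total_premium,
      SUM(SUM_ASRD)            AS total_sum_assured,
      CURRENT_TIMESTAMP        AS refresh_ts
  FROM bd_data_platform_cdz2.cnxt_insured_policy_info
  GROUP BY PLN_CD, PLN_NM;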

Pipeline Architecture

1. Azure Data Factory Integration

Pipeline Components:

  1. Lookup Activities: Configuration and mapping file reading
  2. Filter Activities: Data subset identification
  3. ForEach Activities: Iterative processing
  4. Copy Activities: Data movement between layers
  5. Databricks/Spark Activities: Complex transformations

2. Execution Patterns

2.1 Batch Processing Pattern

  Trigger → Config Load → Mapping Lookup → ForEach Processing → Validation → Load

2.2 Real-time Processing Pattern

  Event Trigger → Stream Processing → Immediate Validation → Incremental Load

3. Error Handling

  • Retry mechanisms with exponential backoff
  • Dead letter queue for failed records
  • Comprehensive logging and monitoring
  • Alert system for critical failures
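
One common way to realize the dead-letter idea above at the storage layer is a rejected-records table that reprocessing jobs can read from. The sketch below is purely an assumption about how such a table could be shaped; it does not describe MDFE's actual error store:

  -- Hypothetical dead-letter table for rows that fail validation; not an MDFE artifact.
  CREATE TABLE bd_data_platform_cdz2.mdfe_rejected_records (
      source_system   varchar(50),    -- sor value from the mapping file, e.g. 'OLAS'
      mapping_id      varchar(200),   -- mapping_id being processed when the row failed
      record_payload  varchar(8000),  -- offending row serialized as JSON
      error_category  varchar(50),    -- e.g. 'SCHEMA_MISMATCH', 'RULE_VIOLATION'
      error_message   varchar(2000),
      ins_ts          timestamp
  );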

SCD2 Implementation

1. SCD2 Processing Logic

  -- SCD2 implementation pattern: flag each policy record as New, Old, or Updated
  WITH cte AS (
      SELECT *,
          ROW_NUMBER() OVER (PARTITION BY TRIM(cspol) ORDER BY ins_ts DESC) AS RowNo
      FROM bd_data_platform_cdz1.dbo.olas_lldcs
  )
  SELECT
      c.*,
      CASE
          WHEN c.RowNo = 1 THEN 'New'
          WHEN c.RowNo = cte2.maxRowNo THEN 'Old'
          ELSE 'Updated'
      END AS Flag
  FROM cte c
  INNER JOIN (
      SELECT TRIM(cspol) AS cspol, MAX(RowNo) AS maxRowNo
      FROM cte
      GROUP BY TRIM(cspol)
  ) cte2 ON TRIM(c.cspol) = TRIM(cte2.cspol);

2. Change Detection

  • Insert: New records with current timestamp
  • Update: Close existing record, insert new version
  • Delete: Soft delete with end-date update

3. Effective Dating

  • eff_start_dt: Record effective start date
  • eff_end_dt: Record effective end date (null for current)
  • ins_ts: System insertion timestamp
  • upd_ts: Last update timestamp
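
A minimal sketch of how the close-and-insert step could be expressed with these columns, reusing the olas_lldcs example from the query above; the delta and history table names and column order are assumptions, and the statements MDFE actually generates may differ:

  -- Sketch of the SCD2 'Updated' path: close the open history row, then insert the new version.
  -- olas_lldcs_hist and olas_lldcs_delta are assumed names; column order must match the target.
  UPDATE bd_data_platform_cdz1.dbo.olas_lldcs_hist
  SET eff_end_dt = CURRENT_TIMESTAMP,
      upd_ts     = CURRENT_TIMESTAMP
  WHERE eff_end_dt IS NULL
    AND TRIM(cspol) IN (SELECT TRIM(cspol) FROM bd_data_platform_cdz1.dbo.olas_lldcs_delta);

  INSERT INTO bd_data_platform_cdz1.dbo.olas_lldcs_hist
  SELECT d.*,
         CURRENT_TIMESTAMP       AS eff_start_dt,
         CAST(NULL AS timestamp) AS eff_end_dt
  FROM bd_data_platform_cdz1.dbo.olas_lldcs_delta AS d;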

Error Handling and Quality Assurance

1. Data Quality Checks

Pre-Processing Validation:

  • Schema validation
  • Data type verification
  • Business rule validation
  • Completeness checks

Post-Processing Validation:

  • Record count reconciliation
  • Data integrity verification
  • Business logic validation
  • Audit trail completeness
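
For example, a record-count reconciliation between the staging and CDZ layers can be expressed as a simple comparison query; the schema names below are assumptions, and the entity reuses the OLAS_OLPABDCLN_LLDAA example from earlier sections:

  -- Simple count reconciliation between staging and CDZ for a single entity.
  -- Schema names are assumed; substitute the tables for the mapping_id being checked.
  SELECT
      stg.cnt           AS stg_count,
      cdz.cnt           AS cdz_count,
      stg.cnt - cdz.cnt AS record_difference
  FROM (SELECT COUNT(*) AS cnt FROM bd_data_platform_stg.olas_olpabdcln_lldaa_stg) AS stg
  CROSS JOIN
       (SELECT COUNT(*) AS cnt FROM bd_data_platform_cdz1.olas_olpabdcln_lldaa_cdz) AS cdz;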

2. Error Categories

Critical Errors:

  • Schema mismatches
  • Connection failures
  • Authentication issues
  • Resource unavailability

Warning-Level Issues:

  • Data quality concerns
  • Performance degradation
  • Capacity constraints
  • Configuration inconsistencies

3. Monitoring and Alerting

  • Real-time pipeline monitoring
  • Performance metrics tracking
  • Error rate monitoring
  • Capacity utilization alerts

Deployment and Operations

1. Environment Management

Development Environment:

  • Feature development and testing
  • Unit test execution
  • Code quality validation
  • Peer review processes

Staging Environment:

  • Integration testing
  • Performance validation
  • User acceptance testing
  • Production simulation

Production Environment:

  • Live data processing
  • Real-time monitoring
  • Automated backup and recovery
  • Incident response procedures

2. Deployment Process

Steps:

  1. Code commit and review
  2. Automated testing execution
  3. Security and compliance validation
  4. Staging deployment and validation
  5. Production deployment approval
  6. Production deployment execution
  7. Post-deployment monitoring

3. Operational Procedures

Daily Operations:

  • Pipeline execution monitoring
  • Performance metrics review
  • Error log analysis
  • Capacity planning

Weekly Operations:

  • System health assessment
  • Performance trend analysis
  • Capacity utilization review
  • Security audit

Monthly Operations:

  • Comprehensive system review
  • Documentation updates
  • Process optimization
  • Training and knowledge transfer

Best Practices

1. Configuration Management

  • Version control for all configuration files
  • Environment-specific parameter management
  • Automated configuration validation
  • Change documentation requirements

2. Performance Optimization

  • Parallel processing where possible
  • Efficient data partitioning strategies
  • Index optimization for query performance
  • Resource allocation optimization

3. Security and Compliance

  • Data encryption in transit and at rest
  • Access control implementation
  • Audit trail maintenance
  • Compliance reporting automation

4. Monitoring and Observability

  • Comprehensive logging implementation
  • Performance metrics collection
  • Alert threshold configuration
  • Dashboard and reporting setup

Troubleshooting Guide

1. Common Issues and Resolutions

Pipeline Failures:

Symptom: Pipeline execution failures
Causes:

  • Configuration errors
  • Resource constraints
  • Data quality issues
  • Network connectivity problems

Resolution Steps:

  1. Check pipeline logs for specific error messages
  2. Validate configuration files and parameters
  3. Verify resource availability and capacity
  4. Test network connectivity to source systems
  5. Examine data quality reports for issues

Performance Issues:

Symptom: Slow processing times
Causes:

  • Insufficient resources
  • Inefficient queries
  • Large data volumes
  • Network latency

Resolution Steps:

  1. Analyze resource utilization metrics
  2. Review query execution plans
  3. Implement data partitioning strategies
  4. Optimize network configuration

Data Quality Issues:

Symptom: Incorrect or missing data in target systems
Causes:

  • Source system issues
  • Mapping configuration errors
  • Business rule violations
  • Processing logic errors

Resolution Steps:

  1. Validate source data quality
  2. Review mapping configurations
  3. Test business rule implementations
  4. Analyze processing logic for errors

2. Diagnostic Tools and Techniques

Log Analysis:

  • Pipeline execution logs
  • System performance logs
  • Error and warning logs
  • Audit trail analysis

Performance Monitoring:

  • Resource utilization monitoring
  • Query performance analysis
  • Network latency measurement
  • Data volume trending

Data Validation:

  • Schema validation tools
  • Data profiling utilities
  • Business rule testing
  • Reconciliation reports

Appendices

Appendix A: System Acronyms and Definitions

  Acronym | Full Form                         | Definition
  --------|-----------------------------------|------------------------------------------
  MDFE    | MetLife Data Framework Engine     | Core data processing framework
  CNXT    | Context/LifeLine                  | Life insurance policy management system
  OLAS    | Online Life Administration System | Online insurance administration platform
  CDZ     | Customer Data Zone                | Cleansed and standardized data layer
  DDZ     | Derived Data Zone                 | Analytics-ready data layer
  STG     | Staging                           | Raw data landing layer
  SCD2    | Slowly Changing Dimension Type 2  | Historical data tracking methodology
  BD      | Big Data                          | Large-scale data processing initiative

Appendix B: Configuration File Templates

B.1 Mapping File Template

  sor,app_id,mapping_id
  [SOURCE_SYSTEM],[APPLICATION_ID],[MAPPING_IDENTIFIER]

B.2 Pipeline Parameter Template

  {
    "mdfe_processor": "com.metlife.mdfe.processors.MdfeProcessor",
    "config_file_path": "[CONFIG_PATH]",
    "config_file_name": "[CONFIG_FILE]",
    "storage_account": "[STORAGE_ACCOUNT]",
    "container_name": "[CONTAINER_NAME]"
  }

Appendix C: Performance Benchmarks

Processing Volume Benchmarks:

  • Small Dataset: < 100MB, Processing Time: < 5 minutes
  • Medium Dataset: 100MB - 1GB, Processing Time: < 30 minutes
  • Large Dataset: 1GB - 10GB, Processing Time: < 2 hours
  • Enterprise Dataset: > 10GB, Processing Time: < 8 hours

Resource Utilization Targets:

  • CPU Utilization: 70-85% during peak processing
  • Memory Utilization: 75-90% of allocated resources
  • Storage I/O: Optimized for sequential read/write patterns
  • Network Bandwidth: Efficient utilization with compression

Appendix D: Contact Information and Support

Development Team:

  • Platform Architecture: [Contact Information]
  • Data Engineering: [Contact Information]
  • Operations Support: [Contact Information]
  • Business Analysis: [Contact Information]

Support Channels:

  • Production Issues: [Emergency Contact]
  • Development Support: [Development Team Contact]
  • Business Questions: [Business Team Contact]
  • Documentation Updates: [Technical Writing Team]

Document Version: 1.0
Last Updated: October 19, 2025
Document Owner: BD Data Platform Team
Review Cycle: Quarterly


This documentation is proprietary to MetLife and contains confidential business information. Distribution should be limited to authorized personnel only.