Hybrid GA and Deep Feature Pipeline for Robust Facial Gender Recognition

Abstract

Facial gender classification — predicting whether a face belongs to a male or female — is a fundamental task in computer vision with applications in human–computer interaction, demographic analysis, and security systems. This article explores a pipeline that combines feature selection and Genetic Algorithms (GAs) to improve classification accuracy, reduce model complexity, and increase robustness. We detail preprocessing, feature extraction, GA-based feature selection and hyperparameter tuning, classifier choices, evaluation metrics, experiments, results, and future directions.


1. Introduction

Facial gender classification has matured alongside advances in machine learning and deep learning. Traditionally, approaches ranged from handcrafted features (LBP, HOG, SIFT) with shallow classifiers (SVM, k-NN) to end-to-end deep neural networks. While deep models often yield the highest accuracy, they can be computationally expensive and data-hungry. Feature selection remains valuable: it reduces dimensionality, limits overfitting, and highlights informative attributes. Genetic Algorithms (GAs) provide a flexible, population-based optimization technique well-suited to selecting feature subsets and tuning classifier hyperparameters simultaneously.


2. Problem Formulation

Given an input image containing a face, the objective is to assign a binary label (male/female). The pipeline considered here follows these stages:

  • Face detection and alignment
  • Feature extraction (handcrafted and/or deep features)
  • Feature selection using a GA
  • Classification using a chosen model (e.g., SVM, Random Forest, shallow MLP, or fine-tuned CNN)
  • Evaluation using accuracy, precision, recall, F1, ROC-AUC, and confusion matrix analysis

3. Data Preparation

  • Datasets: Common choices include Adience, IMDB-WIKI, CelebA, and UTKFace. Ensure balanced splits or apply class weighting/sampling to mitigate imbalance.
  • Preprocessing: Detect faces (MTCNN, Haar cascades, or Dlib), crop and align using facial landmarks, resize to a consistent input size (e.g., 128×128), and normalize pixel intensities.
  • Augmentation: Apply random flips, brightness/contrast jitter, slight rotations, and small translations to increase robustness.
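
A minimal sketch of the detection, cropping, resizing, normalization, and augmentation steps above, assuming OpenCV and its bundled Haar cascade (the 128×128 target size and jitter ranges are illustrative choices; MTCNN or Dlib with landmark alignment would slot in at the same point):

```python
import cv2
import numpy as np

def preprocess_face(image_path, size=(128, 128)):
    """Detect the largest face, crop it, resize to `size`, and scale pixels to [0, 1]."""
    # Haar cascade bundled with OpenCV; MTCNN/Dlib + landmark alignment are drop-in alternatives.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread(image_path)
    if img is None:
        raise IOError(f"could not read {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                            # no face detected
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])   # keep the largest detection
    face = cv2.resize(img[y:y + h, x:x + w], size)
    return face.astype(np.float32) / 255.0

def augment(face, rng=np.random.default_rng()):
    """Cheap augmentations: random horizontal flip and brightness jitter."""
    if rng.random() < 0.5:
        face = face[:, ::-1, :]                                  # horizontal flip
    return np.clip(face * rng.uniform(0.8, 1.2), 0.0, 1.0)      # brightness jitter
```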

4. Feature Extraction

Two complementary strategies can be used:

4.1 Handcrafted features

  • Local Binary Patterns (LBP): captures local texture useful for gender cues.
  • Histogram of Oriented Gradients (HOG): encodes shape and gradient structure.
  • Gabor filters: capture multi-scale orientation information.
  • Facial landmarks distances/ratios: geometric features (eye-to-mouth distance, jawline angles).
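
As a concrete illustration of the first two feature types, here is a small sketch using scikit-image; the LBP radius, HOG cell sizes, and histogram binning are assumptions rather than tuned settings, and Gabor responses or landmark ratios would be appended to the same vector:

```python
import numpy as np
from skimage.feature import local_binary_pattern, hog

def handcrafted_features(gray_face):
    """gray_face: 2-D grayscale face crop (e.g., 128x128) -> 1-D feature vector."""
    # Uniform LBP with P=8 neighbors has P+2 = 10 pattern bins.
    lbp = local_binary_pattern(gray_face, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)

    # HOG encodes the gradient orientation structure of the face.
    hog_vec = hog(gray_face, orientations=9, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2), feature_vector=True)

    return np.concatenate([lbp_hist, hog_vec])
```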

4.2 Deep features

  • Use pre-trained CNNs (VGGFace, ResNet, or MobileNet) as feature extractors: take activations from intermediate layers or global-pooled embeddings (e.g., 512-d vectors). Deep embeddings often provide strong discriminative power and are compact compared to raw pixels.
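
A minimal sketch of deep-feature extraction, using an ImageNet-pretrained ResNet-18 from torchvision as a stand-in (its global-pooled embedding happens to be 512-d); a face-specific backbone such as VGGFace would be wired up the same way:

```python
import torch
import torchvision
from torchvision import transforms

# ImageNet-pretrained ResNet-18 as a stand-in extractor (512-d pooled embedding).
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()        # drop the classification head, keep the embedding
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),            # HxWxC uint8 -> CxHxW float in [0, 1]
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def deep_features(face_rgb):
    """face_rgb: HxWx3 uint8 RGB face crop -> 512-d NumPy embedding."""
    x = preprocess(face_rgb).unsqueeze(0)      # shape (1, 3, 224, 224)
    return model(x).squeeze(0).numpy()         # shape (512,)
```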

Combining handcrafted and deep features can improve generalization: concatenate normalized feature vectors, then perform selection to remove redundancy.
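
A sketch of that fusion step, assuming the deep and handcrafted features have already been extracted into per-face rows (the shapes below follow the example in Section 8):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-ins for real extracted features; shapes follow the Section 8 example.
rng = np.random.default_rng(0)
deep_X = rng.normal(size=(1000, 512))    # one deep embedding per face
hand_X = rng.normal(size=(1000, 59))     # LBP/HOG/landmark features per face

# Standardize each block separately so neither dominates by scale,
# then concatenate into the matrix the GA selects columns from.
fused_X = np.hstack([
    StandardScaler().fit_transform(deep_X),
    StandardScaler().fit_transform(hand_X),
])
print(fused_X.shape)   # (1000, 571)
```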


5. Genetic Algorithm for Feature Selection

GAs search the combinatorial space of feature subsets via biologically inspired operators. Key components:

  • Representation: Binary chromosome where each gene indicates inclusion (1) or exclusion (0) of a feature dimension. For continuous hyperparameter tuning, append real-valued genes or use separate chromosomes.
  • Population: Typically 20–200 individuals, depending on feature dimensionality and compute budget.
  • Fitness function: Evaluate classification performance (e.g., cross-validated F1-score or accuracy) on the selected feature subset, and add a complexity penalty to favor smaller subsets (see the sketch after this list):
    fitness = alpha * performance - beta * (|selected_features| / total_features)
    Choose alpha and beta to balance accuracy against compactness.
  • Selection: Tournament selection or roulette-wheel selection.
  • Crossover: Single-point or uniform crossover to combine parents.
  • Mutation: Bit-flip with a low probability (e.g., 0.01–0.05) to maintain diversity.
  • Elitism: Preserve top-k individuals each generation to retain best solutions.
  • Termination: Fixed number of generations (50–200), or stop when improvement stalls.
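
A minimal sketch of such a fitness function, assuming scikit-learn, an RBF SVM as the wrapped classifier, and the alpha = 0.9, beta = 0.1 weighting used in the example experiment of Section 8:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(mask, X, y, alpha=0.9, beta=0.1):
    """Score a binary chromosome `mask` over the columns of X.

    Implements fitness = alpha * performance - beta * (|selected| / total),
    with performance measured as stratified cross-validated F1 of an RBF SVM.
    """
    if not mask.any():                        # an empty feature subset is worthless
        return -np.inf
    f1 = cross_val_score(SVC(kernel="rbf"), X[:, mask], y,
                         cv=3, scoring="f1").mean()
    return alpha * f1 - beta * (mask.sum() / mask.size)
```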

Practical tips:

  • Use stratified k-fold cross-validation within fitness evaluation to reduce variance.
  • Cache classifier results for identical chromosomes to avoid redundant training.
  • If feature dimensionality is very large (e.g., deep embeddings × many scales), consider a two-stage GA: first a coarse selection over feature groups, then a fine-grained selection.
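
Putting the components and tips together, below is a compact, self-contained GA loop. The synthetic dataset, small population, and 3-fold fitness are deliberately scaled down for illustration; a real run would plug in the fused face features and the budgets from Section 8. It also demonstrates the caching tip: identical chromosomes are evaluated only once.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=60, n_informative=15,
                           random_state=0)          # stand-in for fused face features

cache = {}                                          # chromosome bytes -> fitness
def fitness(mask):
    key = mask.tobytes()
    if key not in cache:                            # avoid retraining identical subsets
        if not mask.any():
            cache[key] = -np.inf
        else:
            f1 = cross_val_score(SVC(kernel="rbf"), X[:, mask], y,
                                 cv=3, scoring="f1").mean()
            cache[key] = 0.9 * f1 - 0.1 * mask.mean()
    return cache[key]

def tournament(pop, scores, k=3):
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmax(scores[idx])]].copy()

POP, GENS, MUT, ELITE = 30, 25, 0.02, 2             # small budget for the sketch
pop = rng.random((POP, X.shape[1])) < 0.5           # random binary chromosomes
for gen in range(GENS):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    next_pop = [pop[i].copy() for i in order[:ELITE]]    # elitism
    while len(next_pop) < POP:
        p1, p2 = tournament(pop, scores), tournament(pop, scores)
        swap = rng.random(X.shape[1]) < 0.5              # uniform crossover
        child = np.where(swap, p1, p2)
        child ^= rng.random(X.shape[1]) < MUT            # bit-flip mutation
        next_pop.append(child)
    pop = np.array(next_pop)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", int(best.sum()), "of", X.shape[1])
```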

6. Classifier Choices and Integration with GA

Popular classifiers to pair with GA-selected features:

  • Support Vector Machine (SVM) with RBF or linear kernel: robust for moderate-dimensional features.
  • Random Forest (RF): handles mixed feature types and gives feature importance for interpretability.
  • Gradient Boosted Trees (XGBoost/LightGBM): often strong baseline for tabular-like features.
  • Shallow Multilayer Perceptron (MLP): can learn nonlinear combinations post-selection.
  • Fine-tuned CNN: when GA selects which deep-layer embeddings or channels to use, the final classifier can still be a small dense network.
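
For reference, the shallow candidates above map directly onto scikit-learn estimators; the hyperparameter values here are placeholder defaults, not tuned settings:

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Candidate classifiers to train on GA-selected feature columns; gradient-boosted
# trees (XGBoost/LightGBM) would be registered in the same dictionary.
candidates = {
    "svm_rbf": SVC(kernel="rbf", C=1.0, gamma="scale", probability=True),
    "random_forest": RandomForestClassifier(n_estimators=300),
    "mlp": MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500),
}
```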

When using the GA to tune hyperparameters, include SVM C/gamma, RF depth/number of estimators, or MLP layer sizes in the chromosome. Fitness evaluation then trains models with those hyperparameters; this is computationally more expensive, but it yields a jointly optimized pipeline, as in the sketch below.
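
A sketch of one possible mixed encoding: the first block of genes is the feature mask, followed by two real-valued genes mapped log-uniformly to the SVM's C and gamma. The gene layout and ranges are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

def decode(chromosome, n_features):
    """Split a real-valued chromosome into a feature mask and an SVM.

    Layout (an assumption of this sketch): n_features genes thresholded at 0.5
    form the mask, then two genes in [0, 1] map log-uniformly to C and gamma.
    """
    mask = chromosome[:n_features] > 0.5
    c_gene, g_gene = chromosome[n_features:n_features + 2]
    C = 10.0 ** (-2 + 4 * c_gene)       # C in [1e-2, 1e2]
    gamma = 10.0 ** (-4 + 3 * g_gene)   # gamma in [1e-4, 1e-1]
    return mask, SVC(kernel="rbf", C=C, gamma=gamma)

# Example: decode a random mixed chromosome for 571 fused features.
rng = np.random.default_rng(0)
mask, clf = decode(rng.random(571 + 2), n_features=571)
```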


7. Evaluation Metrics and Experimental Protocol

  • Split dataset into train/validation/test or use nested cross-validation when tuning with GA to avoid optimistic bias.
  • Report accuracy, precision, recall, F1-score, and ROC-AUC. For imbalanced datasets, emphasize F1 or balanced accuracy.
  • Present confusion matrices and per-class metrics to reveal systematic biases.
  • Statistical significance: run multiple GA trials with different random seeds and report mean ± std of metrics.
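
A small helper that computes the metrics listed above for one trial; results from multiple GA runs with different seeds can then be aggregated into mean ± std:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def report(y_true, y_pred, y_score):
    """y_score: predicted probability of the positive class (used for ROC-AUC)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }
```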

8. Example Experiment (Design)

  • Dataset: CelebA subset balanced 50/50 by gender.
  • Features: 512-d deep embeddings (e.g., ResNet-50 pooled features projected to 512 dims) + 59-d LBP/HOG/landmark features → total ~571 dims.
  • GA: population 100, generations 100, tournament selection (size 3), uniform crossover, mutation rate 0.02, elitism 5. Fitness = 0.9 * validation F1 - 0.1 * (selected/total).
  • Classifier: SVM with RBF, C and gamma tuned via GA genes.
  • Protocol: 5-fold stratified CV inside fitness; final test on held-out 20% set.

Expected outcomes: GA reduces features to a compact subset (e.g., 40–120 dims), improves generalization vs. using all features, and produces competitive accuracy with lower inference cost.


9. Results and Analysis (Hypothetical)

  • Baseline (all features + SVM): Accuracy 92.0%, F1 0.918.
  • GA-selected features + SVM: Accuracy 93.4%, F1 0.933, using 18% of original features.
  • Interpretation: The GA removed redundant or weakly informative features and emphasized facial-shape embeddings plus selected LBP channels.
  • Ablation: Handcrafted features alone give lower accuracy (~85–88%); deep embeddings alone come close to the GA result but with a larger feature set; the combined features with GA selection perform best.

10. Practical Considerations and Limitations

  • Bias and fairness: Gender labels are culturally and technically complex. Datasets reflecting binary gender labels may misrepresent non-binary or gender-nonconforming people. Evaluate demographic fairness across age, ethnicity, and pose.
  • Privacy and ethics: Use responsibly; get consent when collecting faces and follow legal regulations (GDPR, etc.).
  • Computational cost: GA-based searches are expensive; use parallelization, surrogate models, or multi-stage selection to reduce cost.
  • Overfitting risk: Use nested CV and proper held-out test sets to estimate real-world performance.

11. Extensions and Future Work

  • Multi-objective GAs (e.g., NSGA-II) to balance accuracy, fairness, and model size explicitly.
  • Incorporate adversarial robustness objectives to improve real-world reliability.
  • Use differentiable feature selection (e.g., Concrete dropout or L0 regularization) as a complement to GA for end-to-end learning.
  • Expand labels beyond binary gender to handle gender expression or soft labels where ethically and legally appropriate.

12. Conclusion

Genetic Algorithm-based feature selection is a powerful way to build compact, accurate facial gender classifiers. GAs excel at exploring combinatorial feature spaces and can jointly optimize classifier hyperparameters. However, designers must weigh computational cost, ethical concerns, and fairness when deploying such systems in practice.


