SEHFS: Structural Entropy-Guided High-Order Correlation Learning for Multi-View Multi-Label Feature Selection

SEHFS (Structural Entropy-Guided High-Order Correlation Learning) is a novel information-theoretic method for multi-view multi-label feature selection. It introduces a structural entropy framework to capture complex high-order feature dependencies and employs a hybrid optimization strategy to avoid local optima. The method has been validated on eight datasets, demonstrating superior performance in feature selection compared to existing approaches.

Structural Entropy Method Advances Multi-View Multi-Label Learning

A novel information-theoretic method has been developed to overcome persistent challenges in multi-view multi-label (MVML) learning, a critical area for real-world AI applications. The proposed approach, Structural Entropy-Guided High-Order Correlation Learning for Multi-View Multi-Label Feature Selection (SEHFS), introduces a structural entropy framework to capture complex, high-order feature dependencies and employs a hybrid optimization strategy to avoid local optima, a notable theoretical and practical advance.

Overcoming the High-Order Correlation Challenge

While information-theoretic methods are prominent for learning nonlinear relationships in MVML, they have historically struggled with a key limitation: real-world features often exhibit intricate high-order structural correlations that go beyond simple pairwise interactions. SEHFS addresses this directly by converting the feature graph into an encoding tree that minimizes structural entropy. This conversion quantifies the information cost of high-order dependencies, allowing the model to learn these complex correlations directly.
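To make the encoding-tree idea concrete, the sketch below (illustrative only, not the authors' implementation) computes the two-level structural entropy of a weighted feature-similarity graph under a candidate partition of features into clusters; the toy graph and the function name are assumptions for the example:

```python
# Illustrative sketch (not SEHFS code): two-level structural entropy of a
# weighted feature graph under a partition, i.e. a two-level encoding tree.
# Assumes every node has positive degree.
import numpy as np

def structural_entropy_2d(A, clusters):
    """A: symmetric non-negative adjacency matrix; clusters: partition of nodes."""
    deg = A.sum(axis=1)
    vol_G = deg.sum()  # total volume of the graph
    H = 0.0
    for C in clusters:
        idx = np.asarray(C)
        vol_C = deg[idx].sum()
        mask = np.zeros(len(A), dtype=bool)
        mask[idx] = True
        g_C = A[idx][:, ~mask].sum()  # weight of edges leaving the cluster
        # cost of locating each feature inside its cluster (leaf nodes)
        H -= np.sum(deg[idx] / vol_G * np.log2(deg[idx] / vol_C))
        # cost of entering the cluster from the root (internal node)
        if g_C > 0:
            H -= g_C / vol_G * np.log2(vol_C / vol_G)
    return H

# Two tight 3-cliques joined by a single bridge edge: the natural partition
# encodes the graph more cheaply than lumping all features together.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(structural_entropy_2d(A, [[0, 1, 2], [3, 4, 5]]))   # lower cost
print(structural_entropy_2d(A, [[0, 1, 2, 3, 4, 5]]))     # higher cost
```

A partition that isolates densely interconnected (highly redundant) feature groups has a lower encoding cost, which is precisely the signal the method exploits.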

The core mechanism groups features with strong high-order redundancy into single clusters within the encoding tree while simultaneously minimizing inter-cluster correlations. This dual strategy effectively eliminates feature redundancy both within and across clusters, leading to a more efficient and discriminative feature set. A theoretical analysis establishing that structural entropy can learn these high-order correlations provides a rigorous foundation for the model's design.
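One simplified way to see how entropy minimization groups redundant features (a sketch of the general principle, not the paper's optimizer) is a greedy agglomeration: start from singleton clusters and repeatedly apply whichever merge most reduces the two-level structural entropy, stopping when no merge helps:

```python
# Simplified sketch of encoding-tree construction by greedy merging; not the
# paper's algorithm. Densely connected (mutually redundant) features end up
# in one cluster because merging them lowers the encoding cost.
# Assumes every node has positive degree.
from itertools import combinations
import numpy as np

def entropy_2d(A, clusters):
    deg = A.sum(axis=1)
    vol = deg.sum()
    H = 0.0
    for C in clusters:
        idx = np.asarray(C)
        vol_C = deg[idx].sum()
        mask = np.zeros(len(A), dtype=bool)
        mask[idx] = True
        g = A[idx][:, ~mask].sum()
        H -= np.sum(deg[idx] / vol * np.log2(deg[idx] / vol_C))
        if g > 0:
            H -= g / vol * np.log2(vol_C / vol)
    return H

def greedy_encoding_tree(A):
    """Merge the cluster pair giving the largest entropy drop until none helps."""
    clusters = [[i] for i in range(len(A))]
    best = entropy_2d(A, clusters)
    while len(clusters) > 1:
        trials = []
        for a, b in combinations(range(len(clusters)), 2):
            trial = [c for k, c in enumerate(clusters) if k not in (a, b)]
            trial.append(clusters[a] + clusters[b])
            trials.append((entropy_2d(A, trial), trial))
        h, merged = min(trials, key=lambda t: t[0])
        if h >= best - 1e-12:
            break  # no merge reduces the encoding cost
        best, clusters = h, merged
    return clusters, best

# Toy graph: two disjoint triangles of mutually redundant features.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
clusters, h = greedy_encoding_tree(A)
print(clusters)  # each triangle forms one cluster
```

On this toy graph the greedy procedure recovers the two triangles as clusters; in general a greedy pass is only a heuristic, and a full method would also score clusters against the labels.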

A Hybrid Framework for Robust Optimization

The second major challenge in the field is the reliance of many information-theoretic methods on heuristic optimization, which often leads to convergence at suboptimal local solutions. SEHFS adopts a new, robust framework that fuses information theory with matrix methods. This hybrid approach learns a shared semantic matrix alongside view-specific contribution matrices to reconstruct a global view matrix.
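The paper's exact objective is not reproduced here; as a minimal sketch of the shared/view-specific decomposition, assume each view matrix `X_v` is reconstructed as `S @ W_v` from a shared semantic matrix `S` and a view-specific contribution matrix `W_v`, fitted by alternating ridge-regularized least squares (all names and the objective are hypothetical):

```python
# Minimal sketch (assumed objective, not SEHFS itself): alternating least
# squares for   min_{S, W_v}  sum_v ||X_v - S @ W_v||_F^2  (+ tiny ridge),
# where S is a shared semantic matrix and W_v are view-specific contributions.
import numpy as np

def fit_shared_semantics(views, k, iters=100, lam=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    S = rng.standard_normal((n, k))
    Ws = None
    for _ in range(iters):
        # view-specific update: ridge regression of each view onto S
        G = S.T @ S + lam * np.eye(k)
        Ws = [np.linalg.solve(G, S.T @ X) for X in views]
        # shared update: least squares against all views jointly
        num = sum(X @ W.T for X, W in zip(views, Ws))
        den = sum(W @ W.T for W in Ws) + lam * np.eye(k)
        S = np.linalg.solve(den, num.T).T  # S = num @ inv(den); den symmetric
    loss = sum(np.linalg.norm(X - S @ W) ** 2 for X, W in zip(views, Ws))
    return S, Ws, loss

# Synthetic multi-view data sharing one latent semantic factor.
rng = np.random.default_rng(1)
S_true = rng.standard_normal((30, 3))
views = [S_true @ rng.standard_normal((3, d)) for d in (8, 12, 5)]
S, Ws, loss = fit_shared_semantics(views, k=3)
```

Each alternating step solves a convex subproblem in closed form, which is one reason matrix-based formulations offer a more stable optimization pathway than purely heuristic search.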

This architecture enhances the traditional information-theoretic method by providing a mechanism to balance global and local optimization. By learning both shared and view-specific structures, the framework mitigates the risk of local optima and ensures a more comprehensive capture of the data's underlying semantics from multiple perspectives.

Validated Superior Performance

The efficacy of the SEHFS method has been rigorously validated through extensive experimentation. Comprehensive tests were conducted on eight datasets from various domains, demonstrating that SEHFS achieves superior performance in feature selection compared to existing methods. Furthermore, detailed ablation studies confirmed the individual contributions of its novel components—the structural entropy mechanism and the hybrid optimization framework—to its overall success.

Why This Matters: Key Takeaways

  • Captures Real-World Complexity: SEHFS breaks new ground by learning high-order feature correlations, moving beyond the pairwise limitations of previous information-theoretic models and better aligning with the complex structures in real-world data.
  • Enhances Optimization Stability: The fusion of information theory with matrix methods provides a more stable optimization pathway, reducing the risk of converging to poor local solutions and improving model reliability.
  • Delivers Practical Results: Empirical validation across multiple, diverse datasets proves that the theoretical advancements translate into superior feature selection performance, making it a compelling solution for advanced MVML applications.
