Di Chang

In distributed regression, selecting an optimal subset that eliminates redundant information is crucial for model performance. Distributed data subsets often suffer from outliers, high variability, duplicated data, excess independent variables, and point redundancy. Managing and reducing this redundant information effectively is an important way to mitigate inconsistencies in statistical inference. In this paper, we develop the R package COR, which implements optimal subset selection with respect to the covariance matrix, observation matrix, and response vector (COR), and also estimates the optimal subset length. We present the implementation details of the COR package and demonstrate its superior performance through a series of simulation studies and real-world applications, including the estate and riboflavin datasets, which range from low to high dimensions.