AUTHOREA

Preprints

Explore 66,105 preprints on the Authorea Preprint Repository

A preprint on Authorea can be a complete scientific manuscript submitted to a journal, an essay, a whitepaper, or a blog post. Preprints on Authorea can contain datasets, code, figures, interactive visualizations and computational notebooks.

Wildlife Diversity and the Impact of Human Disturbance on Activity Patterns: A Case S...
Wuyuan Zhang
Teng-Wei Su

Wuyuan Zhang

and 3 more

October 11, 2024
Understanding the impact of human activities on wildlife is crucial for effective biodiversity conservation and management. To assess mammal diversity and human disturbance impacts in the Labagoumen Nature Reserve, Beijing, we conducted a camera trapping survey from July 2019 to August 2022. A total of 33,842 camera trap days yielded 5,002 identifiable photos of mammals, representing 13 species from 9 families and 5 orders, including new records of Eurasian red squirrel (Sciurus vulgaris) and Asian badger (Meles leucurus). The top five species by relative abundance index (RAI) were Siberian roe deer (Capreolus pygargus, RAI = 5.50), wild boar (Sus scrofa, RAI = 1.96), Eurasian red squirrel (RAI = 1.86), raccoon dog (Nyctereutes procyonoides, RAI = 1.59), and Père David’s rock squirrel (Sciurotamias davidianus, RAI = 1.38). Activity rhythm analysis at 34 camera sites revealed unimodal patterns for Eurasian red squirrel, wild boar, and Père David’s rock squirrel, a bimodal pattern for Siberian roe deer, and a trimodal pattern for raccoon dogs. Peak activities of all species were offset from peak human activities, with the highest overlap for Eurasian red squirrel and Père David’s rock squirrel, and the lowest for raccoon dogs. Monthly and seasonal patterns showed the highest mammal activity in September, and peak human disturbances in May and October. Further analysis of the overlap between mammal daily activity rhythms and human disturbances during tourist and non-tourist seasons revealed that the overlap index for most species was higher during tourist seasons. Wild boars exhibited a bimodal activity pattern during tourist seasons and a unimodal pattern during non-tourist seasons. These findings enrich the species composition records of the reserve and reveal significant interactions between human disturbances and mammal behavior, providing a solid scientific basis for biodiversity conservation and management in the Labagoumen Nature Reserve.
Seasonal re-assembly of floodplain aquatic communities in lotic-lentic seasonal ecoto...
Hiromi Uno
Shunsuke Utsumi

Hiromi Uno

and 5 more

August 28, 2024
Seasonal changes in the environment often strongly influence biological communities. In environmental transition zones, or ecotones, the environment fluctuates over time between two different types of environments, and the seasonal change is more pronounced. Although emphasis has been placed on spatial variation of biota along environmental gradients, seasonal change has not been well studied despite the seasonal nature of many ecotones. The study was conducted on the Butokamabetsu River floodplain, Hokkaido, Japan. We investigated seasonal biotic re-assembly in floodplain waterbodies characterized as transitions between lotic and lentic environments, and further investigated the biological processes behind seasonal re-assembly. We observed a clear seasonal re-assembly of biological communities in floodplain waterbodies. From the spring snowmelt season to the summer low flow season, the biological communities were largely driven by hydrological connectivity to the river, represented as the timing of the lotic-lentic transition during the seasonal flood recession. In contrast, after a few months of the summer low flow period, this effect weakened, and the communities were structured more by the local environment. The seasonal re-assembly was largely explained by the re-assembly of amphibian and aquatic insect larvae, the main members of the floodplain aquatic assemblage, which metamorphose and emerge from the water during the summer period and then re-distribute in different ways that are more strongly influenced by local environmental factors such as water body size, temperature, and dissolved oxygen levels. Given that biota in ecotones occupy the habitat for a limited time due to the severe environmental fluctuations, seasonal changes such as those we observed in this study may be widespread in ecotones. Landscape and local environmental conditions could alternately shape community structures in different seasons.
Further attention to the temporal aspects of community structure is needed for community studies as well as for conservation.
Coupling effects of environmental factors on the phytoplankton community structure in...
Huibo Wang
Le Wang

Huibo Wang

and 6 more

October 11, 2024
Studying the coupled effects of environmental factors on the structure of phytoplankton communities can deepen our understanding of the stability of aquatic ecosystems in extreme environments. This study examined the phytoplankton community structure and environmental factors of a saline lake during spring, summer, and autumn in 2019. A total of 95 phytoplankton species (belonging to 47 genera and 7 phyla) were identified in Ebinur Lake, reflecting a species richness lower than that of freshwater lakes but greater than that observed in other saltwater lakes. Bacillariophyta dominated the phytoplankton assemblage, followed by Chlorophyta and Cyanophyta, with lesser diversity in other algal species, suggesting that the species composition was similar to that observed in other saltwater lakes. There was considerable spatiotemporal variation in the structure of the phytoplankton community, with the biomass of phytoplankton displaying notable seasonal variation. In spring, the biomass of Bacillariophyta was dominant; in summer, as the climate warmed, the biomass of phytoplankton reached its peak and the biomass of Chlorophyta was dominant; in autumn, the biomass was the lowest, and Chlorophyta and Bacillariophyta shared dominance. The spatial distribution was relatively consistent across the three seasons, with the southeastern area of the lake generally exhibiting higher biomass than other lake areas. Bacillariophyta and Chlorophyta were significantly correlated with water transparency (SD); Cyanophyta was significantly correlated with water temperature (WT), and Cryptophyta was significantly correlated with pH. The interaction effects of various environmental factors, including pH, SD, Chl. a, NH₄⁺-N, and salinity, jointly affect the dynamics of the phytoplankton community structure in Ebinur Lake.
This study investigated the effects of physicochemical factors on the structure of the phytoplankton community in a high-salinity lake, thereby providing a basis for ecological protection and environmental management of aquatic ecosystems in extreme environments.
Industrial Location and Polarized Growth Dynamics in Arica, Chile  (1953-1980): A His...
Rodrigo Barra Novoa

Rodrigo Barra Novoa

September 10, 2025
This study examines the experience of industrial localization in Arica, Chile, during the period 1953-1980. It analyzes the impact of industrial promotion policies and the development pole strategy on the economic and demographic growth of the city. Using historical data and qualitative analysis, the effectiveness and limitations of these policies are evaluated. The results indicate that while these strategies achieved rapid initial growth, they did not generate long-term sustainable regional development due to reliance on special incentives, lack of regional integration, and vulnerability to changes in national economic policies. This case provides fundamental lessons to orient the design of policies that promote a more balanced regional industrial development.
Quickest GPS jamming detection using machine learning intelligent classifier-based RO...
ahmed moumena

ahmed moumena

October 11, 2024
Physical-layer security threats, which have evolved from malicious attacks on wireless systems, make wireless communication systems vulnerable because of their furtive nature. In this work, we propose a centralized modulated wideband converter (C-MWC) combined with a classifier detector based on the Mahalanobis distance (MD_S), computed with the classical estimator S, and the robust distance (RD_MCD), computed with the MCD estimator. The received signal at each radio receiver in each channel passes through several steps to achieve a sub-Nyquist sampling rate. Each receiver operates at the minimum sampling rate. All compressed observations from each channel are collected in a compressed data matrix, which is used directly as the input of the proposed MD_S-RD_MCD classifier-based ROC curve at the fusion center (FC). Performance is evaluated in terms of the anomaly detection rate as a function of the threshold value of each distance. Using a machine learning (ML) technique, namely the PCA-based MD_S-RD_MCD classifier with the ROC curve, the proposed system shows good performance.
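The distance-and-threshold idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' C-MWC pipeline: the data are synthetic, the threshold of 4.0 is an arbitrary assumption, and only the classical mean/covariance estimate is shown. A robust variant would presumably swap in MCD estimates of location and scatter (for example from scikit-learn's `MinCovDet`), which is not done here.

```python
import numpy as np

def mahalanobis_distances(X, mean, cov):
    """Mahalanobis distance of each row of X from `mean` under covariance `cov`."""
    diff = X - mean
    inv_cov = np.linalg.inv(cov)
    # d_i = sqrt(diff_i^T * inv_cov * diff_i), evaluated row by row
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

def flag_anomalies(X, threshold):
    """Flag rows whose classical Mahalanobis distance exceeds `threshold`."""
    mean = X.mean(axis=0)              # classical location estimate
    cov = np.cov(X, rowvar=False)      # classical scatter estimate (S)
    d = mahalanobis_distances(X, mean, cov)
    return d, d > threshold

# Synthetic stand-in data (assumption): 200 "clean" observations plus one outlier.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 3))
outlier = np.array([[8.0, 8.0, 8.0]])
X = np.vstack([clean, outlier])

distances, is_anomaly = flag_anomalies(X, threshold=4.0)
print("distance of injected outlier:", distances[-1])
print("flagged as anomaly:", bool(is_anomaly[-1]))
```

Sweeping `threshold` and recording detection and false-alarm rates at each value is one way an ROC curve of the kind mentioned above could be traced.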
Illuminating Curiosity: Exploring Voltage Distribution in LEDs Through Hands-On Circu...
K A Jafar Sadiq

K A Jafar Sadiq

October 14, 2024
STEM Lesson Plan: Exploring the Science Behind LEDs and Voltage Distribution
Grade/Level: 10-12
Duration: 50 minutes
Subject: Electronics and Physics
Overview: Here students will explore the basic principles of Light Emitting Diodes (LEDs), diodes, and voltage distribution across circuits using LEDs of different colors. Through hands-on activities, students will investigate how different LEDs consume voltage, discover the differences between parallel circuit configurations, and gain insights into real-world applications of electronic switches. By sparking curiosity with simple yet surprising experiments, students will learn core concepts of electrical circuits and light behavior.
Learning Standards:
Understand the working of diodes and LEDs in electronic circuits.
Investigate the relationship between color and wavelength in the context of LEDs and the light spectrum.
Apply scientific concepts to practical circuits and interpret voltage differences with multimeter readings.
Problem Statement: How do different colored LEDs respond when connected in parallel, and why do some LEDs light up while others don’t?
Learning Outcomes:
Explain how diodes, particularly LEDs, function in circuits.
Understand and analyze how voltage distribution affects LEDs in parallel circuits.
Explore the connection between LED colors and their required starting voltages.
Demonstrate curiosity-driven problem-solving by identifying patterns in LED behavior.
STEM Connect:
Science: Wavelengths of light, the electromagnetic spectrum, scattering of light (why violet scatters more than red).
Technology: Practical use of multimeters, understanding electrical components.
Engineering: Circuit design using breadboards, resistors, and LEDs.
Mathematics: Voltage distribution in circuits and its measurement using tools like multimeters.
A Combined Estimator for Nonlinear System Identification via LPV Approximations
Sadegh Ebrahimkhani
John Lataire

Sadegh Ebrahimkhani

and 1 more

October 11, 2024
This paper addresses the identification of Nonlinear (NL) systems using a linearization approach, introducing a combined estimator to tackle this challenge. We assume that the unknown NL system operates around a stable, slowly varying operating point. The system trajectory is then perturbed slightly via small, typically fast, input perturbations. We demonstrate that the NL system’s response to these small perturbations can be approximated by a Linear Parameter-Varying (LPV) system model. Furthermore, we show that this LPV model represents the linearized version of the unknown NL system around the operating point. A new parametrization for the LPV model coefficients is introduced, establishing a structural relationship between the LPV coefficients. This structural relationship reduces the number of parameters to be estimated and ensures that the LPV model always corresponds to the linearized form of the NL system. Additionally, we demonstrate that this LPV model structure allows for the unique reconstruction of the NL system model through symbolic integration, resulting in a closed-form nonlinear Ordinary Differential Equation (ODE). This integration introduces a second structural relationship, linking the LPV model to the NL model. By leveraging these two structural relationships, we reformulate the problem of NL system identification via linearization as a combined estimation problem, leading to a unified LPV-NL estimation framework. This approach utilizes all available data, including perturbation data (linear response) and the varying operating point (NL response). We propose a combined estimator that jointly estimates the NL-LPV model, capturing the intrinsic structure of NL system identification through linearization. Finally, we present a numerical example to illustrate the performance of the proposed method.
Less Conservative Robust Control of Polytopic Systems Part II: Metaheuristic Design a...
Shuhei Matsuda
Kang-Zhi Liu

Shuhei Matsuda

and 6 more

October 11, 2024
Owing to the bilinear nature of robust performance conditions, it remains a challenge to effectively design a controller for parametric systems. To overcome this difficulty, we establish a metaheuristic-based design framework in this paper. This framework includes a simple initialization method, detailed search flows, and the associated objective functions for each step. In addition, this method can individually and easily shape the gain characteristics of closed-loop transfer functions, thus lowering the hurdle of control design for complex and uncertain systems. The whole design procedure is validated and illustrated through its application to a drivetrain bench. Numerous trials show that on average a success rate of 70% is achieved in the search for the controller.
Less Conservative Robust Control of Polytopic Systems Part I: Analysis by space dilat...
Kang-Zhi Liu
Shuhei Matsuda

Kang-Zhi Liu

and 5 more

October 11, 2024
Parameter uncertainty is the most frequently encountered model uncertainty. Although the research on the robust control of parametric systems has a long history, existing design tools are still either conservative or not numerically efficient, particularly for the performance problems. This paper treats polytopic systems which have good compatibility with physical systems. It is shown that less conservative robustness conditions can be derived from the well-known Lagrange method by treating the performance specification as an objective function in a dilated signal space and regarding the dynamics as a hyperplane in this space. A broad class of frequency domain specifications and regional pole-placement are analyzed in detail. Desirable multiplier structures are also revealed through numerical analysis. The results lay a solid foundation for an effective robust performance design of the polytopic systems.
The island rule-like patterns of plant size variation in a young land-bridge archipel...
Zengke Zhang
Wensheng Chen

Zengke Zhang

and 13 more

October 11, 2024
The island rule, a general pattern of dwarfism in large species and gigantism in small species on islands relative to the mainland, is typically seen as a macroevolutionary phenomenon. However, it remains unknown whether the ecological processes associated with abiotic and biotic factors generate a pattern of plant size variation similar to the island rule. By measuring plant height for 29,623 individuals of 50 common woody species in the Zhoushan Archipelago (8,500 years old and yet to undergo major evolutionary adaptation) and the adjacent mainland in Eastern China, we examined whether island area and remoteness, resource availability, environmental stress, plant-plant competition and insect herbivory can explain the pattern of plant size variation. We found pronounced variations in plant height, similar to those of the island rule. Further analyses revealed that islands with low resource availability, such as low soil organic matter content and low precipitation, had a high degree of dwarfism; islands experiencing high environmental stress, such as high soil pH, had a high degree of dwarfism; and islands experiencing less plant-plant competition had a high degree of gigantism. The magnitude of plant dwarfism was also higher on small and remote islands than on larger and nearer islands. These results highlight the importance of ecological processes associated with abiotic and biotic conditions in shaping the island rule-like patterns of plant size variation. Our study therefore suggests that the island rule can be caused by both ecological and evolutionary processes. Given that our studied archipelago is too young to have undergone major evolutionary change, our results indicate that ecological processes likely played a prominent role in generating the island rule-like patterns. Future studies on the island rule need to perform experiments to disentangle evolutionary from ecological mechanisms.
Photodynamic Biomimetic Nanoparticles Accelerate Tumor Vascular Normalization Initiat...
Yufei Liu
Changheng Xie

Yufei Liu

and 6 more

October 11, 2024
Abnormal tumor vascular networks fuel tumor growth and aggravate tumor hypoxia. Although vascular normalization therapy (VNT) is clinically effective, the long period before vascular normalization window (VNW) initiation presents significant prognostic challenges. Here, we developed a platelet-mimetic nano-system, IA@PM, which was self-assembled from apatinib (APA) and indocyanine green (ICG) and further coated with platelet membrane (PM). By rationally utilizing photodynamic therapy (PDT), IA@PM accelerates the balancing of pro-angiogenic and anti-angiogenic factors and significantly expedites the VNW initiation of APA from the 4th day to the 2nd day post-treatment. Benefiting from this accelerated VNW initiation, timelier and easier tumor drug delivery was achieved. This process further constructs a self-amplified therapeutic cycle between VNT and PDT, contributing to the excellent antitumor effects. Collectively, IA@PM overcomes key limiting factors in current VNT efficacy by accelerating VNW initiation and also significantly promotes PDT antitumor efficacy with lower doses of ICG, offering new strategies for VNT.
Corydalis Saxicola Bunting Total Alkaloids alleviate nonalcoholic fatty liver disease...
Fang Huang
Silu Li

Fang Huang

and 7 more

October 11, 2024
Background and Purpose: Nonalcoholic fatty liver disease (NAFLD) affects about 30% of the world’s population. The development of NAFLD affects the morphological structure, number and physiological function of liver mitochondria. Corydalis Saxicola Bunting Total Alkaloids (CSBTA), a promising therapeutic candidate for NAFLD, have anti-inflammatory and hepatoprotective effects. The present study aimed to investigate the protective effect of CSBTA and its underlying mechanism. Experimental Approach: We fed C57BL/6 mice a high fat and high cholesterol diet (HFHCD) for 20 weeks to establish a NAFLD disease model. CSBTA was administered from week 12 for 8 weeks. Palmitic acid (PA) was co-cultured with AML12 cells for 24 hours to establish an in vitro lipid accumulation model. Histopathology, biochemical indicators of lipid metabolism, and hepatocellular mitochondrial function were assessed, and the expression of genes related to mitochondrial biogenesis and mitophagy was also evaluated. Key Results: CSBTA alleviated HFHCD- and PA-induced lipid accumulation and improved hepatocellular mitochondrial structure and function. CSBTA upregulated the gene and protein expression related to mitochondrial biogenesis, increased the content of mtDNA and promoted the normal progression of mitophagy in damaged cells, which contributed to the effect of CSBTA in ameliorating NAFLD. Conclusion and Implications: CSBTA can effectively maintain mitochondrial function in vivo and in vitro, increase the expression levels of mitochondrial biogenesis-related genes in liver and AML12 cells, promote mitophagy, maintain the quantitative and qualitative homeostasis of mitochondria in the body, and ultimately improve NAFLD.
Cholera outbreak in Sudan 2024
Yosra Abdullatif Mahmoud Adam
Musab Abduljalil Mohamed

Yosra Abdullatif Mahmoud Adam

and 3 more

October 11, 2024
This review article discusses cholera, the cholera outbreak in Sudan, and its effect on the health system and the population. It also gives an overview of the common risk factors and possible reasons for the outbreak, and of the efforts of the government and NGOs to control it. Cholera is a highly virulent disease that spreads through contaminated food or water and can cause severe diarrhea. An outbreak occurs when more cases than expected occur in a specific location over a specific time period. On 15 April 2023, clashes started between the Sudanese army and the Paramilitary Force, with ongoing fighting that led to a humanitarian crisis, infrastructural collapse, severe famine, and many outbreaks, including the outbreak of cholera. This was declared in August 2024 after a wave of cases began on July 22. Between July and September, 8,457 cases and 299 deaths were reported across eight states. Between July and December 2023, 3.1 million people in Sudan were at risk, including 500,000 children under five. The ministry of health and NGOs are working together to control the outbreaks by providing water and sanitation, detecting and managing cases, and vaccinating the affected population.
Pulmonary Actinomycosis Masquerading as Lung Cancer
Hana Blibech
Yossr Aloulou

Hana Blibech

and 10 more

October 11, 2024
Key clinical message:
IRF4 haploinsufficiency in a multiplex family with Whipple's arthritis

Sinem Unal

and 10 more

October 11, 2024
Sinem Unal (1,2,#), Stéphanie Dublanc (3,#), Hailun Li (1,2), Camille Soudée (1,2), Whipple Consortium, Vivien Béziat (1,2,4), Guillaume Vogt (1,2,4), Xavier Puéchal (5), Jean-Laurent Casanova (1,2,4,6,7,§), Jacinta Bustamante (1,2,4,8,§,@), Jérémie Rosain (1,2,4,8,§,@)
1 Laboratory of Human Genetics of Infectious Diseases, Necker Branch, Inserm U1163, Necker Hospital for Sick Children, Paris, France, EU
2 University of Paris Cité, Imagine Institute, Paris, France, EU
3 Department of Rheumatology, Libourne Hospital, Libourne, France, EU
4 St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, Rockefeller University, New York, New York, USA
5 National Referral Center for Rare Systemic Autoimmune Diseases, Cochin Hospital, Assistance Publique-Hôpitaux de Paris (AP-HP) Centre, University of Paris Cité, Paris, France, EU
6 Howard Hughes Medical Institute, New York, NY, USA
7 Department of Pediatrics, Necker Hospital for Sick Children, AP-HP, Paris, France, EU
8 Study Center for Primary Immunodeficiencies, Necker Hospital for Sick Children, AP-HP, Paris, France, EU
*,§ equal contributions
@ Correspondence: jacinta.bustamante@inserm.fr (J.B.) or jeremie.rosain@institutimagine.org (J.R.)
Incidence of leukemia in Eritrea: 11-year Laboratory -based retrospective analysis of...
Daniel Mebrahtu Abraha
Efriem Ghirmay

Daniel Mebrahtu Abraha

and 10 more

October 11, 2024
Little or no research has been conducted on the epidemiology of leukemias in Eritrea. In this retrospective study, we evaluated the burden and trends of acute lymphoblastic leukemia (ALL), acute myelogenous leukemia (AML), chronic myelogenous leukemia (CML), chronic lymphocytic leukemia (CLL) and overall leukemia in Eritrea. Methods: An audit of leukemia cases recorded in laboratory logbooks at the National Health Laboratory (NHL) and Orotta Referral and Teaching Hospital (ORTH) between January 2010 and December 2021 was performed. In addition to leukemia sub-type, the variables retrieved included age, sex, year of incidence, and residency. The estimates assessed included crude incidence rates (CIR), age-standardised incidence rates (ASIR) and estimated annual percentage change (EAPC). Results: In total, 372 confirmed cases of leukemia were recorded between 2010 and 2020. The median age (interquartile range, IQR), age range, and male/female ratio were 48 years (24.5 – 60 years), 2 – 91 years, and 210/161 (1.3:1), respectively. The estimated all-age CIR and ASIR over the study period were 9.22 per 100 000 and 30.1 per 100 000, respectively. The cumulative (2010 – 2020) CIR per 100 000 (ASIR per 100 000) for ALL, AML, CLL, and CML were 2.01 (3.87), 0.94 (2.38), 2.94 (15.37) and 3.61 (24.03), respectively. Additionally, median (IQR) age differed significantly across the subtypes of leukemia: ALL (23.0 years, IQR: 10.0 – 39.0), AML (30 years, IQR: 20 – 56 years), CLL (59.0 years, IQR: 40.75 – 66.75 years), and CML (49 years, IQR: 39.25 – 60 years); p < 0.05 (Kruskal-Wallis). No sex-specific differences were observed in median (IQR) age for the different types of leukemia. Unlike other leukemia sub-types, evaluation of EAPC demonstrated that the incidence of leukemia has increased over time: 21.9 (95% CI: 3.1 – 44.1), p-value = 0.025. Conclusions: The burden of leukemia was relatively stable. However, due to underreporting and underdiagnosis, we believe that the true burden of leukemia is likely higher. Further, an upward trend in the burden of ALL was uncovered. Lastly, expansion of diagnostic services to other sub-zones, establishment of a national cancer registry, and research remain priorities in Eritrea.
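For readers unfamiliar with the estimates named in this abstract, the following Python sketch shows one common way a crude incidence rate and an EAPC are computed: the EAPC is taken as 100 × (exp(β) − 1), where β is the slope of a linear regression of ln(rate) on calendar year. This is an assumption about the usual definition, not a reproduction of the authors' analysis, and the yearly rates below are synthetic placeholders rather than data from the study.

```python
import numpy as np

def crude_incidence_rate(cases, population, per=100_000):
    """Cases per `per` persons in the population at risk."""
    return cases / population * per

def eapc(years, rates):
    """Estimated annual percentage change from a log-linear fit of rate on year."""
    years = np.asarray(years, dtype=float)
    rates = np.asarray(rates, dtype=float)
    # Centre the years for numerical stability; the slope is unaffected.
    slope, _intercept = np.polyfit(years - years.mean(), np.log(rates), 1)
    return 100.0 * (np.exp(slope) - 1.0)

# Synthetic example (assumption): a rate growing by exactly 5% per year.
years = np.arange(2010, 2021)
rates = 1.0 * 1.05 ** (years - 2010)

print("EAPC (%):", eapc(years, rates))
print("CIR per 100 000:", crude_incidence_rate(5, 1_000_000))
```

A positive EAPC whose confidence interval excludes zero is conventionally read as an increasing trend.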
Changes in the immune function of CD4+ and CD8+ T lymphocytes and their value in infe...
Yin Xu
Yingyan Zan

Yin Xu

and 6 more

October 11, 2024
Due to the primary disease or chemotherapy-induced neutropenia, patients with acute myeloid leukemia (AML) are a high-risk population for infections. This article discusses the effects of AML cells and chemotherapeutic drugs on the immune function of CD4+ and CD8+ T lymphocytes, as well as the role of their quantitative changes in identifying the risk of infection during the post-chemotherapy bone marrow suppression period in AML. It summarizes the value of the immune function of CD4+ and CD8+ T lymphocytes in the diagnosis of infections and pathogen differentiation during the post-chemotherapy bone marrow suppression period in AML.
Case Series of Intracranial Central Primitive Neuroectodermal Tumor in Two Adults Tre...
Shubham Dokania
Sambit Nanda S

Shubham Dokania

and 6 more

October 11, 2024
Ewing sarcomas (ESs) and primitive neuroectodermal tumours (PNETs) exhibit identical genetic and histological characteristics, hence collectively denoted as ESs/PNETs, originating from the neuroectoderm and primarily consisting of primitive neuroectodermal cells. PNETs occur primarily in the cerebrum. They constitute 3-5% of all paediatric brain tumours. This case report describes two cases of intracranial central PNET with negative IHC and chromosomal markers in adult patients treated with craniospinal irradiation (CSI) and focal radiotherapy boost with concurrent and adjuvant chemotherapy.
AN INVERSE TIME-DEPENDENT SOURCE PROBLEM FOR DISTRIBUTED-ORDER TIME-SPACE FRACTIONAL...
Huimin Wang
Yushan Li

Huimin Wang

and 1 more

October 11, 2024
This paper focuses on the inverse time-dependent source term problem in a distributed-order time-space fractional diffusion equation (DTSFDE) using initial and boundary conditions and boundary Cauchy data. Firstly, we prove the existence and uniqueness of the solution to the direct problem under homogeneous Neumann boundary conditions. Additionally, based on the regularity of the solution to the direct problem, uniqueness and stability estimates for the inverse problem are established. Subsequently, we convert the inverse problem into a variational problem using the Tikhonov regularization method, and use the conjugate gradient algorithm to solve the variational problem, obtaining an approximate solution to the inverse source problem. Finally, we validate the effectiveness and stability of the proposed algorithm through numerical examples.
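The combination of Tikhonov regularization and the conjugate gradient algorithm mentioned in this abstract can be sketched on a generic finite-dimensional stand-in. The example below minimizes ||Ax − b||² + α||x||² by applying conjugate gradients to the normal equations (AᵀA + αI)x = Aᵀb. The matrix `A`, the noise level, and `alpha` are illustrative assumptions; the paper's actual forward operator is a fractional diffusion problem, which is not modelled here.

```python
import numpy as np

def conjugate_gradient(apply_op, rhs, tol=1e-10, max_iter=500):
    """Solve apply_op(x) = rhs for a symmetric positive definite operator."""
    x = np.zeros_like(rhs)
    r = rhs - apply_op(x)          # initial residual
    p = r.copy()                   # initial search direction
    rs_old = r @ r
    stop = tol * np.linalg.norm(rhs)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= stop:
            break
        Ap = apply_op(p)
        step = rs_old / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p   # next conjugate direction
        rs_old = rs_new
    return x

def tikhonov_solve(A, b, alpha):
    """Minimize ||A x - b||^2 + alpha * ||x||^2 via CG on the normal equations."""
    normal_op = lambda v: A.T @ (A @ v) + alpha * v
    return conjugate_gradient(normal_op, A.T @ b)

# Synthetic test problem (assumption): random operator and noisy data.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
x_true = rng.normal(size=10)
b = A @ x_true + 0.01 * rng.normal(size=30)
alpha = 0.1

x_cg = tikhonov_solve(A, b, alpha)
x_direct = np.linalg.solve(A.T @ A + alpha * np.eye(10), A.T @ b)
print("max difference from direct solve:", np.max(np.abs(x_cg - x_direct)))
```

The operator is passed as a function rather than a matrix so that the same routine could, in principle, be used when the forward map is only available as a solver.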
Astronomical Cycle Identification and High-Frequency Sequence Stratigraphy of the Low...
Wenhu Yu
2016592004

Wenhu Yu

and 3 more

September 10, 2025
The Mahu Sag, a crucial hydrocarbon-bearing depression in the Junggar Basin, has recently been identified as having significant oil reservoirs within the deep layers of the Lower Wuerhe Formation (Middle Permian). However, the lack of a unified high-frequency sequence stratigraphic framework for this formation has impeded isochronous sand body correlation and hindered sustainable hydrocarbon exploration. In this study, we integrate traditional sequence stratigraphy with astronomical cycle theory by utilizing spectral analysis and correlation coefficient (COCO) analysis on natural gamma-ray logging curves. Our results indicate that sedimentation of the Lower Wuerhe Formation was significantly controlled by astronomical cycles. Spectral analysis reveals that the deposition time of three third-order sequences ranges from 0.83 to 2.06 million years. Combined with seismic profiles and Integrated Prediction Error Filter Analysis (INPEFA) technology, we further identify the superimposed cyclic characteristics of the strata. Through multi-scale analysis, the Lower Wuerhe Formation is subdivided into three third-order sequences, eleven fourth-order sequences, and twenty-six fifth-order sequences. This research establishes a multi-level isochronous stratigraphic framework for the Lower Wuerhe Formation, enhancing the precision and efficiency of hydrocarbon exploration and supporting more sustainable energy development.
Fractional Garden Design
Ibukunoluwa Taiwo

Ibukunoluwa Taiwo

March 31, 2026
Lesson Title: Fractional Garden Design
Grade Level: Primary 6 (Grade 6)
Time: 50 minutes
The Role of Wearable technology in the Diagnosis of Atrial Fibrillation and Benefits...
Wahab Khawar Siddiqui
Mobeen Siddiqui

Wahab Khawar Siddiqui

and 2 more

October 10, 2024
Dear Editor, Atrial fibrillation (AFib) is an arrhythmic heart condition characterized by an irregularly irregular heart rhythm. It is the most common sustained cardiac arrhythmia, and its global prevalence is increasing [1][2]. This growing trend shows that by 2020 there will be 12.1 million cases of AFib worldwide [3]. However, it mostly goes undiagnosed in the early stages. Recent advancements have led to the production of smartwatches and other wearables equipped with sensors that can detect heart rate. The use of wearables is being adopted worldwide, and the number of consumers is increasing [4]. Wearable devices can continuously monitor heart rhythm, so irregularities can also be detected in asymptomatic patients or those with intermittent symptoms, offering the potential for an early diagnosis. If such individuals report to the hospital immediately, an established diagnosis can be made. The Apple Heart Study concluded that, out of 419,297 participants, 2,161 (0.52%) received a notification of an irregular pulse. Among the 450 participants who shared ECG data that could be analyzed, atrial fibrillation was present in 34% overall and in 35% of participants 65 years of age or older [5]. Early detection will help in timely medical interventions to prevent complications such as stroke, heart failure and other cardiovascular issues. Twelve-month compliance data from 8,500 individuals showed that patients with hypertension and diabetes who used an Apple Watch or Fitbit were 1.3 times more likely to take their medication on time [6]. I advocate for increased awareness among individuals about the benefits of wearables so that an early diagnosis can be made and early treatment, along with lifestyle changes, can follow. This will prevent complications and improve morbidity and mortality. In the long run, this will also reduce the burden on the healthcare system and doctors by reducing the number of patients presenting with severe complications of AFib. This will create a worldwide impact of decreased incidence of AFib and its complications.
Boosting Commit Classification with Contrastive Learning
Jiajun Tong
Zhixiao Wang

Jiajun Tong

and 2 more

October 10, 2024
Commit Classification (CC) is an important task in software maintenance, which helps software developers classify commit changes into different types according to their nature and purpose. However, existing models need large amounts of manually labeled data for fine-tuning, and maintaining commit classification performance becomes difficult when training samples are insufficient. The scarcity of data also leads to poor model generalization, with satisfactory performance only on specific tasks. Moreover, existing models often ignore the sentence-level semantic information in the commit message, which is essential for discovering the differences between diverse commits, especially in few-shot scenarios. In this work, we propose to boost commit classification with contrastive learning, which addresses the CC problem in few-shot scenarios. To augment the training datasets and improve the generalization ability of our proposed method, we generate additional training samples with a Semantic Prototype, defined as a representative embedding for a group of semantically similar instances. To produce meaningful and discriminative sentence-level vectors for each commit in a pair, we employ a pretrained Sentence-Transformer as the embedding layer. The network then learns to minimize the distance in the latent space for positive pairs and maximize it for negative pairs, yielding a fine-tuned Sentence-Transformer whose weights are then fixed for the downstream commit classification task. Extensive experiments on two openly available datasets demonstrate that our framework, though simple, can solve the CC problem effectively even in few-shot scenarios. It not only achieves state-of-the-art performance but also improves the adaptability of the model without requiring a large number of training samples for fine-tuning. The code, data, and trained models are available at https://github.com/CUMT-GMSC/CommitFit.
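As a rough illustration of the pairwise objective this abstract describes, the sketch below uses the common margin-based contrastive loss, in which positive pairs are pulled together and negative pairs are pushed at least a margin apart. The function names, the margin value, and the use of a simple mean as the "Semantic Prototype" are assumptions made for illustration; they are not taken from the CommitFit code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, is_positive, margin=1.0):
    """Margin-based contrastive loss for one pair (hypothetical helper).

    Positive pairs are penalized by their squared distance; negative pairs
    are penalized only while they are closer than `margin`.
    """
    d = euclidean(u, v)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


def semantic_prototype(embeddings):
    """Assumed prototype: the element-wise mean of a group of embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]
```

A prototype built this way could be paired with real commits of the same class as extra positive examples, which is one plausible reading of the augmentation step.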
NicheFlow: Towards a foundation model for Species Distribution Modelling - Supporting...
Russell Dinnage

Russell Dinnage

October 16, 2024
MATHEMATICAL DETAILS
Derivation of the Model Equations
The goal is to estimate the probability distribution of a species across geographic coordinates \((X, Y)\), given that the species is \( S = s \) and that it occurs \((O_s = 1)\), i.e. \(P(X, Y \mid S = s, O_s = 1)\). For simplicity we will use the expression \(S = s\) to represent \(S = s, O_s = 1\). To include the environment in this probability, this can be represented mathematically as: \[ P(X, Y \mid S = s) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P(X, Y, \mathbf{e} \mid S = s) \, de_1 \cdots de_n. \] This expression introduces the environmental variables \(\mathbf{e} = (e_1, \ldots, e_n)\) and integrates over all possible environmental conditions.
1. APPLYING THE LAW OF TOTAL PROBABILITY: We expand the joint probability using the Law of Total Probability: \[ P(X, Y \mid S = s) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P(X, Y \mid S = s, \mathbf{e}) P(\mathbf{e} \mid S = s) \, de_1 \cdots de_n. \] This step decomposes the probability into two components: one that describes how environmental conditions affect the geographic distribution, and another that captures the species' niche, or how environmental conditions influence species occurrence.
2. ASSUMPTION OF CONDITIONAL INDEPENDENCE: At this point, we make a key biological assumption: the occurrence of a species is driven entirely by environmental conditions, not by geographic coordinates themselves. In statistical terms, this means we assume that species \( S \) is conditionally independent of the coordinates \((X, Y)\) given the environmental conditions \(\mathbf{e}\). Mathematically, this is expressed as: \[ P(X, Y \mid S = s, \mathbf{e}) = P(X, Y \mid \mathbf{e}). \] This substitution reflects the idea that any effects of geographic coordinates on species occurrence are only through their relationship with the environment. Biologically, this means that the environment determines where species can occur, and coordinates influence species distributions only indirectly via their environmental characteristics.
This is related to the standard statistical assumption of "no unmeasured confounders," implying that environmental variables capture all relevant factors affecting species occurrences. Substituting this assumption, we get: \[ P(X, Y \mid S = s) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P(X, Y \mid \mathbf{e}) P(\mathbf{e} \mid S = s) \, de_1 \cdots de_n. \tag{1} \]
3. ENVIRONMENTAL NICHE AS A CONDITIONAL DENSITY WITH SPECIES EMBEDDINGS: The term \( P(\mathbf{e} \mid S = s) \) captures the environmental niche of the species, representing a high-dimensional probability distribution of environmental conditions where the species is likely to occur, often referred to as a "hypervolume" in ecological terms. We further decompose this distribution by introducing a latent variable \(\mathbf{z}_s\), which serves as a lower-dimensional vector representation of the species' complex niche. This process is a form of representation learning, where \(\mathbf{z}_s\) is optimized to encapsulate the essential ecological characteristics of the species. Specifically: \[ P(\mathbf{e} \mid S = s) = \int_Z P(\mathbf{e} \mid \mathbf{z} = \mathbf{z}_s) P(\mathbf{z} = \mathbf{z}_s) \, d\mathbf{z}, \] where \(\mathbf{z}_s\) represents the species' niche in a lower-dimensional space, allowing for a more efficient and flexible representation of complex environmental dependencies.
4. COMBINING THE INTEGRALS: Substituting this into Equation (1), we arrive at the final expression: \[ P(X, Y \mid S = s) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P(X, Y \mid \mathbf{e}) \left(\int_Z P(\mathbf{e} \mid \mathbf{z} = \mathbf{z}_s) P(\mathbf{z} = \mathbf{z}_s) \, d\mathbf{z}\right) \, de_1 \cdots de_n. \tag{2} \] This final equation (2) is a combination of two probability distributions that are independent of one another:
1. \( P(X, Y \mid \mathbf{e}) \): Represents the probability of geographic coordinates given environmental conditions.
2. \( P(\mathbf{e} \mid S = s) \): Represents the environmental niche of the species, describing how likely a species is to occur under different environmental conditions.
Biologically, \( P(\mathbf{e} \mid S = s) \) captures the species' niche, detailing the range of environmental conditions under which a species can thrive.
Meanwhile, \( P(X, Y \mid \mathbf{e}) \) maps these environmental conditions to geographic locations, indicating where such suitable conditions are found on Earth.
Distance Metrics for Comparing High Dimensional Distributions
Let \(\mathbf{E}_{\text{pred}}\) and \(\mathbf{E}_{\text{obs}}\) be two matrices where rows represent individual environmental vectors associated with predicted and observed occurrences, respectively. These matrices can have different numbers of rows, reflecting the flexibility of the distance measures used.
Energy Distance: Energy Distance measures the expected difference between pairs of samples from two distributions and provides a smooth, computationally efficient metric for comparing distributions: \[ E(\mathbf{E}_1, \mathbf{E}_2) = 2\,\mathbb{E}[\|\mathbf{E}_1 - \mathbf{E}_2\|] - \mathbb{E}[\|\mathbf{E}_1 - \mathbf{E}_1'\|] - \mathbb{E}[\|\mathbf{E}_2 - \mathbf{E}_2'\|], \] where \(\mathbf{E}_1\) and \(\mathbf{E}_2\) represent matrices of vectors, and \(\mathbf{E}_1'\) and \(\mathbf{E}_2'\) are independent copies of \(\mathbf{E}_1\) and \(\mathbf{E}_2\). Energy Distance primarily matches the marginal characteristics of the distributions, such as mean and variance, without explicitly aligning their spatial structures. This makes it computationally efficient and ideal for quickly moving the optimization towards higher probability regions in the latent space due to its smoother loss surface (Székely & Rizzo, 2013).
Sinkhorn Distance: Sinkhorn Distance is a computationally efficient approximation of the Wasserstein distance (also known as Earth Mover's Distance), which measures the cost of optimally transporting one probability distribution to match another. The classic Wasserstein distance is defined as: \[ W(\mathbf{E}_1, \mathbf{E}_2) = \min_{\pi \in \Pi(\mu, \nu)} \sum_{i,j} \pi_{ij} \, c(\mathbf{E}_{1,i}, \mathbf{E}_{2,j}), \] where \( \Pi(\mu, \nu) \) represents the set of all transport plans between the empirical distributions \(\mu\) and \(\nu\) of \( \mathbf{E}_1 \) and \( \mathbf{E}_2 \), and \( c(\mathbf{E}_{1,i}, \mathbf{E}_{2,j}) \) is typically the squared Euclidean distance between vectors \(\mathbf{E}_{1,i}\) and \(\mathbf{E}_{2,j}\).
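To make the Energy Distance defined above concrete, here is a minimal sample-based sketch. It estimates each expectation by averaging Euclidean distances over all row pairs, which is one common estimator and is assumed here rather than taken from the NicheFlow code. The function names are hypothetical.

```python
import math


def mean_pairwise_distance(rows_a, rows_b):
    """Average Euclidean distance over every pair (one row from each matrix)."""
    total = sum(math.dist(a, b) for a in rows_a for b in rows_b)
    return total / (len(rows_a) * len(rows_b))


def energy_distance(rows_a, rows_b):
    """Sample estimate of 2*E|A - B| - E|A - A'| - E|B - B'|.

    The two matrices may have different numbers of rows, but every row
    must have the same number of columns (environmental variables).
    """
    return (
        2.0 * mean_pairwise_distance(rows_a, rows_b)
        - mean_pairwise_distance(rows_a, rows_a)
        - mean_pairwise_distance(rows_b, rows_b)
    )
```

Because each term is an average over pairs, the estimate does not require the two matrices to have equal row counts, which is the flexibility the text relies on.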
The Wasserstein distance captures detailed structural characteristics such as covariance and spatial distribution of the data, making it particularly valuable for ecological applications (Villani, 2008). However, it is computationally expensive, especially in high-dimensional settings typical of environmental data. Sinkhorn Distance introduces an entropic regularization term to the Wasserstein distance, which not only smooths the optimization landscape but also makes the transport plan differentiable, allowing gradient-based optimization. The Sinkhorn distance is defined as: \[ S(\mathbf{E}_1, \mathbf{E}_2) = \min_{\pi \in \Pi(\mu, \nu)} \sum_{i,j} \pi_{ij} \, c(\mathbf{E}_{1,i}, \mathbf{E}_{2,j}) + \epsilon \sum_{i,j} \pi_{ij} \log(\pi_{ij}), \] where \(\epsilon\) is a regularization parameter that controls the trade-off between transport cost and entropy. The Sinkhorn algorithm iteratively adjusts dual variables to find a stable transport plan, which is computationally efficient and differentiable, unlike traditional optimal transport solutions. This differentiability enables the use of gradient descent for embedding optimization, making Sinkhorn particularly suitable for complex, high-dimensional comparisons (Cuturi, 2013).
Zero-shot Optimization Details
Monte Carlo Sampling for Loss Estimation: A key aspect of the optimization approach for zero-shot species is the Monte Carlo sampling of environmental vectors. At each optimization step, \( N_{\text{pred}} \) samples of predicted environmental vectors, \(\mathbf{e}_{\text{pred}}\), are drawn from the generative model conditioned on the current species embedding \(\mathbf{z}_{s^*}\). This results in a matrix of predicted environmental vectors, \(\mathbf{E}_{\text{pred}}\), which is compared to the matrix of observed environmental vectors \(\mathbf{E}_{\text{obs}}\). The flexibility of Energy and Sinkhorn distances allows them to operate on matrices with differing numbers of rows, meaning \( N_{\text{pred}} \) does not have to match the number of observed points \( N_{\text{obs}} \). This flexibility is advantageous because it permits balancing between computational efficiency and optimization effectiveness.
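The entropic-regularized transport problem described above can be sketched with the basic Sinkhorn iteration. This is a plain, non-log-domain version with uniform weights on the rows of each matrix and a squared Euclidean cost, all of which are assumptions made for illustration. A practical implementation would usually work in the log domain for numerical stability and would run inside an automatic differentiation framework.

```python
import math


def sinkhorn_plan(rows_a, rows_b, epsilon=1.0, n_iters=200):
    """Return (transport plan, cost matrix) from basic Sinkhorn scaling."""
    n, m = len(rows_a), len(rows_b)
    # Squared Euclidean cost between every pair of rows.
    cost = [[sum((x - y) ** 2 for x, y in zip(a, b)) for b in rows_b]
            for a in rows_a]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    mu = [1.0 / n] * n  # uniform weight on each row of the first matrix
    nu = [1.0 / m] * m  # uniform weight on each row of the second matrix
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale so row sums match mu, then column sums match nu.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, cost


def sinkhorn_cost(rows_a, rows_b, epsilon=1.0, n_iters=200):
    """Transport cost sum(plan * cost) under the regularized plan."""
    plan, cost = sinkhorn_plan(rows_a, rows_b, epsilon, n_iters)
    return sum(p * c for p_row, c_row in zip(plan, cost)
               for p, c in zip(p_row, c_row))
```

Smaller values of `epsilon` bring the plan closer to the unregularized optimal transport solution, at the price of slower convergence and, in this naive form, a risk of underflow in the kernel.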
A larger \( N_{\text{pred}} \) results in more accurate distance estimation but increases computational costs, while a smaller \( N_{\text{pred}} \) adds stochasticity, which can help escape local minima during stochastic gradient descent. In practice, I found \( N_{\text{pred}} = 1000 \) samples per optimization iteration provided a good balance, offering sufficient stochasticity without excessively slowing down computation, particularly for test species with approximately 100 observed occurrence points. A good approach to the optimization was to start the weighting parameter \(\alpha\) at 1 and gradually decrease it to 0 over the course of the optimization. Initially, Energy Distance is emphasized due to its computational efficiency and smoother loss surface, which facilitates rapid convergence toward higher probability regions in the latent space. As \(\alpha\) decreases, the focus shifts towards Sinkhorn Distance, which allows for fine-tuning by capturing intricate distributional details, albeit with a more complex loss landscape. The differentiability of both Energy Distance and Sinkhorn Distance is crucial, as it allows for efficient backpropagation of gradients through the loss function, enabling gradient-based optimization of \(\mathbf{z}_{s^*}\).
MODEL ARCHITECTURE DETAILS
NichEncoder
Stage 1: Conditional Variational Autoencoder (CVAE)
The first stage of NichEncoder is a Conditional Variational Autoencoder (CVAE) that generates environmental variables \(\mathbf{e}\) from species-specific embeddings, \(\mathbf{z}_{\text{species}}\). The CVAE conditions on \(\mathbf{z}_{\text{species}}\) by concatenating these embeddings to each layer of both the encoder and decoder networks, allowing the model to learn species-specific environmental niches. The architecture consists of multiple layers of Multi-Layer Perceptrons (MLPs) that encode and decode the input environmental variables.
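The annealing of \(\alpha\) described in the zero-shot optimization details above can be sketched as follows. The text states only that \(\alpha\) starts at 1 and decreases to 0, so both the linear decay and the convex combination of the two distances used here are assumptions. The function names are hypothetical.

```python
def alpha_schedule(step, n_steps):
    """Linearly decay alpha from 1 at the first step to 0 at the last (assumed shape)."""
    fraction = step / max(1, n_steps - 1)
    return max(0.0, 1.0 - fraction)


def combined_loss(energy_value, sinkhorn_value, alpha):
    """Assumed mixing rule: alpha weights Energy Distance, (1 - alpha) weights Sinkhorn."""
    return alpha * energy_value + (1.0 - alpha) * sinkhorn_value
```

With this rule the first iterations optimize Energy Distance alone and the last optimize Sinkhorn Distance alone, matching the coarse-to-fine behaviour described in the text.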
ENCODER AND DECODER STRUCTURE
The encoder network takes environmental variables concatenated with species embeddings as input and processes them through three fully connected layers, each with 1024 neurons and ReLU activations. The output of the encoder includes the mean (\(\mu\)) and log variance (\(\log \sigma^2\)) of the latent variable \(\mathbf{z}_{\text{latent}}\). The decoder network mirrors this structure, taking the latent variable \(\mathbf{z}_{\text{latent}}\) and species embeddings as input to reconstruct the environmental variables \(\hat{\mathbf{e}}\). Species embeddings are learned using an embedding layer that maps species identifiers to \(\mathbf{z}_{\text{species}}\) vectors, which are used throughout the encoder and decoder networks.
LOSS FUNCTION AND OPTIMIZATION STRATEGY
The CVAE is optimized using a combined loss function that includes the reconstruction loss, KL divergence, and species latent space regularization. The \(\gamma\) parameter plays a crucial role in balancing the contributions of the likelihood and the KL divergence during training, progressively adjusting the importance of these terms to ensure the latent variables capture the data manifold effectively. The total loss function is given by: \[ \mathcal{L} = \gamma \cdot \mathbb{E}_{q(\mathbf{z}_{\text{latent}} \mid \mathbf{e}, \mathbf{z}_{\text{species}})}[\log p(\mathbf{e} \mid \mathbf{z}_{\text{latent}}, \mathbf{z}_{\text{species}})] - D_{\text{KL}}(q(\mathbf{z}_{\text{latent}} \mid \mathbf{e}, \mathbf{z}_{\text{species}}) \parallel p(\mathbf{z}_{\text{latent}})) + \lambda \cdot \mathcal{R}_{\text{species}}, \] where \(\gamma\) is initially set to down-weight the likelihood term and is gradually increased during training to enhance the model's focus on data reconstruction. The species regularization term, \(\mathcal{R}_{\text{species}}\), is designed to keep the species latent space compact by penalizing a combination of squared (L2) and absolute (L1) values of the latent variables, effectively balancing between Gaussian (ridge regression) and Laplace (lasso regression) priors: \[ \mathcal{R}_{\text{species}} = (1 - \alpha) \cdot \sum (\mathbf{z}_{\text{species}}^2) + \alpha \cdot \sum |\mathbf{z}_{\text{species}}|, \] where \(\alpha\) is a tunable parameter controlling the weighting between the two regularization terms.
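The pieces of the CVAE objective above can be sketched in a few lines. The closed-form KL term assumes a diagonal Gaussian posterior and a standard normal prior, which is the usual VAE setup but is not stated explicitly in the text. The total is written as a quantity to minimize, which is an assumption about the sign convention, and the start and end values of the \(\gamma\) warm-up are hypothetical.

```python
import math


def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def species_regularizer(z_species, alpha):
    """(1 - alpha) * sum(z^2) + alpha * sum(|z|), as in the regularization term."""
    l2 = sum(z * z for z in z_species)
    l1 = sum(abs(z) for z in z_species)
    return (1.0 - alpha) * l2 + alpha * l1


def gamma_schedule(epoch, n_epochs, gamma_start=0.1, gamma_end=1.0):
    """Linear warm-up of gamma (start and end values are hypothetical)."""
    fraction = min(1.0, epoch / max(1, n_epochs - 1))
    return gamma_start + fraction * (gamma_end - gamma_start)


def cvae_loss(log_likelihood, mu, log_var, z_species, gamma, lam, alpha):
    """Minimization form: -gamma * log-likelihood + KL + lam * regularizer."""
    return (-gamma * log_likelihood
            + kl_diag_gaussian(mu, log_var)
            + lam * species_regularizer(z_species, alpha))
```

Setting `alpha` to 0 gives a pure ridge-style penalty on the species embedding and setting it to 1 gives a pure lasso-style penalty, which is the trade-off the regularization term is described as controlling.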
The optimizer used for training the CVAE is AdamW with a learning rate of 0.002, and a one-cycle learning rate policy is employed over 2500 epochs to dynamically adjust the learning rate during training.
Stage 2: Rectified Flow Model
The second stage of NichEncoder uses a Rectified Flow model, inspired by earlier rectified flow work, to address the challenge of modeling the non-Gaussian posterior distribution of \(\mathbf{z}_{\text{latent}}\). This model estimates an Ordinary Differential Equation (ODE) that transforms noise into the complex distribution observed in \(\mathbf{z}_{\text{latent}}\).
RECTIFIED FLOW MODEL DETAILS
The Rectified Flow model is based on a U-net style architecture adapted with MLP layers for vector field estimation (Figure 2, main manuscript). The architecture consists of three encoding and three decoding layers, with progressively decreasing and then increasing neuron counts (512, 256, and 128). Inputs to the model include the coordinates of latent variables, a time variable encoding, and species embeddings.
TRAINING DATA CREATION
Training data for the Rectified Flow model is generated by interpolating between Gaussian noise samples and target samples from the latent distribution, forming a path that the model learns to approximate. The model is trained to predict the vector direction of these interpolated samples, effectively estimating a vector field pointing towards areas of high density in the target distribution.
TRAINING PROCESS
The training process of the Rectified Flow model consists of two stages:
1. In the first stage, the model learns to estimate the vector field that guides noise samples toward the target latent distribution. The model is trained on paths created by the interpolation, capturing the flow dynamics through the latent space.
2. In the second stage, the model refines the learned ODE by training on noise samples and their corresponding points after the initial ODE transformation.
This rectification stage adjusts the ODE to approximate a linear transformation from noise to the target distribution, enhancing sampling efficiency.
LOSS FUNCTION AND OPTIMIZATION SETTINGS
The loss function minimizes the mean squared error (MSE) between the predicted and target vectors during training: \[ \mathcal{L} = \text{MSE}(\mathbf{v}_{\text{pred}}, \mathbf{v}_{\text{target}}). \] The optimizer used is AdamW with a learning rate of 0.001 and a weight decay of 0.01. A one-cycle learning rate policy is applied over 6000 epochs, adjusting learning rates dynamically to ensure effective training.
TRAJECTORY SAMPLING
Trajectories are sampled using an ODE solver (ode45), integrating the learned vector field to transform noise into samples from the target distribution efficiently. This approach allows the model to generate high-quality samples with minimal integration steps.
GeODE
Model Architecture
GeODE uses a modified rectified flow architecture similar to that used in NichEncoder but tailored specifically for geographic data. The model takes 2-dimensional noise vectors as inputs, which are transformed through the rectified flow mechanism to output the desired geographic coordinates. The input consists of random noise vectors representing initial guesses in 2D space, while the conditioning input (\(\mathbf{e}\)) comprises environmental variables associated with each geographic location. Each environmental vector is normalized using means and standard deviations calculated from the data, ensuring numerical stability during training.
U-NET ARCHITECTURE
The core of GeODE is a U-net style structure implemented with Multi-Layer Perceptrons (MLPs) instead of convolutional layers. The U-net consists of two primary paths: downsampling and upsampling. In the downsampling path, the input noise vectors and environmental conditioning are passed through three fully connected layers with progressively smaller neuron counts (512, 256, and 128).
These layers reduce the dimensionality while learning broad, high-level representations of the relationship between geographic locations and environmental factors. The upsampling path reconstructs the geographic coordinates by reversing the dimensionality reduction, using three corresponding fully connected layers to produce the final outputs. Skip connections between the downsampling and upsampling paths retain and propagate finer details, leading to more accurate predictions.
INPUT CONDITIONING AND ENCODING
In addition to the U-net structure, GeODE includes specialized encoding layers for the time variable \(t\) and environmental conditioning vectors. A linear layer encodes the time step, representing the interpolation factor between noise and target coordinates. Another linear layer processes the environmental vectors, embedding them into a latent space that informs the transformation from noise to geographic coordinates. These encoded time and environmental vectors are concatenated with the latent representations from the U-net, allowing the model to incorporate both spatial and environmental dependencies into its predictions.
Training Data Creation
Training data for GeODE is generated through a Monte Carlo sampling process. Gaussian noise samples are drawn for both the latitude and longitude dimensions, creating initial random coordinate sets. These coordinates are linearly interpolated with target coordinates (actual occurrence points), guided by the ODE. This interpolation path forms the input for training, allowing the model to learn how to evolve from noise to realistic geographic distributions.
Training Process
STAGE 1: VECTOR FIELD ESTIMATION
In the first stage, the model learns to estimate a vector field that guides the initial noise samples toward the target coordinates, which represent actual geographic occurrence points. The U-net predicts the transformation vectors that align the noise vectors with the target distribution.
Training data is generated by sampling Gaussian noise vectors for both \(X\) and \(Y\) dimensions, which are then linearly interpolated with the actual occurrence points. This interpolation forms a path between the noise and the target coordinates, which the model learns to follow.
STAGE 2: ODE RECTIFICATION
The second stage refines the transformation by rectifying the ODE. In this step, the ODE is adjusted so that it evolves the noise vectors in a near-linear path toward the target coordinates, minimizing the computational steps required to generate realistic samples during inference.
Loss Function and Optimization
The loss function used to train GeODE is the mean squared error (MSE) between the predicted and target coordinates: \[ \mathcal{L} = \text{MSE}((X, Y)_{\text{pred}}, (X, Y)_{\text{target}}). \] Optimization is performed using the AdamW optimizer with a learning rate of 0.001 and a weight decay of 0.01. A one-cycle learning rate policy is employed over 5000 epochs, dynamically adjusting the learning rate to improve training efficiency.
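The interpolation-based training data and the ODE sampling described for both rectified flow models can be sketched as below. This uses the standard rectified flow construction, in which the training target at an interpolated point is the straight-line direction from noise to target. A fixed-step Euler integrator stands in for the ode45 solver named in the text, and the function names are hypothetical.

```python
import random


def interpolate(noise, target, t):
    """Point on the straight line from noise (t = 0) to target (t = 1)."""
    return [(1.0 - t) * n + t * x for n, x in zip(noise, target)]


def target_velocity(noise, target):
    """Regression target for the vector field: the straight-line direction."""
    return [x - n for n, x in zip(noise, target)]


def make_training_example(target, rng):
    """Draw Gaussian noise and a random time; return (x_t, t, velocity)."""
    noise = [rng.gauss(0.0, 1.0) for _ in target]
    t = rng.random()
    return interpolate(noise, target, t), t, target_velocity(noise, target)


def euler_sample(velocity_fn, start, n_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(start)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

When the learned field is close to a constant straight-line velocity, as the rectification stage aims for, even a single Euler step lands near the target, which is why rectification reduces the number of integration steps needed at inference.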