Journal of Forensic Investigation
In the reduced 10-protein PCA model [Figure 3B], PC1 and PC2 explained 27.6% and 20.2% of the total variance, respectively (47.8% cumulative). The increase in explained variance after feature reduction suggests that the selected proteins capture a greater proportion of subject-associated variability than the full proteome dataset. The three perpetrator-derived samples (CH14, CH14.1, CH14.2) formed a compact cluster in PC space. To quantify this separation, Euclidean distances were calculated in PC1–PC2 space relative to the centroid of the perpetrator samples. Perpetrator-derived samples exhibited significantly smaller distances to their centroid (0.0633 ± 0.0105) compared with all non-perpetrator samples (0.6269 ± 0.1616; Mann–Whitney U = 0, p = 1.88 × 10⁻⁴). The closest non-perpetrator sample (CH6.1; distance = 0.248) remained substantially separated from the perpetrator cluster. These results quantitatively confirm the enhanced discriminatory resolution observed visually in [Figure 3C].
Collectively, these findings demonstrate that salivary proteomic profiling retains subject-specific structure capable of discriminating a crime scene sample from unrelated individuals, even after correction for batch effects and dimensionality reduction. The observed clustering and quantitative separation in PCA space suggest that a focused subset of discriminatory proteins enhances resolution beyond that achieved with the full proteome. To contextualize the forensic utility of this approach, we next compared proteomic-based discrimination with conventional STR DNA genotyping performed on the same cohort.
Using all 23 loci [Figure 4A], PC1 and PC2 explained 18.0% and 16.5% of total variance, respectively (34.5% cumulative). PCA revealed substantial overlap among samples, reflecting the relatively small variance in genotypic frequencies across individuals. For example, samples A14 and A16 (perpetrator-derived samples) overlapped completely in PC space, consistent with identical STR profiles. However, additional samples (e.g., A7) occupied proximal positions in PCA space despite having distinct genotypes. Other sample pairs also exhibited limited spatial resolution in reduced-dimensional space (A1–A2, A3–A11, A5–A10) [Figure 4A]. The limited dispersion observed in PCA likely reflects the constrained numerical range of genotypic frequencies derived from a finite population database (n = 1,036 individuals), resulting in relatively small between-profile variance (average variance = 0.003686).
The reduced-locus PCA demonstrated modest improvement in cluster separation; however, overlap among unrelated individuals remained, indicating that PCA of genotypic frequencies provides limited discriminatory resolution compared with conventional STR matching criteria.
The comparative results highlight fundamental differences between the two approaches: STR genotyping is deterministic and population-statistically validated, whereas proteomic profiling is continuous, biologically dynamic, and currently exploratory.
· STR typing provides deterministic identity via allele concordance.
· PCA-based proteomics shows intra-individual similarity over time.
· Proteomics may complement DNA in low-template conditions.
· Comparative analysis clarifies the strengths and limitations of both methods.
Research Article
Comparative Evaluation of Salivary Proteomic Profiling and STR DNA Typing for Forensic Subject Discrimination
Doan H1, Hogan C1, Viray J2 and Giulivi C1,3*
1Department of Molecular Biosciences, School of Veterinary Medicine, Davis, CA 95616,
2Sacramento District Attorney’s Office, Biology Laboratory, Sacramento, CA 95814
3Medical Investigations of Neurodevelopmental Disorders (M.I.N.D.)
Institute, University of California Davis, CA 95817
*Address for Correspondence: Cecilia Giulivi, Department of Molecular Biosciences, School of
Veterinary Medicine, 1089 Veterinary Medicine Drive, Davis, CA, USA, E-mail: cgiulivi@ucdavis.edu
Submission: 18 March, 2026
Accepted: 29 May, 2026
Published: 03 June, 2026
Copyright: ©2026 Doan H, et al. This is an open access article
distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Keywords: Forensic identification; short tandem repeat (STR) typing; DNA
analysis; salivary proteomics; liquid chromatography–tandem mass
spectrometry (LC–MS/MS); principal component analysis (PCA); forensic
biology; Random Match Probability (RMP)
Abstract
Identifying persons of interest from biological evidence is central
to establishing probative value in forensic investigations. Short tandem
repeat (STR) DNA typing remains the forensic gold standard due to
its statistical robustness and high inter-individual discriminatory power.
However, low-template DNA, degradation, and complex mixtures
may limit profile clarity. In contrast, forensic proteomics leverages
the relative abundance and chemical stability of proteins and may
provide complementary biological information.
This study directly compared subject discrimination using STR genotyping and salivary proteomic profiling. Saliva samples from 41 individuals (including three samples from the same individual collected across ~2 years) were analyzed by LC–MS/MS, and protein abundance profiles were evaluated using unsupervised principal component analysis (PCA) with Euclidean distance metrics. DNA genotyping was performed on 16 samples using real-time qPCR quantification, PowerPlex® Fusion STR amplification, and capillary electrophoresis.
Full STR profiles were obtained for all DNA samples, and identical alleles across all loci confirmed perpetrator identity through population-based Random Match Probability (RMP) calculations. Proteomic PCA of 269 shared proteins explained 20.1% (PC1) and 15.0% (PC2) of total variance; reduction to 10 discriminatory proteins increased explained variance to 27.6% and 20.2%, respectively. Perpetrator-derived proteomic samples formed significantly tighter clusters than unrelated individuals (Mann–Whitney U = 0, p = 1.88 × 10⁻⁴). Notably, samples collected approximately two years apart remained more similar to each other than to any other subject in multivariate space.
While STR typing provides deterministic identification through locus-by-locus allele concordance, proteomic profiling captures continuous, biologically dynamic variation. These findings support the concept that salivary proteomics may serve as a complementary forensic tool, particularly when DNA quality or quantity is compromised, though further validation and the development of a statistical framework are required before evidentiary implementation.
Introduction
DNA genotyping has been the “Gold Standard” in forensic
science, serving as a critical identification method and a noninvasive
means of collecting reference profiles [1]. One of the primary
advantages of short-tandem repeat (STR) typing is the visual
discrimination provided by electropherogram peaks, which allow
analysts to determine the alleles present in a sample, assess potential
contributions from multiple individuals, and estimate the number of
contributors through peak height ratios at each locus [2-5].
DNA can be extracted from numerous biological materials
encountered at crime scenes, including saliva deposited on cigarette
butts, glassware, utensils, and gum, as well as from half-eaten food,
vomit, and telephone receivers [6]. In sexual assault cases, saliva
recovered from the neck, face, breast, and/or genitalia—often in
conjunction with genital fluids—can corroborate victim testimony
and link offender(s) to the crime scene. In routine forensic workflows,
saliva is first identified using presumptive and confirmatory assays
targeting α-amylase, including immunochromatographic strip tests
and radial diffusion methods, prior to downstream DNA analysis.
DNA of sufficient quality and quantity is essential for reliable identification. Samples encountered in forensic investigations are frequently partially degraded, contain low template amounts, and may include PCR inhibitors. Individually or collectively, these factors influence both the efficiency of laboratory processing and the reliability of results [7]. DNA degradation or inefficient amplification [3,4,6,8] can compromise interpretability [4,6,8,9] due to intrinsic chemical instability. Biological samples may be exposed to harsh environmental conditions prior to collection, including elevated temperatures, ultraviolet radiation, high humidity, and bacterial nucleases that degrade template DNA [2,4,6,8,9]. The deoxyribose-phosphate backbone and nitrogenous bases are particularly susceptible to depurination, oxidation, and hydrolysis [10], whereas proteins and peptides are generally more stable outside physiological environments [11].
Although STR markers provide robust genetic linkage to an individual, they are located within noncoding regions of chromosomes and therefore offer limited phenotypic or contextual information [12]. To address limitations associated with degraded or low-template samples, alternative genetic approaches have been developed, including mitochondrial DNA (mtDNA), single nucleotide polymorphisms (SNPs) [13], insertion/deletion (indel) polymorphisms [14], and mini-STRs [15]. DNA typing remains highly effective when contributors can be clearly resolved [3-5,16]; however, complex mixtures and environmentally compromised samples continue to pose interpretive challenges.
Importantly, DNA genotyping does not provide information
regarding an individual’s disease status, age, lifestyle factors, or recent
exposures. In contrast, saliva transfer—such as that occurring in bite
marks—may provide insight into sex and age [17], geographical
region [18,19], smoking habits or drug use [20,21], and disease states through the analysis of salivary analytes, including proteins [22-26].
While such information may enhance investigative leads, it also raises
important ethical and privacy considerations that warrant broader
discussion.
Collectively, these findings support the exploration of proteomic profiling as either an alternative or complementary identification strategy. Degraded DNA and mixed-source samples remain difficult to resolve [2-5,27,28], whereas many salivary proteins are highly conserved across individuals [22,26,29], with a subset exhibiting discriminatory potential [30,31]. Recent proteomic investigations have demonstrated that salivary protein profiles can discriminate between individuals using principal component analysis (PCA), reducing hundreds of detected proteins to a focused subset of highly discriminatory markers that cluster uniquely by subject. This study, therefore, aims to critically evaluate the comparative and complementary value of STR-based DNA typing and salivary proteomics for subject identification in forensic casework.
Materials and Methods
Study design and sample collection:
All saliva collection procedures were performed as previously
described [30, 31] and were approved by the University of California,
Davis Institutional Review Board (IRBNet ID: 1544585-1; approved
4/17/2020). The work described has been carried out in accordance
with the World Medical Association’s Declaration of Helsinki.
Written informed consent was obtained from all participants. A
total of 41 saliva samples were collected from female volunteers aged
20-61 years. Eleven samples were collected on October 22, 2020 (B1
cohort), and thirty samples were collected on February 4, 2022 (B2/B3
cohort). Subjects had abstained from eating, drinking, or oral hygiene
procedures for 15-30 minutes prior to collection. Participants rinsed
with water, allowed unstimulated saliva to pool for 60 seconds, and
expectorated 0.1–1 mL into sterile containers. Samples were stored at
−20°C until processing. The October 2020 cohort (n = 11) included a
designated “crime scene” sample (CH14) and 10 unrelated subjects.
The February 2022 cohort included 15 subjects, one of whom
corresponded to the perpetrator represented in the 2020 cohort.
Saliva samples from this second cohort were divided into two equal
aliquots: one untreated (B2) and one treated with starch to remove
amylase and glycosylated proteins (B3), as described previously [30].
Untreated samples were labeled CH#, and starch-treated samples
were labeled CH#.1. The perpetrator samples were labeled CH14.1
(starch-treated) and CH14.2 (untreated). The original crime scene
sample remained labeled CH14.
For DNA genotyping, saliva samples from the 2022 cohort (n = 15) were analyzed along with an additional sample collected from the perpetrator on April 19, 2023 (n = 16 total DNA samples). All
samples were stored at −20°C until further processing.
Saliva proteomics:
Sample processing – Whole saliva samples were precipitated
with four volumes of −20°C analytical-grade acetone (Sigma-
Aldrich, St. Louis, MO) and incubated overnight at 4°C. Samples
were centrifuged at 16,000 × g for 10 minutes at 4°C. Pellets were
washed twice with −20°C acetone, centrifuged under the same
conditions, and dried under vacuum for 15 minutes (SpeedVac).
Protein pellets were solubilized in 100 μL of 6 M urea in 50 mM
ammonium bicarbonate (pH 8.0). Samples were reduced with 2.5 μL
of 5 mM dithiothreitol (DTT) for 30 minutes at 37°C and alkylated
with 20 μL of 5 mM iodoacetamide (IAA) for 30 minutes in the dark.
Excess IAA was quenched with 20 μL DTT for 10 minutes at room
temperature. Proteins were digested with a mass spectrometry-grade
rLys-C/Trypsin Gold mix (Promega, Madison, WI) at a 1:25 enzyme-to-protein
ratio for 4 hours at 37°C. Urea concentration was diluted
to <1 M with 50 mM ammonium bicarbonate, and digestion was
continued overnight at 37°C. Peptides were desalted using Macro
Spin Columns (The Nest Group, Ipswich, MA). Approximately 10–
100 μg of peptide digest was subjected to mass spectrometry.
Liquid Chromatography and Tandem Mass Spectrometry -
Peptide digests were randomized prior to analysis and processed at
the UC Davis Proteomics Facility using a Q Exactive Orbitrap mass
spectrometer (Thermo Scientific) coupled to a Proxeon Easy-nLC II
system with a nanospray source. Peptides were loaded onto a 100 μm × 25 mm Magic C18 (200 Å, 5 μm) trap column and separated on a 75 μm × 150 mm Magic C18 (200 Å, 3 μm) analytical column using a 90-minute gradient at 300 nL/min. Full MS scans were acquired over 300–1600 m/z. The top 15 precursor ions were selected for high-energy collisional dissociation (HCD) using a 2.0 m/z isolation window, 27% normalized collision energy, and 5-second dynamic exclusion.
Protein Identification - MS/MS spectra were searched using X! Tandem (version Alanine 2017.2.1.4) against the HumanFR_ crap05292020_rev database (149,657 entries). Search parameters included: (i) trypsin specificity; (ii) parent and fragment ion tolerances of 20 ppm; and (iii) variable modifications: carbamidomethylation (Cys), oxidation (Met, Trp), deamidation (Asn, Gln), N-terminal pyro-Glu formation, ammonia loss, and selenocysteine modifications. Protein identifications were validated using Scaffold (v4.11.1, Proteome Software Inc.). Peptide identifications were accepted at >88% probability to achieve <0.5% false discovery rate (FDR). Protein identifications were accepted at >5% probability, FDR <5%, and required at least two unique peptides, as assigned by the ProteinProphet algorithm. Proteins sharing indistinguishable peptides were grouped according to parsimony principles. Relative protein abundance was estimated using weighted spectral counting.
Salivary protein profiling and statistical analyses - Proteomic data were normalized to the total spectral counts within each sample. Across datasets, 1,169 proteins were identified in the first cohort and 281 in the second; proteins present in all datasets were retained for further analysis (n = 269; Supplementary Table 1). To correct for technical variation and non-biological batch effects, data were processed using the EigenMS algorithm [32,33]. Corrected data were subjected to principal component analysis (PCA) using ClustVis [34]. To identify proteins contributing most strongly to inter-subject variability, PCA loading values from the full 269-protein dataset were ranked by absolute magnitude (Supplementary Table 2). The top 10 proteins were selected for reduced-dimensional PCA. Additional statistical analyses were performed using GraphPad Prism 9.1.0. To quantify inter-sample similarity within PCA space, Euclidean distances were calculated using PC1 and PC2 coordinates exported from ClustVis.
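The normalization, shared-protein filtering, and loading-ranking steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy spectral counts, and the toy loading values are hypothetical, and ranking on a single loading vector is an assumption, since the text does not specify which component(s) the ranking used.

```python
from typing import Dict, List

# sample name -> protein name -> spectral count
Counts = Dict[str, Dict[str, float]]


def normalize_to_total(counts: Counts) -> Counts:
    """Divide each protein's spectral count by the total count of its sample."""
    normalized: Counts = {}
    for sample, proteins in counts.items():
        total = sum(proteins.values())
        if total <= 0:
            raise ValueError(f"sample {sample} has no spectral counts")
        normalized[sample] = {p: c / total for p, c in proteins.items()}
    return normalized


def shared_proteins(counts: Counts) -> List[str]:
    """Return proteins detected (count > 0) in every sample."""
    detected = [
        {p for p, c in proteins.items() if c > 0}
        for proteins in counts.values()
    ]
    return sorted(set.intersection(*detected))


def top_by_abs_loading(loadings: Dict[str, float], k: int) -> List[str]:
    """Rank proteins by absolute loading value and keep the k largest."""
    return sorted(loadings, key=lambda p: abs(loadings[p]), reverse=True)[:k]


# Hypothetical toy data, not taken from the study.
toy_counts: Counts = {
    "S1": {"P1": 30.0, "P2": 10.0, "P3": 0.0},
    "S2": {"P1": 5.0, "P2": 15.0, "P3": 20.0},
}
toy_loadings = {"P1": -0.8, "P2": 0.3, "P3": 0.5}

normalized = normalize_to_total(toy_counts)
retained = shared_proteins(toy_counts)          # P3 is missing from S1
top_two = top_by_abs_loading(toy_loadings, 2)   # sign is ignored in the ranking
```

In the study itself, batch correction (EigenMS) and PCA (ClustVis) sit between the filtering and ranking steps; they are omitted here.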
For each PCA model (full 269-protein dataset and reduced 10-protein dataset), a centroid representing the perpetrator-derived samples (CH14, CH14.1, CH14.2) was calculated by averaging their PC1 and PC2 values. Euclidean distance for each sample was then computed as √[(PC1_sample − PC1_centroid)² + (PC2_sample − PC2_centroid)²]. Distances for perpetrator-derived samples were compared with those of all non-perpetrator samples using the Mann–Whitney U test (two-tailed). Statistical significance was defined as P < 0.05.
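As a minimal sketch of the centroid-distance comparison, the following Python uses only the standard library. The coordinates are hypothetical placeholders, not the study's PC scores, and the exact permutation p-value is an assumption about how the test could be evaluated (the analysis in the text was run in GraphPad Prism). With group sizes of 3 and 38, the smallest attainable exact two-sided p-value is 2/C(41,3) ≈ 1.88 × 10⁻⁴.

```python
from itertools import combinations
from math import comb, hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (PC1, PC2)


def centroid(points: Sequence[Point]) -> Point:
    """Average PC1 and PC2 over a group of samples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def distances_to(center: Point, points: Sequence[Point]) -> List[float]:
    """Euclidean distance of each sample to the centroid in PC1-PC2 space."""
    return [hypot(x - center[0], y - center[1]) for x, y in points]


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U for group a: number of (x, y) pairs with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_two_sided_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided p-value by enumerating every relabeling of the pooled data."""
    pooled = list(a) + list(b)
    n1 = len(a)
    mean_u = n1 * len(b) / 2
    observed = abs(mann_whitney_u(a, b) - mean_u)
    extreme = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mann_whitney_u(group_a, group_b) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / comb(len(pooled), n1)


# Hypothetical coordinates: 3 tightly grouped target samples, 38 others.
target = [(0.00, 0.00), (0.05, 0.02), (-0.03, 0.04)]
others = [(0.5 + 0.01 * i, 0.3) for i in range(38)]

center = centroid(target)
d_target = distances_to(center, target)
d_others = distances_to(center, others)

u_stat = mann_whitney_u(d_target, d_others)      # 0.0: every target is closer
p_value = exact_two_sided_p(d_target, d_others)  # 2 / C(41, 3)
```

Because the centroid is computed from the target samples themselves, their distances are small by construction; the test asks only whether they rank below all other samples.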
DNA Analyses:
DNA Extraction - DNA was extracted from whole saliva using
a modified QIAamp® DNA Blood Mini Kit protocol (QIAGEN,
Germantown, MD) following the manufacturer’s supplementary
instructions for saliva. Briefly, saliva samples (0.1–1.0 mL) were
diluted in Dulbecco’s phosphate-buffered saline (DPBS), centrifuged
at 1,800 × g for 5 minutes at 4°C, and the cell pellets were resuspended
in DPBS. Samples were lysed with QIAGEN Protease and Buffer
AL at 56°C for 1 hour, followed by ethanol addition and column-based
purification using QIAamp spin columns. Wash steps were
performed with Buffers AW1 and AW2, and DNA was eluted in
150 μL UltraPure™ water. Extracted DNA was stored at −20°C until
analysis.
DNA Quantification - DNA quantification was performed using the Quantifiler® Trio DNA Quantification Kit (Thermo Fisher Scientific, Waltham, MA) on a QuantStudio™ 5 Real-Time PCR System. Standard curves were generated using serial dilutions down to 5 pg/μL. Reactions were prepared according to the manufacturer’s instructions and amplified under the following conditions: 95°C for 2 minutes, followed by 40 cycles of 95°C for 9 seconds and 60°C for 30 seconds.
DNA input for downstream amplification was adjusted to 0.5–1.0 ng in a 15 μL reaction volume using amplification-grade water.
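The input adjustment is simple dilution arithmetic, sketched below in Python. The function name and the example concentrations are hypothetical, and treating the 15 μL as the combined volume of extract plus water is an assumption about how the adjustment was carried out.

```python
from typing import Tuple


def template_and_water(
    conc_ng_per_ul: float,
    target_ng: float = 1.0,
    total_ul: float = 15.0,
) -> Tuple[float, float, float]:
    """Return (extract volume in uL, water volume in uL, DNA delivered in ng)."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    extract_ul = target_ng / conc_ng_per_ul
    if extract_ul > total_ul:
        # Extract too dilute to reach the target: use the full volume undiluted.
        extract_ul = total_ul
    water_ul = total_ul - extract_ul
    return extract_ul, water_ul, extract_ul * conc_ng_per_ul


# Hypothetical quantification results (ng/uL).
concentrated = template_and_water(0.25)  # 4 uL extract + 11 uL water, 1.0 ng
dilute = template_and_water(0.04)        # 15 uL extract, no water, about 0.6 ng
```

In the second case the delivered amount falls below the 1.0 ng target but stays within the 0.5–1.0 ng window stated in the text.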
STR amplification and Capillary Electrophoresis - STR
amplification was performed using the PowerPlex® Fusion 6C
System (Promega) following the manufacturer’s protocol. PCR was
conducted on a Veriti™ 96-Well Thermal Cycler under the following
conditions: 96°C for 1 minute; 29 cycles of 96°C for 5 seconds and
60°C for 1 minute; final extension at 60°C for 10 minutes; hold at 4°C.
Amplified products were separated using an Applied Biosystems™ 3500xL Genetic Analyzer with 36 cm capillary arrays and POP-4 polymer. Samples were prepared with Hi-Di™ formamide and WEN ILS 500 size standard, denatured at 95°C for 3 minutes, snap-cooled, and analyzed using the HID36_POP4XL module.
STR profile interpretation- Electropherograms were analyzed
using GeneMapper® ID-X v1.6. The analytical threshold for allele
calling was set at 100 relative fluorescence units (RFU), consistent
with laboratory validation for the PowerPlex® Fusion 6C system. All
profiles were single-source. Because participants did not consent to
publication of full STR profiles, electropherograms and genotype
frequency tables are not presented.
Random Match Probability (RMP) Calculations - Random match probabilities were calculated according to NRC II recommendations (1996) [35]. For homozygous loci, the Balding–Nichols equation [p² + p(1 − p)θ] was applied using θ = 0.01. For heterozygous loci, Hardy–Weinberg expectations (2pq) were used. Allele frequencies were obtained from the NIST-revised U.S. STR population database [36]. Locus-specific probabilities were multiplied across all loci to obtain the overall RMP, and values were reported as 1/RMP.
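The RMP calculation described above can be illustrated with a short Python sketch. The locus names, alleles, and allele frequencies below are hypothetical placeholders and do not come from the NIST database or from any study profile; only the two formulas and θ = 0.01 are taken from the text.

```python
from math import prod
from typing import Dict, Tuple

THETA = 0.01  # population substructure correction used in the text


def locus_probability(
    alleles: Tuple[str, str],
    freqs: Dict[str, float],
    theta: float = THETA,
) -> float:
    """Genotype probability at one locus."""
    a, b = alleles
    if a == b:
        # Homozygote: p^2 + p(1 - p) * theta
        p = freqs[a]
        return p * p + p * (1 - p) * theta
    # Heterozygote: 2pq
    return 2 * freqs[a] * freqs[b]


def random_match_probability(
    profile: Dict[str, Tuple[str, str]],
    allele_freqs: Dict[str, Dict[str, float]],
) -> float:
    """Multiply locus-specific probabilities across all loci."""
    return prod(
        locus_probability(genotype, allele_freqs[locus])
        for locus, genotype in profile.items()
    )


# Hypothetical two-locus profile and frequencies, for illustration only.
toy_profile = {"LocusA": ("12", "12"), "LocusB": ("8", "9")}
toy_freqs = {
    "LocusA": {"12": 0.2},
    "LocusB": {"8": 0.1, "9": 0.3},
}

rmp = random_match_probability(toy_profile, toy_freqs)  # 0.0416 * 0.06
one_in = 1 / rmp                                        # reported form, 1/RMP
```

Because each locus contributes a factor well below 1, the product over a full multi-locus profile shrinks rapidly, which is why the result is conventionally reported as 1/RMP.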
Results
Salivary Profiling Using Proteomics:
Shotgun proteomic analysis identified >2,000 proteins across
all samples. For downstream comparative analysis, only proteins
detected in all 41 samples were retained (n = 269; Supplementary
Table 1), minimizing missing-value bias and ensuring comparability
across subjects.
Protein–protein interaction (PPI) network analysis using STRINGdb demonstrated significant biological connectivity (237 nodes, 2,823 edges; expected edges = 674; average node degree = 23.8; clustering coefficient = 0.49; PPI enrichment p < 1 × 10⁻¹⁶), supporting the biological coherence of the retained dataset [Figure 1A]. Unsupervised k-means clustering (k = 2, determined by centroid stability and within-cluster variance) separated the proteins into two principal functional groups [Figure 1A]. The larger cluster (n = 207) was enriched for secretory granule lumen and neutrophil degranulation pathways, whereas the smaller cluster (n = 25) was enriched for keratinization and intermediate filament organization. Gene Ontology enrichment confirmed expected salivary molecular functions, including endopeptidase inhibitor and antioxidant activities [Figure 1B-C]. Comparison with previously published salivary proteomes demonstrated overlap with known salivary signatures, while identifying 143 proteins not previously reported [Figure 1D], indicating expanded proteomic depth rather than a technical artifact.
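The reported network statistics are internally consistent, as a brief Python check shows. All three input numbers are taken from the text; the variable names are illustrative, and the fold-over-expected ratio is a derived convenience figure rather than a statistic reported by the authors.

```python
# Network summary values as reported in the text.
nodes = 237
edges = 2823
expected_edges = 674

# Each edge joins two nodes, so the mean degree is 2E / N.
average_degree = 2 * edges / nodes

# Observed edges relative to the number expected for a random protein set.
fold_over_expected = edges / expected_edges

print(f"average node degree: {average_degree:.1f}")
print(f"observed / expected edges: {fold_over_expected:.2f}")
```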
Batch Effect Assessment and Correction:
Because sample collections occurred approximately 16 months
apart, potential batch effects were formally evaluated. Singular value
decomposition (SVD)-based normalization using EigenMS was
applied to partition biological variance from systematic technical
variation.Visualization of uncorrected data demonstrated separation by collection cohort, consistent with batch-driven variance [Figure2 A-C]. Following EigenMS correction, this separation was markedly reduced, indicating effective removal of systematic bias while retaining inter-individual variability.
All subsequent analyses were conducted on batch-corrected data that were log-transformed (ln[x+1]) and Pareto-scaled to stabilize variance and reduce dominance by highly abundant proteins.
Figure 1: Functional characterization and literature comparison of the salivary proteome dataset. Panel A. Protein–protein interaction (PPI) network analysis of the 269 proteins retained for comparative analysis, generated using STRINGdb with whole-genome background enrichment. The network comprised 237 nodes and 2,823 edges (expected edges = 674), with an average node degree of 23.8 and an average local clustering coefficient of 0.49. The observed PPI enrichment was highly significant (p < 1 × 10⁻¹⁶), indicating greater connectivity than expected by chance. Nodes represent individual proteins, and edges represent high-confidence interactions. Unsupervised k-means clustering (k = 2) identified two principal functional clusters. Panel B. Gene Ontology (GO) enrichment analysis of molecular function for the network proteins. The top five enriched terms are shown for clarity. Terms were grouped by similarity (similarity ≥ 0.8) and ranked by signal strength. Minimum term count = 2; minimum signal and strength thresholds = 0.01. Panel C. GO enrichment analysis for tissue-associated terms derived from the same protein set. The top four enriched tissue categories are shown. Parameters were identical to those used in panel B. Panel D. Literature overlap analysis. Left: Venn diagram illustrating overlap between the present dataset and four previously published salivary proteomic studies. Right: Enrichment of overlapping proteins across PubMed-indexed datasets; the top four overlapping studies are shown [38-41].
Figure 2: Assessment of batch effects and normalization of salivary proteomic data. Panel A. Principal component analysis (PCA) of salivary proteomic profiles before and after batch correction. Samples are color-coded by collection group: B1 (red; collected in 2020), B2 (green; collected in 2022 without starch treatment), and B3 (blue; collected in 2022 with starch pre-treatment to remove amylase and glycosylated proteins). The left panel shows clustering of raw data prior to normalization, with separation largely driven by the collection cohort. The right panel shows the PCA after EigenMS normalization, demonstrating reduced cohort-driven separation and improved sample integration. Percent variance explained by each principal component is indicated on the axes. Panel B. Singular value decomposition (SVD) analysis used to identify systematic technical trends in the dataset. The left panel displays dominant variance components in the raw data, revealing structured batch-associated patterns. The right panel shows the corresponding components after EigenMS normalization, indicating attenuation of systematic bias. Panel C. Quantification of inter-sample distances across experimental conditions before and after normalization. Bars represent mean pairwise Euclidean distances within and between groups; the asterisk indicates a statistically significant reduction in batch-associated separation following EigenMS correction (statistical test described in Methods).
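The ln[x+1] transformation and Pareto scaling named in the text can be sketched as follows. This is a minimal illustration assuming the conventional definition of Pareto scaling (mean-centering each protein and dividing by the square root of its standard deviation); the function name and the toy count matrix are hypothetical and do not reproduce the study data or the exact ClustVis implementation.

```python
import numpy as np

def log_pareto_scale(counts):
    """Apply ln(x + 1), then Pareto-scale each column (protein).

    Rows are samples and columns are proteins.
    """
    x = np.log1p(np.asarray(counts, dtype=float))   # ln(x + 1)
    centered = x - x.mean(axis=0)                   # mean-center each protein
    sd = x.std(axis=0, ddof=1)                      # sample standard deviation
    # Guard against constant columns, which would otherwise divide by zero.
    scale = np.sqrt(np.where(sd > 0, sd, 1.0))
    return centered / scale

# Hypothetical spectral counts: 4 samples x 3 proteins of very different abundance.
toy_counts = np.array([
    [0, 10, 200],
    [3, 12, 150],
    [1,  8, 400],
    [5, 11, 250],
])
scaled = log_pareto_scale(toy_counts)
print(scaled.round(3))
```

Because the divisor is the square root of the standard deviation rather than the standard deviation itself, abundant proteins are down-weighted without all proteins being forced to unit variance, which is the compromise the text describes.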
Principal Component Analysis and Subject Discrimination:
Unsupervised principal component analysis (PCA) was performed
using singular value decomposition on the batch-corrected dataset (n
= 41 samples; 269 proteins). PCA was used solely for dimensionality
reduction and visualization, without incorporating class labels.
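A minimal sketch of PCA by singular value decomposition is given below. It assumes a preprocessed matrix with samples in rows and proteins in columns; the function name and the random toy matrix (shaped 41 × 269 only to mirror the dataset dimensions) are illustrative and are not the authors' pipeline or data.

```python
import numpy as np

def pca_svd(data, n_components=2):
    """Unsupervised PCA via SVD of the column-centered data matrix."""
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    # centered = U @ diag(s) @ Vt; the rows of Vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = vt[:n_components].T                    # feature contributions
    explained = (s ** 2) / np.sum(s ** 2)             # variance fraction per PC
    return scores, loadings, explained[:n_components]

# Random placeholder data with the same shape as the study matrix.
rng = np.random.default_rng(0)
toy = rng.normal(size=(41, 269))
scores, loadings, explained = pca_svd(toy)
print(scores.shape, loadings.shape, explained.round(3))
```

No class labels enter the computation at any point, which is what makes the resulting clustering unsupervised.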
In the full proteome dataset, the first two principal components explained 20.1% (PC1) and 15.0% (PC2) of the total variance, respectively [Figure 3A]. The three samples from the same perpetrator (CH14, CH14.1, CH14.2) clustered closely together in this reduced-dimensional space and were separated from the majority of other subjects. Euclidean distances in PC1–PC2 space were calculated relative to the centroid of the perpetrator samples. Perpetrator-derived samples showed substantially smaller distances (0.0481 ± 0.0327) than non-perpetrator samples (0.524 ± 0.234; Mann–Whitney U = 0, p = 1.88 × 10⁻⁴), supporting statistically tighter clustering of perpetrator samples. The closest non-perpetrator sample (CH12.1; distance = 0.137) remained separated from the perpetrator cluster. Complete Euclidean distance values for all samples in both PCA models are provided in Supplementary Table 3. Although PC1 and PC2 together accounted for only 35.1% of total variance, clustering was observed without supervised modeling, indicating that subject-level variance contributed measurably to the dominant components.
Identification of Discriminatory Proteins:
To identify proteins contributing most strongly to inter-subject variability, PCA loadings from the full dataset were examined (Supplementary Table 2). The top 10 proteins with the highest absolute loading values were selected as candidate discriminatory markers (AMY1A, GC, IGHA2, IGKC, JCHAIN, KRT13, KRT14, MUC5B, SPRR3, ZG16B).
In the reduced 10-protein PCA model [Figure 3C], PC1 and PC2 explained 27.6% and 20.2% of the total variance, respectively (47.8% cumulative). The increase in explained variance after feature reduction suggests that the selected proteins capture a greater proportion of subject-associated variability than the full proteome dataset. The three perpetrator-derived samples (CH14, CH14.1, CH14.2) formed a compact cluster in PC space. To quantify this separation, Euclidean distances were calculated in PC1–PC2 space relative to the centroid of the perpetrator samples. Perpetrator-derived samples exhibited significantly smaller distances to their centroid (0.0633 ± 0.0105) compared with all non-perpetrator samples (0.6269 ± 0.1616; Mann–Whitney U = 0, p = 1.88 × 10⁻⁴). The closest non-perpetrator sample (CH6.1; distance = 0.248) remained substantially separated from the perpetrator cluster. These results quantitatively confirm the enhanced discriminatory resolution observed visually in [Figure 3C].
Collectively, these findings demonstrate that salivary proteomic profiling retains subject-specific structure capable of discriminating a crime scene sample from unrelated individuals, even after correction for batch effects and dimensionality reduction. The observed clustering and quantitative separation in PCA space suggest that a focused subset of discriminatory proteins enhances resolution beyond that achieved with the full proteome. To contextualize the forensic utility of this approach, we next compared proteomic-based discrimination with conventional STR DNA genotyping performed on the same cohort.
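The centroid-based distance calculation and the reported Mann–Whitney result can be illustrated with a short sketch. The coordinates below are hypothetical placeholders rather than values from Supplementary Table 3, and the function names are illustrative. The p-value calculation assumes an exact two-sided test with no ties and group sizes of 3 perpetrator and 38 non-perpetrator samples, which is inferred from the 41-sample total rather than stated explicitly in the text.

```python
from math import comb, dist

def centroid_distances(points, reference_ids):
    """Euclidean distance of every sample to the centroid of the reference samples."""
    reference = [points[name] for name in reference_ids]
    centroid = tuple(sum(axis) / len(reference) for axis in zip(*reference))
    return {name: dist(xy, centroid) for name, xy in points.items()}

def exact_p_when_u_is_zero(n_small, n_total):
    """Exact two-sided Mann-Whitney p-value for complete separation (U = 0).

    Under the null hypothesis all C(n_total, n_small) rank assignments are
    equally likely, and only one of them places every member of the small
    group below the other group; the two-sided test doubles that probability.
    """
    return 2 / comb(n_total, n_small)

# Hypothetical PC1/PC2 coordinates (P = perpetrator replicate, S = other subject).
toy_points = {
    "P1": (0.00, 0.00), "P2": (0.06, 0.00), "P3": (0.00, 0.06),
    "S1": (0.50, 0.40), "S2": (-0.30, 0.60),
}
distances = centroid_distances(toy_points, ["P1", "P2", "P3"])
p_value = exact_p_when_u_is_zero(3, 41)

print({name: round(d, 3) for name, d in distances.items()})
print(f"exact two-sided p for U = 0 with 3 vs 38 samples: {p_value:.2e}")
```

Under these assumptions the exact p-value is 2/10,660, which matches the reported 1.88 × 10⁻⁴. This is also the smallest two-sided p-value obtainable with three perpetrator samples, so both PCA models necessarily report the same value whenever separation is complete.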
DNA STR Genotyping and PCA of Genotypic Frequencies
STR Profile Quality and Interpretation:
All saliva samples yielded single-source, full STR profiles using
the PowerPlex® Fusion 6C system. Stutter peaks were observed at
expected positions (N ± 4 repeat units) and were identified according
to established analytical criteria. Complete 1/RMP values calculated
for all four U.S. populations and for the single combined population
are provided in Supplementary Tables 5A and 5B, respectively. Several
samples displayed minor artifacts consistent with pull-up peaks.
Sample A3 showed a 102 RFU pull-up at locus D12S391 associated
with a strong FGA allele (>1800 RFU). Sample A13 displayed three
pull-up peaks at Amelogenin, D3S1358, and SE33, attributable to
strong neighboring loci. These artifacts were below true allele peak intensities and did not interfere with genotype interpretation. Sample A1 exhibited a potential tri-allelic pattern between loci D12S391 and D19S433, observed as an off-ladder allele. Re-amplification reproduced the tri-allelic signal (243 RFU) with expected stutter patterning. GeneMapper® ID-X flagged locus D12S391; however, the tri-allelic pattern did not alter single-source profile interpretation. To our knowledge, tri-allelic patterns at this locus are rare and not commonly reported in public databases. All profiles were suitable for statistical evaluation.
Figure 3: Identification of discriminatory salivary proteins and subject separation in reduced-dimensional space. Panel A. Principal component analysis (PCA) of all 41 salivary samples using the full set of 269 shared proteins after batch correction and preprocessing (ln[x+1] transformation and Pareto scaling). Samples corresponding to unrelated individuals (“suspects”) are shown in blue, and perpetrator-derived samples (CH14, CH14.1, CH14.2) are shown in red. Ellipses represent 95% confidence intervals. PC1 and PC2 explained 20.1% and 15.0% of the total variance, respectively. Panel B. PCA loadings from the full dataset were examined to identify proteins contributing most strongly to inter-subject variability. The top 10 proteins with the highest absolute loading values were selected as candidate discriminatory markers. Panel C. PCA of salivary samples using the 10 selected proteins (AMY1A, GC, IGHA2, IGKC, JCHAIN, KRT13, KRT14, MUC5B, SPRR3, ZG16B). Reduction to this subset increased explained variance (PC1 = 27.6%, PC2 = 20.2%) and resulted in tighter clustering of perpetrator-derived samples relative to unrelated individuals. Ellipses represent 95% confidence intervals.
PCA of Genotypic Frequencies (All 23 Loci):
To explore variance structure among DNA profiles, a two-dimensional principal component analysis (PCA) was performed using locus-specific genotypic frequencies calculated under Hardy–Weinberg or Balding–Nichols expectations (θ = 0.01), based on NIST allele frequency data (Hill et al., 2013). One-population frequency estimates were used to maintain uniform scaling across samples for PCA visualization.
Using all 23 loci [Figure 4A], PC1 and PC2 explained 18.0% and 16.5% of total variance, respectively (34.5% cumulative). PCA revealed substantial overlap among samples, reflecting the relatively small variance in genotypic frequencies across individuals. For example, samples A14 and A16 (perpetrator-derived samples) overlapped completely in PC space, consistent with identical STR profiles. However, additional samples (e.g., A7) occupied proximal positions in PCA space despite having distinct genotypes. Additional sample pairs also exhibited limited spatial resolution in reduced-dimensional space (A1–A2, A3–A11, A5–A10) [Figure 4A]. The limited dispersion observed in PCA likely reflects the constrained numerical range of genotypic frequencies derived from a finite population database (n = 1,036 individuals), resulting in relatively small between-profile variance (average variance = 0.003686).
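For orientation, locus-specific genotype frequencies under the two models named above can be sketched as follows. The sketch uses the Balding–Nichols conditional genotype formulas in the form given in NRC II (equation 4.10), which reduce to the Hardy–Weinberg expectations p² and 2pq when θ = 0. The allele frequencies are hypothetical rather than NIST values, and the text does not specify which expression was applied to which genotype class, so this is an illustration of the models rather than a reconstruction of the authors' calculation.

```python
def genotype_frequency(p, q=None, theta=0.0):
    """Genotype frequency with a coancestry correction theta.

    p, q  : allele frequencies; q=None denotes a homozygote for the allele
            with frequency p.
    theta : coancestry coefficient; theta = 0 gives Hardy-Weinberg values.
    """
    denominator = (1 + theta) * (1 + 2 * theta)
    if q is None:
        # Homozygote: [2θ + (1-θ)p][3θ + (1-θ)p] / [(1+θ)(1+2θ)]
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denominator
    # Heterozygote: 2[θ + (1-θ)p][θ + (1-θ)q] / [(1+θ)(1+2θ)]
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denominator

# Hypothetical allele frequencies for a single locus.
p_allele, q_allele = 0.10, 0.20

hwe_hom = genotype_frequency(p_allele)                          # p^2
hwe_het = genotype_frequency(p_allele, q_allele)                # 2pq
bn_hom = genotype_frequency(p_allele, theta=0.01)
bn_het = genotype_frequency(p_allele, q_allele, theta=0.01)

print(f"homozygote:   HWE {hwe_hom:.4f}  theta=0.01 {bn_hom:.4f}")
print(f"heterozygote: HWE {hwe_het:.4f}  theta=0.01 {bn_het:.4f}")
```

With these formulas the θ correction raises the frequency assigned to a genotype, most noticeably for homozygotes of rare alleles. Either way the values remain confined to a narrow numerical range, consistent with the limited dispersion seen in the PCA.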
PCA After Locus Reduction:
To evaluate whether a subset of loci could improve separation in PCA space, loci contributing minimally to principal component loadings were removed. Absolute loading values for principal components up to PC7, which together accounted for 83.3% of cumulative variance, were examined to identify loci contributing minimally to variance structure (Supplementary Table 4). Six loci (D10S1248, D16S539, D2S1338, vWA, D5S818, SE33) were excluded, leaving 17 loci for reanalysis [Figure 4B].
The reduced-locus PCA demonstrated modest improvement in cluster separation; however, overlap among unrelated individuals remained, indicating that PCA of genotypic frequencies provides limited discriminatory resolution compared with conventional STR matching criteria.
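One way to rank features by their loading contributions is sketched below. The ranking rule used here (the largest absolute loading of each feature across the first k components) is an assumption made for illustration; the text does not state the exact criterion applied, and the locus names and loading values are hypothetical.

```python
import numpy as np

def lowest_loading_features(loadings, names, n_pcs, n_drop):
    """Return the n_drop features with the smallest peak absolute loading.

    loadings : array of shape (n_features, n_components)
    names    : feature labels in the same row order
    n_pcs    : number of leading components to inspect
    """
    magnitude = np.abs(np.asarray(loadings, dtype=float)[:, :n_pcs])
    peak = magnitude.max(axis=1)              # strongest contribution per feature
    order = np.argsort(peak, kind="stable")   # ascending: weakest first
    return [names[i] for i in order[:n_drop]]

# Hypothetical loadings for four loci on two components.
toy_names = ["locus_A", "locus_B", "locus_C", "locus_D"]
toy_loadings = [
    [0.70,  0.10],
    [0.05, -0.08],
    [-0.20, 0.65],
    [0.02,  0.03],
]
dropped = lowest_loading_features(toy_loadings, toy_names, n_pcs=2, n_drop=2)
print(dropped)
```

A ranking of this kind describes only how much a locus contributes to variance in this particular PCA; it says nothing about the locus's value in match probability calculations.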
Random Match Probability (RMP):
For forensic comparison, locus-specific genotypic frequencies
were multiplied across all loci to calculate Random Match Probabilities
(RMP) in accordance with NRC II guidelines. Because donor ethnicity
was unknown, 1/RMP values were calculated using allele frequencies
from the four major U.S. populations. Using this approach, only one
exact match was observed: the perpetrator’s samples A14 and A16
showed identical alleles at all loci, confirming their genetic identity.
No unrelated sample produced a matching STR profile. Unlike
proteomic profiling, which demonstrated measurable clustering in
multivariate space under exploratory dimensionality reduction, STR
identification relies on deterministic locus-by-locus comparison and
population-based probability calculations, yielding unambiguous
profile matching when full allelic concordance is present. The greater
dispersion observed in proteomic PCA likely reflects continuous
quantitative variability in protein abundance across individuals,
whereas STR genotypic frequencies are constrained by discrete allele
categories and bounded population-frequency distributions.
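The two operations described in this section, locus-by-locus concordance and multiplication of genotype frequencies across loci, can be sketched as follows. The profiles and frequencies are hypothetical, only four loci are shown for brevity, and the function names are illustrative; independence between loci is assumed, as in the product rule.

```python
from math import prod

def random_match_probability(genotype_frequencies):
    """Product rule: multiply locus-specific genotype frequencies across loci."""
    return prod(genotype_frequencies)

def profiles_match(profile_a, profile_b):
    """Full allelic concordance: same loci typed and same alleles at every locus."""
    if profile_a.keys() != profile_b.keys():
        return False
    return all(sorted(profile_a[locus]) == sorted(profile_b[locus]) for locus in profile_a)

# Hypothetical genotype frequencies for four loci.
toy_frequencies = [0.05, 0.10, 0.02, 0.08]
rmp = random_match_probability(toy_frequencies)
inverse_rmp = 1 / rmp

# Hypothetical two-locus profiles (allele order within a locus is irrelevant).
sample_x = {"locus_1": (15, 16), "locus_2": (21, 24)}
sample_y = {"locus_1": (16, 15), "locus_2": (21, 24)}
sample_z = {"locus_1": (15, 17), "locus_2": (21, 24)}

print(f"RMP = {rmp:.1e}, 1/RMP = {inverse_rmp:,.0f}")
print(profiles_match(sample_x, sample_y), profiles_match(sample_x, sample_z))
```

Even with only four loci the product falls below one in 100,000, which shows why multiplying across 23 loci yields far more discrimination than is visible from the variance of the individual frequencies in PCA space.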
Figure 4: Principal component analysis (PCA) of STR genotypic frequencies. Panel A: PCA of subjects using genotypic frequencies from all 23 STR loci. PC1 and PC2 explained 18.0% and 16.5% of total variance, respectively. Several sample pairs exhibited limited separation in reduced-dimensional space, including A1–A2, A3–A11, A5–A10, and the cluster A7–A14–A16 (circled in red). Notably, perpetrator-derived samples A14 and A16 overlapped completely, consistent with identical STR profiles. PCA was used for exploratory visualization only and does not substitute for locus-by-locus comparison or Random Match Probability (RMP) calculations. Panel B: PCA loadings plot showing the contribution of individual STR loci to variance in principal component space. Loci with minimal loading contributions (D10S1248, D16S539, D2S1338, vWA, D5S818, SE33) are circled in red. These loci were excluded for exploratory reanalysis in panel C. Low loading contributions in the PCA space do not imply reduced forensic discriminatory power. Panel C: PCA of subjects using the reduced set of 17 STR loci selected based on loading magnitude. Modest improvement in sample dispersion was observed; however, overlap among unrelated individuals remained. Perpetrator-derived samples A14 and A16 (circled in red) remained superimposed, reflecting identical allele profiles across loci.
Discussion
Comparative Forensic Utility of Proteomics and STR Typing:
The present study highlights fundamental methodological
differences between salivary proteomic profiling and conventional
STR DNA genotyping. STR typing operates through deterministic
allele concordance across defined loci, with statistical weight expressed
as a population-based Random Match Probability (RMP). When full
allelic concordance is observed, as in samples A14 and A16, identity
is established within the statistical framework of population genetics
through RMP calculations. In contrast, proteomic profiling relies on
multivariate quantitative variation in protein abundance, capturing
subject-specific biological structure in reduced-dimensional space.
The PCA and Euclidean distance analyses demonstrated that salivary
protein signatures retain measurable discriminatory structure even
after correction for batch effects, with perpetrator-derived samples
forming statistically tighter clusters than unrelated individuals.
The reduced PCA was intended to visualize variance structure after
feature reduction and does not constitute independent validation of
discriminatory performance, as the same dataset was used for feature
selection and visualization.
Importantly, these approaches address different aspects of forensic
identification. STR typing provides categorical identity confirmation
but is limited in its ability to resolve degraded samples, low-template
DNA, or complex mixtures, and does not convey phenotypic or
physiological information. Proteomic profiling, while inherently
multivariate and exploratory in its current implementation, captures
biologically informative variation that may complement DNA
typing, particularly when DNA quantity or quality is compromised.
The enhanced separation observed with a reduced discriminatory
protein panel further suggests that targeted proteomic markers may
improve resolution. The loci removed in the reduced PCA model
were selected solely on the basis of low loading contributions within
the exploratory principal component framework of this dataset.
Importantly, low PCA loading does not imply reduced forensic
discriminatory power. Several of the excluded loci (e.g., SE33,
D2S1338, vWA) are highly polymorphic and contribute substantially
to match probability calculations in standard STR analysis. Their
limited contribution in the present PCA likely reflects cohort-specific
genotype distribution and the constrained variance structure of
population-based frequency values rather than the intrinsic weakness
of the markers. Thus, rather than serving as a replacement for STR
genotyping, salivary proteomics may function as a complementary
evidentiary layer, providing additional discriminatory structure or
contextual biological information under conditions where traditional
DNA-based approaches encounter analytical challenges.
Practical and Analytical Considerations:
The PowerPlex® Fusion system has been validated to generate
full STR profiles from as little as 0.125 ng of template DNA [37],
underscoring the high sensitivity of modern STR typing. In contrast,
proteomic profiling required milligram-scale total protein input for
LC–MS/MS analysis. Both workflows required approximately one
working day from extraction to analytical output; however, DNA
typing relies on multiple proprietary kits, locus-specific primers,
fluorescent dyes, and capillary electrophoresis instrumentation.
Proteomic workflows depend primarily on LC–MS/MS
instrumentation and downstream computational analysis. Although
LC–MS/MS instrumentation represents a substantial capital
investment, it enables high-dimensional biological characterization
beyond identity testing alone.
When evaluating analytical robustness, STR typing benefits
from decades of developmental validation, population database
construction, and standardized statistical interpretation. In the
present study, full allelic concordance between A14 and A16 provided
categorical confirmation of identity. Proteomic profiling, while able
to cluster perpetrator-derived samples distinctly from unrelated
individuals, remains inherently multivariate and probabilistic in its
current implementation.
Temporal stability further distinguishes the two approaches. STR genotypes remained identical across time points, as expected for germline DNA markers. Notably, despite sampling intervals spanning approximately 2 years, perpetrator-derived proteomic profiles remained significantly closer to one another than to any unrelated individual in PCA space, as quantified by centroid-based Euclidean distance analysis, indicating that intra-individual similarity exceeded inter-individual variability under the present analytical framework. Such small variability may reflect biological influences, including age, environmental exposure, and physiological state. Nevertheless, even after correction for batch effects and reduction to discriminatory protein subsets, perpetrator-derived samples maintained statistically significant clustering relative to unrelated individuals.
Principal component analysis revealed that proteomic data accounted for a greater proportion of variance in the dominant components (PC1 = 27.6%, PC2 = 20.2%) than STR genotypic frequency PCA (PC1 = 18.0%, PC2 = 16.5%). This difference likely reflects the continuous quantitative nature of protein abundance compared with the constrained frequency range of allele-based genotypes. However, STR discriminatory power arises not from multivariate dispersion but from the multiplicative combination of locus-specific genotype probabilities across loci. Consequently, reduced separation in PCA space does not imply reduced forensic discrimination for STR typing, as evidentiary weight is derived from locus-by-locus probability calculations rather than dimensional variance structure.
For forensic proteomics to achieve courtroom viability, statistical frameworks analogous to those used in DNA typing will be required. The identification of genetically variable peptides and construction of population frequency databases may allow calculation of likelihood ratios or RMP-like statistics, placing proteomic evidence on a comparable statistical footing with STR analysis.
Limitations
Several limitations of the present study should be acknowledged.
First, the cohort size was modest (n = 41 for proteomics; n = 16 for
DNA comparison), limiting the generalizability of the findings and
precluding robust population-level statistical modeling of proteomic
variability. While clear clustering was observed for perpetrator-derived
samples, larger, more diverse cohorts will be necessary to
assess false-positive rates, inter-individual overlap, and classification
stability.
Second, proteomic profiling was evaluated using principal
component analysis and Euclidean distance metrics, which are
exploratory and visualization-oriented techniques. Although
statistically significant separation was observed, PCA does not
constitute a predictive or classification model. Future studies should
incorporate supervised machine learning approaches with cross-validation
or external validation cohorts to quantify classification
accuracy, sensitivity, and specificity.
Third, salivary protein expression is influenced by biological variables including age, sex, circadian rhythm, diet, health status, and environmental exposures. Although batch correction (EigenMS) was applied to mitigate technical variation, biological variability across time points was observed in longitudinal samples. This temporal variability underscores the need to characterize intra-individual stability over extended intervals before proteomic profiling can be considered a deterministic identification method.
Fourth, protein abundance was estimated using spectral counting, which provides semi-quantitative measurements. While sufficient for comparative profiling, more precise quantification methods (e.g., MS1 intensity-based quantification or targeted proteomics) may improve reproducibility and discriminatory resolution.
Fifth, STR PCA analyses were conducted using one-population allele frequency estimates to standardize visualization. While appropriate for exploratory multivariate comparison, this simplification reduces population-specific accuracy and does not reflect standard forensic reporting practices. Importantly, identity conclusions were based solely on full allele concordance and RMP calculations.
Finally, proteomic profiling currently lacks established population databases and widely accepted statistical frameworks analogous to Random Match Probability or likelihood ratios used in forensic DNA typing. The development of frequency databases for genetically variable peptides and standardized interpretive guidelines will be essential before proteomic evidence can be considered for courtroom application.
Concluding remarks:
This study demonstrates that salivary proteomic profiling
captures subject-specific biological structure that can discriminate a
crime scene sample from unrelated individuals in multivariate space.
After correction for batch effects and dimensionality reduction,
perpetrator-derived samples formed statistically tighter clusters
than non-perpetrator samples, and quantitative Euclidean distance
analysis confirmed significant separation in principal component
space. These findings indicate that salivary protein abundance
patterns retain measurable subject-associated structure under
exploratory multivariate analysis. In contrast, conventional STR
DNA typing provided categorical identity confirmation through full
allelic concordance and population-based Random Match Probability
calculations. As expected, STR profiles remained temporally stable
and yielded unambiguous matching when alleles were identical
across loci.
The comparative results highlight fundamental differences between the two approaches: STR genotyping is deterministic and population-statistically validated, whereas proteomic profiling is continuous, biologically dynamic, and currently exploratory.
However, proteomic analysis offers potential advantages in contexts
where DNA quantity or quality is compromised and may provide
additional contextual biological information not accessible through
noncoding STR markers.
Further development of standardized workflows, larger population datasets, quantitative validation studies, and statistical interpretive frameworks will be necessary before forensic proteomics can approach the evidentiary maturity of STR DNA typing. Nevertheless, the present findings support the concept that salivary proteomics may serve as a complementary forensic tool, augmenting traditional DNA-based identification rather than replacing it.
Acknowledgments
This study was supported by discretionary funds (C.G.).
Conflict of Interest:
No potential financial and non-financial competing interests that could directly or indirectly undermine the
objectivity, integrity, and value of this publication through a possible
influence on the judgments and actions of authors regarding objective
data presentation, analysis, and interpretation were found. C.G. is an
Editorial Board Member of Scientific Reports (Nature Publishing
Company). She received compensation as Field Chief Editor for
Frontiers in Molecular Biosciences and honoraria from participating
in NIH peer review meetings. C.H. and H.D. have no conflict of
interest to report.
Highlights:
· Salivary proteomes retain subject-specific multivariate structure.
· STR typing provides deterministic identity via allele concordance.
· PCA-based proteomics shows intra-individual similarity over time.
· Proteomics may complement DNA in low-template conditions.
· Comparative analysis clarifies the strengths and limitations of both methods.
References
6. Butler JM (2010) Fundamentals of Forensic DNA Typing. 2010:1-500.
Citation
Doan H, Hogan C, Viray J, Giulivi C. Comparative Evaluation of Salivary Proteomic Profiling and STR DNA Typing for Forensic Subject Discrimination. J Forensic Investigation. 2026; 13(1): 1.




