Optimized R2 retroelement complexes for DNA insertion into plant genomes – Nature Biotechnology

Plasmid construction

Coding sequences for four R2 non-LTR retrotransposases, R2Tg (T. guttata), R2Bm (B. mori), R2Tg-opt (rationally engineered T. guttata) and R2Za (Z. albicollis), were codon-optimized for N. benthamiana and A. thaliana using GenSmart Codon Optimization (GenScript) and synthesized as double-stranded DNA fragments (Twist Bioscience). Each open reading frame contained an N-terminal BP SV40 NLS (KRTADGSEFESPKKKRKV). Three additional versions of the R2Tg protein were generated: (1) R2Tg-intronic, which contains two A. thaliana introns; (2) R2Tg RT-dead, carrying D660A and D661A substitutions in the RT domain; and (3) R2Tg EN-dead, carrying D1057A and D1070A substitutions in the endonuclease domain.

The retrotransposition reporter followed a GFP–intron design. A CaMV 35S promoter and an A. thaliana heat-shock protein (HSP) 18.2 terminator flank a reverse-oriented nested cassette that contains the CsVMV promoter, mCherry open reading frame interrupted by either the S. tuberosum ST-LS1 intron or an A. thaliana intron and a CaMV t35S terminator. UTRs tested in the retrotransposition reporters are shown in Supplementary Table 1. Plasmids carrying R2 protein expression cassettes and donor DNA templates were combined with loop assembly using BsaI-mediated and SapI-mediated Golden Gate reactions (New England Biolabs, R0569S and E1601L, respectively) and cloned into pCAMBIA backbones⁶². All plasmids were propagated in NEB Turbo Escherichia coli competent cells (New England Biolabs, C2984I) and sequence-verified by Plasmidsaurus. A GV replicon based on the Bean yellow dwarf virus was used to transiently overexpress either the R2 proteins or the complete R2 editing system. The constructs generated in this work are listed in Supplementary Table 2.

Plant growth and transformation

N. benthamiana and A. thaliana (Col-0) plants were grown in a Conviron controlled-environment chamber set to 22 °C during the day and 20 °C during the night, with a 16-h light, 8-h dark photoperiod and 55% relative humidity. N. benthamiana plants (4 weeks old) were used for leaf infiltration in transient experiments. Leaves from 4-week-old A. thaliana plants were harvested for protoplast isolation.

A. tumefaciens strain GV3101 (GOLDBIO, CC-125) carrying the binary R2 construct, the p19 silencing suppressor plasmid and the pSOUP helper plasmid was cultured overnight at 28 °C in 2×YT medium (Invitrogen, 12-780-052) supplemented with the appropriate antibiotics. Cells were collected and resuspended in induction medium (10 mM MES pH 5.5, 10 mM MgCl₂ and 200 µM acetosyringone (PhytoTech Labs, A104-5G) in DMSO) at OD₆₀₀ = 0.4. Suspensions were shaken at 70 rpm for 4 h at room temperature. The Agrobacterium suspension was infiltrated into the abaxial surface of fully expanded leaves using a 1-ml needleless syringe until the tissue was saturated. Three leaves per plant were treated. Plants were returned to the growth chamber immediately and sampled 3–7 days after infiltration, depending on the experiment.

Transient transformation of S. lycopersicum (cv. Micro-Tom)

Transient transformation was performed using a protocol adapted from the VAST (vacuum and sonication-assisted transient expression) method. Seeds of S. lycopersicum (cv. Micro-Tom) were surface-sterilized with 33% (v/v) bleach containing 0.1% Tween-20 for 6 min, rinsed 4–5 times with sterile water and germinated on ½ MS agar at 22 °C under a 16-h light, 8-h dark photoperiod until cotyledons were fully expanded (~9 days). A. tumefaciens strain GV3101 (GOLDBIO, CC-125) carrying the binary R2 construct, p19 silencing suppressor and pSOUP helper plasmid was cultured in 2×YT medium with antibiotics at 28 °C, 200 rpm, 2 days before cocultivation. Cultures were pelleted (8,000g, 5 min), resuspended in AB-MES induction medium (200 µM acetosyringone, no antibiotics) to OD₆₀₀ = 0.3 and incubated overnight at 28 °C, 200 rpm. On the day of cocultivation, cultures were pelleted again and resuspended in coculture medium (1:1 AB-MES and ½ MS supplemented with 200 µM acetosyringone) and adjusted to OD₆₀₀ = 0.3.

For infection, 7–8 seedlings were immersed in 8 ml of Agrobacterium suspension in 12-ml tubes, sonicated for 30 s (Branson CPX-952-116R) and subjected to three cycles of vacuum infiltration (5 min of vacuum followed by rapid release per cycle), with gentle mixing between cycles. Seedlings were then transferred to six-well plates containing 4 ml of fresh Agrobacterium suspension per well and cocultivated for 2 days at 22 °C under a 16-h light, 8-h dark cycle. Following cocultivation, seedlings were washed three times with sterile water and transferred to ½ MS agar plates containing 100 µM timentin to suppress bacterial growth. Plants were maintained under standard growth conditions and transient expression was assessed at 3 days after infection by confocal microscopy and ddPCR.

Confocal microscopy image collection and signal quantification

N. benthamiana leaves infiltrated with Agrobacterium were sampled for confocal imaging 5 days after infiltration, except for longitudinal signal-tracking experiments, which were imaged on days 5, 7 and 10. A ~5-mm-diameter hole puncher (Fisher Scientific, NC0769832) was used to collect leaf disks adjacent to the infiltration zone. The tissue was mounted in water between a glass slide and a no. 1 coverslip. Tomato seedlings were imaged 3 days after VAST transformation. Disks of comparable size were taken from the first true leaves and mounted using the same approach. Imaging was performed on a Leica STELLARIS 8 FALCON laser-scanning microscope (HC PL APO ×20 water-immersion objective, numerical aperture = 0.75) housed in the Biological Imaging Facility at Caltech. Fluorescent proteins were excited at 434 nm (mTagBFP2), 517 nm (mGold2t) and 587 nm (mCherry). All images were acquired using fixed acquisition parameters within each experiment to enable quantitative comparison.

For each Agrobacterium strain, three or more independent biological replicates (individual plants) were infiltrated, with up to three leaf regions infiltrated per plant. During imaging, more than eight technical replicates (nonoverlapping confocal fields of view) were collected per plant across the infiltrated regions, using identical acquisition settings for all samples within an experiment.

For mCherry quantification in Figs. 2f, 3e and 4c, the total number of cells in each field of view was estimated from cell outlines visible in the corresponding brightfield images. This approach was required because the two-plasmid design lacked a transformation reporter for comparison. The percentage of mCherry⁺ nuclei was calculated as the number of mCherry⁺ nuclei divided by the total number of visible cells in each field of view, multiplied by 100.

For mCherry quantification in Supplementary Figs. 6c and 8b, the total number of transformed cells in each field of view was determined by counting cells expressing mGold2t (one-plasmid design) or coexpressing mGold2t and mTagBFP2 (two-plasmid design). The number of nuclei expressing the mCherry payload was then quantified from the same field of view. The percentage of mCherry⁺ nuclei was calculated as the number of mCherry⁺ nuclei divided by the total number of transformed cells, multiplied by 100.

For tracking the mCherry signal at 5, 7 and 10 days after infiltration, images were analyzed in Fiji (ImageJ). In each field of view, every mCherry⁺ nucleus was annotated as a region of interest (ROI) using an identical, fixed-size oval selection. A background ROI of the same size was placed in a region lacking fluorescence. Total mCherry signal per field was calculated by background-correcting the raw integrated density of each nuclear ROI and summing the corrected values. For each biological replicate, total mCherry intensity was quantified from at least eight fields of view and the mean value from these technical replicates is reported.
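
The per-field calculation described above can be illustrated with a minimal Python sketch. It assumes that the raw integrated densities have already been exported from Fiji; the function names and the numerical values are hypothetical and are not taken from the authors' analysis.

```python
from statistics import mean


def field_total_signal(nuclear_raw_int_den, background_raw_int_den):
    """Sum the background-corrected integrated densities of one field of view.

    Every nuclear ROI has the same fixed size as the background ROI, so the
    background value is subtracted directly from each nuclear measurement.
    """
    return sum(value - background_raw_int_den for value in nuclear_raw_int_den)


def replicate_mean_signal(fields):
    """Average the per-field totals (technical replicates) of one plant."""
    totals = [field_total_signal(nuclei, background) for nuclei, background in fields]
    return mean(totals)


# Hypothetical input: (nuclear ROI raw integrated densities, background ROI value)
example_fields = [
    ([1200.0, 950.0, 1100.0], 100.0),
    ([800.0, 900.0], 150.0),
]
example_mean = replicate_mean_signal(example_fields)
```

The example uses two fields for brevity; the protocol above calls for at least eight fields of view per biological replicate.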

For mGold2t tracking, images were analyzed in Fiji (ImageJ). For each field of view, the total mGold2t signal was quantified as the raw integrated density of the entire mGold2t channel image. A mock field of view from uninfiltrated leaves, acquired using identical imaging settings, served as the background reference; its raw integrated density was measured and used for background correction. Background-corrected mGold2t intensities were then averaged across at least eight fields of view per biological replicate.

Heat-shock treatment

A. tumefaciens-infiltrated plants were first maintained at 25 °C for 48 h under a 16-h light, 8-h dark cycle, allowing sufficient time for T-DNA delivery and for the accumulation of the R2 protein and RNA payload before heat-shock induction. Plants were then exposed to a heat-shock regimen of 25 °C during the light period and 37 °C during the dark period for an additional 4 days before analysis. Regular and heat-shock treatments were performed in parallel using the same infiltration batch (same plasmid prep, same Agrobacterium OD₆₀₀, etc.) and same-aged plants, with each individual plant considered a single biological replicate; outcomes were assayed at matched time points.

Protein expression and detection

R2Tg, R2Bm, R2Tg-opt and R2Za coding sequences were fused to an N-terminal YPet tag for detection and confirmation of nuclear localization by confocal microscopy. Complementarily, R2 proteins were fused to an N-terminal HiBiT peptide tag for higher throughput luminescence detection using the Nano-Glo HiBiT lytic detection system. Then, 3 days after infiltration, three infiltrated N. benthamiana leaves were isolated, flash-frozen in liquid nitrogen and ground to a fine powder before extraction in lysis buffer (20 mM Tris-HCl pH 7.4 (Thermo Fisher Scientific, 15567027), 25% glycerol (Sigma-Aldrich, G5516), 20 mM KCl (Sigma-Aldrich, P5405), 2 mM EDTA (Sigma-Aldrich, E9884), 2.5 mM MgCl₂ (Sigma-Aldrich, M2393), 250 mM sucrose (Thermo Fisher Scientific, J65148.A1) and 0.1% PMSF (Thermo Fisher Scientific, 36978)). After centrifugation at 1,500g for 10 min at 4 °C to pellet nuclei, the pellet was washed up to five times with wash buffer (20 mM Tris-HCl pH 7.4 (Thermo Fisher Scientific, 15567027), 25% glycerol (Sigma-Aldrich, G5516), 2.5 mM MgCl₂ (Sigma-Aldrich, M2393) and 0.2% Triton X-100 (Sigma-Aldrich, S5886)) and lysed in high-salt buffer (20 mM HEPES–KOH pH 7.9 (Sigma-Aldrich, H3375), 2.5 mM MgCl₂ (Sigma-Aldrich, M2393), 100 mM NaCl (Sigma-Aldrich, S5886), 20% glycerol (Sigma-Aldrich, G5516), 0.2 mM EDTA (Sigma-Aldrich, E9884), 0.5 mM DTT (GOLDBIO, DTT) and one cOmplete Mini EDTA-free protease inhibitor tablet (Sigma-Aldrich, 11836170001) per 10 ml). For HiBiT detection, 30 µl of nuclear lysate was combined with 30 µl of Nano-Glo HiBiT lytic reagent (Promega, N3030) containing LgBiT and furimazine; luminescence was recorded on a Tecan Spark plate reader with a 1,000-ms integration time.

Isolation and transfection of N. benthamiana and A. thaliana protoplasts

Mesophyll protoplasts were isolated from fully expanded leaves of 4-week-old N. benthamiana or A. thaliana (Col-0) following Yoo et al.⁶³ with minor modifications. On the day of the experiment, a fresh enzyme solution (20 ml) containing 20 mM MES pH 5.7 (Sigma-Aldrich, M2933), 0.4 M mannitol (Millipore Sigma, 443907), 20 mM KCl (Sigma-Aldrich, P5405), 1.5% (w/v) cellulase onozuka R-10 (Yakult Pharmaceutical) and 0.4% (w/v) macerozyme R-10 (Yakult Pharmaceutical) was prepared; the MES–mannitol–KCl base was preheated to 70 °C for 4 min before the enzymes were dissolved. After cooling, 10 mM CaCl₂ (Sigma-Aldrich, C7902) and 0.1% (w/v) BSA (Fisher Bioreagents, BP9703) were added and the solution was passed through a 0.45-µm syringe filter into a Petri dish. N. benthamiana leaf strips or A. thaliana full leaves were incubated in this solution without agitation for 3 h at room temperature in the dark, with gentle swirling once per hour. The digest was diluted 1:1 with W5 solution (154 mM NaCl (Sigma-Aldrich, S5886), 125 mM CaCl₂ (Sigma-Aldrich, C7902), 5 mM KCl (Sigma-Aldrich, P5405) and 2 mM MES pH 5.7 (Sigma-Aldrich, M2933)) and passed through a 70-µm nylon mesh (Fisherbrand, 22363548); the flowthrough was centrifuged at 200g for 2 min. The pellet was gently resuspended in 5 ml of W5 and chilled on ice for 30 min; then, the supernatant was removed and the protoplasts were adjusted to 5 × 10⁵ cells per ml in MMG solution (0.4 M mannitol (Millipore Sigma, 443907), 15 mM MgCl₂ (Sigma-Aldrich, M2393) and 4 mM MES pH 5.7 (Sigma-Aldrich, M2933)).

For PEG-mediated transfection, a 40% PEG4000 solution (Sigma-Aldrich, 81240) was freshly prepared (4 g of PEG4000, 3 ml of H₂O, 2.5 ml of 0.8 M mannitol (Millipore Sigma, 443907) and 1 ml of 1 M CaCl₂ (Sigma-Aldrich, C7902)). In round-bottom Eppendorf tubes, 20 µg of plasmid DNA (≤20 µl) was combined with 100 µl of protoplast suspension and mixed gently, before adding 120 µl of PEG solution. After 15 min at room temperature, 1 ml of W5 was added and the mixture was centrifuged (50g, 2 min); the wash was repeated twice. The pellet was resuspended in WI solution (0.5 M mannitol (Millipore Sigma, 443907), 20 mM KCl (Sigma-Aldrich, P5405) and 4 mM MES pH 5.7 (Sigma-Aldrich, M2933)), transferred to BSA-blocked well plates (0.1% BSA; Fisher Bioreagents, BP9703) and incubated at room temperature in the dark for 24–72 h. Transformed protoplasts were then used for confocal imaging and/or flow cytometry. In a modified protocol, protoplasts were extracted from 5-week-old N. benthamiana leaves 7 days after Agrobacterium infiltration and analyzed directly by flow cytometry.

Flow cytometry analysis

Protoplasts were analyzed on a CytoFLEX flow cytometer (Beckman Coulter) provided by the Caltech Flow Cytometry and Cell Sorting Facility. Live protoplast gates were defined using fluorescein diacetate staining (Fisher Scientific, F1303), detected in the FITC channel. BFP was excited with a 405-nm laser and detected in the 450/45-nm channel, while mCherry was excited with a 561-nm laser and detected in the 610/20-nm channel. Data were exported as FCS files and analyzed using FloReada for preliminary gating. Lastly, all data were quantified and visualized using a Python 3 pipeline.

Amplicon PCR and Sanger sequencing

gDNA was extracted from N. benthamiana leaf tissue infiltrated with R2 constructs using the DNeasy plant mini kit (Qiagen, 69106) according to the manufacturer’s protocol. For 3′-junction analysis, 25-µl PCRs were assembled with 1× Phusion high-fidelity master mix (New England Biolabs, M0531S), 0.5 µM each primer (two forward, two reverse) and 20 ng of gDNA. Cycling conditions were as follows: 98 °C for 30 s, followed by 35 cycles of 98 °C for 10 s, 60 °C for 20 s and 72 °C for 30 s, with a final extension at 72 °C for 2 min. The expected 450-bp amplicons were verified on a 1% agarose gel, purified with the QIAquick gel extraction kit (Qiagen, 28704) and submitted to Laragen for Sanger sequencing. Chromatograms were aligned to reference sequences in SnapGene version 6.2 to confirm precise junction formation.

NGS

The 450-bp 3′-junction PCR products (primer pair: two forward, two reverse) were gel-purified with the QIAquick gel extraction kit (Qiagen, 28704) and eluted in 35 µl of nuclease-free water. DNA concentration was measured on a NanoDrop 2000 spectrophotometer and each amplicon was adjusted to 50 ng µl⁻¹, sealed in Eppendorf microcentrifuge tubes and shipped overnight to the Massachusetts General Hospital Center for Computational and Integrative Biology DNA Core (Harvard) for Illumina sequencing. The core performed NGS and provided (1) de novo assembled contigs representing variants present at >1% abundance and (2) the complete demultiplexed FASTQ files.

In Galaxy, paired-end reads were processed with DADA2; adaptors were trimmed, reads were quality-filtered and overlapping pairs were merged. Merged reads were aligned to a custom reference comprising the payload integrated at the 25S rDNA locus using Bowtie2 version 2.5.0 to capture low-frequency variants (<1%). The subset of sequence variants mapping perfectly to the insertion junction was imported into Geneious Prime 2024.1 and aligned to the same reference; the multiple-sequence alignment was exported as FASTA (Supplementary Table 3). Alignment visualization was carried out in Google Colab with Biopython 1.81, pandas 2.2.2 and matplotlib 3.7.3; the resulting grid plot of base-by-base variation was exported as high-resolution PNG files for publication.

ddPCR for detecting the average copy number of insertion per genome

gDNA was extracted from 100 mg of N. benthamiana leaf tissue collected from the Agrobacterium infiltration site 7 days after infiltration using the DNeasy plant mini kit (Qiagen). gDNA was predigested with FastDigest KpnI (Thermo Fisher Scientific, FD0524) according to the manufacturer’s protocol. Duplex 22-µl ddPCR reactions were prepared by mixing 11 µl of ddPCR supermix for probes (no dUTP; Bio-Rad, 1863024), 900 nM forward and reverse primers for the target and reference loci, 250 nM FAM-labeled probe targeting the NbEF1α reference gene, 250 nM HEX-labeled probe targeting the integrated R2 junction and 50 ng of predigested gDNA. Oligonucleotides were synthesized by Integrated DNA Technologies and their sequences are listed in Supplementary Table 4. Reaction mixtures were loaded into a DG8 cartridge (Bio-Rad, 1864007) along with 70 µl of droplet generation oil (Bio-Rad, 1863005) and droplets were generated using a Bio-Rad QX200 droplet generator. Following droplet generation, 40 µl of emulsion was transferred to a 96-well plate, heat-sealed with pierceable foil and thermal-cycled under the manufacturer’s conditions with an annealing and extension temperature of 56 °C for the 3′ junction and 60 °C for the 5′ junction. Droplets were read using a Bio-Rad QX200 Droplet Reader. Data were analyzed using QX Manager software (Bio-Rad). All Bio-Rad ddPCR equipment was provided by the Clarity, Optogenetics and Vector Engineering Research (CLOVER) Center at Caltech. Average copy number per genome was calculated as the ratio of the HEX-labeled target copy number to the FAM-labeled reference gene copy number, accounting for the number of reference copies per genome: four copies of the NbEF1α reference sequence were assumed for N. benthamiana, and the single-copy ribosomal protein S13 (SAND) gene was used as the reference for S. lycopersicum.
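
As an illustration of the copy-number calculation, the following Python sketch converts duplex ddPCR concentrations into insertions per genome. It assumes that the target-to-reference ratio is multiplied by the number of reference copies per genome, which is one reading of the description above; the function name and the input concentrations are hypothetical.

```python
def copies_per_genome(target_copies_per_ul, reference_copies_per_ul,
                      reference_copies_per_genome):
    """Estimate average insertion copies per genome from duplex ddPCR data.

    Assumption: the HEX (target) to FAM (reference) ratio is scaled by the
    number of reference copies expected per genome.
    """
    if reference_copies_per_ul <= 0:
        raise ValueError("reference concentration must be positive")
    ratio = target_copies_per_ul / reference_copies_per_ul
    return ratio * reference_copies_per_genome


# Hypothetical N. benthamiana sample: four reference copies assumed per genome
nb_example = copies_per_genome(50.0, 400.0, 4)

# Hypothetical S. lycopersicum sample: single-copy reference gene
sl_example = copies_per_genome(30.0, 120.0, 1)
```

With these made-up inputs, `nb_example` is 0.5 and `sl_example` is 0.25 insertions per genome.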

Assessing on-target and off-target insertion rates

On-target and off-target payload integration sites were determined for four gDNA samples using TTISS as described previously for R2 systems in mammalian cells. Three of the samples were biological replicates for insertion of the R33-R4-HDV payload and the fourth was a negative control transformed with the R33-R4-HDV payload but without an R2 protein cassette.

Briefly, 300–500 ng of gDNA was mixed with 3–5 µl of Endura Tn5 transposase (Zymo Research) loaded with Tn5 adaptors and tagmented following the manufacturer’s recommendations. Tagmented samples were purified using a QIAquick DNA purification kit (Qiagen) and amplified twice using Phusion DNA polymerase (New England Biolabs): 12 cycles at an annealing temperature of 66 °C in the first round of PCR, followed by five cycles at an annealing temperature of 60 °C and 15 cycles at an annealing temperature of 64 °C. Oligonucleotide sequences used in TTISS are described in Supplementary Table 4. The libraries were subjected to paired-end sequencing on an AVITI instrument (Element Biosciences), yielding a total of 36.3 million paired reads across the three biological replicates with R2 insertions and 22.6 million reads for the payload-only negative control sample (Supplementary Table 5).

To determine the rates of on-target and off-target integrations, we first used cutadapt to extract read pairs in which the start of read 1 matched the complete R2Tg 3′ UTR. Across all samples, 91% of read 1 sequences contained the complete R2Tg 3′ UTR. With cutadapt, we also trimmed the R2Tg 3′ UTR and any adaptor sequences from the tagmentation process. For mapping, we used the N. benthamiana genome concatenated with the R33-R4-HDV payload cassette sequence. Bowtie2 was used for mapping and only read pairs in which both reads mapped successfully were used for analysis. As anticipated, most reads (98%) originated from the payload cassette (Supplementary Table 5), which also contains the complete R2Tg 3′ UTR, rather than the N. benthamiana genome; these reads were not included in the on-target and off-target analysis. In the N. benthamiana genome, 25S loci were identified with barrnap version 0.9, and bedtools intersect was used to identify reads mapping to 25S sites; these reads were considered on-target. Any remaining reads were considered off-target.

As this approach involves multiple PCR steps and our samples contain an abundance of the R33-R4-HDV minimal payload cassette, we included a negative control to assess the rate of PCR chimeras that mimic integration events by combining the cassette R2Tg 3′ UTR with gDNA. False-positive 25S insertions could be detected at a rate of 0.00026% of mapped reads and false-positive off-target insertions could be detected at a rate of 0.012% (Supplementary Table 5). These values were used to adjust the number of on-target and off-target reads. The cumulative on-target rate was 95.4% before adjustment and 96.0% after adjustment.
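
One way such an adjustment could be carried out is sketched below in Python. The exact procedure is not spelled out in the text, so this sketch assumes that the expected number of chimeric reads (negative-control rate multiplied by the number of mapped reads in the sample) is subtracted from each class before the on-target rate is recomputed. The function name and read counts are hypothetical; only the two false-positive rates come from the text.

```python
FP_ON_TARGET_FRACTION = 0.00026 / 100   # false-positive 25S rate from the text
FP_OFF_TARGET_FRACTION = 0.012 / 100    # false-positive off-target rate from the text


def adjusted_on_target_rate(on_reads, off_reads, mapped_reads,
                            fp_on_fraction, fp_off_fraction):
    """Return the on-target fraction after subtracting expected PCR chimeras.

    Assumption: expected false positives scale with the total number of
    mapped reads in the sample, and corrected counts are floored at zero.
    """
    adj_on = max(on_reads - fp_on_fraction * mapped_reads, 0.0)
    adj_off = max(off_reads - fp_off_fraction * mapped_reads, 0.0)
    total = adj_on + adj_off
    if total == 0:
        raise ValueError("no reads remain after adjustment")
    return adj_on / total


# Hypothetical counts, chosen only to show the direction of the correction
raw_rate = 9540 / (9540 + 460)
corrected_rate = adjusted_on_target_rate(
    9540, 460, 1_000_000, FP_ON_TARGET_FRACTION, FP_OFF_TARGET_FRACTION
)
```

Because the off-target false-positive rate is much higher than the on-target one, the correction removes proportionally more off-target reads, so the adjusted on-target rate rises, in line with the direction of the change reported above.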

Identifying R2-compatible 25S sites in plants

To determine potential 25S R2 insertion sites across A. thaliana, N. benthamiana and S. lycopersicum, genomes were obtained from published assemblies as follows: A. thaliana, the Naish et al. assembly⁶⁴ with the incomplete NOR regions replaced by the corresponding NOR sequences from Fultz et al.²³; N. benthamiana and S. lycopersicum, Chen et al.²⁴. The 25S rDNA loci were annotated in each genome assembly using barrnap version 0.9 run in eukaryotic mode. Predicted 25S sequences were extracted and screened for the presence of a conserved R2 recognition and cleavage region. Predictions lacking an exact match to this motif were excluded, yielding a set of candidate R2-compatible 25S target sites (Supplementary Fig. 12). The barrnap annotation coordinates were used for visualization.
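
The exact-match screen can be illustrated with a short Python sketch. It assumes that the predicted 25S sequences have already been extracted in the annotated strand orientation; the motif shown is a placeholder and not the actual R2 recognition sequence, and the function and sequence names are hypothetical.

```python
def find_compatible_sites(predictions, motif):
    """Return predictions containing an exact match to the motif.

    predictions: dict mapping a locus name to its extracted 25S sequence.
    The result maps each retained locus name to the 0-based motif position.
    """
    motif = motif.upper()
    hits = {}
    for name, sequence in predictions.items():
        position = sequence.upper().find(motif)
        if position != -1:          # predictions without an exact match are excluded
            hits[name] = position
    return hits


# Placeholder motif and toy sequences, for illustration only
PLACEHOLDER_MOTIF = "ACGTTAGC"
toy_predictions = {
    "chr1_25S_a": "TTTACGTTAGCAAA",   # exact match at position 3
    "chr2_25S_b": "TTTACGTAAGCAAA",   # one substitution, so excluded
}
compatible = find_compatible_sites(toy_predictions, PLACEHOLDER_MOTIF)
```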

Identifying 25S integration site locations

gDNA was PCR-amplified using a primer specific to the 3′ end of the inserted DNA and a primer specific to a highly conserved region ~2 kb downstream of the R2 recognition motif in N. benthamiana rDNA. Oxford Nanopore sequencing of the amplicons was completed by Plasmidsaurus using R10.4.1 flow cells. Raw reads were filtered using cutadapt to remove reads that were shorter than 2.5 kb or longer than 3.5 kb, lacked the final 177 bp of the R2 insert 3′ end or did not end with the rDNA primer-binding site (approximately 20% of all reads passed filtering). Read sequences were aligned using vsearch to a database containing all R2-compatible 25S sequences and adjacent sequences up to the primer-binding site, extracted from the N. benthamiana genome. Only alignments with >98% sequence identity and coverage were considered possible sites of DNA integration. While 25S sites on different chromosomes can be resolved unambiguously, most reads align equally well to multiple identical 25S sites within the same chromosome.

5′ integration junction analysis

gDNA was PCR-amplified using a primer specific to the 5′ end of the R2 integrated DNA and a primer specific to a region ~1 kb upstream of the R2 recognition motif in N. benthamiana rDNA. PCR products were submitted to Plasmidsaurus for premium amplicon sequencing. Plasmidsaurus purified the amplicons but did not size-select them before Oxford Nanopore library preparation and sequencing on R10.4.1 flow cells. Raw sequencing reads were filtered in two sequential steps using cutadapt. Reads lacking the forward primer and expected start of the 25S rDNA sequence were discarded in the first step and reads lacking the reverse primer and expected end of the insert sequence were discarded in the second. The resulting reads, therefore, span the complete 5′ junction from rDNA into the transgene.

Junction classification was performed in Python using a custom pipeline. Reads containing part of the pVmV promoter sequence present in our plasmid were removed as plasmid-derived contaminants. Each remaining read was searched for the R2 5′ rDNA homology arm and start of the payload sequence; reads without these represent possible truncated insertions or PCR chimeras and were discarded. For reads passing these filters, the 100 nt immediately upstream of the homology arm was extracted as the junction window. Reads with a mean Phred quality below 30 across this window were also excluded.

Junction type was then assigned as follows. Reads in which the junction window ended with the canonical rDNA insertion site sequence were classified as anneal, indicating insertion at the expected 25S rDNA target site. For the remaining reads, the junction window was aligned locally to the most common N. benthamiana 25S rDNA sequence and to the reverse complement of the transgene cDNA sequence. Reads matching rDNA were classified as join, while reads matching the transgene reverse complement were classified as snap-back. The sequences of all join and snap-back reads are listed in Supplementary Table 6.
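
The filtering and classification logic can be sketched in Python as follows. This is a simplified illustration, not the custom pipeline itself: a shared exact k-mer stands in for the local alignment step, the arithmetic mean of per-base Phred scores is used as the quality measure, and reads with fewer than 100 nt upstream of the homology arm are discarded. All names, default parameters and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def shares_kmer(query, reference, k):
    """Crude stand-in for local alignment: any exact k-mer shared with reference."""
    return any(query[i:i + k] in reference for i in range(len(query) - k + 1))


def classify_junction(seq, quals, arm, site_end, rdna_ref, cdna,
                      window=100, min_mean_q=30, k=12):
    """Assign a junction type to one read that already passed primer filtering."""
    arm_start = seq.find(arm)
    if arm_start == -1:
        return "discard_no_arm"            # possible truncation or PCR chimera
    if arm_start < window:
        return "discard_short"             # assumption: full window required
    junction = seq[arm_start - window:arm_start]
    window_quals = quals[arm_start - window:arm_start]
    if sum(window_quals) / window < min_mean_q:
        return "discard_low_quality"
    if junction.endswith(site_end):
        return "anneal"                    # canonical insertion site
    if shares_kmer(junction, rdna_ref, k):
        return "join"                      # matches rDNA elsewhere
    if shares_kmer(junction, reverse_complement(cdna), k):
        return "snap-back"                 # matches transgene reverse complement
    return "unclassified"


# Toy sequences with a 10-nt window, for illustration only
ARM, SITE_END = "TTTTCCCC", "GGTAG"
RDNA, CDNA = "CATGCATGCCGA", "GATTACAGATTACA"
anneal_read = "AACCAGGTAG" + ARM + "AAAA"
anneal_call = classify_junction(anneal_read, [35] * len(anneal_read),
                                ARM, SITE_END, RDNA, CDNA, window=10, k=5)
```

A real implementation would replace `shares_kmer` with a scored local alignment, since a single shared k-mer can arise by chance in longer references.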

Methylation analysis through targeted bisulfite sequencing

Cytosine methylation at the R2 insertion site was analyzed by bisulfite sequencing following Foerster and Mittelsten Scheid with modifications⁶⁵. gDNA was extracted from post-transfection protoplasts 48 h after PEG-mediated transformation using the DNeasy plant mini kit (Qiagen, 69106) according to the manufacturer’s protocol. Before bisulfite conversion, successful payload integration at the 25S rDNA locus was confirmed by PCR amplification of the 3′ junction from the extracted gDNA yielding an expected amplicon of 423 bp. Products were verified on a 1.5% agarose gel and only gDNA samples confirmed to carry the insertion were carried forward for bisulfite conversion.

Bisulfite conversion was performed using the EpiTect bisulfite kit (Qiagen, 59104) with some modifications to improve conversion efficiency; the denaturation step at 99 °C was extended by an additional 5 min and an extra conversion incubation of 2 h at 60 °C was inserted before the final hold at 20 °C. Temperature and incubation times were controlled using a thermocycler. Completeness of bisulfite conversion was assessed by PCR amplification of the endogenous DDM1 locus (At5g66750), a genomic region known to be hypomethylated in A. thaliana, using two primer pairs targeting unconverted and converted template independently. The nonconversion control pair amplifies only unconverted template and yields a 562-bp product; the conversion control pair amplifies only fully converted template. Successful conversion was confirmed by the presence of a 562-bp product with the conversion control primers and the absence of product with the nonconversion control primers.

Bisulfite PCR was performed using the EpiMark Hot Start Taq DNA polymerase kit (New England Biolabs, M0490S). Primers were designed against bisulfite-converted sequence such that cytosine residues within primer-binding sites were represented as degenerate bases to avoid methylation bias, while minimizing the total number of degenerate positions; converted and nonconverted strands were analyzed independently, as bisulfite conversion renders the two strands noncomplementary. Two amplicons were targeted: a 426-bp region of the 25S rDNA upstream of the R2 insertion site and a 492-bp region spanning the payload–rDNA 3′ junction. PCR products of the expected size were verified on a 1.5% agarose gel, purified and concentrated using the QIAquick PCR and gel cleanup kit (Qiagen, 28506) and adjusted to 10 µl before submission. Purified amplicons were submitted to Plasmidsaurus for amplicon premium PCR sequencing, which generates up to 3,000 raw reads per sample from mixed PCR populations.

Bisulfite amplicon sequencing analysis

Raw sequencing reads were processed using a custom Python pipeline based on CyMATE⁶⁶. Each read was first assessed for bisulfite strand identity by comparing its cytosine and guanine base composition. Reads were reverse-complemented where necessary to place all reads in the plus-strand orientation and reads that could not be assigned to either strand were discarded. Oriented reads were trimmed by searching for the forward and reverse primer sequences within the terminal 15 nt of each read, allowing up to three mismatches and using IUPAC degenerate base matching to accommodate bisulfite-induced C-to-T transitions. Reads in which the forward primer was not identified were discarded. Trimmed amplicon sequences deviating more than 30 nt from the expected amplicon length of 449 bp were also discarded.
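
Degenerate primer matching with a mismatch tolerance, together with the length filter, can be sketched in Python as shown below. The sketch assumes that "within the terminal 15 nt" means the primer may start at any of the first 15 positions of the oriented read, and it returns the leftmost acceptable position rather than the best-scoring one. The function names and the toy primer are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, segment):
    """Count positions where the read base is not allowed by the IUPAC code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, segment))


def find_primer(read, primer, search_span=15, max_mismatches=3):
    """Return the leftmost start of an acceptable primer match, or None."""
    n_starts = min(search_span, len(read) - len(primer) + 1)
    for start in range(n_starts):      # empty range if the read is too short
        segment = read[start:start + len(primer)]
        if count_mismatches(primer, segment) <= max_mismatches:
            return start
    return None


def passes_length_filter(length, expected=449, tolerance=30):
    """Keep trimmed amplicons within the tolerance of the expected length."""
    return abs(length - expected) <= tolerance


# Toy example: Y (C or T) lets the primer match converted or unconverted template
toy_start = find_primer("GGACGTTAAA", "AYGTT")
```

In this toy case the only window with three or fewer mismatches begins at position 2, so `toy_start` is 2.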

Trimmed reads were aligned to the reference amplicon sequence using a custom bisulfite-aware Needleman–Wunsch global alignment in which a cytosine in the reference paired with a thymine in the read was scored as a match, reflecting complete conversion of unmethylated cytosines. At each reference cytosine position, the aligned base was recorded as methylated (C), unmethylated (T) or no call. Each cytosine position was classified by trinucleotide context as CpG, CHG or CHH and per-site methylation frequency was calculated as the fraction of reads with a cytosine call at that position across all covering reads.
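
The context assignment and per-site frequency calculation can be illustrated with the Python sketch below. It assumes that reads have already been aligned and padded to the reference length (with `-` at gaps), and that the denominator counts only reads with a C or T call at the position; the source text does not state how no-call reads are handled. The function names and sequences are hypothetical.

```python
def cytosine_context(reference, pos):
    """Classify the reference cytosine at pos as CpG, CHG or CHH."""
    downstream = reference[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"
    if len(downstream) == 2 and downstream[1] == "G":
        return "CHG"
    if len(downstream) == 2:
        return "CHH"
    return None   # too close to the amplicon end to assign a context


def methylation_frequencies(reference, aligned_reads):
    """Map each reference cytosine position to (context, methylated fraction)."""
    result = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        methylated = sum(read[pos] == "C" for read in aligned_reads)
        unmethylated = sum(read[pos] == "T" for read in aligned_reads)
        called = methylated + unmethylated   # assumption: no-calls are excluded
        fraction = methylated / called if called else None
        result[pos] = (cytosine_context(reference, pos), fraction)
    return result


# Toy reference with one cytosine in each context (positions 1, 4 and 7)
toy_reference = "ACGTCAGCTT"
toy_reads = ["ACGTTAGTTT", "ATGTCAGTTT", "ACGTCAG-TT"]
toy_frequencies = methylation_frequencies(toy_reference, toy_reads)
```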

For epiallele analysis, each aligned read was encoded as a binary methylation vector across all cytosine positions in the amplicon. Reads where more than 20% of positions were recorded as no call were excluded. Unique binary patterns were enumerated and ranked by frequency. The ten most abundant epialleles were visualized as a binary heatmap alongside a per-site methylation bar chart colored by cytosine context. All analyses were performed in Python 3 using NumPy and Matplotlib.
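
A minimal Python sketch of the epiallele enumeration follows. It uses only the standard library rather than NumPy, and it encodes no-call positions as -1, which is an assumption because the text does not say how residual no-calls are represented in the pattern. The function name and input calls are hypothetical.

```python
from collections import Counter


def top_epialleles(read_calls, max_no_call_fraction=0.2, top_n=10):
    """Rank methylation patterns by how many reads carry them.

    read_calls: one list per read holding 'C' (methylated), 'T' (unmethylated)
    or None (no call) for every cytosine position in the amplicon.
    """
    patterns = Counter()
    for calls in read_calls:
        if not calls:
            continue
        no_calls = sum(call is None for call in calls)
        if no_calls / len(calls) > max_no_call_fraction:
            continue                     # too many no-calls, read excluded
        pattern = tuple(
            1 if call == "C" else 0 if call == "T" else -1 for call in calls
        )
        patterns[pattern] += 1
    return patterns.most_common(top_n)


# Toy input: two identical reads, one unmethylated read, one read with 40% no-calls
toy_calls = [
    ["C", "T", "C", "T", "T"],
    ["C", "T", "C", "T", "T"],
    ["T", "T", "T", "T", "T"],
    [None, None, "C", "T", "T"],
]
toy_ranking = top_epialleles(toy_calls)
```

The ranked patterns can then be stacked into an array for heatmap plotting, with one row per epiallele and one column per cytosine position.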

Replicate information

All biological replicates were performed using different plants of the same age, with leaves at the same developmental stage. Exact biological replicate information and statistical analysis methods used are provided in the figure captions. Supplementary Table 7 contains P values for all multiple-comparison tests. For quantification of fluorescence intensity or the number of positive cells from confocal microscopy images, at least eight nonoverlapping fields of view were collected per biological replicate.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.