---
title: "Introduction to baseq"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to baseq}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(baseq)
```

## Introduction

`baseq` is a lightweight toolkit for basic processing of biological sequences. It provides simple functions for common molecular biology tasks: cleaning sequences, translating DNA or RNA into protein, calculating GC content, and reading and writing FASTA and FASTQ files.

## Sequence Cleaning

`clean_seq()` cleans a DNA or RNA sequence by removing non-standard characters. It detects automatically whether the input is DNA or RNA, so the same function handles both.

```{r cleaning}
# A DNA sequence with mixed case and non-standard characters
dna_seq <- "ATGCnNryMK"
clean_seq(dna_seq)

# The same call works on RNA (U instead of T)
rna_seq <- "AUGGCuuNnRYMK"
clean_seq(rna_seq)
```

## Translation

`baseq` translates DNA and RNA sequences into protein sequences in all six reading frames. `dna_to_protein()` returns one translation per frame, named by frame, so a single frame can be extracted by its name, for example `"Frame F1"`.

```{r translation}
dna_seq <- "ATCGAGCTAGCTAGCTAGCTAGCT"
proteins <- dna_to_protein(dna_seq)

# Extract the translation of the first forward frame
proteins[["Frame F1"]]
```

## GC Content

`gc_content()` calculates the GC content of a DNA sequence, that is, the share of its bases that are G or C.

```{r gc}
dna_seq <- "ATGCATGC"
gc_content(dna_seq)
```

## Reading and Writing Files

`read_seq()` and `write_seq()` read and write both FASTA and FASTQ files.

```{r files, eval = FALSE}
# Read a FASTA file into a data frame
df <- read_seq("path/to/file.fasta")

# Write a data frame to a FASTA file
write_seq(df, "output.fasta")
```

For more details, see the help page of each function, for example `?clean_seq`.
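
## Reading Frames in Base R

The Translation section says that `dna_to_protein()` works in all six reading frames: three frames read the sequence as given, starting at its first, second and third base, and three read its reverse complement the same way. The chunk below is a base R sketch of that framing step only, not baseq's implementation: it splits a sequence into the codons of each frame but does not translate them into amino acids. The helper names `reverse_complement()`, `frame_codons()` and `six_frames()` and the frame labels `F1` to `R3` are hypothetical; baseq's own labels (such as `"Frame F1"`) and its handling of incomplete codons may differ.

```{r frames-sketch}
# Hypothetical base R sketch, not part of baseq: codons in each reading frame.
# Assumes a DNA string; characters other than A, C, G, T pass through
# chartr() unchanged, i.e. they are not complemented.
reverse_complement <- function(dna) {
  # Complement each base, then reverse the order of the bases
  comp <- chartr("ACGT", "TGCA", toupper(dna))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

frame_codons <- function(dna, offset) {
  # Fewer than three bases left after the offset: no complete codon
  if (nchar(dna) - offset < 2) {
    return(character(0))
  }
  # Start at position `offset` and keep only complete codons
  starts <- seq(offset, nchar(dna) - 2, by = 3)
  substring(dna, starts, starts + 2)
}

six_frames <- function(dna) {
  dna <- toupper(dna)
  rc <- reverse_complement(dna)
  frames <- c(
    lapply(1:3, function(i) frame_codons(dna, i)),  # forward frames
    lapply(1:3, function(i) frame_codons(rc, i))    # reverse-complement frames
  )
  names(frames) <- c("F1", "F2", "F3", "R1", "R2", "R3")
  frames
}

# Codons of the first forward frame of the sequence used above
six_frames("ATCGAGCTAGCTAGCTAGCTAGCT")$F1
```

Each frame keeps only complete codons, so trailing bases that do not fill a codon are dropped. Turning each codon into an amino acid would additionally require a codon table, which this sketch omits.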