StudentShare

Discovery of a New Gene and Analysis of the Encoded Protein - Coursework Example

Summary
The DNA sequence of the hypothetical new gene was analyzed to determine the encoded protein and the order of bases in the DNA. The first step in the nucleotide sequence analysis was the translation of the DNA sequence to allow for the identification of the open reading frame…

Extract of sample "Discovery of a New Gene and Analysis of the Encoded Protein"

Report on the discovery of a “new” gene and analysis of the encoded protein

Abstract

The DNA sequence of a hypothetical new gene (a type II transmembrane serine protease-like gene) was analyzed to determine the order of bases in the DNA and the protein it encodes. The first step in the nucleotide sequence analysis was to translate the DNA sequence so that the open reading frame could be identified. Next, the translated peptide was compared with existing sequence databases to deduce the likely amino acid sequence, suggest the likely function of the new gene and identify the organism it most probably came from. The comparison showed that the sequence was more similar to TTSP than to any other match.

Introduction

Type II transmembrane serine proteases (TTSPs) are a family of proteolytic enzymes that have been identified in humans. A careful analysis of the nucleotide sequence of the unknown gene and its encoded protein revealed a number of close similarities with TTSP. Numerous genome studies have shown that TTSP genes play diverse physiological roles in the body, including the regulation of cancer development and progression.

Cloning vector contamination was, however, detected in several regions of the nucleotide sequence. One of the major contaminants was pBR322 DNA, one of the most commonly used E. coli cloning vectors. pBR322 is a double-stranded circular plasmid 4361 base pairs in length, and it is known to possess a number of unique restriction enzyme sites (Yarus, 420). It also carries the genes that confer resistance to tetracycline and ampicillin, and the plasmid can be amplified with chloramphenicol.
Additionally, the plasmid contains the replicon rep, which is responsible for replication of the plasmid, and the rop gene, which encodes the Rop protein. Rop promotes the conversion of the unstable RNA I-RNA II complex into a more stable complex. The open reading frame (ORF) coordinates of the contaminating vector are given as translational start and translational stop positions, regardless of the direction of transcription, and they include the start and stop codons.

The discovery of a new gene and the analysis of its encoded protein is a complicated process that involves determining the coding strand for transcription as well as its open reading frame. Using web-based translation tools, both strands of the nucleotide sequence can be translated in all reading frames and the stop codons in the sequence identified. The translation is obtained by reading the sequence three bases at a time, and the result is then compared with existing sequence databases to help determine the amino acid sequence.

Method

The analysis of the unknown gene and its encoded protein used several sequence analysis programs for nucleotide sequence analysis, peptide analysis and protein analysis. During the nucleotide sequence analysis, the 4361bp sequence was first analyzed using VecScreen to find out whether it was contaminated by any vector sequence. The DNA was then compared with the nucleotide sequence database using blastn, part of the NCBI Basic Local Alignment Search Tool (BLAST).

[Figure: Distribution of vector matches on the query sequence (positions 1 to 2953). Legend: strong, moderate or weak match to vector; segment of suspect origin.]

After manual removal of all contaminating vector sequences, the remaining DNA sequence of the unknown gene was inspected using blastx to compare the translated sequence with the protein database.
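The manual removal of vector-matching regions described above can be illustrated in code. The following Python sketch is an illustration only and not the procedure the report used: the function name remove_regions and the placeholder query are assumptions, while the coordinates are the three strong-match regions reported in the Results section (1-298, 327-427 and 2304-2953), applied to a query of 2953 bases.

```python
def remove_regions(seq, regions):
    """Return (start, end, subsequence) for every stretch of seq that lies
    outside the given regions. Coordinates are 1-based and inclusive."""
    kept = []
    pos = 1  # first position not yet classified as vector or insert
    for start, end in sorted(regions):
        if start > pos:
            # Everything between the previous vector region and this one is kept.
            kept.append((pos, start - 1, seq[pos - 1:start - 1]))
        pos = max(pos, end + 1)
    if pos <= len(seq):
        kept.append((pos, len(seq), seq[pos - 1:]))
    return kept


# Placeholder query with the same length as the listing (hypothetical content).
query = "n" * 2953
vector_regions = [(1, 298), (327, 427), (2304, 2953)]

for start, end, segment in remove_regions(query, vector_regions):
    print(start, end, len(segment))
```

With these coordinates the sketch keeps positions 299-326 (28 bases) and 428-2303 (1876 bases). Whether the short 299-326 stretch should be treated as insert or as a segment of suspect origin is a judgment the sketch does not make.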
Further analysis of the uncontaminated nucleotide sequence revealed all the potential open reading frames. Initially, the nucleotide sequence was inspected with blastx, on the same site as blastn, for any similarity between the translated pBR322 sequence and the protein database. All the possible open reading frames (ORFs) and their amino acid sequences were then identified with ORF Finder. NEBcutter V2.0 was used to examine the restriction sites of the unknown gene, that is, to determine which restriction enzymes cut within its nucleotide sequence, while Softberry's BPROM was used to predict bacterial promoters. Lastly, the DNA sequence was inspected for possible repeat sequences using Tandem Repeats Finder.

Software used for sequence analysis

Nucleotide analysis                                            Software tool used
Analysis of contamination by vector sequence                   NCBI VecScreen
Comparison of the query sequence with nucleotide database      blastn
Comparison of the translated sequence with protein database    blastx
Analysis of the restriction sites                              NEBcutter V2.0
Determination of the bacterial promoter                        Softberry's BPROM
Analysis of the repeat sequences                               Tandem Repeats Finder

After translation, the amino acid sequences were analyzed using blastp (NCBI BLAST), and the conserved domains of the sequence were also detected. The relevant patterns, sites and profiles in the translated open reading frame were scanned using Pratt 2.1 and ProScan, and the results were compared with the PROSITE database of protein domains and families. The translated sequence was then scanned for motifs and functions using InterProScan, and the results were aligned using ClustalW and presented using Boxshade 3.21.
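The ORF search described above (both strands, all three reading frames each, three bases at a time) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the algorithm used by ORF Finder: it uses the standard genetic code, treats only ATG as a start codon, and the names find_orfs and min_codons are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {codon for codon, amino in CODON_TABLE.items() if amino == "*"}


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_codons=30):
    """Return (frame, start, end, protein) for each ATG...stop ORF.

    frame is +1..+3 or -1..-3; start and end are 1-based positions on the
    strand being read (the reverse complement for negative frames), and
    end includes the stop codon.
    """
    seq = seq.lower()
    orfs = []
    for sign, strand in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "atg":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((sign * (offset + 1), start + 1, i + 3,
                                     translate(strand[start:i])))
                    start = None
    return orfs


# Toy input (hypothetical): one short ORF in frame +3.
print(find_orfs("ccatggctgctgcttaag", min_codons=3))
```

Reporting coordinates on the strand being read keeps the sketch short; a real tool would usually map reverse-strand hits back to forward-strand positions.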
Lastly, protein analysis of the unknown gene was carried out using a number of common online tools. For example, NNPREDICT was used to predict the secondary structure of the encoded protein. Additionally, a 3D structure of the protein was retrieved from the Protein Data Bank. Finally, the cellular localization of the protein was checked, and all the results were used to infer the function of the protein encoded by the unknown gene.

Results and discussion

After the 4136bp sequence was matched against similar nucleotide sequences in the database, a high similarity was detected between it and a number of common cloning vectors. A VecScreen analysis for potential vector contamination indicated that three regions of the sequence (1-298, 327-427 and 2304-2953) had a strong match to the pBR322 cloning vector sequence. After manual removal of the contaminating vector, the remaining DNA was analyzed further using ORF Finder, NEBcutter V2.0, blastp and blastn. For example, NEBcutter V2.0 identified numerous restriction enzyme sites, including those for PstI, EcoRI and HindIII.

On the other hand, the peptide sequence analysis of the hypothetical gene showed that the newly discovered gene was more similar to the pBR322 plasmid-like proteins. When the pBR322 nucleotide sequence was analysed using blastp, blastx and ORF Finder, the results showed hits to a conserved ampicillin and tetracycline synthase-like protein. The amino acid sequence also contained the domains of the two enzymes. After the shortening that accompanies polymerization, each deoxyribose-phosphate backbone unit was given a molecular weight of 179.089. The molecular weight of a nucleotide in the sequence is then generally the weight of the backbone unit plus that of the base, minus one.
Since plasmid DNA is double stranded, the complementary strand should also be taken into account when determining the weight of a plasmid. EcoRII is a biological catalyst that recognizes the sequences ccagg and cctgg and cuts the DNA there. Such enzymes are called restriction endonucleases or, in other words, restriction enzymes. The recognition sequences can be located manually or with the search function of a text editor. Additionally, the amino acid sequence suggested the existence of a unique domain. No significant results were obtained, however, when ProScan was used to assess the level of similarity of the protein sequence.

[Figure: Results of the NEBcutter V2.0 analysis]

The protein sequence analysis of the amino acids encoded by the nucleotide sequence involved predicting their secondary structure. NNPREDICT was used for this prediction, and the results showed that the encoded protein contained a number of strands and helices similar to those of pBR322 plasmid-like proteins. Analysis of the nucleotide alignment results indicated that the sequence was most likely read in the +3 frame. On further inspection, the highlighted region of the +3 frame graph contained 1272 nucleotides. Running the sequence alignment in blastn, on the other hand, indicated that the unknown protein is likely to be a type II transmembrane serine protease-like protein from Homo sapiens.
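The text-editor search for EcoRII recognition sequences mentioned above can also be scripted. The Python sketch below is an illustration under assumptions: the helper names clean_listing and find_sites are hypothetical, and it only reports where ccagg or cctgg occurs, not where within the site the enzyme cuts.

```python
import re


def clean_listing(text):
    """Drop position numbers, spaces and line breaks from a numbered listing.

    Assumption: only a, c, g and t are kept, so ambiguity codes are discarded.
    """
    return re.sub(r"[^acgt]", "", text.lower())


def find_sites(seq, pattern="cc[at]gg"):
    """Return the 1-based start position of every match, overlaps included."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() + 1 for m in lookahead.finditer(seq.lower())]


# Hypothetical numbered fragment containing one ccagg and one cctgg site.
fragment = clean_listing("1 aaccaggttc ctggcc")
print(find_sites(fragment))  # [3, 10]
```

Because both recognition sequences are their own reverse complements as a pair (ccagg pairs with cctgg), searching one strand is enough to find the sites on both.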
   1 gaccgagcgc agcgagtcag tgagcgagga agcggaagag cgcccaatac gcaaaccgcc
  61 tctccccgcg cgttggccga ttcattaatg cagctggcac gacaggtttc ccgactggaa
 121 agcgggcagt gagcgcaacg caattaatgt gagttagctc actcattagg caccccaggc
 181 tttacacttt atgcttccgg ctcgtatgtt gtgtggaatt gtgagcggat aacaatttca
 241 cacaggaaac agctatgacc atgattacga atttaatacg actcactata gggaattctt
 301 ttttttagcg ctctgggtca ttttcggcga cgatgatcgg cctgtcgctt gcggtattcg
 361 gaatcttgca cgccctcgct caagccttcg tcactggtcc cgccaccaaa cgtttcggcg
 421 agaagcatat atatatatat atattttata tattacatca ctatcgatgc tagctacgac
 481 agttcgatgc tagtttatat aatagaggaa tcatttttat gcaggagaaa aataatgtcc
 541 ctcttaaact gtgagaattc gtgcggtagt agccagtcgg aatctgactg ttgcgtagca
 601 atggcatcca gctgctccgc ggtctcccgc gatgactcag taggaggtag cgccagtagc
 661 ggtaatttga gcagttcatt tatggaggaa atacaggggt atgacgttga gttcgacccg
 721 ccacctccct tagagagtag atacgaatgc ccaccgatct gcctcatggc gctacgcgaa
 781 gcggtgcaat ctccaccctg tggtcaccgc ttttgtcgtg cttgtattat caggtctata
 841 cgtgatgctg ggcatagatg tcctccggta gataatgaaa tcctattaga gaaccagttg
 901 ttcccacctg ataatttcgc acgcagggag atcctctctc tgatggtcag gtgcccgccc
 961 aacgagggct gtctacaccg aatggaattg cggcatcttg aagatcatca agcacactgt
1021 gagttcgcac ttatggattg tcctccgcag tgtcaaagac ctccctttca aaggtttcat
1081 atcaacatcc atattttgcg agactgtccc ccaagacgcc aagtctcatg tgacaactgc
1141 gctgcgagca tggcttttga ggacagagag attcatgacc agaattgccc gccccttgcg
1201 aacgtcattt gtgaatattg caacagtata ctaattagag aacaaatgcc acctaaccat
1261 tacgatctgg attgcccacc ctctgcacca cccataccac cttgctcgtt tagttcgttc
1321 ggatgccacg agaggatgca acgcaatcat cttgcccgtc accttcaaga gaactcgcag
1381 agccatatgc ggatgctcgc ccaagcagtg catagtctat ctgtgattcc tccggacagt
1441 ggctatattt cggaggttcg aaatttccaa gaatctattc atcagttgga gggccgctta
1501 gttcgacagg accatcagat tcgtgaattg tccgctcgta tggaatcaca gagtatgtat
1561 gtttcggaac ttcgaagatc aattcggagc ttagaagata gggtggctga aatcgaagcc
1621 cagcaatgta atgggatcta tatttggcgg ataggtaatt tcgggatgca cctgagatgt
1681 caagaagagg aacggccgcc cgtagttata cacagtccgc ctggttttta ctccggacgc
1741 ccacctgggt ataggctatg catgcgatta catctgcaac tcccaccttc tgcccagcga
1801 tgcgcgaact acataagtct gtttgtacac tccatgcagg gggagtacga ttcccatcta
1861 ccgccctggc cgccttttca aggatcaata cggttgtcaa tactggatca aagcgaggct
1921 ccaccggtgc gtcagaatca tgaggaaatc atggacgcca ggccacccga acttctcgca
1981 ttccaacggc cgccaagcat accaccccgg aatcctccac gtggctttgg atacgtgtca
2041 tttatgcatt tggaagccct gcgtcagcgt tcattcatac gagatgactc actcttagtc
2101 cgctgcgagg ttagcagtcg attcgatatg ggatcgttaa gacgggaggg ctttcagccc
2161 cctaggtcaa gcgatgcggg cgtaatgatg agatggtgtg tatgacacac atgagaaaaa
2221 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaagc agaatttttt tgcgcgcgcg
2281 cgataacgcg cgcgcgcttt tttggccatt atcgccggca tggcggccga cgcgctgggc
2341 tacgtcttgc tggcgttcgc gacgcgaggc tggatggcct tccccattat gattcttctc
2401 gcttccggcg gcatcgggat gcccgcgttg caggccatgc tgtccaggca ggtagatgac
2461 gaccatcagg gacagcttca aggatcgctc gcggctctta ccagcctaac ttcgatcact
2521 ggaccgctga tcgtcacggc gatttatgcc gcctcggcga gcacatggaa cgggttggca
2581 tggattgtag gcgccgccct ataccttgtc tgcctccccg cgttgcgtcg cggtgcatgg
2641 agccgggcca cctcgacctg aatggaagcc ggcggcacct cgctaacgga ttcaccactc
2701 caagaattgg agccaatcaa ttcttgcgga gaactgtgaa tgcgcaaacc aacccttggc
2761 agaacatatc catcgcgtcc gccatctcca gcagccgcac gcggcgcatc tcgggcagcg
2821 ttgggtcctg gccacgggtg cgcatgatcg tgctcctgtc gttgaggacc cggctaggct
2881 ggcggggttg ccttactggt tagcagaatg aatcaccgat acgcgagcga acgtgaagcg
2941 actgctgctg caa

Conclusion

In conclusion, the discovery of a new gene and the analysis of its encoded protein is a complicated process that involves determining the coding strand for transcription as well as its open reading frame. Although cloning vector contamination was detected in several regions, the comparison showed that the nucleotide sequence was more similar to TTSP than to any other match.

Works Cited

Yarus, Widmann M. “RNA-amino acid binding: a stereochemical era for the genetic code.” J. Mol. Evol. 69.5 (2009): 406–429. Print.
Discovery of a New Gene and Analysis of the Encoded Protein Coursework. Retrieved from https://studentshare.org/biology/1780587-a-report-in-the-style-of-a-scientific-paper-describing-the-discovery-of-a-new-gene-and-analysis-of-the-encoded-protein
