Website: http://www.thinkgene.com
Original page: http://www.thinkgene.com/terrible-comp-sci-sins-of-biologists/
On the way, we can note that in RNA the nucleotide T is replaced by U. Representing all the codons (64 of them, 3 bases each) requires 6 bits, which is the same as representing each of the 3 bases with 2 bits. So any effort to compress the representation stops here, unless we only care about the 20 amino acids; then we can take a bit off and we are down to five bits. The good stuff for what I write here is at: http://en.wikipedia.org/wiki/Genetic_code
In that article you will find this interesting quote: "A comparison may be made with computer science, where the codon is the equivalent of a word, which is the standard "chunk" for handling data (like one amino acid of a protein), and a nucleotide for a bit." Nooo...I am not going to defend this literally, because this communicates figuratively.
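To make the 6-bit arithmetic above concrete, here is a rough Python sketch that packs one codon into a single 6-bit integer, 2 bits per base. The particular bit assignment (A=00, C=01, G=10, T=11) and the function names `pack_codon` and `unpack_codon` are my own assumptions for illustration; nothing in the thread fixes them.

```python
# Assumed 2-bit code per base; any one-to-one assignment would work.
BASE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def pack_codon(codon):
    """Pack a 3-base codon into a 6-bit integer in the range 0..63."""
    codon = codon.upper().replace("U", "T")  # treat RNA's U like DNA's T
    if len(codon) != 3:
        raise ValueError("a codon has exactly 3 bases")
    value = 0
    for base in codon:
        # Shift the previous bases left by 2 bits and append this one.
        value = (value << 2) | BASE_BITS[base]
    return value


def unpack_codon(value):
    """Recover the 3-base DNA codon from its 6-bit integer."""
    return "".join(BITS_BASE[(value >> shift) & 0b11] for shift in (4, 2, 0))
```

With this assignment, `pack_codon("ATG")` and `pack_codon("AUG")` both give the same integer, and `unpack_codon` returns the DNA spelling. Getting down to five bits would mean storing the amino acid rather than the codon, which throws away which synonymous codon was used.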
0001 A
0010 C
0100 G
1000 T
Now, if you don't know the base, it's 1111 (N); if it's an A or a C, it's 0011, etc.
Using bases as a bitmap also makes comparisons much faster: you can just AND the two bitmaps together, and if the result is nonzero, it's a match.
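A minimal Python sketch of the bitmap idea, using the one-bit-per-base table above. The letter M for "A or C" follows the IUPAC ambiguity codes, and the helper names `encode` and `matches` are assumptions of mine for illustration, not anything from an existing library.

```python
# One bit per base, as in the table above.
A, C, G, T = 0b0001, 0b0010, 0b0100, 0b1000

CODES = {
    "A": A,
    "C": C,
    "G": G,
    "T": T,
    "N": A | C | G | T,  # unknown base: 1111
    "M": A | C,          # A or C: 0011 (IUPAC letter, assumed here)
}


def encode(seq):
    """Turn a sequence string into a list of 4-bit base bitmaps."""
    return [CODES[ch] for ch in seq.upper()]


def matches(seq_a, seq_b):
    """True if the sequences have equal length and every position
    shares at least one possible base (bitwise AND is nonzero)."""
    if len(seq_a) != len(seq_b):
        return False
    return all(x & y for x, y in zip(encode(seq_a), encode(seq_b)))
```

For example, `matches("ACGT", "ANGT")` is true because N overlaps every base, while `matches("ACGT", "AGGT")` is false because C AND G is zero. The cost is 4 bits per base instead of 2, which is the price of being able to represent ambiguity.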
I think that we are studying completed and processed genomes at this point, which means that all of the ambiguities have been resolved. Some of the other ideas mentioned are important for sequences that are still being assembled. So, except for the necessary start and stop codons, everything else would have been corrected.