Think Gene
a bio blog about genetics, genomics, and biotechnology
I recently noted in Napster of Medicine that an entire human genome would fit on a music CD.
How much data IS a human genome?
2 bits per base (4 bases = 2^2)
3,080.4 Mb per human genome [1]
700 MB per CD-ROM
(1 human genome) *
(3,080,400,000 bases / 1 h ...
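The post's own arithmetic is cut off above, so the following is only a minimal sketch of the calculation it sets up, not a reconstruction of the missing lines. It uses the two figures quoted in the post (2 bits per base, 3,080,400,000 bases). Reading "700 MB" as decimal megabytes is an assumption, and the names are made up for illustration.

```python
BASES_PER_GENOME = 3_080_400_000  # "3,080.4 Mb per human genome" from the post
BITS_PER_BASE = 2                 # 4 bases = 2^2, from the post
CD_ROM_BYTES = 700 * 10**6        # assumption: "700 MB" read as decimal megabytes


def genome_bytes(bases=BASES_PER_GENOME, bits_per_base=BITS_PER_BASE):
    """Bytes needed to store `bases` bases at `bits_per_base` bits each."""
    return bases * bits_per_base // 8


size = genome_bytes()
print(size)                  # 770100000
print(size / 10**6)          # size in decimal megabytes (MB)
print(size / 2**20)          # size in binary megabytes (MiB)
print(size / CD_ROM_BYTES)   # genome size relative to the assumed CD-ROM capacity
```

Whether the result is quoted as roughly 770 MB or roughly 734 MiB depends only on which definition of "megabyte" is used, which may explain the slightly different figures that appear in the comments.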
1 year ago
Palonek: Why 7 bits? That's 2^7 = 128. You may be thinking of ASCII, which is 7 bits, to write the literal letters "A G C T." If ASCII is the encoding your biotech or lab uses for massive DNA files, you are using over 3.5 times the data of a 2-bit encoding (so 3.5 times the bandwidth, 3.5 times the storage, and sometimes 3.5 times the processing power). That's bad.
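To make the 2-bit versus ASCII comparison concrete, here is a minimal sketch that packs four bases into each byte. The particular code assignment (A=0, C=1, G=2, T=3) and the function names are assumptions chosen for illustration; they are not taken from any specific file format.

```python
# Hypothetical mapping: any fixed assignment of the four bases to 0..3 works.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a string of A/C/G/T into bytes, 4 bases per byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data, n_bases):
    """Reverse pack(); n_bases is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


seq = "GATTACA"
packed = pack(seq)
print(len(seq), "letters ->", len(packed), "bytes")
print(unpack(packed, len(seq)) == seq)
```

At 7 bits per letter the ratio to this encoding is 7/2 = 3.5. Text files normally spend a full 8-bit byte per ASCII character, in which case the ratio is 4.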
1 year ago
Drew's assumption is what I would use for storing this kind of data since, of course, there is a lot of it.
1 year ago
Using the ".2bit" format, human genome version "hg18" fits into a file listed here as 770 MB.
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/...
The format is described here:
http://genome.ucsc.edu/FAQ/FAQformat#format7
That page also describes the earlier, and still sometimes useful, "nibble" format, which stores 2 bases per byte.
In biology, the sequence of ACGT isn't all that contains inherited information. (There are also all the proteins you inherit along with DNA, and DNA methylation, and lots more stuff still to discover.) But I wouldn't know where to start to compute the information content there.
1 year ago
Also, FASTA and other file formats are used primarily for transport, not storage. Almost no bioinformatics program operates directly on ASCII data; most transform such exchange formats into some internal representation.
As for the 10 MB, I guess the author is thinking in terms of working on a diff with respect to some reference genome. While that is probably workable for applications on the human genome, it's not really patentable (UNIX patch and diff being older than me, and there's probably even older prior art) and impracticable on a general scale. Impracticable because an index for describing any sequence in such a relative way would be far too big, i.e. it would probably require more storage than simply transferring the sequences themselves.
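The diff idea can be sketched in a few lines. This is a toy version under strong assumptions: both sequences have the same length and differ only by single-base substitutions, whereas real genomes also differ by insertions and deletions. The function names are hypothetical.

```python
def diff(reference, sample):
    """List (position, base) for every position where sample differs from reference."""
    if len(reference) != len(sample):
        raise ValueError("this toy diff only handles equal-length sequences")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def patch(reference, edits):
    """Rebuild the sample from the reference plus the recorded substitutions."""
    seq = list(reference)
    for pos, base in edits:
        seq[pos] = base
    return "".join(seq)


reference = "ACGTACGTACGT"
close = "ACGTACGAACGT"      # one substitution
unrelated = "TGCATGCATGCA"  # differs at every position

print(len(diff(reference, close)))      # few edits: the diff is tiny
print(len(diff(reference, unrelated)))  # one edit per base: the diff outgrows the sequence
print(patch(reference, diff(reference, close)) == close)
```

Each edit carries a position as well as a base, so once the sequences stop being close to the reference the diff costs more than sending the sequence itself, which is the scaling problem described above.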
1 year ago
1 year ago
11 months ago
10 months ago
Now, how large would the 3D positional information alone be that is needed to describe a human?
You would need to position single cells, define the inner structure of particular cell types, describe the form of single nerve cells (dendrites), etc.
Now how many cells are there in the human organism?
Without any calculation, we can see that the quantity of information needed to describe a human runs to uncounted terabytes. Human chromosomes contain, as calculated here, 740 MB.
So why, for God's sake, do we believe that the whole of our hereditary information resides in the genes?
4 weeks ago
Well, maybe that's a bit hard to follow, so let's try this instead: how many hairs are on your arm? I don't really care about the particular number; what I want to know is whether you had the same number of hairs on your arm when you were a child, and I mean the three-foot-tall variety.
No, no you didn't. You had many fewer, BUT they were about the same distance apart. Now, I'm sure you know that your arms don't just grow at the ends; there's a lot of growth in the middle and it's more or less continuous... but how could you add new, evenly spaced hairs in that?
Well, it's simple. Much like our DNA, you just need two values to keep track of it (though they're not really bits; it's not THAT simple). You need a protein that causes hairs to grow and a protein that prevents them from growing. Like a lot of things in our body, the protein that prevents hair from growing just stops cells from making the protein that promotes hair formation, while the promoting protein promotes both the preventer and itself. There's another trick, though: the preventer moves between cells much more easily than the promoter.
No need to do mental gymnastics here, I'll just state the end result: cells with a high concentration of the promoter make enough of it to overcome the effects of the preventer, while cells with a low concentration just build up the preventer... up to a point. If no hairs are close enough to prevent another from growing, the cells there don't have enough of the preventer, so the promoter takes over and gives you another hair.
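The end result can be imitated with a deliberately crude toy. It is not a simulation of the two proteins: it replaces them with one assumed number, `reach`, the distance within which an existing hair stops a new one from forming. All names and values here are made up for illustration.

```python
def fill_gaps(hairs, reach):
    """Add hairs wherever neighbours are more than 2 * reach apart.

    A gap wider than 2 * reach contains a spot that no existing hair inhibits,
    so the gap is halved repeatedly until every piece is at most 2 * reach.
    """
    result = [hairs[0]]
    for nxt in hairs[1:]:
        start = result[-1]
        gap = nxt - start
        pieces = 1
        while gap / pieces > 2 * reach:
            pieces *= 2
        for k in range(1, pieces + 1):
            result.append(start + gap * k / pieces)
    return result


def grow(hairs, factor, reach):
    """Stretch the arm uniformly by `factor`, then let new hairs fill the gaps."""
    return fill_gaps([x * factor for x in hairs], reach)


child = [0.0, 1.5, 3.0, 4.5]          # hair positions along a short arm
adult = grow(child, factor=2.0, reach=1.0)
print(len(child), "->", len(adult), "hairs")
print([b - a for a, b in zip(adult, adult[1:])])  # spacing after growth
```

Only one local rule and one number are stored, yet the hair count follows the size of the arm while the spacing stays put. No list of hair positions needs to be inherited.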
A similar setup is also used to make sure you don't grow two heads. In fact, this kind of thing is used so often that we can safely say the information used to build your body is many, many times smaller than the information it would take to record the current state of your body.
If you're much of a programmer, you know how just a few lines of code (the file might end up being a few KB if you didn't bother to make it really small) could produce an image many gigabytes in size, if you had some reason to let it make a large enough image.
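As a minimal sketch of that point, here is a tiny procedural image generator. The XOR pattern is just one arbitrary choice of formula, and the function names are hypothetical; one byte per grayscale pixel is assumed.

```python
def pixel(x, y):
    """Grayscale value (0..255) of one pixel, computed from its coordinates alone."""
    return (x ^ y) & 0xFF


def render(width, height):
    """Raw grayscale image, row by row, one byte per pixel."""
    return bytes(pixel(x, y) for y in range(height) for x in range(width))


def output_bytes(width, height):
    """Size of the rendered image without actually rendering it."""
    return width * height


small = render(64, 32)
print(len(small))                        # 2048
print(output_bytes(100_000, 100_000))    # 10000000000, i.e. 10 GB from the same few lines
```

The source stays well under a kilobyte no matter how large an image it is asked for; only the width and height change.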
Don't get me wrong though. There is more to us than our DNA.
Our DNA basically lays out the boundaries of what we can possibly grow to be and the environment we grow in narrows it down until we reach that single possibility that is ultimate "you."
10 months ago
We believe that most of our hereditary information resides in genes because it does. However, a genome, as you say, cannot possibly fully describe a mature human. A genome is more like a brief mathematical equation used to produce a beautifully complex fractal design when fed with ambient noise and interpreted as colors and coordinates on a screen.
10 months ago
Now, let's take a look at this possible analogy.
Imagine you are demonstrating a PC to someone who has no idea about computers whatsoever and has never seen one. (Increasingly difficult to find, but there must still be some around :)
OK, you show him how inputs on the keyboard produce results on the screen. Knowing nothing about the PC under the desk, our computer novice has to think that the keyboard alone causes all the fascinating happenings on the screen.
Now our virtuous genetics has got hold of the keyboard, that is, the genes; making changes there changes the organism. But how, for God's sake, does it follow that all the hereditary information resides there, and not on some 'HD' somewhere, away from the 'keyboard'?
I am simply pointing out that the 'keyboard' has practically no data storage capacity for the task.
'We believe that most of our hereditary information resides in genes because it does.'
Oh, pardon the heresy involved, but I really don't know how you know that.
2 months ago
So I can believe the small numbers quoted.
4 weeks ago
4 weeks ago