What I'm very excited by is
the advent of population genomics:
improvements in binning techniques
coupled with deeper sequencing,
which allow you to pull out
high-quality, near-complete
genomes for uncultured organisms.
And so the binning method that
is starting to get more traction
is called differential coverage binning.
And this is based on the idea
that if you look at a set of
related metagenomes,
for instance a time series or
a spatial series, or even using different
DNA extraction methods on the same sample,
you have the same populations, but
they're present in different
relative abundances.
And you can use that pattern of
relative abundance as a signature.
So you do your assembly and
you get back anonymous
fragments of genomes.
And if you look at the coverage pattern
for each of those anonymous fragments,
you can bin them together
by virtue of their coverage.
And that method actually
works really well.
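
To make the idea concrete, here is a minimal sketch of binning by coverage pattern. It is not any particular tool's algorithm: the contig names, the coverage values, the greedy seed-based grouping, and the 0.95 correlation cutoff are all assumptions chosen for illustration. Real binners typically combine coverage with sequence composition and use more careful clustering.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / math.sqrt(var_x * var_y)

def bin_contigs(coverage, min_corr=0.95):
    """Greedy grouping: a contig joins the first bin whose seed profile
    correlates with its own at or above min_corr (an assumed cutoff)."""
    bins = []  # each entry is (seed_profile, [contig names])
    for name, cov in coverage.items():
        profile = [math.log(c + 1.0) for c in cov]  # damp large values
        for seed, members in bins:
            if pearson(seed, profile) >= min_corr:
                members.append(name)
                break
        else:
            bins.append((profile, [name]))
    return [members for _, members in bins]

# Hypothetical mean coverage of four contigs across three related samples.
coverage = {
    "contig_1": [10.0, 40.0, 5.0],
    "contig_2": [12.0, 43.0, 6.0],
    "contig_3": [50.0, 8.0, 30.0],
    "contig_4": [47.0, 9.0, 33.0],
}
bins = bin_contigs(coverage)
print(bins)
```

In this toy data, contigs 1 and 2 rise and fall together across the samples, as do contigs 3 and 4, so each pair ends up in its own bin.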
And so what's really exciting
me at the moment is
using that technology on two fronts.
On the evolutionary front, we can now
make genome trees using those genomes.
And now we can actually
get a very high-resolution
map of the microbial tree.
And these trees are more
robust than 16S trees.
And so my goal at the moment
is to replace the 16S-based phylogeny,
and the taxonomy derived from that,
with genome-based phylogeny.
And so, at the moment, we've got
a genome tree database that's
got about 12,000 genomes in it,
of which about two and a half thousand
are these population genomes.
So my prediction is that two or three
years from now, when you go to the public
databases, you'll find that the dominant
form of draft genomes will
be these population genomes.
Because every study of every
habitat produces, you know,
usually on the order of
dozens of these genomes.
And we have been,
not just us but
other researchers as well,
developing tools for
checking the
quality of those genomes.
So we have ways of checking to see how
complete and/or contaminated they are,
and ways of then quickly piping
them into genome trees. So
we've spent some time on that.
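
One common way to estimate completeness and contamination is to count genes that are expected to occur exactly once per genome. The sketch below assumes that approach; the marker names are hypothetical, and the simple percentage formulas are a simplification rather than the method of any specific tool.

```python
from collections import Counter

def assess_bin(marker_hits, marker_set):
    """Estimate completeness and contamination (both in percent) of a
    genome bin from single-copy marker genes.

    marker_hits: marker names found in the bin, one entry per copy found.
    marker_set:  markers expected exactly once in a complete, clean genome.
    """
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    total = len(marker_set)
    # Completeness: share of expected markers seen at least once.
    present = sum(1 for marker in marker_set if counts[marker] >= 1)
    # Contamination: extra copies beyond the first, relative to the set size.
    extra = sum(counts[marker] - 1 for marker in marker_set if counts[marker] > 1)
    return 100.0 * present / total, 100.0 * extra / total

# Hypothetical marker set and hits for one population genome.
marker_set = {"m1", "m2", "m3", "m4", "m5"}
marker_hits = ["m1", "m2", "m2", "m3", "m4"]  # m2 found twice, m5 missing
completeness, contamination = assess_bin(marker_hits, marker_set)
print(completeness, contamination)
```

Here four of the five markers are found, and one of them appears twice, which suggests a mostly complete bin with some sequence from a second organism mixed in.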
And so I'm very excited by that,
because, you know,
I have an obsessive compulsive disorder
when it comes to classifying life forms.
And so this very much meets that
requirement of my personality to do that,
you know, in a robust way,
so I don't feel like I'm
going around in circles.
But the other application for
being able to pull out
high-quality population genomes
from environmental samples
is that now you can do your ecological
analysis much more robustly.
So, you know, when we first
started metagenomics,
it became apparent that for
complex communities
we were kind of stuck in not being able
to pull out the component populations.
So we'd do things like gene-centric
analysis, where we look at the
relative abundance of gene families
rather than do it from an organismal
context, which has been fine.
But the problem is that
you don't know who's doing what function.
So you're getting a sort of global
overview of community function.
So with the population genomes,
in many cases,
you can pull out the major players from
a given ecosystem, and now you can see
which organisms are performing
which functions, and
you can work out the trophic
interaction networks.
So that's very exciting for
ecology, because that provides a really
solid foundation for
understanding our ecosystems.
All right, so Greengenes
was started in the early to mid 2000s.
And the main developer is Todd DeSantis;
he was the original developer.
And he knew that I was curating
16S sequences in order to
get taxonomy based on phylogeny.
Which is obviously the way
we should all be doing it,
because phylogeny is a natural grouping
of organisms, and so we want to base
classification, which is a human
construction, on that natural grouping.
So that's the goal.
And he developed the Greengenes
database as a vehicle for
being able to pull in
the public sequences.
And then annotate them
with all the metadata.
And then I have been the main curator
of the database since its inception.
And my job is to go through,
and this is a crazy job.
And only a crazy person would do it.
There's a couple of crazy people on
the planet that do this kind of thing.
Where you go through and
you look at the structure of the tree.
And ideally you have some idea
of how robust the tree is, and you
reconcile that phylogeny with
what's the currently accepted
nomenclature for taxonomy.
And so there are good resources for,
you know, nomenclature.
There's a committee which
decides on the names of organisms,
and then the higher ranks.
And what you find when you do that
is that there are numerous
instances where the taxonomy
doesn't match the phylogeny.
So then it's a process of
trying to reconcile that.
And then another major
issue is that because so
much of the diversity is not
represented by cultured organisms,
there are big swaths of the tree,
of the phylogenetic tree,
that have no classification at all.
So another part of Greengenes
is to give some form of classification to
these uncharacterized parts of the tree.
The main programmer is Daniel McDonald,
through the very generous hosting of
Rob Knight.
And so he's been so
supportive of Greengenes.
And others are still involved as well.
My take on the situation is that
with whole-genome-based phylogeny,
we're at about where 16S
was in about 2001.
So we're not even that far off the pace.
I think, another 10 years from now,
the genome-tree-based phylogeny and
taxonomy will be
at about the level of number of
sequences that we have with 16S.
So I predict we're going to go from
somewhere on the order of ten to
20,000 sequences now
to about half a million genomes
ten years from now.
And then we should have
a very nice comprehensive coverage
of the tree of life, in a taxonomy that's
not compromised by chimeric artifacts.
Hallelujah!
And then, I don't know, you know,
part of the fun is the journey.
I hope that this journey
will never be over,
of course, because there's always
going to be more diversity to discover.
But that will be a far more
solid basis for the taxonomy.
So, during curation of the Greengenes
database, I noticed,
and not only me, other people noticed,
that there was quite a large cluster
of environmental sequences that were
grouping with the cyanobacteria.
And these sequences were coming from
habitats that weren't exposed to sunlight.
So the nagging question in the back of
your mind would be: are these
non-photosynthetic cyanobacteria?
The dogma in microbiology for
decades has been that all
cyanobacteria are photosynthetic.
So it was a very
attractive target.
So we started to make primers and
probes that would
target these basal
cyanobacteria, which we nicknamed
Darcy, short for dark cyanobacteria.
So Darcy was the nickname.
And other groups,
Ruth Ley's group, were also interested, and
they were looking for them in parallel.
And we ended up using these population
genomic methods to recover the genomes.
And as it turned out,
they fell right out.
So we got
good-quality Darcy genomes from a range
of habitats, including the koala gut,
a bioreactor, and a full-scale
industrial granular sludge.
And we looked in those genomes and,
sure enough,
they had no photosynthetic apparatus.
And if you do the genome tree, there
was a very robust clustering of
that group with the photosynthetic
cyanobacteria.
So, with taxonomy, we wanted to
call them cyanobacteria, and
we met a huge amount of resistance.
So anytime that you challenge a dogma,
you're going to meet resistance.
And so
what they ended up doing was classifying
that group as a sister phylum
to the cyanobacteria,
called the Melainabacteria, which is
from a Greek nymph, a dark nymph.
And then that was a lot
less controversial.
Because now you're still
managing to maintain the dogma that all
cyanobacteria are photosynthetic.
Now, in some ways it's
a semantic argument, right?
Because taxonomy is human-made.
But the point is that this group is
reproducibly affiliated
with the cyanobacteria, so they share
a last common ancestor from
before the introduction of
the photosynthetic apparatus,
which must have occurred
after the two groups diverged.
So we should be able to learn
something about the ancestry of
photosynthesis by studying
this sister group.
Now, the power of a name, though.
This is the interesting thing.
The original paper, calling
them a sister phylum, the
Melainabacteria, sort of went
with not much fanfare.
It was, you know,
a cool study.
But it was another candidate
phylum for which we now have genomes.
And that's becoming more regular
now with the new techniques.
But because it didn't make
any controversial claims
that they were cyanobacteria,
it didn't get much attention.
We got a fair bit of attention for
our paper for calling them cyanobacteria.
So you can see the power of the name,
and that's something to be remembered.
Because people say, well,
taxonomy is such a dry discipline, but
really there's power in names.