Assessment of microbial diversity in human colonic samples by 16S rDNA sequence analysis

Georgina L. Hold, Susan E. Pryde, Valerie J. Russell, Elizabeth Furrie, Harry J. Flint
DOI: http://dx.doi.org/10.1111/j.1574-6941.2002.tb00904.x, pp. 33-39. First published online: 1 January 2002


The bacterial species diversity of three colonic tissue samples from elderly people was investigated by sequence analysis of randomly cloned eubacterial 16S rDNA. The majority of sequences (87%) clustered within three bacterial groups: (1) Bacteroides; (2) low G+C content Gram-positives related to Clostridium coccoides (cluster XIVa); (3) Gram-positives related to Clostridium leptum (cluster IV). These groups have been shown to dominate the human faecal flora. Only 25% of sequences were closely related (>97%) to current species type strains, and 28% were less than 97% related to any database entry. 19% of sequences were most closely related to recently isolated butyrate-producing bacteria belonging to clusters XIVa and IV, with a further 18% of the sequences most closely related to Ruminococcus obeum and Ruminococcus torques (members of cluster XIVa). These results provide the first molecular information on the microbial diversity present in human colonic samples.

  • Colonic sample
  • 16S rDNA sequencing
  • Microbial diversity

1 Introduction

The human colon harbours a highly diverse microbial ecosystem [1]. Microbial metabolic activity in the gut has important consequences for health through the supply of nutrients, including vitamins and short chain fatty acids, to the host tissues (reviewed in [2–4]). Additionally, the commensal microflora provides a natural defence mechanism against invading pathogens [5,6] and interacts at several levels with the intestinal epithelium and immune system [7–9]. However, our understanding of this complex bacterial community and its interactions with the host is still far from complete.

Molecular techniques provide the most powerful tools available for revealing phylogenetic diversity of microorganisms within complex ecosystems independent of cultural bias [10,14]. Recent investigations of 16S rDNA sequences have produced important insights into the diversity of the human faecal flora, showing that Bacteroides species and bacteria belonging to the Clostridium coccoides/cluster XIVa and Clostridium leptum/cluster IV groups are major constituents of the human faecal microflora [15–18]. This work has also shown that as many as 76% of randomly cloned sequences share less than 97% identity with known cultured bacterial strains [16,17,19].

Almost all of the current information concerning microbial diversity in the human intestinal tract, however, is based on faecal samples. One previous study identified the predominant culturable bacteria associated in vivo with human colonic wall tissue as Bacteroides and Fusobacterium spp. [20]. However, it was noted that diversity revealed through culturing was less than that observed through microscopic analysis.

The work reported here is the first attempt to identify the main types of bacteria associated with human colon tissue by direct retrieval and analysis of SSU (small subunit) rDNA sequences.

2 Materials and methods

2.1 DNA extraction and PCR amplification of colonic tissue samples

Three specimens of normal colonic tissue were analysed, one from a sudden death victim (male aged 70) and two from live subjects who had undergone colon resections (male aged 79 – sigmoid colon, female aged 82 – right colon). Samples were received 10 min after surgical removal and were snap frozen in liquid nitrogen prior to placing at −80°C. A small (1 g) section of each tissue was thawed on ice and thoroughly washed in sterile distilled water prior to analysis to remove luminal contents; it was therefore assumed that all bacteria remaining were associated either with the colonic tissue or with the mucus. For DNA extractions, colonic samples were resuspended in 1 ml of sterile distilled water with the addition of sterile zirconium beads (0.1 mm diameter, Sigma, Dorset, UK). Samples were then beaten for 30 s using a minibead beater (Biospec Corp., Stratech Scientific, UK). DNA suitable for PCR amplification was extracted by a modification of the method of Stahl et al. [21] (as described previously by Pryde et al. [22]). 16S rDNA was amplified using the universal eubacterial primers fD1 5′-AGAGTTTGATCCTGGCTCAG-3′ (Escherichia coli positions 8–27) and rP2 5′-ACGGCTACCTTGTTACGACTT-3′ (E. coli positions 1494–1513) [23], yielding a product of approximately 1500 bp. PCR amplifications were performed using the following conditions: initial denaturation of template DNA at 94°C for 5 min; then 20 cycles consisting of denaturation (2 min at 94°C), annealing (30 s at 57°C) and extension (2 min at 72°C); and a final extension at 72°C for 10 min. PCR amplification was limited to 20 cycles in order to minimise possible bias. PCR amplicons were purified using the Wizard PCR product purification kit (Promega, Southampton, UK) and then cloned into a pGEM-T vector plasmid (Promega, Southampton, UK). Insert DNA was sequenced in both directions using universal 16S rRNA primers [24] on a 377 automated DNA sequencer.
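As a quick sanity check on the primers and cycling programme described above, the following Python sketch computes primer length, GC content and reverse complement, and the total programmed hold time. The primer sequences and times are taken from the text; the melting-temperature estimate uses the simple 2(A+T) + 4(G+C) rule of thumb, which is an assumption added here for illustration and not a method stated by the authors. Ramp times between temperatures are ignored.

```python
# Primer sequences (5'->3') as given in the text.
PRIMERS = {
    "fD1": "AGAGTTTGATCCTGGCTCAG",
    "rP2": "ACGGCTACCTTGTTACGACTT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def rule_of_thumb_tm(seq):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C).

    Illustrative assumption only; the paper does not state how Tm was estimated.
    """
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


def programmed_minutes(cycles=20):
    """Total hold time of the PCR programme in the text, ignoring ramping."""
    initial_denaturation = 5.0
    per_cycle = 2.0 + 0.5 + 2.0  # denaturation + annealing + extension
    final_extension = 10.0
    return initial_denaturation + cycles * per_cycle + final_extension


for name, seq in PRIMERS.items():
    print(name, len(seq), f"GC={gc_fraction(seq):.2f}",
          f"Tm~{rule_of_thumb_tm(seq)}C", reverse_complement(seq))
print("programmed time (min):", programmed_minutes())
```

The reverse complement of rP2 is the sequence expected on the sense strand near the 3′ end of the 16S rRNA gene, which is what one would search for when trimming primer sequence from assembled clone inserts.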

2.2 Phylogenetic analysis

Sequences were analysed by Blast [25] against 16S rDNA sequences from GenBank [26] and the ribosomal database project [27]. Phylogenetic trees were generated by the neighbour-joining method [28], via the PHYLIP package [29] using DNADIST for distance analysis [30]. Bootstrap resampling (data resampled 100 times) used the SEQBOOT program and consensus trees were generated by the CONSENSE program [31].
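The distance and bootstrap steps above can be illustrated with a minimal Python sketch. It is not a reimplementation of PHYLIP: DNADIST offers several substitution models and the text does not say which was used, so the Jukes-Cantor correction here is an assumption chosen for simplicity. The toy alignment and clone names are hypothetical. The bootstrap function mimics what SEQBOOT does conceptually, resampling alignment columns with replacement.

```python
import math
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance in substitutions per site (assumed model)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping rows in register."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment; real 16S rDNA alignments are ~1500 columns long.
toy = {
    "cloneA": "ACGTACGTAC",
    "cloneB": "ACGTACGTAT",
    "cloneC": "ACGAACGTTT",
}

rng = random.Random(0)
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]  # 100 resamplings, as in the text

p = p_distance(toy["cloneA"], toy["cloneB"])
print(f"p-distance = {p:.3f}, Jukes-Cantor distance = {jukes_cantor(p):.4f}")
```

In a full analysis, a distance matrix would be computed for each replicate, a neighbour-joining tree built from each matrix, and the proportion of replicate trees containing a given branch reported as its bootstrap value.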

2.3 Nucleotide sequence accession numbers

The 16S rDNA sequences were deposited in the EMBL data library under accession numbers AJ408957–AJ409009 and AJ315481–AJ315487.
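For readers who want to retrieve the deposited sequences, the two accession ranges can be expanded into individual accession numbers. The Python sketch below assumes that each range is contiguous and inclusive of its first and last accession, which is the usual convention but is not stated explicitly in the text; the helper name `expand_range` is hypothetical.

```python
import re


def expand_range(first, last):
    """Expand an inclusive accession range such as AJ315481..AJ315487."""
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, stop = int(m1.group(2)), int(m2.group(2))
    if stop < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


# First and last accession of each range, as given in the text.
RANGES = [("AJ408957", "AJ409009"), ("AJ315481", "AJ315487")]
accessions = [acc for first, last in RANGES for acc in expand_range(first, last)]
print(len(accessions), accessions[0], accessions[-1])
```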

3 Results

3.1 16S rDNA sequence diversity of human colonic tissue samples

A total of 110 clones from the three colonic tissue samples were analysed, 34 from colonic sample 1 (HuCA), 39 clones from colonic sample 2 (HuCB) and 37 clones from colonic sample 3 (HuCC). All clones from colonic samples 1 and 2 were subjected to full-length 16S rDNA sequencing with no chimeric clones found using the program CHECK-CHIMERA [32]. However, clones from colonic sample 3 were initially partially sequenced in order to infer phylogenetic affiliation, with full-length sequencing performed on clones which showed closest 16S rDNA sequence similarity to butyrate-producing bacteria [33] or <97% 16S rDNA sequence relatedness to database entries. Therefore, full-length 16S rDNA sequences were obtained for eight clones from colonic sample 3, namely HuCC2, 13, 15, 28, 30, 33, 34, and 43 (Table 1).
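The rule used to choose which sample 3 clones went on to full-length sequencing can be written as a small predicate. This Python sketch is only an illustration of the stated criteria: the function name and the string test for butyrate producers are hypothetical, and the example identities are the values reported in Table 1, not the partial-sequence values on which the original decision was made.

```python
def needs_full_length(best_hit, identity_pct, threshold=97.0):
    """Return True if a partially sequenced clone meets either selection criterion.

    Criteria from the text: closest match is a butyrate-producing bacterium,
    or sequence relatedness to database entries is below 97%.
    """
    butyrate_like = "butyrate-producing" in best_hit.lower()
    return butyrate_like or identity_pct < threshold


# Example best hits and identities taken from Table 1 (illustration only).
examples = [
    ("HuCC15", "butyrate-producing bacterium L2-7", 98),
    ("HuCC28", "unidentified rumen bacterium RFN91", 89),
    ("HuCC9", "Ruminococcus torques", 100),
]
for clone, hit, identity in examples:
    print(clone, needs_full_length(hit, identity))
```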

Table 1. Percentage sequence identity between random 16S rDNA clones generated from colonic tissue samples and sequences present in GenBank

Clone ID | Most closely related type strain sequence (GenBank accession no.) | Sequence identity (%) | Most closely related database sequence (GenBank accession no.) | Sequence identity (%) | No. of identical clones
Colon sample 1
Gram-positive (cluster XIVa) (footnote a)
HuCA2 | X85101 Ruminococcus obeum | 94 | AF132243 uncultured bacterium adhufec171 | 99 | 1
HuCA5 | X85101 Ruminococcus obeum | 95 | (footnote b) | | 1
HuCA8 | AF202259 Eubacterium oxidoreducens | 95 | BBA270474 butyrate-producing bacterium L1-83 | 99 | 1
HuCA13 | AF202259 Eubacterium oxidoreducens | 94 | AB034123 uncultured rumen bacterium 4C28d-8 | 99 | 1
HuCA15 | L34621 Eubacterium hallii | 97 | BBA270490 butyrate-producing bacterium L2-7 | 99 | 1
HuCA17 | Y18184 Clostridium indolis | 94 | AF052421 uncultured bacterium AZ54 | 97 | 1
HuCA19 | X94966 Ruminococcus productus | 92 | AF132267 uncultured bacterium adhufec382 | 95 | 1
HuCA20 | Y18184 Clostridium indolis | 94 | AF132254 uncultured bacterium adhufec25 | 97 | 1
HuCA22 | X85101 Ruminococcus obeum | 95 | | | 1
HuCA23 | AF202259 Eubacterium oxidoreducens | 94 | AF132248 uncultured bacterium adhufec225 | 97 | 1
HuCA26 | X85101 Ruminococcus obeum | 94 | | | 1
HuCA27 | M59089 Clostridium clostridiiformes | 93 | AF052421 uncultured bacterium AZ54 | 95 | 1
HuCA28 | AF202259 Eubacterium oxidoreducens | 94 | AB034123 uncultured rumen bacterium 4C28d-8 | 99 | 2
HuCA29 | AJ011522 Eubacterium ramulus | 99 | AF132260 uncultured bacterium adhufec310 | 99 | 1
HuCA40 | L34420 Eubacterium eligens | 98 | | | 1
Low G+C Gram-positive (cluster IV)
HuCA1 | L76596 Ruminococcus callidus | 94 | | | 1
HuCA10 | X85022 Fusobacterium prausnitzii | 98 | UBU270469 butyrate-producing bacterium A2-165 | 99 | 1
HuCA11 | X85022 Fusobacterium prausnitzii | 97 | UBU270469 butyrate-producing bacterium A2-165 | 97 | 1
HuCA24 | X85099 Ruminococcus bromii | 89 | AF018544 unidentified rumen bacterium | 92 | 1
HuCA25 | X85022 Fusobacterium prausnitzii | 97 | UBU270469 butyrate-producing bacterium A2-165 | 97 | 1
Bacteroides CFB
HuCA7 | L16497 Bacteroides putredinis | 97 | AF132279 uncultured bacterium adhufec73 | 99 | 1
HuCA9 | M58762 Bacteroides vulgatus | 96 | AF132256 uncultured bacterium adhufec27 | 97 | 1
HuCA21 | L16489 Bacteroides thetaiotaomicron | 94 | BS16SRNAR Bacteroides species | 99 | 1
HuCA33 | M11656 Bacteroides fragilis | 99 | | | 1
HuCA34 | L16484 Bacteroides ovatus | 95 | AF139525 Bacteroides species | 96 | 2
HuCA36 | M58762 Bacteroides vulgatus | 96 | | | 2
Low G+C Gram-positive (cluster XVIII)
HuCA3 | Y10164 Dehalobacter restrictus | 93 | AB034023 uncultured rumen bacterium 4C0d-10 | 94 | 1
HuCA6 | Y10164 Dehalobacter restrictus | 93 | AB034023 uncultured rumen bacterium 4C0d-10 | 94 | 1
HuCA4 | AF244133 Burkholderia cepacia | 90 | AF232922 uncultured bacterium MS8 | 91 | 1
HuCA37 | AE000474 Escherichia coli | 99 | | | 1
HuCA18 | X90515 Verrucomicrobium spinosum | 92 | UBA400275 uncultured bacterium L10-6 | 99 | 1
Colon sample 2
Low G+C Gram-positive (cluster XIVa)
HuCB1 | AF202259 Eubacterium oxidoreducens | 94 | BBA270479 butyrate-producing bacterium L1-952 | 99 | 1
HuCB10 | L76604 Ruminococcus torques | 99 | | | 2
HuCB12 | X85101 Ruminococcus obeum | 96 | BBA270483 butyrate-producing bacterium T2-132 | 98 | 4
HuCB14 | L34627 Eubacterium rectale | 97 | BBA270475 butyrate-producing bacterium A1-86 | 99 | 4
HuCB21 | L34619 Eubacterium formicigenerans | 97 | | | 1
HuCB25 | X85101 Ruminococcus obeum | 94 | RSPBIE16 Ruminococcus species | 94 | 1
HuCB26 | L34621 Eubacterium hallii | 97 | BBA270490 butyrate-producing bacterium L2-7 | 97 | 3
HuCB37 | X94967 Ruminococcus gnavus | 93 | BBA270473 butyrate-producing bacterium A2-194 | 99 | 1
HuCB40 | X71855 Clostridium xylanolyticum | 94 | AF132269 uncultured bacterium adhufec406 | 99 | 1
HuCB56 | L34628 Eubacterium xylanophilum | 94 | | | 1
Gram-positive (cluster IV)
HuCB2 | X85099 Ruminococcus bromii | 96 | | | 3
HuCB5 | AF030446 Ruminococcus flavefaciens | 91 | AF052408 uncultured bacterium AZ03 | 97 | 1
HuCB7 | AF167711 Papillibacter cinnaminovorans | 92 | AB034125 uncultured rumen bacterium 4C28d-4 | 95 | 1
HuCB24 | Y18187 Clostridium orbiscindens | 95 | AF157051 bacterium ASF500 | 95 | 1
HuCB29 | X85022 Fusobacterium prausnitzii | 98 | UBU270469 butyrate-producing bacterium A2-165 | 98 | 1
Bacteroides CFB
HuCB3 | M58762 Bacteroides vulgatus | 97 | | | 1
HuCB6 | M58762 Bacteroides vulgatus | 98 | AF132256 uncultured bacterium adhufec27 | 99 | 5
HuCB23 | L16497 Bacteroides putredinis | 94 | AF132279 uncultured bacterium adhufec73 | 96 | 2
Low G+C Gram-positive (cluster I)
HuCB15 | Y18176 Clostridium disporicum | 97 | | | 2
Low G+C Gram-positive (cluster IX)
HuCB85 | AF283705 Megasphaera elsdenii | 96 | | | 1
Low G+C Gram-positive (cluster XI)
HuCB31 | X76750 Clostridium glycolicum | 98 | UBA404682 unidentified bacterium ZF5 | 98 | 1
HuCB27 | AE000474 Escherichia coli | 99 | | | 1
Colon sample 3
Low G+C Gram-positive (cluster XIVa) (footnote a)
HuCC4 | X85101 Ruminococcus obeum | 96 | AF153854 uncultured bacterium adhufec30.25 | 97 | 1
HuCC8 | L34619 Eubacterium formicigenerans | 97 | | | 2
HuCC9 | L76604 Ruminococcus torques | 100 | | | 1
HuCC15 | L34621 Eubacterium hallii | 95 | BBA270490 butyrate-producing bacterium L2-7 | 98 | 1
HuCC18 | L76604 Ruminococcus torques | 97 | AF376447 uncultured bacterium ckncm214-F4M.1G5 | 98 | 1
HuCC19 | X85101 Ruminococcus obeum | 94 | AF153854 uncultured bacterium adhufec30.25 | 97 | 2
HuCC21 | X85101 Ruminococcus obeum | 94 | AF253375 uncultured bacterium L127dB | 98 | 1
HuCC22 | M59089 Clostridium clostridiiformes | 99 | | | 1
HuCC23 | M59112 Clostridium symbiosum | 98 | | | 1
HuCC27 | L76604 Ruminococcus torques | 99 | | | 3
HuCC34 | M59089 Clostridium clostridiiformes | 95 | CS16SDR6A Clostridium sp. strain DR6A | 96 | 1
HuCC43 | X94967 Ruminococcus gnavus | 94 | BBA270484 butyrate-producing bacterium A2-231 | 94 | 1
Gram-positive (cluster IV)
HuCC10 | L34618 Eubacterium desmolans | 96 | AF132258 uncultured bacterium adhufec296 | 99 | 1
HuCC26 | Y18187 Clostridium orbiscindens | 99 | | | 2
HuCC32 | X85022 Fusobacterium prausnitzii | 93 | AF132237 uncultured bacterium adhufec13 | 99 | 1
Bacteroides CFB
HuCC1 | AB050110 Bacteroides uniformis | 99 | | | 2
HuCC2 | M58762 Bacteroides vulgatus | 93 | | | 1
HuCC3 | M86695 Bacteroides distasonis | 97 | AF376376 uncultured bacterium ckncm143-F1M.1E3 | 98 | 1
HuCC11 | M58762 Bacteroides vulgatus | 98 | | | 2
HuCC12 | X83951 Bacteroides caccae | 98 | AF132273 uncultured bacterium adhufec51 | 99 | 1
HuCC17 | X83951 Bacteroides caccae | 95 | AF132273 uncultured bacterium adhufec51 | 96 | 1
HuCC20 | L16484 Bacteroides ovatus | 90 | AF153865 uncultured bacterium adhufec77.25 | 98 | 1
HuCC28 | AJ005635 Prevotella enoeca | 89 | AB009238 unidentified rumen bacterium RFN91 | 89 | 1
HuCC30 | L16489 Bacteroides thetaiotaomicron | 96 | AF139525 Bacteroides sp. AR29 | 96 | 1
HuCC35 | X83951 Bacteroides caccae | 97 | AF132273 uncultured bacterium adhufec51 | 98 | 2
HuCC33 | AF244133 Burkholderia cepacia | 89 | AF236011 β-proteobacterium A0823 | 90 | 1
HuCC13 | X90515 Verrucomicrobium spinosum | 92 | UBA400275 uncultured bacterium L10-6 | 99 | 1
HuCC16 | AF217461 Candidatus Xiphinematobacter rivesi | 88 | UBA400275 uncultured bacterium L10-6 | 99 | 2
  • a: Roman numerals indicate phylogenetic cluster of Clostridiaceae as defined in [38].

  • b: Indicates type strain 16S rDNA sequence was the closest known relative.

Fifty-one of the 110 16S rRNA gene sequences (46%) fell within the C. coccoides group (Gram-positive cluster XIVa) (Table 1; Fig. 1). When compared between colonic samples, cluster XIVa sequences accounted for between 43 and 49% of the total sequence diversity (Table 2). Only 16 of the 51 sequences within this cluster showed closest resemblance to current species type strains, although in four cases sequence identity was below 97%. On the other hand, 16 sequences showed their closest relationships (97–99%) to butyrate-producing isolates from human faeces such as A2-194, A1-86 and L1-82 (loosely related to Eubacterium ramulus, Eubacterium rectale and Roseburia cecicola) and L2-7 (related to Eubacterium hallii) [33] (Fig. 1). The remaining 19 sequences most closely resembled faecal bacterial 16S rDNA clones (prefixed ‘hufec’ or ‘AZ’ respectively) reported by Suau et al. [16] or by Zoetendal et al. [18], or in a few cases cloned sequences of ruminal origin, rather than cultured bacteria.


Fig. 1. Phylogenetic tree showing the relationships of 16S rDNA sequences (random clones) isolated from three human colonic tissues (prefixed HuCA, HuCB and HuCC) defined as low G+C Gram-positive bacteria and located within clusters IV, XI, XIVa and XVIII (as defined by Collins et al. [38]). Figures in brackets represent the number of clones with identical sequence data. The scale bar represents genetic distance (10 substitutions per 100 nucleotides). The tree was constructed using the neighbour-joining analysis of a distance matrix obtained from a multiple-sequence alignment. Bootstrap values (expressed as percentages of 100 replications) are shown at branch points: values of 97% or more were considered significant. Sequences derived from the database are shown in italics (e.g. E. rectale). E. coli was used as the outgroup sequence.

Table 2. Comparison of the percentage of clones attributed to the various phylogenetic affiliations within the three colonic tissue samples

Phylogenetic affiliation | Colonic tissue sample 1 | Colonic tissue sample 2 | Colonic tissue sample 3
Other Clostridium clusters | 5.8 | 10.4 | 0
Other sequences | 9 | 2.5 | 10.8
  • a: Roman numerals indicate phylogenetic cluster of Clostridium as defined in [38].

14.5% (16/110) of the sequences were located within the C. leptum group (cluster IV; Fig. 1) although when compared as individual clone libraries, this figure ranged between 11 and 18% (Table 2). Six sequences were related to type strains of Ruminococcus, while four showed their strongest sequence similarity to the butyrate-producing bacterial isolate A2-165 [33] which is related to Fusobacterium prausnitzii (Fig. 1; Table 1). Of the remaining six sequences, five branched deeply with sequences from random cloning studies and one (HuCC26) showed strong 16S rDNA sequence similarity (>99%) to Clostridium orbiscindens (Table 1).

26% (29/110) of the sequences clustered within the Bacteroides group (Table 1), with values ranging between 20 and 35% within the individual sample sets (Table 2). The majority of sequences within this group were closely related to known type strains, including Bacteroides vulgatus and Bacteroides uniformis; however, one clone from colonic sample 3 (HuCC28) showed closest 16S rDNA sequence similarity (although <90%) to Prevotella enoeca. Many of the clones within the Cytophaga/Flavobacter/Bacteroides (CFB) phylum also showed strong sequence similarity with cloned ‘hufec’ sequences (Table 1).

The remaining 16S rRNA sequences comprised members of other groups of low G+C bacteria (clusters I, IX, XI and XVIII – six sequences), four Verrucomicrobiales-related sequences, two β-proteobacteria (<90% sequence similarity), and two γ-proteobacteria (related to E. coli) (Table 1; Table 2).
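The group percentages quoted in this section can be recomputed from the clone counts. In the Python sketch below, the counts per group and sample were obtained by summing the "No. of identical clones" column of Table 1; assigning the rows without a cluster heading to "Other sequences" is a reading of the table made here, not a grouping stated by the authors.

```python
# Clone counts per group for samples 1-3 (HuCA, HuCB, HuCC), summed from Table 1.
COUNTS = {
    "Cluster XIVa": (16, 19, 16),
    "Cluster IV": (5, 7, 4),
    "Bacteroides (CFB)": (8, 8, 13),
    "Other Clostridium clusters": (2, 4, 0),  # clusters I, IX, XI, XVIII
    "Other sequences": (3, 1, 4),  # proteobacteria and Verrucomicrobiales-related
}
MAJOR_GROUPS = ("Cluster XIVa", "Cluster IV", "Bacteroides (CFB)")

# Column sums give the number of clones analysed per sample.
sample_totals = [sum(column) for column in zip(*COUNTS.values())]
grand_total = sum(sample_totals)


def pct(n, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * n / total, 1)


overall = {group: pct(sum(counts), grand_total) for group, counts in COUNTS.items()}
per_sample = {
    group: tuple(pct(n, t) for n, t in zip(counts, sample_totals))
    for group, counts in COUNTS.items()
}
major_share = pct(sum(sum(COUNTS[g]) for g in MAJOR_GROUPS), grand_total)

print("clones per sample:", sample_totals, "total:", grand_total)
for group in COUNTS:
    print(f"{group}: overall {overall[group]}%, per sample {per_sample[group]}")
print(f"three major groups combined: {major_share}%")
```

The column sums reproduce the 34, 39 and 37 clones analysed per sample, and the overall shares correspond to the 46%, 14.5% and 26% reported in the text, with the three major groups together accounting for about 87% of sequences. The per-sample values for the two "other" categories agree with Table 2 to within rounding.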

4 Discussion

The complex microbial ecosystem of the human colon plays a key role in human nutrition and health, and a clearer understanding of the physiology, abundance and location of the dominant colonic bacteria is essential. Overall between 85 and 89% (depending on the colonic sample) of the 16S rDNA sequences obtained here fell within three major phylogenetic groups (Bacteroides, C. coccoides/cluster XIVa and C. leptum/cluster IV) that have been shown to dominate the faecal flora [16]. A qualified comparison can be made between the data obtained here from colonic tissue of three elderly subjects and those of Suau et al. from faecal material from a 40-year-old subject [16]. The percentages of clones present in cluster XIVa are 43–49% (colonic, present study) and 44% (faecal study [16]), for cluster IV 11–18% (colonic) and 20% (faecal), and for the CFB group 20–35% (colonic) and 31% (faecal). Recent work by Sghir et al. [34] has indicated that the CFB bacterial component of the faecal flora can vary from 20 to 52% between individuals. Therefore, our results do not suggest any major discrepancy between the bacterial composition of colonic and faecal samples in terms of the major bacterial groups. Bifidobacterial sequences were not detected here in colonic samples, or previously in faecal samples [16], although they are detectable by fluorescent in situ hybridisation at around 3% of the total bacterial population [15]. We cannot exclude the possibility of PCR bias against this group.

Gut epithelial cells turn over rapidly and there is a continual passage of mucus and sloughed-off cell debris down the digestive tract together with the digesta. Since amplifiable DNA can be retained in dead or quiescent bacterial cells, it is quite likely that most of the diversity present in the large intestinal microflora will be present in freshly voided faeces. On the other hand, it does not follow that the microbial DNA present in faeces will accurately reflect the relative proportions of bacteria present at any given site in the colon.

Our results do not rule out differences in the actual species present within the major groups. For example, more of the sequences from the current study clustered with Ruminococcus obeum and Ruminococcus torques compared with previous data from faecal samples. We cannot say whether this reflects a systematic difference between faecal material and colonic tissue, or differences between the subjects sampled. It is worth noting, however, that R. torques is reported to be a prominent mucin-degrading species [35].

Despite recent efforts to define the microbial diversity of the gastro-intestinal tract through random cloning and sequencing of 16S rRNA genes, the present study still revealed that 28% of sequences recovered were less than 97% related to any database entry. This indicates that our knowledge of bacterial diversity in the colon is still far from exhaustive. Perhaps one of the most significant observations made here is the close relationship (>97% sequence identity) of 21 of the 110 colonic 16S rDNA sequences with butyrate-producing strains isolated recently from human faeces [33]. Previous cultural studies have often reported an apparent deficit of butyrate-producing isolates, which have been suggested to represent as little as 1% of the cultivable flora [36,37]. The present findings support the view that 16S rRNA gene sequences closely related to those identified as butyrate producers by Barcenilla et al. [33] are abundant in the human colonic microflora, but may often be underestimated by cultural approaches, although it is not known whether the sequences detected here represent butyrate-producing bacteria or non-butyrate-producing relatives. This work emphasises the potential importance within the colon of Gram-positive bacteria belonging to clostridial clusters IV and XIVa and the need for more detailed research on the diversity and physiology of cultured strains.


Acknowledgements

We thank Prof. Charles Campbell and Prof. George MacFarlane for kindly providing the colonic tissue samples and Moira Johnston and Pauline Young for automated DNA sequencing. This work was supported by SEERAD (Scottish Executive Environment Rural Affairs Department).

