...

This is our model for querying genomic features such as genomic regions, gene transcripts, SNPs and other genetic variants. The way features are specified may depend on the database. For example, genes have different IDs in the NCBI, HUGO and Ensembl databases.  

Features are specified in BioQ using keywords, such as SNP_ID and GENE_ID. Defaults usually apply: for example, a numeric feature is assumed to be a dbSNP rs ID, and an alphanumeric string is assumed to be a HUGO gene symbol.
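
The default rule described above can be sketched in a few lines of Python. This is only an illustration of the stated defaults, not BioQ's actual implementation: the function name infer_feature_keyword is hypothetical, and the sketch assumes that any non-numeric value should fall through to GENE_ID.

```python
def infer_feature_keyword(value: str) -> str:
    """Guess the feature keyword for a bare value (hypothetical helper).

    Assumption: a purely numeric value is a dbSNP rs ID (SNP_ID) and
    anything else is treated as a HUGO gene symbol (GENE_ID).
    """
    text = value.strip()
    if not text:
        raise ValueError("empty feature value")
    # Restrict to ASCII digits so that other Unicode digit characters
    # are not mistaken for an rs ID.
    if text.isascii() and text.isdigit():
        return "SNP_ID"
    return "GENE_ID"
```

A caller that wants a different interpretation, for example an Ensembl gene ID, would presumably give the keyword explicitly instead of relying on this default.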

...

This explains how to use feature tables once they are created. Specifically, it identifies the columns of a query that can be used to limit the query results by a feature table. For example, in the dbSNP database, the "SNP Summary" query can be limited to the features in the feat_dbsnp_snp feature table, as can any query that returns a dbSNP SNP ID column.
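
Limiting a query by a feature table amounts to a join on the shared ID column. The sketch below uses Python's sqlite3 module with an in-memory database. Only the table name feat_dbsnp_snp comes from the text above; the snp_summary table, every column name, and the rows are toy assumptions made for illustration and do not reflect the real BioQ schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the output of a query such as "SNP Summary".
    CREATE TABLE snp_summary (snp_id INTEGER, chrom TEXT, pos INTEGER);
    -- Feature table named in the text; its single snp_id column is an assumption.
    CREATE TABLE feat_dbsnp_snp (snp_id INTEGER PRIMARY KEY);
    -- Toy rows, not real dbSNP data.
    INSERT INTO snp_summary VALUES (1, '1', 1000), (2, '1', 2000), (3, '2', 500);
    INSERT INTO feat_dbsnp_snp VALUES (1), (3);
""")

# Keep only the summary rows whose SNP ID appears in the feature table.
limited_rows = conn.execute("""
    SELECT s.snp_id, s.chrom, s.pos
    FROM snp_summary AS s
    JOIN feat_dbsnp_snp AS f ON f.snp_id = s.snp_id
    ORDER BY s.snp_id
""").fetchall()
```

Any other query that exposes a dbSNP SNP ID column could be limited with the same join.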

db_feat_table

This lists the feature tables relevant to a specific database, with descriptions of why the features are relevant and how they should be used.

db_feat_table_pop

This explains how specific databases use the feature tables and how the tables are populated from one another. For example, the 1000 Genomes databases probably should not populate gene transcript feature tables from genomic regions, although this behaviour may become configurable in future versions of BioQ. However, in 1000 Genomes the variant sites table feat_1kg_site should probably be populated from region features in feat_region.
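
As a rough sketch of what populating feat_1kg_site from feat_region could look like, the Python example below again uses sqlite3 with an in-memory database. The table names feat_1kg_site and feat_region come from the text; the kg_site source table, all column names, and the rows are hypothetical, and the real BioQ population logic may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Region features; the column names are assumptions.
    CREATE TABLE feat_region (chrom TEXT, start_pos INTEGER, end_pos INTEGER);
    -- Hypothetical table of 1000 Genomes variant sites.
    CREATE TABLE kg_site (site_id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER);
    -- Feature table named in the text; the site_id column is an assumption.
    CREATE TABLE feat_1kg_site (site_id INTEGER PRIMARY KEY);
    -- Toy rows, not real 1000 Genomes data.
    INSERT INTO feat_region VALUES ('1', 100, 200);
    INSERT INTO kg_site VALUES (10, '1', 150), (11, '1', 250), (12, '2', 150);
""")

# Add every site that falls inside a region feature. OR IGNORE skips
# sites already present, e.g. when two regions overlap.
conn.execute("""
    INSERT OR IGNORE INTO feat_1kg_site (site_id)
    SELECT s.site_id
    FROM kg_site AS s
    JOIN feat_region AS r
      ON r.chrom = s.chrom
     AND s.pos BETWEEN r.start_pos AND r.end_pos
""")

populated_site_ids = [
    row[0]
    for row in conn.execute("SELECT site_id FROM feat_1kg_site ORDER BY site_id")
]
```

Here only the site on chromosome 1 at position 150 lies inside the region, so it is the only one copied into feat_1kg_site.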

Miscellaneous tables

reference

...