BioMart help

The HGNC BioMart application allows users to create customised data tables without any programming knowledge, by using a form to filter the data and select the columns (attributes) they want in the table. This page explains how to use the mart form and defines the available filters and attributes.

BioMart overview & project

BioMart is a generic data management system which offers a range of advanced query interfaces and administration tools.

The system comes with built-in support for query optimisation and database federation. BioMart lets users run fast, powerful queries using web, graphical, or text-based applications, or programmatically using web services or software libraries written in Perl and Java. For data providers, the system simplifies the task of integrating their own data with other datasets hosted on the network.

All the software, including an easy-to-install BioMart website, is available for local installation. BioMart software is completely open source, licensed under the LGPL, and freely available to anyone without restrictions.

For more information about the BioMart project and to download the code visit the BioMart site.

HGNC Marts

The HGNC BioMart homepage provides a list of the HGNC marts that are available to use. Clicking on a mart name takes the user to the mart form for the chosen dataset. So far we have two marts to choose from: a gene mart for gene symbol-centric data and a family mart for gene family-centric data.

All the mart forms have the same template where the form is split into three parts, Datasets, Filters and Attributes.

Datasets

The Datasets part of the mart form is where the user selects the database and the dataset they would like to query and download. The HGNC has only one database, so the database dropdown can be ignored. Users who enter the site via the HGNC BioMart homepage will not have to change the dataset. However, a user who wants to download data from another dataset can select it using the "Datasets" dropdown box, which will change the form. As already mentioned, we have two datasets to choose from so far: the gene dataset and the family dataset.

Filters

The filters section is an area for the user to filter the data by the provided fields. There are several types of filter for the user to interact with, the most common type being the text input filter. The filters are split into subsections, according to the type of field/data they filter. Filters are not required for a BioMart search. If a user wants to select attributes for all the data in the dataset they should ignore this section of the form.

Text input filters
Text input filters usually allow the user to add a wildcard "%" symbol to allow BioMart to search the field for data that is like the filter query.
Select box filters
Select box filters are easy to use in that all the user has to do is click on the filter and select the value to filter by. By default the filter will say "-- Select --" and by leaving it like this BioMart will ignore the filter.
Multiple select filters
Multiple select filters are scroll boxes that contain one value per line. To filter by a particular value the user can click on that value. To filter on several values, Windows users should hold down the control (ctrl) key and click on each additional value. Mac users need to hold down the command (cmd) key instead.
Bulk upload filters
Our mart forms also have bulk upload filters. The user first chooses the field they would like to query multiple times by selecting a value in the drop down select box. The user can then place their values in the text area box or click the "upload file" link to select a file that contains the query values. All of the values have to be of the type selected in the drop down (i.e. a user cannot provide a file or type in values that contain mixed ID/symbol/accession types).
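The wildcard behaviour of text input filters can be illustrated with a small Python sketch. It assumes that "%" matches any run of characters (including none), as in an SQL LIKE pattern; whether the real filters are case sensitive is not stated on this page, so the sketch simply compares text exactly. The function names are hypothetical and are not part of BioMart.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter value containing "%" wildcards into a regex.

    Assumption: "%" stands for any run of characters, including none.
    """
    # Escape every literal piece so characters such as "." are matched as-is.
    pieces = (re.escape(piece) for piece in pattern.split("%"))
    return re.compile(".*".join(pieces))


def matches(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None


# Illustrative values only; these are not fetched from BioMart.
symbols = ["BRCA1", "BRCA2", "BRAF", "TP53"]
print([s for s in symbols if matches("BRCA%", s)])  # ['BRCA1', 'BRCA2']
```

A pattern without "%" only matches the identical value in this sketch, which mirrors an exact text filter.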

Gene filters

HGNC data filter
Approved symbol
The official gene symbol that has been approved by the HGNC and is publicly available. Symbols are approved based on specific HGNC nomenclature guidelines. In the HTML results page the symbol links to the HGNC Symbol Report for that gene.
Approved name
The official gene name that has been approved by the HGNC and is publicly available. Names are approved based on specific HGNC nomenclature guidelines.
Alias gene symbol
Other symbols used to refer to this gene.
Alias name
Other names used to refer to this gene.
Previous HGNC symbol
Symbols previously approved by the HGNC for this gene.
Previous HGNC name
Gene names previously approved by the HGNC for this gene.
Filter by genes...
This filter allows the user to remove rows from the results table for genes that do not have a value within a selected field. 
Status
Indicates whether the gene is classified as:
  • Approved - these genes have HGNC-approved gene symbols
  • Entry withdrawn - these previously approved genes are no longer thought to exist
  • Symbol withdrawn - a previously approved record that has since been merged into another record
Locus group
Groups locus types together into related sets. Below is a list of groups and the locus types within each group:
  • protein-coding gene - contains the "gene with protein product" locus type
  • non-coding RNA - contains the following locus types:
    • RNA, Y
    • RNA, cluster
    • RNA, long non-coding
    • RNA, micro
    • RNA, misc
    • RNA, ribosomal
    • RNA, small cytoplasmic
    • RNA, small nuclear
    • RNA, small nucleolar
    • RNA, transfer
    • RNA, vault
  • pseudogene - contains the following types:
    • immunoglobulin pseudogene
    • pseudogene
    • T cell receptor pseudogene
  • phenotype - contains the "phenotype only" locus type
  • other - contains the following types:
    • endogenous retrovirus
    • fragile site
    • immunoglobulin gene
    • protocadherin
    • readthrough
    • region
    • T cell receptor gene
    • transposable element
    • unknown
    • virus integration site
  • withdrawn - contains the "withdrawn" locus type only
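The grouping listed above can be captured as a simple lookup table, which is handy when post-processing a downloaded table. The following Python sketch copies the groups and locus types from the list above; the variable and function names are hypothetical. Note that "complex locus constituent" appears in the locus type list but in none of the groups, so it is left out here.

```python
# Locus group -> locus types, transcribed from the gene filter list above.
LOCUS_GROUPS = {
    "protein-coding gene": ["gene with protein product"],
    "non-coding RNA": [
        "RNA, Y", "RNA, cluster", "RNA, long non-coding", "RNA, micro",
        "RNA, misc", "RNA, ribosomal", "RNA, small cytoplasmic",
        "RNA, small nuclear", "RNA, small nucleolar", "RNA, transfer",
        "RNA, vault",
    ],
    "pseudogene": [
        "immunoglobulin pseudogene", "pseudogene", "T cell receptor pseudogene",
    ],
    "phenotype": ["phenotype only"],
    "other": [
        "endogenous retrovirus", "fragile site", "immunoglobulin gene",
        "protocadherin", "readthrough", "region", "T cell receptor gene",
        "transposable element", "unknown", "virus integration site",
    ],
    "withdrawn": ["withdrawn"],
}

# Reverse index: locus type -> locus group.
LOCUS_TYPE_TO_GROUP = {
    locus_type: group
    for group, locus_types in LOCUS_GROUPS.items()
    for locus_type in locus_types
}


def locus_group(locus_type: str):
    """Return the locus group for a locus type, or None if it is not listed."""
    return LOCUS_TYPE_TO_GROUP.get(locus_type)


print(locus_group("RNA, micro"))  # non-coding RNA
```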
Locus type
Specifies the type of locus described by the given entry:
  • gene with protein product - protein-coding genes (the protein may be predicted and of unknown function) (SO:0001217)
  • RNA, Y - non-protein coding genes that encode Y RNAs (SO:0000405)
  • RNA, cluster - region containing a cluster of small non-coding RNA genes
  • RNA, long non-coding - non-protein coding genes that encode long non-coding RNAs (lncRNAs) (SO:0001877); these are at least 200 nt in length. Subtypes include intergenic (SO:0001463), intronic (SO:0001903) and antisense (SO:0001904).
  • RNA, micro - non-protein coding genes that encode microRNAs (miRNAs) (SO:0001265)
  • RNA, misc - non-protein coding genes that encode miscellaneous types of small ncRNAs
  • RNA, ribosomal - non-protein coding genes that encode ribosomal RNAs (rRNAs) (SO:0001637)
  • RNA, small cytoplasmic - non-protein coding genes that encode small cytoplasmic RNAs (scRNAs) (SO:0001266)
  • RNA, small nuclear - non-protein coding genes that encode small nuclear RNAs (snRNAs) (SO:0001268)
  • RNA, small nucleolar - non-protein coding genes that encode small nucleolar RNAs (snoRNAs) containing C/D or H/ACA box domains (SO:0001267)
  • RNA, transfer - non-protein coding genes that encode transfer RNAs (tRNAs) (SO:0001272)
  • RNA, vault - non-protein coding genes that encode vault RNAs (SO:0000404)
  • phenotype only - mapped phenotypes where the causative gene has not been identified (SO:0001500)
  • T cell receptor pseudogene - T cell receptor gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • immunoglobulin pseudogene - immunoglobulin gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • pseudogene - genomic DNA sequences that are similar to protein-coding genes but do not encode a functional protein (SO:0000336)
  • T cell receptor gene - gene segments that undergo somatic recombination to form either alpha, beta, gamma or delta chain T cell receptor genes (SO:0000460). Also includes T cell receptor gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term “non-functional” in the gene name.
  • complex locus constituent - transcriptional unit that is part of a named complex locus
  • endogenous retrovirus - integrated retroviral elements that are transmitted through the germline (SO:0000100)
  • fragile site - a heritable locus on a chromosome that is prone to DNA breakage
  • immunoglobulin gene - gene segments that undergo somatic recombination to form heavy or light chain immunoglobulin genes (SO:0000460). Also includes immunoglobulin gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term “non-functional” in the gene name.
  • protocadherin - gene segments that constitute the three clustered protocadherins (alpha, beta and gamma)
  • readthrough - a naturally occurring transcript containing coding sequence from two or more genes that can also be transcribed individually
  • region - extents of genomic sequence that contain one or more genes, also applied to non-gene areas that do not fall into other types
  • transposable element - a segment of repetitive DNA that can move, or retrotranspose, to new sites within the genome (SO:0000101)
  • unknown - entries where the locus type is currently unknown
  • virus integration site - target sequence for the integration of viral DNA into the genome
Chromosome
The chromosome where the gene can be found.
Bulk upload filter
Filter by ID, accession or symbol
This field allows the user to provide multiple query values to bulk search BioMart. The list of values must all be of the type selected using the drop down box. Values can be typed/pasted into the text area or uploaded within a file by clicking on the "upload file" link. The types accepted in this filter are as follows:
  • HGNC ID(s) - A unique ID provided by the HGNC for each gene with an approved symbol. IDs are of the format HGNC:n, where n is a unique number.
  • Approved symbols - The official gene symbol that has been approved by the HGNC.
  • Alias gene symbols - Other symbols used to refer to the gene.
  • Previous HGNC symbols - Symbols previously approved by the HGNC for the gene.
  • CCDS accessions - The Consensus CDS (CCDS) accession.
  • INSDC (ENA/GenBank/DDBJ) accessions - INSDC nucleotide sequence accession numbers.
  • Ensembl gene ID(s) - The ID for an Ensembl gene entry.
  • Mouse genome informatics (MGI) ID(s) - Mouse Genome Informatics ID for a mouse homolog of human genes.
  • NCBI Gene ID(s) - IDs associated with a gene in NCBI Gene.
  • OMIM ID(s) - Identifier from the Online Mendelian Inheritance in Man (OMIM).
  • Orphanet ID(s) - The Orphanet ID identifies a gene within Orphanet and the rare diseases that are associated with the gene.
  • Pseudogene.org ID(s) - An ID for a pseudogene entry/sequence within the Pseudogene.org database.
  • RefSeq accessions - The Reference Sequence (RefSeq) identifier.
  • Rat Genome Database (RGD) ID(s) - Rat Genome Database ID for a rat homolog of human genes.
  • UniProt accessions - The UniProt identifier for a protein product of a gene.
  • Vega gene ID(s) - The Vega gene ID.
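Because every value in a bulk upload must be of the selected type, it can be worth checking a list before pasting or uploading it. The Python sketch below checks the one format this page spells out, HGNC:n, where n is a number. It assumes one value per line, which is not specified on this page, and the function names are hypothetical.

```python
import re

# "HGNC:n, where n is a unique number", as described in the list above.
HGNC_ID = re.compile(r"HGNC:\d+")


def split_values(text: str) -> list:
    """Split pasted text into values, assuming one value per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def invalid_hgnc_ids(values: list) -> list:
    """Return the values that do not look like HGNC IDs."""
    return [value for value in values if not HGNC_ID.fullmatch(value)]


# Illustrative input with one value of the wrong type mixed in.
pasted = "HGNC:5\nHGNC:1100\nBRCA1\n"
print(invalid_hgnc_ids(split_values(pasted)))  # ['BRCA1']
```

An empty result means that every value has the HGNC ID shape; it does not confirm that the IDs exist in the dataset.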

Family filters

HGNC data filter
Family name
The name given/chosen by the HGNC for the family.
Family alias
Alternative names that are also used to describe the gene family.
Root gene symbol
The root/stem symbol that is common to most of the genes belonging to the gene family.
Bulk upload filter
Filter by IDs or symbols
This field allows the user to provide multiple family IDs, HGNC (gene) IDs or approved gene symbols for BioMart to search. The list of values must all be of the type selected using the drop down box. Values can be typed/pasted into the text area or uploaded within a file by clicking on the "upload file" link.

Attributes

The Attributes section of the form is where the user selects what they want displayed in their table for download; BioMart requires at least one attribute to be selected. On both the gene and family marts some of the key attributes are selected by default, but the user can deselect these defaults. The section is divided into subsections that group similar attribute fields together. To select or deselect an attribute, the user should click on the check box next to the attribute's label. Alternatively, the user can select or deselect all the attributes within a subsection by clicking on the links labelled "select all" and "select none".

Gene attributes

HGNC data
HGNC ID
A unique ID provided by the HGNC for each gene with an approved symbol. IDs are of the format HGNC:n, where n is a unique number.
Status
Indicates whether the gene is classified as:
  • Approved - these genes have HGNC-approved gene symbols
  • Entry withdrawn - these previously approved genes are no longer thought to exist
  • Symbol withdrawn - a previously approved record that has since been merged into another record
Approved symbol
The official gene symbol that has been approved by the HGNC and is publicly available. Symbols are approved based on specific HGNC nomenclature guidelines.
Approved name
The official gene name that has been approved by the HGNC and is publicly available. Names are approved based on specific HGNC nomenclature guidelines.
Alias symbol
Other symbols used to refer to the gene.
Alias name
Other names used to refer to the gene.
Previous symbol
Symbols previously approved by the HGNC for the gene.
Previous name
Gene names previously approved by the HGNC for the gene.
Chromosome
The chromosome where the gene can be found.
Chromosome location
Indicates the location of the gene or region on the chromosome.
Locus group
Groups locus types together into related sets. Below is a list of groups and the locus types within each group:
  • protein-coding gene - contains the "gene with protein product" locus type
  • non-coding RNA - contains the following locus types:
    • RNA, cluster
    • RNA, long non-coding
    • RNA, micro
    • RNA, ribosomal
    • RNA, small cytoplasmic
    • RNA, small misc
    • RNA, small nuclear
    • RNA, small nucleolar
    • RNA, transfer
  • pseudogene - contains the following types:
    • immunoglobulin pseudogene
    • pseudogene
    • T cell receptor pseudogene
  • phenotype - contains the "phenotype only" locus type
  • other - contains the following types:
    • endogenous retrovirus
    • fragile site
    • immunoglobulin gene
    • protocadherin
    • readthrough
    • region
    • T cell receptor gene
    • transposable element
    • unknown
    • virus integration site
  • withdrawn - contains the "withdrawn" locus type only
Locus type
Specifies the type of locus described by the given entry:
  • complex locus constituent - transcriptional unit that is part of a named complex locus
  • endogenous retrovirus - integrated retroviral elements that are transmitted through the germline (SO:0000100)
  • fragile site - a heritable locus on a chromosome that is prone to DNA breakage
  • gene with protein product - protein-coding genes (the protein may be predicted and of unknown function) (SO:0001217)
  • immunoglobulin gene - gene segments that undergo somatic recombination to form heavy or light chain immunoglobulin genes (SO:0000460)
  • immunoglobulin pseudogene - immunoglobulin gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • phenotype only - mapped phenotypes (SO:0001500)
  • protocadherin - gene segments that constitute the three clustered protocadherins (alpha, beta and gamma)
  • pseudogene - genomic DNA sequences that are similar to protein-coding genes but do not encode a functional protein (SO:0000336)
  • readthrough - a naturally occurring transcript containing coding sequence from two or more genes that can also be transcribed individually
  • region - extents of genomic sequence that contain one or more genes, also applied to non-gene areas that do not fall into other types
  • RNA, cluster - region containing a cluster of small non-coding RNA genes
  • RNA, long non-coding - non-protein coding genes that encode long non-coding RNAs (lncRNAs); these are at least 200 nt and are represented by a processed transcript and/or at least 3 ESTs
  • RNA, micro - non-protein coding genes that encode microRNAs (miRNAs) (SO:0001265)
  • RNA, ribosomal - non-protein coding genes that encode ribosomal RNAs (rRNAs) (SO:0001637)
  • RNA, small nuclear - non-protein coding genes that encode small nuclear RNAs (snRNAs) (SO:0001268)
  • RNA, small nucleolar - non-protein coding genes that encode small nucleolar RNAs (snoRNAs) containing C/D or H/ACA box domains (SO:0001267)
  • RNA, small cytoplasmic - non-protein coding genes that encode small cytoplasmic RNAs (scRNAs) (SO:0001266)
  • RNA, transfer - non-protein coding genes that encode transfer RNAs (tRNAs) (SO:0001272)
  • RNA, small misc - non-protein coding genes that encode miscellaneous types of small ncRNAs
  • T cell receptor gene - gene segments that undergo somatic recombination to form either alpha, beta, gamma or delta chain T cell receptor genes (SO:0000460)
  • T cell receptor pseudogene - T cell receptor gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • transposable element - a segment of repetitive DNA that can move, or retrotranspose, to new sites within the genome (SO:0000101)
  • unknown - entries where the locus type is currently unknown
  • virus integration site - target sequence for the integration of viral DNA into the genome
HGNC family ID
Each gene family has a unique numerical ID that forms the last part of the gene family page URL to aid linking and downloading.
HGNC family name
The name given/chosen by the HGNC for the family.
Date approved
Date the gene symbol and name were approved by the HGNC.
Date modified
If applicable, the date the entry was modified by the HGNC.
Date symbol changed
If applicable, the date the approved gene symbol was last changed by the HGNC.
Date name changed
If applicable, the date the approved gene name was last changed by the HGNC.
Model organism databases
Mouse genome informatics (MGI) ID
Mouse Genome Informatics ID for the mouse homologs of the human gene.
Rat genome database (RGD) ID
Rat Genome Database ID for the rat homologs of the human gene.
Gene resources
Ensembl gene ID
The Ensembl gene ID associated with the HGNC gene symbol. The Ensembl project produces genome databases for vertebrates and other eukaryotic species.
NCBI gene ID
The NCBI gene ID associated with the HGNC gene symbol. NCBI Gene provides curated sequence and descriptive information about genetic loci, including official nomenclature, synonyms, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites.
UCSC gene ID
The UCSC gene ID associated with the HGNC gene symbol. The ID is used within the UCSC genome browser to identify an annotated human gene record.
Vega gene ID
The Vega gene ID associated with the HGNC gene symbol. The VEGA database is a repository for high-quality gene models produced by the manual annotation of vertebrate genomes.
Nucleotide resources
CCDS accession
The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long term goal is to support convergence towards a standard set of gene annotations.
INSDC (ENA/GenBank/DDBJ) accession
INSDC nucleotide sequence accession numbers selected by the HGNC for a gene.
RefSeq accession
The Reference Sequence (RefSeq) identifier displayed within the HGNC gene symbol report. RefSeq aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products. RefSeq identifiers are designed to provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses.
RNAcentral ID
RNAcentral is a public resource that offers integrated access to a comprehensive and up-to-date set of non-coding RNA sequences provided by a collaborating group of Expert Databases.
Protein resources
UniProt accession
The UniProt identifier for a protein product of the gene. The UniProt Protein Knowledgebase is described as a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and high level of integration with other databases.
Enzyme (EC) ID
Enzyme entries have Enzyme Commission (EC) numbers associated with them that indicate the hierarchical functional classes to which they belong.
Clinical resources
Cosmic symbol
The gene symbol displayed within the Catalogue Of Somatic Mutations In Cancer (Cosmic). Most of the gene symbols will be the same as the HGNC approved gene symbol, but for some genes in Cosmic this may not be the case.
OMIM ID
Identifier provided by Online Mendelian Inheritance in Man (OMIM). This database is described as a catalog of human genes and genetic disorders containing textual information and links to additional related resources.
Orphanet ID
Orphanet is the reference portal for information on rare diseases and orphan drugs, for all audiences. Orphanet’s aim is to help improve the diagnosis, care and treatment of patients with rare diseases. The Orphanet ID identifies a gene within Orphanet and the rare diseases that are associated with the gene.
Locus reference genomic (LRG) ID
LRG sequences provide a stable genomic DNA framework for reporting mutations with a permanent ID and a core content that never changes.
Locus specific database (LSDB) name
This contains LSDB database names pertinent to the gene.
Locus specific database (LSDB) URL
This contains LSDB database URLs pertinent to the gene.
References
PubMed ID
Identifier that links to published articles relevant to the gene in the NCBI's PubMed database.
Other external resources
HCDM CD name
The CD name for a cellular differentiation molecule found within the HCDM database.
HomeoDB ID
ID for a homeobox gene within the Homeobox database (HomeoDB2).
HORDE symbol
The ID for an olfactory receptor gene entry within the Human Olfactory Receptor Data Exploratorium (HORDE) database.
IMGT gene symbol
The IMGT/GENE-DB gene symbol for immunoglobulin and T-cell receptor genes associated with the HGNC gene. The gene symbols are either the same as, or equivalent to, HGNC approved gene symbols. Equivalent IMGT symbols include the character "/" which is not present in HGNC approved symbols. The presence of an IMGT gene symbol indicates that the gene can be found within the IMGT/GENE-DB.
Intermediate filament database HGNC ID
The HGNC ID stored within the Human Intermediate Filament Database for an intermediate filament gene.
IUPHAR/BPS guide to pharmacology ID
IUPHAR/BPS Guide to PHARMACOLOGY is an expert-driven guide to pharmacological targets and the substances that act on them. The ID is their object ID that is used as an identifier for a gene record within their database.
KZNF gene catalog ID
The KZNF catalog is a comprehensive collection of Krüppel-type zinc finger genes (KZNFs) in primates with finished or high quality draft genomes. The ID refers to a gene report within the KZNF catalog.
mamit-tRNADB ID
Mamit-tRNAdb is a compilation of mammalian mitochondrial tRNA genes. The ID refers to a tRNA gene within the mamit-tRNAdb database.
Merops ID
The MEROPS database is an information resource for peptidases (also termed proteases, proteinases and proteolytic enzymes) and the proteins that inhibit them.
mirBase accession
An accession number for a microRNA sequence within the miRBase database for the HGNC gene.
Pseudogene.org ID
An ID for a pseudogene entry/sequence within the Pseudogene.org database for the HGNC gene.
SLC bioparadigms symbol
The gene symbol for a solute carrier gene as found in the Bioparadigms SLC tables database.
snoRNABase (snoid) ID
snoRNABase is a comprehensive database of human H/ACA and C/D box snoRNAs. The ID itself refers to a snoRNA page within the database resource.
LncRNADB symbol
lncRNAdb is a database providing comprehensive annotations of eukaryotic long non-coding RNAs (lncRNAs). Most of the gene symbols will be the same as HGNC approved gene symbols; however, for some genes this may not be the case.
LNCipedia symbol
LNCipedia is a comprehensive compendium of human long non-coding RNAs (lncRNAs). Most of the gene symbols will be the same as HGNC approved gene symbols; however, for some genes this may not be the case.

Family attributes

HGNC family attributes
Family ID
Each gene family has a unique numerical ID that forms the last part of the gene family page URL to aid linking and downloading.
Family name
The name given/chosen by the HGNC for the family.
Family alias
Other commonly-used gene family names and abbreviations.
Root gene symbol
The root/stem symbol that is common to most of the genes belonging to the gene family.
Description
A brief description about the gene family in question.
Description source
The source of the text for the description. Sources are usually Wikipedia, UniProt or our own HGNC description. Other sources may be used.
External family resources
Resource name
Gene family specific database resource name.
Resource description
Gene family specific database resource description.
Resource URL
Gene family specific database resource URL.
PubMed ID
PubMed ID for a reference pertinent to the gene family. We do not aim to list all possible published papers on the family, but we provide PubMed IDs for papers that first described the gene family in question or papers that are particularly relevant to the nomenclature of the genes.
HGNC Gene attributes
HGNC ID (gene)
A unique ID provided by the HGNC for each gene with an approved symbol. IDs are of the format HGNC:n, where n is a unique number.
Approved symbol
The official gene symbol that has been approved by the HGNC and is publicly available. Symbols are approved based on specific HGNC nomenclature guidelines. In the HTML results page the symbol links to the HGNC Symbol Report for that gene.
Approved name
The official gene name that has been approved by the HGNC and is publicly available. Names are approved based on specific HGNC nomenclature guidelines.
Status
Indicates whether the gene is classified as:
  • Approved - these genes have HGNC-approved gene symbols
  • Entry withdrawn - these previously approved genes are no longer thought to exist
  • Symbol withdrawn - a previously approved record that has since been merged into another record
Locus group
Groups locus types together into related sets. Below is a list of groups and the locus types within each group:
  • protein-coding gene - contains the "gene with protein product" locus type
  • non-coding RNA - contains the following locus types:
    • RNA, cluster
    • RNA, long non-coding
    • RNA, micro
    • RNA, ribosomal
    • RNA, small cytoplasmic
    • RNA, small misc
    • RNA, small nuclear
    • RNA, small nucleolar
    • RNA, transfer
  • pseudogene - contains the following types:
    • immunoglobulin pseudogene
    • pseudogene
    • T cell receptor pseudogene
  • phenotype - contains the "phenotype only" locus type
  • other - contains the following types:
    • endogenous retrovirus
    • fragile site
    • immunoglobulin gene
    • protocadherin
    • readthrough
    • region
    • T cell receptor gene
    • transposable element
    • unknown
    • virus integration site
  • withdrawn - contains the "withdrawn" locus type only
Locus type
Specifies the type of locus described by the given entry:
  • complex locus constituent - transcriptional unit that is part of a named complex locus
  • endogenous retrovirus - integrated retroviral elements that are transmitted through the germline (SO:0000100)
  • fragile site - a heritable locus on a chromosome that is prone to DNA breakage
  • gene with protein product - protein-coding genes (the protein may be predicted and of unknown function) (SO:0001217)
  • immunoglobulin gene - gene segments that undergo somatic recombination to form heavy or light chain immunoglobulin genes (SO:0000460)
  • immunoglobulin pseudogene - immunoglobulin gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • phenotype only - mapped phenotypes (SO:0001500)
  • protocadherin - gene segments that constitute the three clustered protocadherins (alpha, beta and gamma)
  • pseudogene - genomic DNA sequences that are similar to protein-coding genes but do not encode a functional protein (SO:0000336)
  • readthrough - a naturally occurring transcript containing coding sequence from two or more genes that can also be transcribed individually
  • region - extents of genomic sequence that contain one or more genes, also applied to non-gene areas that do not fall into other types
  • RNA, cluster - region containing a cluster of small non-coding RNA genes
  • RNA, long non-coding - non-protein coding genes that encode long non-coding RNAs (lncRNAs); these are at least 200 nt and are represented by a processed transcript and/or at least 3 ESTs
  • RNA, micro - non-protein coding genes that encode microRNAs (miRNAs) (SO:0001265)
  • RNA, ribosomal - non-protein coding genes that encode ribosomal RNAs (rRNAs) (SO:0001637)
  • RNA, small nuclear - non-protein coding genes that encode small nuclear RNAs (snRNAs) (SO:0001268)
  • RNA, small nucleolar - non-protein coding genes that encode small nucleolar RNAs (snoRNAs) containing C/D or H/ACA box domains (SO:0001267)
  • RNA, small cytoplasmic - non-protein coding genes that encode small cytoplasmic RNAs (scRNAs) (SO:0001266)
  • RNA, transfer - non-protein coding genes that encode transfer RNAs (tRNAs) (SO:0001272)
  • RNA, small misc - non-protein coding genes that encode miscellaneous types of small ncRNAs
  • T cell receptor gene - gene segments that undergo somatic recombination to form either alpha, beta, gamma or delta chain T cell receptor genes (SO:0000460)
  • T cell receptor pseudogene - T cell receptor gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • transposable element - a segment of repetitive DNA that can move, or retrotranspose, to new sites within the genome (SO:0000101)
  • unknown - entries where the locus type is currently unknown
  • virus integration site - target sequence for the integration of viral DNA into the genome
Chromosome
The chromosome where the gene can be found.
Date approved
Date the gene symbol and name were approved by the HGNC.
Date modified
If applicable, the date the gene entry was modified by the HGNC.
Date name changed
If applicable, the date the approved gene name was last changed by the HGNC.
Date symbol changed
If applicable, the date the approved gene symbol was last changed by the HGNC.
Other external resources
Ensembl gene ID
The Ensembl gene ID associated with the HGNC gene symbol. The Ensembl project produces genome databases for vertebrates and other eukaryotic species.
NCBI gene ID
The NCBI gene ID associated with the HGNC gene symbol. NCBI Gene provides curated sequence and descriptive information about genetic loci, including official nomenclature, synonyms, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites.
UCSC gene ID
The UCSC gene ID associated with the HGNC gene symbol. The ID is used within the UCSC genome browser to identify an annotated human gene record.
UniProt accession
The UniProt identifier for a protein product of the gene. The UniProt Protein Knowledgebase is described as a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and high level of integration with other databases.
Vega gene ID
The Vega gene ID associated with the HGNC gene symbol. The VEGA database is a repository for high-quality gene models produced by the manual annotation of vertebrate genomes.

Example of how to use BioMart

BioMart RESTful service

Create a query

On the preview page of your BioMart search you will see near the top a tab labelled "REST/SOAP". Clicking on this tab will produce a query as XML to retrieve the same data that you are previewing.

Using the XML query

To use the XML query you need the URL http://biomart.genenames.org/martservice/results. You can either POST the query or send it in a GET request.

POSTing the query

POSTing the query is usually the better option. Save the XML snippet in a file called query.xml (or any name you prefer), then send the file in a POST request to the mart server using a tool such as curl.

curl --data-urlencode query@query.xml http://biomart.genenames.org/martservice/results
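For users who would rather script the request than call curl, the same POST can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: it sends the XML as a URL-encoded form field named "query", mirroring the curl command above. The function names, the timeout value and the assumption that the response is UTF-8 text are illustrative choices, not something this page specifies.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

MARTSERVICE_URL = "http://biomart.genenames.org/martservice/results"


def build_post_request(query_xml: str) -> Request:
    """Build a POST request carrying the XML in a form field named "query"."""
    body = urlencode({"query": query_xml}).encode("utf-8")
    return Request(MARTSERVICE_URL, data=body, method="POST")


def run_query(query_xml: str, timeout: float = 60.0) -> str:
    """Send the query and return the response text (assumed to be UTF-8)."""
    with urlopen(build_post_request(query_xml), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage, assuming query.xml holds the XML copied from the REST/SOAP tab:
#   from pathlib import Path
#   print(run_query(Path("query.xml").read_text()))
```

Keeping the request construction separate from the network call makes it possible to inspect the encoded body before anything is sent.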

GET request method

Copy the XML query snippet and use the martservice URL as shown below:

http://biomart.genenames.org/martservice/results?query=<PASTE QUERY HERE>

You may have to URL-encode the XML query for the GET request to work, for example with an online URL encode/decode tool. GET request URLs are limited in length (2,048 characters is a commonly quoted limit), so depending on the size of the query you may have to use the POST method instead.
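The URL encoding and the length check can also be done locally. The Python sketch below percent-encodes the XML, appends it to the martservice URL, and reports whether the result stays within the 2,048-character figure mentioned above. The function names are hypothetical, and the exact limit that applies in practice depends on the browser, client and server involved.

```python
from urllib.parse import quote

MARTSERVICE_URL = "http://biomart.genenames.org/martservice/results"
URL_LENGTH_LIMIT = 2048  # figure quoted in the text above; real limits vary


def build_get_url(query_xml: str) -> str:
    """Return the martservice URL with the XML percent-encoded as "query"."""
    # safe="" encodes every reserved character, including "/" and "=".
    return MARTSERVICE_URL + "?query=" + quote(query_xml, safe="")


def fits_in_get(query_xml: str, limit: int = URL_LENGTH_LIMIT) -> bool:
    """Return True if the full GET URL is no longer than the given limit."""
    return len(build_get_url(query_xml)) <= limit
```

If fits_in_get returns False for a query, the POST method is the safer choice.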

Further help on the BioMart RESTful service can be found in the BioMart 0.9 documentation PDF.