Changes
v2.8 (2025-06-08)
#546, #406: haplotag can now be used to tag supplementary alignments independently of the primary alignment. For this, the option --tag-supplementary was changed to accept a haplotagging strategy, such as --tag-supplementary=independent-or-skip and --tag-supplementary=independent-or-copy-primary. The change is backwards compatible, that is, using only --tag-supplementary is equivalent to --tag-supplementary=copy-primary and sets the strategy to the previous one of tagging supplementary alignments the same as the primary alignment. See the haplotag documentation.
Extended support for supplementary alignments to the polyphase module.
#608: Fixed bug in polyphase that could lead to phasable variants not being phased in multi-sample VCFs.
v2.7 (2025-05-27)
Fixed the build configuration, which, due to changes in setuptools, made the wheels much slower (because they were compiled without optimization). This only affected the wheels, not the Bioconda packages.
v2.6 (2025-04-11)
v2.5 (2025-04-03)
#587: Fixed bug when computing distances between aligned reads which could lead to some reads being ignored.
v2.4 (2025-01-22)
#554: Added --exclude-chromosome option (can be used multiple times) to most subcommands (phase, haplotag, genotype etc.)
#537: Fixed a crash when running haplotag on CRAM files.
#545: haplotagphase now supports multi-allelic variants.
#579: Fix --supplementary-distance option to phase not working.
Reduced processing time of BAM files by about 33% when using realignment.
v2.3 (2024-05-05)
#521: Added haplotagphase command. The command adds phase information to variants based on haplotagged reads. Contributed by Nikolai Karpov (@nkkarpov) and Mitchell Robert Vollger (@mrvollger).
#516: Added --use-supplementary option to phase. Use this to use supplementary alignments for phasing (previously, supplementary alignments would be ignored). Contributed by Nikolai Karpov (@nkkarpov).
v2.2 (2024-01-26)
#496: Fixed a segmentation fault in polyphase.
#498: Fixed a numeric overflow in the scoring phase of polyphase. It could occur for variants with extremely high coverages (i.e. >200X).
#472: Fixed various warnings and assertion violations when running polyphase.
#214: Added support for ploidies greater than two to whatshap split.
Added another algorithm for diploid phasing, which is a heuristic version of the default algorithm. Since it has not been tested extensively, we recommend the old algorithm for productive use, especially for pedigree phasing. Main benefit is support for higher coverages and/or larger pedigrees at the cost of not solving the underlying MEC model to optimality anymore. The heuristic is accessible via the parameter --algorithm=heuristic.
v2.1 (2023-10-17)
We added k-merald, a new method for allele detection based on k-mer alignment. Instead of using a fixed cost value, k-merald derives k-mer mismatch penalties using the error profiles generated by whatshap learn. k-merald is available as an alternative to the edit-distance-based allele detection.
WhatsHap can now be used to generate sequencing error profiles for a specific technology using whatshap learn.
#470: Avoid ZeroDivisionError in whatshap stats when there are no heterozygous or no phased variants.
#485: Fixed a KeyError: 'parse_vcf' in whatshap polyphase when a full chromosome is skipped.
v2.0 (2023-06-30)
#346: Phasing of indels (and other non-SNVs) is now enabled by default. This previously required specifying the --indels option, which not all users knew about and were thus unnecessarily getting suboptimal phasing results. The option is now ignored and leads to a warning. An --only-snvs option was added that restores the old behavior. This change applies to the following subcommands: phase, haplotype, polyphase, polyphasegenetic.
Since this is a backwards incompatible change (when not using --indels already), the major version has been increased.
#425: Haplotagging CRAM files should now work in more cases with haplotag.
#427: polyphase did not phase indels, even if explicitly told.
#432: polyphase can use existing phasing information in VCF when using the --use-prephasing flag. Still very experimental.
#439: polyphasegenetic now handles pedigree information more robustly and properly detects available ILP solvers.
#449: Fixed runtime issues for ploidies above 4, if no pre-phasing is used.
#450: polyphase now supports multi-allelic variants.
#457: haplotag now also tags alignments marked as duplicate.
#466: Inconsistent runtime measurements now lead to a warning and no longer to a crash.
This is the last WhatsHap release to support Python 3.7.
v1.7 (2022-12-01)
#379: Added the ability to do polyploid phasing with pedigree information. This is implemented in a new polyphasegenetic subcommand.
#143: whatshap stats now outputs the fraction of heterozygous variants that are phased.
#410: haplotag gained support for tagging data with ploidy greater than two (use option --ploidy).
#400: Fixed artificial overinflation of block length stats in whatshap stats.
#418: Fixed problem in stats where NaN values caused ValueError.
#416: Clarified in the docs what stats considers as “phased”.
#207: Enable comma-separated chromosomes as argument to whatshap stats.
#412: Changed stats to compute all length statistics on split blocks.
#399: Formatted stats output so that long values are right-aligned with all other values.
v1.6 (2022-09-06)
#384: Fixed how interleaved phase blocks in whatshap stats are split when computing NG50 values. This allows NG50 values to be larger than before. Thanks to @pontushojer.
#385: Speed up whatshap stats when used with --chromosomes by not reading in the entire VCF. Thanks to @pontushojer.
#387: whatshap haplotag got some optimizations and is now about 20% faster. Thanks to @pontushojer.
#397: Fixed whatshap haplotag to include reads not assigned to a contig (unmapped) in the output (unless the --region option is used).
v1.5 (2022-08-23)
Providing a reference FASTA (with --reference or -r) is now mandatory even for whatshap haplotag. It was already mandatory for whatshap phase. In both cases, this is to prevent accidentally getting bad results because allele detection through realignment (which usually performs better) is only possible if a reference is provided. Use --no-reference explicitly to fall back to the less accurate algorithm.
#394: Fixed whatshap phase option --recombination-list not working.
#371: whatshap split crashed when attempting to split reads in a FASTQ file by haplotype.
#377: Speed-up of about 20-30% for whatshap polyphase via some optimizations in the read clustering algorithm.
Removed the deprecated --pigz option for whatshap split.
v1.4 (2022-04-07)
#362: whatshap polyphase received extensive algorithmic updates. The compatibility with different data sets (species and sequencing technology) has been improved. The wall-clock time has been reduced by about 20-30%, depending on the input data.
v1.3 (2022-03-11)
#353: Fix incorrect HS tags in whatshap polyphase
#356: Fixed crash when reading VCF variants without GT fields (happens in GVCFs).
#352: whatshap haplotag has gained option --output-threads for setting the number of compression threads, significantly reducing wall-clock time. Also, if output is sent to a pipe, uncompressed BAM is written. Thanks to @cjw85.
v1.2 (2021-12-08)
#208: Fix phase --merge-reads. This option has never worked correctly and just led to whatshap phase taking a very long time and in some cases even crashing. With the fix, the option should work as intended, but we have not evaluated how much it improves phasing results.
#337: Add --skip-missing-contigs option to whatshap haplotag
#335: Add option --ignore-sample-name to whatshap compare (thanks to Pontus Höjer)
#342: Fix whatshap compare crashing on VCFs with genotypes with an unknown allele (where GT is 1|. or similar).
#343: whatshap stats now reads the chromosome lengths (for N50 computation) from the VCF header, no need to use --chr-lengths.
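For readers unfamiliar with where chromosome lengths live in a VCF, the sketch below shows one way to collect them from ##contig header lines. It is an illustration only, not WhatsHap's code: the function name contig_lengths is hypothetical, and the parsing is deliberately simplified (it does not handle quoted values that contain commas or equals signs).

```python
import re

# Matches VCF header lines such as: ##contig=<ID=chr1,length=248956422>
CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def contig_lengths(header_lines):
    """Map contig ID -> length for every ##contig line that declares a length.

    Hypothetical helper for illustration. Simplified parsing: fields are
    split on commas, so quoted values containing commas are not supported.
    """
    lengths = {}
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if match is None:
            continue  # not a contig declaration
        fields = dict(
            item.split("=", 1)
            for item in match.group(1).split(",")
            if "=" in item
        )
        # The length attribute is optional in VCF, so skip contigs without it
        if "ID" in fields and "length" in fields:
            lengths[fields["ID"]] = int(fields["length"])
    return lengths
```

Contigs declared without a length are skipped here; a tool that needs a length for every chromosome would have to fall back to another source in that case.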
v1.1 (2021-04-08)
#223: Fix haplotag --ignore-linked-reads not working
#241: Fix some polyphase problems.
#249: Fix crash in the haplotag command on reading a VCF with the PS tag set to ".".
#251: Allow haplotag to correctly write to standard output.
#207: Allow multiple --chromosome arguments to stats.
The file created with --output-read-list was not correctly tab-separated.
#248: Remove phase --full-genotyping option. Instead, use whatshap genotype followed by whatshap phase.
#289: Fix parsing of GVCFs (with dots in the ALT column)
#265: polyphase can now work in parallel
v1.0 (2020-06-24)
WhatsHap has not seen a release in over a year although development has continued. To make up for it, we decided to leave ZeroVer behind and set the version number to 1.0.
WhatsHap has gained initial support for phasing polyploid samples! While this feature may not be quite ready for production use, we encourage you to test it by using the whatshap polyphase subcommand and to report any issues you find back to us. See also the pre-print at <https://doi.org/10.1101/2020.02.04.933523> for details.
#51: Reading and writing VCF files is now significantly faster because we switched to a different library for that task (pysam.VariantFile).
The switch to pysam.VariantFile also makes WhatsHap stricter in which VCF files it accepts. We have tried to give sensible error messages in these cases, but please report any remaining issues.
.bcf files can now be read and written.
#110: .vcf.gz output files are now compressed with bgzip so that they can be indexed with tabix.
Providing an indexed reference FASTA is now mandatory (with -r or --reference). It is possible to bypass this by using --no-reference, but that will disable realignment and therefore give worse phasing results on error-prone reads (PacBio, Nanopore).
#187: Implemented a --regions option for the haplotag subcommand.
Implemented a --discard-unknown-reads option for the split subcommand. Reads that are in the input reads file (BAM/FASTQ), but are not listed in the haplotag file will be discarded (by default, they are part of the “untagged” output).
Fixed #215. split subcommand can now process .bam files lacking the sequence field for some/all reads.
The minimum required Python version for WhatsHap is now 3.6.
v0.18 (2019-02-15)
Add option --plot-sum-of-blocksizes to whatshap compare.
Fix in whatshap stats: sometimes returned wrong N50 values if the end position of the last block of a chromosome was larger than the starting position of the first block of the next chromosome.
#173: The haplotag command should now be able to properly write CRAM files.
#177: Option --ignore-read-groups did not work when phased blocks (VCF) were provided as input.
#122: Add --ignore-read-groups and --samples options to haplotag.
Integration of the HapChat algorithm as an alternative MEC solver, available through whatshap phase --algorithm=hapchat. Contributed by the HapChat team, see https://doi.org/10.1186/s12859-018-2253-8.
This is the last release of WhatsHap to support Python 3.4.
v0.17 (2018-07-20)
#140: Haplotagging now works when chromosomes are missing in the VCF.
Added option --merge-reads, which is helpful for high coverage data.
When phasing pedigrees, ensure that haplotypes are ordered as paternal_allele|maternal_allele in the output VCF. This seems to be a common convention and also used by 1000G.
Test cases now use pytest instead of nose (which is discontinued).
v0.16 (2018-05-22)
#167: Fix the haplotag command. It would tag reads incorrectly.
#154: Use barcode information in BX tags when running haplotag on 10x Genomics linked read data.
#153: Allow combination of --ped and --samples to only work on a subset of samples in a pedigree. Added --use-ped-samples to only phase samples mentioned in PED file (while ignoring other samples in input VCF).
v0.15 (2018-04-07)
New subcommand genotype for haplotype-aware genotyping (see https://doi.org/10.1101/293944 for details on the method).
Support CRAM files in addition to BAM.
#133: No longer create BAM/CRAM index if it does not exist. This is safer when running multiple WhatsHap instances in parallel. From now on, you need to create the index yourself (for example with samtools index) before running WhatsHap.
#152: Reads marked as “duplicate” in the input BAM/CRAM file are now ignored.
#157: Adapt to changed interface in Pysam 0.14.
#158: Handle read groups with missing sample (SM) tag correctly.
v0.14.1 (2017-07-07)
Fix compilation problem by distinguishing gcc and clang.
v0.14 (2017-07-06)
Added --full-genotyping to (re-)genotype the given variants based on the reads
Added option whatshap compare --switch-error-bed to write BED file with switch error positions
Added whatshap compare --plot-blocksizes to plot histograms of block sizes
Added option --longest-block-tsv to output position-wise stats on longest joint haplotype block
Added option whatshap compare --tsv-multiway to write results of multi-way comparison to tab-separated file
Added option --chromosome to whatshap stats
whatshap compare can now compute the block-wise Hamming distance
whatshap stats can now compute an N50 for the phased blocks
Fixed compilation issues on OS X (clang)
Detect unsorted VCFs and chromosome name mismatches between BAM and VCF
Fix crash when whatshap compare encounters unphased VCFs
Expanded documentation.
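As background for the N50 statistic mentioned in this release, here is a minimal sketch of the textbook definition applied to a list of block lengths. It is not taken from the WhatsHap source, and the function name n50 and its total_length parameter are assumptions made for illustration. Passing total_length (for example a chromosome length) measures coverage against that length instead of the sum of the blocks, which corresponds to the usual NG50 variant.

```python
def n50(block_lengths, total_length=None):
    """Return the largest length L such that blocks of length >= L together
    cover at least half of the reference length.

    Hypothetical helper: if total_length is None, the reference length is
    the sum of all block lengths (N50); otherwise it is total_length (NG50).
    Returns 0 if the blocks cannot cover half of the reference length.
    """
    if not block_lengths:
        return 0
    reference = sum(block_lengths) if total_length is None else total_length
    target = reference / 2
    covered = 0
    # Walk from the longest block downwards until half is covered
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0
```

For the block lengths 2, 3, 4, 5 and 6 (sum 20), the two longest blocks already cover 11 bases, so this function returns 5.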
v0.13 (2016-10-27)
Use PS tag instead of HP tag by default to store phasing information. This applies to the phase and hapcut2vcf subcommands. PS is also used by other tools and standard according to the VCF specification.
Incorporated genotype likelihoods into our phasing framework. On request (by using option --distrust-genotypes), genotypes can now be changed at a cost corresponding to their input genotype likelihoods. The changed genotypes are written to the output VCF. The behavior of --distrust-genotypes can be fine-tuned by the added options --include-homozygous, --default-gq, --gl-regularizer, and --changed-genotype-list.
Correctly handle cases when processing VCFs with two or more disjoint families.
v0.12 (2016-07-01)
Speed up allele detection
Add an unphase subcommand which removes all phasing from a VCF file (HP and PS tags, pipe notation).
Add option --tag= to the phase subcommand, which allows to choose whether ReadBackedPhasing-compatible HP tags or standard PS tags are used to describe phasing in the output VCF.
Manage versions with versioneer. This means that whatshap --version and the program version in the VCF header will include the Git commit hash, such as whatshap 0.11+50.g1b7af7a.
Add subcommand “haplotag” to tag reads in a BAM file with their haplotype.
Fix a bug where re-alignment around variants at the very end of a chromosome would lead to an AssertionError.
v0.11 (2016-06-09)
When phasing a pedigree, blocks that are not connected by reads but can be phased based on genotypes will be connected by default. This behavior can be turned off using option --no-genetic-haplotyping.
Implemented allele detection through re-alignment: To detect which allele of a variant is seen in a read, the query is aligned to the two haplotypes at that position. This results in better quality phasing, especially for low-quality reads (PacBio). Enabled if --reference is provided. Current limitation: No score for the allele is computed.
As a side-effect of the new allele detection, we can now also phase insertions, deletions, MNPs and “complex” variants.
Added option --chromosome to only work on specified chromosomes.
Use constant recombination rate by default, which allows using --ped without using --genmap.
whatshap has become a command with subcommands. From now on, you need to run whatshap phase to phase VCFs.
Add a stats subcommand that prints statistics about phased VCFs.
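The idea behind allele detection through re-alignment can be sketched as follows: build the two candidate haplotypes around the variant from the reference flanks, align the read segment to each, and pick the allele whose haplotype aligns with the lower cost. This is a simplified illustration under stated assumptions, not WhatsHap's implementation: it uses plain unit-cost edit distance, the read window and flanks are assumed to be given, and the names edit_distance and detect_allele are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if char_a == char_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def detect_allele(read_window, left_flank, right_flank, ref_allele, alt_allele):
    """Return 0 if the read supports the reference allele, 1 for the
    alternative allele, or None if both haplotypes align equally well.

    Hypothetical helper: read_window is the part of the read that spans
    the flanks and the variant; the flanks come from the reference.
    """
    ref_haplotype = left_flank + ref_allele + right_flank
    alt_haplotype = left_flank + alt_allele + right_flank
    cost_ref = edit_distance(read_window, ref_haplotype)
    cost_alt = edit_distance(read_window, alt_haplotype)
    if cost_ref < cost_alt:
        return 0
    if cost_alt < cost_ref:
        return 1
    return None  # ambiguous: no allele call
```

Because whole haplotype sequences are compared rather than single bases, the same procedure applies unchanged to insertions, deletions and multi-base substitutions.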
v0.10 (2016-04-27)
Use --ped to phase pedigrees with the PedMEC algorithm
Phase all samples in a multi-sample VCF
Drop support for Python 3.2 - we require at least Python 3.3 now
v0.9 (2016-01-05)
This is the first release available via PyPI (and that can therefore be installed via pip install whatshap)
January 2016
Trio phasing implemented in a branch
September 2015
pWhatsHap implemented (in a branch)
April 2015
Create haplotype-specific BAM files
February 2015
Smart read selection
January 2015
Ability to read multiple BAM files and merge them on the fly
December 2014
Logo
Unit tests
November 2014
Cython wrapper for C++ code done
Ability to write a phased VCF (using HP tags).
June 2014
Repository for WhatsHap refactoring created
April 2014
The WhatsHap algorithm is introduced at RECOMB