Eiko Wagenknecht
Software Developer, Freelancer & Founder

Understanding the Anki APKG Format

If you’re building educational software or working with Anki flashcards programmatically, you’ve probably discovered that Anki’s APKG file format lacks proper documentation.

Anki is the most popular open-source spaced repetition software for memorizing information through flashcards. It uses the custom APKG format to store and share flashcard decks.

There’s no official specification, though - just outdated reverse-engineering attempts scattered across the web. This leaves developers guessing at the format’s structure.

I ran into this exact issue while building tools that needed to read and write APKG files. After spending too much time piecing together fragments, I analyzed the format myself.

I’ll cover the different APKG format versions and their technical details in this series, starting with the format structure below.

Important

This isn’t an official spec, just what I’ve figured out through research and reverse engineering. If you spot any mistakes, please let me know and I’ll fix them.

Current Documentation Problems

The official Anki documentation briefly mentions the APKG format, but says nothing about its structure.

The anki-cards-web-browser documentation was published in 2017 (when the current Anki version was 2.0.47) and provides detailed structure and content descriptions. However, a lot has changed in the past 8 years, and current Anki versions no longer use exactly the same format.

Many sources link to a wiki page that’s no longer available. The latest snapshot is from 2018 and contains experimental information from examining generated databases.

The most recent source I found is the AnkiDroid wiki, updated in 2024 with detailed SQLite database descriptions. However, it’s based on database version 11 and doesn’t account for recent changes to the database structure. It also lacks information about the APKG file structure.

For developers, there’s also the source code of Anki itself.

APKG vs. COLPKG: Two Sides of the Same Coin

Anki exports use either COLPKG (collection package) or APKG (deck package) formats. Both use the same structure - ZIP archives containing the same file types. They differ in what gets included.

Note

For the rest of this post, I’ll refer to both as the “APKG format” since they share the same underlying structure.

COLPKG (Collection Package): Used for backing up your entire collection or migrating between devices. Importing a COLPKG replaces your existing collection with the package contents. Collection packages created with previous versions of Anki were called collection.apkg.

APKG (Deck Package): Used for sharing specific decks or adding content to existing collections. Importing an APKG adds contents to your existing collection without replacing anything. For previously imported notes, Anki keeps the most recent version.

Both formats can be exported in an older, more compatible format (see Format Evolution: Three Format Versions).

| Data Type | COLPKG | APKG |
| --- | --- | --- |
| Deck scope | Always all decks | Single deck or all decks |
| Scheduling data | Always included | Optional (when excluded, removes marked/leech tags) |
| Note types | All (even unused) | Only used note types |
| Deck presets | Always included | Optional |
| Media files | Optional | Optional |

Format Evolution: Three Format Versions

The APKG format evolved significantly, with major changes in 2012, 2018, and 2020 - 2022. Here are the three main versions and their differences. I’ll use the names from the Anki code, with added emojis for easier distinction:

📜 Legacy 1 (Older Shared Decks, 2012 - 2018)

Modern Anki doesn’t use this format, but it’s worth mentioning for context. Anki 2.0 introduced it in 2012 as the first 2.x file format.

🔄 Legacy 2 (Maximum Compatibility, 2018 - 2019)

Anki 2.1 introduced this format in 2018. It’s still widely used.

Exporting with the “Support older Anki versions (slower/larger files)” option enabled creates this format.

Changes from 📜 Legacy 1 to 🔄 Legacy 2:

⚡ Latest (Modern Format, 2020 - Present)

This format emerged between 2020 and 2022. First, the database schema evolved through several versions, starting from v11:

  • v14 (April 2020): separate tables deck_config, config and tags
  • v15 (May 2020): separate tables fields, templates, notetypes, decks
  • v17 (January 2021): additional fields for tags
  • v18 (May 2021): primary key for graves

Anki 2.1.50 (April 2022) finally added zstd compression, introducing collection.anki21b files.

Note

The v16 schema is somewhat of an oddity because it only contained semantic changes, not actual changes to the database structure.

Modern Anki versions use this schema internally and when exporting decks without the “Support older Anki versions (slower/larger files)” option enabled.

Changes from 🔄 Legacy 2 to ⚡ Latest:

Format Comparison

Here’s how the three formats compare:

| Feature | 📜 Legacy 1 | 🔄 Legacy 2 | ⚡ Latest |
| --- | --- | --- | --- |
| Database file | .anki2 | .anki21 | .anki21b |
| Database schema | v11 | v11 | v18 |
| Number of tables | 5 | 5 | 12 |
| ZIP compression | deflate | deflate | store (database compressed individually) |
| Database compression | ❌ none | ❌ none | ✅ zstd |
| Configuration storage | 📄 JSON in TEXT | 📄 JSON in TEXT | 📊 Protobuf in BLOB |
| Media mapping | 📄 JSON | 📄 JSON | 📊 Protobuf |
| Meta file | ❌ | ✅ | ✅ |
| Data readability | 🟡 Medium | 🟡 Medium | 🔴 Low (binary format) |
| File size | 🔴 Large | 🔴 Large | 🟢 Small |
| Compatibility | 🟡 Old Anki only | 🟢 Wide compatibility | 🔴 Modern Anki only |

What’s Inside an APKG File

APKG files are standard ZIP archives that open with any ZIP tool. 📜🔄 Legacy formats use “deflate” compression for databases and store other files uncompressed. The ⚡ Latest format stores all files uncompressed, but compresses the database file inside the ZIP archive with zstd.

Despite different database names and formats, all APKG files share the same basic structure. Each contains a database with deck data, a media mapping file, numbered media files (if applicable), and a metadata file. All files are stored in the archive root with no subdirectories.
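To see this structure for yourself, the following is a minimal Python sketch that lists every entry of a package together with its ZIP compression method and a rough guess at the payload type. The function names `describe_package` and `sniff_payload` are made up for this example. The magic bytes are the standard SQLite and zstd file signatures, not something specific to Anki.

```python
import io
import zipfile

# Human-readable names for the two ZIP methods mentioned in this post.
METHOD_NAMES = {zipfile.ZIP_STORED: "store", zipfile.ZIP_DEFLATED: "deflate"}

# Standard file signatures (general facts about SQLite and zstd, not Anki-specific).
SQLITE_MAGIC = b"SQLite format 3\x00"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def sniff_payload(head: bytes) -> str:
    """Classify the first bytes of an archive entry (hypothetical helper)."""
    if head.startswith(SQLITE_MAGIC):
        return "sqlite"
    if head.startswith(ZSTD_MAGIC):
        return "zstd"
    return "other"


def describe_package(source) -> list[tuple[str, str, str]]:
    """Return (entry name, ZIP method, payload kind) for every archive entry.

    `source` is a path or a binary file-like object.
    """
    rows = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            # Only the first 16 bytes are needed to recognize the signature.
            with archive.open(info) as entry:
                head = entry.read(16)
            method = METHOD_NAMES.get(info.compress_type, str(info.compress_type))
            rows.append((info.filename, method, sniff_payload(head)))
    return rows
```

Calling `describe_package("example-deck.apkg")` on a real file should show at a glance whether the database is a plain SQLite file or a zstd-compressed one.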

Example File Structure (Legacy 2)

example-deck.apkg (ZIP archive)
├── collection.anki21   # Main SQLite database (🔄 Legacy 2 format)
├── collection.anki2    # Dummy compatibility database (📜 Legacy 1 format)
├── meta                # Format metadata (protobuf)
├── media               # Media file mapping (JSON in 🔄 Legacy 2)
├── 0                   # Media file (image, audio, etc.)
├── 1                   # Media file
└── ...                 # Additional media files
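As a rough illustration of how the media mapping ties the numbered entries to real file names, here is a small Python sketch for the legacy (JSON) variant. It assumes that the media file is a JSON object whose keys are the numbered archive entries and whose values are the original file names, for example {"0": "cat.jpg"}. That layout is my assumption for this example, and the function name `read_legacy_media_map` is made up.

```python
import io
import json
import zipfile


def read_legacy_media_map(source) -> dict[str, str]:
    """Return {archive entry name: original file name} for a legacy package.

    Assumes the `media` entry is a JSON object such as {"0": "cat.jpg"}.
    `source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        raw = archive.read("media")
    mapping = json.loads(raw.decode("utf-8"))
    # Normalize keys and values to strings so they match ZIP entry names.
    return {str(index): str(name) for index, name in mapping.items()}
```

This only applies to the JSON-based mapping. The ⚡ Latest format stores the mapping as protobuf, which this sketch does not handle.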

Format Detection

To determine an APKG file’s format, check which database files are present after extracting the ZIP archive:

  1. Rename the .apkg or .colpkg file to .zip and extract it.
  2. Check the database files:
    • collection.anki2 only → 📜 Legacy 1
    • collection.anki2 + collection.anki21 → 🔄 Legacy 2
    • collection.anki2 + collection.anki21b → ⚡ Latest

Important

This relies on the current state of Anki and its file structure. For future Anki versions, you might need to check the meta file for version information.

Note

The collection.anki2 file in 🔄 Legacy 2 and ⚡ Latest formats is only a compatibility dummy.
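The detection rules above can be sketched in a few lines of Python. Renaming the file isn’t necessary here, because the zipfile module opens the archive regardless of its extension. The function name `detect_apkg_format` and the returned labels are made up for this example. Note that the sketch is slightly more lenient than the list above: it treats the presence of collection.anki21b (or collection.anki21) as decisive, whether or not the dummy collection.anki2 is also there.

```python
import io
import zipfile


def detect_apkg_format(source) -> str:
    """Guess the package format from the database files in the archive.

    `source` is a path or a binary file-like object.
    Returns "latest", "legacy2" or "legacy1" (labels chosen for this sketch).
    """
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())

    # Check the newest database name first, since newer packages also
    # ship a dummy collection.anki2 for compatibility.
    if "collection.anki21b" in names:
        return "latest"
    if "collection.anki21" in names:
        return "legacy2"
    if "collection.anki2" in names:
        return "legacy1"
    raise ValueError("no collection database found in archive")
```

Usage would look like `detect_apkg_format("example-deck.apkg")`.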

Edge Cases in Format Detection

During the evolution from 🔄 Legacy 2 to ⚡ Latest, intermediate database schema versions (v14-v17) also used collection.anki21 files. Anki used these internally but never exported them, so you won’t find them in shared decks. If you encounter these transitional formats, check the meta file version field for the authoritative version. To find the exact database schema version, examine the ver column in the col table of the SQLite database.
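Reading the ver column takes only a single query. Here is a minimal Python sketch, assuming the database has already been extracted from the archive and is a plain (uncompressed) SQLite file. A collection.anki21b file would need to be decompressed with zstd first, which this sketch does not do. The function name `read_schema_version` is made up for this example.

```python
import sqlite3


def read_schema_version(db_path: str) -> int:
    """Return the schema version stored in the `ver` column of the `col` table."""
    connection = sqlite3.connect(db_path)
    try:
        row = connection.execute("SELECT ver FROM col").fetchone()
    finally:
        connection.close()
    if row is None:
        raise ValueError("col table is empty")
    return int(row[0])
```

For a 🔄 Legacy 2 export, `read_schema_version("collection.anki21")` should return 11.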

As Anki evolves, future formats will likely use the meta file for definitive version information.

The meta File - Version Information

The meta file is a Protobuf-encoded file that contains metadata about the APKG file. Currently, it is only needed in edge cases. It contains a single field, version, which indicates the APKG format version:

syntax = "proto3";
message PackageMetadata {
  enum Version {
    VERSION_UNKNOWN = 0;
    VERSION_LEGACY_1 = 1;
    VERSION_LEGACY_2 = 2;
    VERSION_LATEST = 3;
  }
  Version version = 1;
}

It is set to 2 for 🔄 Legacy 2 and to 3 for ⚡ Latest. The 📜 Legacy 1 format does not have a meta file.
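Because the message has only one varint field, it can be decoded without a protobuf library. The following Python sketch reads the version by walking the protobuf wire format by hand. It assumes the file contains only varint fields, as in the schema above, and rejects anything else. The helper names `read_varint` and `parse_meta_version` are made up for this example.

```python
# Enum names taken from the PackageMetadata schema shown above.
VERSION_NAMES = {
    0: "VERSION_UNKNOWN",
    1: "VERSION_LEGACY_1",
    2: "VERSION_LEGACY_2",
    3: "VERSION_LATEST",
}


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint at `pos`; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return value, pos
        shift += 7


def parse_meta_version(data: bytes) -> int:
    """Return the numeric `version` (field 1) of a serialized PackageMetadata."""
    version = 0  # proto3 omits default values, so an empty file means 0
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            # Assumption of this sketch: only varint fields are expected.
            raise ValueError(f"unexpected wire type {wire_type}")
        value, pos = read_varint(data, pos)
        if field_number == 1:
            version = value
    return version
```

For example, the two bytes 0x08 0x03 encode field 1 with value 3, so `parse_meta_version(b"\x08\x03")` returns 3, which maps to VERSION_LATEST.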

The collection.anki2 File - Compatibility Layer

This file is an SQLite database with the same structure as the collection.anki21 file (see this post for a detailed explanation). However, it only contains dummy content for compatibility purposes: a default deck with a single card saying “Please update to the latest Anki version, then import the .colpkg/.apkg file again.”

Since it contains no actual data, I won’t examine it further.

When Each Format Matters

Anki has been around for many years, resulting in a large number of decks created and shared at different times - and in different formats. Here’s where you’ll most likely encounter each format and when to use them for exporting your decks.

📜 Legacy 1 (Older Shared Decks)

You’ll encounter the 📜 Legacy 1 format when downloading older shared decks, like those on AnkiWeb that haven’t been updated in years. Modern Anki no longer creates files in this format, but you may need to work with 📜 Legacy 1 files when using popular older decks. There’s no practical reason to create these files yourself.

🔄 Legacy 2 (Maximum Compatibility)

🔄 Legacy 2 has been around for years and is supported by most tools in the broader Anki ecosystem. It will likely continue to be supported even if Anki ceases to exist. This format is created when you select “Support older Anki versions (slower/larger files)” when exporting. Choose 🔄 Legacy 2 when sharing decks with users who may have older Anki versions, or when working with tools that haven’t been updated to handle the ⚡ Latest format. The JSON-based configuration storage also makes it easier to inspect and modify deck data programmatically.

⚡ Latest (Modern Format)

The ⚡ Latest format is optimized for modern Anki installations and offers the best performance and size characteristics. It uses more efficient compression (zstd) and protobuf for data serialization, resulting in smaller file sizes and faster processing. However, the binary protobuf format complicates manual inspection and the development of supporting tools, so few tools besides Anki itself can properly handle this format. On the other hand, it supports all the latest features and improvements in Anki, like the new FSRS algorithm. As long as you’re only using Anki, there’s no reason not to use it, since you can always export to the 🔄 Legacy 2 format if needed.

Which Format Should You Choose?

Here’s how to choose the right format for your needs:

| Priority | Recommended Format | Why |
| --- | --- | --- |
| Wide compatibility | 🔄 Legacy 2 | Works with older Anki versions |
| File size / performance | ⚡ Latest | Better compression and processing |
| Data inspection / modification | 🔄 Legacy 2 | Human-readable JSON configuration |
| Using the latest Anki features | ⚡ Latest | Current standard, ongoing development |
| Sharing with unknown users | 🔄 Legacy 2 | Safer compatibility choice |

What’s Coming Next

Now that I’ve covered the landscape and high-level structure, let’s dive deeper. This series will continue with a detailed analysis of the two main formats: 🔄 Legacy 2 and ⚡ Latest.

Part 1: Overview (this post): This post provides an overview of the Anki APKG format, its evolution, and the differences between the 📜 Legacy 1, 🔄 Legacy 2, and ⚡ Latest formats.

Part 2: The 🔄 Legacy 2 Format in Detail: In the next post, I’ll cover the 🔄 Legacy 2 format in depth: the SQLite database structure, tables and their relationships, JSON configuration fields, and media file handling.

Part 3: The ⚡ Latest Format in Detail (not yet published): This covers the ⚡ Latest format including the protobuf schema, database schema v18, and the key differences from 🔄 Legacy 2.

Part 4: APKG Format Critique (not yet published): The final part will be a critique of the APKG format, its strengths and weaknesses, and my take on how spaced repetition software could do better.

Building Better Spaced Repetition Tools

I didn’t reverse-engineer this format just for fun.

What started as figuring out how to improve my own memory turned into building a spaced repetition app. Before you think “oh great, yet another flashcard app” - I’m focused on solving the user experience and data ownership problems that existing tools haven’t addressed.

The main thing that’s stopped me from fully committing to existing tools is vendor lock-in. I want to own my learning data - the cards, decks and knowledge that I create, now and forever. Our study materials are valuable and shouldn’t be trapped in proprietary formats that make third-party development a nightmare.

Step one: Build a TypeScript library that converts between spaced repetition formats from tools like Anki, Mochi and Mnemosyne. To do it right, I need to understand exactly how these formats work.

That’s why you’re getting these detailed technical breakdowns.
