The Open Source Community Tooling Built on Avro

Specification

  • avro - (forks: 1066) (stars: 1594) (watchers: 1594) - apache avro is a data serialization system.

Registries

  • schema registry - (forks: 736) (stars: 1234) (watchers: 1234) - confluent schema registry for kafka
  • schema registry ui - (forks: 88) (stars: 321) (watchers: 321) - web tool for avro schema registry
  • schemer - (forks: 3) (stars: 90) (watchers: 90) - schema registry for csv, tsv, json, avro and parquet schema. supports schema inference and graphql api.

Queries

  • rq - (forks: 45) (stars: 1553) (watchers: 1553) - record query - a tool for doing record analysis and transformation

Education

  • examples - (forks: 458) (stars: 670) (watchers: 670) - apache kafka and confluent platform examples and demos
  • kafka storm starter - (forks: 335) (stars: 726) (watchers: 726) - code examples that show how to integrate apache kafka 0.8+ with apache storm 0.9+ and apache spark streaming 1.1+, while using apache avro as the data serialization format.
  • avro hadoop starter - (forks: 86) (stars: 111) (watchers: 111) - example mapreduce jobs in java, hive, pig, and hadoop streaming that work on avro data.
  • Avro2TF - (forks: 19) (stars: 118) (watchers: 118) - avro2tf is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.

Serialization

  • avsc - (forks: 98) (stars: 844) (watchers: 844) - avro for javascript ⚡
  • avro4s - (forks: 178) (stars: 536) (watchers: 536) - avro schema generation and serialization / deserialization for scala
  • fastavro - (forks: 115) (stars: 362) (watchers: 362) - fast avro for python
  • gogen avro - (forks: 66) (stars: 191) (watchers: 191) - generate go code to serialize and deserialize avro schemas
  • avrohugger - (forks: 82) (stars: 147) (watchers: 147) - generate scala case class definitions from avro schemas
  • scalavro - (forks: 31) (stars: 119) (watchers: 119) - a reflection-based avro library in scala.
  • abracad - (forks: 31) (stars: 107) (watchers: 107) - a clojure library for de/serializing clojure data structures with avro.
  • python avro json serializ - (forks: 32) (stars: 104) (watchers: 104) - serializes data into a json format using avro schema.
  • avro_turf - (forks: 44) (stars: 97) (watchers: 97) - a library that makes it easier to use the avro serialization format from ruby.
  • avro rs - (forks: 48) (stars: 89) (watchers: 89) - avro client library implementation in rust
  • json schema avro - (forks: 22) (stars: 102) (watchers: 102) - avro to json schema, and back
  • jsAvroPhonetic - (forks: 56) (stars: 84) (watchers: 84) - a javascript implementation of avro phonetic
  • kafka avro - (forks: 34) (stars: 76) (watchers: 76) - node.js bindings for librdkafka with avro schema serialization.
  • pyavroc - (forks: 17) (stars: 46) (watchers: 46) - an avro file reader/writer for python
  • BlueSteel - (forks: 15) (stars: 47) (watchers: 47) - an avro encoding/decoding library for swift.
  • libserdes - (forks: 35) (stars: 36) (watchers: 36) - avro serialization/deserialization c/c++ library with confluent schema-registry support
  • vulcan - (forks: 8) (stars: 46) (watchers: 46) - functional avro for scala
  • avro schema - (forks: 2) (stars: 48) (watchers: 48) - apache avro schema tools for tarantool

Generators

  • xml avro - (forks: 56) (stars: 58) (watchers: 58) - generate avro schema and avro binary from xsd schema and xml

Connectors

  • spark avro - (forks: 316) (stars: 535) (watchers: 535) - avro data source for apache spark
  • cpp serializers - (forks: 82) (stars: 484) (watchers: 484) - benchmark comparing various data serialization libraries (thrift, protobuf etc.) for c++

Code Generation

  • gradle avro plugin - (forks: 53) (stars: 135) (watchers: 135) - a gradle plugin to allow easily performing java code generation for apache avro. it supports json schema declaration files, json protocol declaration files, and avro idl files.
  • sbt avrohugger - (forks: 37) (stars: 95) (watchers: 95) - sbt plugin for generating scala sources for apache avro schemas and protocols.
  • avromatic - (forks: 11) (stars: 56) (watchers: 56) - generate ruby models from avro schemas

Tabular

  • iceberg - (forks: 48) (stars: 363) (watchers: 363) - iceberg is a table format for large, slow-moving tabular data

Toolchains

  • DevOps Python tools - (forks: 152) (stars: 310) (watchers: 310) - 80+ devops & data cli tools - aws, log anonymizer, spark, hadoop, hbase, hive, impala, linux, docker, spark data converters & validators (avro/parquet/json/csv/ini/xml/yaml), travis ci, ambari, blueprints, cloudformation, elasticsearch, solr, pig, ipython - python / jython tools
  • bigdata playground - (forks: 54) (stars: 157) (watchers: 157) - a complete example of a big data application using: kubernetes (kops/aws), apache spark sql/streaming/mllib, apache flink, scala, python, apache kafka, apache hbase, apache parquet, apache avro, apache storm, twitter api, mongodb, nodejs, angular, graphql

Data Store

  • chana - (forks: 50) (stars: 332) (watchers: 332) - avro data store based on akka

Data Generation

  • ratatool - (forks: 45) (stars: 251) (watchers: 251) - a tool for data sampling, data generation, and data diffing

Conversion

  • json wikipedia - (forks: 41) (stars: 241) (watchers: 241) - json wikipedia, contains code to convert the wikipedia xml dump into a json/avro dump
  • json avro converter - (forks: 60) (stars: 158) (watchers: 158) - json to avro conversion tool designed to make migration to avro easier.

Database

  • storagetapper - (forks: 46) (stars: 205) (watchers: 205) - storagetapper is a scalable realtime mysql change data streaming, logical backup and logical replication service

Binary

  • jackson dataformats binar - (forks: 67) (stars: 187) (watchers: 187) - uber-project for standard jackson binary format backends: avro, cbor, protobuf, smile

IDE

  • vscode data preview - (forks: 20) (stars: 168) (watchers: 168) - data preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large json array/config, yaml, apache arrow, avro & excel data files

Documentation

  • avrodoc - (forks: 60) (stars: 121) (watchers: 121) - documentation tool for avro schemas

Validation

  • aptos - (forks: 16) (stars: 141) (watchers: 141) - ☀️ a tool for validating data using json schema and converting json schema documents into different data-interchange formats

Command Line Interface

  • schema registry - (forks: 24) (stars: 96) (watchers: 96) - a cli and go client for kafka schema registry

Semantics

  • schema_salad - (forks: 33) (stars: 40) (watchers: 40) - semantic annotations for linked avro data

Like JSON Schema, Avro is a very data-centric specification. I need to better understand how leading providers like Confluent use it to power Kafka, but I also want to better understand its relationship to JSON Schema, and how it is used with AsyncAPI and OpenAPI. This dive gave me a fresh look at how the API space is evolving, and at how data and our databases are still king when it comes to everything API.
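
To make the relationship between Avro and JSON Schema a little more concrete, here is a minimal Python sketch that turns a flat Avro record schema into a rough JSON Schema equivalent. It is not how any of the projects listed above do it. The `User` schema, the `avro_record_to_json_schema` function, and the type mapping table are all my own assumptions for illustration, and the sketch only handles primitive field types (no unions, arrays, maps, or nested records).

```python
import json

# Assumed mapping from Avro primitive types to JSON Schema types.
AVRO_TO_JSON_SCHEMA = {
    "null": "null",
    "boolean": "boolean",
    "int": "integer",
    "long": "integer",
    "float": "number",
    "double": "number",
    "string": "string",
    "bytes": "string",  # assumption: bytes are carried as an encoded string
}


def avro_record_to_json_schema(avro_schema):
    """Convert a flat Avro record of primitive fields into a JSON Schema object."""
    if avro_schema.get("type") != "record":
        raise ValueError("only Avro record schemas are handled in this sketch")
    properties = {}
    required = []
    for field in avro_schema["fields"]:
        field_type = field["type"]
        if not isinstance(field_type, str) or field_type not in AVRO_TO_JSON_SCHEMA:
            raise ValueError(f"unsupported field type: {field_type!r}")
        properties[field["name"]] = {"type": AVRO_TO_JSON_SCHEMA[field_type]}
        # Simplification: treat fields without a default as required.
        if "default" not in field:
            required.append(field["name"])
    return {
        "title": avro_schema["name"],
        "type": "object",
        "properties": properties,
        "required": required,
    }


# Hypothetical example schema, not taken from any project listed above.
user_avro = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "score", "type": "double", "default": 0.0},
    ],
}

user_json_schema = avro_record_to_json_schema(user_avro)
print(json.dumps(user_json_schema, indent=2))
```

Even at this size the difference in emphasis shows up: Avro distinguishes `int` from `long` and `float` from `double` because it describes a binary encoding, while JSON Schema collapses them into `integer` and `number` because it describes constraints on JSON documents.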