It’s been 7 months since we first announced Scio at GCPNEXT16. Dozens of internal teams and a couple of other companies now use Scio to run hundreds of pipelines on a daily basis. Within Spotify, Scio is now the preferred framework for building new data pipelines on Google Cloud Platform. We’ve also made 19 releases and added tons of features and improvements. Below is a list of some notable ones.

  • Interactive REPL
  • Type safe BigQuery macro improvements and Scio-IDEA-plugin
  • BigQuery standard SQL 2011 syntax support
  • HDFS source and sink
  • Avro file compression support
  • Bigtable multi-table sink and utility for cluster scaling
  • Protobuf file support and usability improvements
  • Accumulator usability improvements
  • End-to-end testing utilities and matcher improvements
  • Join performance improvements and skewed join
  • Metrics interface and enhancements

I talked about Scio at Scala by the Bay last week and here are the slides.

