Show HN: Execute SQL against Bluesky firehose

(github.com)

100 points | by dm03514 17 days ago

3 comments

  • dm03514 17 days ago
    Hello, I’ve been working on a project that embeds duckdb for stream processing.

    I just added support for websocket sources, which enables SQL over the Bluesky firehose.

    https://github.com/turbolytics/sql-flow?tab=readme-ov-file#c...

    Duckdb does all the sql execution, and python is responsible for sourcing the data.

    The project is still quite young and I’m very much still experimenting, but I’d love any feedback. Thank you.

    • rch 14 days ago
      How do you position this relative to Flink SQL?
      • dm03514 13 days ago
        I’m thinking of this as a lightweight (single node) alternative in the same way duckdb is focused on data that can be processed by a single node.

        I think / hope sqlflow will be a viable, lightweight, cost-effective, easy-to-operate alternative to flink when working with small-to-medium sized data (on the order of <10,000 messages / second).
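
The split described at the top of this thread (Python sources the data, an embedded SQL engine executes the queries, all on a single node) can be sketched roughly as below. This is only an illustration and not SQLFlow's actual code: the standard-library `sqlite3` module stands in for DuckDB so the sketch runs without extra installs, a hard-coded list stands in for the websocket source, and the message fields (`kind`, `did`), the table name `batch`, and the helper names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical sample messages standing in for frames read from a websocket.
RAW_MESSAGES = [
    '{"kind": "commit", "did": "did:example:a"}',
    '{"kind": "commit", "did": "did:example:b"}',
    '{"kind": "identity", "did": "did:example:a"}',
    '{"kind": "commit", "did": "did:example:c"}',
    '{"kind": "account", "did": "did:example:b"}',
]

# The user-supplied SQL, run once against every batch of messages.
QUERY = "SELECT kind, COUNT(*) AS n FROM batch GROUP BY kind ORDER BY kind"


def batches(messages, size):
    """Python side: decode raw messages and group them into fixed-size batches."""
    batch = []
    for raw in messages:
        batch.append(json.loads(raw))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def run_pipeline(messages, batch_size=3):
    """SQL side: load each batch into an in-memory table and run QUERY on it."""
    conn = sqlite3.connect(":memory:")  # stand-in for an embedded DuckDB connection
    conn.execute("CREATE TABLE batch (kind TEXT, did TEXT)")
    results = []
    for batch in batches(messages, batch_size):
        conn.execute("DELETE FROM batch")  # each batch is queried in isolation
        conn.executemany(
            "INSERT INTO batch (kind, did) VALUES (:kind, :did)", batch
        )
        results.append(conn.execute(QUERY).fetchall())
    conn.close()
    return results


if __name__ == "__main__":
    for rows in run_pipeline(RAW_MESSAGES):
        print(rows)
```

Each batch is queried independently here, which keeps the sketch short; a real pipeline would also need to decide where the per-batch results go (for example, another topic or a table).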

  • dm03514 3 days ago
    Hello! Just released a version of SQLFlow that adds support for streaming data to iceberg tables.

    https://github.com/turbolytics/sql-flow?tab=readme-ov-file#s...

    pyiceberg does the heavy lifting. It is a great project with a lot of active development.

    It's interesting to see how immature the iceberg ecosystem is. For example, duckdb iceberg does not support writing to tables, and neither does the go iceberg project. Iceberg writing was just recently added to the pyiceberg project!

    Would love to learn more about people's actual use of iceberg in the wild.

    Are you using iceberg? How are you writing data to it? Which iceberg features are you using? Do you leverage snapshots at all?

    Thank you

  • dm03514 10 days ago
    Created a tutorial on using SQLFlow to interact with the Bluesky firehose:

    https://www.linkedin.com/pulse/learning-sqlflow-using-bluesk...