stream-json

stream-json is a micro-library of node.js stream components with minimal dependencies for creating custom data processors focused on huge JSON files, all while keeping a minimal memory footprint. It can parse JSON files far exceeding available memory. Even individual primitive data items (keys, strings, and numbers) can be streamed piece-wise. A streaming, SAX-inspired, event-based API is included as well.

Available components:

  • Streaming JSON Parser.
    • It produces a SAX-like token stream.
    • Optionally it can pack keys, strings, and numbers (controlled separately).
    • The main module provides helpers to create a parser.
  • Filters to edit a token stream:
    • Pick selects desired objects.
      • It can produce multiple top-level objects, just like the JSON Streaming protocol.
      • Don't forget to use StreamValues when picking several subobjects!
    • Replace substitutes objects with a replacement.
    • Ignore removes objects.
    • Filter filters tokens while maintaining the stream's validity.
  • Streamers to produce a stream of JavaScript objects.
    • StreamValues can handle a stream of JSON objects.
      • Useful to stream objects selected by Pick, or generated by other means.
      • It supports the JSON Streaming protocol, where individual values are separated semantically (as in "{}[]") or with white space (as in "true 1 null").
    • StreamArray takes an array of objects and produces a stream of its components.
      • It streams array components individually taking care of assembling them automatically.
      • Created initially to deal with JSON files similar to Django-produced database dumps.
      • Only one top-level array per stream is valid!
    • StreamObject takes an object and produces a stream of its top-level properties.
      • Only one top-level object per stream is valid!
  • Essentials:
    • Assembler interprets a token stream, creating JavaScript objects.
    • Disassembler produces a token stream from JavaScript objects.
    • Stringer converts a token stream back into a JSON text stream.
    • Emitter reads a token stream and emits each token as an event.
      • It can greatly simplify data processing.
  • Utilities:
    • emit() makes any stream component emit tokens as events.
    • withParser() helps to create stream components with a parser.
    • Batch batches items into arrays to simplify their processing.
    • Verifier reads a stream and verifies that it is valid JSON.
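To make the token model concrete, here is a minimal sketch of the kind of SAX-like token stream the parser produces. This is a hand-rolled illustration for a tiny JSON subset (flat objects with string keys and number values), not stream-json's actual implementation; the token shapes (`startObject`, `keyValue`, `numberValue`, `endObject`) mimic its conventions:

```javascript
// Illustrative tokenizer for a tiny JSON subset: flat objects
// with string keys and number values. It yields token objects
// shaped like stream-json's, e.g. {name: 'startObject'}.
function* tokenize(text) {
  // Alternatives: braces | "key": | numbers
  const re = /[{}]|"((?:[^"\\]|\\.)*)"\s*:|-?\d+(?:\.\d+)?/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    if (m[0] === '{') yield {name: 'startObject'};
    else if (m[0] === '}') yield {name: 'endObject'};
    else if (m[1] !== undefined) yield {name: 'keyValue', value: m[1]};
    else yield {name: 'numberValue', value: m[0]}; // numbers arrive as strings
  }
}

const tokens = [...tokenize('{"a": 1, "b": 2.5}')];
// Token names, in order: startObject, keyValue, numberValue,
// keyValue, numberValue, endObject
console.log(tokens);
```

A real stream-json parser emits these tokens incrementally as chunks arrive, which is what allows inputs far larger than memory.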

All components are meant to be building blocks for creating flexible custom data processing pipelines. They can be extended and/or combined with custom code, and can be used together with stream-chain to simplify data processing.

This toolkit is distributed under New BSD license.

Introduction

const {chain}  = require('stream-chain');

const {parser} = require('stream-json');
const {pick}   = require('stream-json/filters/Pick');
const {ignore} = require('stream-json/filters/Ignore');
const {streamValues} = require('stream-json/streamers/StreamValues');

const fs   = require('fs');
const zlib = require('zlib');

const pipeline = chain([
  fs.createReadStream('sample.json.gz'),
  zlib.createGunzip(),            // decompress the gzipped input
  parser(),                       // tokenize the JSON text
  pick({filter: 'data'}),         // select the 'data' subtree
  ignore({filter: /\b_meta\b/i}), // drop any '_meta' properties
  streamValues(),                 // assemble the picked values into objects
  data => {
    const value = data.value;
    // keep only accounting department records
    return value && value.department === 'accounting' ? data : null;
  }
]);

let counter = 0;
pipeline.on('data', () => ++counter);
pipeline.on('end', () =>
  console.log(`The accounting department has ${counter} employees.`));

See the full documentation in the Wiki.

Companion projects:

  • stream-csv-as-json streams huge CSV files in a format compatible with stream-json: rows as arrays of string values. If a header row is used, it can stream rows as objects with named fields.

Installation

npm install --save stream-json
# or: yarn add stream-json

Use

The whole library is organized as a set of small components, which can be combined to build an effective pipeline. All components are based on node.js streams and events, and they implement all required standard APIs. It is easy to add your own components to solve your unique tasks.

The code of all components is compact and simple. Please take a look at their source code to see how things are implemented, so you can produce your own components in no time.

If you find a bug, see a way to simplify existing components, or create new generic components that can be reused in a variety of projects, don't hesitate to open a ticket and/or create a pull request.

Release History

  • 1.3.3 Bugfix: very large/infinite streams with garbage didn't fail. Thx Arne Marschall!
  • 1.3.2 Bugfix: filters could fail with packed-only token streams. Thx Trey Brisbane!
  • 1.3.1 Bugfix: reverted the last bugfix in Verifier, a bugfix in tests, thx Guillermo Ares.
  • 1.3.0 added Batch, a bugfix in Verifier.
  • 1.2.1 the technical release.
  • 1.2.0 added Verifier.
  • 1.1.4 fixed Filter going haywire, thx @codebling!
  • 1.1.3 fixed Parser streaming numbers when it shouldn't, thx Grzegorz Lachowski!
  • 1.1.2 fixed Stringer not escaping some symbols, thx Pavel Bardov!
  • 1.1.1 minor updates in docs and comments.
  • 1.1.0 added Disassembler.
  • 1.0.3 minor tweaks, added TypeScript typings and the badge.
  • 1.0.2 minor tweaks, documentation improvements.
  • 1.0.1 reorg to fix export problems.
  • 1.0.0 the first 1.0 release.
  • 0.6.1 the technical release.
  • 0.6.0 added Stringer to convert event streams back to JSON.
  • 0.5.3 bug fix to allow empty JSON Streaming.
  • 0.5.2 bug fixes in Filter.
  • 0.5.1 corrected README.
  • 0.5.0 added support for JSON Streaming.
  • 0.4.2 refreshed dependencies.
  • 0.4.1 added StreamObject by Sam Noedel.
  • 0.4.0 new high-performant Combo component, switched to the previous parser.
  • 0.3.0 new even faster parser, bug fixes.
  • 0.2.2 refreshed dependencies.
  • 0.2.1 added utilities to filter objects on the fly.
  • 0.2.0 new faster parser, formal unit tests, added utilities to assemble objects on the fly.
  • 0.1.0 bug fixes, more documentation.
  • 0.0.5 bug fixes.
  • 0.0.4 improved grammar.
  • 0.0.3 the technical release.
  • 0.0.2 bug fixes.
  • 0.0.1 the initial release.

HomePage

http://github.com/uhop/stream-json

Repository

https://github.com/uhop/stream-json

