2026-04-04 tools [intermediate] [deep-dive]

jq's Transformation Engine: Advanced JSON Manipulation

Master jq's advanced transformation patterns including recursive operations, custom functions, and complex data reshaping that turn unwieldy JSON into exactly what you need.

Most developers know jq for basic field extraction: jq '.name' or jq '.items[0]'. But jq can mangle the data format that you have into the one that you want with very little effort, and its real power lies in sophisticated transformations that reshape entire data structures.

Beyond Basic Filters: The Art of Reconstruction

The key insight is that every filter has an input and an output. Even literals like "hello" or 42 are filters - they take an input but always produce the same literal as output. This means you can construct entirely new JSON structures by combining filters creatively.

Consider this common scenario: you have an API response with nested user data, but you need a flat structure for a CSV export:

{
  "users": [
    {
      "id": 1,
      "profile": {"name": "Alice", "email": "alice@example.com"},
      "metadata": {"created": "2023-01-15", "role": "admin"}
    }
  ]
}

The transformation becomes:

jq '.users[] | {id, name: .profile.name, email: .profile.email, role: .metadata.role}'
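
Since the goal in this scenario is a CSV export, the same fields can also be emitted as CSV rows rather than objects. A minimal sketch, assuming the sample response above is held in a shell variable (the names response and csv are only illustrative); it relies on jq's @csv format string together with -r for raw output:

```shell
# Sample input from above, stored in a variable for illustration.
response='{"users":[{"id":1,"profile":{"name":"Alice","email":"alice@example.com"},"metadata":{"created":"2023-01-15","role":"admin"}}]}'

# Build one array per user, then format that array as a CSV row.
# -r prints the raw string instead of a JSON-quoted one.
csv=$(printf '%s' "$response" | jq -r '.users[] | [.id, .profile.name, .profile.email, .metadata.role] | @csv')

echo "$csv"
```

@csv expects an array as input, which is why the fields are collected with [...] here instead of {...}.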

But this is still basic. The real power emerges when you start thinking in terms of data flows and transformations.

Recursive Descent: Navigating Complex Structures

The recurse(f) function allows you to search through a recursive structure, and extract interesting data from all levels. This becomes invaluable when dealing with nested configuration files, file system representations, or organizational hierarchies.

Suppose you have a complex configuration object and need to find all database connection strings, regardless of nesting level:

jq 'recurse | objects | select(has("connection_string")) | .connection_string'

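The beauty of recurse is that it emits the input itself followed by every value nested inside it, at any depth. Called without arguments, as here, it behaves like recurse(.[]?), which is also what the .. shorthand does. You can then filter and transform at any level.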
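
To see the recursive search at work, here is a small sketch run against a made-up configuration. The config contents and the variable names are assumptions for illustration only; the filter is the one shown above, wrapped in [...] to collect the results into one array:

```shell
# Hypothetical config with connection strings at two different depths.
config='{"name":"demo","primary":{"connection_string":"postgres://db1/app"},"services":{"cache":{"store":{"connection_string":"redis://cache1"}}}}'

# recurse visits every value; objects keeps only the objects among them.
found=$(printf '%s' "$config" | jq -c '[recurse | objects | select(has("connection_string")) | .connection_string]')

echo "$found"
```

The objects filter matters: has() raises an error on strings and numbers, so non-object values have to be dropped before select runs.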

Variable Binding: Breaking Complex Transformations

Variables are an absolute necessity in most programming languages, but they're relegated to an "advanced feature" in jq. If you calculate a value, and you want to use it more than once, you'll need to store it in a variable.

Here's where jq variables become crucial for complex transformations:

jq '.items[] as $item | 
    ($item.price * $item.quantity) as $total |
    {name: $item.name, unit_price: $item.price, total: $total, tax: ($total * 0.08)}'

This pattern—binding intermediate results to variables—prevents redundant calculations and makes complex transformations readable.
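
Variables are also a convenient way to carry a value from an outer level into an iteration, where . no longer points at it. A rough sketch, assuming a hypothetical order document with a top-level currency field (the data and field names are invented for this example):

```shell
# Hypothetical order: currency lives at the top level, prices inside items.
order='{"currency":"USD","items":[{"name":"pen","price":2,"quantity":5},{"name":"book","price":12,"quantity":1}]}'

# $currency is bound before descending into .items[], so it stays available
# after "." has moved on to each individual item.
totals=$(printf '%s' "$order" | jq -c '
  .currency as $currency
  | .items[]
  | (.price * .quantity) as $total
  | {name, total: $total, currency: $currency}')

echo "$totals"
```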

Function Definition: Building Reusable Logic

It is also possible to define functions in jq, although this is a feature whose biggest use is defining jq's standard library. But you can define your own functions within filters:

jq 'def calculate_discount(rate): . * (1 - rate);
    .products[] | {name, original: .price, discounted: (.price | calculate_discount(0.15))}'

Functions become essential when you're applying the same transformation logic across multiple data points or building up complex calculations step by step.
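
As an illustration of reusing one definition in several places, the sketch below defines a zero-argument function and calls it both per item and inside an aggregate. The function name line_total and the sample data are assumptions made up for this example:

```shell
# Hypothetical input data.
order='{"items":[{"name":"pen","price":2,"quantity":5},{"name":"book","price":12,"quantity":1}]}'

# line_total operates on whatever "." is at the call site (here: one item).
summary=$(printf '%s' "$order" | jq -c '
  def line_total: .price * .quantity;
  {
    lines: [.items[] | {name, total: line_total}],
    grand_total: ([.items[] | line_total] | add)
  }')

echo "$summary"
```

Because the calculation lives in one place, changing how a line total is computed only requires touching the def.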

Stream Processing: Handling Large Data Sets

By default jq runs the filter once for each JSON object in the input. With --slurp it instead reads the entire input stream into one large array and runs the filter just once. Sometimes you want the opposite: process data incrementally so that large files never have to fit in memory, which is what the --stream flag is for.

Inside a filter, splits(regex) and splits(regex; flags) provide the same results as their split counterparts, but as a stream of outputs instead of an array. Emitting results one at a time like this is convenient when processing large log files or data exports.
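
The difference between the array form and the stream form is easiest to see side by side. A small sketch with an invented input string; split(regex; flags) is called with null flags so that its first argument is treated as a regex:

```shell
# A JSON string as input (note the inner double quotes).
input='"alpha, beta,gamma"'

# split with a regex returns a single array...
as_array=$(printf '%s' "$input" | jq -c 'split(", *"; null)')

# ...while splits emits each piece as its own output, one per line with -r.
as_stream=$(printf '%s' "$input" | jq -r 'splits(", *")')

echo "$as_array"
echo "$as_stream"
```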

Pattern Matching and Text Processing

Modern jq includes powerful regex capabilities. sub(regex; tostring) replaces the first match of the regex in the input string with tostring, after interpolation. gsub is like sub, but all the non-overlapping occurrences of the regex are replaced by the string, after interpolation.
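
A quick sketch of the two functions on throwaway strings (the inputs are invented for illustration). The third call shows what "after interpolation" means: named capture groups are available as fields inside the replacement string:

```shell
# sub replaces only the first match.
first=$(jq -n -r '"a-b-c" | sub("-"; "_")')

# gsub replaces every non-overlapping match.
all=$(jq -n -r '"a-b-c" | gsub("-"; "_")')

# Named captures (?<y>...) can be referenced as .y, .m, .d in the replacement.
swapped=$(jq -n -r '"2023-01-15" | sub("(?<y>\\d+)-(?<m>\\d+)-(?<d>\\d+)"; "\(.d)/\(.m)/\(.y)")')

echo "$first $all $swapped"
```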

For log processing, you might extract structured data from unstructured text:

jq -R 'capture("(?<timestamp>\\d{4}-\\d{2}-\\d{2}) (?<level>\\w+) (?<message>.*)") | select(.level == "ERROR")'

The -R flag treats each line as a string rather than JSON, perfect for processing logs or other text data.
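
To make the behavior concrete, here is the same filter applied to a few made-up log lines (the log content is an assumption for this example). Lines that do not match the pattern produce no output from capture, so they drop out without any extra handling:

```shell
# Hypothetical log: two well-formed lines and one that does not match.
log='2023-01-15 INFO service started
2023-01-15 ERROR disk full
not a log line'

# -R reads each line as a string; -c prints one compact object per match.
errors=$(printf '%s\n' "$log" | jq -R -c 'capture("(?<timestamp>\\d{4}-\\d{2}-\\d{2}) (?<level>\\w+) (?<message>.*)") | select(.level == "ERROR")')

echo "$errors"
```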

Advanced Array Operations

jq 'sort_by(.price)' and jq 'map(.price) | add' show basic array operations, but you can build sophisticated aggregations:

# Group by category, then sum prices within each group
jq 'group_by(.category) | map({category: .[0].category, total: (map(.price) | add), count: length})'

This pattern—group, map over groups, aggregate within groups—handles most complex reporting scenarios.
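
Here is the group-and-aggregate pattern applied to a tiny product list. The data is invented for illustration, and the input is assumed to be a top-level array, as the filter above expects:

```shell
# Hypothetical product list.
products='[{"category":"books","price":12},{"category":"tools","price":30},{"category":"books","price":8}]'

# group_by returns an array of arrays, sorted by the grouping key;
# inside map, "." is one group, so .[0].category names the group.
report=$(printf '%s' "$products" | jq -c 'group_by(.category) | map({category: .[0].category, total: (map(.price) | add), count: length})')

echo "$report"
```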

Pro Tip

When building complex jq transformations, develop them incrementally. Start with the basic structure, then add one transformation at a time. Use jq -n, which runs the filter with null as its input, to test expressions without input data: jq -n '[1, 2, 3] | add'. Remember that you can also pipe intermediate results through debug, which prints each value to stderr and passes it along unchanged, to inspect the data flow: jq '.items[] | debug | .price'.
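
Both techniques can be tried in a few lines. This is only a sketch with invented data; the debug messages go to stderr, so they do not end up in the captured result:

```shell
# -n supplies null as the input, so the expression can be tested on its own.
sum=$(jq -n '[1, 2, 3] | map(. * 2) | add')

# debug writes each item to stderr and passes it on unchanged,
# so only the price reaches stdout and the variable.
price=$(printf '%s' '{"items":[{"name":"pen","price":5}]}' | jq '.items[] | debug | .price')

echo "$sum $price"
```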

Example

Here's a real-world transformation that converts a messy API response into a clean summary report:

# Input: complex e-commerce API response
# Output: clean sales summary by region

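jq '
  # Return 0 instead of failing when the divisor is 0
  def safe_divide(a; b): if b == 0 then 0 else a / b end;

  # group_by needs the whole array, so use .orders rather than .orders[]
  .orders
  | group_by(.shipping.region)
  | map(
      # Multiply each item price by its own quantity, then sum per region
      (map(.items[] | .price * .quantity) | add // 0) as $revenue
      | {
          region: .[0].shipping.region,
          orders: length,
          revenue: $revenue,
          avg_order: safe_divide($revenue; length)
        }
    )
  | sort_by(-.revenue)
'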

This single jq expression extracts orders, groups by region, calculates totals and averages, and sorts by revenue—replacing dozens of lines of traditional scripting with a declarative transformation.
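
To check the numbers by hand, the sketch below runs a variant of this report against three invented orders. The sample data and the variable names are assumptions; the filter passes the whole .orders array to group_by and multiplies each item's price by its own quantity via .items[] | .price * .quantity:

```shell
# Hypothetical orders: two in the EU (revenue 25 + 15), one in the US (20).
orders='{"orders":[
  {"shipping":{"region":"EU"},"items":[{"price":10,"quantity":2},{"price":5,"quantity":1}]},
  {"shipping":{"region":"US"},"items":[{"price":20,"quantity":1}]},
  {"shipping":{"region":"EU"},"items":[{"price":15,"quantity":1}]}
]}'

report=$(printf '%s' "$orders" | jq -c '
  def safe_divide(a; b): if b == 0 then 0 else a / b end;
  .orders
  | group_by(.shipping.region)
  | map(
      (map(.items[] | .price * .quantity) | add // 0) as $revenue
      | {region: .[0].shipping.region, orders: length,
         revenue: $revenue, avg_order: safe_divide($revenue; length)}
    )
  | sort_by(-.revenue)')

echo "$report"
```

Binding the revenue to a variable means the per-region sum is computed once and then reused for both the revenue and avg_order fields.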