Nushell for data exploration #1

I’ve been working with Nushell as a substitute for both a Unix shell and my usual tools for commandline data manipulation. All of my examples are at work though, so I’m writing up a few demonstrations.

I’ll collect a public dataset and work with it. In this case, the history of product recalls from the US Consumer Product Safety Commission. See https://catalog.data.gov/dataset/recalls-api. It’s public, large, has dirty data, and doesn’t require an API key to access.

I’ll use nushell’s built in http handler to get the dataset:

mut recalls = (http get "http://www.saferproducts.gov/RestWebServices/Recall?format=json")

I’ve chosen to use “mut” rather than “let” for this variable because I’m going to modify it. I could just as easily overlay a new declaration with the same name.

Now to explore the data a bit…

# Quite a few records...
$recalls | length
8759

# What fields (columns) does it have?
$recalls | columns
╭────┬───────────────────────╮
  0  RecallID              
  1  RecallNumber          
  2  RecallDate            
  3  Description           
  4  URL                   
  5  Title                 
  6  ConsumerContact       
  7  LastPublishDate       
  8  Products              
  9  Inconjunctions        
 10  Images                
 11  Injuries              
 12  Manufacturers         
 13  Retailers             
 14  Importers             
 15  Distributors          
 16  SoldAtLabel           
 17  ManufacturerCountries 
 18  ProductUPCs           
 19  Hazards               
 20  Remedies              
 21  RemedyOptions         
╰────┴───────────────────────╯

# Examining a single record, what are the types for each column?
$recalls | first | describe
record<RecallID: int, RecallNumber: string, RecallDate: date, Description: string, URL: string, Title: string, ConsumerContact: string, LastPublishDate: string,
Products: table<Name: string, Description: string, Model: string, Type: string, CategoryID: string, NumberOfUnits: string>, Inconjunctions: list<any>, Images:
table<URL: string, Caption: string>, Injuries: table<Name: string>, Manufacturers: table<Name: string, CompanyID: string>, Retailers: table<Name: string,
CompanyID: string>, Importers: list<any>, Distributors: list<any>, SoldAtLabel: nothing, ManufacturerCountries: table<Country: string>, ProductUPCs: list<any>, 
Hazards: table<Name: string, HazardType: string, HazardTypeID: string>, Remedies: table<Name: string>, RemedyOptions: table<Option: string>>

The data is all strings with some tables. I’ll examine the last one in the dataset. I don’t know yet if it’s the oldest because dates are stored as strings and I’m not familiar with the API’s output.

$recalls | last
╭───────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 RecallID               3722                                                                                                                                 
 RecallNumber           73003                                                                                                                                
 RecallDate             1973-06-08T00:00:00                                                                                                                  
 Description            Tappan Built-In Ovens Warning NEWS from CPSC U.S. Consumer Product Safety Commission Office of Information and Public Affairs        
                        Washington, DC 20207 FOR IMMEDIATE RELEASE June 8, 1973 Release # 73-003 Consumer Product Safety Commission Says Some Tappan         │
                        Built-In Ovens May Pose Hazard To Consumers WASHINGTON, D.C. (June 8) - The Consumer Product Safety Commission Friday said it has    
                        learned from the Tappan Company of a defect in certain Tappan Company built-in gas ovens manufactured for installation primarily in  
                        mobile homes and other newly constructed homes. The Commission said the defect --a possible loose fitting connection-- could result  
                        in a hazard to consumers from gas leakage. The ovens in question were manufactured between January 23, 1973, and May 31, 1973, by    
                        the F.H. Lawson Company, Springfield, Tennessee, for the Tappan Company. While the majority of the ovens carry the "Tappan" name,    
                        some were produced for Montgomery Ward and carry the brand-name "Signature." According to Tappan, the problem occurred in some       
                        15,185 ovens, of which 11,974 have been shipped from the place of manufacture. The ovens carry model numbers AZ12-1239; AZ12-1249;   
                        AZ12-1441; AKTM-3850; and AKTM-3893. Some 8,172 of the ovens-- model number AZ12-1239-- are used exclusively in mobile homes. The    
                        Montgomery Ward ovens--"Signature"--carry the "AKTM" model number prefix. CPSC Chairman Richard 0. Simpson said the Commission is    
                        asking wholesalers, distributors, and retailers who may have the affected ovens in stock or installed in mobile homes or newly       
                        constructed homes immediately to contact the Tappan Company. In addition, Simpson said consumers who may have purchased a Tappan or  
                        "Signature" built-in oven with one of the model numbers in question should contact their local gas company or the nearest Tappan     
                        dealer regarding possible gas leakage between the shut-off valve and the pressure regulator. The Commission was told by Tappan that  
                        consumers can locate the model number on the oven by opening the top oven door and looking at the metal strip at the top of the oven 
                         which carries the model number on it. CPSC investigators have been sent to the Tappan headquarters, Mansfield, Ohio, and to the     
                        F.H. Lawson Company to determine whether the remedial action proposed by Tappan is appropriate and to determine the degree of        
                        potential hazard to consumers.                                                                                                       
 URL                    https://www.cpsc.gov/Recalls/1973/Consumer-Product-Safety-Commission-Says-Some-Tappan-Built-In-Ovens-May-Pose-Hazard-To-Consumers    
 Title                  Tappan Built-In Ovens Warning                                                                                                        
 ConsumerContact                                                                                                                                             
 LastPublishDate        2015-02-03T00:00:00                                                                                                                  
                        ╭───┬──────────────────────────────────────────────┬─────────────┬───────┬────────────────────────────────┬────────────┬─────╮       
 Products                # │                     Name                     │ Description │ Model │              Type              │ CategoryID │ ... │       │
                        ├───┼──────────────────────────────────────────────┼─────────────┼───────┼────────────────────────────────┼────────────┼─────┤       
                         0                                                                    Ovens/Stoves/Ranges/Microwaves  2056        ...        
                         1  Montgomery Ward Signature built-in gas ovens                                                                  ...        
                         2  Tappan built-in gas ovens                                                                                     ...        
                        ╰───┴──────────────────────────────────────────────┴─────────────┴───────┴────────────────────────────────┴────────────┴─────╯       
 Inconjunctions         [list 0 items]                                                                                                                       
 Images                 [list 0 items]                                                                                                                       
 Injuries               [list 0 items]                                                                                                                       
                        ╭───┬─────────────────┬───────────╮                                                                                                  
 Manufacturers           # │      Name       │ CompanyID │                                                                                                  │
                        ├───┼─────────────────┼───────────┤                                                                                                  
                         0  F.H. Lawson                                                                                                                  
                         1  Tappan                                                                                                                       
                         2  Montgomery Ward                                                                                                              
                        ╰───┴─────────────────┴───────────╯                                                                                                  
 Retailers              [list 0 items]                                                                                                                       
 Importers              [list 0 items]                                                                                                                       
 Distributors           [list 0 items]                                                                                                                       
 SoldAtLabel                                                                                                                                                 
 ManufacturerCountries  [list 0 items]                                                                                                                       
 ProductUPCs            [list 0 items]                                                                                                                       
 Hazards                [list 0 items]                                                                                                                       
 Remedies               [list 0 items]                                                                                                                       
 RemedyOptions          [list 0 items]                                                                                                                       
╰───────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

There are fields that contain dates, but they’re encoded as strings. I’ll convert those to a “datetime” which has convenient properties I’ll use later…

# In-place convert the RecallDate column with a `date`
$recalls = ( $recalls | upsert RecallDate {|r| ($r.RecallDate | into datetime)} )

Nice. If I skim all of them (not shown, it’s long) I get something that is still human readable. Looking at the last few, I can see that Nu is auto-converting the date to a more friendly format.

$recalls | last 3 | get RecallDate
╭───┬──────────────╮
 0  50 years ago 
 1  50 years ago 
 2  50 years ago 
╰───┴──────────────╯

If I wanted I could change the date format by overriding the default:

~> $env.config.datetime_format = {normal: "%Y-%m-%d %H:%M:%S", table: "%Y-%m-%d"}
~> $recalls | last 3 | get RecallDate
╭───┬────────────╮
 0  1973-08-21 
 1  1973-08-20 
 2  1973-06-08 
╰───┴────────────╯

That’s good. Now I’ll do the same conversion on the “LastPublishDate”:

let maybe_ok = (
  $recalls | upsert LastPublishDate {|r| ($r.LastPublishDate | into datetime)}
)

That seems to have worked but if I list them:

Error: nu::shell::only_supports_this_input_type

  × Input type not supported.
   ╭─[entry #77:1:1]
 1  mut recalls = (http get "http://www.saferproducts.gov/RestWebServices/Recall?format=json")
   ·                         ────────────────────────────────┬────────────────────────────────
   ·                                                         ╰── input type: nothing
   ╰────
   ╭─[entry #89:1:1]
 1  let maybe_ok = ($recalls | upsert LastPublishDate {|r| ($r.LastPublishDate | into datetime)})
   ·                                                                              ──────┬──────
   ·                                                                                    ╰── only string and integer input data is supported
   ╰────

Two things here, the error message and why the error message appeared when it did. The error indicates that the field was empty (neither string nor integer) and where that data came from. This is a really nice feature of Nu. I believe the reason the error was emitted on output is that Nu defers non-fatal evaluation errors in some cases (This seems inconsistent with Nu’s goal of having early and clear errors, so my understanding is likely wrong).

I’ll try that again, this time checking for null and using the RecallDate if there is no LastPublishDate:

recalls = ( 
    $recalls | 
      upsert LastPublishDate {|r| 
        let lpd = $r.LastPublishDate;
        if ($lpd != null) { echo ($lpd | into datetime) } else { echo ($r.RecallDate) } } )

Now if I list all the records I don’t get an error (though I have a bit less confidence in the quality of the data).

The date type has, as I mentioned, nice properties for selecting records.

# How many recalls in the last year?
$recalls | where RecallDate >= ((date now) - 52wk) | length
315

# How many in 2022?
$recalls | where RecallDate >= ("2022-01-01" | into datetime) and RecallDate <= ("2022-12-31" | into datetime) | length
292

# In the last three years?
$recalls | where RecallDate >= ((date now) - (3 * 52wk)) | length
814

Notice that I used a date unit wk and could perform math on it and on the date. Neat. I’m not concerned with timezones and leap years and such here.

With a few manipulations, I can aggregate them by date. I’ll pull out the year into a new record, group, then tally. Note I’m extracting a string Year from the record and losing the date type. It’s okay for this example.

$recalls |
  each {|r| {RecallYear: ( $r.RecallDate | date to-record | get Year )}} |
  group-by RecallYear | 
  items {|k,v| {Year: $k, Recalls: ($v | length)} }
╭────┬──────┬─────────╮
  # │ Year │ Recalls │
├────┼──────┼─────────┤
  0  2023      237 
  1  2022      292 
  2  2021      219 
  3  2020      257 
  4  2019      241 
  5  2018      258 
  6  2017      280 
  7  2016      330 
  8  2015      305 
  9  2014      296 
 10  2013      290 
 11  2012      310 
 12  2011      313 
 13  2010      362 
 14  2009      383 
 15  2008      392 
 16  2007      452 
 17  2006      320 
 18  2005      323 
 19  2004      278 
 20  2003      216 
 21  2002      255 
 22  2001      246 
 23  2000      193 
 24  1999      156 
 25  1998      133 
 26  1997      151 
 27  1996      138 
 28  1995      128 
 29  1994      109 
 30  1993       75 
 31  1992       82 
 32  1991       84 
 33  1990       78 
 34  1989       80 
 35  1988       75 
 36  1987       49 
 37  1986       46 
 38  1985       30 
 39  1984       44 
 40  1983       23 
 41  1982       23 
 42  1981       12 
 43  1980       22 
 44  1979       33 
 45  1978       43 
 46  1977       36 
 47  1976       27 
 48  1975       13 
 49  1974       14 
 50  1973        8 
├────┼──────┼─────────┤
  # │ Year │ Recalls │
╰────┴──────┴─────────╯

Let’s search for some well known brand names:

$recalls | where Description =~ "John Deere" | length
51

$recalls | where Description =~ "Frigidaire" | length
24

$recalls | where Description =~ "General Electric" | length
24

$recalls | where Description =~ "Samsung" | length
9

The last one looks interesting, and short enough to show, let’s restructure it to extract just some of the fields:

$recalls | 
  where Description =~ "Samsung" |
  each {|x| {Name: $x.Products.Name,
             NumberOfUnits: $x.Products.NumberOfUnits}}
╭───┬───────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────╮
 # │                                   Name                                    │                                NumberOfUnits                                 │
├───┼───────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 0  ╭───┬───────────────────────────────────╮                                  ╭───┬───────────────╮                                                        
     0  Samsung Top-Load Washing Machines                                    0  About 663,500                                                         
    ╰───┴───────────────────────────────────╯                                  ╰───┴───────────────╯                                                        
 1  ╭───┬─────────────────────────────────────╮                                ╭───┬────────────────────────╮                                               
     0  Batteries in refurbished cellphones                                  0  About 10,200 batteries                                                
    ╰───┴─────────────────────────────────────╯                                ╰───┴────────────────────────╯                                               
 2  ╭───┬───────────────────────────────────╮                                  ╭───┬───────────────────╮                                                    
     0  Samsung top-load washing machines                                    0  About 2.8 million                                                     
    ╰───┴───────────────────────────────────╯                                  ╰───┴───────────────────╯                                                    
 3  ╭───┬──────────────────────────────────╮                                   ╭───┬──────────────────────────────────────────────────────────────────────╮ 
     0  Samsung Galaxy Note7 smartphones                                     0  About 1.9 million (This includes the 1 million Galaxy Note7s          
    ╰───┴──────────────────────────────────╯                                       recalled on September 15, 2016)                                       
                                                                               ╰───┴──────────────────────────────────────────────────────────────────────╯ 
 4  ╭───┬───────────────────────────────────────────────────────────────────╮  ╭───┬─────────────────╮                                                      
     0  Samsung Galaxy Note7 smartphones sold prior to September 15, 2016    0  About 1 million                                                       
    ╰───┴───────────────────────────────────────────────────────────────────╯  ╰───┴─────────────────╯                                                      
 5  ╭───┬────────────────────────────────────────╮                             ╭───┬──────────────╮                                                         
     0  Samsung Over-the-Range Microwave Ovens                               0  About 43,000                                                          
    ╰───┴────────────────────────────────────────╯                             ╰───┴──────────────╯                                                         
 6  ╭───┬─────────────────────────────────╮                                    ╭───┬───────────────╮                                                        
     0  Samsung "Jitterbug" Cell Phones                                      0  About 160,000                                                         
    ╰───┴─────────────────────────────────╯                                    ╰───┴───────────────╯                                                        
 7  ╭───┬────────────────────────────────────────╮                             ╭───┬──╮                                                                     
     0  Maytag Front Loading Washing Machines                                0                                                                        
     1  Samsung Front Loading Washing Machines                               1                                                                        
    ╰───┴────────────────────────────────────────╯                             ╰───┴──╯                                                                     
 8  ╭───┬───────────────────────────────────────╮                              ╭───┬──╮                                                                     
     0  Maytag Front-Loading Washing Machines                                0                                                                        
    ╰───┴───────────────────────────────────────╯                              ╰───┴──╯                                                                     
╰───┴───────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────╯

That’s ugly, let’s clean that up…

$recalls | 
  where Description =~ "Samsung" | 
  each {|x| {Name: ($x.Products.Name | str join ", "),
             NumberOfUnits: ($x.Products.NumberOfUnits | str join "")}}
╭───┬───────────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────╮
 # │                                     Name                                      │                              NumberOfUnits                               │
├───┼───────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────┤
 0  Samsung Top-Load Washing Machines                                              About 663,500                                                            
 1  Batteries in refurbished cellphones                                            About 10,200 batteries                                                   
 2  Samsung top-load washing machines                                              About 2.8 million                                                        
 3  Samsung Galaxy Note7 smartphones                                               About 1.9 million (This includes the 1 million Galaxy Note7s recalled on 
                                                                                    September 15, 2016)                                                     
 4  Samsung Galaxy Note7 smartphones sold prior to September 15, 2016              About 1 million                                                          
 5  Samsung Over-the-Range Microwave Ovens                                         About 43,000                                                             
 6  Samsung "Jitterbug" Cell Phones                                                About 160,000                                                            
 7  Maytag Front Loading Washing Machines, Samsung Front Loading Washing Machines                                                                           
 8  Maytag Front-Loading Washing Machines                                                                                                                   
╰───┴───────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────╯

How about baby strollers? There are two fields of interest, Description and Products.Type. This search is somewhat time consuming…

let $strollers = (
  $recalls | 
    where ($it.Description =~ "(?i)stroller" or 
          ($it.Products.Type | any {|x| $x =~ "(?i)stroller"})))

$strollers | length
166

What about trampolines? I don’t care about assembling a record, so I’ll just extract the columns I want as a list:

let $trampolines = ($recalls | where ($it.Description =~ "(?i)trampoline"))

$trampolines | length
16

$trampolines | sort-by RecallDate | first | get RecallID RecallDate Description
╭───┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 0                                                                                                                                                      2284 
 1  1999-12-07                                                                                                                                               
 2  Trampoline Safety Net Enclosures - Inc., of Saratoga, Calif., is recalling about 2,300 trampoline safety net enclosures. The safety net enclosures are   
    sold separately to be attached to trampolines that measure 10 to 14 feet in diameter.                                                                    
╰───┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

$trampolines | sort-by RecallDate | last | get RecallID RecallDate Description
╭───┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 0                                                                                                                                                      8630 
 1  2019-08-01                                                                                                                                               
 2  This recall involves Super Jumper 14-foot trampolines, and 14-foot and 16-foot combo trampolines with enclosures sold without reinforcement clamps. The  
    Super Jumper logo is printed on the center of the trampolines' mat. The trampolines have a black mat, gold springs and a blue pad covering the springs.  
    The 14-foot models have six metal legs and the 16-foot combo has eight metal legs.                                                                       
╰───┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

I wonder if the infamous lawn dart is there… We’ll trim the description and tidy up the list of manufacturers and distributors.

let $jarts = $recalls |
        where Description =~ "lawn dart" |
        each {|r| {RecallDate: $r.RecallDate,
                   Description: ($r.Description | str substring 0..50),
                   Manufacturers: ($r.Manufacturers | get Name | collect {|x| ($x | str join ",") }),
                    Distributors: ($r.Distributors | get Name | collect {|x| ($x | str join ",")})}}

$jarts
╭────┬─────────────┬─────────────────────────────────────────────────────┬────────────────────────────────────┬────────────────────────────────────────────────╮
  # │ RecallDate  │                     Description                     │           Manufacturers            │                  Distributors                  │
├────┼─────────────┼─────────────────────────────────────────────────────┼────────────────────────────────────┼────────────────────────────────────────────────┤
  0  2020-08-12   The recalled lawn dart sets were sold for 2, 4 and   Crown Darts UK, of United Kingdom                                                  
  1  2008-10-02   The recalled lawn dart game includes four projecti                                       John Jaques & Son Ltd., of Edenbridge, England 
  2  1989-04-27   Franklin Sports Industries, Inc, Recalls All Of It   Franklin Sports                                                                    
╰────┴─────────────┴─────────────────────────────────────────────────────┴────────────────────────────────────┴────────────────────────────────────────────────╯

I thought only fellow Gen-Xers would have seen these toys but someone brought them back. And again a few years ago. What could possibly go wrong?