Chapter 6: File Operations

File I/O, directory operations & data format processing

1 File Read/Write

Ruby provides multiple approaches for file I/O. The most common are File.read/File.write for simple operations and File.open with a block for more control. The block form automatically closes the file handle.

Reading Files

content = File.read('config.txt')
puts content

content = File.read('data.txt', encoding: 'UTF-8')

lines = File.readlines('log.txt')
lines.each_with_index do |line, i|
  puts "#{i + 1}: #{line.chomp}"
end

lines = File.readlines('log.txt', chomp: true)

File.foreach('large_file.log') do |line|
  puts line if line.include?('ERROR')
end

Writing Files

File.write('output.txt', "Hello, Ruby!\nLine 2\n")

File.write('log.txt', "#{Time.now} - New entry\n", mode: 'a')

File.open('report.txt', 'w') do |f|
  f.puts 'Report Title'
  f.puts '=' * 40
  10.times { |i| f.puts "Item #{i + 1}: #{rand(100)}" }
end

File Open Modes

Mode    Description
'r'     Read only (default). File must exist.
'w'     Write only. Creates file or truncates existing.
'a'     Append only. Creates file if not exists.
'r+'    Read and write. File must exist.
'w+'    Read and write. Truncates or creates.
'a+'    Read and append. Creates if not exists.
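
The difference between truncating, appending, and in-place writing is easiest to see by running the modes against the same file. The following is a minimal sketch, assuming a throwaway temporary directory; the method name open_mode_demo and the file name modes_demo.txt are made up for illustration.

```ruby
require 'tmpdir'

# Shows how 'w', 'a', and 'r+' treat content that already exists.
# Everything happens inside a temporary directory that is removed afterwards.
def open_mode_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'modes_demo.txt') # hypothetical file name
    results = {}

    File.open(path, 'w') { |f| f.write('first') }   # creates the file
    File.open(path, 'w') { |f| f.write('second') }  # truncates, then writes
    results[:after_w] = File.read(path)

    File.open(path, 'a') { |f| f.write('+more') }   # writes at the end
    results[:after_a] = File.read(path)

    File.open(path, 'r+') { |f| f.write('SE') }     # overwrites from the start, no truncation
    results[:after_r_plus] = File.read(path)

    results
  end
end

open_mode_demo.each { |step, content| puts "#{step}: #{content}" }
```

Note that 'r+' positions the write at the start of the file and leaves the remaining bytes in place, which is why it is rarely the right choice for rewriting a whole file.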

File.open with Block (Idiomatic Ruby)

File.open('data.bin', 'rb') do |f|
  header = f.read(4)
  puts "File size: #{f.size} bytes"
  puts "Current position: #{f.pos}"
  f.seek(0, IO::SEEK_SET)
end

result = File.open('numbers.txt', 'r') do |f|
  f.readlines(chomp: true).map(&:to_i).sum
end
puts "Sum: #{result}"

⚑ Always Use Block Form

File.open with a block ensures the file is closed automatically when the block exits, even if an exception occurs. This is comparable to Python's with open(...) statement or Go's defer f.Close().
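
The cleanup guarantee can be checked directly by raising inside the block and inspecting the handle afterwards. This is a small sketch under the assumption that a temporary file is acceptable as input; closed_after_error? is a hypothetical helper name, not part of Ruby.

```ruby
require 'tempfile'

# Opens the file, fails on purpose inside the block, and reports whether
# the handle was closed once the exception left the block.
def closed_after_error?(path)
  handle = nil
  begin
    File.open(path, 'r') do |f|
      handle = f
      raise 'simulated failure while reading'
    end
  rescue RuntimeError
    # By the time we get here, File.open has already closed the handle.
  end
  handle.closed?
end

Tempfile.create('block_demo') do |tmp|
  puts closed_after_error?(tmp.path)
end
```

Without the block, the same failure would leave the handle open until the garbage collector finalizes it, so the non-block form needs an explicit ensure clause that calls close.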

2 Directory Operations

Ruby provides Dir for directory listing and creation, FileUtils for recursive operations, and Pathname for object-oriented path manipulation.

Creating & Listing Directories

Dir.mkdir('output') unless Dir.exist?('output')

require 'fileutils'
FileUtils.mkdir_p('path/to/nested/dir')

entries = Dir.entries('.')
puts entries.reject { |e| e.start_with?('.') }

Dir.foreach('.') do |entry|
  next if entry.start_with?('.')
  type = File.directory?(entry) ? 'DIR' : 'FILE'
  puts "#{type}: #{entry}"
end

Dir.glob (Pattern Matching)

ruby_files = Dir.glob('**/*.rb')
puts "Found #{ruby_files.size} Ruby files"

Dir.glob('app/**/*.{rb,erb}').each do |path|
  puts path
end

Dir.glob('log/*.log').sort_by { |f| File.mtime(f) }.reverse.each do |f|
  puts "#{f} - #{File.size(f)} bytes - #{File.mtime(f)}"
end

configs = Dir['config/**/*.yml']

FileUtils

require 'fileutils'

FileUtils.cp('source.txt', 'backup.txt')
FileUtils.cp_r('src_dir', 'dest_dir')

FileUtils.mv('old_name.txt', 'new_name.txt')

FileUtils.rm('temp.txt')
FileUtils.rm_rf('build/')
FileUtils.rm_f('maybe_exists.txt')

FileUtils.chmod(0o755, 'script.sh')
FileUtils.touch('marker.txt')

Pathname (Object-Oriented Paths)

require 'pathname'

path = Pathname.new('/home/user/projects/app/config.yml')

puts path.basename      # config.yml
puts path.extname       # .yml
puts path.dirname       # /home/user/projects/app
puts path.parent        # /home/user/projects/app
puts path.basename('.*') # config (without extension)

new_path = path.parent / 'database.yml'
puts new_path           # /home/user/projects/app/database.yml

puts path.exist?
puts path.file?
puts path.directory?
puts path.absolute?

Pathname.new('.').children.each do |child|
  puts "#{child} (#{child.file? ? 'file' : 'dir'})"
end

🔄 Cross-Language Comparison

  • Ruby Pathname ≈ Python pathlib.Path: both provide object-oriented path manipulation with a / operator.
  • Node.js path β€” functional API (path.join(), path.resolve()).
  • Go filepath β€” similar functional approach: filepath.Join(), filepath.Walk().

3 CSV Processing

Ruby ships with the csv library for reading and writing CSV files (since Ruby 3.4 it is a bundled gem, so add it to your Gemfile when using Bundler). It supports headers, type conversion, and streaming for large files.

Reading CSV

require 'csv'

data = CSV.read('users.csv')
data.each { |row| puts row.inspect }

users = CSV.read('users.csv', headers: true)
users.each do |row|
  puts "#{row['name']} - #{row['email']} (age: #{row['age']})"
end

CSV.foreach('large_data.csv', headers: true) do |row|
  process(row) if row['status'] == 'active'
end

Writing CSV

CSV.open('output.csv', 'w') do |csv|
  csv << ['name', 'email', 'age']
  csv << ['Alice', 'alice@example.com', 28]
  csv << ['Bob',   'bob@example.com',   32]
  csv << ['Carol', 'carol@example.com', 25]
end

csv_string = CSV.generate do |csv|
  csv << ['id', 'product', 'price']
  csv << [1, 'Ruby Book', 29.99]
  csv << [2, 'Keyboard',  89.99]
end
puts csv_string

CSV with Converters

CSV.foreach('data.csv', headers: true, converters: :numeric) do |row|
  puts row['price'].class  # Float (or Integer for whole numbers) instead of String
end

custom_converter = ->(value) { value == 'true' ? true : value == 'false' ? false : value }

CSV.foreach('flags.csv', headers: true, converters: [custom_converter]) do |row|
  puts row['active'].class  # TrueClass or FalseClass
end
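
Built-in and custom converters can also be combined in one list. The sketch below is self-contained: it parses an inline string with CSV.parse instead of a file, and the sample data, the TO_BOOL constant, and the parse_products method are all assumptions made for illustration.

```ruby
require 'csv'

# Hypothetical inline data standing in for a file on disk.
CSV_TEXT = <<~ROWS
  name,price,qty,active
  Ruby Book,29.99,3,true
  Keyboard,89.99,1,false
ROWS

# Maps the strings 'true' and 'false' to booleans; other values pass through.
TO_BOOL = ->(value) { { 'true' => true, 'false' => false }.fetch(value, value) }

# Converters run in order for each field; conversion of a field stops as soon
# as one of them returns something that is no longer a String.
def parse_products(text)
  CSV.parse(text, headers: true, converters: [:numeric, TO_BOOL]).map(&:to_h)
end

parse_products(CSV_TEXT).each do |row|
  puts row.map { |key, value| "#{key}=#{value.inspect} (#{value.class})" }.join(', ')
end
```

Because a converter is called for every field, a custom one should return unrecognized values unchanged, as TO_BOOL does here.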

CSV Transformation

input  = CSV.read('input.csv', headers: true)
output = input.select { |row| row['age'].to_i >= 18 }
               .sort_by { |row| row['name'] }

CSV.open('filtered.csv', 'w') do |csv|
  csv << input.headers
  output.each { |row| csv << row }
end

puts "Filtered #{input.size} → #{output.size} rows"

4 JSON Processing

Beyond parsing API responses (covered in Chapter 5), JSON is commonly used for configuration files and data interchange on disk.

Read & Write JSON Files

require 'json'

config = JSON.parse(File.read('config.json'), symbolize_names: true)
puts config[:database][:host]

data = {
  app_name: 'MyApp',
  version:  '2.1.0',
  database: {
    host: 'localhost',
    port: 5432,
    name: 'myapp_production'
  },
  features: %w[auth logging cache]
}

File.write('config.json', JSON.pretty_generate(data))

File.write('compact.json', data.to_json)

Streaming Large JSON

File.open('records.jsonl', 'r') do |f|
  f.each_line do |line|
    record = JSON.parse(line, symbolize_names: true)
    process_record(record)
  end
end

File.open('output.jsonl', 'w') do |f|
  records.each do |record|
    f.puts record.to_json
  end
end

💡 JSON Lines (JSONL) Format

For large datasets, JSONL (one JSON object per line) is more memory-efficient than a single large JSON array. Each line can be parsed independently, making it suitable for streaming and log processing.
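
As a rough illustration of parsing each line independently, the sketch below round-trips a few records through an in-memory buffer. StringIO stands in for a real file, and write_jsonl, each_jsonl, and the sample records are hypothetical names and data chosen for this example.

```ruby
require 'json'
require 'stringio'

# Writes each record as one JSON document per line.
def write_jsonl(io, records)
  records.each { |record| io.puts(JSON.generate(record)) }
end

# Yields one parsed record per non-blank line, so only the current line
# needs to be held in memory. Returns an Enumerator when no block is given.
def each_jsonl(io)
  return enum_for(:each_jsonl, io) unless block_given?

  io.each_line do |line|
    next if line.strip.empty?

    yield JSON.parse(line, symbolize_names: true)
  end
end

buffer = StringIO.new # stands in for File.open('records.jsonl', ...)
write_jsonl(buffer, [{ id: 1, level: 'INFO' }, { id: 2, level: 'ERROR' }])
buffer.rewind

errors = each_jsonl(buffer).select { |record| record[:level] == 'ERROR' }
puts "Found #{errors.size} error record(s), first id: #{errors.first[:id]}"
```

The same two methods work unchanged with a File handle, since both File and StringIO respond to puts and each_line.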

5 YAML Processing

YAML is the de facto configuration format in the Ruby ecosystem, used extensively in Rails (database.yml, locale files such as config/locales/en.yml) and many Ruby tools. The yaml library is part of the standard library.

Read & Write YAML

require 'yaml'

config = YAML.load_file('config.yml', permitted_classes: [Symbol])
puts config['database']['host']
puts config['database']['port']

config = {
  'app' => {
    'name'    => 'MyApp',
    'version' => '2.1.0',
    'debug'   => false
  },
  'database' => {
    'adapter'  => 'postgresql',
    'host'     => 'localhost',
    'port'     => 5432,
    'database' => 'myapp_prod',
    'pool'     => 10
  },
  'redis' => {
    'url' => 'redis://localhost:6379/0'
  }
}

File.write('config.yml', YAML.dump(config))

Configuration Pattern with Environments

# config/database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5

development:
  <<: *default
  database: myapp_dev
  host: localhost

production:
  <<: *default
  database: myapp_prod
  host: db.example.com
  pool: 25
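
# Ruby: load the section for the current environment.
# The file uses YAML aliases (&default / *default), which Ruby 3.1+ (Psych 4)
# rejects unless aliases: true is passed.
all_config = YAML.load_file('config/database.yml', aliases: true)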
env = ENV.fetch('RACK_ENV', 'development')
db_config = all_config[env]

puts "Connecting to #{db_config['database']} on #{db_config['host']}"
puts "Pool size: #{db_config['pool']}"

ERB + YAML (Dynamic Config)

require 'yaml'
require 'erb'

template = File.read('config.yml.erb')
rendered = ERB.new(template).result
config   = YAML.safe_load(rendered)

🔒 YAML Security

  • Use YAML.safe_load instead of YAML.load when loading untrusted input: it restricts deserialization to basic types.
  • In Ruby 3.1+ (Psych 4), YAML.load is safe by default: it raises for non-basic types unless they are listed in permitted_classes, and YAML.unsafe_load restores the old behavior.
  • Never load YAML from user input without safe_load: arbitrary object instantiation is a serious vulnerability.
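
The effect of these rules can be observed with a few small inputs. The following sketch assumes hypothetical sample documents and a helper named try_safe_load; it reports a rejection as a string instead of letting the exception propagate.

```ruby
require 'yaml'
require 'date'

# Returns the parsed value, or a short message when Psych refuses the input.
def try_safe_load(text, **options)
  YAML.safe_load(text, **options)
rescue Psych::DisallowedClass, Psych::BadAlias => e
  "rejected: #{e.class}"
end

with_date   = "released: 2024-01-15\n"        # parses to a Date, not a basic type
with_symbol = "mode: :fast\n"                 # parses to a Symbol
with_alias  = "base: &b 1\ncopy: *b\n"        # uses an anchor and an alias

p try_safe_load(with_date)
p try_safe_load(with_date, permitted_classes: [Date])
p try_safe_load(with_symbol)
p try_safe_load(with_symbol, permitted_classes: [Symbol])
p try_safe_load(with_alias)
p try_safe_load(with_alias, aliases: true)
```

Each input is rejected by default and accepted only once the corresponding class or feature is allowed explicitly, which keeps the list of permitted types visible at the call site.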

6 File Metadata

Ruby provides rich methods for inspecting file attributes: existence, type, size, timestamps, and permissions.

Existence & Type Checks

puts File.exist?('config.yml')       # true/false
puts File.file?('config.yml')        # true if regular file
puts File.directory?('lib')          # true if directory
puts File.symlink?('link.txt')       # true if symbolic link
puts File.readable?('secrets.yml')   # true if readable
puts File.writable?('output.txt')    # true if writable
puts File.executable?('script.sh')   # true if executable
puts File.zero?('empty.txt')         # true if zero-length

Size & Timestamps

puts File.size('data.csv')            # bytes
puts File.mtime('app.rb')            # last modified time
puts File.atime('app.rb')            # last access time
puts File.ctime('app.rb')            # last status change time
puts File.birthtime('app.rb')        # creation time; raises NotImplementedError where unsupported

stat = File.stat('app.rb')
puts "Size: #{stat.size} bytes"
puts "Mode: #{stat.mode.to_s(8)}"
puts "Owner UID: #{stat.uid}"

Practical: Directory Size Calculator

def dir_size(path)
  Dir.glob(File.join(path, '**', '*'))
     .select { |f| File.file?(f) }
     .sum { |f| File.size(f) }
end

def format_size(bytes)
  units = %w[B KB MB GB TB]
  return '0 B' if bytes.zero?

  exp = (Math.log(bytes) / Math.log(1024)).to_i
  exp = units.size - 1 if exp >= units.size
  format('%.1f %s', bytes.to_f / (1024**exp), units[exp])
end

path = ARGV[0] || '.'
total = dir_size(path)
puts "Total size of #{path}: #{format_size(total)}"

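# Compute each entry's real size first, so directories are ranked by their
# contents and not by the size of the directory entry itself.
entries = Dir.glob(File.join(path, '*')).map do |f|
  [f, File.file?(f) ? File.size(f) : dir_size(f)]
end
entries.sort_by { |_, size| -size }.first(10).each do |f, size|
  type = File.directory?(f) ? '📁' : '📄'
  puts "  #{type} #{format_size(size).rjust(10)} #{File.basename(f)}"
end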

🔄 Cross-Language Comparison

  • Ruby File.exist? ≈ Python os.path.exists() ≈ Node.js fs.existsSync() ≈ PHP file_exists()
  • Ruby Dir.glob ≈ Python glob.glob() ≈ Node.js glob ≈ PHP glob()
  • Ruby FileUtils ≈ Python shutil ≈ Node.js fs-extra

7 Chapter Summary

📖 File Read/Write

File.read/File.write for simple operations, File.open with block for guaranteed cleanup.

📁 Directory Operations

Dir.glob for pattern matching, FileUtils for copy/move/delete, Pathname for OO paths.

📊 CSV Processing

CSV.read with headers, CSV.foreach for streaming, converters for automatic type casting.

📋 JSON Files

JSON.parse(File.read(...)) to load, JSON.pretty_generate for formatted output, JSONL for streaming.

⚙️ YAML Config

YAML.load_file for reading, YAML.dump for writing. Use safe_load for untrusted input.

🔍 File Metadata

File.exist?, File.size, File.mtime, File.stat for inspecting file attributes and permissions.

Next Chapter Preview: Chapter 7 covers Ruby's ecosystem tools, including a Bundler deep dive, RSpec testing, Rake build automation, popular gems, and code quality tools like RuboCop.