Working with Files

Last updated on June 30th, 2024 at 03:20 pm

Standard File Operations
- Opening
- Reading
- Writing
CSV Files
- Digging in CSV files
YAML Files
JSON Files
File Utilities

Standard File Operations

Ruby can natively read and write many different file types. Common to all file types is the need to open, read, write and close.

Opening

A file is opened in Ruby using the open method, which in its simplest form just takes a file name and attempts to open the file for reading.

my_file = File.open(filename)

The open method also accepts optional parameters. For example, to open a file explicitly as read-only, pass the mode parameter ‘r’ (this is also the default when no mode is given).

my_file = File.open(filename, 'r')

You can also open the file for writing by passing ‘w’ as the mode parameter.

my_file = File.open(filename, 'w')

The mode parameters available to you are:

Mode	Purpose
‘r’	Read only, starts reading at the beginning of the file
‘r+’	Read-write, starts at the beginning of the file
‘w’	Write only. Truncates the file if it already exists. Creates a new file if one does not exist.
‘w+’	Read-write. Truncates the file if it already exists. Creates a new file if one does not exist.
‘a’	Write-only. Each write call appends data to end of file. Creates a new file if one does not exist.
‘a+’	Read-write. Each write call appends data to end of file. Creates a new file if one does not exist.
	Additional Modes – must be used in accompaniment with previous modes
‘b’	Binary file mode. Will not use EOL conversion. Sets external encoding to ASCII-8BIT by default
‘t’	Text file mode.

File Modes when opening

For example, let’s assume we want to open a file write-only and binary.

my_file = File.open(filename, 'wb')
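File.open also accepts a block. In that form the file is closed automatically when the block finishes, even if an exception is raised inside it. The following is a minimal sketch of three of the modes from the table; the file name modes_demo.txt and the temporary directory are assumptions made only to keep the example self-contained.

```ruby
require 'tmpdir'
require 'fileutils'

# Scratch directory and hypothetical file name for the demonstration
dir = Dir.mktmpdir
path = File.join(dir, 'modes_demo.txt')

# 'w' creates the file (or truncates an existing one)
File.open(path, 'w') { |f| f.write("first line\n") }

# 'a' appends to the end of the file instead of truncating it
File.open(path, 'a') { |f| f.write("second line\n") }

# 'r' reads from the beginning; the block's value is returned by File.open
contents = File.open(path, 'r') { |f| f.read }
puts contents

# Clean up the scratch directory
FileUtils.remove_entry(dir)
```

Because each block closes its own file handle, there is no explicit close call to forget.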

Reading

If you want to open a file and read in one statement, then the read method is just what you’re looking for. It will open, read and close the file in one statement.

my_data = File.read("some_data.txt")

File.read returns the entire contents of the file as a single string. If you are reading plain text, the lines within that string are separated by newline characters (\n).

If your application requires that you separate the open, read and close operations, you can break these into their separate methods.

my_file = File.open('some_data.txt')
my_data = my_file.readlines
my_file.close
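To make the difference between read and readlines concrete, here is a small sketch. It writes a temporary file with made-up contents first so that it can run on its own; the chomp: true option shown on the last call strips the trailing newline from each line.

```ruby
require 'tempfile'

# Create a scratch file with made-up contents
tmp = Tempfile.new('some_data')
tmp.write("one\ntwo\n")
tmp.close

whole = File.read(tmp.path)                   # one String, newlines included
lines = File.readlines(tmp.path)              # Array of lines, each ending in "\n"
clean = File.readlines(tmp.path, chomp: true) # Array of lines without the "\n"

p whole # => "one\ntwo\n"
p lines # => ["one\n", "two\n"]
p clean # => ["one", "two"]

tmp.unlink
```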

If your application requires that you read the file one line at a time, then you can use the readline method instead of the readlines method.

my_file = File.open('some_data.txt')
begin
  while (my_line = my_file.readline)
    # ... do something useful with the data just read in ...
  end
rescue EOFError
  # readline raises EOFError at the end of the file, which ends the loop
ensure
  my_file.close
end
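If the explicit loop and the EOFError handling feel heavy, each_line and File.foreach iterate one line at a time and simply stop at the end of the file. A sketch, again using a temporary file with made-up contents:

```ruby
require 'tempfile'

# Create a scratch file so the example is self-contained
tmp = Tempfile.new('some_data')
tmp.write("alpha\nbeta\ngamma\n")
tmp.close

# File.foreach opens the file, yields each line, and closes it afterwards
line_lengths = []
File.foreach(tmp.path) do |line|
  line_lengths << line.chomp.length
end

# The block form of File.open combined with each_line works the same way
upcased = File.open(tmp.path) { |f| f.each_line.map { |line| line.chomp.upcase } }

p line_lengths # => [5, 4, 5]
p upcased      # => ["ALPHA", "BETA", "GAMMA"]

tmp.unlink
```

Both forms read one line at a time, so they are a reasonable fit for files that are too large to load with read or readlines.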

One quick note – if you are looking for a summary of commands available to operate on Files, you not only have the File class to look into, but also the IO class because File inherits from IO.
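You can confirm the inheritance relationship from within Ruby itself. This short sketch asks File for its superclass and checks where the readline method used earlier is actually defined:

```ruby
# File is a direct subclass of IO
file_parent = File.superclass
p file_parent # => IO

# readline is defined on IO and inherited by File
readline_owner = File.instance_method(:readline).owner
p readline_owner # => IO
```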

Writing

Writing to a file is just as simple as reading from it, with the difference that the file has to be opened with the write option.

my_file = File.open('some_data.txt', 'w')
my_file.write 'text to write'
my_file.close
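As with reading, there are shorter forms for writing. The sketch below shows the block form of File.open, the difference between write and puts, and the one-call File.write with the mode: 'a' option for appending. The temporary file is only there so the example can run without touching your own files.

```ruby
require 'tempfile'

# Scratch file for the demonstration
tmp = Tempfile.new('some_data')
tmp.close
path = tmp.path

# Block form: the file is flushed and closed when the block ends
File.open(path, 'w') do |f|
  f.write('text to write') # write adds no newline
  f.puts                   # puts with no arguments writes just a newline
  f.puts 'second line'     # puts appends a newline if one is missing
end

# File.write opens, writes, and closes in one call; it returns the byte count
bytes = File.write(path, "appended\n", mode: 'a')
p bytes # => 9

result = File.read(path)
puts result

tmp.unlink
```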

CSV Files

CSV is a very commonly used file format, and Ruby ships with support for it in the csv library. You will have to add the statement require 'csv' before using the CSV class, which operates in much the same way as the File class.

require 'csv'

my_file = CSV.open('some_data.csv')
begin
  # readline returns each row as an array, and nil at the end of the file
  while (my_row = my_file.readline)
    # ... do something useful with the row just read in ...
  end
rescue CSV::MalformedCSVError # for example...
  # Do something to rescue if you want
ensure
  my_file.close
end

Another, more convenient way of reading in a CSV file would be to use the foreach method.

require 'csv'

CSV.foreach(filename, headers: true).with_index(1) do |row, row_index|
    row.each_with_index do |(header, value), index|
      # do something with the row
    end
end

In the example above, if an error occurs while processing the CSV file, you can use row_index to tell the user which row the problem occurred on. Because the iteration uses .with_index(1), numbering starts at 1, so there is no need to add 1 to row_index when reporting the row. Note that with headers: true the header line is not yielded, so row 1 is the first data row. The headers: true option also means that each row is read in as a CSV::Row, which behaves much like a hash with the column header as the key and the column value as the value. This allows you to use the column header in your error message as well.

Note also that when you use the foreach iterator, the file is closed automatically once reading is complete.
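Here is one way the row_index idea might look in practice. This is a sketch under assumptions: the sample data, the age column, and the rule that age must be an integer are all invented for illustration.

```ruby
require 'csv'
require 'tempfile'

# Made-up sample data; the age on the second data row is deliberately invalid
tmp = Tempfile.new(['people', '.csv'])
tmp.write("name,age\nAlice,30\nBob,abc\nCharlie,35\n")
tmp.close

errors = []
CSV.foreach(tmp.path, headers: true).with_index(1) do |row, row_index|
  begin
    # Integer() raises ArgumentError for "abc" and TypeError for nil
    Integer(row['age'])
  rescue ArgumentError, TypeError
    errors << "Row #{row_index}, column 'age': invalid value #{row['age'].inspect}"
  end
end

puts errors # => Row 2, column 'age': invalid value "abc"

tmp.unlink
```

Collecting the messages in an array rather than stopping at the first failure lets you report every bad row in a single pass.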

Digging in CSV files

In the Ruby Basics section, we covered using the dig method for arrays, hashes and structs. You can also use the dig method with CSV data. There are two options: digging into a table and digging into a row.

To dig a table, you would supply both the row number and the column name:

require 'csv'

# Sample CSV content
# name,age,city
# Alice,30,New York
# Bob,25,Los Angeles
# Charlie,35,Chicago

csv_text = <<~CSV
  name,age,city
  Alice,30,New York
  Bob,25,Los Angeles
  Charlie,35,Chicago
CSV

csv_table = CSV.parse(csv_text, headers: true)

# Using dig to access data in CSV::Table
alice_city = csv_table.dig(0, 'city')
bob_age = csv_table.dig(1, 'age')

puts "Alice's city: #{alice_city}" # => Alice's city: New York
puts "Bob's age: #{bob_age}" # => Bob's age: 25

To dig in a row, you simply supply the column name:

require 'csv'

# Sample CSV content
# name,age,city
# Alice,30,New York
# Bob,25,Los Angeles
# Charlie,35,Chicago

csv_text = <<~CSV
  name,age,city
  Alice,30,New York
  Bob,25,Los Angeles
  Charlie,35,Chicago
CSV

csv_table = CSV.parse(csv_text, headers: true)

# Using dig to access data in CSV::Row
alice_row = csv_table[0]
bob_row = csv_table[1]

alice_age = alice_row.dig('age')
bob_city = bob_row.dig('city')

puts "Alice's age: #{alice_age}" # => Alice's age: 30
puts "Bob's city: #{bob_city}" # => Bob's city: Los Angeles

Similar to arrays and hashes, if you dig for an element that is not found, dig returns nil instead of raising an error. Empty fields are also read in as nil, so you can supply a default value with the || operator.

require 'csv'

# Sample CSV content with missing fields
# name,age,city
# Alice,30,New York
# Bob,,Los Angeles
# Charlie,35,

csv_text = <<~CSV
  name,age,city
  Alice,30,New York
  Bob,,Los Angeles
  Charlie,35,
CSV

csv_table = CSV.parse(csv_text, headers: true)

# Using dig to safely access data that might be missing
bob_age = csv_table.dig(1, 'age') || 'Unknown'
charlie_city = csv_table.dig(2, 'city') || 'Unknown'

puts "Bob's age: #{bob_age}" # => Bob's age: Unknown
puts "Charlie's city: #{charlie_city}" # => Charlie's city: Unknown
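The same nil result applies when the row or the column does not exist at all, not only when a field is empty. A minimal sketch, using a one-row table and a made-up country column that is not in the data:

```ruby
require 'csv'

csv_table = CSV.parse("name,age\nAlice,30\n", headers: true)

missing_row    = csv_table.dig(5, 'age')     # there is no row at index 5
missing_column = csv_table.dig(0, 'country') # there is no 'country' column

p missing_row    # => nil
p missing_column # => nil
```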

YAML Files

Ruby also supports reading and writing YAML files as part of the standard library. As with the CSV support, you will need to include the YAML support with a require statement. Because YAML files are not line oriented, you read the whole file in, or write the whole file out, with a single method call that also handles opening and closing the file.

To read YAML files, you will need to use the load_file method.

require 'yaml'

my_yml_data = YAML.load_file('some_data.yml')
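The value returned by load_file mirrors the structure of the document: mappings become hashes, sequences become arrays, and scalars become strings, numbers, and so on. A sketch with invented configuration data, written to a temporary file so the example can run on its own:

```ruby
require 'yaml'
require 'tempfile'

# Made-up YAML content for the demonstration
tmp = Tempfile.new(['some_data', '.yml'])
tmp.write(<<~YAML)
  database:
    host: localhost
    port: 5432
  features:
    - search
    - export
YAML
tmp.close

my_yml_data = YAML.load_file(tmp.path)

p my_yml_data['database']['port']     # => 5432
p my_yml_data['features']             # => ["search", "export"]
p my_yml_data.dig('database', 'host') # => "localhost"

tmp.unlink
```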

Writing to a YAML file takes two steps. First, you dump the object to a YAML string, and then you write that string to a file.

require 'yaml'

Auto = Struct.new(:make, :model, :year)
audi = Auto.new('Audi', 'A1', 2004)
serialized_auto = YAML.dump(audi)
File.write('Autos.yml', serialized_auto)

Alternatively, you can use to_yaml instead of the dump method.

require 'yaml'

Auto = Struct.new(:make, :model, :year)
yaml_auto = Auto.new('Audi', 'A1', 2004).to_yaml
File.write('Autos.yml', yaml_auto)
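Depending on your Ruby version, loading a dumped Struct back with load_file may be refused unless the class is explicitly permitted. One way to sidestep this, sketched here as an alternative rather than the only approach, is to serialize a plain hash with string keys and rebuild the Struct after loading. The temporary file name is an assumption for the demonstration.

```ruby
require 'yaml'
require 'tempfile'

Auto = Struct.new(:make, :model, :year)
audi = Auto.new('Audi', 'A1', 2004)

# Convert to a hash with string keys so the YAML contains only basic types
plain = audi.to_h.transform_keys(&:to_s)

tmp = Tempfile.new(['autos', '.yml'])
tmp.close
File.write(tmp.path, plain.to_yaml)

# Load the hash back and rebuild the Struct from its values
loaded = YAML.load_file(tmp.path)
restored = Auto.new(loaded['make'], loaded['model'], loaded['year'])

p loaded['year']   # => 2004
p restored == audi # => true

tmp.unlink
```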

JSON Files

JSON files are similar to YAML files in that they contain structured data and cannot be read in one line at a time. Instead, you will need to read the contents in and use JSON to parse them. The parsed result is then an ordinary hash (or an array, if the top-level JSON value is an array).

require 'json'

# Read the file
file_content = File.read('data.json')

# Parse the JSON content
data = JSON.parse(file_content)

# Now you can use the data
data['users'].each do |user|
  puts "User ID: #{user['id']}, Name: #{user['name']}, Email: #{user['email']}"
end
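To show what the parsed data looks like without needing a data.json file, the sketch below parses an inline string standing in for the file contents. The user record is made up. It also shows the symbolize_names option and the error raised for invalid input.

```ruby
require 'json'

# Hypothetical file contents, inlined as a string
file_content = '{"users": [{"id": 1, "name": "Alice", "email": "alice@example.com"}]}'

data = JSON.parse(file_content)
p data.class                  # => Hash
p data['users'].class         # => Array
p data['users'].first['name'] # => "Alice"

# symbolize_names: true returns symbol keys instead of string keys
sym = JSON.parse(file_content, symbolize_names: true)
p sym[:users].first[:id]      # => 1

# Invalid JSON raises JSON::ParserError
parse_failed = false
begin
  JSON.parse('{not valid json}')
rescue JSON::ParserError
  parse_failed = true
end
p parse_failed # => true
```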

Normally, when creating a JSON file, it is formatted so that it is easier to read in a text editor. To write pretty-printed JSON to a file, use the pretty_generate method.

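require 'json'

# Sample data to serialize (substitute your own hash or array)
data = { 'users' => [{ 'id' => 1, 'name' => 'Alice', 'email' => 'alice@example.com' }] }
file_path = 'output.json'

# Convert the data to pretty-printed JSON
json_data = JSON.pretty_generate(data)

# Write the JSON data to the file
File.open(file_path, 'w') do |file|
  file.write(json_data)
end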
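For comparison, this sketch generates both the compact and the pretty-printed form of the same made-up data, then writes the pretty form to a temporary file and parses it back. Both forms parse to identical data; only the whitespace differs.

```ruby
require 'json'
require 'tempfile'

# Made-up sample data
data = { 'users' => [{ 'id' => 1, 'name' => 'Alice' }] }

compact = JSON.generate(data)
pretty  = JSON.pretty_generate(data)

puts compact # => {"users":[{"id":1,"name":"Alice"}]}
puts pretty  # the same data spread over several indented lines

# Write the pretty form out and read it back
tmp = Tempfile.new(['output', '.json'])
tmp.close
File.write(tmp.path, pretty)
round_trip = JSON.parse(File.read(tmp.path))

p round_trip == data # => true

tmp.unlink
```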

File Utilities

Below is a table summarizing the most commonly used file utilities available in Ruby. You will see that there is a great deal of duplication between the various classes, and some methods simply call a method of the same name in a different class.

Method Name	Method Purpose
	File Class Methods
	Creation and Opening
new	Opens the file at the given `path` according to the given `mode`; creates and returns a new File object for that file.
open	Creates a new File object, via `File.new` with the given arguments.
	Reading and Writing
read	Reads the full contents of the file.
readlines	Reads and returns all remaining lines from the stream; does not modify `$_`.
write	Writes each of the given `objects` to `self`, which must be opened for writing; returns the total number of bytes written; each of `objects` that is not a string is converted via method `to_s`.
binread	Behaves like `IO.read`, except that the stream is opened in binary mode with ASCII-8BIT encoding.
binwrite	Behaves like `IO.write`, except that the stream is opened in binary mode with ASCII-8BIT encoding.
foreach	Calls the block with each successive line read from the stream.
	File Information
exist?	Returns `true` if the named file exists.
file?	Returns `true` if the named `file` exists and is a regular file.
directory?	Returns `true` if the given `path` leads to a directory, or to a symbolic link to a directory; `false` otherwise.
size	Returns the size of `file_name`.
size?	Returns `nil` if `file_name` doesn’t exist or has zero size, the size of the file otherwise.
zero?	Returns `true` if the named file exists and has a zero size.
basename	Returns the last component of the filename given in file_name (after first stripping trailing separators), which can be formed using both `File::SEPARATOR` and `File::ALT_SEPARATOR` as the separator when `File::ALT_SEPARATOR` is not `nil`.
dirname	Returns all components of the filename given in file_name except the last one (after first stripping trailing separators). The filename can be formed using both `File::SEPARATOR` and `File::ALT_SEPARATOR` as the separator when `File::ALT_SEPARATOR` is not `nil`.
extname	Returns the extension (the portion of file name in `path` starting from the last period).
split	Splits the given string into a directory and a file component and returns them in a two-element array. See also `File::dirname` and `File::basename`.
join	Returns a new string formed by joining the strings using `"/"`.
expand_path	Converts a pathname to an absolute pathname. Relative paths are referenced from the current working directory of the process unless `dir_string` is given, in which case it will be used as the starting point. The given pathname may start with a “`~`”, which expands to the process owner’s home directory (the environment variable `HOME` must be set correctly). “`~`user” expands to the named user’s home directory.
absolute_path	Converts a pathname to an absolute pathname. Relative paths are referenced from the current working directory of the process unless dir_string is given, in which case it will be used as the starting point. If the given pathname starts with a “`~`” it is NOT expanded, it is treated as a normal directory name.
realpath	Returns the real (absolute) pathname of pathname in the actual filesystem not containing symlinks or useless dots.
identical?	Returns `true` if the named files are identical.
	File Manipulation
delete / unlink	Deletes the named files, returning the number of names passed as arguments. Raises an exception on any error. Since the underlying implementation relies on the `unlink(2)` system call, the type of exception raised depends on its error type (see linux.die.net/man/2/unlink) and has the form of e.g. Errno::ENOENT.
rename	Renames the given file to the new name. Raises a `SystemCallError` if the file cannot be renamed.
symlink	Creates a symbolic link called new_name for the existing file old_name. Raises a NotImplemented exception on platforms that do not support symbolic links.
link	Creates a new name for an existing file using a hard link. Will not overwrite new_name if it already exists (raising a subclass of `SystemCallError`). Not available on all platforms.
chmod	Changes permission bits on the named file(s) to the bit pattern represented by mode_int. Actual effects are operating system dependent. On Unix systems, see `chmod(2)` for details. Returns the number of files processed.
chown	Changes the owner and group of the named file(s) to the given numeric owner and group id’s. Only a process with superuser privileges may change the owner of a file. The current owner of a file may change the file’s group to any group to which the owner belongs. A `nil` or -1 owner or group id is ignored. Returns the number of files processed.
truncate	Truncates the file file_name to be at most integer bytes long. Not available on all platforms.
utime	Sets the access and modification times of each named file to the first two arguments. If a file is a symlink, this method acts upon its referent rather than the link itself; for the inverse behavior see `File.lutime`. Returns the number of file names in the argument list.
	Access and Permission
readable?	Returns `true` if the named file is readable by the effective user and group id of this process.
writable?	Returns `true` if the named file is writable by the effective user and group id of this process.
executable?	Returns `true` if the named file is executable by the effective user and group id of this process.
readable_real?	Returns `true` if the named file is readable by the real user and group id of this process.
writable_real?	Returns `true` if the named file is writable by the real user and group id of this process.
executable_real?	Returns `true` if the named file is executable by the real user and group id of this process.
	File Times
atime	Returns the last access time for the named file as a `Time` object.
mtime	Returns the modification time for the named file as a `Time` object.
ctime	Returns the change time for the named file (the time at which directory information about the file was changed, not the file itself).
	FileUtils Module Methods
cp	Copies files.
mv	Moves entries.
rm	Removes entries at the paths in the given `list` (a single path or an array of paths); returns `list` if it is an array, `[list]` otherwise.
rm_f	Equivalent to: FileUtils.rm(list, force: true, **kwargs)
rm_r	Removes entries at the paths in the given `list` (a single path or an array of paths); returns `list`, if it is an array, `[list]` otherwise.
rm_rf	Equivalent to: FileUtils.rm_r(list, force: true, **kwargs)
ln	Creates hard links.
ln_s	Creates symbolic links.
mkdir	Creates directories at the paths in the given `list` (a single path or an array of paths); returns `list` if it is an array, `[list]` otherwise.
mkdir_p	Creates directories at the paths in the given `list` (a single path or an array of paths), also creating ancestor directories as needed; returns `list` if it is an array, `[list]` otherwise.
rmdir	Removes directories at the paths in the given `list` (a single path or an array of paths); returns `list`, if it is an array, `[list]` otherwise.
chmod / chmod_R	Changes permissions on the entries at the paths given in `list` (a single path or an array of paths) to the permissions given by `mode`; returns `list` if it is an array, `[list]` otherwise. The _R variant operates recursively.
chown / chown_R	Changes the owner and group on the entries at the paths given in `list` (a single path or an array of paths) to the given `user` and `group`; returns `list` if it is an array, `[list]` otherwise. The _R variant operates recursively.
touch	Updates modification times (mtime) and access times (atime) of the entries given by the paths in `list` (a single path or an array of paths); returns `list` if it is an array, `[list]` otherwise.
	Dir Class Methods
pwd	Returns the path to the current working directory of this process as a string.
chdir	Changes the current working directory of the process to the given string. When called without an argument, changes the directory to the value of the environment variable `HOME`, or `LOGDIR`. Raises a `SystemCallError` (probably Errno::ENOENT) if the target directory does not exist.
home	Returns the home directory of the current user or the named user if given.
entries	Returns an array containing all of the filenames in the given directory. Will raise a `SystemCallError` if the named directory doesn’t exist.
foreach	Calls the block once for each entry in the named directory, passing the filename of each entry as a parameter to the block.
glob	Expands `pattern`, which is a pattern string or an `Array` of pattern strings, and returns an array containing the matching filenames. If a block is given, calls the block once for each matching filename, passing the filename as a parameter to the block.
mkdir	Makes a new directory named by string, with permissions specified by the optional parameter anInteger. The permissions may be modified by the value of `File::umask`, and are ignored on NT. Raises a `SystemCallError` if the directory cannot be created. See also the discussion of permissions in the class documentation for `File`.
rmdir / delete	Deletes the named directory. Raises a subclass of `SystemCallError` if the directory isn’t empty.
	IO Class Methods (Parent class of File)
read	Reads bytes from the stream; the stream must be opened for reading.
write	Writes each of the given `objects` to `self`, which must be opened for writing; returns the total number of bytes written; each of `objects` that is not a string is converted via method `to_s`.
foreach	Calls the block with each successive line read from the stream.
popen	Executes the given command `cmd` as a subprocess whose $stdin and $stdout are connected to a new stream `io`.
sysopen	Opens the file at the given path with the given mode and permissions; returns the integer file descriptor.
copy_stream	Copies from the given `src` to the given `dst`, returning the number of bytes copied.
pipe	Creates a pair of pipe endpoints, `read_io` and `write_io`, connected to each other.
	Pathname Class Methods (To a large degree, duplicated in File class)
new	Create a `Pathname` object from the given `String` (or String-like object). If `path` contains a NULL character (`\0`), an `ArgumentError` is raised.
basename	Returns the last component of the path.
dirname	Returns all but the last component of the path.
extname	Returns the file’s extension.
exist?	Returns `true` if the named file exists.
directory?	Returns `true` if the path leads to a directory, or to a symbolic link to a directory; `false` otherwise.
file?	Returns `true` if the named `file` exists and is a regular file.
realpath	Returns the real (absolute) pathname for `self` in the actual filesystem.
join	Joins the given pathnames onto `self` to create a new `Pathname` object. This is effectively the same as using `Pathname#+` to append `self` and all arguments sequentially.
delete	Removes a file or directory, using `File.unlink` if `self` is a file, or `Dir.unlink` as necessary.
unlink	Removes a file or directory, using `File.unlink` if `self` is a file, or `Dir.unlink` as necessary.
rename	Rename the file.
chmod	Changes file permissions.
chown	Change owner and group of the file.
truncate	Truncates the file to `length` bytes.
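To see how several of these utilities fit together, here is a short sketch that builds a small directory tree in a temporary location, inspects it with File, Dir and Pathname, and then removes it with FileUtils. The directory and file names are invented for the example.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

root = Dir.mktmpdir # scratch directory for the demonstration

# FileUtils: create nested directories and copy a file
reports = File.join(root, 'reports', '2024')
FileUtils.mkdir_p(reports)
source = File.join(reports, 'summary.txt')
File.write(source, "hello\n")
FileUtils.cp(source, File.join(reports, 'summary_copy.txt'))

# File: path components and basic information
p File.basename(source)           # => "summary.txt"
p File.basename(source, '.txt')   # => "summary"
p File.extname(source)            # => ".txt"
p File.dirname(source) == reports # => true
p File.exist?(source)             # => true
p File.directory?(reports)        # => true
p File.zero?(source)              # => false

# Dir: find every .txt file below the root
names = Dir.glob(File.join(root, '**', '*.txt')).map { |f| File.basename(f) }.sort
p names # => ["summary.txt", "summary_copy.txt"]

# Pathname: the same queries in object form
path = Pathname.new(source)
sibling = path.dirname.join('other.txt')
p path.basename.to_s # => "summary.txt"
p path.extname       # => ".txt"
p sibling.exist?     # => false

# FileUtils: remove the whole tree
FileUtils.rm_rf(root)
removed = !Dir.exist?(root)
p removed # => true
```

Pathname wraps a path in an object, which can read more naturally than passing the same string to a series of File class methods.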
