Last updated on June 30th, 2024 at 03:20 pm
Standard File Operations
Ruby can natively read and write many different file types. Common to all file types is the need to open, read, write and close.
Opening
A file is opened in Ruby using the File.open method, which in its simplest form takes just a file name and attempts to open the file for reading.
my_file = File.open(filename)
The open method also accepts parameters and options. For example, to open a file read-only, add the mode parameter ‘r’ (this is also the default when no mode is given).
my_file = File.open(filename, 'r')
You can also open the file for writing by using ‘w’ on the mode parameter.
my_file = File.open(filename, 'w')
The mode parameters available to you are:
Mode | Purpose |
‘r’ | Read only, starts reading at the beginning of the file |
‘r+’ | Read-write, starts at the beginning of the file |
‘w’ | Write only. Truncates the file if it already exists. Creates a new file if one does not exist. |
‘w+’ | Read-write. Truncates the file if it already exists. Creates a new file if one does not exist. |
‘a’ | Write-only. Each write call appends data to end of file. Creates a new file if one does not exist. |
‘a+’ | Read-write. Each write call appends data to end of file. Creates a new file if one does not exist. |
Additional modes (must be combined with one of the modes above) | |
‘b’ | Binary file mode. Will not use EOL conversion. Sets external encoding to ASCII-8BIT by default |
‘t’ | Text file mode. |
For example, let’s assume we want to open a file write-only and binary.
my_file = File.open(filename, 'wb')
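As a rough sketch of how the modes in the table behave, the snippet below writes to, appends to, and then re-reads a scratch file. It uses the block form of File.open, which closes the file automatically when the block ends. The helper name demo_modes and the file name modes_demo.txt are made up for this illustration, and the file lives in a temporary directory so nothing else on disk is touched.

```ruby
require 'tmpdir'

# Hypothetical helper: exercises the 'w', 'a' and 'r' modes on a scratch file.
def demo_modes
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'modes_demo.txt')

    # 'w' creates the file (or truncates it if it already exists).
    File.open(path, 'w') { |f| f.write("first line\n") }

    # 'a' appends to the end instead of truncating.
    File.open(path, 'a') { |f| f.write("second line\n") }

    # 'w' again: the previous contents are discarded.
    File.open(path, 'w') { |f| f.write("replaced\n") }
    File.open(path, 'a') { |f| f.write("appended\n") }

    # 'r' reads from the beginning of the file; the block's value is returned.
    File.open(path, 'r') { |f| f.read }
  end
end

puts demo_modes
# => replaced
# => appended
```

Because the second 'w' open truncates the file, only the last two writes survive.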
Reading
If you want to open a file and read in one statement, then the read method is just what you’re looking for. It will open, read and close the file in one statement.
my_data = File.read("some_data.txt")
File.read always returns a single string containing the entire contents of the file. If you are reading simple ASCII text, the lines within that string are separated by newline characters (\n).
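To make the shape of the returned data concrete, here is a small sketch that reads the same file two ways: File.read gives one string, while File.readlines gives an array with one element per line. The helper name read_both_ways and the sample contents are assumptions made for this example only.

```ruby
require 'tmpdir'

# Hypothetical helper: writes a small text file and reads it back two ways.
def read_both_ways
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'some_data.txt')
    File.write(path, "alpha\nbeta\ngamma\n")

    whole = File.read(path)                   # one String, newlines included
    lines = File.readlines(path, chomp: true) # Array of lines, newlines removed

    [whole, lines]
  end
end

whole, lines = read_both_ways
puts whole.class   # => String
puts lines.inspect # => ["alpha", "beta", "gamma"]
```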
If your application requires that you separate the open, read and close operations, you can break these into their separate methods.
my_file = File.open('some_data.txt')
my_data = my_file.readlines
my_file.close
If your application requires that you read the file one line at a time, then you can use the readline method instead of the readlines method.
my_file = File.open('some_data.txt')
begin
  while (my_line = my_file.readline)
    # ... do something useful with the data just read in ...
  end
rescue EOFError
  # readline raises EOFError once the end of the file is reached
ensure
  my_file.close
end
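If you would rather not manage the open and close calls yourself, File.foreach is one alternative for line-by-line reading: it yields each line to a block and closes the file when it finishes. The sketch below is only an illustration; the helper name line_stats and the sample data are invented here.

```ruby
require 'tmpdir'

# Hypothetical helper: counts the lines in a file and finds the longest one.
def line_stats(path)
  count = 0
  longest = ''
  # File.foreach yields one line at a time and closes the file when it finishes.
  File.foreach(path, chomp: true) do |line|
    count += 1
    longest = line if line.length > longest.length
  end
  [count, longest]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'some_data.txt')
  File.write(path, "one\nthree\ntwo\n")
  count, longest = line_stats(path)
  puts "#{count} lines, longest: #{longest}" # => 3 lines, longest: three
end
```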
One quick note: if you are looking for a summary of the methods available for working with files, look not only at the File class but also at the IO class, because File inherits from IO.
Writing
Writing to a file is just as simple as reading from it, with the difference that the file has to be opened with the write option.
my_file = File.open('some_data.txt', 'w')
my_file.write 'text to write'
my_file.close
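For short writes, File.write can do the open, write and close in one call, much like File.read does for reading. The following sketch, with an invented helper name write_examples, also shows the difference between puts (which adds a newline) and write (which does not) when appending.

```ruby
require 'tmpdir'

# Hypothetical helper: writes a file in one call, then appends to it.
def write_examples
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'some_data.txt')

    # File.write opens, writes and closes; it returns the number of bytes written.
    bytes = File.write(path, "text to write\n")

    # Append mode keeps the existing contents.
    File.open(path, 'a') do |f|
      f.puts 'a line added with puts' # puts appends a newline
      f.write 'no newline here'       # write does not
    end

    [bytes, File.read(path)]
  end
end

bytes, contents = write_examples
puts bytes    # => 14
puts contents
```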
CSV Files
CSV is a very commonly used file format, and support for it is included in the standard Ruby library (you will, however, have to add the statement require 'csv' before using the CSV class). It operates in much the same way as the File class.
require 'csv'
my_file = CSV.open('some_data.csv')
begin
  # CSV#readline returns nil at the end of the file, which ends the loop
  while (my_line = my_file.readline)
    # ... do something useful with the row just read in ...
  end
rescue CSV::MalformedCSVError # for example...
  # Do something to rescue if you want
ensure
  my_file.close
end
Another, more convenient way of reading in a CSV file is to use the foreach method.
require 'csv'
CSV.foreach(filename, headers: true).with_index(1) do |row, row_index|
  row.each_with_index do |(header, value), index|
    # do something with the row
  end
end
In the example above, if an error occurs while processing the CSV file, you can use row_index to tell the user which row the problem occurred on. Because the iteration uses .with_index(1), which starts the count at 1, there is no need to write row_index + 1 in the message. Note that with headers: true the header line is not counted, so row 1 is the first data row. Opening the file with the headers: true option also means that each row is read in as a CSV::Row, which behaves much like a hash with the column header as the key and the column value as the value, so you can include the column header in your error message as well.
Note also that when you use the foreach iterator, the file is closed automatically once reading is complete.
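The error-reporting idea described above might look something like the following sketch. The helper name age_errors, the column name age and the sample data are all assumptions for illustration; the point is only that row_index can be dropped straight into a message.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: collects a message for every row whose 'age' column
# is not a whole number.
def age_errors(path)
  errors = []
  CSV.foreach(path, headers: true).with_index(1) do |row, row_index|
    begin
      Integer(row['age'])
    rescue ArgumentError, TypeError
      # ArgumentError: not numeric; TypeError: the field was empty (nil).
      errors << "Row #{row_index}: invalid age #{row['age'].inspect}"
    end
  end
  errors
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'people.csv')
  File.write(path, "name,age\nAlice,30\nBob,abc\nCharlie,\n")
  puts age_errors(path)
  # => Row 2: invalid age "abc"
  # => Row 3: invalid age nil
end
```

The row numbers refer to data rows, so Bob is reported as row 2 even though his record is on the third line of the file.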
Digging in CSV files
In the Ruby Basics section, we covered using the dig method for arrays, hashes and structs. You can also use the dig method with CSV data. There are two ways to use dig within CSV data: digging into tables and digging into rows.
To dig a table, you would supply both the row number and the column name:
require 'csv'
# Sample CSV content
# name,age,city
# Alice,30,New York
# Bob,25,Los Angeles
# Charlie,35,Chicago
csv_text = <<~CSV
name,age,city
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
CSV
csv_table = CSV.parse(csv_text, headers: true)
# Using dig to access data in CSV::Table
alice_city = csv_table.dig(0, 'city')
bob_age = csv_table.dig(1, 'age')
puts "Alice's city: #{alice_city}" # => Alice's city: New York
puts "Bob's age: #{bob_age}" # => Bob's age: 25
To dig in a row, you simply supply the column name:
require 'csv'
# Sample CSV content
# name,age,city
# Alice,30,New York
# Bob,25,Los Angeles
# Charlie,35,Chicago
csv_text = <<~CSV
name,age,city
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
CSV
csv_table = CSV.parse(csv_text, headers: true)
# Using dig to access data in CSV::Row
alice_row = csv_table[0]
bob_row = csv_table[1]
alice_age = alice_row.dig('age')
bob_city = bob_row.dig('city')
puts "Alice's age: #{alice_age}" # => Alice's age: 30
puts "Bob's city: #{bob_city}" # => Bob's city: Los Angeles
As with arrays and hashes, if you dig for an element that is not found (for example, an empty field), dig returns nil instead of raising an error.
require 'csv'
# Sample CSV content with missing fields
# name,age,city
# Alice,30,New York
# Bob,,Los Angeles
# Charlie,35,
csv_text = <<~CSV
name,age,city
Alice,30,New York
Bob,,Los Angeles
Charlie,35,
CSV
csv_table = CSV.parse(csv_text, headers: true)
# Using dig to safely access data that might be missing
bob_age = csv_table.dig(1, 'age') || 'Unknown'
charlie_city = csv_table.dig(2, 'city') || 'Unknown'
puts "Bob's age: #{bob_age}" # => Bob's age: Unknown
puts "Charlie's city: #{charlie_city}" # => Charlie's city: Unknown
YAML Files
Ruby also supports reading YAML files as part of the standard library. As with CSV, you need to load the YAML support with a require statement. Because YAML files are not line-oriented, you read everything in, or write everything out, in a single call rather than line by line.
To read YAML files, use the load_file method.
require 'yaml'
my_yml_data = YAML.load_file('some_data.yml')
Writing to a YAML file takes one more step. First, serialize the object to a YAML string with dump, and then write that string to a file.
require 'yaml'
Auto = Struct.new(:make, :model, :year)
audi = Auto.new('Audi', 'A1', 2004)
serialized_auto = YAML.dump(audi)
File.write('Autos.yml', serialized_auto)
Alternatively, you can use to_yaml instead of the dump method.
require 'yaml'
Auto = Struct.new(:make, :model, :year)
yaml_auto = Auto.new('Audi', 'A1', 2004).to_yaml
File.write('Autos.yml', yaml_auto)
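To see writing and reading working together, here is a minimal round-trip sketch that uses a plain hash rather than a Struct. The helper name yaml_round_trip, the file name settings.yml and the sample data are made up for this example. One caveat, stated as an assumption about your environment: depending on the Ruby and Psych version, load_file may only load basic types (strings, numbers, arrays, hashes) by default, so custom classes can need extra options when read back.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: writes data to a YAML file and loads it back.
def yaml_round_trip(data)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'settings.yml')
    File.write(path, data.to_yaml) # serialize and write
    YAML.load_file(path)           # read and parse
  end
end

settings = { 'name' => 'demo', 'retries' => 3, 'hosts' => ['a.example', 'b.example'] }
loaded = yaml_round_trip(settings)
puts loaded == settings # => true
```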
JSON Files
JSON files are similar to YAML files in that they contain structured data and cannot be read one line at a time. Instead, read the contents in and use JSON.parse to parse them. The parsed result is an ordinary hash (or an array, if the top-level JSON value is an array).
require 'json'
# Read the file
file_content = File.read('data.json')
# Parse the JSON content
data = JSON.parse(file_content)
# Now you can use the data
data['users'].each do |user|
puts "User ID: #{user['id']}, Name: #{user['name']}, Email: #{user['email']}"
end
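The shape of the parsed result depends on the JSON itself, which the short sketch below tries to illustrate using an inline string instead of a file. The helper name user_names and the sample document are assumptions for this example; symbolize_names: true is an option of JSON.parse that turns the keys into symbols.

```ruby
require 'json'

# Hypothetical helper: pulls the user names out of a JSON document shaped
# like the example above.
def user_names(json_text)
  data = JSON.parse(json_text, symbolize_names: true) # keys become symbols
  data[:users].map { |user| user[:name] }
end

json_text = '{"users":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]}'

puts JSON.parse(json_text).class   # => Hash (a top-level JSON object)
puts JSON.parse('[1, 2, 3]').class # => Array (a top-level JSON array)
puts user_names(json_text).inspect # => ["Alice", "Bob"]
```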
Normally, a JSON file is formatted so that it is easy to read in a text editor. To write pretty-printed JSON to a file, use the pretty_generate method.
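require 'json'
# Example data and output path (placeholders for your own values)
data = { 'users' => [{ 'id' => 1, 'name' => 'Alice', 'email' => 'alice@example.com' }] }
file_path = 'output.json'
# Convert the data to pretty-printed JSON
json_data = JSON.pretty_generate(data)
# Write the JSON data to the file
File.open(file_path, 'w') do |file|
  file.write(json_data)
end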
File Utilities
Below is a table of the most commonly used file utilities available in Ruby. You will see that there is a great deal of duplication between the various classes, and some methods simply call a method of the same name in a different class.
Method Name | Method Purpose |
File Class Methods | |
Creation and Opening | |
new | Opens the file at the given path according to the given mode; creates and returns a new File object for that file. |
open | Creates a new File object, via File.new with the given arguments. |
Reading and Writing | |
read | Reads the full contents of the file. |
readlines | Reads and returns all remaining lines from the stream; does not modify $_. |
write | Writes each of the given objects to self, which must be opened for writing; returns the total number of bytes written. Each object that is not a string is converted via to_s. |
binread | Behaves like IO.read, except that the stream is opened in binary mode with ASCII-8BIT encoding. |
binwrite | Behaves like IO.write, except that the stream is opened in binary mode with ASCII-8BIT encoding. |
foreach | Calls the block with each successive line read from the stream. |
File Information | |
exist? | Returns true if the named file exists. |
file? | Returns true if the named file exists and is a regular file. |
directory? | With a string path given, returns true if the path leads to a directory or to a symbolic link to a directory; false otherwise. |
size | Returns the size of file_name. |
size? | Returns nil if file_name doesn’t exist or has zero size, the size of the file otherwise. |
zero? | Returns true if the named file exists and has a zero size. |
basename | Returns the last component of the filename given in file_name (after first stripping trailing separators), which can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as the separator when File::ALT_SEPARATOR is not nil. |
dirname | Returns all components of the filename given in file_name except the last one (after first stripping trailing separators). The filename can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as the separator when File::ALT_SEPARATOR is not nil. |
extname | Returns the extension (the portion of the file name in path starting from the last period). |
split | Splits the given string into a directory and a file component and returns them in a two-element array. See also File.dirname and File.basename. |
join | Returns a new string formed by joining the strings using "/". |
expand_path | Converts a pathname to an absolute pathname. Relative paths are referenced from the current working directory of the process unless dir_string is given, in which case it will be used as the starting point. The given pathname may start with a “~”, which expands to the process owner’s home directory (the environment variable HOME must be set correctly). “~user” expands to the named user’s home directory. |
absolute_path | Converts a pathname to an absolute pathname. Relative paths are referenced from the current working directory of the process unless dir_string is given, in which case it will be used as the starting point. If the given pathname starts with a “~”, it is NOT expanded; it is treated as a normal directory name. |
realpath | Returns the real (absolute) pathname of pathname in the actual filesystem not containing symlinks or useless dots. |
identical? | Returns true if the named files are identical. |
File Manipulation | |
delete / unlink | Deletes the named files, returning the number of names passed as arguments. Raises an exception on any error. Since the underlying implementation relies on the unlink(2) system call, the type of exception raised depends on its error type (see linux.die.net/man/2/unlink) and has the form of e.g. Errno::ENOENT. |
rename | Renames the given file to the new name. Raises a SystemCallError if the file cannot be renamed. |
symlink | Creates a symbolic link called new_name for the existing file old_name. Raises a NotImplementedError on platforms that do not support symbolic links. |
link | Creates a new name for an existing file using a hard link. Will not overwrite new_name if it already exists (raising a subclass of SystemCallError). Not available on all platforms. |
chmod | Changes permission bits on the named file(s) to the bit pattern represented by mode_int. Actual effects are operating system dependent. On Unix systems, see chmod(2) for details. Returns the number of files processed. |
chown | Changes the owner and group of the named file(s) to the given numeric owner and group IDs. Only a process with superuser privileges may change the owner of a file. The current owner of a file may change the file’s group to any group to which the owner belongs. A nil or -1 owner or group ID is ignored. Returns the number of files processed. |
truncate | Truncates the file file_name to be at most integer bytes long. Not available on all platforms. |
utime | Sets the access and modification times of each named file to the first two arguments. If a file is a symlink, this method acts upon its referent rather than the link itself; for the inverse behavior see File.lutime. Returns the number of file names in the argument list. |
Access and Permission | |
readable? | Returns true if the named file is readable by the effective user and group id of this process. |
writable? | Returns true if the named file is writable by the effective user and group id of this process. |
executable? | Returns true if the named file is executable by the effective user and group id of this process. |
readable_real? | Returns true if the named file is readable by the real user and group id of this process. |
writable_real? | Returns true if the named file is writable by the real user and group id of this process. |
executable_real? | Returns true if the named file is executable by the real user and group id of this process. |
File Times | |
atime | Returns the last access time for the named file as a Time object. |
mtime | Returns the modification time for the named file as a Time object. |
ctime | Returns the change time for the named file (the time at which directory information about the file was changed, not the file itself). |
FileUtils Module Methods | |
cp | Copies files. |
mv | Moves entries. |
rm | Removes entries at the paths in the given list (a single path or an array of paths); returns list if it is an array, [list] otherwise. |
rm_f | Equivalent to: FileUtils.rm(list, force: true, **kwargs) |
rm_r | Recursively removes entries (files and directories) at the paths in the given list (a single path or an array of paths); returns list if it is an array, [list] otherwise. |
rm_rf | Equivalent to: FileUtils.rm_r(list, force: true, **kwargs) |
ln | Creates hard links. |
ln_s | Creates symbolic links. |
mkdir | Creates directories at the paths in the given list (a single path or an array of paths); returns list if it is an array, [list] otherwise. |
mkdir_p | Creates directories at the paths in the given list (a single path or an array of paths), also creating ancestor directories as needed; returns list if it is an array, [list] otherwise. |
rmdir | Removes directories at the paths in the given list (a single path or an array of paths); returns list if it is an array, [list] otherwise. |
chmod / chmod_R | Changes permissions on the entries at the paths given in list (a single path or an array of paths) to the permissions given by mode; returns list if it is an array, [list] otherwise. The _R variant operates recursively. |
chown / chown_R | Changes the owner and group on the entries at the paths given in list (a single path or an array of paths) to the given user and group; returns list if it is an array, [list] otherwise. The _R variant operates recursively. |
touch | Updates modification times (mtime) and access times (atime) of the entries given by the paths in list (a single path or an array of paths); returns list if it is an array, [list] otherwise. |
Dir Class Methods | |
pwd | Returns the path to the current working directory of this process as a string. |
chdir | Changes the current working directory of the process to the given string. When called without an argument, changes the directory to the value of the environment variable HOME or LOGDIR. Raises a SystemCallError (probably Errno::ENOENT) if the target directory does not exist. |
home | Returns the home directory of the current user or the named user if given. |
entries | Returns an array containing all of the filenames in the given directory. Will raise a SystemCallError if the named directory doesn’t exist. |
foreach | Calls the block once for each entry in the named directory, passing the filename of each entry as a parameter to the block. |
glob | Expands pattern, which is a pattern string or an Array of pattern strings, and returns an array containing the matching filenames. If a block is given, calls the block once for each matching filename, passing the filename as a parameter to the block. |
mkdir | Makes a new directory named by string, with permissions specified by the optional parameter anInteger. The permissions may be modified by the value of File.umask, and are ignored on NT. Raises a SystemCallError if the directory cannot be created. See also the discussion of permissions in the class documentation for File. |
rmdir / delete | Deletes the named directory. Raises a subclass of SystemCallError if the directory isn’t empty. |
IO Class Methods (Parent class of File) | |
read | Reads bytes from the stream; the stream must be opened for reading (see Access Modes). |
write | Writes each of the given objects to self, which must be opened for writing (see Access Modes); returns the total number of bytes written. Each object that is not a string is converted via to_s. |
foreach | Calls the block with each successive line read from the stream. |
popen | Executes the given command cmd as a subprocess whose $stdin and $stdout are connected to a new stream io. |
sysopen | Opens the file at the given path with the given mode and permissions; returns the integer file descriptor. |
copy_stream | Copies from the given src to the given dst, returning the number of bytes copied. |
pipe | Creates a pair of pipe endpoints, read_io and write_io, connected to each other. |
Pathname Class Methods (To a large degree, duplicated in File class) | |
new | Creates a Pathname object from the given String (or String-like object). If path contains a NULL character (\0), an ArgumentError is raised. |
basename | Returns the last component of the path. |
dirname | Returns all but the last component of the path. |
extname | Returns the file’s extension. |
exist? | Returns true if the named file exists. |
directory? | Returns true if the path leads to a directory or to a symbolic link to a directory; false otherwise. |
file? | Returns true if the named file exists and is a regular file. |
realpath | Returns the real (absolute) pathname for self in the actual filesystem. |
join | Joins the given pathnames onto self to create a new Pathname object. This is effectively the same as using Pathname#+ to append self and all arguments sequentially. |
delete | Removes a file or directory, using File.unlink if self is a file, or Dir.unlink as necessary. |
unlink | Removes a file or directory, using File.unlink if self is a file, or Dir.unlink as necessary. |
rename | Rename the file. |
chmod | Changes file permissions. |
chown | Change owner and group of the file. |
truncate | Truncates the file to length bytes. |
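As a brief, non-exhaustive sketch of how a few of the utilities in the table fit together, the snippet below builds a small directory tree in a temporary location and inspects it with File, FileUtils, Dir and Pathname. The helper name utilities_demo and all of the file and directory names are invented for this illustration.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Hypothetical helper: creates a scratch tree and reports a few facts about it.
def utilities_demo
  Dir.mktmpdir do |dir|
    nested = File.join(dir, 'reports', '2024')
    FileUtils.mkdir_p(nested) # creates 'reports' and '2024' in one call

    report = File.join(nested, 'summary.txt')
    File.write(report, 'hello')
    FileUtils.cp(report, File.join(nested, 'summary_copy.txt'))

    {
      basename: File.basename(report),         # "summary.txt"
      stem: File.basename(report, '.txt'),     # "summary"
      extname: File.extname(report),           # ".txt"
      size: File.size(report),                 # 5
      directory: File.directory?(nested),      # true
      # Dir.glob with base: returns paths relative to the given directory.
      found: Dir.glob('**/*.txt', base: dir).map { |p| File.basename(p) }.sort,
      # Pathname wraps the same operations in an object-oriented interface.
      parent: Pathname.new(report).dirname.basename.to_s # "2024"
    }
  end
end

utilities_demo.each { |key, value| puts "#{key}: #{value.inspect}" }
```

The temporary directory, and everything created inside it, is removed when the Dir.mktmpdir block ends.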