Intro
Lately I've been working on a project for the UConn admissions office. Part of the project was to create a WordPress plugin that would take a CSV file and read the rows into the database on a schedule. Which is easy enough right? The problem was that there were over 16,000 rows to insert....
So.
Obviously (even though I knew it wouldn't work) I tried just inserting them. But that took about 30 seconds to blow through the memory constraint on the server. So I want to share a few solutions I worked through first, and then explain why I landed on a generator function.
In this post you'll learn:
- Some solutions I considered and rejected
- What a PHP generator function is
- When and why to use a generator
- Potential downsides to generators
Possible (Rejected) Solutions
Batching
One choice was to set an arbitrary number of rows (say 500). Then inside a while
loop keep track of that. Then every time that limit is reached, restart some process. That would work, but seems inelegant.
File manipulation
Another choice was to read the file and then split it and write new files to the file system. Each one of those could have an arbitrary number of rows. Then I could read each file separately. But this didn't really feel right to me. What would happen if all of a sudden I lost write access to the directory the files needed to be in?
Batch insert
SQL can batch insert rows into a database and there are a few ways to do it. But it would mean writing a custom SQL query instead of relying on the WordPress core database methods. That's kind of the only down side. But, while I can write basic SQL queries, I was nervous about writing a complex query for one of the most important sies a university has....
Using A Generator
All that research led me to remember about generators. For this kind of operation, they're perfect.
A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate.
Perfect! A function that creates an array-like result but doesn't hit memory limits? Sign me up!
So how does this work? Let me show you some code. For a lot of projects, I've been using a (more) object oriented approach. So I'll show this in terms of a class.
Basically this class needs to be able to do three things:
- set itself up (including access to the WP database)
- read the file
- write the data
Set Up A Class
I wrote my own class to wrap the global $wpdb
variable. For now, I'm not going to get into that. But it made sense for this project and maybe in a later post I'll explain in more detail.
<?php
namespace MyPlugin;
use MyPlugin\Database;
class FileManager {
protected $db;
public function __construct() {
// set up the class
// prepare database access
$this->db = new Database();
}
}
Prepare A Method to Yield
Next, you need a method that will yield
instead of return
a value. Yeah... That's all you need to do to create a generator. So let's set that up and I'll explain as I go.
<?php
namespace MyPlugin;
// Generator is the core PHP class needed to use the yield keyword
use Generator
use MyPlugin\Database;
class FileManager {
protected $db;
public function __construct() {
// set up the class
// prepare database access
$this->db = new Database();
}
public function yieldData() {
// read the file and yield the data
}
}
Now, I'll zoom in on the yieldData
method. This method needs to
- read the file as a stream
if
the stream is valid...yield
the value for each rowwhile
there are rows to read. Note that I've typed the "return" value asGenerator
. That will keep me (or anyone else) from accidentally trying to usereturn
. I hope.- finally stop reading the file
<?php
public function yieldData(): Generator {
// filePath is defined in the constructor.
// the 'r' mode if for read only.
$stream = fopen($this->filePath, 'r');
// let's try this out
try {
// if the stream is good
if ($stream !== false) {
// get CSV data from the stream.
// assume each line is 1000 characters long
// break on tabs
// when we run out of data, stop.
while (($data = fgetcsv($stream, 1000, "\t")) !== false) {
// yield instead of return
yield $data;
}
}
} finally {
// close out the stream when we're done
fclose($stream);
}
}
Now, we can call this method with $this->yeildData()
and get all the data without building an array with all the memory constraints that go with that.
Write to the Database
Let's go back to the class and prepare a method that will write to the database. Then I'll zoom into that and walk through it.
<?php
namespace MyPlugin;
// Generator is the core PHP class needed to use the yield keyword
use Generator
use MyPlugin\Database;
class FileManager {
protected $db;
public function __construct() {
// set up the class
// prepare database access
$this->db = new Database();
}
public function yieldData() {
// read the file and yield the data
}
public function writeToDatabase() {
// write to the database
}
}
This method needs to do a few things.
- Set up a counter to return. Since I'm using this method in an API call, I want to make sure there's some kind of sensible response. Counting the number of rows is a good way to go.
- Get the
$rows
from theyieldData
method. This will prepare an iterator. - Insert the row and get the result. The WordPress
insert
method returns either an integer on success orfalse
on failure. This will... - In crease the count and return it.
<?php
// zooming in...
public function writeToDatabase() {
// prepare the counter
$count = 0;
// yield the data
$rows = $this->yieldData();
// loop!
foreach ($rows as $row) {
// add successful inserts to the count
$count += $this->db->insertRow('my_table', [
'school' => $row[0],
'schoolid' => "0",
'ext_subject' => $row[1],
'ext_number' => $row[2],
'int_subject' => $row[3],
'int_number' => $row[4],
'int_title' => substr($row[5], 0, 75),
'seq_number' => $row[6]
]);
}
// return the count
return $count;
}
Putting it Together
So now, the class looks like this.
<?php
namespace MyPlugin;
// Generator is the core PHP class needed to use the yield keyword
use Generator
use MyPlugin\Database;
class FileManager {
protected $db;
public function __construct() {
// set up the class
// prepare database access
$this->db = new Database();
}
public function yieldData(): Generator {
// filePath is defined in the constructor.
// the 'r' mode if for read only.
$stream = fopen($this->filePath, 'r');
// let's try this out
try {
// if the stream is good
if ($stream !== false) {
// get CSV data from the stream.
// assume each line is 1000 characters long
// break on tabs
// when we run out of data, stop.
while (($data = fgetcsv($stream, 1000, "\t")) !== false) {
// yield instead of return
yield $data;
}
}
} finally {
// close out the stream when we're done
fclose($stream);
}
}
public function writeToDatabase(): int {
$count = 0;
$rows = $this->yieldData();
foreach ($rows as $row) {
$count += $this->db->insertRow('my_table', [
'school' => $row[0],
'schoolid' => "0",
'ext_subject' => $row[1],
'ext_number' => $row[2],
'int_subject' => $row[3],
'int_number' => $row[4],
'int_title' => substr($row[5], 0, 75),
'seq_number' => $row[6]
]);
}
return $count;
}
}
Conclusion
Now, whenever I need to write the file contents to the database, I can just do this -
<?php
use MyPlugin\FileManager;
$fileManager = new FileManager();
$fileManager->writeToDatabase();
This all feels so much nicer and more stable to me than any of the other methods. Are there downsides to using generators? Well sure. They can be slower than arrays. In this case it doesn't matter since this isn't a public facing operation. If you really need to use the yielded result as an array, you have to use the iterator_to_array
PHP function. There's also an extra step if you need keys as well as values. But on the whole? For something like this, I think it's pretty perfect.
I hope you enjoyed this! Let me know what you think 🙂