# Loading Data



## epoch, batch size, iteration

* One **epoch** is one complete pass over the entire training dataset: every sample goes through one forward pass and one backpropagation pass.

* The **batch size** is the number of samples in a batch, that is, the number of samples fed into the network at once.

* The number of **iterations** is the number of batches you have to feed into the network to go through the entire training dataset once.

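The relationship can be summarized as:

```
total # of samples = # of iterations * batch size
```

When the total number of samples is not evenly divisible by the batch size, the last batch is smaller, so the number of iterations is the quotient rounded up.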
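As a quick check of this relationship, the number of iterations per epoch can be computed directly. This is a minimal sketch: the helper name `iterations_per_epoch`, its `drop_last` flag, and the sample counts are made up for illustration.

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to pass over the dataset once (hypothetical helper)."""
    if drop_last:
        # Discard the final incomplete batch, if there is one.
        return num_samples // batch_size
    # Keep the final incomplete batch, so round up.
    return math.ceil(num_samples / batch_size)


print(iterations_per_epoch(1000, 32))                  # 32 (31 full batches + 1 batch of 8)
print(iterations_per_epoch(1000, 32, drop_last=True))  # 31
```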


## The general form of training 

```
for i in range(num_epochs):
    for j in range(num_batches):
        ...
        model(this_batch)
        ...
```




## numpy

Suppose you have a CSV file whose last column is the label. NumPy can load the data as follows.


```
import numpy as np
import torch

xy = np.loadtxt("sample_data.csv", delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, 0:-1])  # all rows, all columns except the last one
y_data = torch.from_numpy(xy[:, [-1]])  # all rows, only the last column (kept 2-D)

for epoch in range(10):
    y_pred = model(x_data)  # the whole dataset is fed in at once
    ...
```

If you don't want to, or **can't**, feed all the data into the network at once (for example, because it does not fit in memory), you have to split it into batches manually.

PyTorch provides utility classes that simplify this process.
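To make the manual approach concrete, here is a minimal sketch of batching by slicing. The generator name `iter_batches` is hypothetical, and a plain list stands in for the dataset; the same slicing also works on NumPy arrays and tensors, since they support `len()` and slice indexing.

```python
def iter_batches(data, batch_size):
    """Yield consecutive slices of `data`, each with at most `batch_size` items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


samples = list(range(10))  # stand-in for the rows of a dataset
batches = list(iter_batches(samples, batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that this sketch does no shuffling and loads nothing in parallel, which is part of what the utilities in the next section take care of.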

## pytorch


### define a customized dataset

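```
import numpy as np
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self):
        """Download and read the data."""
        xy = np.loadtxt("sample_data.csv", delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]                       # number of samples (rows)
        self.x_data = torch.from_numpy(xy[:, 0:-1])  # features
        self.y_data = torch.from_numpy(xy[:, [-1]])  # labels

    def __getitem__(self, index):
        """Given an index, return the (features, label) pair of that sample."""
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        """Return the number of samples in the dataset."""
        return self.len
```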

### wrap it into DataLoader

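```
from torch.utils.data import DataLoader

dataset = MyDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,    # reshuffle the samples every epoch
                          num_workers=2)   # load batches in 2 worker subprocesses
```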
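To see what a `DataLoader` yields, the sketch below iterates over one and prints the batch shapes. It assumes synthetic random data wrapped in `TensorDataset` in place of `MyDataset`, so that it runs without `sample_data.csv`; the sizes (100 samples, 8 features) are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 100 samples, 8 features, 1 label column.
features = torch.randn(100, 8)
labels = torch.randn(100, 1)
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=True)

print(len(loader))  # 4 batches: 32 + 32 + 32 + 4 samples
batch_shapes = [tuple(inputs.shape) for inputs, _ in loader]
print(batch_shapes)  # [(32, 8), (32, 8), (32, 8), (4, 8)]

# drop_last=True discards the final incomplete batch.
num_full_batches = len(DataLoader(dataset, batch_size=32, drop_last=True))
print(num_full_batches)  # 3
```

`len(loader)` is the number of iterations per epoch, so it matches the relationship between sample count and batch size described at the top of this page.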

### use it 

```
for epoch in range(10):
    for i, data in enumerate(train_loader):
        inputs, labels = data  # one batch of features and labels
        y_pred = model(inputs)

        ...
...
```
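For reference, here is one way the elided parts of such a loop might look end to end. Everything beyond the loop structure is an assumption made for illustration: the synthetic regression data, the single `nn.Linear` layer as the model, the MSE loss, and the SGD optimizer with its learning rate.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (hypothetical): label = sum of features + noise.
x = torch.randn(256, 8)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)
train_loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = nn.Linear(8, 1)  # assumed model, chosen only to keep the sketch small
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(10):
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()             # clear gradients from the previous step
        y_pred = model(inputs)            # forward pass on one batch
        loss = criterion(y_pred, labels)
        loss.backward()                   # backpropagation
        optimizer.step()                  # parameter update
        running_loss += loss.item() * inputs.size(0)
    # Mean loss per sample over the whole epoch.
    epoch_losses.append(running_loss / len(train_loader.dataset))
    print(f"epoch {epoch}: mean loss {epoch_losses[-1]:.4f}")
```

Each pass of the inner loop is one iteration, and each pass of the outer loop is one epoch.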
