Posted on 2020-11-25 15:29

Population and market integration in the Roman hinterland

This post is a based on our presentation at the EAA Annual Meeting in August 2020 (slides)

The Roman Empire was a remarkable achievement in world history: extending from the north of England to the Syrian Desert and Morocco at the height of its powers it integrated a population of some 60-100 million into one political, cultural and economic unit. For a few centuries this generated a remarkable growth in population.

To understand how this could be we have to focus on rural developments, and the best data for that are obtained by the many archaeological field surveys of the last half century. The problem using them was that their data classifications and data formats are often incompatible, obstructing a more aggregate analysis of bigger trends rather than mere local developments. To get around this an international consortium from La Sapienza, Groningen, Leiden, Melbourne, the British School at Rome, St Andrews and Durham has successfully pioneered the Roman Hinterland Project: the integration of three major survey datasets that cover the hinterland of the city of Rome, often even down to the level of individual sherds. We will use parts of this integrated dataset for our analysis of population trends and market integration in the immediate hinterland of Rome.

The following cells set-up the database connection, plotting facilities, and import the essentials. For this notebook a locally hosted (development) database is used. The ipython-sql extension is used to load SQL data directly in notebook cells.

In [1]:
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pysal.explore.esda import Moran
from pysal.lib import weights
from scipy.stats import triang
from shapely import wkb
In [2]:
%load_ext sql

%config SqlMagic.autopandas = True
%config SqlMagic.displaycon = False
%config = False

%sql postgresql://postgres@localhost/rhp?options=--search_path=rhp,public
In [3]:
%matplotlib inline

params = {'legend.fontsize': 10,
          'axes.labelsize': 12,
          'axes.titlesize': 16,
          'xtick.labelsize': 12,
          'ytick.labelsize': 12,
          'axes.titlepad': 15,
          'figure.figsize': (12, 12)}



Rural demographic trends are at least in principle recoverable from survey data. Once sites have been interpreted and dated, and household sizes established (either as point estimates, or distributions), one can obtain a population density estimate by dividing the number of sites times the household sizes by the surveyed areas and correcting for survey intensity. This estimate can then be extrapolated to arrive at an estimate for the entire region.

With older survey data sets parts of the data required might not be available. An issue we encounter in our integration project is that the precise surveyed areas have not always been registered, digitised, or the information has simply been lost. This makes it all but impossible to derive densities. Additionally, it is difficult to precisely estimate the household sizes of various site types, so here we resort to a range of [min, max] values, and an informed guess about the modal value, based on previous work by Witcher (2005) and Fentress (2009).

If we assume the recovery rate of sites is fairly uniform across a wider area population trends derived from the surveys area can serve as a proxy for the overall trends in the larger region. The summary statistics derived from Fentress' and Witcher's earlier work are shaped into a triangular distribution, which is commonly used in simulation studies when no further information is available.

In [4]:
%%sql habitation <<
    SELECT id_parent, id_interpretation
    FROM rhp.interpretation_hierarchy
    WHERE id_parent = 1  -- Habitation interpretation ID
    SELECT ih.id_parent, ih.id_interpretation
    FROM rhp.interpretation_hierarchy ih
    JOIN tree ON ih.id_parent = tree.id_interpretation

Returning data to local variable habitation
In [5]:
interps = tuple(int(v) for v in habitation['id_interpretation'].values)

Using these habitation interpretations, we select all sites where a habitation function has been argued for some period between 500 BC - 500 AD. The following query returns multiple records for each site, one for each period/interpretation pair. We only consider pairs where both the periodisation and interpretation are reasonably certain.

In [6]:
%%sql sites <<
SELECT, loc.id_origin, proj.abbreviation, per.start_year,
       per.end_year, loc_int.id_period, loc_int.id_interpretation
FROM locations as loc
JOIN location_types AS loc_typ ON loc.id_location_type = loc_typ.id_location_type
JOIN location_interpretations AS loc_int ON loc.id_location = loc_int.id_location
JOIN interpretations AS int ON loc_int.id_interpretation = int.id_interpretation
JOIN periods AS per ON loc_int.id_period = per.id_period
JOIN certainties AS per_cert ON per_cert.id_certainty = loc_int.id_period_certainty
JOIN certainties AS int_cert ON int_cert.id_certainty = loc_int.id_interpretation_certainty
JOIN projects AS proj ON loc.id_project = proj.id_project
WHERE int.id_project = 1  -- RHP project ID
    AND = 'site'
    AND loc_int.id_interpretation IN :interps  -- defined above
    AND ( = 'certain' OR = 'probable')
    AND ( = 'certain' OR = 'probable')
    AND per.start_year < 500 AND per.end_year > -500
Returning data to local variable sites

As can be seen in the next figure, most sites are (variants of) villas or farms.

In [7]:
ax =
ax.set(xlabel="Count", ylabel="Site type", title="Histogram of habitation site types");

We now set-up the distributions.

In [8]:
def triangular(a, c, b):
    I like to specify a triangular distribution as sloping up from a,  peaking
    at c, and sloping down to b. Of course, scipy expects these arguments in a
    slightly different fashion. 
    loc = a
    scale = b - a
    c = (c - loc) / scale

    return triang(loc=loc, scale=scale, c=c)

# Based on the work of Witcher (2005), for the most part. Some minor adjustments
# are my own. Villages might be a bit on the high end in light of the critique
# by Fentress (2009).
    "Village": triangular(50, 100, 200),
    "Villa rustica": triangular(15, 25, 50),  # same as Villa
    "Villa": triangular(15, 25, 50),
    "Town": triangular(500, 1500, 3000),
    "Small village": triangular(50, 75, 100),  # bit less than Village
    "Settlement": triangular(50, 100, 200),  # same as Village
    "Road station": triangular(200, 200, 500),
    "Rich farm": triangular(5, 8, 15),  # same as Farm
    "Large village": triangular(100, 150, 200),  # bit more than Village
    "Large site": triangular(50, 100, 200),  # same as Village
    "Large farm": triangular(5, 15, 15),
    "Hut": triangular(5, 8, 15),  # same as Farm
    "House or tomb": None,  # skip (are not that many anyway)
    "Habitation": triangular(5, 8, 15),  # same as Farm
    "Fortified settlement": triangular(50, 100, 200),  # same as Village
    "Farm": triangular(5, 8, 15),
    "Civitas": triangular(500, 1500, 3000),  # same as Town

We assume household sizes are drawn independently for each site from the appropriate distribution, and apply a Monte Carlo-like sampling approach to obtain some confidence intervals around the mean values.

The approach is shown below. We first create time series of site type counts, and then turn to sampling.

In [9]:
def site_types_by_project_and_period(interp):
    df = pd.DataFrame()

    for start_year in range(-500, 500, 50):
        end_year = start_year + 49

        if start_year == 0:
            start_year += 1

        rows = ([( == interp) 
                           & (sites.start_year <= start_year) 
                           & (sites.end_year >= start_year)]

        data = dict(start_year=int(start_year), end_year=int(end_year),
                    rhp=0, prp=0, tvp=0, rsp=0)

        for proj, count in rows.iteritems():
            data[proj] = count
            data["rhp"] += count  # RHP is sum of counts

        df = df.append(data, ignore_index=True)

    df['start_year'] = df['start_year'].astype(int)
    return df
In [10]:
def compute_site_trends_by_project(project):
    df = pd.DataFrame()

    for interpretation in
        series = site_types_by_project_and_period(interpretation)

        df["start_year"] = series["start_year"] + 25  # for presentation
        df[interpretation] = series[project]

    if project == "rsp":
        df["Farm"] += df["Large farm"]  # checked with Cristina

    return df
In [11]:
total = compute_site_trends_by_project("rhp")
rsp = compute_site_trends_by_project("rsp")
tvp = compute_site_trends_by_project("tvp")
prp = compute_site_trends_by_project("prp")

Now that we have time series of site type counts, we can turn to sampling and population plots.

In [12]:
In [13]:
def sample(df):
    years = {}

    for _, row in df.iterrows():
        samples = np.zeros(REPLICATIONS)

        for interp in
            if ESTIMATES[interp] is not None:
                size = (REPLICATIONS, int(row[interp]))
                realisations = ESTIMATES[interp].rvs(size=size)
                samples += np.sum(realisations, axis=1)

        years[row.start_year] = samples
    return years
In [14]:
rsp_years = sample(rsp)
tvp_years = sample(tvp)
prp_years = sample(prp)

total_years = {year: rsp_years[year] + tvp_years[year] + prp_years[year] 
               for year in rsp_years.keys()}
In [15]:
def plot_population(years, ax, name):
    ax.set(xlabel="Year", ylabel="Estimated population", title=name)   

    fig_data = pd.DataFrame()

    for start_year in range(-500, 500, 50):
        if start_year == 0:
            start_year += 1

        start_year += 25

        datum = dict(year=start_year, 
                     low_80=np.quantile(years[start_year], 0.1),
                     high_80=np.quantile(years[start_year], 0.9),
                     low_95=np.quantile(years[start_year], 0.025),
                     high_95=np.quantile(years[start_year], 0.975))

        fig_data = fig_data.append(datum, ignore_index=True)

    ax.plot(fig_data["year"], fig_data["mean"], label="Point estimate")

                    label="95% CI")

                    label="80% CI")

    # After
    leg = ax.legend()
In [16]:
_, axes = plt.subplots(2, 2, facecolor=(1, 1, 1))

plot_population(total_years, axes[0, 0], name="Total")
plot_population(tvp_years, axes[0, 1], name="Tiber valley")
plot_population(rsp_years, axes[1, 0], name="Suburbium")
plot_population(prp_years, axes[1, 1], name="Pontine region")


An estimate of population trends, with 80% and 95% confidence intervals. These trends are based only on the observed sites, and do not reflect total population counts for the entire region of study. Rather, they should be understood as showing relative developments, not absolute numbers. The data is split per project, as the scales differ significantly.

What is striking is that there is indeed a general pattern of higher population densities during the core period of Roman history from roughly 200 BC – AD 200, but already starting in the late fourth or early third century BC. However, it is also clear that not all regions followed exactly the same trajectory. For example, the suburbium in the immediate vicinity of the city of Rome already had a high population density pretty early on, but witnessed a decline during the fourth and third century BC. The explanation must be the growth of the neighbouring city of Rome itself, reflected in the virtual disappearance of small towns and villages in that region. In the Pontine region, the population peaks already before or around 200 BC, but remains high until at least AD 200.

In [17]:
def plot_site_counts(data, ax, name):        
    ax.plot(data["start_year"], data["Villa"], label="Villa")
    ax.plot(data["start_year"], data["Farm"], label="Farm")

    if name == "Suburbium":
        ax.plot(data["start_year"], data["Village"], label="Village")
    ax.set(xlabel="Year", ylabel="Number of sites", title=name)
In [18]:
_, axes = plt.subplots(2, 2, facecolor=(1, 1, 1))

plot_site_counts(total, axes[0, 0], name="Total")
plot_site_counts(tvp, axes[0, 1], name="Tiber valley")
plot_site_counts(rsp, axes[1, 0], name="Suburbium")
plot_site_counts(prp, axes[1, 1], name="Pontine region")


Number of recovered sites over time, by project and site type. We plot villas and farms, as those drive most of the population dynamics. For the suburbium, we also show the decline in the number of villages, which together with a decline in small towns explains the early downward trend in population.

As the figure above shows, the pattern of growth and decline was different for villas and farms: much of the population growth was in the growth of the number of villas, even though the small farms did not really disappear. However, it seems that the Tiber Valley and Suburbium projects did not catch many of the harder to observe farms, which might explain the relatively low number of farms. At the same time, the steep increase in the number of villas must have been due to close proximity to the urban market of the city of Rome. This will have resulted in a very high location rent, favouring more complex settlement patterns. In the Pontine region, the number of farms decreases already from the later first century BC, whereas villas follow a familiar albeit stumped pattern of expansion and decline. Here, the relative number of farms compared to villas appears more in line with expectation.

In conclusion, we can say that for a period the region witnessed significant demographic growth, mostly on a growing number of villas. How did the economy cope with this: did population growth depress living standards, or not?

Market integration

In this section we study the connection between local demographic growth and the integration of the local economy into a pan-Mediterranean market. For this, we use dated amphora sherds, as it is relatively straightforward to derive their provenances.

These provenances are registered in the database, but for the present analyses we are interested only in broad regions. Since the provenance typology has not yet been completely formalised for the RHP, we need to do some mapping ourselves, as below. This simplification results in the provenance groups of Africa, Central/South/North Italy, the Iberian peninsula, Gaul, the Aegean, Asia Minor, the Levant, and Egypt. Finally, there is a remaining category of Unknown provenances.

In [19]:
# Gijs had a look -
    'Tripolitania': 'Africa',
    'Tyrrhenian coastal region': 'Central Italy',
    'Tunisia': 'Africa',
    'Vesuvian area (Campania)': 'Central Italy', 
    'Tyrrhenian Coastal Region': 'Central Italy',
    'Modern Catalunya (Hispania Tarraconensis)': 'Iberian pen.', 
    'Baetica': 'Iberian pen.',
    'Calabria (Italy)': 'Southern Italy',
    'Aegean': 'Aegean',
    'Crete': 'Aegean',
    'Cilicia and Cyprus': 'Asia Minor',
    'Gallia Narbonensis': 'Gaul',
    'Tunisia, possibly wider Tripolitania': 'Africa',
    'Rhodes is a certain production centre, but type is likely to have been produced in the wider Aegean and western Asia Minor more generally.': 'Aegean',
    'Inland Central Tyrrhenian Italy (Umbria, Toscana)': 'Central Italy',
    'North Africa': 'Africa',
    'Modern Catalunya (ancient Hispania Tarraconensis)': 'Iberian pen.',
    'Lusitania': 'Iberian pen.',
    'Tunisia?': 'Africa',
    'Etruria': 'Central Italy',
    'Campania/Lazio': 'Central Italy',
    'Lusitania and Baetica': 'Iberian pen.',
    'North-Eastern Italy': 'Northern Italy',
    'Istria/Northern Italy and southern Adriatic coast of Italy': 'Northern Italy',
    'Unknown': 'Unknown', 
    'Italy?': 'Italy',
    'Tripolitania and Sicily': 'Africa',
    'Tunisia and Libya': 'Africa', 
    'southern Adriatic coast of Italy': 'Southern Italy',
    'Cos (Greece)': 'Aegean',
    'Cilicia/Cyprus': 'Asia Minor', 
    'Baetica/Northern Africa': 'Iberian pen.',
    'Brindisi (Apulia, Italy)': 'Southern Italy', 
    'Southern Tunisia': 'Africa', 
    'Rhodes': 'Aegean',
    'Empoli (Tuscany)': 'Central Italy', 
    'Egypt': 'Egypt', 
    'Palestine': 'Levant', 
    'Italy': 'Italy',
    'Emilia-Romagna (Italy)': 'Central Italy', 
    'Northern Africa': 'Africa',
    'Aegean?': 'Aegean',
    'Mauretania Caesarensis and Tunisia': 'Africa', 
    'Aegean Region?': 'Aegean',
    'North African?': 'Africa', 
    'Algeria (Mauretania Caesariensis)': 'Africa',
    'Tyrrhenian Italy': 'Central Italy', 
    'Central Italy': 'Central Italy', 
    'Sicily/Tyrrhenian Coast': 'Southern Italy',
    'Spain': 'Iberian pen.',
    'Tyrrhenian Coast/Central Italy': 'Central Italy', 
    'Eastern Mediterranean': 'Eastern Mediterranean',
    'Adriatic Italy': 'Central Italy', 
    'Western Asia Minor': 'Asia Minor', 
    'Sicily/Calabria': 'Southern Italy',
    'Chios and Knidos': 'Aegean', 
    'Gallia': 'Gaul', 
    'Adriatic/Central Italy': 'Central Italy',
    'Adriatic coast of Italy': 'Central Italy',
    'Campania': 'Central Italy'
In [20]:
%%sql amphorae <<
SELECT DISTINCT artefacts.id_artefact, artefact_forms.provenance,
                artefact_forms.start_year, artefact_forms.end_year, loc.point,
                loc.id_location, proj.id_project,
FROM artefacts
JOIN artefact_forms ON artefact_forms.id_artefact_form = artefacts.id_artefact_form
JOIN artefact_locations AS art_loc ON art_loc.id_artefact = artefacts.id_artefact
JOIN locations AS loc ON loc.id_location = art_loc.id_location
JOIN location_types AS loc_typ ON loc_typ.id_location_type = loc.id_location_type
JOIN projects AS proj ON proj.id_project = loc.id_project
WHERE artefact_forms.type = 'Amphora'
    AND NOT LIKE 'Unid%'  -- exclude unidentified types
    AND = 'site'
Returning data to local variable amphorae
In [21]:
def simplify_prov(prov):
    return PROVENANCES[prov]

amphorae.provenance = amphorae.provenance.apply(simplify_prov)

Central Italian (local) amphora sherds form the largest provenance group, as the following figure shows.

In [22]:
ax = amphorae.provenance.groupby(amphorae.provenance).count().plot.barh()
ax.set(xlabel="Count", ylabel="Provenance", title="Histogram of amphorae provenances");

We assume an amphora of a particular type is equally likely to have been produced at any point between the start and end years of its production range (uniform distribution). If we assume each sherd is deposited independently and instantaneously, the distribution in each year is Poisson binomial and we can simply sum the marginal probabilities to obtain an expectation of the number of deposited sherds per year. The result is given in the figure below, where we observe the now familiar pattern of expansion from about 200 BC, and decline from AD 200. Technically, the particularly stepped shape of this graph (and all following) should be understood as a consequence only of the data at hand, and our modelling choice: the hard cut-off for production years of each amphora type and the uniform distribution employed to model their temporal distribution do not allow for any smoothing. So attention should only be paid to the overall shape of the graph.

In [23]:
def get_sherd_count_distribution(amphorae):
    series = {}

    for provenance in amphorae.provenance.unique():
        if provenance == "Unknown":

        series[provenance] = [0 for _ in range(2000)]

        for idx, artefact in amphorae[amphorae.provenance == provenance].iterrows():
            if pd.isna(artefact.start_year) or pd.isna(artefact.end_year):

            prob = 1 / (int(artefact.end_year) - int(artefact.start_year))
            for year in range(int(artefact.start_year), int(artefact.end_year)):
                series[provenance][1000 + year] += prob

    return series
In [24]:
def map_distribution_to_series(distribution):
    series = pd.DataFrame(distribution)
    series.set_index(pd.Index(np.arange(-1000, 1000)), inplace=True)

    # Limits to the [-500, 500] year range
    series.drop(index=np.arange(-1000, -500), inplace=True)
    series.drop(index=np.arange(500, 1000), inplace=True)

    return series
In [25]:
distribution = get_sherd_count_distribution(amphorae)
exp_amphorae = map_distribution_to_series(distribution)

ax = exp_amphorae.sum(axis=1).plot(figsize=(12, 6))
ax.set(xlabel="Year", ylabel="E[# amphorae] per year");

Expected number of deposited amphora sherds per year.

Clearly, the growth in the number of amphora sherds can only partially be explained by the growth in population, as it is some three times larger than the growth in population. If amphora consumption is a good proxy for standards of living, the Roman hinterland enjoyed a significantly more elevated standard of living for some three centuries.

When we turn to trends in provenance given in the figure below, we observe an increasing integration into a pan-Mediterranean market from the later first century BC, peaking in the first to second centuries AD. Comparing the series of non-local amphorae and population trends, population appears to precede amphorae. We conclude the demand side develops first, with supply following later.

In [26]:
of_interest = exp_amphorae[["Africa", "Gaul", "Iberian pen.", "Aegean", "Central Italy"]]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))              

ax1.set(xlabel="Year", ylabel="E[# amphorae] per year")
ax2.set(xlabel="Year", ylabel="E[# amphorae] per year")

of_interest["Central Italy"].plot(ax=ax1)
of_interest.drop("Central Italy", axis=1).plot(ax=ax2);

Expected number of deposited amphora sherds per year, by provenance, with local central Italian amphorae on the left, and non-local amphorae on the right (redrawn to scale). Only provenances with >100 sherds in the database are shown.

The next step is to look at where these amphorae end up. This has both a geographic and economic dimension.

The first we answer by looking for differences in the spatial distribution of amphora sherds, answering whether there are clusters of lower or higher numbers. This is called spatial (auto)correlation, and can be formally tested. We use Moran's test, with various choices for spatial neighbourhoods. The test results are summarised in the figure below. Generally, there does not appear to be a strong spatial effect: when the neighbourhood is small, Moran's I is large, but not significant; when the neighbourhood is large, the spatial correlation is generally significant, but small. We conclude the area is fairly homogeneous.

In [27]:
def first(row):
    return row.iloc[0]

def _geom(point): 
    # From 
    return None if not point else wkb.loads(point, hex=True) 

non_local = amphorae.drop(amphorae[(amphorae.provenance == "Central Italy")
                                    | (amphorae.provenance == "Unknown")].index)

grouped = non_local.groupby(amphorae.id_location).agg({"point": first, 
                                                       "id_project": first, 
                                                       "provenance": "count",
                                                       "name": first})

grouped.point = grouped.point.apply(_geom)
grouped = gpd.GeoDataFrame(grouped, geometry=grouped.point)
In [28]:
data = []

for k in range(1, 100, 2):
    w_kernel = weights.distance.Kernel.from_dataframe(grouped, 

    moran = Moran(grouped.provenance, w_kernel)
    data.append({"p": moran.p_sim, "I": moran.I, "k": k})
data = pd.DataFrame(data)
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\libpysal\weights\ UserWarning: The weights matrix is not fully connected: 
 There are 111 disconnected components.
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\libpysal\weights\ UserWarning: The weights matrix is not fully connected: 
 There are 20 disconnected components.
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\libpysal\weights\ UserWarning: The weights matrix is not fully connected: 
 There are 7 disconnected components.
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\libpysal\weights\ UserWarning: The weights matrix is not fully connected: 
 There are 5 disconnected components.
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\libpysal\weights\ UserWarning: The weights matrix is not fully connected: 
 There are 4 disconnected components.
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\libpysal\weights\ UserWarning: The weights matrix is not fully connected: 
 There are 3 disconnected components.
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\libpysal\weights\ UserWarning: The weights matrix is not fully connected: 
 There are 2 disconnected components.
In [29]:
_, ax = plt.subplots(figsize=(12, 6))

c = (data.p < 0.05).fillna(0).astype(int)
scatter = ax.scatter(data.k, data.I, c=c)

handles, _ = scatter.legend_elements()
legend = ax.legend(handles, ["No", "Yes"], title=r"$\alpha < 5\%$?")

ax.set(xlabel="Neighbourhood size", ylabel="Moran's I");
<string>:6: UserWarning: Warning: converting a masked element to nan.
C:\Users\niels\AppData\Local\Programs\Python\Python37\lib\site-packages\numpy\core\ UserWarning: Warning: converting a masked element to nan.
  return array(a, dtype, copy=False, order=order)
C:\Users\niels\AppData\Roaming\Python\Python37\site-packages\matplotlib\ UserWarning: Warning: converting a masked element to nan.
  s = self.format % xp

The second, economic and social, dimension we investigate by dividing the number of non-local amphora sherds obtained at a particular type of habitation by the number of sites of that habitation type. In particular, we plot these series for farms and villas, which account for the majority of sites.

In [30]:
%%sql site_amphorae <<
SELECT DISTINCT art.id_artefact, art_forms.provenance, art_forms.start_year,
                art_forms.end_year, loc.id_location,, art.id_project
FROM artefacts AS art
JOIN artefact_forms AS art_forms ON art_forms.id_artefact_form = art.id_artefact_form
JOIN artefact_locations AS art_loc ON art_loc.id_artefact = art.id_artefact
JOIN locations AS loc ON loc.id_location = art_loc.id_location
JOIN location_types AS loc_typ ON loc_typ.id_location_type = loc.id_location_type
JOIN location_interpretations AS loc_int ON loc_int.id_location = loc.id_location
JOIN interpretations AS int ON int.id_interpretation = loc_int.id_interpretation
WHERE art_forms.type = 'Amphora'
    AND NOT LIKE 'Unid%'
    AND = 'site'
    AND int.id_project = 1 -- RHP project ID
    AND IN ('Villa', 'Farm', 'Large farm')
Returning data to local variable site_amphorae
In [31]:
site_amphorae.provenance = site_amphorae.provenance.apply(simplify_prov)

flags = site_amphorae.provenance.isin(["Central Italy", "Unknown"])
site_amphorae.drop(site_amphorae[flags].index, inplace=True)

def get_series(*site_types):
    artefacts = site_amphorae[]
    distribution = get_sherd_count_distribution(artefacts)
    return map_distribution_to_series(distribution)

villa_series = get_series('Villa')
farm_series = get_series('Farm', 'Large farm')
In [32]:
df = pd.DataFrame()

for site_type in ["Farm", "Large farm", "Villa"]:
    counts = site_types_by_project_and_period(site_type)

    df["start_year"] = counts["start_year"]
    df[site_type] = counts["rhp"]

df["Farm"] += df["Large farm"]  # checked with Cristina
df.set_index("start_year", inplace=True)
In [33]:
def insert_sherd_count(sherds, name):
    sherd_count = np.zeros((20, 1))

    for idx, start_year in enumerate(range(0, 1000, 50)):
        sherd_count[idx] = sherds.iloc[start_year:start_year + 50]["total"].sum()

    df[name] = sherd_count

villa_series["total"] = villa_series.sum(axis=1)
farm_series["total"] = farm_series.sum(axis=1)

insert_sherd_count(villa_series, "villa_sherds")
insert_sherd_count(farm_series, "farm_sherds")
In [34]:
_, ax = plt.subplots(figsize=(12, 6))

ax.plot(df.index, df["villa_sherds"] / df["Villa"], label="Villa")
ax.plot(df.index, df["farm_sherds"] / df["Farm"], label="Farm")

ax.set(xlim=[-400, 250], xlabel="Year", ylabel="# sherds / # sites");

Average number of non-local amphora sherds observed at villas and farms.

Here, three historical observations follow:

  1. Trends for villas and farms are similar.
  2. There is an upward trend from about 300 BC to sometime later in the second century AD, after which the per site consumption goes down again.
  3. For a period of about three centuries amphora consumption was not only significantly more than before or after, but also connected to a pan-Mediterranean market, and this applied not only to villas but also to more modest farms.

We also have two methodological observations:

  1. The more datasets can be integrated the better. The data are still pretty thin for robust statistics from this sort of analyses.
  2. The more intensive the survey has been, the more robust the conclusions can be.


In conclusion, we can say that the integration of such large survey datasets allows for types of analysis that were not previously possible, but that even with the large integrated dataset that we now have we are at times still at the outer limit of what can be done robustly. More datasets are welcome, but only if they are of sufficient quality and survey intensity.

Historically, we do indeed witness the growth of the Italian population followed by increasing levels of personal consumption not only by the elite, and the economy’s integration into a wider pan-Mediterranean network, until its subsequent decline.


Fentress, E. "Peopling the Countryside: Roman Demography in the Albegna Valley and Jerba." In Quantifying the Roman Economy: Methods and Problems, by Bowman, Alan, and Andrew Wilson, eds., edited by Alan Bowman, and Andrew Wilson. Oxford: Oxford University Press, 2009.

Witcher, R. 2005. "The extended metropolis: Urbs, suburbium and population". Journal of Roman Archaeology. 18: 120.