This is a bit of a geeky thread, so please ask for clarifications if we forget to explain an acronym or library jargon...

We'd like to link the data created about people, plays and other works, runs of performances, genres, etc. to other datasets. This might be relatively easy for works, but would be a lot harder for people (which 'Mr Smith' do you mean?). Ideally, we'd be able to crowdsource the process of linking, or of verifying computationally generated links to other datasets.

To make a start, here are some projects with relevant data or identifiers:

The original Ensemble at NYPL http://ensemble.nypl.org/

Ensemble at Yale (historic programmes from performances at Yale) http://ensemble.yale.edu/#/

AusStage: https://www.ausstage.edu.au/pages/learn/about/data-models.html
https://www.ausstage.edu.au/pages/learn/about/data-scope.html

There was talk of a UK version of AusStage but I'm not sure if that went ahead.

Theatricalia has 'data for all RSC productions since its foundation in 1879, all Birmingham Repertory Theatre productions up to and including 1971 when it moved to its current location in Centenary Square, the wide variety of productions held by the University of Bristol Theatre Archive' https://theatricalia.com/about

ISNI is 'the ISO certified global standard number for identifying the millions of contributors to creative works and those active in their distribution, including researchers, inventors, writers, artists, visual creators, performers, producers, publishers, aggregators, and more' http://www.isni.org/

And then there's dbpedia, wikidata, etc...

Are there any quick wins? What about longer term tasks?

    One of my hopes/plans for all of this is to generate a triple store to model the many interesting relationships we have within every page (let alone the whole collection).
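
    As a very rough sketch of what that might look like, here are a few triples for an invented performance using rdflib – the namespace, URIs and schema.org properties are placeholders for illustration, not a settled model:

    ```python
    # A minimal sketch of playbill-derived triples using rdflib.
    # The namespace, URIs and schema.org properties below are illustrative
    # placeholders, not an agreed data model.
    from rdflib import Graph, Namespace, Literal

    EX = Namespace("http://example.org/playbills/")  # hypothetical namespace
    SCHEMA = Namespace("https://schema.org/")

    g = Graph()
    g.bind("ex", EX)
    g.bind("schema", SCHEMA)

    # One invented performance from a playbill page.
    performance = EX["performance/damon-and-pythias-plymouth"]
    g.add((performance, SCHEMA.name, Literal("Damon and Pythias")))
    g.add((performance, SCHEMA.location, Literal("Theatre Royal, Plymouth")))
    g.add((performance, SCHEMA.performer, EX["person/mr-smith"]))

    print(g.serialize(format="turtle"))
    ```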

    The disambiguation of names will indeed be a tricky task and one that I currently have no idea how to solve! Any ideas there would be welcome :)

      We could look at models from prosopographical projects ('prosopography' is one of my favourite words, but as it's obscure I'd suggest https://en.wikipedia.org/wiki/Prosopography for a definition), as we'll just have to live with the fact that sometimes it'll be impossible to untangle one Mr Smith's performances from another's!

        Hi,
        I've listed some relevant data models here: https://www.wikidata.org/wiki/Wikidata:WikiProject_Performing_arts/Data_structure

        I guess the Swiss Performing Arts Data Model (https://old.datahub.io/dataset/360db967-078c-4eca-bcd4-58bb5870753f/resource/84430732-6ce9-4b0b-af1f-5d11f9d4b994/download/spadatamodelv0-1pre-release20170524.pdf) would cover most of your modelling needs – perhaps with the exception of describing animal performers (neither FRBR nor RiC foresees an agent sub-class "animal").
        We may be able to provide an experimental RDF ontology based on the same data model soon.

        As far as Wikidata is concerned, I've started to look into how best to map a dataset from Schauspielhaus Zurich, covering all the productions they staged over 30 years, to existing items within Wikidata (https://www.wikidata.org/wiki/Wikidata:WikiProject_Performing_arts/Data_sources/CH). The goal is to ingest that dataset (or at least some central parts thereof) into Wikidata in order to create a showcase for the ingestion of production data.
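
        As a first pass at that mapping, one could simply query the public Wikidata search API for candidate items per label – a rough sketch (label matching only, so every candidate would still need human verification before ingestion):

        ```python
        # Rough sketch: look up candidate Wikidata items for a label using the
        # public wbsearchentities API. Label matching is only a first pass;
        # candidates still need human verification before any ingestion.
        import requests

        def wikidata_candidates(label, language="en", limit=5):
            resp = requests.get(
                "https://www.wikidata.org/w/api.php",
                params={
                    "action": "wbsearchentities",
                    "search": label,
                    "language": language,
                    "type": "item",
                    "limit": limit,
                    "format": "json",
                },
                timeout=30,
            )
            resp.raise_for_status()
            return [
                (hit["id"], hit.get("label", ""), hit.get("description", ""))
                for hit in resp.json().get("search", [])
            ]

        for qid, name, description in wikidata_candidates("Schauspielhaus Zürich"):
            print(qid, name, "-", description)
        ```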

        We have another dataset with 55,000+ productions from Switzerland waiting to be ingested. Disambiguation of person names will indeed be a great challenge, and I'm not entirely sure yet how to tackle it.

        One approach would be to resolve the disambiguation issues in the external database before ingesting the data into Wikidata. Another would be to ingest one batch after another – say, all the productions of a given theatre over 50 years. Between batches we could then search programmatically for unlikely cases (the same person appearing in several places simultaneously, or the same person appearing over a very long time span). We would also need to search for suspected name changes or different spellings of the same name (i.e. the same person showing up in the database under different names, leading to several Wikidata items per person). Suspected cases could be flagged and worked through by online contributors or staff at theatre archives with access to complementary information.

        If we do this in Wikidata, we would have to carry out the process iteratively, one batch after the other, so as not to flood the database with too many unverified entries; if we did it outside Wikidata, it might be possible to run the statistical analyses and plausibility tests on the whole dataset at once.
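
        To make the "unlikely cases" idea concrete, a plausibility check over a batch could be as simple as this sketch, assuming a hypothetical table of appearances with name, date and place columns:

        ```python
        # Sketch of two plausibility checks on a batch of appearance records.
        # Assumes a hypothetical CSV with columns: name, date, place.
        import pandas as pd

        appearances = pd.read_csv("appearances.csv", parse_dates=["date"])  # hypothetical file

        # Same name billed in two different places on the same date:
        # possibly two distinct people sharing a name.
        same_day = (
            appearances.groupby(["name", "date"])["place"]
            .nunique()
            .reset_index(name="places")
        )
        conflicts = same_day[same_day["places"] > 1]

        # Implausibly long careers under one name: possibly a parent and child,
        # or unrelated namesakes, merged by mistake.
        span = appearances.groupby("name")["date"].agg(["min", "max"])
        span["years"] = (span["max"] - span["min"]).dt.days / 365.25
        long_careers = span[span["years"] > 60]

        print(conflicts.head())
        print(long_careers.head())
        ```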

          @bestermann Thanks Beat! We'll take a look at the Swiss model and see how far we get with modelling works with the data we have.

          Cheers, Mia

            "Prosopography" is a completely new word to me, but a quick lookup makes me think of XKCD like narrative charts, which I've been meaning to try out for some time ( https://github.com/abcnews/d3-layout-narrative ). Has anyone looked at this in context of playbills, and rather than analysing narrative as scenes in play, replacing scenes with s/thing like productions?

            --tony
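
            A rough sketch of the data shaping described above (the productions are invented, and the exact JSON keys would need adapting to whatever d3-layout-narrative actually expects):

            ```python
            # Sketch: reshape playbill-style data into a characters/scenes JSON,
            # treating each production as a "scene" and each performer as a
            # "character". Data and key names are illustrative only.
            import json

            productions = [  # invented examples
                {"title": "Damon and Pythias", "venue": "Plymouth",
                 "cast": ["Mr Smith", "Mrs Jones"]},
                {"title": "The Miller and His Men", "venue": "Bristol",
                 "cast": ["Mr Smith", "Mr Young"]},
            ]

            characters = sorted({name for p in productions for name in p["cast"]})
            scenes = [{"name": p["title"], "characters": p["cast"]} for p in productions]

            with open("narrative.json", "w") as f:
                json.dump(
                    {"characters": [{"name": n} for n in characters], "scenes": scenes},
                    f, indent=2,
                )
            ```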

              That's a great question, psychemedia!

              I meant to say that Theatricalia is another possible source of historic performance data, e.g. https://theatricalia.com/play/1kw/damon-and-pythias mentions performances of Damon and Pythias in Bristol, but our project also has one in Plymouth: http://access.bl.uk/item/viewer/ark:/81055/vdc_100022589092.0x000002#?cv=144&c=0&m=0&s=0&z=-8.1464%2C784.8403%2C1793.797%2C1483.089

              It also provides evidence (the author is 'Shiel') that could enhance the Theatricalia entry.

                This is fascinating. As someone who is researching a theatrical family tree (where the main branch is 'Young'), I have experience of attempting to untangle performers with common names. When your starting point is a family tree it's easier, as your initial data is likely to come from parish records, and census returns for later periods, which at least provide some form of map as to which locations to focus on. Starting from playbill data is, I think, more challenging, but building up a dataset of performers covering dates and locations will make it easier to see patterns and to start distinguishing between individuals sharing the same name. This is simpler when at least one of the individuals is well-known, but it can also be helped by additional information taken from local newspapers (adverts and personal announcements).
                I do think there's a lot of potential in what could be learnt from this type of investigation about individual careers and the general migration of performers through the various theatres and circuits.

                @Frisby thanks for your comments!

                We'd considered building a 'personal names transcription' task when developing the project. However, in discussing it with a group of volunteers working with theatrical archives in Nottingham, it seemed there might not be much interest from potential contributors, in part because of the size of the task (so many names per playbill!) and because the listed names weren't easily linked to individual performers.

                I'm hoping that advances in technology, together with the structured data about performances created through this site, will mean we can automate the process of finding personal names and create a dataset like the one you describe. I'd hope there'd be a way to include information that people have uncovered through family history research, so that all credit is given and the provenance of the identifications is clear.
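
                As a rough illustration of the sort of automation I mean, here's a hedged sketch using spaCy's off-the-shelf English NER model on an invented snippet of playbill text – real transcriptions would be much noisier, so the output would still need human verification:

                ```python
                # Sketch: pull personal names out of transcribed playbill text with
                # spaCy's general-purpose English model. Historic typography and OCR
                # noise mean the results would need checking by people.
                import spacy

                nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

                text = (  # invented example text
                    "Theatre Royal, Plymouth. This present evening, Damon and Pythias. "
                    "Damon by Mr. Smith, Pythias by Mr. Young, Calanthe by Mrs. Jones."
                )

                doc = nlp(text)
                people = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
                print(people)
                ```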

                  Before Alex left he set up some Jupyter Notebooks that provide access to some of the underlying data from the project. With a bit of patience and/or programming experience, you can adapt them to get other data from the project: https://mybinder.org/v2/gh/libcrowds/notebooks/master?urlpath=lab

                  The Notebooks currently available are:
                  An Introduction to the LibCrowds Annotations Data Model
                  An Introduction to Analysing LibCrowds Results Data Using Python
                  An Introduction to Visualising In the Spotlight Data Using Python
                  Visualising In the Spotlight Data Over Time

                  If you use or adapt these we'd love to hear about your experiences.
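
                  If you'd rather work with the data in your own environment, a bare-bones first step might look like this sketch (assuming you've saved a batch of annotations to a local JSON file – the notebooks above cover how to fetch the data itself):

                  ```python
                  # Sketch: load a saved batch of annotations into pandas for your own
                  # analysis. Assumes a hypothetical local JSON file containing a list
                  # of annotation objects (see the notebooks for how to get the data).
                  import json

                  import pandas as pd

                  with open("annotations.json") as f:  # hypothetical local export
                      annotations = json.load(f)

                  df = pd.json_normalize(annotations)
                  print(df.columns.tolist())
                  print(df.head())
                  ```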

                    I'm convinced that a spread of quantitative and qualitative tags can give you a flexible outcome for your readers and researchers. I know that my work may seem very specific, and the datasets can appear to be tailored to the particular question I was working on, but it might be worth scoping out a few more 'questions' to see whether this selection of datasets could work for other researchers. For example: I want to assess the extent to which working-class audiences' acquisition of the Saturday half-holiday in the 1890s affected the work of Victorian theatre managers in London. I can look at tags for location, date, the main piece/supporting piece, days of the week, and anything in the 'specials' area that indicates 'movement' in management practice, such as price changes.

                    Working with tens of thousands of bills, it became clear that it was the unusual amidst the routine that revealed underlying decisions/assumptions about worth, cultural cohesion, etc. – the trick is to have identified what is routine. All good wishes for the project, Barbara.
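
                    As a rough illustration of the kind of tag-based query described above, assuming a hypothetical table of tagged bills with date, location and 'specials' columns:

                    ```python
                    # Sketch: filter a hypothetical table of tagged playbills down to
                    # Saturday bills in London in the 1890s, then look for 'specials'
                    # notes that hint at management responses such as price changes.
                    import pandas as pd

                    bills = pd.read_csv("tagged_playbills.csv", parse_dates=["date"])  # hypothetical file

                    in_scope = bills[
                        (bills["location"] == "London")
                        & (bills["date"].dt.year.between(1890, 1899))
                        & (bills["date"].dt.dayofweek == 5)  # Saturday
                    ]

                    price_moves = in_scope[
                        in_scope["specials"].str.contains("price", case=False, na=False)
                    ]

                    print(len(in_scope), "Saturday bills;", len(price_moves), "mention prices")
                    ```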
