In a publicly available curriculum, MIT professor Erik Demaine shows
how projections forward in time interact with advanced data structures, and how
variations in the past impact the present. The same applies to the space domain. This is a crucial
variational view on big data representation in space and time. http://courses.csail.mit.edu/6.851/spring12/
It is worthwhile analysing this Information Science modelling from
a big data economics perspective: how would the price of the data represented in
the structure considered evolve? How can the data structure support one or another
business model over time? Over space? Over statistical populations?
Imagine you want to test the stability of a pricing model against changes. You can
introduce variations in the past versions of data structures, and see how they
propagate and impact the present data structures. This is already a degree of
abstraction: a layer of business and pricing model (linked to, or on top of, the data
structure) and how it may evolve, subject to changes happening in the
Worthwhile analysing further...
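The stability test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the technique taught in the MIT course: the class name RetroactivePrice and the two operations "add" and "scale" are hypothetical, and the present price is simply recomputed by replaying the whole log, which is the naive way to obtain retroactivity.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Change:
    time: int                             # only the time is used for ordering
    kind: str = field(compare=False)      # "add" or "scale" (hypothetical operations)
    value: float = field(compare=False)


class RetroactivePrice:
    """Time-ordered log of price changes; the price is recomputed by replay."""

    def __init__(self, base_price):
        self.base_price = base_price
        self.log = []

    def insert(self, time, kind, value):
        """Record a change at any time, including a time in the past."""
        if kind not in ("add", "scale"):
            raise ValueError("kind must be 'add' or 'scale'")
        insort(self.log, Change(time, kind, value))

    def price_at(self, time):
        """Replay every change up to and including `time`."""
        price = self.base_price
        for change in self.log:
            if change.time > time:
                break
            if change.kind == "add":
                price += change.value
            else:
                price *= change.value
        return price
```

With a base price of 100, an "add 10" at time 1 and a "scale 2" at time 3, the price at time 3 is 220. Inserting an "add 5" retroactively at time 2 changes the price at time 3 to 230, because the later scaling now applies to the inserted amount as well.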
The MIT course is highly commendable: deep notions are
presented in a very attractive way.
6.851: Advanced Data Structures (Spring'12)
TIME TRAVEL We can remember the past efficiently (a
technique called persistence), but in general it's difficult to change the past
and see the outcomes on the present (retroactivity). So alas, Back To The
Future isn't really possible. MEMORY...
Avoiding or reducing the impact of human made or
human-linked (epidemics) catastrophes like the Great Fire of London of 1666, or
the last cholera epidemics in the same city, and protecting populations against
known and monitored risks, depend on two data streams:
-monitoring and detecting events in real time
-warning in real time.
The value attached to these is the protection of lives and
Insurance companies have their methodology to evaluate risks
and acceptable costs to prevent or reduce those risks. They sell insurance
products to individuals and organisations.
Governments have their own policy to manage, eliminate or
mitigate risks. Politicians are judged on their ability to manage risk
avoidance schemes, and when a catastrophe happens, on how they manage a crisis.
Tuning the detection scheme pessimistically leads to
overprotecting, and high cost for no added benefit.
Tuning the detection scheme optimistically may overlook
risky situations and under-dimension the response scheme.
Instead of mathematical models resting on a single assumed probability
distribution, multiple scenarios have to
be considered, and risk must be bounded with a lower and an upper bound,
leading to mathematical inequalities and multiple stochastic models.
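The idea of bounding risk across several scenarios, rather than trusting one distribution, can be sketched as follows. This is only an illustration: the function names, the scenario labels and the representation of a scenario as a list of (probability, loss) pairs are assumptions made for the example.

```python
def expected_loss(scenario):
    """Expected loss of one scenario given as (probability, loss) pairs."""
    return sum(probability * loss for probability, loss in scenario)


def risk_bounds(scenarios):
    """Lower and upper bound of the expected loss over all scenarios."""
    losses = [expected_loss(scenario) for scenario in scenarios.values()]
    return min(losses), max(losses)


# Hypothetical figures: the same event with an optimistic and a
# pessimistic estimate of its probability.
scenarios = {
    "optimistic": [(0.01, 1000.0), (0.99, 0.0)],
    "pessimistic": [(0.05, 1000.0), (0.95, 0.0)],
}
lower, upper = risk_bounds(scenarios)
```

The pair (lower, upper) is the inequality mentioned above: whatever the true scenario is among those considered, the expected loss lies between the two values.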
Railway companies use a standard model, jointly developed by
them at ISO, where risk is categorised by the potential impact, the highest
being many lives at risk.
They can build on a long history, which has led to safe
railway journeys, with now very few accidents.
Here is a spectacular one of 1895, which claimed only one
The article below addresses
rights as they are observed in movies and works of art, exploring how the
underlying concepts may apply to big data economics.
The movie business model can be summarised as a succession
of windows of exploitation, and within each window rights can be sold and
bought with conditions of use attached (time interval, geography, potentially
number of users, platform of rendering). Typically a movie is released in one
or a handful of Premiere cinema theatres, then in exclusivity to a number of
cinema theatres in a given geography, then to all cinemas, then pay TV and
packaged media like Blu-ray or online pay services, then commercial or
public free-to-air broadcast TV.
[Of course it is more than this]
Assume an individual grants access to part of his/her personal data, say
biological and health parameters. The data set could be accessed under a
contract granting specified rights: scope and time window of use, with robust
anonymisation requirements (say that the data has to be used as part of a set
comprising at least xxx other subjects at each processing step).
Is this "rights" business model, and the underlying organisation of
the market place, robust to:
-reselling data set for later use, within agreed scope?
-retrieving subjects for later negotiation of changed scope (e.g. a
food&beverage company having interest to access a data set previously used
for health analysis)
-auditing the proper use by processing companies and their clients
-inserting mechanisms for deleting data sets after "do not use after"
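A contract of the kind described above can be sketched as a small data type that answers one question: is this use permitted? The names DataLicence, scopes and min_cohort are hypothetical, and the anonymisation rule is reduced here to a minimum cohort size, which is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataLicence:
    scopes: frozenset      # agreed purposes, e.g. {"health"}
    valid_from: date       # start of the time window of use
    valid_until: date      # end of the window, the "do not use after" date
    min_cohort: int        # minimum number of subjects per processing step

    def permits(self, scope, on, cohort_size):
        """True if the use is in scope, in the time window and anonymised enough."""
        return (
            scope in self.scopes
            and self.valid_from <= on <= self.valid_until
            and cohort_size >= self.min_cohort
        )


licence = DataLicence(
    scopes=frozenset({"health"}),
    valid_from=date(2024, 1, 1),
    valid_until=date(2024, 12, 31),
    min_cohort=100,
)
```

A food and beverage company asking for the same data set would fail the scope check and would have to go back to the subjects to negotiate a new licence, and an audit amounts to replaying permits() over the recorded uses.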
In the digital world, data can be reproduced at negligible cost, hence what
matters is not the instantiation of a data parameter, but the source
"blueprint", the equivalent of a manuscript and not of the thousands
of printed books derived from this manuscript.
WORK OF ART
This leads us to the model of a work of art, say a Van Gogh painting.
The asset can be made available to museums for exhibition (use limited in time
and geography, associated with an audience, or number of visitors, targeted or
recorded). It can also be "transcoded" into different
representations, as authorised photographs, reproductions, etc...
Over time, a single work of art can be valued as "junk", at zero or
little, and later at enormous values.
As a category, French Impressionists were often not valued in France, but had
some early customers in the USA. Many years later, works despised earlier
reached huge values at auctions.
However, data seen as information may be more valuable at an early stage of its
life than at a later stage. A bottle of milk loses its whole value on the
"best before" date: it usually gets heavily discounted on the day,
and discarded at the end of that day.
News is normally expected to be fresh. For instance, the current temperature is
useful to me now, and the 3-day weather forecast is of interest to choose clothes
for a trip. After the trip, this past information has lost value. However, the
long tail business model for the exploitation of entertainment content, like
movies or music recordings, may also apply: the time series of the values of
source data may be of interest as history and, based on that history, some forecast
estimates can be proposed (with associated uncertainty).
An electrocardiogram database of people living in the 1950s may be
interesting to revisit in the 2050s, hence it should not be discarded.
There is probably a distinction to be made between the value of some freshly
acquired data, stored in cache memory, and the value of an archive.
Keeping and maintaining an archive has a cost, for instance transcoding from
legacy formats and systems to current ones for new use.
I recently moved offices, and did not want to move my
desktop displays: I had been offered new displays in the new location. I just
had to administratively transfer the ownership of the displays I was leaving to
another department at that location. Done, moved on.
*What silver coins
My grandmother once gave me a silver coin valued at 5 French
Francs. I thought she had given me 5 FR I could spend. She got into the habit
of giving me more such coins, now and then, a few times per year.
I kept the coins in a drawer, thinking they accumulated
as pocket money does, and that I could spend them when the occasion arose. I
remember that in those years you could get very nice vinyl records for 15 FR in
a Montparnasse shop (central Paris).
One day I got to speak about it with my grandmother, and it
became apparent that it was neither her view nor her intention that I
would spend this pocket money. The coins were given to me for… KEEPING. She saw
these as collection items. Silver coins with currency value were ambivalent: they
could be seen as 5 FR, or as a weight of silver valued as such. It was not
meant as a coin like any other, but as a
personally TRANSFERRED FROZEN ASSET from my grandmother as the GIVER to me as
the KEEPER, not exactly the happy recipient.
Much later, I read about the Bretton-Woods agreement, and
the following history of suspending the convertibility of currencies to gold,
starting with the dollar in 1971. The veil of the money and the veil of metal
convertibility of money, are wise explanations from economists for real
A main thing I would take for big data from this observed
case is that when source data is transferred from a producer or other seller
to a buyer or user, the data as an IT record or file is transferred, but it is
also transferred with economic and contractual/legal expectations and rules of
use. This also happens if the data is open or free.
Recorded music has shown us different patterns of transfer:
-open market, with competing publishers, and users able to shop around as I did in Montparnasse
-direct peer to peer online,
-closed market places such as iTunes, working as an integrated value chain.
*From organ donation
Organ donation brings us closer to personal data and
parameters (such as biological measurements):
-one donates a part of their body
-one rightly expects that this body part be used very carefully, with a genuine best effort to save someone’s
To avoid the grandmother’s syndrome above, organ donations
are anonymised, except in obvious cases such as direct and immediate donation
to a family member.
Well, closer even to real-time big data, flowing, blood
donations are also very carefully managed, end to end.
*And now, for big
The question this leads us to is: what happens to data once
it is transferred from one economic agent to the next? What “ownership” with
rights and responsibilities gets transferred? What is then a fair transfer
By the way, as long as the source data does not result (at
the processing stage considered) in an end-user service being sold, it is
free of VAT :-).
Big Data business builds on data as THE key input. This raw material,
data sets, a new commodity, has received less attention from Economists
than physical raw materials, called commodities. What can we learn from Commodity Trading?
Market Places for Commodities, produced by farming or extracted by mining for
instance, provide for a longer Economic History span than primary or source
Cocoa has been studied by Economists, from its early use as a currency in pre-Columbian
America, then as an energy drink used by Spanish explorers, brought back to Europe
and enjoyed as a precious drink from the XVIth to the XVIIIth century, then mixed with
milk in a Swiss process from the XIXth century.
Minerals extracted through mining give a good analogy for data coming from
sensors. By the way the Schlumberger brothers were forerunners in exploratory
big data targeting mineral resources.
The data quality, the authenticity of the source, the availability of data, are
common features with raw minerals.
What can we learn from Commodity Industries and Commodity Trading? Extraction
costs? Mechanisms determining their value?
-1) Planet Google by Randall Stross
This book reviews the growth of Google as a series of goals followed through, starting with indexing the information of the Internet to make it searchable, and going through YouTube, Google Maps, etc...
The book is quite well structured, allowing the reader to understand the systematic pursuit of business objectives rooted in facts (science & engineering, market). The exploration and charting of the world's information, as completely as possible, performed by Google is an amazing piece of work, and this book describes it very well.
Interesting read for anyone interested in the economics of big data, naturally...
-2) Googled, the end of the world as we know it, by Ken Auletta
This book has a very different approach from the one above. It is more of a classical business story told well, with details of interest. I liked the beginning, where the author sketches a biography of the two founders and their family backgrounds: advanced mathematics for one (father lecturing on Riemannian geometry, mother with advanced mathematics & biology degrees) and computer science for the other (two parents, university professor & lecturer).
-3) An Introduction to Sustainable Development, by Peter P Rogers, Kazi F Jalal & John A Boyd
This book has a few chapters which connect well with the problems of starting economic analysis where market prices may not be available or not be the only criterion:
their chapter 9 on the economics of sustainability, chapter 10 on externalities, valuation and time externalities, and chapter 11 on natural resource accounting.
However, it is not a toolbox from which one can extract what we need for Big Data Economics, at best an eye opener, and an encouragement to develop models in certain directions, proven to be usable in a domain different from Big Data, with the commonality that it still has some "terra incognita" features yet to be explored and mapped.
-4) Fighting the banana wars and other Fairtrade battles by Harriet Lamb
This book may interest you because the Fairtrade scheme brings a new set of economic standards and criteria into the food market (and other) arenas: respect the planet, respect the people (producers or consumers), introduce sustainability and risk reduction in an otherwise fierce competition with commodity price volatility.
Why is it relevant to the analysis of economics for big data? For the reasons above, but also and probably more importantly because big data as source data (source data sets, flow) is the commodity of the digital age, and it is interesting to build on the experience gained in the area of physical commodities and ways to address their price volatility (and potentially chaotic availability depending on crops, good or bad weather, natural disasters).
-5) Marx, the key ideas, by Gill Hands ("teach yourself" series)
Do not smile before you know why I put this book here.
I started from the reflection that today the economics of the digital markets is governed by a production equation adding the costs of software to the costs of networks, and most of the time ignoring data costs or not paying much attention to them.
Marx may be criticised: his ideas may have led to human catastrophes. However as an economist he managed to convince everyone that Labour aspects (labour costs, workers' condition, etc) needed extra care in the age of the industrial revolution. He added Labour as a key variable into the production equation where other costs could be called "Kapital" and Assets.
Hence if we want to highlight "source data" in a context where "my software is valuable and your data needs to be free to me" or where yet another conflicting view says "my network is valuable, and your software needs to pay for consuming it", we may learn from how Labour as an economic parameter was recognised as a key driver of the coal & steam age.
This description of a data "input" value chain assumes that data is owned by someone or by an organisation. The ISO-IEC JTC1 Study Group on Big Data has been very clear that there should be a universal attribute to data specifying its owner(s).
The data owner could be an individual: for instance, consider the case of personal data owned by a person. More broadly the data generated by objects owned by a person are likely to be owned by this person: for instance the current geographic position of my car. This means that there are expanding circles around people, with data in such circles. This creates a natural link across the areas of the Internet of People, where people communicate and interact with each other or with "the Internet", and the Internet of Things (IoT) with sensors and actuators, and machine intelligence all connected to serve (hopefully) the needs of humans.
It starts with the core, the body, with body area sensors, continues with anything wearable, and beyond to anything owned, within physical or virtual reach.
The data ownership could be shared by a group, call it social, with a defined aim, for instance producing CBPP (Commons-Based Peer Production) as in the world famous Wikipedia. Note that the P2Pvalue project of the EU addresses the topic of organisation and mechanisms at play in CBPP, with over 300 such social groups studied.
Another interesting case of data ownership is the Industrial Internet, where companies generate data for their own operation, which they use (mostly) internally, in schemes such as a supervised distributed system, typically using a control room. Today a telecom network, a transport network (railways in particular, but also metro, road, air and sea transport) and an energy network are in this category. Some subsets of such operations data may be eligible for release by the company for specific uses.
Data generated by wearable devices is also a category of interest to the business and consumer communities, with multiple purposes being envisaged already (sports, well-being, health, new forms of communications) and many more to come.
A data collector can be one of the few large Internet brands. It can also be any company in operations such as the ones above. It can also be an individual aggregating their own data in multiple ways, for multiple purposes, current and future (forensic data, the extension of the collection of post cards and pictures into the general data domain).
Governments are data collectors. Organisations: public or private, acting in pursuit of business or social goals are data collectors.
Even when the data is accepted as not being subject to a price tag, its use must conform to established rules and laws.
A data collector builds consistent and structured sets from individual potentially unstructured data vectors.
This entails quality control of the source, or to use other language the "veracity" of the information.
The aim is to prepare the input of an efficient data processing.
A data user is typically an organisation or an individual performing analytics on data sets. For this purpose they need to either directly collect data sets, or buy rights to access such data sets for their defined purpose and scope from data suppliers, which are the data collectors, or data brokers acting on behalf of the data collectors (retail role).
Data users need data sets suitable for their needs. This is the demand side in economic terms, and the data collector or data broker is the supply side.
Note that the use of data through analytics may lead to decisions, with in turn such decisions producing data sets in the command domain, for remote and distributed execution of such commands implementing the decision taken.
For instance real time systems with a feedback loop, also called automated systems, or optimal control, do not only "observe the world" through IoT sensors, but they act on the world through actuators, and supervised control, typically with supervision in a control room, as explained above.
SCADA systems (supervisory control and data acquisition) are an important case of operational data use.
Naturally, when the data collector gathers data, and forms data sets, initial data D0 is transformed into D1 and the set accessed by a user for a specific purpose and scope is D1* (optimised or limited for this use).
Hence a data path runs from extraction through collection, shaping and homogenising to fitting to the user's purpose.
Users purchase rights to use the data for their own purpose and scope, and payment flows possibly through the data collector, with part of the payment remunerating the data owners.
The organisation of payment and retail is being studied, and a publication addressing this subject is being prepared.
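Pending that publication, the payment flow described above can be illustrated with a short sketch. Everything in it is an assumption made for the example: the function name split_payment, the percentage kept by the data collector, the weights given to owners, and the choice to leave rounding remainders with the collector.

```python
def split_payment(amount_cents, collector_share_pct, owner_weights):
    """Split a user's payment between the data collector and the data owners.

    amount_cents: payment made by the data user, in integer cents.
    collector_share_pct: hypothetical percentage kept by the collector.
    owner_weights: mapping of owner name to relative weight of their data.
    """
    collector = amount_cents * collector_share_pct // 100
    pool = amount_cents - collector
    total_weight = sum(owner_weights.values())
    payouts = {
        owner: pool * weight // total_weight
        for owner, weight in owner_weights.items()
    }
    # Cents lost to integer division stay with the collector, so that
    # the parts always add up to the amount paid.
    collector += pool - sum(payouts.values())
    return collector, payouts


collector_part, owner_parts = split_payment(
    10000, 30, {"alice": 1, "bob": 1, "carol": 1}
)
```

Working in integer cents avoids floating point drift, and the final adjustment guarantees that no part of the payment disappears between the user, the collector and the owners.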