3 websites to get started with data science

1. The Open Source Data Science Masters

The Open Source Data Science Masters website has lists of books and courses to learn more about data science, links to software and programming material, and links to blogs and videos about what data scientists do and think about.

2. 7 command-line tools for data science

This is a blog post by Jeroen Janssens that has been turned into a book, Data Science at the Command Line. It has a mix of the usual tools that you would expect and a few other scripts.

Also, it reminds you of the Unix Philosophy, which is worth reading a few times.

3. Data in government

The UK government, in particular, has a big focus on making more data available to people. This blog post has an introduction to data science that the team at the Government Digital Service (GDS) use.

This is the most important skill you need in the future.

What do Jack Ma, Elon Musk and Eric Schmidt have in common?

They all believe that data science, the ability to work with data and analyse what it means for us, is going to be a crucial skill in the future.

More than a third of all jobs people are doing now could be done by computers in the next 20 years.

Jobs that involve empathy, creativity and high levels of social interaction are going to be safer than ones that involve manual skill such as dexterity and the ability to assemble components in a factory.

Administrative and office support tasks such as scanning and processing invoices could be taken over by algorithms.

The biggest change coming towards us is how everything is connected through the internet and the enormous quantities of data that are being created as a result.

Think of three key areas of human activity:

  1. Health
  2. Relationships
  3. Work

The amount of data that can be created and used in these areas is mind-boggling.

Take health, for example. It is rare to find a group of people in which no one is wearing a Fitbit.

Many people are measuring every aspect of their daily lives, a process called lifelogging or quantified self.

For people with heart trouble, an iPhone could now save their life. A startup called AliveCor makes a device that connects to an iPhone and lets you take a medical-grade ECG.

The AliveCor device helped a cardiologist save the life of a man 35,000 feet up on a plane because he could use the device to tell that the man was having a heart attack in real time.

Devices such as these could take many more measurements such as blood pressure, oxygen concentration, sleep apnoea and a host of others that mean people don’t need to go into hospitals and can monitor their health much better.

When it comes to relationships, Facebook has transformed that space. The firehose of Twitter produces a continual stream of chatter.

It is still early days for technology in this space. As many people have found, useful contacts tend to get drowned out in the noise that overwhelms such technologies, especially when marketers get involved.

So people move from email to Facebook to WhatsApp to Instagram in search of a platform where they can connect with others without being overwhelmed by a deluge of irrelevant information.

But the algorithms are getting better at providing hypertargeted information as well. There is no such thing as a general search on Google any more. The results of a search that you make will be different from those the person sitting next to you gets, as the algorithms employed by Google work out who you are and which results are likely to be more relevant to you.

Work sometimes appears to be the last bastion of resistance. It’s the one area of life where there is a sharp difference between companies that adopt new ways of working and those that don’t.

This is largely due to the power balance in organisations. Companies led by people comfortable with technology are likely to use different methods to communicate and work than ones that are not.

I was listening to a podcast where an author was talking about recording an audiobook in a studio. My ears perked up at the words “the producer dialled in on Skype from New York”.

So, you have a distributed team doing that work. The author in a booth, with a sound technician outside and a producer on Skype. That is an incredibly effective way of getting a top producer to work with you without having to pay for travel and must make the organisation employing that producer even more effective.

The challenge facing organisations is one of declining productivity in a knowledge economy.

In addition to safe jobs requiring empathy, creativity and social skills, the next generation of high paying jobs will be ones that involve working with machines and algorithms to improve every aspect of our lives.

Also, perhaps we should be working on more interesting problems. Jeff Hammerbacher, an early data scientist at Facebook, said “The best minds of my generation are thinking about how to make people click ads”.

A vast amount of data science work goes into figuring out how to manipulate people’s behaviour. That is the entire purpose of supermarket loyalty cards.

Although I suppose that’s just good business.

But over the next 20 years, you would expect that other aspects of our lives would also get better as the technology and its application improve.

The Trump Solar Wall

Donald Trump spoke to his supporters this week (21st June 2017), once again saying that he would build a wall on the US-Mexico border, but that he had a new idea.

He would build it with solar panels, so it would create energy and it would pay for itself.

The President took credit for the idea, saying “Pretty good imagination, right? Good? My idea.”

Well, the idea has been around for a while before that. Jigar Shah wrote a detailed article on 3rd January 2017 analysing the business case for a solar panel covered wall. The full article is here https://www.linkedin.com/pulse/giving-mainstream-media-credit-getting-things-right-solar-jigar-shah.

In a nutshell, the wall would run for 2,000 miles or 3,200 kilometres and be 65 feet or 19.8m high.

Each solar panel is 2 metres high by one metre wide, so you could have 10 stacked on the wall.

As the wall runs for 3,200 kilometres, you could have 3.2 million panels side by side in each row, or 32 million panels if all 10 rows were filled.

At an output of 200 W per panel, you could install 0.64 GW of solar panels on each row. If you had 5 rows, that would be 3.2 GW of output. That would generate around 9.3 TWh of energy a year operating 8 hours a day, 365 days a year.

That would collect over $500 million a year at 6 cents per kWh or $20 billion over the 40 year lifetime of the wall.

The cost of the wall is estimated at $12 billion – so the panels could actually pay for the wall to be built.
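The arithmetic above can be checked with a short sketch. It assumes each 2 m by 1 m panel is rated at 200 W, which is the figure that reproduces 0.64 GW per row, and it uses the 5 rows, 8 hours a day and 6 cents per kWh quoted above. The function name and its parameters are made up for illustration.

```python
def solar_wall_estimate(
    wall_length_km=3200,   # length of the wall
    panel_width_m=1.0,     # each panel is one metre wide
    panel_watts=200,       # assumption: 200 W per panel
    rows=5,                # rows of panels stacked on the wall
    hours_per_day=8,       # generating hours per day
    price_per_kwh=0.06,    # dollars per kWh
    lifetime_years=40,
):
    """Back-of-the-envelope output and revenue for a solar panel covered wall."""
    panels_per_row = int(wall_length_km * 1000 / panel_width_m)
    capacity_gw = panels_per_row * rows * panel_watts / 1e9
    annual_gwh = capacity_gw * hours_per_day * 365
    annual_revenue = annual_gwh * 1e6 * price_per_kwh  # 1 GWh = 1,000,000 kWh
    return {
        "panels_per_row": panels_per_row,
        "capacity_gw": capacity_gw,
        "annual_gwh": annual_gwh,
        "annual_revenue": annual_revenue,
        "lifetime_revenue": annual_revenue * lifetime_years,
    }


estimate = solar_wall_estimate()
print(f"Capacity: {estimate['capacity_gw']:.1f} GW")
print(f"Energy per year: {estimate['annual_gwh']:,.0f} GWh")
print(f"Revenue per year: ${estimate['annual_revenue'] / 1e6:,.0f} million")
print(f"Lifetime revenue: ${estimate['lifetime_revenue'] / 1e9:,.1f} billion")
```

On these assumptions the annual energy comes out at roughly 9,300 GWh (9.3 TWh), and the lifetime revenue is comfortably above the $12 billion estimate for the wall. Changing `panel_watts` or `hours_per_day` shows how sensitive the business case is to those two guesses.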

Jigar Shah’s analysis works it out at 5 GW of panels producing 6.6 TWh a year and bringing in nearly $400 million a year or nearly $16 billion over the life of the project.

Putting aside the various concerns about the wall, from what it means to create such a barrier to what it means for the environment and fauna along the border, there appears to be a business case that could finance the project.

What for? The question that uncovers final things.

We often come across situations where something has gone wrong, or an end result is one that we don’t particularly like. We can see something is not right.

For example, a machine may be producing a large number of defective parts, a person may be taking a lot of sick leave or there may be delays in handing over tasks from one person to another.

These indicators are symptoms that something is wrong. In medicine, such a symptom would indicate the presence of a disease. In life and work, it indicates that something is not working right.

If we treat the issue as a fault to be resolved, then one way to fix it is to identify its root cause.

The root cause can be defined as the “basic cause of something”. This is the fundamental reason why a problem occurs. Root cause analysis (RCA) is a formal method to find root causes and correct them.

The steps in RCA are, in essence:

  1. Scope the problem and what you are trying to prevent.
  2. Collect data.
  3. Review the data.
  4. Work out what happened by asking “why” at each stage of the failure.
  5. The root causes are the ones that, when eliminated, will prevent the failure from happening again.

RCA is generally applied to problems in organisations. Factories will use it to understand why something went wrong in a process. The National Health Service (NHS) uses it to find out what went wrong in a patient care situation.

The point about a root cause is that it is a final cause – it does not lead back to something else that caused it in the first place.

This is a philosophical definition. A final cause can be thought of as the end goal of a thing, that for the sake of which a thing is done.

This makes final cause analysis (FCA) useful in looking at situations in general, not just problematical situations.

In RCA, the question to ask is “why?”. Why did X happen? Because of Y. Why did Y happen? Because of Z. Why did Z happen? Because it did. Z is the root cause.

In FCA the question to ask is “what for?” over and over again. Taking an example from the book “How much is enough”, you could ask:

  • What is that bicycle for?
  • To get me to work.
  • What is work for?
  • To make me money.
  • What is money for?
  • To buy me food.
  • What is food for?
  • To keep me alive.
  • What is life for?

Blank stare.

Life is not “for” anything. It just is.
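For anyone who thinks in code, the “what for?” chain can be sketched as a simple lookup that is followed until nothing answers. This is only an illustration: the mapping below is the bicycle example, and the function name is made up.

```python
# Each entry answers "what is X for?", taken from the bicycle example.
PURPOSES = {
    "bicycle": "work",
    "work": "money",
    "money": "food",
    "food": "life",
}


def what_for_chain(thing, purposes):
    """Keep asking "what for?" until there is no further answer.

    The last item in the returned list is the final cause.
    """
    chain = [thing]
    while chain[-1] in purposes:
        next_thing = purposes[chain[-1]]
        if next_thing in chain:  # stop if the answers go round in a circle
            break
        chain.append(next_thing)
    return chain


print(" -> ".join(what_for_chain("bicycle", PURPOSES)))
```

The chain stops at “life” because there is no entry that says what life is for, which is the point of the blank stare.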

So, from a philosopher’s point of view, before you know what you want from work, you need to know what you want from life, as that is the final cause of why you work.

Perhaps it’s possible to make better organisations by extending RCA to FCA and asking “what for?” much more.

Airborne Wind Energy

Last night I caught a brief part of a Horizon programme that talked about how a company is using a kite to generate electricity.

The basic principle is that a glider is launched into the air. As it rises it pulls a tether which turns a shaft connected to a generator, which then turns and produces electricity.

The glider is made by a company called Kitemill.

Kitemill started in 2008 and is based in Voss, in Norway. Its first commercial orders came in 2015, with five Kitemills ordered for a business park which will supply 22 businesses in Lista.

The demonstration model shown in the documentary was producing 2 kW of power, about enough to power a house while operating. The model is a kite with a 2.8 m wingspan, really a small glider, connected to a 5 kW generator.

The company was raising funds to scale up eventually to a 500 kW model but the next stage is to get to a 30 kW model. This model can start working at wind speeds of over 5 m/s and reaches full power at speeds of 12 m/s. It will have a wingspan of 7.5 m with four propellers for vertical take-off and landing.

While operating, the winch will feed out at around 4 m/s.
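As a rough sanity check on these figures, mechanical power is force times speed, so the power and the reel-out speed together imply a pull on the tether. The sketch below assumes the full 30 kW is delivered while the winch feeds out at 4 m/s, and it ignores generator losses and the energy spent reeling the kite back in. The function name is illustrative.

```python
def tether_force_newtons(power_watts, reel_out_speed_m_s):
    """Tether pull implied by power = force x speed, ignoring all losses."""
    if reel_out_speed_m_s <= 0:
        raise ValueError("reel-out speed must be positive")
    return power_watts / reel_out_speed_m_s


force = tether_force_newtons(30_000, 4.0)
print(f"Implied tether pull: {force / 1000:.1f} kN")
```

On those assumptions the tether would be pulling with about 7.5 kN, which gives a feel for the loads the winch and line have to handle.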

This is still small scale new technology, but a very interesting one. It might see greater adoption in the developing world with fewer restrictions on flying machines.

There is a certain attraction to the idea of gliders flying above businesses generating power, if only because we will be able to look up and see them in the sky.

Does Goal Setting work?

I like Brian Tracy. I think that he is a great speaker and his collection of feel-good anecdotes and homilies is inspiring and uplifting.

I think the problem is that they are probably not true.

Let’s take one very simple message. Goal Setting.

In Brian Tracy’s book “Goals” he says “Success is goals and all else is commentary”…”With goals, you fly like an arrow, straight and true to your target”.

The evidence for the efficacy of goal setting often goes back to a Harvard study that was done between 1979 and 1989 where MBA graduates were asked whether they had written goals and plans for their future. 3% said they did, and ten years later that 3% were earning more than the other 97% of graduates all together. The only difference between them was that one group had goals and the other didn’t.

It’s a persuasive argument. There’s just one problem – it’s not true. There is no evidence the study actually took place.

But perhaps that doesn’t matter. It sounds so obviously true that perhaps we don’t need any evidence – it is a self evident statement that goal setting works, surely?

Brian thinks so. In fact, if writing down goals is so good, perhaps we should do it every day. A technique in “Goals” is to get a spiral notebook and write down a list of 10-15 of your most important goals every day. After around 30 days, you will find yourself writing the same goals again and again.

Brian says that once you do this, your life will take off. Everything changes for the positive.

A simpler version of this is where Brian asks audiences to make a list of goals and put it away for a year. After 12 months, when they look at it “it will be as though a magic trick has been performed. In almost every case, eight out of their ten goals will have been accomplished, sometimes in the most remarkable ways.”

It turns out that I kept such a list after reading this advice. From the 3rd of August 2015 to the 10th of September 2015, a little over 30 days, I kept a daily goals list.

I came across this list again in 2017, around a year and a half later. In my case, 2 out of the 10 goals had been achieved. Not quite the promised 80%.

Now, I accept, this is a single data point and not evidence and does not prove anything either way. My personal belief in the efficacy of goal setting as a rational method of operating, however, is ebbing away.

What does appear to work better and is more supported by the evidence is probabilistic reasoning. At any point, we have a range of options we can choose between.

The goal setting method is a PLAN-DO method. We decide what we want and then the universe, in a slightly mysterious sort of way, is obliging enough to move things around so we get it.

The probabilistic approach sets out the various options we have and helps us make choices on the next steps open to us, based to a greater or lesser extent on what we know about how things tend to work out. This is a TEST-AND-LEARN approach.

More on this in another post, but my experience is that this kind of approach seems to be gaining increasing recognition and acceptance. It also appears to result in better and more predictable outcomes.

Circle of competence

James Carville, a one-time strategist for Bill Clinton, said “I used to think if there was reincarnation, I wanted to come back as the president or the pope or a .400 baseball hitter. But now I want to come back as the bond market. You can intimidate everybody.”

The financial markets are an interesting cultural phenomenon. Few people understand what they are and it is easy to resent them.

The markets have an attraction of their own, similar to professional sports. There are players, statistics and activity. There are participants, observers and commentators. There are facilitators, advisors and con-artists.

Out of all this activity – of people doing their own thing – emerges a river of transactions, deals between people trading commodities, stocks, currencies, bonds, derivatives and increasingly complex products.

All this activity is supposed to make things better, to make us all better off.

Benjamin Graham, the father of value investing, used to say that in the short run a market is like a voting machine, telling you which companies are popular and which ones unpopular on a daily basis.

In the long run, however, the market is a weighing machine, telling you whether a company is good or bad.

Some people say that the market price has all the information you need to know in it, so there is no need to know anything else. In other words, markets are efficient.

Others such as Warren Buffett say that while markets are frequently efficient, it does not follow that they are always efficient. They can sometimes act in a manic-depressive way, pushing down the value of perfectly good companies and sending the values of bad ones sky high.

So how do you make a good decision when faced with markets?

Your options depend on how much you know about the subject you are making a decision about. What is your circle of competence?

If you are inside that circle, then you can take certain risks. If you are outside, perhaps you should protect yourself.

The key to better decision making in financial markets is knowing yourself, knowing the limits of what you know and making decisions about things that lie inside your circle of competence.

The IF-THEN Implementation Plan

We know that we have two brains, or two systems. These are (variously) referred to as Hot / Cool or System 1 / System 2.

The Hot system is the limbic system, centred on the amygdala, which sits just on top of the brain stem. It is activated by stress and results in your body taking flight or getting ready to fight.

The Cool part of the brain is sited in the pre-frontal cortex. Its stimulus is information and it is capable of rational, reflective and strategic behaviour. It also has the ability to attenuate stress.

The ability to attenuate stress, to pay attention to the signs of stress and take action to cool it down, is an important skill and needs to be learned as early in life as possible.

IF-THEN implementation plans are a way of linking cues to the Hot system and taking attenuating action.

The cue, or trigger, has two parts:

  1. A situation
  2. A feeling

A situation can be:

  • The time is 5 pm and work has just finished.
  • I’m at the garage and the car needs fuel.
  • I’m at the supermarket doing the weekly shop.

A feeling might be:

  • I’m anxious.
  • I’m tired.
  • I’m stressed.

A cue leads eventually to an outcome: it’s five pm and I’m stressed, and I walk past a pub, so I have a drink that turns into several, which then leads to whatever happens when I get drunk.

If you want to interrupt this process, then knowing what you are going to do by default in that situation has been shown to help.

The IF-THEN implementation works like this.

IF (something happens)

THEN (I will do something to distract myself)

In the example above

IF it’s time to go home,

THEN I will take the bus from the stop before the pub so I don’t go past it at all.

This method helps put the Cool system back in charge and override the Hot system, helping you make better choices.
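If it helps to see the idea written down, an IF-THEN plan is really just a lookup from a cue to an action you decided on in advance. The sketch below is an illustration only: it treats the cue as a (situation, feeling) pair, uses the going-home example above, and the names are made up.

```python
from typing import Dict, Optional, Tuple

Cue = Tuple[str, str]  # (situation, feeling)

# Plans decided in advance, while the Cool system is in charge.
PLANS: Dict[Cue, str] = {
    ("time to go home", "stressed"): "take the bus from the stop before the pub",
}


def planned_action(situation: str, feeling: str,
                   plans: Dict[Cue, str] = PLANS) -> Optional[str]:
    """Return the action planned for this cue, or None if no plan exists."""
    return plans.get((situation, feeling))


print(planned_action("time to go home", "stressed"))
```

The useful part is the `None` case: any cue without a plan is a moment where the Hot system gets to decide by default.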