About

My ramblings on SEO hacking and internet marketing, and general techie stuff

Calendar

« May 2008
S M T W T F S
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Archives

Next Archive

01 Nov - 30 Nov 2007
01 Jun - 30 Jun 2007
01 Dec - 31 Dec 2006
01 Nov - 30 Nov 2006
01 Aug - 31 Aug 2006
01 Jul - 31 Jul 2006
01 Jun - 30 Jun 2006
01 Apr - 30 Apr 2006
01 Mar - 31 Mar 2006
01 Feb - 28 Feb 2006
01 Jan - 31 Jan 2006
01 Dec - 31 Dec 2005
01 Oct - 31 Oct 2005
01 Sep - 30 Sep 2005
01 Aug - 31 Aug 2005
01 Feb - 28 Feb 2005
01 Jan - 31 Jan 2005
01 Jan - 31 Jan 2004

Cisco Test
Flights to San Francisco
Flights to Phuket
PSP Downloads
Airport Taxi Penwortham

Miscellany

Powered by Pivot - 1.40.1: 'Dreadwind' 
XML: RSS Feed 
XML: Atom Feed 

07 November 07 - 18:18in response to: Need Help Reducing View Calculations

I'm posting this via trackback because of a strange posting problem:  (layout is strange and missing the email field?  Yes I am using a crud browser from here... that could be it) 

I have to agree with Hernan.

The simplest saving I can see is an extra column to save on the "run time" calculation.  This does not have to be dangerous (ie potentially different to the date column you have already)Just setup a trigger for whenever the existing date column gets updated(/inserted) to update the extra column.

Another alternative is to use a materialised view.  On Postgres this is a manual process involving setting up a view and rules, and ig you want it always up to date, triggers again on the source table to update the materialised view whenever the data changes. 

This latter idea would be the fastest - faster than your original view by quite a margin, in fact.  You may then be able to get rid of some (all?) indexes you have already on the existing table - because they would no longer be needed.

One thing I think would be handy that I have on my list of "when I get time" is to write some reasonably automatic code to do materialised views without the manual coding (ala Oracle).  People may use these things more if they were a bit more simple to do.

Link to Need Help Reducing View Calculations page: http://www.justatheory.com/computers/databases/postgresql/reducing_view_calculations.html

- postgresql - No comments / No trackbacks - §

15 June 07 - 01:48Data Warehouse / Business Intelligence - PostgreSQL or Oracle

I've just started a new contract for a Federal department where an old grants management system is being replaced with a PostgreSQL/Java based version.  The sister project (sub project really) is a data warehouse.  The choices were Oracle or PostgreSQL (PostgreSQL was what attracted me to the contract actually)

The argument that I have been unable to win in putting together a warehouse on PostgreSQL comes down to tool maturity - ie the risk involved in something "not proven" so it is almost certainly going to be built using Oracle Warehouse Builder and a BI tool tbd.

I have a long history with databases, and experience with PostgreSQL that dates to the beginning of version 7, however I've never been involved in formal datawarehousing or ETL - so I can't speak with the authority I would prefer ;-)

From my research (mostly involving asking Liam at Fujitsu) Bizgres which seems to be still at the "not finished" stage, and Jasper.  I have experience in Jasper products from some time back - they looked very good then - so I can't wait to try out JasperETL.

In any case I am likely to do a parallel run of the work with OWB using Jasper/Postgres to see how they go in my own time.  I would welcome any comments on all this - particularly if they can be backed up with real-world datawarehousing projects.

For those wondering about this blog, family illness has kept me from this for some time, unfortunately... something that is now coming to an end in a good way.

Cheers,

Mathew Frank (http://frakkle.com)

- default, postgresql - one comment / No trackbacks - §

18 April 06 - 11:21Working with moving data sets(eg for moving averages) efficiently in PostgreSQL - Pt1

I have been recently working on stock market technical analysis using PostgreSQL.   One of the hassles is that you want to create several moving sets of data - for moving averages, standard deviations (for creating bollinger bands) and the like.   It is common to have a short term moving average and a long term moving average to see where they cross.  Using pure SQL then you will need to link in an instance of the stock data to 10 rows of stock data from the same table, and then again for the 21 rows of data (hence creating a 10 day moving average with bollinger bands, and a 21 day moving average)

The following is an example with those two moving data sets - 21 and 10.   Even though indexes are being used, it takes 40 seconds to run!

SELECT s.stock, s.day_id, s.closing, avg(s10.closing), avg(s10.closing) + 2*stddev(s10.closing) AS bollinger_high,
    avg(s10.closing) - 2*stddev(s10.closing) AS bollinger_low,
    avg(s21.closing) AS ma_21
FROM stocks AS s
    INNER JOIN stocks s10
        ON s10.stock = s.stock
        AND s10.stock_index <= s.stock_index
        AND s10.stock_index > s.stock_index - 10
    INNER JOIN stocks s21
        ON s21.stock = s.stock
        AND s21.stock_index <= s.stock_index
        AND s21.stock_index > s.stock_index - 21
GROUP BY s.stock, s.day_id, s.closing
Remember that this is the beginning.  Adding an extra period would exponentially slow things down even further.  ( a single moving average/dataset was extremely quick.    This is the query plan:
GroupAggregate  (cost=76987947.62..80822192.41 rows=2316 width=44) (actual time=31699.194..38996.966 rows=2316 loops=1)
  ->  Sort  (cost=76987947.62..77371364.00 rows=153366549 width=44) (actual time=31672.742..32523.684 rows=484095 loops=1)
        Sort Key: s.stock, s.day_id, s.closing
        ->  Nested Loop  (cost=0.00..18700393.29 rows=153366549 width=44) (actual time=82.308..16989.105 rows=484095 loops=1)
              Join Filter: (("inner".stock)::text = ("outer".stock)::text)
              ->  Nested Loop  (cost=0.00..194391.82 rows=595984 width=44) (actual time=82.277..13726.831 rows=48426 loops=1)
                    Join Filter: ((("outer".stock)::text = ("inner".stock)::text) AND ("outer".stock_index > ("inner".stock_index - 21)))
                    ->  Index Scan using pk on stocks s21  (cost=0.00..439.98 rows=2316 width=22) (actual time=38.341..2949.632 rows=2316 loops=1)
                    ->  Index Scan using idx_stock_index on stocks s  (cost=0.00..68.30 rows=772 width=26) (actual time=0.034..2.765 rows=1159 loops=2316)
                          Index Cond: ("outer".stock_index <= s.stock_index)
              ->  Index Scan using idx_stock_index on stocks s10  (cost=0.00..25.91 rows=257 width=22) (actual time=0.007..0.033 rows=10 loops=48426)
                    Index Cond: ((s10.stock_index <= "outer".stock_index) AND (s10.stock_index > ("outer".stock_index - 10)))
Total runtime: 39006.705 ms


So then the answer came to me:  Create a PL function that will return a single moving set of data for each row in a way that will not blow the query out of the water in terms of required index scans, joins and sorts.   Why lookup the same set of data every time a new period is added?

Does not a 21 day moving average include the figures required to do a 10 day moving average?

For the details as to how, you will need to read the next installment. ;-)

http://frakkle.com

- postgresql - No comments / No trackbacks - §

09 April 06 - 23:01Krugle code SE close

So I get this email last week about krugle ("what was that," I'm thinking)  This is the code search engine for which I signed up to the beta testing program some time ago.  So why did it take so long to hear anything?  They got over 35000 people who asked to beta test it.

From the screenshots this does look awesome.  Calling it a search engine is really understating things big time.  Here is a link to the overview that really got me drooling.

http://frakkle.com

- default - No comments / No trackbacks - §

09 February 06 - 03:20Idiotic non-identification by the UK government

Apologies for the lack of posts recently.  Been runnning around a bit frantically trying to find a new place to rent.  Will be moving next weekend having just nabbed a place in the last few days.  Glad I wont be on the street...

It seems that the SonicWall people have purchased MailFrontier for $31 million US.  Always been a fan of these guys.

The thing that really grabbed my attention was something ridiculous that the UK Tax office decided to do.  Read all about it at schneier.

With just an ID number and maiden name you could pretent to be someone else and get the taxman to pay into any bank account.  Presumably causing a major mess for the person whos ID you hijacked...

http://frakkle.com

- default - No comments / No trackbacks - §

Linkdump