07 November 07 - 18:18in response to: Need Help Reducing View Calculations
I'm posting this via trackback because of a strange posting problem: (layout is strange and missing the email field? Yes I am using a crud browser from here... that could be it)
I have to agree with Hernan.
The simplest saving I can see is an extra column to save on the "run time" calculation. This does not have to be dangerous (ie potentially different to the date column you have already)Just setup a trigger for whenever the existing date column gets updated(/inserted) to update the extra column.
Another alternative is to use a materialised view. On Postgres this is a manual process involving setting up a view and rules, and ig you want it always up to date, triggers again on the source table to update the materialised view whenever the data changes.
This latter idea would be the fastest - faster than your original view by quite a margin, in fact. You may then be able to get rid of some (all?) indexes you have already on the existing table - because they would no longer be needed.
One thing I think would be handy that I have on my list of "when I get time" is to write some reasonably automatic code to do materialised views without the manual coding (ala Oracle). People may use these things more if they were a bit more simple to do.
Link to Need Help Reducing View Calculations page: http://www.justatheory.com/computers/databases/postgresql/reducing_view_calculations.html
- postgresql - - § ¶
15 June 07 - 01:48Data Warehouse / Business Intelligence - PostgreSQL or Oracle
I've just started a new contract for a Federal department where an old grants management system is being replaced with a PostgreSQL/Java based version. The sister project (sub project really) is a data warehouse. The choices were Oracle or PostgreSQL (PostgreSQL was what attracted me to the contract actually)
The argument that I have been unable to win in putting together a warehouse on PostgreSQL comes down to tool maturity - ie the risk involved in something "not proven" so it is almost certainly going to be built using Oracle Warehouse Builder and a BI tool tbd.
I have a long history with databases, and experience with PostgreSQL that dates to the beginning of version 7, however I've never been involved in formal datawarehousing or ETL - so I can't speak with the authority I would prefer
From my research (mostly involving asking Liam at Fujitsu) Bizgres which seems to be still at the "not finished" stage, and Jasper. I have experience in Jasper products from some time back - they looked very good then - so I can't wait to try out JasperETL.
In any case I am likely to do a parallel run of the work with OWB using Jasper/Postgres to see how they go in my own time. I would welcome any comments on all this - particularly if they can be backed up with real-world datawarehousing projects.
For those wondering about this blog, family illness has kept me from this for some time, unfortunately... something that is now coming to an end in a good way.
Cheers,
Mathew Frank (http://frakkle.com)
- default, postgresql - - § ¶
18 April 06 - 11:21Working with moving data sets(eg for moving averages) efficiently in PostgreSQL - Pt1
I have been recently working on stock market technical analysis using
PostgreSQL. One of the hassles is that you want to create
several moving sets of data - for moving averages, standard deviations
(for creating bollinger bands) and the like. It is common
to have a short term moving average and a long term moving average to
see where they cross. Using pure SQL then you will need to link
in an instance of the stock data to 10 rows of stock data from the same
table, and then again for the 21 rows of data (hence creating a
10 day moving average with
bollinger bands, and a
21 day moving average)
The following is an example with those two moving data sets - 21 and
10. Even though indexes are being used, it takes 40 seconds
to run!
SELECT s.stock, s.day_id, s.closing, avg(s10.closing), avg(s10.closing) + 2*stddev(s10.closing) AS bollinger_high,
avg(s10.closing) - 2*stddev(s10.closing) AS bollinger_low,
avg(s21.closing) AS ma_21
FROM stocks AS s
INNER JOIN stocks s10
ON s10.stock = s.stock
AND s10.stock_index <= s.stock_index
AND s10.stock_index > s.stock_index - 10
INNER JOIN stocks s21
ON s21.stock = s.stock
AND s21.stock_index <= s.stock_index
AND s21.stock_index > s.stock_index - 21
GROUP BY s.stock, s.day_id, s.closing
Remember that this is the beginning. Adding an extra period would
exponentially slow things down even further. ( a single moving
average/dataset was extremely quick. This is the
query plan:
GroupAggregate (cost=76987947.62..80822192.41 rows=2316 width=44) (actual time=31699.194..38996.966 rows=2316 loops=1)
-> Sort (cost=76987947.62..77371364.00
rows=153366549 width=44) (actual time=31672.742..32523.684 rows=484095
loops=1)
Sort Key: s.stock, s.day_id, s.closing
-> Nested
Loop (cost=0.00..18700393.29 rows=153366549 width=44) (actual
time=82.308..16989.105 rows=484095 loops=1)
Join Filter: (("inner".stock)::text = ("outer".stock)::text)
-> Nested Loop (cost=0.00..194391.82 rows=595984
width=44) (actual time=82.277..13726.831 rows=48426 loops=1)
Join Filter: ((("outer".stock)::text = ("inner".stock)::text) AND
("outer".stock_index > ("inner".stock_index - 21)))
-> Index Scan using pk on stocks s21 (cost=0.00..439.98
rows=2316 width=22) (actual time=38.341..2949.632 rows=2316 loops=1)
-> Index Scan using idx_stock_index on stocks s
(cost=0.00..68.30 rows=772 width=26) (actual time=0.034..2.765
rows=1159 loops=2316)
Index Cond: ("outer".stock_index <= s.stock_index)
-> Index Scan using idx_stock_index on stocks s10
(cost=0.00..25.91 rows=257 width=22) (actual time=0.007..0.033 rows=10
loops=48426)
Index Cond: ((s10.stock_index <= "outer".stock_index) AND
(s10.stock_index > ("outer".stock_index - 10)))
Total runtime: 39006.705 ms
So then the answer came to me: Create a PL function that will
return a single moving set of data for each row in a way that will not
blow the query out of the water in terms of required index scans, joins
and sorts. Why lookup the same set of data every time a new
period is added?
Does not a 21 day moving average include the figures required to do a 10 day moving average?
For the details as to how, you will need to read the next installment. 
http://frakkle.com
- postgresql - - § ¶
09 April 06 - 23:01Krugle code SE close
So I get this email last week about krugle ("what was that," I'm
thinking) This is the code search engine for which I signed up to
the beta testing program some time ago. So why did it take so
long to hear anything? They got over 35000 people who asked to
beta test it.
From the screenshots this does look awesome. Calling it a search
engine is really understating things big time. Here is a link to the overview that really got me drooling.
http://frakkle.com
- default - - § ¶
09 February 06 - 03:20Idiotic non-identification by the UK government
Apologies for the lack of posts recently. Been runnning around a
bit frantically trying to find a new place to rent. Will be
moving next weekend having just nabbed a place in the last few
days. Glad I wont be on the street...
It seems that the SonicWall people have purchased MailFrontier for $31 million US. Always been a fan of these guys.
The thing that really grabbed my attention was something ridiculous
that the UK Tax office decided to do. Read all about it at schneier.
With just an ID number and maiden name you could pretent to be someone
else and get the taxman to pay into any bank account. Presumably
causing a major mess for the person whos ID you hijacked...
http://frakkle.com
- default - - § ¶
Linkdump